
30 Cards in this Set

  • Front
  • Back
an item of information
Data Warehouse
a large database of information collected by a company
Data mining
analyzing large amounts of data to make predictions or guide decisions
Metadata- information about the data (its context)
Who- the specific cases the data describe
What- what was recorded or measured about each case
Why- the reason for examining the data
Where- the actual location where the data were collected
Rows- cases
Columns- variables
Respondents- individuals who answer a survey
Subjects/participants- people in an experiment
Experimental units- the cases in an experiment when they are not people
Relational database
two or more tables linked together so information can be merged across them.
-adds clarity
-keeps better track of transactions than one huge data table with many columns for a single customer
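A minimal sketch of the idea, using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical and chosen only for illustration: one table holds customers, a second holds their transactions, and the shared customer_id links them.

```python
import sqlite3

# Hypothetical data: one row per customer, one row per transaction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO transactions VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# Merge information across the two tables through the shared customer_id key.
rows = conn.execute("""
    SELECT c.name, COUNT(t.transaction_id), SUM(t.amount)
    FROM customers AS c
    JOIN transactions AS t ON t.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(rows)  # one summary row per customer
```

Each customer appears once in customers no matter how many transactions they make, which is what avoids the single wide table.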
Categorical Vs. Quantitative
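Categorical- values are categories, so arithmetic does not make sense; no numerical units. Nominal- categories with no intrinsic order. Ordinal- categories with an intrinsic order, such as Freshman, Sophomore, Junior, Senior.
Quantitative- numerical values measured in UNITS, so arithmetic makes sense; percentages count as quantitative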
Identifier variable
a special type of categorical variable with a unique value assigned to each individual.
Example- Social Security number, ID number
Time Series VS Cross Sectional
Time series- a variable measured at regular intervals of time. Ex- every week, month, year
Cross sectional- several variables measured at approximately the same point in time.
3 Rules of Sampling
1) Examine a part of the whole- take a sample.
2) Randomize- random selection gives every member of the population a fair chance of being chosen, which protects against bias
3) Sample size is what matters, not the size of the population
Population
the entire group of individuals we hope to learn about
Population parameter
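a numerical value that describes the whole population (for example, the true mean or proportion); usually unknown and estimated from a sample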
Sampling frame
the list of individuals from which the sample is drawn
Sample
the subset of the population that is actually examined and whose data are used to learn about the population

The size of the sample is what matters, not the size of the population (as long as the sample is representative)
Voluntary Response bias
when individuals choose for themselves whether to participate in the sample.
-People who participate are more likely to hold strong opinions on the issue.
Undercoverage bias
when some portion of the population is not sampled at all or is underrepresented in the sample
Nonresponse bias
when a large fraction of those sampled fail to respond
Response Bias
when the survey design (for example, question wording) influences responses
Sampling error/ variability
differences in results from one random sample to another
Measurement error
inaccuracy in how responses are recorded or measured; a bias built into the data collection
Convenience sampling
sample consisting of individuals who are conveniently available
Simple Random Sample
a sample drawn so that every possible sample of the size we plan to draw has equal chance of being selected.
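A minimal sketch of a simple random sample using Python's random module. The population of ID numbers 1 to 100, the sample size of 10, and the seed are assumptions for illustration only.

```python
import random

population = list(range(1, 101))  # hypothetical sampling frame: IDs 1..100
rng = random.Random(42)           # fixed seed so the draw can be repeated

# sample() draws without replacement, so every possible set of 10 IDs
# has the same chance of being the one selected.
srs = rng.sample(population, 10)
print(sorted(srs))
```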
Stratified Random Sampling
Divide the population into homogeneous groups (strata), then take a random sample within each stratum
-ensures the sample represents the different groups in the population
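A minimal sketch of stratified random sampling, assuming a hypothetical population of 200 students with class year as the stratifying variable and 5 students drawn per stratum.

```python
import random

# Hypothetical population: (student_id, class_year) pairs.
years = ["Freshman", "Sophomore", "Junior", "Senior"]
population = [(i, years[i % 4]) for i in range(200)]
rng = random.Random(0)

# Step 1: split the population into strata by class year.
strata = {}
for student_id, year in population:
    strata.setdefault(year, []).append(student_id)

# Step 2: take a simple random sample of 5 within each stratum.
stratified_sample = {year: rng.sample(ids, 5) for year, ids in strata.items()}
print(stratified_sample)
```

Every class year is guaranteed to appear in the sample, which a single simple random sample of 20 would not guarantee.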
Cluster Sampling
Divide the population into clusters that each resemble the whole population, select a few clusters at random, and perform a census within each selected cluster
-Useful when you don't have one big list to choose from, ex- getting polls from counties
-more practical
-saves money
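A minimal sketch of cluster sampling, assuming six hypothetical counties of 20 residents each, with two counties chosen at random and everyone in them surveyed.

```python
import random

# Hypothetical clusters: county name -> list of residents.
clusters = {f"County {c}": [f"{c}-{i}" for i in range(20)] for c in "ABCDEF"}
rng = random.Random(1)

# Step 1: randomly select whole clusters (here, 2 of the 6 counties).
chosen = rng.sample(sorted(clusters), 2)

# Step 2: take a census of every individual inside the chosen clusters.
cluster_sample = [person for county in chosen for person in clusters[county]]
print(chosen, len(cluster_sample))
```

Only a list of counties is needed up front, not a list of every resident.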
Systematic Sampling
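selecting individuals at a regular interval from the sampling frame (for example, every 10th name), starting from a randomly chosen individual.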
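A minimal sketch of systematic sampling, assuming a hypothetical frame of 100 names and an interval of k = 10. The starting position is picked at random within the first interval.

```python
import random

frame = [f"person_{i}" for i in range(100)]  # hypothetical sampling frame
k = 10                                       # take every 10th individual
rng = random.Random(7)

start = rng.randrange(k)          # random starting point among the first k
systematic_sample = frame[start::k]
print(systematic_sample)
```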
Multistage Sampling
a more complicated form of cluster sampling in which larger clusters are further subdivided into smaller, more targeted groupings for the purposes of surveying.
Frequency Vs Relative Frequency table
Frequency- shows the count of cases in each category of a variable
Relative- shows the percentage (or proportion) of cases in each category out of the whole
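A minimal sketch that builds both tables from the same data. The ten class-year values are made up for illustration.

```python
from collections import Counter

# Hypothetical categorical data: class year of ten students.
class_years = ["Freshman", "Senior", "Junior", "Freshman", "Sophomore",
               "Freshman", "Senior", "Junior", "Freshman", "Senior"]

frequency = Counter(class_years)              # count per category
total = sum(frequency.values())
relative = {cat: 100 * n / total for cat, n in frequency.items()}  # percent of whole

for category, count in frequency.most_common():
    print(f"{category:<10} {count:>3} {relative[category]:>6.1f}%")
```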
Area Principle
the area occupied by a part of a graph should correspond to the magnitude of the value it represents.
-don't scale both height and width with the value; fix one and change only the other
Contingency table
table that shows how the individuals are distributed along one variable, contingent on the value of another variable
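A minimal sketch of a contingency table built from paired categorical responses. The two variables (class year and car ownership) and the six responses are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses: (class year, owns a car).
responses = [("Freshman", "Yes"), ("Freshman", "No"), ("Freshman", "No"),
             ("Senior", "Yes"), ("Senior", "Yes"), ("Senior", "No")]

cells = Counter(responses)                      # count for each (row, column) pair
row_labels = sorted({r for r, _ in responses})
col_labels = sorted({c for _, c in responses})

# Rows are class years, columns are car ownership.
table = {r: {c: cells[(r, c)] for c in col_labels} for r in row_labels}

# Conditional distribution of car ownership within each class year.
conditional = {r: {c: table[r][c] / sum(table[r].values()) for c in col_labels}
               for r in row_labels}
print(table)
print(conditional)
```

If the conditional distributions were the same in every row, the table would show no relationship between the two variables.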
Independence
no relationship between the two variables: the distribution of one is the same for every category of the other
Simpson's Paradox
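when the percentages within each group point one way but the overall (combined) percentages point the other way.
- A) 90/100 = 90% and 10/20 = 50%; Total: 100/120 = 83%
- B) 19/20 = 95% and 75/100 = 75%; Total: 94/120 = 78%
- B is higher in both groups, yet A is higher overall, because most of A's cases fall in the group with the higher percentages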
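A minimal sketch that checks the arithmetic on this card. The counts come from the card; the variable names are only for illustration.

```python
# (successes, trials) in the first and second group, from the card.
results = {"A": [(90, 100), (10, 20)], "B": [(19, 20), (75, 100)]}

# Percentage within each group.
per_group = {name: [s / n for s, n in groups] for name, groups in results.items()}

# Overall percentage: pool the counts first, then divide.
overall = {name: sum(s for s, _ in groups) / sum(n for _, n in groups)
           for name, groups in results.items()}

for name in results:
    print(name, [f"{p:.0%}" for p in per_group[name]], f"overall {overall[name]:.0%}")
```

B has the higher percentage in each group, yet A has the higher overall percentage, because the overall figure is weighted by how many cases fall in each group.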