initial data analysis distinctions. some distinctions population vs. sample descriptive vs....

Initial Data Analysis DISTINCTIONS

Upload: aubrie-morris

Posted on 28-Dec-2015

Some Distinctions

Population vs. Sample

Descriptive vs. Inferential stats

Variables

Types of data: Quantitative versus Categorical

Measurement scales

Population

The entire collection of events (or people) that you are interested in generalizing to. For example, our population could be the students in this class, UNT students, all students in the U.S., or people in general.

Although we wish to make claims about the entire population, it is often too large to deal with, so we take a portion of it to study, typically by random sampling.

Random Sampling

Choose a subset (sample) of the population, ensuring that each member of the population has an equal chance of being sampled.

Examine that sample and use your observations to draw inferences about the population.

Examples: voting polls, television ratings, rolling a die.
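A minimal Python sketch of simple random sampling, assuming the whole population can be listed (a sampling frame). The helper `simple_random_sample` and the student-ID population are hypothetical; `random.Random.sample` draws without replacement, so every member is equally likely to end up in the sample.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member has the same chance of selection."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)

# Hypothetical population: student IDs 1..500
population = list(range(1, 501))
sample = simple_random_sample(population, 20, seed=42)
print(sorted(sample))
```

Seeding the generator is a convenience for reproducibility; leaving `seed=None` gives a fresh draw each run.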

Random Sampling

Note, however, that the inferences drawn are only as good as the representativeness of the sample.

If the sample is not random, it may not be representative of the population. When a sample is not representative of its parent population, the external validity of any inference is called into question, i.e., how well will we be able to generalize? Example: most psychology studies draw their subjects from freshman psych students.
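To make the representativeness problem concrete, here is a small Python sketch with an invented population of 1,000 students (200 freshmen aged 18, 800 others aged 21); age simply stands in for any characteristic on which freshmen differ from everyone else. A freshmen-only convenience sample misses the population mean badly, while a simple random sample of the same size cannot be that far off.

```python
import random
import statistics

# Hypothetical population of 1,000 students: 200 freshmen aged 18, 800 others aged 21
ages = [18] * 200 + [21] * 800
is_freshman = [True] * 200 + [False] * 800

population_mean = statistics.mean(ages)  # the value we want to generalize to

# Convenience sample: freshmen only (e.g., an intro psych subject pool)
freshman_ages = [age for age, fr in zip(ages, is_freshman) if fr]
convenience_mean = statistics.mean(freshman_ages[:50])

# Simple random sample of the same size drawn from the whole population
rng = random.Random(1)
random_mean = statistics.mean(rng.sample(ages, 50))

print(population_mean, convenience_mean, random_mean)
```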

Random Assignment

When studying the effects of some treatment variable experimentally, it is also important to randomly assign subjects to treatments (e.g., control vs. experimental group).

Often we want to look at the effects of some treatment, e.g., a drug, a teaching strategy, or a memory technique.

To study the effects of the treatment, we often give one or more groups the treatment and one group no treatment, and then compare the groups.

Random assignment reduces the likelihood that the groups differ in some critical way other than the treatment, since every subject has an equal chance of being placed in any of the groups.
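One way to sketch random assignment in Python: shuffle the subject list, then deal subjects into groups in turn so group sizes differ by at most one. The function name `randomly_assign` and the subject labels are hypothetical.

```python
import random

def randomly_assign(subjects, groups, seed=None):
    """Shuffle subjects, then deal them into groups in turn (sizes differ by at most one)."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

subjects = [f"S{i:02d}" for i in range(1, 21)]  # 20 hypothetical subjects
assignment = randomly_assign(subjects, ["control", "experimental"], seed=7)
for group, members in assignment.items():
    print(group, members)
```

Because the order is shuffled before dealing, any subject characteristic (gender, age, motivation) is equally likely to land in either group.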

Random Assignment

If random assignment is not used, the internal validity of the experimental results may be compromised, i.e., are our results due to the treatment we have imposed or to something else?

Example: if subjects are not randomly assigned and, say, mostly males end up receiving the treatment, any effects we see may be due to gender rather than the treatment.

Variables

Assume we have a random sample of subjects that we have randomly assigned to treatment groups. Example: Stop-smoking study.

Now we must select the variables we wish to study, where the term variable refers to a property of an object or event that can take on different values. Examples: number of cigarettes smoked, abstinence after one week (yes or no).

Note the distinction: the number of cigarettes smoked is a quantitative variable (treated here as continuous, although strictly it is a discrete count), whereas abstinence is a categorical variable.

A variable is to be contrasted with a constant, which takes on only one value.
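A rough Python sketch of the variable/constant distinction on invented stop-smoking data. The helper `describe_column` is hypothetical, and labeling a column quantitative or categorical from its stored type is only a crude heuristic; the scale of measurement is a decision the researcher makes, not something the data file can settle.

```python
# Hypothetical stop-smoking data, one entry per subject
data = {
    "cigarettes_per_day": [12, 0, 5, 20, 3],
    "abstinent_week_1": ["no", "yes", "no", "no", "yes"],
    "study_site": ["UNT", "UNT", "UNT", "UNT", "UNT"],
}

def describe_column(values):
    """Label a column as constant or variable, and (crudely) as quantitative or categorical."""
    role = "constant" if len(set(values)) == 1 else "variable"
    kind = "quantitative" if all(isinstance(v, (int, float)) for v in values) else "categorical"
    return role, kind

for name, values in data.items():
    print(name, describe_column(values))
```

The `study_site` column takes a single value, so in this sample it is a constant and cannot be related to any outcome.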

Types of Data

Measurement (quantitative, magnitude) data: continuous vs. discrete

Example: cumulative GPA during college vs. GPA for a single class

Example: is a 9-point “Likert” scale continuous or discrete? What about a 20-point scale?

Categorical (frequency, nominal, qualitative) data: named data, e.g., different brands, political party, race, gender

How you think about your data, and which scale of measurement your variables are on, is very important. What you decide about a variable will determine which analyses are available, and may even have a large effect on the theory itself. For example, early developmental theories proposed clear-cut stages, which imply categories of development.
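Categorical data are also called frequency data because they are summarized by counting how many observations fall in each category rather than by averaging. A minimal Python sketch with invented party responses:

```python
from collections import Counter

# Hypothetical political-party responses (nominal data)
party = ["Democrat", "Republican", "Democrat", "Independent", "Democrat", "Republican"]

counts = Counter(party)  # frequency of each category
for category, n in counts.most_common():
    print(f"{category}: {n} ({n / len(party):.0%})")
```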

A Note about Categorical Variables

Consider a grouping variable in which people are classified into In-patient, Out-patient, and Control groups.

In a theoretical sense there is only one variable, and our goal is to determine the relationship between the grouping variable and the outcome. An ANOVA table in this sense speaks to the overall effect of group membership on the outcome.

Statistically speaking, however, if we apply the general linear model there are actually two coded variables (k - 1 = 2 for k = 3 groups); see dummy coding, effects coding, etc.

Keep in mind that a single group, taken by itself, has no relationship with the outcome, since membership in a single group is a constant.
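The two coded variables can be sketched in Python as follows. The functions `dummy_code` and `effects_code` are hypothetical helpers, and choosing Control as the reference group is an assumption made for illustration; each produces k - 1 = 2 columns for the three groups.

```python
def dummy_code(groups, levels, reference):
    """One 0/1 column per non-reference level (k - 1 columns for k levels)."""
    coded = [lvl for lvl in levels if lvl != reference]
    return [[1 if g == lvl else 0 for lvl in coded] for g in groups]

def effects_code(groups, levels, reference):
    """Like dummy coding, but the reference group is coded -1 on every column."""
    coded = [lvl for lvl in levels if lvl != reference]
    return [[-1] * len(coded) if g == reference else [1 if g == lvl else 0 for lvl in coded]
            for g in groups]

levels = ["In-patient", "Out-patient", "Control"]
groups = ["In-patient", "Control", "Out-patient", "Control"]
print(dummy_code(groups, levels, reference="Control"))
print(effects_code(groups, levels, reference="Control"))
```

Any single column on its own only contrasts one group with the rest; the overall group effect in the ANOVA comes from both columns together.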

Variables

Another distinction related to variables concerns those that we are interested in understanding and explaining (dependent, criterion, outcome variables) versus those we expect to have an effect (perhaps causal) on that outcome, and which we may be able to manipulate or have control over experimentally (independent, predictor variables). Which one is predicting and which is being predicted?

Both predictor and dependent variables can be quantitative or categorical. Example: whether or not we give a subject the stop-smoking treatment would be the independent variable, and the number of cigarettes smoked would be a dependent variable.

Other examples (predictor:outcome): age:income, shoe size:intelligence, gender:hostility, intelligence:voting outcome.

Parameters and Statistics

Parameters are simply values associated with the population, and as such are often inferred rather than actually known. They are designated with Greek notation, e.g., μ for the population mean.

“Statistics” in this sense refer specifically to the data set at hand (the sample) and make no reference to values outside the sample.

Using the statistics we have collected, we then infer the population values (parameters), e.g., we use the sample mean X̄ to infer μ.

Most commonly employed methods assume a fixed population parameter.
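A small Python sketch of the parameter/statistic distinction, assuming an invented finite population that we can fully list (in practice μ is rarely known). `mu` is computed from every member of the population; `x_bar` is computed only from the sample and serves as the estimate of `mu`.

```python
import random
import statistics

# Hypothetical finite population of scores (rarely fully known in practice)
population = list(range(1, 101))

mu = statistics.mean(population)  # parameter: population mean (μ)
sample = random.Random(3).sample(population, 10)
x_bar = statistics.mean(sample)   # statistic: sample mean (X̄), our estimate of μ

print(f"mu = {mu}, x_bar = {x_bar}")
```

A different sample would give a different `x_bar`, while `mu` stays fixed; that is the sense in which most common methods treat the parameter as fixed.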

What Do We Do With The Data?

Descriptive statistics are used to describe the data set itself, without reference to the population from which it is derived. Examples: graphing, calculating averages, looking for extreme scores.

Exploratory/initial data analysis (Tukey, Chatfield, others) typically relies mostly on descriptive information.
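A minimal descriptive summary in Python using the standard `statistics` module. Flagging extreme scores with Tukey's 1.5 × IQR fences is one common convention among several, and the helper `describe` and the data are hypothetical.

```python
import statistics

def describe(scores):
    """Descriptive summary plus extreme scores flagged by Tukey's 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(scores, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": median,
        "sd": statistics.stdev(scores),  # sample standard deviation
        "extreme": [x for x in scores if x < low or x > high],
    }

scores = [2, 3, 3, 4, 5, 5, 6, 7, 40]  # hypothetical data with one suspicious value
print(describe(scores))
```

Note how the single extreme score pulls the mean (about 8.3) well above the median (5); spotting this kind of thing before any inferential test is the point of initial data analysis.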

What Do We Do With The Data?

Inferential statistics allow you to infer something about the parameters of the population based on the statistics of the sample, and the various tests we perform on the sample.

Examples: chi-square tests, t-tests, correlations, ANOVA.
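As one illustration of an inferential statistic, here is a Python sketch of Welch's independent-samples t statistic and its approximate degrees of freedom, computed from scratch with the `statistics` module on invented scores. It stops at t and df; turning them into a p-value needs the t distribution, which a library such as SciPy provides (e.g., `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
import statistics

def welch_t(a, b):
    """Welch's independent-samples t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

control = [1, 2, 3, 4, 5]    # hypothetical scores
treatment = [3, 4, 5, 6, 7]
print(welch_t(control, treatment))
```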

Measurement Scales

Nominal: category labels assigned in some meaningful way (e.g., gender, political party)

Ordinal: orders or ranks objects on some continuum (e.g., military ranks)

Interval: we can speak of differences between scale points, but the zero point is arbitrary (Fahrenheit scale: 30° - 20° = 20° - 10°, but 20° is not twice as hot as 10°!)

Probably the most common scale in psych

Ratio: same as interval but with a true zero point (distance, weight, Kelvin; physical measurements). Ratio scales are interval scales too, but not the other way around.
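The Fahrenheit example can be checked numerically. This Python sketch converts to Kelvin, a ratio scale with a true zero: equal Fahrenheit differences stay equal, but the apparent 2:1 ratio of 20°F to 10°F shrinks to about 1.02 once a true zero is used. The function name is hypothetical; the conversion formula is the standard one.

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit reading to the Kelvin (ratio) scale."""
    return (f - 32) * 5 / 9 + 273.15

# Interval property: equal differences stay equal after conversion
diff_hi = fahrenheit_to_kelvin(30) - fahrenheit_to_kelvin(20)
diff_lo = fahrenheit_to_kelvin(20) - fahrenheit_to_kelvin(10)

# Ratios of Fahrenheit readings are not meaningful: 20 / 10 = 2.0,
# but the physical (Kelvin) ratio is only about 1.02
naive_ratio = 20 / 10
kelvin_ratio = fahrenheit_to_kelvin(20) / fahrenheit_to_kelvin(10)
print(diff_hi, diff_lo, naive_ratio, kelvin_ratio)
```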

Scales

There is much debate regarding scale distinctions and how to deal with different data types. Some types of data even seem to qualify as more than one type. Although some analyses will give the same result whatever you call your data, which analysis you perform may depend on what you take the underlying construct to be, so it is important that you give it some thought.
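A tiny Python sketch of why the decision matters, using invented responses to a single 5-point Likert item. Treating the item as interval data licenses the mean, which assumes equal spacing between scale points; treating it as ordinal data points to the median, which uses only the ordering. Here the two summaries tell rather different stories.

```python
import statistics

# Hypothetical responses to one 5-point Likert item
responses = [1, 1, 2, 5, 5, 5, 5]

# Treated as interval data: the mean assumes equal spacing between scale points
as_interval = statistics.mean(responses)
# Treated as ordinal data: the median uses only the ordering of responses
as_ordinal = statistics.median(responses)
print(round(as_interval, 2), as_ordinal)
```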

[Figure: Decision tree]

Decision Trees

While decision trees might be helpful, they are, at best, a suggestion, and should never be used as a definitive statement on what analysis to do.

Example: political party (Republicans vs. Democrats) and three ‘continuous’ attitude measures (gun control, abortion, Iraq). Simple enough, right?

Possible analyses:

Purely descriptive assessment

Simple correlations

3 t-tests: classical, non-parametric, or robust? Bootstrap, Mann-Whitney, differences on M-estimators rather than means, etc.

MANOVA: differences on the linear combination of the continuous measures

Discriminant function analysis or logistic regression: predicting party based on attitude (classification)

Factor analysis on the continuous measures, then a t-test on the factor scores
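To show how one of the non-parametric alternatives differs from a t-test, here is a Python sketch of the Mann-Whitney U statistic computed by direct pair counting on invented attitude scores. It compares every score in one group with every score in the other instead of comparing means; the p-value would come from tables or a library routine (SciPy offers `scipy.stats.mannwhitneyu`). The helper name and data are hypothetical.

```python
def mann_whitney_u(a, b):
    """U for sample a: count of (a, b) pairs where a is larger, with ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical gun-control attitude scores for each party
republican = [2, 3, 3, 4]
democrat = [3, 5, 6, 7]
u_r = mann_whitney_u(republican, democrat)
u_d = mann_whitney_u(democrat, republican)
print(u_r, u_d, u_r + u_d == len(republican) * len(democrat))
```

The two U values always sum to the number of pairs (here 4 × 4 = 16), so either one carries the same information.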

Analyses

The point is that you will always have options for how to understand, describe, and analyze the data.

What you must do is work this out BEFORE collecting the data.

You will still have options for how to analyze the data in the end, but those options should be known before anything is collected.

To collect data without having in mind an analysis that will test your theoretical model is to waste one’s time, and suggests that you do not truly understand what your theory or hypothesis is.