Analyzing the Results of an Experiment…
• Not straightforward.
– Why not?
Variability and random (chance) outcomes
Inferential Statistics
• Statistical analysis appropriate for inferring causal relationships and effects.
• Many different formulas exist. Which one do you use?
Inferential Statistic Selection
• Determine that you are analyzing the results of an experimental manipulation, not a correlation.
• Identify the IV and DV.
• The IV will always be nominal on some level, even when it may seem to be continuous, e.g., low, medium, and high doses of a drug.
Inferential Statistic Selection
• What is the scale of the DV?
Scale of DV    Statistic to use
Nominal        Chi-squared
Ordinal        Mann-Whitney U-test
Continuous     t-test or ANOVA
t-test or ANOVA?
How many levels of the IV are there?
• 2 levels: t-test or ANOVA
• More than 2 levels: ANOVA
There are different forms of t-tests and ANOVAs:
Did the study use a within-group or between-group experimental design?
Only 2 levels of the IV
• Between group: unpaired t-test (or “t for independent samples”), or ANOVA (the basic ANOVA is fitted for between-group designs)
• Within group: paired t-test (or “t for dependent samples”), or within-group ANOVA (often referred to as a “repeated measures ANOVA”)
More than 2 levels of the IV
• Between group: ANOVA
• Within group: repeated measures ANOVA
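The selection steps above can be sketched as a small lookup function. This is only an illustration of the decision path on the slides; the function name `choose_test` and its argument spellings are hypothetical, and for a 2-level IV it returns the t-test even though the slides note that an ANOVA would also work.

```python
def choose_test(dv_scale, iv_levels, design):
    """Suggest a statistic following the slide's decision path.

    dv_scale:  "nominal", "ordinal", or "continuous" (scale of the DV)
    iv_levels: number of levels of the IV (2 or more)
    design:    "between" or "within" (group design)
    All argument names and spellings are assumptions for this sketch.
    """
    if iv_levels < 2:
        raise ValueError("an experiment needs at least 2 levels of the IV")
    if dv_scale == "nominal":
        return "chi-squared"
    if dv_scale == "ordinal":
        return "Mann-Whitney U-test"
    if dv_scale != "continuous":
        raise ValueError(f"unknown DV scale: {dv_scale!r}")
    if design not in ("between", "within"):
        raise ValueError(f"unknown design: {design!r}")
    if iv_levels == 2:
        # An ANOVA is also acceptable here; the t-test is the simpler choice.
        return "unpaired t-test" if design == "between" else "paired t-test"
    return "ANOVA" if design == "between" else "repeated measures ANOVA"


# The marijuana example used later: continuous DV, 4 levels, between groups.
print(choose_test("continuous", 4, "between"))
```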
In some ways all inferential statistics are similar.
• They estimate how probable a result would be if it arose from random variability alone, rather than from the IV.
• Let’s focus on the basic ANOVA, since it is likely to be the statistic you will use most commonly.
ANOVA
• ANOVA produces an F value.
• F values are the ratio of overall between-group variability to the mean within-group variability:
F = between-group variability (+ chance) / mean within-group variability (+ chance)
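For reference, the ratio above is usually written in the following textbook form for a one-way, between-group ANOVA. The symbols are assumptions introduced here, not notation from the slides: k groups, n_j scores in group j, N scores in total, group means \bar{x}_j, and grand mean \bar{x}.

```latex
\[
F = \frac{MS_{\text{between}}}{MS_{\text{within}}},
\qquad
MS_{\text{between}} = \frac{\sum_{j=1}^{k} n_j \,(\bar{x}_j - \bar{x})^2}{k - 1},
\qquad
MS_{\text{within}} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{N - k}
\]
```

The two denominators, k - 1 and N - k, are the between-group and within-group degrees of freedom.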
What does this mean?
Let’s suppose:
• Experiment: the IV is marijuana, with four levels:
– Control
– Placebo control
– Low dose
– High dose
Dependent variable is:
• Performance on a short-term memory task, measured as number correct out of 10 test items.
• 9 subjects in each group
Possible Outcome 1
Control   Placebo   Low dose   High dose
4         2         2          2
5         3         3          3
6         4         4          5
5         6         4          3
5         5         5          4
6         5         4          4
4         4         5          4
3         4         6          6
7         3         3          5
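As a rough check on these numbers, the F value for Possible Outcome 1 can be worked out by hand in a few lines. The sketch assumes each bulleted row of the table is one subject per group (so each column is one group of 9) and uses the textbook sums-of-squares formula; the function name `one_way_anova` is hypothetical.

```python
# Scores read column by column from the Possible Outcome 1 table
# (assumption: each row of the table holds one score per group).
outcome_1 = {
    "control":   [4, 5, 6, 5, 5, 6, 4, 3, 7],
    "placebo":   [2, 3, 4, 6, 5, 5, 4, 4, 3],
    "low dose":  [2, 3, 4, 4, 5, 4, 5, 6, 3],
    "high dose": [2, 3, 5, 3, 4, 4, 4, 6, 5],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


f_value, df_b, df_w = one_way_anova(list(outcome_1.values()))
print(f"F({df_b}, {df_w}) = {f_value:.2f}")  # prints: F(3, 32) = 1.50
```

The group means are 5, 4, 4, and 4, so the between-group variability is small relative to the spread inside each group.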
Distribution of scores for control sample
[Histogram: x-axis “control” score (0 to 12), y-axis count (0 to 3.5)]
Placebo scores
[Histogram: x-axis “placebo” score (0 to 12), y-axis count (0 to 3.5)]
Low dose scores
[Histogram: x-axis “low” dose score (0 to 12), y-axis count (0 to 3.5)]
High dose scores
[Histogram: x-axis “high” dose score (0 to 12), y-axis count (0 to 3.5)]
The population distribution of scores
[Histogram: x-axis “population” score (0 to 11), y-axis count (0 to 12)]
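F value relatively low
[Figure: score distributions for the high dose, low dose, placebo, and control groups, labeled with between-group variability and within-group variability]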
Now consider this: Possible Outcome 2
Control   Placebo   Low dose   High dose
4         2         2          2
5         3         3          3
6         4         4          5
5         6         4          3
5         5         5          4
6         5         4          4
4         4         5          4
3         4         6          6
7         3         3          5
Distribution of scores for control sample
[Histogram: x-axis “control” score (0 to 12), y-axis count (0 to 3.5)]
Placebo scores
[Histogram: x-axis “placebo” score (-2 to 12), y-axis count (0 to 3.5)]
Low dose scores
[Histogram: x-axis “low” dose score (0 to 12), y-axis count (0 to 3.5)]
High dose scores
[Histogram: x-axis “high” dose score (0 to 12), y-axis count (0 to 3.5)]
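F value relatively high
[Figure: score distributions for the high dose, low dose, placebo, and control groups, labeled with between-group variability and within-group variability]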
The high F value reflects
• Logic!
• The distributions of scores are much more obviously separated, and in this case are completely non-overlapping.
• Low F values indicate highly overlapping score distributions.
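The link between separation and F can be checked with toy numbers. The two data sets below are hypothetical, invented only for this illustration (they are not scores from the experiment), and the shortcut used for F holds only when every group has the same size.

```python
from statistics import fmean, variance


def f_ratio(groups):
    """F for equal-sized groups: n * variance of group means / mean group variance."""
    n = len(groups[0])
    ms_between = n * variance([fmean(g) for g in groups])
    ms_within = fmean([variance(g) for g in groups])
    return ms_between / ms_within


# Hypothetical toy data: three groups of three scores each.
separated = [[0, 1, 2], [4, 5, 6], [8, 9, 10]]    # no score is shared between groups
overlapping = [[3, 5, 7], [4, 6, 8], [5, 7, 9]]   # groups share most of their range

print(f"separated:   F = {f_ratio(separated):.2f}")    # prints: separated:   F = 48.00
print(f"overlapping: F = {f_ratio(overlapping):.2f}")  # prints: overlapping: F = 0.75
```

Pulling the group means apart while tightening the spread inside each group raises the numerator and shrinks the denominator, which is why the non-overlapping case gives the much larger F.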
So how do we decide if an F value is large enough to treat the result as causal?
• We consult a table of established probabilities of different F values, looked up by the degrees-of-freedom terms:
ANOVA Significance table
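A printed table is one way to get that probability. As a rough alternative that needs no table, the sketch below uses a permutation test, which is a different technique from the table lookup: it reshuffles the Possible Outcome 1 scores across groups many times and counts how often chance alone produces an F at least as large as the observed one. The function names, the number of shuffles, and the seed are all assumptions made for this example.

```python
import random
from statistics import fmean, variance


def f_ratio(groups):
    """F for equal-sized groups: n * variance of group means / mean group variance."""
    n = len(groups[0])
    ms_between = n * variance([fmean(g) for g in groups])
    ms_within = fmean([variance(g) for g in groups])
    return ms_between / ms_within


def permutation_p_value(groups, n_shuffles=2000, seed=0):
    """Estimate P(F >= observed F) when group labels are assigned at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = f_ratio(groups)
    pooled = [x for g in groups for x in g]
    size = len(groups[0])
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled = [pooled[i:i + size] for i in range(0, len(pooled), size)]
        if f_ratio(shuffled) >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return hits / n_shuffles


# Possible Outcome 1 scores, one list per group (control, placebo, low, high).
outcome_1 = [
    [4, 5, 6, 5, 5, 6, 4, 3, 7],
    [2, 3, 4, 6, 5, 5, 4, 4, 3],
    [2, 3, 4, 4, 5, 4, 5, 6, 3],
    [2, 3, 5, 3, 4, 4, 4, 6, 5],
]
p_value = permutation_p_value(outcome_1)
print(f"estimated p = {p_value:.3f}")
```

If the estimated probability is above the conventional 0.05 cutoff, the observed F is the kind of value that random variability produces fairly often.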
Where is/are the difference(s)?
[Bar chart: six categories (Neutral, Positive, Negative, Sex, Drug, Taboo); y-axis from 0 to 70]
Inferential Statistics
The story of “Scratch”
Why not just use repeated t-tests? Probability pyramiding
• 15 t-tests required for this data set
• Post hoc tests include compensations for repeated testing of a large data set
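The arithmetic behind probability pyramiding can be sketched as follows. The 0.05 alpha level and the treatment of the 15 tests as independent are assumptions for illustration, and the Bonferroni correction is shown only as one common example of a compensation, not necessarily the post hoc test the slides have in mind.

```python
from math import comb

conditions = ["Neutral", "Positive", "Negative", "Sex", "Drug", "Taboo"]
alpha = 0.05  # assumed per-test significance level

# Every pair of conditions needs its own t-test: 6 choose 2.
n_tests = comb(len(conditions), 2)

# Chance of at least one false positive across all tests,
# assuming the tests are independent (a simplification).
familywise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni: divide alpha by the number of tests (one example correction).
bonferroni_alpha = alpha / n_tests

print(f"pairwise t-tests needed: {n_tests}")           # prints: pairwise t-tests needed: 15
print(f"familywise error rate:  {familywise_error:.3f}")  # prints: familywise error rate:  0.537
print(f"Bonferroni alpha:       {bonferroni_alpha:.4f}")  # prints: Bonferroni alpha:       0.0033
```

Under these assumptions, running all 15 uncorrected t-tests gives better than even odds of at least one spurious "significant" difference.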
[Bar chart: six categories (Neutral, Positive, Negative, Sex, Drug, Taboo); y-axis from 0 to 70]
After all this, where do we stand? We can still be wrong.
Factors that affect “power”:
• Sample size
• One- vs. two-tailed testing
• Effect size
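How these three factors move power can be sketched with a normal-curve approximation for a simple two-group comparison. This is a simplification chosen for brevity, not the ANOVA power calculation itself: it assumes equal group sizes, a standardized effect size (difference in means divided by the standard deviation), and, for the one-tailed case, an effect in the predicted direction. The function name `approx_power` is hypothetical.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05, two_tailed=True):
    """Approximate power of a two-group comparison using the normal curve."""
    z = NormalDist()  # standard normal distribution
    # Expected size of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    if two_tailed:
        crit = z.inv_cdf(1 - alpha / 2)
        # Probability of landing beyond either critical value.
        return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)
    crit = z.inv_cdf(1 - alpha)
    return 1 - z.cdf(crit - shift)


# Vary one factor at a time (all values are illustrative assumptions).
for n in (9, 30, 100):
    print(f"n per group = {n:3d}: power = {approx_power(0.5, n):.2f}")
for d in (0.2, 0.5, 0.8):
    print(f"effect size = {d}: power = {approx_power(d, 30):.2f}")
print(f"one-tailed: {approx_power(0.5, 30, two_tailed=False):.2f}")
print(f"two-tailed: {approx_power(0.5, 30, two_tailed=True):.2f}")
```

Power rises with larger samples and larger effects, and a one-tailed test has more power than a two-tailed test at the same alpha when the effect lies in the predicted direction.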