Hypothesis Testing
Chapter 13
Hypothesis Testing
Decision-making process
Statistics used as a tool to assist with decision-making
Scientific hypothesis is a statement of the predicted relationship amongst the variables
Null hypothesis is a statement of no relationship amongst the variables
Null Hypothesis Not Rejected
Total population
Sample reared in enriched environment
Sample reared in sterile environment
Null Hypothesis Rejected
Total population of rats reared in sterile environment
Sample used in study
Total population of rats reared in enriched environment
Sample used in study
Hypothesis Testing In Experimental Studies
Your research design determines the kind of statistical test you will use.
Experimental studies test hypotheses while quasi-experimental studies tend to focus more on generating hypotheses.
Research Designs/Approaches
Type | Purpose | Time frame | Degree of control | Examples
Experimental | Test for cause/effect relationships | Current | High | Comparing two types of treatments for anxiety.
Quasi-experimental | Test for cause/effect relationships without full control | Current or past | Moderate to high | Gender differences in visual/spatial abilities.
Research Designs/Approaches
Type | Purpose | Time frame | Degree of control | Examples
Non-experimental - correlational | Examine relationship between two variables. | Current (cross-sectional) or past | Low to medium | Relationship between studying style and grade point average.
Ex post facto | Examine the effect of past event on current functioning. | Past & current | Low to medium | Relationship between history of child abuse & depression.
Research Designs/Approaches
Type | Purpose | Time frame | Degree of control | Examples
Non-experimental - correlational | Examine relationship between two variables where one is measured later. | Future - predictive | Low to moderate | Relationship between history of depression & development of cancer.
Cohort-sequential | Examine change in a variable over time in overlapping groups. | Future | Low to moderate | How mother-child negativity changed over adolescence.
Research Designs/Approaches
Type | Purpose | Time frame | Degree of control | Examples
Survey | Assess opinions or characteristics that exist at a given time. | Current | None or low | Voting preferences before an election.
Qualitative | Discover potential relationships; descriptive. | Past or current | None or low | People’s experiences of quitting smoking.
Tests of Significance
The Question | Null Hypothesis | Statistical Test
Group differences: difference between means of 2 different groups | H0: μ_g1 = μ_g2 | t-independent
Group differences: difference between 2 means of related groups | H0: μ_g1a = μ_g1b | t-dependent
Group differences: difference between means of 3 groups | H0: μ_g1 = μ_g2 = μ_g3 | ANOVA
Group relationships: between 2 variables | H0: ρ_xy = 0 | t-test for significance of correlation
Group relationships: between 2 correlations | H0: ρ_ab = ρ_cd | t-test for significance of difference between 2 correlations
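As a worked illustration of the "relationship between 2 variables" row, a sample correlation r is commonly tested by converting it to a t statistic with N - 2 degrees of freedom. The sketch below assumes the standard formula t = r * sqrt(N - 2) / sqrt(1 - r^2). The function name and the example values (r = 0.50, N = 27) are hypothetical and do not come from the slides.

```python
import math

def t_for_correlation(r, n):
    """Return (t, df) for testing H0: rho = 0, given sample correlation r and n pairs."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2  # degrees of freedom for a correlation
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical example values, chosen only for illustration.
t, df = t_for_correlation(0.50, 27)
print(f"t({df}) = {t:.2f}")
```

The resulting t is then compared with the critical value for the chosen alpha and number of tails.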
Experimental Designs
Examines differences between experimentally manipulated groups or variables (e.g., one group gets a certain drug and the other gets a placebo).
At minimum, experimental (independent) variable has two levels (e.g., drug vs. placebo).
– Advantage is that you can determine causality.
– Disadvantage is cost and many variables cannot be experimentally manipulated (e.g., smoke exposure over time).
Null Hypothesis Significance Testing
Null hypothesis
– Results are due to “chance” (H0)
Alternative (scientific) hypothesis
– Results are due to a true “effect” (H1)
Assess
– Assuming H0 is true, what is the probability or “chance” of obtaining the data we did?
Decide
– If the chance is small enough, reject H0 and infer the “effect” is real.
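The assess/decide steps can be made concrete with an exact permutation test. This is one possible way to obtain the "chance" probability, not necessarily the test the slides have in mind. The scores, the group names, and the .05 cut-off used here are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference between two means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    as_extreme = 0
    n_splits = 0
    # Under H0 the group labels are arbitrary, so every way of splitting the
    # pooled scores into groups of these sizes is equally likely.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        n_splits += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            as_extreme += 1
    return as_extreme / n_splits

enriched = [12, 14, 15, 16, 18]  # hypothetical scores
sterile = [8, 9, 10, 11, 13]     # hypothetical scores
p = permutation_p_value(enriched, sterile)
print(f"p = {p:.3f}")  # 4 of the 252 possible splits are at least this extreme
print("reject H0" if p < 0.05 else "do not reject H0")
```

Full enumeration is only practical for small samples. With larger groups a random subset of splits is normally used instead.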
Experimental Designs: Hypothesis Testing
Type of Experimental Research Design
Between Subject (by number of independent variables)
– One independent variable, two groups: Independent samples t-test
– One independent variable, more than two groups: One-way ANOVA
– Two independent variables: Two-way ANOVA
Within Subject (by number of groups or levels of the independent variable)
– Two groups or two levels of the independent variable: Correlated t-tests
– More than two groups or more than two levels of the independent variable: Repeated measures ANOVA
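The decision chart above can be written as a small lookup function. This is only a sketch of the chart's branches. The function name, the argument names, and the restriction of the within-subject branch to a single independent variable are assumptions made for illustration.

```python
def choose_test(design, n_independent_vars=1, n_levels=2):
    """Map a design to the test named in the chart.

    design: "between" or "within" (subjects).
    n_independent_vars: number of independent variables.
    n_levels: number of groups or levels of the independent variable.
    """
    if n_levels < 2:
        raise ValueError("an independent variable needs at least two levels")
    if design == "between":
        if n_independent_vars == 2:
            return "two-way ANOVA"
        if n_independent_vars == 1:
            return "independent samples t-test" if n_levels == 2 else "one-way ANOVA"
    elif design == "within" and n_independent_vars == 1:
        return "correlated t-test" if n_levels == 2 else "repeated measures ANOVA"
    raise ValueError("design not covered by the chart")

print(choose_test("between", n_independent_vars=1, n_levels=3))
print(choose_test("within", n_levels=2))
```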
Parametric Vs. Non-Parametric Statistics: Two-Sample Cases
Level of measurement | Related Samples | Independent Samples
Nominal | McNemar test | Fisher exact test; χ² test
Ordinal | Sign test; Wilcoxon matched-pairs signed-ranks test | Median test; Mann-Whitney U test
Interval | t-test for matched pairs | t-independent test
Parametric Vs. Non-Parametric Statistics: > 2-Sample Cases
Level of measurement | Related Samples | Independent Samples
Nominal | Cochran Q test | χ² test
Ordinal | Friedman 2-way ANOVA | Kruskal-Wallis one-way ANOVA
Interval | Repeated measures ANOVA | ANOVA
Parametric Vs. Non-Parametric Statistics: > 2-Sample Cases
Level of measurement | Correlation
Nominal | Contingency coefficient
Ordinal | Spearman rank correlation; Kendall rank correlation, etc.
Interval | Pearson’s correlation coefficient
Sampling Distribution of Mean Difference Scores
[Figure: normal curve with regions marked "95% of all cases" and "99% of all cases"]
Critical Values of T
Need to determine the degrees of freedom
– df = N-2
Need to determine the p value for rejecting the null hypothesis (alpha)
Need to determine if this is a 1-tailed or 2-tailed level of significance.
T-Values
t(120) = 2.00, p < 0.05
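To show how degrees of freedom, alpha, and the number of tails combine, here is a minimal sketch that checks a t value against a short table of critical values at alpha = .05. The tabled numbers are standard published values. The table layout, the function name, and the rule of falling back to the largest tabled df not exceeding the actual df (a conservative choice) are assumptions made for illustration.

```python
# Critical t values at alpha = .05 for a few degrees of freedom
# ("two" = two-tailed, "one" = one-tailed).
CRITICAL_T_05 = {
    10: {"two": 2.228, "one": 1.812},
    20: {"two": 2.086, "one": 1.725},
    30: {"two": 2.042, "one": 1.697},
    60: {"two": 2.000, "one": 1.671},
    120: {"two": 1.980, "one": 1.658},
}

def is_significant(t_value, df, tails="two"):
    """Return (significant, critical value) at alpha = .05.

    Uses the largest tabled df that does not exceed df, which errs on the
    conservative side. For a one-tailed test, t_value is assumed to be
    signed so that the predicted direction is positive.
    """
    usable = [d for d in CRITICAL_T_05 if d <= df]
    if not usable:
        raise ValueError("df is below the smallest tabled value")
    critical = CRITICAL_T_05[max(usable)][tails]
    statistic = abs(t_value) if tails == "two" else t_value
    return statistic > critical, critical

# The value reported on the slide: t = 2.00 with 120 degrees of freedom.
print(is_significant(2.00, 120))  # exceeds the two-tailed critical value of 1.980
```

The same t value of 2.00 would not reach two-tailed significance with 25 degrees of freedom, because the critical value rises as df falls.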
What is one of the major criticisms of employing statistical tests of the null hypothesis to determine if effects are true?
Limitations of Statistical Tests of the Null Hypothesis
Does not take into account the size of the difference between means (effect size)
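One common way to report the size of a difference between two means is Cohen's d, the mean difference divided by the pooled standard deviation. The slides do not name a specific effect size measure, so the choice of d here is an assumption. The data are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

treatment = [12, 14, 15, 16, 18]  # hypothetical scores
control = [8, 9, 10, 11, 13]      # hypothetical scores
print(f"d = {cohens_d(treatment, control):.2f}")
```

Unlike a p value, d does not grow just because the sample gets larger, which is why it complements the significance test.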
Analysis of Variance (ANOVA)
F-ratio = MSbet / MSwithin
Essentially is the between group variance divided by the within group variance.
If the groups come from similar populations, the variance between the groups will be similar to the variance within groups (null hypothesis is not rejected).
ANOVA
Between group variance consists of:
– Variability due to the effect of the independent variable (treatment effect)
– Variability due to chance factors
Within group variance consists of:
– Variability in data within the treatment groups that is due to chance, since if the treatment effect were consistent, all subjects within a treatment group would experience a similar magnitude of effect.
Analysis of Variance (ANOVA)
F-ratio = MSbet / MSwithin
The MS refers to the mean square and is the sum of squares divided by the appropriate degrees of freedom.
Df for MSbet is the number of groups minus 1.
Df for MSwithin is the total number of scores in the experiment minus the number of groups.
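The F-ratio and its degrees of freedom can be computed by hand for a small data set. The sketch below follows the definitions above for a one-way, between-subjects design. The function name and the three groups of scores are hypothetical.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]
    # Sum of squares between: group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Sum of squares within: scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1               # number of groups minus 1
    df_within = len(all_scores) - len(groups)  # total scores minus number of groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

f_ratio, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b},{df_w}) = {f_ratio:.2f}")  # MSbet = 13, MSwithin = 1
```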
ANOVA
MSbet = treatment effect + chance variability
MSwithin = chance variability
Ratio will be approximately 1 if there is no treatment effect
F(2,144) = 5.56, p < 0.05.
Two-Way ANOVA
Where you have 2 independent variables, each having at least 2 levels. For example,
– Drug dose (none vs. 5 mg)
– Delivery mode (intravenous vs. oral)
Factorial design so you can test both main effects and interaction effects
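The difference between main effects and an interaction can be seen from the cell means of a 2 x 2 design. The sketch below only computes these descriptive contrasts and does not run the F-tests of a two-way ANOVA. All scores are hypothetical.

```python
from statistics import mean

# Hypothetical scores for each cell of a 2 x 2 design (dose x delivery mode).
scores = {
    ("none", "intravenous"): [9, 10, 11],
    ("none", "oral"): [9, 10, 11],
    ("5 mg", "intravenous"): [19, 20, 21],
    ("5 mg", "oral"): [13, 14, 15],
}
cell = {key: mean(values) for key, values in scores.items()}

# Main effect of dose: difference between the marginal means of the two doses.
dose_effect = (
    mean([cell["5 mg", "intravenous"], cell["5 mg", "oral"]])
    - mean([cell["none", "intravenous"], cell["none", "oral"]])
)
# Main effect of delivery mode: difference between the marginal means of the two modes.
mode_effect = (
    mean([cell["none", "intravenous"], cell["5 mg", "intravenous"]])
    - mean([cell["none", "oral"], cell["5 mg", "oral"]])
)
# Interaction: does the dose effect differ between the two delivery modes?
interaction = (
    (cell["5 mg", "intravenous"] - cell["none", "intravenous"])
    - (cell["5 mg", "oral"] - cell["none", "oral"])
)
print(dose_effect, mode_effect, interaction)
```

A non-zero interaction contrast means the effect of the drug dose depends on how the drug is delivered, which is exactly what a factorial design lets you test.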
Mixed Model: 2 Between Subject Factors, 1 Within Subject Factor
Where you have 2 independent variables, each having at least 2 levels. For example,
– Drug dose (none vs. 5 mg)
– Delivery mode (intravenous vs. oral)
One within subject factor with, for example, 3 levels
– Pre-treatment, 3-month and 6-month follow-up
Factorial design so you can test both main effects and interaction effects (3-way interaction effects)
Rejecting the Null Hypothesis
Null hypothesis can be rejected but not accepted
Arguments made for allowing some flexibility in being able to conclude the null hypothesis is true:
– No other studies of the phenomenon have rejected the null hypothesis
– P value for the test of the null hypothesis is large (e.g., > .20 or .40).
– Research design is sufficiently powerful
Errors in Statistical Decision-Making
Type I error – falsely rejecting the null hypothesis
– At p < .05 there is a 5% chance (5 in 100) of falsely rejecting the null hypothesis when it is true
Type II error – failing to reject the null hypothesis when it is false
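Both error rates can be estimated by simulating many studies. The sketch below assumes a simplified two-group z-test with a known population standard deviation, not the t-tests discussed earlier. The sample size, effect size, number of trials, and seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist

def rejection_rate(true_diff, n=20, sigma=1.0, alpha=0.05, trials=4000, seed=1):
    """Share of simulated two-group studies in which H0 (no difference) is rejected."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cut-off
    standard_error = sigma * (2 / n) ** 0.5           # SE of the mean difference
    rejections = 0
    for _ in range(trials):
        mean_a = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        mean_b = sum(rng.gauss(true_diff, sigma) for _ in range(n)) / n
        z = (mean_b - mean_a) / standard_error
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials

# H0 true: every rejection is a Type I error, so the rate should be near alpha.
type_1_rate = rejection_rate(true_diff=0.0)
# H0 false: every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_diff=1.0)
print(f"Type I rate: {type_1_rate:.3f}")
print(f"Type II rate: {type_2_rate:.3f}")
```

Lowering alpha reduces the Type I rate but, for a fixed sample size, raises the Type II rate.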
External Validity
Chapter 14
Goals of Psychology Research
Goal is to understand the underlying laws governing the behaviour of organisms.
The more the results of your study help inform one about these underlying laws, the more valuable the findings.
The importance of the findings is limited by their internal and external validity.
External Validity
Extent to which the results of the study can be generalized across different persons, settings, and times.
Typically think of generalizing to specific populations (e.g., North American elementary school students) rather than the world at large.
Best safeguard is random selection but not usually feasible.
Threats to External Validity
Lack of population validity
Lack of ecological validity
Lack of time validity
Population Validity
Generalizing to the defined population (i.e., target population) from which the sample was drawn.
Sample is drawn from the experimentally accessible population.
Population Validity
Target population
Experimentally accessible population
Sample
Population Validity
Threatened by a selection by treatment interaction:
– Treatment results may not be exactly reproducible in target population.
Even willingness to volunteer for studies has been shown to result in a selection by treatment interaction effect.
Ecological Validity
Extent to which the results can be generalized across settings or environmental conditions.
– E.g., Would the treatment effect observed in patients recruited from a 1st class medical centre be the same as the treatment effect observed in patients recruited from a local community hospital?
Ecological Validity
Multiple-Treatment Interference
– Sequencing effect whereby exposure to one treatment influences responses to another treatment; or
– Exposure to one experiment influences response in another experiment (e.g., sophisticated participants).
Ecological Validity
Hawthorne Effect
– Knowing one is in a study can affect one’s behaviour
– Participant bias effects (e.g., social acceptability, compliance)
Novelty or Disruption Effect
– Effects are simply due to novelty and wear off once novelty diminishes.
Ecological Validity
Experimenter Effect
– Enthusiastic experimenter/clinician may get different effects than a clinician who is implementing the treatment in routine care.
Pre-testing Effect
– Administering a pre-test may sensitize the participant in such a way that he/she may respond differently to the experiment than what would have occurred without a pre-test.
Temporal Validity
Extent to which the results would generalize to other times
– Results might vary depending on the time elapsed between presentation of the independent variable and the measurement of the dependent variable.
Temporal Validity
Seasonal Variation
– Variation that appears regularly over time (e.g., change in traffic accident rates between daylight savings time and non-daylight savings time).
– Fixed-time variation – variation at specific, predictable time points
– Variable-time variation – don’t know when variation will occur but when it occurs, there are predictable responses.
Temporal Validity
Cyclical Variation
– Predictable variation within people or other organisms
Personological Variation
– Variation in the characteristics of the individual over time
Internal Vs. External Validity
Tends to be an inverse relationship
– Internal validity ↑; external validity ↓
In testing for between group differences, you want to minimize within group variability and maximize between group differences
To do so you want to ensure high control over factors that could confound the results but this often results in increasingly artificial experimental conditions.
When Is External Validity Less Important
When you don’t need to demonstrate that “X” will happen but rather “X” can happen.
Sometimes the main goal is to test a theory and the extent to which it reflects “real-life” is less important.