Power, Effect Size, and Issues in NHST
Calculating Effect Size
Power Analysis
Issues in Null Hypothesis Significance Testing
Carlo Magno, PhD.
De La Salle University, Manila
n     df   X̄1   X̄2   SD1  SD2  t     p value
30    28   4.6  4.1  2.3  2.3  0.60  0.56
60    58   4.6  4.1  2.3  2.3  0.84  0.40
100   98   4.6  4.1  2.3  2.3  1.09  0.28
500   498  4.6  4.1  2.3  2.3  2.43  0.02*
1000  998  4.6  4.1  2.3  2.3  3.44  0.00*
A researcher wanted to examine the effect of a behavior modification technique on the aggression of clients. Participants in the experimental group received the behavior modification technique, and those in the control group received no treatment. The aggression of the two groups was measured afterward.
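The pattern in the table can be checked from the summary statistics alone. The following is a minimal sketch, assuming that each n in the table is the total sample split evenly between the two groups (an assumption, though it reproduces the tabled t values). The function name `t_from_summary` is illustrative and not from the slides.

```python
import math

def t_from_summary(m1, m2, sd1, sd2, n1, n2):
    """Independent-means t computed from group means, SDs, and sizes."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (m1 - m2) / standard_error

# Cohen's d depends only on the means and SDs, not on n.
d = (4.6 - 4.1) / math.sqrt((2.3**2 + 2.3**2) / 2)

# Assumption: each total n in the table is split evenly between groups.
for total_n in (30, 60, 100, 500, 1000):
    n_per_group = total_n // 2
    t = t_from_summary(4.6, 4.1, 2.3, 2.3, n_per_group, n_per_group)
    print(f"n={total_n:4d}  t={t:.2f}  d={d:.2f}")
```

The mean difference and the effect size (d of about 0.22) never change; only the sample size does, and with it the t value and the significance decision.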
Criticisms of NHST
• 1. NHST does not provide the information which the researcher wants to obtain.
• 2. Logical problems derived from the probabilistic nature of NHST.
• 3. NHST does not enable psychological theories to be tested.
• 4. The fallacy of replication.
• 5. NHST fails to provide useful information because H0 is always false.
• 6. Problems associated with the dichotomous decision to reject/not reject the H0.
• 7. NHST impedes the advance of knowledge.
Alternatives to NHST
• Effect size
• Confidence levels
• Power analysis
Effect Size
• Cohen (1988) defines the effect size as the extent to which the phenomenon is found within the population or, in the context of statistical significance testing, the degree to which the H0 is false.
• Snyder and Lawson (1993) argue that the effect size indicates the extent to which the dependent variable can be controlled, predicted and explained by the independent variable(s).
Effect Size Measures
• Effect size measures for two independent/dependent groups
  – Cohen’s d
  – Hedges’ g
  – Glass’s delta
• Correlation measures of effect size
  – r
  – χ² ► Φ; t ► r; F ► r; d ► r
• Effect size for analysis of variance
  – Eta squared
  – Omega squared (index of strength)
  – Intraclass correlation
Cohen’s d Formula
$$d = \frac{M_1 - M_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{2}}}$$
t-test for Independent Means Formula
$$t = \frac{M_1 - M_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Computation
A researcher compared students who engaged in group and individual sports on their passion for the sport. Passion was measured using the Passion Scale by Vallerand, which has two factors: harmonious passion (HP) and obsessive passion (OP). A t-test for two independent samples was used to test for significant differences between students in group and individual sports on the two factors of passion. The following statistical output was obtained:
Statistical Results
Factor  M1    M2    t-value  df  p      N1  N2  SD1   SD2
HP      5.51  5.68  -1.01    58  0.315  30  30  0.70  0.61
OP      4.91  5.36  -1.40    58  0.167  30  30  1.51  0.87
Compute the effect size:
http://www.uccs.edu/~lbecker/
http://effect-size-generator.software.informer.com/download/
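As an alternative to the online calculators, the effect sizes can be computed directly from the output. The sketch below applies the Cohen's d formula (mean difference over the root mean square of the two standard deviations) to the HP and OP rows; the function and variable names are illustrative.

```python
import math

def cohens_d(m1, m2, sd1, sd2):
    """Cohen's d: mean difference over the root mean square of the SDs."""
    return (m1 - m2) / math.sqrt((sd1**2 + sd2**2) / 2)

# (M1, M2, SD1, SD2) taken from the statistical results table.
results = {
    "HP": (5.51, 5.68, 0.70, 0.61),
    "OP": (4.91, 5.36, 1.51, 0.87),
}

for factor, (m1, m2, sd1, sd2) in results.items():
    print(factor, round(cohens_d(m1, m2, sd1, sd2), 2))
```

In absolute value the results are about 0.26 for HP and 0.37 for OP. The sign only reflects which group is entered first.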
Cohen's Standard  Effect Size  Percentile Standing  Percent of Nonoverlap
                  2.0          97.7                 81.1%
                  1.9          97.1                 79.4%
                  1.8          96.4                 77.4%
                  1.7          95.5                 75.4%
                  1.6          94.5                 73.1%
                  1.5          93.3                 70.7%
                  1.4          91.9                 68.1%
                  1.3          90                   65.3%
                  1.2          88                   62.2%
                  1.1          86                   58.9%
                  1.0          84                   55.4%
                  0.9          82                   51.6%
LARGE             0.8          79                   47.4%
                  0.7          76                   43.0%
                  0.6          73                   38.2%
MEDIUM            0.5          69                   33.0%
                  0.4          66                   27.4%
                  0.3          62                   21.3%
SMALL             0.2          58                   14.7%
                  0.1          54                   7.7%
                  0.0          50                   0%
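The percentile standing and percent of nonoverlap columns can be derived from d. The sketch below assumes two normal populations with equal variances. Under that assumption the percentile standing is the normal CDF evaluated at d (Cohen's U3), and the nonoverlap is Cohen's U1. The function names are illustrative.

```python
from statistics import NormalDist

def percentile_standing(d):
    """Percentile of the comparison group reached by the average
    member of the other group (Cohen's U3), under normality."""
    return 100 * NormalDist().cdf(d)

def percent_nonoverlap(d):
    """Cohen's U1: share of the combined area of two equal-variance
    normal distributions that is not shared by both."""
    p = NormalDist().cdf(abs(d) / 2)
    return 100 * (2 * p - 1) / p

for d in (0.2, 0.5, 0.8):
    print(d, round(percentile_standing(d)), round(percent_nonoverlap(d), 1))
```

For d = 0.8 this gives a percentile standing of about 79 and a nonoverlap of about 47.4%, matching the LARGE row.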
Cohen's Standard  d    r      r²
                  2.0  0.707  0.5
                  1.9  0.689  0.474
                  1.8  0.669  0.448
                  1.7  0.648  0.419
                  1.6  0.625  0.39
                  1.5  0.6    0.36
                  1.4  0.573  0.329
                  1.3  0.545  0.297
                  1.2  0.514  0.265
                  1.1  0.482  0.232
                  1.0  0.447  0.2
                  0.9  0.41   0.168
LARGE             0.8  0.371  0.138
                  0.7  0.33   0.109
                  0.6  0.287  0.083
MEDIUM            0.5  0.243  0.059
                  0.4  0.196  0.038
                  0.3  0.148  0.022
SMALL             0.2  0.1    0.01
                  0.1  0.05   0.002
                  0.0  0      0
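The r and r² columns are consistent with the standard conversion r = d / √(d² + 4), which assumes two groups of equal size. A minimal sketch with an illustrative function name:

```python
import math

def d_to_r(d):
    """Convert Cohen's d to r, assuming two groups of equal size."""
    return d / math.sqrt(d**2 + 4)

for d in (0.2, 0.5, 0.8):
    r = d_to_r(d)
    print(f"d={d:.1f}  r={r:.3f}  r2={r**2:.3f}")
```

For example, d = 0.5 converts to r of about 0.243 and r² of about 0.059, as in the MEDIUM row.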
Statistical Power
                  No real effect                Real effect
Reject H0         Type 1 error, α (.01, .05)    1−β, statistical power
H0 not rejected                                 Type 2 error, β (small as possible)
Slim chance of concluding that the treatment is effective, despite the fact that it is.
Statistical Power
• β=.20, set against α=.05 (the error of rejecting a true H0 is treated as 4x more serious than the error of not rejecting a false H0)
• Power = 1−β = .80 is the conventional acceptable level
Statistical Power
• The probability of rejecting a false null hypothesis.
• The likelihood that a study will detect an effect when there is an effect to be detected.
• If statistical power is high, the probability of making a Type II error (concluding there is no effect when, in fact, there is one) goes down.
Statistical Power
• The power of any test of statistical significance will be affected by four main parameters:
  – the effect size
  – the sample size (N)
  – the alpha significance criterion (α)
  – statistical power, or the chosen or implied beta (β)
Statistic  Power  Large  Medium  Small
t          .80    26     63      393
r          .80    29     85      781
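Required sample sizes of this kind can be approximated without special software. The sketch below uses normal approximations and assumes a two-tailed test at α = .05 (the slide does not state the alpha or the calculator used). The function names and the Fisher z approach for r are choices made here, not taken from the slides. Exact t-based calculations give slightly different numbers.

```python
import math
from statistics import NormalDist

def n_per_group_for_d(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-group comparison of means."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

def n_total_for_r(r, alpha=0.05, power=0.80):
    """Approximate total N for testing a correlation (Fisher z method)."""
    z = NormalDist().inv_cdf
    return math.ceil(((z(1 - alpha / 2) + z(power)) / math.atanh(r)) ** 2 + 3)

# Cohen's small, medium, and large benchmarks for d and r.
print([n_per_group_for_d(d) for d in (0.2, 0.5, 0.8)])
print([n_total_for_r(r) for r in (0.1, 0.3, 0.5)])
```

With these approximations, d of 0.2, 0.5, and 0.8 needs roughly 393, 63, and 25 cases per group, and r of .10, .30, and .50 needs roughly 783, 85, and 30 cases in total. Smaller effects always demand larger samples for the same power.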
Statistical Power
http://danielsoper.com/statcalc3/default.aspx
http://www.statisticalsolutions.net/pss_calc.php
https://www.dssresearch.com/KnowledgeCenter/toolkitcalculators/statisticalpowercalculators.aspx
http://homepage.stat.uiowa.edu/~rlenth/Power/
Influence of Effect Size on Power
High school n=65
College n=153
Influence of Effect Size on Power
N=82 Taiwanese in Taiwan
N=98 Taiwanese in the Philippines
Influence of Effect Size on Power
• What can be inferred about the relationship between effect size and power when the sample size and alpha level are fixed?
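One way to explore this question is to hold the group sizes and alpha fixed and vary only the effect size. The sketch below uses a normal approximation to the power of a two-tailed, two-group comparison. The group sizes 65 and 153 come from the high school and college example. The d values are Cohen's benchmarks, used here as hypothetical inputs because the effect sizes from the original studies are not given in the text. The function name is illustrative.

```python
import math
from statistics import NormalDist

def power_two_groups(d, n1, n2, alpha=0.05):
    """Approximate power of a two-tailed test comparing two group means."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect is d.
    noncentrality = abs(d) * math.sqrt(n1 * n2 / (n1 + n2))
    return norm.cdf(noncentrality - z_crit) + norm.cdf(-noncentrality - z_crit)

# Hypothetical effect sizes; group sizes from the slide.
for d in (0.2, 0.5, 0.8):
    print(f"d={d:.1f}  power={power_two_groups(d, 65, 153):.2f}")
```

With the sample sizes and alpha held constant, power rises as the effect size grows: about .27 for a small effect and above .90 for a medium one.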
Influence of Significance Level on Power
• Study of De Frias, Dixon, and Strauss (2006)
• N=418
• r=.14 (not significant)
• α=.01, power=.23
• α=.05, power=.45
• α=.10, power=.58
• α=.15, power=.65
• α=.20, power=.71
• What can be inferred about the relationship between the level of significance and power when the sample size is fixed?
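The same kind of comparison can be run for any fixed sample size and effect size. The sketch below uses a Fisher z approximation for the power of a two-tailed test of a correlation. The values r = .20 and N = 100 are hypothetical and are not taken from the De Frias, Dixon, and Strauss study. The function name is illustrative.

```python
import math
from statistics import NormalDist

def power_for_alpha(r, n, alpha):
    """Approximate power of a two-tailed test of r at a given alpha."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    noncentrality = math.atanh(abs(r)) * math.sqrt(n - 3)
    return norm.cdf(noncentrality - z_crit) + norm.cdf(-noncentrality - z_crit)

# Hypothetical study: r = .20, N = 100, evaluated at several alpha levels.
for alpha in (0.01, 0.05, 0.10, 0.15, 0.20):
    print(f"alpha={alpha:.2f}  power={power_for_alpha(0.20, 100, alpha):.2f}")
```

Relaxing alpha lowers the critical value, so power increases, at the cost of a higher Type 1 error rate.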
Influence of Sample Size on Power
Magno (2005): Monitoring and metacognition
N=280, r=.14, power=.65
Magno, Mamauag, & Parinas (2007): Independence and self-esteem
N=373, r=.14, power=.78
Chemers, Hu, & Garcia (2001): Challenge-threat and self-efficacy
N=381, r=.15, power=.83
Influence of Sample Size on Power
• What can be inferred about the relationship between sample size and power when the effect size and significance level are fixed?
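The three studies cited for the influence of sample size can be re-examined with a short calculation. The sketch below uses a Fisher z approximation and assumes a two-tailed test at α = .05. The alpha and the method are assumptions, since the calculator behind the reported values is not stated. The function name is illustrative.

```python
import math
from statistics import NormalDist

def power_correlation(r, n, alpha=0.05):
    """Approximate power of a two-tailed test that a correlation is zero."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    noncentrality = math.atanh(abs(r)) * math.sqrt(n - 3)
    return norm.cdf(noncentrality - z_crit) + norm.cdf(-noncentrality - z_crit)

# (N, r) pairs as reported for the three studies.
studies = {
    "Magno (2005)": (280, 0.14),
    "Magno, Mamauag, & Parinas (2007)": (373, 0.14),
    "Chemers, Hu, & Garcia (2001)": (381, 0.15),
}

for label, (n, r) in studies.items():
    print(f"{label}: power={power_correlation(r, n):.2f}")
```

The approximation gives about .65, .77, and .84, close to the reported .65, .78, and .83. With nearly the same effect size, the larger samples have more power.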