Power, Effect Size, and Issues in NHST

Calculating Effect Size

Power Analysis

Issues in Null Hypothesis Significance Testing

Carlo Magno, PhD.

De La Salle University, Manila

n      df     X̄1    X̄2    SD1    SD2    t       p value
30     28     4.6   4.1   2.3    2.3    0.60    0.56
60     58     4.6   4.1   2.3    2.3    0.84    0.40
100    98     4.6   4.1   2.3    2.3    1.09    0.28
500    498    4.6   4.1   2.3    2.3    2.43    0.02*
1000   998    4.6   4.1   2.3    2.3    3.44    0.00*

A researcher wanted to look at the effect of a behavior modification technique on the aggression of clients. Participants in the experimental group were given the behavior modification technique, while those in the control group received no treatment. The aggression of the two groups was measured afterward.
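The pattern in the table can be checked with a short sketch: the means and standard deviations stay fixed, and only the sample size changes. It assumes each total n is split evenly between the experimental and control groups (the slide does not say so), and the function name `independent_t` is made up for this illustration.

```python
import math

def independent_t(mean1, mean2, sd1, sd2, n1, n2):
    """t statistic for two independent means, using the unpooled standard error."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Summary values from the table; the even split of n is an assumption.
for total_n in (30, 60, 100, 500, 1000):
    n_per_group = total_n // 2
    t = independent_t(4.6, 4.1, 2.3, 2.3, n_per_group, n_per_group)
    print(f"n = {total_n:4d}  df = {total_n - 2:3d}  t = {t:.2f}")
```

The same half-point difference between the means goes from a t of about 0.60 to about 3.44 purely because n grows, which is why a p value alone says little about how large an effect is.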

Criticisms of NHST

• 1. NHST does not provide the information which the researcher wants to obtain

• 2. Logical problems derived from the probabilistic nature of NHST.

• 3. NHST does not enable psychological theories to be tested.

• 4. The fallacy of replication.

• 5. NHST fails to provide useful information because H0 is always false.

• 6. Problems associated with the dichotomous decision to reject/not reject the H0.

• 7. NHST impedes the advance of knowledge.

Alternatives to NHST

• Effect size

• Confidence levels

• Power analysis

Effect Size

• Cohen (1988) defines the effect size as the extent to which the phenomenon is found within the population or, in the context of statistical significance testing, the degree to which the H0 is false.

• Snyder and Lawson (1993) argue that the effect size indicates the extent to which the dependent variable can be controlled, predicted and explained by the independent variable(s).

Effect Size Measures

• Effect size measures for two independent or dependent groups
– Cohen’s d
– Hedges’ g
– Glass’s Delta

• Correlation measures of effect size
– r
– χ2 ► Φ; t ► r; F ► r; d ► r

• Effect size for Analysis of Variance
– Eta Squared
– Omega Squared (Index of Strength)
– Intraclass correlation

Cohen’s d Formula

$$d = \frac{M_1 - M_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{2}}}$$

t-test for Independent Means Formula

$$t = \frac{M_1 - M_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Computation

A study compared students engaged in group sports with students engaged in individual sports on their passion for the sport. Passion was measured using the Passion Scale by Vallerand, which has two factors: harmonious passion (HP) and obsessive passion (OP). The t-test for two independent samples was used to test for a significant difference between students in group and individual sports on each of the two factors. The following statistical output was obtained:

Statistical Results

      M1     M2     t-value   df   p       N1   N2   SD1    SD2
HP    5.51   5.68   -1.01     58   0.315   30   30   0.70   0.61
OP    4.91   5.36   -1.40     58   0.167   30   30   1.51   0.87

Compute the effect size:

http://www.uccs.edu/~lbecker/

http://effect-size-generator.software.informer.com/download/
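As a rough cross-check on the online calculators, Cohen's d for the two passion factors can be computed by hand from the reported means and standard deviations. This is a minimal sketch: it takes the pooled standard deviation as the root mean square of the two group SDs, which is reasonable here because both groups have n = 30, and the function name `cohens_d` is illustrative only.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2):
    """Cohen's d using the root-mean-square of the two SDs as the pooled SD."""
    pooled_sd = math.sqrt((sd1**2 + sd2**2) / 2)
    return (mean1 - mean2) / pooled_sd

# (M1, M2, SD1, SD2) taken from the statistical results table.
results = {
    "HP": (5.51, 5.68, 0.70, 0.61),
    "OP": (4.91, 5.36, 1.51, 0.87),
}

for factor, (m1, m2, sd1, sd2) in results.items():
    print(f"{factor}: d = {cohens_d(m1, m2, sd1, sd2):.2f}")
```

Under these assumptions d comes out near -0.26 for HP and -0.37 for OP. Neither t-test was significant, yet the differences are not zero; by Cohen's benchmarks both fall in the small-to-medium range.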

Cohen's Standard   Effect Size   Percentile Standing   Percent of Nonoverlap
                   2.0           97.7                  81.1%
                   1.9           97.1                  79.4%
                   1.8           96.4                  77.4%
                   1.7           95.5                  75.4%
                   1.6           94.5                  73.1%
                   1.5           93.3                  70.7%
                   1.4           91.9                  68.1%
                   1.3           90                    65.3%
                   1.2           88                    62.2%
                   1.1           86                    58.9%
                   1.0           84                    55.4%
                   0.9           82                    51.6%
LARGE              0.8           79                    47.4%
                   0.7           76                    43.0%
                   0.6           73                    38.2%
MEDIUM             0.5           69                    33.0%
                   0.4           66                    27.4%
                   0.3           62                    21.3%
SMALL              0.2           58                    14.7%
                   0.1           54                    7.7%
                   0.0           50                    0%
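The two right-hand columns can be approximated from d alone if both groups are assumed to be normally distributed with equal variance. In the sketch below, percentile standing is the percentile of the comparison group at which the average member of the other group falls, and percent of nonoverlap follows Cohen's U1. The function names are illustrative, and small rounding differences from the table are to be expected.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def percentile_standing(d):
    """Percentile of group 2 at which the mean of group 1 falls."""
    return 100 * STANDARD_NORMAL.cdf(d)

def percent_nonoverlap(d):
    """Cohen's U1: share of the combined area not shared by the two distributions."""
    p = STANDARD_NORMAL.cdf(abs(d) / 2)
    return 100 * (2 * p - 1) / p

for d in (0.2, 0.5, 0.8):
    print(f"d = {d:.1f}  percentile = {percentile_standing(d):.0f}  "
          f"nonoverlap = {percent_nonoverlap(d):.1f}%")
```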

Cohen's Standard   d      r       r2
                   2.0    0.707   0.500
                   1.9    0.689   0.474
                   1.8    0.669   0.448
                   1.7    0.648   0.419
                   1.6    0.625   0.390
                   1.5    0.600   0.360
                   1.4    0.573   0.329
                   1.3    0.545   0.297
                   1.2    0.514   0.265
                   1.1    0.482   0.232
                   1.0    0.447   0.200
                   0.9    0.410   0.168
LARGE              0.8    0.371   0.138
                   0.7    0.330   0.109
                   0.6    0.287   0.083
MEDIUM             0.5    0.243   0.059
                   0.4    0.196   0.038
                   0.3    0.148   0.022
SMALL              0.2    0.100   0.010
                   0.1    0.050   0.002
                   0.0    0.000   0.000
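The d, r, and r2 columns are consistent with the usual conversion r = d / sqrt(d² + 4), which assumes two groups of equal size. A minimal sketch of that conversion, with an illustrative function name:

```python
import math

def d_to_r(d):
    """Convert Cohen's d to a correlation r, assuming equal group sizes."""
    return d / math.sqrt(d**2 + 4)

for d in (0.2, 0.5, 0.8):
    r = d_to_r(d)
    print(f"d = {d:.1f}  r = {r:.3f}  r2 = {r**2:.3f}")
```

Even a "large" d of 0.8 corresponds to r2 of about .14, that is, roughly 14% of the variance in the dependent variable accounted for by group membership.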

Statistical Power

                   No real effect               Real effect
Reject H0          Type 1 error, α (.01, .05)   1 − β, statistical power
H0 not rejected    Correct decision             Type 2 error, β (small as possible)

Slim chance of concluding that the treatment is effective, despite the fact that it is

Statistical Power

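• β = .20 is the conventional target: with α = .05, rejecting a true H0 (Type I error) is treated as four times more serious than not rejecting a false H0 (Type II error).

• Power of .80 (1 − β) is conventionally considered acceptable.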

Statistical Power

• The probability of rejecting a false null hypothesis.

• The likelihood that a study will detect an effect when there is an effect to be detected.

• If statistical power is high, the probability of making a Type II error (concluding there is no effect when, in fact, there is one) goes down.

Statistical Power

• The power of any test of statistical significance will be affected by four main parameters:
– the effect size
– the sample size (N)
– the alpha significance criterion (α)
– statistical power, or the chosen or implied beta (β)
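Because the four parameters are tied together, fixing any three determines the fourth. As an illustration, the sketch below solves for the sample size per group in a two-tailed, two-group comparison of means, given an effect size d, α, and the desired power. It uses the normal approximation rather than the exact t distribution, so the results can be one or two cases lower than published tables; the function name `n_per_group` is an assumption of this example.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group needed to detect effect size d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = z.inv_cdf(power)          # z value corresponding to 1 - beta
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label:6s} d = {d:.1f}  n per group = {n_per_group(d)}")
```

Under this approximation a small effect needs roughly 393 participants per group, a medium effect about 63, and a large effect about 25, so the required sample size drops sharply as the effect size grows.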

Statistic   Power   Small   Medium   Large
t           .80     393     63       26
r           .80     781     85       29

Statistical Power

http://danielsoper.com/statcalc3/default.aspx

http://www.statisticalsolutions.net/pss_calc.php

https://www.dssresearch.com/KnowledgeCenter/toolkitcalculators/statisticalpowercalculators.aspx

http://homepage.stat.uiowa.edu/~rlenth/Power/

Influence of Effect Size on Power

High school n=65

College n=153

N=82 Taiwanese in Taiwan

N=98 Taiwanese in the Philippines

• What can be inferred about the relationship between effect size and power when sample size and alpha level are fixed?

Influence of Significance Level on Power

• Study of De Frias, Dixon, and Strauss (2006)

• N=418

• r=.14 (not significant)

• α=.01, Power=.23

• α=.05, Power=.45

• α=.10, Power=.58

• α=.15, Power=.65

• α=.20, Power=.71

• What can be inferred about the relationship between level of significance and power when sample size is fixed?
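One way to explore this question is to compute approximate power for a correlation at several α levels while holding N fixed. The sketch below uses the Fisher z approximation for a two-tailed test. The effect size passed in (r = .09) is an assumption, not a value taken from the study: it was picked because, under this approximation and with N = 418, it roughly reproduces the power values listed above. The function name `correlation_power` is also illustrative.

```python
import math
from statistics import NormalDist

def correlation_power(r, n, alpha):
    """Approximate two-tailed power for testing r = 0 (Fisher z approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    expected_z = math.atanh(r) * math.sqrt(n - 3)
    # Probability of landing in either rejection region.
    return z.cdf(expected_z - z_crit) + z.cdf(-expected_z - z_crit)

ASSUMED_R = 0.09  # hypothetical effect size, chosen for illustration only
for alpha in (0.01, 0.05, 0.10, 0.15, 0.20):
    power = correlation_power(ASSUMED_R, 418, alpha)
    print(f"alpha = {alpha:.2f}  power = {power:.2f}")
```

With N and the effect size held constant, power rises steadily as α is relaxed, at the cost of a higher Type I error rate.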

Influence of Sample Size on Power

• Magno (2005), Monitoring and metacognition: N=280, r=.14, Power=.65

• Magno, Mamauag, & Parinas (2007), Independence and self-esteem: N=373, r=.14, Power=.78

• Chemers, Hu, & Garcia (2001), Challenge-threat and self-efficacy: N=381, r=.15, Power=.83

Influence of Sample Size on Power

• What can be inferred about the relationship between sample size and power when effect size and significance level are fixed?
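The three studies listed above can be used to examine this question numerically. The sketch below applies the Fisher z approximation for a two-tailed test of a correlation at α = .05; the α level and the choice of approximation are assumptions, since the slides do not state how the power values were obtained, and the function name `correlation_power` is illustrative.

```python
import math
from statistics import NormalDist

def correlation_power(r, n, alpha=0.05):
    """Approximate two-tailed power for testing r = 0 (Fisher z approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    expected_z = math.atanh(r) * math.sqrt(n - 3)
    return z.cdf(expected_z - z_crit) + z.cdf(-expected_z - z_crit)

# (label, N, r) as reported for the three studies.
studies = [
    ("Magno (2005)", 280, 0.14),
    ("Magno, Mamauag, & Parinas (2007)", 373, 0.14),
    ("Chemers, Hu, & Garcia (2001)", 381, 0.15),
]

for label, n, r in studies:
    print(f"{label}: N = {n}, r = {r:.2f}, power = {correlation_power(r, n):.2f}")
```

Under these assumptions the computed values land within about .01 of the reported ones. With essentially the same effect size, moving from N = 280 to N = 373 lifts power from about .65 to nearly .80.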
