Previous Lecture: Phylogenetics. This Lecture: Analysis of Variance. Judy Zhong, Ph.D.


Upload: lewis-robertson

Post on 05-Jan-2016


TRANSCRIPT

Page 1: Previous Lecture: Phylogenetics. Analysis of Variance This Lecture Judy Zhong Ph.D

Previous Lecture: Phylogenetics

Page 2:

This Lecture: Analysis of Variance

Judy Zhong, Ph.D.

Page 3:

Learning Objectives

Until now, we have considered two groups of individuals and we've wanted to know if the two groups were sampled from distributions with equal population means or medians.

Suppose we would like to consider more than two groups of individuals and, in particular, test whether the groups were sampled from distributions with equal population means.

This lecture shows how to use one-way analysis of variance (ANOVA) to test for differences among the means of several populations ("groups").

Page 4:

Hypotheses of One-Way ANOVA

H0: µ1 = µ2 = … = µK
All population means are equal.
No treatment effect (no variation in means among groups).

H1: Not all of the population means are the same.
At least one population mean is different.
There is a treatment effect.
This does not mean that all population means are different (some pairs may be the same).

Page 5:

One-Factor ANOVA

All means are the same: the null hypothesis is true (no treatment effect).

Page 6:

One-Factor ANOVA (continued)

At least one mean is different: the null hypothesis is NOT true (a treatment effect is present).

Page 7:

One-Way ANOVA: Model Assumptions

The K random samples are drawn from K independent populations.
The variances of the populations are identical.
The underlying data are approximately normally distributed.

Page 8:

Basic Idea: Partitioning the Variation

Suppose there are K groups with $n_1, n_2, \ldots, n_K$ observations.

$y_{ij}$ = j-th observation in the i-th group, $\bar{y}$ = overall mean, $\bar{y}_i$ = mean of group i

$\bar{y}_i - \bar{y}$: deviation of the group mean from the grand mean

$y_{ij} - \bar{y}_i$: deviation of the observation from its group mean

Page 9:

Partitioning the variation

$$\sum_{i}\sum_{j}(y_{ij} - \bar{y})^2 = \sum_{i}\sum_{j}(y_{ij} - \bar{y}_i)^2 + \sum_{i}\sum_{j}(\bar{y}_i - \bar{y})^2$$

Total variation (total SS) = Variation due to random sampling (within SS) + Variation due to factor (between SS)

Total variation is the sum of within-group variability and between-group variability.

$$y_{ij} - \bar{y} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})$$

$y_{ij} - \bar{y}_i$: deviation of the observation from its group mean (within-group variability)

$\bar{y}_i - \bar{y}$: deviation of the group mean from the overall mean (between-group variability)
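The step from the single-observation identity to the sum-of-squares identity is not spelled out on the slide. A short derivation, using the slide's notation (i indexes the K groups, j the $n_i$ observations in group i), might run as follows. The only fact needed is that deviations from a group's own mean sum to zero within that group.

$$
\begin{aligned}
\sum_{i=1}^{K}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2
&= \sum_{i=1}^{K}\sum_{j=1}^{n_i}\bigl[(y_{ij}-\bar{y}_i)+(\bar{y}_i-\bar{y})\bigr]^2 \\
&= \sum_{i=1}^{K}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2
 + 2\sum_{i=1}^{K}(\bar{y}_i-\bar{y})\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)
 + \sum_{i=1}^{K}\sum_{j=1}^{n_i}(\bar{y}_i-\bar{y})^2 \\
&= \sum_{i=1}^{K}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2
 + \sum_{i=1}^{K}\sum_{j=1}^{n_i}(\bar{y}_i-\bar{y})^2
\end{aligned}
$$

The cross term vanishes because $\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i) = 0$ for every group i.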

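Page 10:

Partitioning the variation

$$\sum_{i=1}^{3}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2 = \sum_{i=1}^{3}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{3}\sum_{j=1}^{n_i}(\bar{y}_i - \bar{y})^2$$

$\bar{y}$ = overall mean

$\bar{y}_i$ = mean of group i

$n_1 = 3,\ n_2 = 4,\ n_3 = 4$

[Figure: response X plotted for Group 1, Group 2, and Group 3, with the group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ and the overall mean $\bar{y}$ marked.]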
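The partition can be checked numerically. The sketch below uses three groups of sizes 3, 4, and 4 to match the slide; the response values themselves are invented for illustration and do not come from the lecture.

```python
from statistics import fmean

# Hypothetical responses for three groups with n1 = 3, n2 = 4, n3 = 4.
# These numbers are made up for illustration; they are not lecture data.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0, 8.0],
    [2.0, 3.0, 3.0, 4.0],
]

all_obs = [y for g in groups for y in g]
grand_mean = fmean(all_obs)
group_means = [fmean(g) for g in groups]

# Total SS: squared deviations of every observation from the overall mean.
total_ss = sum((y - grand_mean) ** 2 for y in all_obs)

# Within SS: squared deviations of every observation from its own group mean.
within_ss = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

# Between SS: squared deviation of each group mean from the overall mean,
# counted once per observation in that group.
between_ss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

print(round(total_ss, 6), round(within_ss + between_ss, 6))
```

The two printed numbers agree up to floating-point rounding, which is the identity on the slide.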

Page 11:

Basic Idea of ANOVA

[Figure: two panels, each plotting response X for Group 1, Group 2, and Group 3.]

If between-group variability is large and within-group variability is small => reject H0

If between-group variability is small and within-group variability is large => do not reject H0

Page 12:

Partition of Total Variation

Total Variation (Total SS) = Variation Due to Factor (Between SS) + Variation Due to Random Sampling (Within SS)

Total SS: d.f. = n – 1

Between SS: d.f. = k – 1. Commonly referred to as: Sum of Squares Between, Sum of Squares Among, Sum of Squares Explained, Among-Groups Variation

Within SS: d.f. = n – k. Commonly referred to as: Sum of Squares Within, Sum of Squares Error, Sum of Squares Unexplained, Within-Group Variation

Page 13:

Total Sum of Squares

Total SS = Between SS + Within SS

$$\text{Total SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2$$

Where:

Total SS = total sum of squares

k = number of groups (levels or treatments)

$n_i$ = number of observations in group i

$y_{ij}$ = j-th observation from group i

$\bar{y}$ = grand mean (mean of all data values)

Page 14:

Total Variation

$$\text{Total SS} = (y_{11} - \bar{y})^2 + (y_{12} - \bar{y})^2 + \cdots + (y_{k n_k} - \bar{y})^2$$

[Figure: response X plotted for Group 1, Group 2, and Group 3, with the overall mean $\bar{y}$ marked.]

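Page 15:

Between-Group Variation

$$\text{Between SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{y}_i - \bar{y})^2 = n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2 + \cdots + n_k(\bar{y}_k - \bar{y})^2$$

[Figure: response X plotted for Group 1, Group 2, and Group 3, with the group means $\bar{y}_1$, $\bar{y}_2$, $\bar{y}_3$ and the overall mean $\bar{y}$ marked.]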
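The two forms of Between SS on this slide are equal because the squared term does not depend on the observation index, so summing it over a group simply multiplies it by the group size. A minimal sketch, using the golf-club distances from the worked example later in this lecture (the variable names are illustrative only):

```python
from statistics import fmean

# Golf-club distances from the lecture's worked example.
clubs = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]
grand_mean = fmean([y for g in clubs for y in g])

# Double-sum form: add (group mean - grand mean)^2 once per observation.
double_sum = sum((fmean(g) - grand_mean) ** 2 for g in clubs for _ in g)

# Weighted form: multiply each squared deviation by the group size.
weighted = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in clubs)

print(round(double_sum, 1), round(weighted, 1))  # 4716.4 4716.4
```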

Page 16:

Within-Group Variation

$$\text{Within SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 = \sum_{i=1}^{k}(n_i - 1)\,S_i^2$$

where $S_i^2$ is the sample variance of group i.

[Figure: response X plotted for Group 1, Group 2, and Group 3, with the group means $\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$ marked.]
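The shortcut through the group variances can be confirmed with a few lines of code. This sketch assumes the golf-club distances from the lecture's worked example and uses `statistics.variance`, which computes the sample variance with an n − 1 denominator.

```python
from statistics import fmean, variance

# Golf-club distances from the lecture's worked example.
clubs = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]

# Direct form: squared deviations of each observation from its group mean.
direct = sum((y - fmean(g)) ** 2 for g in clubs for y in g)

# Shortcut form: (n_i - 1) times the sample variance of each group.
via_variances = sum((len(g) - 1) * variance(g) for g in clubs)

print(round(direct, 1), round(via_variances, 1))  # 1119.6 1119.6
```

The shortcut is convenient when only group sizes and standard deviations are reported rather than raw data.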

Page 17:

Obtaining the Mean Squares

$$\text{Within MS} = \frac{\text{Within SS}}{n - k}$$

$$\text{Between MS} = \frac{\text{Between SS}}{k - 1}$$

$$\text{Total MS} = \frac{\text{Total SS}}{n - 1}$$
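As a quick illustration of these three ratios, the sketch below plugs in the sums of squares from the golf-club example worked later in this lecture (Between SS = 4716.4, Within SS = 1119.6, k = 3, n = 15). It also shows a point that is easy to miss: the sums of squares and the degrees of freedom add up, but the mean squares do not.

```python
# Sums of squares and sizes taken from the lecture's golf-club example.
between_ss, within_ss = 4716.4, 1119.6
k, n = 3, 15

between_ms = between_ss / (k - 1)
within_ms = within_ss / (n - k)
total_ms = (between_ss + within_ss) / (n - 1)

print(round(between_ms, 1), round(within_ms, 1), round(total_ms, 1))  # 2358.2 93.3 416.9

# Degrees of freedom are additive ...
print((k - 1) + (n - k) == n - 1)  # True
# ... but mean squares are not: Between MS + Within MS differs from Total MS.
print(round(between_ms + within_ms, 1))  # 2451.5
```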

Page 18:

One-Way ANOVA Table

Source of Variation   SS                df      MS (Variance)         F ratio
Between Groups        BSS               k - 1   BMS = BSS / (k - 1)   F = BMS / WMS
Within Groups         WSS               n - k   WMS = WSS / (n - k)
Total                 TSS = BSS + WSS   n - 1

k = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom

Page 19:

One-Way ANOVA F Test Statistic

Test statistic:

$$F = \frac{\text{Between MS}}{\text{Within MS}}$$

Degrees of freedom:
df1 = k – 1 (k = number of groups)
df2 = n – k (n = sum of sample sizes from all populations)
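Putting the pieces together, the test statistic and its degrees of freedom can be computed from raw data in one small function. This is a sketch rather than a reference implementation; the function name `one_way_anova` is hypothetical, and the data are the golf-club distances from the lecture's example.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df1, df2) for a list of groups of numeric observations."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = fmean([y for g in groups for y in g])

    between_ss = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    within_ss = sum((y - fmean(g)) ** 2 for g in groups for y in g)

    df1, df2 = k - 1, n - k
    f_stat = (between_ss / df1) / (within_ss / df2)  # Between MS / Within MS
    return f_stat, df1, df2


# Golf-club distances from the lecture's worked example.
golf = [
    [254, 263, 241, 237, 251],
    [234, 218, 235, 227, 216],
    [200, 222, 197, 206, 204],
]
f_stat, df1, df2 = one_way_anova(golf)
print(round(f_stat, 2), df1, df2)  # 25.28 2 12
```

The function does not guard against a zero Within SS or against groups with no observations; a production version would need those checks.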

Page 20:

Interpreting One-Way ANOVA F Statistic

The F statistic is the ratio of the among-groups estimate of variance to the within-groups estimate of variance.
The ratio can never be negative.
df1 = k – 1 will typically be small.
df2 = n – k will typically be large.

Decision Rule: Reject H0 if F > FU; otherwise do not reject H0.

[Figure: F distribution starting at 0, with the do-not-reject region to the left of the critical value FU and the rejection region of area α = .05 to the right.]

Page 21:

Example

You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the 0.05 significance level, is there a difference in mean distance?

Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204
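A first step in working this example is to compute the group means, the overall mean, and the sizes n and k. One way to lay the data out is sketched below; the dictionary layout and variable names are illustrative choices, not part of the lecture.

```python
from statistics import fmean

# Distances from the slide, keyed by club.
distances = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}

group_means = {club: fmean(ys) for club, ys in distances.items()}
grand_mean = fmean([y for ys in distances.values() for y in ys])
k = len(distances)                             # number of groups
n = sum(len(ys) for ys in distances.values())  # total sample size

for club, mean in group_means.items():
    print(club, round(mean, 1))
print("Overall", round(grand_mean, 1), "k =", k, "n =", n)
```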

Page 22:

Example

[Figure: dot plot of distance (vertical axis, 190 to 270) by club (1, 2, 3), with the group means $\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$ and the overall mean $\bar{Y}$ marked.]

$\bar{Y}_1 = 249.2$, $\bar{Y}_2 = 226.0$, $\bar{Y}_3 = 205.8$

$\bar{Y} = 227.0$

Page 23:

Example

Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

$\bar{Y}_1 = 249.2$, $n_1 = 5$

$\bar{Y}_2 = 226.0$, $n_2 = 5$

$\bar{Y}_3 = 205.8$, $n_3 = 5$

$\bar{Y} = 227.0$, n = 15, k = 3

BSS = 5 (249.2 – 227)² + 5 (226 – 227)² + 5 (205.8 – 227)² = 4716.4

WSS = (254 – 249.2)² + (263 – 249.2)² + … + (204 – 205.8)² = 1119.6

BMS = 4716.4 / (3 – 1) = 2358.2

WMS = 1119.6 / (15 – 3) = 93.3

F = BMS / WMS = 2358.2 / 93.3 = 25.275

Page 28:

Example

H0: µ1 = µ2 = µ3

H1: µj not all equal

α = 0.05

df1 = 2, df2 = 12

Critical Value (Table 9): FU = 3.89

Test Statistic: F = BMS / WMS = 2358.2 / 93.3 = 25.275

Decision: Reject H0 at α = 0.05, because F = 25.275 > FU = 3.89

Conclusion: There is evidence that at least one µj differs from the rest

[Figure: F distribution starting at 0, with the do-not-reject region to the left of FU = 3.89 and the rejection region of area α = .05 to the right.]
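The critical value and the p-value for this example can be reproduced without a table. The sketch below relies on a special case: when df1 = 2, the upper tail of the F distribution has the closed form P(F > x) = (1 + 2x/df2)^(−df2/2). This shortcut is not part of the lecture, it does not apply to other values of df1, and the function names are hypothetical. For general degrees of freedom, a statistics library or an F table would be needed.

```python
def f_upper_tail_df1_2(x, df2):
    """P(F > x) for an F distribution with df1 = 2 (closed form, df1 = 2 only)."""
    return (1.0 + 2.0 * x / df2) ** (-df2 / 2.0)


def f_critical_df1_2(alpha, df2):
    """Value FU with P(F > FU) = alpha, again valid only for df1 = 2."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


alpha, df2 = 0.05, 12
f_stat = 2358.2 / 93.3          # Between MS / Within MS from the example

f_u = f_critical_df1_2(alpha, df2)
p_value = f_upper_tail_df1_2(f_stat, df2)

print(round(f_u, 2))            # 3.89
print(f_stat > f_u)             # True, so H0 is rejected
print(p_value < 0.001)          # True
```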

Page 29:

ANOVA Table

Source    SS       DF   MS       F       P-value
Between   4716.4   2    2358.2   25.28   <0.001
Within    1119.6   12   93.3
Total     5836.0   14

Page 30:

Next Lecture: Categorical Data Methods