Group #4
AMS 572 – Data Analysis
ANCOVA
Professor: Wei Zhu
Team 4
Lin Wang (Lana)
Zhide Mo (Jeff)
Juan E. Mojica
Yuan Bian
Hemal Khandwala
Xiaochen Li (Joe)
Ruofeng Wen
Miao Zhang
Xian Lin (Ben)
Lei Lei
Introduction to ANCOVA
What is ANCOVA
ANCOVA: Analysis of Covariance
ANCOVA: a merge of ANOVA (Analysis of Variance) and Linear Regression
Development and Application of ANOVA
ANOVA
• Described by R. A. Fisher to assist in the analysis of data from agricultural experiments.
• Compares the means of any number of experimental conditions without any increase in Type I error (a Type I error occurs when H0 is rejected although it is true).
ANOVA: a way of determining whether the average scores of groups differ significantly.
Psychology: ANOVA assesses the average effect of different experimental conditions on subjects in terms of a particular dependent variable.
Ronald Aylmer Fisher (Feb. 17, 1890 – July 29, 1962)
An English statistician, evolutionary biologist, and geneticist.
Contributions: Analysis of Variance (ANOVA), maximum likelihood, F-distribution, etc.
Development and Application of Linear Regression
Linear Regression
• Developed and applied in areas different from those of ANOVA
• Developed in biology and psychology
• The term "regression" was coined by Francis Galton in the nineteenth century to describe a biological phenomenon
Regression toward the Mean
Francis Galton studied the heights of parents and their adult children.
Conclusion: the children of short parents are usually shorter than average, but still taller than their parents.
[Figure: example heights (5'4'', 5'6'', 5'8'', 5'9'') compared with the average height]
Regression is applied to data obtained from correlational or non-experimental research.
Regression analysis helps us understand how changing one independent variable changes the value of the dependent variable.
Francis Galton (Feb. 16, 1822 – Jan. 17, 1911)
English anthropologist, eugenicist, and statistician.
Contributions:
• widely promoted regression toward the mean
• created the statistical concept of correlation
• a pioneer in eugenics, coined the term in 1883
• the first to apply statistical methods to the study of human differences
What is ANCOVA
• A statistical technique that combines regression and ANOVA (analysis of variance).
• Originally developed by R. A. Fisher to increase the precision of experimental analysis.
• Applied most frequently in quasi-experimental research, which involves variables that cannot be controlled directly.
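One-Way Layout Experiment
Factor A levels (treatments): $1, 2, \ldots, a$
Samples for treatment $i$: $y_{i1}, y_{i2}, \ldots, y_{i n_i}$
Sample mean for treatment $i$: $\bar{y}_i$
Sample SD for treatment $i$: $s_i$
Balanced design, if $n_1 = n_2 = \cdots = n_a = n$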
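• $Y_{ij} = \mu_i + \varepsilon_{ij}$, where $\varepsilon_{ij} \sim N(0, \sigma^2)$
• $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, where $\mu$ is the grand mean and $\alpha_i = \mu_i - \mu$
$i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, n_i$
This is a linear model to represent $Y_{ij}$.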
ESTIMATORS
• $\hat{\mu} = \bar{y}$ (grand mean)
• $\hat{\alpha}_i = \bar{y}_i - \bar{y}$
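What is SSA?
For each treatment $i = 1, 2, \ldots, a$, alongside the sample mean and sample SD, compute $n_i(\bar{y}_i - \bar{y})^2$.
• $SSA = \sum_{i=1}^{a} n_i(\bar{y}_i - \bar{y})^2$: the factor A sum of squares
• $MSA = SSA/(a-1)$: the factor A mean square, with $a-1$ d.f.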
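What is SSE?
For each observation in treatment $i = 1, 2, \ldots, a$, compute $(y_{ij} - \bar{y}_i)^2$.
• $SSE = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$: the error sum of squares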
What is SST?
• $SST = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2$: the total sum of squares
• ANOVA identity: $SST = SSA + SSE$
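ANOVA TABLE
Source of Variance | Sum of Squares | Degrees of Freedom | Mean Square | F
Treatments | SSA | a - 1 | MSA = SSA/(a - 1) | F = MSA/MSE
Error | SSE | N - a | MSE = SSE/(N - a) |
Total | SST | N - 1 | |
Here N is the total number of observations, $N = \sum_{i=1}^{a} n_i$.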
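The quantities in the ANOVA table can be checked numerically. The following is a minimal sketch, assuming the usual one-way definitions of SSA, SSE and SST; the function name `one_way_anova` and the small data set are hypothetical and only serve as an illustration.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the ANOVA table quantities for a list of samples (one list per treatment)."""
    a = len(groups)                            # number of treatments
    n_total = sum(len(g) for g in groups)      # total sample size N
    grand_mean = mean(y for g in groups for y in g)
    group_means = [mean(g) for g in groups]
    # Factor A sum of squares: n_i * (group mean - grand mean)^2
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Error sum of squares: deviations of observations from their own group mean
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # Total sum of squares: deviations of observations from the grand mean
    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)
    msa = ssa / (a - 1)
    mse = sse / (n_total - a)
    return {"SSA": ssa, "SSE": sse, "SST": sst, "MSA": msa, "MSE": mse, "F": msa / mse}

# Hypothetical data: three treatments with three observations each.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)
```

The ANOVA identity SST = SSA + SSE can be read directly off the returned dictionary.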
Theoretical Background
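Model of ANOVA
$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
• $Y_{ij}$: data, the jth observation of the ith group
• $\mu$: grand mean of Y
• $\alpha_i$: effect of the ith group (we focus on whether $\alpha_i = 0$, $i = 1, \ldots, a$)
• $\varepsilon_{ij}$: error, $N(0, \sigma^2)$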
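Model of Linear Regression
$Y_{ij} = \beta_1 X_{ij} + \beta_0 + \varepsilon_{ij}$
• $Y_{ij}$: data, the (ij)th observation
• $X_{ij}$: predictor
• $\beta_1$, $\beta_0$: slope and intercept (we focus on the estimates)
• $\varepsilon_{ij}$: error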
ANCOVA is ANOVA merged with Linear Regression
$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$
• $X_{ij}$: known covariate (what is this guy doing here?)
• $\alpha_i$: effect of the ith group (we still focus on whether $\alpha_i = 0$, $i = 1, \ldots, a$)
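How to perform ANCOVA
$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$
Move the covariate term to the left-hand side:
$\tilde{Y}_{ij}(\text{adjust}) = Y_{ij} - \beta(X_{ij} - \bar{X}_{..}) = \mu + \alpha_i + \varepsilon_{ij}$
(This is just the ANOVA model!)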
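How do we get $\hat{\beta}$, then?
Compare $Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$ with $Y_{ij} = \beta_1 X_{ij} + \beta_0 + \varepsilon_{ij}$.
Within each group, consider $\alpha_i$ a constant, and notice that we only need the estimate of the slope $\beta$, not the intercept.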
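How do we get $\hat{\beta}$, then? (2)
• Within each group, do least squares:
$$\hat{\beta}_i = \frac{\sum_j (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.})}{\sum_j (X_{ij} - \bar{X}_{i.})^2}$$
• Assume that $\beta_1 = \cdots = \beta_a = \beta$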
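How do we get $\hat{\beta}$, then? (3)
• We use the pooled estimate of $\beta$. Each within-group slope
$$\hat{\beta}_i = \frac{\sum_j (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.})}{\sum_j (X_{ij} - \bar{X}_{i.})^2}$$
is weighted by $\sum_j (X_{ij} - \bar{X}_{i.})^2$:
$$\hat{\beta} = \frac{\sum_i \hat{\beta}_i \sum_j (X_{ij} - \bar{X}_{i.})^2}{\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2} = \frac{\sum_i \sum_j (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.})}{\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2}$$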
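A small numerical sketch may help to see how the pooled estimate combines the within-group slopes. It assumes each group is given as a pair of lists (covariate values, responses); the helper names `group_sums` and `pooled_slope` and the data are hypothetical.

```python
from statistics import mean

def group_sums(xs, ys):
    """Return (Sxy, Sxx) for one group, centred on that group's own means."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy, sxx

def pooled_slope(groups):
    """Pooled within-group slope: sum of all Sxy divided by sum of all Sxx."""
    sums = [group_sums(xs, ys) for xs, ys in groups]
    return sum(sxy for sxy, _ in sums) / sum(sxx for _, sxx in sums)

# Hypothetical data: the slope is 2 in the first group and 1 in the second.
groups = [([1, 2, 3], [2, 4, 6]), ([1, 2, 3], [5, 6, 7])]
slopes = [sxy / sxx for sxy, sxx in (group_sums(xs, ys) for xs, ys in groups)]
beta_hat = pooled_slope(groups)
print(slopes, beta_hat)
```

Because both hypothetical groups have the same spread in X, the pooled slope here is simply the average of the two within-group slopes.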
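ANCOVA begins: $Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$
1. In each group, find the slope estimate via linear regression:
$$\hat{\beta}_i = \frac{\sum_j (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.})}{\sum_j (X_{ij} - \bar{X}_{i.})^2}$$
2. Pool them together:
$$\hat{\beta} = \frac{\sum_i \hat{\beta}_i \sum_j (X_{ij} - \bar{X}_{i.})^2}{\sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2}$$
3. Get rid of the covariate: $\tilde{Y}_{ij}(\text{adjust}) = Y_{ij} - \hat{\beta}(X_{ij} - \bar{X}_{..})$
4. Do ANOVA on the model $\tilde{Y}_{ij}(\text{adjust}) = \mu + \alpha_i + \varepsilon_{ij}$
5. Go home and have dinner (a yummy cheeseburger and an iced Coke?).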
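The steps above can be sketched end to end in a few lines. This is only an illustration of the procedure as the slides outline it (pool the slopes, adjust the responses, run a one-way ANOVA on the adjusted values); the function name `ancova_by_adjustment` and the data are hypothetical, and statistical packages that fit the full linear model directly may report a different F value.

```python
from statistics import mean

def ancova_by_adjustment(groups):
    """groups: list of (xs, ys) pairs. Returns (beta_hat, adjusted groups, F)."""
    # Steps 1-2: pooled within-group slope.
    sxy = sxx = 0.0
    for xs, ys in groups:
        x_bar, y_bar = mean(xs), mean(ys)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        sxx += sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    # Step 3: remove the covariate, centred on the overall covariate mean.
    x_grand = mean(x for xs, _ in groups for x in xs)
    adjusted = [[y - beta_hat * (x - x_grand) for x, y in zip(xs, ys)]
                for xs, ys in groups]
    # Step 4: one-way ANOVA on the adjusted responses.
    a = len(adjusted)
    n_total = sum(len(g) for g in adjusted)
    grand = mean(v for g in adjusted for v in g)
    ssa = sum(len(g) * (mean(g) - grand) ** 2 for g in adjusted)
    sse = sum((v - mean(g)) ** 2 for g in adjusted for v in g)
    f_stat = (ssa / (a - 1)) / (sse / (n_total - a))
    return beta_hat, adjusted, f_stat

# Hypothetical data: two groups of three (covariate, response) observations.
groups = [([1, 2, 3], [2, 4, 7]), ([2, 3, 4], [6, 7, 10])]
beta_hat, adjusted, f_stat = ancova_by_adjustment(groups)
print(beta_hat, adjusted, f_stat)
```

The error degrees of freedom here follow the plain ANOVA table (N - a). The slides later use a(n - 1) - 1 for the common-slope model, which accounts for the estimated slope, so that divisor may be the more appropriate choice in practice.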
ANCOVA, ANOVA and Regression
[Diagram: ANOVA/ANCOVA and Regression are both covered by the General Linear Model]
Simple Linear Regression
$Y = \beta_0 + \beta X + \varepsilon$
• $Y$: response variable
• $X$: predictor
• $\beta_0$: intercept
• $\beta$: slope
• $\varepsilon$: error
All of them are scalars!
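Multiple Linear Regression
$Y = X\beta + \varepsilon$
$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} =
\begin{pmatrix} 1 & x_{11} & \cdots & x_{1,(n-1)} \\ \vdots & \vdots & & \vdots \\ 1 & x_{m1} & \cdots & x_{m,(n-1)} \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} +
\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_m \end{pmatrix}$$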
ANOVA: Dummy Variable Regression
$Y_i = \beta_0 + \beta_1 Z_i + \varepsilon_i$
• $Y_i$: outcome of the ith unit
• $Z_i$: categorical variable (binary)
• $\beta_0$: coefficient for the intercept
• $\beta_1$: coefficient for the slope
• $\varepsilon_i$: residual for the ith unit
More about $Z_i$: $Z_i = 1$ if the unit is in the treatment group; $Z_i = 0$ if the unit is in the control group.
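To see why a regression on a dummy variable reproduces a two-group comparison, the sketch below fits the model by ordinary least squares and compares the coefficients with the group means. The function name `simple_ols` and the data are hypothetical.

```python
from statistics import mean

def simple_ols(z, y):
    """Least-squares intercept and slope for y = beta0 + beta1 * z."""
    z_bar, y_bar = mean(z), mean(y)
    slope = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
             / sum((zi - z_bar) ** 2 for zi in z))
    return y_bar - slope * z_bar, slope

# Hypothetical data: Z = 0 for the control group, Z = 1 for the treatment group.
z = [0, 0, 0, 1, 1, 1]
y = [3, 4, 5, 7, 8, 9]
beta0, beta1 = simple_ols(z, y)
control_mean = mean(yi for zi, yi in zip(z, y) if zi == 0)
treatment_mean = mean(yi for zi, yi in zip(z, y) if zi == 1)
print(beta0, beta1, control_mean, treatment_mean)
```

The intercept equals the control-group mean and the slope equals the difference between the treatment and control means, which is exactly the quantity a two-group ANOVA tests.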
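Two-way ANOVA
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
• $Y_{ijk}$: response variable
• $\mu$: overall mean response
• $\alpha_i$: effect due to the ith level of factor A
• $\beta_j$: effect due to the jth level of factor B
• $(\alpha\beta)_{ij}$: the effect due to any interaction between the ith level of A and the jth level of B
• $\varepsilon_{ijk}$: residual for the kth unit at the ith level of A and the jth level of B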
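General Linear Model
$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_{p+1} X_{(p+1)i} + \beta_{p+2} X_{(p+2)i} + \varepsilon_i$
• $y_i$: the ith response variable
• The $X$ terms may be categorical variables or continuous variables
• $\varepsilon_i$: random error
The above formula can be simply denoted as: $Y = X\beta + \varepsilon$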
$Y = X\beta + \varepsilon$: what can this X be?
Before we see an example of X, we have learned that the General Linear Model covers (1) Simple Linear Regression; (2) Multiple Linear Regression; (3) ANOVA; (4) 2-way/n-way ANOVA.
X: Interaction Between Random Variables
X in the GLM might be expanded as
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
where $X_3$ in the above formula could be the INTERACTION between $X_1$ and $X_2$:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 * X_2 + \varepsilon$
Did you see the trick? Next, let us see what assumptions shall be satisfied before using ANCOVA.
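As an illustration of how an interaction enters X, the sketch below builds the design-matrix rows for one dummy-coded group variable and one continuous covariate. The coefficient values and the helper names `design_row` and `predict` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def design_row(x1, x2):
    """One row of X for Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2) + error."""
    return [1.0, x1, x2, x1 * x2]

def predict(row, beta):
    """Linear predictor: the dot product of a design row with the coefficients."""
    return sum(b * v for b, v in zip(beta, row))

# Hypothetical setup: X1 is a dummy-coded group, X2 a continuous covariate.
data = [(0, 10.0), (0, 20.0), (1, 10.0), (1, 20.0)]
X = [design_row(x1, x2) for x1, x2 in data]
beta = [1.0, 2.0, 0.5, 0.25]  # hypothetical coefficients b0..b3
fitted = [predict(row, beta) for row in X]
print(fitted)
```

With these made-up coefficients the covariate slope is 0.5 in group 0 and 0.5 + 0.25 = 0.75 in group 1, so a non-zero interaction coefficient means the groups have different regression slopes.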
Before using ANCOVA…
Test the Three Assumptions
1. Test the homogeneity of variance.
2. Test the homogeneity of regression, i.e. whether H0: $\beta_1 = \cdots = \beta_i = \cdots = \beta_a$ holds.
3. Test whether there is a linear relationship between the dependent variable and the covariate.
1. Test the Homogeneity of Variance
For each i, calculate the MSE:
$MSE_i = SSE_i / df_i = SSE_i / (n_i - 2)$
Use $\max_i(MSE_i)$ and $\min_i(MSE_i)$ to do an $F_{max}$ test,
$F = \max_i(MSE_i) / \min_i(MSE_i)$,
to make sure $\sigma^2$ is constant across the different levels.
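The ratio above is straightforward to compute once each group has its own regression fit. The sketch below is a minimal illustration with hypothetical data and function names; it returns only the statistic, which would still have to be compared against an $F_{max}$ critical value from a table.

```python
from statistics import mean

def group_mse(xs, ys):
    """MSE_i = SSE_i / (n_i - 2) from a simple linear regression within one group."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return sse / (len(xs) - 2)

def f_max(groups):
    """Ratio of the largest to the smallest within-group MSE."""
    mses = [group_mse(xs, ys) for xs, ys in groups]
    return max(mses) / min(mses)

# Hypothetical data: two groups of four (covariate, response) observations.
groups = [([1, 2, 3, 4], [1, 3, 2, 4]), ([1, 2, 3, 4], [2, 1, 5, 6])]
statistic = f_max(groups)
print(statistic)
```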
2. Test Whether H0: $\beta_1 = \cdots = \beta_i = \cdots = \beta_a$
(1) Define $SSE_G = \sum_{i=1}^{a} SSE_i$, the sum of squares of errors within groups.
$SSE_i$ is calculated based on the group's own slope $\hat{\beta}_i$, and $SSE_G$ is generated by the random error $\varepsilon$.
(2) We can calculate SSE based on a common $\beta$. SSE is generated by
• random error $\varepsilon$
• the difference between distinct $\beta_i$
(3) Let $SSB = SSE - SSE_G$.
SSB, the sum of squares between groups, is constituted by the difference between the different $\beta_i$.
(4) Degrees of freedom and mean squares:
$df_b = df_e - df_e^G = [a(n-1) - 1] - a(n-2) = a - 1$
$MSB = SSB / df_b = SSB / (a-1)$: mean square between groups
$MSE_G = SSE_G / df_e^G = SSE_G / (a(n-2))$: mean square within groups
Do an F test on MSB and $MSE_G$, $F = MSB / MSE_G$, to see whether we can reject H0.
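A compact way to check these sums of squares is to compute them from the within-group quantities Sxx, Sxy and Syy. The sketch below assumes equal group sizes n, as in the degrees-of-freedom formulas above; the function name `slope_homogeneity_f` and the data are hypothetical.

```python
from statistics import mean

def slope_homogeneity_f(groups):
    """groups: list of (xs, ys) pairs with equal size n. Returns (SSB, SSE_G, F)."""
    a, n = len(groups), len(groups[0][0])
    sxx_list, sxy_list, syy_list = [], [], []
    for xs, ys in groups:
        x_bar, y_bar = mean(xs), mean(ys)
        sxx_list.append(sum((x - x_bar) ** 2 for x in xs))
        sxy_list.append(sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)))
        syy_list.append(sum((y - y_bar) ** 2 for y in ys))
    # SSE_G: every group is fitted with its own slope.
    sse_g = sum(syy - sxy ** 2 / sxx
                for sxx, sxy, syy in zip(sxx_list, sxy_list, syy_list))
    # SSE: all groups share the pooled (common) slope.
    sse = sum(syy_list) - sum(sxy_list) ** 2 / sum(sxx_list)
    ssb = sse - sse_g
    msb = ssb / (a - 1)
    mse_g = sse_g / (a * (n - 2))
    return ssb, sse_g, msb / mse_g

# Hypothetical data: two groups of four (covariate, response) observations.
groups = [([1, 2, 3, 4], [1, 3, 2, 4]), ([1, 2, 3, 4], [2, 1, 5, 6])]
ssb, sse_g, f_stat = slope_homogeneity_f(groups)
print(ssb, sse_g, f_stat)
```

A small F value, as with these made-up numbers, gives no evidence against equal slopes.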
3. Test Linear Relationship
Assumption 3: test for a linear relationship between the dependent variable and the covariate.
H0: $\beta_1 = 0$
How to do it? F test on SSR (sum of squares of regression) and SSE.
How to calculate SSR and MSR?
From each $x_i$, obtain $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
SSR is obtained by summing the squared differences between $\hat{y}_i$ and $\bar{y}$:
$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, $MSR = SSR / 1$
How to calculate SSE and MSE?
SSE is the error obtained by summing the squared differences between $y_i$ and $\hat{y}_i$:
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, $MSE = SSE / (n - 2)$
Test statistic: $F = MSR / MSE$
Based on the test statistic we determine whether to accept H0 ($\beta_1 = 0$) or not.
Assume Assumptions 1 and 2 are already passed.
• If H0 is true ($\beta_1 = 0$), we do ANOVA.
• Otherwise, we do ANCOVA.
So, anytime we want to use ANCOVA, we need to test the three assumptions first!
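The F statistic for the linear relationship can be computed directly from the definitions of SSR and SSE. The following is a minimal sketch with hypothetical data; the function name `regression_f` is an assumption, and the statistic would still need to be compared with an F(1, n - 2) critical value.

```python
from statistics import mean

def regression_f(xs, ys):
    """Return (SSR, SSE, F) for the simple linear regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left unexplained
    msr = ssr / 1
    mse = sse / (n - 2)
    return ssr, sse, msr / mse

# Hypothetical data: four (covariate, response) observations.
ssr, sse, f_stat = regression_f([1, 2, 3, 4], [1, 3, 2, 4])
print(ssr, sse, f_stat)
```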
Application of ANCOVA
Our case
• In this hypothetical study, a sample of 36 teams (id in the data set) of 12-year-old children attending a summer camp participated in a study to determine which one of three different tree-watering techniques worked best to promote tree growth.
Technique | Frequency | Code
Watering the base with a hose | 10 minutes once per day | 1
Watering the ground surrounding (drip system) | 2 hours each day | 2
Deep watering (sunk pipe) | 10 minutes every 3 days | 3
Conditions for the experiment
• From a large set of equally sized and equally healthy fast-growing trees, each team was given a tree to plant at the start of the camp.
• Each team was responsible for the watering and general care of their tree.
• At the end of the summer, the height of each tree was measured.
Concerns
• that some children might have had more gardening experience than others, and
• that any knowledge gained as a result of that prior experience might affect the way the tree was planted and perhaps even the way in which the children cared for the tree and carried out the watering regime.
How to approach?
Create an indicator for that knowledge (e.g. a 40-point scale of gardening experience).
Data Structure
Real data:
id | watering technique | tree growth (dv) | gardening exp (cov)
1 | 1 | 39 | 24
2 | 1 | 36 | 18
3 | 1 | 30 | 21
4 | 1 | 42 | 24
… | … | … | …
32 | 3 | 36 | 15
33 | 3 | 30 | 18
34 | 3 | 39 | 18
35 | 3 | 27 | 9
36 | 3 | 24 | 6
• watering technique: grouping (1, 2, 3)
• tree growth: dependent variable
• gardening exp: covariate variable
Data Structure (continued)
The same data fit the ANCOVA model
$Y_{ij} = \mu + \alpha_i + \beta(X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$
• $Y_{ij}$: tree growth (dependent variable)
• $i$: watering technique, the grouping (1, 2, 3)
• $X_{ij}$: gardening experience (covariate variable)
• $\mu$: overall mean response
• $\beta$: regression coefficient parameter
• $\varepsilon_{ij}$: residual error
Model Assumptions (ANCOVA in SAS)
• Linearity of Regression
• Homogeneity of Regression
• Homogeneity of Variance, and dv is Normal
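The Pearson correlation coefficient between the covariate and the dependent variable is .81150.
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$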
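The sample correlation can be computed directly from its definition. The sketch below applies it to the nine rows that are visible in the data table (ids 1 to 4 and 32 to 36), not to the full 36-team data set, so its result is not expected to reproduce the .81150 reported above; the function name `pearson_r` is an assumption.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Only the nine rows shown in the slide's table (an incomplete subset of the data).
gardening_exp = [24, 18, 21, 24, 15, 18, 18, 9, 6]
tree_growth = [39, 36, 30, 42, 36, 30, 39, 27, 24]
r_visible = pearson_r(gardening_exp, tree_growth)
print(r_visible)
```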
Assumptions (Linearity of Regression)
There is clearly a strong linear component to the relationship.
The linearity of regression assumption appears to be met by the data set.
Assumptions (Homogeneity of Regression)
The assumption of homogeneity of regression is tested by examining the interaction of the covariate and the independent variable. If it is not statistically significant, as is the case here, then the assumption is met.
Output
The Model contains the effects of both the covariate and the independent variable.
The effects of the covariate and the independent variable are separately evaluated in this summary table.
Watering techniques coded as 1 (hose watering) and 3 (deep watering) are the only two groups whose means differ significantly.
Experiment Conclusions
• We can assert that prior gardening experience and knowledge was quite influential in how well the trees fared under the attention of the young campers.
• When we statistically control for, or equate, the gardening experience and knowledge of the children, the watering technique was a relatively strong factor in how much growth was seen in the trees.
• On the basis of the adjusted means, we may therefore conclude that, when we statistically control for gardening experience, deep watering is more effective than hose watering but is not significantly more effective than drip watering.
SAS Code for ANCOVA
GROUP VARIABLE, DEPENDENT VARIABLE and COVARIATE
THIS IS ANCOVA!!!!!
ENTERPRISE GUIDE APPROACH
Tasks->Graph->Scatter Plot
Tasks->ANOVA->Linear Models
QUESTIONS? THANK YOU!