BPS - 3rd Ed. Chapter 5 1
Chapter 5
Regression
Objectives of Regression
– To describe the change in Y per unit X
– To predict the average level of Y at a given level of X
“Returning Birds” Example
Plot data first to see if relation can be described by straight line (important!)
Illustrative data from Exercise 4.4
Y = adult birds joining colony
X = percent of birds returning, prior year
If data can be described by a straight line
… describe the relationship with the equation Y = (intercept) + (slope)(X)
May also be written: Y = (slope)(X) + (intercept)
Intercept = where the line crosses the Y axis
Slope = “angle” (steepness) of the line
Linear Regression
Algebraic line: every point falls on the line
exact y = intercept + (slope)(X)
Statistical line: scatter cloud suggests a linear trend
“predicted y” = intercept + (slope)(X)
Regression Equation
ŷ = a + bx, where
– ŷ (“y-hat”) is the predicted value of Y
– a is the intercept
– b is the slope
– x is a value for X
Determine a & b for the “best fitting line”
The TI calculators reverse a & b!
What Line Fits Best?
If we try to draw the line by eye, different people will draw different lines
We need a method to draw the “best line”
This method is called “least squares”
The “least squares” regression line
Each point has:
Residual = observed y – predicted y
= vertical distance of point from prediction line
The least squares line minimizes the sum of the squared residuals
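The idea of residuals and their squared sum can be sketched in a few lines of Python. The data below is a hypothetical toy set (not from the slides), and the function names are illustrative. The sketch compares an "eyeballed" line that passes exactly through three of the four points with the line y = 1.9x, which is what the least-squares formulas give for this toy data.

```python
def residuals(a, b, xs, ys):
    """Residual = observed y - predicted y, where predicted y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(a, b, xs, ys):
    """The quantity that the least-squares line makes as small as possible."""
    return sum(e ** 2 for e in residuals(a, b, xs, ys))


# Hypothetical toy data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# An eyeballed line y = 2x; it hits three points exactly but misses one by 1.
sse_eyeball = sum_squared_residuals(0.0, 2.0, xs, ys)

# The least-squares line for this toy data, y = 1.9x.
sse_least_squares = sum_squared_residuals(0.0, 1.9, xs, ys)

print(f"eyeballed line:     {sse_eyeball:.2f}")
print(f"least-squares line: {sse_least_squares:.2f}")
```

Even though the eyeballed line looks like a perfect fit for most points, its sum of squared residuals is larger, which is why the method does not rely on judgment by eye.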
Calculating Least Squares Regression Coefficients
Formula (next slide)
Technology
– TI-30XIIS
– Two variable Applet
– Other
Formulas
$b = r \dfrac{s_y}{s_x}$
$a = \bar{y} - b\bar{x}$
b = slope coefficient
a = intercept coefficient
where $s_x$ and $s_y$ are the standard deviations of the two variables, $\bar{x}$ and $\bar{y}$ are their means, and r is their correlation
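As a rough sketch, the two formulas translate directly into code. The function name and parameter names are illustrative; the numbers plugged in are the summary statistics reported in this chapter's life-expectancy case study.

```python
def least_squares_coefficients(x_mean, y_mean, s_x, s_y, r):
    """Return (a, b) using b = r * s_y / s_x and a = y_mean - b * x_mean."""
    b = r * s_y / s_x
    a = y_mean - b * x_mean
    return a, b


# Summary statistics from the life-expectancy case study in this chapter.
a, b = least_squares_coefficients(
    x_mean=21.52, y_mean=77.754, s_x=1.532, s_y=0.795, r=0.809
)

print(f"slope b     = {b:.3f}")
print(f"intercept a = {a:.3f}")
```

The intercept printed here can differ from a hand calculation in the third decimal place, because a hand calculation typically rounds b before computing a.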
Technology: Calculator
BEWARE!
TI calculators label the slope and intercept backwards!
Regression Line
For the “bird data”:
a = 31.9343
b = -0.3040
The linear regression equation is: ŷ = 31.9343 - 0.3040x
The slope (-0.3040) represents the average change in Y per unit X
Use of Regression for Prediction
Suppose an individual colony has 60% returning (x = 60). What is the predicted number of new birds for this colony?
Answer: ŷ = a + bx = 31.9343 - (0.3040)(60) = 13.69
Interpretation: the regression model predicts 13.69 new birds (ŷ) for a colony with x = 60.
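The prediction step amounts to plugging x into the fitted equation. A minimal sketch, using the intercept 31.9343 and the slope -0.3040 quoted for the bird data (the function and constant names are illustrative):

```python
INTERCEPT = 31.9343   # a, from the bird-data regression
SLOPE = -0.3040       # b, from the bird-data regression


def predict_new_birds(percent_returning):
    """Predicted number of new adults joining: y-hat = a + b*x."""
    return INTERCEPT + SLOPE * percent_returning


y_hat = predict_new_birds(60)
print(round(y_hat, 2))
```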
Prediction via Regression Line
Number of new birds and percent returning
When X = 60, the regression model predicts ŷ = 13.69
Case Study
Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe
Regression Calculation: Case Study
Country           Per Capita GDP (x)   Life Expectancy (y)
Austria           21.4                 77.48
Belgium           23.2                 77.53
Finland           20.0                 77.32
France            22.7                 78.63
Germany           20.8                 77.17
Ireland           18.6                 76.39
Italy             21.5                 78.51
Netherlands       22.0                 78.15
Switzerland       23.8                 78.99
United Kingdom    21.2                 77.37
Life Expectancy and GDP (Europe)
Case Study (Life Expectancy)
[Scatterplot: life expectancy (yrs) vs. per capita GDP]
Regression Calculation by Hand (Life Expectancy Study)
$\bar{x} = 21.52$, $\bar{y} = 77.754$, $s_x = 1.532$, $s_y = 0.795$, $r = 0.809$
Calculations:
$b = r \dfrac{s_y}{s_x} = (0.809)\dfrac{0.795}{1.532} = 0.420$
$a = \bar{y} - b\bar{x} = 77.754 - (0.420)(21.52) = 68.716$
ŷ = 68.716 + 0.420x
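The hand calculation can be cross-checked from the raw table. The sketch below uses only Python's standard `statistics` module. The variable names are illustrative, and the correlation is computed from its definition (average product of standardized values, dividing by n - 1) rather than from a library call.

```python
from statistics import mean, stdev

# Per capita GDP (x) and life expectancy (y) from the case-study table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar, y_bar = mean(gdp), mean(life)
s_x, s_y = stdev(gdp), stdev(life)   # sample standard deviations

# Sample correlation r from standardized values.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(gdp, life)) / (n - 1)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # intercept

print(f"x_bar={x_bar:.2f}  y_bar={y_bar:.3f}  s_x={s_x:.3f}  s_y={s_y:.3f}  r={r:.3f}")
print(f"b={b:.3f}  a={a:.3f}")
```

Small differences in the last decimal place relative to the hand calculation come from rounding the intermediate values.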
BPS/3e Two Variable Applet
Applet: Data Entry
Applet: Calculations
Applet: Scatterplot
Applet: Least Squares Line
Interpretation: Life Expectancy Case Study
Model: ŷ = 68.716 + (0.420)X
Slope: for each one-unit increase in per capita GDP, predicted life expectancy increases by 0.420 years
Prediction example: What is the predicted life expectancy in a country with a GDP of 20.0?
ANSWER: ŷ = 68.716 + (0.420)(20.0) = 77.12
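One way to see what the slope means is to compare predictions one GDP unit apart. A small sketch using the fitted model from the case study (the function name is illustrative):

```python
def predicted_life_expectancy(gdp):
    """Fitted model from the case study: y-hat = 68.716 + 0.420 * x."""
    return 68.716 + 0.420 * gdp


at_20 = predicted_life_expectancy(20.0)
at_21 = predicted_life_expectancy(21.0)

print(round(at_20, 2))           # prediction at GDP = 20.0
print(round(at_21 - at_20, 3))   # change per one-unit increase in GDP
```

The difference between the two predictions equals the slope, regardless of which starting GDP value is chosen.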
Coefficient of Determination (R²) (Fact 4 on p. 111)
The coefficient of determination (R²) is the square of the correlation r
Quantifies the fraction of the variation in Y “mathematically explained” by X
Examples:
– r = 1: R² = 1: regression line explains all (100%) of the variation in Y
– r = .7: R² = .49: regression line explains almost half (49%) of the variation in Y
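To make "fraction of the variation explained" concrete, the sketch below computes R² two ways for the case-study data: as the square of r, and as 1 minus the ratio of the residual sum of squares to the total variation in Y. Variable names are illustrative, and the slope is computed as Sxy/Sxx, which is algebraically the same as r·s_y/s_x.

```python
from statistics import mean

# Per capita GDP (x) and life expectancy (y) from the case-study table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

x_bar, y_bar = mean(gdp), mean(life)
sxx = sum((x - x_bar) ** 2 for x in gdp)
syy = sum((y - y_bar) ** 2 for y in life)   # total variation in Y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gdp, life))

r = sxy / (sxx * syy) ** 0.5
b = sxy / sxx             # least-squares slope
a = y_bar - b * x_bar     # least-squares intercept

# Variation left over after fitting the line (sum of squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(gdp, life))
explained_fraction = 1 - sse / syy

print(f"r^2 = {r ** 2:.3f}")
print(f"1 - SSE/SST = {explained_fraction:.3f}")
```

The two numbers agree for a least-squares line, which is why squaring r gives the explained fraction directly.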
We are NOT going to cover the analysis of residual plots (pp. 113-116)
Outliers and Influential Points
An outlier is an observation that lies far from the regression line
Outliers in the y direction have large residuals
Outliers in the x direction are often influential
– removal of an influential point would markedly change the regression and correlation values
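The effect of an influential point can be illustrated with a hypothetical toy data set (not from the slides): four clustered points plus one point far out in the x direction. The helper name `slope_and_r` is illustrative.

```python
def slope_and_r(xs, ys):
    """Return the least-squares slope and the correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


# Hypothetical data: the last point (10, 10) is far from the others in x.
xs = [1, 2, 3, 4, 10]
ys = [2, 1, 3, 2, 10]

slope_all, r_all = slope_and_r(xs, ys)
slope_trim, r_trim = slope_and_r(xs[:-1], ys[:-1])   # drop the far-out point

print(f"all points:        slope={slope_all:.2f}, r={r_all:.2f}")
print(f"without (10, 10):  slope={slope_trim:.2f}, r={r_trim:.2f}")
```

A single point accounts for most of the apparent relationship here: dropping it changes both the slope and the correlation substantially.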
Outliers: Case Study
Gesell Adaptive Score and Age at First Word
From all the data: r² = 41%
After removing child 18: r² = 11%
Cautions About Correlation and Regression
– Describe only linear relationships
– Are influenced by outliers
– Cannot be used to predict beyond the range of X (do not extrapolate)
– Beware of lurking variables (variables other than X and Y)
  – Association does not always equal causation!
Do not extrapolate (Sarah’s height)
Sarah’s height is plotted against her age
Can you predict her height at age 42 months?
Can you predict her height at age 30 years (360 months)?
[Plot: height (cm) vs. age (months); age axis 30 to 65 months, height axis 80 to 100 cm]
Do not extrapolate (Sarah’s height)
Regression equation: ŷ = 71.95 + .383(X)
At age 42 months: ŷ = 71.95 + .383(42) = 88 (Reasonable)
At age 360 months: ŷ = 71.95 + .383(360) = 209.8 (209.8 cm is about 6 ft 11 in, far beyond anything the data support!)
[Plot: height (cm) vs. age (months), with the regression line extended out to 390 months]
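A simple guard against extrapolation is to check whether x lies inside the range of ages that were actually observed. In the sketch below, the 30 to 65 month window is an assumption read off the axis of the first plot, not a documented data range, and the function names are illustrative.

```python
# Assumed range of observed ages in months (read off the plot axis; an assumption).
AGE_RANGE = (30, 65)


def predict_height_cm(age_months):
    """Fitted line for Sarah's height: y-hat = 71.95 + 0.383 * x."""
    return 71.95 + 0.383 * age_months


def is_extrapolation(age_months, observed_range=AGE_RANGE):
    """True when the age falls outside the range the line was fitted on."""
    low, high = observed_range
    return not (low <= age_months <= high)


for age in (42, 360):
    note = "extrapolation, do not trust" if is_extrapolation(age) else "within observed ages"
    print(f"age {age} months: predicted {predict_height_cm(age):.1f} cm ({note})")
```

The arithmetic works for any age, which is exactly the problem: the equation alone gives no warning that the second prediction lies far outside the data.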
Caution: Correlation does not always mean causation
Even very strong correlations may not correspond to a causal relationship between x and y
(Beware of the lurking variable!)