CHAPTER 7 Linear Correlation & Regression Methods
7.1 - Motivation
7.2 - Correlation / Simple Linear Regression
7.3 - Extensions of Simple Linear Regression

Upload: julian-paul

Post on 28-Dec-2015


TRANSCRIPT

Page 1:

• 7.1 - Motivation

• 7.2 - Correlation / Simple Linear Regression

• 7.3 - Extensions of Simple Linear Regression

CHAPTER 7 Linear Correlation & Regression Methods

Page 2:

Testing for association between two POPULATION variables X and Y…
Parameter Estimation via SAMPLE DATA…

• Categorical variables: cross-tabulate the categories of X against the categories of Y → Chi-squared Test.
  Examples: X = Disease status (D+, D–), Y = Exposure status (E+, E–)
            X = # children in household (0, 1-2, 3-4, 5+), Y = Income level (Low, Middle, High)
• Numerical variables: ???????

POPULATION PARAMETERS
Means: μ_X = E[X], μ_Y = E[Y]
Variances: σ_X² = E[(X − μ_X)²], σ_Y² = E[(Y − μ_Y)²]
Covariance: σ_XY = E[(X − μ_X)(Y − μ_Y)]

Page 3:

Parameter Estimation via SAMPLE DATA…

• Numerical variables: ???????

POPULATION PARAMETERS
Means: μ_X = E[X], μ_Y = E[Y]
Variances: σ_X² = E[(X − μ_X)²], σ_Y² = E[(Y − μ_Y)²]
Covariance: σ_XY = E[(X − μ_X)(Y − μ_Y)]

SAMPLE STATISTICS (from data x₁, x₂, x₃, x₄, …, xₙ and y₁, y₂, y₃, y₄, …, yₙ)
Means: x̄ = (1/n) Σ x, ȳ = (1/n) Σ y
Variances: s_x² = Σ(x − x̄)² / (n − 1), s_y² = Σ(y − ȳ)² / (n − 1)
Covariance: s_xy = Σ(x − x̄)(y − ȳ) / (n − 1)  (can be +, –, or 0)
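As an illustration (not part of the original slides), the sample statistics above can be computed directly; a minimal Python sketch, using a toy sample chosen so the answers are easy to check by hand:

```python
# Sample statistics with the (n - 1) denominators from the slide.
def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

x = [1.0, 2.0, 3.0]   # toy data (not the slides' sample)
y = [2.0, 4.0, 6.0]
x_bar, y_bar = mean(x), mean(y)   # 2.0, 4.0
s2_x, s2_y = var(x), var(y)       # 1.0, 4.0
s_xy = cov(x, y)                  # 2.0
```

Here y is an exact linear function of x, so the covariance is as large (in magnitude) as the variances allow.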

Page 4:

Parameter Estimation via SAMPLE DATA…

• Numerical variables: ???????

POPULATION PARAMETERS
Means: μ_X = E[X], μ_Y = E[Y]
Variances: σ_X² = E[(X − μ_X)²], σ_Y² = E[(Y − μ_Y)²]
Covariance: σ_XY = E[(X − μ_X)(Y − μ_Y)]

SAMPLE STATISTICS (n data points x₁ x₂ x₃ x₄ … xₙ, y₁ y₂ y₃ y₄ … yₙ)
Means: x̄, ȳ
Variances: s_x² = Σ(x − x̄)²/(n − 1), s_y² = Σ(y − ȳ)²/(n − 1)
Covariance: s_xy = Σ(x − x̄)(y − ȳ)/(n − 1)  (can be +, –, or 0)

Scatterplot of Y vs. X (n data points). [JAMA. 2003;290:1486-1493]

Page 5:

Parameter Estimation via SAMPLE DATA…

• Numerical variables: ???????

POPULATION PARAMETERS: μ_X = E[X], μ_Y = E[Y]; σ_X² = E[(X − μ_X)²], σ_Y² = E[(Y − μ_Y)²]; σ_XY = E[(X − μ_X)(Y − μ_Y)]
SAMPLE STATISTICS (n data points): x̄, ȳ; s_x² = Σ(x − x̄)²/(n − 1), s_y² = Σ(y − ȳ)²/(n − 1); s_xy = Σ(x − x̄)(y − ȳ)/(n − 1)  (can be +, –, or 0)

Scatterplot of Y vs. X (n data points). [JAMA. 2003;290:1486-1493]

Does this suggest a linear trend between X and Y? If so, how do we measure it?

Page 6:

Testing for association between two population variables X and Y…

• Numerical variables: ???????

POPULATION PARAMETERS
Means: μ_X = E[X], μ_Y = E[Y]
Variances: σ_X² = E[(X − μ_X)²], σ_Y² = E[(Y − μ_Y)²]
Covariance: σ_XY = E[(X − μ_X)(Y − μ_Y)]

Linear Correlation Coefficient:

    ρ = σ_XY / √(σ_X² σ_Y²)

Always between –1 and +1. Measures the strength of LINEAR association only.

Page 7:

Parameter Estimation via SAMPLE DATA…

• Numerical variables (n data points x₁ x₂ x₃ x₄ … xₙ, y₁ y₂ y₃ y₄ … yₙ; scatterplot of Y vs. X) [JAMA. 2003;290:1486-1493]

POPULATION PARAMETERS: μ_X, μ_Y; σ_X², σ_Y²; σ_XY = E[(X − μ_X)(Y − μ_Y)]
SAMPLE STATISTICS: x̄, ȳ; s_x² = Σ(x − x̄)²/(n − 1), s_y² = Σ(y − ȳ)²/(n − 1); s_xy = Σ(x − x̄)(y − ȳ)/(n − 1)  (can be +, –, or 0)

Linear Correlation Coefficient:

    ρ = σ_XY / √(σ_X² σ_Y²),  estimated by  r = s_xy / √(s_x² s_y²)

Always between –1 and +1.

Page 8:

Parameter Estimation via SAMPLE DATA… [JAMA. 2003;290:1486-1493]

POPULATION PARAMETERS: μ_X = E[X], μ_Y = E[Y]; σ_X² = E[(X − μ_X)²], σ_Y² = E[(Y − μ_Y)²]; σ_XY = E[(X − μ_X)(Y − μ_Y)]
SAMPLE STATISTICS (n data points): x̄, ȳ; s_x², s_y²; s_xy  (can be +, –, or 0)

Linear Correlation Coefficient: r = s_xy / √(s_x² s_y²) — always between –1 and +1.

Example in R (reformatted for brevity):

> pop = seq(0, 20, 0.1)
> x = sort(sample(pop, 10))        # n = 10
1.1 1.8 2.1 3.7 4.0 7.3 9.1 11.9 12.4 17.1
> y = sample(pop, 10)
13.1 18.3 17.6 19.1 19.3 3.2 5.6 13.6 8.0 3.0
> plot(x, y, pch = 19)
> c(mean(x), mean(y))
7.05 12.08
> var(x)
29.48944
> var(y)
43.76178
> cov(x, y)
-25.86667
> cor(x, y)
-0.7200451
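The R session above can be cross-checked in Python (a sketch, not part of the slides); the same n − 1 denominators reproduce the printed values:

```python
# Re-derive the slide's summary statistics for the sampled data.
x = [1.1, 1.8, 2.1, 3.7, 4.0, 7.3, 9.1, 11.9, 12.4, 17.1]
y = [13.1, 18.3, 17.6, 19.1, 19.3, 3.2, 5.6, 13.6, 8.0, 3.0]
n = len(x)

x_bar = sum(x) / n                                    # 7.05
y_bar = sum(y) / n                                    # 12.08
s2_x = sum((v - x_bar) ** 2 for v in x) / (n - 1)     # 29.48944
s2_y = sum((v - y_bar) ** 2 for v in y) / (n - 1)     # 43.76178
s_xy = sum((a - x_bar) * (b - y_bar)
           for a, b in zip(x, y)) / (n - 1)           # -25.86667
r = s_xy / (s2_x * s2_y) ** 0.5                       # about -0.72
```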

Page 9:

Parameter Estimation via SAMPLE DATA… (n data points x₁ x₂ x₃ x₄ … xₙ, y₁ y₂ y₃ y₄ … yₙ; scatterplot of Y vs. X) [JAMA. 2003;290:1486-1493]

Linear Correlation Coefficient: r = s_xy / √(s_x² s_y²)
Always between –1 and +1; r measures the strength of linear association.

Page 10:

Linear Correlation Coefficient: r = s_xy / √(s_x² s_y²); r measures the strength of linear association.

    negative linear correlation ← –1 ——— 0 ——— +1 → positive linear correlation

Page 11:

Page 12:

Page 13:

Linear Correlation Coefficient: r = s_xy / √(s_x² s_y²); always between –1 (negative linear correlation) and +1 (positive linear correlation); r measures the strength of linear association.

> cor(x, y)
-0.7200451

Page 14:

Testing for linear association between two numerical population variables X and Y…

Linear Correlation Coefficient: ρ = σ_XY / √(σ_X² σ_Y²), estimated by r = s_xy / √(s_x² s_y²)

H₀: ρ = 0   "No linear association between X and Y."
H_A: ρ ≠ 0  "Linear association between X and Y."

Now that we have r, we can conduct HYPOTHESIS TESTING on ρ.

Test Statistic for p-value:

    T = r √((n − 2) / (1 − r²)) ~ t_{n−2}

    t = −0.72 √((10 − 2) / (1 − (−0.72)²)) = −2.935 on t₈

    p-value = 2 * pt(-2.935, 8) = .0189 < .05
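The t statistic on this slide can be reproduced numerically; a Python sketch (not the slides' own code — the p-value step needs a t CDF, as in R's pt, so only the statistic is computed here):

```python
import math

r = -0.7200451   # sample correlation from the earlier slides
n = 10

# T = r * sqrt((n - 2) / (1 - r^2)) ~ t_{n-2} under H0: rho = 0
t = r * math.sqrt((n - 2) / (1 - r ** 2))   # about -2.935, on t with 8 df
```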

Page 15:

Parameter Estimation via SAMPLE DATA…

r = s_xy / √(s_x² s_y²) measures the strength of linear association (here cor(x, y) = -0.7200451).

If such an association between X and Y exists, then it follows that for any intercept β₀ and slope β₁ we have

    Y = β₀ + β₁X + ε    "Response = Model + Error"

Find estimates β̂₀ and β̂₁ for the "best" line Ŷ = β̂₀ + β̂₁X … in what sense???

Residuals: eᵢ = yᵢ − ŷᵢ, the vertical distance from each observed point (xᵢ, yᵢ) to the fitted point (xᵢ, ŷᵢ).

    SS_Err = Σ eᵢ²

Page 16:

SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES

Find the estimates β̂₀ and β̂₁ for the "best" line Ŷ = β̂₀ + β̂₁X — i.e., the line that minimizes SS_Err = Σ eᵢ², where eᵢ = yᵢ − ŷᵢ:

    β̂₁ = s_xy / s_x² = −25.86667 / 29.48944 = −0.87715

    β̂₀ = ȳ − β̂₁ x̄ = 12.08 − (−0.87715)(7.05) = 18.26391

"Least Squares Regression Line" — note that (x̄, ȳ) is on the line.

Page 17:

SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES

    β̂₁ = s_xy / s_x² = −25.86667 / 29.48944 = −0.87715
    β̂₀ = ȳ − β̂₁ x̄ = 12.08 − (−0.87715)(7.05) = 18.26391

    Ŷ = 18.26391 − 0.87715 X

This line minimizes SS_Err = Σ eᵢ². Check: (x̄, ȳ) = (7.05, 12.08) is on the line.
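The least-squares formulas on this slide can be applied to the data directly; a Python sketch (not the slides' own code):

```python
# Least squares slope and intercept for the slide's sample data.
x = [1.1, 1.8, 2.1, 3.7, 4.0, 7.3, 9.1, 11.9, 12.4, 17.1]
y = [13.1, 18.3, 17.6, 19.1, 19.3, 3.2, 5.6, 13.6, 8.0, 3.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
s2_x = sum((a - x_bar) ** 2 for a in x) / (n - 1)

beta1 = s_xy / s2_x            # -0.87715
beta0 = y_bar - beta1 * x_bar  # 18.26391
```

As a check, the point (x̄, ȳ) satisfies ȳ = β̂₀ + β̂₁ x̄ by construction of β̂₀.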

Page 18:

SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES

    Ŷ = 18.26391 − 0.87715 X  (minimizes SS_Err = Σ eᵢ²)

X (predictor)           1.1  1.8  2.1  3.7  4.0  7.3  9.1  11.9  12.4  17.1
Y (observed response)   13.1 18.3 17.6 19.1 19.3 3.2  5.6  13.6  8.0   3.0

Page 19:

SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES

    Ŷ = 18.26391 − 0.87715 X  (minimizes SS_Err = Σ eᵢ²)

X (predictor)           1.1  1.8  2.1  3.7  4.0  7.3  9.1  11.9  12.4  17.1
Y (observed response)   13.1 18.3 17.6 19.1 19.3 3.2  5.6  13.6  8.0   3.0
Ŷ (fitted response)

Page 20:

SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES

    Ŷ = 18.26391 − 0.87715 X  (minimizes SS_Err = Σ eᵢ²)

X (predictor)           1.1  1.8  2.1  3.7  4.0  7.3  9.1  11.9  12.4  17.1
Y (observed response)   13.1 18.3 17.6 19.1 19.3 3.2  5.6  13.6  8.0   3.0
Ŷ (fitted response)     ~ E X E R C I S E ~

Page 21:

SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES

    Ŷ = 18.26391 − 0.87715 X  (minimizes SS_Err = Σ eᵢ²)

X (predictor)           1.1  1.8  2.1  3.7  4.0  7.3  9.1  11.9  12.4  17.1
Y (observed response)   13.1 18.3 17.6 19.1 19.3 3.2  5.6  13.6  8.0   3.0
Ŷ (fitted response)     ~ E X E R C I S E ~
Y − Ŷ (residuals)

Page 22:

SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES

    Ŷ = 18.26391 − 0.87715 X

X (predictor)           1.1  1.8  2.1  3.7  4.0  7.3  9.1  11.9  12.4  17.1
Y (observed response)   13.1 18.3 17.6 19.1 19.3 3.2  5.6  13.6  8.0   3.0
Ŷ (fitted response)     ~ E X E R C I S E ~
Y − Ŷ (residuals)       ~ E X E R C I S E ~

    SS_Err = Σ eᵢ² = 189.6555  (the minimum)
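For readers checking the exercise, the fitted responses, residuals, and SS_Err can be computed as follows (a Python sketch, not the slides' own code):

```python
# The "exercise" rows: fitted values, residuals, and SS_Err.
x = [1.1, 1.8, 2.1, 3.7, 4.0, 7.3, 9.1, 11.9, 12.4, 17.1]
y = [13.1, 18.3, 17.6, 19.1, 19.3, 3.2, 5.6, 13.6, 8.0, 3.0]

beta0, beta1 = 18.26391, -0.87715               # least squares estimates from the slides
y_hat = [beta0 + beta1 * xi for xi in x]        # fitted responses
e = [yi - yhi for yi, yhi in zip(y, y_hat)]     # residuals
ss_err = sum(ei ** 2 for ei in e)               # about 189.6555 (the minimum)
```

Note the residuals of a least-squares fit with an intercept sum to (essentially) zero.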

Page 23:

Testing for linear association between two numerical population variables X and Y…

    Y = β₀ + β₁X + ε    "Response = Model + Error"
    Ŷ = β̂₀ + β̂₁X,  with β̂₁ = s_xy / s_x² and β̂₀ = ȳ − β̂₁ x̄

Linear Regression Coefficients:

H₀: β₁ = 0   "No linear association between X and Y."
H_A: β₁ ≠ 0  "Linear association between X and Y."

Now that we have these, we can conduct HYPOTHESIS TESTING on β₀ and β₁. Test Statistic for p-value?

Page 24:

SIMPLE LINEAR REGRESSION via the METHOD OF LEAST SQUARES (as above): Ŷ = 18.26391 − 0.87715 X, with SS_Err = Σ eᵢ² = 189.6555.

Page 25:

Testing for linear association between two numerical population variables X and Y…

    Y = β₀ + β₁X + ε    "Response = Model + Error",  Ŷ = β̂₀ + β̂₁X

H₀: β₁ = 0   "No linear association between X and Y."
H_A: β₁ ≠ 0  "Linear association between X and Y."

Test Statistic for p-value:

    T = (β̂₁ − 0) √((n − 1) s_x² / MS_Err) ~ t_{n−2},  where MS_Err = SS_Err / (n − 2)

    t = (−0.87715 − 0) √((9)(29.48944) / (189.6555 / 8)) = −2.935 on t₈

Same t-score as H₀: ρ = 0!  p-value = .0189
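The slope t statistic above can be verified numerically; a Python sketch using the slide's values:

```python
import math

# t statistic for H0: beta1 = 0, from the slide's summary values.
beta1 = -0.87715
n = 10
s2_x = 29.48944
ss_err = 189.6555

ms_err = ss_err / (n - 2)                              # about 23.707
t = (beta1 - 0) * math.sqrt((n - 1) * s2_x / ms_err)   # about -2.935 on t_8
```

This reproduces the same t-score as the correlation test of H₀: ρ = 0.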

Page 26:

> plot(x, y, pch = 19)
> lsreg = lm(y ~ x)    # or lsfit(x, y)
> abline(lsreg)
> summary(lsreg)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-8.6607 -3.2154  0.8954  3.4649  5.7742

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  18.2639     2.6097   6.999 0.000113 ***
x            -0.8772     0.2989  -2.935 0.018857 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.869 on 8 degrees of freedom
Multiple R-squared: 0.5185,  Adjusted R-squared: 0.4583
F-statistic: 8.614 on 1 and 8 DF,  p-value: 0.01886

BUT WHY HAVE TWO METHODS FOR THE SAME PROBLEM??? Because this second method generalizes…

Page 27:

    Y = β₀ + β₁X + ε;  H₀: β₁ = 0 vs. H_A: β₁ ≠ 0

ANOVA Table

Source      df   SS   MS   F-ratio   p-value
Treatment
Error
Total                –

Page 28:

    Y = β₀ + β₁X + ε;  H₀: β₁ = 0 vs. H_A: β₁ ≠ 0

ANOVA Table

Source      df   SS   MS   F-ratio   p-value
Regression  ?
Error       ?
Total       ?         –

Page 29:

    Y = β₀ + β₁X + ε;  H₀: β₁ = 0 vs. H_A: β₁ ≠ 0

ANOVA Table

Source      df   SS   MS   F-ratio   p-value
Regression  1
Error       ?
Total       ?         –

Page 30:

Testing for linear association: H₀: β₁ = 0 vs. H_A: β₁ ≠ 0.

    T = (β̂₁ − 0) √((n − 1) s_x² / MS_Err) ~ t_{n−2},  MS_Err = SS_Err / (n − 2)

    t = (−0.87715 − 0) √((9)(29.48944) / (189.6555 / 8)) = −2.935 on t₈

Same t-score as H₀: ρ = 0!  p-value = .0189

    df_Err = n − 2 = 8

Page 31:

    Y = β₀ + β₁X + ε;  H₀: β₁ = 0 vs. H_A: β₁ ≠ 0

ANOVA Table

Source      df   SS   MS   F-ratio   p-value
Regression  1    ?    ?    ?         ?
Error       8    ?    ?
Total       ?    ?    –

Page 32:

Parameter Estimation via SAMPLE DATA… (n data points x₁ x₂ x₃ x₄ … xₙ, y₁ y₂ y₃ y₄ … yₙ; scatterplot) [JAMA. 2003;290:1486-1493]

SAMPLE STATISTICS: Means x̄, ȳ; Variances s_x² = Σ(x − x̄)²/(n − 1), s_y² = Σ(y − ȳ)²/(n − 1)

Total SS?  Total df?

Page 33:

    SS_Tot = Σ(y − ȳ)² = (n − 1) s_y²

SS_Tot is a measure of the total amount of variability in the observed responses (i.e., before any model-fitting).

Page 34:

    SS_Tot = Σ(y − ȳ)² = (n − 1) s_y²
    SS_Reg = Σ(ŷ − ȳ)²

SS_Reg is a measure of the total amount of variability in the fitted responses (i.e., after model-fitting).

Page 35:

    SS_Tot = Σ(y − ȳ)² = (n − 1) s_y²
    SS_Reg = Σ(ŷ − ȳ)²
    SS_Err = Σ(y − ŷ)²

SS_Err is a measure of the total amount of variability in the resulting residuals (i.e., after model-fitting).

Page 36:

    Ŷ = 18.26391 − 0.87715 X;  cor(x, y) = -0.7200451

X (predictor)           1.1  1.8  2.1  3.7  4.0  7.3  9.1  11.9  12.4  17.1
Y (observed response)   13.1 18.3 17.6 19.1 19.3 3.2  5.6  13.6  8.0   3.0
Ŷ (fitted response)     ~ E X E R C I S E ~
Y − Ŷ (residuals)       ~ E X E R C I S E ~

    SS_Err = Σ(y − ŷ)² = 189.656
    SS_Tot = Σ(y − ȳ)² = (n − 1) s_y² = 9 (43.76178) = 393.856
    SS_Reg = Σ(ŷ − ȳ)² = 204.2

Page 37:

    SS_Err = Σ(y − ŷ)² = 189.656  (minimum)
    SS_Tot = Σ(y − ȳ)² = 393.856
    SS_Reg = Σ(ŷ − ȳ)² = 204.2

    SS_Tot = SS_Reg + SS_Err

Page 38:

    Y = β₀ + β₁X + ε;  H₀: β₁ = 0 vs. H_A: β₁ ≠ 0

ANOVA Table

Source      df   SS        MS       F-ratio              p-value
Regression  1    204.200   MS_Reg   F ~ F_{k−1, n−k}     0 < p < 1
Error       8    189.656   MS_Err
Total       9    393.856   –

Page 39:

    Y = β₀ + β₁X + ε;  H₀: β₁ = 0 vs. H_A: β₁ ≠ 0

ANOVA Table

Source      df   SS        MS        F-ratio   p-value
Regression  1    204.200   204.200   8.61349   0.018857
Error       8    189.656   23.707
Total       9    393.856   –

Same as before!
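The "same as before!" remark can be made concrete: with one predictor, the ANOVA F-ratio equals the square of the slope t statistic. A Python sketch using the table's values:

```python
# ANOVA F-ratio for simple linear regression, and its link to the t-test.
ss_reg, ss_err = 204.200, 189.656
df_reg, df_err = 1, 8

ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err          # about 23.707
F = ms_reg / ms_err               # about 8.6135

t = -2.935                        # slope t statistic from the earlier slide
# With one predictor, F = t^2: the two tests are the same test.
```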

Page 40:

Source      df   SS        MS        F-ratio   p-value
Regression  1    204.200   204.200   8.61349   0.018857
Error       8    189.656   23.707
Total       9    393.856   –

> summary(aov(lsreg))
            Df Sum Sq Mean Sq F value  Pr(>F)
x            1 204.20 204.201  8.6135 0.01886 *
Residuals    8 189.66  23.707

Page 41:

Source      df   SS        MS        F-ratio   p-value
Regression  1    204.200   204.200   8.61349   0.018857
Error       8    189.656   23.707
Total       9    393.856   –

Moreover, the Coefficient of Determination:

    SS_Reg / SS_Tot = 204.2 / 393.856 = 0.5185

The least squares regression line accounts for 51.85% of the total variability in the observed response, with 48.15% remaining.

Page 42:

Coefficient of Determination:

    r² = (−0.72)² = 0.5185 = SS_Reg / SS_Tot  (recall cor(x, y) = -0.7200451)

The least squares regression line accounts for 51.85% of the total variability in the observed response, with 48.15% remaining.
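The two routes to the coefficient of determination agree numerically; a Python sketch:

```python
# Coefficient of determination two ways, using the slide's values.
r = -0.7200451                    # sample correlation
ss_reg, ss_tot = 204.2, 393.856   # ANOVA sums of squares

r2_from_r = r ** 2                # about 0.5185
r2_from_ss = ss_reg / ss_tot      # about 0.5185
```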

Page 43:

> summary(lsreg)  (output as on Page 26)

Multiple R-squared: 0.5185 — i.e., r² = (−0.72)² = SS_Reg / SS_Tot = 0.5185, the Coefficient of Determination. The least squares regression line accounts for 51.85% of the total variability in the observed response, with 48.15% remaining.

Page 44:

Summary of Linear Correlation and Simple Linear Regression

Given: sample data (x₁, y₁), …, (xₙ, yₙ) with Means x̄, ȳ; Variances s_x², s_y²; Covariance s_xy. [JAMA. 2003;290:1486-1493]

Linear Correlation Coefficient: r = s_xy / √(s_x² s_y²);  −1 ≤ r ≤ +1; measures the strength of linear association.

Least Squares Regression Line: β̂₁ = s_xy / s_x², β̂₀ = ȳ − β̂₁ x̄; Ŷ = β̂₀ + β̂₁X minimizes SS_Err = Σ(y − ŷ)² = SS_Tot − SS_Reg.  (ANOVA)

All point estimates can be upgraded to CIs for hypothesis testing, etc.

Page 45:

Summary of Linear Correlation and Simple Linear Regression

Given: sample data (x₁, y₁), …, (xₙ, yₙ) with Means x̄, ȳ; Variances s_x², s_y²; Covariance s_xy. [JAMA. 2003;290:1486-1493]

Linear Correlation Coefficient: r = s_xy / √(s_x² s_y²);  −1 ≤ r ≤ +1; measures the strength of linear association.

Least Squares Regression Line: β̂₁ = s_xy / s_x², β̂₀ = ȳ − β̂₁ x̄; Ŷ = β̂₀ + β̂₁X minimizes SS_Err = Σ(y − ŷ)² = SS_Tot − SS_Reg.  (ANOVA)

All point estimates can be upgraded to CIs for hypothesis testing, etc.: 95% Confidence Intervals give upper and lower 95% confidence bands around the regression line at each ȳ. (See notes for "95% prediction intervals.")

Page 46:

Summary of Linear Correlation and Simple Linear Regression

Given: sample data (x₁, y₁), …, (xₙ, yₙ) with Means x̄, ȳ; Variances s_x², s_y²; Covariance s_xy. [JAMA. 2003;290:1486-1493]

Linear Correlation Coefficient: r = s_xy / √(s_x² s_y²);  −1 ≤ r ≤ +1; measures the strength of linear association.

Least Squares Regression Line: β̂₁ = s_xy / s_x², β̂₀ = ȳ − β̂₁ x̄; Ŷ = β̂₀ + β̂₁X minimizes SS_Err = Σ(y − ŷ)² = SS_Tot − SS_Reg.  (ANOVA)

All point estimates can be upgraded to CIs for hypothesis testing, etc.

Coefficient of Determination: r² = SS_Reg / SS_Tot — the proportion of total variability modeled by the regression line's variability.

Page 47:

Multilinear Regression

Testing for linear association between a population response variable Y and multiple predictor variables X₁, X₂, X₃, … etc.

    Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + … + β_{k−1}X_{k−1} + ε    "Response = Model + Error"
    Ŷ = β̂₀ + β̂₁X₁ + β̂₂X₂ + … + β̂_{k−1}X_{k−1}

H₀: β₁ = β₂ = β₃ = … = β_{k−1} = 0   "No linear association between Y and any of its predictors X₁, X₂, X₃, …, X_{k−1}."
H_A: βᵢ ≠ 0 for some i = 1, 2, …, k − 1   "Linear association between Y and at least one of its predictors."

The βᵢXᵢ terms are the "main effects." For now, assume the "additive model," i.e., main effects only.

Page 48:

Multilinear Regression

With two predictors, Ŷ = β̂₀ + β̂₁X₁ + β̂₂X₂ is a plane over the (X₁, X₂) plane: at predictors (x₁ᵢ, x₂ᵢ), the true response is yᵢ, the fitted response is ŷᵢ, and the residual is eᵢ = yᵢ − ŷᵢ.

Least Squares calculation of the regression coefficients is computer-intensive; the formulas require Linear Algebra (matrices)!

Once calculated, how do we then test the null hypothesis? ANOVA

Page 49:

Multilinear Regression — "main effects"

Testing for linear association between a population response variable Y and multiple predictor variables X₁, X₂, X₃, … etc.

    Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + … + β_{k−1}X_{k−1} + ε    "Response = Model + Error"

R code example: lsreg = lm(y ~ x1 + x2 + x3)

Page 50:

Multilinear Regression — "main effects" + quadratic terms, etc. ("polynomial regression")

    Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + … + β_{k−1}X_{k−1}
        + β_{1,1}X₁² + β_{2,2}X₂² + … + β_{k−1,k−1}X_{k−1}² + cubes + … + ε

R code example: lsreg = lm(y ~ x1 + x2 + x3)
R code example: lsreg = lm(y ~ x + I(x^2) + I(x^3))   # I() makes ^ arithmetic, not a formula operator

Page 51:

Multilinear Regression — "main effects" + quadratic terms ("polynomial regression") + "interactions"

    Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + … + β_{k−1}X_{k−1}
        + β_{1,1}X₁² + β_{2,2}X₂² + … + β_{k−1,k−1}X_{k−1}² + cubes + …
        + β_{1,2}X₁X₂ + β_{1,3}X₁X₃ + … + β_{1,k−1}X₁X_{k−1}
        + β_{2,3}X₂X₃ + β_{2,4}X₂X₄ + … + β_{2,k−1}X₂X_{k−1} + … + ε

R code example: lsreg = lm(y ~ x + I(x^2) + I(x^3))
R code example: lsreg = lm(y ~ x1 + x2 + x1:x2)
R code example: lsreg = lm(y ~ x1*x2)   # equivalent to x1 + x2 + x1:x2
Page 52: 7.1 - Motivation 7.1 - Motivation 7.2 - Correlation / Simple Linear Regression 7.2 - Correlation / Simple Linear Regression 7.3 - Extensions of Simple
Page 53: 7.1 - Motivation 7.1 - Motivation 7.2 - Correlation / Simple Linear Regression 7.2 - Correlation / Simple Linear Regression 7.3 - Extensions of Simple
Page 54: 7.1 - Motivation 7.1 - Motivation 7.2 - Correlation / Simple Linear Regression 7.2 - Correlation / Simple Linear Regression 7.3 - Extensions of Simple
Page 55: 7.1 - Motivation 7.1 - Motivation 7.2 - Correlation / Simple Linear Regression 7.2 - Correlation / Simple Linear Regression 7.3 - Extensions of Simple
Page 56: 7.1 - Motivation 7.1 - Motivation 7.2 - Correlation / Simple Linear Regression 7.2 - Correlation / Simple Linear Regression 7.3 - Extensions of Simple

Recall…

Example in R (reformatted for brevity):

> I = c(1,1,1,1,1,0,0,0,0,0)

> lsreg = lm(y ~ x*I)> summary(lsreg)

Coefficients:

Estimate(Intercept) 6.56463x 0.00998I 6.80422x:I 1.60858

Suppose these are actually two subgroups, requiring two distinct linear regressions!

Multiple Linear Reg with interactionwith an indicator (“dummy”) variable:

ˆ 6.56 0.01 6.80 1.61Y X I X I

I = 1

I = 0

ˆ 6.56 0.01Y X

ˆ 13.36 1.62Y X 0 1 2 3

ˆ ˆ ˆ ˆY X I X I
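The split into two sub-models follows directly from setting I = 0 or I = 1 in the fitted equation; a Python sketch using the coefficients above:

```python
# Splitting the interaction model by the dummy variable.
b0, b1, b2, b3 = 6.56463, 0.00998, 6.80422, 1.60858   # (Intercept), x, I, x:I

# I = 0 subgroup: Y_hat = b0 + b1 * X
intercept_0, slope_0 = b0, b1

# I = 1 subgroup: Y_hat = (b0 + b2) + (b1 + b3) * X
intercept_1 = b0 + b2   # 13.36885 (slide rounds to 13.36)
slope_1 = b1 + b3       # 1.61856  (slide rounds to 1.62)
```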

Page 57:

ANOVA Table (revisited)

From a sample of n data points, fit Ŷ = β̂₀ + β̂₁X₁ + β̂₂X₂ + … + β̂_{k−1}X_{k−1} to the model Y = β₀ + β₁X₁ + β₂X₂ + … + β_{k−1}X_{k−1} + ε.

H₀: β₁ = β₂ = β₃ = … = β_{k−1} = 0   "No linear association between Y and any of its predictors X₁, X₂, X₃, …, X_{k−1}."
    Note that if true, it would follow that μ_Y = β₀.
H_A: βᵢ ≠ 0 for some i = 1, 2, …, k − 1   "Linear association between Y and at least one of its predictors."
    Note that if H₀ were true, it would follow that β̂₀ = ȳ.

But how are these regression coefficients calculated in general? "Normal equations" solved via computer (intensive).


ANOVA Table (revisited)

H0: β1 = β2 = β3 = … = βk−1 = 0
  "No linear association between Y and any of its predictors X1, X2, X3, …, Xk−1."

Fitted model:  Ŷ = β̂0 + β̂1 X1 + β̂2 X2 + … + β̂k−1 Xk−1  (based on n data points).

  Source       df       SS                 MS                F               p-value
  Regression   k − 1    Σᵢ (ŷᵢ − ȳ)²       SSReg / (k − 1)   MSReg / MSErr   from F(k−1, n−k); 0 < p < 1
  Error        n − k    Σᵢ (yᵢ − ŷᵢ)²      SSErr / (n − k)
  Total        n − 1    Σᵢ (yᵢ − ȳ)²

(each sum taken over i = 1, …, n; each MS = SS / df)

*** How are only the statistically significant variables determined? ***
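A small numeric illustration (hypothetical sums of squares, not from the slides) of how the table's entries combine:

```python
# SSTotal splits into SSReg + SSErr; each MS is SS / df; F = MSReg / MSErr.
# The p-value would then come from the F distribution with (k-1, n-k) df.
n, k = 30, 4                     # n data points, k coefficients (incl. intercept)
ss_total = 500.0                 # sum (y_i - ybar)^2    (hypothetical)
ss_reg   = 380.0                 # sum (yhat_i - ybar)^2 (hypothetical)
ss_err   = ss_total - ss_reg     # sum (y_i - yhat_i)^2

ms_reg = ss_reg / (k - 1)        # Regression df = k - 1
ms_err = ss_err / (n - k)        # Error df      = n - k
F = ms_reg / ms_err
print(round(F, 3))
```

A large F (regression mean square dominating error mean square) gives a small p-value and rejects H0.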


"MODEL SELECTION" (Backward Elimination, "BE")

Step 0. Conduct an overall F-test of significance (via ANOVA) of the full model

  Y = β1 X1 + β2 X2 + β3 X3 + β4 X4 + ……

If significant, then…

Step 1. Individual t-tests:

  H0: β1 = 0    H0: β2 = 0    H0: β3 = 0    H0: β4 = 0    ……
  p1 < .05      p2 < .05      p3 ≥ .05      p4 < .05      ……
  Reject H0     Reject H0     Accept H0     Reject H0     ……

Step 2. Are all coefficients significant at level α? If not, delete that term, and recompute new coefficients!

  Y = β1 X1 + β2 X2 + β4 X4 + ……

Step 3. Repeat Steps 1-2 as necessary until all remaining coefficients are significant → reduced model

  Y = β1 X1 + β2 X2 + β4 X4 + ……
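The loop above can be sketched in Python. This is a hypothetical illustration (simulated data, and p-values via a normal approximation rather than the exact t distribution), not the course's prescribed implementation:

```python
# Backward elimination: refit after each deletion, dropping the least
# significant predictor until every remaining p-value is below alpha.
import numpy as np
from math import erf, sqrt

def ols_fit(X, y):
    """OLS coefficients and two-sided p-values (normal approximation)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)                       # error variance estimate
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    z = np.abs(beta / se)
    pvals = np.array([2 * (1 - 0.5 * (1 + erf(zi / sqrt(2)))) for zi in z])
    return beta, pvals

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))                            # candidate predictors X1..X4
y = 1.0 + 2*X[:, 0] + 3*X[:, 1] + 1.5*X[:, 3] + rng.normal(size=n)  # X3 is pure noise

alpha, cols = 0.05, [0, 1, 2, 3]
while cols:
    D = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, pvals = ols_fit(D, y)
    worst = int(np.argmax(pvals[1:]))                  # least significant (skip intercept)
    if pvals[1:][worst] < alpha:
        break                                          # all terms significant: reduced model
    del cols[worst]                                    # delete that term, recompute
print(cols)
```

The key point is Step 2's "recompute": coefficients and p-values are refit after every deletion, because removing one predictor changes the estimates for the rest.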


MODEL ASSUMPTIONS?

Recall ~ Analysis of Variance (ANOVA): k ≥ 2 independent, equivariant, normally-distributed "treatment groups," with responses Y1, Y2, …, Yk and means μ1, μ2, …, μk.

  H0: μ1 = μ2 = … = μk


“Regression Diagnostics”


Re-plot data on a “log-log” scale.


Re-plot data on a "log" scale (of Y only).


Binary outcome, e.g., “Have you ever had surgery?” (Yes / No)

Simple logistic regression, fitted via "MAXIMUM LIKELIHOOD ESTIMATION" rather than least squares:

  "log-odds" ("logit"):  ln( π̂ / (1 − π̂) ) = β̂0 + β̂1 X

  equivalently,  π̂ = 1 / (1 + e^−(β̂0 + β̂1 X))

The log-odds ("logit") is an example of a general "link function" g(π).

(Note: not based on LS, which implies "pseudo-R²," etc.)
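As a quick illustration of the logit link and its inverse (hypothetical coefficients, not fitted values from the slides):

```python
# Hypothetical coefficients b0, b1, chosen only for illustration.
from math import exp, log

b0, b1 = -4.0, 0.1

def pi_hat(x):
    """Estimated P(Y = 1 | X = x): inverse of the logit link."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * x)))

def logit(p):
    """The link function g(pi) = ln(pi / (1 - pi))."""
    return log(p / (1 - p))

# Round trip: applying the link to the fitted probability recovers
# the linear predictor b0 + b1*x
print(logit(pi_hat(50)))
```

The linear model lives on the log-odds scale; the inverse link maps it back to a probability between 0 and 1.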

Binary outcome, e.g., "Have you ever had surgery?" (Yes / No)

Multiple logistic regression:

  "log-odds" ("logit"):  ln( π̂ / (1 − π̂) ) = β̂0 + β̂1 X1 + β̂2 X2 + … + β̂k Xk

  equivalently,  π̂ = 1 / (1 + e^−(β̂0 + β̂1 X1 + β̂2 X2 + … + β̂k Xk))

Suppose one of the predictor variables is binary:

  X1 = 1 if Age ≥ 50,  X1 = 0 if Age < 50.

  X1 = 1:  ln( π̂1 / (1 − π̂1) ) = β̂0 + β̂1 + β̂2 X2 + … + β̂k Xk

  X1 = 0:  ln( π̂0 / (1 − π̂0) ) = β̂0 + β̂2 X2 + … + β̂k Xk

SUBTRACT!

  ln( π̂1 / (1 − π̂1) ) − ln( π̂0 / (1 − π̂0) ) = β̂1

that is,

  β̂1 = ln[ (π̂1 / (1 − π̂1)) / (π̂0 / (1 − π̂0)) ]
      = ln( odds of surgery given Age ≥ 50 / odds of surgery given Age < 50 )
      = ln(ÔR)  ………….. implies …………..  ÔR = e^β̂1    (ODDS RATIO)
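A tiny worked example (hypothetical value of β̂1, not an estimate from the slides) of reading the coefficient as a log odds ratio:

```python
# Exponentiating the coefficient of a binary predictor gives the odds ratio
# comparing X1 = 1 (Age >= 50) to X1 = 0 (Age < 50), other predictors fixed.
from math import exp, log

beta1_hat = 0.7                       # hypothetical fitted coefficient for X1
OR_hat = exp(beta1_hat)               # estimated odds ratio, OR^ = e^{beta1^}

# Going the other way: the log of the ratio of the two groups' odds
# recovers beta1_hat
odds_age_ge_50 = 0.60                 # hypothetical odds of surgery, Age >= 50
odds_age_lt_50 = odds_age_ge_50 / OR_hat
print(round(OR_hat, 3), round(log(odds_age_ge_50 / odds_age_lt_50), 3))
```

So a coefficient of 0.7 would correspond to roughly doubled odds of surgery for the older group.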


Example in population dynamics

Unrestricted population growth (e.g., bacteria): population size y obeys

  dy/dt = a y,  with constant a > 0.

Separating variables:  (1/y) dy = a dt  ⟹  ln|y| = a t + b  ⟹  y = e^(a t + b) = C e^(a t).

With initial condition y(0) = y0:

  y = y0 e^(a t)    (exponential growth).

Restricted population growth (disease, predation, starvation, etc.): population size y obeys the same law, damped by a "carrying capacity" M:

  dy/dt = a y (1 − y/M),  constant a > 0.

Let the survival probability be π = y / M, so that

  dπ/dt = a π (1 − π).

Separating variables and using partial fractions:

  dπ / [π (1 − π)] = a dt  ⟹  [1/π + 1/(1 − π)] dπ = a dt
  ⟹  ln|π| − ln|1 − π| = a t + b  ⟹  ln( π / (1 − π) ) = a t + b.

With initial condition π(0) = π0:

  π(t) = π0 / (π0 + (1 − π0) e^(−a t))    (logistic growth).
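The closed-form logistic solution can be sanity-checked against the differential equation numerically. A simple forward-Euler integration with hypothetical parameter values:

```python
# Check that pi(t) = pi0 / (pi0 + (1 - pi0) e^{-a t}) solves
# dpi/dt = a * pi * (1 - pi), using hypothetical a and pi0.
from math import exp

a, pi0 = 0.8, 0.05

def pi_exact(t):
    """Closed-form logistic growth curve with pi(0) = pi0."""
    return pi0 / (pi0 + (1 - pi0) * exp(-a * t))

# Forward-Euler integration of the ODE from t = 0 to t = 10
dt, p = 1e-4, pi0
for _ in range(100_000):                 # 100000 steps of size 1e-4 -> t = 10
    p += dt * a * p * (1 - p)

print(abs(p - pi_exact(10.0)))           # small discretization error
```

The S-shaped curve this produces is the same logistic function used as the inverse link in logistic regression above.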