Regression

Upload: annis-floyd

Post on 19-Dec-2015


Page 1: Regression. Correlation measures the strength of the linear relationship Great! But what is that relationship? How do we describe it? –regression, regression

Regression

Page 2:

Regression

• Correlation measures the strength of the linear relationship

• Great! But what is that relationship? How do we describe it?

– regression, regression line, regression equation

• Regression line is used for prediction

Page 3:

Predicting weights from heights

• Independent variable: height

• Dependent variable: weight

• How can we predict one from the other?

• Regression is to a scatter plot as the mean is to a histogram.

Page 4:

Weights vs. Heights

Page 5:

Salary by years employed

[Figure: scatter plot of SALARY (axis 20000 to 70000) against YRS EM (axis -5 to 30)]

Page 6:

Regression by local averages

Approximation of local averages by regression line

Inappropriate use of regression line (use other methods)

Page 7:

The equation of a line

• a represents the y-intercept

– when x equals zero, y equals a

– Is this always meaningful in the context of a problem?

– Is it always useful in defining a line?

• b represents the slope of the line (rise/run)

– for every unit change in x, y changes by b.

– Does this mean that if we physically change x by one unit, y will change by b units? Say we gain another year of experience. Will our salary go up by 1107?

$y = a + bx$
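As a small illustration of reading a and b, the sketch below evaluates a straight line in Python. The slope 1107 is the figure quoted in the salary question above; the intercept 28394 is assumed to belong to the same salary line, and the name `predict` is only an illustrative choice.

```python
# Straight line y = a + b*x with the salary example's coefficients.
A = 28394.0  # intercept (assumed): predicted salary at 0 years employed
B = 1107.0   # slope: difference in predicted salary per extra year employed

def predict(x, a=A, b=B):
    """Return the height of the line y = a + b*x at x."""
    return a + b * x

salary_0 = predict(0)           # equals the intercept a
salary_10 = predict(10)
salary_11 = predict(11)
step = salary_11 - salary_10    # equals the slope b anywhere on the line

print(salary_0, salary_10, step)
```

The slope compares average salaries of groups whose experience differs by one year. On its own it does not say what happens to one person's salary after one more year.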

Page 8:

Regression equation

• What is the predicted weight of somebody whose height is h cm?

• w = intercept + slope × h

• This is known as the regression equation.

• How do we get this formula?

• We have a statistical model

Page 9:

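Regression line by minimising residual errors

[Figure: scatter plot of SALARY (axis 20000 to 70000) against YRS EM (axis -5 to 30) with the fitted line; one point's deviation from the line is labelled "A residual"]

Fitted line: $y = 28394 + 1107x$

$y_i = a + b x_i + \varepsilon_i$, where $\varepsilon_i$ = error of the i-th obs. from the regression line

• The best candidate line will minimise these errors

• No line can make all errors vanish (some +ve, some –ve)

• Minimise the sum of squared errors, $\sum_i \varepsilon_i^2$

• Minimising gives the regression line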
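The least-squares idea can be sketched in a few lines of Python. This is a minimal illustration using the standard closed-form solution, not the procedure used to produce the slides; the data and the helper names `fit_line` and `sum_sq_errors` are made up.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_sq_errors(xs, ys, a, b):
    """Sum of squared vertical errors of the points from the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration only (not the salary data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
best = sum_sq_errors(xs, ys, a, b)

print(a, b, best)
```

Nudging either coefficient away from the fitted values increases the sum of squared errors. The residuals come out with mixed signs and sum to zero up to rounding.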

Page 10:

Regression and correlation

• Want to predict weight for those people who are 1 SD more than avg. height.

• SD line says: pred. wt. = overall avg. wt. + SD of wt.

• Regression line says: predicted wt. = overall avg. wt. + r × SD of wt.

• For people who are k SDs away from avg. height: predicted wt. = overall avg. wt. + r × k × SD of wt.

• Clearly valid for r = 0 or r = 1
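The "r × k × SD" rule can be checked numerically. The sketch below uses made-up heights and weights and compares the rule with a prediction read directly off a least-squares line; all names and data are illustrative assumptions.

```python
import math

# Made-up heights (cm) and weights (kg), for illustration only.
heights = [150.0, 160.0, 165.0, 170.0, 180.0, 185.0]
weights = [52.0, 58.0, 66.0, 63.0, 75.0, 82.0]

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Population SD (divide by n)."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

def corr(xs, ys):
    """Correlation coefficient r."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))

r = corr(heights, weights)
k = 1.0                                  # number of SDs above average height
h = mean(heights) + k * sd(heights)

# Least-squares line fitted directly, without using r.
mx, my = mean(heights), mean(weights)
slope = (sum((x - mx) * (y - my) for x, y in zip(heights, weights))
         / sum((x - mx) ** 2 for x in heights))
intercept = my - slope * mx

by_line = intercept + slope * h          # prediction from the fitted line
by_r = my + r * k * sd(weights)          # regression rule: avg + r * k * SD
by_sd_line = my + k * sd(weights)        # SD line: avg + k * SD

print(r, by_line, by_r, by_sd_line)
```

The first two predictions agree. Whenever 0 < r < 1, the SD-line prediction is larger than the regression prediction.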

Page 11:

RMS error of regression

• RMS error = $\sqrt{1 - r^2}$ × SD of y

• RMS error is inversely related to correlation: the closer |r| is to 1, the smaller the RMS error

RMS error is to regression what SD is to average
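To see how the RMS error shrinks as the correlation strengthens, the sketch below evaluates sqrt(1 - r^2) × SD of y for a few values of r. The SD of 10 and the function name `rms_error` are hypothetical.

```python
import math

def rms_error(sd_y, r):
    """RMS error of the regression line: sqrt(1 - r^2) * SD of y."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return math.sqrt(1.0 - r * r) * sd_y

SD_Y = 10.0  # hypothetical SD of y

# With r = 0 the line predicts no better than the average (RMS error = SD of y);
# with r = 1 the points lie exactly on the line (RMS error = 0).
table = [(r, rms_error(SD_Y, r)) for r in (0.0, 0.5, 0.8, 1.0)]

for r, err in table:
    print(r, round(err, 3))
```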

Page 12:

Residuals

residual = observed − predicted

Page 13:

Example: ozone vs. temperature

> air[,c(1,3)]
  ozone temperature
   3.45          67
   3.30          72
   2.29          74
   2.62          62
   2.84          65
   . . .

> cor(ozone,temperature)
[1] 0.7531038

Page 14:

Fitting a regression model in S

> ozone.lm <- lm(ozone ~ temperature, data = air)

Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)   -2.23       0.46   -4.82   0.0000
temperature    0.07       0.01   11.95   0.0000

Multiple R-Squared: 0.5672

> var(ozone)
[1] 0.7928069

> var(resid(ozone.lm))
[1] 0.3431544

> cor(ozone,temperature)
[1] 0.7531038
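The printed numbers tie together: R-squared is the squared correlation, and it also equals one minus the ratio of residual variance to the variance of ozone. A quick arithmetic check in Python, using only the values shown in the output above (the variable names are illustrative):

```python
# Values copied from the S output above.
VAR_OZONE = 0.7928069           # var(ozone)
VAR_RESID = 0.3431544           # var(resid(ozone.lm))
R = 0.7531038                   # cor(ozone, temperature)
R_SQUARED_REPORTED = 0.5672     # Multiple R-Squared

r_squared_from_cor = R ** 2                        # about 0.567
r_squared_from_var = 1.0 - VAR_RESID / VAR_OZONE   # about 0.567

# Ratio of residual spread to the spread of ozone; compare with sqrt(1 - r^2).
spread_ratio = (VAR_RESID / VAR_OZONE) ** 0.5
sqrt_one_minus_r2 = (1.0 - R ** 2) ** 0.5

print(r_squared_from_cor, r_squared_from_var, spread_ratio, sqrt_one_minus_r2)
```

The last two quantities agree to several decimal places, which is the RMS error relation (RMS error = sqrt(1 - r^2) × SD of y) expressed as a ratio.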

Page 15:

Checking model appropriateness

What assumptions have we made in the regression model?

Checking model assumptions in S-plus

> par(mfrow=c(2,3))

> plot(ozone.lm)

Page 16:

Residual diagnostics for ozone data

[Figure: six diagnostic panels: Residuals vs. Fitted : temperature; sqrt(abs(Residuals)) vs. fits; ozone vs. Fitted : temperature; Residuals vs. Quantiles of Standard Normal; Fitted Values and Residuals vs. f-value; Cook's Distance vs. Index. Labelled observations: 17, 20, 23, 45, 77]

Page 17:

Pizza party at the Frat.

• How many laps would you predict a pledge could run if he ate 6 slices of pizza?

• How many laps if he ate 9 slices of pizza?

• A pledge shows off and eats 35 slices of pizza. How many laps would you predict he would run?

[Figure: scatter plot of DISTANCE (axis 2 to 20) against SLICES (axis 0 to 12) with fitted line $y = 20 - 1.5x$, $r = -0.965$]

Beware of extrapolation
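The three questions can be worked through in Python. This sketch assumes the fitted line is y = 20 − 1.5x: the 20 and 1.5 come from the slide, but the signs did not survive extraction, so the negative slope is an assumption based on the axis ranges. The 0 to 12 range is the plotted SLICES axis, taken here as the range of the data, and `predict_laps` is an illustrative name.

```python
INTERCEPT = 20.0           # assumed intercept of the fitted line
SLOPE = -1.5               # assumed slope; the sign is an assumption
X_MIN, X_MAX = 0.0, 12.0   # plotted range of SLICES, taken as the data range

def predict_laps(slices):
    """Return (predicted laps, True if slices lies outside the data range)."""
    prediction = INTERCEPT + SLOPE * slices
    extrapolating = not (X_MIN <= slices <= X_MAX)
    return prediction, extrapolating

results = {s: predict_laps(s) for s in (6, 9, 35)}

for slices, (laps, extrapolating) in results.items():
    note = "  <- outside the data range" if extrapolating else ""
    print(slices, laps, note)
```

Under these assumptions the line gives 11 laps for 6 slices and 6.5 laps for 9 slices. For 35 slices it gives a negative number of laps, which is the point of the warning: the line describes the data only over the range in which it was fitted.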