topic 12 simple linear regression and correlation

72
TOPIC 12 Simple Linear Regression and Correlation

Upload: magdalen-poole

Post on 17-Dec-2015

246 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TOPIC 12 Simple Linear Regression and Correlation

TOPIC 12TOPIC 12

Simple Linear Regression and

Correlation

Simple Linear Regression and

Correlation

Page 2: TOPIC 12 Simple Linear Regression and Correlation

• Representation of some phenomenon• Mathematical model is a mathematical

expression of some phenomenon• Often describe relationships between

variables• Types

– Deterministic models– Probabilistic models

ModelsModels

Page 3: TOPIC 12 Simple Linear Regression and Correlation

• Hypothesize exact relationships

• Suitable when prediction error is negligible

• Example: force is exactly mass times acceleration– F = m·a

Deterministic ModelsDeterministic Models

Page 4: TOPIC 12 Simple Linear Regression and Correlation

• Hypothesize two components– Deterministic– Random error

• Example: sales volume (y) is 10 times advertising spending (x) + random error– y = 10x + – Random error may be due to factors

other than advertising

Probabilistic ModelsProbabilistic Models

Page 5: TOPIC 12 Simple Linear Regression and Correlation

ProbabilisticModels

Regression Models

Correlation Models

Regression ModelsRegression Models

Page 6: TOPIC 12 Simple Linear Regression and Correlation

• Answers ‘What is the relationship between the variables?’

• Equation used– One numerical dependent (response) variable

What is to be predicted– One or more numerical or categorical

independent (explanatory) variables• Used mainly for prediction and estimation

Regression ModelsRegression Models

Page 7: TOPIC 12 Simple Linear Regression and Correlation

1. Hypothesize deterministic component

2. Estimate unknown model parameters

3. Specify probability distribution of random error term• Estimate standard deviation of error

4. Evaluate model

5. Use model for prediction and estimation

Regression Modeling StepsRegression Modeling Steps

Page 8: TOPIC 12 Simple Linear Regression and Correlation

Simple

1 ExplanatoryVariable

RegressionModels

2+ ExplanatoryVariables

Multiple

LinearNon-

LinearLinear

Non-Linear

Types of Regression ModelsTypes of Regression Models

Page 9: TOPIC 12 Simple Linear Regression and Correlation

y x 0 1

Relationship between variables is a linear function

Dependent (Response) Variable

Independent (Explanatory) Variable

Population Slope

Population y-intercept

Random Error

Linear Regression ModelLinear Regression Model

Page 10: TOPIC 12 Simple Linear Regression and Correlation

Population

$ $

$

$

$

Unknown Relationship

0 1y x

Random Sample

$ $

$ $

0 1ˆ ˆ ˆy x

Population & Sample Regression ModelPopulation & Sample Regression Model

Page 11: TOPIC 12 Simple Linear Regression and Correlation

y

x

0 1i i iy x

0 1E y x

Observedvalue

Observed value

i = Random error

(line of means)

Population Linear Regression ModelPopulation Linear Regression Model

Page 12: TOPIC 12 Simple Linear Regression and Correlation

y

x

0 1ˆ ˆ ˆi i iy x

0 1ˆ ˆˆi iy x

Unsampled observation

i = Random

error

Observed value

^

Sample Linear Regression ModelSample Linear Regression Model

Page 13: TOPIC 12 Simple Linear Regression and Correlation

1. Plot of all (xi, yi) pairs

2. Suggests how well model will fit

0204060

0 20 40 60

x

y

Scatter DiagramScatter Diagram

Page 14: TOPIC 12 Simple Linear Regression and Correlation

0204060

0 20 40 60

x

y

• How would you draw a line through the points?• How do you determine which line ‘fits best’?

Thinking ChallengeThinking Challenge

Page 15: TOPIC 12 Simple Linear Regression and Correlation

• ‘Best fit’ means difference between actual y values and predicted y values are a minimum– But positive differences off-set negative

2 2

1 1

ˆ ˆn n

ii ii i

y y

• Least Squares minimizes the Sum of the Squared Differences (SSE)

Least Squares MethodLeast Squares Method

Page 16: TOPIC 12 Simple Linear Regression and Correlation

e2

y

x

e1 e

3

e4

^^

^^

2 0 1 2 2ˆ ˆ ˆy x

0 1ˆ ˆˆi iy x

2 2 2 2 21 2 3 4

1

ˆ ˆ ˆ ˆ ˆLS minimizes n

ii

Least Squares GraphicallyLeast Squares Graphically

Page 17: TOPIC 12 Simple Linear Regression and Correlation

1 1

11 2

12

1

ˆ

n n

i ini i

i ixy i

nxx

ini

ii

x y

x ySS nSS

x

xn

Slope

y-intercept

0 1ˆ ˆy x Prediction Equation

0 1ˆ ˆy x

Coefficient EquationsCoefficient Equations

Page 18: TOPIC 12 Simple Linear Regression and Correlation

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ Sales (Units)1 12 13 24 25 4

Find the least squares line relatingsales and advertising.

Least Squares ExampleLeast Squares Example

Page 19: TOPIC 12 Simple Linear Regression and Correlation

2 2

1 1 1 1 1

2 1 4 1 2

3 2 9 4 6

4 2 16 4 8

5 4 25 16 20

15 10 55 26 37

xi yi xi yi xiyi

Parameter Estimation Solution TableParameter Estimation Solution Table

Page 20: TOPIC 12 Simple Linear Regression and Correlation

1 1

11 2 2

12

1

15 1037

5ˆ .7015

555

n n

i ini i

i ii

n

ini

ii

x y

x yn

x

xn

ˆ .1 .7y x

0 1ˆ ˆ 2 .70 3 .10y x

Parameter Estimation SolutionParameter Estimation Solution

Page 21: TOPIC 12 Simple Linear Regression and Correlation

Parameter Estimates

Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|INTERCEP 1 -0.1000 0.6350 -0.157 0.8849ADVERT 1 0.7000 0.1914 3.656 0.0354

0^

1^

ˆ .1 .7y x

Parameter Estimation Computer OutputParameter Estimation Computer Output

Page 22: TOPIC 12 Simple Linear Regression and Correlation

0

1

2

3

4

0 1 2 3 4 5

Sales

Advertising

ˆ .1 .7y x

Regression Line Fitted to the DataRegression Line Fitted to the Data

Page 23: TOPIC 12 Simple Linear Regression and Correlation

You’re an economist for the county cooperative. You gather the following data:

Fertilizer (lb.) Yield (lb.) 4 3.0 6 5.510 6.512 9.0

Find the least squares line

Relating crop yield and fertilizer.

Least Squares Thinking ChallengeLeast Squares Thinking Challenge

Page 24: TOPIC 12 Simple Linear Regression and Correlation

4 3.0 16 9.00 12

6 5.5 36 30.25 33

10 6.5 100 42.25 65

12 9.0 144 81.00 108

32 24.0 296 162.50 218

xi2yi xi yi xiyi

2

Parameter Estimation Solution TableParameter Estimation Solution Table

Page 25: TOPIC 12 Simple Linear Regression and Correlation

1 1

11 2 2

12

1

32 24218

4ˆ .6532

2964

n n

i ini i

i ii

n

ini

ii

x y

x yn

x

xn

0 1ˆ ˆ 6 .65 8 .80y x

ˆ .8 .65y x

Parameter Estimation SolutionParameter Estimation Solution

Page 26: TOPIC 12 Simple Linear Regression and Correlation

02468

10

0 5 10 15

Yield (lb.)

Fertilizer (lb.)

ˆ .8 .65y x

Regression Line Fitted to the Data*Regression Line Fitted to the Data*

Page 27: TOPIC 12 Simple Linear Regression and Correlation

1. Mean of probability distribution of error, ε, is 0

2. Probability distribution of error has constant variance

3. Probability distribution of error, ε, is normal

4. Errors are independent

Linear Regression AssumptionsLinear Regression Assumptions

Page 28: TOPIC 12 Simple Linear Regression and Correlation

x1 x2 x3

yE(y) = β0 + β1x

x

Error Probability DistributionError Probability Distribution

Page 29: TOPIC 12 Simple Linear Regression and Correlation

• Measured by standard error of regression model– Sample standard deviation of : s^

• Affects several factors– Parameter significance– Prediction accuracy

^• Variation of actual y from predicted y, y

Random Error VariationRandom Error Variation

Page 30: TOPIC 12 Simple Linear Regression and Correlation

y

xxi

0 1ˆ ˆˆi iy x

yi 2ˆ( )i iy y

Unexplained sum of squares

2( )iy yTotal sum of squares

2ˆ( )iy yExplained sum of squares

y

Variation MeasuresVariation Measures

Page 31: TOPIC 12 Simple Linear Regression and Correlation

22 ˆ2 i i

SSEs where SSE y y

n

2

2

SSEs s

n

Estimation of σ2Estimation of σ2

Page 32: TOPIC 12 Simple Linear Regression and Correlation

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ Sales (Units)1 12 13 24 25 4

Find SSE, s2, and s.

Calculating SSE, s2, s ExampleCalculating SSE, s2, s Example

Page 33: TOPIC 12 Simple Linear Regression and Correlation

1 1

2 1

3 2

4 2

5 4

xi yi

.6

1.3

2

2.7

3.4

ˆy y

.4

-.3

0

-.7

.6

2ˆ( )y yˆ .1 .7y x

.16

.09

0

.49

.36

SSE=1.1

Calculating SSE SolutionCalculating SSE Solution

Page 34: TOPIC 12 Simple Linear Regression and Correlation

Calculating s2 and s Solution

2 1.1.36667

2 5 2

SSEs

n

.36667 .6055s

Estimation of σ2Estimation of σ2

Page 35: TOPIC 12 Simple Linear Regression and Correlation

• Shows if there is a linear relationship between x and y

• Involves population slope 1

• Hypotheses – H0: 1 = 0 (No Linear Relationship)

– Ha: 1 0 (Linear Relationship)

• Theoretical basis is sampling distribution of slope

Test of Slope CoefficientTest of Slope Coefficient

Page 36: TOPIC 12 Simple Linear Regression and Correlation

y

Population Linex

Sample 1 Line

Sample 2 Line

b1

Sampling Distribution

1

1S

^

^

All Possible Sample Slopes

Sample 1: 2.5Sample 2: 1.6 Sample 3: 1.8Sample 4: 2.1 : :

Very large number of sample slopes

Sampling Distribution of Sample SlopesSampling Distribution of Sample Slopes

Page 37: TOPIC 12 Simple Linear Regression and Correlation

1

1 1

ˆ

2

12

1

ˆ ˆ2

wherexx

n

ini

xx ii

t df nsS

SS

x

SS xn

Slope Coefficient Test StatisticSlope Coefficient Test Statistic

Page 38: TOPIC 12 Simple Linear Regression and Correlation

You’re a marketing analyst for Hasbro Toys.

You find β0 = –.1, β1 = .7 and s = .6055.Ad $ Sales (Units)1 12 13 24 25 4

Is the relationship significant at the .05 level of significance?

^^

Test of Slope Coefficient ExampleTest of Slope Coefficient Example

Page 39: TOPIC 12 Simple Linear Regression and Correlation

xi yi xi2 yi

2 xiyi

1 1 1 1 1

2 1 4 1 2

3 2 9 4 6

4 2 16 4 8

5 4 25 16 20

15 10 55 26 37

Solution TableSolution Table

Page 40: TOPIC 12 Simple Linear Regression and Correlation

1

1

ˆ 2

1

ˆ

.6055.1914

1555

5

ˆ .703.657

.1914

xx

SS

SS

tS

Test Statistic SolutionTest Statistic Solution

Page 41: TOPIC 12 Simple Linear Regression and Correlation

• H0: 1 = 0

• Ha: 1 0• .05• df 5 - 2 = 3• Critical Value(s):

Test Statistic:

Decision:

Conclusion:

1

1

ˆ

ˆ .703.657

.1914t

S

Reject at = .05

There is evidence of a relationshipt0 3.182-3.182

.025

Reject H0 Reject H0

.025

Test of Slope Coefficient SolutionTest of Slope Coefficient Solution

Page 42: TOPIC 12 Simple Linear Regression and Correlation

Parameter Estimates Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|INTERCEP 1 -0.1000 0.6350 -0.157 0.8849ADVERT 1 0.7000 0.1914 3.656 0.0354

t = 1 / S

P-Value

S11 1

^^^^

Test of Slope Coefficient Computer OutputTest of Slope Coefficient Computer Output

Page 43: TOPIC 12 Simple Linear Regression and Correlation

ProbabilisticModels

Regression Models

Correlation Models

Correlation Model MapCorrelation Model Map

Page 44: TOPIC 12 Simple Linear Regression and Correlation

• Answers ‘How strong is the linear relationship between two variables?’

• Coefficient of correlation– Sample correlation coefficient denoted r– Values range from –1 to +1– Measures degree of association– Does not indicate cause–effect

relationship

Correlation ModelsCorrelation Models

Page 45: TOPIC 12 Simple Linear Regression and Correlation

xy

xx yy

SSr

SS SS

2

2

2

2

xx

yy

xy

xSS x

n

ySS y

n

x ySS xy

n

where

Coefficient of CorrelationCoefficient of Correlation

Page 46: TOPIC 12 Simple Linear Regression and Correlation

–1.0 +1.00–.5 +.5

No Linear Correlation

Perfect Negative

Correlation

Perfect Positive

Correlation

Increasing degree of positive correlation

Increasing degree of negative correlation

Coefficient of Correlation ValuesCoefficient of Correlation Values

Page 47: TOPIC 12 Simple Linear Regression and Correlation

You’re a marketing analyst for Hasbro Toys.

Ad $ Sales (Units)1 12 13 24 25 4

Calculate the coefficient ofcorrelation.

Coefficient of Correlation ExampleCoefficient of Correlation Example

Page 48: TOPIC 12 Simple Linear Regression and Correlation

xi yi xi2 yi

2 xiyi

1 1 1 1 1

2 1 4 1 2

3 2 9 4 6

4 2 16 4 8

5 4 25 16 20

15 10 55 26 37

Solution TableSolution Table

Page 49: TOPIC 12 Simple Linear Regression and Correlation

22

2

22

2

(15)55 10

5

(10)26 6

5

(15)(10)37 7

5

xx

yy

xy

xSS x

n

ySS y

n

x ySS xy

n

7.904

10 6xy

xx yy

SSr

SS SS

Coefficient of Correlation SolutionCoefficient of Correlation Solution

Page 50: TOPIC 12 Simple Linear Regression and Correlation

You’re an economist for the county cooperative. You gather the following data:

Fertilizer (lb.) Yield (lb.) 4 3.0 6 5.510 6.512 9.0

Find the coefficient of correlation.

Coefficient of Correlation Thinking ChallengeCoefficient of Correlation Thinking Challenge

Page 51: TOPIC 12 Simple Linear Regression and Correlation

4 3.0 16 9.00 12

6 5.5 36 30.25 33

10 6.5 100 42.25 65

12 9.0 144 81.00 108

32 24.0 296 162.50 218

xi2yi xi yi xiyi

2

Solution TableSolution Table

Page 52: TOPIC 12 Simple Linear Regression and Correlation

22

2

22

2

(32)296 40

4

(24)162.5 18.5

4

(32)(24)218 26

4

xx

yy

xy

xSS x

n

ySS y

n

x ySS xy

n

26.956

40 18.5xy

xx yy

SSr

SS SS

Coefficient of Correlation SolutionCoefficient of Correlation Solution

Page 53: TOPIC 12 Simple Linear Regression and Correlation

Proportion of variation ‘explained’ by relationship between x and y

2 Explained Variation

Total Variationyy

yy

SS SSEr

SS

0 r2 1

r2 = (coefficient of correlation)2

Coefficient of DeterminationCoefficient of Determination

Page 54: TOPIC 12 Simple Linear Regression and Correlation

You’re a marketing analyst for Hasbro Toys.

You know r = .904.

Ad $ Sales (Units)1 12 13 24 25 4

Calculate and interpret thecoefficient of determination.

Coefficient of Determination ExampleCoefficient of Determination Example

Page 55: TOPIC 12 Simple Linear Regression and Correlation

r2 = (coefficient of correlation)2

r2 = (.904)2

r2 = .817

Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.

Coefficient of Determination SolutionCoefficient of Determination Solution

Page 56: TOPIC 12 Simple Linear Regression and Correlation

Root MSE 0.60553 R-square 0.8167 Dep Mean 2.00000 Adj R-sq 0.7556 C.V. 30.27650

r2 adjusted for number of explanatory variables & sample size

r2

r2 Computer Outputr2 Computer Output

Page 57: TOPIC 12 Simple Linear Regression and Correlation

• Types of predictions– Point estimates– Interval estimates

• What is predicted– Population mean response E(y) for given x

Point on population regression line

– Individual response (yi) for given x

Prediction With Regression ModelsPrediction With Regression Models

Page 58: TOPIC 12 Simple Linear Regression and Correlation

Mean y, E(y)

y

^yIndividual

Prediction, y

E(y) = b0 + b1x

^x

xP

^ ^y i =

b 0 + b 1x

What is Predicted?What is Predicted?

Page 59: TOPIC 12 Simple Linear Regression and Correlation

Confidence Interval Estimate for Mean Value of y at x = xp

xx

p

SS

xx

nSty

2

2/

df = n – 2

Confidence Interval EstimateConfidence Interval Estimate

Page 60: TOPIC 12 Simple Linear Regression and Correlation

1. Level of confidence (1 – )• Width increases as confidence increases

2. Data dispersion (s)• Width increases as variation increases

3. Sample size• Width decreases as sample size increases

4. Distance of xp from mean • Width increases as distance increases

Factors Affecting Interval WidthFactors Affecting Interval Width

x

Page 61: TOPIC 12 Simple Linear Regression and Correlation

Sample 2 Line

y

xx1 x2

y

Sample 1 LineGreater dispersion than x1

x

Why Distance from MeanWhy Distance from Mean

Page 62: TOPIC 12 Simple Linear Regression and Correlation

You’re a marketing analyst for Hasbro Toys.

You find β0 = -.1, β 1 = .7 and s = .6055.

Ad $ Sales (Units)1 12 13 24 25 4

Find a 95% confidence interval forthe mean sales when advertising is $4.

^^

Confidence Interval Estimate ExampleConfidence Interval Estimate Example

Page 63: TOPIC 12 Simple Linear Regression and Correlation

x i y i x i2 y i

2 x iy i

1 1 1 1 1

2 1 4 1 2

3 2 9 4 6

4 2 16 4 8

5 4 25 16 20

15 10 55 26 37

Solution TableSolution Table

Page 64: TOPIC 12 Simple Linear Regression and Correlation

2

/ 2

2

ˆ .1 .7 4 2.7

4 312.7 3.182 .6055

5 10

1.645 ( ) 3.755

p

xx

x xy t s

n SS

y

E Y

x to be predicted

Confidence Interval Estimate SolutionConfidence Interval Estimate Solution

Page 65: TOPIC 12 Simple Linear Regression and Correlation

Prediction Interval of Individual Value of y at x = xp

Note!

2

/ 2

1ˆ 1

p

xx

x xy t S

n SS

df = n – 2

Prediction Interval of Individual ValuePrediction Interval of Individual Value

Page 66: TOPIC 12 Simple Linear Regression and Correlation

Expected(Mean) y

yy we're trying topredict

Prediction, y

xxp

eE(y) = b0 + b1x

^^

^y i =

b 0 + b 1x i

Why the Extra ‘S’ ?Why the Extra ‘S’ ?

Page 67: TOPIC 12 Simple Linear Regression and Correlation

You’re a marketing analyst for Hasbro Toys.

You find β0 = -.1, β 1 = .7 and s = .6055.

Ad $ Sales (Units)1 12 13 24 25 4

Predict the sales when advertising is $4. Use a 95% prediction interval.

^^

Prediction Interval ExamplePrediction Interval Example

Page 68: TOPIC 12 Simple Linear Regression and Correlation

x i y i x i2 y i

2 x iy i

1 1 1 1 1

2 1 4 1 2

3 2 9 4 6

4 2 16 4 8

5 4 25 16 20

15 10 55 26 37

Solution TableSolution Table

Page 69: TOPIC 12 Simple Linear Regression and Correlation

2

/ 2

2

4

1ˆ 1

ˆ .1 .7 4 2.7

4 312.7 3.182 .6055 1

5 10

.503 4.897

p

xx

x xy t s

n SS

y

y

x to be predicted

Prediction Interval SolutionPrediction Interval Solution

Page 70: TOPIC 12 Simple Linear Regression and Correlation

Dep Var Pred Std Err Low95% Upp95% Low95% Upp95%Obs SALES Value Predict Mean Mean Predict Predict 1 1.000 0.600 0.469 -0.892 2.092 -1.837 3.037 2 1.000 1.300 0.332 0.244 2.355 -0.897 3.497 3 2.000 2.000 0.271 1.138 2.861 -0.111 4.111 4 2.000 2.700 0.332 1.644 3.755 0.502 4.897 5 4.000 3.400 0.469 1.907 4.892 0.962 5.837

Predicted y when x = 4

Confidence Interval

SYPredictionInterval

Interval Estimate Computer OutputInterval Estimate Computer Output

Page 71: TOPIC 12 Simple Linear Regression and Correlation

x

y

x

^^ ^

y i = b 0 +

b 1x i

Confidence Intervals vs. Prediction IntervalsConfidence Intervals vs. Prediction Intervals

Page 72: TOPIC 12 Simple Linear Regression and Correlation