SIMPLE LINEAR REGRESSION Ms. Khatijahhusna Abd Rani School of Electrical System Engineering Sem II 2014/2015



• Regression analysis explores the relationship between a quantitative response variable and one or more explanatory variables.

• One explanatory/independent variable: simple linear regression (SLR)
• More than one explanatory/independent variable: multiple linear regression (MLR)

• 3 major objectives:

i. Description: to describe the effect of income on expenditure

ii. Control: to increase the export of rubber by controlling other factors such as price

iii. Prediction: to predict the price of houses based on lot size & location

1) A nutritionist studying weight-loss programs might want to find out if reducing carbohydrate intake can help a person lose weight.
a) X is the carbohydrate intake (independent variable).
b) Y is the weight (dependent variable).

2) An entrepreneur might want to know whether increasing the cost of packaging his new product will have an effect on the sales volume.
a) X is the cost of packaging (independent variable).
b) Y is the sales volume (dependent variable).


EXAMPLE 4.1

EXAMPLE 4.2

FIRST: PLOT YOUR DATA!
A scatter plot is a graph of the ordered pairs (x, y), from (X₁, Y₁) to (X₈, Y₈), consisting of the independent variable X and the dependent variable Y.

• Can we use a known value of temperature (X) to help predict the number of pairs of gloves (Y)?

THE QUESTION IS…..

FITTING A LINE

USING THIS LINE FOR PREDICTION

1. Is this a good-fitting line?
2. Is a line a reasonable summary of the relationship between the variables?

Negative relationship: as the number of absences increases, the final grade decreases

Positive relationship: as the number of cars rented increases, revenue tends to increase

No relationship

• Linear regression: we assume a linear relationship between X and Y

E(Y|X) = β₀ + β₁X

Expectation of Y for a given value of X

β₀: intercept; β₁: slope

The observed values of Y vary about the line

β₀ and β₁ are parameters that we do not know; they must be estimated!

• We will use sample data to obtain the estimated regression line:

Ŷ = β̂₀ + β̂₁X

No error term… WHY??

because my predicted value of Y will fall precisely on this line

How are we going to estimate the two parameter values?

We usually use the method of least squares to estimate β̂₀ and β̂₁.

LEAST SQUARES REGRESSION LINE

• Recall the assumed relationship between Y and X:

Yᵢ = β₀ + β₁Xᵢ + εᵢ

We use data to find the estimated regression line:

Ŷ = β̂₀ + β̂₁X

How are we going to choose β̂₀ and β̂₁ wisely, so that we have a good regression line?

Positive (+ve) residual/error

Negative (−ve) residual/error

• What is the best line?

• Minimize the residuals eᵢ = Yᵢ − Ŷᵢ

• β̂₀ and β̂₁ are chosen to minimize the sum of the squared residuals:

Σeᵢ² = Σ(Yᵢ − Ŷᵢ)² = Σ(Yᵢ − β̂₀ − β̂₁Xᵢ)²

This is called the method of least squares.
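The least-squares formulas can be sketched in Python; the data below are illustrative, not from the lecture's examples.

```python
# A minimal sketch of the method of least squares for SLR,
# using illustrative (made-up) data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# S_xy and S_xx, the building blocks of the least-squares estimates
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

b1 = s_xy / s_xx          # estimated slope (beta1-hat)
b0 = y_bar - b1 * x_bar   # estimated intercept (beta0-hat)

print(round(b1, 4), round(b0, 4))
```

The same estimates could be obtained with a library routine (e.g. NumPy's `polyfit`); the point here is that β̂₁ = Sxy/Sxx and β̂₀ = ȳ − β̂₁x̄.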

Assumptions About the Error Term

1. The error ε is a random variable with mean of zero.

2. The variance of ε, denoted by σ², is the same for all values of the independent variable.

3. The values of ε are independent.

4. The error ε is a normally distributed random variable.

EXAMPLE 4.2

• Solution:

Given: n = 8, Σx = 598, Σy = 283, Σxy = 21091, Σx² = 44824, Σy² = 10049

Sxy = Σxy − (Σx)(Σy)/n = 21091 − (598)(283)/8 = −63.25

Sxx = Σx² − (Σx)²/n = 44824 − (598)²/8 = 123.5

Syy = Σy² − (Σy)²/n = 10049 − (283)²/8 = 37.875

β̂₁ = Sxy/Sxx = −63.25/123.5 = −0.5121

β̂₀ = ȳ − β̂₁x̄ = 35.375 − (−0.5121)(74.75) = 73.6545

ŷ = 73.6545 − 0.5121x
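As a check, the Example 4.2 estimates can be reproduced from the slide's summary statistics. Note the slide rounds β̂₁ to 0.5121 before computing β̂₀, so its intercept (73.6545) differs slightly from the unrounded value.

```python
# Reproducing the Example 4.2 computations from the summary
# statistics given on the slide (n = 8 observations).
n = 8
sum_x, sum_y = 598, 283
sum_xy, sum_x2, sum_y2 = 21091, 44824, 10049

s_xy = sum_xy - sum_x * sum_y / n    # -63.25
s_xx = sum_x2 - sum_x ** 2 / n       # 123.5
s_yy = sum_y2 - sum_y ** 2 / n       # 37.875

b1 = s_xy / s_xx                     # slope, about -0.5121
b0 = sum_y / n - b1 * (sum_x / n)    # intercept, about 73.66

print(round(b0, 4), round(b1, 4))
```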

USING THIS LINE FOR PREDICTION

• When X increases by 1 unit, Y decreases by 0.5121 units.

ŷ = 73.6545 − 0.5121x

At x = 74: ŷ = 73.6545 − 0.5121(74) = 35.76 ≈ 36 pairs of gloves
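A small sketch of using the fitted line for prediction, with the slide's rounded coefficients:

```python
# Predicting the number of pairs of gloves at temperature x = 74,
# using the fitted line from Example 4.2 (rounded coefficients).
b0, b1 = 73.6545, -0.5121

def predict(x):
    return b0 + b1 * x

y_hat = predict(74)
print(round(y_hat, 2))   # 35.76, i.e. about 36 pairs of gloves
```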

• The coefficient of determination is a measure of the variation of the dependent variable (Y) that is explained by the regression line and the independent variable (X).

• The symbol for the coefficient of determination is r² or R².

0 ≤ r² ≤ 1

COEFFICIENT OF DETERMINATION (R2)

• If r =0.90, then r2 =0.81. It means that 81% of the variation in the dependent variable (Y) is accounted for by the variations in the independent variable (X).

• The rest of the variation, 0.19 or 19%, is unexplained and called the coefficient of nondetermination.

• The formula for the coefficient of nondetermination is (1.00 − r²).

• Relationship Among SST, SSR, SSE

where: SST = total sum of squares SSR = sum of squares due to regression SSE = sum of squares due to error

SST = SSR + SSE

Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)²

The coefficient of determination is:

r² = SSR/SST = explained variation / total variation = Sxy² / (Sxx·Syy)

where:
SSR = sum of squares due to regression
SST = total sum of squares


Refer Example 4.2

r² = Sxy²/(Sxx·Syy) = (−63.25)²/[(123.5)(37.875)] = 0.855

It means that 85.5% of the variation in the dependent variable (Y: number of pairs of gloves) is explained by the variations in the independent variable (X: temperature).
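This r² value can be verified from the S-values computed earlier on the slides:

```python
# Coefficient of determination for Example 4.2, from the
# S-values computed earlier (Sxy, Sxx, Syy).
s_xy, s_xx, s_yy = -63.25, 123.5, 37.875

r_squared = s_xy ** 2 / (s_xx * s_yy)
print(round(r_squared, 3))   # 0.855
```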

• Correlation measures the strength of a linear relationship between the two variables.

• Also known as Pearson’s product moment coefficient of correlation.

• The symbol for the sample coefficient of correlation is r; the population coefficient of correlation is ρ.

• Formula :

COEFFICIENT OF CORRELATION (r)

r = Sxy / √(Sxx·Syy)

Properties of r:

• −1 ≤ r ≤ 1
• Values of r close to 1 imply a strong positive linear relationship between x and y.
• Values of r close to −1 imply a strong negative linear relationship between x and y.
• Values of r close to 0 imply little or no linear relationship between x and y.

Refer Example 4.2: Number of pairs of gloves

Solution:

r = Sxy/√(Sxx·Syy) = −63.25/√((123.5)(37.875)) = −0.92

or

r² = 0.855 ⇒ r = ±√0.855 = ±0.92

Thus, there is a strong negative linear relationship between temperature (x) and the number of pairs of gloves (y).
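A quick numeric check of r, using `math.sqrt` from the standard library:

```python
import math

# Correlation coefficient for Example 4.2, from the S-values
# computed earlier on the slides.
s_xx, s_yy, s_xy = 123.5, 37.875, -63.25

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 2))   # -0.92
```

Squaring r recovers the coefficient of determination, r² ≈ 0.855.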

Next, refer to the equation ŷ = 73.6545 − 0.5121x: the relationship is negative, since the sign of β̂₁ is negative.

• To determine whether X provides information in predicting Y, we proceed with testing the hypothesis.

TEST OF SIGNIFICANCE

Two tests are commonly used:

F-test and t-test

1. Hypotheses:

H₀: β₁ = 0 (no linear relationship)
H₁: β₁ ≠ 0 (linear relationship exists)

2. Significance level, α

3. Rejection Region

t-test

Critical-value approach: reject H₀ if t_test > t_{α/2, n−2} or t_test < −t_{α/2, n−2}

P-value approach: reject H₀ if p-value < α

4. Test Statistic

t-test

t_test = β̂₁ / se(β̂₁), where se(β̂₁) = √var(β̂₁) and

var(β̂₁) = (Syy − β̂₁Sxy) / [(n − 2)Sxx]

5. Decision Rule

Reject H₀ if: t_test > t_{α/2, n−2} or t_test < −t_{α/2, n−2}, or p-value < α

6. Conclusion

There is a significant relationship between variables X and Y.

Refer Example 4.2

1. Hypotheses:

H₀: β₁ = 0 (no linear relationship)
H₁: β₁ ≠ 0 (linear relationship exists)

2. Significance level, α = 0.05

3. Rejection region:

Reject H₀ if t_test > t_{0.025,6} = 2.447 or t_test < −t_{0.025,6} = −2.447

4. Test Statistic

var(β̂₁) = (Syy − β̂₁Sxy)/[(n − 2)Sxx] = [37.875 − (−0.5121)(−63.25)]/(6 × 123.5) = 0.0074

t_test = β̂₁/se(β̂₁) = −0.5121/√0.0074 = −5.953

5. Decision Rule

Since t_test = −5.953 < −t_{0.025,6} = −2.447, we reject H₀.

6. Conclusion

We conclude that the temperature is linearly related to the number of pairs of gloves produced.
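The t statistic can be verified numerically from the S-values. The slide, rounding β̂₁ to 0.5121 first, reports a magnitude of 5.953; the unrounded value is about 5.954.

```python
import math

# Checking the Example 4.2 t-test from the S-values computed
# earlier on the slides (n = 8).
n = 8
s_xx, s_yy, s_xy = 123.5, 37.875, -63.25
b1 = s_xy / s_xx                    # about -0.5121

# var(b1) = (Syy - b1*Sxy) / ((n - 2) * Sxx), about 0.0074
var_b1 = (s_yy - b1 * s_xy) / ((n - 2) * s_xx)
t_test = b1 / math.sqrt(var_b1)     # about -5.95

print(round(var_b1, 4), round(t_test, 3))
```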

• We may also use the analysis of variance approach to test significance of regression.

• The ANOVA approach involves the partitioning of total variability in the response variable Y.

• SST (total sum of squares).

• If SST=0, all observations are the same.

• The greater SST is, the greater the variation among the Y observations.

ANOVA

• SSE (error sum of squares).

• If SSE=0, all observations fall on the fitted regression line.

• The larger the SSE, the greater is the variation of the Y observations around the regression line.


• SSR (Regression sum of squares)

• SSR: a measure of the variability of the Y values associated with the regression line.

• The larger SSR is in relation to SST, the greater the effect of the regression relation in accounting for the total variation in the Y observations.


ANOVA (F-TEST)

1. Hypotheses:

H₀: β₁ = 0 (no linear relationship)
H₁: β₁ ≠ 0 (linear relationship exists)

2. Significance level, α = 0.05

3. Rejection Region

Critical-value approach: reject H₀ if F_test > f_{0.05,1,6} = 5.987

4. Test Statistic

F_test = MSR/MSE

To calculate MSR and MSE, first compute the regression sum of squares (SSR) and the error sum of squares (SSE):

SSR = Sxy²/Sxx = (−63.25)²/123.5 = 32.3933, so MSR = SSR/1 = 32.3933

SSE = SST − SSR = 37.875 − 32.3933 = 5.4817, so MSE = SSE/(n − 2) = 5.4817/6 = 0.9136

F_test = MSR/MSE = 32.3933/0.9136 = 35.46

5. Decision Rule

Since F_test = 35.46 > f_{0.05,1,6} = 5.987, we reject H₀.

6. Conclusion

We can conclude that there is a significant relationship between variables X and Y. Alternatively, we conclude that the regression model is significant.
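The F computation can be checked from the same S-values; note that in simple linear regression F_test equals t_test².

```python
# Sketch of the ANOVA F-test for Example 4.2, using the S-values
# computed earlier (n = 8, SST = Syy for this example).
n = 8
s_xx, s_yy, s_xy = 123.5, 37.875, -63.25

ssr = s_xy ** 2 / s_xx   # regression sum of squares, about 32.39
sst = s_yy               # total sum of squares, 37.875
sse = sst - ssr          # error sum of squares, about 5.48

msr = ssr / 1            # regression has 1 df in SLR
mse = sse / (n - 2)      # error df is n - 2 = 6

f_test = msr / mse
print(round(f_test, 2))  # about 35.46
```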

Chapter 4