
Upload: chad-doyle

Post on 28-Dec-2015


Linear Regression

When looking for a linear relationship between two sets of data we can plot what is known as a scatter diagram.

[Figure: scatter diagram of y against x]

Looking at the graph we can see that there is some positive correlation.


It is possible to draw a line through the data called a regression line. There are two types: y on x and x on y.

First, let's consider the y on x regression line.

[Figure: scatter diagram with the y on x regression line, $y = ax + b$]

The y on x regression line is drawn so that the sum of the squares of the vertical distances from the points to the line is kept to a minimum.

Note: The equation of this line is called the "equation of the least squares regression line".
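As an illustrative sketch, the quantity being minimised can be written as a small Python function. The function name `vertical_sse` and the four sample points are assumptions made for illustration only; they are not from the slides.

```python
def vertical_sse(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample points, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

# The least squares y on x line for these points is y = 1.4x + 0.5.
best = vertical_sse(xs, ys, 1.4, 0.5)

# Changing the slope or the intercept makes the sum of squares larger.
steeper = vertical_sse(xs, ys, 1.5, 0.5)
higher = vertical_sse(xs, ys, 1.4, 0.7)

print(best, steeper, higher)
```

Any other choice of slope and intercept for these points should give a sum of squares at least as large as `best`.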

Now consider the x on y regression line.

[Figure: scatter diagram with the x on y regression line, $x = cy + d$]

The x on y regression line is drawn so that the sum of the squares of the horizontal distances from the points to the line is kept to a minimum.

Drawing both lines on the same graph, we have:

[Figure: scatter diagram showing the y on x line, $y = ax + b$, and the x on y line, $x = cy + d$, crossing at the point $(\bar{x}, \bar{y})$]

We should note that both lines pass through the means of both sets of data, $(\bar{x}, \bar{y})$.

It is possible to calculate the equations of the y on x and x on y regression lines.

Important formulae

The y on x regression line is of the form $y = ax + b$ and can be calculated by using the formula

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}\,(x - \bar{x})$$

where $s_{xy}$ is called the covariance and links the x and y data,

$$s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{\sum xy}{n} - \bar{x}\,\bar{y},$$

and $s_x^2$ is the variance of the x data,

$$s_x^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2.$$
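The y on x formula can be turned into a few lines of Python. This is a minimal sketch: the function name `regression_y_on_x` and the test points are assumptions for illustration, and the variance uses division by n as in the formulae above.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the y on x regression line y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance s_xy and variance of x, both dividing by n.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs) / n
    a = s_xy / s_xx
    # The line passes through (x_bar, y_bar), which fixes the intercept.
    b = y_bar - a * x_bar
    return a, b


# Hypothetical points lying exactly on y = 2x + 1.
a, b = regression_y_on_x([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because the sample points lie exactly on a line, the function should recover that line's slope and intercept.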

The x on y regression line is of the form $x = cy + d$ and can be calculated by using the formula

$$x - \bar{x} = \frac{s_{xy}}{s_y^2}\,(y - \bar{y})$$

where $s_{xy}$ is called the covariance and links the x and y data,

$$s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{\sum xy}{n} - \bar{x}\,\bar{y},$$

and $s_y^2$ is the variance of the y data,

$$s_y^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{\sum y^2}{n} - \bar{y}^2.$$
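The x on y formula has the same shape with the roles of x and y swapped. The sketch below assumes a hypothetical helper `regression_x_on_y` and made-up sample points.

```python
def regression_x_on_y(xs, ys):
    """Return (c, d) for the x on y regression line x = c*y + d."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Same covariance as for y on x, but divided by the variance of y.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_yy = sum((y - y_bar) ** 2 for y in ys) / n
    c = s_xy / s_yy
    d = x_bar - c * y_bar
    return c, d


# Hypothetical sample points, chosen only for illustration.
c, d = regression_x_on_y([1, 2, 3, 4], [2, 3, 5, 6])
print(c, d)
```

For these points the y on x slope is 1.4, while rearranging the x on y line gives a slope of 1/0.7, roughly 1.43, so the two regression lines are close but not identical.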


Example

In the table below are the results of ten students in both their Mathematics and Physics examinations. The teacher thinks there might be a relationship between the two. His hypothesis is “a student who has Mathematical ability also has ability in Physics.”

Mathematics mark /100 (x)    Physics mark /100 (y)
61                           56
34                           45
24                           15
89                           92
47                           61
67                           57
82                           75
6                            8
53                           47
89                           76

Drawing a scatter graph

[Figure: A scatter graph to show Mathematics against Physics results. Horizontal axis: Mathematics result /100 (x), 0 to 100. Vertical axis: Physics result /100 (y), 0 to 100.]

Now calculating the regression lines

x      y      x - x̄     y - ȳ     (x - x̄)²    (y - ȳ)²    (x - x̄)(y - ȳ)
61     56     5.8       2.8       33.64       7.84        16.24
34     45     -21.2     -8.2      449.44      67.24       173.84
24     15     -31.2     -38.2     973.44      1459.24     1191.84
89     92     33.8      38.8      1142.44     1505.44     1311.44
47     61     -8.2      7.8       67.24       60.84       -63.96
67     57     11.8      3.8       139.24      14.44       44.84
82     75     26.8      21.8      718.24      475.24      584.24
6      8      -49.2     -45.2     2420.64     2043.04     2223.84
53     47     -2.2      -6.2      4.84        38.44       13.64
89     76     33.8      22.8      1142.44     519.84      770.64
Total
552    532                        7091.60     6191.60     6266.60

$$\bar{x} = \frac{\sum x}{n} = \frac{552}{10} = 55.2, \qquad \bar{y} = \frac{\sum y}{n} = \frac{532}{10} = 53.2$$

$$s_x^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{7091.60}{10} = 709.16$$

$$s_y^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{6191.60}{10} = 619.16$$

$$s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{6266.60}{10} = 626.66$$
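The column totals and summary statistics for the ten students can be checked with a short Python sketch. The variable names are illustrative choices; the data are the marks from the example.

```python
# Marks of the ten students from the example.
maths = [61, 34, 24, 89, 47, 67, 82, 6, 53, 89]
physics = [56, 45, 15, 92, 61, 57, 75, 8, 47, 76]

n = len(maths)
x_bar = sum(maths) / n
y_bar = sum(physics) / n

# Column totals of the squared deviations and of the cross products.
sum_dx2 = sum((x - x_bar) ** 2 for x in maths)
sum_dy2 = sum((y - y_bar) ** 2 for y in physics)
sum_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(maths, physics))

# Variances and covariance, dividing by n as in the formulae.
var_x = sum_dx2 / n
var_y = sum_dy2 / n
cov_xy = sum_dxdy / n

print(round(x_bar, 2), round(y_bar, 2))
print(round(var_x, 2), round(var_y, 2), round(cov_xy, 2))
```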

$$\bar{x} = 55.2, \quad \bar{y} = 53.2, \quad s_x^2 = 709.16, \quad s_y^2 = 619.16, \quad s_{xy} = 626.66$$

For the regression line y on x, which has the form $y = ax + b$:

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}\,(x - \bar{x})$$

$$y - 53.2 = \frac{626.66}{709.16}\,(x - 55.2)$$

$$y = 0.884x + 4.42$$

For the regression line x on y, which has the form $x = cy + d$:

$$x - \bar{x} = \frac{s_{xy}}{s_y^2}\,(y - \bar{y})$$

$$x - 55.2 = \frac{626.66}{619.16}\,(y - 53.2)$$

$$x = 1.012y + 1.36$$
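As a check on the arithmetic, the coefficients of both regression lines can be recomputed from the raw marks. This is a sketch with illustrative variable names; it uses the same divide-by-n definitions as the formulae.

```python
# Marks of the ten students from the example.
maths = [61, 34, 24, 89, 47, 67, 82, 6, 53, 89]
physics = [56, 45, 15, 92, 61, 57, 75, 8, 47, 76]

n = len(maths)
x_bar = sum(maths) / n
y_bar = sum(physics) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(maths, physics)) / n
s_xx = sum((x - x_bar) ** 2 for x in maths) / n
s_yy = sum((y - y_bar) ** 2 for y in physics) / n

# y on x: y = a*x + b
a = s_xy / s_xx
b = y_bar - a * x_bar

# x on y: x = c*y + d
c = s_xy / s_yy
d = x_bar - c * y_bar

print(f"y on x: y = {a:.3f}x + {b:.2f}")
print(f"x on y: x = {c:.3f}y + {d:.2f}")
```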

Plotting both lines on the scatter diagram

[Figure: A scatter graph to show Mathematics against Physics results, with the y on x and x on y regression lines drawn and crossing at $(\bar{x}, \bar{y})$. Horizontal axis: Mathematics result /100 (x). Vertical axis: Physics result /100 (y).]

For y on x, $y = 0.884x + 4.42$, and for x on y, $x = 1.012y + 1.36$.

Note: For the x on y line, remember to rearrange it into the following form before trying to plot it:

$$x = 1.012y + 1.36$$

$$x - 1.36 = 1.012y$$

$$y = \frac{x - 1.36}{1.012}$$
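The rearrangement can be sketched in Python as a helper that converts the x on y coefficients into a slope and intercept for plotting against the x axis. The function name `x_on_y_as_y_of_x` is a hypothetical choice.

```python
def x_on_y_as_y_of_x(c, d):
    """Rewrite x = c*y + d as y = m*x + k and return (m, k). Requires c != 0."""
    return 1.0 / c, -d / c


# Coefficients of the x on y line from the example.
m, k = x_on_y_as_y_of_x(1.012, 1.36)

# Check with one point: y = 10 gives x = 1.012*10 + 1.36 on the original line,
# and the rearranged form should map that x back to y = 10.
x_value = 1.012 * 10 + 1.36
y_back = m * x_value + k
print(m, k, y_back)
```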


Correlation

We need a way to determine if there is linear correlation or not. So we calculate what is known as the Product-Moment Correlation Coefficient (r).

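$$r = \frac{s_{xy}}{s_x s_y}$$

where

$$s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} \quad \text{(covariance)},$$

$$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad \text{(standard deviation of x)},$$

$$s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} \quad \text{(standard deviation of y)}.$$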
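A minimal Python sketch of this formula is given below. The function name `pmcc` and the two small data sets are assumptions for illustration.

```python
import math


def pmcc(xs, ys):
    """Product-moment correlation coefficient r = s_xy / (s_x * s_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return s_xy / (s_x * s_y)


# Hypothetical points on a rising line and on a falling line.
r_up = pmcc([0, 1, 2, 3], [1, 3, 5, 7])
r_down = pmcc([0, 1, 2, 3], [7, 5, 3, 1])
print(r_up, r_down)
```

Points lying exactly on a rising line give r = 1, and points lying exactly on a falling line give r = -1. The function is undefined when either data set has zero spread, since the denominator is then zero.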

Looking at the following five sets of data, we can see that the quantity r tells us something about the degree of scatter of the two sets of data when we are looking for a linear relationship.

Table 1

x    0    5    10   15   20   25   30   35
y    38   28   26   19   17   8    5    1

[Figure: Table 1 scatter plot of y against x, showing the initial data, the y on x line, the x on y line and the coordinates of the mean]

y on x: $y = -1.024x + 35.667$

x on y: $x = -0.957y + 34.484$

The product moment correlation coefficient: $r = -0.990$

In Table 1 we notice that the two regression lines (y on x and x on y) nearly coincide and that as the x-data increases the y-data decreases. The value of r is -0.990, which is close to -1. Here we have what is called strong negative linear correlation.
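The two regression lines for Table 1 can be reproduced from the data with the formulae given earlier. This is a sketch; the helper name `regression_lines` is an assumption.

```python
def regression_lines(xs, ys):
    """Return ((a, b), (c, d)) for y = a*x + b (y on x) and x = c*y + d (x on y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs) / n
    s_yy = sum((y - y_bar) ** 2 for y in ys) / n
    a = s_xy / s_xx
    c = s_xy / s_yy
    return (a, y_bar - a * x_bar), (c, x_bar - c * y_bar)


# Table 1 data.
xs = [0, 5, 10, 15, 20, 25, 30, 35]
ys = [38, 28, 26, 19, 17, 8, 5, 1]

(a, b), (c, d) = regression_lines(xs, ys)
print(f"y on x: y = {a:.3f}x + {b:.3f}")
print(f"x on y: x = {c:.3f}y + {d:.3f}")
```

Both slopes come out negative, which matches the downward trend in the data.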

Table 2

x    0    5    10   15   20   25   30   35
y    23   30   20   23   15   32   20   2

[Figure: Table 2 scatter plot of y against x, showing the initial data, the y on x line, the x on y line and the coordinates of the mean]

y on x: $y = -0.402x + 27.667$

x on y: $x = -0.695y + 31.835$

The product moment correlation coefficient: $r = -0.529$

In Table 2, the two regression lines are further apart, although there is still weak negative linear correlation. The value of r is -0.529, which is closer to 0.

Table 3

x    0    5    10   15   20   25   30   35
y    5    31   19   23   30   32   20   6

[Figure: Table 3 scatter plot of y against x, showing the initial data, the y on x line, the x on y line and the coordinates of the mean]

y on x: $y = -0.00476x + 20.833$

x on y: $x = -0.00632y + 17.631$

The product moment correlation coefficient: $r = -0.00548$

In Table 3, the two regression lines are virtually perpendicular and there is no linear correlation. The value of r is -0.00548, which is very close to 0.

Table 4

x    0    5    10   15   20   25   30   35
y    12   17   23   9    12   38   18   40

[Figure: Table 4 scatter plot of y against x, showing the initial data, the y on x line, the x on y line and the coordinates of the mean]

y on x: $y = 0.593x + 10.75$

x on y: $x = 0.632y + 4.148$

The product moment correlation coefficient: $r = 0.612$

In Table 4, the two regression lines are again apart, but we notice that as the x-data increases the y-data increases. We say there is weak positive linear correlation. The value of r is 0.612, which is moving away from 0 and getting closer to 1.
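The values of r quoted for Tables 1 to 4 can be recomputed from the data in the tables. The sketch below uses a hypothetical helper `pmcc` and a dictionary keyed by table name; both are illustrative choices.

```python
import math


def pmcc(xs, ys):
    """Product-moment correlation coefficient, dividing by n throughout."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return s_xy / (s_x * s_y)


# The x values are shared by all of the tables.
xs = [0, 5, 10, 15, 20, 25, 30, 35]
tables = {
    "Table 1": [38, 28, 26, 19, 17, 8, 5, 1],
    "Table 2": [23, 30, 20, 23, 15, 32, 20, 2],
    "Table 3": [5, 31, 19, 23, 30, 32, 20, 6],
    "Table 4": [12, 17, 23, 9, 12, 38, 18, 40],
}

r_values = {name: pmcc(xs, ys) for name, ys in tables.items()}
for name, r in r_values.items():
    print(name, round(r, 3))
```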

Table 5

x    0    5    10   15   20   25   30   35
y    38   28   26   19   17   8    5    1

[Figure: Table 5 scatter plot of y against x, showing the initial data, the y on x line, the x on y line and the coordinates of the mean]

y on x: $y = 0.879x + 1.75$

x on y: $x = 1.116y - 1.605$

The product moment correlation coefficient: $r = 0.990$

In Table 5, we notice that the two regression lines (y on x and x on y) nearly coincide and that as the x-data increases the y-data increases. The value of r is 0.990, which is very close to 1. Here we have what is called strong positive linear correlation.

The value of r determines the degree of linear scatter of the two sets of data, and

$$-1 \le r \le 1$$

$r = -1$ indicates that the data have perfect negative linear correlation,

$r = 0$ indicates that the data have no linear correlation,

$r = 1$ indicates that the data have perfect positive linear correlation.

r is called the Product-Moment Correlation Coefficient.
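One way to see why the two regression lines nearly coincide when r is close to 1 or -1 is to multiply their slopes. Taking a as the slope of the y on x line and c as the slope of the x on y line, as defined in the formulae earlier, a short derivation gives:

$$a\,c = \frac{s_{xy}}{s_x^2} \cdot \frac{s_{xy}}{s_y^2} = \left(\frac{s_{xy}}{s_x s_y}\right)^2 = r^2$$

Rearranged into the form y = ..., the x on y line has slope 1/c, so the two lines have the same slope exactly when ac = 1, that is, when r is 1 or -1. As a rough check with the rounded Table 4 values, 0.593 × 0.632 ≈ 0.375, which agrees with 0.612² ≈ 0.375.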

Returning to our example

[Figure: A scatter graph to show Mathematics against Physics results, with the y on x and x on y regression lines crossing at $(\bar{x}, \bar{y})$. Horizontal axis: Mathematics result /100 (x). Vertical axis: Physics result /100 (y).]

$$r = \frac{s_{xy}}{s_x s_y} = \frac{626.66}{\sqrt{709.16 \times 619.16}} = 0.946$$

So we can conclude that, as r is close to 1, there is strong positive linear correlation, and the results suggest that the teacher's hypothesis that "a student who has Mathematical ability also has ability in Physics" might be true.
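The value of r for the example can be checked directly from the marks. This is a sketch with illustrative variable names.

```python
import math

# Marks of the ten students from the example.
maths = [61, 34, 24, 89, 47, 67, 82, 6, 53, 89]
physics = [56, 45, 15, 92, 61, 57, 75, 8, 47, 76]

n = len(maths)
x_bar = sum(maths) / n
y_bar = sum(physics) / n

# Covariance and standard deviations, dividing by n as in the formulae.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(maths, physics)) / n
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in maths) / n)
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in physics) / n)

r = s_xy / (s_x * s_y)
print(round(r, 3))
```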