1 Re-expressing Data Chapter 6 – Normal Model What if data do not follow a Normal model? Chapters 8 & 9 – Linear Model What if a relationship between two variables is not linear?

Upload: lorena-roberta-baldwin

Post on 22-Dec-2015


Page 1: Re-expressing Data

Chapter 6 – Normal Model
– What if data do not follow a Normal model?

Chapters 8 & 9 – Linear Model
– What if a relationship between two variables is not linear?

Page 2: Re-expressing Data

Re-expression is another name for changing the scale of (transforming) the data.

Usually we re-express the response variable, Y.

Page 3: Goals of Re-expression

Goal 1 – Make the distribution of the re-expressed data more symmetric.

Goal 2 – Make the spread of the re-expressed data more similar across groups.

Page 4: Goals of Re-expression

Goal 3 – Make the form of a scatter plot more linear.

Goal 4 – Make the scatter in the scatter plot more even across all values of the explanatory variable.

Page 5: Ladder of Powers

Power: 2

Re-expression: $y^2$

Comment: Use on left-skewed data.

Page 6: Ladder of Powers

Power: 1

Re-expression: $y$

Comment: No re-expression. Do not re-express the data if they are already well behaved.

Page 7: Ladder of Powers

Power: ½

Re-expression: $\sqrt{y}$

Comment: Use on count data or when scatter in a scatter plot tends to increase as the explanatory variable increases.

Page 8: Ladder of Powers

Power: “0”

Re-expression: $\log(y)$

Comments: Not really the “0” power. Use on right-skewed data. Measurements cannot be negative or zero.

Page 9: Ladder of Powers

Power: –½, –1

Re-expression: $\frac{1}{\sqrt{y}}$, $\frac{1}{y}$

Comments: Use on right-skewed data. Measurements cannot be negative or zero. Use on ratios.
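The ladder on Pages 5 to 9 can be sketched as a single function. This is only an illustration: the name `re_express` is hypothetical (it does not come from the slides), and the natural logarithm is assumed for the "0" rung because it agrees with the Log(Temp) fit later in the deck.

```python
import math


def re_express(y, power):
    """Re-express one measurement y using a rung of the ladder of powers.

    Hypothetical helper for illustration; power is one of 2, 1, 0.5, 0, -0.5, -1.
    """
    if power == 0:
        # The "0" rung stands for the logarithm, not a literal zero power.
        if y <= 0:
            raise ValueError("log needs a positive measurement")
        return math.log(y)  # natural log (an assumption, see lead-in)
    if power < 0 and y <= 0:
        raise ValueError("reciprocal rungs need a positive measurement")
    if power == 0.5 and y < 0:
        raise ValueError("square root needs a non-negative measurement")
    return y ** power


# Walk down the ladder for a single positive value.
for power in (2, 1, 0.5, 0, -0.5, -1):
    print(power, re_express(4.0, power))
```

The explicit checks mirror the slide comments: the log and reciprocal rungs reject zero or negative measurements.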

Page 10: Goal 1 - Symmetry

Data are obtained on the time between nerve pulses along a nerve fiber.

Time is rounded to the nearest half unit, where a unit is $\frac{1}{50}$th of a second.

– 30.5 represents $\frac{30.5}{50} = 0.61$ sec.

Page 11:

[Figure: histogram (Count) and Normal quantile plot of Time (1/50th sec)]

Page 12: Time – Nerve Pulses

Distribution is skewed right.

Sample mean (12.305) is much larger than the sample median (7.5).

Many potential outliers.

Data not from a Normal model.

Page 13:

[Figure: histogram (Count) and Normal quantile plot of Sqrt(Time)]

Page 14:

[Figure: histogram (Count) and Normal quantile plot of Log(Time)]

Page 15: Summary

Time – Highly skewed to the right.

Sqrt(Time) – Still skewed right.

Log(Time) – Fairly symmetric and mounded in the middle.

– Could have come from a Normal model.
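The comparison in this summary can be imitated numerically. The sketch below is hedged: the nerve-pulse measurements are not in this transcript, so it draws hypothetical right-skewed values (a lognormal sample, chosen purely for illustration) and reports a skewness coefficient for each re-expression. The helper `skewness` is written here and is not part of the slides.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: the mean cubed z-score (near 0 for a symmetric sample)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


random.seed(1)
# Hypothetical right-skewed "times"; NOT the nerve-pulse data from the slides.
times = [random.lognormvariate(2.0, 1.0) for _ in range(2000)]

# A mean well above the median is the same warning sign noted on Page 12.
print("mean:", round(statistics.mean(times), 2),
      "median:", round(statistics.median(times), 2))

skews = {
    "Time": skewness(times),
    "Sqrt(Time)": skewness([math.sqrt(t) for t in times]),
    "Log(Time)": skewness([math.log(t) for t in times]),
}
for name, value in skews.items():
    print(f"{name:>10}: skewness = {value:.2f}")
```

For data of this kind the skewness shrinks as the re-expression moves down the ladder, which is the pattern the summary describes. Real data need not behave this cleanly, so the plots remain the primary check.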

Page 16: Goal 3 – Straighten Up

What is the relationship between the temperature of coffee and the time since it was poured?

– Y, temperature (°F)

– X, time (minutes)

Page 17: Bivariate Fit of Temp By Time (min)

[Figure: scatter plot of Temp versus Time (min)]

Page 18: Cooling Coffee

There is a general negative association: as the time since the coffee was poured increases, the temperature of the coffee decreases.

Page 19: Linear Model

[Figure: plot of Temp (F) versus Time (min)]

Page 20: Linear Model

Fit Summary

– Predicted Temp = 176.7 – 1.56*Time

– On average, temperature decreases 1.56 °F per minute.

– $R^2 = 0.99$; 99% of the variation in temperature is explained by the linear relationship with time.
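To show where a fit summary like this comes from, here is a minimal least-squares sketch. The raw coffee measurements are not in this transcript, so the readings below are hypothetical values on an assumed cooling curve and will not reproduce the slide's coefficients exactly. The helper names `fit_line` and `r_squared` are illustrative.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y, intercept, slope):
    """Fraction of the variation in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


# Hypothetical readings on an assumed cooling curve; NOT the slide data.
times = list(range(0, 55, 5))                            # minutes since pouring
temps = [180.0 * math.exp(-0.0114 * t) for t in times]   # degrees F

intercept, slope = fit_line(times, temps)
r2 = r_squared(times, temps, intercept, slope)
print(f"Predicted Temp = {intercept:.1f} + ({slope:.2f})*Time, R^2 = {r2:.3f}")
```

The slope is read as the average change in temperature per minute, and `r2` as the share of variation explained by the line.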

Page 21: Plot of Residuals

[Figure: Residual versus Time (min)]

Page 22: Curved Pattern

There is a clear pattern in the plot of residuals versus time.

– Under predict, over predict, under predict.

The linear fit is very good, but we can do better.
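The under, over, under pattern can be reproduced with numbers already in the deck. Because the raw measurements are not in this transcript, the sketch below uses points on the exponential curve reported on Page 27 as a stand-in for the observed temperatures (an assumption made only for illustration) and compares them with the fitted line from Page 20.

```python
import math


def linear_prediction(minutes):
    """Fitted line from Page 20: Predicted Temp = 176.7 - 1.56*Time."""
    return 176.7 - 1.56 * minutes


def stand_in_temp(minutes):
    """Stand-in for an observed temperature, taken from the Page 27 curve."""
    return 180.3 * math.exp(-0.0114 * minutes)


pattern = []
for minutes in range(0, 55, 5):
    residual = stand_in_temp(minutes) - linear_prediction(minutes)
    # Positive residual: the line sits below the point, so it under predicts.
    label = "under" if residual > 0 else "over"
    print(f"{minutes:2d} min  residual {residual:+.2f}  ({label} predict)")
    # Record each run of identical labels only once.
    if not pattern or pattern[-1] != label:
        pattern.append(label)

print(pattern)
```

A straight line through a curve that bends upward sits below the points at both ends and above them in the middle, which is why the residuals change sign twice.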

Page 23: Bivariate Fit of Log(Temp) By Time (min)

[Figure: Log(Temp) versus Time (min), with Linear Fit]

Page 24: Log(Temp) by Time

Summary

– Predicted Log(Temp) = 5.1946 – 0.0114*Time

– On average, log temperature decreases 0.0114 log(°F) per minute.
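The two steps behind this summary (re-express Y, then fit a line) can be sketched as follows. The temperatures here are hypothetical: they are placed exactly on the fitted curve so that the sketch recovers the reported coefficients, which the real and noisier measurements would only approximate. The natural logarithm is assumed, and `fit_line` is an illustrative helper.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical temperatures placed exactly on the fitted curve; NOT the slide data.
times = list(range(0, 55, 5))                               # minutes
temps = [math.exp(5.1946 - 0.0114 * t) for t in times]      # degrees F

log_temps = [math.log(temp) for temp in temps]   # step 1: re-express Y
intercept, slope = fit_line(times, log_temps)    # step 2: fit a line to Log(Temp)
print(f"Predicted Log(Temp) = {intercept:.4f} + ({slope:.4f})*Time")
```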

Page 25: Plot of Residuals

[Figure: Residual versus Time (min)]

Page 26: Interpretation

There is a random scatter of points around the zero line.

The linear model relating Log(Temp) to Time is the best we can do.

Page 27: Original Scale?

Predicted Log(Temp) = 5.1946 – 0.0114*Time

Predicted Temp = $180.3\,e^{-0.0114 \times \text{Time}}$

– Predicted temp at time = 0: 180.3 °F

– The predicted temp in one more minute is the predicted temp now multiplied by $e^{-0.0114} = 0.98866$
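The back-transformation is plain arithmetic on the two fitted coefficients and can be checked directly. The variable and function names below are illustrative; only the coefficients 5.1946 and –0.0114 come from the slides.

```python
import math

log_intercept = 5.1946   # from Predicted Log(Temp) = 5.1946 - 0.0114*Time
log_slope = -0.0114


def predicted_temp(minutes):
    """Undo the natural log: Temp = exp(intercept + slope * Time)."""
    return math.exp(log_intercept + log_slope * minutes)


start_temp = math.exp(log_intercept)      # predicted temperature at Time = 0
per_minute_factor = math.exp(log_slope)   # multiplier applied each extra minute

print(f"Predicted temp at time 0: {start_temp:.1f} F")
print(f"Multiplier per minute:    {per_minute_factor:.5f}")
print(f"Predicted temp at 10 min: {predicted_temp(10):.1f} F")
```

Since the multiplier is about 0.9887, the fitted model says the predicted temperature falls by roughly 1.1% each minute, rather than by a fixed number of degrees.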

Page 28: JMP

Method 1

– Create a new column in JMP, Log(Temp): Cols – Formula – Transcendental – Log.

Page 29: JMP

Method 1 (continued)

– Fit Y by X

Y – Log(Temp)

X – Time

– Fit Linear

Page 30: JMP

Method 2

– Fit Y by X

Y – Temp

X – Time

– Fit Special

Transform Y – Log