Transforming Relationships: AP Statistics, Practice of Statistics, Section 4.1

Uploaded by darrell-morrison on 29-Dec-2015


What You’ll Learn

• Recognize when the relationship between two variables is an exponential relationship or a power relationship.

• Perform the appropriate transformation to “linearize” the data, find the LSRL (least-squares regression line) of the transformed points, and “untransform” to find a model for the original data.

Not Everything Is Linear!

We’ve looked at several sets of data in which the relationships are linear in nature.

What about relationships that exhibit a different, “nonlinear” pattern?

Consider for a moment the gypsy moth. An outbreak of gypsy moths in Massachusetts from 1978 to 1981 resulted in many acres of defoliated land. The acreages are listed in the following table.

Gypsy Moths

The data and graph depict the number of acres defoliated by gypsy moths in Massachusetts between 1978 and 1981.

Year                        1978     1979      1980      1981
Acres of defoliated land    63042    226260    907075    2826095

Calculator: Create a scatterplot. Enter the years in L1 and the acres in L2, then Stat Plot, On, scatterplot, and ZOOM 9 (ZoomStat).

So, this doesn’t look too bad! Let’s try a linear regression on the data, remembering to check both the correlation coefficient and the residual plot.

Calculator: STAT, CALC, 4 (LinReg(ax+b)); at Store RegEQ choose VARS, Y-VARS, Y1; then Calculate and GRAPH (the LSRL appears). Or: STAT, CALC, 4, then on the Y= screen choose VARS, 5 (Statistics), EQ, 1 (RegEQ).

Dependent variable: Acres
Independent variable: Year
Acres = -1.7746007E9 + 896997.4 (Year)
Sample size: 4
R (correlation coefficient) = 0.9136
R-sq = 0.8347045
Estimate of error standard deviation: 631139.44

Well, the line doesn’t look too bad visually, and that’s a strong correlation coefficient.

(Remember, though, that r can be deceptive; be sure to check the residuals!)

The Residuals

• A check of the residuals indicates that a linear model is NOT appropriate!

• (Notice the parabolic pattern in the plot, which can be seen even with only 4 data points!)

So, what type of relationship is this?

• Remember from linear regression that when the relationship is linear, the response variable increases (or decreases) by a constant amount for each one-unit increase in the explanatory variable (add or subtract the same number each time).

Years since 1977            1        2         3         4
Acres of defoliated land    63042    226260    907075    2826095
Difference in acres             163218    680815    1919020

• Notice that the differences between consecutive acreages are not constant.

• With this in mind, and given the problem with the residual plot, let’s consider another type of relationship.
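The difference row can be reproduced with a short Python sketch (the list simply restates the table’s acreages). For a linear relationship these first differences would be roughly constant; here each one is several times the previous.

```python
# Acres of defoliated land for years since 1977 = 1, 2, 3, 4
acres = [63042, 226260, 907075, 2826095]

# First differences: each year's acreage minus the previous year's
differences = [later - earlier for earlier, later in zip(acres, acres[1:])]
print(differences)  # [163218, 680815, 1919020]
```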

Exponential Relationships

In an exponential relationship, the response variable increases by a fixed percentage of the previous total.

Years since 1977            1        2         3         4
Acres of defoliated land    63042    226260    907075    2826095
Ratio (next/prev)               3.5890    4.0090    3.1156

• Notice that although the ratios are not exactly the same (we wouldn’t expect them to be exact with real data), they are fairly consistent.

In other words, we should be able to multiply the previous value by some constant to get the next one.

So, let’s check out this possibility, looking only at the increases over 1-year intervals.

Ratio: 226260/63042 ≈ 3.5890
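The ratio row can be checked the same way. This sketch divides each acreage by the previous one and also reports the mean ratio, a number that is worth remembering when we untransform the model.

```python
acres = [63042, 226260, 907075, 2826095]

# Ratio of each year's acreage to the previous year's (next / prev)
ratios = [later / earlier for earlier, later in zip(acres, acres[1:])]
mean_ratio = sum(ratios) / len(ratios)

print([round(q, 4) for q in ratios])  # [3.589, 4.009, 3.1156]
print(round(mean_ratio, 4))           # 3.5712
```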

So How Do We Create the Model?

• If the relationship is exponential, we can use a mathematical transformation to “linearize” the data, find the LSRL of the transformed data, and then “untransform” to find the model that will fit the original data.

OK, so let’s take it step by step.

Finding the Model

• Step 1: Use a mathematical transformation to “linearize” the data (create a new data set whose relationship is linear).

Year                        1978      1979      1980      1981
Acres of defoliated land    63042     226260    907075    2826095
Years since 1977            1         2         3         4
log10(acres)                4.7996    5.3546    5.9576    6.4512

If the original data are exponential, find the logarithm (either the common log or the natural log) of each of the response values.

When working with years, it is also helpful to “code” the year data so our calculators can handle the values (most computer programs can create models using the full year). To do this, we subtract 1977 from each year, so that all of our values are > 0.

Calculator: STAT, EDIT. Enter 1, 2, 3, 4 in L1. Move up to select the L3 header, enter log(L2), and press ENTER.
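The same two list operations (coding the years and taking log(L2)) can be sketched in Python, with `math.log10` standing in for the calculator’s LOG key. The printed values should match the log10(acres) row of the table above.

```python
import math

years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

# "Code" the years so the explanatory values are small and positive (L1)
coded = [year - 1977 for year in years]
# Common (base-10) log of each response value, like log(L2) -> L3
log_acres = [math.log10(a) for a in acres]

for x, ly in zip(coded, log_acres):
    print(x, round(ly, 4))
```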

Finding the Model

Now, let’s check a scatterplot of the transformed data.

Calculator: Stat Plot, On, Scatter, L1 and L3; then GRAPH, ZOOM 9.

Notice the change in the pattern from our original data to the transformed data. The logarithm transformation really “straightened” our data. (Using the natural logarithm would have had the same effect; our values would just have been different.)

Finding the Model

• Step 2: Find the LSRL for the transformed data (remember to check r and the residuals!).

Calculator: STAT, CALC, 4, L1, L3, ENTER. (If r is not displayed, turn it on once with 2nd 0 (CATALOG), DiagnosticOn, ENTER.)

Dependent variable: log10(Acres)
Independent variable: Year - 1977
log10(Acres) = 4.2513404 + 0.5557706 (Year - 1977)
Sample size: 4
R (correlation coefficient) = 0.9993
R-sq = 0.9985874
Estimate of error standard deviation: 0.033050213

This model looks promising, but remember to CHECK THE RESIDUALS!!!

Check the Residual Plot

A check of the residuals confirms that an exponential model is appropriate (no pattern is present now).

Calculator: STAT, EDIT. Move up to select the L4 header, enter the LSRL equation 4.2513404 + 0.5557706*L1, and press ENTER; this fills L4 with the predicted (y-hat) values.

Then select the L5 header, enter the residual formula L3 - L4, and press ENTER:

$\text{residual} = y - \hat{y}$

Remember, L3 is the new (log-transformed) y, and L4 is y-hat.

Stat Plot, On, Scatter, L1 and L5; then GRAPH, ZOOM 9.
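The L4 and L5 list formulas translate directly into Python. This sketch uses the rounded coefficients from the regression output, so the residuals are approximate; they should be small (well under 0.05 in log units) and sum to roughly zero.

```python
import math

coded = [1, 2, 3, 4]                                          # L1
log_acres = [math.log10(a) for a in [63042, 226260, 907075, 2826095]]  # L3

# Coefficients copied from the LSRL output above
intercept, slope = 4.2513404, 0.5557706

predicted = [intercept + slope * x for x in coded]            # L4: y-hat
residuals = [y - yhat for y, yhat in zip(log_acres, predicted)]  # L5: L3 - L4

for x, e in zip(coded, residuals):
    print(x, round(e, 4))
```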

“Untransforming” to find the model for our original data

★ Remember that our goal was to find a model that we could use to predict the number of defoliated acres of land for a given year.

★ The linear model we have predicts the common logarithm of acres. For our model to be useful, we need to reverse the transformation to create a model that fits the original data.

★ Although many transformed models are easier to “untransform” after evaluating them (compute the prediction on the transformed scale, then reverse the transformation), we can use the properties of logarithms with both exponential and power models (we’ll look at power models next) to find a model for our original data.

Properties of Logarithms

• Before we try to “untransform”, let’s review the properties of logarithms you learned in Algebra (yes, you really did learn these!):

$\log_b(xy) = \log_b x + \log_b y$ (addition rule)

$\log_b(x^m) = m\log_b x$ (power rule)

$\log_b(b^n) = n$ and $b^{\log_b n} = n$; for example, $10^{\log_{10} n} = n$ (same base)

$\log_b(x/y) = \log_b x - \log_b y$ (subtraction rule). Since any subtraction can be rewritten as an addition, we will not use this last rule much!
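If you want to convince yourself numerically, this sketch checks each rule with `math.log10` on arbitrary positive test values (chosen here only for illustration). `math.isclose` absorbs floating-point rounding.

```python
import math

x, y, m, n = 40.0, 2.5, 3.0, 1.7   # arbitrary positive test values

# Addition rule: log(xy) = log x + log y
assert math.isclose(math.log10(x * y), math.log10(x) + math.log10(y))
# Power rule: log(x^m) = m log x
assert math.isclose(math.log10(x ** m), m * math.log10(x))
# Same base: log10(10^n) = n and 10^(log10 n) = n
assert math.isclose(math.log10(10 ** n), n)
assert math.isclose(10 ** math.log10(n), n)
# Subtraction rule: log(x/y) = log x - log y
assert math.isclose(math.log10(x / y), math.log10(x) - math.log10(y))

print("All four logarithm rules check out numerically.")
```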

Rewriting Log/Exponential Forms

Also recall how to rewrite an equation from exponential to logarithmic form: $b^x = a \iff \log_b a = x$.

“log base answer = exponent”


Review Exponent Rules

Homework:

Notebook, pages 69 and 70

Day 2: UNTRANSFORMING Linearized Data

Notes: pages 73 and 74

“Untransforming” exponential expressions

• An exponential function takes the form $y = ab^x$, where a and b are constants.

• (This is the form we want to end up with.)

So, let’s get started.

Linear regression of the transformed data:

$\log_{10}(\text{Acres}) = 4.2513404 + 0.5557706(\text{Year} - 1977)$

Raise both sides as powers of 10 (same base):

$10^{\log_{10}(\text{Acres})} = 10^{4.2513404 + 0.5557706(\text{Year} - 1977)}$

Same-base law and multiplication law for exponents:

$\text{Acres} = 10^{4.2513404} \cdot 10^{0.5557706(\text{Year} - 1977)} = 10^{4.2513404} \cdot \left(10^{0.5557706}\right)^{\text{Year} - 1977}$

Simplify the constants:

$\text{Acres} = 17837.7634\,(3.5956)^{\text{Year} - 1977}$

This is now in the form $y = ab^x$, where a = 17837.7634, b = 3.5956, and x = Year - 1977.

Notice that b is approximately the average of the ratios (next/prev) we calculated when we began looking for a model (about 3.57).
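The “simplify the constants” step is just two powers of 10. This sketch computes a and b from the rounded regression coefficients and confirms that $ab^x$ agrees with $10^{\text{intercept} + \text{slope}\cdot x}$ at an arbitrary x.

```python
import math

# LSRL of log10(acres) on years since 1977 (coefficients from the output above)
intercept, slope = 4.2513404, 0.5557706

# Untransform: 10**(c + d*x) = 10**c * (10**d)**x, so y = a * b**x with
a = 10 ** intercept   # a = 10^4.2513404
b = 10 ** slope       # b = 10^0.5557706
print(f"Acres = {a:.4f} * {b:.4f}^(Year - 1977)")

# Both forms give the same prediction, e.g. at x = 2.5
assert math.isclose(a * b ** 2.5, 10 ** (intercept + slope * 2.5))
```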

So, does it fit our original data?

• Since our original goal was to find a model that would allow us to predict the number of acres of defoliated land if we knew the year, we need to check whether our model actually fits the data.

[Figure: Gypsy Moth Outbreak Scatter Plot, Acres vs. Years Since 1977, with the fitted model overlaid]

The model looks pretty good, but as with any model we need to use caution when predicting outside our original data range.
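A quick numeric version of this check compares the model’s predictions with the observed acreages. This sketch uses the rounded constants a = 17837.7634 and b = 3.5956 from the untransforming step.

```python
acres = [63042, 226260, 907075, 2826095]
coded = [1, 2, 3, 4]               # years since 1977
a, b = 17837.7634, 3.5956          # constants from the untransformed model

# Compare the exponential model's predictions with the observed acreages
for x, observed in zip(coded, acres):
    predicted = a * b ** x
    pct = 100 * (predicted - observed) / observed
    print(f"{1977 + x}: observed {observed:>9,}  predicted {predicted:>11,.0f}  ({pct:+.1f}%)")
```

Within 1978 to 1981 every prediction is within about 10% of the observed value. Outside that range the model keeps multiplying by about 3.6 each year, which is exactly why extrapolation calls for caution.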

Power Models

• Another important transformation used in modeling is the power model.

Power models have the form $y = ax^b$, where a and b are constants.

We can find an appropriate power model by taking the logarithms of both the response and explanatory variables, finding the linear regression for the transformed data, and then using the laws of logarithms and exponents to “untransform.” Let’s look at an example.

Fishing Tournament

• In a fishing tournament that you are in charge of, you need to find a way to record the weight of each fish caught without killing or damaging the fish.

• Since it is easier to measure the length of a fish than its weight, we must find a way to convert length to weight.

• The local marine research lab has been gracious enough to provide you with data on the average length and weight at different ages for Atlantic Ocean rockfish, whose growth is representative of most fish species under normal feeding conditions.

The Data

Age (yr)   Length (cm)   Weight (g)
1          5.2           2
2          8.5           8
3          11.5          21
4          14.3          38
5          16.8          69
6          19.2          117
7          21.3          148
8          23.3          190
9          25.0          264
10         26.7          293
11         28.2          318
12         29.6          371
13         30.8          455
14         32.0          504
15         33.0          518
16         34.0          537
17         34.9          651
18         36.4          719
19         37.1          726
20         37.7          810

• Since length is one-dimensional and weight is three-dimensional (weight is roughly proportional to volume), we should be able to find a reasonable model using a power model. (The residuals from a regression on the original data confirm that the variables are NOT linearly related, but we already knew that!)

• As before, we first need to transform our data, but this time we have to transform both length and weight.

Transforming the Data

Age (yr)   Length (cm)   log10(length)   Weight (g)   log10(weight)
1          5.2           0.7160          2            0.3010
2          8.5           0.9294          8            0.9031
3          11.5          1.0607          21           1.3222
4          14.3          1.1553          38           1.5798
5          16.8          1.2253          69           1.8388
6          19.2          1.2833          117          2.0682
7          21.3          1.3284          148          2.1703
8          23.3          1.3674          190          2.2788
9          25.0          1.3979          264          2.4216
10         26.7          1.4265          293          2.4669
11         28.2          1.4502          318          2.5024
12         29.6          1.4713          371          2.5694
13         30.8          1.4886          455          2.6580
14         32.0          1.5052          504          2.7024
15         33.0          1.5185          518          2.7143
16         34.0          1.5315          537          2.7300
17         34.9          1.5428          651          2.8136
18         36.4          1.5611          719          2.8567
19         37.1          1.5694          726          2.8609
20         37.7          1.5763          810          2.9085

This scatterplot indicates that a linear regression on the logarithms of both variables is certainly one to consider.

Linear Regression on the transformed data

Simple linear regression results:
Dependent variable: log10(Weight(g))
Independent variable: log10(Length(cm))
log10(Weight(g)) = -1.8993973 + 3.049418 log10(Length(cm))
Sample size: 20
R (correlation coefficient) = 0.9993
R-sq = 0.9985228

The correlation coefficient is certainly promising (r = 0.9993), the scatterplot of the transformed data indicates that the line fits very well, and, most importantly, look at those residuals! Yes, statisticians get very excited when they see residuals that look that good!

“Untransforming” a power model

Linear equation of the transformed data:

$\log_{10}(\text{Weight}) = -1.8993973 + 3.049418\log_{10}(\text{Length})$

Raise both sides using a base of 10:

$10^{\log_{10}(\text{Weight})} = 10^{-1.8993973 + 3.049418\log_{10}(\text{Length})}$

Same base and multiplication law for exponents:

$\text{Weight} = 10^{-1.8993973}\left(10^{3.049418\log_{10}(\text{Length})}\right)$

Power rule for logarithms:

$\text{Weight} = 10^{-1.8993973}\left(10^{\log_{10}(\text{Length}^{3.049418})}\right)$

Same base:

$\text{Weight} = 10^{-1.8993973}(\text{Length})^{3.049418}$

Simplify constants:

$\text{Weight} = 0.01261(\text{Length})^{3.049418}$

(Weight in grams, length in centimeters.)

[Figure: Atlantic Ocean Rockfish Scatter Plot, Weight vs. Length, with the power model overlaid]

Last check: plot the new model on the original data.

Looks like we’ve got a model that will be very useful for estimating the weight of a fish if we know its length!
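The untransformed power model can be sketched as a small Python function. `predicted_weight` is a hypothetical helper name, and the coefficients are the rounded ones from the regression output. The sketch prints the model’s prediction next to each observed weight.

```python
import math

# LSRL of log10(weight) on log10(length), from the regression output above
intercept, slope = -1.8993973, 3.049418

# Untransform: 10**(c + d*log10(L)) = 10**c * L**d, so weight = a * length**b
a = 10 ** intercept
b = slope

def predicted_weight(length_cm):
    """Hypothetical helper: power-model weight (g) for a length in cm."""
    return a * length_cm ** b

length = [5.2, 8.5, 11.5, 14.3, 16.8, 19.2, 21.3, 23.3, 25.0, 26.7,
          28.2, 29.6, 30.8, 32.0, 33.0, 34.0, 34.9, 36.4, 37.1, 37.7]
weight = [2, 8, 21, 38, 69, 117, 148, 190, 264, 293,
          318, 371, 455, 504, 518, 537, 651, 719, 726, 810]

print(f"Weight = {a:.5f} * Length^{b}")
for L, w in zip(length, weight):
    print(f"{L:5.1f} cm  observed {w:4d} g  model {predicted_weight(L):6.1f} g")
```

Every prediction lands within about 15% of the tabled average weight, which is consistent with the very small residuals on the log scale.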

Are there Other Possibilities?

• There are many other ways to transform data in order to find a model.

• If neither an exponential nor a power model is appropriate, you may try to:
  – square the response or explanatory variable
  – take the square root of either variable
  – take the reciprocal of either variable

• The possibilities are endless, but for now we will concentrate mostly on either an exponential or power model.

Transforming on the TI

• There are a couple of different ways to find both an exponential and a power regression model on your TI calculator:
  – using lists to transform
  – using the built-in regression models

Using Lists to Transform

• We’ll use the gypsy moth data first.

Enter in lists 1 & 2

L1: years since 1977

L2: acres of defoliated land

Take the common log of the values in list 2 and put the new values in list 3

L3: log (L2)

Now do a linear regression on lists 1 & 3

You can check residuals just like we did before to verify this regression.

Now “untransform” as we did before to get the exponential model.

Note: for a power model create another list for the logarithm of the explanatory variable and do the linear regression on these two lists.

Using the Regression Models

• The TI family of calculators has both exponential and power models built into the STAT CALC menu.

• Create a list for the explanatory variable and one for the response variable.

• From the home screen: STAT, CALC, 0:ExpReg (or A:PwrReg), then enter L1, L2.

  – The model does not need untransforming.

  – The residuals created are the residuals from the linear regression on the transformed data (yes, your calculator actually transforms the data, does a linear regression, then untransforms).

How to Decide Which Model

• Creating mathematical models for real data involves a lot of trial and error.

• One strategy:
  – Try a linear model first (check the residuals).
  – Then try an exponential model (check the residuals).
  – Then try a power model (check the residuals).

• If all the residual plots show a pattern, you can continue to try different transformations or choose the model with the best correlation.

• Remember, no model is perfect, but some models are useful. We wish to find a useful model.

Homework:

• Notebook, page 71, problem #1 only
• Handout “Practice Before Quiz 3.3”