Chapter 4 Review: More About Relationships Between Two Variables

Group Members: Qianya Meng, Nikta Kheiri, Min Kim. 1st period, 12/14/11.



Page 2

The Big Idea

• Transform the graph to achieve linearity.
• Transform exponential graphs to achieve linearity and come up with a transformed equation for use in extrapolation.
• Transform power functions to achieve linearity and come up with a transformed equation for use in extrapolation.
• Learn to use marginal and conditional distributions.
• Recognize relationships between two variables.

Page 3

Vocabulary You Need to Know

• Transforming (or re-expressing) the data means applying a function such as the logarithm or square root to a quantitative variable.
• Log rules:
• 1) log_b(mn) = log_b(m) + log_b(n)
• 2) log_b(m/n) = log_b(m) - log_b(n)
• 3) log_b(m^n) = n · log_b(m)
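The three log rules can be spot-checked numerically. The sketch below is only an illustration and is not part of the original slides; the base b = 10 and the values m = 8 and n = 3 are arbitrary choices.

```python
import math

b, m, n = 10, 8.0, 3.0  # arbitrary illustrative values, not from the slides


def log_b(x, base=b):
    """Logarithm of x in the given base."""
    return math.log(x, base)


# Rule 1: the log of a product is the sum of the logs
rule1 = math.isclose(log_b(m * n), log_b(m) + log_b(n))
# Rule 2: the log of a quotient is the difference of the logs
rule2 = math.isclose(log_b(m / n), log_b(m) - log_b(n))
# Rule 3: the log of a power is the exponent times the log
rule3 = math.isclose(log_b(m ** n), n * log_b(m))

print(rule1, rule2, rule3)
```

Rule 3 is the one that makes power and exponential models linear after a log transformation.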

Page 4

Vocabulary

• Linear growth increases by a fixed amount in each equal time period.
• Exponential growth model: log y = log a + (log b)x; predicted y = ab^x
• Power law model: log y = log a + p log x; predicted y = ax^p

Page 5

Vocabulary

• A two-way table describes two categorical variables.
• Marginal distributions are the row totals and column totals; each gives the distribution of one variable on its own.
• Conditional distributions of the column variable, given the row variable
• Conditional distributions of the row variable, given the column variable
• Simpson's paradox is a reversal: an association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group.

Page 6

Vocabulary

• Causation: Changes in x cause changes in y.
• Common response: Changes in both x and y are caused by changes in a lurking variable z.
• Confounding: The effect (if any) of x on y is confounded with the effect of a lurking variable z.

Page 7

Key Topics Covered in this Chapter

• Modeling nonlinear data
• Relations in categorical data
• Establishing causation

Page 8

Formulas You Should Know

• Exponential growth model: log y = log a + (log b)x; predicted y = ab^x
• Power law model: log y = log a + p log x; predicted y = ax^p
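Both linear forms follow from taking the logarithm of each side of the model and applying the log rules. A short derivation, written in LaTeX as a sketch (the hat notation for "predicted y" is an assumption, not the slides' own notation):

```latex
\begin{align*}
\hat{y} = a\,b^{x} &\;\Longrightarrow\; \log \hat{y} = \log a + (\log b)\,x \\
\hat{y} = a\,x^{p} &\;\Longrightarrow\; \log \hat{y} = \log a + p \log x
\end{align*}
```

So an exponential model is linear in (x, log y), while a power law model is linear in (log x, log y).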

Page 9

Calculator Key Strokes

• Exponential growth modeling
• Enter the explanatory data into L1 and response data into L2
• Draw the scatterplot y versus x
• Define L3 as the logarithm (common or natural) of L2, then make a scatterplot of L3 (log y or ln y) versus L1
• Perform the least-squares regression on the transformed data
• Draw the scatterplot
• Plot the residuals versus L1
• With the regression equation in Y1, define Y2 = e^(Y1) if you used ln, or Y2 = 10^(Y1) if you used log
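For readers working without the calculator, the same steps can be sketched in Python. This is a minimal illustration, not the slides' method: the data are synthetic (generated from y = 2 · 1.5^x, not taken from the slides), `least_squares` is a hypothetical helper written here, and the names L1, L2, L3 and Y2 simply mirror the calculator lists.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Synthetic data following y = 2 * 1.5^x exactly (illustrative only)
L1 = [0, 1, 2, 3, 4, 5]
L2 = [2 * 1.5 ** x for x in L1]

L3 = [math.log10(y) for y in L2]            # transformed response, log y
intercept, slope = least_squares(L1, L3)    # line fitted to (L1, L3)
residuals = [t - (intercept + slope * x) for x, t in zip(L1, L3)]


def Y2(x):
    """Inverse transformation: 10^(Y1), where Y1 is the fitted line."""
    return 10 ** (intercept + slope * x)


a, b = 10 ** intercept, 10 ** slope         # model in the form y = a * b^x
print(round(a, 4), round(b, 4), round(Y2(6), 4))
```

Because the common log was used for the transformation, the inverse step uses 10^(Y1); with the natural log it would be e^(Y1) instead.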

Page 10

Calculator Key Strokes

• Power law modeling
• Enter the explanatory data into L1 and response data into L2
• Draw the scatterplot y versus x
• Define L3 as the logarithm (common or natural) of L1 and define L4 as the logarithm (common or natural) of L2
• Plot L4 versus L3
• Calculate the regression equation for the transformed data and store it in Y1
• Construct a residual plot
• Define Y2 as (10^a)(x^b) or (e^a)(x^b), where a is the intercept and b is the slope of the regression equation
• Plot Y2 and the scatterplot for the original data together
• To make a prediction for the value x = k, evaluate Y2(k) on the home screen
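The power law steps can be sketched the same way. Again this is only an illustration under stated assumptions: the data are synthetic (generated from y = 3x^2, not from the slides), `least_squares` is a hypothetical helper, and L1 to L4 and Y2 mirror the calculator names.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Synthetic data following y = 3 * x^2 exactly (illustrative only)
L1 = [1, 2, 3, 4, 5]
L2 = [3 * x ** 2 for x in L1]

L3 = [math.log(x) for x in L1]   # ln of the explanatory variable
L4 = [math.log(y) for y in L2]   # ln of the response variable
a, b = least_squares(L3, L4)     # fitted line: ln y = a + b ln x


def Y2(x):
    """Inverse transformation: (e^a)(x^b)."""
    return math.exp(a) * x ** b


print(round(math.exp(a), 4), round(b, 4), round(Y2(6), 4))
```

The slope b of the transformed line is the power p in y = ax^p, and e^a recovers the coefficient.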

Page 11

Helpful Hints

• When the explanatory variable is years, transform the data to "years since" so that the values are smaller and don't create overflow problems when you perform the inverse transformation.
• If there is a clear explanatory/response relationship, compare the conditional distributions of the response variable for the separate values of the explanatory variable.
• Even when direct causation is present, it is rarely a complete explanation of an association between two variables.

Page 12

Q1 Some college students collected data on the intensity of light at various depths in a lake. Here are their data:

Depth (m)   Light intensity (lumens)
5           168.00
6           120.42
7           86.31
8           61.87
9           44.34
10          31.78
11          22.78

a) Make a scatterplot suitable for predicting light intensity from depth. Describe the form of the relationship.
b) To verify that the decrease in light intensity follows an exponential model, calculate the ratio of light intensity at consecutive depths. Start with 120.42/168.00 = 0.717. What do you conclude?
c) Take the natural logarithm (ln) of the light intensity measurements and plot these values against the corresponding depths. Does this transformation achieve linearity?
d) Calculate the least-squares regression equation for the transformed data. Interpret the slope and y-intercept of this equation in this setting.
e) Construct and interpret a residual plot.
f) Perform the inverse transformation to express light intensity as an exponential function of depth in the lake. Display a scatterplot of the original data with the exponential model superimposed. Is your exponential function a satisfactory model for the data?
g) Use your model to predict the light intensity at a depth of 22 meters. The actual light intensity reading at that depth was 0.58 lumens. Does this surprise you?

Page 13

Answer Q1

• A) The relationship is strong, negative, and curved.
• B) The ratios are all about 0.717, so an exponential model is appropriate.
• C) It achieves linearity.
• D) If x = depth and y = ln(light intensity), then predicted y = 6.7891 - 0.3330x. The intercept, 6.7891, provides an estimate for the average value of the natural log of the light intensity at the surface of the lake (depth 0). The slope, -0.3330, indicates that the natural log of the light intensity decreases on average by 0.3330 for each one-meter increase in depth.
• E) The residual plot shows a fairly random scatter and relatively small residuals, so the linear model is appropriate.
• F) If x = depth and y = light intensity, y = (e^6.789)(e^(-0.333x)). It is a satisfactory model.
• G) At 22 m, the predicted light intensity would be 0.584 lumens. No, not surprised.
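The numbers in this answer can be reproduced from the Q1 data. The following is a minimal sketch in plain Python rather than the calculator procedure; the variable and function names are chosen here for illustration.

```python
import math

# Q1 data from the slide
depth = [5, 6, 7, 8, 9, 10, 11]
intensity = [168.00, 120.42, 86.31, 61.87, 44.34, 31.78, 22.78]

# (b) ratios of light intensity at consecutive depths
ratios = [intensity[i + 1] / intensity[i] for i in range(len(intensity) - 1)]

# (c), (d) least-squares line for ln(intensity) versus depth
ln_intensity = [math.log(v) for v in intensity]
n = len(depth)
x_bar = sum(depth) / n
y_bar = sum(ln_intensity) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(depth, ln_intensity))
         / sum((x - x_bar) ** 2 for x in depth))
intercept = y_bar - slope * x_bar

# (e) residuals on the transformed scale
residuals = [y - (intercept + slope * x) for x, y in zip(depth, ln_intensity)]


def predict(d):
    """(f) Inverse transformation: light intensity as a function of depth."""
    return math.exp(intercept + slope * d)


print([round(r, 3) for r in ratios])
print(round(intercept, 4), round(slope, 4))
print(round(predict(22), 3))  # (g) prediction at 22 m
```

The nearly constant ratio in (b) is what signals exponential decay: each additional meter multiplies the intensity by roughly the same factor.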

Page 14

Q2 Some high school physics students dropped a ball and measured its height at various points along its descent. Table 4.3 shows the time since release and the distance the ball had fallen.

Time (s)   Distance (cm)
.16        12.1
.24        29.8
.25        32.7
.3         42.8
.3         44.2
.32        55.8
.36        63.5
.36        65.1
.5         124.6
.5         129.7
.57        150.2
.61        182.2
.61        189.4
.68        220.4
.72        254.0
.72        261.0
.83        334.6
.88        375.5
.89        399.1

a) Make a scatterplot suitable for predicting distance fallen from time since release. Describe the direction, form, and strength of the relationship.
b) Perform an appropriate transformation to achieve linearity. Then find a least-squares regression model for the transformed data.
c) Comment on the quality of your model in (b) by referring to a residual plot and r^2.
d) Make a scatterplot of the points (time, square root of distance) to see if this transformation works. Then find a least-squares regression model for the transformed data.
e) Comment on the quality of your model in (d) by referring to a residual plot and r^2.
f) Use the two models you obtained in (b) and (d) to predict the distance that the object had fallen after 0.47 seconds. Which prediction do you think is closer to the actual value? Why?

Page 15

Answer Q2

• (a) The relationship is curved, strong, and positive.
• (b) If x = time and y = distance, predicted y = 0.99 + 490.416x^2.
• (c) r^2 = 0.9984 and the residual plot shows random scatter and fairly small residuals, so this looks like an appropriate model.
• (d) Yes. Predicted square root of y = 0.1046 + 22.0428x.
• (e) r^2 = 0.9986 and the residual plot shows no pattern, which suggests a good model.
• (f) Using the model from (b): 109.32 cm. Using the model from (d): 109.51 cm.
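The two predictions in (f) are plain arithmetic on the fitted equations reported in (b) and (d). A short check, using those stated coefficients rather than refitting the data (the variable names are chosen here for illustration):

```python
t = 0.47  # seconds since release

# Model from (b): distance is linear in time squared
pred_b = 0.99 + 490.416 * t ** 2

# Model from (d): square root of distance is linear in time,
# so the prediction must be squared to return to centimeters
pred_d = (0.1046 + 22.0428 * t) ** 2

print(round(pred_b, 2), round(pred_d, 2))  # prints 109.32 109.51
```

Note the inverse step for model (d): the regression predicts the square root of distance, so the fitted value is squared before reporting it.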

Page 16

Q3 Here are data from eight schools on smoking among students and among their parents.

                         Neither parent smokes   One parent smokes   Both parents smoke
Student does not smoke   1168                    1823                1380
Student smokes           188                     416                 400

a) How many students are described in the two-way table?
b) What percent of these students smoke?
c) Give the marginal distribution of parents' smoking behavior, both in counts and in percents.
d) Calculate three conditional distributions of students' smoking behavior: one for each of the three parental smoking categories. Describe the relationship between the smoking behaviors of students and their parents in a few sentences.

Page 17

Answer Q3

• A) 5375 students
• B) 18.7%
• C) Both parents smoke: 1780, 33.1%. One parent smokes: 2239, 41.7%. Neither parent smokes: 1356, 25.2%.
• D) Student smokes, given both parents smoke: 400/(400+1380) = .2247. Student doesn't smoke, given both parents smoke: 1380/(400+1380) = .7753. Student smokes, given one parent smokes: 416/(416+1823) = .1858. Student doesn't smoke, given one parent smokes: 1823/(416+1823) = .8142. Student smokes, given neither parent smokes: 188/(188+1168) = .1386. Student doesn't smoke, given neither parent smokes: 1168/(188+1168) = .8614. The proportion of students who smoke rises with the number of parents who smoke, so students who smoke are most likely to come from families where one or more of their parents smoke.
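The marginal and conditional distributions above can be computed directly from the two-way table. A minimal sketch follows; the dictionary layout and key names are choices made here for illustration, not part of the original exercise.

```python
# Two-way table from Q3: parental smoking category -> student counts
counts = {
    "neither": {"no_smoke": 1168, "smoke": 188},
    "one": {"no_smoke": 1823, "smoke": 416},
    "both": {"no_smoke": 1380, "smoke": 400},
}

# (a) total number of students
total = sum(sum(c.values()) for c in counts.values())

# (b) overall proportion of students who smoke
overall_smoke = sum(c["smoke"] for c in counts.values()) / total

# (c) marginal distribution of parents' smoking behavior (counts and proportions)
marginal_parents = {k: sum(c.values()) for k, c in counts.items()}
marginal_props = {k: v / total for k, v in marginal_parents.items()}

# (d) conditional proportion of students who smoke, given each parental category
cond_smoke = {k: c["smoke"] / marginal_parents[k] for k, c in counts.items()}

print(total, round(100 * overall_smoke, 1))
for k in counts:
    print(k, marginal_parents[k], round(100 * marginal_props[k], 1),
          round(cond_smoke[k], 4))
```

The marginal distribution divides by the grand total, while each conditional distribution divides by the total for its own parental category.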

Page 18

Q4 Whether a convicted murderer gets the death penalty seems to be influenced by the race of the victim. Here are data on 326 cases in which the defendant was convicted of murder:

         White defendant                Black defendant
         White victim   Black victim    White victim   Black victim
Death    19             0               11             6
Not      132            9               52             97

a) Use these data to make a two-way table of defendant's race vs. death penalty.
b) Show that Simpson's paradox holds: a higher percent of white defendants are sentenced to death overall, but for both black and white victims a higher percent of black defendants are sentenced to death.
c) Use the data to explain why the paradox holds in language that a judge could understand.

Page 19

Answer Q4

• A) White defendant: 19 yes, 141 no. Black defendant: 17 yes, 149 no.

• B) Overall death penalty: 11.9% of white defendants, 10.2% of black defendants. For white victims, 12.6% and 17.5%; for black victims, 0% and 5.8%.

• C) The death penalty is more likely when the victim was white (14%) rather than black (5.4%). Because most convicted killers are of the same race as their victims, whites are more often sentenced to death.
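The reversal can be checked by computing the death-penalty rate overall and within each victim group. The sketch below uses the Q4 counts; the `death_rate` helper and the dictionary layout are hypothetical names introduced here for illustration.

```python
# (death, not) counts keyed by (defendant race, victim race), from Q4
cases = {
    ("white", "white"): (19, 132),
    ("white", "black"): (0, 9),
    ("black", "white"): (11, 52),
    ("black", "black"): (6, 97),
}


def death_rate(defendant=None, victim=None):
    """Proportion sentenced to death among cases matching the given races.

    Passing None for an argument pools over that variable.
    """
    death = total = 0
    for (d, v), (yes, no) in cases.items():
        if defendant in (None, d) and victim in (None, v):
            death += yes
            total += yes + no
    return death / total


# Overall comparison (victim race pooled)
print(round(100 * death_rate("white"), 1), round(100 * death_rate("black"), 1))

# Comparison within each victim group
for victim in ("white", "black"):
    print(victim,
          round(100 * death_rate("white", victim), 1),
          round(100 * death_rate("black", victim), 1))

# Rate by victim race alone, the lurking variable behind the reversal
print(round(100 * death_rate(victim="white"), 1),
      round(100 * death_rate(victim="black"), 1))
```

Pooling hides the victim's race: white defendants mostly had white victims, the group in which death sentences were far more common.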

Page 20

Q5 A study showed that women who work in the production of computer chips have abnormally high numbers of miscarriages. The union claimed that exposure to chemicals used in production causes the miscarriages. Another possible explanation is that these workers spend most of their time standing up. Can we conclude that exposure to chemicals causes more miscarriages? Why or why not?

Page 21

Answer Q5

• No. The “number of hours standing up at work” is a confounding variable.

Page 22

Q6 A study finds that high school students who take the SAT, enroll in an SAT coaching course, and then take the SAT a second time raise their SAT mathematics scores from a mean of 521 to a mean of 561. What factors other than taking the course might explain this improvement?

Page 23

Answer Q6

• The variable "knowledge gained as a result of taking the SAT previously" is a confounding variable.