AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Upload: eleanor-victoria-campbell

Post on 17-Dec-2015


TRANSCRIPT

Page 1: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

AP Statistics

Chapters 3 & 4: Measuring Relationships Between 2 Variables

Page 2: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Basic Terms

Response Variable: Measures an outcome of a study.

Explanatory Variable: Helps explain or influences changes in a response variable.

Scatterplot: Shows the relationship between two quantitative variables measured on the same individuals (one variable on each axis).

We are examining relationships and associations. DO NOT ASSUME that the explanatory variable causes a change in the response variable.

Page 3: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Interpreting a Scatterplot

Just like with univariate data, we are looking for an overall pattern and for deviations from that pattern.

Overall pattern
– Direction: negative or positive association
– Form: curved or linear? Are there clusters?
– Strength: how closely do the points follow a clear form?

Deviations
– Outliers: individual values that fall outside of the overall pattern.

Page 4: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Correlation

Correlation (r) measures the direction and strength of the linear relationship between two quantitative variables.
– It does not distinguish between explanatory and response variables (i.e., r stays the same if you switch the x and y axes).
– r has no units of measurement (the correlation does not change if you change the units of either variable).

Page 5: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Correlation

(+) r indicates a positive association
– As one variable increases, so does the other.

(-) r indicates a negative association
– As one variable increases, the other decreases.

r is always between -1 and 1.
– If r is close to zero, then the linear relationship is weak.
– If r is close to 1 or -1, then the linear relationship is strong.
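The properties of r listed on the correlation slides can be checked numerically. The following is a minimal sketch, assuming the usual sample formula (the average product of standardized values, with an n - 1 divisor); the function name `correlation` and the height/weight data are made up for illustration and are not from the slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Made-up illustrative data (not from the slides)
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [110, 120, 140, 155, 160, 180]

r = correlation(heights_in, weights_lb)

# Swapping the roles of x and y leaves r unchanged
r_swapped = correlation(weights_lb, heights_in)

# Changing units (inches to centimeters) leaves r unchanged
heights_cm = [h * 2.54 for h in heights_in]
r_rescaled = correlation(heights_cm, weights_lb)

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
```

All three printed values agree, and the value is positive because weight tends to increase with height in this made-up data set.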

Page 6: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Correlation

CORRELATION DOES NOT IMPLY CAUSATION!!!

Page 7: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Regression line

A line that describes how a response variable, y, changes as an explanatory variable, x, changes. It is often used to predict y given x.
– y = a + bx
– b = slope: the amount by which y changes on average when x increases by one unit.
– a = y-intercept: the predicted value of y when x = 0.

Page 8: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Making predictions with the regression line

Interpolation
– Estimating predicted values between known values. (Good)

Extrapolation
– Predicting values outside the range of values used to make the regression line. (Bad: the linear pattern may not continue outside the observed range.)
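One way to keep the interpolation/extrapolation distinction in view is to flag any prediction whose x value lies outside the range used to fit the line. This is only a sketch: the function `predict`, the line ŷ = 0.05 + 1.99x, and the fitted range 1 to 5 are all hypothetical.

```python
def predict(x, a, b, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y_hat = a + b*x.

    x_min and x_max are the smallest and largest x values used to fit the line.
    """
    y_hat = a + b * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

# Hypothetical line y_hat = 0.05 + 1.99x, assumed to be fitted on x values from 1 to 5
a, b, x_min, x_max = 0.05, 1.99, 1, 5

for x in (3.5, 12):
    y_hat, risky = predict(x, a, b, x_min, x_max)
    label = "extrapolation: treat with caution" if risky else "interpolation"
    print(f"x = {x}: predicted y = {y_hat:.2f} ({label})")
```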

Page 9: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Least-Squares Regression Line

The line that makes the sum of the squared vertical distances between the data points and the line as small as possible.
– ŷ = a + bx (ŷ is read "y-hat" and denotes the predicted value of y)
– Slope: b = r(s_y / s_x)
– Passes through the point (x̄, ȳ), so the intercept is a = ȳ - b·x̄
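The slope formula and the fact that the line passes through (x̄, ȳ) can be illustrated with a short sketch. The helper names and the five data points are made up for illustration; the comparison against two slightly shifted lines is only a spot check of the least-squares property, not a proof.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for y_hat = a + b*x using b = r * (s_y / s_x) and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)

print(f"y_hat = {a:.2f} + {b:.2f}x")
print("passes through the means:", abs((a + b * mean(xs)) - mean(ys)) < 1e-9)
print("beats a shifted line:", best < sum_squared_residuals(a + 0.1, b, xs, ys))
print("beats a tilted line:", best < sum_squared_residuals(a, b + 0.1, xs, ys))
```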

Page 10: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Example

An SRS of 50 families has provided the following statistics:
– Number of children in the family: mean 2.1, std dev 1.4
– Annual gross income: mean $34,250, std dev $10,540
– r = 0.75

Write the equation for the least-squares regression line that can be used to predict gross income based on number of children.
– Be sure to define your variables.
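One possible way to work this example is sketched below, assuming x = number of children (explanatory) and ŷ = predicted annual gross income in dollars (response). It uses b = r(s_y / s_x) for the slope and the fact that the line passes through the point of means for the intercept. The variable names are chosen for illustration.

```python
# Summary statistics given in the example
mean_children, sd_children = 2.1, 1.4
mean_income, sd_income = 34250.0, 10540.0
r = 0.75

# Slope: b = r * (s_y / s_x)
slope = r * sd_income / sd_children

# Intercept: the line passes through (x_bar, y_bar), so a = y_bar - b * x_bar
intercept = mean_income - slope * mean_children

def predicted_income(children):
    """Predicted annual gross income (dollars) for a family with the given number of children."""
    return intercept + slope * children

print(f"predicted income = {intercept:.2f} + {slope:.2f} * (number of children)")
```

Under these assumptions the line is roughly ŷ = 22392.50 + 5646.43x: each additional child is associated with about $5,646 more predicted income, on average.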

Page 11: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Residuals

Residual: The difference between an observed value of the response variable and the value predicted by the regression line.
– Residual = observed y - predicted y = y - ŷ

Standard deviation of the residuals:

$s = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$
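The residuals and their standard deviation can be computed directly from these definitions. This is a minimal sketch on made-up data; the slope is computed from sums of squares and cross-products, which is algebraically the same as b = r(s_y / s_x).

```python
from math import sqrt

# Made-up illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

# Least-squares line y_hat = a + b*x
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals: sqrt(sum of squared residuals / (n - 2))
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print([round(e, 2) for e in residuals])
print(round(s, 4))
```

The residuals from a least-squares line always sum to zero (up to rounding), which is a quick sanity check on the arithmetic.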

Page 12: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

How well does the line fit the data?

To answer this question, you must look at two things.

1. Residual plot: a scatterplot of the regression residuals plotted against (usually) the explanatory variable.
– If the regression line represents the pattern of the data well, then:
  – The residual plot will show no pattern.
  – The residuals will be relatively small.

Page 13: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

How well does the line fit the data?

2. Coefficient of determination: r²
– The fraction (%) of the variation in the values of y that is explained by the least-squares regression line of y on x.

Template:
– (r² as a %) of the variation in (y-variable) is explained by the least-squares regression line with (x-variable).
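The "fraction of variation explained" reading of r² can be checked by comparing the squared correlation with 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total variation of y about its mean. The data and the labels "y-variable" and "x-variable" below are placeholders, not values from the slides.

```python
from statistics import mean

# Made-up illustrative data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
x_bar, y_bar = mean(xs), mean(ys)

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y (SST)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = s_xy / (s_xx * s_yy) ** 0.5
b = s_xy / s_xx
a = y_bar - b * x_bar

# Variation left unexplained by the line (SSE)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - sse / s_yy

print(f"{100 * explained_fraction:.1f}% of the variation in y-variable is explained "
      "by the least-squares regression line with x-variable.")
```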

Page 14: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Other Considerations

Outlier: Observation that lies outside the overall pattern (may or may not have a large residual).

Influential Point: Observation which, if removed, would greatly change the statistical calculation.

Lurking variable: An additional variable that may influence the relationship between the explanatory and response variables.

Page 15: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Correlation v. Causation

The goal of a study or experiment is often to establish causation: a direct cause-and-effect link.
– Lurking variables make establishing causation difficult.

Common response: The observed association between two variables, x and y, is explained by a lurking variable, z. Both x and y change in response to changes in z.

Page 16: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Correlation v. Causation

Confounding: Occurs when the effects of two or more variables on a response variable cannot be distinguished from each other (often occurs in an observational study).

Page 17: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Establishing Causation w/o an Experiment

1. The association is strong.
2. The association is consistent.
3. Larger values of the explanatory variable are associated with stronger responses.
4. The alleged cause precedes the effect in time.
5. The alleged cause is plausible.

Page 18: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Non-linear relationships

If data follow a non-linear form, we can sometimes transform the data so that the relationship becomes linear. We can then perform the same analyses that we do for linear data (regression line, correlation, r², residual plot).

What are the most common non-linear models for bivariate data?

Page 19: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

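Transforming non-linear data

Exponential model
– y = ab^x
– For each unit increase in x, y is multiplied by a constant, b.

To transform to linearity, plot log y against x on the coordinate plane. Then perform a linear regression.
– log y = a + bx OR ln y = a + bx
– In the fitted line, the intercept estimates log a and the slope estimates log b of the exponential model, because taking logs of y = ab^x gives log y = log a + (log b)x.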
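The exponential transformation can be sketched as follows: regress log y on x, then back-transform the intercept and slope to recover the constants of y = ab^x. The helper `fit_line` and the data (generated to follow y = 3·2^x exactly) are assumptions for illustration; real data would only follow the pattern approximately.

```python
from math import log10

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys against xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Made-up data that follow y = 3 * 2^x exactly
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

# Regress log y on x: log y = log a + (log b) x
intercept, slope = fit_line(xs, [log10(y) for y in ys])

# Back-transform to the exponential model y = a * b^x
a_hat = 10 ** intercept
b_hat = 10 ** slope
print(round(a_hat, 6), round(b_hat, 6))
```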

Page 20: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Transforming non-linear data

Power model
– y = ax^b
– Often used when trying to use a one-dimensional variable (e.g., length) to predict a multi-dimensional variable (e.g., area, volume, weight).

To transform to linearity, plot log y against log x on the coordinate plane. Then perform a linear regression.
– log y = a + b(log x) OR ln y = a + b(ln x)
– In the fitted line, the slope estimates the power b and the intercept estimates log a, because taking logs of y = ax^b gives log y = log a + b(log x).
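The power-model transformation works the same way, except that both variables are logged. In this sketch the helper `fit_line` and the data (generated to follow y = 2x³ exactly, like a volume predicted from a length) are made up for illustration.

```python
from math import log10

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for ys against xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Made-up data that follow y = 2 * x^3 exactly
xs = [1, 2, 3, 4, 5]
ys = [2 * x ** 3 for x in xs]

# Regress log y on log x: log y = log a + b (log x)
intercept, slope = fit_line([log10(x) for x in xs], [log10(y) for y in ys])

# The slope is the power b; back-transform the intercept to get a
b_hat = slope
a_hat = 10 ** intercept
print(round(a_hat, 6), round(b_hat, 6))
```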

Page 21: AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables

Analyzing the relationship between categorical variables

A two-way table is used to compare categorical variables.

Marginal distribution: The distribution of one of the variables by itself, found from the row or column totals.

Conditional distribution: The distribution of the response variable for each value of the explanatory variable.
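A small two-way table makes the two kinds of distribution concrete. The counts, category names, and the choice of grade level as the explanatory variable are all hypothetical.

```python
# Hypothetical two-way table of counts:
# rows = explanatory variable (grade level), columns = response variable (preferred subject)
table = {
    "Junior": {"Math": 30, "English": 20},
    "Senior": {"Math": 15, "English": 35},
}
subjects = ["Math", "English"]
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the response: column totals divided by the grand total
marginal_subject = {
    s: sum(table[grade][s] for grade in table) / grand_total for s in subjects
}

# Conditional distribution of the response for each value of the explanatory variable:
# each row's counts divided by that row's total
conditional_subject = {
    grade: {s: table[grade][s] / sum(table[grade].values()) for s in subjects}
    for grade in table
}

print(marginal_subject)
print(conditional_subject)
```

If the conditional distributions differ noticeably from row to row, as they do in this made-up table, that suggests an association between the two categorical variables.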