Chapter 3 Examining Relationships Section 3.1 Scatterplots

Upload: abel

Post on 19-Mar-2016

Page 1: Chapter 3 Examining Relationships

Chapter 3 Examining Relationships

Section 3.1 Scatterplots

Page 2: Chapter 3 Examining Relationships

Terms to Know

A response variable measures an outcome of a study. An explanatory variable attempts to explain the observed outcomes.

Page 3: Chapter 3 Examining Relationships

Example of an Explanatory and Response Variable

One degree day is accumulated for each degree a day's average temperature falls below or rises above 65°F.

Page 4: Chapter 3 Examining Relationships

Key Concept

The statistical techniques used to study relations among variables are more complex than one-variable methods.

Fortunately, we build on the tools used for examining individual variables. The principles that guide examination are the same.

1. Start with a graph

2. Look for an overall pattern and deviations from the pattern

3. Add numerical descriptions of specific aspects of the data

4. Sometimes there is a compact way to describe that overall pattern (for example, a regression line)

Page 5: Chapter 3 Examining Relationships

Term to Know

The most effective way to display the relation between two quantitative variables is a scatterplot. Plot the explanatory variable, if there is one, on the x-axis, and the response variable on the y-axis. Each individual in the data appears as a point.

Page 6: Chapter 3 Examining Relationships

ScatterPlot

Page 7: Chapter 3 Examining Relationships

Interpreting Scatterplots

To interpret a scatterplot, look first for a pattern. The pattern should reveal direction, form and strength of the relationship between two variables.

Refer to Figure 3.1 on page 175.

Form: two clusters

Direction: negatively associated

Strength: moderate

Page 8: Chapter 3 Examining Relationships

Strong, Positive Association with Linear Form

Page 9: Chapter 3 Examining Relationships

Some Relationships Have No Direction or Pattern

Page 10: Chapter 3 Examining Relationships

Not All Relationships are Linear

[Figure: scatterplot of Mileage (vertical axis, 4 to 22) against Speed (horizontal axis, 0 to 160), titled "Collection 1 Scatter Plot"]

Page 11: Chapter 3 Examining Relationships

Yes, you can have outliers on ScatterPlots

An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph (e.g., WV).

Page 12: Chapter 3 Examining Relationships

Add a Third Variable (Categorical) of Southern and non-Southern by Using Different Symbols

Page 13: Chapter 3 Examining Relationships

Scatter Plot Heads Up

When several individuals have exactly the same data, they occupy the same point on the scatterplot. Some software packages address the issue by using different symbols for multiple individuals with the same data. You can do the same by hand. Your calculator does not, however, so be careful: use TRACE to identify such cases.

Page 14: Chapter 3 Examining Relationships

Scatterplots display the direction, form, and strength of the relationship between two variables. However, our eyes are not a good judge of the strength of the relationship.

Page 15: Chapter 3 Examining Relationships

Key Concept

Correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r.

$$r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
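As a rough illustration, the correlation formula (the sum of products of standardized x and y values, divided by n - 1) can be sketched in Python. The function name `correlation` and the five data points are made up for this example and do not come from the slides.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Standardize each observation, multiply the pairs, and add them up.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical illustration data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))
```

This is the long-hand version of what a calculator's LinReg command reports as r.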

Page 16: Chapter 3 Examining Relationships

Facts About Correlation

• No distinction between explanatory and response variables

• Requires two quantitative variables

• Changing the units of measurement does not change the correlation

• Positive r indicates positive association; negative r indicates negative association

• Range: -1 ≤ r ≤ 1

• Measures the strength of linear relationships between two variables only

• Is not resistant to outliers
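Two of these facts are easy to check numerically. The sketch below uses invented heights and weights (an assumption for illustration only) and a small helper, `correlation`, that uses an algebraically equivalent form of the formula for r.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r, written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (inches) and weights (pounds); not from the slides.
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [110, 120, 140, 155, 160, 180]

r_original = correlation(height_in, weight_lb)

# Swapping which variable is x and which is y leaves r unchanged.
r_swapped = correlation(weight_lb, height_in)

# Changing units (inches to cm, pounds to kg) leaves r unchanged.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]
r_metric = correlation(height_cm, weight_kg)

print(r_original, r_swapped, r_metric)
```

All three printed values should agree up to floating-point rounding.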

Page 17: Chapter 3 Examining Relationships

Correlation Exercise

• Technology Toolbox, page 186

• Yes, the process is long and convoluted, but there is a shortcut using the LinReg command

Page 18: Chapter 3 Examining Relationships

Key Concept

1) A key thing to remember when working with correlations is never to assume a correlation means that a change in one variable causes a change in another. Sales of personal computers and athletic shoes have both risen strongly in the last several years and there is a high correlation between them, but you cannot assume that buying computers causes people to buy athletic shoes (or vice versa).

Page 19: Chapter 3 Examining Relationships

Key Concept

2) Correlation describes linear relationships only, no matter how strong the curved relationship may be.

3) Like the mean and standard deviation, the correlation r is not resistant to outliers.

4) Correlation is not a complete summary of a two-variable relationship. You should also give the means and standard deviations of x and y.
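Points 2 and 3 can be seen in a small numerical sketch. All data below are invented for illustration: a perfect curved relationship (y = x²) that has r = 0, and a perfectly linear data set whose correlation collapses when a single outlier is added.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r, written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# A perfect curved relationship: y = x^2 with x values symmetric about 0.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curved = correlation(curve_x, curve_y)

# A perfectly linear data set (y = 2x)...
line_x = [1, 2, 3, 4, 5]
line_y = [2, 4, 6, 8, 10]
r_line = correlation(line_x, line_y)

# ...and the same data with one outlier, (6, 0), added.
r_with_outlier = correlation(line_x + [6], line_y + [0])

print(r_curved, r_line, round(r_with_outlier, 3))
```

Here y is completely determined by x in the curved case, yet r reports no linear association at all.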

Page 20: Chapter 3 Examining Relationships

Homework

• Read 3.2

• Complete problems 1, 2, 6, 7, 8, 13, 15, 19, 21, 23

Page 21: Chapter 3 Examining Relationships

Chapter 3 Examining Relationships

Section 3.2 Least-Squares Regression

Page 22: Chapter 3 Examining Relationships

Key Term

Least-squares regression is a method for finding a line that summarizes the relationship between two variables that show a linear trend.

We often use a regression line to predict the value of y for a given value of x. Regression, unlike correlation, requires that we have an explanatory variable and a response variable.

Page 23: Chapter 3 Examining Relationships

Regression Line for Predicting Gas Consumption from Degree Days

Page 24: Chapter 3 Examining Relationships

Why is it Called A Least-Squares Regression Line (“LSRL”)?

Page 25: Chapter 3 Examining Relationships

Why is it Called A Least-Squares Regression Line (“LSRL”)?

Page 26: Chapter 3 Examining Relationships

Correlation and Regression Applet

www.whfreeman.com

Page 27: Chapter 3 Examining Relationships

LSRL – Using TI84

NEA Δ (cal): -94, -57, -29, 135, 143, 151, 245, 355, 392, 573, 486, 535, 571, 580, 620, 690

Fat Δ (kg): 4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3, 3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1

Enter NEA data in L1 and Fat data in L2

Page 28: Chapter 3 Examining Relationships

NEA/Fat Least-Squares Regression Line Exercise

Complete Technology Toolbox on page 210

Page 29: Chapter 3 Examining Relationships

Interpret your regression equation in terms of your variables

(i.e., fat gain = a + b × (NEA change))

Page 30: Chapter 3 Examining Relationships

Use your model to predict fat gain given an NEA change of 400 cal (interpolation)

Use your model to predict fat gain given an NEA change of 1000 cal (extrapolation)
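For readers without a TI-84, the same exercise can be sketched in Python. The data are the NEA and fat values exactly as transcribed on the slide; the helper names `least_squares_line` and `predict` are assumptions made for this example, and the fit uses the standard least-squares formulas rather than the calculator's LinReg command.

```python
# NEA change (cal) and fat gain (kg), exactly as transcribed on the slide.
nea = [-94, -57, -29, 135, 143, 151, 245, 355,
       392, 573, 486, 535, 571, 580, 620, 690]
fat = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
       3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

def predict(a, b, x, xs):
    """Return (y_hat, is_extrapolation) for a new x value."""
    y_hat = a + b * x
    # A prediction is an extrapolation when x lies outside the observed range.
    is_extrapolation = not (min(xs) <= x <= max(xs))
    return y_hat, is_extrapolation

a, b = least_squares_line(nea, fat)
print(f"fat gain = {a:.3f} + ({b:.5f}) * (NEA change)")

for x in (400, 1000):
    y_hat, extrap = predict(a, b, x, nea)
    label = "extrapolation" if extrap else "interpolation"
    print(f"NEA change {x} cal -> predicted fat gain {y_hat:.2f} kg ({label})")
```

The flag on the second prediction is a reminder that an NEA change of 1000 cal lies beyond the largest observed value (690 cal), so that prediction should be treated with caution.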

Page 31: Chapter 3 Examining Relationships

Equation of the Least-Squares Regression Line

• You can manually calculate the equation of the Least-Squares Regression Line

$$\hat{y} = a + bx$$

With slope

$$b = r\,\frac{s_y}{s_x}$$

And intercept

$$a = \bar{y} - b\bar{x}$$

Page 32: Chapter 3 Examining Relationships

Homework

• Exercises 3.29 – 32, 35, 36

• Read Section 3.3

Page 33: Chapter 3 Examining Relationships

Chapter 3 Examining Relationships

Section 3.2 Least-Squares Regression (Continued)

Section 3.3 Correlation and Regression Wisdom

Page 34: Chapter 3 Examining Relationships

Key Concept

• A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is,

residual = observed y – predicted y

$$\text{residual} = y - \hat{y}$$
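A minimal sketch of the residual calculation, using five invented data points (not from the slides) and a hypothetical helper `least_squares_line`. It also shows a standard property: the residuals from a least-squares line add up to zero, apart from rounding.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical illustration data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)

# residual = observed y - predicted y, one per observation.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    print(f"x = {x}: residual = {res:+.2f}")
print("sum of residuals:", round(sum(residuals), 10))
```

Plotting these residuals against x gives the residual plot discussed on the following slides.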

Page 35: Chapter 3 Examining Relationships

A residual is the vertical distance between the actual and predicted y

Page 36: Chapter 3 Examining Relationships

Example residual plot

Page 37: Chapter 3 Examining Relationships

Interpreting a Residual Plot

• The uniform scatter of points indicates the regression line fits the data well, so the line is a good model.

Page 38: Chapter 3 Examining Relationships

Interpreting a Residual Plot

• The residuals have a curved pattern, so a straight line is an inappropriate model

Page 39: Chapter 3 Examining Relationships

Interpreting a Residual Plot

• The response variable y has more spread for larger values of the explanatory variable x, so prediction will be less accurate when x is large.

Page 40: Chapter 3 Examining Relationships

Create a Residual Plot with Hand-Span Data

Follow procedures detailed in Technology toolbox on page 219

Page 41: Chapter 3 Examining Relationships

Key Concept

The coefficient of determination, r², is the fraction of the variation in the values of y that is explained by least-squares regression of y on x.

E.g., ____% of the variation in Height is accounted for by the linear relationship between hand size and Height.
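One way to see what "fraction of the variation explained" means is to compute it directly as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total variation of y about its mean, and compare it with the square of r. The sketch below does this for five invented data points; the data and variable names are assumptions for illustration.

```python
from math import sqrt

# Hypothetical illustration data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Least-squares line y_hat = a + b * x.
b = sxy / sxx
a = y_bar - b * x_bar

# SST: total variation in y about its mean.
sst = syy
# SSE: variation left over after fitting the line (sum of squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / sst      # fraction of variation explained
r = sxy / sqrt(sxx * syy)      # correlation

print(f"r^2 from residuals: {r_squared:.3f}")
print(f"r squared directly: {r ** 2:.3f}")
```

For these made-up points, 60% of the variation in y is accounted for by the linear relationship with x.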

Page 42: Chapter 3 Examining Relationships

Facts about Least-Squares Regression

• Fact 1 – The distinction between explanatory and response variables is essential in regression

Page 43: Chapter 3 Examining Relationships

Facts about Least-Squares Regression

• Fact 2 – There is a close connection between correlation and the slope of the least squares line. The slope is:

$$b = r\,\frac{s_y}{s_x}$$

Page 44: Chapter 3 Examining Relationships

Facts about Least-Squares Regression

• Fact 3 – The least-squares line always passes through the point $(\bar{x}, \bar{y})$
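A short check of Fact 3, assuming the intercept formula a = ȳ - b x̄ for the least-squares line ŷ = a + bx: substituting x = x̄ into the line gives

```latex
\begin{aligned}
\hat{y}(\bar{x}) &= a + b\bar{x} \\
                 &= (\bar{y} - b\bar{x}) + b\bar{x} \\
                 &= \bar{y}
\end{aligned}
```

so the predicted value at the mean of x is the mean of y, whatever the slope is.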

Page 45: Chapter 3 Examining Relationships

Constructing the Least-Squares Example

Suppose we have explanatory and response variables and we know that the mean of x = 17.222, the mean of y = 161.111, sx = 19.696, sy = 33.479, and the correlation r = 0.997. Even though we don't know the actual data, we can still construct the equation for the least-squares line and use it to make predictions.

Page 46: Chapter 3 Examining Relationships

Constructing the Least-Squares Example

$$b = r\,\frac{s_y}{s_x} = 0.997 \times \frac{33.479}{19.696} = 1.695$$

$$a = \bar{y} - b\bar{x} = 161.111 - (1.695)(17.222) = 131.920$$

So the least-squares line has the equation

$$\hat{y} = 131.920 + 1.695x$$
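The arithmetic in this example can be checked with a few lines of Python. The summary statistics are the ones given in the example (with the mean of x taken as 17.222, the value used in the intercept calculation); the variable names are assumptions for illustration.

```python
# Summary statistics from the worked example.
x_bar, y_bar = 17.222, 161.111
s_x, s_y = 19.696, 33.479
r = 0.997

# Slope: b = r * s_y / s_x
b = r * s_y / s_x

# Intercept using the unrounded slope: a = y_bar - b * x_bar
a = y_bar - b * x_bar

# The slide rounds b to three decimals before computing a,
# which changes the last digits of the intercept slightly.
b_slide = round(b, 3)
a_slide = y_bar - b_slide * x_bar

print(f"b = {b:.3f}")
print(f"a (unrounded slope) = {a:.3f}")
print(f"a (slope rounded as on the slide) = {a_slide:.3f}")
```

The small gap between the two intercepts comes only from when the slope is rounded, not from a different method.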

Page 47: Chapter 3 Examining Relationships

Facts about Least-Squares Regression

• Fact 4 – The correlation r describes the strength of a straight-line relationship. In the regression setting, this description takes a specific form: the square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.

Page 48: Chapter 3 Examining Relationships

Key Concept

• Correlation and regression describe only linear relationships

• Extrapolation (using a model outside of the range of the data) often produces unreliable predictions

Page 49: Chapter 3 Examining Relationships

Outliers and Influential Observations in Regression

• An outlier is an observation that lies outside the overall pattern of the other observations

• An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatter plot are often influential for the least-squares regression line. (Example: Revisit correlation applet)
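The idea of an influential observation can be illustrated by fitting the line with and without a suspect point. In this sketch the data are invented: five points with a clear upward trend plus one point far out in the x direction. The helper name `slope` is an assumption for the example.

```python
def slope(xs, ys):
    """Slope b of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: the last point, (15, 3), is an outlier in the x direction.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 5, 4, 5, 3]

slope_all = slope(xs, ys)
# Refit after removing the suspect point.
slope_without = slope(xs[:-1], ys[:-1])

print(f"slope with the point:    {slope_all:.3f}")
print(f"slope without the point: {slope_without:.3f}")
```

Removing the single point changes the slope from slightly negative to clearly positive, which is what "markedly change the result" looks like in practice.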

Page 50: Chapter 3 Examining Relationships

Child 19 and Child 18 are both outliers. Child 18 is more influential.

Page 51: Chapter 3 Examining Relationships

Beware the Lurking Variable

• A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

• Examples:

– 1 A strong positive correlation exists between the weight and reading skills of elementary school children.

– 2 Methodist Preacher and Whisky.

• What are the lurking variables?

Page 52: Chapter 3 Examining Relationships

Beware of Correlations Based on Averaged Data

• Correlations based on averaged data are usually too high when applied to individuals.

• Example: age vs height of individual young children and average age vs average height of young children.

• Variation decreases with averaged data
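A small sketch of this effect, using invented ages and heights for twelve children (three at each age). The numbers and names are assumptions for illustration only. The correlation is computed once for the individual children and once for the average height at each age.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r, written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (cm) of three children at each age (years).
heights_by_age = {
    2: [80, 86, 92],
    3: [88, 95, 102],
    4: [95, 102, 109],
    5: [100, 109, 118],
}

# Individual-level data: one (age, height) pair per child.
ind_age = [age for age, hs in heights_by_age.items() for _ in hs]
ind_height = [h for hs in heights_by_age.values() for h in hs]
r_individuals = correlation(ind_age, ind_height)

# Averaged data: one (age, mean height) pair per age group.
avg_age = list(heights_by_age)
avg_height = [sum(hs) / len(hs) for hs in heights_by_age.values()]
r_averages = correlation(avg_age, avg_height)

print(f"r for individual children: {r_individuals:.3f}")
print(f"r for age-group averages:  {r_averages:.3f}")
```

Averaging removes the child-to-child variation within each age group, so the averaged points hug a line much more tightly than the individual points do.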

Page 53: Chapter 3 Examining Relationships

Homework

• Exercises 37, 38, 39, 40, 48, 50, 64, 66, 67, 71

• Take Home Quiz