chapter 3 examining relationships
Post on 19-Mar-2016
Chapter 3: Examining Relationships
Section 3.1 Scatterplots
Terms to Know
A response variable measures an outcome of a study. An explanatory
variable attempts to explain the observed outcomes.
Example of an Explanatory and Response Variable
One degree day is accumulated for each degree a day’s average temp falls below or rises above 65 degrees.
Key Concept
The statistical techniques used to study relations among variables are more complex than one-variable methods.
Fortunately we build on the tools used for examining individual variables. The principles that guide
examination are the same.
1. Start with a graph
2. Look for an overall pattern and deviations from the pattern
3. Add numerical descriptions of specific aspects of the data
4. Sometimes there is a way to describe that
Term to Know
The most effective way to display the relation between two quantitative variables
is a scatterplot. Plot the explanatory variable, if there is one, on the x-axis, and the response variable on the y-axis. Each individual in the data appears as a point.
ScatterPlot
Interpreting Scatterplots
To interpret a scatterplot, look first for a pattern. The pattern should reveal direction, form and strength of the relationship between two variables.
Refer to Figure 3.1 on page 175.
Form: two clusters
Direction: negatively associated
Strength: moderate
Strong, Positive Association with Linear Form
Some Relationships Have No Direction or Pattern
Not All Relationships are Linear
[Figure: "Collection 1" scatter plot of Mileage (y-axis, 4 to 22) against Speed (x-axis, 0 to 160)]
Yes, you can have outliers on ScatterPlots
An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph (for example, WV).
Add a Third Variable (Categorical) of Southern and non-Southern by Using Different Symbols
Scatter Plot Heads Up
When several individuals have exactly the same data, they occupy the same point on the scatter plot. Some software packages address the issue by using different symbols for multiple individuals with the same data. You can do the same by hand. However, your calculator does not. So be careful. Use trace to identify such cases.
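As a rough sketch of how software can flag overlapping points before plotting, the snippet below counts how many individuals share each (x, y) pair. The data and the function name `overlapping_points` are made up for illustration.

```python
from collections import Counter

# Hypothetical (x, y) observations; two individuals share the point (3, 7).
points = [(1, 4), (3, 7), (2, 5), (3, 7), (5, 9)]

def overlapping_points(data):
    """Return {point: count} for every point shared by more than one individual."""
    counts = Counter(data)
    return {point: n for point, n in counts.items() if n > 1}

duplicates = overlapping_points(points)
print(duplicates)
```

Any point that appears in the result could be drawn with a different symbol, which is the by-hand fix described above.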
Scatterplots display the direction, form, and strength of the relationship between two variables. However, our eyes are not good judges of how strong a relationship is.
Key Concept
Correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r.
$$ r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
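The calculation of r can be sketched in Python as the sum of products of standardized x and y values, divided by n - 1. This is a minimal illustration with made-up data; the function name `correlation_r` is an assumption, not something from the textbook.

```python
from statistics import mean, stdev

def correlation_r(xs, ys):
    """Sum of products of standardized x and y values, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_r(xs, ys)
print(round(r, 4))
```

Points that fall exactly on an upward-sloping line give r = 1 with this function, which is a quick sanity check.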
Facts About Correlation
• No distinction between explanatory and response variables
• Requires two quantitative variables
• Changing the units of measurement does not change the correlation
• Positive r indicates positive association, negative r indicates negative association
• Range: -1 ≤ r ≤ 1
• Measures the strength of linear relationships between two variables only
• Is not resistant to outliers
Correlation Exercise
• Technology Toolbox, page 186
• Yes, the process is long and convoluted, but there is a shortcut using the LinReg command
Key Concept
1) A key thing to remember when working with correlations is never to assume a correlation means that a change in one variable causes a change in another. Sales of personal computers and athletic shoes have both risen strongly in the last several years and there is a high correlation between them, but you cannot assume that buying computers causes people to buy athletic shoes (or vice versa).
Key Concept
2) Correlation describes only linear relationships, no matter how strong a curved relationship may be.
3) Like the mean and standard deviation, the correlation r is not resistant to outliers.
4) Correlation is not a complete summary of a two-variable relationship. You should also give the means and standard deviations of both x and y.
Homework
• Read 3.2
• Complete problems 1, 2, 6, 7, 8, 13, 15, 19, 21, 23
Section 3.2 Least-Squares Regression
Key Term
Least-squares regression is a method for finding a line that summarizes the relationship between two variables that show a linear trend.
We often use a regression line to predict the value of y for a given value of x. Regression, unlike correlation, requires that we have an explanatory variable and a response variable.
Regression Line for Predicting Gas Consumption from Degree Days
Why is it Called A Least-Squares Regression Line (“LSRL”)?
Correlation and Regression Applet: www.whfreeman.com
LSRL – Using TI84
NEA Δ (cal): -94 -57 -29 135 143 151 245 355 392 573 486 535 571 580 620 690
Fat Δ (kg): 4.2 3.0 3.7 2.7 3.2 3.6 2.4 1.3 3.8 1.7 1.6 2.2 1.0 0.4 2.3 1.1
Enter NEA data in L1 and Fat data in L2
NEA/Fat Least-Squares Regression Line Exercise
Complete Technology Toolbox on page 210
Interpret your regression equation in terms of your variables
(i.e., fat gain = a + b(NEA change))
Use your Model to predict weight gain given an NEA of 400
(interpolation)
Use your Model to predict weight gain given an NEA of 1000
(extrapolation)
Equation of the Least-Squares Regression Line
• You can manually calculate the equation of the Least-Squares Regression Line
$$ \hat{y} = a + bx $$
With slope
$$ b = r \frac{s_y}{s_x} $$
And intercept
$$ a = \bar{y} - b\bar{x} $$
Homework
• Exercises 3.29 – 32, 35, 36
• Read Section 3.3
Section 3.2 Least-Squares Regression (Continued)
Section 3.3 Correlation and Regression Wisdom
Key Concept
• A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is,
residual = observed y - predicted y
$$ \text{residual} = y - \hat{y} $$
Residual is the distance between actual and predicted y
Example residual plot
Interpreting a Residual Plot
• The uniform scatter of points indicates the regression line fits the data well, so the line is a good model.
Interpreting a Residual Plot
• The residuals have a curved pattern, so a straight line is an inappropriate model.
Interpreting a Residual Plot
• The response variable y has more spread for larger values of the explanatory variable x, so prediction will be less accurate when x is large.
Create a Residual Plot with Hand-Span Data
Follow procedures detailed in Technology toolbox on page 219
Key Concept
The coefficient of determination, r², is the fraction of the variation in the values of y that is explained by least-squares regression of y on x.
Eg: ____% of the variation in Height is accounted for by the linear relationship between hand size and Height.
Facts about Least-Squares Regression
• Fact 1 – The distinction between explanatory and response variables is essential in regression
Facts about Least-Squares Regression
• Fact 2 – There is a close connection between correlation and the slope of the least squares line. The slope is:
$$ b = r \frac{s_y}{s_x} $$
Facts about Least-Squares Regression
• Fact 3 – The least squares line always passes through the point
$$ (\bar{x}, \bar{y}) $$
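Fact 3 follows in one line from the intercept formula. Substituting the mean of x into the line, with intercept a equal to the mean of y minus b times the mean of x, gives:

```latex
\hat{y}\big|_{x=\bar{x}} = a + b\bar{x} = (\bar{y} - b\bar{x}) + b\bar{x} = \bar{y}
```

So the predicted value at the mean of x is exactly the mean of y, whatever the slope is.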
Constructing the Least-Squares Example
Suppose we have explanatory and response variables and we know that the mean of x = 17.222, the mean of y = 161.111, sx = 19.696, sy = 33.479 and the correlation r = .997. Even though we don’t know the actual data, we can still construct the equation for the least-squares line and use it to make predictions.
Constructing the Least-Squares Example
$$ b = r \frac{s_y}{s_x} = .997 \times \frac{33.479}{19.696} = 1.695 $$
$$ a = \bar{y} - b\bar{x} = 161.111 - (1.695)(17.222) = 131.920 $$
So the least-squares line has the equation
$$ \hat{y} = 131.920 + 1.695x $$
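The same arithmetic can be reproduced in a few lines. This sketch uses the summary statistics quoted in the example, with 17.222 for the mean of x (the value that appears in the intercept calculation), and rounds the slope to three decimals before computing the intercept, as the slide does. The function name `predict` is illustrative.

```python
# Summary statistics quoted in the example (no raw data needed).
x_bar, y_bar = 17.222, 161.111
s_x, s_y = 19.696, 33.479
r = 0.997

# Slope b = r * s_y / s_x, rounded to three decimals as in the slide.
b = round(r * s_y / s_x, 3)

# Intercept a = y-bar - b * x-bar.
a = y_bar - b * x_bar

def predict(x):
    """Predicted y from the least-squares line y-hat = a + b*x."""
    return a + b * x

print(f"y-hat = {a:.3f} + {b:.3f}x")
```

Carrying the unrounded slope through instead changes the intercept slightly in the third decimal place, which is ordinary rounding drift rather than an error.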
Facts about Least-Squares Regression
• Fact 4 – The correlation r describes the strength of a straight-line relationship. In the regression setting, this description takes a specific form: the square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
Key Concept
• Correlation and regression describe only linear relationships
• Extrapolation (using a model outside of the range of the data) often produces unreliable predictions
Outliers and Influential Observations in Regression
• An outlier is an observation that lies outside the overall pattern of the other observations
• An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatter plot are often influential for the least-squares regression line. (Example: Revisit correlation applet)
Child 19 and Child 18 are both outliers. Child 18 is more influential.
Beware the Lurking Variable
• A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.
• Examples:
– 1 A strong positive correlation exists between the weight and reading skills of elementary school children.
– 2 Methodist Preacher and Whisky.
• What are the lurking variables?
Beware of Correlations Based on Averaged Data
• Correlations based on averaged data are usually too high when applied to individuals.
• Example: age vs height of individual young children and average age vs average height of young children.
• Variation decreases with averaged data
Homework
• Exercises 37, 38, 39, 40, 48, 50, 64, 66, 67, 71
• Take Home Quiz