relationships
Relationships
• If we are doing a study which involves more than one variable, how can we tell if there is a relationship between two (or more) of the variables?
• Association Between Variables : Two variables measured on the same individuals are associated if some values of one variable tend to occur more often with some values of the second variable than with other values of that variable.
• Response Variable : A response variable measures an outcome of a study.
• Explanatory Variable : An explanatory variable explains or causes changes in the response variable.
Scatterplots
• A scatterplot shows the relationship between two variables.
• The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis.
• Always plot the explanatory variable on the horizontal axis, and the response variable on the vertical axis.
Example: If we are going to try to predict someone’s weight from their height, then the height is the explanatory variable, and the weight is the response variable.
• The explanatory variable is often denoted by the variable x, and is sometimes called the independent variable.
• The response variable is often denoted by the variable y, and is sometimes called the dependent variable.
Example: Do you think that a father’s height would affect his son’s height? That is, given a father’s height, can we say anything about the son’s height?
The explanatory variable is : The father’s height
The response variable is : The son’s height
Data Set :
Father’s Height    Son’s Height
64                 65
68                 67
68                 70
70                 72
72                 75
74                 70
75                 73
75                 76
76                 77
77                 76
[Scatterplot of the data set: Father’s Height (explanatory variable) on the horizontal axis, Son’s Height (response variable) on the vertical axis; both axes are marked at 64, 68, 72, and 76.]
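The plotting rule above (explanatory variable across, response variable up) can be sketched in a few lines of Python. This is only a rough text rendering using the standard library, not the plot from the slides; the function name text_scatterplot and the "*" and "." symbols are choices made for this sketch.

```python
# Father/son heights from the data set above, as (father, son) pairs.
heights = [(64, 65), (68, 67), (68, 70), (70, 72), (72, 75),
           (74, 70), (75, 73), (75, 76), (76, 77), (77, 76)]


def text_scatterplot(pairs):
    """Return a text scatterplot: x (explanatory) across, y (response) up."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    points = set(pairs)
    rows = []
    # Emit the largest y first so the vertical axis increases upward.
    for y in range(max(ys), min(ys) - 1, -1):
        cells = "".join(
            "*" if (x, y) in points else "."
            for x in range(min(xs), max(xs) + 1)
        )
        rows.append(f"{y:>3} | {cells}")
    return "\n".join(rows)


plot = text_scatterplot(heights)
print(plot)
```

Each row is one son's height and each column is one father's height, so the points drift from the lower left to the upper right.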
Examining A Scatterplot
• In any graph of data, look for the overall pattern and for striking deviations from that pattern.
• You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship.
• An important kind of deviation is an outlier, an individual that falls outside the overall pattern of the relationship.
• Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together.
• Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa.
• Strength : How closely the points follow a clear form.
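The definitions of positive and negative association can be checked directly on the father/son data. The sketch below counts how many points lie on the same side of both averages and how many lie on opposite sides. Deciding the direction by a simple majority is a simplification assumed here, not a rule from the slides, and the function name association_counts is hypothetical.

```python
from statistics import mean

# Father/son heights from the data set, as (father, son) pairs.
heights = [(64, 65), (68, 67), (68, 70), (70, 72), (72, 75),
           (74, 70), (75, 73), (75, 76), (76, 77), (77, 76)]


def association_counts(pairs):
    """Count points on the same side vs. opposite sides of the two means."""
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    same = opposite = 0
    for x, y in pairs:
        # Positive product: both above average or both below average.
        product = (x - x_bar) * (y - y_bar)
        if product > 0:
            same += 1
        elif product < 0:
            opposite += 1
    return same, opposite


same, opposite = association_counts(heights)
if same > opposite:
    direction = "positive"
elif opposite > same:
    direction = "negative"
else:
    direction = "unclear"
print(same, opposite, direction)
```

For this data set, nine of the ten points fall on the same side of both averages; the exception is the pair (74, 70), where the father is above average and the son is below.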
Consider the previous scatterplot (Father on the horizontal axis, Son on the vertical axis; both axes marked at 64, 68, 72, and 76) :
Direction : Going up
Form : Linear
Association : Positive
Strength : Strong
Outliers : None
Example : The following is a scatterplot of data collected from states about students taking the SAT. The question is whether the percentage of a state’s students who take the test influences the state’s average scores.
For instance, in California, 45% of high school graduates took the SAT and the mean verbal score was 495.
Direction : Downward
Form : Curved
Association : Negative
Strength : Strong
Outliers : Maybe
Categorical Variables
• To add a categorical variable to a scatterplot, use a different plot color or symbol for each category.
Example : Take the last scatterplot and mark the northeastern states with an “e” and the midwestern states with an “m” :
Notice the grouping :
Outliers ?
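Choosing a plot symbol per category can be sketched as a simple lookup. The "e" and "m" symbols follow the example above; the "o" symbol for all other states, the record layout, and the function names are assumptions made for this sketch. Only California's values appear in the slides, so it is the only record used here.

```python
# Plot symbol for each region. "e" and "m" follow the example above;
# "o" for every other state is an assumption of this sketch.
REGION_SYMBOLS = {"northeast": "e", "midwest": "m"}
DEFAULT_SYMBOL = "o"


def symbol_for(region):
    """Return the plot symbol for a state's region."""
    return REGION_SYMBOLS.get(region, DEFAULT_SYMBOL)


def label_points(states):
    """Turn (name, percent_taking, mean_verbal, region) records
    into (percent_taking, mean_verbal, symbol) points ready to plot."""
    return [
        (percent, score, symbol_for(region))
        for _name, percent, score, region in states
    ]


# California's numbers come from the example above; the region label
# "west" is supplied here for illustration.
states = [("California", 45, 495, "west")]
points = label_points(states)
print(points)
```

A plotting routine would then draw each point with its own symbol, which is what makes the regional grouping visible.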
Notes
• When we draw the line through the data set, we are drawing the model we want to use for the data set. We would like to find the equation for this line to help us understand the data. This is called “smoothing”.
(Figure 2.5) (Figure 2.6)
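The slides do not say how the equation of the line is found. One common choice is the least-squares line, which is assumed in the sketch below and applied to the father/son heights from the earlier example. The function name fit_line is hypothetical.

```python
from statistics import mean

# Father/son heights from the earlier example, as (father, son) pairs.
heights = [(64, 65), (68, 67), (68, 70), (70, 72), (72, 75),
           (74, 70), (75, 73), (75, 76), (76, 77), (77, 76)]


def fit_line(pairs):
    """Return (slope, intercept) of the least-squares line
    y = intercept + slope * x."""
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(heights)
print(f"son = {intercept:.2f} + {slope:.2f} * father")
```

A positive slope agrees with the positive association seen in the scatterplot. Other ways of drawing a line or curve through the data would give a different equation.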
• How can we display a relationship between a categorical explanatory variable and a quantitative response variable?
• Use a back-to-back stemplot to compare two distributions.
• Use side-by-side boxplots to compare any number of distributions.
Example : It would make sense that the more people work, the higher their wages. The Census Bureau publishes relevant data. Unfortunately, “how much a person works” appears as a categorical variable :
A = 26 weeks or less
B = 27 to 39 weeks
C = 50 weeks or more
Notice also that wages is a quantitative variable.
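Side-by-side boxplots amount to drawing one five-number summary per category. The sketch below groups (category, value) records and summarizes each group. The records are placeholder numbers chosen only to exercise the code, NOT Census Bureau wages, and the "inclusive" quartile method is one convention among several, so a textbook may give slightly different quartiles. The function names are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def summarize_by_category(records):
    """Group (category, value) records and summarize each group."""
    groups = {}
    for category, value in records:
        groups.setdefault(category, []).append(value)
    return {
        category: five_number_summary(values)
        for category, values in sorted(groups.items())
    }


# Placeholder values only (not real wage data), labeled with two of the
# category codes from the example above.
records = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("A", 5),
           ("C", 2), ("C", 4), ("C", 6), ("C", 8), ("C", 10)]

summaries = summarize_by_category(records)
for category, summary in summaries.items():
    print(category, summary)
```

Placing the summaries next to each other, one per category, is what lets a quantitative response be compared across the levels of a categorical explanatory variable.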
Homework
1, 3, 4, 6, 7, 10, 13, 17