![Page 1: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/1.jpg)
LECTURE 2Understanding Relationships Between 2 Numerical Variables1 Scatterplots and correlation
2 Fitting a straight line to bivariate data
![Page 2: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/2.jpg)
Objectives
Scatterplots
Explanatory (independent) and response (dependent) variables
Interpreting scatterplots
Outliers
Categorical variables in scatterplots
![Page 3: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/3.jpg)
Focus on Three Features of a Scatterplot
Look for an overall pattern regarding …
1. Shape - ? Approximately linear, curved, up-and-down?
2. Direction - ? Positive, negative, none?
3. Strength - ? Are the points tightly clustered in the particular shape, or are they spread out?
Blood Alcohol as a function of Number of Beers
0.00
0.02
0.04
0.06
0.08
0.10
0.12
0.14
0.16
0.18
0.20
0 1 2 3 4 5 6 7 8 9 10
Number of Beers
Blo
od A
lcoh
ol L
evel
(m
g/m
l)… and deviations from the overall pattern:Outliers
![Page 4: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/4.jpg)
Explanatory (independent) variable: number of beers
Blood Alcohol as a function of Number of Beers
0.00
0.02
0.04
0.06
0.08
0.10
0.12
0.14
0.16
0.18
0.20
0 1 2 3 4 5 6 7 8 9 10
Number of Beers
Blo
od A
lcoh
ol L
evel
(m
g/m
l)Response
(dependent)
variable:
blood alcohol
content
xy
Explanatory and response variablesA response variable measures or records an outcome of a study. An
explanatory variable explains changes in the response variable.
Typically, the explanatory or independent variable is plotted on the x
axis, and the response or dependent variable is plotted on the y axis.
![Page 5: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/5.jpg)
Making Scatterplots
House Price Square Feet is bivariate data:
Excel: Insert scatterplot
MegaStat: Correlation/Regression - Scatterplot
![Page 6: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/6.jpg)
Form and direction of an association
Linear
Nonlinear
No relationship
![Page 7: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/7.jpg)
Positive association: High values of one variable tend to occur together
with high values of the other variable.
Negative association: High values of one variable tend to occur together
with low values of the other variable.
![Page 8: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/8.jpg)
One way to think about this is to remember the following: The equation for this line is y = 5.x is not involved.
No relationship: X and Y vary independently. Knowing X tells you nothing about Y.
![Page 9: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/9.jpg)
Strength of the association
The strength of the relationship between the two variables can be
seen by how much variation, or scatter, there is around the main form.
With a strong relationship, you can get a pretty good estimate
of y if you know x.
With a weak relationship, for any x you might get a wide range of
y values.
![Page 10: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/10.jpg)
This is a very strong relationship.
The daily amount of gas consumed
can be predicted quite accurately for
a given temperature value.
This is a weak relationship. For a
particular state median household
income, you can’t predict the state
per capita income very well.
![Page 11: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/11.jpg)
How to scale a scatterplot
Using an inappropriate scale for a scatterplot can give an incorrect impression.
For greatest detail, both variables should be given a similar amount of space:• Plot roughly square• Points should occupy all
the plot space (no blank space)
• For most accurate the Y-axis should have a 0 origin.
Same data in all four plots
![Page 12: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/12.jpg)
Outliers
An outlier is a data value that has a very low probability of occurrence
(i.e., it is unusual or unexpected).
In a scatterplot, outliers are points that fall outside of the overall pattern
of the relationship.
![Page 13: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/13.jpg)
Not an outlier:
The upper right-hand point here is
not an outlier of the relationship—It
is what you would expect for this
many beers given the linear
relationship between beers/weight
and blood alcohol. It is however an
outlier for both X and Y values
This point is not in line with the
others, so it is an outlier of the
relationship. It is also an X
outlier but not a Y outlier
Outliers
![Page 14: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/14.jpg)
IQ score and Grade point average
a)Describe in words what this plot shows.
b)Describe the direction, shape, and strength. Are there outliers?
c) What is the deal with these people?
![Page 15: LECTURE 2 Understanding Relationships Between 2 Numerical Variables 1 Scatterplots and correlation 2 Fitting a straight line to bivariate data](https://reader035.vdocuments.site/reader035/viewer/2022081519/56649dbf5503460f94ab2fda/html5/thumbnails/15.jpg)
Categorical variables in scatterplotsOften, things are not simple and one-dimensional. We need to group
the data into categories to reveal trends.
What may look like a positive linear
relationship is in fact a series of
negative linear associations.
Plotting different habitats in different
colors allows us to make that
important distinction.