chapter 7

25
Chapter 7 Scatterplots, Association, and Correlation 60 min

Upload: kolina

Post on 23-Feb-2016

36 views

Category:

Documents


0 download

DESCRIPTION

Chapter 7. Scatterplots , Association, and Correlation . 60 min. Associations. Associations between two quantitative variables can be explored using scatterplots . Ex: Study time v.s test scores. Scatterplots. Scatterplots may be the most common and most effective display for data. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Chapter 7

Chapter 7Scatterplots, Association,

and Correlation

60 min

Page 2: Chapter 7

Associations between two quantitative variables can be explored using scatterplots.

Ex: Study time v.s test scores

Associations

Study time (in hours) 0.5 1.2 1.3 1.5 1.5 1.8 2.5 3 3.5 4 4 4.5 5 4.5 4.5 4 0 1 2.6 3.8 3.8 4.2 1 5.5 5

Test score (%) 54 65 67 69 66 68 80 82 84 94 90 92 70 98 100 88 40 95 82 82 84 87 66 99 98

Study time (in hours) 0.5 1.2 1.3 1.5 1.5 1.8 2.5 3 3.5 4 4 4.5 5 4.5 4.5 4 0 1 2.6 3.8 3.8 4.2 1 5.5 5

Test score (%) 54 65 67 69 66 68 80 82 84 94 90 92 70 98 100 88 40 95 82 82 84 87 66 99 98

Page 3: Chapter 7

0 1 2 3 4 5 60

20

40

60

80

100

120

Test Scores v.s. Study Time

Study time (hours)

Test

Sco

re (%

)

Page 4: Chapter 7

Scatterplots may be the most common and most effective display for data.

Scatterplots are the best way to start observing the relationship and the ideal way to picture associations between two quantitative variables.

When looking at scatterplots, we will look for direction, form, strength, and unusual features.

Scatterplots

Page 5: Chapter 7

A pattern that runs from the upper left to the lower right is said to have a negative direction.

A trend running the other way has a positive direction.

Direction

The figure shows a negative direction between the year since 1970 and the and the prediction errors made by NOAA.

As the years have passed, the predictions have improved (errors have decreased).

Page 6: Chapter 7

The example in the text shows a negative association between central pressure and maximum wind speed

As the central pressure increases, the maximum wind speed decreases.

Direction

Page 7: Chapter 7

Form If there is a straight line (linear) relationship,

it will appear as a cloud or swarm of points stretched out in a generally consistent, straight form.

Page 8: Chapter 7

Example What is the coldest temperature? -273.150 ( 0 in Kelvin scale, the so-called absolute zero) How do scientists determine that it cannot be colder than 0 K?

The volume of hydrogen gas depends on its temperature: 4334.2208213.0 TV

Temp -40 -20 0 20 40 60 80V(mole)

19.1482 20.7908 22.4334 24.0760 25.7186 27.3612 29.0038

Page 9: Chapter 7

The relationship is straight? curved gently? Increasing then decreasing? No pattern?

Form

Page 10: Chapter 7

How strong is the relationship?

Note: we will quantify the amount of scatter soon.

Strength

Page 11: Chapter 7

Look for the unexpected: Oftentimes the most interesting thing to see in a scatterplot is the thing you never thought to look for.

One example of such a surprise is an outlier standing away from the overall pattern of the scatterplot.

Clusters or subgroups should also raise questions.

Unusual features

Page 12: Chapter 7

It is important to determine which of the two quantitative variables goes on the x-axis and which on the y-axis. This determination is made based on the roles played by the variables.

When the roles are clear, the explanatory or predictor variable goes on the x-axis, and the response variable goes on the y-axis.◦ Example 1: Price versus Demand of a product◦ Example 2: A study confirm that people who use

artificial sweeteners in place of sugar tend to be heavier than people who use sugar. Does this mean that artificial sweeteners cause weight gain?

Roles for Variables

Page 13: Chapter 7

The roles that we choose for variables are more about how we think about them rather than about the variables themselves.

Just placing a variable on the x-axis doesn’t necessarily mean that it explains or predicts anything. And the variable on the y-axis may not respond to it in any way.

Roles for variables

Page 14: Chapter 7

Data collected from students in Statistics classes included their heights (in inches) and weights (in pounds):

Here we see a positive association and a fairly straight form, although there seems to be a high outlier.

Correlation

Page 15: Chapter 7

How strong is the association between weight and height of Statistics students?

If we had to put a number on the strength, we would not want it to depend on the units we used.

A scatterplot of heights (in centimeters) and weights (in kilograms) doesn’t change the shape of the pattern:

Correlation (cont.)

Page 16: Chapter 7

Since the units don’t matter, why not remove them altogether?

We could standardize both variables and write the coordinates of a point as (zx, zy).

Here is a scatterplot of the standardized weights and heights:

Correlation (cont.)

Page 17: Chapter 7

Some points (those in green) strengthen the impression of a positive association between height and weight.

Other points (those in red) tend to weaken the positive association.

Points with z-scores of zero (those in blue) don’t vote either way.

Correlation (cont.)

Page 18: Chapter 7

1x yz z

rn

Correlation Coefficient The correlation coefficient (r) gives us a

numerical measurement of the strength of the linear relationship between the explanatory and response variables.

For the students’ heights and weights, the correlation is 0.644.

What does this mean in terms of strength? We’ll address this shortly.

Page 19: Chapter 7

Correlation measures the strength of the linear association between the two variables. ◦ Variables can have a strong association but still have a

small correlation if the association isn’t linear. The sign of a correlation coefficient r gives the

direction of the association. Correlation coefficient r is always between -1

and +1. ◦ A correlation near 1 or -1 corresponds to a very strong

linear association.◦ A correlation near zero corresponds to a weak (or

almost no) linear association.

Correlation Properties

Page 20: Chapter 7

Correlation treats x and y symmetrically: ◦ The correlation of x with y is the same as the

correlation of y with x. Correlation has no units. Correlation is not affected by changes in the

center or scale of either variable. ◦ Correlation depends only on the z-scores, and they

are unaffected by changes in center or scale. Correlation is sensitive to outliers. A single

outlying value can make a small correlation large or make a large one small.

Correlation Properties (cont.)

Page 21: Chapter 7

Example: Lying Children will grow up to be successful citizens.

◦Researchers have recently found that kids who lie may have more success ahead. “Don’t tell a lie” is one of the first lessons parents try to teach their kids, but a new study finds children who lie may end up doing better as adults. A study at Toronto University of more than 1000 children ages 2 to 17 found the ability….(May 21, 2010)

Should we encourage our kids to lie?

Page 22: Chapter 7

Whenever we have a strong correlation, it is tempting to explain it by imagining that the predictor variable has caused the response variable to help.

Scatterplots and correlation coefficients never prove causation.

A hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables is called a lurking variable.

Correlation ≠ Causation

Page 23: Chapter 7

Don’t confuse “correlation” with “causation.”◦ Scatterplots and correlations never demonstrate

causation. These statistical tools can only demonstrate an association between variables.

Don’t correlate categorical variables. Be sure the association is linear.

◦ There may be a strong association between two variables that have a nonlinear association.

What Can Go Wrong?

Page 24: Chapter 7

We examine scatterplots for direction, form, strength, and unusual features.

Although not every relationship is linear, when the scatterplot is straight enough, the correlation coefficient is a useful numerical summary.◦ The sign of the correlation tells us the direction

of the association.◦ The magnitude of the correlation tells us the

strength of a linear association.◦ Correlation has no units, so shifting or scaling

the data, standardizing, or swapping the variables has no effect on the numerical value.

What have we learned?

Page 25: Chapter 7

Page 186 – 193Problem # 3, 5, 7, 11, 13, 17, 25, 29, 33, 35.

Homework Assignment