
Upload: clementine-anderson

Post on 13-Dec-2015


Page 1: The Big Picture Where we are coming from and where we are headed… Chapter 5 showed us methods for summarizing data using descriptive statistics, but only

The Big Picture

Where we are coming from and where we are headed… Chapter 5 showed us methods for summarizing data using descriptive statistics, but only one variable at a time. In Chapter 6, we learn how to analyze the relationship between two quantitative variables using scatterplots, correlation, and regression. In Chapter 7, we will learn about probability, which we will need in order to perform statistical inference.


• Scatterplots and Correlation

Section 6.1

Objectives:

Construct and interpret scatterplots for two quantitative variables.

Calculate and interpret the correlation coefficient.

Determine whether a linear correlation exists between two variables.

Explanatory and Response Variables

• A response variable measures an outcome of a study.

• An explanatory variable explains, influences, or causes change in a response variable.

• The explanatory variable is also called the independent variable, and the response variable the dependent variable.

• Be careful!! The relationship between two variables can be strongly influenced by other variables that are lurking in the background.


Explanatory and response variables

In each of the following examples, determine if there is a clear explanatory and response variable, or if it is just best to explore the relationship.

• Price of a house and square footage of a house

• The arm span and height of a person

• Amount of snow in the Colorado mountains and the volume of water in area rivers

Answers:

• Explanatory: square footage. Response: price.

• Explanatory: arm span. Response: height.

• Explore the relationship.


Displaying relationships: Scatterplots

– A scatterplot displays the relationship between two quantitative variables measured on the same individuals.

– It is the most common way to display the relation between two quantitative variables.

– It displays the form, direction, and strength of the relationship between two quantitative variables.

– The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.

Example:

Lot              x = square footage (100s of sq ft)    y = sales price ($1000s)
Harding St       75                                    155
Newton Ave       125                                   210
Stacy Ct         125                                   290
Eastern Ave      175                                   360
Second St        175                                   250
Sunnybrook Rd    225                                   450
Ahlstrand Rd     225                                   530
Eastern Ave      275                                   635
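The lot data in this example can be turned into a rough scatterplot without a graphing calculator. The following is a minimal sketch in Python, using only the standard library; the function name `text_scatterplot` and the 40 by 12 character grid are illustrative choices, not part of the original slides. It assumes the x values are not all equal and the y values are not all equal.

```python
# Lot data from the example: (square footage in 100s of sq ft, sales price in $1000s)
lots = [(75, 155), (125, 210), (125, 290), (175, 360),
        (175, 250), (225, 450), (225, 530), (275, 635)]


def text_scatterplot(points, width=40, height=12):
    """Return a rough text scatterplot: x runs left to right, y bottom to top."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    # Start with an empty grid of spaces, one row per text line.
    grid = [[" "] * width for _ in range(height)]

    for x, y in points:
        # Scale each value to a column or row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Flip the row so that larger y values are drawn higher up.
        grid[height - 1 - row][col] = "*"

    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


plot = text_scatterplot(lots)
print(plot)
```

The points rise from the lower left to the upper right, which is what a positive association looks like on a scatterplot.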

Scatterplots

The relationship between two quantitative variables can take many different forms. Four of the most common are:

Positive linear relationship: As x increases, y also tends to increase.

Negative linear relationship: As x increases, y tends to decrease.

No apparent relationship: As x increases, y tends to remain unchanged.

Nonlinear relationship: The x and y variables are related, but not in a way that can be approximated using a straight line.


Interpreting scatterplots

• How to examine a scatterplot:

– Determine the overall pattern showing:

• The form, direction, and strength of the relationship

– Identify any outliers or other deviations from this pattern.

Interpreting scatterplots

• Overall Pattern

– Form: Linear relationships, where the points show a straight-line pattern, are an important form of relationship between two variables. Curved relationships and clusters (a number of similar individuals that occur together) are other forms to watch for.

– Direction: If the relationship has a clear direction, we speak of either positive association (the more the x, the more the y) or negative association (the more the x, the less the y).

– Strength: The strength of a relationship is determined by how close the points in the scatterplot lie to a line.

Describe the scatterplot:

[Figures: scatterplots to describe]

• Strong positive linear

• Strong negative linear

• Strong negative curved


Sketch a scatterplot of the data and then describe the overall pattern.

Is there an obvious explanatory and response variable?


Exercises:

Pg. 337/6.1,6.3,6.4

Pg. 343-345/6.5,6.6,6.8 

 


Scatterplot & Correlation

• Scatterplots provide a visual tool for looking at the relationship between two variables. Unfortunately, our eyes are not good tools for judging the strength of the relationship. Changes in the scale or the amount of white space in the graph can easily change our judgment of the strength of the relationship.

• Correlation is a numerical measure we use to show the strength of linear association.


A scatter plot is helpful in understanding the form, direction, and strength of the relationship between two variables.

Correlation allows us to quantify the direction and strength of the relationship.

Ex. 1: Describe the correlation illustrated by the scatter plot.

There is a positive correlation between average daily temperature and the number of visitors.

As the average daily temperature increased, the number of visitors increased.


Ex. 2: Describe the correlation illustrated by the scatter plot.

There is a negative correlation between elevation and mean annual temp.

As the elevation in Nevada increases, the mean annual temperature decreases.

Facts about correlation

• What kind of variables do we use?

– 1. No distinction between explanatory and response variables.

– 2. Both variables must be quantitative.

• Numerical properties

– 1. -1 ≤ r ≤ 1

– 2. r > 0: positive association between variables

– 3. r < 0: negative association between variables

– 4. If r = 1 or r = -1, it indicates a perfect linear relationship.

– 5. As |r| gets closer to 1, the linear relationship gets stronger.

– 6. It is affected by a few outliers; it is not resistant.

– 7. It doesn't describe curved relationships.

– 8. It is not affected by changing units.

[Figure: number line for r from -1 to 1, with 0 in the middle. Negative relationships lie to the left of 0 and positive relationships to the right; both get stronger toward the ends.]
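Several of these facts can be checked numerically. The sketch below is a hedged illustration in Python, not part of the original slides. The helper `pearson_r` is a hypothetical name; it uses an algebraically equivalent form of the correlation formula (sum of products of deviations divided by the square root of the product of the sums of squared deviations). The lot data come from the earlier house-price example, while the extra outlier point and the curved data set are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Lot data from the earlier example (100s of sq ft, $1000s).
x = [75, 125, 125, 175, 175, 225, 225, 275]
y = [155, 210, 290, 360, 250, 450, 530, 635]
r = pearson_r(x, y)

# No distinction between explanatory and response: swapping x and y keeps r.
r_swapped = pearson_r(y, x)

# Not affected by changing units: convert to sq ft and to dollars.
x_sqft = [100 * v for v in x]
y_dollars = [1000 * v for v in y]
r_units = pearson_r(x_sqft, y_dollars)

# Not resistant: one made-up outlier (a large lot with a low price) changes r a lot.
r_outlier = pearson_r(x + [300], y + [100])

# Does not describe curved relationships: y = x^2 is a perfect curve, yet r is 0.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [v ** 2 for v in curve_x]
r_curve = pearson_r(curve_x, curve_y)

print("original r:      ", round(r, 3))
print("x and y swapped: ", round(r_swapped, 3))
print("units changed:   ", round(r_units, 3))
print("with one outlier:", round(r_outlier, 3))
print("curved data:     ", round(r_curve, 3))
```

The first three values agree, the outlier pulls r well below its original value, and the curved data give an r of zero even though y is completely determined by x.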


Measuring linear association: correlation r (The Pearson Product-Moment Correlation Coefficient or Correlation Coefficient)

$$ r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Don’t worry, that’s why we have graphing calculators!!!
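For readers without a graphing calculator, the correlation coefficient formula on this slide can also be evaluated step by step in code. The following is a minimal sketch in Python, assuming the standard sample standard deviation (dividing by n - 1) for s_x and s_y; the function name `correlation_r` is a hypothetical helper, and the data are the lot sizes and sales prices from the earlier example.

```python
from statistics import mean, stdev


def correlation_r(xs, ys):
    """Correlation r as the average product of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)

    # Standardize each value, multiply the pairs, add them up, divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


square_footage = [75, 125, 125, 175, 175, 225, 225, 275]  # 100s of sq ft
sales_price = [155, 210, 290, 360, 250, 450, 530, 635]    # $1000s

r = correlation_r(square_footage, sales_price)
print(f"r = {r:.3f}")
```

For this data r comes out close to 0.94, a strong positive linear association, which matches the upward pattern in the scatterplot.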


You can use a graphing calculator to perform a linear regression and find the correlation coefficient r.

To display the correlation coefficient r, you may have to turn on the diagnostic mode. To do this, press [key] and choose the DiagnosticOn mode. Press enter, and then press enter again to activate it.

Example 1:

1.) Sketch a scatterplot.

2.) State the overall pattern.

3.) Are there any outliers?

4.) Calculate the correlation coefficient.

Example 2:

In one of the Boston city parks, there has been a problem with muggings in the summer months. A police officer took a random sample of 10 days (out of the 90-day summer) and compiled the following data. For each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day.

x    10   15   16   1    4    6    18   12   14   7
y    5    2    1    9    7    8    1    5    3    6

a. Construct a scatterplot.

b. Estimate a value for r.

c. Calculate the actual r value.
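Part c can be checked in code as well as on a calculator. The sketch below is one possible way to do it in Python and is not taken from the slides. It assumes the tenth data pair is x = 7, y = 6, read from the two stray values that follow the data rows in the transcript, so that the sample has the stated 10 days. The helper name `pearson_r` is hypothetical.

```python
from math import sqrt

# Officers on duty (x) and reported muggings (y) for the 10 sampled days.
# Assumption: the last pair (7, 6) comes from the two values printed
# separately after the data rows.
officers = [10, 15, 16, 1, 4, 6, 18, 12, 14, 7]
muggings = [5, 2, 1, 9, 7, 8, 1, 5, 3, 6]


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(officers, muggings)
print(f"r = {r:.3f}")
```

Under that assumption r is about -0.97: days with more officers on duty tend to have fewer reported muggings, and the points lie close to a line.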

A Caution

• The correlation coefficient measures the strength of the linear relationship between two variables.

• A strong correlation does not imply a cause and effect relationship.

• A correlation between two variables may be caused by other (either known or unknown) variables called lurking variables.


Example Cause-Effect Relationship

During the months of March and April, the weekly weight increases of a puppy in New York were collected.  For the same time frame, the retail price increases of snowshoes in Alaska were collected.

Weight of a growing puppy in NY (pounds)    Retail price of snowshoes in Alaska
8                                           $32.45
8.5                                         $32.95
9                                           $33.45
9.6                                         $34.00
10.1                                        $34.50
10.7                                        $35.10
11.5                                        $35.63


Example Cause-Effect Relationship cont.

• The data was examined and was found to have a very strong linear correlation. So, this must mean that the weight increase of a puppy in New York is causing snowshoe prices in Alaska to increase.  Of course this is not true!

• The moral of this example is: "be careful what you infer from your statistical analyses." Be sure your relationship makes sense. Also keep in mind that other factors may be involved in a cause-effect relationship.
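To see just how strong the correlation in the puppy and snowshoe example is, the seven pairs of values can be run through the correlation formula. This is a minimal sketch in Python with a hypothetical helper `pearson_r`; the numbers are the ones listed in the example, and nothing in the calculation says anything about cause and effect.

```python
from math import sqrt

# Weekly values from the example.
puppy_weight_lb = [8, 8.5, 9, 9.6, 10.1, 10.7, 11.5]
snowshoe_price = [32.45, 32.95, 33.45, 34.00, 34.50, 35.10, 35.63]


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(puppy_weight_lb, snowshoe_price)
print(f"r = {r:.3f}")
```

The result is above 0.99, an almost perfect positive linear correlation between two quantities that have nothing to do with each other. Both simply increased steadily over the same weeks, so time acts as the lurking variable.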

Exercises:

Pg. 350/6.10-6.12

Pg. 355/6.13-6.15

Pg. 359/6.17-6.22 (section review)