MAT 1000 Mathematics in Today's World

Upload: beverly-reed

Post on 24-Dec-2015


MAT 1000

Mathematics in Today's World

Last Time

We saw how to use the mean and standard deviation of a normal distribution to determine the percentile of a data value from that distribution.

The Pth percentile of a distribution is the value below which P percent of the data falls. For instance, 80% of a distribution is less than the 80th percentile.

For a normal distribution, we can find percentiles by computing a standard score, and then using a table to look up the percentile.
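As a quick refresher, here is a minimal sketch of that percentile calculation in Python. It uses `NormalDist` from the standard library in place of the printed table. The function name `percentile_of` and the numbers in the example (mean 500, standard deviation 100) are made up for illustration and are not from the lecture.

```python
from statistics import NormalDist

def percentile_of(value, mean, sd):
    """Percent of a normal distribution that lies below `value`."""
    z = (value - mean) / sd            # standard score
    return 100 * NormalDist().cdf(z)   # standard normal CDF stands in for the table

# Hypothetical example: a value of 600 from a normal distribution
# with mean 500 and standard deviation 100 (standard score 1).
print(round(percentile_of(600, 500, 100), 1))  # prints 84.1
```

A value exactly at the mean has standard score 0, so it sits at the 50th percentile.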

Today

Recall that “variables” are characteristics or attributes of individuals. We will consider pairs of variables. In other words, we will look at a pair of characteristics of an individual. The key question will be: are these variables related?

We will discuss scatterplots, which are a way to visualize data that consists of pairs of variables.

We will talk about the key features of scatterplots: form, direction, and strength.

Today

We will also talk about correlation.

For a data set consisting of pairs of numbers, correlation is a number between -1 and 1.

If the data has a linear form, then correlation tells us about the strength and direction of the relationship between the variables.

Pairs of variables

What are some examples of pairs of variables?

Height and weight of people. This gives us two numbers for each individual.

The time it takes me to run a mile and my heart rate afterwards. Each time I run a mile I get another pair of numbers.

Pairs of variables

We collect data on pairs of variables in order to study relationships between those variables. Sometimes we are interested in cause and effect relationships.

Example

Will you live longer if you increase your intake of vitamin A?

Cause: Amount of vitamin A taken (in IUs)
Effect: Lifespan (in years)

For each individual we get two numbers: average daily intake of vitamin A, and lifespan.

Pairs of variables

When we talk about a pair of variables that we know, or at least believe or hope, have a cause and effect relationship, we use the following terms:

The explanatory variable is what we believe to be the cause.

The response variable is what we believe to be the effect.

Statistics give us evidence for a cause and effect relationship, but statistics will not prove this.

Scatterplots

Scatterplots are visual representations of pairs of data.

There is a horizontal scale and a vertical scale. Each direction corresponds to one of the variables.

Each individual is represented by one dot. The horizontal and vertical location of the dot corresponds to that individual's values of the two variables.

Scatterplots

Scatterplot of the life expectancy of people in many nations against each nation’s gross domestic product per person.


Interpreting scatterplots

To interpret a scatterplot, look for three things:

1. Form

2. Direction

3. Strength

The form of a scatterplot is its overall shape. This may be a straight line, a curved line, or some other shape altogether.

The strength is how close the scatterplot is to its form.

Interpreting scatterplots

We distinguish between two directions: positive and negative.

This is especially useful when the form of a scatterplot is a straight line. (In this case, the direction corresponds to the sign of the slope of the line: positive slope = positive direction.)

The rule for a positive direction: larger values of the explanatory variable correspond to larger values of the response variable.

The rule for a negative direction: larger values of the explanatory variable correspond to smaller values of the response variable.

Form: curved line
Strength: fairly strong
Direction: positive

Interpreting scatterplots

Form: straight line
Strength: moderate
Direction: positive

Interpreting scatterplots

If height and weight have a positive association, what does that tell us?

It means that taller people tend to weigh more.

This is a statement about a general tendency. We don’t worry about the exceptions.

Interpreting scatterplots

What about the time it takes me to run a mile and my heart rate afterwards?

If I run faster, my time is less, and I’m working harder so my heart rate will go up.

If I run slower, I will have a longer time, and I won’t be working as hard, so my heart rate won’t go up as much.

What direction is this association?

Negative: larger values of the explanatory variable (time) correspond to lower values of the response variable (heart rate).
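One way to check a direction numerically, sketched under stated assumptions: multiply each individual's deviation from the mean in one variable by their deviation in the other, and look at the sign of the total. This is not a method from the lecture, just an illustration of the rule. The function name `direction` and the mile times and heart rates below are invented for the example.

```python
def direction(xs, ys):
    """Return 'positive', 'negative', or 'none' for paired data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Pairs that are both above or both below their means add a positive term;
    # pairs on opposite sides of their means add a negative term.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"

# Made-up illustrative data: mile time (minutes) and heart rate afterwards (beats per minute)
mile_time_min = [7.0, 7.5, 8.0, 8.5, 9.0]
heart_rate_bpm = [172, 168, 160, 155, 149]

print(direction(mile_time_min, heart_rate_bpm))  # prints negative
```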

Interpreting scatterplots

In addition to form, direction, and strength, which are general features of a scatterplot, you should also note any outliers.

On a scatterplot the outliers are dots that don’t fit into the overall pattern.

Sierra Leone is a clear outlier on this scatterplot.


Linear form

To find the form of an association, look at a scatterplot.

If one straight line gives a reasonable approximation to the scatterplot, the form is said to be “linear.”

Let’s consider some examples.

[Figures: two example scatterplots, each titled "Linear form"]

Non-linear form

Not every relationship is linear.

Example

Consider the relationship between the speed you drive and the gas mileage you get.

As your speed increases, your mileage increases, up to a certain speed (usually around 55 or 60 mph). This will look roughly like a straight line.

But around 55 or 60 mph (the exact speed depends on the type of car), your mileage begins to decrease.

Let’s look at a scatterplot.

Non-linear form

This is not a linear scatterplot.

Correlation

When a scatterplot has a linear form, we can measure the strength of the association using a number called the “correlation.”

Here are some facts about correlation:
• Abbreviated by the letter $r$
• $r$ is a number between -1 and 1
• The sign of $r$ (positive or negative) is the same as the direction of the association
• Stronger associations have $r$ closer to either 1 or -1
• Correlation has no units

Interpreting correlation

Here are some guidelines on using the value of $r$ to interpret the strength of a relationship:

Value of correlation              Strength of relationship
0.8 to 1.0 or -1.0 to -0.8        Very strong
0.6 to 0.8 or -0.8 to -0.6        Strong
0.4 to 0.6 or -0.6 to -0.4        Moderate
0.2 to 0.4 or -0.4 to -0.2        Weak
-0.2 to 0.2                       Either very weak, or not a linear relationship
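The guidelines above can be sketched as a small lookup function. The function name `strength` is hypothetical. The ranges in the table overlap at their endpoints (0.8 appears in two rows, for example), so this sketch assumes a boundary value goes to the stronger category. That choice is an assumption, not something the guidelines specify.

```python
def strength(r):
    """Label the strength of a linear relationship from its correlation r."""
    if not -1 <= r <= 1:
        raise ValueError("correlation must be between -1 and 1")
    size = abs(r)  # strength ignores the sign; the sign only gives the direction
    # Assumption: boundary values (0.8, 0.6, ...) fall in the stronger category.
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "very weak, or not a linear relationship"

print(strength(0.9))   # prints very strong
print(strength(-0.5))  # prints moderate
```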

Interpreting correlation

Here are some concrete examples to give you a better feel for correlations:
• The correlation between SAT score and college GPA is about 0.6.
• The correlation between height and weight for American males is about 0.4.
• The correlation between income and education level in the United States is about 0.4.
• The correlation between a person’s income and the last 4 digits of their phone number is 0.

Interpreting correlation

Here are examples of scatterplots for various values of $r$.

Notice the relationship between direction and sign, and also that the closer $r$ is to 1 or -1, the stronger the association.

Calculating correlation

Calculating correlations by hand takes some work.

Example

Find the correlation between the height and weight of the following five men:

Height (inches)    67   72   77   74   69
Weight (pounds)   155  220  240  195  175

Notice that our data set has five individuals and two variables.

Calculating correlation

Example

We start by finding four numbers:
1. The mean of the five heights
2. The mean of the five weights
3. The standard deviation of the five heights
4. The standard deviation of the five weights

Remember our notation for the mean: $\bar{x}$. With two different means, it would be confusing to call them both $\bar{x}$.

To keep them separate, call the mean of the heights $\bar{x}$ and the mean of the weights $\bar{y}$.

Calculating correlation

Example

We have the same issue for the standard deviations: we don’t want to call both of them $s$.

So let’s call the standard deviation of the heights $s_x$ and the standard deviation of the weights $s_y$ (this is the usual notation).

Using this notation we can find that:

$\bar{x} = 71.8$, $\bar{y} = 197$, $s_x \approx 3.96$, $s_y \approx 34.02$
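For anyone who wants to check these four numbers, here is a minimal sketch using Python's standard library. The variable names are my own. `stdev` is the sample standard deviation (it divides by $n - 1$), which is the version that agrees with the standard scores used in this example.

```python
from statistics import mean, stdev

heights = [67, 72, 77, 74, 69]        # inches
weights = [155, 220, 240, 195, 175]   # pounds

x_bar = mean(heights)   # mean of the heights
y_bar = mean(weights)   # mean of the weights
s_x = stdev(heights)    # sample standard deviation of the heights
s_y = stdev(weights)    # sample standard deviation of the weights

print(f"mean height = {x_bar:.1f}, mean weight = {y_bar:.1f}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}")
```

Run as written, this gives means of 71.8 and 197.0 and standard deviations of about 3.96 and 34.02.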

Calculating correlation

Example

Next we find standard scores for each height and weight. Remember the formula for standard scores:

$\text{standard score} = \dfrac{\text{data value} - \text{mean}}{\text{standard deviation}}$

For each height we subtract the average of the heights, $\bar{x}$, and divide by $s_x$, the standard deviation of the heights.

Likewise, for each weight we subtract the average of the weights, $\bar{y}$, and divide by $s_y$, the standard deviation of the weights.

To keep organized, let’s make a table.

Calculating correlation

Multiply the standard score of a person’s weight by the standard score of their height. Then we add up this last column.

Height   Standard score   Weight   Standard score   Product
  67         -1.21          155        -1.23          1.50
  72          0.05          220         0.68          0.03
  77          1.31          240         1.26          1.66
  74          0.56          195        -0.06         -0.03
  69         -0.71          175        -0.65          0.46
                                             Sum:     3.61
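The table can be reproduced with a few lines of Python. This is a sketch, and the helper name `standard_scores` is hypothetical. It keeps full precision until the final printout.

```python
from statistics import mean, stdev

heights = [67, 72, 77, 74, 69]        # inches
weights = [155, 220, 240, 195, 175]   # pounds

def standard_scores(values):
    """Subtract the mean from each value and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

z_heights = standard_scores(heights)
z_weights = standard_scores(weights)
products = [zh * zw for zh, zw in zip(z_heights, z_weights)]

# One row per man: height, its standard score, weight, its standard score, product
for h, zh, w, zw, p in zip(heights, z_heights, weights, z_weights, products):
    print(f"{h:>4} {zh:>6.2f} {w:>5} {zw:>6.2f} {p:>6.2f}")

print(f"sum of products: {sum(products):.2f}")  # prints sum of products: 3.61
```

Adding the rounded products shown in the table gives 3.62. The total of 3.61 comes from adding the unrounded products and rounding only at the end.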

Calculating correlation

Example

Finally, we take this number, 3.61, and divide by $n - 1$. Here $n$ is the number of individuals in the data set. Don’t forget there are 2 numbers per individual, so we have $n = 5$ (not 10).

The correlation is $r = \dfrac{3.61}{5 - 1} \approx 0.90$.

This means there is a very strong positive correlation between height and weight for these five men.

Calculating correlation

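In review, the steps for finding correlation are:
1. Find standard scores for each variable
2. Multiply corresponding pairs of standard scores
3. Add up these products
4. Divide by $n - 1$

There is a formula that encapsulates all the steps we’ve taken:

$$r = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$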
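The four steps can also be written as one short function. This is a sketch with my own names, assuming sample standard deviations (dividing by $n - 1$) as in the worked example. It uses only the standard library.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r of paired data, following the four steps in the review."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Steps 1-3: standard scores, multiplied pairwise, then added up
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys))
    # Step 4: divide by n - 1
    return total / (n - 1)

heights = [67, 72, 77, 74, 69]        # inches
weights = [155, 220, 240, 195, 175]   # pounds
r = correlation(heights, weights)
print(f"{r:.2f}")  # prints 0.90
```

Note that the function fails with a division by zero if every value of one variable is the same, since that variable's standard deviation is then 0.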

Calculating correlation

One more fact about correlation worth noting: the correlation between two variables does not depend on the units we use to measure them.

For this data, we found $r \approx 0.90$.

If we had measured the heights and weights of these five men in centimeters and kilograms, our data would look like this:

Height (inches)    67   72   77   74   69
Weight (pounds)   155  220  240  195  175

Height (cm)       170  183  196  188  175
Weight (kg)        70  100  109   88   79

Calculating correlation

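It turns out that for this data the correlation is also $r \approx 0.90$.

Even though the numbers are different, the correlation is the same. This is not a coincidence. (With exact unit conversions $r$ is exactly the same; the metric values here are rounded to whole numbers, which changes $r$ only in the third decimal place.)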

When we find correlation, it does not matter what units we use.
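Here is a minimal sketch that checks this claim. The function and variable names are my own. It converts the original data with the exact factors 2.54 cm per inch and 0.45359237 kg per pound, and also computes $r$ for the rounded metric values from the table.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: average product of standard scores, dividing by n - 1."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (len(xs) - 1)

heights_in = [67, 72, 77, 74, 69]
weights_lb = [155, 220, 240, 195, 175]

# Exact unit conversions
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_us = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)
# Rounded metric values as listed in the table
r_rounded = correlation([170, 183, 196, 188, 175], [70, 100, 109, 88, 79])

print(f"{r_us:.2f} {r_metric:.2f} {r_rounded:.2f}")  # prints 0.90 0.90 0.90
```

Changing units multiplies every value by the same positive number, which rescales the deviations and the standard deviation by the same factor, so the standard scores (and therefore $r$) do not change.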
