Statistics and Quantitative Analysis U4320 Segment 8 Prof. Sharyn O’Halloran

Upload: crystal-mcbride

Post on 30-Dec-2015



I. Introduction

A. Overview

1. Ways to describe, summarize and display data.

2. Summary statements: mean, standard deviation, variance

3. Distributions; Central Limit Theorem


4. Test hypotheses

5. Differences of Means

B. What's to come?

1. Analyze the relationship between two or more variables with a specific technique called regression analysis.

2. This tool allows us to predict the impact of one variable on another.

For example, what is the expected impact of a SIPA degree on income?

II. Causal Models

Causal models explain how changes in one variable affect changes in another variable.

Incinerator ----------- ? -----------> Bad Public Health

Regression analysis gives us a way to analyze precisely the cause-and-effect relationships between variables:

Direction

Magnitude

A. Variables

Let us start off with a few basic definitions.

1. Dependent Variable

The dependent variable is the factor that we want to explain.

2. Independent Variable

The independent variable is the factor that we believe causes or influences the dependent variable.

Independent Variable -------> Dependent Variable

Cause ----------------------> Effect

B. Voting Example

Let us say that we have a vote in the House of Representatives on health care, and we want to know whether party affiliation influenced individual members' voting decisions.

1. The raw data look like this:

                   Vote (Dep)
Party (Indep)      YES     NO      Total
DEM                220     65      285
REP                 30    120      150
Total              250    185      435

2. Percentages look like this:

           YES       NO        Total
DEM        50.6%     14.9%     65.5%
REP         6.9%     27.6%     34.5%
Total      57.5%     42.5%     100%

3. Does party affect voting behavior?

Given that the legislator is a Democrat, what is the chance of voting for the health care proposal?

3. Does party affect voting behavior? (cont.)

What is the probability of being a Democrat?

What is the probability of being a Democrat and voting yes?

                   Vote (Dep)
Party (Indep)      YES     NO
DEM
REP
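The three questions above can be worked out directly from the raw counts. The following is a minimal sketch in Python; the counts come from the vote table, while the variable and function names are illustrative only.

```python
# Raw counts from the vote table: (party, vote) -> number of members.
counts = {
    ("DEM", "YES"): 220,
    ("DEM", "NO"): 65,
    ("REP", "YES"): 30,
    ("REP", "NO"): 120,
}

total = sum(counts.values())  # 435 members in all


def party_total(party):
    """Row total: every member of one party, whatever the vote."""
    return sum(n for (p, _), n in counts.items() if p == party)


# Marginal probability of being a Democrat.
p_dem = party_total("DEM") / total

# Joint probability of being a Democrat and voting yes.
p_dem_and_yes = counts[("DEM", "YES")] / total

# Conditional probability of a yes vote given the member's party.
p_yes_given_dem = counts[("DEM", "YES")] / party_total("DEM")
p_yes_given_rep = counts[("REP", "YES")] / party_total("REP")

print(f"P(DEM)         = {p_dem:.3f}")
print(f"P(DEM and YES) = {p_dem_and_yes:.3f}")
print(f"P(YES | DEM)   = {p_yes_given_dem:.3f}")
print(f"P(YES | REP)   = {p_yes_given_rep:.3f}")
```

The gap between the two conditional probabilities (about 0.77 for Democrats against 0.20 for Republicans) is what suggests that party is related to the vote.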

4. Causal Model

This is the simplest way to state a causal model:

A -------------> B

Party ---------> Vote

5. Interpretation

The interpretation is that if party influences vote, then as we move from Republicans to Democrats we should see a move from a NO vote to a YES vote.

C. Summary

1. Regression analysis helps us to explain the impact of one variable on another.

We will be able to answer such questions as: what is the relative importance of race in explaining one's income? Or: what is the influence of economic conditions on the levels of trade barriers?

2. Univariate Model

For now, we will focus on the univariate case (a single independent variable), that is, the causal relation between two variables.

We will then relax this assumption and look at the relation among multiple variables in a couple of weeks.

III. Fitted Line

Although regression analysis can be very complicated, the heart of it is actually very simple. It centers on the notion of fitting a line through the data.

1. Example

Suppose we have a study of how wheat yield depends on fertilizer, and we observe this relation:

X: Fertilizer (lb/acre)      Y: Yield (bu/acre)
100                          40
200                          50
300                          50
400                          70
500                          65
600                          65
700                          80

1. Example (cont.)

The observed relation between fertilizer and yield can then be plotted as follows:

[Figure: scatter plot of Yield (40 to 80 bu/acre) against Fertilizer (100 to 700 lb/acre)]

2. What line best approximates the relation between these observations?

a) Highest and Lowest Value

[Figure: scatter plot of Yield against Fertilizer with a line drawn through the lowest and highest values]

b) Median Value

[Figure: scatter plot of Yield against Fertilizer with a line drawn through the median values]

3. Predicted Values

a) Example 1:

The line that is fitted to the data gives the predicted value of Y for any given level of X.

[Figure: scatter plot of Yield against Fertilizer with a fitted line]

If X is 400 and all we knew was the fitted line, then we would expect the yield to be around 65.

b) Example 2:

Many times we have a lot of data, and fitting the line becomes rather difficult. For example, if our plotted data looked like this:

[Figure: scatter plot of Yield against Fertilizer with many more, widely scattered observations]

IV. OLS: Ordinary Least Squares

We want a methodology that allows us to draw a line that best fits the data.

A. The Least Squares Criterion

What we want to do is to fit a line whose equation is of the form:

$\hat{Y} = a + bX$

This is just the algebraic representation of a line.

1. Intercept:

a represents the intercept of the line, that is, the point at which the line crosses the Y axis.

2. Slope of the line:

b represents the slope of the line.

[Figure: scatter plot of Yield against Fertilizer with the fitted line, marking the intercept a, a change in X, and the corresponding change in Y]

Remember: the slope is just the change in Y divided by the change in X (rise over run).

3. Minimizing the Sum of Squares

a) Problem:

How do we select a and b so that we minimize the pattern of vertical Y deviations (prediction errors)?

We want to minimize the deviation:

$d = Y - \hat{Y}$

b) There are several ways in which we can do this.

1. First, we could minimize the sum of d.

We could find the line that gives us the lowest sum of all the d's. The problem, of course, is that some d's would be positive and others would be negative, and when we add them all up they would end up canceling each other. In effect, we would be picking a line so that the d's add up to zero.

2. Absolute Values

Minimize $\sum |d| = \sum |Y - \hat{Y}|$

3. Sum of Squared Deviations

Minimize $\sum d^2 = \sum (Y - \hat{Y})^2$
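To see why the plain sum of deviations is a poor criterion, the three candidate criteria can be compared on the fertilizer data. This is a small sketch in Python: the data are from the wheat example, while the two candidate lines (a flat line at the mean yield and a sloped line drawn by eye through the point of means) are assumptions chosen purely for illustration.

```python
# Fertilizer (lb/acre) and yield (bu/acre) from the wheat example.
X = [100, 200, 300, 400, 500, 600, 700]
Y = [40, 50, 50, 70, 65, 65, 80]


def criteria(a, b):
    """Return (sum d, sum |d|, sum d^2) for the line Y_hat = a + b*X."""
    d = [y - (a + b * x) for x, y in zip(X, Y)]
    return sum(d), sum(abs(v) for v in d), sum(v * v for v in d)


# Candidate 1 (illustrative): a flat line at the mean yield, Y_hat = 60.
flat = criteria(60.0, 0.0)

# Candidate 2 (illustrative): a sloped line through the point of means
# (400, 60), namely Y_hat = 36 + 0.06*X.
sloped = criteria(36.0, 0.06)

for name, (s, s_abs, s_sq) in (("flat", flat), ("sloped", sloped)):
    print(f"{name:>6}: sum d = {s:.1f}, sum |d| = {s_abs:.1f}, sum d^2 = {s_sq:.1f}")
```

Both lines have deviations that sum to zero, so the first criterion cannot tell them apart. The squared deviations do separate them: roughly 178 for the sloped line against 1150 for the flat one.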

B. OLS Formulas

1. Fitted Line

The line that we want to fit to the data is:

$\hat{Y} = a + bX$

This is simply what we call the OLS line. Remember: we are concerned with how to calculate the slope of the line, b, and the intercept of the line, a.

2. OLS Slope

The OLS slope can be calculated from the formula:

$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$

In the book they use the abbreviations:

$x \equiv X - \bar{X}$ and $y \equiv Y - \bar{Y}$ $\Rightarrow$ $b = \frac{\sum xy}{\sum x^2}$

3. Intercept

Now that we have the slope b, it is easy to calculate a:

$a = \bar{Y} - b\bar{X}$

Note: when b = 0, the intercept is just the mean of the dependent variable.
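The slope and intercept formulas translate almost line for line into code. Below is a minimal sketch in Python; the function name `ols_fit` and the check data (points lying exactly on Y = 2 + 3X) are hypothetical and serve only to confirm that the formulas recover a known line.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line Y_hat = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Deviation form: x = X - X_bar, y = Y - Y_bar.
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]

    sum_xy = sum(u * v for u, v in zip(dx, dy))
    sum_xx = sum(u * u for u in dx)
    if sum_xx == 0:
        raise ValueError("X must take at least two distinct values")

    b = sum_xy / sum_xx      # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical check data lying exactly on Y = 2 + 3X.
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]
a, b = ols_fit(xs, ys)
print(f"Y_hat = {a:.1f} + {b:.1f} X")

# When Y does not vary with X, the slope is 0 and a is the mean of Y.
a_flat, b_flat = ols_fit(xs, [7, 7, 7, 7, 7])
print(f"flat case: a = {a_flat:.1f}, b = {b_flat:.1f}")
```

The second call illustrates the note above: with b = 0 the intercept equals the mean of the dependent variable.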

C. Example 1: Fertilizer and Yield

Data            Deviation Form                               Products
X      Y        $x = X - \bar{X}$     $y = Y - \bar{Y}$      $xy$       $x^2$
100    40       -300                  -20                    6,000      90,000
200    50       -200                  -10                    2,000      40,000
300    50       -100                  -10                    1,000      10,000
400    70          0                   10                        0           0
500    65        100                    5                      500      10,000
600    65        200                    5                    1,000      40,000
700    80        300                   20                    6,000      90,000

$\bar{X} = 400$, $\bar{Y} = 60$, $\sum x = 0$, $\sum y = 0$, $\sum xy = 16,500$, $\sum x^2 = 280,000$

So to calculate the slope we solve:

$b = \frac{\sum xy}{\sum x^2} = \frac{16,500}{280,000} = .059$

We can then use the slope b to calculate the intercept.

Remember: $\hat{Y} = a + bX \Rightarrow a = \bar{Y} - b\bar{X}$

$a = 60 - .059(400) = 36.4$

Plugging these estimated values into our fitted line equation, we get:

$\hat{Y} = 36.4 + .059X$

What is the predicted number of bushels produced with 400 lbs of fertilizer?

$\hat{Y} = 36.4 + .059(400) = 60$

What if we add 700 lbs of fertilizer? What would be the expected yield?
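The fertilizer calculation, including the 700 lb question, can be reproduced in a few lines. This is a sketch in Python using the data from the table; the helper name `predict` is illustrative.

```python
X = [100, 200, 300, 400, 500, 600, 700]  # fertilizer, lb/acre
Y = [40, 50, 50, 70, 65, 65, 80]         # yield, bu/acre

x_bar = sum(X) / len(X)  # mean fertilizer
y_bar = sum(Y) / len(Y)  # mean yield

# Sums of products and squares in deviation form.
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_xx = sum((x - x_bar) ** 2 for x in X)

b = sum_xy / sum_xx    # slope
a = y_bar - b * x_bar  # intercept


def predict(x):
    """Predicted yield (bu/acre) for x lb/acre of fertilizer."""
    return a + b * x


print(f"Y_hat = {a:.1f} + {b:.3f} X")
for dose in (400, 700):
    print(f"{dose} lb/acre -> {predict(dose):.1f} bu/acre")
```

At 400 lb the prediction equals the mean yield of 60, because the fitted line always passes through the point of means. At 700 lb the line predicts about 77.7 bushels, which is also what the rounded equation gives: 36.4 + .059(700) = 77.7.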

D. Interpretation of b and a

1. Slope b

The slope is the change in Y that accompanies a unit change in X.

It tells us the predicted effect on the dependent variable of a one-unit change in the independent variable.

The slope then tells us two things:

i) The directional effect of the independent variable on the dependent variable. There was a positive relation between fertilizer and yield.

ii) The magnitude of the effect on the dependent variable. For each additional pound of fertilizer we expect an increased yield of .059 bushels.

2. The Intercept

The intercept tells us what we would expect if no fertilizer is added: a yield of 36.4 bushels.

$\hat{Y} = 36.4 + .059(0) = 36.4$

So independent of the fertilizer you can expect 36.4 bushels. Alternatively, if fertilizer has no effect on yield, we would simply expect 36.4 bushels, the yield we expected with no fertilizer.

E. Example II: Radioactive Exposure

1. Causal Model

We want to know whether exposure to radioactive waste is linked to cancer.

Radioactive Waste --------------> Cancer

2. Data

X = index of radioactive exposure; Y = deaths per 10,000

X       Y      $x = X - \bar{X}$    $y = Y - \bar{Y}$    $xy$     $x^2$
 8.3    210     3.7                  50                   185     13.69
 6.4    180     1.8                  20                    36      3.24
 3.4    130    -1.2                 -30                    36      1.44
 3.8    170    -0.8                  10                    -8      0.64
 2.6    130    -2.0                 -30                    60      4
11.6    210     7.0                  50                   350     49
 1.2    120    -3.4                 -40                   136     11.56
 2.5    150    -2.1                 -10                    21      4.41
 1.6    140    -3.0                 -20                    60      9

$\bar{X} = 4.6$, $\bar{Y} = 160$, $\sum x = 0$, $\sum y = 0$, $\sum xy = 876$, $\sum x^2 = 97.0$

3. Graph

[Figure: scatter plot of deaths per 10,000 (100 to 200) against the index of radioactive exposure (1 to 12)]

4. Calculate the regression line for predicting Y from X

i) Slope

$b = \frac{\sum xy}{\sum x^2} = \frac{876}{97.0} = 9.03$

How do we interpret the slope coefficient?

For each unit of radioactive exposure, the cancer mortality rate rises by 9.03 deaths per 10,000 individuals.

ii) Calculate the intercept

$a = \bar{Y} - b\bar{X}$

$a = 160 - 9.03(4.6) = 118.5$

Plugging these estimated values into our fitted line equation, we get:

$\hat{Y} = 118.5 + 9.03X$

5. Predictions:

Let's calculate the mortality rate if X were 5.0.

$\hat{Y} = 118.5 + 9.03(5.0) = 163.6$

How about if X were 0?

$\hat{Y} = 118.5 + 9.03(0) = 118.5$

How can we interpret this result?

Even with no radioactive exposure, the mortality rate would be 118.5.

[Figure: scatter plot of deaths per 10,000 against the index of radioactive exposure with the fitted line $\hat{Y} = 118.5 + 9.03X$]
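The radioactive exposure example can be checked the same way, and the fitted line can be used to list each observation's deviation d = Y - Y_hat. This is a sketch in Python using the nine observations from the data table. It carries full precision throughout, so the intercept can differ from the slide's hand-rounded 118.5 in the last digit.

```python
# Exposure index (X) and cancer deaths per 10,000 (Y) from the data table.
X = [8.3, 6.4, 3.4, 3.8, 2.6, 11.6, 1.2, 2.5, 1.6]
Y = [210, 180, 130, 170, 130, 210, 120, 150, 140]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_xx = sum((x - x_bar) ** 2 for x in X)

b = sum_xy / sum_xx    # slope
a = y_bar - b * x_bar  # intercept


def predict(x):
    """Predicted deaths per 10,000 at exposure index x."""
    return a + b * x


# Vertical deviations of each observation from the fitted line.
residuals = [y - predict(x) for x, y in zip(X, Y)]

print(f"Y_hat = {a:.2f} + {b:.2f} X")
print(f"prediction at X = 5.0: {predict(5.0):.1f}")
for x, y, d in zip(X, Y, residuals):
    print(f"X = {x:5.1f}  Y = {y:3d}  d = {d:7.2f}")
```

The deviations sum to zero (up to floating-point error), which is a property of any least-squares line with an intercept.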

III. Advantages of OLS

A. Easy

1. The least squares method gives relatively easy, or at least computable, formulas for calculating a and b:

$\hat{Y} = a + bX$, $b = \frac{\sum xy}{\sum x^2}$, $a = \bar{Y} - b\bar{X}$

B. OLS is similar to many concepts we have already used.

1. We are minimizing the sum of the squared deviations. In effect, this is very similar to how we find the variance.

2. Also, we saw above that when b = 0,

$\hat{Y} = a$ or $\hat{Y} = \bar{Y}$

The interpretation of this is that the best prediction we can make of Y is just the sample mean $\bar{Y}$.

This is the case when the two variables are independent.

C. Extension of the Sample Mean

Since OLS is just an extension of the sample mean, it has many of the same properties, such as being efficient and unbiased.

D. Weighted Least Squares

We might want to weigh some observations more heavily than others.
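The slides do not give a formula for weighted least squares. One standard formulation, shown here as an assumption rather than the course's own method, minimizes $\sum w d^2$ and replaces every mean and sum in the OLS formulas with its weighted counterpart. In this Python sketch the function name `wls_fit` and the weights are hypothetical; the data are the fertilizer observations.

```python
def wls_fit(xs, ys, ws):
    """Return (a, b) minimizing sum of w * (Y - a - b*X)^2."""
    w_total = sum(ws)
    # Weighted means replace the ordinary sample means.
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_total

    # Weighted sums of products and squares in deviation form.
    sum_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sum_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))

    b = sum_xy / sum_xx
    a = y_bar - b * x_bar
    return a, b


X = [100, 200, 300, 400, 500, 600, 700]  # fertilizer, lb/acre
Y = [40, 50, 50, 70, 65, 65, 80]         # yield, bu/acre

# Equal weights reproduce the ordinary least-squares line.
a_eq, b_eq = wls_fit(X, Y, [1.0] * len(X))

# Hypothetical weights: count the first observation twice as heavily.
a_w, b_w = wls_fit(X, Y, [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

print(f"equal weights: Y_hat = {a_eq:.2f} + {b_eq:.4f} X")
print(f"reweighted:    Y_hat = {a_w:.2f} + {b_w:.4f} X")
```

Giving an observation a weight of 2 is equivalent to entering it in the data twice, which is one way to read "weighing some observations more heavily."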

V. Homework Example

In the homework assignment, you are asked to select two interval/ratio level variables and calculate the fitted line that minimizes the sum of the squared deviations (the regression line).

A. Choose 2 Variables

What effect does the number of years of education have on the frequency with which one reads the newspaper?

The independent variable is Education, and the dependent variable is Newspaper reading.


B. Coding the Variables

First, I made a new variable called PAPER.

Recode all the missing data values to a single value.

Remove missing values from the data set.

Then do the same for education


C. Getting the number of valid observations

Next, see how many valid observations are left by using the “Summarize” command under the “Data” menu.


D. Sampling five observations

1. So we randomly sample 5 from 1019.

2. As before, use the “Select” command under the “Data” menu to get 5 random observations.

3. Then go to the “Statistics” menu and use the “Summarize” > “List” command to get the entries for the variables of interest.
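The menu steps above belong to the statistics package used in the course. Outside that package, the same draw of 5 cases from 1019 valid observations could be sketched in Python as follows. The case numbering from 1 to 1019 and the seed value are assumptions made so that the example is repeatable.

```python
import random

N_VALID = 1019    # valid observations left after removing missing values
SAMPLE_SIZE = 5   # observations to draw

# A fixed, arbitrary seed makes the draw repeatable from run to run.
rng = random.Random(4320)

# Sample case numbers without replacement from 1..N_VALID.
case_numbers = rng.sample(range(1, N_VALID + 1), SAMPLE_SIZE)

print(sorted(case_numbers))
```

Sampling without replacement guarantees five distinct cases, and each valid observation has the same chance of being selected.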

E. Calculate the OLS Line

Finally, you will have to compute the fitted line for these data.

X = SMARTS    Y = PAPER    $x = X - \bar{X}$    $y = Y - \bar{Y}$    $xy$      $x^2$
15            1             1.6                 -0.4                 -0.64      2.56
 8            2            -5.4                  0.6                 -3.24     29.16
15            1             1.6                 -0.4                 -0.64      2.56
13            2            -0.4                  0.6                 -0.24      0.16
16            1             2.6                 -0.4                 -1.04      6.76

$\bar{X} = 13.4$, $\bar{Y} = 1.4$, $\sum x = 0$, $\sum y = 0$, $\sum xy = -5.8$, $\sum x^2 = 41.2$

1. Calculate b:

$b = \frac{\sum xy}{\sum x^2} = \frac{-5.8}{41.2} = -0.14$

2. Calculate the intercept:

$a = \bar{Y} - b\bar{X} = 1.4 - (-0.14)(13.4) = 1.4 + 1.876 = 3.276$

3. Calculate the OLS line:

$\hat{Y} = a + bX$

$\hat{Y} = 3.3 - 0.14X$
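The homework calculation can be verified with the five sampled observations. This Python sketch also compares predictions inside the observed range of education (8 to 16 years) with the prediction at X = 0, which lies far outside it. The names are illustrative, and full precision is carried throughout, so the intercept differs slightly from the hand-rounded 3.276.

```python
SMARTS = [15, 8, 15, 13, 16]  # X: years of education
PAPER = [1, 2, 1, 2, 1]       # Y: newspaper-reading variable

n = len(SMARTS)
x_bar = sum(SMARTS) / n
y_bar = sum(PAPER) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(SMARTS, PAPER))
sum_xx = sum((x - x_bar) ** 2 for x in SMARTS)

b = sum_xy / sum_xx    # slope
a = y_bar - b * x_bar  # intercept


def predict(x):
    """Predicted PAPER value for x years of education."""
    return a + b * x


print(f"Y_hat = {a:.2f} + ({b:.2f}) X")

# Inside the observed range of X.
for years in (min(SMARTS), max(SMARTS)):
    print(f"{years:2d} years -> {predict(years):.2f}")

# Outside the observed range: an extrapolation, which is just the intercept.
print(f" 0 years -> {predict(0):.2f}")
```

No one in the sample has fewer than 8 years of education, so the value at X = 0 rests entirely on extending the line beyond the data.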

4. Plot

[Figure: scatter plot of PAPER (1 to 3) against SMARTS (5 to 20) with the fitted line $\hat{Y} = 3.3 - 0.14X$, which crosses the Y axis at 3.3]


5. Interpretation

A person with no education would read 3.3 newspapers a day.

Our results further tell us that each additional year of education reduces the number of newspapers a person reads by 0.14.

So for every additional year of education you read 0.14 fewer newspapers a day; the slope is a change in the count, not a 14% change.

This example suggests some of the problems with drawing inferences about the underlying population from small samples.