the distribution of the slope coe cient - john m...

33
The Distribution of the Slope Coefficient Given our population assumptions: E (b 2 )= β 2 (so b 2 is an unbiased estimator) The standard error of b 2 is: s b 2 = s s 2 e n i =1 (x i - ¯ x ) 2 The test statistic t * = b 2 -β 2 s b 2 is t distributed with (n - 2) degrees of freedom J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 1 / 33

Upload: lambao

Post on 13-Oct-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

The Distribution of the Slope Coefficient

Given our population assumptions:

E (b2) = β2 (so b2 is an unbiased estimator)

The standard error of b2 is:

sb2 =

√s2e∑n

i=1(xi − x̄)2

The test statistic t∗ = b2−β2sb2

is t distributed with

(n − 2) degrees of freedom

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 1 / 33

Page 2: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

The Distribution of the Slope Coefficient

0true population slope

coefficient

t distribution

distribution of estimated slope coefficient

subtracting beta

dividing by standard error

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 2 / 33

Page 3: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

The Distribution of the Intercept

Given our population assumptions:

E (b1) = β1 (so b1 is an unbiased estimator)

The standard error of b1 is:

sb1 =

√s2e · 1n

∑ni=1 x2

i∑ni=1(xi − x̄)2

The test statistic t∗ = b1−β1sb1

is t distributed with

(n − 2) degrees of freedom

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 3 / 33

Page 4: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Bivariate Hypothesis Testing

The basic idea behind our hypothesis testing is thesame as with univariate data

We will choose a value for β1 (or β2)

Based on the distribution of b1 (or b2), we willdetermine the probability of observing our b1 (or b2) ifthe true value of the population coefficient is what weguessed

If this probability is high, we don’t reject our initialguess

It this probability is very low, we reject our intitial guessin favor of whatever we have specified as the alternative

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 4 / 33

Page 5: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Bivariate Hypothesis Testing

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 5 / 33

Page 6: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Bivariate Hypothesis Testing

Once we have calculated our test statistic, everythingworks the same as it did for univariate hypothesistesting

The basic steps:1 Formulate the null hypothesis (say β2 = β∗

2 ) andalternative hypothesis and choose a significance level

2 Calculate b2 and sb23 Use the values of b2 and sb2 to calculate t∗

4 Either compare t∗ to your critical value or calculate thep-value and compare to α

The one difference from univariate hypothesis testing isthat we are now using (n − 2) degrees of freedom(important when you are calculating critical values orp-values)

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 6 / 33

Page 7: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Bivariate Hypothesis Testing

For a two-tailed test:

Setting up the hypothesis:

Ho : β2 = β∗2Ha: β2 6= β∗2

The test statistic:

t∗ =b2 − β∗2

sb2

Evaluating the test statistic

p = Prob(|Tn−2| ≥ |t∗|)

c = tα2,n−2

Reject Ho if p < α or if |t∗| > c

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 7 / 33

Page 8: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Bivariate Hypothesis Testing

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 8 / 33

Page 9: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Bivariate Hypothesis Testing

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 9 / 33

Page 10: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Bivariate Hypothesis Testing

For an upper one-tailed test:

Setting up the hypothesis:

Ho : β2 ≤ β∗2Ha: β2 > β∗2

The test statistic:

t∗ =b2 − β∗2

sb2

Evaluating the test statistic

p = Prob(Tn−2 ≥ t∗)

c = tα,n−2

Reject Ho if p < α or if t∗ > c

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 10 / 33

Page 11: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Bivariate Hypothesis Testing

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 11 / 33

Page 12: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Bivariate Hypothesis Testing

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 12 / 33

Page 13: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Bivariate Hypothesis Testing

For a lower one-tailed test:

Setting up the hypothesis:

Ho : β2 ≥ β∗2Ha: β2 < β∗2

The test statistic:

t∗ =b2 − β∗2

sb2

Evaluating the test statistic

p = Prob(Tn−2 ≤ t∗)

c = −tα,n−2

Reject Ho if p < α or if t∗ < c

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 13 / 33

Page 14: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Excel Commands for Hypothesis Testing

p-value for a two-tailed test: TDIST (|t∗|, n − 2, 2)

p-value for an upper one-tailed test (assuming t∗ > 0):TDIST (t∗, n − 2, 1)

p-value for a lower one-tailed test (assuming t∗ < 0):TDIST (−t∗, n − 2, 1)

critical value for a two-tailed test: TINV (α, n − 2)

critical value for an upper one-tailed test:TINV (2α, n − 2)

critical value for a lower one-tailed test:−TINV (2α, n − 2)

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 14 / 33

Page 15: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Hypothesis Testing Example

Let’s go back to our NBA salary data and try outbivariate hypothesis testing

It turns out that when ln(salary) is regressed on assistsper game, the slope coefficient is .147, so an extra assistper game is associated with a 14.7% increase in salary

Suppose we want to know whether an extra reboundper game is less valuable than an extra assist per game

The hypothesis we want to test is whether an extrarebound per game is associated with an increase insalary of less than 14.7%:

H0: β2 ≥ 14.7Ha: β2 < 14.7

Let’s go to Excel to test this (nba-data.csv) ...

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 15 / 33

Page 16: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Rebounds per Game and Salary

2

2.5

‐0.5

0

0.5

1

1.5

2

0 5 10 15 20ln(salary)

‐2.5

‐2

‐1.5

‐1

0.5

Rebounds or assists per game

Rebounds per game

Assists per game

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 16 / 33

Page 17: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Testing the Value of Rebounds

H0: β2 ≥ 14.7Ha: β2 < 14.7

From Excel, b2 was 0.146474 and sb2 was 0.016257

t∗ =0.146474− .147

0.016257= −0.03235

p = TDIST (0.03235, 270− 2, 1) = 0.487

So we fail to reject the null hypothesis that the value ofa rebound is greater than or equal to the value of anassist

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 17 / 33

Page 18: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

When to Use a Two-Tailed Test

The most common time to use a two-tailed test is whenwe want to know whether x has any statisticallysignificant relationship with y

In this case, we are testing the following hypotheses:

Ho : β2 = 0Ha: β2 6= 0

This is the test that the t-stats and p values given inExcel’s regression output correspond to

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 18 / 33

Page 19: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

When to Use a Two-Tailed Test

There are situtations that call for a two-sided test and avalue for β∗2 other than zero

Consider a regression of GDP on government spending,we would want to know whether an extra dollar spentby the government leads to simply an extra dollar ofGDP or some other amount

Another case would be estimating whether demand isunit elastic by regressing log of demand on the log ofthe price

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 19 / 33

Page 20: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

When to Use a One-Tailed Test

One-sided tests are useful when we want to test for thesign of a coefficient or when we want to test whether acoefficient is greater than an economically importantcutoff

Consider the government spending example again, wemay want to test for a multiplier effect (so we wouldtest whether the coefficient on government spendingwas greater than 1)

When testing for a sign of a coefficient, people oftenuse a two-tailed test to see if there is any significantrelationship and then just check the sign of thecoefficient (be careful about interpreting p-values)

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 20 / 33

Page 21: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Confidence Intervals for Regression Coefficients

Our hypothesis testing techniques let us test whetherthe coefficient is equal to a particular value

Another useful way to do statistical inference is toconstruct a confidence interval for the coefficient

Recall confidence intervals from univariate inference:the population mean fell within the (1− α) confidenceinterval with a probability of (1− α)%

The confidence interval for the slope coefficient isjusting telling us the range of values β2 is likely to be in

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 21 / 33

Page 22: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Confidence Intervals for Regression Coefficients

We calculate the (1− α) confidence interval as:

b2 ± tα2,n−2 · sb2

Excel will give the 95% confidence interval in theregression output...for other confidence intervals, youneed to do the calculation yourself

The confidence interval will be centered around ourestimated slope coefficient

The interval will be wider if the standard error is larger

The interval will be wider if we choose a smaller α

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 22 / 33

Page 23: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

An Example: Drinking and Obesity

Suppose we are interested in whether drinking isassociated with obesity (think of beer guts)

First, we need to decide which data is most useful toour research question

We could regress weight on amount of drinking but thishas some problems

A lot of people weigh more because they are taller, notbecause they are overweight

An alternative is to use the body mass index (bmi):

bmi =weight

height2· 703

To Excel (alcohol-bmi.csv) ...

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 23 / 33

Page 24: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

An Example: Drinking and Obesity

The coefficient on days of alcohol consumption is verystatistically significant (the p-value was 0.0084)

We had a fairly narrow confidence interval which meanswe have a good idea of how big the true coefficient is

But all of this information still doesn’t tell us whetherwe should really care

Given our regression results, let’s predict the bmi’s for aperson who doesn’t drink and a person who drinks everyday:

b̂mi(0) = 27.69− .25 · 0 = 27.69

b̂mi(7) = 27.69− .25 · 7 = 25.94

Is this an important difference?

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 24 / 33

Page 25: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

An Example: Drinking and Obesity

<16.5 severely underweight16.5 to 18.5 underweight18.5 to 25 normal25 to 30 overweight30 to 35 obese I35 to 40 obese II40 to 45 severely obese45 to 50 morbidly obese50 to 60 super obese>60 hyper obese

BMI cutoff values

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 25 / 33

Page 26: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

An Example: Drinking and Obesity

So the association between drinking and bmi is verystatistically significant

But the size of the coefficient actually isn’t all thatimpressive (and there are issues with interpreting whatit means)

An extra day of drinking a week is associated with avery small change in bmi relative to our various bmicutoffs

This is a case where we say that the coefficient isstatistically significant but not economically significant

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 26 / 33

Page 27: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Economic vs. Statistical Significance

Statistical significance is just telling us whether acoefficient is different than zero (or whatever we choseas β∗2)

This doesn’t mean we should care about the coefficient

We also need to think about whether the magnitude ofcoefficient is large

When we consider whether the magnitude is largeenough to be an important effect, we are considereconomic significance

We don’t have any formal tests for economicsignificance, it is left to our judgement

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 27 / 33

Page 28: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

Economic Significance and the Confidence Interval

We can have a coefficient that is statistically significantbut that has a confidence interval too wide to makeconclusions about economic significance

For example, suppose that the coefficient on educationwhen ln(wage) is regressed on years of education hasthe following 95% confidence interval:

βedu = .05± .04

At the lower end of this interval, the effect of educationon wage is not all that impressive while at the other endit is quite large

We can say that education is statistically significant butto say whether it is really important would requirenarrowing the confidence interval (say by using betterdata and more observations)

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 28 / 33

Page 29: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

A Cautionary Note

A regression still doesn’t tell you everything about abivariate relationship

We’ve already talked about how a basic regression doesnot establish causality

Beyond that, there are many other features of data thatwon’t be apparent in regression results

We’ll take a look at just how important this can be witha set of data called the Anscombe’s Quartet(anscombe-example.xlsx)

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 29 / 33

Page 30: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

The Anscombe Quartet

y = 0.50x + 3.00R² = 0.67

10

12A

y = 0.50x + 3.00R² = 0.67

2

4

6

8

10

12

y

Ay = 0.50x + 3.00

R² = 0.67

0

2

4

6

8

10

12

0 2 4 6 8 10 12 14 16

y

x

Ay = 0.50x + 3.00

R² = 0.67

0

2

4

6

8

10

12

0 2 4 6 8 10 12 14 16

y

x

A

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 30 / 33

Page 31: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

The Anscombe Quartet

y = 0.50x + 3.0010

12B

y = 0.50x + 3.00R² = 0.67

4

6

8

10

12

y

By = 0.50x + 3.00

R² = 0.67

0

2

4

6

8

10

12

0 2 4 6 8 10 12 14 16

y

x

By = 0.50x + 3.00

R² = 0.67

0

2

4

6

8

10

12

0 2 4 6 8 10 12 14 16

y

x

B

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 31 / 33

Page 32: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

The Anscombe Quartet

y = 0.50x + 3.00R² = 0.6712

14Cy = 0.50x + 3.00

R² = 0.67

2

4

6

8

10

12

14

y

Cy = 0.50x + 3.00R² = 0.67

0

2

4

6

8

10

12

14

0 2 4 6 8 10 12 14 16

y

x

Cy = 0.50x + 3.00R² = 0.67

0

2

4

6

8

10

12

14

0 2 4 6 8 10 12 14 16

y

x

C

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 32 / 33

Page 33: The Distribution of the Slope Coe cient - John M …jmparman.people.wm.edu/102-lecture-slides/lecture-2-03-11.pdf · The Distribution of the Slope Coe cient J. Parman (UC-Davis) Analysis

The Anscombe Quartet

y = 0.50x + 3.00R² = 0.67

12

14Dy = 0.50x + 3.00

R² = 0.67

4

6

8

10

12

14

y

Dy = 0.50x + 3.00R² = 0.67

0

2

4

6

8

10

12

14

0 5 10 15 20

y

x

D

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 3, 2011 33 / 33