Section 3: Probit and Logit Models

Post on 21-Dec-2015

Dichotomous Data

• Suppose the data are discrete but there are only 2 outcomes
• Examples
  – Graduate high school or not
  – Patient dies or not
  – Working or not
  – Smoker or not
• In the data, yi=1 if yes, yi=0 if no

How to model the data generating process?

• There are only two outcomes
• Research question: What factors impact whether the event occurs?
• To answer, we will model the probability that the outcome occurs
• Pr(Yi=1) when yi=1, or
• Pr(Yi=0) = 1 - Pr(Yi=1) when yi=0

• Think of the problem from an MLE perspective
• Likelihood for the i’th observation
• Li = Pr(Yi=1)^yi × [1 - Pr(Yi=1)]^(1-yi)
• When yi=1, the only relevant part is Pr(Yi=1)
• When yi=0, the only relevant part is [1 - Pr(Yi=1)]

• L = Σi ln[Li] = Σi { yi ln[Pr(yi=1)] + (1-yi) ln[Pr(yi=0)] }
• Notice that up to this point, the model is generic. The log likelihood function will be determined by the assumptions concerning how we determine Pr(yi=1)
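A minimal Python sketch of the generic log likelihood, L = Σi { yi ln[Pr(yi=1)] + (1-yi) ln[Pr(yi=0)] }, may help fix ideas. The function name `bernoulli_log_likelihood` and the three toy observations are illustrative assumptions, not part of the course materials; any model (probit, logit, or otherwise) would simply supply its own probabilities.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Sum of y_i*ln(p_i) + (1 - y_i)*ln(1 - p_i), where p_i = Pr(y_i = 1)."""
    total = 0.0
    for yi, pi in zip(y, p):
        # For each observation only one of the two terms is non-zero.
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total

# Toy data (made-up numbers): observed outcomes and modeled Pr(y=1).
y = [1, 0, 1]
p = [0.8, 0.3, 0.6]
ll = bernoulli_log_likelihood(y, p)
print(ll)
```

The value is always negative because each term is the log of a probability; maximum likelihood picks the parameters that bring it as close to zero as possible.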

Modeling the probability

• There is some process (biological, social, decision theoretic, etc.) that determines the outcome y
• Some of the variables that affect the outcome are observed, some are not
• Requires that we model how these factors impact the probabilities
• Model from a ‘latent variable’ perspective

• Consider a woman’s decision to work
• yi* = the person’s net benefit from working
• Two components of yi*
  – Characteristics that we can measure
    • Education, age, income of spouse, price of child care
  – Some we cannot measure
    • How much you like spending time with your kids
    • How much you like/hate your job

• We aggregate these two components into one equation
• yi* = β0 + x1i β1 + x2i β2 + … + xki βk + εi = xi β + εi
• xi β: measurable characteristics, but with uncertain weights
• εi: random unmeasured characteristics
• Decision rule: the person will work if yi* > 0 (if net benefits are positive)
  yi=1 if yi*>0
  yi=0 if yi*≤0

• yi=1 if yi*>0
• yi* = xi β + εi > 0 only if
• εi > - xi β
• yi=0 if yi*≤0
• yi* = xi β + εi ≤ 0 only if
• εi ≤ - xi β

• Suppose xi β is ‘big’
  – High wages
  – Low husband’s income
  – Low cost of child care
• We would expect this person to work, UNLESS there is some unmeasured ‘variable’ that counteracts this

• Suppose a mom really likes spending time with her kids, or she hates her job.
• The unmeasured benefit of working, εi, is then a large negative number
• If we observe her working, εi must not have been too negative, since
• yi=1 if εi > - xi β

• Consider the opposite. Suppose we observe someone NOT working.
• Then εi must not have been large, since
• yi=0 if εi ≤ - xi β

Logit

• Recall yi=1 if εi > - xi β
• Suppose εi follows a logistic distribution with CDF F
• Pr(εi > - xi β) = 1 - F(- xi β)
• The logistic, like the normal, is a symmetric distribution, so
• 1 - F(- xi β) = F(xi β) = exp(xi β)/(1+exp(xi β))

• When εi follows a logistic distribution
• Pr(yi=1) = exp(xi β)/(1+exp(xi β))
• Pr(yi=0) = 1/(1+exp(xi β))
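The link between the latent-variable decision rule and the logit formula can be checked numerically. The sketch below is only an illustration: the index value `xb = 0.5`, the seed, and the function names are hypothetical choices. It draws logistic errors, applies the rule yi=1 if xi β + εi > 0, and compares the simulated share with exp(xi β)/(1+exp(xi β)).

```python
import math
import random

def logistic_cdf(z):
    """F(z) = exp(z) / (1 + exp(z)), written in a numerically stable form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def simulate_share_with_y1(xb, n=200_000, seed=12345):
    """Draw logistic errors and apply the rule y = 1 if xb + eps > 0."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        u = rng.random()
        while u == 0.0:          # avoid log(0) in the inverse CDF
            u = rng.random()
        eps = math.log(u / (1.0 - u))   # inverse-CDF draw from the standard logistic
        if xb + eps > 0:
            count += 1
    return count / n

xb = 0.5  # hypothetical value of the index x_i * beta
p_closed_form = logistic_cdf(xb)
p_simulated = simulate_share_with_y1(xb)
print(p_closed_form, p_simulated)
```

The symmetry used in the derivation, 1 - F(-z) = F(z), can be confirmed by evaluating `logistic_cdf(xb) + logistic_cdf(-xb)`, which should equal 1 up to floating-point error.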

Example: Workplace smoking bans

• Smoking supplements to the 1991 and 1993 National Health Interview Survey
• Asked all respondents whether they currently smoke
• Asked workers about workplace tobacco policies
• Sample: workers
• Key variables: current smoking and whether they faced a workplace ban

• Data: workplace1.dta
• Sample program: workplace1.doc
• Results: workplace1.log

Description of variables in data

. desc;

              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
smoker          byte   %9.0g                  is current smoking
worka           byte   %9.0g                  has workplace smoking bans
age             byte   %9.0g                  age in years
male            byte   %9.0g                  male
black           byte   %9.0g                  black
hispanic        byte   %9.0g                  hispanic
incomel         float  %9.0g                  log income
hsgrad          byte   %9.0g                  is hs graduate
somecol         byte   %9.0g                  has some college
college         float  %9.0g
------------------------------------------------------------------------

Summary statistics

. sum;

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      smoker |   16258      .25163    .433963          0          1
       worka |   16258    .6851396   .4644745          0          1
         age |   16258    38.54742   11.96189         18         87
        male |   16258    .3947595    .488814          0          1
       black |   16258    .1119449   .3153083          0          1
-------------+--------------------------------------------------------
    hispanic |   16258    .0607086   .2388023          0          1
     incomel |   16258    10.42097   .7624525   6.214608   11.22524
      hsgrad |   16258    .3355271   .4721889          0          1
     somecol |   16258    .2685447   .4432161          0          1
     college |   16258    .3293763   .4700012          0          1

Running a probit

• probit smoker age incomel male black hispanic hsgrad somecol college worka;
• The first variable after ‘probit’ is the discrete outcome; the rest of the variables are the independent variables
• Includes a constant by default

Running a logit

• logit smoker age incomel male black hispanic hsgrad somecol college worka;
• Same as probit, just change the first word

Running linear probability

• reg smoker age incomel male black hispanic hsgrad somecol college worka, robust;
• Simple regression
• Conventional standard errors are incorrect (heteroskedasticity)
• The robust option produces standard errors that are valid under an arbitrary form of heteroskedasticity

Probit Results

Probit estimates                                  Number of obs   =      16258
                                                  LR chi2(9)      =     819.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -8761.7208                       Pseudo R2       =     0.0447

------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0012684   .0009316    -1.36   0.173    -.0030943    .0005574
     incomel |   -.092812   .0151496    -6.13   0.000    -.1225047   -.0631193
        male |   .0533213   .0229297     2.33   0.020     .0083799    .0982627
       black |  -.1060518    .034918    -3.04   0.002      -.17449   -.0376137
    hispanic |  -.2281468   .0475128    -4.80   0.000    -.3212701   -.1350235
      hsgrad |  -.1748765   .0436392    -4.01   0.000    -.2604078   -.0893453
     somecol |   -.363869   .0451757    -8.05   0.000    -.4524118   -.2753262
     college |  -.7689528   .0466418   -16.49   0.000     -.860369   -.6775366
       worka |  -.2093287   .0231425    -9.05   0.000    -.2546873   -.1639702
       _cons |    .870543    .154056     5.65   0.000     .5685989    1.172487
------------------------------------------------------------------------------

How to measure fit?

• Regression (OLS)
  – Minimize the sum of squared errors
  – Or, equivalently, maximize R2
  – The model is designed to maximize predictive capacity
• Not the case with Probit/Logit
  – MLE models pick distribution parameters so as to best describe the data generating process
  – May or may not ‘predict’ the outcome well

Pseudo R2

• LLk: log likelihood with all variables
• LL1: log likelihood with only a constant
• 0 > LLk > LL1, so |LLk| < |LL1|
• Pseudo R2 = 1 - LLk/LL1
• Bounded between 0 and 1
• Not anything like an R2 from a regression
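The pseudo R2 reported in the probit output can be reproduced by hand, taking it to be 1 - LLk/LL1 (McFadden's measure). The sketch below assumes that the reported LR chi2 statistic equals 2(LLk - LL1), which is the usual likelihood-ratio definition, and uses it to back out the constant-only log likelihood. The variable names are illustrative.

```python
# Values copied from the probit output for the smoking example.
ll_full = -8761.7208      # log likelihood with all covariates (LLk)
lr_chi2 = 819.44          # reported LR chi2(9), assumed to be 2 * (LLk - LL1)

# Back out the constant-only log likelihood (LL1) from the LR statistic.
ll_const = ll_full - lr_chi2 / 2.0

pseudo_r2 = 1.0 - ll_full / ll_const
print(round(ll_const, 4), round(pseudo_r2, 4))
```

Because LLk and LL1 are both negative and |LLk| < |LL1|, the ratio LLk/LL1 lies between 0 and 1, which is what keeps the measure inside the unit interval.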

Predicting Y

• Let b be the estimated value of β
• For any candidate vector xi, we can predict probabilities, Pi
• Pi = Φ(xi b)
• Once you have Pi, pick a threshold value, T, so that you predict
• Yp = 1 if Pi > T
• Yp = 0 if Pi ≤ T
• Then compare predicted and actual outcomes: the fraction correctly predicted

• Question: what value to pick for T?
• Can pick .5
  – Intuitive. More likely to engage in the activity than to not engage in it
  – However, when the sample mean of y is small, this criterion does a poor job of predicting Yi=1
  – However, when the sample mean of y is close to 1, this criterion does a poor job of predicting Yi=0

* predict probability of smoking;
predict pred_prob_smoke;
* get detailed descriptive data about predicted prob;
sum pred_prob, detail;

* predict binary outcome with 50% cutoff;
gen pred_smoke1=pred_prob_smoke>=.5;
label variable pred_smoke1 "predicted smoking, 50% cutoff";

* compare actual values;
tab smoker pred_smoke1, row col cell;

. sum pred_prob, detail;

                        Pr(smoker)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0959301       .0615221
 5%     .1155022       .0622963
10%     .1237434       .0633929       Obs               16258
25%     .1620851       .0733495       Sum of Wgt.       16258

50%     .2569962                      Mean           .2516653
                        Largest       Std. Dev.      .0960007
75%     .3187975       .5619798
90%     .3795704       .5655878       Variance       .0092161
95%     .4039573       .5684112       Skewness       .1520254
99%     .4672697       .6203823       Kurtosis       2.149247

• Notice two things
  – The sample mean of the predicted probabilities is close to the sample mean outcome
  – 99% of the probabilities are less than .5
  – Should predict few smokers if we use a 50% cutoff

           |  predicted smoking,
is current |      50% cutoff
   smoking |         0          1 |     Total
-----------+----------------------+----------
         0 |    12,153         14 |    12,167
           |     99.88       0.12 |    100.00
           |     74.93      35.90 |     74.84
           |     74.75       0.09 |     74.84
-----------+----------------------+----------
         1 |     4,066         25 |     4,091
           |     99.39       0.61 |    100.00
           |     25.07      64.10 |     25.16
           |     25.01       0.15 |     25.16
-----------+----------------------+----------
     Total |    16,219         39 |    16,258
           |     99.76       0.24 |    100.00
           |    100.00     100.00 |    100.00
           |     99.76       0.24 |    100.00

• Check the on-diagonal elements
• The last number in each cell of the 2x2 table is the percent of all observations in that cell
• The model correctly predicts 74.75 + 0.15 = 74.90% of the obs
• It only correctly predicts a small fraction of smokers (25 of 4,091)
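The percent correctly predicted can be recomputed directly from the cell counts in the cross-tabulation. This is a small sketch with illustrative variable names; the counts are taken from the table of actual versus predicted smoking.

```python
# Cell counts from the cross-tabulation (rows: actual, columns: predicted).
true_neg, false_pos = 12153, 14      # actual non-smokers
false_neg, true_pos = 4066, 25       # actual smokers

n = true_neg + false_pos + false_neg + true_pos

# On-diagonal cells are the correct predictions.
fraction_correct = (true_neg + true_pos) / n

# Share of actual smokers that the 50% cutoff flags as smokers.
share_smokers_caught = true_pos / (false_neg + true_pos)

print(n, round(100 * fraction_correct, 2), round(100 * share_smokers_caught, 2))
```

Almost all of the correct predictions come from the non-smoker row, which is why the overall figure looks respectable even though very few smokers are identified.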

• Do not be amazed by the 75 percent correct prediction
• Let ȳ be the sample mean of y. If you said everyone has a ȳ chance of smoking (a case of no covariates), you would be correct max[ȳ, (1 - ȳ)] percent of the time

• In this case, 25.16% smoke
• If everyone had the same chance of smoking, we would assign everyone Pr(y=1) = .2516
• With a 50% cutoff we would predict that no one smokes, and we would be correct for the 1 - .2516 = 74.84% of people who do not smoke
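A short sketch comparing the no-covariate benchmark with the probit's percent correctly predicted. The function name `naive_fraction_correct` is a hypothetical helper; the inputs are the sample smoking rate and the cell counts reported in the cross-tabulation.

```python
def naive_fraction_correct(mean_outcome):
    """Share classified correctly when everyone gets the more common outcome."""
    return max(mean_outcome, 1.0 - mean_outcome)

baseline = naive_fraction_correct(0.2516)   # sample smoking rate
model = (12153 + 25) / 16258                # probit with a 50% cutoff

print(round(baseline, 4), round(model, 4))
```

The gap between the two numbers is well under a tenth of a percentage point, which is the sense in which the 75 percent figure should not impress.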

Key points about prediction

• MLE models are not designed to maximize prediction
• Should not be surprised they do not predict well
• In this case, the percent correctly predicted is not a particularly good measure of predictive capacity

Translating coefficients in probit: Continuous Covariates

• Pr(yi=1) = Φ[β0 + x1i β1 + x2i β2 + … + xki βk]
• Suppose that x1i is a continuous variable
• d Pr(yi=1) / d x1i = ?
• What is the change in the probability of an event given a change in x1i?

Marginal Effect

• d Pr(yi=1) / d x1i
• = β1 φ[β0 + x1i β1 + x2i β2 + … + xki βk]
• Notice two things: the marginal effect is a function of the other parameters, and of the values of x
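To make the formula β1 φ[xi β] concrete, the sketch below evaluates it for the smoking probit, using the coefficients from the probit output and the sample means from the summary statistics. Evaluating at the sample means is one convention among several, and the helper names (`norm_pdf`, `norm_cdf`, `estimates`) are illustrative assumptions.

```python
import math

def norm_pdf(z):
    """Standard normal density (the lowercase phi in the slides)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF (the capital Phi in the slides)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# variable: (probit coefficient, sample mean), copied from the Stata output
estimates = {
    "age":      (-0.0012684, 38.54742),
    "incomel":  (-0.092812, 10.42097),
    "male":     (0.0533213, 0.3947595),
    "black":    (-0.1060518, 0.1119449),
    "hispanic": (-0.2281468, 0.0607086),
    "hsgrad":   (-0.1748765, 0.3355271),
    "somecol":  (-0.363869, 0.2685447),
    "college":  (-0.7689528, 0.3293763),
    "worka":    (-0.2093287, 0.6851396),
}
constant = 0.870543

# Index x*b evaluated at the sample means of all covariates.
index_at_means = constant + sum(b * mean for b, mean in estimates.values())
prob_at_means = norm_cdf(index_at_means)
density_at_means = norm_pdf(index_at_means)

# Marginal effect of a continuous covariate: coefficient times the density.
me_age = estimates["age"][0] * density_at_means
me_incomel = estimates["incomel"][0] * density_at_means

print(round(prob_at_means, 4), round(me_age, 7), round(me_incomel, 5))
```

Changing any mean or any coefficient shifts `density_at_means`, so every marginal effect moves with it; that is the dependence on the other parameters and on x noted above.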

Translating Coefficients: Discrete Covariates

• Pr(yi=1) = Φ[β0 + x1i β1 + x2i β2 + … + xki βk]
• Suppose that x2i is a dummy variable (1 if yes, 0 if no)
• A marginal effect makes no sense here: we cannot change x2i by a small amount. It is either 1 or 0.
• Redefine the quantity of interest. Compare outcomes with and without x2i

• y1 = Pr(yi=1 | x2i=1) = Φ[β0 + x1i β1 + β2 + x3i β3 + … ]
• y0 = Pr(yi=1 | x2i=0) = Φ[β0 + x1i β1 + x3i β3 + … ]
• Marginal effect = y1 - y0
• This is the difference in probabilities with and without x2i
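The discrete change y1 - y0 can be sketched for the workplace-ban dummy (`worka`) in the smoking probit. The example holds every other covariate at its sample mean, which is an assumption about how to fix the remaining x's; coefficients and means are copied from the Stata output, and the helper names are illustrative.

```python
import math

def norm_cdf(z):
    """Standard normal CDF (the capital Phi in the slides)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# variable: (probit coefficient, sample mean), copied from the Stata output
estimates = {
    "age":      (-0.0012684, 38.54742),
    "incomel":  (-0.092812, 10.42097),
    "male":     (0.0533213, 0.3947595),
    "black":    (-0.1060518, 0.1119449),
    "hispanic": (-0.2281468, 0.0607086),
    "hsgrad":   (-0.1748765, 0.3355271),
    "somecol":  (-0.363869, 0.2685447),
    "college":  (-0.7689528, 0.3293763),
    "worka":    (-0.2093287, 0.6851396),
}
constant = 0.870543

# Index with worka switched off and all other covariates at their means.
base_index = constant + sum(
    b * mean for name, (b, mean) in estimates.items() if name != "worka"
)

p_without_ban = norm_cdf(base_index)                          # y0
p_with_ban = norm_cdf(base_index + estimates["worka"][0])     # y1
discrete_change = p_with_ban - p_without_ban

print(round(p_without_ban, 4), round(p_with_ban, 4), round(discrete_change, 4))
```

The result is a difference of two probabilities, so it is read in percentage points rather than as a slope.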

In STATA

• Marginal effects for continuous variables: STATA evaluates them at the sample means of the X’s
• Change in probabilities for dichotomous covariates: STATA also evaluates them at the sample means of the X’s

STATA command for Marginal Effects

• mfx compute;
• Must be run after the estimation command, while the estimates are still active in memory

Marginal effects after probit
      y  = Pr(smoker) (predict)
         =  .24093439
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |  -.0003951      .00029   -1.36   0.173  -.000964  .000174   38.5474
 incomel |  -.0289139      .00472   -6.13   0.000   -.03816 -.019668    10.421
   male* |   .0166757       .0072    2.32   0.021   .002568  .030783    .39476
  black* |  -.0320621      .01023   -3.13   0.002  -.052111 -.012013   .111945
hispanic*|  -.0658551      .01259   -5.23   0.000  -.090536 -.041174   .060709
 hsgrad* |   -.053335      .01302   -4.10   0.000   -.07885  -.02782   .335527
somecol* |  -.1062358      .01228   -8.65   0.000  -.130308 -.082164   .268545
college* |  -.2149199      .01146  -18.76   0.000  -.237378 -.192462   .329376
  worka* |  -.0668959      .00756   -8.84   0.000   -.08172 -.052072    .68514
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

Interpret results

• A 10% increase in income (an increase of about 0.1 in log income) will reduce smoking by about 0.3 percentage points
• A 10 year increase in age will decrease smoking rates by 0.4 percentage points
• Those with a college degree are 21.5 percentage points less likely to smoke
• Those that face a workplace smoking ban have a 6.7 percentage point lower probability of smoking

• Do not confuse percentage point and percent differences
  – A 6.7 percentage point drop is 28% of the predicted smoking rate of 24 percent
  – Blacks have smoking rates that are 3.2 percentage points lower than others, which is 13 percent of that 24 percent rate

Comparing Marginal Effects

Variable      LP         Probit     Logit
age          -0.00040   -0.00048   -0.00048
incomel      -0.0289    -0.0287    -0.0276
male          0.0167     0.0168     0.0172
Black        -0.0321    -0.0357    -0.0342
hispanic     -0.0658    -0.0706    -0.0602
hsgrad       -0.0533    -0.0661    -0.0514
college      -0.2149    -0.2406    -0.2121
worka        -0.0669    -0.0661    -0.0658

When will results differ?

• Normal and logistic CDFs look
  – Similar in the middle of the distribution
  – Different in the tails
• You obtain more observations in the tails of the distribution when
  – Sample sizes are large
  – The sample mean of y approaches 1 or 0
• These situations will produce more differences in estimates

Some nice properties of the Logit

• Outcome, y=1 or 0
• Treatment, x=1 or 0
• Other covariates, z
• Context
  – y = whether a baby is born at a low birth weight
  – x = whether the mom smoked or not during pregnancy

• Risk ratio
  RR = Prob(y=1|x=1)/Prob(y=1|x=0)
• The ratio of the probabilities of an event when x is and is not observed
• How much does smoking elevate the chance your child will be a low weight birth?

• Let Yyx be the probability that y=1 or 0 given x=1 or 0
• Think of the risk ratio the following way
• Y11 is the probability Y=1 when X=1
• Y10 is the probability Y=1 when X=0
• Y11 = RR*Y10

• Odds Ratio
  OR = A/B = [Y11/Y01]/[Y10/Y00]
  A = [Pr(Y=1|X=1)/Pr(Y=0|X=1)] = odds of Y occurring if you are a smoker
  B = [Pr(Y=1|X=0)/Pr(Y=0|X=0)] = odds of Y occurring if you are not a smoker
• What are the relative odds of Y happening if you do or do not experience X?

• Suppose Pr(Yi=1) = F(β0 + β1 Xi + β2 Z) and F is the logistic function
• Can show that
• OR = exp(β1)
• This number is reported by most statistical packages

• Details
• Y11 = exp(β0 + β1 + β2 Z)/(1 + exp(β0 + β1 + β2 Z))
• Y10 = exp(β0 + β2 Z)/(1 + exp(β0 + β2 Z))
• Y01 = 1/(1 + exp(β0 + β1 + β2 Z))
• Y00 = 1/(1 + exp(β0 + β2 Z))
• [Y11/Y01] = exp(β0 + β1 + β2 Z)
• [Y10/Y00] = exp(β0 + β2 Z)
• OR = A/B = [Y11/Y01]/[Y10/Y00]
     = exp(β0 + β1 + β2 Z)/exp(β0 + β2 Z)
     = exp(β1)
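The cancellation in the derivation can be checked numerically: whatever the value of Z, the odds ratio built from the four logistic probabilities should equal exp(β1). The coefficient values below are hypothetical, chosen only for illustration.

```python
import math

def logistic(z):
    """Logistic CDF, F(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(b0, b1, b2, z):
    """Odds ratio [Y11/Y01] / [Y10/Y00] for a logit with a 0/1 treatment X."""
    y11 = logistic(b0 + b1 + b2 * z)   # Pr(Y=1 | X=1)
    y10 = logistic(b0 + b2 * z)        # Pr(Y=1 | X=0)
    y01 = 1.0 - y11                    # Pr(Y=0 | X=1)
    y00 = 1.0 - y10                    # Pr(Y=0 | X=0)
    return (y11 / y01) / (y10 / y00)

# Hypothetical coefficients, for illustration only.
b0, b1, b2 = -2.0, 0.7, 0.3
ratios = [odds_ratio(b0, b1, b2, z) for z in (-1.0, 0.0, 2.5)]
print(ratios, math.exp(b1))
```

The same is not true of the risk ratio Y11/Y10, which changes with Z; that is one reason the odds ratio is the convenient summary for a logit.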

• Suppose Y is rare, so the mean of Y is close to 0
  – Pr(Y=0|X=1) and Pr(Y=0|X=0) are both close to 1, so they cancel
• Therefore, when the mean of Y is close to 0
  – Odds Ratio ≈ Risk Ratio
• Why is this nice?

Population attributable risk

• Let x̄ be the fraction of the population with X=1 and ȳ the average outcome
• Average outcome in the population
  ȳ = (1 - x̄) Y10 + x̄ Y11 = (1 - x̄) Y10 + x̄ (RR) Y10
• Average outcomes are a weighted average of outcomes for X=0 and X=1
• What would the average outcome be in the absence of X (e.g., reduce smoking rates to 0)?
• Ya = Y10

Population Attributable Risk

• PAR
• Fraction of the outcome attributed to X
• The difference between the current rate and the rate that would exist without X, divided by the current rate
• PAR = (ȳ - Ya)/ȳ = x̄ (RR - 1)/[(1 - x̄) + x̄ RR], where x̄ is the fraction of the population with X=1
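A minimal sketch of the PAR calculation, reading the formula as PAR = x̄ (RR - 1)/[(1 - x̄) + x̄ RR], with x̄ the share of the population exposed to X. That reading is an assumption: the symbol is missing from the transcript, but it is the one that reproduces the 0.116 reported for the maternal smoking example. The function name and the baseline risk `y10` are hypothetical.

```python
def population_attributable_risk(share_exposed, risk_ratio):
    """PAR = x(RR - 1) / [(1 - x) + x*RR], with x the share exposed to X."""
    excess = share_exposed * (risk_ratio - 1.0)
    return excess / ((1.0 - share_exposed) + share_exposed * risk_ratio)

# Inputs from the maternal smoking example: 13.7% smoke, RR = 1.96.
par = population_attributable_risk(0.137, 1.96)

# Cross-check against the definition (current rate - rate without X) / current rate,
# using a hypothetical baseline risk Y10 (it cancels out of the ratio).
y10 = 0.05
current_rate = (1.0 - 0.137) * y10 + 0.137 * 1.96 * y10
par_from_definition = (current_rate - y10) / current_rate

print(round(par, 3), round(par_from_definition, 3))
```

The baseline risk drops out, so PAR depends only on how common the exposure is and how large the risk ratio is.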

Example: Maternal Smoking and Low Weight Births

• 6% of births are low weight
  – < 2500 grams (5.5 lbs)
  – Average birth is 3300 grams (7.3 lbs)
• Maternal smoking during pregnancy has been identified as a key cofactor
  – 13% of mothers smoke
  – This number was falling about 1 percentage point per year during the 1980s/90s
  – Doubles the chance of a low weight birth

Natality detail data

• Census of all births (4 million/year)
• Annual files starting in the 60s
• Information about
  – Baby (birth weight, length, date, sex, plurality, birth injuries)
  – Demographics (age, race, marital status, education of mom)
  – Birth (who delivered, method of delivery)
  – Health of mom (smoked/drank during pregnancy, weight gain)

• Smoking not available from CA or NY
• ~3 million usable observations
• I pulled a .5% random sample from 1995
• About 12,500 obs
• Variables: birthweight (grams), smoked, married, 4-level race, 5-level education, mother’s age at birth

------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
birthw          int    %9.0g                  birth weight in grams
smoked          byte   %9.0g                  =1 if mom smoked during pregnancy
age             byte   %9.0g                  moms age at birth
married         byte   %9.0g                  =1 if married
race4           byte   %9.0g                  1=white,2=black,3=asian,4=other
educ5           byte   %9.0g                  1=0-8, 2=9-11, 3=12, 4=13-15, 5=16+
visits          byte   %9.0g                  prenatal visits
------------------------------------------------------------------------------

     dummy |
 variable, |
        =1 |   =1 if mom smoked
 ifBW<2500 |   during pregnancy
     grams |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,626      1,745 |    13,371
           |     86.95      13.05 |    100.00
           |     94.64      89.72 |     93.96
           |     81.70      12.26 |     93.96
-----------+----------------------+----------
         1 |       659        200 |       859
           |     76.72      23.28 |    100.00
           |      5.36      10.28 |      6.04
           |      4.63       1.41 |      6.04
-----------+----------------------+----------
     Total |    12,285      1,945 |    14,230
           |     86.33      13.67 |    100.00
           |    100.00     100.00 |    100.00
           |     86.33      13.67 |    100.00

• Notice a few things
  – 13.7% of women smoke
  – 6% have a low weight birth
• Pr(LBW | Smoke) = 10.28%
• Pr(LBW | ~Smoke) = 5.36%
• RR = Pr(LBW | Smoke)/Pr(LBW | ~Smoke) = 0.1028/0.0536 = 1.92
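The risk ratio can be recomputed from the raw cell counts in the cross-tabulation, and the unadjusted odds ratio comes from the same four numbers. The sketch below uses illustrative variable names; because low birth weight is fairly rare, the two ratios should land close to each other.

```python
# Cell counts from the cross-tabulation of low birth weight by maternal smoking.
normal_nonsmoker, normal_smoker = 11626, 1745
low_nonsmoker, low_smoker = 659, 200

# Probabilities of a low weight birth within each smoking group.
p_low_smoker = low_smoker / (low_smoker + normal_smoker)
p_low_nonsmoker = low_nonsmoker / (low_nonsmoker + normal_nonsmoker)

risk_ratio = p_low_smoker / p_low_nonsmoker

# Odds are low-weight births relative to normal-weight births in each group.
odds_ratio = (low_smoker / normal_smoker) / (low_nonsmoker / normal_nonsmoker)

print(round(p_low_smoker, 4), round(p_low_nonsmoker, 4))
print(round(risk_ratio, 3), round(odds_ratio, 3))
```

The small difference from the 1.92 on the slide arises because the slide divides the rounded percentages rather than the raw counts.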

Logit results

Log likelihood = -3136.9912                       Pseudo R2       =     0.0330

------------------------------------------------------------------------------
       lowbw |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoked |   .6740651   .0897869     7.51   0.000     .4980861    .8500441
         age |   .0080537    .006791     1.19   0.236    -.0052564    .0213638
     married |  -.3954044   .0882471    -4.48   0.000    -.5683654   -.2224433
   _Ieduc5_2 |  -.1949335   .1626502    -1.20   0.231    -.5137221    .1238551
   _Ieduc5_3 |  -.1925099   .1543239    -1.25   0.212    -.4949791    .1099594
   _Ieduc5_4 |  -.4057382   .1676759    -2.42   0.016    -.7343769   -.0770994
   _Ieduc5_5 |  -.3569715   .1780322    -2.01   0.045    -.7059081   -.0080349
   _Irace4_2 |   .7072894   .0875125     8.08   0.000     .5357681    .8788107
   _Irace4_3 |    .386623    .307062     1.26   0.208    -.2152075    .9884535
   _Irace4_4 |   .3095536   .2047899     1.51   0.131    -.0918271    .7109344
       _cons |  -2.755971   .2104916   -13.09   0.000    -3.168527   -2.343415
------------------------------------------------------------------------------

Odds Ratios

• Smoked
  – exp(0.674) = 1.96
  – Smokers have about twice the odds of a low weight birth
• _Irace4_2 (Blacks)
  – exp(0.707) = 2.03
  – Blacks have about twice the odds of a low weight birth
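Odds ratios and their confidence limits can be obtained by exponentiating the logit coefficient and the endpoints of its confidence interval. The sketch below does this for `smoked`, using the coefficient and standard error from the logit output; the 1.959964 critical value is the usual two-sided 95% normal quantile, and the variable names are illustrative.

```python
import math

coef_smoked = 0.6740651   # logit coefficient on smoked
se_smoked = 0.0897869     # its standard error
z_95 = 1.959964           # two-sided 95% critical value of the standard normal

odds_ratio = math.exp(coef_smoked)

# Build the interval on the coefficient scale, then exponentiate the endpoints.
ci_low = math.exp(coef_smoked - z_95 * se_smoked)
ci_high = math.exp(coef_smoked + z_95 * se_smoked)

print(round(odds_ratio, 3), round(ci_low, 3), round(ci_high, 3))
```

The interval is not symmetric around the odds ratio because the exponential transformation stretches the upper half more than the lower half.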

Asking for odds ratios

• logistic y x1 x2;
• In this case
• xi: logistic lowbw smoked age married i.educ5 i.race4;

Log likelihood = -3136.9912                       Pseudo R2       =     0.0330

------------------------------------------------------------------------------
       lowbw | Odds Ratio   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoked |   1.962198   .1761796     7.51   0.000     1.645569     2.33975
         age |   1.008086   .0068459     1.19   0.236     .9947574    1.021594
     married |   .6734077   .0594262    -4.48   0.000     .5664506    .8005604
   _Ieduc5_2 |   .8228894   .1338431    -1.20   0.231     .5982646    1.131852
   _Ieduc5_3 |   .8248862   .1272996    -1.25   0.212     .6095837    1.116233
   _Ieduc5_4 |   .6664847   .1117534    -2.42   0.016     .4798043    .9257979
   _Ieduc5_5 |   .6997924   .1245856    -2.01   0.045     .4936601    .9919973
   _Irace4_2 |   2.028485   .1775178     8.08   0.000      1.70876    2.408034
   _Irace4_3 |   1.472001   .4519957     1.26   0.208     .8063741    2.687076
   _Irace4_4 |   1.362817   .2790911     1.51   0.131     .9122628    2.035893
------------------------------------------------------------------------------

PAR

• PAR = x̄ (RR - 1)/[(1 - x̄) + x̄ RR], where x̄ is the fraction of mothers who smoke
• x̄ = 0.137
• RR = 1.96
• PAR = 0.116
• 11.6% of low weight births are attributed to maternal smoking

Hypothesis Testing in MLE models

• MLE estimates are asymptotically normally distributed, one of the properties of MLE
• Therefore, standard t-tests of hypotheses will work as long as samples are ‘large’
• What ‘large’ means is open to question
• What to do when samples are ‘small’: table for a moment

Testing a linear combination of parameters

• Suppose you have a probit model
• Φ[β0 + x1i β1 + x2i β2 + x3i β3 + … ]
• Test a linear combination of parameters
• Simplest example: test that a subset are zero
• β1 = β2 = β3 = β4 = 0
• To fix the discussion
  – N observations
  – K parameters
  – J restrictions (count the equals signs, J=4)


Wald Test

• Based on the fact that the parameter estimates are asymptotically normally distributed
• Probability theory review
  – Suppose you have m independent draws from a standard normal distribution ($z_i$)
  – $M = z_1^2 + z_2^2 + \dots + z_m^2$
  – M is distributed as a Chi-square with m degrees of freedom
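
The probability review can be illustrated by simulation. The sketch below is not from the slides: it draws sums of m squared standard normals and reports the share that exceed a threshold. It uses the fact that the 95th percentile of a chi-square with 4 degrees of freedom is 9.488, so about 5% of simulated sums should land above it. The function name, the number of draws, and the seed are arbitrary choices.

```python
import random


def share_exceeding(m, threshold, draws=50_000, seed=12345):
    """Simulate M = z_1^2 + ... + z_m^2 and return the share of draws above threshold."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        # One realisation of M: the sum of m squared standard normal draws
        total = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))
        if total > threshold:
            exceed += 1
    return exceed / draws


# With m = 4, the simulated share above 9.488 should be close to 0.05
print(share_exceeding(4, 9.488))
```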


• Wald test constructs a ‘quadratic form’ suggested by the test you want to perform

• This combination, because it contains squares of the (asymptotically normal) parameter estimates, should, if the hypothesis is true, be distributed as a Chi-square with J degrees of freedom.

• If the test statistic is ‘large’, relative to the degrees of freedom of the test, we reject, because there is a low probability we would have drawn that value at random from the distribution
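
For reference, one common way to write the quadratic form is sketched below. The notation is an assumption and does not appear on the slides: $\hat{\beta}$ is the K×1 vector of estimates, $\hat{V}$ is its estimated covariance matrix, and the J restrictions are written as $R\beta = r$, with $R$ a J×K matrix.

```latex
W = (R\hat{\beta} - r)' \left[ R \hat{V} R' \right]^{-1} (R\hat{\beta} - r)
\;\overset{a}{\sim}\; \chi^2_J \quad \text{under } H_0 : R\beta = r
```

For the example $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$, $R$ picks out those four coefficients and $r$ is a vector of zeros, so J = 4.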


Reading values from a Table

• All stats books will report the ‘percentiles’ of a chi-square
  – Vertical axis (degrees of freedom)
  – Horizontal axis (percentiles)
  – Each entry is the value below which that ‘percentile’ of the distribution falls


• Example: Suppose 4 restrictions
• 95% of a chi-square distribution with 4 degrees of freedom falls below 9.488
• So there is only a 5% chance that a number drawn at random will exceed 9.488
• If your test statistic is below, cannot reject null
• If your test statistic is above, reject null
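
The table lookup and decision rule can also be reproduced numerically. The following is a rough sketch in plain Python, not the method used in the slides: it evaluates the chi-square CDF with a series expansion of the incomplete gamma function and finds the critical value by bisection. All function names are illustrative, and the test statistic of 12.0 is a made-up value.

```python
import math


def chi2_cdf(x, df):
    """CDF of a chi-square with df degrees of freedom (series for the incomplete gamma)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    # Add series terms until they no longer change the sum materially
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical(df, percentile=0.95):
    """Value below which `percentile` of the distribution falls, found by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < percentile:
        hi *= 2.0  # widen the bracket until it contains the critical value
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < percentile:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def reject_null(test_statistic, df, percentile=0.95):
    """Decision rule from the slide: reject when the statistic exceeds the critical value."""
    return test_statistic > chi2_critical(df, percentile)


print(round(chi2_critical(4), 3))  # 9.488, the value quoted for 4 restrictions
print(reject_null(12.0, 4))        # True: 12.0 is a hypothetical test statistic
```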


Chi-square

Percentiles of the Chi-squared

DOF   0.500   0.750   0.800   0.900   0.950   0.990   0.995
  1   0.455   1.323   1.642   2.706   3.841   6.635   7.879
  2   1.386   2.773   3.219   4.605   5.991   9.210  10.597
  3   2.366   4.108   4.642   6.251   7.815  11.345  12.838
  4   3.357   5.385   5.989   7.779   9.488  13.277  14.860
  5   4.351   6.626   7.289   9.236  11.070  15.086  16.750
  6   5.348   7.841   8.558  10.645  12.592  16.812  18.548
  7   6.346   9.037   9.803  12.017  14.067  18.475  20.278
  8   7.344  10.219  11.030  13.362  15.507  20.090  21.955
  9   8.343  11.389  12.242  14.684  16.919  21.666  23.589
 10   9.342  12.549  13.442  15.987  18.307  23.209  25.188


Wald test in STATA

• Default test in MLE models
• Easy to do. Look at program

• test hsgrad somecol college

• Does not estimate the ‘restricted’ model

• ‘Lower power’ than other tests, i.e., a higher chance of a false negative


-2 Log likelihood test

* how to run the same tests with a -2 log like test;
* assumes the semicolon delimiter was set earlier in the do-file;

* estimate the unrestricted model and save the estimates in urmodel;
probit smoker age incomel male black hispanic
    hsgrad somecol college worka;
estimates store urmodel;

* estimate the restricted model. save results in rmodel;
probit smoker age incomel male black hispanic
    worka;
estimates store rmodel;

* compare the two sets of stored estimates;
lrtest urmodel rmodel;
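
The statistic behind this test can be written out as follows. This is a sketch that uses the section's earlier convention that L denotes the log likelihood. The subscripts U and R (unrestricted, restricted) are introduced here for illustration and refer to the maximized log likelihoods of the two stored models.

```latex
LR = -2\,\bigl[ L_R - L_U \bigr] \;\overset{a}{\sim}\; \chi^2_J
```

In this program the restricted model drops hsgrad, somecol and college, so J = 3, and at the 5% level the statistic is compared with 7.815, the 0.950 entry for 3 degrees of freedom in the chi-square table.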


• I prefer the -2 log likelihood test
  – Estimates the restricted and unrestricted model
  – Therefore, has more power than a Wald test

• In most cases, they give the same ‘decision’ (reject/not reject)