12 probit and logit - amine ouazad · binary dependent variable:! probit and logit" amine...

Binary dependent variable: Probit and Logit

Amine Ouazad

Ass. Prof. of Economics

Outline

1.  Problemo

2.  Probit/Logit Framework

3.  Structural interpretation

4.  Interpreting results

5.  Testing assumptions

6.  Further remarks

PROBLEMO: OLS WITH A BINARY DEPENDENT VARIABLE

Problemos

•  Consider estimating the probability of smoking with y = x’b + e, where y ∈ {0,1} and x is a set of covariates.

•  We know that OLS is consistent, asymptotically normal, and unbiased.

•  However:

•  The predictions can fall outside [0,1].

•  Since y is binary, E(y|x) = P(y=1|x), so x’b is the probability of smoking given the characteristics of the individual. A prediction outside [0,1] is not a valid probability.

•  The residuals are not normal for a finite sample.

•  Conditional on x, the residual takes one of two values e = 1 - x’b or e = - x’b.

•  Under A6, the residuals would be N(0,σ²), but that is not possible given y ∈ {0,1}.

•  The residuals are heteroskedastic.

•  Since y is binary, Var(y|x) = x’b(1-x’b), which varies with x.
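The variance formula follows directly from the two values the residual can take. A short derivation, writing p = x’b for the conditional probability (the symbol p is notation introduced here for brevity, not the slides' own):

```latex
\begin{align*}
e \mid x &=
  \begin{cases}
    1 - p & \text{with probability } p,\\
    -p    & \text{with probability } 1 - p,
  \end{cases}
  \qquad p = x'b, \\
E(e \mid x) &= p(1-p) + (1-p)(-p) = 0, \\
\mathrm{Var}(e \mid x) &= p(1-p)^2 + (1-p)p^2 = p(1-p)\,[(1-p) + p] = x'b\,(1 - x'b).
\end{align*}
```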

Problemo 1: Predictions

•  In-sample predictions may be outside [0,1]

•  Smoking = a + b Age + c Income + e

•  regress smoking age income

•  predict smoking_predicted, xb

•  sum smoking_predicted

•  Out-of-sample predictions may be outside [0,1]

•  Smoking = a + b Age + c Income + e

•  use another_dataset.dta, clear

•  predict smoking_predicted, xb

•  sum smoking_predicted
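The same point can be checked without Stata. Below is a minimal Python sketch of a linear probability model with one covariate, fitted by the textbook OLS formulas. The data are hypothetical and chosen only so that the arithmetic is easy to follow; they are not the smoking data used in the course.

```python
# Linear probability model fitted by OLS on a toy sample (hypothetical data).
x = list(range(10))                      # single covariate, e.g. a rescaled age
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]       # binary outcome, e.g. smoking

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for one covariate plus a constant.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# In-sample fitted values, then predictions at new covariate values.
in_sample = [intercept + slope * xi for xi in x]
out_of_sample = [intercept + slope * xi for xi in (-5, 15)]

print(min(in_sample), max(in_sample))
print(out_of_sample)
```

Even in this small sample, the lowest in-sample fitted value is negative and the highest exceeds 1. The out-of-sample predictions are further outside the unit interval.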

Problemo 2: Normality of the residuals

•  Recap: Normality of the residuals is needed for the validity of confidence intervals, test statistics, when “far from the asymptotics” (i.e. small sample size).

•  But a normally distributed residual is incompatible with y taking only the values 0 and 1.

•  Hence, in principle, confidence intervals and test statistics are incorrect if using an OLS regression.


•  predict resid, resid

•  hist resid

•  See next page.

•  Non normal residuals. Conditional on x, e can take only two values.

•  Considering that x has a distribution, that can give a double-peaked distribution for the residuals such as this one. A6 is obviously violated.

Test of normality

•  Using the third and fourth moments of the residuals. If they are normal, the kurtosis should be 3 and the skewness should be 0.

•  regress y x

•  predict epsilon, resid

•  sum epsilon, detail

•  hist epsilon

•  sktest epsilon
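To make the moment comparison concrete, here is a small Python sketch that computes skewness and kurtosis from the central moments of a residual series. It illustrates the moments only and does not reproduce the test statistic that sktest reports. The sample is hypothetical: a regression on a constant only, so the fitted value is the sample mean.

```python
def skewness_kurtosis(values):
    """Return (skewness, kurtosis) from the central moments of the sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical sample: 2 smokers out of 10, regression on a constant only,
# so the fitted value is the sample mean and residuals take two values.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_bar = sum(y) / len(y)
residuals = [yi - y_bar for yi in y]

skew, kurt = skewness_kurtosis(residuals)
print(skew, kurt)
```

Here the skewness is about 1.5 rather than 0, so the residuals are clearly not normal even though the kurtosis (about 3.25) happens to be close to 3.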

PROBIT/LOGIT FRAMEWORK

Individuals’ preferences

•  Intuition is that individuals are making a discrete choice.

•  y* = U(smoking)-U(not smoking) = x’b + e

•  The difference in utilities is the outcome of a cost-benefit analysis.

•  The costs and benefits are unobserved, but the choice is ultimately observed.

•  The net benefit y* is a continuous variable, so e can be normally distributed.

•  Then P(y=1) = P(smoking) = P(y*>0) = P(x’b+e>0) = P(e>-x’b).

•  With a symmetric distribution, P(e>-x’b) = P(e<x’b) = F(x’b).

•  And P(y=0) = 1 - F(x’b).

•  F is the cdf of the residual.

Probit vs Logit

•  Probit: the residual is normally distributed, with variance 1.

•  Variance is fixed, more on this later.

•  F(x) is the cdf of the standard normal distribution, i.e. the integral of the normal density from -∞ to x.

•  Logit: the residual has a logistic distribution.

•  F(x) = e^x/(1+e^x).

•  Choice of one versus the other makes little practical difference (and should make no practical difference, otherwise your model is not robust).

•  Difficulty in the probit model is that the cdf has no closed form expression.
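Although the normal cdf has no closed form, it is available numerically through the error function. The Python sketch below codes both cdfs and compares them on a grid. The rescaling factor 1.702 applied to the logistic argument is a common rule of thumb for lining up the two curves; it is an assumption of this example and does not come from the slides.

```python
import math

def normal_cdf(z):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic cdf, F(z) = e^z / (1 + e^z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Compare the two on a grid after rescaling the logistic argument.
# The factor 1.702 is a rule-of-thumb rescaling, not part of the slides.
grid = [i / 10 for i in range(-40, 41)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(1.702 * z)) for z in grid)
print(max_gap)
```

Once the scale difference is accounted for, the two cdfs stay within roughly one percentage point of each other, which is why the choice between probit and logit rarely matters in practice.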

Likelihood of the model

•  We observe the choices yi , i=1,2,...,N, and the characteristics xi.

•  The likelihood of an observation yi,xi is:

•  L(yi,xi;b) = P(yi=1|xi) if yi = 1

•  L(yi,xi;b) = P(yi=0|xi) if yi = 0

•  Combine: Li = P(yi=1|xi)^yi P(yi=0|xi)^(1-yi)

•  In logs, log Li = yi log P(yi=1|xi) + (1-yi) log P(yi=0|xi). The log likelihood of the sample is the sum of log Li over i=1,2,...,N.
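As an illustration, the log likelihood can be coded in a few lines. This Python sketch uses the logit cdf by default and P(yi=0|xi) = 1 - P(yi=1|xi). The function names and the four-observation data set are hypothetical.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b, xs, ys, cdf=logistic_cdf):
    """Sample log likelihood: sum of yi log F(xi'b) + (1-yi) log(1-F(xi'b))."""
    total = 0.0
    for x, y in zip(xs, ys):
        index = sum(bk * xk for bk, xk in zip(b, x))   # xi'b
        p1 = cdf(index)                                # P(yi=1|xi)
        total += y * math.log(p1) + (1 - y) * math.log(1.0 - p1)
    return total

# Hypothetical data: each row is (constant, covariate).
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
ys = [0, 0, 1, 1]

print(log_likelihood((0.0, 0.0), xs, ys))   # every P equals 0.5 when b = 0
print(log_likelihood((-1.5, 1.0), xs, ys))  # a better-fitting b gives a higher value
```

Passing a normal cdf as the `cdf` argument would give the probit log likelihood instead.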

Identification and estimation

•  See Greene. For both probit and logit, the log likelihood is globally concave in b, so a maximum, when it exists, is the unique global maximum.
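Global concavity is what makes a simple Newton-Raphson iteration reliable here. The Python sketch below fits a logit with a constant and one covariate. It illustrates the idea only and is not a description of Stata's internal algorithm; the data and the function name fit_logit are hypothetical.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson for a logit with a constant and one covariate."""
    a, b = 0.0, 0.0                      # starting values
    for _ in range(max_iter):
        p = [logistic_cdf(a + b * xi) for xi in x]
        # Score (gradient of the log likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Minus the Hessian, a 2x2 positive definite matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: (minus Hessian)^{-1} times the score.
        da = (h11 * g0 - h01 * g1) / det
        db = (-h01 * g0 + h00 * g1) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Hypothetical data, not perfectly separated, so the maximum exists.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
a_hat, b_hat = fit_logit(x, y)
print(a_hat, b_hat)
```

At the solution the score is zero, which is the first-order condition of the maximum likelihood problem.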

INTERPRETING RESULTS

Variance is not identified

•  The likelihood was maximized over the coefficient vector b only, because we fixed the value of the variance.

•  The variance has to be fixed, by convention to 1 for probit, to π²/3 for logit.

•  Indeed, consider a model where the variance of the residual is 4 (standard deviation 2), and the coefficients are multiplied by 2.

•  The model generates the same probability of smoking as the original model.

•  This also tells us that the absolute value of the coefficients has little interpretation. Only the odds ratios and the marginal effects have an interpretation.
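A quick numerical check of this argument in Python, for a probit-type model where P(y=1|x) is the normal cdf evaluated at x’b divided by the standard deviation of the residual. The covariate values and coefficients are hypothetical.

```python
import math

def normal_cdf(z, sd=1.0):
    """Cdf of a N(0, sd^2) variable evaluated at z."""
    return 0.5 * (1.0 + math.erf(z / (sd * math.sqrt(2.0))))

x = (1.0, 35.0, 2.5)             # hypothetical: constant, age, income
b = (0.4, -0.02, 0.1)            # hypothetical coefficients, variance fixed to 1

index = sum(bk * xk for bk, xk in zip(b, x))
p_original = normal_cdf(index, sd=1.0)

# Same model with residual variance 4 (sd 2) and coefficients multiplied by 2.
b_scaled = tuple(2.0 * bk for bk in b)
index_scaled = sum(bk * xk for bk, xk in zip(b_scaled, x))
p_scaled = normal_cdf(index_scaled, sd=2.0)

print(p_original, p_scaled)
print(b[1] / b[2], b_scaled[1] / b_scaled[2])   # coefficient ratios coincide
```

The two parameterisations give the same probability, and the ratio of any two coefficients is unchanged, so the data cannot tell them apart.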

Identification

•  By the same reasoning, the value of a coefficient is not identified.

•  The sign of a coefficient is identified.

•  The ratio of two coefficients is identified.

•  Implications for the interpretation of the logit coefficients?

Marginal effects

•  The marginal effect of a covariate x on the probability that y=1 is easiest to read.

•  It measures the effect of a marginal increase in x on the probability of y=1.

•  This marginal effect is identified. It does not depend on the particular scaling of the coefficients.

•  In Stata it is computed by mfx after performing the logit/probit regression.

Marginal Effects

•  Formula:

\[ \frac{\partial P(y=1 \mid x)}{\partial x_k} = \beta_k \, f(x'\beta) \]

where f is the density of the residual (the derivative of the cdf F).

•  Marginal effects do not depend on a particular scaling of β.

•  Their value depends on the point at which the marginal effect is taken… (different from OLS A1, the model is nonlinear).

•  Either the marginal effect is evaluated at the mean of the covariates, or the marginal effects are averaged over the sample.

•  By default, mfx calculates the marginal effects or elasticities at the means of the independent variables.
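The two conventions generally give different numbers. Below is a small Python sketch for a logit, where the density is f(z) = F(z)(1-F(z)). The coefficient values and the sample are hypothetical, and the finite-difference line is only there to check the analytical formula.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    """Density f = F' of the logistic distribution."""
    p = logistic_cdf(z)
    return p * (1.0 - p)

# Hypothetical logit estimates: constant and one covariate.
b0, b1 = -1.0, 0.5
x = [0.0, 1.0, 2.0, 3.0, 6.0]

# Marginal effect evaluated at the mean of the covariate.
x_bar = sum(x) / len(x)
me_at_mean = b1 * logistic_pdf(b0 + b1 * x_bar)

# Average of the individual marginal effects.
avg_me = sum(b1 * logistic_pdf(b0 + b1 * xi) for xi in x) / len(x)

# Finite-difference check of the formula at the mean.
h = 1e-6
fd = (logistic_cdf(b0 + b1 * (x_bar + h)) - logistic_cdf(b0 + b1 * (x_bar - h))) / (2 * h)

print(me_at_mean, avg_me, fd)
```

The marginal effect at the mean matches the numerical derivative, while the average marginal effect is smaller in this sample because several observations sit in the flatter parts of the cdf.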

STRUCTURAL INTERPRETATION: RANDOM UTILITY MODEL

Structural interpretation (1/2)

•  y1* = U(smoking) = x’b1 + e1

•  y0* = U(not smoking) = x’b0 + e0

•  Take the difference: y* = y1* - y0* = x’(b1-b0) + e1-e0

•  write b = b1-b0 , and e = e1-e0.

•  The coefficient is the impact of the covariate on the relative preference for smoking.

Structural Interpretation (2/2)

•  If e1 and e0 are jointly normally distributed, then e is normally distributed. The estimation of the model is done via probit.

•  If e1 and e0 are independent and extreme-value distributed, then e is logistically distributed. The estimation of the model is done via logit.

•  Extreme value distribution (type I): F(e) = exp(-exp(-e)).
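The link between extreme-value utilities and the logit can be checked by simulation. This Python sketch assumes independent type I extreme value draws with cdf exp(-exp(-e)), generated by inverting the cdf. The seed, sample size and evaluation points are arbitrary choices made for the example.

```python
import math
import random

def draw_extreme_value(rng):
    """Draw from the type I extreme value cdf exp(-exp(-e)) by inversion."""
    u = rng.random()
    while u == 0.0:              # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)           # fixed seed so the run is reproducible
n = 100_000
diffs = [draw_extreme_value(rng) - draw_extreme_value(rng) for _ in range(n)]

# Compare the empirical cdf of e = e1 - e0 with the logistic cdf.
for point in (-2.0, -0.5, 0.0, 1.0, 2.5):
    empirical = sum(d <= point for d in diffs) / n
    print(point, round(empirical, 3), round(logistic_cdf(point), 3))
```

The empirical cdf of the difference should track the logistic cdf up to simulation noise.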

TESTING ASSUMPTIONS

Testing assumptions

•  Test for the significance of a coefficient: z stat.

•  Linear and non-linear constraints: Likelihood ratio, Lagrange multiplier, Wald statistic.
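As a reminder of the form these tests take, here is the likelihood ratio statistic in its standard textbook form. The notation is introduced here and is not from the slides: L_U and L_R are the maximized likelihoods of the unrestricted and restricted models, and q is the number of constraints.

```latex
\[
LR = 2\left(\ln L_U - \ln L_R\right) \;\xrightarrow{d}\; \chi^2(q)
\quad \text{under } H_0 .
\]
```

The restricted model can never fit better than the unrestricted one, so LR is nonnegative, and large values lead to rejecting the constraints.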

Goodness of fit

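•  McFadden’s Pseudo R2:

\[ \text{Pseudo } R^2 = 1 - \frac{\ln L_1}{\ln L_0} \]

where L1 is the likelihood of the model and L0 is the likelihood of the model with only a constant.

•  Reported by Stata.

•  Equals 0 if the log likelihood of the model is equal to the log likelihood of the model with only a constant.

•  Equals 1 when ln L1 equals 0, i.e. when L1 equals 1, the model perfectly predicts outcomes.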
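A short Python sketch of the two boundary cases. It uses the fact that the constant-only model fits the sample mean of y as the probability for every observation. The outcomes and the fitted log likelihood of -4.0 are hypothetical numbers.

```python
import math

def log_likelihood_constant_only(ys):
    """ln L0: with only a constant, the fitted probability is the sample mean of y."""
    n = len(ys)
    p = sum(ys) / n
    return sum(y * math.log(p) + (1 - y) * math.log(1.0 - p) for y in ys)

def pseudo_r2(ln_l1, ln_l0):
    """McFadden's pseudo R2 = 1 - ln L1 / ln L0."""
    return 1.0 - ln_l1 / ln_l0

ys = [0, 0, 1, 0, 1, 0, 1, 1]             # hypothetical outcomes
ln_l0 = log_likelihood_constant_only(ys)

print(pseudo_r2(ln_l0, ln_l0))            # model no better than a constant
print(pseudo_r2(0.0, ln_l0))              # perfect prediction, ln L1 = 0
print(pseudo_r2(-4.0, ln_l0))             # hypothetical fitted ln L1
```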

FURTHER REMARKS

Further remarks

•  Endogeneity: the residual of the latent equation is assumed to be independent of the covariates, so any correlation between the two is an issue. Same reasoning as for OLS.

•  Direction and magnitude of biases are more complicated, but use the reasoning of Econometrics A as a heuristic.

Further remarks

•  Measurement error can also bias estimates, although there is no corresponding theorem on attenuation bias. The direction and magnitude of the bias are unknown.

•  Fixed effects typically cannot be consistently estimated when T is fixed and N goes to infinity, because no within or first-difference transformation leads to a specification without the fixed effects.
