Binary dependent variable: Probit and Logit
Amine Ouazad
Ass. Prof. of Economics
Outline
1. Problemo
2. Probit/Logit Framework
3. Structural interpretation
4. Interpreting results
5. Testing assumptions
6. Further remarks
PROBLEMO: OLS WITH A BINARY DEPENDENT VARIABLE
Problemos
• Consider estimating the probability of smoking with y = x’b + e, where y ∈ {0,1} and x is a set of covariates.
• We know that OLS is consistent, asymptotically normal, and unbiased.
• However:
• The predictions can be outside [0,1].
• E(y|x) =x’b is the probability of smoking given the characteristics of the individual, E(y|x) = P(y=1|x).
• The residuals are not normal for a finite sample.
• Conditional on x, the residual takes one of two values e = 1 - x’b or e = - x’b.
• Under A6, the residuals would be N(0, s^2), but that is not possible given y ∈ {0,1}.
• The residuals are heteroskedastic.
• Since y is binary, Var(y|x) = x’b(1-x’b), which varies with x.
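The two-valued residual and the heteroskedasticity claim can be checked in a few lines. The derivation below is a sketch that writes p for P(y=1|x) = x’b; the symbol p is introduced here only as shorthand.

```latex
\begin{aligned}
p &= P(y = 1 \mid x) = x'b, \\
e &= \begin{cases} 1 - p & \text{with probability } p, \\ -p & \text{with probability } 1 - p, \end{cases} \\
E(e \mid x) &= p\,(1 - p) + (1 - p)\,(-p) = 0, \\
\mathrm{Var}(e \mid x) &= p\,(1 - p)^2 + (1 - p)\,p^2 = p\,(1 - p) = x'b\,(1 - x'b).
\end{aligned}
```

The variance depends on x through x’b, so it cannot be constant across observations unless all slope coefficients are zero.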
Problemo 1: Predictions
• In-sample predictions may be outside [0,1]
• Smoking = a + b Age + c Income + e
• regress smoking age income
• predict smoking_predicted, xb
• sum smoking_predicted
• Out-of-sample predictions may be outside [0,1]
• Smoking = a + b Age + c Income + e
• regress smoking age income
• use another_dataset.dta, clear
• predict smoking_predicted, xb
• sum smoking_predicted
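The same point can be illustrated outside Stata. The following is a minimal Python sketch on made-up data (the ten observations and the single covariate are hypothetical, not from the lecture): it fits a linear probability model by OLS and inspects the range of the fitted values.

```python
# Hypothetical toy data: one covariate (say, an age-group index) and a smoking dummy.
x = list(range(10))
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Simple OLS of y on a constant and x (closed-form slope and intercept).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted "probabilities" x'b for every observation in the sample.
fitted = [intercept + slope * xi for xi in x]
print("min fitted:", min(fitted), "max fitted:", max(fitted))
```

On this toy sample the smallest fitted value is negative and the largest exceeds 1, even though every observed y is 0 or 1.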
Problemo 2: Normality of the residuals
• Recap: Normality of the residuals is needed for the validity of confidence intervals, test statistics, when “far from the asymptotics” (i.e. small sample size).
• But a normally distributed residual cannot produce y ∈ {0,1}.
• Hence, in principle, confidence intervals and test statistics are incorrect if using an OLS regression.
• regress smoking age income
• predict resid, resid
• hist resid
• See next page.
• Non normal residuals. Conditional on x, e can take only two values.
• Considering that x has a distribution, that can give a double-peaked distribution for the residuals such as this one. A6 is obviously violated.
Test of normality
• Using the third and fourth moments of the residuals. If they are normal, the kurtosis should be 3 and the skewness should be 0.
• regress y x
• predict epsilon, resid
• sum epsilon, detail
• hist epsilon
• sktest epsilon
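To see what the skewness and kurtosis look like for a two-valued residual, here is a small Python sketch. It only computes the sample moments; it does not reproduce the test statistic that sktest reports. The residual vector is hypothetical: it mimics a case where x’b = 0.5 for every observation, so e is either -0.5 or 0.5.

```python
def skewness_kurtosis(values):
    """Return (skewness, kurtosis) from central sample moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical residuals that take only two values, as with a binary y.
resid = [-0.5] * 50 + [0.5] * 50
skew, kurt = skewness_kurtosis(resid)
print("skewness:", skew, "kurtosis:", kurt)
```

Here the skewness is 0 because the two values are equally likely, but the kurtosis is 1 rather than the 3 expected under normality.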
PROBIT/LOGIT FRAMEWORK
Individuals’ preferences
• The intuition is that individuals are making a discrete choice.
• y* = U(smoking)-U(not smoking) = x’b + e
• The difference in utilities is the benefit-cost analysis.
• The costs and benefits are unobserved, but the choice is ultimately observed.
• The net benefit is a continuous variable, so e can be normally distributed.
• Then P(y=1) = P(smoking) = P(y*>0) = P(x’b+e>0) = P(e>-x’b) = F(x’b)
• With a symmetric distribution P(e>-x’b) = P(e<x’b).
• And P(y=0) = 1 - F(x’b).
• F is the cdf of the residual.
Probit vs Logit
• Probit: the residual is normally distributed, with variance 1.
• Variance is fixed, more on this later.
• F(x) is the standard normal cdf, i.e. the integral of the standard normal density up to x.
• Logit: the residual has a logistic distribution.
• F(x) = e^x/(1+e^x).
• Choice of one versus the other makes little practical difference (and should make no practical difference, otherwise your model is not robust).
• Difficulty in the probit model is that the cdf has no closed form expression.
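The two link functions can be compared numerically. The Python sketch below evaluates the normal cdf through the error function (there is no closed form, but math.erf is available) and the logistic cdf directly. The rescaling constant 1.702 is not from the lecture; it is a commonly quoted value for lining up the two curves and is used here as an assumption.

```python
import math

def probit_cdf(z):
    # Standard normal cdf written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    # Logistic cdf: e^z / (1 + e^z), written in an equivalent form.
    return 1.0 / (1.0 + math.exp(-z))

# Compare the two curves on a grid after rescaling the logit index.
SCALE = 1.702  # assumed rescaling constant, not from the source
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(probit_cdf(z) - logit_cdf(SCALE * z)) for z in grid)
print("largest gap between the two cdfs on the grid:", max_gap)
```

Once the index is rescaled, the two cdfs differ by less than one percentage point everywhere on the grid, which is one way to see why the choice rarely matters for predicted probabilities. The raw coefficients still differ by roughly that scale factor.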
Likelihood of the model
• We observe the choices yi , i=1,2,...,N, and the characteristics xi.
• The likelihood of an observation yi,xi is:
• L(yi,xi;b) = P(yi=1|xi) if yi = 1
• L(yi,xi;b) = P(yi=0|xi) if yi = 0
• Combine: L = P(yi=1|xi)^yi P(yi=0|xi)^(1-yi)
• In logs, log L = yi log P(yi=1|xi) + (1-yi) log P(yi=0|xi)
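As an illustration of how this log-likelihood is maximized, here is a minimal Python sketch for a logit with a constant and one covariate, using Newton-Raphson steps. The data, the function names, and the fixed number of iterations are all hypothetical choices made for the example; Stata's own optimizer is not reproduced here.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b, xs, ys):
    # Sum over i of y_i log P(y_i=1|x_i) + (1 - y_i) log P(y_i=0|x_i).
    a, s = b
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic_cdf(a + s * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def score(b, xs, ys):
    # Gradient of the log-likelihood with respect to (constant, slope).
    a, s = b
    g0 = sum(y - logistic_cdf(a + s * x) for x, y in zip(xs, ys))
    g1 = sum((y - logistic_cdf(a + s * x)) * x for x, y in zip(xs, ys))
    return g0, g1

def fit_logit(xs, ys, steps=25):
    a, s = 0.0, 0.0
    for _ in range(steps):
        g0, g1 = score((a, s), xs, ys)
        # Negative Hessian: sums of p(1-p) times 1, x, and x^2.
        h00 = h01 = h11 = 0.0
        for x in xs:
            p = logistic_cdf(a + s * x)
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + (negative Hessian)^{-1} * score.
        a += (h11 * g0 - h01 * g1) / det
        s += (h00 * g1 - h01 * g0) / det
    return a, s

# Hypothetical data: the outcome becomes more likely as x grows.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b_hat = fit_logit(xs, ys)
print("estimates:", b_hat, "log L:", log_likelihood(b_hat, xs, ys))
```

At the estimate the score is numerically zero and the log-likelihood is higher than at the starting point b = (0, 0).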
Identification and estimation
• See Greene. The log-likelihood function is globally concave, so it has a single global maximum.
INTERPRETING RESULTS
Variance is not identified
• The likelihood was maximized over the coefficient vector b only, because we fixed the value of the variance.
• The variance has to be fixed, by convention to 1 for probit and to π^2/3 for logit.
• Indeed, consider a model where the variance of the residual is 4 (standard deviation 2) and the coefficients are multiplied by 2.
• The model generates the same probability of smoking as the original model.
• This also tells us that the absolute values of the coefficients have little interpretation. Only the odds ratios and the marginal effects have an interpretation.
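The scaling argument can be verified numerically. In the Python sketch below, prob_smoking is a hypothetical helper that returns P(y=1|x) for a normal residual with standard deviation sigma; the index values are arbitrary.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_smoking(xb, sigma):
    # With e ~ N(0, sigma^2): P(y=1|x) = P(e > -x'b) = Phi(x'b / sigma).
    return normal_cdf(xb / sigma)

# Arbitrary values of the index x'b under the original model.
xb_values = [-1.5, -0.2, 0.0, 0.7, 2.0]

# Original model: variance 1. Rescaled model: coefficients doubled, variance 4.
original = [prob_smoking(xb, 1.0) for xb in xb_values]
rescaled = [prob_smoking(2.0 * xb, 2.0) for xb in xb_values]

for p0, p1 in zip(original, rescaled):
    print(p0, p1)
```

Both models imply identical probabilities for every observation, so the data cannot distinguish them and the likelihood cannot pin down the scale.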
Identification
• By the same reasoning, the value of a coefficient is not identified.
• The sign of a coefficient is identified.
• The ratio of two coefficients is identified.
• Implications for the interpretation of the logit coefficients?
Marginal effects
• The marginal effect of a covariate x on the probability that y=1 is easiest to read.
• It measures the effect of a marginal increase in x on the probability of y=1.
• This marginal effect is identified. It does not depend on the particular scaling of the coefficients.
• In Stata it is computed by mfx after performing the logit/probit regression.
Marginal Effects
• Formula: $\frac{\partial P(y=1 \mid x)}{\partial x_k} = \beta_k f(x'\beta)$, where f is the density of the residual (the derivative of F).
• Marginal effects do not depend on a particular scaling of β.
• Their value depends on the point at which the marginal effect is taken (unlike OLS under A1, the model is nonlinear).
• Either the marginal effect is evaluated at the mean of the covariates, or the marginal effects are averaged over the observations.
• By default, mfx calculates the marginal effects or elasticities at the means of the independent variables.
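The difference between the marginal effect at the mean and the average marginal effect can be sketched in Python for a logit, where the density is f(z) = F(z)(1 - F(z)). The coefficients b0 and b1 and the covariate values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_pdf(z):
    # Density of the logistic distribution: F(z) * (1 - F(z)).
    p = logit_cdf(z)
    return p * (1.0 - p)

# Hypothetical coefficients (constant, slope) and covariate values.
b0, b1 = -2.0, 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x_mean = sum(xs) / len(xs)

# Marginal effect evaluated at the mean of the covariate.
me_at_mean = b1 * logit_pdf(b0 + b1 * x_mean)

# Average of the marginal effects over the observations.
avg_me = sum(b1 * logit_pdf(b0 + b1 * x) for x in xs) / len(xs)

# Finite-difference check of the formula at the mean.
h = 1e-6
numeric = (logit_cdf(b0 + b1 * (x_mean + h)) - logit_cdf(b0 + b1 * (x_mean - h))) / (2 * h)

print("at the mean:", me_at_mean, "average:", avg_me, "numeric:", numeric)
```

In this example the index is 0 at the mean of x, where the logistic density peaks, so the marginal effect at the mean is larger than the average marginal effect. The two summaries answer slightly different questions and need not agree.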
STRUCTURAL INTERPRETATION: RANDOM UTILITY MODEL
Structural interpretation (1/2)
• y1* = U(smoking) = x’b1 + e1
• y0* = U(not smoking) = x’b0 + e0
• take the difference: y* = y1* - y0* = x’(b1-b0) + e1-e0
• write b = b1-b0 , and e = e1-e0.
• The coefficient is the impact of the covariate on the relative preference for smoking.
Structural Interpretation (2/2)
• If e1 and e0 are normally distributed, then e is normally distributed. The estimation of the model is done via probit.
• If e1 and e0 are independent and extreme-value distributed, then e is logistically distributed. The estimation of the model is done via logit.
• Extreme value distribution (type I): F(e) = exp(-exp(-e)).
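The link between extreme-value shocks and the logit can be checked by simulation. The Python sketch below assumes the type I extreme value cdf exp(-exp(-e)) and independent draws for e1 and e0; the seed, the sample size, and the helper names are arbitrary choices for the illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def draw_extreme_value():
    # If E ~ Exponential(1), then -log(E) has cdf exp(-exp(-e)).
    return -math.log(random.expovariate(1.0))

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

# Simulate e = e1 - e0 with independent extreme-value shocks.
n = 100_000
diffs = [draw_extreme_value() - draw_extreme_value() for _ in range(n)]

def empirical_cdf(t):
    return sum(d <= t for d in diffs) / n

for t in (-1.0, 0.0, 1.5):
    print(t, empirical_cdf(t), logistic_cdf(t))
```

The empirical cdf of the simulated difference tracks the logistic cdf closely at each point, up to simulation noise.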
TESTING ASSUMPTIONS
Testing assumptions
• Test for the significance of a coefficient: z stat.
• Linear and non-linear constraints: Likelihood ratio, Lagrange multiplier, Wald statistic.
Goodness of fit
• McFadden’s Pseudo R2: $\text{Pseudo } R^2 = 1 - \frac{\ln L_1}{\ln L_0}$, where L1 is the likelihood of the model and L0 is the likelihood of the model with only a constant.
• Reported by Stata.
• Equals 0 if the log likelihood of the model is equal to the log likelihood of the model with only a constant.
• Equals 1 when ln L equals 0, i.e. when L equals 1, the model perfectly predicts outcomes.
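The two boundary cases of the pseudo R2 can be reproduced with a short Python sketch. The outcome vector is hypothetical, and constant_only_loglik uses the fact that a constant-only model predicts the sample share of ones for every observation.

```python
import math

def constant_only_loglik(ys):
    # With only a constant, the fitted probability is the sample mean of y.
    n = len(ys)
    p = sum(ys) / n
    return sum(y * math.log(p) + (1 - y) * math.log(1.0 - p) for y in ys)

def pseudo_r2(loglik_model, loglik_constant):
    # McFadden's pseudo R2: 1 - ln L1 / ln L0.
    return 1.0 - loglik_model / loglik_constant

# Hypothetical binary outcomes with half of the sample smoking.
ys = [0, 0, 1, 0, 1, 0, 1, 1]
lnL0 = constant_only_loglik(ys)

print("no improvement over the constant:", pseudo_r2(lnL0, lnL0))
print("perfect prediction (ln L = 0):", pseudo_r2(0.0, lnL0))
print("an intermediate fit (ln L = -4):", pseudo_r2(-4.0, lnL0))
```

The first two lines print 0 and 1, matching the boundary cases described above; any model that improves on the constant without predicting perfectly falls strictly in between.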
FURTHER REMARKS
Further remarks
• Endogeneity: since the mean of the residual of the latent equation is assumed to be independent from the covariates, any correlation is an issue. Same reasoning as for OLS.
• Direction and magnitude of biases are more complicated, but use the reasonings of econometrics A as a heuristic.
Further remarks
• Measurement error can also bias estimates, although there is no corresponding theorem on attenuation bias. The direction and magnitude of the bias are unknown.
• Fixed effects typically cannot be consistently estimated when T is fixed and N goes to infinity, because there is no within or first-difference transformation that leads to a specification without the fixed effects.