1
Qualitative
Response
Regression Models
Regression models when the
regressand, the response
variable is qualitative in nature
2
The Regression Analysis
The Regression Analysis deals with prediction of
the Mean value of the Dependent variable by
using information of Independent variables.
Nature of the dependent variable plays very
important role in the regression analysis.
Major types of the dependent variable
encountered in the regression analysis are
Quantitative and Qualitative types.
Estimation framework differ for both type of the
dependent variables.
3
Types of Qualitative
Dependent Variable
Following major types of Qualitative
Variables are met in practice:
Binary Variable
Categorical without Order (Nominal )
Categorical with Order (Ordinal)
4
Regression with Qualitative
Dependent Variable
Qualitative
Dependent
Binary
Dependent
Categorical
Dependent
Ordinal
Dependent
Quantitative
Independent
Qualitative
Independent
Logistic
Regression
Probit
Regression
Log–Linear
Models
Quantitative
Independent
Qualitative
Independent
Multinomial
Logistic
Discriminatory
Analysis
Log – Linear
Models
Quantitative
Independent
Qualitative
Independent
Ordinal
Logistic
Log – Linear
Models
5
Meaning of response function
when dependent variable is
binary
6
Consider SLR model Y=βo+ β1Xi +Ui
E(Yi)=βo+ β1Xi As E(Ui)=0 In case of 0,1 value, Yi is a Bernoulli random variable with probability distribution
Yi Probability
1 Pi
0 1- Pi
By the definition of expected value of a random variable
E(Yi)=1(Pi)+0(1- Pi)= Pi
• The mean response is interpreted as the probability of
success i.e Yi=1, when the regressor variable takes on the
values Xi.
• Linear regression model with a dependent variable that is
either 0 or 1 is called the Linear Probability Model, or LPM.
•The LPM predicts the probability of an event occurring, and,
like other linear models, says that the effects of X’s on the
probabilities are linear.
7
Meaning of Expected value of
dependent binary variable
When the dependent variable is a
Bernoulli variable (i.e. the value is either 0
or 1), then the linear regression model
actually predict the probability for 1 to
happen. Therefore, we also call this model
a “linear probability model”.
8
Difference Between Quantitative And
Qualitative Response Regression Models
In a model where the response, Y, is quantitative, our
objective is to estimate it’s expected, or mean, value
given the values of the regressors.
In models where the response, Y, is qualitative, our
objective is to find the probability of something
happening, such as voting for a Democratic
candidate, or owning a house, or belonging to a
union, or participating in a sport etc. Hence,
qualitative response regression models are often
known as probability models.
9
Approaches to developing a
probability model for a binary
response variable
The Linear Probability Model (LPM)
The Logit Model
The Probit Model
10
Problems with LPM
Non-normality of disturbance term
Non-constant variances
Constraints on the response function
Non-linearity
11
Non-normality of disturbance term
For a binary dependent variable, Like Y,
U also takes on two values as
Ui= Yi-βo- β1Xi
When Yi =1 Ui= 1-βo- β1Xi
Yi =0 Ui= 0-βo- β1Xi
Obviously U cannot be assumed to be
normally distributed; actually it fallows the
binomial distribution
12
Non-constant variances
The probability distribution of U is
Var(U)=E[U-EU]2=EUi
2
(1-βo- β1Xi)2P+(-βo- β1Xi)
2(1-P)
= (1-βo- β1Xi)2(βo+ β1Xi)+ (-βo- β1Xi)
2(1-βo- β1Xi)
=(βo+ β1Xi)(1-βo- β1Xi)
So var(U) depends on X. Hence the variances of
disturbances will differ at different levels of X and OLS will
no longer be optimal (Estimators are although unbiased,
but they do not have minimum variance i.e not BLUE)
U Probability
1-βo- β1Xi Pi
-βo- β1Xi 1- Pi
13
Constraints on the response
function
Since the response function represents
probabilities when the dependent variable is
a binary variable and hence 0<E(Yi)<1
A linear response function may fall outside the
constraint limits within the range of the
independent variable in the scope of the
model.
14
Non-linearity
Even if predicted probabilities did not take on impossible
values, the assumption that there will be a straight linear
relationship between the IV and the DV is also very
questionable when the DV is a dichotomy.
For example, suppose that the DV is home ownership, and one
of the IVs is income. According to the LPM, an increase in
wealth of $50,000 will have the same effect on ownership
regardless of whether the family starts with 0 wealth or wealth
of $1 million. Certainly, a family with $50,000 is more likely to
own a home than one with $0. But, a millionaire is very likely to
own a home, and the addition of $50,000 is not going to
increase the likelihood of home ownership very much.
15
Structural defects of LPM
The LPM predicts Π < 0 and Π > 1 for sufficiently large orsmall x values
The LPM assumes that Π(X) increases linearly with X,that is, the marginal or incremental effect of X remainsconstant throughout
The variability is not constant, but rather depends on Xthrough its influence on Π. Thus the ordinary leastsquares estimators are not BLUE.
Y, being binary, is very far from normally distributed
16
Solution for the problems
Increase the sample size, so that
Binomial distribution tends to Normal
distribution
17
Constant Variance
Since the variance of U depends on the expected value of Y i.e variance are
heteroscadastic. One way of handling the problem of non-constant
variances of disturbances is through the use of weighted least squares.
That is, through appropriate weighting schemes, errors can be made
homoskedastic
To apply weighted least square transform the data by dividing both sides of the model by
iii WYEYE ))(1)((
The transformed model is W
U
W
X
WW
Y i
i
O
ii
i 1
1
18
Problem with Weighted least square
But Wi’s are unknown as it involve EYi which contains
Unknown parameters. So estimate Wi by using two
stage least square procedure
Stage-I: Fit the regression model by OLS and
estimate the parameters.
Stage-II:-Estimate the Weights by using estimated
parameters in stage-I and then transformed the
original model and then estimate the transformed
model by OLS
NOTE;- the application of OLS on transformed model
is called Weighted least square (WLS).
19
Solution for constraints on the
response function
There are two ways of handling the problem about the
constraints on the response function.
1. Estimate the model by usual OLS and find out whether
the Yi^ lie between 0 and 1. If some are less then 0, Y^
is assumed to be zero for those cases, if they are
greater then 1, they are assumed to be 1.
2. Use an estimating technique that will guarantee that the
estimated Y will lie between 0 and 1.
The LOGIT and PROBIT models will guarantee that the
estimated probabilities will indeed lie between the
logical limit 0 and 1.
20
We need a model that has two features:
AS Xi increases, Pi=E(Y=1IX) also increases but never steps outside the 0-1 interval
The relationship between Pi and Xi is nonlinear, i.e. one approaches zero at slower and slower rates as xi gets small and approaches one at slower and slower rates as Xi gets very large.
The LOGIT and PROBIT models will guarantee that the estimated probabilities will indeed lie between the logical limit 0 and 1.
Solution for constraints on
the response function
21
Logistic RegressionBoth theoretical and empirical consideration suggests that when the
dependent variable is an indicator variable the shape of the response function will frequently curvilinear. The following figure contains a curvilinear response function which has been found appropriate in many instances involving a binary dependent variable
22
Note that this response function is shaped like a tilted S and that it has asymptotes
at 0 and 1. The latter features assumes that the constraints on E(Y) i.e 0<E(Y) <1
are automatically met. The response function in the above figure is called the
logistic function and is given by
cumulative logistic distribution function
X
X
iio
o
e
eYEP
1
1
1)(
Here Pi is not only non-linear in X but in B,s as well , so we cannot use the usual OLS
method to estimate the parameters which require parameters in linear form. The non-
linear of parameters in logistic function can be solved easily because logistic function is
intrinsically linear the linearization of the logistic function is as follows
23
o 1
o 1 o 1
o 1
β β X
i β β X β β X
β β Xi
i
io 1
i
e 11 P 1
1 e 1+e
Pe odd-Ratio
1 P
PLi ln β β X log of Odd-Ratio also called logit
1 P
Interpretation of slope coefficientOne unit increase in X will result in a β1 increase in the log odds or in terms of
odd ratio one unit increase in X will result in exp(β1) increase in Odds.
24
Advantage of log odds (Logit)• Although probabilities can range from 0 to 1, log odds can range from -∞
to +∞.
•Although Logits are linear in X , the probabilities themselves are not, in
contrasts to the LPM where probabilities increase linearly with X.
•Logit is a linear function of unknown parameters so we can apply
standard theory of OLS to estimate parameters in the logit model in usual
way.
• Log odds follow an S-shaped curve. At the extremes, changes in the log
odds produce very little change in the probabilities. In the middle of the S
curve, changes in the log odds produce much larger changes in the
probabilities.
• To put it another way, linear, additive increases in the log odds produce
nonlinear changes in the probabilities.
NOTE:- We can add as many regressors as may be dictated by the
underlying theory.
25
0
P1
X + ∞- ∞
Comparison of LPM and
Logistic Regression
26
Features of Logit Model
As Π goes from 0 to 1 (i.e., as η varies from -∞ to +∞), the logit Lgoes from -∞ to +∞. That is, although the probabilities (ofnecessity) lie between 0 and 1, the logits are not so bound.
Whereas the LPM assumes that Π(X) is linearly related to X, thelogit model assumes that the log odds is linearly related to X.
Although L is linear in X, the probabilities themselves are not. Thisproperty is in contrast with the LPM where the probabilitiesincrease linearly with X.
If L, the logit, is positive, it means that when the value of theregressor(s) increases, the odds that the regressand equals 1(meaning some event of interest happens) increases. If L, thelogit, is negative, the odds that the regressand equals 1decreases as the value of X increases.
27
Estimation of the Logit model
Data at the individual level :
If we have data at the individuakl level then we can not estimate the Logit model by
standard OLS metod As if Y=1 then
0
1ln and if Y=0 then
1
0ln are both meaningless. In
such situation we can estimate parameters by MLE
Grouped or Replicated Data:
If replicated data is available i.e corresponding to each level of Xi, ni observation out of
which Ri (Ri<ni) belong to category in which Y=1 is available then we can compute i
ii
n
RP
which is the relative frequency, we can use it as an estimate of the true Pi corresponding to each
Xi. If ni is fairly large Pi^ will be reasonably good estimate of Pi
It can be shown that if ni is fairly large and if each observation is distributed independently as a
binomial variable then
)1(
1
i
iPniPi
NU
28
So the disturbance term in the logit model is heteroscedastic. Thus we have to use Weighted least
square instead of OLS with weights )1( PiniPiWi
Weighted Least Square:
Transformed the original model as UiWiXiWiWiLiWi o 1
Apply Least square to the transformed model ( Called WLS)
As the transformed disturbance term is homoscedastic so application of Least square method to
the transformed model will yield BLUE estimators.
29
Example: Logistic regression
In a study of the effectiveness of coupons offering a price reduction on a
given product, 1,000 homes were selected and a coupon and advertising
material for the product were mailed to each. The coupons offered different
price reduction (5, 10, 15, 20, and 30 cents), and 200 homes were
assigned at random to each of the price reduction categories. The
independent variable X in this study is the amount of price reduction, and
the dependent variable Y is a binary variable indicating whether or not the
coupon was redeemed with in a six-month period. It was expected that the
logistic response function would be an appropriate description of the
relation between price reduction and probability that the coupon is utilized.
Xi ni Ri
5 200 32
10 200 51
15 200 70
20 200 103
30 200 148
30
W LW ˆˆ W XiW
26.88 -8.597 5.1846 25.923
37.995 -6.609 6.164 61.64
45.5 -4.176 6.7454 101.18
49.955 0.4242 7.0679 141.36
38.48 6.4884 6.2032 186.1
Logit VS X
-2
-1.5
-1
-0.5
0
0.5
1
1.5
5 10 15 20 30
Price
Lo
gits
31
i
*
ii
*
i
**
1i
*
i
WXiWLLWhere
WL
i
iio
Xand
X
Estimate the above multiple linear regression equation by OLS (weighted least square) by
considering two independent variables iW and *
iX with regression through origin
LogitsUnWeightedXi0.10872.18506L
LogitsWeightedWXi0.1087W2.18506L
i
ii
*
i
32
LogitsUnWeightedXi0.10872.18506L
LogitsWeightedWXi0.1087W2.18506WL
i
iiii
Direction of relationship:
The direction of relationship (positive or negative) reflects the changes in
the dependent variable associated with changes in the independent
variable. A positive relationship means that an increase in the independent
variable is associated with increase in the predicted probability and vice
versa for a negative relationship
Interpreting the Direction of ORIGINAL COEFFICIENTS:
The sign of the original coefficient (+ve or –ve) indicates the direction of the
relationship. A positive coefficient increases the probability , where as a negative value
decreases the predicted probability In above example sign of b1 (0.1087) is positive
which indicates that increase in price reduction will increase its utility
Interpreting the Direction of EXPONENTIATED COEFFICIENTS:
An exponentiated coefficient above 1 reflect positive relationship
and value less than 1 represent negative relationship. In above
example is greater than 1 so increase in price
reduction will increase its probability of being use.
1be
11.11087.01 eeb
33
Xi0.10872.18506Li
Interpreting the magnitude of relationship
The odds ratio for an independent variable represents the change in the odds for
a one unit change in the independent variable holding all the other independent
variables constant.
Estimated odd-ratio for X i.e Exponentiated coefficient (1.11) indicates that with
the increase of 1 $ in price reduction chance of using the coupons is 1.11 times
higher than not use.
Interpreting the magnitude of relationship
Suppose we want to compare the odds of using a coupon of price reduction 10
to the odds of using a coupon of price reduction 30.
8.8)1087.0(20120 ee b
The result indicates that estimated odds of using a coupon with price
reduction 30 is 8.8 times greater than the estimated odds of using a
coupon with price reduction 10 i.e estimated odds ratio for increase of
$20 in price reduction is 8.8.
34
Percentage Change in Odd Ratio
Odds at X Xbboe 1
Odds at (X+1) 11111 )1( bXbbbXbbXbb
eeee ooo
Change in odds due to one unit change in X
111
111
bXbb
XbbbXbb
ee
eee
o
oo
Percentage change in odds due to one unit change in X
=
1001100
11
1
11
b
Xbb
bXbb
ee
eeo
o
35
Percentage Change in Odd Ratio
For above example, Percentage change in odds due to one unit change in X
%48.1110011087.0 e
One $ increase in price reduction will increase the odds by 11.48% i.e chances of coupon use
will increase by 11.48% with one $ increase in price reduction
36
Xi0.10872.18506Li
The estimated probability of a coupon redemption when the price reduction is 25 cents is given by
63.011 53244.0
53244.0
)25(1087.018506.2
)25(1087.018506.2
e
e
e
ePi
Hence we estimate that about 63% of 25-cents coupons will be redeemed with in six month
37
Output from MINITABPredictor Coef SE Z P Ratio Lower Upper
Constant -2.18551 0.164667 -13.27 0.000
X 0.108719 0.0088429 12.29 0.000 1.11 1.10 1.13
X n P^
5 200 0.162205
10 200 0.250056
15 200 0.364770
20 200 0.497219
30 200 0.745749
38
The following table gives change in probability at different price levels
Xi ni Ri
Probability
)(1087.018506.2
)(1087.018506.2
1ˆ
i
i
X
X
ie
eP
Change in Probability
)ˆ1(ˆ1 ii PPb
5 200 32 0.16221 0.01477
10 200 51 0.25006 0.02039
15 200 70 0.36477 0.02519
20 200 103 0.49722 0.02718
30 200 148 0.74575 0.02061
30252015105
0.0275
0.0250
0.0225
0.0200
0.0175
0.0150
Price Reduction
Ch
an
ge
in
Pro
ba
bilit
y
Change in probability VS price reduction
39
Logit model for
ungrouped or individual dataExample:
Let Y=1 if a student’s final grade in an intermediate microeconomics course was A and
Y=0 if the final grade was a B or C. (not A)
Spector and Mazzeo used grade point average (GPA), TUCE, and Personalized system of instruction (PSI) as the grade predictors. Fit an Logit model to this data set.
Where
TUCE= score on an examination given at the beginning of the term to test entering knowledge of macroeconomics
PSI=1 if the new teaching method is used
=0 otherwise
40
GPA
grade
TUCE
grade
PSI Grade Letter
grade
GPA
grade
TUCE
grade
PSI Grade Letter
grade
1 2.66 20 0 0 C 17 2.75 25 0 0 C
2 2.89 22 0 0 B 18 2.83 19 0 0 C
3 3.28 24 0 0 B 19 3.12 23 1 0 B
4 2.92 12 0 0 B 20 3.16 25 1 1 A
5 4.00 21 0 1 A 21 2.06 22 1 0 C
6 2.86 17 0 0 B 22 3.62 28 1 1 A
7 2.76 17 0 0 B 23 2.89 14 1 0 C
8 2.87 21 0 0 B 24 3.51 26 1 0 B
9 3.03 25 0 0 C 25 3.54 24 1 1 A
10 3.92 29 0 1 A 26 2.83 27 1 1 A
11 2.63 20 0 0 C 27 3.39 17 1 1 A
12 3.32 23 0 0 B 28 2.67 24 1 0 B
13 3.57 23 0 0 B 29 3.65 21 1 1 A
14 3.26 25 0 1 A 30 4.00 23 1 1 A
15 3.53 26 0 0 B 31 3.10 21 1 0 C
16 2.74 19 0 0 B 32 2.39 19 1 1 A
41
42
The logit model here can be written as
1 2 3 41
ii i i i i
i
PL GPA TUCE PSI u
P
Odds 95% CI
Predictor Coef SE Z P Ratio Lower Upper
Constant -13.0213 4.93100 -2.64 0.008
GPA 2.82611 1.26289 2.24 0.025 16.88 1.42 200.61
TUCE 0.0951577 0.141550 0.67 0.501 1.10 0.83 1.45
PSI=1 2.37869 1.06452 2.23 0.025 10.79 1.34 86.93
PSITUCEGPALi 3787.209516.082611.20213.13ˆ
By using MINITAB software
43
Interpretation of regression coefficient
For quantitative regressors, slope coefficient is a partial slope coefficient that measures the change in the estimated logit for a unit change in the value of given quantitative regressor keeping the effect of other regressors constant.
Coefficient of GPA
With other variables held constant one unit increase in GPA will increase on the average the estimated logit by 2.83 times
In terms of odd ratio, keeping the effect of other variables constant with one unit increase in GPA there is 17 times likely to get A-grade
8261.2e
PSITUCEGPALi 3787.209516.082611.20213.13ˆ
44
Interpretation of regression coefficient
Coefficient of TUCE
In terms of odd ratio, keeping the effect of other variablesconstant with one unit increase in TUCE there is 1.10 timesl likely to get A-grade.
Coefficient of PSI
In terms of odd ratio, keeping the effect of other variablesconstant, students who are exposed to the new method ofteaching are more than 10 times likely to get A-grade as compare to those students who are not exposed toit
09516.0e
ˆ 13.0213 2.82611GPA 0.09516 TUCE 2.3787 PSIiL
2.3787e
45
Probit Regression Models
Probit is a variant of logit modeling based ondifferent data assumptions.
The term "probit' was coined in the 1930's byChester Bliss and stands for probability unit.These two analyses, logit and probit, are verysimilar to one another.
46
Similarities and differences
between Logit and Probit models
Neither the logit model nor the probit model are linear,which makes things difficult.
To make the model linear, a transformation is done onthe dependent variable.
Both methods use maximum likelihood, and so requiremore cases than a similar OLS model. Unlike logitmodels, we don't get odds ratios with probit models.
Probit regression is an alternative log-linear approach tohandling categorical dependent variables.
47
48
49
The probit model is defined as
Pr(y=1|x) = Φ(xb)
where Φ is the standard cumulative normalprobability distribution and xb is called the probitscore or index.
Interpretation
Since xb has a normal distribution so itsinterpretation can be made as, the probit coefficient,b, is that a one-unit increase in the predictor leads toincreasing the probit score by b standard deviations
Here ‘xb’ is representing a multiplication of twomatrices, where x is a matrix of x’s and b is a matrixof β’s that is
x = [1,x1,x2,x3,…,xn] and b = [β1,β2,β3,…,βn]
50
The log-likelihood function for probit is
where wj denotes optional weights
Logit models are more popular than probit
models.
51
Example: If salary of people is given and we are interestedto find which of them is likely to own a house. Hereowning a house is binary variable (also dependentvariable) as every one will either buy a house or not asthere is not third situation can happen. So in thissituation Probit regression will be a better fit than LPM(linear probability model).A data on Xi (INCOME), ni (Number of families at income Xi ),and Ri(Number of families owning a house) . is given below
-----------------------------------------------------------------
Xi ni Ri
(thousand of dollars)
-----------------------------------------------------------------
6 40 8
8 50 12
10 60 18
13 80 28
15 100 45
20 70 36
25 65 39
30 50 33
35 40 30
40 25 20
-----------------------------------------------------------------
52
For Probit regression estimating the index , I, from the standard normal, CDF as
----------------------------------------------------------------------------------------------
Xi ni Ri P^i Ii = F-1(P^i )
(thousand of dollars)
----------------------------------------------------------------------------------------------
6 40 8 0.20 -0.8416
8 50 12 0.24 -0.7063
10 60 18 0.30 0.5244
13 80 28 0.35 0.3853
15 100 45 0.45 0.1257
20 70 36 0.51 0.0251
25 65 39 0.60 0.2533
30 50 33 0.66 0.4125
35 40 30 0.75 0.6745
40 25 20 0.80 0.8416
---------------------------------------------------------------------------------------------
53
Results :
The results are as follows
Dependent Variable: I---------------------------------------------------------------------------------------------
Variable coefficient std. error t-statistic probability
---------------------------------------------------------------------------------------------
C -1.0166 0.0572 -17.7473 1.0397E-07
Income 0.04846 0.00247 19.5585 4.8547E-08
---------------------------------------------------------------------------------------------
R2 = 0.97951 Durbin Watson statistics = 0.91384
54
Discussion :The index Ii is expressed as
Ii = β1 + β2Xi
The model is
Pi = P(Y = 1/X) = P(Z ≤ Ii) = P(Z ≤ β1 + β2Xi) = F(β1 +β2Xi)
We want to find out the effect of a unit change in X(income measured in thousands of dollars) on theprobability that Y = 1, that is, a family purchases ahouse. So here we will take the derivative of the functionwith respect to X (that is, rate of change of probabilitywith respect to income)
dPi/dxi = f(β1 + β2Xi)β2
where f(β1 + β2Xi) is a standard normal probabilitydensity function evaluated at β1 + β2Xi.
55
Take X= 6 (thousand dollars)
So the normal density function is
f[-1.0166 + 0.04846(6)] = f(-0.72548)
Referring to normal distribution tables Z = -0.72548,the normal density is about 0.3066
Now,
0.3066* β2 = 0.3066*0.04846 = 0.01485 = 1.4%
So starting with an income level of $6000, if theincome goes up to $1000, the probability of familypurchasing a house goes up by about 1.4%.
56
Practical Importance of Probit Regression
The probit analysis is commonly used
to analyze the data from medical field.
57
Drawback of Probit Regression:
The drawback is that, probit coefficients are moredifficult to interpret, hence they are less used,though the choice is largely one of personalpreference. Both the cumulative standard normalcurve used by probit as a transform and the logisticcurve used in logistic regression display the S-shaped curve.
58
Basic assumption of Probit Regression:
As logistic regression is based on the assumption
that the categorical dependent reflects an underlying
qualitative variable and uses the binomial
distribution, probit regression assumes the
categorical dependent reflects an underlying
quantitative variable and it uses the cumulative
normal distribution.