Download - Qualitative Response Regression Models · 2016. 10. 5. · In models where the response, Y, is qualitative, our objective is to find the probability of something happening, such as

1

Qualitative

Response

Regression Models

Regression models when the

regressand, the response

variable is qualitative in nature

2

The Regression Analysis

The Regression Analysis deals with prediction of

the Mean value of the Dependent variable by

using information of Independent variables.

Nature of the dependent variable plays very

important role in the regression analysis.

Major types of the dependent variable

encountered in the regression analysis are

Quantitative and Qualitative types.

Estimation framework differ for both type of the

dependent variables.

3

Types of Qualitative

Dependent Variable

Following major types of Qualitative

Variables are met in practice:

Binary Variable

Categorical without Order (Nominal )

Categorical with Order (Ordinal)

4

Regression with Qualitative

Dependent Variable

Qualitative

Dependent

Binary

Dependent

Categorical

Dependent

Ordinal

Dependent

Quantitative

Independent

Qualitative

Independent

Logistic

Regression

Probit

Regression

Log–Linear

Models

Quantitative

Independent

Qualitative

Independent

Multinomial

Logistic

Discriminatory

Analysis

Log – Linear

Models

Quantitative

Independent

Qualitative

Independent

Ordinal

Logistic

Log – Linear

Models

5

Meaning of response function

when dependent variable is

binary

6

Consider SLR model Y=βo+ β1Xi +Ui

E(Yi)=βo+ β1Xi As E(Ui)=0 In case of 0,1 value, Yi is a Bernoulli random variable with probability distribution

Yi Probability

1 Pi

0 1- Pi

By the definition of expected value of a random variable

E(Yi)=1(Pi)+0(1- Pi)= Pi

• The mean response is interpreted as the probability of

success i.e Yi=1, when the regressor variable takes on the

values Xi.

• Linear regression model with a dependent variable that is

either 0 or 1 is called the Linear Probability Model, or LPM.

•The LPM predicts the probability of an event occurring, and,

like other linear models, says that the effects of X’s on the

probabilities are linear.

7

Meaning of Expected value of

dependent binary variable

When the dependent variable is a

Bernoulli variable (i.e. the value is either 0

or 1), then the linear regression model

actually predict the probability for 1 to

happen. Therefore, we also call this model

a “linear probability model”.

8

Difference Between Quantitative And

Qualitative Response Regression Models

In a model where the response, Y, is quantitative, our

objective is to estimate it’s expected, or mean, value

given the values of the regressors.

In models where the response, Y, is qualitative, our

objective is to find the probability of something

happening, such as voting for a Democratic

candidate, or owning a house, or belonging to a

union, or participating in a sport etc. Hence,

qualitative response regression models are often

known as probability models.

9

Approaches to developing a

probability model for a binary

response variable

The Linear Probability Model (LPM)

The Logit Model

The Probit Model

10

Problems with LPM

Non-normality of disturbance term

Non-constant variances

Constraints on the response function

Non-linearity

11

Non-normality of disturbance term

For a binary dependent variable, Like Y,

U also takes on two values as

Ui= Yi-βo- β1Xi

When Yi =1 Ui= 1-βo- β1Xi

Yi =0 Ui= 0-βo- β1Xi

Obviously U cannot be assumed to be

normally distributed; actually it fallows the

binomial distribution

12

Non-constant variances

The probability distribution of U is

Var(U)=E[U-EU]2=EUi

2

(1-βo- β1Xi)2P+(-βo- β1Xi)

2(1-P)

= (1-βo- β1Xi)2(βo+ β1Xi)+ (-βo- β1Xi)

2(1-βo- β1Xi)

=(βo+ β1Xi)(1-βo- β1Xi)

So var(U) depends on X. Hence the variances of

disturbances will differ at different levels of X and OLS will

no longer be optimal (Estimators are although unbiased,

but they do not have minimum variance i.e not BLUE)

U Probability

1-βo- β1Xi Pi

-βo- β1Xi 1- Pi

13

Constraints on the response

function

Since the response function represents

probabilities when the dependent variable is

a binary variable and hence 0<E(Yi)<1

A linear response function may fall outside the

constraint limits within the range of the

independent variable in the scope of the

model.

14

Non-linearity

Even if predicted probabilities did not take on impossible

values, the assumption that there will be a straight linear

relationship between the IV and the DV is also very

questionable when the DV is a dichotomy.

For example, suppose that the DV is home ownership, and one

of the IVs is income. According to the LPM, an increase in

wealth of $50,000 will have the same effect on ownership

regardless of whether the family starts with 0 wealth or wealth

of $1 million. Certainly, a family with $50,000 is more likely to

own a home than one with $0. But, a millionaire is very likely to

own a home, and the addition of $50,000 is not going to

increase the likelihood of home ownership very much.

15

Structural defects of LPM

The LPM predicts Π < 0 and Π > 1 for sufficiently large orsmall x values

The LPM assumes that Π(X) increases linearly with X,that is, the marginal or incremental effect of X remainsconstant throughout

The variability is not constant, but rather depends on Xthrough its influence on Π. Thus the ordinary leastsquares estimators are not BLUE.

Y, being binary, is very far from normally distributed

16

Solution for the problems

Increase the sample size, so that

Binomial distribution tends to Normal

distribution

17

Constant Variance

Since the variance of U depends on the expected value of Y i.e variance are

heteroscadastic. One way of handling the problem of non-constant

variances of disturbances is through the use of weighted least squares.

That is, through appropriate weighting schemes, errors can be made

homoskedastic

To apply weighted least square transform the data by dividing both sides of the model by

iii WYEYE ))(1)((

The transformed model is W

U

W

X

WW

Y i

i

O

ii

i 1

1

18

Problem with Weighted least square

But Wi’s are unknown as it involve EYi which contains

Unknown parameters. So estimate Wi by using two

stage least square procedure

Stage-I: Fit the regression model by OLS and

estimate the parameters.

Stage-II:-Estimate the Weights by using estimated

parameters in stage-I and then transformed the

original model and then estimate the transformed

model by OLS

NOTE;- the application of OLS on transformed model

is called Weighted least square (WLS).

19

Solution for constraints on the

response function

There are two ways of handling the problem about the

constraints on the response function.

1. Estimate the model by usual OLS and find out whether

the Yi^ lie between 0 and 1. If some are less then 0, Y^

is assumed to be zero for those cases, if they are

greater then 1, they are assumed to be 1.

2. Use an estimating technique that will guarantee that the

estimated Y will lie between 0 and 1.

The LOGIT and PROBIT models will guarantee that the

estimated probabilities will indeed lie between the

logical limit 0 and 1.

20

We need a model that has two features:

AS Xi increases, Pi=E(Y=1IX) also increases but never steps outside the 0-1 interval

The relationship between Pi and Xi is nonlinear, i.e. one approaches zero at slower and slower rates as xi gets small and approaches one at slower and slower rates as Xi gets very large.

The LOGIT and PROBIT models will guarantee that the estimated probabilities will indeed lie between the logical limit 0 and 1.

Solution for constraints on

the response function

21

Logistic RegressionBoth theoretical and empirical consideration suggests that when the

dependent variable is an indicator variable the shape of the response function will frequently curvilinear. The following figure contains a curvilinear response function which has been found appropriate in many instances involving a binary dependent variable

22

Note that this response function is shaped like a tilted S and that it has asymptotes

at 0 and 1. The latter features assumes that the constraints on E(Y) i.e 0<E(Y) <1

are automatically met. The response function in the above figure is called the

logistic function and is given by

cumulative logistic distribution function

X

X

iio

o

e

eYEP

1

1

1)(

Here Pi is not only non-linear in X but in B,s as well , so we cannot use the usual OLS

method to estimate the parameters which require parameters in linear form. The non-

linear of parameters in logistic function can be solved easily because logistic function is

intrinsically linear the linearization of the logistic function is as follows

23

o 1

o 1 o 1

o 1

β β X

i β β X β β X

β β Xi

i

io 1

i

e 11 P 1

1 e 1+e

Pe odd-Ratio

1 P

PLi ln β β X log of Odd-Ratio also called logit

1 P

Interpretation of slope coefficientOne unit increase in X will result in a β1 increase in the log odds or in terms of

odd ratio one unit increase in X will result in exp(β1) increase in Odds.

24

Advantage of log odds (Logit)• Although probabilities can range from 0 to 1, log odds can range from -∞

to +∞.

•Although Logits are linear in X , the probabilities themselves are not, in

contrasts to the LPM where probabilities increase linearly with X.

•Logit is a linear function of unknown parameters so we can apply

standard theory of OLS to estimate parameters in the logit model in usual

way.

• Log odds follow an S-shaped curve. At the extremes, changes in the log

odds produce very little change in the probabilities. In the middle of the S

curve, changes in the log odds produce much larger changes in the

probabilities.

• To put it another way, linear, additive increases in the log odds produce

nonlinear changes in the probabilities.

NOTE:- We can add as many regressors as may be dictated by the

underlying theory.

25

0

P1

X + ∞- ∞

Comparison of LPM and

Logistic Regression

26

Features of Logit Model

As Π goes from 0 to 1 (i.e., as η varies from -∞ to +∞), the logit Lgoes from -∞ to +∞. That is, although the probabilities (ofnecessity) lie between 0 and 1, the logits are not so bound.

Whereas the LPM assumes that Π(X) is linearly related to X, thelogit model assumes that the log odds is linearly related to X.

Although L is linear in X, the probabilities themselves are not. Thisproperty is in contrast with the LPM where the probabilitiesincrease linearly with X.

If L, the logit, is positive, it means that when the value of theregressor(s) increases, the odds that the regressand equals 1(meaning some event of interest happens) increases. If L, thelogit, is negative, the odds that the regressand equals 1decreases as the value of X increases.

27

Estimation of the Logit model

Data at the individual level :

If we have data at the individuakl level then we can not estimate the Logit model by

standard OLS metod As if Y=1 then

0

1ln and if Y=0 then

1

0ln are both meaningless. In

such situation we can estimate parameters by MLE

Grouped or Replicated Data:

If replicated data is available i.e corresponding to each level of Xi, ni observation out of

which Ri (Ri<ni) belong to category in which Y=1 is available then we can compute i

ii

n

RP

which is the relative frequency, we can use it as an estimate of the true Pi corresponding to each

Xi. If ni is fairly large Pi^ will be reasonably good estimate of Pi

It can be shown that if ni is fairly large and if each observation is distributed independently as a

binomial variable then

)1(

1

i

iPniPi

NU

28

So the disturbance term in the logit model is heteroscedastic. Thus we have to use Weighted least

square instead of OLS with weights )1( PiniPiWi

Weighted Least Square:

Transformed the original model as UiWiXiWiWiLiWi o 1

Apply Least square to the transformed model ( Called WLS)

As the transformed disturbance term is homoscedastic so application of Least square method to

the transformed model will yield BLUE estimators.

29

Example: Logistic regression

In a study of the effectiveness of coupons offering a price reduction on a

given product, 1,000 homes were selected and a coupon and advertising

material for the product were mailed to each. The coupons offered different

price reduction (5, 10, 15, 20, and 30 cents), and 200 homes were

assigned at random to each of the price reduction categories. The

independent variable X in this study is the amount of price reduction, and

the dependent variable Y is a binary variable indicating whether or not the

coupon was redeemed with in a six-month period. It was expected that the

logistic response function would be an appropriate description of the

relation between price reduction and probability that the coupon is utilized.

Xi ni Ri

5 200 32

10 200 51

15 200 70

20 200 103

30 200 148

30

W LW ˆˆ W XiW

26.88 -8.597 5.1846 25.923

37.995 -6.609 6.164 61.64

45.5 -4.176 6.7454 101.18

49.955 0.4242 7.0679 141.36

38.48 6.4884 6.2032 186.1

Logit VS X

-2

-1.5

-1

-0.5

0

0.5

1

1.5

5 10 15 20 30

Price

Lo

gits

31

i

*

ii

*

i

**

1i

*

i

WXiWLLWhere

WL

i

iio

Xand

X

Estimate the above multiple linear regression equation by OLS (weighted least square) by

considering two independent variables iW and *

iX with regression through origin

LogitsUnWeightedXi0.10872.18506L

LogitsWeightedWXi0.1087W2.18506L

i

ii

*

i

32

LogitsUnWeightedXi0.10872.18506L

LogitsWeightedWXi0.1087W2.18506WL

i

iiii

Direction of relationship:

The direction of relationship (positive or negative) reflects the changes in

the dependent variable associated with changes in the independent

variable. A positive relationship means that an increase in the independent

variable is associated with increase in the predicted probability and vice

versa for a negative relationship

Interpreting the Direction of ORIGINAL COEFFICIENTS:

The sign of the original coefficient (+ve or –ve) indicates the direction of the

relationship. A positive coefficient increases the probability , where as a negative value

decreases the predicted probability In above example sign of b1 (0.1087) is positive

which indicates that increase in price reduction will increase its utility

Interpreting the Direction of EXPONENTIATED COEFFICIENTS:

An exponentiated coefficient above 1 reflect positive relationship

and value less than 1 represent negative relationship. In above

example is greater than 1 so increase in price

reduction will increase its probability of being use.

1be

11.11087.01 eeb

33

Xi0.10872.18506Li

Interpreting the magnitude of relationship

The odds ratio for an independent variable represents the change in the odds for

a one unit change in the independent variable holding all the other independent

variables constant.

Estimated odd-ratio for X i.e Exponentiated coefficient (1.11) indicates that with

the increase of 1 $ in price reduction chance of using the coupons is 1.11 times

higher than not use.

Interpreting the magnitude of relationship

Suppose we want to compare the odds of using a coupon of price reduction 10

to the odds of using a coupon of price reduction 30.

8.8)1087.0(20120 ee b

The result indicates that estimated odds of using a coupon with price

reduction 30 is 8.8 times greater than the estimated odds of using a

coupon with price reduction 10 i.e estimated odds ratio for increase of

$20 in price reduction is 8.8.

34

Percentage Change in Odd Ratio

Odds at X Xbboe 1

Odds at (X+1) 11111 )1( bXbbbXbbXbb

eeee ooo

Change in odds due to one unit change in X

111

111

bXbb

XbbbXbb

ee

eee

o

oo

Percentage change in odds due to one unit change in X

=

1001100

11

1

11

b

Xbb

bXbb

ee

eeo

o

35

Percentage Change in Odd Ratio

For above example, Percentage change in odds due to one unit change in X

%48.1110011087.0 e

One $ increase in price reduction will increase the odds by 11.48% i.e chances of coupon use

will increase by 11.48% with one $ increase in price reduction

36

Xi0.10872.18506Li

The estimated probability of a coupon redemption when the price reduction is 25 cents is given by

63.011 53244.0

53244.0

)25(1087.018506.2

)25(1087.018506.2

e

e

e

ePi

Hence we estimate that about 63% of 25-cents coupons will be redeemed with in six month

37

Output from MINITABPredictor Coef SE Z P Ratio Lower Upper

Constant -2.18551 0.164667 -13.27 0.000

X 0.108719 0.0088429 12.29 0.000 1.11 1.10 1.13

X n P^

5 200 0.162205

10 200 0.250056

15 200 0.364770

20 200 0.497219

30 200 0.745749

38

The following table gives change in probability at different price levels

Xi ni Ri

Probability

)(1087.018506.2

)(1087.018506.2

1ˆ

i

i

X

X

ie

eP

Change in Probability

)ˆ1(ˆ1 ii PPb

5 200 32 0.16221 0.01477

10 200 51 0.25006 0.02039

15 200 70 0.36477 0.02519

20 200 103 0.49722 0.02718

30 200 148 0.74575 0.02061

30252015105

0.0275

0.0250

0.0225

0.0200

0.0175

0.0150

Price Reduction

Ch

an

ge

in

Pro

ba

bilit

y

Change in probability VS price reduction

39

Logit model for

ungrouped or individual dataExample:

Let Y=1 if a student’s final grade in an intermediate microeconomics course was A and

Y=0 if the final grade was a B or C. (not A)

Spector and Mazzeo used grade point average (GPA), TUCE, and Personalized system of instruction (PSI) as the grade predictors. Fit an Logit model to this data set.

Where

TUCE= score on an examination given at the beginning of the term to test entering knowledge of macroeconomics

PSI=1 if the new teaching method is used

=0 otherwise

40

GPA

grade

TUCE

grade

PSI Grade Letter

grade

GPA

grade

TUCE

grade

PSI Grade Letter

grade

1 2.66 20 0 0 C 17 2.75 25 0 0 C

2 2.89 22 0 0 B 18 2.83 19 0 0 C

3 3.28 24 0 0 B 19 3.12 23 1 0 B

4 2.92 12 0 0 B 20 3.16 25 1 1 A

5 4.00 21 0 1 A 21 2.06 22 1 0 C

6 2.86 17 0 0 B 22 3.62 28 1 1 A

7 2.76 17 0 0 B 23 2.89 14 1 0 C

8 2.87 21 0 0 B 24 3.51 26 1 0 B

9 3.03 25 0 0 C 25 3.54 24 1 1 A

10 3.92 29 0 1 A 26 2.83 27 1 1 A

11 2.63 20 0 0 C 27 3.39 17 1 1 A

12 3.32 23 0 0 B 28 2.67 24 1 0 B

13 3.57 23 0 0 B 29 3.65 21 1 1 A

14 3.26 25 0 1 A 30 4.00 23 1 1 A

15 3.53 26 0 0 B 31 3.10 21 1 0 C

16 2.74 19 0 0 B 32 2.39 19 1 1 A

42

The logit model here can be written as

1 2 3 41

ii i i i i

i

PL GPA TUCE PSI u

P

Odds 95% CI

Predictor Coef SE Z P Ratio Lower Upper

Constant -13.0213 4.93100 -2.64 0.008

GPA 2.82611 1.26289 2.24 0.025 16.88 1.42 200.61

TUCE 0.0951577 0.141550 0.67 0.501 1.10 0.83 1.45

PSI=1 2.37869 1.06452 2.23 0.025 10.79 1.34 86.93

PSITUCEGPALi 3787.209516.082611.20213.13ˆ

By using MINITAB software

43

Interpretation of regression coefficient

For quantitative regressors, slope coefficient is a partial slope coefficient that measures the change in the estimated logit for a unit change in the value of given quantitative regressor keeping the effect of other regressors constant.

Coefficient of GPA

With other variables held constant one unit increase in GPA will increase on the average the estimated logit by 2.83 times

In terms of odd ratio, keeping the effect of other variables constant with one unit increase in GPA there is 17 times likely to get A-grade

8261.2e

PSITUCEGPALi 3787.209516.082611.20213.13ˆ

44

Interpretation of regression coefficient

Coefficient of TUCE

In terms of odd ratio, keeping the effect of other variablesconstant with one unit increase in TUCE there is 1.10 timesl likely to get A-grade.

Coefficient of PSI

In terms of odd ratio, keeping the effect of other variablesconstant, students who are exposed to the new method ofteaching are more than 10 times likely to get A-grade as compare to those students who are not exposed toit

09516.0e

ˆ 13.0213 2.82611GPA 0.09516 TUCE 2.3787 PSIiL

2.3787e

45

Probit Regression Models

Probit is a variant of logit modeling based ondifferent data assumptions.

The term "probit' was coined in the 1930's byChester Bliss and stands for probability unit.These two analyses, logit and probit, are verysimilar to one another.

46

Similarities and differences

between Logit and Probit models

Neither the logit model nor the probit model are linear,which makes things difficult.

To make the model linear, a transformation is done onthe dependent variable.

Both methods use maximum likelihood, and so requiremore cases than a similar OLS model. Unlike logitmodels, we don't get odds ratios with probit models.

Probit regression is an alternative log-linear approach tohandling categorical dependent variables.

49

The probit model is defined as

Pr(y=1|x) = Φ(xb)

where Φ is the standard cumulative normalprobability distribution and xb is called the probitscore or index.

Interpretation

Since xb has a normal distribution so itsinterpretation can be made as, the probit coefficient,b, is that a one-unit increase in the predictor leads toincreasing the probit score by b standard deviations

Here ‘xb’ is representing a multiplication of twomatrices, where x is a matrix of x’s and b is a matrixof β’s that is

x = [1,x1,x2,x3,…,xn] and b = [β1,β2,β3,…,βn]

50

The log-likelihood function for probit is

where wj denotes optional weights

Logit models are more popular than probit

models.

51

Example: If salary of people is given and we are interestedto find which of them is likely to own a house. Hereowning a house is binary variable (also dependentvariable) as every one will either buy a house or not asthere is not third situation can happen. So in thissituation Probit regression will be a better fit than LPM(linear probability model).A data on Xi (INCOME), ni (Number of families at income Xi ),and Ri(Number of families owning a house) . is given below

-----------------------------------------------------------------

Xi ni Ri

(thousand of dollars)

-----------------------------------------------------------------

6 40 8

8 50 12

10 60 18

13 80 28

15 100 45

20 70 36

25 65 39

30 50 33

35 40 30

40 25 20

-----------------------------------------------------------------

52

For Probit regression estimating the index , I, from the standard normal, CDF as

----------------------------------------------------------------------------------------------

Xi ni Ri P^i Ii = F-1(P^i )

(thousand of dollars)

----------------------------------------------------------------------------------------------

6 40 8 0.20 -0.8416

8 50 12 0.24 -0.7063

10 60 18 0.30 0.5244

13 80 28 0.35 0.3853

15 100 45 0.45 0.1257

20 70 36 0.51 0.0251

25 65 39 0.60 0.2533

30 50 33 0.66 0.4125

35 40 30 0.75 0.6745

40 25 20 0.80 0.8416

---------------------------------------------------------------------------------------------

53

Results :

The results are as follows

Dependent Variable: I---------------------------------------------------------------------------------------------

Variable coefficient std. error t-statistic probability

---------------------------------------------------------------------------------------------

C -1.0166 0.0572 -17.7473 1.0397E-07

Income 0.04846 0.00247 19.5585 4.8547E-08

---------------------------------------------------------------------------------------------

R2 = 0.97951 Durbin Watson statistics = 0.91384

54

Discussion :The index Ii is expressed as

Ii = β1 + β2Xi

The model is

Pi = P(Y = 1/X) = P(Z ≤ Ii) = P(Z ≤ β1 + β2Xi) = F(β1 +β2Xi)

We want to find out the effect of a unit change in X(income measured in thousands of dollars) on theprobability that Y = 1, that is, a family purchases ahouse. So here we will take the derivative of the functionwith respect to X (that is, rate of change of probabilitywith respect to income)

dPi/dxi = f(β1 + β2Xi)β2

where f(β1 + β2Xi) is a standard normal probabilitydensity function evaluated at β1 + β2Xi.

55

Take X= 6 (thousand dollars)

So the normal density function is

f[-1.0166 + 0.04846(6)] = f(-0.72548)

Referring to normal distribution tables Z = -0.72548,the normal density is about 0.3066

Now,

0.3066* β2 = 0.3066*0.04846 = 0.01485 = 1.4%

So starting with an income level of $6000, if theincome goes up to $1000, the probability of familypurchasing a house goes up by about 1.4%.

56

Practical Importance of Probit Regression

The probit analysis is commonly used

to analyze the data from medical field.

57

Drawback of Probit Regression:

The drawback is that, probit coefficients are moredifficult to interpret, hence they are less used,though the choice is largely one of personalpreference. Both the cumulative standard normalcurve used by probit as a transform and the logisticcurve used in logistic regression display the S-shaped curve.

58

Basic assumption of Probit Regression:

As logistic regression is based on the assumption

that the categorical dependent reflects an underlying

qualitative variable and uses the binomial

distribution, probit regression assumes the

categorical dependent reflects an underlying

quantitative variable and it uses the cumulative

normal distribution.

Download - Qualitative Response Regression Models · 2016. 10. 5. · In models where the response, Y, is qualitative, our objective is to find the probability of something happening, such as

Top Related