
Journal of Econometrics 103 (2001) 111–153
www.elsevier.com/locate/econbase

Panel data analysis of household brand choices

Pradeep Chintagunta a, Ekaterini Kyriazidou b,*, Josef Perktold c

a Graduate School of Business, University of Chicago, Chicago, IL 60637, USA
b Department of Economics, UCLA, 8283 Bunche Hall, Box 951477, Los Angeles, CA 90095-1477, USA
c Department of Economics, University of Chicago, Chicago, IL 60637, USA

Abstract

The paper examines theoretical and empirical issues arising in panel data studies of household brand choices. We develop a dynamic utility maximization model with habit formation that yields a discrete choice model that is linear in a vector of observable individual and brand characteristics, the lagged choice, an unobservable permanent individual/brand-specific effect, and an unobservable time-varying error component. We estimate the model using panel data on household yogurt purchases. We compare traditional estimation procedures with the method recently proposed by Honoré and Kyriazidou (Panel data discrete choice models with lagged dependent variables, Econometrica 68, 839–874). The methods’ robustness with respect to underlying assumptions is investigated in Monte Carlo simulations. © 2001 Elsevier Science S.A. All rights reserved.

JEL classification: C13; C23; M31

Keywords: Panel data; Dynamic discrete choice; Brand choice

1. Introduction

Brand choice is a predominant area of marketing research. In an oligopolistic market with differentiated products and heterogeneous consumer preferences, understanding the determinants of agents’ purchase behavior is

* Corresponding author. Tel.: +1-310-206-2794; fax: +1-310-825-9528. E-mail address: [email protected] (E. Kyriazidou).

0304-4076/01/$ - see front matter © 2001 Elsevier Science S.A. All rights reserved. PII: S0304-4076(01)00041-0


important since the willingness of consumers to switch brands affects demand elasticities and hence the degree of competition in the industry.

Taste differences across consumers for the different brands may be attributed to observable (to the researcher) variables, such as demographics and marketing efforts by firms, but also to permanent unobserved heterogeneity in consumers’ preferences and to unobservable transitory ‘taste shocks’. Habit formation is often thought to be another important determinant of brand choice. In this case, having consumed a brand in the past affects current brand choice. Both permanent unobserved heterogeneity and the presence of habit formation may create the often observed serial persistence in consumers’ brand choices. Distinguishing ‘true state dependence’, due to habit formation, from ‘spurious state dependence’, due to the presence of permanent unobserved heterogeneity, has long been recognized as an important issue in the economics literature (see, for example, Heckman, 1981). The issue is also of particular interest to marketing researchers: the presence of habit formation affects firms’ behavior, since in this case a firm has an incentive to attract consumers from other brands using, for example, temporary price promotions in order to enjoy higher revenues through future loyal consumption of its own brand. Thus, the presence of state dependence affects the nature of competition in an industry, by introducing dynamic aspects in firms’ marketing policies.

The typical approach adopted in the marketing literature to study brand choice uses panel data (for a specific product category) from several households. The systematic component of a brand’s utility is usually assumed to be a linear function of marketing variables, such as price and promotions, and of household characteristics. In order to capture the effects of previous brand choices, a variable that measures brand loyalty is often introduced. This variable is operationalized either as the most recent purchase (Jones and Landwehr, 1988) or as an exponentially weighted sum of all previous choices made by the household (Guadagni and Little, 1983). Observations within and across households are then pooled, and standard maximum likelihood methods (probit, logit) are used to estimate the effects of marketing variables and of brand loyalty on choice behavior. In order to account for the presence of unobserved heterogeneity across households, both fixed effects methods (see, for example, Jones and Landwehr, 1988) and predominantly random effects methods (see, for example, Jain et al., 1994; Keane, 1997) have been used. More recently, Bayesian methods have been proposed for estimation of panel data models with time-invariant individual-specific effects (see, for example, McCulloch and Rossi, 1994; and Rossi et al., 1996). Simulation methods (see, for example, Geweke et al., 1994; Börsch-Supan and Hajivassiliou, 1993) are also increasingly being used to estimate panel data random coefficients models. Keane (1997) provides a recent and comprehensive review of the marketing literature on brand choice.
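The two ways of operationalizing brand loyalty mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recursive form of the exponentially weighted variable, the decay value `decay=0.8` and the starting value `init=0.5` are hypothetical choices, not taken from the cited studies.

```python
import numpy as np

def last_purchase_loyalty(choices):
    """Loyalty as the most recent purchase: the value at t is the choice at t-1."""
    choices = np.asarray(choices, dtype=float)
    loyalty = np.empty_like(choices)
    loyalty[0] = np.nan            # no previous purchase is observed at t = 0
    loyalty[1:] = choices[:-1]
    return loyalty

def exp_weighted_loyalty(choices, decay=0.8, init=0.5):
    """Exponentially weighted sum of all previous choices.

    decay and init are hypothetical values chosen for illustration.
    """
    choices = np.asarray(choices, dtype=float)
    loyalty = np.empty_like(choices)
    loyalty[0] = init
    for t in range(1, len(choices)):
        # older choices are down-weighted geometrically by `decay`
        loyalty[t] = decay * loyalty[t - 1] + (1.0 - decay) * choices[t - 1]
    return loyalty

purchases = [1, 1, 0, 1, 0, 0]     # 1 = the household bought the brand on that occasion
print(last_purchase_loyalty(purchases))
print(exp_weighted_loyalty(purchases))
```

Either variable would then enter a pooled logit or probit as an additional regressor; the first one corresponds to the lagged choice that appears in the model developed below.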


There are two issues about the typical approach of analyzing consumer behavior that are worth noting: one theoretical, and the other empirical. Regarding the theoretical issue, the commonly used specification assumes that the consumer is maximizing utility on each purchase occasion. However, if the systematic component of a brand’s utility at a given point in time is affected by the previous choice(s) made by the consumer (i.e., if brand loyalty is an important driver), then such behavior, which ignores the impact of current choices on future choices, is in general inappropriate. Rather, it may be reasonable to assume that consumers recognize the inter-temporal linkage in their utility. Consequently, observed choices will be the solution to a dynamic optimization problem. A question of research interest that arises is: Under what conditions will such dynamic utility maximizing behavior yield the standard model of brand choice which includes the lagged decision as an explanatory variable? An answer to this question will provide a link between economic theory and the typical econometric specification.

To address the theoretical question above, in this paper we set up a dynamic model of brand choice. In the spirit of Deaton and Muellbauer (1980) and Hanemann (1984), the consumer’s utility in each period depends on the consumption of the product category under consideration, which comes in different alternative forms (brands) that are perfect substitutes. Each alternative enters the utility function weighted by a ‘quality index’ that depends not only on different exogenous variables (such as brand promotions and advertising, demographics, etc.) and unobserved permanent and time-varying effects (individual/brand heterogeneity and taste shocks), but also on the choice made by the consumer in the previous period. Consumers maximize expected discounted utility over an infinite time horizon by choosing the optimal sequence of purchase decisions and the optimal quantities for each brand. Based on this model, we derive conditions under which the decision rule for choosing an alternative on a given purchase occasion reduces to a discrete choice threshold crossing model that is linearly additive in the exogenous variables, the lagged endogenous dependent variable, and unobservable time-invariant individual/brand-specific effects.

A second issue that the previously mentioned studies in the marketing literature raise is empirical in nature. The typical random effects approach in estimating discrete choice models with lags of the dependent variable and unobserved individual/brand-specific effects conditions on the initial observations, treating them as exogenous variables.¹ This assumption, however, seems untenable in the presence of permanent unobserved heterogeneity, and its violation leads in general to inconsistent estimation of all

¹ Endogeneity of initial conditions is considered in Erdem and Keane (1996), and Roy et al. (1996). Erdem and Keane also allow for possible serial correlation in the idiosyncratic disturbances.


parameters of the model. Furthermore, it is widespread practice to assume that the time-invariant unobserved effects are also independent of the observed covariates, which is another source of potential inconsistency in estimating the model. On the other hand, well-known fixed effects methods for estimating panel data discrete choice models, such as the conditional likelihood (see, for example, Chamberlain, 1984) and the conditional maximum score approach (Manski, 1987), do not require any assumptions about the statistical relationship between the observed covariates and the individual/brand-specific effects. However, their validity rests crucially on the strict exogeneity of the observed covariates.² In a recent paper, Honoré and Kyriazidou (2000) show that it is possible to identify and consistently estimate panel data discrete choice models in the presence of exogenous variables, the lagged endogenous variable, and unobserved heterogeneity. Their method allows for the individual/brand-specific unobservable effects to be correlated with the exogenous variables included in the model in an unspecified manner. In addition, it does not require modeling of the initial conditions or of their statistical relationship with the unobserved heterogeneity.

Given the wide range of maintained assumptions in the discrete choice literature, the question of empirical interest that arises is: How robust are the estimates of the parameters of interest across econometric methods, and how sensitive are these methods to model misspecification? To answer this question, in this paper, we apply a variety of econometric approaches to estimate the structural parameters of a model of brand choice using household panel data on yogurt purchases. In particular, we compare the results obtained from the Honoré and Kyriazidou (2000) method with those obtained from standard methods, such as the conditional logit and the pooled logit approach, with and without random effects. In addition, we carry out a Monte Carlo simulation study using the design matrix of the data. Our goal is to identify situations under which the different estimation methodologies are most reliable. We investigate the sensitivity of the estimated parameters with respect to different levels of heterogeneity and to the presence of correlation between the household-specific effects and the exogenous variables, as well as their sensitivity with respect to different assumptions concerning the distribution of initial observations.

We should point out that the analysis in this paper is limited in several aspects. On one hand, the assumptions underlying the theoretical model are strong. In order to derive a tractable solution to the agent’s dynamic programming problem of the form typically assumed in empirical applications, we assume a specific functional form for the utility function. Such

² As we discuss in Section 3, a conditional likelihood approach may be used for discrete choice models with lags of the dependent variable and unobserved individual effects provided that the model does not contain any other explanatory variables.


assumptions typically underlie explicitly or implicitly the theoretical framework of many economic applications in labor, industrial organization, and other fields. Perhaps more importantly, we assume serial independence for the problem’s state variables (prices, firms’ promotional efforts, transitory error shocks). This implies that the consumer cannot forecast these variables in optimizing his behavior and hence expectations for their future realizations do not enter his decision rule. While strong and perhaps untenable, this assumption underlies many analyses of consumer behavior in the marketing literature.³ Despite the restrictiveness of the assumptions, the theoretical model that we propose may serve as a useful benchmark that exemplifies conditions under which a commonly used econometric model is consistent with dynamic utility maximizing behavior.

On the other hand, the empirical analysis in the paper is restricted for most parts to only two brands. This is done primarily for computational reasons. It raises, however, potentially important sample selection issues which we briefly examine by extending the analysis to three brands. Another important issue that both our theoretical and empirical analyses ignore is the issue of possible endogeneity of the timing and frequency of purchases.⁴ We should therefore point out the limited economic significance of our point estimates. However, we hope that the results provide insight into the more general problem of multiple brand choice.

The main alternative to our approach of reducing the dynamic optimization problem to the ‘static’ discrete choice model is by using numerical methods for solving the optimization problem within the estimation procedure. Gönül (1999) and Erdem and Keane (1996) are recent examples of this approach. The main advantage of their approach is in the ability to simulate changes in the marketing policies that are not subject to the Lucas critique. However, allowing for unobserved permanent heterogeneity increases substantially the complexity and computational cost. Gönül (1999) does not allow for any permanent heterogeneity, while Erdem and Keane (1996) restrict it to differences in the learning experience. As we show in this paper, the effects of state dependence are strongly influenced by the specification of heterogeneity. If this specification is not correct, then the estimated effect of past brand choices can be considerably biased. The decision which approach to use should therefore depend on the relative importance of heterogeneity versus dynamic flexibility in the specific application.

The remainder of this paper is organized as follows. Section 2 presents a theoretical model of dynamic discrete choice. Section 3 describes the different estimation methods for binary choice models. Section 4 describes the data and

³ A recent exception is Erdem and Keane (1996).

⁴ Some of these issues have been addressed in the marketing literature (see, for example, Jain and Vilcassim, 1991; Vilcassim and Jain, 1996).


Section 5 presents the estimation results. Section 6 presents the results of our Monte Carlo study and Section 7 concludes.

2. A model of dynamic discrete choice

Each consumer is assumed to maximize expected lifetime utility which is defined over two goods. The first good is available in J alternative forms (brands). The second good is a composite good. In each period the agent’s utility function is given by

$$
u\left(\sum_{j=1}^{J} \psi_{jit}\, y_{jit}\right) + z_{it},
$$

where $u(\cdot)$ is a strictly concave function with $u' > 0$ and $u'(0) = \infty$, and $z_{it}$ is the quantity of the composite good. In the specification above, which follows Hanemann (1984), we assume that the quantities, $y_{1it}, \ldots, y_{Jit}$, of the J alternative brands of the first good, enter the utility function with multiplicative time-varying quality indices, $\psi_{1it}, \ldots, \psi_{Jit}$, that represent agent i’s subjective evaluation of each brand in each time period. These quality indices are assumed to be determined as⁵

$$
\psi_{jit} = \exp\left(\sum_{k=1}^{J} X_{kit}\beta_{kj} + \sum_{k=1}^{J} \gamma_{kj}\, d_{kit-1} + \alpha_{ji} + \varepsilon_{jit}\right),
\qquad j = 1, \ldots, J;\ t = 1, 2, \ldots,
$$

where $d_{kit-1} \equiv 1\{y_{kit-1} > 0\}$ is the indicator function that takes the value 1 if the kth brand was consumed in the previous period, and 0 otherwise. The variables in $X_{jit}$ include individual/brand characteristics known to the consumer at the beginning of period t and are also observed by the econometrician. $\varepsilon_{jit}$ is a scalar variable which may, for example, represent an individual- and time-specific taste shock for brand j. $\alpha_{ji}$ is an individual/brand-specific permanent taste component that may depend on $X_{ji0}$ (and possibly on the entire sample path of deterministic variables). Both $\varepsilon_{jit}$ and $\alpha_{ji}$ are assumed to be observable by the agent at the beginning of each time period but are not observed by the econometrician. Apart from the habit effects incorporated in the term $\sum_{k=1}^{J} \gamma_{kj}\, d_{kit-1}$, the above specification is quite standard in the literature of static random utility maximization models. Note that this specification of the quality indices distinguishes between own effects ($\beta_{jj}, \gamma_{jj}$) and cross effects ($\beta_{kj}, \gamma_{kj}$). Furthermore, it allows the cross effects ($\beta_{kj}, \gamma_{kj}$) to vary with k for the same j, and also to be different across j.

⁵ Note that the quality indices for the initial time period, $\psi_{ji0}$, are not specified. Thus, they may depend on $X_{ji0}$, $\alpha_{ji}$, and $\varepsilon_{ji0}$ in an arbitrary way.
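As a rough illustration of how the quality indices combine own effects, cross effects and habit terms, the following Python sketch evaluates the logarithm of the index for all brands at once. The array layout (`beta[k, :, j]`, `gamma[k, j]`) and all numerical values are assumptions made for this example only.

```python
import numpy as np

def log_quality_index(X, beta, d_prev, gamma, alpha, eps):
    """ln psi_j = sum_k X_k beta_kj + sum_k gamma_kj d_k,t-1 + alpha_j + eps_j.

    X      : (J, m) array, row k holds brand k's characteristics
    beta   : (J, m, J) array, beta[k, :, j] is the effect of brand k's X on brand j
    d_prev : (J,) 0/1 vector of previous-period choices
    gamma  : (J, J) array, gamma[k, j] is the effect of having bought k on brand j
    alpha  : (J,) permanent taste components
    eps    : (J,) transitory taste shocks
    """
    x_part = np.einsum("km,kmj->j", X, beta)   # sum over brands k and regressors m
    habit_part = d_prev @ gamma                # sum over k of d_k * gamma[k, j]
    return x_part + habit_part + alpha + eps

# Hypothetical two-brand example with one characteristic per brand.
X = np.array([[1.0], [2.0]])
beta = np.array([[[0.5, -0.1]], [[-0.1, 0.5]]])   # own effect 0.5, cross effect -0.1
gamma = np.array([[0.8, 0.1], [0.1, 0.8]])        # own habit 0.8, cross habit 0.1
d_prev = np.array([1, 0])                         # brand 1 was bought last period
alpha = np.array([0.2, 0.0])
eps = np.zeros(2)
print(log_quality_index(X, beta, d_prev, gamma, alpha, eps))
```

With these numbers the previous purchase of the first brand raises its own log index by 0.8 and the other brand's by 0.1, which is the own/cross distinction described in the text.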


The specification of the period utility function implies that the quality-adjusted alternative forms of the first good are perceived to be perfect substitutes for each other. Furthermore, we assume away income effects by postulating that each period’s utility function is quasi-linear in the composite good which may be thought of as being money. The reason for this is that the product categories that we are often interested in (e.g. yogurt, detergent, soft drinks) represent only a small fraction of consumer income.

We will further assume that markets are complete so that the agent’s budget constraint may be expressed as a single equation:

$$
\sum_{t=0}^{\infty}\left[\sum_{j=1}^{J} \tilde p_{jit}\, y_{jit}\right] + \sum_{t=0}^{\infty} \tilde q_t\, z_{it} = PVI_i, \tag{1}
$$

where $\{\tilde p_{jit}\}_{j=1}^{J}$ and $\tilde q_t$ are the period t prices of the J brands and the composite good, respectively, expressed in present value terms, and $PVI_i$ is the present value of the agent’s lifetime income.

We next turn to the stochastic specification of the model. We assume that at the beginning of his lifetime the agent faces uncertainty with respect to the future realizations of $(X_{jit}, \varepsilon_{jit}, \tilde p_{jit})_{j=1}^{J}$ and his income. The uncertainty with respect to their current realizations is resolved in the beginning of each time period before the agent makes his choice. On the other hand, the price of the composite good will be assumed to evolve deterministically. Defining $\delta$ to be the agent’s discount factor, his problem is therefore:

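$$
\max_{\{(y_{jit})_{j=1}^{J},\, z_{it}\}_{t=0}^{\infty}}\ E_0\left\{\sum_{t=0}^{\infty} \delta^{t}\left[u\left(\sum_{j=1}^{J} \psi_{jit}\, y_{jit}\right) + z_{it}\right]\right\}
$$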

subject to the budget constraint (1). Here, $E_0$ denotes expectations with respect to the initial period’s information set.

Note that due to the quasi-linearity of the utility function with respect to $z_{it}$, if $PVI_i$ is large enough, the agent will in each period choose optimally the consumption of one of the brands and will spend the rest of his income on the composite good in the period where the price of $z_{it}$, corrected for time preference, is lowest. We will assume for simplicity that $\tilde q_t = \delta^{t}$ (i.e. the current marginal utility of money in every time period is constant and equal to 1) so that there is perfect substitutability in the consumption of $z_{it}$ over time, i.e. the agent is indifferent as to the period in which he will consume the composite good. With this assumption the agent’s problem is to maximize

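$$
E_0\left\{\sum_{t=0}^{\infty} \delta^{t}\left[u\left(\sum_{j=1}^{J} \psi_{jit}\, y_{jit}\right) - \sum_{j=1}^{J} p_{jit}\, y_{jit}\right]\right\} + E_0\, PVI_i
$$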


over $\{(y_{jit})_{j=1}^{J}\}_{t=0}^{\infty}$, where $p_{jit} \equiv \tilde p_{jit}/\delta^{t}$ is the discounted price of the jth brand in period t. The expected present value of income enters the optimization problem as an additive constant which does not affect the optimal consumption of the non-composite good. For simplicity, we ignore this term from here on.

Note that, after the removal of income effects, the only dynamic link is through the habit formation due to the presence of the previous brand choice in the current quality indices. Thus, if we condition on past and current brand choice, future utility will not depend on the quantity decision in the current period. Therefore, if the agent chooses to consume only brand j in period t, the optimal quantity, $y^{*}_{jit}$, solves the static problem:

$$
\max_{y_{jit}}\ u(\psi_{jit}\, y_{jit}) - p_{jit}\, y_{jit}.
$$

The first order condition is $u'(\psi_{jit}\, y^{*}_{jit})\,\psi_{jit} = p_{jit}$ and it implies that $\psi_{jit}\, y^{*}_{jit} = u'^{-1}(p_{jit}/\psi_{jit})$. The conditional indirect utility function for brand j given the agent’s past choice is therefore,

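$$
u(\psi_{jit}\, y^{*}_{jit}) - p_{jit}\, y^{*}_{jit}
= u\left(u'^{-1}\left(\frac{p_{jit}}{\psi_{jit}}\right)\right) - \frac{p_{jit}}{\psi_{jit}}\, u'^{-1}\left(\frac{p_{jit}}{\psi_{jit}}\right)
\equiv v\left(\frac{p_{jit}}{\psi_{jit}}\right),
$$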

where $v' < 0$, i.e. it is only a function of the quality-adjusted price of brand j. This is similar to the static case when there are no habit effects, i.e. $\gamma_{jk} = 0$ for all j, k (see Hanemann, 1984). In the latter case, the agent will choose brand j at time t if

$$
\frac{p_{jit}}{\psi_{jit}} \le \frac{p_{lit}}{\psi_{lit}} \quad \text{for all } l \ne j. \tag{2}
$$
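A small numerical sketch may help to see why rule (2) follows from the indirect utility $v$. It assumes logarithmic subutility, $u(c) = \ln c$, which is only one admissible choice: then $y^{*} = 1/p$ and $v(p/\psi) = -\ln(p/\psi) - 1$. The prices and quality indices below are made-up numbers.

```python
import numpy as np

def indirect_utility_log(price, psi):
    """v(p/psi) under the assumption u = ln: y* = 1/p, so v = -ln(p/psi) - 1."""
    return -np.log(np.asarray(price) / np.asarray(psi)) - 1.0

def static_choice(price, psi):
    """Rule (2): pick the brand with the lowest quality-adjusted price."""
    return int(np.argmin(np.asarray(price) / np.asarray(psi)))

# Hypothetical prices and quality indices for three brands.
price = np.array([1.20, 0.90, 1.00])
psi = np.array([1.50, 1.00, 1.30])

v = indirect_utility_log(price, psi)
print("quality-adjusted prices:", price / psi)
print("indirect utilities:     ", v)
print("chosen brand (0-based): ", static_choice(price, psi))
```

Because $v$ is decreasing in its argument, maximizing $v$ over brands and minimizing $p/\psi$ pick the same brand. Note that the cheapest brand in money terms (the second one) is not chosen here, since the third brand has the lowest price per unit of quality.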

In the presence of habit formation, however, the agent has in general to take into account that his current consumption affects his future evaluation of the J brands through the quality indices.⁶ In other words, in determining his current choice of brand the agent has to account for his expectations about the future realizations of the state variables⁷ $z_{jit} \equiv (X_{jit}, \varepsilon_{jit}, p_{jit})$ and $d_{jit-1}$ for all j. This may be seen from the Bellman equation which in the two good

⁶ We assume that the consumer cannot buy a small amount of each brand to form habits for all brands. This would be implied if habits are formed only for the brand with the highest consumed quantity, i.e. if the indicators for the previous choice in the current quality index were of the form $d_{jit-1} \equiv 1\{y_{jit-1} > 0,\ y_{jit-1} > y_{lit-1} \text{ for all } l \ne j\}$.

⁷ Note that $\alpha_{ji}$ is not a state variable since its value is revealed at the beginning of the agent’s life and is constant thereafter.


case (j = 1, 2) is⁸

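$$
V(z_{1it}, z_{2it}, d_{1it-1}, d_{2it-1}) = \max\left\{
\begin{array}{l}
v\left(\dfrac{p_{1it}}{\psi_{1it}}\right) + \delta \displaystyle\int V(z_{1it+1}, z_{2it+1}, 1, 0)\, dF(z_{1it+1}, z_{2it+1} \mid z_{1it}, z_{2it}), \\[2ex]
v\left(\dfrac{p_{2it}}{\psi_{2it}}\right) + \delta \displaystyle\int V(z_{1it+1}, z_{2it+1}, 0, 1)\, dF(z_{1it+1}, z_{2it+1} \mid z_{1it}, z_{2it})
\end{array}
\right\}.
$$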

An analytic solution to the agent’s problem is in general infeasible, unless he is myopic ($\delta = 0$), in which case his decision rule collapses to the same one as in the static case (2). To the extent that the agent can forecast future $z_{jit}$’s based on their current realizations, his period t value function depends not only on the ratios $p_{jit}/\psi_{jit}$ but also on the levels of each one of the state variables. Below we describe two cases which imply a decision rule of the form of (2). In both cases the state variables $(X_{jit}, \varepsilon_{jit}, p_{jit})$ are assumed to be independent over time so that they are not forecastable. This implies that the two future expected values in the two sums above, which condition on different period t brand choices, will be in general different, albeit constant over time, i.e.

$$
W_{1i} \equiv \delta \int V(z_{1it+1}, z_{2it+1}, 1, 0)\, dF(z_{1it+1}, z_{2it+1})
\ \ne\ \delta \int V(z_{1it+1}, z_{2it+1}, 0, 1)\, dF(z_{1it+1}, z_{2it+1}) \equiv W_{2i}.
$$
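The constant continuation values can be approximated numerically. The sketch below is an illustration only, under several assumptions that are not part of the general model: logarithmic subutility (so that $v(p/\psi) = \ln\psi - \ln p - 1$), no observed characteristics, Gumbel draws for $\varepsilon_j - \ln p_j$, and hypothetical parameter values. It solves the two-brand Bellman equation under serial independence by fixed-point iteration on $(W_{1i}, W_{2i})$.

```python
import numpy as np

def continuation_values(alpha, gamma, shocks, delta=0.9, n_iter=500):
    """Solve W_j = delta * E[V(z', j)] by fixed-point iteration (log utility assumed).

    alpha  : (2,) permanent taste components
    gamma  : (2, 2) habit effects, gamma[k, j] = effect of previous brand k on brand j
    shocks : (N, 2) i.i.d. draws of eps_j - ln p_j for the two brands
    """
    # flow[n, k, j]: current payoff of choosing brand j when brand k was chosen last period
    flow = alpha[None, None, :] + gamma[None, :, :] + shocks[:, None, :] - 1.0
    W = np.zeros(2)
    for _ in range(n_iter):
        V = np.max(flow + W[None, None, :], axis=2)   # shape (N, 2): draw n, previous brand k
        W = delta * V.mean(axis=0)                    # sample average replaces the integral
    return W

rng = np.random.default_rng(0)
draws = rng.gumbel(size=(20000, 2))
shocks = np.vstack([draws, draws[:, ::-1]])       # symmetrize the draws across brands
habit = np.array([[1.0, 0.0], [0.0, 1.0]])        # hypothetical: own effect 1, cross effect 0

W_sym = continuation_values(np.array([0.0, 0.0]), habit, shocks)
W_asym = continuation_values(np.array([0.5, 0.0]), habit, shocks)
print("symmetric tastes: ", W_sym)
print("asymmetric tastes:", W_asym)
```

In this design the two continuation values coincide (up to floating point error) when the brands are fully symmetric, and differ once one brand has a permanent taste advantage, so that the choice rule is no longer the purely static comparison of quality-adjusted prices.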

The first case, where the decision rule collapses to the static/myopic one of Eq. (2), is when the agent is ex ante indifferent between the two brands. This latter situation will occur if, in addition to the serial independence assumption on $z_{jit} \equiv (X_{jit}, \varepsilon_{jit}, p_{jit})$, we assume that (i) $\alpha_{ji} = \alpha_{li}$ for all $j \ne l$, i.e. the agent does not have an intrinsic taste for any particular brand; (ii) the $z_{jit}$’s are identically distributed across brands; and (iii) the following restrictions on the agent’s structural preference parameters hold: first, the brand j characteristics enter the own quality index $\psi_{jit}$ with the same coefficients for all brands, i.e. $\beta_{jj} = \beta_o$, and $\gamma_{jj} = \gamma_o$ for all j (here o stands for own effects). Second, all other brands’ characteristics enter brand j’s quality index with the same coefficients which are also the same for all quality indices, i.e. $\beta_{kj} = \beta_{jk} = \beta_c$, and $\gamma_{kj} = \gamma_{jk} = \gamma_c$ for all $j \ne k$ (here c stands for cross effects). Conditions (i)–(iii) imply that the value function is symmetric in all its arguments. Under these symmetry restrictions and the independence over time assumption on

⁸ Note that the condition $u'(0) = \infty$ on the subutility function implies that the consumer will buy a positive quantity of one of the brands in each period.


all state variables, it is clear that $W_{ji} = W_{li}$ for all $j \ne l$. The agent will therefore choose brand j at time t if $p_{jit}/\psi_{jit} \le p_{lit}/\psi_{lit}$ for all $l \ne j$, similarly to the static and myopic case.

It is also possible to derive a similar decision rule without any of the symmetry restrictions (i)–(iii) if we are willing to assume that the subutility function $u(\cdot)$, and hence $v(\cdot)$, is logarithmic. Under independence of $z_{jit}$ over time, the agent will choose brand 1 over brand 2 if

$$
v\left(\frac{p_{1it}}{\psi_{1it}}\right) + W_{1i} \ \ge\ v\left(\frac{p_{2it}}{\psi_{2it}}\right) + W_{2i},
$$

which with logarithmic utility is equivalent to

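$$
\begin{aligned}
& \ln \psi_{1it} - \ln p_{1it} + W_{1i} \ \ge\ \ln \psi_{2it} - \ln p_{2it} + W_{2i} \\
\Leftrightarrow\ & \ln\big(\psi_{1it}\exp(W_{1i})\big) - \ln p_{1it} \ \ge\ \ln\big(\psi_{2it}\exp(W_{2i})\big) - \ln p_{2it} \\
\Leftrightarrow\ & \frac{p_{1it}}{\psi_{1it}\exp(W_{1i})} \ \le\ \frac{p_{2it}}{\psi_{2it}\exp(W_{2i})},
\end{aligned}
$$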

i.e. the agent will choose the brand with the lowest price adjusted for both quality and expected future utility (compare with the expression in (2), which applies to the static and myopic cases). The above analysis naturally generalizes to the case of more than two brands.

In the two-brand case the decision rule is therefore,

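$$
d_{1it} = 1\{X_{1it}\beta_1 - X_{2it}\beta_2 + \gamma_1 d_{1it-1} - \gamma_2 d_{2it-1} - \ln p_{1it} + \ln p_{2it} + \alpha_i + (\varepsilon_{1it} - \varepsilon_{2it}) \ge 0\}, \qquad t = 1, \ldots, T,
$$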

where $\beta_1 \equiv (\beta_{11} - \beta_{12})$, $\beta_2 \equiv (\beta_{22} - \beta_{21})$, $\gamma_1 \equiv (\gamma_{11} - \gamma_{12})$, $\gamma_2 \equiv (\gamma_{22} - \gamma_{21})$, and $\alpha_i \equiv (\alpha_{1i} + W_{1i}) - (\alpha_{2i} + W_{2i}) \equiv \tilde\alpha_{1i} - \tilde\alpha_{2i}$.

Given data $\{(d_{jit}, X_{jit}, p_{jit})_{t=0}^{T}\}_{j=1,2}$, it is clear that the parameters of the discrete choice model above can only be identified up to scale. Furthermore, for brand-specific variables in $X_{jit}$, only the difference $\beta_j \equiv \beta_{jj} - \beta_{jk}$ between the own effect $\beta_{jj}$ and the cross effect $\beta_{jk}$ is identified. For purely individual-specific variables, such as demographic variables, only their differential effect across brands, $\beta_j - \beta_k$, can be identified. Similarly, for the feedback parameters $\gamma_{jk}$, we can only identify the difference $\gamma_j \equiv \gamma_{jj} - \gamma_{jk}$. Note, however, that since $d_{1it} + d_{2it} = 1$ for all t, $\gamma_1$ and $\gamma_2$ cannot be separately identified if we allow for an intercept in the model to accommodate different (non-zero) means for $(\tilde\alpha_{1i} + \varepsilon_{1it})$ and $(\tilde\alpha_{2i} + \varepsilon_{2it})$. Finally, the theoretical model implies that brand prices enter in logarithms and with opposite coefficients that have the same absolute magnitude.

The familiar binary logit model is obtained if we assume that the $\varepsilon_{jit}$’s are independent of $X_{jit}$, $p_{jit}$, and $\alpha_{ji}$ for all j, and are independent of each other and identically distributed according to the extreme value distribution with scale parameter, say, $\omega > 0$. If, in addition, the aforementioned symmetry


restrictions on the structural preference parameters hold, namely $\beta_{11} = \beta_{22} = \beta_o$, $\beta_{12} = \beta_{21} = \beta_c$, $\gamma_{11} = \gamma_{22} = \gamma_o$ and $\gamma_{12} = \gamma_{21} = \gamma_c$ (which in the notation above imply that $\beta_1 = \beta_2$ and $\gamma_1 = \gamma_2$), we obtain the binary logit model with individual effects and state dependence of the form often encountered in applied research:

$$
\Pr(d_{it} = 1 \mid x_i, \alpha_i, d_{i0}, \ldots, d_{it-1}) = \frac{\exp(\alpha_i + x_{it}\beta + \gamma d_{it-1})}{1 + \exp(\alpha_i + x_{it}\beta + \gamma d_{it-1})}, \qquad t = 1, \ldots, T,
$$

where $d_{it} \equiv 1\{y_{1it} > 0\}$; $x_i \equiv \{x_{it}\}_{t=0}^{T}$; $x_{it} \equiv (X_{1it} - X_{2it},\ \ln p_{1it} - \ln p_{2it})$; $\alpha_i \equiv (\tilde\alpha_{1i} - \tilde\alpha_{2i})/\omega$; $\beta \equiv (\beta_o - \beta_c, -1)'/\omega$, and $\gamma \equiv (\gamma_o - \gamma_c)/\omega$.

In the next section we describe the identification and estimation approach proposed by Honoré and Kyriazidou (2000), along with other estimators that have been used to analyze household brand choices. We will focus on the case where the length of the panel T is small, which is the case most frequently encountered in applied research.
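The step from the threshold-crossing rule to the logit probability rests on the fact that the difference of two independent extreme value (Gumbel) shocks with common scale $\omega$ is logistic with the same scale. A quick simulation sketch, with a hypothetical index value and scale, illustrates this; the function names are introduced here for illustration only.

```python
import numpy as np

def logit_prob(index):
    """Logistic CDF evaluated at the (scaled) index."""
    return 1.0 / (1.0 + np.exp(-index))

def simulated_choice_frequency(index, omega, n_draws, seed=0):
    """Share of draws with index + (eps1 - eps2) >= 0 for Gumbel shocks of scale omega."""
    rng = np.random.default_rng(seed)
    eps1 = rng.gumbel(scale=omega, size=n_draws)
    eps2 = rng.gumbel(scale=omega, size=n_draws)
    return np.mean(index + eps1 - eps2 >= 0)

omega = 2.0   # hypothetical scale of the taste shocks
index = 0.7   # hypothetical systematic part: X and log-price differences, habit term, alpha_i
print("simulated frequency:", simulated_choice_frequency(index, omega, 200_000, seed=1))
print("logit probability:  ", logit_prob(index / omega))
```

The two numbers agree up to simulation noise, and the division by `omega` in the second line is the reason all coefficients of the logit model are identified only up to scale.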

3. Estimators for dynamic discrete choice models

Honoré and Kyriazidou (2000) consider the panel data logit model of Section 2, which contains unobservable individual-specific effects, exogenous explanatory variables, as well as the dependent variable lagged once:

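$$
\begin{aligned}
\Pr(d_{i0} = 1 \mid x_i, \alpha_i) &= p_0(x_i, \alpha_i), \qquad i = 1, \ldots, n, \\
\Pr(d_{it} = 1 \mid x_i, \alpha_i, d_{i0}, \ldots, d_{i,t-1}) &= \frac{\exp(x_{it}\beta + \gamma d_{it-1} + \alpha_i)}{1 + \exp(x_{it}\beta + \gamma d_{it-1} + \alpha_i)}, \qquad t = 1, \ldots, T;\ T \ge 3.
\end{aligned} \tag{3}
$$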

Here, $\beta$ is the parameter of interest, and $\alpha_i$ is an individual-specific effect which may depend on the exogenous explanatory variables $x_i \equiv (x_{i1}, \ldots, x_{iT})$. The model is left unspecified in the initial period 0 of the sample, since the value of the dependent variable is not assumed to be known in periods prior to the sample. It is assumed, however, that $d_{i0}$ is observed, so that there are at least four observations per individual. It is not necessary, however, to assume that the explanatory variables are observed in the initial sample period. It is important to note the implicit assumption that the transitory error terms in a threshold-crossing model leading to (3) are independent and identically distributed over time with logistic distributions, and independent of $(x_i, \alpha_i, d_{i0})$ in all time periods.
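To make the structure of model (3) concrete, the following Python sketch simulates a small panel from it. Everything specific in the sketch is an assumption for illustration: a single regressor, an individual effect that is correlated with the regressor path, and an arbitrary rule for the initial observation (the model itself leaves $p_0$ unspecified).

```python
import numpy as np

def simulate_panel(n, T, beta, gamma, seed=0):
    """Simulate d_it, t = 0..T, from a dynamic logit with individual effects."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, T + 1))                  # one regressor, periods 0..T
    # individual effect correlated with the regressors (hypothetical form)
    alpha = 0.5 * x[:, 1:].mean(axis=1) + rng.normal(size=n)
    d = np.zeros((n, T + 1), dtype=int)
    p0 = 1.0 / (1.0 + np.exp(-alpha))                # hypothetical initial-period rule p0
    d[:, 0] = rng.random(n) < p0
    for t in range(1, T + 1):
        index = x[:, t] * beta + gamma * d[:, t - 1] + alpha
        d[:, t] = rng.random(n) < 1.0 / (1.0 + np.exp(-index))
    return x, d, alpha

def persistence(d):
    """Share of consecutive periods in which the choice is repeated."""
    return np.mean(d[:, 1:] == d[:, :-1])

_, d_no_habit, _ = simulate_panel(5000, 3, beta=1.0, gamma=0.0, seed=4)
_, d_habit, _ = simulate_panel(5000, 3, beta=1.0, gamma=2.0, seed=4)
print("persistence without state dependence:", persistence(d_no_habit))
print("persistence with state dependence:   ", persistence(d_habit))
```

Even with `gamma=0.0` the simulated choices are serially persistent, because the individual effect is permanent; this is the spurious state dependence that the estimators discussed in this section try to separate from the true effect of the lagged choice.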

For model (3), Chamberlain (1993) has shown that, if individuals are observed in three time periods, i.e. if T = 2, then the parameters of the model are not identified. Honoré and Kyriazidou (2000) show that $\beta$ and $\gamma$ are both


identified (subject to regularity conditions) in the case where the econometrician has access to four or more observations per individual, i.e. $T \ge 3$. We next describe Honoré and Kyriazidou’s identification strategy for T = 3. Consider the events $A = \{d_{i0},\ d_{i1} = 0,\ d_{i2} = 1,\ d_{i3}\}$ and $B = \{d_{i0},\ d_{i1} = 1,\ d_{i2} = 0,\ d_{i3}\}$, where $d_{i0}$ and $d_{i3}$ are either 0 or 1. Then,

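$$\begin{aligned}
\Pr(A \mid x_i, \alpha_i) ={}& p_0(x_i, \alpha_i)^{d_{i0}} \, (1 - p_0(x_i, \alpha_i))^{1 - d_{i0}} \times \frac{1}{1 + \exp(x_{i1}\beta + \gamma d_{i0} + \alpha_i)} \\
& \times \frac{\exp(x_{i2}\beta + \alpha_i)}{1 + \exp(x_{i2}\beta + \alpha_i)} \times \frac{\exp(d_{i3} x_{i3}\beta + d_{i3}\gamma + d_{i3}\alpha_i)}{1 + \exp(x_{i3}\beta + \gamma + \alpha_i)}
\end{aligned}$$

and

$$\begin{aligned}
\Pr(B \mid x_i, \alpha_i) ={}& p_0(x_i, \alpha_i)^{d_{i0}} \, (1 - p_0(x_i, \alpha_i))^{1 - d_{i0}} \times \frac{\exp(x_{i1}\beta + \gamma d_{i0} + \alpha_i)}{1 + \exp(x_{i1}\beta + \gamma d_{i0} + \alpha_i)} \\
& \times \frac{1}{1 + \exp(x_{i2}\beta + \gamma + \alpha_i)} \times \frac{\exp(d_{i3} x_{i3}\beta + d_{i3}\alpha_i)}{1 + \exp(x_{i3}\beta + \alpha_i)}.
\end{aligned}$$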

In general, $\Pr(A \mid x_i, \alpha_i, A \cup B)$ will depend on $\alpha_i$. However, if $x_{i2} = x_{i3}$, then

$$\Pr(A \mid x_i, \alpha_i, A \cup B, x_{i2} = x_{i3}) = \frac{1}{1 + \exp((x_{i1} - x_{i2})\beta + \gamma(d_{i0} - d_{i3}))}, \tag{4}$$

which does not depend on $\alpha_i$. In the special case where all the explanatory variables are discrete and the $x_{it}$ process satisfies $\Pr(x_{i2} = x_{i3}) > 0$, one can use (4) to make inference about $\beta$ and $\gamma$. The resulting estimator will have all the usual properties (consistency and root-$n$ asymptotic normality).

While inference based only on observations for which $x_{i2} = x_{i3}$ may be reasonable in some cases (in particular, experimental cases where the distribution of $x_i$ is in the control of the researcher), it is not useful in many economic applications. However, if the continuous variables in $x_{i2} - x_{i3}$ have positive density at 0, we may think of constructing estimators that use observations for which $x_{i2}$ is close to $x_{i3}$. In particular, assuming for ease of exposition that all of the $k$ variables in $x_{it}$ are continuously distributed, and that sampling across individuals is random, Honoré and Kyriazidou propose estimating $\beta$ and $\gamma$ by maximizing

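$$\sum_{i=1}^{n} 1\{d_{i1} + d_{i2} = 1\} \, K\!\left(\frac{x_{i2} - x_{i3}}{h_n}\right) \ln\!\left(\frac{\exp((x_{i1} - x_{i2})b + g(d_{i0} - d_{i3}))^{d_{i1}}}{1 + \exp((x_{i1} - x_{i2})b + g(d_{i0} - d_{i3}))}\right)$$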


over some compact set. Here $K(\cdot)$ is a kernel density function which gives the appropriate weight to observation $i$, while $h_n$ is a bandwidth which shrinks to zero as $n$ increases. The asymptotic theory will require that $K(\cdot)$ be chosen so that a number of regularity conditions, such as $K(\nu) \to 0$ as $|\nu| \to \infty$, are satisfied. The effect of the term $K((x_{i2} - x_{i3})/h_n)$ is to give more weight to observations for which $x_{i2}$ is close to $x_{i3}$. The estimator $\hat{\theta}_n \equiv (\hat{\beta}_n, \hat{\gamma}_n)$ of $\theta_0 \equiv (\beta, \gamma)$ is shown to be consistent and to converge to a normal distribution at rate $\sqrt{n h_n^k}$, which, although slower than the standard $\sqrt{n}$ rate, can be made close to $\sqrt{n}$ under appropriate smoothness assumptions.
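The conditioning argument behind (4) can be checked numerically. The following is a minimal sketch, assuming a scalar regressor, arbitrary illustrative parameter values, and a constant initial-period probability `p0` (the function names are hypothetical; none of these choices comes from the paper). It computes $\Pr(A \mid A \cup B)$ directly from the path probabilities implied by model (3) for several values of $\alpha_i$, once with $x_{i2} = x_{i3}$ and once with $x_{i2} \neq x_{i3}$.

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def path_probability(d, x, beta, gamma, alpha, p0=0.5):
    """Probability of the path d = (d0, d1, d2, d3) under model (3).

    x = (x1, x2, x3) is a scalar regressor; p0 is an assumed initial-period
    probability (it cancels in the conditional probability below).
    """
    prob = p0 if d[0] == 1 else 1.0 - p0
    for t in (1, 2, 3):
        p_one = logistic(x[t - 1] * beta + gamma * d[t - 1] + alpha)
        prob *= p_one if d[t] == 1 else 1.0 - p_one
    return prob

def prob_A_given_A_or_B(d0, d3, x, beta, gamma, alpha):
    p_a = path_probability((d0, 0, 1, d3), x, beta, gamma, alpha)  # event A
    p_b = path_probability((d0, 1, 0, d3), x, beta, gamma, alpha)  # event B
    return p_a / (p_a + p_b)

# Illustrative values only (assumptions, not estimates from the paper).
beta, gamma = -1.0, 0.8
d0, d3 = 1, 0
x_equal = (0.3, -0.2, -0.2)    # x2 == x3
x_unequal = (0.3, -0.2, 0.9)   # x2 != x3
alphas = (-2.0, 0.0, 2.0)

# Right-hand side of (4), which involves no alpha.
closed_form = 1.0 / (1.0 + np.exp((x_equal[0] - x_equal[1]) * beta + gamma * (d0 - d3)))
with_equal_x = [prob_A_given_A_or_B(d0, d3, x_equal, beta, gamma, a) for a in alphas]
with_unequal_x = [prob_A_given_A_or_B(d0, d3, x_unequal, beta, gamma, a) for a in alphas]

print("closed form (4):", closed_form)
print("x2 == x3:", with_equal_x)
print("x2 != x3:", with_unequal_x)
```

With $x_{i2} = x_{i3}$ the three values agree with the closed form in (4); with $x_{i2} \neq x_{i3}$ they vary with $\alpha_i$, which is why the kernel weight concentrates on observations with $x_{i2}$ close to $x_{i3}$.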

The identification idea described above extends in a natural manner to the case of more than four observations per individual and also to the case of multinomial logit models. It is based on sequences where an individual switches between alternatives in any two of the middle $T - 1$ periods. For the binary choice model, the objective function in the case of general $T$ takes the form:

$$\sum_{i=1}^{n} \sum_{1 \leq t < s \leq T-1} 1\{d_{it} + d_{is} = 1\} \, K\!\left(\frac{x_{it+1} - x_{is+1}}{h_n}\right) \ln\!\left(\frac{\exp\big((x_{it} - x_{is})b + g(d_{it-1} - d_{is+1}) + g(d_{it+1} - d_{is-1})1\{s - t > 1\}\big)^{d_{it}}}{1 + \exp\big((x_{it} - x_{is})b + g(d_{it-1} - d_{is+1}) + g(d_{it+1} - d_{is-1})1\{s - t > 1\}\big)}\right). \tag{5}$$
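As an illustration of how (5) can be evaluated, here is a minimal sketch. It assumes a product of standard normal densities as the kernel and a simple list-of-arrays data layout; the function names and the layout are hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(u):
    """Product of standard normal densities over the k entries of u."""
    k = u.shape[-1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (k / 2.0)

def hk_objective(theta, d_list, x_list, h):
    """Kernel-weighted objective (5) at theta = (b, g).

    d_list[i]: choices (d_0, ..., d_T) of individual i, length T + 1.
    x_list[i]: array of shape (T + 1, k); row 0 is never used, since x_0
               need not be observed.
    h:         bandwidth h_n.
    """
    k = x_list[0].shape[1]
    b, g = theta[:k], theta[k]
    total = 0.0
    for d, x in zip(d_list, x_list):
        T = len(d) - 1
        for t in range(1, T):            # t = 1, ..., T-1
            for s in range(t + 1, T):    # s = t+1, ..., T-1
                if d[t] + d[s] != 1:     # only pairs with a switch contribute
                    continue
                weight = gaussian_kernel((x[t + 1] - x[s + 1]) / h)
                index = (x[t] - x[s]) @ b + g * (d[t - 1] - d[s + 1])
                if s - t > 1:
                    index += g * (d[t + 1] - d[s - 1])
                # ln( exp(index)^{d_t} / (1 + exp(index)) )
                total += weight * (d[t] * index - np.logaddexp(0.0, index))
    return total

# One individual with T = 3, so that (5) reduces to the T = 3 objective.
d_example = [np.array([1, 0, 1, 0])]
x_example = [np.array([[0.0], [0.3], [-0.2], [-0.1]])]
value = hk_objective(np.array([-1.0, 0.8]), d_example, x_example, h=0.5)
print(value)
```

To obtain estimates, the negative of this function could be passed to a numerical optimizer over $(b, g)$.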

Furthermore, Honoré and Kyriazidou (2000) also show that the model is identified even in the case where the logit assumption is relaxed and the distribution of the unobservable time-varying errors is left unspecified. In either the logistic or the semiparametric case, their approach suffers from several limitations: (i) the assumption that the errors in the underlying threshold-crossing model are independent over time. This assumption, however, typically underlies most estimation approaches that rely on the maximum likelihood principle, due to the otherwise prohibitive computational cost implied by the required integration over multiple dimensions. Furthermore, note that the independence over time assumption is also required by the theoretical model developed in Section 2. (ii) The assumption that $x_{it} - x_{is}$ has support in a neighborhood of 0 for any $t \neq s$, which rules out time dummies as explanatory variables. (iii) The fact that individual unobservable effects cannot be estimated, and hence it is not possible to carry out predictions or compute elasticities for individual agents or at specified values (e.g. means) of the explanatory variables. This latter restriction is also a drawback in all ‘fixed effects’ approaches that eliminate the individual-specific effects. It is, however, possible to calculate average elasticities for the observed (sample) population, as we discuss in the working paper version. But in contrast to other likelihood-based approaches, the Honoré and Kyriazidou approach does not require modeling of the initial observations of the sample. Further, it does not make any assumptions about the statistical relationship of the individual effects with the observed covariates or with the initial conditions.

The idea underlying the identification approach in Honoré and Kyriazidou (2000) is closely related to the conditional likelihood approach (see, for example, Chamberlain, 1984) for panel data logit models with individual-specific effects of the form of (3) when there is no dynamic feedback from the lagged choice, i.e. $\gamma = 0$. Inference concerning $\beta$ is based on the fact that, given the total number of times that the individual has chosen 1, $\sum_t d_{it}$, and given that there has been at least one switch between the two alternatives, the conditional probability of a particular history of choices between 0 and 1 is independent of $\alpha_i$. $\beta$ may then be estimated by maximizing the conditional log-likelihood:

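$$\sum_{i=1}^{n} \ln \frac{\exp\left(b \sum_{t=1}^{T} x_{it} d_{it}\right)}{\sum_{c \in C_i} \exp\left(b \sum_{t=1}^{T} x_{it} c_t\right)}, \tag{6}$$

where $C_i = \{c = (c_1, \ldots, c_T) \mid c_t = 0 \text{ or } 1 \text{ and } \sum_{t=1}^{T} c_t = \sum_{t=1}^{T} d_{it}\}$. It is also possible to estimate $\beta$ by maximizing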

$$\sum_{i=1}^{n} \sum_{1 \leq t < s \leq T} 1\{d_{it} + d_{is} = 1\} \ln\!\left(\frac{\exp((x_{it} - x_{is})b)^{d_{it}}}{1 + \exp((x_{it} - x_{is})b)}\right), \tag{7}$$

i.e. by forming all possible pairs of choices $d_{it}$ and $d_{is}$ where there has been a switch. The estimators defined by (6) and (7) coincide only for $T = 2$. Although the pairwise estimator defined by (7) is not a maximum likelihood estimator, it may be used in cases where $T$ is large, in which case the conditional likelihood approach (6) may become computationally infeasible. The argument that leads to (6) and (7) breaks down, however, when the lagged choice enters the model. In other words, the estimators defined either by (6) or (7) are inconsistent when $d_{it-1}$ is included as an additional variable in $x_{it}$.
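The relationship between (6) and (7) can be illustrated with a short sketch for a single individual. The function names, the brute-force enumeration of $C_i$, and the numerical values are assumptions made for this example; the enumeration also shows why (6) becomes costly for large $T$, since the number of sequences in $C_i$ grows combinatorially.

```python
import itertools
import numpy as np

def cl_loglik(b, d, x):
    """Contribution of one individual to the conditional log-likelihood (6).

    d: array of T choices (0/1); x: array of shape (T, k); b: array of shape (k,).
    """
    T = len(d)
    n_ones = int(np.sum(d))
    xb = x @ b
    numerator = xb @ d
    # Enumerate every sequence c with the same number of ones as d.
    terms = []
    for ones in itertools.combinations(range(T), n_ones):
        c = np.zeros(T)
        c[np.array(ones, dtype=int)] = 1.0
        terms.append(xb @ c)
    return numerator - np.logaddexp.reduce(terms)

def pairwise_loglik(b, d, x):
    """Contribution of one individual to the pairwise objective (7)."""
    T = len(d)
    total = 0.0
    for t in range(T):
        for s in range(t + 1, T):
            if d[t] + d[s] == 1:
                index = (x[t] - x[s]) @ b
                total += d[t] * index - np.logaddexp(0.0, index)
    return total

b = np.array([1.0])

# T = 2: the two objectives coincide for every choice sequence.
x2 = np.array([[0.5], [-0.3]])
for d in itertools.product([0, 1], repeat=2):
    d = np.array(d, dtype=float)
    print(d, cl_loglik(b, d, x2), pairwise_loglik(b, d, x2))

# T = 3: they generally differ.
x3 = np.array([[0.5], [-0.3], [1.2]])
d3 = np.array([1.0, 0.0, 0.0])
print(cl_loglik(b, d3, x3), pairwise_loglik(b, d3, x3))
```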

It is also well known (see for example Chamberlain, 1985; Magnac, 1997) that the conditional likelihood approach may be used to estimate panel data logit models of the form of (3) where the lagged dependent variable is the only explanatory variable, i.e. $\beta = 0$, provided that there are at least four observations per individual. The resulting estimators, however, are again inconsistent when other explanatory variables besides the lagged choice are included in the model.

We proceed to describe some of the other methods that are typically used in estimating binary choice models. To facilitate comparisons with the conditional likelihood approach and Honoré and Kyriazidou's estimator, we will focus on the logistic specification and assume that the time-varying error terms are serially independent.

In the absence of the lagged dependent variable ($\gamma = 0$) and of individual heterogeneity ($\alpha_i = 0$), the independence over time assumption on the time-varying errors implies that model (3) may be estimated by pooling observations over time. The log-likelihood takes the form:

$$\sum_{i=1}^{n} \sum_{t=1}^{T} \left[ d_{it} \ln \Lambda(x_{it}b) + (1 - d_{it}) \ln (1 - \Lambda(x_{it}b)) \right], \tag{8}$$

where $\Lambda$ is the logistic function: $\Lambda(v) = \exp(v)/(1 + \exp(v))$.

In the absence of dynamics ($\gamma = 0$), and in the case where individual effects are present, the standard random effects approach postulates a functional form for the conditional density of the individual effects given the entire path of the $x_{it}$ process. Typically, however, the individual effects are assumed independent of the exogenous variables, i.e. $f(\alpha_i \mid x_{i1}, \ldots, x_{iT}) = f(\alpha_i)$, where $f(\cdot)$ denotes a density function specified up to a finite number of parameters, e.g. it may be taken to be the density of the normal distribution. Under these assumptions, the log-likelihood takes the form:

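$$\sum_{i=1}^{n} \ln \int \prod_{t=1}^{T} \left[ \Lambda(x_{it}b + \alpha)^{d_{it}} \, (1 - \Lambda(x_{it}b + \alpha))^{(1 - d_{it})} \right] f(\alpha) \, d\alpha. \tag{9}$$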
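One common way to evaluate the integral in (9) when $f$ is a normal density is Gauss-Hermite quadrature. The sketch below assumes $\alpha \sim N(\mu_\alpha, \sigma_\alpha^2)$ and uses hypothetical function names and a list-of-arrays data layout; the paper does not state which numerical integration method was used, so this is only one possibility.

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def pooled_loglik(b, d_list, x_list):
    """Pooled logit log-likelihood (8)."""
    total = 0.0
    for d, x in zip(d_list, x_list):
        p = logistic(x @ b)
        total += np.sum(d * np.log(p) + (1 - d) * np.log(1 - p))
    return total

def re_loglik(b, mu, sigma, d_list, x_list, n_nodes=20):
    """Random effects log-likelihood (9) with alpha ~ N(mu, sigma^2) (assumed).

    Uses the rule: integral of g(a) * N(a; mu, sigma^2) da
                   ~ (1 / sqrt(pi)) * sum_j w_j * g(mu + sqrt(2) * sigma * z_j).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    alphas = mu + np.sqrt(2.0) * sigma * nodes
    total = 0.0
    for d, x in zip(d_list, x_list):
        p = logistic((x @ b)[:, None] + alphas[None, :])        # shape (T, n_nodes)
        lik_given_alpha = np.prod(np.where(d[:, None] == 1, p, 1.0 - p), axis=0)
        total += np.log(weights @ lik_given_alpha / np.sqrt(np.pi))
    return total

# Small illustrative data set (values are arbitrary).
rng = np.random.default_rng(1)
x_demo = [rng.standard_normal((4, 2)) for _ in range(3)]
d_demo = [np.array([1, 0, 0, 1]), np.array([0, 0, 1, 1]), np.array([1, 1, 1, 0])]
b_demo = np.array([-1.0, 0.5])

print(re_loglik(b_demo, 0.0, 0.0, d_demo, x_demo))   # degenerate alpha: equals (8)
print(pooled_loglik(b_demo, d_demo, x_demo))
print(re_loglik(b_demo, 0.2, 1.5, d_demo, x_demo))
```

Setting `sigma` to zero collapses the heterogeneity distribution to a point mass at `mu`, so with `mu = 0` the function reproduces the pooled log-likelihood (8).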

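In the case where $\gamma \neq 0$ and there is no individual heterogeneity ($\alpha_i = 0$) the log-likelihood is

$$\sum_{i=1}^{n} \sum_{t=1}^{T} \left[ d_{it} \ln \Lambda(x_{it}b + g d_{it-1}) + (1 - d_{it}) \ln (1 - \Lambda(x_{it}b + g d_{it-1})) \right] + \sum_{i=1}^{n} \left[ d_{i0} \ln p_{i0} + (1 - d_{i0}) \ln (1 - p_{i0}) \right]. \tag{10}$$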

To the extent that $T$ is small relative to $n$, assumptions have to be made about the initial observations $d_{i0}$.⁹ The typical approach assumes that these are exogenous and treats them as nonstochastic constants. The model is then estimated by maximizing with respect to $b$ and $g$ the first term of (10), which is the log-likelihood conditional on the initial choices $d_{i0}$ and which coincides with (8) with the lagged choice $d_{it-1}$ entering as another variable in $x_{it}$. The assumption that the initial choices are exogenous may be a reasonable assumption if the initial observations in the sample coincide with the initialization of the process. However, even in this case, this exogeneity assumption will most likely fail if permanent unobserved individual heterogeneity is present.

In the case where $\gamma, \alpha_i \neq 0$, the log-likelihood is

$$\sum_{i=1}^{n} \ln \int \left\{ \prod_{t=1}^{T} \left[ \Lambda(x_{it}b + \alpha + g d_{it-1})^{d_{it}} \, (1 - \Lambda(x_{it}b + \alpha + g d_{it-1}))^{(1 - d_{it})} \right] \right\} p_0(x_i, \alpha)^{d_{i0}} \, (1 - p_0(x_i, \alpha))^{(1 - d_{i0})} f(\alpha \mid x_i) \, d\alpha. \tag{11}$$

9 See also the discussion of the initial conditions problem in Hsiao (1986, Chapter 7).


Thus, one needs to specify a functional form for the distribution of the initial observations conditional on the exogenous variables and the individual effects, $p_0(x_i, \alpha)$, as well as a form for the distribution of the individual effects conditional on the entire path of the exogenous variables, $f(\alpha \mid x_i)$. In most applied marketing research, however, estimation of the model is based on (9) with the lagged choice entering as another variable in $x_{it}$. That is, initial observations are treated as nonstochastic variables and individual effects are assumed independent of the observed covariates and the initial conditions.

In the analysis that follows we will compare the results obtained by the different methods described above using a panel data set of household yogurt purchases.
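The consequence of treating the lagged choice as exogenous while ignoring permanent heterogeneity can be illustrated with a small simulation. This is a sketch under assumed values (a scalar regressor, $\beta = -1$, true $\gamma = 0$, normally distributed $\alpha_i$ with standard deviation 2, and an initial choice that depends on $\alpha_i$); it is not the Monte Carlo design used in the paper, and the function names are hypothetical. The pooled logit is fitted by Newton-Raphson on the first term of (10).

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Pooled logit by Newton-Raphson; returns the coefficient vector."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ coef)))
        gradient = X.T @ (y - p)
        hessian = (X * (p * (1.0 - p))[:, None]).T @ X
        coef = coef + np.linalg.solve(hessian, gradient)
    return coef

def simulate_panel(n, T, beta, gamma, sigma_alpha, rng):
    """Simulate choices d_0, ..., d_T from model (3) with a scalar regressor."""
    alpha = sigma_alpha * rng.standard_normal(n)
    x = rng.standard_normal((n, T + 1))
    d = np.zeros((n, T + 1))
    # Assumed initial condition: the first choice depends on alpha only.
    d[:, 0] = rng.random(n) < 1.0 / (1.0 + np.exp(-alpha))
    for t in range(1, T + 1):
        p = 1.0 / (1.0 + np.exp(-(x[:, t] * beta + gamma * d[:, t - 1] + alpha)))
        d[:, t] = rng.random(n) < p
    return d, x

rng = np.random.default_rng(0)
d, x = simulate_panel(n=2000, T=5, beta=-1.0, gamma=0.0, sigma_alpha=2.0, rng=rng)

y = d[:, 1:].ravel()
X = np.column_stack([np.ones(y.size), x[:, 1:].ravel(), d[:, :-1].ravel()])
const_hat, b_hat, g_hat = fit_logit(X, y)
print("b_hat:", b_hat, "g_hat:", g_hat)   # true gamma is 0 in this simulation
```

In this design the estimated coefficient on the lagged choice comes out clearly positive even though the true $\gamma$ is zero, because the lagged choice proxies for the omitted $\alpha_i$.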

4. Data description

We use the A.C. Nielsen data on yogurt purchases in the city of Sioux Falls, South Dakota. The choice of Sioux Falls as a site for the panel was driven by (a) the proximity of its demographic profile to that of the United States, and (b) the ability to monitor purchasing in all major grocery outlets. The complete data are for 2 years, from 1986 to 1988. During that time period, a sample of households in the market were issued magnetized cards. Each time the household shopped at a grocery store, it presented the card at the check-out counter. All purchases made by the household were then scanned and provided to the data-gathering agency. The agency also collected weekly data on marketing variables that influence consumer choice, such as the shelf prices for the different brands, which brands, if any, were on display in the store that week, and/or were featured in local newspaper advertisements or in the stores’ flyers. It is therefore possible to re-create the store environment for each purchase occasion made by a household member. In addition to the marketing variables, detailed demographic information on the households in the sample is also available.

We choose the two dominant yogurt brands in this market, Yoplait and Nordica, for the analysis. These brands account for 18.4 and 19.5%, respectively, of yogurt purchases in terms of weight, and 21.2 and 23.9% in terms of number of units bought. We focus our attention on the most popular size of these brands, the 6 oz packages. This size accounts for the majority of purchases made of the selected brands (92% for Yoplait and 98% for Nordica, in terms of units bought).

Our initial sample consists of 1318 households who bought yogurt at least once between 17 September 1986 and 1 August 1988. On each purchase occasion a household member buying yogurt may purchase multiple units of the same variety and brand, or different varieties of the same brand (e.g. non-fat, low-fat, etc.), or even different brands. In the data used in the estimation, we ignore the quantity decision, and record the brand choice on each purchase occasion irrespective of which varieties of the same brand were bought (calculating a weighted price accordingly). In the case where multiple brands were bought, we randomly select one of the purchased brands.¹⁰ This gives us 17,679 purchase occasions, out of which 3813 are for Yoplait and 4739 are for Nordica. The remaining purchases are of other brands.

10 There are very few purchase occasions where multiple brands were bought, approximately 3% in the complete data set.

Since we focus attention on only the top two brands in this market, we remove from the data all purchase occasions where other brands were bought. Furthermore, because some of the estimation methods use the lagged choice as an explanatory variable, we keep in the data only those households that have at least two consecutive purchases of any one of the two brands under consideration. This leaves us with 737 households and 5618 purchase occasions, out of which 2718 are for Yoplait and the remaining 2900 for Nordica. The panel is unbalanced. The minimum number of purchase occasions per household is 2, for the reason explained above, while the maximum is 305. The mean number of purchases is 9.5 and the median is 5 (as compared to 13.4 and 8, respectively, in the original sample).

The marketing variables that are available for these data are the shelf price, and the presence or absence of a store display and of a feature advertisement for each brand (the latter coded as 0–1 dummy variables). Although the data provide us with information on the value of coupons redeemed by the household, we do not use this information to calculate net prices, i.e. the shelf price for each brand net of any coupons redeemed. One reason is that redemption is observed only for the brand that is purchased, which may introduce selection bias in estimating the effect of price on brand choice (the issue of selection bias when including coupon information has been addressed by Chiang, 1995). Furthermore, in 311 of the 710 purchase occasions where coupons were redeemed, net prices are negative, indicating that there may be problems with the manner in which the value of redeemed coupons is recorded in the data. However, in some of the specifications that we estimate we use the information whether a household ever uses coupons or not.

Table 1 provides descriptive statistics for the sample of 5618 purchase occasions. On average, Nordica is cheaper and it exhibits higher frequency of store displays and feature advertisements.

Table 1
Summary statistics of marketing variables: two-brand sample (a)

Brand      Share (%)   Average price (cent/oz)   Proportion of displays (%)   Proportion of features (%)
Nordica    51.62       6.66 (1.01)               5.14                         26.63
Yoplait    48.38       9.90 (1.05)               1.89                         1.89

(a) Standard deviations in parentheses.

Demographic variables are often important in determining brand choices. In some of the specifications that we estimate below we use several such variables: the mean income of the income class that the household belongs to (INCOME); the household size (HHSIZE); a dummy for full-time employed single households (HH-S); a dummy for households with two full-time employed heads (HH-WW); and a dummy that equals one if a household uses a coupon at least once during the entire sampling period (HH-C). Table 2 gives summary statistics for these variables for the households that were used in the estimation. It is perhaps noteworthy that approximately 50% of the households in our sample use discount coupons at least once.

Table 2
Summary statistics of demographic variables

Variable                            Mean    Median   S.D.    Proportion (%)
INCOME (in thousands of dollars)    29.5    27.5     16.8
HHSIZE                              3.0     3.0      1.4
HH-S                                                         12.5
HH-WW                                                        32.8
HH-C                                                         48.0

5. Estimation results

In this section we describe our empirical results obtained by the different estimation methods outlined in Section 3. In the most general specification, the model is given by (3), where $d_{it} = 1$ if household $i$ chooses Yoplait in period $t$ and $d_{it} = 0$ if it chooses Nordica. The exogenous variables in $x_{it}$ are $P_{it}$, the difference in the prices between the two brands (in natural logarithms; see Section 2), and $D_{it}$ and $F_{it}$, the differences in the dummy variables for the two brands that describe, respectively, whether the brand was displayed in the store and whether it was featured in an advertisement that week. These last two explanatory variables, $D_{it}$ and $F_{it}$, are discrete, taking on three possible values: 1, 0, and −1. Thus the model has one continuous and two discrete variables as exogenous regressors.
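The construction of the three regressors can be sketched as follows. The direction of each difference (Yoplait minus Nordica, matching $d_{it} = 1$ for Yoplait) is an assumption made for this illustration, as are the function name and the numerical values.

```python
import numpy as np

def build_regressors(price_y, price_n, display_y, display_n, feature_y, feature_n):
    """Stack (P, D, F) for each purchase occasion.

    Assumed sign convention: Yoplait minus Nordica.
    Prices are shelf prices; displays and features are 0/1 dummies.
    """
    P = np.log(price_y) - np.log(price_n)   # log price difference (continuous)
    D = display_y - display_n               # takes values in {-1, 0, 1}
    F = feature_y - feature_n               # takes values in {-1, 0, 1}
    return np.column_stack([P, D, F])

# Three hypothetical purchase occasions (prices in cents per oz).
price_y = np.array([9.9, 10.5, 9.0])
price_n = np.array([6.7, 6.7, 7.2])
display_y = np.array([0, 1, 0])
display_n = np.array([1, 1, 0])
feature_y = np.array([0, 0, 1])
feature_n = np.array([1, 0, 0])

x = build_regressors(price_y, price_n, display_y, display_n, feature_y, feature_n)
print(x)
```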


We estimate the model using the following approaches.

(a) The conditional likelihood (logit) approach without the lagged choice (CL), and with the lagged choice treated as exogenous (CLL).

(b) The conditional logit pairwise approach without the lagged choice (CLP), and with the lagged choice treated as exogenous (CLLP).

(c) The pooled logit approach without the lagged choice (PL), and with the lagged choice treated as exogenous (PLL).

(d) The pooled logit approach with normally distributed random effects without the lagged choice (PLHET) and with the lagged choice treated as exogenous (PLLHET).

(e) The Honoré and Kyriazidou approach (HK).

The objective functions that correspond to the estimation methods above are (6), (7), (8), (9), and (5), respectively. For methods (a)–(d), the lagged choice is treated as an exogenous variable whenever it is included in the estimation. As noted in Section 3, this produces consistent estimators for approaches (c) and (d) to the extent that initial observations are exogenous. For (a) and (b), however, treating $d_{it-1}$ as an additional variable in $x_{it}$ produces inconsistent results.

Before we present our results, some comments on the way the data are used in the estimation are in order. In what follows we will use the term ‘string’ to denote a consecutive sequence of Yoplait and Nordica purchases. Thus, a household who buys a third brand on a purchase occasion before the very last one produces more than one string. For the reason explained in Section 4, we only consider strings of length at least 2. Thus, we do not concatenate a household’s strings to produce a single household purchase history of only the two brands under consideration, which would introduce bias in the estimate of the coefficient of the lagged choice. Instead, we treat each string as an additional purchase sequence by the same household. This gives us a panel of 1400 strings instead of 737, which would correspond to one for each household in the sample if we merely deleted all purchase occasions where a third brand was bought. In view of the way we construct the data used in the estimation, the $i$ subscript in the objective functions now denotes a string instead of a household. We do, however, keep track of the household identity that each string belongs to when we estimate the model using the random effects approach (d). The constructed panel of strings is also unbalanced. Thus, subscripts $t$ in the objective functions run from 1 to $T_i$, where $T_i$ now denotes the length of a string.
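The construction of strings described above can be sketched as follows. The function name, the brand labels used for the purchase history, and the 0/1 coding step are illustrative assumptions rather than the authors' code.

```python
def split_into_strings(purchases, brands=("Yoplait", "Nordica"), min_len=2):
    """Split one household's purchase history into strings.

    A string is a run of consecutive purchases of the two brands of interest;
    a purchase of any other brand ends the current string. Strings shorter
    than min_len are dropped.
    """
    strings, current = [], []
    for brand in purchases:
        if brand in brands:
            current.append(brand)
        else:
            if len(current) >= min_len:
                strings.append(current)
            current = []
    if len(current) >= min_len:
        strings.append(current)
    return strings

# Hypothetical purchase history of a single household.
history = ["Yoplait", "Nordica", "Other", "Yoplait", "Other",
           "Nordica", "Nordica", "Yoplait"]
strings = split_into_strings(history)

# Code choices as in the text: d = 1 for Yoplait, d = 0 for Nordica.
coded = [[1 if brand == "Yoplait" else 0 for brand in s] for s in strings]
print(coded)
```

In this example the single Yoplait purchase between two purchases of another brand forms a string of length one and is dropped, so the household contributes two strings.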

All of the methods above that use the lagged choice as an explanatory variable (CLL, CLLP, PLL, PLLHET, HK) ignore the information in the initial observation of each string, except for using the initial brand choice as an explanatory variable. From the methods that do not include the lag, CL and CLP use all information of each string, including the initial observation.


Table 3
Estimates using various approaches (a)

Method      $\beta_p$   $\beta_d$   $\beta_f$   $\gamma$    $\mu_\alpha$   $\sigma_\alpha$

CL          −3.662      0.470       0.986
            (0.334)     (0.214)     (0.117)
CLL         −3.347      0.828       0.924       −0.068
            (0.399)     (0.278)     (0.141)     (0.140)
CLP         −2.186      0.512       0.989
            (0.136)     (0.106)     (0.051)
CLLP        −1.943      0.716       1.036       0.539
            (0.152)     (0.128)     (0.057)     (0.064)
CLP-W       −3.159      0.642       0.692
            (0.495)     (0.287)     (0.171)
CLLP-W      −3.389      0.784       0.632       0.046
            (0.552)     (0.340)     (0.200)     (0.218)
PL          −2.118      0.600       1.485                   1.083
            (0.178)     (0.140)     (0.074)                 (0.073)
PLF         −2.520      0.661       1.526                   1.119
            (0.151)     (0.111)     (0.061)                 (0.063)
PLL         −3.049      0.853       1.392       3.458       −0.333
            (0.249)     (0.174)     (0.091)     (0.084)     (0.102)
PLHET       −3.400      0.921       1.366                   0.898          2.118
            (0.293)     (0.207)     (0.108)                 (0.142)        (0.077)
PLHETF      −3.724      0.867       1.454                   1.022          1.944
            (0.226)     (0.151)     (0.084)                 (0.115)        (0.062)
PLLHET      −3.821      1.031       1.456       2.126       0.198          1.677
            (0.313)     (0.217)     (0.113)     (0.114)     (0.150)        (0.086)
HK05        −3.477      0.261       0.782       1.223
            (0.679)     (0.470)     (0.267)     (0.352)
HK10        −3.128      0.248       0.759       1.198
            (0.658)     (0.365)     (0.228)     (0.317)
HK30        −2.644      0.289       0.724       1.192
            (0.782)     (0.315)     (0.195)     (0.291)
HK05-W      −2.432      0.770       0.659       0.558
            (0.654)     (0.439)     (0.211)     (0.254)
HK10-W      −2.626      0.788       0.635       0.590
            (0.605)     (0.387)     (0.194)     (0.232)
HK30-W      −2.778      0.779       0.619       0.627
            (0.575)     (0.368)     (0.187)     (0.220)
PLLHET-S    −3.419      1.095       1.291       1.550       0.681          1.161
            (0.326)     (0.239)     (0.119)     (0.117)     (0.156)        (0.081)

(a) Standard errors in parentheses.

PL and PLHET ignore the initial observation of each string completely. We do, however, estimate the model with PL and PLHET using all observations, including the strings with only one purchase of any one of the two brands. These estimates are denoted by PLF and PLHETF in Table 3. Including these additional observations increases the number of purchase occasions to 8552, out of which 4739 are for Nordica and the remaining for Yoplait.

Note that the CL and CLL procedures are based on strings of length of at least two and three purchases, respectively, where at least one switch between the two brands occurs. Their computational cost explodes for long strings that have a lot of such switches. In the results presented below, the maximum string length was set equal to 20. For longer strings, the remaining observations of the string were treated as new strings.

The HK method requires choosing the bandwidth, $h_n$, and a functional form for the kernel function $K(\cdot)$. We specify $h_n = h \times n^{-1/5}$ where $n$ now denotes the total number of strings and $h$ is a positive constant, set equal to 0.5, 1.0 and 3.0. The kernel function is taken to be the standard normal density function. Note that in this case the objective function is globally concave so that we do not have to worry about local maxima.

Finally, we discuss how we deal with the unbalanced nature of the constructed panel of strings. Our working assumption is that $T_i$, the length of each of the strings constructed as discussed above, is a random variable that is independent of brand choice. Furthermore, the assumptions corresponding to the various estimation methods all hold conditional on $T_i$. Therefore, the objective functions that correspond to the maximum-likelihood estimators (CL, PL, PLL, PLHET, PLLHET) are correctly specified. For the pairwise difference methods which are not based on the likelihood principle (CLP, CLLP, HK), the objective functions (Eqs. (7) and (5)) give proportionally more weight to longer strings. We may think of weighting each string so that it receives a weight proportional to its length, $T_i$. In particular, in the objective function of CLP (Eq. (7)), we multiply the contribution of each pair that belongs to a string of length $T_i$ by $1/T_i$. Note that this is the same weighting that would be required to make the estimates from a pairwise difference approach equal to those obtained by taking deviations from individual means (the standard ‘within’ approach) in a linear fixed effects model with an unbalanced data set. In the linear model, this weighting is optimal in the sense that it produces the MLE estimates under a normality assumption. In the CLLP procedure, the effective string length is $(T_i - 1)$, given that the method conditions on the initial observation. In this case, we therefore use weights equal to $1/(T_i - 1)$. Similarly, for HK, the effective string length is $(T_i - 2)$, since the method conditions on the initial and the last observation of each string. We therefore use weights equal to $1/(T_i - 2)$. In Table 3, the estimates produced by this weighting scheme are denoted by the suffix W. We should point out, however, that this choice of weights is arbitrary. The estimators defined by Eqs. (7) and (5) may be motivated as GMM estimators that satisfy moment conditions that result from the first order conditions of a limiting maximization problem.
We might therefore use instead the optimal weighting scheme described in Hansen (1982) to produce efficient (within a certain class) estimates that would also account for the possibly unbalanced nature of the data. For the other methods, which are all based on the maximum likelihood principle (CL, PL, PLL, PLHET, PLLHET), such weighting is not required since we are assuming that $T_i$ is exogenous.

Table 3 provides the estimation results from the different procedures for the coefficients $\beta_p$, $\beta_d$, and $\beta_f$ of $P_{it}$, $D_{it}$, and $F_{it}$, respectively, and of $\gamma$, the coefficient on the lagged endogenous variable $d_{it-1}$. The table reveals that almost all procedures yield statistically significant coefficients with the expected signs. Specifically, an increase in the price of a brand reduces the probability of choosing that brand, and the presence of a store display or of a feature advertisement for a brand makes purchase of that brand more likely. We also note that most methods produce positive estimates for $\gamma$, i.e. a previous purchase of a brand increases the probability of purchasing the same brand in the next period.

With respect to the conditional methods we note the following. The likelihood approaches, CL and CLL, produce estimates that are in general close to those of the weighted pairwise approaches, CLP-W and CLLP-W. The estimates for $\beta_p$ range between −3.2 and −3.7. For $\beta_d$ the estimates from the likelihood and the weighted pairwise approach range between 0.5 and 0.8. The differences in the $\beta_f$ estimates are somewhat larger; the estimates range between 0.6 and 1.0. Both CLL and CLLP-W produce statistically insignificant $\gamma$’s. The CLL estimate, however, is negative. As we discuss in the Monte Carlo section below, both the likelihood and the pairwise approaches give $\gamma$’s that are on average negatively biased toward zero. The unweighted pairwise approaches produce $\beta_p$’s that are considerably lower (in absolute value) than their weighted counterparts, around −2.0. This may be explained by the fact that, without weighting, longer strings contribute more to the objective function. Thus, longer strings that display loyal brand choice and are therefore less price elastic receive more weight. The coefficients on the other exogenous variables, $\beta_d$ and $\beta_f$, are estimated close to the values reported above. In contrast to CLL and CLLP-W, however, CLLP produces a statistically significant estimate for $\gamma$, which is estimated positive around 0.5. Again, a possible explanation for the large difference between the value of $\gamma$ as estimated by CLLP and the values produced by CLL and CLLP-W is that, without weighting, longer strings that display loyal brand choice contribute more to the objective function. Thus, it appears that weighting has important implications for producing point estimates.

With respect to the pooled methods we note the following. $\beta_p$ is estimated in the range of −3.0 to −3.8, with the exception of PL and PLF which estimate it at −2.1 and −2.5, respectively. Observe that the estimates increase (in absolute value) monotonically when the lagged choice is included as an explanatory variable and also when unobserved household heterogeneity is allowed for. That is, controlling for possible state dependence and household effects results in higher price elasticity estimates. The estimates for $\beta_d$ range from 0.6 to 1.0 and are close to those obtained by the conditional methods. We note the same pattern as for the $\beta_p$ estimates, namely that including the lag and allowing for household heterogeneity raises $\beta_d$ monotonically. The $\beta_f$ estimates are all very close, in the neighborhood of 1.5. The lagged choice is found to have a large positive effect on brand choice: PLL estimates $\gamma$ at 3.5. However, introducing heterogeneity lowers it substantially to 2.1 (PLLHET). This is due to the fact that ignoring possible heterogeneity introduces spurious state dependence, a point originally raised by Heckman (1981). Thus, the PLL estimate of $\gamma$ may also capture unobserved heterogeneity. Indeed, we find that there is substantial unobserved heterogeneity in the sample. All methods that estimate random effects give high values for $\sigma_\alpha$, the standard deviation of the household effects, ranging from 1.7 (PLLHET) to 2.1 (PLHET). Note that introducing the lag lowers $\sigma_\alpha$, or in other words, ignoring possible state dependence exaggerates the amount of unobserved heterogeneity. Furthermore, we note that the presence of the lagged choice lowers the estimated mean $\mu_\alpha$ of $\alpha_i$, from approximately 0.9 (PLHET) and 1.0 (PLHETF) to 0.2 (PLLHET) when heterogeneity is allowed for. The same observation applies to the case where $\alpha_i$ is assumed to have a degenerate distribution ($\sigma_\alpha = 0$), in which case the estimated mean $\mu_\alpha$ drops from approximately 1.1 (PL and PLF) to −0.3 (PLL) when the lagged choice is included in the explanatory variable set. Finally, the estimates produced by the methods that use the entire sample (PLF and PLHETF) are higher in absolute value for all coefficients compared to those produced by the same procedures which use only the restricted sample that ignores the initial observation and only considers strings of length at least equal to 2. The differences, however, are not large.

With respect to the HK estimates we observe the following. The $\beta_p$ estimates are in general close to those obtained by the other methods, although the weighting scheme that we employ lowers the estimates considerably for two out of the three bandwidths considered.¹¹ For $\beta_d$, the weighted HK estimates range between 0.5 and 0.8 for the different bandwidths and are close to the values estimated by the other methods, especially those of the conditional pairwise logit approach. However, without the weighting $\beta_d$ drops by almost a half for all bandwidths and becomes insignificant. The $\beta_f$ estimates are all close, estimated between 0.6 and 0.8, close to those obtained from CLP and CLLP, but lower than the values estimated by the other methods. The estimates of $\gamma$ are again quite different with and without weighting, ranging between 0.6 (with the weighting) and 1.3 (without weighting), and they are significantly lower than those estimated by the pooled methods. We note some sensitivity in the point estimates of all coefficients with respect to the bandwidth choice, although the estimates are well within one standard error of each other for the range of bandwidths considered. As expected, the estimated standard errors on all coefficients are considerably larger than those obtained for the conditional and the pooled estimates, especially when no weighting is used. This is due to the fact that HK uses only a subset of the sample used by the other methods, namely only strings of length at least four, and to the nonparametric component in the HK method which yields estimators that converge at slower than the standard square-root of the sample size rate. Intuitively, the kernel weighting in the HK objective function implies that only a subset of the data is essentially given non-zero weight, which means that the effective sample size is smaller than that used by either the conditional or the pooled methods.

11 Using weights equal to $1/T_i$ instead of $1/(T_i - 2)$ hardly changed the point estimates.

The large difference between the estimate of PLLHET for the habit effect, $\gamma$, and the estimates obtained by HK may be explained by the fact that the HK method uses strings that exhibit at least one switch between the two brands (excluding the initial and last observation in each string) and that it uses strings of length at least equal to four. Excluding households that are completely loyal to one brand (PLLHET-S) produces, as expected, a lower estimate for $\gamma$, approximately equal to 1.6. From an economic point of view it seems that the habit effect as estimated by PLLHET is too large. For example, using the PLLHET estimate, we find that having consumed a brand in the previous period increases the probability of buying the same brand again from 0.50 (which is the probability of consuming any one of the two brands in the sample) to approximately 0.89.¹² It is likely that the latter estimate still reflects some spurious state dependence due to household heterogeneity that is not sufficiently captured. We proceed to investigate this possibility by estimating several specifications that include household demographic information.

Table 4 presents results using PLL and PLLHET when household characteristics and their interactions with prices are included as additional explanatory variables. The first column for each method shows the estimation results when all household characteristics and price interactions are included.¹³ In this case, none of the estimates of the coefficients on the additional variables is statistically significant. The second column shows the results when the least significant variables are dropped and the remaining variables become significant. The estimates of $\beta_d$, $\beta_f$, and $\gamma$ are little influenced by the inclusion of demographic variables and their interactions with price. The coefficients

12 See the working version of the paper, available at the corresponding author’s web sites, for the calculations leading to this estimate.

13 Note that there may be an endogeneity issue pertaining to the use of the dummy HH-C, which denotes whether a household ever uses a coupon or not, since this decision may be determined contemporaneously with the brand choice decision.


Table 4
Estimates with demographic variables^a

Variable              PLL                     PLLHET

PRICE                 −2.227      −3.009      −4.516      −4.090
                      (1.284)     (0.252)     (1.683)     (0.323)
DISPLAY               0.870       0.878       1.024       1.035
                      (0.176)     (0.176)     (0.219)     (0.218)
FEATURE               1.373       1.372       1.441       1.443
                      (0.092)     (0.092)     (0.113)     (0.113)
LAG                   3.419       3.417       2.108       2.116
                      (0.084)     (0.084)     (0.114)     (0.114)
CONSTANT              −1.666      −1.364      −0.741      −0.821
                      (0.523)     (0.243)     (0.781)     (0.466)
ln INCOME             0.327       0.212       0.333       0.320
                      (0.178)     (0.073)     (0.261)     (0.138)
HHSIZE                0.065       0.103       −0.067
                      (0.078)     (0.037)     (0.122)
HH-S                  0.535       0.494       0.318
                      (0.325)     (0.129)     (0.476)
HH-WW                 0.169                   0.217
                      (0.236)                 (0.343)
P ln INCOME           −0.336                  0.058
                      (0.428)                 (0.552)
PHHSIZE               0.080                   0.087
                      (0.184)                 (0.251)
PHH-S                 −0.844                  1.187       1.708
                      (0.747)                 (0.920)     (0.493)
PHH-WW                −0.702      −0.387      −0.574
                      (0.551)     (0.216)     (0.697)
PHH-C                 0.200                   0.280
                      (0.204)                 (0.375)
σ_α                                           1.655       1.660
                                              (0.086)     (0.086)

Total price effect^b  −2.600      −3.060      −4.265      −4.002
−Log-likelihood       2093.462    2094.853    1821.359    1822.5326

a Standard errors in parentheses.
b Evaluated at the means of demographic variables.

on the price are, as expected, less precisely estimated. However, the total price effect, which includes only significant explanatory variables, is close to the previous estimates. The results show that the estimates produced by the pooled procedures are very robust with respect to different specifications. If there is any additional spurious state dependence and heterogeneity left, it is not captured by household demographic characteristics.

We next investigate the sensitivity of the estimates produced by the pooled logit methods with respect to the specification of the error distribution, the


Table 5
Probit estimates^a

          β_p       β_d       β_f       γ         μ_α       σ_α

PP        −2.300    0.636     1.581               1.174
          (0.206)   (0.146)   (0.076)             (0.085)
PPL       −3.002    0.850     1.396     3.626     −0.429
          (0.243)   (0.175)   (0.092)   (0.080)   (0.102)
PPHET     −3.625    1.021     1.397               0.737     1.514
          (0.293)   (0.207)   (0.108)             (0.142)   (0.072)
PPLHET    −3.769    1.014     1.439     2.104     0.131     1.018
          (0.309)   (0.216)   (0.113)   (0.119)   (0.154)   (0.064)

a Standard errors in parentheses.

exogeneity assumption on the initial observation of each string, and the distribution of the household effects.

Table 5 presents the results obtained assuming that the transitory error term ε_it is normally distributed and independent across households and over purchase occasions. We estimate the model by the pooled probit approach without the lagged choice in the explanatory variable set (PP) and with the lag, treated as exogenous (PPL); and by the pooled probit with normally distributed random effects without the lag (PPHET) and with the lag, treated as exogenous (PPLHET). We note that the results change very little as we replace the logit assumption with the normality assumption (compare with the estimates of Table 3). The most important difference with the pooled logit estimates is in the magnitude of the standard deviation of the random effects, which drops to 1.514 when the lag is excluded (PPHET), and to 1.018 when it is included (PPLHET). Given the relative insensitivity of the estimates with respect to the distribution of the transitory errors, we focus in the rest of this section on the logit specification.

All previous estimation by the pooled methods, when the lagged choice is included as an explanatory variable, treats the initial observations in each string as exogenous. However, as pointed out by Heckman (1981), this will in general lead to inconsistent estimates, especially of the habit parameter γ, if unobserved heterogeneity is present. It therefore becomes important to model the probability of the initial brand choice in each purchase string, i.e. p_0(x_i, α_i) ≡ Pr(d_i0 = 1 | x_i, α_i). Note that if the process is stationary and has been operating for a long time, it may be reasonable to assume that it is in equilibrium, so that we could take p_0(x_i, α_i) to be the steady state probability of choosing 1 given (x_i, α_i). In the absence of exogenous covariates, this probability may be easily shown to be

$$
\frac{\Lambda(\alpha_i)}{1 - \Lambda(\gamma + \alpha_i) + \Lambda(\alpha_i)}.
$$
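
The steady state expression can be checked by treating the choice sequence of a household with a fixed effect α_i as a two-state Markov chain. The sketch below is an illustration only, not the authors' code: it assumes that Λ is the logistic CDF, uses the PLLHET values quoted in the text (γ = 2.126 and a mean effect of 0.198) purely as example inputs, and all function names are hypothetical. It compares the closed form with the limit obtained by iterating the chain from an arbitrary starting probability.

```python
import math


def logistic_cdf(z):
    """Logistic CDF, written Lambda(z) in the formulas."""
    return 1.0 / (1.0 + math.exp(-z))


def steady_state_prob(gamma, alpha):
    """Closed-form stationary probability of choosing brand 1 (no covariates)."""
    p_after_0 = logistic_cdf(alpha)          # Pr(d_t = 1 | d_{t-1} = 0)
    p_after_1 = logistic_cdf(gamma + alpha)  # Pr(d_t = 1 | d_{t-1} = 1)
    return p_after_0 / (1.0 - p_after_1 + p_after_0)


def iterated_prob(gamma, alpha, start=0.5, n_steps=200):
    """Marginal Pr(d_t = 1) after pushing a starting probability through the chain."""
    p_after_0 = logistic_cdf(alpha)
    p_after_1 = logistic_cdf(gamma + alpha)
    p = start
    for _ in range(n_steps):
        p = p * p_after_1 + (1.0 - p) * p_after_0
    return p


if __name__ == "__main__":
    gamma, alpha = 2.126, 0.198  # example inputs taken from the text
    print(steady_state_prob(gamma, alpha), iterated_prob(gamma, alpha))
```

When γ = 0 the two transition probabilities coincide and the expression collapses to Λ(α_i), which is a quick sanity check on the formula.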


Table 6
PLLHET: endogenous initial conditions^a

β_p       β_d       β_f       γ         μ_α       σ_α

−4.053    0.803     1.401     1.598     0.046     1.770
(0.274)   (0.178)   (0.115)   (0.115)   (0.133)   (0.102)

a Standard errors in parentheses.

In the presence of exogenous variables, calculation of the steady state probability would require specification of the distribution of x_it as well. Instead, we approximate it by

$$
\frac{\Lambda(\bar{x}\beta + \alpha_i)}{1 - \Lambda(\bar{x}\beta + \gamma + \alpha_i) + \Lambda(\bar{x}\beta + \alpha_i)}, \tag{12}
$$

where x̄ is the overall sample mean of x_it, averaged over both households and purchase occasions.

Table 6 reports the estimates obtained by maximizing the log-likelihood function (11) where p_0(x_i, α_i) is specified as in (12) and f(α | x_i) is taken to be the normal density with mean μ_α and variance σ_α². The results reveal that the point estimates of the coefficients on the exogenous variables do not change much. However, the estimate of γ drops significantly to 1.598 compared to 2.126 produced by PLLHET when the initial observations are treated as exogenous (see Table 3). In other words, assuming exogenous initial conditions leads to overestimating the amount of state dependence. This finding is consistent with results reported in the literature for other dynamic discrete choice models, 14 and underlines the importance of appropriately modeling initial conditions when random effects methods are used in estimating state dependence.

We proceed to investigate the sensitivity of the point estimates with respect to the normality assumption underlying the random effects approach. We estimate a random effects model using a discrete distribution with 2–5 support points (we return to our original assumption that initial conditions are exogenous). The results are reported in Table 7. We note that the estimates hardly change as we increase the number of support points, and that they are very close to the ones obtained by PLLHET with normal random effects (see Table 3). We do, however, observe a small but significant decrease in the estimated values for γ as the number of support points increases. The Schwarz Information Criterion for model selection (last column of Table 7) suggests that the model with a four-point distribution for the random effects

14 See Chay and Hyslop (1998) for an investigation of the issue of modeling initial conditions in dynamic discrete choice models of labor force and welfare participation.


Table 7
PLLHET: discrete distribution of random effects

Parameter estimates^a                   Mass points^b
β_p       β_d       β_f       γ         α_1       α_2       α_3       α_4       α_5       SIC

−3.652    0.969     1.398     2.383     −0.674    2.185                                   3811
(0.293)   (0.198)   (0.108)   (0.107)   [0.711]   [0.289]
−3.824    1.058     1.394     2.157     −1.672    0.290     2.734                         3760
(0.306)   (0.214)   (0.112)   (0.114)   [0.327]   [0.450]   [0.224]
−3.833    1.080     1.459     2.067     −1.728    0.199     2.230     5.351               3741
(0.312)   (0.220)   (0.115)   (0.116)   [0.307]   [0.424]   [0.232]   [0.037]
−3.844    1.098     1.465     2.052     −2.809    −1.082    0.328     2.264     5.383     3756
(0.313)   (0.221)   (0.116)   (0.117)   [0.090]   [0.291]   [0.354]   [0.228]   [0.037]

a Standard errors in parentheses. Correct only to the extent that the true distribution of the random effects has indeed 2–5 support points.
b Probabilities in square brackets.

Table 8
PLL: fixed effects estimation

Parameter estimates^a                   Fixed effects distribution^b
β_p       β_d       β_f       γ         [−70,−7]  (−7,−2]   (−2,−1]   (−1,1)    [1,2)     [2,7)     [7,70]

−3.933    1.109     1.289     1.138     322       2         18        122       63        74        136
(1.80)    (0.235)   (0.116)   (0.106)

a Standard errors in parentheses.
b Intervals in square brackets.

is the most appropriate in terms of precision of the estimates and parsimony of the parametrization.

We next attempt to capture any spurious state dependence by directly estimating the household-specific effects. Using the objective function of the PLL procedure, we carry out the maximization in two steps, maximizing for any value of β and γ the log-likelihood with respect to the α_i's over a 118-point grid on the interval [−70, 70]. Out of the 737 households, 458 (approximately 60%) never switch between brands. Of those, 322 always buy Nordica and the remaining 136 always buy Yoplait. As expected, the estimated effects for these households are very large (in absolute value). Note that the average number of purchase occasions (excluding the initial observations in each string) is rather small, approximately 8, compared to the number of household-specific effects that we want to estimate (737). We therefore do not expect the estimates to be consistent. However, as Table 8 reveals, the estimated effect of the lagged choice drops significantly, by almost 50%, from 2.126 to 1.138. All other coefficient estimates are very close to those estimated by PLLHET in Table 3.
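
The inner step of this two-step maximization, choosing each household effect on a grid for given values of β and γ, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually coded by the authors: the function names, the equally spaced grid and the two toy households are hypothetical, and `indices[t]` stands for x_it β + γ d_it−1 evaluated at the current trial values of (β, γ).

```python
import math


def log_lik(alpha, indices, choices):
    """Logit log-likelihood of one household at a candidate effect alpha."""
    total = 0.0
    for index, choice in zip(indices, choices):
        z = index + alpha
        signed = z if choice == 1 else -z
        total += -math.log1p(math.exp(-signed))  # log Lambda(signed)
    return total


def make_grid(lower=-70.0, upper=70.0, n_points=118):
    """Equally spaced grid (the spacing is an assumption of this sketch)."""
    step = (upper - lower) / (n_points - 1)
    return [lower + k * step for k in range(n_points)]


def best_alpha(indices, choices, grid):
    """Grid point that maximizes the household log-likelihood."""
    return max(grid, key=lambda a: log_lik(a, indices, choices))


if __name__ == "__main__":
    grid = make_grid()
    # A household that always buys brand 1: the likelihood keeps rising in
    # alpha, so the maximizer sits at the upper end of the grid.
    print(best_alpha([0.0, 2.126, 2.126], [1, 1, 1], grid))
    # A household that switches: the maximizer is an interior grid point.
    print(best_alpha([0.0, 1.0, 0.0, 1.0], [1, 0, 1, 0], grid))
```

The first toy household illustrates why completely loyal households end up in the outermost intervals of Table 8: without a switch, the household likelihood is monotone in α_i and the grid boundary is selected.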


Table 9
Summary statistics of marketing variables: three-brand sample^a

Brand     Share (%)   Average price (cent/oz)   Proportion of displays (%)   Proportion of features (%)

Nordica   42.23       6.73 (0.99)               4.52                         23.06
Yoplait   40.07       9.99 (1.03)               1.66                         4.74
Dannon    17.70       8.25 (0.50)               0.00                         2.66

a Standard deviations in parentheses.

Finally, we examine the sensitivity of results with respect to the number of brands used in the analysis. We include one more brand (Dannon), which has the next largest market share (15.6% in terms of weight and 10.9% in terms of units bought). This increases the number of households to 839 and the number of purchase occasions to 7434 (after excluding all occasions where other brands were bought and including only households who have at least two consecutive purchases of any one of the three brands). Summary statistics for this sample may be found in Table 9.

We consider the trinomial logit model,

$$
\Pr\left(d_{jit} = 1 \,\middle|\, \{\{X_{jis}\}_{j=0}^{2}\}_{s=0}^{T},\ \{\alpha_{ji}\}_{j=0}^{2},\ \{\{d_{jis}\}_{j=0}^{2}\}_{s=0}^{t-1}\right)
= \frac{\exp(X_{jit}\beta_j + \gamma_j d_{jit-1} + \alpha_{ji})}{\sum_{k=0}^{2} \exp(X_{kit}\beta_k + \gamma_k d_{kit-1} + \alpha_{ki})},
$$

where X_jit contains brand j’s price (in natural logarithm), and its display and feature dummies. Note that the model above imposes that there are no cross effects across brands, i.e. β_kj = γ_kj = 0 for all k ≠ j in the notation of Section 2. This restriction is common in the literature although it is not necessary. To keep the notation consistent with the one used in the two-brand case, we use j = 0 to denote Nordica, j = 1 to denote Yoplait, and j = 2 to denote Dannon.

Table 10 reports our estimates using PLL (assuming exogenous initial conditions and no household/brand heterogeneity, i.e. α_ji = 0 for all i and j); PLLHET (assuming exogenous initial conditions, and that α_1i − α_0i and α_2i − α_0i are normally distributed random effects with means μ_α1 and μ_α2, variances σ_α1² and σ_α2², respectively, and correlation coefficient equal to ρ); and HK (with the bandwidth constant set equal to 1). In panel A of the table we restrict all coefficients to be equal across the three brands (β_j = β_l = β and γ_j = γ_l = γ for all j, l), which is again common practice in the literature. In panel B we allow the coefficient of the lagged choice to vary across brands. The suffix S denotes the case where the model was estimated using only households who switch at least once among the three brands.

Comparing the results from the three-brand case (Table 10) to those from the two-brand case (Table 3), we find the following. For a given estimation


Table 10
(A) Estimates using three brands^a: identical β_j’s and γ_j’s

          β_p       β_d       β_f       γ         μ_α1      μ_α2      σ_α1      σ_α2      ρ

PLL       −3.245    0.868     1.399     3.509     −0.253    −1.403
          (0.183)   (0.146)   (0.071)   (0.061)   (0.079)   (0.056)
PLLHET    −4.481    1.469     1.881     2.403     0.245     −1.354    1.650     1.576     0.375
          (0.321)   (0.252)   (0.119)   (0.122)   (0.158)   (0.128)   (0.086)   (0.101)   (0.086)
PLLHET-S  −4.159    1.530     1.796     1.649     0.798     −0.705    1.346     1.512     0.385
          (0.338)   (0.281)   (0.128)   (0.128)   (0.172)   (0.143)   (0.096)   (0.119)   (0.095)
HK10      −3.009    0.749     0.871     0.807
          (0.319)   (0.298)   (0.196)   (0.222)

(B) Estimates using three brands^a: identical β_j’s; different γ_j’s

          β_p       β_d       β_f       γ_0       γ_1       γ_2       μ_α1      μ_α2      σ_α1      σ_α2      ρ

PLL       −3.251    0.864     1.398     3.768     3.180     3.645     1.643     0.384
          (0.184)   (0.146)   (0.071)   (0.164)   (0.159)   (0.169)   (0.104)   (0.078)
PLLHET    −4.479    1.454     1.881     2.644     2.085     2.454     1.561     −0.102    1.655     1.556     0.343
          (0.321)   (0.252)   (0.119)   (0.317)   (0.329)   (0.368)   (0.199)   (0.157)   (0.087)   (0.107)   (0.097)
PLLHET-S  −4.175    1.534     1.796     1.741     1.545     1.641     1.670     0.144     1.348     1.510     0.374
          (0.339)   (0.281)   (0.128)   (0.325)   (0.336)   (0.387)   (0.206)   (0.166)   (0.098)   (0.125)   (0.104)
HK10      −3.076    0.787     0.860     1.733     0.502     0.834
          (0.491)   (0.348)   (0.199)   (0.467)   (0.417)   (0.388)

a Standard errors in parentheses.


method, the magnitudes of the parameter estimates are similar across the two cases (one exception is the β_d estimate using HK, which in the three-brand case is considerably higher than in the two-brand case and also becomes statistically significant). In general, all parameter estimates tend to increase (in absolute value) when a third brand is included, with the exception of β_p and γ when HK is used (although the drop in the β_p estimate is quite small compared to its standard error). The increase in the (absolute) magnitude of the estimated price and promotional effects is not too surprising since one might expect that with the addition of a third brand in the analysis, consumers’ demand could exhibit higher elasticity with respect to price and firms’ promotional efforts. The increase of the habit effect as estimated by the pooled methods when a third brand is added might indicate that some of the increased heterogeneity in households and brand decisions in the enlarged data set is captured by a larger estimate of the state dependence. In contrast, the HK estimate of γ, which is robust to the amount of the heterogeneity across households and brands, decreases when a third brand is included in the analysis. An interesting result that emerges from our analysis when the habit effects are allowed to vary across brands (Table 10) is that there appear to be significant differences in the estimates for the different brands. 15 In particular, the habit effect is strongest for Nordica, followed by Dannon and then Yoplait. This pattern is consistent across all estimation methods. Future research may endeavor to uncover the reasons for the differences in brand loyalty, for example the extent of advertising, price promotions, etc. This finding may further suggest that common ad hoc restrictions on behavioral parameters (e.g. identical own effects, i.e. that β_jj and γ_jj are identical for all j), in addition to those necessary to identify the parameters of the econometric model, may bias results.

Concluding our discussion of the three-brand case, we find that there do not appear to be serious selection problems associated with our restricting the analysis to two brands. The pattern of the variation in the estimates across methods remains the same as in the two-brand case. The estimate of the habit effect is more sensitive to the choice of estimation method than to the inclusion of a third brand. One factor that could potentially explain the differences in results when a third brand is added in the analysis is that our sample in the three-brand case includes additional households. We would therefore expect the sample in the three-brand case to reflect more heterogeneity in purchase behavior. It is precisely in such a situation that we expect the HK method to do a better job capturing the underlying properties of the data.

15 We also estimated versions of the model where some of the coefficients on the exogenous variables were allowed to vary across brands. However, we did not find much variation in the estimates.
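
For readers who want to experiment with the three-brand specification, the choice probabilities of the trinomial logit with brand-specific habit effects can be evaluated as in the sketch below. It is only an illustration: the function name and all numerical inputs are hypothetical rather than estimates from Table 10, and the code assumes no cross effects across brands, as in the model used here.

```python
import math


def trinomial_logit_probs(x_beta, gamma, lagged_choice, alpha):
    """Choice probabilities for brands j = 0, 1, 2.

    x_beta[j]        : X_jit * beta_j, the covariate index of brand j
    gamma[j]         : habit effect of brand j
    lagged_choice[j] : 1 if brand j was bought on the previous occasion, else 0
    alpha[j]         : household/brand-specific effect
    """
    utilities = [x_beta[j] + gamma[j] * lagged_choice[j] + alpha[j] for j in range(3)]
    shift = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical inputs: equal covariate indices, brand 0 bought last time.
    probs = trinomial_logit_probs(
        x_beta=[0.0, 0.0, 0.0],
        gamma=[1.7, 0.5, 0.8],
        lagged_choice=[1, 0, 0],
        alpha=[0.0, 0.0, 0.0],
    )
    print(probs)
```

With brand-specific γ_j, the same previous purchase shifts the repurchase probability by a different amount for each brand, which is the kind of variation that panel B of Table 10 allows for.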


Summarizing our empirical analysis, we find that all procedures yield significant price elasticities (above 1) in brand choice, and sensitivity of households to firms’ marketing efforts. Furthermore, the estimates suggest that there exist large habit effects and substantial heterogeneity across households. The size of the estimated parameters, however, varies considerably across estimation methods. To investigate this issue further and identify situations where the different methods are most reliable in producing point estimates, we next present the results of a Monte Carlo study that uses the design of the data for the two-brand case.

6. Monte Carlo results

In this section we report results of Monte Carlo experiments that investigate the sensitivity of various estimators considered in the previous section with respect to:

(a) the assumption of strict exogeneity of initial observations;
(b) the magnitude of the variance of the individual effects;
(c) the correlation of individual effects and exogenous covariates.

The estimators under consideration are the conditional logit (CL); the conditional logit with lag (CLL); the conditional logit with pairwise comparisons (CLP); the conditional logit with pairwise comparisons with lag (CLLP); the pooled logit (PL); the pooled logit with lag (PLL); the pooled logit with normally distributed random effects (PLHET); the pooled logit with lag and normally distributed random effects (PLLHET); the pooled logit with lag, normally distributed random effects and endogenous initial conditions (PLLHETI); and the Honoré–Kyriazidou (HK) estimator for three different values of the bandwidth constant: 0.5, 1.0, and 3.0.

In all experiments we generate a panel of T = 5 observations for each one of n = 771 individuals. Note that 5 is the median string length of the sample used in estimating the model in the previous section. 771 is the number of strings with exactly five non-overlapping consecutive purchases of Yoplait and Nordica that we obtain from the original sample of 17,679 purchase occasions.

The response variables d_it for each individual and for the last four periods of the five-period panel, i.e. for t = 1, 2, 3, 4, are generated according to the logit model:

$$
d_{it} = 1\{P_{it}\beta_p + D_{it}\beta_d + F_{it}\beta_f + \gamma d_{it-1} + \alpha_i + \varepsilon_{it} > 0\}
= 1\{x_{it}\beta + \gamma d_{it-1} + \alpha_i + \varepsilon_{it} > 0\},
$$


where P_it is the difference in the log-prices between the two brands, and D_it and F_it are the differences in the display and feature variables between the two brands. The coefficients on the explanatory variables β = (β_p, β_d, β_f)′ and γ are set equal to the point estimates obtained from PLLHET (i.e. β = (−3.821, 1.031, 1.456)′ and γ = 2.126). The error terms ε_it are drawn independently over time and over individuals from a logistic distribution. The exogenous regressor strings x_i ≡ (x_i1, x_i2, x_i3, x_i4) are generated by random drawing with replacement from the sample of 771 strings of length equal to 5.

The designs described below differ in the way the initial observation d_i0 and the individual effect α_i for each individual are generated. In each design the experiment is replicated 100 times.

In the first experiment (UN0 design), we take the initial observations as exogenous and the individual effects identically equal to the mean of α_i as estimated by PLLHET (μ_α = 0.198) for all individuals. The initial observations on the response variables, d_i0, are generated similarly to the exogenous regressors x_i, i.e. by random drawing with replacement from the sample of initial observations of the 771 strings. Note that in this design σ_α = 0. Thus, PLL estimates the model consistently.

In the second set of experiments (UNLO and UNME designs), we take the initial observations as exogenous and the individual effects to be independent of the regressors, and we examine the sensitivity of the estimators with respect to the magnitude of the variance of the individual effects relative to the variance of the time-varying error component. Specifically, the individual effects α_i are generated as N(μ_α, σ_α²), independently over individuals, with μ_α equal to the mean of α_i estimated by PLLHET (i.e. μ_α = 0.198) and σ_α equal to π/(2√3) for the UNLO design and equal to π/√3 for the UNME design. Note that π/√3 ≈ 1.814 is the standard deviation of the logistic distribution and it is approximately equal to the standard deviation of α estimated by PLLHET (1.677). Thus, in the UNME design the variance of the individual effects α is equal to the variance of the transitory errors ε_it, while in the UNLO design σ_α² is equal to 1/4 of the variance of ε_it.
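
The data-generating process of the UN0, UNLO and UNME designs can be sketched in a few lines. This is a hedged illustration, not the simulation code used for the paper: the coefficient values are the ones quoted above, but the function names are hypothetical and the covariate rows in the usage example are made up, whereas the experiments draw whole covariate strings from the purchase data.

```python
import math
import random

BETA = (-3.821, 1.031, 1.456)  # (beta_p, beta_d, beta_f) quoted in the text
GAMMA = 2.126                  # coefficient on the lagged choice
MU_ALPHA = 0.198               # mean of the individual effects
SIGMA_ALPHA = {
    "UN0": 0.0,
    "UNLO": math.pi / (2.0 * math.sqrt(3.0)),
    "UNME": math.pi / math.sqrt(3.0),
}


def draw_logistic(rng):
    """Standard logistic draw obtained by inverting the CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_string(x_rows, d0, design, rng):
    """Return (alpha_i, [d_i1, ..., d_iT]) for one individual.

    x_rows holds one (P_it, D_it, F_it) tuple per period and d0 is the
    exogenously given initial choice.
    """
    alpha = rng.gauss(MU_ALPHA, SIGMA_ALPHA[design])
    choices, d_prev = [], d0
    for x in x_rows:
        index = sum(b * v for b, v in zip(BETA, x)) + GAMMA * d_prev + alpha
        d_prev = 1 if index + draw_logistic(rng) > 0 else 0
        choices.append(d_prev)
    return alpha, choices


if __name__ == "__main__":
    rng = random.Random(12345)
    x_rows = [(0.1, 0, 1), (-0.2, 1, 0), (0.0, 0, 0), (0.3, 0, 0)]  # hypothetical
    print(simulate_string(x_rows, d0=1, design="UNME", rng=rng))
```

The dictionary of standard deviations makes the design choice explicit: the UNME value equals the standard deviation of the logistic error, and the UNLO value is half of it, so its variance is one quarter as large.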

In the next sets of experiments (COLO, COME), we relax the assumption of independence between individual effects and regressors. In particular, the individual effects are generated as a linear combination of one of the exogenous covariates, P_it, over the four time periods:

$$
\alpha_i = \mu_\alpha + c \sum_{t=1}^{4} \left( P_{it} - \frac{1}{n} \sum_{k=1}^{n} P_{kt} \right),
$$

where μ_α = 0.198, and c is such that the implied σ_α is equal to the two values considered in the independence case: π/(2√3) in the COLO design, and π/√3 for the COME design. We specify c to be positive, which implies


that the induced correlation between the individual effects and the logarithm of the brand price ratio is positive.

Finally, for all designs with σ_α = π/√3, we relax the assumption of strict exogeneity of the initial choice. The individual effects are either independent of the exogenous covariates (UNMEI design), or positively correlated with P_it as above (COMEI design). The initial d_i0’s are drawn as

$$
d_{i0} = 1\{\alpha_i + \varepsilon_{i0} > 0\},
$$

where ε_i0 is drawn again from a logistic distribution independently of everything else. This design implies that d_i0 is now correlated with the individual effect.

The results from the Monte Carlo experiments are reported in Tables 11, 12, 13 and 14. Each table reports the mean bias, the standard deviation, and the root mean squared error (across the 100 replications) of the estimates of each one of the four parameters, β_p, β_d, β_f, and γ. We observe the following:

Across all seven designs, the conditional logit methods (CL, CLL, CLP, and CLLP) give overall smaller average biases than the other methods for the coefficients of the exogenous variables, β_p, β_d, and β_f. In particular, the mean bias for β_p is at most 6% of the true parameter value. The biases for β_d are somewhat larger, up to 14% of the true value, while the bias for β_f is at most 6% of the true value. There is no indication that using the pairwise estimators, CLP and CLLP, biases the estimates more than using the conditional likelihood estimators, CL and CLL. Introducing the lag in general lowers the biases for all β coefficients. The feedback parameter γ on the lagged choice is significantly underestimated towards zero by both CLL and CLLP for all designs. On average, the bias ranges from 72% to 100% of the true value. As expected, increasing the variance of the individual effects does not affect the average biases for any of the coefficients.

With the exception of the UN0 design, the pooled methods, PL and PLL, often produce very biased estimates for the price coefficient β_p. Specifically, in all designs where the individual effects are independent of the x’s (UNLO, UNME, UNMEI), β_p is underestimated in absolute terms (positive bias) by 12% up to 47% of the true value, although the estimates are still of the correct sign (negative). The biases increase substantially when correlation between α_i and x_i is introduced (COLO, COME, COMEI designs), ranging from 82% to almost 170% of the true parameter value. 16 In particular, in the COME and COMEI designs, PL and PLL produce on average price effects that are of the wrong sign (positive). Increasing σ_α has a big effect; average biases for β_p increase by 2–3 times. The coefficients of the display and the feature variables, β_d and β_f, are almost invariably underestimated, although

16 In experiments, reported in the working paper version, where negative correlation between α_i and P_i was introduced, we found that the estimated β_p’s were negatively biased for all pooled methods. On average β_p was overestimated by approximately 50–100%.


Table 11
(A) Mean bias for β_p

          UN0     UNLO    UNME    COLO    COME    UNMEI   COMEI

CL        0.14    0.10    0.07    0.08    0.14    −0.24   −0.22
CLL       0.03    −0.02   −0.06   −0.05   −0.01   −0.24   −0.24
CLP       0.22    0.16    0.10    0.11    0.17    −0.23   −0.18
CLLP      0.08    0.02    −0.05   −0.03   −0.00   −0.23   −0.19
PL        0.40    0.88    1.63    3.75    6.12    1.80    6.47
PLL       0.01    0.47    1.18    3.10    5.14    1.06    4.43
PLHET     −0.35   −0.21   0.41    3.57    3.64    0.82    3.64
PLLHET    −0.28   −0.03   0.13    3.72    4.14    0.79    4.14
PLLHETI   0.17    0.29    1.06    2.73    4.72    0.13    2.89
HK05      −0.07   −0.26   −0.23   −0.23   −0.21   −0.03   −0.10
HK10      −0.20   −0.36   −0.34   −0.31   −0.32   −0.14   −0.21
HK30      −0.34   −0.46   −0.45   −0.42   −0.46   −0.27   −0.33

(B) Standard deviation for β_p

          UN0     UNLO    UNME    COLO    COME    UNMEI   COMEI

CL        0.45    0.48    0.52    0.39    0.41    0.59    0.50
CLL       0.45    0.48    0.52    0.40    0.42    0.58    0.51
CLP       0.47    0.51    0.55    0.41    0.41    0.64    0.52
CLLP      0.48    0.51    0.56    0.42    0.44    0.64    0.53
PL        0.28    0.24    0.27    0.26    0.27    0.27    0.26
PLL       0.28    0.25    0.28    0.25    0.25    0.30    0.25
PLHET     0.35    0.33    0.34    0.32    0.30    0.30    0.30
PLLHET    0.31    0.31    0.36    0.38    0.34    0.32    0.34
PLLHETI   0.28    0.26    0.29    0.26    0.26    0.37    0.32
HK05      1.21    1.36    1.39    1.19    1.30    1.43    1.44
HK10      1.09    1.20    1.26    1.06    1.11    1.28    1.31
HK30      1.03    1.13    1.18    0.98    1.02    1.21    1.26

(C) RMSE for β_p

          UN0     UNLO    UNME    COLO    COME    UNMEI   COMEI

CL        0.47    0.49    0.52    0.40    0.43    0.63    0.55
CLL       0.45    0.48    0.52    0.41    0.42    0.63    0.56
CLP       0.51    0.53    0.56    0.42    0.45    0.67    0.55
CLLP      0.48    0.50    0.56    0.42    0.44    0.67    0.56
PL        0.48    0.91    1.66    3.76    6.12    1.82    6.47
PLL       0.28    0.53    1.21    3.11    5.15    1.10    4.44
PLHET     0.49    0.38    0.53    3.58    3.65    0.88    3.65
PLLHET    0.41    0.31    0.38    3.74    4.15    0.85    4.15
PLLHETI   0.33    0.39    1.10    2.75    4.73    0.39    2.90
HK05      1.21    1.38    1.41    1.21    1.31    1.42    1.43
HK10      1.10    1.24    1.29    1.10    1.15    1.28    1.32
HK30      1.08    1.21    1.26    1.06    1.11    1.24    1.30

the biases are not as large compared to β_p, the highest being approximately 50% of the true coefficient values. Similar to the β_p estimates, the biases for both β_d and β_f increase with σ_α. γ is in general overestimated by PLL


Table 12
(A) Mean bias for β_d

          UN0     UNLO    UNME    COLO    COME    UNMEI   COMEI

CL        −0.14   −0.11   −0.10   −0.09   0.04    0.04    0.08
CLL       −0.07   −0.04   −0.02   −0.02   0.10    0.04    0.08
CLP       −0.12   −0.08   −0.07   −0.09   0.04    0.04    0.09
CLLP      −0.05   −0.01   0.00    −0.02   0.10    0.04    0.09
PL        −0.18   −0.28   −0.44   −0.06   −0.14   −0.47   −0.24
PLL       0.00    −0.10   −0.28   0.02    −0.11   −0.19   −0.13
PLHET     −0.04   −0.06   −0.18   0.07    0.01    −0.23   0.01
PLLHET    0.06    0.01    0.00    0.11    −0.09   −0.12   −0.09
PLLHETI   −0.25   −0.23   −0.34   −0.08   −0.13   −0.02   0.08
HK05      1.75    1.42    1.99    1.15    1.24    2.02    0.82
HK10      1.31    1.12    1.60    0.84    1.12    1.79    0.68
HK30      1.21    1.10    1.52    0.66    1.12    1.69    0.71

(B) Standard deviation for β_d

          UN0     UNLO    UNME    COLO    COME    UNMEI   COMEI

CL        0.37    0.39    0.37    0.32    0.36    0.36    0.40
CLL       0.38    0.39    0.38    0.32    0.36    0.36    0.40
CLP       0.39    0.41    0.40    0.32    0.38    0.38    0.41
CLLP      0.39    0.41    0.40    0.32    0.39    0.38    0.41
PL        0.21    0.21    0.20    0.21    0.19    0.20    0.20
PLL       0.26    0.25    0.23    0.25    0.22    0.23    0.21
PLHET     0.27    0.27    0.26    0.22    0.24    0.23    0.24
PLLHET    0.28    0.27    0.28    0.23    0.22    0.24    0.22
PLLHETI   0.22    0.23    0.22    0.23    0.21    0.25    0.24
HK05      3.84    3.72    4.13    3.32    3.57    4.33    3.07
HK10      3.56    3.50    3.98    3.18    3.52    4.34    2.68
HK30      3.71    3.72    4.10    2.78    3.57    4.25    2.68

(C) RMSE for β_d

          UN0     UNLO    UNME    COLO    COME    UNMEI   COMEI

CL        0.39    0.40    0.38    0.33    0.36    0.36    0.41
CLL       0.38    0.39    0.38    0.32    0.37    0.36    0.41
CLP       0.40    0.41    0.40    0.33    0.38    0.38    0.42
CLLP      0.39    0.41    0.39    0.32    0.40    0.38    0.42
PL        0.28    0.35    0.48    0.22    0.24    0.51    0.31
PLL       0.26    0.27    0.36    0.25    0.24    0.29    0.24
PLHET     0.27    0.28    0.31    0.23    0.24    0.32    0.24
PLLHET    0.28    0.27    0.28    0.25    0.24    0.27    0.24
PLLHETI   0.33    0.32    0.41    0.24    0.25    0.25    0.25
HK05      4.20    3.96    4.56    3.49    3.76    4.76    3.16
HK10      3.78    3.66    4.27    3.28    3.68    4.68    2.76
HK30      3.88    3.86    4.36    2.84    3.72    4.55    2.76

except in the COLO and COME designs. The bias is at most 50% of the true value (UNMEI design). We note that the biases for the designs where initial conditions are endogenous (UNMEI, COMEI) are in absolute value


Table 13
(A) Mean bias for β_f

          UN0     UNLO    UNME    COLO    COME    UNMEI   COMEI

CL        −0.05   −0.07   −0.08   −0.08   −0.08   0.05    0.05
CLL       −0.00   −0.02   −0.02   −0.02   −0.01   0.06    0.06
CLP       −0.05   −0.06   −0.07   −0.07   −0.08   0.06    0.05
CLLP      −0.00   −0.01   −0.02   −0.01   −0.01   0.06    0.06
PL        −0.03   −0.26   −0.59   −0.21   −0.49   −0.74   −0.69
PLL       −0.01   −0.23   −0.56   −0.17   −0.48   −0.37   −0.42
PLHET     0.20    0.10    −0.18   −0.17   −0.26   −0.38   −0.26
PLLHET    0.11    −0.02   −0.11   −0.15   −0.33   −0.27   −0.33
PLLHETI   −0.19   −0.28   −0.58   −0.22   −0.49   −0.07   −0.11
HK05      0.03    0.06    0.09    0.03    0.11    0.13    0.06
HK10      −0.02   0.00    0.03    0.00    0.06    0.05    −0.01
HK30      −0.07   −0.05   −0.03   −0.03   0.03    −0.03   −0.06

(B) Standard deviation for β_f

          UN0     UNLO    UNME    COLO    COME    UNMEI   COMEI

CL        0.18    0.18    0.19    0.16    0.17    0.20    0.18
CLL       0.18    0.19    0.19    0.17    0.17    0.20    0.18
CLP       0.19    0.19    0.19    0.16    0.18    0.21    0.19
CLLP      0.19    0.19    0.19    0.17    0.18    0.21    0.19
PL        0.12    0.11    0.10    0.09    0.09    0.10    0.09
PLL       0.12    0.12    0.12    0.10    0.10    0.11    0.09
PLHET     0.14    0.14    0.13    0.11    0.11    0.12    0.11
PLLHET    0.13    0.14    0.14    0.12    0.11    0.11    0.11
PLLHETI   0.12    0.12    0.11    0.10    0.10    0.13    0.11
HK05      0.36    0.42    0.49    0.36    0.37    0.47    0.39
HK10      0.31    0.36    0.40    0.31    0.35    0.37    0.34
HK30      0.28    0.34    0.38    0.30    0.36    0.35    0.32

(C) RMSE for β_f

          UN0     UNLO    UNME    COLO    COME    UNMEI   COMEI

CL        0.19    0.20    0.20    0.17    0.19    0.20    0.19
CLL       0.18    0.19    0.19    0.17    0.17    0.21    0.19
CLP       0.19    0.19    0.20    0.18    0.19    0.21    0.20
CLLP      0.19    0.19    0.19    0.17    0.18    0.22    0.20
PL        0.12    0.28    0.60    0.22    0.50    0.75    0.70
PLL       0.12    0.26    0.57    0.20    0.49    0.38    0.43
PLHET     0.25    0.17    0.22    0.20    0.28    0.40    0.28
PLLHET    0.17    0.14    0.18    0.20    0.34    0.29    0.34
PLLHETI   0.22    0.31    0.59    0.24    0.50    0.14    0.15
HK05      0.36    0.42    0.50    0.36    0.39    0.49    0.39
HK10      0.31    0.35    0.40    0.31    0.35    0.37    0.34
HK30      0.29    0.34    0.37    0.30    0.35    0.35    0.32

considerably larger than in the other designs where the initial conditions are exogenous. For the designs with correlated effects and endogenous initial conditions, increasing σ_α tends to lower the average bias for γ. Finally, in


Table 14

(A) Mean bias for γ

Method      UN0   UNLO   UNME   COLO   COME  UNMEI  COMEI
CL            —      —      —      —      —      —      —
CLL       −1.70  −1.68  −1.64  −1.65  −1.58  −2.09  −2.05
CLP           —      —      —      —      —      —      —
CLLP      −1.67  −1.65  −1.60  −1.62  −1.54  −2.14  −2.09
PL            —      —      —      —      —      —      —
PLL        0.00   0.14   0.40  −0.21  −0.18   1.09   0.47
PLHET         —      —      —      —      —      —      —
PLLHET    −0.07  −0.01   0.04  −0.44   0.30   0.97   0.30
PLLHETI   −0.35  −0.32   0.06  −0.65  −0.60  −0.07  −0.51
HK05       0.03   0.03   0.07   0.01   0.05   0.67   0.64
HK10      −0.00   0.00   0.05  −0.02   0.03   0.61   0.58
HK30      −0.03  −0.03   0.02  −0.05   0.01   0.56   0.54

(B) Standard deviation for γ

Method      UN0   UNLO   UNME   COLO   COME  UNMEI  COMEI
CL            —      —      —      —      —      —      —
CLL        0.12   0.11   0.13   0.10   0.09   0.12   0.09
CLP           —      —      —      —      —      —      —
CLLP       0.13   0.12   0.14   0.12   0.11   0.14   0.11
PL            —      —      —      —      —      —      —
PLL        0.10   0.09   0.09   0.09   0.08   0.13   0.10
PLHET         —      —      —      —      —      —      —
PLLHET     0.11   0.11   0.13   0.10   0.12   0.15   0.12
PLLHETI    0.09   0.09   0.09   0.09   0.07   0.13   0.09
HK05       0.28   0.35   0.36   0.27   0.25   0.65   0.63
HK10       0.26   0.32   0.34   0.25   0.25   0.60   0.53
HK30       0.25   0.30   0.33   0.25   0.25   0.56   0.49

(C) RMSE for γ

Method      UN0   UNLO   UNME   COLO   COME  UNMEI  COMEI
CL            —      —      —      —      —      —      —
CLL        1.71   1.68   1.64   1.65   1.58   2.10   2.05
CLP           —      —      —      —      —      —      —
CLLP       1.68   1.65   1.61   1.62   1.54   2.15   2.09
PL            —      —      —      —      —      —      —
PLL        0.10   0.16   0.41   0.23   0.20   1.10   0.49
PLHET         —      —      —      —      —      —      —
PLLHET     0.13   0.11   0.13   0.45   0.32   0.99   0.32
PLLHETI    0.36   0.33   0.11   0.66   0.61   0.15   0.52
HK05       0.28   0.35   0.37   0.27   0.25   0.93   0.90
HK10       0.26   0.32   0.34   0.25   0.25   0.86   0.78
HK30       0.25   0.30   0.33   0.25   0.25   0.79   0.72

almost all designs, the biases for PLL are smaller than those of PL, i.e. introducing the lagged choice in the regressor set when state dependence is present improves the estimates.
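Panels A, B, and C of the Monte Carlo tables report, for each method and design, the mean bias, standard deviation, and RMSE of an estimate across replications. The following is a minimal sketch of how such summaries can be computed and how they relate through RMSE² = bias² + SD². It is an illustration only: the helper name `mc_summary`, the simulated draws, and the use of the population standard deviation (`ddof=0`) are assumptions, since the text does not say which convention was used.

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Mean bias, standard deviation and RMSE of Monte Carlo estimates."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    sd = est.std(ddof=0)  # population SD (assumed convention)
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return bias, sd, rmse

# Hypothetical draws standing in for 100 replications of one estimator.
rng = np.random.default_rng(0)
draws = rng.normal(loc=0.9, scale=0.2, size=100)
bias, sd, rmse = mc_summary(draws, true_value=1.0)

# With ddof=0 the decomposition RMSE^2 = bias^2 + SD^2 holds exactly.
assert np.isclose(rmse ** 2, bias ** 2 + sd ** 2)

# Consistency check on reported entries: Table 13, PL, UNME design
# (mean bias -0.59, standard deviation 0.10).
implied_rmse = float(np.hypot(-0.59, 0.10))
print(round(implied_rmse, 2))
```

The implied value rounds to 0.60, which agrees with the RMSE reported for PL in the UNME design in Panel C of Table 13. Small discrepancies elsewhere can arise because the published bias and standard deviation are themselves rounded to two decimals.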


Introducing normal random effects in the pooled procedures (PLHET and PLLHET) lowers the average biases in β_p relative to PL and PLL in almost all designs. In the UN0, UNLO, UNME, and UNMEI designs, introducing the lag as an explanatory variable decreases the bias in β_p, similar to the case without heterogeneity. The opposite occurs in the correlated effects designs, COLO, COME, COMEI. Similar to PL and PLL, the average biases for β_p are very large, up to 100% of the true value, sometimes yielding positive price effects (COME, COMEI designs). Increasing σ_α produces higher average biases for β_p. PLHET and PLLHET estimate β_d and β_f quite well. The mean biases are at most around 25% of the true values of the coefficients. The coefficient of the lagged choice is in general better estimated by PLLHET than PLL, i.e. accounting for unobserved heterogeneity lowers the bias in estimating the effect of state dependence. However, the biases in the designs where initial conditions are endogenous are significant, up to 50% of the true value of γ, similar to the PLL estimates.

Allowing for endogenous initial conditions (PLLHETI) decreases, as might be expected, the magnitude of the average biases of all the estimated coefficients compared to PLLHET in the design where initial conditions are endogenously generated and the individual effects are uncorrelated with the regressors (UNMEI design). However, this finding does not carry over to the case where the random effects are correlated with prices (COMEI design): only the biases in β_p and β_f decrease substantially relative to those of PLLHET, while the bias in γ tends to increase (in absolute terms). For the other designs the biases tend to be higher than those of PLLHET. A notable exception is the COLO design where the bias in β_p drops significantly (by 27%). In the same design, however, the bias of γ increases by approximately 50%. It is interesting to note that, in most designs, the bias of the state dependence parameter γ reverses sign as compared to that of PLLHET; in particular, in most cases γ is underestimated (negative bias).

The HK method overestimates on average the price effect; the mean biases are negative for all designs and bandwidths. The biases are at most 15% of the true β_p, and they increase (in absolute magnitude) monotonically with the value of the bandwidth. The mean biases for β_d are very large. The coefficient is overestimated by 100–200%. However, the median biases, reported in the working paper, are much smaller, up to 40% of the true value. In contrast, the average biases for β_f are of much smaller order, at most 10% of the true coefficient. HK estimates γ quite well. The average biases are smaller than those produced by the other methods. With the exception of the designs with endogenous initial conditions, the biases for γ are at most 5% of the true value. In the UNMEI, COMEI designs, however, γ is overestimated by around 30%. Similar to the conditional methods, increasing the variance of the individual effects does not affect the average biases of the coefficients, with the exception of β_d. The average biases for β_d, β_f, and γ decrease


Table 15. Average number of HK strings

Design      UN0   UNLO   UNME   COLO   COME  UNMEI  COMEI
Average     343    323    268    378    340    250    312

monotonically (but not necessarily in absolute value) with the bandwidth constant h.

Concerning the dispersion of the estimates around their averages across the 100 replications, as measured by their standard deviation (see Panels B of Tables 11–14), we see that the pooled estimates have the lowest dispersion, which is to be expected since these methods use all the data. All procedures display relatively constant dispersion across all designs and across the different specifications of σ_α². The standard deviation of the HK estimates in general decreases as the bandwidth increases, as expected. The relatively large dispersion of the HK estimates may be explained by the fact that this method uses the smallest number of observations among all approaches. Table 15 provides the average (over 100 replications) number of observations used by the HK procedure. Thus, for example, for the UNME design the sample size used by HK is roughly a third of the sample size (n = 771) used by the pooled methods.

Finally, Panels C of Tables 11–14 report root mean squared errors (RMSEs) of the parameter estimates across the 100 replications. We find that both the conditional methods and HK have constant RMSEs for the β_p coefficient across designs, although HK yields RMSEs that are almost twice as large as those of the conditional methods. In contrast, the RMSEs of the pooled methods increase dramatically as we increase the relative magnitude of the household effects and as we introduce correlation between the latter and the exogenous covariates. For β_d we find that HK produces very large RMSEs, which is to be expected given the large mean biases for that coefficient. All other methods tend to produce similar RMSEs which do not tend to vary much across designs. For β_f we note that all methods produce comparable RMSEs, with CL and CLL exhibiting the lowest ones. For γ, however, both CLL and CLLP produce very large RMSEs as expected given the size of the average biases. With the exception of the endogenous initial conditions designs (UNMEI, COMEI), HK produces constant RMSEs while the pooled methods produce RMSEs that tend to increase as σ_α² increases and as we introduce correlation between the household effects and the observed covariates.

We conclude that the conditional logit procedures appear to be the most robust in estimating the coefficients on the exogenous variables among all procedures. However, they produce very poor estimates of the habit effect γ. The pooled procedures produce significantly biased estimates of key parameters, such as the price coefficient β_p and the feedback parameter γ, in particular when initial conditions are endogenous and/or the individual effects are correlated with the exogenous variables. The HK method estimates β_p and γ quite well, although it produces more imprecise estimates than both conditional and pooled methods.
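The text attributes the lower precision of HK to the number of observations it uses (Table 15). A minimal sketch of that comparison follows, taking the HK string counts from Table 15 and n = 771 from the text. Applying the same n to every design is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Average number of HK strings per design, copied from Table 15.
hk_strings = {
    "UN0": 343, "UNLO": 323, "UNME": 268, "COLO": 378,
    "COME": 340, "UNMEI": 250, "COMEI": 312,
}

# Sample size used by the pooled methods, as stated in the text
# (assumed here to be the same in every design).
n_pooled = 771

# Share of the pooled sample that HK effectively uses in each design.
share = {design: k / n_pooled for design, k in hk_strings.items()}

for design, s in share.items():
    print(f"{design:6s} {s:.3f}")
```

Under this assumption HK uses between roughly 32% (UNMEI) and 49% (COLO) of the pooled sample, and about 35% in the UNME design.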

7. Conclusions

Empirical researchers have extensively used the panel data discrete choice model with a lagged choice variable to capture the effects of state dependence (‘loyalty’) on purchase behavior. More recently, various estimation approaches have been implemented to also allow for the presence of unobserved heterogeneity across consumers for different brands.

The present paper provides a theoretical foundation for the standard model of brand choice with a lagged dependent variable and unobserved individual effects. We introduce habit effects in the utility function, which in general requires the consumer to solve a dynamic optimization problem. We derive sufficient conditions under which the dynamic problem maps into a static one-period optimization problem that underlies the standard econometric specification.

In addition, the paper provides an empirical application of the estimator recently proposed by Honoré and Kyriazidou (2000) for dynamic discrete choice panel data models and compares it to estimators typically used in the literature. Our empirical results for the yogurt data reveal that all procedures yield a significant price elasticity of the brand choice, and sensitivity of consumers to marketing variables, such as advertising. Furthermore, the estimates suggest that there exist large habit effects and substantial heterogeneity across consumers. The size of the estimated parameters, however, varies considerably across estimation methods. Our Monte Carlo results indicate that the conditional likelihood procedures are the most robust in estimating the coefficients on the exogenous variables among all procedures. However, they significantly underestimate the feedback parameter on the lagged dependent variable. The pooled procedures are quite sensitive to model misspecification, often yielding large biases for key economic parameters, such as the effect of state dependence and especially the effect of prices on brand choice. The estimator proposed by Honoré and Kyriazidou performs quite satisfactorily in capturing both the price and the habit effects.

Future research would involve application of the estimators to other kinds of products, to a larger number of brands, and to other consumer decisions (e.g. discrete/continuous choices of brands and quantities). Another important topic for further investigation is the timing and frequency of purchases.
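As a concrete reminder of the kind of estimator evaluated above, the following is a minimal sketch of the kernel-weighted conditional logit objective of Honoré and Kyriazidou (2000) in its simplest form: binary choice, one scalar regressor, and four periods t = 0, ..., 3. It is not the multinomial implementation used in this paper. The function names, the Gaussian kernel, and the toy data are assumptions made for illustration.

```python
import numpy as np

def _switch_and_weight(y, x, h):
    """Indicator of a switch between periods 1 and 2, and kernel weights."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Only households whose choice changes between t = 1 and t = 2 contribute.
    switch = (y[:, 1] + y[:, 2]) == 1
    # Gaussian kernel (assumed): weight near 1 when x_2 is close to x_3.
    weight = np.exp(-0.5 * ((x[:, 2] - x[:, 3]) / h) ** 2)
    return y, x, switch, weight

def hk_objective(theta, y, x, h):
    """Objective to be maximized over theta = (b, g); h > 0 is the bandwidth."""
    b, g = theta
    y, x, switch, weight = _switch_and_weight(y, x, h)
    index = (x[:, 1] - x[:, 2]) * b + g * (y[:, 0] - y[:, 3])
    # log of exp(index)^{y_1} / (1 + exp(index)), computed stably.
    loglik = y[:, 1] * index - np.logaddexp(0.0, index)
    return float(np.sum(switch * weight * loglik))

def effective_n(y, x, h):
    """Sum of kernel weights over switching households."""
    _, _, switch, weight = _switch_and_weight(y, x, h)
    return float(np.sum(switch * weight))

# Toy data (hypothetical): three households, choices and regressor for t = 0..3.
y = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [1, 1, 1, 0]])
x = np.array([[0.0, 1.0, 0.0, 0.0], [0.5, 0.2, 0.9, 0.4], [0.1, 0.3, 0.2, 0.2]])

print(hk_objective((0.3, -0.2), y, x, h=1.0))
print(effective_n(y, x, h=0.5), effective_n(y, x, h=3.0))
```

In this sketch the third household never switches between periods 1 and 2 and therefore drops out, while a larger bandwidth raises the weight on households whose regressor differs between periods 2 and 3. This is consistent with the observation that the HK estimates use fewer observations than the other methods and that their dispersion falls as the bandwidth increases. In practice the objective would be maximized numerically over (b, g).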


Acknowledgements

We would like to thank Andrew Chesher, Jim Heckman, Bo Honoré, Cheng Hsiao, Jean-Marc Robin, Peter Rossi, three anonymous referees, and seminar participants at various institutions for useful comments. The paper was presented at the 1998 Econometrics Camp, Catalina Island, California, at the 1998 Econometrics Group conference, Bristol, and at the 1999 Econometric Society European Meetings, Santiago de Compostela. We are grateful to the participants for helpful suggestions. André Bonfrer provided excellent research assistance. The second author gratefully acknowledges financial support from NSF.

References

Börsch-Supan, A., Hajivassiliou, V.A., 1993. Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variable models. Journal of Econometrics 58 (3), 347–368.

Chamberlain, G., 1984. Panel data. In: Griliches, Z., Intriligator, M. (Eds.), Handbook of Econometrics, Vol. II. North-Holland, Amsterdam.

Chamberlain, G., 1985. Heterogeneity, omitted variable bias, and duration dependence. In: Heckman, J.J., Singer, B. (Eds.), Longitudinal Analysis of Labor Market Data. Cambridge University Press, Cambridge.

Chamberlain, G., 1993. Feedback in panel data models. Unpublished manuscript, Department of Economics, Harvard University.

Chay, K., Hyslop, D., 1998. Identification and estimation of dynamic binary response models: empirical evidence on alternative approaches to examining welfare dependence. Working paper, Econometrics Camp, Catalina Island, CA.

Chiang, J., 1995. Competing coupon promotions and category sales. Marketing Science 14 (1), 105–122.

Deaton, A., Muellbauer, J., 1980. Economics and Consumer Behavior. Cambridge University Press, Cambridge.

Erdem, T., Keane, M.P., 1996. Decision-making under uncertainty: capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science 15 (1), 1–20.

Geweke, J., Keane, M., Runkle, D., 1994. Statistical inference in the multinomial multiperiod probit model. Working paper, Federal Reserve Bank of Minneapolis Staff Report: 177.

Gönül, F.F., 1999. Estimating price expectation in the OTC medicine market: an application of dynamic discrete choice models to scanner panel data. Journal of Econometrics 89, 41–56.

Guadagni, P.M., Little, J.D.C., 1983. A logit model of brand choice calibrated on scanner data. Marketing Science 2 (3), 203–238.

Hanemann, W.M., 1984. Discrete/continuous models of consumer demand. Econometrica 52 (3), 541–561.

Hansen, L.P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.

Heckman, J.J., 1981. Heterogeneity and state dependence. In: Rosen, S. (Ed.), Studies of Labor Markets. The University of Chicago Press, Chicago.

Honoré, B.E., Kyriazidou, E., 2000. Panel data discrete choice models with lagged dependent variables. Econometrica 68, 839–874.

Hsiao, C., 1986. Analysis of Panel Data. Cambridge University Press, Cambridge.


Jain, D.C., Vilcassim, N.J., 1991. Investigating household purchase timing decision: a conditional hazard function approach. Marketing Science 10, 1–23.

Jain, D.C., Vilcassim, N.J., Chintagunta, P.K., 1994. A random-coefficients logit brand-choice model applied to panel data. Journal of Business and Economic Statistics 12 (3), 317–328.

Jones, J.M., Landwehr, T.J., 1988. Removing heterogeneity bias from logit model estimation. Marketing Science 7, 41–59.

Keane, M.P., 1997. Modeling heterogeneity and state dependence in consumer choice behavior. Journal of Business and Economic Statistics 15 (3), 310–327.

Magnac, T., 1997. State dependence and heterogeneity in youth employment histories. Working paper, INRA and CREST, Paris.

Manski, C., 1987. Semiparametric analysis of random effects linear models from binary panel data. Econometrica 55, 357–362.

McCulloch, R.E., Rossi, P.E., 1994. An exact likelihood analysis of the multinomial probit model. Journal of Econometrics 64 (1–2), 207–240.

Rossi, P.E., McCulloch, R.E., Allenby, G.M., 1996. The value of purchase history data in target marketing. Marketing Science 15 (4), 321–340.

Roy, R., Chintagunta, P., Haldar, S., 1996. A framework for analyzing habits, hand-of-past, and heterogeneity in dynamic brand choice. Marketing Science 15 (3), 280–299.

Vilcassim, N.J., Jain, D.C., 1996. Modelling purchase timing and brand switching behavior incorporating explanatory variables and unobserved heterogeneity. Journal of Marketing Research 28, 29–41.