Identification in Multivariate Partial Observability Probit

Dale J. Poirier
University of California, Irvine, USA

September 15, 2011

Abstract

Poirier (1980, JoE) considered a bivariate probit model in which the binary dependent variables y1 and y2 were not observed individually, but only the product z = y1·y2 was observed. This paper extends this notion of partial observability to multivariate settings.

Bivariate Probit

• Consider N independent observations from the latent bivariate regression model

y*n1 = xn′β1 + εn1,  y*n2 = xn′β2 + εn2,

where the errors [εn1, εn2]′ are bivariate standard normal with correlation ρ, and θ = [β1′, β2′, ρ]′ is an unknown parameter vector.

• The bivariate probit (BP) model arises when only the sign of y*ni is observed, i.e., yni = 1 if y*ni > 0, and yni = 0 if y*ni ≤ 0 (i = 1, 2).
◦ Zellner and Lee (1965, Econometrica) and Ashford and Sowden (1970, Biometrics) were early contributors.
◦ Chib and Greenberg (1998, Biometrika) provided a Bayesian analysis of multivariate probit (MP).
• The bivariate ordered probit model arises when the yni (i = 1, 2) are observed in more than two categories.

• Of primary concern here is extending the case of bivariate partial observability (BPO) probit introduced in Poirier (1980, JoE), in which only zn = yn1·yn2 is observed. That paper also considered BPO in sample-selection models (Heckit-like estimators).
• BPO can be thought of as a 2×2 contingency table with covariates and partial observability, as suggested below.

Cell Counts
          y1 = 0   y1 = 1
y2 = 0    n00      n10
y2 = 1    n01      n11

◦ In BPO, we only observe (n00 + n01 + n10) = N − n11 and n11.

Example: Consider a two-agent committee that requires both agents to be in favor in order for a motion to pass.
• For example, consider a man and a woman who contemplate getting married in the current year.
• You observe Z = 1 if they get married, and Z = 0 if they don't.
• But you don't observe their individual decisions Y1 = 0, 1 and Y2 = 0, 1.

Note: There are analogs of partial observability in the analysis of contingency tables in which cells cannot be fully distinguished.
• See Fienberg (1970), Cohen (1971), Pocock (1973), Haberman (1974), and Chen and Fienberg (1974, 1976).
• In both cases issues of identifiability require careful analysis.
• An important difference between these contingency-table approaches and BPO is that the introduction of covariates provides ties across cells beyond the requirement that cell probabilities must add to unity.

Table 1: Applications with Partial Observability

Field               Article                                             Discrete Outcomes
agriculture         Dimara and Skuras (2003)                            producer aware of innovation; producer adopts innovation
banking             Dwyer and Hassan (2007)                             bank fails; bank suspends payments
credit              Swain (2002)                                        bank's decision on access; household's demand for loans
education           Ballou (1996)                                       individual seeks teaching job; educator offers a job
development         Glewwe (1996)                                       working; private vs. public sector
IMF                 Przeworski and Vreeland (2002); Rajbhandari (2011)  decision by IMF to extend financing; decision by a gov. to apply for assistance
immigration         Aydemir (2002)                                      individual applies to emigrate; host decides whether to accept
foreign investment  Konig (2003)                                        ownership advantages hold; location advantages hold; internalization advantages hold
labor               Mohanty (1992)                                      individual seeks employment; firm selects individual
law and economics   Feinstein (1990)                                    violation; detection
mergers             Brasington (2003)                                   richer school district; poorer school district

• Farber (1982, Econometrica) considered an intermediate case between BP and BPO which can be visualized as a 2×2 contingency table with covariates and the following structure (censored probit).

Cell Counts
          y1 = 0   y1 = 1
y2 = 0    n00      n10
y2 = 1    n01      n11

◦ In this case, we observe (n00 + n01), n10, and n11.

• Maximum likelihood estimation of BPO is included in Stata.
• Cavanagh and Sherman (1998, JoE) introduced a class of rank estimators of scaled coefficients in semiparametric monotonic linear index models.
◦ CS also considered single-equation multiple-index models, including BPO probit.
◦ Importantly, their results imply that semiparametric analogs of the partial observability models are identified without the normality assumption.

• Let Φ(·) denote the univariate standard normal cdf, and Φ2(·, ·; ρ) denote the bivariate standard normal cdf with correlation ρ.
• The likelihood functions for BP and BPO are

L_BP(θ) = Πn Πi,j pnij^[1(yn1 = i, yn2 = j)],
L_BPO(θ) = Πn pn11^zn (1 − pn11)^(1 − zn),

where yn = [yn1, yn2]′ (for BP) and zn = yn1·yn2 (for BPO) are observed, and the component probabilities pnij = Prob(yn1 = i, yn2 = j) are given in the following table.

Joint Probabilities pnij (i, j = 0, 1) for BP

          yn1 = 0                        yn1 = 1
yn2 = 0   Φ2(−xn′β1, −xn′β2; ρ)          Φ2(xn′β1, −xn′β2; −ρ)
yn2 = 1   Φ2(−xn′β1, xn′β2; −ρ)          Φ2(xn′β1, xn′β2; ρ)
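A minimal numerical sketch of the BPO log-likelihood above, assuming simulated inputs; the name bpo_loglik, the data X and z, and the parameter packing are illustrative, not from the paper:

```python
import numpy as np
from scipy.stats import multivariate_normal

def bpo_loglik(theta, X, z):
    """log L_BPO(theta) = sum_n [z_n log p_n11 + (1 - z_n) log(1 - p_n11)]."""
    K = X.shape[1]
    b1, b2, rho = theta[:K], theta[K:2 * K], theta[-1]
    cov = [[1.0, rho], [rho, 1.0]]
    # p_n11 = Phi_2(x_n' b1, x_n' b2; rho)
    upper = np.column_stack([X @ b1, X @ b2])
    p11 = np.array([multivariate_normal.cdf(u, mean=[0.0, 0.0], cov=cov)
                    for u in upper])
    p11 = np.clip(p11, 1e-12, 1 - 1e-12)   # guard the logs
    return np.sum(z * np.log(p11) + (1 - z) * np.log1p(-p11))
```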

• Meng and Schmidt (1983, IER) compared the relative variances for BP vs. BPO for a small number of parameter settings.
◦ The information loss under BPO can be "surprisingly large," particularly near points that are not identified.
◦ The cost increases with the fraction of the data for which the dependent variable is imperfectly observed.

• If ρ = 0:
◦ BP reduces to two univariate probit models with a block-diagonal information matrix.
◦ The BPO information matrix is not block diagonal.
• If ρ ≠ 0:
◦ BP is more efficient than univariate probit.
◦ In contrast, the underlying latent system shows no gains from pooling the two equations unless restrictions are added.
· The nonlinearity of the model is why the OLS efficiency result for the linear latent model does not carry over.
· But this result does carry over if a bivariate linear probability model is estimated.

Multivariate Probit

• Consider a sample of N independent observations from a multivariate probit model with latent variable representation

y*nj = xn′βj + εnj (j = 1, ..., J), (1)

εn = [εn1, ..., εnJ]′ ~ N(0J, D), (2)

where D is a J×J correlation matrix,

and you only observe the binary outcomes ynj = 1 if y*nj > 0, and ynj = 0 otherwise (j = 1, ..., J).
◦ Analysis here is conditional on the K×1 vectors xn (n = 1, 2, ..., N).
◦ Stack all parameters into the M = 3K + 3 vector θ = [β1′, β2′, β3′, ρ12, ρ13, ρ23]′ (written here for J = 3).
◦ Finally, let φJ(·; m, Σ) denote a J-dimensional normal pdf with mean m and covariance matrix Σ, and denote the corresponding cdf by ΦJ(·; m, Σ). (3)

• The multivariate probit (MP) choice probability is

Pn(i | θ) = Prob(yn = i | θ) = ∫An1 ··· ∫AnJ φJ(u; μn, D) du, (4)

where μnj = xn′βj and the regions of integration are given by

Anj = (0, ∞) if inj = 1, Anj = (−∞, 0] if inj = 0 (j = 1, ..., J). (5)
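A hedged sketch of the cell probability (4)-(5), computed via the standard sign-flipping device (equivalent to the reparameterization on the next slide); mp_cell_prob and its argument names are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mp_cell_prob(i, mu_n, R):
    """Pr(y_n = i | theta) for i in {0,1}^J, with linear indices
    mu_n[j] = x_n' b_j and latent correlation matrix R."""
    q = 2 * np.asarray(i) - 1                  # map {0,1} -> {-1,+1}
    upper = q * np.asarray(mu_n)               # flipped integration limits
    cov = np.outer(q, q) * np.asarray(R)       # flipped correlations
    return multivariate_normal.cdf(upper, mean=np.zeros(len(q)), cov=cov)
```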

• A convenient representation for the MP choice probability can be obtained by reparameterizing μn and D to

μ̃nj = qnj xn′βj,  D̃n = Qn D Qn, (6)

where qnj = 2ynj − 1 and Qn = diag(qn1, ..., qnJ) (n = 1, ..., N; j = 1, ..., J). Then Pn(yn | θ) = ΦJ(μ̃n; 0J, D̃n).
• The log-likelihood function for J-dimensional multivariate probit is

ℓMP(θ) = Σn log ΦJ(μ̃n; 0J, D̃n). (7)

• Barrenechea (2007) derived the derivatives of ΦJ needed for the score and information matrix, expressed in terms of the (J − 1)×(J − 1) matrix obtained by omitting the jth row and column of D̃n and the (J − 2)×(J − 2) matrix obtained by omitting the ith and jth rows and columns of D̃n [expressions (8) and (9)].


• Table 2 expresses the cell probabilities Pn(i | θ) in terms of the original parameters μn and D in the case J = 3, using representation (6).

Table 2: Trivariate Probit Cell Probabilities

                    yn1 = 0                                 yn1 = 1
yn3 = 0, yn2 = 0    Φ3(−μn1, −μn2, −μn3; ρ12, ρ13, ρ23)     Φ3(μn1, −μn2, −μn3; −ρ12, −ρ13, ρ23)
yn3 = 0, yn2 = 1    Φ3(−μn1, μn2, −μn3; −ρ12, ρ13, −ρ23)    Φ3(μn1, μn2, −μn3; ρ12, −ρ13, −ρ23)
yn3 = 1, yn2 = 0    Φ3(−μn1, −μn2, μn3; ρ12, −ρ13, −ρ23)    Φ3(μn1, −μn2, μn3; −ρ12, ρ13, −ρ23)
yn3 = 1, yn2 = 1    Φ3(−μn1, μn2, μn3; −ρ12, −ρ13, ρ23)     Φ3(μn1, μn2, μn3; ρ12, ρ13, ρ23)

Here Φ3(a1, a2, a3; r12, r13, r23) denotes the trivariate standard normal cdf with the indicated correlations.

• The information matrix for multivariate probit is

I_MP(θ) = Σn Σi gn(i | θ), (14)

where the expected information in cell i = [i1, ..., iJ]′ is

gn(i | θ) = [Pn(i | θ)]⁻¹ [∂Pn(i | θ)/∂θ][∂Pn(i | θ)/∂θ]′. (15)

Multivariate Partial Observability Probit

• A direct extension of Poirier (1980) to multivariate partial observability (MPO) probit corresponds to observing only zn = yn1·yn2 ··· ynJ (n = 1, ..., N).

Table 3: Sampling Distribution for MPO

zn = 0: 1 − Pn(ιJ | θ)        zn = 1: Pn(ιJ | θ)

where ιJ = [1, ..., 1]′.

• The log-likelihood for MPO is

ℓMPO(θ) = Σn { zn log Pn(ιJ | θ) + (1 − zn) log[1 − Pn(ιJ | θ)] }, (16)

where zn = yn1·yn2 ··· ynJ is observed.
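A short sketch of (16), reusing mp_cell_prob from the earlier block; function and argument names are illustrative:

```python
import numpy as np

def mpo_loglik(z, mu, R):
    """z: (N,) binary unanimity outcomes; mu: (N, J) linear indices."""
    ones = np.ones(mu.shape[1], dtype=int)
    p = np.array([mp_cell_prob(ones, m, R) for m in mu])  # P_n(iota_J | theta)
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.sum(z * np.log(p) + (1 - z) * np.log1p(-p))
```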

• The information matrix for MPO is

I_MPO(θ) = Σn [Pn(ιJ | θ)(1 − Pn(ιJ | θ))]⁻¹ [∂Pn(ιJ | θ)/∂θ][∂Pn(ιJ | θ)/∂θ]′, (17)

the usual Bernoulli form with success probability Pn(ιJ | θ).

• The additional information in MP over MPO is

I_MP(θ) − I_MPO(θ) = Σn [ Σ_{i ≠ ιJ} gn(i | θ) − [1 − Pn(ιJ | θ)]⁻¹ (Σ_{i ≠ ιJ} ∂Pn(i | θ)/∂θ)(Σ_{i ≠ ιJ} ∂Pn(i | θ)/∂θ)′ ], (18)

where i = [i1, ..., iJ]′, ιJ = [1, ..., 1]′, and gn(i | θ) is given by (15).
◦ Clearly (18) is positive semidefinite.
◦ The internal sum in (18) is over the cells for yn that cannot be distinguished when zn = 0.

• MPO reflects unanimous consent among J agents.
◦ J = 2 (marriage).
◦ J = 12 (American criminal justice):
· zn = 1 (guilty);
· zn = 0 implies unanimous agreement on "not guilty" or a hung jury.

• The next section discusses conditions under which (17) is positive definite, and hence, θ is locally identified.
◦ But even when θ is identified, MPO is unlikely to provide much information on θ when J > 2.
◦ So it is natural to look at other types of partial information that involve more information than MPO, but less than MP.

• Suppose the complete set of J(J − 1)/2 bivariate products

znij = yni·ynj (i, j = 1, ..., J; i < j)

is observed. BPO applied to all possible pairs in MP is defined as multivariate bivariate partial observability (MBPO).

◦ MP uncovers all 2^J cells for yn.
◦ MPO uncovers one cell, yn = [1, ..., 1]′, but cannot distinguish among the other 2^J − 1 cells.
◦ MBPO uncovers 2^J − J − 1 cells, but cannot distinguish among the J + 1 cells with all znij = 0 (i.e., the cell yn = 0J and the J cells with exactly one ynj = 1).
◦ As J increases, the proportion of cells that can be distinguished increases (see the enumeration sketch below).
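A small enumeration check of these cell counts, assuming nothing beyond the definition of MBPO; the names are illustrative:

```python
from itertools import combinations, product

def mbpo_signature(y):
    """All pairwise products z_ij = y_i * y_j (i < j) for a cell y."""
    return tuple(y[i] * y[j] for i, j in combinations(range(len(y)), 2))

J = 4
groups = {}
for y in product([0, 1], repeat=J):
    groups.setdefault(mbpo_signature(y), []).append(y)

uncovered = [g for g in groups.values() if len(g) == 1]
print(len(uncovered))   # 2**J - J - 1 = 11 distinguishable cells for J = 4
```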

• How interesting is the MBPO data situation?
◦ A data-collecting agency can use MBPO to design data releases that are only partially informative.
◦ Consider asking J binary (yes/no) questions including a highly sensitive one in which ynJ = 0 is embarrassing.
· If individuals are only asked whether all their answers equal one (MPO), or whether all answers on pairs of questions are one (MBPO), then a "no" answer does not imply ynJ = 0.
◦ Other suggestions?

• The marginal distribution of znij in MBPO is Bernoulli with

Prob(znij = 1) = Φ2(xn′βi, xn′βj; ρij).

◦ Poirier (1980) gave conditions such that znij identifies βi, βj, and ρij.
◦ The joint sampling distribution of the znij = yni·ynj (i, j = 1, ..., J; i < j) provides a likelihood function for identifying θ.

· Unlike using all possible BPs to estimate MP parameters [e.g., Kimhi (1994, AJAE)], MBPO uses the actual likelihood for the observed products and not a quasi-likelihood ("quasi" because the BPs are not independent).
· MBPO involves J-dimensional integrals; BPO only requires two-dimensional integrals.

• If zn uncovers a specific cell, let dn = 1, and let dn = 0 otherwise.
◦ If dn = 1, then Prob(zn | θ) = Pn(yn | θ).
◦ If dn = 0, then Prob(zn | θ) equals the sum of probabilities for all the cells that cannot be distinguished, i.e.,

Prob(zn | θ) = Σ_{i ∈ C(zn)} Pn(i | θ), (19)

where C(zn) denotes the set of cells consistent with zn.

• The MBPO log-likelihood is

ℓMBPO(θ) = Σn log Prob(zn | θ), (20)

with Prob(zn | θ) given by (19).
◦ The information matrix for MBPO, I_MBPO(θ), follows in the same way as (14) and (17).
◦ I_MBPO(θ) − I_MPO(θ) and I_MP(θ) − I_MBPO(θ) are positive semidefinite.
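A sketch of the MBPO likelihood contribution (19)-(20), reusing mp_cell_prob and mbpo_signature from the earlier blocks; all names are illustrative:

```python
import numpy as np
from itertools import product

def mbpo_prob(z_obs, mu_n, R):
    """Prob(z_n | theta): sum MP cell probabilities over all cells
    whose pairwise products match the observed z_n."""
    J = len(mu_n)
    cells = [y for y in product([0, 1], repeat=J)
             if mbpo_signature(y) == tuple(z_obs)]
    return sum(mp_cell_prob(y, mu_n, R) for y in cells)

def mbpo_loglik(Z, mu, R):
    return sum(np.log(mbpo_prob(z, m, R)) for z, m in zip(Z, mu))
```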


• Consider trivariate partial observability (TPO) (J = 3) [Konig (2003, Rev. World Econ.)].

◦ The additional information in trivariate probit (TP) over TPO is the J = 3 case of (18), where i = [i1, i2, i3]′ and gn(i | θ) is given by (15).

• Trivariate bivariate partial observability (TBPO) arises when only zn12 = yn1·yn2, zn13 = yn1·yn3, and zn23 = yn2·yn3 are observed.

◦ For some realizations of zn = [zn12, zn13, zn23]′ it is possible to recover the unobserved yn:
· If zn12 = zn23 = 1, then yn = [1, 1, 1]′.
· If zn12 = 1 and zn23 = 0, then yn = [1, 1, 0]′.
· If zn12 = 0 and zn23 = 1, then yn = [0, 1, 1]′.
· If zn12 = zn23 = 0 and zn13 = 1, then yn = [1, 0, 1]′.
· Set dn = 1 if any of these four cases occurs.

· Otherwise, set dn = 0; the four remaining cells cannot be distinguished when zn12 = zn13 = zn23 = 0 [see (19)]:

C(zn) = {[0, 0, 0]′, [1, 0, 0]′, [0, 1, 0]′, [0, 0, 1]′}. (23)

◦ See Table 4.

Table 4: Mapping (zn12, zn13, zn23) → yn for TBPO

                    zn13 = 0                          zn13 = 1
            zn12 = 0              zn12 = 1      zn12 = 0      zn12 = 1
zn23 = 0    [0,0,0]′, [1,0,0]′,   [1,1,0]′      [1,0,1]′
            [0,1,0]′, [0,0,1]′
zn23 = 1    [0,1,1]′                                          [1,1,1]′

Note: Blank cells indicate impossible combinations.
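An illustrative check of Table 4's mapping (the function name is hypothetical):

```python
from itertools import product

def tbpo_recover(z12, z13, z23):
    """Cells y = (y1, y2, y3) consistent with the observed products."""
    return [y for y in product([0, 1], repeat=3)
            if (y[0] * y[1], y[0] * y[2], y[1] * y[2]) == (z12, z13, z23)]

print(tbpo_recover(1, 1, 1))  # [(1, 1, 1)]            -> d_n = 1
print(tbpo_recover(0, 0, 0))  # four cells             -> d_n = 0
print(tbpo_recover(1, 1, 0))  # []  (impossible combination)
```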

• The joint distribution of zn is given in Table 5.

Table 5: Sampling Distribution for TBPO

zn12  zn13  zn23   Prob(zn | θ)
 0     0     0     Pn([0,0,0]′ | θ) + Pn([1,0,0]′ | θ) + Pn([0,1,0]′ | θ) + Pn([0,0,1]′ | θ)
 1     0     0     Pn([1,1,0]′ | θ)
 0     1     0     Pn([1,0,1]′ | θ)
 0     0     1     Pn([0,1,1]′ | θ)
 1     1     1     Pn([1,1,1]′ | θ)

Note: Other combinations of the znij are assigned probability zero.

• The TBPO log-likelihood is

ℓTBPO(θ) = Σn log Prob(zn | θ), (24)

where Prob(zn | θ) is given in Table 5.

◦ The information lost in going from TBPO to TP is I_TP(θ) − I_TBPO(θ).
◦ The information gained in going from TPO to TBPO is I_TBPO(θ) − I_TPO(θ).
◦ Both differences are positive semidefinite.

• Just as in the case of MP, increasing J by one adds new parameters, but also the possibility of more information on βj (1 ≤ j ≤ J) provided ρj(J+1) ≠ 0. The same holds for partial observability.
◦ Going from J to J + 1 increases the number of parameters by K + J.

Identification

• Lindley (1971): "In passing it might be noted that unidentifiability causes no real difficulty in the Bayesian approach." (assuming a proper prior)
• Kadane (1974): "... identification is a property of the likelihood function, and is the same whether considered classically or from the Bayesian approach."
• Poirier (1988, ET): There is, however, no Bayesian free lunch. The "price" is that there exist quantities about which the data are uninformative, i.e., their marginal prior and posterior distributions are identical.

• Consider the bivariate case J = 2.
• The BPO likelihood is

L_BPO(θ) = Πn [Φ2(xn′β1, xn′β2; ρ)]^zn [1 − Φ2(xn′β1, xn′β2; ρ)]^(1 − zn).

• Because

Φ2(xn′β1, xn′β2; ρ) = Φ2(xn′β2, xn′β1; ρ),

there is a labeling problem: interchanging β1 and β2 leaves the likelihood unchanged.
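A quick numeric illustration of this symmetry, using the bpo_loglik sketch from earlier on simulated data; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.standard_normal(200)])
z = rng.integers(0, 2, size=200)

b1, b2, rho = np.array([0.3, -0.5]), np.array([-0.2, 0.8]), 0.4
t12 = np.concatenate([b1, b2, [rho]])   # (beta1, beta2, rho)
t21 = np.concatenate([b2, b1, [rho]])   # labels swapped

print(np.isclose(bpo_loglik(t12, X, z), bpo_loglik(t21, X, z)))  # True
```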

• Geweke (2007, Comp. Stat. & Data Anal.) discussed a similar labeling problem in the context of mixtures.
◦ If the function of interest is invariant to parameter permutation, then the labeling is not a problem.
· An example is prediction of z.
· However, obtaining posterior simulation convergence can be more challenging.
◦ If the function of interest involves, say, only β1, then the prior should also not be permutation invariant.
· Different marginal priors for β1 and β2 should be used.
· E.g., a dogmatic exclusion restriction on a component of β1.

◦ It is inherent in BPO that the researcher must introduce non-data-based information to distinguish between the two equations.

• Given restrictions R(θ) = 0, Poirier (1980, JoE) showed [using Rothenberg (1971, Econometrica)] that θ is locally identified provided the matrix obtained by stacking I_BPO(θ) and ∂R(θ)/∂θ′ has full column rank, where the BPO information matrix is

I_BPO(θ) = Σn [pn11(1 − pn11)]⁻¹ (∂pn11/∂θ)(∂pn11/∂θ)′,

with pn11 = Φ2(xn′β1, xn′β2; ρ12).

• I_BPO(θ) is a weighted sum across observations of outer products of the gradient vectors ∂Φ2(xn′β1, xn′β2; ρ12)/∂θ.
• Provided these gradients exhibit sufficient variation across observations and there is at least one exclusion restriction on β, I_BPO(θ) is nonsingular and θ is locally identified.
◦ If the excluded covariate is continuous, then the required variation is obtained.

◦ The restriction ρ12 = 0 alone may or may not identify β1 and β2.
◦ The intermediate cases of Meng and Schmidt (1983, IER) (e.g., censored probit) are identified without any restrictions.
• Begin with the simplest case: no covariates.
• Japanese proverb: "Your garden is not complete until there's nothing more you can take out of it."

• Pearson (1900, Phil. Trans. Roy. Soc. of Lon., Series A) and Sheppard (1900, Trans. of the Cam. Phil. Soc.).
◦ K = 1, xn = 1 (n = 1, ..., N).
◦ ρ12 is the tetrachoric correlation coefficient.
◦ Now known as BP with no covariates.
◦ With full observability everything is identified. Life is great for BP!

Figure 1: BP Likelihood, β1 = 0, β2 = 0, ρ12 = 0


Figure 2: BP Likelihood Contours, β1 = 0, β2 = 0, ρ12 = 0


◦ If ρ12 ≠ 0, the MLE for BP is more efficient than the univariate probit MLEs:

Cell Counts
          y1 = 0   y1 = 1
y2 = 0    n00      n10
y2 = 1    n01      n11

· The MLE of pij is p̂ij = nij/N. The MLEs of β1, β2, and ρ12 are β̂1 = Φ⁻¹[(n10 + n11)/N], β̂2 = Φ⁻¹[(n01 + n11)/N], and (implicitly) the ρ̂12 solving Φ2(β̂1, β̂2; ρ̂12) = n11/N (see the sketch below).
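A minimal sketch of these closed-form and implicit MLEs, assuming only the four cell counts as inputs; the function name is illustrative:

```python
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm

def bp_intercept_mle(n00, n01, n10, n11):
    N = n00 + n01 + n10 + n11
    b1 = norm.ppf((n10 + n11) / N)          # beta1-hat
    b2 = norm.ppf((n01 + n11) / N)          # beta2-hat
    # rho-hat solves Phi_2(b1, b2; rho) = n11 / N (tetrachoric correlation)
    f = lambda r: multivariate_normal.cdf(
        [b1, b2], mean=[0, 0], cov=[[1, r], [r, 1]]) - n11 / N
    return b1, b2, brentq(f, -0.999, 0.999)

print(bp_intercept_mle(40, 25, 20, 15))
```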

• For this intercept-only case, the sample information matrix for BPO is

I_BPO(θ) = N [p11(1 − p11)]⁻¹ g g′,

where p11 = Φ2(β1, β2; ρ12) and

g = ∂p11/∂θ = [φ(β1)Φ((β2 − ρ12β1)/√(1 − ρ12²)), φ(β2)Φ((β1 − ρ12β2)/√(1 − ρ12²)), φ2(β1, β2; ρ12)]′,

with φ and φ2 the univariate and bivariate standard normal pdfs.

• Clearly I_BPO(θ) is singular in this intercept-only case: it has rank one.
◦ The rank deficiency is 3 − 1 = 2. Knowing ρ12 does not identify β.
◦ β1, β2, and ρ12 are individually unidentified under BPO.
◦ Only the scalar p11 = Φ2(β1, β2; ρ12) is identified. The MLE of p11 is n11/N.

Figure 3: BPO Likelihood, β1 = 0, β2 = 0, ρ12 = 0


Figure 4: BPO Likelihood Contours, β1 = 0, β2 = 0, ρ12 = 0


• Clearly, without restrictions, BPO has a labeling problem.
◦ In fact, there are hyperplanes through the parameter space on which the likelihood is constant:
· β2 = ρ12 β1,
· β1 = ρ12 β2.
◦ Recall the gradient g above.

A "Peculiar" Case [Poirier (1980, JoE)]

Add a binary regressor to xn such that xn = [1, 0]′ for the first r observations, and xn = [1, 1]′ for the last N − r observations.
• Suppose β1 = [β11, β12]′ and β2 = [β21, β22]′ with β12 = 0, or in other words, R(θ) = β12 = 0 and ∂R/∂θ′ = [0, 1, 0, 0, 0].
◦ xn′β1 = β11 and xn′β2 = β21 (n = 1, ..., r).
◦ xn′β1 = β11 and xn′β2 = β21 + β22 (n = r + 1, ..., N).

• Under ρ12 = 0 the information matrix for BPO is a weighted sum of the outer products of only two distinct gradient vectors, one for each covariate configuration.

• Evaluating the rank condition above, the relevant determinant equals zero.
◦ Hence I_BPO(θ) is not positive definite even with the restriction imposed.
◦ Therefore, θ is not identified if the excluded covariate is binary.

• Recall the "simple" case: there are three parameters β11, β21, ρ12, but only Φ2(β11, β21; ρ12) is identified under BPO.
◦ Now add β22, but there is only one additional covariate value.
◦ Even with ρ12 = 0, β11, β21, and β22 are not individually identified, because the covariate does not exhibit sufficient variation.
◦ If the covariate also took on a third value, then β11, β21, and β22 would be individually identified if ρ12 = 0.
◦ If the covariate is continuous, then no peculiar identification problem arises.
◦ The nonlinearity of ℓBPO(θ) provides the local identification.

• In the TPO case the information matrix takes the analogous form, with pn111 = Φ3(xn′β1, xn′β2, xn′β3; ρ12, ρ13, ρ23) in place of pn11.

◦ Identification depends on the nonsingularity of the TPO information matrix (17), which rests on sufficient variation of the gradient over observations.
◦ In the intercept-only case the only thing identified is the scalar Φ3(β1, β2, β3; ρ12, ρ23, ρ13).
◦ Unlike the rank deficiency of two in the bivariate case, the rank deficiency is 6 − 1 = 5 in the trivariate case.

◦ There are also labeling problems, with information matrix (17) becoming singular if β2 = ρ12β1, β1 = ρ12β2, β3 = ρ13β1, β1 = ρ13β3, β2 = ρ23β3, or β3 = ρ23β2.
◦ Following the discussion of the BPO case, let xn = [1, xn2, xn3]′ and suppose β1 = [β11, 0, 0]′, β2 = [β21, β22, 0]′, and β3 = [β31, 0, β33]′.
· In other words, two continuous covariates xn2 and xn3 are introduced together with the four restrictions β12 = β13 = β23 = β32 = 0.
· For the moment also suppose ρ = 03 (all three correlations zero).

· Dropping the last three elements in (36) (those corresponding to ρ) yields the gradient for the remaining parameters.

Computation Issues

• The MP likelihood function requires computation of ΦJ(μn; D).
• Typically J ≤ 4 in applications in health, labor, or education economics.
• Lesaffre and Kaufmann (1992, JASA) showed:
◦ ΦJ(μn; D) is strictly concave in β given D.
◦ ΦJ(μn; D) is not strictly concave in D.
◦ The MLE is consistent and asymptotically normal provided no perfect classifiers exist.
◦ They did not really address computational issues.

• Freedman and Sekhon (2010, Pol. Anal., p. 145) note numerical problems with bivariate probit.
◦ As a function of ρ12, the relevant quantity is:
· strictly monotone in ρ12,
· convex on (−1, 0),
· concave on (0, 1).

• Huguenin, Pelgrin and Holly (2009) provide an exact decomposition of the cdf of the J-variate (standardized) normal vector encountered in the likelihood function of a multivariate probit model.
◦ They obtain a sum of multivariate integrals in which the highest dimension of the integrands is J − 1.
· Integration domains are bounded, simplifying the integration.
· Based on Plackett (1954, Biometrika).
◦ J = 4 is feasible.
◦ They also consider the singular case.

• Gassmann (2003, JCGS) also employed a recursive numerical method based on Plackett (1954, Biometrika).
◦ He argues it works for J ≤ 10.
◦ Drezner (1994) and Genz (2004) used the method for J = 3.

• Gassmann, Deák, and Szántai (2002, JCGS) described and compared numerical methods for finding multivariate probabilities over a rectangle.
◦ Computation times depend on J, the correlation structure, the magnitude of the sought probability, and the required accuracy.
◦ Numerical tests were conducted on approximately 3,000 problems generated randomly in up to J = 20 dimensions.

◦ The findings indicate that direct integration methods give acceptable results for up to J = 12, provided that the probability mass of the rectangle is not too large (less than about 0.9).
◦ For problems with small probabilities (less than 0.3) a crude Monte Carlo method gives reasonable results quickly, while bounding procedures perform best on problems with large probabilities (> 0.9).
◦ For larger problems, numerical integration with quasi-random Korobov points and a decomposition method due to Deák are recommended.

• Factor structures for the correlation matrix:
◦ Integration problems are typically greatly reduced.
◦ Butler and Moffitt (1982, Econometrica): equicorrelated.
◦ Ochi and Prentice (1984, Biometrika): equicorrelated.
◦ Hausman and Wise (1978, Econometrica): J = 3.
◦ Muthén (1979, JASA): J = 4.
◦ Hausman (1980, Cahiers du Séminaire d'Économétrie).
◦ Bock and Gibbons (1996, Biometrics).
◦ Gibbons and Wilcox-Gök (1998, JASA): MLE, J = 5.
◦ Song and Lee (2005, Statistica Sinica): MLE, J = 4.
◦ Ansari and Jedidi (2000, Psychometrika): Bayesian, J = 12.

• Bayesians must sample from a nontractable posterior that is proportional to the product of the prior and the MP likelihood.
◦ Data augmentation to the rescue! [Chib and Greenberg (1998, Biometrika), Albert and Chib (1993, JASA), McCulloch and Rossi (1994, JoE)].
• It is possible to sample from the posterior without computing the likelihood function!

◦ Bayesian examples:
· Czado (1996): J = 5.
· Dey and Chen (1996), Chen and Dey (1998, Sankhyā): J = 6.
· Chib and Greenberg (1998): J = 7.
· Voicu and Buddelmeyer (2003, IZA): J = 6.
· Edwards and Allenby (2003, JMR): J = 25, equicorrelated.
· García-Zattera, Jara, Lesaffre and Declerck (2005): J = 8.
· Stefanescu and Turnbull (2005, Biom. J.): equicorrelated.
· Tabet (2007): J = 8.
· Lawrence et al. (2008): J = 6.
· Rosas and Shomer (2008, Leg. Stud. Quar.).
· Duvvuri and Gruca (2010, Psychometrika): J = 4.
· Hahn, Carvalho and Scott (2010): J = 100, six factors.

• MCMC methods have difficulty dealing with the generation of draws from the correlation matrix, because its conditional density is of nonstandard form.
◦ Methods have been developed to generate draws using variants of:
· Metropolis-Hastings algorithms [Chib and Greenberg (1998, Biometrika); Nierop et al. (2000)];
· Griddy-Gibbs methods [Barnard, McCulloch, and Meng (2000); Liechty, Ramaswamy, and Cohen (2000)].
◦ These approaches are computationally intensive and therefore of limited use when applied to large-dimensional problems.

• One contested issue is whether to work in terms of:
◦ the unidentified covariance matrix [Edwards and Allenby (2003, JMR), McCulloch and Rossi (1994, JoE), Rossi, Allenby, and McCulloch (2005, Bayesian Statistics and Marketing)], or
◦ the identified correlation matrix [Chib and Greenberg (1998, Biometrika)].

Priors

• Given the information loss in BPO, an informative Bayesian analysis is an attractive option for supplementing sample information.
◦ Identification of θ in the BPO model is often weak, and what is needed is an informative prior with some public appeal, not a "noninformative" prior.
◦ Below is a prior family which should be attractive in many situations.
◦ Later I will suggest a restricted version depending on only four easily interpreted hyperparameters.

• Assume β = [β1′, β2′]′ is independent of ρ, with a K-variate normal joint pdf, where K = K1 + K2.
◦ This is a conjugate multivariate normal prior for β given ρ.
◦ The prior pdf of ρ is a symmetric beta on (−1, 1).

Posterior

• The posterior pdf of θ is unfortunately analytically intractable, so Bayesian analysis rests on the ability to draw a sample from it.
◦ BPO can be viewed as a binary outcome model in which the link function is a bivariate standard normal cdf.
◦ Thus consider the posterior augmented with the latent data y*n = [y*n1, y*n2]′.

◦ Set initial values and r = 0, and proceed as follows.

Step 1: Consider the conditional distribution of the latent y*n. The only difference from the standard data augmentation case is that we must sample from a truncated bivariate distribution here.

Set r = r + 1 and sample y*n(r) from the region consistent with zn = 1 or with zn = 0, as appropriate.

Step 2: Sample β(r) from its conditional distribution, a multivariate normal.

Step 3: Sample ρ(r) from its conditional distribution. The random walk Metropolis-Hastings algorithm is a convenient way to sample from this univariate distribution.

• Write ρ†(r) = ρ(r − 1) + u, where ρ†(r) is the candidate value, ρ(r − 1) is the current value, and u ~ N(0, N⁻¹).
• Generate a proposed value ρ†(r) by drawing u. Put ρ(r) = ρ†(r) with acceptance probability α(ρ†, ρ), the usual Metropolis-Hastings ratio of the conditional posterior at the candidate to that at the current value (capped at one), and leave ρ(r) = ρ(r − 1) with probability 1 − α(ρ†, ρ).
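A minimal sketch of this random-walk step, assuming a function log_post(rho) proportional to the log of the conditional posterior of ρ; all names are illustrative, and the proposal scale follows the slide's u ~ N(0, N⁻¹):

```python
import numpy as np

def rho_mh_step(rho_curr, log_post, N, rng):
    rho_prop = rho_curr + rng.normal(0.0, 1.0 / np.sqrt(N))  # candidate
    if not -1.0 < rho_prop < 1.0:
        return rho_curr                       # reject: outside (-1, 1)
    log_alpha = log_post(rho_prop) - log_post(rho_curr)
    if np.log(rng.uniform()) < min(0.0, log_alpha):
        return rho_prop                       # accept
    return rho_curr                           # reject
```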

• Go back to Step 1 and repeat until convergence is obtained. As R → ∞, the resulting draws converge in distribution to the augmented posterior.
• Given a sample (r = 1, 2, ..., R), it is then possible to compute posterior expectations of a quantity of interest, say, h(β1, β2, ρ). Examples are:
◦ Moments of parameters.
◦ Posterior probabilities of the signs of parameters.
◦ Predictive cell probabilities Prob(y1 = i, y2 = j) (i, j = 0, 1) corresponding to given covariates.

Restricted Prior

• The following restrictions should be attractive in many cases. Their appeal rests on the ease of choosing the hyperparameters.

• Suppose:
◦ the first element of xn1 and xn2 is unity,
◦ any continuous covariates are measured as deviations from their sample means divided by sample standard deviations, and
◦ dummy variables are left as is.
◦ These conventions guarantee a simple interpretation for the intercepts and render coefficients unitless, facilitating prior elicitation (see the sketch below).
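An illustrative helper implementing these conventions (all names hypothetical):

```python
import numpy as np

def design_matrix(X_cont, X_dummy):
    """Intercept column, standardized continuous covariates, raw dummies."""
    Xc = (X_cont - X_cont.mean(axis=0)) / X_cont.std(axis=0, ddof=1)
    return np.column_stack([np.ones(len(Xc)), Xc, X_dummy])
```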

• Suppose the slopes have prior mean zero (i = 1, 2).
◦ This prior centers the slopes over zero, leaving specification of the intercept hyperparameters to center the intercepts.
◦ One way of choosing the prior intercept means (j = 1, 2) is to consider the average observation, subjectively assess pj = Prob(ynj = 1) there, and then choose the prior intercept mean Φ⁻¹(pj) (j = 1, 2), as in the sketch below.
◦ The hyperparameter c controls the prior precision for β relative to the precision of OLS applied separately to the latent system.
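A small sketch of this elicitation step, assuming an assessed probability p_j at the average observation (names illustrative):

```python
from scipy.stats import norm

def intercept_prior_mean(p_assessed):
    """Prior mean for the intercept implied by Prob(y_nj = 1) = p at x-bar."""
    return norm.ppf(p_assessed)

print(intercept_prior_mean(0.25))   # about -0.674
```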


• The remaining two hyperparameters a and c are candidates for a sensitivity analysis reflecting the strength of the prior.
◦ The hyperparameter a determines the prior precision of ρ.
◦ Centering the prior for ρ over zero (implying the two binary decisions are independent) is often a convenient reference point.
◦ This prior has many similarities with priors in Adkins and Hill (1996), Zellner (1986), and Zellner and Rossi (1984, JoE), but is applied here to bivariate probit.

• Note (j = 1, 2):
◦ Posterior mean slopes shrink the OLS slopes toward zero, and the posterior mean intercepts shrink the OLS intercepts toward Φ⁻¹(pj).
◦ Centering the slopes over nonzero values, or changing the prior precision matrix to an arbitrary positive definite matrix, does not complicate matters.

Concluding Comments

• Multivariate-t links [Chib (2000), Dey and Chen (1996, 2008)].
• Chen and Dey (1998, Sankhyā): scale mixture of multivariate normal links.
◦ Unlike Chib and Greenberg (1998, Biometrika), they use a Metropolized hit-and-run algorithm [Chen and Schmeiser (1993)].
• Chen and Shao (1999, JMA) considered multivariate data where some outcomes are binary and others are ordinal; Bayesian, J = 3.
• Multivariate multinomial [Zhang, Boscardin, and Belin (2008, CSDA)].

• Correlated binary response data with covariates:
◦ Liang and Zeger (1986, Biometrika)
◦ Zeger and Liang (1986, Biometrics)
◦ Prentice (1988, Biometrics)
◦ Tan, Qu and Kutner (1997, Comm. Stat.)
◦ Carey, Zeger and Diggle (1993, Biometrika)
◦ Glonek and McCullagh (1995, JRSSB)