TRANSCRIPT
Identification in Multivariate Partial Observability Probit
Dale J. Poirier, University of California, Irvine, USA
September 15, 2011
Abstract
Poirier (1980, JoE) considered a bivariate probit model in which the binary dependent variables y1 and y2 were not observed individually, but the product z = y1·y2 was observed. This paper expands this notion of partial observability to multivariate settings.
Bivariate Probit
• Consider N independent observations from the latent bivariate regression model

    yn1* = xn1′β1 + εn1,  yn2* = xn2′β2 + εn2,  [εn1, εn2]′ ~ N2(02, R),  R = [[1, ρ], [ρ, 1]],

where θ = [β1′, β2′, ρ]′ is an unknown parameter vector.
• The bivariate probit (BP) model arises when only the sign of yni* is observed, i.e., yni = 1 if yni* > 0, and yni = 0 if yni* ≤ 0 (i = 1, 2).
  – Zellner and Lee (1965, Econometrica) and Ashford and Sowden (1970, Biometrics) were early contributors.
  – Chib and Greenberg (1998, Biometrika) provided a Bayesian analysis of multivariate probit (MP).
• The bivariate ordered probit model arises when the yni* (i = 1, 2) are observed in more than two categories.
• Of primary concern here is extending the case of bivariate partial observability (BPO) probit introduced in Poirier (1980, JoE), in which only zn = yn1·yn2 is observed. Poirier (1980) also considered BPO in sample selection models (Heckit-like estimators).
• BPO can be thought of as a 2×2 contingency table with covariates and partial observability, as suggested below.

    Cell Counts
               y1 = 0   y1 = 1
    y2 = 0     n00      n10
    y2 = 1     n01      n11

  – In BPO, we only observe (n00 + n01 + n10) = N − n11 and n11.
Example: Consider a two-agent committee that requires both agents to be in favor in order for a motion to pass.
• For example, consider a man and a woman who contemplate getting married in the current year.
• You observe Z = 1 if they get married, and Z = 0 if they don't.
• But you don't observe their individual decisions Y1 = 0, 1 and Y2 = 0, 1.
Note: There are analogs of partial observability in the analysis of contingency tables in which cells cannot be fully distinguished.
• See Fienberg (1970), Cohen (1971), Pocock (1973), Haberman (1974), and Chen and Fienberg (1974, 1976).
• In both cases issues of identifiability require careful analysis.
• An important difference between these contingency table approaches and BPO is that the introduction of covariates provides ties across cells beyond the requirement that cell probabilities must add to unity.
Table 1: Applications with Partial Observability

  Field               Article                       Discrete Outcomes
  agriculture         Dimara and Skuras (2003)      producer aware of innovation;
                                                    producer adopts innovation
  banking             Dwyer and Hassan (2007)       bank fails;
                                                    bank suspends payments
  credit              Swain (2002)                  bank's decision on access;
                                                    household's demand for loans
  education           Ballou (1996)                 individual seeks teaching job;
                                                    educator offers a job
  development         Glewwe (1996)                 working;
                                                    private vs. public sector
  IMF                 Przeworski and Vreeland       decision by IMF to extend financing;
                      (2002); Rajbhandari (2011)    decision by a gov. to apply for assistance
  immigration         Aydemir (2002)                individual applies to emigrate;
                                                    host decides whether to accept
  foreign investment  Konig (2003)                  ownership advantages hold;
                                                    location advantages hold;
                                                    internalization advantages hold
  labor               Mohanty (1992)                individual seeks employment;
                                                    firm selects individual
  law and economics   Feinstein (1990)              violation; detection
  mergers             Brasington (2003)             richer school district;
                                                    poorer school district
• Farber (1982, Econometrica) considered an intermediate case between BP and BPO which can be visualized as the 2×2 contingency table with covariates and the following structure (censored probit).

    Cell Counts
               y1 = 0   y1 = 1
    y2 = 0     n00      n10
    y2 = 1     n01      n11

  – In this case, we observe (n00 + n01), n10, and n11.
• Maximum likelihood estimation of BPO is included in Stata.
• Cavanagh and Sherman (1998, JoE) introduced a class of rank estimators of scaled coefficients in semiparametric monotonic linear index models.
  – CS also considered single equation multiple index models, including BPO probit.
  – Importantly, their results imply that partial observability models are identified in semiparametric analyses without the normality assumption.
• Let Φ(·) denote the univariate standard normal cdf, and Φ2(·, ·; ρ) denote the bivariate standard normal cdf with correlation ρ.
• The likelihood functions for BP and BPO are

    LBP(θ) = Πn Πi,j [Pnij]^1(yn1 = i, yn2 = j),
    LBPO(θ) = Πn [Φ2(xn1′β1, xn2′β2; ρ)]^zn [1 − Φ2(xn1′β1, xn2′β2; ρ)]^(1−zn),

where yn = [yn1, yn2]′ (for BP) and zn = yn1·yn2 (for BPO) are observed, and the component probabilities Pnij = Prob(yn1 = i, yn2 = j | θ) are given in the following table.
Joint Probabilities Pnij (i, j = 0, 1) for BP

              yn1 = 0                                   yn1 = 1
  yn2 = 0     1 − Φ(xn1′β1) − Φ(xn2′β2)                 Φ(xn1′β1) − Φ2(xn1′β1, xn2′β2; ρ)
              + Φ2(xn1′β1, xn2′β2; ρ)
  yn2 = 1     Φ(xn2′β2) − Φ2(xn1′β1, xn2′β2; ρ)         Φ2(xn1′β1, xn2′β2; ρ)
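To make the table concrete, here is a small numerical sketch (not part of the original slides) that evaluates the four BP cell probabilities and the BPO log-likelihood with SciPy. For simplicity it assumes a common covariate vector xn in both equations; the names bp_cell_probs, bpo_loglik, X, z, b1, b2, and rho are all hypothetical.

    import numpy as np
    from scipy.stats import norm, multivariate_normal

    def bp_cell_probs(xb1, xb2, rho):
        """Return P(y1=i, y2=j) for i, j in {0, 1} at indices x'b1, x'b2."""
        biv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
        p11 = biv.cdf([xb1, xb2])            # P(y1=1, y2=1)
        p10 = norm.cdf(xb1) - p11            # P(y1=1, y2=0)
        p01 = norm.cdf(xb2) - p11            # P(y1=0, y2=1)
        p00 = 1.0 - p11 - p10 - p01          # P(y1=0, y2=0)
        return p00, p01, p10, p11

    def bpo_loglik(b1, b2, rho, X, z):
        """BPO log-likelihood: only z = y1*y2 is observed."""
        ll = 0.0
        for xn, zn in zip(X, z):
            _, _, _, p11 = bp_cell_probs(xn @ b1, xn @ b2, rho)
            ll += zn * np.log(p11) + (1 - zn) * np.log(1.0 - p11)
        return ll

Only the (1,1) cell probability enters the BPO likelihood, which is the formal expression of the information loss discussed next.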
• Meng and Schmidt (1983, IER) compared the relative variances for BP vs. BPO for a small number of parameter settings.
  – The information loss under BPO can be "surprisingly large," particularly near points that are not identified.
  – The cost increases with the fraction of the data for which the dependent variable is imperfectly observed.
• If ρ = 0:
  – BP reduces to two univariate probit models with a block diagonal information matrix.
  – The BPO information matrix is not block diagonal.
• If ρ ≠ 0:
  – BP is more efficient than univariate probit.
  – In contrast, the underlying latent system shows no gains from pooling the two equations unless restrictions are added.
    · The nonlinearity of the model is why the OLS efficiency result for the linear latent model does not carry over.
    · But this result does carry over if a bivariate linear probability model is estimated.
Multivariate Probit

• Consider a sample of N independent observations from a multivariate probit model with latent variable representation

    yn* = [yn1*, ..., ynJ*]′,  ynj* = xn′βj + εnj  (j = 1, ..., J),   (1)

where

    εn = [εn1, ..., εnJ]′ ~ NJ(0J, R),   (2)

with R a correlation matrix whose (i, j) element is ρij.
and you only observe the binary outcomes ynj = 1(ynj* > 0) (j = 1, ..., J).
  – Analysis here is conditional on the K×1 vectors xn (n = 1, 2, ..., N).
  – Stack all parameters into the M-dimensional vector θ = [β1′, ..., βJ′, ρ12, ..., ρ(J−1)J]′, where M = JK + J(J − 1)/2 (e.g., M = 3K + 3 when J = 3).
  – Finally, let φJ(·; m, Σ) denote a J-dimensional normal pdf with mean m and covariance matrix Σ, and denote the corresponding cdf by ΦJ(·; m, Σ).   (3)
• The multivariate probit (MP) choice probability for a cell i = [i1, ..., iJ]′ in {0, 1}^J is

    P(yn = i | θ) = Pr(εn ∈ An1 × ··· × AnJ),   (4)

where the regions of integration are given by

    Anj = (−xn′βj, ∞) if ij = 1,  Anj = (−∞, −xn′βj] if ij = 0.   (5)
• A convenient representation for the MP choice probability can be obtained by reparameterizing μn and R to

    μ̃nj = snj·μnj,  ρ̃nij = sni·snj·ρij,

where μnj = xn′βj and snj = 2ynj − 1 (n = 1, ..., N; j = 1, ..., J). Then

    P(yn | θ) = ΦJ(μ̃n; R̃n),   (6)

where R̃n is the correlation matrix with off-diagonal elements ρ̃nij.
• The log-likelihood function for J-dimensional multivariate probit is

    ℓMP(θ) = Σn ln ΦJ(μ̃n; R̃n).   (7)
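A minimal sketch of the sign-flip representation (6), assuming SciPy's multivariate normal cdf; the J×K coefficient matrix B, the correlation matrix R, and the function name mp_cell_prob are illustrative assumptions rather than anything from the slides.

    import numpy as np
    from scipy.stats import multivariate_normal

    def mp_cell_prob(y, x, B, R):
        """P(y_n = y | x_n) for multivariate probit via the sign-flip trick."""
        s = 2.0 * np.asarray(y, dtype=float) - 1.0   # s_j = 2 y_j - 1
        mu = s * (B @ x)                             # flipped index means mu~
        R_flip = np.outer(s, s) * np.asarray(R)      # flipped correlations rho~
        return multivariate_normal(mean=np.zeros(len(s)), cov=R_flip).cdf(mu)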
• Barrenechea (2007) derived the score ∂ℓMP(θ)/∂θ from the derivatives of ΦJ, where R̃n(j) is the (J − 1)×(J − 1) matrix obtained by omitting the jth row and column of R̃n, and R̃n(ij) is the (J − 2)×(J − 2) matrix obtained by omitting the ith and jth rows and columns of R̃n:

    ∂ΦJ(μ̃n; R̃n)/∂μ̃nj = φ(μ̃nj) ΦJ−1(μ̃n(j)*; R̃n(j)*),   (8)
    ∂ΦJ(μ̃n; R̃n)/∂ρ̃nij = φ2(μ̃ni, μ̃nj; ρ̃nij) ΦJ−2(μ̃n(ij)*; R̃n(ij)*),   (9)

where the starred arguments (given in (10)-(13) of the original slides) are the means and correlations of the remaining elements of μ̃n conditional on the jth element (respectively the ith and jth elements), e.g., μ̃ni·j* = (μ̃ni − ρ̃nij μ̃nj)/(1 − ρ̃nij²)^(1/2).
• Table 2 expresses P(yn = i | θ) in terms of the original parameters μn and R in the case J = 3, writing Φ3(a, b, c; ρ12, ρ13, ρ23) for the trivariate standard normal cdf and using (6).

Table 2: Trivariate Probit: P(yn = i | θ)

  yn3 = 0:
    yn2 = 0, yn1 = 0:  Φ3(−μn1, −μn2, −μn3; ρ12, ρ13, ρ23)
    yn2 = 0, yn1 = 1:  Φ3(μn1, −μn2, −μn3; −ρ12, −ρ13, ρ23)
    yn2 = 1, yn1 = 0:  Φ3(−μn1, μn2, −μn3; −ρ12, ρ13, −ρ23)
    yn2 = 1, yn1 = 1:  Φ3(μn1, μn2, −μn3; ρ12, −ρ13, −ρ23)
  yn3 = 1:
    yn2 = 0, yn1 = 0:  Φ3(−μn1, −μn2, μn3; ρ12, −ρ13, −ρ23)
    yn2 = 0, yn1 = 1:  Φ3(μn1, −μn2, μn3; −ρ12, ρ13, −ρ23)
    yn2 = 1, yn1 = 0:  Φ3(−μn1, μn2, μn3; −ρ12, −ρ13, ρ23)
    yn2 = 1, yn1 = 1:  Φ3(μn1, μn2, μn3; ρ12, ρ13, ρ23)
• The information matrix for multivariate probit is

    IMP(θ) = Σn Σi gn(i; θ) gn(i; θ)′ / P(yn = i | θ),   (14)

where the inner sum runs over all 2^J cells and the expected information in cell i = [i1, ..., iJ]′ is built from the gradient

    gn(i; θ) = ∂P(yn = i | θ)/∂θ.   (15)
Multivariate Partial Observability Probit

• A direct extension of Poirier (1980) to multivariate partial observability (MPO) probit corresponds to observing only zn = yn1·yn2···ynJ (n = 1, ..., N).

Table 3: Sampling Distribution for MPO

  zn = 1:  ΦJ(μn; R)
  zn = 0:  1 − ΦJ(μn; R)
• The log-likelihood for MPO is

    ℓMPO(θ) = Σn { zn ln ΦJ(μn; R) + (1 − zn) ln[1 − ΦJ(μn; R)] },   (16)

where zn = yn1·yn2···ynJ is observed.
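A corresponding sketch of (16), again illustrative only: since zn = 1 exactly when all J outcomes equal one, P(zn = 1) = ΦJ(μn; R) can be evaluated directly with SciPy. The inputs B (rows βj′), R, X, and z are hypothetical.

    import numpy as np
    from scipy.stats import multivariate_normal

    def mpo_loglik(B, R, X, z):
        """MPO log-likelihood (16): z_n = 1 iff all J binary outcomes equal one."""
        J = B.shape[0]
        ll = 0.0
        for xn, zn in zip(X, z):
            # P(z_n = 1) = Phi_J(mu_n; R) with mu_n = (x_n'b_1, ..., x_n'b_J)
            p = multivariate_normal(mean=np.zeros(J), cov=R).cdf(B @ xn)
            ll += zn * np.log(p) + (1.0 - zn) * np.log(1.0 - p)
        return ll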
• The information matrix for MPO is

    IMPO(θ) = Σn gn gn′ / { ΦJ(μn; R)[1 − ΦJ(μn; R)] },   (17)

where gn = ∂ΦJ(μn; R)/∂θ.
• The additional information in MP over MPO is

    IMP(θ) − IMPO(θ) = Σn [ Σ(i ≠ ιJ) gn(i; θ)gn(i; θ)′/P(yn = i | θ)
                          − (Σ(i ≠ ιJ) gn(i; θ))(Σ(i ≠ ιJ) gn(i; θ))′ / Σ(i ≠ ιJ) P(yn = i | θ) ],   (18)

where i = [i1, ..., iJ]′, ιJ = [1, ..., 1]′, and gn(i; θ) is given by (15).
  – Clearly (18) is positive semidefinite.
  – The internal sum in (18) is over the cells for yn that cannot be distinguished when zn = 0.
• MPO reflects unanimous consent among J agents.
  – J = 2 (marriage)
  – J = 12 (American criminal justice)
    · Zn = 1 (guilty)
    · Zn = 0 implies unanimous agreement on "not guilty" or a hung jury.
• The next section discusses conditions under which the information matrix (17) is positive definite, and hence, θ is locally identified.
  – But even when θ is identified, MPO is unlikely to provide much information on θ when J > 2.
  – So it is natural to look at other types of partial information that involve more information than MPO, but less than MP.
• Suppose the complete set of J(J − 1)/2 bivariate products

    Znij = Yni·Ynj  (i, j = 1, ..., J; i < j)

is observed. BPO applied to all possible pairs in MP is defined as multivariate bivariate partial observability (MBPO).
  – MP uncovers 2^J cells for yn.
  – MPO uncovers one cell, yn = [1, ..., 1]′, but cannot distinguish among the other 2^J − 1 cells.
  – MBPO uncovers 2^J − J − 1 cells, but cannot distinguish among the J + 1 cells with all znij = 0 (i.e., the cell yn = 0J and the J cells with exactly one ynj = 1).
  – As J increases the proportion of cells that can be distinguished increases.
• How interesting is the MBPO data situation?
  – A data collecting agency can use MBPO to design data releases that are only partially informative.
  – Consider asking J binary (yes/no) questions including a highly sensitive one in which ynJ = 0 is embarrassing.
    · If individuals are only asked whether all their answers equal one (MPO), or whether all answers on pairs of questions are one (MBPO), then a "no" answer does not imply ynJ = 0.
  – Other suggestions?
• The marginal distribution of Znij in MBPO is Bernoulli with

    Prob(Znij = 1) = Φ2(μni, μnj; ρij).

  – Poirier (1980) gave conditions such that Znij identifies βi, βj, and ρij.
  – The joint sampling distribution of Znij = Yni·Ynj (i, j = 1, ..., J; i < j) provides a likelihood function for identifying θ.
    · Unlike using all possible BPs to estimate MP parameters [e.g., Kimhi (1994, AJAE)], MBPO uses the actual likelihood for the observed data and not a quasi-likelihood.
      · "quasi" because the BPs are not independent.
    · MBPO involves J-dimensional integrals. BPO only requires two-dimensional integrals.
• If zn uncovers a specific cell, let dn = 1, and let dn = 0 otherwise.
  – If dn = 1, then Prob(zn | θ) = P(yn = i | θ) for the uncovered cell i.
  – If dn = 0, then Prob(zn | θ) equals the sum of probabilities for all the cells that cannot be distinguished, i.e.,

    Prob(zn | θ) = Σ(i ∈ C(zn)) P(yn = i | θ),   (19)

where C(zn) denotes the set of cells consistent with zn.
• The MBPO log-likelihood is

    ℓMBPO(θ) = Σn ln Prob(zn | θ),   (20)

with Prob(zn | θ) given by (19).
  – The information matrix for MBPO, IMBPO(θ), is built from the cell probabilities and gradients as in (14).
  – IMBPO(θ) − IMPO(θ) and IMP(θ) − IMBPO(θ) are positive semidefinite.
• Consider trivariate partial observability (TPO) (J = 3) [Konig (2003, Rev. World Econ)].
  – The additional information in trivariate probit (TP) over TPO is the J = 3 case of (18), where i = [i1, i2, i3]′ and gn(i; θ) is given by (15).
• Trivariate bivariate partial observability (TBPO) arises when only zn12 = yn1·yn2, zn13 = yn1·yn3, and zn23 = yn2·yn3 are observed.
  – For some realizations of zn = [zn12, zn13, zn23]′ it is possible to recover the unobserved yn.
    · If zn12 = zn23 = 1, then yn = [1, 1, 1]′.
    · If zn12 = 1 and zn23 = 0, then yn = [1, 1, 0]′.
    · If zn12 = 0 and zn23 = 1, then yn = [0, 1, 1]′.
    · If zn12 = zn23 = 0 and zn13 = 1, then yn = [1, 0, 1]′.
    · Set dn = 1 if any of these four cases occurs.
    · Otherwise, set dn = 0 for the four remaining cells, which cannot be distinguished when zn12 = zn13 = zn23 = 0 [see (19)]:

    Prob(zn = 03 | θ) = P(yn = [0,0,0]′) + P(yn = [1,0,0]′) + P(yn = [0,1,0]′) + P(yn = [0,0,1]′).   (23)

  – See Table 4.
Table 4: Mapping (zn12, zn13, zn23) → yn for TBPO

                         zn13 = 0                         zn13 = 1
               zn12 = 0             zn12 = 1      zn12 = 0      zn12 = 1
  zn23 = 0     [0,0,0]′, [1,0,0]′,  [1,1,0]′      [1,0,1]′
               [0,1,0]′, [0,0,1]′
  zn23 = 1     [0,1,1]′                                         [1,1,1]′

Note: Blank cells indicate impossible combinations.
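The Table 4 logic is easy to mechanize. The sketch below is illustrative (the function name tbpo_cells is hypothetical): it returns dn together with the set of cells of yn consistent with an observed (zn12, zn13, zn23).

    def tbpo_cells(z12, z13, z23):
        """Return (d_n, list of y-cells consistent with the observed z's)."""
        if z12 == 1 and z23 == 1:
            return 1, [(1, 1, 1)]
        if z12 == 1 and z23 == 0:
            return 1, [(1, 1, 0)]
        if z12 == 0 and z23 == 1:
            return 1, [(0, 1, 1)]
        if z13 == 1:                          # here z12 = z23 = 0
            return 1, [(1, 0, 1)]
        # z12 = z13 = z23 = 0: four cells cannot be distinguished, cf. (19)
        return 0, [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]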
• The joint distribution of Zn is given in Table 5.
Table 5: Sampling Distribution for TBPO

  zn12  zn13  zn23    Prob(zn | θ)
  0     0     0       P(yn = [0,0,0]′) + P(yn = [1,0,0]′) + P(yn = [0,1,0]′) + P(yn = [0,0,1]′)
  1     0     0       P(yn = [1,1,0]′)
  0     1     0       P(yn = [1,0,1]′)
  0     0     1       P(yn = [0,1,1]′)
  1     1     1       P(yn = [1,1,1]′)

Note: Other combinations of the znij are assigned probability zero.
• The TBPO log-likelihood is

    ℓTBPO(θ) = Σn ln Prob(zn | θ),   (24)

where zn = [zn12, zn13, zn23]′ and Prob(zn | θ) is given in Table 5.
  – The information lost in going from TBPO to TP is ITP(θ) − ITBPO(θ).
  – The information gained in going from TPO to TBPO is ITBPO(θ) − ITPO(θ).
  – Both differences are positive semidefinite, by the same argument as (18).
• Just like in the case of MP, increasing J by one adds new parameters, but also the possibility of more information on βj (1 ≤ j ≤ J) provided ρj(J+1) ≠ 0. The same holds for partial observability.
  – Going from J to J + 1 increases the number of parameters by K + J (K new slopes in βJ+1 and J new correlations ρj(J+1)).
Identification

• Lindley (1971): "In passing it might be noted that unidentifiability causes no real difficulty in the Bayesian approach." (assuming a proper prior)
• Kadane (1974): "... identification is a property of the likelihood function, and is the same whether considered classically or from the Bayesian approach."
• Poirier (1988, ET): There is, however, no Bayesian free lunch. The "price" is that there exist quantities about which the data are uninformative, i.e., their marginal prior and posterior distributions are identical.
• Consider the bivariate case J = 2.
• The BPO likelihood is

    LBPO(θ) = Πn [Φ2(xn1′β1, xn2′β2; ρ)]^zn [1 − Φ2(xn1′β1, xn2′β2; ρ)]^(1−zn).

• Because

    Φ2(a, b; ρ) = Φ2(b, a; ρ),

interchanging the two equations (their covariates and coefficients) leaves the likelihood unchanged: there is a labeling problem.
• Geweke (2007, Comp. Stat. & Data Anal.) discussed a similar labeling problem in the context of mixtures.
  – If the function of interest is invariant to parameter permutation, then the labeling is not a problem.
    · An example is prediction of z.
    · However, obtaining posterior simulation convergence can be more challenging.
  – If the function of interest involves, say, only β1, then the prior should also not be permutation invariant.
    · Different marginal priors for β1 and β2 should be used.
    · E.g., a dogmatic exclusion restriction on a component in β1.
  – It is inherent in BPO that the researcher must introduce non-data-based information to distinguish between the two equations.
• Given restrictions R(θ) = 0, Poirier (1980, JoE) showed [using Rothenberg (1971, Econometrica)] that θ is locally identified provided the information matrix stacked on the restriction Jacobian,

    [ IBPO(θ) ; ∂R(θ)/∂θ′ ],

has full column rank, where the BPO information matrix is

    IBPO(θ) = Σn ∇θΦ2n ∇θΦ2n′ / { Φ2n[1 − Φ2n] },

where Φ2n = Φ2(xn1′β1, xn2′β2; ρ12).
• IBPO(θ) is a weighted sum across observations of outer products of the gradient vectors ∇θΦ2n.
• Provided these gradients exhibit sufficient variation across observations and there is at least one exclusion restriction on β, IBPO(θ) is nonsingular and θ is locally identified.
  – If the excluded covariate is continuous, then the required variation is obtained.
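One way to operationalize this check is numerically: build IBPO(θ) from finite-difference gradients and inspect its rank. The sketch below is illustrative only; the function names, the common covariate vector, and the finite-difference tolerance are assumptions, not part of Poirier (1980).

    import numpy as np
    from scipy.stats import multivariate_normal

    def bpo_p11(theta, xn):
        """Phi_2(x'b1, x'b2; rho) for theta = [b1', b2', rho]'."""
        K = (len(theta) - 1) // 2
        b1, b2, rho = theta[:K], theta[K:2 * K], theta[-1]
        cov = [[1.0, rho], [rho, 1.0]]
        return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([xn @ b1, xn @ b2])

    def bpo_information(theta, X, eps=1e-6):
        """I_BPO(theta): weighted sum of outer products of numerical gradients."""
        M = len(theta)
        info = np.zeros((M, M))
        for xn in X:
            g = np.array([(bpo_p11(theta + eps * e, xn) - bpo_p11(theta - eps * e, xn))
                          / (2.0 * eps) for e in np.eye(M)])
            p = bpo_p11(theta, xn)
            info += np.outer(g, g) / (p * (1.0 - p))
        return info

    # Local identification check: stack the information matrix on the
    # restriction Jacobian dR (hypothetical) and verify full column rank M:
    #   np.linalg.matrix_rank(np.vstack([bpo_information(theta, X), dR])) == len(theta)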
  – The restriction ρ12 = 0 alone may or may not identify β1 and β2.
  – The intermediate cases of Meng and Schmidt (1983, IER) (e.g., censored probit) are identified without any restrictions.
• Begin with the simplest case: no covariates.
• Japanese proverb: "Your garden is not complete until there's nothing more you can take out of it."
• Pearson (1900, Phil. Trans. Roy. Soc. of Lon., Series A) and Sheppard (1900, Trans. of the Cam. Phil. Soc.).
  – K = 1, xn = 1 (n = 1, ..., N).
  – ρ12 is the tetrachoric correlation coefficient.
  – Now known as BP with no covariates.
  – With full observability everything is identified. Life is great for BP!
Figure 1: BP Likelihood, β1 = 0, β2 = 0, ρ12 = 0
Figure 2: BP Likelihood Contours, β1 = 0, β2 = 0, ρ12 = 0
  – If ρ12 ≠ 0, the MLE for BP is more efficient than univariate probit MLEs:

    Cell Counts
               y1 = 0   y1 = 1
    y2 = 0     n00      n10
    y2 = 1     n01      n11

    · The MLE of pij = Prob(y1 = i, y2 = j) is nij/N. The MLEs of β1, β2, and ρ12 are β̂1 = Φ−1[(n10 + n11)/N], β̂2 = Φ−1[(n01 + n11)/N], and (implicitly) ρ̂12 solving Φ2(β̂1, β̂2; ρ̂12) = n11/N.
• For this intercept-only case, the sample information matrix for BPO is

    IBPO(θ) = N q q′ / { Φ2[1 − Φ2] },

where Φ2 = Φ2(β1, β2; ρ12) and

    q = ∇θΦ2 = [ φ(β1)Φ((β2 − ρ12β1)/(1 − ρ12²)^(1/2)), φ(β2)Φ((β1 − ρ12β2)/(1 − ρ12²)^(1/2)), φ2(β1, β2; ρ12) ]′.
• Clearly IBPO(θ) is singular in this intercept-only case.
  – The rank deficiency is 3 − 1 = 2. Knowing ρ12 does not identify β.
  – β1, β2, and ρ12 are individually unidentified for BPO.
  – Only the scalar

    p = Φ2(β1, β2; ρ12)

is identified. The MLE of p is n11/N.
Figure 3: BPO Likelihood, β1 = 0, β2 = 0, ρ12 = 0
Figure 4: BPO Likelihood Contours, β1 = 0, β2 = 0, ρ12 = 0
• Clearly, without restrictions, BPO has a labeling problem.
  – In fact, there are hyperplanes through the parameter space for which the likelihood is constant:
    · β2 = ρ12β1
    · β1 = ρ12β2
  – Recall the form of the gradient q given above.
A "Peculiar" Case [Poirier (1980, JoE)]

Add a binary regressor to xn such that xn = [1, 0]′ for the first r observations, and xn = [1, 1]′ for the last N − r observations.

• Suppose β1 = [β11, β12]′ and β2 = [β21, β22]′, so θ = [β11, β12, β21, β22, ρ12]′, or in other words,

    R(θ) = β12 = 0 and ∂R/∂θ′ = [0, 1, 0, 0, 0].

  – Φ2n = Φ2(β11, β21; ρ12) (n = 1, ..., r)
  – Φ2n = Φ2(β11, β21 + β22; ρ12) (n = r + 1, ..., N).
• Under ρ12 = 0 the information matrix for BPO is the sum of just two distinct outer products, one per subsample:

    IBPO(θ) = r q(1) q(1)′ / { Φ2(1)[1 − Φ2(1)] } + (N − r) q(2) q(2)′ / { Φ2(2)[1 − Φ2(2)] },

where q(1) and q(2) are the gradients ∇θΦ2n on the first and second subsamples.
• Define a nonzero vector δ orthogonal to both gradients q(1) and q(2).
  – Then δ′IBPO(θ)δ = 0, implying IBPO(θ) is not positive definite.
  – Therefore, θ is not identified if the excluded covariate is binary.
• Recall the "simple" case: there are three parameters β11, β21, ρ12, but only Φ2(β11, β21; ρ12) is identified with BPO.
  – Now add β22, but there is only one additional covariate value.
  – Even with ρ12 = 0, β11, β21, and β22 are not individually identified, because the covariate does not exhibit sufficient variation.
  – If the covariate also took on a third value, then β11, β21, and β22 would be individually identified if ρ12 = 0.
  – If the covariate is continuous, then no peculiar identification problem arises.
  – The nonlinearity of ℓBPO(θ) provides the local identification.
• In the TPO case the information matrix is (17) with J = 3:

    ITPO(θ) = Σn gn gn′ / { Φ3n[1 − Φ3n] },

where Φ3n = Φ3(μn1, μn2, μn3; ρ12, ρ13, ρ23) and gn = ∇θΦ3n.
  – Identification depends on the nonsingularity of ITPO(θ), which rests on sufficient variation of the gradient over observations.
  – In the intercept-only case the only thing identified is the scalar Φ3(β1, β2, β3; ρ12, ρ23, ρ13).
  – Unlike the rank deficiency of two in the bivariate case, the rank deficiency is 6 − 1 = 5 in the trivariate case.
  – Also there are labeling problems, with the information matrix becoming singular if β2 = ρ12β1, β1 = ρ12β2, β3 = ρ13β1, β1 = ρ13β3, β2 = ρ23β3, or β3 = ρ23β2.
  – Following the discussion of the BPO case, let xn = [1, xn2, xn3]′ and suppose β1 = [β11, 0, 0]′, β2 = [β21, β22, 0]′, and β3 = [β31, 0, β33]′.
    · In other words, two continuous covariates xn2 and xn3 are introduced together with the four restrictions β12 = β13 = β23 = β32 = 0.
    · For the moment also suppose ρ = 03.
    · Dropping the last three elements in (36) then yields the required nonsingular submatrix.
Computation Issues

• The MP likelihood function requires computation of ΦJ(μn; R).
• J ≤ 4 in applications in health, labor, or education economics.
• Lesaffre and Kaufman (1992, JASA) showed
  – ΦJ(μn; R) is strictly concave in β given R.
  – ΦJ(μn; R) is not strictly concave in R.
  – The MLE is consistent and asymptotically normal provided no perfect classifiers exist.
  – They did not really address computational issues.
• Freedman and Sekhon (2010, Pol. Anal., p. 145) note numerical problems with bivariate probit.
  – Φ2(·, ·; ρ12) is:
    · strictly monotone in ρ12,
    · convex in ρ12 on (−1, 0),
    · concave in ρ12 on (0, 1).
• Huguenin, Pelgrin, and Holly (2009) provide an exact decomposition of the cdf of the J-variate (standardized) normal vector encountered in the likelihood function of a multivariate probit model.
  – They obtain a sum of multivariate integrals in which the highest dimension of the integrands is J − 1.
    · Integration domains are bounded, simplifying the integration.
    · Based on Plackett (1954, Biometrika).
  – J = 4 is feasible.
  – They also consider the singular case.
• Gassmann (2003, JCGS) also employed a recursive numerical method based on Plackett (1954, Biometrika).
  – He argues it works for J ≤ 10.
  – Drezner (1994) and Genz (2004) used the method for J = 3.
• Gassmann, Deák, and Szántai (2002, JCGS) described and compared numerical methods for finding multivariate probabilities over a rectangle.
  – Computation times depend on J, the correlation structure, the magnitude of the sought probability, and the required accuracy.
  – Numerical tests were conducted on approximately 3,000 problems generated randomly in up to J = 20 dimensions.
  – The findings indicate that direct integration methods give acceptable results for up to J = 12, provided that the probability mass of the rectangle is not too large (less than about 0.9).
  – For problems with small probabilities (less than 0.3) a crude Monte Carlo method gives reasonable results quickly, while bounding procedures perform best on problems with large probabilities (> 0.9).
  – For larger problems, numerical integration with quasi-random Korobov points and a decomposition method due to Deák are recommended.
• Factor structures for the correlation matrix.
  – Integration problems are typically greatly reduced.
  – Butler and Moffitt (1992, Econometrica): equicorrelated.
  – Ochi and Prentice (1984, Biometrika): equicorrelated.
  – Hausman and Wise (1978, Econometrica): J = 3.
  – Muthén (1979, JASA): J = 4.
  – Hausman (1980, Cahiers du Séminaire d'Économétrie).
  – Bock and Gibbons (1996, Biometrics).
  – Gibbons and Wilcox-Gök (1998, JASA): MLE, J = 5.
  – Song and Lee (2005, Statistica Sinica): MLE, J = 4.
  – Ansari and Jedidi (2000, Psychometrika): Bayesian, J = 12.
• Bayesians must sample from a nontractable posterior that is proportional to the product of the prior and the MP likelihood.
  – Data augmentation to the rescue! [Chib and Greenberg (1998, Biometrika), Albert and Chib (1993, JASA), McCulloch and Rossi (1994, JoE)].
• It is possible to sample from the posterior without computing the likelihood function!
  – Bayesian examples:
    · Czado (1996): J = 5.
    · Dey and Chen (1996), Chen and Dey (1998, Sankhyā): J = 6.
    · Chib and Greenberg (1998): J = 7.
    · Voicu and Buddelmeyer (2003, IZA): J = 6.
    · Edwards and Allenby (2003, JMR): J = 25, equicorrelated.
    · García-Zattera, Jara, Lesaffre, and Declerk (2005): J = 8.
    · Stefanescu and Turnbull (2005, Biom. J.): equicorrelated.
    · Tabet (2007): J = 8.
    · Lawrence et al. (2008): J = 6.
    · Rosas and Shomer (2008, Leg. Stud. Quar.).
    · Duvvuri and Gruca (2010, Psychometrika): J = 4.
    · Hahn, Carvalho, and Scott (2010): J = 100, six factors.
• MCMC methods have difficulty dealing with the generation of draws from the correlation matrix, because its conditional density is of nonstandard form.
  – Methods have been developed to generate draws using variants of
    · Metropolis-Hastings algorithms [Chib and Greenberg (1998, Biometrika); Nierop et al. (2000)].
    · Griddy-Gibbs methods [Barnard, McCulloch, and Meng (2000); Liechty, Ramaswamy, and Cohen (2000)].
  – These approaches are computationally intensive and therefore are of limited use when applied to large dimensional problems.
• One contested issue is whether to work in terms of
  – the unidentified covariance matrix [Edwards and Allenby (2003, JMR), McCulloch and Rossi (1994, JoE), Rossi, Allenby, and McCulloch (2005, Bayesian Statistics and Marketing)], or
  – the identified correlation matrix [Chib and Greenberg (1998, Biometrika)].
Priors

• Given the information loss in BPO, an informative Bayesian analysis is an attractive option for supplementing sample information.
  – Identification of θ in the BPO model is often weak, and what is needed is an informative prior with some public appeal, not a "noninformative" prior.
  – Below is a prior family which should be attractive in many situations.
  – Later I will suggest a restricted version depending on only four easily interpreted hyperparameters.
• Assume β = [β1′, β2′]′ ⊥ ρ with joint pdf

    f(β, ρ) = φK(β; b, V) f(ρ),

where K = K1 + K2.
  – This is a conjugate multivariate normal prior for β given ρ, with prior mean b and covariance V.
  – The prior pdf f(ρ) is the symmetric beta on (−1, 1),

    f(ρ) ∝ (1 − ρ²)^(c−1),  i.e., (1 + ρ)/2 ~ Beta(c, c).
Posterior

• The posterior pdf of θ is unfortunately analytically intractable. So Bayesian analysis rests on the ability to draw a sample from it.
  – BPO can be viewed as a binary outcome model in which the link function is a bivariate standard normal cdf.
  – Thus consider the data augmented posterior f(β, ρ, y* | z), where y* = [y1*′, ..., yN*′]′ collects the latent vectors yn* = [yn1*, yn2*]′.
  – Set θ equal to an initial value θ(0), set r = 0, and proceed as follows.

Step 1: Consider the distribution of yn* given θ and zn. The only difference from the standard data augmentation case is that we must sample from a truncated bivariate distribution here. Let μn = [xn1′β1, xn2′β2]′. Set r = r + 1 and sample either

    yn*(r) from N2(μn, R) truncated to the positive orthant (both elements positive), if zn = 1,

or

    yn*(r) from N2(μn, R) truncated to the complement of the positive orthant, if zn = 0.
Step 2: Sample β(r) from β | ρ(r−1), y*(r) ~ N(b̄, B̄), the conjugate normal update, where

    B̄ = (V−1 + Σn Xn′R−1Xn)−1,  b̄ = B̄(V−1b + Σn Xn′R−1yn*(r)),

with Xn the 2×K block-diagonal matrix holding xn1′ and xn2′ and R the correlation matrix implied by ρ(r−1).
Step 3: Sample ρ(r) from ρ | β(r), y*(r), whose conditional density is proportional to the product of the latent-data likelihood and the symmetric beta prior f(ρ).

The random walk Metropolis-Hastings algorithm is a convenient way to sample from this univariate distribution.
• Write ρ†(r) = ρ(r−1) + u, where ρ†(r) is the candidate value, ρ(r−1) is the current value, and u ~ N(0, h−1) for a tuning precision h.
• Let g(ρ) denote the conditional posterior density of ρ from Step 3. Generate a proposed value ρ†(r) by drawing u. Put ρ(r) = ρ†(r) with probability

    α(ρ†, ρ) = min{ 1, g(ρ†(r)) / g(ρ(r−1)) },

and leave ρ(r) = ρ(r−1) with probability 1 − α(ρ†, ρ).
• Go back to Step 1 and repeat until convergence is obtained. As R → ∞, the resulting draws converge in distribution to the augmented posterior.
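For readers who want to see the moving parts, here is a crude illustrative sketch of Steps 1 and 3 (not the author's code): naive rejection sampling stands in for a proper truncated bivariate normal sampler, the beta parameter c and the step size are assumed tuning choices, and rng is a NumPy generator.

    import numpy as np
    from scipy.stats import multivariate_normal

    def draw_latent(mu, rho, zn, rng):
        """Step 1: draw y*_n | theta, z_n from a truncated bivariate normal."""
        cov = [[1.0, rho], [rho, 1.0]]
        while True:                              # naive rejection sampling
            y = rng.multivariate_normal(mu, cov)
            if bool((y > 0).all()) == bool(zn):  # z=1: positive orthant; z=0: complement
                return y

    def rho_log_target(rho, b1, b2, ystar, X, c=2.0):
        """Step 3 target: latent-data likelihood times symmetric beta prior."""
        if abs(rho) >= 1.0:
            return -np.inf
        cov = [[1.0, rho], [rho, 1.0]]
        resid = ystar - np.column_stack([X @ b1, X @ b2])
        ll = multivariate_normal(mean=[0.0, 0.0], cov=cov).logpdf(resid).sum()
        return ll + (c - 1.0) * np.log(1.0 - rho ** 2)  # assumed beta prior exponent

    def mh_rho_step(rho, b1, b2, ystar, X, rng, step=0.05):
        """Random-walk MH for rho: accept with probability min{1, g(rho')/g(rho)}."""
        prop = rho + step * rng.standard_normal()
        log_alpha = (rho_log_target(prop, b1, b2, ystar, X)
                     - rho_log_target(rho, b1, b2, ystar, X))
        return prop if np.log(rng.uniform()) < log_alpha else rho

In practice the rejection step should be replaced by a purpose-built truncated multivariate normal sampler; the sketch only illustrates the truncation regions implied by zn.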
• Given a sample θ(r) (r = 1, 2, ..., R), it is then possible to compute posterior expectations of a quantity of interest, say, h(β1, β2, ρ). Examples are:
  – Moments of parameters.
  – Posterior probabilities of the signs of parameters.
  – Predictive densities corresponding to covariates x1 and x2: Prob(y1 = i, y2 = j | x1, x2, z) (i, j = 0, 1).
Restricted Prior

• The following restrictions should be attractive in many cases. Their appeal rests on the ease of choosing the hyperparameters.
• Suppose
  – the first element in xn1 and xn2 is unity,
  – any continuous covariates are measured as deviations from their sample means divided by sample standard deviations, and
  – dummy variables are left as is.
  – These conventions guarantee a simple interpretation for the intercepts and render the coefficients unitless, facilitating prior elicitation.
• Suppose the prior means of all slope coefficients are zero, so b = [b11, 0, ..., 0, b21, 0, ..., 0]′ for the two equations (i = 1, 2).
  – This prior centers the slopes over zero, leaving specification of the hyperparameters b11 and b21 to center the intercepts.
  – One way of choosing bj1 (j = 1, 2) is to consider the average observation x̄, subjectively assess pj = Prob(ynj = 1) at x̄, and then choose bj1 = Φ−1(pj) (j = 1, 2).
  – The hyperparameter a controls the prior precision for β relative to the precision in OLS applied separately to the latent system.
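For example, the Φ−1 elicitation rule above is one line in code; the assessed probabilities here are purely hypothetical placeholders.

    from scipy.stats import norm

    # Hypothetical assessed success probabilities at the average observation.
    p1, p2 = 0.30, 0.55
    b11, b21 = norm.ppf(p1), norm.ppf(p2)   # prior means for the two intercepts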
• The remaining two hyperparameters a and c are candidates for a sensitivity analysis reflecting the strength of the prior.
  – The hyperparameter c determines the precision 4(2c + 1) of ρ under the symmetric beta prior.
  – Centering the prior for ρ over zero (implying the two binary decisions are independent) is often a convenient reference point.
  – This prior has many similarities with priors in Adkins and Hill (1996), Zellner (1986), and Zellner and Rossi (1984, JoE), but is applied here to bivariate probit.
• Note that the posterior mean of βj shrinks the least squares quantities toward the prior (j = 1, 2):
  – Posterior mean slopes shrink the OLS slopes toward zero, and the posterior mean intercept shrinks the OLS intercept toward bj1 = Φ−1(pj).
  – Centering the slopes over non-zero values, or changing the prior precision matrix to an arbitrary positive definite matrix, does not complicate matters.
Concluding Comments

• Multivariate-t link [Chib (2000), Dey and Chen (1996, 2008)].
• Chen and Dey (1998, Sankhyā): scale mixture of multivariate normal link.
  – Unlike Chib and Greenberg (1998, Biometrika), they use a Metropolized hit-and-run algorithm (Chen and Schmeiser, 1993).
• Chen and Shao (1999, JMA) considered multivariate data where some outcomes are binary and others are ordinal; Bayesian, J = 3.
• Multivariate multinomial [Zhang, Boscardin, and Belin (2008, CSDA)].
• Correlated binary response data with covariates:
  – Liang and Zeger (1986, Biometrika)
  – Zeger and Liang (1986, Biometrics)
  – Prentice (1988, Biometrics)
  – Tan, Qu, and Kutner (1997, Comm. Stat.)
  – Carey, Zeger, and Diggle (1993, Biometrika)
  – Glonek and McCullagh (1995, JRSSB).