Parameter identifiability of discrete DAG models
with latent variables
John A. Rhodes
Algebraic Statistics 2014
IIT, May 19 – 22
Note: AK ⊂ US, P(x ∈ AK|x ∈ US) ≈ .16, P(p ∈ AK|p ∈ US) ≈ .0023, P(c ∈ AK|c ∈ US) ≈ 1
Thanks to those who made AS2014 possible:
Local Organizers: Sonja Petrovic, Despina Stasi
Program Committee: Stephen Fienberg, Sonja Petrovic, Seth Sullivant, Henry Wynn, Ruriko Yoshida
IIT grad students: Weronika J. Swiechowicz, Carlo Pierandozzi, Martin Dillon, Kawkab Alhejoj, Dane Wilburne, Junyu He
IIT undergrads: Xintong Li, Meng (Mamie) Wang
Discrete DAG identifiability 2/29
Collaborators:
Elizabeth Allman, Mathematics, UAF
Elena Stanghellini, Statistics, Perugia
Marco Valtorta, Computer Science, South Carolina
Example:
[Figure: example DAG with latent node 0 and observable nodes 1 to 7]
Variables Xi have finite state spaces, of size ni.
Q: Are model parameters identifiable?
Parameters: Conditional probabilities P(Xi | pa(Xi))
Joint distribution of the observables:
$$\sum_{x_0} \prod_i P(X_i \mid \mathrm{pa}(X_i))$$
Identifiability: The joint distribution of observable variables
determines the parameters (up to...)
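A brute-force sketch of the formula above, for tiny graphs only. It assumes a hypothetical encoding: `parents[v]` is a tuple of parent nodes, `cpt[v][pa_values]` is the list of probabilities P(Xv = · | pa(Xv) = pa_values), and node 0 is the latent that gets summed out. The numbers are made-up test values for latent 0 with two binary children 1 and 2.

```python
import itertools
from collections import defaultdict


def observable_joint(states, parents, cpt, latent=0):
    """Marginal of the observables: sum over x_latent of prod_i P(x_i | pa(x_i)).

    states:  dict node -> number of states n_i
    parents: dict node -> tuple of parent nodes (hypothetical encoding)
    cpt:     dict node -> {tuple of parent values: list of probabilities}
    """
    nodes = sorted(states)
    observed = [v for v in nodes if v != latent]
    joint = defaultdict(float)
    # Enumerate every full assignment, latent state included.
    for values in itertools.product(*(range(states[v]) for v in nodes)):
        x = dict(zip(nodes, values))
        prob = 1.0
        for v in nodes:
            pa_values = tuple(x[p] for p in parents[v])
            prob *= cpt[v][pa_values][x[v]]
        # Accumulating into the observable cell sums out the latent.
        joint[tuple(x[v] for v in observed)] += prob
    return dict(joint)


# Made-up parameters: latent 0 with binary children 1 and 2.
states = {0: 2, 1: 2, 2: 2}
parents = {0: (), 1: (0,), 2: (0,)}
cpt = {
    0: {(): [0.4, 0.6]},
    1: {(0,): [0.8, 0.2], (1,): [0.3, 0.7]},
    2: {(0,): [0.9, 0.1], (1,): [0.2, 0.8]},
}
P = observable_joint(states, parents, cpt)
# P[(0, 0)] = 0.4*0.8*0.9 + 0.6*0.3*0.2 = 0.324 (up to rounding)
```

Enumerating all states is exponential in the number of nodes; it is meant to make the sum-product formula concrete, not to be efficient.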
Identifiability:
1) Parameterization is polynomial, so focus on generic behavior.
(generic complex? real? stochastic?)
2) Latent variables ⇒ “label-swapping”
⇒ n0!-to-1 parametrization, at best
Q’: Is the parameterization generically k-to-1 for some finite k?
If so, characterize the fibers of the parameterization.
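A numerical illustration of label swapping, assuming a star model with latent X0 (n0 = 3) and three observable children with 4 states each. Permuting the latent states in P(X0) and in the rows of every P(Xi | X0) leaves the observable distribution unchanged, so each parameter vector shares its fiber with n0! relabelings. All parameter values are random test values.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
n0, sizes = 3, (4, 4, 4)
pi = rng.dirichlet(np.ones(n0))                           # P(X0), test values
Ms = [rng.dirichlet(np.ones(k), size=n0) for k in sizes]  # row s is P(Xi | X0 = s)


def star_joint(pi, Ms):
    # P(x1, x2, x3) = sum_s P(X0 = s) * prod_i P(xi | X0 = s)
    return np.einsum("s,sa,sb,sc->abc", pi, *Ms)


def label_swaps(pi, Ms):
    """All parameter vectors obtained by relabeling the latent states."""
    out = []
    for perm in itertools.permutations(range(len(pi))):
        p = list(perm)
        out.append((pi[p], [M[p] for M in Ms]))
    return out


P = star_joint(pi, Ms)
fiber = label_swaps(pi, Ms)
```

Every element of `fiber` reproduces `P`, which is why n0!-to-1 is the best one can hope for.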
Common practical approach:
With J = Jacobian of the parameterization and N = dim(parameter space),
compute rank(J) at many random points.
• If rank(J) < N everywhere, parameters are not identifiable (∞-to-1).
• If rank(J) = N, then parameters are locally identifiable.
• Since local identifiability ⇏ global identifiability, assume/hope
label swapping is the only issue, so k = n0!.
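A sketch of this rank test for binary star models (latent X0 with m observable children), using central finite differences for J. Assumptions of the sketch: θ = (P(X0=0), P(X1=0|X0=0), P(X1=0|X0=1), ...), random points drawn from [0.2, 0.8], and an explicit rank tolerance of 1e-7 so rounding noise is not counted as rank. Each cell is linear in each single parameter, so the central differences are exact up to rounding. For m = 2, N = 5 but the image lies in the 3-dimensional simplex of 2×2 tables, so the rank should stay at 3; for m = 3 it should reach N = 7.

```python
import numpy as np


def star_joint(theta, m):
    """Binary star model: flattened P(x1, ..., xm) for latent X0 with m children."""
    pi = np.array([theta[0], 1.0 - theta[0]])
    joint = np.zeros((2,) * m)
    for s in range(2):
        term = np.array(pi[s])
        for i in range(m):
            a = theta[1 + 2 * i + s]  # P(Xi = 0 | X0 = s)
            term = np.multiply.outer(term, np.array([a, 1.0 - a]))
        joint += term
    return joint.ravel()


def numerical_jacobian(f, theta, h=1e-5):
    """Central-difference Jacobian of f at theta (one column per parameter)."""
    cols = []
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = h
        cols.append((f(theta + e) - f(theta - e)) / (2 * h))
    return np.column_stack(cols)


def generic_rank(m, trials=5, seed=0):
    """Largest Jacobian rank seen over several random parameter points."""
    rng = np.random.default_rng(seed)
    ranks = []
    for _ in range(trials):
        theta = rng.uniform(0.2, 0.8, size=1 + 2 * m)
        J = numerical_jacobian(lambda t: star_joint(t, m), theta)
        ranks.append(np.linalg.matrix_rank(J, tol=1e-7))
    return max(ranks)


for m in (2, 3):
    print(m, 1 + 2 * m, generic_rank(m))  # m, N, observed rank(J)
```

As the slide warns, full rank only certifies local identifiability; it says nothing about the size k of the finite fibers.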
For any specific DAG and finite state spaces, one can (try to)
answer this question with computational algebra, but...
Q″: What graphical criteria address identifiability?
Cf.: “do”-calculus for identifiability of causal effects, which determines
exactly what is identifiable and gives rational formulas
(for nonparametric latent variables).
Simple DAGs:
[Figure: latent 0 with observable children 1, 2]
$$P(X_1, X_2) = M_1^{T} D M_2$$
D = diag(P(X0)), M1 = P(X1 | X0), M2 = P(X2 | X0)
Non-uniqueness of matrix factorization
⇒ ∞-to-1 parameterization
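A concrete check of this non-uniqueness for binary variables. The matrix T is an arbitrary choice for illustration (columns summing to 1, close to the identity): since M1^T D M2 = (M1^T T)(T^{-1} D M2), rescaling the second factor row by row gives a second set of stochastic parameters with the same P(X1, X2). Varying `eps` traces out a continuum of such parameter sets. The numbers are made-up test values.

```python
import numpy as np

pi = np.array([0.4, 0.6])                # P(X0), so D = diag(pi)
M1 = np.array([[0.8, 0.2], [0.3, 0.7]])  # row s is P(X1 | X0 = s)
M2 = np.array([[0.9, 0.1], [0.2, 0.8]])  # row s is P(X2 | X0 = s)


def pair_joint(pi, M1, M2):
    # P(X1, X2) = M1^T D M2
    return M1.T @ np.diag(pi) @ M2


def alternative_parameters(pi, M1, M2, eps):
    """Refactor P = (M1^T T)(T^{-1} D M2) and rescale into stochastic form."""
    T = np.array([[1 - eps, eps], [eps, 1 - eps]])  # columns sum to 1
    M1_new = (M1.T @ T).T                           # rows still sum to 1
    N_new = np.linalg.inv(T) @ np.diag(pi) @ M2     # must stay nonnegative
    pi_new = N_new.sum(axis=1)                      # new P(X0)
    M2_new = N_new / pi_new[:, None]                # new P(X2 | X0)
    return pi_new, M1_new, M2_new


P = pair_joint(pi, M1, M2)
pi_new, M1_new, M2_new = alternative_parameters(pi, M1, M2, eps=0.05)
```

For small `eps` all entries stay nonnegative, and `pi_new` (about 0.389, 0.611) is not a permutation of `pi`, so this is more than label swapping.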
Star model (tensor decomposition):
[Figure: latent 0 with observable children 1, 2, 3]
Kruskal’s Theorem: The decomposition of a generic 3-tensor is unique
if n1, n2, n3 are sufficiently large relative to n0:
n1 + n2 + n3 ≥ 2n0 + 2
⇒ Parameters are identifiable, up to label swapping.
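A small helper for this condition. Generically the Kruskal rank of the n0 × ni matrix P(Xi | X0) is min(n0, ni), so the sketch uses the form min(n0, n1) + min(n0, n2) + min(n0, n3) ≥ 2n0 + 2, which coincides with the inequality above when every ni ≤ n0 and prevents one tiny state space (e.g., n1 = 1) from being offset by huge ones. It checks only this sufficient condition, not identifiability in general.

```python
def kruskal_sufficient(n0, n1, n2, n3):
    """Generic Kruskal condition for the star model with latent X0 and children X1, X2, X3.

    Uses the generic Kruskal ranks min(n0, ni) of the matrices P(Xi | X0).
    True means uniqueness up to label swapping is guaranteed generically.
    """
    return sum(min(n0, n) for n in (n1, n2, n3)) >= 2 * n0 + 2


for case in [(2, 2, 2, 2), (3, 2, 2, 2), (3, 3, 3, 2), (3, 1, 10, 10)]:
    print(case, kruskal_sufficient(*case))
```

The all-binary case sits exactly on the boundary (2 + 2 + 2 = 2·2 + 2), consistent with the 3-0 model being generically 2-to-1.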
Example (Kuroki and Pearl 2014)
[Figure: DAG with latent 0 and observables 1 to 4]
By do-calculus, P(X3 | do(X2)) is not identifiable.
But if X0 has a finite state space and X1, X4 have larger state spaces than X0,
then P(X3 | do(X2)) is identifiable.
In fact, all parameters are identifiable, up to label swapping:
• Reverse 1 → 2 to get a Markov equivalent model.
• Condition on X2 to get a generic Kruskal model with the “same”
parameters.
• Identify P(X4 | X0), up to label swap.
• Solve P(X1,X2,X3;X4) = P(X1,X2,X3;X0) P(X4 | X0)
for P(X1,X2,X3;X0) to “uncover” the latent.
• From P(X0,X1,X2,X3), find the remaining parameters.
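The “uncover the latent” step is a linear solve. Reading P(X1,X2,X3;X4) as the matrix with rows indexed by (x1, x2, x3) and columns by x4, it factors as A·B with A = P(X1,X2,X3;X0) and B = P(X4 | X0), an n0 × n4 matrix. If B is already identified and has full row rank (generic when n4 ≥ n0), then A = P·B^+ with B^+ a right inverse. The sketch uses NumPy's pseudoinverse on made-up sizes (n0 = 2, n4 = 3, binary X1, X2, X3) and covers only this one step.

```python
import numpy as np

rng = np.random.default_rng(1)
n0, n4, rows = 2, 3, 8  # 8 = 2*2*2 joint states of (X1, X2, X3); test sizes

A = rng.dirichlet(np.ones(rows * n0)).reshape(rows, n0)  # P(x1, x2, x3, x0)
B = rng.dirichlet(np.ones(n4), size=n0)                  # row s is P(X4 | X0 = s)
P = A @ B                                                # P(x1, x2, x3, x4), flattened


def uncover_latent(P, B):
    """Solve P = A @ B for A, given B with full row rank (so B @ pinv(B) = I)."""
    return P @ np.linalg.pinv(B)


A_hat = uncover_latent(P, B)
```

If n4 < n0, B cannot have full row rank and A is no longer determined by this equation alone, which is why the state-space sizes matter.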
More generally...
To gain insight, consider all DAG models with:
• 1 latent, parent of at most 4 observables
• binary variables
Goals:
• Develop algebraic arguments not tied to binary case
• Reduce more complex DAGs to these... (more later)
Is conditioning/marginalizing/Kruskal enough to successfully
analyze these?
Almost ....
Model          dim(Θ)   2^A − 1   k
2-B, B ≥ 0     ≥ 5      3         ∞
3-0            7        7         2
3-Bx, B ≥ 1    ≥ 9      7         ∞
4-0            9        15        2
4-1            11       15        2
4-2a           13       15        ∞
4-2b,c         13       15        2
4-2d           15       15        2
4-3a,b (A)     15       15        2
4-3c,d         17       15        ∞
4-3e,f (B)     15       15        4
4-3g           17       15        ∞
4-3h           25       15        ∞
4-3i           25       15        ∞
4-Bx, B ≥ 4    ≥ 19     15        ∞
(A = number of observable variables; the parameterization is generically k-to-1. Graph drawings omitted.)
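The dim(Θ) column can be checked by counting free parameters: a binary node with p parents contributes 2^p free conditional probabilities, one per parent configuration. A sketch, assuming a hypothetical encoding of each DAG as a dict from node to its set of parents, with 0 the latent. The 4-1 graph uses one arbitrary observable edge (1 → 2), and Model B's edges (1 → 2, 3, 4) are as in the worked example later in the talk.

```python
def binary_dim(parents):
    """Free parameters of a binary DAG model: each node contributes 2**(number of parents)."""
    return sum(2 ** len(pa) for pa in parents.values())


def observed_dim(parents):
    """2**A - 1, assuming exactly one latent node, so A = number of nodes - 1."""
    return 2 ** (len(parents) - 1) - 1


# Hypothetical encodings: node -> set of parents; node 0 is the latent.
star3 = {0: set(), 1: {0}, 2: {0}, 3: {0}}                     # 3-0
star4 = {0: set(), 1: {0}, 2: {0}, 3: {0}, 4: {0}}             # 4-0
one_edge = {0: set(), 1: {0}, 2: {0, 1}, 3: {0}, 4: {0}}       # a 4-1 graph
model_b = {0: set(), 1: {0}, 2: {0, 1}, 3: {0, 1}, 4: {0, 1}}  # 4-3e,f (B)

for name, g in [("3-0", star3), ("4-0", star4), ("4-1", one_edge), ("B", model_b)]:
    print(name, binary_dim(g), observed_dim(g))
```

dim(Θ) > 2^A − 1 forces an ∞-to-1 parameterization, but the converse fails: 4-2a has 13 ≤ 15 and still k = ∞, so counting alone cannot settle these cases.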
2 interesting cases...
Model A (binary)
With binary variables, the parameterization for Model A
[Figure: Model A graph, latent 0 with observables 1 to 4]
is generically 2-to-1 on stochastic parameter space.
This model is not reducible to Kruskal.
Sketch:
• Condition on X1,X3 (4 ways), to give 4 matrices
• Construct expressions in these matrices whose eigenvectors
identify parameters.
• Need a generic condition: distinct eigenvalues. Equivalently,
there is a 3-way interaction between X0, X1, X3.
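Why distinct eigenvalues matter can be seen in the standard simultaneous-diagonalization trick; this is a generic illustration, not necessarily the exact expressions used for Model A. If two matrices share factors, M1 = A D1 B and M2 = A D2 B with D1, D2 diagonal, then M1 M2^{-1} = A (D1 D2^{-1}) A^{-1}, so its eigenvectors are the columns of A up to scale and order. All matrices below are made-up test values.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.uniform(0.1, 1.0, size=(2, 2))  # shared factor to be recovered (test values)
B = rng.uniform(0.1, 1.0, size=(2, 2))
D1 = np.diag([0.2, 0.7])
D2 = np.diag([0.5, 0.4])                # ratios 0.4 and 1.75 are distinct
M1, M2 = A @ D1 @ B, A @ D2 @ B


def shared_factor_from_eigenvectors(M1, M2):
    """Eigen-decompose M1 M2^{-1} = A (D1 D2^{-1}) A^{-1}; its eigenvectors are columns of A."""
    eigvals, eigvecs = np.linalg.eig(M1 @ np.linalg.inv(M2))
    return eigvals, eigvecs


eigvals, eigvecs = shared_factor_from_eigenvectors(M1, M2)
A_unit = A / np.linalg.norm(A, axis=0)  # unit-length columns for comparison
overlap = np.abs(A_unit.T @ eigvecs)    # |cosine| between true columns and eigenvectors
```

If the two ratios in D1 D2^{-1} were equal, M1 M2^{-1} would be a scalar multiple of the identity, every vector would be an eigenvector, and A could not be read off; that is the kind of degeneracy a distinct-eigenvalue condition excludes.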
Why is the 3-way interaction needed?
[Figure: a DAG on latent 0 and observables 1 to 4]
has an ∞-to-1 parametrization. So
[Figure: the same DAG with an added node 5]
does as well.
But conditioning
[Figure: the DAG with node 5]
on X5 yields
[Figure: the Model A graph]
still with an ∞-to-1 parameterization.
Contradiction, since
[Figure: the Model A graph]
has a 2-to-1 parameterization (Model A).
FLAW: Conditioning gave a non-generic instance, with no 3-way
interaction between 0, 1, 3, so there is no contradiction.
Moral 1: Conditioning must be done carefully, to give a generic
model.
Moral 2: Frameworks such as summary graphs and maximal
ancestral graphs, which graphically depict some consequences of
conditioning, are not helpful here: they do not yield generic instances.
Model A (general)
Model A is generically identifiable, up to label swapping, provided
n2, n4 ≥ n0.
Model B (binary)
With binary variables, the parameterization for Model B
[Figure: latent 0 with children 1, 2, 3, 4, and edges 1 → 2, 1 → 3, 1 → 4]
is generically 4-to-1 on stochastic parameter space
(not just label swapping).
Sketch:
• Conditioning a generic model on X1 yields 2 generic
star models (latent 0 with observable children 2, 3, 4).
• These each have 2-to-1 parameterizations.
• Any of the 4 choices of parameters for them can be “combined”
to give parameters for the original model.
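A numerical sketch of the “combining” step for binary Model B (latent 0 with children 1, 2, 3, 4, and edges 1 → 2, 1 → 3, 1 → 4, as read off from the worked example later in the talk). Relabeling the latent states only in the slice X1 = 1 produces a new joint P(X0, X1), from which new P(X0) and P(X1 | X0) follow, while the rows of P(Xj | X0, X1 = 1) are swapped for j = 2, 3, 4. The array layout and parameter values are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
P0 = np.array([0.3, 0.7])                  # P(X0), test values
P1g0 = np.array([[0.6, 0.4], [0.2, 0.8]])  # P1g0[x0, x1] = P(x1 | x0)
Pj = [rng.dirichlet(np.ones(2), size=(2, 2)) for _ in range(3)]  # Pj[k][x0, x1, xj]


def model_b_joint(P0, P1g0, Pj):
    # P(x1, x2, x3, x4) = sum_x0 P(x0) P(x1 | x0) prod_j P(xj | x0, x1)
    return np.einsum("a,ab,abc,abd,abe->bcde", P0, P1g0, *Pj)


def swap_in_slice(P0, P1g0, Pj, x1):
    """Relabel the two latent states only within the slice X1 = x1."""
    joint01 = P0[:, None] * P1g0                   # P(x0, x1)
    joint01[:, x1] = joint01[::-1, x1].copy()
    P0_new = joint01.sum(axis=1)                   # new P(X0)
    P1g0_new = joint01 / P0_new[:, None]           # new P(X1 | X0)
    Pj_new = [arr.copy() for arr in Pj]
    for arr in Pj_new:
        arr[:, x1, :] = arr[::-1, x1, :].copy()    # swap rows in this slice only
    return P0_new, P1g0_new, Pj_new


new = swap_in_slice(P0, P1g0, Pj, x1=1)
```

Here the new P(X0) is (0.74, 0.26), which is neither the original nor its global swap. Swapping in slice 0 instead, or in both slices (the ordinary global label swap), gives the other elements of the fiber, for 2 × 2 = 4 in all.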
Model B (general)
If n2, n3, n4 are sufficiently large relative to n0, then
Model B has an $(n_0!)^{n_1}$-to-1 parameterization.
Moreover, a full fiber can be obtained from any single element by
rational formulas.
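The count $(n_0!)^{n_1}$ corresponds to choosing an independent relabeling of the latent states in each of the n1 slices X1 = x1. The sketch below enumerates those choices for made-up sizes n0 = 3, n1 = 2 (with n2 = n3 = n4 = 4), builds each parameter set using only products, sums and one division, and checks that all give the same observable distribution. It does not verify that the fiber contains nothing else; that is the content of the result above.

```python
import itertools

import numpy as np

rng = np.random.default_rng(4)
n0, n1, n_rest = 3, 2, (4, 4, 4)                     # test sizes
P0 = rng.dirichlet(np.ones(n0))                      # P(X0)
P1g0 = rng.dirichlet(np.ones(n1), size=n0)           # P(x1 | x0)
Pj = [rng.dirichlet(np.ones(k), size=(n0, n1)) for k in n_rest]  # P(xj | x0, x1)


def model_b_joint(P0, P1g0, Pj):
    # P(x1, x2, x3, x4) = sum_x0 P(x0) P(x1 | x0) prod_j P(xj | x0, x1)
    return np.einsum("a,ab,abc,abd,abe->bcde", P0, P1g0, *Pj)


def relabel_per_slice(P0, P1g0, Pj, perms):
    """perms[x1] permutes the latent states only within the slice X1 = x1."""
    joint01 = P0[:, None] * P1g0                     # P(x0, x1)
    new01 = np.empty_like(joint01)
    Pj_new = [np.empty_like(arr) for arr in Pj]
    for x1, perm in enumerate(perms):
        p = list(perm)
        new01[:, x1] = joint01[p, x1]
        for src, dst in zip(Pj, Pj_new):
            dst[:, x1, :] = src[p, x1, :]
    P0_new = new01.sum(axis=1)
    return P0_new, new01 / P0_new[:, None], Pj_new


fiber = [relabel_per_slice(P0, P1g0, Pj, perms)
         for perms in itertools.product(itertools.permutations(range(n0)), repeat=n1)]
P = model_b_joint(P0, P1g0, Pj)
```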
Large DAG models
If a DAG model has a k-to-1 parameterization, then
k is unchanged if we:
• remove observable sinks whose parents are all observable
• pass to a Markov equivalent graph
k may change if we:
• marginalize or condition on observed variables
Cautions:
• Marginalize only over sinks, and even then identifiability may be lost.
• Condition carefully, to get a generic model.
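The first rule can be applied repeatedly, since removing a sink can create new ones. A sketch, assuming a hypothetical encoding of the DAG as a dict from each node to its set of parents plus a set of latent nodes; the example graph is invented for illustration.

```python
def prune_observable_sinks(parents, latent):
    """Repeatedly delete observable sinks whose parents are all observable.

    parents: dict node -> set of parent nodes (hypothetical encoding)
    latent:  set of latent nodes
    Returns a new dict; the input is not modified.
    """
    g = {v: set(ps) for v, ps in parents.items()}
    changed = True
    while changed:
        changed = False
        for v in list(g):
            is_sink = all(v not in ps for ps in g.values())
            if v not in latent and is_sink and not (g[v] & latent):
                del g[v]
                changed = True
    return g


# Invented DAG: latent 0 -> 1, 2, 3; observable 4 has parents 1, 2; 5 has parent 4.
dag = {0: set(), 1: {0}, 2: {0}, 3: {0}, 4: {1, 2}, 5: {4}}
print(sorted(prune_observable_sinks(dag, latent={0})))
```

Node 5 is removed first, which turns 4 into a removable sink; node 3 stays because its parent is latent.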
A general result
Building on Model B:
Theorem: Suppose a DAG has one latent node 0 with no parents,
and three observable sinks 1, 2, 3 that are children of 0.
Let
$$C = \mathrm{Anc}\big(\mathrm{Chd}(0) \cap \mathrm{Anc}(1) \cap \mathrm{Anc}(2) \cap \mathrm{Anc}(3)\big) \setminus \{0\}$$
and
$$u = |C \cap \mathrm{Pa}(1) \cap \mathrm{Pa}(2) \cap \mathrm{Pa}(3)|.$$
Then for binary variables, the parametrization is generically k-to-1
with
$$k = 2^{2^u};$$
the potential fiber can be described, and thus k can be determined
exactly.
(There is also a non-binary version.)
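A sketch that evaluates C, u and k = 2^{2^u} for a given DAG. Assumptions: the DAG is a dict from each node to its set of parents, 0 is the latent node, and Anc(S) includes S itself (this inclusive convention is what reproduces C = {1} in the Model B example). `model_b` encodes Model B with 1 as a parent of 2, 3, 4, as in that example. The function only evaluates the formula; checking the theorem's hypotheses is left to the caller.

```python
def ancestors(parents, nodes):
    """Anc(S): the nodes in S together with all their ancestors (inclusive convention)."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def three_sink_fiber_size(parents, sinks, latent=0):
    """Return (C, u, k) with k = 2**(2**u), for binary variables."""
    children = {v for v, ps in parents.items() if latent in ps}
    common = set.intersection(*(ancestors(parents, [s]) for s in sinks))
    C = ancestors(parents, children & common) - {latent}
    shared_parents = set.intersection(*(set(parents[s]) for s in sinks))
    u = len(C & shared_parents)
    return C, u, 2 ** (2 ** u)


model_b = {0: set(), 1: {0}, 2: {0, 1}, 3: {0, 1}, 4: {0, 1}}
star = {0: set(), 1: {0}, 2: {0}, 3: {0}}
print(three_sink_fiber_size(model_b, (2, 3, 4)))
print(three_sink_fiber_size(star, (1, 2, 3)))
```

For the plain star model C is empty and k = 2, agreeing with the 3-0 row of the table.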
Example: Model B
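Sinks 2, 3, 4, all children of 0:
$$C = \mathrm{Anc}\big(\mathrm{Chd}(0) \cap \mathrm{Anc}(2) \cap \mathrm{Anc}(3) \cap \mathrm{Anc}(4)\big) \setminus \{0\} = \{1\}$$
$$u = |C \cap \mathrm{Pa}(2) \cap \mathrm{Pa}(3) \cap \mathrm{Pa}(4)| = 1$$
so a $2^{2^1} = 4$-to-1 parameterization.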
Example: from the beginning of the talk
[Figure: example DAG with latent node 0 and observable nodes 1 to 7]
Remove 7 (an observable sink whose parents are all observable):
[Figure: the same DAG without node 7]
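Sinks 4, 5, 6 are all children of 0:
$$C = \mathrm{Anc}\big(\mathrm{Chd}(0) \cap \mathrm{Anc}(4) \cap \mathrm{Anc}(5) \cap \mathrm{Anc}(6)\big) \setminus \{0\} = \emptyset$$
$$u = |C \cap \mathrm{Pa}(4) \cap \mathrm{Pa}(5) \cap \mathrm{Pa}(6)| = 0$$
so a $2^{2^0} = 2$-to-1 parameterization.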
Final comments:
• A 2-sink theorem is “under development,” building on
[Figure: a DAG on latent 0 and observables 1 to 4]
• Multiple latent variables with no/limited common children may
be tractable.
• Main impediment to non-binary variables is awkwardness of
statements.