Learning with Tree-averaged Densities and Distributions
Sergey Kirshner
Alberta Ingenuity Centre for Machine Learning,
Department of Computing Science, University of Alberta, Canada
December 5, 2007
NIPS 2007, Poster W12
Overview
• Want to fit a density to complete multivariate data
• New density estimation model based on averaging over tree-dependence structures
  – Distribution = univariate marginals + copula
  – Bayesian averaging over tree-structured copulas
  – Efficient parameter estimation for tree-averaged copulas
• Can solve problems with 10-30 dimensions
Most Popular Distribution… (the Gaussian)
• Interpretable
• Closed under taking marginals
• Generalizes to multiple dimensions
• Models pairwise dependence
• Tractable
• 245 pages out of 691 in Continuous Multivariate Distributions by Kotz, Balakrishnan, and Johnson
[Figure: surface plot of a bivariate Gaussian density]
What If the Data Is NOT Gaussian?
Curse of Dimensionality
Splitting each of the d dimensions into cells of side 1/n produces n^d cells, so the number of cells to populate with data grows exponentially in d.
Probability mass of a standard Gaussian inside $[-2,2]^d$: $\approx 0.9545^{d}$, which vanishes as d grows.
[Bellman 57]
Avoiding the Curse, Step 1: Separating Univariate Marginals
Write the joint density as the product of the univariate marginals (the independent-variables part) and a multivariate dependence term, the copula; the decomposition is written out below.
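The decomposition written out (a standard statement; the exact notation on the original slide is not preserved in this transcript), with F_i the marginal cdf of x_i:

  p(x_1, \dots, x_d) \;=\; \underbrace{\prod_{i=1}^{d} p_i(x_i)}_{\text{univariate marginals}} \;\cdot\; \underbrace{c\big(F_1(x_1), \dots, F_d(x_d)\big)}_{\text{copula density (dependence term)}}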
Monotonic Transformation of the Variables
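A guess at what this slide illustrates, based on the surrounding slides: the monotonic transformation is the probability integral transform, which maps each variable to a uniform on [0,1] through its own cdf,

  u_i = F_i(x_i) \sim \mathrm{Uniform}(0, 1), \qquad i = 1, \dots, d.

Strictly increasing transformations of the individual variables leave the copula unchanged.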
Copula
A copula C is a multivariate distribution (cdf) defined on the unit hypercube with uniform univariate marginals:
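A standard way to write the defining conditions (the formula that followed the colon on the slide is missing from the transcript, so this is a reconstruction):

  C : [0,1]^d \to [0,1], \qquad C(1, \dots, 1, u_i, 1, \dots, 1) = u_i \quad \text{for all } u_i \in [0,1], \; i = 1, \dots, d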
Sklar's Theorem [Sklar 59]
Any joint distribution decomposes as univariate marginals + copula: a joint cdf H with marginals F_1, …, F_d can be written as

  H(x_1, \dots, x_d) = C\big(F_1(x_1), \dots, F_d(x_d)\big)

for some copula C (unique when the marginals are continuous).
Example: Bivariate Gaussian Copula
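The slide's formula is not preserved in the transcript; the standard bivariate Gaussian copula is

  C_\rho(u, v) = \Phi_2\big(\Phi^{-1}(u), \Phi^{-1}(v); \rho\big),

where \Phi is the standard normal cdf and \Phi_2(\cdot, \cdot; \rho) is the bivariate normal cdf with correlation \rho. Its density is the bivariate normal density evaluated at (\Phi^{-1}(u), \Phi^{-1}(v)) divided by the product of the univariate normal densities at those two points.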
Useful Properties of Copulas
• Preserves concordance between the variables
  – Rank-based measure of dependence
• Preserves mutual information
• Can be viewed as a canonical form of a multivariate distribution for the purpose of estimating multivariate dependence
Copula Density
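The formula on this slide is missing from the transcript; the usual definition is the mixed partial derivative of the copula cdf,

  c(u_1, \dots, u_d) = \frac{\partial^d C(u_1, \dots, u_d)}{\partial u_1 \cdots \partial u_d},

which is exactly the dependence term in the density decomposition given earlier.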
Separating Univariate Marginals
1. Fit univariate marginals (parametric or non-parametric)
2. Replace data points with the cdf values of the fitted marginals
3. Estimate the copula density
(Inference for the margins [Joe and Xu 96]; canonical maximum likelihood [Genest et al 95]; a sketch of step 2 follows below.)
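A minimal sketch of step 2 along the canonical-maximum-likelihood route, using the empirical cdf (rank transform) for the marginals; the function name is illustrative, not from the paper:

```python
import numpy as np

def to_pseudo_observations(X):
    """Replace each data point with the value of an estimated marginal cdf,
    giving 'pseudo-observations' on the unit hypercube on which the copula
    density can then be estimated.  The marginals here are estimated
    non-parametrically with the empirical cdf (rank transform)."""
    n, _ = X.shape
    # rank of every entry within its own column, 1..n
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    # rescale into the open interval (0, 1) to keep cdf values off the boundary
    return ranks / (n + 1.0)
```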
What Next?
• Aren't we back to square one?
  – Still estimating a multivariate density from data
• Not quite
  – All marginals are fixed
  – Lots of approaches for copulas
• The vast majority focus on the bivariate case
  – Design models that use only pairs of variables
Tree-Structured Densities
[Figure: example tree-dependence structure over the variables x1, …, x6]
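The factorization such a tree encodes (the slide's own formula is not in the transcript; this is the standard Chow-Liu form) for a tree T with edge set E_T:

  p_T(x) \;=\; \prod_{v=1}^{d} p_v(x_v) \prod_{(u,v) \in E_T} \frac{p_{uv}(x_u, x_v)}{p_u(x_u)\, p_v(x_v)}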
Tree-Structured Copulas
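Since every marginal of a copula is uniform on [0,1], the univariate factors in the tree factorization above are identically 1, and a tree-structured copula density reduces to a product of bivariate copula densities over the tree's edges:

  c_T(u_1, \dots, u_d) \;=\; \prod_{(i,j) \in E_T} c_{ij}(u_i, u_j)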
Chow-Liu Algorithm (for Copulas)
Compute a weight for every pair of variables (associated with the bivariate copula terms c(a_i, a_j) in this four-variable example over a1, …, a4), then keep the maximum-weight spanning tree:

  edge        weight
  (a1, a2)    0.3126
  (a1, a3)    0.0229
  (a1, a4)    0.0172
  (a2, a3)    0.0230
  (a2, a4)    0.0183
  (a3, a4)    0.2603

[Figure: the spanning tree over a1, a2, a3, a4 selected from these edges]
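A minimal sketch of the selection step (Prim's algorithm for a maximum-weight spanning tree over the pairwise weights tabulated above); the function name and weight matrix are illustrative:

```python
import numpy as np

def max_weight_spanning_tree(W):
    """Maximum-weight spanning tree of a complete graph with symmetric
    edge-weight matrix W (d x d).  Returns the chosen edges (i, j)."""
    d = W.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < d:
        best = None
        for i in in_tree:
            for j in range(d):
                if j not in in_tree and (best is None or W[i, j] > W[best]):
                    best = (i, j)
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Example with the weights from the table (a1..a4 mapped to indices 0..3):
W = np.array([[0,      0.3126, 0.0229, 0.0172],
              [0.3126, 0,      0.0230, 0.0183],
              [0.0229, 0.0230, 0,      0.2603],
              [0.0172, 0.0183, 0.2603, 0     ]])
print(max_weight_spanning_tree(W))   # e.g. [(0, 1), (1, 2), (2, 3)]
```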
Distribution over Spanning Trees [Meilă and Jaakkola 00, 06]
[Figure: several of the possible spanning trees over a1, a2, a3, a4]
Summing over all spanning trees can be done in O(d^3)!
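The decomposable form used there (the slide's own formula is not preserved; this is the standard statement from Meilă and Jaakkola): put a non-negative weight \beta_{uv} on every pair of variables and let the probability of a spanning tree factorize over its edges,

  P(T) \;=\; \frac{1}{Z(\beta)} \prod_{(u,v) \in E_T} \beta_{uv},
  \qquad
  Z(\beta) \;=\; \sum_{T'} \prod_{(u,v) \in E_{T'}} \beta_{uv}.

The normalizer Z(\beta), a sum over all d^{d-2} spanning trees, is computable in O(d^3) by the matrix tree theorem as the determinant of a (d-1)×(d-1) minor of the weighted graph Laplacian.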
Tree-Averaged Copula
• Can compute the sum over all d^(d-2) spanning trees (see the sketch below)
• Can be viewed as a mixture over many, many spanning trees
• Can use EM to estimate the parameters
  – Even though there are d^(d-2) mixture components!
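A minimal sketch of how such a sum over all spanning trees is computed in O(d^3) with the weighted matrix tree theorem; the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def sum_over_spanning_trees(weights):
    """Sum, over all spanning trees T of the complete graph on d nodes, of the
    product of the edge weights on T, using the weighted matrix tree theorem:
    the sum equals any cofactor of the weighted graph Laplacian.
    weights: symmetric (d, d) array of non-negative edge weights
             (e.g. beta_ij * c_ij(u_i, u_j) for one data point)."""
    W = np.array(weights, dtype=float)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W      # weighted graph Laplacian
    return np.linalg.det(L[1:, 1:])     # any (d-1)x(d-1) cofactor; O(d^3)
```

This determinant identity is presumably what makes the tree-averaged copula density and the EM steps below tractable despite the d^(d-2) terms.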
EM for Tree-Averaged Copulas
• E-step: compute the posterior over spanning trees for each data point
  – Naively intractable (d^(d-2) trees!), but can be done in O(d^3) per data point
• M-step: update the bivariate copula parameters and the edge weights of the tree distribution
  – Update of the copula parameters is often linear in the number of points
    • Gaussian copula: solving a cubic equation
  – Update of the edge weights is essentially iterative scaling
    • Can be done in O(d^3) per iteration
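One way to see the O(d^3) E-step cost (the symbols \beta_{ij} for the tree-edge weights and c_{ij} for the bivariate copula densities are my notation, not necessarily the slides'): the posterior over trees for a data point u^{(n)} is again decomposable over edges,

  P\big(T \mid u^{(n)}\big) \;\propto\; \prod_{(i,j) \in E_T} \beta_{ij}\, c_{ij}\big(u_i^{(n)}, u_j^{(n)}\big),

so its normalizer (and, via the same weighted Laplacian, the expected edge statistics) is a matrix-tree determinant of the kind sketched above, computable in O(d^3).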
Experiments: Log-Likelihood on Test Data
• UCI ML Repository, MAGIC data set
• 12,000 10-dimensional vectors
• 2,000 examples in the test sets
• Average over 10 partitions
Binary-Continuous Data
Summary
• Multivariate distribution = univariate marginals + copula
• Copula density estimation via tree-averaging
  – Closed form
• Tractable parameter estimation algorithm in the ML framework (EM)
  – O(N d^3) per iteration
• Only bivariate distributions estimated at each step
  – Potentially avoiding the curse of dimensionality
• New model for multi-site rainfall amounts (Poster W12)