Learning with Tree-averaged Densities and Distributions
Sergey Kirshner
Alberta Ingenuity Centre for Machine Learning,
Department of Computing Science, University of Alberta, Canada
December 5, 2007
NIPS 2007, Poster W12
Overview
• Want to fit a density to complete multivariate data
• New density estimation model based on averaging over tree-dependence structures
  – Distribution = univariate marginals + copula
  – Bayesian averaging over tree-structured copulas
  – Efficient parameter estimation for tree-averaged copulas
• Can solve problems with 10-30 dimensions
Most Popular Distribution… (the Gaussian)
• Interpretable
• Closed under taking marginals
• Generalizes to multiple dimensions
• Models pairwise dependence
• Tractable
• 245 pages out of 691 in Continuous Multivariate Distributions by Kotz, Balakrishnan, and Johnson
[Figure: surface plot of a bivariate Gaussian density]
What If the Data Is NOT Gaussian?
Curse of Dimensionality
Splitting each of the d dimensions into cells of side 1/n produces n^d cells, so the number of cells to populate with data grows exponentially in d.
Probability mass of a standard Gaussian inside $[-2,2]^d$: $\approx 0.9545^{d}$, which vanishes as d grows.
[Bellman 57]
Avoiding the Curse, Step 1: Separating Univariate Marginals
Write the joint density as the product of the univariate marginals (the independent-variables part) and a multivariate dependence term, the copula; the decomposition is written out below.
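The decomposition written out (a standard statement; the exact notation on the original slide is not preserved in this transcript), with F_i the marginal cdf of x_i:

  p(x_1, \dots, x_d) \;=\; \underbrace{\prod_{i=1}^{d} p_i(x_i)}_{\text{univariate marginals}} \;\cdot\; \underbrace{c\big(F_1(x_1), \dots, F_d(x_d)\big)}_{\text{copula density (dependence term)}}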
Monotonic Transformation of the Variables
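A guess at what this slide illustrates, based on the surrounding slides: the monotonic transformation is the probability integral transform, which maps each variable to a uniform on [0,1] through its own cdf,

  u_i = F_i(x_i) \sim \mathrm{Uniform}(0, 1), \qquad i = 1, \dots, d.

Strictly increasing transformations of the individual variables leave the copula unchanged.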
Copula
A copula C is a multivariate distribution (cdf) defined on the unit hypercube with uniform univariate marginals:
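A standard way to write the defining conditions (the formula that followed the colon on the slide is missing from the transcript, so this is a reconstruction):

  C : [0,1]^d \to [0,1], \qquad C(1, \dots, 1, u_i, 1, \dots, 1) = u_i \quad \text{for all } u_i \in [0,1], \; i = 1, \dots, d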
Sklar's Theorem [Sklar 59]
Any joint distribution decomposes as univariate marginals + copula: a joint cdf H with marginals F_1, …, F_d can be written as

  H(x_1, \dots, x_d) = C\big(F_1(x_1), \dots, F_d(x_d)\big)

for some copula C (unique when the marginals are continuous).
Example: Bivariate Gaussian Copula
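The slide's formula is not preserved in the transcript; the standard bivariate Gaussian copula is

  C_\rho(u, v) = \Phi_2\big(\Phi^{-1}(u), \Phi^{-1}(v); \rho\big),

where \Phi is the standard normal cdf and \Phi_2(\cdot, \cdot; \rho) is the bivariate normal cdf with correlation \rho. Its density is the bivariate normal density evaluated at (\Phi^{-1}(u), \Phi^{-1}(v)) divided by the product of the univariate normal densities at those two points.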
Useful Properties of Copulas
• Preserves concordance between the variables
  – Rank-based measure of dependence
• Preserves mutual information
• Can be viewed as a canonical form of a multivariate distribution for the purpose of estimating multivariate dependence
Copula Density
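The formula on this slide is missing from the transcript; the usual definition is the mixed partial derivative of the copula cdf,

  c(u_1, \dots, u_d) = \frac{\partial^d C(u_1, \dots, u_d)}{\partial u_1 \cdots \partial u_d},

which is exactly the dependence term in the density decomposition given earlier.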
Separating Univariate Marginals
1. Fit univariate marginals (parametric or non-parametric)
2. Replace data points with the cdf values of the fitted marginals
3. Estimate the copula density
(Inference for the margins [Joe and Xu 96]; canonical maximum likelihood [Genest et al 95]; a sketch of step 2 follows below.)
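A minimal sketch of step 2 along the canonical-maximum-likelihood route, using the empirical cdf (rank transform) for the marginals; the function name is illustrative, not from the paper:

```python
import numpy as np

def to_pseudo_observations(X):
    """Replace each data point with the value of an estimated marginal cdf,
    giving 'pseudo-observations' on the unit hypercube on which the copula
    density can then be estimated.  The marginals here are estimated
    non-parametrically with the empirical cdf (rank transform)."""
    n, _ = X.shape
    # rank of every entry within its own column, 1..n
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    # rescale into the open interval (0, 1) to keep cdf values off the boundary
    return ranks / (n + 1.0)
```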
What Next?
• Aren't we back to square one?
  – Still estimating a multivariate density from data
• Not quite
  – All marginals are fixed
  – Lots of approaches for copulas
• The vast majority focus on the bivariate case
  – Design models that use only pairs of variables
Tree-Structured Densities
[Figure: example tree-dependence structure over the variables x1, …, x6]
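The factorization such a tree encodes (the slide's own formula is not in the transcript; this is the standard Chow-Liu form) for a tree T with edge set E_T:

  p_T(x) \;=\; \prod_{v=1}^{d} p_v(x_v) \prod_{(u,v) \in E_T} \frac{p_{uv}(x_u, x_v)}{p_u(x_u)\, p_v(x_v)}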
Tree-Structured Copulas
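Since every marginal of a copula is uniform on [0,1], the univariate factors in the tree factorization above are identically 1, and a tree-structured copula density reduces to a product of bivariate copula densities over the tree's edges:

  c_T(u_1, \dots, u_d) \;=\; \prod_{(i,j) \in E_T} c_{ij}(u_i, u_j)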
Chow-Liu Algorithm (for Copulas)
Compute a weight for every pair of variables (associated with the bivariate copula terms c(a_i, a_j) in this four-variable example over a1, …, a4), then keep the maximum-weight spanning tree:

  edge        weight
  (a1, a2)    0.3126
  (a1, a3)    0.0229
  (a1, a4)    0.0172
  (a2, a3)    0.0230
  (a2, a4)    0.0183
  (a3, a4)    0.2603

[Figure: the spanning tree over a1, a2, a3, a4 selected from these edges]
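A minimal sketch of the selection step (Prim's algorithm for a maximum-weight spanning tree over the pairwise weights tabulated above); the function name and weight matrix are illustrative:

```python
import numpy as np

def max_weight_spanning_tree(W):
    """Maximum-weight spanning tree of a complete graph with symmetric
    edge-weight matrix W (d x d).  Returns the chosen edges (i, j)."""
    d = W.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < d:
        best = None
        for i in in_tree:
            for j in range(d):
                if j not in in_tree and (best is None or W[i, j] > W[best]):
                    best = (i, j)
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Example with the weights from the table (a1..a4 mapped to indices 0..3):
W = np.array([[0,      0.3126, 0.0229, 0.0172],
              [0.3126, 0,      0.0230, 0.0183],
              [0.0229, 0.0230, 0,      0.2603],
              [0.0172, 0.0183, 0.2603, 0     ]])
print(max_weight_spanning_tree(W))   # e.g. [(0, 1), (1, 2), (2, 3)]
```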
Distribution over Spanning Trees [Meilă and Jaakkola 00, 06]
[Figure: several of the possible spanning trees over a1, a2, a3, a4]
Summing over all spanning trees can be done in O(d^3)!
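The decomposable form used there (the slide's own formula is not preserved; this is the standard statement from Meilă and Jaakkola): put a non-negative weight \beta_{uv} on every pair of variables and let the probability of a spanning tree factorize over its edges,

  P(T) \;=\; \frac{1}{Z(\beta)} \prod_{(u,v) \in E_T} \beta_{uv},
  \qquad
  Z(\beta) \;=\; \sum_{T'} \prod_{(u,v) \in E_{T'}} \beta_{uv}.

The normalizer Z(\beta), a sum over all d^{d-2} spanning trees, is computable in O(d^3) by the matrix tree theorem as the determinant of a (d-1)×(d-1) minor of the weighted graph Laplacian.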
Tree-Averaged Copula
• Can compute the sum over all d^(d-2) spanning trees (see the sketch below)
• Can be viewed as a mixture over many, many spanning trees
• Can use EM to estimate the parameters
  – Even though there are d^(d-2) mixture components!
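A minimal sketch of how such a sum over all spanning trees is computed in O(d^3) with the weighted matrix tree theorem; the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def sum_over_spanning_trees(weights):
    """Sum, over all spanning trees T of the complete graph on d nodes, of the
    product of the edge weights on T, using the weighted matrix tree theorem:
    the sum equals any cofactor of the weighted graph Laplacian.
    weights: symmetric (d, d) array of non-negative edge weights
             (e.g. beta_ij * c_ij(u_i, u_j) for one data point)."""
    W = np.array(weights, dtype=float)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W      # weighted graph Laplacian
    return np.linalg.det(L[1:, 1:])     # any (d-1)x(d-1) cofactor; O(d^3)
```

This determinant identity is presumably what makes the tree-averaged copula density and the EM steps below tractable despite the d^(d-2) terms.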
EM for Tree-Averaged Copulas
• E-step: compute the posterior over spanning trees for each data point
  – Naively intractable (d^(d-2) trees!), but can be done in O(d^3) per data point
• M-step: update the bivariate copula parameters and the edge weights of the tree distribution
  – Update of the copula parameters is often linear in the number of points
    • Gaussian copula: solving a cubic equation
  – Update of the edge weights is essentially iterative scaling
    • Can be done in O(d^3) per iteration
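One way to see the O(d^3) E-step cost (the symbols \beta_{ij} for the tree-edge weights and c_{ij} for the bivariate copula densities are my notation, not necessarily the slides'): the posterior over trees for a data point u^{(n)} is again decomposable over edges,

  P\big(T \mid u^{(n)}\big) \;\propto\; \prod_{(i,j) \in E_T} \beta_{ij}\, c_{ij}\big(u_i^{(n)}, u_j^{(n)}\big),

so its normalizer (and, via the same weighted Laplacian, the expected edge statistics) is a matrix-tree determinant of the kind sketched above, computable in O(d^3).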
Experiments: Log-Likelihood on Test Data
• UCI ML Repository, MAGIC data set
• 12,000 10-dimensional vectors
• 2,000 examples in the test sets
• Average over 10 partitions
Binary-Continuous Data
Summary
• Multivariate distribution = univariate marginals + copula
• Copula density estimation via tree-averaging
  – Closed form
• Tractable parameter estimation algorithm in the ML framework (EM)
  – O(N d^3) per iteration
• Only bivariate distributions estimated at each step
  – Potentially avoiding the curse of dimensionality
• New model for multi-site rainfall amounts (Poster W12)