
Page 1: Learning with Tree-averaged Densities and Distributions

Learning with Tree-averaged Densities and Distributions

Sergey Kirshner
Alberta Ingenuity Centre for Machine Learning,
Department of Computing Science, University of Alberta, Canada

December 5, 2007

NIPS 2007, Poster W12

Page 2: Overview

• Want to fit a density to complete multivariate data
• New density estimation model based on averaging over tree-dependence structures
– Distribution = univariate marginals + copula
– Bayesian averaging over tree-structured copulas
– Efficient parameter estimation for tree-averaged copulas
• Can solve problems with 10-30 dimensions

Page 3: Most Popular Distribution… (the Multivariate Gaussian)

• Interpretable
• Closed under taking marginals
• Generalizes to multiple dimensions
• Models pairwise dependence
• Tractable
• Takes up 245 pages out of 691 in Continuous Multivariate Distributions by Kotz, Balakrishnan, and Johnson

[Figure: surface plot of a bivariate Gaussian density over [-3, 3] × [-3, 3], with density values up to about 0.2.]

Page 4: What If the Data Is NOT Gaussian?

Page 5: Curse of Dimensionality

• A histogram with bins of width 1/n along each axis has n^d cells, exponentially many in the dimension d [Bellman 57]
• Even a wide region holds little mass in high dimension: for a standard Gaussian, the mass inside the hypercube [-2, 2]^d is V_{[-2,2]^d} ≈ 0.9545^d

[Figure: bivariate Gaussian density over [-3, 3] × [-3, 3] with a grid of histogram cells of width 1/n overlaid.]
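A quick numerical check of that claim (a minimal sketch, assuming scipy is available; 0.9545 is the one-dimensional mass P(|Z| ≤ 2) of a standard normal):

```python
from scipy.stats import norm

# One-dimensional mass of a standard normal inside [-2, 2].
p = norm.cdf(2.0) - norm.cdf(-2.0)  # ~0.9545

# Under d independent standard normals, the mass inside the
# hypercube [-2, 2]^d is p**d, which decays exponentially in d.
for d in (1, 10, 30):
    print(d, p ** d)  # 1: ~0.954, 10: ~0.63, 30: ~0.25
```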

Page 6: Avoiding the Curse, Step 1: Separating Univariate Marginals

• Factor the joint density into the univariate marginals (as if the variables were independent) and a multivariate dependence term: the copula.
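Written out, the factorization these labels refer to (a standard identity; the copula density c is defined on Page 12):

  f(x_1, …, x_d) = [∏_{i=1}^d f_i(x_i)] · c(F_1(x_1), …, F_d(x_d))

The first factor is the density the data would have if the variables were independent; c carries all of the dependence.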

Page 7: Monotonic Transformation of the Variables

• Each variable is passed through its own marginal cdf, u_i = F_i(x_i), a monotonic map onto [0, 1] under which each transformed variable is uniform (this is step 2 of the recipe on Page 13).

Page 8: Copula

• A copula C is a multivariate distribution (cdf) defined on the unit hypercube [0, 1]^d with uniform univariate marginals:
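The defining property the colon introduces, restored in its standard form:

  C(1, …, 1, u_i, 1, …, 1) = u_i   for every coordinate i and every u_i ∈ [0, 1]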

Page 9: Sklar's Theorem [Sklar 59]

Any multivariate cdf F with marginals F_1, …, F_d can be written as

  F(x_1, …, x_d) = C(F_1(x_1), …, F_d(x_d))

for some copula C, which is unique when the marginals are continuous. In short: distribution = marginals + copula.

Page 10: Example: Bivariate Gaussian Copula

• The Gaussian copula with correlation ρ is C_ρ(u, v) = Φ_ρ(Φ^{-1}(u), Φ^{-1}(v)), where Φ_ρ is the standard bivariate normal cdf with correlation ρ and Φ^{-1} is the standard normal quantile function.
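A minimal numerical sketch of this example (assuming scipy; the density below is what differentiating C_ρ yields):

```python
from scipy.stats import norm, multivariate_normal

def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho): the joint normal
    density at (Phi^{-1}(u), Phi^{-1}(v)), divided by the two standard
    normal marginal densities at those points."""
    x, y = norm.ppf(u), norm.ppf(v)
    joint = multivariate_normal(mean=[0.0, 0.0],
                                cov=[[1.0, rho], [rho, 1.0]])
    return joint.pdf([x, y]) / (norm.pdf(x) * norm.pdf(y))

print(gaussian_copula_density(0.3, 0.7, rho=0.8))
```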

Page 11: Useful Properties of Copulas

• Preserves concordance between the variables
– A rank-based measure of dependence
• Preserves mutual information
• Can be viewed as a canonical form of a multivariate distribution for the purpose of estimating multivariate dependence
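These properties hold because the copula is invariant under monotonic transformations of the individual variables. A quick empirical check of the rank-based (concordance) claim, using Kendall's tau as the rank measure (an illustrative sketch):

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.8 * x + 0.6 * rng.normal(size=1000)

# Kendall's tau depends only on the ranks of the observations, so any
# strictly increasing transformation of a variable leaves it unchanged.
tau_raw, _ = kendalltau(x, y)
tau_transformed, _ = kendalltau(np.exp(x), y ** 3)
print(tau_raw, tau_transformed)  # identical values
```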

Page 12: Copula Density

• The copula density c is obtained by differentiating the copula cdf C once in each coordinate.
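In standard notation (restored):

  c(u_1, …, u_d) = ∂^d C(u_1, …, u_d) / (∂u_1 ⋯ ∂u_d)

This is the dependence term in the factorization on Page 6.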

Page 13: Separating Univariate Marginals

1. Fit the univariate marginals (parametric or non-parametric)
2. Replace each data point with the cdf values of its marginals
3. Estimate the copula density

This two-stage approach is known as inference for the margins [Joe and Xu 96] or canonical maximum likelihood [Genest et al 95].
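A minimal sketch of steps 1-2 with empirical (rank-based) marginals, as in canonical maximum likelihood; `fit_copula` is a hypothetical stand-in for step 3:

```python
import numpy as np
from scipy.stats import rankdata

def to_pseudo_observations(X):
    """Map an (N, d) data matrix into (0, 1)^d via empirical cdfs.
    Dividing ranks by N + 1 keeps values strictly inside the hypercube."""
    N = X.shape[0]
    return np.apply_along_axis(rankdata, 0, X) / (N + 1)

X = np.random.default_rng(1).normal(size=(500, 3))
U = to_pseudo_observations(X)
print(U.min(), U.max())  # strictly inside (0, 1)
# fit_copula(U)  # step 3: plug in any copula estimator here
```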

Page 14: What Next?

• Aren't we back to square one?
– Still estimating a multivariate density from data
• Not quite
– All marginals are fixed
– Lots of approaches for copulas
• The vast majority focus on the bivariate case
– Design models that use only pairs of variables

Page 15: Tree-Structured Densities

[Figure: a spanning tree over the variables x1, …, x6.]
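The factorization a tree structure T encodes, in its standard form (restored; E(T) is the edge set of T):

  f(x_1, …, x_d) = ∏_{i=1}^d f_i(x_i) × ∏_{(u,v) ∈ E(T)} f_{uv}(x_u, x_v) / (f_u(x_u) f_v(x_v))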

Page 16: Tree-Structured Copulas
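Combining Pages 12 and 15: when the marginals are uniform, the pairwise ratios become bivariate copula densities, so a tree-structured copula density factors over the edges (restored standard form):

  c_T(u_1, …, u_d) = ∏_{(i,j) ∈ E(T)} c_{ij}(u_i, u_j)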

Page 17: Chow-Liu Algorithm (for Copulas)

• Compute a weight for every pair of variables (mutual information in the original algorithm [Chow and Liu 68]), then select the maximum-weight spanning tree; the sketch after the table below shows the selection step.

Pairwise weights in the running example over variables a1, …, a4:

Edge       Weight
(a1, a2)   0.3126
(a1, a3)   0.0229
(a1, a4)   0.0172
(a2, a3)   0.0230
(a2, a4)   0.0183
(a3, a4)   0.2603

[Figure: the six candidate bivariate copula densities c(a_i, a_j) and the resulting maximum-weight spanning tree over a1, …, a4.]
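A minimal sketch of the tree-selection step on the weights above (scipy's minimum spanning tree on negated weights yields the maximum-weight tree):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

# Symmetric pairwise weights from the table (a1..a4 -> indices 0..3).
W = np.zeros((4, 4))
pairs = {(0, 1): 0.3126, (0, 2): 0.0229, (0, 3): 0.0172,
         (1, 2): 0.0230, (1, 3): 0.0183, (2, 3): 0.2603}
for (i, j), w in pairs.items():
    W[i, j] = W[j, i] = w

# Maximum-weight spanning tree = minimum spanning tree of negated weights.
tree = minimum_spanning_tree(-W)
print(list(zip(*tree.nonzero())))  # [(0, 1), (1, 2), (2, 3)]
```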

Page 18: Distribution over Spanning Trees [Meilă and Jaakkola 00, 06]

• Put a nonnegative weight β_uv on every edge and let p(T) be proportional to the product of the edge weights of T
• The normalization, a sum over all spanning trees, can be computed in O(d^3) !!!

[Figure: several spanning trees over a1, …, a4 drawn from the distribution.]
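The O(d^3) computation rests on the weighted matrix-tree theorem: the sum over all spanning trees of the product of edge weights equals any first cofactor of the weighted graph Laplacian. A minimal sketch:

```python
import numpy as np

def sum_over_spanning_trees(beta):
    """Weighted matrix-tree theorem: returns the sum over all spanning
    trees T of prod_{(u,v) in T} beta[u, v], for a symmetric nonnegative
    weight matrix with zero diagonal. One determinant, hence O(d^3)."""
    L = np.diag(beta.sum(axis=1)) - beta  # weighted graph Laplacian
    return np.linalg.det(L[1:, 1:])      # any first cofactor works

# Sanity check on 3 nodes: the three spanning trees contribute
# b01*b02 + b01*b12 + b02*b12.
beta = np.array([[0.0, 2.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 5.0, 0.0]])
print(sum_over_spanning_trees(beta))  # 2*3 + 2*5 + 3*5 = 31
```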

Page 19: Tree-Averaged Copula

• Can compute the sum over all d^(d-2) spanning trees
• Can be viewed as a mixture over many, many spanning trees
• Can use EM to estimate the parameters
– Even though there are d^(d-2) mixture components!
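Putting Pages 16 and 18 together gives the model in closed form (restored here to be consistent with the surrounding pages; Z is the spanning-tree normalizer of the β weights):

  c(u_1, …, u_d) = Σ_T p(T) ∏_{(i,j) ∈ E(T)} c_{ij}(u_i, u_j),   with p(T) = (1/Z) ∏_{(i,j) ∈ E(T)} β_{ij}

Since both p(T) and the tree copula decompose over edges, the sum over all d^(d-2) trees is itself a matrix-tree determinant with edge weights β_{ij} c_{ij}(u_i, u_j), which is the closed form claimed in the summary.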

Page 20: EM for Tree-Averaged Copulas

• Computing responsibilities for all d^(d-2) trees explicitly: intractable!!!
• E-step: compute posterior edge weights instead
– Can be done in O(d^3) per data point
• M-step: update the edge weights β and the bivariate copula parameters
– The copula-parameter update is often linear in the number of points
• Gaussian copula: solving a cubic equation
– The β update is essentially iterative scaling
• Can be done in O(d^3) per iteration
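One concrete way the O(d^3) E-step can be realized, following the matrix-tree machinery of [Meilă and Jaakkola 00] (restored here with hedging, since indexing conventions vary across write-ups): with data-dependent edge weights w_uv = β_uv c_uv(u_u, u_v), all posterior edge probabilities come from a single matrix inverse,

  P((u,v) ∈ T | data point) = w_uv [ (M^{-1})_{uu} + (M^{-1})_{vv} − 2 (M^{-1})_{uv} ]

where M is the weighted Laplacian of w with its first row and column removed, and terms indexed by the removed vertex are taken as zero.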

Page 21: Experiments: Log-Likelihood on Test Data

• MAGIC data set from the UCI ML Repository
• 12000 10-dimensional vectors
• 2000 examples in the test sets
• Results averaged over 10 partitions

Page 22: Binary-Continuous Data

Page 23: Summary

• Multivariate distribution = univariate marginals + copula
• Copula density estimation via tree-averaging
– Closed form
• Tractable parameter estimation algorithm in the ML framework (EM)
– O(Nd^3) per iteration
• Only bivariate distributions at each estimation step
– Potentially avoiding the curse of dimensionality
• New model for multi-site rainfall amounts (Poster W12)