TRANSCRIPT
The More, the Merrier: the Blessing of Dimensionality for Learning
Large Gaussian Mixtures
Joseph Anderson (OSU) Mikhail Belkin (OSU)
Navin Goyal (Microsoft) Luis Rademacher (OSU)
James Voss (OSU)
Curse of Dimensionality
• Many problems become much harder as the dimension increases.
• A common approach is to preprocess the data, mapping it to a lower dimension, and propagate the results back to the original dimension.
The Blessing of Dimensionality
• Unusual(?) Phenomenon: Some problems become much easier as the dimension increases; dimensionality reduction “considered harmful”.
• This Talk: Parameter estimation of Gaussian Mixture Model (GMM) is generically hard in low dimension and easy in high dimension. Exponential gap.
Learning a GMM
• Problem: Given samples from the density 𝑓(x) = Σi=1..k 𝑤i𝑓i(x), where Σi 𝑤i = 1 and 𝑓i is the 𝑛-dimensional density of 𝒩(µi, 𝜮i), estimate the 𝑤i, µi, and 𝜮i efficiently.
• We show a Blessing of Dimensionality for recovery of means and weights.
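As a concrete illustration of the model, here is a minimal sketch of sampling from a spherical GMM; the weights, means, and unit variance below are made-up stand-ins, not parameters from the talk.

```python
import random

def sample_gmm(weights, means, n_samples, sigma=1.0):
    """Draw samples from a mixture of spherical Gaussians N(mu_i, sigma^2 I)."""
    samples = []
    for _ in range(n_samples):
        # Pick component i with probability w_i ...
        mu = random.choices(means, weights=weights)[0]
        # ... then draw x ~ N(mu_i, sigma^2 I) coordinate-wise.
        samples.append([random.gauss(m, sigma) for m in mu])
    return samples

random.seed(0)
weights = [0.3, 0.7]                      # w_i, summing to 1 (made up)
means = [[0.0, 0.0], [5.0, 5.0]]          # mu_i in R^2 (made up)
xs = sample_gmm(weights, means, 1000)
```

The learning problem runs this process in reverse: from `xs` alone, recover `weights` and `means`.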
Our Results: Low Dimension is Hard
• We show that in low dimension efficient estimation of a GMM fails generically.
✤ Dense enough sets can form the means for two different GMM’s which are exponentially close in distribution, but far in parameter space.
• The lower bound, through a reduction, also applies to another problem: Independent Component Analysis.
Our Results: Low-Dimensional Hardness
• Fact: Any algorithm requires Ω(1/𝑑TV(𝑀,𝑁)) samples to distinguish 𝑀 from 𝑁.
• Technique: Use an RKHS to bound the difference between a smooth function and a (Gaussian-kernel) interpolant over sufficiently dense finite subsets of [0,1]ⁿ.
Theorem: Given k² points sampled uniformly from the 𝑛-dimensional hypercube [0,1]ⁿ, one can (whp) find two disjoint subsets 𝐴 and 𝐵 of equal cardinality, together with GMM’s 𝑀 and 𝑁 with unit covariances and means on 𝐴 and 𝐵 respectively, such that
𝑑TV(𝑀,𝑁) < exp(−C (k / log k)^(1/n)),
but the means of 𝑀 and 𝑁 are well separated.
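To see why this bound bites only in low dimension, one can evaluate exp(−C (k / log k)^(1/n)) numerically; here C = 1 is an arbitrary stand-in constant. For fixed k, the exponent (k/log k)^(1/n) is enormous when n is small and tends to 1 as n grows, so the two mixtures are exponentially close in total variation only in low dimension.

```python
import math

def tv_upper_bound(k, n, C=1.0):
    """Evaluate exp(-C * (k / log k)^(1/n)) from the theorem (C=1 is a stand-in)."""
    return math.exp(-C * (k / math.log(k)) ** (1.0 / n))

k = 10**6
for n in (1, 2, 10, 100):
    # Bound underflows to 0.0 for n=1, is ~1e-116 for n=2,
    # but is only a modest constant for n=100.
    print(n, tv_upper_bound(k, n))
```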
Our Results: Very Large Mixtures in High Dimension are Easy to Learn
• Summary: For any fixed p, one can learn the parameters of O(nᵖ) Gaussians in ℝⁿ using smoothed polynomial time and sample size.
Theorem: Assuming each Gaussian has the same, known covariance matrix, one can estimate the means using sample size and time polynomial in the dimension 𝑛, a conditioning parameter 1/𝑠, and other natural parameters (e.g. ratio of largest to smallest weight, failure/success probability, largest eigenvalue of the covariance matrix).
• The conditioning parameter 𝑠 is a generalization of the smallest singular value of the matrix of GMM means.
• Next, we will see that 1/𝑠 is generically (in a smoothed sense) “not too large”.
• Special case: learning k = O(n²) means (p = 2 in the previous theorem). Let 𝑀 be any 𝑛 × 𝑘 matrix of GMM means.
Our Results: Generic Tractability (Smoothed Analysis)
Theorem: If E is an 𝑛 × 𝑘 “perturbation” matrix whose entries are iid draws from 𝒩(0, 𝜎²), for 𝜎 > 0, then
𝐏( 𝑠(𝑀+E) ≤ 𝜎²/𝑛⁷ ) = 𝑂(1/𝑛),
where 𝑠 denotes the conditioning parameter of the GMM means matrix (𝑠 is the kth singular value of the second Khatri-Rao power).
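The second Khatri-Rao power appearing here is the n² × k matrix whose i-th column is the Kronecker product of the i-th mean with itself. A minimal pure-Python sketch of that construction (the 2 × 3 means matrix is a made-up example; computing the kth singular value on top of this would need a linear-algebra library):

```python
def khatri_rao_square(M):
    """Second Khatri-Rao power: column-wise Kronecker product of M with itself.

    M is given as a list of k columns, each of length n; the result has
    k columns of length n^2. The conditioning parameter s is the k-th
    singular value of this matrix.
    """
    return [[a * b for a in col for b in col] for col in M]

# Columns are the means mu_i (made-up 2-dimensional example, k = 3).
M = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
K = khatri_rao_square(M)
# Each column [a, b] becomes [a*a, a*b, b*a, b*b].
```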
Our Results: Blessing of Dimensionality
• In low dimension, GMM learning is generically hard.
• In high enough dimension, learning large GMM’s is generically easy.
Previous Work
• First algorithms with full analysis: [Dasgupta, 1999], [Dasgupta Schulman], [Arora Kannan], [Vempala Wang], [Kannan Salmasian Vempala], [Brubaker Vempala], [Kalai Moitra Valiant], [Belkin Sinha, 2009]. All use projections, and all either require separation assumptions or apply only to special cases.
• [Belkin Sinha, 2010] and [Moitra Valiant, 2010] give polynomial-time algorithms for a fixed number of components, with arbitrary separation; however, they are super-exponential in the number of components. These still use lower-dimensional projections to estimate parameters. [Moitra Valiant, 2010] give an example of two different one-dimensional mixtures that are exponentially close in the number of components.
• [Hsu Kakade, 2012] gave efficient estimation with as many components as the ambient dimension, without separation assumptions. Complexity is related to 𝜎min of the matrix of means.
Simultaneous Work
• [Bhaskara Charikar Moitra Vijayaraghavan, STOC 2014]: independent work that learns large GMM’s in high dimension via a smoothed analysis of tensor decompositions. It can handle unknown, axis-aligned covariances, with a higher running time in the number of components and a lower probability of error.
Independent Component Analysis (ICA)
• Origins in the signal processing community.
• Vast literature.
• The Model: Observations of X = AS + 𝜂, where A is an unknown matrix, S is an unknown random vector whose coordinates are mutually independent, and 𝜂 is noise (typically assumed Gaussian) independent of S.
• The Goal: recover “mixing matrix” A from observations of X.
• We use ICA algorithm of [Goyal Vempala Xiao, ‘14].
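A minimal generative sketch of the ICA model X = AS + 𝜂 may help fix ideas. The mixing matrix, source distributions, and noise level below are all made-up illustrations; this is the model only, not the [Goyal Vempala Xiao] recovery algorithm.

```python
import random

def ica_observation(A, source_samplers, noise_sigma=0.1):
    """One observation X = A S + eta: independent sources plus Gaussian noise."""
    S = [draw() for draw in source_samplers]   # mutually independent coordinates
    X = [sum(a_ij * s_j for a_ij, s_j in zip(row, S)) for row in A]
    return [x + random.gauss(0.0, noise_sigma) for x in X]

random.seed(1)
A = [[1.0, 0.5], [0.2, 1.0]]                   # unknown mixing matrix (made up)
sources = [lambda: random.uniform(-1, 1),      # any non-Gaussian sources work
           lambda: random.expovariate(1.0) - 1.0]
X = ica_observation(A, sources)
```

The goal of ICA is to recover `A` (up to permutation and scaling of columns) from many such observations `X`.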
Poissonization: Basic Idea
• Let 𝔇 be a distribution over the n-element set X where xi has probability wi.
• Let R ~ Poisson(𝜆).
• Fact: After drawing R samples from X according to 𝔇, let S = (S1, S2, …, Sn) be the vector where Si is the number of times xi was drawn. Then for each i, Si ~ Poisson(𝜆wi), and the Si are mutually independent.
• If X is a subset of ℝ𝑛 and we sum the R samples, we can write this as a single sample Y = AS, where A is the matrix whose columns are the points in X.
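A small simulation of this fact (𝜆 and the weights are arbitrary stand-ins): using a pure-Python Poisson sampler, each count Si has empirical mean close to 𝜆wi, as the fact predicts.

```python
import random, math

def sample_poisson(lam):
    """Knuth's method for Poisson(lam) (fine for moderate lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def poissonized_counts(weights, lam):
    """Draw R ~ Poisson(lam) samples from D and return the count vector S."""
    R = sample_poisson(lam)
    counts = [0] * len(weights)
    for _ in range(R):
        i = random.choices(range(len(weights)), weights=weights)[0]
        counts[i] += 1
    return counts

random.seed(0)
lam, weights = 20.0, [0.2, 0.3, 0.5]           # made-up parameters
trials = [poissonized_counts(weights, lam) for _ in range(2000)]
means = [sum(t[i] for t in trials) / len(trials) for i in range(3)]
# Each coordinate should average approximately lam * w_i = 4, 6, 10.
```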
Our Results: Reducing GMM estimation to ICA
• The resulting sample is (approximately) drawn from a product distribution:
✤ Draw 𝑅 ~ Poisson(𝜆) and Z1, …, ZR ~ GMM, and sum: Y = Z1 + … + ZR.
✤ Transform so that Y has independent coordinates.
✤ Choose a threshold and add noise.
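The reduction can be caricatured end to end: summing R GMM draws gives Y = AS plus a Gaussian part, where A’s columns are the means and S is the independent Poisson count vector. The sketch below drops the Gaussian part so Y is exactly A times S; all parameters are made up, and the actual algorithm’s transform and thresholding steps are omitted.

```python
import random, math

def sample_poisson(lam):
    """Knuth's Poisson sampler (fine for moderate lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def poissonized_gmm_sum(means, weights, lam):
    """Draw R ~ Poisson(lam) component labels and sum their means: Y = A S."""
    R = sample_poisson(lam)
    counts = [0] * len(weights)
    for _ in range(R):
        counts[random.choices(range(len(weights)), weights=weights)[0]] += 1
    # Y = A S, with A's columns the means and S the independent Poisson counts.
    Y = [sum(means[j][d] * counts[j] for j in range(len(means)))
         for d in range(len(means[0]))]
    return Y, counts

random.seed(0)
means = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # columns of A (made up)
Y, S = poissonized_gmm_sum(means, [0.5, 0.3, 0.2], 5.0)
```

Feeding many such Y to an ICA algorithm recovers A, i.e. the GMM means, which is the heart of the reduction.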
Conclusion
• In high enough dimension, efficient estimation is possible, even for very large mixtures.
• Proved by a reduction to ICA.
• In low dimension, GMM learning is generically hard.
• Raises questions about dimensionality reduction in general.
• The lower bound also applies to ICA via the reduction.
• Thanks!