Bregman Information Bottleneck
NIPS’03, Whistler, December 2003
Koby Crammer, Hebrew University of Jerusalem
Noam Slonim, Princeton University
Motivation
• Extend the IB to a broad family of representations
• Relation to the exponential family
[Figure: example representations: text ("Hello, world"), a multinomial distribution, vectors]
Outline
• Rate-Distortion Formulation
• Bregman Divergences
• Bregman IB
• Statistical Interpretation
• Summary
Information Bottleneck
[Figure: variables X, T, Y; X is represented by the vector [p(y=1|X) … p(y=n|X)] and T by the vector [p(y=1|T) … p(y=n|T)]]
Rate-Distortion Formulation
• Input
• Variables
• Distortion
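In the standard rate-distortion reading of the IB (Tishby, Pereira & Bialek), the three bullets can be written as follows. The symbols are the conventional ones and are assumed here, with $q$ denoting the distributions the method constructs:

```latex
\begin{aligned}
&\text{Input: } p(x, y) \\
&\text{Variables: } q(t \mid x) \\
&\text{Distortion: } d(x, t) = D_{KL}\big[p(y \mid x)\,\|\,q(y \mid t)\big] \\
&\mathcal{L}\big[q(t \mid x)\big] = I(X;T) + \beta\,\mathbb{E}_{p(x)\,q(t \mid x)}\big[d(x, t)\big]
\end{aligned}
```

Because $\mathbb{E}[d(x,t)] = I(X;Y) - I(T;Y)$ under the Markov condition $T \leftrightarrow X \leftrightarrow Y$, and $I(X;Y)$ does not depend on $q(t \mid x)$, minimizing $\mathcal{L}$ is equivalent to minimizing the familiar $I(X;T) - \beta\, I(T;Y)$.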
Self-Consistent Equations
• Boltzmann distribution:
• Markov + Bayes
• Marginal
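Assuming the standard IB solution, the three labels correspond to the equations below: the Boltzmann form of $q(t \mid x)$, the cluster-conditional $q(y \mid t)$ obtained from the Markov condition plus Bayes' rule, and the cluster marginal. $Z(x,\beta)$ normalizes $q(t \mid x)$ over $t$; iterating the three equations in turn is the usual way to reach a fixed point.

```latex
\begin{aligned}
q(t \mid x) &= \frac{q(t)}{Z(x,\beta)}\exp\!\Big(-\beta\, D_{KL}\big[p(y \mid x)\,\|\,q(y \mid t)\big]\Big) \\
q(y \mid t) &= \sum_x p(y \mid x)\, q(x \mid t) = \frac{1}{q(t)}\sum_x p(y \mid x)\, q(t \mid x)\, p(x) \\
q(t) &= \sum_x p(x)\, q(t \mid x)
\end{aligned}
```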
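Bregman Divergences
[Figure: a convex function f with the points (u, f(u)) and (v, f(v)); the tangent to f at u reaches the point (v, f(u) + f'(u)(v - u)) at v]
For a convex, differentiable $f : S \to \mathbb{R}$:
$B_f(v\,\|\,u) = f(v) - \big(f(u) + f'(u)(v - u)\big)$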
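A minimal sketch of the definition for scalar arguments: the subtracted term is the tangent line to $f$ at $u$ evaluated at $v$, i.e. the point $(v, f(u)+f'(u)(v-u))$ in the figure. The function name `bregman` and the choice $f = \exp$ in the example are illustrative assumptions.

```python
import math

def bregman(f, df, v, u):
    """B_f(v||u) = f(v) - (f(u) + f'(u)(v - u)).

    f is a convex function and df its derivative. The subtracted term is the
    tangent line to f at u, evaluated at v, so for convex f the result is the
    non-negative vertical gap between f(v) and that tangent line.
    """
    tangent_at_v = f(u) + df(u) * (v - u)
    return f(v) - tangent_at_v

# Illustrative convex function: f(x) = exp(x), whose derivative is also exp(x).
f = math.exp
df = math.exp

print(bregman(f, df, 1.0, 0.0))  # about 0.718 (= e - 2)
print(bregman(f, df, 0.0, 1.0))  # 1.0
```

Swapping the arguments gives different values, a reminder that a Bregman divergence is generally not symmetric.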
Bregman IB: Rate-Distortion Formulation
• Functional
• Bregman function
• Input
• Variables
• Distortion
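By analogy with the IB formulation, the five bullets can be sketched as follows. The notation (data vectors $v_x$ with weights $p(x)$, prototypes $\mu_t$) is assumed for illustration:

```latex
\begin{aligned}
&\text{Bregman function: } F : S \to \mathbb{R}\ \text{strictly convex and differentiable} \\
&\text{Input: vectors } \{v_x\} \subset S \text{ with weights } p(x) \\
&\text{Variables: } q(t \mid x) \text{ and prototypes } \mu_t \in S \\
&\text{Distortion: } d(x, t) = B_F(v_x \,\|\, \mu_t) \\
&\text{Functional: } \mathcal{L}\big[q(t \mid x), \{\mu_t\}\big] = I(X;T) + \beta \sum_{x,t} p(x)\, q(t \mid x)\, B_F(v_x \,\|\, \mu_t)
\end{aligned}
```

With $F(v) = \sum_i (v_i \log v_i - v_i)$ on the simplex and $v_x = p(y \mid x)$, the distortion becomes $D_{KL}[p(y \mid x)\,\|\,\mu_t]$, which is the IB case.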
Self-Consistent Equations
• Boltzmann distribution:
• Prototypes: convex combinations of the input vectors
• Marginal
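A minimal sketch of iterating these three equations, assuming half the squared Euclidean distance as the distortion (the soft k-means case) and a fixed number of iterations instead of a convergence test. The names `bregman_ib` and `half_sq` are hypothetical. The prototype update does not depend on $F$: for any Bregman divergence, the weighted mean minimizes $\sum_x w_x B_F(v_x \,\|\, \mu)$ over $\mu$, here with weights $w_x = p(x)\,q(t \mid x)$.

```python
import math

def half_sq(x, mu):
    """B_F(x || mu) for F(v) = ||v||^2 / 2, i.e. half the squared Euclidean distance."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, mu))

def bregman_ib(xs, px, init_protos, beta, div=half_sq, n_iter=100):
    """Iterate the Bregman IB self-consistent equations (illustrative sketch).

    xs: input vectors v_x; px: weights p(x); init_protos: initial prototypes mu_t.
    Assumes every cluster keeps positive mass q(t) > 0 throughout.
    Returns q(t|x) (one row per x), q(t), and the prototypes mu_t.
    """
    n, k, dim = len(xs), len(init_protos), len(xs[0])
    protos = [list(m) for m in init_protos]
    qt = [1.0 / k] * k
    qtx = []
    for _ in range(n_iter):
        # Boltzmann distribution: q(t|x) = q(t) exp(-beta * B_F(x || mu_t)) / Z(x, beta)
        qtx = []
        for x in xs:
            logits = [math.log(qt[t]) - beta * div(x, protos[t]) for t in range(k)]
            top = max(logits)  # subtract the max before exponentiating, for stability
            w = [math.exp(z - top) for z in logits]
            z_norm = sum(w)
            qtx.append([wi / z_norm for wi in w])
        # Marginal: q(t) = sum_x p(x) q(t|x)
        qt = [sum(px[i] * qtx[i][t] for i in range(n)) for t in range(k)]
        # Prototypes: mu_t = sum_x p(x) q(t|x) v_x / q(t), a convex combination of inputs
        protos = [[sum(px[i] * qtx[i][t] * xs[i][d] for i in range(n)) / qt[t]
                   for d in range(dim)]
                  for t in range(k)]
    return qtx, qt, protos

xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
px = [1.0 / len(xs)] * len(xs)
qtx, qt, protos = bregman_ib(xs, px, init_protos=[xs[0], xs[3]], beta=1.0)
print([round(c, 3) for c in protos[0]], [round(c, 3) for c in protos[1]])
# [0.333, 0.333] [10.333, 10.333]
```

Passing a different `div` (for example the generalized KL divergence, for the IB case) changes only the Boltzmann step.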
Special Cases
• Information Bottleneck
  Bregman function: $f(x) = x\log(x) - x$
  Domain: simplex
  Divergence: Kullback-Leibler
• Soft k-means
  Bregman function: $f(x) = \tfrac{1}{2}x^2$
  Domain: $\mathbb{R}^n$
  Divergence: half the squared Euclidean distance [Still, Bialek, Bottou, NIPS 2003]
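The two special cases can be checked numerically with a separable Bregman function $F(v) = \sum_i \phi(v_i)$. This sketch (helper names are illustrative) confirms that $\phi(x) = x\log x - x$ gives the KL divergence for two points on the simplex, and that $\phi(x) = \tfrac12 x^2$ gives half the squared Euclidean distance:

```python
import math

def separable_bregman(phi, dphi, v, u):
    """B_F(v||u) for F(v) = sum_i phi(v_i), with phi convex and dphi its derivative."""
    return sum(phi(a) - phi(b) - dphi(b) * (a - b) for a, b in zip(v, u))

# IB case: phi(x) = x log x - x (with 0 log 0 taken as 0).
def xlogx_minus_x(x):
    return (x * math.log(x) if x > 0 else 0.0) - x

def d_xlogx_minus_x(x):
    return math.log(x)  # derivative of x log x - x; requires x > 0

# Soft k-means case: phi(x) = x^2 / 2.
def half_square(x):
    return 0.5 * x * x

def d_half_square(x):
    return x

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
kl = sum(a * math.log(a / b) for a, b in zip(p, q))
print(separable_bregman(xlogx_minus_x, d_xlogx_minus_x, p, q), kl)  # equal up to rounding

x = [1.0, 2.0]
m = [4.0, 6.0]
print(separable_bregman(half_square, d_half_square, x, m))  # 12.5, i.e. ||x - m||^2 / 2
```

Off the simplex the first case is the generalized KL divergence $\sum_i \big(p_i \log(p_i/q_i) - p_i + q_i\big)$; the extra terms cancel only when both vectors sum to one.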
Bregman IB
[Figure: diagram relating Bregman IB to Information Bottleneck, Bregman Clustering, Rate-Distortion, and Exponential Family]
Exponential Family
• Expectation parameters:
• Examples (single dimension): Normal, Poisson
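For reference, the standard natural-parameter form of the family and the two one-dimensional examples (unit variance is assumed for the Normal so that $\theta$ is one-dimensional):

```latex
\begin{aligned}
&p(x \mid \theta) = h(x)\,\exp\big(\theta \cdot x - G(\theta)\big), \qquad \mu = \nabla G(\theta) = \mathbb{E}_\theta[x] \\
&\text{Normal } (\sigma^2 = 1):\quad G(\theta) = \tfrac{1}{2}\theta^2,\quad h(x) = \tfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2},\quad \mu = \theta \\
&\text{Poisson}:\quad G(\theta) = e^{\theta},\quad h(x) = \tfrac{1}{x!},\quad \mu = e^{\theta} = \lambda
\end{aligned}
```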
Exponential Family and Bregman Divergences
• Expectation parameters:
• Properties:
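A standard property connecting exponential families and Bregman divergences: with $F$ the convex conjugate of $G$ and $\mu$ the expectation parameter, the log-likelihood is a negative Bregman divergence up to a term in $x$ alone, $\log p(x \mid \theta) = -B_F(x\,\|\,\mu) + \log b_F(x)$. The sketch below checks this numerically for the Poisson ($F(\mu) = \mu\log\mu - \mu$) and unit-variance Normal ($F(\mu) = \tfrac12\mu^2$) cases; the function names are illustrative, and the gap printed for each $x$ should not change with $\mu$.

```python
import math

def poisson_neg_log_lik(x, mu):
    # -log p(x | mu) for a Poisson with mean (expectation parameter) mu
    return mu - x * math.log(mu) + math.lgamma(x + 1)

def poisson_bregman(x, mu):
    # B_F(x || mu) with F(m) = m log m - m, i.e. x log(x / mu) - x + mu
    xlogx = x * math.log(x) if x > 0 else 0.0
    return xlogx - x * math.log(mu) - x + mu

def normal_neg_log_lik(x, mu):
    # -log N(x; mu, 1); with unit variance, mu is the expectation parameter
    return 0.5 * (x - mu) ** 2 + 0.5 * math.log(2 * math.pi)

def normal_bregman(x, mu):
    # B_F(x || mu) with F(m) = m^2 / 2
    return 0.5 * (x - mu) ** 2

# The gap -log p(x|mu) - B_F(x||mu) depends on x only, not on mu.
for x in (0, 3, 7):
    gaps = [poisson_neg_log_lik(x, mu) - poisson_bregman(x, mu) for mu in (0.5, 2.0, 9.0)]
    print("Poisson", x, [round(g, 8) for g in gaps])
print("Normal", [round(normal_neg_log_lik(1.5, mu) - normal_bregman(1.5, mu), 8)
                 for mu in (-2.0, 0.0, 4.0)])
```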
Illustration
![Page 15: Bregman Information Bottleneck NIPS’03, Whistler December 2003 Koby Crammer Hebrew University of Jerusalem Noam Slonim Princeton University](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649d1f5503460f949f3600/html5/thumbnails/15.jpg)
• Expectation parameters:
• Properties :
Exponential Family and Exponential Family and Bregman DivergencesBregman Divergences
![Page 16: Bregman Information Bottleneck NIPS’03, Whistler December 2003 Koby Crammer Hebrew University of Jerusalem Noam Slonim Princeton University](https://reader036.vdocuments.site/reader036/viewer/2022081603/56649d1f5503460f949f3600/html5/thumbnails/16.jpg)
Back to Distributional Clustering
• Distortion:
• Data vectors and prototypes: expectation parameters
• Question: For which exponential-family distribution is the Bregman divergence $B_F(x\,\|\,\mu)$ the KL divergence?
  Answer: Poisson
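Illustration
[Figure: the sequence "a a b a a a b a a a" with two bar charts over {a, b}, one with values .8/.2 and one with values 60/40 (y-axis: Pr); panels labeled "Product of Poisson Distributions" and "Multinomial Distribution"]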
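The two panels can be related through a standard fact: independent Poisson counts, conditioned on their total, follow a multinomial distribution whose probabilities are the normalized rates. The sketch checks this for the counts in the figure (eight a's and two b's); the rates are hypothetical. Note that .8/.2 are the normalized counts, i.e. the maximum-likelihood multinomial for this sequence.

```python
import math

def poisson_pmf(k, lam):
    # Pr[K = k] for K ~ Poisson(lam), computed in log space
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def multinomial_pmf(counts, probs):
    # Pr[counts] for n = sum(counts) draws with per-symbol probabilities probs
    n = sum(counts)
    log_p = math.lgamma(n + 1) + sum(c * math.log(p) - math.lgamma(c + 1)
                                     for c, p in zip(counts, probs))
    return math.exp(log_p)

counts = [8, 2]      # "a a b a a a b a a a": eight a's, two b's
rates = [3.0, 1.5]   # hypothetical Poisson rates, one per symbol

joint = math.prod(poisson_pmf(c, lam) for c, lam in zip(counts, rates))
total = poisson_pmf(sum(counts), sum(rates))  # a sum of independent Poissons is Poisson
conditional = joint / total

probs = [lam / sum(rates) for lam in rates]
print(conditional, multinomial_pmf(counts, probs))  # equal up to rounding
```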
Back to Distributional Clustering
• Information Bottleneck: distributional clustering of Poisson distributions
• (Soft) k-means: (soft) clustering of Normal distributions
Maximum Likelihood Perspective
• Distortion
• Input: observations
• Output: parameters of distributions
• IB functional: EM [Elidan & Friedman, earlier work]
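Writing the exponential-family member with expectation parameter $\mu$ as $p_F(x \mid \mu) = \exp\big(-B_F(x\,\|\,\mu)\big)\,b_F(x)$, a standard identity, the Boltzmann weights become powers of the likelihood, and $b_F(x)$ cancels from the posterior:

```latex
\begin{aligned}
\exp\!\big(-\beta\, B_F(x \,\|\, \mu_t)\big) &= \Big(\frac{p_F(x \mid \mu_t)}{b_F(x)}\Big)^{\beta} \\
q(t \mid x) &= \frac{q(t)\, p_F(x \mid \mu_t)^{\beta}}{\sum_{t'} q(t')\, p_F(x \mid \mu_{t'})^{\beta}} \\
\beta = 1:\quad q(t \mid x) &= \frac{q(t)\, p_F(x \mid \mu_t)}{\sum_{t'} q(t')\, p_F(x \mid \mu_{t'})}
\end{aligned}
```

At $\beta = 1$ this is the E-step of EM for a mixture with weights $q(t)$, and the prototype update (a weighted mean) is the matching M-step, since the maximum-likelihood expectation parameter of an exponential family is the weighted mean of the observations.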
Back to Self-Consistent Equations
• Posterior:
• Partition function: a weighted β-norm of the likelihood
• β → ∞: the most likely cluster governs
• β → 0: clusters collapse into a single prototype
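A small numeric sketch of the two limits, using hypothetical cluster weights $q(t)$ and likelihoods $p(x \mid \mu_t)$ for a single point. Up to the factor $b_F(x)^{-\beta}$, the partition function is $\sum_t q(t)\,p(x \mid \mu_t)^{\beta}$, whose $1/\beta$-th power is the weighted $\beta$-norm. As $\beta$ grows, the posterior concentrates on the most likely cluster and the norm approaches the largest likelihood; as $\beta \to 0$, the posterior returns to $q(t)$, independent of $x$, so every prototype becomes the same weighted mean of all inputs.

```python
import math

def posterior(prior, likelihoods, beta):
    """q(t|x) proportional to q(t) * p(x|mu_t)**beta, computed in log space."""
    logits = [math.log(w) + beta * math.log(lik) for w, lik in zip(prior, likelihoods)]
    top = max(logits)  # subtract the max before exponentiating, for stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

def weighted_beta_norm(prior, likelihoods, beta):
    # (sum_t q(t) p(x|mu_t)^beta)^(1/beta): the partition function without the
    # x-only factor b_F(x)^(-beta), raised to the power 1/beta
    return sum(w * lik ** beta for w, lik in zip(prior, likelihoods)) ** (1.0 / beta)

prior = [0.5, 0.3, 0.2]   # hypothetical cluster weights q(t)
lik = [0.01, 0.05, 0.02]  # hypothetical likelihoods p(x | mu_t) of one point x

for beta in (1e-3, 1.0, 50.0):
    print(beta, [round(q, 3) for q in posterior(prior, lik, beta)],
          weighted_beta_norm(prior, lik, beta))
```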
Summary
• Bregman Information Bottleneck: clustering/compression for many representations and divergences
• Statistical interpretation: clustering of distributions from the exponential family; EM-like formulation
• Current work: algorithms; characterizing distortion measures which also yield Boltzmann distributions; general distortion measures