hilbert space embeddings of conditional distributions -- with applications to dynamical systems le...

27
Hilbert Space Embeddings of Conditional Distributions --With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan Huang, Alex Smola and Kenji Fukumizu

Upload: magdalene-gabriella-stokes

Post on 20-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Hilbert Space Embeddings of Conditional Distributions

--With Applications to Dynamical Systems

Le SongCarnegie Mellon University

Joint work with Jonathan Huang, Alex Smola and Kenji Fukumizu

Page 2: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Hilbert Space Embeddings

• Summary statistics for distributions:

(mean)

(covariance)

(expected features)

Page 3: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Hilbert Space Embeddings

• Reconstruct from for certain kernels!Eg. RBF kernel .

• Sample average converges to true mean at .

Page 4: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Hilbert Space Embeddings

Embed Conditional Distributions p(y|x) µ[p(y|x)]

Embed Conditional Distributions p(y|x) µ[p(y|x)]

Structured Prediction

Dynamical Systems

Page 5: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Embedding P(Y|X)

• Simple case: X binary, Y vector valued• Need to embed

• Solution:– Divide observations according to their – Average feature for each group

What if X is continuous or structured?

Page 6: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

• Goal: Estimate embedding from finite sample• Problem: Some X=x are never observed…

Embed p(Y|X=x) for x never observed?

Embedding P(Y|X)

Page 7: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

• If and are close, and will be similar.

Key Idea

Generalize to unobserved X=x via a kernel on X

Page 8: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Conditional Embedding

Conditional Embedding Operator

Page 9: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Conditional Embedding Operator

generalization of covariance matrix

Page 10: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Kernel Estimate

• Empirical estimate converges at

Page 11: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Sum and Product RulesProbabilistic

RelationHilbert Space

Relation

Sum Rule

Product Rule

Can do probabilistic inference with embeddings!

Page 12: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Dynamical Systems

Maintain belief usingHilbert space embeddings

Hidden State

observation

Page 13: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Advantages

• Applies to any domain with a kernelEg. reals, rotations, permutations, strings

• Applies to more general distributionsEg. multimodal, skewed

Page 14: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

• Linear dynamics model (Kalman filter optimal):

Evaluation with Linear Dynamics

Trajectory of Hidden States

Observations

Page 15: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Kalman

2 4 6

0.2

0.25

0.3

0.35

N(0,10-1 I), N(0,10-2 I)

Training Size ( Steps)

RM

S E

rror

RKHS

Gaussian process noise, Gaussian observation noise

Page 16: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

2 4 6

0.2

0.22

0.24

(1,1/3), N(0,10-2 I)

Training Size ( Steps)

RM

S E

rror

RKHS

Kalman

Skewed process noise, small observation noise

Page 17: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

2 3 4 5 60.15

0.155

0.16

0.165

0.17

0.175

0.5N(-0.2,10 -2 I)+0.5N(0.2,10 -2 I), N(0,10 -2 I)

Training Size ( Steps)

RM

S E

rror

RKHSKalman

Bimodal process noise, small observation noise

Page 18: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Camera Rotation

Rendered using POV-ray renderer

Page 19: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Predicted Camera Rotation

Ground truth

RKHS Kalman filter Random

Trace measure

3 2.91 2.18 0.01

Dynamical systems withHilbert space embeddings

works better

Page 20: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Video from Predicted Rotation

Original VideoVideo Rendered with

Predicted Camera Rotation

Page 21: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Conclusion

• Embedding conditional distributions– Generalizes to unobserved X=x– Kernel estimate – Convergence of empirical estimate– Probabilistic inference

• Application to dynamical systems learning and inference

Page 22: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

The End

• Thanks

Page 23: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Future Work

• Estimate the hidden states during training?

• Dynamical system = chain graphical model; how about tree and general graph topology?

Page 24: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

2 4 6

0.3

0.4

0.5

N(0,10-2 I), N(0,10-1 I)

Training Size (2 n Steps)

RM

S E

rror

RKHS

Kalman

Small process noise, large observation noise

Page 25: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Dynamics can Help

-3 -2 -1 0

11.5

22.5

log10(Noise Variance)

Tra

ce M

easu

re

Page 26: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Dynamical Systems

Maintain belief usingHilbert space embeddings

Hidden State

observation

Page 27: Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan

Conditional Embedding Operator

• Desired properties:1. 2.

• Covariance operator:

generalization of covariance matrix