Massachusetts Institute of Technology
Stochastic Systems Group
Nonparametric Bayesian Learning of
Switching Dynamical Processes
Emily Fox, Erik Sudderth, Michael Jordan, and Alan Willsky
Nonparametric Bayes Workshop 2008
Helsinki, Finland
Laboratory for Information and Decision Systems
Applications
Priors on Modes
• Switching linear dynamical processes are useful for describing nonlinear phenomena
• Goal: allow uncertainty in the number of dynamical modes
• Utilize a hierarchical Dirichlet process (HDP) prior; cluster based on dynamics
Switching Dynamical Processes
[Diagram: switching dynamical process, with $\theta_k$ = set of dynamic parameters for mode $k$]
Outline
• Background: switching dynamical processes (SLDS, VAR); prior on dynamic parameters; sticky HDP-HMM
• HDP-AR-HMM and HDP-SLDS
• Sampling Techniques
• Results: synthetic data, IBOVESPA stock index, dancing honey bee
Linear Dynamical Systems
• State space LTI model: $x_t = A x_{t-1} + e_t$, $y_t = C x_t + w_t$, with $e_t \sim \mathcal{N}(0, \Sigma)$ and $w_t \sim \mathcal{N}(0, R)$
• Vector autoregressive (VAR) process of order $r$: $y_t = \sum_{i=1}^{r} A_i\, y_{t-i} + e_t$
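As a concrete illustration, here is a minimal numpy sketch simulating both models forward; all parameter values are illustrative assumptions, not values from the talk.

import numpy as np

rng = np.random.default_rng(0)
T = 100

# State space LTI model: x_t = A x_{t-1} + e_t,  y_t = C x_t + w_t
A = np.array([[0.9, 0.1], [0.0, 0.8]])     # illustrative dynamics matrix
C = np.array([[1.0, 0.0]])                 # illustrative observation matrix
Sigma, R = 0.1 * np.eye(2), 0.05 * np.eye(1)
x, y = np.zeros((T, 2)), np.zeros((T, 1))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(2), Sigma)
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(1), R)

# VAR(2) process: y_t = A_1 y_{t-1} + A_2 y_{t-2} + e_t  (scalar example)
a1, a2 = 0.5, 0.3                          # illustrative lag coefficients
v = np.zeros(T)
for t in range(2, T):
    v[t] = a1 * v[t - 1] + a2 * v[t - 2] + 0.1 * rng.standard_normal()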
[Diagram relating state space models and VAR processes]
Switching Dynamical Systems
• Switching linear dynamical system (SLDS): $z_t \sim \pi_{z_{t-1}}$, $x_t = A^{(z_t)} x_{t-1} + e_t(z_t)$, $y_t = C x_t + w_t$
• Switching VAR process: $z_t \sim \pi_{z_{t-1}}$, $y_t = \sum_{i=1}^{r} A_i^{(z_t)}\, y_{t-i} + e_t(z_t)$
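A minimal sketch of simulating a switching VAR(1) process: the discrete mode $z_t$ evolves as a Markov chain and selects the dynamic parameters at each step. The transition matrix and mode-specific dynamics below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

# Mode transition distributions pi (row j = pi_j) -- illustrative
pi = np.array([[0.95, 0.05],
               [0.05, 0.95]])
# Mode-specific VAR(1) dynamics A^{(k)} and noise scales -- illustrative
A_k = [np.array([[0.9, 0.0], [0.0, 0.9]]),
       np.array([[0.0, -0.9], [0.9, 0.0]])]
sigma_k = [0.05, 0.05]

T = 200
z = np.zeros(T, dtype=int)
y = np.zeros((T, 2))
for t in range(1, T):
    z[t] = rng.choice(2, p=pi[z[t - 1]])          # z_t ~ pi_{z_{t-1}}
    y[t] = A_k[z[t]] @ y[t - 1] + sigma_k[z[t]] * rng.standard_normal(2)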
Prior on Dynamic Parameters
• Group all observations assigned to mode $k$ and define the mode-specific matrices $Y^{(k)}$ (stacked observations) and $\bar{Y}^{(k)}$ (stacked lagged observations)
• Rewrite the VAR process in matrix form: $Y^{(k)} = A^{(k)} \bar{Y}^{(k)} + E^{(k)}$
• Place a matrix-normal inverse-Wishart prior on $\{A^{(k)}, \Sigma^{(k)}\}$
• Results in $K$ decoupled linear regression problems
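To make the decoupling concrete, here is a sketch that stacks the mode-specific matrices and fits each $A^{(k)}$ by least squares. The talk instead samples $\{A^{(k)}, \Sigma^{(k)}\}$ from the MNIW posterior, which is built from the same sufficient statistics $Y^{(k)}\bar{Y}^{(k)T}$ and $\bar{Y}^{(k)}\bar{Y}^{(k)T}$.

import numpy as np

def mode_specific_regressions(y, z, K):
    """Fit Y^{(k)} = A^{(k)} Ybar^{(k)} + E^{(k)} separately for each mode k.
    y: T x d observations; z: length-T mode assignments."""
    A_hat = []
    for k in range(K):
        idx = np.where(z[1:] == k)[0] + 1   # times t >= 1 assigned to mode k
        Y = y[idx].T                        # d x n_k matrix of observations
        Ybar = y[idx - 1].T                 # d x n_k matrix of lagged observations
        # Least-squares solution; the MNIW posterior uses Y Ybar^T, Ybar Ybar^T
        A_hat.append(Y @ Ybar.T @ np.linalg.pinv(Ybar @ Ybar.T))
    return A_hat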
Sticky HDP-HMM
• Dirichlet process (DP): mode space of unbounded size; model complexity adapts to observations
• Hierarchical: ties mode transition distributions; shared sparsity
• Sticky: self-transition bias parameter
[Figure: inferred mode sequence over time]
Infinite HMM: Beal et al., NIPS 2002; HDP-HMM: Teh et al., JASA 2006; sticky HDP-HMM: Fox et al., ICML 2008
Sticky HDP-HMM
• Global transition distribution: $\beta \mid \gamma \sim \mathrm{GEM}(\gamma)$
• Mode-specific transition distributions: $\pi_k \mid \alpha, \kappa, \beta \sim \mathrm{DP}\big(\alpha + \kappa,\ \frac{\alpha\beta + \kappa\delta_k}{\alpha + \kappa}\big)$
The sparsity of $\beta$ is shared across modes, and $\kappa$ gives increased probability of self-transition.
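Under a finite weak-limit approximation with truncation level $L$ (introduced later in the sampler), a draw from the sticky prior reduces to ordinary Dirichlet draws. A sketch; the function name is my own:

import numpy as np

rng = np.random.default_rng(2)

def sample_sticky_prior(gamma, alpha, kappa, L):
    # Global transition distribution: beta ~ Dir(gamma/L, ..., gamma/L)
    beta = rng.dirichlet(np.full(L, gamma / L))
    # Mode-specific rows: pi_k ~ Dir(alpha*beta + kappa*delta_k)
    pi = np.zeros((L, L))
    for k in range(L):
        conc = alpha * beta
        conc[k] += kappa                    # sticky self-transition bias
        pi[k] = rng.dirichlet(conc)
    return beta, pi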
HDP-AR-HMM and HDP-SLDS
[Graphical models: HDP-AR-HMM and HDP-SLDS]
Blocked Gibbs Sampler
Sample parameters
• Approximate the HDP by truncating the stick-breaking construction (weak limit approximation): $\beta \mid \gamma \sim \mathrm{Dir}(\gamma/L, \ldots, \gamma/L)$
• Sample the transition distributions $\pi_k$ from their Dirichlet posteriors given the mode-transition counts
• Sample the dynamic parameters from their MNIW posteriors, using the state sequence as VAR(1) pseudo-observations
Fox et al., ICML 2008
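Given the current mode sequence, each transition row is conjugate, so its posterior just adds the observed transition counts to the sticky Dirichlet prior. A sketch (the resampling of the global $\beta$, which requires auxiliary variables, is omitted):

import numpy as np

rng = np.random.default_rng(3)

def resample_transitions(z, beta, alpha, kappa, L):
    # Transition counts n[j, k] = #{t : z_{t-1} = j, z_t = k}
    n = np.zeros((L, L))
    for j, k in zip(z[:-1], z[1:]):
        n[j, k] += 1
    # pi_j | z, beta ~ Dir(alpha*beta + kappa*delta_j + n_j)
    pi = np.zeros((L, L))
    for j in range(L):
        conc = alpha * beta + n[j]
        conc[j] += kappa
        pi[j] = rng.dirichlet(conc)
    return pi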
Blocked Gibbs Sampler
Sample mode sequence
• Use the state sequence $x_{1:T}$ as pseudo-observations of an HMM
• Compute backwards messages: $m_{t+1,t}(z_t) \propto \sum_{z_{t+1}} \pi_{z_t}(z_{t+1})\,\mathcal{N}\big(x_{t+1};\, A^{(z_{t+1})} x_t,\, \Sigma^{(z_{t+1})}\big)\, m_{t+2,t+1}(z_{t+1})$
• Block sample the mode sequence forwards: $z_t \sim p(z_t \mid z_{t-1}, x_{1:T})$
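A sketch of this blocked update for a generic discrete mode sequence. The likelihoods lik[t, k] = $\mathcal{N}(x_t; A^{(k)} x_{t-1}, \Sigma^{(k)})$ are assumed precomputed, and a uniform initial mode distribution is an assumption:

import numpy as np

rng = np.random.default_rng(4)

def block_sample_modes(lik, pi):
    """lik: T x K pseudo-observation likelihoods; pi: K x K transition matrix."""
    T, K = lik.shape
    # Backwards messages m[t, k] proportional to p(pseudo-obs after t | z_t = k)
    m = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        m[t] = pi @ (lik[t + 1] * m[t + 1])
        m[t] /= m[t].sum()                   # normalize for numerical stability
    # Forward block sampling: z_t ~ p(z_t | z_{t-1}, x_{1:T})
    z = np.zeros(T, dtype=int)
    p = lik[0] * m[0]                        # uniform initial distribution assumed
    z[0] = rng.choice(K, p=p / p.sum())
    for t in range(1, T):
        p = pi[z[t - 1]] * lik[t] * m[t]
        z[t] = rng.choice(K, p=p / p.sum())
    return z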
Blocked Gibbs Sampler
Sample state sequence
• Conditioned on the mode sequence, the model is equivalent to an LDS with time-varying dynamic parameters
• Compute backwards messages with a backwards information filter
• Block sample the state sequence forwards: $x_t \sim p(x_t \mid x_{t-1}, y_{1:T}, z_{1:T})$
• All of these distributions are Gaussian
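The slides compute backwards messages with an information filter and then sample forwards; an equivalent and perhaps more familiar route is a forward Kalman filter with backward sampling (Carter and Kohn). A sketch for the time-varying LDS induced by the sampled mode sequence; the diffuse initial prior is an assumption:

import numpy as np

rng = np.random.default_rng(5)

def ffbs_states(y, z, A_k, Sigma_k, C, R):
    """Block sample x_{1:T} for an LDS whose dynamics A^{(z_t)}, Sigma^{(z_t)}
    switch with the sampled mode sequence z.  y: T x obs_dim observations."""
    T, d = len(y), C.shape[1]
    m, P = np.zeros((T, d)), np.zeros((T, d, d))
    mp, Pp = np.zeros(d), 10.0 * np.eye(d)          # diffuse initial prior
    for t in range(T):                              # forward Kalman filter
        if t > 0:
            A, S = A_k[z[t]], Sigma_k[z[t]]
            mp, Pp = A @ m[t - 1], A @ P[t - 1] @ A.T + S
        G = Pp @ C.T @ np.linalg.inv(C @ Pp @ C.T + R)   # Kalman gain
        m[t] = mp + G @ (y[t] - C @ mp)
        P[t] = Pp - G @ C @ Pp
    x = np.zeros((T, d))                            # backward sampling
    x[T - 1] = rng.multivariate_normal(m[T - 1], P[T - 1])
    for t in range(T - 2, -1, -1):
        A, S = A_k[z[t + 1]], Sigma_k[z[t + 1]]
        J = P[t] @ A.T @ np.linalg.inv(A @ P[t] @ A.T + S)
        mean = m[t] + J @ (x[t + 1] - A @ m[t])
        cov = P[t] - J @ A @ P[t]
        x[t] = rng.multivariate_normal(mean, 0.5 * (cov + cov.T))
    return x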
Hyperparameters
• Place priors on the hyperparameters and learn them from the data
• Weakly informative priors suffice; the hyperparameters can be set using the data
• All results use the same settings
Results: Synthetic VAR(1)
[Figure: performance on 5-mode VAR(1) data for the HDP-VAR(1)-HMM, HDP-VAR(2)-HMM, HDP-SLDS, and HDP-HMM]
Results: Synthetic AR(2)
[Figure: performance on 3-mode AR(2) data for the HDP-VAR(1)-HMM, HDP-VAR(2)-HMM, HDP-SLDS, and HDP-HMM]
Results: Synthetic SLDS
[Figure: performance on 3-mode SLDS data for the HDP-VAR(1)-HMM, HDP-VAR(2)-HMM, HDP-SLDS, and HDP-HMM]
Results: IBOVESPA
• Data: daily returns of the São Paulo IBOVESPA stock index
• Goal: detect changes in volatility
• Compare inferred change-points to 10 cited world events
[Figure: daily returns with inferred change-points; ROC curves comparing the sticky and non-sticky HDP-SLDS]
Carvalho and Lopes, Comp. Stat. & Data Anal., 2006
Results: Dancing Honey Bee
• Six bee dance sequences with expert-labeled dances: turn right (green), waggle (red), turn left (blue)
[Figure: six expert-labeled bee dance sequences; Oh et al., IJCV, 2007]
• Observation vector: head angle (cos, sin) and x-y body position
Movie: Sequence 6
Results: Dancing Honey Bee
Nonparametric approach:
• Model: HDP-VAR(1)-HMM
• Set hyperparameters
• Unsupervised training from each sequence
• Infer: number of modes, dynamic parameters, and mode sequence
Supervised approach [Oh:07]:
• Model: SLDS
• Set number of modes to 3
• Leave-one-out training: fixed label sequences on 5 of 6 sequences
• Data-driven MCMC: use learned cues (e.g., head angle) to propose mode sequences
Oh et al., IJCV, 2007
Results: Dancing Honey Bee
Sequence 4: HDP-AR-HMM 83.2% | SLDS [Oh] 93.4%
Sequence 5: HDP-AR-HMM 93.2% | SLDS [Oh] 90.2%
Sequence 6: HDP-AR-HMM 88.7% | SLDS [Oh] 90.4%
Results: Dancing Honey Bee
Sequence 1: HDP-AR-HMM 46.5% | SLDS [Oh] 74.0%
Sequence 2: HDP-AR-HMM 44.1% | SLDS [Oh] 86.1%
Sequence 3: HDP-AR-HMM 45.6% | SLDS [Oh] 81.3%
Conclusion
• Examined the HDP as a prior for nonparametric Bayesian learning of SLDS and switching VAR processes
• Presented an efficient blocked Gibbs sampler
• Demonstrated utility on simulated and real datasets