dynamic bayesian network

30
Dynamic Bayesian Network Fuzzy Systems Lifelog management

Upload: roza

Post on 14-Feb-2016

53 views

Category:

Documents


10 download

DESCRIPTION

Dynamic Bayesian Network. Fuzzy Systems Lifelog management. Outline. Introduction Definition Representation Inference Learning Comparison Summary. C. A. B. A. B. D. Brief Review of Bayesian Networks. Graphical representations of joint distributions:. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Dynamic Bayesian Network

Dynamic Bayesian Network

Fuzzy SystemsLifelog management

Page 2: Dynamic Bayesian Network

• Introduction

• Definition

• Representation

• Inference

• Learning

• Comparison

• Summary

Outline

Page 3: Dynamic Bayesian Network

Brief Review of Bayesian Networks• Graphical representations of joint distributions:

)/( ABP

A B

)(AP

)(AP ),/( CABPB

D

C

A

)(CP

)/( BDP

Static world, each random variable has a single fixed value.

)()()/()/(

BPAPABPBAP

Mathematical formula used for calculating conditional probabilities. Develop by the mathematician and theologian Thomas Bayes (published in 1763)

3/29

Page 4: Dynamic Bayesian Network

Introduction• Dynamic system

– Sequential data modeling (part of speech)– Time series modeling (activity recognition)

• Classic approaches – Linear models: ARIMA (autoregressive integrated moving average),

ARMAX (autoregressive moving average exogenous variables model)

– Nonlinear models: neural networks, decision trees– Problems

• Prediction of the future based on only a finite window• Difficult to incorporate prior knowledge• Difficult to deal with multi-dimensional inputs and/or outputs

• Recent approaches– Hidden Markov models (HMMs): discrete random variable– Kalman filter models (KFMs): continuous state variables– Dynamic Bayesian networks (DBNs)

4/29

Page 5: Dynamic Bayesian Network

Motivation

Time = t

Mt

Xt

Ot

Transportation Mode: Walking, Running, Car, Bus

True velocity and location

Observed location

Need conditional probability distributions

e.g. a distribution on (velocity, location)

given the transportation mode

Prior knowledge or learned from data

Time = t+1

Mt+1

Xt+1

Ot+1

Given a sequence of observations (Ot), find the most likely Mt’s that explain it.

Or could provide a probability distribution on the possible Mt’s.

5/29

Page 6: Dynamic Bayesian Network

• Introduction

• Definition

• Representation

• Inference

• Learning

• Comparison

• Summary

Outline

Page 7: Dynamic Bayesian Network

Dynamic Bayesian Networks• BNs consisting of a structure that repeats an indefinite (or dynamic)

number of times– Time-invariant: the term ‘dynamic’ means that we are modeling

a dynamic model, not that the networks change over time

• General form of HMMs and KFLs by representing the hidden and observed state in terms of state variables of complex interdependencies

D

A

C

B

D

A

C

B

D

A

C

B

frame i frame i+1frame i-1

7/29

Page 8: Dynamic Bayesian Network

Formal Definition• Defined as

– : a directed, acyclic graph of starting nodes (initial probability distribution)

– : a directed, acyclic graph of transition nodes (transition probabilities between time slices)

– : starting vectors of observable as well as hidden random variable

– : transition matrices regarding observable as well as hidden random variables

8/29

Page 9: Dynamic Bayesian Network

• Introduction

• Definition

• Representation

• Inference

• Learning

• Comparison

• Summary

Outline

Page 10: Dynamic Bayesian Network

Representation (1): Problem• Target: Is it raining today?

• Necessity to specify an unbounded number of conditional probability table, one for each variable in each slice

• Each one might involve an unbounded number of parents

}{}{

tt

tt

UERX

next step: specify dependencies among the variables.

10/29

Page 11: Dynamic Bayesian Network

Representation (2): Solution• Assume that change in the world state are caused by a stationary

process (unmoving process over time)

• Use Markov assumption - The current state depends on only in a finite history of previous states.

Using the first-order Markov process:

• In addition to restricting the parents of the state variable Xt, we must restrict the parents of the evidence variable Et

))(/( tt UParentUP is the same for all t

)/()/( 11:0 tttt XXPXXP Transition Model

)/(),/( 1:0:0 ttttt XEPEXEP Sensor Model

11/29

Page 12: Dynamic Bayesian Network

Representation: Extension• There are two possible fixes if the approximation is too inaccurate:

– Increasing the order of the Markov process model. For example, adding as a parent of , which might give slightly more accurate predictions

– Increasing the set of state variables. For example, adding to allow to incorporate historical records of rainy seasons, or adding , and to allow to use a physical model of rainy conditions

• Bigram

• Trigram

Wi+1WiWi-1 . . . . . .

Wi+1WiWi-1 . . . . . .

2tRain tRain

tSeason

teTemperatur tHumiditytPressure

12/29

Page 13: Dynamic Bayesian Network

• Introduction

• Definition

• Representation

• Inference

• Learning

• Comparison

• Summary

Outline

Page 14: Dynamic Bayesian Network

Inference: Overview• To infer the hidden states X given the observations Y1:t

– Extend HMM and KFM’s / call BN inference algorithms as subroutines – NP-hard problem

• Inference tasks– Filtering(monitoring): recursively estimate the belief state using Bayes’ rule

• Predict: computing P(Xt| y1:t-1 )• Updating: computing P(Xt | y1:t )• Throw away the old belief state once we have computed the

prediction(“rollup”)– Smoothing: estimate the state of the past, given all the evidence up to the

current time• Fixed-lag smoothing(hindsight): computing P(Xt-1 | y1:t ) where l > 0 is the

lag– Prediction: predict the future

• Lookahead: computing P(Xt+h | y1:t) where h > 0 is how far we want to look ahead

– Viterbi decoding: compute the most likely sequence of hidden states given the data

• MPE(abduction): x*1:t = argmax P(x1:t | y1:t )

14/29

Page 15: Dynamic Bayesian Network

Inference: Comparison• Filtering: r = t

• Smoothing: r > t

• Prediction: r < t

• Viterbi: MPE

15/29

Page 16: Dynamic Bayesian Network

Inference: Filtering• Compute the belief state - the posterior distribution over the current

state, given all evidence to date

• Filtering is what a rational agent needs to do in order to keep track of the current state so that the rational decisions can be made

• Given the results of filtering up to time t, one can easily compute the result for t+1 from the new evidence

)/( :1 tt eXP

))/(()/( 1:1,11:11 ttttt eXPefeXP

)/()/(

)/()/(

)/(

:1111

:11:1,11

1,:11

tttt

ttttt

ttt

eXPXeP

eXPeXeP

eeXP

(dividing up the evidence)

(for some function f)

(using Bayes’ Theorem)

(by the Marcov propertyof evidence)

α is a normalizing constant used to make probabilities sum up to 1

16/29

Page 17: Dynamic Bayesian Network

Inference: Filtering• Illustration for two steps in the Umbrella example:

• On day 1, the umbrella appears so U1=true

– The prediction from t=0 to t=1 is

and updating it with the evidence for t=1 gives

• On day 2, the umbrella appears so U2=true– The prediction from t=1 to t=2 is

and updating it with the evidence for t=2 gives

0

)()/()( 0011r

rPrRPRP

)()/()/( 11111 RPRuPuRP

1

)/()/()/( 111212r

urPrRPuRP

)/()/(),/( 1222212 uRPRuPuuRP

17/29

Page 18: Dynamic Bayesian Network

Inference: Smoothing• Compute the posterior distribution over the past state, given all

evidence up to the present

• Hindsight provides a better estimate of the state than was available at the time, because it incorporates more evidence

)/( :1 tk eXP for some k such that 0 ≤ k < t.

18/29

Page 19: Dynamic Bayesian Network

Inference: Prediction• Compute the posterior distribution over the future state, given all

evidence to date

• The task of prediction can be seen simply as filtering without the addition of new evidence

)/( :1 tkt eXP for some k>0

19/29

Page 20: Dynamic Bayesian Network

Inference: Most Likely Explanation (MLE)• Compute the sequence of states that is most likely to have

generated a given sequence of observation

• Algorithms for this task are useful in many applications, including speech recognition

• There exist a recursive relationship between the most likely paths to each state Xt+1 and the most likely paths to each state Xt. This relationship can be write as an equation connecting the probabilities of the paths:

)/(maxarg :1:1:1 ttx eXPt

))/,...,()/(()/(

)/,,...,(

:111...

111

1:111

maxmax

max

11

...1

ttXX

ttX

tt

tttXX

exxPxXPXeP

eXxxP

tt

t

20/29

Page 21: Dynamic Bayesian Network

Inference: Algorithms• Exact Inference algorithms

– Forwards-backwards smoothing algorithm (on any discrete-state DBN)

– The frontier algorithm (sweep a Markov blanket, the frontier set F, across the DBN, first forwards and then backwards)

– The interface algorithm (use only the set of nodes with outgoing arcs to the next time slice to d-separate the past from the future)

– Kalman filtering and smoothing

• Approximate algorithms:– The Boyen-Koller (BK) algorithm (approximate the joint distribution

over the interface as a product of marginals)– Factored frontier (FF) algorithm / Loopy propagation algorithm (LBP)– Kalman filtering and smoother– Stochastic sampling algorithm:

• Importance sampling or MCMC (offline inference)• Particle filtering (PF) (online)

21/29

Page 22: Dynamic Bayesian Network

• Introduction

• Definition

• Representation

• Inference

• Learning

• Comparison

• Summary

Outline

Page 23: Dynamic Bayesian Network

Learning (1)• The techniques for learning DBN are mostly straightforward extensions of

the techniques for learning BNs• Parameter learning

– The transition model P(Xt | Xt-1) / The observation model P(Yt | Xt) – Offline learning

• Parameters must be tied across time-slices• The initial state of the dynamic system can be learned independently

of the transition matrix– Online learning

• Add the parameters to the state space and then do online inference (filtering)

– The usual criterion is maximum-likelihood(ML)

• The goal of parameter learning is to compute– θ*

ML = argmaxθP( Y| θ) = argmaxθlog P( Y| θ) – θ*

MAP = argmaxθlog P( Y| θ) + logP(θ)– Two standard approaches: gradient ascent and EM(Expectation

Maximization)

23/29

Page 24: Dynamic Bayesian Network

Learning (2)• Structure learning

– The intra-slice connectivity must be a DAG– Learning the inter-slice connectivity is equivalent to the variable

selection problem, since for each node in slice t, we must choose its parents from slice t-1.

– Learning for DBNs reduces to feature selection if we assume the intra-slice connections are fixed

24/29

Page 25: Dynamic Bayesian Network

• Introduction

• Definition

• Representation

• Inference

• Learning

• Comparison

• Summary

Outline

Page 26: Dynamic Bayesian Network

Comparison (HMM: Hidden Markov Model)

• Structure– One discrete hidden node (X: hidden variables)– One discrete or continuous observed node per time slice (Y: observations)

• Parameters– The initial state distribution P( X1 ) – The transition model P( Xt | Xt-1 )– The observation model P( Yt | Xt )

• Features– A discrete state variable with arbitrary dynamics and arbitrary

measurements– Structures and parameters remain same over time

X1

Y1

X2

Y2

X3

Y3

X4

Y4

26/29

Page 27: Dynamic Bayesian Network

Comparison with HMMs• HMMs

1 2 3

obs obs obs

.3.7 .8

.21

1 2 3qiqi-1

1 .7 .3 0

2 0 .8 .23 0 0 1

P(qi|qi-1)

q=1

q=2

q=3

P(obsi | qi)

frame i

Qi

obsi

. . .. . .Qi-1

obsi-1

Qi+1

obsi+1

frame i+1frame i-1

= state= allowed transition

= variable= allowed dependency

• DBNs

27/29

Page 28: Dynamic Bayesian Network

Comparison (KFL: Kalman Filter Model)• KFL has the same topology as an HMM

• All the nodes are assumed to have linear-Gaussian distributions

– x(t+1) = F*x(t) + w(t), • w ~ N(0, Q) : process noise, x(0) ~ N(X(0), V(0))

– y(t) = H*x(t) + v(t), • v ~ N(0, R) : measurement noise

• Features– A continuous state variable with linear-Gaussian dynamics and

measurements– Also known as Linear Dynamic Systems(LDSs)

• A partially observed stochastic process • With linear dynamics and linear observations: f( a + b) = f(a) +

f(b)• Both subject to Gaussian noise

X1

Y1

X2

Y2

28/29

Page 29: Dynamic Bayesian Network

Comparison with HMM and KFM• DBN represents the hidden state in terms of a set of random

variables– HMM’s state space consists of a single random variable

• DBN allows arbitrary CPDs– KFM requires all the CPDs to be linear-Gaussian

• DBN allows much more general graph structures– HMMs and KFMs have a restricted topology

• DBN generalizes HMM and KFM (more expressive power)

29/29

Page 30: Dynamic Bayesian Network

Summary• DBN: a Bayesian network with a temporal probability model

• Complexity in DBNs– Inference– Structure learning

• Comparison with other methods– HMMs: discrete variables– KFMs: continuous variables

• Discussion– Why to use DBNs instead of HMMs or KFMs?– Why to use DBNs instead of BNs?

30/29