dynamic bayesian network
DESCRIPTION
Dynamic Bayesian Network. Fuzzy Systems Lifelog management. Outline. Introduction Definition Representation Inference Learning Comparison Summary. C. A. B. A. B. D. Brief Review of Bayesian Networks. Graphical representations of joint distributions:. - PowerPoint PPT PresentationTRANSCRIPT
Dynamic Bayesian Network
Fuzzy SystemsLifelog management
• Introduction
• Definition
• Representation
• Inference
• Learning
• Comparison
• Summary
Outline
Brief Review of Bayesian Networks• Graphical representations of joint distributions:
)/( ABP
A B
)(AP
)(AP ),/( CABPB
D
C
A
)(CP
)/( BDP
Static world, each random variable has a single fixed value.
)()()/()/(
BPAPABPBAP
Mathematical formula used for calculating conditional probabilities. Develop by the mathematician and theologian Thomas Bayes (published in 1763)
3/29
Introduction• Dynamic system
– Sequential data modeling (part of speech)– Time series modeling (activity recognition)
• Classic approaches – Linear models: ARIMA (autoregressive integrated moving average),
ARMAX (autoregressive moving average exogenous variables model)
– Nonlinear models: neural networks, decision trees– Problems
• Prediction of the future based on only a finite window• Difficult to incorporate prior knowledge• Difficult to deal with multi-dimensional inputs and/or outputs
• Recent approaches– Hidden Markov models (HMMs): discrete random variable– Kalman filter models (KFMs): continuous state variables– Dynamic Bayesian networks (DBNs)
4/29
Motivation
Time = t
Mt
Xt
Ot
Transportation Mode: Walking, Running, Car, Bus
True velocity and location
Observed location
Need conditional probability distributions
e.g. a distribution on (velocity, location)
given the transportation mode
Prior knowledge or learned from data
Time = t+1
Mt+1
Xt+1
Ot+1
Given a sequence of observations (Ot), find the most likely Mt’s that explain it.
Or could provide a probability distribution on the possible Mt’s.
5/29
• Introduction
• Definition
• Representation
• Inference
• Learning
• Comparison
• Summary
Outline
Dynamic Bayesian Networks• BNs consisting of a structure that repeats an indefinite (or dynamic)
number of times– Time-invariant: the term ‘dynamic’ means that we are modeling
a dynamic model, not that the networks change over time
• General form of HMMs and KFLs by representing the hidden and observed state in terms of state variables of complex interdependencies
D
A
C
B
D
A
C
B
D
A
C
B
frame i frame i+1frame i-1
7/29
Formal Definition• Defined as
– : a directed, acyclic graph of starting nodes (initial probability distribution)
– : a directed, acyclic graph of transition nodes (transition probabilities between time slices)
– : starting vectors of observable as well as hidden random variable
– : transition matrices regarding observable as well as hidden random variables
8/29
• Introduction
• Definition
• Representation
• Inference
• Learning
• Comparison
• Summary
Outline
Representation (1): Problem• Target: Is it raining today?
• Necessity to specify an unbounded number of conditional probability table, one for each variable in each slice
• Each one might involve an unbounded number of parents
}{}{
tt
tt
UERX
next step: specify dependencies among the variables.
10/29
Representation (2): Solution• Assume that change in the world state are caused by a stationary
process (unmoving process over time)
• Use Markov assumption - The current state depends on only in a finite history of previous states.
Using the first-order Markov process:
• In addition to restricting the parents of the state variable Xt, we must restrict the parents of the evidence variable Et
))(/( tt UParentUP is the same for all t
)/()/( 11:0 tttt XXPXXP Transition Model
)/(),/( 1:0:0 ttttt XEPEXEP Sensor Model
11/29
Representation: Extension• There are two possible fixes if the approximation is too inaccurate:
– Increasing the order of the Markov process model. For example, adding as a parent of , which might give slightly more accurate predictions
– Increasing the set of state variables. For example, adding to allow to incorporate historical records of rainy seasons, or adding , and to allow to use a physical model of rainy conditions
• Bigram
• Trigram
Wi+1WiWi-1 . . . . . .
Wi+1WiWi-1 . . . . . .
2tRain tRain
tSeason
teTemperatur tHumiditytPressure
12/29
• Introduction
• Definition
• Representation
• Inference
• Learning
• Comparison
• Summary
Outline
Inference: Overview• To infer the hidden states X given the observations Y1:t
– Extend HMM and KFM’s / call BN inference algorithms as subroutines – NP-hard problem
• Inference tasks– Filtering(monitoring): recursively estimate the belief state using Bayes’ rule
• Predict: computing P(Xt| y1:t-1 )• Updating: computing P(Xt | y1:t )• Throw away the old belief state once we have computed the
prediction(“rollup”)– Smoothing: estimate the state of the past, given all the evidence up to the
current time• Fixed-lag smoothing(hindsight): computing P(Xt-1 | y1:t ) where l > 0 is the
lag– Prediction: predict the future
• Lookahead: computing P(Xt+h | y1:t) where h > 0 is how far we want to look ahead
– Viterbi decoding: compute the most likely sequence of hidden states given the data
• MPE(abduction): x*1:t = argmax P(x1:t | y1:t )
14/29
Inference: Comparison• Filtering: r = t
• Smoothing: r > t
• Prediction: r < t
• Viterbi: MPE
15/29
Inference: Filtering• Compute the belief state - the posterior distribution over the current
state, given all evidence to date
• Filtering is what a rational agent needs to do in order to keep track of the current state so that the rational decisions can be made
• Given the results of filtering up to time t, one can easily compute the result for t+1 from the new evidence
)/( :1 tt eXP
))/(()/( 1:1,11:11 ttttt eXPefeXP
)/()/(
)/()/(
)/(
:1111
:11:1,11
1,:11
tttt
ttttt
ttt
eXPXeP
eXPeXeP
eeXP
(dividing up the evidence)
(for some function f)
(using Bayes’ Theorem)
(by the Marcov propertyof evidence)
α is a normalizing constant used to make probabilities sum up to 1
16/29
Inference: Filtering• Illustration for two steps in the Umbrella example:
• On day 1, the umbrella appears so U1=true
– The prediction from t=0 to t=1 is
and updating it with the evidence for t=1 gives
• On day 2, the umbrella appears so U2=true– The prediction from t=1 to t=2 is
and updating it with the evidence for t=2 gives
0
)()/()( 0011r
rPrRPRP
)()/()/( 11111 RPRuPuRP
1
)/()/()/( 111212r
urPrRPuRP
)/()/(),/( 1222212 uRPRuPuuRP
17/29
Inference: Smoothing• Compute the posterior distribution over the past state, given all
evidence up to the present
• Hindsight provides a better estimate of the state than was available at the time, because it incorporates more evidence
)/( :1 tk eXP for some k such that 0 ≤ k < t.
18/29
Inference: Prediction• Compute the posterior distribution over the future state, given all
evidence to date
• The task of prediction can be seen simply as filtering without the addition of new evidence
)/( :1 tkt eXP for some k>0
19/29
Inference: Most Likely Explanation (MLE)• Compute the sequence of states that is most likely to have
generated a given sequence of observation
• Algorithms for this task are useful in many applications, including speech recognition
• There exist a recursive relationship between the most likely paths to each state Xt+1 and the most likely paths to each state Xt. This relationship can be write as an equation connecting the probabilities of the paths:
)/(maxarg :1:1:1 ttx eXPt
))/,...,()/(()/(
)/,,...,(
:111...
111
1:111
maxmax
max
11
...1
ttXX
ttX
tt
tttXX
exxPxXPXeP
eXxxP
tt
t
20/29
Inference: Algorithms• Exact Inference algorithms
– Forwards-backwards smoothing algorithm (on any discrete-state DBN)
– The frontier algorithm (sweep a Markov blanket, the frontier set F, across the DBN, first forwards and then backwards)
– The interface algorithm (use only the set of nodes with outgoing arcs to the next time slice to d-separate the past from the future)
– Kalman filtering and smoothing
• Approximate algorithms:– The Boyen-Koller (BK) algorithm (approximate the joint distribution
over the interface as a product of marginals)– Factored frontier (FF) algorithm / Loopy propagation algorithm (LBP)– Kalman filtering and smoother– Stochastic sampling algorithm:
• Importance sampling or MCMC (offline inference)• Particle filtering (PF) (online)
21/29
• Introduction
• Definition
• Representation
• Inference
• Learning
• Comparison
• Summary
Outline
Learning (1)• The techniques for learning DBN are mostly straightforward extensions of
the techniques for learning BNs• Parameter learning
– The transition model P(Xt | Xt-1) / The observation model P(Yt | Xt) – Offline learning
• Parameters must be tied across time-slices• The initial state of the dynamic system can be learned independently
of the transition matrix– Online learning
• Add the parameters to the state space and then do online inference (filtering)
– The usual criterion is maximum-likelihood(ML)
• The goal of parameter learning is to compute– θ*
ML = argmaxθP( Y| θ) = argmaxθlog P( Y| θ) – θ*
MAP = argmaxθlog P( Y| θ) + logP(θ)– Two standard approaches: gradient ascent and EM(Expectation
Maximization)
23/29
Learning (2)• Structure learning
– The intra-slice connectivity must be a DAG– Learning the inter-slice connectivity is equivalent to the variable
selection problem, since for each node in slice t, we must choose its parents from slice t-1.
– Learning for DBNs reduces to feature selection if we assume the intra-slice connections are fixed
24/29
• Introduction
• Definition
• Representation
• Inference
• Learning
• Comparison
• Summary
Outline
Comparison (HMM: Hidden Markov Model)
• Structure– One discrete hidden node (X: hidden variables)– One discrete or continuous observed node per time slice (Y: observations)
• Parameters– The initial state distribution P( X1 ) – The transition model P( Xt | Xt-1 )– The observation model P( Yt | Xt )
• Features– A discrete state variable with arbitrary dynamics and arbitrary
measurements– Structures and parameters remain same over time
X1
Y1
X2
Y2
X3
Y3
X4
Y4
26/29
Comparison with HMMs• HMMs
1 2 3
obs obs obs
.3.7 .8
.21
1 2 3qiqi-1
1 .7 .3 0
2 0 .8 .23 0 0 1
P(qi|qi-1)
q=1
q=2
q=3
P(obsi | qi)
frame i
Qi
obsi
. . .. . .Qi-1
obsi-1
Qi+1
obsi+1
frame i+1frame i-1
= state= allowed transition
= variable= allowed dependency
• DBNs
27/29
Comparison (KFL: Kalman Filter Model)• KFL has the same topology as an HMM
• All the nodes are assumed to have linear-Gaussian distributions
– x(t+1) = F*x(t) + w(t), • w ~ N(0, Q) : process noise, x(0) ~ N(X(0), V(0))
– y(t) = H*x(t) + v(t), • v ~ N(0, R) : measurement noise
• Features– A continuous state variable with linear-Gaussian dynamics and
measurements– Also known as Linear Dynamic Systems(LDSs)
• A partially observed stochastic process • With linear dynamics and linear observations: f( a + b) = f(a) +
f(b)• Both subject to Gaussian noise
X1
Y1
X2
Y2
28/29
Comparison with HMM and KFM• DBN represents the hidden state in terms of a set of random
variables– HMM’s state space consists of a single random variable
• DBN allows arbitrary CPDs– KFM requires all the CPDs to be linear-Gaussian
• DBN allows much more general graph structures– HMMs and KFMs have a restricted topology
• DBN generalizes HMM and KFM (more expressive power)
29/29
Summary• DBN: a Bayesian network with a temporal probability model
• Complexity in DBNs– Inference– Structure learning
• Comparison with other methods– HMMs: discrete variables– KFMs: continuous variables
• Discussion– Why to use DBNs instead of HMMs or KFMs?– Why to use DBNs instead of BNs?
30/29