1
Russell and Norvig, AIMA: Chapter 15, Part B – Sections 15.3, 15.4
Probabilistic Reasoning over Time
Presented to: Prof. Dr. S. M. Aqil Burney
Presented by: Zain Abbas (MSCS-UBIT)
2
Agenda
Temporal probabilistic agents
Inference: filtering, prediction, smoothing and most likely explanation
Hidden Markov models
Kalman filters
Dynamic Bayesian Networks
3
Stochastic (Random) Process
A process that evolves in space or time in accordance with some probability distribution.
In the simplest possible case ("discrete time"), a stochastic process amounts to a sequence of random variables.
4
Markov Chain
A stochastic process (family of random variables) $\{X_n,\ n = 0, 1, 2, \ldots\}$ is a Markov chain if:
It takes on a finite or countable number of possible values. If $X_n = i$, the process is said to be in state $i$ at time $n$.
Whenever the process is in state $i$, there is a fixed probability $P_{ij}$ that it will next be in state $j$. Formally:
\[ P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P_{ij} \]
for all states $i_0, i_1, \ldots, i_{n-1}, i, j$ and all $n \ge 0$.
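The definition above can be illustrated with a minimal simulation sketch. The two-state transition matrix `P` and the helper `simulate_chain` are hypothetical and not part of the slides; the point is only that each next state is drawn from the row of the current state, so older history plays no role.

```python
import random

# Hypothetical two-state chain: P[i][j] is the probability of moving from
# state i to state j, so every row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def simulate_chain(P, start, steps, rng):
    """Return the visited states X0, X1, ..., X_steps."""
    states = [start]
    for _ in range(steps):
        current = states[-1]
        # The next state depends on the current state only (Markov property).
        nxt = rng.choices(range(len(P)), weights=P[current])[0]
        states.append(nxt)
    return states

path = simulate_chain(P, start=0, steps=10, rng=random.Random(0))
print(path)
```

Passing a seeded `random.Random` keeps the run repeatable; any other row-stochastic matrix can be substituted for `P`.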
5
Hidden Markov Model
Set of states: $\{s_1, s_2, \ldots, s_N\}$
Process moves from one state to another, generating a sequence of states: $s_{i_1}, s_{i_2}, \ldots, s_{i_k}, \ldots$
Markov chain property: the probability of each subsequent state depends only on the previous state:
\[ P(s_{i_k} \mid s_{i_1}, s_{i_2}, \ldots, s_{i_{k-1}}) = P(s_{i_k} \mid s_{i_{k-1}}) \]
States are not visible, but each state randomly generates one of M observations (or visible states): $\{v_1, v_2, \ldots, v_M\}$
6
Hidden Markov Model
To define a hidden Markov model, the following probabilities have to be specified:
Matrix of transition probabilities $A = (a_{ij})$, where $a_{ij} = P(s_i \mid s_j)$
Matrix of observation probabilities $B = (b_i(v_m))$, where $b_i(v_m) = P(v_m \mid s_i)$
Vector of initial probabilities $\pi = (\pi_i)$, where $\pi_i = P(s_i)$
7
Hidden Markov Model (Graphical View)
[Figure: hidden Markov model unfolded in time]
8
Summary of the Concept
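\[ P(X) = \sum_{Q} P(X, Q) = \sum_{Q} P(Q)\, P(X \mid Q) \]
\[ = \sum_{Q} P(q_1, q_2, \ldots, q_T)\, P(x_1, x_2, \ldots, x_T \mid q_1, q_2, \ldots, q_T) \]
\[ = \sum_{Q} \underbrace{\prod_{t=1}^{T} P(q_t \mid q_{t-1})}_{\text{Markov chain process}} \; \underbrace{\prod_{t=1}^{T} p(x_t \mid q_t)}_{\text{Output process}} \]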
9
Earlier Example
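Transition matrix, with entries $T_{ij}$:
\[ T = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix} \]
Sensor matrix with $U_1 = \text{true}$:
\[ O_1 = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.2 \end{pmatrix} \]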
10
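Messages as column vectors
Forward and backward messages as column vectors:
\[ P(X_{t+1} \mid e_{1:t+1}) = \alpha\, P(e_{t+1} \mid X_{t+1}) \sum_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t}) \]
\[ f_{1:t+1} = \alpha\, O_{t+1} T^{\top} f_{1:t} \]
\[ P(X_k \mid e_{1:t}) = \alpha\, P(e_{k+1:t} \mid X_k)\, P(X_k \mid e_{1:k}) \]
\[ b_{k+1:t} = T\, O_{k+1}\, b_{k+2:t} \]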
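The matrix form of the two messages can be sketched in a few lines of plain Python, using the transition and sensor matrices of the earlier example. The uniform prior `f0` and the choice to observe the same evidence on both days are assumptions made for illustration; the helper names are hypothetical.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def normalize(v):
    total = sum(v)
    return [x / total for x in v]

T = [[0.7, 0.3], [0.3, 0.7]]    # transition matrix from the earlier example
O1 = [[0.9, 0.0], [0.0, 0.2]]   # sensor matrix for U1 = true

def forward_step(f, O, T):
    """f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}"""
    return normalize(mat_vec(O, mat_vec(transpose(T), f)))

def backward_step(b, O, T):
    """b_{k+1:t} = T * O_{k+1} * b_{k+2:t}"""
    return mat_vec(T, mat_vec(O, b))

f0 = [0.5, 0.5]                  # assumed uniform prior
f1 = forward_step(f0, O1, T)     # filtered estimate for day 1
f2 = forward_step(f1, O1, T)     # day 2, assuming the same evidence again

# Smoothed estimate for day 1 given both days: combine f_{1:1} with b_{2:2}.
b = backward_step([1.0, 1.0], O1, T)
smoothed_day1 = normalize([f1[i] * b[i] for i in range(2)])

print(f1, f2, smoothed_day1)
```

With these inputs the first filtered vector comes out as roughly [0.818, 0.182], and smoothing raises the day-1 probability of the first state because the second observation points the same way.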
11
Messages as column vectors
Can avoid storing all forward messages in smoothing by running forward algorithm backwards:
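\[ f_{1:t+1} = \alpha\, O_{t+1} T^{\top} f_{1:t} \]
\[ O_{t+1}^{-1} f_{1:t+1} = \alpha\, T^{\top} f_{1:t} \]
\[ \alpha' (T^{\top})^{-1} O_{t+1}^{-1} f_{1:t+1} = f_{1:t} \]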
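A rough sketch of running the forward update backwards, restricted to the 2x2 case and using the matrices of the earlier example. It assumes both the sensor matrix and the transposed transition matrix are invertible, which does not hold for every model; the helper names and the uniform prior are hypothetical.

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def normalize(v):
    total = sum(v)
    return [x / total for x in v]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

T = [[0.7, 0.3], [0.3, 0.7]]
O1 = [[0.9, 0.0], [0.0, 0.2]]

def forward_step(f, O, T):
    """f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}"""
    return normalize(mat_vec(O, mat_vec(transpose(T), f)))

def recover_previous(f_next, O, T):
    """f_{1:t} = alpha' * (T^T)^{-1} * O_{t+1}^{-1} * f_{1:t+1}"""
    unweighted = mat_vec(inverse_2x2(O), f_next)
    return normalize(mat_vec(inverse_2x2(transpose(T)), unweighted))

f0 = [0.5, 0.5]                      # assumed prior
f1 = forward_step(f0, O1, T)
recovered = recover_previous(f1, O1, T)
print(recovered)                     # should match f0 up to rounding
```

Because the previous message can be recomputed from the next one, smoothing only needs to keep the most recent forward message instead of the whole sequence.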
12
Example
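[Figure: state diagram of a two-state HMM. Hidden states Low and High with transition probabilities 0.3, 0.7, 0.2 and 0.8; observations Dry and Rain with observation probabilities 0.6 and 0.4.]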
13
Example
Two states: ‘Low’ and ‘High’ atmospheric pressure
Two observations: ‘Rain’ and ‘Dry’
Transition probabilities: P(‘Low’|‘Low’)=0.3, P(‘High’|‘Low’)=0.7, P(‘Low’|‘High’)=0.2, P(‘High’|‘High’)=0.8
Observation probabilities: P(‘Rain’|‘Low’)=0.6, P(‘Dry’|‘Low’)=0.4, P(‘Rain’|‘High’)=0.4, P(‘Dry’|‘High’)=0.6
Initial probabilities: say P(‘Low’)=0.4, P(‘High’)=0.6.
14
Calculation of probabilities
Suppose we want to calculate the probability of a sequence of observations in our example, {‘Dry’,‘Rain’}.
Consider all possible hidden state sequences:
P({‘Dry’,‘Rain’}) = P({‘Dry’,‘Rain’}, {‘Low’,‘Low’}) + P({‘Dry’,‘Rain’}, {‘Low’,‘High’}) + P({‘Dry’,‘Rain’}, {‘High’,‘Low’}) + P({‘Dry’,‘Rain’}, {‘High’,‘High’})
15
Calculation of probabilities
The first term can be calculated as:
P({‘Dry’,‘Rain’}, {‘Low’,‘Low’})
= P({‘Dry’,‘Rain’} | {‘Low’,‘Low’}) * P({‘Low’,‘Low’})
= P(‘Dry’|‘Low’) * P(‘Rain’|‘Low’) * P(‘Low’) * P(‘Low’|‘Low’)
= 0.4 * 0.6 * 0.4 * 0.3 = 0.0288
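The full sum over hidden state sequences can be checked with a short sketch. It assumes P(‘Dry’|‘High’) = 0.6, so that the observation probabilities of each state sum to 1; the function names are hypothetical. The same likelihood is also computed with the forward recursion, which avoids listing every sequence.

```python
from itertools import product

states = ["Low", "High"]
initial = {"Low": 0.4, "High": 0.6}
# transition[prev][cur] = P(cur | prev)
transition = {"Low": {"Low": 0.3, "High": 0.7},
              "High": {"Low": 0.2, "High": 0.8}}
# emission[state][obs] = P(obs | state); P(Dry | High) = 0.6 is an assumption.
emission = {"Low": {"Rain": 0.6, "Dry": 0.4},
            "High": {"Rain": 0.4, "Dry": 0.6}}

def joint(observations, path):
    """P(observations, path) for one hidden state sequence."""
    p = initial[path[0]] * emission[path[0]][observations[0]]
    for prev, cur, obs in zip(path, path[1:], observations[1:]):
        p *= transition[prev][cur] * emission[cur][obs]
    return p

def likelihood_by_enumeration(observations):
    """Sum the joint probability over every possible hidden sequence."""
    return sum(joint(observations, path)
               for path in product(states, repeat=len(observations)))

def likelihood_by_forward(observations):
    """Same quantity via the forward recursion, linear in the sequence length."""
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emission[s][obs] * sum(alpha[r] * transition[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

obs = ["Dry", "Rain"]
first_term = joint(obs, ("Low", "Low"))
total = likelihood_by_enumeration(obs)
print(first_term, total, likelihood_by_forward(obs))
```

Under these assumptions the four terms are 0.0288, 0.0448, 0.0432 and 0.1152, which add up to 0.232.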
16
17
Kalman Filters
System state cannot be measured directly
Need to estimate “optimally” from measurements
[Figure: block diagram. External controls and system error sources act on the system (a black box) whose state is desired but not known. Measuring devices, affected by measurement error sources, produce the observed measurements. The Kalman filter turns these into an optimal estimate of the system state.]
18
What is a Kalman Filter?
A set of mathematical equations
Iterative, recursive process
Optimal data processing algorithm under certain criteria
For a linear system and white Gaussian errors, the Kalman filter is the “best” estimate based on all previous measurements
Estimates past, present and future states
19
White Gaussian Noise
White noise is a random signal (or process) with a flat power spectral density. In other words, the signal contains equal power within a fixed bandwidth at any center frequency.
20
Optimal
Dependent upon the criteria chosen to evaluate performance
Under certain assumptions, the KF is optimal with respect to virtually any criterion that makes sense:
Linear data
Gaussian model
21
Recursive
A Kalman filter only needs info from the previous state
Updated for each iteration
Older data can be discarded
▪ Saves computation capacity and storage
22
Variables
In order to use the Kalman filter to estimate the internal state of a process given only a sequence of noisy observations, one must model the process in accordance with the framework of the Kalman filter.
This means specifying the matrices Fk, Hk, Qk, Rk and sometimes Bk for each time-step k.
23
Variables
xk = state vector, process to examine
wk = process noise
White, Gaussian, Mean=0, Covariance Matrix Q
vk = measurement noise
White, Gaussian, Mean=0, Covariance Matrix R
Uncorrelated with wk
Sk = covariance of the innovation (residual)
Kk = Kalman gain matrix
Pk = covariance of prediction error
zk = measurement of system state
24
Equations
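\[ x_{k+1} = A x_k + w_k \]
\[ \begin{pmatrix} pos_{k+1} \\ vel_{k+1} \end{pmatrix} = \begin{pmatrix} 1 & \Delta t \\ 0 & 1 \end{pmatrix} \begin{pmatrix} pos_k \\ vel_k \end{pmatrix} + \begin{pmatrix} \Delta t^2/2 \\ \Delta t \end{pmatrix} a_k \]
\[ z_k = pos_k + v_k \]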
25
More Equations
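\[ S_k = P_k + R \]
\[ K_k = A P_k S_k^{-1} \]
\[ P_{k+1} = A P_k A^{T} + Q - A P_k S_k^{-1} P_k A^{T} \]
\[ \hat{x}_{k+1} = A \hat{x}_k + K_k \left( z_{k+1} - A \hat{x}_k \right) \]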
26
Kalman gain
Relates the new estimate to the most certain of the previous estimates
Large measurement noise -> Small gain
Large system noise -> Large gain
System and measurement noise unchanged -> Steady-state Kalman Filter
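The qualitative statements about the gain can be checked numerically with a minimal scalar sketch. It assumes the standard textbook recursion for a one-dimensional random-walk state measured directly (not necessarily the exact form used on these slides); `steady_state_gain`, `q` and `r` are hypothetical names for the helper, the system noise variance and the measurement noise variance.

```python
def steady_state_gain(q, r, p0=1.0, iterations=200):
    """Iterate the scalar covariance recursion until the gain settles.

    Assumed model: x_{k+1} = x_k + w_k with Var(w) = q, z_k = x_k + v_k with Var(v) = r.
    """
    p = p0
    gain = 0.0
    for _ in range(iterations):
        p_pred = p + q                    # predict: uncertainty grows by q
        gain = p_pred / (p_pred + r)      # gain weighs prediction against measurement
        p = (1.0 - gain) * p_pred         # update: uncertainty shrinks
    return gain

balanced = steady_state_gain(q=1.0, r=1.0)
noisy_sensor = steady_state_gain(q=1.0, r=100.0)
noisy_system = steady_state_gain(q=100.0, r=1.0)
print(noisy_sensor, balanced, noisy_system)
```

With constant `q` and `r` the gain converges to a fixed value, which is why a steady-state filter can precompute it once instead of updating it at every step.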
27
Kalman Filter
The Kalman filter has two distinct phases: Predict and Update.
The predict phase uses the state estimate from the previous timestep to produce an estimate of the state at the current timestep.
In the update phase, measurement information at the current timestep is used to refine this prediction to arrive at a new, (hopefully) more accurate state estimate, again for the current timestep.
28
Iterative calculations
Prediction
The state
The error covariance
Update
Kalman gain
Update with new measurement
Update with new error covariance
[Figure: cycle alternating between Predict and Update]
29
Iterative calculations
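Prediction
\[ x_{k+1} = A x_k + w_k \]
\[ P_{k+1} = A P_k A^{T} + Q - A P_k S_k^{-1} P_k A^{T} \]
Update
\[ K_k = A P_k S_k^{-1} \]
\[ \hat{x}_{k+1} = A \hat{x}_k + K_k \left( z_{k+1} - A \hat{x}_k \right) \]
[Error covariance update: equation not recoverable from the source; it involves $I$, $K_k$, $S_k$ and $P_k$]
[Figure: cycle alternating between Predict and Update]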
30
Example
Lost on the 1-dimensional line
Position: y(t)
Assume Gaussian distributed measurements
[Figure: Gaussian probability density of the position y, horizontal axis from 0 to 100]
• Sextant measurement at t1: mean = $z_1$ and variance = $\sigma^2_{z_1}$
• Optimal estimate of position is: $\hat{y}(t_1) = z_1$
• Variance of error in estimate: $\sigma^2_x(t_1) = \sigma^2_{z_1}$
• Boat in same position at time t2: predicted position is $z_1$
31
Example
• So we have the prediction $\hat{y}^-(t_2)$
• GPS measurement at t2: mean = $z_2$ and variance = $\sigma^2_{z_2}$
• Need to correct the prediction due to measurement to get $\hat{y}(t_2)$
• Closer to more trusted measurement – linear interpolation?
[Figure: densities of the prediction $\hat{y}^-(t_2)$ and the measurement $z(t_2)$, horizontal axis from 0 to 100]
32
33
Example
• Corrected mean is the new optimal estimate of position
• New variance is smaller than either of the previous two variances
[Figure: densities of the prediction $\hat{y}^-(t_2)$, the measurement $z(t_2)$ and the corrected optimal estimate $\hat{y}(t_2)$, horizontal axis from 0 to 100]
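The correction in this example can be sketched as a variance-weighted interpolation between the prediction and the new measurement. The numbers below (a sextant fix at 40 with variance 16, a GPS fix at 50 with variance 4) are made up for illustration, and `fuse` is a hypothetical helper name.

```python
def fuse(mean_pred, var_pred, mean_meas, var_meas):
    """Combine a Gaussian prediction with a Gaussian measurement of the same quantity."""
    gain = var_pred / (var_pred + var_meas)      # weight given to the measurement
    mean = mean_pred + gain * (mean_meas - mean_pred)
    var = (1.0 - gain) * var_pred                # equals 1 / (1/var_pred + 1/var_meas)
    return mean, var

# Hypothetical values: prediction from the sextant, then a more precise GPS fix.
mean, var = fuse(mean_pred=40.0, var_pred=16.0, mean_meas=50.0, var_meas=4.0)
print(mean, var)
```

With these inputs the corrected mean is about 48, closer to the more trusted GPS value, and the corrected variance is about 3.2, smaller than both 16 and 4.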
34
Example – Accelerating Spacecraft
• Assume that the system variables, represented by the vector x, are governed by the equation
$x_{k+1} = A x_k + w_k$
where $w_k$ is random process noise, and the subscripts on the vectors represent the time step.
• A spacecraft is accelerating with random bursts of gas from its reaction control system thrusters.
• The vector x might consist of position p and velocity v.
35
Example – Accelerating Spacecraft
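The system equation would be given by
\[ \begin{pmatrix} p_{k+1} \\ v_{k+1} \end{pmatrix} = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix} \begin{pmatrix} p_k \\ v_k \end{pmatrix} + \begin{pmatrix} T^2/2 \\ T \end{pmatrix} a_k \]
where $a_k$ is the random time-varying acceleration, and T is the time between step k and step k+1.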
36
Example – Accelerating Spacecraft
The system represented was simulated on a computer with random bursts of acceleration which had a standard deviation of 0.5 feet/sec².
The position was measured with an error of 10 feet (one standard deviation).
Software used: MATLAB®
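A comparable simulation can be sketched in Python rather than MATLAB. The noise levels come from the slide; the time step `T = 0.1`, the run length, the initial covariance and the random seed are assumptions. The filter uses the standard textbook predict and update form with a position-only measurement, which may differ in detail from the equations shown earlier.

```python
import random

T = 0.1            # assumed time step in seconds (not stated on the slide)
ACCEL_STD = 0.5    # feet/sec^2, from the slide
MEAS_STD = 10.0    # feet, from the slide
STEPS = 600        # assumed number of time steps

def run(seed=0):
    rng = random.Random(seed)
    g = [T * T / 2.0, T]                          # how acceleration enters [p, v]
    # Process noise covariance Q = g g^T * ACCEL_STD^2.
    q = [[g[i] * g[j] * ACCEL_STD ** 2 for j in range(2)] for i in range(2)]
    r = MEAS_STD ** 2                             # measurement noise variance

    p_true, v_true = 0.0, 0.0                     # true position and velocity
    x = [0.0, 0.0]                                # estimate [position, velocity]
    P = [[r, 0.0], [0.0, 1.0]]                    # assumed initial uncertainty
    meas_sq, est_sq = 0.0, 0.0

    for _ in range(STEPS):
        # Simulate the true system and a noisy position measurement.
        a = rng.gauss(0.0, ACCEL_STD)
        p_true, v_true = p_true + T * v_true + g[0] * a, v_true + g[1] * a
        z = p_true + rng.gauss(0.0, MEAS_STD)

        # Predict: x = A x and P = A P A^T + Q, with A = [[1, T], [0, 1]].
        x = [x[0] + T * x[1], x[1]]
        ap = [[P[0][0] + T * P[1][0], P[0][1] + T * P[1][1]],
              [P[1][0], P[1][1]]]
        P = [[ap[0][0] + T * ap[0][1] + q[0][0], ap[0][1] + q[0][1]],
             [ap[1][0] + T * ap[1][1] + q[1][0], ap[1][1] + q[1][1]]]

        # Update with the position measurement (H = [1, 0]).
        s = P[0][0] + r                           # innovation variance
        k = [P[0][0] / s, P[1][0] / s]            # Kalman gain
        innovation = z - x[0]
        x = [x[0] + k[0] * innovation, x[1] + k[1] * innovation]
        P = [[P[i][j] - k[i] * P[0][j] for j in range(2)] for i in range(2)]

        meas_sq += (z - p_true) ** 2
        est_sq += (x[0] - p_true) ** 2

    return (meas_sq / STEPS) ** 0.5, (est_sq / STEPS) ** 0.5

rms_measurement_error, rms_estimate_error = run()
print(rms_measurement_error, rms_estimate_error)
```

The two printed values are the root-mean-square position errors of the raw measurements and of the filtered estimate; the filtered error should come out well below the measurement error of roughly 10 feet.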
37
Example – Accelerating Spacecraft