1
Probabilistic Reasoning Over Time
(Especially for HMM and Kalman filter )
December 1st, 2004
SeongHun Lee, InHo Park, Yang Ming
2
Contents
Markov Models
Hidden Markov Models
  HMMs as Generative Processes
  Markov Assumptions for HMMs
  The 3 Problems of HMMs
  HMMs for Speech Recognition
Kalman Filters
4
Markov Process
A stochastic process over a temporal sequence: the probability distribution of the variable q at time t depends on the variable q at times t-1 down to 1.
First-order Markov process: the state transition depends only on the previous state:
  P[qt = j | qt-1 = i, qt-2 = k, …] = P[qt = j | qt-1 = i]
The state transition is independent of time: aij = P[qt = j | qt-1 = i]
5
Markov Models
A Markov model is a model of a Markov process with discrete states. Given the observed sequence, the state sequence is uniquely defined: the probability of the state sequence 's1 s3 s1 s2 s2 s3' given the observation sequence 'A C A B B C' is 1.
7
Example of Markov Model
A Markov chain with 3 states: sunny, cloudy, rain.

Transition probabilities (weather of today → weather of tomorrow):

              sunny   cloudy   rain
  sunny        0.8     0.1     0.1
  cloudy       0.2     0.6     0.2
  rain         0.3     0.3     0.4

(The slide shows the same chain as a 3-node transition diagram.)
8
Example of Markov Model (cont')
Probability of a sequence S: compute the product of successive transition probabilities.
Ex. What is the weather for the next 2 days (today: sunny)?

  P(sunny, cloudy, rain)
  = P(sunny) P(cloudy | sunny) P(rain | cloudy)
  = 1.0 x 0.1 x 0.2 = 0.02

  P(sunny, sunny, sunny)
  = P(sunny) P(sunny | sunny) P(sunny | sunny)
  = 1.0 x 0.8 x 0.8 = 0.64

A possible answer: sunny-sunny, with probability 64%.
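The two products above can be checked with a short script. A minimal sketch, using the transition table from the previous slide (the state names and dictionary layout are illustrative):

```python
# Toy weather Markov chain from the slides: rows = today, columns = tomorrow.
TRANS = {
    "sunny":  {"sunny": 0.8, "cloudy": 0.1, "rain": 0.1},
    "cloudy": {"sunny": 0.2, "cloudy": 0.6, "rain": 0.2},
    "rain":   {"sunny": 0.3, "cloudy": 0.3, "rain": 0.4},
}

def sequence_probability(seq, start_prob=1.0):
    """Multiply the successive transition probabilities along the sequence."""
    p = start_prob
    for today, tomorrow in zip(seq, seq[1:]):
        p *= TRANS[today][tomorrow]
    return p

print(sequence_probability(["sunny", "sunny", "sunny"]))  # 0.8 * 0.8 ≈ 0.64
print(sequence_probability(["sunny", "cloudy", "rain"]))  # 0.1 * 0.2 ≈ 0.02
```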
10
Hidden Markov Model
In a hidden Markov model the state is not observed (hidden); only an observable symptom (output) is seen.
  Transition probabilities between states depend only on the previous state: aij = P(qt = j | qt-1 = i)
  Emission probabilities depend only on the current state: bi(xt) = P(xt | qt = i) (where xt is observed)
11
Markov Assumptions
Emissions: the probability of emitting xt at time t in state qt = i does not depend on anything else:
  P(xt | qt = i, q1 … qt-1, x1 … xt-1) = P(xt | qt = i)
Transitions: the probability of going from state j to state i at time t does not depend on anything else:
  P(qt = i | qt-1 = j, q1 … qt-2) = P(qt = i | qt-1 = j)
The probability does not depend on the time t: aji = P(qt = i | qt-1 = j)
13
HMM as Generative Processes
An HMM can be used to generate sequences:
  Define a set of starting states with initial probabilities P(q0 = i)
  Define a set of final states
  For each sequence to generate:
    1. Select an initial state j according to P(q0)
    2. Select the next state i according to P(qt = i | qt-1 = j)
    3. Emit an output according to the emission distribution P(xt | qt = i)
    4. If i is a final state stop, otherwise loop to step 2
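The generative loop above can be sketched as follows. All parameter values here are assumed toy numbers, and a literal "END" state plays the role of the final state from the slide:

```python
import random

random.seed(0)  # reproducible sampling

# Toy parameters (assumed for illustration):
START = {"S1": 0.6, "S2": 0.4}                      # P(q0 = i)
TRANS = {"S1": {"S1": 0.5, "S2": 0.3, "END": 0.2},  # P(qt = i | qt-1 = j)
         "S2": {"S1": 0.4, "S2": 0.4, "END": 0.2}}
EMIT  = {"S1": {"H": 0.9, "T": 0.1},                # P(xt | qt = i)
         "S2": {"H": 0.2, "T": 0.8}}

def sample(dist):
    """Draw one key from a {outcome: probability} dict."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point rounding

def generate():
    """Walk the chain from a start state to a final state, emitting outputs."""
    state, outputs = sample(START), []
    while state != "END":
        outputs.append(sample(EMIT[state]))
        state = sample(TRANS[state])
    return outputs

print(generate())  # a list of 'H'/'T' symbols of random length
```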
14
Coin Toss Model
2-coins model
  States S = {S1, S2}: two different biased coins (hidden)
  Each state is characterized by its probability distribution over heads and tails
  State transitions are characterized by a state transition matrix
  Observation symbols V = {H, T} (H: head, T: tail) (given)
15
Urn and Ball Model
Each urn contains colored balls (4 distinct colors).
Basic step:
  1. Choose an urn according to some probabilistic procedure
  2. Draw a ball from the urn
  3. Record (observe) its color
  4. Replace the ball
  5. Repeat the above procedure
The colors of the selected balls are observed, but the sequence of chosen urns is hidden.
17
The 3 Problems of HMMs
The HMM model gives rise to 3 different problems:
  The Evaluation Problem: given an HMM parameterized by λ, compute the likelihood of a sequence: P(X | λ)
  The Decoding Problem: given an HMM parameterized by λ, compute the optimal path Q through the state space given a sequence X: argmaxQ P(Q | X, λ)
  The Learning Problem: given an HMM parameterized by λ and a set of sequences Xn, select the parameters such that: λ* = argmaxλ Σn log P(Xn | λ)
18
The Evaluation Problem
Finding the probability of an observation: the sphinx quiz
  A sphinx lives in a castle and proposes a quiz. Every day, unseen to you, she shows a card of one of 4 kinds (spade, heart, diamond, clover).
  Which card is chosen depends on her feeling that day.
  The pattern of feeling changes and the card preference for each feeling are known.
  After 3 cards are shown, you must answer the probability of the observed sequence.
19
The Evaluation Problem
Straightforward way: enumerate every possible state sequence of length T (the number of observations):
  P(X | λ) = ΣQ P(X | Q, λ) P(Q | λ), summed over all state sequences Q
Time complexity: about 2T · N^T operations, which is far too high.
Instead, reuse the probabilities of partial observations.
20
The Evaluation Problem
Forward variable approach
  Save the probabilities of the partial observation sequence in a state matrix.
  The forward variable at state Sj uses the forward variables of the previous states: multiply each one by the corresponding transition probability and the emission probability, then sum all the terms.
21
The Evaluation Problem
Forward variable approach
  Forward variable αt(i): the probability of having generated the sequence x1 … xt and being in state i at time t:
  αt(i) = P(x1 … xt, qt = i | λ)
22
The Evaluation Problem
Forward variable approach
  Initial condition: α1(i) = πi bi(O1), the prior probability of each state i times its emission probability
  Compute αt(i) for each state i and each time t of a given sequence: αt(j) = [Σi αt-1(i) aij] bj(Ot)
  Compute the likelihood by summing the αT(i)'s: P(O | λ) = Σi αT(i)
23
The Evaluation Problem: Forward Variable Approach. Let's do it.
Assume prior probabilities P(s1) = P(s2) = .5 for the sphinx's two feelings s1, s2:
  α1(s1) = P(s1) · P(o1 | s1) = .5 x .2
  α1(s2) = P(s2) · P(o1 | s2) = .5 x .1
  α2(s1) = α1(s1) · P(s1 | s1) · P(o2 | s1) + α1(s2) · P(s1 | s2) · P(o2 | s1)
  …
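The recursion can be sketched in a few lines. The .5 priors and the .2/.1 first-step emissions follow the slide; every other number and symbol name below is an assumed toy value:

```python
def forward(obs, states, prior, trans, emit):
    """Return (P(O | model), alpha) where alpha[t][i] = P(o1..ot, q_t = i)."""
    alpha = [{i: prior[i] * emit[i][obs[0]] for i in states}]   # initialization
    for o in obs[1:]:                                           # induction
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * trans[i][j] for i in states) * emit[j][o]
                      for j in states})
    return sum(alpha[-1].values()), alpha                       # termination

# Toy parameters (assumed, apart from the slide's .5 priors and .2/.1 emissions):
states = ["s1", "s2"]
prior = {"s1": 0.5, "s2": 0.5}
trans = {"s1": {"s1": 0.8, "s2": 0.2}, "s2": {"s1": 0.6, "s2": 0.4}}
emit = {"s1": {"A": 0.2, "B": 0.8}, "s2": {"A": 0.1, "B": 0.9}}

likelihood, alpha = forward(["A", "B", "A"], states, prior, trans, emit)
print(likelihood)  # sum of the alpha_T(i)'s
```

For a 3-symbol sequence this does O(N^2 T) work instead of enumerating all N^T state sequences.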
24
The Decoding Problem
Finding the best state sequence: the sphinx quiz
  The sphinx changes the quiz: same conditions as before, but after 3 cards are shown you must find the sequence of her feelings (the most likely state sequence).
25
The Decoding Problem
Choosing the individually most likely states
  Find the individually most likely states: the most likely first state, the most likely second state, and so on.
  Problem: there is no guarantee that the resulting path is a valid one when the HMM has state transitions with zero probability; two individually chosen states may be joined by a zero-probability transition.
26
The Decoding Problem
Viterbi algorithm
  Finds the single best state-sequence path: maximize P(Q | X, λ), i.e. maximize P(Q, X | λ)
  Based on dynamic programming (similar to a shortest-path algorithm)
  Uses the "Viterbi variable" of the previous states, which holds the maximum probability of the partial sequence together with the sequence of its states
  Multiply each previous Viterbi variable by the transition probability and the emission probability, and choose the previous state with the maximum result
27
The Decoding Problem
Viterbi algorithm
  The Viterbi algorithm finds the best state sequence using the Viterbi variable:
  δt(i) = maxq1…qt-1 P(q1 … qt-1, qt = i, O1 … Ot | λ)
28
The Decoding Problem
Viterbi algorithm
  Step 1: Initialization
    δ1(i) = πi bi(O1) for 1 ≤ i ≤ N (π is the initial probability, b the output probability)
    ψ1(i) = 0 (sequence of the best path)
  Step 2: Induction
    δt(j) = maxi [δt-1(i) aij] bj(Ot), 1 ≤ j ≤ N
    ψt(j) = argmaxi [δt-1(i) aij], 1 ≤ j ≤ N (store the backtrace)
  Step 3: Termination
    P* = maxs [δT(s)]
    qT* = argmaxs [δT(s)]
  Step 4: Path (state sequence) backtracking, for t = T-1 … 1:
    qt* = ψt+1(qt+1*)
29
The Decoding Problem
Viterbi algorithm: let's do it
  Step 1: Initialization
    δ1(s1) = P(s1) · P(o1 | s1) = .5 x .2 = .1
    δ1(s2) = P(s2) · P(o1 | s2) = .5 x .1 = .05
  Step 2: Induction
    δ1(s1) · P(s1 | s1) · P(o2 | s1) = .1 x .8 x .6 = 0.048
    δ1(s2) · P(s1 | s2) · P(o2 | s1) = .05 x .6 x .6 = 0.018
    δ2(s1) = 0.048
    …
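The four Viterbi steps can be sketched directly. The toy parameters echo the worked example's numbers (.5, .2, .1, .8, .6); the observation symbols and state names are assumptions:

```python
def viterbi(obs, states, prior, trans, emit):
    """Return (best state path, its probability) by dynamic programming."""
    delta = {i: prior[i] * emit[i][obs[0]] for i in states}   # step 1: init
    psi = []                                                  # backtraces
    for o in obs[1:]:                                         # step 2: induction
        back, new_delta = {}, {}
        for j in states:
            best = max(states, key=lambda i: delta[i] * trans[i][j])
            back[j] = best
            new_delta[j] = delta[best] * trans[best][j] * emit[j][o]
        psi.append(back)
        delta = new_delta
    last = max(states, key=lambda s: delta[s])                # step 3: termination
    path = [last]
    for back in reversed(psi):                                # step 4: backtracking
        path.append(back[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy parameters (assumed, mirroring the worked example's numbers):
states = ["s1", "s2"]
prior = {"s1": 0.5, "s2": 0.5}
trans = {"s1": {"s1": 0.8, "s2": 0.2}, "s2": {"s1": 0.6, "s2": 0.4}}
emit = {"s1": {"A": 0.2, "B": 0.6, "C": 0.2}, "s2": {"A": 0.1, "B": 0.6, "C": 0.3}}

path, prob = viterbi(["A", "B"], states, prior, trans, emit)
print(path, prob)  # delta_2(s1) = .1 * .8 * .6, matching the slide's 0.048
```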
30
The Learning Problem
Parameter estimation problem: the sphinx quiz
  The sphinx changes the quiz again!! Now there is no information about the patterns of "feeling changes" and "card choosing".
  From many card sequences you have to find the model that best explains the conditions of "feeling changes" and "card choosing".
31
The Learning Problem
Baum-Welch method
  Find λ* = argmaxλ P(Otraining | λ), where λ is the model parameter.
  Locally maximize it with an iterative hill-climbing algorithm:
    Work out the probability of the observations using some model.
    Find which state transitions and symbol emissions are used most.
    By increasing their probabilities, choose a revised model that gives a higher probability to the observations. Training!
32
The Learning Problem
Baum-Welch method: algorithm
  Step 1: Begin with some model λ (perhaps pre-selected or just chosen randomly)
  Step 2: Run O through the current model to estimate the expectations of each model parameter
  Step 3: Change the model to maximize the values of the paths that are used a lot
  Step 4: Repeat this process until it converges on optimal values of the model parameters
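The four steps above can be sketched for a small discrete HMM. This is a minimal, unscaled version (suitable for short sequences only); every parameter value is an assumed toy number:

```python
def forward(obs, S, pi, A, B):
    """alpha[t][i] = P(o1..ot, q_t = i | model)."""
    alpha = [{i: pi[i] * B[i][obs[0]] for i in S}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * A[i][j] for i in S) * B[j][o] for j in S})
    return alpha

def backward(obs, S, A, B):
    """beta[t][i] = P(o_{t+1}..o_T | q_t = i, model)."""
    beta = [{i: 1.0 for i in S}]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {i: sum(A[i][j] * B[j][o] * nxt[j] for j in S) for i in S})
    return beta

def baum_welch_step(obs, S, V, pi, A, B):
    """One EM re-estimation; returns updated (pi, A, B) and the old likelihood."""
    al, be = forward(obs, S, pi, A, B), backward(obs, S, A, B)
    P = sum(al[-1][i] for i in S)                      # step 2: P(O | lambda)
    T = len(obs)
    gamma = [{i: al[t][i] * be[t][i] / P for i in S} for t in range(T)]
    xi = [{(i, j): al[t][i] * A[i][j] * B[j][obs[t + 1]] * be[t + 1][j] / P
           for i in S for j in S} for t in range(T - 1)]
    new_pi = {i: gamma[0][i] for i in S}               # step 3: re-estimate
    new_A = {i: {j: sum(x[i, j] for x in xi) /
                    sum(gamma[t][i] for t in range(T - 1)) for j in S} for i in S}
    new_B = {i: {v: sum(g[i] for g, o in zip(gamma, obs) if o == v) /
                    sum(g[i] for g in gamma) for v in V} for i in S}
    return new_pi, new_A, new_B, P

S, V = ["s1", "s2"], ["H", "T"]
pi = {"s1": 0.5, "s2": 0.5}
A = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
B = {"s1": {"H": 0.8, "T": 0.2}, "s2": {"H": 0.3, "T": 0.7}}
obs = ["H", "H", "T", "T", "H"]
for _ in range(5):                                     # step 4: repeat
    pi, A, B, P = baum_welch_step(obs, S, V, pi, A, B)
    print(P)  # the likelihood never decreases across iterations
```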
33
The Learning Problem
Baum-Welch method: let's do it
  Apply the four steps above to the sphinx's card sequences: choose an initial model, estimate the expected parameter usage, update the model, and repeat until convergence.
35
Sequential Data
Sequential data is often highly variable but has embedded structure; the information is contained in that structure.
37
HMMs for Speech Recognition
Find a sequence of phonemes (or words) given an acoustic sequence
  ex. "How to wreck a nice beach." vs. "How to recognize speech."
Idea: use a phoneme model.
38
Phoneme model
Phoneme: the smallest unit of sound that carries distinct meaning (consonants and vowels).
Phoneme model: given the observed speech signals, find the sequence of states that maximizes P(signals | states).
39
Embedded Training of HMMs
For each acoustic sequence in the training set, create a new HMM as the concatenation of the HMMs representing the underlying sequence of phonemes.
Maximize the likelihood of the training sentences.
40
HMMs: Decoding a Sentence
Decide on the accepted vocabulary. Optionally add a language model: P(word sequence). Use an efficient algorithm (Viterbi) to find the optimal path in the decoding HMM.
41
A demo of HMM application
http://www.mmk.e-technik.tu-muenchen.de/rotdemo.html
This demo shows an image retrieval system that lets the user search a grayscale image database intuitively by presenting simple sketches.
A detailed description of the demo is at: http://www.mmk.e-technik.tu-muenchen.de/demo/imagedb/theory.html
43
Kalman Filter?
What is the Kalman filter? A technique that can be used to recursively estimate unobservable quantities, called state variables {xt}, from an observed time series {yt}.
What is it used for? Tracking missiles, extracting lip motion from video, many computer vision applications, economics, navigation.
44
Problem: estimating the location of a ship
  "Suppose that you are lost at sea during the night and have no idea at all of your location."
  Problem: inherent measuring-device inaccuracies. Your measurement carries some uncertainty!
45
Uncertainty
Conditional density of the position based on the measured value z1; assume a Gaussian distribution:
  f(x | z1) = N(z1, σ²z1)
  z1: measured position, x: real position
Q: What can be a measure of uncertainty? (The variance σ²z1.)
46
Measurements
You make a measurement z1, with conditional density f(x | z1) = N(z1, σ²z1); your friend also makes a measurement z2, with density N(z2, σ²z2).
  Question 1: Which one is better?
  Question 2: What is the best way to combine these measurements?
47
Combine measurements
  The combined density f(x | z1, z2) is again Gaussian, N(μ, σ²), with
    μ = [σ²z2 / (σ²z1 + σ²z2)] z1 + [σ²z1 / (σ²z1 + σ²z2)] z2
    1/σ² = 1/σ²z1 + 1/σ²z2
  The uncertainty is decreased by combining the two pieces of information!
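The combination formula can be sketched directly (the measurement values below are made up for illustration):

```python
def combine(z1, var1, z2, var2):
    """Optimal (minimum-variance) fusion of two independent Gaussian estimates."""
    mu = (var2 / (var1 + var2)) * z1 + (var1 / (var1 + var2)) * z2
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return mu, var

# Two noisy position fixes of the same ship (illustrative numbers):
mu, var = combine(10.0, 4.0, 12.0, 1.0)
print(mu, var)  # the fused mean leans toward the more certain z2, and
                # the fused variance (0.8) is below both 4.0 and 1.0
```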
48
The optimal estimate at t2:
  x̂(t2) = x̂(t1) + K(t2) [z2 − x̂(t1)]
  where K(t2) = σ²z1 / (σ²z1 + σ²z2)
What does it mean? The optimal estimate at t2, x̂(t2), is equal to the best prediction of its value before z2 is taken, x̂(t1), plus a correction term of an optimal weighting value times the difference between z2 and the best prediction of its value before it is actually taken.
49
Moving? Suppose you're moving:
  dx/dt = u + w
  u is a nominal velocity; the "noise" w is modeled as white Gaussian noise with a mean of zero and variance σ²w.
Best prediction (just before the measurement at t3):
  x̂(t3⁻) = x̂(t2) + u (t3 − t2)
  σ²(t3⁻) = σ²(t2) + σ²w (t3 − t2)
Best estimate (after the measurement z3):
  x̂(t3) = x̂(t3⁻) + K(t3) [z3 − x̂(t3⁻)]
  where K(t3) = σ²(t3⁻) / (σ²(t3⁻) + σ²z3)
50
Summary
Process model: describes how the state changes over time.
Measurement model: where you are, from what you see!!!
Predictor-corrector: predict the new state and its uncertainty, then correct with the new measurement.
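The one-dimensional predict-correct cycle from the slides can be sketched as follows; every number here (noise variances, velocity, measurements) is an assumed toy value:

```python
# Minimal 1-D Kalman-style predict-correct sketch (toy numbers assumed).
def predict(x, var, u, dt, var_w):
    """Process model: x' = x + u*dt; uncertainty grows with the motion noise."""
    return x + u * dt, var + var_w * dt

def correct(x_pred, var_pred, z, var_z):
    """Measurement model: blend prediction and measurement via the gain K."""
    K = var_pred / (var_pred + var_z)
    return x_pred + K * (z - x_pred), (1 - K) * var_pred

x, var = 0.0, 1.0          # initial estimate of the ship's position
for z in [1.1, 2.0, 2.9]:  # noisy position fixes, one per time step
    x, var = predict(x, var, u=1.0, dt=1.0, var_w=0.1)
    x, var = correct(x, var, z, var_z=0.5)
print(x, var)  # the estimate tracks the ~1 unit/step drift; variance shrinks
```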
52
References
You can find useful materials about HMMs at:
  CS570 AI Lecture Notes (2003)
  http://www.idiap.ch/~bengio/
  http://speech.chungbuk.ac.kr/~owkwon/
You can find useful materials about the Kalman filter at:
  http://www.cs.unc.edu/~welch/kalman
  Maybeck, 1979, "Stochastic Models, Estimation, and Control"
  Greg Welch and Gary Bishop, 2001, "An Introduction to the Kalman Filter"