
1

Probabilistic Reasoning Over Time

(Especially for HMM and Kalman Filter)

December 1st, 2004

SeongHun Lee, InHo Park, Yang Ming

2

Contents

Markov Models

Hidden Markov Models
- HMMs as Generative Processes
- Markov Assumptions for HMMs
- The 3 Problems of HMMs
- HMMs for Speech Recognition

Kalman filters

3

Markov Models

4

Markov Process

A stochastic process over a temporal sequence: the probability distribution of the variable q at time t depends on the variables q at times t-1 down to 1.

First-order Markov process: the state transition depends only on the previous state: P[qt=j | qt-1=i, qt-2=k, …] = P[qt=j | qt-1=i]

The state transition is independent of time: aij = P[qt=j | qt-1=i]

5

Markov Models

Markov Model: a model of a Markov process with discrete states. Given the observed sequence, the state sequence is uniquely defined. For example, the probability of the state sequence 's1 s3 s1 s2 s2 s3' given the observation sequence 'A C A B B C' is 1.

6

Markov Models (Graphical View) A Markov model:

A Markov model unfolded in time:

7

Example of Markov Model

Markov chain with 3 states: sunny, cloudy, rain.

State-transition probabilities (rows: weather of today; columns: weather of tomorrow):

         sunny  cloudy  rain
sunny     0.8    0.1    0.1
cloudy    0.2    0.6    0.2
rain      0.3    0.3    0.4

8

Example of Markov Model (cont’)

Probability of a sequence S: compute the product of successive transition probabilities.

Ex. How is the weather for the next 2 days (today: sunny)? A possible answer: sunny-sunny, with probability 64%.

P(sunny, cloudy, rain) = P(sunny) P(cloudy|sunny) P(rain|cloudy) = 1.0 x 0.1 x 0.2 = 0.02

P(sunny, sunny, sunny) = P(sunny) P(sunny|sunny) P(sunny|sunny) = 1.0 x 0.8 x 0.8 = 0.64
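The two products above can be checked with a short script; the transition table is the one from the previous slide, and the prior of 1.0 reflects that today is known to be sunny.

```python
# Transition probabilities from the slide (rows: today, columns: tomorrow).
P = {
    "sunny":  {"sunny": 0.8, "cloudy": 0.1, "rain": 0.1},
    "cloudy": {"sunny": 0.2, "cloudy": 0.6, "rain": 0.2},
    "rain":   {"sunny": 0.3, "cloudy": 0.3, "rain": 0.4},
}

def sequence_probability(states, prior=1.0):
    """Probability of a weather sequence: the prior of the first state
    times the product of successive transition probabilities."""
    prob = prior
    for today, tomorrow in zip(states, states[1:]):
        prob *= P[today][tomorrow]
    return prob
```

sequence_probability(["sunny", "cloudy", "rain"]) gives 0.02 and sequence_probability(["sunny", "sunny", "sunny"]) gives 0.64, matching the slide up to floating-point rounding.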

9

Hidden Markov Models

10

Hidden Markov Model

Hidden Markov Model: the state is not observed (hidden); only a symptom (output) is observable.

Transition probabilities between states depend only on the previous state: aij = P(qt = j | qt-1 = i)

Emission probabilities depend only on the current state: bi(xt) = P(xt | qt = i) (where xt is observed)

11

Markov Assumptions

Emissions Probability to emit xt at time t in state qt = i does not depend

on anything else:

Transitions Probability to go from state j to state i at time t does not dep

end on anything else

Probability does not depend on time t:

12

Hidden Markov Models (Graphical View)

A hidden Markov model:

A hidden Markov model unfolded in time:

13

HMM as Generative Processes

HMM can be use to generate sequences Define a set of starting states with initial probabilities P(q0 =

i) Define a set of final states For each sequence to generate:

Select an initial state j according to P(q0) Select the next state i according to P(qt = i|qt-1=j) Emit an output according to the emission distribution P(xt|qt =

i) If i is a final state, then stop, otherwise loop to step 2
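The generation loop can be sketched as follows. The two-state model, its symbols, and all probabilities below are invented for illustration, and a fixed maximum length stands in for reaching a final state.

```python
import random

# Hypothetical two-state HMM (all parameters invented for illustration).
START_P = {"S1": 0.6, "S2": 0.4}                 # P(q0 = i)
TRANS_P = {"S1": {"S1": 0.7, "S2": 0.3},         # P(qt = i | qt-1 = j)
           "S2": {"S1": 0.4, "S2": 0.6}}
EMIT_P  = {"S1": {"H": 0.9, "T": 0.1},           # P(xt | qt = i)
           "S2": {"H": 0.2, "T": 0.8}}

def draw(dist, rng):
    """Sample a key from a {key: probability} distribution."""
    r, acc = rng.random(), 0.0
    for key, p in dist.items():
        acc += p
        if r < acc:
            return key
    return key  # guard against floating-point rounding

def generate(max_len=10, rng=random.Random(0)):
    """Generate one observation sequence: pick a start state, then
    repeatedly emit a symbol and transition to the next state."""
    state, outputs = draw(START_P, rng), []
    for _ in range(max_len):
        outputs.append(draw(EMIT_P[state], rng))
        state = draw(TRANS_P[state], rng)
    return outputs
```

Each call returns one sampled symbol sequence; repeated calls with the same seeded generator reproduce the same stream of sequences.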

14

Coin Toss Model

2-Coins model Description

State S={S1, S2} : two different biased coins Each state characterized by probability distribution of

heads and tails States transitions characterized by state transition

matrix Observation symbol V={H, T} (H: Head, T: Tail)

given

hidden

15

Urn and Ball Model

Each urn contain colored balls (4 distinct colors)

Basic step Choose urn according to some

probabilistic procedure Get a ball from the urn Record (observe) its color Replace the ball Repeat the above procedure.

Colors of selected balls are observed but sequence of choosing urns is hidden

16

The 3 Problems of HMMs

17

The 3 Problems of HMMs

The HMM model gives rise to 3 different problems:

The Evaluation Problem: given an HMM parameterized by λ, compute the likelihood of a sequence X: P(X|λ).

The Decoding Problem: given an HMM parameterized by λ, compute the optimal path Q through the state space given a sequence X: Q* = argmaxQ P(Q|X, λ).

The Learning Problem: given an HMM parameterized by λ and a set of sequences Xn, select the parameters such that: λ* = argmaxλ Σn log P(Xn|λ).

18

The Evaluation Problem

Finding the probability of an observation: the sphinx quiz.

A sphinx lives in a castle and proposes a quiz. Every day, unseen to you, she shows a card of one of 4 kinds (spade, heart, diamond, clover). Which card is chosen depends on her feeling that day. Her pattern of feeling changes and her card preference for each feeling are known. After 3 cards are shown, you must give the probability of the observed sequence.

19

The Evaluation Problem

Straightforward way: enumerate every possible state sequence of length T (the number of observations) and add up their probabilities:

P(O|λ) = ΣQ P(O, Q|λ), summed over all N^T state sequences Q

Time complexity: about 2T · N^T operations. This is far too high; consider instead using the probability of partial observations.

20

The Evaluation Problem

Forward Variable Approach: the forward variable saves the probability of the partial observation sequence in a state-time trellis.

To compute the forward variable at state Sj, use the forward variables of the previous states: multiply each by the corresponding transition probability and the emission probability, then sum over all previous states.

21

The Evaluation Problem

Forward Variable Approach: the forward variable is the probability of having generated the sequence x1 … xt and being in state i at time t: αt(i) = P(x1 … xt, qt = i | λ).

22

The Evaluation Problem

Forward Variable Approach, reminder. Initial condition: α1(i) = πi bi(O1), where the πi are the prior probabilities of each state i.

Compute αt(i) for each state i and each time t of a given sequence, using the induction αt+1(j) = [Σi αt(i) aij] bj(Ot+1).

Compute the likelihood as follows: sum the αT(i)'s to get P(O|λ) = Σi αT(i).

23

The Evaluation Problem: Forward Variable Approach. Let’s do it.

Assume the prior probability P( ) = P( ) = .5
α( , 1) = P( ) · P( | ) = .5 × .2
α( , 1) = P( ) · P( | ) = .5 × .1
α( , 2) = α( , 1) · P( | ) · P( | ) + α( , 1) · P( | ) · P( | )
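The forward recursion can be implemented directly and checked against the brute-force enumeration from the earlier slide. The two-state model and its probabilities below are invented for illustration; the priors of .5 and first-symbol emissions of .2 and .1 mirror the worked example.

```python
from itertools import product

# Hypothetical 2-state HMM (states and probabilities invented for illustration).
states = ["A", "B"]
pi = {"A": 0.5, "B": 0.5}                  # priors
a  = {"A": {"A": 0.8, "B": 0.2},           # a[i][j] = P(q_t = j | q_{t-1} = i)
      "B": {"A": 0.4, "B": 0.6}}
b  = {"A": {"x": 0.2, "y": 0.8},           # b[i][o] = P(o | q_t = i)
      "B": {"x": 0.1, "y": 0.9}}

def forward_likelihood(obs):
    """P(O | lambda) via the forward variable alpha_t(i)."""
    alpha = {i: pi[i] * b[i][obs[0]] for i in states}        # initialization
    for o in obs[1:]:                                        # induction
        alpha = {j: sum(alpha[i] * a[i][j] for i in states) * b[j][o]
                 for j in states}
    return sum(alpha.values())                               # termination

def brute_force_likelihood(obs):
    """Enumerate all N^T paths (the 'straightforward way')."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = pi[path[0]] * b[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
        total += p
    return total
```

For any observation sequence the two functions agree, but the forward version costs O(N²T) instead of O(T · N^T).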

24

The Decoding Problem

Finding the best state sequence: the sphinx quiz again.

The sphinx changes the quiz; the conditions are the same as before. After 3 cards are shown, you must find the sequence of her feelings, i.e. the most likely state sequence.

25

The Decoding Problem

Choosing the individually most likely states: find the most likely first state, then the most likely second state, and so on.

Problem: there is no guarantee that the resulting path is a valid one. When the HMM has a state transition with zero probability, the individually chosen states may be joined by exactly such a zero-probability transition.

26

The Decoding Problem

Viterbi algorithm: find the single best state sequence, i.e. maximize P(Q|X, λ), or equivalently P(Q, X|λ). It is based on dynamic programming, similar to a shortest-path algorithm.

The Viterbi variables of the previous states hold the maximum probability of the partial sequence and the sequence of states achieving it. For each state, multiply the previous Viterbi variables by the transition and emission probabilities, and keep the previous state that gives the maximum result.

27

The Decoding Problem

Viterbi algorithm: finds the best state sequence. Viterbi variable: δt(i) = max over q1 … qt-1 of P(q1 … qt-1, qt = i, x1 … xt | λ).

28

The Decoding Problem

Viterbi algorithm:

step 1 : Initialization
δ1(i) = πi bi(O1) for 1 ≤ i ≤ N (π is the initial prob., b is the output prob.)
ψ1(i) = 0 (sequence of the best path)

step 2 : Induction
δt(j) = maxi [δt-1(i) aij] bj(Ot), 1 ≤ j ≤ N
ψt(j) = argmaxi [δt-1(i) aij], 1 ≤ j ≤ N (store backtrace)

step 3 : Termination
P* = maxs [δT(s)]
qT* = argmaxs [δT(s)]

step 4 : Path (state sequence) backtracking (t = T-1, …, 1)
qt* = ψt+1(qt+1*)

29

The Decoding Problem

Viterbi algorithm: let’s do it.

Step 1: Initialization
δ1( ) = P( ) · P( | ) = .5 × .2 = .1
δ1( ) = P( ) · P( | ) = .5 × .1 = .05

Step 2: Induction
δ1( ) · P( | ) · P( | ) = .1 × .8 × .6 = 0.048
δ1( ) · P( | ) · P( | ) = .05 × .6 × .6 = 0.018
δ2( ) = 0.048 …
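The four steps can be coded compactly and checked against exhaustive search. The two-state model below is invented for illustration (the same hypothetical parameters as in the forward-variable sketch).

```python
from itertools import product

# Hypothetical 2-state HMM (parameters invented for illustration).
states = ["A", "B"]
pi = {"A": 0.5, "B": 0.5}
a  = {"A": {"A": 0.8, "B": 0.2}, "B": {"A": 0.4, "B": 0.6}}
b  = {"A": {"x": 0.2, "y": 0.8}, "B": {"x": 0.1, "y": 0.9}}

def viterbi(obs):
    """Best state sequence via the delta/psi recursion with backtracking."""
    delta = {i: pi[i] * b[i][obs[0]] for i in states}   # step 1: initialization
    psi = []
    for o in obs[1:]:                                   # step 2: induction
        back, new_delta = {}, {}
        for j in states:
            best = max(states, key=lambda i: delta[i] * a[i][j])
            back[j] = best
            new_delta[j] = delta[best] * a[best][j] * b[j][o]
        psi.append(back)
        delta = new_delta
    last = max(states, key=lambda s: delta[s])          # step 3: termination
    path = [last]
    for back in reversed(psi):                          # step 4: backtracking
        path.append(back[path[-1]])
    return list(reversed(path)), delta[last]

def brute_force(obs):
    """Check against exhaustive search over all N^T paths."""
    def joint(path):
        p = pi[path[0]] * b[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
        return p
    best = max(product(states, repeat=len(obs)), key=joint)
    return list(best), joint(best)
```

Unlike choosing individually most likely states, the backtracked path is always a valid path through the model.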

30

The Learning Problem

Parameter estimation problem: the sphinx changes the quiz again!

Now you have no information about the pattern of her feeling changes or how she chooses cards. Given many card sequences, you must find the model that best explains them, i.e. the best parameters for "feeling changes" and "card choosing".

31

The Learning Problem

Baum-Welch Method: find the model parameters λ* = argmaxλ P(Otraining|λ).

Locally maximize the likelihood with an iterative hill-climbing algorithm: work out the probability of the observations under some model, find which state transitions and symbol emissions are used most, and, by increasing their probabilities, choose a revised model that gives higher probability to the observations. This is training!

32

The Learning Problem

Baum-Welch Method

Baum-Welch Method algorithm:

Step 1 : Begin with some model (perhaps pre-selected or just chosen randomly).

Step 2 : Run O through the current model to estimate the expectations of each model parameter.

Step 3 : Change the model to maximize the values of the paths that are used a lot.

Step 4 : Repeat this process until converging on optimal values for the model parameters.

33

The Learning Problem

Baum-Welch Method: let’s do it.

Step 1: Choose an initial model.

Step 2: Run O through the current model to estimate the expectations of each model parameter.

Step 3 : Change the model to maximize the values of the paths that are used a lot.

Step 4 : Repeat this process until converging on optimal values for the model parameters.
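One re-estimation step can be sketched with the standard forward-backward expected counts. The 2-state, 2-symbol model and the observation sequence below are made up for illustration, and numpy is assumed.

```python
import numpy as np

def forward(A, B, pi, obs):
    """alpha[t, i] = P(o_1..o_t, q_t = i | lambda); A[i, j] = P(j | i)."""
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    """beta[t, i] = P(o_{t+1}..o_T | q_t = i, lambda)."""
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(A, B, pi, obs):
    """One EM re-estimation step; also returns the likelihood of obs
    under the *input* model."""
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood        # expected state occupancies
    xi = np.zeros_like(A)                    # expected transition counts
    for t in range(len(obs) - 1):
        xi += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / likelihood
    new_pi = gamma[0]
    new_A = xi / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, likelihood

# Made-up 2-state, 2-symbol model and training sequence.
A0  = np.array([[0.7, 0.3], [0.4, 0.6]])
B0  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi0 = np.array([0.6, 0.4])
obs = [0, 1, 0, 0, 1, 1, 0, 1]
A1, B1, pi1, lik1 = baum_welch_step(A0, B0, pi0, obs)
```

Each step cannot decrease the likelihood of the training data (the EM guarantee), which is what the "iterative hill-climbing" above refers to.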

34

HMMs for Applications

35

Sequential Data

Often highly variable, but with embedded structure: the information is contained in the structure.

36

More examples

Text, on-line handwriting, music notes, DNA sequences, program code

37

HMMs for Speech Recognition

Find a sequence of phonemes (or words) given an acoustic sequence, e.g. telling “How to wreck a nice beach.” from “How to recognize speech.”

Idea: use a phoneme model.

38

Phoneme model

Phoneme: the smallest unit of sound with distinct meaning; consonants and vowels.

Phoneme model: from the observed speech signals, find the sequence of states that maximizes P(signals|states).

39

Embedded Training of HMMs

For each acoustic sequence in the training set, create a new HMM as the concatenation of the HMMs representing the underlying sequence of phonemes. Maximize the likelihood of the training sentences.

40

HMMs: Decoding a Sentence

Decide what the accepted vocabulary is. Optionally add a language model: P(word sequence). Use an efficient algorithm to find the optimal path in the decoding HMM: the Viterbi algorithm.

41

A demo of HMM application

http://www.mmk.e-technik.tu-muenchen.de/rotdemo.html

This demo shows an image retrieval system, which enables the user to search a grayscale image database intuitively by presenting simple sketches.

You can find a detailed description of this demo at: http://www.mmk.e-technik.tu-muenchen.de/demo/imagedb/theory.html

42

Kalman Filter

43

Kalman Filter?

What is the Kalman Filter? A technique that can be used to recursively estimate unobservable quantities, called state variables {xt}, from an observed time series {yt}.

What is it used for? Tracking missiles, extracting lip motion from video, lots of computer vision applications, economics, navigation.

44

Problem? Estimating the location of a ship.

“Suppose that you are lost at sea during the night and have no idea at all of your location.”

Problem? Inherent measuring-device inaccuracies: your measurement has some uncertainty!

45

Uncertainty

Conditional density of position based on the measured value z1. Assume a Gaussian distribution:

f(x|z1) = N(z1, σz1²)

where z1 is the measured position and x is the real position.

Q: What can be a measure of uncertainty?

46

Measurements

You make a measurement z1, with density N(z1, σz1²); your friend also makes a measurement z2, with density N(z2, σz2²).

Question 1. Which one is better?

Question 2. What is the best way to combine these measurements, i.e. what is f(x|z1, z2)?

47

Combine measurements

Combining the two measurements N(z1, σ1²) and N(z2, σ2²) gives f(x|z1, z2) = N(μ, σ²), with

μ = [σ2² / (σ1² + σ2²)] z1 + [σ1² / (σ1² + σ2²)] z2

1/σ² = 1/σ1² + 1/σ2²

Uncertainty is decreased by combining the two pieces of information!
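The fusion formulas can be checked numerically; the two measurements below are made-up numbers.

```python
def fuse(z1, var1, z2, var2):
    """Combine two Gaussian measurements N(z1, var1) and N(z2, var2)
    into a single Gaussian estimate N(mu, var)."""
    mu = (var2 / (var1 + var2)) * z1 + (var1 / (var1 + var2)) * z2
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return mu, var

# Made-up example: two position fixes with different uncertainties.
mu, var = fuse(z1=10.0, var1=4.0, z2=12.0, var2=1.0)
```

The fused estimate lands closer to the more certain measurement (z2 here), and the fused variance is smaller than either input variance.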

48

What does it mean?

The optimal estimate at t2, x̂(t2), is equal to the best prediction of its value before z2 is taken, x̂(t1), plus a correction term: an optimal weighting value times the difference between z2 and the best prediction of its value before it is actually taken:

x̂(t2) = x̂(t1) + K(t2)·[z2 − x̂(t1)], with K(t2) = σ1² / (σ1² + σ2²)

and the variance shrinks accordingly:

σ²(t2) = σ²(t1) − K(t2)·σ²(t1)

49

Moving? Suppose you’re moving:

dx/dt = u + w

where u is a nominal velocity and w is a noisy term. The “noise” w will be modeled as white Gaussian noise with a mean of zero and variance σw².

Best prediction (just before the measurement at t3):

x̂(t3⁻) = x̂(t2) + u·[t3 − t2]
σx²(t3⁻) = σx²(t2) + σw²·[t3 − t2]

Best estimate (after incorporating the measurement z3):

x̂(t3) = x̂(t3⁻) + K(t3)·[z3 − x̂(t3⁻)], with K(t3) = σx²(t3⁻) / (σx²(t3⁻) + σz3²)

Best estimate

50

Summary

Process Model: describes how the state changes over time.

Measurement Model: where you are, from what you see!

Predictor-corrector: predict the new state and its uncertainty, then correct with the new measurement.
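The predict-correct cycle can be sketched as a scalar Kalman filter; the constant-velocity scenario and all noise parameters below are made up for illustration.

```python
import random

def kalman_1d(z_seq, x0, var0, u, dt, var_w, var_z):
    """Scalar Kalman filter: predict with the motion model, then correct
    with each measurement. Returns (estimate, variance) after each step."""
    x, var = x0, var0
    history = []
    for z in z_seq:
        # Predict: move by the nominal velocity; uncertainty grows with noise.
        x, var = x + u * dt, var + var_w * dt
        # Correct: blend prediction and measurement with the Kalman gain.
        k = var / (var + var_z)
        x = x + k * (z - x)
        var = (1.0 - k) * var
        history.append((x, var))
    return history

# Made-up scenario: constant velocity 1.0, noisy position measurements.
rng = random.Random(0)
truth = [1.0 * t for t in range(1, 21)]
measurements = [p + rng.gauss(0.0, 0.5) for p in truth]
estimates = kalman_1d(measurements, x0=0.0, var0=1.0, u=1.0, dt=1.0,
                      var_w=0.01, var_z=0.25)
```

Prediction inflates the variance (process noise), correction deflates it (new information), and the variance settles at a steady state that balances the two.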

51

Appendix – derivation

52

References

You can find useful materials about HMMs from:
- CS570 AI Lecture Note (2003)
- http://www.idiap.ch/~bengio/
- http://speech.chungbuk.ac.kr/~owkwon/

You can find useful materials about Kalman filters from:
- http://www.cs.unc.edu/~welch/kalman
- Maybeck, 1979, “Stochastic Models, Estimation, and Control”
- Greg Welch and Gary Bishop, 2001, “An Introduction to the Kalman Filter”