  • Introduction to Hidden Markov Models

  • Introduction

    • A Hidden Markov Model (HMM) is a simple temporal Bayesian network.


    • The structure of the network dictates how a system that varies over

    time moves from one state to the next.

    • Generally, we start at an initial state at time zero and the process

    moves to each of its possible new states with a probability based

    solely on the current state.

    • In Markov models, the states are directly observable – they are the

    values being measured.

    • In Hidden Markov Models, the states are hidden, but lead to

    observable values with certain probabilities.

  • • Set of states: {s1, s2, …, sN}

    • Process moves from one state to another, generating a

    sequence of states: si1, si2, …, sik, …

    • Markov chain property: probability of each subsequent state depends only on

    the previous state: P(sik | si1, si2, …, sik-1) = P(sik | sik-1)

    • To define a Markov model, the following probabilities have to be specified:

    initial probabilities πi = P(si) and transition probabilities aij = P(sj | si)

    Markov Models

  • [State diagram: states ‘Rain’ and ‘Dry’ with transition arrows labelled by the probabilities below]
    • Two states : ‘Rain’ and ‘Dry’.

    • Transition probabilities:

    P(‘Rain’|‘Rain’)=0.3 , P(‘Dry’|‘Rain’)=0.7 ,

    P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8

    • Initial probabilities: say P(‘Rain’)=0.4 , P(‘Dry’)=0.6 .

    Example of Markov Model

  • • By the Markov chain property, the probability of a state sequence can be found by the formula:

    P(si1, si2, …, sik) = P(sik | sik-1) … P(si2 | si1) P(si1)

    • Suppose we want to calculate the probability of a sequence of states in our

    example, {‘Dry’,‘Dry’,‘Rain’,‘Rain’}.

    P({‘Dry’,‘Dry’,‘Rain’,‘Rain’})

    = P(‘Rain’|‘Rain’) P(‘Rain’|‘Dry’) P(‘Dry’|‘Dry’) P(‘Dry’)

    = 0.3*0.2*0.8*0.6 = 0.0288

    Calculation of sequence probability
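The chain-rule product above can be checked with a few lines of code. A minimal sketch in Python, using the Rain/Dry model from the slides (the dictionary layout and function name are illustrative, not from the slides):

```python
# Markov model from the slides: states 'Rain' and 'Dry'.
initial = {'Rain': 0.4, 'Dry': 0.6}
trans = {('Rain', 'Rain'): 0.3, ('Rain', 'Dry'): 0.7,
         ('Dry', 'Rain'): 0.2, ('Dry', 'Dry'): 0.8}   # trans[(frm, to)] = P(to | frm)

def sequence_probability(states):
    """Initial probability times the product of transition probabilities."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[(prev, cur)]
    return p

print(sequence_probability(['Dry', 'Dry', 'Rain', 'Rain']))  # 0.6*0.8*0.2*0.3 = 0.0288
```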

  • [State diagram: hidden states ‘Low’ and ‘High’ with transition arrows, each state emitting ‘Rain’ or ‘Dry’ with the observation probabilities below]

    Example of Hidden Markov Model

  • • Two states : ‘Low’ and ‘High’ atmospheric pressure.

    • Two observations : ‘Rain’ and ‘Dry’.

    • Transition probabilities:

    P(‘Low’|‘Low’)=0.3 , P(‘High’|‘Low’)=0.7 ,

    P(‘Low’|‘High’)=0.2, P(‘High’|‘High’)=0.8

    • Observation probabilities :

    P(‘Rain’|‘Low’)=0.6 , P(‘Dry’|‘Low’)=0.4 , P(‘Rain’|‘High’)=0.4 ,

    P(‘Dry’|‘High’)=0.6 (each state’s observation probabilities must sum to 1).

    • Initial probabilities: say P(‘Low’)=0.4 , P(‘High’)=0.6 .

    Example of Hidden Markov Model

  • • Suppose we want to calculate the probability of a sequence of observations in

    our example, {‘Dry’,‘Rain’}.

    •Consider all possible hidden state sequences:

    P({‘Dry’,’Rain’} )

    = P({‘Dry’,’Rain’} , {‘Low’,’Low’}) + P({‘Dry’,’Rain’} ,

    {‘Low’,’High’}) + P({‘Dry’,’Rain’} , {‘High’,’Low’}) +

    P({‘Dry’,’Rain’} , {‘High’,’High’})

    where first term is :

    P({‘Dry’,’Rain’} , {‘Low’,’Low’})

    = P({‘Dry’,’Rain’} | {‘Low’,’Low’}) P({‘Low’,’Low’})

    = P(‘Dry’|‘Low’) P(‘Rain’|‘Low’) P(‘Low’) P(‘Low’|‘Low’)

    = 0.4*0.6*0.4*0.3 = 0.0288

    Calculation of observation sequence probability
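The sum over all hidden state sequences can be brute-forced directly. A minimal Python sketch of the same calculation, with P(‘Dry’|‘High’) taken as 0.6 so that each state's observation probabilities sum to one (names are illustrative):

```python
from itertools import product

# Weather HMM from the slides: hidden states Low/High pressure,
# observations Rain/Dry.
states = ['Low', 'High']
initial = {'Low': 0.4, 'High': 0.6}
trans = {('Low', 'Low'): 0.3, ('Low', 'High'): 0.7,
         ('High', 'Low'): 0.2, ('High', 'High'): 0.8}   # trans[(frm, to)]
emit = {('Low', 'Rain'): 0.6, ('Low', 'Dry'): 0.4,
        ('High', 'Rain'): 0.4, ('High', 'Dry'): 0.6}    # emit[(state, obs)]

def observation_probability(obs):
    """Sum the joint P(observations, path) over every hidden path."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = initial[path[0]] * emit[(path[0], obs[0])]
        for t in range(1, len(obs)):
            p *= trans[(path[t-1], path[t])] * emit[(path[t], obs[t])]
        total += p
    return total

print(observation_probability(['Dry', 'Rain']))  # 0.232
```

Enumerating all N^K paths is exponential in the sequence length; the forward algorithm mentioned later does the same sum in O(K·N²) time.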

  • How HMM Works

    • HMM is a stochastic generative

    model for sequences

    • Defined by

    – finite set of states S

    – finite alphabet A

    – transition prob matrix T

    – emission prob matrix E

    • Move from state to state

    according to T while emitting

    symbols according to E

    [Diagram: state sk emitting symbols a1, a2]
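The four ingredients above can be gathered in a small container. A sketch in Python, instantiated with the dishonest casino from the following slide (the class layout is illustrative, not from the slides):

```python
from dataclasses import dataclass

# Minimal container mirroring the slide's definition of an HMM.
@dataclass
class HMM:
    states: list    # finite set of states S
    alphabet: list  # finite alphabet A of emitted symbols
    T: dict         # T[(i, j)] = P(next state j | current state i)
    E: dict         # E[(i, a)] = P(emit symbol a | state i)
    initial: dict   # initial[(i)] = P(start in state i)

# The dishonest casino as an instance: fair die (uniform), loaded die
# (P(6)=1/2, P(1..5)=1/10), switch with probability 1/2, start fair.
casino = HMM(
    states=['Fair', 'Loaded'],
    alphabet=[1, 2, 3, 4, 5, 6],
    T={(i, j): 0.5 for i in ['Fair', 'Loaded'] for j in ['Fair', 'Loaded']},
    E={**{('Fair', k): 1/6 for k in range(1, 7)},
       **{('Loaded', k): 1/10 for k in range(1, 6)}, ('Loaded', 6): 1/2},
    initial={'Fair': 1.0, 'Loaded': 0.0},
)
```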

  • • Game:

    – You bet £1

    – You roll your die

    – Casino rolls their die

    – Highest number wins £2

    • Question: Suppose we played 2

    games, and the sequence of rolls

    was 1, 6, 2, 6. Have we been

    cheated?

    Example: Dishonest Casino

    • Casino has two dice:

    – Fair die

    • P(i) = 1/6, i = 1..6

    – Loaded die

    • P(i) = 1/10, i = 1..5

    • P(i) = 1/2, i = 6

    • Casino switches between fair &

    loaded die with probability 1/2.

    • Initially, the die is always fair

  • “Visualisation” of Dishonest Casino

  • 1, 6, 2, 6?

    We were probably cheated...

    E(1|Fair) * T(?,Fair)*

    E(6|Loaded) * T(Fair,Loaded)*

    E(2|Fair) * T(Loaded, Fair) *

    E(6|Loaded) * T(Fair, Loaded)

  • Evaluation problem. Given the HMM M=(A, B, π) and the observation sequence O=o1 o2 ... oK , calculate the probability that model M has generated sequence O.

    • Decoding problem. Given the HMM M=(A, B, π) and the observation sequence O=o1 o2 ... oK , find the most likely sequence of hidden states si that produced this observation

    sequence O.

    • Learning problem. Given some training observation sequences O=o1 o2 ... oK and the general

    structure of the HMM (numbers of hidden and visible states), determine the HMM parameters

    M=(A, B, π) that best fit the training data.

    O=o1...oK denotes a sequence of observations, ok ∈ {v1,…,vM}.

    Main uses of HMMs:
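The evaluation problem has an efficient standard solution, the forward algorithm, which computes P(O | M) in O(K·N²) time instead of summing over all N^K hidden sequences. A sketch on the Low/High weather HMM from the earlier example (with P(‘Dry’|‘High’) taken as 0.6 so emissions sum to one; names are illustrative):

```python
# Forward algorithm: incremental version of the sum over hidden paths.
states = ['Low', 'High']
initial = {'Low': 0.4, 'High': 0.6}
trans = {('Low', 'Low'): 0.3, ('Low', 'High'): 0.7,
         ('High', 'Low'): 0.2, ('High', 'High'): 0.8}   # trans[(frm, to)]
emit = {('Low', 'Rain'): 0.6, ('Low', 'Dry'): 0.4,
        ('High', 'Rain'): 0.4, ('High', 'Dry'): 0.6}    # emit[(state, obs)]

def forward(obs):
    # alpha[s] = P(o1..ot, state at time t is s)
    alpha = {s: initial[s] * emit[(s, obs[0])] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans[(r, s)] for r in states) * emit[(s, o)]
                 for s in states}
    return sum(alpha.values())

print(forward(['Dry', 'Rain']))  # 0.232, same as summing over all paths
```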

  • • Typed word recognition, assuming all characters are separated.

    • Character recognizer outputs probability of the image being

    particular character, P(image|character).








    Word recognition example(1).

    Hidden state: character.  Observation: character image.

  • • If a lexicon is given, we can construct a separate HMM model

    for each lexicon word.

    [Diagram: word HMMs, e.g. Amherst: a→m→h→e→r→s→t; Buffalo: b→u→f→f→a→l→o]

    • Here recognition of the word image is equivalent to the problem

    of evaluating a few HMM models.

    • This is an application of the Evaluation problem.

    Word recognition example(3).

  • • We can construct a single HMM for all words.

    • Hidden states = all characters in the alphabet.

    • Transition probabilities and initial probabilities are calculated

    from language model.

    • Observations and observation probabilities are as before.

    [Diagram: a single HMM whose hidden states are all characters of the alphabet (a, m, h, e, …, b, v), with transitions from the language model]
    • Here we have to determine the best sequence of hidden states,

    the one that most likely produced the word image.

    • This is an application of the Decoding problem.

    Word recognition example(4).
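The decoding problem is solved efficiently by the standard Viterbi algorithm, which keeps only the best path into each state at each step. Since the slides do not list the language-model probabilities for the word HMM, here is a sketch on the earlier Low/High weather HMM instead (with P(‘Dry’|‘High’) taken as 0.6; names are illustrative):

```python
# Viterbi algorithm: most likely hidden state sequence for the observations.
states = ['Low', 'High']
initial = {'Low': 0.4, 'High': 0.6}
trans = {('Low', 'Low'): 0.3, ('Low', 'High'): 0.7,
         ('High', 'Low'): 0.2, ('High', 'High'): 0.8}   # trans[(frm, to)]
emit = {('Low', 'Rain'): 0.6, ('Low', 'Dry'): 0.4,
        ('High', 'Rain'): 0.4, ('High', 'Dry'): 0.6}    # emit[(state, obs)]

def viterbi(obs):
    # delta[s] = probability of the best path ending in state s;
    # paths[s] = that path itself.
    delta = {s: initial[s] * emit[(s, obs[0])] for s in states}
    paths = {s: [s] for s in states}
    for o in obs[1:]:
        new_delta, new_paths = {}, {}
        for s in states:
            prev = max(states, key=lambda r: delta[r] * trans[(r, s)])
            new_delta[s] = delta[prev] * trans[(prev, s)] * emit[(s, o)]
            new_paths[s] = paths[prev] + [s]
        delta, paths = new_delta, new_paths
    return paths[max(states, key=lambda s: delta[s])]

print(viterbi(['Dry', 'Rain']))  # ['High', 'High']
```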

  • What makes a good HMM problem space?


    • Classification problems

    There are two main types of output from an HMM:

    – Scoring of sequences

    • For example, protein family modelling

    – Labeling of observations within a sequence

    • For example, identifying genes in a particular sequence

  • HMM Requirements

    So you’ve decided you want to build an HMM,

    here’s what you need:

    • An architecture

    – Probably the hardest part

    – Should be sound & easy to interpret

    • A well-defined success measure

    – Necessary for any form of machine learning

  • HMM Requirements


    • Training data

    – Labeled or unlabeled – it depends

    • You do not always need a labeled training set to do

    observation labeling, but it helps

    – Amount of training data needed is:

    • Directly proportional to the number of free parameters in the

    model

    • Inversely proportional to the size of the training sequences

  • HMM Advantages

    • Statistical Grounding

    – Statisticians are comfortable with the theory behind hidden

    Markov models

    – Freedom to manipulate the training and verification processes

    – Mathematical / theoretical analysis of the results and processes

    – HMMs are still very powerful modeling tools – far more powerful

    than many statistical methods

  • HMM Advantages continued

    • Modularity

    – HMMs can be combined into larger HMMs

    • Transparency of the Model

    – Assuming an architecture with a good design

    – People can read the model and make sense of it

    – The model itself can help increase understanding

  • HMM Advantages continued

    • Incorporation of Prior Knowledge

    – Incorporate prior knowledge into the architecture

    – Initialize the model close to something believed to be correct

    – Use prior knowledge to constrain training process

  • HMM Disadvantages

    • Markov Chains

