Hidden Markov Models, Bayesian Networks
Stat 430
Outline
• Definition of HMM
• Set-up of 3 main problems
• Three main algorithms:
• Forward/Backward
• Viterbi
• Baum-Welch
Hidden Markov Models (HMM)
• Idea: each state of a Markov chain emits a letter from a fixed alphabet; the distribution of letters is time independent, but depends on the state
• Situation: usually we have a string of emitted symbols, but don’t know the Markov chain (it’s hidden)
Application Areas
• Pattern recognition
• Search algorithms
• Sequence alignments
• Time series analysis
Setup
• Markov state diagram with transition probabilities A
• Emitted sequence Y with emission probabilities B
• Usually, we only observe Y (several instances of it)
Example
• Suppose we have five amino acid sequences: CAEFTPAVH, CKETTPADH, CAETPDDH, CAEFDDH, CDAEFPDDH
• Find the best possible alignment of all sequences (allowing insertions and deletions)
Run of an HMM
• two-step process: the initial distribution picks q1, which emits O1; transition to q2, which emits O2; transition to q3; and so on
• Sequence of visited states Q = q1 q2 q3 ...
• Sequence of emitted symbols O = O1 O2 O3 ...
• usually we can observe O, but don’t know Q
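To make the two-step process concrete, here is a minimal base-R sketch that simulates a run. It uses the two-state chain from the example that follows; the uniform initial distribution is an assumption, since the slides do not give π.

set.seed(1)
states  <- c("S1", "S2")
symbols <- c("a", "b")
P  <- matrix(c(0.9, 0.1,
               0.8, 0.2), nrow = 2, byrow = TRUE,
             dimnames = list(states, states))    # transition matrix
B  <- matrix(c(0.50, 0.50,
               0.25, 0.75), nrow = 2, byrow = TRUE,
             dimnames = list(states, symbols))   # emission probabilities
pi <- c(S1 = 0.5, S2 = 0.5)                      # assumed initial distribution

run_hmm <- function(T_len) {
  Q <- O <- character(T_len)
  Q[1] <- sample(states, 1, prob = pi)               # draw initial state
  O[1] <- sample(symbols, 1, prob = B[Q[1], ])       # emission from q1
  for (t in 2:T_len) {
    Q[t] <- sample(states, 1, prob = P[Q[t - 1], ])  # transition
    O[t] <- sample(symbols, 1, prob = B[Q[t], ])     # emission
  }
  list(Q = Q, O = O)   # in practice we would only see O
}
run_hmm(5)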
Example
• Markov chain with two states, S1 and S2, and transition matrix P
• Emission alphabet {a, b}
• In S1, emission probabilities for a and b are 0.5 and 0.5
• In S2, emission probabilities for a and b are 0.25 and 0.75
• Observed sequence is bbb
P =
       S1   S2
  S1  0.9  0.1
  S2  0.8  0.2
Example
• Observed sequence is bbb
• What is the most likely sequence Q that emitted bbb? argmax_Q P(Q|O)
• What is the probability of observing O? P(O) = ∑Q P(O|Q) P(Q)
transitions:
       S1   S2
  S1  0.9  0.1
  S2  0.8  0.2

emissions:
        a     b
  S1  0.50  0.50
  S2  0.25  0.75
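For a sequence this short, both questions can be answered by brute force: enumerate all 2^3 state paths Q and compute P(O, Q) = P(O|Q) P(Q). A base-R sketch, with uniform π again an assumption:

states <- c("S1", "S2")
P  <- matrix(c(0.9, 0.1, 0.8, 0.2), 2, byrow = TRUE,
             dimnames = list(states, states))
B  <- matrix(c(0.50, 0.50, 0.25, 0.75), 2, byrow = TRUE,
             dimnames = list(states, c("a", "b")))
pi <- c(S1 = 0.5, S2 = 0.5)
O  <- c("b", "b", "b")

paths <- expand.grid(q1 = states, q2 = states, q3 = states,
                     stringsAsFactors = FALSE)          # all 8 state paths
joint <- apply(paths, 1, function(q) {
  pq  <- pi[q[1]] * P[q[1], q[2]] * P[q[2], q[3]]       # P(Q)
  poq <- B[q[1], O[1]] * B[q[2], O[2]] * B[q[3], O[3]]  # P(O|Q)
  pq * poq                                              # P(O, Q)
})
sum(joint)                 # P(O) = sum over all Q of P(O|Q) P(Q)
paths[which.max(joint), ]  # argmax_Q P(Q|O), since P(Q|O) ∝ P(O, Q)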
Definition
A hidden Markov model (HMM) consists of
• a set of states S1, S2, ..., SN
• the transition matrix P with pij = P(qt+1 = Sj | qt = Si)
• an alphabet of M unique, observed symbols A = {a1, ..., aM}
• emission probabilities bi(a) = P(state Si emits a)
• an initial distribution πi = P(q1 = Si)
Three Main Problems
• Find P(O): a computational problem; the naive solution is intractable → forward-backward algorithm
• Find the sequence of states that most likely produced the observed output O, argmax_Q P(Q|O) → Viterbi algorithm
• For a fixed topology, find P, B, and π that maximize the probability of observing O → Baum-Welch algorithm
Forward/Backward
• given all parameters (P, B, π), find P(O)
• the naive approach (summing over all N^T state sequences) is computationally too intensive
• use the help of forward variables α and backward variables β
• α(t,i) = P(o1 o2 … ot, qt = Si)
• recursion: α(1,i) = πi bi(o1) and α(t+1,j) = [∑i α(t,i) pij] bj(ot+1)
• then P(O) = ∑i α(T,i)
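A base-R sketch of the forward recursion, using the same toy parameters (uniform π remains an assumption). It computes P(O) in O(N²T) time rather than the naive O(N^T · T):

states <- c("S1", "S2")
P  <- matrix(c(0.9, 0.1, 0.8, 0.2), 2, byrow = TRUE,
             dimnames = list(states, states))
B  <- matrix(c(0.50, 0.50, 0.25, 0.75), 2, byrow = TRUE,
             dimnames = list(states, c("a", "b")))
pi <- c(S1 = 0.5, S2 = 0.5)

forward_prob <- function(O, P, B, pi) {
  N <- nrow(P); T_len <- length(O)
  alpha <- matrix(0, N, T_len, dimnames = list(rownames(P), NULL))
  alpha[, 1] <- pi * B[, O[1]]                           # α(1,i) = πi bi(o1)
  for (t in 2:T_len)                                     # α(t,j) = [Σi α(t-1,i) pij] bj(ot)
    alpha[, t] <- (t(P) %*% alpha[, t - 1]) * B[, O[t]]
  sum(alpha[, T_len])                                    # P(O) = Σi α(T,i)
}
forward_prob(c("b", "b", "b"), P, B, pi)   # matches the brute-force sum over paths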
Viterbi
• compute argmax_Q P(Q | O)
• two-step algorithm:
• first maximize the probability (dynamic programming, analogous to α but with max in place of the sum),
• then backtrack to recover the state sequence Q
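A base-R sketch of the two steps, working in log space to avoid underflow on long sequences (same toy parameters and assumed uniform π):

states <- c("S1", "S2")
P  <- matrix(c(0.9, 0.1, 0.8, 0.2), 2, byrow = TRUE,
             dimnames = list(states, states))
B  <- matrix(c(0.50, 0.50, 0.25, 0.75), 2, byrow = TRUE,
             dimnames = list(states, c("a", "b")))
pi <- c(S1 = 0.5, S2 = 0.5)

viterbi_path <- function(O, P, B, pi) {
  N <- nrow(P); T_len <- length(O)
  delta <- matrix(-Inf, N, T_len)  # best log-prob of a path ending in state i at time t
  psi   <- matrix(0L, N, T_len)    # back-pointers
  delta[, 1] <- log(pi) + log(B[, O[1]])
  for (t in 2:T_len) for (j in 1:N) {           # step 1: maximize probability
    cand <- delta[, t - 1] + log(P[, j])
    psi[j, t]   <- which.max(cand)
    delta[j, t] <- max(cand) + log(B[j, O[t]])
  }
  q <- integer(T_len)                            # step 2: recover structure Q
  q[T_len] <- which.max(delta[, T_len])
  for (t in (T_len - 1):1) q[t] <- psi[q[t + 1], t + 1]
  rownames(P)[q]
}
viterbi_path(c("b", "b", "b"), P, B, pi)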
Baum-Welch
• for a fixed topology, iteratively re-estimate P, B, and π to increase P(O); an expectation-maximization scheme built from the forward and backward variables
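The re-estimation formulas are more involved; a minimal sketch using the HMM package (listed under "R packages" below), where the starting values and the observation sequence are made up purely for illustration:

library(HMM)

hmm0 <- initHMM(States = c("S1", "S2"), Symbols = c("a", "b"),
                startProbs    = c(0.5, 0.5),             # assumed starting guesses
                transProbs    = matrix(c(0.7, 0.3,
                                         0.4, 0.6), 2, byrow = TRUE),
                emissionProbs = matrix(c(0.6, 0.4,
                                         0.3, 0.7), 2, byrow = TRUE))
obs <- c("b", "b", "a", "b", "b", "b", "a", "b", "b", "b")  # toy observations
fit <- baumWelch(hmm0, obs, maxIterations = 50)
fit$hmm$transProbs      # re-estimated transition matrix P
fit$hmm$emissionProbs   # re-estimated emission probabilities B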
HMM for Amino Acid/Gene Sequences
CAEFTPAVH, CKETTPADH, CAETPDDH, CAEFDDH, CDAEFPDDH
[Figure: profile HMM with match states m0, m1, ..., insert states i0, i1, ..., and delete states d1, d2, ...]
Example
• CAEFDDH most likely produced by m0 m1 m2 m3 m4 d5 d6 m7 m8 m9 m10
• CDAEFPDDH most likely produced by m0 m1 i1 m2 m3 m4 d5 m6 m7 m8 m9 m10
• Yields the alignment:
  C-AEF--DDH
  CDAEFP-DDH
CAEFTPAVH, CKETTPADH, CAETPDDH, CAEFDDH, CDAEFPDDH
R packages
• HMM
• RHmm
• HiddenMarkov
• msm
• depmix, depmixS4
• flexmix
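As one concrete entry point, a sketch with the HMM package applied to the two-state example from earlier (the uniform initial distribution is still an assumption; the other packages offer similar functionality under different interfaces):

library(HMM)

hmm <- initHMM(States = c("S1", "S2"), Symbols = c("a", "b"),
               startProbs    = c(0.5, 0.5),
               transProbs    = matrix(c(0.9, 0.1,
                                        0.8, 0.2), 2, byrow = TRUE),
               emissionProbs = matrix(c(0.50, 0.50,
                                        0.25, 0.75), 2, byrow = TRUE))
viterbi(hmm, c("b", "b", "b"))        # most likely state sequence Q
f <- forward(hmm, c("b", "b", "b"))   # log forward probabilities α(t, i)
sum(exp(f[, 3]))                      # P(O) = Σi α(T, i)

With these inputs it should agree with the base-R sketches above.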
Bayesian Networks
• HMMs are a special case of Bayesian networks
• A Bayesian network is a directed acyclic graph, where nodes represent variables and edges describe conditional dependence relationships
Setup
• If there is no edge between two nodes, there is no direct dependence: each node is conditionally independent of its non-descendants given its parents
• Edges imply parent/child relationship:
• X1 has children X2, X3, X5
• X5 has parents X1, X2
• P(X1, ..., Xp) = ∏i P(Xi|parents(Xi))
[Figure: DAG on X1, ..., X5; X1 has children X2, X3, X5; X5 has parents X1, X2]
Example
• Given that the grass is wet, what is the probability that it is raining?
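A brute-force sketch of this query. The network structure (Rain → WetGrass ← Sprinkler) and every conditional probability below are assumed purely for illustration; the slide does not specify them. The joint function is exactly the factorization P(X1, ..., Xp) = ∏i P(Xi|parents(Xi)) from the setup:

p_rain      <- 0.2                    # P(Rain), assumed
p_sprinkler <- 0.3                    # P(Sprinkler), assumed independent of Rain here
p_wet <- function(r, s) {             # P(WetGrass = TRUE | Rain, Sprinkler), assumed
  if (r && s) 0.99 else if (r) 0.90 else if (s) 0.80 else 0.05
}

joint <- function(r, s, w) {          # P(R, S, W) = P(R) P(S) P(W | R, S)
  pr <- if (r) p_rain      else 1 - p_rain
  ps <- if (s) p_sprinkler else 1 - p_sprinkler
  pw <- if (w) p_wet(r, s) else 1 - p_wet(r, s)
  pr * ps * pw
}

# P(Rain | WetGrass) = P(Rain, WetGrass) / P(WetGrass), summing out Sprinkler
num <- joint(TRUE, TRUE, TRUE) + joint(TRUE, FALSE, TRUE)
den <- num + joint(FALSE, TRUE, TRUE) + joint(FALSE, FALSE, TRUE)
num / den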
R packages
• deal
• MASTINO