How to find foreign genes? Markov Models
AAAA: 10%
AAAC: 15%
AAAG: 40%
AAAT: 35%
AAA AAC AAG AAT ACA . . . TTG TTT
Training Set
Building the model
How to find foreign genes? Markov Models
       A     C     G     T
AAA   0.10  0.15  0.40  0.35
AAC   0.25  0.45  0.25  0.05
AAG   0.25  0.20  0.30  0.25
AAT   0.25  0.20  0.30  0.25
ACA   0.15  0.20  0.25  0.40
. . .
TTG   0.20  0.50  0.05  0.25
TTT   0.10  0.55  0.25  0.10
Candidate gene
AAAACAA…
0.10
3rd order Markov model
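As a rough illustration of the lookup shown on this slide, the sketch below scores the first bases of the candidate gene using the table rows that are given. The names `TRANSITIONS` and `score_prefix` are made up for this example, only the rows actually needed are included, and the first three bases are treated as given context (an assumption, since the slide does not say how the start of a sequence is handled).

```python
# Rows of the 3rd-order table from the slide (only those needed here).
TRANSITIONS = {
    "AAA": {"A": 0.10, "C": 0.15, "G": 0.40, "T": 0.35},
    "AAC": {"A": 0.25, "C": 0.45, "G": 0.25, "T": 0.05},
    "ACA": {"A": 0.15, "C": 0.20, "G": 0.25, "T": 0.40},
}
ORDER = 3


def score_prefix(seq, transitions=TRANSITIONS, order=ORDER):
    """Multiply P(next base | previous `order` bases) along seq.

    Assumption: the first `order` bases are taken as given context
    and contribute no factor of their own.
    """
    prob = 1.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        prob *= transitions[context][base]
    return prob


# AAA->A (0.10), AAA->C (0.15), AAC->A (0.25), ACA->A (0.15)
p = score_prefix("AAAACAA")
```

The first factor is the 0.10 shown on the slide; each later factor slides the three-base window one position to the right.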
Markov Chains: A traffic light considered as a sequence of states
A trivial Markov chain – the transition probability between the states is always 1
Pgy = 1
Pyr = 1
Prg = 1
If we watch our traffic light, it will emit a string of states
Markov Chains: A traffic light considered as a sequence of states
In the case of a simple Markov model, the state labels (e.g. green, red, yellow)
are the observable outputs of the process
Markov Chains: An occasionally malfunctioning traffic light!!
The Markov property is that the probability of observing a given next state depends only on the current state!
Pgy = 1
Pyr = .9
Prg = .85
Pry = .15
Pyg = .10
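The malfunctioning light can be sketched as a small transition table plus a sampler. This is only an illustration: the dictionary `P`, the function `simulate`, and the single-letter state names are choices made for this example, not part of the slides.

```python
import random

# Transition probabilities from the slide: P[s][t] = P(next = t | current = s).
P = {
    "g": {"y": 1.0},
    "y": {"r": 0.90, "g": 0.10},
    "r": {"g": 0.85, "y": 0.15},
}


def simulate(start, steps, rng):
    """Emit a string of states; each step depends only on the current state."""
    state, out = start, [start]
    for _ in range(steps):
        targets = list(P[state])
        weights = [P[state][t] for t in targets]
        state = rng.choices(targets, weights=weights, k=1)[0]
        out.append(state)
    return "".join(out)


# A seeded generator keeps the run reproducible.
chain = simulate("g", 20, random.Random(0))
```

Each row of `P` sums to 1, and the sampler never looks further back than the current state, which is exactly the Markov property.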
Markov Chains: The Markov Property
$a_{st} = P(x_i = t \mid x_{i-1} = s)$
English Translation:
The transition probability $a_{st}$ from state s to state t…
…is equal to the probability that the ith state was t…
given that
the immediately preceding state ($x_{i-1}$) was s
This is a form of conditional probability
Markov Chain
Now we can consider the probability of an observed sequence!
An occasionally malfunctioning traffic light!!
Markov Chains: What is the probability of chain of events x?
$P(x) = P(x_L, x_{L-1}, \ldots, x_1)$
English Translation:
The probability of observing sequence of states x...
...is equal to the probability that the Lth state ($x_L$) was
whatever AND the (L-1)th state ($x_{L-1}$) was whatever else,
AND etc., etc.
This is a form of joint probability
Markov Chains: What is the probability of chain of events x?
$P(x) = P(x_L, x_{L-1}, \ldots, x_1)$
$= P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$
This is because $P(X, Y) = P(X \mid Y)\, P(Y)$
English Translation:
The probability of events X AND Y happening is equal to the probability of X happening given that Y has already
happened, times the probability of event Y
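As a small worked case, taking a chain of length L = 3 and applying the product rule twice gives the expansion below. This is only an illustration of the general formula; no Markov assumption has been used yet.

```latex
\begin{aligned}
P(x_3, x_2, x_1) &= P(x_3 \mid x_2, x_1)\, P(x_2, x_1) \\
                 &= P(x_3 \mid x_2, x_1)\, P(x_2 \mid x_1)\, P(x_1)
\end{aligned}
```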
Markov Chains: What is the probability of chain of events x?
$P(x) = P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$
But remember, the key property of a Markov chain is that the probability of symbol $x_i$ depends ONLY on
the value of the preceding symbol $x_{i-1}$!! Therefore:
$P(x) = P(x_L \mid x_{L-1})\, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\, P(x_1)$
$P(x) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$
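To make the product formula concrete, here is a minimal sketch that evaluates it for the malfunctioning traffic light. The function name `chain_probability` and the `initial` distribution are assumptions for this example; in particular, the slides do not give P(x1), so the sketch assumes we start watching when the light is green.

```python
# Traffic-light transition probabilities a[s][t] from the earlier slide.
a = {
    "g": {"y": 1.0},
    "y": {"r": 0.90, "g": 0.10},
    "r": {"g": 0.85, "y": 0.15},
}


def chain_probability(x, a, initial):
    """P(x) = P(x1) * product over i = 2..L of a[x_{i-1}][x_i]."""
    prob = initial.get(x[0], 0.0)
    for prev, cur in zip(x, x[1:]):
        prob *= a[prev].get(cur, 0.0)  # unlisted transitions have probability 0
    return prob


# Assumption: observation starts on green, so P(x1 = g) = 1.
initial = {"g": 1.0}
p_normal = chain_probability("gyrg", a, initial)  # 1 * 1 * 0.9 * 0.85
p_glitch = chain_probability("gygy", a, initial)  # 1 * 1 * 0.1 * 1
```

A normal cycle comes out far more probable than one containing the yellow-to-green glitch, as expected from the transition probabilities.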
Markov Chains: How about nucleic acid sequences?
No reason why nucleic acid sequences found in an organism cannot be modeled using Markov chains
[Figure: the four states A, C, G, T]
Markov Model: What do we need to probabilistically model DNA sequences?
[Figure: the four states A, C, G, T, labeled "States" and "Transition probabilities"]
The states are the same for all organisms, so the transition probabilities are the model parameters we need to estimate
Parameter estimation
AAAA: 10%
AAAC: 15%
AAAG: 40%
AAAT: 35%
AAA AAC AAG AAT ACA . . . TTG TTT
Training Set
Building the Markov Model
This is a maximum likelihood approach to parameter estimation. Such procedures
maximize the overall probability of the training set data.
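For a Markov chain, the maximum likelihood estimate of each transition probability is a relative frequency: count how often each base follows each context in the training set, then normalize per context. A minimal sketch, assuming a 3rd-order model; `train_markov_model` and the toy training sequence are made up for illustration.

```python
from collections import Counter, defaultdict


def train_markov_model(sequences, order=3, alphabet="ACGT"):
    """Estimate P(next base | previous `order` bases) by relative frequency."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    model = {}
    for context, nexts in counts.items():
        total = sum(nexts.values())
        # Bases never seen after this context get probability 0 in this sketch.
        model[context] = {b: nexts[b] / total for b in alphabet}
    return model


# Toy training set (hypothetical): AAA is followed by G twice, T once, C once.
model = train_markov_model(["AAAGAAAGAAATAAAC"], order=3)
```

Here `model["AAA"]` comes out as A: 0.0, C: 0.25, G: 0.5, T: 0.25. Contexts that never occur in the training data are simply absent from the model.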
Markov Model: Which model best explains a newly observed sequence?
Each organism will have different transition probability parameters, so you can ask “was the sequence more likely
to be generated by model A or model B?”
[Figure: two models over the states A, C, G, T, one for Organism A and one for Organism B]
A commonly used metric for discrimination using Markov chains is the log-odds ratio:
$S(x) = \log \frac{P(x \mid \text{model A})}{P(x \mid \text{model B})}$
Markov Model: Which model best explains a newly observed sequence?
$S(x) = \sum_{i=1}^{L} \log \frac{a^{A}_{x_{i-1} x_i}}{a^{B}_{x_{i-1} x_i}}$
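The log-odds score can be sketched as a sum of per-transition log ratios. Everything below is illustrative: the two first-order models use made-up numbers, the natural logarithm is an arbitrary choice of base, and the sum runs over observed transitions only, so any initial-state term is left out (an assumption, since the slides do not say how the first symbol is scored).

```python
import math

BASES = "ACGT"

# Hypothetical first-order models: A is AT-rich, B is uniform.
model_a = {s: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4} for s in BASES}
model_b = {s: {t: 0.25 for t in BASES} for s in BASES}


def log_odds(x, model_a, model_b):
    """Sum of log(aA[prev][cur] / aB[prev][cur]) over consecutive pairs in x."""
    score = 0.0
    for prev, cur in zip(x, x[1:]):
        score += math.log(model_a[prev][cur] / model_b[prev][cur])
    return score


s_at = log_odds("AATT", model_a, model_b)  # positive: better explained by A
s_gc = log_odds("CGCG", model_a, model_b)  # negative: better explained by B
```

A positive score favors model A and a negative score favors model B. Summing logs also avoids the numerical underflow that multiplying many small probabilities would cause on long sequences.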