Introduction to Sequence Models
TRANSCRIPT
Sequences
Many types of information involve sequences:
- Financial data
- DNA
- Robot motion
- Text: “Jack Flash sat on a candle stick.”
Sequence Models
Sequence models try to describe how an element of a sequence depends on previous (or sometimes following) elements.
For instance, financial models might try to predict a stock price tomorrow, given the stock prices for the past few weeks.
As another example, robot motion models try to predict where a robot will be, given current location and the commands given to the motors.
Types of Sequence Models
“Continuous-time” models:
- These try to describe situations where things change continuously, or smoothly, as a function of time.
- For instance: weather models, models from physics and engineering describing how gases or liquids behave over time, some financial models, …
- Typically, these involve differential equations.
- We won’t be talking about these.
Types of Sequence Models
“Discrete-time” models:
- These try to describe situations where the environment provides information periodically, rather than continuously.
  - For instance, if stock prices are quoted once per day, once per hour, or once per time period T, then it’s a discrete sequence of data.
  - The price of a stock as it fluctuates throughout the day, observed at every instant, is a continuous sequence of data.
- We’ll cover 2 examples of discrete-time sequence models:
  - Hidden Markov Models (used in NLP, machine learning)
  - Particle Filters (primarily used in robotics)
Hidden Markov Models
How students spend their time (observed once per time interval T):

[Transition diagram; as a table, rows = current state, columns = next state:
             Sleep  Study  Video games
Sleep         0.8    0.2    0.0
Study         0.3    0.4    0.3
Video games   0.3    0.1    0.6 ]

Markov Model:
- A set of states
- A set of transitions (edges) from one state to the next
- A conditional probability P(destination state | source state) for each transition
Quiz: Markov Models
(Same transition diagram as on the previous slide.)
Suppose a student starts in the Study state.
What is P(Study) in the next time step?
What about P(Study) after two time steps?
And P(Study) after three time steps?
Answer: Markov Models
(Same transition diagram as on the previous slide.)
Suppose a student starts in the Study state.
What is P(Study) in the next time step? 0.4
What about P(Study) after two time steps? 0.4*0.4 + 0.3*0.2 + 0.3*0.1 = 0.16 + 0.06 + 0.03 = 0.25
And P(Study) after three time steps? … complicated: there are now nine three-step paths to sum over.
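The multi-step case is easier to organize as repeated vector–matrix multiplication than as a sum over paths. Here is a minimal Python sketch; the transition probabilities are read off the slide's diagram (Sleep: 0.8 stay, 0.2 to Study; Study: 0.3 to Sleep, 0.4 stay, 0.3 to Video games; Video games: 0.3 to Sleep, 0.1 to Study, 0.6 stay):

```python
# Transition matrix read off the lecture diagram (rows = source state).
# State order: [Sleep, Study, Video games].
T = [[0.8, 0.2, 0.0],
     [0.3, 0.4, 0.3],
     [0.3, 0.1, 0.6]]

def step(dist, T):
    """One time step: new[j] = sum_i dist[i] * T[i][j]."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

dist = [0.0, 1.0, 0.0]          # start in the Study state
for t in range(1, 4):
    dist = step(dist, T)
    print(f"P(Study) after {t} step(s) = {dist[1]:.4f}")
```

This reproduces the 0.4 and 0.25 above, and gives 0.22 for the "complicated" three-step case.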
Simpler Example
Suppose the student starts asleep.
What is P(Sleep) after 1 time step?
What is P(Sleep) after 2 time steps?
What is P(Sleep) after 3 time steps?

[Diagram: two states. Sleep→Sleep 0.5, Sleep→Study 0.5, Study→Sleep 1, Study→Study 0.]
Answer: Simpler Example
Suppose the student starts asleep.
What is P(Sleep) after 1 time step? 0.5
What is P(Sleep) after 2 time steps? 0.5*0.5 + 0.5*1 = 0.75
What is P(Sleep) after 3 time steps? 0.5*0.5*0.5 + 0.5*1*0.5 + 0.5*0.5*1 + 0.5*0*1 = 0.625
(Same two-state diagram as above.)
Stationary Distribution
What happens after many, many time steps?
We’ll make three assumptions about the transition probabilities:
1. It’s possible to get from any state to any other state.
2. On average, the number of time steps it takes to get from a state back to itself is finite.
3. There are no cycles (or periods).
Any Markov chains in this course will have these properties; in practice, most do anyway.
(Same two-state diagram as above.)
Stationary Distribution
What happens after many, many time steps?
If those assumptions are true, then:
- After enough time steps, the probability of each state converges to a stationary distribution.
- This means that the probability at one time step is the same as the probability at the next time step, and the one after that, and the one after that, …
(Same two-state diagram as above.)
Stationary Distribution
Let’s compute the stationary distribution for this Markov chain:
Let P_t be the probability distribution at time step t.
For big enough t, P_t(Sleep) = P_{t-1}(Sleep).
P_t(Sleep) = P_{t-1}(Sleep)*0.5 + P_{t-1}(Study)*1
Writing x for the stationary P(Sleep):
x = 0.5x + 1*(1 - x)
1.5x = 1
x = 2/3
(Same two-state diagram as above.)
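Instead of solving the equation algebraically, the stationary distribution can also be found numerically by simply iterating the chain until the distribution stops changing. A small Python sketch, with the probabilities from the two-state diagram (Sleep→Sleep 0.5, Sleep→Study 0.5, Study→Sleep 1):

```python
# Two-state chain from the slide: Sleep self-loop 0.5, Sleep->Study 0.5,
# Study->Sleep 1.0, Study->Study 0.0. Power iteration to the fixed point.
T = [[0.5, 0.5],
     [1.0, 0.0]]  # rows: [Sleep, Study]

dist = [1.0, 0.0]  # start asleep; the limit is the same for any start
for _ in range(200):
    dist = [sum(dist[i] * T[i][j] for i in range(2)) for j in range(2)]

print(f"P(Sleep) -> {dist[0]:.4f}")   # converges to 2/3 ~= 0.6667
```

The three assumptions from the previous slide are exactly what guarantees this iteration converges, regardless of the starting distribution.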
Quiz: Stationary Distribution
Compute the stationary distribution for this Markov chain.
[Diagram: two states. A→A 0.75, A→B 0.25, B→A 0.6, B→B 0.4.]
Answer: Stationary Distribution
Compute the stationary distribution for this Markov chain.
P_t(A) = P_{t-1}(A)
P_t(A) = P_{t-1}(A) * 0.75 + P_{t-1}(B) * 0.6
Writing x for the stationary P(A):
x = 0.75x + 0.6(1 - x)
0.85x = 0.6
x = 0.6 / 0.85 ≈ 0.71
(Same two-state diagram as above.)
Learning Markov Model Parameters
There are six probabilities associated with this two-state Markov model:
1. Initial state probabilities P0(A) and P0(B).
2. Transition probabilities P(A|A), P(B|A), P(A|B), and P(B|B).
[Diagram: two states A and B; every transition probability and both initial probabilities are unknown, marked “?”.]
Learning Markov Model Parameters
Here is a sequence of observations from our Markov model:
BAAABABBAAA
Use maximum likelihood to estimate these parameters.
1. P0(A) = 0/1 = 0. P0(B) = 1/1 = 1.
2. P(A|A) = 4/6 = 2/3. P(B|A) = 2/6 = 1/3.
3. P(A|B) = 3/4. P(B|B) = 1/4.
(Same unlabeled A/B diagram as above.)
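The counting behind these estimates can be automated: count each (source, destination) pair, then divide by how often the source occurs. A short Python sketch (the helper name mle_markov is made up for illustration):

```python
from collections import Counter

def mle_markov(seq):
    """Maximum-likelihood transition estimates from one observed sequence."""
    pair_counts = Counter(zip(seq, seq[1:]))   # (source, dest) counts
    src_counts = Counter(seq[:-1])             # how often each source occurs
    return {(s, d): c / src_counts[s] for (s, d), c in pair_counts.items()}

probs = mle_markov("BAAABABBAAA")
print(probs[("A", "A")])   # P(A | A) = 4/6
print(probs[("B", "A")])   # P(A | B) = 3/4
```

Note the dictionary is keyed (source, destination), so probs[("B", "A")] is P(A|B).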
Quiz: Learning Markov Model Parameters
Here is a sequence of observations from our Markov model:
AAABBBBBABBBA
Use maximum likelihood to estimate these parameters.
(Same unlabeled A/B diagram as above.)
Answer: Learning Markov Model Parameters
Here is a sequence of observations from our Markov model:
AAABBBBBABBBA
Use maximum likelihood to estimate these parameters.
1. P0(A) = 1/1 = 1. P0(B) = 0/1 = 0.
2. P(A|A) = 2/4 = 1/2. P(B|A) = 2/4 = 1/2.
3. P(A|B) = 2/8 = 1/4. P(B|B) = 6/8 = 3/4.
(Same unlabeled A/B diagram as above.)
Restrictions on Markov Models
(Same three-state Sleep/Study/Video games diagram as before.)
- Probability only depends on the previous state, not any of the states before that (called the Markov assumption).
- Transition probabilities cannot change over time (called the stationary assumption).
Observations and Latent States
Markov models don’t get used much in AI.
The reason is that Markov models assume that you know exactly what state you are in, at each time step.
This is rarely true for AI agents.
Instead, we will say that the agent has a set of possible latent states – states that are not directly observed by, or known to, the agent.
In addition, the agent has sensors that allow it to sense some aspects of the environment, to take measurements or observations.
Hidden Markov Models
Suppose you are the parent of a college student, and would like to know how studious your child is.
You can’t observe them at all times, but you can periodically call, and see if your child answers.
[HMM diagram: a chain of hidden states H1 → H2 → H3 → …, each either Sleep or Study, with transitions Sleep→Sleep 0.6, Sleep→Study 0.4, Study→Sleep 0.5, Study→Study 0.5. Each hidden state Ht emits an observation Ot: does the child answer the call or not?]
Hidden Markov Models
H1 → H2 → H3 → …
 |     |     |
O1    O2    O3

Here’s the same model, with probabilities in tables. (The transition and observation tables are the same at every time step; the rows not shown follow by subtracting from 1.)

Initial distribution:
H1      P(H1)
Sleep   0.5
Study   0.5

Transition probabilities:
Ht      Ht+1    P(Ht+1 | Ht)
Sleep   Sleep   0.6
Study   Sleep   0.5

Observation probabilities:
Ht      Ot      P(Ot | Ht)
Sleep   Ans     0.1
Study   Ans     0.8
Hidden Markov Models
HMMs (and MMs) are a special type of Bayes Net. Everything you have learned about BNs applies here.
(Same model diagram and probability tables as on the previous slide.)
Hidden Markov Models
Suppose a parent calls and gets an answer at time step 1. What is P(H1=Sleep | O1=Ans)?
P(H1=Sleep | O1=Ans) = P(Ans|Sleep) P(Sleep) / [P(Ans|Sleep) P(Sleep) + P(Ans|Study) P(Study)]
= (0.1*0.5) / (0.1*0.5 + 0.8*0.5) = 0.05 / 0.45 ≈ 0.111
Notice: before the observation, P(Sleep) was 0.5. By making a call and getting an answer, the parent’s belief in Sleep drops to P(Sleep) ≈ 0.111.
(Same model diagram and probability tables as before.)
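This update is just Bayes' rule over the two hidden states. As a quick check, here is the computation in a few lines of Python, with the numbers taken from the tables:

```python
# One-step Bayes update for the slide's call/answer example.
prior = {"Sleep": 0.5, "Study": 0.5}             # P(H1)
p_ans = {"Sleep": 0.1, "Study": 0.8}             # P(O1=Ans | H1)

joint = {h: prior[h] * p_ans[h] for h in prior}  # P(H1, O1=Ans)
total = sum(joint.values())                      # P(O1=Ans) = 0.45
posterior = {h: joint[h] / total for h in joint}
print(round(posterior["Sleep"], 3))              # 0.111
```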
Hidden Markov Models
Suppose a parent calls and gets an answer at time step 2. What is P(H2=Sleep | O2=Ans)?
(Same model diagram and probability tables as before.)
Quiz: Hidden Markov Models
(Same model diagram and probability tables as before.)
Suppose a parent calls twice, once at time step 1 and once at time step 2. The first time, the child does not answer, and the second time the child does.
Now what is P(H2=Sleep)?
Answer: Hidden Markov Models
(Same model diagram and probability tables as before.)
Suppose a parent calls twice, once at time step 1 and once at time step 2. The first time, the child does not answer, and the second time the child does.
Now what is P(H2=Sleep)?
Numerator: P(H2=Sleep, O1=NoAns, O2=Ans)
= [P(H1=Sleep) P(NoAns|Sleep) P(Sleep|Sleep) + P(H1=Study) P(NoAns|Study) P(Sleep|Study)] * P(Ans|Sleep)
= (0.5*0.9*0.6 + 0.5*0.2*0.5) * 0.1 = 0.32 * 0.1 = 0.032
Denominator: P(O1=NoAns, O2=Ans) = 0.032 + P(H2=Study, O1=NoAns, O2=Ans)
= 0.032 + (0.5*0.9*0.4 + 0.5*0.2*0.5) * 0.8 = 0.032 + 0.23 * 0.8 = 0.216
So P(H2=Sleep | O1=NoAns, O2=Ans) = 0.032 / 0.216 ≈ 0.148.
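This two-step computation is an instance of the HMM forward algorithm. A minimal Python sketch using the tables' numbers (0.9 and 0.2 are the no-answer probabilities, i.e. 1 minus the answer probabilities):

```python
# Forward pass over two time steps.
# Observations: no answer at t=1, answer at t=2.
P_H1 = {"Sleep": 0.5, "Study": 0.5}
P_trans = {("Sleep", "Sleep"): 0.6, ("Sleep", "Study"): 0.4,
           ("Study", "Sleep"): 0.5, ("Study", "Study"): 0.5}  # P(H2 | H1)
P_ans = {"Sleep": 0.1, "Study": 0.8}                           # P(Ans | H)

# alpha1(h) = P(H1=h, O1=NoAns)
alpha1 = {h: P_H1[h] * (1 - P_ans[h]) for h in P_H1}
# alpha2(h) = P(H2=h, O1=NoAns, O2=Ans)
alpha2 = {h2: sum(alpha1[h1] * P_trans[(h1, h2)] for h1 in alpha1) * P_ans[h2]
          for h2 in P_ans}

posterior = alpha2["Sleep"] / sum(alpha2.values())
print(round(posterior, 3))   # 0.148
```

alpha2 holds exactly the numerator (Sleep entry) and, summed, the denominator from the slide.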