Overview: Probabilistic Graphical Models, Template Models, Representation


Daphne Koller

Overview

Probabilistic Graphical Models: Template Models
Representation


Genetic Inheritance

[Figure: family-tree pedigree over Homer, Marge, Bart, Lisa, Maggie, Clancy, Jackie, and Selma]

Genetic Inheritance

[Figure: the same pedigree as a Bayesian network, with a genotype variable G(person) and a blood-type variable B(person) for each family member; each B depends on the corresponding G, and each child's G depends on the parents' G]

NLP Sequence Models


Image Segmentation


The University Example

[Figure: template network Difficulty → Grade ← Intelligence; the ground network contains D(c1), …, D(cn), I(s1), …, I(sm), and G(c,s) for every course-student pair]

Robot Localization

Fox, Burgard, Thrun


Robot Localization

[Figure: temporal model over robot pose x(t), sensor observation z(t), control signal u(t), and the map, unrolled for t = 0, 1, 2, …]

Template Variables
• Template variable X(U1,…,Uk) is instantiated (duplicated) multiple times:
  – Location(t), Sonar(t)
  – Genotype(person), Phenotype(person)
  – Label(pixel)
  – Difficulty(course), Intelligence(student), Grade(course, student)


Template Models
• Languages that specify how variables inherit dependency model from template
• Dynamic Bayesian networks
• Object-relational models
  – Directed
    • Plate models
  – Undirected


Temporal Models

Probabilistic Graphical Models: Template Models
Representation


Distributions over Trajectories
• Pick time granularity Δ
• X(t): variable X at time tΔ
• X(t:t') = {X(t), …, X(t')} (t ≤ t')
• Want to represent P(X(t:t')) for any t, t'


Markov Assumption
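The equations on this slide were lost in transcription; stated in the trajectory notation above, the standard form is:

```latex
% Chain rule over the trajectory:
P\big(X^{(0:T)}\big) = P\big(X^{(0)}\big)\prod_{t=0}^{T-1} P\big(X^{(t+1)} \mid X^{(0:t)}\big)
% Markov assumption: the future is independent of the past given the present,
\big(X^{(t+1)} \perp X^{(0:t-1)} \mid X^{(t)}\big)
% so each transition term reduces to
P\big(X^{(t+1)} \mid X^{(0:t)}\big) = P\big(X^{(t+1)} \mid X^{(t)}\big)
```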


Time Invariance
• Template probability model P(X' | X)
• For all t: P(X(t+1) | X(t)) = P(X' | X)


Template Transition Model

[Figure: 2TBN fragment over time slice t (Weather, Velocity, Location, Failure) and time slice t+1 (Weather', Velocity', Location', Failure', Obs')]

Initial State Distribution

[Figure: time-slice-0 network over Weather0, Velocity0, Location0, Failure0, Obs0]

Ground Bayesian Network

[Figure: unrolled network over time slices 0, 1, 2, each containing Weather, Velocity, Location, Failure, and Obs variables for that slice; slice 0 comes from the initial state distribution and slices 1, 2 from the transition model]

2-Time-Slice Bayesian Network
• A transition model (2TBN) over X1,…,Xn is specified as a BN fragment such that:
  – The nodes include X1',…,Xn' and a subset of X1,…,Xn
  – Only the nodes X1',…,Xn' have parents and a CPD
• The 2TBN defines a conditional distribution
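The conditional distribution itself did not survive transcription; the standard 2TBN factorization is:

```latex
P(X' \mid X) \;=\; \prod_{i=1}^{n} P\big(X_i' \mid \mathrm{Pa}_{X_i'}\big)
```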


Dynamic Bayesian Network
• A dynamic Bayesian network (DBN) over X1,…,Xn is defined by:
  – a 2TBN BN→ over X1,…,Xn
  – a Bayesian network BN(0) over X1(0),…,Xn(0)

Ground Network
• For a trajectory over 0,…,T we define a ground (unrolled) network such that:
  – The dependency model for X1(0),…,Xn(0) is copied from BN(0)
  – The dependency model for X1(t),…,Xn(t) for all t > 0 is copied from BN→
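Unrolling can be sketched in a few lines. The two-variable model below (Rain, Umbrella) and all of its numbers are illustrative, not from the lecture; it simply shows slice 0 being sampled from BN(0) and every later slice from BN→:

```python
import random

# Hypothetical two-variable DBN standing in for X1,...,Xn.
P_RAIN0 = 0.3                            # BN(0): P(Rain(0) = True)
P_RAIN_NEXT = {True: 0.7, False: 0.2}    # BN->:  P(Rain' = True | Rain)
P_UMBRELLA = {True: 0.9, False: 0.1}     # BN->:  P(Umbrella' = True | Rain')

def sample_trajectory(T, rng=random):
    """Unroll the DBN over time slices 0..T and sample one trajectory."""
    rain = rng.random() < P_RAIN0                      # slice 0 from BN(0)
    traj = [(rain, rng.random() < P_UMBRELLA[rain])]   # (sharing the obs CPD for brevity)
    for _ in range(T):                                 # slices 1..T copied from BN->
        rain = rng.random() < P_RAIN_NEXT[rain]
        traj.append((rain, rng.random() < P_UMBRELLA[rain]))
    return traj

print(sample_trajectory(5))   # 6 (Rain, Umbrella) pairs, one per slice
```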

Huang, Koller, Malik, Ogasawara, Rao, Russell, Weber, AAAI 94

[Figure: a large 2TBN for traffic monitoring (Huang et al., AAAI 94). Slice-t state variables LeftClr, RightClr, LatAct, Xdot, FwdAct, Ydot, Stopped, EngStat, FrontBackStat, InLane with primed copies in slice t+1, plus slice-t+1 variables SensOK', FYdotDiff', FcloseSlow', BXdot', BcloseFast', BYdotDiff', Fclr', Bclr', LftClrSens', RtClrSens', TurnSignal', XdotSens', YdotSens', FYdotDiffSens', FclrSens', BXdotDiffSens', BclrSens', BYdotDiffSens']

Summary
• DBNs are a compact representation for encoding structured distributions over arbitrarily long temporal trajectories
• They make assumptions that may require appropriate model (re)design:
  – Markov assumption
  – Time invariance


Hidden Markov Models

Probabilistic Graphical Models: Template Models
Representation


Hidden Markov Models

[Figure: HMM template with transition model S → S' and observation model S' → O', unrolled into S0 → S1 → S2 → S3 with observations O1, O2, O3; and a state-transition diagram over s1, s2, s3, s4 with probabilities 0.5, 0.5, 0.7, 0.3, 0.4, 0.6, 0.1, 0.9 on its edges]
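To make the transition and observation models concrete, here is a sketch of the forward algorithm on a hypothetical two-state HMM; the matrices are illustrative, not the slide's four-state diagram:

```python
import numpy as np

# Hypothetical 2-state HMM; all numbers are illustrative.
A = np.array([[0.7, 0.3],     # A[i, j] = P(S' = j | S = i)  (transition matrix)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],     # B[i, o] = P(O = o | S = i)   (emission matrix)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])     # initial state distribution

def forward(obs):
    """Forward algorithm: P(O(1:T) = obs), summing out the hidden states."""
    alpha = pi * B[:, obs[0]]             # alpha_1(i) = pi(i) * P(o1 | S = i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # alpha_{t+1}(j) = sum_i alpha_t(i) A[i,j] P(o | j)
    return float(alpha.sum())

print(forward([0, 1, 0]))   # likelihood of the observation sequence 0, 1, 0
```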


Numerous Applications
• Robot localization
• Speech recognition
• Biological sequence analysis
• Text annotation


Robot Localization

[Figure: localization as an HMM: states S(0), S(1), S(2), …, S(t) for robot pose, sensor observations o(1), …, o(t), control signals u(0), …, u(t-1), and the map]

Speech Recognition

Dan Jurafsky, Stanford


Segmentation of Acoustic Signal

Dan Jurafsky, Stanford


Phonetic Alphabet

http://www.speech.cs.cmu.edu/cgi-bin/cmudict

• AA odd AA D
• AE at AE T
• AH hut HH AH T
• AO ought AO T
• AW cow K AW
• AY hide HH AY D
• B be B IY
• CH cheese CH IY Z
• D dee D IY
• DH thee DH IY
• EH Ed EH D
• ER hurt HH ER T
• EY ate EY T
• F fee F IY
• G green G R IY N
• HH he HH IY
• IH it IH T
• IY eat IY T
• JH gee JH IY
• K key K IY
• L lee L IY
• M me M IY
• N knee N IY
• NG ping P IH NG
• OW oat OW T
• OY toy T OY
• P pee P IY
• R read R IY D
• S sea S IY
• SH she SH IY
• T tea T IY
• TH theta TH EY T AH
• UH hood HH UH D
• UW two T UW
• V vee V IY
• W we W IY
• Y yield Y IY L D
• Z zee Z IY
• ZH seizure S IY ZH ER

Word HMM

Dan Jurafsky, Stanford


Phone HMM

Dan Jurafsky, Stanford

Dan Jurafsky, Stanford

Recognition HMM


Summary
• HMMs can be viewed as a subclass of DBNs
• HMMs seem unstructured at the level of random variables
• HMM structure typically manifests in sparsity and repeated elements within the transition matrix
• HMMs are used in a wide variety of applications for modeling sequences


Plate Models

Probabilistic Graphical Models: Template Models
Representation


Modeling Repetition

[Figure: plate "Tosses t" containing Outcome, with shared parameter θ outside the plate; grounding over tosses {t1, …, tk} yields Outcome(t1), …, Outcome(tk), each with parent θ]
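A minimal sketch of what this plate means when grounded, with an illustrative value for the shared parameter θ:

```python
import random

# Plate "Tosses t" grounded over {t1, ..., t5}: one Outcome(t) per toss,
# every copy governed by the same shared parameter theta (value illustrative).
theta = 0.6
tosses = [f"t{i}" for i in range(1, 6)]

outcomes = {t: ("heads" if random.random() < theta else "tails") for t in tosses}
print(outcomes)   # one Outcome(t) per toss, all drawn from the same CPD
```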


[Figure: plate "Students s" containing Intelligence → Grade; grounding over two students yields I(s1) → G(s1) and I(s2) → G(s2)]

Nested Plates

[Figure: plate "Courses c" containing Difficulty, with a nested plate "Students s" containing Intelligence and Grade; grounding over courses c1, c2 and students s1, s2 yields D(c1), D(c2) and, for each course, pairs I(s,c) → G(s,c) such as I(s1,c1) → G(s1,c1); note that nesting indexes Intelligence by both student and course]

Overlapping Plates

[Figure: overlapping plates "Courses c" and "Students s" with Grade in their intersection; grounding yields D(c1), D(c2), I(s1), I(s2), and grades G(s1,c1), G(s1,c2), G(s2,c1), G(s2,c2); here Intelligence is indexed by student only]

Explicit Parameter Sharing

[Figure: the overlapping-plates grounding with parameter variables made explicit: θD is a parent of D(c1) and D(c2), θI of I(s1) and I(s2), and θG of every G(s,c)]

Collective Inference

[Figure: two course pages, "Welcome to Geo101" and "Welcome to CS101", with probability bars (0-100%) over student intelligence (low/high) for students A and C and over course difficulty (easy/hard)]

Plate Dependency Model
• For a template variable A(U1,…,Uk):
  – Template parents B1(U1),…,Bm(Um)
  – CPD P(A | B1,…,Bm)

Ground Network
Let A(U1,…,Uk) with parents B1(U1),…,Bm(Um)
• For any instantiation u1,…,uk of U1,…,Uk, the ground network contains a variable A(u1,…,uk) with parents B1(u1),…,Bm(um) and a copy of the CPD P(A | B1,…,Bm)


Plate Dependency Model
Let A(U1,…,Uk) with parents B1(U1),…,Bm(Um)
• For each i, we must have Ui ⊆ {U1,…,Uk}
  – No indices in the parent that are not in the child
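The grounding rule above can be sketched in code. Grade(s, c) with template parents Intelligence(s) and Difficulty(c) is the university example from earlier; the function and domain names are hypothetical:

```python
from itertools import product

def ground(child, parents, domains):
    """Instantiate template variable child = (name, indices) and its template
    parents for every assignment to the child's logical variables."""
    name, indices = child
    # every parent index must also be an index of the child
    for _, pidx in parents:
        assert set(pidx) <= set(indices), "no indices in parent that are not in child"
    net = []
    for values in product(*(domains[u] for u in indices)):
        binding = dict(zip(indices, values))
        inst = lambda n, idx: f"{n}({','.join(binding[u] for u in idx)})"
        net.append((inst(name, indices), [inst(p, pidx) for p, pidx in parents]))
    return net

domains = {"s": ["s1", "s2"], "c": ["c1", "c2"]}
grounding = ground(("Grade", ["s", "c"]),
                   [("Intelligence", ["s"]), ("Difficulty", ["c"])],
                   domains)
for child, pas in grounding:
    print(child, "<-", pas)
```

Each ground grade gets its parents by substituting the same binding of s and c into the parent templates, which is exactly why a parent may not mention an index the child lacks.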


Summary
• Template for an infinite set of BNs, each induced by a different set of domain objects
• Parameters and structure are reused within a BN and across different BNs
• Models encode correlations across multiple objects, allowing collective inference
• Multiple “languages”, each with different tradeoffs in expressive power