CS 6355: Structured Prediction, Graphical Models

Upload: lydiep

Post on 26-Aug-2018


So far…
We discussed sequence labeling tasks:
• HMM: Hidden Markov Models
• MEMM: Maximum Entropy Markov Models
• CRF: Conditional Random Fields
All these models use a linear chain structure to describe the interactions between random variables.

[Figure: linear-chain structures over y_{t-1}, y_t and x_t for the HMM, MEMM and CRF]

This lecture
Graphical models
– Directed: Bayesian Networks
– Undirected: Markov Networks (Markov Random Fields)

• Representations
• Inference
• Learning

Graphical Models
• A language to represent probability distributions over multiple random variables
  – Directed or undirected graphs
• Encodes conditional independence assumptions
• Or equivalently, encodes factorization of joint probabilities
• General machinery for
  – Algorithms for computing marginal and conditional probabilities
    • Recall that we have been looking at most probable states so far
    • Exploiting graph structure
  – An "inference engine"
  – Can introduce prior probability distributions
    • Because parameters are also random variables

Bayesian Network
Decompose joint probability via a directed acyclic graph
– Nodes represent random variables
– Edges represent conditional dependencies
– Each node is associated with a conditional probability table

Example from Russell and Norvig:

$$P(B, E, A, J, M) = P(B)\,P(E)\,P(A \mid B, E)\,P(J \mid A)\,P(M \mid A)$$

Compact representation of the joint probability distribution
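To make the factorization P(B)P(E)P(A|B,E)P(J|A)P(M|A) concrete, here is a minimal sketch that evaluates the joint from conditional probability tables. The slides do not list the table entries; the numbers below are the commonly quoted values for the burglary example in Russell and Norvig and should be read as an assumption, and the helper names `bernoulli` and `joint` are illustrative only.

```python
from itertools import product

# Assumed CPT values (textbook burglary network); not given in the slides.
P_B = {True: 0.001, False: 0.999}      # P(B)
P_E = {True: 0.002, False: 0.998}      # P(E)
P_A = {                                # P(A = True | B, E)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}        # P(J = True | A)
P_M = {True: 0.70, False: 0.01}        # P(M = True | A)


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """P(B, E, A, J, M) as the product of the five local factors."""
    return (P_B[b] * P_E[e] * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))


# The local tables define a proper distribution: the joint sums to one.
total = sum(joint(*values) for values in product([True, False], repeat=5))
```

The tables hold 1 + 1 + 4 + 2 + 2 = 10 numbers, whereas an unrestricted joint over five binary variables needs 2^5 − 1 = 31, which is the sense in which the representation is compact.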

Independence Assumptions of a BN
• Local independencies: A node is independent of its non-descendants given its parents.
  $(X_i \perp \text{NonDescendants}(X_i) \mid \text{Parents}(X_i))$
• Topological independencies: A node is independent of all other nodes given its parents, children and children's parents, that is, given its Markov blanket.
  $(X_i \perp X_j \mid \text{MB}(X_i))$ for all $j \neq i$
• Global independencies: D-separation

Example from Daphne Koller

Where do the independence assumptions come from?
Domain knowledge
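The Markov blanket definition (parents, children and children's parents) can be read directly off the graph. The following is a small sketch that assumes the network is stored as a dictionary from each node to its list of parents; the structure used is the burglary network whose factorization appears earlier, and the function name `markov_blanket` is an illustrative choice.

```python
# Parent lists for the network P(B)P(E)P(A|B,E)P(J|A)P(M|A).
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}


def markov_blanket(parents, node):
    """Return parents, children and children's other parents of `node`."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])   # co-parents of each child
    blanket.discard(node)                # a node is not in its own blanket
    return blanket
```

For example, `markov_blanket(parents, "B")` contains `E` even though there is no edge between `B` and `E`: they share the child `A`.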

Bayesian Network
• Example: Hidden Markov Model
  – Naïve Bayes classifier is also a simple Bayes net
• Sometimes Bayes nets cannot represent the independence relations we want conveniently.
  – E.g.: Segmenting an image by assigning a label to each pixel
    • Say, we want adjacent labels to influence each other

Example from Kevin Murphy

Two problems:
1. What is the correct direction of arrows?
2. For any choice of the arrows, strange dependencies show up. $X_8$ is independent of everything given its Markov blanket (other circled nodes here)

From directed to undirected networks
Sometimes Bayes nets cannot represent the independence relations we want conveniently.
– E.g.: Segmenting an image by assigning a label to each pixel
  • Say, we want adjacent labels to influence each other

Example from Kevin Murphy

Undirected Graphical Models
a.k.a. Markov Random Fields / Markov Networks
• Another way of defining conditional independence
• General structure
  – Nodes are random variables
  – Edges (hyper-edges) define dependencies
• The nodes in a complete subgraph form a clique.

The joint probability decouples over cliques. Every clique $x_c$ is associated with a potential function $f_c(x_c)$:

$$P(x) = \frac{1}{Z} \prod_{c \in \text{Cliques}} f_c(x_c)$$

Example: 8 cliques: {A}, {B}, {C}, {D}, {AB}, {BC}, {CD}, {AD}

$$P(A, B, C, D) = \frac{1}{Z}\, f_1(A, B)\, f_2(B, C)\, f_3(C, D)\, f_4(A, D)$$

This is a Gibbs distribution if all factors are positive.
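As a rough illustration of the four-variable example, the sketch below computes Z and a normalized probability by brute-force enumeration. The variables are assumed binary, and the pairwise potential `agree` (2 when neighbors match, 1 otherwise) is a made-up choice shared by all four factors; the slides do not specify the potentials.

```python
from itertools import product


def agree(u, v):
    """Hypothetical positive potential that favors equal neighbors."""
    return 2.0 if u == v else 1.0


def unnormalized(a, b, c, d):
    """f1(A,B) f2(B,C) f3(C,D) f4(A,D) with the same potential on each edge."""
    return agree(a, b) * agree(b, c) * agree(c, d) * agree(a, d)


# Partition function: sum of the unnormalized score over all 2^4 assignments.
states = list(product([0, 1], repeat=4))
Z = sum(unnormalized(*s) for s in states)


def prob(a, b, c, d):
    """Normalized probability P(A, B, C, D)."""
    return unnormalized(a, b, c, d) / Z
```

Enumeration is only feasible here because there are 16 assignments; the number of terms in Z grows exponentially with the number of variables.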

Independence Assumptions of an MRF
• Local independencies: A node is independent of all other nodes given its neighbors.
• Global independencies: If X, Y, Z are sets of nodes, X is conditionally independent of Y given Z if removing all nodes of Z removes all paths from X to Y.

Where do the independence assumptions come from?
Domain knowledge
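The global independence condition is a plain graph-reachability question, so it can be checked with a breadth-first search that refuses to step onto the conditioning nodes. This is a sketch under the assumption that the graph is given as an adjacency dictionary; the function name `separated` is illustrative, and the test graph is the four-node cycle with edges AB, BC, CD, AD from the clique example.

```python
from collections import deque


def separated(adj, xs, ys, zs):
    """True if removing the nodes in `zs` cuts every path from `xs` to `ys`."""
    blocked = set(zs)
    frontier = deque(x for x in xs if x not in blocked)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False                 # found an unblocked path
        for nbr in adj[node]:
            if nbr not in blocked and nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return True


# Four-node cycle A - B - C - D - A.
adj = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
```

In this graph A and C are separated by {B, D}, which are exactly A's neighbors, but not by {B} alone, since the path through D stays open.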

MRF to Factor graph

Normalize:

$$P(x) = \frac{1}{Z} \prod_{c \in \text{Cliques}} f_c(x_c, \mu)$$

where

$$Z = \sum_{x} \prod_{c \in \text{Cliques}} f_c(x_c, \mu)$$

Z: Called the partition function, sum over all assignments to the random variables

$f(x_c, \mu)$ is often written as $\exp(\mu^T x_c)$: log-linear model

Factor graphs

Which cliques?

Factor graph: Makes the factorization explicit, factors instead of cliques

[Figure: factor graph over variables 1 to 5 with three factors]

$$P(x) = \frac{1}{Z}\, f_a(x_1, x_2, x_4)\, f_b(x_2, x_3, x_5)\, f_c(x_4, x_5)$$
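A factor graph can be stored as nothing more than a list of factor scopes. The sketch below uses the three scopes fa(x1, x2, x4), fb(x2, x3, x5), fc(x4, x5) and assumes binary variables and log-linear factors of the form exp(µᵀ x_c), with x_c taken as the 0/1 vector of the variables in the scope. The weight vectors are hypothetical inputs supplied by the caller; the slides give no parameter values.

```python
import math
from itertools import product

# Factor scopes: which variables each factor touches.
SCOPES = {"a": (1, 2, 4), "b": (2, 3, 5), "c": (4, 5)}


def factor(mu, values):
    """Log-linear factor exp(mu^T x_c) for a 0/1 vector x_c."""
    return math.exp(sum(m * v for m, v in zip(mu, values)))


def unnormalized(x, weights):
    """Product of all factors; `x` maps variable index to its 0/1 value."""
    score = 1.0
    for name, scope in SCOPES.items():
        score *= factor(weights[name], [x[i] for i in scope])
    return score


def partition(weights):
    """Z: sum of the unnormalized score over all 2^5 assignments."""
    return sum(unnormalized(dict(zip(range(1, 6), vals)), weights)
               for vals in product([0, 1], repeat=5))


def prob(x, weights):
    return unnormalized(x, weights) / partition(weights)
```

With all weights set to zero every factor equals 1, so Z is just the number of assignments (32) and the distribution is uniform, which makes a convenient sanity check.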

Comments about MRFs
• Connection to statistical physics
  – Identical to Boltzmann distribution in energy based models
  – Probability of a system existing in a state:

    $$P(x) = \frac{1}{Z} \exp\Big(-\sum_{c} E(x_c)\Big)$$

    where $E(x_c)$ is the energy of clique $c$ existing in state $x_c$
• If x is dependent on all its neighbors:
  – If x can be in one of two states (binary), Ising model
  – If x can be in one of more than two states (multiclass), Potts model

Z: Zustandssumme, "sum over states", more commonly called the partition function
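To illustrate the energy-based view, here is a minimal Ising-style sketch on a short chain. It assumes the usual Boltzmann form, probability proportional to exp(−energy), spins in {−1, +1}, and a single hypothetical `coupling` constant on every neighboring pair; none of these specifics come from the slides.

```python
import math
from itertools import product


def energy(spins, coupling=1.0):
    """Total energy of a chain: each neighboring pair is a clique with
    energy -coupling * s_i * s_j, so aligned neighbors have lower energy."""
    return -coupling * sum(s * t for s, t in zip(spins, spins[1:]))


def boltzmann(n, coupling=1.0):
    """Distribution over all 2^n spin configurations of a length-n chain."""
    states = list(product([-1, 1], repeat=n))
    weights = [math.exp(-energy(s, coupling)) for s in states]
    Z = sum(weights)                     # "sum over states"
    return {s: w / Z for s, w in zip(states, weights)}


dist = boltzmann(3)
```

With a positive coupling, low-energy configurations such as (1, 1, 1) receive more probability than high-energy ones such as (1, −1, 1). Allowing more than two states per variable would give the Potts analogue.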

Bayesian Networks vs. Markov Networks
• Both networks represent
  – A set of conditional independence relations
  – i.e., a skeleton that shows how a joint probability distribution is factorized
• Both networks have theorems about equivalence between conditional independence and joint probability factorization
• Given a Bayes net, there is not always an equivalent Markov random field, and vice versa
  – With some caveats
  – See the chapter on undirected graphical models in Koller and Friedman's book

Inference in graphical models
In general, compute probability of a subset of states
– $P(x_A)$, for some subset of random variables $x_A$
• Note: So far, we have generally considered the equivalent of $\arg\max_x P(x)$

[Figure: example graph over variables 1 to 5]

• Exact inference
  – Variable elimination
    • Marginalize by summing out variables in a "good" order
    • Think about what we did for Viterbi
    • What makes an ordering good?
  – Belief propagation (exact only for graphs without loops)
    • Nodes pass messages to each other about their estimate of what the neighbor's state should be
  – Generally efficient for trees, sequences (and maybe other graphs too)
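The remark about Viterbi can be made concrete on a chain: summing out variables from one end to the other is the Viterbi recursion with max replaced by sum. The sketch below assumes a chain x1 - x2 - ... - xn with binary variables and one hypothetical pairwise potential `psi` shared by every edge, and compares the result against brute-force enumeration.

```python
from itertools import product

K = 2  # number of states per variable
# Hypothetical pairwise potential psi[u][v], shared by every edge of the chain.
psi = [[2.0, 1.0], [1.0, 3.0]]


def eliminate_chain(n):
    """Marginal of the last variable, eliminating x1, x2, ... in order.
    Each elimination costs O(K^2), so the total is O(n K^2)."""
    message = [1.0] * K
    for _ in range(n - 1):
        message = [sum(message[u] * psi[u][v] for u in range(K))
                   for v in range(K)]
    Z = sum(message)
    return [m / Z for m in message]


def brute_force(n):
    """Same marginal by enumerating all K^n assignments."""
    totals = [0.0] * K
    for x in product(range(K), repeat=n):
        w = 1.0
        for u, v in zip(x, x[1:]):
            w *= psi[u][v]
        totals[x[-1]] += w
    Z = sum(totals)
    return [t / Z for t in totals]
```

Eliminating from the end of the chain keeps every intermediate table over a single variable. Eliminating a middle variable first would instead create a table over both of its neighbors; on a chain that is still small, but on denser graphs a poor order can produce very large intermediate tables.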

Inference in graphical models
In general, compute probability of a subset of states
– $P(x_A)$, for some subset of random variables $x_A$

[Figure: example graph over variables 1 to 5]

• Exact inference: NP-hard in general, works for simple graphs
• "Approximate" inference
  – Markov Chain Monte Carlo
    • Gibbs Sampling / Metropolis-Hastings
  – Variational algorithms
    • Frame inference as an optimization problem, perturb it to an approximate one and solve the approximate problem
  – Loopy Belief propagation
    • Run BP and hope it works!
  – The not-so-good news: Approximate inference is also intractable!

(more on this in future lectures)
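As a small illustration of the MCMC entry above, here is a Gibbs sampling sketch for a three-variable chain MRF. The binary variables, the pairwise potential `agree`, the number of sweeps and the seed are all assumptions made for the example. Each step resamples one variable from its conditional distribution given its current neighbors, which only requires the factors that touch that variable.

```python
import random


def agree(u, v):
    """Hypothetical pairwise potential that favors equal neighbors."""
    return 2.0 if u == v else 1.0


def gibbs_chain(n, sweeps, seed=0):
    """Gibbs sampling on a chain x_0 - x_1 - ... - x_{n-1} of binary variables."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    samples = []
    for _ in range(sweeps):
        for i in range(n):
            # Unnormalized conditional of x[i] given its neighbors only.
            w = []
            for value in (0, 1):
                score = 1.0
                if i > 0:
                    score *= agree(x[i - 1], value)
                if i < n - 1:
                    score *= agree(value, x[i + 1])
                w.append(score)
            x[i] = 1 if rng.random() < w[1] / (w[0] + w[1]) else 0
        samples.append(tuple(x))
    return samples


samples = gibbs_chain(n=3, sweeps=20000)
# Monte Carlo estimate of P(x_0 == x_1); the exact value for this chain is 2/3.
estimate = sum(s[0] == s[1] for s in samples) / len(samples)
```

The estimate is only approximate and depends on the number of sweeps and on how quickly the chain mixes; this sketch skips burn-in and thinning for brevity.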