Graphical models - svivek
So far…
We discussed sequence labeling tasks:
• HMM: Hidden Markov Models
• MEMM: Maximum Entropy Markov Models
• CRF: Conditional Random Fields
All these models use a linear chain structure to describe the interactions between random variables.
[Figure: linear-chain structures over y_{t-1}, y_t, and x_t for the HMM, MEMM, and CRF]
This lecture
Graphical models
– Directed: Bayesian Networks
– Undirected: Markov Networks (Markov Random Fields)
• Representations
• Inference
• Learning
Graphical Models
• A language to represent probability distributions over multiple random variables
  – Directed or undirected graphs
• Encodes conditional independence assumptions
• Or equivalently, encodes factorization of joint probabilities
• General machinery for
  – Algorithms for computing marginal and conditional probabilities
    • Recall that we have been looking at most probable states so far
    • Exploiting graph structure
  – An "inference engine"
  – Can introduce prior probability distributions
    • Because parameters are also random variables
Bayesian Network
Decompose joint probability via a directed acyclic graph
– Nodes represent random variables
– Edges represent conditional dependencies
– Each node is associated with a conditional probability table
Example from Russell and Norvig
$$P(B, E, A, J, M) = P(B)\,P(E)\,P(A \mid B, E)\,P(J \mid A)\,P(M \mid A)$$
Compact representation of the joint probability distribution
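As a minimal sketch of why the factorization is compact, the snippet below builds the joint from five small conditional probability tables. The slides give only the graph structure, so the numeric CPT values here are assumptions (the numbers commonly quoted for the burglary-alarm example), and the helper names `bernoulli` and `joint` are hypothetical.

```python
from itertools import product

# ASSUMED CPT values for illustration; the slides give only the structure.
P_B = {True: 0.001, False: 0.999}   # P(B)
P_E = {True: 0.002, False: 0.998}   # P(E)
P_A = {                             # P(A=True | B, E)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}     # P(J=True | A)
P_M = {True: 0.70, False: 0.01}     # P(M=True | A)


def bernoulli(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
    return (P_B[b] * P_E[e]
            * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j)
            * bernoulli(P_M[a], m))


# The factorized joint still sums to one over all 2^5 assignments.
total = sum(joint(*values) for values in product([True, False], repeat=5))

# Free parameters: one per CPT row versus one per joint entry (minus one).
n_params_bn = 1 + 1 + 4 + 2 + 2
n_params_full = 2 ** 5 - 1
```

Under these assumptions the network needs 10 numbers where an unstructured joint table over five binary variables needs 31.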
Independence Assumptions of a BN
• Local independencies: A node is independent of its non-descendants given its parents.
  $(X_i \perp \mathrm{NonDescendants}(X_i) \mid \mathrm{Parents}(X_i))$
• Topological independencies: A node is independent of all other nodes given its parents, children, and children's parents, that is, given its Markov blanket.
  $(X_i \perp X_j \mid \mathrm{MB}(X_i))$ for all $j \neq i$
• Global independencies: D-separation
Example from Daphne Koller
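The Markov blanket definition above (parents, children, and children's parents) can be read directly off the graph. The sketch below assumes the DAG is stored as a dictionary from each node to its set of parents, and it reuses the burglary-alarm structure from the earlier factorization rather than the Koller example, whose figure is not reproduced here. The function name `markov_blanket` is hypothetical.

```python
def markov_blanket(parents, node):
    """Return parents, children, and children's other parents of `node`.

    `parents` maps every node of the DAG to the set of its parents.
    """
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = set()
    for child in children:
        co_parents |= parents[child]
    return (parents[node] | children | co_parents) - {node}


# Structure implied by P(B) P(E) P(A|B,E) P(J|A) P(M|A).
alarm_parents = {
    "B": set(),
    "E": set(),
    "A": {"B", "E"},
    "J": {"A"},
    "M": {"A"},
}

blanket_of_alarm = markov_blanket(alarm_parents, "A")
blanket_of_burglary = markov_blanket(alarm_parents, "B")
```

Note that E lands in the blanket of B only because the two share the child A, which is the "children's parents" clause at work.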
Where do the independence assumptions come from?
Domain knowledge
Bayesian Network
• Example: Hidden Markov Model
  – Naïve Bayes classifier is also a simple Bayes net
• Sometimes Bayes nets cannot conveniently represent the independence relations we want.
  – E.g., segmenting an image by assigning a label to each pixel
    • Say we want adjacent labels to influence each other
Two problems:
1. What is the right direction of the arrows?
2. For any choice of the arrows, strange dependencies show up. X8 is independent of everything given its Markov blanket (the other circled nodes in the figure).
Example from Kevin Murphy
From Directed to Undirected Networks
Sometimes Bayes nets cannot conveniently represent the independence relations we want.
– E.g., segmenting an image by assigning a label to each pixel
  • Say we want adjacent labels to influence each other
Example from Kevin Murphy
Undirected Graphical Models
a.k.a. Markov Random Fields / Markov Networks
• Another way of defining conditional independence
• General structure
  – Nodes are random variables
  – Edges (hyper-edges) define dependencies
• The nodes in a complete subgraph form a clique.
8 cliques: {A}, {B}, {C}, {D}, {AB}, {BC}, {CD}, {AD}
$$P(x) = \frac{1}{Z} \prod_{c \in \text{Cliques}} f_c(x_c)$$

$$P(A, B, C, D) = \frac{1}{Z}\, f_1(A, B)\, f_2(B, C)\, f_3(C, D)\, f_4(A, D)$$

This is a Gibbs distribution if all factors are positive.
The joint probability decouples over cliques. Every clique $x_c$ is associated with a potential function $f_c(x_c)$.
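To make the clique factorization concrete, here is a small sketch for the four-node example with one factor per edge (AB, BC, CD, AD). The potential `agree` is a hypothetical choice made up for illustration, and the partition function is computed by brute-force enumeration, which is only feasible because there are 16 assignments.

```python
import math
from itertools import product


def agree(u, v, strength=1.0):
    """Hypothetical pairwise potential: larger when the two values match."""
    return math.exp(strength if u == v else 0.0)


def unnormalized(a, b, c, d):
    # f1(A,B) * f2(B,C) * f3(C,D) * f4(A,D), one factor per edge of the cycle
    return agree(a, b) * agree(b, c) * agree(c, d) * agree(a, d)


states = list(product([0, 1], repeat=4))
Z = sum(unnormalized(*s) for s in states)       # partition function
P = {s: unnormalized(*s) / Z for s in states}   # normalized joint


def prob(**fixed):
    """Probability that the named variables (A, B, C, D) take the given values."""
    names = "ABCD"
    return sum(p for s, p in P.items()
               if all(s[names.index(k)] == v for k, v in fixed.items()))


def a_indep_c_given(b, d):
    """Check numerically that P(A, C | B, D) = P(A | B, D) P(C | B, D)."""
    pbd = prob(B=b, D=d)
    return all(
        math.isclose(prob(A=a, B=b, C=c, D=d) / pbd,
                     (prob(A=a, B=b, D=d) / pbd) * (prob(C=c, B=b, D=d) / pbd))
        for a, c in product([0, 1], repeat=2))
```

Because all potentials are positive, this is a Gibbs distribution. A and C share no factor, so they come out conditionally independent given B and D.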
Independence Assumptions of an MRF
• Local independencies: A node is independent of all other nodes given its neighbors.
• Global independencies: If X, Y, Z are sets of nodes, X is conditionally independent of Y given Z if removing all nodes of Z removes all paths from X to Y.
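The global criterion is a plain graph-reachability test: delete the nodes in Z and see whether anything in X can still reach Y. A minimal sketch using breadth-first search follows; the function name `separated` and the edge-list representation are assumptions for illustration.

```python
from collections import deque


def separated(edges, X, Y, Z):
    """True if every path from a node in X to a node in Y passes through Z."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    start = set(X) - set(Z)
    seen = set(start)
    queue = deque(start)
    while queue:
        u = queue.popleft()
        if u in Y:
            return False  # reached Y without crossing Z
        for v in adjacency.get(u, ()):
            if v not in Z and v not in seen:
                seen.add(v)
                queue.append(v)
    return True


# The four-node cycle with edges AB, BC, CD, AD.
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]

a_c_given_bd = separated(cycle, {"A"}, {"C"}, {"B", "D"})
a_c_given_b = separated(cycle, {"A"}, {"C"}, {"B"})
```

On the cycle, conditioning on B alone is not enough, since the path A-D-C stays open.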
Where do the independence assumptions come from?
Domain knowledge
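MRF to Factor Graph
Normalize:
$$P(x) = \frac{1}{Z} \prod_{c \in \text{Cliques}} f(x_c, \mu)$$
where
$$Z = \sum_{x} \prod_{c \in \text{Cliques}} f(x_c, \mu)$$
Z: called the partition function, a sum over all assignments to the random variables.
$f(x_c, \mu)$ is often written as $\exp(\mu^T x_c)$: a log-linear model.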
Factor Graphs
• Factor graph: makes the factorization explicit, using factors instead of cliques
• Which cliques?
[Figure: factor graph over variables 1 to 5 with three factor nodes]
$$P(x) = \frac{1}{Z}\, f_a(x_1, x_2, x_4)\, f_b(x_2, x_3, x_5)\, f_c(x_4, x_5)$$
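One way to hold such a factor graph in code is as a list of (scope, function) pairs, which keeps the factorization explicit. In the sketch below only the three scopes come from the slide. The factor values are made up, the variables are assumed binary, and `score` and `marginal` are hypothetical helper names.

```python
from itertools import product

# Each factor is (scope, function). Scopes follow f_a, f_b, f_c above;
# the numeric values are HYPOTHETICAL and chosen only to be positive.
factors = [
    ((1, 2, 4), lambda x1, x2, x4: 1.0 + x1 + x2 * x4),
    ((2, 3, 5), lambda x2, x3, x5: 2.0 if x2 == x3 == x5 else 1.0),
    ((4, 5), lambda x4, x5: 3.0 if x4 == x5 else 1.0),
]
variables = [1, 2, 3, 4, 5]


def score(assignment):
    """Unnormalized product of all factors for a dict {variable: value}."""
    result = 1.0
    for scope, f in factors:
        result *= f(*(assignment[v] for v in scope))
    return result


# Brute-force enumeration over binary variables (assumed for illustration).
assignments = [dict(zip(variables, values))
               for values in product([0, 1], repeat=len(variables))]
Z = sum(score(a) for a in assignments)


def marginal(var, value):
    """P(x_var = value), summing the normalized score over everything else."""
    return sum(score(a) for a in assignments if a[var] == value) / Z


# Bipartite structure: which factors (by index) touch each variable.
factor_neighbors = {v: [i for i, (scope, _) in enumerate(factors) if v in scope]
                    for v in variables}
```

The `factor_neighbors` map is the variable-to-factor adjacency that the drawing of a factor graph shows.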
Comments about MRFs
• Connection to statistical physics
  – Identical to the Boltzmann distribution in energy-based models
  – Probability of a system existing in a state:
    $$P(x) = \frac{1}{Z} \exp\Big(-\sum_{c} E(x_c)\Big)$$
    where $E(x_c)$ is the energy of clique $c$ existing in state $x_c$
• If x is dependent on all its neighbors:
  – If x can be in one of two states (binary): Ising model
  – If x can be in one of more than two states (multiclass): Potts model
Z: Zustandssumme, "sum over states", more commonly called the partition function
Bayesian Networks vs. Markov Networks
• Both networks represent
  – A set of conditional independence relations
  – i.e., a skeleton that shows how a joint probability distribution is factorized
• Both networks have theorems about equivalence between conditional independence and joint probability factorization
• Given a Bayes net, there is not always an equivalent Markov random field, and vice versa
  – With some caveats
  – See the chapter on undirected graphical models in Koller and Friedman's book
Inference in Graphical Models
In general, compute the probability of a subset of states
– $P(x_A)$, for some subset of random variables $x_A$
• Note: So far, we have generally considered the equivalent of $\arg\max_x P(x)$
• Exact inference
• "Approximate" inference
(more on this in future classes)
• Exact inference
  – Variable elimination
    • Marginalize by summing out variables in a "good" order
    • Think about what we did for Viterbi
    • What makes an ordering good?
  – Belief propagation (exact only for graphs without loops)
    • Nodes pass messages to each other about their estimate of what the neighbor's state should be
  – Generally efficient for trees, sequences (and maybe other graphs too)
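A sketch of variable elimination on the simplest case, a chain $x_1 - x_2 - \dots - x_N$ with pairwise potentials, may help show what a "good" order buys. Eliminating from one end keeps every intermediate table at size K, so the cost is about $N K^2$ instead of $K^N$. The potentials in `psi` are made-up numbers and the function names are hypothetical.

```python
from itertools import product

K = 3  # states per variable
N = 5  # chain length

# HYPOTHETICAL positive potentials psi[t][i][j] between x_t = i and x_{t+1} = j.
psi = [[[1.0 + ((i + 2 * j + t) % 4) for j in range(K)] for i in range(K)]
       for t in range(N - 1)]


def marginal_last_by_elimination():
    """Sum out x_1, then x_2, ... and return the marginal of the last variable."""
    # message[j]: total weight of everything eliminated so far,
    # with the next variable in the chain fixed to state j.
    message = [1.0] * K
    for t in range(N - 1):
        message = [sum(message[i] * psi[t][i][j] for i in range(K))
                   for j in range(K)]
    Z = sum(message)
    return [m / Z for m in message]


def marginal_last_by_brute_force():
    """Same marginal by enumerating all K**N assignments."""
    totals = [0.0] * K
    for x in product(range(K), repeat=N):
        weight = 1.0
        for t in range(N - 1):
            weight *= psi[t][x[t]][x[t + 1]]
        totals[x[-1]] += weight
    Z = sum(totals)
    return [v / Z for v in totals]
```

Replacing each `sum` over `i` with `max` turns the same recursion into the Viterbi-style search for the most probable assignment, which is the connection the slide points to.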
• Exact inference: NP-hard in general, works for simple graphs
• "Approximate" inference
  – Markov Chain Monte Carlo
    • Gibbs sampling / Metropolis-Hastings
  – Variational algorithms
    • Frame inference as an optimization problem, perturb it to an approximate one, and solve the approximate problem
  – Loopy belief propagation
    • Run BP and hope it works!
  – The not-so-good news: Approximate inference is also intractable!
(more on this in future lectures)
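As an illustration of the MCMC bullet, the sketch below runs Gibbs sampling on a tiny Ising-style model (four binary variables on a cycle, states written as +1/-1) and compares the estimate with the exact answer from enumeration. The coupling strength `J`, the graph, and the quantity being estimated are all assumptions chosen so that the exact value is easy to compute.

```python
import math
import random
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (0, 3)]  # four-node cycle
neighbors = {i: [] for i in range(4)}
for u, v in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)

J = 0.5  # HYPOTHETICAL coupling: P(x) is proportional to exp(J * sum of x_u * x_v over edges)


def gibbs_estimate(num_sweeps=20000, burn_in=1000, seed=0):
    """Estimate P(x_0 == x_1) by resampling one variable at a time."""
    rng = random.Random(seed)
    x = [rng.choice([-1, 1]) for _ in range(4)]
    hits = 0
    for sweep in range(num_sweeps + burn_in):
        for i in range(4):
            field = J * sum(x[j] for j in neighbors[i])
            # Conditional P(x_i = +1 | neighbors) = e^field / (e^field + e^-field)
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
            x[i] = 1 if rng.random() < p_plus else -1
        if sweep >= burn_in and x[0] == x[1]:
            hits += 1
    return hits / num_sweeps


def exact():
    """P(x_0 == x_1) by summing over all 16 states."""
    numerator = denominator = 0.0
    for x in product([-1, 1], repeat=4):
        weight = math.exp(J * sum(x[u] * x[v] for u, v in edges))
        denominator += weight
        if x[0] == x[1]:
            numerator += weight
    return numerator / denominator
```

Each update needs only the neighbors of the resampled node, which is the local independence property of the MRF being put to use.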