Bayesian Belief Network

Upload: samira

Post on 06-Jan-2016


Page 1: Bayesian Belief Network

Bayesian Belief Network

Page 2: Bayesian Belief Network

• The decomposition of large probabilistic domains into weakly connected subsets via conditional independence is one of the most important developments in the recent history of AI

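• This can work well even when the independence assumption is not true!

$$P(a \wedge b) = P(a)\,P(b)$$

$$P(\mathit{toothache}, \mathit{catch}, \mathit{cavity}, \mathit{Weather} = \mathit{cloudy}) = P(\mathit{Weather} = \mathit{cloudy})\,P(\mathit{toothache}, \mathit{catch}, \mathit{cavity})$$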

Page 3: Bayesian Belief Network

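Naive Bayes assumption:

$$P(a_1, \ldots, a_n \mid v_j) = \prod_{i} P(a_i \mid v_j)$$

which gives

$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{i} P(a_i \mid v_j)$$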

Page 4: Bayesian Belief Network

• Bayesian networks
• Conditional Independence
• Inference in Bayesian Networks
• Irrelevant variables
• Constructing Bayesian Networks
• Learning Bayesian Networks
• Examples - Exercises

Page 5: Bayesian Belief Network

Naive Bayes assumption of conditional independence too restrictive

But it's intractable without some such assumptions...

Bayesian Belief networks describe conditional independence among subsets of variables

This allows combining prior knowledge about (in)dependencies among variables with observed training data

Page 6: Bayesian Belief Network

Bayesian networks

A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.

Syntax:
• a set of nodes, one per variable
• a directed, acyclic graph (link ≈ "directly influences")
• a conditional distribution for each node given its parents: P(Xi | Parents(Xi))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.

Page 7: Bayesian Belief Network

Bayesian Networks

A Bayesian belief network allows conditional independence to be defined among subsets of the variables.

• A graphical model of causal relationships
• Represents dependency among the variables
• Gives a specification of the joint probability distribution

Example graph (figure): X → Z, Y → Z, Y → P

• Nodes: random variables
• Links: dependency
• X and Y are the parents of Z, and Y is the parent of P
• No direct dependency between Z and P
• Has no loops or cycles

Page 8: Bayesian Belief Network

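Conditional Independence

Once we know that the patient has a cavity, we do not expect the probability of the probe catching to depend on the presence of a toothache:

$$P(\mathit{catch} \mid \mathit{cavity} \wedge \mathit{toothache}) = P(\mathit{catch} \mid \mathit{cavity})$$

$$P(\mathit{toothache} \mid \mathit{cavity} \wedge \mathit{catch}) = P(\mathit{toothache} \mid \mathit{cavity})$$

Independence between a and b:

$$P(a \mid b) = P(a)$$

$$P(b \mid a) = P(b)$$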
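The conditional independence of Catch and Toothache given Cavity can be checked numerically on a small joint table. The sketch below is only illustrative: the four toothache = true entries are the ones used in the normalization slides of this deck, while the four toothache = false entries are an assumption taken from the usual textbook version of the dentist example. The helper names `prob` and `conditional` are hypothetical.

```python
# Full joint P(Toothache, Catch, Cavity).
# Assumption: the toothache=False rows come from the common textbook
# version of this example, not from the slides themselves.
JOINT = {
    # (toothache, catch, cavity): probability
    (True, True, True): 0.108,
    (True, False, True): 0.012,
    (True, True, False): 0.016,
    (True, False, False): 0.064,
    (False, True, True): 0.072,
    (False, False, True): 0.008,
    (False, True, False): 0.144,
    (False, False, False): 0.576,
}
NAMES = ("toothache", "catch", "cavity")


def prob(**fixed):
    """Marginal probability of the partial assignment given in `fixed`."""
    total = 0.0
    for values, p in JOINT.items():
        world = dict(zip(NAMES, values))
        if all(world[name] == value for name, value in fixed.items()):
            total += p
    return total


def conditional(query, given):
    """P(query | given), both passed as dicts of variable name -> value."""
    return prob(**query, **given) / prob(**given)


# P(catch | cavity, toothache) versus P(catch | cavity)
with_toothache = conditional({"catch": True}, {"cavity": True, "toothache": True})
without_toothache = conditional({"catch": True}, {"cavity": True})
print(round(with_toothache, 3), round(without_toothache, 3))

# Without conditioning on cavity the two variables are NOT independent.
print(round(conditional({"catch": True}, {"toothache": True}), 3),
      round(prob(catch=True), 3))
```

Under these assumed numbers the first pair of values agrees, while the second pair does not, which is the difference between conditional and plain independence.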

Page 9: Bayesian Belief Network

Example

Topology of network encodes conditional independence assertions:
• Weather is independent of the other variables
• Toothache and Catch are conditionally independent given Cavity

Page 10: Bayesian Belief Network

Bayesian Belief Network: An Example

Nodes (figure): FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea

Bayesian Belief Networks

The conditional probability table for the variable LungCancer shows the conditional probability for each possible combination of values of its parents:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC      0.8       0.5        0.7        0.1
~LC     0.2       0.5        0.3        0.9

$$P(z_1, \ldots, z_n) = \prod_{i=1}^{n} P(z_i \mid \mathit{Parents}(Z_i))$$

Page 11: Bayesian Belief Network

Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects "causal" knowledge:
• A burglar can set the alarm off
• An earthquake can set the alarm off
• The alarm can cause Mary to call
• The alarm can cause John to call

Page 12: Bayesian Belief Network

Belief Networks

Burglary: P(B) = 0.001

Earthquake: P(E) = 0.002

Alarm:
B   E   P(A)
t   t   .95
t   f   .94
f   t   .29
f   f   .001

JohnCalls:
A   P(J)
t   .90
f   .05

MaryCalls:
A   P(M)
t   .70
f   .01

Page 13: Bayesian Belief Network

Full Joint Distribution

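$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathit{parents}(X_i))$$

$$P(j \wedge m \wedge a \wedge \neg b \wedge \neg e) = P(j \mid a)\,P(m \mid a)\,P(a \mid \neg b \wedge \neg e)\,P(\neg b)\,P(\neg e)$$

$$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.000628$$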
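The product formula can be evaluated directly from the CPTs of the burglary network. The following is a minimal sketch, assuming Boolean variables and CPT values as listed on the Belief Networks slide; the names `bern` and `joint` are hypothetical helpers, not part of the slides.

```python
# CPTs of the burglary network (values from the Belief Networks slide).
P_B = 0.001                      # P(Burglary = true)
P_E = 0.002                      # P(Earthquake = true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E)
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)


def bern(p_true, value):
    """Probability of a Boolean `value` when P(value = true) is `p_true`."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of the five CPT entries."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


# P(j, m, a, not b, not e)
p = joint(b=False, e=False, a=True, j=True, m=True)
print(p)
```

The printed value is roughly 0.000628, matching the hand calculation.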

Page 14: Bayesian Belief Network

Compactness

• A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values
• Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
• I.e., grows linearly with n, vs. O(2^n) for the full joint distribution
• For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
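The count for the burglary net can be reproduced from the parent lists alone. This is a small sketch assuming all variables are Boolean; `cpt_numbers` and the dictionary layout are illustrative choices.

```python
def cpt_numbers(parents):
    """Independent numbers needed when every node is Boolean:
    a node with k parents contributes 2**k numbers."""
    return sum(2 ** len(p) for p in parents.values())


# Parent lists of the burglary network.
burglary_net = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

n = len(burglary_net)
network_size = cpt_numbers(burglary_net)
full_joint_size = 2 ** n - 1
print(network_size, full_joint_size)  # 10 31
```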

Page 15: Bayesian Belief Network

Inference in Bayesian Networks

• How can one infer the (probabilities of) values of one or more network variables, given observed values of others?
• The Bayes net contains all the information needed for this inference
• If only one variable has an unknown value, it is easy to infer
• In the general case, the problem is NP-hard

Page 16: Bayesian Belief Network

Example

In the burglary network, we might observe the event in which JohnCalls = true and MaryCalls = true

We could ask for the probability that a burglary has occurred

P(Burglary | JohnCalls = true, MaryCalls = true)

Page 17: Bayesian Belief Network

Remember - Joint distribution


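$$P(\mathit{cavity} \mid \mathit{toothache}) = \frac{P(\mathit{cavity} \wedge \mathit{toothache})}{P(\mathit{toothache})} = \frac{0.108 + 0.012}{0.108 + 0.012 + 0.016 + 0.064} = 0.6$$

$$P(\neg \mathit{cavity} \mid \mathit{toothache}) = \frac{P(\neg \mathit{cavity} \wedge \mathit{toothache})}{P(\mathit{toothache})} = \frac{0.016 + 0.064}{0.108 + 0.012 + 0.016 + 0.064} = 0.4$$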

Page 18: Bayesian Belief Network

Normalization

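$$P(y \mid x) + P(\neg y \mid x) = 1$$

$$\mathbf{P}(Y \mid X) = \alpha\,\mathbf{P}(X \mid Y)\,\mathbf{P}(Y)$$

$$\langle P(y \mid x), P(\neg y \mid x) \rangle = \alpha \langle 0.12, 0.08 \rangle = \langle 0.6, 0.4 \rangle$$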

Page 19: Bayesian Belief Network

Normalization

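• X is the query variable
• E is the set of evidence variables
• Y is the set of remaining unobserved variables
• Summation over all possible y (all possible combinations of values of the unobserved variables Y)

$$\mathbf{P}(\mathit{Cavity} \mid \mathit{toothache}) = \alpha\,\mathbf{P}(\mathit{Cavity}, \mathit{toothache})$$

$$= \alpha\,[\mathbf{P}(\mathit{Cavity}, \mathit{toothache}, \mathit{catch}) + \mathbf{P}(\mathit{Cavity}, \mathit{toothache}, \neg \mathit{catch})]$$

$$= \alpha\,[\langle 0.108, 0.016 \rangle + \langle 0.012, 0.064 \rangle] = \alpha \langle 0.12, 0.08 \rangle = \langle 0.6, 0.4 \rangle$$

$$\mathbf{P}(X \mid e) = \alpha\,\mathbf{P}(X, e) = \alpha \sum_{y} \mathbf{P}(X, e, y)$$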
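The role of the constant α can be made concrete in a few lines. The sketch below sums out the hidden variable Catch and then normalizes; the function name `normalize` and the list layout (cavity first, then not cavity) are illustrative assumptions.

```python
def normalize(values):
    """Scale non-negative numbers so that they sum to 1 (the role of alpha)."""
    total = sum(values)
    return [v / total for v in values]


# P(Cavity, toothache, catch) and P(Cavity, toothache, not catch),
# each listed as <cavity, not cavity>.
with_catch = [0.108, 0.016]
without_catch = [0.012, 0.064]

# Sum out the hidden variable Catch, then normalize.
unnormalized = [p + q for p, q in zip(with_catch, without_catch)]
posterior = normalize(unnormalized)
print([round(p, 3) for p in posterior])  # [0.6, 0.4]
```

Note that P(toothache) never has to be computed separately: it is exactly the sum of the unnormalized entries.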

Page 20: Bayesian Belief Network

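Query: P(Burglary | JohnCalls = true, MaryCalls = true)

• The hidden variables of the query are Earthquake and Alarm

$$\mathbf{P}(B \mid j, m) = \alpha\,\mathbf{P}(B, j, m) = \alpha \sum_{e} \sum_{a} \mathbf{P}(B, e, a, j, m)$$

• For Burglary = true in the Bayesian network:

$$P(b \mid j, m) = \alpha \sum_{e} \sum_{a} P(b)\,P(e)\,P(a \mid b, e)\,P(j \mid a)\,P(m \mid a)$$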

Page 21: Bayesian Belief Network

To compute this expression we had to add four terms, each computed by multiplying five numbers

In the worst case, where we have to sum out almost all variables, the complexity of inference in a network with n Boolean variables is O(n · 2^n)

Page 22: Bayesian Belief Network

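P(b) is a constant and can be moved outside both summations; the P(e) term can be moved outside the summation over a:

$$P(b \mid j, m) = \alpha\,P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\,P(j \mid a)\,P(m \mid a)$$

$$\mathbf{P}(B \mid j, m) = \alpha \langle 0.00059224, 0.0014919 \rangle \approx \langle 0.284, 0.716 \rangle$$

Given JohnCalls = true and MaryCalls = true, the probability that a burglary has occurred is about 28%.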
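The query can be reproduced by brute-force enumeration over the hidden variables Earthquake and Alarm. This is a minimal sketch using the CPT values of the burglary network; `bern`, `joint` and `posterior_burglary` are hypothetical helper names.

```python
from itertools import product

# CPTs of the burglary network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls = true | Alarm)


def bern(p_true, value):
    """Probability of a Boolean `value` when P(value = true) is `p_true`."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Full joint entry as the product of the five CPT entries."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def posterior_burglary(j, m):
    """Return (unnormalized, normalized) values of P(Burglary | j, m),
    listed as <burglary, no burglary>, summing out Earthquake and Alarm."""
    unnormalized = []
    for b in (True, False):
        total = sum(joint(b, e, a, j, m)
                    for e, a in product((True, False), repeat=2))
        unnormalized.append(total)
    alpha = 1.0 / sum(unnormalized)
    return unnormalized, [alpha * u for u in unnormalized]


unnormalized, posterior = posterior_burglary(j=True, m=True)
print(unnormalized)
print([round(p, 3) for p in posterior])  # [0.284, 0.716]
```

Each of the two entries is a sum of four terms, each a product of five numbers, which is the cost discussed for this query.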

Page 23: Bayesian Belief Network

Computation for Burglary=true


Page 24: Bayesian Belief Network

Variable elimination algorithm

• Eliminate repeated calculation

• Dynamic programming
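The idea of storing intermediate results can be sketched on the burglary query P(Burglary | j, m). The code below is not a general variable elimination implementation; it is a hand-ordered sketch in that spirit, assuming the CPT values of the burglary network and evaluating the expression from the innermost factor outward so that each intermediate table is computed once. All function and variable names are hypothetical.

```python
# CPTs of the burglary network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls = true | Alarm)
BOOL = (True, False)


def bern(p_true, value):
    """Probability of a Boolean `value` when P(value = true) is `p_true`."""
    return p_true if value else 1.0 - p_true


def posterior_burglary_factored(j=True, m=True):
    # Factor over Alarm only: f_jm(a) = P(j | a) P(m | a). It does not depend
    # on b or e, so it is stored once instead of recomputed per (b, e) pair.
    f_jm = {a: bern(P_J[a], j) * bern(P_M[a], m) for a in BOOL}

    # Sum out Alarm: g(b, e) = sum_a P(a | b, e) f_jm(a).
    g = {(b, e): sum(bern(P_A[(b, e)], a) * f_jm[a] for a in BOOL)
         for b in BOOL for e in BOOL}

    # Sum out Earthquake and multiply by the prior: h(b) = P(b) sum_e P(e) g(b, e).
    h = {b: bern(P_B, b) * sum(bern(P_E, e) * g[(b, e)] for e in BOOL)
         for b in BOOL}

    alpha = 1.0 / (h[True] + h[False])
    return {b: alpha * h[b] for b in BOOL}


result = posterior_burglary_factored()
print(round(result[True], 3), round(result[False], 3))  # 0.284 0.716
```

The stored tables `f_jm`, `g` and `h` play the role of the intermediate factors; a general algorithm would choose the elimination order and build such tables automatically.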


Page 25: Bayesian Belief Network

Irrelevant variables

• (X query variable, E evidence variables)


Page 26: Bayesian Belief Network

Complexity of exact inference

• The burglary network belongs to a family of networks in which there is at most one undirected path between any two nodes in the network
• These are called singly connected networks or polytrees
• The time and space complexity of exact inference in polytrees is linear in the size of the network
• Size is defined by the number of CPT entries
• If the number of parents of each node is bounded by a constant, then the complexity is also linear in the number of nodes


Page 27: Bayesian Belief Network

For multiply connected networks, variable elimination can have exponential time and space complexity


Page 28: Bayesian Belief Network

Constructing Bayesian Networks

A Bayesian network is a correct representation of the domain only if each node is conditionally independent of its predecessors in the ordering, given its parents

P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)

Page 29: Bayesian Belief Network

Conditional Independence relations in Bayesian networks

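The topological semantics is given by either of two specifications, in terms of DESCENDANTS or of the MARKOV BLANKET: a node is conditionally independent of its non-descendants given its parents, and of all other nodes given its Markov blanket (its parents, children, and children's parents)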

Page 30: Bayesian Belief Network

Local semantics


Page 31: Bayesian Belief Network

Example

JohnCalls is independent of Burglary and Earthquake given the value of Alarm


Page 32: Bayesian Belief Network


Page 33: Bayesian Belief Network

Example

Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake


Page 34: Bayesian Belief Network

Constructing Bayesian networks

1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n
   • add Xi to the network
   • select parents from X1, …, Xi-1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)

This choice of parents guarantees:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) \quad \text{(chain rule)}$$

$$= \prod_{i=1}^{n} P(X_i \mid \mathit{Parents}(X_i)) \quad \text{(by construction)}$$

Page 35: Bayesian Belief Network

• The compactness of Bayesian networks is an example of locally structured systems
  - Each subcomponent interacts directly with only a bounded number of other components
• Constructing Bayesian networks is difficult
  - Each variable should be directly influenced by only a few others
  - The network topology reflects these direct influences

Page 36: Bayesian Belief Network

Example

Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?

Page 37: Bayesian Belief Network

Example

Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)?
P(B | A, J, M) = P(B)?

Page 38: Bayesian Belief Network

Example

Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)?
P(E | B, A, J, M) = P(E | A, B)?

Page 39: Bayesian Belief Network

Example

Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes

Page 40: Bayesian Belief Network

Example contd.

• Deciding conditional independence is hard in noncausal directions (causal models and conditional independence seem hardwired for humans!)
• Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
• Some links represent tenuous relationships that require difficult and unnatural probability judgments, such as the probability of Earthquake given Burglary and Alarm
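The construction procedure for the ordering M, J, A, B, E can be simulated numerically. The sketch below is an illustration under stated assumptions: it builds the full joint distribution from the causal burglary network, then for each variable searches for the smallest set of predecessors that makes it conditionally independent of the rest, using a numerical tolerance. The names `minimal_parents`, `cond` and `prob` are hypothetical, and the brute-force search is only practical for tiny networks.

```python
from itertools import combinations, product

# CPTs of the causal burglary network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
BOOL = (True, False)


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(w):
    """Joint probability of a complete assignment w (dict: name -> bool)."""
    return (bern(P_B, w["B"]) * bern(P_E, w["E"])
            * bern(P_A[(w["B"], w["E"])], w["A"])
            * bern(P_J[w["A"]], w["J"]) * bern(P_M[w["A"]], w["M"]))


WORLDS = [dict(zip("BEAJM", values)) for values in product(BOOL, repeat=5)]


def prob(fixed):
    """Marginal probability of a partial assignment."""
    return sum(joint(w) for w in WORLDS
               if all(w[k] == v for k, v in fixed.items()))


def cond(var, given):
    """P(var = true | given)."""
    return prob({var: True, **given}) / prob(given)


def minimal_parents(var, predecessors, tol=1e-9):
    """Smallest subset S of predecessors with P(var | predecessors) = P(var | S)."""
    for size in range(len(predecessors) + 1):
        for subset in combinations(predecessors, size):
            matches = True
            for values in product(BOOL, repeat=len(predecessors)):
                full = dict(zip(predecessors, values))
                part = {k: full[k] for k in subset}
                if abs(cond(var, full) - cond(var, part)) > tol:
                    matches = False
                    break
            if matches:
                return list(subset)


ordering = ["M", "J", "A", "B", "E"]
parents = {x: minimal_parents(x, ordering[:i]) for i, x in enumerate(ordering)}
numbers_needed = sum(2 ** len(p) for p in parents.values())
print(parents)
print(numbers_needed)  # 13
```

The recovered parent sets correspond to the Yes/No answers for this ordering, and the parameter count reproduces 1 + 2 + 4 + 2 + 4 = 13.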

Page 41: Bayesian Belief Network


Page 42: Bayesian Belief Network


Page 43: Bayesian Belief Network

Learning Bayesian Networks

How to fill in the entries of a Conditional Probability Table

Case 1: If the structure of the Bayesian network is known and all variables can be observed in the training set, then entry (i, j) = P(y_i | Predecessors(Y_i)), estimated from the values observed in the training set.

Case 2: If the structure of the Bayesian network is known and some of the variables cannot be observed in the training set, then the gradient ascent method is used.

Page 44: Bayesian Belief Network

Example: Case 1

Person  FH   S    E    LC   PXRay  D
P1      Yes  Yes  No   Yes  +      Yes
P2      Yes  No   No   Yes  -      Yes
P3      Yes  No   Yes  No   +      No
P4      No   Yes  Yes  Yes  -      Yes
P5      No   Yes  No   No   +      No
P6      Yes  Yes  ?    ?    ?      ?

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC      0.5
~LC

P(LC = Yes | FH = Yes, S = Yes) = 0.5 = P(y_i | Predecessors(Y_i))

Nodes (figure): FamilyHistory, Smoker, LungCancer, Emphysema
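For the fully observed case, a CPT entry can be estimated as a relative frequency in the training set. The sketch below is an assumption about how such counting might be coded, using only rows P1 to P5 and the columns relevant to LungCancer; `estimate_cpt` is a hypothetical name. With the rows as transcribed here, only P1 matches FamilyHistory = yes and Smoker = yes, so a plain count gives 1.0 for that column rather than the 0.5 shown on the slide; the slide may rely on a table layout or a smoothing step that did not survive extraction.

```python
from collections import Counter

# Fully observed training rows P1..P5, reduced to the columns that matter
# for LungCancer: (FamilyHistory, Smoker, LungCancer).
ROWS = [
    (True, True, True),    # P1
    (True, False, True),   # P2
    (True, False, False),  # P3
    (False, True, True),   # P4
    (False, True, False),  # P5
]


def estimate_cpt(rows):
    """Relative-frequency estimate of P(LC = true | FH, S) for every
    parent combination (FH, S) that occurs in the data."""
    seen = Counter((fh, s) for fh, s, _ in rows)
    positive = Counter((fh, s) for fh, s, lc in rows if lc)
    return {parents: positive[parents] / n for parents, n in seen.items()}


cpt = estimate_cpt(ROWS)
for parents, p in sorted(cpt.items(), reverse=True):
    print(parents, p)
```

Parent combinations that never occur in the data, here (no family history, non-smoker), get no estimate at all from plain counting, which is one reason smoothed estimates are often preferred in practice.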

Page 45: Bayesian Belief Network

Example: Case 2

• Suppose the structure is known and the variables are only partially observable
• Similar to training a neural network with hidden units
• In fact, the network's conditional probability tables can be learned using gradient ascent

Person  FH   S    E    LC   PXRay  D
P1      ---  Yes  ---  Yes  +      Yes
P2      ---  No   ---  Yes  -      Yes
P3      ---  No   ---  No   +      No
P4      ---  Yes  ---  Yes  -      Yes
P5      ---  Yes  ---  No   +      No
P6      Yes  Yes  ?    ?    ?      ?
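One commonly presented form of the gradient ascent update for this setting is sketched below. The notation is an assumption and does not come from the slides: w_ijk denotes the CPT entry P(Y_i = y_ij | U_i = u_ik), where U_i are the parents of Y_i, η is a small learning rate, D is the training set, and P_h is the probability computed by the current network h.

```latex
% Gradient of the log-likelihood with respect to one CPT entry
\frac{\partial \ln P(D \mid h)}{\partial w_{ijk}}
  = \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}

% Gradient ascent step
w_{ijk} \leftarrow w_{ijk}
  + \eta \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}

% After each step, renormalize so that the entries remain probabilities
\sum_{j} w_{ijk} = 1, \qquad 0 \le w_{ijk} \le 1
```

The quantities P_h(y_ij, u_ik | d) have to be obtained by inference in the current network for each training example d, since the unobserved variables are not available in the data.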

Page 46: Bayesian Belief Network

Summary

Bayesian networks provide a natural representation for (causally induced) conditional independence

Topology + CPTs = compact representation of joint distribution

Generally easy for domain experts to construct

Page 47: Bayesian Belief Network
Page 48: Bayesian Belief Network
Page 49: Bayesian Belief Network
Page 50: Bayesian Belief Network
Page 51: Bayesian Belief Network

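→ P(d | a, b, c) = P(d | a, c) = 0.66

→

$$P(b \mid a, c, d) = \alpha \sum_{c} P(a)\,P(b)\,P(c \mid a, b)\,P(d \mid a, c)$$

$$P(b \mid a, c, d) = \alpha\,P(a)\,P(b) \sum_{c} P(c \mid a, b)\,P(d \mid a, c)$$

$$\mathbf{P}(B \mid a, c, d) = \alpha \langle 0.05, 0.075 \rangle = \langle 0.4, 0.6 \rangle$$

$$P(b \mid a, c, d) = 0.6$$

$$P(d \mid a, b, c) = \alpha\,P(a)\,P(b)\,P(c \mid a, b)\,P(d \mid a, c)$$

$$\mathbf{P}(D \mid a, b, c) = \alpha \langle 0.0825, 0.0425 \rangle = \langle 0.66, 0.34 \rangle$$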

Page 52: Bayesian Belief Network

• Bayesian networks
• Conditional Independence
• Inference in Bayesian Networks
• Irrelevant variables
• Constructing Bayesian Networks
• Learning Bayesian Networks
• Examples - Exercises

Page 53: Bayesian Belief Network

ID3 decision tree