2014_0287_bayes networks.ppt


1. Bayesian Networks
2. Conditional Independence
3. Creating Tables
4. Notations for Bayesian Networks
5. Calculating conditional probabilities from the tables
6. Calculating conditional independence
7. Markov Chain Monte Carlo
8. Markov Models
9. Markov Models and probabilistic methods in vision

Used in Spring 2012, Spring 2013, Winter 2014 (partially)

Introduction to Probabilistic Robotics

1. Probabilities
2. Bayes rule
3. Bayes filters
4. Bayes networks (new)
5. Markov Chains (next)

Bayesian Networks and Markov Models

• Bayesian networks and Markov models have many, many more applications, for instance:
1. Applications in User Modeling
2. Applications in Natural Language Processing
3. Applications in robotic control
4. Applications in robot Vision

Bayesian Networks (BNs) – Overview

• Introduction to BNs
– Nodes, structure and probabilities
– Reasoning with BNs
– Understanding BNs

• Extensions of BNs
– Decision Networks
– Dynamic Bayesian Networks (DBNs)

Definition of Bayesian Networks

• A data structure that represents the dependence between variables

• Gives a concise specification of the joint probability distribution

• A Bayesian Network is a directed acyclic graph (DAG) in which the following holds:
1. A set of random variables makes up the nodes in the network
2. A set of directed links connects pairs of nodes
3. Each node has a probability distribution that quantifies the effects of its parents


Conditional Independence

The relationship between conditional independence and BN structure is important for understanding how BNs work


Conditional Independence – Causal Chains

• Causal chains give rise to conditional independence

• Example: “Smoking causes cancer, which causes dyspnoea”

$$P(C \mid A, B) = P(C \mid B)$$

[Diagram: causal chain A → B → C; e.g. smoking → cancer → dyspnoea]

Conditional Independence – Common Causes

• Common Causes (or ancestors) also give rise to conditional independence

Example: “Cancer is a common cause of the two symptoms: a positive X-ray and dyspnoea”

$$P(C \mid A, B) = P(C \mid B) \;\Rightarrow\; (A \text{ indep } C) \mid B$$

[Diagram: common cause B with children A and C; e.g. cancer → X-ray, cancer → dyspnoea]

I have dyspnoea (C) because of cancer (B), so I do not need an X-ray test (A)

Conditional Dependence – Common Effects

• Common effects (or their descendants) give rise to conditional dependence

• Example: “Cancer is a common effect of pollution and smoking.” Given cancer, smoking “explains away” pollution

$$P(A \mid C, B) \neq P(A \mid B) \;\Rightarrow\; \neg\,(A \text{ indep } C \mid B)$$

[Diagram: common effect B with parents A and C; e.g. pollution → cancer ← smoking]

If we know that you smoke and have cancer, we do not need to assume that your cancer was caused by pollution


Joint Distributions for describing uncertain worlds

• Researchers have already found numerous and dramatic benefits of Joint Distributions for describing uncertain worlds

• Students in robotics and Artificial Intelligence have to understand problems with using Joint Distributions

• You should discover how Bayes Net methodology allows us to build Joint Distributions in manageable chunks

10

Bayes Net methodology

1. Bayesian Methods are one of the most important conceptual advances in the Machine Learning / AI field to have emerged since 1995.

2. A clean, clear, manageable language and methodology for expressing what the robot designer is certain and uncertain about

3. Already, many practical applications in medicine, factories, helpdesks; for instance:
1. P(this problem | these symptoms) // we will use P as probability
2. anomalousness of this observation
3. choosing next diagnostic test | these observations


Why do Bayesian methods matter?


Problem 1: Creating the Joint Distribution Table

• The Joint Distribution Table is an important concept: a probabilistic truth table of all combinations of the Boolean variables
• You can guess this table, or you can take data from some statistics
• You can build this table based on some partial tables
• Idea – use decision diagrams to represent these data

Use of independence while creating the tables

Wet-Sprinkler-Rain Example

[Diagram: three nodes W (wet), S (sprinkler), R (rain)]

Wet-Sprinkler-Rain Example

Problem 1: Creating the Joint Table

Our goal is to derive this table. But the same data can be stored explicitly or implicitly, not necessarily in the form of a table!

What extra assumptions can help to create this table?

Let us observe that if I know 7 of these values, the eighth is uniquely determined, since their sum is 1. So I need to guess, calculate or find $2^n - 1 = 7$ values (for n = 3 Boolean variables).

Wet-Sprinkler-Rain Example

Understanding of causation

Wet-Sprinkler-Rain Example

Sprinkler on under the condition that it rained

You need to understand causation when you create the table

Independence simplifies probabilities

Wet-Sprinkler-Rain Example

We can use these probabilities to create the table

P(S|R) = Sprinkler on under condition that it rained

S and R are independent

We use independence of variables S and R
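In symbols, the standard independence identities being used here (my notation, not copied from the slide):

$$S \text{ indep } R \;\Rightarrow\; P(S \mid R) = P(S) \quad\text{and}\quad P(S, R) = P(S)\,P(R)$$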


This first step shows the collected data

Conditional Probability Table (CPT)

[Figure: nodes W (grass is wet), R (it rained), S (sprinkler was on) with their CPTs]

Wet-Sprinkler-Rain Example

We create the CPT for S and R based on our knowledge of the problem

What about children playing or a dog urinating? Such causes are still possible; they are covered by this value 0.1

Full joint for only S and R

Wet-Sprinkler-Rain Example

Use the chain rule for probabilities

[Figure: CPT entries 0.95, 0.90, 0.90, 0.01]

Independence of S and R is used

Chain Rule for Probabilities


[Figure: the random variables and their full joint probability]

Wet-Sprinkler-Rain Example
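Written out, the factorization these slides apply is the standard chain rule, with the last step using the independence of S and R:

$$P(W, S, R) = P(W \mid S, R)\,P(S \mid R)\,P(R) = P(W \mid S, R)\,P(S)\,P(R)$$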

• You have a table
• You want to calculate some probability, e.g. P(~W)
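A minimal sketch in Python of this marginalization. The P(W | S, R) entries are the values that appear on these slides (0.95, 0.90, 0.90, 0.01, row assignment assumed); the priors for S and R are hypothetical placeholders, and all names are mine:

```python
from itertools import product

P_S, P_R = 0.2, 0.3                                  # hypothetical priors
P_W = {(True, True): 0.95, (True, False): 0.90,      # P(W=T | S, R)
       (False, True): 0.90, (False, False): 0.01}

# Build the full joint table for (S, R, W) using the chain rule
# and the independence of S and R.
joint = {}
for s, r, w in product((True, False), repeat=3):
    p = (P_S if s else 1 - P_S) * (P_R if r else 1 - P_R)
    p *= P_W[(s, r)] if w else 1 - P_W[(s, r)]
    joint[(s, r, w)] = p

# Marginalize: P(~W) is the sum of all rows where W is false.
p_not_w = sum(p for (s, r, w), p in joint.items() if not w)
print(f"P(~W) = {p_not_w:.4f}")
```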


Six numbers

Wet-Sprinkler-Rain Example

We reduced only from seven to six numbers: P(S), P(R), and the four entries P(W | S, R). Independence of S and R implies calculating fewer numbers to create the complete Joint Table for W, S and R.

Explanation of Diagrammatic Notations such as Bayes Networks

You do not need to build the complete table!!

You can build a graph of tables or nodes which correspond to certain types of tables

Wet-Sprinkler-Rain Example

This first step shows the collected data

Conditional Probability Table (CPT)

[Figure: nodes W (grass is wet), R (it rained), S (sprinkler was on) with their CPTs]

Wet-Sprinkler-Rain Example

Full joint probability

When you have this table you can modify it; you can also calculate everything!

• You have a table
• You want to calculate some probability, e.g. P(~W)

Problem 2: Calculating conditional probabilities from the Joint Distribution Table

$$P(W{=}T \mid S{=}T) \;=\; \frac{P(S{=}T,\, W{=}T)}{P(S{=}T)}$$

i.e. the probability that the grass is wet under the assumption that the sprinkler was on.

Wet-Sprinkler-Rain Example
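The same division can be done directly on the joint table. A self-contained sketch, with the same illustrative numbers and names as the joint-table sketch above:

```python
from itertools import product

P_S, P_R = 0.2, 0.3                                  # hypothetical priors
P_W = {(True, True): 0.95, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}    # P(W=T | S, R)
joint = {(s, r, w): (P_S if s else 1 - P_S) * (P_R if r else 1 - P_R)
                    * (P_W[(s, r)] if w else 1 - P_W[(s, r)])
         for s, r, w in product((True, False), repeat=3)}

p_s = sum(p for (s, r, w), p in joint.items() if s)           # P(S=T)
p_sw = sum(p for (s, r, w), p in joint.items() if s and w)    # P(S=T, W=T)
print(f"P(W=T | S=T) = {p_sw / p_s:.4f}")                     # conditional = joint / marginal
```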


We showed examples of both causal inference and diagnostic inference

Wet-Sprinkler-Rain Example

We will use this in the next slide

“Explaining Away” the facts from the table

Wet-Sprinkler-Rain Example

[Figure: the value computed here is much smaller (<<) than the one calculated earlier from this table]

Conclusions on this problem

1. The table can be used for explaining away
2. The table can be used to calculate conditional independence
3. The table can be used to calculate conditional probabilities
4. The table can be used to determine causality


Problem 3: What if S and R are dependent? Calculating conditional independence

Conditional Independence of S and R


Wet-Sprinkler-Rain Example

Diagrammatic notation for conditional independence of two variables


Wet-Sprinkler-Rain Example

[Diagram: extended to sets of variables S1, S2, S3]

Conditional Independence formalized for sets of variables

Now we will explain conditional independence
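A standard way to state this formally (my rendering, using the set names from the diagram):

$$S_1 \text{ indep } S_2 \mid S_3 \;\iff\; P(S_1 \mid S_2, S_3) = P(S_1 \mid S_3) \quad \text{for all value assignments to } S_1, S_2, S_3$$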

CLOUDY - Wet-Sprinkler-Rain Example

Example – Lung Cancer Diagnosis

Example – Lung Cancer Diagnosis
1. A patient has been suffering from shortness of breath (called dyspnoea) and visits the doctor, worried that he has lung cancer.
2. The doctor knows that other diseases, such as tuberculosis and bronchitis, are possible causes, as well as lung cancer.
3. She also knows that other relevant information includes whether or not the patient is a smoker (increasing the chances of cancer and bronchitis) and what sort of air pollution he has been exposed to.
4. A positive X-ray would indicate either TB or lung cancer.

Nodes and Values in Bayesian Networks

Q: What are the nodes to represent, and what values can they take?
A: Nodes can be discrete or continuous
• Boolean nodes – represent propositions taking binary values. Example: the Cancer node represents the proposition “the patient has cancer”
• Ordered values. Example: a Pollution node with values low, medium, high
• Integral values. Example: Age with possible values 1-120

Lung Cancer

Lung Cancer Example: Nodes and Values

Node name | Type    | Values
Pollution | Binary  | {low, high}
Smoker    | Boolean | {T, F}
Cancer    | Boolean | {T, F}
Dyspnoea  | Boolean | {T, F}
Xray      | Binary  | {pos, neg}

Example of variables as nodes in BN

Lung Cancer Example: Bayesian Network Structure


[Diagram: Pollution → Cancer ← Smoker; Cancer → Xray, Cancer → Dyspnoea]

Lung Cancer

Conditional Probability Tables (CPTs) in Bayesian Networks

After specifying the topology, we must specify the CPT for each discrete node:
1. Each row of the CPT contains the conditional probability of each node value for each possible combination of values of its parent nodes
2. Each row of the CPT must sum to 1
3. A CPT for a Boolean variable with n Boolean parents contains $2^{n+1}$ probabilities
4. A node with no parents has one row (its prior probabilities)
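A sketch of how these CPTs can be written down as Python dicts, using the network structure above. The probability values are ones commonly used for this textbook example and should be treated as illustrative:

```python
# Priors: a node with no parents has a single row.
P_pollution = {"low": 0.90, "high": 0.10}            # P(Pollution)
P_smoker = {True: 0.30, False: 0.70}                 # P(Smoker)

# P(Cancer=T | Pollution, Smoker): 2 parents -> 4 rows; with the
# complementary values this is 2^(2+1) = 8 probabilities in total.
P_cancer = {("low", True): 0.03, ("low", False): 0.001,
            ("high", True): 0.05, ("high", False): 0.02}

# One Boolean parent (Cancer) -> 2 rows each.
P_xray = {True: 0.90, False: 0.20}                   # P(Xray=pos | Cancer)
P_dysp = {True: 0.65, False: 0.30}                   # P(Dyspnoea=T | Cancer)
```

Each dict stores only P(value = T | parents); the complementary entry follows from each row summing to 1.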


Lung Cancer Example: example of CPT


[Figure: Bayesian network for cancer, with the CPT giving the probability of cancer given the states of variables P and S, e.g. Pollution = low, Smoking = true]

C = cancer, P = pollution, S = smoking, X = X-ray, D = dyspnoea

Lung Cancer

• Several small CPTs are used to create larger JDTs (joint distribution tables).

The Markov Property for Bayesian Networks

• Modelling with BNs requires assuming the Markov Property: there are no direct dependencies in the system being modelled which are not already explicitly shown via arcs

• Example: smoking can influence dyspnoea only through causing cancer

Markov’s Idea: all information is modelled by arcs

Software NETICA for Bayesian Networks and joint probabilities

Reasoning with Numbers – Using Netica software

Here are the collected data

Lung Cancer

Representing the Joint Probability Distribution: Example

$$\Pr(P{=}\text{low},\, S{=}F,\, C{=}T,\, X{=}\text{pos},\, D{=}T) \;=\; \Pr(P{=}\text{low})\,\Pr(S{=}F)\,\Pr(C{=}T \mid P{=}\text{low},\, S{=}F)\,\Pr(X{=}\text{pos} \mid C{=}T)\,\Pr(D{=}T \mid C{=}T)$$


P = pollution, S = smoking, X = X-ray, D = dyspnoea

This graph shows how we can calculate the joint probability we want from the other probabilities in the network

Lung Cancer
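Continuing the CPT sketch from a few slides back (same illustrative dicts and names), the joint probability in the equation above is just a product of CPT lookups:

```python
# Reuses P_pollution, P_smoker, P_cancer, P_xray, P_dysp from the CPT sketch.
p = (P_pollution["low"]            # Pr(P=low)
     * P_smoker[False]             # Pr(S=F)
     * P_cancer[("low", False)]    # Pr(C=T | P=low, S=F)
     * P_xray[True]                # Pr(X=pos | C=T)
     * P_dysp[True])               # Pr(D=T | C=T)
print(f"Pr(P=low, S=F, C=T, X=pos, D=T) = {p:.3e}")
```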

Problem 4: Determining Causality and Bayes Nets: Advertisement example

Causality and Bayes Nets: Advertisement example

• Bayes nets allow one to learn about causal relationships
• One more example: marketing analysts want to know whether to increase, decrease or leave unchanged the exposure of some advertisement in order to maximize profit from the sale of some product
• Advertised (A) and Buy (B) will be variables for someone having seen the advertisement or purchased the product

Advertised-Buy Example

Causality Example
1. So we want to know the probability that B=true given that we force A=true, or A=false
2. We could do this by finding two similar populations and observing B based on A=true for one and A=false for the other
3. But it may be difficult or expensive to find such populations

Advertised-Buy Example

– Buy (B) is the variable for someone having purchased the product
– Advertised (A) is the variable for someone having seen the advertisement

How can causality be represented in a graph?

Markov Condition and Causal Markov Condition

• But how do we learn whether or not A causes B at all?

• The Markov Condition states: any node in a Bayes net is conditionally independent of its non-descendants given its parents

• The Causal Markov Condition (CMC) states: any phenomenon in a causal net is independent of its non-effects given its direct causes

Advertised-Buy Example: Advertised (A) and Buy (B)

Acyclic Causal Graph versus Bayes Net

• Thus, if we have a directed acyclic causal graph C for variables in X, then, by the Causal Markov Condition, C is also a Bayes net for the joint probability distribution of X

• The reverse is not necessarily true: a network may satisfy the Markov condition without depicting causality

Advertised-Buy Example

Causality Example: when we learn that p(b|a) and p(b|a’) are not equal

• Given the Causal Markov Condition (CMC), we can infer causal relationships from conditional (in)dependence relationships learned from the data

• Suppose we learn with high Bayesian probability that p(b|a) and p(b|a’) are not equal

• Given the CMC, there are four simple causal explanations for this (more complex ones too):

Causality Example: four causal explanations

1. B causes A: if you buy more, they have more money to advertise
2. A causes B: if they advertise more, you buy more

Causality Example: four causal explanations (continued)

3. Hidden common cause of A and B (e.g. income): in a rich country they advertise more and they buy more
4. A and B are causes for data selection (a.k.a. selection bias; perhaps the database didn’t record false instances of A and B): if you increase information about Ad in the database, then you also increase information about Buy in the database

Causality Example continued

• But we still don’t know if A causes B

• Suppose:
– We learn about the Income (I) and geographic Location (L) of the purchaser
– And we learn with high Bayesian probability the network on the right

Advertised-Buy Example: Advertised (A = Ad) and Buy (B)

Causality Example - using CMC

• Given the Causal Markov Condition CMC, the ONLY causal explanation for the conditional (in)dependence relationships encoded in the Bayes net is that Ad is a cause for Buy

• That is, none of the other relationships or combinations thereof produce the probabilistic relationships encoded here

Advertised-Buy Example: Advertised (Ad) and Buy (B)

Causality in Bayes Networks

• Thus, Bayes Nets allow inference of causal relationships by the Causal Markov Condition (CMC)

Problem 5: Determine D-separation in Bayesian Networks

D-separation in Bayesian Networks

• We will formulate a graphical criterion of conditional independence

• We can determine whether a set of nodes X is independent of another set Y, given a set of evidence nodes E, via the Markov property: if every undirected path from a node in X to a node in Y is d-separated by E, then X and Y are conditionally independent given E

Determining D-separation (cont)

[Diagram: the three path patterns: chain, common cause, common effect]

• A set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E

• A path is blocked given a set of nodes E if there is a node Z on the path for which one of three conditions holds:
1. Z is in E and Z has one arrow on the path leading in and one arrow out (chain)
2. Z is in E and Z has both path arrows leading out (common cause)
3. Neither Z nor any descendant of Z is in E, and both path arrows lead into Z (common effect)
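The three blocking rules translate almost directly into code. Below is a minimal sketch in Python for single query nodes (extending to sets of nodes means checking every pair); the graph encoding (a dict mapping each node to its list of parents) and all names are my own illustrative choices:

```python
def d_separated(x, y, evidence, parents):
    """True if every undirected path from node x to node y is blocked,
    given the set `evidence`, per the three rules above."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)

    def descendants(z):
        out, stack = set(), [z]
        while stack:
            for c in children[stack.pop()]:
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def paths(a, visited):
        if a == y:
            yield [a]
            return
        for nxt in parents[a] + children[a]:          # undirected step
            if nxt not in visited:
                for rest in paths(nxt, visited | {nxt}):
                    yield [a] + rest

    def blocked(path):
        for i in range(1, len(path) - 1):
            prev, z, nxt = path[i - 1], path[i], path[i + 1]
            in_prev = prev in parents[z]              # edge prev -> z
            in_next = nxt in parents[z]               # edge nxt -> z
            if in_prev and in_next:                   # common effect (rule 3)
                if z not in evidence and not (descendants(z) & evidence):
                    return True
            elif z in evidence:                       # chain / common cause
                return True
        return False

    return all(blocked(p) for p in paths(x, {x}))


# The classic alarm network as a quick check:
parents = {"Burglary": [], "Earthquake": [],
           "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}
print(d_separated("Burglary", "Earthquake", set(), parents))      # True
print(d_separated("Burglary", "Earthquake", {"Alarm"}, parents))  # False ("explaining away")
```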

Another Example of Bayesian Networks: Alarm

• Let us draw a BN from these data

Alarm Example


Bayes Net Corresponding to the Alarm-Burglar Problem

Alarm Example

Compactness, Global Semantics, Local Semantics and Markov Blanket

• Compactness of Bayes Net

Alarm Example

[Diagram: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls]
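The compactness argument here is presumably the standard one: a Boolean node with k parents needs $2^k$ rows, so a network of n Boolean nodes with at most k parents each needs on the order of $n \cdot 2^k$ numbers instead of the $2^n - 1$ required by the full joint table. For this alarm network:

$$1 + 1 + 4 + 2 + 2 = 10 \ \text{numbers} \qquad \text{vs.} \qquad 2^5 - 1 = 31$$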

Global Semantics, Local Semantics and Markov Blanket for BNs

• Useful concepts

Alarm Example
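The global semantics referred to here is standardly written as follows (my rendering): the full joint distribution is the product of the local conditional distributions,

$$P(x_1, \ldots, x_n) \;=\; \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{parents}(X_i)\bigr)$$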


A node’s Markov blanket consists of:
1. its parents
2. its children
3. its children’s other parents

Problem 6: How to Systematically Build a Bayes Network (Example)


Alarm Example

[Figures: the alarm network is built step by step; when we discover a direct dependence, we add an arrow]

Alarm Example

Bayes Net for the car that does not want to start


Such networks can be used for robot diagnostics, or for diagnosis of a human performed by a robot

Inference in Bayes Nets and how to simplify it


Alarm Example

First method of simplification: Enumeration

Alarm Example
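A compact sketch of inference by enumeration on the burglary-alarm network: sum the full joint over the hidden variables, then normalize. The CPT values are the standard textbook numbers for this example, and all helper names are mine:

```python
import itertools

CPT = {"B": {(): 0.001}, "E": {(): 0.002},
       "A": {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001},
       "J": {(True,): 0.90, (False,): 0.05},
       "M": {(True,): 0.70, (False,): 0.01}}       # P(var=T | parent values)
PARENTS = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
ORDER = ["B", "E", "A", "J", "M"]

def prob(var, value, event):
    p = CPT[var][tuple(event[u] for u in PARENTS[var])]
    return p if value else 1.0 - p

def enumerate_ask(query, evidence):
    """P(query=True | evidence), by brute-force summation of the joint."""
    dist = {}
    for qv in (True, False):
        hidden = [v for v in ORDER if v != query and v not in evidence]
        total = 0.0
        for vals in itertools.product((True, False), repeat=len(hidden)):
            event = dict(evidence, **dict(zip(hidden, vals)), **{query: qv})
            p = 1.0
            for v in ORDER:                        # product over all nodes
                p *= prob(v, event[v], event)
            total += p
        dist[qv] = total
    return dist[True] / (dist[True] + dist[False])

# With these numbers this prints approximately 0.284:
print(enumerate_ask("B", {"J": True, "M": True}))  # P(Burglary | John, Mary call)
```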


Second method: Variable Elimination


Alarm Example

Variable A was eliminated

Variable E was eliminated

Polytrees are better

3SAT Example (exact inference in general BNs is NP-hard, shown by reduction from 3SAT; in polytrees, inference is linear in the network size)


IDEA: Convert DAG to polytrees

Clustering is used to convert non-polytree BNs


[Figures: the left network is not a polytree; the right network is a polytree]

Alarm Example

EXAMPLE: Clustering is used to convert non-polytree BNs

• Approximate Inference:
1. Direct sampling methods
2. Rejection sampling
3. Likelihood weighting
4. Markov chain Monte Carlo


1. Direct Sampling Methods


Direct Sampling generates minterms with their probabilities

Wet-Sprinkler-Rain Example (W = wet, C = cloudy, R = rain, S = sprinkler)

[Figures: a step-by-step sampling walk through the network. We start from the top: Cloudy is sampled first (Cloudy = yes), then Sprinkler (sprinkler = no), then Rain and Wet from their CPTs. We generated a sample minterm C S’ R W.]
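A minimal sketch of direct (prior) sampling on this cloudy-sprinkler-rain-wet network. The CPT numbers are the usual textbook values for this example, not necessarily the ones on the slides:

```python
import random

def sample():
    """Sample one complete assignment (a 'minterm'), top-down."""
    c = random.random() < 0.5                      # P(C=T) = 0.5
    s = random.random() < (0.1 if c else 0.5)      # P(S=T | C)
    r = random.random() < (0.8 if c else 0.2)      # P(R=T | C)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    w = random.random() < p_w                      # P(W=T | S, R)
    return {"C": c, "S": s, "R": r, "W": w}

# Frequencies over many samples estimate any joint or marginal probability.
samples = [sample() for _ in range(100_000)]
print(f"Estimated P(W=T) ~= {sum(x['W'] for x in samples) / len(samples):.3f}")
```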

2. Rejection Sampling Methods


Rejection Sampling

• Reject inconsistent samples

Wet-Sprinkler-Rain Example
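A sketch of the idea, reusing the sample() function from the direct-sampling sketch above (so the same illustrative CPT numbers): keep only the samples consistent with the evidence, then count:

```python
def rejection_estimate(query, evidence, n=100_000):
    """Estimate P(query=T | evidence) by discarding inconsistent samples."""
    kept = [x for x in (sample() for _ in range(n))
            if all(x[k] == v for k, v in evidence.items())]
    return sum(x[query] for x in kept) / len(kept)

print(f"P(R=T | W=T) ~= {rejection_estimate('R', {'W': True}):.3f}")
```

The weakness is visible in the code: as the evidence gets more specific, almost all samples are rejected.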

3. Likelihood Weighting Methods


Wet-Sprinkler-Rain Example (W = wet, C = cloudy, R = rain, S = sprinkler)

[Figures: a step-by-step likelihood-weighting walk through the same network: the evidence variables are fixed to their observed values, and each sample accumulates a weight equal to the likelihood of that evidence.]
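A sketch of likelihood weighting on the same network (illustrative CPT numbers as before; the names are mine). Evidence variables are never sampled; instead each sample is weighted by the probability of the evidence values it was forced to take:

```python
import random

P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}    # P(W=T | S, R)

def weighted_sample(evidence):
    event, weight = {}, 1.0
    for var in ["C", "S", "R", "W"]:                 # topological order
        if var == "C":   p = 0.5
        elif var == "S": p = 0.1 if event["C"] else 0.5
        elif var == "R": p = 0.8 if event["C"] else 0.2
        else:            p = P_W[(event["S"], event["R"])]
        if var in evidence:                          # fix evidence, update weight
            event[var] = evidence[var]
            weight *= p if evidence[var] else 1.0 - p
        else:                                        # sample non-evidence vars
            event[var] = random.random() < p
    return event, weight

def likelihood_weighting(query, evidence, n=100_000):
    num = den = 0.0
    for _ in range(n):
        event, w = weighted_sample(evidence)
        den += w
        if event[query]:
            num += w
    return num / den

print(f"P(R=T | S=T, W=T) ~= {likelihood_weighting('R', {'S': True, 'W': True}):.3f}")
```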

Likelihood weighting vs. rejection sampling

1. Both generate consistent estimates of the joint distribution conditioned on the values of the evidence variables
2. Likelihood weighting converges faster to the correct probabilities
3. But even likelihood weighting degrades with many evidence variables, because a few samples will have nearly all the total weight


Sources

Prof. David Page, Matthew G. Lee, Nuria Oliver, Barbara Rosario, Alex Pentland, Ehrlich Av, Ronald J. Williams, Andrew Moore (tutorial with the same title), Russell & Norvig’s AIMA site, Alpaydin’s Introduction to Machine Learning site
