Uncertainty (Chapter 13, Russell & Norvig)

Introduction to Artificial Intelligence
CS 150
Lecture 14

CS151, Spring 2007. Modified from slides by © David Kriegman, 2001


Administration

• The last programming assignment will be handed out later this week.
• I am doing probability today because we want you to implement a Naïve Bayes spam filter.
• To understand a Naïve Bayes classifier, you need to understand Bayes.
• To understand Bayes, you have to understand probability… so neural nets will have to wait until next week…

This Lecture

• Probability
• Syntax
• Semantics
• Inference rules
• How to build a Naïve Bayes classifier

Reading: Chapter 13

The real world: Things go wrong

Consider a plan for changing a tire after getting a flat, using the operators RemoveTire(x), PutOnTire(x), InflateTire(x).

• Incomplete information
  – Unknown preconditions, e.g., Is the spare actually intact?
  – Disjunctive effects, e.g., inflating a tire with a pump may cause the tire to inflate, or a slow hiss, or the tire may burst, or …
• Incorrect information
  – Current state incorrect, e.g., spare NOT intact
  – Missing/incorrect postconditions (effects) in operators
• Qualification problem:
  – can never finish listing all the required preconditions and possible conditional outcomes of actions

Possible Solutions

• Conditional planning
  – Plan to obtain information (observation actions)
  – Subplan for each contingency, e.g.,
      [Check(Tire1),
       [IF Intact(Tire1)
        THEN [Inflate(Tire1)]
        ELSE [CallAAA]]]
  – Expensive because it plans for many unlikely cases
• Monitoring/Replanning
  – Assume normal states, outcomes
  – Check progress during execution, replan if necessary
  – Unanticipated outcomes may lead to failure (e.g., no AAA card)
• In general, some monitoring is unavoidable

Dealing with Uncertainty Head-on: Probability

• Let action At = leave for the airport t minutes before the flight from Lindbergh Field
• Will At get me there on time?

Problems:
1. Partial observability (road state, other drivers' plans, etc.)
2. Noisy sensors (traffic reports)
3. Uncertainty in action outcomes (turn key, car doesn’t start, etc.)
4. Immense complexity of modeling and predicting traffic

Hence a purely logical approach either
1) risks falsehood: “A90 will get me there on time,” or
2) leads to conclusions that are too weak for decision making:
   “A90 will get me there on time if there's no accident on I-5 and it doesn't rain and my tires remain intact, etc., etc.”

(A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport …)

Methods for handling uncertainty

Default or nonmonotonic logic:
  Assume my car does not have a flat tire
  Assume A125 works unless contradicted by evidence
  Issues: What assumptions are reasonable? How to handle contradiction?

Rules with fudge factors:
  A90 →(0.3) get there on time
  Sprinkler →(0.99) WetGrass
  WetGrass →(0.7) Rain
  Issues: Problems with rule combination, e.g., does Sprinkler cause Rain??

Probability:
  Given the available evidence, A90 will get me there on time with probability 0.95

(Fuzzy logic handles degree of truth, NOT uncertainty, e.g., WetGrass is true to degree 0.2)

Fuzzy Logic in Real World

Probability

Probabilistic assertions summarize effects of
  Ignorance: lack of relevant facts, initial conditions, etc.
  Laziness: failure to enumerate exceptions, qualifications, etc.

Subjective or Bayesian probability:
  Probabilities relate propositions to one's own state of knowledge
  e.g., P(A90 succeeds | no reported accidents) = 0.97

These are NOT assertions about the world, but represent belief about whether the assertion is true.

Probabilities of propositions change with new evidence:
  e.g., P(A90 | no reported accidents, 5 a.m.) = 0.99

(Analogous to logical entailment status; i.e., does KB |= α)

Making decisions under uncertainty

• Suppose I believe the following:
    P(A30 gets me there on time | ...) = 0.05
    P(A60 gets me there on time | ...) = 0.70
    P(A100 gets me there on time | ...) = 0.95
    P(A1440 gets me there on time | ...) = 0.9999
• Which action to choose?
• Depends on my preferences for missing flight vs. airport cuisine, etc.
• Utility theory is used to represent and infer preferences
• Decision theory = utility theory + probability theory
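As a rough illustration of "decision theory = utility theory + probability theory", the sketch below combines the four on-time probabilities from the slide with made-up preferences. The utility values and the per-minute waiting cost are assumptions chosen only for illustration; they are not part of the lecture.

```python
# Probabilities of arriving on time, taken from the slide above.
p_on_time = {30: 0.05, 60: 0.70, 100: 0.95, 1440: 0.9999}

# Hypothetical preferences (NOT from the slides): utility of catching or
# missing the flight, and a cost for each minute spent waiting at the airport.
U_CATCH = 1000.0
U_MISS = 0.0
WAIT_COST_PER_MINUTE = 1.0


def expected_utility(t):
    """Expected utility of action A_t (leave t minutes before the flight)."""
    p = p_on_time[t]
    return p * U_CATCH + (1.0 - p) * U_MISS - WAIT_COST_PER_MINUTE * t


# Pick the action with the highest expected utility.
best_t = max(p_on_time, key=expected_utility)

for t in sorted(p_on_time):
    print(f"A{t}: expected utility = {expected_utility(t):.1f}")
print(f"Best action under these assumed preferences: A{best_t}")
```

With different assumed utilities the best action changes, which is the point of the slide: the probabilities alone do not determine the choice.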

Unconditional Probability

• Let A be a proposition; P(A) denotes the unconditional probability that A is true.
• Example: if Male denotes the proposition that a particular person is male, then P(Male) = 0.5 means that, without any other information, the probability of that person being male is 0.5 (a 50% chance).
• Alternatively, if a population is sampled, then 50% of the people will be male.
• Of course, with additional information (e.g., that the person is a CS151 student), the “posterior probability” will likely be different.

Axioms of probability

For any propositions A, B:
1. 0 ≤ P(A) ≤ 1
2. P(True) = 1 and P(False) = 0
3. P(A ∨ B) = P(A) + P(B) - P(A ∧ B)

de Finetti (1931): an agent who bets according to probabilities that violate these axioms can be forced to bet so as to lose money regardless of outcome.

Syntax

Similar to PROPOSITIONAL LOGIC: possible worlds are defined by an assignment of values to random variables.

Note: To make things confusing, variables have their first letter upper-case, and symbol values are lower-case.

Propositional or Boolean random variables
  e.g., Cavity (do I have a cavity?)
Include propositional logic expressions
  e.g., ¬Burglary ∨ Earthquake
Multivalued random variables
  e.g., Weather is one of <sunny, rain, cloudy, snow>
  Values must be exhaustive and mutually exclusive
A proposition is constructed by assignment of a value:
  e.g., Weather = sunny; also Cavity = true for clarity

Priors, Distribution, Joint

Prior or unconditional probabilities of propositions,
  e.g., P(Cavity) = P(Cavity=TRUE) = 0.1
        P(Weather=sunny) = 0.72
correspond to belief prior to arrival of any (new) evidence.

A probability distribution gives the probabilities of all possible values of the random variable.
  Weather is one of <sunny, rain, cloudy, snow>
  P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1)

Joint probability distribution

The joint probability distribution for a set of variables gives values for each possible assignment to all the variables.

P(Toothache, Cavity) is a 2 by 2 matrix:

                Toothache=true   Toothache=false
Cavity=true          0.04             0.06
Cavity=false         0.01             0.89

NOTE: Elements in the table sum to 1, so there are 3 independent numbers.

The importance of independence

• P(Weather, Cavity) is a 4 by 2 matrix of values:

Weather =       sunny   rain   cloudy   snow
Cavity=True     .072    .01    .008     .01
Cavity=False    .648    .09    .072     .09

• But these are independent (usually):
  – Recall that if X and Y are independent then:
  – P(X=x, Y=y) = P(X=x)*P(Y=y)

Weather is one of <sunny, rain, cloudy, snow>
P(Weather) = <0.72, 0.1, 0.08, 0.1> and P(Cavity) = .1

The importance of independence

If we compute the marginals (sums of rows or columns), we see:

Weather =       sunny   rain   cloudy   snow   marginals
Cavity=True     .072    .01    .008     .01    .1
Cavity=False    .648    .09    .072     .09    .9
marginals       .72     .1     .08      .1

Each entry is just the product of the marginals (because they’re independent)!
So, instead of using 7 numbers, we can represent these by just recording 4 numbers, 3 from Weather and 1 from Cavity:
  Weather is one of <sunny, rain, cloudy, snow>
  P(Weather) = <0.72, 0.1, 0.08, (1 - sum of the others)>
  P(Cavity) = <.1, (1 - .1)>

This can be much more efficient (more later…)
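The table can be rebuilt from the two marginal distributions alone. The sketch below is one way to check this in Python; the dictionary layout and variable names are illustrative choices, while the numbers are the ones on the slide.

```python
from itertools import product

# Marginal distributions from the slide.
p_weather = {"sunny": 0.72, "rain": 0.1, "cloudy": 0.08, "snow": 0.1}
p_cavity = {True: 0.1, False: 0.9}

# Under independence, every joint entry is the product of the marginals.
joint = {
    (w, c): p_weather[w] * p_cavity[c]
    for w, c in product(p_weather, p_cavity)
}

# Summing out one variable recovers the marginal of the other.
weather_marginal = {w: sum(joint[(w, c)] for c in p_cavity) for w in p_weather}
cavity_marginal = {c: sum(joint[(w, c)] for w in p_weather) for c in p_cavity}

for (w, c), p in joint.items():
    print(f"P(Weather={w}, Cavity={c}) = {p:.3f}")
```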

Conditional Probabilities

• Conditional or posterior probabilities
    e.g., P(Cavity | Toothache) = 0.8
    What is the probability of having a cavity given that the patient has a toothache?
    (As another example: P(Male | CS151Student) = ??)
• If we know more, e.g., dental probe catches, then we have
    P(Cavity | Toothache, Catch) = .9 (say)
• If we know even more, e.g., Cavity is also given, then we have
    P(Cavity | Toothache, Cavity) = 1
  Note: the less specific belief remains valid after more evidence arrives, but is not always useful.
• New evidence may be irrelevant, allowing simplification, e.g.,
    P(Cavity | Toothache, PadresWin) = P(Cavity | Toothache) = 0.8
  (again, this is because Cavities and Baseball are independent)

Conditional Probabilities

• Conditional or posterior probabilities
    e.g., P(Cavity | Toothache) = 0.8
    What is the probability of having a cavity given that the patient has a toothache?
• Definition of conditional probability:
    P(A|B) = P(A, B) / P(B) if P(B) ≠ 0

I.e., P(A|B) means our universe has shrunk to the one in which B is true.
In the slide's diagram, P(A|B) is the ratio of the red area to the yellow area.

Conditional Probabilities

• Conditional or posterior probabilities
    e.g., P(Cavity | Toothache) = 0.8
    What is the probability of having a cavity given that the patient has a toothache?
• Definition of conditional probability:
    P(A|B) = P(A, B) / P(B) if P(B) ≠ 0

                Toothache=true   Toothache=false
Cavity=true          0.04             0.06
Cavity=false         0.01             0.89

P(Cavity | Toothache) = .04 / (.04 + .01) = 0.8
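The definition P(A|B) = P(A, B) / P(B) can be applied directly to the table. The following is a minimal sketch; the function name and the (toothache, cavity) key order are choices made here, not notation from the slides.

```python
# Joint distribution P(Toothache, Cavity); keys are (toothache, cavity).
joint = {
    (True, True): 0.04,
    (False, True): 0.06,
    (True, False): 0.01,
    (False, False): 0.89,
}


def p_cavity_given_toothache(joint, cavity, toothache):
    """P(Cavity=cavity | Toothache=toothache) = P(both) / P(Toothache=toothache)."""
    p_evidence = sum(p for (t, _), p in joint.items() if t == toothache)
    if p_evidence == 0:
        raise ValueError("conditional probability is undefined when P(B) = 0")
    return joint[(toothache, cavity)] / p_evidence


result = p_cavity_given_toothache(joint, cavity=True, toothache=True)
print(f"P(Cavity | Toothache) = {result:.2f}")
```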

Conditional probability cont.

Definition of conditional probability:
    P(A | B) = P(A, B) / P(B) if P(B) ≠ 0

Product rule gives an alternative formulation:
    P(A, B) = P(A | B)P(B) = P(B | A)P(A)

Example:
    P(Male) = 0.5
    P(CS151Student) = 0.0037
    P(Male | CS151Student) = 0.9

What is the probability of being a male CS151 student?
    P(Male, CS151Student)
      = P(Male | CS151Student) P(CS151Student)
      = 0.9 * 0.0037 = 0.0033

Conditional probability cont.

A general version holds for whole distributions, e.g.,
    P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
(View as a 4 × 2 set of equations, not matrix multiplication. The book calls this a pointwise product.)

The chain rule is derived by successive application of the product rule:
    P(X1, …, Xn) = P(X1, …, Xn-1) P(Xn | X1, …, Xn-1)
                 = P(X1, …, Xn-2) P(Xn-1 | X1, …, Xn-2) P(Xn | X1, …, Xn-1)
                 = …
                 = ∏ (i = 1..n) P(Xi | X1, …, Xi-1)

Bayes Rule

From the product rule P(A ∧ B) = P(A|B)P(B) = P(B|A)P(A), we can obtain Bayes' rule:

    P(A | B) = P(B | A) P(A) / P(B)

Why is this useful???

For assessing diagnostic probability from causal probability:

    P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)

e.g., Cavities as the cause of toothaches
      Sleeping late as the cause for missing class
      Meningitis as the cause of a stiff neck

Bayes Rule: Example

    P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)

Let M be meningitis, S be stiff neck
    P(M) = 0.0001
    P(S) = 0.1
    P(S | M) = 0.8

    P(M | S) = P(S | M) P(M) / P(S) = (0.8 × 0.0001) / 0.1 = 0.0008

Note: posterior probability of meningitis still very small!
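The meningitis calculation is a single application of Bayes' rule. A small sketch, with a helper name chosen here for illustration:

```python
def bayes(p_effect_given_cause, p_cause, p_effect):
    """Bayes' rule: P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)."""
    if p_effect <= 0:
        raise ValueError("P(Effect) must be positive")
    return p_effect_given_cause * p_cause / p_effect


# Numbers from the slide: P(S | M) = 0.8, P(M) = 0.0001, P(S) = 0.1.
p_m_given_s = bayes(p_effect_given_cause=0.8, p_cause=0.0001, p_effect=0.1)
print(f"P(M | S) = {p_m_given_s:.4f}")
```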

Bayes Rule: Another Example

Let A be AIDS, T be positive test result for AIDS
    P(T|A) = 0.99                “true positive” rate of test (aka a “hit”)
    P(T|~A) = 0.005              “false positive” rate of test (aka a “false alarm”)
    P(~T|A) = 1 - 0.99 = .01     “false negative” or “miss”
    P(~T|~A) = 0.995             “correct rejection”

Seems like a pretty good test, right?

Bayes Rule: Another Example

Let A be AIDS, T be positive test result for AIDS
    P(T|A) = 0.99                “true positive” rate of test (aka a “hit”)
    P(T|~A) = 0.005              “false positive” rate of test (aka a “false alarm”)
    P(~T|A) = 1 - 0.99 = .01     “false negative” or “miss”
    P(~T|~A) = 0.995             “correct rejection”

Now, suppose you are from a low-risk group, so P(A) = .001, and you get a positive test result. Should you worry?
Let’s compute it using Bayes’ rule: P(A|T) = P(T|A)*P(A)/P(T)
Hmmm. Wait a minute: How do we know P(T)?
Recall there are two possibilities: Either you have AIDS or you don’t.
Therefore: P(A|T) + P(~A|T) = 1.
P(~A|T) = P(T|~A)*P(~A)/P(T)
So, P(A|T) + P(~A|T) = P(T|A)*P(A)/P(T) + P(T|~A)*P(~A)/P(T) = 1

Bayes Rule: Another Example

P(A|T) = P(T|A)*P(A)/P(T)
Hmmm. Wait a minute: How do we know P(T)?
Recall there are two possibilities: Either you have AIDS or you don’t.
Therefore: P(A|T) + P(~A|T) = 1.
Now, by Bayes’ rule:
P(~A|T) = P(T|~A)*P(~A)/P(T)
So,
P(A|T) + P(~A|T) = P(T|A)*P(A)/P(T) + P(T|~A)*P(~A)/P(T) = 1
⇒ P(T|A)*P(A) + P(T|~A)*P(~A) = P(T)
⇒ The denominator is always the sum of the numerators if they are exhaustive of the possibilities: a normalizing constant to make the probabilities sum to 1!
⇒ We can compute the numerators and then normalize afterwards!

Bayes Rule: Back to the Example

Let A be AIDS, T be positive test result for AIDS
    P(T|A) = 0.99                “true positive” rate of test (aka a “hit”)
    P(T|~A) = 0.005              “false positive” rate of test (aka a “false alarm”)
    P(~T|A) = 1 - 0.99 = .01     “false negative” or “miss”
    P(~T|~A) = 0.995             “correct rejection”

Now, suppose you are from a low-risk group, so P(A) = .001, and you get a positive test result. Should you worry?

    P(A|T) ∝ P(T|A)*P(A) = 0.99*.001 = .00099
    P(~A|T) ∝ P(T|~A)*P(~A) = 0.005*.999 = 0.004995
    P(A|T) = .00099/(.00099+.004995) = 0.165

Maybe you should worry, or maybe you should get re-tested….

Bayes Rule: Back to the Example II

Let A be AIDS, T be positive test result for AIDS
    P(T|A) = 0.99                “true positive” rate of test (aka a “hit”)
    P(T|~A) = 0.005              “false positive” rate of test (aka a “false alarm”)
    P(~T|A) = 1 - 0.99 = .01     “false negative” or “miss”
    P(~T|~A) = 0.995             “correct rejection”

Now, suppose you are from a high-risk group, so P(A) = .01, and you get a positive test result. Should you worry?

    P(A|T) ∝ P(T|A)*P(A) = 0.99*.01 = .0099
    P(~A|T) ∝ P(T|~A)*P(~A) = 0.005*.99 = 0.00495
    P(A|T) = .0099/(.0099+.00495) = 0.667

=> You should worry!
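Both versions of the example follow the same recipe: compute the two numerators, then divide by their sum. A short sketch with the test characteristics from the slides as defaults (the function and parameter names are chosen here for illustration):

```python
def posterior_given_positive(prior, p_pos_given_a=0.99, p_pos_given_not_a=0.005):
    """P(A | T) for a positive test, computed by normalizing the two numerators."""
    numerator_a = p_pos_given_a * prior                    # proportional to P(A | T)
    numerator_not_a = p_pos_given_not_a * (1.0 - prior)    # proportional to P(~A | T)
    return numerator_a / (numerator_a + numerator_not_a)   # the sum equals P(T)


low_risk = posterior_given_positive(prior=0.001)
high_risk = posterior_given_positive(prior=0.01)

print(f"low-risk group:  P(A | T) = {low_risk:.3f}")
print(f"high-risk group: P(A | T) = {high_risk:.3f}")
```

The same test gives very different posteriors because the prior P(A) differs by a factor of ten between the two groups.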


A generalized Bayes Rule

• More general version conditionalized on some background evidence E:

    P(A | B, E) = P(B | A, E) P(A | E) / P(B | E)

Full joint distributions

A complete probability model specifies every entry in the joint distribution for all the variables X = X1, ..., Xn,
i.e., a probability for each possible world X1 = x1, ..., Xn = xn.

E.g., suppose Toothache and Cavity are the random variables:

                Toothache=true   Toothache=false
Cavity=true          0.04             0.06
Cavity=false         0.01             0.89

Possible worlds are mutually exclusive ⇒ P(w1 ∧ w2) = 0
Possible worlds are exhaustive ⇒ w1 ∨ … ∨ wn is True
hence Σi P(wi) = 1

Using the full joint distribution

                Toothache=true   Toothache=false
Cavity=true          0.04             0.06
Cavity=false         0.01             0.89

What is the unconditional probability of having a Cavity?
    P(Cavity) = P(Cavity ∧ Toothache) + P(Cavity ∧ ~Toothache) = 0.04 + 0.06 = 0.1

What is the probability of having either a cavity or a Toothache?
    P(Cavity ∨ Toothache)
      = P(Cavity, Toothache) + P(Cavity, ~Toothache) + P(~Cavity, Toothache)
      = 0.04 + 0.06 + 0.01 = 0.11
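Both queries amount to summing the entries of the joint table for the worlds in which the proposition holds. The sketch below does this and also checks the result against axiom 3, P(A ∨ B) = P(A) + P(B) - P(A ∧ B). The dictionary layout is an illustrative choice.

```python
# Joint distribution P(Toothache, Cavity); keys are (toothache, cavity).
joint = {
    (True, True): 0.04,
    (False, True): 0.06,
    (True, False): 0.01,
    (False, False): 0.89,
}

# Sum the possible worlds in which each proposition is true.
p_cavity = sum(p for (t, c), p in joint.items() if c)
p_toothache = sum(p for (t, c), p in joint.items() if t)
p_either = sum(p for (t, c), p in joint.items() if t or c)
p_both = joint[(True, True)]

print(f"P(Cavity) = {p_cavity:.2f}")
print(f"P(Cavity or Toothache) = {p_either:.2f}")

# Inclusion-exclusion gives the same answer as summing the worlds directly.
inclusion_exclusion = p_cavity + p_toothache - p_both
```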

Using the full joint distribution

What is the probability of having a cavity given that you already have a toothache?

                Toothache=true   Toothache=false
Cavity=true          0.04             0.06
Cavity=false         0.01             0.89

    P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache) = 0.04 / (0.04 + 0.01) = 0.8

Normalization

Suppose we wish to compute a posterior distribution over random variable A given B=b, and suppose A has possible values a1 ... am.

We can apply Bayes' rule for each value of A:
    P(A=a1 | B=b) = P(B=b | A=a1) P(A=a1) / P(B=b)
    ...
    P(A=am | B=b) = P(B=b | A=am) P(A=am) / P(B=b)

Adding these up, and noting that Σi P(A=ai | B=b) = 1:
    P(B=b) = Σi P(B=b | A=ai) P(A=ai)

Its reciprocal is the normalization factor, denoted α = 1/P(B=b):
    P(A | B=b) = α P(B=b | A) P(A)

Typically compute an unnormalized distribution, normalize at end,
e.g., suppose P(B=b | A)P(A) = <0.4, 0.2, 0.2>
then P(A | B=b) = α <0.4, 0.2, 0.2>
                = <0.4, 0.2, 0.2> / (0.4 + 0.2 + 0.2) = <0.5, 0.25, 0.25>
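The "compute unnormalized, normalize at the end" step is a one-line operation. A minimal sketch, where `normalize` is a helper name chosen here:

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1 (multiply by alpha = 1 / sum)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    alpha = 1.0 / total
    return [alpha * w for w in weights]


# Unnormalized values P(B=b | A) P(A) from the slide.
unnormalized = [0.4, 0.2, 0.2]
posterior = normalize(unnormalized)
print([round(p, 2) for p in posterior])
```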

Marginalization

Given a conditional distribution P(X | Z), we can create the unconditional distribution P(X) by marginalization:

    P(X) = Σz P(X | Z=z) P(Z=z) = Σz P(X, Z=z)

In general, given a joint distribution over a set of variables, the distribution over any subset (called a marginal distribution for historical reasons) can be calculated by summing out the other variables.

Conditioning

Introducing a variable as an extra condition:

    P(X | Y) = Σz P(X | Y, Z=z) P(Z=z | Y)

Why is this useful??
Intuition: it is often easier to assess each specific circumstance, e.g.,

    P(RunOver | Cross)
      = P(RunOver | Cross, Light=green) P(Light=green | Cross)
      + P(RunOver | Cross, Light=yellow) P(Light=yellow | Cross)
      + P(RunOver | Cross, Light=red) P(Light=red | Cross)
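The slides give no numbers for this example, so the sketch below uses invented values purely to show the mechanics of conditioning: each case-specific probability is weighted by how likely that case is. Every number here is hypothetical.

```python
# Hypothetical values (NOT from the slides), for illustration only.
# P(Light=l | Cross): how often someone crosses on each light color.
p_light_given_cross = {"green": 0.90, "yellow": 0.08, "red": 0.02}
# P(RunOver | Cross, Light=l): risk of being run over for each light color.
p_runover_given_cross_and_light = {"green": 0.0001, "yellow": 0.001, "red": 0.01}

# Conditioning: P(RunOver | Cross) = sum over l of
#   P(RunOver | Cross, Light=l) * P(Light=l | Cross)
p_runover_given_cross = sum(
    p_runover_given_cross_and_light[light] * p_light_given_cross[light]
    for light in p_light_given_cross
)

print(f"P(RunOver | Cross) = {p_runover_given_cross:.5f}")
```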

Absolute Independence

• Two random variables A and B are (absolutely) independent iff
    P(A, B) = P(A)P(B)
• Using the product rule for A & B independent, we can show:
    P(A, B) = P(A | B)P(B) = P(A)P(B)
    Therefore P(A | B) = P(A) (when P(B) ≠ 0)
• If n Boolean variables are independent, the full joint is:
    P(X1, …, Xn) = Πi P(Xi)
  The full joint is generally specified by 2^n - 1 numbers, but when independent only n numbers are needed.
• Absolute independence is a very strong requirement, seldom met!!
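To get a feel for the difference between 2^n - 1 and n, the following sketch tabulates both counts for a few values of n. The function names are chosen here for illustration.

```python
def full_joint_parameters(n):
    """Independent numbers in a full joint over n Boolean variables: 2^n - 1."""
    return 2 ** n - 1


def independent_parameters(n):
    """Numbers needed when all n Boolean variables are independent: one P(Xi) each."""
    return n


for n in (1, 3, 10, 30):
    print(f"n = {n:2d}: full joint = {full_joint_parameters(n):>10d}, "
          f"independent = {independent_parameters(n)}")
```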

Conditional Independence

• Some evidence may be irrelevant, allowing simplification, e.g.,
    P(Cavity | Toothache, PadresWin) = P(Cavity | Toothache) = 0.8
• This property is known as Conditional Independence and can be expressed as:
    P(X | Y, Z) = P(X | Z)
  which says that X and Y are independent given Z.
• If I have a cavity, the probability that the dentist’s probe catches in it doesn't depend on whether I have a toothache:
    1. P(Catch | Toothache, Cavity) = P(Catch | Cavity)
  i.e., Catch is conditionally independent of Toothache given Cavity.
  This doesn’t say anything about P(Catch | Toothache).
• The same independence holds if I haven't got a cavity:
    2. P(Catch | Toothache, ~Cavity) = P(Catch | ~Cavity)

Conditional independence contd.

Equivalent statements to
    P(Catch | Toothache, Cavity) = P(Catch | Cavity)    (*)

1.a  P(Toothache | Catch, Cavity) = P(Toothache | Cavity)

    P(Toothache | Catch, Cavity)
      = P(Catch | Toothache, Cavity) P(Toothache | Cavity) / P(Catch | Cavity)    (Bayes' rule given Cavity)
      = P(Catch | Cavity) P(Toothache | Cavity) / P(Catch | Cavity)    (from *)
      = P(Toothache | Cavity)

1.b  P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)

    P(Toothache, Catch | Cavity)
      = P(Toothache | Catch, Cavity) P(Catch | Cavity)    (product rule)
      = P(Toothache | Cavity) P(Catch | Cavity)    (from 1.a)

Using Conditional Independence

Full joint distribution can now be written as
    P(Toothache, Catch, Cavity)
      = P(Toothache, Catch | Cavity) P(Cavity)
      = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)    (from 1.b)

Specified by: 2 + 2 + 1 = 5 independent numbers
Compared to 7 for the general joint, or 3 for unconditionally independent.
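The factorization can be tried out by building all 8 entries of the joint from 5 numbers. In the sketch below, P(Cavity) = 0.1 comes from the slides and the two Toothache values are derived from the 2 by 2 table (0.04/0.1 and 0.01/0.9). The two Catch values are assumptions, since the slides do not give them. The last step, normalizing over Cavity given both pieces of evidence, is the same pattern a Naïve Bayes classifier uses.

```python
from itertools import product

p_cavity = 0.1                                      # from the slides
p_toothache_given = {True: 0.4, False: 0.01 / 0.9}  # derived from the 2x2 table
p_catch_given = {True: 0.9, False: 0.2}             # hypothetical values


def bernoulli(p_true, value):
    """Probability that a Boolean variable with P(true) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true


# P(Toothache, Catch, Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
joint = {}
for toothache, catch, cavity in product([True, False], repeat=3):
    joint[(toothache, catch, cavity)] = (
        bernoulli(p_toothache_given[cavity], toothache)
        * bernoulli(p_catch_given[cavity], catch)
        * bernoulli(p_cavity, cavity)
    )

total = sum(joint.values())

# Check statement 1: P(Catch | Toothache, Cavity) equals P(Catch | Cavity).
p_catch_given_toothache_cavity = joint[(True, True, True)] / (
    joint[(True, True, True)] + joint[(True, False, True)]
)

# Posterior over Cavity given Toothache and Catch, by normalization.
numerators = {c: joint[(True, True, c)] for c in (True, False)}
p_cavity_given_both = numerators[True] / sum(numerators.values())

print(f"joint sums to {total:.3f}")
print(f"P(Cavity | Toothache, Catch) = {p_cavity_given_both:.3f}")
```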