Reasoning Under Uncertainty
Artificial Intelligence
CSPP 56553
February 18, 2004
Agenda
• Motivation: reasoning with uncertainty
• Medical Informatics
• Probability and Bayes’ Rule
  – Bayesian Networks
  – Noisy-Or
• Decision Trees and Rationality
• Conclusions
Uncertainty
• Search and Planning Agents
  – Assume fully observable, deterministic, static environments
• Real World:
  – Partially observable, stochastic, extremely complex
  – Probabilities capture “Ignorance & Laziness”
    • Lack of relevant facts and conditions
    • Failure to enumerate all conditions and exceptions
  – Can’t be sure of success; the agent maximizes expected outcome
  – Bayesian (subjective) probabilities relate to knowledge
Motivation
• Uncertainty in medical diagnosis
  – Diseases produce symptoms
  – In diagnosis, observed symptoms => disease ID
  – Uncertainties:
    • Symptoms may not occur
    • Symptoms may not be reported
    • Diagnostic tests are not perfect
      – False positives, false negatives
• How do we estimate confidence?
Motivation II
• Uncertainty in medical decision-making
  – Physicians and patients must decide on treatments
  – Treatments may not be successful
  – Treatments may have unpleasant side effects
• Choosing treatments:
  – Weigh the risks of adverse outcomes
• People are BAD at reasoning intuitively about probabilities
  – Provide systematic analysis
Probability Basics
• The sample space: a set Ω = {ω1, ω2, ω3, …, ωn}
  – E.g., the 6 possible rolls of a die; ωi is a sample point/atomic event
• A probability space/model is a sample space with an assignment P(ω) for every ω in Ω s.t. 0 ≤ P(ω) ≤ 1 and Σω P(ω) = 1
  – E.g. P(die roll < 4) = 1/6 + 1/6 + 1/6 = 1/2
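As a quick check, the die model can be written out directly (a Python sketch, not part of the slides):

```python
# Probability model for a fair six-sided die: every sample point gets 1/6.
P = {roll: 1/6 for roll in range(1, 7)}

# A valid model: each P(w) lies in [0,1] and the assignments sum to 1.
assert all(0 <= p <= 1 for p in P.values())
assert abs(sum(P.values()) - 1) < 1e-12

# P(event) is the sum over the sample points in the event.
p_lt_4 = sum(P[w] for w in P if w < 4)  # 1/6 + 1/6 + 1/6 = 1/2
```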
Random Variables
• A random variable is a function from sample points to a range (e.g. reals, bools)
• E.g. Odd(1) = true
• P induces a probability distribution for any r.v. X:
  – P(X=xi) = Σ{ω: X(ω)=xi} P(ω)
  – E.g. P(Odd=true) = 1/6 + 1/6 + 1/6 = 1/2
• A proposition is the event (set of sample points) where the proposition is true: e.g. event a = {ω : A(ω) = true}
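The induced distribution can be computed the same way (again a sketch; `odd` mirrors the slide's Odd r.v.):

```python
# A random variable is a function from sample points to a range.
P = {roll: 1/6 for roll in range(1, 7)}
odd = lambda w: w % 2 == 1  # Odd: sample point -> bool

# P induces a distribution on the r.v.: P(X=x) = sum of P(w) with X(w)=x.
p_odd = {v: sum(p for w, p in P.items() if odd(w) == v)
         for v in (True, False)}
```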
Why probabilities?
• Definitions imply that logically related events have related probabilities
• In AI applications, sample points are defined by set of random variables– Random vars: boolean, discrete, continuous
Prior Probabilities
• Prior probabilities: belief prior to evidence
  – E.g. P(cavity=t) = 0.2; P(weather=sunny) = 0.6
  – A distribution gives values for all assignments
• A joint distribution on a set of r.v.s gives a probability for every atomic event of those r.v.s
  – E.g. P(Weather, Cavity) = a 4×2 matrix of values
• Every question about a domain can be answered with the joint distribution, because every event is a sum of sample points
Conditional Probabilities
• Conditional (posterior) probabilities
  – E.g. P(cavity|toothache) = 0.8: belief given that a toothache is all we know
  – P(Cavity|Toothache) = a 2-element vector of 2-element vectors
• Can add new evidence, possibly irrelevant
• P(a|b) = P(a,b)/P(b), where P(b) ≠ 0
• Also, P(a,b) = P(a|b)P(b) = P(b|a)P(a)
  – The product rule generalizes to chaining
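The definition and the product rule are easy to verify on a toy joint (the numbers below are illustrative, not from the slides):

```python
# Toy joint over two boolean propositions a, b (illustrative values).
joint = {(True, True): 0.12, (True, False): 0.08,
         (False, True): 0.28, (False, False): 0.52}

p_b = sum(p for (a, b), p in joint.items() if b)   # P(b) = 0.4
p_a_and_b = joint[(True, True)]                    # P(a,b) = 0.12
p_a_given_b = p_a_and_b / p_b                      # P(a|b) = P(a,b)/P(b)

# Product rule: P(a,b) = P(a|b)P(b) = P(b|a)P(a)
p_a = sum(p for (a, b), p in joint.items() if a)
p_b_given_a = p_a_and_b / p_a
assert abs(p_a_given_b * p_b - p_b_given_a * p_a) < 1e-12
```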
Inference by Enumeration
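The bodies of these slides survive only as images. As a sketch of the technique, here is enumeration over the standard dentist-domain joint (Cavity, Toothache, Catch); the numbers are the usual textbook values and are an assumption here:

```python
# Full joint over (cavity, toothache, catch) -- textbook dentist example.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
assert abs(sum(joint.values()) - 1) < 1e-12

def prob(pred):
    """P(event) = sum over the sample points where pred holds."""
    return sum(p for w, p in joint.items() if pred(*w))

# P(cavity | toothache) = P(cavity, toothache) / P(toothache)
p = prob(lambda c, t, k: c and t) / prob(lambda c, t, k: t)
```

Enumeration answers any query this way, but the joint itself grows exponentially in the number of variables.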
Independence
Conditional Independence
Conditional Independence II
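These slide bodies are likewise lost; the usual point is that conditional independence shrinks the representation. A sketch, reusing the assumed dentist numbers: given Cavity, the other two variables factor, so 5 numbers reconstruct the full 8-entry joint:

```python
# Factored model: P(cavity), P(toothache|cavity), P(catch|cavity) -- 5 numbers.
p_cavity = 0.2
p_tooth = {True: 0.6, False: 0.1}   # P(toothache | cavity)
p_catch = {True: 0.9, False: 0.2}   # P(catch | cavity)

def joint(c, t, k):
    """Rebuild P(c,t,k) = P(c) P(t|c) P(k|c) under conditional independence."""
    pc = p_cavity if c else 1 - p_cavity
    pt = p_tooth[c] if t else 1 - p_tooth[c]
    pk = p_catch[c] if k else 1 - p_catch[c]
    return pc * pt * pk

total = sum(joint(c, t, k) for c in (True, False)
            for t in (True, False) for k in (True, False))
```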
Probabilities Model Uncertainty
• The World – Features
  – Random variables: {X1, X2, …, Xn}
  – Feature values: Xi takes values in {xi1, xi2, …, xiki}
• States of the world
  – Assignments of values to all variables
  – # of possible states: k1 × k2 × … × kn
  – Exponential in the # of variables: if every ki = 2, there are 2^n states
Probabilities of World States
• P(Si): joint probability of a complete assignment
  – States are distinct and exhaustive: Σj P(Sj) = 1, summed over all k1 × … × kn states
• Typically care about a SUBSET of assignments
  – aka a “Circumstance”: sum out the don’t-care variables, e.g.
    P(X2=t, X4=f) = Σ{u∈{t,f}} Σ{v∈{t,f}} P(X1=u, X2=t, X3=v, X4=f)
  – Exponential in the # of don’t cares
A Simpler World
• 2^n world states = Maximum entropy– Know nothing about the world
• Many variables independent– P(strep,ebola) = P(strep)P(ebola)
• Conditionally independent– Depend on same factors but not on each other– P(fever,cough|flu) = P(fever|flu)P(cough|flu)
Probabilistic Diagnosis
• Question:
  – How likely is a patient to have a disease if they have the symptoms?
• Probabilistic Model: Bayes’ Rule
  – P(D|S) = P(S|D)P(D)/P(S), where
    • P(S|D): probability of the symptom given the disease
    • P(D): prior probability of having the disease
    • P(S): prior probability of having the symptom
Diagnosis
• Consider meningitis:
  – Disease: meningitis, m
  – Symptom: stiff neck, s
  – P(s|m) = 0.5
  – P(m) = 0.0001
  – P(s) = 0.1
  – How likely is it that someone with a stiff neck actually has meningitis?
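Plugging the slide's numbers into Bayes' Rule:

```python
# Bayes' Rule for the meningitis example on the slide.
p_s_given_m = 0.5    # P(s|m)
p_m = 0.0001         # P(m)
p_s = 0.1            # P(s)

# P(m|s) = P(s|m) P(m) / P(s)
p_m_given_s = p_s_given_m * p_m / p_s
```

So even given a stiff neck, meningitis remains very unlikely (0.0005): the tiny prior dominates the likelihood.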
Modeling (In)dependence
• Simple, graphical notation for conditional independence; compact specification of the joint
• Bayesian network
  – Nodes = variables
  – Directed acyclic graph: a link means “directly influences”
  – Arcs = child depends on parent(s)
    • No incoming arcs = independent (a priori only)
  – Parents of X: π(X)
  – For each X, need P(X | π(X))
Example I
Simple Bayesian Network
• MCBN1
A
B C
D E
A = a priori only
B depends on A
C depends on A
D depends on B, C
E depends on C

Need:
  P(A)      – table size 2
  P(B|A)    – 2×2
  P(C|A)    – 2×2
  P(D|B,C)  – 2×2×2
  P(E|C)    – 2×2
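The factorization for MCBN1 can be checked mechanically; the CPT values below are made up for illustration (the slide gives only the table sizes):

```python
import itertools

# Hypothetical CPTs for MCBN1 (values are illustrative, not from the slide).
P_A = 0.3                                          # P(A=t)
P_B = {True: 0.9, False: 0.2}                      # P(B=t | A)
P_C = {True: 0.6, False: 0.1}                      # P(C=t | A)
P_D = {(True, True): 0.8, (True, False): 0.5,
       (False, True): 0.4, (False, False): 0.05}   # P(D=t | B,C)
P_E = {True: 0.7, False: 0.3}                      # P(E=t | C)

def bern(p, v):
    """P(X=v) given P(X=t) = p."""
    return p if v else 1 - p

def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|C)"""
    return (bern(P_A, a) * bern(P_B[a], b) * bern(P_C[a], c)
            * bern(P_D[(b, c)], d) * bern(P_E[c], e))

# Any consistent set of CPTs yields a proper joint distribution.
total = sum(joint(*vals)
            for vals in itertools.product((True, False), repeat=5))
```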
Simplifying with Noisy-OR
• How many computations?
  – p = # of parents; k = # of values per variable
  – (k−1)·k^p entries per table
  – Very expensive! 10 binary parents: 2^10 = 1024
• Reduce computation by simplifying the model
  – Treat each parent as a possible independent cause
  – Only 11 computations:
    • 10 causal probabilities + a “leak” probability
      – “Some other cause”
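A noisy-OR node is one short function (a sketch; the example values are hypothetical):

```python
def noisy_or(cause_probs, present, leak=0.0):
    """P(effect | causes) under noisy-OR:
    1 - (1 - leak) * product over *present* causes i of (1 - c_i)."""
    q = 1.0 - leak
    for c, on in zip(cause_probs, present):
        if on:
            q *= 1.0 - c
    return 1.0 - q

# 10 binary parents: 10 causal probabilities + 1 leak = 11 numbers
# (hypothetical values) instead of a 1024-entry table.
causes = [0.2] * 10
p_none = noisy_or(causes, [False] * 10, leak=0.5)       # only the leak fires
p_one = noisy_or(causes, [True] + [False] * 9, leak=0.5)
```

With c = 0.2 and leak L = 0.5 this reproduces the worked example on the next slide: P(effect | no cause) = 0.5 and P(effect | one cause) = 0.6.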
Noisy-OR Example
Network: A → B

P(b|a):      b     ¬b
  a         0.6   0.4
  ¬a        0.5   0.5

Noisy-OR with causal probability c_a and leak probability L:
  P(b|a) = 1 − (1 − c_a)(1 − L)
  P(¬b|a) = (1 − c_a)(1 − L)
  P(b|¬a) = 1 − (1 − L) = L
  P(¬b|¬a) = 1 − L

Fitting the table:
  L = P(b|¬a) = 0.5
  (1 − c_a)(1 − L) = P(¬b|a) = 0.4
  ⇒ c_a = 1 − 0.4/0.5 = 0.2
Noisy-OR Example II
Network: A → D ← B

Full model would need the whole table: P(d|a,b), P(d|a,¬b), P(d|¬a,b), P(d|¬a,¬b), and their negations.

Assume:
  P(a) = 0.1
  P(b) = 0.05
  P(d|¬a,¬b) = 0.3 (leak only)
  P(d|a,¬b) = 0.5
  P(d|¬a,b) = 0.7

Noisy-OR with causal probabilities c_a, c_b and leak L:
  P(d|¬a,¬b) = 1 − (1 − L) = L = 0.3
  P(d|a,¬b) = 1 − (1 − c_a)(1 − L) = 0.5 ⇒ (1 − c_a)(1 − L) = 0.5
  P(d|¬a,b) = 1 − (1 − c_b)(1 − L) = 0.7 ⇒ (1 − c_b)(1 − L) = 0.3
  P(d|a,b) = 1 − (1 − c_a)(1 − c_b)(1 − L) = 1 − (0.5)(0.3)/0.7 ≈ 0.79

Then, since A and B are independent a priori:
  P(d|b) = P(d|a,b)P(a) + P(d|¬a,b)P(¬a) ≈ 0.79(0.1) + 0.7(0.9) ≈ 0.71
Graph Models
• Bipartite graphs
  – E.g. medical reasoning
  – Generally, diseases cause symptoms (not the reverse)
  d1    d2    d3    d4        (diseases)
  s1  s2  s3  s4  s5  s6      (symptoms)
Topologies
• Generally more complex
  – Polytree: at most one path between any two nodes
• General Bayes Nets
  – Graphs with undirected cycles
    • No directed cycles – a variable can’t be its own cause
• Issue: automatic net acquisition
  – Update probabilities by observing data
  – Learn topology: use statistical evidence of independence plus heuristic search to find the most probable structure
Holmes Example (Pearl)
Holmes is worried that his house will be burgled. For the time period of interest, there is a 10^-4 a priori chance of this happening, and Holmes has installed a burglar alarm to try to forestall this event. The alarm is 95% reliable in sounding when a burglary happens, but also has a false positive rate of 1%. Holmes’ neighbor, Watson, is 90% sure to call Holmes at his office if the alarm sounds, but he is also a bit of a practical joker and, knowing Holmes’ concern, might (30%) call even if the alarm is silent. Holmes’ other neighbor Mrs. Gibbons is a well-known lush and often befuddled, but Holmes believes that she is four times more likely to call him if there is an alarm than not.
Holmes Example: Model
There are four binary random variables:
  B: whether Holmes’ house has been burgled
  A: whether his alarm sounded
  W: whether Watson called
  G: whether Gibbons called
B → A
A → W
A → G
Holmes Example: Tables
P(B):
        B=#t     B=#f
        0.0001   0.9999

P(A|B):      A=#t   A=#f
  B=#t       0.95   0.05
  B=#f       0.01   0.99

P(W|A):      W=#t   W=#f
  A=#t       0.90   0.10
  A=#f       0.30   0.70

P(G|A):      G=#t   G=#f
  A=#t       0.40   0.60
  A=#f       0.10   0.90
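With these tables, any query answers by enumeration; for example (a sketch), the probability of a burglary given that Watson calls:

```python
# CPTs from the slide.
P_B = 0.0001                       # P(B=#t)
P_A = {True: 0.95, False: 0.01}    # P(A=#t | B)
P_W = {True: 0.90, False: 0.30}    # P(W=#t | A)
P_G = {True: 0.40, False: 0.10}    # P(G=#t | A)

def bern(p, v):
    """P(X=v) given P(X=t) = p."""
    return p if v else 1 - p

def joint(b, a, w, g):
    """P(B,A,W,G) = P(B) P(A|B) P(W|A) P(G|A)"""
    return bern(P_B, b) * bern(P_A[b], a) * bern(P_W[a], w) * bern(P_G[a], g)

# How worried should Holmes be if Watson calls? P(B=#t | W=#t)
num = sum(joint(True, a, True, g)
          for a in (True, False) for g in (True, False))
den = sum(joint(b, a, True, g) for b in (True, False)
          for a in (True, False) for g in (True, False))
p_burgled = num / den
```

Watson's call raises the burglary probability only from 0.0001 to about 0.00028: Watson's prank calls keep the evidence weak.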
Decision Making
• Design a model of rational decision making
  – Maximize expected value among alternatives
• Uncertainty comes from
  – Outcomes of actions
  – Choices taken
• To maximize the outcome
  – Select the maximum over choices
  – Take the weighted average value of chance outcomes
Gangrene Example
Decision 1: Medicine vs. Amputate foot

  Medicine:
    Full recovery  0.70 → 1000
    Worse          0.25 → Decision 2
    Die            0.05 → 0

  Amputate foot:
    Live  0.99 → 850
    Die   0.01 → 0

Decision 2 (after getting worse): Medicine vs. Amputate leg

  Medicine:
    Live  0.60 → 995
    Die   0.40 → 0

  Amputate leg:
    Live  0.98 → 700
    Die   0.02 → 0
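The tree evaluates by backward induction: average at chance nodes, maximize at decision nodes. A sketch using the values above:

```python
def chance(*branches):
    """Expected value of a chance node: probability-weighted average."""
    return sum(p * v for p, v in branches)

def choice(*options):
    """Value of a decision node: the maximum over the alternatives."""
    return max(options)

# Second decision, reached only if the patient gets worse on medicine:
worse = choice(
    chance((0.60, 995), (0.40, 0)),    # medicine again: EV 597
    chance((0.98, 700), (0.02, 0)),    # amputate leg:   EV 686
)

ev_medicine = chance((0.70, 1000), (0.25, worse), (0.05, 0))
ev_amputate_foot = chance((0.99, 850), (0.01, 0))
best = max(ev_medicine, ev_amputate_foot)
```

Medicine wins (EV 871.5 vs. 841.5), because the "worse" branch still falls back to amputating the leg for 686.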
Decision Tree Issues
• Problem 1: Tree size
  – k activities: 2^k orders
• Solution 1: Hill-climbing
  – Choose the best apparent choice after one step
    • Use entropy reduction
• Problem 2: Utility values
  – Difficult to estimate; sensitivity; duration
  – Change value depending on the phrasing of the question
• Solution 2c: Model the effect of the outcome over a lifetime
Conclusion
• Reasoning with uncertainty
  – Many real systems are uncertain, e.g. medical diagnosis
• Bayes’ Nets
  – Model (in)dependence relations in reasoning
  – Noisy-OR simplifies model/computation
    • Assumes causes are independent
• Decision Trees
  – Model rational decision making
    • Maximize outcome: max over choices, average over chance outcomes
Bayesian Spam Filtering
• Automatic Text Categorization
• Probabilistic Classifier
  – Conditional framework
  – Naïve Bayes formulation
    • Independence assumptions galore
  – Feature selection
  – Classification & evaluation
Spam Classification
• Text categorization problem
  – Given a message M, is it Spam or NotSpam?
• Probabilistic framework
  – Classify as Spam iff P(Spam|M) > P(NotSpam|M)
    • P(Spam|M) = P(Spam,M)/P(M)
    • P(NotSpam|M) = P(NotSpam,M)/P(M)
  – Which is more likely?
Characterizing a Message
• Represent message M as a set of features
  – Features: a1, a2, …, an
• What features?
  – Words! (again)
  – Alternatively (skip) n-gram sequences
  – Stemmed (?)
  – Term frequencies: N(W,Spam); N(W,NotSpam)
  – Also N(Spam), N(NotSpam): # of words in each class
Characterizing a Message II
• Estimating term conditional probabilities with add-one (Laplace) smoothing:
  – P(W|C) = (N(W,C) + 1) / (N(C) + K), for class C ∈ {Spam, NotSpam}, where K is the number of distinct terms
• Selecting good features – exclude terms s.t.:
  – N(W,Spam) + N(W,NotSpam) < 4
  – 0.45 ≤ P(W|Spam) / (P(W|Spam) + P(W|NotSpam)) ≤ 0.55
Naïve Bayes Formulation
• Naïve Bayes (aka “Idiot” Bayes)
  – Assumes all features are independent
    • Not accurate, but a useful simplification
• So,
  – P(M,Spam) = P(a1,a2,…,an,Spam)
  – = P(a1,a2,…,an|Spam)P(Spam)
  – ≈ P(a1|Spam)···P(an|Spam)P(Spam)
  – Likewise for NotSpam
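A minimal end-to-end sketch (the tiny corpus and priors are invented, not Pantel & Lin's data); it combines the naïve factorization with the add-one smoothing from the previous slide:

```python
import math
from collections import Counter

# Tiny illustrative corpus (made-up messages).
spam = ["win cash now", "cash prize win"]
ham = ["meeting at noon", "lunch at noon now"]

def counts(msgs):
    return Counter(w for m in msgs for w in m.split())

n_spam, n_ham = counts(spam), counts(ham)
K = len(set(n_spam) | set(n_ham))   # number of distinct terms

def log_p(message, cls_counts, prior):
    """log P(class) + sum_i log P(a_i|class), with add-one smoothing."""
    total = sum(cls_counts.values())
    s = math.log(prior)
    for w in message.split():
        s += math.log((cls_counts[w] + 1) / (total + K))
    return s

def classify(message, p_spam=0.5):
    spam_score = log_p(message, n_spam, p_spam)
    ham_score = log_p(message, n_ham, 1 - p_spam)
    return "Spam" if spam_score > ham_score else "NotSpam"
```

Working in log space avoids underflow when multiplying many small per-word probabilities.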
Experimentation (Pantel & Lin)
• Training: 160 spam, 466 non-spam
• Test: 277 spam, 346 non-spam
• 230,449 training words; 60,434 spam
  – 12,228 terms; filtering reduces to 3,848
Results (PL)
• False positives: 1.16%
• False negatives: 8.3%
• Overall error: 4.33%
• Simple approach, effective
Variants
• Features?
• Model?– Explicit bias to certain error types
• Address lists
• Explicit rules