DEALING WITH UNCERTAINTY(1)
WEEK 5
CHAPTER 3
Introduction
• The world is not a well-defined place.
• There is uncertainty in the facts we know:
– What’s the temperature? Imprecise measures
– Is X a good president? Imprecise definitions
– Where are the road pits? Imprecise knowledge
• There is uncertainty in our inferences:
– If I have red scars and an itchy rash and was gardening all weekend, I probably have poison ivy
• People make successful decisions all the time anyhow.
Sources of Uncertainty
• Uncertain data
– missing data, unreliable, ambiguous, imprecise representation, inconsistent, subjective, derived from defaults, noisy…
• Uncertain knowledge
– Multiple causes lead to multiple effects
– Incomplete knowledge of causality in the domain
– Probabilistic/stochastic effects
• Uncertain knowledge representation
– restricted model of the real system
– limited expressiveness of the representation mechanism
• Inference process
– Derived result is formally correct, but wrong in the real world
– New conclusions are not well-founded (e.g., inductive reasoning)
– Incomplete, default reasoning methods
Reasoning Under Uncertainty
• So how do we do reasoning under uncertainty and with inexact knowledge?
– heuristics
  • ways to mimic heuristic knowledge processing methods used by experts (limit the search for a solution)
– empirical associations
  • experiential reasoning, based on limited observations
  • verifiable or provable by means of observation or experiment
  • guided by practical experience and not theory, as in medicine
– probabilities
  • objective (frequency counting)
  • subjective (human experience)
Decision making with uncertainty
• Rational behavior:
– For each possible action, identify the possible outcomes
– Compute the probability of each outcome
– Compute the utility of each outcome
– Compute the probability-weighted (expected) utility over possible outcomes for each action
– Select the action with the highest expected utility (principle of Maximum Expected Utility)
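The Maximum Expected Utility procedure can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `expected_utility` and `best_action`, the umbrella scenario, and all probabilities and utilities are assumptions made up for the example.

```python
def expected_utility(outcomes):
    """Probability-weighted utility of one action.

    `outcomes` is a list of (probability, utility) pairs, one per outcome.
    """
    return sum(p * u for p, u in outcomes)


def best_action(actions):
    """Return the name of the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Hypothetical numbers, chosen only to illustrate the procedure.
# Each action lists (probability, utility) for the outcomes (rain, no rain).
actions = {
    "take umbrella": [(0.3, 8), (0.7, 6)],
    "leave umbrella": [(0.3, 0), (0.7, 10)],
}

for name, outcomes in actions.items():
    print(name, expected_utility(outcomes))
print("choose:", best_action(actions))
```

With these made-up numbers the expected utilities are about 6.6 and 7.0, so the rule selects "leave umbrella". Changing the probability of rain changes the decision, which is the point of weighting utilities by probability.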
Some Relevant Factors
• expressiveness
– can concepts used by humans be represented adequately?
– can the confidence of experts in their decisions be expressed?
• comprehensibility
– representation of uncertainty
– utilization in reasoning methods
• correctness
– probabilities
– relevance ranking
– long inference chains
• computational complexity
– feasibility of calculations for practical purposes
• reproducibility
– Do the observations deliver the same results when repeated?
Basic Probability
• Probability theory enables us to make rational decisions.
• Which mode of transportation is safer:
– Car or plane?
– What is the probability of an accident?
Basic Probability Theory
• An experiment has a set of potential outcomes, e.g., throwing a die.
• The sample space of an experiment is the set of all possible outcomes, e.g., {1, 2, 3, 4, 5, 6}.
• An event is a subset of the sample space.
– {2}
– {3, 6}
– even = {2, 4, 6}
– odd = {1, 3, 5}
Probability as Relative Frequency
• An event has a probability.
• Consider a long sequence of experiments. If we count the number of times a particular event occurs in that sequence and compare it to the total number of experiments, we can compute a ratio.
• This ratio is one way of estimating the probability of the event.
• P(E) = (# of times E occurred)/(total # of trials)
• Example
– 100 attempts are made to swim a length in 30 secs. The swimmer succeeds on 20 occasions; therefore the probability that the swimmer can complete the length in 30 secs is:
  • 20/100 = 0.2
  • Failure = 1 – 0.2 = 0.8
• The experiments, the sample space and the events must be defined clearly for probability to be meaningful
– What is the probability of an accident?
Theoretical Probability
• Principle of Indifference: alternatives are always to be judged equiprobable if we have no reason to expect or prefer one over the other.
• Each outcome in the sample space is assigned equal probability.
• Example: throw a die
– P({1}) = P({2}) = ... = P({6}) = 1/6
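Under the principle of indifference, the probability of an event is simply the number of outcomes it contains divided by the size of the sample space. The sketch below applies this to the die example; the helper name `prob` is an assumption introduced for illustration.

```python
from fractions import Fraction

# Sample space for one throw of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}


def prob(event):
    """P(event) when every outcome in the sample space is equally likely."""
    event = set(event) & sample_space  # ignore anything outside the sample space
    return Fraction(len(event), len(sample_space))


even = {2, 4, 6}
print(prob({2}))     # 1/6
print(prob({3, 6}))  # 1/3
print(prob(even))    # 1/2
```

Using `Fraction` keeps the results exact, so they can be compared directly with the values on the slide.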
Law of Large Numbers
• As the number of experiments increases, the relative frequency of an event more closely approximates the theoretical probability of the event,
– if the theoretical assumptions hold.
• Buffon’s Needle for computing π
– Draw parallel lines 1 inch apart on a plane
– Throw a 1-inch needle on the plane
– P(needle crossing a line) = 2/π
– π ≈ 2 × (number of throws) / (number of crossings)
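Buffon's needle can be simulated to see the law of large numbers at work. This is a rough sketch under stated assumptions: the function name `estimate_pi` and the seed are made up, the needle is modelled by the distance of its centre to the nearest line plus its angle to the lines, and the code uses `math.pi` to sample that angle, so it demonstrates convergence and is not an independent way of computing π.

```python
import math
import random


def estimate_pi(throws, seed=0):
    """Estimate pi as 2 * throws / crossings for a 1-inch needle on 1-inch lines."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(throws):
        # Distance from the needle's centre to the nearest line (lines are 1 apart).
        d = rng.uniform(0.0, 0.5)
        # Acute angle between the needle and the lines.
        theta = rng.uniform(0.0, math.pi / 2)
        # The needle crosses a line when its half-length projection reaches that line.
        if d <= 0.5 * math.sin(theta):
            crossings += 1
    return 2 * throws / crossings


for n in (100, 10_000, 200_000):
    print(n, estimate_pi(n))
```

The estimates for larger numbers of throws should settle closer to π, provided the modelling assumptions (uniform position and angle) hold.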
Large Number Reveals Untruth in Assumptions
Results of 1,000,000 throws of a die

Number     1     2     3     4     5     6
Fraction   .155  .159  .164  .169  .174  .179

Why?
Axioms of Probability Theory
• Suppose P(.) is a probability function; then
1. for any event E, 0 ≤ P(E) ≤ 1. (How?)
2. P(S) = 1, where S is the sample space.
3. for any two mutually exclusive events E1 and E2, P(E1 ∪ E2) = P(E1) + P(E2)
• Any function that satisfies the above three axioms is a probability function.
Joint Probability
• Let A, B be two events; the joint probability of both A and B being true is denoted by P(A, B).
• Example:
P(spade) is the probability of the top card being a spade.
P(king) is the probability of the top card being a king.
P(spade, king) is the probability of the top card being both a spade and a king, i.e., the king of spades.
Is P(king, spade) = P(spade, king)?
Properties of Probability
1. P(¬E) = 1 – P(E)
2. If E1 and E2 are logically equivalent, then P(E1) = P(E2).
– E1: Not all philosophers are more than six feet tall.
– E2: Some philosopher is not more than six feet tall.
Then P(E1) = P(E2).
3. P(E1, E2) ≤ P(E1).
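Properties 1 and 3 follow from the three axioms. The short derivation below is a sketch of one standard argument; it writes the joint event P(E1, E2) as P(E1 ∩ E2), which is an assumption about the slide's notation.

```latex
% Property 1: E and \neg E are mutually exclusive and together make up S.
P(E) + P(\neg E) = P(E \cup \neg E) = P(S) = 1
\quad\Longrightarrow\quad
P(\neg E) = 1 - P(E)

% Property 3: split E_1 into two mutually exclusive parts.
E_1 = (E_1 \cap E_2) \cup (E_1 \cap \neg E_2)
\quad\Longrightarrow\quad
P(E_1) = P(E_1 \cap E_2) + P(E_1 \cap \neg E_2) \ge P(E_1 \cap E_2)
```

The last inequality uses axiom 1, which guarantees that P(E1 ∩ ¬E2) is non-negative.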
Conditional Probability
• The probability of an event may change after knowing another event. The probability of A given B is denoted by P(A|B).
• Example
– P(W=space): the probability that a randomly selected word from an English text is ‘space’
– P(W=space | W’=outer): the probability of ‘space’ if the previous word is ‘outer’
Example
A: the top card of a deck of poker cards is the king of spades
P(A) = 1/52
However, if we know
B: the top card is a king
then the probability of A given that B is true is P(A|B) = 1/4.
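The card example can be checked by enumerating a deck. The sketch below assumes a standard 52-card deck with every card equally likely to be on top; the helper `prob` and the rank and suit labels are names introduced for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = set(product(ranks, suits))  # 52 equally likely top cards


def prob(event, given=None):
    """P(event), or P(event | given) when a conditioning event is supplied."""
    space = deck if given is None else given
    return Fraction(len(event & space), len(space))


king = {card for card in deck if card[0] == "K"}
spade = {card for card in deck if card[1] == "spades"}
A = king & spade  # the king of spades

print(prob(A))                                    # 1/52
print(prob(A, given=king))                        # 1/4
print(prob(king & spade) == prob(spade & king))   # True
```

Conditioning on B amounts to shrinking the sample space to the four kings, which is why the probability rises from 1/52 to 1/4. The last line also shows that the order of events in a joint probability does not matter.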
How to Compute P(A|B)?
Business Students
Of 100 students completing a course, 20 were business majors. Ten students received an A in the course, and three of these were business majors.
Suppose A is the event that a randomly selected student got an A in the course, and B is the event that a randomly selected student is a business major. What is the probability of A? What is the probability of A after knowing B is true?

         B     not B
A        3     7
Total    20    80
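The two questions can be answered directly from the counts given in the example. The variable names below are assumptions; the numbers are those stated on the slide.

```python
from fractions import Fraction

total_students = 100
business_majors = 20      # event B
got_a = 10                # event A
business_and_a = 3        # both A and B

p_a = Fraction(got_a, total_students)
p_b = Fraction(business_majors, total_students)
p_a_and_b = Fraction(business_and_a, total_students)

# Conditional probability: restrict attention to the business majors.
p_a_given_b = p_a_and_b / p_b

print(p_a)          # 1/10
print(p_a_given_b)  # 3/20
```

Knowing that the student is a business major raises the probability of an A from 0.10 to 0.15 in this example.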
Probabilistic Reasoning
• Evidence
– What we know about a situation.
• Hypothesis
– What we want to conclude.
• Compute
– P( Hypothesis | Evidence )
Credit Card Authorization
• E is the data about the applicant's age, job, education, income, credit history, etc.
• H is the hypothesis that the credit card will provide a positive return.
• The decision of whether to issue the credit card to the applicant is based on the probability P(H|E).
Medical Diagnosis
• E is a set of symptoms, such as, coughing, sneezing, headache, ...
• H is a disorder, e.g., common cold, SARS, flu.
• The diagnosis problem is to find an H (disorder) such that P(H|E) is maximum.
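One simple way to picture the diagnosis problem is to estimate P(H|E) by relative frequency over past cases and pick the H with the largest value. The sketch below is only an illustration: the case records are invented and are not medical data, and the function names `posterior` and `diagnose` are assumptions.

```python
from collections import Counter

# Hypothetical case records (disorder, observed symptoms); invented for illustration.
records = [
    ("common cold", {"coughing", "sneezing"}),
    ("common cold", {"sneezing"}),
    ("common cold", {"coughing", "sneezing", "headache"}),
    ("flu", {"coughing", "headache"}),
    ("flu", {"coughing", "sneezing", "headache"}),
    ("flu", {"headache"}),
]


def posterior(evidence):
    """Estimate P(H | E) as a relative frequency among records showing every symptom in E."""
    matching = [h for h, symptoms in records if evidence <= symptoms]
    counts = Counter(matching)
    return {h: n / len(matching) for h, n in counts.items()}


def diagnose(evidence):
    """Return the disorder H that maximises the estimated P(H | E)."""
    dist = posterior(evidence)
    return max(dist, key=dist.get)


evidence = {"coughing", "headache"}
print(posterior(evidence))
print(diagnose(evidence))
```

With these invented records, three cases show both symptoms and two of them are flu, so the estimate favours flu. The sketch fails with a division by zero when no record matches the evidence, which hints at why counting alone does not scale to many symptoms.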
Basics of Probability Theory
• mathematical approach for processing uncertain information
– sample space set X = {x1, x2, …, xn}
  • collection of all possible events
  • can be discrete or continuous
– probability number P(xi): likelihood of an event xi to occur
  • non-negative value in [0,1]
  • total probability of the sample space is 1
  • for mutually exclusive events, the probability for at least one of them is the sum of their individual probabilities
• experimental probability
– based on the frequency of events
• subjective probability
– based on expert assessment
Compound Probabilities
• describes independent events
– do not affect each other in any way
• joint probability of two independent events A and B
P(A ∩ B) = P(A) * P(B)
• union probability of two independent events A and B
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
         = P(A) + P(B) - P(A) * P(B)
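Both compound-probability formulas can be checked by enumeration. The sketch below assumes two fair dice thrown together, with A = "first die is even" and B = "second die shows 6"; these events and the helper `prob` are chosen for illustration and do not come from the slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing two fair dice.
space = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event), where `event` is a predicate on an outcome (first, second)."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


def a(o):
    return o[0] % 2 == 0  # first die is even


def b(o):
    return o[1] == 6      # second die shows 6


p_a, p_b = prob(a), prob(b)
p_and = prob(lambda o: a(o) and b(o))
p_or = prob(lambda o: a(o) or b(o))

print(p_and == p_a * p_b)              # True
print(p_or == p_a + p_b - p_a * p_b)   # True
```

The product rule holds here because the two dice do not influence each other; for dependent events the joint probability has to be obtained some other way.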
Probability theory
• Random variables
– e.g., Alarm, Burglary, Earthquake
– Domain: Boolean (like these), discrete, continuous
• Atomic event: complete specification of state
– Alarm=True ∧ Burglary=True ∧ Earthquake=False, i.e., alarm ∧ burglary ∧ ¬earthquake
• Prior probability: degree of belief without any other evidence
– P(Burglary) = .1
• Joint probability: matrix of combined probabilities of a set of variables
– P(Alarm, Burglary) =

             alarm   ¬alarm
burglary     .09     .01
¬burglary    .1      .8
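The joint table contains enough information to recover the prior and any conditional probability by summing and dividing. The sketch below uses the four numbers from the table; the dictionary layout and variable names are assumptions.

```python
from fractions import Fraction

# Joint distribution P(Burglary, Alarm), keyed by (burglary, alarm).
joint = {
    (True, True): Fraction(9, 100),
    (True, False): Fraction(1, 100),
    (False, True): Fraction(10, 100),
    (False, False): Fraction(80, 100),
}

# Marginals: sum the joint entries over the other variable.
p_burglary = sum(p for (burglary, alarm), p in joint.items() if burglary)
p_alarm = sum(p for (burglary, alarm), p in joint.items() if alarm)

# Conditional: P(burglary | alarm) = P(burglary, alarm) / P(alarm).
p_burglary_given_alarm = joint[(True, True)] / p_alarm

print(p_burglary)              # 1/10
print(p_alarm)                 # 19/100
print(p_burglary_given_alarm)  # 9/19
print(joint[(True, True)] == p_burglary * p_alarm)  # False
```

The burglary row sums to the prior P(Burglary) = .1, and hearing the alarm raises the probability of a burglary from 0.1 to 9/19, roughly 0.47. The final check shows that Alarm and Burglary are not independent in this table.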
Independence
• When two sets of propositions do not affect each others’ probabilities, we call them independent, and can easily compute their joint and conditional probability:
– Independent(A, B) if P(A ∧ B) = P(A) P(B), P(A | B) = P(A)
• For example, {moon-phase, light-level} might be independent of {burglary, alarm, earthquake}
– Then again, it might not: burglars might be more likely to burglarize houses when there’s a new moon (and hence little light)
– But if we know the light level, the moon phase doesn’t affect whether we are burglarized
– Once we’re burglarized, light level doesn’t affect whether the alarm goes off
• We need a more complex notion of independence, and methods for reasoning about these kinds of relationships
Conditional independence
• Absolute independence:
– A and B are independent if P(A ∧ B) = P(A) P(B); equivalently, P(A) = P(A | B) and P(B) = P(B | A)
• A and B are conditionally independent given C if
– P(A ∧ B | C) = P(A | C) P(B | C)
• This lets us decompose the joint distribution:
– P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C)
• Moon-Phase and Burglary are conditionally independent given Light-Level
• Conditional independence is weaker than absolute independence, but still useful in decomposing the full joint probability distribution
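The decomposition P(A ∧ B ∧ C) = P(A | C) P(B | C) P(C) can be illustrated numerically. In the sketch below, C stands for "it is dark", A for "new moon" and B for "burglary"; all the probabilities are invented for the example, and the helper names `cond` and `marginal` are assumptions.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers, for illustration only.
p_c = {True: F(1, 2), False: F(1, 2)}            # P(C = c)
p_a_given_c = {True: F(4, 5), False: F(1, 5)}    # P(A = true | C = c)
p_b_given_c = {True: F(3, 10), False: F(1, 10)}  # P(B = true | C = c)


def cond(table, value, c):
    """P(X = value | C = c) from a table holding P(X = true | C = c)."""
    return table[c] if value else 1 - table[c]


# Build the full joint from the decomposition P(A, B, C) = P(A|C) P(B|C) P(C).
joint = {
    (a, b, c): cond(p_a_given_c, a, c) * cond(p_b_given_c, b, c) * p_c[c]
    for a, b, c in product([True, False], repeat=3)
}


def marginal(pred):
    """Sum the joint over all atomic events (a, b, c) that satisfy `pred`."""
    return sum(p for (a, b, c), p in joint.items() if pred(a, b, c))


p_a = marginal(lambda a, b, c: a)
p_b = marginal(lambda a, b, c: b)
p_ab = marginal(lambda a, b, c: a and b)

# Conditionally independent given C = true ...
print(joint[(True, True, True)] / p_c[True] == p_a_given_c[True] * p_b_given_c[True])  # True
# ... but not absolutely independent.
print(p_ab == p_a * p_b)  # False
```

With these numbers P(A ∧ B) = 13/100 while P(A) P(B) = 1/10, so moon phase and burglary are correlated overall even though they are independent once the light level is known.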
Conditional Probabilities
• describes dependent events
– affect each other in some way
• conditional probability of event A given that event B has already occurred
P(A|B) = P(A ∩ B) / P(B)
Q & A