Probability
DESCRIPTION
Probability. Tamara Berg, CS 590-133 Artificial Intelligence. Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Kilian Weinberger, Deva Ramanan.
TRANSCRIPT
Probability
Tamara Berg, CS 590-133 Artificial Intelligence
Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Kilian Weinberger, Deva Ramanan
Announcements
• Midterm was handed back last Thurs
  – Pick up from Rohit if you were absent
• HW2 due tonight, 11:59pm
  – Reminder: 5 free late days to use over the semester as you like
Where are we?
• Now leaving: search, games, and planning
• Entering: probabilistic models and machine learning
Probability: Review of main concepts
Uncertainty
• General situation:
  – Evidence: Agent knows certain things about the state of the world (e.g., sensor readings or symptoms)
  – Hidden variables: Agent needs to reason about other aspects (e.g., where an object is, what disease is present, or whether a sensor is faulty)
  – Model: Agent knows something about how the known variables relate to the unknown variables
• Probabilistic reasoning gives us a framework for managing our beliefs and knowledge
Today
• Goal:
  – Modeling and using distributions over LARGE numbers of random variables
• Probability
  – Random Variables
  – Joint and Marginal Distributions
  – Conditional Distribution
  – Product Rule, Chain Rule, Bayes’ Rule
  – Inference
  – Independence
Motivation: Planning under uncertainty
• Let action At = leave for airport t minutes before flight
  – Will At succeed, i.e., get me to the airport in time for the flight?
• Problems:
  – Partial observability (road state, other drivers' plans, etc.)
  – Noisy sensors (traffic reports)
  – Uncertainty in action outcomes (flat tire, etc.)
  – Complexity of modeling and predicting traffic
• Hence a purely logical approach either
  – Risks falsehood: “A25 will get me there on time,” or
  – Leads to conclusions that are too weak for decision making:
    • “A25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc., etc.”
    • “A1440 will get me there on time, but I’ll have to stay overnight in the airport”
Logical Reasoning
• Example: Diagnosing a toothache
  – “Toothache implies cavity”? Incorrect: not all patients with toothaches have cavities
  – Enumerating every possible cause of a toothache instead? Too many possible causes
Probability
• Probabilistic assertions summarize effects of
  – Laziness: reluctance to enumerate exceptions, qualifications, etc.
  – Ignorance: lack of explicit theories, relevant facts, initial conditions, etc.
  – Intrinsically random phenomena
Making decisions under uncertainty
• Suppose the agent believes the following:
  P(A25 gets me there on time) = 0.04
  …
  P(A120 gets me there on time) = 0.95
  P(A1440 gets me there on time) = 0.9999
• Which action should the agent choose?
  – Depends on preferences for missing flight vs. time spent waiting
  – Encapsulated by a utility function
• The agent should choose the action that maximizes the expected utility:
  P(At succeeds) · U(At succeeds) + P(At fails) · U(At fails)
• More generally: EU(A) = Σ_o P(o | A) U(o), summing over the possible outcomes o of action A
• Utility theory is used to represent and infer preferences
• Decision theory = probability theory + utility theory
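The expected-utility comparison above can be sketched in code. The utility numbers below (value of making the flight, cost of missing it, cost per minute of waiting) are assumptions for illustration, not from the slides; only the three probabilities come from the slide.

```python
# Hypothetical utilities (assumed, not from the slides): making the flight is
# worth 100, missing it is worth -500, and each minute waiting costs 0.1.
def expected_utility(p_success, wait_minutes):
    u_success = 100 - 0.1 * wait_minutes   # made the flight, but waited
    u_fail = -500                          # missed the flight
    return p_success * u_success + (1 - p_success) * u_fail

p_on_time = {"A25": 0.04, "A120": 0.95, "A1440": 0.9999}
wait = {"A25": 25, "A120": 120, "A1440": 1440}

for name, p in p_on_time.items():
    print(name, round(expected_utility(p, wait[name]), 2))

best = max(p_on_time, key=lambda a: expected_utility(p_on_time[a], wait[a]))
print(best)  # A120: leaving absurdly early or too late both score badly
```

With these assumed utilities, A25 is dominated by its high chance of missing the flight and A1440 by the enormous wait, so A120 maximizes expected utility.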
Monty Hall problem
• You’re a contestant on a game show. You see three closed doors, and behind one of them is a prize. You choose one door, and the host opens one of the other doors and reveals that there is no prize behind it. Then he offers you a chance to switch to the remaining door. Should you take it?
http://en.wikipedia.org/wiki/Monty_Hall_problem
Monty Hall problem
• With probability 1/3 you picked the correct door, and with probability 2/3 you picked the wrong door. If you picked the correct door and then you switch, you lose. If you picked the wrong door and then you switch, you win the prize.
• Expected utility of switching:
  EU(Switch) = (1/3) · 0 + (2/3) · Prize
• Expected utility of not switching:
  EU(Not switch) = (1/3) · Prize + (2/3) · 0
• Since (2/3) · Prize > (1/3) · Prize, you should switch
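The 1/3 vs. 2/3 argument is easy to check empirically. A minimal Monte Carlo sketch (the helper name `monty_hall` and the trial count are our own choices):

```python
import random

def monty_hall(switch, trials=100_000, seed=0):
    """Estimate the win probability of the switch / stay strategies."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        prize = rng.randrange(3)
        choice = rng.randrange(3)
        # Host opens a door that is neither the prize nor the contestant's pick
        opened = next(d for d in range(3) if d != prize and d != choice)
        if switch:
            # Switch to the one remaining closed door
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == prize)
    return wins / trials

print(monty_hall(switch=True))   # close to 2/3
print(monty_hall(switch=False))  # close to 1/3
```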
Where do probabilities come from?
• Frequentism
  – Probabilities are relative frequencies
  – For example, if we toss a coin many times, P(heads) is the proportion of the time the coin will come up heads
  – But what if we’re dealing with events that only happen once?
    • E.g., what is the probability that Team X will win the Superbowl this year?
• Subjectivism
  – Probabilities are degrees of belief
  – But then, how do we assign belief values to statements?
  – What would constrain agents to hold consistent beliefs?
Kolmogorov’s Axioms of probability
For any propositions (events) A and B:
  1. 0 ≤ P(A) ≤ 1
  2. P(True) = 1 and P(False) = 0
  3. P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
• These axioms are sufficient to completely specify probability theory for discrete random variables
• For continuous variables, we need density functions
Probabilities and rationality
• Why should a rational agent hold beliefs that are consistent with the axioms of probability?
  – For example, P(A) + P(¬A) = 1
• If an agent has some degree of belief in proposition A, he/she should be able to decide whether or not to accept a bet for/against A (De Finetti, 1931):
  – If the agent believes that P(A) = 0.4, should he/she agree to bet $4 that A will occur against $6 that A will not occur?
• Theorem: An agent who holds beliefs inconsistent with the axioms of probability can be convinced to accept a combination of bets that is guaranteed to lose them money
Random variables
• We describe the (uncertain) state of the world using random variables, denoted by capital letters
  – R: Is it raining?
  – W: What’s the weather?
  – D: What is the outcome of rolling two dice?
  – S: What is the speed of my car (in MPH)?
• Just like variables in CSPs, random variables take on values in a domain; domain values must be mutually exclusive and exhaustive
  – R in {True, False}
  – W in {Sunny, Cloudy, Rainy, Snow}
  – D in {(1,1), (1,2), …, (6,6)}
  – S in [0, 200]
Probability Distributions
• Unobserved random variables have distributions
• A distribution is a TABLE of probabilities of values
• A probability (lower-case value) is a single number
• Must have: P(x) ≥ 0 for every value x, and Σ_x P(x) = 1

P(T):
  T     P
  warm  0.5
  cold  0.5

P(W):
  W       P
  sun     0.6
  rain    0.1
  fog     0.2999999999999
  meteor  0.0000000000001
Events
• Probabilistic statements are defined over events, or sets of world states
  – “It is raining”
  – “The weather is either cloudy or snowy”
  – “The sum of the two dice rolls is 11”
  – “My car is going between 30 and 50 miles per hour”
• Events are described using propositions about random variables:
  – R = True
  – W = “Cloudy” ∨ W = “Snowy”
  – D ∈ {(5,6), (6,5)}
  – 30 ≤ S ≤ 50
• Notation: P(A) is the probability of the set of world states in which proposition A holds
Atomic events
• Atomic event: a complete specification of the state of the world, or a complete assignment of domain values to all random variables
  – Atomic events are mutually exclusive and exhaustive
• E.g., if the world consists of only two Boolean variables Cavity and Toothache, then there are four distinct atomic events:
  Cavity = false ∧ Toothache = false
  Cavity = false ∧ Toothache = true
  Cavity = true ∧ Toothache = false
  Cavity = true ∧ Toothache = true
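Enumerating atomic events is just the Cartesian product of the variable domains; a minimal sketch (variable names taken from the slide):

```python
from itertools import product

# Each atomic event is one complete assignment to all random variables
variables = {"Cavity": [False, True], "Toothache": [False, True]}
atomic_events = list(product(*variables.values()))

print(len(atomic_events))  # 4
for cavity, toothache in atomic_events:
    print(f"Cavity={cavity}, Toothache={toothache}")
```

Adding a variable with domain size d multiplies the number of atomic events by d, which is why joint tables grow exponentially.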
Joint probability distributions
• A joint distribution is an assignment of probabilities to every possible atomic event
  – Why does it follow from the axioms of probability that the probabilities of all possible atomic events must sum to 1?

  Atomic event                        P
  Cavity = false ∧ Toothache = false  0.8
  Cavity = false ∧ Toothache = true   0.1
  Cavity = true ∧ Toothache = false   0.05
  Cavity = true ∧ Toothache = true    0.05
Joint probability distributions
• Suppose we have a joint distribution of n random variables, each with domain size d
  – What is the size of the probability table? (dⁿ entries)
  – Impossible to write out completely for all but the smallest distributions
• Notation:
  – P(X1 = x1, X2 = x2, …, Xn = xn) refers to a single entry (atomic event) in the joint probability distribution table
  – P(X1, X2, …, Xn) refers to the entire joint probability distribution table
Probabilistic Models
• A probabilistic model is a joint distribution over a set of random variables
• Probabilistic models:
  – (Random) variables with domains; assignments are called outcomes
  – Joint distributions: say whether assignments (outcomes) are likely
  – Normalized: sum to 1.0
  – Ideally: only certain variables directly interact
• Constraint satisfaction problems:
  – Variables with domains
  – Constraints: state whether assignments are possible
  – Ideally: only certain variables directly interact

Distribution over T, W:
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Constraint over T, W:
  T     W     P
  hot   sun   T
  hot   rain  F
  cold  sun   F
  cold  rain  T
Marginal probability distributions
• From the joint distribution P(X, Y) we can find the marginal distributions P(X) and P(Y)

P(Cavity, Toothache):
  Cavity = false ∧ Toothache = false  0.8
  Cavity = false ∧ Toothache = true   0.1
  Cavity = true ∧ Toothache = false   0.05
  Cavity = true ∧ Toothache = true    0.05

P(Cavity):
  Cavity = false  ?
  Cavity = true   ?

P(Toothache):
  Toothache = false  ?
  Toothache = true   ?
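The “?” entries can be computed by summing the matching joint entries. A minimal sketch using the slide’s joint table:

```python
# Joint distribution from the slide, keyed by (cavity, toothache)
joint = {
    (False, False): 0.8,
    (False, True):  0.1,
    (True,  False): 0.05,
    (True,  True):  0.05,
}

# Marginalize out the other variable by summing matching entries
p_cavity = {c: sum(p for (cv, t), p in joint.items() if cv == c)
            for c in (False, True)}
p_toothache = {t: sum(p for (cv, tt), p in joint.items() if tt == t)
               for t in (False, True)}

print(p_cavity)     # Cavity:    false ≈ 0.9,  true ≈ 0.1
print(p_toothache)  # Toothache: false ≈ 0.85, true ≈ 0.15
```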
Marginal probability distributions
• From the joint distribution P(X, Y) we can find the marginal distributions P(X) and P(Y)
• General rule: to find P(X = x), sum the probabilities of all atomic events where X = x. This is called marginalization (we are marginalizing out all the variables except X):

  P(X = x) = P((X = x ∧ Y = y₁) ∨ … ∨ (X = x ∧ Y = yₙ))
           = P(x, y₁) + … + P(x, yₙ)
           = Σᵢ₌₁ⁿ P(x, yᵢ)
Inference
Probabilistic Inference
• Probabilistic inference: compute a desired probability from other known probabilities (e.g., conditional from joint)
• We generally compute conditional probabilities
  – P(on time | no reported accidents) = 0.90
  – These represent the agent’s beliefs given the evidence
• Probabilities change with new evidence:
  – P(on time | no accidents, 5 a.m.) = 0.95
  – P(on time | no accidents, 5 a.m., raining) = 0.80
  – Observing new evidence causes beliefs to be updated
Conditional probability
• For any two events A and B:

  P(A | B) = P(A ∧ B) / P(B) = P(A, B) / P(B)    (assuming P(B) > 0)
Conditional Probabilities
• A simple relation between joint and conditional probabilities:
  P(a | b) = P(a, b) / P(b)
  – In fact, this is taken as the definition of a conditional probability

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3
Conditional probability
• What is P(Cavity = true | Toothache = false)?
  0.05 / 0.85 = 0.059
• What is P(Cavity = false | Toothache = true)?
  0.1 / 0.15 = 0.667

P(Cavity, Toothache):
  Cavity = false ∧ Toothache = false  0.8
  Cavity = false ∧ Toothache = true   0.1
  Cavity = true ∧ Toothache = false   0.05
  Cavity = true ∧ Toothache = true    0.05

P(Cavity):
  Cavity = false  0.9
  Cavity = true   0.1

P(Toothache):
  Toothache = false  0.85
  Toothache = true   0.15
Conditional distributions
• A conditional distribution is a distribution over the values of one variable given fixed values of other variables

P(Cavity, Toothache):
  Cavity = false ∧ Toothache = false  0.8
  Cavity = false ∧ Toothache = true   0.1
  Cavity = true ∧ Toothache = false   0.05
  Cavity = true ∧ Toothache = true    0.05

P(Cavity | Toothache = true):
  Cavity = false  0.667
  Cavity = true   0.333

P(Cavity | Toothache = false):
  Cavity = false  0.941
  Cavity = true   0.059

P(Toothache | Cavity = true):
  Toothache = false  0.5
  Toothache = true   0.5

P(Toothache | Cavity = false):
  Toothache = false  0.889
  Toothache = true   0.111
Normalization trick
• To get the whole conditional distribution P(X | Y = y) at once, select all entries in the joint distribution table matching Y = y and renormalize them to sum to one

P(Cavity, Toothache):
  Cavity = false ∧ Toothache = false  0.8
  Cavity = false ∧ Toothache = true   0.1
  Cavity = true ∧ Toothache = false   0.05
  Cavity = true ∧ Toothache = true    0.05

Select (entries with Cavity = false):
  Toothache = false  0.8
  Toothache = true   0.1

Renormalize → P(Toothache | Cavity = false):
  Toothache = false  0.889
  Toothache = true   0.111
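The select-and-renormalize trick is two lines of code. A minimal sketch on the slide’s joint table (here conditioning on Toothache rather than Cavity, to match the earlier worked numbers 0.941 / 0.059):

```python
# P(Cavity, Toothache), keyed by (cavity, toothache)
joint = {
    (False, False): 0.8,
    (False, True):  0.1,
    (True,  False): 0.05,
    (True,  True):  0.05,
}

def cavity_given_toothache(joint, t):
    """P(Cavity | Toothache = t) via select-and-renormalize."""
    selected = {cv: p for (cv, tt), p in joint.items() if tt == t}  # select
    z = sum(selected.values())          # the normalizer is P(Toothache = t)
    return {cv: p / z for cv, p in selected.items()}                # renormalize

dist = cavity_given_toothache(joint, False)
print(dist)  # Cavity = false ≈ 0.941, Cavity = true ≈ 0.059
```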
Normalization trick
• To get the whole conditional distribution P(X | Y = y) at once, select all entries in the joint distribution table matching Y = y and renormalize them to sum to one
• Why does it work?

  P(x | y) = P(x, y) / P(y) = P(x, y) / Σ_x P(x, y)    (by marginalization)
Inference by Enumeration
• P(W = sun)?
• P(W = sun | S = winter)?
• P(W = sun | S = winter, T = hot)?
S T W P
summer hot sun 0.30
summer hot rain 0.05
summer cold sun 0.10
summer cold rain 0.05
winter hot sun 0.10
winter hot rain 0.05
winter cold sun 0.15
winter cold rain 0.20
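The three queries above can be answered mechanically: sum the joint entries consistent with the query and evidence, then normalize by the entries consistent with the evidence alone. A minimal sketch (helper name `prob` is our own):

```python
# Joint distribution P(S, T, W) copied from the slide's table
joint = {
    ("summer", "hot",  "sun"):  0.30, ("summer", "hot",  "rain"): 0.05,
    ("summer", "cold", "sun"):  0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot",  "sun"):  0.10, ("winter", "hot",  "rain"): 0.05,
    ("winter", "cold", "sun"):  0.15, ("winter", "cold", "rain"): 0.20,
}
NAMES = ("S", "T", "W")

def prob(w, evidence=None):
    """P(W = w | evidence): sum matching joint entries and normalize."""
    evidence = evidence or {}
    def matches(event, query):
        return all(event[NAMES.index(k)] == v for k, v in query.items())
    num = sum(p for e, p in joint.items() if matches(e, {**evidence, "W": w}))
    den = sum(p for e, p in joint.items() if matches(e, evidence))
    return num / den

print(prob("sun"))                               # ≈ 0.65
print(prob("sun", {"S": "winter"}))              # ≈ 0.5
print(prob("sun", {"S": "winter", "T": "hot"}))  # ≈ 2/3
```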
Product rule
• Definition of conditional probability:
  P(A | B) = P(A, B) / P(B)
• Sometimes we have the conditional probability and want to obtain the joint:
  P(A, B) = P(A | B) P(B) = P(B | A) P(A)
The Product Rule
• Sometimes we have conditional distributions but want the joint
• Example with a new weather condition, “dryness”:

P(W):
  W     P
  sun   0.8
  rain  0.2

P(D | W):
  D    W     P
  wet  sun   0.1
  dry  sun   0.9
  wet  rain  0.7
  dry  rain  0.3

P(D, W) = P(D | W) P(W):
  D    W     P
  wet  sun   0.08
  dry  sun   0.72
  wet  rain  0.14
  dry  rain  0.06
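Building the joint from the two tables above is a single dictionary comprehension. A minimal sketch:

```python
p_w = {"sun": 0.8, "rain": 0.2}     # P(W), the marginal
p_d_given_w = {                     # P(D | W), the conditional
    ("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
    ("wet", "rain"): 0.7, ("dry", "rain"): 0.3,
}

# Product rule: P(D, W) = P(D | W) * P(W)
joint = {(d, w): p * p_w[w] for (d, w), p in p_d_given_w.items()}

for (d, w), p in joint.items():
    print(d, w, round(p, 2))
# wet sun 0.08, dry sun 0.72, wet rain 0.14, dry rain 0.06
```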
Product rule
• Definition of conditional probability:
  P(A | B) = P(A, B) / P(B)
• Sometimes we have the conditional probability and want to obtain the joint:
  P(A, B) = P(A | B) P(B) = P(B | A) P(A)
• The chain rule:
  P(A₁, …, Aₙ) = P(A₁) P(A₂ | A₁) P(A₃ | A₁, A₂) … P(Aₙ | A₁, …, Aₙ₋₁)
               = Πᵢ₌₁ⁿ P(Aᵢ | A₁, …, Aᵢ₋₁)
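The chain rule can be verified numerically on the weather joint from the inference-by-enumeration slide: each conditional is a ratio of marginals, and the product telescopes back to the joint entry. A minimal sketch:

```python
# Joint P(S, T, W) from the inference-by-enumeration slide
joint = {
    ("summer", "hot",  "sun"):  0.30, ("summer", "hot",  "rain"): 0.05,
    ("summer", "cold", "sun"):  0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot",  "sun"):  0.10, ("winter", "hot",  "rain"): 0.05,
    ("winter", "cold", "sun"):  0.15, ("winter", "cold", "rain"): 0.20,
}
NAMES = ("S", "T", "W")

def marg(query):
    """Sum the joint entries consistent with a partial assignment."""
    return sum(p for event, p in joint.items()
               if all(event[NAMES.index(k)] == v for k, v in query.items()))

s, t, w = "winter", "hot", "sun"
# P(S) * P(T | S) * P(W | S, T), each conditional written as a ratio of marginals
chain = (marg({"S": s})
         * marg({"S": s, "T": t}) / marg({"S": s})
         * marg({"S": s, "T": t, "W": w}) / marg({"S": s, "T": t}))
print(chain, joint[(s, t, w)])  # both ≈ 0.10
```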
The Birthday problem
• We have a set of n people. What is the probability that at least two of them share the same birthday?
• Easier to calculate the probability that n people all have distinct birthdays
• By the chain rule:
  P(B₁, …, Bₙ distinct)
    = P(B₁, …, Bₙ₋₁ distinct) · P(Bₙ distinct from B₁, …, Bₙ₋₁ | B₁, …, Bₙ₋₁ distinct)
    = Πᵢ₌₂ⁿ P(Bᵢ distinct from B₁, …, Bᵢ₋₁ | B₁, …, Bᵢ₋₁ distinct)
The Birthday problem

  P(Bᵢ distinct from B₁, …, Bᵢ₋₁ | B₁, …, Bᵢ₋₁ distinct) = (365 − i + 1) / 365

  P(B₁, …, Bₙ distinct) = (365/365) · (364/365) · … · ((365 − n + 1)/365)

  P(B₁, …, Bₙ not distinct) = 1 − (365/365) · (364/365) · … · ((365 − n + 1)/365)
The Birthday problem
• For 23 people, the probability of sharing a birthday is above 0.5!
http://en.wikipedia.org/wiki/Birthday_problem
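The product formula from the previous slide translates directly into code (ignoring leap years, as the slides do):

```python
def p_shared_birthday(n):
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (365 - i) / 365   # i-th person avoids the i earlier birthdays
    return 1 - p_distinct

print(round(p_shared_birthday(23), 4))  # ≈ 0.5073: crosses 0.5 at n = 23
print(round(p_shared_birthday(22), 4))  # just under 0.5
```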
Bayes’ Rule
• Two ways to factor a joint distribution over two variables:
  P(x, y) = P(x | y) P(y) = P(y | x) P(x)
• Dividing, we get:
  P(x | y) = P(y | x) P(x) / P(y)
• Why is this at all helpful?
  – Lets us build one conditional from its reverse
  – Often one conditional is tricky but the other one is simple
  – Foundation of many systems we’ll see later (e.g., Automated Speech Recognition, Machine Translation)
• In the running for most important AI equation!

That’s my rule!
45
![Page 44: Probability](https://reader035.vdocuments.site/reader035/viewer/2022062520/56816562550346895dd7e513/html5/thumbnails/44.jpg)
Inference with Bayes’ Rule
• Example: Diagnostic probability from causal probability:
  P(cause | effect) = P(effect | cause) P(cause) / P(effect)
• Example:
  – m is meningitis, s is stiff neck
  – P(m | s) = P(s | m) P(m) / P(s)
  – Note: posterior probability of meningitis still very small
  – Note: you should still get stiff necks checked out! Why?
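A numeric sketch of the diagnostic computation. The three given numbers below are illustrative assumptions in the spirit of the classic meningitis example; the slide’s actual givens were in the image and may differ.

```python
# Assumed givens (illustrative, not from the slide)
p_s_given_m = 0.7      # P(stiff neck | meningitis): the easy, causal direction
p_m = 1 / 50_000       # prior P(meningitis)
p_s = 0.01             # P(stiff neck)

# Bayes' rule: P(m | s) = P(s | m) P(m) / P(s)
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # ≈ 0.0014: still small, because the prior is tiny
```

The posterior is small because the prior P(m) is tiny; the strong likelihood P(s | m) cannot overcome it on its own.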
Independence
• Two events A and B are independent if and only if
  P(A ∧ B) = P(A) P(B)
  – In other words, P(A | B) = P(A) and P(B | A) = P(B)
  – This is an important simplifying assumption for modeling; e.g., Toothache and Weather can be assumed to be independent
• Are two mutually exclusive events independent?
  – No, but for mutually exclusive events we have
    P(A ∨ B) = P(A) + P(B)
• Conditional independence: A and B are conditionally independent given C iff
  P(A ∧ B | C) = P(A | C) P(B | C)
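Independence is a numeric property of the table, so it can be checked directly. A minimal sketch on the T, W joint from the probabilistic-models slide, where T and W turn out to be dependent:

```python
# P(T, W) from the probabilistic-models slide
joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}
p_t = {t: sum(p for (tt, w), p in joint.items() if tt == t) for t in ("hot", "cold")}
p_w = {w: sum(p for (t, ww), p in joint.items() if ww == w) for w in ("sun", "rain")}

# Independent iff every joint entry factors as the product of its marginals
independent = all(abs(joint[(t, w)] - p_t[t] * p_w[w]) < 1e-12 for (t, w) in joint)
print(independent)  # False: P(hot, sun) = 0.4 but P(hot) P(sun) = 0.5 * 0.6 = 0.3
```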
Conditional independence: Example
• Toothache: Boolean variable indicating whether the patient has a toothache
• Cavity: Boolean variable indicating whether the patient has a cavity
• Catch: whether the dentist’s probe catches in the cavity
• If the patient has a cavity, the probability that the probe catches in it doesn't depend on whether he/she has a toothache:
  P(Catch | Toothache, Cavity) = P(Catch | Cavity)
• Therefore, Catch is conditionally independent of Toothache given Cavity
• Likewise, Toothache is conditionally independent of Catch given Cavity:
  P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
• Equivalent statement:
  P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
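To see the property concretely, we can build a joint P(Toothache, Catch, Cavity) that satisfies the conditional independence by construction and then check that conditioning on Toothache does not change P(Catch | Cavity). All the component numbers below are assumptions for illustration:

```python
# Hypothetical component distributions (assumed, not from the slides)
p_cavity = {True: 0.2, False: 0.8}
p_tooth_given_cavity = {True: 0.6, False: 0.1}   # P(Toothache=true | Cavity)
p_catch_given_cavity = {True: 0.9, False: 0.2}   # P(Catch=true | Cavity)

# Joint built with the factorization P(Cavity) P(Toothache|Cavity) P(Catch|Cavity)
joint = {}
for cav in (True, False):
    for tooth in (True, False):
        for catch in (True, False):
            pt = p_tooth_given_cavity[cav] if tooth else 1 - p_tooth_given_cavity[cav]
            pc = p_catch_given_cavity[cav] if catch else 1 - p_catch_given_cavity[cav]
            joint[(tooth, catch, cav)] = p_cavity[cav] * pt * pc

def p_catch_given(tooth, cav):
    """P(Catch = true | Toothache = tooth, Cavity = cav) from the joint."""
    num = joint[(tooth, True, cav)]
    den = joint[(tooth, True, cav)] + joint[(tooth, False, cav)]
    return num / den

# Same answer whether or not we also condition on Toothache
print(p_catch_given(True, True), p_catch_given(False, True))  # both ≈ 0.9
```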
Conditional independence: Example
• How many numbers do we need to represent the joint probability table P(Toothache, Cavity, Catch)?
  2³ − 1 = 7 independent entries
• Write out the joint distribution using the chain rule:
  P(Toothache, Catch, Cavity)
    = P(Cavity) P(Catch | Cavity) P(Toothache | Catch, Cavity)
    = P(Cavity) P(Catch | Cavity) P(Toothache | Cavity)
• How many numbers do we need to represent these distributions?
  1 + 2 + 2 = 5 independent numbers
• In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n
Summary
• Probabilistic model:
  – All we need is the joint probability table (JPT)
  – Can perform inference by enumeration
  – From the JPT we can obtain CPTs and marginals
  – (Can obtain the JPT from CPTs and marginals)
  – Bayes’ Rule can perform inference directly from CPTs and marginals
• Next several lectures:
  – How to avoid storing the entire JPT