Module #19 – Probability
6.2 Probability Theory
Longin Jan Latecki, Temple University
Slides for a course based on the text Discrete Mathematics & Its Applications (6th Edition), Kenneth H. Rosen, based on slides by Michael P. Frank and Andrew W. Moore
Terminology
• A (stochastic) experiment is a procedure that yields one of a given set of possible outcomes.
• The sample space S of the experiment is the set of possible outcomes.
• An event is a subset of the sample space.
• A random variable is a function that assigns a real value to each outcome of an experiment.
Normally, a probability is related to an experiment or a trial. Take flipping a coin, for example: what are the possible outcomes? Heads or tails (the front or back side of the coin) will be shown upwards. After a sufficient number of tosses, we can "statistically" conclude that the probability of heads is 0.5. In rolling a die, there are 6 outcomes. Suppose we want to calculate the probability of the event that the die shows an odd number. What is that probability?
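The die question can be answered by direct enumeration. A minimal Python sketch (assuming a fair die, so all six faces are equally likely):

```python
from fractions import Fraction

# Laplacian rule for equally likely outcomes: p(E) = |E| / |S|.
S = {1, 2, 3, 4, 5, 6}              # sample space of one die roll
E = {s for s in S if s % 2 == 1}    # event: the die shows an odd number
p_odd = Fraction(len(E), len(S))
print(p_odd)  # 1/2
```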
Random Variables
• A "random variable" V is any variable whose value is unknown, or whose value depends on the precise situation.
– E.g., the number of students in class today
– Whether it will rain tonight (Boolean variable)
• The proposition V = vi may have an uncertain truth value, and may be assigned a probability.
Example 10
• A fair coin is flipped 3 times. Let S be the sample space of 8 possible outcomes, and let X be a random variable that assigns to an outcome the number of heads in this outcome.
• Random variable X is a function X: S → X(S), where X(S) = {0, 1, 2, 3} is the range of X, which is the number of heads, and S = { (TTT), (TTH), (THH), (HTT), (HHT), (HHH), (THT), (HTH) }
• X(TTT) = 0; X(TTH) = X(HTT) = X(THT) = 1; X(HHT) = X(THH) = X(HTH) = 2; X(HHH) = 3
• The probability distribution (pmf) of random variable X is given by P(X=3) = 1/8, P(X=2) = 3/8, P(X=1) = 3/8, P(X=0) = 1/8.
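This distribution can be checked by enumerating all 8 outcomes; a short illustrative Python sketch (not part of the original slides):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# S: all 8 equally likely outcomes of three fair coin flips.
S = list(product("HT", repeat=3))

# X maps each outcome to its number of heads; count how often each value occurs.
counts = Counter(outcome.count("H") for outcome in S)
pmf = {x: Fraction(c, len(S)) for x, c in counts.items()}

for x in sorted(pmf):
    print(x, pmf[x])   # 0 1/8, then 1 3/8, 2 3/8, 3 1/8
```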
Experiments & Sample Spaces
• A (stochastic) experiment is any process by which a given random variable V gets assigned some particular value, and where this value is not necessarily known in advance.
– We call it the "actual" value of the variable, as determined by that particular experiment.
• The sample space S of the experiment is just the domain of the random variable, S = dom[V].
• The outcome of the experiment is the specific value vi of the random variable that is selected.
Events
• An event E is any set of possible outcomes in S…
– That is, E ⊆ S = dom[V].
• E.g., the event that "less than 50 people show up for our next class" is represented as the set {1, 2, …, 49} of values of the variable V = (# of people here next class).
• We say that event E occurs when the actual value of V is in E, which may be written V∈E.
– Note that V∈E denotes the proposition (of uncertain truth) asserting that the actual outcome (value of V) will be one of the outcomes in the set E.
Probabilities
• We write P(A) as "the fraction of possible worlds in which A is true".
• We could at this point spend 2 hours on the philosophy of this.
• But we won't.
Visualizing A
[Figure: the event space of all possible worlds, with total area 1. P(A) is the area of the reddish oval of worlds in which A is true; the area outside the oval is the worlds in which A is false.]
Probability
• The probability p = Pr[E] ∈ [0,1] of an event E is a real number representing our degree of certainty that E will occur.
– If Pr[E] = 1, then E is absolutely certain to occur,
• thus V∈E has the truth value True.
– If Pr[E] = 0, then E is absolutely certain not to occur,
• thus V∈E has the truth value False.
– If Pr[E] = ½, then we are maximally uncertain about whether E will occur; that is,
• V∈E and V∉E are considered equally likely.
– How do we interpret other values of p?
Note: We could also define probabilities for more general propositions, as well as events.
Four Definitions of Probability
• Several alternative definitions of probability are commonly encountered:
– Frequentist, Bayesian, Laplacian, Axiomatic
• They have different strengths & weaknesses, philosophically speaking.
– But fortunately, they coincide with each other and work well together, in the majority of cases that are typically encountered.
Probability: Frequentist Definition
• The probability of an event E is the limit, as n→∞, of the fraction of times that we find V∈E over the course of n independent repetitions of (different instances of) the same experiment.
• Some problems with this definition:
– It is only well-defined for experiments that can be independently repeated, infinitely many times!
• or at least, if the experiment can be repeated in principle, e.g., over some hypothetical ensemble of (say) alternate universes.
– It can never be measured exactly in finite time!
• Advantage: It's an objective, mathematical definition.
Pr[E] :≡ lim_{n→∞} n_{V∈E} / n, where n_{V∈E} is the number of the n trials in which V∈E.
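A small simulation illustrates the frequentist limit. This is an illustrative sketch: the helper name `frequentist_estimate` is ours, and the "experiment" is a simulated fair-coin flip with E = "heads":

```python
import random

# Frequentist sketch: estimate Pr[E] as the fraction of n trials where E occurs.
def frequentist_estimate(n, seed=0):
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

# The estimates wander, but settle toward 0.5 as n grows:
for n in (100, 10_000, 1_000_000):
    print(n, frequentist_estimate(n))
```

Note that no finite n ever pins the probability down exactly, which is precisely the "finite time" objection on the slide.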
Probability: Bayesian Definition
• Suppose a rational, profit-maximizing entity R is offered a choice between two rewards:
– Winning $1 if and only if the event E actually occurs.
– Receiving p dollars (where p ∈ [0,1]) unconditionally.
• If R can honestly state that he is completely indifferent between these two rewards, then we say that R's probability for E is p, that is, Pr_R[E] :≡ p.
• Problem: It's a subjective definition; depends on the reasoner R, and his knowledge, beliefs, & rationality.
– The version above additionally assumes that the utility of money is linear.
• This assumption can be avoided by using "utils" (utility units) instead of dollars.
Probability: Laplacian Definition
• First, assume that all individual outcomes in the sample space are equally likely to each other…
– Note that this term still needs an operational definition!
• Then, the probability of any event E is given by Pr[E] = |E|/|S|. Very simple!
• Problems: Still needs a definition for equally likely, and depends on the existence of some finite sample space S in which all outcomes in S are, in fact, equally likely.
Probability: Axiomatic Definition
• Let p be any total function p: S → [0,1] such that ∑_{s∈S} p(s) = 1.
• Such a p is called a probability distribution.
• Then, the probability under p of any event E ⊆ S is just:
Pr[E] :≡ ∑_{s∈E} p(s)
• Advantage: Totally mathematically well-defined!
– This definition can even be extended to apply to infinite sample spaces, by changing ∑→∫, and calling p a probability density function or a probability measure.
• Problem: Leaves operational meaning unspecified.
The Axioms of Probability
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
The area of A can’t get any smaller than 0
And a zero area would mean no world could ever have A true
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
The area of A can’t get any bigger than 1
And an area of 1 would mean all worlds will have A true
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
[Venn diagram: overlapping events A and B; the overlap P(A and B) is counted twice by P(A) + P(B), hence subtracted once.]
These Axioms are Not to be Trifled With
• There have been attempts to develop different methodologies for uncertainty:
• Fuzzy logic
• Three-valued logic
• Dempster-Shafer
• Non-monotonic reasoning
• But the axioms of probability are the only system with this property:
If you gamble using them you can't be unfairly exploited by an opponent using some other system [de Finetti 1931]
Theorems from the Axioms
• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
From these we can prove:
P(not A) = P(~A) = 1 - P(A)
• How?
Another important theorem
• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
From these we can prove:
P(A) = P(A ^ B) + P(A ^ ~B)
• How?
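Both theorems can be sanity-checked by enumeration over a small, made-up distribution (the outcome space and the probabilities below are arbitrary choices for illustration, not from the slides):

```python
from fractions import Fraction

# A small made-up model: outcomes are bit pairs (a, b) with arbitrary
# probabilities that sum to 1.
p = {(0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
     (1, 0): Fraction(3, 10), (1, 1): Fraction(4, 10)}

def P(event):
    # Probability of an event = sum of the probabilities of its outcomes.
    return sum(pr for s, pr in p.items() if event(s))

A = lambda s: s[0] == 1   # event A: first bit is 1
B = lambda s: s[1] == 1   # event B: second bit is 1

# P(~A) = 1 - P(A)
assert P(lambda s: not A(s)) == 1 - P(A)
# P(A) = P(A ^ B) + P(A ^ ~B)
assert P(A) == P(lambda s: A(s) and B(s)) + P(lambda s: A(s) and not B(s))
print("both theorems verified")
```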
Probability of an event E
The probability of an event E is the sum of the probabilities of the outcomes in E. That is,
p(E) = ∑_{s∈E} p(s)
Note that, if there are n outcomes in the event E, that is, if E = {a1, a2, …, an}, then
p(E) = ∑_{i=1}^{n} p(ai)
Example
• What is the probability that, if we flip a coin three times, we get an odd number of tails?
The outcomes with an odd number of tails are TTT, THH, HHT, and HTH (among the 8 outcomes TTT, TTH, THH, HTT, HHT, HHH, THT, HTH).
Each outcome has probability 1/8, so
p(odd number of tails) = 1/8 + 1/8 + 1/8 + 1/8 = ½
Visualizing Sample Space
• 1. Listing
– S = {Head, Tail}
• 2. Venn Diagram
• 3. Contingency Table
• 4. Decision Tree Diagram
Venn Diagram
Experiment: Toss 2 Coins. Note Faces.
Sample Space S = {HH, HT, TH, TT}
[Venn diagram: the sample space S contains the four outcomes HH, HT, TH, TT; an oval marks the event "Tail" (the outcomes containing at least one tail).]
Contingency Table
Experiment: Toss 2 Coins. Note Faces.
1st Coin \ 2nd Coin | Head | Tail | Total
Head | HH | HT | HH, HT
Tail | TH | TT | TH, TT
Total | HH, TH | HT, TT | S
S = {HH, HT, TH, TT} is the sample space; each cell is an outcome, and a row such as (Head on 1st coin) is a simple event.
Tree Diagram
Experiment: Toss 2 Coins. Note Faces.
Sample Space S = {HH, HT, TH, TT}
[Tree diagram: the first branch is H or T for the 1st coin; each branch splits again into H or T for the 2nd coin, giving the four outcomes HH, HT, TH, TT.]
Discrete Random Variable
– Possible values (outcomes) are discrete
• E.g., natural numbers (0, 1, 2, 3, etc.)
– Obtained by counting
– Usually a finite number of values
• But could be infinite (must be "countable")
Discrete Probability Distribution ( also called probability mass function (pmf) )
1. List of all possible [x, p(x)] pairs
– x = value of random variable (outcome)
– p(x) = probability associated with value
2. Mutually exclusive (no overlap)
3. Collectively exhaustive (nothing left out)
4. 0 ≤ p(x) ≤ 1
5. ∑ p(x) = 1
Visualizing Discrete Probability Distributions
• Listing: { (0, .25), (1, .50), (2, .25) }
• Table:
# Tails | f(x) Count | p(x)
0 | 1 | .25
1 | 2 | .50
2 | 1 | .25
• Equation: p(x) = n! / (x!(n-x)!) · p^x (1-p)^(n-x)
• Graph: [bar chart of p(x) against x = 0, 1, 2, with bars of height .25, .50, .25]
Arity of Random Variables
• Suppose A can take on more than 2 values.
• A is a random variable with arity k if it can take on exactly one value out of {v1, v2, …, vk}
• Thus…
P(A=vi ∧ A=vj) = 0 if i ≠ j
P(A=v1 ∨ A=v2 ∨ … ∨ A=vk) = 1
Mutually Exclusive Events
• Two events E1, E2 are called mutually exclusive if they are disjoint: E1 ∩ E2 = ∅
– Note that two mutually exclusive events cannot both occur in the same instance of a given experiment.
• For mutually exclusive events,
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2].
Exhaustive Sets of Events
• A set E = {E1, E2, …} of events in the sample space S is called exhaustive iff ∪_i Ei = S.
• An exhaustive set E of events that are all mutually exclusive with each other has the property that
∑_i Pr[Ei] = 1.
An easy fact about Multivalued Random Variables:
• Using the axioms of probability…
0 <= P(A) <= 1, P(True) = 1, P(False) = 0
P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
P(A=vi ∧ A=vj) = 0 if i ≠ j
P(A=v1 ∨ A=v2 ∨ … ∨ A=vk) = 1
• It's easy to prove that
P(A=v1 ∨ A=v2 ∨ … ∨ A=vi) = ∑_{j=1}^{i} P(A=vj)
• And thus we can prove
∑_{j=1}^{k} P(A=vj) = 1
Another fact about Multivalued Random Variables:
• Using the axioms of probability…
0 <= P(A) <= 1, P(True) = 1, P(False) = 0
P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
P(A=vi ∧ A=vj) = 0 if i ≠ j
P(A=v1 ∨ A=v2 ∨ … ∨ A=vk) = 1
• It's easy to prove that
P(B ∧ [A=v1 ∨ A=v2 ∨ … ∨ A=vi]) = ∑_{j=1}^{i} P(B ∧ A=vj)
Elementary Probability Rules
• P(~A) + P(A) = 1
• P(B) = P(B ^ A) + P(B ^ ~A)
∑_{j=1}^{k} P(A=vj) = 1
P(B) = ∑_{j=1}^{k} P(B ∧ A=vj)
Bernoulli Trials
• Each performance of an experiment with only two possible outcomes is called a Bernoulli trial.
• In general, a possible outcome of a Bernoulli trial is called a success or a failure.
• If p is the probability of a success and q is the probability of a failure, then p + q = 1.
Example
A coin is biased so that the probability of heads is 2/3. What is the probability that exactly four heads come up when the coin is flipped seven times, assuming that the flips are independent?
The number of ways that we can get four heads is C(7,4) = 7!/(4!3!) = 35.
The probability of getting four heads and three tails in a particular order is (2/3)^4 (1/3)^3 = 16/3^7.
p(4 heads and 3 tails) = C(7,4) (2/3)^4 (1/3)^3 = 35·16/3^7 = 560/2187
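The arithmetic can be verified both by the formula and by brute force over all 2^7 head/tail sequences (an illustrative sketch):

```python
from fractions import Fraction
from math import comb
from itertools import product

p = Fraction(2, 3)   # probability of heads

# Via the formula: C(7,4) * (2/3)^4 * (1/3)^3
formula = comb(7, 4) * p**4 * (1 - p)**3

# Via brute force: sum the probabilities of all sequences with exactly 4 heads.
brute = sum(p**seq.count("H") * (1 - p)**seq.count("T")
            for seq in product("HT", repeat=7) if seq.count("H") == 4)

print(formula, brute)  # 560/2187 560/2187
```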
Probability of k successes in n independent Bernoulli trials.
The probability of k successes in n independent Bernoulli trials, with probability of success p and probability of failure q = 1-p, is C(n,k) p^k q^(n-k).
Find each of the following probabilities when n independent Bernoulli trials are carried out
with probability of success, p.
• Probability of no successes.
C(n,0) p^0 q^n = 1·(p^0)(1-p)^n = (1-p)^n
• Probability of at least one success.
1 - (1-p)^n (why?)
Find each of the following probabilities when n independent Bernoulli trials are carried out
with probability of success, p.
• Probability of at most one success.
This means there can be no successes or one success:
C(n,0) p^0 q^(n-0) + C(n,1) p^1 q^(n-1)
= (1-p)^n + np(1-p)^(n-1)
• Probability of at least two successes.
1 - (1-p)^n - np(1-p)^(n-1)
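These closed forms can be checked against the binomial pmf for a sample choice of n and p (both values below are arbitrary, chosen only for illustration):

```python
from fractions import Fraction
from math import comb

# Arbitrary example: n = 5 trials, success probability p = 3/10.
n, p = 5, Fraction(3, 10)

def pmf(k):
    # P(exactly k successes) = C(n,k) p^k q^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

assert pmf(0) == (1 - p)**n                                         # no successes
assert 1 - pmf(0) == sum(pmf(k) for k in range(1, n + 1))           # at least one
assert pmf(0) + pmf(1) == (1 - p)**n + n*p*(1 - p)**(n - 1)         # at most one
assert 1 - pmf(0) - pmf(1) == sum(pmf(k) for k in range(2, n + 1))  # at least two
print("all four formulas agree")
```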
A coin is flipped until it comes up tails. The probability the coin comes up tails is p.
• What is the probability that the experiment ends after n flips, that is, the outcome consists of n-1 heads and a tail?
(1-p)^(n-1) p
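As a sanity check, these probabilities should sum to 1 over all possible stopping times n = 1, 2, …; a sketch with an arbitrary choice of p:

```python
from fractions import Fraction

# P(experiment ends after exactly n flips) = (1-p)^(n-1) * p.
# Arbitrary choice p = 1/4; the partial sum over n <= 200 differs
# from 1 by exactly (3/4)^200, which is astronomically small.
p = Fraction(1, 4)
partial = sum((1 - p)**(n - 1) * p for n in range(1, 201))
print(float(partial))  # effectively 1
```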
Probability vs. Odds
• You may have heard the term "odds."
– It is widely used in the gambling community.
• This is not the same thing as probability!
– But, it is very closely related.
• The odds in favor of an event E means the relative probability of E compared with its complement ¬E:
O(E) :≡ Pr(E)/Pr(¬E)
– E.g., if p(E) = 0.6 then p(¬E) = 0.4 and O(E) = 0.6/0.4 = 1.5.
• Odds are conventionally written as a ratio of integers.
– E.g., 3/2 or 3:2 in the above example. "Three to two in favor."
• The odds against E just means 1/O(E). "2 to 3 against"
Exercise: Express the probability p as a function of the odds in favor O.
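The conversion can be sketched in both directions; note that the `prob` function below encodes one answer to the exercise (p = O/(1+O)), so skip it if you want to derive it yourself:

```python
from fractions import Fraction

def odds(p):
    # Odds in favor: O(E) = Pr(E) / Pr(not E)
    return p / (1 - p)

def prob(O):
    # Inverse (one answer to the exercise): p = O / (1 + O)
    return O / (1 + O)

p = Fraction(6, 10)
print(odds(p))        # 3/2, "three to two in favor"
print(prob(odds(p)))  # 3/5, recovering p = 0.6
```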
Example 1: Balls-and-Urn
• Suppose an urn contains 4 blue balls and 5 red balls.
• An example experiment: Shake up the urn, reach in (without looking) and pull out a ball.
• A random variable V: Identity of the chosen ball.
• The sample space S: The set of all possible values of V:
– In this case, S = {b1,…,b9}
• An event E: "The ball chosen is blue": E = { ______________ }
• What are the odds in favor of E?
• What is the probability of E?
[Figure: the urn containing the nine balls b1 through b9.]
Independent Events
• Two events E, F are called independent if Pr[E∩F] = Pr[E]·Pr[F].
• Relates to the product rule for the number of ways of doing two independent tasks.
• Example: Flip a coin, and roll a die.
Pr[(coin shows heads) ∧ (die shows 1)] = Pr[coin is heads] × Pr[die is 1] = ½ × 1/6 = 1/12.
Example
Suppose a red die and a blue die are rolled. The sample space:
[Table: a 6×6 grid of the 36 equally likely outcomes, one die's value per row and the other's per column.]
Are the events "sum is 7" and "the blue die is 3" independent?
|S| = 36
|sum is 7| = 6
|blue die is 3| = 6
|(sum is 7) ∩ (blue die is 3)| = 1
p(sum is 7 and blue die is 3) = 1/36
p(sum is 7) · p(blue die is 3) = 6/36 · 6/36 = 1/36
Thus, p((sum is 7) and (blue die is 3)) = p(sum is 7) · p(blue die is 3).
The events "sum is 7" and "the blue die is 3" are therefore independent.
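The same check, re-derived by enumerating the 36 outcomes (illustrative sketch):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely (red, blue) outcomes.
S = list(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(sum(1 for o in S if event(o)), len(S))

sum7 = lambda o: o[0] + o[1] == 7   # sum is 7
blue3 = lambda o: o[1] == 3         # blue die is 3
both = lambda o: sum7(o) and blue3(o)

# Independence: P(E1 and E2) = P(E1) * P(E2)
assert P(both) == P(sum7) * P(blue3) == Fraction(1, 36)
print(P(sum7), P(blue3), P(both))  # 1/6 1/6 1/36
```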
Conditional Probability
• Let E, F be any events such that Pr[F] > 0.
• Then, the conditional probability of E given F, written Pr[E|F], is defined as Pr[E|F] :≡ Pr[E∩F]/Pr[F].
• This is what our probability that E would turn out to occur should be, if we are given only the information that F occurs.
• If E and F are independent then Pr[E|F] = Pr[E]:
Pr[E|F] = Pr[E∩F]/Pr[F] = Pr[E]×Pr[F]/Pr[F] = Pr[E]
Visualizing Conditional Probability
• If we are given that event F occurs, then
– Our attention gets restricted to the subspace F.
• Our posterior probability for E (after seeing F) corresponds to the fraction of F where E occurs also.
• Thus, p′(E) = p(E∩F)/p(F).
[Figure: within the entire sample space S, event F, event E, and their overlap E∩F.]
Conditional Probability Example
• Suppose I choose a single letter out of the 26-letter English alphabet, totally at random.
– Use the Laplacian assumption on the sample space {a,b,…,z}.
– What is the (prior) probability that the letter is a vowel?
• Pr[Vowel] = __ / __ .
• Now, suppose I tell you that the letter chosen happened to be in the first 9 letters of the alphabet.
– Now, what is the conditional (or posterior) probability that the letter is a vowel, given this information?
• Pr[Vowel | First9] = ___ / ___ .
[Figure: the sample space S of 26 letters, with the vowels marked and the first 9 letters circled.]
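A sketch that computes both answers under the Laplacian assumption (note that it reveals the values left blank above):

```python
from fractions import Fraction
from string import ascii_lowercase

# Laplacian assumption: each of the 26 letters is equally likely.
S = set(ascii_lowercase)
vowels = set("aeiou")
first9 = set(ascii_lowercase[:9])   # a through i

prior = Fraction(len(vowels), len(S))                     # Pr[Vowel]
posterior = Fraction(len(vowels & first9), len(first9))   # Pr[Vowel | First9]
print(prior, posterior)  # 5/26 1/3
```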
Example
• What is the probability that, if we flip a coin three times, we get an odd number of tails (event E), given that event F, "the first flip comes up tails," occurs?
Given F, the possible outcomes are (TTT), (TTH), (THT), (THH); each has conditional probability 1/4.
p(E|F) = 1/4 + 1/4 = ½ (from the outcomes TTT and THH), or p(E|F) = p(E∩F)/p(F) = (2/8)/(4/8) = ½.
For comparison, p(E) = 4/8 = ½. E and F are independent, since p(E|F) = Pr(E).
Prior and Posterior Probability
• Suppose that, before you are given any information about the
outcome of an experiment, your personal probability for an event E to occur is p(E) = Pr[E].– The probability of E in your original probability distribution p is called
the prior probability of E.• This is its probability prior to obtaining any information about the outcome.
• Now, suppose someone tells you that some event F (which may overlap with E) actually occurred in the experiment.– Then, you should update your personal probability for event E to occur,
to become p′(E) = Pr[E|F] = p(E∩F)/p(F).• The conditional probability of E, given F.
– The probability of E in your new probability distribution p′ is called the posterior probability of E.
• This is its probability after learning that event F occurred.
• After seeing F, the posterior distribution p′ is defined by letting
p′(v) = p({v}∩F)/p(F) for each individual outcome v∈S.
6.3 Bayes’ Theorem
Longin Jan Latecki, Temple University
Slides for a course based on the text Discrete Mathematics & Its Applications (6th Edition), Kenneth H. Rosen, based on slides by Michael P. Frank and Wolfram Burgard
Bayes' Rule
• One way to compute the probability that a hypothesis H is correct, given some data D:
Pr[H|D] = Pr[D|H]·Pr[H] / Pr[D]
• This follows directly from the definition of conditional probability! (Exercise: Prove it.)
• This rule is the foundation of Bayesian methods for probabilistic reasoning, which are very powerful, and widely used in artificial intelligence applications:
– For data mining, automated diagnosis, pattern recognition, statistical modeling, even evaluating scientific hypotheses!
[Portrait: Rev. Thomas Bayes, 1702-1761]
Bayes’ Theorem
• Allows one to compute the probability that a hypothesis H is correct, given data D:
Pr[H|D] = Pr[D|H]·Pr[H] / Pr[D]
• When the set of hypotheses Hj is exhaustive:
Pr[Hi|D] = Pr[D|Hi]·Pr[Hi] / ∑_j Pr[D|Hj]·Pr[Hj]
Example 1: Two boxes with balls
• Two boxes: the first has 2 blue and 7 red balls; the second has 4 blue and 3 red balls.
• Bob selects a ball by first choosing one of the two boxes, and then one ball from this box.
• If Bob has selected a red ball, what is the probability that he selected a ball from the first box?
• An event E: Bob has chosen a red ball.
• An event F: Bob has chosen a ball from the first box.
• We want to find p(F | E)
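Assuming the two boxes are chosen with equal probability (the slide leaves this implicit), Bayes' theorem gives p(F | E) directly:

```python
from fractions import Fraction

# Assumption: Bob picks each box with probability 1/2.
pF = Fraction(1, 2)              # F: ball came from box 1
pE_given_F = Fraction(7, 9)      # red | box 1 (2 blue, 7 red)
pE_given_notF = Fraction(3, 7)   # red | box 2 (4 blue, 3 red)

pE = pE_given_F * pF + pE_given_notF * (1 - pF)   # total probability of red
pF_given_E = pE_given_F * pF / pE                 # Bayes' theorem
print(pF_given_E, float(pF_given_E))  # 49/76, about 0.645
```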
Example 2:
• Suppose 1% of the population has AIDS.
• Prob. that a positive result is right: 95%.
• Prob. that a negative result is right: 90%.
• What is the probability that someone who has a positive result is actually an AIDS patient?
• H: event that a person has AIDS; D: event of a positive result.
• P[D|H] = 0.95, P[D|¬H] = 1 − 0.9 = 0.1
• P[D] = P[D|H]·P[H] + P[D|¬H]·P[¬H] = 0.95·0.01 + 0.1·0.99 = 0.1085
• P[H|D] = 0.95·0.01/0.1085 ≈ 0.0876
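The arithmetic on this slide can be reproduced directly (a small sketch, using the slide's numbers):

```python
p_H = 0.01                 # prior: 1% of the population has AIDS
p_D_given_H = 0.95         # positive result is right 95% of the time
p_D_given_not_H = 1 - 0.9  # negative result is right 90%, so 10% false positives

# Total probability of a positive result:
p_D = p_D_given_H * p_H + p_D_given_not_H * (1 - p_H)
# Posterior by Bayes' theorem:
p_H_given_D = p_D_given_H * p_H / p_D
print(round(p_D, 4), round(p_H_given_D, 4))  # 0.1085 0.0876
```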
What's behind door number three?
• The Monty Hall problem paradox
– Consider a game show where a prize (a car) is behind one of three doors.
– The other two doors do not have prizes (goats instead).
– After you pick one of the doors, the host (Monty Hall) opens a different door to show you that the door he opened does not hide the prize.
– Do you change your decision?
• Your initial probability to win (i.e., pick the right door) is 1/3.
• What is your chance of winning if you change your choice after Monty opens a wrong door?
• After Monty opens a wrong door, if you change your choice, your chance of winning is 2/3.
– Thus, your chance of winning doubles if you change.
– Huh?
Monty Hall Problem
Ci - The car is behind Door i, for i equal to 1, 2 or 3.
Hij - The host opens Door j after the player has picked Door i, for i and j equal to 1, 2 or 3. Without loss of generality, assume, by re-numbering the doors if necessary, that the player picks Door 1, and that the host then opens Door 3, revealing a goat. In other words, the host makes proposition H13 true.
Then the posterior probability of winning by not switching doors is P(C1 | H13).
P(Ci) = 1/3 for each i.
By Bayes' theorem:

P(C1 | H13) = P(H13 | C1) · P(C1) / P(H13)
= P(H13 | C1) · P(C1) / Σi P(H13 | Ci) · P(Ci)

P(H13 | C1) = 1/2, since the host will always open a door that has no car behind it, chosen from among the two not picked by the player (which are 2 and 3 here).

P(H13) = P(H13 | C1)·P(C1) + P(H13 | C2)·P(C2) + P(H13 | C3)·P(C3)
= (1/2)·(1/3) + 1·(1/3) + 0·(1/3) = 1/2

P(C1 | H13) = (1/2 · 1/3) / (1/2) = 1/3
The probability of winning by switching is P(C2 | H13), since under our assumption switching means switching the selection to Door 2, and since P(C3 | H13) = 0 (the host will never open the door with the car):

P(C2 | H13) = P(H13 | C2) · P(C2) / P(H13)
= P(H13 | C2) · P(C2) / Σi P(H13 | Ci) · P(Ci)
= (1 · 1/3) / ((1/2)·(1/3) + 1·(1/3) + 0·(1/3))
= (1/3) / (1/2) = 2/3
The posterior probability of winning by not switching doors is P(C1 | H13) = 1/3.
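The 1/3 vs. 2/3 split can also be checked by simulation; a minimal Monte Carlo sketch (not part of the slides):

```python
import random

def monty_trial(switch, rng):
    """One round of the game; returns True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a goat door that the player did not pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
n = 100_000
stay = sum(monty_trial(False, rng) for _ in range(n)) / n
swap = sum(monty_trial(True, rng) for _ in range(n)) / n
print(stay, swap)  # roughly 1/3 and 2/3
```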
Exercises: 6, p. 424, and 16, p. 425
Continuous random variable
Continuous Prob. Density Function
1. Mathematical formula
2. Shows all values, x, and frequencies, f(x)
– f(x) is not a probability
3. Properties:
∫ f(x) dx = 1 over all x (area under curve)
f(x) ≥ 0, a ≤ x ≤ b

[Figure: a density curve f(x) over values x, with support from a to b.]
Continuous Random Variable Probability
Probability is area under curve!

P(c ≤ x ≤ d) = ∫ from c to d of f(x) dx

[Figure: density f(x) with the area between c and d shaded.]
Probability mass function
In probability theory, a probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value. A pmf differs from a probability density function (pdf) in that the values of a pdf, defined only for continuous random variables, are not probabilities as such. Instead, the integral of a pdf over a range of possible values (a, b] gives the probability of the random variable falling within that range.
Example graphs of pmfs. All the values of a pmf must be non-negative and sum to 1. (Right) The pmf of a fair die: all the numbers on the die have an equal chance of appearing on top when the die is rolled.
Suppose that X is a discrete random variable, taking values on some countable sample space S ⊆ R. Then the probability mass function fX(x) for X is given by

fX(x) = P(X = x) for x ∈ S, and fX(x) = 0 otherwise.

Note that this explicitly defines fX(x) for all real numbers, including all values in R that X could never take; indeed, it assigns such values a probability of zero.

Example. Suppose that X is the outcome of a single coin toss, assigning 0 to tails and 1 to heads. The probability that X = x is 0.5 on the state space {0, 1} (this is a Bernoulli random variable), and hence the probability mass function is

fX(x) = 1/2 for x ∈ {0, 1}, and fX(x) = 0 otherwise.
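The fair-die pmf from the figure caption can be written the same way (a small sketch):

```python
from fractions import Fraction

def die_pmf(x):
    """pmf of a fair die: 1/6 on {1, ..., 6}, 0 for every other value."""
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)

print(die_pmf(3), die_pmf(7))                # 1/6 0
print(sum(die_pmf(x) for x in range(1, 7)))  # 1
```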
Uniform Distribution
1. Equally likely outcomes
2. Probability density:
f(x) = 1/(d − c), c ≤ x ≤ d
3. Mean and standard deviation:
Mean (= median): (c + d)/2
Standard deviation: (d − c)/√12

[Figure: f(x) constant at 1/(d − c) between c and d.]
Uniform Distribution Example
• You're the production manager of a soft drink bottling company. You believe that when a machine is set to dispense 12 oz., it really dispenses 11.5 to 12.5 oz. inclusive.
• Suppose the amount dispensed has a uniform distribution.
• What is the probability that less than 11.8 oz. is dispensed?
Uniform Distribution Solution
f(x) = 1/(d − c) = 1/(12.5 − 11.5) = 1.0

P(11.5 ≤ x ≤ 11.8) = (Base)(Height) = (11.8 − 11.5)(1) = 0.30

[Figure: f(x) = 1.0 on [11.5, 12.5], with the area from 11.5 to 11.8 shaded.]
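A small helper (the name `uniform_prob` is ours, not the slides') generalizes the base-times-height computation:

```python
def uniform_prob(a, b, c, d):
    """P(a <= X <= b) for X ~ Uniform(c, d): base times height, clipped to [c, d]."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0.0) * (1.0 / (d - c))

print(round(uniform_prob(11.5, 11.8, 11.5, 12.5), 2))  # 0.3
```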
Normal Distribution
1. Describes many random processes or continuous phenomena
2. Can be used to approximate discrete probability distributions
– Example: binomial
3. Basis for classical statistical inference
4. A.k.a. Gaussian distribution
Normal Distribution
1. 'Bell-shaped' and symmetrical
2. Mean, median, mode are equal
3. Random variable has infinite range
* light-tailed distribution

[Figure: a bell-shaped curve f(X), centered at the mean.]
Probability Density Function
f(x) = (1/(σ√(2π))) · e^(−(1/2)·((x − μ)/σ)²)

f(x) = frequency of random variable x
σ = population standard deviation
π = 3.14159; e = 2.71828
x = value of random variable (−∞ < x < ∞)
μ = population mean
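The formula above can be evaluated with the standard library alone; the Riemann-sum check below is our own sanity test, not part of the slides:

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The density should integrate to 1; approximate with a Riemann sum on [-8, 8].
dx = 0.001
area = sum(normal_pdf(k * dx, 0.0, 1.0) * dx for k in range(-8000, 8000))
print(round(area, 4))  # 1.0
```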
Effect of Varying Parameters (μ and σ)

[Figure: three normal curves A, B, C on a common axis, differing in mean and standard deviation.]
Normal Distribution Probability
P(c ≤ x ≤ d) = ∫ from c to d of f(x) dx = ?

Probability is area under curve!

[Figure: normal density f(x) with the area between c and d shaded.]
Infinite Number of Tables

Normal distributions differ by mean and standard deviation. Each distribution would require its own table. That's an infinite number!

[Figure: several normal curves f(X) with different means and spreads.]
Standardize the Normal Distribution

Z = (X − μ)/σ

Standardizing maps any normal distribution (mean μ, standard deviation σ) to the standardized normal distribution with μ = 0 and σ = 1. One table!

[Figure: a normal curve in X transformed to the standardized normal curve in Z.]
Intuitions on Standardizing
• Subtracting μ from each value X just moves the curve around, so values are centered on 0 instead of on μ.
• Once the curve is centered, dividing each value by σ > 1 moves all values toward 0, compressing the curve.
Standardizing Example

Normal distribution: μ = 5, σ = 10, X = 6.2

Z = (X − μ)/σ = (6.2 − 5)/10 = 0.12

Standardized normal distribution: μ = 0, σ = 1, Z = 0.12

[Figure: X = 6.2 on the original curve maps to Z = 0.12 on the standardized curve.]
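The same standardization is easy in code; `math.erf` plays the role of the single table (our sketch, not from the slides):

```python
import math

def z_score(x, mu, sigma):
    """Standardize: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def std_normal_cdf(z):
    """Phi(z) for the standardized normal, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(6.2, 5, 10)
print(round(z, 2))                  # 0.12
print(round(std_normal_cdf(z), 4))  # about 0.5478
```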
6.4 Expected Value and Variance
Longin Jan Latecki, Temple University

Slides for a course based on the text Discrete Mathematics & Its Applications (6th Edition) by Kenneth H. Rosen, based on slides by Michael P. Frank
Expected Values
• For any random variable V having a numeric domain, its expectation value or expected value or weighted average value or (arithmetic) mean value Ex[V], under the probability distribution Pr[v] = p(v), is defined as
• The term "expected value" is very widely used for this.
– But this term is somewhat misleading, since the "expected" value might itself be totally unexpected, or even impossible!
• E.g., if p(0) = 0.5 and p(2) = 0.5, then Ex[V] = 1, even though p(1) = 0 and so we know that V ≠ 1!
• Or, if p(0) = 0.5 and p(1) = 0.5, then Ex[V] = 0.5 even if V is an integer variable!
Ex[V] :≡ Σ_{v ∈ dom[V]} p(v)·v
Derived Random Variables
• Let S be a sample space over values of a random variable V (representing possible outcomes).
• Then, any function f over S can also be considered to be a random variable (whose actual value f(V) is derived from the actual value of V).
• If the range R = range[f] of f is numeric, then the mean value Ex[f] of f can still be defined, as

Ex[f] :≡ Σ_{s ∈ S} p(s)·f(s)
Recall that a random variable X is actually a function f: S → X(S),
where S is the sample space and X(S) is the range of X.This fact implies that the expected value of X is
E(X) = Σ_{s ∈ S} p(s)·X(s) = Σ_{r ∈ X(S)} p(X = r)·r

Example 1. Expected Value of a Die.
Let X be the number that comes up when a die is rolled.

E(X) = Σ_{r=1..6} p(X = r)·r = (1/6)·(1 + 2 + 3 + 4 + 5 + 6) = 21/6 = 7/2
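The die computation, checked with exact arithmetic (a one-line sketch):

```python
from fractions import Fraction

# E(X) for a fair die: each outcome r in {1,...,6} has probability 1/6.
E = sum(Fraction(1, 6) * r for r in range(1, 7))
print(E)  # 7/2
```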
Example 2
• A fair coin is flipped 3 times. Let S be the sample space of 8 possible outcomes, and let X be a random variable that assigns to an outcome the number of heads in this outcome.

E(X) = 1/8·[X(TTT) + X(TTH) + X(THH) + X(HTT) + X(HHT) + X(HHH) + X(THT) + X(HTH)]
= 1/8·[0 + 1 + 2 + 1 + 2 + 3 + 1 + 2] = 12/8 = 3/2
Linearity of Expectation Values
• Let X1, X2 be any two random variables derived from the same sample space S, and subject to the same underlying distribution.
• Then we have the following theorems:
Ex[X1 + X2] = Ex[X1] + Ex[X2]
Ex[aX1 + b] = a·Ex[X1] + b
• You should be able to easily prove these for yourself at home.
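Both theorems can be spot-checked on a small sample space; the variables X1 and X2 below are our own made-up examples over three coin flips:

```python
import itertools

# Sample space: 3 fair coin flips, each outcome equally likely.
S = list(itertools.product("HT", repeat=3))
p = 1 / len(S)

def Ex(f):
    """Expectation of a random variable f defined on S."""
    return sum(p * f(s) for s in S)

X1 = lambda s: s.count("H")              # number of heads
X2 = lambda s: 1 if s[0] == "H" else 0   # indicator: first flip is heads

lhs = Ex(lambda s: X1(s) + X2(s))
rhs = Ex(X1) + Ex(X2)
print(lhs, rhs)  # both 2.0
assert abs(Ex(lambda s: 3 * X1(s) + 4) - (3 * Ex(X1) + 4)) < 1e-12
```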
Variance & Standard Deviation
• The variance Var[X] = σ²(X) of a random variable X is the expected value of the square of the difference between the value of X and its expectation value Ex[X]:

Var[X] :≡ Σ_{s ∈ S} (X(s) − Ex[X])²·p(s)

• The standard deviation or root-mean-square (RMS) difference of X is σ(X) :≡ Var[X]^(1/2).
Example 15
• What is the variance of the random variable X, where X is the number that comes up when a die is rolled?
• V(X) = E(X²) − E(X)²
E(X²) = 1/6·[1² + 2² + 3² + 4² + 5² + 6²] = 91/6
V(X) = 91/6 − (7/2)² = 35/12 ≈ 2.92
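Checking Example 15 with exact fractions:

```python
from fractions import Fraction

p = Fraction(1, 6)
E_X = sum(p * r for r in range(1, 7))        # 7/2
E_X2 = sum(p * r * r for r in range(1, 7))   # 91/6
V = E_X2 - E_X ** 2
print(V)  # 35/12
```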