

Intro to Probability

Andrei Barbu (MIT)
August 2018


Some problems

A means to capture uncertainty

You have data from two sources, are they different?

How can you generate data or fill in missing data?

What explains the observed data?



Uncertainty

Every event has a probability of occurring, some event always occurs, and combining mutually exclusive events adds their probabilities.

Why these axioms? Many other choices are possible:

Possibility theory, probability intervals

Belief functions, upper and lower probabilities



Probability is special: Dutch books

I offer that if X happens I pay out R, otherwise I keep your money.

Why should I always value my bet at pR, where p is the probability of X?

Negative p

I’m offering to pay R for −pR dollars.

Probabilities don’t sum to one

Take out a bet that always pays off.
If the sum is below 1, I pay R for less than R dollars.
If the sum is above 1, buy the bet and sell it to me for more.

When X and Y are incompatible, the value isn’t the sum

If the value is bigger, I still pay out more.
If the value is smaller, sell me my own bets.

Decisions under probability are “rational”.
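
A minimal sketch of the Dutch-book argument in Python (my own illustration; the prices and the dutch_book_profit helper are hypothetical): if the prices of a bet on X and a bet on not-X imply probabilities that do not sum to one, a bettor locks in a riskless profit.

    def dutch_book_profit(price_x, price_not_x, payout=1.0):
        """Guaranteed profit against a bookie whose prices are incoherent.

        Each bet pays `payout` if its event occurs. Exactly one of X and
        not-X occurs, so a portfolio of both bets always pays `payout`.
        """
        total = price_x + price_not_x
        if total < payout:
            # Buy both bets: pay `total`, collect `payout` no matter what.
            return payout - total
        if total > payout:
            # Sell both bets: collect `total`, pay out `payout` either way.
            return total - payout
        return 0.0  # Coherent prices: no sure profit exists.

    print(dutch_book_profit(0.5, 0.4))  # 0.1 guaranteed: implied probabilities sum to 0.9
    print(dutch_book_profit(0.7, 0.3))  # 0.0: coherent prices admit no Dutch book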



Experiments, theory, and funding



Data

Mean: µ_X = E[X] = ∑_x x p(x)

Variance: σ²_X = var(X) = E[(X − µ_X)²]

Covariance: cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]

Correlation: ρ_{X,Y} = cov(X, Y) / (σ_X σ_Y)
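
A sketch of these four quantities computed from samples with NumPy (the data is invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=1.5, size=10_000)  # illustrative data
    y = 0.5 * x + rng.normal(size=10_000)            # linearly related to x

    mean_x = x.mean()                        # µ_X
    var_x = x.var()                          # σ²_X = E[(X − µ_X)²]
    cov_xy = np.cov(x, y, bias=True)[0, 1]   # E[(X − µ_X)(Y − µ_Y)]
    corr_xy = np.corrcoef(x, y)[0, 1]        # cov(X, Y) / (σ_X σ_Y)

    print(mean_x, var_x, cov_xy, corr_xy)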



Mean and variance

We often implicitly assume that our data comes from a normal distribution.

And that our samples are i.i.d. (independent and identically distributed).

Generally these don’t capture enough about the underlying data.

Uncorrelated does not mean independent!



Correlation vs independence

V ∼ N(0, 1), X = sin(V), Y = cos(V): X and Y are uncorrelated, yet completely dependent, since X² + Y² = 1.

Correlation only measures linear relationships.
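
A quick empirical check of this example (a sketch; the sample size is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    v = rng.normal(size=100_000)
    x, y = np.sin(v), np.cos(v)

    # The correlation is ~0 even though y is a deterministic function of
    # the same v that generates x; indeed x**2 + y**2 == 1 exactly.
    print(np.corrcoef(x, y)[0, 1])        # ≈ 0.00: uncorrelated
    print(np.allclose(x**2 + y**2, 1.0))  # True: completely dependent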



Data dinosaurs

Very different datasets can share the same means, variances, and correlations, so summary statistics alone can be badly misleading.



Data

Mean

Variance

Covariance

Correlation

Are two players the same?

How do you know and how certain are you?

What about two players is different?

How do you quantify which differences matter?

Here’s a player, how good will they be?

What is the best information to ask for?

What is the best test to run?

If I change the size of the board, how might the results change?

. . .



Probability as an experiment

A machine enters a random state; its current state is an event.

Events, x, have associated probabilities, p(X = x) (p(x) for short)

Sets of events, A

Random variables, X, are functions of the event

The probability of both events, p(A ∩ B)

The probability of either event, p(A ∪ B)

Negation: p(¬x) = 1 − p(x)

Joint probabilities, P(x, y)

Independence: P(x, y) = P(x)P(y)

Conditional probabilities: P(x|y) = P(x, y) / P(y)

Law of total probability: ∑_{a ∈ A} p(a) = 1 when the events A form a disjoint cover
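
These identities are easy to verify by simulation. A sketch using two fair dice as the "machine" (my own example, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    d1 = rng.integers(1, 7, size=n)  # first die
    d2 = rng.integers(1, 7, size=n)  # second die

    A = d1 == 6  # event A: the first die shows 6
    B = d2 == 6  # event B: the second die shows 6

    def p(event):
        """Empirical probability: the fraction of trials where the event holds."""
        return event.mean()

    print(p(A & B), p(A) * p(B))             # independence: both ≈ 1/36
    print(p(A | B), p(A) + p(B) - p(A & B))  # either event, by inclusion-exclusion
    print(p(~A), 1 - p(A))                   # negation
    print(p(A & B) / p(B), p(A))             # P(A|B) = P(A, B)/P(B), = P(A) here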



Beating the lottery



Analyzing a test

You want to play the lottery, and have a method to win.

0.5% of tickets are winners, and you have a test to verify this.

Your test is 85% accurate (5% false positives, 10% false negatives).

Is this test useful? How useful? Should you be betting?



Is this test useful?

The four joint outcomes: (D−, T+), (D−, T−), (D+, T+), (D+, T−)

What percent of the time, when my test comes up positive, am I a winner?

P(D+ | T+) = P(T+ | D+) P(D+) / P(T+)
= (0.9 × 0.005) / (0.9 × 0.005 + 0.995 × 0.05) ≈ 8.3%

Bayes’ rule: P(A|B) = P(B|A) P(A) / P(B)

posterior = (likelihood × prior) / (probability of data)
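
The same computation as a short Python sketch:

    # Base rate and test characteristics from the slides.
    p_win = 0.005            # P(D+): 0.5% of tickets are winners
    p_pos_given_win = 0.90   # sensitivity: 10% false negatives
    p_pos_given_lose = 0.05  # 5% false positives

    # Law of total probability for P(T+), then Bayes' rule.
    p_pos = p_pos_given_win * p_win + p_pos_given_lose * (1 - p_win)
    p_win_given_pos = p_pos_given_win * p_win / p_pos

    print(f"P(D+ | T+) = {p_win_given_pos:.3f}")  # ≈ 0.083, i.e. 8.3%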


Common Probability Distributions
(Slides: Prof. Kai Arras, Social Robotics Lab, Human-Oriented Robotics)

Bernoulli Distribution

• Given a Bernoulli experiment, that is, a yes/no experiment with outcomes 0 (“failure”) or 1 (“success”)

• The Bernoulli distribution is a discrete probability distribution which takes value 1 with success probability h and value 0 with failure probability 1 − h

• Probability mass function: p(m) = h^m (1 − h)^(1−m), for m ∈ {0, 1}

• Notation: m ∼ Bern(h)

[Plot: point masses 1 − h at 0 and h at 1]

Parameters
• h: probability of observing a success

Expectation
• E[m] = h

Variance
• var(m) = h(1 − h)


Binomial Distribution

• Given a sequence of Bernoulli experiments

• The binomial distribution is the discrete probability distribution of the number of successes m in a sequence of N independent yes/no experiments, each of which yields success with probability h

• Probability mass function: p(m) = (N choose m) h^m (1 − h)^(N−m)

• Notation: m ∼ Bin(N, h)

[Plot: pmf for h = 0.5, N = 20; h = 0.7, N = 20; h = 0.5, N = 40]

Parameters
• N: number of trials
• h: success probability

Expectation
• E[m] = Nh

Variance
• var(m) = Nh(1 − h)
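
A sketch that samples both distributions with NumPy and checks the stated moments (parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    h, N = 0.7, 20

    bern = rng.binomial(1, h, size=100_000)  # Bernoulli is Binomial with N = 1
    binom = rng.binomial(N, h, size=100_000)

    print(bern.mean(), h)                # ≈ h
    print(bern.var(), h * (1 - h))       # ≈ h(1 − h)
    print(binom.mean(), N * h)           # ≈ Nh
    print(binom.var(), N * h * (1 - h))  # ≈ Nh(1 − h)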


Gaussian Distribution

• Most widely used distribution for continuous variables

• Reasons: (i) simplicity (fully represented by only two moments, mean and variance) and (ii) the central limit theorem (CLT)

• The CLT states that, under mild conditions, the mean (or sum) of many independently drawn random variables is distributed approximately normally, irrespective of the form of the original distribution

• Probability density function: p(x) = 1/√(2πσ²) · exp(−(x − µ)² / (2σ²))

[Plot: densities for µ = 0, σ² = 1; µ = −3, σ² = 0.1; µ = 2, σ² = 2]

Parameters
• µ: mean
• σ²: variance

Expectation
• E[x] = µ

Variance
• var(x) = σ²


Gaussian Distribution

• Notation: x ∼ N(µ, σ²)

• Called the standard normal distribution for µ = 0 and σ² = 1

• About 68% (roughly two thirds) of values drawn from a normal distribution are within a range of ±1 standard deviation around the mean

• About 95% of the values lie within a range of ±2 standard deviations around the mean

• Important e.g. for hypothesis testing

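
A quick empirical check of the 68%/95% rule (a sketch; µ and σ are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 2.0, 1.5
    x = rng.normal(mu, sigma, size=1_000_000)

    print(np.mean(np.abs(x - mu) <= 1 * sigma))  # ≈ 0.683
    print(np.mean(np.abs(x - mu) <= 2 * sigma))  # ≈ 0.954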



Multivariate Gaussian Distribution

• For d-dimensional random vectors, the multivariate Gaussian distribution is governed by a d-dimensional mean vector µ and a d × d covariance matrix Σ that must be symmetric and positive semi-definite

• Probability density function: p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))

• Notation: x ∼ N(µ, Σ)

Parameters
• µ: mean vector
• Σ: covariance matrix

Expectation
• E[x] = µ

Variance
• cov(x) = Σ



Multivariate Gaussian Distribution

• For d = 2, we have the bivariate Gaussian distribution

• The covariance matrix (often written C) determines the shape of the distribution

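
A sketch of how the covariance matrix shapes bivariate Gaussian samples (the matrices are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.zeros(2)

    # A diagonal covariance gives an axis-aligned cloud; off-diagonal
    # terms tilt and stretch it. Both matrices are symmetric and PSD.
    C_round = np.array([[1.0, 0.0], [0.0, 1.0]])
    C_tilted = np.array([[1.0, 0.8], [0.8, 1.0]])

    for C in (C_round, C_tilted):
        x = rng.multivariate_normal(mu, C, size=50_000)
        print(np.cov(x.T))  # ≈ C: the samples reproduce the covariance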




Poisson Distribution

• Consider independent events that happen with an average rate of λ over time

• The Poisson distribution is a discrete distribution that describes the probability of a given number of events occurring in a fixed interval of time

• Can also be defined over other intervals such as distance, area or volume

• Probability mass function: p(m) = (λ^m / m!) e^(−λ)

• Notation: m ∼ Poisson(λ)

[Plot: pmf for λ = 1, λ = 4, λ = 10]

Parameters
• λ: average rate of events over time or space

Expectation
• E[m] = λ

Variance
• var(m) = λ
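
A sketch confirming that a Poisson variable's mean and variance both equal λ:

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 4.0
    m = rng.poisson(lam, size=100_000)

    print(m.mean())  # ≈ λ
    print(m.var())   # ≈ λ: equal mean and variance is a Poisson signature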


Bayesian updates

P(θ|X) = P(X|θ) P(θ) / P(X)

P(θ): I have some prior over how good a player is: informative vs. uninformative.

P(X|θ): I think dart throwing is a stochastic process; every player has an unknown mean.

X: I observe them throwing darts.

P(X): how likely the data is, averaged over all parameters.

Normalization is usually hard to compute, but it’s often not needed.

Say P(θ) is a normal distribution with mean 0 and high variance.

And P(X|θ) is also a normal distribution.

What’s the best estimate for this player’s performance?
∂/∂θ log P(θ|X) = 0
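
With a normal prior and a normal likelihood the posterior is again normal, and solving ∂/∂θ log P(θ|X) = 0 gives a precision-weighted blend of the prior mean and the data. A sketch (the dart scores and variances are made up):

    import numpy as np

    mu0, tau2 = 0.0, 100.0  # prior: θ ~ N(mu0, tau2), mean 0, high variance
    sigma2 = 4.0            # likelihood: each throw x_i ~ N(θ, sigma2)

    x = np.array([7.1, 6.4, 8.0, 7.5, 6.9])  # observed dart scores
    n = len(x)

    # Setting the derivative of the log posterior to zero yields the
    # posterior mean (which is also the MAP estimate for a Gaussian).
    post_mean = (mu0 / tau2 + x.sum() / sigma2) / (1 / tau2 + n / sigma2)
    post_var = 1 / (1 / tau2 + n / sigma2)

    print(post_mean, post_var)  # ≈ the sample mean, since the prior is weak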



Graphical models

So far we’ve talked about independence, conditioning, and observation.

Graphical models give us a toolkit to discuss these at a higher level of abstraction.



Speech recognition: Naïve Bayes

Break down a speech signal into parts.

Recover the original speech.



Speech recognition: Naïve Bayes

Create a set of features; each sound is composed of combinations of features.

P(c|X) ∝ ∏_k P(X_k | c) P(c)
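
A minimal naïve Bayes classifier over binary features, as a sketch (the toy data and the fit/predict helpers are invented for illustration):

    import numpy as np

    # Rows are sounds, columns are binary acoustic features; c holds labels.
    X = np.array([[1, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 1]])
    c = np.array([0, 0, 1, 1])

    def fit(X, c, alpha=1.0):
        """Estimate P(c) and P(X_k|c) with Laplace smoothing."""
        classes = np.unique(c)
        prior = np.array([(c == k).mean() for k in classes])
        theta = np.array([(X[c == k].sum(0) + alpha) / ((c == k).sum() + 2 * alpha)
                          for k in classes])
        return classes, prior, theta

    def predict(x, classes, prior, theta):
        # P(c|x) ∝ P(c) ∏_k P(x_k|c), computed in log space for stability.
        log_post = np.log(prior) + (x * np.log(theta)
                                    + (1 - x) * np.log(1 - theta)).sum(1)
        return classes[np.argmax(log_post)]

    params = fit(X, c)
    print(predict(np.array([1, 0, 1]), *params))  # → 0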



Speech recognition: Gaussian mixture model



Speech recognition: Hidden Markov model



Summary

Probabilities defined in terms of events

Random variables and their distributions

Reasoning with probabilities and Bayes’ rule

Updating our knowledge over time

Graphical models to reason abstractly

A quick tour of how we would build a more complex model
