
Bayesian Wrap-Up (probably)

Uploaded by barrie-richardson on 17-Dec-2015


TRANSCRIPT

Page 1: Bayesian Wrap-Up (probably)

Page 2: Bayesian Wrap-Up (probably)

5 minutes of math...

• Marginal probabilities

• If you have a joint PDF: f(x, y)

• ... and want to know about the probability of just one RV (regardless of what happens to the others)

• Marginal PDF of x or y:

f(x) = ∫ f(x, y) dy        f(y) = ∫ f(x, y) dx
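The integral above can be checked numerically on a discretized joint PDF. A minimal sketch: the grid, variable names, and the toy density below are all illustrative assumptions, not from the slides.

```python
import numpy as np

# Marginalizing a discretized joint PDF numerically.
h = np.linspace(150.0, 200.0, 101)   # grid for one RV (say, height in cm)
w = np.linspace(40.0, 120.0, 161)    # grid for the other (say, weight in kg)
H, W = np.meshgrid(h, w, indexing="ij")

# A made-up joint density with some H-W correlation.
joint = np.exp(-((H - 175.0) ** 2) / 200.0
               - ((W - 70.0 - 0.5 * (H - 175.0)) ** 2) / 300.0)
dh, dw = h[1] - h[0], w[1] - w[0]
joint /= joint.sum() * dh * dw       # normalize so it integrates to 1

# Marginal of H: integrate the joint over W (sum across the W axis).
f_H = joint.sum(axis=1) * dw
# Marginal of W: integrate the joint over H.
f_W = joint.sum(axis=0) * dh
# Each marginal is itself a proper PDF (integrates to 1).
```

Summing out the axis you don't care about is exactly the "regardless of what happens to the others" in the bullet above.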

Page 3: Bayesian Wrap-Up (probably)

5 minutes of math...

• Conditional probabilities

• Suppose you have a joint PDF, f(H, W)

• Now you get to see one of the values, e.g., H = "183cm"

• What's your probability estimate of W, given this new knowledge?

Page 4: Bayesian Wrap-Up (probably)

5 minutes of math...

• Conditional probabilities

• Suppose you have a joint PDF, f(H, W)

• Now you get to see one of the values, e.g., H = "183cm"

• What's your probability estimate of W, given this new knowledge?

f(W | H = 183) = f(H = 183, W) / f(H = 183)
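On a discretized joint, conditioning is a slice-and-renormalize. A sketch, assuming the same kind of toy grid density as before (the density and grids are made up for illustration):

```python
import numpy as np

# Conditioning a discretized joint PDF f(H, W) on an observed H.
h = np.linspace(150.0, 200.0, 101)
w = np.linspace(40.0, 120.0, 161)
H, W = np.meshgrid(h, w, indexing="ij")
joint = np.exp(-((H - 175.0) ** 2) / 200.0
               - ((W - 70.0 - 0.5 * (H - 175.0)) ** 2) / 300.0)
dh, dw = h[1] - h[0], w[1] - w[0]
joint /= joint.sum() * dh * dw

# Observe H = 183 cm: take the slice of the joint at the nearest grid point...
i = int(np.argmin(np.abs(h - 183.0)))
slice_at_h = joint[i, :]

# ...and renormalize by the marginal f(H=183) = ∫ f(183, w) dw.
f_W_given_H = slice_at_h / (slice_at_h.sum() * dw)
```

The division by the marginal is what turns one row of the joint into a proper PDF over W.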

Page 5: Bayesian Wrap-Up (probably)

5 minutes of math...

• From the cond. prob. rule, it's 2 steps to Bayes' rule:

f(A | B) = f(B | A) f(A) / f(B)

• (Often helps algebraically to think of the "given that" operator, "|", as a division operation)
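The "2 steps" can be written out explicitly, starting from the conditional-probability rule in both of its symmetric forms:

```latex
% Conditional probability, written both ways:
f(A \mid B) = \frac{f(A, B)}{f(B)}, \qquad f(B \mid A) = \frac{f(A, B)}{f(A)}
% Step 1: solve the second form for the joint:
f(A, B) = f(B \mid A)\, f(A)
% Step 2: substitute into the first form -- Bayes' rule:
f(A \mid B) = \frac{f(B \mid A)\, f(A)}{f(B)}
```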

Page 6: Bayesian Wrap-Up (probably)

Everything's random...

• Basic Bayesian viewpoint:

• Treat (almost) everything as a random variable

• Data/independent var: X vector

• Class/dependent var: Y

• Parameters: Θ

• E.g., mean, variance, correlations, multinomial params, etc.

• Use Bayes' Rule to assess probabilities of classes

• Allows us to say: "It is very unlikely that the mean height is 2 light years"

Page 7: Bayesian Wrap-Up (probably)

Uncertainty over params

• Maximum likelihood treats parameters as (unknown) constants

• Job is just to pick the constants so as to maximize data likelihood

• Full-blown Bayesian modeling treats params as random variables

• PDF over parameter variables tells us how certain/uncertain we are about the location of that parameter

• Also allows us to express prior beliefs (probabilities) about params

Page 8: Bayesian Wrap-Up (probably)

Example: Coin flipping

• Have a "weighted" coin -- want to figure out θ = Pr[heads]

• Maximum likelihood:

• Flip coin a bunch of times, measure #heads; #tails

• Use estimator to return a single value for θ

• Bayesian (MAP):

• Start w/ distribution over what θ might be

• Flip coin a bunch of times, measure #heads; #tails

• Update distribution, but never reduce to a single number
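The two approaches can be sketched side by side. The slides don't name a prior over θ; the Beta distribution used here is the standard conjugate choice for Bernoulli data and is an assumption, as are the true bias and flip count:

```python
import numpy as np

# ML vs. Bayesian (MAP) estimation for a weighted coin.
rng = np.random.default_rng(0)
true_theta = 0.7                      # hypothetical true Pr[heads]
flips = rng.random(100) < true_theta  # simulate 100 flips
heads = int(flips.sum())
tails = len(flips) - heads

# Maximum likelihood: return a single number.
theta_ml = heads / len(flips)

# Bayesian: start with a Beta(2, 2) prior over θ, update on the data.
a, b = 2.0, 2.0
a_post, b_post = a + heads, b + tails   # conjugate Beta update
# The full posterior Beta(a_post, b_post) is the answer; its mode is the
# MAP point estimate if a single number is demanded.
theta_map = (a_post - 1.0) / (a_post + b_post - 2.0)

print(f"ML: {theta_ml:.3f}  MAP: {theta_map:.3f}")
```

Note the Bayesian answer is the whole Beta(a_post, b_post) distribution; collapsing it to its mode is a convenience, not the method.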

Pages 9-15: Bayesian Wrap-Up (probably)

Example: Coin flipping

[Plots of the distribution over θ after 0, 1, 5, 10, 20, 50, and 100 flips total; the figures are not reproduced in the transcript.]

Page 16: Bayesian Wrap-Up (probably)

How does it work?

• Think of parameters as just another kind of random variable

• Now your data distribution is f(X | Θ)

• This is the generative distribution

• A.k.a. observation distribution, sensor model, etc.

• What we want is some model of the parameter as a function of the data: f(Θ | X)

• Get there with Bayes' rule:

f(Θ | X) = f(X | Θ) f(Θ) / f(X)

Page 17: Bayesian Wrap-Up (probably)

What does that mean?

• Let's look at the parts:

• Generative distribution, f(X | Θ):

• Describes how data is generated by the underlying process

• Usually easy to write down (well, easier than the other parts, anyway)

• Same old PDF/PMF we've been working with

• Can be used to "generate" new samples of data that "look like" your training data
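"Generating" new samples is just sampling f(X | Θ) with the parameters held fixed. A minimal sketch; the Gaussian form and the parameter values are made up for illustration:

```python
import numpy as np

# Sampling fresh data from a generative distribution f(X | Θ).
rng = np.random.default_rng(1)
mu, sigma = 175.0, 7.0                       # hypothetical fitted parameters Θ
new_data = rng.normal(mu, sigma, size=1000)  # fresh samples from f(X | Θ)
print(new_data.mean(), new_data.std())       # close to (175, 7)
```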

Page 18: Bayesian Wrap-Up (probably)

What does that mean?

• The parameter prior or a priori distribution, f(Θ):

• Allows you to say "this value of Θ is more likely than that one is..."

• Allows you to express beliefs/assumptions/preferences about the parameters of the system

• Also takes over when the data is sparse (small N)

• In the limit of large data, the prior should "wash out", letting the data dominate the estimate of the parameter

• Can let f(Θ) be "uniform" (a.k.a., "uninformative") to minimize its impact

Page 19: Bayesian Wrap-Up (probably)

What does that mean?

• The data prior, f(X):

• Expresses the probability of seeing data set X independent of any particular model

• Huh?

Page 20: Bayesian Wrap-Up (probably)

What does that mean?

• The data prior, f(X):

• Expresses the probability of seeing data set X independent of any particular model

• Can get it from the joint data/parameter model:

f(X) = ∫ f(X, Θ) dΘ = ∫ f(X | Θ) f(Θ) dΘ

• In practice, often don't need it explicitly (why?)
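One common answer to the "(why?)": f(X) contains no Θ, so when ranking or maximizing over parameter values it acts only as a normalizing constant:

```latex
% f(X) does not depend on \Theta, so it scales the whole posterior uniformly:
f(\Theta \mid X) = \frac{f(X \mid \Theta)\, f(\Theta)}{f(X)}
                 \propto f(X \mid \Theta)\, f(\Theta)
% Hence the MAP estimate can be found without ever computing f(X):
\hat{\Theta}_{MAP} = \arg\max_{\Theta} \; f(X \mid \Theta)\, f(\Theta)
```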

Page 21: Bayesian Wrap-Up (probably)

What does that mean?

• Finally, the posterior (or a posteriori) distribution, f(Θ | X):

• Lit., "from what comes after" or "after the fact" (Latin)

• Essentially, "What we believe about the parameter after we look at the data"

• As compared to the "prior" or "a priori" (lit., "from what is before" or "before the fact") parameter distribution

Page 22: Bayesian Wrap-Up (probably)

Exercise

• Suppose you want to estimate the average airspeed of an unladen (African) swallow

• Let's say that airspeeds of individual swallows, x, are Gaussianly distributed with mean μ and variance 1: x ~ N(μ, 1)

• Let's say, also, that we think the mean is "around" 50 kph, but we're not sure exactly what it is; express that uncertainty as a prior on the mean with variance 10: μ ~ N(50, 10)

• Derive the posterior estimate of the mean airspeed.
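A derived answer can be sanity-checked numerically. The sketch below compares the standard Gaussian-Gaussian conjugate update against brute-force Bayes' rule on a grid; the simulated measurements (and their true mean of 48 kph) are assumptions for illustration only:

```python
import numpy as np

# Numerical check for the exercise: x ~ N(μ, 1), prior μ ~ N(50, 10).
rng = np.random.default_rng(2)
x = rng.normal(48.0, 1.0, size=20)       # made-up airspeed measurements, kph

mu0, tau2, sigma2, n = 50.0, 10.0, 1.0, len(x)

# Closed-form conjugate update (posterior precision = sum of precisions):
post_var = 1.0 / (1.0 / tau2 + n / sigma2)
post_mean = post_var * (mu0 / tau2 + x.sum() / sigma2)

# Brute force: prior(μ) × likelihood(x | μ) on a grid, then normalize.
grid = np.linspace(40.0, 60.0, 4001)
log_post = (-(grid - mu0) ** 2 / (2.0 * tau2)
            - ((x[:, None] - grid[None, :]) ** 2).sum(axis=0) / (2.0 * sigma2))
post = np.exp(log_post - log_post.max())
dg = grid[1] - grid[0]
post /= post.sum() * dg
grid_mean = (grid * post).sum() * dg

print(post_mean, grid_mean)              # the two should agree closely
```

Note how the posterior variance is smaller than the prior variance of 10: twenty precise observations dominate the vague prior, exactly the "wash out" behavior described earlier.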