
CS B351: LEARNING PROBABILISTIC MODELS

MOTIVATION

Past lectures have studied how to infer characteristics of a distribution, given a fully-specified Bayes net

Next few lectures: where does the Bayes net come from?

[Figures: three candidate Bayes nets for predicting Win?, of increasing complexity. The first uses only Strength and Opponent Strength. The second adds Offense strength, Defense strength, Opp. Off. Strength, and Opp. Def. Strength, with observed quantities Pass yds, Rush yds, Rush yds allowed, and Score allowed. The third further adds Strength of schedule, At Home?, Injuries?, and Opp. injuries?]

AGENDA

Learning probability distributions from example data

Influence of structure on performance

Maximum likelihood estimation (MLE)

Bayesian estimation

PROBABILISTIC ESTIMATION PROBLEM

Our setting: given a set of examples drawn from the target distribution. Each example is complete (fully observable).

Goal: produce some representation of a belief state so we can perform inferences & draw certain predictions.

DENSITY ESTIMATION

Given dataset D={d[1],…,d[M]} drawn from underlying distribution P*

Find a distribution that matches P* as "closely" as possible

High-level issues:

Usually, not enough data to get an accurate picture of P*, which forces us to approximate.

Even if we did have P*, how do we define "closeness" (both theoretically and in practice)?

How do we maximize "closeness"?

WHAT CLASS OF PROBABILITY MODELS?

For small discrete distributions, just use a tabular representation. Very efficient learning techniques exist.

For large discrete distributions or continuous ones, the choice of probability model is crucial. Increasing complexity =>

Can represent complex distributions more accurately

Need more data to learn well (risk of overfitting)

More expensive to learn and to perform inference

TWO LEARNING PROBLEMS

Parameter learning

What entries should be put into the model's probability tables?

Structure learning

Which variables should be represented / transformed for inclusion in the model?

What direct / indirect relationships between variables should be modeled?

A more "high level" problem. Once a structure is chosen, a set of (unestimated) parameters emerges; these need to be estimated using parameter learning.

LEARNING COIN FLIPS

Cherry and lime candies are in an opaque bag.

Observe that c out of N draws are cherries (data).

LEARNING COIN FLIPS

Observe that c out of N draws are cherries (data).

Intuition: c/N might be a good hypothesis for the fraction of cherries in the bag (or it might not, depending on the draw!)

"Intuitive" parameter estimate: the empirical distribution P(cherry) ≈ c/N (this will be justified more thoroughly later).

STRUCTURE LEARNING EXAMPLE: HISTOGRAM BUCKET SIZES

Histograms are used to estimate distributions of continuous or large numbers of discrete values… but how fine should the buckets be?

[Figures: histograms of the same data over the range 0–200, built with four different bucket widths.]
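As an aside (my own sketch, not from the slides), NumPy's histogram function makes it easy to see how the bucket count changes the estimate; the sample data below is made up for illustration.

```python
import numpy as np

# Hypothetical sample from an unknown distribution over [0, 200)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(60, 15, 300), rng.normal(140, 25, 200)])

# Same data, three different bucket counts
for n_buckets in (4, 20, 100):
    counts, edges = np.histogram(data, bins=n_buckets, range=(0, 200))
    est = counts / counts.sum()            # empirical probability per bucket
    print(n_buckets, "buckets:", np.round(est, 2))
```

Coarse buckets give a smooth but crude estimate; very fine buckets track the sample closely but become noisy when each bucket holds few points.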

STRUCTURE LEARNING: INDEPENDENCE RELATIONSHIPS

Compare table P(A,B,C,D) vs P(A)P(B)P(C)P(D)

Case 1: 15 free parameters (16 entries − sum-to-1 constraint)

P(A,B,C,D) = p1
P(A,B,C,¬D) = p2
…
P(¬A,¬B,¬C,D) = p15
P(¬A,¬B,¬C,¬D) = 1 − p1 − … − p15

Case 2: 4 free parameters

P(A) = p1, P(¬A) = 1 − p1
…
P(D) = p4, P(¬D) = 1 − p4
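To make the counting concrete, here is a small sketch (my own illustration, not from the slides) comparing the free parameters of a full joint table over n binary variables with a fully factored model.

```python
def free_params_full_joint(n_binary_vars: int) -> int:
    # 2^n table entries, minus one for the sum-to-1 constraint
    return 2 ** n_binary_vars - 1

def free_params_fully_independent(n_binary_vars: int) -> int:
    # One Bernoulli parameter per variable
    return n_binary_vars

for n in (4, 10, 20):
    print(n, free_params_full_joint(n), free_params_fully_independent(n))
# n=4 gives 15 vs 4, matching the slide; n=20 gives 1,048,575 vs 20
```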

STRUCTURE LEARNING: INDEPENDENCE RELATIONSHIPS

Compare table P(A,B,C,D) vs P(A)P(B)P(C)P(D)

P(A,B,C,D): would be able to fit ALL relationships in the data.

P(A)P(B)P(C)P(D): inherently cannot accurately model correlations like A ~= B, leading to biased estimates: it overestimates or underestimates the true probabilities.

[Figures: bar charts comparing an original joint distribution P(X,Y) with the distribution learned under the independence assumption P(X)P(Y).]

STRUCTURE LEARNING: EXPRESSIVE POWER

Making more independence assumptions always makes a probabilistic model less expressive

If the independence relationships assumed by structure model A are a superset of those in structure B, then B can express any probability distribution that A can

[Figures: several candidate structures over variables X, Y, Z with different edge sets, and two candidate structures over a class C and features F1, F2, …, Fk — which one to choose?]

ARCS DO NOT NECESSARILY ENCODE CAUSALITY!

[Figure: two chain networks, A → B → C and C → B → A.]

Two BNs that can encode the same joint probability distribution.

READING OFF INDEPENDENCE RELATIONSHIPS

Given B, does the value of A affect the probability of C? That is, does P(C|B,A) = P(C|B)?

No, A does not affect C: C's parent (B) is given, and so C is independent of its non-descendants (A).

Independence is symmetric: C ⊥ A | B ⟹ A ⊥ C | B

[Figure: the chain A → B → C.]

LEARNING IN THE FACE OF NOISY DATA

Ex: flip two independent coins. Dataset of 20 flips: 3 HH, 6 HT, 5 TH, 6 TT.

[Figure: Model 1 treats X and Y as independent nodes; Model 2 adds an arc X → Y.]


Parameters estimated via the empirical distribution ("intuitive fit"):

Model 1: P(X=H) = 9/20, P(Y=H) = 8/20

Model 2: P(X=H) = 9/20, P(Y=H|X=H) = 3/9, P(Y=H|X=T) = 5/11

Model 2's conditional estimates P(Y=H|X=H) and P(Y=H|X=T) are each based on fewer data points, so their errors are likely to be larger!
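A quick sketch (my own, not course code) of how these empirical estimates are computed for both models from the 20-flip dataset:

```python
from collections import Counter

# Dataset: 20 flips of two coins, as (X, Y) pairs
counts = Counter({("H", "H"): 3, ("H", "T"): 6, ("T", "H"): 5, ("T", "T"): 6})
N = sum(counts.values())

# Model 1: X and Y independent -- marginal empirical estimates
p_x_h = sum(c for (x, _), c in counts.items() if x == "H") / N    # 9/20
p_y_h = sum(c for (_, y), c in counts.items() if y == "H") / N    # 8/20

# Model 2: X -> Y -- conditional estimates, one per parent value
n_x_h = sum(c for (x, _), c in counts.items() if x == "H")        # 9
n_x_t = N - n_x_h                                                 # 11
p_y_h_given_x_h = counts[("H", "H")] / n_x_h                      # 3/9
p_y_h_given_x_t = counts[("T", "H")] / n_x_t                      # 5/11

print(p_x_h, p_y_h, p_y_h_given_x_h, p_y_h_given_x_t)
```

Note how each Model 2 conditional is estimated from only 9 or 11 flips rather than all 20, which is exactly why its estimates are noisier.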

STRUCTURE LEARNING: FIT VS COMPLEXITY

Must trade off fit of data vs. complexity of model

Complex models:

More parameters to learn

More expressive

More data fragmentation ⇒ greater sensitivity to noise


Typical approaches explore multiple structures while optimizing the trade-off between fit and complexity

Need a way of measuring “complexity” (e.g., number of edges, number of parameters) and “fit”

FURTHER READING ON STRUCTURE LEARNING

Structure learning with statistical independence testing

Score-based methods (e.g., Bayesian Information Criterion)

Bayesian methods with structure priors

Cross-validated model selection (more on this later)

STATISTICAL PARAMETER LEARNING

LEARNING COIN FLIPS

Observe that c out of N draws are cherries (data).

Let the unknown fraction of cherries be q (hypothesis).

Probability of drawing a cherry is q.

Assumption: draws are independent and identically distributed (i.i.d.).

LEARNING COIN FLIPS

Probability of drawing a cherry is q. Assumption: draws are independent and identically distributed (i.i.d.).

Probability of drawing 2 cherries is q·q = q^2

Probability of drawing 2 limes is (1−q)^2

Probability of drawing 1 cherry and 1 lime is q·(1−q)

LIKELIHOOD FUNCTION

Likelihood of data d = {d1, …, dN} given q:

P(d|q) = ∏_j P(dj|q) = q^c (1−q)^(N−c)

(The product form follows from the i.i.d. assumption; gather the c cherry terms together, then the N−c lime terms.)
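A small sketch (my own illustration) of this likelihood as a Python function, evaluated on a grid of q values:

```python
import numpy as np

def likelihood(q, c, N):
    """P(d | q) = q^c * (1 - q)^(N - c) under the i.i.d. assumption."""
    return q ** c * (1.0 - q) ** (N - c)

qs = np.linspace(0.0, 1.0, 101)
for c, N in [(1, 1), (2, 3), (10, 20), (50, 100)]:
    L = likelihood(qs, c, N)
    print(f"{c}/{N} cherries: likelihood peaks at q = {qs[np.argmax(L)]:.2f}")
```

The printed peaks track the observed fraction of cherries, which is the pattern the plots below illustrate.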

MAXIMUM LIKELIHOOD

Likelihood of data d = {d1, …, dN} given q: P(d|q) = q^c (1−q)^(N−c)

[Plots: P(data|q) as a function of q for observed draws of 1/1, 2/2, 2/3, 2/4, 2/5, 10/20, and 50/100 cherries.]

MAXIMUM LIKELIHOOD

Peaks of the likelihood function seem to hover around the fraction of cherries…

Sharpness indicates some notion of certainty…

[Plot: P(data|q) as a function of q for 50/100 cherries.]

MAXIMUM LIKELIHOOD

P(d|q) is the likelihood function.

The quantity argmax_q P(d|q) is known as the maximum likelihood estimate (MLE).

MAXIMUM LIKELIHOOD

l(q) = log P(d|q) = log [ q^c (1−q)^(N−c) ]
     = log [ q^c ] + log [ (1−q)^(N−c) ]
     = c log q + (N−c) log (1−q)

Setting dl/dq(q) = 0 gives the maximum likelihood estimate:

dl/dq(q) = c/q − (N−c)/(1−q)

At the MLE, c/q − (N−c)/(1−q) = 0 ⟹ q = c/N
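As a sanity check (my own sketch, not from the slides), a tiny numerical maximization of the log-likelihood confirms the closed-form result q = c/N:

```python
import numpy as np

def log_likelihood(q, c, N):
    # l(q) = c log q + (N - c) log(1 - q)
    return c * np.log(q) + (N - c) * np.log(1.0 - q)

c, N = 7, 20
qs = np.linspace(0.001, 0.999, 9999)   # avoid log(0) at the endpoints
q_hat = qs[np.argmax(log_likelihood(qs, c, N))]
print(q_hat, c / N)                    # both approximately 0.35
```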

OTHER MLE RESULTS

Categorical distributions (non-binary discrete variables): take the fraction of counts for each value (a histogram).

Continuous Gaussian distributions: mean = average of the data; standard deviation = standard deviation of the data.
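A brief sketch (my own illustration, with made-up data) of these MLE recipes:

```python
import numpy as np
from collections import Counter

# Categorical MLE: fraction of counts for each observed value
draws = ["cherry", "lime", "cherry", "cherry", "lime"]
counts = Counter(draws)
categorical_mle = {v: n / len(draws) for v, n in counts.items()}
print(categorical_mle)          # {'cherry': 0.6, 'lime': 0.4}

# Gaussian MLE: sample mean and (1/N, i.e. biased) standard deviation
x = np.array([2.1, 1.9, 2.4, 2.0, 1.8])
mu_mle = x.mean()
sigma_mle = x.std()             # np.std defaults to ddof=0, which is the MLE
print(mu_mle, sigma_mle)
```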

AN ALTERNATIVE APPROACH: BAYESIAN ESTIMATION

P(q|d) = 1/Z · P(d|q) P(q) is the posterior: the distribution of hypotheses given the data.

P(d|q) is the likelihood.

P(q) is the hypothesis prior.

[Figure: parameter node q is the parent of each data node d[1], d[2], …, d[M].]

ASSUMPTION: UNIFORM PRIOR, BERNOULLI DISTRIBUTION

Assume P(q) is uniform. Then P(q|d) = 1/Z · P(d|q) = 1/Z · q^c (1−q)^(N−c)

What's P(Y|D), the probability that the next draw Y is a cherry?

[Figure: parameter node q is the parent of the data nodes d[1], d[2], …, d[M] and of the next draw Y.]


ASSUMPTION: UNIFORM PRIOR, BERNOULLI DISTRIBUTION

⟹ Z = ∫₀¹ q^c (1−q)^(N−c) dq = c! (N−c)! / (N+1)!

⟹ P(Y|D) = ∫₀¹ q · P(q|d) dq = 1/Z · (c+1)! (N−c)! / (N+2)! = (c+1) / (N+2)

Can think of this as a "correction" using "virtual counts".
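A minimal sketch (my own) of this "virtual count" prediction rule, often called Laplace's rule of succession, compared with the raw empirical estimate:

```python
def mle_estimate(c, N):
    # Empirical / maximum likelihood estimate
    return c / N

def uniform_prior_prediction(c, N):
    # Bayesian prediction with a uniform prior: (c + 1) / (N + 2),
    # i.e. one "virtual" cherry and one "virtual" lime added to the counts
    return (c + 1) / (N + 2)

for c, N in [(0, 1), (1, 1), (3, 10), (50, 100)]:
    print(c, N, mle_estimate(c, N), uniform_prior_prediction(c, N))
# With little data the two differ a lot (e.g. 0/1 -> 0.0 vs 0.33);
# with more data the correction becomes negligible.
```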

NONUNIFORM PRIORS

P(q|d) ∝ P(d|q) P(q) = q^c (1−q)^(N−c) P(q)

Define, for all q in [0, 1], the probability that I believe in q.

[Plot: an example prior density P(q) over q from 0 to 1.]

BETA DISTRIBUTION

Beta_{a,b}(q) = g · q^(a−1) (1−q)^(b−1)

a, b are hyperparameters > 0

g is a normalization constant

a = b = 1 gives the uniform distribution

POSTERIOR WITH BETA PRIOR

Posterior ∝ q^c (1−q)^(N−c) P(q) = g · q^(c+a−1) (1−q)^(N−c+b−1) = Beta_{a+c, b+N−c}(q)

Prediction = posterior mean: E[q] = (c+a) / (N+a+b)

POSTERIOR WITH BETA PRIOR

What does this mean?

The prior specifies a "virtual count" of a−1 heads and b−1 tails.

See heads ⟹ increment a; see tails ⟹ increment b.

The effect of the prior diminishes with more data.
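A small sketch (my own, assuming scipy.stats is available) of the Beta-prior update and the resulting prediction:

```python
from scipy.stats import beta

a, b = 2.0, 2.0       # prior Beta(a, b): one virtual head and one virtual tail
c, N = 7, 20          # observed: 7 heads in 20 flips

# Posterior is Beta(a + c, b + N - c); the prediction is its mean (c+a)/(N+a+b)
post = beta(a + c, b + N - c)
print(post.mean())                # 0.375 == (7 + 2) / (20 + 2 + 2)
print((c + a) / (N + a + b))      # same value from the closed form
```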

CHOOSING A PRIOR

Part of the design process; must be chosen according to your intuition

Uninformed belief ⟹ a = b = 1; strong belief ⟹ a, b high.

EXTENSIONS OF BETA PRIORS

Parameters of multi-valued (categorical) distributions, e.g. histograms: Dirichlet prior.

The mathematical derivation is more complex, but in practice it still takes the form of "virtual counts".

[Figures: four histogram estimates over the range 0–200, labeled 0, 1, 5, and 10.]

RECAP

Learning probabilistic models

Parameter vs. structure learning

Single-parameter learning via coin flips

Maximum likelihood

Bayesian learning with a Beta prior

MAXIMUM LIKELIHOOD FOR BN

For any BN, the ML parameters of any CPT can be derived as the fraction of observed values in the data, conditioned on matching parent values.

[Figure: network with Earthquake and Burglar as parents of Alarm.]

Data: N = 1000 examples, with E observed 500 times and B observed 200 times.

P(E) = 0.5    P(B) = 0.2

A|E,B: 19/20    A|¬E,B: 188/200    A|E,¬B: 170/500    A|¬E,¬B: 1/380

E B P(A|E,B)
T T 0.95
F T 0.95
T F 0.34
F F 0.003

FITTING CPTS

Each ML entry P(x_i | pa_{X_i}) is given by examining the counts of (x_i, pa_{X_i}) in D and normalizing across rows of the CPT.

Note that for large k = |Pa_{X_i}|, very few datapoints will share the values of pa_{X_i}: on the order of |D|/2^k on average, and some parent configurations may be even rarer. Large domains |Val(X_i)| can also be a problem. This is data fragmentation.
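A compact sketch (my own illustration, not course code) of fitting one CPT by maximum likelihood: count (parent values, child value) pairs and normalize each row. The dataset counts below are made up for demonstration.

```python
from collections import Counter

def fit_cpt(data, child, parents):
    """ML estimate of P(child | parents) from a list of dict-valued examples."""
    joint = Counter()             # counts of (parent values, child value)
    parent_totals = Counter()     # counts of parent values alone
    for d in data:
        pa = tuple(d[p] for p in parents)
        joint[(pa, d[child])] += 1
        parent_totals[pa] += 1
    # Normalize each row of the CPT
    return {(pa, x): n / parent_totals[pa] for (pa, x), n in joint.items()}

# Tiny made-up dataset over E(arthquake), B(urglar), A(larm)
data = [{"E": e, "B": b, "A": a}
        for e, b, a, n in [(1, 1, 1, 19), (1, 1, 0, 1), (0, 1, 1, 9), (0, 1, 0, 1),
                           (1, 0, 1, 17), (1, 0, 0, 33), (0, 0, 1, 0), (0, 0, 0, 20)]
        for _ in range(n)]
cpt = fit_cpt(data, "A", ["E", "B"])
print(cpt[((1, 1), 1)])   # P(A=1 | E=1, B=1) = 19/20

# Note that the (E=1, B=1) row is estimated from only 20 of the 100 examples:
# this is the data fragmentation problem in miniature.
```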
