Page 1: A. Darwiche Learning in Bayesian Networks. A. Darwiche Known Structure Complete Data Known Structure Incomplete Data Unknown Structure Complete Data Unknown

A. Darwiche

Learning in Bayesian Networks

The Learning Problem

• Known Structure, Complete Data
• Known Structure, Incomplete Data
• Unknown Structure, Complete Data
• Unknown Structure, Incomplete Data

Known Structure, Complete Data

Known Structure, Incomplete Data

Unknown Structure, Complete Data

Unknown Structure, Incomplete Data

Known Structure

• Method A → CPTs A
• Method B → CPTs B

Known Structure

• Structure + CPTs A = $\Pr_A$
• Structure + CPTs B = $\Pr_B$

Which probability distribution should we choose?

Common criterion: choose the distribution that maximizes the likelihood of the data.

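Known Structure

• Structure + CPTs A = $\Pr_A$
• Structure + CPTs B = $\Pr_B$

Data D: records $d_1, \ldots, d_m$ (the slide's table shows $d_1$ through $d_6$)

Likelihood of data given $\Pr_A$:

$\Pr_A(D) = \Pr_A(d_1) \cdots \Pr_A(d_m)$

Likelihood of data given $\Pr_B$:

$\Pr_B(D) = \Pr_B(d_1) \cdots \Pr_B(d_m)$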
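
As a rough illustration of the likelihood criterion, the sketch below scores a small complete data set under two candidate CPT sets for the same structure and keeps the one with the higher likelihood. The two-node network A → B, all CPT numbers, the data, and the helper names are assumptions made for this example; none of them come from the slides. Log-likelihoods are used only to avoid multiplying many small numbers.

```python
import math

# Hypothetical two-node network A -> B over binary variables.
# Each candidate is (Pr(A=1), {a: Pr(B=1 | A=a)}); all numbers are invented.
cpts_a = (0.5, {0: 0.5, 1: 0.5})
cpts_b = (0.75, {0: 0.25, 1: 0.75})

# Complete data D: every record assigns a value to both A and B.
data = [(1, 1), (1, 1), (0, 0), (1, 0), (1, 1), (0, 0)]


def record_prob(cpts, record):
    """Pr(d) for one complete record d = (a, b), via the chain rule."""
    p_a1, p_b1_given_a = cpts
    a, b = record
    p_a = p_a1 if a == 1 else 1.0 - p_a1
    p_b = p_b1_given_a[a] if b == 1 else 1.0 - p_b1_given_a[a]
    return p_a * p_b


def log_likelihood(cpts, records):
    """log Pr(D) = sum over records of log Pr(d_i)."""
    return sum(math.log(record_prob(cpts, d)) for d in records)


ll_a = log_likelihood(cpts_a, data)
ll_b = log_likelihood(cpts_b, data)
best = "A" if ll_a > ll_b else "B"
```

With these made-up numbers the second candidate fits the data better, so `best` is `"B"`.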

Maximizing Likelihood of Data

• Complete data: a unique set of CPTs maximizes the likelihood of the data.

• Incomplete data: no unique set of CPTs maximizes the likelihood of the data.

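Known Structure, Complete Data

Data D: records $d_1, \ldots, d_m$ (the slide's table shows $d_1$ through $d_6$)

Estimated parameter:

$\theta_{d|bc} = \dfrac{\mathrm{Count}(dbc; D)}{\mathrm{Count}(bc; D)} = \dfrac{\text{number of data points } d_i \text{ with } d, b, c}{\text{number of data points } d_i \text{ with } b, c}$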

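Known Structure, Complete Data

Data D: records $d_1, \ldots, d_m$ (the slide's table shows $d_1$ through $d_6$)

Estimated parameter:

$\theta_{d|bc} = \dfrac{\mathrm{Count}(dbc; D)}{\mathrm{Count}(bc; D)} = \dfrac{\sum_{j=1}^{m} I(dbc; d_j)}{\sum_{j=1}^{m} I(bc; d_j)}$

Here $I(x; d_j)$ is 1 if record $d_j$ agrees with $x$ and 0 otherwise.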
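
The counting estimate can be sketched in a few lines. The data records, the variable names B, C, D, and the helper `ml_cpt` below are assumptions made for illustration. The function returns estimates only for parent instantiations that actually occur in the data; what to do about unseen instantiations is left open here.

```python
from collections import Counter

# Complete data over binary variables B, C, D (values invented for illustration).
# Each record maps a variable name to its observed value.
data = [
    {"B": 1, "C": 1, "D": 1},
    {"B": 1, "C": 1, "D": 0},
    {"B": 1, "C": 1, "D": 1},
    {"B": 0, "C": 1, "D": 0},
    {"B": 0, "C": 1, "D": 0},
    {"B": 1, "C": 0, "D": 1},
]


def ml_cpt(records, child, parents):
    """Return {(child_value, parent_values): theta} estimated by counting."""
    joint = Counter()   # Count(child and parents; D)
    margin = Counter()  # Count(parents; D)
    for r in records:
        u = tuple(r[p] for p in parents)
        joint[(r[child], u)] += 1
        margin[u] += 1
    return {(x, u): n / margin[u] for (x, u), n in joint.items()}


theta = ml_cpt(data, child="D", parents=("B", "C"))
```

For example, `theta[(1, (1, 1))]` is the estimate of Pr(D=1 | B=1, C=1): two of the three records with B=1, C=1 also have D=1.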

Complexity

• Network with:
  – Nodes: n
  – Parameters: k
  – Data points: m

• Time complexity: O(m k n) (straightforward implementation)

• Space complexity: O(k) (parameter count)

Known Structure, Incomplete Data

EM Algorithm (Expectation-Maximization):
– Initialize CPTs to random values
– Repeat until convergence:
  – Estimate parameters using current CPTs (E-step)
  – Update CPTs using estimates (M-step)

Known Structure, Incomplete Data

Estimated parameters at iteration i+1 (using the CPTs at iteration i):

$\theta_{d|bc} = \dfrac{\sum_{j=1}^{m} \Pr_i(dbc \mid d_j)}{\sum_{j=1}^{m} \Pr_i(bc \mid d_j)}$

$\Pr_0$ corresponds to the initial Bayesian network (random CPTs).
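
A minimal sketch of this update, under stated assumptions: a hypothetical network A → B over binary variables, where B is always observed and A is missing in some records. The data, the starting CPTs, and the function names are invented for the example. Each iteration accumulates expected counts under the current CPTs and then normalizes them, which is the ratio-of-sums form above.

```python
import math

# Hypothetical network A -> B over binary variables. B is always observed;
# A is missing (None) in some records. All data values are invented.
data = [(1, 1), (None, 1), (0, 0), (None, 0), (1, 1), (None, 1), (0, 1), (None, 0)]


def joint(theta, a, b):
    """Pr(A=a, B=b) under theta = (Pr(A=1), [Pr(B=1|A=0), Pr(B=1|A=1)])."""
    p_a1, p_b1 = theta
    p_a = p_a1 if a == 1 else 1.0 - p_a1
    p_b = p_b1[a] if b == 1 else 1.0 - p_b1[a]
    return p_a * p_b


def log_likelihood(theta, records):
    """Log-likelihood of the incomplete data (a missing A is summed out)."""
    total = 0.0
    for a, b in records:
        values = (0, 1) if a is None else (a,)
        total += math.log(sum(joint(theta, v, b) for v in values))
    return total


def em_step(theta, records):
    """One EM iteration: expected counts under theta, then normalize."""
    n_a = [0.0, 0.0]    # sum_j Pr_i(A=a | d_j)
    n_ab1 = [0.0, 0.0]  # sum_j Pr_i(A=a, B=1 | d_j)
    for a, b in records:
        if a is None:
            w = [joint(theta, v, b) for v in (0, 1)]
            post = [w[0] / (w[0] + w[1]), w[1] / (w[0] + w[1])]
        else:
            post = [1.0 if v == a else 0.0 for v in (0, 1)]
        for v in (0, 1):
            n_a[v] += post[v]
            if b == 1:
                n_ab1[v] += post[v]
    return (n_a[1] / len(records), [n_ab1[0] / n_a[0], n_ab1[1] / n_a[1]])


theta = (0.4, [0.3, 0.6])  # Pr_0: arbitrary starting CPTs (fixed here, not random)
history = [log_likelihood(theta, data)]
for _ in range(20):
    theta = em_step(theta, data)
    history.append(log_likelihood(theta, data))
```

The starting point is fixed rather than random so the run is repeatable; `history` records the log-likelihood after each iteration and can be inspected to watch convergence.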

EM Algorithm

• The likelihood of the data cannot get smaller after an iteration.

• The algorithm is not guaranteed to return the network that globally maximizes the likelihood of the data.

• It is guaranteed only to converge to a local maximum, hence random restarts.

• The algorithm is stopped when:
  – the change in likelihood gets very small
  – the change in parameters gets very small

Complexity

• Network with:
  – Nodes: n
  – Parameters: k
  – Data points: m
  – Treewidth: w

• Time complexity (per iteration): $O(m\,k\,n\,2^w)$ (straightforward implementation)

• Space complexity: $O(k + n\,2^w)$ (parameter count + space for inference)

Collaborative Filtering

• Collaborative Filtering (CF) finds items of interest to a user based on the preferences of other similar users.
  – Assumes that human behavior is predictable

Where is it used?

• E-commerce
  – Recommend products based on previous purchases or click-stream behavior
  – Ex: Amazon.com

• Information sites
  – Rate items based on previous user ratings
  – Ex: MovieLens, Jester

User    Votes ("-" marks a missing vote)
John    5    -    3    2
Sam     -    4    1    5
Cindy   3    -    5    -
Bob     5    1    -    -

After CF, Bob's missing votes are filled in with predictions:

Bob     5    1    3.5  1.7

Memory-based Algorithms

• Use the entire database of user ratings to make predictions.
  – Find users with similar voting histories to the active user.
  – Use these users’ votes to predict ratings for products not voted on by the active user.
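
One common way to realize these two steps is correlation-weighted prediction, sketched below on the four-user vote table from the CF example. This is an assumed scheme chosen for illustration: the slides do not say how similarity is measured or how neighbours' votes are combined, and the function names are hypothetical. Similarity is the Pearson correlation over co-rated titles, and the prediction is the active user's mean vote plus a normalized, similarity-weighted sum of the other users' deviations from their own means.

```python
import math

# Votes from the four-user example table; None marks a missing vote.
# Column positions stand for titles, whose names the slides do not give.
ratings = {
    "John": [5, None, 3, 2],
    "Sam": [None, 4, 1, 5],
    "Cindy": [3, None, 5, None],
    "Bob": [5, 1, None, None],
}


def mean_vote(votes):
    rated = [v for v in votes if v is not None]
    return sum(rated) / len(rated)


def pearson(u, v):
    """Correlation over co-rated titles, using each user's overall mean vote."""
    mu, mv = mean_vote(u), mean_vote(v)
    common = [(x - mu, y - mv) for x, y in zip(u, v)
              if x is not None and y is not None]
    num = sum(dx * dy for dx, dy in common)
    den = math.sqrt(sum(dx * dx for dx, _ in common) *
                    sum(dy * dy for _, dy in common))
    return num / den if den > 0 else 0.0


def predict(active, title):
    """Active user's mean plus a weighted average of neighbours' deviations."""
    a = ratings[active]
    num = norm = 0.0
    for name, votes in ratings.items():
        if name == active or votes[title] is None:
            continue
        w = pearson(a, votes)
        num += w * (votes[title] - mean_vote(votes))
        norm += abs(w)
    return mean_vote(a) + (num / norm if norm > 0 else 0.0)


bob_predictions = [predict("Bob", j) for j in (2, 3)]
```

On this table the sketch predicts about 3.33 and 1.5 for Bob's two missing votes. These are close to, but not the same as, the 3.5 and 1.7 shown in the example table, which is expected since the method behind those numbers is not specified.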

Model-based Algorithms

• Construct a model from the vote database.

• Use the model to predict the active user’s ratings.

Bayesian Clustering

• Use a Naïve Bayes network to model the vote database.

• m vote variables: one for each title.
  – Represent discrete vote values.

• 1 “cluster” variable
  – Represents user personalities

Naïve Bayes

[Figure: Naïve Bayes network with C as the parent of V1, V2, V3, …, Vm]

C     Pr(c)
c1    .5
c2    .1
c3    .35
c4    .05

C     Vk    Pr(vk | c)
c1    1     .3
c1    2     .25
…     …     …
c4    5     .6

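• Inference
  – Evidence: known votes $v_k$ for titles $k \in I$
  – Query: title $j$ for which we need to predict the vote

• Expected value of vote:

$p_j = \sum_{h=1}^{w} h \cdot \Pr(v_j = h \mid v_k : k \in I)$

[Figure: Naïve Bayes network with C as the parent of V1, V2, V3, …, Vm]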
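
The expected-vote query can be sketched for a small Naïve Bayes model as follows. The model size (2 clusters, 3 titles, votes 1 to 3), every probability, and the function name `expected_vote` are assumptions made for the example. The code first computes the posterior over the cluster variable given the known votes, then averages each cluster's vote distribution for the queried title.

```python
# Hypothetical model: 2 clusters, 3 titles, votes in {1, 2, 3}; numbers invented.
prior = [0.5, 0.5]  # Pr(c)

# cpt[k][c][h - 1] = Pr(V_k = h | C = c)
cpt = [
    [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
    [[0.6, 0.2, 0.2], [0.2, 0.2, 0.6]],
    [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]],
]


def expected_vote(prior, cpt, evidence, query):
    """Expected vote for title `query` given known votes {title: vote}."""
    # Posterior over clusters: Pr(c | evidence) is proportional to
    # Pr(c) times the product of Pr(v_k | c) over the known votes.
    weights = []
    for c, p_c in enumerate(prior):
        w = p_c
        for k, vote in evidence.items():
            w *= cpt[k][c][vote - 1]
        weights.append(w)
    z = sum(weights)
    posterior = [w / z for w in weights]
    # Pr(v_j = h | evidence) mixes the clusters; then take the expectation.
    n_values = len(cpt[query][0])
    return sum(
        h * sum(posterior[c] * cpt[query][c][h - 1] for c in range(len(prior)))
        for h in range(1, n_values + 1)
    )


p = expected_vote(prior, cpt, evidence={0: 1, 1: 1}, query=2)
```

Because the vote variables are conditionally independent given the cluster, the evidence only enters through the cluster posterior, which keeps prediction cheap.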

Learning

• Simplified Expectation Maximization (EM) algorithm with partial data

• Initialize CPTs with random values subject to the following constraints:

$\theta_c = \Pr(c), \qquad \theta_{v_k|c} = \Pr(v_k \mid c)$

$\sum_{c} \theta_c = 1, \qquad \sum_{v_k} \theta_{v_k|c} = 1$
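
One simple way to satisfy these constraints is to draw positive random numbers and normalize each distribution so it sums to 1. The sketch below does that; the function names, the seed parameter, and the small offset that keeps every entry strictly positive are assumptions for illustration, not details from the slides.

```python
import random


def random_distribution(size, rng):
    """Random positive numbers normalized so that they sum to 1."""
    raw = [rng.random() + 1e-6 for _ in range(size)]  # offset avoids exact zeros
    total = sum(raw)
    return [x / total for x in raw]


def random_cpts(n_clusters, n_titles, n_values, seed=0):
    """Return (prior, cpt) with cpt[k][c][h - 1] = Pr(V_k = h | C = c)."""
    rng = random.Random(seed)
    prior = random_distribution(n_clusters, rng)
    cpt = [
        [random_distribution(n_values, rng) for _ in range(n_clusters)]
        for _ in range(n_titles)
    ]
    return prior, cpt


prior, cpt = random_cpts(n_clusters=4, n_titles=3, n_values=5)
```

Keeping entries away from zero matters for EM, since a parameter that starts at exactly zero stays at zero in later iterations.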

Datasets

• MovieLens
  – 943 users; 1682 titles; 100,000 votes (1..5); explicit voting

• MS Web (website visits)
  – 610 users; 294 titles; 8,275 votes (0,1); implicit voting
  – Treating null votes as 0 gives 179,340 votes (610 users × 294 titles)

• Learning curve for MovieLens Dataset

[Figure: Total Absolute Change (y-axis, 0 to 800) versus Iteration (x-axis, 0 to 15)]

Protocols

• User database is divided into an 80% training set and a 20% test set.
  – One by one, select a user from the test set to be the active user.
  – Predict some of their votes based on their remaining votes.

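• All-But-One: all of the active user's votes except one are evidence (e); the single held-out vote is the query (Q).

• Given-{Two, Five, Ten}: two, five, or ten of the active user's votes are evidence (e); the remaining votes are queries (Q).

[Figure: one row of votes per protocol for active user a, each vote marked e or Q]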
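
A hedged sketch of how an active user's votes might be split under these protocols. The function name, the dictionary representation of votes, and the choice to pick evidence votes at random are assumptions; the slides do not say how the evidence votes are selected.

```python
import random


def split_votes(votes, protocol, rng):
    """Split one active user's votes into (evidence, queries).

    votes: dict mapping title -> vote.
    protocol: "all-but-one", or an int n for Given-n (for example 2, 5, 10).
    """
    titles = list(votes)
    rng.shuffle(titles)  # assumed: evidence votes are chosen at random
    n_evidence = len(titles) - 1 if protocol == "all-but-one" else protocol
    if not 0 < n_evidence < len(titles):
        raise ValueError("user has too few votes for this protocol")
    evidence = {t: votes[t] for t in titles[:n_evidence]}
    queries = {t: votes[t] for t in titles[n_evidence:]}
    return evidence, queries


rng = random.Random(0)
votes = {f"title{i}": (i % 5) + 1 for i in range(12)}  # made-up votes
given_two = split_votes(votes, 2, rng)
all_but_one = split_votes(votes, "all-but-one", rng)
```

Users with too few votes for a protocol raise an error here; a real evaluation would need to decide whether to skip such users instead.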

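Evaluation Metric

• Average Absolute Deviation:

$\dfrac{1}{|P_a|} \sum_{j \in P_a} \left| p_{a,j} - v_{a,j} \right|$

where $P_a$ is the set of titles predicted for active user $a$, $p_{a,j}$ is the predicted vote, and $v_{a,j}$ is the actual vote.

• Ranked Scoring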
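
The average absolute deviation for one active user is straightforward to compute. In the sketch below, the function name, the dictionary layout, and the sample numbers are assumptions made for illustration.

```python
def average_absolute_deviation(predicted, actual):
    """Mean of |predicted vote - actual vote| over the predicted titles."""
    if not predicted:
        raise ValueError("no predictions to score")
    total = sum(abs(p - actual[title]) for title, p in predicted.items())
    return total / len(predicted)


# Made-up predictions and held-out votes for a single active user.
score = average_absolute_deviation(
    predicted={"title3": 3.5, "title4": 1.5},
    actual={"title3": 4, "title4": 1},
)
```

Lower is better: a score of 0 would mean every predicted vote matched the held-out vote exactly.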

Results

• Experiments were run 5 times and averaged

• MovieLens

Algorithm     Given-Two   Given-Five   Given-Ten   All-But-One
Correlation   1.019       .916         .865        .806
VecSim        .948        .878         .843        .799
BC(9)         .771        .765         .763        .753

• MS Web

Algorithm     Given-Two   Given-Five   Given-Ten   All-But-One
Correlation   0.105       0.0911       0.0844      0.0673
VecSim        0.101       0.0885       0.0818      0.0675
BC(9)         0.0652      0.0652       0.0649      0.0507

Computational Issues

• Prediction time: (Memory-based) 10 minutes per experiment; (Model-based) 2 minutes

• Learning time: 20 minutes per iteration

• n: number of data points; m: number of titles; w: number of vote values per title; |C|: number of personality types

Algorithm      Prediction Time   Learning Time    Space
Memory-based   O(n*m)            N/A              O(n*m)
Model-based    O(|C|*m)          O(n*m*|C|*w)     O(|C|*m*w)

Demo of SamIam

• Building networks:
  – Nodes, Edges
  – CPTs

• Inference:
  – Posterior marginals
  – MPE
  – MAP

• Learning: EM

• Sensitivity Engine