Bayesian Networks. 4th December 2009. Presented by Kwak, Nam-ju. The slides are based on <Data Mining: Practical Machine Learning Tools and Techniques>, 2nd ed., written by Ian H. Witten & Eibe Frank. Images and materials are from the official lecture slides of the book.


Page 1:

Bayesian Networks
4th December 2009

Presented by Kwak, Nam-ju

The slides are based on <Data Mining: Practical Machine Learning Tools and Techniques>, 2nd ed., written by Ian H. Witten & Eibe Frank. Images and materials are from the official lecture slides of the book.

Page 2:

Table of Contents
• Probability Estimate vs. Prediction
• What is a Bayesian Network?
• A Simple Example
• A Complex One
• Why does it work?
• Learning Bayesian Networks
• Overfitting
• Searching for a Good Network Structure
• K2 Algorithm
• Other Algorithms
• Conditional Likelihood
• Data Structures for Fast Learning

Page 3:

Probability Estimate vs. Prediction

• Naïve Bayes classifiers and logistic regression models produce probability estimates.

• For each class, they estimate the probability that a given instance belongs to that class.

Page 4:

Probability Estimate vs. Prediction

• Why are probability estimates useful?
 – They allow predictions to be ranked.
 – They let us treat classification learning as the task of learning class probability estimates from the data.

• What is being estimated is
 – the conditional probability distribution of the values of the class attribute given the values of the other attributes.

Page 5:

Probability Estimate vs. Prediction

• In this sense, Naïve Bayes classifiers, logistic regression models, and decision trees are all ways of representing a conditional probability distribution.

Page 6:

What is a Bayesian Network?
• A theoretically well-founded way of representing probability distributions concisely and comprehensively in a graphical manner.
• Bayesian networks are drawn as a network of nodes, one for each attribute, connected by directed edges in such a way that there are no cycles.
 – A directed acyclic graph (DAG)

Page 7:

Page 8:

A Simple Example

(Figure: a Bayesian network for the weather data with one conditional probability table per node. For example, one table entry is Pr[outlook=rainy | play=no]; the entries in each table row sum to 1.)
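To make the table concrete, here is a minimal Python sketch of one node's conditional probability table as a nested dictionary. The attribute and value names come from the book's weather data, but the probability figures are hypothetical placeholders, not the slide's actual numbers:

```python
# One node's conditional probability table (CPT), keyed first by the parent's
# value (play), then by the node's own value (outlook). Numbers are placeholders.
cpt_outlook = {
    "yes": {"sunny": 0.25, "overcast": 0.40, "rainy": 0.35},
    "no":  {"sunny": 0.50, "overcast": 0.10, "rainy": 0.40},
}

# Each row of the table (one fixed parent value) sums to 1.
for play_value, row in cpt_outlook.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9

print(cpt_outlook["no"]["rainy"])  # Pr[outlook=rainy | play=no]
```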

Page 9:

Page 10:

A Complex One
• Suppose outlook=rainy, temperature=cool, humidity=high, and windy=true…
• Let's call this situation E.

Page 11:

A Complex One
• E: rainy, cool, high, and true
• Pr[play=no, E] = 0.0025
• Pr[play=yes, E] = 0.0077
• Each joint probability is obtained by multiplying together the matching entry from every node's conditional probability table. (The slide walks through an additional example of the calculation.)
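As a sketch of how such a joint probability is assembled, the snippet below multiplies one CPT entry per node, assuming for illustration the simple network in which every attribute's only parent is play. The probability values are placeholders, so it will not reproduce the 0.0025 and 0.0077 above, but the lookup-and-multiply pattern is the same; with richer parent sets the lookup simply uses more keys:

```python
# Hypothetical CPTs, each attribute having `play` as its only parent.
prior = {"yes": 0.6, "no": 0.4}
cpts = {
    "outlook":     {"yes": {"sunny": 0.25, "overcast": 0.40, "rainy": 0.35},
                    "no":  {"sunny": 0.50, "overcast": 0.10, "rainy": 0.40}},
    "temperature": {"yes": {"hot": 0.25, "mild": 0.45, "cool": 0.30},
                    "no":  {"hot": 0.35, "mild": 0.35, "cool": 0.30}},
    "humidity":    {"yes": {"high": 0.35, "normal": 0.65},
                    "no":  {"high": 0.75, "normal": 0.25}},
    "windy":       {"yes": {"true": 0.35, "false": 0.65},
                    "no":  {"true": 0.55, "false": 0.45}},
}

def joint_probability(play, evidence):
    """Pr[play, E]: the product of one matching CPT entry per node."""
    p = prior[play]
    for attr, value in evidence.items():
        p *= cpts[attr][play][value]
    return p

E = {"outlook": "rainy", "temperature": "cool", "humidity": "high", "windy": "true"}
print(joint_probability("no", E), joint_probability("yes", E))
```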

Page 12:

A Complex One
• E: rainy, cool, high, and true
• Pr[play=no, E] = 0.0025
• Pr[play=yes, E] = 0.0077

Page 13:

A Complex One
• The two joint probabilities are normalized so that they sum to 1, yielding the conditional probability of each class given E.
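With the slide's numbers, the normalization works out to

\[
\Pr[\text{play}=\text{no} \mid E] = \frac{0.0025}{0.0025 + 0.0077} \approx 0.245,
\qquad
\Pr[\text{play}=\text{yes} \mid E] = \frac{0.0077}{0.0025 + 0.0077} \approx 0.755.
\]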

Page 14:

Why does it work?
• Terminology (relative to a given node):
 – T: all the nodes, P: the node's parents, D: its descendants
 – Non-descendants: T − D

Page 15:

Why does it work?
• Assumption (conditional independence):
 – Pr[node | parents plus any other set of non-descendants] = Pr[node | parents]
• Chain rule (see the derivation below)
• The nodes are ordered so that every ancestor of a node a_i has an index smaller than i; this is possible because the network is acyclic.
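Putting these together reconstructs the factorization the slide builds up to: order the nodes ancestrally, apply the chain rule, and then use the conditional-independence assumption to shrink each conditioning set down to the parents (the set {a_1, …, a_{i−1}} contains all of a_i's parents and otherwise only non-descendants, thanks to the ordering):

\[
\Pr[a_1, a_2, \dots, a_n]
= \prod_{i=1}^{n} \Pr[a_i \mid a_{i-1}, \dots, a_1]
= \prod_{i=1}^{n} \Pr[a_i \mid \operatorname{parents}(a_i)].
\]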

Page 16:

Why does it work?

OK, that's what we were after: multiplying the conditional probability tables together reproduces the full joint distribution!

Page 17:

Learning Bayesian Networks
• Basic components of algorithms for learning Bayesian networks:
 – A method for evaluating the goodness of a given network
 – A method for searching through the space of possible networks

Page 18:

Learning Bayesian Networks
• Evaluating the goodness of a given network:
 – Calculate the probability that the network accords to each instance and multiply these probabilities together over the whole training set.
 – Alternatively, use the sum of the logarithms of these probabilities (the log-likelihood), which avoids numerical underflow.
• Searching through the space of possible networks:
 – Search through the space of possible sets of edges.
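A minimal sketch of the evaluation step, assuming the per-instance probabilities have already been computed (e.g. with a function like the joint_probability sketch earlier); it shows why the sum of logarithms is preferred over the raw product:

```python
import math

def network_scores(instance_probs):
    """instance_probs: Pr[x | network] for each training instance.
    Returns the raw product (which underflows to 0.0 for large datasets)
    and the log-likelihood (sum of logs), which is numerically stable."""
    product = 1.0
    for p in instance_probs:
        product *= p
    log_likelihood = sum(math.log(p) for p in instance_probs)
    return product, log_likelihood

print(network_scores([0.01, 0.02, 0.005]))  # toy per-instance probabilities
```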

Page 19:

Overfitting
• A network that maximizes the log-likelihood on the training data may overfit. Possible solutions:
 – Cross-validation: split the data into training instances and validation instances (similar to 'early stopping' when training neural networks)
 – Add a penalty for the complexity of the network
 – Assign a prior distribution over network structures and find the most likely network by combining the prior with the probability accorded to the network by the data

Page 20:

Overfitting
• Penalty for the complexity of the network:
 – Based on the total number of independent estimates in all the probability tables, which is called the number of parameters

Page 21:

Overfitting
• Penalty for the complexity of the network:
 – K: the number of parameters
 – LL: the log-likelihood
 – N: the number of instances in the training data
 – AIC score = −LL + K (Akaike Information Criterion)
 – MDL score = −LL + (K/2) log N (Minimum Description Length)
 – Both scores are to be minimized.
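These two scores translate directly into code; a small sketch, assuming the natural logarithm for the MDL penalty:

```python
import math

def aic_score(log_likelihood, k):
    """AIC = -LL + K, with K the number of parameters."""
    return -log_likelihood + k

def mdl_score(log_likelihood, k, n):
    """MDL = -LL + (K/2) * log N, with N the number of training instances."""
    return -log_likelihood + (k / 2.0) * math.log(n)
```

Both are minimized; with natural logs the MDL penalty exceeds the AIC penalty as soon as N > e² ≈ 7.4, so MDL favors simpler networks on all but tiny datasets.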

Page 22:

Overfitting
• Assign a prior distribution over network structures and find the most likely network by combining its prior probability with the probability accorded to the network by the data.

Page 23:

Searching for a Good Network Structure

• The probability of a single instance is the product of all the individual probabilities from the various conditional probability tables.

• The product can be rewritten to group together all factors relating to the same table.

• The log-likelihood can be grouped in the same way.

Page 24:

Searching for a Good Network Structure

• Therefore the log-likelihood can be optimized separately for each node.

• This can be done by adding or removing edges from other nodes to the node being optimized, as long as no cycles are created.

Which candidate structure is the best?

Page 25:

Searching for a Good Network Structure

• The AIC and MDL scores can be dealt with in a similar way, since they too can be split into per-node components.

Page 26:

K2 Algorithm
• Starts with a given ordering of the nodes (attributes)
• Processes each node in turn
• Greedily tries adding edges from previously processed nodes to the current node
• Moves on to the next node when the current node can't be optimized further (a sketch follows below)

The result depends on the initial order.
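A compact sketch of the greedy loop, assuming a local scoring function score(node, parents) that returns the node's own component of the network score (higher is better, e.g. its share of the log-likelihood):

```python
def k2(nodes, score):
    """K2 sketch: `nodes` is the fixed ordering; returns a parent set per node."""
    parents = {v: set() for v in nodes}
    for i, node in enumerate(nodes):
        best = score(node, parents[node])
        improved = True
        while improved:
            improved = False
            # Candidate parents come only from earlier nodes, so no cycles arise.
            for cand in nodes[:i]:
                if cand in parents[node]:
                    continue
                s = score(node, parents[node] | {cand})
                if s > best:  # remember the single best addition this round
                    best, best_cand, improved = s, cand, True
            if improved:
                parents[node].add(best_cand)
    return parents
```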

Page 27:

K2 Algorithm
• Some tricks:
 – Use a Naïve Bayes classifier as a starting point.
 – Ensure that every node is in the Markov blanket of the class node. (Markov blanket: a node's parents, children, and children's parents)

(Figures: a Naïve Bayes classifier and a Markov blanket. Pictures from Wikipedia and http://nltk.googlecode.com/svn/trunk/doc/book/ch06.html)

Page 28:

Other Algorithms
• Extended K2 (sophisticated but slow):
 – Does not require an ordering of the nodes.
 – Greedily adds or deletes edges between arbitrary pairs of nodes.
• Tree Augmented Naïve Bayes (TAN)

Page 29:

Other Algorithms
• Tree Augmented Naïve Bayes (TAN):
 – Augments a Naïve Bayes classifier with a tree over the attributes.
 – When the class node and its outgoing edges are eliminated, the remaining edges must form a tree.

(Figures: a Naïve Bayes classifier and a tree. Pictures from http://www.usenix.org/events/osdi04/tech/full_papers/cohen/cohen_html/index.html)

Page 30:

Other Algorithms
• Tree Augmented Naïve Bayes (TAN):
 – A maximum-weight spanning tree (MST) over the attributes is the key to maximizing the likelihood (see the sketch below).
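The usual construction (Chow-Liu style, as in Friedman et al.'s TAN) weights each attribute pair by the conditional mutual information between them given the class and takes a maximum-weight spanning tree over those weights. A minimal Prim-style sketch, assuming the pairwise weights have already been estimated from the data:

```python
def max_spanning_tree(attrs, weight):
    """Maximum-weight spanning tree over the attribute nodes.
    `weight(a, b)` is assumed to return the conditional mutual information
    I(a; b | class). Directing the edges away from an arbitrary root and
    adding an edge from the class node to every attribute yields TAN."""
    in_tree, edges = {attrs[0]}, []
    while len(in_tree) < len(attrs):
        # Pick the heaviest edge crossing from the tree to the rest.
        a, b = max(((a, b) for a in in_tree for b in attrs if b not in in_tree),
                   key=lambda e: weight(*e))
        in_tree.add(b)
        edges.append((a, b))
    return edges
```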

Page 31:

Conditional Likelihood
• What we actually need to know is the conditional likelihood: the conditional probability of the class given the other attributes.
• However, what we've been trying to maximize is, in fact, just the (joint) likelihood.

(The slide marks one formula with an O and the other with an X to contrast the two objectives.)

Page 32:

Conditional Likelihood
• Computing the conditional likelihood for a given network and dataset is straightforward.
• Maximizing it is what logistic regression does.

Page 33:

Data Structures for Fast Learning

• Learning Bayesian networks involves a lot of counting.

• For each network structure considered during the search, the data must be scanned to compute the conditional probability tables. (Since the conditioning set of a node's table changes frequently as edges are added and removed, the data has to be rescanned many times to obtain the updated conditional probabilities.)

Page 34:

Data Structures for Fast Learning

• One option: use a general hash table of counts. (A counting sketch follows below.)
 – Suppose there are 5 attributes, 2 with 3 values and 3 with 2 values.
 – Then there are 4 × 4 × 3 × 3 × 3 = 432 possible categories.
 – This calculation counts missing values (i.e. null) as an extra value per attribute.
 – This can cause memory problems.
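A sketch of the hash-table approach, treating a missing value (None) as just another attribute value so that the key space matches the 432-category calculation; the rows are toy data for illustration:

```python
from collections import Counter

# Each full attribute-value combination (missing values included as None)
# is one hash-table key; counts accumulate per distinct combination.
counts = Counter()
dataset = [
    ("sunny", "hot", "high", "false", "no"),
    ("rainy", None,  "high", "true",  "no"),
    ("sunny", "hot", "high", "false", "no"),
]
for row in dataset:
    counts[row] += 1

print(counts[("sunny", "hot", "high", "false", "no")])  # -> 2
```

Only combinations that actually occur consume memory here, but with many attributes the number of distinct keys can still grow large enough to cause the memory problems the slide mentions.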

Page 35:

Data Structures for Fast Learning

• AD (all-dimensions) tree
 – Using a general hash table, there would be 3 × 3 × 3 = 27 categories, even though only 8 categories are actually used.

Page 36:

Data Structures for Fast Learning

• AD (all-dimensions) tree

Only 8 categories are required, compared to 27.

Page 37:

Data Structures for Fast Learning

• AD (all-dimensions) tree: construction
 – Assume each attribute in the data has been assigned an index; the root node is given index zero.
 – Expand the node for attribute i with the values of all attributes j > i.
 – Two important restrictions:
  • The most populous expansion for each attribute is omitted (breaking ties arbitrarily).
  • Expansions with counts of zero are also omitted.
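A recursive sketch of the construction just described, operating on a list of equal-length attribute tuples whose positions serve as the attribute indices:

```python
def build_ad_tree(instances, i=0):
    """AD-tree sketch. Each node stores its instance count and, for every
    attribute j >= i, the expansions by that attribute's values, omitting
    the most populous value (ties broken arbitrarily); zero-count
    expansions never arise because only occurring values are grouped."""
    node = {"count": len(instances), "expansions": {}}
    n_attrs = len(instances[0]) if instances else 0
    for j in range(i, n_attrs):
        by_value = {}
        for inst in instances:
            by_value.setdefault(inst[j], []).append(inst)
        most_populous = max(by_value, key=lambda v: len(by_value[v]))
        node["expansions"][j] = {
            v: build_ad_tree(sub, j + 1)
            for v, sub in by_value.items() if v != most_populous
        }
    return node
```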

Page 38:

Data Structures for Fast Learning

• AD (all-dimensions) tree

Page 39:

Data Structures for Fast Learning

• AD (all-dimensions) tree
Q. What is the count of (humidity=normal, windy=true, play=no)?

Page 40:

Data Structures for Fast Learning

• AD (all-dimensions) tree
Q. What is the count of (humidity=normal, windy=false, play=no)? (This combination is not stored explicitly in the tree.)

Page 41:

Data Structures for Fast Learning

• AD (all-dimensions) tree
Q. What is the count of (humidity=normal, windy=false, play=no)?
A. #(humidity=normal, play=no) − #(humidity=normal, windy=true, play=no) = 1 − 1 = 0
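In code form, recovering the omitted (most populous) expansion is just a subtraction of stored counts; here windy=false is the branch the tree left out:

```python
# Counts stored in the AD tree (from the slide):
count_hn_pn    = 1  # #(humidity=normal, play=no)
count_hn_wt_pn = 1  # #(humidity=normal, windy=true, play=no)

# The omitted windy=false branch is recovered by subtraction:
count_hn_wf_pn = count_hn_pn - count_hn_wt_pn
print(count_hn_wf_pn)  # -> 0
```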

Page 42:

Data Structures for Fast Learning

• AD trees only pay off if the data contains many thousands of instances.

Page 43:

Questions and Answers
• Any questions?

(Picture from http://news.ninemsn.com.au/article.aspx?id=805150)