
Page 1

Intelligent Systems (AI-2)

Computer Science, CPSC 422, Lecture 18

Oct 21, 2016

Slide Sources: Raymond J. Mooney, University of Texas at Austin

D. Koller, Stanford CS - Probabilistic Graphical Models

Page 2

Lecture Overview

Probabilistic Graphical models

• Recap Markov Networks

• Recap one application

• Inference in Markov Networks (Exact and Approx.)

• Conditional Random Fields

Page 3

Parameterization of Markov Networks


Factors define the local interactions (like CPTs in Bnets)

What about the global model? What do you do with Bnets?


Page 4

How do we combine local models? As in BNets, by multiplying them!

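In symbols (a standard formulation, consistent with the Koller reference; Z is the normalizing "partition function"):

P(X1, …, Xn) = (1/Z) Π_c φ_c(X_c),   where   Z = Σ over all x1,…,xn of Π_c φ_c(x_c)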

Page 5

Step Back…. From structure to factors/potentials

In a Bnet the joint is factorized….


In a Markov Network you have one factor for each maximal clique
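For comparison (standard factorizations): in a Bnet,

P(X1, …, Xn) = Π_i P(Xi | Parents(Xi))

while in a Markov network the joint is the normalized product of one factor per maximal clique, as above.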

Page 6

General definitions

Two nodes in a Markov network are conditionally independent if and only if every path between them is cut off by evidence (i.e., passes through an observed node).

So the Markov blanket of a node is…? E.g., for C.

E.g., for A and C: when are they (conditionally) independent?

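A minimal sketch of the answer on the course's running A–F example (the edge set below is an assumption, reconstructed from the factor tables on the Gibbs-sampling slides later in the deck): in a Markov network, a node's Markov blanket is just its set of neighbors.

```python
# Running example network (edges assumed: A-B, A-C, B-D, C-E, D-F, E-F).
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "E"},
    "D": {"B", "F"},
    "E": {"C", "F"},
    "F": {"D", "E"},
}

def markov_blanket(node):
    """In a Markov network, observing a node's neighbors cuts every path
    from it to the rest of the graph, so the blanket is the neighbor set."""
    return graph[node]

print(markov_blanket("C"))  # {'A', 'E'}
```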

Page 7

Lecture Overview

Probabilistic Graphical models

• Recap Markov Networks

• Applications of Markov Networks

• Inference in Markov Networks (Exact and Approx.)

• Conditional Random Fields

Page 8

Markov Networks Applications (1): Computer Vision

Called Markov Random Fields

• Stereo Reconstruction

• Image Segmentation

• Object recognition


Typically a pairwise MRF:

• Each variable corresponds to a pixel (or superpixel)

• Edges (factors) correspond to interactions between adjacent pixels in the image

• E.g., in segmentation: factors range from generic ones that penalize label discontinuities between neighbors to semantic ones (e.g., "road" tends to appear under "car"). A sketch of the generic kind follows.
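A minimal sketch of a pairwise-MRF energy of the generic kind above (a Potts-style smoothness term; the function name, array shapes, and the beta weight are my own illustrative assumptions, not from the lecture):

```python
import numpy as np

def potts_energy(labels, unary, beta=1.0):
    """Energy of a label map under a pairwise MRF with 4-connectivity.

    labels: HxW integer label map; unary: HxWxK per-pixel label costs;
    beta: weight of the smoothness (discontinuity) penalty. Lower is better.
    """
    h, w = labels.shape
    rows, cols = np.arange(h)[:, None], np.arange(w)[None, :]
    e = unary[rows, cols, labels].sum()                   # unary (per-pixel) term
    e += beta * np.sum(labels[1:, :] != labels[:-1, :])   # vertical neighbor edges
    e += beta * np.sum(labels[:, 1:] != labels[:, :-1])   # horizontal neighbor edges
    return e
```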

Page 9

Image segmentation

Page 10

Image segmentation

Page 11

Lecture Overview

Probabilistic Graphical models

• Recap Markov Networks

• Applications of Markov Networks

• Inference in Markov Networks (Exact and Approx.)

• Conditional Random Fields

Page 12

Variable elimination algorithm for Bnets

To compute P(Z | Y1=v1, …, Yj=vj):

1. Construct a factor for each conditional probability.

2. Set the observed variables to their observed values.

3. Given an elimination ordering, simplify/decompose sum of products

4. Perform products and sum out Zi

5. Multiply the remaining factors (all of which mention only Z).

6. Normalize: divide the resulting factor f(Z) by Σ_Z f(Z).

Variable elimination algorithm for Markov Networks…

Given a network defining P(Z, Y1, …, Yj, Z1, …, Zi): the same steps apply, with the clique factors playing the role of the CPTs; the final normalization (step 6) is unchanged, since the factors were never probabilities to begin with.
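A minimal sketch of the three factor operations the algorithm needs (binary variables assumed; the factor representation is my own, not from the lecture):

```python
from itertools import product

# A factor is (vars, table): `vars` is a tuple of variable names and `table`
# maps full assignments (tuples of 0/1 values aligned with `vars`) to reals.

def restrict(factor, var, value):
    """Set an observed variable to its observed value (step 2)."""
    vs, t = factor
    if var not in vs:
        return factor
    i = vs.index(var)
    return (vs[:i] + vs[i+1:],
            {k[:i] + k[i+1:]: v for k, v in t.items() if k[i] == value})

def multiply(f1, f2):
    """Pointwise product of two factors (steps 4-5)."""
    vs1, t1 = f1
    vs2, t2 = f2
    vs = vs1 + tuple(v for v in vs2 if v not in vs1)
    table = {}
    for k in product((0, 1), repeat=len(vs)):
        a = dict(zip(vs, k))
        table[k] = t1[tuple(a[v] for v in vs1)] * t2[tuple(a[v] for v in vs2)]
    return (vs, table)

def sum_out(factor, var):
    """Sum a variable out of a factor (step 4)."""
    vs, t = factor
    i = vs.index(var)
    out = {}
    for k, v in t.items():
        key = k[:i] + k[i+1:]
        out[key] = out.get(key, 0.0) + v
    return (vs[:i] + vs[i+1:], out)
```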

Page 13

Gibbs sampling for Markov Networks

Example: P(D | C=0)

Resample non-evidence variables in a pre-defined order or a random order

Suppose we begin with A

[Network diagram: six binary variables; A adjacent to B and C, B to D, C to E, and D and E to F.]

Current sample:

A B C D E F
1 0 0 1 1 0

Note: never change evidence!

What do we need to sample?

A. P(A | B=0)
B. P(A | B=0, C=0)
C. P(B=0, C=0 | A)

Page 14

Example: Gibbs sampling

Resample A: compute P(A | B=0, C=0)

Factor over (A, C):

        A=1   A=0
C=1      1     2
C=0      3     4

Factor over (A, B):

        A=1   A=0
B=1      1     5
B=0     4.3   0.2

Current sample:   A B C D E F = 1 0 0 1 1 0
Resampling A:     A B C D E F = ? 0 0 1 1 0

Product of the factors involving A (the slide's Φ1 × Φ2 × Φ3), evaluated at B=0, C=0:

A=1: 12.9    A=0: 0.8

Normalized result:

A=1: 0.95    A=0: 0.05
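A minimal sketch of this resampling step (the factor tables are transcribed from the slide; the variable names and key order are my own):

```python
import random

# Factor tables from the slide, keyed by variable values.
phi_ac = {(1, 1): 1.0, (0, 1): 2.0, (1, 0): 3.0, (0, 0): 4.0}  # keys: (A, C)
phi_ab = {(1, 1): 1.0, (0, 1): 5.0, (1, 0): 4.3, (0, 0): 0.2}  # keys: (A, B)

def resample_A(b, c):
    """Sample A from P(A | B=b, C=c), proportional to the product of the
    factors that mention A (A's Markov blanket is {B, C})."""
    scores = {a: phi_ab[(a, b)] * phi_ac[(a, c)] for a in (0, 1)}
    z = sum(scores.values())
    return 1 if random.random() < scores[1] / z else 0

# At B=0, C=0 the scores are {1: 12.9, 0: 0.8}, matching the slide's product row.
```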

Page 15

Example: Gibbs sampling

Resample B: compute P(B | A=1, D=1)

Factor over (B, D):

        D=1   D=0
B=1      1     2
B=0      2     1

Factor over (A, B):

        A=1   A=0
B=1      1     5
B=0     4.3   0.2

Previous sample:     A B C D E F = 1 0 0 1 1 0
After resampling A:  A B C D E F = 1 0 0 1 1 0
Resampling B:        A B C D E F = 1 ? 0 1 1 0

Product of the factors involving B (the slide's Φ1 × Φ2 × Φ4), evaluated at A=1, D=1:

B=1: 1    B=0: 8.6

Normalized result:

B=1: 0.11    B=0: 0.89
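The same pattern for B, as a self-contained sketch (tables again transcribed from the slide; names and key order are my own):

```python
import random

phi_ab = {(1, 1): 1.0, (0, 1): 5.0, (1, 0): 4.3, (0, 0): 0.2}  # keys: (A, B)
phi_bd = {(1, 1): 1.0, (1, 0): 2.0, (0, 1): 2.0, (0, 0): 1.0}  # keys: (B, D)

def resample_B(a, d):
    """Sample B from P(B | A=a, D=d), proportional to the product of the
    factors that mention B."""
    scores = {b: phi_ab[(a, b)] * phi_bd[(b, d)] for b in (0, 1)}
    z = sum(scores.values())
    return 1 if random.random() < scores[1] / z else 0

# At A=1, D=1 the scores are {1: 1.0, 0: 8.6}, matching the slide.
```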

Page 16

Lecture Overview

Probabilistic Graphical models

• Recap Markov Networks

• Applications of Markov Networks

• Inference in Markov Networks (Exact and Approx.)

• Conditional Random Fields

Page 17

We want to model P(Y1 | X1 … Xn) … where all the Xi are always observed.

• Which model is simpler, MN or BN?

• Naturally aggregates the influence of different parents

[Figure: the two candidate structures. MN: Y1 linked to each of X1, X2, …, Xn. BN: each of X1, X2, …, Xn is a parent of Y1.]

Page 18

Conditional Random Fields (CRFs)

• Model P(Y1 … Yk | X1 … Xn)

• Special case of Markov Networks where all the Xi are always observed

• Simple case: P(Y1 | X1 … Xn)
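In general form (the standard CRF definition, consistent with the Koller reference; note the partition function now depends on the observed X):

P(Y1 … Yk | X1 … Xn) = (1/Z(X)) Π_c φ_c(Y_c, X_c),   where   Z(X) = Σ over all y1,…,yk of Π_c φ_c(y_c, X_c)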

Page 19

What are the Parameters?


Page 20

Let’s derive the probabilities we need


φi(Xi, Y1) = exp{ wi · 1{Xi = 1, Y1 = 1} }      (for i = 1, …, n)

φ0(Y1) = exp{ w0 · 1{Y1 = 1} }

[Figure: naïve Markov network — Y1 linked to each of X1, X2, …, Xn]
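A minimal sketch of these indicator-feature factors (the weights are illustrative placeholders):

```python
import math

def phi_i(x_i, y1, w_i):
    """phi_i(X_i, Y1) = exp(w_i * 1{X_i = 1, Y1 = 1})."""
    return math.exp(w_i) if (x_i == 1 and y1 == 1) else 1.0

def phi_0(y1, w0):
    """phi_0(Y1) = exp(w0 * 1{Y1 = 1})."""
    return math.exp(w0) if y1 == 1 else 1.0
```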

Page 21

Let’s derive the probabilities we need

φi(Xi, Y1) = exp{ wi · 1{Xi = 1, Y1 = 1} }

φ0(Y1) = exp{ w0 · 1{Y1 = 1} }

Page 22

Let’s derive the probabilities we need

φi(Xi, Y1) = exp{ wi · 1{Xi = 1, Y1 = 1} }

φ0(Y1) = exp{ w0 · 1{Y1 = 1} }

When Y1 = 0, every indicator is 0, so every factor evaluates to exp(0) = 1.

Page 23

Let’s derive the probabilities we need

P(Y1 = 1, x1, …, xn) = exp( w0 + Σ over i=1..n of wi·xi )      (unnormalized)

P(Y1 = 0, x1, …, xn) = 1      (unnormalized: every factor is exp(0) = 1)

P(Y1 = 1 | x1, …, xn) = ?

Page 24

Let’s derive the probabilities we need

P(Y1 = 1, x1, …, xn) = exp( w0 + Σ over i=1..n of wi·xi )      (unnormalized)

P(Y1 = 0, x1, …, xn) = 1      (unnormalized)

P(Y1 = 1 | x1, …, xn) = exp(w0 + Σi wi·xi) / (1 + exp(w0 + Σi wi·xi)) = σ(w0 + Σi wi·xi)

That is, the conditional is the sigmoid (logistic) function of a weighted sum of the inputs.
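A minimal sketch of the derived conditional (the weights and example inputs are illustrative placeholders):

```python
import math

def p_y1_given_x(x, w, w0):
    """P(Y1=1 | x1..xn) = sigmoid(w0 + sum_i w_i * x_i), as derived above."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

print(p_y1_given_x([1, 0, 1], w=[0.5, -1.2, 2.0], w0=-0.3))  # example inputs
```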

Page 25

Sigmoid Function used in Logistic Regression

• Great practical interest

• The number of parameters (the wi) is linear rather than exponential in the number of parents: n + 1 weights instead of the 2^n entries a full CPT for P(Y1 | X1 … Xn) would need

• Natural model for many real-world applications

• Naturally aggregates the influence of different parents


Page 26

Logistic Regression as a Markov Net (CRF)

Logistic regression is a simple Markov net (a CRF), aka a naïve Markov model.

[Figure: Y linked to each of X1, X2, …, Xn]

• But it models only the conditional distribution P(Y | X), not the full joint P(X, Y)

Page 27

Naïve Bayes vs. Logistic Regression

• Naïve Bayes (directed: Y points to each of X1, X2, …, Xn): Generative

• Logistic Regression / Naïve Markov (undirected: Y linked to each of X1, X2, …, Xn): Discriminative, models the conditional

Page 28

Learning Goals for today’s class

You can:

• Perform exact and approximate inference in Markov Networks

• Describe a few applications of Markov Networks

• Describe a natural parameterization for a Naïve Markov model (which is a simple CRF)

• Derive how P(Y|X) can be computed for a Naïve Markov model

• Explain the discriminative vs. generative distinction and its implications

Page 29

Next class (Mon): Linear-chain CRFs

To Do: Revise generative temporal models (HMM)

Midterm: Wed, Oct 26; we will start at 9am sharp. How to prepare…

• Go to office hours

• Learning goals (look at the end of the slides for each lecture; the complete list has been posted)

• Revise all the clicker questions and practice exercises

• More practice material has been posted

• Check questions and answers on Piazza

Page 30

Generative vs. Discriminative Models

Generative models (like Naïve Bayes) are not directly designed to maximize performance on classification. They model the joint distribution P(X, Y).

Classification is then done using Bayesian inference. But a generative model can also be used to perform any other inference task, e.g., P(X1 | X2, …, Xn, Y1).

• "Jack of all trades, master of none."

Discriminative models (like CRFs) are specifically designed and trained to maximize performance on classification. They model only the conditional distribution P(Y | X).

By focusing on modeling the conditional distribution, they generally perform better on classification than generative models when given a reasonable amount of training data.

Page 31

On Fri: Sequence Labeling

HMM (directed: hidden states Y1, Y2, …, YT, each emitting an observation X1, X2, …, XT): Generative, models the joint P(X, Y).

Linear-chain CRF (undirected: the chain Y1 – Y2 – … – YT, with each Yt also linked to Xt): Discriminative, models the conditional P(Y | X).
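As a preview (the standard linear-chain CRF factorization; the notation is mine, details come on Friday):

P(y1, …, yT | x1, …, xT) = (1/Z(x)) · Π over t=2..T of φ(y_{t-1}, y_t) · Π over t=1..T of ψ(y_t, x_t)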

Page 32

Lecture Overview

• Indicator function

• P(X,Y) vs. P(X|Y) and Naïve Bayes

• Model P(Y|X) explicitly with Markov Networks

• Parameterization

• Inference

• Generative vs. Discriminative models

Page 33

P(X,Y) vs. P(Y|X)

Assume that you always observe a set of variables

X = {X1…Xn}

and you want to predict one or more variables

Y = {Y1…Ym}

You can model P(X,Y) and then infer P(Y|X)

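Concretely (standard inference by conditioning):

P(Y | X) = P(X, Y) / P(X) = P(X, Y) / Σ over y of P(X, y)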

Page 34

P(X,Y) vs. P(Y|X)

With a Bnet we can represent a joint as the product of conditional probabilities.

With a Markov Network we can represent a joint as the product of factors.

We will see that Markov Networks are also suitable for representing the conditional probability P(Y|X) directly.

Page 35

Directed vs. Undirected


Factorization

Page 36

Naïve Bayesian Classifier P(Y, X)

A very simple and successful Bnet that allows classifying entities into a set of classes Y1, given a set of features (X1…Xn)

Example:

• Determine whether an email is spam (only two classes spam=T and spam=F)

• Useful attributes of an email ?

Assumptions:

• The value of each attribute depends on the classification

• (Naïve) The attributes are independent of each other given the classification

P("bank" | "account", spam=T) = P("bank" | spam=T)

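A minimal sketch of this factorization, P(Y, X1..Xn) = P(Y) · Π_i P(Xi | Y) (all probability values below are made-up placeholders, not from the lecture):

```python
# Hypothetical prior and per-word conditional probabilities.
p_spam = 0.4
p_word_given = {
    True:  {"free": 0.30, "money": 0.25, "ubc": 0.01, "midterm": 0.01},  # spam=T
    False: {"free": 0.05, "money": 0.04, "ubc": 0.20, "midterm": 0.15},  # spam=F
}

def joint(spam, words):
    """P(spam, X1..Xn) under the conditional-independence assumption."""
    p = p_spam if spam else 1.0 - p_spam
    for w, present in words.items():
        q = p_word_given[spam][w]
        p *= q if present else 1.0 - q
    return p
```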

Page 37

Naïve Bayesian Classifier for Email Spam

• What is the structure?

[Figure: class node "Email Spam" with children "Email contains 'free'", "Email contains 'money'", "Email contains 'ubc'", "Email contains 'midterm'".]

The corresponding Bnet represents: P(Y1, X1…Xn)

Page 38

NB Classifier for Email Spam: Usage

Can we derive P(Y1 | X1…Xn) for any x1…xn? E.g., for the email "free money for you now".

[Figure: the same "Email Spam" network as above]

But you can also perform any other inference, e.g., P(X1 | X3)
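A usage sketch for "free money for you now", continuing the hypothetical `joint` block above ('free' and 'money' present; 'ubc' and 'midterm' absent):

```python
# P(spam=T | x) by Bayes rule on the joint from the previous block.
x = {"free": True, "money": True, "ubc": False, "midterm": False}
p = joint(True, x) / (joint(True, x) + joint(False, x))
print(p)  # close to 1 with these illustrative numbers
```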

