
Page 1: CSC446: Pattern Recognition (LN5)

Chapter 2 (Part 2):

Bayesian Decision Theory

Prof. Dr. Mostafa Gadal-Haqq

Faculty of Computer & Information Sciences

Computer Science Department

AIN SHAMS UNIVERSITY

CSC446 : Pattern Recognition

(Study DHS-Chapter 2: Sec 2.4-2.6)


Page 2: CSC446: Pattern Recognition (LN5)

2.4 Classifiers Using Discriminant Functions

• Classifier Representation

– A classifier can be represented in terms of a set of discriminant functions gi(x), i = 1, 2, …, c.

– The classifier assigns a feature vector x to class ωi according to the values of the gi(x).

– The discriminant functions gi(x) divide the feature space into c decision regions Ri, i = 1, 2, …, c:

x ∈ Ri if gi(x) > gj(x) for all j ≠ i
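To make the rule concrete, here is a minimal sketch of such a classifier; the function name classify and the two linear discriminants are hypothetical illustration choices, not from the slides:

    import numpy as np

    # Evaluate all c discriminants g_i(x) and return the index of the
    # largest one, i.e. the decision region R_i that x falls into.
    def classify(x, discriminants):
        scores = [g(x) for g in discriminants]
        return int(np.argmax(scores))

    # Hypothetical example with c = 2 linear discriminants:
    g1 = lambda x: 2.0 * x[0] + x[1]
    g2 = lambda x: x[0] - x[1] + 1.0
    print(classify(np.array([1.0, 3.0]), [g1, g2]))  # -> 0, since g1(x) > g2(x)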


Page 3: CSC446: Pattern Recognition (LN5)

2.4 Classifiers Using Discriminant Functions

The classifier can be viewed as a network. [Figure: the d features feed c discriminant units g1(x) … gc(x), followed by a selector that picks the largest.]


Page 4: CSC446: Pattern Recognition (LN5)

2.4 Classifiers Using Discriminant Functions

• Properties of g(x)

– The choice of the gi(x) is not unique.

• If every gi(x) is scaled by a positive constant or shifted by a constant, we will have the same decision:

g'i(x) = k · gi(x) (k > 0),  or  g'i(x) = gi(x) + k ; k a constant

– More generally, each gi(x) can be replaced by f(gi(x)), where f(·) is a monotonically increasing function:

g'i(x) = f(gi(x))
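Since only the ordering of the discriminants matters, a quick numeric check makes this invariance visible; the score values below are made up for illustration:

    import numpy as np

    # Hypothetical discriminant values g_1(x), g_2(x), g_3(x) for one x.
    scores = np.array([0.2, 0.5, 0.3])
    print(np.argmax(scores))           # 1 : the original decision
    print(np.argmax(3.0 * scores))     # 1 : scaled by k > 0
    print(np.argmax(scores + 7.0))     # 1 : shifted by a constant k
    print(np.argmax(np.log(scores)))   # 1 : f = ln, monotonically increasing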


Page 5: CSC446: Pattern Recognition (LN5)

2.4 Classifiers Using Discriminant Functions

• Examples of g(·):

– For minimum error rate, we could choose any of:

gi(x) = P(ωi | x)

gi(x) = p(x | ωi) P(ωi)

gi(x) = ln p(x | ωi) + ln P(ωi)

– For the general case with risks, we choose:

gi(x) = −R(αi | x)
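A minimal sketch of the log form gi(x) = ln p(x | ωi) + ln P(ωi), assuming univariate Gaussian class-conditional densities; the means, variances, and priors are made-up illustration values:

    import numpy as np

    # Log of a univariate Gaussian density, ln p(x | w_i).
    def log_gaussian(x, mu, var):
        return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

    means, variances, priors = [0.0, 2.0], [1.0, 1.0], [0.5, 0.5]
    x = 1.2
    g = [log_gaussian(x, m, v) + np.log(P)
         for m, v, P in zip(means, variances, priors)]
    print(np.argmax(g))  # -> 1 : x = 1.2 is closer to the class-2 mean 2.0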


Page 6: CSC446: Pattern Recognition (LN5)

2.4 Classifiers Using Discriminant Functions

• The two-category case

– A classifier is called a "dichotomizer" if it has two discriminant functions g1 and g2.

– The decision rule becomes:

Decide ω1 if g1(x) > g2(x); otherwise decide ω2

– If we define g(x) ≡ g1(x) − g2(x), the rule becomes:

Decide ω1 if g(x) > 0; otherwise decide ω2


Page 7: CSC446: Pattern Recognition (LN5)

2.4 Classifiers Using Discriminant Functions

• The computation of g(x) for a dichotomizer is:

g1(x) = P(ω1 | x),  g2(x) = P(ω2 | x)

g(x) = P(ω1 | x) − P(ω2 | x)

or, equivalently for the decision, in log form:

g(x) = ln [ p(x | ω1) / p(x | ω2) ] + ln [ P(ω1) / P(ω2) ]
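A minimal dichotomizer sketch using the log form; the Gaussian likelihoods, means, and priors below are illustrative assumptions, not values from the slides:

    import numpy as np

    # Log of a univariate Gaussian density, ln p(x | w_i).
    def log_gaussian(x, mu, var):
        return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

    # g(x) = ln[p(x|w1)/p(x|w2)] + ln[P(w1)/P(w2)]
    def g(x, p1=0.6, p2=0.4):
        log_likelihood_ratio = log_gaussian(x, 0.0, 1.0) - log_gaussian(x, 3.0, 1.0)
        return log_likelihood_ratio + np.log(p1 / p2)

    x = 1.0
    print("decide w1" if g(x) > 0 else "decide w2")  # g(1.0) ~ 1.9 > 0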


Page 8: CSC446: Pattern Recognition (LN5)

2.4 Classifiers Using Discriminant Functions

[Figure: feature space for two classes with two features, showing the decision boundary.]


Page 9: CSC446: Pattern Recognition (LN5)

2.5 The Univariate Normal Density

• A density that is analytically tractable

• Continuous density

• Many processes are asymptotically Gaussian (central limit theorem)

p(x) = (1 / (√(2π) σ)) exp[ −(1/2) ((x − µ) / σ)² ],  with  ∫ p(x) dx = 1

where:

µ = mean (or expected value) of x

σ² = variance (the squared standard deviation)
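A quick sketch that evaluates this density and checks numerically that it integrates to 1; the values µ = 1 and σ = 2 are arbitrary assumptions:

    import numpy as np

    # Univariate normal density p(x) with mean mu and standard deviation sigma.
    def normal_pdf(x, mu=1.0, sigma=2.0):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

    xs = np.linspace(-20.0, 22.0, 100001)   # wide grid around the mean
    print(np.trapz(normal_pdf(xs), xs))     # ~ 1.0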


Page 10: CSC446: Pattern Recognition (LN5)

2.5 The Normal Density

• Multivariate Normal Density

– The multivariate normal density in d dimensions is:

p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp[ −(1/2) (x − µ)ᵗ Σ⁻¹ (x − µ) ]

where:

x = (x1, x2, …, xd)ᵗ = the multivariate random variable

µ = (µ1, µ2, …, µd)ᵗ = the mean vector

Σ = the d×d covariance matrix; |Σ| and Σ⁻¹ are its determinant and inverse, respectively.
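A direct sketch of this formula with numpy; the mean vector and covariance matrix are made-up illustration values:

    import numpy as np

    # d-dimensional normal density p(x) for mean mu and covariance cov.
    def multivariate_normal_pdf(x, mu, cov):
        d = len(mu)
        diff = x - mu
        norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(cov) ** 0.5)
        return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

    mu = np.array([0.0, 0.0])
    cov = np.array([[2.0, 0.3], [0.3, 1.0]])
    print(multivariate_normal_pdf(np.array([0.5, -0.5]), mu, cov))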


Page 11: CSC446: Pattern Recognition (LN5)

2.6 Discriminant Functions for the Normal Density

• For minimum error rate, the discriminant functions are:

gi(x) = ln p(x | ωi) + ln P(ωi)

• Suppose the densities p(x | ωi) are multivariate normal, i.e., p(x | ωi) ~ N(µi, Σi).

• In this case, taking the log of the normal density gives:

gi(x) = −(1/2) (x − µi)ᵗ Σi⁻¹ (x − µi) − (d/2) ln 2π − (1/2) ln |Σi| + ln P(ωi)

• Let us consider a number of special cases:


Page 12: CSC446: Pattern Recognition (LN5)

2.6 Discriminant Functions for the Normal Density

• Case 1: Σi = σ²I

• This case occurs when the features are statistically independent and each feature has the same variance, σ². In this case:

Σi = σ²I,  |Σi| = σ^(2d),  and  Σi⁻¹ = (1/σ²)I

• The discriminant function is then:

gi(x) = − ||x − µi||² / (2σ²) + ln P(ωi)

• We ignored both the ln |Σi| and the (d/2) ln 2π terms, since they are additive constants independent of i.


Page 13: CSC446: Pattern Recognition (LN5)

2.6 Discriminant Functions for the Normal Density

• where ||·|| is the Euclidean norm, that is,

||x − µi||² = (x − µi)ᵗ (x − µi)

• Expanding ||x − µi||² yields:

gi(x) = −(1/(2σ²)) [ xᵗx − 2µiᵗx + µiᵗµi ] + ln P(ωi)

• Since xᵗx is the same for all i, it can be dropped, and gi(x) can be written as a linear discriminant function:

gi(x) = wiᵗx + wi0

• where:  wi = (1/σ²) µi  and  wi0 = −(1/(2σ²)) µiᵗµi + ln P(ωi)
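A minimal sketch of this linear machine: build wi and wi0 from each class mean and prior, then classify by the largest gi(x); the means, σ², and priors are hypothetical illustration values:

    import numpy as np

    # Build w_i = mu_i / sigma^2 and w_i0 = -mu_i.mu_i/(2 sigma^2) + ln P(w_i).
    def linear_discriminants(means, sigma2, priors):
        ws = [mu / sigma2 for mu in means]
        w0s = [-(mu @ mu) / (2 * sigma2) + np.log(P)
               for mu, P in zip(means, priors)]
        return ws, w0s

    means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
    ws, w0s = linear_discriminants(means, sigma2=1.0, priors=[0.5, 0.5])
    x = np.array([1.0, 1.0])
    g = [w @ x + w0 for w, w0 in zip(ws, w0s)]
    print(np.argmax(g))  # -> 0 : x lies closer to the first class mean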


Page 14: CSC446: Pattern Recognition (LN5)

2.6 Discriminant Functions for the Normal Density

• A classifier that uses linear discriminant functions is called a linear machine.


Page 15: CSC446: Pattern Recognition (LN5)

2.6 Discriminant Functions for the Normal Density

• Reading:

– Case 2: Σi = Σ: the covariance matrices for all classes are identical.

– Case 3: Σi = arbitrary: the general multivariate normal case; the covariance matrices are different for each category.


Page 16: CSC446: Pattern Recognition (LN5)

2.6 Discriminant Functions for the Normal Density

– Σi arbitrary: the discriminant functions are quadratic:

gi(x) = xᵗ Wi x + wiᵗ x + wi0

where:  Wi = −(1/2) Σi⁻¹,  wi = Σi⁻¹ µi,  and

wi0 = −(1/2) µiᵗ Σi⁻¹ µi − (1/2) ln |Σi| + ln P(ωi)
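A sketch of this general case as a function factory; the example mean, covariance, and prior passed in at the end are arbitrary assumptions:

    import numpy as np

    # Build the quadratic discriminant g_i(x) = x^t W_i x + w_i^t x + w_i0
    # from the class parameters (mu_i, Sigma_i, P(w_i)).
    def quadratic_discriminant(mu, cov, prior):
        cov_inv = np.linalg.inv(cov)
        W = -0.5 * cov_inv
        w = cov_inv @ mu
        w0 = (-0.5 * mu @ cov_inv @ mu
              - 0.5 * np.log(np.linalg.det(cov))
              + np.log(prior))
        return lambda x: x @ W @ x + w @ x + w0

    g1 = quadratic_discriminant(np.array([0.0, 0.0]), np.eye(2), 0.5)
    print(g1(np.array([1.0, 1.0])))  # = -1 + 0 + ln 0.5, about -1.693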


Page 17: CSC446: Pattern Recognition (LN5)

2.6 Discriminant Functions for the Normal Density

• Numerical Example: (two features, two classes)

µ1 = (3, 6)ᵗ,  Σ1 = [[1/2, 0], [0, 2]],  Σ1⁻¹ = [[2, 0], [0, 1/2]]

µ2 = (3, −2)ᵗ,  Σ2 = [[2, 0], [0, 2]],  Σ2⁻¹ = [[1/2, 0], [0, 1/2]]

• Using P(ω1) = P(ω2) = 0.5,

• the decision boundary g(x) = g1(x) − g2(x) = 0 is:

x2 − 3.514 + 1.125 x1 − 0.1875 x1² = 0,  i.e.,  x2 = 3.514 − 1.125 x1 + 0.1875 x1²

[Figure: the parabolic decision boundary separating regions ω1 and ω2 in the (x1, x2) plane.]
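The example can be checked numerically with the Case 3 discriminant from the previous slide: points on the stated boundary should give g1(x) ≈ g2(x). A short sketch; the helper g omits the −(d/2) ln 2π term, which is common to both classes:

    import numpy as np

    # g_i(x) = -1/2 (x-mu)^t Sigma^-1 (x-mu) - 1/2 ln|Sigma| + ln P(w_i)
    def g(x, mu, cov, prior):
        diff = x - mu
        return (-0.5 * diff @ np.linalg.inv(cov) @ diff
                - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior))

    mu1, cov1 = np.array([3.0, 6.0]), np.diag([0.5, 2.0])
    mu2, cov2 = np.array([3.0, -2.0]), np.diag([2.0, 2.0])

    for x1 in [0.0, 1.0, 3.0, 5.0]:
        x = np.array([x1, 3.514 - 1.125 * x1 + 0.1875 * x1 ** 2])
        # ~ 0 up to the rounding of the published coefficients:
        print(g(x, mu1, cov1, 0.5) - g(x, mu2, cov2, 0.5))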


Page 18: CSC446: Pattern Recognition (LN5)

Home Work (1)

• Write a report on Section 2.9: Bayesian Decision Theory - Discrete Features.

• 2.9.1: Independent binary features

• Example 3: Bayesian Decisions for 3D binary Data

• Problem Exercises:

– Derive the decision boundary equation in the previous example (Page 17).
