
COSC 522 – Machine Learning

Discriminant Functions

Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science
University of Tennessee, Knoxville
https://www.eecs.utk.edu/people/hairong-qi/
Email: hqi@utk.edu


Course Website: http://web.eecs.utk.edu/~hqi/cosc522/

Recap from Previous Lecture
• Definition of supervised learning (vs. unsupervised learning)
• The difference between the training set and the test set
• The difference between classification and regression
• Definitions of "features", "samples", and "dimension"
• From histogram to probability density function (pdf)
• In Bayes' formula, what is the conditional pdf? The prior probability? The posterior probability?
• What does the normalization factor (or evidence) do?
• What is the Bayesian decision rule, or MPP?
• What are decision regions?
• How to calculate the conditional probability of error and the overall probability of error?
• What are the cost function (or objective function) and the optimization method used in MPP?


Recap


Maximum posterior probability (MPP): for a given $x$, if $P(\omega_1 | x) > P(\omega_2 | x)$, then $x$ belongs to class 1; otherwise, class 2, where

$$P(\omega_j | x) = \frac{p(x | \omega_j) P(\omega_j)}{p(x)}$$

Overall probability of error:

$$P(error) = \int_{\mathcal{R}_1} P(\omega_2 | x)\, p(x)\, dx + \int_{\mathcal{R}_2} P(\omega_1 | x)\, p(x)\, dx$$
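As a concrete illustration of the MPP rule and the overall probability of error, the following sketch uses two 1-D Gaussian class-conditional densities and integrates the error numerically. All parameter values (means, variances, priors) are invented for illustration, not taken from the course.

```python
import numpy as np

# Hedged sketch: two 1-D Gaussian class-conditional pdfs with invented
# parameters (means, variances, and priors are illustrative only).

def gauss(x, mu, sigma):
    # Univariate normal pdf (the d = 1 case of the multivariate formula)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

mu1, sigma1, P1 = 0.0, 1.0, 0.5   # class omega_1
mu2, sigma2, P2 = 3.0, 1.0, 0.5   # class omega_2

x = np.linspace(-10.0, 13.0, 20001)
post1 = gauss(x, mu1, sigma1) * P1   # p(x|w1) P(w1), numerator of P(w1|x)
post2 = gauss(x, mu2, sigma2) * P2   # p(x|w2) P(w2), numerator of P(w2|x)

# MPP decision regions: R1 where P(w1|x) > P(w2|x), R2 elsewhere.
# The evidence p(x) cancels, so comparing numerators suffices.
in_R1 = post1 > post2

# P(error) = int_R1 P(w2|x) p(x) dx + int_R2 P(w1|x) p(x) dx
dx = x[1] - x[0]
p_err = np.sum(np.where(in_R1, post2, post1)) * dx
print(round(p_err, 4))
```

With equal priors and equal variances, the decision boundary falls midway between the means, and the numerical result matches the analytic value $\Phi(-1.5) \approx 0.0668$ for these parameters.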

Questions
• What is a discriminant function?
• What is a multivariate Gaussian (or normal density function)?
• What is the covariance matrix and what is its dimension?
• What would the covariance matrix look like if the features are independent from each other?
• What would the covariance matrix look like if the features are independent from each other AND have the same spread in each dimension?
• What is the minimum (Euclidean) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Euclidean) distance classifier?
• What is the minimum (Mahalanobis) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Mahalanobis) distance classifier?
• What does the decision boundary look like for a quadratic classifier?
• What are the cost functions for the discriminant functions? And what is the optimization method used to find the best solution?


Multi-variate Gaussian

Linear and Quadratic Machines

and their assumptions


Discriminant Function

One way to represent a pattern classifier is to use discriminant functions $g_i(x)$.

The classifier will assign a feature vector $x$ to class $\omega_i$ if $g_i(x) > g_j(x)$. For two-class cases,

$$g(x) = g_1(x) - g_2(x) = P(\omega_1 | x) - P(\omega_2 | x)$$


Multivariate Normal Density

$$p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right]$$

where
• $x$: d-component column vector, $x = (x_1, \ldots, x_d)^T$
• $\mu$: d-component mean vector, $\mu = (\mu_1, \ldots, \mu_d)^T$
• $\Sigma$: d-by-d covariance matrix, $\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1d} \\ \vdots & \ddots & \vdots \\ \sigma_{d1} & \cdots & \sigma_{dd} \end{pmatrix}$
• $|\Sigma|$: determinant
• $\Sigma^{-1}$: inverse

When $d = 1$,

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$$
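The density formula above can be transcribed directly into NumPy. This is a minimal sketch; the 2-D mean and covariance values are illustrative assumptions, not data from the course.

```python
import numpy as np

# Direct NumPy transcription of the multivariate normal density.
# The mean and covariance below are invented for illustration.

def mvn_pdf(x, mu, Sigma):
    # p(x) = (2 pi)^(-d/2) |Sigma|^(-1/2) exp(-1/2 (x-mu)^T Sigma^-1 (x-mu))
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    quad = diff @ np.linalg.inv(Sigma) @ diff   # squared Mahalanobis distance
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# The peak value at x = mu is 1 / (2 pi sqrt(|Sigma|)) when d = 2
print(mvn_pdf(mu, mu, Sigma))
```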


Discriminant Function for Normal Density

$$\begin{aligned}
g_i(x) &= \ln p(x | \omega_i) + \ln P(\omega_i) \\
&= -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i) \\
&= -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)
\end{aligned}$$

(the term $-\frac{d}{2} \ln 2\pi$ is the same for all classes and can be dropped), where

$$p(x | \omega_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right]$$
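The general Gaussian discriminant can be evaluated directly in code. The sketch below drops the shared $-\frac{d}{2}\ln 2\pi$ term as above; the class means, covariances, and priors are invented for illustration.

```python
import numpy as np

# Sketch of the general Gaussian discriminant (illustrative parameters):
#   g_i(x) = -1/2 (x - mu_i)^T Sigma_i^-1 (x - mu_i)
#            - 1/2 ln|Sigma_i| + ln P(w_i)
# with the class-independent -(d/2) ln(2 pi) term dropped.

def g(x, mu, Sigma, prior):
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff   # squared Mahalanobis distance
    return -0.5 * quad - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior)

mu1, S1 = np.array([0.0, 0.0]), np.eye(2)   # hypothetical class 1
mu2, S2 = np.array([3.0, 3.0]), np.eye(2)   # hypothetical class 2

x = np.array([0.5, 0.2])
label = 1 if g(x, mu1, S1, 0.5) > g(x, mu2, S2, 0.5) else 2
print(label)   # x is far closer to mu1
```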



Case 1: $\Sigma_i = \sigma^2 I$

The features are statistically independent and have the same variance. Geometrically, the samples fall in equal-size hyperspherical clusters. Decision boundary: a hyperplane of $d-1$ dimensions.

$$\Sigma = \begin{pmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{pmatrix}, \quad |\Sigma| = \sigma^{2d}, \quad \Sigma^{-1} = \begin{pmatrix} \frac{1}{\sigma^2} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{\sigma^2} \end{pmatrix}$$


Linear Discriminant Function and Linear Machine

$$\begin{aligned}
g_i(x) &= -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i) \\
&= -\frac{1}{2\sigma^2} \left( x^T x - 2 \mu_i^T x + \mu_i^T \mu_i \right) + \ln P(\omega_i) \\
&= \frac{\mu_i^T x}{\sigma^2} - \frac{\mu_i^T \mu_i}{2\sigma^2} + \ln P(\omega_i)
\end{aligned}$$

(since $x^T x$ is the same for every class, it can be dropped, leaving a discriminant that is linear in $x$), where

$$\|x - \mu_i\|^2 = (x - \mu_i)^T (x - \mu_i)$$

is the Euclidean norm (distance).


Minimum-Distance Classifier

When $P(\omega_i)$ is the same for all $c$ classes, the discriminant function effectively measures the distance from $x$ to each of the $c$ mean vectors, and $x$ is assigned to the class with the minimum distance:

$$g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2}$$
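Under these assumptions ($\Sigma_i = \sigma^2 I$, equal priors), maximizing $g_i(x)$ is the same as picking the nearest class mean. A minimal sketch, with three made-up class means:

```python
import numpy as np

# Minimum (Euclidean) distance classifier sketch: equal priors and
# Sigma_i = sigma^2 I reduce the decision to nearest-mean.
# The three class means below are invented for illustration.

means = np.array([[0.0, 0.0],
                  [4.0, 0.0],
                  [0.0, 4.0]])

def nearest_mean(x):
    d2 = ((means - x) ** 2).sum(axis=1)   # squared Euclidean distances
    return int(np.argmin(d2))             # index of the closest class mean

pred = nearest_mean(np.array([3.0, 1.0]))
print(pred)   # nearest to (4, 0) -> class index 1
```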


Case 2: $\Sigma_i = \Sigma$

The covariance matrices for all the classes are identical but not a scalar multiple of the identity matrix. Geometrically, the samples fall in hyperellipsoidal clusters. Decision boundary: a hyperplane of $d-1$ dimensions.

$$\begin{aligned}
g_i(x) &= \ln p(x | \omega_i) + \ln P(\omega_i) \\
&= -\frac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln P(\omega_i) \\
&= (\Sigma^{-1} \mu_i)^T x - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \ln P(\omega_i)
\end{aligned}$$

Here $(x - \mu_i)^T \Sigma^{-1} (x - \mu_i)$ is the squared Mahalanobis distance; the class-independent terms $-\frac{1}{2} \ln |\Sigma|$ and $x^T \Sigma^{-1} x$ have been dropped, leaving a linear machine.
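The linear form above can be sketched directly: the weight vector is $\Sigma^{-1}\mu_i$ and the bias collects the remaining class-dependent terms. The shared covariance, means, and priors below are invented for illustration.

```python
import numpy as np

# Case 2 sketch (shared covariance): with Sigma_i = Sigma for all classes,
#   g_i(x) = (Sigma^-1 mu_i)^T x - 1/2 mu_i^T Sigma^-1 mu_i + ln P(w_i)
# is linear in x. Sigma, the means, and the priors are invented values.

Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
Sinv = np.linalg.inv(Sigma)
mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
priors = [0.5, 0.5]

def g_linear(x, mu, prior):
    w = Sinv @ mu                               # weight vector Sigma^-1 mu_i
    b = -0.5 * mu @ Sinv @ mu + np.log(prior)   # class-dependent bias
    return w @ x + b

x = np.array([2.5, 1.0])
pred = int(np.argmax([g_linear(x, mu, p) for mu, p in zip(mus, priors)]))
print(pred)   # x lies near the second mean, so class index 1
```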


Case 3: $\Sigma_i$ = arbitrary

The covariance matrices are different for each category. Quadratic classifier. Decision boundary: a hyperquadric for 2-D Gaussians.

$$\begin{aligned}
g_i(x) &= \ln p(x | \omega_i) + \ln P(\omega_i) \\
&= -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i) \\
&= -\frac{1}{2} x^T \Sigma_i^{-1} x + \mu_i^T \Sigma_i^{-1} x - \frac{1}{2} \mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)
\end{aligned}$$
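To see why the quadratic term matters, the sketch below uses two invented classes that share the SAME mean and differ only in spread; no distance-to-mean (linear) machine could separate them, but the quadratic discriminant can.

```python
import numpy as np

# Case 3 sketch: arbitrary Sigma_i gives the full quadratic discriminant.
# The two classes below (same mean, different covariance) are an invented
# configuration chosen to highlight the quadratic term.

def g_quad(x, mu, Sigma, prior):
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff
    return -0.5 * quad - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior)

mu = np.array([0.0, 0.0])
S_tight = 0.5 * np.eye(2)   # class 0: small, tight spread
S_wide = 4.0 * np.eye(2)    # class 1: large, wide spread

def classify(x):
    g0 = g_quad(x, mu, S_tight, 0.5)
    g1 = g_quad(x, mu, S_wide, 0.5)
    return 0 if g0 > g1 else 1

near = classify(np.array([0.2, 0.1]))   # close to the common mean
far = classify(np.array([3.0, 3.0]))    # far out, plausible only for class 1
print(near, far)
```

Points near the shared mean favor the tight class (its density peak is higher), while distant points favor the wide class, so the decision boundary here is a circle around the mean, an instance of the hyperquadric boundaries of Case 3.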




Bayes Decision Rule

$$P(\omega_j | x) = \frac{p(x | \omega_j) P(\omega_j)}{p(x)}$$

Maximum posterior probability: for a given $x$, if $P(\omega_1 | x) > P(\omega_2 | x)$, then $x$ belongs to class 1; otherwise, class 2.

Discriminant function: the classifier will assign a feature vector $x$ to class $\omega_i$ if $g_i(x) > g_j(x)$.

Case 1: Minimum Euclidean distance (linear machine), $\Sigma_i = \sigma^2 I$

Case 2: Minimum Mahalanobis distance (linear machine), $\Sigma_i = \Sigma$

Case 3: Quadratic classifier, $\Sigma_i$ = arbitrary

All assume a Gaussian pdf.
