Bayesian Decision Theory · hic/cs7616/pdf/lecture2.pdf · 2016-01-19

Bayesian Decision Theory Chapter 2 (Duda, Hart & Stork) CS 7616 - Pattern Recognition Henrik I Christensen Georgia Tech.

Bayesian Decision Theory

Chapter 2 (Duda, Hart & Stork)

CS 7616 - Pattern Recognition

Henrik I Christensen, Georgia Tech.

Bayesian Decision Theory

• Design classifiers to recommend decisions that minimize some total expected "risk".
  – The simplest risk is the classification error (i.e., costs are equal).
  – Typically, the risk includes the cost associated with different decisions.

Terminology

• State of nature ω (random variable):
  – e.g., ω1 for sea bass, ω2 for salmon
• Probabilities P(ω1) and P(ω2) (priors):
  – e.g., prior knowledge of how likely it is to get a sea bass or a salmon
• Probability density function p(x) (evidence):
  – e.g., how frequently we will measure a pattern with feature value x (e.g., x corresponds to lightness)

Terminology (cont’d)

• Conditional probability density p(x/ωj) (likelihood):
  – e.g., how frequently we will measure a pattern with feature value x given that the pattern belongs to class ωj

[Figure: e.g., lightness distributions of the salmon and sea bass populations]

Terminology (cont’d)

• Conditional probability P(ωj/x) (posterior):
  – e.g., the probability that the fish belongs to class ωj given measurement x.

Decision Rule Using Prior Probabilities

Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

$$P(error) = \begin{cases} P(\omega_1) & \text{if we decide } \omega_2 \\ P(\omega_2) & \text{if we decide } \omega_1 \end{cases}$$

or P(error) = min[P(ω1), P(ω2)]

• Favours the most likely class.
• This rule makes the same decision every time.
  – i.e., optimum if no other information is available

Decision Rule Using Conditional Probabilities

• Using Bayes’ rule, the posterior probability of category ωj given measurement x is given by:

$$P(\omega_j/x) = \frac{p(x/\omega_j)\,P(\omega_j)}{p(x)} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$

where

$$p(x) = \sum_{j=1}^{2} p(x/\omega_j)\,P(\omega_j)$$

(i.e., scale factor – sum of probs = 1)

Decide ω1 if P(ω1/x) > P(ω2/x); otherwise decide ω2

or

Decide ω1 if p(x/ω1)P(ω1) > p(x/ω2)P(ω2); otherwise decide ω2
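A minimal sketch of the two-class posterior rule, assuming Gaussian class-conditional densities. The means, standard deviations, and the helper names `gaussian_pdf` and `posteriors` are illustrative assumptions, not values from the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density, used here as an assumed p(x/w_j)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x, priors, params):
    """Return [P(w_1/x), P(w_2/x), ...] via Bayes' rule."""
    joint = [gaussian_pdf(x, m, s) * p for (m, s), p in zip(params, priors)]
    evidence = sum(joint)  # p(x), the scale factor
    return [j / evidence for j in joint]

# Hypothetical lightness model: w_1 = sea bass, w_2 = salmon.
priors = [2.0 / 3.0, 1.0 / 3.0]
params = [(11.0, 1.5), (13.0, 1.0)]  # assumed (mean, std) per class

post = posteriors(12.0, priors, params)
decision = 1 if post[0] > post[1] else 2
print(post, "decide w%d" % decision)
```

Dividing by the evidence does not change which class wins, so comparing the unnormalized products p(x/ωj)P(ωj) gives the same decision.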

Decision Rule Using Conditional pdf (cont’d)

[Figure: class-conditional densities p(x/ωj) and the resulting posteriors P(ωj/x) for priors P(ω1) = 2/3, P(ω2) = 1/3]

Probability of Error

• The probability of error is defined as:

$$P(error/x) = \begin{cases} P(\omega_1/x) & \text{if we decide } \omega_2 \\ P(\omega_2/x) & \text{if we decide } \omega_1 \end{cases}$$

or

P(error/x) = min[P(ω1/x), P(ω2/x)]

• What is the average probability error?

$$P(error) = \int_{-\infty}^{\infty} P(error, x)\,dx = \int_{-\infty}^{\infty} P(error/x)\,p(x)\,dx$$

• The Bayes rule is optimum, that is, it minimizes the average probability error!
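The average error integral can be approximated numerically. The sketch below assumes two univariate Gaussian classes with made-up parameters and uses a plain midpoint rule; for two classes, P(error/x) p(x) equals the smaller of the two products p(x/ωj)P(ωj).

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density (assumed form of p(x/w_j))."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_error(priors, params, lo=-20.0, hi=20.0, steps=40000):
    """Two-class P(error): midpoint-rule integral of min_j p(x/w_j) P(w_j)."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        joint = [gaussian_pdf(x, m, s) * p for (m, s), p in zip(params, priors)]
        total += min(joint) * dx  # P(error/x) p(x) at this x
    return total

# Hypothetical setup: equal priors, unit variances, means 0 and 2.
p_error = bayes_error([0.5, 0.5], [(0.0, 1.0), (2.0, 1.0)])
print(round(p_error, 4))  # close to 0.1587, the standard normal CDF at -1
```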

Where do Probabilities Come From?

• There are two competing answers to this question:

(1) Relative frequency (objective) approach.
  – Probabilities can only come from experiments.

(2) Bayesian (subjective) approach.
  – Probabilities may reflect degree of belief and can be based on opinion.

Example (objective approach)

• Classify cars by whether they cost more or less than $50K:
  – Classes: C1 if price > $50K, C2 if price ≤ $50K
  – Features: x, the height of a car
• Use Bayes’ rule to compute the posterior probabilities:

$$P(C_i/x) = \frac{p(x/C_i)\,P(C_i)}{p(x)}$$

• We need to estimate p(x/C1), p(x/C2), P(C1), P(C2)

Example (cont’d)

• Collect data
  – Ask drivers how much their car was and measure height.
• Determine prior probabilities P(C1), P(C2)
  – e.g., 1209 samples: #C1 = 221, #C2 = 988

$$P(C_1) = \frac{221}{1209} = 0.183, \qquad P(C_2) = \frac{988}{1209} = 0.817$$

Example (cont’d)

• Determine class conditional probabilities (likelihood)
  – Discretize car height into bins and use normalized histogram

[Figure: normalized histograms p(x/Ci)]

Example (cont’d)

• Calculate the posterior probability for each bin:

$$P(C_1/x=1.0) = \frac{p(x=1.0/C_1)\,P(C_1)}{p(x=1.0/C_1)\,P(C_1) + p(x=1.0/C_2)\,P(C_2)} = \frac{0.2081 \times 0.183}{0.2081 \times 0.183 + 0.0597 \times 0.817} = 0.438$$

[Figure: posteriors P(Ci/x) for each bin]
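The arithmetic for the x = 1.0 bin can be checked in a few lines. The counts and histogram values are the ones quoted on the slides; the variable names are only illustrative.

```python
# Sample counts quoted on the slides.
n_c1, n_c2 = 221, 988
n = n_c1 + n_c2                 # 1209 samples in total
prior_c1 = n_c1 / n             # about 0.183
prior_c2 = n_c2 / n             # about 0.817

# Normalized-histogram likelihoods for the bin x = 1.0 (from the slide).
lik_c1, lik_c2 = 0.2081, 0.0597

evidence = lik_c1 * prior_c1 + lik_c2 * prior_c2   # p(x = 1.0)
post_c1 = lik_c1 * prior_c1 / evidence
post_c2 = lik_c2 * prior_c2 / evidence
print(round(post_c1, 3), round(post_c2, 3))        # 0.438 0.562
```

Even though the likelihood of this bin is higher under C1, the much larger prior of C2 keeps P(C2/x = 1.0) above one half.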

A More General Theory

• Use more than one feature.
• Allow more than two categories.
• Allow actions other than classifying the input to one of the possible categories (e.g., rejection).
• Employ a more general error function (i.e., "risk" function) by associating a "cost" ("loss" function) with each error (i.e., wrong action).

Terminology

• Features form a vector $\mathbf{x} \in R^d$
• A finite set of c categories ω1, ω2, …, ωc
• Bayes rule (i.e., using vector notation):

$$P(\omega_j/\mathbf{x}) = \frac{p(\mathbf{x}/\omega_j)\,P(\omega_j)}{p(\mathbf{x})}, \quad \text{where } p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x}/\omega_j)\,P(\omega_j)$$

• A finite set of l actions α1, α2, …, αl
• A loss function λ(αi/ωj)
  – the cost associated with taking action αi when the correct classification category is ωj

Conditional Risk (or Expected Loss)

• Suppose we observe x and take action αi
• Suppose that the cost associated with taking action αi with ωj being the correct category is λ(αi/ωj)
• The conditional risk (or expected loss) with taking action αi is:

$$R(\alpha_i/\mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i/\omega_j)\,P(\omega_j/\mathbf{x})$$
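As a sketch, the conditional risk is a loss-matrix row dotted with the posterior vector. The loss values, the posteriors, and the reject action below are hypothetical, chosen only to show a case where rejecting has the lowest risk.

```python
def conditional_risks(loss, posteriors):
    """R(a_i/x) = sum_j loss[i][j] * P(w_j/x), one value per action a_i."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posteriors)) for row in loss]

# Hypothetical 3-action, 2-class problem: a_1 = decide w_1, a_2 = decide w_2,
# a_3 = reject. loss[i][j] is the cost of action a_(i+1) when w_(j+1) is true.
loss = [
    [0.0, 2.0],
    [1.0, 0.0],
    [0.3, 0.3],
]
posteriors = [0.6, 0.4]  # assumed P(w_1/x), P(w_2/x)

risks = conditional_risks(loss, posteriors)
best = min(range(len(risks)), key=risks.__getitem__)  # minimum-risk action
print(risks, "take action a_%d" % (best + 1))
```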

Overall Risk

• Suppose α(x) is a general decision rule that determines which action α1, α2, …, αl to take for every x; then the overall risk is defined as:

$$R = \int R(\alpha(\mathbf{x})/\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$$

• The optimum decision rule is the Bayes rule

Overall Risk (cont’d)

• The Bayes decision rule minimizes R by:
  (i) Computing R(αi/x) for every αi given an x
  (ii) Choosing the action αi with the minimum R(αi/x)
• The resulting minimum overall risk is called Bayes risk and is the best (i.e., optimum) performance that can be achieved:

$$R^* = \min R$$

Example: Two-category classification

• Define
  – α1: decide ω1
  – α2: decide ω2
  – λij = λ(αi/ωj)
• The conditional risks are:

$$R(\alpha_i/\mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i/\omega_j)\,P(\omega_j/\mathbf{x}) \qquad (c = 2)$$
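Written out for c = 2 with the shorthand λij, the sum would expand as follows. This is a sketch derived from the general definition rather than text taken from the slide.

```latex
\[
\begin{aligned}
R(\alpha_1/\mathbf{x}) &= \lambda_{11}\,P(\omega_1/\mathbf{x}) + \lambda_{12}\,P(\omega_2/\mathbf{x}) \\
R(\alpha_2/\mathbf{x}) &= \lambda_{21}\,P(\omega_1/\mathbf{x}) + \lambda_{22}\,P(\omega_2/\mathbf{x})
\end{aligned}
\]
```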

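Example: Two-category classification (cont’d)

• Minimum risk decision rule:

Decide ω1 if R(α1/x) < R(α2/x); otherwise decide ω2

or

Decide ω1 if (λ21 - λ11) P(ω1/x) > (λ12 - λ22) P(ω2/x); otherwise decide ω2

or (i.e., using likelihood ratio)

$$\underbrace{\frac{p(\mathbf{x}/\omega_1)}{p(\mathbf{x}/\omega_2)}}_{\text{likelihood ratio}} > \underbrace{\frac{(\lambda_{12}-\lambda_{22})}{(\lambda_{21}-\lambda_{11})}\,\frac{P(\omega_2)}{P(\omega_1)}}_{\text{threshold}}$$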

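Special Case: Zero-One Loss Function

• Assign the same loss to all errors:

$$\lambda(\alpha_i/\omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases}$$

• The conditional risk corresponding to this loss function:

$$R(\alpha_i/\mathbf{x}) = \sum_{j \neq i} P(\omega_j/\mathbf{x}) = 1 - P(\omega_i/\mathbf{x})$$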

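Special Case: Zero-One Loss Function (cont’d)

• The decision rule becomes:

Decide ω1 if P(ω1/x) > P(ω2/x); otherwise decide ω2

or

Decide ω1 if p(x/ω1)P(ω1) > p(x/ω2)P(ω2); otherwise decide ω2

or

Decide ω1 if p(x/ω1)/p(x/ω2) > P(ω2)/P(ω1); otherwise decide ω2

• In this case, the overall risk is the average probability error!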

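Example

Assuming zero-one loss:

Decide ω1 if p(x/ω1)/p(x/ω2) > P(ω2)/P(ω1); otherwise decide ω2

$$\theta_a = \frac{P(\omega_2)}{P(\omega_1)}$$

Assuming general loss:

Decide ω1 if p(x/ω1)/p(x/ω2) > θb; otherwise decide ω2

$$\theta_b = \frac{P(\omega_2)(\lambda_{12}-\lambda_{22})}{P(\omega_1)(\lambda_{21}-\lambda_{11})}$$

assume: λ12 > λ21

[Figure: decision regions for the thresholds θa and θb]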
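A small sketch of the likelihood-ratio test with both thresholds. The Gaussian class models, the loss values, and the function names `threshold` and `decide` are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density (assumed form of p(x/w_j))."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def threshold(p1, p2, l11=0.0, l12=1.0, l21=1.0, l22=0.0):
    """theta = P(w2)(l12 - l22) / (P(w1)(l21 - l11)); defaults are zero-one loss."""
    return p2 * (l12 - l22) / (p1 * (l21 - l11))

def decide(x, theta, class1=(0.0, 1.0), class2=(2.0, 1.0)):
    """Decide w1 when the likelihood ratio p(x/w1)/p(x/w2) exceeds theta."""
    ratio = gaussian_pdf(x, *class1) / gaussian_pdf(x, *class2)
    return 1 if ratio > theta else 2

p1, p2 = 0.5, 0.5
theta_a = threshold(p1, p2)                    # zero-one loss
theta_b = threshold(p1, p2, l12=3.0, l21=1.0)  # l12 > l21 raises the threshold
print(theta_a, theta_b, decide(0.8, theta_a), decide(0.8, theta_b))
```

With λ12 > λ21, deciding ω1 wrongly is the costlier mistake, so the threshold rises and the region assigned to ω1 shrinks: the same x = 0.8 is labelled ω1 under θa and ω2 under θb.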

Discriminant Functions

• A useful way to represent classifiers is through discriminant functions gi(x), i = 1, ..., c, where a feature vector x is assigned to class ωi if:

gi(x) > gj(x) for all j ≠ i

Discriminants for Bayes Classifier

• Assuming a general loss function:

gi(x) = -R(αi/x)

• Assuming the zero-one loss function:

gi(x) = P(ωi/x)

Discriminants for Bayes Classifier (cont’d)

• Is the choice of gi unique?
  – Replacing gi(x) with f(gi(x)), where f() is monotonically increasing, does not change the classification results.

gi(x) = P(ωi/x)

$$g_i(\mathbf{x}) = \frac{p(\mathbf{x}/\omega_i)\,P(\omega_i)}{p(\mathbf{x})}$$

$$g_i(\mathbf{x}) = p(\mathbf{x}/\omega_i)\,P(\omega_i)$$

$$g_i(\mathbf{x}) = \ln p(\mathbf{x}/\omega_i) + \ln P(\omega_i)$$

(we’ll use this last form extensively!)
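The invariance under monotonically increasing transforms can be checked directly. The sketch below assumes two univariate Gaussian classes with made-up priors and parameters, and compares the labels produced by three equivalent discriminants.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density (an assumed form for p(x/w_i))."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

PRIORS = [0.3, 0.7]                 # hypothetical P(w_1), P(w_2)
PARAMS = [(0.0, 1.0), (2.0, 1.5)]   # hypothetical (mean, std) per class

def g_joint(x):
    """g_i(x) = p(x/w_i) P(w_i)."""
    return [gaussian_pdf(x, m, s) * p for (m, s), p in zip(PARAMS, PRIORS)]

def g_posterior(x):
    """g_i(x) = P(w_i/x), i.e. g_joint divided by the evidence p(x)."""
    joint = g_joint(x)
    evidence = sum(joint)
    return [j / evidence for j in joint]

def g_log(x):
    """g_i(x) = ln p(x/w_i) + ln P(w_i)."""
    return [math.log(j) for j in g_joint(x)]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

xs = [-1.0, 0.5, 1.0, 2.5, 4.0]
labels = [[argmax(g(x)) for x in xs] for g in (g_posterior, g_joint, g_log)]
print(labels)  # the three rows are identical
```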

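Case of two categories

• More common to use a single discriminant function (dichotomizer) instead of two:

g(x) = g1(x) - g2(x)

Decide ω1 if g(x) > 0; otherwise decide ω2

• Examples:

$$g(\mathbf{x}) = P(\omega_1/\mathbf{x}) - P(\omega_2/\mathbf{x})$$

$$g(\mathbf{x}) = \ln\frac{p(\mathbf{x}/\omega_1)}{p(\mathbf{x}/\omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$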

Decision Regions and Boundaries

• Decision rules divide the feature space into decision regions R1, R2, …, Rc, separated by decision boundaries.
  – The decision boundary between two regions is defined by:

g1(x) = g2(x)

Discriminant Function for Multivariate Gaussian Density

• Consider the following discriminant function:

$$g_i(\mathbf{x}) = \ln p(\mathbf{x}/\omega_i) + \ln P(\omega_i)$$

where p(x/ωi) ~ N(μi, Σi)
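Substituting a d-dimensional normal density into this discriminant gives the standard expansion below. It is included as a reference sketch; the dimension d and the per-class parameters μi, Σi follow the usual convention and are assumed here.

```latex
\[
g_i(\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x}-\mu_i)^T \Sigma_i^{-1} (\mathbf{x}-\mu_i)
                  - \tfrac{d}{2}\ln 2\pi - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)
\]
```

The three cases that follow differ only in what is assumed about Σi, which determines which of these terms are class-independent and can be dropped.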

Multivariate Gaussian Density: Case I

• Σi = σ²I (diagonal)
  – Features are statistically independent
  – Each feature has the same variance
  – The ln P(ωi) term of gi(x) favours the a-priori more likely category

Multivariate Gaussian Density: Case I (cont’d)

[Equations: linear discriminant gi(x) with weight vector wi]
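For Σi = σ²I, the standard result (as in Duda, Hart & Stork, Chapter 2) is a linear discriminant. The block below is a sketch of it; the names w_i, w_{i0}, w and x_0 are the conventional ones and are assumed rather than recovered from the slide.

```latex
\[
\begin{aligned}
g_i(\mathbf{x}) &= -\frac{\|\mathbf{x}-\mu_i\|^2}{2\sigma^2} + \ln P(\omega_i) \\
\text{dropping } -\tfrac{\mathbf{x}^T\mathbf{x}}{2\sigma^2}: \quad
g_i(\mathbf{x}) &= \mathbf{w}_i^T\mathbf{x} + w_{i0}, \qquad
\mathbf{w}_i = \frac{\mu_i}{\sigma^2}, \qquad
w_{i0} = -\frac{\mu_i^T\mu_i}{2\sigma^2} + \ln P(\omega_i) \\
\text{boundary: } \quad \mathbf{w}^T(\mathbf{x}-\mathbf{x}_0) &= 0, \qquad
\mathbf{w} = \mu_i - \mu_j, \qquad
\mathbf{x}_0 = \tfrac{1}{2}(\mu_i+\mu_j)
  - \frac{\sigma^2}{\|\mu_i-\mu_j\|^2}\,\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i-\mu_j)
\end{aligned}
\]
```

When P(ωi) > P(ωj) the logarithm is positive, so x0 moves from the midpoint toward μj, that is, away from the more likely category.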

Multivariate Gaussian Density: Case I (cont’d)

• Properties of decision boundary:
  – It passes through x0
  – It is orthogonal to the line linking the means.
  – What happens when P(ωi) = P(ωj)?
  – If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.
  – If σ is very small, the position of the boundary is insensitive to P(ωi) and P(ωj)

Multivariate Gaussian Density: Case I (cont’d)

[Figures: decision boundaries for unequal priors] If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.

Multivariate Gaussian Density: Case I (cont’d)

• Minimum distance classifier
  – When P(ωi) are equal, then:

$$g_i(\mathbf{x}) = -\|\mathbf{x} - \mu_i\|^2$$

(assign x to the class with the maximum gi(x))
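A minimal sketch of the minimum distance classifier: maximizing gi(x) is the same as picking the nearest class mean. The means and the test point are hypothetical.

```python
def squared_distance(x, mu):
    """Squared Euclidean distance ||x - mu||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def min_distance_classify(x, means):
    """Index of the class maximizing g_i(x) = -||x - mu_i||^2."""
    scores = [-squared_distance(x, mu) for mu in means]
    return max(range(len(scores)), key=scores.__getitem__)

means = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]    # hypothetical class means
print(min_distance_classify((3.0, 1.0), means))  # 1, the class with mean (4, 0)
```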

Multivariate Gaussian Density: Case II

• Σi = Σ
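With a shared covariance matrix the discriminant is again linear. The following is a sketch of the standard result, using the same assumed names (w_i, w_{i0}, w, x_0) as in the usual textbook treatment.

```latex
\[
\begin{aligned}
g_i(\mathbf{x}) &= -\tfrac{1}{2}(\mathbf{x}-\mu_i)^T \Sigma^{-1} (\mathbf{x}-\mu_i) + \ln P(\omega_i) \\
\text{dropping } -\tfrac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x}: \quad
g_i(\mathbf{x}) &= \mathbf{w}_i^T\mathbf{x} + w_{i0}, \qquad
\mathbf{w}_i = \Sigma^{-1}\mu_i, \qquad
w_{i0} = -\tfrac{1}{2}\mu_i^T\Sigma^{-1}\mu_i + \ln P(\omega_i) \\
\text{boundary: } \quad \mathbf{w}^T(\mathbf{x}-\mathbf{x}_0) &= 0, \qquad
\mathbf{w} = \Sigma^{-1}(\mu_i - \mu_j), \qquad
\mathbf{x}_0 = \tfrac{1}{2}(\mu_i+\mu_j)
  - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\mu_i-\mu_j)^T\Sigma^{-1}(\mu_i-\mu_j)}\,(\mu_i-\mu_j)
\end{aligned}
\]
```

Because w = Σ⁻¹(μi − μj) is in general not parallel to μi − μj, the hyperplane is generally not orthogonal to the line linking the means.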

Multivariate Gaussian Density: Case II (cont’d)

• Properties of hyperplane (decision boundary):
  – It passes through x0
  – It is generally not orthogonal to the line linking the means.
  – What happens when P(ωi) = P(ωj)?
  – If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.

Multivariate Gaussian Density: Case II (cont’d)

[Figures: decision boundaries for unequal priors] If P(ωi) ≠ P(ωj), then x0 shifts away from the most likely category.

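Multivariate Gaussian Density: Case II (cont’d)

• Mahalanobis distance classifier
  – When P(ωi) are equal, then:

$$g_i(\mathbf{x}) = -(\mathbf{x} - \mu_i)^T \Sigma^{-1} (\mathbf{x} - \mu_i)$$

(assign x to the class with the maximum gi(x))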

Multivariate Gaussian Density: Case III

• Σi = arbitrary
  – Decision boundaries are hyperquadrics; e.g., hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, etc.

Example - Case III

• P(ω1) = P(ω2)
• The decision boundary does not pass through the midpoint of μ1, μ2
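The midpoint effect can be reproduced with a quadratic discriminant. The sketch below uses illustrative 2-D parameters (assumed, not read from the slide) with equal priors and different covariances, and evaluates g(x) = g1(x) - g2(x) at the midpoint of the two means.

```python
import math

def quadratic_discriminant(x, mu, cov, prior):
    """g_i(x) = -0.5 (x-mu)^T cov^-1 (x-mu) - 0.5 ln|cov| + ln P for a 2x2 cov.

    The constant -(d/2) ln(2 pi) is dropped because it is the same for all classes.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    maha = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return -0.5 * maha - 0.5 * math.log(det) + math.log(prior)

# Illustrative parameters (assumed): equal priors, unequal covariances.
mu1, cov1 = (3.0, 6.0), ((0.5, 0.0), (0.0, 2.0))
mu2, cov2 = (3.0, -2.0), ((2.0, 0.0), (0.0, 2.0))

def g(x):
    """Dichotomizer g(x) = g_1(x) - g_2(x); positive means decide w_1."""
    return (quadratic_discriminant(x, mu1, cov1, 0.5)
            - quadratic_discriminant(x, mu2, cov2, 0.5))

midpoint = (3.0, 2.0)
print(g(midpoint))  # positive, so the midpoint is not on the boundary
```

For these parameters the boundary g(x) = 0 is a parabola in x1, and the midpoint of the means falls inside the region assigned to ω1.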

Multivariate Gaussian Density: Case III (cont’d)

[Figures: examples of non-linear decision boundaries]

Error Bounds

• Exact error calculations could be difficult – easier to estimate error bounds!

$$P(error) = \int P(error/x)\,p(x)\,dx, \qquad P(error/x) = \min[P(\omega_1/x), P(\omega_2/x)]$$

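Error Bounds (cont’d)

• If the class conditional distributions are Gaussian, then

$$P(error) \le P^{\beta}(\omega_1)\,P^{1-\beta}(\omega_2)\,e^{-k(\beta)}$$

where k(β) is a closed-form function of the class means and covariance matrices.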

Error Bounds (cont’d)

• The Chernoff bound corresponds to the β that minimizes e^(-k(β))
  – This is a 1-D optimization problem, regardless of the dimensionality of the class conditional densities.

[Figure: e^(-k(β)) as a function of β, annotated with two loose bounds and the tight bound]

Error Bounds (cont’d)

• Bhattacharyya bound
  – Approximate the error bound using β = 0.5
  – Easier to compute than Chernoff error but looser.
• The Chernoff and Bhattacharyya bounds will not be good bounds if the distributions are not Gaussian.
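A sketch of the Bhattacharyya bound for two univariate Gaussians, compared against the exact Bayes error in a case where that error is known in closed form. The parameters and the function name `bhattacharyya_bound` are illustrative assumptions; the formula is the 1-D form of k(1/2).

```python
import math

def bhattacharyya_bound(mu1, var1, mu2, var2, p1, p2):
    """Return (bound, k) with P(error) <= sqrt(P1 P2) exp(-k), k = k(1/2)."""
    avg_var = 0.5 * (var1 + var2)
    k = ((mu2 - mu1) ** 2) / (8.0 * avg_var) \
        + 0.5 * math.log(avg_var / math.sqrt(var1 * var2))
    return math.sqrt(p1 * p2) * math.exp(-k), k

# Hypothetical case: equal priors, unit variances, means 0 and 2.
bound, k_half = bhattacharyya_bound(0.0, 1.0, 2.0, 1.0, 0.5, 0.5)

# Here the Bayes boundary is x = 1, so the exact error is Phi(-1).
exact = 0.5 * math.erfc(1.0 / math.sqrt(2.0))
print(round(k_half, 3), round(bound, 4), round(exact, 4))
```

The bound is valid but noticeably looser than the exact error in this setup, which matches the slide's remark that it trades tightness for ease of computation.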

Example

Bhattacharyya error: k(0.5) = 4.06

$$P(error) \le 0.0087$$

Receiver Operating Characteristic (ROC) Curve

• Every classifier employs some kind of a threshold.

$$\theta_a = \frac{P(\omega_2)}{P(\omega_1)}, \qquad \theta_b = \frac{P(\omega_2)(\lambda_{12}-\lambda_{22})}{P(\omega_1)(\lambda_{21}-\lambda_{11})}$$

• Changing the threshold affects the performance of the system.
• ROC curves can help us evaluate system performance for different thresholds.

Example: Person Authentication

• Authenticate a person using biometrics (e.g., fingerprints).
• There are two possible distributions (i.e., classes):
  – Authentic (A) and Impostor (I)

[Figure: distributions of the classes I and A]

Example: Person Authentication (cont’d)

• Possible decisions:
  – (1) correct acceptance (true positive):
    • X belongs to A, and we decide A
  – (2) incorrect acceptance (false positive):
    • X belongs to I, and we decide A
  – (3) correct rejection (true negative):
    • X belongs to I, and we decide I
  – (4) incorrect rejection (false negative):
    • X belongs to A, and we decide I

[Figure: distributions I and A split by a threshold into correct rejection, false negative, false positive, and correct acceptance regions]
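The four outcomes translate into rates that move together as the threshold changes. The sketch below assumes Gaussian match-score distributions for impostors and authentic users (made-up parameters) and sweeps a threshold to trace ROC points.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a univariate normal."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# Assumed score model: impostor scores ~ N(0, 1), authentic scores ~ N(2, 1).
# A sample is accepted (decide A) when its score exceeds the threshold.
IMPOSTOR, AUTHENTIC = (0.0, 1.0), (2.0, 1.0)

def roc_point(threshold):
    """Return (false positive rate, true positive rate) at this threshold."""
    fpr = 1.0 - normal_cdf(threshold, *IMPOSTOR)    # impostor accepted
    tpr = 1.0 - normal_cdf(threshold, *AUTHENTIC)   # authentic accepted
    return fpr, tpr

roc = [roc_point(t / 2.0) for t in range(-6, 11)]   # thresholds -3.0 .. 5.0
for fpr, tpr in roc:
    print("FPR=%.3f  TPR=%.3f  FNR=%.3f" % (fpr, tpr, 1.0 - tpr))
```

Raising the threshold lowers the false positive rate but raises the false negative rate; the ROC curve is the set of (FPR, TPR) pairs traced by this sweep.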

Error vs Threshold

[Figure: error as a function of the threshold, and the corresponding ROC curve]

False Negatives vs Positives

Next Lecture

• Linear Classification Methods
  – Hastie et al., Chapter 4
• Paper list will be available by the weekend
  – Bidding to start on Monday

Bayes Decision Theory: Case of Discrete Features

• Replace $\int p(\mathbf{x}/\omega_j)\,d\mathbf{x}$ with $\sum_{\mathbf{x}} P(\mathbf{x}/\omega_j)$
• See section 2.9

Missing Features

• Consider a Bayes classifier using uncorrupted data.
• Suppose x = (x1, x2) is a test vector where x1 is missing and the value of x2 is x̂2 - how can we classify it?
  – If we set x1 equal to the average value, we will classify x as ω3
  – But p(x̂2/ω2) is larger; maybe we should classify x as ω2?

Missing Features (cont’d)

• Suppose x = [xg, xb] (xg: good features, xb: bad features)
• Derive the Bayes rule using the good features:
  – Marginalize posterior probability over bad features.
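The marginalization can be written out as below. This is a sketch of the standard derivation, assuming the joint density p(xg, xb) is available from the model trained on uncorrupted data.

```latex
\[
P(\omega_i/\mathbf{x}_g)
  = \frac{p(\omega_i, \mathbf{x}_g)}{p(\mathbf{x}_g)}
  = \frac{\int p(\omega_i, \mathbf{x}_g, \mathbf{x}_b)\,d\mathbf{x}_b}{p(\mathbf{x}_g)}
  = \frac{\int P(\omega_i/\mathbf{x}_g, \mathbf{x}_b)\,p(\mathbf{x}_g, \mathbf{x}_b)\,d\mathbf{x}_b}
         {\int p(\mathbf{x}_g, \mathbf{x}_b)\,d\mathbf{x}_b}
\]
```

The classifier then picks the ωi with the largest P(ωi/xg), which can differ from the class obtained by substituting an average value for the missing feature.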

Compound Bayesian Decision Theory

• Sequential decision
  (1) Decide as each fish emerges.
• Compound decision
  (1) Wait for n fish to emerge.
  (2) Make all n decisions jointly.
  – Could improve performance when consecutive states of nature are not statistically independent.

Compound Bayesian Decision Theory (cont’d)

• Suppose Ω = (ω(1), ω(2), …, ω(n)) denotes the n states of nature, where ω(i) can take one of c values ω1, ω2, …, ωc (i.e., c categories)
• Suppose P(Ω) is the prior probability of the n states of nature.
• Suppose X = (x1, x2, …, xn) are n observed vectors.

Compound Bayesian Decision Theory (cont’d)

i.e., consecutive states of nature may not be statistically independent!
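In the notation above, the compound posterior follows Bayes rule on the joint state Ω. The sketch below also shows a common simplifying assumption, taken here as an assumption rather than from the slide, that each xi depends only on its own state ω(i).

```latex
\[
P(\Omega/X) = \frac{p(X/\Omega)\,P(\Omega)}{p(X)},
\qquad
p(X/\Omega) = \prod_{i=1}^{n} p(\mathbf{x}_i/\omega(i))
\]
```

The prior P(Ω) is not factored into a product of the individual P(ω(i)), since that would discard exactly the dependence between consecutive states that the compound decision is meant to exploit.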