artificial intelligence - university of adelaide dsuter/harbin_course/perceptron.pdf · [many slides...

Artificial Intelligence

Perceptrons

Instructors: David Suter and Qince Li

Course Delivered @ Harbin Institute of Technology

[Many slides adapted from those created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. Some others from colleagues at Adelaide University.]

Error-Driven Classification

What to Do About Errors

§  Problem: there’s still spam in your inbox

§  Need more features – words aren’t enough!
    §  Have you emailed the sender before?
    §  Have 1M other people just gotten the same email?
    §  Is the sending information consistent?
    §  Is the email in ALL CAPS?
    §  Do inline URLs point where they say they point?
    §  Does the email address you by (your) name?

§  Naïve Bayes models can incorporate a variety of features, but tend to do best in homogeneous cases (e.g. all features are word occurrences)

Linear Classifiers

Feature Vectors

Input (email): Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just

Feature vector: # free : 2   YOUR_NAME : 0   MISSPELLED : 2   FROM_FRIEND : 0   ...

Label: SPAM or +

Feature vector: PIXEL-7,12 : 1   PIXEL-7,13 : 0   ...   NUM_LOOPS : 1   ...

Label: “2”

Some (Simplified) Biology

§  Very loose inspiration: human neurons

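Linear Classifiers

§  Inputs are feature values
§  Each feature has a weight
§  Sum is the activation

§  If the activation is:
    §  Positive, output +1
    §  Negative, output -1

[Figure: inputs f1, f2, f3 are multiplied by weights w1, w2, w3 and summed (Σ); the sum is tested against > 0.]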
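The bullets above can be written compactly. This is only a sketch in notation the extracted slides do not show: $f_i(x)$ for the i-th feature value of an input $x$ and $w_i$ for its weight are assumed symbols, and the slides do not say what happens when the activation is exactly zero, so that case is left open.

```latex
\[
\mathrm{activation}_w(x) \;=\; \sum_i w_i \, f_i(x) \;=\; w \cdot f(x),
\qquad
\mathrm{output} \;=\;
\begin{cases}
+1 & \text{if } w \cdot f(x) > 0,\\
-1 & \text{if } w \cdot f(x) < 0.
\end{cases}
\]
```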

Weights

§  Binary case: compare features to a weight vector
§  Learning: figure out the weight vector from examples

# free : 4   YOUR_NAME : -1   MISSPELLED : 1   FROM_FRIEND : -3   ...


Dot product positive means the positive class
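As a small worked example, the dot product can be computed for the email feature vector and the weight vector shown on these slides. This is a sketch that uses only the four listed entries (both vectors are elided with "..." in the source); the dictionary representation, the key name `free` for the "# free" count, and treating a zero activation as -1 are assumptions.

```python
# Feature values for the example email (only the entries listed on the slide).
features = {"free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
# Weight vector from the slide (again, only the listed entries).
weights = {"free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}


def activation(weights, features):
    """Dot product of a sparse weight vector and a sparse feature vector."""
    return sum(weights.get(name, 0) * value for name, value in features.items())


def classify(weights, features):
    """Return +1 for a positive activation, else -1 (zero -> -1 is an assumption)."""
    return 1 if activation(weights, features) > 0 else -1


score = activation(weights, features)  # 2*4 + 0*(-1) + 2*1 + 0*(-3) = 10
label = classify(weights, features)    # +1, the positive class
print(score, label)
```

The activation is positive, so this email falls in the positive class.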

Decision Rules

Binary Decision Rule

§  In the space of feature vectors
    §  Examples are points
    §  Any weight vector is a hyperplane
    §  One side corresponds to Y=+1
    §  Other corresponds to Y=-1

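BIAS : -3   free : 4   money : 2   ...

[Figure: feature space with axes "free" and "money"; the hyperplane given by the weight vector separates the +1 = SPAM side from the -1 = HAM side.]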
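To see which side of the hyperplane a point falls on, the weight vector above can be applied to a few points in the (free, money) plane. This is a minimal sketch: it assumes the BIAS feature is always 1 for every example (the usual convention, not stated in the extracted text), and `predict` is a hypothetical helper name.

```python
# Weight vector from the slide (listed entries only).
weights = {"BIAS": -3, "free": 4, "money": 2}


def predict(weights, counts):
    """Label a point by the sign of w . f, with an always-on BIAS feature (assumed)."""
    features = dict(counts, BIAS=1)
    score = sum(weights.get(name, 0) * value for name, value in features.items())
    return "SPAM" if score > 0 else "HAM"


print(predict(weights, {"free": 1, "money": 0}))  # 4 - 3 = 1  -> SPAM
print(predict(weights, {"free": 0, "money": 1}))  # 2 - 3 = -1 -> HAM
print(predict(weights, {"free": 0, "money": 0}))  # -3         -> HAM
```

With these weights the decision boundary is the line 4·free + 2·money = 3.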

Weight Updates

Learning: Binary Perceptron

§  Start with weights = 0
§  For each training instance:
    §  Classify with current weights
    §  If correct (i.e., y=y*), no change!
    §  If wrong: adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is -1.
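The update rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the course's reference code: the toy data, the `max_passes` cap, the always-on BIAS feature, and treating a zero activation as -1 are all assumptions.

```python
def dot(w, f):
    """Dot product of sparse weight and feature dictionaries."""
    return sum(w.get(name, 0.0) * value for name, value in f.items())


def predict(w, f):
    """+1 if the activation is positive, else -1 (zero -> -1 is an assumption)."""
    return 1 if dot(w, f) > 0 else -1


def train_binary_perceptron(data, max_passes=100):
    """Mistake-driven training: on an error, add y* times the feature vector."""
    w = {}  # start with all weights = 0
    for _ in range(max_passes):
        mistakes = 0
        for f, y_star in data:
            if predict(w, f) != y_star:
                # Adds f when y* = +1, subtracts f when y* = -1.
                for name, value in f.items():
                    w[name] = w.get(name, 0.0) + y_star * value
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: stop
            break
    return w


# Hypothetical, linearly separable toy data: (feature vector, true label y*).
data = [
    ({"BIAS": 1, "free": 2, "money": 1}, +1),
    ({"BIAS": 1}, -1),
    ({"BIAS": 1, "free": 1, "money": 2}, +1),
    ({"BIAS": 1, "money": 1}, -1),
]
w = train_binary_perceptron(data)
print(w, [predict(w, f) for f, _ in data])
```

Because the toy data is separable, training stops once a full pass makes no mistakes.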

Examples: Perceptron

§  Separable Case

Real Data

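Multiclass Decision Rule

§  If we have multiple classes:
    §  A weight vector for each class: w_y
    §  Score (activation) of a class y: the dot product w_y · f of its weight vector with the feature vector f
    §  Prediction: highest score wins

Binary = multiclass where the negative class has weight zero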
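A short sketch of the multiclass rule, which also illustrates the remark that the binary case is multiclass with an all-zero weight vector for the negative class. The class names, the reuse of the BIAS/free/money weights from the earlier slide, and the always-on BIAS feature are assumptions; ties go to whichever class Python's `max` encounters first.

```python
def score(w_y, f):
    """Activation of one class: dot product of its weight vector with the features."""
    return sum(w_y.get(name, 0.0) * value for name, value in f.items())


def predict(weights, f):
    """Return the class whose weight vector gives the highest score."""
    return max(weights, key=lambda y: score(weights[y], f))


# One weight vector per class; "ham" has all-zero weights, so the
# comparison reduces to the sign of the "spam" activation.
weights = {
    "spam": {"BIAS": -3, "free": 4, "money": 2},
    "ham": {},
}

print(predict(weights, {"BIAS": 1, "free": 1}))   # spam score 1 vs ham score 0
print(predict(weights, {"BIAS": 1, "money": 1}))  # spam score -1 vs ham score 0
```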

Learning: Multiclass Perceptron

§  Start with all weights = 0
§  Pick up training examples one by one
§  Predict with current weights
§  If correct, no change!
§  If wrong: lower score of wrong answer, raise score of right answer

§  The concept of having a separate set of weights (one for each class) can be thought of as having separate “neurons” – a layer of neurons – one for each class. A single layer network. Rather than taking class “max” over the weights – one can train to learn a coding vector…

E.G. Learning Digits 0,1….9
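The multiclass update described above (lower the score of the wrong answer, raise the score of the right one) can be sketched as follows. Subtracting the feature vector from the predicted class and adding it to the true class is the standard way to do this and is assumed here, since the extracted slides give the rule only in words; the toy data, class names, and `max_passes` cap are hypothetical.

```python
def score(w_y, f):
    """Activation of one class for feature vector f."""
    return sum(w_y.get(name, 0.0) * value for name, value in f.items())


def predict(weights, f):
    """Highest-scoring class wins (ties go to the first class in dict order)."""
    return max(weights, key=lambda y: score(weights[y], f))


def train_multiclass_perceptron(data, classes, max_passes=100):
    weights = {y: {} for y in classes}  # all weights start at 0
    for _ in range(max_passes):
        mistakes = 0
        for f, y_star in data:
            y = predict(weights, f)
            if y != y_star:
                for name, value in f.items():
                    # Lower the score of the wrong answer ...
                    weights[y][name] = weights[y].get(name, 0.0) - value
                    # ... and raise the score of the right answer.
                    weights[y_star][name] = weights[y_star].get(name, 0.0) + value
                mistakes += 1
        if mistakes == 0:
            break
    return weights


# Hypothetical separable toy data: (feature vector, true class).
data = [
    ({"BIAS": 1, "win": 1, "game": 1}, "sports"),
    ({"BIAS": 1, "vote": 1, "win": 1}, "politics"),
    ({"BIAS": 1, "chip": 1, "code": 1}, "tech"),
    ({"BIAS": 1, "game": 1, "code": 1}, "tech"),
]
weights = train_multiclass_perceptron(data, ["sports", "politics", "tech"])
print([predict(weights, f) for f, _ in data])
```

Each entry of `weights` plays the role of one "neuron" in the single-layer picture: one weight vector per class.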

Properties of Perceptrons

§  Separability: true if some parameters get the training set perfectly correct

§  Convergence: if the training set is separable, perceptron will eventually converge (binary case)

Separable

Non-Separable

Improving the Perceptron

Problems with the Perceptron

§  Noise: if the data isn’t separable, weights might thrash
    §  Averaging weight vectors over time can help (averaged perceptron)

§  Mediocre generalization: finds a “barely” separating solution

§  Overtraining: test/held-out accuracy usually rises, then falls
    §  Overtraining is a kind of overfitting
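One way to read "averaging weight vectors over time" is to keep a running sum of the weight vector after every training example and return its mean. The sketch below does exactly that for the binary case; the fixed number of passes, the toy data, and the zero-activation tie-break are assumptions, and published averaged-perceptron variants differ in details.

```python
def dot(w, f):
    return sum(w.get(name, 0.0) * value for name, value in f.items())


def predict(w, f):
    return 1 if dot(w, f) > 0 else -1


def train_averaged_perceptron(data, passes=10):
    """Binary perceptron that returns the average of all intermediate weight vectors."""
    w, total, steps = {}, {}, 0
    for _ in range(passes):
        for f, y_star in data:
            if predict(w, f) != y_star:
                for name, value in f.items():
                    w[name] = w.get(name, 0.0) + y_star * value
            # Accumulate the current weights after every example, mistake or not.
            for name, value in w.items():
                total[name] = total.get(name, 0.0) + value
            steps += 1
    return {name: value / steps for name, value in total.items()} if steps else {}


# Hypothetical toy data: (feature vector, true label y*).
data = [
    ({"BIAS": 1, "free": 2, "money": 1}, +1),
    ({"BIAS": 1}, -1),
    ({"BIAS": 1, "free": 1, "money": 2}, +1),
    ({"BIAS": 1, "money": 1}, -1),
]
avg_w = train_averaged_perceptron(data)
print(avg_w, [predict(avg_w, f) for f, _ in data])
```

Averaging damps the effect of late, noisy updates, because a weight vector that survived many examples unchanged contributes more to the result.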

Fixing the Perceptron

§  Lots of literature on changing the step size, averaging weight updates, etc.

Linear Separators

§  Which of these linear separators is optimal?

Support Vector Machines

§  Maximizing the margin: good according to intuition, theory, practice
§  Only support vectors matter; other training examples are ignorable
§  Support vector machines (SVMs) find the separator with max margin
§  Basically, SVMs are MIRA where you optimize over all examples at once

SVM
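For reference, "the separator with max margin" is usually formalized as the hard-margin SVM problem below. This is the standard textbook formulation for separable binary data, not a formula recovered from the slides; $f(x_i)$, $y_i^*$ and folding the bias into the feature vector are assumed notation.

```latex
\[
\min_{w} \; \tfrac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i^{*} \,\bigl( w \cdot f(x_i) \bigr) \;\ge\; 1 \quad \text{for all training examples } i .
\]
```

Under these constraints the closest training points lie at distance $1/\lVert w \rVert$ from the separator, so minimizing $\lVert w \rVert$ maximizes the margin; the examples whose constraints hold with equality are the support vectors.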

Classification: Comparison

§  Naïve Bayes
    §  Builds a model of the training data
    §  Gives prediction probabilities
    §  Strong assumptions about feature independence
    §  One pass through data (counting)

§  Perceptrons / SVM:
    §  Makes fewer assumptions about data (? – linear separability is a big assumption! But kernel SVMs etc. weaken that assumption)
    §  Mistake-driven learning
    §  Multiple passes through data (prediction)
    §  Often more accurate
