Machine Learning for Intelligent Systems
Instructors: Nika Haghtalab (this time) and Thorsten Joachims
Lecture 3: Supervised Learning and Decision Trees
Reading: UML 18-18.2
Placement Exam
[Figure: placement-exam scores (roughly 82 to 94) across Calculus, Optimization, Probability, and Lin. Algebra.]
Example: Apple Harvest Festival
Learn to classify tasty apples based on experience.
→ Training set: apples you have tasted in the past, their features, and whether they were tasty.
→ You see an apple: would you buy it?

Apple  Farm  Color  Size    Firmness  Tasty?
#1     A     Red    Medium  Soft      No
#2     A     Green  Small   Crunchy   No
#3     A     Red    Medium  Soft      No
#4     B     Red    Large   Crunchy   Yes
#5     B     Green  Large   Crunchy   Yes
#6     B     Green  Small   Crunchy   No
#7     A     Red    Small   Soft      No
#8     B     Red    Small   Crunchy   Yes
#9     B     Green  Medium  Soft      No
#10    A     Red    Small   Crunchy   Yes
#11    A     Red    Medium  Crunchy   ?

(Farm through Firmness are the features; Tasty? is the label.)
Supervised Learning

Instance space: An instance space X including the feature representation. E.g., X = {A, B} × {red, green} × {large, medium, small} × {crunchy, soft}.

Target attributes (labels): A set Y of labels. E.g., Y = {Tasty, Not Tasty} or Y = {Yes, No}.

Hidden target function: An unknown function f: X → Y, e.g., how an apple's features correlate with its tastiness in real life.

Training data: A set of labeled pairs (x, f(x)) ∈ X × Y that we have seen before, e.g., we have tasted an (A, red, large, crunchy) apple before that was Tasty.

Our main goal (informal): Given a large enough number of training examples, learn a hypothesis h: X → Y that approximates f(·).
Hypothesis Space

Hypothesis space: The set H of functions we consider when looking for h: X → Y.
Example: all hypotheses that are ANDs of feature-value settings, e.g., (farm = whatever) ∧ (color = red) ∧ (size = whatever) ∧ (firmness = crunchy).
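Such a conjunction hypothesis can be sketched in Python. This is an illustrative encoding, not code from the lecture: the helper name `make_conjunction` and the dict representation are my own, with `None` playing the role of "whatever".

```python
# Hypothetical sketch: an AND-of-feature-values hypothesis as a mapping from
# feature names to required values; None means "whatever" (feature is ignored).

def make_conjunction(**constraints):
    """Return h(x): True iff instance x matches every non-None constraint."""
    def h(x):
        return all(v is None or x[k] == v for k, v in constraints.items())
    return h

# (farm = whatever) AND (color = red) AND (size = whatever) AND (firmness = crunchy)
h = make_conjunction(farm=None, color="red", size=None, firmness="crunchy")

apple11 = {"farm": "A", "color": "red", "size": "medium", "firmness": "crunchy"}
print(h(apple11))  # True: apple #11 satisfies the conjunction
```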
Consistency

A function h: X → Y is consistent with a set of labeled training examples S if and only if h(x) = y for all (x, y) ∈ S.

Example: Is (farm = whatever) ∧ (color = red) ∧ (size = whatever) ∧ (firmness = crunchy) consistent with the table of apples? Is it consistent with just the apples that came from farm A?
Apple  Farm  Color  Size    Firmness  Tasty?
#1     A     Red    Medium  Soft      No
#2     A     Green  Small   Crunchy   No
#3     A     Red    Medium  Soft      No
#7     A     Red    Small   Soft      No
#10    A     Red    Small   Crunchy   Yes
Inductive Learning

Our main goal (more formal): Given a large enough number of training examples and a hypothesis space H, learn a hypothesis h ∈ H that approximates f(·).

Our hope: There is an h ∈ H that approximates f.

Our strategy: Our training examples are representative of reality:
→ We have seen many examples.
→ They were selected without any bias.
→ So any hypothesis that is consistent with the training data should be accurate on unobserved instances.

Find h ∈ H that is consistent with S; then h(x) = f(x) for almost all x ∈ X.
Using Consistency in Learning

Idea: On a training set S, return any h ∈ H that is consistent with it.

Version space: The version space VS(H, S) is the subset of functions from H that are consistent with S.

Example: For the following set of training examples S, with H being all hypotheses that are ANDs of feature-value settings, what is VS(H, S)?

Apple  Farm  Color  Size    Firmness  Tasty?
#1     A     Red    Medium  Soft      No
#2     A     Green  Small   Crunchy   No
#3     A     Red    Medium  Soft      No
#7     A     Red    Small   Soft      No
#10    A     Red    Small   Crunchy   Yes
Computing the Version Space

List-then-Eliminate algorithm:
• Initialize VS = H.
• For each training example (x, y) ∈ S, remove any h ∈ VS such that h(x) ≠ y.
• Output VS.

At a high level: list all hypotheses in H, and remove each one that is inconsistent with some example in S.

Takeaway: The algorithm works, but
→ keeping track of the version space is time- and space-consuming;
→ we only need one consistent hypothesis;
→ so build a consistent hypothesis directly.
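The List-then-Eliminate steps above can be sketched for the conjunction hypothesis space of the apple example. Everything here is an illustrative assumption: the `FEATURES` table, the names `predict` and `list_then_eliminate`, and enumerating H with `itertools.product` are my own choices, and labels are encoded as booleans (True = Tasty).

```python
from itertools import product

# The conjunction hypothesis space over the apple features; None = "whatever".
FEATURES = {
    "farm": ["A", "B", None],
    "color": ["red", "green", None],
    "size": ["large", "medium", "small", None],
    "firmness": ["crunchy", "soft", None],
}

def predict(hyp, x):
    """A conjunction hyp labels x True (Tasty) iff every non-None constraint matches."""
    return all(v is None or x[k] == v for k, v in hyp.items())

def list_then_eliminate(train):
    """List every hypothesis in H, then eliminate those inconsistent with S."""
    names = list(FEATURES)
    version_space = [dict(zip(names, vals))
                     for vals in product(*FEATURES.values())]
    for x, y in train:
        # Keep only hypotheses that agree with this labeled example.
        version_space = [h for h in version_space if predict(h, x) == y]
    return version_space
```

Run on the five farm-A apples and the surviving set should contain, for example, (color = red) ∧ (firmness = crunchy); this illustrates why keeping the whole version space is space-consuming even for tiny spaces like this 108-hypothesis H.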
Decision Trees

Firmness?
→ soft: No
→ crunchy: Color?
   → red: Yes
   → green: Size?
      → large: Yes
      → medium: No
      → small: No

Internal nodes: test one feature, e.g., Color.
Branches from an internal node: the possible values of the feature being tested, e.g., {red, green}.
Leaf nodes: the label of instances whose features correspond to the root-to-leaf path, e.g., Yes or No.
Using a Decision Tree

Let the function f be the decision tree shown above.
1. What is f((B, Green, Large, Crunchy))?
2. What is f((B, Red, Large, Soft))?
3. What is f((A, Green, Small, Crunchy))?
[Figure: the same decision tree, with a checkmark next to each of the ten training examples it classifies correctly.]
Is f consistent with the training set in the table?
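One reading of the tree figure, encoded as nested conditionals so the consistency check can be run directly. This is my own reconstruction (values lowercased, and the red/green branch assignment inferred from the training table), not code from the lecture.

```python
# The lecture's decision tree, reconstructed as nested conditionals.
def tree(x):
    farm, color, size, firmness = x
    if firmness == "soft":
        return "No"
    # firmness == "crunchy": split on color
    if color == "red":
        return "Yes"
    # color == "green": split on size
    return "Yes" if size == "large" else "No"

train = [  # the ten labeled apples from the table: (features, Tasty?)
    (("A", "red", "medium", "soft"), "No"),
    (("A", "green", "small", "crunchy"), "No"),
    (("A", "red", "medium", "soft"), "No"),
    (("B", "red", "large", "crunchy"), "Yes"),
    (("B", "green", "large", "crunchy"), "Yes"),
    (("B", "green", "small", "crunchy"), "No"),
    (("A", "red", "small", "soft"), "No"),
    (("B", "red", "small", "crunchy"), "Yes"),
    (("B", "green", "medium", "soft"), "No"),
    (("A", "red", "small", "crunchy"), "Yes"),
]
print(all(tree(x) == y for x, y in train))  # True: consistent with all ten
```

Evaluating the three queries from the slide with this function also answers them: Yes, No, and No respectively.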
Making a Tree

Example: What is the decision tree corresponding to each of these functions?
1. (firmness = Crunchy)
2. (firmness = Crunchy) ∧ (color = Red)
3. (color = Red) ∨ (size = Large)
4. (firmness = Crunchy) ∧ ((color = Red) ∨ (size = Large))
Top-Down Induction of Decision Trees

Idea:
• Don't use List-then-Eliminate: it is too inefficient.
• Instead, directly grow a good decision tree:
→ We grow from the root to the leaves.
→ Repeatedly take an existing leaf that does not yet have a definitive label and replace it with an internal node.

TD-IDT(S, y):
• If all examples in S have the same label y → make a leaf with label y.
• Else:
→ Pick a feature A.
→ For each value a_v of A, make a child node with S_v = {(x, y) ∈ S : feature A of x has value a_v}.
→ Make a tree with A as root and TD-IDT(S_v, y) as subtrees.
Grow a consistent DT: run TD-IDT on the ten-apple table above.
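The TD-IDT pseudocode can be sketched as a short recursive function. This is a hedged sketch, not the lecture's code: the names `td_idt` and `classify` are my own, instances are assumed to be dicts, and the "first unused feature" rule for picking A is a placeholder for the split criteria discussed next.

```python
from collections import Counter

def td_idt(S, default, features):
    """S: list of (x, y) pairs, x a dict of feature values.
    Returns a tree: either a label (leaf) or (feature, {value: subtree})."""
    if not S:
        return default                      # empty subset: fall back to default
    labels = [y for _, y in S]
    if len(set(labels)) == 1:
        return labels[0]                    # all examples share one label: leaf
    majority = Counter(labels).most_common(1)[0][0]
    if not features:
        return majority                     # out of features: majority-label leaf
    A, rest = features[0], features[1:]     # pick a feature A (naive choice here)
    children = {}
    for a_v in {x[A] for x, _ in S}:        # one child per observed value of A
        S_v = [(x, y) for x, y in S if x[A] == a_v]
        children[a_v] = td_idt(S_v, majority, rest)
    return (A, children)

def classify(tree, x):
    """Follow the root-to-leaf path for instance x."""
    while isinstance(tree, tuple):
        A, children = tree
        tree = children[x[A]]               # assumes x[A] was seen in training
    return tree
```

Run on the apple table with the feature order (firmness, color, size, farm), this reproduces a tree consistent with all ten examples.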
Which Split?

How should we pick the feature A for splitting?
→ How much does the prediction improve after the split?
→ At each node, count the positive and negative instances, e.g., (4 Yes, 2 No).
→ If we were to stop there, it is best to predict the majority label, e.g., Yes.
→ How do we achieve a more definite prediction?
[Figure: three candidate splits of the full training set (4Y, 6N).]
• Firmness: crunchy → (4Y, 2N), leaf Yes; soft → (0Y, 4N), leaf No.
• Size: large → (2Y, 0N), leaf Yes; medium → (0Y, 3N), leaf No; small → (2Y, 3N), leaf No.
• Farm: A → (1Y, 4N), leaf No; B → (3Y, 2N), leaf Yes.
Different ways to measure this:
• Attribute that minimizes the number of errors:
  - Before split: Err(S) = size of the minority label in S.
  - After split: Err(S|A) = Σ_v Err(S_v).
• Attribute that minimizes entropy (maximizes information gain):
  - Before split: H(S) = −Σ_x p(x) log p(x).
  - After split: H(S|A) = Σ_v p(S_v) H(S_v).
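Both measures can be computed directly from the counts above. The sketch below uses my own function names (`err`, `entropy`, `split_scores`) and assumes instances are dicts, matching the formulas as written.

```python
from collections import Counter
from math import log2

def err(labels):
    """Err(S): errors made by predicting the majority, i.e. the minority size."""
    counts = Counter(labels)
    return len(labels) - counts.most_common(1)[0][1]

def entropy(labels):
    """H(S) = -sum_x p(x) log2 p(x)."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def split_scores(S, A):
    """Return (Err(S|A), H(S|A)), summed over the subsets S_v induced by A.
    Err(S|A) adds raw error counts; H(S|A) weights each H(S_v) by p(S_v)."""
    groups = {}
    for x, y in S:                # partition the labels by the value of feature A
        groups.setdefault(x[A], []).append(y)
    n = len(S)
    e = sum(err(ys) for ys in groups.values())
    h = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return e, h
```

On the ten-apple table this reproduces the counts in the figure: splitting on Firmness or on Size both leave 2 errors (Farm leaves 3), while Size alone achieves the lowest conditional entropy; ties like this are one motivation for the entropy criterion.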
Example: Text Classification
• Task: learn a rule that classifies Reuters business news.
• Class +: "Corporate Acquisitions".
• Class −: other articles.
• 2000 training instances.
• Representation:
  - Boolean attributes indicating the presence of a keyword in the article.
  - 9947 such keywords (more accurately, word "stems").
Example article, class + ("Corporate Acquisitions"):

LAROCHE STARTS BID FOR NECO SHARES
Investor David F. La Roche of North Kingstown, R.I., said he is offering to purchase 170,000 common shares of NECO Enterprises Inc at 26 dlrs each. He said the successful completion of the offer, plus shares he already owns, would give him 50.5 pct of NECO's 962,016 common shares. La Roche said he may buy more, and possible all NECO shares. He said the offer and withdrawal rights will expire at 1630 EST/2130 gmt, March 30, 1987.

Example article, class − (other):

SALANT CORP 1ST QTR FEB 28 NET
Oper shr profit seven cts vs loss 12 cts. Oper net profit 216,000 vs loss 401,000. Sales 21.4 mln vs 24.9 mln. NOTE: Current year net excludes 142,000 dlr tax credit. Company operating in Chapter 11 bankruptcy.
Decision tree for "Corporate Acq.":
vs = 1: −
vs = 0:
|  export = 1: …
|  export = 0:
|  |  rate = 1:
|  |  |  stake = 1: +
|  |  |  stake = 0:
|  |  |  |  debenture = 1: +
|  |  |  |  debenture = 0:
|  |  |  |  |  takeover = 1: +
|  |  |  |  |  takeover = 0:
|  |  |  |  |  |  file = 0: −
|  |  |  |  |  |  file = 1:
|  |  |  |  |  |  |  share = 1: +
|  |  |  |  |  |  |  share = 0: −
… and many more
Learned tree:
• has 437 nodes
• is consistent with the training data

Accuracy of learned tree:
• 11% error rate

Note: word stems expanded for improved readability.