![Page 1: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/1.jpg)
CS639: Data Management for Data Science
Lecture 16: Intro to ML and Decision Trees
Theodoros Rekatsinas (lecture by Ankur Goswami; many slides from David Sontag)
![Page 2: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/2.jpg)
Today’s Lecture
1. Intro to Machine Learning
2. Types of Machine Learning
3. Decision Trees
![Page 3: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/3.jpg)
1. Intro to Machine Learning
![Page 4: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/4.jpg)
What is Machine Learning?
• “Learning is any process by which a system improves performance from experience” (Herbert Simon)
• Definition by Tom Mitchell (1998): Machine Learning is the study of algorithms that
  • improve their performance P
  • at some task T
  • with experience E
A well-defined learning task is given by <P, T, E>.
![Page 5: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/5.jpg)
What is Machine Learning?
Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E
A well-defined learning task is given by <P, T, E>.
Experience: data-driven task, thus statistics, probability
Example: use height and weight to predict gender
![Page 6: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/6.jpg)
When do we use machine learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
![Page 7: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/7.jpg)
A task that requires machine learning
What makes a hand drawing a 2?
![Page 8: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/8.jpg)
Modern machine learning: Autonomous cars
![Page 9: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/9.jpg)
Modern machine learning: Scene Labeling
![Page 10: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/10.jpg)
Modern machine learning: Speech Recognition
![Page 11: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/11.jpg)
2. Types of Machine Learning
![Page 12: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/12.jpg)
Types of Learning
• Supervised (inductive) learning
  • Given: training data + desired outputs (labels)
• Unsupervised learning
  • Given: training data (without desired outputs)
• Semi-supervised learning
  • Given: training data + a few desired outputs
• Reinforcement learning
  • Rewards from sequence of actions
![Page 13: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/13.jpg)
Supervised Learning: Regression
• Given training examples (x_1, y_1), ..., (x_n, y_n)
• Learn a function f(x) to predict y given x
  • y is real-valued == regression
![Page 14: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/14.jpg)
Supervised Learning: Classification
• Given training examples (x_1, y_1), ..., (x_n, y_n)
• Learn a function f(x) to predict y given x
  • y is categorical == classification
![Page 15: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/15.jpg)
Supervised Learning: Classification
• Given training examples (x_1, y_1), ..., (x_n, y_n)
• Learn a function f(x) to predict y given x
  • y is categorical == classification
![Page 16: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/16.jpg)
Supervised Learning
• The value x can be multi-dimensional.
  • Each dimension corresponds to an attribute
![Page 17: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/17.jpg)
Types of Learning
• Supervised (inductive) learning
  • Given: training data + desired outputs (labels)
• Unsupervised learning
  • Given: training data (without desired outputs)
• Semi-supervised learning
  • Given: training data + a few desired outputs
• Reinforcement learning
  • Rewards from sequence of actions
We will cover later in the class
![Page 18: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/18.jpg)
3. Decision Trees
![Page 19: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/19.jpg)
A learning problem: predict fuel efficiency
![Page 20: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/20.jpg)
Hypotheses: decision trees f: X → Y
Informal: A hypothesis is a certain function that we believe (or hope) is similar to the true function, the target function that we want to model.
![Page 21: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/21.jpg)
What functions can Decision Trees represent?
![Page 22: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/22.jpg)
Space of possible decision trees
• How will we choose the best one?
• Let’s first look at how to split nodes, then consider how to find the best tree
![Page 23: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/23.jpg)
What is the simplest tree?
• Always predict mpg = bad
  • We just take the majority class
• Is this a good tree?
  • We need to evaluate its performance
• Performance: We are correct on 22 examples and incorrect on 18 examples
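The majority-class baseline can be checked with a few lines of Python. This is a minimal sketch: the label list is synthetic, built only to match the 22/18 counts on the slide, and the name "good" for the minority class is an assumption.

```python
from collections import Counter

# Synthetic labels matching the slide's counts: 22 "bad" and 18 of the other
# class (called "good" here as an assumption).
labels = ["bad"] * 22 + ["good"] * 18

def majority_class(ys):
    """Return the most frequent label in ys."""
    return Counter(ys).most_common(1)[0][0]

prediction = majority_class(labels)                   # the single-leaf "tree"
correct = sum(1 for y in labels if y == prediction)   # examples it gets right
accuracy = correct / len(labels)
print(prediction, correct, len(labels) - correct, accuracy)
```

Under these counts the single-leaf tree is right on 22 of 40 examples (55%), which is the baseline any deeper tree has to beat.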
![Page 24: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/24.jpg)
A decision stump
![Page 25: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/25.jpg)
Recursive step
![Page 26: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/26.jpg)
Recursive step
![Page 27: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/27.jpg)
Second level of tree
![Page 28: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/28.jpg)
![Page 29: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/29.jpg)
Are all decision trees equal?
• Many trees can represent the same concept
• But not all trees will have the same size!
  • e.g., φ = (A ∧ B) ∨ (¬A ∧ C), i.e. (A and B) or (not A and C)
• Which tree do we prefer?
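To make the size difference concrete, the sketch below writes φ as two hand-built trees (nested if statements) and checks on all eight inputs that both compute the same function. The two function names and the choice of B as the alternative root are illustrative assumptions, not taken from the slides.

```python
from itertools import product

def phi(a, b, c):
    """The target concept: (A and B) or (not A and C)."""
    return (a and b) or ((not a) and c)

def tree_split_on_a_first(a, b, c):
    """Small tree (3 internal nodes): test A at the root, then B or C."""
    if a:
        return b   # A is true: the answer depends only on B
    return c       # A is false: the answer depends only on C

def tree_split_on_b_first(a, b, c):
    """Larger tree (5 internal nodes): test B first, then A in both branches."""
    if b:
        if a:
            return True
        return c
    else:
        if a:
            return False
        return c

assignments = list(product([False, True], repeat=3))
same = all(
    phi(a, b, c) == tree_split_on_a_first(a, b, c) == tree_split_on_b_first(a, b, c)
    for a, b, c in assignments
)
print(same)
```

Both trees represent the same concept, but rooting the tree at A needs three tests in total while rooting it at B needs five, which is one reason to prefer the smaller tree.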
![Page 30: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/30.jpg)
Learning decision trees is hard
• Learning the simplest (smallest) decision tree is an NP-complete problem [Hyafil & Rivest ’76]
• Resort to a greedy heuristic:
  • Start from empty decision tree
  • Split on next best attribute (feature)
  • Recurse
![Page 31: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/31.jpg)
Splitting: choosing a good attribute
![Page 32: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/32.jpg)
Measuring uncertainty
• Good split if we are more certain about classification after split
  • Deterministic good (all true or all false)
  • Uniform distribution bad
  • What about distributions in between?
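One way to put a number on "how certain" a class distribution is uses Shannon entropy, measured in bits. The sketch below is illustrative only; the three example distributions are made up to mirror the deterministic, uniform, and in-between cases.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution; terms with p = 0 contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

deterministic = entropy([1.0, 0.0])   # all true (or all false): no uncertainty
uniform = entropy([0.5, 0.5])         # 50/50: maximal uncertainty for two classes
in_between = entropy([0.9, 0.1])      # skewed: strictly between the two extremes
print(deterministic, uniform, in_between)
```

The deterministic case scores 0 bits, the uniform two-class case scores 1 bit, and skewed distributions fall in between, so lower entropy after a split means a better split.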
![Page 33: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/33.jpg)
Entropy
![Page 34: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/34.jpg)
High, Low Entropy
![Page 35: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/35.jpg)
Entropy Example
![Page 36: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/36.jpg)
Conditional Entropy
![Page 37: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/37.jpg)
Information gain
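For reference, the standard textbook definitions of entropy, conditional entropy, and information gain are written out below. The notation is an assumption and may differ from the slide images: Y is the class label, X is a candidate attribute, and y_i, x_j range over their values.

```latex
H(Y) = -\sum_{i} P(Y = y_i)\,\log_2 P(Y = y_i)

H(Y \mid X) = \sum_{j} P(X = x_j)\, H(Y \mid X = x_j)

IG(X) = H(Y) - H(Y \mid X)
```

Information gain is the drop in uncertainty about Y once X is known, so the greedy learner splits on the attribute with the largest IG(X).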
![Page 38: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/38.jpg)
Learning decision trees
![Page 39: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/39.jpg)
![Page 40: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/40.jpg)
A decision stump
![Page 41: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/41.jpg)
![Page 42: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/42.jpg)
![Page 43: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/43.jpg)
![Page 44: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/44.jpg)
Base Cases: An Idea
• Base Case One: If all records in current data subset have the same output then do not recurse
• Base Case Two: If all records have exactly the same set of input attributes then do not recurse
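The two base cases above slot into a greedy, information-gain-driven recursion roughly as follows. This is a minimal sketch rather than the lecture's implementation: the function names, the dict-per-record representation, and the four-row dataset are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of the empirical label distribution."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """H(Y) minus the weighted entropy of Y within each value of attr."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def build_tree(rows, labels, attrs):
    """Return a label (leaf) or a pair (attribute, {value: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    # Base Case One: all records have the same output.
    if len(set(labels)) == 1:
        return majority
    # Base Case Two: all records agree on every input attribute.
    candidates = [a for a in attrs if len({row[a] for row in rows}) > 1]
    if not candidates:
        return majority
    # Greedy step: split on the attribute with the highest information gain.
    best = max(candidates, key=lambda a: information_gain(rows, labels, a))
    children = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], attrs)
    return (best, children)

def predict(tree, row):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree

# Hypothetical toy data: cylinders determines the label, maker does not.
rows = [{"cylinders": 4, "maker": "asia"}, {"cylinders": 4, "maker": "america"},
        {"cylinders": 8, "maker": "america"}, {"cylinders": 8, "maker": "asia"}]
labels = ["good", "good", "bad", "bad"]
tree = build_tree(rows, labels, ["cylinders", "maker"])
print(tree)
```

Restricting candidates to attributes that still take more than one value guarantees every split shrinks the subset, so the recursion terminates. Note that `predict` raises a `KeyError` for an attribute value never seen during training; a fuller version would fall back to the majority label.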
![Page 45: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/45.jpg)
The problem with Base Case 3
![Page 46: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/46.jpg)
If we omit Base Case 3
![Page 47: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/47.jpg)
Summary: Building Decision Trees
![Page 48: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/48.jpg)
From categorical to real-valued attributes
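A common way to handle a real-valued attribute, assumed here rather than taken from the slide, is a binary threshold split: test whether the value is below some threshold t, and pick t by information gain. The sketch below tries the midpoints between consecutive distinct values; the horsepower numbers and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of the empirical label distribution (0 for an empty list)."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the best split of the form value < threshold."""
    n = len(labels)
    base = entropy(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v < t]
        right = [y for v, y in zip(values, labels) if v >= t]
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical horsepower values and mpg labels.
horsepower = [70, 85, 90, 150, 165, 200]
mpg = ["good", "good", "good", "bad", "bad", "bad"]
threshold, gain = best_threshold(horsepower, mpg)
print(threshold, gain)
```

Only midpoints between observed values need to be tried, because any threshold between the same two neighbours produces the same partition of the data.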
![Page 49: Lecture 16 ML DT - GitHub Pages · Lecture 16: Intro to ML and Decision Trees Theodoros Rekatsinas (lecture by AnkurGoswamimany slides from David Sontag) 1. Today’s Lecture 1. Intro](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec67c17ae6d260984337f4b/html5/thumbnails/49.jpg)
What you need to know about decision trees
• Decision trees are one of the most popular ML tools
  • Easy to understand, implement, and use
  • Computationally cheap (to solve heuristically)
• Information gain to select attributes
• Presented for classification but can be used for regression and density estimation too
• Decision trees will overfit!!!
  • We will see the definition of overfitting and related concepts later in class.