CSC 4510 – Machine Learning

DESCRIPTION

Lecture 3: Classification and Decision Trees. CSC 4510 – Machine Learning. Dr. Mary-Angela Papalaskari, Department of Computing Sciences, Villanova University. Course website: www.csc.villanova.edu/~map/4510/. Last time: machine learning overview, supervised learning, classification.

TRANSCRIPT
![Page 1: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/1.jpg)
CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences
Villanova University

Course website: www.csc.villanova.edu/~map/4510/

Lecture 3: Classification and Decision Trees

CSC 4510 - M.A. Papalaskari - Villanova University
![Page 2: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/2.jpg)
Last time: Machine Learning Overview

• Supervised learning
  – Classification
  – Regression
• Unsupervised learning
• Others: reinforcement learning, recommender systems

Also: practical advice for applying learning algorithms.
![Page 3: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/3.jpg)
Supervised or Unsupervised Learning? Iris Data
![Page 4: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/4.jpg)
Resources: Datasets
• UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
• UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
![Page 5: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/5.jpg)
Example: adult.data (dataset description from the UCI Repository)
![Page 6: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/6.jpg)
UCI Repository: adult.data
![Page 7: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/7.jpg)
Our Sample Data

```
% Class data
% major: 1=CS; 2=psych; 3=other
% class: 1=freshman; 2=sophomore; ...; =graduate or other
% birthday month (number)
% eye color: 0=blue; 1=brown; 2=other
% Do you prefer apples (1) or oranges (0)?
T: major,class,bmonth,eyecolor,aORo
A: 2,3,6,1,0
A: 1,2,3,1,1
A: 2,3,5,1,1
A: 3,4,7,1,1
A: 1,4,10,1,0
A: 3,4,6,1,0
A: 2,3,10,0,1
A: 1,4,7,1,0
A: 2,3,3,1,1
A: 3,3,7,1,1
A: 1,4,8,2,1
A: 1,4,4,1,0
A: 3,4,3,0,1
A: 3,4,2,2,1
A: 3,4,8,1,1
A: 1,4,2,2,0
A: 1,5,8,1,1
A: 1,5,4,0,1
A: 2,5,11,2,0
```
![Page 8: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/8.jpg)
Classification (Categorization)

• Given:
  – A description of an instance, x ∈ X, where X is the instance language or instance space.
  – A fixed set of categories: C = {c1, c2, …, cn}
• Determine:
  – The category of x: c(x) ∈ C, where c(x) is a categorization function whose domain is X and whose range is C.
  – If c(x) is a binary function, C = {0, 1} ({true, false}, {positive, negative}), then it is called a concept.
![Page 9: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/9.jpg)
Tiny Example of Category Learning

• Instance attributes: <size, color, shape>
  – size ∈ {small, medium, large}
  – color ∈ {red, blue, green}
  – shape ∈ {square, circle, triangle}
• C = {positive, negative}
• Training data D:

| Example | Size  | Color | Shape    | Category |
|---------|-------|-------|----------|----------|
| 1       | small | red   | circle   | positive |
| 2       | large | red   | circle   | positive |
| 3       | small | red   | triangle | negative |
| 4       | large | blue  | circle   | negative |
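As a concrete illustration (not part of the original slides), a few lines of Python encode D and verify that the hypothesis "red & circle" is consistent with all four training examples:

```python
# Training data D: (size, color, shape) -> category
D = [
    (("small", "red", "circle"), "positive"),
    (("large", "red", "circle"), "positive"),
    (("small", "red", "triangle"), "negative"),
    (("large", "blue", "circle"), "negative"),
]

def h_red_circle(x):
    """Hypothesis: classify as positive iff the instance is red AND a circle."""
    size, color, shape = x
    return "positive" if (color == "red" and shape == "circle") else "negative"

# A hypothesis is consistent with D if it reproduces every training label.
print(all(h_red_circle(x) == label for x, label in D))  # True
```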
![Page 10: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/10.jpg)
Hypothesis Selection

• Many hypotheses are usually consistent with the training data:
  – red & circle
  – (small & circle) or (large & red)
  – (small & red & circle) or (large & red & circle)
  – not [ (red & triangle) or (blue & circle) ]
  – not [ (small & red & triangle) or (large & blue & circle) ]
• Bias: any criterion other than consistency with the training data that is used to select a hypothesis.
![Page 11: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/11.jpg)
Generalization
• Hypotheses must generalize to correctly classify instances not in the training data.
• Simply memorizing training examples is a consistent hypothesis that does not generalize.
• Occam’s razor: finding a simple hypothesis helps ensure generalization.
![Page 12: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/12.jpg)
Hypothesis Space

• Restrict learned functions a priori to a given hypothesis space, H, of functions h(x) that can be considered as definitions of c(x).
• For learning concepts on instances described by n discrete-valued features, consider the space of conjunctive hypotheses represented by a vector of n constraints <c1, c2, …, cn>, where each ci is either:
  – ?, a wildcard indicating no constraint on the ith feature
  – a specific value from the domain of the ith feature
  – Ø, indicating no value is acceptable
• Sample conjunctive hypotheses:
  – <big, red, ?>
  – <?, ?, ?> (most general hypothesis)
  – <Ø, Ø, Ø> (most specific hypothesis)
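A conjunctive hypothesis of this kind can be represented directly as a tuple of constraints. A minimal Python sketch (the names WILDCARD, EMPTY, and matches are illustrative, not from the slides):

```python
WILDCARD = "?"
EMPTY = None  # stands in for the slide's Ø: no value is acceptable

def matches(hypothesis, instance):
    """True iff every constraint is a wildcard or equals the instance's value."""
    return all(
        c != EMPTY and (c == WILDCARD or c == v)
        for c, v in zip(hypothesis, instance)
    )

x = ("big", "red", "circle")
print(matches(("big", "red", WILDCARD), x))   # True: wildcard on shape
print(matches((WILDCARD,) * 3, x))            # True: most general hypothesis
print(matches((EMPTY, EMPTY, EMPTY), x))      # False: most specific hypothesis
```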
![Page 13: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/13.jpg)
Decision Tree Creation
Example: Do We Want to Wait in a Restaurant?
![Page 14: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/14.jpg)
Decision Tree Creation

• One possible decision tree:
![Page 15: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/15.jpg)
Creating Efficient Decision Trees
![Page 16: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/16.jpg)
Decision Tree Induction
Many trees are consistent with the training data; which should we prefer?
Occam’s Razor: The most likely explanation for a set of observations is the simplest explanation.
Assumption: “Smallest Tree” == “Simplest”
![Page 17: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/17.jpg)
Decision Tree Induction Issues
UNFORTUNATELY: finding the smallest tree is intractable!
(what does this mean?)
![Page 18: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/18.jpg)
Heuristics to the Rescue!
• Algorithm:
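The algorithm itself appeared as a figure on the original slide. As a hedged sketch, the usual greedy, top-down scheme looks roughly like the following Python (helper names are mine; the attribute-selection step is a placeholder for the information-gain heuristic developed on the next slides):

```python
from collections import Counter

def majority(examples):
    """Most common label among examples, given as [(attributes_dict, label), ...]."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def learn_tree(examples, attributes, default):
    """Greedy top-down decision-tree learner (outline only)."""
    if not examples:
        return default                      # no data: use parent's majority label
    labels = {label for _, label in examples}
    if len(labels) == 1:
        return labels.pop()                 # pure node: predict its single label
    if not attributes:
        return majority(examples)           # no tests left: majority vote
    a = attributes[0]                       # placeholder: info-gain choice goes here
    branches = {}
    for v in {x[a] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[a] == v]
        rest = [b for b in attributes if b != a]
        branches[v] = learn_tree(subset, rest, majority(examples))
    return (a, branches)

def classify(tree, x):
    """Follow attribute tests until a leaf label is reached."""
    while isinstance(tree, tuple):
        a, branches = tree
        tree = branches[x[a]]
    return tree

# Tiny demo: one attribute separates the labels perfectly.
examples = [({"color": "red"}, "yes"), ({"color": "blue"}, "no")]
tree = learn_tree(examples, ["color"], default="yes")
print(classify(tree, {"color": "blue"}))  # no
```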
![Page 19: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/19.jpg)
Informal Argument: Choosing Attributes
Some attributes just discriminate better than others.
![Page 20: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/20.jpg)
Choosing and Ordering Attribute-Tests
Information Theory: “How many bits is a question’s answer worth?” Coin toss: fair vs. rigged.

Observation: 1 bit is enough to answer a yes/no question about which one has no idea. If the answers vi have probabilities P(vi), then we must weight the number of bits for each answer by its probability to get an overall average number of bits required to represent any answer:
I(P(v1), P(v2), ..., P(vn)) = - Σi P(vi) log2 P(vi)
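This average is easy to compute directly; a minimal Python sketch (not from the slides):

```python
from math import log2

def I(*probs):
    """Average number of bits needed to report an answer with these probabilities."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(I(0.5, 0.5))    # fair coin toss: 1.0 bit
print(I(1.0, 0.0))    # rigged coin: 0.0 bits -- the answer carries no information
print(I(0.99, 0.01))  # nearly rigged: about 0.08 bits
```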
![Page 21: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/21.jpg)
Choosing Attributes
Given p positive examples of a concept F(x) and n negative examples, what is I("correctly identify instances of F")?

I( p/(p+n), n/(p+n) ) = - (p/(p+n)) log2 (p/(p+n)) - (n/(p+n)) log2 (n/(p+n))
![Page 22: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/22.jpg)
Choosing/Ordering Decision Tree Attributes
If one knows the answer/value of an attribute A, how much information about the overall concept are we still missing?

[Figure: attribute A splits the examples along its values v1, …, vk; branch i receives pi YES and ni NO examples, so the information still needed within branch i is I( pi/(pi+ni), ni/(pi+ni) ).]

Remainder(A) = Σ(i=1..k) [ (pi + ni)/(p + n) ] · I( pi/(pi+ni), ni/(pi+ni) )
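Numerically, the remainder is a weighted average of the branch entropies. A self-contained Python sketch of that computation (the branch counts below are invented for illustration, not from the slides):

```python
from math import log2

def I(p, n):
    """Bits needed to classify a set with p positives and n negatives."""
    total = p + n
    return sum(-x / total * log2(x / total) for x in (p, n) if x > 0)

def remainder(branches, p, n):
    """branches: list of (pi, ni) counts after splitting on an attribute."""
    return sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in branches)

# Illustrative split: 6 positives and 6 negatives overall.
p, n = 6, 6
perfect = [(4, 0), (0, 4), (2, 2)]  # two pure branches, one mixed
useless = [(3, 3), (3, 3)]          # every branch looks like the whole set
print(remainder(perfect, p, n))     # about 0.33: little information still missing
print(remainder(useless, p, n))     # 1.0: splitting told us nothing
```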
![Page 23: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/23.jpg)
Heuristically Choosing Attributes
When adding tests to a tree, always add the next attribute that gives us the largest information gain:

Gain(A) = I( p/(p+n), n/(p+n) ) - Remainder(A)
What happens when a leaf node is ambiguous (has both + and - examples)?

When our decision path reaches such a node, randomly give a yes/no answer according to the yes/no probabilities at that node.
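That randomized answer can be sketched in a few lines of Python (illustrative, not from the slides):

```python
import random

def answer_at_leaf(p, n, rng=random):
    """Ambiguous leaf holding p positive and n negative training examples:
    answer 'yes' with probability p/(p+n), 'no' otherwise."""
    return "yes" if rng.random() < p / (p + n) else "no"

# A leaf with 3 positives and 1 negative answers "yes" about 75% of the time.
rng = random.Random(0)
sample = [answer_at_leaf(3, 1, rng) for _ in range(10000)]
print(sample.count("yes") / len(sample))  # roughly 0.75
```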
![Page 24: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/24.jpg)
When to Use/Not Use Decision Trees
Expressiveness
• Pro: any Boolean function can be represented.
• Con: many Boolean functions don’t have compact trees.
Overfitting: finding meaningless regularities in data
• Solution 1 (Pruning): don’t use attributes whose G(A) is close to zero; use chi-squared tests for significance.
• Solution 2 (Cross-Validation): prefer trees with higher predictive accuracy on set-aside data.
![Page 25: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/25.jpg)
Our Sample Data
• Let’s revisit the sample data from our class in AIspace:
• Download and save the file with student data
• From the main tools page in AIspace.org select “Decision Trees”
• Launch the decision trees tool using Java Web Start (use the first link on that page)
• Load the example and use the “Step” button to build the tree
• Observe the choice of nodes split by the decision tree algorithm
![Page 26: CSC 4510 – Machine Learning](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813cc9550346895da6726c/html5/thumbnails/26.jpg)
Class Exercise

• Practice using decision tree learning on some of the sample datasets available in AIspace.
Some of the slides in this presentation are adapted from:
• Prof. Frank Klassner’s ML class at Villanova
• the University of Manchester ML course: http://www.cs.manchester.ac.uk/ugt/COMP24111/
• the Stanford online ML course: http://www.ml-class.org/