COM24111: Machine Learning
Decision Trees
Gavin Brown
www.cs.man.ac.uk/~gbrown
Recap: threshold classifiers
[Scatter plot: height against weight, with a threshold t on weight separating the two classes]
if (weight > t) then "player" else "dancer"
Q. Where is a good threshold?
[Number line of data points between 10 and 60, with a candidate threshold]
Also known as “decision stump”
From Decision Stumps, to Decision Trees
- New type of non-linear model
- Copes naturally with continuous and categorical data
- Fast to both train and test (highly parallelizable)
- Generates a set of interpretable rules
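As a quick taste of the "interpretable rules" point, here is a minimal sketch that fits a library tree to a tiny invented weight/height dataset and prints the learnt rules (scikit-learn is used purely as an illustration; the data and feature names are made up):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny invented dataset: [weight, height]; label 0 = dancer, 1 = player.
X = [[55, 165], [60, 170], [58, 168], [90, 195], [95, 200], [85, 190]]
y = [0, 0, 0, 1, 1, 1]

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(clf, feature_names=["weight", "height"]))   # the learnt nested if/else rules
```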
Recap: Decision Stumps
The stump "splits" the dataset. Here we have 4 classification errors.
[Stump diagram over the number line 10 to 60: a yes/no test on x separates {10.5, 14.1, 17.0, 21.0, 23.2} from {27.1, 30.1, 42.0, 47.0, 57.3, 59.9}; one side predicts 0, the other predicts 1]
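To make the stump concrete, here is a minimal sketch that tries a threshold between every pair of neighbouring sorted values and keeps the one with the fewest training errors. The x values are the ones on the slide; the 0/1 labels are invented for illustration, since the transcript does not list them:

```python
import numpy as np

x = np.array([10.5, 14.1, 17.0, 21.0, 23.2, 27.1, 30.1, 42.0, 47.0, 57.3, 59.9])
y = np.array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0])   # hypothetical labels

def fit_stump(x, y):
    """Pick the threshold with the fewest errors, predicting the majority class on each side."""
    xs = np.sort(np.unique(x))
    best = None
    for t in (xs[:-1] + xs[1:]) / 2.0:            # candidate thresholds between neighbouring values
        left, right = y[x <= t], y[x > t]
        p_left, p_right = int(left.mean() >= 0.5), int(right.mean() >= 0.5)
        errors = int(np.sum(left != p_left) + np.sum(right != p_right))
        if best is None or errors < best[0]:
            best = (errors, t, p_left, p_right)
    return best

errors, t, p_left, p_right = fit_stump(x, y)
print(f"if x > {t} predict {p_right}, else predict {p_left}  ({errors} errors)")
```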
A modified stump
Here we have 3 classification errors.
[Stump diagram: the threshold is moved to the right, separating {10.5, 14.1, 17.0, 21.0, 23.2, 27.1, 30.1, 42.0, 47.0} from {57.3, 59.9}]
Recursion…
Just another dataset! Build a stump!
[Diagram: each side of the first split, {10.5, 14.1, 17.0, 21.0, 23.2} and {27.1, 30.1, 42.0, 47.0, 57.3, 59.9}, is treated as a dataset in its own right and gets its own stump, giving further yes/no branches]
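The whole training procedure is just this idea applied recursively: fit a stump, split the data, and recurse on each half. A minimal sketch of that recursion, with a simple purity/depth stopping rule of my own choosing (the labels are again hypothetical):

```python
import numpy as np

def build_tree(x, y, depth=0, max_depth=3):
    # Stop when the node is pure, no split is possible, or the depth limit is hit.
    if len(set(y)) == 1 or len(np.unique(x)) == 1 or depth == max_depth:
        return {"leaf": int(np.mean(y) >= 0.5)}   # predict the majority class
    # Choose the threshold with the fewest misclassifications, as in the stump slides.
    xs = np.sort(np.unique(x))
    best = None
    for t in (xs[:-1] + xs[1:]) / 2.0:
        left, right = y[x <= t], y[x > t]
        errs = min(left.sum(), len(left) - left.sum()) + min(right.sum(), len(right) - right.sum())
        if best is None or errs < best[0]:
            best = (errs, t)
    t = best[1]
    return {"threshold": t,
            "no":  build_tree(x[x <= t], y[x <= t], depth + 1, max_depth),   # the x <= t subset
            "yes": build_tree(x[x > t],  y[x > t],  depth + 1, max_depth)}   # the x > t subset

def predict(tree, xi):
    while "leaf" not in tree:
        tree = tree["yes"] if xi > tree["threshold"] else tree["no"]
    return tree["leaf"]

x = np.array([10.5, 14.1, 17.0, 21.0, 23.2, 27.1, 30.1, 42.0, 47.0, 57.3, 59.9])
y = np.array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0])   # hypothetical labels
tree = build_tree(x, y)
print([predict(tree, v) for v in x])
```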
Decision Trees = nested rules
[Tree diagram: a yes/no split at the root, followed by a further yes/no split on each branch, drawn above the number line from 10 to 60]
if x>25 then
    if x>50 then y=0 else y=1; endif
else
    if x>16 then y=0 else y=1; endif
endif
Trees build “orthogonal” decision boundaries.
Boundary is piecewise, and at 90 degrees to feature axes.
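One way to see why: every internal node tests a single feature against a threshold, so each piece of the boundary is perpendicular to that feature's axis. A tiny two-feature sketch (feature names and thresholds invented for illustration):

```python
def predict(x1, x2):
    # Each test uses one feature only, so every boundary segment is axis-aligned:
    # a vertical line where we test x1, a horizontal line where we test x2.
    if x1 > 3.0:
        return 1 if x2 > 2.0 else 0
    return 0

print(predict(4.0, 1.0))   # falls in the region x1 > 3.0, x2 <= 2.0
```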
The most important concept in Machine Learning

Looks good so far…
Oh no! Mistakes! What happened?
We didn't have all the data. We can never assume that we do.
This is called "OVER-FITTING" to the small dataset.
The number of possible paths down the tree tells you the number of rules. More rules = more complicated.
You could have N rules, where N is the number of examples in the training data… and the more rules, the more chance of OVERFITTING.
[Tree diagram: the tree from the earlier slides, with its yes/no branches]
Overfitting….
[Plot: testing error against tree depth / length of rules (1 to 9). The optimal depth is where testing error is lowest; beyond that the tree is starting to overfit.]
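One hedged way to reproduce this kind of curve is to train trees of increasing depth and compare training and test error. The sketch below uses scikit-learn and a synthetic dataset purely as placeholders; it is not the experiment from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in range(1, 10):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth,
          "train err %.2f" % (1 - clf.score(X_train, y_train)),
          "test err %.2f" % (1 - clf.score(X_test, y_test)),
          "rules (leaves):", clf.get_n_leaves())
```

Training error keeps falling as the depth grows, while test error typically bottoms out and then rises again: the overfitting pattern sketched above.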
Take a short break.
Talk to your neighbours.
Make sure you understand the recursive algorithm.
if x>25 then
    if x>50 then y=0 else y=1; endif
else
    if x>16 then y=0 else y=1; endif
endif
Decision trees can be seen as nested rules. Nested rules are FAST, and highly parallelizable.
- x, y, z coordinates per joint (~60 total)
- x, y, z velocities per joint (~60 total)
- joint angles (~35 total)
- joint angular velocities (~35 total)
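Returning to the speed claim: the nested rule shown above can be applied to a whole batch of examples at once with vectorised comparisons. A minimal NumPy sketch, given only as an illustration of how such rules parallelise:

```python
import numpy as np

def predict(x):
    # The nested rule from the slide, evaluated element-wise over an array of inputs.
    x = np.asarray(x)
    return np.where(x > 25,
                    np.where(x > 50, 0, 1),   # the x > 25 branch
                    np.where(x > 16, 0, 1))   # the x <= 25 branch

print(predict([10.5, 17.0, 30.1, 57.3]))
```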
Real-time human pose recognition in parts from single depth images
Computer Vision and Pattern Recognition 2011, Shotton et al., Microsoft Research
• Trees are the basis of Kinect controller
• Features are simple image properties.
• Test phase: 200 frames per sec on GPU
• Train phase more complex but still parallel
We’ve been assuming continuous variables!
The Tennis Problem
Outlook?
    SUNNY → Humidity?
        HIGH → NO
        NORMAL → YES
    OVERCAST → YES
    RAIN → Wind?
        STRONG → NO
        WEAK → YES
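Read in the style of the earlier nested rules, the same tree can be sketched directly in Python:

```python
def play_tennis(outlook, humidity, wind):
    # The tennis tree above, written as nested rules.
    if outlook == "SUNNY":
        return "NO" if humidity == "HIGH" else "YES"
    elif outlook == "OVERCAST":
        return "YES"
    else:  # RAIN
        return "NO" if wind == "STRONG" else "YES"

print(play_tennis("SUNNY", "HIGH", "WEAK"))   # -> NO
```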
The Tennis Problem
Note: 9 examples say “YES”, while 5 say “NO”.
Partitioning the data…
Thinking in Probabilities…
The “Information” in a feature
H(X) = 1   (a 50/50 split: maximum uncertainty for a binary variable)
More uncertainty = less information
The “Information” in a feature
H(X) = 0.72193   (e.g. an 80/20 split)
Less uncertainty = more information
Entropy
Calculating Entropy
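The formula itself is not in the transcript, but the standard definition used here is H(X) = -Σ p(x) log2 p(x), in bits. A small Python check that reproduces the two values quoted above:

```python
import numpy as np

def entropy(probs):
    """Shannon entropy in bits: H = -sum_i p_i * log2(p_i)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                       # treat 0 * log2(0) as 0
    return float(-(p * np.log2(p)).sum())

print(entropy([0.5, 0.5]))   # 1.0        (maximum uncertainty)
print(entropy([0.8, 0.2]))   # ≈ 0.72193  (the value on the slide)
```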
Information Gain, also known as “Mutual Information”
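Information gain is the drop in entropy of the class label after splitting on a feature: Gain(Y, X) = H(Y) - Σ_v p(X = v) · H(Y | X = v). A sketch using the 9-YES / 5-NO tennis labels; the per-value counts for Outlook below follow the textbook version of this dataset, since the slide's table is not in the transcript:

```python
import numpy as np

def entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def information_gain(class_counts, split_counts):
    """Gain = H(class) - sum over feature values of p(value) * H(class | value)."""
    total = sum(sum(c) for c in split_counts)
    remainder = sum(sum(c) / total * entropy(c) for c in split_counts)
    return entropy(class_counts) - remainder

# 9 YES vs 5 NO overall; Outlook = sunny / overcast / rain splits them as below.
print(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]))   # ≈ 0.247
```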
[Bar chart: information gain computed for each candidate feature; the split with maximum information gain is chosen]
[Tree diagram over the tennis data: Outlook at the root (SUNNY / OVERCAST / RAIN), with further splits on Temp, Humidity and Wind beneath it, leading to YES and NO leaves]
Decision trees
- Trees are learnt by a recursive algorithm, ID3 (but there are many variants of this).
- Trees draw a particular type of decision boundary (look it up in these slides!).
- Highly efficient, highly parallelizable, with lots of possible "splitting" criteria.
- A powerful methodology, used in lots of industry applications.