ID3 Algorithm & ROC Analysis
Talha KABAKUŞ
Agenda
Where are we now?
Decision Trees
What is ID3?
Entropy
Information Gain
Pros and Cons of ID3
An Example - The Simpsons
What is ROC Analysis?
ROC Space
ROC Space Example over predictions
Where are we now?
Decision Trees
One of the most widely used classification approaches, thanks to its clear model and presentation
Classification is done using data attributes; the aim is to estimate the value of a target field from the source fields
Tree induction:
  Create the tree
  Apply data to the tree to classify
Each branch node represents a choice between a number of alternatives
Each leaf node represents a classification or decision
Leaf count = rule count
Decision Trees (Cont.)
Leaves are inserted from top to bottom
[Figure: a tree with root A, branch nodes B and C, and leaves D, E, F, G]
Sample Decision Tree
Creating Tree Model by Training Data
Decision Tree Classification Task
Apply Model to Test Data
Decision Tree Algorithms
Classification and regression algorithms: Twoing, Gini
Entropy-based algorithms: ID3, C4.5
Memory-based (sample-based) classification algorithms
Decision Trees by Variable Type
Single-variable decision trees: classifications are made by asking questions over only one variable
Hybrid decision trees: classifications are made by asking questions over both single and multiple variables
Multiple-variable decision trees: classifications are made by asking questions over multiple variables
ID3 Algorithm
Iterative Dichotomizer 3
Developed by J. Ross Quinlan in 1979
Based on entropy
Works only on discrete data
Cannot work with defective (missing) data
Advantage over Hunt's algorithm: ID3 chooses the most informative attribute at each split, while Hunt's algorithm chooses randomly
Entropy
A formula that measures the homogeneity of a sample; it indicates how much information gain each split provides
A completely homogeneous sample has an entropy of 0
An equally divided sample has an entropy of 1
Formula: E(S) = -p log2(p) - q log2(q), or in general E(S) = -Σi pi log2(pi)
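As a quick illustration, a minimal Python sketch of this calculation (our own helper, not part of the original slides; the function name is ours):

import math

def entropy(counts):
    """Entropy of a sample given its class counts, e.g. entropy([4, 5])."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([9, 0]))  # 0.0   -> completely homogeneous
print(entropy([5, 5]))  # 1.0   -> equally divided
print(entropy([4, 5]))  # ~0.9911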
Information Gain (IG)
Information Gain calculates the effective change in entropy after making a decision based on the value of an attribute.
Which attribute creates the most homogeneous branches?
First, the entropy of the total dataset is calculated.
The dataset is then split on the different attributes.
Information Gain (Cont.)
The entropy of each branch is calculated, then added proportionally to get the total entropy of the split.
The resulting entropy is subtracted from the entropy before the split.
The result is the Information Gain, or decrease in entropy.
The attribute that yields the largest IG is chosen for the decision node.
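A minimal sketch of this procedure in Python (our own illustration, reusing the entropy helper defined above):

def information_gain(parent_counts, branch_counts):
    """Entropy before the split minus the weighted entropy after it.

    parent_counts: class counts of the whole set, e.g. [4, 5]
    branch_counts: class counts per branch, e.g. [[1, 3], [3, 2]]
    """
    total = sum(parent_counts)
    weighted = sum(sum(b) / total * entropy(b) for b in branch_counts)
    return entropy(parent_counts) - weighted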
Information Gain (Cont.)
A branch set with an entropy of 0 is a leaf node.
Otherwise, the branch needs further splitting to classify its dataset.
The ID3 algorithm is run recursively on the non-leaf branches until all data is classified.
ID3 Algorithm Steps
function ID3 (R: a set of non-categorical attributes,
              C: the categorical attribute,
              S: a training set) returns a decision tree;
begin
    If S is empty, return a single node with value Failure;
    If S consists of records all with the same value for the
        categorical attribute, return a single node with that value;
    If R is empty, then return a single node with as value the most
        frequent of the values of the categorical attribute that are
        found in records of S; [note that then there will be errors,
        that is, records that will be improperly classified];
    Let D be the attribute with largest Gain(D, S) among attributes in R;
    Let {dj | j = 1, 2, ..., m} be the values of attribute D;
    Let {Sj | j = 1, 2, ..., m} be the subsets of S consisting
        respectively of records with value dj for attribute D;
    Return a tree with root labeled D and arcs labeled d1, d2, ..., dm
        going respectively to the trees
        ID3(R - {D}, C, S1), ID3(R - {D}, C, S2), ..., ID3(R - {D}, C, Sm);
end ID3;
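The same steps as a runnable Python sketch (our own naming, not from the slides: records are dicts, attributes and the class attribute are passed by name, and information_gain is the helper defined earlier):

from collections import Counter

def id3(records, attributes, target):
    """Return a decision tree: a class label (leaf) or (attribute, {value: subtree})."""
    if not records:
        return "Failure"
    labels = [r[target] for r in records]
    if len(set(labels)) == 1:                 # all records share one class
        return labels[0]
    if not attributes:                        # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):                           # information gain of splitting on attr
        branches = {}
        for r in records:
            branches.setdefault(r[attr], []).append(r[target])
        return information_gain(list(Counter(labels).values()),
                                [list(Counter(b).values()) for b in branches.values()])

    best = max(attributes, key=gain)          # attribute D with the largest Gain(D, S)
    subtrees = {}
    for value in set(r[best] for r in records):
        subset = [r for r in records if r[best] == value]
        subtrees[value] = id3(subset, [a for a in attributes if a != best], target)
    return (best, subtrees)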
Pros of ID3 Algorithm
Builds the decision tree in a minimal number of steps
The most important point in tree induction is collecting enough reliable data associated with the relevant properties; asking the right questions determines the induction
Each level benefits from the choices made at previous levels
The whole dataset is scanned to create the tree
Cons of ID3 Algorithm
The tree cannot be updated when new data is classified incorrectly; a new tree must be generated instead
Only one attribute at a time is tested for making a decision
Cannot work with defective (missing) data
Cannot work with numerical attributes
An Example - The Simpsons
Person   Hair Length   Weight   Age   Class
Homer    0''           250      36    M
Marge    10''          150      34    F
Bart     2''           90       10    M
Lisa     6''           78       8     F
Maggie   4''           20       1     F
Abe      1''           170      70    M
Selma    8''           160      41    F
Otto     10''          180      38    M
Krusty   6''           200      45    M
Information Gain over Hair Length
E(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911 (entropy of the whole set)
E(1F, 3M) = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.8113
E(3F, 2M) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.9710
Gain(Hair Length <= 5) = 0.9911 - (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911
Information Gain over Weight
E(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911 (entropy of the whole set)
E(4F, 1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
E(0F, 4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0
Gain(Weight <= 160) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900
Information Gain over Age
E(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911 (entropy of the whole set)
E(3F, 3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
E(1F, 2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183
Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183
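These three gains can be reproduced with the helpers sketched earlier (our own check, not part of the original slides):

# class counts are (F, M) per branch, taken from the table above
print(information_gain([4, 5], [[1, 3], [3, 2]]))  # Hair Length <= 5 -> ~0.0911
print(information_gain([4, 5], [[4, 1], [0, 4]]))  # Weight <= 160    -> ~0.5900
print(information_gain([4, 5], [[3, 3], [1, 2]]))  # Age <= 40        -> ~0.0183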
Results
As the results show, Weight is the best attribute to classify this group.

Attribute     Information Gain (IG)
Hair Length   0.0911
Weight        0.5900
Age           0.0183
Constructed Decision Tree
[Figure: decision tree with root node "Weight <= 160?"]
Entropy over Nominal Values
If an attribute has nominal values:
First calculate the entropy for each attribute value
Then combine these, weighted by subset size, into the attribute's information gain
Example II
The dataset: 15 cars (5 fast, 10 not fast), with attributes Engine, SC/Turbo, Weight, and Fuel Eco.
IE(S) = -(5/15)log2(5/15) - (10/15)log2(10/15) = ~0.918
Example II (Cont.)
Information Gain over Engine
Engine: 6 small, 5 medium, 4 large
3 values for attribute Engine, so we need 3 entropy calculations:
small: 5 no, 1 yes -> IGsmall = -(5/6)log2(5/6) - (1/6)log2(1/6) = ~0.65
medium: 3 no, 2 yes -> IGmedium = -(3/5)log2(3/5) - (2/5)log2(2/5) = ~0.97
large: 2 no, 2 yes -> IGlarge = 1 (evenly distributed subset)
IGEngine = IE(S) - [(6/15)*IGsmall + (5/15)*IGmedium + (4/15)*IGlarge]
IGEngine = 0.918 - 0.850 = 0.068
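The same weighted-entropy computation works for any number of attribute values; with the helper sketched earlier (our own check):

# class counts are (no, yes) for each Engine value: small, medium, large
print(information_gain([10, 5], [[5, 1], [3, 2], [2, 2]]))  # ~0.068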
Example II (Cont.)
Information Gain over SC/Turbo
SC/Turbo: 4 yes, 11 no
2 values for attribute SC/Turbo, so we need 2 entropy calculations:
yes: 2 yes, 2 no -> IGturbo = 1 (evenly distributed subset)
no: 3 yes, 8 no -> IGnoturbo = -(3/11)log2(3/11) - (8/11)log2(8/11) = ~0.84
IGSC/Turbo = IE(S) - [(4/15)*IGturbo + (11/15)*IGnoturbo]
IGSC/Turbo = 0.918 - 0.886 = 0.032
Example II (Cont.)
Information Gain over Weight
Weight: 6 average, 4 light, 5 heavy
3 values for attribute Weight, so we need 3 entropy calculations:
average: 3 no, 3 yes -> IGaverage = 1 (evenly distributed subset)
light: 3 no, 1 yes -> IGlight = -(3/4)log2(3/4) - (1/4)log2(1/4) = ~0.81
heavy: 4 no, 1 yes -> IGheavy = -(4/5)log2(4/5) - (1/5)log2(1/5) = ~0.72
IGWeight = IE(S) - [(6/15)*IGaverage + (4/15)*IGlight + (5/15)*IGheavy]
IGWeight = 0.918 - 0.856 = 0.062
Example II (Cont.)
Information Gain over Fuel Eco
Fuel Eco: 2 good, 3 average, 10 bad
3 values for attribute Fuel Eco, so we need 3 entropy calculations:
good: 0 yes, 2 no -> IGgood = 0 (no variability)
average: 0 yes, 3 no -> IGaverage = 0 (no variability)
bad: 5 yes, 5 no -> IGbad = 1 (evenly distributed subset)
We can omit the good and average terms, since those cars always end up not fast.
IGFuelEco = IE(S) - [(10/15)*IGbad]
IGFuelEco = 0.918 - 0.667 = 0.251
Example II (Cont.)
Results: root of the tree

IGEngine     0.068
IGSC/Turbo   0.032
IGWeight     0.062
IGFuelEco    0.251

Fuel Eco yields the largest gain, so it becomes the root node.
Example II (Cont.)
Since we selected the Fuel Eco attribute for our root node, it is removed from the table for future calculations.
The remaining subset (Fuel Eco = bad: 5 fast, 5 not fast) has entropy IE(SFuelEco) = 1 (evenly distributed set).
Example II (Cont.)
Information Gain over Engine
Engine: 1 small, 5 medium, 4 large
3 values for attribute Engine, so we need 3 entropy calculations:
small: 1 yes, 0 no -> IGsmall = 0 (no variability)
medium: 2 yes, 3 no -> IGmedium = -(2/5)log2(2/5) - (3/5)log2(3/5) = ~0.97
large: 2 yes, 2 no -> IGlarge = 1 (evenly distributed subset)
IGEngine = IE(SFuelEco) - [(1/10)*IGsmall + (5/10)*IGmedium + (4/10)*IGlarge]
IGEngine = 1 - 0.885 = 0.115
Example II (Cont.)
Information Gain over SC/Turbo
SC/Turbo: 3 yes, 7 no
2 values for attribute SC/Turbo, so we need 2 entropy calculations:
yes: 2 yes, 1 no -> IGturbo = -(2/3)log2(2/3) - (1/3)log2(1/3) = ~0.92
no: 3 yes, 4 no -> IGnoturbo = -(3/7)log2(3/7) - (4/7)log2(4/7) = ~0.99
IGSC/Turbo = IE(SFuelEco) - [(3/10)*IGturbo + (7/10)*IGnoturbo]
IGSC/Turbo = 1 - 0.965 = 0.035
Example II (Cont.)
Information Gain over Weight
Weight: 3 average, 5 heavy, 2 light
3 values for attribute Weight, so we need 3 entropy calculations:
average: 3 yes, 0 no -> IGaverage = 0 (no variability)
heavy: 1 yes, 4 no -> IGheavy = -(1/5)log2(1/5) - (4/5)log2(4/5) = ~0.72
light: 1 yes, 1 no -> IGlight = 1 (evenly distributed subset)
IGWeight = IE(SFuelEco) - [(3/10)*IGaverage + (5/10)*IGheavy + (2/10)*IGlight]
IGWeight = 1 - 0.561 = 0.439
Example II (Cont.)
Results:
Weight has the highest gain, and is thus the best choice.

IGEngine     0.115
IGSC/Turbo   0.035
IGWeight     0.439
Example II (Cont.)
Since there are only two records with Weight = Light, and SC/Turbo classifies them consistently, we can simplify the Weight = Light path.
Example II (Cont.)
Updated table (Weight = Heavy):
All cars with large engines in this table are not fast. Due to inconsistent patterns in the data, there is no way to proceed, since medium-size engines may lead to either fast or not fast.
ROC Analysis
Receiver Operating Characteristic
The limitations of diagnostic accuracy as a measure of decision performance require the introduction of the concepts of sensitivity and specificity of a diagnostic test. These measures, and the related indices true positive rate and false positive rate, are more meaningful than accuracy.
The ROC curve is shown to be a complete description of this decision-threshold effect, indicating all possible combinations of the relative frequencies of the various kinds of correct and incorrect decisions.
ROC Analysis (Cont.)
Combinations of correct & incorrect decisions:

TPR is equivalent to sensitivity.
FPR is equivalent to 1 - specificity.
The best possible prediction would be 100% sensitivity and 100% specificity (which means FPR = 0%).

Actual Value   Prediction Outcome   Description
p              p                    True Positive (TP)
p              n                    False Negative (FN)
n              p                    False Positive (FP)
n              n                    True Negative (TN)
ROC Space
A ROC space is defined by FPR and TPR as x and y axes respectively, depicting the relative trade-off between true positives (benefits) and false positives (costs).
Since TPR is equivalent to sensitivity and FPR is equal to 1 - specificity, the ROC graph is sometimes called the sensitivity vs. (1 - specificity) plot.
Each prediction result is represented by one point in the ROC space.
Calculations
Sensitivity: TPR = TP / P = TP / (TP + FN)
1 - Specificity: FPR = FP / N = FP / (FP + TN)
Accuracy: ACC = (TP + TN) / (P + N)
A ROC Space Example
Let A, B, C, and D be predictions over 100 negative and 100 positive instances:

Prediction   TP   FP   FN   TN   TPR    FPR    ACC
A            63   28   37   72   0.63   0.28   0.68
B            77   77   23   23   0.77   0.77   0.50
C            24   88   76   12   0.24   0.88   0.18
D            76   12   24   88   0.76   0.12   0.82
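A small Python sketch reproducing this table (our own illustration; the table above rounds to two decimals):

def roc_point(tp, fp, fn, tn):
    """Return (TPR, FPR, ACC) for one confusion matrix."""
    tpr = tp / (tp + fn)                   # sensitivity
    fpr = fp / (fp + tn)                   # 1 - specificity
    acc = (tp + tn) / (tp + fp + fn + tn)
    return tpr, fpr, acc

predictions = {"A": (63, 28, 37, 72), "B": (77, 77, 23, 23),
               "C": (24, 88, 76, 12), "D": (76, 12, 24, 88)}
for name, counts in predictions.items():
    print(name, roc_point(*counts))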
A ROC Space Example (Cont.)
References
1. Data Mining Course Lectures, Assoc. Prof. Nilüfer Yurtay
2. Quinlan, J. R., "Induction of Decision Trees", Machine Learning, 1, 81-106, 1986
3. http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html
4. J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, 3rd Edition, Elsevier, 2011
5. http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm
6. C. E. Metz, "Basic Principles of ROC Analysis", Seminars in Nuclear Medicine, Volume 8, Issue 4, pp. 283-298