CS690L Data Mining: Classification
Reference:
J. Han and M. Kamber, Data Mining: Concepts and Techniques
Yong Fu: http://web.umr.edu/~yongjian/cs401dm/
Classification
• Classification determines the class or category of an object based on its properties
• Two stages (a minimal sketch of both follows this list)
  – Learning stage: construction of a classification function or model
  – Classification stage: prediction of the classes of objects using the function or model
• Tools for classification
  – Decision trees
  – Bayesian networks
  – Neural networks
  – Regression
• Problem
  – Given a set of objects whose classes are known, called the training set, derive a classification model which can correctly classify future objects
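A minimal sketch of the two stages, assuming scikit-learn's DecisionTreeClassifier and toy attribute vectors; neither the library nor the data comes from the slides.

# Learning stage + classification stage, sketched with scikit-learn
# (an assumption; the slides name no library). Objects are vectors of
# attribute values; the class attribute is the value to be predicted.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: attribute vectors with known classes.
X_train = [[0, 1], [1, 0], [1, 1], [0, 0]]
y_train = ["P", "DP", "P", "DP"]

# Learning stage: construct the classification model.
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Classification stage: predict the classes of future objects.
print(model.predict([[0, 1], [1, 0]]))  # e.g. ['P' 'DP']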
Classification: Decision Tree
• Classification model: decision tree
• Method: top-down induction of decision trees
• Data representation:
  – Every object is represented by a vector of values on a fixed set of attributes. If a relation is defined on the attributes, an object is a tuple in the relation.
  – A special attribute, called the class attribute, tells the group/category the object belongs to; it is the dependent attribute to be predicted
• Learning stage:
  – Induction of a decision tree that classifies the training set
• Classification stage:
  – The decision tree classifies new objects.
An Example
• Definitions: A decision tree is a tree in which each non-leaf node corresponds to an attribute of the objects, and each branch from a non-leaf node to its children represents a value of that attribute. Each leaf node in a decision tree is labeled by a class of the objects.
• Classification using decision trees: Starting from the root, an object follows a path to a leaf node, taking branches according to its attribute values along the way; the leaf gives the class of the object (sketched below).
• Alternative view of a decision tree
  – Node/branch: discrimination test
  – Node: subset of objects satisfying the tests on its path
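A sketch of this traversal, with the tree stored as nested dicts. Both the representation and the hand-built tree are assumptions for illustration; the tree shown is consistent with the partition tables later in these slides, but the slide's own tree did not survive extraction.

# Classify an object by walking a decision tree stored as nested dicts
# (a hypothetical representation; the slides do not prescribe one).
def classify(tree, obj):
    # A leaf is a plain class label.
    if not isinstance(tree, dict):
        return tree
    # Follow the branch matching the object's value for this attribute.
    return classify(tree["branches"][obj[tree["attribute"]]], obj)

# Hand-built tree consistent with the weather example (hypothetical).
tree = {"attribute": "Outlook",
        "branches": {
            "Overcast": "P",
            "Sunny": {"attribute": "Hum>75",
                      "branches": {True: "DP", False: "P"}},
            "Rainy": {"attribute": "Wind",
                      "branches": {True: "DP", False: "P"}}}}

print(classify(tree, {"Outlook": "Sunny", "Hum>75": False}))  # P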
Decision Tree Induction
• Induction of decision trees:
Starting from the training set, recursively select attributes to split nodes, thus partitioning the objects
  – Termination condition: when to stop splitting a node
  – Selection of the attribute for the splitting test:
    • Best split
    • A measure for splitting?
• ID3 algorithm
  – Selection: attribute information gain
  – Termination condition: all objects are in a single class
ID3 Algorithm
ID3 Algorithm (Cont)
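The algorithm bodies on these two slides did not survive extraction. Below is a reconstruction of standard ID3 from the description above: split on the attribute with maximum information gain, stop when all objects are in a single class (a majority-class fallback, an addition, handles running out of attributes). It produces trees in the nested-dict form used in the earlier traversal sketch.

# A reconstruction of ID3 (a sketch, not the slides' own listing).
from collections import Counter
from math import log2

def entropy(objects, class_attr):
    # Ent(C) = -sum p_i log2 p_i over the class distribution.
    counts = Counter(obj[class_attr] for obj in objects)
    n = len(objects)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain(objects, attr, class_attr):
    # Gain(C, A) = Ent(C) - expected entropy after splitting on A.
    n = len(objects)
    expected = sum(
        (len(subset) / n) * entropy(subset, class_attr)
        for subset in ([o for o in objects if o[attr] == v]
                       for v in {o[attr] for o in objects}))
    return entropy(objects, class_attr) - expected

def id3(objects, attributes, class_attr):
    classes = {obj[class_attr] for obj in objects}
    # Termination condition: all objects are in a single class.
    if len(classes) == 1:
        return classes.pop()
    # Fallback (not in the slides): no attribute left -> majority class.
    if not attributes:
        return Counter(o[class_attr] for o in objects).most_common(1)[0][0]
    # Selection: attribute with maximum information gain.
    best = max(attributes, key=lambda a: gain(objects, a, class_attr))
    rest = [a for a in attributes if a != best]
    return {"attribute": best,
            "branches": {v: id3([o for o in objects if o[best] == v],
                                rest, class_attr)
                         for v in {o[best] for o in objects}}}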
Example
• Information content of C (Expected information for the classification)
I(P) = Ent(C) = -((9/14) log2(9/14) + (5/14) log2(5/14)) = 0.940
• For each attribute Ai
  – Step 1: Compute the entropy of each subset induced by Ai
    Ent(Sunny) = -((2/5) log2(2/5) + (3/5) log2(3/5)) = 0.97
    Ent(Rainy) = 0.97
    Ent(Overcast) = 0
  – Step 2: Compute the expected entropy (expected information based on the partitioning into subsets by Ai)
    Ent(C, Outlook) = (5/14) Ent(Sunny) + (5/14) Ent(Rainy) + (4/14) Ent(Overcast)
                    = (5/14)(0.97) + (5/14)(0.97) + (4/14)(0) = 0.69
  – Step 3: Gain(C, Outlook) = Ent(C) - Ent(C, Outlook) = 0.940 - 0.69 = 0.25
• Select the attribute that maximizes information gain
• Build a node for the selected attribute
• Recursively build nodes (the arithmetic above is checked numerically in the sketch below)
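A quick numeric check of the entropy and gain arithmetic above; ent() is a hypothetical helper, not from the slides.

# Numeric check of the Outlook example (ent() is hypothetical).
from math import log2

def ent(*fractions):
    # Entropy of a class distribution given as fractions; 0*log2(0) := 0.
    return -sum(p * log2(p) for p in fractions if p > 0)

ent_C = ent(9/14, 5/14)                         # Ent(C) = 0.940
ent_C_outlook = ((5/14) * ent(2/5, 3/5)         # Sunny
               + (5/14) * ent(3/5, 2/5)         # Rainy
               + (4/14) * ent(4/4))             # Overcast
print(round(ent_C, 2), round(ent_C_outlook, 2),
      round(ent_C - ent_C_outlook, 2))          # 0.94 0.69 0.25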
Example: Decision Tree Building
Level 1: Decision Tree Building

The root node tests Outlook; its three branches (Sunny, Overcast, Rainy) partition the training set into the subsets below.

Outlook = Sunny:
  Temp  Hum  Wind   Class
  85    85   False  DP
  80    90   True   DP
  72    95   False  DP
  69    70   False  P
  75    70   True   P

Outlook = Overcast:
  Temp  Hum  Wind   Class
  83    88   False  P
  64    65   True   P
  72    90   False  P
  81    75   False  P

Outlook = Rainy:
  Temp  Hum  Wind   Class
  70    96   False  P
  71    80   False  P
  72    70   True   DP
  75    80   False  P
  71    96   True   DP
Decision Tree
Generated Rules
C4.5 Extensions to ID3
• Gain ratio: Gain favors attributes with many values.
  GainRatio(C, A) = Gain(C, A) / Ent(P), where P = (|T1|/|C|, |T2|/|C|, ..., |Tm|/|C|)
  and the Ti are the partitions of C based on the objects' values of A. E.g.
  GainRatio(Outlook) = Gain(Outlook) / -{(5/14) log2(5/14) + (5/14) log2(5/14) + (4/14) log2(4/14)}
  (checked in the sketch after this list)
• Missing values:
  – Consider only the objects for which the attribute is defined.
• Continuous attributes:
  – Consider all binary splits A <= ai and A > ai, where ai is the i-th value of A.
  – Compute the gain or gain ratio and choose the split that maximizes it.
• Over-fitting: Change the termination condition: if a subtree is dominated by one class, stop splitting.
• Tree pruning: Replace a subtree by a single leaf node when doing so reduces the expected classification error.
• Rule deriving: A rule corresponds to a path from the root to a leaf; the LHS is the conjunction of the tests along the path and the RHS is the class prediction.
• Rule simplification: Remove some conditions from the LHS.
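The gain-ratio check promised above, reusing the hypothetical ent() helper; Gain(Outlook) = 0.25 is taken from the earlier ID3 example.

# Gain ratio for Outlook (a sketch; ent() as in the earlier check).
from math import log2

def ent(*fractions):
    return -sum(p * log2(p) for p in fractions if p > 0)

gain_outlook = 0.25                 # Gain(C, Outlook) from the ID3 example
split_info = ent(5/14, 5/14, 4/14)  # Ent(P) over the partition sizes, ~1.58
print(round(gain_outlook / split_info, 2))  # ~0.16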
Evaluation of Decision Tree Methods
• Complexity
• Expressive power
• Robustness
• Effectiveness