Decision Trees Prof. Carolina Ruiz Dept. of Computer Science WPI


Page 1:

Decision Trees

Prof. Carolina Ruiz
Dept. of Computer Science
WPI

Page 2:

Constructing a decision tree


Which attribute to use as the root node? That is, which attribute to check first when making a prediction?

Pick the attribute that brings us closer to a decision. That is, pick the attribute that splits the data more homogeneously.

Page 3:

Which attribute splits the data more homogeneously?

    credit history:  bad [0,1,3]     unknown [2,1,2]    good [3,1,1]
    debt:            low [3,2,2]     high [2,1,4]
    collateral:      none [3,2,6]    adequate [2,1,0]
    income:          0-15 [0,0,4]    15-35 [0,2,2]      >35 [5,1,0]

Each count vector gives, for that subset of instances, how many have each value of the target attribute: [low, moderate, high].

Goal: Assign a unique number to each attribute that represents how well it “splits” the dataset according to the target attribute.

Page 4:

For example …

What function f to use? f([0,1,3],[2,1,2],[3,1,1]) = number

Possible f functions:

• Gini Index: a measure of impurity

• Entropy: from information theory

• Misclassification error: the metric used by OneR

(credit history split from page 3: bad [0,1,3], unknown [2,1,2], good [3,1,1])
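The slides work through the entropy option in detail on the next page. As a rough illustration only (not from the slides; the helper names gini, misclassification, and split_score are my own), the other two candidate f functions could be applied to this same split as follows, each weighting the impurity of a subset by its share of the instances:

    def gini(counts):
        # Gini index of a class-count vector, e.g. [0, 1, 3]: 1 - sum of squared class proportions
        m = sum(counts)
        return 1.0 - sum((c / m) ** 2 for c in counts)

    def misclassification(counts):
        # Misclassification error: fraction of instances not in the majority class (as in OneR)
        m = sum(counts)
        return 1.0 - max(counts) / m

    def split_score(subsets, impurity):
        # f(subset1, subset2, ...): impurity of each subset, weighted by its share of the instances
        n = sum(sum(s) for s in subsets)
        return sum(sum(s) / n * impurity(s) for s in subsets)

    credit_history = [[0, 1, 3], [2, 1, 2], [3, 1, 1]]      # bad, unknown, good
    print(split_score(credit_history, gini))                # weighted Gini index of the split
    print(split_score(credit_history, misclassification))   # weighted misclassification error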

Page 5:

Using entropy as the f metric

f([0,1,3],[2,1,2],[3,1,1])

= Entropy([0,1,3],[2,1,2],[3,1,1])

= (4/14)*Entropy([0,1,3]) + (5/14)*Entropy([2,1,2]) + (5/14)*Entropy([3,1,1])

= (4/14)*[ -0 - (1/4)log2(1/4) - (3/4)log2(3/4) ]

+ (5/14)*[ -(2/5)log2(2/5) - (1/5)log2(1/5) - (2/5)log2(2/5) ]

+ (5/14)*[ -(3/5)log2(3/5) - (1/5)log2(1/5) - (1/5)log2(1/5) ]

= 1.265

(credit history split: bad [0,1,3], unknown [2,1,2], good [3,1,1])

In general: Entropy([p,q,…,z]) = -(p/m)log2(p/m) - (q/m)log2(q/m) - … - (z/m)log2(z/m), where m = p+q+…+z
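As a small sanity check, this formula can be transcribed almost directly into Python (a sketch, not part of the slides); it reproduces the 1.265 value above:

    from math import log2

    def entropy(counts):
        # Entropy([p, q, ..., z]) = -(p/m)log2(p/m) - ... - (z/m)log2(z/m), with m = p + q + ... + z
        # Counts of 0 contribute nothing (the slide writes these terms as "-0").
        m = sum(counts)
        return -sum((c / m) * log2(c / m) for c in counts if c > 0)

    # Weighted entropy of the credit history split, as on this page
    f = (4/14) * entropy([0, 1, 3]) + (5/14) * entropy([2, 1, 2]) + (5/14) * entropy([3, 1, 1])
    print(round(f, 3))   # 1.265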

Page 6:

Which attribute splits the data more homogeneously?

    credit history:  bad [0,1,3]     unknown [2,1,2]    good [3,1,1]      entropy: 1.265
    debt:            low [3,2,2]     high [2,1,4]                         entropy: 1.467
    collateral:      none [3,2,6]    adequate [2,1,0]                     entropy: 1.324
    income:          0-15 [0,0,4]    15-35 [0,2,2]      >35 [5,1,0]       entropy: 0.564

(Count vectors are over the target attribute's values: [low, moderate, high].)

The attribute with the lowest entropy is chosen: income.
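A minimal sketch of this selection step in Python, using the count vectors from page 3 (the attribute names in the dictionary are inferred from the value labels, and the helper names are my own):

    from math import log2

    def entropy(counts):
        m = sum(counts)
        return -sum((c / m) * log2(c / m) for c in counts if c > 0)

    def split_entropy(subsets):
        # Entropy of each subset, weighted by its share of the 14 instances
        n = sum(sum(s) for s in subsets)
        return sum(sum(s) / n * entropy(s) for s in subsets)

    splits = {
        "credit history": [[0, 1, 3], [2, 1, 2], [3, 1, 1]],   # bad, unknown, good
        "debt":           [[3, 2, 2], [2, 1, 4]],              # low, high
        "collateral":     [[3, 2, 6], [2, 1, 0]],              # none, adequate
        "income":         [[0, 0, 4], [0, 2, 2], [5, 1, 0]],   # 0-15, 15-35, >35
    }

    for name, subsets in splits.items():
        print(name, round(split_entropy(subsets), 3))
        # approx. 1.265, 1.468, 1.325, 0.564 (the slide shows the middle two truncated: 1.467, 1.324)

    root = min(splits, key=lambda a: split_entropy(splits[a]))
    print("root attribute:", root)   # income, the lowest-entropy split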

Page 7:

Constructing a decision tree


Which attribute to use as the root node? That is, which attribute to check first when making a prediction?

Pick the attribute that brings us closer to a decision. That is, pick the attribute that splits the data more homogeneously.

Page 8:

Constructing a decision tree

(Partial tree: income is the root, with branches 0-15, 15-35, and >35. The 0-15 branch is a leaf with prediction: high. The 15-35 and >35 branches are still marked "?" and must be expanded further.)

Page 9:

Splitting instances with income = 15-35

    credit history (bad, unknown, good):  [0,0,1], [0,1,1], [0,1,0]    entropy: 0.5    <- attribute with lowest entropy
    debt (low, high):                     [0,1,0], [0,1,2]             entropy: 0.688
    collateral (none, adequate):          [0,2,2], [0,0,0]             entropy: 1

The pure subsets under credit history become leaves: bad [0,0,1] <- predict high, good [0,1,0] <- predict moderate.
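A quick sketch checking these three values (assuming, as above, that the remaining attributes are credit history, debt, and collateral; only credit history is named explicitly on the next page):

    from math import log2

    def entropy(counts):
        m = sum(counts)
        return -sum((c / m) * log2(c / m) for c in counts if c > 0) if m > 0 else 0.0

    def split_entropy(subsets):
        n = sum(sum(s) for s in subsets)
        return sum(sum(s) / n * entropy(s) for s in subsets)

    # Count vectors [low, moderate, high], restricted to the 4 instances with income = 15-35
    credit_history = [[0, 0, 1], [0, 1, 1], [0, 1, 0]]   # bad, unknown, good
    debt           = [[0, 1, 0], [0, 1, 2]]              # low, high
    collateral     = [[0, 2, 2], [0, 0, 0]]              # none, adequate

    print(split_entropy(credit_history))   # 0.5
    print(split_entropy(debt))             # ~0.689 (shown as 0.688 on the slide)
    print(split_entropy(collateral))       # 1.0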

Page 10:

Constructing a decision tree

(Tree so far: income is the root, with branches 0-15, 15-35, and >35. The 0-15 branch is a leaf with prediction: high. The 15-35 branch tests Credit-history next, as chosen on page 9, and the remaining branches are expanded in the same way …)
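Tying the pages together, here is a rough, generic sketch of the recursive procedure the slides illustrate: at each node, pick the attribute whose split has the lowest entropy, split the instances, and recurse. The data representation (a list of (attribute-dict, label) pairs) and all helper names are my own assumptions, not from the slides.

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy of a list of target labels
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def split_entropy(instances, attr):
        # Weighted entropy of splitting `instances` on attribute `attr`
        groups = {}
        for attrs, label in instances:
            groups.setdefault(attrs[attr], []).append(label)
        n = len(instances)
        return sum(len(g) / n * entropy(g) for g in groups.values())

    def build_tree(instances, attributes):
        labels = [label for _, label in instances]
        if len(set(labels)) == 1 or not attributes:
            return Counter(labels).most_common(1)[0][0]   # leaf: majority (or only) label
        best = min(attributes, key=lambda a: split_entropy(instances, a))   # lowest-entropy split
        remaining = [a for a in attributes if a != best]
        branches = {}
        for value in {attrs[best] for attrs, _ in instances}:
            subset = [(attrs, label) for attrs, label in instances if attrs[best] == value]
            branches[value] = build_tree(subset, remaining)
        return (best, branches)   # internal node: attribute to test, plus one subtree per value

Run on the 14 training instances behind these slides, build_tree would be expected to pick income at the root and Credit-history under the income = 15-35 branch, matching pages 6-10.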