Decision Trees Prof. Carolina Ruiz Dept. of Computer Science WPI


Page 1:

Decision Trees

Prof. Carolina Ruiz
Dept. of Computer Science
WPI

Page 2:

Constructing a decision tree


Which attribute to use as the root node? That is, which attribute to check first when making a prediction?

Pick the attribute that brings us closer to a decision. That is, pick the attribute that splits the data more homogeneously.

Page 3:

Which attribute splits the data more homogeneously?

    credit history:  bad [0,1,3]     unknown [2,1,2]    good [3,1,1]
    debt:            low [3,2,2]     high [2,1,4]
    collateral:      none [3,2,6]    adequate [2,1,0]
    income:          0-15 [0,0,4]    15-35 [0,2,2]      >35 [5,1,0]

Each count vector gives, for that subset of instances, how many have each value of the target attribute: [low, moderate, high].

Goal: Assign a unique number to each attribute that represents how well it “splits” the dataset according to the target attribute.

Page 4:

For example …

What function f to use? f([0,1,3],[2,1,2],[3,1,1]) = number

Possible f functions:

• Gini Index: a measure of impurity

• Entropy: from information theory

• Misclassification error: the metric used by OneR

(credit history split from page 3: bad [0,1,3], unknown [2,1,2], good [3,1,1])
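The slides work through the entropy option in detail on the next page. As a rough illustration only (not from the slides; the helper names gini, misclassification, and split_score are my own), the other two candidate f functions could be applied to this same split as follows, each weighting the impurity of a subset by its share of the instances:

    def gini(counts):
        # Gini index of a class-count vector, e.g. [0, 1, 3]: 1 - sum of squared class proportions
        m = sum(counts)
        return 1.0 - sum((c / m) ** 2 for c in counts)

    def misclassification(counts):
        # Misclassification error: fraction of instances not in the majority class (as in OneR)
        m = sum(counts)
        return 1.0 - max(counts) / m

    def split_score(subsets, impurity):
        # f(subset1, subset2, ...): impurity of each subset, weighted by its share of the instances
        n = sum(sum(s) for s in subsets)
        return sum(sum(s) / n * impurity(s) for s in subsets)

    credit_history = [[0, 1, 3], [2, 1, 2], [3, 1, 1]]      # bad, unknown, good
    print(split_score(credit_history, gini))                # weighted Gini index of the split
    print(split_score(credit_history, misclassification))   # weighted misclassification error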

Page 5:

Using entropy as the f metric

f([0,1,3],[2,1,2],[3,1,1])

= Entropy([0,1,3],[2,1,2],[3,1,1])

= (4/14)*Entropy([0,1,3]) + (5/14)*Entropy([2,1,2]) + (5/14)*Entropy([3,1,1])

= (4/14)*[ -0 - (1/4)log2(1/4) - (3/4)log2(3/4) ]

+ (5/14)*[ -(2/5)log2(2/5) - (1/5)log2(1/5) - (2/5)log2(2/5) ]

+ (5/14)*[ -(3/5)log2(3/5) - (1/5)log2(1/5) - (1/5)log2(1/5) ]

= 1.265

(credit history split: bad [0,1,3], unknown [2,1,2], good [3,1,1])

In general: Entropy([p,q,…,z]) = -(p/m)log2(p/m) - (q/m)log2(q/m) - … - (z/m)log2(z/m), where m = p+q+…+z
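As a small sanity check, this formula can be transcribed almost directly into Python (a sketch, not part of the slides); it reproduces the 1.265 value above:

    from math import log2

    def entropy(counts):
        # Entropy([p, q, ..., z]) = -(p/m)log2(p/m) - ... - (z/m)log2(z/m), with m = p + q + ... + z
        # Counts of 0 contribute nothing (the slide writes these terms as "-0").
        m = sum(counts)
        return -sum((c / m) * log2(c / m) for c in counts if c > 0)

    # Weighted entropy of the credit history split, as on this page
    f = (4/14) * entropy([0, 1, 3]) + (5/14) * entropy([2, 1, 2]) + (5/14) * entropy([3, 1, 1])
    print(round(f, 3))   # 1.265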

Page 6:

Which attribute splits the data more homogeneously?

    credit history:  bad [0,1,3]     unknown [2,1,2]    good [3,1,1]      entropy: 1.265
    debt:            low [3,2,2]     high [2,1,4]                         entropy: 1.467
    collateral:      none [3,2,6]    adequate [2,1,0]                     entropy: 1.324
    income:          0-15 [0,0,4]    15-35 [0,2,2]      >35 [5,1,0]       entropy: 0.564

(Count vectors are over the target attribute's values: [low, moderate, high].)

The attribute with the lowest entropy is chosen: income.
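A minimal sketch of this selection step in Python, using the count vectors from page 3 (the attribute names in the dictionary are inferred from the value labels, and the helper names are my own):

    from math import log2

    def entropy(counts):
        m = sum(counts)
        return -sum((c / m) * log2(c / m) for c in counts if c > 0)

    def split_entropy(subsets):
        # Entropy of each subset, weighted by its share of the 14 instances
        n = sum(sum(s) for s in subsets)
        return sum(sum(s) / n * entropy(s) for s in subsets)

    splits = {
        "credit history": [[0, 1, 3], [2, 1, 2], [3, 1, 1]],   # bad, unknown, good
        "debt":           [[3, 2, 2], [2, 1, 4]],              # low, high
        "collateral":     [[3, 2, 6], [2, 1, 0]],              # none, adequate
        "income":         [[0, 0, 4], [0, 2, 2], [5, 1, 0]],   # 0-15, 15-35, >35
    }

    for name, subsets in splits.items():
        print(name, round(split_entropy(subsets), 3))
        # approx. 1.265, 1.468, 1.325, 0.564 (the slide shows the middle two truncated: 1.467, 1.324)

    root = min(splits, key=lambda a: split_entropy(splits[a]))
    print("root attribute:", root)   # income, the lowest-entropy split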

Page 7:

Constructing a decision tree


Which attribute to use as the root node? That is, which attribute to check first when making a prediction?

Pick the attribute that brings us closer to a decision. That is, pick the attribute that splits the data more homogeneously.

Page 8:

Constructing a decision tree

(Partial tree: income is the root, with branches 0-15, 15-35, and >35. The 0-15 branch is a leaf with prediction: high. The 15-35 and >35 branches are still marked "?" and must be expanded further.)

Page 9:

Splitting instances with income = 15-35

    credit history (bad, unknown, good):  [0,0,1], [0,1,1], [0,1,0]    entropy: 0.5    <- attribute with lowest entropy
    debt (low, high):                     [0,1,0], [0,1,2]             entropy: 0.688
    collateral (none, adequate):          [0,2,2], [0,0,0]             entropy: 1

The pure subsets under credit history become leaves: bad [0,0,1] <- predict high, good [0,1,0] <- predict moderate.
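A quick sketch checking these three values (assuming, as above, that the remaining attributes are credit history, debt, and collateral; only credit history is named explicitly on the next page):

    from math import log2

    def entropy(counts):
        m = sum(counts)
        return -sum((c / m) * log2(c / m) for c in counts if c > 0) if m > 0 else 0.0

    def split_entropy(subsets):
        n = sum(sum(s) for s in subsets)
        return sum(sum(s) / n * entropy(s) for s in subsets)

    # Count vectors [low, moderate, high], restricted to the 4 instances with income = 15-35
    credit_history = [[0, 0, 1], [0, 1, 1], [0, 1, 0]]   # bad, unknown, good
    debt           = [[0, 1, 0], [0, 1, 2]]              # low, high
    collateral     = [[0, 2, 2], [0, 0, 0]]              # none, adequate

    print(split_entropy(credit_history))   # 0.5
    print(split_entropy(debt))             # ~0.689 (shown as 0.688 on the slide)
    print(split_entropy(collateral))       # 1.0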

Page 10:

Constructing a decision tree

(Tree so far: income is the root, with branches 0-15, 15-35, and >35. The 0-15 branch is a leaf with prediction: high. The 15-35 branch tests Credit-history next, as chosen on page 9, and the remaining branches are expanded in the same way …)
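Tying the pages together, here is a rough, generic sketch of the recursive procedure the slides illustrate: at each node, pick the attribute whose split has the lowest entropy, split the instances, and recurse. The data representation (a list of (attribute-dict, label) pairs) and all helper names are my own assumptions, not from the slides.

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy of a list of target labels
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def split_entropy(instances, attr):
        # Weighted entropy of splitting `instances` on attribute `attr`
        groups = {}
        for attrs, label in instances:
            groups.setdefault(attrs[attr], []).append(label)
        n = len(instances)
        return sum(len(g) / n * entropy(g) for g in groups.values())

    def build_tree(instances, attributes):
        labels = [label for _, label in instances]
        if len(set(labels)) == 1 or not attributes:
            return Counter(labels).most_common(1)[0][0]   # leaf: majority (or only) label
        best = min(attributes, key=lambda a: split_entropy(instances, a))   # lowest-entropy split
        remaining = [a for a in attributes if a != best]
        branches = {}
        for value in {attrs[best] for attrs, _ in instances}:
            subset = [(attrs, label) for attrs, label in instances if attrs[best] == value]
            branches[value] = build_tree(subset, remaining)
        return (best, branches)   # internal node: attribute to test, plus one subtree per value

Run on the 14 training instances behind these slides, build_tree would be expected to pick income at the root and Credit-history under the income = 15-35 branch, matching pages 6-10.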