Chapter 4 (Bab 4.1): Classification: Basic Concepts, Decision Trees & Model Evaluation
Part 1: Classification With Decision Tree (44 slides)

Upload: anne-hollyfield

Post on 14-Dec-2015

Classification: Definition

Example of Classification Task

General Approach for Building Classification Model

Classification Techniques

Example of Decision Tree

Another Example of Decision Tree

Decision Tree Classification Task

Apply Model to Test Data

Decision Tree Classification Task

Decision Tree Induction

General Structure of Hunt’s Algorithm
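Hunt's algorithm grows a tree recursively: a node whose records all belong to one class becomes a leaf, and any other node is split by an attribute test, after which the same procedure is applied to each subset. The sketch below is a minimal illustration under stated assumptions: the record layout, the names `hunt` and `classify`, the toy data, and the choice to test attributes in list order (instead of ranking them by an impurity measure) are all hypothetical.

```python
from collections import Counter


def hunt(records, attributes):
    """Grow a tree from records, given as a list of (features_dict, label) pairs.

    Assumption: attributes are categorical and are tested in the order given;
    a practical learner would pick the test with the best impurity reduction.
    """
    labels = [label for _, label in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf case: the node is pure, or no attribute is left to test.
    if len(set(labels)) == 1 or not attributes:
        return majority
    attr, remaining = attributes[0], attributes[1:]
    # Split case: one partition per observed value of the tested attribute.
    partitions = {}
    for features, label in records:
        partitions.setdefault(features[attr], []).append((features, label))
    return {
        "test": attr,
        "default": majority,  # used for attribute values unseen in training
        "children": {v: hunt(part, remaining) for v, part in partitions.items()},
    }


def classify(tree, features):
    """Follow attribute tests from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(features[tree["test"]], tree["default"])
    return tree


# Hypothetical toy training set.
data = [
    ({"refund": "yes", "status": "single"}, "no"),
    ({"refund": "no", "status": "married"}, "no"),
    ({"refund": "no", "status": "single"}, "yes"),
    ({"refund": "yes", "status": "married"}, "no"),
]
tree = hunt(data, ["refund", "status"])
print(classify(tree, {"refund": "no", "status": "single"}))
```

Storing the majority class at every internal node is one simple way to handle the empty-partition case: a test record with an unseen attribute value falls back to that default label.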

Hunt’s Algorithm

Design Issues of Decision Tree Induction

Methods for Expressing Test Conditions

Test Condition for Nominal Attributes

Test Condition for Ordinal Attributes

Test Condition for Continuous Attributes

Splitting Based on Continuous Attributes

How to Determine the Best Split / 1

How to Determine the Best Split / 2

Measures of Node Impurity

Finding the Best Split / 1

Finding the Best Split / 2

Measure of Impurity: GINI

Computing GINI Index of a Single Node
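For a single node, the Gini index is commonly defined as 1 minus the sum of the squared class fractions at that node, so a pure node scores 0 and an evenly mixed two-class node scores 0.5. The following is a minimal sketch of that definition; the function name `gini` and the sample labels are assumptions for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini index of one node: 1 - sum over classes of (class fraction)^2."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here for an empty node
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


print(gini(["yes"] * 6))                # pure node
print(gini(["yes"] * 3 + ["no"] * 3))   # evenly mixed two-class node
print(gini(["yes"] * 1 + ["no"] * 5))   # mostly one class
```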

Computing GINI Index for a Collection of Nodes
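When a node is split into several children, the quality of the split is usually summarized as the weighted average of the children's Gini indices, with each child weighted by its share of the parent's records. A minimal sketch, assuming each child is given as a list of per-class record counts (the names `gini_from_counts` and `gini_split` and the counts are hypothetical):

```python
def gini_from_counts(counts):
    """Gini index of one node from its per-class record counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_split(children):
    """Weighted Gini of a split; children is a list of per-class count lists."""
    total = sum(sum(child) for child in children)
    if total == 0:
        return 0.0
    return sum(sum(child) / total * gini_from_counts(child) for child in children)


# Hypothetical split of 12 records into two children.
print(round(gini_split([[5, 2], [1, 4]]), 4))
```

A lower weighted value means purer children, so among candidate splits the one with the smallest result would be preferred.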

Binary Attributes: Computing GINI Index

Categorical Attributes: Computing GINI Index

Continuous Attributes: Computing GINI Index / 1

Continuous Attributes: Computing GINI Index / 2
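For a continuous attribute, a common approach is to sort the records by attribute value, take the midpoints between adjacent distinct values as candidate thresholds, and update the class counts incrementally while scanning, so that every candidate is scored in one pass. The sketch below illustrates this under stated assumptions: the names `best_threshold` and `gini_from_counts` and the sample values are hypothetical.

```python
def gini_from_counts(counts):
    """Gini index of one node from its per-class record counts."""
    counts = list(counts)
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    left, right = {}, {}
    for _, label in pairs:  # initially every record is on the right side
        right[label] = right.get(label, 0) + 1
    best_t, best_score = None, float("inf")
    for i in range(n - 1):
        value, label = pairs[i]
        # Move one record from the right partition to the left partition.
        left[label] = left.get(label, 0) + 1
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no threshold fits between equal values
        n_left = i + 1
        score = (n_left / n) * gini_from_counts(left.values()) \
            + ((n - n_left) / n) * gini_from_counts(right.values())
        if score < best_score:
            best_t, best_score = (value + next_value) / 2, score
    return best_t, best_score


# Illustrative values only.
print(best_threshold([60, 70, 75, 85, 90, 95], ["no", "no", "no", "yes", "yes", "yes"]))
```

Sorting once and updating counts incrementally avoids recomputing the class distribution from scratch for every candidate threshold.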

Measure of Impurity: Entropy

Computing Entropy of a Single Node

Computing Information Gain After Splitting
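Information gain is usually computed as the entropy of the parent node minus the weighted average entropy of its children. A minimal sketch, assuming nodes are given as per-class count lists and entropy is measured in bits (the names `entropy` and `information_gain` and the counts are hypothetical):

```python
import math


def entropy(counts):
    """Entropy in bits of one node from its per-class record counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    # Classes with zero records contribute nothing (0 * log 0 is taken as 0).
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical split of a balanced 10-record node.
print(round(information_gain([5, 5], [[4, 1], [1, 4]]), 4))
```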

Problems with Information Gain

Gain Ratio
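The gain ratio divides the information gain by the split information, which is the entropy of the partition sizes themselves, so a split into many tiny partitions is penalized. The sketch below assumes nodes are given as per-class count lists; the names `entropy` and `gain_ratio` and the counts are hypothetical.

```python
import math


def entropy(counts):
    """Entropy in bits of a list of non-negative counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def gain_ratio(parent, children):
    """Information gain divided by split information."""
    n = sum(parent)
    sizes = [sum(child) for child in children]
    gain = entropy(parent) - sum(s / n * entropy(ch) for s, ch in zip(sizes, children))
    split_info = entropy(sizes)  # entropy of the partition sizes
    return gain / split_info if split_info > 0 else 0.0


parent = [5, 5]
two_way = [[5, 0], [0, 5]]            # clean binary split
id_like = [[1, 0]] * 5 + [[0, 1]] * 5  # one record per partition, like an ID attribute
print(round(gain_ratio(parent, two_way), 3), round(gain_ratio(parent, id_like), 3))
```

Both splits have the same information gain (every child is pure), but the ID-like split has a much larger split information, so its gain ratio is far lower.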

Measure of Impurity: Classification Error

Computing Error of a Single Node
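The classification error of a node is commonly defined as 1 minus the fraction of records in its majority class, which is the error rate obtained by labeling the node with that class. A minimal sketch (the name `classification_error` and the counts are assumptions for illustration):

```python
def classification_error(counts):
    """Classification error of one node: 1 - (largest class fraction)."""
    n = sum(counts)
    if n == 0:
        return 0.0  # convention chosen here for an empty node
    return 1.0 - max(counts) / n


for counts in ([0, 6], [1, 5], [3, 3]):
    print(counts, round(classification_error(counts), 3))
```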

Comparison among Impurity Measures

For binary (2-class) classification problems
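For a two-class node where one class has fraction p, all three measures are 0 at p = 0 and p = 1 and reach their maximum at p = 0.5 (0.5 for Gini and classification error, 1 bit for entropy). The sketch below tabulates them using their usual definitions; the function name `impurities` and the chosen values of p are assumptions for illustration.

```python
import math


def impurities(p):
    """Return (gini, entropy_bits, error) for a two-class node with class fraction p."""
    q = 1.0 - p
    gini = 1.0 - p * p - q * q
    entropy = -sum(x * math.log2(x) for x in (p, q) if x > 0)
    error = 1.0 - max(p, q)
    return gini, entropy, error


for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    g, h, e = impurities(p)
    print(f"p={p:.1f}  gini={g:.3f}  entropy={h:.3f}  error={e:.3f}")
```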

Misclassification Error vs Gini index
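One way to see the difference between the two measures is a split that leaves the weighted misclassification error unchanged while the weighted Gini index still improves. The sketch below uses hypothetical class counts chosen to show this effect; the helper names are assumptions as well.

```python
def gini(counts):
    """Gini index of one node from its per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def error(counts):
    """Classification error of one node from its per-class record counts."""
    n = sum(counts)
    return 1.0 - max(counts) / n if n else 0.0


def weighted(measure, children):
    """Size-weighted average of an impurity measure over the children of a split."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * measure(child) for child in children)


parent = [7, 3]                # hypothetical counts for classes C1 and C2
children = [[3, 0], [4, 3]]    # hypothetical binary split of the parent
for name, measure in (("error", error), ("gini", gini)):
    print(name, round(measure(parent), 3), "->", round(weighted(measure, children), 3))
```

With these counts the weighted error stays at 0.3 after the split, while the weighted Gini index drops from 0.42 to about 0.343, so Gini registers the purer child that the error measure ignores.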

Example: C4.5

• Simple depth-first construction.
• Uses information gain (C4.5's default splitting criterion is the gain ratio, a normalized form of information gain).
• Sorts continuous attributes at each node.
• Needs the entire data set to fit in memory.
• Unsuitable for large data sets.
   - Needs out-of-core sorting.
• You can download the software from: http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz

Scalable Decision Tree Induction / 1

• How scalable is decision tree induction?
   - Particularly suitable for small data sets
• SLIQ (EDBT’96, Mehta et al.)
   - Builds an index for each attribute; only the class list and the current attribute list reside in memory

Scalable Decision Tree Induction / 2

• SLIQ

Sample data for the class buys_computer

RID  Credit_rating  Age  Buys_computer
1    excellent      38   yes
2    excellent      26   yes
3    fair           35   no
4    excellent      49   no

Disk-resident attribute lists

Credit_rating  RID
excellent      1
excellent      2
excellent      4
fair           3
…              …

Age  RID
26   2
35   3
38   1
49   4
…    …

Memory-resident class list

RID  Buys_computer  node
1    yes            5
2    yes            2
3    no             3
4    no             6
…    …              …

[Figure: decision tree with nodes numbered 0 to 6; the node column of the class list refers to these node numbers]
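The SLIQ layout above can be sketched in a few lines: each attribute gets its own list of (value, RID) pairs sorted by value, and a single class list maps each RID to its class label and current tree node. The code below is only an in-memory illustration built from the four sample records; the function names, the dictionary layout, and the choice to start every record at node 0 are assumptions, and a real implementation would keep the attribute lists on disk.

```python
# The four sample records: (RID, Credit_rating, Age, Buys_computer).
records = [
    (1, "excellent", 38, "yes"),
    (2, "excellent", 26, "yes"),
    (3, "fair", 35, "no"),
    (4, "excellent", 49, "no"),
]


def build_attribute_list(rows, column):
    """Sorted (value, RID) pairs for one attribute column of the records."""
    return sorted((row[column], row[0]) for row in rows)


def build_class_list(rows, root=0):
    """Map RID -> class label and current tree node (assumed to start at the root)."""
    return {row[0]: {"label": row[3], "node": root} for row in rows}


def class_histograms(attribute_list, class_list):
    """Scan one attribute list and count class labels per tree node via the class list."""
    hist = {}
    for _value, rid in attribute_list:
        entry = class_list[rid]
        node_hist = hist.setdefault(entry["node"], {})
        node_hist[entry["label"]] = node_hist.get(entry["label"], 0) + 1
    return hist


credit_list = build_attribute_list(records, 1)
age_list = build_attribute_list(records, 2)
class_list = build_class_list(records)
print(credit_list)
print(age_list)
print(class_histograms(age_list, class_list))
```

Because each attribute list carries only values and RIDs, it can be scanned sequentially while the class list supplies the label and node for each RID, which is why only the class list and the attribute list currently being scanned need to be in memory.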

Decision Tree Based Classification

• Advantages
   - Inexpensive to construct
   - Extremely fast at classifying unknown records
   - Easy to interpret for small-sized trees
   - Accuracy is comparable to other classification techniques for many data sets

• Practical Issues of Classification
   - Underfitting and Overfitting
   - Missing Values
   - Costs of Classification