
Page 1: Lecture 3b: Decision Trees (1 part)

Machine Learning for Language Technology 2015
http://stp.lingfil.uu.se/~santinim/ml/2015/ml4lt_2015.htm

Decision Trees (1 part)

Marina Santini, [email protected]

Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden

Autumn 2015

Page 2: Lecture 3b: Decision Trees (1 part)

Outline

• Greediness

• Divide and Conquer

• Inductive Bias of the Decision Tree

• Loss function

• Expected loss

• Empirical error

• Induction


Page 3: Lecture 3b: Decision Trees (1 part)

Learning: Generalization Ability

• Predicting the future based on the past


Page 4: Lecture 3b: Decision Trees (1 part)

Predict whether a student will like a course


Page 5: Lecture 3b: Decision Trees (1 part)

Training Data


Page 6: Lecture 3b: Decision Trees (1 part)

That is, ....

• Questions = Features

• Answers = Feature Values

• Ratings = Class Labels

• An example is a set of feature values.

• Training data is a set of examples, each associated with a class label.
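To make that mapping concrete, here is a minimal sketch of how such training data can be represented in Python; the feature names (easy, ai, sys, morning) and ratings (like, nah) are illustrative stand-ins for the course-rating example, not the slide's exact data.

```python
# Each example pairs feature values (the answers) with a class label
# (the rating). Feature names and values here are hypothetical.
training_data = [
    ({"easy": "yes", "ai": "yes", "sys": "no",  "morning": "no"},  "like"),
    ({"easy": "yes", "ai": "no",  "sys": "no",  "morning": "yes"}, "like"),
    ({"easy": "no",  "ai": "yes", "sys": "yes", "morning": "no"},  "nah"),
    ({"easy": "no",  "ai": "no",  "sys": "yes", "morning": "yes"}, "nah"),
]

features = list(training_data[0][0])              # the questions we may ask
labels   = [label for _, label in training_data]  # the class labels
```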


Page 7: Lecture 3b: Decision Trees (1 part)

"Greedy model": the most useful feature

– Histograms

– Root node
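A minimal sketch of this greedy step, assuming the binary yes/no representation from the sketch above: for each feature, histogram the class labels on the YES side and the NO side, score the feature by how many examples a majority vote on each side would get right, and place the best-scoring feature at the root.

```python
from collections import Counter

def feature_score(data, feature):
    """Histogram the labels for each answer; a feature is useful if the
    majority label on each side classifies many examples correctly."""
    score = 0
    for answer in ("yes", "no"):
        hist = Counter(label for x, label in data if x[feature] == answer)
        if hist:
            score += max(hist.values())  # examples the majority label gets right
    return score

def best_feature(data, features):
    """The greedy choice: the single most useful feature."""
    return max(features, key=lambda f: feature_score(data, f))
```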


Page 8: Lecture 3b: Decision Trees (1 part)

Divide & Conquer

• Divide:

– Partition the data into two parts:

• YES part vs NO part

• Conquer:

– Recurse and run the Divide routine
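A minimal sketch of the whole divide-and-conquer loop, reusing best_feature from the previous sketch; it assumes binary yes/no features and stops exactly when querying further features becomes useless (the base cases below):

```python
from collections import Counter

def train_tree(data, features):
    """Divide: split the data into a YES part and a NO part on the most
    useful feature. Conquer: recurse on each part."""
    labels = [label for _, label in data]
    guess = Counter(labels).most_common(1)[0][0]  # majority class label

    # Stop when further questions are useless: labels are already pure,
    # or there are no features left to query.
    if len(set(labels)) == 1 or not features:
        return guess                               # a leaf node

    f = best_feature(data, features)               # greedy choice
    yes_part = [(x, y) for x, y in data if x[f] == "yes"]
    no_part  = [(x, y) for x, y in data if x[f] == "no"]
    if not yes_part or not no_part:
        return guess                               # the split separates nothing

    rest = [g for g in features if g != f]
    # An internal node: the feature tested, plus one subtree per outcome.
    return {f: {"yes": train_tree(yes_part, rest),
                "no":  train_tree(no_part,  rest)}}

# e.g. tree = train_tree(training_data, features)
```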


Page 9: Lecture 3b: Decision Trees (1 part)

The end of the cycle

• ... When it becomes useless to query on additional features


Page 10: Lecture 3b: Decision Trees (1 part)

Decision tree: Inductive Bias

• The goal of the decision tree learning model is:

– to figure out what questions to ask,

– in what order, and

– what answer to predict once you have asked enough questions.

• The inductive bias of decision trees: the things we want to learn to predict should depend more on the question asked at the root node and less on the questions asked at the branch nodes further down.


Page 11: Lecture 3b: Decision Trees (1 part)

Informal Definition

• A decision tree is:

– a flow-chart-like structure, where

• each internal (non-leaf) node denotes a test on an attribute,

• each branch represents the outcome of a test, and

• each leaf (or terminal) node holds a class label.

• The topmost node in a tree is the root node.
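Read as a flow chart, classifying a new example is just a walk from the root to a leaf. A minimal sketch, assuming the nested-dict tree built by train_tree above:

```python
def predict(tree, x):
    """Follow the flow chart: each internal node tests a feature, each
    branch is an outcome of that test, each leaf holds a class label."""
    while isinstance(tree, dict):                     # still at an internal node
        feature, branches = next(iter(tree.items()))  # the attribute tested here
        tree = branches[x[feature]]                   # take the outcome's branch
    return tree                                       # a leaf: the class label
```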


Page 12: Lecture 3b: Decision Trees (1 part)

Formalising the learning problem: 1) the loss function

The loss function l(y, f(x)) characterizes how bad the prediction f(x) is when the true answer is y.
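Two standard instances, shown as an illustration (the slide keeps the loss function generic): the zero/one loss for classification and the squared loss for regression.

```latex
% Zero/one loss (classification): every mistake costs 1.
\ell^{(0/1)}(y, \hat{y}) =
  \begin{cases} 0 & \text{if } y = \hat{y} \\ 1 & \text{otherwise} \end{cases}

% Squared loss (regression): large mistakes cost quadratically more.
\ell^{(\mathrm{sq})}(y, \hat{y}) = (y - \hat{y})^2
```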


Page 13: Lecture 3b: Decision Trees (1 part)

Formalising the learning problem: 2) the data generating distribution

D(x, y): an unknown distribution over input/output pairs, telling us how likely each pair (x, y) is.


Page 14: Lecture 3b: Decision Trees (1 part)

Expected Loss

1. The loss function

2. The data generating distribution


Page 15: Lecture 3b: Decision Trees (1 part)

Formulae: Expected Value

$$\varepsilon \;\triangleq\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(y, f(x))\big] \;=\; \sum_{(x,y)} \mathcal{D}(x, y)\,\ell\big(y, f(x)\big)$$

How to read: epsilon is equal by definition to (is defined as) blackboard-bold E, sub the pair (x, y), over script D, of l of the pair y, f of x; that is, the sum over all the pairs (x, y) of script D of x and y times l of y and f of x.
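A minimal numeric sketch of that sum, assuming a toy finite distribution whose probabilities are written out explicitly (all names and numbers here are illustrative):

```python
def zero_one_loss(y, y_hat):
    """Zero/one loss: 1 for a wrong prediction, 0 for a correct one."""
    return 0 if y == y_hat else 1

def expected_loss(distribution, f):
    """Sum over all pairs (x, y) of D(x, y) times l(y, f(x))."""
    return sum(p * zero_one_loss(y, f(x))
               for (x, y), p in distribution.items())

# A toy distribution over (x, y) pairs; probabilities sum to 1.
D = {("sunny", "like"): 0.5, ("rainy", "like"): 0.2, ("rainy", "nah"): 0.3}
f = lambda x: "like" if x == "sunny" else "nah"
print(expected_loss(D, f))  # 0.2: f errs only on the pair ("rainy", "like")
```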

Page 16: Lecture 3b: Decision Trees (1 part)

Training Error

• The training error is the average error over the training data:

$$\hat{\varepsilon} \;\triangleq\; \frac{1}{N} \sum_{n=1}^{N} \ell\big(y_n, f(x_n)\big)$$

• How to read: the training error epsilon-hat is equal by definition to 1 over N of the sum from n = 1 to capital N of l of y and f of x.
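A minimal sketch of the same average in code, reusing zero_one_loss from the sketch above and any predictor f (for instance the decision tree wrapped as f = lambda x: predict(tree, x)):

```python
def training_error(data, f):
    """Average loss over the N training examples (epsilon-hat)."""
    return sum(zero_one_loss(y, f(x)) for x, y in data) / len(data)
```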


Page 17: Lecture 3b: Decision Trees (1 part)

Empirical Error

• Alpaydin (2010: 24): the empirical error is the proportion of training instances where the predictions of h (the hypothesis, i.e. the informed guess) do not match the required values given in X (the training set). The error of the hypothesis h given the training set X is:

$$E(h \mid \mathcal{X}) \;=\; \frac{1}{N} \sum_{t=1}^{N} \mathbb{1}\big(h(x^t) \neq r^t\big)$$


Page 18: Lecture 3b: Decision Trees (1 part)

Induction

Given:

• a loss function l, and

• a sample D drawn from some unknown distribution 𝒟,

you must compute a function f that has low expected error ε over 𝒟 with respect to l.


Page 19: Lecture 3b: Decision Trees (1 part)

Quiz 1: Training error

• How would you define the training error on a dataset?

1. Training error is the average loss over the training sample

2. Training error is the expected prediction error over an independent test sample

3. None of the above


Page 20: Lecture 3b: Decision Trees (1 part)

Quiz 2: Distributions

What kind of distribution is 𝒟 in the formula above?

1. Normal

2. Unknown

3. None of the above


Page 21: Lecture 3b: Decision Trees (1 part)

Quiz 3: Loss function

• How would you define a loss function?

1. The loss function L(actual value, predicted value) characterizes how bad predictions are

2. The loss function is an unknown distribution

3. Both definitions are incorrect.


Page 22: Lecture 3b: Decision Trees (1 part)

The End
