Questions on Homework 1?


TRANSCRIPT

Page 1: Questions on Homework 1?

Questions on Homework 1?

Page 2: Questions on Homework 1?

Review of Terminology

• Hypothesis or Model: A particular classifier: e.g., decision tree, neural network, etc.

• Hypothesis or Model Space: All possible hypotheses of a particular type (e.g., decision tree; polynomial function; neural network)

• Learning algorithm: A method for choosing or constructing a hypothesis (or model) from a given hypothesis (or model) space

• Hypothesis or Model Parameters: E.g., size of decision tree; degree of polynomial; number of weights for neural network [constrains the hypothesis space]

• Learning algorithm parameters: E.g., “information gain” vs. “gain ratio”; or value of learning rate for perceptron learning

Page 3: Questions on Homework 1?

Cross-Validation

• Two uses:

– Used to obtain a better estimate of a model’s accuracy when data is limited.

– Used for model selection.

Page 4: Questions on Homework 1?

k-fold Cross Validation for Estimating Accuracy

• Each example is used both as a training instance and as a test instance.

• Split data into k disjoint parts: S1, S2, ..., Sk.

• For i = 1 to k: select Si to be the test set; train on the remaining data and test on Si to obtain accuracy Ai.

• Report the average, (A1 + A2 + ... + Ak) / k, as the final accuracy of the learning algorithm.
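A minimal Python sketch of this procedure, assuming a learn(training_pairs) function that returns a model with a predict(x) method (both names are illustrative, not from the slides):

import random

def k_fold_accuracy(data, learn, k=10, seed=0):
    # data: list of (x, y) pairs; learn: function(training_pairs) -> model.
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]      # k disjoint parts S1, ..., Sk
    accuracies = []
    for i in range(k):
        test = folds[i]                         # Si is the test set
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = learn(train)                    # train on the remaining data
        correct = sum(1 for x, y in test if model.predict(x) == y)
        accuracies.append(correct / len(test))  # accuracy Ai on Si
    return sum(accuracies) / k                  # average (A1 + ... + Ak) / k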

Page 5: Questions on Homework 1?

k-fold Cross Validation for Model Selection

• For each candidate parameter value i: run k-fold cross-validation with parameter i to produce k models, and compute the average test accuracy of these k models.

• Choose the parameter value with the best average test accuracy.

• Use all the training data to learn a model with this parameter value.

• Test the resulting model on separate, unseen test data.
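A sketch of this model-selection loop in the same style; learn(train, param) and param_values are illustrative assumptions, and k_fold_accuracy is the helper sketched on the previous slide:

def select_parameter(data, learn, param_values, k=10):
    # For each candidate parameter value, estimate accuracy by k-fold CV.
    def cv_accuracy(param):
        return k_fold_accuracy(data, lambda train: learn(train, param), k=k)
    # Choose the parameter value with the best average test accuracy.
    best_param = max(param_values, key=cv_accuracy)
    # Use all the training data to learn a model with this parameter value;
    # that model should then be tested on separate, unseen test data.
    return learn(data, best_param), best_param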

Page 6: Questions on Homework 1?

Evaluating Hypotheses, Continued

• Precision: Fraction of true positives out of all predicted positives: TP / (TP + FP)

• Recall: Fraction of true positives out of all actual positives: TP / (TP + FN)
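A small sketch of these two quantities computed from lists of actual and predicted labels (the function and variable names are my own):

def precision_recall(actual, predicted, positive_class):
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive_class and a == positive_class)
    fp = sum(1 for a, p in pairs if p == positive_class and a != positive_class)
    fn = sum(1 for a, p in pairs if p != positive_class and a == positive_class)
    precision = tp / (tp + fp) if tp + fp else 0.0   # TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0      # TP / (TP + FN)
    return precision, recall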

Page 7: Questions on Homework 1?

What is Precision (9)?

What is Recall (9)?

[Confusion matrix shown on the slide; row = actual, column = predicted.]

Precision (9): 75% of instances classified as “9” actually are “9”.

Recall (9): 86% of all “9”s were classified as “9”.

Page 8: Questions on Homework 1?

What is Precision (8)?

What is Recall (8)?

[Confusion matrix shown on the slide; row = actual, column = predicted.]
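The same quantities can be read directly off a confusion matrix whose rows are actual classes and whose columns are predicted classes; the counts below are purely hypothetical, not the ones shown on the slides:

confusion = {                      # confusion[actual][predicted] = count
    "8": {"8": 40, "9": 5},
    "9": {"8": 2, "9": 30},
}

def precision(matrix, cls):
    # TP over the column total: fraction of predicted-cls instances that are cls.
    predicted_as_cls = sum(row[cls] for row in matrix.values())
    return matrix[cls][cls] / predicted_as_cls

def recall(matrix, cls):
    # TP over the row total: fraction of actual-cls instances predicted as cls.
    actual_cls = sum(matrix[cls].values())
    return matrix[cls][cls] / actual_cls

print(precision(confusion, "9"), recall(confusion, "9"))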

Page 9: Questions on Homework 1?

Error vs. Loss

• Error rate: Fraction of incorrect answers given by a classifier h

• Loss(y, ŷ): Amount of utility lost by predicting ŷ when the correct answer is y.

• Note that the error rate corresponds to using the 0/1 loss function: a loss of 1 for each incorrect answer and 0 for each correct one.

• Loss depends on the user and the task. E.g., for one user, we might have:

L(spam, nospam) = 1, L(nospam, spam) = 10
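A toy illustration of such an asymmetric loss, using the slide’s example values (the dictionary encoding is my own):

LOSS = {
    ("spam", "nospam"): 1,    # a spam message slips through: small cost
    ("nospam", "spam"): 10,   # a legitimate message is blocked: large cost
}

def loss(y, y_hat):
    # Loss(y, y_hat): utility lost by predicting y_hat when the answer is y.
    return LOSS.get((y, y_hat), 0)   # correct predictions lose nothing

print(loss("nospam", "spam"))   # 10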

Page 10: Questions on Homework 1?

Goal of Machine Learning: Minimize expected loss over all input-output pairs (x, y) in data space.

Need to define prior probability distribution P(X, Y) over input-output pairs.

Let ξ be the set of all possible input-output pairs.

Then, the expected generalization loss for hypothesis h with respect to loss function L is:

GenLoss_L(h) = Σ_{(x, y) ∈ ξ} L(y, h(x)) P(x, y)

The best hypothesis, h*, is the one that minimizes it:

h* = argmin_{h ∈ H} GenLoss_L(h)
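A toy, entirely made-up illustration of these two definitions: enumerate a small discrete data space with known P(x, y), compute GenLoss for each candidate hypothesis under 0/1 loss, and pick the minimizer:

xi = [  # (x, y, P(x, y)) triples over the whole data space; probabilities sum to 1
    (0, "nospam", 0.4), (1, "nospam", 0.2), (1, "spam", 0.1), (2, "spam", 0.3),
]

def zero_one_loss(y, y_hat):
    return 0 if y == y_hat else 1

hypotheses = {   # a tiny hypothesis space of threshold classifiers
    "h1": lambda x: "spam" if x >= 1 else "nospam",
    "h2": lambda x: "spam" if x >= 2 else "nospam",
}

def gen_loss(h, loss=zero_one_loss):
    return sum(p * loss(y, h(x)) for x, y, p in xi)

h_star = min(hypotheses, key=lambda name: gen_loss(hypotheses[name]))
print(h_star, gen_loss(hypotheses[h_star]))   # h2 0.1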

Page 11: Questions on Homework 1?

Commonly used Loss functions
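Presumably the slide lists the standard choices; as a reconstruction (not verbatim from the slide):

• Absolute-value loss: L1(y, ŷ) = |y − ŷ|

• Squared-error loss: L2(y, ŷ) = (y − ŷ)²

• 0/1 loss: L0/1(y, ŷ) = 0 if y = ŷ, 1 otherwise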

Page 12: Questions on Homework 1?

Empirical Loss

Typically P(X, Y) is not known.

The learning method can only estimate GenLoss by observing the empirical loss on a set of examples E, where N = |E|:

EmpLoss_{L,E}(h) = (1/N) Σ_{(x, y) ∈ E} L(y, h(x))

The estimated best hypothesis, ĥ, is:

ĥ = argmin_{h ∈ H} EmpLoss_{L,E}(h)
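A minimal sketch of the empirical-loss computation and the corresponding argmin (the hypotheses, examples, and names are illustrative):

def emp_loss(h, examples, loss):
    # (1/N) * sum of L(y, h(x)) over the N examples in E.
    return sum(loss(y, h(x)) for x, y in examples) / len(examples)

def best_hypothesis(hypotheses, examples, loss):
    # The estimated best hypothesis: argmin over H of the empirical loss.
    return min(hypotheses, key=lambda h: emp_loss(h, examples, loss))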

Page 13: Questions on Homework 1?

Sources of Loss

What are the possible reasons why ĥ would differ from the target function f?

• Unrealizability: the target function f may not be in the hypothesis space H.

• Variance: Different training sets return different h’s, especially when training sets are small

• Noise: f is nondeterministic: returns different values of f (x) for same x. (Sometimes this is a result of not having all necessary attributes in x.)

• Computational complexity: It may be intractable to search H.

Page 14: Questions on Homework 1?

Regularization for Model Selection

• Instead of doing cross-validation for model selection, put a penalty (or, more generally, a “regularization”) term directly in the “Cost” function to be minimized:

Cost(h) = EmpLoss_{L,E}(h) + λ Complexity(h)

ĥ = argmin_{h ∈ H} Cost(h)
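A toy sketch of this idea for polynomial regression, where complexity is measured by polynomial degree; the complexity measure and the λ value are illustrative assumptions, not from the slides:

import numpy as np

def regularized_cost(coeffs, xs, ys, lam=0.1):
    predictions = np.polyval(coeffs, xs)
    emp_loss = np.mean((ys - predictions) ** 2)   # squared-error EmpLoss
    complexity = len(coeffs) - 1                  # polynomial degree
    return emp_loss + lam * complexity            # Cost(h) = EmpLoss(h) + lambda * Complexity(h)

def select_polynomial(xs, ys, max_degree=9, lam=0.1):
    # Fit one polynomial per degree, then pick the one with the lowest Cost.
    fits = [np.polyfit(xs, ys, d) for d in range(max_degree + 1)]
    return min(fits, key=lambda c: regularized_cost(c, xs, ys, lam))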