Questions on Homework 1?
Review of Terminology
• Hypothesis or Model: A particular classifier, e.g., a specific decision tree, neural network, etc.
• Hypothesis or Model Space: All possible hypotheses of a particular type (e.g., decision tree; polynomial function; neural network)
• Learning algorithm: A method for choosing or constructing a hypothesis (or model) from a given hypothesis (or model) space
• Hypothesis or Model Parameters: E.g., size of decision tree; degree of polynomial; number of weights for neural network [constrains the hypothesis space]
• Learning algorithm parameters: E.g., “information gain” vs. “gain ratio”; or value of learning rate for perceptron learning
Cross-Validation
• Two uses:
– To obtain a better estimate of a model’s accuracy when data is limited.
– For model selection.
k-fold Cross Validation for Estimating Accuracy
• Each example is used both as a training instance and as a test instance.
• Split data into k disjoint parts: S1, S2, ..., Sk.
• For i = 1 to k: select Si to be the test set; train on the remaining data and test on Si to obtain accuracy Ai.
• Report the average, (1/k) Σi Ai, as the final accuracy of the learning algorithm.
k-fold Cross Validation for Model Selection
For each candidate value i of the parameter, run k-fold cross-validation to produce k models. Compute the average test accuracy of these k models.
Choose parameter value with best average test accuracy.
Use all training data to learn model with this parameter value.
Test resulting model on separate, unseen test data.
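The selection steps above can be sketched as follows; `param_values`, `cv_accuracy`, and `learn` are hypothetical names for the candidate parameter values, the k-fold average-accuracy routine, and the learning algorithm.

```python
# Sketch of model selection via k-fold cross-validation.
def select_model(param_values, cv_accuracy, learn, all_training_data):
    # Choose the parameter value with the best average test accuracy.
    best_param = max(param_values, key=cv_accuracy)
    # Use all training data to learn the final model with this value.
    return learn(best_param, all_training_data)
```

The final model is then evaluated on separate, unseen test data that played no role in the selection.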
Evaluating Hypotheses, Continued
• Precision: Fraction of true positives out of all predicted positives: TP / (TP + FP)
• Recall: Fraction of true positives out of all actual positives: TP / (TP + FN)
[Confusion matrix over digit classes not shown; row = actual, column = predicted.]
• What is Precision (9)? 75% of instances classified as “9” actually are “9”.
• What is Recall (9)? 86% of all “9”s were classified as “9”.
• What is Precision (8)? What is Recall (8)? (Again, row = actual, column = predicted.)
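With the matrix stored row = actual, column = predicted (as in the slides), Precision(c) divides the diagonal entry by its column sum and Recall(c) divides it by its row sum. A minimal sketch, using an illustrative matrix rather than the slides’ actual counts:

```python
# Precision and recall for one class from a confusion matrix stored as
# nested dicts: matrix[actual][predicted] = count.
def precision(matrix, cls):
    # True positives / all predicted positives (column sum for cls).
    predicted_cls = sum(row[cls] for row in matrix.values())
    return matrix[cls][cls] / predicted_cls

def recall(matrix, cls):
    # True positives / all actual positives (row sum for cls).
    actual_cls = sum(matrix[cls].values())
    return matrix[cls][cls] / actual_cls
```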
Error vs. Loss
• Error rate: Fraction of incorrect answers given by a classifier h.
• Loss(y, ŷ): Amount of utility lost by predicting ŷ when the correct answer is y.
• Note that L(y, y) = 0: there is no loss when the prediction is correct.
• Loss depends on the user and the task. E.g., for one user, we might have:
L(spam, nospam) = 1, L(nospam, spam) = 10
Goal of Machine Learning: Minimize expected loss over all input-output pairs (x, y) in data space.
Need to define prior probability distribution P(X, Y) over input-output pairs.
Let ξ be the set of all possible input-output pairs.
Then, expected generalization loss for hypothesis h with respect to loss function L is:
GenLossL(h) = Σ(x,y) ∈ ξ L(y, h(x)) P(x, y)
Best hypothesis, h*, is:
h* = argminh ∈ H GenLossL(h)
Commonly used Loss functions
• Absolute-value loss: L1(y, ŷ) = |y − ŷ|
• Squared-error loss: L2(y, ŷ) = (y − ŷ)²
• 0/1 loss: L0/1(y, ŷ) = 0 if y = ŷ, else 1
Empirical Loss
Typically P(X, Y) is not known.
Learning method can only estimate GenLoss by observing empirical loss on a set of examples, E:
EmpLossL,E(h) = (1/N) Σ(x,y) ∈ E L(y, h(x)), where N = |E|.
Best hypothesis, ĥ*, is:
ĥ* = argminh ∈ H EmpLossL,E(h)
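A minimal sketch of empirical loss as the average of L(y, h(x)) over E, with 0/1 loss as one concrete choice of L and a brute-force minimization over a small, hypothetical finite hypothesis space H:

```python
# Empirical loss of hypothesis h on example set E = [(x, y), ...].
def empirical_loss(L, h, E):
    N = len(E)
    return sum(L(y, h(x)) for x, y in E) / N

# 0/1 loss: 0 for a correct prediction, 1 otherwise.
def zero_one_loss(y, y_hat):
    return 0 if y == y_hat else 1

# Best hypothesis in a finite space H minimizes empirical loss.
def best_hypothesis(H, L, E):
    return min(H, key=lambda h: empirical_loss(L, h, E))
```

In practice H is usually far too large to enumerate; learning algorithms search it heuristically.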
Sources of Loss
What are the possible reasons why ĥ* would differ from the target function f ?
• Unrealizability: f may not be in the hypothesis space H.
• Variance: Different training sets return different h’s, especially when training sets are small
• Noise: f is nondeterministic: returns different values of f (x) for same x. (Sometimes this is a result of not having all necessary attributes in x.)
• Computational complexity: It may be intractable to search H.
Regularization for Model Selection
• Instead of doing cross-validation for model selection, put a penalty (or, more generally, “regularization”) term directly in the “Cost” function to be minimized:
Cost(h) = EmpLoss(h) + λ Complexity(h), and choose ĥ* = argminh ∈ H Cost(h)
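A sketch of this idea, assuming each candidate hypothesis comes paired with a complexity measure (e.g., polynomial degree) and `lam` is a hypothetical regularization weight:

```python
# Model selection by minimizing a regularized cost,
# Cost(h) = EmpLoss(h) + lam * Complexity(h), instead of cross-validation.
def regularized_cost(h, complexity, L, E, lam):
    emp = sum(L(y, h(x)) for x, y in E) / len(E)  # empirical loss
    return emp + lam * complexity                  # add penalty term

def select_by_regularization(hypotheses, L, E, lam):
    # hypotheses: list of (h, complexity) pairs
    return min(hypotheses,
               key=lambda hc: regularized_cost(hc[0], hc[1], L, E, lam))[0]
```

The penalty biases selection toward simpler hypotheses, trading a little empirical loss for less overfitting.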