
Page 1: Machine Learning (CSE 446): Perceptron

Sham M Kakade
© 2018 University of Washington
Page 2: Announcements

- HW due this week. See detailed instructions in the HW.
  - One PDF file.
  - Answers and figures grouped together for each problem (in order).
  - Submit your code (only needed for problem 4 of HW1).
- Quiz section this week: a little probability + more linear algebra.
- Updated late policy:
  - You get 2 late days for the entire quarter, which are automatically deducted per late day (or part thereof).
  - After these days are used up, 33% is deducted per day.
- Today: the perceptron algorithm.

Page 3: Review

Page 4: The General Recipe (for supervised learning)

The cardinal rule of machine learning: Don't touch your test data.

If you follow that rule, this recipe will give you accurate information:

1. Split data into training, (maybe) development, and test sets.
2. For different hyperparameter settings:
   2.1 Train on the training data using those hyperparameter values.
   2.2 Evaluate loss on the development data.
3. Choose the hyperparameter setting whose model achieved the lowest development-data loss. Optionally, retrain on the training and development data together.
4. Now you have a hypothesis. Evaluate that model on test data.
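The recipe above can be sketched in code. This is a toy illustration of my own (not from the course): the "model" is a single threshold on a 1-D feature, and the grid step size plays the role of a hyperparameter.

```python
import random

def predict(t, x):
    # A one-parameter "model": classify x by thresholding at t.
    return 1 if x > t else -1

def error_rate(t, pts):
    # Fraction of points the threshold t misclassifies.
    return sum(1 for x, y in pts if predict(t, x) != y) / len(pts)

def fit_threshold(points, step):
    # "Train": search a grid with the given step size (the hyperparameter)
    # for the threshold minimizing training error.
    grid = [i * step for i in range(int(1 / step) + 1)]
    return min(grid, key=lambda t: error_rate(t, points))

random.seed(0)
data = [(x, 1 if x > 0.5 else -1) for x in (random.random() for _ in range(300))]
random.shuffle(data)

# 1. Split into training, development, and test sets.
train, dev, test = data[:200], data[200:250], data[250:]

# 2-3. For each hyperparameter setting, train on the training set and
#      evaluate on dev; keep the setting with the lowest development loss.
best_step = min([0.5, 0.1, 0.01],
                key=lambda s: error_rate(fit_threshold(train, s), dev))

# 4. Evaluate the chosen model once on test data.
t = fit_threshold(train, best_step)
print("best step:", best_step, "test error:", error_rate(t, test))
```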

Page 5: Machine Learning (CSE 446): Perceptron...Announcements I HW due this week. See detailed instructions in the hw. I One pdf le. I Answers and gures grouped together for each problem

Also...

I supervised learning algorithms we covered:I Decision trees: uses few features, “selectively”I Nearest neighbor: uses all features, “blindly”

I unsupervised learning algorithms we covered:I K-means: a clustering algorithm

we show it converges later in the classI this algorithm does not use labels

3 / 14

Page 6: Probability you should know

- expectations (and notation)
- mean, variance
- unbiased estimates
- joint distributions
- conditional distributions
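As one concrete instance of the list above: the sample variance with an n − 1 denominator is an unbiased estimate of the true variance, while dividing by n is biased. A small simulation (my own illustration, not from the slides) can check this:

```python
import random

random.seed(1)

def sample_variance(xs, ddof):
    # ddof = 0: divide by n (biased); ddof = 1: divide by n - 1 (unbiased).
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

# Draw many small samples from a distribution with known variance 1/12
# (uniform on [0, 1]) and average each estimator across the trials.
trials = [[random.random() for _ in range(5)] for _ in range(100000)]
biased = sum(sample_variance(t, 0) for t in trials) / len(trials)
unbiased = sum(sample_variance(t, 1) for t in trials) / len(trials)
print(f"true 1/12 ≈ 0.0833, biased ≈ {biased:.4f}, unbiased ≈ {unbiased:.4f}")
```

The averaged unbiased estimate lands near 1/12, while the n-denominator version systematically undershoots by a factor of (n − 1)/n.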

Page 7: Linear algebra you should know

- vectors
- inner products (and their interpretation)
- Euclidean distance; Euclidean norm
- soon:
  - matrices and matrix multiplication
  - "outer" products
  - a covariance matrix
  - how to write a vector x in an "orthogonal" basis
  - SVD / eigenvectors / eigenvalues ...
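A quick pure-Python sketch of the first three items (just for illustration):

```python
import math

def dot(u, v):
    # Inner product: sum of elementwise products.
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    # Euclidean norm: square root of the inner product of u with itself.
    return math.sqrt(dot(u, u))

def distance(u, v):
    # Euclidean distance: norm of the difference vector.
    return norm([a - b for a, b in zip(u, v)])

u, v = [3.0, 4.0], [0.0, 0.0]
print(dot(u, u))       # 25.0
print(norm(u))         # 5.0
print(distance(u, v))  # 5.0

# Interpretation: dot(u, v) = |u| |v| cos(angle between them);
# orthogonal vectors have inner product 0.
print(dot([1.0, 0.0], [0.0, 1.0]))  # 0.0
```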

Page 8: Today


Page 10: Is there a happy medium?

Decision trees (that aren't too deep): use relatively few features to classify.

K-nearest neighbors: all features weighted equally.

Today: use all features, but weight them.

For today's lecture, assume that y ∈ {−1, +1} instead of {0, 1}, and that x ∈ R^d.
Page 11: Inspiration from Neurons

[Image from Wikimedia Commons.]

Input signals come in through dendrites, and the output signal passes out through the axon.

Page 12: Neuron-Inspired Classifier

[Figure: the input features x[1], x[2], x[3], ..., x[d] are each multiplied by a weight parameter w[1], ..., w[d]; the products are summed together with a bias parameter b, and the resulting "activation" is thresholded ("fire, or not?") to produce the output ŷ.]

Page 13: Neuron-Inspired Classifier

[Figure: the same diagram, with the threshold step now labeled as a function f mapping the activation to the output ŷ.]

Page 14: A "parametric" Hypothesis

f(x) = sign(w · x + b)

remembering that: w · x = Σ_{j=1}^{d} w[j] · x[j]

Learning requires us to set the weights w and the bias b.
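In code, this hypothesis class is only a few lines (a sketch of my own, matching the slide's notation; the slides don't fix a convention for sign(0), so this version breaks the tie toward +1):

```python
def predict(w, b, x):
    # f(x) = sign(w · x + b); tie at activation 0 goes to +1.
    activation = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if activation >= 0 else -1

# Example: a 2-feature classifier with weights [2, -1] and bias -1.
print(predict([2.0, -1.0], -1.0, [1.0, 0.5]))  # activation = 0.5, so 1
print(predict([2.0, -1.0], -1.0, [0.0, 1.0]))  # activation = -2.0, so -1
```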
Page 15: Geometrically...

- What does the decision boundary look like?
Page 16: "Online learning"

- Let's think of an online algorithm, where we try to update w as we examine each training point (x_i, y_i), one at a time.
- How should we change w if we do not make an error on some point (x, y)?
- How should we change w if we make an error on some point (x, y)?
Page 17: Perceptron Learning Algorithm

Data: D = ⟨(x_n, y_n)⟩ for n = 1, ..., N; number of epochs E
Result: weights w and bias b
initialize: w = 0 and b = 0;
for e ∈ {1, ..., E} do
    for n ∈ {1, ..., N}, in random order do
        # predict
        ŷ = sign(w · x_n + b);
        if ŷ ≠ y_n then
            # update
            w ← w + y_n · x_n;
            b ← b + y_n;
        end
    end
end
return w, b

Algorithm 1: PerceptronTrain
Page 18: Parameters and Hyperparameters

This is the first supervised algorithm we've seen that has parameters that are numerical values (w and b).

The perceptron learning algorithm's sole hyperparameter is E, the number of epochs (passes over the training data).

How should we tune E using the development data?
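One natural answer, following the general recipe from earlier: train once per candidate value of E and keep the one with the lowest development error. A self-contained sketch (my own toy data and helpers, with a fixed visit order for brevity):

```python
import random

def sign(a):
    return 1 if a >= 0 else -1

def train(data, epochs):
    # Minimal perceptron: same updates as Algorithm 1, fixed visit order.
    d = len(data[0][0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for x, y in data:
            if sign(sum(wj * xj for wj, xj in zip(w, x)) + b) != y:
                w = [wj + y * xj for wj, xj in zip(w, x)]
                b += y
    return w, b

def error(model, data):
    # Fraction of points the model (w, b) misclassifies.
    w, b = model
    return sum(sign(sum(wj * xj for wj, xj in zip(w, x)) + b) != y
               for x, y in data) / len(data)

# Toy data labeled by x[0] - x[1] > 0, split into train and development.
random.seed(0)
pts = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(120)]
data = [(x, 1 if x[0] - x[1] > 0 else -1) for x in pts]
train_set, dev_set = data[:90], data[90:]

# Tune E: one training run per candidate, pick the lowest dev error.
best_E = min([1, 2, 5, 10, 20],
             key=lambda E: error(train(train_set, E), dev_set))
print("chosen E:", best_E)
```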

Page 19: Linear Decision Boundary

[Figure: the decision boundary is the set of points where w · x + b = 0; the activation w · x + b is positive on one side of this hyperplane and negative on the other.]
Page 20: Interpretation of Weight Values

What does it mean when ...

- w[12] = 100?
- w[12] = −1?
- w[12] = 0?

What about feature scaling?
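Weight magnitudes are only comparable across features when the features live on comparable scales; the same quantity measured in different units would otherwise get a wildly different weight. A common fix, shown here as a sketch of my own (standardization is not prescribed by the slides), is to rescale each feature to zero mean and unit variance:

```python
def standardize(column):
    # Rescale a feature column to zero mean and unit variance so that
    # learned weight magnitudes become comparable across features.
    m = sum(column) / len(column)
    var = sum((v - m) ** 2 for v in column) / len(column)
    sd = var ** 0.5
    return [(v - m) / sd for v in column]

# The same quantity in kilometers vs. meters: after standardization the
# two columns agree, so a learned weight would mean the same thing.
km = [1.0, 2.0, 3.0, 4.0]
meters = [1000.0, 2000.0, 3000.0, 4000.0]
a, b = standardize(km), standardize(meters)
print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # True
```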
Page 21: What can we say about convergence?

- Can you say what has to occur for the algorithm to converge?
- Can we understand when it will never converge?

Stay tuned...