announcements10601/lectures/10601_fa20... · 2020. 12. 11. · announcements assignments hw3 mon,...

42
Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section / conflict form by Friday

Upload: others

Post on 14-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

AnnouncementsAssignments

▪ HW3

▪ Mon, 9/28, 11:59 pm

Midterm 1

▪ Mon, 10/5

▪ See Piazza for details

▪ Fill out swap-section / conflict form by Friday

Page 2: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Plan

Last time▪ Regression▪ Linear regression▪ Optimization for linear regression

Today▪ Optimization for linear regression

▪ Linear and convex function

▪ (Batch) Gradient descent

▪ Closed-form solution

▪ Stochastic gradient descent

2

Page 3: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Introduction to Machine Learning

Linear Regression and Optimization

Instructor: Pat Virtue

Page 4: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionSelling my car

Page 5: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear in Higher DimensionsWhat are these linear shapes called for 1-D, 2-D, 3-D, M-D input?

𝑦 = 𝒘𝑇𝒙 + 𝑏

𝒘𝑇𝒙 + 𝑏 = 0

𝒘𝑇𝒙 + 𝑏 ≥ 0

𝒙 ∈ ℝ 𝒙 ∈ ℝ2 𝒙 ∈ ℝ3 𝒙 ∈ ℝ𝑀

Page 6: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear FunctionLinear function

If 𝑓(𝒙) is linear, then:

▪ 𝑓 𝒙 + 𝒛 = 𝑓 𝒙 + 𝑓 𝒛

▪ 𝑓 𝛼𝒙 = 𝛼𝑓 𝒙 ∀𝛼

▪ 𝑓 𝛼𝒙 + 1 − 𝛼 𝒛 = 𝛼𝑓 𝒙 + 1 − 𝛼 𝑓 𝒛 ∀𝛼

Page 7: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Piazza Poll 1Based on the following definition of a linear, is the equation for a line, 𝑦 = 𝑤𝑥 + 𝑏, linear? Example: 𝑦 = 3𝑥 + 5

𝑓(𝒙) is linear if and only if:

▪ 𝑓 𝒙 + 𝒛 = 𝑓 𝒙 + 𝑓 𝒛 and

▪ 𝑓 𝛼𝒙 = 𝛼𝑓 𝒙 ∀𝛼

Page 8: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionLinear algebra formulation

Page 9: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionError and objectives

Page 10: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionLinear algebra formulation

Page 11: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Previous Piazza PollFor fixed data and fixed slope, w, what shape do we get by plotting MSE objective vs intercept, b?

A. Line

B. Plane

C. Half-plane

D. Convex Parabola (U-shape)

E. Concave parabola (up-side-down U)

F. None of the above

x

y

Page 12: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionOptimizing the objective

x

y𝐽 𝑤, 𝑏 =

1

2𝑦(1) − 𝑤𝑥(1) + 𝑏

2+ 𝑦(2) − 𝑤𝑥(2) + 𝑏

2

Page 13: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionOptimizing the objective

x

y𝐽 𝑤, 𝑏 =

1

2𝑦(1) − 𝑤𝑥(1) + 𝑏

2+ 𝑦(2) − 𝑤𝑥(2) + 𝑏

2

Page 14: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionOptimizing the objective

Page 15: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionMethods for optimizing the objective

▪ Grid search

▪ Random search

▪ Closed-form solution

▪ (Batch) Gradient descent

▪ Stochastic gradient descent

𝑤

𝑏

𝐽(𝑤, 𝑏)

Page 16: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

OptimizationLinear function

If 𝑓(𝒙) is linear, then:

▪ 𝑓 𝒙 + 𝒛 = 𝑓 𝒙 + 𝑓 𝒛

▪ 𝑓 𝛼𝒙 = 𝛼𝑓 𝒙 ∀𝛼

▪ 𝑓 𝛼𝒙 + 1 − 𝛼 𝒛 = 𝛼𝑓 𝒙 + 1 − 𝛼 𝑓 𝒛 ∀𝛼

Page 17: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

OptimizationConvex function

If 𝑓(𝒙) is convex, then:

▪ 𝑓 𝛼𝒙 + 1 − 𝛼 𝒛 ≤ 𝛼𝑓 𝒙 + 1 − 𝛼 𝑓 𝒛 ∀ 0 ≤ 𝛼 ≤ 1

Convex optimization

If 𝑓(𝒙) is convex, then:

▪ Every local minimum is also a global minimum ☺

Page 18: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionOptimizing the objective

x

y

Page 19: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

OptimizationGradients

Page 20: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

OptimizationGradients

Page 21: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

OptimizationGradients

Page 22: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

OptimizationGradient descent

Page 23: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionExpanding objective before computing gradient

𝐽 𝜽 =1

𝑁𝒚 − 𝑿𝜽 2

2

=1

𝑁𝒚 − 𝑿𝜽 𝑇 𝒚 − 𝑿𝜽

=1

𝑁𝒚𝑇 − 𝜽𝑇𝑿𝑇 𝒚 − 𝑿𝜽

=1

𝑁𝒚𝑇𝒚 − 𝜽𝑇𝑿𝑇𝒚 − 𝒚𝑇𝑿𝜽 + 𝜽𝑇𝑿𝑇𝑿𝜽

=1

𝑁𝒚𝑇𝒚 − 2𝜽𝑇𝑿𝑇𝒚 + 𝜽𝑇𝑿𝑇𝑿𝜽

Page 24: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear Regression

Gradient of objective with respect to parameters

𝐽 𝜽 =1

𝑁𝒚 − 𝑿𝜽 2

2

=1

𝑁𝒚𝑇𝒚 − 2𝜽𝑇𝑿𝑇𝒚 + 𝜽𝑇𝑿𝑇𝑿𝜽

∇𝐽 𝜽 =1

𝑁0 − 2𝑿𝑇𝒚 + 𝟐𝜽𝑇𝑿𝑇𝑿

=1

𝑁0 − 2𝑿𝑇𝒚 + 𝟐𝑿𝑇𝑿𝜽

=2

𝑁−𝑿𝑇𝒚 + 𝑿𝑇𝑿𝜽

𝜕𝒛𝑇𝒖

𝜕𝒛= 𝒖

-- or --

𝜕𝒛𝑇𝒖

𝜕𝒛= 𝒖𝑇

𝜕𝒛𝑇𝑨𝒛

𝜕𝒛= 𝑨 + 𝑨𝑇 𝒛

-- or --

𝜕𝒛𝑇𝑨𝒛

𝜕𝒛= 𝒛𝑇 𝑨 + 𝑨𝑇

Page 25: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionClosed-form solution

∇𝐽 𝜽 =2

𝑁−𝑿𝑇𝒚 + 𝑿𝑇𝑿𝜽

Page 26: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionNumber of solutions

Page 27: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

A Note on Matrix RankUnderlying dimensionality of the data

Page 28: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionMethods for optimizing the objective

▪ Grid search

▪ Random search

▪ Closed-form solution

▪ (Batch) Gradient descent

▪ Stochastic gradient descent

𝑤

𝑏

𝐽(𝑤, 𝑏)

Page 29: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionMethods for optimizing the objective

▪ Grid search

▪ Random search

▪ Closed-form solution

▪ (Batch) Gradient descent

▪ Stochastic gradient descent

𝑤

𝑏

𝐽(𝑤, 𝑏)

Page 30: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear Regression Gradient DescentWhat happens in gradient descent when we have N=1,000,000 training points?

Input x

Ou

tpu

t y

Page 31: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

(Batch) Gradient Descent

31

M

Slide credit: CMU MLD Matt Gormley

Page 32: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Stochastic Gradient Descent (SGD)

32

We need a per-example objective:

Slide credit: CMU MLD Matt Gormley

Page 33: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionOptimizing the objective

x

y

Page 34: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Stochastic Gradient Descent

Page 35: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionOptimizing the objective

x

y

Page 36: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Stochastic Gradient Descent

Page 37: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Linear RegressionOptimizing the objective

x

y

Page 38: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Stochastic Gradient Descent

Page 39: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Stochastic Gradient Descent

Page 40: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Stochastic Gradient Descent (SGD)

40

We need a per-example objective:

Slide credit: CMU MLD Matt Gormley

Page 41: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Stochastic Gradient Descent (SGD)

41

We need a per-example objective:

In practice, it is common to implement SGD using

sampling withoutreplacement (i.e.

shuffle({1,2,…N}), even though most of the

theory is for sampling with replacement (i.e.

Uniform({1,2,…N}).

Slide credit: CMU MLD Matt Gormley

Page 42: Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon, 9/28, 11:59 pm Midterm 1 Mon, 10/5 See Piazza for details Fill out swap-section

Convergence Curves

• SGD reduces MSE much more rapidly than GD

• For GD / SGD, training MSE is initially large due to uninformed initialization

Gradient DescentSGD

Closed-form (normal eq.s)

• Def: an epoch is a single pass through the training data

1. For GD, only one update per epoch

2. For SGD, N updates per epoch N = (# train examples)

Slide credit: CMU MLD Matt Gormley and Eric P. Xing