
Page 1: Linear Regression & Gradient Descent

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Page 2: Regression

Given:
– Data $X = \{x^{(1)}, \dots, x^{(n)}\}$ where $x^{(i)} \in \mathbb{R}^d$
– Corresponding labels $y = \{y^{(1)}, \dots, y^{(n)}\}$ where $y^{(i)} \in \mathbb{R}$

[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1970–2020, with linear and quadratic regression fits. Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013).]

Page 3: Linear Regression

• Hypothesis: $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j$ (assume $x_0 = 1$)

• Fit model by minimizing sum of squared errors

[Figure: data points with candidate regression fits]

Figures are courtesy of Greg Shakhnarovich

Page 4: Least Squares Linear Regression

• Cost function:

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$

• Fit by solving $\min_\theta J(\theta)$
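This cost translates directly into code. Below is a minimal NumPy sketch of $J(\theta)$ (the names cost, X, y, and theta are assumptions of this example, not from the slides):

import numpy as np

def cost(theta, X, y):
    # J(theta) = 1/(2n) * sum_i (h_theta(x_i) - y_i)^2, where X is the
    # (n, d+1) data matrix with a leading column of ones (x_0 = 1).
    n = len(y)
    residuals = X @ theta - y
    return residuals @ residuals / (2 * n)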

Page 5: Intuition Behind Cost Function

Slide by Andrew Ng

Pages 6–9: Intuition Behind Cost Function

(For fixed $\theta$, $h_\theta(x)$ is a function of $x$; $J(\theta)$ is a function of the parameters. The same caption and pair of plots repeat across these pages for different parameter choices.)

Slides by Andrew Ng

Page 10: Basic Search Procedure

• Choose initial value for $\theta$
• Until we reach a minimum:
  – Choose a new value for $\theta$ to reduce $J(\theta)$

[Figure: surface plot of $J(\theta_0, \theta_1)$]

Figure by Andrew Ng

Pages 11–12: Basic Search Procedure

(The same bullets and figure repeated; page 12 adds the note below.)

Since the least squares objective function is convex, we don't need to worry about local minima in linear regression.

Page 13: Gradient Descent

• Initialize $\theta$
• Repeat until convergence (simultaneous update, for $j = 0 \dots d$):

$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

where $\alpha$ is the learning rate (small), e.g., $\alpha = 0.05$

[Figure: one-dimensional $J(\theta)$ curve illustrating the descent steps]
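As a sketch of what "simultaneous update" means in code (grad_J is an assumed helper returning the full gradient vector; it is not defined on the slides):

import numpy as np

def gd_step(theta, grad_J, alpha=0.05):
    # theta_j <- theta_j - alpha * dJ/dtheta_j for every j. Because the
    # whole gradient is computed before any coordinate changes, all
    # components are updated from the same old theta: a simultaneous update.
    return theta - alpha * np.asarray(grad_J(theta))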

Page 14: Gradient Descent

• Initialize $\theta$
• Repeat until convergence (simultaneous update, for $j = 0 \dots d$):

$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

For linear regression:

$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$

$= \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)^2$

$= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \times \frac{\partial}{\partial \theta_j} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)$

$= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) x_j^{(i)}$

$= \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$
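One way to gain confidence in this derivation is a finite-difference check; here is a hedged sketch on a tiny random problem (all data below are made up purely for the check):

import numpy as np

def cost(theta, X, y):
    r = X @ theta - y
    return r @ r / (2 * len(y))

def grad_J(theta, X, y):
    # Analytic gradient from the derivation: (1/n) sum_i (h - y) * x_i
    return X.T @ (X @ theta - y) / len(y)

rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])  # x_0 = 1 column
y = rng.normal(size=5)
theta = rng.normal(size=3)
eps = 1e-5
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(numeric, grad_J(theta, X, y), atol=1e-6)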

Page 15: Gradient Descent for Linear Regression

• Initialize $\theta$
• Repeat until convergence (simultaneous update, for $j = 0 \dots d$):

$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$

• To achieve the simultaneous update:
  – At the start of each GD iteration, compute $h_\theta\!\left(x^{(i)}\right)$
  – Use this stored value in the update step loop

• Assume convergence when $\left\| \theta_{\text{new}} - \theta_{\text{old}} \right\|_2 < \epsilon$

$L_2$ norm: $\|v\|_2 = \sqrt{\sum_i v_i^2} = \sqrt{v_1^2 + v_2^2 + \dots + v_{|v|}^2}$
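Putting the pieces together, a minimal NumPy sketch of batch gradient descent for linear regression with this convergence test (alpha, eps, and max_iters are illustrative defaults, not values fixed by the slides):

import numpy as np

def gradient_descent(X, y, alpha=0.05, eps=1e-6, max_iters=100_000):
    # X is (n, d+1) with a leading column of ones; stop when the L2 norm
    # of the parameter change drops below eps.
    n, d1 = X.shape
    theta = np.zeros(d1)
    for _ in range(max_iters):
        predictions = X @ theta                 # h_theta(x_i), stored once per iteration
        gradient = X.T @ (predictions - y) / n  # full gradient -> simultaneous update
        theta_new = theta - alpha * gradient
        if np.linalg.norm(theta_new - theta) < eps:
            return theta_new
        theta = theta_new
    return theta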

Page 16: Gradient Descent

(For fixed $\theta$, $h_\theta(x)$ is a function of $x$; $J(\theta)$ is a function of the parameters.)

$h(x) = -900 - 0.1x$

Slide by Andrew Ng

Pages 17–24: Gradient Descent

(The same caption and pair of plots, repeated as gradient descent iterates toward the minimum.)

Slides by Andrew Ng

Page 25: Choosing α

α too small: slow convergence

α too large: increasing value for $J(\theta)$
• May overshoot the minimum
• May fail to converge
• May even diverge

To see if gradient descent is working, print out $J(\theta)$ each iteration
• The value should decrease at each iteration
• If it doesn't, adjust α
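A hedged sketch of this diagnostic (a verbose variant of the gradient descent sketch above; the names are again illustrative):

import numpy as np

def gradient_descent_verbose(X, y, alpha, iters=50):
    # Print J(theta) every iteration; if the printed values grow instead
    # of shrinking, alpha is too large and should be reduced.
    n, d1 = X.shape
    theta = np.zeros(d1)
    for t in range(iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / n
        r = X @ theta - y
        print(f"iter {t}: J(theta) = {r @ r / (2 * n):.6f}")
    return theta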

Page 26: Extending Linear Regression to More Complex Models

• The inputs X for linear regression can be:
  – Original quantitative inputs
  – Transformations of quantitative inputs
    • e.g., log, exp, square root, square, etc.
  – Polynomial transformations
    • example: $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3$
  – Basis expansions
  – Dummy coding of categorical inputs
  – Interactions between variables
    • example: $x_3 = x_1 \times x_2$

This allows use of linear regression techniques to fit non-linear datasets; a sketch of such a feature expansion follows.
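For instance, a minimal sketch of building such an expanded design matrix by hand (the particular columns are illustrative choices):

import numpy as np

def expand_features(x1, x2):
    # Stack a bias, the original inputs, a squared term, and an
    # interaction term; linear regression on these columns can then
    # fit a non-linear surface in (x1, x2).
    return np.column_stack([
        np.ones_like(x1),  # bias column, x_0 = 1
        x1,                # original quantitative inputs
        x2,
        x1 ** 2,           # polynomial transformation
        x1 * x2,           # interaction: x_3 = x_1 * x_2
    ])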

Page 27: Linear Basis Function Models

• Generally, $h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$, where $\phi_j(x)$ is a basis function

• Typically $\phi_0(x) = 1$, so that $\theta_0$ acts as a bias
• In the simplest case, we use linear basis functions: $\phi_j(x) = x_j$

Based on slide by Christopher Bishop (PRML)

Page 28: Linear Basis Function Models

• Polynomial basis functions: $\phi_j(x) = x^j$
  – These are global; a small change in $x$ affects all basis functions

• Gaussian basis functions: $\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2s^2} \right)$
  – These are local; a small change in $x$ only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (width).

Based on slide by Christopher Bishop (PRML)
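A minimal sketch of regression with Gaussian basis functions (the centers mu and width s below are arbitrary illustrative choices; the fit uses a direct least squares solve rather than gradient descent):

import numpy as np

def gaussian_design_matrix(x, mu, s):
    # Phi[i, j] = exp(-(x_i - mu_j)^2 / (2 s^2)), plus a bias column.
    phi = np.exp(-((x[:, None] - mu[None, :]) ** 2) / (2 * s ** 2))
    return np.hstack([np.ones((len(x), 1)), phi])

x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).normal(size=50)
Phi = gaussian_design_matrix(x, mu=np.linspace(0, 1, 9), s=0.1)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)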

Page 29: Linear Basis Function Models

• Sigmoidal basis functions: $\phi_j(x) = \sigma\!\left( \frac{x - \mu_j}{s} \right)$, where $\sigma(a) = \frac{1}{1 + \exp(-a)}$
  – These are also local; a small change in $x$ only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (slope).

Based on slide by Christopher Bishop (PRML)

Page 30: Example of Fitting a Polynomial Curve with a Linear Model

$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_p x^p = \sum_{j=0}^{p} \theta_j x^j$
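Since this model is still linear in $\theta$, a standard least squares routine fits it directly; here is a sketch using NumPy's polynomial fitting (the data and degree are made-up illustrations):

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
y = 1.0 - 2.0 * x + 0.5 * x ** 3 + 0.05 * rng.normal(size=30)
# Least squares over the monomial basis 1, x, ..., x^p (here p = 3);
# returns the coefficients theta_0, ..., theta_p.
theta = np.polynomial.polynomial.polyfit(x, y, deg=3)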

Page 31: Quality of Fit

Overfitting:
• The learned hypothesis may fit the training set very well ($J(\theta) \approx 0$)
• ... but fails to generalize to new examples

[Figure: three Price vs. Size fits: underfitting (high bias), correct fit, overfitting (high variance)]

Based on example by Andrew Ng

Page 32: Regularization

• A method for automatically controlling the complexity of the learned hypothesis

• Idea: penalize large values of $\theta_j$
  – Can incorporate into the cost function
  – Works well when we have a lot of features, each of which contributes a bit to predicting the label

• Can also address overfitting by eliminating features (either manually or via model selection)

Page 33: Regularization

• Linear regression objective function:

$J(\theta) = \underbrace{\frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2}_{\text{model fit to data}} + \underbrace{\frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2}_{\text{regularization}}$

– $\lambda$ is the regularization parameter ($\lambda \geq 0$)
– No regularization on $\theta_0$!
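A hedged NumPy sketch of this regularized cost (note that theta[0] is excluded from the penalty, matching the slide; lam stands in for $\lambda$):

import numpy as np

def cost_ridge(theta, X, y, lam):
    # J(theta) = 1/(2n) * sum_i (h - y)^2 + lam/2 * sum_{j>=1} theta_j^2
    n = len(y)
    r = X @ theta - y
    return r @ r / (2 * n) + lam / 2 * (theta[1:] @ theta[1:])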

Page 34: Understanding Regularization

• Note that $\sum_{j=1}^{d} \theta_j^2 = \|\theta_{1:d}\|_2^2$
  – This is the squared magnitude of the feature coefficient vector!

• We can also think of this as $\sum_{j=1}^{d} (\theta_j - 0)^2 = \|\theta_{1:d} - \vec{0}\|_2^2$

• $L_2$ regularization pulls the coefficients toward 0

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

Page 35: Understanding Regularization

• What happens if we set $\lambda$ to be huge (e.g., $10^{10}$)?
  – The penalty dominates, driving $\theta_1, \dots, \theta_d$ toward 0, so the fit collapses to the constant $h_\theta(x) \approx \theta_0$

[Figure: Price vs. Size with a nearly flat fit, the coefficients annotated as $\approx 0$]

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

Based on example by Andrew Ng

Page 36: Regularized Linear Regression

• Cost function:

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

• Fit by solving $\min_\theta J(\theta)$

• Gradient update ($\frac{\partial}{\partial \theta_0} J(\theta)$ has no regularization term; $\frac{\partial}{\partial \theta_j} J(\theta)$ does):

$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)$

$\theta_j \leftarrow \theta_j - \alpha \left[ \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \lambda \theta_j \right] \quad (j = 1 \dots d)$
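A hedged sketch of one such update step (excluding theta[0] from the penalty, as on the slide; the function name is illustrative):

import numpy as np

def ridge_gd_step(theta, X, y, alpha, lam):
    # theta_0 gets the plain gradient; theta_1..d also get + lam * theta_j.
    n = len(y)
    grad = X.T @ (X @ theta - y) / n
    grad[1:] += lam * theta[1:]   # no regularization on theta_0
    return theta - alpha * grad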

Page 37: Regularized Linear Regression

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)$

$\theta_j \leftarrow \theta_j - \alpha \left[ \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \lambda \theta_j \right]$

• We can rewrite the gradient step as:

$\theta_j \leftarrow \theta_j (1 - \alpha \lambda) - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$

Each step first shrinks $\theta_j$ by the factor $(1 - \alpha \lambda)$ and then applies the usual (unregularized) gradient update, which is why $L_2$ regularization is also known as weight decay.