linear regression - penn engineering › ~cis519 › spring2019 › ... · based on slide by...
![Page 1: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/1.jpg)
Linear Regression
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.
![Page 2: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/2.jpg)
Regression
Given:
– Data $X = \{x^{(1)}, \ldots, x^{(n)}\}$ where $x^{(i)} \in \mathbb{R}^d$
– Corresponding labels $y = \{y^{(1)}, \ldots, y^{(n)}\}$ where $y^{(i)} \in \mathbb{R}$
[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) versus Year, 1970 to 2020, with a Linear Regression fit and a Quadratic Regression fit]
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
![Page 3: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/3.jpg)
Prostate Cancer Dataset
• 97 samples, partitioned into 67 train / 30 test
• Eight predictors (features):
– 6 continuous (4 log transforms), 1 binary, 1 ordinal
• Continuous outcome variable:
– lpsa: log(prostate specific antigen level)
Based on slide by Jeff Howbert
![Page 4: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/4.jpg)
Linear Regression
• Hypothesis:
$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j$$
Assume $x_0 = 1$
• Fit model by minimizing sum of squared errors
[Figure: fitted line (y against x) and fitted plane through the data]
Figures are courtesy of Greg Shakhnarovich
![Page 5: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/5.jpg)
Least Squares Linear Regression
• Cost Function
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$
• Fit by solving
$$\min_\theta J(\theta)$$
![Page 6: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/6.jpg)
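Intuition Behind Cost Function
For insight on J(), let's assume $x \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$
Based on example by Andrew Ng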
![Page 7: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/7.jpg)
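Intuition Behind Cost Function
For insight on J(), let's assume $x \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$
[Figure, left panel: y against x (for fixed $\theta$, this is a function of x). Right panel: J against $\theta_1$ (function of the parameter)]
Based on example by Andrew Ng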
![Page 8: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/8.jpg)
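Intuition Behind Cost Function
For insight on J(), let's assume $x \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$
[Figure, left panel: y against x (for fixed $\theta$, this is a function of x). Right panel: J against $\theta_1$ (function of the parameter)]
$$J([0, 0.5]) = \frac{1}{2 \times 3} \left[ (0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2 \right] \approx 0.58$$
Based on example by Andrew Ng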
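The arithmetic on this slide can be checked with a few lines of plain Python. This is a minimal sketch, assuming the three training points are (1, 1), (2, 2), (3, 3), which is what the squared-error terms in the worked value imply; the function names `h` and `cost` are illustrative, not from the slides.

```python
# Toy data inferred from the worked numbers on the slide: (1,1), (2,2), (3,3).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

def h(theta, x):
    """Hypothesis h_theta(x) = theta0 + theta1 * x for a single feature."""
    return theta[0] + theta[1] * x

def cost(theta, xs, ys):
    """Least-squares cost J(theta) = 1/(2n) * sum of squared errors."""
    n = len(xs)
    return sum((h(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

j_half = cost([0.0, 0.5], xs, ys)   # (0.25 + 1 + 2.25) / 6
j_zero = cost([0.0, 0.0], xs, ys)   # (1 + 4 + 9) / 6
j_one = cost([0.0, 1.0], xs, ys)    # perfect fit on this data, so J = 0
print(round(j_half, 2), round(j_zero, 3), j_one)
```

On this data the cost is smallest at $\theta = [0, 1]$ and grows as $\theta_1$ moves away from 1 in either direction, which is the bowl shape traced in the right-hand panel.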
![Page 9: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/9.jpg)
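Intuition Behind Cost Function
For insight on J(), let's assume $x \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$
[Figure, left panel: y against x (for fixed $\theta$, this is a function of x). Right panel: J against $\theta_1$ (function of the parameter)]
$$J([0, 0]) \approx 2.333$$
J() is convex
Based on example by Andrew Ng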
![Page 10: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/10.jpg)
Intuition Behind Cost Function
Slide by Andrew Ng
![Page 11: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/11.jpg)
Intuition Behind Cost Function
[Figure, left panel: (for fixed $\theta$, this is a function of x). Right panel: (function of the parameters)]
Slide by Andrew Ng
![Page 12: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/12.jpg)
![Page 13: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/13.jpg)
![Page 14: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/14.jpg)
![Page 15: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/15.jpg)
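Basic Search Procedure
• Choose initial value for $\theta$
• Until we reach a minimum:
– Choose a new value for $\theta$ to reduce $J(\theta)$
[Figure: surface plot of $J(\theta_0, \theta_1)$ over $\theta_0$ and $\theta_1$]
Figure by Andrew Ng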
![Page 16: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/16.jpg)
![Page 17: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/17.jpg)
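Basic Search Procedure
• Choose initial value for $\theta$
• Until we reach a minimum:
– Choose a new value for $\theta$ to reduce $J(\theta)$
[Figure: surface plot of $J(\theta_0, \theta_1)$ over $\theta_0$ and $\theta_1$, with the search path for $\theta$]
Figure by Andrew Ng
Since the least squares objective function is convex, we don't need to worry about local minima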
![Page 18: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/18.jpg)
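Gradient Descent
• Initialize $\theta$
• Repeat until convergence
$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \qquad \text{simultaneous update for } j = 0 \ldots d$$
$\alpha$: learning rate (small), e.g., $\alpha = 0.05$
[Figure: $J(\theta)$ plotted against $\theta$, with a descent step of size set by $\alpha$]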
![Page 19: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/19.jpg)
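Gradient Descent
• Initialize $\theta$
• Repeat until convergence
$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \qquad \text{simultaneous update for } j = 0 \ldots d$$
For Linear Regression:
$$\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta) &= \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 \\
&= \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)^2 \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \times \frac{\partial}{\partial \theta_j} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) x_j^{(i)} \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
\end{aligned}$$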
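One way to gain confidence in the derived partial derivative is to compare it against a finite-difference approximation. The sketch below assumes NumPy and uses arbitrary synthetic data; `cost` and `gradient` are illustrative names, not part of the slides.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2n) * sum_i (h_theta(x_i) - y_i)^2, with h = X @ theta."""
    r = X @ theta - y
    return float(r @ r) / (2 * len(y))

def gradient(theta, X, y):
    """Analytic gradient: dJ/dtheta_j = 1/n * sum_i (h_theta(x_i) - y_i) * x_ij."""
    return X.T @ (X @ theta - y) / len(y)

rng = np.random.default_rng(0)            # arbitrary synthetic data
X = np.column_stack([np.ones(20), rng.normal(size=(20, 3))])  # x0 = 1 column
y = rng.normal(size=20)
theta = rng.normal(size=4)

# Central finite differences approximate each partial derivative.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(4)
])
max_diff = float(np.max(np.abs(numeric - gradient(theta, X, y))))
print(max_diff)
```

Because J is quadratic in $\theta$, the central difference agrees with the analytic gradient up to floating-point rounding, so `max_diff` should be tiny.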
![Page 20: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/20.jpg)
![Page 21: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/21.jpg)
![Page 22: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/22.jpg)
![Page 23: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/23.jpg)
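Gradient Descent for Linear Regression
• Initialize $\theta$
• Repeat until convergence
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} \qquad \text{simultaneous update for } j = 0 \ldots d$$
• To achieve simultaneous update
• At the start of each GD iteration, compute $h_\theta\left(x^{(i)}\right)$
• Use this stored value in the update step loop
• Assume convergence when $\|\theta_{new} - \theta_{old}\|_2 < \epsilon$
L2 norm: $\|v\|_2 = \sqrt{\sum_i v_i^2} = \sqrt{v_1^2 + v_2^2 + \ldots + v_{|v|}^2}$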
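The update rule, the stored predictions, and the L2-norm stopping test can be put together as follows. This is a sketch under stated assumptions: NumPy is used for the array arithmetic, the function name `gradient_descent` and the defaults for `eps` and `max_iters` are illustrative choices, and the data is a hypothetical noise-free line.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, eps=1e-8, max_iters=100_000):
    """Batch gradient descent for least-squares linear regression.

    X is n x (d+1) with a leading column of ones; y has length n.
    """
    n = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(max_iters):
        predictions = X @ theta                 # h_theta(x_i), computed once per iteration
        theta_new = theta - alpha * (X.T @ (predictions - y)) / n  # every j updated together
        if np.linalg.norm(theta_new - theta) < eps:   # L2-norm convergence test
            return theta_new
        theta = theta_new
    return theta

# Hypothetical noise-free data generated from y = 1 + 2x.
x = np.linspace(0.0, 2.0, 21)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
theta_hat = gradient_descent(X, y)
print(theta_hat)
```

Computing `predictions` before touching any component of `theta` is what makes the update simultaneous: every $\theta_j$ is moved using the same stored residuals.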
![Page 24: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/24.jpg)
Gradient Descent
[Figure, left panel: (for fixed $\theta$, this is a function of x). Right panel: (function of the parameters)]
h(x) = -900 – 0.1x
Slide by Andrew Ng
![Page 25: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/25.jpg)
![Page 26: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/26.jpg)
![Page 27: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/27.jpg)
![Page 28: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/28.jpg)
![Page 29: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/29.jpg)
![Page 30: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/30.jpg)
![Page 31: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/31.jpg)
![Page 32: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/32.jpg)
![Page 33: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/33.jpg)
Choosing α
α too small: slow convergence
α too large: increasing value for $J(\theta)$
• May overshoot the minimum
• May fail to converge
• May even diverge
To see if gradient descent is working, print out $J(\theta)$ each iteration
• The value should decrease at each iteration
• If it doesn't, adjust α
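The effect of the learning rate can be seen by recording $J$ after every step. The sketch below is plain Python on a hypothetical one-parameter model h(x) = θ₁x with the toy points (1, 1), (2, 2), (3, 3); the three α values are illustrative picks for "too small", "reasonable", and "too large" on this particular data.

```python
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

def cost(theta1):
    """J for the one-parameter model h(x) = theta1 * x."""
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def run(alpha, steps=10):
    """Run gradient descent from theta1 = 0 and record J after every step."""
    theta1, history = 0.0, [cost(0.0)]
    for _ in range(steps):
        grad = sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        theta1 -= alpha * grad
        history.append(cost(theta1))
    return history

for alpha in (0.001, 0.05, 0.5):
    hist = run(alpha)
    decreasing = all(b <= a for a, b in zip(hist, hist[1:]))
    print(f"alpha={alpha}: J start={hist[0]:.3f}, J end={hist[-1]:.3g}, decreasing={decreasing}")
```

With the smallest α the cost barely moves in ten steps, with the middle value it drops quickly, and with the largest value each step overshoots and the cost grows, which is the signal to reduce α.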
![Page 34: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/34.jpg)
Extending Linear Regression to More Complex Models
• The inputs X for linear regression can be:
– Original quantitative inputs
– Transformation of quantitative inputs
• e.g. log, exp, square root, square, etc.
– Polynomial transformation
• example: $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3$
– Basis expansions
– Dummy coding of categorical inputs
– Interactions between variables
• example: $x_3 = x_1 \times x_2$
This allows use of linear regression techniques to fit non-linear datasets.
![Page 35: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/35.jpg)
Linear Basis Function Models
• Generally,
$$h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$$
where each $\phi_j$ is a basis function
• Typically, $\phi_0(x) = 1$ so that $\theta_0$ acts as a bias
• In the simplest case, we use linear basis functions: $\phi_j(x) = x_j$
Based on slide by Christopher Bishop (PRML)
![Page 36: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/36.jpg)
Linear Basis Function Models
• Polynomial basis functions: $\phi_j(x) = x^j$
– These are global; a small change in x affects all basis functions
• Gaussian basis functions: $\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2 s^2} \right)$
– These are local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (width).
Based on slide by Christopher Bishop (PRML)
![Page 37: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/37.jpg)
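Linear Basis Function Models
• Sigmoidal basis functions: $\phi_j(x) = \sigma\left( \frac{x - \mu_j}{s} \right)$ where $\sigma(a) = \frac{1}{1 + \exp(-a)}$
– These are also local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (slope).
Based on slide by Christopher Bishop (PRML)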
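The three basis families can be turned into design matrices in a few lines. This is a sketch assuming the standard PRML forms (powers of x, a Gaussian bump around each centre $\mu_j$, and a logistic sigmoid shifted to $\mu_j$); the function names, the centres, and the scale values are illustrative assumptions.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Columns x^0, x^1, ..., x^degree (phi_0(x) = 1 is the bias column)."""
    return np.vander(x, degree + 1, increasing=True)

def gaussian_basis(x, centers, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), with a leading bias column."""
    phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * s ** 2))
    return np.column_stack([np.ones_like(x), phi])

def sigmoidal_basis(x, centers, s):
    """phi_j(x) = sigma((x - mu_j) / s), sigma(a) = 1 / (1 + exp(-a)), plus bias."""
    phi = 1.0 / (1.0 + np.exp(-(x[:, None] - centers[None, :]) / s))
    return np.column_stack([np.ones_like(x), phi])

x = np.linspace(-1.0, 1.0, 5)
centers = np.array([-0.5, 0.0, 0.5])      # hypothetical mu_j values
Phi_poly = polynomial_basis(x, 3)
Phi_gauss = gaussian_basis(x, centers, s=0.2)
Phi_sig = sigmoidal_basis(x, centers, s=0.1)
print(Phi_poly.shape, Phi_gauss.shape, Phi_sig.shape)
```

Each function returns an n-by-(number of basis functions) matrix that can be used wherever the plain design matrix of raw features would be used.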
![Page 38: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/38.jpg)
Example of Fitting a Polynomial Curve with a Linear Model
$$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \ldots + \theta_p x^p = \sum_{j=0}^{p} \theta_j x^j$$
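Fitting this polynomial is still a linear least-squares problem, because the model is linear in $\theta$ once the powers of x are treated as features. A minimal sketch, assuming NumPy and hypothetical noise-free samples from a known cubic so the recovered coefficients can be checked:

```python
import numpy as np

# Hypothetical noise-free samples from a cubic, so the fit can be checked exactly.
x = np.linspace(-2.0, 2.0, 30)
y = 1.0 - 2.0 * x + 0.5 * x ** 3

p = 3
X = np.vander(x, p + 1, increasing=True)     # columns 1, x, x^2, x^3
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(theta, 6))
```

The recovered `theta` should match the generating coefficients (1, -2, 0, 0.5) up to rounding, even though y is a non-linear function of x.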
![Page 39: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/39.jpg)
Linear Basis Function Models
• Basic Linear Model:
$$h_\theta(x) = \sum_{j=0}^{d} \theta_j x_j$$
• Generalized Linear Model:
$$h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$$
• Once we have replaced the data by the outputs of the basis functions, fitting the generalized model is exactly the same problem as fitting the basic model
– Unless we use the kernel trick; more on that when we cover support vector machines
– Therefore, there is no point in cluttering the math with basis functions
Based on slide by Geoff Hinton
![Page 40: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/40.jpg)
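Linear Algebra Concepts
• A vector in $\mathbb{R}^d$ is an ordered set of d real numbers
– e.g., v = [1,6,3,4] is in $\mathbb{R}^4$
– "[1,6,3,4]" is a column vector: $\begin{pmatrix} 1 \\ 6 \\ 3 \\ 4 \end{pmatrix}$
– as opposed to a row vector: $\begin{pmatrix} 1 & 6 & 3 & 4 \end{pmatrix}$
• An m-by-n matrix is an object with m rows and n columns, where each entry is a real number:
$$\begin{pmatrix} 1 & 2 & 8 \\ 4 & 78 & 6 \\ 9 & 3 & 2 \end{pmatrix}$$
Based on slides by Joseph Bradley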
![Page 41: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/41.jpg)
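Linear Algebra Concepts
• Transpose: reflect vector/matrix on line:
$$\begin{pmatrix} a \\ b \end{pmatrix}^\top = \begin{pmatrix} a & b \end{pmatrix} \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix}^\top = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$
– Note: $(Ax)^\top = x^\top A^\top$ (We'll define multiplication soon…)
• Vector norms:
– $L_p$ norm of $v = (v_1, \ldots, v_k)$ is $\left( \sum_i |v_i|^p \right)^{\frac{1}{p}}$
– Common norms: $L_1$, $L_2$
– $L_{infinity} = \max_i |v_i|$
• Length of a vector v is $L_2(v)$
Based on slides by Joseph Bradley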
![Page 42: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/42.jpg)
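Linear Algebra Concepts
• Vector dot product:
$$u \bullet v = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \bullet \begin{pmatrix} v_1 & v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2$$
– Note: dot product of u with itself = length(u)$^2$ = $\|u\|_2^2$
• Matrix product:
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$$
$$AB = \begin{pmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \end{pmatrix}$$
Based on slides by Joseph Bradley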
![Page 43: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/43.jpg)
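Linear Algebra Concepts
• Vector products:
– Dot product:
$$u \bullet v = u^\top v = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2$$
– Outer product:
$$u v^\top = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \begin{pmatrix} v_1 & v_2 \end{pmatrix} = \begin{pmatrix} u_1 v_1 & u_1 v_2 \\ u_2 v_1 & u_2 v_2 \end{pmatrix}$$
Based on slides by Joseph Bradley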
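The operations reviewed on these slides map directly onto NumPy calls. The sketch below uses small hypothetical vectors and matrices chosen so that every result can be verified by hand.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

dot = u @ v                       # u1*v1 + u2*v2 = 11
outer = np.outer(u, v)            # 2x2 matrix with entries u_i * v_j
product = A @ B                   # matrix product
l1 = np.linalg.norm(v, 1)         # |3| + |4| = 7
l2 = np.linalg.norm(v)            # sqrt(9 + 16) = 5
linf = np.linalg.norm(v, np.inf)  # max(|3|, |4|) = 4

# (A x)^T = x^T A^T, checked on u reshaped into a column vector.
x_col = u.reshape(2, 1)
transpose_rule_holds = np.allclose((A @ x_col).T, x_col.T @ A.T)
print(dot, l1, l2, linf, transpose_rule_holds)
```

Note that a one-dimensional NumPy array is neither a row nor a column; reshaping to `(2, 1)` is what makes the column-vector transpose rule visible.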
![Page 44: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/44.jpg)
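Vectorization
• Benefits of vectorization
– More compact equations
– Faster code (using optimized matrix libraries)
• Consider our model:
$$h(x) = \sum_{j=0}^{d} \theta_j x_j$$
• Let
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \qquad x^\top = \begin{bmatrix} 1 & x_1 & \ldots & x_d \end{bmatrix}$$
• Can write the model in vectorized form as
$$h(x) = \theta^\top x$$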
![Page 45: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/45.jpg)
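Vectorization
• Consider our model for n instances:
$$h\left(x^{(i)}\right) = \sum_{j=0}^{d} \theta_j x_j^{(i)}$$
• Let
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \in \mathbb{R}^{(d+1) \times 1} \qquad X = \begin{bmatrix} 1 & x_1^{(1)} & \ldots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(i)} & \ldots & x_d^{(i)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \ldots & x_d^{(n)} \end{bmatrix} \in \mathbb{R}^{n \times (d+1)}$$
• Can write the model in vectorized form as
$$h_\theta(x) = X\theta$$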
![Page 46: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/46.jpg)
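Vectorization
• For the linear regression cost function:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 = \frac{1}{2n} \sum_{i=1}^{n} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2$$
$$J(\theta) = \frac{1}{2n} (X\theta - y)^\top (X\theta - y)$$
Here $X \in \mathbb{R}^{n \times (d+1)}$ and $\theta \in \mathbb{R}^{(d+1) \times 1}$, so $(X\theta - y)^\top \in \mathbb{R}^{1 \times n}$ and $(X\theta - y) \in \mathbb{R}^{n \times 1}$.
Let:
$$y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$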
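The equivalence of the summation form and the matrix form of the cost can be checked numerically. This is a sketch assuming NumPy and arbitrary synthetic data; `cost_loop` and `cost_vectorized` are illustrative names.

```python
import numpy as np

def cost_loop(theta, X, y):
    """J(theta) computed one instance at a time, as in the summation form."""
    n = len(y)
    total = 0.0
    for i in range(n):
        h_i = sum(theta[j] * X[i, j] for j in range(X.shape[1]))
        total += (h_i - y[i]) ** 2
    return total / (2 * n)

def cost_vectorized(theta, X, y):
    """J(theta) = 1/(2n) * (X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return float(r @ r) / (2 * len(y))

rng = np.random.default_rng(1)               # arbitrary synthetic data
X = np.column_stack([np.ones(50), rng.normal(size=(50, 4))])
y = rng.normal(size=50)
theta = rng.normal(size=5)
print(cost_loop(theta, X, y), cost_vectorized(theta, X, y))
```

Both functions return the same value up to rounding; the vectorized version hands the inner loops to the optimized matrix library, which is where the speed benefit comes from.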
![Page 47: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/47.jpg)
Closed Form Solution
• Instead of using GD, solve for optimal $\theta$ analytically
– Notice that the solution is when $\frac{\partial}{\partial \theta} J(\theta) = 0$
• Derivation:
$$\begin{aligned}
J(\theta) &= \frac{1}{2n} (X\theta - y)^\top (X\theta - y) \\
&\propto \theta^\top X^\top X \theta - y^\top X \theta - \theta^\top X^\top y + y^\top y \\
&\propto \theta^\top X^\top X \theta - 2 \theta^\top X^\top y + y^\top y
\end{aligned}$$
(each term is 1 x 1, so $y^\top X \theta = \theta^\top X^\top y$)
Take derivative and set equal to 0, then solve for $\theta$:
$$\begin{aligned}
\frac{\partial}{\partial \theta} \left( \theta^\top X^\top X \theta - 2 \theta^\top X^\top y + y^\top y \right) &= 0 \\
(X^\top X)\theta - X^\top y &= 0 \\
(X^\top X)\theta &= X^\top y \\
\theta &= (X^\top X)^{-1} X^\top y
\end{aligned}$$
![Page 48: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/48.jpg)
Closed Form Solution
• Can obtain $\theta$ by simply plugging X and $y$ into
$$\theta = (X^\top X)^{-1} X^\top y$$
$$X = \begin{bmatrix} 1 & x_1^{(1)} & \ldots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(i)} & \ldots & x_d^{(i)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \ldots & x_d^{(n)} \end{bmatrix} \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$
• If $X^\top X$ is not invertible (i.e., singular), may need to:
– Use pseudo-inverse instead of the inverse
• In python, numpy.linalg.pinv(a)
– Remove redundant (not linearly independent) features
– Remove extra features to ensure that d ≤ n
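A minimal sketch of both routes follows, assuming NumPy and hypothetical noise-free data. For the invertible case it solves the normal equations with `numpy.linalg.solve` rather than forming the inverse, which is a common numerical choice and not something the slide prescribes. For the singular case it applies `numpy.linalg.pinv` to X directly, which gives the same minimum-norm least-squares answer as using the pseudo-inverse of $X^\top X$; the explicit `rcond=1e-10` cutoff is an illustrative assumption.

```python
import numpy as np

# Hypothetical data: y = 4 + 3*x1 - 2*x2, no noise.
rng = np.random.default_rng(2)
features = rng.normal(size=(40, 2))
X = np.column_stack([np.ones(40), features])
y = 4.0 + 3.0 * features[:, 0] - 2.0 * features[:, 1]

# Solve (X^T X) theta = X^T y without forming the inverse explicitly.
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Duplicate a feature so that X^T X becomes singular, then fall back to pinv.
X_redundant = np.column_stack([X, features[:, 0]])
theta_pinv = np.linalg.pinv(X_redundant, rcond=1e-10) @ y
predictions = X_redundant @ theta_pinv
print(theta_solve, np.allclose(predictions, y))
```

With the duplicated column there are infinitely many coefficient vectors that fit equally well; the pseudo-inverse picks the one with the smallest norm, which splits the weight evenly between the two identical columns.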
![Page 49: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/49.jpg)
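Gradient Descent vs Closed Form
Gradient Descent
• Requires multiple iterations
• Need to choose α
• Works well when n is large
• Can support incremental learning
Closed Form Solution
• Non-iterative
• No need for α
• Slow if d is large
– Computing $(X^\top X)^{-1}$ is roughly $O(d^3)$, since $X^\top X$ is $(d+1) \times (d+1)$; forming $X^\top X$ costs a further $O(nd^2)$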
![Page 50: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/50.jpg)
Improving Learning: Feature Scaling
• Idea: Ensure that features have similar scales
• Makes gradient descent converge much faster
[Figure: two panels with axes $\theta_1$ and $\theta_2$, titled "Before Feature Scaling" and "After Feature Scaling"]
![Page 51: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/51.jpg)
Feature Standardization
• Rescales features to have zero mean and unit variance
– Let $\mu_j$ be the mean of feature j:
$$\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_j^{(i)}$$
– Replace each value with:
$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j} \qquad \text{for } j = 1 \ldots d \text{ (not } x_0\text{!)}$$
• $s_j$ is the standard deviation of feature j
• Could also use the range of feature j ($\max_j - \min_j$) for $s_j$
• Must apply the same transformation to instances for both training and prediction
• Outliers can cause problems
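The point about applying the same transformation at training and prediction time is easy to get wrong in code, so a short sketch may help. It assumes NumPy; the helper names `fit_standardizer` and `standardize` and the numbers are hypothetical.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return per-feature mean and standard deviation from the training set."""
    mu = X_train.mean(axis=0)
    s = X_train.std(axis=0)
    return mu, s

def standardize(X, mu, s):
    """Apply x_j <- (x_j - mu_j) / s_j using statistics fixed at training time."""
    return (X - mu) / s

# Hypothetical raw features (no x0 column yet) on very different scales.
X_train = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]])
X_new = np.array([[2.5, 4000.0]])

mu, s = fit_standardizer(X_train)
Z_train = standardize(X_train, mu, s)
Z_new = standardize(X_new, mu, s)          # same mu and s as training

# Add the constant feature x0 = 1 only after scaling, so it is left untouched.
Z_train_with_bias = np.column_stack([np.ones(len(Z_train)), Z_train])
print(Z_train.mean(axis=0), Z_train.std(axis=0), Z_new)
```

Recomputing `mu` and `s` on the new instances instead of reusing the training values would silently change the meaning of the learned coefficients.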
![Page 52: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/52.jpg)
Quality of Fit
[Figure: three panels of Productivity against Time Spent, titled "Underfitting (high bias)", "Correct fit", and "Overfitting (high variance)"]
Overfitting:
• The learned hypothesis may fit the training set very well ($J(\theta) \approx 0$)
• ...but fails to generalize to new examples
Based on example by Andrew Ng
![Page 53: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/53.jpg)
Regularization
• A method for automatically controlling the complexity of the learned hypothesis
• Idea: penalize for large values of $\theta_j$
– Can incorporate into the cost function
– Works well when we have a lot of features, each of which contributes a bit to predicting the label
• Can also address overfitting by eliminating features (either manually or via model selection)
![Page 54: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/54.jpg)
Regularization
• Linear regression objective function
$$J(\theta) = \underbrace{\frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2}_{\text{model fit to data}} + \underbrace{\frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2}_{\text{regularization}}$$
– $\lambda$ is the regularization parameter ($\lambda \geq 0$)
– No regularization on $\theta_0$!
![Page 55: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/55.jpg)
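Understanding Regularization
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$
• Note that
$$\sum_{j=1}^{d} \theta_j^2 = \|\theta_{1:d}\|_2^2$$
– This is the squared magnitude of the feature coefficient vector!
• We can also think of this as:
$$\sum_{j=1}^{d} (\theta_j - 0)^2 = \|\theta_{1:d} - \vec{0}\|_2^2$$
• L2 regularization pulls coefficients toward 0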
![Page 56: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/56.jpg)
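Understanding Regularization
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$
• What happens as $\lambda \to \infty$?
$$\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$
[Figure: Productivity against Time Spent on Work]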
![Page 57: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/57.jpg)
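Understanding Regularization
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$
• What happens as $\lambda \to \infty$?
$$\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$
The coefficients $\theta_1, \theta_2, \theta_3, \theta_4$ are each driven to 0, leaving only the unregularized constant $\theta_0$.
[Figure: Productivity against Time Spent on Work]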
![Page 58: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/58.jpg)
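Regularized Linear Regression
• Cost Function
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$
• Fit by solving
$$\min_\theta J(\theta)$$
• Gradient update:
$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)$$
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \alpha \lambda \theta_j$$
In the first update the sum term is $\frac{\partial}{\partial \theta_0} J(\theta)$. In the second, the $-\alpha \lambda \theta_j$ term is the regularization, and $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \lambda \theta_j$.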
![Page 59: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/59.jpg)
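Regularized Linear Regression
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$
$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)$$
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \alpha \lambda \theta_j$$
• We can rewrite the gradient step as:
$$\theta_j \leftarrow \theta_j (1 - \alpha \lambda) - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$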
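The rewritten step shows that each iteration first shrinks $\theta_j$ by the factor $(1 - \alpha\lambda)$ and then applies the usual data-fit update. A minimal sketch of that loop, assuming NumPy, a fixed iteration count instead of a convergence test, and arbitrary synthetic data; `ridge_gradient_descent` is an illustrative name.

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, alpha=0.05, iters=5000):
    """Gradient descent on J = 1/(2n) ||X theta - y||^2 + lam/2 * sum_{j>=1} theta_j^2."""
    n = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad_fit = X.T @ (X @ theta - y) / n     # data-fit part, all j at once
        shrink = np.full_like(theta, 1.0 - alpha * lam)
        shrink[0] = 1.0                          # theta_0 is not regularized
        theta = theta * shrink - alpha * grad_fit
    return theta

rng = np.random.default_rng(3)                   # arbitrary synthetic data
features = rng.normal(size=(60, 3))
X = np.column_stack([np.ones(60), features])
y = 2.0 + features @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.normal(size=60)

theta_plain = ridge_gradient_descent(X, y, lam=0.0)
theta_ridge = ridge_gradient_descent(X, y, lam=1.0)
print(np.linalg.norm(theta_plain[1:]), np.linalg.norm(theta_ridge[1:]))
```

The regularized run ends with a smaller coefficient norm than the unregularized one, which is the "pull toward 0" described earlier, while $\theta_0$ is left free to fit the mean of the labels.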
![Page 60: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/60.jpg)
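Regularized Linear Regression
• To incorporate regularization into the closed form solution:
$$\theta = \left( X^\top X + \lambda \begin{bmatrix} 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \end{bmatrix} \right)^{-1} X^\top y$$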
![Page 61: Linear Regression - Penn Engineering › ~cis519 › spring2019 › ... · Based on slide by Christopher Bishop (PRML) Linear Basis Function Models • Sigmoidal basis functions:](https://reader033.vdocuments.site/reader033/viewer/2022060504/5f1dbd8592b54b5a00731b56/html5/thumbnails/61.jpg)
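Regularized Linear Regression
• To incorporate regularization into the closed form solution:
$$\theta = \left( X^\top X + \lambda \begin{bmatrix} 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \end{bmatrix} \right)^{-1} X^\top y$$
• Can derive this the same way, by solving $\frac{\partial}{\partial \theta} J(\theta) = 0$ (with the $\frac{1}{2n}$ scaling of the fit term, the stationarity condition carries a factor of n on $\lambda$, which the formula above absorbs into $\lambda$)
• Can prove that for λ > 0, inverse exists in the equation above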
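The regularized closed form can be sketched directly from the formula on the slide. The code below assumes NumPy and takes the formula literally, with $\lambda$ multiplying the modified identity matrix; under the $\frac{1}{2n}$-scaled cost the matching multiplier would be $n\lambda$. The rank-deficient design is a hypothetical example chosen to show that the system stays solvable for $\lambda > 0$.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * D) theta = X^T y, where D is the identity with D[0, 0] = 0."""
    D = np.eye(X.shape[1])
    D[0, 0] = 0.0                      # leave theta_0 unregularized
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

# Hypothetical rank-deficient design: the last two feature columns are identical,
# so X^T X alone is singular, but adding lam * D with lam > 0 makes it invertible.
rng = np.random.default_rng(4)
x1 = rng.normal(size=30)
X = np.column_stack([np.ones(30), x1, x1])
y = 1.0 + 2.0 * x1

theta = ridge_closed_form(X, y, lam=0.1)
print(theta)
```

The two identical columns receive equal coefficients whose sum is slightly below the generating slope of 2, reflecting the small shrinkage introduced by the penalty.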