Deep Learning: Restricted Boltzmann Machines & Deep Belief Nets
Penn Engineering
Based on slides by Geoffrey Hinton, Sue Becker, Yann LeCun, Yoshua Bengio, Frank Wood
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.
Neural Networks

[Figure: feedforward network with inputs at the bottom, hidden layers, and outputs at the top]

Compare outputs with the correct answer to get an error signal.
Back-propagate the error signal to get derivatives for learning.
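The two steps above (a forward pass to get an error signal, then a backward pass to turn it into derivatives) can be sketched in plain Python. This is a minimal illustration with one hidden layer, sigmoid units, and squared error; the function names and the learning rate are illustrative, not from the slides.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    """Forward pass: inputs -> hidden layer -> outputs (all sigmoid units)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in W2]
    return h, y

def backprop_step(x, target, W1, W2, lr=0.5):
    """One gradient step on squared error: compare outputs with the correct
    answer to get an error signal, then back-propagate it for derivatives."""
    h, y = forward(x, W1, W2)
    # Error signal at the outputs (squared error through the sigmoid).
    d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
    # Back-propagated error signal at the hidden layer.
    d_hid = [hj * (1 - hj) * sum(W2[k][j] * d_out[k] for k in range(len(d_out)))
             for j, hj in enumerate(h)]
    for k in range(len(W2)):
        for j in range(len(h)):
            W2[k][j] -= lr * d_out[k] * h[j]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] -= lr * d_hid[j] * x[i]
    return sum((yk - tk) ** 2 for yk, tk in zip(y, target))
```

Repeating `backprop_step` on a training example drives the squared error down, which is exactly the loop the next slide criticizes.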
What is wrong with back-propagation?
• It requires labeled training data
  – Almost all data is unlabeled
• The learning time does not scale well
  – It is very slow in nets with multiple hidden layers
• It can get stuck in poor local optima
  – These are often quite good, but for deep nets they are far from optimal
Motivations
• Supervised training of deep models (e.g. many-layered NNets) is difficult (an optimization problem)
• Shallow models (SVMs, one-hidden-layer NNets, boosting, etc.) are unlikely candidates for learning the high-level abstractions needed for AI
• Unsupervised learning could do "local learning" (each module tries its best to model what it sees)
• Inference (+ learning) is intractable in directed graphical models with many hidden variables
• Current unsupervised learning methods don't easily extend to learning multiple levels of representation
Belief Nets
• A belief net is a directed acyclic graph composed of stochastic variables.
• We observe some of the variables and would like to solve two problems:
  – The inference problem: infer the states of the unobserved variables.
  – The learning problem: adjust the interactions between variables to make the network more likely to generate the observed data.

[Figure: stochastic hidden causes with directed connections down to visible effects]

Use nets composed of layers of stochastic binary variables with weighted connections. Later, we will generalize to other types of variable.
Explaining Away (Judea Pearl)
• Even if two hidden causes are independent, they can become dependent when we observe an effect that they can both influence.
  – If we learn that there was an earthquake, it reduces the probability that the house jumped because of a truck.

[Figure: two causes, "truck hits house" (T) and "earthquake" (E), both pointing at "house jumps" (J), with P(T) = e^{-10}, P(E) = e^{-10}, P(J|T) = 0.9, P(J|E) = 0.9, P(J) = e^{-20}]
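The effect can be checked numerically by brute-force enumeration over a toy two-cause network. The numbers below are hypothetical stand-ins in the spirit of the slide (rare independent causes, a noisy-OR likelihood, and a zero leak probability are all assumptions, chosen only to make the point visible):

```python
import itertools

p_T = 1e-3   # prior on "truck hits house" (assumed)
p_E = 1e-3   # prior on "earthquake" (assumed)

def p_J(t, e):
    """P(house jumps | truck, earthquake): noisy-OR with P(J | one cause) = 0.9."""
    if not t and not e:
        return 0.0  # assumed zero leak: the house never jumps on its own
    return 1.0 - (0.1 if t else 1.0) * (0.1 if e else 1.0)

def posterior_T(observe_E=None):
    """P(T = 1 | J = 1 [, E = observe_E]) by brute-force enumeration."""
    num = den = 0.0
    for t, e in itertools.product([0, 1], repeat=2):
        if observe_E is not None and e != observe_E:
            continue  # condition on the observed earthquake value
        joint = ((p_T if t else 1 - p_T) *
                 (p_E if e else 1 - p_E) *
                 p_J(t, e))
        den += joint
        if t:
            num += joint
    return num / den
```

With these numbers, P(T=1 | J=1) is about 0.5, but once the earthquake is also observed it collapses to about 0.001: learning about the earthquake "explains away" the truck.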
Why multilayer learning is hard in a sigmoid belief net
• To learn Θ, we need the posterior distribution in the first hidden layer.
• Problem 1: the posterior is typically intractable because of "explaining away".
• Problem 2: the posterior depends on the prior created by the higher layers as well as on the likelihood.
  – So to learn Θ, we need to know the weights in the higher layers, even if we are only approximating the posterior. All the weights interact.
• Problem 3: we need to integrate over all possible configurations of the higher variables to get the prior for the first hidden layer. Yuk!

[Figure: a stack of hidden-variable layers above the data; Θ is the likelihood weights connecting the first hidden layer to the data, and the layers above define the prior]
Stochastic binary neurons
These have a state of 1 or 0, which is a stochastic function of the neuron's bias $b$ and the input states it receives from other neurons:

$$p(a_i = 1) = \frac{1}{1 + \exp(-b_i - \sum_j s_j \theta_{ji})}$$

[Figure: the logistic function of the total input $b_i + \sum_j s_j \theta_{ji}$, rising from 0 to 1 and passing through 0.5 at zero total input]
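As a sketch in plain Python (the helper names are illustrative), the probability above and the biased coin flip it drives look like this:

```python
import math
import random

def p_on(bias, states, weights):
    """p(a_i = 1) = 1 / (1 + exp(-b_i - sum_j s_j * theta_ji))."""
    total = bias + sum(s * w for s, w in zip(states, weights))
    return 1.0 / (1.0 + math.exp(-total))

def sample_unit(bias, states, weights, rng=random):
    """Make the biased random decision: state 1 with probability p_on, else 0."""
    return 1 if rng.random() < p_on(bias, states, weights) else 0
```

Over many draws the empirical frequency of 1s matches `p_on`, which is all "stochastic binary" means here.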
Stochastic units
Replace the binary threshold units by binary stochastic units that make biased random decisions:

$$P(a_i = 1) = \frac{1}{1 + \exp(-\sum_j s_j \theta_{ji} / T)} = \frac{1}{1 + \exp(-\Delta E_i / T)}$$

where $T$ is the temperature and the energy gap is $\Delta E_i = E(a_i = 0) - E(a_i = 1)$.
– The temperature controls the amount of noise.
– Decreasing all the energy gaps between configurations is equivalent to raising the noise level.
Restricted Boltzmann Machines
• Restrict the connectivity to make learning easier
  – Only one layer of hidden units
    • Deal with more layers later
  – No connections between hidden units
• In an RBM, the hidden units are conditionally independent given the visible states
  – So we can quickly get an unbiased sample from the posterior distribution when given a data vector
  – This is a big advantage over directed belief nets

[Figure: bipartite graph with visible units i below and hidden units j above]
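Conditional independence is what makes the "quick unbiased sample" possible: one parallel pass over the hidden units samples exactly from p(h | v). A minimal sketch (the function name is illustrative; the weight matrix is indexed `W[i][j]` with visible `i` and hidden `j`, and hidden biases are included):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample_hidden(v, W, hid_bias, rng=random):
    """Sample h ~ p(h | v) in a single parallel pass.

    Because an RBM has no hidden-to-hidden connections, the hidden units are
    conditionally independent given v, so sampling each unit from its own
    logistic probability yields an unbiased sample from the full posterior."""
    n_vis, n_hid = len(v), len(hid_bias)
    h = []
    for j in range(n_hid):
        p = sigmoid(hid_bias[j] + sum(v[i] * W[i][j] for i in range(n_vis)))
        h.append(1 if rng.random() < p else 0)
    return h
```

In a directed belief net the same posterior sample would require dealing with explaining away; here it is one matrix-vector pass.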
The energy of a joint configuration (ignoring bias terms)

$$E(v, h) = -\sum_{i,j} v_i h_j \theta_{ij}$$

where $v_i$ is the binary state of visible unit $i$, $h_j$ is the binary state of hidden unit $j$, and $\theta_{ij}$ is the weight between units $i$ and $j$. For the energy of configuration $v$ on the visible units and $h$ on the hidden units,

$$-\frac{\partial E(v, h)}{\partial \theta_{ij}} = v_i h_j$$
Weights → Energies → Probabilities
• Each possible joint configuration of the visible and hidden units has an energy
  – The energy is determined by the weights and biases
• The energy of a joint configuration of the visible and hidden units determines its probability:

$$P(v, h) \propto e^{-E(v, h)}$$

• The probability of a configuration over the visible units is found by summing the probabilities of all the joint configurations that contain it.
Using energies to define probabilities
• Probability of a joint configuration over both visible and hidden units:

$$P(v, h) = \frac{e^{-E(v, h)}}{\sum_{u,g} e^{-E(u, g)}}$$

• Probability of a particular configuration of the visible units:

$$P(v) = \frac{\sum_h e^{-E(v, h)}}{\sum_{u,g} e^{-E(u, g)}}$$
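For a network small enough to enumerate, both formulas can be computed directly. This is a brute-force sketch (illustrative names; bias terms are ignored, as in the energy slide above):

```python
import itertools
import math

def energy(v, h, W):
    """E(v, h) = -sum_{i,j} v_i h_j theta_ij (bias terms ignored)."""
    return -sum(v[i] * h[j] * W[i][j]
                for i in range(len(v)) for j in range(len(h)))

def joint_and_marginal(W, n_vis, n_hid):
    """Compute P(v, h) = exp(-E) / Z and P(v) = sum_h P(v, h) by enumeration."""
    configs = list(itertools.product([0, 1], repeat=n_vis + n_hid))
    Z = sum(math.exp(-energy(c[:n_vis], c[n_vis:], W)) for c in configs)
    joint = {c: math.exp(-energy(c[:n_vis], c[n_vis:], W)) / Z for c in configs}
    marginal = {}
    for c, p in joint.items():
        # Sum the joint probabilities of all configurations containing this v.
        marginal[c[:n_vis]] = marginal.get(c[:n_vis], 0.0) + p
    return joint, marginal
```

With a single positive weight, the configuration with both units on has the lowest energy and therefore the highest probability, and its visible marginal dominates.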
A picture of the Boltzmann machine learning algorithm for an RBM

[Figure: alternating Gibbs chain between the visible and hidden layers at t = 0, 1, 2, ..., ∞; the final sample is a "fantasy". The correlation ⟨v_i h_j⟩^0 is measured at t = 0 and ⟨v_i h_j⟩^∞ at equilibrium.]

Start with a training vector on the visible units. Then alternate between updating all the hidden units in parallel and updating all the visible units in parallel.

$$\frac{\partial \log P(v)}{\partial \theta_{ij}} = \langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^\infty$$
A very surprising fact
• Everything that one weight needs to know about the other weights and the data in order to do maximum likelihood learning is contained in the difference of two correlations.

Derivative of the log probability of one training vector:

$$\frac{\partial \log P(v)}{\partial \theta_{ij}} = \langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^\infty$$

⟨v_i h_j⟩^0: expected value of the product of states at thermal equilibrium when the training vector is clamped on the visible units.
⟨v_i h_j⟩^∞: expected value of the product of states at thermal equilibrium when nothing is clamped.
A picture of the Boltzmann machine learning algorithm for an RBM

[Figure: the same alternating chain, with ⟨v_i h_j⟩^0 measured at t = 0, ⟨v_i h_j⟩^1 at t = 1, and ⟨v_i h_j⟩^∞ at t = ∞]

Problem: this Markov chain may take a very long time to converge!
Solution: Contrastive Divergence
Contrastive Divergence Learning: a quick way to learn an RBM

[Figure: the chain truncated after one full step, measuring ⟨v_i h_j⟩^0 on the data at t = 0 and ⟨v_i h_j⟩^1 on the reconstruction at t = 1]

Start with a training vector on the visible units.
1. Update all the hidden units in parallel.
2. Update all the visible units in parallel to get a "reconstruction".
3. Update the hidden units again.

$$\Delta \theta_{ij} = \epsilon \left( \langle v_i h_j \rangle^0 - \langle v_i h_j \rangle^1 \right)$$

This is not following the gradient of the log likelihood. But it works well. It is approximately following the gradient of another objective function (Carreira-Perpinan & Hinton, 2005).
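The three update steps and the weight rule can be sketched as a single CD-1 step in plain Python (illustrative names; biases are omitted to match the earlier energy function, and the `rng` parameter is an assumption that makes the sketch deterministic to test):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cd1_step(v0, W, lr=0.1, rng=random):
    """One CD-1 update on weight matrix W[i][j] (visible i, hidden j):
      1. sample h0 ~ p(h | v0)   -- measures <v_i h_j>^0 on the data
      2. sample v1 ~ p(v | h0)   -- the "reconstruction"
      3. sample h1 ~ p(h | v1)
      4. W_ij += lr * (v0_i * h0_j - v1_i * h1_j)
    Returns the reconstruction."""
    n_vis, n_hid = len(W), len(W[0])
    h0 = [1 if rng.random() < sigmoid(sum(v0[i] * W[i][j] for i in range(n_vis))) else 0
          for j in range(n_hid)]
    v1 = [1 if rng.random() < sigmoid(sum(h0[j] * W[i][j] for j in range(n_hid))) else 0
          for i in range(n_vis)]
    h1 = [1 if rng.random() < sigmoid(sum(v1[i] * W[i][j] for i in range(n_vis))) else 0
          for j in range(n_hid)]
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
    return v1
```

Note that step 2 uses the same weights transposed: visible and hidden conditionals share one weight matrix in an RBM.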
How to learn a set of features that are good for reconstructing images of the digit 2

[Figure: a 16×16 pixel image connected to 50 binary feature neurons, shown once for the data phase and once for the reconstruction phase]

• data (reality): increment the weights between an active pixel and an active feature.
• reconstruction (better than reality): decrement the weights between an active pixel and an active feature.
The final 50 × 256 weights

[Figure: the learned weight images; each neuron grabs a different feature]
How well can we reconstruct the digit images from the binary feature activations?

[Figure: pairs of data and reconstructions from the activated binary features]

• New test images from the digit class that the model was trained on.
• Images from an unfamiliar digit class (the network tries to see every image as a 2).
Using an RBM to learn a model of a digit class

[Figure: an RBM with 256 visible units (pixels) and 100 hidden units (features), trained with data and reconstruction phases; example reconstructions by a model trained on 2's and by a model trained on 3's]
Training a Deep Belief Network (the main reason RBMs are interesting)
• First train a layer of features that receive input directly from the pixels.
• Then treat the activations of the trained features as if they were pixels and learn features of features in a second hidden layer.
• It can be proved that each time we add another layer of features we improve a variational lower bound on the log probability of the training data.
  – The proof is slightly complicated.
  – But it is based on a neat equivalence between an RBM and a deep directed model.
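The greedy recipe in the first two bullets can be sketched as a loop. This is an illustrative skeleton: `train_rbm` stands for any caller-supplied CD trainer, biases are omitted, and passing mean hidden activations upward (rather than sampled binary states) is a common simplification, not the only choice:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, W):
    """p(h_j = 1 | v) for one RBM layer (biases omitted for brevity)."""
    return [sigmoid(sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(W[0]))]

def train_greedy_stack(data, layer_sizes, train_rbm, rng=random):
    """Greedy layer-wise pretraining: train an RBM on its input, then treat
    the resulting feature activations as if they were pixels and train the
    next RBM on them (features of features)."""
    layers, inputs = [], data
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        W = [[rng.gauss(0.0, 0.01) for _ in range(n_out)] for _ in range(n_in)]
        train_rbm(W, inputs)                          # caller-supplied CD trainer
        layers.append(W)
        inputs = [hidden_probs(v, W) for v in inputs]  # next layer's "pixels"
    return layers
```

Each pass through the loop is one new layer of the DBN; the variational-bound argument on this slide is what justifies stacking them.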
The Generative Model After Learning 3 Layers

To generate data:
1. Get an equilibrium sample from the top-level RBM by performing alternating Gibbs sampling for a long time.
2. Perform a top-down pass to get states for all the other layers.

So the lower-level bottom-up connections are not part of the generative model. They are just used for inference.

[Figure: data ← h1 ← h2 ↔ h3, with weights Θ1, Θ2, Θ3; the top two layers form the undirected RBM]
Why does greedy learning work?
• Each RBM converts its data distribution into an aggregated posterior distribution over its hidden units.
• This divides the task of modeling its data into two tasks:
  – Task 1: learn generative weights that can convert the aggregated posterior distribution over the hidden units back into the data distribution, i.e. model P(v | h, Θ).
  – Task 2: learn to model the aggregated posterior distribution over the hidden units, i.e. model P(h | Θ).
  – The RBM does a good job of Task 1 and a moderately good job of Task 2.
• Task 2 is easier (for the next RBM) than modeling the original data because the aggregated posterior distribution is closer to a distribution that an RBM can model perfectly.

[Figure: the data distribution on the visible units, the aggregated posterior distribution on the hidden units, and the two tasks connecting them]
Why does greedy learning work?
• The weights Θ in the bottom-level RBM define P(v | h), and they also, indirectly, define P(h).
• So we can express the RBM model as

$$P(v) = \sum_h P(v \mid h, \Theta) \, P(h \mid \Theta)$$

• If we leave P(v | h, Θ) alone and improve P(h | Θ), we will improve P(v).
• To improve P(h), we need it to be a better model of the aggregated posterior distribution over hidden vectors produced by applying Θ to the data.
  – This is accomplished by the next higher layer.
Why greedy learning works
• Each time we learn a new layer, the inference at the layer below becomes incorrect, but the variational bound on the log prob of the data improves (only true in theory).
• Since the bound starts as an equality, learning a new layer never decreases the log prob of the data, provided we start the learning from the tied weights that implement the complementary prior.
• Now that we have a guarantee, we can loosen the restrictions and still feel confident:
  – Allow layers to vary in size.
  – Do not start the learning at each layer from the weights in the layer below.
A neural network model of digit recognition

[Figure: a 28×28 pixel image feeding 500 units, then 500 units, then 2000 top-level units that are also connected to 10 label units]

• The model learns a joint density for labels and images.
• To perform recognition, we can start with a neutral state of the label units and do one or two iterations of the top-level RBM.
• Or we can just compute the free energy of the RBM with each of the 10 labels.
• The top two layers form a restricted Boltzmann machine whose free-energy landscape models the low-dimensional manifolds of the digits. The valleys have names.
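The free-energy route to recognition can be sketched as follows. For binary hidden units, summing them out gives the closed form F(v) = -Σ_i b_i v_i - Σ_j log(1 + e^{x_j}), with x_j the total input to hidden unit j. The visible layout here (image pixels followed by a one-hot label vector) and all names are illustrative assumptions, not the exact architecture on the slide:

```python
import math

def free_energy(v, W, vis_bias, hid_bias):
    """F(v) = -sum_i b_i v_i - sum_j log(1 + exp(x_j)),
    where x_j = c_j + sum_i v_i W_ij is the total input to hidden unit j.
    exp(-F(v)) is proportional to P(v), so lower free energy means a more
    probable visible vector."""
    bias_term = sum(b * vi for b, vi in zip(vis_bias, v))
    hidden_term = 0.0
    for j in range(len(hid_bias)):
        x_j = hid_bias[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        hidden_term += math.log1p(math.exp(x_j))
    return -bias_term - hidden_term

def recognize(image, n_labels, W, vis_bias, hid_bias):
    """Clamp the image with each candidate one-hot label appended to the
    visible vector, and pick the label whose configuration has the lowest
    free energy (the deepest valley)."""
    def onehot(k):
        return [1 if i == k else 0 for i in range(n_labels)]
    return min(range(n_labels),
               key=lambda k: free_energy(image + onehot(k), W, vis_bias, hid_bias))
```

This is why the free-energy check is cheap: it needs one pass per label, with no Gibbs sampling at all.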
Movie of the network generating digits (available at www.cs.toronto/~hinton)
Fine-tuning with a contrastive version of the "wake-sleep" algorithm

After learning many layers of features, we can fine-tune the features to improve generation.
1. Do a stochastic bottom-up pass.
   – Adjust the top-down weights to be good at reconstructing the feature activities in the layer below.
2. Do a few iterations of sampling in the top-level RBM.
   – Adjust the weights in the top-level RBM.
3. Do a stochastic top-down pass.
   – Adjust the bottom-up weights to be good at reconstructing the feature activities in the layer above.

Not required! But it helps the recognition rate.
Limits of the Generative Model
1. Designed for images where non-binary values can be treated as probabilities.
2. Top-down feedback exists only in the highest (associative) layer.
3. No systematic way to deal with invariance.
4. Assumes segmentation has already been performed and does not learn to attend to the most informative parts of objects.
Deep Net Activation Functions
Other Deep Architectures: Convolutional Neural Network
[Image credit: http://timdettmers.com/2015/03/26/convolution-deep-learning/]
Other Deep Architectures: Convolutional Neural Network
[Image credit: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/]
[Image credit: http://rnd.azoft.com/wp-content/uploads_rnd/2016/11/overall-1024x256.png]
Other Deep Architectures: Long Short-Term Memory (LSTM)
[Image credits: http://colah.github.io/posts/2015-08-Understanding-LSTMs/]
Deep Learning in the Headlines
Deep Belief Net on Face Images

[Figure: feature hierarchy learned from faces: pixels → edges → object parts (combinations of edges) → object models]

Based on materials by Andrew Ng
Learning of Object Parts

Examples of learned object parts from object categories: faces, cars, elephants, chairs.

Slide credit: Andrew Ng
Training on Multiple Objects

Trained on 4 classes (cars, faces, motorbikes, airplanes).
Second layer: shared features and object-specific features.
Third layer: more specific features.

Slide credit: Andrew Ng
Scene Labeling via Deep Learning
[Farabet et al., ICML 2012, PAMI 2013]
Inference from Deep Learned Models

Generating posterior samples from faces by "filling in" experiments (cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.

[Figure: input images; samples from feedforward inference (control); samples from full posterior inference]

Slide credit: Andrew Ng
Machine Learning in Automatic Speech Recognition

A typical speech recognition system: ML is used to predict phone states from the sound spectrogram. Deep learning has state-of-the-art results:

| # Hidden Layers | 1 | 2 | 4 | 8 | 10 | 12 |
|---|---|---|---|---|---|---|
| Word Error Rate % | 16.0 | 12.8 | 11.4 | 10.9 | 11.0 | 11.1 |

Baseline GMM performance = 15.4% [Zeiler et al., "On rectified linear units for speech recognition", ICASSP 2013]
Impact of Deep Learning in Speech Technology

Slide credit: Li Deng, MS Research