CS 6956: Deep Learning for NLP
Convolutional Neural Networks for Language
Features from text
Example: Sentiment classification
The goal: Is the sentiment of a sentence positive, negative or neutral?
The film is fun and is host to some truly excellent sequences
Approach: Train a multiclass classifier. What features?
Some words and n-grams are informative, while some are not.
We need to:
1. Identify informative local information
2. Aggregate it into a fixed-size vector representation
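To make "words and n-grams" concrete, here is a minimal Python sketch that lists the n-grams of the example sentence. The helper name `ngrams` and the whitespace tokenization are assumptions made for illustration, not part of the lecture.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "The film is fun and is host to some truly excellent sequences"
tokens = sentence.split()      # naive whitespace tokenization (assumption)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Only a few of these, such as ("truly", "excellent"), carry sentiment. A classifier needs a way to pick them out and summarize them in a vector whose size does not depend on the sentence length.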
Convolutional Neural Networks
Designed to
1. Identify local predictors in a larger input
2. Pool them together to create a feature representation
3. And possibly repeat this in a hierarchical fashion
In the NLP context, they help identify predictive n-grams for a task
Overview
• Convolutional Neural Networks: A brief history
• The two operations in a CNN
  – Convolution
  – Pooling
• Convolution + Pooling as a building block
• CNNs in NLP
• Recurrent networks vs Convolutional networks
Convolutional Neural Networks: Brief history
First arose in the context of vision
• Hubel and Wiesel, 1950s/60s: The mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field
  – David H. Hubel and Torsten Wiesel: Nobel Prize in Physiology or Medicine, 1981
• Fukushima 1980, Neocognitron: Directly inspired by Hubel and Wiesel
  – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers
  – Two operations:
    1. a convolutional layer that reacts to specific patterns, and
    2. a down-sampling layer that aggregates information
• LeCun 1989-today, Convolutional Neural Network: A supervised version
  – Related to convolution kernels in computer vision
  – Very successful on handwriting recognition and other computer vision tasks
• Has become better over recent years with more data and computation
  – Krizhevsky et al. 2012: Image classification on ImageNet
  – The de facto feature extractor for computer vision
Convolutional Neural Networks: Brief history
• Introduced to NLP by Collobert et al., 2011
  – Used as a feature extraction system for semantic role labeling
• Since then, several other applications such as sentiment analysis, question classification, etc.
  – Kalchbrenner et al. 2014, Kim 2014
CNN terminology
Shows its computer vision and signal processing origins
• Filter
  – A function that transforms an input matrix/vector into a scalar feature
  – A filter is a learned feature detector (also called a feature map)
• Channel
  – In computer vision, color images have red, blue and green channels
  – In general, a channel represents a "view of the input" that captures information about an input independent of other channels
    • For example, different kinds of word embeddings could be different channels
    • Channels could themselves be produced by previous convolutional layers
• Receptive field
  – The region of the input that a filter currently focuses on
What is a convolution?
Let's see this using an example for vectors.
We will generalize this to matrices and beyond, but the general idea remains the same.
An example using vectors
A vector x: 2 3 1 3 2 1
Filter f of size n: 1 2 1 (here, the filter size is 3)
The output is also a vector:
$\text{output}_i = \sum_{j=0}^{n-1} f_j \cdot x_{i - \lfloor n/2 \rfloor + j}$
(indices start at 0; entries of x outside the vector are the zero padding)
The filter moves across the vector. At each position, the output is the dot product of the filter with a slice of the vector of that size.
Padding at the beginning and at the end: 0 2 3 1 3 2 1 0
Position 1: (0, 2, 3) · (1, 2, 1) = 7
Position 2: (2, 3, 1) · (1, 2, 1) = 9
Position 3: (3, 1, 3) · (1, 2, 1) = 8
Position 4: (1, 3, 2) · (1, 2, 1) = 9
Position 5: (3, 2, 1) · (1, 2, 1) = 8
Position 6: (2, 1, 0) · (1, 2, 1) = 4
Output: 7 9 8 9 8 4
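The sliding dot product in the vector example can be sketched in a few lines of Python. This is a minimal illustration, assuming zero padding split across both ends so that the output has the same length as the input. The function name `conv1d` is hypothetical. As in most deep learning libraries, the filter is not flipped, which makes no difference for the symmetric filter used here.

```python
def conv1d(x, f):
    """Slide filter f over x with zero padding so the output has len(x) entries."""
    n = len(f)
    left = n // 2                 # zeros added at the beginning
    right = n - 1 - left          # zeros added at the end
    padded = [0] * left + list(x) + [0] * right
    return [
        sum(f[j] * padded[i + j] for j in range(n))   # dot product with one slice
        for i in range(len(x))
    ]

x = [2, 3, 1, 3, 2, 1]
f = [1, 2, 1]
output = conv1d(x, f)
print(output)  # [7, 9, 8, 9, 8, 4]
```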
What is a convolution?
The same idea applies to matrices as well
[Figure: an input matrix, a filter, and the result of convolution]
The filter moves across the matrix.
At each position, the output is the dot product of the filter with a slice of the matrix of that size.
And so on…
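Since the figure is not reproduced here, a small Python sketch may help show the matrix case. It assumes no padding and a stride of 1. The input matrix and filter values are made up for illustration, and `conv2d` is a hypothetical helper name.

```python
def conv2d(matrix, kernel):
    """Narrow (no padding, stride 1) 2D sliding dot product."""
    rows, cols = len(matrix), len(matrix[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(rows - k_rows + 1):
        out_row = []
        for c in range(cols - k_cols + 1):
            total = 0
            # Dot product of the filter with the slice whose top-left corner is (r, c).
            for dr in range(k_rows):
                for dc in range(k_cols):
                    total += kernel[dr][dc] * matrix[r + dr][c + dc]
            out_row.append(total)
        out.append(out_row)
    return out

matrix = [
    [1, 0, 2, 1],
    [0, 1, 3, 0],
    [2, 1, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]
result = conv2d(matrix, kernel)
print(result)  # [[2, 3, 2], [1, 1, 4]]
```

A 3x4 input with a 2x2 filter gives a 2x3 result, because the filter fits in two row positions and three column positions.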
Pooling: An aggregation operation
• A convolution produces a vector/matrix that captures properties of each window
• Pooling combines this information to produce a down-sampled version of that vector/matrix
  – Typically using the maximum or the average value within a window
• Intuition
  – A filter is a feature detector that discovers how well each window matches a feature of interest
  – The most important features should be recognized regardless of their location
  – Answer: Pool the information from different windows together
What is pooling?
An example using vectors
A vector x: 2 3 1 3 2 1
Filter f of size n: 1 2 1
The output is also a vector: 7 9 8 9 8 4
The pooling operation can be applied using a window as well
Example 1: Max pooling with window size 3: 9 9 9 9
Example 2: Average pooling with window size 3: 8 8.67 8.33 7
Example 3: Max pooling with window size = length of the vector: 9
Important note: There are no learned parameters for the pooling operation. It is a deterministic operation.
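The three pooling examples can be checked with a short Python sketch. It assumes a stride of 1 and no padding. The helper names `pool` and `mean` are hypothetical. Note that the function has no parameters to learn: it is fully determined by the window size and the aggregation function.

```python
def pool(values, window, reduce_fn):
    """Apply reduce_fn to every contiguous window (stride 1, no padding)."""
    return [
        reduce_fn(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]

def mean(window_values):
    return sum(window_values) / len(window_values)

conv_output = [7, 9, 8, 9, 8, 4]
max_pooled = pool(conv_output, 3, max)                  # max over each window of 3
avg_pooled = pool(conv_output, 3, mean)                 # average over each window of 3
global_max = pool(conv_output, len(conv_output), max)   # one window covering everything
```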
Typical kinds of pooling
• Max pooling
  – Take the maximum value of the results of the convolution
• Average pooling
  – Uses the average to pool instead of the max
• K-max pooling
  – Take the top K values (for a fixed K)
  – Generalization of max pooling
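K-max pooling is less familiar than the other two, so here is a minimal Python sketch. Keeping the selected values in their original left-to-right order is a design choice assumed here. The slide only says "take the top K values". The name `k_max_pool` is hypothetical.

```python
def k_max_pool(values, k):
    """Keep the k largest values, in their original left-to-right order."""
    # Indices of the k largest entries (ties broken by earlier position).
    top = sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]
    return [values[i] for i in sorted(top)]

conv_output = [7, 9, 8, 9, 8, 4]
top1 = k_max_pool(conv_output, 1)   # same as max pooling over the whole vector
top2 = k_max_pool(conv_output, 2)
top3 = k_max_pool(conv_output, 3)
```

With K = 1 this reduces to max pooling over the whole vector, which is the sense in which it generalizes max pooling.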
Convolution + Pooling = one layer
• Input: a matrix. Convolution will operate over windows of this matrix.
  – This could be extended to general tensors as well
• The window size defines the receptive field
  – We will refer to the window as $x_i$
• A filter is defined by some parameters (that will be learned)
  – In general, a matrix u of the same shape as the window and a bias b
• Convolution: Iterate over all windows and apply the filter
  – Typically has a non-linearity g (e.g. ReLU)
  $p_i = g(u \cdot x_i + b)$
• Pooling: Aggregate the $p_i$'s into a down-sampled version, sometimes a single number
• Typically, there are many filters, each of which is pooled independently
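Putting the pieces together, the layer described above can be sketched in Python as follows. The sketch assumes that each window spans a fixed number of full rows of the input matrix, that g is ReLU, and that pooling is a max over all windows. The input, the filter weights and the names `conv_pool_layer`, `dot` and `relu` are made up for illustration. In a real network u and b would be learned.

```python
def relu(z):
    return max(0.0, z)

def dot(u, x):
    """Elementwise product summed over two equally shaped matrices."""
    return sum(a * b for u_row, x_row in zip(u, x) for a, b in zip(u_row, x_row))

def conv_pool_layer(rows, filters, window):
    """Return one max-pooled feature per (u, b) filter."""
    windows = [rows[i:i + window] for i in range(len(rows) - window + 1)]
    features = []
    for u, b in filters:
        p = [relu(dot(u, x_i) + b) for x_i in windows]  # p_i = g(u . x_i + b)
        features.append(max(p))                          # each filter is pooled on its own
    return features

rows = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 1.0]]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),      # filter 1: matrix u and bias b
    ([[-1.0, 0.0], [0.0, -1.0]], 0.5),    # filter 2
]
features = conv_pool_layer(rows, filters, window=2)
print(features)  # [4.0, 0.0]
```

The output has one entry per filter, regardless of how many rows the input has.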
Hyperparameters
• Filter sizes: How big should the filter be?
  – Typically 3x3, 5x5, etc.
• Stride: How does the filter move along the input?
  – It could skip some steps, or not.
• How many filters should there be?
• Padding: Should there be padding or not? If so, should the padding be zeros or random?
• How big should the pooling window be?
• What kind of pooling: Average, Max, L2 norm?
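Filter size, stride and padding together determine how many outputs a convolution produces along one dimension. The Python sketch below uses the standard counting formula, floor((length + 2 * padding - filter size) / stride) + 1, with the assumption that `padding` entries are added on each side. The function name is hypothetical.

```python
def conv_output_length(input_length, filter_size, stride=1, padding=0):
    """Number of filter positions along one dimension.

    padding is the number of entries added on EACH side of the input.
    """
    padded = input_length + 2 * padding
    if padded < filter_size:
        return 0                                  # the filter does not fit at all
    return (padded - filter_size) // stride + 1

same = conv_output_length(6, 3, stride=1, padding=1)   # 6, as in the vector example
narrow = conv_output_length(6, 3)                      # 4, no padding
strided = conv_output_length(6, 3, stride=2)           # 2, skipping every other position
```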
Example: LeNet
An example network that uses these building blocks
LeNet-5 was proposed by LeCun et al. in 1998 for handwriting recognition
Had several levels of convolution-pooling
Convolutional Neural Networks in NLP
• Goal: To represent a sequence of words as a feature vector
• Approach:
  – Represent the sequence of words by sequence(s) of embeddings
  – Convolve with several filters
  – Pool across the sequence to get a feature vector of a fixed dimensionality
Convolutional Neural Networks in NLP
Goal: To represent a sequence of words as a feature vector
Suppose we want to classify this sentence: I ate cake today
[Figure: word embeddings for "I", "ate", "cake", "today", with a padding vector before and after the sentence]
1. Look up the word embeddings and add padding at both ends
2. Apply a filter: convolution with one filter, followed by pooling across the sentence (often max pooling), gives one feature
3. Convolution with many filters, followed by pooling across the sentence (often max pooling), gives a feature vector
There can be several filters (sometimes called kernels, or feature maps)
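The pipeline can be sketched end to end in plain Python. Everything numeric here is an assumption for illustration: the embedding size, the window of three words, the number of filters, and the random (untrained) embeddings and filter weights. The non-linearity and bias are left out to keep the sketch short.

```python
import random

EMBED_DIM = 4      # assumed embedding size
WINDOW = 3         # each filter looks at 3 consecutive words
NUM_FILTERS = 5    # assumed number of filters

rng = random.Random(0)
embeddings = {}    # toy embedding table, filled lazily with random vectors

def embed(word):
    if word not in embeddings:
        embeddings[word] = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
    return embeddings[word]

# Each filter has one weight per entry of a WINDOW x EMBED_DIM window.
filters = [
    [rng.uniform(-1, 1) for _ in range(WINDOW * EMBED_DIM)]
    for _ in range(NUM_FILTERS)
]

def sentence_features(sentence):
    pad = [0.0] * EMBED_DIM
    vectors = [pad] + [embed(w) for w in sentence.split()] + [pad]
    # Concatenate each run of WINDOW word vectors into one flat window.
    windows = [
        [v for vec in vectors[i:i + WINDOW] for v in vec]
        for i in range(len(vectors) - WINDOW + 1)
    ]
    # Max pooling across the sentence: one feature per filter.
    return [
        max(sum(w * x for w, x in zip(f, win)) for win in windows)
        for f in filters
    ]

short = sentence_features("I ate cake today")
longer = sentence_features("I ate a very large cake today")
```

Both `short` and `longer` have NUM_FILTERS entries even though the sentences differ in length, which is what lets a downstream classifier consume them.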
Convolution + pooling example
1. Each word is embedded into a 2-dimensional vector; the window concatenates them
2. A 6x3 filter with a tanh non-linearity
3. Max pooling over each dimension to produce a 3-dimensional vector
Examples of convolution + pooling
Figure from Goldberg 2017
Think of convolutions as feature extractors
A narrow convolution (i.e. without any padding) in the vector concatenation notation
A wide convolution (i.e. with padding) in the vector stacking notation
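The difference between narrow and wide convolutions shows up in how many windows are built. The Python sketch below uses the concatenation notation and assumes that "wide" means padding k - 1 zero vectors on each side, which is one common definition. The embedding values and the name `windows` are made up for illustration.

```python
def windows(vectors, k, wide=False):
    """Concatenate every run of k consecutive word vectors into one vector."""
    dim = len(vectors[0])
    if wide:
        pad = [[0.0] * dim for _ in range(k - 1)]   # k - 1 zero vectors per side
        vectors = pad + vectors + pad
    return [
        [v for vec in vectors[i:i + k] for v in vec]
        for i in range(len(vectors) - k + 1)
    ]

# Four words with 2-dimensional embeddings (made-up numbers).
sentence = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
narrow = windows(sentence, k=3)             # 4 - 3 + 1 = 2 windows
wide = windows(sentence, k=3, wide=True)    # 4 + 3 - 1 = 6 windows
```

Each window is a 6-dimensional vector (three words times two dimensions), matching the 6x3 filter in the example above, which maps every window to three features.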
Features from text
• If we want to classify text, we need to represent it in some feature space
• We have (at least) two ways to get features from text using a neural network:
  – Recurrent Neural Networks
  – Convolutional Neural Networks
RNNs vs CNNs
• RNNs model non-Markovian dependencies
  – Can look at (effectively) infinite windows around a target word
  – Can capture sequential patterns in such windows
• CNNs capture informative n-grams
  – Also gappy n-grams
  – But also account for local ordering patterns
• How do they compare?
  – Both are trained end-to-end with a task loss
  – RNNs (specifically, BiRNNs) are more popular today…
• … but this can change
  – CNNs allow for more parallelism, and so may be better suited for certain hardware/software improvements
RNNs and CNNs as building blocks
Think of them as Lego bricks for constructing larger architectures
Both are computation graphs. Mix and match with other computation graphs to create larger neural networks
General tools that can be used with other ideas that we have seen and will see
E.g.: contextual embeddings, attention, etc.