
CS839: Probabilistic Graphical Models

Lecture 22: The Attention Mechanism
Theo Rekatsinas

Why Attention?

• Consider machine translation: we need to pay attention to the word we are currently translating. Is the entire sequence needed as context?
• The cat is black -> Le chat est noir

• RNNs are the de-facto standard for machine translation.
• Problem: translation relies on reading a complete sentence and compressing all of its information into a fixed-length vector; a sentence with hundreds of words represented by a single fixed-size vector will surely lead to information loss, inadequate translation, etc.
• Long-range dependencies are tricky.

Basic encoder-decoder
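To make the fixed-length bottleneck concrete, here is a minimal NumPy sketch of a vanilla encoder-decoder (illustrative only, not the lecture's code): the encoder folds the whole source sentence into a single vector, and that vector is all the decoder ever sees. All sizes and weights below are toy placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d_emb, d_hid, src_len = 8, 16, 5              # toy sizes

# Random source embeddings and (untrained) weights -- illustrative only.
src = rng.normal(size=(src_len, d_emb))
W_xh = rng.normal(size=(d_emb, d_hid)) * 0.1
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1
W_hy = rng.normal(size=(d_hid, d_emb)) * 0.1

# Encoder: fold the whole sentence into one fixed-length vector h.
h = np.zeros(d_hid)
for x_t in src:
    h = np.tanh(x_t @ W_xh + h @ W_hh)
context = h                                    # the single bottleneck vector

# Decoder: every output step starts from (and only from) that one vector.
s, outputs = context, []
for _ in range(4):
    s = np.tanh(s @ W_hh)                      # no further access to the source words
    outputs.append(s @ W_hy)

print(context.shape)   # (16,) -- all source information squeezed in here
```

Attention removes this bottleneck by letting the decoder look back at all encoder states at every output step.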

Soft Attention for Translation

"I love coffee" -> "Me gusta el café"

[Figure: for each output word, a distribution over the input words.]

Bahdanau et al., "Neural Machine Translation by Jointly Learning to Align and Translate", ICLR 2015


Soft Attention

From Y. Bengio, CVPR 2015 Tutorial
[Figure: a bidirectional encoder RNN, a decoder RNN, and an attention model connecting them.]

Soft Attention

Context vector (input to decoder): c_i = Σ_j a_ij h_j

Mixture weights: a_ij = exp(e_ij) / Σ_k exp(e_ik)

Alignment score (how well do input words near j match output words at position i): e_ij = f_att(s_{i-1}, h_j), where s_{i-1} is the previous decoder state and h_j the encoder state at input position j.
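A minimal NumPy sketch of these three quantities for one decoding step, assuming an additive (MLP) scoring function in the spirit of Bahdanau et al.; the weight matrices W_a, U_a, v_a and all sizes are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
T_src, d_enc, d_dec, d_att = 6, 16, 16, 8     # toy sizes

H = rng.normal(size=(T_src, d_enc))           # encoder states h_1..h_T (bidirectional in practice)
s_prev = rng.normal(size=d_dec)               # previous decoder state s_{i-1}
W_a = rng.normal(size=(d_dec, d_att)) * 0.1   # attention parameters (random here)
U_a = rng.normal(size=(d_enc, d_att)) * 0.1
v_a = rng.normal(size=d_att) * 0.1

# Alignment scores: e_ij = v_a . tanh(W_a s_{i-1} + U_a h_j)
e = np.tanh(s_prev @ W_a + H @ U_a) @ v_a     # shape (T_src,)

# Mixture weights: softmax over the source positions
a = np.exp(e - e.max())
a /= a.sum()                                   # a_ij >= 0 and sums to 1

# Context vector: attention-weighted sum of encoder states
c = a @ H                                      # shape (d_enc,), fed to the decoder at step i

print(a.round(3), c.shape)
```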

Soft Attention

Luong, Pham and Manning's translation system (2015):
[Figure from Luong and Manning, IWSLT 2015: Translation Error Rate vs. human.]

Hard Attention

Monotonic Attention
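As a rough illustration only (a simplification I am assuming, not the exact formulation from any particular paper), one way to impose monotonicity is to let each decoding step attend only at or after the position attended most strongly at the previous step:

```python
import numpy as np

rng = np.random.default_rng(0)
T_src, T_out, d = 8, 4, 16
H = rng.normal(size=(T_src, d))               # encoder states
S = rng.normal(size=(T_out, d))               # decoder states (toy)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

prev = 0                                       # earliest position we may attend to
for s_t in S:
    scores = H @ s_t
    scores[:prev] = -np.inf                    # mask positions before the previous peak
    a = softmax(scores)                        # attention mass only at or after `prev`
    prev = int(np.argmax(a))                   # attention can only move forward
    print(prev, a.round(2))
```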

Global Attention
• Blue = encoder, red = decoder.
• Attend to a context vector.
• The decoder captures global information, not only the information from one hidden state.
• The context vector takes all cells' outputs as input and computes a probability distribution over the source positions for each token the decoder wants to generate.

Local Attention

• Compute a best-aligned position first.
• Then compute a context vector centered at that position (see the sketch below).
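A rough NumPy sketch contrasting the two, in the spirit of Luong et al.'s global vs. local attention; the dot-product scoring, the choice of aligned position p_t, and the window half-width D are illustrative assumptions rather than the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(1)
T_src, d = 10, 16
H = rng.normal(size=(T_src, d))      # encoder hidden states
s_t = rng.normal(size=d)             # current decoder state

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Global attention: score every source position (dot-product scoring here).
a_global = softmax(H @ s_t)                      # distribution over all T_src positions
c_global = a_global @ H

# Local attention: first pick a best-aligned position p_t, then attend
# only inside a window around it (half-width D is an assumption).
D = 2
p_t = int(np.argmax(H @ s_t))                    # crude stand-in for the predicted position
lo, hi = max(0, p_t - D), min(T_src, p_t + D + 1)
a_local = softmax(H[lo:hi] @ s_t)                # distribution over the window only
c_local = a_local @ H[lo:hi]

print(a_global.shape, a_local.shape)             # (10,) vs (<=5,)
```

Local attention keeps the cost per decoding step independent of the source length, since only the window is scored.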

RNN for Captioning

[Diagram: a CNN maps the image (H x W x 3) to features (D) that initialize hidden state h0; at each step the hidden state (h1, h2, ...) takes the previous word (y1 = first word, y2 = second word, ...) and produces a distribution over the vocabulary (d1, d2, ...).]

The RNN only looks at the whole image, once.

What if the RNN looks at different parts of the image at each time step?

Soft Attention for Captioning

[Diagram: a CNN maps the image (H x W x 3) to a grid of features (L x D). From hidden state h0 the model computes a1, a distribution over the L locations, and z1, the weighted combination of features (weighted features: D). z1 and the first word y1 feed hidden state h1, which outputs d1, a distribution over the vocabulary, plus the next attention distribution a2; the process repeats with z2, h2, y2, d2, a3, and so on.]

Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015
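A minimal NumPy sketch of the loop above: at each step the model forms a distribution a_t over the L feature locations, takes the weighted feature vector z_t, and feeds it into the RNN to get a new hidden state and a distribution over the vocabulary. A real model would also feed the previous word's embedding; all weights and sizes here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
L, D, d_hid, vocab = 49, 32, 64, 100        # e.g. a 7x7 grid of D-dim CNN features

feats = rng.normal(size=(L, D))             # CNN features for one image (L x D)
W_att = rng.normal(size=(d_hid, D)) * 0.1   # toy attention / RNN / output weights
W_h   = rng.normal(size=(D + d_hid, d_hid)) * 0.1
W_out = rng.normal(size=(d_hid, vocab)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

h = np.zeros(d_hid)                         # h0
words = []
for t in range(3):                          # generate a few words
    a_t = softmax(feats @ W_att.T @ h)      # distribution over the L locations
    z_t = a_t @ feats                       # weighted combination of features (D,)
    h = np.tanh(np.concatenate([z_t, h]) @ W_h)   # next hidden state
    d_t = softmax(h @ W_out)                # distribution over the vocabulary
    words.append(int(np.argmax(d_t)))

print(words)
```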

Soft vs Hard Attention

[Diagram: a CNN maps the image (H x W x 3) to a grid of features a, b, c, d (each D-dimensional). From the RNN comes a distribution over grid locations pa, pb, pc, pd, with pa + pb + pc + pd = 1, which is used to build a context vector z (D-dimensional).]

Soft attention: summarize ALL locations, z = pa*a + pb*b + pc*c + pd*d. The derivative dz/dp is nice! Train with gradient descent.

Hard attention: sample ONE location according to p; z = that location's feature vector. With argmax, dz/dp is zero almost everywhere... Can't use gradient descent; need reinforcement learning.

Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015
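A tiny NumPy sketch of the distinction on the 2x2 grid above: soft attention returns the expectation of the features under p and is differentiable in p, while hard attention picks a single location, so dz/dp is zero almost everywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
grid = {k: rng.normal(size=D) for k in "abcd"}   # features a, b, c, d
p = np.array([0.1, 0.2, 0.3, 0.4])               # pa + pb + pc + pd = 1 (from the RNN)
A = np.stack(list(grid.values()))                # (4, D)

# Soft attention: weighted average over ALL locations -> smooth in p.
z_soft = p @ A                                   # z = pa*a + pb*b + pc*c + pd*d

# Hard attention: pick ONE location according to p -> dz/dp is 0 almost everywhere.
idx = rng.choice(4, p=p)                         # (or argmax at test time)
z_hard = A[idx]

print(z_soft.shape, z_hard.shape)                # both (D,), but only z_soft is differentiable in p
```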

Multi-headed Attention

Attention is all you need
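As a pointer to where this leads, here is a compact NumPy sketch of multi-headed scaled dot-product attention in the style of "Attention Is All You Need"; the head count, dimensions, and random projection matrices are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_model, n_heads = 5, 32, 4
d_head = d_model // n_heads

X = rng.normal(size=(T, d_model))                     # input sequence
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

Q, K, V = X @ W_q, X @ W_k, X @ W_v                   # (T, d_model) each

heads = []
for h in range(n_heads):
    sl = slice(h * d_head, (h + 1) * d_head)
    q, k, v = Q[:, sl], K[:, sl], V[:, sl]            # (T, d_head) per head
    scores = q @ k.T / np.sqrt(d_head)                # scaled dot-product scores
    attn = softmax(scores, axis=-1)                   # each row: distribution over positions
    heads.append(attn @ v)                            # (T, d_head)

out = np.concatenate(heads, axis=-1) @ W_o            # concat heads, project back to d_model
print(out.shape)                                      # (5, 32)
```

Each head attends with its own projections, so different heads can focus on different positions or relations; the outputs are concatenated and projected back to the model dimension.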

Attention tricks

Attention Takeaways

Performance:
• Attention models can improve accuracy and reduce computation at the same time.

Complexity:
• There are many design choices.
• Those choices have a big effect on performance.
• Ensembling has unusually large benefits.
• Simplify where possible!

Explainability:
• Attention models encode explanations.
• Both locus and trajectory help understand what's going on.

Hard vs. Soft:
• Soft models are easier to train; hard models require reinforcement learning.
• They can be combined, as in Luong et al.
