llcao.net/cu-deeplearning17/pp/class7_elsbeth_fei-tzin.pdf
TRANSCRIPT
Variational Autoencoders Write Poetry
(Generating Sentences from a Continuous Space)
Elsbeth Turcan and Fei-Tzin Lee
Paper by Sam Bowman, Luke Vilnis et al.
2016
Motivation
– Generative models for natural language sentences
– Machine translation
– Image captioning
– Dataset summarization
– Chatbots
– Etc.
– Want to capture high-level features of text, such as topic and style, and keep them consistent when generating text
Related work – RNNLM
– In the words of Bowman et al., “A standard RNN language model predicts each word of a sentence conditioned on the previous word and an evolving hidden state.”
– In other words, it only looks at the relationships between consecutive words, and so does not contain or observe any global features (see the sketch below)
– But what if we want global information?
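For concreteness, a minimal sketch of such a model in PyTorch (sizes are illustrative assumptions): each word's logits depend only on the previous words through the recurrent state, with no global code anywhere.

    import torch
    import torch.nn as nn

    # Minimal RNN language model: each word is predicted from the
    # previous word's embedding and the evolving hidden state only.
    class RNNLM(nn.Module):
        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            # tokens: (batch, seq_len) word indices
            h, _ = self.rnn(self.embed(tokens))
            return self.out(h)  # next-word logits at each position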
Other related work
– Skip-thought
– Generate sentence codes in the style of word embeddings to predict context sentences
– Paragraph vector
– A vector representing the paragraph is incorporated into single-word embeddings
Autoencoders
– Typically composed of two RNNs
– The first RNN encodes a sentence into an intermediate vector
– The second RNN decodes the intermediate representation back into a sentence, ideally the same as the input (see the sketch below)
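A minimal sketch of this two-RNN setup (dimensions are illustrative assumptions, not the paper's):

    import torch
    import torch.nn as nn

    # Deterministic sentence autoencoder: one RNN encodes the sentence into
    # a single code; a second RNN decodes that code back into the sentence.
    class Seq2SeqAutoencoder(nn.Module):
        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            emb = self.embed(tokens)
            _, (h, c) = self.encoder(emb)       # (h, c) is the intermediate code
            dec, _ = self.decoder(emb, (h, c))  # teacher-forced reconstruction
            return self.out(dec)                # logits over the vocabulary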
Variational Autoencoders (VAEs)
– Regular autoencoders learn only discrete mappings from point to point
– However, if we want to learn holistic information about the structure of sentences, we need to be able to fill sentence space better
– In a VAE, we replace the hidden vector z with a posterior probability distribution q(z|x) conditioned on the input, and sample our latent z from that distribution at each step
– We ensure that this distribution has a tractable form by enforcing its similarity to a defined prior distribution, typically some form of Gaussian
Modified loss function
– The regular autoencoder's loss function would encourage the VAE to learn posteriors as close to discrete as possible: in other words, Gaussians that are clustered extremely tightly around their means
– In order to enforce our posterior's similarity to a well-formed Gaussian, we introduce a KL divergence term into our loss, as below:

    loss(x) = −E_{q(z|x)}[ log p(x|z) ] + KL( q(z|x) ‖ p(z) )
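For a diagonal Gaussian posterior N(µ, σ²) and a standard normal prior, the KL term has a well-known closed form; here is a minimal PyTorch sketch of the full loss (function and tensor names are my own, not the paper's):

    import torch
    import torch.nn.functional as F

    def vae_loss(recon_logits, targets, mu, logvar):
        # Reconstruction term: negative log-likelihood of the input sentence.
        rec = F.cross_entropy(
            recon_logits.reshape(-1, recon_logits.size(-1)),
            targets.reshape(-1), reduction="sum")
        # KL(N(mu, sigma^2) || N(0, I)) in closed form; this is the term
        # that keeps the posterior spread out instead of collapsing to a point.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kl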
Reparameterization trick
– In the original formulation, the encoder net encodes the sentence into a probability distribution (usually Gaussian); practically speaking, it encodes the sentence into the parameters of the distribution (i.e. µ and σ)
– However, this poses challenges for us while backpropagating: we can't backpropagate over the jump from µ and σ to z, since it's random
– Solution: extract the randomness from the Gaussian by reformulating it as a function of µ, σ, and another separate random variable
(Diagram of the reparameterization trick, from StackOverflow.)
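A minimal sketch of that reformulation in PyTorch: the noise ε is sampled separately, so gradients flow through µ and σ while the sampling step itself stays outside the graph:

    import torch

    def reparameterize(mu, logvar):
        # z = mu + sigma * eps, with eps ~ N(0, I) drawn outside the graph,
        # so backprop reaches mu and sigma but not the random draw itself.
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps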
Specific architecture
– Single-layer LSTM for encoder and decoder (skeleton below)
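Putting the pieces together, a rough skeleton of such an architecture (a sketch only; layer sizes and the way z initializes the decoder state are assumptions, not the paper's exact hyperparameters):

    import torch
    import torch.nn as nn

    class SentenceVAE(nn.Module):
        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, latent_dim=32):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.to_mu = nn.Linear(hidden_dim, latent_dim)
            self.to_logvar = nn.Linear(hidden_dim, latent_dim)
            self.z_to_h = nn.Linear(latent_dim, hidden_dim)
            self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            emb = self.embed(tokens)
            _, (h, _) = self.encoder(emb)
            mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
            # Reparameterized sample of the latent code.
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            # z seeds the decoder's initial hidden state.
            h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
            dec, _ = self.decoder(emb, (h0, torch.zeros_like(h0)))
            return self.out(dec), mu, logvar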
Issues and fixes
– The decoder is too strong: without any limitations, it simply doesn't use z at all
– Fix: KL annealing (sketched below)
– Fix: word dropout (see the Analysis section)
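A minimal sketch of a KL annealing schedule: the weight on the KL term ramps from 0 to 1 during training, so the model first learns to reconstruct and is only gradually pushed toward the prior. The paper uses a logistic schedule; the constants below are illustrative assumptions:

    import math

    def kl_weight(step, midpoint=2500, steepness=0.0025):
        # Logistic schedule: ~0 early in training, ~1 after `midpoint` steps.
        return 1.0 / (1.0 + math.exp(-steepness * (step - midpoint)))

    # Per-batch training loss then becomes:
    #   loss = reconstruction + kl_weight(step) * kl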
Experiments – Language modeling
– Used the VAE to create language models on the Penn Treebank dataset, with an RNNLM as baseline
– Task: train an LM on the training set and have it designate the test set as highly probable (see the perplexity sketch below)
– The RNNLM outperformed the VAE in the traditional setting
– However, when handicaps were imposed on both models (inputless decoder), the VAE was significantly better able to overcome them
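For reference, the standard scoring in this setting: exponentiating the model's average per-word negative log-likelihood on the test set gives perplexity, where lower means the test set is judged more probable:

    import math

    def perplexity(total_nll, num_words):
        # total_nll: summed negative log-likelihood (in nats) over the test set
        return math.exp(total_nll / num_words)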
Experiments – Imputing missing words
– Task: infer missing words in a sentence given some known words (imputation)
– Place the unknown words at the end of the sentence for the RNNLM
– The RNNLM and VAE performed beam search (with VAE decoding broken into three steps) to produce the most likely words to complete a sentence
– Precise evaluation of these results is computationally difficult
Adversarial evaluation
– Instead, create an adversarial classifier, trained to distinguish real sentences from generated sentences, and score the model on how well it fools the adversary (sketch below)
– Adversarial error is defined as the gap between chance accuracy (50%) and the real accuracy of the adversary; ideally this error will be minimized
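A minimal sketch of the idea, assuming a simple bag-of-words logistic-regression adversary (the paper's adversaries are different models; this only illustrates how the metric is computed):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def adversarial_error(real_sents, generated_sents):
        texts = real_sents + generated_sents
        labels = [1] * len(real_sents) + [0] * len(generated_sents)
        X = CountVectorizer().fit_transform(texts)
        X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        # Gap between the adversary's accuracy and chance (50%);
        # a better generator drives this toward zero.
        return clf.score(X_te, y_te) - 0.5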
Experiments – Other
– Several other experiments in the appendix showed the VAE to be applicable to a variety of tasks
– Text classification
– Paraphrase detection
– Question classification
Analysis
– Word dropout (sketch below)
– Keep rate too low: sentence structure suffers
– Keep rate too high: no creativity; stifles the variation
– Effects on the cost function components (plot in the original slides)
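A minimal sketch of word dropout on the decoder inputs (the UNK index and tensor layout are assumptions): each input word is replaced with <unk> with probability 1 − keep rate, weakening the decoder so it must rely on z:

    import torch

    def word_dropout(tokens, keep_rate, unk_idx):
        # Keep each decoder-input token with probability `keep_rate`;
        # otherwise replace it with <unk>, hiding the previous gold word.
        mask = torch.rand(tokens.shape) < keep_rate
        return torch.where(mask, tokens, torch.full_like(tokens, unk_idx))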
Extras: sampling from the posterior and homotopies
– Sampling from the posterior: examples of sentences adjacent in sentence space
– Homotopies: linear interpolations in sentence space between the codes for two sentences (sketch below)
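A minimal sketch of a homotopy between two latent codes (the decoding step is hypothetical and depends on the trained model):

    import torch

    def homotopy(z1, z2, steps=5):
        # Linear interpolation between two latent codes:
        # z(t) = (1 - t) * z1 + t * z2 for t in [0, 1].
        ts = torch.linspace(0.0, 1.0, steps)
        return [(1 - t) * z1 + t * z2 for t in ts]

    # Each interpolated code is then decoded into a sentence, e.g.
    #   sentences = [decode(z) for z in homotopy(z1, z2)]  # decode() is hypothetical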
Even more homotopies
Thanks for listening!
– Any questions?