Controlling Linguistic Style Aspects in Neural Language Generation
Jessica Ficler and Yoav Goldberg
ISCOL 2017
Our goal is to generate text… while allowing control of its style.
Style
The same message (e.g. expressing a positive sentiment towards a movie) can be conveyed in different ways.
Style Aspects (Example)
“OMG... This movie actually made me cry a little bit because I laughed so hard at some parts.”
Colloquial style
Personal voice
Few adjectives
“A genuinely unique, full-on sensory experience that treads its own path between narrative clarity and pure visual expression.”
Professional critic
Impersonal voice
Many adjectives
The challenge
Generate text that conforms to a set of content-based and stylistic requirements.
Text: full length, natural sentences
Set of requirements: more than 2
Example
Theme: Acting, Descriptive: True
“A wholly original, well-acted, romantic comedy that's elevated by the modest talents of a lesser known cast.”
Theme: Plot, Descriptive: False
“I think the poor writing and script are what caused this movie to bomb.”
Formal Definition
• We assume a set of k parameters $p_1, \dots, p_k$, each parameter $p_i$ with a set of possible values $V_{p_i}$
• Input: specific assignment to these parameters, e.g.
Parameter     Value
Professional  False
Personal      True
Length        ≤10
Descriptive   False
Theme         Other
Sentiment     Positive
• Output: a text that is compatible with the parameters values
e.g. “I don't understand why it is rated so poorly.”
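The input side of this definition can be sketched in a few lines of Python. The parameter names and value sets are the ones used in these slides; the dictionary layout and the helper name `validate_assignment` are illustrative assumptions, not the authors' code.

```python
# Each parameter p_i maps to its set of possible values V_{p_i}.
# The layout and helper name are assumptions made for illustration.
PARAMETER_VALUES = {
    "professional": (True, False),
    "personal": (True, False),
    "length": ("<=10", "11-20", "21-40", ">40"),
    "descriptive": (True, False),
    "theme": ("plot", "acting", "production", "effects", "other"),
    "sentiment": ("positive", "neutral", "negative"),
}


def validate_assignment(assignment):
    """Return a copy of the assignment if every parameter has a legal value."""
    missing = set(PARAMETER_VALUES) - set(assignment)
    unknown = set(assignment) - set(PARAMETER_VALUES)
    if missing or unknown:
        raise ValueError(f"missing: {sorted(missing)}, unknown: {sorted(unknown)}")
    for name, value in assignment.items():
        if value not in PARAMETER_VALUES[name]:
            raise ValueError(f"{value!r} is not a possible value of {name!r}")
    return dict(assignment)


# The example assignment from the slide.
example = validate_assignment({
    "professional": False,
    "personal": True,
    "length": "<=10",
    "descriptive": False,
    "theme": "other",
    "sentiment": "positive",
})
```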
This work
We consider 6 parameters and values from the movie reviews domain
Content: Sentiment, Theme
Style: Professional, Personal, Descriptive, Length
Content Parameters
Task Description – Content Parameters
Sentiment - The score that the reviewer gave the movie
Positive - “This movie is so much to keep you on the edge of your seat.”
Neutral - “While the film doesn't quite reach the level of sugar fluctuations, it's beautifully animated.”
Negative - “It’s a very low-budget movie that just seems to be a bunch of fluff.”
Task Description – Content Parameters
Theme - Whether the sentence's content is about the Plot, Acting, Production, Effects or none of these (Other)
Plot - “The storyline had me laughing out loud.”
Acting - “The cast are all excellent.”
Production - “The director's magical.”
Effects - “Only saving grace is the sound effects.”
Other - “I'm afraid that the movie is aimed at kids and adults weren't sure what to say about it.”
Style Parameters
Task Description – Style Parameters
Length – Number of words
≤10 words
11-20 words
21-40 words
>40 words
Task Description – Style Parameters
Professional - Whether the review is written in the style of a professional critic or not
True - “This is a breath of fresh air, it's a welcome return to the franchise's brand of satirical humor.”
False - “So glad to see this movie!!”
Task Description – Style Parameters
Personal - Whether the review describes subjective experience (written in personal voice) or not
True - “I could see the movie again”
False - “Very similar to the book.”
Task Description – Style Parameters
Descriptive - Whether the review is in descriptive (contains a high ratio of adjectives) style or not
True - “Such a hilarious and funny romantic comedy.”
False - “A definite must see for fans of anime fans, pop culture references and animation with a good laugh too.”
And we would like to control for all these aspects simultaneously
Parameter     Value     Type
Professional  False     Style
Personal      True      Style
Length        ≤10       Style
Descriptive   False     Style
Theme         Other     Content
Sentiment     Positive  Content
“I don't understand why it is rated so poorly.”
Model
A conditioned language model:
$$P(w_1, \dots, w_n \mid c) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1}, c)$$
Condition each word on the history, as well as on a context c.
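The factorization above can be illustrated with a toy stand-in for the language model. The sketch below sums log conditional probabilities over a sentence; `toy_cond_prob`, its four-word vocabulary and its two contexts are hypothetical and only serve to make the chain rule concrete.

```python
import math

VOCAB = ["good", "bad", "movie", "."]  # hypothetical toy vocabulary


def toy_cond_prob(word, history, context):
    """Toy P(w_t | history, c): the context's favourite word gets half the mass.

    A real model would also use the history; this stand-in ignores it.
    """
    if word not in VOCAB:
        raise KeyError(word)
    favourite = "good" if context == "positive" else "bad"
    if word == favourite:
        return 0.5
    return 0.5 / (len(VOCAB) - 1)


def sequence_log_prob(words, context, cond_prob):
    """log P(w_1..w_n | c) as the sum of log P(w_t | w_1..w_{t-1}, c)."""
    total = 0.0
    for t, word in enumerate(words):
        history = tuple(words[:t])
        total += math.log(cond_prob(word, history, context))
    return total


sentence = ["good", "movie", "."]
log_p_pos = sequence_log_prob(sentence, "positive", toy_cond_prob)
log_p_neg = sequence_log_prob(sentence, "negative", toy_cond_prob)
```

With this toy distribution the same sentence is more probable under the "positive" context than under the "negative" one, which is the effect conditioning is meant to have.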
Model
In our case, c is a concatenation of the embedding vectors of the parameter values
c: Theme:Plot, Professional:True, Descriptive:True, Length:≤10, Sentiment:Positive, Personal:False
Generated word by word from the start symbol: “An entertaining and visually attractive family-friendly story.”
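A minimal sketch of how such a context vector could be assembled, assuming one embedding table per parameter. The embedding size, the random initialisation and all names below are assumptions; in a trained model the embeddings would be learned rather than random.

```python
import random

EMBEDDING_DIM = 4  # hypothetical size, not stated in the slides

# Order follows the example context shown on the slide.
PARAMETER_VALUES = {
    "theme": ["plot", "acting", "production", "effects", "other"],
    "professional": [True, False],
    "descriptive": [True, False],
    "length": ["<=10", "11-20", "21-40", ">40"],
    "sentiment": ["positive", "neutral", "negative"],
    "personal": [True, False],
}


def init_embeddings(dim, seed=0):
    """One small random vector per (parameter, value) pair."""
    rng = random.Random(seed)
    return {
        name: {value: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
               for value in values}
        for name, values in PARAMETER_VALUES.items()
    }


def context_vector(assignment, embeddings):
    """Concatenate the embedding of each parameter's value into c."""
    c = []
    for name in PARAMETER_VALUES:
        c.extend(embeddings[name][assignment[name]])
    return c


embeddings = init_embeddings(EMBEDDING_DIM)
c = context_vector(
    {"theme": "plot", "professional": True, "descriptive": True,
     "length": "<=10", "sentiment": "positive", "personal": False},
    embeddings,
)
```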
The model is simple, but… we need training data annotated with the appropriate values.
Pipeline: Text → extract (Metadata, Heuristics) → Parameters → train
Rotten-Tomatoes website. 7,500 movies. 1,002,625 movie reviews.
Professional
In Rotten Tomatoes the critic reviews are separated from the audience reviews
Critic reviews: Professional. Audience reviews: Non Professional.
Some of the non-professional reviewers are considered as “super reviewers”: also professional
Sentiment
Sentiment scores
We normalized the critics scores to be on 0-5 scale
Negative: 0-2
Neutral: 3
Positive: 4-5
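The bucketing above could be sketched as follows. The thresholds come from the slide; the rescaling helper, its rounding to the nearest integer and both function names are assumptions, since the slides do not say how fractional scores are handled.

```python
def normalize_score(score, max_score):
    """Rescale a score in [0, max_score] to an integer on the 0-5 scale.

    Rounding half up is an assumption made for this sketch.
    """
    if not 0 <= score <= max_score:
        raise ValueError("score out of range")
    return int(5 * score / max_score + 0.5)


def sentiment_label(score_0_to_5):
    """Map a normalized 0-5 score to the sentiment value from the slide."""
    if not 0 <= score_0_to_5 <= 5:
        raise ValueError("expected a score between 0 and 5")
    if score_0_to_5 <= 2:
        return "negative"
    if score_0_to_5 == 3:
        return "neutral"
    return "positive"


label = sentiment_label(normalize_score(8, 10))
```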
Content words
Function words
POS tags
Theme
To determine the value for the theme parameter we searched for words that are related to the 4 topics and are common in our dataset
Theme       Content words
Production  Director, Directed, Production, co-production
Acting      Acting, Cast, Performance, Play, Role, Miscasting, Actor
Plot        Story, Storytelling, Plot, Script, Manuscript, Tale, Scene
Effects     Effects, Song, Music, Voice, Visual, Soundtrack, Shot
Each sentence was labeled with the category that has the most words in the sentence. Sentences that do not include any words from our lists are labeled as other
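The labeling rule can be sketched as below. The word lists are read from the slide's table; exact matching on lower-cased tokens, the tokenizer, and breaking ties by list order are assumptions, since the slides specify none of them.

```python
import re
from collections import Counter

# Word lists as read from the slide; tie-breaking follows this order (assumed).
THEME_WORDS = {
    "plot": {"story", "storytelling", "plot", "script", "manuscript", "tale", "scene"},
    "acting": {"acting", "cast", "performance", "play", "role", "miscasting", "actor"},
    "production": {"director", "directed", "production", "co-production"},
    "effects": {"effects", "song", "music", "voice", "visual", "soundtrack", "shot"},
}


def theme_label(sentence):
    """Return the theme whose word list has the most hits, or 'other' if none."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    counts = Counter()
    for theme, words in THEME_WORDS.items():
        counts[theme] = sum(1 for token in tokens if token in words)
    best_theme, best_count = max(counts.items(), key=lambda item: item[1])
    return best_theme if best_count > 0 else "other"


labels = [
    theme_label("The cast are all excellent."),
    theme_label("Only saving grace is the sound effects."),
    theme_label("I could see the movie again"),
]
```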
Personal Voice
To determine whether a review is written in personal voice we search for words that express subjectivity
Heuristic: personal pronouns
Personal  Condition
True      I, My
False     Other cases
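A minimal sketch of this check, assuming the pronoun list consists of exactly the two words shown on the slide (the authors' full list may be longer) and that matching is done on lower-cased tokens:

```python
import re

# Only "I" and "My" appear on the slide; treating them as the full list
# is an assumption of this sketch.
PERSONAL_WORDS = {"i", "my"}


def is_personal(sentence):
    """True if the sentence contains one of the personal-voice words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(token in PERSONAL_WORDS for token in tokens)


personal_example = is_personal("I could see the movie again")
impersonal_example = is_personal("Very similar to the book.")
```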
Descriptiveness
We assume that descriptive texts make heavy use of adjectives
Heuristic: distribution of part-of-speech tags
Descriptive  Condition
True         %JJ ≥ 35
False        Other cases
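The threshold can be sketched on already-tagged input. Counting every tag that starts with "JJ" (so also comparatives and superlatives) and including punctuation tags in the denominator are assumptions; the slide only gives the 35% cut-off. The tag sequences below are hand-written for illustration.

```python
def adjective_percentage(pos_tags):
    """Percentage of tags in the sentence that are adjective tags (JJ*)."""
    if not pos_tags:
        return 0.0
    adjectives = sum(1 for tag in pos_tags if tag.startswith("JJ"))
    return 100.0 * adjectives / len(pos_tags)


def is_descriptive(pos_tags, threshold=35.0):
    """descriptive:true when at least `threshold` percent of tags are adjectives."""
    return adjective_percentage(pos_tags) >= threshold


# Hypothetical tag sequences, not the output of a real tagger.
heavy = ["PDT", "DT", "JJ", "CC", "JJ", "JJ", "NN", "."]
light = ["JJ", "TO", "DT", "NN", "."]
```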
Length
≤10 words, 11-20 words, 21-40 words, >40 words
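The four buckets map directly to a small function. Counting words by whitespace splitting is an assumption; the slides do not say how the text is tokenized.

```python
def length_bucket(sentence):
    """Return the length value for a sentence, counting whitespace-separated words."""
    n_words = len(sentence.split())
    if n_words <= 10:
        return "<=10"
    if n_words <= 20:
        return "11-20"
    if n_words <= 40:
        return "21-40"
    return ">40"


bucket = length_bucket("I could see the movie again")
```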
Dataset Statistics
Our final dataset includes 2,773,435 sentences
We divided the dataset into training (~2.7M), development (~2K) and test (~2K) sets
Each sentence is labeled with the 6 parameters
Text → Parameters Values: easy (extract)
Parameters Values → Text: hard (Conditioned Language Model)
Does this work?
Examples of Generated Sentences
Parameter     Value
Professional  False
Personal      True
Length        11-20
Descriptive   True
Theme         Other
Sentiment     Negative
“Ultimately, I can honestly say that this movie is full of stupid stupid and stupid stupid stupid stupid stupid.”
Parameter     Value
Professional  True
Personal      False
Length        11-20
Descriptive   False
Theme         Other
Sentiment     Positive
“The film’s simple, and a refreshing take on the complex family drama of the regions of human intelligence.”
We would like to quantitatively measure our model capabilities.
Evaluation
• Evaluating LM Quality (Perplexity)
• Evaluating the Generated Sentences
Evaluating LM Quality
Sanity Check
1. Conditioned vs. Unconditioned
Does knowing the parameters indeed help in achieving better language modeling results?
Perplexity       Dev   Test
Not-conditioned  25.8  24.4
Conditioned      24.8  23.3
Knowing the correct parameter values indeed results in better perplexity!
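For reference, perplexity is the exponential of the negative average log-probability per token, so lower is better. The sketch below uses the standard definition with natural logarithms; the function name and input format are assumptions, and the slides do not say which tokens (for example an end-of-sentence symbol) are counted.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability assigned to each token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that gives every token probability 1/4 has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 8)
```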
Baseline
2. Conditioned vs. Dedicated LMs
Is our model effective compared to training a separate unconditioned LM on a subset of the data (dedicated LM)?
Data set split by one parameter:
Sentiment:Positive
Sentiment:Neutral
Sentiment:Negative
When generating text, we would choose the model that corresponds to the requested value
Data set split by two parameters:
Sentiment:Positive; Professional:True
Sentiment:Positive; Professional:False
Sentiment:Neutral; Professional:True
Sentiment:Neutral; Professional:False
Sentiment:Negative; Professional:True
Sentiment:Negative; Professional:False
The number of models that need to be trained depends on the number of parameters and the possible values
Data set split by all parameters:
Sentiment:Positive; Professional:True; Theme:Other; Personal:False; Length:21-40; Descriptive:False
Sentiment:Negative; Professional:False; Theme:Other; Personal:True; Length:21-40; Descriptive:False
...
240
Evaluation (Language Model Quality)
We hypothesize that the conditioned LM will be able to:
• Generalize across property combinations
• Share data between the different settings
And thus will be more effective than a dedicated LM
We verify this hypothesis by training dedicated models and comparing their results on the corresponding data to the results achieved by our model
Baseline
For a set of parameters and values $p_1, \dots, p_n$, we train n sub-models
Each sub-model $m_i$ is trained on the subset of sentences that match parameters $p_1, \dots, p_i$
Example - given the set of parameters values: personal:false, sentiment:pos, professional:false, theme:other and length:≤10 we train 5 sub-models:
1. personal:false
2. personal:false and sentiment:positive
3. personal:false, sentiment:positive and professional:false
4. personal:false, sentiment:positive, professional:false and theme:other
5. personal:false, sentiment:positive, professional:false, theme:other and length:≤10
As we add parameters, the size of the training set of the sub-model decreases.
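The nested training subsets for the sub-models can be sketched as successive filters. The record layout (a dict with a `labels` field), the function name and the toy data are assumptions made for illustration.

```python
def nested_training_subsets(sentences, ordered_assignment):
    """For i = 1..n, return (criteria, subset matching p_1..p_i)."""
    subsets = []
    criteria = {}
    current = list(sentences)
    for name, value in ordered_assignment:
        criteria[name] = value
        # Each step filters the previous subset by one more parameter.
        current = [s for s in current if s["labels"][name] == value]
        subsets.append((dict(criteria), current))
    return subsets


# Hypothetical toy data set with three of the six labels.
toy_data = [
    {"text": "a", "labels": {"personal": False, "sentiment": "positive", "professional": False}},
    {"text": "b", "labels": {"personal": False, "sentiment": "positive", "professional": True}},
    {"text": "c", "labels": {"personal": False, "sentiment": "negative", "professional": False}},
    {"text": "d", "labels": {"personal": True, "sentiment": "positive", "professional": False}},
]
order = [("personal", False), ("sentiment", "positive"), ("professional", False)]
subset_sizes = [len(subset) for _, subset in nested_training_subsets(toy_data, order)]
```

On the toy data the subsets shrink with every added parameter, mirroring the observation on the slide.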
Baseline
We measure the perplexity of the dedicated models on the test-set sentences that match the criteria and compare it to our conditioned LM and to an unconditioned language model.
We do this for 4 different parameter-sets.
Evaluation (Language Model Quality)
The dedicated model achieves better perplexity than our model on data with personal:false
The gap gets smaller as the dedicated model includes more properties
Eventually the conditioned model result is better than the dedicated model result
This is also the case in the other 3 sets that were tested
Evaluation (Language Model Quality)
The dedicated LM scores are better than our model when:
• Only a few conditioning parameters are needed
• The coverage of the parameter combination in the training set is large enough
We conclude that the conditioned model manages to generalize from sentences with different sets of properties, and is effective also with a large number of conditioning factors.
Evaluation (Language Model Quality)
3. Conditioned vs. Flipped Conditioning
How effective are the conditioning parameters individually?
We compare the perplexity when using the correct conditioning values to the perplexity achieved when flipping the parameter value to an incorrect one.
Conditioning                                Perplexity
Correct value                               23.3
Replacing Descriptive with non-Descriptive  27.2
Replacing Personal                          27.5
Replacing Professional                      25
Replacing Sentiment Pos with Neg            24.3
The model distinguishes descriptive text and personal voice better than it distinguishes sentiment and professional text.
Evaluating the Generated Sentences
Evaluation (Generated Sentences)
How well do sentences generated by the model match the requested behavior (conditioning properties)?
1. Capturing Individual Properties
For each parameter, we measure the correspondence of the sentences to the requested values.
Length
Requested Length  Avg   Min  Max
≤10               7.6   1    21
11-20             20.6  5    25
21-40             34    7    49
Evaluation (Generated Sentences)
Descriptive
We measure the percentage of sentences that are considered as descriptive when requesting descriptive:true, and when requesting descriptive:false
descriptive:true – 85.7% descriptive
descriptive:false – 96% non-descriptive
Evaluation (Generated Sentences)
Personal
We measure the percentage of sentences that are considered as personal voice when requesting personal:true, and when requesting personal:false
personal:true – 100% personal
personal:false – 99.85% non-personal
Evaluation (Generated Sentences)
Theme
For each of the possible theme values, we compute the proportion of the sentences that were generated with the corresponding value.
Requested value  %Plot  %Acting  %Prod  %Effects  %Other
Plot             98.7   0.8      0      0.2       0.3
Acting           2.5    95.3     0      0.6       1.6
Production       0      0        97.4   2.6       0
Effects          0      5.9      0      91.7      2.4
Other            0.04   0.03     0      0.03      99.9
The confusion table shows that the majority of sentences are generated according to the requested theme
Evaluation (Generated Sentences)
Professional
The professional property could not be evaluated automatically
We performed manual evaluation using Mechanical Turk
Can a person distinguish professional:true from professional:false?
We randomly created 1000 sentence-pairs:
• professional:true
• professional:false
Evaluation (Generated Sentences)
Which of the sentences was written by a professional critic?
(t) “This film has a certain sense of imagination and a sobering look at the clandestine indictment.”
(f) “I know it’s a little bit too long, but it’s a great movie to watch!!!!”
Evaluation (Generated Sentences)
Settings
5 different annotators. Majority vote.
Result
The annotators were able to tell apart the professional from the non-professional generated sentences in 72.1% of the cases.
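The aggregation step can be sketched as follows, assuming each pair receives five votes and counts as recognized when the majority picks the intended sentence. The data below is made up, and the function names are illustrative.

```python
from collections import Counter


def majority_vote(votes):
    """Most frequent vote; with 5 annotators and 2 options there are no ties."""
    return Counter(votes).most_common(1)[0][0]


def fraction_recognized(judgments):
    """Share of pairs whose majority vote equals the intended answer."""
    correct = sum(1 for votes, intended in judgments
                  if majority_vote(votes) == intended)
    return correct / len(judgments)


# Hypothetical votes: which sentence of the pair ("A" or "B") is professional.
judgments = [
    (["A", "A", "B", "A", "B"], "A"),
    (["B", "B", "B", "A", "B"], "A"),
    (["B", "B", "A", "B", "B"], "B"),
    (["A", "A", "A", "A", "B"], "A"),
]
rate = fraction_recognized(judgments)
```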
Evaluation (Generated Sentences)
Analysis
In a few cases the sentence that was generated for professional:true was indeed not professional enough
“Looking forward to the trailer.”
In many cases, both sentences could indeed be considered as either professional or not
Example:
(t) “This is a cute movie with some funny moments, and some of the jokes are funny and entertaining.”
(f) “Absolutely amazing story of bravery and dedication.”
Evaluation (Generated Sentences)
Sentiment
Manual annotations using Mechanical Turk
We randomly created 300 pairs of generated sentences for each of the following settings:
• positive/negative
• positive/neutral
• negative/neutral
Which of the reviewers liked the movie more than the other?
Evaluation (Generated Sentences)
Settings
5 different annotators. Majority vote.
Results
Positive/Negative  86.3%
Positive/Neutral   63%
Negative/Neutral   69.7%
Evaluation (Generated Sentences)
Examples where the intended sentiment was not recognized by the annotators:
(Pos) “It’s a shame that this film is not as good as the previous film, but it still delivers.”
(Neg) “The premise is great, the acting is not bad, but the special effects are so bad.”
Evaluation (Generated Sentences)
2. Generalization Ability
Can the model generate sentences for parameter combinations it has not seen in training?
Evaluation (Generated Sentences)
We removed from the training set about 75K sentences which were labeled as theme:plot and personal:true
theme:plot and personal:true: 75,421 sentences. We don’t train on these.
Other counts shown in the diagram: 336,567 and 477,738
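The hold-out step amounts to dropping every sentence that matches the full combination while keeping sentences that match only one of the two values. A minimal sketch, with a hypothetical record layout and toy data:

```python
def split_held_out(sentences, held_out):
    """Separate sentences matching every held-out value from the rest."""
    train, removed = [], []
    for sentence in sentences:
        labels = sentence["labels"]
        if all(labels[name] == value for name, value in held_out.items()):
            removed.append(sentence)
        else:
            train.append(sentence)
    return train, removed


# Hypothetical toy data with the two relevant labels only.
toy_data = [
    {"text": "a", "labels": {"theme": "plot", "personal": True}},
    {"text": "b", "labels": {"theme": "plot", "personal": False}},
    {"text": "c", "labels": {"theme": "acting", "personal": True}},
    {"text": "d", "labels": {"theme": "other", "personal": False}},
]
train, removed = split_held_out(toy_data, {"theme": "plot", "personal": True})
```

The training part still contains theme:plot sentences and personal:true sentences, just never both at once, which is what makes the generation test a check of generalization.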
Evaluation (Generated Sentences)
We then asked the trained model to generate sentences with theme:plot and personal:true
Results
100% of the generated sentences indeed contained personal pronouns
82.4% of them fit the theme:plot criteria (the result achieved by the full model is 97.8%)
Evaluation (Generated Sentences)
Examples:
“Some parts weren’t as good as I thought it would be and the acting and script were amazing.”
“I really liked the story and the performances were likable and the chemistry between the two leads is great.”
Comparison to Previous Work
Most work focuses on the content that needs to be conveyed in the generated text
• Reviews generation conditioned on category and numeric rating scores (Lipton et al., 2015; Tang et al., 2016)
• Dialog generation conditioned on a dialog act (“request”, “inform”) and information to be conveyed (“price=low, food=italian, near=citycenter”) (Wen et al., 2015; Dusek and Jurcicek, 2016b,a)
• Recipe generation conditioned on a list of ingredients (Kiddon et al., 2016)
• Textual biographies generation conditioned on Wikipedia infoboxes (Lebret et al., 2016)
Comparison to Previous Work
Generation conditioned on style:
• In dialog generation, Li et al. (2016) condition the text on the speaker’s identity (age, gender, location) for improving the factual consistency of the utterances
• In Machine Translation, Sennrich et al. (2016a) translate English to German with a feature that encodes whether the generated text (in German) should express politeness
• Hu et al. (2017) tackle the same problem as ours: conditioning on multiple aspects of the generated text. Their model features a VAE-based method coupled with a discriminator network. Hu et al. (2017) restrict themselves to sentences of up to length 16, and only two conditioning aspects (sentiment and tense).
Summary
• Most work on neural natural language generation focuses on controlling the content of the generated text
• We experiment with controlling several stylistic aspects of the generated text, in addition to its content
• The method is based on a conditioned RNN language model
• Simple but very effective!
• We demonstrate the approach on the movie reviews domain
• We show that it is successful in generating coherent sentences corresponding to the required linguistic style and content