Neural MT
Columbia University (kathy/nlp/2017/classslides/...)
Announcements
• HW2 directory structure penalty to be removed due to grading inconsistencies.
  • Those who lost 15 points will gain 15 points
• Dan Jurafsky will attend the beginning of class next Tuesday
  • Be prepared with questions. Your chance!!!
• Rupal Patel: Monday, Dec. 4th, 11:30, Davis
• Data Science Institute Colloquium Series Event: DAN JURAFSKY, STANFORD UNIVERSITY | Tuesday, December 5th at 5 PM in Davis Auditorium (412 CEPSR)
  • "Does This Vehicle Belong to You?" Processing the Language of Policing for Improving Police-Community Relations
  • ABSTRACT: Police body-worn cameras have the potential to play an important role in understanding and improving police-community relations. In this talk I describe a series of studies conducted by our large interdisciplinary team at Stanford that use speech and natural language processing on body-camera recordings to model the interactions between police officers and community members in traffic stops. We use text and speech features to automatically measure linguistic aspects of the interaction, from discourse factors like conversational structure to social factors like respect. I describe the differences we find in the language directed toward black versus white community members, and offer suggestions for how these findings can be used to help improve the fraught relations between police officers and the communities they serve.
Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net (Thursday)
• MT Evaluation
MT Evaluation
• More art than science
• Wide range of Metrics/Techniques
  • interface, …, scalability, …, faithfulness, ..., space/time complexity, … etc.
• Automatic vs. Human-based
  • Dumb Machines vs. Slow Humans
Slide from Nizar Habash
Human-based Evaluation Example: Accuracy Criteria
5: contents of original sentence conveyed (might need minor corrections)
4: contents of original sentence conveyed BUT errors in word order
3: contents of original sentence generally conveyed BUT errors in relationship between phrases, tense, singular/plural, etc.
2: contents of original sentence not adequately conveyed, portions of original sentence incorrectly translated, missing modifiers
1: contents of original sentence not conveyed, missing verbs, subjects, objects, phrases or clauses
Slide from Nizar Habash
Human-based Evaluation Example: Fluency Criteria
5: clear meaning, good grammar, terminology and sentence structure
4: clear meaning BUT bad grammar, bad terminology or bad sentence structure
3: meaning graspable BUT ambiguities due to bad grammar, bad terminology or bad sentence structure
2: meaning unclear BUT inferable
1: meaning absolutely unclear
Slide from Nizar Habash
Today: Crowdsourcing
• Amazon Mechanical Turk or CrowdFlower
• Create a HIT for each sentence
• Get multiple workers to rate
• Pay 1 to 10 cents per HIT
• Complete an evaluation in hours (vs. days/weeks)
• Ethics?
Automatic Evaluation Example: Bleu Metric (Papineni et al 2001)
• Bleu
  • BiLingual Evaluation Understudy
  • Modified n-gram precision with length penalty
  • Quick, inexpensive and language independent
  • Correlates highly with human evaluation
  • Bias against synonyms and inflectional variations
Slide from Nizar Habash
Automatic Evaluation Example: Bleu Metric
Test Sentence
colorless green ideas sleep furiously
Gold Standard References
all dull jade ideas sleep irately
drab emerald concepts sleep furiously
colorless immature thoughts nap angrily
Unigram precision = 4/5 = 0.8
Bigram precision = 2/4 = 0.5
Bleu Score $= (a_1 a_2 \cdots a_n)^{1/n} = (0.8 \times 0.5)^{1/2} = 0.6325 \rightarrow 63.25$
(Here $a_n$ is the modified n-gram precision with $n = 2$. The length penalty is 1 because the test sentence has 5 words, the same as the closest reference.)
Slide from Nizar Habash
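The Bleu computation above can be sketched in Python. This is a minimal sentence-level sketch. It assumes whitespace tokenization and n-grams up to n = 2 to match the slide. Standard Bleu (Papineni et al.) uses n up to 4 with uniform weights and is computed over a whole test corpus, not one sentence. Candidate n-gram counts are clipped by their maximum count in any single reference, which is the "modified" part of the precision. The brevity penalty uses the reference length closest to the candidate length. The function names `ngrams`, `modified_precision`, and `bleu` are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


def bleu(candidate, references, max_n=2):
    """Geometric mean of modified precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty against the reference length closest to the candidate's.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * geo_mean


candidate = "colorless green ideas sleep furiously".split()
references = [s.split() for s in (
    "all dull jade ideas sleep irately",
    "drab emerald concepts sleep furiously",
    "colorless immature thoughts nap angrily",
)]
p1 = modified_precision(candidate, references, 1)  # 4/5 = 0.8
p2 = modified_precision(candidate, references, 2)  # 2/4 = 0.5
score = bleu(candidate, references, max_n=2)       # about 0.6325, i.e. 63.25

# Clipping: "the" occurs only once in the reference, so it is credited once.
clipped = modified_precision("the the the the".split(), ["the cat".split()], 1)  # 0.25
```

Clipping is what keeps a degenerate output such as "the the the the" from scoring a perfect unigram precision.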
BLEU Scores for 110 Translation Systems Trained on Europarl
Koehn, MT Summit, 2005, http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf
Automatic Evaluation Example: METEOR (Lavie and Agarwal 2007)
• Metric for Evaluation of Translation with Explicit word Ordering
• Extended Matching between translation and reference
  • Porter stems, WordNet synsets
• Unigram Precision, Recall, parameterized F-measure
• Reordering Penalty
• Parameters can be tuned to optimize correlation with human judgments
• Not biased against "non-statistical" MT systems
Slide from Nizar Habash
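The METEOR scoring steps can be sketched in a simplified form. This sketch uses exact word matches only, with no Porter stems or WordNet synsets. It builds a greedy left-to-right alignment instead of METEOR's search for the alignment with the fewest crossings. It then applies the parameterized form F_mean = PR / (αP + (1 − α)R), Pen = γ · (chunks / matches)^β, and score = F_mean · (1 − Pen). The defaults α = 0.9, β = 3, γ = 0.5 are an assumption that reproduces the original METEOR weighting (10PR / (R + 9P) and a penalty of 0.5 · (chunks / matches)^3). In the tuned metric these parameters are fitted to human judgments. The function name `meteor_exact` is hypothetical.

```python
def meteor_exact(candidate, reference, alpha=0.9, beta=3.0, gamma=0.5):
    """Simplified METEOR: exact unigram matches, greedy alignment."""
    # Greedy one-to-one alignment: each candidate word takes the first
    # unused identical reference word.
    used = set()
    alignment = []  # (candidate index, reference index), in candidate order
    for i, word in enumerate(candidate):
        for j, ref_word in enumerate(reference):
            if j not in used and ref_word == word:
                used.add(j)
                alignment.append((i, j))
                break
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / len(candidate)
    recall = matches / len(reference)
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # A chunk is a maximal run of matches that are adjacent in BOTH strings;
    # fewer, longer chunks mean the word order agrees more closely.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1
    penalty = gamma * (chunks / matches) ** beta
    return f_mean * (1 - penalty)


same_order = meteor_exact("sleep furiously".split(), "sleep furiously".split())  # roughly 0.9375
swapped = meteor_exact("furiously sleep".split(), "sleep furiously".split())     # roughly 0.5
bleu_pair = meteor_exact("colorless green ideas sleep furiously".split(),
                         "drab emerald concepts sleep furiously".split())        # roughly 0.375
```

The two-word cases have identical precision and recall. Only the reordering penalty separates them, which is the "explicit word ordering" in METEOR's name.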
Metrics MATR Workshop
• Workshop in AMTA conference 2008
  • Association for Machine Translation in the Americas
• Evaluating evaluation metrics
• Compared 39 metrics
  • 7 baselines and 32 new metrics
• Various measures of correlation with human judgment
• Different conditions: text genre, source language, number of references, etc.
Slide from Nizar Habash
Automatic Evaluation Example: SEPIA (Habash and El Kholy 2008)
• A syntactically-aware evaluation metric
  • (Liu and Gildea, 2005; Owczarzak et al., 2007; Giménez and Màrquez, 2007)
• Uses dependency representation
  • MICA parser (Nasr & Rambow 2006)
  • 77% of all structural bigrams are surface n-grams of size 2, 3, 4
• Includes dependency surface span as a factor in score
  • long-distance dependencies should receive a greater weight than short distance dependencies
  • Higher degree of grammaticality?
[Chart: share of structural bigrams by surface span (1, 2, 3, 4plus); y-axis 0% to 50%]
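A structural bigram is a head-dependent pair from the dependency parse. Its surface span is the distance between the two words, so span d corresponds to a surface n-gram of size d + 1. The sketch below extracts structural bigrams from a head-index list. It then scores a candidate with a span-weighted precision that follows the slide's intuition that longer dependencies should weigh more. The weighting is hypothetical and is not SEPIA's actual formula. The parses are hand-written, not MICA output, and the function names are illustrative.

```python
from collections import Counter


def structural_bigrams(words, heads):
    """(head_word, dependent_word, surface_span) for every dependency.

    heads[i] is the index of word i's head, or None for the root.
    """
    return [(words[h], words[i], abs(h - i)) for i, h in enumerate(heads) if h is not None]


def span_weighted_precision(cand, ref):
    """Hypothetical weighting: each candidate bigram counts its surface span,
    so matching a long-distance dependency earns more credit."""
    ref_pairs = Counter((head, dep) for head, dep, _ in ref)
    matched = total = 0
    for head, dep, span in cand:
        total += span
        if ref_pairs[(head, dep)] > 0:  # match each reference bigram at most once
            ref_pairs[(head, dep)] -= 1
            matched += span
    return matched / total if total else 0.0


# Assumed parses: adjectives attach to the noun, noun and adverb to the verb.
cand = structural_bigrams("colorless green ideas sleep furiously".split(), [2, 2, 3, None, 3])
ref = structural_bigrams("drab emerald concepts sleep furiously".split(), [2, 2, 3, None, 3])
span_histogram = Counter(span for _, _, span in cand)  # Counter({1: 3, 2: 1})
weighted = span_weighted_precision(cand, ref)          # 1 / (2 + 1 + 1 + 1) = 0.2
```

Only (sleep, furiously) matches, and it has span 1. The unmatched span-2 dependency (ideas, colorless) therefore pulls the weighted score below the unweighted 1/4.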
Neural MT takes over
• WMT (Workshop on Machine Translation)
  • 2015: first neural MT, lower BLEU results
  • 2016: neural MT beats phrase-based and syntax-based
[Chart: Results from WMT (Workshop on Machine Translation), German to English, 2015 to 2017 (y-axis 0 to 30), Neural MT vs. Phrase based; 2015: Montreal, 2016 and 2017: Edinburgh]
WMT 2017
• Tasks
  • News translation
  • Quality estimation
  • Automatic post-editing
  • Metrics
  • Multimodal MT and multilingual image description
  • Biomedical translation
News Translation Task
• 7 languages, 14 tasks (from and into English)
  • Chinese
  • Czech
  • German
  • Finnish
  • Latvian
  • Russian
  • Turkish
• Test data: 3000 sentences per language pair except Latvian: 2000 sentences
Training Data
• Europarl
• Common Crawl
• Yandex Russian-English data
• Wikipedia Headlines
• United Nations
• News Commentary V12
• EU Press Release parallel corpus for German, Finnish and Latvian
Submitted Systems
• 103 systems from 31 institutions (no companies)
• Company releases of Neural MT
  • Microsoft: February 2016
  • Systran: August 2016
  • Google: September 2016
Human Evaluation
• Assess on adequacy along a 100 point scale (Direct Assessment) (vs. Relative Ranking)
  • How adequately does the translation express the meaning of the reference translation?
• One translation per screen/HIT
• 151 individual researchers
  • 29 different groups
  • Contributed 12,693 translation scores
  • 24 days, 22 hours
• 754 AMT workers
  • Contributed 237,200 scores
  • 47 days, 23 hours
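Some workers score harshly and others leniently. For that reason, Direct Assessment scores are typically standardized per assessor before they are averaged per system: each score becomes a z-score using that assessor's own mean and standard deviation. This is how the WMT DA results are usually reported. The sketch below assumes hypothetical ratings given as (worker, system, score) triples, and its function names are illustrative.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_worker(ratings):
    """ratings: (worker, system, score) triples on the 0-100 DA scale.
    Returns (system, z) pairs, where z uses that worker's own mean and std."""
    by_worker = defaultdict(list)
    for worker, _, score in ratings:
        by_worker[worker].append(score)
    stats = {w: (mean(s), pstdev(s)) for w, s in by_worker.items()}
    standardized = []
    for worker, system, score in ratings:
        mu, sd = stats[worker]
        standardized.append((system, (score - mu) / sd if sd > 0 else 0.0))
    return standardized


def system_scores(ratings):
    """Average standardized score per system."""
    per_system = defaultdict(list)
    for system, z in standardize_by_worker(ratings):
        per_system[system].append(z)
    return {system: mean(zs) for system, zs in per_system.items()}


# Hypothetical ratings: w1 is lenient, w2 is harsh, but both prefer system A.
ratings = [("w1", "A", 80), ("w1", "B", 60), ("w2", "A", 40), ("w2", "B", 20)]
print(system_scores(ratings))  # {'A': 1.0, 'B': -1.0}
```

After standardization, the two workers agree exactly even though their raw scores differ by 40 points. With hundreds of crowd workers, this removes much of the noise caused by individual rating habits.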
Some Results
Encoder-Decoder Approach
Basic RNN Approach
[Diagram: the ENCODER reads x1 x2 x3 ("das ist fur") into hidden states h1 h2 h3; the DECODER outputs Y1 Y2 Y3 ("That is almost")]
Basic RNN Approach
[Same encoder-decoder diagram: "das ist fur" → "That is almost"]
• The entire input is represented here, in the final encoder state h3
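Recurrent decoder, but the encoding h_n feeds every decoder step
[Diagram: encoder states h1 h2 h3 over "das ist fur"; decoder states z_t emitting Y1 Y2 Y3 ("That is almost")]
Transition: $z_t = f(z_{t-1}, y_{t-1}, h_n)$
Backpropagation: $\sum_t \partial z_t / \partial h_n$
Cho et al 2014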
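The transition above can be sketched as a tiny forward pass. This is a minimal sketch in pure Python. It assumes a plain tanh RNN (not an LSTM or GRU), random weights, random stand-in word embeddings, and a zero initial decoder state. The output softmax that would turn each z_t into a word is omitted. The function and weight names (`encode`, `decode`, `W_xh`, `W_hh`, `W_zz`, `W_yz`, `W_hz`) are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vec_add(*vecs):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vecs)]


def tanh_vec(v):
    return [math.tanh(x) for x in v]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode(xs, W_xh, W_hh):
    """Simple RNN encoder: h_t = tanh(W_xh x_t + W_hh h_{t-1}); returns h_1..h_n."""
    h = [0.0] * len(W_hh)
    states = []
    for x in xs:
        h = tanh_vec(vec_add(matvec(W_xh, x), matvec(W_hh, h)))
        states.append(h)
    return states


def decode(h_n, prev_ys, W_zz, W_yz, W_hz):
    """Decoder transition z_t = f(z_{t-1}, y_{t-1}, h_n).

    h_n enters every step, so in backpropagation every z_t contributes a
    gradient term dz_t/dh_n to the single encoding h_n.
    """
    z = [0.0] * len(W_zz)  # assumption: zero initial decoder state
    states = []
    for y_prev in prev_ys:
        z = tanh_vec(vec_add(matvec(W_zz, z), matvec(W_yz, y_prev), matvec(W_hz, h_n)))
        states.append(z)
    return states


rng = random.Random(0)
emb, hid = 3, 4
# Random stand-ins for embeddings of "das ist fur" and of <start>, "That", "is".
src = [[rng.uniform(-1, 1) for _ in range(emb)] for _ in range(3)]
prev = [[rng.uniform(-1, 1) for _ in range(emb)] for _ in range(3)]
enc_states = encode(src, rand_matrix(hid, emb, rng), rand_matrix(hid, hid, rng))
h_n = enc_states[-1]  # the whole source sentence squeezed into one vector
dec_states = decode(h_n, prev, rand_matrix(hid, hid, rng),
                    rand_matrix(hid, emb, rng), rand_matrix(hid, hid, rng))
print(len(enc_states), len(dec_states))  # 3 3
```

Everything the decoder knows about the source passes through the single vector `h_n`. This bottleneck is what the attention mechanism later relaxes.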
Results for Long Frequent Phrases
Cho et al 2014
Other Variants: Train weights separately
[Same encoder-decoder diagram: "das ist fur" → "That is almost"]
Also Useful
• Train stacked RNNs using multiple layers
• Use a bidirectional encoder
  • This can help in remembering the early part of the source input sentence
• Train the input sequence in reverse order: S1 S2 S3 -> T1 T2 T3 would be trained as S3 S2 S1 -> T1 T2 T3
  • Why?
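One explanation for the "Why?" comes from Sutskever et al. (2014), who reported the reversal trick. Reversing the source puts the first source words right next to the first target words, which creates short-term dependencies. Meanwhile, the average distance between corresponding words stays the same. The sketch below measures this. It assumes a hypothetical monotone alignment in which S_i corresponds to T_i, and that the network reads the source and then the target as one sequence. The helper names are illustrative.

```python
from statistics import mean


def reverse_source(pairs):
    """Reverse each source sentence; targets keep their order."""
    return [(src[::-1], tgt) for src, tgt in pairs]


def gaps(src, tgt):
    """Distance from each S_i to its counterpart T_i when src and tgt
    are read as one sequence (assumes S_i aligns with T_i)."""
    seq = src + tgt
    return [seq.index("T" + s[1:]) - seq.index(s) for s in sorted(src)]


src, tgt = ["S1", "S2", "S3"], ["T1", "T2", "T3"]
rev_src = reverse_source([(src, tgt)])[0][0]  # ["S3", "S2", "S1"]
print(gaps(src, tgt))      # [3, 3, 3]
print(gaps(rev_src, tgt))  # [1, 3, 5]
```

The mean gap is 3 in both cases, but after reversal S1 sits right next to T1. The decoder can therefore get its first words right early, which plausibly makes optimization easier.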
Replacing RNN with LSTM improves performance further
Aligning and Translating
[Bahdanau, Cho, Bengio ICLR 2015]
Attention Mechanism - Scoring
[Diagram: ENCODER states h1 h2 h3 over "das ist fur"; DECODER states H'1 H'2 H'3, which has output "That" and must predict the next word]
Score each encoder state against the previous decoder state: $\mathrm{score}(h'_{t-1}, h_s)$
Scores for h1, h2, h3: 3, 5, 1
Attention Mechanism - Scoring
[Same diagram: the decoder has output "That" and must predict the next word]
Convert into alignment weights $\alpha_t$: .3 .5 .1
Attention Mechanism - Scoring
[Same diagram: the decoder has output "That" and must predict the next word]
Build context vector $c_t$ as a weighted average of the encoder states: $c_t = \sum_s \alpha_t(s)\, h_s$
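The weighting and averaging steps can be sketched in a few lines. The slide's weights (.3, .5, .1) are illustrative. Bahdanau et al. (2015) and Luong et al. (2015) normalize the scores with a softmax, which turns the scores 3, 5, 1 into roughly 0.117, 0.867, 0.016, a much more peaked distribution. The 2-dimensional encoder states below are hypothetical values chosen so the result is easy to check by hand.

```python
import math


def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(alphas, encoder_states):
    """c_t = sum_s alpha_t(s) * h_s, computed one dimension at a time."""
    dim = len(encoder_states[0])
    return [sum(a * h[k] for a, h in zip(alphas, encoder_states)) for k in range(dim)]


scores = [3.0, 5.0, 1.0]                    # scores for h1, h2, h3 from the slides
alphas = softmax(scores)                    # roughly [0.117, 0.867, 0.016]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # hypothetical 2-d encoder states
c = context_vector(alphas, h)               # roughly [0.133, 0.883]
```

Because the weights sum to 1, the context vector is a convex combination of the encoder states. Here it is dominated by h2 ("ist"), the state with the highest score.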
How do you score it?
[Same diagram: encoder states h1 h2 h3; decoder states H'1 H'2 H'3]
$\mathrm{score}(h_s, h'_t) = h'^{\top}_t h_s$ or $\mathrm{score}(h_s, h'_t) = h'^{\top}_t W_\alpha h_s$ (Luong et al 2015)
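Luong et al. (2015) call the two scoring functions above "dot" and "general". Below is a minimal sketch that uses plain Python lists as vectors. The matrix `W_learned` is a hypothetical stand-in; in a trained model, W_α is learned jointly with the rest of the network.

```python
def dot_score(h_dec, h_enc):
    """Luong 'dot' score: h'_t^T h_s."""
    return sum(a * b for a, b in zip(h_dec, h_enc))


def general_score(h_dec, h_enc, W):
    """Luong 'general' score: h'_t^T W h_s."""
    W_h = [sum(w * x for w, x in zip(row, h_enc)) for row in W]
    return dot_score(h_dec, W_h)


h_dec = [0.5, -1.0]                                   # decoder state h'_t
encoder_states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # h_1, h_2, h_3
W_identity = [[1.0, 0.0], [0.0, 1.0]]
W_learned = [[2.0, 0.0], [0.0, 1.0]]                  # hypothetical learned matrix

dot = [dot_score(h_dec, hs) for hs in encoder_states]                  # [0.5, -1.0, -1.0]
gen_id = [general_score(h_dec, hs, W_identity) for hs in encoder_states]  # same as dot
gen = [general_score(h_dec, hs, W_learned) for hs in encoder_states]      # [1.0, -1.0, 0.0]
```

With W equal to the identity, "general" reduces to "dot". A learned W lets the model reweight dimensions, which here breaks the tie between h_2 and h_3.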
Performance
• Without attention, LSTM works quite well until a sentence gets longer than 30 words
• Attention does better, however, even with shorter sentences
• Other tricks in WMT 2017:
  • Improvements of 1.5 to 3 BLEU points (Edinburgh)
  • Layer normalization, deeper networks (encoder depth of 5, decoder depth of 8)
  • Byte Pair Encoding (BPE)
  • Reduced vocabulary improves memory efficiency
  • Data: parallel, back-translated, duplicated monolingual
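Byte Pair Encoding builds a subword vocabulary by repeatedly merging the most frequent adjacent pair of symbols. This is how it keeps the vocabulary small while still covering rare words. Below is a minimal sketch of the merge-learning loop in the style of Sennrich et al. (2016). It assumes a toy word-frequency table in which words are pre-split into characters and `</w>` marks the end of a word. Ties are broken by first occurrence, and the function names are illustrative.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of 'a b' with 'ab'."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair, first one on ties
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


toy_vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, final_vocab = learn_bpe(toy_vocab, 4)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o')]
```

The learned merge list is applied in order to new text. An unseen word like "lowest" can then still be segmented into known subwords ("lo w est</w>") instead of an unknown-word token.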
Questions?
Information Extraction
• Extraction of concrete facts from text
  • Named entities, relations, events
  • Often used to create a structured knowledge base of facts
• Kathy McKeown, a professor from Columbia University in New York City, took a train yesterday to Washington DC.
Named Entities
• Kathy McKeown [PER], a professor from Columbia University [ORG] in New York City [LOC], took a train yesterday to Washington DC [LOC].
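Sequence labeling systems (for example, MEMM or CRF taggers) typically predict one tag per token, often in a BIO scheme: B- begins an entity, I- continues it, and O is outside any entity. Below is a minimal sketch that turns such tags into the entity spans above. The tag sequence is hand-written for illustration, not the output of any particular tagger.

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_text, type) spans from per-token BIO tags."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        # B- starts a new entity; so does an I- whose type doesn't match.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # "O" closes any open entity
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans


tokens = ("Kathy McKeown , a professor from Columbia University in New York City , "
          "took a train yesterday to Washington DC .").split()
tags = ("B-PER I-PER O O O O B-ORG I-ORG O B-LOC I-LOC I-LOC "
        "O O O O O O B-LOC I-LOC O").split()
entities = bio_to_spans(tokens, tags)
# [('Kathy McKeown', 'PER'), ('Columbia University', 'ORG'),
#  ('New York City', 'LOC'), ('Washington DC', 'LOC')]
```

The B-/I- distinction matters when two entities of the same type are adjacent. Without it, they would merge into one span.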
Named Entities, Relations
• Kathy McKeown [PER], a professor from Columbia University [ORG] in New York City [LOC], took a train yesterday to Washington DC [LOC].
• Kathy McKeown from Columbia
• Columbia in New York City
Named Entities, Relations, Events
• Kathy McKeown [PER], a professor from Columbia University [ORG] in New York City [LOC], took a train yesterday to Washington DC [LOC].
• Kathy McKeown took a train (yesterday)
Entity Discovery and Linking
• Kathy McKeown, a professor from Columbia University in New York City, took a train yesterday to Washington DC.
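Entity linking maps each mention to a knowledge-base entry, or to NIL when no entry fits. A common baseline ranks candidates by a popularity prior, one of the features listed for these systems. The sketch below assumes a hypothetical toy KB with made-up entity IDs and prior values. Real systems combine the prior with context similarity and other features, and this sketch does not.

```python
# Hypothetical toy knowledge base: alias -> [(entity_id, popularity prior)].
TOY_KB = {
    "Columbia University": [("Columbia_University", 1.0)],
    "Columbia": [("Columbia_University", 0.6), ("Columbia,_South_Carolina", 0.4)],
    "New York City": [("New_York_City", 1.0)],
    "Washington DC": [("Washington,_D.C.", 1.0)],
}


def link(mention, kb):
    """Popularity-prior baseline: most popular candidate for the alias, else NIL."""
    candidates = kb.get(mention, [])
    if not candidates:
        return "NIL"  # not in this KB; NIL mentions would later be clustered
    entity_id, _ = max(candidates, key=lambda cand: cand[1])
    return entity_id


mentions = ["Kathy McKeown", "Columbia University", "New York City", "Washington DC", "Columbia"]
links = {m: link(m, TOY_KB) for m in mentions}
```

A prior-only linker always picks the same entity for an ambiguous alias such as "Columbia", whatever the context. This is why context-similarity features are needed on top of it.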
State of the Art (English), F-measure
• Named Entities (news): 89%
• Relations (slot filling): 59%
• Events (nuggets): 63%
Methods: Sequence labeling (MEMM, CRF), neural nets, distant learning
Features: linguistic features, similarity, popularity, gazetteers, ontologies, verb triggers
Where Have You Been, Entity Discovery and Linking? Grow with DEFT (Heng Ji, RPI)
• Mention Extraction. 2006-2011: Human (most); 2012-2017: Automatic
• NIL Clustering. 2006-2011: None; 2012-2017: 64 methods
• Foreign Languages. 2006-2011: Chinese (5%-10% lower than English); 2012-2017: System for 282 languages (Chinese/Spanish comparable to/outperform English); research toward 3,000 languages
• Document Size. 2006-2011: -; 2012-2017: 500 → 90,000 documents
• Genre. 2006-2011: News, web blog; 2012-2017: News, Discussion Forum, Web blog, Tweets
• Entity Types. 2006-2011: PER, GPE, ORG; 2012-2017: PER, GPE, ORG, LOC, FAC, hundreds of fine-grained types for typing
• Mention Types. 2006-2011: Name or all concepts (most); 2012-2017: Name, Nominal, Pronoun (for BeST)
• KB. 2006-2011: Wikipedia; 2012-2017: Freebase → List only
• Training Data. 2006-2011: 20,000 queries (entity mentions); 2012-2017: 500 → 0 documents; unsupervised linking comparable to supervised linking
• #(Good) Papers. 2006-2011: 62; 2012-2017: 110 (new KBP track at ACL); 6 tutorials at top conferences
Slide from Heng Ji
On the Horizon: Entity Discovery and Linking
Panel: Hoa Trang Dang, Jason Duncan, Heng Ji, Kevin Knight, Christopher Manning, Dan Roth
DEFT PI Meeting, 10:30am-11:30am, May 25, 2017
• Am going crazy
  • 3,000 languages
  • 10,000 entity types
  • All mention types
  • Multi-media
  • Streaming mode
  • List-only KB
  • Context-aware, living
  • No more training data
  • On-call evaluation
  • More non-traditional knowledge resources
• Lots of dev and test sets in lots of languages
• Am staying cool
  • Success in end-to-end cold-start KBP
• What's still wrong with name tagging
• Smarter collective inference
• Resolution of true aliases
• Resolution of handles used as entity mentions
Slide from Heng Ji