neural mt - columbia universitykathy/nlp/2017/classslides/... · • police body-worn cameras have...

Neural MT

Announcements
• HW2 directory structure penalty to be removed due to grading inconsistencies.
  • Those who lost 15 points will gain 15 points

• Dan Jurafsky will attend the beginning of class next Tuesday
• Be prepared with questions. Your chance!!!

• Rupal Patel: Monday, Dec. 4th, 11:30, Davis

  • Data Science Institute Colloquium Series Event: DAN JURAFSKY, STANFORD UNIVERSITY | Tuesday, December 5th at 5 PM in Davis Auditorium (412 CEPSR)

  • "Does This Vehicle Belong to You?" Processing the Language of Policing for Improving Police-Community Relations

  • ABSTRACT
    • Police body-worn cameras have the potential to play an important role in understanding and improving police-community relations. In this talk I describe a series of studies conducted by our large interdisciplinary team at Stanford that use speech and natural language processing on body-camera recordings to model the interactions between police officers and community members in traffic stops. We use text and speech features to automatically measure linguistic aspects of the interaction, from discourse factors like conversational structure to social factors like respect. I describe the differences we find in the language directed toward black versus white community members, and offer suggestions for how these findings can be used to help improve the fraught relations between police officers and the communities they serve.

Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net (Thursday)
• MT Evaluation

MT Evaluation
• More art than science
• Wide range of Metrics/Techniques
  • interface, …, scalability, …, faithfulness, …, space/time complexity, … etc.

• Automatic vs. Human-based
• Dumb Machines vs. Slow Humans

Slide from Nizar Habash

Human-based Evaluation Example: Accuracy Criteria

5  contents of original sentence conveyed (might need minor corrections)
4  contents of original sentence conveyed BUT errors in word order
3  contents of original sentence generally conveyed BUT errors in relationship between phrases, tense, singular/plural, etc.
2  contents of original sentence not adequately conveyed, portions of original sentence incorrectly translated, missing modifiers
1  contents of original sentence not conveyed, missing verbs, subjects, objects, phrases or clauses


Human-based Evaluation Example: Fluency Criteria

5  clear meaning, good grammar, terminology and sentence structure
4  clear meaning BUT bad grammar, bad terminology or bad sentence structure
3  meaning graspable BUT ambiguities due to bad grammar, bad terminology or bad sentence structure
2  meaning unclear BUT inferable
1  meaning absolutely unclear


Today: Crowdsourcing
• Amazon Mechanical Turk or CrowdFlower
• Create a HIT for each sentence
• Get multiple workers to rate
• Pay $0.01 to $0.10 per HIT
• Complete an evaluation in hours (vs. days/weeks)
• Ethics?

Automatic Evaluation Example: Bleu Metric (Papineni et al 2001)

• Bleu
  • BiLingual Evaluation Understudy
  • Modified n-gram precision with length penalty
  • Quick, inexpensive and language independent
  • Correlates highly with human evaluation
  • Bias against synonyms and inflectional variations


Automatic Evaluation Example: Bleu Metric

Test Sentence

colorless green ideas sleep furiously

Gold Standard References

all dull jade ideas sleep irately
drab emerald concepts sleep furiously
colorless immature thoughts nap angrily






Unigram precision = 4/5






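Unigram precision = 4/5 = 0.8 (every test word except "green" appears in some reference)
Bigram precision = 2/4 = 0.5 (only "ideas sleep" and "sleep furiously" appear in a reference)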

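Bleu Score = $(a_1 a_2 \cdots a_n)^{1/n} = (0.8 \times 0.5)^{1/2} \approx 0.6325 \rightarrow 63.25$, where $a_k$ is the modified k-gram precision (here n = 2)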
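The worked example can be reproduced with a short Python sketch of sentence-level BLEU: clipped ("modified") n-gram precision, the geometric mean over n = 1..2 as on the slide, and the brevity penalty from Papineni et al. The function names are hypothetical; real BLEU pools counts over a whole test corpus and usually uses n up to 4.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram counts at most as often
    as it occurs in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


def brevity_penalty(candidate, references):
    """1 if the candidate is at least as long as the closest reference,
    otherwise exp(1 - r/c)."""
    c = len(candidate)
    # closest reference length; ties go to the shorter reference
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c >= r else math.exp(1 - r / c)


def sentence_bleu(candidate, references, max_n=2):
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    return brevity_penalty(candidate, references) * geo_mean


test = "colorless green ideas sleep furiously".split()
refs = [
    "all dull jade ideas sleep irately".split(),
    "drab emerald concepts sleep furiously".split(),
    "colorless immature thoughts nap angrily".split(),
]
print(modified_precision(test, refs, 1))    # 0.8
print(modified_precision(test, refs, 2))    # 0.5
print(round(sentence_bleu(test, refs), 4))  # 0.6325
```

The brevity penalty is 1 here because the test sentence has the same length (5 words) as the closest reference, so the result matches the slide's 0.6325.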


BLEU scores for 110 translation systems trained on Europarl

Koehn, MT Summit, 2005 http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf

Automatic Evaluation Example: METEOR (Lavie and Agarwal 2007)

• Metric for Evaluation of Translation with Explicit word Ordering
• Extended matching between translation and reference
  • Porter stems, WordNet synsets
• Unigram precision, recall, parameterized F-measure
• Reordering penalty
• Parameters can be tuned to optimize correlation with human judgments
• Not biased against "non-statistical" MT systems
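To make the parameterized F-measure and reordering penalty concrete, here is a minimal Python sketch of METEOR-style scoring. It assumes exact-match unigram alignment only (no Porter stems or WordNet synsets) and a greedy left-to-right alignment, which is a simplification of METEOR's matcher. The defaults α = 0.9, β = 3, γ = 0.5 reproduce the earlier Banerjee and Lavie (2005) formula; Lavie and Agarwal tune these parameters to human judgments. Function names are hypothetical.

```python
def align_exact(hyp, ref):
    """Greedy exact-match alignment (no stems or synonyms): each hypothesis
    word is paired with the first unused reference position holding it."""
    used, alignment = set(), []
    for i, word in enumerate(hyp):
        for j, ref_word in enumerate(ref):
            if j not in used and ref_word == word:
                used.add(j)
                alignment.append((i, j))
                break
    return alignment


def count_chunks(alignment):
    """A chunk is a maximal run of matches adjacent in both sentences."""
    if not alignment:
        return 0
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if not (i1 == i0 + 1 and j1 == j0 + 1):
            chunks += 1
    return chunks


def meteor_like(hyp, ref, alpha=0.9, beta=3.0, gamma=0.5):
    """Parameterized F-measure times (1 - fragmentation penalty)."""
    alignment = align_exact(hyp, ref)
    m = len(alignment)
    if m == 0:
        return 0.0
    precision, recall = m / len(hyp), m / len(ref)
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    penalty = gamma * (count_chunks(alignment) / m) ** beta
    return f_mean * (1 - penalty)


hyp = "colorless green ideas sleep furiously".split()
ref = "drab emerald concepts sleep furiously".split()
print(meteor_like(hyp, ref))  # about 0.375
```

Here P = R = 0.4, so the F-measure is 0.4; the two matches form one chunk, giving a penalty of 0.5 × (1/2)^3 = 0.0625 and a score of 0.4 × 0.9375 = 0.375.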


Metrics MATR Workshop
• Workshop in AMTA conference 2008
  • Association for Machine Translation in the Americas

• Evaluating evaluation metrics
• Compared 39 metrics
  • 7 baselines and 32 new metrics
  • Various measures of correlation with human judgment
  • Different conditions: text genre, source language, number of references, etc.
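Evaluating an evaluation metric comes down to correlating its scores with human scores for the same outputs. Below is a minimal Python sketch of two common measures, Pearson's r and Spearman's rank correlation. The four score pairs are invented toy values, not MetricsMATR data, and the Spearman shortcut formula assumes there are no tied ranks.

```python
import math


def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank 1 = smallest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(xs, ys):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)), no ties."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Toy scores for four hypothetical systems (invented, not real data).
metric_scores = [0.2, 0.4, 0.5, 0.7]
human_scores = [30, 45, 40, 80]
print(round(pearson(metric_scores, human_scores), 3))
print(spearman(metric_scores, human_scores))  # 0.8
```

Pearson rewards a linear relationship between metric and human scores, while Spearman only asks whether the metric ranks systems in the same order as the humans.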


Automatic Evaluation Example: SEPIA (Habash and El Kholy 2008)
• A syntactically-aware evaluation metric
  • (Liu and Gildea, 2005; Owczarzak et al., 2007; Giménez and Màrquez, 2007)

• Uses dependency representation
  • MICA parser (Nasr & Rambow 2006)
  • 77% of all structural bigrams are surface n-grams of size 2, 3, 4

• Includes dependency surface span as a factor in score
  • long-distance dependencies should receive a greater weight than short-distance dependencies
  • Higher degree of grammaticality?

[Chart: percentage (0%-50%) of structural bigrams in each span category: 1, 2, 3, 4plus]
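The idea of a structural bigram and its surface span can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SEPIA's actual scoring formula: the dependency parses are hand-assigned (no MICA parser), a pair's span is the size of the surface n-gram covering head and dependent, and the hypothetical `span_weighted_precision` simply weights each matched head-dependent pair by that span so that long-distance dependencies count more.

```python
from collections import Counter


def structural_bigrams(words, heads):
    """Return (head_word, dependent_word, span) for every non-root token.
    heads[i] is the index of token i's head, or -1 for the root; span is
    the size of the surface n-gram covering head and dependent."""
    return [(words[h], words[i], abs(i - h) + 1)
            for i, h in enumerate(heads) if h >= 0]


def span_weighted_precision(hyp_bigrams, ref_bigrams):
    """Hypothetical weighting: matched (head, dependent) pairs contribute
    their span, so long-distance dependencies weigh more."""
    ref_pairs = {(h, d) for h, d, _ in ref_bigrams}
    matched = sum(s for h, d, s in hyp_bigrams if (h, d) in ref_pairs)
    total = sum(s for _, _, s in hyp_bigrams)
    return matched / total if total else 0.0


# Hand-assigned parses: adjectives -> noun, noun -> verb, adverb -> verb.
hyp = "colorless green ideas sleep furiously".split()
hyp_heads = [2, 2, 3, -1, 3]
ref = "drab emerald concepts sleep furiously".split()
ref_heads = [2, 2, 3, -1, 3]

hyp_bg = structural_bigrams(hyp, hyp_heads)
print(Counter(span for _, _, span in hyp_bg))  # Counter({2: 3, 3: 1})
print(span_weighted_precision(hyp_bg, structural_bigrams(ref, ref_heads)))
```

In this toy parse, three of the four structural bigrams are adjacent words (span 2) and "colorless ... ideas" spans three words; only "sleep furiously" matches the reference, giving 2/9.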

Neural MT takes over
• WMT (Workshop on Machine Translation)
• 2015: first neural MT, lower BLEU results

• 2016: neural MT beats phrase-based and syntax-based

[Chart: scores (axis 0-30) for Neural MT vs. Phrase based, 2015, 2016, 2017]

Results from WMT (Workshop on Machine Translation), German to English. 2015: Montreal; 2016 and 2017: Edinburgh

WMT 2017
• Tasks
  • News translation
  • Quality estimation
  • Automatic post-editing
  • Metrics
  • Multimodal MT and multilingual image description
  • Biomedical translation

News Translation Task
• 7 languages, 14 tasks (from and into English)
  • Chinese
  • Czech
  • German
  • Finnish
  • Latvian
  • Russian
  • Turkish

• Test data: 3000 sentences per language pair except Latvian: 2000 sentences

Training Data
• Europarl
• Common Crawl
• Yandex Russian-English data
• Wikipedia Headlines
• United Nations
• News Commentary V12
• EU Press Release parallel corpus for German, Finnish and Latvian

Submitted Systems
• 103 systems from 31 institutions (no companies)
• Company releases of Neural MT
  • Microsoft: February 2016
  • Systran: August 2016
  • Google: September 2016

Human Evaluation
• Assess adequacy on a 100-point scale (Direct Assessment) (vs. Relative Ranking)
  • How adequately does the translation express the meaning of the reference translation?

• One translation per screen/HIT

• 151 individual researchers
  • 29 different groups
  • Contributed 12,693 translation scores
  • 24 days, 22 hours

• 754 AMT workers
  • Contributed 237,200 scores
  • 47 days, 23 hours
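Direct Assessment yields many 0-100 scores per system from many annotators, which then have to be aggregated. A minimal Python sketch: it reports the raw mean per system plus the mean of per-annotator z-scores, a common way to cancel out individually harsh or lenient raters. The judgments are invented toy values, the function name is hypothetical, and WMT's full procedure (for example, quality control on crowd workers) is not reproduced here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def system_scores(judgments):
    """judgments: list of (worker_id, system_id, score on a 0-100 scale).
    Returns {system: (mean raw score, mean per-worker z-score)}."""
    by_worker = defaultdict(list)
    for worker, _, score in judgments:
        by_worker[worker].append(score)
    # Per-worker mean and standard deviation, used to standardize scores.
    stats = {w: (mean(s), pstdev(s)) for w, s in by_worker.items()}

    raw, z = defaultdict(list), defaultdict(list)
    for worker, system, score in judgments:
        mu, sigma = stats[worker]
        raw[system].append(score)
        z[system].append((score - mu) / sigma if sigma > 0 else 0.0)
    return {s: (mean(raw[s]), mean(z[s])) for s in raw}


# Invented toy judgments: worker "w2" scores everything lower than "w1".
judgments = [
    ("w1", "sysA", 90), ("w1", "sysB", 70),
    ("w2", "sysA", 60), ("w2", "sysB", 40),
]
for system, (avg, avg_z) in sorted(system_scores(judgments).items()):
    print(system, avg, round(avg_z, 3))
```

Both workers prefer sysA by the same margin, so after standardization the harsher worker's lower absolute scores no longer pull sysA and sysB toward each other.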

Some Results


Encoder-Decoder Approach

Basic RNN Approach

[Figure: the ENCODER reads the source words x1, x2, x3 ("das ist für") into hidden states h1, h2, h3; the DECODER produces y1, y2, y3 ("That is almost")]

Basic RNN Approach

[Figure: the same encoder-decoder diagram, annotated "Entire input represented here"]

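Recurrent decoder, but conditioned on the encoder summary $h_n$ at every step

[Figure: the same encoder-decoder diagram, with decoder states z_t producing each output y_t]

Transition: $z_t = f(z_{t-1}, y_{t-1}, h_n)$
Backpropagation: the gradient reaching $h_n$ sums over decoder steps, $\sum_t \partial z_t / \partial h_n$

Cho et al 2014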
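The transition $z_t = f(z_{t-1}, y_{t-1}, h_n)$ can be sketched in plain Python. Assumptions, all for illustration: f is a plain tanh RNN cell (Cho et al. 2014 use gated hidden units), every weight and embedding is random rather than trained, the output layer reads only $z_t$, and decoding is greedy for a fixed number of steps. Names such as `rnn_step` and `greedy_decode` are hypothetical.

```python
import math
import random

random.seed(0)
DIM = 4  # toy hidden/embedding size


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(weights, inputs):
    """Toy tanh cell: tanh(W_1 v_1 + W_2 v_2 + ...)."""
    total = [0.0] * DIM
    for w, v in zip(weights, inputs):
        total = [t + u for t, u in zip(total, matvec(w, v))]
    return [math.tanh(t) for t in total]


SRC = ["das", "ist", "für"]
TGT = ["<s>", "That", "is", "almost"]
src_embed = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in SRC}
tgt_embed = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in TGT}

# Untrained weights; the encoder and decoder have their own parameters.
W_enc_h, W_enc_x = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)
W_dec_z, W_dec_y, W_dec_c = (rand_matrix(DIM, DIM) for _ in range(3))
W_out = rand_matrix(len(TGT), DIM)  # decoder state -> target-word scores


def encode(source):
    h = [0.0] * DIM
    for word in source:
        h = rnn_step([W_enc_h, W_enc_x], [h, src_embed[word]])
    return h  # h_n summarizes the entire source sentence


def greedy_decode(h_n, steps):
    z, y_prev, output = [0.0] * DIM, "<s>", []
    for _ in range(steps):
        # z_t = f(z_{t-1}, y_{t-1}, h_n): h_n is fed to every decoder step
        z = rnn_step([W_dec_z, W_dec_y, W_dec_c], [z, tgt_embed[y_prev], h_n])
        scores = matvec(W_out, z)
        y_prev = TGT[max(range(len(TGT)), key=scores.__getitem__)]
        output.append(y_prev)
    return output


# Untrained, so the three predicted words are arbitrary.
print(greedy_decode(encode(SRC), steps=3))
```

Because $h_n$ enters every decoder step, gradients from each $z_t$ flow back into the encoder summary, which is the sum over $t$ on the slide.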

Results for Long Frequent Phrases

Cho et al 2014

Other Variants: Train weights separately

[Figure: the same ENCODER-DECODER diagram (x1-x3 "das ist für" → y1-y3 "That is almost")]

Also Useful
• Train stacked RNNs using multiple layers
• Use a bidirectional encoder
  • This can help in remembering the early part of the source input sentence

• Train the input sequence in reverse order: S1 S2 S3 -> T1 T2 T3 would be trained as S3 S2 S1 -> T1 T2 T3
  • Why?
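One way to see why reversal can help: count how many time steps separate each source word S_i from its counterpart T_i when the encoder reads the source and the decoder then emits T_1..T_n. The sketch assumes a monotone one-to-one correspondence S_i ↔ T_i, which is a simplification, and the function name is hypothetical. Reversal leaves the average distance unchanged but brings the first pairs much closer, which is the intuition offered by Sutskever et al. (2014), who introduced the trick.

```python
def source_target_distances(n, reverse_source):
    """Steps between S_i and T_i when the encoder reads S (optionally
    reversed) and the decoder then emits T_1..T_n."""
    source = list(range(1, n + 1))
    if reverse_source:
        source.reverse()
    # position of S_i in the encoder input; T_i sits at position n + i - 1
    pos_s = {word: p for p, word in enumerate(source)}
    return [(n + i - 1) - pos_s[i] for i in range(1, n + 1)]


print(source_target_distances(3, reverse_source=False))  # [3, 3, 3]
print(source_target_distances(3, reverse_source=True))   # [1, 3, 5]
```

With reversal, S1 sits right next to T1, so the first target words get short-range dependencies that are easier to learn.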

Replacing RNN with LSTM improves performance further
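A single LSTM step can be written in a few lines of plain Python. This follows the standard gated formulation (input, forget and output gates plus a candidate update). The weights are random placeholders, biases are omitted, and all names are hypothetical. The additive cell update c_t = f * c_{t-1} + i * g is the part usually credited with helping gradients survive across long inputs.

```python
import math
import random

random.seed(1)
H, X = 3, 2  # toy hidden and input sizes


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def affine(W, U, x, h):
    """W x + U h (biases omitted for brevity)."""
    return [sum(w * xi for w, xi in zip(W[r], x)) + sum(u * hi for u, hi in zip(U[r], h))
            for r in range(len(W))]


# One (W, U) pair per gate: input i, forget f, output o, candidate g.
params = {g: (rand_matrix(H, X), rand_matrix(H, H)) for g in "ifog"}


def lstm_step(x, h_prev, c_prev):
    i = [sigmoid(a) for a in affine(*params["i"], x, h_prev)]
    f = [sigmoid(a) for a in affine(*params["f"], x, h_prev)]
    o = [sigmoid(a) for a in affine(*params["o"], x, h_prev)]
    g = [math.tanh(a) for a in affine(*params["g"], x, h_prev)]
    # Cell state is updated additively: c_t = f * c_{t-1} + i * g
    c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


h, c = [0.0] * H, [0.0] * H
for x in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    h, c = lstm_step(x, h, c)
print(len(h), len(c))  # 3 3
```

In an encoder-decoder model, this step simply replaces the tanh cell wherever the RNN was used.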

Aligning and Translating

[Bahdanau, Cho, Bengio ICLR 2015]

Attention Mechanism - Scoring

[Figure: ENCODER states h1, h2, h3 over the source words x1, x2, x3 ("das ist für"); DECODER states H’1, H’2, H’3 with outputs y1, y2, y3; the decoder has produced "That" and must predict the next word "?"]

Score each encoder state against the previous decoder state, $\mathrm{score}(h'_{t-1}, h_s)$: 3, 5, 1

Convert into alignment weights $\alpha_t$: .3, .5, .1

Build context vector: weighted average
$c_t = \sum_s \alpha_t(s)\, h_s$

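How do you score it?

[Figure: the same diagram, with the decoder state H’_t compared against each encoder state h_s to produce $\alpha_t$ and $c_t$]

$\mathrm{score}(h_s, H'_t) = H'^{\top}_t h_s$ or $\mathrm{score}(h_s, H'_t) = H'^{\top}_t W_\alpha h_s$ (Luong et al 2015)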
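The scoring, normalization, and context-vector steps fit in one short Python sketch. It assumes softmax normalization of the scores (as in Bahdanau et al. and Luong et al.) and implements both the dot score $H'^{\top}_t h_s$ and the bilinear score $H'^{\top}_t W_\alpha h_s$. The encoder and decoder states are toy vectors chosen so the dot scores come out as 3, 5, 1, matching the slides; `W_alpha` is a hypothetical stand-in for a learned matrix. Note that softmax turns 3, 5, 1 into roughly 0.117, 0.867, 0.016, so the .3 .5 .1 on the slide are best read as illustrative.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def score_dot(h_dec, h_src):
    return dot(h_dec, h_src)  # H'_t^T h_s


def score_general(h_dec, h_src, W):
    return dot(h_dec, [dot(row, h_src) for row in W])  # H'_t^T W_alpha h_s


def attend(h_dec, enc_states, score):
    scores = [score(h_dec, h_s) for h_s in enc_states]
    alphas = softmax(scores)  # alignment weights alpha_t
    dim = len(enc_states[0])
    # c_t = sum_s alpha_t(s) h_s
    context = [sum(a * h_s[k] for a, h_s in zip(alphas, enc_states)) for k in range(dim)]
    return scores, alphas, context


enc = [[3.0, 0.2], [5.0, -0.4], [1.0, 1.0]]  # toy h_1, h_2, h_3
h_dec = [1.0, 0.0]                            # toy decoder state
scores, alphas, context = attend(h_dec, enc, score_dot)
print(scores)                          # [3.0, 5.0, 1.0]
print([round(a, 3) for a in alphas])   # [0.117, 0.867, 0.016]

W_alpha = [[0.5, 0.0], [0.0, 2.0]]     # hypothetical learned matrix
_, alphas_g, _ = attend(h_dec, enc, lambda d, s: score_general(d, s, W_alpha))
print([round(a, 3) for a in alphas_g])
```

The bilinear form lets the model learn which dimensions of the decoder and encoder states should be compared, at the cost of one extra matrix.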

Performance
• Without attention, LSTM works quite well until a sentence gets longer than 30 words
  • Attention does better, however, even with shorter sentences
• Other tricks in WMT 2017:
  • Improvements of 1.5-3 BLEU points (Edinburgh)
  • Layer normalization, deeper networks (encoder depth of 5, decoder depth of 8)
  • Byte Pair Encoding (BPE)
    • Reduced vocabulary improves memory efficiency
  • Data: parallel, back-translated, duplicated monolingual
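Byte pair encoding, as applied to NMT vocabularies by Sennrich et al. (2016) at Edinburgh, builds a subword vocabulary by repeatedly merging the most frequent adjacent pair of symbols. Below is a minimal Python sketch of the merge-learning loop on a toy word-frequency dictionary in the style of their example; the end-of-word marker `</w>`, the number of merges, and the function names are choices made for illustration.

```python
import re
from collections import Counter


def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every whole-symbol occurrence of the pair with one symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first on ties)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Toy corpus: words split into characters plus an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, vocab = learn_bpe(vocab, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(vocab)
```

Because rare words are split into known subwords, the model can keep a fixed, small vocabulary, which is the memory saving noted on the slide.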

Questions?

Information Extraction
• Extraction of concrete facts from text

• Named entities, relations, events
• Often used to create a structured knowledge base of facts

• Kathy McKeown, a professor from Columbia University in New York City, took a train yesterday to Washington DC.

Named Entities
• Kathy McKeown [PER], a professor from Columbia University [ORG] in New York City [LOC], took a train yesterday to Washington DC [LOC].

Named Entities, Relations
• Kathy McKeown [PER], a professor from Columbia University [ORG] in New York City [LOC], took a train yesterday to Washington DC [LOC].
• Kathy McKeown from Columbia
• Columbia in New York City

Named Entities, Relations, Events
• Kathy McKeown [PER], a professor from Columbia University [ORG] in New York City [LOC], took a train yesterday to Washington DC [LOC].
• Kathy McKeown took a train (yesterday)
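The entities, relations, and event extracted from the running example can be stored as structured records, the kind of knowledge base the earlier slide mentions. A minimal Python sketch with dataclasses; the relation labels `affiliated_with` and `located_in` and the event type `travel` are hypothetical names chosen for illustration, not a particular schema such as ACE or TAC KBP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    type: str  # e.g. PER, ORG, LOC


@dataclass(frozen=True)
class Relation:
    label: str  # hypothetical relation label
    arg1: Entity
    arg2: Entity


@dataclass(frozen=True)
class Event:
    type: str  # hypothetical event type
    agent: Entity
    destination: Entity
    time: str


kathy = Entity("Kathy McKeown", "PER")
columbia = Entity("Columbia University", "ORG")
nyc = Entity("New York City", "LOC")
dc = Entity("Washington DC", "LOC")

kb = {
    "entities": [kathy, columbia, nyc, dc],
    "relations": [Relation("affiliated_with", kathy, columbia),
                  Relation("located_in", columbia, nyc)],
    "events": [Event("travel", kathy, dc, "yesterday")],
}

# Query the knowledge base: who is affiliated with Columbia?
print([r.arg1.text for r in kb["relations"]
       if r.label == "affiliated_with" and r.arg2 == columbia])  # ['Kathy McKeown']
```

Once facts are in this form, questions like the one above become simple lookups instead of text processing.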

Entity Discovery and Linking
• Kathy McKeown, a professor from Columbia University in New York City, took a train yesterday to Washington DC.

State of the Art (English)

Task: F-measure
• Named Entities (news): 89%
• Relations (slot filling): 59%
• Events (nuggets): 63%

Methods: Sequence labeling (MEMM, CRF), neural nets, distant learning
Features: linguistic features, similarity, popularity, gazetteers, ontologies, verb triggers
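Sequence-labeling approaches (MEMM, CRF, neural taggers) recast entity spans as one tag per token, and results like the F-measures above are computed by comparing predicted spans with gold spans. The sketch below assumes the common BIO encoding (not stated on the slides) and exact-match span F-measure. It converts the running example's annotated spans to BIO tags and back; the "predicted" spans in the usage line are hypothetical.

```python
def spans_to_bio(n_tokens, spans):
    """spans: list of (start, end_exclusive, label) -> one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for k in range(start + 1, end):
            tags[k] = "I-" + label
    return tags


def bio_to_spans(tags):
    """Inverse mapping: recover (start, end_exclusive, label) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a final span
        if start is not None and not (tag.startswith("I-") and tag[2:] == label):
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold, predicted):
    """Exact-match span F-measure (harmonic mean of precision and recall)."""
    correct = len(set(gold) & set(predicted))
    if correct == 0:
        return 0.0
    p, r = correct / len(predicted), correct / len(gold)
    return 2 * p * r / (p + r)


tokens = ("Kathy McKeown , a professor from Columbia University in New York City , "
          "took a train yesterday to Washington DC .").split()
gold = [(0, 2, "PER"), (6, 8, "ORG"), (9, 12, "LOC"), (18, 20, "LOC")]
tags = spans_to_bio(len(tokens), gold)
for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")

# Hypothetical tagger output: misses "Washington DC", truncates "New York City".
predicted = [(0, 2, "PER"), (6, 8, "ORG"), (9, 11, "LOC")]
print(round(span_f1(gold, predicted), 3))
```

Exact-match scoring is strict: the truncated "New York" earns no credit, so this prediction gets precision 2/3, recall 2/4, and F = 4/7.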

Where Have You Been, Entity Discovery and Linking? Grow with DEFT
Heng Ji, RPI

                  | 2006-2011                            | 2012-2017
Mention Extraction | Human (most)                        | Automatic
NIL Clustering     | None                                | 64 methods
Foreign Languages  | Chinese (5%-10% lower than English) | System for 282 languages (Chinese/Spanish comparable to/outperform English); research toward 3,000 languages
Document Size      | -                                   | 500 → 90,000 documents
Genre              | News, web blog                      | News, Discussion Forum, Web blog, Tweets
Entity Types       | PER, GPE, ORG                       | PER, GPE, ORG, LOC, FAC, hundreds of fine-grained types for typing
Mention Types      | Name or all concepts (most)         | Name, Nominal, Pronoun (for BeST)
KB                 | Wikipedia                           | Freebase → List only
Training Data      | 20,000 queries (entity mentions)    | 500 → 0 documents; unsupervised linking comparable to supervised linking
# (Good) Papers    | 62                                  | 110 (new KBP track at ACL); 6 tutorials at top conferences

Slide from Heng Ji

On the Horizon: Entity Discovery and Linking
Panel: Hoa Trang Dang, Jason Duncan, Heng Ji, Kevin Knight, Christopher Manning, Dan Roth

DEFT PI Meeting, 10:30am-11:30am, May 25, 2017

• Am going crazy
  • 3,000 languages
  • 10,000 entity types
  • All mention types
  • Multi-media
  • Streaming mode
  • List-only KB
  • Context-aware, living
  • No more training data
  • On-call evaluation
  • More non-traditional knowledge resources
  • Lots of dev and test sets in lots of languages

• Am staying cool
  • Success in end-to-end cold-start KBP
  • What's still wrong with name tagging
  • Smarter collective inference
  • Resolution of true aliases
  • Resolution of handles used as entity mentions

Slide from Heng Ji
