6-2 smt 3 - cs.toronto.edufrank/csc401/lectures2017/6-2_smt_3.pdf


CSC401/2511 – Spring 2017

Today

• Overview of IBM-2, IBM-3, and phrase-based methods.

• Decoding for SMT.

• Evaluation of MT systems.


Practical note on programming IBM-1

• If you were to code the EM algorithm for IBM-1, you would not initialize $\theta = P(f \mid e)$ uniformly over the entire vocabulary.
  • Don't make a $|V_F| \times |V_E|$ table with $P(f \mid e) = 1/|V_F|$:
    • This structure would be too large.
    • Probabilities would be too small.
    • It would take too much work to update.

• Rather, initialize a hash table over possible alignments, $\mathcal{M}$. For every English word $e$, only consider French words $f$ in sentences aligned with English sentences containing $e$.

• e.g., structure $P.e.f := P(f \mid e) = 1/|\mathcal{M}|$
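As a minimal Python sketch of this idea, the hypothetical function `init_ibm1` below builds the hash table from a list of tokenized (English, French) sentence pairs. It omits the NULL word, and it reads $|\mathcal{M}|$ as the number of candidate French words for each English word $e$, so that each $P(\cdot \mid e)$ sums to 1; both are assumptions, not part of the slide.

```python
from collections import defaultdict


def init_ibm1(pairs):
    """Sparse uniform initialization of theta[e][f] = P(f | e) for IBM-1.

    pairs: list of (english_tokens, french_tokens) sentence pairs.
    Only French words that co-occur with e in some sentence pair get an
    entry; every other P(f | e) is implicitly zero and never stored.
    NULL word omitted (assumption, for brevity).
    """
    candidates = defaultdict(set)  # e -> French words seen opposite e
    for english, french in pairs:
        if not french:
            continue  # nothing to align to
        for e in set(english):
            candidates[e].update(french)

    theta = {}
    for e, french_words in candidates.items():
        p = 1.0 / len(french_words)  # uniform over e's candidates only
        theta[e] = {f: p for f in french_words}
    return theta


# Hypothetical toy corpus.
corpus = [("the house".split(), "la maison".split()),
          ("the book".split(), "le livre".split())]
theta = init_ibm1(corpus)
print(sorted(theta["house"].items()))  # [('la', 0.5), ('maison', 0.5)]
print(len(theta["the"]))               # 4
```

Because "house" never appears opposite "livre", no entry for that pair is ever created, so each EM iteration only touches pairs that can actually be aligned.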


Higher IBM models

• Only IBM Model 1 training reaches a global maximum.
  • Training of each IBM model extends the next-lowest model.

• Higher models become computationally expensive.

IBM Model 1: lexical translation
IBM Model 2: adds absolute re-ordering model
IBM Model 3: adds fertility model
…


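IBM-2

• Unlike IBM Model 1, the placement of a word in, say, Spanish in IBM Model 2 depends on where its equivalent word was in English.
• IBM-2 captures the intuition that translations should lie roughly “along the diagonal”.

[Figure: word-alignment grid with the Spanish words "Buenos dias , me gusta papas frías" as columns and the English words "Good day , I like cold potatoes" as rows; each English word has one alignment mark.]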


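IBM-2

• IBM Model 2 builds on Model 1 by adding a re-ordering model defined by distortion parameters, regardless of the actual words:

$D(i \mid j, \mathcal{L}_E, \mathcal{L}_F)$ = the probability that the $i^{th}$ English slot is aligned to the $j^{th}$ French slot, given sentence lengths $\mathcal{L}_E$ and $\mathcal{L}_F$.

• In IBM Model 2:

$P(a \mid E, \mathcal{L}_E, \mathcal{L}_F) = \prod_{j=1}^{\mathcal{L}_F} D(a_j \mid j, \mathcal{L}_E, \mathcal{L}_F)$

• Recall that in IBM Model 1,

$P(a \mid E, \mathcal{L}_E, \mathcal{L}_F) = \frac{P(\mathcal{L}_F)}{(\mathcal{L}_E + 1)^{\mathcal{L}_F}}$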


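IBM-2: Probability of alignment

• $E$ = And the program has been implemented
• $F$ = Le programme a été mis en application
• $\mathcal{L}_E = 6$
• $\mathcal{L}_F = 7$
• $a = \{2, 3, 4, 5, 6, 6, 6\}$ (i.e., $f_1 \leftarrow e_2$, $f_2 \leftarrow e_3$, …)

• $P(a \mid E, \mathcal{L}_E, \mathcal{L}_F) = D(2 \mid 1, 6, 7) \times D(3 \mid 2, 6, 7) \times D(4 \mid 3, 6, 7) \times D(5 \mid 4, 6, 7) \times D(6 \mid 5, 6, 7) \times D(6 \mid 6, 6, 7) \times D(6 \mid 7, 6, 7)$

$D(\text{2nd English word} \mid \text{1st French word}, \ldots)$: this is independent of the actual words. It cares only about position.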
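The product above can be sketched in Python as follows. `alignment_prob`, `uniform_D` and `diagonal_D` are hypothetical names, and `diagonal_D` is an invented toy distortion table (not a trained one) that simply prefers English slots near the diagonal. With a uniform $D$, the product reduces to IBM-1's $1/(\mathcal{L}_E + 1)^{\mathcal{L}_F}$ term.

```python
import math


def alignment_prob(a, D, len_e, len_f):
    """P(a | E, L_E, L_F) = prod_j D(a_j | j, L_E, L_F) under IBM-2.

    a[j - 1] is the English slot aligned to French slot j (1-based, 0 = NULL).
    Only positions are used; the words themselves never appear.
    """
    p = 1.0
    for j, i in enumerate(a, start=1):
        p *= D(i, j, len_e, len_f)
    return p


def uniform_D(i, j, len_e, len_f):
    # Every English slot 0..L_E equally likely, as in IBM-1.
    return 1.0 / (len_e + 1)


def diagonal_D(i, j, len_e, len_f, sharpness=2.0):
    # Hypothetical toy distortion: slots near the diagonal get more mass.
    def score(k):
        return math.exp(-sharpness * abs(k / len_e - j / len_f))
    return score(i) / sum(score(k) for k in range(len_e + 1))


a = [2, 3, 4, 5, 6, 6, 6]  # f_1 <- e_2, f_2 <- e_3, ... as in the example
print(alignment_prob(a, uniform_D, 6, 7))   # (1/7) ** 7, up to rounding
print(alignment_prob(a, diagonal_D, 6, 7))
```

Under `diagonal_D`, this near-diagonal alignment scores higher than a reversed one such as `[6, 5, 4, 3, 2, 1, 1]`, which is exactly the intuition IBM-2 is meant to capture.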


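IBM-2: generation

• To generate a French sentence $F$ from English $E$:
  1. Pick an alignment with probability
     $\prod_{j=1}^{\mathcal{L}_F} D(a_j \mid j, \mathcal{L}_E, \mathcal{L}_F)$
  3. Sample French words with probability
     $P(F \mid a, E) = \prod_{j=1}^{\mathcal{L}_F} P(f_j \mid e_{a_j})$

So,

$P(F, a \mid E) = P(a \mid E)\,P(F \mid a, E) = \prod_{j=1}^{\mathcal{L}_F} D(a_j \mid j, \mathcal{L}_E, \mathcal{L}_F)\,P(f_j \mid e_{a_j})$

where $P(f_j \mid e_{a_j})$ is the same $P(f \mid e)$ as in IBM-1.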
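A minimal sketch of the joint probability $P(F, a \mid E)$, assuming English position 0 holds the NULL word and that the lexical table `t` maps `(f, e)` to $P(f \mid e)$. The function name, the toy sentence pair and the probabilities are all hypothetical.

```python
def joint_prob(english, french, a, D, t):
    """P(F, a | E) = prod_j D(a_j | j, L_E, L_F) * P(f_j | e_{a_j}) (IBM-2).

    english[0] is the NULL word, so L_E excludes it.
    a[j - 1] is the English slot for French word j; t maps (f, e) -> P(f | e).
    """
    len_e, len_f = len(english) - 1, len(french)
    p = 1.0
    for j, (f, i) in enumerate(zip(french, a), start=1):
        p *= D(i, j, len_e, len_f) * t.get((f, english[i]), 0.0)
    return p


def uniform_D(i, j, len_e, len_f):
    return 1.0 / (len_e + 1)


# Hypothetical toy parameters, for illustration only.
t = {("la", "the"): 0.7, ("maison", "house"): 0.8}
english = ["NULL", "the", "house"]
print(joint_prob(english, ["la", "maison"], [1, 2], uniform_D, t))
```

Here both factors of the product are visible: the distortion term depends only on positions, while the lexical term is the IBM-1 table.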


IBM-2: training

• We use EM, as before with IBM-1, except that we need to take the distortion into account when computing the probability of an alignment.

• We also need to learn the distortion function.

• Aren't you glad that you don't need to know how to compute EM for IBM-2?


IBM-3

• IBM Model 3 extends Model 2 by adding a fertility model that describes how many French words each English word can produce.
• In the example below, "implemented" appears to be more fertile than "program".

[Figure: alignment links from the English words $e_0, e_1, \ldots, e_6$ to the French words $f_1, f_2, \ldots$]


IBM-3: The generation model

[Figure: the same alignment between $e_0, \ldots, e_6$ and the French words, labelled with the fertility $N(n \mid e)$.]

• First, we replicate each word according to a new hidden parameter, $N(n \mid e)$, which is the probability that word $e$ produces $n$ words.
• We then re-align (with distortion) and translate as we did in IBM-2.
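The first step of this generative story can be sketched as below, assuming the fertility table `N` maps each English word to a dictionary `{n: N(n | e)}`. The names and toy numbers are hypothetical, and the sketch covers only replication: a full IBM-3 model also generates NULL-inserted words and includes further combinatorial terms, which are omitted here.

```python
import random


def replicate(english, N, rng):
    """IBM-3 step 1: copy each English word n times, with n ~ N(n | e).

    Returns the replicated word list and the sampled fertilities.
    """
    copies, fertilities = [], []
    for e in english:
        ns, probs = zip(*N[e].items())
        n = rng.choices(ns, weights=probs)[0]
        fertilities.append(n)
        copies.extend([e] * n)  # fertility 0 drops the word entirely
    return copies, fertilities


def fertility_prob(english, fertilities, N):
    """Product of N(n_i | e_i): only the fertility part of the IBM-3 score."""
    p = 1.0
    for e, n in zip(english, fertilities):
        p *= N[e].get(n, 0.0)
    return p


# Hypothetical fertility tables: "implemented" tends to yield 3 French words.
N = {"program": {1: 0.9, 2: 0.1}, "implemented": {1: 0.2, 3: 0.8}}
rng = random.Random(0)
copies, fert = replicate(["program", "implemented"], N, rng)
print(copies, fert, fertility_prob(["program", "implemented"], fert, N))
```

The replicated list would then be re-ordered with a distortion model and translated word by word, as in IBM-2.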


IBM models

IBM Model 1: lexical translation
IBM Model 2: adds absolute re-ordering model
IBM Model 3: adds fertility model


No more words. I want to see phrases.


Phrase-based statistical MT

• Phrase-based statistical MT involves segmenting sentences into contiguous blocks, or segments.
• Each phrase is probabilistically translated, e.g., $P(\text{zu Hause} \mid \text{at home})$.
• Each phrase is probabilistically re-ordered.


Phrase-based statistical MT

• Phrase-based SMT allows many-to-many word mappings.

• Larger context allows for some disambiguation that is not possible in word-based alignment, e.g.,

  No context :(
  $P(\text{coup} \mid \text{stroke})$

  vs. a tiny amount of context :)
  $P(\text{coup de poing} \mid \text{punch}) > P(\text{coup de poing} \mid \text{stroke of fist})$
  $P(\text{coup d'oeil} \mid \text{glance}) > P(\text{coup d'oeil} \mid \text{stroke of eye})$


Learning phrase-translations

• Typically, we use alignment templates (Och et al., 1999).
  • Start with a word-alignment, then build phrases.

[Figure: word-alignment grid with the Spanish words "Maria no dió una bofetada a la bruja verde" as columns and the English words "Mary did not slap the green witch" as rows.]

This word-alignment is produced by a model like IBM-3.


Learning phrase-translations

• A phrase alignment must contain all word alignments for each of its rows and columns.
• Collect all phrase alignments that are consistent with the word alignment, e.g.

[Figure: three example phrase boxes on the alignment grid, labelled Consistent, Inconsistent, and Inconsistent.]
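The consistency test can be sketched as a brute-force enumeration over all phrase boxes up to a maximum length. The function names, the `max_len` cut-off, and the toy alignment below (a hypothetical alignment for the tail of the example, "the green witch" / "la bruja verde") are assumptions; positions are 0-based `(english_index, french_index)` pairs. A box is kept only if no link leaves it through one of its rows or columns and it contains at least one link.

```python
def consistent(alignment, e_start, e_end, f_start, f_end):
    """True if box [e_start..e_end] x [f_start..f_end] is a consistent phrase pair."""
    has_link = False
    for e, f in alignment:
        e_in = e_start <= e <= e_end
        f_in = f_start <= f <= f_end
        if e_in != f_in:
            return False  # a link escapes the box through a row or a column
        has_link = has_link or e_in
    return has_link


def extract_phrases(english, french, alignment, max_len=4):
    """Enumerate every consistent (english_phrase, french_phrase) pair."""
    pairs = []
    for es in range(len(english)):
        for ee in range(es, min(es + max_len, len(english))):
            for fs in range(len(french)):
                for fe in range(fs, min(fs + max_len, len(french))):
                    if consistent(alignment, es, ee, fs, fe):
                        pairs.append((" ".join(english[es:ee + 1]),
                                      " ".join(french[fs:fe + 1])))
    return pairs


# Hypothetical toy alignment: the<->la, green<->verde, witch<->bruja.
english = ["the", "green", "witch"]
french = ["la", "bruja", "verde"]
alignment = {(0, 0), (1, 2), (2, 1)}
for pair in extract_phrases(english, french, alignment):
    print(pair)
```

Note that "the green" yields no phrase: any French box covering "la" and "verde" also covers "bruja", whose link to "witch" leaves the box.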


Learning phrase-translations

• Given word-alignments (produced automatically or otherwise), we do not need to do EM training. E.g.,

• $P(f_1 f_2 \mid e_1 e_2 e_3) = \frac{\text{count}(f_1 f_2, e_1 e_2 e_3)}{\text{count}(e_1 e_2 e_3)}$
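Under this relative-frequency estimate, a phrase table is just two counters. In this sketch, `phrase_table` is a hypothetical name, the input is the list of phrase pairs extracted from the word-aligned corpus, and count(e) is taken to be the number of extracted pairs whose English side is e (an assumption about what is counted).

```python
from collections import Counter


def phrase_table(phrase_pairs):
    """P(f_phrase | e_phrase) = count(f, e) / count(e), by relative frequency.

    phrase_pairs: every (english_phrase, french_phrase) extracted from the
    word-aligned corpus; no EM is involved.
    """
    joint = Counter(phrase_pairs)
    english_totals = Counter(e for e, _ in phrase_pairs)
    return {(f, e): c / english_totals[e] for (e, f), c in joint.items()}


# Hypothetical extracted pairs, for illustration only.
pairs = [("at home", "zu Hause"), ("at home", "zu Hause"), ("at home", "daheim")]
table = phrase_table(pairs)
print(table[("zu Hause", "at home")])  # 2/3
```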


Phrase-based translation in practice


No more models. I want to see decoding.


Decoding

• Decoding is the act of translating a ‘foreign’ language into your native language.
• Decoding is an NP-complete problem (Knight, 1999).

• IBM Models are often decoded with stack decoding or A* search.

• Seminal paper: U. Germann, M. Jahr, K. Knight, D. Marcu, K. Yamada (2001). Fast Decoding and Optimal Decoding for Machine Translation. In: ACL-2001.
  • Introduces greedy decoding: start with a solution and incrementally try to improve it.


First stage of greedy method

• For each French word $f_j$, pick the English word $e^*$ such that

$e^* = \arg\max_e P(f_j \mid e)$

• This gives an initial alignment, e.g.,

Bien entendu , il parle d’ une belle victoire
Well heard , it talking ∅ a beautiful victory

(Better: quite naturally, he talks about a great victory)
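The first stage amounts to a per-word argmax. The sketch below assumes a lexical table `t` mapping `(f, e)` to $P(f \mid e)$; the function name and the probabilities are hypothetical, chosen so that the output echoes part of the gloss above.

```python
from collections import defaultdict


def initial_gloss(french, t):
    """Greedy first stage: e* = argmax_e P(f_j | e) for each French word f_j.

    t maps (f, e) -> P(f | e); French words with no entry are glossed as None.
    """
    options = defaultdict(dict)  # f -> {e: P(f | e)}
    for (f, e), p in t.items():
        options[f][e] = p
    return [max(options[f], key=options[f].get) if f in options else None
            for f in french]


# Hypothetical lexical probabilities, for illustration only.
t = {("il", "it"): 0.6, ("il", "he"): 0.5,
     ("parle", "talking"): 0.4, ("parle", "talks"): 0.3,
     ("victoire", "victory"): 0.9}
print(initial_gloss(["il", "parle", "victoire"], t))  # ['it', 'talking', 'victory']
```

Each word is chosen independently, which is why the result ("it talking") is ungrammatical and needs the later transformations.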


Some transformations

• $\mathit{Change}(j, e)$: sets the translation of $f_j$ to $e$.
  • Usually we only consider English words $e$ that are in the top $N$ ranked translations for $f_j$.

• $\mathit{Change2}(j_1, e_1, j_2, e_2)$: sets the translation of $f_{j_1}$ to $e_1$ and the translation of $f_{j_2}$ to $e_2$.
  • Like performing two $\mathit{Change}$ transformations in sequence, but without evaluating the intermediate string.

• $\mathit{ChangeAndInsert}(j, e_1, e_2)$: sets the translation of $f_j$ to $e_1$ and inserts $e_2$ at its most likely position.


Some more transformations

• $\mathit{RemoveInfertile}(i)$: removes $e_i$ if $e_i$ is aligned with no French words.

• $\mathit{SwapSeg}(i_1, i_2, j_1, j_2)$: swaps segment $e_{i_1:i_2}$ with segment $e_{j_1:j_2}$ such that the segments do not overlap.

• $\mathit{JoinWords}(i_1, i_2)$: removes $e_{i_1}$ and aligns all French words that were aligned to $e_{i_1}$ to $e_{i_2}$.
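To make the transformations concrete, the sketch below implements $\mathit{Change}$, $\mathit{Change2}$ and $\mathit{SwapSeg}$ on a simplified hypothesis: a word-for-word gloss with one English word (or ∅) per French position, so the alignment is implicit and positions are 1-based as on the slides. This representation is an assumption; in the full method the transformations act on $(E, a)$ pairs whose lengths can differ. The function names are hypothetical.

```python
def change(gloss, j, e):
    """Change(j, e): set the translation of f_j (1-based) to e; returns a new list."""
    new = list(gloss)
    new[j - 1] = e
    return new


def change2(gloss, j1, e1, j2, e2):
    """Change2: two Change steps applied together; the intermediate is never scored."""
    return change(change(gloss, j1, e1), j2, e2)


def swap_seg(words, i1, i2, j1, j2):
    """SwapSeg: swap words[i1..i2] with words[j1..j2] (1-based, inclusive)."""
    if not (1 <= i1 <= i2 < j1 <= j2 <= len(words)):
        raise ValueError("segments must be in order and must not overlap")
    first, second = words[i1 - 1:i2], words[j1 - 1:j2]
    return words[:i1 - 1] + second + words[i2:j1 - 1] + first + words[j2:]


gloss = ["Well", "heard", ",", "it", "talking", "∅", "a", "beautiful", "victory"]
print(" ".join(change2(gloss, 5, "talks", 8, "great")))
# Well heard , it talks ∅ a great victory
```

Returning new lists rather than mutating keeps the current hypothesis intact, which matters when a decoder scores many candidate transformations of the same state.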


Iterating greedily

• We have an initial pair $(E^{(0)}, a^{(0)})$.

• Use local transformations to map $(E, a)$ to new pairs, $(E', a')$.

• At each iteration, $k$, take the highest-probability pair from all possible transformations,
  • i.e., if $\mathcal{R}(E^{(k)}, a^{(k)})$ is the set of all $(E, a)$ ‘reachable’ from $(E^{(k)}, a^{(k)})$, then at each iteration:

$(E^{(k+1)}, a^{(k+1)}) = \arg\max_{(E, a) \in \mathcal{R}(E^{(k)}, a^{(k)})} P(E)\,P(F, a \mid E)$
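The iteration can be written as a generic hill-climbing loop. In this sketch, `greedy_decode`, `neighbours` and `score` are hypothetical: `neighbours(h)` should enumerate $\mathcal{R}(E^{(k)}, a^{(k)})$ by applying every transformation, and `score(h)` should compute $P(E)\,P(F, a \mid E)$. Stopping when no neighbour improves the score is one possible stopping condition; `max_iters` is an added safeguard.

```python
def greedy_decode(initial, neighbours, score, max_iters=100):
    """Hill-climb from `initial`, moving to the best-scoring neighbour each step.

    Returns (hypothesis, score) at a local maximum or after max_iters steps.
    """
    current, current_score = initial, score(initial)
    for _ in range(max_iters):
        best, best_score = None, current_score
        for h in neighbours(current):
            s = score(h)
            if s > best_score:  # keep only strict improvements
                best, best_score = h, s
        if best is None:  # no transformation improves the score
            break
        current, current_score = best, best_score
    return current, current_score


# Toy check: integers as "hypotheses", a peak at 3.
print(greedy_decode(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2))  # (3, 0)
```

Like any hill climber, this finds a local maximum only; the result depends on the initial gloss and on which transformations `neighbours` offers.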


Example of greedy search

Bien entendu , il parle d’ une belle victoire
Well heard , it talking ∅ a beautiful victory

Apply $\mathit{Change2}(5, \text{talks}, 8, \text{great})$:

Bien entendu , il parle d’ une belle victoire
Well heard , it talks ∅ a great victory


Example of greedy search

Bien entendu , il parle d’ une belle victoire
Well heard , it talks ∅ a great victory

Apply $\mathit{Change2}(2, \text{understood}, 6, \text{about})$:

Bien entendu , il parle d’ une belle victoire
Well understood , it talks about a great victory


Example of greedy search

Bien entendu , il parle d’ une belle victoire
Well understood , it talks about a great victory

Apply $\mathit{Change}(4, \text{he})$:

Bien entendu , il parle d’ une belle victoire
Well understood , he talks about a great victory


Example of greedy search

Bien entendu , il parle d’ une belle victoire
Well understood , he talks about a great victory

Apply $\mathit{Change2}(1, \text{quite}, 2, \text{naturally})$:

Bien entendu , il parle d’ une belle victoire
Quite naturally , he talks about a great victory


Greedy transformations

• At each iteration, we try each possible transformation.

• For each possible transformation, we evaluate $P(E)\,P(F, a \mid E)$.

• We choose the transformation that gives the highest probability, and iterate until some stopping condition.


No more decoding. I want to see evaluation.


Evaluation of MT systems

H: According to the data provided today by the Ministry of Foreign Trade and Economic Cooperation, as of November this year, China has actually utilized 46.959B US dollars of foreign capital, including 40.007B US dollars of direct investment from foreign businessmen.

IBM4: The Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment 40.007B US dollars today provide data include that year to November China actually using foreign 46.959B US dollars and

Yamada/Knight: Today’s available data of the Ministry of Foreign Trade and Economic Cooperation shows that China’s actual utilization of November this year will include 40.007B US dollars for the foreign direct investment among 46.959B US dollars in foreign capital.

How can we objectively compare the quality of two translations?


Automatic evaluation

• We want an automatic and effective method to objectively rank competing translations.
• Word Error Rate (WER) measures the number of erroneous word insertions, deletions, and substitutions in a translation.
  • E.g., Reference: how to recognize speech
          Translation: how understand a speech

• Problem: there are many possible valid translations. (There’s no need for an exact match.)
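WER is usually computed as a word-level Levenshtein distance. The sketch below counts the minimum number of insertions, deletions and substitutions, then divides by the reference length; that normalization is a common convention rather than something stated on the slide, and the function names are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word insertions, deletions and substitutions."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute (or match)
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    return word_errors(reference, hypothesis) / len(reference.split())


print(wer("how to recognize speech", "how understand a speech"))  # 0.5
```

For the example, the best alignment keeps "how" and "speech" and substitutes the two middle words, giving 2 errors out of 4 reference words.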


Challenges of evaluation

• Human judges: expensive, slow, non-reproducible (different judges, different biases).

• Multiple valid translations, e.g.:
  • Source: Il s’agit d’un guide qui assure que l’armée sera toujours fidèle au Parti
  • T1: It is a guide to action that ensures that the military will forever heed Party commands
  • T2: It is the guiding principle which guarantees the military forces always being under command of the Party


BLEU evaluation

• BLEU (BiLingual Evaluation Understudy) is an automatic and popular method for evaluating MT.
  • It uses multiple human reference translations, and looks for local matches, allowing for phrase movement.

• Candidate: n. a translation produced by a machine.

• There are a few parts to a BLEU score…


Example of BLEU evaluation

• Reference 1: It is a guide to action that ensures that the military will forever heed Party commands
• Reference 2: It is the guiding principle which guarantees the military forces always being under command of the Party
• Reference 3: It is the practical guide for the army always to heed the directions of the party

• Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party
• Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct


BLEU: Unigram precision

• The unigram precision of a candidate is

$\frac{C}{N}$

where $N$ is the number of words in the candidate and $C$ is the number of words in the candidate which are in at least one reference.

• e.g., Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party
  • Unigram precision = $\frac{17}{18}$ (obeys appears in none of the three references).
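A direct Python sketch of $C/N$; the function name is hypothetical, and case-insensitive matching is an assumption (it lets "Party" and "party" match). Nothing is capped yet, which is what the next slide addresses.

```python
def unigram_precision(candidate, references):
    """C / N: share of candidate words found in at least one reference (uncapped)."""
    cand = candidate.lower().split()
    ref_vocab = {w for ref in references for w in ref.lower().split()}
    return sum(w in ref_vocab for w in cand) / len(cand)


refs = [
    "It is a guide to action that ensures that the military will forever heed Party commands",
    "It is the guiding principle which guarantees the military forces always being under command of the Party",
    "It is the practical guide for the army always to heed the directions of the party",
]
cand1 = "It is a guide to action which ensures that the military always obeys the commands of the party"
print(unigram_precision(cand1, refs))  # 17/18, about 0.944
```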


BLEU: Modified unigram precision

• Reference 1: The lunatic is on the grass
• Reference 2: There is a lunatic upon the grass
• Candidate: The the the the the the the
  • Unigram precision = $\frac{7}{7} = 1$

• Capped unigram precision: a candidate word type $w$ can only be correct a maximum of $cap(w)$ times.
  • e.g., with $cap(\text{the}) = 2$, the above gives $p_1 = \frac{2}{7}$


BLEU: Generalizing to N-grams

• Generalizes to higher-order N-grams.
• Reference 1: It is a guide to action that ensures that the military will forever heed Party commands
• Reference 2: It is the guiding principle which guarantees the military forces always being under command of the Party
• Reference 3: It is the practical guide for the army always to heed the directions of the party

• Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party
  • Bigram precision, $p_2 = 10/17$

• Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct
  • Bigram precision, $p_2 = 1/13$
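Capped (modified) precision generalizes directly to $n$-grams. This sketch takes $cap(g)$ to be the largest number of times the $n$-gram $g$ occurs in any single reference, which is the choice made in the original BLEU proposal; the slides instead treat the cap as given. Function names are hypothetical and matching is case-insensitive (an assumption). The same function reproduces the capped unigram result $p_1 = 2/7$ for the lunatic example and the bigram precisions of Candidates 1 and 2.

```python
from collections import Counter


def ngrams(words, n):
    return [tuple(words[k:k + n]) for k in range(len(words) - n + 1)]


def capped_precision(candidate, references, n):
    """Modified n-gram precision p_n: each candidate n-gram g counts as correct
    at most cap(g) times, cap(g) = its largest count in any single reference."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    cap = Counter()
    for ref in references:
        for g, c in Counter(ngrams(ref.lower().split(), n)).items():
            cap[g] = max(cap[g], c)
    total = sum(cand.values())
    correct = sum(min(c, cap[g]) for g, c in cand.items())
    return correct / total if total else 0.0


refs = [
    "It is a guide to action that ensures that the military will forever heed Party commands",
    "It is the guiding principle which guarantees the military forces always being under command of the Party",
    "It is the practical guide for the army always to heed the directions of the party",
]
cand1 = "It is a guide to action which ensures that the military always obeys the commands of the party"
cand2 = "It is to insure the troops forever hearing the activity guidebook that party direct"
print(capped_precision(cand1, refs, 2))  # 10/17
print(capped_precision(cand2, refs, 2))  # 1/13
```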


BLEU: Precision is not enough

• Reference 1: It is a guide to action that ensures that the military will forever heed Party commands
• Reference 2: It is the guiding principle which guarantees the military forces always being under command of the Party
• Reference 3: It is the practical guide for the army always to heed the directions of the party

• Candidate 1: of the
  • Bigram precision, $p_2 = \frac{1}{1} = 1$
  • Unigram precision, $p_1 = \frac{2}{2} = 1$


BLEU: Brevity

• Solution: Penalize brevity.
• Step 1: for each candidate, find the reference most similar in length.
• Step 2: $c_i$ is the length of the $i^{th}$ candidate, and $r_i$ is the nearest length among the references,

$brevity_i = \frac{r_i}{c_i}$ (bigger = too brief)

• Step 3: multiply precision by the (0..1) brevity penalty:

$BP = \begin{cases} 1 & \text{if } brevity < 1 \quad (r_i < c_i) \\ e^{1 - brevity} & \text{if } brevity \ge 1 \quad (r_i \ge c_i) \end{cases}$
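The three steps translate into a few lines of Python. `brevity_penalty` is a hypothetical name, and breaking ties between equally near references in favour of the shorter one is an assumption, since the slide does not say.

```python
import math


def brevity_penalty(c, ref_lengths):
    """BP for a candidate of length c, given its references' lengths."""
    # Step 1: the reference length nearest to c (ties go to the shorter one).
    r = min(ref_lengths, key=lambda length: (abs(length - c), length))
    # Step 2: brevity = r / c; bigger means the candidate is too brief.
    brevity = r / c
    # Step 3: no penalty when the candidate is longer than that reference.
    return 1.0 if brevity < 1 else math.exp(1 - brevity)


print(brevity_penalty(18, [16, 17, 16]))            # 1.0
print(round(brevity_penalty(14, [16, 17, 16]), 4))  # 0.8669
```

With reference lengths 16, 17 and 16, the 18-word candidate is not penalized, while the 14-word candidate is scaled by $e^{1 - 16/14}$.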


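BLEU: Final score

• In the N-gram example above, $r_1 = 16$, $r_2 = 17$, $r_3 = 16$, and $c_1 = 18$ and $c_2 = 14$,

$brevity_1 = \frac{17}{18}$, so $BP_1 = 1$
$brevity_2 = \frac{16}{14}$, so $BP_2 = e^{1 - \frac{8}{7}} \approx 0.8669$

• Final score of candidate $C$:

$BLEU = BP_C \times (p_1 p_2 \cdots p_n)^{1/n}$

where $p_n$ is the $n$-gram precision. (You can set $n$ empirically.)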


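Example: Final BLEU score

• Reference 1: I am afraid Dave
  Reference 2: I am scared Dave
  Reference 3: I have fear David
  Candidate: I fear David

Assume $cap(\cdot) = 2$ for all N-grams. Also assume BLEU order $n = 2$.

• $brevity = \frac{4}{3} \ge 1$, so $BP = e^{1 - \frac{4}{3}}$

• $p_1 = \frac{1 + 1 + 1}{3} = 1$

• $p_2 = \frac{1}{2}$

• $BLEU = BP\,(p_1 p_2)^{\frac{1}{2}} = e^{1 - \frac{4}{3}} \left(\frac{1}{2}\right)^{\frac{1}{2}} \approx 0.5067$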
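Putting the pieces together, here is a minimal end-to-end sketch of BLEU with capped $n$-gram precisions, the brevity penalty and the geometric mean. The name `bleu`, case-insensitive matching, breaking length ties toward the shorter reference, and taking each cap as the largest count in any single reference are all assumptions. For this example every candidate $n$-gram occurs once, so the result matches the slide's $cap(\cdot) = 2$ assumption.

```python
import math
from collections import Counter


def ngram_counts(words, n):
    return Counter(tuple(words[k:k + n]) for k in range(len(words) - n + 1))


def bleu(candidate, references, max_n=2):
    """BLEU = BP * (p_1 * p_2 * ... * p_N) ** (1 / N) with capped precisions."""
    cand = candidate.lower().split()
    refs = [ref.lower().split() for ref in references]
    if not cand:
        return 0.0

    precisions = []
    for n in range(1, max_n + 1):
        counts = ngram_counts(cand, n)
        cap = Counter()  # largest count of each n-gram in any single reference
        for ref in refs:
            for g, c in ngram_counts(ref, n).items():
                cap[g] = max(cap[g], c)
        total = sum(counts.values())
        correct = sum(min(c, cap[g]) for g, c in counts.items())
        precisions.append(correct / total if total else 0.0)

    # Brevity penalty against the reference nearest in length (ties -> shorter).
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    bp = 1.0 if r / c < 1 else math.exp(1 - r / c)
    return bp * math.prod(precisions) ** (1 / max_n)


refs = ["I am afraid Dave", "I am scared Dave", "I have fear David"]
print(round(bleu("I fear David", refs), 4))  # 0.5067
```

`math.prod` requires Python 3.8 or later. A single zero precision drives the geometric mean, and hence the score, to 0.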


BLEU: summary

• BLEU is a geometric mean over $n$-gram precisions.
  • These precisions are capped to avoid strange cases.
    • E.g., the translation “the the the the” is not favoured.

  • This geometric mean is weighted so as not to favour unrealistically short translations, e.g., “the”.

• Initially, evaluations showed that BLEU predicted human judgements very well, but:
  • People started optimizing MT systems to maximize BLEU. Correlations between BLEU and humans decreased.

Reading

• Entirely optional: Vogel, S., Ney, H., and Tillmann, C. (1996). HMM-based Word Alignment in Statistical Translation. In: Proceedings of the 16th International Conference on Computational Linguistics, pp. 836-841, Copenhagen.

• Useful reading on IBM Model-1: Section 25.5 of the 2nd edition of the Jurafsky & Martin text.
  • 1st edition available at Robarts library.

• Other: Manning & Schütze Sections 13.1.2 (Gale & Church), 13.1.3 (Church), 13.3, 14.2.2


Announcements

• Assignment 1 marks/comments will be emailed individually.

• Not-for-marks midterm on Monday 6 March.