6-2 smt 3 - cs.toronto.edufrank/csc401/lectures2017/6-2_smt_3.pdf
CSC401/2511– Spring2017 2
Today
• Overview of IBM-2, IBM-3, and phrase-based methods.
• Decoding for SMT.
• Evaluation of MT systems.
Practical note on programming IBM-1
• If you were to code the EM algorithm for IBM-1, you would not initialize $\theta = P(f|e)$ uniformly over the entire vocabulary.
   • Don't make a $V_F \times V_E$ table with $P(f|e) = 1/|V_F|$.
      • This structure would be too large.
      • Probabilities would be too small.
      • It would take too much work to update.
• Rather, initialize a hash table over possible alignments, $\mathcal{M}$. For every English word $e$, only consider French words $f$ in sentences aligned with English sentences containing $e$.
   • e.g., structure P.e.f := $P(f|e) = 1/|\mathcal{M}|$
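The sparse initialization described above might be sketched as follows. This is a minimal illustration, not the course's reference code: the function name `init_translation_table`, the toy bitext, and the reading of $|\mathcal{M}|$ as "the number of distinct French words that co-occur with $e$" are all assumptions.

```python
from collections import defaultdict

def init_translation_table(bitext):
    """Return table[e][f] = P(f|e), uniform over the French words seen with e.

    bitext is a list of (english_tokens, french_tokens) sentence pairs.
    """
    cooccur = defaultdict(set)
    for english, french in bitext:
        for e in english:
            # Only French words in sentences aligned with e are candidates.
            cooccur[e].update(french)
    return {e: {f: 1.0 / len(fs) for f in fs} for e, fs in cooccur.items()}

# Toy data (hypothetical).
bitext = [
    (["the", "house"], ["la", "maison"]),
    (["the", "book"], ["le", "livre"]),
]
table = init_translation_table(bitext)
```

Here `table["house"]` holds entries only for "la" and "maison", so no memory or probability mass is spent on pairs that never co-occur.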
Higher IBM models
• Only IBM Model 1 training reaches a global maximum.
• Training of each IBM model extends the next lowest model.
• Higher models become computationally expensive.

IBM Model 1   lexical translation
IBM Model 2   adds absolute re-ordering model
IBM Model 3   adds fertility model
…             …
IBM-2
• Unlike IBM Model-1, the placement of a word in, say, Spanish in IBM Model-2 depends on where its equivalent word was in English.
• IBM-2 captures the intuition that translations should lie roughly "along the diagonal".

[Figure: alignment grid with the Spanish words "Buenos dias , me gusta papas frías" as columns and the English words "Good day , I like cold potatoes" as rows; each row carries one X mark.]
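IBM-2
• IBM Model 2 builds on Model 1 by adding a re-ordering model defined by distortion parameters, regardless of the actual words.

$D(i \mid j, \mathcal{L}_E, \mathcal{L}_F)$ = the probability that the $i$th English slot is aligned to the $j$th French slot, given sentence lengths $\mathcal{L}_E$ and $\mathcal{L}_F$.

• In IBM Model 2:

$$P(a \mid E, \mathcal{L}_E, \mathcal{L}_F) = \prod_{j=1}^{\mathcal{L}_F} D(a_j \mid j, \mathcal{L}_E, \mathcal{L}_F)$$

• Recall that in IBM Model 1,

$$P(a \mid E, \mathcal{L}_E, \mathcal{L}_F) = \frac{P(\mathcal{L}_F)}{(\mathcal{L}_E + 1)^{\mathcal{L}_F}}$$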
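IBM-2 – Probability of alignment
• $E$ = And the program has been implemented
• $F$ = Le programme a été mis en application
• $\mathcal{L}_E = 6$
• $\mathcal{L}_F = 7$
• $a = \{2, 3, 4, 5, 6, 6, 6\}$ (i.e., $f_1 \leftarrow e_2$, $f_2 \leftarrow e_3$, …)
• $P(a \mid E, \mathcal{L}_E, \mathcal{L}_F) = D(2 \mid 1,6,7) \times D(3 \mid 2,6,7) \times D(4 \mid 3,6,7) \times D(5 \mid 4,6,7) \times D(6 \mid 5,6,7) \times D(6 \mid 6,6,7) \times D(6 \mid 7,6,7)$
   • The first factor reads: D(2nd English word | 1st French word, …)

This is independent of the actual words. This cares only about position.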
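The product above can be evaluated mechanically once a distortion table is available. The sketch below assumes the table is a dictionary keyed by $(i, j, \mathcal{L}_E, \mathcal{L}_F)$; the function name and the distortion values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def alignment_probability(alignment, distortion, len_e, len_f):
    """P(a | E, L_E, L_F) = prod_j D(a_j | j, L_E, L_F), French positions j are 1-based."""
    return math.prod(
        distortion[(i, j, len_e, len_f)]
        for j, i in enumerate(alignment, start=1)
    )

# The alignment from the example: f_1 <- e_2, f_2 <- e_3, ..., f_7 <- e_6.
alignment = [2, 3, 4, 5, 6, 6, 6]
# Hypothetical distortion table: every factor used above is set to 0.5.
distortion = {(i, j, 6, 7): 0.5 for j, i in enumerate(alignment, start=1)}
p = alignment_probability(alignment, distortion, 6, 7)
```

Note that no word identities enter the computation, only positions and sentence lengths.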
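IBM-2: generation
• To generate a French sentence $F$ from English $E$,
   1. Pick an alignment with probability $\prod_{j=1}^{\mathcal{L}_F} D(a_j \mid j, \mathcal{L}_E, \mathcal{L}_F)$
   2. Sample French words with probability
      $$P(F \mid a, E) = \prod_{j=1}^{\mathcal{L}_F} P(f_j \mid e_{a_j})$$
      This is the same $P(f|e)$ as in IBM-1.

So,

$$P(F, a \mid E) = P(a \mid E)\, P(F \mid a, E) = \prod_{j=1}^{\mathcal{L}_F} D(a_j \mid j, \mathcal{L}_E, \mathcal{L}_F)\, P(f_j \mid e_{a_j})$$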
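As a rough illustration of the joint probability $P(F, a \mid E)$, the sketch below multiplies one distortion factor and one lexical factor per French word. The dictionary layouts, the function name, and all numbers are assumptions made for the example; the NULL word $e_0$ is not handled.

```python
def joint_probability(french, english, alignment, distortion, translation):
    """P(F, a | E) = prod_j D(a_j | j, L_E, L_F) * P(f_j | e_{a_j})."""
    len_e, len_f = len(english), len(french)
    prob = 1.0
    for j, (f, i) in enumerate(zip(french, alignment), start=1):
        e = english[i - 1]  # alignment indices are 1-based
        prob *= distortion[(i, j, len_e, len_f)] * translation[(f, e)]
    return prob

# Hypothetical toy parameters.
english = ["the", "house"]
french = ["la", "maison"]
alignment = [1, 2]
distortion = {(1, 1, 2, 2): 0.8, (2, 2, 2, 2): 0.8}
translation = {("la", "the"): 0.5, ("maison", "house"): 0.9}
p_joint = joint_probability(french, english, alignment, distortion, translation)
```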
IBM-2: training
• We use EM, as before with IBM-1, except that we need to take the distortion into account when computing the probability of an alignment.
• We also need to learn the distortion function.
• Aren't you glad that you don't need to know how to compute EM for IBM-2?
IBM-3
• IBM Model 3 extends Model 2 by adding a fertility model that describes how many French words each English word can produce.
• In the example below, implemented appears to be more fertile than program.

[Figure: alignment between the English words $e_0, e_1, \dots, e_6$ and the French words $f_1, f_2, \dots$]
IBM-3: The generation model

[Figure: the same alignment between the English words $e_0, e_1, \dots, e_6$ and the French words $f_1, f_2, \dots$, annotated with $N(n \mid e)$.]

• First, we replicate each word according to a new hidden parameter, $N(n \mid e)$, which is the probability that word $e$ produces $n$ words.
• We then re-align (with distortion) and translate as we did in IBM-2.
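Given an alignment, the fertility of each English word is simply the number of French positions aligned to it. The helper below is a small sketch of that counting step (the function name is hypothetical); it covers only the fertility counts and omits the other factors of the full IBM-3 model.

```python
from collections import Counter

def fertilities(alignment, len_e):
    """Return [phi_1, ..., phi_{L_E}], where phi_i counts French words aligned to e_i."""
    counts = Counter(alignment)
    return [counts[i] for i in range(1, len_e + 1)]

# Alignment from the IBM-2 example: E = "And the program has been implemented".
phi = fertilities([2, 3, 4, 5, 6, 6, 6], len_e=6)
```

For that alignment, "implemented" ($e_6$) has fertility 3 while "program" ($e_3$) has fertility 1, and "And" ($e_1$) has fertility 0.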
IBM models

IBM Model 1   lexical translation
IBM Model 2   adds absolute re-ordering model
IBM Model 3   adds fertility model
Phrase-based statistical MT
• Phrase-based statistical MT involves segmenting sentences into contiguous blocks or segments.
• Each phrase is probabilistically translated.
   • e.g., $P(\text{zu Hause} \mid \text{at home})$
• Each phrase is probabilistically re-ordered.
Phrase-based statistical MT
• Phrase-based SMT allows many-to-many word mappings.
• Larger context allows for some disambiguation that is not possible in word-based alignment.
• E.g.,
   $P(\text{coup} \mid \text{stroke})$   (no context ☹)
   vs.
   $P(\text{coup de poing} \mid \text{punch}) > P(\text{coup de poing} \mid \text{stroke of fist})$
   $P(\text{coup d'oeil} \mid \text{glance}) > P(\text{coup d'oeil} \mid \text{stroke of eye})$   (a tiny amount of context ☺)
Learning phrase-translations
• Typically, we use alignment templates (Och et al., 1999).
• Start with a word-alignment, then build phrases.

[Figure: word-alignment grid with the Spanish words "Maria no dió una bofetada a la bruja verde" as columns and the English words "Mary did not slap the green witch" as rows. This word-alignment is produced by a model like IBM-3.]
Learning phrase-translations
• A phrase alignment must contain all word alignments for each of its rows and columns.
• Collect all phrase alignments that are consistent with the word alignment, e.g.

[Figure: three candidate phrase alignments, labelled Consistent, Inconsistent, Inconsistent.]
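The consistency condition can be checked directly: no word-alignment point may have one end inside the phrase pair and the other end outside. The sketch below assumes 0-based inclusive spans and an illustrative word alignment for the Mary/Maria sentence pair; the alignment points are an assumption, since the grid itself is not recoverable from the slide text.

```python
def is_consistent(alignment, e_span, f_span):
    """Check a phrase pair against a set of (e_index, f_index) alignment points.

    Spans are inclusive (start, end) index pairs.
    """
    e_lo, e_hi = e_span
    f_lo, f_hi = f_span
    has_point_inside = False
    for e, f in alignment:
        e_in = e_lo <= e <= e_hi
        f_in = f_lo <= f <= f_hi
        if e_in != f_in:
            # A row or column of the phrase has an alignment point outside it.
            return False
        if e_in:
            has_point_inside = True
    return has_point_inside

# English: Mary(0) did(1) not(2) slap(3) the(4) green(5) witch(6)
# Spanish: Maria(0) no(1) dió(2) una(3) bofetada(4) a(5) la(6) bruja(7) verde(8)
# Assumed alignment points, for illustration only.
points = {(0, 0), (2, 1), (3, 2), (3, 3), (3, 4), (4, 6), (5, 8), (6, 7)}

slap_full = is_consistent(points, (3, 3), (2, 4))     # slap <-> dió una bofetada
slap_partial = is_consistent(points, (3, 3), (2, 3))  # slap <-> dió una
green_witch = is_consistent(points, (5, 6), (7, 8))   # green witch <-> bruja verde
```

Under these assumed points, "slap ↔ dió una" is rejected because "bofetada" is aligned to "slap" but falls outside the French span.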
Learning phrase-translations
• Given word-alignments (produced automatically or otherwise), we do not need to do EM training. E.g.,
• $P(f_1 f_2 \mid e_1 e_2 e_3) = \dfrac{count(f_1 f_2,\ e_1 e_2 e_3)}{count(e_1 e_2 e_3)}$
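The relative-frequency estimate above needs only two counters. The following sketch assumes the consistent phrase pairs have already been extracted into a list of (foreign phrase, English phrase) tuples; the function name and the data are made up for illustration.

```python
from collections import Counter

def phrase_translation_probs(phrase_pairs):
    """Estimate P(f_phrase | e_phrase) = count(f, e) / count(e) from extracted pairs."""
    pair_counts = Counter(phrase_pairs)
    e_counts = Counter(e for _, e in phrase_pairs)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

# Hypothetical extracted phrase pairs.
pairs = [
    ("zu Hause", "at home"),
    ("zu Hause", "at home"),
    ("daheim", "at home"),
    ("bruja verde", "green witch"),
]
probs = phrase_translation_probs(pairs)
```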
Decoding
• Decoding is the act of translating a 'foreign' language into your native language.
• Decoding is an NP-complete problem (Knight, 1999).
• IBM Models often decoded with stack decoding or A* search.
• Seminal paper: U. Germann, M. Jahr, K. Knight, D. Marcu, K. Yamada (2001) Fast Decoding and Optimal Decoding for Machine Translation. In: ACL-2001.
   • Introduces greedy decoding – start with a solution and incrementally try to improve it.
First stage of greedy method
• For each French word $f_j$, pick the English word $e^*$ such that

$$e^* = \operatorname*{argmax}_{e} P(f_j \mid e)$$

• This gives an initial alignment, e.g.,

Bien entendu , il parle d' une belle victoire
Well heard , it talking ∅ a beautiful victory

(Better: quite naturally, he talks about a great victory)
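The first stage amounts to a word-by-word argmax over the translation table. A minimal sketch, assuming the table is stored as `translation[e][f] = P(f|e)`; the function name and the probabilities are invented for the example.

```python
def initial_gloss(french, translation):
    """For each French word f_j, return the English word e maximizing P(f_j | e)."""
    gloss = []
    for f in french:
        # Unseen (e, f) pairs are treated as probability 0.
        best = max(translation, key=lambda e: translation[e].get(f, 0.0))
        gloss.append(best)
    return gloss

# Hypothetical translation table.
translation = {
    "well": {"bien": 0.6},
    "good": {"bien": 0.3},
    "heard": {"entendu": 0.5},
    "understood": {"entendu": 0.4},
}
gloss = initial_gloss(["bien", "entendu"], translation)
```

Because each word is chosen in isolation, the result is a literal gloss such as "well heard", which the later transformations try to improve.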
Some transformations
• $Change(j, e)$: sets translation of $f_j$ to $e$
   • Usually we only consider English words $e$ that are in the top $N$ ranked translations for $f_j$.
• $Change2(j_1, e_1, j_2, e_2)$: sets translation of $f_{j_1}$ to $e_1$ and translation of $f_{j_2}$ to $e_2$
   • Like performing two $Change$ transformations in sequence, but without evaluating the intermediate string.
• $ChangeAndInsert(j, e_1, e_2)$: sets translation of $f_j$ to $e_1$ and inserts $e_2$ at its most likely position.
Some more transformations
• $RemoveInfertile(i)$: Removes $e_i$ if $e_i$ is aligned with no French words.
• $SwapSeg(i_1, i_2, j_1, j_2)$: Swaps segment $e_{i_1:i_2}$ with segment $e_{j_1:j_2}$ such that segments do not overlap.
• $JoinWords(i_1, i_2)$: Removes $e_{i_1}$ and aligns all French words that were aligned to $e_{i_1}$ to $e_{i_2}$.
Iterating greedily
• We have an initial pair $(E^{(0)}, a^{(0)})$.
• Use local transformations to map $(E, a)$ to new pairs, $(E', a')$.
• At each iteration, $k$, take the highest probability pair from all possible transformations
   • i.e., if $\mathcal{R}(E^{(k)}, a^{(k)})$ is the set of all $(E, a)$ 'reachable' from $(E^{(k)}, a^{(k)})$, then at each iteration:

$$(E^{(k+1)}, a^{(k+1)}) = \operatorname*{argmax}_{(E, a) \in \mathcal{R}(E^{(k)}, a^{(k)})} P(E)\, P(F, a \mid E)$$
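The iteration can be written as a generic hill-climbing loop. In the sketch below, `neighbours` stands in for the set $\mathcal{R}$ and `score` for $P(E)\,P(F, a \mid E)$; both are hypothetical callables, the toy scores are invented, and stopping when no neighbour improves the score is just one possible stopping condition.

```python
def greedy_decode(initial, neighbours, score, max_iters=100):
    """Repeatedly move to the best-scoring reachable state until nothing improves."""
    current, current_score = initial, score(initial)
    for _ in range(max_iters):
        candidates = list(neighbours(current))
        if not candidates:
            break
        best = max(candidates, key=score)
        best_score = score(best)
        if best_score <= current_score:
            break  # local maximum reached
        current, current_score = best, best_score
    return current

# Toy setup: a state is a tuple of English words; one word may change per step.
options = {0: ["well", "quite"], 1: ["heard", "understood", "naturally"]}
word_score = {"well": 0.5, "quite": 0.4, "heard": 0.3, "understood": 0.5, "naturally": 0.45}

def neighbours(state):
    for pos, words in options.items():
        for w in words:
            if w != state[pos]:
                yield state[:pos] + (w,) + state[pos + 1:]

def score(state):
    return word_score[state[0]] * word_score[state[1]]

result = greedy_decode(("well", "heard"), neighbours, score)
```

With these toy scores the search stops at ("well", "understood"), a local maximum of the single-word moves.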
Example of greedy search

Step 1: $Change2(5, talks, 8, great)$
Bien entendu , il parle d' une belle victoire
Well heard , it talking ∅ a beautiful victory
becomes
Bien entendu , il parle d' une belle victoire
Well heard , it talks ∅ a great victory

Step 2: $Change2(2, understood, 6, about)$
Bien entendu , il parle d' une belle victoire
Well heard , it talks ∅ a great victory
becomes
Bien entendu , il parle d' une belle victoire
Well understood , it talks about a great victory

Step 3: $Change(4, he)$
Bien entendu , il parle d' une belle victoire
Well understood , it talks about a great victory
becomes
Bien entendu , il parle d' une belle victoire
Well understood , he talks about a great victory

Step 4: $Change2(1, quite, 2, naturally)$
Bien entendu , il parle d' une belle victoire
Well understood , he talks about a great victory
becomes
Bien entendu , il parle d' une belle victoire
Quite naturally , he talks about a great victory
Greedy transformations
• At each iteration, we try each possible transformation.
• For each possible transformation, we evaluate $P(E)\, P(F, a \mid E)$.
• We choose the transformation that gives the highest probability, and iterate until some stopping condition.
Evaluation of MT systems

H: According to the data provided today by the Ministry of Foreign Trade and Economic Cooperation, as of November this year, China has actually utilized 46.959B US dollars of foreign capital, including 40.007B US dollars of direct investment from foreign businessmen.

IBM4: The Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment 40.007B US dollars today provide data include that year to November China actually using foreign 46.959B US dollars and

Yamada/Knight: Today's available data of the Ministry of Foreign Trade and Economic Cooperation shows that China's actual utilization of November this year will include 40.007B US dollars for the foreign direct investment among 46.959B US dollars in foreign capital.

How can we objectively compare the quality of two translations?
Automatic evaluation
• We want an automatic and effective method to objectively rank competing translations.
• Word Error Rate (WER) measures the number of erroneous word insertions, deletions, and substitutions in a translation.
   • E.g., Reference: how to recognize speech
           Translation: how understand a speech
• Problem: There are many possible valid translations. (There's no need for an exact match.)
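WER is usually computed as a word-level edit distance. The sketch below uses the standard dynamic-programming recurrence and divides by the reference length; that normalization is a common convention assumed here, not something stated on the slide.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[-1][-1] / len(ref)

wer = word_error_rate("how to recognize speech", "how understand a speech")
```

For the slide's example the minimum is two substitutions ("to" → "understand", "recognize" → "a"), giving a WER of 2/4.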
Challenges of evaluation
• Human judges: expensive, slow, non-reproducible (different judges – different biases).
• Multiple valid translations, e.g.:
   • Source: Il s'agit d'un guide qui assure que l'armée sera toujours fidèle au Parti
   • T1: It is a guide to action that ensures that the military will forever heed Party commands
   • T2: It is the guiding principle which guarantees the military forces always being under command of the Party
BLEU evaluation
• BLEU (BiLingual Evaluation Understudy) is an automatic and popular method for evaluating MT.
   • It uses multiple human reference translations, and looks for local matches, allowing for phrase movement.
• Candidate: n. a translation produced by a machine.
• There are a few parts to a BLEU score…
Example of BLEU evaluation
• Reference 1: It is a guide to action that ensures that the military will forever heed Party commands
• Reference 2: It is the guiding principle which guarantees the military forces always being under command of the Party
• Reference 3: It is the practical guide for the army always to heed the directions of the party
• Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party
• Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct
BLEU: Unigram precision
• The unigram precision of a candidate is $\dfrac{C}{N}$, where $N$ is the number of words in the candidate and $C$ is the number of words in the candidate which are in at least one reference.
• e.g., Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party
   • Unigram precision = $\dfrac{17}{18}$ (obeys appears in none of the three references).
BLEU: Modified unigram precision
• Reference 1: The lunatic is on the grass
• Reference 2: There is a lunatic upon the grass
• Candidate: The the the the the the the
   • Unigram precision = $\dfrac{7}{7} = 1$
• Capped unigram precision: A candidate word type $w$ can only be correct a maximum of $cap(w)$ times.
   • e.g., with $cap(the) = 2$, the above gives $p_1 = \dfrac{2}{7}$
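A small sketch of the capped unigram precision follows. It assumes lowercased whitespace tokens and takes $cap(w)$ to be the largest number of times $w$ occurs in any single reference; that choice reproduces $cap(the) = 2$ for this example, but the slide does not say how the cap is set, and the function name is hypothetical.

```python
from collections import Counter

def capped_unigram_precision(candidate, references):
    """Unigram precision where each word type counts as correct at most cap(w) times."""
    cand = Counter(candidate.lower().split())
    refs = [Counter(r.lower().split()) for r in references]
    correct = 0
    for word, count in cand.items():
        cap = max(ref[word] for ref in refs)  # assumed definition of cap(w)
        correct += min(count, cap)
    return correct / sum(cand.values())

p1 = capped_unigram_precision(
    "The the the the the the the",
    ["The lunatic is on the grass", "There is a lunatic upon the grass"],
)
```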
CSC401/2511– Spring2017 39
BLEU: Generalizing to N-grams
• Generalizes to higher-order N-grams.
• Reference 1: It is a guide to action that ensures that the military will forever heed Party commands
• Reference 2: It is the guiding principle which guarantees the military forces always being under command of the Party
• Reference 3: It is the practical guide for the army always to heed the directions of the party
• Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party
   • Bigram precision, $p_2 = 10/17$
• Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct
   • Bigram precision, $p_2 = 1/13$
BLEU: Precision is not enough
• Reference 1: It is a guide to action that ensures that the military will forever heed Party commands
• Reference 2: It is the guiding principle which guarantees the military forces always being under command of the Party
• Reference 3: It is the practical guide for the army always to heed the directions of the party
• Candidate 1: of the
   • Unigram precision, $p_1 = \dfrac{2}{2} = 1$
   • Bigram precision, $p_2 = \dfrac{1}{1} = 1$
BLEU: Brevity
• Solution: Penalize brevity.
• Step 1: for each candidate, find the reference most similar in length.
• Step 2: $c_i$ is the length of the $i$th candidate, and $r_i$ is the nearest length among the references,

$$brevity_i = \frac{r_i}{c_i} \qquad \text{(bigger = too brief)}$$

• Step 3: multiply precision by the (0..1) brevity penalty:

$$BP = \begin{cases} 1 & \text{if } brevity < 1 \quad (r_i < c_i) \\ e^{1 - brevity} & \text{if } brevity \geq 1 \quad (r_i \geq c_i) \end{cases}$$
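BLEU: Final score
• On slide 39, the reference lengths are $r_1 = 16$, $r_2 = 17$, $r_3 = 16$, and the candidate lengths are $c_1 = 18$ and $c_2 = 14$,

$$brevity_1 = \frac{17}{18} \qquad BP_1 = 1$$

$$brevity_2 = \frac{16}{14} \qquad BP_2 = e^{1 - \frac{8}{7}} = 0.8669$$

• Final score of candidate $C$:

$$BLEU = BP_C \times (p_1 p_2 \dots p_n)^{1/n}$$

where $p_n$ is the $n$-gram precision. (You can set $n$ empirically.)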
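The two brevity penalties above can be reproduced with a few lines. In this sketch the function name is hypothetical, and ties between equally near reference lengths are broken by taking the first one, which is an assumption.

```python
import math

def brevity_penalty(candidate_len, reference_lens):
    """BP = 1 if r/c < 1, else exp(1 - r/c), with r the reference length nearest to c."""
    nearest = min(reference_lens, key=lambda r: abs(r - candidate_len))
    brevity = nearest / candidate_len
    return 1.0 if brevity < 1 else math.exp(1 - brevity)

bp1 = brevity_penalty(18, [16, 17, 16])  # nearest reference length is 17
bp2 = brevity_penalty(14, [16, 17, 16])  # nearest reference length is 16
```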
Example: Final BLEU score
• Reference 1: I am afraid Dave
  Reference 2: I am scared Dave
  Reference 3: I have fear David
  Candidate: I fear David
• Assume $cap(\cdot) = 2$ for all N-grams. Also assume BLEU order $n = 2$.
• $brevity = \dfrac{4}{3} \geq 1$ so $BP = e^{1 - \frac{4}{3}}$
• $p_1 = \dfrac{1 + 1 + 1}{3} = 1$
• $p_2 = \dfrac{1}{2}$
• $BLEU = BP\,(p_1 p_2)^{\frac{1}{2}} = e^{1 - \frac{4}{3}} \left(\dfrac{1}{2}\right)^{\frac{1}{2}} \approx 0.5067$
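The whole computation can be checked end to end with a short script. This is a sketch under stated assumptions rather than a reference implementation: tokens are split on whitespace and matched case-sensitively, and the cap for each n-gram is its maximum count in any single reference (the slide instead assumes $cap(\cdot) = 2$, which gives the same result here because no n-gram occurs more than once). All function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Capped n-gram precision of a tokenized candidate against tokenized references."""
    cand = ngrams(candidate, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = 0
    for gram, count in cand.items():
        cap = max(ngrams(ref, n)[gram] for ref in references)
        clipped += min(count, cap)
    return clipped / total

def bleu(candidate, references, max_n=2):
    """BLEU = BP * (p_1 * ... * p_n)^(1/n) for whitespace-tokenized strings."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    nearest = min((len(r) for r in refs), key=lambda length: abs(length - len(cand)))
    brevity = nearest / len(cand)
    bp = 1.0 if brevity < 1 else math.exp(1 - brevity)
    product = math.prod(modified_precision(cand, refs, n) for n in range(1, max_n + 1))
    return bp * product ** (1 / max_n)

score = bleu(
    "I fear David",
    ["I am afraid Dave", "I am scared Dave", "I have fear David"],
)
```

The result agrees with the hand computation above to four decimal places.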
BLEU: summary
• BLEU is a geometric mean over $n$-gram precisions.
   • These precisions are capped to avoid strange cases.
      • E.g., the translation "the the the the" is not favoured.
   • This geometric mean is weighted so as not to favour unrealistically short translations, e.g., "the"
• Initially, evaluations showed that BLEU predicted human judgements very well, but:
   • People started optimizing MT systems to maximize BLEU. Correlations between BLEU and humans decreased.
Reading
• Entirely optional: Vogel, S., Ney, H., and Tillman, C. (1996). HMM-based Word Alignment in Statistical Translation. In: Proceedings of the 16th International Conference on Computational Linguistics, pp. 836-841, Copenhagen.
• Useful reading on IBM Model-1: Section 25.5 of the 2nd edition of the Jurafsky & Martin text.
   • 1st edition available at Robarts library.
• Other: Manning & Schütze Sections 13.1.2 (Gale & Church), 13.1.3 (Church), 13.3, 14.2.2