statistical machine translation with the moses toolkitthe 11th conference of the association for m...
TRANSCRIPT
![Page 1: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/1.jpg)
The 11th Conference of the Association for Machine Translation in the Americas
October 22 – 26, 2014 -- Vancouver, BC Canada
Tutorial on
Statistical Machine Translation
With the Moses Toolkit
Hieu Hoang, Matthias Huck, and Philipp Koehn
Association for Machine Translation in the Americas
http://www.amtaweb.org
![Page 2: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/2.jpg)
Mose
sMach
ineTranslationwithOpen
SourceSoftware
HieuHoangandMatthiasHuck
Octob
er20
14
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 3: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/3.jpg)
1Outline
09:30-10:15
Introduction
10:15-11:00
Hands-onSession—
youwill
needalaptop
11:00-11:30
Break
11:30-12:30
Adva
ncedTopics
Slides
dow
nloadable
from
http://www.statmt.org/moses/amta.2014.pdf
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 4: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/4.jpg)
2BasicIdea
Sta
tistic
al
Mac
hine
Tr
ansl
atio
n S
yste
m
Trai
ning
Dat
aLi
ngui
stic
Too
ls
Sta
tistic
al
Mac
hine
Tr
ansl
atio
n S
yste
m
Tran
slat
ion
Sou
rce
Text
Training
Using
para
llel c
orpo
ram
onol
ingu
al c
orpo
radi
ctio
narie
s
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 5: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/5.jpg)
3StatisticalMach
ineTranslationHistory
around1990
Pioneeringworkat
IBM,inspired
bysuccessin
speech
recogn
ition
1990s
Dom
inance
ofIBM’sword-based
models,supporttechnolog
ies
early2000s
Phrase-based
models
late
2000s
Tree-based
models
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 6: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/6.jpg)
4MosesHistory
2002
Pharaohdecoder,precursor
toMoses
(phrase-based
models)
2005
Moses
startedby
HieuHoangandPhilippKoehn(factoredmodels)
2006
JHU
workshop
extendsMoses
sign
ificantly
2006-2012
Fundingby
EU
projects
EuroMatrix,
EuroMatrixP
lus
2009
Tree-based
modelsim
plementedin
Moses
2012-2015
MosesCoreproject.
Full-timestaff
tomaintain
andenhance
Moses
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 7: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/7.jpg)
5Mosesin
Aca
dem
ia
•Built
byacadem
ics,foracadem
ics
•Reference
implementation
ofstateof
theart
–researchersdevelop
new
methodson
topof
Moses
–develop
ersre-implementpublished
methods
–usedby
other
researchersas
black
box
•Baselineto
beat
–researcherscomparetheirmethodagainst
Moses
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 8: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/8.jpg)
6Dev
eloper
Community
•Maindevelop
mentat
University
ofEdinburgh,butalso:
–Fon
dazioneBrunoKessler
(Italy)
–CharlesUniversity
(Czech
Republic)
–DFKI(G
ermany)
–RWTH
Aachen
(Germany)
–others...
•Codeshared
ongithub.com
•Mainforum:Moses
supportmailinglist
•Mainevent:
MachineTranslationMarathon
–annual
open
sourceconvention
–presentation
ofnew
open
sourcetools
–hands-on
workon
new
open
sourceprojects
–summer
school
forstatisticalmachinetranslation
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 9: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/9.jpg)
7Open
SourceComponen
ts
•Moses
distribution
usesexternal
open
sourcetools
–wordalignment:
giza++,mgiza,BerkeleyAligner,Fast
Align
–langu
agemodel:sr
ilm,irst
lm,randlm,kenlm
–scoring:
bleu,ter,meteor
•Other
usefultools
–sentence
aligner
–syntactic
parsers
–part-of-speech
tagg
ers
–morpholog
ical
analyzers
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 10: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/10.jpg)
8Other
Open
SourceMT
Systems
•Joshua—
JohnsHop
kinsUniversity
http://joshua.sourceforge.net/
•CDec
—University
ofMaryland
http://cdec-decoder.org/
•Jane—
RWTH
Aachen
http://www.hltpr.rwth-aachen.de/jane/
•Phrasal—
Stanford
University
http://nlp.stanford.edu/phrasal/
•Verysimilartechnolog
y
–JoshuaandPhrasalim
plementedin
Java,othersin
C++
–Joshuasupports
only
tree-based
models
–Phrasalsupports
only
phrase-based
models
•Open
sourcingtoolsincreasingtrendin
NLPresearch
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 11: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/11.jpg)
9Mosesin
Industry
•DistributedwithLGPL—
free
touse
•Com
petitivewithcommercial
SMT
solution
s(G
oogle,Microsoft,SDLLangu
ageWeaver,...)
•But:
–not
easy
touse
–requires
sign
ificantexpertise
forop
timal
perform
ance
–integrationinto
existingworkfl
ownot
straight-forward
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 12: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/12.jpg)
10Case
Studies
Europea
nCommission
—usesMoses
in-hou
seto
aidhuman
translators
Autodesk
—show
edproductivityincreasesin
translatingmanualswhen
post-editingou
tput
from
acustom
-build
Moses
system
Systran
—develop
edstatisticalpost-editingusingMoses
AsiaOnlin
e—
offerstranslationtechnolog
yandservices
based
onMoses
Manyothers...
World
Trade
Organisation,
Adob
e,Sym
antec,
WIPO,
Sybase,
Safaba,
Bloom
berg,
Pangeanic,KatanMT,Capita,
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 13: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/13.jpg)
11Phrase-basedTranslation
•Foreign
inputissegm
entedin
phrases
•Eachphrase
istranslated
into
English
•Phrasesarereordered
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 14: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/14.jpg)
12Phrase
TranslationOptions
heergeht
janicht
nach
hause
it , it
, he
is are
goes
go
yes
is, o
f cou
rse
not
do n
otdo
es n
otis
not
afte
rto
acco
rdin
g to
in
hous
eho
me
cham
ber
at h
ome
not
is n
otdo
es n
otdo
not
hom
eun
der
hous
ere
turn
hom
edo
not
it is
he w
ill b
eit
goes
he g
oes
is are
is a
fter
all
does
tofo
llow
ing
not a
fter
not t
o
, not
is n
otar
e no
tis
not
a
•Manytranslationop
tion
sto
choosefrom
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 15: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/15.jpg)
13Phrase
TranslationOptions
heergeht
janicht
nach
hause
it , it
, he
is are
goes
go
yes
is, o
f cou
rse
not
do n
otdo
es n
otis
not
afte
rto
acco
rdin
g to
in
hous
eho
me
cham
ber
at h
ome
not
is n
otdo
es n
otdo
not
hom
eun
der
hous
ere
turn
hom
edo
not
it is
he w
ill b
eit
goes
he g
oes
is are
is a
fter
all
does
tofo
llow
ing
not a
fter
not t
ono
tis
not
are
not
is n
ot a
•Themachinetranslationdecoder
does
not
know
therigh
tansw
er–
pickingtherigh
ttranslationop
tion
s–
arrangingthem
intherigh
torder
→Searchprob
lem
solved
bybeam
search
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 16: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/16.jpg)
14Decoding:Preco
mpute
TranslationOptions
ergeht
janicht
nach
hause
consultphrase
translationtable
forallinputphrases
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 17: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/17.jpg)
15Decoding:Start
withInitialHyp
othesis
ergeht
janicht
nach
hause
initialhyp
othesis:noinputwordscovered,noou
tputproduced
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 18: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/18.jpg)
16Decoding:Hyp
othesis
Exp
ansion
ergeht
janicht
nach
hause
are
pickanytranslationop
tion
,create
new
hyp
othesis
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 19: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/19.jpg)
17Decoding:Hyp
othesis
Exp
ansion
ergeht
janicht
nach
hause
are ithe
create
hyp
otheses
forallother
translationop
tion
s
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 20: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/20.jpg)
18Decoding:Hyp
othesis
Exp
ansion
ergeht
janicht
nach
hause
are ithe
goes
does
not
yes
go to
hom
e
hom
e
also
create
hyp
otheses
from
createdpartial
hyp
othesis
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 21: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/21.jpg)
19Decoding:FindBestPath
ergeht
janicht
nach
hause
are ithe
goes
does
not
yes
go to
hom
e
hom
e
backtrack
from
highestscoringcomplete
hyp
othesis
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 22: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/22.jpg)
20FactoredTranslation
•Factoredrepresention
ofwords
wor
dw
ord
part
-of-
spee
ch
Output
Input
mor
phol
ogy
part
-of-
spee
ch
mor
phol
ogy
wor
d cl
ass
lem
ma
wor
d cl
ass
lem
ma
......
•Goals
–generalization,e.g.
bytranslatinglemmas,not
surfaceform
s–
richer
model,e.g.
usingsyntaxforreordering,
langu
agemodeling
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 23: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/23.jpg)
21FactoredModel
Example:
lem
ma
lem
ma
part
-of-
spee
ch
Output
Input
mor
phol
ogy
part
-of-
spee
ch
wor
dw
ord
Decom
posingthetranslationstep
Translatinglemmaandmorpholog
ical
inform
ationmorerobust
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 24: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/24.jpg)
22Syn
tax-basedTranslation
String-to-String
String-to-T
ree
JohnmissesMary
JohnmissesMary
⇒Marie
manqueaJean
⇒S
NP
Marie
VP
V
manque
PP
P a
NP
Jean
Tree-to-String
Tree-to-T
ree
S
NP
John
VP
V
misses
NP
Mary
S
NP
John
VP
V
misses
NP
Mary⇒
S
NP
Marie
VP
V
manque
PP
P a
NP
Jean
⇒MariemanqueaJean
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 25: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/25.jpg)
23Syn
tax-basedDecoding
Sie
PP
ER
will
VA
FIN
eine
AR
TTa
sse
NN
Kaf
fee
NN
trin
ken
VV
INF
NP
VP
S
VB
drin
k
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 26: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/26.jpg)
24Syn
tax-basedDecoding
Sie
PP
ER
will
VA
FIN
eine
AR
TTa
sse
NN
Kaf
fee
NN
trin
ken
VV
INF
NP
VP
S
PR
Osh
eV
Bdr
ink
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 27: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/27.jpg)
25Syn
tax-basedDecoding
Sie
PP
ER
will
VA
FIN
eine
AR
TTa
sse
NN
Kaf
fee
NN
trin
ken
VV
INF
NP
VP
S
PR
Osh
eV
Bdr
ink
NN
coffe
e
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 28: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/28.jpg)
26Syn
tax-basedDecoding
Sie
PP
ER
will
VA
FIN
eine
AR
TTa
sse
NN
Kaf
fee
NN
trin
ken
VV
INF
NP
VP
S
PR
Osh
eV
Bdr
ink
NN
coffe
e
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 29: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/29.jpg)
27Syn
tax-basedDecoding
Sie
PP
ER
will
VA
FIN
eine
AR
TTa
sse
NN
Kaf
fee
NN
trin
ken
VV
INF
NP
VP
S
PR
Osh
eV
Bdr
ink
NN |
cup
IN | of
NP
PP
NN
NP
DE
T | a
NN
coffe
e
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 30: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/30.jpg)
28Syn
tax-basedDecoding
Sie
PP
ER
will
VA
FIN
eine
AR
TTa
sse
NN
Kaf
fee
NN
trin
ken
VV
INF
NP
VP
S
PR
Osh
eV
Bdr
ink
NN |
cup
IN | of
NP
PP
NN
NP
DE
T | a
VB
Z |w
ants
VB
VP
VP
NP
TO | to
NN
coffe
e
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 31: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/31.jpg)
29Syn
tax-basedDecoding
Sie
PP
ER
will
VA
FIN
eine
AR
TTa
sse
NN
Kaf
fee
NN
trin
ken
VV
INF
NP
VP
S
PR
Osh
eV
Bdr
ink
NN |
cup
IN | of
NP
PP
NN
NP
DE
T | a
VB
Z |w
ants
VB
VP
VP
NP
TO | to
NN
coffe
e
S
PR
O V
P
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 32: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/32.jpg)
30How
doIget
started?
•Collect
yourdata
–Paralleldata
∗Freelyavailable
data,
e.g.
Europarl,MultiUN,WIT3,
OPUS,...
∗TAUS,Lingu
isticDataCon
sortium
(LDC),...
–Mon
olingu
aldata
∗Com
mon
Crawl,WMT
New
sCrawlcorpus,...
•Set
upMoses
–Dow
nload
sourcecodeforMoses,GIZA++,MGIZA
–Com
pile,install
–Oruse
precom
piledMoses
packages(W
indow
s,Linux,
Mac
OSX)
–Moreinfo:http://www.statmt.org/moses/
•Train
system
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 33: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/33.jpg)
31Data-drive
nMT
Training
ParallelTrainingData
TranslationModel
Mon
olingu
alTrainingData
Langu
ageModel
LM
Estim
ation
Tuning
Develop
mentSet
SMT
System
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 34: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/34.jpg)
32MosesPipeline
Execute
alotof
scripts
tokenize<corpus.en>corpus.en.tok
lowercase<corpus.en.tok>corpus.en.lc
...
mert.perl....
moses...
mteval-v13.pl...
Change
apartof
theprocess,execute
everythingagain
tokenize<corpus.en>corpus.en.tok
lowercase<corpus.en.tok>corpus.en.lc
...
mert.perl....
moses...
mteval-v13.pl...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 35: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/35.jpg)
33Phrase-basedModel
TrainingwithMoses
•Com
mandline
train-model.perl...
•Example
phrase
from
model
Bndnisse|||alliances|||11112.718||||||11
GeneralMusharrafbetratam|||generalMusharrafappearedon|||11112.718||||||11
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 36: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/36.jpg)
34Phrase-basedDecodingwithMoses
•Com
mandline
moses-fmoses.ini-iin.txt>out.txt
•Advantages
–fast
—under
halfasecondper
sentence
forfast
configu
ration
–low
mem
oryrequirem
ents
—∼2
00MBRAM
forlowestconfigu
ration
–state-of-the-arttranslationqualityon
mosttasks,
especially
forrelatedlangu
agepairs
–robust
—does
not
rely
onanysyntactic
annotation
•Disadvantages
–poor
modelingof
lingu
istickn
owledge
andof
long-distance
dependencies
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 37: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/37.jpg)
35HierarchicalModel
TrainingwithMoses
•Hierarchical
model:
form
ally
syntax-based,withou
tlingu
isticannotation(string-to-string)
•Com
mandline
train-model.perl-hierarchical...
•Example
rule
from
model
Bundnisse[X][X]Kraften[X]|||alliances[X][X]forces[X]|||11112.718|||1-1|||0.05263160.0526316
•Visualizationof
rule
X
Bundnisse
X1
Kraften
⇒X
alliances
X1
forces
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 38: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/38.jpg)
36HierarchicalDecodingwithMoses
•Com
mandline
moses_chart-fmoses.ini-iin.txt>out.txt
•Advantages
–able
tomodel
non
-con
tigu
itieslikene...pas
→not
–betterat
medium-range
reordering
–ou
tperform
sphrase-based
system
swhen
translatingbetweenwidelydifferent
langu
ages,e.g.
Chinese-English
•Disadvantages
–morediskusage
—translationmodel
×10larger
than
phrase-based
–slow
er—
0.5-2sec/sent.forfastestconfigu
ration
–higher
mem
oryrequirem
ents
—morethan
1GBRAM
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 39: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/39.jpg)
37Syn
tax-basedModel
TrainingwithMoses
•Com
mandline
train-model.perl-ghkm...
•Example
rule
from
model
[X][NP-SB]alsowanted[X][ADJA][X][NN][X]|||[X][NP-SB]wolltenauchdie[X][ADJA][X][NN][S-TOP]|||...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 40: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/40.jpg)
38Syn
tax-basedDecodingwithMoses
•Com
mandline
moses_chart-fmoses.ini-iin.txt>out.txt
(likehierarchical)
•Advantage
–canuse
outsidelingu
isticinform
ation
–prom
ises
tosolveim
portantprob
lemsin
SMT,e.g.
long-range
reordering
•Disadvantages
–trainingslow
anddiffi
cultto
getrigh
t–
requires
syntactic
parse
annotation
∗syntactic
parsers
available
only
forsomelangu
ages
∗not
designed
formachinetranslation
∗unreliable
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 41: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/41.jpg)
39Exp
erim
entManagem
entSystem
•EMSautomates
theentire
pipeline
•Oneconfigu
ration
file
forallsettings:record
ofallexperim
entaldetails
•Schedulerof
individual
stepsin
pipeline
–automatically
keepstrackof
dependencies
–parallelexecution
–crashdetection
–automatic
re-use
ofpriorresults
•Fastto
use
–setupanew
experim
entin
minutes
–setupavariationof
anexperim
entin
seconds
•Disadvantage:not
allMoses
featuresareintegrated
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 42: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/42.jpg)
40How
does
itwork?
•Write
aconfigu
ration
file
(typically
byadaptingan
existingfile)
•Test:
experiment.perl-configconfig
•Execute:
experiment.perl-configconfig-exec
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 43: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/43.jpg)
41
Wor
kflow
aut
omat
ical
lyge
nera
ted
byex
perim
ent.p
erl
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 44: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/44.jpg)
42W
ebInterface
Listof
experim
ents
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 45: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/45.jpg)
43ListofRuns
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 46: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/46.jpg)
44Analysis:
BasicStatistics
•Basic
statistics
–n-gram
precision
–evaluationmetrics
–coverage
oftheinputin
corpusandtranslationmodel
–phrase
segm
entation
sused
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 47: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/47.jpg)
45Analysis:
UnknownW
ords
grou
ped
bycountin
test
set
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 48: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/48.jpg)
46Analysis:
OutputAnnotation
Color
highlightingto
indicaten-gram
overlapwithreference
translation
darkerbleu=
wordispartof
larger
n-gram
match
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 49: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/49.jpg)
47Analysis:
InputAnnotation
•For
each
wordandphrase,colorcodingandstatson
–number
ofoccurrencesin
trainingcorpus
–number
ofdistinct
translationsin
translationmodel
–entropyof
conditionaltranslationprob
ability
distribution
φ(e|f)(normalized)
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 50: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/50.jpg)
48Analysis:
BilingualConco
rdancer
translationof
inputphrase
intrainingdatacontext
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 51: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/51.jpg)
49Analysis:
Alig
nmen
t
Phrase
alignmentof
thedecodingprocess
(red
border,interactive)
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 52: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/52.jpg)
50Analysis:
TreeAlig
nmen
t
Usesnestedboxes
toindicatetree
structure
(red
border,yellow
shaded
spansin
focus,interactive)
forsyntaxmodel,non
-terminalsarealso
show
n
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 53: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/53.jpg)
5151
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 54: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/54.jpg)
52Analysis:
Comparisonof2Runs
Differentwordsarehighlighted
sortable
bymostim
provem
ent,deterioration
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 55: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/55.jpg)
53
Hands-OnSession
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 56: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/56.jpg)
54Adva
ncedFea
tures
•Faster
Training
•FasterDecoding
•Moses
Server
•Dataanddom
ainadaptation
•Instructionsto
decoder
•Inputform
ats
•Outputform
ats
•Increm
entalTraining
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 57: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/57.jpg)
55Adva
ncedFea
tures
•Faster
Training
–Tokenization
–Tuning
–Alignment
–Phrase-Table
Extraction
–Train
langu
agemodel
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 58: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/58.jpg)
56Faster
Training
•Runstepsin
parallel(that
donot
dependon
each
other)
•MulticoreParallelization
.../train-model.perl-parallel
•EMS:
[TRAINING]
parallel=yes
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 59: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/59.jpg)
57Adva
ncedFea
tures
•FasterTraining
–Toke
nization
–Tuning
–Alignment
–Phrase-Table
Extraction
–Train
langu
agemodel
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 60: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/60.jpg)
58Faster
Training
•Multi-threaded
tokenization
•Specifynumber
ofthreads
.../tokenizer.perl-threadsNUM
•EMS:
input-tokenizer="$moses-script-dir/tokenizer/tokenizer.perl
-threadsNUM"
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 61: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/61.jpg)
59Adva
ncedFea
tures
•FasterTraining
–Tokenization
–Tuning
–Alignment
–Phrase-Table
Extraction
–Train
langu
agemodel
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 62: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/62.jpg)
60Faster
Training
•Multi-threaded
tokenization
•Specifynumber
ofthreads
.../mert-threadsNUM
•EMS:
tuning-settings="-threadsNUM"
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 63: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/63.jpg)
61Adva
ncedFea
tures
•FasterTraining
–Tokenization
–Tuning
–Alig
nmen
t
–Phrase-Table
Extraction
–Train
langu
agemodel
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 64: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/64.jpg)
62Faster
Training
•WordAlignment
•Multi-threaded
–Use
MGIZA,not
GIZA++
.../train-model.perl-mgiza-mgiza-cpusNUM
EMS:
training-options="-mgiza-mgiza-cpusNUM"
•On:mem
ory-lim
ited
machines
–snt2coocprog
ram
requires
6GB+
mem
ory
–Reimplementation
uses10
MB,buttake
longerto
run
.../train-model.perl-snt2coocsnt2cooc.pl
EMS:
training-options="-snt2coocsnt2cooc.pl"
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 65: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/65.jpg)
63Adva
ncedFea
tures
•FasterTraining
–Tokenization
–Tuning
–Alignment
–Phrase-T
able
Extraction
–Train
langu
agemodel
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 66: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/66.jpg)
64Faster
Training
•Phrase-Table
Extraction
–Split
trainingdatainto
NUM
equal
parts
–Extract
concurrently .../train-model.perl-coresNUM
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 67: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/67.jpg)
65Faster
Training
•Sorting
–Relyheavily
onUnix
’sort’command
–may
take
50%+
oftranslationmodel
build
time
–Needto
optimizefor
∗speed
∗diskusage
–Dependenton
∗sort
version
∗Unix
version
∗available
mem
ory
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 68: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/68.jpg)
66Faster
Training
•Plain
sorted
sort<extract.txt>extract.sorted.txt
•Optimized
forlargeserver
sort--buffer-size10G--parallel5
--batch-size253--compress-program[gzip/pigz]...
–Use
10GB
ofRAM
—themorethebetter
–5CPUs—
themorethebetter
–mergesort
atmost25
3files
–compressinterm
ediate
files—
less
diski/o
•In
Moses:
.../train-model.perl-sort-buffer-size10G-sort-parallel5
-sort-batch-size253-sort-compresspigz
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 69: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/69.jpg)
67Adva
ncedFea
tures
•FasterTraining
–Tokenization
–Tuning
–Alignment
–Phrase-Table
Extraction
–Train
languagemodel
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 70: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/70.jpg)
68IR
STLM:Training
•Develop
edby
FBK-irst,Trento,Italy
•Specialized
trainingforlargecorpora
–parallelization
–reduce
mem
oryusage
•Quantization
ofprob
abilities
–reducesmem
orybutlose
accuracy
–prob
ability
stored
in1byte
insteadof
4bytes
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 71: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/71.jpg)
69IR
STLM:Training
•Training:
build-lm.sh-i"gunzip-ccorpus.gz"-n3
-otrain.irstlm.gz-k10
–-n3
=n-gram
order
–-k10
=split
trainingprocedure
into
10steps
•EMS:
irst-dir=[IRSTpath]
lm-training="$moses-script-dir/generic/trainlm-irst.perl
-coresNUM-irst-dir$irst-dir"
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 72: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/72.jpg)
70New
:KENLM
Training
•Can
trainvery
largelangu
agemodelswithlim
ited
RAM
(ondiskstream
ing)
lmplz-o[order]-S[memory]<text>text.lm
•-oorder
=n-gram
order
•-Smemory
=How
much
mem
oryto
use.
–NUM%
=percentage
ofphysical
mem
ory
–NUM[b/K/M/G/T]
=specified
amou
ntin
bytes,kilo
bytes,etc.
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 73: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/73.jpg)
71Adva
ncedFea
tures
•FasterTraining
•Faster
Decoding
•Moses
Server
•Dataanddom
ainadaptation
•Instructionsto
decoder
•Inputform
ats
•Outputform
ats
•Increm
entalTraining
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 74: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/74.jpg)
72Adva
ncedFea
tures
•FasterTraining
•Faster
Decoding
–Multi-threading
–Speedvs.Mem
ory
–Speedvs.Quality
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 75: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/75.jpg)
73Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
–Multi-threading
–Speedvs.Mem
ory
–Speedvs.Quality
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 76: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/76.jpg)
74Faster
Decoding
•Multi-threaded
decoding
.../moses--threadsNUM
•Easyspeed-up
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 77: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/77.jpg)
75Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
–Multi-threading
–Spee
dvs.Mem
ory
–Speedvs.Quality
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 78: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/78.jpg)
76Spee
dvs.Mem
ory
Use
Typical
Europarlfile
sizes:
•Langu
agemodel
–17
0MB
(trigram
)–
412MB
(5-gram)
•Phrase
table
–11GB
•Lexicalized
reordering
–9.4G
B
→total=
20.8
GB
Wor
king
mem
ory
Tran
slat
ion
mod
el
Lang
uage
mod
el
Pro
cess
siz
e(R
AM
)
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 79: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/79.jpg)
77Spee
dvs.Mem
ory
Use
•Loadinto
mem
ory
–longload
time
–largemem
oryusage
–fast
decoding
Wor
king
mem
ory
Tran
slat
ion
mod
el
Lang
uage
mod
el
Pro
cess
siz
e(R
AM
)
•Load-on-dem
and
–storeindexed
model
ondisk
–binaryform
at–
minim
alstart-uptime,
mem
oryusage
–slow
erdecoding
Dis
kca
che
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 80: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/80.jpg)
78Create
BinaryTables
Phrase
Table:
Phrase-based
exportLC_ALL=C
catpt.txt|sort|./processPhraseTable-ttable00-\
-nscores4-outout.file
exportLC_ALL=C
./CreateOnDiskPt1141002pt.txtout.folder
Hierarchical
/Syntax
exportLC_ALL=C./CreateOnDiskPt1141002pt.txtout.folder
Lexical
ReorderingTable:
exportLC_ALL=C
processLexicalTable-inr-t.txt-outout.file
Langu
ageModels(later)
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 81: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/81.jpg)
79SpecifyBinaryTables
Change
inifile
Phrase
Table
[feature]
PhraseDictionaryBinaryname=TranslationModel0table-limit=20\
num-features=4path=/.../phrase-table
Hierarchical
/Syntax
[feature]
PhraseDictionaryOnDiskname=TranslationModel0table-limit=20\
num-features=4path=/.../phrase-table
Lexical
ReorderingTable
automatically
detected
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 82: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/82.jpg)
80Compact
Phrase
Table
•Mem
ory-effi
cientdatastructure
–phrase
table
6–7times
smallerthan
on-diskbinarytable
–lexicalreorderingtable
12–1
5times
smallerthan
on-diskbinarytable
•Storedin
RAM
•May
bemem
orymapped
•Train
withprocessPhraseTableMin
•SpecifywithPhraseDictionaryCompact
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 83: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/83.jpg)
81IR
STLM
•Develop
edby
FBK-irst,Trento,Italy
•Createabinaryform
atwhichcanberead
from
diskas
needed
–reducesmem
orybutslow
erdecoding
•Quantization
ofprob
abilities
–reducesmem
orybutlose
accuracy
–prob
ability
stored
in1byte
insteadof
4bytes
•Not
multithreaded
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 84: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/84.jpg)
82IR
STLM
inMoses
•Com
pile
thedecoder
withIRSTLM
library
./configure--with-irstlm=[rootdiroftheIRSTLMtoolkit]
•Createbinaryform
at:
compile-lmlanguage-model.srilmlanguage-model.blm
•Load-on-dem
and:
renamefile.mm
•Change
inifile
touse
IRSTLM
implementation
[feature]
IRSTLMname=LM0factor=0path=/.../lmorder=5
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 85: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/85.jpg)
83KENLM
•Develop
edby
KennethHeafield(C
MU
/Edinburgh/Stanford)
•Fastest
andsm
allest
langu
agemodel
implementation
•Com
pile
from
LM
trained
withSRILM
build_binarymodel.lmmodel.binlm
•Specifyin
decoder
[feature]
KENLMname=LM0factor=0path=/.../model.binlmorder=5
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 86: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/86.jpg)
84New
:OSM
(OperationsSeq
uen
ceModel)
•Amodel
that
–considerssourceandtarget
contextual
inform
ationacross
phrases
–integrates
translationandreorderinginto
asinglemodel
•Con
vert
abilingu
alsentence
toasequence
ofop
erations
–Translate(G
enerateaminim
altranslationunit)
–Reordering(Insert
agapor
Jump)
•P(e,f,a)=
N-gram
model
over
resultingop
erationsequences
•Overcom
esphrasalindependence
assumption
–Con
siderssourceandtarget
contextual
inform
ationacross
phrases
•Betterreorderingmodel
–Translationandreorderingdecisionsinfluence
each
–Handleslocalandlongdistance
reorderings
inaunified
manner
•Nospuriou
sphrasalsegm
entation
prob
lem
•Average
gain
of+0.40
onnew
s-test20
13across
10pairs
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 87: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/87.jpg)
85New
:OSM
(OperationsSeq
uen
ceModel)
Sie
wür
den
gege
nS
iest
imm
en
The
y w
ould
vot
e ag
ains
t you
Op
erat
ion
so 1
Gen
erat
e (S
ie, T
hey)
o 2G
ener
ate
(wür
den,
wou
ld)
o 3In
sert
Gap
o 4G
ener
ate
(stim
men
, vot
e)
o 5Ju
mp
Bac
k (1
)
o 6G
ener
ate
(geg
en, a
gain
st)
o 7G
ener
ate
(Sie
, you
)
Sie
wür
den
stim
men
The
y
wou
ld
vot
e
o 1o 2
o 3o 4
Co
nte
xt W
ind
ow
Mo
del
:
p osm
(F,E
,A)
= p
(o1,
...,o
N)
= �
ip(
o i|o
i-n+
1...o
i-1)
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 88: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/88.jpg)
86Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
–Multi-threading
–Speedvs.Mem
ory
–Spee
dvs.Quality
...
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 89: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/89.jpg)
87Spee
dvs.Quality tim
e (s
econ
ds/w
ord)
quality (bleu)op
timum
for
win
ning
com
petit
ions
optim
umfo
r pr
actic
alus
e
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 90: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/90.jpg)
88Spee
dvs.Quality
ergeht
janicht
nach
hause
are ithe
goes
does
not
yes
go to
hom
e
hom
e
•Decoder
search
createsvery
largenumber
ofpartial
translations(”hyp
otheses”)
•Decodingtime∼
number
ofhyp
otheses
created
•Translationquality∼
number
ofhyp
othesiscreated
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 91: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/91.jpg)
89Hyp
othesis
Stack
s
are ithe
goes
does
not
yes
no w
ord
tran
slat
edon
e w
ord
tran
slat
edtw
o w
ords
tran
slat
edth
ree
wor
dstr
ansl
ated
•Phrase-based:Onestackper
number
ofinputwordscovered
•Number
ofhyp
othesiscreated=
sentence
length×
stacksize
×applicable
translationop
tion
s
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 92: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/92.jpg)
90PruningParameters
•Regularbeam
search
–--stackNUM
max.number
ofhyp
otheses
contained
ineach
stack
–--ttable-limitNUM
max.num.of
translationop
tion
sper
inputphrase
–search
timerough
lylinearwithrespectto
each
number
•Cubepruning
(fixednumber
ofhyp
otheses
areadded
toeach
stack)
–--search-algorithm1
turnson
cubepruning
–--cube-pruning-pop-limitNUM
number
ofhyp
otheses
added
toeach
stack
–search
timerough
lylinearwithrespectto
pop
limit
–note:
stacksize
andtranslationtable
limithavelittleim
pactin
speed
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 93: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/93.jpg)
91Syn
taxHyp
othesis
Stack
s
•Onestackper
inputwordspan
•Number
ofhyp
othesiscreated=
sentence
length2×
number
ofhyp
otheses
added
toeach
stack
--cube-pruning-pop-limitNUM
number
ofhyp
otheses
added
toeach
stack
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 94: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/94.jpg)
92Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•MosesServe
r•
Dataanddom
ainadaptation
•Instructionsto
decoder
•Inputform
ats
•Outputform
ats
•Increm
entalTraining
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 95: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/95.jpg)
93MosesServe
r
•Moses
commandline:
.../moses-f[ini]<[inputfile]>[outputfile]
–Not
practicalforcommercial
use
•Moses
Server:
.../mosesserver-f[ini]--server-port[PORT]--server-log[LOG]
–AcceptHTTPinput.
XMLSOAPform
at•
Client:
–Com
municateviahttp
–Example
clients
inJava
andPerl
–Write
yourow
nclient
–Integrateinto
yourow
napplication
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 96: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/96.jpg)
94Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•Moses
Server
•Data
anddomain
adaptation
–Train
everythingtogether
–Secon
daryphrase
table
–Dom
ainindicator
features
–Interpolated
langu
agemodels
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 97: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/97.jpg)
95Data
•Parallelcorpora→
translationmodel
–sentence-aligned
translated
texts
–translationmem
oriesareparallelcorpora
–diction
ariesareparallelcorpora
•Mon
olingu
alcorpora→
langu
agemodel
–text
inthetarget
langu
age
–billionsof
wordseasy
tohandle
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 98: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/98.jpg)
96Domain
Adaptation
•Themoredata,
thebetter
•Themorein-dom
aindata,
thebetter
(evenin-dom
ainmon
olingu
aldatavery
valuable)
•Alwaystunetowardstarget
dom
ain
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 99: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/99.jpg)
97Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•Moses
Server
•Dataanddom
ainadaptation
–Train
everythingtogether
–Secon
daryphrase
table
–Dom
ainindicator
features
–Interpolated
langu
agemodels
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 100: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/100.jpg)
98Default:Train
Eve
rythingTogether
•Easyto
implement
–Con
catenatenew
datawithexistingdata
–Retrain
•Disadvantages:
–Slower
trainingforlargeam
ountof
data
–Cannot
weigh
toldandnew
dataseparately
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 101: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/101.jpg)
99Default:Train
Eve
rythingTogether
Specification
inEMS:
•Phrase-table
[CORPUS]
[CORPUS:in-domain]
raw-stem=....
[CORPUS:background]
raw-stem=....
•LM
[LM]
[LM:in-domain]
raw-corpus=....
[LM:background]
raw-corpus=....
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 102: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/102.jpg)
100
Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•Moses
Server
•Dataanddom
ainadaptation
–Train
everythingtogether
–Secondaryphrase
table
–Dom
ainindicator
features
–Interpolated
langu
agemodels
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 103: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/103.jpg)
101
SecondaryPhrase
Table
•Train
initialphrase
table
andLM
onbaselinedata
•Train
secondaryphrase
table
andLM
new
/in-dom
aindata
•Use
bothin
Moses
–Secon
daryphrase
table
[feature]
PhraseDictionaryMemorypath=.../path.1
PhraseDictionaryMemorypath=.../path.2
[mapping]
0T0
1T1
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 104: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/104.jpg)
102
SecondaryPhrase
Table
•–
Secon
daryLM
[feature]
KENLMpath=.../path.1
KENLMpath=.../path.2
•Can
give
differentweigh
tsforprim
aryandsecondarytables
•Not
integrated
into
theEMS
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 105: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/105.jpg)
103
SecondaryPhrase
Table
•Terminolog
y/Glossarydatabase
–fixedtranslation
–per
client,project,etc
•Primaryphrase
table
–backoffto
’normal’phrase-table
ifnoglossary
term
[feature]
PhraseDictionaryMemorypath=.../glossary
PhraseDictionaryMemorypath=.../normal.phrase.table
[mapping]
0T0
1T1
[decoding-graph-backoff]
0 1
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 106: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/106.jpg)
104
Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•Moses
Server
•Dataanddom
ainadaptation
–Train
everythingtogether
–Secon
daryphrase
table
–Domain
indicatorfeatures
–Interpolated
langu
agemodels
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 107: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/107.jpg)
105
Domain
IndicatorFea
tures
•Onetranslationmodel
•Flageach
phrase
pair’sorigin
–indicator:binaryflag
ifitoccurs
inspecificdom
ain
–ratio:
how
oftenitoccurs
inspecificdom
ainrelative
toall
–subset:
similarto
indicator,butifin
multipledom
ains,markedwithmultiple-
dom
ainfeature
•In
EMS:
[TRAINING]
domain-features="indicator"
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 108: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/108.jpg)
106
Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•Moses
Server
•Dataanddom
ainadaptation
–Train
everythingtogether
–Secon
daryphrase
table
–Dom
ainindicator
features
–Interpolatedlanguagemodels
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 109: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/109.jpg)
107
InterpolatedLanguageModels
•Train
onelangu
agemodel
per
corpus
•Com
binethem
byweigh
tingeach
accordingto
itsim
portance
–weigh
tsob
tained
byop
timizingperplexity
ofresultinglangu
agemodel
ontuningset
(not
thesameas
machinetranslationquality)
–modelsarelinearlycombined
•EMSprovides
asection[INTERPOLATED-LM]that
needsto
becommentedou
t
•Alternative:
use
multiple
langu
agemodels
(disadvantage:larger
process,slow
er)
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 110: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/110.jpg)
108
Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•Moses
Server
•Dataanddom
ainadaptation
•Instructionsto
decoder
•Inputform
ats
•Outputform
ats
•Increm
entalTraining
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 111: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/111.jpg)
109
SpecifyingTranslationswithXML
•Translationtablesfornumbers?
fe
p(f|e)
2003
2003
0.74
322003
2000
0.04
212003
year
0.02
122003
the
0.01
752003
...
...
•Instruct
thedecoder
withXMLinstruction
therevenuefor<numtranslation="20
03">
2003</num>
ishigher
than...
•Dealwithdifferentnumber
form
ats
ererzielte<numtranslation="17.55">
17,55</num>
Punkte
.
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 112: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/112.jpg)
110
SpecifyingTranslationswithXML
./moses-xml-input[exclusive|inclusive|constraint]
therevenuefor<numtranslation="20
03">
2003</num>
ishigher
than...
Threetypes
ofXMLinput:
•Exclusive
Only
possible
translationisgivenin
XML
•Inclusive
Translationisgivenin
XMLisin
additionto
phrase-table
•Con
straint
Only
use
translationsfrom
phrase-table
ifitmatch
XMLspecification
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 113: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/113.jpg)
111
ConstraintXML
•Specifically
fortranslatingterm
inolog
y
–consistentlytranslateparticularphrase
inadocument
–may
havelearned
larger
phrase
pairs
that
contain
term
inolog
yterm
•Example:
Microsoft<optiontranslation="W
indow
s">
Window
s</option>
8...
•Allowsuse
ofphrase
pairon
lyifmapsW
indow
sto
Window
s
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 114: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/114.jpg)
112
Placeholders
•Translate:
–You
oweme100dollars
!–
You
oweme200dollars
!–
You
oweme9.56dollars
!
•Problem:needtranslationsfor
–100
–200
–9.56
•Som
ethings
arebetteroff
beinghandledby
simple
rules:
–Numbers
–Dates
–Currency
–Nam
edentities
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 115: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/115.jpg)
113
Placeholders
•Input
You
oweme100dollars
!
•Replace
numberswith@num@
You
oweme@num@
dollars
!
•Specification
You
oweme<netranslation="@num@"entity="10
0">@num@</ne>
dollars
!
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 116: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/116.jpg)
114
Walls
andZones
•Specification
ofreorderingconstraints
•Zon
e
sequence
tobetranslated
withou
treorderingwithou
tsidematerial
•Wall
hardreorderingconstraint,nowordsmay
bereordered
across
•Localwall
wallwithin
azone,
not
valid
outsidezone
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 117: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/117.jpg)
115
Walls
andZones:Exa
mples
•Requiringthetranslationof
quoted
materialas
ablock
Hesaid<zone>
”yes
”</zone>.
•Hardreorderingconstraint
Number
1:<wall/>
thebeginning.
•Localhardreorderingconstraintwithin
zone
Anew
plan<zone>
(<wall/>
maybenotnew<wall/>
)</zone>
emerged
.
•Nesting
The<zone>
”new<zone>
(old)</zone>
”</zone>
proposal.
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 118: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/118.jpg)
116
PreservingMarkup
•How
doyoutranslatethis:
<h1>
MyHomePage<
/h1>
Ireallylike
to<b>eat<
/b>
chicken!
•Solution
1:XMLtranslations,wallsandzones
<xtranslation="<h1>"/><wall/>
MyHomePage<wall/>
<xtranslation="</h1>"/>
Ireallylike
to<zone><xtranslation="<b>"/><wall/>
eat<wall/>
<xtranslation="</b
>"/></zone>
chicken
!
(note:
specialXMLcharacters
like<
and>
needto
beescaped)
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 119: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/119.jpg)
117
PreservingMarkup
•Solution
2:Handle
marku
pexternally
–trackwordpositionsandtheirmarku
pI
really
like
to<b>eat<
/b>
chicken
!1
23
45
67
--
--
<b>
--
–translatewithou
tmarku
p Ireallylike
toeatchicken
!
–keep
wordalignmentto
source
Ich
esse
wirklich
gerne
Huhnchen
!1
52
3-4
67
–re-insert
marku
p Ich<b>esse</b
>wirklich
gerneHuhnchen
!
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 120: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/120.jpg)
118
Transliteration
•Langu
ages
arewritten
indifferentscripts
–Russian,BulgarianandSerbian-written
inCyrillic
script
–Urdu,FarsiandPashto
-written
inArabic
script
–Hindi,MarathiandNepalese-written
inDevanagri
•Transliterationcanbeusedto
translateOOVsandNam
edEntities
•Problem:Transliterationcorpusisnot
alwaysavailable
•Naive
Solution
:
–Crawltrainingdatafrom
Wikipedia
titles
–Build
character-based
transliterationmodel
–Replace
OOVwordswith1-besttransliteration
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 121: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/121.jpg)
119
Transliteration
•2methodsto
integrateinto
MT
•Post-decodingmethod
–Use
langu
agemodel
topickbesttransliteration
–Transliterationfeatures
•In-decodingmethod
–Integratetransliterationinsidedecoder
–Wordscanbetranslated
ORtransliterated
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 122: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/122.jpg)
120
Transliteration
•EMS:
[TRAINING]
transliteration-module="yes"
•Post-processingmethod
post-decoding-transliteration="yes"
•In-decodingmethod
in-decoding-transliteration="yes"
transliteration-file=/listofwordstobetransliterated/
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 123: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/123.jpg)
121
Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•Moses
Server
•Dataanddom
ainadaptation
•Instructionsto
decoder
•Inputform
ats
•Outputform
ats
•Increm
entalTraining
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 124: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/124.jpg)
122
Exa
mple:Missp
eltW
ords
•Misspeltsentence:
Theroom
was*exellentbutthehallway
was
*filty.
•Strategiesfordealingwithspellingerrors:
–Createcorrectsentence
withcorrection
×prob
lem:ifnot
correctedprop
erly,addsmoreerrors
–Createmanysentenceswithdifferentcorrection
s×
prob
lem:haveto
decodeeach
sentence,slow
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 125: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/125.jpg)
123
ConfusionNetwork
Theroom
was*exellentbutthehallway
was
*filty.
Inputto
decoder:
Let
thedecoder
decide
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 126: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/126.jpg)
124
Exa
mple:Diacritics
•Correct
sentence
•Som
ethinganon
-nativepersonmighttype
TrungQuoccanhbao
Myve
luat
tien
te
•Con
fusion
network
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 127: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/127.jpg)
125
ConfusionNetwork
Specifica
tion
Argumenton
commandline
./moses-inputtype1
Inputto
moses
the1.0
room1.0
was1.0
excel0.33excellent0.33excellence0.33
but1.0
the1.0
hallway1.0
was1.0
guilty0.5filthy0.5
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 128: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/128.jpg)
126
Lattice
Exa
mple:ChineseW
ord
Seg
men
tation
•Unsegm
entedsentence
•Incorrectsegm
ention
•Correct
segm
ention
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 129: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/129.jpg)
127
Lattice
Inputto
decoder:
Let
thedecoder
decide
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 130: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/130.jpg)
128
Exa
mple:CompoundSplitting
•Inputsentence
einen
wettbew
erbsbedingtenpreissturz
•Differentcompou
ndsplits
•Let
thedecoder
decide
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 131: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/131.jpg)
129
LatticeSpecifica
tion
Com
mandlineargu
ment
Inputto
Moses
(PLFform
at-Python
Lattice
Format)
./moses-inputtype1
(
(
(’einen’,1.0,1),
),
(
(’wettbewerbsbedingten’,0.5,2),
(’wettbewerbs’,0.25,1),
(’wettbewerb’,0.25,1),
),
(
(’bedingten’,1.0,1),
),
(
(’preissturz’,0.5,2),
(’preis’,0.5,1),
),
(
(’sturz’,1.0,1),
),
)
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 132: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/132.jpg)
130
Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•Moses
Server
•Dataanddom
ainadaptation
•Instructionsto
decoder
•Inputform
ats
•Outputform
ats
•Increm
entalTraining
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 133: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/133.jpg)
131
N-B
estList
•Input
esgibtverschiedeneanderemeinungen
.
•BestTranslation
therearevariousdifferentopinions.
•Nextninebesttranslationstherearevariousother
opinions.
therearedifferentdifferentopinions.
thereareother
differentop
inions.
weare
variousdifferentopinions.
therearevariousother
opinionsof.
itis
variousdifferentop
inions.
therearedifferentother
opinions.
itis
variousother
opinions.
itis
adifferentop
inions.
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 134: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/134.jpg)
132
UsesofN-B
estLists
•Let
thetranslator
choosefrom
possible
translations
•Reranker
–addmorekn
owledge
sources
–cantake
glob
alview
–coherency
ofwholesentence
–coherency
ofdocument
•Usedto
tunecompon
entweigh
ts
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 135: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/135.jpg)
133
N-B
estLists
inMoses
Argumentto
commandline
./moses-n-bestlistn-best.file.txt[distinct]100
Output
0|||therearevariousdifferentopinions.|||d:0lm:-21.6664w:-6...|||-113.734
0|||therearevariousotheropinions.|||d:0lm:-25.3276w:-6...|||-114.004
0|||therearedifferentdifferentopinions.|||d:0lm:-27.8429w:-6...|||-117.738
0|||thereareotherdifferentopinions.|||d:-4lm:-25.1666w:-6...|||-118.007
0|||wearevariousdifferentopinions.|||d:0lm:-28.1533w:-6...|||-118.142
0|||therearevariousotheropinionsof.|||d:0lm:-33.7616w:-7...|||-118.153
0|||itisvariousdifferentopinions.|||d:0lm:-29.8191w:-6...|||-118.222
0|||therearedifferentotheropinions.|||d:0lm:-30.426w:-6...|||-118.236
0|||itisvariousotheropinions.|||d:0lm:-32.6824w:-6...|||-118.395
0|||itisadifferentopinions.|||d:0lm:-20.1611w:-6...|||-118.434
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 136: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/136.jpg)
134
Sea
rchGraph
•Input
ergehtja
nichtnach
hause
•Return
internal
structure
from
thedecoder
•Encodemillionsof
other
possible
translations
(every
paththrough
thegraph=
1translation)
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 137: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/137.jpg)
135
UsesofSea
rchGraphs
•Let
thetranslator
choose
–Individual
words
orphrases
–’Sugg
est’nextphrase
•Reranker
•Used
totune
compon
ent
weigh
ts
–Morediffi
cultthan
with
n-bestlist
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 138: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/138.jpg)
136
Sea
rchGraphsin
Moses
Argumentto
commandline
./moses-output-search-graphsearch-graph.file.txt
Argumentto
commandline
0hyp=0stack=0forward=36fscore=-113.734
0hyp=75stack=1back=0score=-104.943...covered=5-5out=.
0hyp=72stack=1back=0score=-8.846...covered=4-4out=opinions
0hyp=73stack=1back=0score=-10.661...covered=4-4out=opinionsof
•hyp
-hyp
othesisid
•stack-how
manywordshavebeentranslated
•score-totalweigh
tedscore
•covered-whichwordsweretranslated
bythishyp
othesis
•ou
t-target
phrase
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 139: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/139.jpg)
137
Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•Moses
Server
•Dataanddom
ainadaptation
•Instructionsto
decoder
•Inputform
ats
•Outputform
ats
•Increm
entalTraining
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 140: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/140.jpg)
138
Adva
ncedFea
tures
•FasterTraining
•FasterDecoding
•Moses
Server
•Dataanddom
ainadaptation
•Instructionsto
decoder
•Inputform
ats
•Outputform
ats
•Increm
entalTraining
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 141: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/141.jpg)
139
Increm
entalTraining
sour
ce te
xt
MT
tran
slat
ion
hum
an tr
ansl
atio
n
MT
engi
ne
post-edit
re-train
translate
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 142: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/142.jpg)
140
Increm
entalTraining
•Increm
entalwordalignment
–requires
modified
versionof
GIZA++
(available
athttp://code.go
ogle.com
/p/inc-giza-pp/)
–on
lyworks
forHMM
alignment(not
thecommon
IBM
Model
4)
•Translationmodel
isdefined
byparallelcorpus
PhraseDictionaryBitextSampling\
path=/path/to/corpus\
L1=sourcelanguageextension\
L2=targetlanguageextension
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 143: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/143.jpg)
141
Update
Word
Alig
nmen
t
•Usesoriginal
wordalignmentmodels
(withadditional
model
filesstored
aftertraining)
•Increm
entalGIZA++
loadsmodel
•New
sentence
pairs
isaligned
onthefly
•Typically,GIZA++
processes
arerunin
bothdirection
s,symmetrized
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 144: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/144.jpg)
142
Update
TranslationModel
•Translationtable
isstored
asword-aligned
parallelcorpus
•Update=
addwordaligned
sentence
pair
•UpdatingarunningMoses
instance
viaXMLRPC
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 145: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/145.jpg)
143
BeyondMoses:
CASMACAT
Workben
ch
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14
![Page 146: Statistical Machine Translation With the Moses ToolkitThe 11th Conference of the Association for M achine Translation in the Americas October 22 – 26, 2014 -- Vancouver, BC Canada](https://reader034.vdocuments.site/reader034/viewer/2022051917/60095eb0daa409218a5034d4/html5/thumbnails/146.jpg)
144
Ack
nowledgem
ents
Hoang,
Huck
andKoehn
MachineTranslationwithOpen
Sou
rceSoftware
Octob
er20
14