Motivations for transfer-based translation
• lexical ambiguity
• structural differences
See further Ingo 91
Example 1
Sv. Fyll på olja i växellådan.
En. Fill gearbox with oil. (from the Scania corpus)
• fyll på → fill
• obj (olja) → adv (with oil)
• adv (i växellådan) → obj (gearbox)
Example 2
Sv. I oljefilterhållaren sitter en överströmningsventil.
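En. The oil filter retainer has an overflow valve. (from the Scania corpus)
• sitter → has
• adv (i oljefilterhållaren) → subj (the oil filter retainer)
• subj (en överströmningsventil) → obj (an overflow valve)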
Transfer-based translation
• intermediary sentence structure
• basic processes
  – analysis
  – transfer
  – generation (synthesis)
• language modules
  – dictionary and grammar of SL
  – transfer dictionary and transfer rules
  – dictionary and grammar of TL
[Figure: translation triangle from SL to TL with three levels: direct translation, transfer, interlingua; Multra and Metal shown as example systems]
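The three basic processes (analysis, transfer, generation) can be sketched in Python on Example 1. This is a toy under stated assumptions, not how Multra or Metal work: the "SL grammar" accepts a single word pattern, the intermediary structure is a small dataclass of grammatical relations, and the transfer dictionary and the structural rule for *fyll på* (object and adverbial swap roles) are hypothetical stand-ins for the language modules.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Clause:
    """Intermediary structure: a clause described by grammatical relations."""
    verb: str
    obj: Optional[str] = None
    adv: Optional[Tuple[str, str]] = None  # (preposition, noun)


# Hypothetical transfer dictionary (SL lemma -> TL lemma).
TRANSFER_DICT = {"fyll_på": "fill", "olja": "oil", "växellådan": "gearbox"}


def analyse(sentence: str) -> Clause:
    """Analysis: toy SL grammar accepting only VERB PARTICLE OBJ PREP NOUN."""
    verb, particle, obj, prep, noun = sentence.rstrip(".").lower().split()
    return Clause(verb=f"{verb}_{particle}", obj=obj, adv=(prep, noun))


def transfer(sl: Clause) -> Clause:
    """Transfer: lexical transfer plus a structural rule tied to 'fyll på'."""
    if sl.verb != "fyll_på" or sl.obj is None or sl.adv is None:
        raise NotImplementedError("only the 'fyll på' rule is covered here")
    # Hypothetical structural rule: SL obj -> TL adv (with-PP), SL adv -> TL obj.
    return Clause(
        verb=TRANSFER_DICT[sl.verb],
        obj=TRANSFER_DICT[sl.adv[1]],
        adv=("with", TRANSFER_DICT[sl.obj]),
    )


def generate(tl: Clause) -> str:
    """Generation: English imperative order VERB OBJ ADV."""
    words = [tl.verb]
    if tl.obj:
        words.append(tl.obj)
    if tl.adv:
        words.extend(tl.adv)
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


print(generate(transfer(analyse("Fyll på olja i växellådan."))))  # Fill gearbox with oil.
```

A word-by-word strategy would keep the Swedish relations and produce something like "Fill oil in gearbox"; the role swap only becomes expressible once analysis has made the grammatical relations explicit.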
Levels of intermediary structure
• cf. J&M, Chapter 21
• word order
Metal
• See H&S
MULTRA
Multilingual Support for Translation and Writing
• translation engine
• transfer-based
  – shake-and-bake
• modular
• unification-based
• preference machinery
• traceable
Analysis
• chart parser (Lisp, C)
  – procedural formalism
• unification and other kinds of operations
• sentence structure
  – feature structure
  – grammatical relations
  – surface order implicit via grammatical relations
See further Sågvall Hein & Starbäck (99), Weijnitz (02), Dahllöf (89)
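Unification, the core operation in the analysis, can be illustrated with a minimal Python sketch. It assumes feature structures are nested dicts with atomic string values, as in the Tippa hytten example, and it omits reentrancy (structure sharing), typed features and the parser's other operations. It is not Multra's implementation.

```python
from typing import Any, Optional


def unify(a: Any, b: Any) -> Optional[Any]:
    """Unify two feature structures (nested dicts with atomic values).

    Returns the merged structure, or None if the structures clash.
    Structure sharing (reentrancy) is not modelled.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = b_value  # feature only in b: add it
        return result
    return a if a == b else None  # atoms unify only if identical


np_fs = {"PHR.CAT": "NP", "HEAD": {"WORD.CAT": "NOUN"}}
print(unify(np_fs, {"DEF": "DEF", "HEAD": {"LEX": "HYTT.NN.1"}}))
print(unify({"NUMB": "SING"}, {"NUMB": "PLUR"}))  # None: clash on NUMB
```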
Transfer
• unification-based
• declarative formalism
  – Multra transfer formalism (Beskow 93)
    • lexical and structural rules
• rules are partially ordered
• a more specific rule takes precedence over a less specific one
  – specificity in terms of number of transfer equations
• all applicable rules are applied
• written in Prolog
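The precedence principle can be sketched in Python. Assumptions, all hypothetical: each rule is tagged with the SL feature it translates, specificity is the number of source equations, and rules for different features are all applied. Ties between equally specific rules (the order is only partial) are resolved by keeping the first rule, an arbitrary choice. The lexemes TRYCKA.VB, KNAPP.NN, PUSH.VB, PRESS.VB, BUTTON.NN and the PART feature are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


@dataclass
class Rule:
    label: str
    feature: str                # hypothetical: the SL feature the rule translates
    equations: Dict[Path, str]  # source equations that must hold
    target: str                 # hypothetical: TL value the rule produces


def value_at(fs: dict, path: Path):
    """Follow a feature path; return None if it is undefined."""
    for feature in path:
        if not isinstance(fs, dict) or feature not in fs:
            return None
        fs = fs[feature]
    return fs


def select(rules: List[Rule], fs: dict) -> Dict[str, Rule]:
    """Keep every applicable rule, but when two rules translate the same
    feature, the one with more equations wins."""
    chosen: Dict[str, Rule] = {}
    for rule in rules:
        if all(value_at(fs, p) == v for p, v in rule.equations.items()):
            current = chosen.get(rule.feature)
            if current is None or len(rule.equations) > len(current.equations):
                chosen[rule.feature] = rule
    return chosen


RULES = [
    Rule("trycka-press", "VERB",
         {("VERB", "LEX"): "TRYCKA.VB", ("VERB", "PART"): "IN"}, "PRESS.VB"),
    Rule("trycka-push", "VERB", {("VERB", "LEX"): "TRYCKA.VB"}, "PUSH.VB"),
    Rule("knapp-button", "OBJ.DIR",
         {("OBJ.DIR", "HEAD", "LEX"): "KNAPP.NN"}, "BUTTON.NN"),
]

fs = {"VERB": {"LEX": "TRYCKA.VB", "PART": "IN"},
      "OBJ.DIR": {"HEAD": {"LEX": "KNAPP.NN"}}}
print({f: r.label for f, r in select(RULES, fs).items()})
# {'VERB': 'trycka-press', 'OBJ.DIR': 'knapp-button'}
```

Without the particle, only the general rule matches and `trycka-push` is chosen.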
Generation
• syntactic generation
  – Multra syntactic generation formalism (Beskow 97a)
  – PATR-like style
    • unification
    • concatenation
    • typed features
• morphological generation (Beskow 97b)
  – lexical insertion rules
  – morphological realisation and phonological finish in Prolog
• written in Prolog
An example: Tippa hytten.
Analysis structure
(* = (PHR.CAT = CL
      MODE = IMP
      SUBJ = 2ND
      VERB = (WORD.CAT = VERB INFF = IMP DIAT = ACT LEX = TIPPA.VB.1 VSURF = +)
      OBJ.DIR = (PHR.CAT = NP NUMB = SING GENDER = UTR CASE = BASIC DEF = DEF
                 HEAD = (LEX = HYTT.NN.1 WORD.CAT = NOUN))
      REG = (V1.LEM = TIPPA.VB)
      SEP = (WORD.CAT = SEP LEX = STOP.SR.0)))
Transfer structure
[VERB : [WORD.CAT : VERB LEX : TILT.VB.0 DIAT : ACT INFF : IMP]
 OBJ.DIR : [PHR.CAT : NP DEF : DEF NUMB : SING
            HEAD : [WORD.CAT : NOUN LEX : CAB.NN.0]]
 MODE : IMP
 SUBJ : 2ND
 VSURF : +
 SEP : [WORD.CAT : SEP LEX : STOP.SR.0]
 PHR.CAT : CL]
Generation
Tilt the cab.
A grammar rule
defrule legal.obj {
  <?1 phr.cat> = 'np,
  not <?1 case> = 'gen,
  not <?1 case> = 'subj
}
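Read as a Boolean predicate over a flat feature structure, the rule says: the constituent is an NP and its case is neither genitive nor subject form. A minimal Python sketch; it assumes that a negated constraint succeeds when `case` is unspecified (negation as failure), which may differ from how Multra actually interprets `not`.

```python
def legal_obj(fs: dict) -> bool:
    """Python reading of defrule legal.obj (a sketch, not Multra's semantics).

    <?1 phr.cat> = 'np     -> the constituent must be an NP
    not <?1 case> = 'gen   -> its case must not be genitive
    not <?1 case> = 'subj  -> nor the subject form
    An unspecified case satisfies both negated constraints.
    """
    return (
        fs.get("phr.cat") == "np"
        and fs.get("case") != "gen"
        and fs.get("case") != "subj"
    )


print(legal_obj({"phr.cat": "np", "case": "basic"}))  # True
print(legal_obj({"phr.cat": "np", "case": "gen"}))    # False
```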
Transfer rules
• copy feature
• delete feature
• transfer feature
• assign feature
Copy feature
LABEL mode
SOURCE <* mode> = ?x1
TARGET <* mode> = ?x2
TRANSFER
Delete feature
LABEL REG
SOURCE <* REG> = ANY
TARGET <*> = <*>
TRANSFER
Transfer feature
LABEL OBJ.DIR
SOURCE <* OBJ.DIR> = ?x1
TARGET <* OBJ.DIR> = ?x2
TRANSFER ?x1 <=> ?x2
Define feature
LABEL trycka.in-press
SOURCE <* lex sym> = trycka.vb+in.ab.1
       <* word.cat> = VERB
TARGET <* lex> = press.vb.1
       <* word.cat> = VERB
TRANSFER
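Applied to the Tippa hytten analysis structure, these rule types yield its transfer structure. The Python sketch below is procedural, whereas Multra's rules are declarative and unification-based. Copy, delete and transfer rules become feature sets, and lexical rules (of the trycka.in-press kind) become a lookup table. The table contents are read off the two structures shown for Tippa hytten; everything else is an assumption. Unlike the transfer structure on the slide, the sketch leaves VSURF inside VERB.

```python
# Hypothetical rule tables standing in for the rule types.
LEXICAL = {"TIPPA.VB.1": "TILT.VB.0", "HYTT.NN.1": "CAB.NN.0"}  # SL lexeme -> TL lexeme
DELETE = {"REG", "GENDER", "CASE"}                                # delete feature
RECURSE = {"VERB", "OBJ.DIR", "HEAD"}                             # transfer: ?x1 <=> ?x2
COPY = {"PHR.CAT", "MODE", "SUBJ", "SEP", "WORD.CAT", "INFF",
        "DIAT", "NUMB", "DEF", "VSURF"}                           # copy feature


def transfer(fs: dict) -> dict:
    """Build a target feature structure feature by feature."""
    out = {}
    for feature, value in fs.items():
        if feature in DELETE:
            continue
        if feature == "LEX":
            out[feature] = LEXICAL[value]
        elif feature in RECURSE:
            out[feature] = transfer(value)
        elif feature in COPY:
            out[feature] = value  # copied unchanged (SEP is copied whole)
        else:
            raise KeyError(f"no transfer rule for feature {feature}")
    return out


# Analysis structure of "Tippa hytten." as nested dicts.
SOURCE = {
    "PHR.CAT": "CL", "MODE": "IMP", "SUBJ": "2ND",
    "VERB": {"WORD.CAT": "VERB", "INFF": "IMP", "DIAT": "ACT",
             "LEX": "TIPPA.VB.1", "VSURF": "+"},
    "OBJ.DIR": {"PHR.CAT": "NP", "NUMB": "SING", "GENDER": "UTR",
                "CASE": "BASIC", "DEF": "DEF",
                "HEAD": {"LEX": "HYTT.NN.1", "WORD.CAT": "NOUN"}},
    "REG": {"V1.LEM": "TIPPA.VB"},
    "SEP": {"WORD.CAT": "SEP", "LEX": "STOP.SR.0"},
}

target = transfer(SOURCE)
print(target["VERB"]["LEX"], target["OBJ.DIR"]["HEAD"]["LEX"])  # TILT.VB.0 CAB.NN.0
```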
A generation rule
LABEL CL.IMP
X1 ---> X2 X3 X4 :
  <X1 PHR.CAT> = CL
  <X1 VERB> = <X2>
  <X1 TYPE> = IMP
  <X1 OBJ.DIR> = <X3>
  <X1 SEP> = <X4>
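The CL.IMP rule splits the clause X1 into verb, direct object and separator and concatenates their realisations. A minimal Python sketch, run on the transfer structure of Tippa hytten. Assumptions: the surface forms in `EN_FORMS` and the definite-NP rule are hypothetical. The sketch checks `MODE = IMP` because that is the feature in the transfer structure, whereas the rule as shown tests `<X1 TYPE>`.

```python
from typing import List, Optional

# Hypothetical English lexicon: TL lexeme -> surface form.
EN_FORMS = {"TILT.VB.0": "tilt", "CAB.NN.0": "cab", "STOP.SR.0": "."}


def gen_np(np: dict) -> List[str]:
    """Definite NPs only (cf. np.the-df in the generation trace)."""
    if np.get("DEF") != "DEF":
        raise NotImplementedError("only definite NPs are covered")
    return ["the", EN_FORMS[np["HEAD"]["LEX"]]]


def gen_cl_imp(x1: dict) -> Optional[str]:
    """X1 ---> X2 X3 X4 with X2 = VERB, X3 = OBJ.DIR, X4 = SEP."""
    if x1.get("PHR.CAT") != "CL" or x1.get("MODE") != "IMP":
        return None  # rule does not apply
    x2 = EN_FORMS[x1["VERB"]["LEX"]]  # imperative = base form
    x3 = " ".join(gen_np(x1["OBJ.DIR"]))
    x4 = EN_FORMS[x1["SEP"]["LEX"]]
    text = f"{x2} {x3}{x4}"  # concatenation; punctuation attaches without a space
    return text[0].upper() + text[1:]


TRANSFER_STRUCTURE = {
    "PHR.CAT": "CL", "MODE": "IMP", "SUBJ": "2ND", "VSURF": "+",
    "VERB": {"WORD.CAT": "VERB", "LEX": "TILT.VB.0", "DIAT": "ACT", "INFF": "IMP"},
    "OBJ.DIR": {"PHR.CAT": "NP", "DEF": "DEF", "NUMB": "SING",
                "HEAD": {"WORD.CAT": "NOUN", "LEX": "CAB.NN.0"}},
    "SEP": {"WORD.CAT": "SEP", "LEX": "STOP.SR.0"},
}

print(gen_cl_imp(TRANSFER_STRUCTURE))  # Tilt the cab.
```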
A contextual lexical rule
LABEL tänka.på-think.about
SOURCE <* verb lex sym> = tänka.vb.1
       <* obj.prep phr.cat> = pp
       <* obj.prep prep> = ?prep
       <* obj.prep prep lex sym> = på.pp.1
       <* obj.prep rect> = ?rect1
TARGET <* obj.prep phr.cat> = pp
       <* obj.prep prep word.cat> = PREP
       <* obj.prep prep lex> = about.pp.1
       <* obj.prep rect> = ?rect2
TRANSFER ?rect1 <=> ?rect2
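The rule fires only when the verb is *tänka* and the prepositional object is headed by *på*. Elsewhere *på* would get its ordinary translation. A minimal Python sketch of that contextual choice. Assumptions for illustration: the default entry på → on, the lexemes lita.vb.1 (as in *lita på*, "rely on") and bil.nn.1 → car.nn.1, and the flattening of `lex sym` to `lex`.

```python
from typing import Callable

# Hypothetical context-free default for the preposition.
DEFAULT_PREP = {"på.pp.1": "on.pp.1"}


def transfer_obj_prep(clause: dict, transfer_rect: Callable[[dict], dict]) -> dict:
    """Translate the prepositional object; tänka.på-think.about wins in context."""
    pp = clause["obj.prep"]
    sl_prep = pp["prep"]["lex"]
    if (clause["verb"]["lex"] == "tänka.vb.1"
            and pp["phr.cat"] == "pp"
            and sl_prep == "på.pp.1"):
        tl_prep = "about.pp.1"           # contextual lexical rule
    else:
        tl_prep = DEFAULT_PREP[sl_prep]  # context-free lexical rule
    return {
        "phr.cat": "pp",
        "prep": {"word.cat": "PREP", "lex": tl_prep},
        "rect": transfer_rect(pp["rect"]),  # ?rect1 <=> ?rect2
    }


def toy_rect(rect: dict) -> dict:
    """Hypothetical transfer of the PP complement."""
    return {"head": {"lex": {"bil.nn.1": "car.nn.1"}[rect["head"]["lex"]]}}


pp = {"phr.cat": "pp", "prep": {"lex": "på.pp.1"}, "rect": {"head": {"lex": "bil.nn.1"}}}
print(transfer_obj_prep({"verb": {"lex": "tänka.vb.1"}, "obj.prep": pp}, toy_rect)["prep"]["lex"])  # about.pp.1
print(transfer_obj_prep({"verb": {"lex": "lita.vb.1"}, "obj.prep": pp}, toy_rect)["prep"]["lex"])   # on.pp.1
```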
A generation trace
1- Applying Rule cl-sep
1- Applying Rule cl.imp
1- Applying Rule subj2nd-verb-obj.dir
1- Applying Rule verb.main.act
1- Applying Rule np.the-df
1- Applying Rule ng.noun-def
1- Success!
Language resources in the MATS system
• dictionary in a database with different views
• analysis grammar
• transfer grammar
  – incl. contextually defined lexical rules
• generation grammar
sv-en_LinkLexicon
en-Inflections
en_LemmaLexicon
en_LexemeLexicon
en_Lexicon
en_StemLexicon
sv_Inflections
sv_LemmaLexicon
sv_LexemeLexicon
sv_Lexicon
sv_StemLexicon
The MATS system
Frozen demo…
Assignment 2: Working with MATS
http://stp.ling.uu.se/~evapet/mt04/assignment2.html
Lexicalistic translation
• Identify (lexical) translation units in the source sentence
• Translate each unit separately (considering the context)
• Order the result in agreement with a model of the target language
Formulation due to Lars Ahrenberg; see further AH (reading list); see also Beaven, John L., Shake-and-Bake Machine Translation, COLING-92, Nantes, 23-28 August 1992.
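The three steps can be sketched in Python on the Swedish verb-second sentence *I dag regnar det* ("today it rains"), which word by word would come out as "today rains it". Everything here is a hypothetical toy: the lexicon, the bigram set serving as the target-language model, greedy longest match for unit identification, and exhaustive permutation search for ordering. Context is ignored in step 2. It is not how T4F or Shake-and-Bake implement the steps.

```python
from itertools import permutations
from typing import List, Tuple

# Hypothetical bilingual lexicon of translation units (SL words -> TL string).
LEXICON = {("i", "dag"): "today", ("regnar",): "rains", ("det",): "it"}
MAX_UNIT = max(len(unit) for unit in LEXICON)

# Hypothetical TL model: a set of attested bigrams.
TL_BIGRAMS = {("<s>", "today"), ("today", "it"), ("it", "rains"), ("rains", "</s>")}


def identify_units(words: List[str]) -> List[Tuple[str, ...]]:
    """Step 1: greedy longest-match segmentation into translation units."""
    units, i = [], 0
    while i < len(words):
        for n in range(min(MAX_UNIT, len(words) - i), 0, -1):
            unit = tuple(words[i:i + n])
            if unit in LEXICON:
                units.append(unit)
                i += n
                break
        else:
            raise KeyError(f"no translation unit starts at {words[i]!r}")
    return units


def tl_score(tokens: List[str]) -> int:
    """Count the adjacent token pairs the TL model has seen."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum((a, b) in TL_BIGRAMS for a, b in zip(padded, padded[1:]))


def translate(sentence: str) -> str:
    words = sentence.rstrip(".").lower().split()
    targets = [LEXICON[u] for u in identify_units(words)]  # step 2 (context ignored)
    # Step 3: choose the ordering the TL model prefers (exhaustive: short inputs only).
    best = max(permutations(targets), key=lambda order: tl_score(" ".join(order).split()))
    return " ".join(best)


print(translate("I dag regnar det."))  # today it rains
```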
T4F – a lexicalistic system
• processes in T4F
  – tokenisation
  – tagging
  – transfer
  – transposition
  – filtering
See further AH (in the reading list)
Interlingua translation
• See SN
Applications of alignment
• translation memories
• translation dictionaries
• lexicalistic translation
• statistical machine translation
• example-based translation
Translation memories
• based on sentence links
• optionally, sub-sentence links
See further Macklovitch, E. (2000)
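A translation memory built from sentence links can be queried with fuzzy matching. A minimal Python sketch, assuming token-level similarity from the standard library's `difflib.SequenceMatcher` and a threshold of 0.7. Both are illustrative choices, since commercial TM tools use their own match metrics. Sub-sentence links are not modelled, and the memory contents are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical memory of linked sentence pairs (SL, TL).
MEMORY = [
    ("Tippa hytten.", "Tilt the cab."),
    ("Fyll på olja i växellådan.", "Fill gearbox with oil."),
]


def lookup(sentence: str, threshold: float = 0.7) -> Optional[Tuple[float, str, str]]:
    """Return (score, SL, TL) for the most similar stored sentence, or None."""
    best: Optional[Tuple[float, str, str]] = None
    for src, tgt in MEMORY:
        # Token-level similarity: 2 * matched tokens / tokens in both sentences.
        score = SequenceMatcher(None, sentence.split(), src.split()).ratio()
        if best is None or score > best[0]:
            best = (score, src, tgt)
    if best is None or best[0] < threshold:
        return None
    return best


print(lookup("Fyll på olja i motorn."))
# (0.8, 'Fyll på olja i växellådan.', 'Fill gearbox with oil.')
```

The returned score lets a translator judge how much of the stored translation must be edited.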
Translation dictionaries
• based on word links
• refinement of word links
Refinement of word alignment data
• neutralise capital letters where appropriate
• lemmatise or tag source and target units
• identify ambiguities
  – search for criteria to resolve them
• identify partial links
  – compounds?
  – remove or complete them
• manual revision?
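The first refinement steps can be sketched in Python: neutralise capitals, lemmatise, and collect link frequencies, so that a source lemma with more than one target stands out as an ambiguity. Assumptions: the links and the lemma table (a stand-in for a real lemmatiser or tagger) are hypothetical, only the source side is lemmatised, and partial links, compounds and manual revision are not modelled.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical word links produced by an aligner: (SL word, TL word).
LINKS = [("Motorn", "motor"), ("motorn", "engine"), ("motorn", "motor"),
         ("Hytten", "cab"), ("hytten", "cab")]

# Hypothetical lemma table standing in for a real lemmatiser or tagger.
SV_LEMMA = {"motorn": "motor", "hytten": "hytt"}


def refine(links: List[Tuple[str, str]]) -> Tuple[Dict[str, Dict[str, int]], Set[str]]:
    counts: Counter = Counter()
    for src, tgt in links:
        src = src.lower()  # neutralise capitals (assumes no proper nouns here)
        counts[(SV_LEMMA.get(src, src), tgt.lower())] += 1
    dictionary: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (src, tgt), freq in counts.items():
        dictionary[src][tgt] = freq  # keep link frequency
    ambiguous = {src for src, tgts in dictionary.items() if len(tgts) > 1}
    return dict(dictionary), ambiguous


entries, ambiguous = refine(LINKS)
print(entries)    # {'motor': {'motor': 2, 'engine': 1}, 'hytt': {'cab': 2}}
print(ambiguous)  # {'motor'}
```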
Informally about statistical MT
• build a translation dictionary based on word alignment
• aim for fragments as large as possible
• keep information on link frequency
• build an n-gram model of the target language
• implement a direct translation strategy
  – including alternatives ordered by length and frequency
• score the output with the n-gram model, keep the best alternatives, and adjust the translation accordingly
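A minimal Python sketch of this recipe, under stated assumptions. The phrase dictionary with link frequencies and the tiny target corpus are hypothetical. The n-gram model is an add-one smoothed bigram model, and all combinations of alternatives are scored exhaustively, which is only feasible for toy inputs. In the example the n-gram model overrides the raw link frequency (engine 5 vs motor 3) because "motor is" is attested in the target corpus.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical phrase dictionary from word alignment:
# SL fragment -> [(TL fragment, link frequency)].
PHRASES: Dict[Tuple[str, ...], List[Tuple[str, int]]] = {
    ("motorn",): [("the engine", 5), ("the motor", 3)],
    ("är", "överbelastad"): [("is overloaded", 2)],
    ("är",): [("is", 10)],
    ("överbelastad",): [("overloaded", 2)],
}

# Hypothetical target-language corpus for the bigram model.
TL_CORPUS = ["the motor is overloaded", "the wiper motor is overloaded", "the engine starts"]


def train_bigrams(corpus):
    contexts, bigrams = Counter(), Counter()
    for line in corpus:
        toks = ["<s>"] + line.split() + ["</s>"]
        contexts.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    vocab_size = len({t for line in corpus for t in line.split()}) + 1  # + "</s>"
    return contexts, bigrams, vocab_size


def logprob(sentence, model):
    """Add-one smoothed bigram log probability."""
    contexts, bigrams, v = model
    toks = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
               for a, b in zip(toks, toks[1:]))


def segment(words):
    """Direct translation, step 1: prefer the largest known fragments."""
    longest = max(len(k) for k in PHRASES)
    segs, i = [], 0
    while i < len(words):
        n = next((n for n in range(min(longest, len(words) - i), 0, -1)
                  if tuple(words[i:i + n]) in PHRASES), 0)
        if n == 0:
            raise KeyError(f"no fragment for {words[i]!r}")
        segs.append(tuple(words[i:i + n]))
        i += n
    return segs


def translate(sentence, model):
    segs = segment(sentence.lower().split())
    # Alternatives per fragment, ordered by length and link frequency.
    options = [sorted(PHRASES[s], key=lambda alt: (len(alt[0].split()), alt[1]), reverse=True)
               for s in segs]
    candidates = [(" ".join(t for t, _ in combo), sum(f for _, f in combo))
                  for combo in product(*options)]
    # The n-gram model picks the winner; total link frequency breaks ties.
    return max(candidates, key=lambda c: (logprob(c[0], model), c[1]))[0]


print(translate("motorn är överbelastad", train_bigrams(TL_CORPUS)))  # the motor is overloaded
```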
Example-based MT
HS (in the reading list)
Some current research topics
• intersentential dependencies
• hybrid systems: data-driven and rule-driven
• improved alignment techniques
• improved language modeling in ST
• automatic learning from post-editing
• translation by structural correspondences
• translation of spoken language
• improved preference strategies
• ambiguity-preserving translation
Intersentential dependencies
• pronoun resolution
• lexical ambiguity resolution, such as
  – (torkar)motorn → the motor
  – (förbrännings)motorn → the engine
• fluency
Preserving the information structure
• information structure is expressed in different ways in the source and the target
• syntactic clues are exploited in the analysis to compute the information structure (topic-focus articulation)
• information structure is used to guide the generation
An example
Torkarmotorn M2 är sammankopplad med omkopplare S24 och intervallrelä R22. För att inte motorn skall överbelastas, t.ex. om torkarbladen fastnat, finns en inbyggd termovakt som bryter strömmen till motorn när …
Wiper motor M2 is connected to switch S24 and intermittent relay R22. To prevent motor overload, e.g. if the wiper blade gets stuck, there is an integral thermal sensor which breaks the current to the motor when …
Preferences
• syntactic preferences
  – the principle of right association
  – the principle of minimal attachment
  – two-stage processing
• semantic preferences
  – lexical selectional restrictions
  – lexical contextual rules
  – conceptual taxonomies
  – likelihood of occurrence
See further Bennett, P. & Paggio, P., 1993, Preference in Eurotra.
Preferences in Multra
• parsing
  – a formalism for expressing syntactic preferences in the parse
    • not fully developed
• transfer
  – contextual lexical rules
  – rule specificity
• generation
  – rule specificity
Hybrid systems
• aims
• components
• problems
• architecture
• scores
Aims of a hybrid system
• simple techniques for simple tasks
• complex techniques for complex tasks
Components of a hybrid system
• component strategies
  – translation memory
    • full sentences
    • fragments
• direct translation
  – statistical translation
  – EBMT
Component strategies, cont’d
• rule-based translation
  – simplistic analysis (cf. direct translation)
    • word by word (S = sequence of words)
    • phrase by phrase (S = sequence of phrases)
  – partial parsing
  – full parsing
Problems of a hybrid system
• how does the system know when a simple technique is appropriate?
  – does the source tell?
  – does the target tell?
Architecture and scores
• simple first?
• concerting results?
• scoring?
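One possible answer to "simple first?" is a cascade that tries the cheapest component first and falls back only when it produces nothing. A minimal Python sketch. The two strategies and their resources are hypothetical, and the sketch sidesteps the other open questions: there is no scoring and no combining of partial results, and "produces a result" stands in for the system knowing that a technique is appropriate.

```python
from typing import Callable, List, Optional, Tuple

Strategy = Callable[[str], Optional[str]]

# Hypothetical resources for two component strategies.
TM = {"Tippa hytten.": "Tilt the cab."}
WORDS = {"tippa": "tilt", "sänk": "lower", "hytten": "the cab"}


def tm_exact(sentence: str) -> Optional[str]:
    """Translation memory, full-sentence match."""
    return TM.get(sentence)


def word_by_word(sentence: str) -> Optional[str]:
    """Simplistic direct translation; gives up on unknown words."""
    words = sentence.rstrip(".").lower().split()
    if not words or not all(w in WORDS for w in words):
        return None
    text = " ".join(WORDS[w] for w in words)
    return text[0].upper() + text[1:] + "."


def cascade(sentence: str, strategies: List[Tuple[str, Strategy]]) -> Optional[Tuple[str, str]]:
    """'Simple first': return the first strategy that produces a result."""
    for name, strategy in strategies:
        result = strategy(sentence)
        if result is not None:
            return name, result
    return None


PIPELINE = [("tm", tm_exact), ("word-by-word", word_by_word)]
print(cascade("Tippa hytten.", PIPELINE))  # ('tm', 'Tilt the cab.')
print(cascade("Sänk hytten.", PIPELINE))   # ('word-by-word', 'Lower the cab.')
```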
Improved techniques for re-use of translation
• combining clues for word alignment (Tiedemann 2003)
• interactive word alignment (Ahrenberg et al. 2003)
• parallel treebanks
Translation by structural correspondences
• LFG
• HPSG
Translation of spoken language
See
Krauwer, Steven (ed.), 2000, Machine Translation, Volume 15, Issue 1-2, June 2000. Special issue on Spoken Language Translation.