
Page 1: Latest Developments in (S)MT

Latest Developments in (S)MT

Harold Somers, University of Manchester

MT Wars II: The Empire* strikes back

* Linguistics

Page 2: Latest Developments in (S)MT

Overview

- The story so far: EBMT, SMT, latest developments in RBMT
- Is there convergence? Some attempts to classify MT (Carl and Wu’s MT model spaces)
- Has the empire struck back?

Page 3: Latest Developments in (S)MT

The story so far: EBMT

- Early history well known: Nagao (1981/3)
- Early development as part of RBMT
- Relationship with Translation Memories
- Focus (cf. Somers 1998) on matching algorithms and on selection and storage of examples (mainly sentence-based)
- TL generation (recombination) not much addressed

Somers, H. (1998) ‘New paradigms in MT’, 10th European Summer School in Logic, Language and Information, Workshop on MT, Saarbrücken; revised version in Machine Translation 14 (1999); 2nd revised version in M. Carl & A. Way (eds, 2003) Recent Advances in EBMT (Kluwer).

Page 4: Latest Developments in (S)MT

EBMT in a nutshell (in case you’ve been on Tatooine for the last 15 years)

- Database of paired examples
- Translation involves:
  - finding the best example(s) (matching)
  - identifying which bits do(n’t) match (alignment)
  - replacing the non-matching bits, gluing them together if there are multiple examples (recombination)
- All of the above at run-time
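To make the three run-time steps concrete, here is a minimal Python sketch, assuming a toy two-example base and difflib token similarity for matching; the helper names and the example base are invented for illustration, and a real system would use much richer matching, alignment and recombination.

```python
from difflib import SequenceMatcher

# Toy example base of (source, target) pairs; purely illustrative.
EXAMPLES = [
    ("the old man is dead", "le vieil homme est mort"),
    ("the operation was interrupted", "l'opération a été interrompue"),
]

def best_match(source, examples):
    """Matching: retrieve the example whose source side is most similar."""
    return max(examples, key=lambda ex: SequenceMatcher(None, source, ex[0]).ratio())

def mismatches(source, example_source):
    """Alignment: identify which bits of the input do(n't) match the example."""
    a, b = example_source.split(), source.split()
    ops = SequenceMatcher(None, a, b).get_opcodes()
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]

query = "the old woman is dead"
ex_src, ex_tgt = best_match(query, EXAMPLES)
print(ex_src, ":", ex_tgt)        # the old man is dead : le vieil homme est mort
print(mismatches(query, ex_src))  # [('man', 'woman')]
# Recombination (not shown) would splice the translation of the
# non-matching bit ('woman') into the retrieved target; that splice
# is exactly where boundary friction (next slide) bites.
```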

Page 5: Latest Developments in (S)MT

EBMT in a nutshell (cont.)

Main difficulty is “boundary friction”, in two senses. First, choosing between examples whose matching parts translate differently:

The operation was interrupted because the file was hidden.
a. The operation was interrupted because the Ctrl-c key was pressed.
   L’opération a été interrompue car la touche Ctrl-c a été enfoncée.
b. The specified method failed because the file is hidden.
   La méthode spécifiée a échoué car le fichier est masqué.

Second, mismatches (here, gender agreement) at the joins when fragments are recombined:

The old man is dead : Le vieil homme est mort
The old woman is dead : * Le vieil femme est mort

Page 6: Latest Developments in (S)MT

EBMT later developments

- Example generalisation (templates)
- Incorporation of linguistic resources and/or statistical measures
- Structured representation of examples
- Use of statistical techniques

Page 7: Latest Developments in (S)MT

Example generalisation (Furuse & Iida, Kaji et al., Matsumoto et al., Carl, Cicekli & Güvenir, Brown, McTait, Way et al.)

- Similar examples can be combined to give a more general example
- Can be seen as a way of generating transfer rules (and lexicons)
- Process may be entirely automatic, based on string matching …
- … or “seeded” using linguistic information (POS tags) or resources (bilingual dictionary)

Page 8: Latest Developments in (S)MT

Example generalisation (cont.)

The monkey ate a peach : saru wa momo o tabeta
The man ate a peach : hito wa momo o tabeta
⇒ The … ate a peach : … wa momo o tabeta   (monkey : saru, man : hito)

The dog ate a rabbit : inu wa usagi o tabeta
⇒ The …x ate a …y : …x wa …y o tabeta   (dog : inu, rabbit : usagi)
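A minimal sketch of how such a template might be induced purely by string matching, using the monkey/man pair above. The single-slot restriction and the function names are my simplifications; published methods (e.g. Cicekli & Güvenir, McTait) handle multiple slots and further constraints.

```python
def induce_template(pair1, pair2):
    """Generalise two example pairs by replacing the differing words
    with a slot, on both the source and the target side."""
    def generalise(a, b):
        a_toks, b_toks = a.split(), b.split()
        i = 0                                   # common prefix length
        while i < min(len(a_toks), len(b_toks)) and a_toks[i] == b_toks[i]:
            i += 1
        j = 0                                   # common suffix length
        while (j < min(len(a_toks), len(b_toks)) - i
               and a_toks[len(a_toks) - 1 - j] == b_toks[len(b_toks) - 1 - j]):
            j += 1
        template = a_toks[:i] + ["<X>"] + a_toks[len(a_toks) - j:]
        fillers = (" ".join(a_toks[i:len(a_toks) - j]),
                   " ".join(b_toks[i:len(b_toks) - j]))
        return " ".join(template), fillers

    src_tpl, src_fill = generalise(pair1[0], pair2[0])
    tgt_tpl, tgt_fill = generalise(pair1[1], pair2[1])
    return (src_tpl, tgt_tpl), list(zip(src_fill, tgt_fill))

template, fillers = induce_template(
    ("the monkey ate a peach", "saru wa momo o tabeta"),
    ("the man ate a peach", "hito wa momo o tabeta"))
print(template)  # ('the <X> ate a peach', '<X> wa momo o tabeta')
print(fillers)   # [('monkey', 'saru'), ('man', 'hito')]
```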

Page 9: Latest Developments in (S)MT

Example generalisation (cont.)

- That’s too simple (e.g. because of boundary friction)
- Need to introduce constraints on the slots, e.g. using POS tags and morphological information (which implies some other processing)
- Can use clustering algorithms to infer substitution sets

Page 10: Latest Developments in (S)MT

Incorporation of linguistic resources

- Actually, early EBMT used all sorts of linguistic resources
- Briefly there was a move towards more “pure” approaches
- Now we see much use of POS tags (sometimes only partial, e.g. marker words – Way et al.), morphological analysis (as just mentioned), and bilingual lexicons
- Target-language grammars for the recombination/generation phase

Page 11: Latest Developments in (S)MT

Incorporation of statistical measures

- Example database preprocessed to assign weights (probabilities) to fragments and their translations (Aramaki et al.): a good way of handling “ambiguities” due to alternative translations
- Clustering words into equivalence classes for example generalization (Brown)
- Using statistical tools to extract translation knowledge from parallel corpora (Yamamoto & Matsumoto)
- Statistically induced grammars for translation or generation, as in ...

Page 12: Latest Developments in (S)MT

Use of structured representations

- Again, a feature of early EBMT, now reappearing
- Translation grammars induced from the example set
- Examples stored as tree structures (overwhelmingly: dependency structures)

Page 13: Latest Developments in (S)MT

Translation grammars

- Carl: generates translation grammars from aligned, linguistically annotated texts
- Way: Data-Oriented Translation based on Poutsma’s DOP, using both PS and LFG models

Page 14: Latest Developments in (S)MT

Structured examples

- Use of tree comparison algorithms to extract translation patterns from parsed corpora/treebanks (Watanabe et al.)
- Translation pairings extracted from aligned parsed examples (Menezes & Richardson)
- Tree-to-string approach used by Langlais & Gotti and Liu et al. (+ statistical generation model)

Page 15: Latest Developments in (S)MT

Typical use of structured examples

- Rule-based analysis and generation + example-based transfer
- Input is parsed into a representation using a traditional or statistics-based analyser
- TL representation constructed by combining translation mappings learned from the parallel corpus
- TL sentence generated using a hand-written or machine-learned generation grammar
- Is this still EBMT? Note that the only example-based part is the use of mappings, which are learned, not computed at run-time

Page 16: Latest Developments in (S)MT

Pure EBMT (Lepage & Denoual)

- In contrast (but now something of an oddity): pure analogy-based EBMT
- Use of proportional analogies A : B :: C : D
- Terms in the analogies are translation pairs: A→A’ : B→B’ :: C→C’ : D→D’
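A toy illustration of what solving an analogy at run-time can look like. This naive solver assumes the edits turning A into B carry over directly to C, which only works for simple cases; Lepage & Denoual’s actual algorithm is far more general.

```python
from difflib import SequenceMatcher

def solve_analogy(a, b, c):
    """Find d such that a : b :: c : d, by replaying the a->b edits on c.
    Returns None when this naive strategy fails."""
    d = c
    for tag, i1, i2, j1, j2 in reversed(SequenceMatcher(None, a, b).get_opcodes()):
        if tag == "equal":
            continue
        old, new = a[i1:i2], b[j1:j2]
        if tag == "insert":
            pos = len(d) - (len(a) - i1)   # anchor relative to the string end
            d = d[:pos] + new + d[pos:]
        elif old in d:                     # replace or delete
            d = d.replace(old, new, 1)
        else:
            return None
    return d

print(solve_analogy("walk", "walked", "jump"))      # jumped
print(solve_analogy("walker", "walking", "hiker"))  # hiking
```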

Page 17: Latest Developments in (S)MT
Page 18: Latest Developments in (S)MT

Pure EBMT

- No explicit transfer
- No extraction of symbolic knowledge
- No use of templates
- Analogies do not always represent any sort of linguistic reality
- No training or preprocessing: solving the proportional analogies is done at run-time
Page 19: Latest Developments in (S)MT

The story so far: SMT

- Early history well known
- IBM group inspired by improved results in speech recognition when a non-linguistic approach was taken
- Availability of the Canadian Hansards inspired a purely statistical approach to MT (1988)
- Immediate partial success (60%), to the dismay of MT people
- Early observers (Wilks) predicted hybrid methods (“stone soup”) would evolve
- Later developments: phrase-based SMT, syntax-based SMT

Page 20: Latest Developments in (S)MT

SMT in a nutshell (in case you’ve been on Kamino for the last 15 years)

- From a parallel corpus, two sets of statistical data are extracted:
  - translation model: probabilities that a given word e in the SL gives rise to a word f in the TL
  - (target) language model: most probable word order for the words predicted by the translation model
- These two models are computed off-line
- Given an input sentence, a “decoder” applies the two models and juggles the probabilities to get the best score; various methods have been proposed
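In the noisy-channel formulation behind this (Brown et al.), keeping the deck’s convention of e for the source sentence and f for the target, the decoder’s search can be written as

$$\hat{f} \;=\; \arg\max_{f} P(f \mid e) \;=\; \arg\max_{f}\; \underbrace{P(e \mid f)}_{\text{translation model}} \;\underbrace{P(f)}_{\text{language model}}$$

by Bayes’ rule, dropping the constant P(e); note that the channel’s translation model is estimated in the reverse direction of the informal description above.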

Page 21: Latest Developments in (S)MT

SMT in a nutshell (cont.)

The translation model has to take into account the fact that:

- for a given e there may be various different fs depending on context (grammatical variants as well as alternatives due to polysemy or homonymy)
- a given e may not necessarily correspond to a single f, or to any f at all: “fertility” (e.g. may have → aurait; implemented → mis en application)

Page 22: Latest Developments in (S)MT

SMT in a nutshell (cont.)

The language model has to take into account the fact that:

- the TL words predicted by the translation model will not occur in the same order as the SL words: “distortion”
- TL word choices can depend on neighbouring words (which may be easy to model) or, especially because of distortion, on more distant words: “long-distance dependencies”, much harder to model
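The simplest such language model is an n-gram model. A minimal bigram sketch, estimated by relative frequency from an invented toy corpus (no smoothing, illustration only):

```python
from collections import Counter

corpus = ("the commission rejects the proposal . "
          "the commission adopts the proposal .").split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def score(sentence):
    """P(sentence) as the product of P(word | previous word) estimates."""
    words = sentence.split()
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigrams[(prev, word)] / unigrams[prev]
    return p

print(score("the commission rejects the proposal"))  # 0.125
print(score("the proposal rejects the commission"))  # 0.0 (unseen bigram)
```

A model this local is exactly what struggles with the distortion and long-distance dependencies listed above.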

Page 23: Latest Developments in (S)MT

SMT in a nutshell (cont.)

Main difficulty: the combination of fertility and distortion:

Zeitmangel erschwert das Problem. : Lack of time makes the problem more difficult.
Eine Diskussion erübrigt sich demnach. : Therefore there is no point in discussion.
Das ist der Sache nicht angemessen. : That is not appropriate for this matter.
Den Vorschlag lehnt die Kommission ab. : The Commission rejects the proposal.

Page 24: Latest Developments in (S)MT

SMT later developments: phrase-based SMT

- Extend models beyond individual words to word sequences (phrases):
  - direct phrase alignment
  - word-alignment-induced phrase model
  - alignment templates
- Results better than word-based models, and show improvement proportional (log-linear) to corpus size
- Phrases do not correspond to constituents, and limiting them to do so hurts results

Page 25: Latest Developments in (S)MT

Direct phrase alignment (Wang & Waibel 1998, Och et al. 1999, Marcu & Wong 2002)

- Enhance the word translation model by adding joint probabilities, i.e. probabilities for phrases
- Phrase probabilities compensate for missing lexical probabilities
- Easy to integrate probabilities from different sources/methods, allowing for mutual compensation

Page 26: Latest Developments in (S)MT

Word alignment induced model (Koehn et al. 2003; example stolen from Knight & Koehn)

http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/tutorial2003.pdf

Maria did not slap the green witch
Maria no daba una bofetada a la bruja verde

Start with all phrase pairs justified by the word alignment

Page 27: Latest Developments in (S)MT

Word alignment induced model (Koehn et al. 2003; example stolen from Knight & Koehn)

http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada, slap), (a la, the), (verde, green), (bruja, witch)

Page 28: Latest Developments in (S)MT

Word alignment induced model (Koehn et al. 2003; example stolen from Knight & Koehn)

http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada, slap), (a la, the), (verde, green), (bruja, witch), (Maria no, Maria did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch)

etc.

Page 29: Latest Developments in (S)MT

Word alignment induced model (Koehn et al. 2003; example stolen from Knight & Koehn)

http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch), (verde, green),
(Maria no, Maria did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch),
(Maria no daba una bofetada, Maria did not slap), (no daba una bofetada a la, did not slap the), (a la bruja verde, the green witch),
(Maria no daba una bofetada a la, Maria did not slap the), (daba una bofetada a la bruja verde, slap the green witch),
(no daba una bofetada a la bruja verde, did not slap the green witch),
(Maria no daba una bofetada a la bruja verde, Maria did not slap the green witch)

Page 30: Latest Developments in (S)MT

Word alignment induced model

Given the phrase pairs collected, estimate the phrase translation probability distribution by relative frequency (without smoothing)
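A sketch of the whole procedure, in the spirit of Koehn et al. 2003: collect every source/target span pair that is consistent with the word alignment (no link leaves the box), then estimate phrase translation probabilities by relative frequency. The alignment links below encode the Maria example; the code itself is my illustration, not Koehn’s, and omits the usual extension over unaligned words.

```python
from collections import Counter, defaultdict

def extract_phrases(src, tgt, alignment):
    """Collect all phrase pairs consistent with the alignment links."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, len(src)):
            ts = [j for (i, j) in alignment if i1 <= i <= i2]
            if not ts:
                continue
            j1, j2 = min(ts), max(ts)
            # consistent iff no link enters the target span from
            # outside the source span
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.append((" ".join(src[i1:i2 + 1]),
                              " ".join(tgt[j1:j2 + 1])))
    return pairs

src = "Maria no daba una bofetada a la bruja verde".split()
tgt = "Maria did not slap the green witch".split()
alignment = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
             (5, 4), (6, 4), (7, 6), (8, 5)}

counts = Counter(extract_phrases(src, tgt, alignment))
totals = defaultdict(int)
for (s, t), c in counts.items():
    totals[s] += c
# relative-frequency estimate: phi(t | s) = count(s, t) / count(s)
phi = {(s, t): c / totals[s] for (s, t), c in counts.items()}
print(phi[("daba una bofetada", "slap")])  # 1.0
```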

Page 31: Latest Developments in (S)MT

Alignment templates (Och et al. 1999; further developed by Marcu & Wong 2002, Koehn & Knight 2003, Koehn et al. 2003)

- Problem of sparse data is worse for phrases, so use word classes instead of words:
  - alignment templates instead of phrases
  - more reliable statistics for the translation table
  - smaller translation table
  - more complex decoding
- Word classes are induced (by distributional statistics), so may not correspond to intuitive (linguistic) classes
- Takes context into account

Page 32: Latest Developments in (S)MT

Problems with phrase-based models

Still do not handle very well ...

- dependencies (especially long-distance)
- distortion
- discontinuities (e.g. bought = habe ... gekauft)

More promising seems to be ...

Page 33: Latest Developments in (S)MT

Syntax-based SMT

Better able to handle:

- constituents
- function words
- grammatical context (e.g. case marking)

Approaches:

- Inversion Transduction Grammars
- Hierarchical transduction model
- Tree-to-string translation
- Tree-to-tree translation

Page 34: Latest Developments in (S)MT

Inversion transduction grammars

- Wu and colleagues (1997 onwards)
- Grammar generates two trees in parallel, and mappings between them
- Rules can specify order changes
- Restriction to binary rules limits complexity
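A toy sketch of the two rule orientations, reusing the deck’s earlier Japanese example: a straight rule (Wu’s [ ]) keeps the child order in both languages, while an inverted rule (Wu’s ⟨ ⟩) swaps it on the target side. The mini-derivation is hand-built for illustration, not induced.

```python
def straight(left, right):
    """[A B]: same child order on both sides."""
    (ls, lt), (rs, rt) = left, right
    return (f"{ls} {rs}", f"{lt} {rt}")

def inverted(left, right):
    """<A B>: child order swapped on the target side."""
    (ls, lt), (rs, rt) = left, right
    return (f"{ls} {rs}", f"{rt} {lt}")

# lexical (source, target) pairs
the_dog, a_rabbit, ate = ("the dog", "inu wa"), ("a rabbit", "usagi o"), ("ate", "tabeta")

# S -> [NP VP]; VP -> <V NP>   (target is verb-final)
vp = inverted(ate, a_rabbit)
s = straight(the_dog, vp)
print(s)  # ('the dog ate a rabbit', 'inu wa usagi o tabeta')
```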

Page 35: Latest Developments in (S)MT

Inversion transduction grammars

Page 36: Latest Developments in (S)MT

Inversion transduction grammars

- Grammar is trained on a word-aligned bilingual corpus; note that all the rules are learned automatically
- Translation uses a decoder which effectively works like traditional RBMT:
  - the parser uses the source side of the transduction rules to build a parse tree
  - the transduction rules are applied to transform the tree
  - the target text is generated by linearizing the tree

Page 37: Latest Developments in (S)MT
Page 38: Latest Developments in (S)MT
Page 39: Latest Developments in (S)MT

- Almost all possible mappings can be handled
- Missing ones (crossing constraints) are not found in Wu’s corpus
- But examples can be found, apparently

Page 40: Latest Developments in (S)MT

Hierarchical transduction model (Alshawi et al. 1998)

- Based on finite-state transducers; also uses binary notation
- Uses automatically induced dependency structure
- Initial head-word pair is chosen
- Sentence is then expanded by translating the dependent structures

Page 41: Latest Developments in (S)MT

Tree-to-string translation (Yamada & Knight 2001, Charniak 2003)

- Uses a (statistical) parser on the input side only
- Tree is then subject to reordering and insertion according to models learned from data
- Lexical translation is then done, again according to probability models

Page 42: Latest Developments in (S)MT

[Figure: the channel operations applied to an input parse tree, step by step: reorder, insert, translate, linearize, yielding kare ha ongaku wo kiku no ga daisuki desu]
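A minimal runnable sketch of those four channel operations on a hand-built tree. Every rule and table here is fixed and invented for a tiny sentence; in Yamada & Knight’s model each operation is instead drawn from a learned probability distribution.

```python
# A tree is (label, children), with a string in place of children at leaves.

def reorder(tree):
    """Reverse VP children: SVO source order to SOV target order."""
    label, kids = tree
    if isinstance(kids, str):
        return tree
    kids = [reorder(k) for k in kids]
    return (label, list(reversed(kids)) if label == "VP" else kids)

def insert_particles(tree):
    """Insert a topic particle after the subject NP (hypothetical rule)."""
    label, kids = tree
    if isinstance(kids, str):
        return tree
    kids = [insert_particles(k) for k in kids]
    return (label, ([kids[0], ("PART", "wa")] + kids[1:]) if label == "S" else kids)

LEX = {"he": "kare", "likes": "daisuki desu", "music": "ongaku"}

def translate(tree):
    """Replace each leaf word via the lexical table."""
    label, kids = tree
    if isinstance(kids, str):
        return (label, LEX.get(kids, kids))
    return (label, [translate(k) for k in kids])

def linearize(tree):
    """Read the leaves off left to right."""
    label, kids = tree
    return kids if isinstance(kids, str) else " ".join(linearize(k) for k in kids)

src = ("S", [("NP", "he"), ("VP", [("V", "likes"), ("NP", "music")])])
print(linearize(translate(insert_particles(reorder(src)))))
# kare wa ongaku daisuki desu
```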

Page 43: Latest Developments in (S)MT

Tree-to-tree translation

- (Gildea 2003) Use parser on both sides to capture structural differences; subtree cloning
- (Habash 2002, Čmejrek et al. 2003) Full morphological/syntactic/semantic parsing
- All based on stochastic grammars

Page 44: Latest Developments in (S)MT

Latest developments in RBMT

- RBMT making a come-back (e.g. METIS)
- Perhaps it was always there, just wasn’t represented in CL journals/conferences
- There is some activity, but around the periphery: open-source systems, development for low-density languages
- Much use made of corpus-derived modules, e.g. tagging, chunking
- SMT is now RBMT, only the rules are learned rather than written by linguists

Page 45: Latest Developments in (S)MT

Overview

- The story so far: EBMT, SMT, latest developments in RBMT
- Is there convergence? Some attempts to classify MT (Carl and Wu’s MT model spaces)
- Has the empire struck back?

Page 46: Latest Developments in (S)MT

Classifications of MT

- Empirical vs. rationalist:
  - data- vs. theory-driven
  - use (or not) of symbolic representation
- From MLIM chapter 4:
  - high vs. low coverage
  - low vs. high quality/fluency
  - shallow vs. deep representation
- Distinguish, in the above, design vs. consequence
- How true are they anyway?

Page 47: Latest Developments in (S)MT

EBMT~SMT: Is there convergence?

- Lively debate on mtlist
- Articles by Somers, Turcato & Popowich in Carl & Way (2003)
- Hutchins, Carl, Wu (2006) in special issue of Machine Translation
- Slides marked ! need your input!

Page 48: Latest Developments in (S)MT

Essential features of EBMT

- Use of bilingual corpus data as the main (only?) source of knowledge (Somers); most early EBMT systems were hybrids
- We do not know a priori which parts of an example are relevant (Turcato & Popowich)
- Raw data is consulted at run-time, with (little or) no preprocessing; therefore template-based EBMT is already a hybrid (with RBMT)
- Act of matching the input against the examples, regardless of how they are stored (Hutchins)

Page 49: Latest Developments in (S)MT

Pros (and cons) of the analogy model (Wu, 2006)

- Like CBR:
  - library of cases used during task performance
  - analogous examples broken down, adapted, recombined
- In contrast with other machine learning methods, which use offline learning to compile an abstract performance model:
  - no loss of coverage due to incorrect generalization during training
  - guaranteed correct when the input is exactly like an example in the training set (not true of SMT)
- But: lack of generalization leads to potential runtime inefficiency

Page 50: Latest Developments in (S)MT

EBMT~SMT: Common features

Easily agreed:

- Use of bilingual corpus data as the main (only?) source of knowledge
- Translation relations are derived automatically from the data
- Underlying methods are independent of language pair, and hence of language similarity

More contentious:

- Bilingual corpus data should be real (a practical issue for SMT, but some EBMT systems use “hand-crafted” examples)
- System can be easily extended just by adding more data

Page 51: Latest Developments in (S)MT

EBMT~RBMT common features

- Hybrid is easy to conceive:
  - rule-based analysis/generation with example-based transfer
  - example-based processing only for awkward cases

!

Page 52: Latest Developments in (S)MT

SMT~RBMT common features

- Some versions of SMT exactly mirror classic RBMT: parse-transfer-generate
- Same things are hard: long-distance dependency, discontinuous constituents

!

Page 53: Latest Developments in (S)MT

Wu’s 3D classification of all MT

- Example-based vs. schema-based: is abstraction or generalization performed at run-time?
- Compositional vs. lexical: relates primarily to transfer (or equivalent)
- Statistical vs. logical
- Pictures also show historical development

Page 54: Latest Developments in (S)MT

Classic (direct and transfer) MT models

- Early systems (Georgetown) lexical and compositional
- Treatment of idioms, collocations, phrasal translations in classical 2G transfer systems
- Modern RBMT systems starting to adopt statistical methods (according to Wu)
- Where do commercial systems sit?

Page 55: Latest Developments in (S)MT
Page 56: Latest Developments in (S)MT

EBMT systems

Page 57: Latest Developments in (S)MT

SMT systems

Page 58: Latest Developments in (S)MT

Example-based SMT systems

Page 59: Latest Developments in (S)MT

Summary

Page 60: Latest Developments in (S)MT

Model space for corpus-based MT (Carl 2000)

Based on Dummett’s theory of meaning:

- Rich vs. austere: complexity of representations
- Molecular vs. holistic: descriptions based on a finite set of predefined features vs. global distinctions
- Fine-grained vs. coarse-grained: based on smaller or larger units

Page 61: Latest Developments in (S)MT

Rich vs. austere

- Translation memories are most austere, depending only on graphemic similarity
- TMs with annotated examples (e.g. Planas & Furuse) are richer
- Early EBMT systems, and recent systems where examples are generalized, are rich
- EBMT using light annotation (e.g. tags, markers) is moderately rich
- Pure EBMT (Lepage & Denoual) is austere
- Early SMT systems were austere, but the move towards syntax makes them richer
- Phrase-based SMT still austere

Page 62: Latest Developments in (S)MT

[Figure: the model space with systems placed along the austere-rich axis: translation memories, pure EBMT (Lepage), early SMT (Brown et al.) and phrase-based SMT towards the austere end; EBMT with lightly annotated examples and marker-based EBMT (Way) in between; annotated translation memories, template-based EBMT (McTait, Brown, Cicekli), classic EBMT (Sato, Nagao), syntax-based SMT and METIS towards the rich end]

Page 63: Latest Developments in (S)MT

Molecular vs. holistic

- Early SMT purely holistic, as is pure EBMT
- TMs molecular: distance measure based on a fixed set of symbols
- Translation templates are holistic, but molecular if they depend on some sort of analysis
- Phrase-based and syntax-based SMT highly molecular

Page 64: Latest Developments in (S)MT

[Figure: the same systems placed along the molecular-holistic axis: early SMT (Brown et al.), pure EBMT (Lepage) and template-based EBMT (McTait, Brown) towards the holistic end; translation memories, annotated translation memories, classic EBMT (Sato, Nagao), template-based EBMT (Cicekli), marker-based EBMT (Way), EBMT with lightly annotated examples, phrase-based SMT, syntax-based SMT, METIS analysis and METIS generation towards the molecular end]

Page 65: Latest Developments in (S)MT

Coarse- vs. fine-grained

- Coarse-grained translates with bigger units
- TM systems work only on sentences: coarse-grained
- Word-based systems are fine-grained: early SMT
- Phrase-based SMT slightly more coarse-grained
- Template-based EBMT fine-grained

!

Page 66: Latest Developments in (S)MT

[Figure: systems placed along the coarse-fine axis: translation memories at the coarse end; phrase-based SMT, marker-based EBMT (Way) and template-based EBMT (McTait, Brown) in between; early SMT (Brown et al.) at the fine end]

Page 67: Latest Developments in (S)MT

Overview

- The story so far: EBMT, SMT, latest developments in RBMT
- Is there convergence? Some attempts to classify MT (Carl and Wu’s MT model spaces)
- Has the empire struck back?

Page 68: Latest Developments in (S)MT

Has the empire struck back?

- Is linguistics back in MT?
- Was MT ever of interest to linguists?
- Is SMT like RBMT?

!

Page 69: Latest Developments in (S)MT

Vauquois triangle

To what extent can a given system be described in terms of the classic view of MT (2G)?

!

[Figure: the Vauquois triangle: Interlingua at the apex; Analysis up the left side; Generation down the right side; Transfer across the middle; Direct translation along the base]

Page 70: Latest Developments in (S)MT

Has the empire struck back?

- Is linguistics back in MT?
- Was MT ever of interest to linguists?
- Is SMT like RBMT?

!

- As predicted by Wilks (“Stone soup” talk, 1992), the way forward is hybrid
- Negative experience (for me) of seeing SMT presenters rediscovering problems first described by Yngve, Vauquois ...
- ... without referencing the original papers!

Page 71: Latest Developments in (S)MT

IT’S LIFE, JIM, BUT NOT AS WE KNOW IT.

LINGUISTICS

Page 72: Latest Developments in (S)MT

Fill in the gaps

!

[Figure: triangle relating SMT, EBMT and RBMT]

[Figure: the model-space diagram repeated as an exercise, with annotated translation memories, classic EBMT (Sato, Nagao), template-based EBMT (Cicekli; McTait, Brown), phrase-based SMT, syntax-based SMT, marker-based EBMT (Way), EBMT with lightly annotated examples, translation memories, early SMT (Brown et al.) and pure EBMT (Lepage) left for the audience to place]