Types of Machine Translation

Post on 10-May-2015


Drop me a mail: rushdecoder@yahoo.com
Visit me at: http://rushdishams.googlepages.com

Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh

Translation Approach

The translation process may be stated as:

1. Decoding the meaning of the source text
2. Re-encoding this meaning in the target language

Machine translation can use a method based on linguistic rules: words are translated in a linguistic way, so that the most suitable words of the target language replace the ones in the source language.


Translation Approach

The success of machine translation requires the problem of natural language understanding to be solved first.

Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated.


Translation Approach

According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation.

These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.


Translation Approach

Machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what a native speaker of another language has written.


Translation Approach

The large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods.

On the other hand, the grammar-based methods need a skilled linguist to carefully design the grammar that they use.


Types of Machine Translation

[Figure: diagram of MT approaches between Source (Arabic) and Target (English). Labels: Syntactic Parsing, Semantic Analysis, Sentence Planning, Text Generation; Direct (SMT, EBMT), Transfer Rules, Interlingua.]

Rule based MT

The rule-based machine translation paradigm includes:

1. transfer-based machine translation
2. interlingual machine translation
3. dictionary-based machine translation


Transfer based MT

It is necessary to have an intermediate representation that captures the "meaning" of the original sentence in order to generate the correct translation.

In interlingua-based MT this intermediate representation must be independent of the languages in question, whereas in transfer-based MT, it has some dependence on the language pair involved.


Transfer based MT

The original text is first analyzed morphologically and syntactically in order to obtain a syntactic representation.

This representation can then be refined to a more abstract level putting emphasis on the parts relevant for translation and ignoring other types of information.


Transfer based MT

The transfer process then converts this final representation (still in the original language) to a representation of the same level of abstraction in the target language.

These two representations are referred to as "intermediate" representations.

From the target language representation, the stages are then applied in reverse.


Transformation process

Morphological analysis

Surface forms of the input text are classified by:
○ part of speech (e.g. noun, verb, etc.)
○ sub-category (number, gender, tense, etc.)


Transformation process

Lexical categorization

In any given text some of the words may have more than one meaning, causing ambiguity in analysis.

Lexical categorization looks at the context of a word to try to determine its correct meaning in the input.


Transformation process

Lexical transfer

This is basically dictionary translation: the source language lemma (perhaps with sense information) is looked up in a bilingual dictionary and the translation is chosen.


Transformation process

Structural transfer

While the previous stages deal with words, this stage deals with larger constituents.


Transformation process

Morphological generation

From the output of the structural transfer stage, the target language surface forms are generated.
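The stages of the transformation process can be sketched as a small pipeline. This is only a toy illustration, not a real MT system: the Spanish-to-English word lists, the single reordering rule, and all function names are assumptions made for the example, and lexical categorization is skipped because no word in the toy lexicon is ambiguous.

```python
# Toy Spanish-to-English transfer pipeline. All data and names are
# hypothetical and exist only to illustrate the stages.

# Morphological analysis: surface form -> (lemma, part of speech, number)
ANALYSIS = {
    "casa": ("casa", "NOUN", "sg"),
    "casas": ("casa", "NOUN", "pl"),
    "roja": ("rojo", "ADJ", "sg"),
    "rojas": ("rojo", "ADJ", "pl"),
}

# Lexical transfer: bilingual dictionary keyed by source lemma
BILINGUAL = {"casa": "house", "rojo": "red"}


def analyse(tokens):
    """Classify each surface form by lemma, part of speech and number."""
    return [ANALYSIS[token] for token in tokens]


def lexical_transfer(analysed):
    """Look up each source lemma in the bilingual dictionary."""
    return [(BILINGUAL[lemma], pos, number) for lemma, pos, number in analysed]


def structural_transfer(units):
    """Reorder NOUN ADJ (Spanish order) into ADJ NOUN (English order)."""
    out = list(units)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(units):
    """Produce target surface forms; only regular noun plurals are handled."""
    words = []
    for lemma, pos, number in units:
        if pos == "NOUN" and number == "pl":
            words.append(lemma + "s")
        else:
            words.append(lemma)
    return " ".join(words)


def translate(sentence):
    units = analyse(sentence.split())
    return generate(structural_transfer(lexical_transfer(units)))


print(translate("casas rojas"))
```

Each function corresponds to one stage, so a new rule (for example another reordering pattern) can be added to its stage without touching the others.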


Transfer Types

Superficial transfer (or syntactic)

This level is characterized by transferring "syntactic structures" between the source and target languages.

It is suitable for languages in the same family or of the same type, for example between Romance languages such as Spanish, Catalan, French and Italian.


Transfer Types

Deep transfer (or semantic)

This level constructs a semantic representation that is dependent on the source language.

This representation can consist of a series of structures which represent the meaning.

In these transfer systems predicates are typically produced. The translation also typically requires structural transfer.

This level is used to translate between more distantly related languages (e.g. Spanish-English or Spanish-Basque).


Dependency Grammar

Case Grammar


Interlingual MT

The source-language text (the text to be translated) is transformed into an interlingua, i.e. an abstract language-independent representation.

The target language is then generated from the interlingua.


Interlingual MT

In the direct approach, words are translated directly without passing through an additional representation.

In the transfer approach the source language is transformed into an abstract, less language-specific representation.


Advantage and disadvantage

The advantage in multilingual machine translation is that no transfer component has to be created for each language pair.

The obvious disadvantage is that the definition of an interlingua is difficult and maybe even impossible for a wider domain.
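A quick count makes the advantage concrete. The sketch below assumes translation is wanted in both directions between every pair of languages and that each transfer component works in one direction only. These assumptions and the function name are illustrative, not taken from the slides.

```python
def component_counts(n_languages):
    """Return (transfer_components, interlingua_components) for n languages.

    Assumes one transfer component per ordered language pair, and one
    analyser plus one generator per language for the interlingual design.
    """
    # Pair-specific components: one for every ordered (source, target) pair.
    transfer_components = n_languages * (n_languages - 1)
    # Interlingua: per-language analysis and generation only, no pair-specific part.
    interlingua_components = 2 * n_languages
    return transfer_components, interlingua_components


for n in (2, 5, 10):
    print(n, component_counts(n))
```

Under these assumptions the number of pair-specific transfer components grows quadratically with the number of languages, while the interlingual design grows linearly.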


Components

○ Dictionaries for analysis and generation.
○ A conceptual lexicon, which is the knowledge base about events and entities known in the domain.
○ A set of projection rules (specific to the domain and the languages).
○ Grammars for the analysis and generation of the languages involved.


Dictionary-based MT

The words are translated as a dictionary does: word by word, usually without much correlation of meaning between them.

Dictionary lookups may be done with or without morphological analysis or lemmatisation.

This approach can be used to expedite manual translation, if the person carrying it out is fluent in both languages and therefore capable of correcting syntax and grammar.
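A minimal sketch of word-by-word lookup, with and without lemmatisation, is shown below. The tiny English-to-Spanish dictionary, the lemma table, and the function name are made up for illustration.

```python
# Hypothetical toy resources, for illustration only.
DICTIONARY = {"the": "el", "cat": "gato", "eat": "comer", "fish": "pescado"}
LEMMAS = {"cats": "cat", "eats": "eat"}


def translate_word_by_word(sentence, lemmatise=False):
    """Translate each word independently; unknown words are left unchanged."""
    output = []
    for word in sentence.lower().split():
        key = LEMMAS.get(word, word) if lemmatise else word
        output.append(DICTIONARY.get(key, word))
    return " ".join(output)


print(translate_word_by_word("The cat eats fish"))
print(translate_word_by_word("The cat eats fish", lemmatise=True))
```

Without lemmatisation the inflected form "eats" is not found. With it, the lookup succeeds but returns an uninflected verb, which is the kind of syntax and grammar error a bilingual human would then correct.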



Example-based MT

Example-based MT is characterized by its use of a bilingual corpus with parallel texts as its main knowledge base.

It is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach to machine learning.


Example-based MT

Bilingual parallel corpora contain sentence pairs like the example shown in the table.

○ "How much is that X ?" corresponds to "Ano X wa ikura desu ka."
○ "red umbrella" corresponds to "akai kasa"
○ "small camera" corresponds to "chiisai kamera"
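The correspondences above can be applied mechanically: match a sentence against a stored template, translate the fragment that fills the X slot, and insert it into the target template. The sketch below is a toy illustration of that idea; the data structures and the function name are assumptions, and a real example-based system would find templates by similarity rather than by exact matching.

```python
# Template and fragment pairs taken from the correspondences listed above.
TEMPLATES = [("How much is that X ?", "Ano X wa ikura desu ka.")]
FRAGMENTS = {"red umbrella": "akai kasa", "small camera": "chiisai kamera"}


def translate_by_example(sentence):
    """Return a translation built from a matching template, or None."""
    for source_template, target_template in TEMPLATES:
        prefix, suffix = source_template.split("X")
        long_enough = len(sentence) >= len(prefix) + len(suffix)
        if long_enough and sentence.startswith(prefix) and sentence.endswith(suffix):
            # The text between prefix and suffix fills the X slot.
            fragment = sentence[len(prefix):len(sentence) - len(suffix)]
            if fragment in FRAGMENTS:
                return target_template.replace("X", FRAGMENTS[fragment])
    return None


print(translate_by_example("How much is that red umbrella ?"))
```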

Example-based MT

Suppose the corpus already contains translations of the sentences "President Kennedy was shot dead during the parade." and "The convict escaped on July 15th." We could then translate the sentence "The convict was shot dead during the parade." by substituting the appropriate parts of the sentences.


Statistical MT

The idea behind statistical machine translation comes from information theory.

A document is translated according to the probability distribution p(e | f) that a string e in the target language (for example, English) is the translation of a string f in the source language (for example, French).

Statistical MT

The problem of modeling the probability distribution p(e | f) has been approached in a number of ways. One intuitive approach is to apply Bayes' theorem, that is,

$$ p(e \mid f) \propto p(f \mid e)\, p(e) $$

where the translation model p(f | e) is the probability that the source string is the translation of the target string, and the language model p(e) is the probability of seeing that target language string.

Statistical MT

The best translation is found by picking the one that gives the highest probability:

$$ \tilde{e} = \arg\max_{e} p(e \mid f) = \arg\max_{e} p(f \mid e)\, p(e) $$
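Picking the highest-probability candidate can be sketched in a few lines. The candidate strings and all probability values below are invented for illustration. A real system would estimate the translation model p(f | e) and the language model p(e) from corpora and would search over a far larger candidate space.

```python
import math

# Hypothetical scores for one fixed source string f.
TRANSLATION_MODEL = {  # p(f | e)
    "the house is small": 0.20,
    "the home is small": 0.25,
    "small the house is": 0.20,
}
LANGUAGE_MODEL = {  # p(e)
    "the house is small": 0.010,
    "the home is small": 0.004,
    "small the house is": 0.0001,
}


def score(candidate):
    """Log of p(f | e) * p(e); logs avoid underflow with small probabilities."""
    return math.log(TRANSLATION_MODEL[candidate]) + math.log(LANGUAGE_MODEL[candidate])


def best_translation(candidates):
    """Return the candidate e that maximises p(f | e) * p(e)."""
    return max(candidates, key=score)


print(best_translation(list(TRANSLATION_MODEL)))
```

With these made-up numbers, the language model breaks the tie in favour of the fluent word order even though another candidate has a slightly higher translation-model score.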
