shiva and shakti english-hindi machine translation systemsweb.iiit.ac.in/~papi_reddy/test.pdf ·...
TRANSCRIPT
Shiva and ShaktiEnglish-Hindi Machine Translation
SystemsT.Papi Reddy
papi [email protected]
International Institute of Information Technology-Hyderabad,AP
conference name – p.1/30
Introduction
A Brief History
Different kinds of approaches
Shiva
Shakti
Problems in MT
Lexical Resources needed for MT
Conclusion
conference name – p.2/30
A Brief History
Beginning (1950’s)
Dark period (1964 - 1979)
Re-Birth (1980)
A new paradigm (1990)
MT in India
conference name – p.3/30
Beginning (1950’s)
Very optimistic attitude towards MT
Very high expectations
Mainly from English to Russian and vice-versa
Successfull demonstration of an MT system withrestricted vocabulary and grammar by Georgetownuniversity and IBM in 1956
Attracted huge fundinginspired many other MT projects
Scaling up of restricted vocabulary and grammar didnot produce satisfactory results (infact deteriorated thesystem’s performance)
conference name – p.4/30
Dark period (1964 - 1979)
Report of Automatic Language Processing AdvisoryCommitte (ALPAC)
MT not possible and practical in near future! :(
Funding for MT stopped in US
However research continued in canada and a very fewother places.
conference name – p.5/30
Re-Birth (1979)
Installation of Taum-Meteo system for translation ofweather bulleins from English to French in Canada in1979
Limited subject domainproduced good translation :)
Japanese MT projectsMu project at Kyoto Univ (1982)
Eurotra MT project in Europe for many Europeanlanguages
Demand came from different sources
conference name – p.6/30
New Paradigm (1990)
Candide: A purely statistical MT system from IBM
Usage of corpus based approaches in Japanese MTsystems
Focus changed from purely research to practicalapplications
By the end of 90’s , huge growth in usage of MTsystems for different purposes
conference name – p.7/30
MT in India
MT was taken seriously only in 90’s
Anusaaraka (language accessor) among Indianlanguages built at IITK and University of Hyderabad
English to Hindi MT systemsNCSTCDACIITKSHIVA - CMU, IISC-B, LTRC IIITSHAKTI - LTRC IIIT
conference name – p.8/30
Different approaches
Rule-based : A lingust or a language expert givesnecessary rules to generate translation of sourcelanguage into a target language
Corpus-based : Machine tries to learn the rulesnecessary for translation from a parallel corpus.
StatisticalExample-based
Ex: Shiva
Hybrid approachA combination of both the above approachesEx: Shakti
conference name – p.9/30
Shiva
Example-based
Requires huge amounts of parallel corpora
Learns by associating word sequences in English withword sequences in Hindi
English-Hindi parallel corpora not readily available! :(
Controlled corporaphrasal and sentence level translations
conference name – p.10/30
Shakti
Hybrid system
Uses both the rule-based and corpus-basedapproaches
In addition to the rules provided by the languageexpert, it uses corpus based techniques in translation
conference name – p.11/30
Architecture
Source language(English) dependent modules
English sentence
English sentence Analysis
WSD
parsed output
senses marked
conference name – p.12/30
Architecture
Bilingual modules
parsed output
Transfer Grammar
English-Hindi Dictornary lookup
TAM lookup
reordered output
Hindi words substituted
Hindi TAM substitutedconference name – p.13/30
Architecture
Source Language (Hindi) sentence Synthesis
representation in Hindi
vibhakti Insertion
Agreement
Word form generation
Hindi sentence
vibhaktis inserted
approriate word forms generated
conference name – p.14/30
English Sentence Analysis
Morphological analyzer
root and other features of each wordmore than one analsysis possible
went (root : go, lex_cat : v)saw (root : see, lex_cat : v) orsaw (root : saw, lex_cat : v)
Parts of speach (POS) tagger
single parts of speech tag for each worduses statistical language model
conference name – p.15/30
English Sentence Analysis
Chunker
verb groupsnoun phrases
Ex: Godhra_FW (( is_VBZ simmering_VBG ))with_IN [[ anger_NN ]]
Parser
sentence analysisintermediate representation
conference name – p.16/30
Bilingual Analysis
Transfer Rules
Result is a re-ordered intermediate representationfor Hindi.
Word substitution from Bilingual dictionary.sense taken into account
TAM substitution from Bilingual TAM dictionary.
conference name – p.17/30
Transfer grammar
Reordering rules
(Ram)1 (saw)2 (the girl)3 (in the garden)4(Ram)1 (the girl)3 (the garden in)4 (saw)2Mirror principle
Verb Frames (TransLexGram)
verb : meetverb frameEnglish : A meets BHindi : A B sao milaa
Machine learned rules
very specific rulesless coverage
conference name – p.18/30
Tense Aspect Modality(TAM)
Tense Aspect Modality(TAM)
Ram is going home
TAM : is_ing
Hindi TAM : 0_ rha hO
conference name – p.19/30
Engineering Principle
Non-lexical grammar: 6 rulesmirror principle
Category based grammar : 60 verb classes
Lexicalized grammar : 6000 verbsTransLexGram
Fine grained grammar : 60,000 ruleslearn from a corpus
conference name – p.20/30
Hindi Sentence Synthesis
Vibhakti Insertionrule based
if verb = transitive and tense = past thensubj + nao
corpus based
Agreementwith in a chunkacross the chunk
conference name – p.21/30
Hindi Sentence Synthesis
Agreement of verbsubject if no vibhaktielse object if no vibhakti elseneutral form (masculine, singular, third person)
Word form generatortakes the root word and its featuresgives appropriate word form
jaafeatures : v f s 3 0_ raha hOjaa rhI hO
conference name – p.22/30
Problems in MT
Multiple Word Senses
Syntactic and Semantic ambiguities
Multiple analysis of a single sentence
Preposition disambiguation
Different sentence structures
Idioms
conference name – p.23/30
Multiple Word Senses
bankmoney bank?river bank?
Ex: I deposited hundred rupees in the bank
maO nao baOMk maoM saÝ $Payao jamaa ikyao
conference name – p.24/30
Mutliple Word Senses
Ex: She ate lunch in the bank
]sanao baOMk maoM daoPahr ka #aanaa #aayaaEx: She ate lunch on the bank
]sanao iknaara Par daoPahr ka #aanaa #aayaaUses WASP for word sense disambiguation
conference name – p.25/30
Mutliple Word Senses
had
Ex: She had lunch in the restaurant
]sanao jalaPaana-gaRh maoM daoPahr ka #aanaa #aayaaEx: She had tea in the restaurant
]sanao jalaPaana-gaRh maoM caaya iPayaa
conference name – p.26/30
Different Setence Structures
QuestionsWhat, Where, When . . .Yes/No
relative clauses
conference name – p.27/30
Lexical Resources
Bilingual dictionary
TAM dictionary
TransLexGramTransfer lexicon (Bilingual dictionary)Transfer grammarverb framesExample parallel sentences
Phrazal dictionary
conference name – p.28/30
Lexical Resources
Idiom dicionary
Wordnet (indian languages)
Word form generator
Annotated corporaPOS taggingtree banks
Transfer rules
conference name – p.29/30
Conclusion
It is necessary for the linguists and language experts towork hand in hand. Only then we can build practicalsystems
conference name – p.30/30