


Automatic Translation of Nominal Compound into Hindi

Prashant Mathur

IIIT Hyderabad

Soma Paul

IIIT Hyderabad


Outline

- What is a Nominal Compound (NC)?
- Translation variation of English NC into Hindi
- Motivation
- Approach
- Results
- Future Work
- Bibliography


Nominal Compound

A nominal compound is a construct of two or more nouns. The rightmost noun is the head; the preceding nouns are its modifiers.

- Oil pump: a device used to pump oil
- Customer satisfaction indices: indices that indicate the satisfaction rate of customers

Two-word nominal compounds are the object of study here.


Frequency of NCs in English corpora (Baldwin et al. 2004)

Corpus     Words    NC frequency
BNC        84M      2.6%
Reuters    108M     3.9%


Variation in translating English NC into Hindi

- As a nominal compound: ‘Hindu texts’ → hindU SastroM; ‘milk production’ → dugdha utpAdana
- As a genitive construction: ‘rice husk’ → cAval kI bhUsI; ‘room temperature’ → kamare ka tApamAna
- As one word: ‘cow dung’ → gobar
- As an adjective-noun construction: ‘nature cure’ → prAkratik cikitsA; ‘hill camel’ → pahARI UMTa
- As another syntactic phrase: ‘wax work’ → mom par kalAkArI (‘work on wax’); ‘body pain’ → SarIr meM dard (‘pain in body’)
- Others: ‘hand luggage’ → haat meM le jaaye jaane vaale saamaan


Motivation

Issues in translation:

- Choice of the appropriate target lexeme during lexical substitution
- Selection of the right target construct type

NCs as a class occur frequently in a corpus, but each individual compound occurs only a few times.

NCs are too varied to be precompiled into an exhaustive list of translation candidates.


Therefore …

- NCs have to be handled on the fly.
- Translating NCs from English into Hindi is therefore a challenging NLP task.


With Google Translate

Google Translate was tested on the same dataset that was used to evaluate our system.

Translation formation       Precision
Overall                     45%
Eng NC → Hindi NC           29%
Eng NC → Hindi genitive     10%
Others                      6%


Approach

1. Translation template generation
2. Extraction of NCs from the English corpus
3. Sense disambiguation of the components
4. Lexical substitution of the component nouns using a bilingual dictionary
5. Preparing translation candidates
6. Corpus search of the translation candidates and their ranking


Translation Template Generation

We surveyed 50,000 sentences of parallel corpora and found the following construction types.

Construction type                  No. of occurrences    Percentage
Nominal Compound                   3959                  42.9%
Genitive                           1976                  21.4%
Long Phrases                       581                   6.284%
Adjective Noun Phrase              557                   6.024%
Single Word                        766                   8.285%
Transliterated Nominal Compound    1208                  13.065%
None                               199                   2.152%


Some Templates

A total of 44 templates were formed; some of them are shown below. H1 and H2 stand for the Hindi translations of the first and second component noun.

- Nominal Compound: H1 H2
- Genitive: H1 kA H2, H1 ke H2, H1 kI H2
- Long Phrases: H1 pe H2, H1 meM H2, H1 par H2, H1 ke xvArA H2, H1 se prApwa H2
- Adjective: H1-ikA H2
- Single Word: H1


Extraction

1. Tree-Tagger is a POS tagger that also gives some extra information, such as the lemma of each word.

   Word    Tree-Tagger output (word_POS tag_lemma)
   rods    rods_NNS_rod

2. As assumed previously, we consider only noun-noun formations as nominal compounds.
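The extraction step can be illustrated with a minimal Python sketch. It assumes that the tagger output is already available as `word_TAG_lemma` tokens like the one shown above, that noun tags start with `NN`, and that a compound is a pair of adjacent nouns not flanked by a third noun. The function name `extract_two_word_ncs` is hypothetical and is not taken from the slides.

```python
from typing import List, Tuple


def extract_two_word_ncs(tagged: List[str]) -> List[Tuple[str, str]]:
    """Return (modifier_lemma, head_lemma) pairs for isolated noun-noun sequences."""
    # Each token looks like "rods_NNS_rod": word, POS tag, lemma.
    parsed = [tok.rsplit("_", 2) for tok in tagged]
    is_noun = [len(p) == 3 and p[1].startswith("NN") for p in parsed]

    pairs = []
    for i in range(len(parsed) - 1):
        if not (is_noun[i] and is_noun[i + 1]):
            continue
        # Skip longer noun runs: only two-word compounds are in scope.
        noun_on_left = i > 0 and is_noun[i - 1]
        noun_on_right = i + 2 < len(parsed) and is_noun[i + 2]
        if not noun_on_left and not noun_on_right:
            pairs.append((parsed[i][2], parsed[i + 1][2]))
    return pairs


sentence = ["Campaigns_NNS_campaign", "for_IN_for", "road_NN_road",
            "safety_NN_safety", "are_VBP_be", "organized_VBN_organize"]
print(extract_two_word_ncs(sentence))
```

Returning lemmas rather than surface forms keeps the later dictionary lookup independent of number inflection.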



Step 3: Sense Disambiguation of Components

Purpose: to reduce the number of translation candidates.

Example: “Campaigns for road safety are organized to keep everyone safer on the Indian roads.”

Noun component    No. of WN senses    Sense selected    Synset
Road              2                   #1                <road, route>
Safety            6                   #2                <safety, refuge>


Disambiguation uses WordNet::SenseRelate by Ted Pedersen, which gives 80% accuracy on NC disambiguation.


Lexical Substitution

How do we now translate the selected sense into Hindi?

- We do not have a direct WordNet mapping from English to Hindi.
- We therefore use an alternative method to translate.


Step 4: Lexical Substitution

Acquire all possible translations for all the words within a synset.

Road      path, maarg, saDak, raastaa
Route     maarg, saDak, raastaa
Safety    ahAnikArakatA, suraksita sthAna, suraksA, salAmatI, suraksA sAdhana
Refuge    ASraya sthAna, ASraya, sahArA, SaraNa, CipanA


Step 4: Lexical Substitution (contd.)

Select those Hindi words that are common translations of all the English words in a synset, if there are any.

- <road, route>: the selected words are maarg, saDak, raastaa
- <safety, refuge>: there is no common translation, so all words are selected
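The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bilingual dictionary is modelled as a plain `dict` filled with the entries from the slide, and the function name `select_translations` is hypothetical.

```python
from typing import Dict, List


def select_translations(synset: List[str], bilingual: Dict[str, List[str]]) -> List[str]:
    """Keep the translations shared by every word of the synset, else keep them all."""
    per_word = [bilingual.get(word, []) for word in synset]
    common = set(per_word[0]).intersection(*per_word[1:]) if per_word else set()
    # Merge all translations, preserving dictionary order and dropping duplicates.
    merged = list(dict.fromkeys(t for translations in per_word for t in translations))
    if common:
        return [t for t in merged if t in common]
    return merged


# Toy dictionary copied from the slide.
bilingual = {
    "road": ["path", "maarg", "saDak", "raastaa"],
    "route": ["maarg", "saDak", "raastaa"],
    "safety": ["ahAnikArakatA", "suraksita sthAna", "suraksA", "salAmatI", "suraksA sAdhana"],
    "refuge": ["ASraya sthAna", "ASraya", "sahArA", "SaraNa", "CipanA"],
}

print(select_translations(["road", "route"], bilingual))
print(len(select_translations(["safety", "refuge"], bilingual)))
```

For <road, route> the intersection is non-empty, so "path" is dropped; for <safety, refuge> the intersection is empty and all ten translations are kept.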


Step 5: Preparing Translation Candidates

For “road safety”, the translation candidates generated from the templates are:

- mArga para surakRA
- mArga surakRA
- SaDak para surakRA
- SaDak kI surakRA
- ...
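Candidate preparation amounts to filling every template with every pair of component translations. The sketch below assumes a `{h1}`/`{h2}` placeholder format for the templates and lists only six of the 44; both the format and the function name `generate_candidates` are assumptions made for illustration.

```python
from itertools import product
from typing import List

# A handful of templates written with {h1}/{h2} placeholders (hypothetical encoding).
TEMPLATES = [
    "{h1} {h2}",       # nominal compound
    "{h1} kA {h2}",    # genitive
    "{h1} ke {h2}",    # genitive
    "{h1} kI {h2}",    # genitive
    "{h1} meM {h2}",   # long phrase
    "{h1} par {h2}",   # long phrase
]


def generate_candidates(h1_options: List[str], h2_options: List[str],
                        templates: List[str] = TEMPLATES) -> List[str]:
    """Instantiate every template with every (H1, H2) translation pair."""
    return [t.format(h1=h1, h2=h2)
            for h1, h2, t in product(h1_options, h2_options, templates)]


candidates = generate_candidates(["mArga", "saDaka"], ["surakRA"])
print(len(candidates))
print(candidates[:4])
```

The candidate count is the product of the three list sizes, which is why pruning translations in the sense disambiguation and lexical substitution steps matters.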


Step 6: Corpus Search

- Hindi corpus (raw): 28 million words, indexed
- Search: pattern match


Example

- election time → cunAva ke samaya
- temple community → maMxira kA samAja
- marriage customs → vivAha kI praWA

But we did not find any translation for:

- road safety → Ф
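A minimal Python sketch of the corpus search and a count-based ordering of the hits follows. It is not the indexed search used in the slides: it scans a tiny in-memory list of sentences linearly, and the corpus lines (with placeholder tokens `x`, `y`, `z`) and both function names are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def count_candidates(candidates: List[str], corpus_sentences: List[str]) -> Dict[str, int]:
    """Count exact token-sequence matches of each candidate in the corpus."""
    counts: Counter = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.split()
        for cand in candidates:
            cand_tokens = cand.split()
            n = len(cand_tokens)
            counts[cand] += sum(tokens[i:i + n] == cand_tokens
                                for i in range(len(tokens) - n + 1))
    return dict(counts)


def rank_by_count(counts: Dict[str, int]) -> List[str]:
    """Order attested candidates by corpus frequency; unattested ones are dropped."""
    return [c for c, k in sorted(counts.items(), key=lambda kv: -kv[1]) if k > 0]


# Toy corpus: x, y, z are placeholder tokens, not Hindi words.
toy_corpus = ["x cunAva ke samaya y", "cunAva ke samaya z", "cunAva kA samaya"]
toy_candidates = ["cunAva samaya", "cunAva ke samaya", "cunAva kA samaya", "cunAva kI samaya"]

counts = count_candidates(toy_candidates, toy_corpus)
print(rank_by_count(counts))
```

An empty ranking corresponds to the "road safety" case above, where no candidate is attested in the corpus.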


CTQ (Corpus-based Translation Quality)

CTQ rates a given translation candidate on both:

- the fully specified translation, and
- its parts in the context of the translation template in question.

$$\mathrm{CTQ}(w_1^H, w_2^H, t) = \alpha\, P(w_1^H, w_2^H, t) + \beta\, P(w_1^H, t)\, P(w_2^H, t)\, P(t)$$

- $t$ is the translation template used
- $w_1^H$, $w_2^H$ are the translations of the components of the NC
- $\alpha = 1$, $\beta = 0$ if $P(w_1^H, w_2^H, t) > 0$ (we did not vary the constants $\alpha$ and $\beta$)


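CTQ (contd.)

Example: road safety, for which $P(w_1^H, w_2^H, t) = 0$

- road: mArga, mArga ke, mArga meM, saDaka, saDaka par, …
- safety: surakRA, ke surakRA, meM surakRA, … and so on

$$P(\text{mArga}, \text{meM}) \cdot P(\text{meM}, \text{surakRA}) \cdot P(\text{meM}) = (2.28 \times 10^{-5}) \cdot (9.14 \times 10^{-6}) \cdot 0.286 \approx 6 \times 10^{-11}$$

$$P(\text{mArga}, \text{kI}) \cdot P(\text{kI}, \text{surakRA}) \cdot P(\text{kI}) = (1.35 \times 10^{-5}) \cdot (3.82857143 \times 10^{-5}) \cdot 0.228 \approx 1.18 \times 10^{-10}$$

The probability is higher for “mArga kI surakRA”.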
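The arithmetic of this example can be checked with a short Python sketch of the CTQ score. The slides only state α = 1, β = 0 when the full candidate is attested; the complementary setting α = 0, β = 1 for unattested candidates is an assumption made here so that the back-off term is used, and the function name `ctq` is hypothetical.

```python
def ctq(p_full: float, p_w1_t: float, p_w2_t: float, p_t: float) -> float:
    """CTQ = alpha * P(w1, w2, t) + beta * P(w1, t) * P(w2, t) * P(t)."""
    # alpha = 1, beta = 0 when the full candidate is attested (from the slides);
    # alpha = 0, beta = 1 otherwise is an assumption of this sketch.
    alpha, beta = (1.0, 0.0) if p_full > 0 else (0.0, 1.0)
    return alpha * p_full + beta * p_w1_t * p_w2_t * p_t


# Probabilities copied from the "road safety" example; P(w1, w2, t) = 0 for both.
scores = {
    "mArga meM surakRA": ctq(0.0, 2.28e-5, 9.14e-6, 0.286),
    "mArga kI surakRA": ctq(0.0, 1.35e-5, 3.82857143e-5, 0.228),
}
best = max(scores, key=scores.get)
print(best)
for candidate, score in scores.items():
    print(f"{candidate}: {score:.3e}")
```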


Ranking

- Baseline ranking: count-based ranking
- A stronger ranking measure: CTQ (borrowed from Baldwin and Tanaka (2004))


Results

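[Figure: bar chart of Baseline Recall, Baseline Precision, CTQ Recall and CTQ Precision (in %, axis from 0 to 100) for three settings: Dictionary, 1st Sense + Dict, WSD + Dict. Data labels in extraction order, assignment to bars not recoverable: 14, 50, 24, 46.1, 24.6, 53.6, 19, 56.2, 28, 54.1, 28.5, 62.1.]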


Results (contd.)

Measure taken to improve recall: use the genitive as the default construct when no translation is found for an NC.

Motivation: we conducted an experiment on development data to verify whether the NCs for which no translation was found during corpus search can legitimately be translated as a genitive construct.

We found that the heuristic works in 59% of cases.
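The fallback heuristic can be sketched as follows. The slides do not say how the genitive marker (kA, ke or kI) or the component translations are chosen for the default, so this sketch leaves both to the caller; the function name and the default marker are assumptions made for illustration.

```python
from typing import List


def translate_with_fallback(ranked: List[str], h1: str, h2: str,
                            genitive_marker: str = "kA") -> str:
    """Return the best corpus-attested candidate, else a default genitive construct."""
    if ranked:
        return ranked[0]
    # Fallback: H1 <genitive marker> H2. Picking kA/ke/kI correctly depends on
    # agreement with the head noun, which this sketch does not model.
    return f"{h1} {genitive_marker} {h2}"


print(translate_with_fallback(["cunAva ke samaya"], "cunAva", "samaya"))
print(translate_with_fallback([], "mArga", "surakRA", genitive_marker="kI"))
```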


Results

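[Figure: bar chart of Recall and Precision (in %, axis from 0 to 60) when the genitive is used as the default construct where the system fails to produce a translation. Data labels in extraction order, assignment to bars not recoverable: 24.8, 54, 44.5, 57.]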


Related Work

Similar approaches (searching for translation templates in a corpus) were adopted in:

- Bungum and Oepen (2009) for Norwegian-to-English nominal compound translation
- Tanaka and Baldwin (2004) for English-to-Japanese nominal compound translation and vice versa


Conclusion

Novelty of our approach:

- Using a WSD tool on the source language to select the correct sense of the nominal components
- The result: the number of possible translation candidates to be searched in the target-language corpus is significantly reduced


Future Work

- Multinary NC translation
- Using semantic features provided in the UW-Dictionary
- Varying α and β in the ranking technique to produce more effective results


Bibliography

- Tanaka, Takaaki and Timothy Baldwin. Translation by Machine of Complex Nominals: Getting it Right.
- Tanaka, Takaaki and Timothy Baldwin. Translation Selection for Japanese-English Noun-Noun Compounds.
- Rackow, Ido Dagan and Ulrike Schwall. Automatic Translation of Noun Compounds.
- Bungum and Oepen. Norwegian to English Nominal Compound Translation.
