![Page 1: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/1.jpg)
Improving Translation Selection using Conceptual Vectors
LIM Lian Tze
Computer Aided Translation Unit
School of Computer Sciences
Universiti Sains Malaysia
![Page 2: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/2.jpg)
Presentation Overview
- Problem Background & Motivation
- Research Objectives
- Methodology
- Advantages & Contributions
![Page 4: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/4.jpg)
Natural Language is Ambiguous
[Figure: the word "bank" shown twice, each marked "??"]
![Page 5: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/5.jpg)
Word Sense Disambiguation
Given:
- a list of meanings/senses of words (dictionaries)
- input text containing occurrences of ambiguous words
Assign the correct sense to each instance of an ambiguous word in context
A.k.a. “sense-tagging”
…
bank#1: a financial institution that accepts deposits and channels the money into lending activities
bank#2: sloping land (especially the slope beside a body of water)
…
Input: …withdraw money from the bank...
Sense tag: bank#1
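As a rough illustration of the sense-tagging task, the sketch below picks the sense whose dictionary definition shares the most content words with the input. This is a simplified gloss-overlap (Lesk-style) heuristic chosen only for illustration; it is not the method proposed in this presentation. The names `SENSES`, `STOPWORDS`, `content_words` and `sense_tag` are assumptions made for the example.

```python
import re

# Toy sense inventory, copied from the dictionary excerpt on this slide.
SENSES = {
    "bank#1": "a financial institution that accepts deposits and "
              "channels the money into lending activities",
    "bank#2": "sloping land (especially the slope beside a body of water)",
}

# Small hand-picked stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "the", "of", "and", "that", "into", "from", "to", "in"}


def content_words(text):
    """Lower-case the text and return its non-stopword tokens as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def sense_tag(context, senses=SENSES):
    """Return the sense whose definition shares the most content words with the context."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))


print(sense_tag("withdraw money from the bank"))
```

Here the word "money" appears both in the input and in the definition of bank#1, so that sense wins. Real definitions rarely overlap this conveniently, which is one reason richer semantic models are used.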
![Page 6: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/6.jpg)
Disambiguation in Machine Translation (1)
…
bank#1: a financial institution that accepts deposits and channels the money into lending activities (Malay translation: bank)
bank#2: sloping land (especially the slope beside a body of water) (Malay translation: tebing)
…
English input: …withdraw money from the bank...
Sense-tag (WSD): …withdraw money from the bank#1...
Select translation word (Malay output): …mengeluarkan wang dari bank...
That worked well…
![Page 7: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/7.jpg)
Disambiguation in Machine Translation (2)
…
circulation#6: the spread or transmission of something (as news or money) to a wider group or area
…
Malay translations: edaran (money), penyebaran (news)
English input: …50 ringgit notes in circulation...
Sense-tag (WSD): …50 ringgit notes in circulation#6...
Translate (Malay output): …duit kertas 50 ringgit dalam edaran?? penyebaran?...
That DIDN’T work well…
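The contrast between the two examples can be sketched as a lookup from sense tag to translation candidates. The table contents come from the two slides above; the names `TRANSLATIONS` and `translate_sense` are illustrative assumptions, not part of any existing system.

```python
# Sense tag -> Malay translation candidates, taken from the two slides above.
TRANSLATIONS = {
    "bank#1": ["bank"],
    "bank#2": ["tebing"],
    "circulation#6": ["edaran", "penyebaran"],
}


def translate_sense(sense):
    """Return the translation if the sense tag decides it uniquely, else None."""
    candidates = TRANSLATIONS[sense]
    return candidates[0] if len(candidates) == 1 else None


print(translate_sense("bank#1"))         # the sense tag is enough
print(translate_sense("circulation#6"))  # the sense tag leaves two candidates
```

For circulation#6 the correct sense tag still leaves two candidates, so a further selection step is needed after WSD.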
![Page 8: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/8.jpg)
Optimising WSD for MT
[Diagram: Input word → (select) → Sense number → (select) → Translation word, contrasted with selecting the Translation word directly from the Input word]
(Lee and Kim 2002)
![Page 10: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/10.jpg)
Main Objective
Existing MT system:
- Selects fragments (translation units) from previously translated examples
- Re-combines selected translation units to produce translation output for new input text
Improve the translation quality of this MT system by adapting a WSD algorithm specifically for MT purposes.
![Page 11: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/11.jpg)
Need semantic knowledge about…
- Word senses
  - Use dictionary definitions
- Pairs of translation words
  - From bilingual knowledge bank (BKB) made up of pairs of sentences that are translations of each other
  - Corresponding words in each translation sentence pair are explicitly marked
Need a model to capture semantic knowledge of lexical items
- Conceptual Vectors (Lafourcade 2001)
  - Using a selection of concepts or themes
  - Construct mathematical vectors from concepts
  - Thematic similarity between lexical items ≡ angle between CVs
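The last point, thematic similarity as the angle between conceptual vectors, can be sketched in a few lines. The three-concept inventory and the vector values below are made up for illustration; a real concept set would be far larger, and `angular_distance` is an assumed name.

```python
import math


def angular_distance(u, v):
    """Angle in radians between two conceptual vectors (0 means identical themes)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Hypothetical concept inventory: [FINANCE, GEOGRAPHY, WATER]
bank_1 = [1.0, 0.0, 0.0]   # financial institution
bank_2 = [0.0, 1.0, 1.0]   # sloping land beside water
money = [1.0, 0.0, 0.0]

print(angular_distance(money, bank_1))  # small angle: thematically close
print(angular_distance(money, bank_2))  # large angle: thematically distant
```

A smaller angle means the two lexical items activate more of the same concepts.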
![Page 12: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/12.jpg)
Need to:
- Compile CVs for word meanings on 2 levels:
  - Word sense (from dictionary)
  - Word/phrase translation unit (from BKB), using data compiled from previous step
- Use compiled information during translation runtime to select correct translation units
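One way to picture the two levels is the minimal sketch below. It assumes that a translation unit's CV is simply the sum of the sense CVs of words seen with it in BKB examples, and that run-time selection picks the unit at the smallest angle from the input's CV. The concept inventory, all vector values, the BKB contexts and every name (`SENSE_CV`, `PROFILE`, `select_translation`, ...) are hypothetical; the actual compilation procedure may differ.

```python
import math

# Hypothetical concept inventory: [FINANCE, PUBLISHING, SPEECH].
# Level 1: one CV per word sense (in practice compiled from dictionary definitions).
SENSE_CV = {
    "note": [1.0, 0.0, 0.0],
    "ringgit": [1.0, 0.0, 0.0],
    "magazine": [0.0, 1.0, 0.0],
    "rumour": [0.0, 0.0, 1.0],
    "news": [0.0, 0.5, 1.0],
}
ZERO = [0.0, 0.0, 0.0]


def context_cv(words):
    """Sum the CVs of the known words; unknown words contribute nothing."""
    total = list(ZERO)
    for word in words:
        total = [a + b for a, b in zip(total, SENSE_CV.get(word, ZERO))]
    return total


def angle(u, v):
    """Angle between two CVs; a zero vector is treated as unrelated (pi/2)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norms == 0:
        return math.pi / 2
    return math.acos(max(-1.0, min(1.0, dot / norms)))


# Level 2: a profile CV per translation unit, built from the words that
# accompany it in (made-up) BKB examples.
BKB_CONTEXTS = {
    "edaran": [["ringgit", "note"], ["magazine"]],
    "penyebaran": [["rumour"], ["news"]],
}
PROFILE = {
    unit: context_cv([w for example in examples for w in example])
    for unit, examples in BKB_CONTEXTS.items()
}


def select_translation(input_words):
    """Run-time step: choose the translation unit thematically closest to the input."""
    cv = context_cv(input_words)
    return min(PROFILE, key=lambda unit: angle(cv, PROFILE[unit]))


print(select_translation(["stop", "circulation", "magazine"]))
print(select_translation(["stop", "circulation", "rumour"]))
```

Note that no sense number is assigned at run time in this sketch: the input's concepts are compared directly with the translation unit profiles.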
![Page 14: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/14.jpg)
Brief Outline
[Diagram]
Data Preparation Phase:
- Dictionary / Lexicon → word senses, tagged with concept category labels (word → sense number level knowledge)
- BKB examples → translation units, tagged → Translation Unit Profile (word → translation level knowledge)
EBMT Run-time Phase:
- Input text → “clues” → matching, comparison, selection → selected translation units → translated text
![Page 15: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/15.jpg)
During Translation
[Diagram repeated from the Brief Outline slide: Data Preparation Phase and EBMT Run-time Phase]
![Page 16: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/16.jpg)
Some Results
Translating ‘circulation’ to Malay: edaran or penyebaran
- TS: proposed translation selection using CVs
- BS: baseline strategy, chooses the translation that co-occurs with the same input words (and same structure) as in the BKB, or the most frequently occurring translation

| Input | Translation chosen by TS | Translation chosen by BS |
|---|---|---|
| We will stop the circulation of that magazine. | edaran | penyebaran |
| We will stop the circulation of that rumour. | penyebaran | penyebaran |
| We will stop the circulation of that newspaper. | edaran | penyebaran |
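The baseline strategy can be read roughly as follows. This is a loose sketch: it ignores the "same structure" condition, the BKB entries are made up for the example (they are not the data behind the table), and `BKB` and `baseline_select` are assumed names.

```python
from collections import Counter

# Made-up BKB entries: (content words seen with 'circulation', Malay translation used).
BKB = [
    ({"news"}, "penyebaran"),
    ({"information"}, "penyebaran"),
    ({"money"}, "edaran"),
]


def baseline_select(input_words, bkb=BKB):
    """Prefer translations whose BKB example shares a word with the input;
    otherwise fall back to the most frequent translation overall."""
    words = set(input_words)
    matches = [translation for context, translation in bkb if context & words]
    pool = matches or [translation for _, translation in bkb]
    return Counter(pool).most_common(1)[0][0]


print(baseline_select(["stop", "circulation", "money"]))
print(baseline_select(["stop", "circulation", "magazine"]))
```

When the input contains no word seen in the BKB, such a baseline can only return the most frequent translation, whatever the input is about.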
![Page 18: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/18.jpg)
Advantages and Weaknesses
Pros:
- optimized for EBMT
- focuses on translation selection, bypassing intermediate WSD at run time
- handles many-to-many mapping between source word senses and translation words
- allows for bi-directional translation with sense-tagging for 1 language
- mathematical operations on vectors are easy to implement
- avoids combinatorial effect when there are multiple ambiguous words in the input
Cons:
- not all ambiguities can be solved using co-occurring concepts
- does not handle translation selection of function words
- manual work required in data preparation
![Page 19: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/19.jpg)
Research Contributions
Adaptation of a WSD approach for the specific aim of translation selection
Proposal of specific guidelines for assigning related concepts for word meanings from dictionaries
Production of knowledge about word meanings on two levels:
- Word senses as in dictionaries
- Translations as in parallel text
![Page 20: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/20.jpg)
Summary
- WSD can be customized for different NLP applications
  - Different requirements
  - Increase efficiency
- WSD and related tasks based on concepts common to co-occurring word senses can be facilitated using the conceptual vector model
  - Requires a concept category hierarchy and word sense list
  - Concepts related to a word sense modelled as a mathematical vector
  - Conceptual similarity = angular distance between vectors
- Future work
  - Automating data preparation tasks
  - Investigating suitable weights or normalizing factors during CV manipulation
  - Integration with other WSD or translation selection strategies
![Page 21: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/21.jpg)
Future Work
Automate tagging tasks that are currently done manually
Investigate different weight values for CVs for different syntactic relations or word classes
Integrate with other WSD/translation selection tasks
![Page 22: Improving Translation Selection using Conceptual Vectors LIM Lian Tze Computer Aided Translation Unit School of Computer Sciences Universiti Sains Malaysia](https://reader035.vdocuments.site/reader035/viewer/2022062423/5697bf891a28abf838c8a240/html5/thumbnails/22.jpg)
Thank You