1
Lingua et Machina
Research & Development
Franco-Thai Workshop 2010
2
About me
● Estelle Delpech
● Research engineer at Lingua et Machina, France
  ● CAT tools provider
  ● ed(at)lingua-et-machina(dot)com
  ● www.lingua-et-machina.com
● Ph.D. candidate at LINA, France
  ● TALN team: specialises in NLP
  ● estelle.delpech(at)univ-nantes(dot)fr
3
LINGUA ET MACHINA
● French company
  ● Founded by Dr. E. Planas
  ● Led by Dr. F. De Colstoun
● Small but innovative
  ● 8 people
  ● 2 R&D engineers / Ph.D. candidates
● NLP
● Computational Linguistics
● Translation Studies
4
LINGUA ET MACHINA
● 2002
  ● SIMILIS
  ● 2nd generation translation memories
  ● Based on Ph.D. work
● 2007
  ● LIBELLEX
  ● Access to TM for non-professionals
  ● Translation and terminology management platform
5
They trust us
6
Partners
7
SIMILIS
● Computer-aided translation
  ● Freelance translators
  ● Translation agencies
● Translation memories
  ● Pre-translations
● Terminology extraction
● 7 languages: FR, EN, IT, ES, PT, DE, NL
  → rule-based
8
Similis
9
SIMILIS technology
Based on the Ph.D. work of E. Planas
● First generation translation memory
  ● Works with segments, sentences
● Second generation translation memory
  ● Works with chunks
  ● [the driver] [steps] [on the gas pedal]
● Chunking
  ● Rules written by linguists
● Fuzzy matching
  ● Modified edit distance
  ● Several linguistic levels
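The idea of fuzzy matching over chunks can be sketched in a few lines. The slides do not say how the SIMILIS edit distance is modified or how the linguistic levels are combined, so the sketch below assumes a plain Levenshtein distance computed on chunk sequences instead of characters; the function names and the second example sentence are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance where the unit of comparison is a chunk."""
    prev = list(range(len(b) + 1))
    for i, chunk_a in enumerate(a, start=1):
        curr = [i]
        for j, chunk_b in enumerate(b, start=1):
            cost = 0 if chunk_a == chunk_b else 1
            curr.append(min(prev[j] + 1,          # delete a chunk
                            curr[j - 1] + 1,      # insert a chunk
                            prev[j - 1] + cost))  # substitute a chunk
        prev = curr
    return prev[-1]


def fuzzy_score(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]; 1.0 means the chunk sequences are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Chunked sentence from the slide, and a hypothetical new sentence to translate.
stored = ["the driver", "steps", "on the gas pedal"]
query = ["the driver", "steps", "on the brake pedal"]

print(edit_distance(stored, query))
print(round(fuzzy_score(stored, query), 2))
```

Working at chunk level means two of the three stored chunks can be reused directly even though the full sentences differ.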
10
From SIMILIS to LIBELLEX
[Diagram. Labels: Source Text, Translated Text, Moderator (twice), Translators, Linguists, Business Experts, French Documents, English Documents, Glossary (lexicon), Memory (TMX)]
11
LIBELLEX
● Translation memories meet corporate content management
● Target: global companies
● Many languages
  ● Customers
  ● Partners
  ● Employees
● Speakers
  ● Non-native
  ● Not language professionals
● Terminology and translation needs
  ● Official documentation
  ● Day-to-day internal communication
12
Libellex
● Terminology management platform
  ● Builds corporate TM
  ● Extracts / checks terminology
  ● Helps employees communicate
● Translation management platform
  ● Manages translation jobs
  ● Terminologies for translation agencies
  ● Chunk matches for MT
13
Libellex
● Look up a word, a term, an expression
● Manage terminology
● Have a document translated
● Check translations
● Check text
● Add new documents
14
R-D-I at Lingua et Machina
Ongoing
● Statistical term extraction
  ● "Cheap and quick" addition of new languages
  ● Consider hybridization with rule-based methods
● Term alignment in comparable corpora
● Model the translation process
Planned
● Development of rule-based chunking for Chinese
● Extraction of "knowledge-rich contexts" for terminologies
15
Research partnerships
● Statistical term extraction and alignment
  ● A. Lardilleux, Y. Lepage (Caen/Waseda)
● Chinese processing
  ● EDF, Kinep
● Comparable corpora
  ● National project + Ph.D. candidate
● KRC extraction
  ● European project submission
● Translation studies
  ● Ph.D. candidate: Stendhal University
16
Statistical term extraction and alignment
● Algorithm developed in A. Lardilleux's Ph.D. thesis
  ● http://users.info.unicaen.fr/~alardill/
● Uses "perfect alignments"
  ● Source and target words that only occur in the same source and target sentences
adf ↔ ADb ↔ BE b ↔ CF
ad ↔ e ADE
● Randomly builds small samples of the corpus
  ● Perfect alignments add up
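The sampling idea can be illustrated with a simplified sketch. This is not A. Lardilleux's implementation: it only assumes that a "perfect alignment" groups the source and target words occurring in exactly the same sentence pairs of a sample, and that counts accumulate over random samples. The toy corpus, function names, and parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict


def perfect_alignments(pairs):
    """Group source and target words that occur in exactly the same sentence pairs."""
    where = defaultdict(set)  # (side, word) -> indices of the pairs containing it
    for idx, (src, tgt) in enumerate(pairs):
        for word in set(src):
            where[("src", word)].add(idx)
        for word in set(tgt):
            where[("tgt", word)].add(idx)

    groups = defaultdict(lambda: ([], []))  # distribution -> (source words, target words)
    for (side, word), idxs in where.items():
        groups[frozenset(idxs)][0 if side == "src" else 1].append(word)

    # Keep only distributions shared by at least one word on each side.
    return [(tuple(sorted(s)), tuple(sorted(t))) for s, t in groups.values() if s and t]


def sample_alignments(corpus, n_samples=200, sample_size=2, seed=0):
    """Draw small random samples and let the perfect alignments add up."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        sample = rng.sample(corpus, min(sample_size, len(corpus)))
        counts.update(perfect_alignments(sample))
    return counts


# Hypothetical toy English-French corpus of tokenised sentence pairs.
corpus = [
    (["one", "coffee"], ["un", "café"]),
    (["one", "tea"], ["un", "thé"]),
    (["hot", "coffee"], ["café", "chaud"]),
    (["hot", "tea"], ["thé", "chaud"]),
]

for alignment, count in sample_alignments(corpus).most_common(5):
    print(count, alignment)
```

Small samples matter in this sketch because a word pair that is ambiguous in the whole corpus may share a distribution in a subset, so frequent and rare words both get a chance to align.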
17
Chinese and other languages
● Chinese processing
  ● EDF uses Libellex
  ● Needs ZH ↔ FR, ZH ↔ EN translation
● Currently:
  ● Statistical term alignment and extraction
● Planned:
  ● Chinese chunking rules
  ● Develop hybrid statistical/rule-based chunk alignment
● Other languages:
  ● Asian
  ● Northern European
  ● Eastern European
18
Metricc project
● Scope: national
● Bilingual terminology mining from comparable corpora
  ● CAT
  ● Translation memories
  ● CLIR
● Partners
  ● Syllabs, Sinéqua, LM
  ● IMAG, Valoria
http://www.metricc.com
19
Metricc: term alignment in comparable corpora
● Based on the distributional hypothesis
  ● Words that appear in similar contexts have similar meanings
● Represent the context of a word as a vector:
  ● Word co-occurrents + normalized frequencies
● Translate the context vector with a seed lexicon
● Compute the distance between source and target vectors
  ● The closer, the better
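The steps above can be sketched as follows. This is a minimal illustration, not the Metricc implementation: the window size, the use of cosine similarity as the closeness measure, and the toy sentences and seed lexicon are all assumptions.

```python
import math
from collections import Counter


def context_vector(tokens, word, window=2):
    """Co-occurrents of `word` within `window` tokens, with normalized frequencies."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def translate_vector(vec, seed_lexicon):
    """Map each context word to the target language; drop words missing from the lexicon."""
    out = Counter()
    for w, weight in vec.items():
        if w in seed_lexicon:
            out[seed_lexicon[w]] += weight
    return dict(out)


def cosine(u, v):
    """Cosine similarity between two sparse vectors (higher means closer)."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(src_vec, seed_lexicon, target_vectors):
    """Rank target words by closeness to the translated source context vector."""
    translated = translate_vector(src_vec, seed_lexicon)
    return sorted(((cosine(translated, vec), w) for w, vec in target_vectors.items()),
                  reverse=True)


# Hypothetical comparable "corpora" (one sentence each) and seed lexicon.
fr = "le médecin soigne le patient".split()
en = "the doctor treats the patient in the hospital".split()
seed = {"le": "the", "soigne": "treats", "patient": "patient"}

targets = {w: context_vector(en, w) for w in ("doctor", "hospital")}
ranked = rank_candidates(context_vector(fr, "médecin"), seed, targets)
print(ranked[0][1])
```

Only context words covered by the seed lexicon survive translation, so in this sketch the quality of the ranking depends directly on lexicon coverage.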
20
Knowledge-Rich Contexts Extraction
● Project under submission
● Scope: European
● Partners:
  ● Inbenta, BEO
  ● Ljubljana University, LINA
● Knowledge-rich contexts
  ● Help understand the term
  ● Indicate how to use the term
21
Knowledge-Rich Contexts Extraction
● Examples of KRC:
  ● Contains a definition
  ● Describes a relation between two terms
  ● Indicates a collocation
  ● Illustrates the term
● KRC linguistic description
  ● Examples, definitions in dictionaries
  ● Corpus study
● KRC automatic identification
  ● Morpho-syntactic patterns
  ● Statistical clues
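As a rough illustration of pattern-based KRC identification, the sketch below flags sentences that contain a term together with a definitional or relational marker. It is an assumption-laden simplification: real morpho-syntactic patterns would rely on part-of-speech tags, whereas these are plain surface regular expressions, and the pattern inventory, function name, and example sentences are hypothetical.

```python
import re

# Hypothetical surface patterns standing in for morpho-syntactic ones.
PATTERNS = {
    "definition": re.compile(
        r"\b(?:is|are) (?:a|an|the)\b|\bis defined as\b|\brefers to\b", re.I),
    "relation": re.compile(
        r"\b(?:is a kind of|is a type of|is part of|consists of|is used (?:for|to))\b", re.I),
}


def find_krc(term, sentences):
    """Return (sentence, labels) for sentences mentioning `term` and matching a pattern."""
    hits = []
    for sentence in sentences:
        if term.lower() not in sentence.lower():
            continue
        labels = [name for name, pattern in PATTERNS.items() if pattern.search(sentence)]
        if labels:
            hits.append((sentence, labels))
    return hits


sentences = [
    "A translation memory is a database that stores previously translated segments.",
    "A translation memory consists of aligned source and target segments.",
    "We bought a translation memory last year.",
]

for sentence, labels in find_krc("translation memory", sentences):
    print(labels, sentence)
```

Statistical clues, the second item on the slide, are not covered here; they would typically be used to rank or filter the candidates that patterns like these return.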
22
Modelling the translation process
● Research engineer / Ph.D. thesis
  ● Department of Translation Studies
  ● Université Stendhal, Grenoble
● How do we translate?
● What knowledge is helpful to translators?
● What is a good translation?
● Do non-professionals translate differently?
● How do you improve software usability?
23
More information
● Lingua et Machina
  ● www.lingua-et-machina.com/
  ● contact(a)lingua-et-machina.com
● Libellex
  ● http://libellex.fr/
● Download Similis
  ● http://similis.org/Download/SimilisFreelance-2.16.04-Setup.exe
24
Thank you
ed(a)lingua-et-machina.com