R&D Lingua et Machina


Posted on 26-Jun-2015






Material from the 4th Intensive Summer School and Collaborative Workshop on Natural Language Processing (NAIST Franco-Thai Workshop 2010), Bangkok, Thailand. Institution: Institut de Recherche en Informatique de Toulouse (IRIT), Lingua et Machina


1. Franco-Thai Workshop 2010: Lingua et Machina, Research & Development

2. About me
   Estelle Delpech
   Research engineer at Lingua et Machina, France
      CAT tools provider
      ed(at)lingua-et-machina(dot)com
      www.lingua-et-machina.com
   Ph.D. candidate at LINA, France
      TALN team: specialises in NLP
      estelle.delpech(at)univ-nantes(dot)fr

3. LINGUA ET MACHINA
   French company
      Founded by Dr. E. Planas
      Led by Dr. F. De Colstoun
   Small but innovative
      8 people
      2 R&D engineers / Ph.D. candidates
      NLP, computational linguistics, translation studies

4. LINGUA ET MACHINA
   2002: SIMILIS
      2nd-generation translation memories
      Based on Ph.D. work
   2007: LIBELLEX
      Access to TM for non-professionals
      Translation and terminology management platform

5. They trust us

6. Partners

7. SIMILIS
   Computer-aided translation
      Freelance translators
      Translation agencies
   Translation memories
   Pre-translations
   Terminology extraction
   7 languages: FR, EN, IT, ES, PT, DE, NL
   Rule-based

8. Similis (screenshot: Part 1/1, TITLE 1)

9. SIMILIS technology
   Based on the Ph.D. work of E. Planas
   First-generation translation memory
      Works with segments, sentences
   Second-generation translation memory
      Works with chunks: [the driver] [steps] [on the gas pedal]
   Chunking
      Rules written by linguists
   Fuzzy matching
      Modified edit distance
      Several linguistic levels

10. From SIMILIS to LIBELLEX
   (diagram labels: Source text; French documents; English documents; Memory (TMX); Glossary (lexicon); Translated text; Moderator; Translators, linguists; Business experts)

11. LIBELLEX
   Translation memories meet corporate content management
   Target: global companies
      Many languages
      Customers, partners, employees
   Speakers
      Non-native
      Not language professionals
   Terminology and translation needs
      Official documentation
      Day-to-day internal communication

12. Libellex
   Terminology management platform
      Builds corporate TM
      Extracts / checks terminology
      Helps employees communicate
   Translation management platform
      Manages translation jobs
      Terminologies for translation agencies
      Chunk matches for MT
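The chunk-based fuzzy matching mentioned on slide 9 can be illustrated with a minimal sketch. The slides only say that Similis uses a "modified edit-distance" over chunks at several linguistic levels; they do not describe the modification. The code below is therefore an assumption: it applies plain Levenshtein distance with chunks (rather than characters) as the unit of comparison, and the function names `chunk_edit_distance` and `fuzzy_score` are hypothetical.

```python
def chunk_edit_distance(source, candidate):
    """Plain Levenshtein distance where the unit of comparison is a chunk.

    Assumption: this is NOT the modified edit distance used by Similis,
    which the slides do not specify.
    """
    rows, cols = len(source) + 1, len(candidate) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every chunk of the source prefix
    for j in range(cols):
        dist[0][j] = j  # insert every chunk of the candidate prefix
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == candidate[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1]


def fuzzy_score(source, candidate):
    """Similarity in [0, 1]: 1.0 means the two chunk sequences are identical."""
    longest = max(len(source), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - chunk_edit_distance(source, candidate) / longest


# Chunked segments, following the bracketing shown on slide 9.
new_segment = ["the driver", "steps", "on the gas pedal"]
memory_segment = ["the driver", "steps", "on the brake pedal"]

print(chunk_edit_distance(new_segment, memory_segment))
print(round(fuzzy_score(new_segment, memory_segment), 2))
```

Comparing chunks rather than whole sentences lets a memory entry that differs in a single chunk still count as a close match, which is the motivation the slide gives for second-generation translation memories.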
13. Libellex (screenshot: Part 1/1, TITLE 1)
   Look up a word, a term, an expression
   Manage terminology
   Have a document translated
   Check translations
   Check text
   Add new documents

14. R-D-I at Lingua et Machina
   Ongoing
      Statistical term extraction
         Cheap and quick addition of new languages
         Consider hybridization with rule-based methods
      Term alignment in comparable corpora
      Modeling the translation process
   Planned
      Development of rule-based chunking for Chinese
      Extraction of knowledge-rich contexts for terminologies

15. Research partnerships
   Statistical term extraction and alignment
      A. Lardilleux, Y. Lepage (Caen/Waseda)
   Chinese processing
      EDF, Kinep
   Comparable corpora
      National project + Ph.D. candidate
   KRC extraction
      European project submission
   Translation studies
      Ph.D. candidate: Stendhal University

16. Statistical term extraction and alignment
   Algorithm developed by A. Lardilleux in his Ph.D. thesis
      http://users.info.unicaen.fr/~alardill/
   Uses perfect alignments
      Source and target words that only occur in the same source and target sentences
      (figure: toy example aligning source words a-f with target words A-F)
   Randomly build small samples of corpus
   Perfect alignments add up

17. Chinese and other languages
   Chinese processing
      EDF uses Libellex
      Needs ZH-FR and ZH-EN translation
      Currently: statistical term alignment and extraction
      Planned: Chinese chunking rules
      Develop hybrid statistical/rule-based chunk alignment
   Other languages: Asian, Northern European, Eastern European

18. Metricc project
   Scope: national
   Bilingual terminology mining from comparable corpora
      CAT
      Translation memories
      CLIR
   Partners
      Syllabs, Sinequa, LM
      IMAG, Valoria
   http://www.metricc.com

19. Metricc: term alignment in comparable corpora
   Based on the distributional hypothesis
      Words that appear in similar contexts have similar meanings
   Represent the context of a word as a vector
      Word co-occurrents + normalized frequencies
   Translate the context vector with a seed lexicon
   Compute the distance between source and target vectors
      The closer, the better
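The notion of a perfect alignment from slide 16 (source and target words that occur in exactly the same sentence pairs) can be sketched as follows. This is a minimal illustration of that single test on one fixed toy corpus; it does not attempt to reproduce A. Lardilleux's full algorithm. The function names and the French/English sentences are invented for the example.

```python
from collections import defaultdict


def occurrence_profiles(sentences):
    """Map each word to the set of sentence indices in which it occurs."""
    profiles = defaultdict(set)
    for index, sentence in enumerate(sentences):
        for word in sentence.split():
            profiles[word].add(index)
    return profiles


def perfect_alignments(sentence_pairs):
    """Group source and target words that share exactly the same profile."""
    source_profiles = occurrence_profiles(src for src, _ in sentence_pairs)
    target_profiles = occurrence_profiles(tgt for _, tgt in sentence_pairs)

    groups = defaultdict(lambda: ([], []))
    for word, where in source_profiles.items():
        groups[frozenset(where)][0].append(word)
    for word, where in target_profiles.items():
        groups[frozenset(where)][1].append(word)

    # Keep only profiles attested on both sides of the corpus.
    return sorted(
        (sorted(src), sorted(tgt)) for src, tgt in groups.values() if src and tgt
    )


# Invented toy parallel corpus (not taken from the slides).
toy_corpus = [
    ("le chat dort", "the cat sleeps"),
    ("le chien dort", "the dog sleeps"),
    ("le chat mange", "the cat eats"),
]

alignments = perfect_alignments(toy_corpus)
for source_words, target_words in alignments:
    print(source_words, "<->", target_words)
```

On a large corpus few words share an identical profile, which is presumably why the slide mentions working on small random samples and accumulating the alignments found in each.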
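The four steps listed on slide 19 (build a context vector, normalize it, translate it with a seed lexicon, compare it with target vectors) can be sketched as below. Several details are assumptions not stated in the slides: the context is a fixed window of two words, frequencies are normalized by simple relative frequency, and closeness is measured with cosine similarity (higher means closer) rather than a distance. The corpora, seed lexicon and function names are invented for the example.

```python
import math
from collections import Counter


def context_vector(word, sentences, window=2):
    """Relative frequencies of the words found within `window` tokens of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == word:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def translate_vector(vector, seed_lexicon):
    """Project a source vector into the target language; unknown words are dropped."""
    translated = Counter()
    for word, weight in vector.items():
        if word in seed_lexicon:
            translated[seed_lexicon[word]] += weight
    return dict(translated)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(source_word, source_sents, target_sents, candidates, seed_lexicon):
    """Rank target candidates by similarity to the translated source vector."""
    source_vec = translate_vector(context_vector(source_word, source_sents), seed_lexicon)
    scored = [(cosine(source_vec, context_vector(c, target_sents)), c) for c in candidates]
    return sorted(scored, reverse=True)


# Invented comparable corpora: the texts are related but not translations.
french = ["le moteur de la voiture chauffe", "la voiture a un moteur puissant"]
english = [
    "the engine of the car overheats",
    "the car has a powerful engine",
    "the dog chases the cat in the garden",
]
seed = {"le": "the", "la": "the", "de": "of", "un": "a", "a": "has",
        "voiture": "car", "puissant": "powerful", "chauffe": "overheats"}

ranking = rank_candidates("moteur", french, english, ["engine", "dog"], seed)
for score, candidate in ranking:
    print(candidate, round(score, 3))
```

Because the seed lexicon only needs to cover common context words, the term being aligned ("moteur" here) does not have to be in it; the lexicon's coverage does, however, limit how much of the context survives translation.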
20. Knowledge-Rich Contexts Extraction
   Project under submission
   Scope: European
   Partners: Inbenta, BEO, Ljubljana University, LINA
   Knowledge-rich contexts
      Help understand the term
      Indicate how to use the term

21. Knowledge-Rich Contexts Extraction
   Examples of KRC
      Contains a definition
      Describes a relation between two terms
      Indicates a collocation
      Illustrates the term
   KRC linguistic description
      Examples, definitions in dictionaries
      Corpus study
   KRC automatic identification
      Morphosyntactic patterns
      Statistical clues

22. Modeling the translation process
   Research engineer / Ph.D. thesis
      Department of translation studies
      Université Stendhal, Grenoble
   How do we translate?
   What knowledge is helpful to translators?
   What is a good translation?
   Do non-professionals translate differently?
   How do you improve software usability?

23. More information
   Lingua et Machina
      www.lingua-et-machina.com/
      contact(a)lingua-et-machina.com
   Libellex
      http://libellex.fr/
   Download Similis
      http://similis.org/Download/SimilisFreelance-2.16.04-Setup.exe

24. Franco-Thai Workshop 2010
   Thank you
   ed(a)lingua-et-machina.com
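The pattern-based identification of knowledge-rich contexts mentioned on slide 21 can be illustrated with a rough sketch. The project's actual patterns are not given in the slides, and real morphosyntactic patterns would run over part-of-speech tags. As a simplifying assumption, the code below uses a few hypothetical surface patterns written as regular expressions for English; the pattern set, labels and example sentences are all invented.

```python
import re

# Hypothetical surface patterns; "{term}" is replaced by the escaped term.
KRC_PATTERNS = {
    "definition": r"\b{term}\b (is|are) (a|an|the) ",
    "relation": r"\b{term}\b (is|are) (a|an) (kind|type|part) of ",
    "synonymy": r"\b{term}\b,? (also called|also known as) ",
}


def find_krc(term, sentences):
    """Return (label, sentence) pairs for sentences matching a KRC pattern."""
    hits = []
    for sentence in sentences:
        for label, template in KRC_PATTERNS.items():
            pattern = template.format(term=re.escape(term))
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                hits.append((label, sentence))
    return hits


# Invented example sentences.
sentences = [
    "A translation memory is a database that stores previously translated segments.",
    "The weather is nice today.",
    "A translation memory, also called a TM, speeds up translation.",
]

hits = find_krc("translation memory", sentences)
for label, sentence in hits:
    print(label, "->", sentence)
```

A sentence can match more than one pattern, and surface patterns of this kind over-generate; the statistical clues listed on the slide would be one way to filter the candidates.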