Transcript
Page 1: R&D Lingua et Machina


Lingua et Machina

Research & Development

Franco-Thai Workshop 2010

Page 2: R&D Lingua et Machina


About me

● Estelle Delpech
● Research engineer at Lingua et Machina, France
  ● CAT tools provider
  ● ed(at)lingua-et-machina(dot)com
  ● www.lingua-et-machina.com
● Ph.D. candidate at LINA, France
  ● TALN team: specialises in NLP
  ● estelle.delpech(at)univ-nantes(dot)fr

Page 3: R&D Lingua et Machina


LINGUA ET MACHINA

● French company
  ● Founded by Dr. E. Planas
  ● Led by Dr. F. De Colstoun
● Small but innovative
  ● 8 people
  ● 2 R&D engineers / Ph.D. candidates
● NLP
● Computational Linguistics
● Translation Studies

Page 4: R&D Lingua et Machina


LINGUA ET MACHINA

● 2002
  ● SIMILIS
  ● 2nd generation translation memories
  ● Based on Ph.D. work
● 2007
  ● LIBELLEX
  ● Access to TM for non-professionals
  ● Translation and terminology management platform

Page 5: R&D Lingua et Machina


They trust us

Page 6: R&D Lingua et Machina


Partners

Page 7: R&D Lingua et Machina


SIMILIS

● Computer-aided translation
  ● Freelance translators
  ● Translation agencies
● Translation memories
  ● Pre-translations
● Terminology extraction
● 7 languages: FR, EN, IT, ES, PT, DE, NL
  → rule-based

Page 8: R&D Lingua et Machina


Similis


Page 9: R&D Lingua et Machina


SIMILIS technology

Based on the Ph.D. work of E. Planas
● First generation translation memory
  ● Works with segments, sentences
● Second generation translation memory
  ● Works with chunks
  ● [the driver] [steps] [on the gas pedal]
● Chunking
  ● Rules written by linguists
● Fuzzy matching
  ● Modified edit-distance
  ● Several linguistic levels
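The idea of fuzzy matching over chunks can be sketched as follows. This is only an illustration: it uses a plain Levenshtein distance over chunk sequences, not the modified edit-distance used in SIMILIS (which the slides do not detail), and it compares surface forms only rather than several linguistic levels. The function names and the toy memory are hypothetical.

```python
def chunk_edit_distance(a, b):
    """Plain Levenshtein distance between two sequences of chunks."""
    prev = list(range(len(b) + 1))
    for i, chunk_a in enumerate(a, 1):
        curr = [i]
        for j, chunk_b in enumerate(b, 1):
            cost = 0 if chunk_a == chunk_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_score(a, b):
    """Similarity in [0, 1]: 1.0 means identical chunk sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - chunk_edit_distance(a, b) / longest


def best_match(query, memory):
    """Return the (source_chunks, target_text) entry closest to the query."""
    return max(memory, key=lambda entry: fuzzy_score(query, entry[0]))


# Hypothetical toy translation memory: chunked source, stored translation.
MEMORY = [
    (["the driver", "steps", "on the gas pedal"],
     "le conducteur appuie sur le champignon"),
    (["the passenger", "opens", "the door"],
     "le passager ouvre la porte"),
]

QUERY = ["the driver", "steps", "on the brake pedal"]
```

With this toy memory, `best_match(QUERY, MEMORY)` returns the first entry, since two of its three chunks match the query exactly; a sentence-level memory would treat the whole sentence as a single mismatch.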

Page 10: R&D Lingua et Machina


From SIMILIS to LIBELLEX

[Diagram labels: Source Text, Translated Text, Moderator (x2), Translators, Linguists, Business Experts, French Documents, English Documents, Glossary, Memory (TMX), (lexicon)]

Page 11: R&D Lingua et Machina


LIBELLEX

● Translation memories meet corporate content management
● Target: global companies
  ● Many languages
    ● Customers
    ● Partners
    ● Employees
  ● Speakers
    ● Non-native
    ● Not language professionals
  ● Terminology and translation needs
    ● Official documentation
    ● Day-to-day internal communication

Page 12: R&D Lingua et Machina


Libellex

● Terminology management platform
  ● Builds corporate TM
  ● Extracts / checks terminology
  ● Helps employees communicate
● Translation management platform
  ● Manages translation jobs
  ● Terminologies for translation agencies
  ● Chunk matches for MT

Page 13: R&D Lingua et Machina


Libellex


● Look up a word, a term, an expression
● Manage terminology
● Have a document translated
● Check translations
● Check text
● Add new documents

Page 14: R&D Lingua et Machina


R-D-I at Lingua et Machina

Ongoing
● Statistical term extraction
  ● « Cheap and quick » addition of new languages
  ● Consider hybridization with rule-based methods
● Term alignment in comparable corpora
● Model the translation process

Planned
● Development of rule-based chunking for Chinese
● Extraction of « knowledge-rich contexts » for terminologies

Page 15: R&D Lingua et Machina


Research partnerships

● Statistical term extraction and alignment
  ● A. Lardilleux, Y. Lepage (Caen/Waseda)
● Chinese processing
  ● EDF, Kinep
● Comparable corpora
  ● National project + Ph.D. candidate
● KRC extraction
  ● European project submission
● Translation studies
  ● Ph.D. candidate: Stendhal University

Page 16: R&D Lingua et Machina


Statistical term extraction and alignment

● Algorithm developed by A. Lardilleux in a Ph.D. thesis
  ● http://users.info.unicaen.fr/~alardill/
● Uses "perfect alignments"
  ● Source and target words that only occur in the same source and target sentences
  ● [Example figure, garbled in extraction: adf ↔ ADb ↔ BE b ↔ CF / ad ↔ e ADE]
● Randomly builds small samples of the corpus
  ● Perfect alignments add up
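A minimal sketch of the two ideas on this slide, perfect alignments and random sampling, is given below. It is a simplified reading of the bullet points, not A. Lardilleux's actual implementation: the function names, the toy corpus, the sample size and the number of samples are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def perfect_alignments(pairs):
    """Group source and target words that occur in exactly the same sentence pairs."""
    occurrences = defaultdict(set)
    for idx, (src, tgt) in enumerate(pairs):
        for word in src.split():
            occurrences[("src", word)].add(idx)
        for word in tgt.split():
            occurrences[("tgt", word)].add(idx)

    # Words sharing the same set of sentence indices land in the same group.
    groups = defaultdict(lambda: {"src": set(), "tgt": set()})
    for (side, word), idxs in occurrences.items():
        groups[frozenset(idxs)][side].add(word)

    # Keep only groups that contain words from both languages.
    return [
        (tuple(sorted(g["src"])), tuple(sorted(g["tgt"])))
        for g in groups.values()
        if g["src"] and g["tgt"]
    ]


def sample_and_count(pairs, n_samples=200, sample_size=3, seed=0):
    """Draw small random samples and accumulate the perfect alignments found."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        sample = rng.sample(pairs, min(sample_size, len(pairs)))
        counts.update(perfect_alignments(sample))
    return counts


# Hypothetical toy parallel corpus (English / French).
TOY_CORPUS = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
    ("a dog eats", "un chien mange"),
]
```

On the full toy corpus, `perfect_alignments(TOY_CORPUS)` pairs `cat` with `chat` because both occur in sentence pairs 0 and 2 and nowhere else. Small samples make such coincidences more frequent, and counting them over many samples is what lets the alignments "add up".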

Page 17: R&D Lingua et Machina


Chinese and other languages

● Chinese processing
  ● EDF uses Libellex
  ● Needs ZH↔FR, ZH↔EN translation
● Currently:
  ● Statistical term alignment and extraction
● Planned:
  ● Chinese chunking rules
  ● Develop hybrid statistical/rule-based chunk alignment
● Other languages:
  ● Asian
  ● Northern European
  ● Eastern European

Page 18: R&D Lingua et Machina


Metricc project

● Scope: national
● Bilingual terminology mining from comparable corpora
  ● CAT
  ● Translation memories
  ● CLIR
● Partners
  ● Syllabs, Sinéqua, LM
  ● IMAG, Valoria

http://www.metricc.com

Page 19: R&D Lingua et Machina


Metricc : term alignment in comparable corpora

● Based on the distributional analysis hypothesis
  ● Words that appear in similar contexts have similar meaning
● Represent the context of a word in a vector:
  ● Word co-occurrents + normalized frequencies
● Translate the context vector with a seed lexicon
● Compute the distance between source and target vectors
  ● The closer, the better
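The four steps above can be sketched as follows. The slide does not say which context window, normalization or distance the Metricc project uses, so the two-word window, relative frequencies and cosine similarity below are assumptions, as are the function names and the toy data.

```python
import math
from collections import Counter


def context_vector(word, sentences, window=2):
    """Relative frequencies of the words found within `window` tokens of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def translate_vector(vector, seed_lexicon):
    """Map a source context vector into the target language; unknown words are dropped."""
    translated = Counter()
    for word, weight in vector.items():
        if word in seed_lexicon:
            translated[seed_lexicon[word]] += weight
    return dict(translated)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(src_word, src_sents, tgt_sents, candidates, seed_lexicon):
    """Rank target candidates by similarity to the translated source context."""
    src_vec = translate_vector(context_vector(src_word, src_sents), seed_lexicon)
    scored = [(cosine(src_vec, context_vector(c, tgt_sents)), c) for c in candidates]
    return sorted(scored, reverse=True)


# Hypothetical comparable corpora and seed lexicon (French to English).
FR = ["le docteur soigne le patient", "le docteur examine le patient malade"]
EN = ["the doctor treats the patient",
      "the doctor examines the sick patient",
      "the driver steps on the gas pedal"]
SEED = {"le": "the", "soigne": "treats", "examine": "examines",
        "patient": "patient", "malade": "sick"}

ranked = rank_candidates("docteur", FR, EN, ["doctor", "driver"], SEED)
```

Here `ranked` places `doctor` ahead of `driver`, because the translated contexts of `docteur` overlap almost entirely with those of `doctor`. Cosine is a similarity, so "the closer, the better" becomes "the higher, the better".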

Page 20: R&D Lingua et Machina


Knowledge-Rich Contexts Extraction

● Project under submission
  ● Scope: European
  ● Partners:
    ● Inbenta, BEO
    ● Ljubljana University, LINA
● Knowledge-rich contexts
  ● Help understand the term
  ● Indicate how to use the term

Page 21: R&D Lingua et Machina


Knowledge-Rich Contexts Extraction

● Examples of KRC:
  ● Contains a definition
  ● Describes a relation between two terms
  ● Indicates a collocation
  ● Illustrates the term
● KRC linguistic description
  ● Examples, definitions in dictionaries
  ● Corpus study
● KRC automatic identification
  ● Morpho-syntactic patterns
  ● Statistical clues
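Pattern-based identification of KRCs might look like the sketch below. It is purely illustrative: the project was still under submission, so no actual patterns are known, and the ones here are hypothetical English surface patterns written as regular expressions, whereas real morpho-syntactic patterns would rely on part-of-speech tags. Statistical clues are not covered.

```python
import re


def krc_patterns(term):
    """Build a few hypothetical lexical patterns around a given term."""
    t = re.escape(term)
    return {
        # "<term> is a <noun>": candidate definition
        "definition": re.compile(
            rf"\b{t}\b\s+(?:is|are)\s+(?:a|an|the)\s+\w+", re.IGNORECASE),
        # "<term> consists of ...": candidate relation between two terms
        "relation": re.compile(
            rf"\b{t}\b\s+(?:is a kind of|is part of|consists of)\b", re.IGNORECASE),
        # "<term>, such as ...": candidate illustration
        "illustration": re.compile(
            rf"\b{t}\b\s*,?\s+(?:such as|for example)\b", re.IGNORECASE),
    }


def find_krc(term, sentences):
    """Return (sentence, labels) for every sentence matching at least one pattern."""
    patterns = krc_patterns(term)
    hits = []
    for sentence in sentences:
        labels = [name for name, pattern in patterns.items() if pattern.search(sentence)]
        if labels:
            hits.append((sentence, labels))
    return hits


SENTENCES = [
    "A translation memory is a database of aligned segments.",
    "We bought a translation memory last year.",
    "A translation memory consists of source and target segments.",
]

hits = find_krc("translation memory", SENTENCES)
```

Of the three sample sentences, the first is flagged as a definition and the third as a relation; the second mentions the term without saying anything about it and is skipped.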

Page 22: R&D Lingua et Machina


Modelling the translation process

● Research engineer / Ph.D. thesis
  ● Department of translation studies
  ● Université Stendhal, Grenoble
● How do we translate?
● What knowledge is helpful to translators?
● What is a good translation?
● Do non-professionals translate differently?
● How do you improve software usability?

Page 23: R&D Lingua et Machina


More information

● Lingua et Machina
  ● www.lingua-et-machina.com/
  ● contact(a)lingua-et-machina.com
● Libellex
  ● http://libellex.fr/
● Download Similis
  ● http://similis.org/Download/SimilisFreelance-2.16.04-Setup.exe

Page 24: R&D Lingua et Machina


Thank you

ed(a)lingua-et-machina.com

Franco-Thai Workshop 2010

