3GTM: A Third-Generation Translation Memory
Fabrizio Gotti†, Philippe Langlais†, Elliott Macklovitch†, Didier Bourigault*, Benoit Robichaud‡ and Claude Coulombe‡
† RALI, Département d’informatique et de recherche opérationnelle, Université de Montréal
‡ Lingua Technologies Inc., Montréal
* ERSS-CNRS, Toulouse
CLiNE — August 26th 2005
RALI, Lingua Technologies Inc., ERSS-CNRS ( †RALI Departement d’informatique et de recherche operationnelle Universite de Montreal ‡ Lingua Technologies Inc. Montreal ? ERSS-CNRS Toulouse )3GTM CLiNE — August 26th 2005 1 / 35
Outline
1 Overview of the 3GTM project
2 Experimental Setting
3 Experiments
  Sentence Coverage
  Random Substring Coverage
  Chunk-Based Coverage
  Tree-Phrase Coverage
4 Discussion
Overview of the 3GTM project
Translation Memory: a computer-assisted tool that eases the recycling of past translations
1st-generation TM: never translates again a sentence that has already been translated. Full-sentence repetition is a rather marginal phenomenon.
2nd-generation TM: two source sentences may be considered identical if they differ only slightly (named entities, edit distance, etc.). Fuzzy matching.
3rd-generation TM (3GTM): recycles at a sub-sentential level
A project currently funded by PRECARN
Lingua Technologies Inc., RALI, Transetix Inc.
3GTM in a Screenshot
Experimental Setting
English-French language pair: querying the French side, proposing English material
TM populated with Canadian Hansard texts
Coverage statistics computed over a test corpus:
  they help estimate the number of useful units that can be queried/found
  they are the easiest thing to implement at an early stage of a project
  ultimately, we target human evaluation runs (or simulations)
Training Material
Number of sentences, tokens and types in the training corpus

Language          English      French
Nb. sentences     1 753 443    1 753 443
Nb. tokens        31 637 775   34 150 039
Nb. types         85 810       106 987
Avg. word/sent.   17.5         19.3
Test Material
1000 sentences (Hansard corpus)
chronologically distinct from the training material
French = query or source language
English = output or target language
Tools used
JAPA: an in-house sentence aligner, http://rali.iro.umontreal.ca/Japa/
LUCENE: a freely available full-featured text search engine, http://lucene.apache.org
SIMAC: an in-house implementation of a word aligner (Simard and Langlais, 2003)
GIZA++: a tool to train translation models (Och and Ney, 2000)
GRAMMATICUM: a constituent-based parser (Coulombe, 1991)
SYNTEX: a dependency-based parser (Bourigault and Fabre, 2000)
Full Sentence Coverage: Using Verbatim Match

Nb. of sentences                     1000
Nb. of sent. found verbatim          148
Avg. size of sent. in test corpus    19.2
Avg. size of sent. found verbatim    11.1

The 14.8% verbatim rate is due to Hansard idioms, such as:
  I don’t know
  Mr. Speaker : Order, please .
Within a TM ≡ TSRALI.com (6.6 M pairs of phrases), we only found 11 out of 1000 sentences of the EuroParl corpus.
Random Substring Coverage: Protocol
1 Query the TM with any sequence of the source (French) material (length ≥ 2)
  A query found at least once is a valid one
2 Compute a source (French) optimal coverage
  Maximizing the source coverage while minimizing the number of queries
3 Consider the target (English) material associated
  By following the word alignment
4 Compute a target (English) optimal coverage
  Details follow
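Steps 1 and 2 of the protocol can be sketched as follows. This is a minimal illustration, not the 3GTM implementation: the TM is replaced by an in-memory set of word n-grams instead of a LUCENE index, coverage units are assumed not to overlap, and all function names and the toy sentences are hypothetical.

```python
def ngrams(sentences, min_len=2):
    """Toy stand-in for the TM index: every word sequence of at least min_len words."""
    grams = set()
    for sentence in sentences:
        w = sentence.split()
        for i in range(len(w)):
            for j in range(i + min_len, len(w) + 1):
                grams.add(tuple(w[i:j]))
    return grams


def valid_spans(words, tm_ngrams, min_len=2):
    """Step 1: all (start, end) spans of the query sentence found at least once."""
    n = len(words)
    return [(i, j) for i in range(n) for j in range(i + min_len, n + 1)
            if tuple(words[i:j]) in tm_ngrams]


def optimal_coverage(words, spans):
    """Step 2: cover the most words with the fewest non-overlapping spans."""
    n = len(words)
    by_end = {}
    for i, j in spans:
        by_end.setdefault(j, []).append(i)
    # best[k] = (covered words, minus number of spans, chosen spans) for words[:k]
    best = [(0, 0, [])]
    for k in range(1, n + 1):
        cand = best[k - 1]  # option: leave word k-1 uncovered
        for i in by_end.get(k, []):
            covered, neg_units, chosen = best[i]
            option = (covered + k - i, neg_units - 1, chosen + [(i, k)])
            if option[:2] > cand[:2]:
                cand = option
        best.append(cand)
    covered, _, chosen = best[n]
    return covered / n, chosen


tm = ngrams(["charlie et la chocolaterie", "il travaille dans un bureau"])
query = "il travaille dans la chocolaterie".split()
spans = valid_spans(query, tm)
ratio, units = optimal_coverage(query, spans)
```

On this toy data the sentence is fully covered by two units, "il travaille dans" and "la chocolaterie". Comparing (covered words, minus number of units) lexicographically is one way to express "maximize coverage, then minimize queries".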
Random Substring Coverage
  S: Il travaille dans la chocolaterie
  T: He works in a chocolate factory
  q: la chocolaterie
Match (subscripts give the aligned target positions):
  S: Charlie_1 et_2 [la_3 chocolaterie_4,5]
  T: Charlie and [the chocolate factory]
  m: the chocolate factory
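The extraction of the target material m from a match can be sketched as below. This is a hedged illustration only: the function name is hypothetical, positions are 0-based (the slide numbers them from 1), and taking the smallest contiguous target window that contains all aligned words is an assumption, not a documented 3GTM rule.

```python
def project_span(src_span, alignment, target_words):
    """Return the target words aligned to the source positions in src_span.

    alignment maps a 0-based source position to a list of 0-based target positions.
    """
    start, end = src_span
    positions = sorted({t for s in range(start, end) for t in alignment.get(s, [])})
    if not positions:
        return None  # nothing aligned: no target material to propose
    # Assumption: keep the smallest contiguous window covering all aligned words.
    return " ".join(target_words[positions[0]:positions[-1] + 1])


source = "Charlie et la chocolaterie".split()
target = "Charlie and the chocolate factory".split()
# Word alignment of the matched TM pair, as on the slide (shifted to 0-based).
alignment = {0: [0], 1: [1], 2: [2], 3: [3, 4]}

# The query "la chocolaterie" matched source positions 2..3.
m = project_span((2, 4), alignment, target)
```

Here m is "the chocolate factory", the material shown on the slide.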
Random Substring Coverage: Coverage statistics

Metric                   Source   Target
Optimal coverage         98.8%    55.8%
Cov. unit size (words)   4.09     2.98
Number of cov. units     4.65     3.23

Avg. nb. LUCENE queries per sentence: 226
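Two quick sanity checks on these figures, as a sketch under stated assumptions. First, an n-word sentence has n(n-1)/2 contiguous sequences of length ≥ 2, which gives the order of magnitude of the query load; the reported 226 is an average over sentences, so it need not equal the value at the average length. Second, on the source side, unit size times number of units should roughly equal coverage times sentence length; the 19.2 used here is the average test sentence size from the full-sentence coverage slide, assumed to be measured on the French side.

```python
def num_substring_queries(n, min_len=2):
    """Number of contiguous word sequences of at least min_len words in an n-word sentence."""
    return sum(n - k + 1 for k in range(min_len, n + 1))


# A 19-word sentence, close to the average test sentence size.
queries_at_19_words = num_substring_queries(19)

# Source-side figures from the table above.
coverage, unit_size, num_units = 0.988, 4.09, 4.65
avg_test_sentence_len = 19.2  # assumption: French side of the test corpus

covered_words_from_units = unit_size * num_units
covered_words_from_ratio = coverage * avg_test_sentence_len
```

The two estimates of covered words agree to within a tenth of a word, which suggests the three source-side rows describe the same coverage. Because the number of queries grows quadratically with sentence length, long sentences weigh heavily in the average.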
Random Substring Coverage: The unsustainable Truth
S: m. mcinnes – : je m’ excuse
T: mr . mcinnes : i apologize

Number of occurrences in the TM of each substring (row = first word, column = last word):

↗         m.   mcinnes   :     je      m’     excuse
excuse    –    –         –     –       –      –
m’        –    –         –     –       –      446
je        –    –         –     –       3719   347
:         –    –         –     12330   185    107
mcinnes   –    –         43    4       0      0
m.        –    69        43    4       0      0
Random Substring Coverage: The unsustainable Truth
S: m. mcinnes : je m’ excuse
T: mr . mcinnes : i apologize

m. mcinnes : (69)
  42  mr . mcinnes :
  17  mr . mcinnes )
   2  mr . mcinnes ) ,
   1  mr . mcinnes (
   1  mr . mcinnes : it not is required reading , mr . speaker
   1  mr . mcinnes moved
   1  mr . mcinnes moves

: je m’ excuse (107)
  46  : i am sorry ,
  16  : i am sorry
  14  : i am sorry to
  14  : i apologize ,
   8  : i apologize for interrupting
   8  : i apologize to
   8  : i do apologize
   6  : excuse me ,
   6  : : i apologize
   6  : i beg your pardon
Chunk-Based Coverage: Querying the Memory with Chunks, Pros
Speeding up
  By limiting the number of queries
Improving the target material extraction
  By taking into account chunk boundaries computed on both sides
Avoiding overwhelming the user with too much data
  Fewer queries, reduced target material overlaps
Chunk-Based Coverage: Protocol
1 The test material was first chunked using a tool from Lingua Technologies Inc. (Coulombe, 1991)
2 28.35 chunks per source (French) sentence on average
3 11.7 chunks if we only consider those of size ≥ 2
4 We used those selected chunks to query the TM
5 Everything else was kept identical to the previous experiment
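The chunk-based protocol can be sketched as a filter in front of the same TM lookup. This is an illustration only: the chunk list below is a hypothetical chunker output (the actual tool is not reproduced here), the TM is a plain dictionary of occurrence counts rather than a LUCENE index, and the two counts are the ones reported for this example sentence in the slides.

```python
def chunk_queries(chunks, min_len=2):
    """Keep only chunks of at least min_len words; each one becomes a single TM query."""
    return [c for c in chunks if len(c.split()) >= min_len]


def query_tm(queries, tm_counts):
    """Return {query: occurrence count} for the queries found at least once."""
    return {q: tm_counts[q] for q in queries if tm_counts.get(q, 0) > 0}


# Hypothetical chunker output for "m. mcinnes : je m' excuse".
chunks = ["m. mcinnes :", "je", "je m' excuse", "excuse"]
# Toy TM: occurrence counts of two source substrings.
tm_counts = {"m. mcinnes :": 69, "je m' excuse": 347}

queries = chunk_queries(chunks)
hits = query_tm(queries, tm_counts)
```

Only two queries are issued for this sentence, instead of one per substring of length ≥ 2 as in the previous experiment.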
Chunk-Based Coverage: Coverage Statistics

Random substrings (previous experiment, for comparison):

Metric                   Source   Target
Optimal coverage         98.8%    55.8%
Cov. unit size (words)   4.09     2.98
Number of cov. units     4.65     3.23

Avg. nb. LUCENE queries per sentence: 226

Chunks:

Metric                   Source   Target
Optimal coverage         59.9%    59.3%
Cov. unit size (words)   3.73     2.99
Number of cov. units     3.08     3.47

Avg. nb. LUCENE queries per sentence: 11.7
Chunk-Based Coverage
S: m. mcinnes : – je m’ excuse
T: mr . mcinnes : i apologize

Number of occurrences in the TM of each substring (row = first word, column = last word):

↗         m.   mcinnes   :     je      m’     excuse
excuse    –    –         –     –       –      –
m’        –    –         –     –       –      446
je        –    –         –     –       3719   347
:         –    –         –     12330   185    107
mcinnes   –    –         43    4       0      0
m.        –    69        43    4       0      0
Chunk-Based Coverage
S: m. mcinnes : je m’ excuse
T: mr . mcinnes : i apologize

m. mcinnes : (69)
  42  mr . mcinnes :
  17  mr . mcinnes )
   2  mr . mcinnes ) ,
   1  mr . mcinnes (
   1  mr . mcinnes : it not is required reading , mr . speaker
   1  mr . mcinnes moved
   1  mr . mcinnes moves

je m’ excuse (347)
  40  i am sorry ,
  33  i apologize to
  24  i apologize
  21  i am sorry
  15  i apologize for
  13  i am sorry to
  13  i apologize ,
   6  i apologize if
   6  i do apologize to
   5  i apologize for interrupting
Tree-Phrase Coverage
Motivation: the translations of “a good friend” could be useful to translate “a very good friend”, which does not appear in the TM.
From now on, TM = collection of Tree-Phrases (TPs),
where TP = a combination of a treelet (TL) and an elastic-phrase (EP)
Tree-Phrase Coverage: SYNTEX (Bourigault and Fabre, 2000)
Dependency parser available for French and English.
On demande des crédits fédéraux
demande
  SUB: on
  OBJ: crédits
    DET: des
    ADJ: fédéraux
A link identifies two words: a governor and its dependent (e.g. demande governs crédits)
We do not consider link labels in this study
Tree-Phrase Coverage: Facts
We parsed the source (French) part of the training material with SYNTEX
We extracted all TLs of depth 1
We collected more than 3 million types of TLs, from which we projected 6.5 million kinds of EPs
The TLs range in size from 2 to 8 words, and EPs from 1 to 9
Roughly half the TLs and 40% of the EPs are contiguous ones
Tree-Phrase Coverage
On demande des crédits fédéraux / We request for federal funding
alignment: demande ≡ request for // fédéraux ≡ federal // crédits ≡ funding
treelets:
  demande
    on
    crédits
  crédits
    des
    fédéraux
tree-phrases:
  TL* {{on@-1} demande {crédits@2}}
  EP* |request@0||for@1||funding@3|
  TL {{des@-1} crédits {fédéraux@1}}
  EP |federal@0||funding@1|
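The two tree-phrases above can be reproduced with the following sketch. It assumes, from the notation on the slide, that treelet offsets are relative to the head word and elastic-phrase offsets are relative to the first projected target word; the function names are hypothetical, and only the treelet made of a head with all its dependents is built here.

```python
from collections import defaultdict


def depth1_treelets(links):
    """Group (governor, dependent) position pairs by governor."""
    by_head = defaultdict(list)
    for governor, dependent in links:
        by_head[governor].append(dependent)
    return dict(by_head)


def format_treelet(words, head, dependents):
    """Render a head and its dependents, each with its offset from the head."""
    left = "".join(f"{{{words[d]}@{d - head}}}" for d in sorted(dependents) if d < head)
    right = "".join(f"{{{words[d]}@{d - head}}}" for d in sorted(dependents) if d > head)
    return "{" + left + " " + words[head] + " " + right + "}"


def elastic_phrase(source_positions, alignment, target_words):
    """Project treelet words through the alignment; offsets start at the first target word."""
    positions = sorted({t for s in source_positions for t in alignment.get(s, [])})
    if not positions:
        return ""
    first = positions[0]
    return "".join(f"|{target_words[t]}@{t - first}|" for t in positions)


source = "on demande des crédits fédéraux".split()
target = "we request for federal funding".split()
# Dependency links as (governor, dependent) positions, labels ignored.
links = [(1, 0), (1, 3), (3, 2), (3, 4)]
# Word alignment: demande = request for, crédits = funding, fédéraux = federal.
alignment = {1: [1, 2], 3: [4], 4: [3]}

tree_phrases = []
for head, deps in sorted(depth1_treelets(links).items()):
    tl = format_treelet(source, head, deps)
    ep = elastic_phrase([head] + deps, alignment, target)
    tree_phrases.append((tl, ep))
```

Unaligned source words (on, des) simply contribute nothing to the elastic phrase, which is why an EP can be shorter than its treelet and can contain gaps (request@0, for@1, funding@3).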
Tree-Phrase Coverage: Coverage Statistics

Chunks (previous experiment, for comparison):

Metric                   Source   Target
Optimal coverage         59.9%    59.3%
Cov. unit size (words)   3.73     2.99
Number of cov. units     3.08     3.47

Avg. nb. LUCENE queries per sentence: 11.7

Tree-phrases:

Metric                   Source   Target
Optimal coverage         62.7%    56.4%
Cov. by contiguous TPs   46.0%    38.6%
Tree-Phrase Coverage
S: présentation de le 1er rapport de le comité permanent
T: presentation of first report of standing committee

Treelets (head with its dependents) and associated target material:
  rapport (de, le, 1er): of first report
  rapport (de, le): report, report of, of report
  rapport (le, de): report, of report, report of
  rapport (de, le, 1er, de): report, of first report, first report of
  comité (de, le): committee, of committee
  comité (de, le, permanent): of committee, standing committee, of standing committee
Discussion
We have considered distinct approaches to query a TM
Full-sentence queries yield poor coverage
Random substring querying does well at coverage, but does not seem viable without stringent pruning strategies (Simard, 2003)
Chunk-based querying seems attractive
TP querying seems a viable alternative, and non-contiguous units might be useful to the end user
Coverage simulations are only approximations (Langlais and Simard, 2003)