ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015
TRANSCRIPT
Finding Ways to Assess Machine Translated Documents for Document-level Quality Prediction
Carolina Scarton
Supervisor: Dr Lucia Specia
EXPERT – Scientific and Technological Workshop – Malaga, Spain – 27/06/2015
Agenda
Introduction
Quality Estimation Framework
Related Work
Document-level Quality Estimation
Quality Label problem
Two-stage post-edition experiment
Large-scale experiments
Conclusion
Introduction
Quality estimation (QE) of machine translations
– quality predictions for new, unseen machine translated texts
– use of machine learning techniques – only a few labelled data points
– different from BLEU-style metrics – QE does not rely on reference translations
Open problems:
– Granularity level?
• Word-level
• Sentence-level
• Document-level
– Which are the best features?
• Linguistic features have been explored, but not much on discourse features!
– Which are the best quality labels?
• Likert
• HTER
• BLEU-style
QE pipeline (diagram):
– Source documents + Target documents → Feature extractor → Features for QE
– Features for QE + Quality labels (Likert, HTER, BLEU, ...) → QE model training → QE model
– QE model → Predictions
Defining the ideal quality label for document-level prediction is a challenge
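The pipeline in the diagram can be sketched end to end in a few lines. This is only a toy illustration: the two features, the nearest-neighbour regressor, the example documents and the label values are all assumptions made for the sketch, not the features or learner used in the talk.

```python
import math


def extract_features(source: str, target: str) -> list[float]:
    """Toy document-level features (hypothetical, for illustration only)."""
    src_tokens = source.split()
    tgt_tokens = target.split()
    # Ratio of target length to source length, in tokens.
    length_ratio = len(tgt_tokens) / max(len(src_tokens), 1)
    # Share of distinct tokens in the target (type/token ratio).
    type_token_ratio = len(set(tgt_tokens)) / max(len(tgt_tokens), 1)
    return [length_ratio, type_token_ratio]


def train(feature_rows, labels):
    """'Training' a nearest-neighbour regressor just stores the labelled data."""
    return list(zip(feature_rows, labels))


def predict(model, features, k=1):
    """Average the quality labels of the k closest training documents."""
    nearest = sorted(model, key=lambda row: math.dist(row[0], features))[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Hypothetical training data: (source, machine translation, quality label).
training_docs = [
    ("the cat sat on the mat", "die Katze saß auf der Matte", 0.9),
    ("the cat sat on the mat", "Katze Katze Katze", 0.2),
]
model = train(
    [extract_features(src, tgt) for src, tgt, _ in training_docs],
    [label for _, _, label in training_docs],
)

# Prediction for a new, unseen translation: no reference translation is used.
new_features = extract_features("a dog sat on the rug", "ein Hund saß auf dem Teppich")
prediction = predict(model, new_features)
```

Whatever learner is plugged in, the quality label column is what the model learns to reproduce, which is why the choice of label matters so much.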
Quality Estimation Framework
QuEst (www.quest.dcs.shef.ac.uk)
– Framework for sentence-level QE
– QuEst++ → recent extension for word and document levels
• https://github.com/ghpaetzold/questplusplus
– Feature Extraction module (Java)
– Machine Learning module (Python)
Related Work
Soricut and Echihabi (2010) → TrustRank
– Ranking documents according to BLEU scores
Scarton and Specia (2014)
– Document-level QE prediction using discourse features – also predicted BLEU scores
Carpuat and Simard (2012)
– Lexical consistency study of MT outputs → MT is overall consistent!
Meyer and Weber (2013)
– Implicit discourse connectives in MT
Li et al. (2014)
– Discourse connectives → improve MT → correlations between discourse connectives and HTER
Guzmán et al. (2014)
– Document-level evaluation metric using RST
Document-level QE
Quality Label problem
Quality labels are a challenge:
– Which is the ideal quality label for document-level QE?
– How can we assess documents?
• Aggregation of sentence-level scores?
• New assessment score of the document as a whole?
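The first option, aggregating sentence-level scores, can be sketched as follows. The two aggregation functions and the example scores and lengths are assumptions chosen for illustration; the slides do not say which aggregation was considered.

```python
def aggregate_mean(sentence_scores):
    """Plain average of the sentence-level scores."""
    return sum(sentence_scores) / len(sentence_scores)


def aggregate_length_weighted(sentence_scores, sentence_lengths):
    """Average weighted by sentence length (in tokens)."""
    total_length = sum(sentence_lengths)
    weighted_sum = sum(s * n for s, n in zip(sentence_scores, sentence_lengths))
    return weighted_sum / total_length


# Hypothetical sentence-level scores (e.g. HTER-like values) and token counts.
scores = [0.0, 0.5, 0.25]
lengths = [10, 20, 10]

doc_mean = aggregate_mean(scores)
doc_weighted = aggregate_length_weighted(scores, lengths)
```

Either aggregate treats each sentence in isolation, so it cannot reflect problems that only show up across sentences, which is what motivates the second option.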
– BLEU-style metrics as quality labels
• LIG corpus (FR-EN) → 119 documents
• WMT corpus (EN-DE) → 52 documents (1215 paragraphs)
– Low standard deviation (STDEV) of the scores → documents have similar quality
• Is it really true?
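A small sketch of why a low standard deviation is a concern for a quality label. The document scores below are made up for illustration and are not taken from the LIG or WMT data.

```python
import statistics

# Hypothetical document-level BLEU-style scores for five documents.
doc_scores = [0.20, 0.22, 0.21, 0.19, 0.23]

mean_score = statistics.mean(doc_scores)
stdev_score = statistics.stdev(doc_scores)  # sample standard deviation

# A "model" that always predicts the mean is already very accurate
# when the labels barely vary.
baseline_mae = statistics.mean(abs(s - mean_score) for s in doc_scores)
```

With labels this tightly clustered, a trivial mean predictor is hard to beat, so the label says little about which documents are actually better or worse.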
Two-stage post-edition method
PE1:
– Post-edition of sentences without context
• Wir brauchen das kulturelle Fundament, aber wir haben jetzt mehr Schriftsteller als Leser.
PE2:
– Post-edition of sentences with context
• - St. Petersburg bietet nicht viel kulturelles Angebot, Moskau hat viel mehr Kultur, es hat eine Grundlage. Es ist schwer für die Kunst, sich in unserem Umfeld durchzusetzen. Wir brauchen das kulturelle Fundament, aber wir haben jetzt mehr Schriftsteller als Leser. Das ist falsch. In Europa gibt es viele neugierige Menschen, die auf Kunstausstellungen, Konzerte gehen. Hier ist diese Schicht ist dünn.
Hypothesis:
– There are problems in MT outputs that can only be solved in context
– Measuring the difference from PE1 to PE2
• Isolating document-level problems
• Using the difference to create a better quality label
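One way to measure the difference from PE1 to PE2 is a word-level edit rate between the two post-edited versions. The sketch below is an assumption about how this could be done: it uses plain Levenshtein distance over tokens, which is simpler than HTER (no shift operations), and the example sentences are invented.

```python
def word_edit_distance(a_tokens, b_tokens):
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    previous = list(range(len(b_tokens) + 1))
    for i, a_tok in enumerate(a_tokens, start=1):
        current = [i]
        for j, b_tok in enumerate(b_tokens, start=1):
            cost = 0 if a_tok == b_tok else 1
            current.append(min(
                previous[j] + 1,          # delete a_tok
                current[j - 1] + 1,       # insert b_tok
                previous[j - 1] + cost,   # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def edit_rate(hypothesis, reference):
    """Edits needed to turn hypothesis into reference, per reference token."""
    hyp, ref = hypothesis.split(), reference.split()
    return word_edit_distance(hyp, ref) / max(len(ref), 1)


# Hypothetical example: the pronoun can only be fixed with context (PE2).
mt = "he said that she like it"
pe1 = "he said that she likes it"   # post-edited without context
pe2 = "he said that he likes it"    # post-edited with context

mt_to_pe1 = edit_rate(mt, pe1)
mt_to_pe2 = edit_rate(mt, pe2)
pe1_to_pe2 = edit_rate(pe1, pe2)    # edits attributable to context
```

In this toy case the PE1 to PE2 rate isolates the single context-dependent edit, which is the kind of signal the hypothesis above is looking for.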
Experiments:
– Data: 1215 paragraphs → WMT EN-DE corpus
• Filter 1: only paragraphs with more than 3 and fewer than 8 sentences
• Filter 2: paragraphs ordered by number of discourse phenomena (discourse connectives and pronouns)
• Final data: 200 paragraphs
– Annotators → students of “translation studies” at Saarland University, Saarbrücken, Germany
– 16 sets → evaluate agreement
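The two filters can be sketched as below. The connective and pronoun lists, the naive sentence splitter, and the idea of keeping the top-ranked paragraphs are assumptions for illustration; the slides do not specify how the phenomena were counted or how the final 200 paragraphs were chosen from the ranking.

```python
import re

# Hypothetical, deliberately tiny word lists.
CONNECTIVES = {"but", "however", "because", "although", "therefore"}
PRONOUNS = {"he", "she", "it", "they", "this", "that"}


def count_sentences(paragraph):
    """Naive split on sentence-final punctuation."""
    return len([s for s in re.split(r"[.!?]+", paragraph) if s.strip()])


def count_discourse_phenomena(paragraph):
    """Count tokens that are in either word list."""
    tokens = re.findall(r"\w+", paragraph.lower())
    return sum(tok in CONNECTIVES or tok in PRONOUNS for tok in tokens)


def select_paragraphs(paragraphs, top_n):
    # Filter 1: more than 3 and fewer than 8 sentences.
    kept = [p for p in paragraphs if 3 < count_sentences(p) < 8]
    # Filter 2: order by number of discourse phenomena, highest first.
    kept.sort(key=count_discourse_phenomena, reverse=True)
    return kept[:top_n]


paragraphs = [
    "One. Two. Three.",
    "It rained. But they stayed. However, it was fine. She left.",
    "A cat sat. A dog ran. A bird flew. A fish swam.",
]
selected = select_paragraphs(paragraphs, top_n=1)
```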
Annotators' agreement: [figure]
Changes from PE1 to PE2 – paragraphs perspective:
– All paragraphs were changed
Paragraph changes example: [figure]
– Better word choices
![Page 68: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/68.jpg)
EXPERT – Scientific and Technological Workshop – Malaga, Spain – 27/06/2015
68
Two-stage post-edition method
Paragraph changes → manual analysis
Document-level QE
![Page 69: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/69.jpg)
EXPERT – Scientific and Technological Workshop – Malaga, Spain – 27/06/2015
69
Two-stage post-edition method
Paragraph changes → manual analysis– Discourse/context changes
Document-level QE
![Page 70: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/70.jpg)
EXPERT – Scientific and Technological Workshop – Malaga, Spain – 27/06/2015
70
Two-stage post-edition method
Paragraph changes → manual analysis– Discourse/context changes– Stylistic changes
Document-level QE
![Page 71: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/71.jpg)
EXPERT – Scientific and Technological Workshop – Malaga, Spain – 27/06/2015
71
Two-stage post-edition method
Paragraph changes → manual analysis– Discourse/context changes– Stylistic changes– Other changes
Document-level QE
![Page 72: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/72.jpg)
EXPERT – Scientific and Technological Workshop – Malaga, Spain – 27/06/2015
72
Two-stage post-edition method
Paragraph changes → manual analysis– Discourse/context changes– Stylistic changes– Other changes
Document-level QE
![Page 73: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/73.jpg)
Two-stage post-edition method
Paragraph changes → manual analysis
– Discourse/context changes
– Stylistic changes
– Other changes
– Low agreement
• Annotators should not make many stylistic changes!
Document-level QE
![Page 77: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/77.jpg)
Two-stage post-edition method
Final results:
– 116 paragraphs analysed
– Some changes → only possible with paragraph context
– However
• How to combine the results into a quality label?
Document-level QE
![Page 83: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/83.jpg)
Large-scale experiments
Extending the research:
– Data: ~1000 data points
• Different language pairs
• Entire documents (?)
– Annotators: expert annotators (familiar with post-editing)
• Improving guidelines and training
– Evaluation: combining PE2 – PE1 with other metrics (HTER, BLEU, …)
– Alternative approach:
• Post-editions in context → available
• Apply PE1 (post-editing the sentences again → without context)
• PE2 – PE1 as usual
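The slides do not say how the PE2 – PE1 difference is measured. A minimal sketch, assuming a word-level edit distance between the first post-edition (PE1, without context) and the second (PE2, with context), normalised by the length of PE2 in the style of HTER. This is plain Levenshtein distance without the shift operation that TER uses; the function names, the whitespace tokenisation and the example sentences are hypothetical:

```python
def word_edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pe2_minus_pe1(pe1, pe2):
    """Edits needed to turn PE1 into PE2, divided by the length of PE2."""
    pe1_tokens, pe2_tokens = pe1.split(), pe2.split()
    # max(..., 1) avoids division by zero for an empty PE2 (an assumption).
    return word_edit_distance(pe1_tokens, pe2_tokens) / max(len(pe2_tokens), 1)


# Hypothetical pair: one pronoun corrected once the paragraph was visible.
pe1 = "The minister presented it yesterday ."
pe2 = "The minister presented them yesterday ."
score = pe2_minus_pe1(pe1, pe2)
```

A score of 0 means the second stage changed nothing; larger values mean more edits were made only once context was available.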
![Page 87: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/87.jpg)
Conclusion
Two-stage post-edition method → promising!
– Problems that can only be solved in context
How to compute a quality label?
– Combine PE2 – PE1 with other metrics?
– Use PE2 – PE1 on its own?
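The question of how to compute the label is left open in the talk. Purely as an illustration of the "combine" option, the sketch below takes a weighted average of a PE2 – PE1 score and an HTER score, assuming both are already computed and lie in [0, 1]. The function name, the weight and the example scores are hypothetical, not results from the experiment:

```python
def combined_label(pe_diff, hter, weight=0.5):
    """Weighted average of two scores that are both assumed to lie in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * pe_diff + (1.0 - weight) * hter


# Hypothetical per-paragraph scores: (PE2 - PE1 difference, HTER).
paragraphs = [(0.20, 0.35), (0.00, 0.10), (0.05, 0.50)]
labels = [combined_label(d, h, weight=0.5) for d, h in paragraphs]
```

Setting `weight=1.0` reduces this to using PE2 – PE1 on its own, so both options on the slide fit the same interface.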
![Page 88: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/88.jpg)
Acknowledgement
Saarland University: Marcos Zampieri, Mihaela Vela, Heike Przybyl and Josef Van Genabith
Reviewers from EXPERT Workshop
![Page 89: ESR7 Carolina Scarton - EXPERT Summer School - Malaga 2015](https://reader033.vdocuments.site/reader033/viewer/2022051521/587329931a28ab596c8b5489/html5/thumbnails/89.jpg)
Thank you!
Carolina Scarton
Supervisor: Dr Lucia Specia