Post on 18-Dec-2015
Speech and Language Technologies in the Next Generation Localisation CSET
Prof. Andy Way, School of Computing, DCU
Overview of Presentation
Speech & Language Technologies in the NGL CSET
Facilitating Optimal Multilingual NGL Applications
Key Research Challenges
Novel Research Tracks
Typical LSP’s Translation Process
Key Integration Challenges
Concluding Remarks
ILT - Integrated Language Technologies
[Diagram: Next Generation Localisation: Systems Framework, Enterprise Localisation, Personalised Localisation, Unified Model, Digital Content Management, Integrated Language Technologies]
Prof. Andy Way, ILT Area Coordinator
ILT: Facilitating Optimal Multilingual NGL Applications
[Diagram: Text Input and Speech Input; Text Processing, Speech Technologies and Machine Translation; Text Output and Speech Output]
e.g. bulk localisation, e.g. personalisation
Machine Translation: Significance
For our industrial partners, volume of material needing translation increasing, while budgets remain the same
In the EU, now 23 official languages (506 language pairs), and expanding …
In the US, huge investment in translation between Arabic, Chinese and Urdu, and English …
Automation the only option (especially for PL) …
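The figure of 506 language pairs follows from counting ordered (source, target) pairs among 23 languages, i.e. 23 × 22. A minimal check of that arithmetic, where the function name is illustrative only:

```python
from itertools import permutations


def directed_language_pairs(n_languages: int) -> int:
    """Count ordered (source, target) pairs among n distinct languages."""
    return sum(1 for _ in permutations(range(n_languages), 2))


# 23 official EU languages, each translated into the 22 others
print(directed_language_pairs(23))  # 506
```

Each additional official language adds two new pairs for every language already in the set, which is why the count grows quadratically as the list expands.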
MT: Key Research Challenges
Enhanced Translation Quality
Faster Translation Times
Scalability
Other Modalities (Speech, SMS etc.)
The State-of-the-Art
Source:
Reference: The two sides highlighted the role of the World Trade Organization (WTO)
Baseline: The two sides on the role of the WTO
Improving the State-of-the-Art
Our MT systems have knowledge of syntax:
Parts of speech (nouns, verbs etc.)
Roles in sentences (subject, object etc.)
better translation quality
Source:
Reference: The two sides highlighted the role of the World Trade Organization (WTO)
Baseline: The two sides on the role of the WTO
Our System: The two sides reaffirmed the role of the WTO
The State-of-the-Art
Source:
Reference: Mahmoud Abbas: The wall and settlements will not bring Israel security
Baseline: Mahmoud Abbas, the wall and settlements will provide security to Israel
Our System: Mahmoud Abbas, the wall and settlements will not provide security for Israel
Improving the State-of-the-Art
better translation quality (especially where end-users are concerned)
DCU Arabic-English system ranked first at international MT evaluation in Oct. 2007
MT Novel Research: Handling Different Types of Text
Translating patent applications, or doctors’ prescriptions, or visa applications: different tasks, as the content is different …
So is the form …
Build different MT systems for each different task, using our industrial partners’ documentation
Text Processing: Significance and Challenges
If texts are automatically annotated with:
syntactic information (e.g. subject, object), today’s MT systems can learn syntax required for improved output quality and improved processing of multilingual queries (DCM)
text-type and genre information, this helps our MT systems disambiguate text and improve translation quality
localisation information (e.g. <DNT>Andy Way</DNT>), then the workflows of our industrial partners (currently done manually) can be significantly improved (cf. LOC)
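One way such localisation annotation could be used is to shield do-not-translate spans from the MT engine and restore them afterwards. The sketch below assumes the `<DNT>…</DNT>` tag format shown above; the placeholder scheme, the function names and the word-for-word `toy_translate` stand-in are hypothetical and are not part of any system described in these slides.

```python
import re

DNT_PATTERN = re.compile(r"<DNT>(.*?)</DNT>", re.DOTALL)

# Hypothetical two-word lexicon standing in for a real MT engine.
TOY_LEXICON = {"slides": "Folien", "by": "von"}


def protect_dnt(text):
    """Replace each <DNT>...</DNT> span with a numbered placeholder."""
    protected = []

    def _mask(match):
        protected.append(match.group(1))
        return f"__DNT{len(protected) - 1}__"

    return DNT_PATTERN.sub(_mask, text), protected


def restore_dnt(text, protected):
    """Put the original protected strings back after translation."""
    for index, original in enumerate(protected):
        text = text.replace(f"__DNT{index}__", original)
    return text


def toy_translate(text):
    """Stand-in translator: word-for-word lookup, unknown tokens pass through."""
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in text.split())


masked, spans = protect_dnt("Slides by <DNT>Andy Way</DNT>")
print(restore_dnt(toy_translate(masked), spans))  # Folien von Andy Way
```

The design choice here is to keep the protected text entirely outside the translator, so the engine cannot alter it; a real workflow would also need to handle placeholders that the engine reorders or drops.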
Speech Technology: Significance
Speech interfaces for eyes-busy, hands-busy scenarios
Speech recognition and synthesis systems which can deal with:
potentially an unlimited vocabulary
multiple (and non-native) speakers
multiple languages
and can be tightly integrated with MT
localisation & personalisation
volume & scalability
access
Speech Technology: Challenges
[Diagram: segmenting a continuous speech stream]
themoreitsnowsthemoreitgoes
demoreisnowsdemoregoes
linguistic competence of native speaker: the more it snows the more it goes…
“rules” and vocabulary of system: them ore its nows them ore it goes? themo rei tsn ow sthe mo reitg o es?
performance of (native) speaker
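The segmentation ambiguity on this slide can be reproduced with a few lines of code. The sketch below enumerates every way of splitting an unsegmented stream into words from a toy vocabulary; the vocabulary and the function name are assumptions made for illustration, and a real recogniser works on acoustic signals with probabilistic models rather than on letter strings.

```python
from functools import lru_cache

# Toy vocabulary (assumed for illustration); it deliberately contains
# "them", "ore", "its" and "nows" so that spurious splits are possible.
TOY_VOCABULARY = frozenset(
    {"the", "them", "more", "ore", "it", "its", "snows", "nows", "goes"}
)


def segmentations(stream, vocabulary=TOY_VOCABULARY):
    """Return every way to split `stream` into words from `vocabulary`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(stream):
            return ((),)  # one valid segmentation: the empty remainder
        results = []
        for end in range(start + 1, len(stream) + 1):
            word = stream[start:end]
            if word in vocabulary:
                results.extend((word,) + rest for rest in split_from(end))
        return tuple(results)

    return [" ".join(words) for words in split_from(0)]


for candidate in segmentations("themoreitsnowsthemoreitgoes"):
    print(candidate)
```

With this vocabulary the stream has eight candidate segmentations, including both "the more it snows the more it goes" and "them ore its nows them ore it goes", while "demoreisnowsdemoregoes" has none because no vocabulary entry matches its opening sounds. Choosing among candidates, and coping with pronunciations outside the vocabulary, is where linguistic knowledge has to come in.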
Speech Technology: Innovations
Robust & Novel Speech Recognition Engine which integrates explicit linguistic knowledge
[Diagram: unsegmented speech streams: demoreisnowsdemoregoes, themoreitsnowsthemoreitgoes, detverkarhavaritenstorstormhurmån, Jemehreschneitdestomehresgeht]
linguistic competence of native speaker: the more it snows the more it goes…
“rules” and vocabulary of system: them ore its nows them ore it goes? themo rei tsn ow sthe mo reitg o es?
Innovations: Speech Recognition & MT
Robust & Novel Speech Recognition Engine which integrates explicit linguistic knowledge
Tight coupling with MT Engines
[Diagram: unsegmented speech streams: themoreitsnowsthemoreitgoes, detverkarhavaritenstorstormhurmån, Jemehreschneitdestomehresgeht]
Innovations: MT & Speech Synthesis
Robust & Novel Speech Synthesis Engine which integrates explicit linguistic knowledge
Tight coupling with MT Engines
Typical LSP’s Translation Process
Incoming documents (segmented)
Step 1: Translation Memory & Machine Translation (Translation Memory DB)
Partially Translated Documents, with confidence rating for segments
Step 2: Post-editing & translation (Freelance Translators, In-house Translators)
Step 3: Documents Validation & Finalization
Requirement: minimal disruption of this process
TM match score < 50 %: expensive
50 % < TM match score < 70 %: medium
TM match score > 70 %: cheap
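The three cost bands above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and because the slide does not say which band scores of exactly 50 % or 70 % fall into, the code assumes both count as "medium".

```python
from collections import Counter


def cost_class(tm_match_score: float) -> str:
    """Map a TM match score (0-100 %) to the post-editing cost band."""
    if not 0.0 <= tm_match_score <= 100.0:
        raise ValueError("TM match score must be between 0 and 100")
    if tm_match_score < 50.0:
        return "expensive"
    if tm_match_score <= 70.0:  # assumption: boundary scores count as medium
        return "medium"
    return "cheap"


# Made-up match scores for the segments of one incoming document
segment_scores = [12.0, 55.0, 68.5, 83.0, 97.0]
print(Counter(cost_class(score) for score in segment_scores))
```

Summarising a document this way shows at a glance how many segments fall into each band before the document is sent for post-editing.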
Key Integration Challenges
Use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008]
Linking MT automatic evaluation metrics with post-editing cost
Ensuring that MT omissions are highlighted
Enforcing customer terminology
Deal with markup, tags …
Produce true-cased translations
Integrate into pre-existing workflows!
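As an illustration of one item in this list, producing true-cased translations, the sketch below restores casing on lower-cased MT output by picking the most frequent surface form of each word seen in cased text. This is a simple frequency-based stand-in, not necessarily the approach used in the project; the function names and the two training sentences are made up for the example.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Learn the most frequent surface form of each word.

    Sentence-initial tokens are skipped because their capital letter
    says nothing about the word's usual casing.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def truecase(lowercased, model):
    """Apply learned casing, then capitalise the first token of the sentence."""
    tokens = [model.get(token, token) for token in lowercased.split()]
    if tokens:
        tokens[0] = tokens[0][:1].upper() + tokens[0][1:]
    return " ".join(tokens)


# Made-up cased training text
corpus = [
    "The two sides highlighted the role of the WTO",
    "Talks at the WTO resumed",
]
model = train_truecaser(corpus)
print(truecase("the two sides reaffirmed the role of the wto", model))
# The two sides reaffirmed the role of the WTO
```

Words never seen in the training text, such as "reaffirmed" here, are left in the form the MT engine produced.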
Concluding Remarks
For ILT, ramp-up almost complete: over 30 new researchers in addition to pre-existing PIs, postdoctoral researchers and PhD students
Large interest from industrial partners, both large and small
Input from LOC, DCM and SF
Significant role in CNGL demonstrators
Research tools → Industrial prototypes
Well placed to succeed in going ‘beyond TMs’ …
Speech & Language Technologies in the NGL CSET
Thanks for listening!
Questions?
http://www.cngl.ie
away@computing.dcu.ie