machine translation: latest innovations and their impact on commercial translation
TRANSCRIPT
SDL Proprietary and Confidential
Machine Translation: Latest Innovations and their Impact on Commercial Translation
SDL Customer Success Summit Montreal
Rodrigo Fuentes Corradi, MT Business Consultant
June, 2015
Agenda
○ Evolution of MT
○ Common MT Use-Cases
○ Engine Training
○ Introducing SDL XMT
○ How to Deploy MT
○ MT and the Post-Editor
Evolution of MT
Brief history of Machine Translation
○ 1950s: War-time cryptography requirements, with subsequent experiments & investment in automated translation
○ 2002: SDL acquires RBMT engine… establishes MT group dedicated to improving quality for enterprise applications
○ 2010: SDL acquires Language Weaver / BeGlobal Statistical Machine Translation (SMT)
○ 2011: First SDL Post-Editing projects using SMT go into production
○ Post-Editing booms: 4-fold increase
○ 2014: SDL launches PE Certification Program
○ 2015: SDL launches XMT next-generation MT platform
Overview: The SDL MT Team
Who we are
First to commercialize Statistical Machine Translation
o 50+ Professionals
o Over 10 Nationalities
o Across 5 Time Zones
o 8 Locations
Widespread team of language lovers:
o Ex-translators
o Computational Linguists
o Project Managers
o Data Specialists
o Post-Editors
o Architects
…all gathered from the four corners of SDL!
What we do
Drive MT Adoption:
Educate, promote and support MT usage in existing SDL accounts & new opportunities
Custom Engine Builds:
o Design
o Create
o Test
o Implement
o Monitor
…custom Statistical Machine Translation engines
Linguistic Projects:
Semantic annotation projects for US Government bodies & academic institutes
How we do it
Two Research Labs:
o Los Angeles, CA
o Cambridge, UK
o 100s of Scientific Publications
o Over 50 Patents Approved or Filed
We’re Evangelists… about Machine Translation, using automation to accelerate productivity
Common MT Use-Cases
Global Content Explosion
Communication Channels
Consumer Preferences
Increased Global Competition
Export Market Growth
Right translation method, right price, right time
[Chart: content types plotted by Quality against Volume, across Human Translation, Post-Edit and Machine Translation]
Content types shown: Blogs, User Forums, Reviews, Chat, Email, Support, FAQ, Websites, Wikis, Knowledge Base, Alerts/Notifications, Help, User Guides, Documentation, Newsletters, Advertising Content, Legal
Translator productivity
Description:
○ Direct access to machine translation from SDL Trados Studio
Benefits:
○ Improves translator efficiency by supplying machine translation results for segments that have no match in the translation memory
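The benefit described on this slide (machine translation offered only for segments with no translation memory match) can be sketched in a few lines. This is a minimal illustration, not SDL Trados Studio's implementation: the function names, the 0.75 similarity threshold, and the `fake_mt` stand-in engine are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def best_tm_match(segment, translation_memory):
    """Return (score, target) for the closest TM source segment, or (0.0, None)."""
    best_score, best_target = 0.0, None
    for source, target in translation_memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def suggest(segment, translation_memory, machine_translate, threshold=0.75):
    """Offer the TM match when it is close enough, otherwise fall back to MT."""
    score, target = best_tm_match(segment, translation_memory)
    if target is not None and score >= threshold:
        return {"origin": "TM", "score": round(score, 2), "text": target}
    return {"origin": "MT", "score": None, "text": machine_translate(segment)}


def fake_mt(segment):
    # Hypothetical stand-in for a call to a real MT engine.
    return f"[MT] {segment}"


tm = {"Click the Save button.": "Cliquez sur le bouton Enregistrer."}
print(suggest("Click the Save button.", tm, fake_mt))
print(suggest("Restart the device after the update.", tm, fake_mt))
```

The threshold is the main design choice here: a higher value sends more segments to MT, a lower value reuses more fuzzy TM matches.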
Live chat translation
Description:
○ Real-time translation of web-based chat conversations
Benefits:
o Reduces the cost of staffing support/sales operations, as they do not need multilingual agents
o Customer acquisition rates and satisfaction are much higher if you engage the customer in chat.
Community forum translation
Description:
○ Translation of user-generated content in web-based community forums
Benefits:
o Enable interactions between customers who speak different languages
o Leverage community expertise across languages instead of only within the language of community experts
Knowledge base content translation
Description:
○ Translation of knowledge base content for local language customers of technical solutions
Benefits:
o Reduces customer support costs and activity level by allowing remote language customers to directly access solutions
o Increases customer satisfaction by providing solutions in their native language
Case study: MT for online customer reviews
Requirements:
o Share customer reviews with international audiences
o Automate the translation of customer reviews into 13 languages
Results:
o Reduced bounce rate from 70% to 25%
o Increased user dwell times and page views
o Economically translate 2 billion words/month
Case study: MT for instant MS Office translation
[a large global retail client]
Requirements:
o Improve communication among geographically scattered company employees
o Fast, low-cost translation of MS Outlook emails & MS Office business documents
Results:
o BeGlobal Machine Translation integrated via API with MS Office apps
o Any employee can instantly translate emails or attachments with a simple double-click
Engine training: Making MT smarter
Customized engines
Domain verticals
Baselines
Baselines
o Data mined from reliable sources available in the public domain, covering various subjects
o Core generic MT engines for each language pair
o Work well for general & varied content
o Can be used as backup for verticals & customized engines
o Contain hundreds of millions of words (100Ms+) of bilingual data
Domain verticals
o Trained statistical engines exclusive to a domain
o Data selected from sources within a domain or industry
o MT output more likely to follow technical terminology
o Solution used when client-specific data is not available or not sufficient for a customization
Customized engines
o Optimize the MT output for specific client projects
o Training based on client-specific bilingual data
o More data usually has a positive effect on the MT output
o Quality & consistency of data is as important as quantity
o Adherence to client-specific terminology & style
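The three engine tiers described on these slides (customized, domain vertical, baseline), with the baseline acting as backup, suggest a simple fallback order. The sketch below is one hypothetical way to express that order; the registry layout, the engine names, and the `choose_engine` function are invented for illustration and are not part of any SDL product.

```python
def choose_engine(registry, language_pair, client=None, domain=None):
    """Pick the most specific engine available, falling back tier by tier.

    `registry` maps (tier, language_pair, qualifier) to an engine name.
    The qualifier is a client for customized engines, a domain for
    verticals, and None for baselines. All names here are hypothetical.
    """
    lookups = [
        ("customized", client),
        ("vertical", domain),
        ("baseline", None),
    ]
    for tier, qualifier in lookups:
        if tier != "baseline" and qualifier is None:
            continue  # nothing to look up at this tier
        engine = registry.get((tier, language_pair, qualifier))
        if engine is not None:
            return tier, engine
    raise LookupError(f"no engine available for {language_pair}")


registry = {
    ("baseline", "en-fr", None): "generic-en-fr",
    ("vertical", "en-fr", "retail"): "retail-en-fr",
    ("customized", "en-fr", "acme"): "acme-en-fr",
}

print(choose_engine(registry, "en-fr", client="acme", domain="retail"))
print(choose_engine(registry, "en-fr", client="other", domain="retail"))
print(choose_engine(registry, "en-fr"))
```

A client without enough bilingual data for a customization simply has no customized entry, so the lookup lands on the vertical or the baseline.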
How SDL trains an MT engine
[Workflow diagram. Phases: Content Evaluation, MT Customization, Production, QA]
Steps shown: Source Content; Apply Translation Memory; Training Data Prep & Engine Customization; Prep of Testing Material; Evaluate MT Output; Refine Training or Deploy for Production; Integrate MT on Translation Process; Machine Translation; Post-Edit; Quality Assessment & Translation Delivery; Update Translation Memory
Components shown: SDL MT Server, Translation Memory
Continuous improvement
SDL MT Group developers are constantly researching ways to improve Generic, Vertical, and Customized MT Engines
SDL Research Scientists are continuously improving the Statistical Machine Translation algorithms (e.g. Language Models, Translation Models, Reordering Models, Syntax, Transliteration, Rule-Based Components)
SDL Data Engineers are continuously mining large amounts of high-quality data used by the statistical algorithms
Introducing SDL XMT…
A NEW, modular & flexible technology that will power the “next generation” of SDL MT
o 2002: Word-based Machine Translation
o 2003: Phrase-based Machine Translation
o 2008: Syntax-based Machine Translation
o 2015: XMT
Legacy MT
[Diagram: Foreign Language in, Legacy MT (Monolithic Phrase-based), Your Language out]
SDL XMT: Next generation technology, higher quality
[Diagram: Foreign Language in, XMT, Your Language out]
Modular components: Neural Networks, Compound Splitting, Phrase-Based, Finite State Automata, String to Tree, Rule-Based, Tree to String, Pre-Ordering, Transliteration, Hidden Markov Model, HyperGraphs, …
o Modular & Flexible
o “State-of-the-Art” Machine Learning
o Better Translation Quality
o Rapid Research Transition
Language Learning in XMT
Continuous improvement by learning from Post-Editing.
Machine Translation
○ The machine learns how to translate from source to target during the training process
○ The machine does not learn during the translation process
Machine Translation + Language Learning
○ The machine learns how to translate from source to target during the training process
○ The machine learns & improves seamlessly, continuously, and in real time from user feedback during the translation process
○ See it in action: SDL XMT
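The contrast on this slide, an engine that learns only during training versus one that also improves from post-editing feedback, can be illustrated with a toy wrapper. This is deliberately simplistic and is not how XMT works internally: it only remembers exact corrected segments, whereas a real adaptive engine would generalize from feedback. The class name and the `base_translate` stand-in are assumptions made for the example.

```python
class AdaptiveTranslator:
    """Toy illustration: a fixed engine plus a store of post-edited segments."""

    def __init__(self, base_translate):
        # base_translate stands in for an engine whose behavior is fixed
        # after training (the plain "Machine Translation" case).
        self._base_translate = base_translate
        self._corrections = {}

    def translate(self, source):
        # Prefer what a post-editor already corrected for this exact segment.
        if source in self._corrections:
            return self._corrections[source]
        return self._base_translate(source)

    def post_edit(self, source, corrected):
        # Feedback arrives during the translation process and takes effect
        # on the next request, with no retraining step.
        self._corrections[source] = corrected


def base_translate(source):
    # Hypothetical static engine output.
    return f"[MT] {source}"


engine = AdaptiveTranslator(base_translate)
print(engine.translate("Add to cart"))
engine.post_edit("Add to cart", "Ajouter au panier")
print(engine.translate("Add to cart"))
```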
How to Deploy MT Post-Edit
Post-Editing experience in Montreal
SDL Canada: Post-Editing Large Retail Customers' e-Commerce Sites
Quality is delivered & owned by SDL, so commitment to quality remains our number one priority!
o Cost reductions of up to 40% vs. conventional translation
o File formats received: TXT, XL, and XML
o Unique client-specific process developed in collaboration with engineering & IT teams from SDL & customers
[Chart: words post-edited per year, 2013 to 2015, including a forecasted figure; values shown: 10M, 15M and 25M words]
Quality in MT
Building blocks are there as a lot of content is pulled from the engines
Allows the linguist to focus on refining the output
Custom engines pull in client terminology & style
Fewer resources equals greater consistency
Trained linguists well-versed in handling MT output & certified
Post-Editing quality requirements
When post-editing to publishable quality, the following basic principles still apply:
o The same references must be used as for conventional translation (project-specific guidelines, TMs, glossaries, termbases, etc.)
o Grammar, spelling and punctuation must be correct
o Appropriate style & correct terminology must be used consistently
o The translation must read well and be suitable for its intended purpose
Customer User Guide
Features to watch out for in SMT output…
Incorrect Formatting
Additional or Missing Words
Words Not Localized or Wrong Flavor
Gender, Number, Agreement or Verb Inflection Issues
Articles & Prepositions
Syntax & Word Order Issues
Wrong Punctuation
Inconsistent or Non-compliant Terminology
Mistranslations
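A few of the issue types listed on this slide lend themselves to simple automated checks that can flag segments for a post-editor's attention. The sketch below covers only three of them (formatting tags, terminology, final punctuation) and is an illustrative assumption, not an SDL tool: the function name, the glossary format, and the tag pattern are all hypothetical, and punctuation conventions that differ between languages would need more care.

```python
import re

END_MARKS = set(".!?:;")
TAG_PATTERN = re.compile(r"</?\w+[^>]*>")


def check_segment(source, target, glossary):
    """Return a list of potential issues for one source/target segment pair."""
    issues = []

    # Incorrect formatting: inline tags should survive translation.
    if sorted(TAG_PATTERN.findall(source)) != sorted(TAG_PATTERN.findall(target)):
        issues.append("formatting: inline tags differ between source and target")

    # Non-compliant terminology: glossary terms must use the approved target.
    for src_term, tgt_term in glossary.items():
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            issues.append(f"terminology: expected '{tgt_term}' for '{src_term}'")

    # Wrong punctuation: compare the final punctuation mark, if any.
    src_end = source.rstrip()[-1:]
    tgt_end = target.rstrip()[-1:]
    src_mark = src_end if src_end in END_MARKS else ""
    tgt_mark = tgt_end if tgt_end in END_MARKS else ""
    if src_mark != tgt_mark:
        issues.append("punctuation: final punctuation differs")

    return issues


glossary = {"Save": "Enregistrer"}
source = "Click <b>Save</b> to store the file."
raw_mt = "Cliquez sur Sauvegarder pour stocker le fichier"
post_edited = "Cliquez sur <b>Enregistrer</b> pour stocker le fichier."

for issue in check_segment(source, raw_mt, glossary):
    print(issue)
print(check_segment(source, post_edited, glossary))
```

Checks like these cannot detect mistranslations or agreement errors, which is why the slide's list remains a checklist for the human post-editor.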
Post-Editing Machine Translation certification
○ The demand for MT solutions is growing quickly & Post-Editing is becoming a mainstream skill for translators
○ In response, SDL has created the Post-Editing Certification, released in June 2014
○ 85% of in-house staff completed the Certification in 2014
○ 2,500+ freelancers signed up for the course
○ The Certification covers the theory behind Machine Translation as well as practical approaches to Post-Editing
○ Our Certification is for anyone impacted by Post-Editing – certified translators can offer an extended skill set
SDL iMT: Key steps in the process
○ Evaluate content and translation assets
○ Train MT engines for your content or use an existing solution
○ Configure the trained MT engines with SDL’s translation environment (TMS, WS, Studio)
○ Post-edit the MT output to full publishable quality
○ SDL infrastructure to support these steps
[Diagram: Evaluate, Train MT, Configure, Post-Edit, supported by SDL Infrastructure]
Copyright © 2008-2015 SDL plc. All rights reserved. All company names, brand names, trademarks, service marks,
images and logos are the property of their respective owners.
This presentation and its content are SDL confidential unless otherwise specified, and may not be copied, used or
distributed except as authorised by SDL.
Global Customer Experience Management