IBM T.J. Watson Research Center © 2010 IBM Corporation REAL-TIME TRANSLATION SERVICES Salim Roukos Sr. Manager, Multilingual NLP Technologies CTO Translation Technologies [email protected]


3 Machine Translation application segments:

Text-to-Text (T2T): text translation
• The basic building block of machine translation
• Used for text such as documents, websites, blogs, chat, etc.
• Sample customers: commercial technical support, global enterprise

Speech-to-Text (S2T): speech recognition, then translation
• Requires speech recognition technology
• Used to monitor media, telephone intercepts, etc.
• Sample customers: media companies

Speech-to-Speech (S2S): speech recognition, then translation, then speech synthesis
• Requires speech recognition and speech generation technologies
• Used to communicate in the field, conduct field interrogations, etc.
• Sample customers: military, law enforcement, hospitals, business travelers, service industry
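The way the three segments build on one another can be sketched in a few lines of Python. This is only an illustration: the function names, the toy glossary, and the use of UTF-8 bytes as a stand-in for audio are all assumptions made for the example, not part of RTTS or any IBM API.

```python
# Hypothetical stand-ins for the three component technologies.
# Real systems would call speech and translation engines here.

def recognize_speech(audio: bytes) -> str:
    """Speech recognition stub: treat the 'audio' bytes as UTF-8 text."""
    return audio.decode("utf-8")

def translate_text(text: str) -> str:
    """Text translation stub: word-by-word lookup in a toy glossary."""
    glossary = {"hola": "hello", "mundo": "world"}
    return " ".join(glossary.get(word, word) for word in text.lower().split())

def synthesize_speech(text: str) -> bytes:
    """Speech synthesis stub: encode the text back into 'audio' bytes."""
    return text.encode("utf-8")

def t2t(text: str) -> str:
    """Text-to-Text: translation only."""
    return translate_text(text)

def s2t(audio: bytes) -> str:
    """Speech-to-Text: recognition followed by translation."""
    return translate_text(recognize_speech(audio))

def s2s(audio: bytes) -> bytes:
    """Speech-to-Speech: recognition, translation, then synthesis."""
    return synthesize_speech(s2t(audio))

if __name__ == "__main__":
    print(t2t("Hola mundo"))
    print(s2t(b"hola mundo"))
    print(s2s(b"hola mundo"))
```

Each segment reuses the one before it, which is why text translation is described as the basic building block.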


Statistical Machine Translation

• Translation between natural languages using corpus-based methods
• Learn from a parallel corpus: example translations by humans
• Easier to create MT systems for new language pairs and domains

[Diagram: English text passes through an E->F translator to produce foreign text; foreign text passes through an F->E translator to produce English text.]
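As an illustration of learning from a parallel corpus, the sketch below estimates word translation probabilities with the classic IBM Model 1 expectation-maximization procedure. It is a textbook simplification (no NULL word, no alignment or language model) and is not claimed to be the method used in the systems described here; the three-sentence German-English corpus is invented for the example.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate t(e | f) from (foreign_words, english_words) sentence pairs."""
    english_vocab = {e for _, es in pairs for e in es}
    # Start from a uniform distribution over the English vocabulary.
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (e, f) links
        total = defaultdict(float)   # expected count of f being linked
        for fs, es in pairs:
            for e in es:
                # E-step: share one unit of count for e among the words in fs.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize the expected counts into probabilities.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t

# Toy parallel corpus (assumed for illustration only).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

t = train_model1(corpus)
for e, f in [("the", "das"), ("house", "haus"), ("book", "buch"), ("a", "ein")]:
    print(f"t({e} | {f}) = {t[(e, f)]:.2f}")
```

No dictionary is supplied: the probabilities emerge purely from which words co-occur across the human-translated sentence pairs, which is why adding a new language pair mainly requires new parallel data.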


Bleu: Improving Quality of IBM Arabic-English SMT Over Time

[Chart: BLEU score (0% to 60%) by evaluation date: Mar-02, Jun-02, Apr/May-03, Jan-04, May-04, Aug-04, Sep-04, Jun-05, Jun-06, Jun-07. Series: AFP news (3 ref), MT03 (4 ref), COTS.]

• AFP corpus: 177 sentences, 3 reference translations
• MT03 corpus: 663 sentences, 4 reference translations
• 1st SMT product to market: Aug 2003
• Word precision: 1-word = 82%, 2-word = 54%, 3-word = 37%, 4-word = 25%
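The word-precision figures relate to a BLEU score through a geometric mean. The sketch below combines the four n-gram precisions quoted on the slide under the standard BLEU definition, assuming uniform weights and a brevity penalty of 1.0 (the slide does not give the brevity penalty, so that value is an assumption).

```python
import math

# 1-word to 4-word precisions quoted on the slide.
precisions = [0.82, 0.54, 0.37, 0.25]

def bleu_from_precisions(precisions, brevity_penalty=1.0):
    """Geometric mean of n-gram precisions, scaled by a brevity penalty."""
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty * math.exp(log_mean)

score = bleu_from_precisions(precisions)
print(f"BLEU = {score:.3f}")
```

Under these assumptions the four precisions combine to a BLEU of roughly 0.45, which falls within the chart's 0% to 60% range.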


Rosetta Consortium

- Structured: NW, BN
- Unstructured: Blog, BC
- Arabic, Chinese, English
- Aggressive GNG goals
- Scalable to 10's of languages
- 24x7 utility system
- Personalized systems

Multi-Lingual & Multi-Modal Information Management

[Diagram: inputs (video, audio, wire service, fax, web/blog crawls, text, image) feed Unstructured Information Management Architecture components (video extraction, speech transcription, optical character recognition, language translation, distillation) together with a geo-spatial KB. Languages shown: Arabic, Chinese, French, Hindi, Spanish. Example query: "Show me reports on Sarkozy's visit to Middle East".]


IBM TALES - A News Monitor/Search Solution

• 24x7 real-time monitoring of foreign news
• Watch live TV news with generated English captions
• Browse and translate web journalism
• Languages supported:
  – English
  – Modern Standard Arabic
  – Mandarin Chinese
  – Spanish
  – Farsi
• Search archived media
• Information extraction
• Scalable architecture

[Figures: technology stack; software GUI]

RTTS Powers IBM n.Fluent Pilot
Secure, enterprise-strength, real-time translation for the Global Enterprise

• 35-year history of investment and development via the TJ Watson Research Center
• RTTS-based n.Fluent Machine Translation Portal is capable of automatically translating:
  – Text
  – Web pages
  – Documents
  – Instant messaging: Sametime chats (via a Sametime plug-in)
  – Mobile (BlackBerry and others) translation applications
• Language pairs: English to/from Arabic, Simplified and Traditional Chinese, French, German, Japanese, Korean, Italian, Portuguese, Russian, and Spanish
• APIs to RTTS and n.Fluent services allow other applications to access speech and translation services
• Refined through a crowdsourcing model: the n.Fluent application is "tuned" using input from 400,000 IBMers in 170 countries

Real Time Translation Services: Robust Platform for Language Translation

[Architecture diagram, listed by component:]
• Channels: Chat, Voice (VoIP), Web/Text, SMS/MMS (data), Voice
• RTTS Application Interfaces
• Services: Speech Recognition Services, Translation Services, Text to Speech Services, Analytics Services
• Languages and Domains
• Operations & Management
• UIMA EE Infrastructure
• WAS Foundation

n.Fluent Translation System

[Architecture diagram, listed by component:]
• Translation Solutions: Sametime plug-in, Lotus Notes plug-in, BlackBerry Translator
• n.Fluent Server Interfaces
• n.Fluent Server: Text Translation, Crowdsourcing, Document Repository, User Profile Management
• RTTS Client, RTTS Interfaces, RTTS Server
• Database Server: Document Repository Database, User Profile Database

About the name:
• "n." means n-way and represents the crowd
• "Fluent" means "able to express oneself readily and effortlessly" and represents translation quality

n.Fluent Overview
Secure, enterprise-strength, real-time translation for the Global Enterprise

• n.Fluent Machine Translation Portal:
  – Cut & paste text
  – Web pages
  – Documents
  – Instant messaging: Sametime chats (via a Sametime plug-in)
  – Mobile (BlackBerry and others) translation application
• Language pairs: English to/from Arabic, Simplified and Traditional Chinese, French, German, Japanese, Korean, Italian, Portuguese, Russian, and Spanish
• APIs to RTTS and n.Fluent services allow other applications to access speech and translation services
• In the process of moving from the Technology Adoption Program (TAP) environment to a 24x7 supported environment
• Crowdsourcing: leverage 400K IBMers

Languages: العربية, 中文, Deutsch, English, Français, Italiano, 日本語, Português, Русский, Español, 한국어

n.Fluent Text Translation Portal


SMT-Based Technology Increases Accuracy through 'Domains'

Domain: Topic of conversation that the translation engine will recognize and translate

Domain Examples:
• Base Domains (News), Conversational, Adaptations
• Travel, Medical, Military, Commerce, IT Help, Custom

Domain Model Creation Steps (Collect, Clean, Translate, Build, Test):
1. Collect training data in both languages
2. Review and 'clean' the data
3. Translate each sentence to the other language to create the parallel corpus
4. Execute the model building tools to create the statistical models
5. Test the created models
6. Repeat 1-5, refining models and improving weaknesses identified during accuracy testing

• An SMT domain model is based on a 'parallel corpus': English training data paired with human-translated foreign training data
• Domain model creation is an iterative process
• Custom domains are created by enhancing a broad base domain, and must be created for each language pair
• The quality / size of the domain model directly impacts the translation quality

[Chart: BLEU (0 to 0.5) versus domain training data size: Base, 29k words, 180k words, 350k words.]

© Copyright IBM Corporation 2010

January 31, 2010

n.Fluent Update