Post on 19-Dec-2015
Approved for Public Release, Distribution Unlimited
Machine Translation at DARPA
Joseph Olive, Program Manager
Agenda
●Pre-GALE Programs and Studies
●DARPA and the Language Community
●GALE Plans
●GALE MT Evaluation
●GALE Accomplishments
●Future Research
Language Research at DARPA
●Four Decades of Research
●Continuous progress
● Limited vocabulary single talker
● Speaker-independent speech recognition
● Large vocabulary
● Machine translation
● Natural language processing
●TIDES and EARS
● Great Accomplishments
● Need for a New Program
GALE Program Goal
Enable automated processes and English-speaking soldiers and commanders to absorb and analyze all incoming information in a timely manner
●Genres
● Newswire
● Broadcast news
● Newsgroups
● Talk shows
● ...
●Languages
● Arabic
● Chinese
● ...
●Topics
● Unbounded
Planning for GALE
●The community offered:
● More Data
● Evaluations
● Word Error Rate - WER
● Bilingual Evaluation Understudy - BLEU
●DARPA Questions:
● What are the applications for the research?
● When is a technology good enough?
● What is new?
● How will progress be measured?
Pre-GALE Studies
●Main question – how good is good enough?
●New MT study
●Interpolation between human and machine translation
●Analysts as subjects
●The birth of Human-Targeted Translation Error Rate - HTER
●HTER is the GALE MT metric
HTER Translation Evaluation
Foreign Language Text & Speech
Accuracy = 1 – (No. of errors / No. of words)
Translators
Evaluators
Adjudicator
Human Editors who conduct comparison
Gold Standard Translation
GALE Machine Translation
Which is right? Can it be ambiguous?
Is it an idiom?
GALE Machine Translation Engine
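The accuracy formula on this slide can be sketched in a few lines of Python. The function name `hter_accuracy` is an illustrative assumption, not part of any GALE evaluation tool; its inputs are the number of errors counted by the human editors and the number of words in the translation.

```python
def hter_accuracy(num_errors: int, num_words: int) -> float:
    """Accuracy = 1 - (number of errors / number of words)."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    return 1.0 - num_errors / num_words


# Figures from the HTER editing example: 11 errors in 33 words.
print(f"{hter_accuracy(11, 33):.0%}")  # prints 67%
```

With 11 errors in 33 words the result is about 0.667, which matches the 67% accuracy quoted in the editing example.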
HTER Editing Example
Machine translation: The statement said that the brothers in the military wing to regulate Al Jihad base in the country had carried out the assassination of one of the criminals in the city of penalty.
Corrected machine translation: The statement said that the your brothers in the military wing to regulate Al Jihad base in the country had carried out the assassination of one of the criminals in the city of penalty.
1 error
Corrected machine translation: The statement said that the your brothers in the military wing to regulate of the Al Jihad base in the country had carried out the assassination of one of the criminals in the city of penalty.
5 errors
Corrected machine translation: The statement said that the your brothers in the military wing to regulate of the Al Qaeda Jihad base in the country had carried out the assassination of one of the criminals in the city of penalty.
6 errors
Corrected machine translation: The statement said that the your brothers in the military wing to regulate of the Al Qaeda Jihad organization base in the country Mesopotamia had carried out the assassination of one of the criminal tyrants in the city of penalty Baquba.
11 errors in 33 words (67% accuracy)
Legend: Deletion, Insertion
Human-Translated Reference: The statement said that “your brothers in the military wing of the Al-Qaeda Jihad Organization in Mesopotamia carried out an assassination of one of the criminal tyrants in the city of Baquba.”
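The edit counting illustrated above can be approximated in code. The sketch below is a simplification, not the GALE scoring procedure: it computes a plain word-level edit distance (insertions, deletions, and substitutions each cost one edit), whereas TER-style metrics also allow block shifts and the counts on the slide come from human editors. The function name `count_word_edits` is hypothetical.

```python
def count_word_edits(hypothesis: str, corrected: str) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn `hypothesis` into `corrected` (no block shifts)."""
    hyp = hypothesis.split()
    ref = corrected.split()
    # prev[j]: edits to turn the hypothesis words seen so far into ref[:j]
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]  # deleting all i hypothesis words yields an empty output
        for j, ref_word in enumerate(ref, start=1):
            substitution_cost = 0 if hyp_word == ref_word else 1
            curr.append(min(
                prev[j] + 1,                      # delete hyp_word
                curr[j - 1] + 1,                  # insert ref_word
                prev[j - 1] + substitution_cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


# First correction in the example: "the brothers" becomes "your brothers".
print(count_word_edits("said that the brothers in", "said that your brothers in"))  # prints 1
```

Dividing such a count by the number of words gives the error term in the accuracy formula used for HTER.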
New Technologies Implemented in GALE
●Topic-Dependent Language Modeling
●Morphology
●Extraction
●Syntax Analysis
●Hierarchical Classes
●Long Distance Language Models
●Semantic Analysis
●Predicate Argument Analysis
Arabic Translation Targets – Structured Language
[Figure: accuracy targets by program phase (Baseline, Φ1 to Φ5) for translation from text and translation from speech, with completed phases and pre-GALE levels marked. Axes: Accuracy (%) and % documents exceeding accuracy targets. Labels are given as % accuracy / % of documents; values shown: 35, 55, 65/80, 75/80, 75/90, 80/90, 85/85, 85/90, 90/85, 90/90, 90/95.]
Targets include accuracy and consistency
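Each target on this slide pairs an accuracy level with a share of documents, for example 90/90. A minimal sketch of checking such a target follows. The function name, the use of "at least" rather than strictly "exceeding", and the sample accuracies are all assumptions made for illustration; none of the numbers are GALE results.

```python
from typing import Sequence


def meets_target(doc_accuracies: Sequence[float],
                 accuracy_target: float,
                 doc_share_target: float) -> bool:
    """True if at least `doc_share_target` percent of documents reach
    `accuracy_target` percent accuracy."""
    if not doc_accuracies:
        raise ValueError("need at least one document")
    passing = sum(1 for acc in doc_accuracies if acc >= accuracy_target)
    share = 100.0 * passing / len(doc_accuracies)
    return share >= doc_share_target


# Hypothetical per-document accuracies in percent.
sample = [97.0, 95.5, 93.0, 92.0, 91.5, 91.0, 90.5, 90.0, 90.0, 84.0]
print(meets_target(sample, 90.0, 90.0))  # 9 of 10 documents reach 90%: True
print(meets_target(sample, 90.0, 95.0))  # 90% of documents is below 95%: False
```

Pairing the two numbers is what makes the target cover consistency as well as accuracy: a high average is not enough if too many individual documents fall below the accuracy level.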
Arabic Translation Results – Newswire
[Figure: Phase 4 Arabic newswire results. % Accuracy plotted against % of documents, with the Phase 4 target (90.0) marked.]
Arabic Progress
[Figure: Arabic Machine Translation. % error by phase (P1 to P4) for NW, WB, BN, and BC (Formal Text, Semi-Formal Text, Formal Audio, Semi-Formal Audio).]
Chinese Progress
[Figure: Chinese machine translation progress for Formal Text, Semi-Formal Text, Formal Audio, and Semi-Formal Audio.]
Human vs. Machine
GALE is as good as a single human in Arabic
[Figure, four panels: Human vs. Machine for Arabic Formal Text, Arabic Semi-Formal Text, Chinese Formal Text, and Chinese Semi-Formal Text. Each panel plots Percent Accuracy against Percent of Documents for four series: pass 1, pass 2, GALE P4, and P4-Target.]
Improving Translation of Chinese Speech
●Chinese transcription error rates are extremely low, but increase along with perplexity
●Improvement in translation of Chinese speech will require work in lowering perplexity
Evaluation Set   Formal Audio     Semi-Formal Audio   Overall
                 PPL    CER       PPL    CER          PPL    CER
Phase 2          21     2.7       33     14.8         26     8.5
Phase 3          30     4.6       33     18.7         31     11.7
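The table relates perplexity (PPL) to the transcription error rate (CER, presumably character error rate). The slides do not say which language model or test procedure produced the PPL figures, so the following is only a sketch of the standard definition: perplexity is the exponential of the average negative log-probability a model assigns to each token. The probabilities in the example are made up.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability over the tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must be in (0, 1]")
    mean_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_neg_log)


# A model that gives every token probability 1/4 has perplexity 4.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # prints 4.0
```

Lower perplexity means the language model finds the material more predictable, which is why the slide ties further gains in translating Chinese speech to lowering perplexity.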
Phoneme Transcription Experiment, Human Vs. Machine
●Overall Goal
● Assess the bounds of human phonetic recognition and compare with machines
●Previous Work
● Human recognition tested on artificial stimuli
● Results show that human accuracy is extremely high
● Artificial stimuli lack the complexity of natural speech
●The Problem
● Isolate phonetic recognition from language biases
● Human phonetic discrimination abilities are intimately tied with language, phonotactic and prosodic processing, and lexical and semantic familiarity
●Solution
● Use natural speech for stimuli
● Use transcribers who lack prosodic, phonotactic, lexical, and semantic information, but share a phoneme space
●Japanese speakers – Italian transcribers
●15 Human Subjects
●420 phonemes per subject
System           Subst   Del    Ins    PER
ASR HMM-CI       19.6    7.9    7.4    34.9
Human Average    15.3    8.6    5.9    29.9
Human Best        9.0    4.0    4.3    17.2
Human Worst      16.6   10.7   10.2    37.5
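The PER column in this table appears to be the sum of the substitution, deletion, and insertion rates, which is the usual way a phoneme error rate is composed. The short check below recomputes it from the table; the function name is an assumption made for illustration.

```python
def phoneme_error_rate(subst: float, deletions: float, insertions: float) -> float:
    """PER as the sum of substitution, deletion, and insertion rates (%)."""
    return subst + deletions + insertions


# (Subst, Del, Ins, reported PER) copied from the table.
rows = {
    "ASR HMM-CI": (19.6, 7.9, 7.4, 34.9),
    "Human Average": (15.3, 8.6, 5.9, 29.9),
    "Human Best": (9.0, 4.0, 4.3, 17.2),
    "Human Worst": (16.6, 10.7, 10.2, 37.5),
}

for name, (s, d, i, reported) in rows.items():
    computed = phoneme_error_rate(s, d, i)
    print(f"{name}: computed {computed:.1f}, reported {reported:.1f}")
```

The recomputed sums agree with the reported PER to within 0.1 for every row, which is consistent with the components having been rounded separately.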
Phoneme Transcription Experiment, Human Vs. Machine
●The difference between human and machine performance was around 10%
●Result indicates that progress in STT will require improved language models
Systems in Use Today
Real-time translation of Arabic, Chinese, Spanish*, or Farsi* broadcasts and web text into English
●BBN Broadcast Monitoring System & Web Monitoring System
●IBM Translingual Automated Language Exploitation System
“The Baghdad system was under extensive operation and the users were very pleased with its capability”
– LTC. John Venhaus, commanding officer for Joint PSYOP Group at CENTCOM (Oct. 2007)
*Farsi and Spanish were funded by outside sources.
“We are excited about the upgrades and think the program is a great asset to the Global War on Terror and beyond.”
– SFC Douglas Wilderman, 10th Special Forces Group (A) (Nov. 2008)
Broadcast Monitoring System* Arabic example
1. Real-time streaming video (~5 min delay)
2. Automatic transcription of Arabic speech
3. Automatic translation of Arabic transcript
Sample Fielded Arabic Translation:
Although there are no official sources, and accurate numbers of dead, many believe that the number this year is the largest since the American invasion of Iraq and the fall of Saddam Hussein’s regime two thousand three.
The estimated number of civilians killed daily in Iraq at least one hundred and twenty persons as well as the wounded.
DARPA Present Status
Success
● GALE – Groundbreaking Improvements in machine translation of Arabic and Chinese text and speech, in some cases approaching human performance
● TRANSTAC – New state of the art in two-way multilingual communication by speech for tactical use
● Deployment – GALE and TRANSTAC technologies have been integrated into operational systems and transitioned to users.
DARPA Present Status (Continued)
Limitations
● Lack of Flexibility – No ability to communicate or monitor informal language
  ● Conversations, chat, messaging, etc. are mostly informal
  ● Technology does not exist to cope with informal language models
● Lack of Reliability – Error propagation in multiple dialogue turns
  ● To perform multi-turn conversations and chat we need extremely high translation accuracies
  ● Need human-machine dialogue to clarify and disambiguate input to reduce probability of error
● Lack of Robustness – No capabilities to translate speech signals of less than 25 dB SNR
  ● Conversing and monitoring of conversation are often not in clean signal
  ● Transcriptions of degraded signals are unusable
● Lack of Generality – Costly and time-consuming methods to develop new language
  ● Cannot duplicate the GALE effort for each new language and dialect
  ● Huge parallel corpora – $60M-$160M/language
  ● Parallel corpora are insufficient
    ● e.g. Chinese corpora already consist of 200 million words
    ● Requires expensive and time-consuming annotations
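The reliability limitation (error propagation over multiple dialogue turns) can be illustrated with a rough calculation. Assume, purely for illustration, that every turn is translated correctly with the same probability p and that errors in different turns are independent; the chance that an n-turn exchange contains no error is then p to the power n. Neither assumption comes from the slides, and the function name is hypothetical.

```python
def prob_error_free_dialogue(per_turn_accuracy: float, turns: int) -> float:
    """Probability that all `turns` turns are translated correctly,
    assuming independent turns with equal per-turn accuracy."""
    if not 0.0 <= per_turn_accuracy <= 1.0:
        raise ValueError("per_turn_accuracy must be between 0 and 1")
    if turns < 0:
        raise ValueError("turns must be non-negative")
    return per_turn_accuracy ** turns


for p in (0.90, 0.99):
    print(p, round(prob_error_free_dialogue(p, 10), 3))
# prints:
# 0.9 0.349
# 0.99 0.904
```

Under these assumptions, 90% per-turn accuracy leaves only about a one-in-three chance of an error-free ten-turn conversation, which is why multi-turn use calls for very high accuracy or a clarification dialogue.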
Future Language Research Areas
● One-way translation – Monitoring
  ● Improvement of translation quality in languages very different from English (e.g. Chinese)
  ● Inclusion of informal genres – conversation, e-mail, web chat, messaging
  ● Extension into Arabic dialects – Modern Standard Arabic is seldom used in informal genres
  ● Fast acquisition of new language capabilities
  ● Robustness to noise
● Two-way translation – Communication
  ● Human-machine dialogue
  ● Human-human and human-computer verbal and text interaction
● Information retrieval – linguistically enabled search
  ● Accurate retrieval of relevant, non-redundant information
  ● Natural language query capability
● Language Understanding
  ● Grounded language comprehension through experiential learning of objects, actions, and consequences
These four thrusts share many underlying technologies
Future Algorithm Research
●Rugged Syntactic, Semantic Role Labeling, and Predicate-Argument Analysis
● Unconstrained topics and genres
● Use semantic equivalences
● Analysis of incomplete sentences and/or analysis of inconclusive acoustic output
● Projection of syntax and SRL from known to unknown languages
●Powerful Language Models
● Modeling non-adjacent words
● Utilizing syntactic and semantic information
● Using wild cards for incomplete sentences and/or inconclusive acoustic output
●Analysis and Translation of Longer Input
● Discourse threading
● Prosodic cues
● Coherency of topics
● Co-reference resolution
● Content analysis
Future Algorithm Research (Continued)
●Increasing reliability of two-way communication and natural language query
● Human-machine dialogue for clarification and disambiguation
● Automatic error detection
● Ambiguity resolution
● Language generation
● Multimodal input
●Semantic Role Labeling and Dependency Parsing Analysis in Both Source and Target Languages
●Dialects
● Translation from one dialect to another (e.g. Modern Standard Arabic to dialectal Arabic)
● Dialect detection and identification
●New Techniques in Automatic Evaluation of Translation Quality as a Target for Optimization and Automatic Quality Assessment
●Language Understanding
www.darpa.mil
Abstract: Defense Advanced Research Projects Agency (DARPA) Program Manager Joseph Olive will discuss the Chinese and Arabic machine translation work being carried out under DARPA's Global Autonomous Language Exploitation Program. Topics will include preparation for the program, the evaluation paradigm, the current status, and potential future research directions.