-
Speech Processing 15-492/18-492
Speech Translation
Case study: Transtac
Details
-
Transtac: Two-Way S2S System
• DARPA developed for
  ◦ Check points, medical and civil defense
• Requirements
  ◦ Two way
  ◦ Eyes-free (no screen)
  ◦ Portable
  ◦ Usable by real users
-
Transtac System
Laptop secured in Backpack
Optional speech control
Push-to-Talk Buttons
Close-talking Microphone
Small powerful Speakers
-
Transtac System Details
• Two-way system
  ◦ 2 ASR systems: English and Iraqi
  ◦ 2-way statistical translation
  ◦ 2 synthesizers
• Push-to-talk system
  ◦ (Users don’t like “translate everything mode”)
• Echo back ASR result
  ◦ And then translation
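The turn flow on this slide (push to talk, echo back the ASR result, then speak the translation) can be sketched roughly as below. This is a minimal illustration, not the Transtac code: the `Direction` class, `handle_push_to_talk`, and all four callables are hypothetical names standing in for the real ASR, MT, and TTS components.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Direction:
    """One direction of the two-way system, e.g. English -> Iraqi (hypothetical)."""
    recognize: Callable[[bytes], str]     # ASR for the source language
    translate: Callable[[str], str]       # statistical MT, source -> target
    speak_source: Callable[[str], None]   # TTS in the source language (echo back)
    speak_target: Callable[[str], None]   # TTS in the target language


def handle_push_to_talk(audio: bytes, d: Direction) -> str:
    """Process one utterance captured while the talk button was held."""
    hypothesis = d.recognize(audio)
    d.speak_source(hypothesis)            # echo back the ASR result first
    translation = d.translate(hypothesis)
    d.speak_target(translation)           # then speak the translation
    return translation
```

A full two-way system would hold two such `Direction` objects, one per push-to-talk button.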
-
Iraqi Language
• Iraqi Arabic is a dialect
  ◦ Most Iraqis write Modern Standard Arabic
  ◦ Most Iraqis do not write their own dialect
• No standardized spelling
  ◦ Transtac project invented one
  ◦ But Iraqis may not be used to it
• Arabic (MSA and dialects)
  ◦ Do not write short vowels in words
-
Data for Training
• Collected human-mediated dialogs
  ◦ Human acts as a machine
  ◦ Passed a microphone back and forth
  ◦ Try to get people not to talk at the same time
• Large number of collections (over 4 years)
  ◦ 650 thousand sentence pairs
  ◦ Many different speakers
  ◦ Hand transcribed by experts (in Iraqi spelling)
  ◦ Hand translated (source sentences and interpreter’s)
-
Iraqi ASR
• Acoustic model from Iraqi data
  ◦ Based on MSA phoneset
• Needs to be small fast models
  ◦ Discriminative training
  ◦ Speaker-specific adaptation
• Lexicon
  ◦ Based on LDC-provided lexicon
  ◦ Multiple pronunciations/typos still a problem
  ◦ Statistically trained LTS rules
• Language model
  ◦ Trained on Iraqi input (and translated output)
-
English ASR
• Acoustic model
  ◦ Originally using other models
  ◦ Then trained from collected data
  ◦ (Mostly military personnel)
• Lexicon
  ◦ Existing lexicon but needed to add military speak: MRAP, IED
• Language model
  ◦ Trained from data provided
  ◦ Trained from “similar” data found on the web
  ◦ Training from hand-created “typical” examples
-
TTS
• Standard English TTS
  ◦ Appropriate “command” voice
  ◦ Unit selection
  ◦ Added lots of military vocabulary
• Iraqi TTS
  ◦ Recorded from Iraqi radio announcer
  ◦ Based on example sentences in the domain
  ◦ LDC lexicon and LTS rules (same as ASR)
  ◦ Hand tuned
-
S2S Interface Issues
• How do you teach people to use the system?
  ◦ “Transtac say instructions”
  ◦ Not really sufficient
• How can you tell it translated correctly?
  ◦ Give (speech) feedback
  ◦ Backtranslation
  ◦ ASR echo back
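Backtranslation means translating the output back into the speaker's language so the speaker can judge whether the meaning survived. The slide describes this as spoken feedback; the automatic word-overlap score below is an added assumption for illustration only, and `forward` and `backward` are hypothetical stand-ins for the two MT directions.

```python
from typing import Callable


def backtranslation_overlap(source: str,
                            forward: Callable[[str], str],
                            backward: Callable[[str], str]) -> float:
    """Fraction of source words that survive a round trip through MT."""
    translated = forward(source)       # source -> target language
    round_trip = backward(translated)  # target -> back to source language
    src_words = source.lower().split()
    if not src_words:
        return 1.0                     # nothing to lose in an empty utterance
    back_words = set(round_trip.lower().split())
    kept = sum(1 for w in src_words if w in back_words)
    return kept / len(src_words)
```

A low score could trigger a request to rephrase, though in practice the round-trip text would more likely just be spoken back to the user.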
-
S2S Interface Issues
• How do you translate names?
  ◦ A correct translation/transliteration is hard to understand
• Mark names in translations
  ◦ “My name is … Abdullah”
  ◦ “He lives on … al-Aqar … street”
-
S2S Evaluation (Transtac)
• Offline tests
  ◦ ASR->Text and Text->Text
  ◦ Compare to translation references
  ◦ WER and “BLEU” score
• Online tests
  ◦ Concept transfer (through defined scenarios)
  ◦ Speed (number of concepts per minute)
  ◦ (English speech masking)
• Utility tests
  ◦ Does it really work?
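For the offline ASR tests, word error rate (WER) is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below uses the standard definition with plain whitespace tokenization, which is an assumption; real scoring tools also normalize text first. BLEU is not shown.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # deletion of a reference word
                          curr[j - 1] + 1,      # insertion of a hypothesis word
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words.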
-
Transtac Participants
• Developer groups
  ◦ IBM
  ◦ SRI
  ◦ BBN
  ◦ CMU
  ◦ USC
• Evaluations
  ◦ Twice a year in Iraqi (somewhere in DC)
  ◦ One surprise language (Farsi, Bahasa Malay)
  ◦ Other evaluations with military groups
-
Does it work??
• Yes, mostly
  ◦ 27 concepts out of 30-ish turns
• Systems are mostly similar
  ◦ But some better than others
• Other techniques
  ◦ Belt/holster-based PC with handheld speaker
  ◦ Small PC in pouch
  ◦ Chest-mounted array microphone
-
S2S ASR Advanced Issues
• Tight coupling
  ◦ ASR should output N-best
  ◦ Translate all (lattice)
  ◦ Choose best translation
  ◦ (MT as a LM for ASR)
• Remove disfluencies/hesitations
• Add more relevant data
  ◦ Automatically convert past tense/third person data to present tense/first+second person …
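The tight-coupling idea (translate every N-best hypothesis, then keep the one whose translation scores best) can be sketched as below. The log-score inputs, the `mt_weight` parameter, and the function name are assumptions for illustration; the slide does not say how the scores were actually combined, and a real system would work over a lattice rather than a flat list.

```python
from typing import Callable, List, Tuple


def pick_translation(nbest: List[Tuple[str, float]],
                     translate: Callable[[str], Tuple[str, float]],
                     mt_weight: float = 1.0) -> Tuple[str, str]:
    """Choose the (hypothesis, translation) pair with the best combined score.

    nbest:     (ASR hypothesis, ASR log score) pairs.
    translate: returns (translation, MT log score) for one hypothesis.
    """
    if not nbest:
        raise ValueError("nbest list is empty")
    best = None
    for hypothesis, asr_score in nbest:
        translation, mt_score = translate(hypothesis)
        total = asr_score + mt_weight * mt_score  # MT acts like an extra LM
        if best is None or total > best[0]:
            best = (total, hypothesis, translation)
    return best[1], best[2]
```

With this combination, a hypothesis ranked second by ASR can still win if it translates much more confidently.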
-
S2S TTS Advanced Issues
• MT output isn’t grammatical
  ◦ TTS doesn’t care and just says it
  ◦ TTS should try to say MT output with more breaks
• TTS (unit selection)
  ◦ As a LM on MT output
  ◦ Choose the translation that can be said best
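Using the synthesizer as a language model on MT output could look roughly like this: score each candidate translation by how well the unit-selection voice can say it, and keep the cheapest. Here `synthesis_cost` is a hypothetical callable standing in for the synthesizer's total join/target cost, and the per-word normalization is an added assumption so that short candidates are not favored automatically.

```python
from typing import Callable, List


def pick_most_speakable(candidates: List[str],
                        synthesis_cost: Callable[[str], float]) -> str:
    """Return the candidate translation with the lowest per-word synthesis cost."""
    if not candidates:
        raise ValueError("no candidate translations given")

    def per_word_cost(text: str) -> float:
        n_words = max(len(text.split()), 1)  # guard against empty strings
        return synthesis_cost(text) / n_words

    return min(candidates, key=per_word_cost)
```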