Speech Processing 15-492/18-492 Speech Translation Case study: Transtac Details


Post on 03-Feb-2021


TRANSCRIPT

  • Speech Processing 15-492/18-492

    Speech Translation Case study: Transtac Details

  • Transtac: Two-Way S2S System

    DARPA developed for
      - Checkpoints, medical and civil defense

    Requirements
      - Two way
      - Eyes-free (no screen)
      - Portable
      - Usable by real users

  • Transtac System

    Laptop secured in Backpack

    Optional speech control

    Push-to-Talk Buttons

    Close-talking Microphone

    Small powerful Speakers

  • Transtac System Details

    Two way system
      - 2 ASR systems: English and Iraqi
      - 2 way statistical translation
      - 2 synthesizers

    Push-to-talk system
      - (Users don’t like “translate everything mode”)

    Echo back ASR result
      - And then translation
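The turn-taking flow on this slide can be sketched in a few lines. This is only an illustration of the order of operations (recognize, echo back, translate, speak); the `Direction` class, its field names and the stub components are hypothetical and are not taken from any Transtac system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Direction:
    """One translation direction, e.g. English -> Iraqi (hypothetical structure)."""
    recognize: Callable[[bytes], str]     # ASR: audio -> source-language text
    translate: Callable[[str], str]       # MT: source text -> target text
    speak_source: Callable[[str], None]   # TTS in the speaker's language
    speak_target: Callable[[str], None]   # TTS in the listener's language


def handle_push_to_talk(audio: bytes, direction: Direction) -> str:
    """Process one utterance captured while the push-to-talk button was held."""
    recognized = direction.recognize(audio)
    direction.speak_source(recognized)    # echo back the ASR result first
    translated = direction.translate(recognized)
    direction.speak_target(translated)    # and then the translation
    return translated


# Stub components so the sketch runs without any speech software.
spoken: List[str] = []
english_to_iraqi = Direction(
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: f"<iraqi translation of: {text}>",
    speak_source=lambda text: spoken.append(f"EN TTS: {text}"),
    speak_target=lambda text: spoken.append(f"IQ TTS: {text}"),
)

result = handle_push_to_talk(b"open the trunk", english_to_iraqi)
print(spoken)
```

A second `Direction` with the roles swapped would cover Iraqi to English, reusing the same two synthesizers.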

  • Iraqi Language

    Iraqi Arabic is a dialect
      - Most Iraqis write Modern Standard Arabic
      - Most Iraqis do not write their own dialect

    No standardized spelling
      - Transtac project invented one
      - But Iraqis may not be used to it

    Arabic (MSA and dialects)
      - Do not write short vowels in words
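To make the last point concrete, the sketch below strips the optional diacritics from a fully vowelized Arabic word, leaving the consonant skeleton that is normally written. The example word and the function name `strip_diacritics` are illustrative choices, not part of the course material. Note that removing every combining mark also drops non-vowel marks such as shadda.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (Unicode category Mn), which in Arabic script
    include the short-vowel signs that everyday writing leaves out."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# kaf + fatha, ta + fatha, ba + fatha: "kataba" written with all short vowels
vowelized = "\u0643\u064E\u062A\u064E\u0628\u064E"
plain = strip_diacritics(vowelized)

print(len(vowelized), len(plain))  # 6 code points become 3
```

Because the written form carries only the consonants, the pronunciation lexicon and letter-to-sound rules have to supply the vowels for both ASR and TTS.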

  • Data for Training

    Collected human mediated dialogs
      - Human acts as a machine
      - Passed a microphone back and forth
      - Try to get people not to talk at the same time

    Large number of collections (over 4 years)
      - 650 thousand sentence pairs
      - Many different speakers
      - Hand transcribed by experts (in Iraqi spelling)
      - Hand translated (source sentences and interpreter’s)

  • Iraqi ASR

    Acoustic model from Iraqi data
      - Based on MSA phoneset

    Needs to be small fast models
      - Discriminative training
      - Speaker specific adaptation

    Lexicon
      - Based on LDC provided lexicon
      - Multiple pronunciations/typos still a problem
      - Statistically trained letter-to-sound (LTS) rules

    Language model
      - Trained on Iraqi input (and translated output)

  • English ASR

    Acoustic model
      - Originally using other models
      - Then trained from collected data
      - (Mostly military personnel)

    Lexicon
      - Existing lexicon but needed to add military speak: MRAP, IED

    Language model
      - Trained from data provided
      - Trained from “similar” data found on the web
      - Trained from hand-created “typical” examples

  • TTS

    Standard English TTS
      - Appropriate “command” voice
      - Unit selection
      - Added lots of military vocabulary

    Iraqi TTS
      - Recorded from Iraqi radio announcer
      - Based on example sentences in the domain
      - LDC lexicon and LTS rules (same as ASR)
      - Hand tuned

  • S2S Interface Issues

    How do you teach people to use the system?
      - “Transtac say instructions”
      - Not really sufficient

    How can you tell it translated correctly?
      - Give (speech) feedback
      - Backtranslation
      - ASR echo back

  • S2S Interface Issues

    How do you translate names?
      - A correct translation/transliteration is hard to understand

    Mark names in translations
      - “My name is … Abdullah”
      - “He lives on … al-Aqar … street”
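One way to read the examples above is that a pause marker is placed around each name so the listener can tell where the name starts and ends. The sketch below reproduces the two example sentences under that reading. It assumes name spans have already been found by some other component; the function `mark_names` and its span format are hypothetical.

```python
from typing import List, Tuple

PAUSE = "…"  # stands in for a short break in the synthesized speech


def mark_names(tokens: List[str], name_spans: List[Tuple[int, int]]) -> str:
    """Insert a pause before each name and after it, unless the name ends
    the sentence. name_spans are half-open [start, end) token ranges."""
    boundaries = {start for start, _ in name_spans} | {end for _, end in name_spans}
    out: List[str] = []
    for i, token in enumerate(tokens):
        if i in boundaries:
            out.append(PAUSE)
        out.append(token)
    return " ".join(out)


print(mark_names("My name is Abdullah".split(), [(3, 4)]))
print(mark_names("He lives on al-Aqar street".split(), [(3, 4)]))
```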

  • S2S Evaluation (Transtac)

    Offline tests
      - ASR->Text and Text->Text
      - Compare to translation references
      - WER and “BLEU” score

    Online tests
      - Concept transfer (through defined scenarios)
      - Speed (number of concepts per minute)
      - (English speech masking)

    Utility tests
      - Does it really work?
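Word error rate (WER), used in the offline ASR tests, is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch follows; it splits on whitespace and does no text normalization, which a real scoring tool would normally do. The example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("the" -> "a") and one deletion ("the"): 2 errors / 6 words
print(word_error_rate("open the trunk of the car", "open a trunk of car"))
```

WER can exceed 1.0 when the hypothesis contains many inserted words, since the denominator is the reference length only.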

  • Transtac Participants

    Developer groups
      - IBM
      - SRI
      - BBN
      - CMU
      - USC

    Evaluations
      - Twice a year in Iraqi (somewhere in DC)
      - One surprise language (Farsi, Bahasa Malay)
      - Other evaluations with military groups

  • Does it work??

    Yes, mostly
      - 27 concepts out of 30-ish turns

    Systems are mostly similar
      - But some better than others

    Other techniques
      - Belt/holster based PC with handheld speaker
      - Small PC in pouch
      - Chest mounted array microphone

  • S2S ASR Advanced Issues

    Tight coupling
      - ASR should output N-best
      - Translate all (lattice)
      - Choose best translation
      - (MT as a LM for ASR)

    Remove disfluencies/hesitations

    Add more relevant data
      - Automatically convert past tense/third person data to present tense/first+second person …
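The tight-coupling idea can be sketched as N-best rescoring: translate every ASR hypothesis, add the ASR and MT scores, and keep the pair with the best total. Everything below is a toy illustration. The function `best_translation`, the weights, the log scores and the placeholder translations are invented, and a real system would work over a lattice rather than a short list.

```python
from typing import Callable, List, Tuple


def best_translation(
    nbest: List[Tuple[str, float]],
    translate: Callable[[str], Tuple[str, float]],
    asr_weight: float = 1.0,
    mt_weight: float = 1.0,
) -> Tuple[str, str]:
    """Return (hypothesis, translation) with the highest combined log score.

    nbest holds (hypothesis, asr_log_score) pairs; translate returns
    (translation, mt_log_score) for one hypothesis.
    """
    if not nbest:
        raise ValueError("nbest must not be empty")
    scored = []
    for hypothesis, asr_score in nbest:
        translation, mt_score = translate(hypothesis)
        total = asr_weight * asr_score + mt_weight * mt_score
        scored.append((total, hypothesis, translation))
    _, hypothesis, translation = max(scored, key=lambda item: item[0])
    return hypothesis, translation


# Toy MT: the misrecognized hypothesis translates poorly, so it scores low.
toy_mt = {
    "wear is the weapon": ("<T2>", -5.0),
    "where is the weapon": ("<T1>", -1.0),
}
nbest = [("wear is the weapon", -1.8), ("where is the weapon", -2.0)]

chosen = best_translation(nbest, lambda hyp: toy_mt[hyp])
print(chosen)
```

Here the ASR system alone prefers the first hypothesis, but the MT score overrules it, which is the sense in which MT acts as a language model for ASR.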

  • S2S TTS Advanced Issues

    MT output isn’t grammatical
      - TTS doesn’t care and just says it
      - TTS should try to say MT output with more breaks

    TTS (unit selection)
      - As a LM on MT output
      - Choose the best translation on what is said best
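A rough way to picture "TTS as a LM on MT output" is to prefer the candidate translation that the unit-selection voice can say with the fewest joins between recorded material. The sketch below uses a crude stand-in for join cost, namely the number of adjacent word pairs that never occur together in the recordings. The cost function, the recorded prompts and the candidates are all assumptions made for illustration, not a description of an actual synthesizer.

```python
from typing import List, Set, Tuple

Bigram = Tuple[str, str]


def bigrams(sentence: str) -> Set[Bigram]:
    """Adjacent word pairs in a sentence."""
    words = sentence.split()
    return set(zip(words, words[1:]))


def join_cost(sentence: str, recorded: Set[Bigram]) -> int:
    """Count adjacent word pairs that were never recorded together."""
    words = sentence.split()
    return sum(1 for pair in zip(words, words[1:]) if pair not in recorded)


def pick_most_speakable(candidates: List[str], recorded: Set[Bigram]) -> str:
    """Choose the candidate with the lowest join cost (ties keep MT order)."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return min(candidates, key=lambda s: join_cost(s, recorded))


# Hypothetical in-domain prompts recorded by the voice talent.
recorded = bigrams("please step out of the vehicle") | bigrams(
    "show me your identification"
)
candidates = [
    "step out from the vehicle please",
    "please step out of the vehicle",
]

pick = pick_most_speakable(candidates, recorded)
print(pick)
```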