ASRU 2019 POSTER SESSION 1 asru2019.org/wp/wp-content/uploads/PosterSessions_ALL-1.pdf



Sunday 15th Dec 2019 Time: 10:30 am-12:00 noon SESSION CHAIR: Koichi Shinoda

Poster Board Room ID Paper Num Paper Title

P1 A ASR-1.1 1011 INCREMENTAL LATTICE DETERMINIZATION FOR WFST DECODERS

P2 A ASR-1.2 1013 A COMPARISON OF TRANSFORMER AND LSTM ENCODER DECODER MODELS FOR ASR

P3 A ASR-1.3 1017 A DROPOUT-BASED SINGLE MODEL COMMITTEE APPROACH FOR ACTIVE LEARNING IN ASR

P4 A ASR-1.4 1020 PERSONALIZATION OF END-TO-END SPEECH RECOGNITION ON MOBILE DEVICES FOR NAMED ENTITIES

P5 A ASR-1.5 1027 SIMULTANEOUS SPEECH RECOGNITION AND SPEAKER DIARIZATION FOR MONAURAL DIALOGUE RECORDINGS WITH TARGET-SPEAKER ACOUSTIC MODELS

P6 A ASR-1.6 1049 INTEGRATING SOURCE-CHANNEL MODEL WITH ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION

P7 A ASR-1.7 1053 AN INVESTIGATION INTO THE EFFECTIVENESS OF ENHANCEMENT IN ASR TRAINING AND TEST FOR CHIME-5 DINNER PARTY TRANSCRIPTION

P8 A ASR-1.8 1057 STATE-OF-THE-ART SPEECH RECOGNITION USING MULTI-STREAM SELF-ATTENTION WITH DILATED 1D CONVOLUTIONS

P9 A ASR-1.9 1061 HIGHLY EFFICIENT NEURAL NETWORK LANGUAGE MODEL COMPRESSION USING SOFT BINARIZATION TRAINING

P10 A ASR-1.10 1066 IMPROVED MULTI-STAGE TRAINING OF ONLINE ATTENTION-BASED ENCODER-DECODER MODELS

P11 A ASR-1.11 1068 LEAD2GOLD: TOWARDS EXPLOITING THE FULL POTENTIAL OF NOISY TRANSCRIPTIONS FOR SPEECH RECOGNITION

P12 A ASR-1.12 1076 ORTHOGONALITY CONSTRAINED MULTI-HEAD ATTENTION FOR KEYWORD SPOTTING

P13 A ASR-1.13 1080 LEARNING BETWEEN DIFFERENT TEACHER AND STUDENT MODELS IN ASR

P14 A ASR-1.14 1090 A UNIFIED ENDPOINTER USING MULTITASK AND MULTIDOMAIN TRAINING

P15 B ASR-1.15 1107 DOMAIN EXPANSION IN DNN-BASED ACOUSTIC MODELS FOR ROBUST SPEECH RECOGNITION

P16 B ASR-1.16 1108 IMPROVING RNN TRANSDUCER MODELING FOR END-TO-END SPEECH RECOGNITION

P17 B ASR-1.17 1113 SIMPLE GATED CONVNET FOR SMALL FOOTPRINT ACOUSTIC MODELING

P18 B ASR-1.18 1134 GANS FOR CHILDREN: A GENERATIVE DATA AUGMENTATION STRATEGY FOR CHILDREN SPEECH RECOGNITION

P19 B ASR-1.19 1154 ESPRESSO: A FAST END-TO-END NEURAL SPEECH RECOGNITION TOOLKIT

P20 B TTS.1 1081 ON THE STUDY OF GENERATIVE ADVERSARIAL NETWORKS FOR CROSS-LINGUAL VOICE CONVERSION

P21 B TTS.2 1089 WAVENET FACTORIZATION WITH SINGULAR VALUE DECOMPOSITION FOR VOICE CONVERSION

P22 B TTS.3 1094 A MODULARIZED NEURAL NETWORK WITH LANGUAGE-SPECIFIC OUTPUT LAYERS FOR CROSS-LINGUAL VOICE CONVERSION

P23 B TTS.4 1112 KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION

P24 B TTS.5 1126 INVESTIGATION OF SHALLOW WAVENET VOCODER WITH LAPLACIAN DISTRIBUTION OUTPUT

P25 B TTS.6 1131 LEARNING HIERARCHICAL REPRESENTATIONS FOR EXPRESSIVE SPEAKING STYLE IN END-TO-END SPEECH SYNTHESIS

P26 B TTS.7 1228 CONTROLLING EMOTION STRENGTH WITH RELATIVE ATTRIBUTE FOR END-TO-END SPEECH SYNTHESIS

P27 B TTS.8 1242 BOOTSTRAPPING NON-PARALLEL VOICE CONVERSION FROM SPEAKER-ADAPTIVE TEXT-TO-SPEECH

P28 B TTS.9 1245 IMPROVING MANDARIN END-TO-END SPEECH SYNTHESIS BY SELF-ATTENTION AND LEARNABLE GAUSSIAN BIAS

P29 B TTS.10 1358 TACOTRON-BASED ACOUSTIC MODEL USING PHONEME ALIGNMENT FOR PRACTICAL NEURAL TEXT-TO-SPEECH SYSTEMS

ASRU 2019 POSTER SESSION 1 LOCATION: EVENTS CENTER LEVEL 1

Sunday 15th Dec 2019 Time: 16:00 - 17:30 SESSION CHAIR: Tomi Kinnunen

Poster Board Room ID Paper Num Paper Title

P1 A ADV.1 1012 SPEAKER-AWARE SPEECH-TRANSFORMER

P2 A ADV.2 1086 SPEECH SEPARATION USING SPEAKER INVENTORY

P3 A ADV.3 1167 MIMO-SPEECH: END-TO-END MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION

P4 A ADV.4 1237 JOINT DISTRIBUTION LEARNING IN THE FRAMEWORK OF VARIATIONAL AUTOENCODERS FOR FAR-FIELD SPEECH ENHANCEMENT

P5 A ADV.5 1318 ANALYZING LARGE RECEPTIVE FIELD CONVOLUTIONAL NETWORKS FOR DISTANT SPEECH RECOGNITION

P6 A ADV.6 1342 FASNET: LOW-LATENCY ADAPTIVE BEAMFORMING FOR MULTI-MICROPHONE AUDIO PROCESSING

P7 A ADV.7 1374 DOMAIN ADAPTATION VIA TEACHER-STUDENT LEARNING FOR END-TO-END SPEECH RECOGNITION

P8 A ADV.8 1389 ADVANCES IN ONLINE AUDIO-VISUAL MEETING TRANSCRIPTION

P9 A SLR-1.1 1031 JOINT OPTIMIZATION OF CLASSIFICATION AND CLUSTERING FOR DEEP SPEAKER EMBEDDING

P10 A SLR-1.2 1037 EXPLORING EFFECTIVE DATA AUGMENTATION WITH TDNN-LSTM NEURAL NETWORK EMBEDDING FOR SPEAKER RECOGNITION

P11 A SLR-1.3 1054 END-TO-END NEURAL SPEAKER DIARIZATION WITH SELF-ATTENTION

P12 A SLR-1.4 1069 A CROSS-CORPUS STUDY ON SPEECH EMOTION RECOGNITION

P13 A SLR-1.5 1095 ADVERSARIAL ATTACKS ON SPOOFING COUNTERMEASURES OF AUTOMATIC SPEAKER VERIFICATION

P14 A SLR-1.6 1099 SPOKEN LANGUAGE IDENTIFICATION USING BIDIRECTIONAL LSTM BASED LID SEQUENTIAL SENONES

P15 B SLR-1.7 1104 TIME-DOMAIN SPEAKER EXTRACTION NETWORK

P16 B SLR-1.8 1120 SHORT UTTERANCE COMPENSATION IN SPEAKER VERIFICATION VIA COSINE-BASED TEACHER-STUDENT LEARNING OF SPEAKER EMBEDDINGS

P17 B SLR-1.9 1153 NOVEL ENHANCED TEAGER ENERGY BASED CEPSTRAL COEFFICIENTS FOR REPLAY SPOOF DETECTION

P18 B SLR-1.10 1165 SYLLABLE-DEPENDENT DISCRIMINATIVE LEARNING FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION

P19 B SLR-1.11 1173 LATENT SPACE REPRESENTATION FOR MULTI-TARGET SPEAKER DETECTION AND IDENTIFICATION WITH A SPARSE DATASET USING TRIPLET NEURAL NETWORKS

P20 B SLR-1.12 1210 SELF-ADAPTIVE SOFT VOICE ACTIVITY DETECTION USING DEEP NEURAL NETWORKS FOR ROBUST SPEAKER VERIFICATION

P21 B SLR-1.13 1243 SPHEREDIAR: AN EFFECTIVE SPEAKER DIARIZATION SYSTEM FOR MEETING DATA

P22 B SLR-1.14 1258 BAYESIAN ADVERSARIAL LEARNING FOR SPEAKER RECOGNITION

P23 B SLR-1.15 1270 AN INVESTIGATION OF LSTM-CTC BASED JOINT ACOUSTIC MODEL FOR INDIAN LANGUAGE IDENTIFICATION

P24 B SLR-1.16 1273 A MULTI PURPOSE AND LARGE SCALE SPEECH CORPUS IN PERSIAN AND ENGLISH FOR SPEAKER AND SPEECH RECOGNITION: THE DEEPMINE DATABASE

P25 B SLR-1.17 1302 NATIVE LANGUAGE IDENTIFICATION FROM RAW WAVEFORMS USING DEEP CONVOLUTIONAL NEURAL NETWORKS WITH ATTENTIVE POOLING

P26 B SLR-1.18 1322 SPEAKER VERIFICATION WITH APPLICATION-AWARE BEAMFORMING

ASRU 2019 POSTER SESSION 2 LOCATION: EVENTS CENTER LEVEL 1

Monday 16th Dec 2019 Time: 10:30 am - 12:00 noon SESSION CHAIR: Hemant Patil

Poster Board Room ID Paper Num Paper Title

P1 A ASR-2.1 1156 TRAINING LANGUAGE MODELS FOR LONG-SPAN CROSS-SENTENCE EVALUATION

P2 A ASR-2.2 1178 TRANSFORMER ASR WITH CONTEXTUAL BLOCK PROCESSING

P3 A ASR-2.3 1185 A DENSITY RATIO APPROACH TO LANGUAGE MODEL FUSION IN END-TO-END AUTOMATIC SPEECH RECOGNITION

P4 A ASR-2.4 1192 IMPROVING GRAPHEME-TO-PHONEME CONVERSION BY INVESTIGATING COPYING MECHANISM IN RECURRENT ARCHITECTURES

P5 A ASR-2.5 1193 A COMPARATIVE STUDY ON TRANSFORMER VS RNN IN SPEECH APPLICATIONS

P6 A ASR-2.6 1194 FROM SENONES TO CHENONES: TIED CONTEXT-DEPENDENT GRAPHEMES FOR HYBRID SPEECH RECOGNITION

P7 A ASR-2.7 1201 ATTENTION-BASED SPEECH RECOGNITION USING GAZE INFORMATION

P8 A ASR-2.8 1203 LISTENING WHILE SPEAKING AND VISUALIZING: IMPROVING ASR THROUGH MULTIMODAL CHAIN

P9 A ASR-2.9 1215 EMBEDDINGS FOR DNN SPEAKER ADAPTIVE TRAINING

P10 A ASR-2.10 1219 LANGUAGE MODEL BOOTSTRAPPING USING NEURAL MACHINE TRANSLATION FOR CONVERSATIONAL SPEECH RECOGNITION

P11 A ASR-2.11 1224 SPEAKER AND LANGUAGE AWARE TRAINING FOR END-TO-END ASR

P12 A ASR-2.12 1225 DATA AUGMENTATION BASED ON VOWEL STRETCH FOR IMPROVING CHILDREN'S SPEECH RECOGNITION

P13 A ASR-2.13 1234 MIXED BANDWIDTH ACOUSTIC MODELING LEVERAGING KNOWLEDGE DISTILLATION

P14 A ASR-2.14 1256 ON TEMPORAL CONTEXT INFORMATION FOR HYBRID BLSTM-BASED PHONEME RECOGNITION

P15 B ASR-2.15 1268 EXPLORING MODEL UNITS AND TRAINING STRATEGIES FOR END-TO-END SPEECH RECOGNITION

P16 B ASR-2.16 1269 QUERY-BY-EXAMPLE ON-DEVICE KEYWORD SPOTTING

P17 B ASR-2.17 1276 SMALL-FOOTPRINT KEYWORD SPOTTING WITH GRAPH CONVOLUTIONAL NETWORK

P18 B ASR-2.18 1327 SIMPLIFIED LSTMS FOR SPEECH RECOGNITION

P19 B ASR-2.19 1395 GENERALIZED LARGE-CONTEXT LANGUAGE MODELS BASED ON FORWARD-BACKWARD HIERARCHICAL RECURRENT ENCODER-DECODER MODELS

P20 B ASR-2.20 1409 END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM

P21 B S2S.1 1004 MULTILINGUAL END-TO-END SPEECH TRANSLATION

P22 B S2S.2 1218 NEURAL MACHINE TRANSLATION WITH ACOUSTIC EMBEDDING

P23 B S2S.3 1360 ONE-TO-MANY MULTILINGUAL END-TO-END SPEECH TRANSLATION

P24 B S2S.4 1390 SPEECH-TO-SPEECH TRANSLATION BETWEEN UNTRANSCRIBED UNKNOWN LANGUAGES

P25 B SDR.1 1055 ENHANCED BERT-BASED RANKING MODELS FOR SPOKEN DOCUMENT RETRIEVAL

P26 B SDR.2 1065 VIRTUAL ADVERSARIAL TRAINING FOR DS-CNN BASED SMALL-FOOTPRINT KEYWORD SPOTTING

P27 B SDR.3 1091 VERIFYING DEEP KEYWORD SPOTTING DETECTION WITH ACOUSTIC WORD EMBEDDINGS

P28 B SDR.4 1124 MULTILINGUAL BOTTLENECK FEATURES FOR QUERY BY EXAMPLE SPOKEN TERM DETECTION

P29 B SDR.5 1133 ADDITIONAL SHARED DECODER ON SIAMESE MULTI-VIEW ENCODERS FOR LEARNING ACOUSTIC WORD EMBEDDINGS

P30 B SDR.6 1257 EFFICIENT FREE KEYWORD DETECTION BASED ON CNN AND END-TO-END CONTINUOUS DP-MATCHING

ASRU 2019 POSTER SESSION 3 LOCATION: EVENTS CENTER LEVEL 1

Tuesday 17th Dec 2019 Time: 10:30 am - 12:00 noon SESSION CHAIR: Yifan Gong

Poster Board Room ID Paper Num Paper Title

P1 A NEW.1 1059 IMPROVING SPEECH ENHANCEMENT WITH PHONETIC EMBEDDING FEATURES

P2 A NEW.2 1098 DETECTING DECEPTION IN POLITICAL DEBATES USING ACOUSTIC AND TEXTUAL FEATURES

P3 A NEW.3 1205 END-TO-END OVERLAPPED SPEECH DETECTION AND SPEAKER COUNTING WITH RAW WAVEFORM

P4 A NEW.4 1244 TIME DOMAIN AUDIO VISUAL SPEECH SEPARATION

P5 A NEW.5 1283 SPEECH REVEALS FUTURE RISK OF DEVELOPING DEMENTIA: PREDICTIVE DEMENTIA SCREENING FROM BIOGRAPHIC INTERVIEWS

P6 A NEW.6 1284 IMPROVING FUNDAMENTAL FREQUENCY GENERATION IN EMG-TO-SPEECH CONVERSION USING A QUANTIZATION APPROACH

P7 A NEW.7 1363 TOWARDS REAL-TIME MISPRONUNCIATION DETECTION IN KIDS’ SPEECH

P8 A SLR-2.1 1114 INCORPORATING PRIOR KNOWLEDGE INTO SPEAKER DIARIZATION AND LINKING FOR IDENTIFYING COMMON SPEAKER

P9 A SLR-2.2 1195 LOGISTIC SIMILARITY METRIC LEARNING VIA AFFINITY MATRIX FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

P10 A SLR-2.3 1304 LOW-RESOURCE DOMAIN ADAPTATION FOR SPEAKER RECOGNITION USING CYCLE-GANS

P11 A SLR-2.4 1319 CNN WITH PHONETIC ATTENTION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

P12 A SLR-2.5 1320 PROBING THE INFORMATION ENCODED IN X-VECTORS

P13 A SLR-2.6 1326 IN-THE-WILD END-TO-END DETECTION OF SPEECH AFFECTING DISEASES

P14 A SLR-2.7 1330 OPTIMIZING NEURAL NETWORK EMBEDDINGS USING PAIR-WISE LOSS FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

P15 B SLR-2.8 1334 TOWARDS CONTROLLING FALSE ALARM --- MISS TRADE-OFF IN PERCEPTUAL SPEAKER COMPARISON VIA NON-NEUTRAL LISTENING TASK FRAMING

P16 B SLR-2.9 1348 DOVER: A METHOD FOR COMBINING DIARIZATION OUTPUTS

P17 B SLU.1 1014 USING VERY DEEP CONVOLUTIONAL NEURAL NETWORKS TO AUTOMATICALLY DETECT PLAGIARIZED SPOKEN RESPONSES

P18 B SLU.2 1016 SPOKEN MULTIPLE-CHOICE QUESTION ANSWERING USING MULTIMODAL CONVOLUTIONAL NEURAL NETWORKS

P19 B SLU.3 1043 TRANSFER LEARNING FOR CONTEXT-AWARE SPOKEN LANGUAGE UNDERSTANDING

P20 B SLU.4 1191 EMOCEPTION: AN INCEPTION INSPIRED EFFICIENT SPEECH EMOTION RECOGNITION NETWORK

P21 B SLU.5 1209 A COMPARATIVE STUDY ON END-TO-END SPEECH TO TEXT TRANSLATION

P22 B SLU.6 1249 JOINT LEARNING OF WORD AND LABEL EMBEDDINGS FOR SEQUENCE LABELLING IN SPOKEN LANGUAGE UNDERSTANDING

P23 B SLU.7 1264 MARKOV RECURRENT NEURAL NETWORK LANGUAGE MODEL

P24 B SLU.8 1266 TOPIC-AWARE POINTER-GENERATOR NETWORKS FOR SUMMARIZING SPOKEN CONVERSATIONS

P25 B SLU.9 1279 SLU FOR VOICE COMMAND IN SMART HOME: COMPARISON OF PIPELINE AND END-TO-END APPROACHES

P26 B SLU.10 1313 SCALABLE NEURAL DIALOGUE STATE TRACKING

P27 B SLU.11 1351 HIERARCHICAL TRANSFORMERS FOR LONG DOCUMENT CLASSIFICATION

P28 B SLU.12 1352 ADAPTING PRETRAINED TRANSFORMER TO LATTICES FOR SPOKEN LANGUAGE UNDERSTANDING

P29 B SLU.13 1355 SPATIO-TEMPORAL CONTEXT MODELLING FOR SPEECH EMOTION CLASSIFICATION

P30 B SLU.14 1387 PARAPHRASE GENERATION BASED ON VAE AND POINTER-GENERATOR NETWORKS

ASRU 2019 POSTER SESSION 4 LOCATION: EVENTS CENTER LEVEL 1

Tuesday 17th Dec 2019 Time: 16:00-17:30 SESSION CHAIR: Kong Aik Lee

Poster Board Room ID Paper Title

P4 A DEMO-1.1 TOWARDS AUTOMATIC SPEECH EVALUATION FOR MULTILINGUAL SOCIETIES: PROTOTYPE SYSTEM FOR SINGAPORE CHILDREN LEARNING MALAY

P7 A DEMO-1.2 A SOUND TRACKING ROBOT ON BIO-INSPIRED ALGORITHM

P10 A DEMO-1.3 SIMMC: SITUATED INTERACTIVE MULTI-MODAL CONVERSATIONAL DATA COLLECTION AND EVALUATION PLATFORM

P13 A DEMO-1.4 MUSIGPRO: AUTOMATIC LEADERBOARD OF SINGERS USING REFERENCE INDEPENDENT EVALUATION

ASRU 2019 DEMO SESSION LOCATION: EVENTS CENTER LEVEL 1

Wednesday 18th Dec 2019 Time: 10:30 am - 12:00 noon SESSION CHAIR: Rohan Kumar Das

Poster Board Room ID Paper Num Paper Title

P1 A ASR-3.1 1285 SEMI-SUPERVISED TRAINING AND DATA AUGMENTATION FOR ADAPTATION OF AUTOMATIC BROADCAST NEWS CAPTIONING SYSTEMS

P2 A ASR-3.2 1286 ONLINE BATCH NORMALIZATION ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION

P3 A ASR-3.3 1288 SPEAKER ADAPTIVE TRAINING USING MODEL AGNOSTIC META-LEARNING

P4 A ASR-3.4 1294 A COMPARISON OF END-TO-END MODELS FOR LONG-FORM SPEECH RECOGNITION

P5 A ASR-3.5 1298 ACOUSTIC MODEL ADAPTATION FROM RAW WAVEFORMS WITH SINCNET

P6 A ASR-3.6 1305 RECURRENT NEURAL NETWORK TRANSDUCER FOR AUDIO-VISUAL SPEECH RECOGNITION

P7 A ASR-3.7 1312 EXPLICIT ALIGNMENT OF TEXT AND SPEECH ENCODINGS FOR ATTENTION-BASED END-TO-END SPEECH RECOGNITION

P8 A ASR-3.8 1329 RECOGNIZING LONG-FORM SPEECH USING STREAMING END-TO-END MODELS

P9 A ASR-3.9 1335 LEVERAGING LANGUAGE ID IN MULTILINGUAL END-TO-END SPEECH RECOGNITION

P10 A ASR-3.10 1336 STREAMING END-TO-END SPEECH RECOGNITION WITH JOINT CTC-ATTENTION BASED MODELS

P11 A ASR-3.11 1344 MONOTONIC RECURRENT NEURAL NETWORK TRANSDUCER AND DECODING STRATEGIES

P12 A ASR-3.12 1368 CHARACTER-AWARE ATTENTION-BASED END-TO-END SPEECH RECOGNITION

P13 A ASR-3.13 1370 ATTENTION BASED ON-DEVICE STREAMING SPEECH RECOGNITION WITH LARGE SPEECH CORPUS

P14 A ASR-3.14 1375 ZERO-SHOT CODE-SWITCHING ASR AND TTS WITH MULTILINGUAL MACHINE SPEECH CHAIN

P15 B ASR-3.15 1376 END-TO-END CODE-SWITCHING ASR FOR LOW-RESOURCED LANGUAGE PAIRS

P16 B ASR-3.16 1408 UNSUPERVISED ADAPTATION OF ACOUSTIC MODELS FOR ASR USING UTTERANCE-LEVEL EMBEDDINGS FROM SQUEEZE AND EXCITATION NETWORKS

P17 B ASR-3.17 1410 POWER-LAW NONLINEARITY WITH MAXIMALLY UNIFORM DISTRIBUTION CRITERION FOR IMPROVED NEURAL NETWORK TRAINING IN AUTOMATIC SPEECH RECOGNITION

P18 B ASR-3.18 1317 SPEECH RECOGNITION WITH AUGMENTED SYNTHESIZED SPEECH

P19 B MLP.1 1152 SECOND LANGUAGE TRANSFER LEARNING IN HUMANS AND MACHINES USING IMAGE SUPERVISION

P20 B MLP.2 1377 ZERO-SHOT PRONUNCIATION LEXICONS FOR CROSS-LANGUAGE ACOUSTIC MODEL TRANSFER

P21 B SDS.1 1070 ROBUST BELIEF STATE SPACE REPRESENTATION FOR STATISTICAL DIALOGUE MANAGERS USING DEEP AUTOENCODERS

P22 B SDS.2 1097 IMPROVING SPEECH-BASED END-OF-TURN DETECTION VIA CROSS-MODAL REPRESENTATION LEARNING WITH PUNCTUATED TEXT DATA

P23 B SDS.3 1271 DIALOGUE ENVIRONMENTS ARE DIFFERENT FROM GAMES: INVESTIGATING VARIANTS OF DEEP Q-NETWORKS FOR DIALOGUE POLICY

P24 B SDS.4 1325 EFFICIENT SEMI-SUPERVISED LEARNING FOR NATURAL LANGUAGE UNDERSTANDING BY OPTIMIZING DIVERSITY

P25 B SS-1.1 1143 DEVELOPMENT OF VOICE SPOOFING DETECTION SYSTEMS FOR 2019 EDITION OF AUTOMATIC SPEAKER VERIFICATION AND COUNTERMEASURES CHALLENGE

P26 B SS-1.2 1281 SPOOF DETECTION USING TIME-DELAY SHALLOW NEURAL NETWORK AND FEATURE SWITCHING

P27 B SS-1.3 1364 LONG RANGE ACOUSTIC AND DEEP FEATURES PERSPECTIVE ON ASVSPOOF 2019

P28 B SS-2.1 1101 MGB-5: ARABIC DIALECT IDENTIFICATION ACROSS 17 DIALECTS AND MOROCCAN SPEECH RECOGNITION

P29 B SS-2.2 1333 RDI-CU SYSTEM FOR THE 2019 ARABIC MULTI-GENRE BROADCAST CHALLENGE

ASRU 2019 POSTER SESSION 5 LOCATION: EVENTS CENTER LEVEL 1