ARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term Detection
TRANSCRIPT
Motivation
Spoken Term Detection through ASR
Based on the Romanian ASR for continuous speech:
  acoustic model trained with 64 h of speech
  language model trained with 170 million words
  18% WER on clean speech
Adaptation of the Romanian ASR to the Lwazi language
Provided searching algorithms based on different outputs of the ASR
ASR adaptation
Tuning the Romanian ASR to minimize PhER at 8 kHz
77 African phones mapped to 28 Romanian phones
Romanian to Lwazi phone mapping rules:
  1) directly by IPA classification
  2) to the closest phone according to the full IPA chart
  3) based on the confusion matrix
MAP adaptation of acoustic model with the development data set
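The third mapping rule above (mapping based on the confusion matrix) can be illustrated with a minimal sketch. It assumes a matrix of counts whose rows are source (African) phones and whose columns are target (Romanian) phones, and maps each source phone to the target phone it is most often recognized as. The function names, phone labels and counts are hypothetical placeholders, not data from the slides.

```python
def mapping_from_confusion(confusion, source_phones, target_phones):
    """Map every source phone to the target phone with the highest count.

    confusion[i][j] is assumed to hold how often source_phones[i] was
    recognized as target_phones[j] (hypothetical layout).
    """
    mapping = {}
    for src, row in zip(source_phones, confusion):
        best_col = max(range(len(row)), key=row.__getitem__)
        mapping[src] = target_phones[best_col]
    return mapping


def map_sequence(phones, mapping):
    """Rewrite a phone sequence with the mapped phone set."""
    return [mapping[p] for p in phones]


# Placeholder labels and counts, for illustration only.
SOURCE = ["src_1", "src_2", "src_3"]
TARGET = ["tgt_a", "tgt_b"]
COUNTS = [
    [12, 3],  # src_1 is mostly recognized as tgt_a
    [1, 9],   # src_2 is mostly recognized as tgt_b
    [7, 6],   # src_3 is slightly more often tgt_a
]
MAPPING = mapping_from_confusion(COUNTS, SOURCE, TARGET)
```

Many-to-one mappings are expected here, since 77 phones have to collapse onto 28.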
ASR accuracy
Adaptation steps                         PhER [%]
Romanian ASR for continuous speech       36.8
Romanian ASR - beam width tuned          31.4
Romanian ASR - language model tuned      25.3
African speech with Romanian ASR         61.2
MAP adaptation with Lwazi dev set        48.1
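PhER (phone error rate) is conventionally computed like WER, as substitutions plus deletions plus insertions divided by the number of reference phones. The sketch below assumes that standard definition and computes it with a Levenshtein alignment; the function name and the toy sequences are illustrative only.

```python
def phone_error_rate(reference, hypothesis):
    """Return 100 * (S + D + I) / N for two phone sequences."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one phone")
    # prev[j] = edit distance between the processed reference prefix
    # and hypothesis[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitution = prev[j - 1] + (reference[i - 1] != hypothesis[j - 1])
            deletion = prev[j] + 1       # reference phone missing in hypothesis
            insertion = curr[j - 1] + 1  # extra phone in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[m] / n


# Toy example: one substitution and one deletion over four reference phones.
EXAMPLE_PHER = phone_error_rate(["a", "b", "c", "d"], ["a", "x", "c"])
```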
Searching techniques
The ASR output can be:
  String of characters
  Lattice
  Confusion Networks
Character comparison based techniques:
  DTW String Search (DTWSS)
  Sausage Technique (ST)
Acoustics based technique:
  Lattice Grammar (LG)
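DTWSS
Sliding window length proportional to the query length
Shorter DTW matches are given higher scores
Longer queries are given higher scores
The score formula:

$$ s = (1 - PhER)\left(1 + \alpha\,\frac{L_Q - L_{Qm}}{L_{QM} - L_{Qm}}\right)\left(1 + \beta\,\frac{L_W - L_S}{L_Q}\right) $$

Only the three factors, the PhER term and the length subscripts survive in the slide text; the signs and the placement of the weights α and β are a reading consistent with the two rules above. The subscripts suggest that L_Q is the query length, L_Qm and L_QM the minimum and maximum query lengths, L_W the sliding window length and L_S the length of the DTW match.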
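A minimal sketch of how the DTWSS idea could be coded, assuming the ASR output is a sequence of phone symbols. The helper names (`best_match`, `dtwss_score`, `dtwss_search`), the `window_factor` of 1.5 and the exact form of the score (in particular the signs and where α and β enter) are assumptions chosen to agree with the rules above, not the authors' implementation. The alignment is a plain edit-distance DTW over symbols.

```python
import math


def best_match(query, window):
    """Align the whole query to its best-matching substring of window.

    Returns (edits, match_length); ties prefer the shorter match.
    """
    n, m = len(query), len(window)
    prev_cost = [0] * (m + 1)          # an empty query prefix costs nothing
    prev_start = list(range(m + 1))    # and may start at any position
    for i in range(1, n + 1):
        cur_cost = [i] + [0] * m
        cur_start = [0] * (m + 1)
        for j in range(1, m + 1):
            sub = prev_cost[j - 1] + (query[i - 1] != window[j - 1])
            dele = prev_cost[j] + 1    # query phone left unmatched
            ins = cur_cost[j - 1] + 1  # extra phone inside the match
            best = min(sub, dele, ins)
            cur_cost[j] = best
            if best == sub:
                cur_start[j] = prev_start[j - 1]
            elif best == dele:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev_cost, prev_start = cur_cost, cur_start
    end = min(range(m + 1), key=lambda j: (prev_cost[j], j - prev_start[j]))
    return prev_cost[end], end - prev_start[end]


def dtwss_score(pher, l_q, l_q_min, l_q_max, l_w, l_s, alpha, beta):
    """Assumed score: rewards long queries and short matches."""
    span = l_q_max - l_q_min
    query_term = (l_q - l_q_min) / span if span else 0.0
    match_term = (l_w - l_s) / l_q
    return (1 - pher) * (1 + alpha * query_term) * (1 + beta * match_term)


def dtwss_search(query, content, l_q_min, l_q_max,
                 alpha=0.8, beta=0.4, window_factor=1.5):
    """Slide a window over content; return (best score, window start)."""
    l_q = len(query)
    l_w = max(l_q, math.ceil(window_factor * l_q))  # proportional to the query
    best = (float("-inf"), -1)
    for pos in range(max(1, len(content) - l_w + 1)):
        edits, l_s = best_match(query, content[pos:pos + l_w])
        pher = min(1.0, edits / l_q)  # local phone error rate of the match
        s = dtwss_score(pher, l_q, l_q_min, l_q_max, l_w, l_s, alpha, beta)
        if s > best[0]:
            best = (s, pos)
    return best


# Toy data: the query occurs once, starting at index 4 of the content.
score, position = dtwss_search(list("abc"), list("zzzzabczzzz"),
                               l_q_min=2, l_q_max=6)
```

In a detection setting the score of each window would be compared against a threshold rather than keeping only the maximum; the single best window is returned here to keep the sketch short.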
Sausage Technique (ST)
Lattice Grammar (LG)
  1) Recognition of the query
  2) Building of a finite state grammar (FSG) from the lattice (query) output of the ASR
  3) Recognition of the contents with the FSG
  4) Calculation of the likelihood probability
  5) Normalization of the likelihood probability and its use as the decision score
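The grammar-building step can be pictured with a toy example. The sketch below assumes a lattice given as a list of `(from_node, to_node, phone)` edges and turns it into a transition table that accepts exactly the phone paths in the lattice. It works on symbols only, whereas the LG method recognizes the content audio with the grammar. The per-frame normalization in `normalized_score` is one common choice and is an assumption, since the slides do not say how the likelihood is normalized.

```python
from collections import defaultdict


def build_fsg(lattice_edges):
    """Build a transition table {(state, phone): {next states}}."""
    fsg = defaultdict(set)
    for src, dst, phone in lattice_edges:
        fsg[(src, phone)].add(dst)
    return fsg


def accepts(fsg, start, finals, phones):
    """Check whether the phone sequence is a complete path of the grammar."""
    states = {start}
    for phone in phones:
        states = {dst for s in states for dst in fsg.get((s, phone), ())}
        if not states:
            return False
    return bool(states & set(finals))


def normalized_score(log_likelihood, num_frames):
    """Assumed normalization: average log-likelihood per frame."""
    return log_likelihood / num_frames


# Hypothetical query lattice with two alternatives for the first phone.
EDGES = [(0, 1, "a"), (0, 1, "e"), (1, 2, "b"), (2, 3, "c")]
FSG = build_fsg(EDGES)
```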
Results on the evaluation data set
Results on all data sets
ATWV                   evalQ-evalC  evalQ-devC  devQ-evalC  devQ-devC
DTWSS (α=0.8 β=0.4)    0.31         0.47        0.33        0.49
DTWSS (α=0.6 β=0.6)    0.31         0.48        0.33        0.47
DTWSS (α=0.1 β=0.4)    0.27         0.44        0.32        0.47
ST                     0.12         0.22        0.17        0.25
LG                     0            0.02        0           -
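For readers unfamiliar with the metric in the table, the sketch below computes a term-weighted value following the NIST STD 2006 definition: one minus the average over terms of P_miss plus a weight times P_FA. ATWV is this value at the system's actual decision threshold. The weight of 999.9 is the NIST default and is assumed here, as is the input layout; it is unrelated to the DTWSS β in the table. The counts in the example are made up.

```python
def term_weighted_value(per_term_counts, speech_seconds, nist_beta=999.9):
    """Compute TWV from (n_true, n_correct, n_false_alarm) per term.

    speech_seconds is the duration of the searched audio; NIST uses one
    non-target trial per second, so non-target trials = duration - n_true.
    """
    losses = []
    for n_true, n_correct, n_false_alarm in per_term_counts:
        if n_true == 0:
            continue  # terms without reference occurrences are skipped
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_false_alarm / (speech_seconds - n_true)
        losses.append(p_miss + nist_beta * p_fa)
    if not losses:
        raise ValueError("no term has a reference occurrence")
    return 1.0 - sum(losses) / len(losses)


# Made-up counts: the first term is half detected, the second fully detected.
EXAMPLE_TWV = term_weighted_value([(10, 5, 0), (4, 4, 0)], speech_seconds=3600)
```

A system that returns nothing scores 0 and a perfect system scores 1; because false alarms are weighted heavily, the value can also become negative.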
Conclusions
The Romanian ASR is adapted to recognize African phones
DTWSS obtains by far the best results
The penalization of long DTW matches and short query lengths helped increase the ATWV
The ST and LG methods suffer from the high PhER (48%)