BUT QUESST 2015 System Description

Miroslav Skácel, Igor Szöke
Speech@FIT, Faculty of Information Technology, Brno University of Technology

MediaEval QUESST 2015 workshop, September 14-15, 2015, Wurzen



System overview

Our internal tasks were:

● to reuse some of the Atomic systems we already have
● to incorporate bottlenecks
● to calibrate and fuse
● to cope with T2/T3 queries

We ended up with:

● 4 Atomic systems
● 3 QbE subsystems based on DTW
● 4 languages (Czech, Portuguese, Russian and Spanish)


Atomic system

● no adaptation on target data (SMVN, VTLN, …)
● Artificial Neural Networks – to estimate bottlenecks
● bottlenecks – trained on GlobalPhone (GP) database


Subsystem

Neural network based features:

● bottleneck features (30 dimensional)
● no VTLN, no SMN/SVN

Query detector:

● based on Dynamic Time Warping (DTW)


DTW QbE subsystem

● segmental DTW (query can start in any frame of utterance)
● Voice Activity Detection (VAD) only on queries
● Pearson product-moment correlation distance (dcorr)
● slope limitation
● online normalizing of the path
● bottlenecks superior to posteriors

Features      dcorr, minCnxe (ALL)
SD CZ POST    0.984
SD HU POST    0.972
SD RU POST    0.952
GP CZ BN      0.853
GP PO BN      0.894
GP RU BN      0.893
GP SP BN      0.904
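The DTW search described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the BUT implementation: the distance is assumed to be (1 - r) / 2, where r is the Pearson correlation of two feature frames; "online normalizing" is interpreted as comparing candidate paths by their length-normalized cost during the recursion; slope limitation and VAD are left out. The function names are hypothetical.

```python
import numpy as np

def corr_distance_matrix(query, utterance):
    """Pairwise Pearson correlation distance between frames.

    Assumption: distance = (1 - r) / 2, so values lie in [0, 1].
    Inputs are (frames x dims) arrays.
    """
    q = query - query.mean(axis=1, keepdims=True)
    u = utterance - utterance.mean(axis=1, keepdims=True)
    q /= np.linalg.norm(q, axis=1, keepdims=True) + 1e-12
    u /= np.linalg.norm(u, axis=1, keepdims=True) + 1e-12
    return (1.0 - q @ u.T) / 2.0

def subsequence_dtw(dist):
    """Align the whole query to any stretch of the utterance.

    Returns (length-normalized cost, start frame, end frame).
    No slope limitation is applied in this sketch.
    """
    n, m = dist.shape
    cost = np.full((n, m), np.inf)
    length = np.zeros((n, m), dtype=int)
    start = np.zeros((n, m), dtype=int)
    # The query may start in any utterance frame at no extra cost.
    cost[0, :] = dist[0, :]
    length[0, :] = 1
    start[0, :] = np.arange(m)
    for i in range(1, n):
        for j in range(m):
            best = None
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pj < 0 or not np.isfinite(cost[pi, pj]):
                    continue
                c = cost[pi, pj] + dist[i, j]
                l = length[pi, pj] + 1
                # Compare predecessors by normalized cost (online normalization).
                if best is None or c / l < best[0] / best[1]:
                    best = (c, l, start[pi, pj])
            cost[i, j], length[i, j], start[i, j] = best
    norm = cost[-1] / length[-1]
    end = int(np.argmin(norm))
    return float(norm[end]), int(start[-1, end]), end
```

A query cut directly out of an utterance should be found at its original position with a cost close to zero; real queries give a higher cost that then goes through score normalization.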


Slope limitation


Dealing with T2

● query split into equal parts
● each part searched in utterance separately
● results averaged together
● query split into 2 (denoted as 2w) and 3 (3w) parts

in late evaluation
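The query splitting for T2 could look roughly like the sketch below. It assumes the query is a (frames x dims) feature array and that some detector function returns one score per query part and utterance; the names `split_query`, `split_query_score` and the `search` callable are hypothetical and stand in for whatever detector is used.

```python
import numpy as np

def split_query(query, parts):
    """Split a (frames x dims) query into `parts` nearly equal consecutive chunks."""
    return np.array_split(query, parts, axis=0)

def split_query_score(query, utterance, search, parts=2):
    """Search each query part separately and average the part scores.

    `search(part, utterance)` is a placeholder for any detector that
    returns a single score, e.g. a DTW-based one.
    """
    scores = [search(part, utterance) for part in split_query(query, parts)]
    return float(np.mean(scores))
```

With `parts=2` this corresponds to the 2w setup and with `parts=3` to 3w.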


Score normalization

● raw detection scores normalized by length
● the best detection per utterance-query pair selected
● mode normalization performed

[Figure panels: original, mode norm.]
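The slides do not define mode normalization, so the following is only one plausible reading: for each query, estimate the mode of its score distribution from a histogram and shift the scores so that the mode sits at zero. The function name `mode_normalize`, the histogram-based mode estimate and the `bins` parameter are all assumptions.

```python
import numpy as np

def mode_normalize(scores, bins=50):
    """Shift scores so the histogram mode lies at zero (assumed definition).

    `scores` holds the per-utterance scores of a single query.
    """
    scores = np.asarray(scores, dtype=float)
    counts, edges = np.histogram(scores, bins=bins)
    k = int(np.argmax(counts))
    # Use the centre of the most populated bin as the mode estimate.
    mode = 0.5 * (edges[k] + edges[k + 1])
    return scores - mode
```

The idea behind such a shift is that most utterances do not contain the query, so the mode reflects the typical non-match score and removing it makes scores of different queries more comparable.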


Results

● posteriors do not work for this year's dataset
● slope limitation helps to control path shape
● stacking features of more than 4 languages does not improve performance
● mode normalization works well for raw score normalization

● next year we will focus on denoising and dereverberation


Thanks for your attention