MediaEval 2015 - Query by Example Search on Speech Task

Query by Example Search on Speech Task

(QUESST 2015)

Igor Szoke, Luis Javier Rodriguez-Fuentes, Andi Buzo, Xavier Anguera, Florian Metze

(with the help of Jorge Proenca, Martin Lojka, Xiao Xiong as data providers)

14-15.9.2015 MediaEval workshop, Wurzen, Germany

What is QUESST about...

• Spoken Audio Search (or Query-by-Example Spoken-Term Detection)

• Given a spoken query we search for matches (at lexical level) within a set of spoken documents

• It is similar to Spoken Term Detection (NIST STD2006, OpenKWS) but …

• Queries are spoken

• No prior information

• Different acoustic conditions

• Searching for whole documents

Evolution

• SWS2011

• English and Indian lang., exact match, find document, TWV metric

• SWS2012

• 6 South African lang., exact match, queries from data, find exact place, TWV metric

• SWS2013

• 6 European lang., exact match, queries from data, find exact place, TWV metric

• QUESST2014

• 6 European lang., not exact match, queries are dictated, find document, Cnxe metric

• QUESST2015

• ...

Evolution in 2015

• 6 lang. (Albanian, Chinese, Czech, Portuguese, Romanian, Slovak)

• 19 hours of audio (dev = eval), per sentence segmentation

• 450 queries/dev, 450 queries/eval

• Recorded in isolation by different speakers (some of them non-native speakers of the language)

• Utterance-level matching

• Recorded with context New!

• 3 types of search

• T1 - Exact match, dictated

• T2 - Reordering and small variations, dictated

• T3 - Reordering and small variations, conversational speech New!

• We provided

• Scoring tool, Features, Baseline search technique (DTW), Calibration and Fusion, Speech Kitchen (VM) New!

• Surprise

• The data was artificially noised and reverberated New!

• Data examples: Clean, Noisy, Reverb, Noisy+Reverb
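The slides name DTW as the baseline search technique but give no details of it. As a rough illustration only, the sketch below shows one common variant, subsequence DTW, which scores how well a whole query aligns to any contiguous region of an utterance. The function names, the Euclidean frame distance, and the normalization by query length are assumptions made for this example, not the organizers' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature frames (assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw_cost(query, utterance, dist=euclidean):
    """Lowest query-length-normalized cost of aligning the whole query
    to any contiguous region of the utterance (lower means a better match)."""
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")

    # First query frame may start at any utterance frame at no extra cost.
    prev = [dist(query[0], utterance[j]) for j in range(m)]

    for i in range(1, n):
        cur = [math.inf] * m
        # At the first utterance frame only a vertical step is possible.
        cur[0] = prev[0] + dist(query[i], utterance[0])
        for j in range(1, m):
            # Allowed steps: vertical, diagonal, horizontal.
            best = min(prev[j], prev[j - 1], cur[j - 1])
            cur[j] = best + dist(query[i], utterance[j])
        prev = cur

    # The alignment may end at any utterance frame.
    return min(prev) / n


if __name__ == "__main__":
    q = [[0.0], [1.0], [2.0]]
    u = [[5.0], [0.0], [1.0], [2.0], [5.0]]
    print(subsequence_dtw_cost(q, u))
```

Because the task is utterance-level matching, one such cost per (query, utterance) pair is enough; turning costs into calibrated scores is a separate step that this sketch does not cover.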

Teams

Team | Affiliation | Country | Note
BUT | BUT Speech@FIT, Faculty of Information Technology, Brno University of Technology | Czech Republic | late
CUNY | Department of Computer Science at Queens College of The City University of New York | US |
ELiRF | Natural Language Engineering and Pattern Recognition, Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València | Spain |
GTM-UVigo | Multimedia Technology Group, Universidade de Vigo | Spain | late
IIT-B | Department of Electrical Engineering, Indian Institute of Technology Bombay | India | not arrived
NNI | Northwestern Polytechnical University, Xi’an, China; Nanyang Technological University, Singapore; Institute for Infocomm Research, A*STAR, Singapore | China, Singapore |
NTU | National Taiwan University | Taiwan | zero
SpeeD | SpeeD Research Laboratory, University Politehnica of Bucharest | Romania |
SPL-IT-UC | Instituto de Telecomunicações, Coimbra; Electrical and Computer Eng. Department, University of Coimbra | Portugal |
TUKE | Laboratory of Speech Technologies in Telecommunications @ Technical University of Košice | Slovakia | late, zero

Scoring

● Is the query in the document?

● Analysis per search type / language / noise

● Metrics:

Cnxe (lower is better, 0 is best, 1 is random)

TWV (higher is better, 1 is best, 0 is random)
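To make the Cnxe scale concrete, here is a minimal sketch of a normalized cross entropy computed from log-likelihood-ratio scores. It assumes the usual definition (empirical cross entropy of the posteriors divided by the entropy of the prior); the function names and the default prior are placeholders chosen for illustration and are not taken from the QUESST scoring tool.

```python
import math


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def normalized_cross_entropy(target_llrs, nontarget_llrs, p_target=0.5):
    """Cross entropy of the system posteriors divided by the prior entropy.

    target_llrs / nontarget_llrs: log-likelihood ratios for trials where the
    query is / is not in the utterance. p_target is an assumed prior.
    """
    if not target_llrs or not nontarget_llrs:
        raise ValueError("need at least one target and one non-target trial")
    if not 0.0 < p_target < 1.0:
        raise ValueError("p_target must be strictly between 0 and 1")

    prior_logit = math.log(p_target / (1.0 - p_target))

    # -log P(target | llr) averaged over target trials.
    tar = sum(_softplus(-(llr + prior_logit)) for llr in target_llrs)
    tar /= len(target_llrs)
    # -log P(non-target | llr) averaged over non-target trials.
    non = sum(_softplus(llr + prior_logit) for llr in nontarget_llrs)
    non /= len(nontarget_llrs)

    cross_entropy = (p_target * tar + (1.0 - p_target) * non) / math.log(2)
    prior_entropy = -(p_target * math.log2(p_target)
                      + (1.0 - p_target) * math.log2(1.0 - p_target))
    return cross_entropy / prior_entropy
```

Under this definition, a system that outputs a log-likelihood ratio of 0 for every trial adds nothing to the prior and scores 1, while well-separated and well-calibrated scores push the value toward 0. Badly calibrated scores can exceed 1.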

All teams Cnxe

Per type search (avg top7)

Per noise/reverb (avg top7)

All teams Cnxe – T1 & clean

Conclusion

• We made it really hard. All teams fought bravely!

• No surprise

• Big fusion vs. Simple system

• Addressing the reordering and noisy data.

• Zero-resource and non-DTW systems appeared, but without good results.

• Is DTW really the best approach?

It was really hard this year!

Thank you!

.. do not forget technical retreat today at 15:00 ..

Conclusion - RT

• Who used provided tools?

• Who used query context?

• There was a problem in the data … we need to use on-line signal centering

• Future … the same task on the "same" data?
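The slides do not say how on-line signal centering should be done. As one possible reading, assuming the data problem was a DC offset in the raw audio, the sketch below subtracts a running mean that is updated sample by sample. The function name and the smoothing constant `alpha` are hypothetical choices for this example.

```python
def center_online(samples, alpha=0.999):
    """Subtract an exponentially smoothed running mean from each sample.

    alpha close to 1 gives a slowly adapting mean estimate; the value
    here is an illustrative assumption, not a recommended setting.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")

    centered = []
    mean = None
    for s in samples:
        if mean is None:
            # Initialize the estimate with the first sample.
            mean = float(s)
        else:
            mean = alpha * mean + (1.0 - alpha) * s
        centered.append(s - mean)
    return centered
```

Unlike subtracting the mean of the whole file, this needs only the samples seen so far, so it can run on streaming input.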

Technical retreat

● What provided technologies were used by teams? (make a table)

● Show data problem (raw audio)..

● Did participants use the query context?

Technical retreat

• Task description - going deeper

• IIT-B remote presentation

• round

• "everyone" - if they want to speak

• conclusion + future (next year)

• prepare answers to:

• What was the easiest thing

• What was the toughest thing

• What would you do differently this year

• One thing that should be the same next year

• One thing that should change next year

• How happy are you with your participation in QUESST (0-10, more is better; should not reflect your score, rather your feeling of work done)
