MediaEval 2015 - Query by Example Search on Speech Task

Query by Example Search on Speech Task (QUESST 2015) Igor Szoke, Luis Javier Rodriguez-Fuentes, Andi Buzo, Xavier Anguera, Florian Metze (with help of Jorge Proenca, Martin Lojka, Xiao Xiong as data providers) 14-15.9.2015 MediaEval workshop, Wurzen, Germany



Page 1: MediaEval 2015 - Query by Example Search on Speech Task

Query by Example Search on Speech Task

(QUESST 2015)

Igor Szoke, Luis Javier Rodriguez-Fuentes, Andi Buzo, Xavier Anguera, Florian Metze

(with help of Jorge Proenca, Martin Lojka, Xiao Xiong as data providers)

14-15.9.2015 MediaEval workshop, Wurzen, Germany

Page 2: MediaEval 2015 - Query by Example Search on Speech Task

What is QUESST about...

• Spoken Audio Search (or Query-by-Example Spoken-Term Detection)

• Given a spoken query we search for matches (at lexical level) within a set of spoken documents

• It is similar to Spoken Term Detection (NIST STD2006, OpenKWS) but …

• Queries are spoken

• No prior information

• Different acoustic conditions

• Searching for whole documents

Page 3: MediaEval 2015 - Query by Example Search on Speech Task

Evolution

• SWS2011

• English and Indian lang., exact match, find document, TWV metric

• SWS2012

• 6 South African lang., exact match, queries from data, find exact place, TWV metric

• SWS2013

• 6 European lang., exact match, queries from data, find exact place, TWV metric

• QUESST2014

• 6 European lang., not exact match, queries are dictated, find document, Cnxe metric

• QUESST2015

• ...

Page 4: MediaEval 2015 - Query by Example Search on Speech Task

Evolution in 2015

• 6 lang. (Albanian, Chinese, Czech, Portuguese, Romanian, Slovak)

• 19 hours of audio (dev = eval), per sentence segmentation

• 450 queries/dev, 450 queries/eval

• Recorded in isolation by different speakers (some non-native of the language)

• Utterance-level matching

• Recorded with context New!

• 3 types of search:

• T1 - Exact match, dictated

• T2 - Reordering and small variations, dictated

• T3 - Reordering and small variations, conversational speech New!

• We provided:

• Scoring tool, Features, Baseline search technique (DTW), Calibration and Fusion, Speech Kitchen (VM) New!

• Surprise:

• The data was artificially noised and reverberated New!

• Data examples: Clean, Noisy, Reverb, Noisy+Reverb
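The DTW baseline named above can be illustrated with a minimal sketch of subsequence dynamic time warping for utterance-level matching. This is not the baseline distributed with the task: the cosine frame distance, the step pattern, the normalisation by query length, and the names `frame_distance` and `subsequence_dtw` are all assumptions made for illustration.

```python
import math


def frame_distance(a, b):
    """Cosine distance between two feature vectors (an assumed choice)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def subsequence_dtw(query, utterance):
    """Lowest cost of aligning the whole query to any contiguous region
    of the utterance, divided by the query length.

    A lower value suggests the query is more likely present in the utterance.
    """
    if not query or not utterance:
        raise ValueError("query and utterance must be non-empty")
    # The first query frame may start anywhere in the utterance at no extra cost.
    prev = [frame_distance(query[0], u) for u in utterance]
    for q in query[1:]:
        curr = []
        for j, u in enumerate(utterance):
            best = prev[j]  # advance in the query only
            if j > 0:
                # diagonal step, or advance in the utterance only
                best = min(best, prev[j - 1], curr[j - 1])
            curr.append(frame_distance(q, u) + best)
        prev = curr
    # The alignment may end anywhere in the utterance.
    return min(prev) / len(query)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    with_match = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    without_match = [[1.0, 1.0], [1.0, 1.0]]
    print(subsequence_dtw(query, with_match))
    print(subsequence_dtw(query, without_match))
```

In a real system the frames would be acoustic features or phone posteriors rather than toy vectors, and the per-utterance cost would still need calibration before scoring.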

Page 5: MediaEval 2015 - Query by Example Search on Speech Task

Teams

Team | Affiliation | Country | Note
BUT | BUT Speech@FIT, Faculty of Information Technology, Brno University of Technology | Czech Republic | late
CUNY | Department of Computer Science at Queens College of The City University of New York | US |
ELiRF | Natural Language Engineering and Pattern Recognition, Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València | Spain |
GTM-UVigo | Multimedia Technology Group, Universidade de Vigo | Spain | late
IIT-B | Department of Electrical Engineering, Indian Institute of Technology Bombay | India | not arrived
NNI | Northwestern Polytechnical University, Xi’an, China; Nanyang Technological University, Singapore; Institute for Infocomm Research, A*STAR, Singapore | China, Singapore |
NTU | National Taiwan University | Taiwan | zero
SpeeD | SpeeD Research Laboratory, University Politehnica of Bucharest | Romania |
SPL-IT-UC | Instituto de Telecomunicações, Coimbra; Electrical and Computer Eng. Department, University of Coimbra | Portugal |
TUKE | Laboratory of Speech Technologies in Telecommunications @ Technical University of Košice | Slovakia | late, zero

Page 6: MediaEval 2015 - Query by Example Search on Speech Task
Page 7: MediaEval 2015 - Query by Example Search on Speech Task
Page 8: MediaEval 2015 - Query by Example Search on Speech Task
Page 9: MediaEval 2015 - Query by Example Search on Speech Task

Scoring

● Is a query in document?

● Analysis per search type / language / noise

● Metrics: Cnxe (lower is better, down to 0; 1 is random)

TWV (higher is better, up to 1; 0 is random)
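The two metrics can be sketched as follows. This is a minimal illustration and not the official scoring tool: it assumes system scores are log-likelihood ratios, uses the common definition of normalized cross entropy for such scores, and treats the target prior `p_target` and the TWV weight `beta` as free parameters whose official values are not given in these slides. The function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cnxe(target_llrs, nontarget_llrs, p_target):
    """Normalized cross entropy of log-likelihood-ratio scores.

    Returns 1 for a system that carries no information (all scores 0)
    and approaches 0 for well-calibrated, well-separated scores.
    """
    prior_log_odds = math.log(p_target / (1.0 - p_target))
    # -log2 P(target | score), averaged over target trials
    c_tar = sum(softplus(-(s + prior_log_odds)) for s in target_llrs)
    c_tar /= len(target_llrs) * math.log(2.0)
    # -log2 P(non-target | score), averaged over non-target trials
    c_non = sum(softplus(s + prior_log_odds) for s in nontarget_llrs)
    c_non /= len(nontarget_llrs) * math.log(2.0)
    c_xe = p_target * c_tar + (1.0 - p_target) * c_non
    # Cross entropy of a system that only knows the prior
    c_prior = -(p_target * math.log2(p_target)
                + (1.0 - p_target) * math.log2(1.0 - p_target))
    return c_xe / c_prior


def twv(p_miss, p_false_alarm, beta):
    """Term-weighted value at one decision threshold."""
    return 1.0 - (p_miss + beta * p_false_alarm)


if __name__ == "__main__":
    print(cnxe([0.0, 0.0], [0.0, 0.0, 0.0], p_target=0.1))
    print(cnxe([10.0, 10.0], [-10.0, -10.0, -10.0], p_target=0.1))
    print(twv(p_miss=1.0, p_false_alarm=0.0, beta=10.0))
```

Under these assumptions, uninformative scores give Cnxe = 1, and a system that returns no detections (all misses, no false alarms) gives TWV = 0.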

Page 10: MediaEval 2015 - Query by Example Search on Speech Task

All teams Cnxe

Page 11: MediaEval 2015 - Query by Example Search on Speech Task

Per type search (avg top7)

Page 12: MediaEval 2015 - Query by Example Search on Speech Task

Per noise/reverb (avg top7)

Page 13: MediaEval 2015 - Query by Example Search on Speech Task

All teams Cnxe – T1 & clean

Page 14: MediaEval 2015 - Query by Example Search on Speech Task

Conclusion

• We made it really hard. All teams fought bravely!

• No surprise

• Big fusion vs. Simple system

• Addressing the reordering and noisy data.

• Zero-resource and non-DTW approaches were tried, but the results were not good.

• Is DTW really the best approach?

Page 15: MediaEval 2015 - Query by Example Search on Speech Task

It was really hard this year!

Thank you!

.. do not forget technical retreat today at 15:00 ..

Page 16: MediaEval 2015 - Query by Example Search on Speech Task

Conclusion - RT

• Who used provided tools?

• Who used query context?

• There was a problem in the data … we need to use on-line signal centering

• Future … the same task on the "same" data?
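The slides do not say what on-line signal centering involves. A minimal sketch, assuming it means removing a slowly varying DC offset from the raw waveform sample by sample with an exponentially decaying running mean; the function name `center_online` and the smoothing factor `alpha` are hypothetical.

```python
def center_online(samples, alpha=0.999):
    """Subtract a running mean from each sample as it arrives.

    alpha close to 1 makes the mean track the offset slowly; the first
    sample initialises the mean so there is no long warm-up from zero.
    """
    out = []
    mean = None
    for x in samples:
        if mean is None:
            mean = float(x)
        else:
            mean = alpha * mean + (1.0 - alpha) * x
        out.append(x - mean)
    return out


if __name__ == "__main__":
    # A toy signal oscillating around an offset of 100
    signal = [100.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(1000)]
    centered = center_online(signal, alpha=0.9)
    print(sum(signal) / len(signal))
    print(sum(centered) / len(centered))
```

Because only past samples are used, the same routine can run on streaming audio, unlike subtracting the mean of the whole file.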

Page 17: MediaEval 2015 - Query by Example Search on Speech Task

Technical retreat

● What provided technologies were used by teams? (make a table)

● Show data problem (raw audio)..

● Did participants use the query context?

Page 18: MediaEval 2015 - Query by Example Search on Speech Task

Technical retreat

• Task description - going deeper

• IIT-B remote presentation

• round

• "everyone" - if they want to speak

• conclusion + future (next year)

• prepare answers to:

• What was the easiest thing

• What was the toughest thing

• What would you do differently this year

• One thing that should be the same next year

• One thing that should change next year

• How happy are you with your participation in QUESST (0-10, more is better; should not reflect your score, rather your feeling of work done)


Page 20: MediaEval 2015 - Query by Example Search on Speech Task