17-18 May 2006, INRIA Evaluation
METISS: Modélisation et Expérimentation pour le Traitement des Informations et des Signaux Sonores
Scientific leader : Frédéric BIMBOT
Audio & speech processing
Overview of activities 2002-2005
INRIA-Rennes
Introduction
Framework and foundations
Scientific foundations:
- Probabilistic models and statistical estimation
- Redundant systems and adaptive representations
General framework: analysis, processing, modelling, representation, description, decomposition, detection, classification and recognition of audio, speech, music, multimedia… signals, recordings, streams, tracks…
Audio scene analysis, description and recognition
Scientific objectives
to design generic, robust, fast and flexible approaches to a variety of problems in speech and audio segmentation, detection and classification, operating in the probabilistic framework
to investigate theoretical properties and practical applications of adaptive representations and sparseness criteria, with the purpose of advanced processing and structured description of audio signals
to extend and adapt approaches classically used in the context of speech processing to other classes of signals and problems
to study convergence between statistical approaches and adaptive decomposition within a common framework embedding signal representations and classification
Application domain and focus
Applicative fields:
- Security, verification, authentication, rights management
- Rich audio transcription, content-based indexing, multi-purpose navigation, information retrieval and summarization
- Advanced audio processing: segmentation, separation, spatialisation, sound object extraction, music modeling
- Audio and audio-visual authoring, production and repurposing
- Education and entertainment
Primary focuses:
- Speaker characterisation
- Audio structuring and indexing
- Sparse representations: theory and applications
- Audio source separation (under-determined case)
Team composition
[Chart: team composition per year, 2002-2005]
Legend: permanent researchers (CR, CNRS or INRIA); non-permanent staff (engineers, ATER, post-docs); PhD students 100 % with METISS; PhD students ~ 50 % with METISS
Names on the chart: MAILHE, ARBERET, TENG, HUET, FORTHOFER, SALL, OZEROV, LESAGE, COLLET, BEN, BENAROYA, BLOUET, MC DONAGH, POREE, BETSER, KIJAK, KRSTULOVIC, GONON, MORARU, BIMBOT, GRAVIER, GRIBONVAL
+ Marie-Noëlle Georgeault administrative assistant (~ 25 %)
Probabilistic modeling of audio signals
Probabilistic modeling (1)
1 audio class or 1 sound object → a variety of observations
1 family of sounds → 1 probabilistic model
1 probability density function $P(Y_1^T \mid X)$ → 1 likelihood function $\hat{P}(Y_1^T \mid X)$
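The idea of one probabilistic model per sound class can be sketched as follows. This is a minimal illustration, not the team's implementation: it assumes a single diagonal Gaussian per class and independent frames, and the class names, feature values and helper names (`fit_diag_gaussian`, `log_likelihood`) are hypothetical.

```python
import math

def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from training frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Variance floor avoids division by zero on tiny training sets.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(frames, model):
    """Estimated log-likelihood of a frame sequence given one class model,
    assuming frames are independent (a simplification)."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

# Toy 2-dimensional "acoustic features" (made-up values).
speech_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1]]
music_train = [[4.0, -1.0], [4.2, -0.8], [3.8, -1.2], [4.1, -0.9]]
models = {"speech": fit_diag_gaussian(speech_train),
          "music": fit_diag_gaussian(music_train)}

# Classify an unseen sequence by picking the class with the highest likelihood.
test_seq = [[1.05, 1.95], [0.9, 2.1]]
best = max(models, key=lambda c: log_likelihood(test_seq, models[c]))
```

Here `best` is the class whose estimated likelihood function gives the highest value on the observed sequence; real systems would typically use richer models such as GMMs or HMMs.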
Probabilistic modeling (2)
Probabilistic modeling, statistical estimation, state-sequence decoding, Bayesian decision
+ « know-how »
Detection, classification, verification, segmentation, …
Probabilistic models offer a well-understood generic inter-operable framework for the description and the classification of audio and speech signals
Dominant position of Hidden Markov Models (HMM) (and variants)
Highly competitive field in speech processing (research & industry)
More open in audio indexing (additional factors of complexity)
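State-sequence decoding in an HMM is usually done with the Viterbi algorithm. The sketch below is a generic textbook version, given only as an illustration; the two states, the transition probabilities and the per-frame likelihoods are made-up values, not data from the project.

```python
import math

def viterbi(obs_loglik, log_trans, log_init):
    """Most likely state sequence.

    obs_loglik[t][j]: log p(y_t | state j)
    log_trans[i][j]:  log P(state j at t+1 | state i at t)
    log_init[j]:      log P(state j at t = 0)
    """
    n_states = len(log_init)
    delta = [log_init[j] + obs_loglik[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(obs_loglik)):
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + obs_loglik[t][j])
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# State 0 = "speech", state 1 = "music" (hypothetical labels).
frame_lik = [(0.9, 0.1), (0.8, 0.2), (0.4, 0.6), (0.8, 0.2),
             (0.1, 0.9), (0.2, 0.8), (0.1, 0.9)]
obs_loglik = [[math.log(p) for p in frame] for frame in frame_lik]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]

path = viterbi(obs_loglik, log_trans, log_init)
# path == [0, 0, 0, 0, 1, 1, 1]
```

The third frame slightly favours state 1 on its own, but the transition probabilities make a single segment boundary more likely than an isolated switch, which is the kind of smoothing that makes this framework useful for segmentation.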
Challenges and positioning
Robustness:
- to unseen acoustic conditions
- to scarce training data
- to poorly representative samples
- to missing observations
- to …
Implementability:
- size
- speed
- scalability
- distribution
- etc.
Generalisation to wider classes of signals with an audio component:
- multiple scales
- multiple sources
- multiple structures
- multiple sensors
- multiple levels of underlying processes
- heterogeneous streams (audio-visual)
- external sources of knowledge
METISS positioning:
- robust training and test methods
- compact distributed algorithms
- versatility / migration of formalism
- methodology and evaluation
speaker verification, audio segmentation, broad sound-class indexing (speech recognition)
Adaptive representations
Adaptive representations (1)
Audio signal:
- diversity of structures (time, frequency, statistics, …)
- superimposition of objects (notes, sources, tracks, …)
Signal: $s = \{s(t)\}_{1 \le t \le T}$
Redundant system (dictionary of atoms): $D = \{g_i(t)\}_{1 \le i \le N,\ 1 \le t \le T}$ with $N > T$
Large set of vectors with various:
- scales
- time structures
- frequency structures
- phases
- statistical properties
- …
Adaptive decomposition: $s(t) = \sum_{i=1}^{N} \alpha_i \, g_i(t)$
Selection of the « best » decomposition, according to a given criterion:
- sparsity
- perception criterion
- separability
- conditional entropy
- …
Adaptive representations (2)
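Decomposition: $\{\alpha_i\}_{1 \le i \le N} = \arg\min_{\alpha} F(\alpha)$
Constraint: $s(t) = \sum_{i=1}^{N} \alpha_i \, g_i(t)$
Criterion: $F(\alpha) = \sum_{i=1}^{N} |\alpha_i|^{p}$
Sparsity criteria:
- $p = 2$: quadratic norm, maximizes dispersion
- $p = 0$: minimum number of non-zero coefficients, NP-hard
- $p = 1$: tractable « compromise »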
Pursuit algorithms (Matching Pursuit)
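As an illustration of a pursuit algorithm, here is a minimal sketch of plain Matching Pursuit over a small redundant dictionary. It is a textbook version under simplifying assumptions (real-valued, unit-norm atoms, fixed number of iterations); the toy dictionary and signal are hypothetical and this is not the optimized algorithm developed by the team.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition of `signal` over unit-norm `atoms`.

    At each step, pick the atom most correlated with the residual,
    add its contribution to the coefficients and subtract it.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        corr = [dot(residual, g) for g in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[k] += corr[k]
        residual = [r - corr[k] * g for r, g in zip(residual, atoms[k])]
    return coeffs, residual

# Redundant dictionary in dimension T = 4: four Diracs plus two
# "oscillating" atoms, so N = 6 > T.
T = 4
diracs = [[1.0 if t == i else 0.0 for t in range(T)] for i in range(T)]
constant = [0.5, 0.5, 0.5, 0.5]
alternating = [0.5, -0.5, 0.5, -0.5]
atoms = diracs + [constant, alternating]

# Signal built from two atoms: 3 x constant + 2 x Dirac at t = 1.
signal = [1.5, 3.5, 1.5, 1.5]
coeffs, residual = matching_pursuit(signal, atoms, n_iter=2)
```

After two iterations the algorithm has selected the constant atom and the Dirac at t = 1, and the residual energy has dropped from 19.0 to 0.75. Because the atoms are not orthogonal, the greedy coefficients are not exactly the ones used to build the signal; further iterations keep reducing the residual.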
Recent fast-growing field
High applicative potential
Intense emerging competition
Ongoing scientific issues:
- Optimality and convergence of adaptive decompositions
- Dictionary design (knowledge-based, data-driven, …)
- Deformable, stochastic, multi-dimensional, … atoms
- Efficient decomposition algorithms and implementations
- Application scope
METISS positioning:
- theoretical results
- concepts and methodologies
- decomposition algorithms
audio source separation (under-determined case)
Achievements 2002-2005 and selected results
- Speaker characterisation
- Audio structuring and indexing
- Sparse representations: theory and applications
- Audio source separation (under-determined case)
Speaker characterisation
CART trees for scalable and distributable speaker verification
Model-based metrics and normalisations for speaker verification
Structural adaptation of speaker models (hierarchical Bayesian networks)
Methodology and algorithms for optimizing the coverage of a speaker database
Relative speaker space and metrics for efficient speaker indexing and retrieval [ongoing]
CART based speaker verification
Score function: $S(y_t) = \log \dfrac{\hat{P}(y_t \mid X)}{\hat{P}(y_t \mid \bar{X})}$
Direct score function assignment
[Figure: CART tree with YES/NO threshold tests on the components of $y_t$ at each node and a score at each leaf (-0.8, 0.3, -0.5, 0.9, 0.7, -0.4)]
CART trees used as a family of approximating functions
Blouet, Bimbot, Gonon, et al.
+ Extension to oblique trees
Complexity down 200 x, error rate up 33% only
EU-IST INSPIRED project
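The principle of assigning a score directly from a tree, instead of evaluating two likelihoods per frame, can be sketched as below. The tree layout, thresholds, decision threshold and the averaging over frames are assumptions made for illustration; only the leaf values are borrowed from the slide, and their placement here is arbitrary.

```python
def tree_score(y, node):
    """Walk a binary threshold tree down to a leaf and return its score."""
    while "score" not in node:
        branch = "yes" if y[node["dim"]] > node["threshold"] else "no"
        node = node[branch]
    return node["score"]

def utterance_score(frames, tree):
    """Average of the per-frame scores (assumed aggregation rule)."""
    return sum(tree_score(y, tree) for y in frames) / len(frames)

# Hypothetical tree: each internal node tests one feature component
# against a threshold; each leaf stores a precomputed score that
# approximates the per-frame log-likelihood ratio.
tree = {
    "dim": 0, "threshold": 0.0,
    "yes": {
        "dim": 1, "threshold": 1.0,
        "yes": {"score": 0.9},
        "no": {"score": 0.3},
    },
    "no": {"score": -0.8},
}

frames = [[0.5, 1.5], [0.2, 0.4], [-0.3, 2.0]]
score = utterance_score(frames, tree)
accept = score > 0.0  # decision threshold chosen arbitrarily here
```

Scoring a frame costs only a few comparisons, which is consistent with the large reduction in complexity reported on the slide; an oblique tree would replace the single-component test with a test on a linear combination of components.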
Speaker recognition in the model space (1)
Formal links between LLR and KL-divergence
Training procedure + mean-only adaptation: likelihood ratio test ~= Euclidean distance in the model space
Ben, Bimbot et al.
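One standard way to see such a link, given here as an assumption since the slide does not show the derivation: when two GMMs share the same weights and diagonal covariances and differ only in their means, an upper bound on their KL-divergence is half the squared Euclidean distance between suitably scaled mean "supervectors". The sketch below computes that distance; all parameter values and helper names are hypothetical.

```python
import math

def supervector(weights, means, variances):
    """Stack component means, each scaled by sqrt(w_i) / sigma_id."""
    return [math.sqrt(w) * m / math.sqrt(v)
            for w, mu, var in zip(weights, means, variances)
            for m, v in zip(mu, var)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Shared weights and diagonal variances (as after mean-only adaptation
# from a common background model); only the means are speaker-specific.
weights = [0.6, 0.4]
variances = [[1.0, 4.0], [2.0, 0.5]]
speaker_a = [[0.5, 0.2], [3.0, 1.5]]
speaker_b = [[-0.5, 0.0], [2.0, 1.0]]

d_ab = euclidean(supervector(weights, speaker_a, variances),
                 supervector(weights, speaker_b, variances))
kl_bound = 0.5 * d_ab ** 2  # upper bound on KL(a || b) under these assumptions
```

For a single Gaussian component the bound is exact: with equal variance 4 and means 0 and 2, it gives 0.5, the closed-form KL-divergence. Comparing speakers then only requires a distance between fixed-length vectors, with no pass over the acoustic frames.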
Speaker recognition in the model space (2)
Consequences:
- faster score computation procedure (at least -50%)
- simpler normalization schemes (M-Norm)
no need for additional development data
with no performance degradation
Ben, Bimbot et al.
Tested successfully for speaker recognition in the NIST and ESTER campaigns
Audio indexing
HMM-based audio and audio-visual structuring (applied to sports programmes)
Audio segmentation and tracking using probabilistic models and statistical tests
Detection of simultaneous events in audio tracks
Granular models of audio signals using deformable atoms
Comparison and evaluation of beam-search techniques and hypothesis rescoring using external sources of knowledge [ongoing]
Algebraic representations and statistical modeling of formal music [ongoing]
Multi-stream HMM modeling (1) of a tennis match
Inspired and adapted from the speech recognition paradigms
Multi-level state-sequence representation of a tennis match
Kijak et al. (with TMM)
multi-stream audio-visual HMM
Multi-stream HMM modeling (2)
Delakis, Gravier et al. (with TexMex)
Video-only, shot-based: C = 77%
Video+Audio, shot-based + segmental: C = 85%
Segmental models, relaxed synchrony constraints
Sparse representations
Mathematical test for the optimality of a sparse representation
Matching pursuit made tractable (1 hour of audio processed in 0.25 x RT)
Structured matching pursuit incorporating explicit signal family models
Adaptive computational strategies
Beyond sparsity : recovering structured representations…
Learning shift-invariant atoms (MoTIF algorithms) [ongoing]
Sparse solutions to inverse linear problems
In the under-determined case :
Gribonval et al.
BUT if :
If a sparse representation is sparse enough, then it is the sparsest one
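The exact condition from the slide is not preserved in this transcript. One well-known sufficient condition of this kind, used here only as an illustration, is based on the mutual coherence mu of a unit-norm dictionary: a representation with fewer than (1 + 1/mu) / 2 non-zero coefficients is the unique sparsest one. The sketch below computes that bound for a toy dictionary; the helper names are hypothetical.

```python
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mutual_coherence(atoms):
    """Largest absolute inner product between two distinct unit-norm atoms."""
    return max(abs(dot(g, h)) for g, h in combinations(atoms, 2))

def uniqueness_bound(mu):
    """Representations with strictly fewer non-zero coefficients than
    this value are guaranteed to be the unique sparsest ones."""
    return 0.5 * (1.0 + 1.0 / mu)

# Union of two orthonormal bases in dimension 4: Diracs and a
# normalized Hadamard basis.
diracs = [[1.0 if t == i else 0.0 for t in range(4)] for i in range(4)]
hadamard = [[0.5, 0.5, 0.5, 0.5],
            [0.5, -0.5, 0.5, -0.5],
            [0.5, 0.5, -0.5, -0.5],
            [0.5, -0.5, -0.5, 0.5]]
atoms = diracs + hadamard

mu = mutual_coherence(atoms)    # 0.5
bound = uniqueness_bound(mu)    # 1.5
```

In this tiny example only 1-atom representations are covered by the guarantee; in higher dimensions the coherence of such unions of bases decreases and the guaranteed sparsity level grows accordingly.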
Matching Pursuit made tractable
Gribonval, Krstulovic et al.
MPTK: C++ toolkit, GPL licence
For a 1-hour audio signal, processing time reduced from 20 h to 0.25 h
Flexible operation, reproducible results
Usable in other fields: medical signals, seismology, etc.
Source separation (with primary focus on underdetermined problems)
Statistical schemes and adaptive training for single-channel separation
Source separation approaches using multi-channel Matching Pursuit in the underdetermined case
Contributions in evaluation methodology : task definition & performance measurements
Speech « denoising » using underdetermined source separation techniques
Dictionary design methods for source separation [ongoing]
DEMIX : a robust algorithm to estimate the number of sources using clustering techniques [ongoing]
Single sensor audio source separation
[Diagram: the observed signal (voice + music) goes through a Wiener filter, driven by a factorial GMM combining a voice GMM and a music GMM, to produce the estimated voice signal]
Benaroya, Bimbot, Gribonval, Ozerov (with FTR&D)
- innovative scheme for underdetermined source separation
- compatibility with speech processing state-of-the-art
- strong links with sparse decomposition problems
- versatile and efficient for a range of audio description tasks
Use of a factorial GMM to build a time-varying Wiener filter
Article in IEEE Trans. SAP, 2006
+ new results to come
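The way a pair of GMMs can drive a time-varying Wiener filter may be sketched as follows. This is a simplified illustration, not the published method: it assumes real-valued zero-mean coefficients per frequency bin, equal prior weights for all components, and hypothetical variance profiles.

```python
import math

def log_gauss0(x, var):
    """Log-density of a zero-mean Gaussian with variance `var`."""
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)

def separate_frame(x, voice_psds, music_psds):
    """Estimate the voice part of one observed frame `x`.

    voice_psds / music_psds: one variance profile (per frequency bin)
    for each GMM component. Every (voice, music) component pair defines
    a Wiener gain; gains are averaged with the posterior probability
    of each pair given the observed frame.
    """
    pairs, logs = [], []
    for v in voice_psds:
        for m in music_psds:
            # The mixture of two independent Gaussians has variance v + m.
            logs.append(sum(log_gauss0(xf, vf + mf)
                            for xf, vf, mf in zip(x, v, m)))
            pairs.append((v, m))
    top = max(logs)
    post = [math.exp(l - top) for l in logs]
    z = sum(post)
    voice = [0.0] * len(x)
    for p, (v, m) in zip(post, pairs):
        for f, xf in enumerate(x):
            voice[f] += (p / z) * v[f] / (v[f] + m[f]) * xf
    return voice

voice_psds = [[4.0, 0.1], [0.1, 4.0]]   # two hypothetical voice states
music_psds = [[0.1, 1.0]]               # one hypothetical music state
x = [2.0, 0.5]                          # observed mixture frame
voice_est = separate_frame(x, voice_psds, music_psds)
```

Since the posterior over component pairs is recomputed for every frame, the resulting filter changes over time even though each individual Wiener gain is fixed.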
Underdetermined stereophonic source separation using sparse methods
Mixing matrix
Separation: least squares, sparsity
Lesage, Gribonval et al.
Audio examples available
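The difference between a least-squares and a sparsity-based separation, for a known mixing matrix with two channels and three sources, can be sketched as follows. This is a toy illustration under stated assumptions: a single real-valued mixture sample, a hypothetical mixing matrix, and an l1 criterion minimized by enumerating pairs of sources (with two channels, an l1-minimal solution with at most two active sources always exists).

```python
import math
from itertools import combinations

def min_norm_solution(A, x):
    """Least-squares (minimum l2 norm) solution s = A^T (A A^T)^-1 x
    for a 2 x N mixing matrix A given as two rows."""
    g00 = sum(a * a for a in A[0])
    g01 = sum(a * b for a, b in zip(A[0], A[1]))
    g11 = sum(b * b for b in A[1])
    det = g00 * g11 - g01 * g01
    u = (g11 * x[0] - g01 * x[1]) / det
    v = (g00 * x[1] - g01 * x[0]) / det
    return [A[0][j] * u + A[1][j] * v for j in range(len(A[0]))]

def min_l1_solution(A, x):
    """Minimum l1 norm solution, found by solving the 2 x 2 system
    for every pair of sources and keeping the smallest l1 norm."""
    n = len(A[0])
    best = None
    for i, j in combinations(range(n), 2):
        det = A[0][i] * A[1][j] - A[0][j] * A[1][i]
        if abs(det) < 1e-12:
            continue  # the two directions are (almost) collinear
        si = (x[0] * A[1][j] - A[0][j] * x[1]) / det
        sj = (A[0][i] * x[1] - x[0] * A[1][i]) / det
        s = [0.0] * n
        s[i], s[j] = si, sj
        if best is None or abs(si) + abs(sj) < sum(abs(v) for v in best):
            best = s
    return best

c = math.sqrt(0.5)
A = [[1.0, c, 0.0],    # left channel gains of the 3 sources
     [0.0, c, 1.0]]    # right channel gains
x = [2 * c, 2 * c]     # mixture produced by the middle source alone (s = 2)

s_ls = min_norm_solution(A, x)
s_sparse = min_l1_solution(A, x)
```

On this example the least-squares solution spreads energy over all three sources (about 0.71, 1.0, 0.71), whereas the sparse solution assigns the whole sample to the middle source, which is the source that actually produced it.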
Collaborations, Dissemination and Visibility
Privileged cooperation with the TEXMEX group at IRISA (+ VISTA)
Consistent network of academic and industrial partners outside IRISA
Regular participation in collaborative projects (EU-IST, RNRT, bilateral partnerships, …)
Strong involvement in concerted research actions (ESTER, MathSTIC, GDR-ISIS, NIST evaluations, …)
Visible participation in and production of free software: ELISA platform, AudioSeg, MPTK, SIROCCO, BSS-EVAL
Sustained effort of publication and dissemination of the group research results
Additional visibility through responsibilities taken in scientific societies, workshop organisation and editorial boards
Summary 2002-2005
Strategy and perspectives 2006-2010
Achievements 2002-2005 (1)
solid contributions to the state of the art on several topics related to speaker and audio class modelling and recognition
key extension, experimentation and validation of the Hidden Markov Model framework for joint audio and video modelling and structuring
major theoretical and experimental progress in the field of sparse representations and adaptive decomposition
pioneering work in mono- and multi-channel source separation in the underdetermined case
Achievements 2002-2005 (2)
strategic improvement in the efficiency of pursuit algorithms both in terms of search strategy and implementation
development of a usable know-how in keyword spotting and speech recognition
sustained activities in assessment methodology, resource distribution and evaluation campaigns
scientific objective #4 needs consolidation
Strategy 2006-2010
To keep our position in our initial field of expertise: models, algorithms and tools for automatic processing of audio and speech signals
To push our advantage in the field of sparse representations, from both the theoretical and the applicative viewpoints
To extend our scope towards more powerful approaches for the representation and modeling of audio and multi-modal signals with an audio component
To step in and progress in the area of compressing large-scale high-dimensional multi-modal data
Scientific challenges
Probabilistic multi-level multi-stream dependency models for the representation of multiple sources and the integration of heterogeneous levels of knowledge in audio(-visual) streams (Bayesian networks)
Data-driven representations, model discovery and self-structuring of information in audio and audio-visual streams and contents (theoretical consolidation)
Experimental platforms and numerically efficient algorithms for large-scale data and near real-time processing (engineering work)
Deeper understanding of the links between theoretical concepts of adaptive representation, sparse decomposition, multi-scale analysis and practical implications in terms of robustness, separability and adaptability (potential links with SVM)
Compressing large-scale high-dimensional multimodal data for storage, description and classification (compressed sensing)
Questions