Duraid Y. Mohammed, Philip J. Duncan, Francis F. Li. School of Computing Science and Engineering,...

Post on 17-Jan-2016


Audio Content Analysis in the Presence of Overlapped Classes - A Non-Exclusive Segmentation Approach to Mitigate Information Losses

Global Summit and Expo on Multimedia & Applications

August 10-11, 2015 Birmingham, UK

The increasing volume of digital media archives is leading to increased demand for these goals.

Introduction

Classification- Challenge and Solution

11:50

Speech

SE

Music

Classical classification problems are logically exclusive, i.e. an element is assumed to be a member of one class and of that class only. This hinders some practical uses in audio information mining, since a segment of the soundtrack can contain speech, music, event sounds or a combination of them (a fuzzy element).

Non-exclusive classification can mitigate information losses.
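
The difference between the two decision rules can be illustrated with a minimal sketch. The class names, the scores and the 0.5 threshold below are hypothetical values chosen for illustration; they are not taken from the presented system.

```python
from typing import Dict, Set


def exclusive_label(scores: Dict[str, float]) -> str:
    """Classical exclusive decision: keep only the single best-scoring class."""
    return max(scores, key=scores.get)


def non_exclusive_labels(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Non-exclusive decision: keep every class whose score reaches the threshold."""
    return {name for name, score in scores.items() if score >= threshold}


# Hypothetical detector scores for one segment that contains speech over music.
segment_scores = {"speech": 0.82, "music": 0.67, "event": 0.05}

exclusive = exclusive_label(segment_scores)
non_exclusive = non_exclusive_labels(segment_scores)
```

With these illustrative scores the exclusive rule reports speech only, so the presence of music is lost, while the non-exclusive rule keeps both labels.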

A system integration approach to audio information mining can hypothetically be built upon the success in the following diverse areas.

To re-deploy these tools, it is essential that a pre-processor effectively identifies where speech, music and audio events of interest occur. These audio segments can then be processed by dedicated algorithms to obtain further information.

The Concept

Hello Door Knock

Universal Open Architecture

Spectral Subtraction Algorithm

A noise reduction technique.

Voice activity detection (VAD) is employed to detect musical speech and music segments.

Calculate the spectral magnitude of the music and musical speech segments.

Estimate the clean speech through the following formula:

$\hat{s}(i) = \left[\,|x(i)| - |\hat{d}(i)|\,\right] e^{j\theta(i)}$
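
The steps above can be sketched for a single frame with NumPy. This is a minimal illustration and not the presented implementation: the Hann window, the clamping of negative magnitudes to zero, the reuse of the noisy input phase and the synthetic tones standing in for speech and music are all assumptions.

```python
import numpy as np


def spectral_subtract(noisy_frame, interference_mag):
    """Magnitude spectral subtraction on one frame (illustrative sketch)."""
    window = np.hanning(len(noisy_frame))
    spectrum = np.fft.rfft(noisy_frame * window)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)  # assumption: reuse the phase of the noisy input
    # Subtract the interference estimate and clamp at zero so that
    # no magnitude becomes negative.
    clean_mag = np.maximum(magnitude - interference_mag, 0.0)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy_frame))


# Synthetic stand-ins (hypothetical): a 500 Hz "speech" tone over a 2000 Hz "music" tone.
fs, n = 8000, 512
t = np.arange(n) / fs
speech = np.sin(2 * np.pi * 500 * t)
music = 0.8 * np.sin(2 * np.pi * 2000 * t)
noisy = speech + music

# Interference magnitude estimated from a music-only frame, such as one flagged by a VAD.
music_mag = np.abs(np.fft.rfft(music * np.hanning(n)))
enhanced = spectral_subtract(noisy, music_mag)

# Compare the spectra before and after at the two tone frequencies.
before = np.abs(np.fft.rfft(noisy * np.hanning(n)))
after = np.abs(np.fft.rfft(enhanced))
speech_bin, music_bin = 32, 128  # 500 Hz and 2000 Hz at 15.625 Hz per bin
```

In this toy case the music bin is almost fully removed while the speech bin is left nearly unchanged. Real recordings overlap in frequency, so the result there depends on how well the interference magnitude is estimated.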

Data reduction.

Extract characteristic features.

Feature Spaces

Mel Frequency Cepstrum Coefficients (MFCCs).

STFT: temporal pattern analysis.

ZCR, RMS ‘loudness’, entropy, short-term energy.

Optimized Feature Space for Speech and Music Detection.
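
Several of the time-domain features listed above can be computed per frame as in the following sketch. The frame length, the hop size and the function names are hypothetical, and "entropy" is interpreted here as spectral entropy, which is an assumption since the slide does not say which entropy is used.

```python
import numpy as np


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (needs len(signal) >= frame_len)."""
    count = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(count)])


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zeros count as positive)."""
    signs = np.sign(frame)
    signs[signs == 0] = 1
    return float(np.mean(signs[1:] != signs[:-1]))


def rms(frame):
    """Root-mean-square level, a simple loudness proxy."""
    return float(np.sqrt(np.mean(np.square(frame))))


def short_term_energy(frame):
    """Sum of squared samples in the frame."""
    return float(np.sum(np.square(frame)))


def spectral_entropy(frame):
    """Shannon entropy (bits) of the normalised power spectrum; an assumed definition."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    total = power.sum()
    if total == 0:
        return 0.0
    p = power / total
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def feature_vector(frame):
    """Stack the four features into one vector per frame."""
    return np.array([zero_crossing_rate(frame), rms(frame),
                     short_term_energy(frame), spectral_entropy(frame)])


# Usage on a toy signal: 4 frames of 4 samples with a hop of 2.
frames = frame_signal(np.arange(10.0), frame_len=4, hop=2)
features = np.stack([feature_vector(f) for f in frames])
```

Each row of `features` corresponds to one frame, which is the form a classifier would typically consume alongside spectral features such as MFCCs.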

• Music Analysis Retrieval and SYnthesis for Audio Signals (MARSYAS).
• Open source framework for audio processing by George Tzanetakis, University of Victoria, Canada.
• Development of real-time audio analysis and synthesis tools.
• Audio processing system with specific emphasis on MIR.
• Implemented for exclusive classification (speech or music).
• Music genre organisation.

Speech and music classes are used as the starting point.

Toward generalization, different styles of samples were included in the training set.

Speech samples: children, male, female, speakers of different languages, loud speech, speech with laughter.

Music: all genres are included (jazz, pop, classical, rock, …).

All speech and music samples were normalized and then mixed together to produce speech-over-music samples.

Training Database Building

Class    Pure Speech   Mix Samples                                 Pure Music
Speech   100%          90%   80%   70%   60%   50%   40%   30%     0%
Music    0%            10%   20%   30%   40%   50%   60%   70%     100%
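
The mixing proportions in the table can be reproduced with a short sketch. It assumes peak normalisation and treats the percentages as linear amplitude weights; the slides state only that samples were normalised before mixing, so both choices, along with the function names, are assumptions.

```python
import numpy as np

# (speech share, music share) pairs taken from the table above.
MIX_LEVELS = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.7, 0.3), (0.6, 0.4),
              (0.5, 0.5), (0.4, 0.6), (0.3, 0.7), (0.0, 1.0)]


def peak_normalize(x):
    """Scale a signal so its largest absolute sample is 1 (assumed normalisation)."""
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x


def mix_speech_music(speech, music, speech_share):
    """Weighted sum of the normalised signals, truncated to the shorter one."""
    n = min(len(speech), len(music))
    s = peak_normalize(speech[:n])
    m = peak_normalize(music[:n])
    return speech_share * s + (1.0 - speech_share) * m


# Usage with toy signals standing in for real recordings.
toy_speech = np.array([0.5, -0.5])
toy_music = np.array([2.0, 2.0, 2.0])
training_mixes = [mix_speech_music(toy_speech, toy_music, s) for s, _ in MIX_LEVELS]
```

Generating one mix per row of `MIX_LEVELS` for every speech and music pair yields pure, mixed and overlapped examples from the same source recordings.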

Toolbox Demonstration

Results Comparison Before and After Speech Enhancement

AUDIO CLASS   MARSYAS                      ED UOA                       Length/Seconds
              Fr       Fa       ERD        Fr       Fa       ERD
SPEECH        45.56%   7.03%    26.30%     2.49%    8.45%    5.47%      1580
MUSIC         7.70%    45.56%   26.63%     11.76%   1.53%    6.65%      2115

$\text{Error Detection Rate (ERD)} = \dfrac{Fr + Fa}{2}$
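
As a quick arithmetic check, the ERD column can be recomputed from the Fr and Fa columns of the results table. The dictionary layout and function name are illustrative; the numbers are copied from the table.

```python
def error_detection_rate(fr, fa):
    """ERD as the mean of the Fr and Fa percentages."""
    return (fr + fa) / 2


# (Fr, Fa, reported ERD) in percent, copied from the results table.
reported = {
    ("MARSYAS", "SPEECH"): (45.56, 7.03, 26.30),
    ("MARSYAS", "MUSIC"): (7.70, 45.56, 26.63),
    ("ED UOA", "SPEECH"): (2.49, 8.45, 5.47),
    ("ED UOA", "MUSIC"): (11.76, 1.53, 6.65),
}

# Recompute ERD for every row and record the gap to the reported value.
differences = {
    key: abs(error_detection_rate(fr, fa) - erd)
    for key, (fr, fa, erd) in reported.items()
}
```

Every recomputed value agrees with the reported ERD to within rounding at two decimal places.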

Open structure and common interfaces toward a general classifier.

Redeployment of currently available techniques.

Encourage third party contributions.

Rapid prototyping of the UOA Audio Information Mining system.

Summary and Conclusions

Thank you for Listening

Audio Routing

Machine Learning

Sound Event Detection

ASR

Role of MIR in UOA
