STATISTICAL TOOLS FOR AUDIO PROCESSING Signal Image (Ecn) Mathieu Lagrange Some material taken from Dan Ellis courses


Page 1: STATISTICAL TOOLS FOR AUDIO PROCESSING

STATISTICAL TOOLS FOR AUDIO PROCESSING

Signal Image (Ecn)
Mathieu Lagrange

Some material taken from Dan Ellis courses

Page 2: STATISTICAL TOOLS FOR AUDIO PROCESSING

Machine Learning

• Machine Learning deals with sub-problems in engineering and sciences rather than the global “intelligence” issue!
  •  Applied
  •  A set of well-defined approaches, each within its limits, that can be applied to a problem set
  •  Classification / Pattern Recognition / Sequential Reasoning / Induction / Parameter Estimation etc.

Page 3: STATISTICAL TOOLS FOR AUDIO PROCESSING

Machine Learning
• Provides tools and reasoning for the design process of a given problem
  •  Is an empirical science
• Has a profound theoretical background
  •  Is extremely diverse
• Keep in mind that:
  •  Algorithms SHALL NOT be applied blindly to your data/problem set!
  •  The MATLAB Toolbox syndrome: examine the hypotheses and limitations of each approach before hitting enter!
  •  Do not forget your own intelligence!

Page 4: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (I)
• Communication theory:
  •  Question: What should an optimal decoder do to recover Y from X?
  •  X is usually referred to as the observation and is a random variable.
  •  In most problems, the real state of the world (Y) is not observable to us! So we try to infer it from the observation.

Page 5: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (I)
•  This is a typical Classification problem
•  Intuitive Solution:
  •  Threshold on 0.5
  •  But let’s make life more difficult!

Page 6: STATISTICAL TOOLS FOR AUDIO PROCESSING
Page 7: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (I)
• Simple Solution 2:
  •  Try to find an optimal boundary (defined as g(x)) that best separates the two classes.
  •  Define the decision function as the signed (+ or -) distance from this boundary.
  •  I am thus assuming a family of functions g(x) that can discriminate the classes of X.
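A minimal sketch of the boundary idea, assuming a linear g(x) = w·x + b. The weights `w`, the bias `b` and the helper names are hypothetical and chosen only for illustration; the slides do not specify the family of g(x).

```python
import numpy as np

def g(x, w, b):
    """Signed distance from the points x to the hyperplane w.x + b = 0."""
    w = np.asarray(w, dtype=float)
    return (np.asarray(x, dtype=float) @ w + b) / np.linalg.norm(w)

def classify(x, w, b):
    """Label +1 on the positive side of the boundary, -1 otherwise."""
    return np.where(g(x, w, b) >= 0, 1, -1)

# Hypothetical 1-D case: the boundary x = 0.5 reproduces the simple threshold.
w, b = np.array([1.0]), -0.5
points = np.array([[0.1], [0.4], [0.6], [0.9]])
labels = classify(points, w, b)
```

In one dimension this reduces to the threshold on 0.5; in more dimensions the same code describes a separating line or plane.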

Page 8: STATISTICAL TOOLS FOR AUDIO PROCESSING
Page 9: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (I)
•  In the real world things are not as simple
  •  Consider the following 2-dimensional problem
  •  Not hard to see the problem!

Page 12: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (I)
•  In the real world things are not as simple
  •  Consider the following 2-dimensional problem
  1. To what extent does our solution generalize to new data?
     •  The central aim of designing a classifier is to correctly classify novel input!
  2. How do we know when we have collected an adequately large and representative set of examples for training?
  3. How can we trade off model complexity against performance?

Page 13: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (II)
•  This is a typical Regression problem
• Polynomial Curve Fitting

Page 14: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (II)
• Polynomial Curve Fitting
  •  Sum-of-squares Error Function
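The error function shown on this slide did not survive extraction. A common form, assumed here, is E(w) = 1/2 Σ_n (y(x_n, w) − t_n)², where y is the polynomial with coefficients w and t_n are the targets. The sketch below evaluates it on hypothetical toy data (noisy samples of a sine) and uses `numpy.polyfit` to find the least-squares coefficients.

```python
import numpy as np

def sum_of_squares_error(w, x, t):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 for a polynomial y.

    w is ordered from the highest power to the constant term (np.polyval convention).
    """
    residual = np.polyval(w, x) - t
    return 0.5 * np.sum(residual ** 2)

# Hypothetical toy data: 10 noisy samples of sin(2*pi*x) on [0, 1].
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

# np.polyfit returns the least-squares coefficients, i.e. the minimizer of E(w).
w_fit = np.polyfit(x, t, deg=3)
error_fit = sum_of_squares_error(w_fit, x, t)
error_zero = sum_of_squares_error(np.zeros(4), x, t)
```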

Page 15: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (II)
• Polynomial Curve Fitting
  •  0th order polynomial

Page 16: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (II)
• Polynomial Curve Fitting
  •  1st order polynomial

Page 17: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (II)
• Polynomial Curve Fitting
  •  3rd order polynomial

Page 18: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (II)
• Polynomial Curve Fitting
  •  9th order polynomial

Page 19: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (II)
• Polynomial Curve Fitting
  •  Over-fitting
  •  Root-Mean-Square (RMS) Error
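To make the over-fitting effect concrete, here is a small sketch that compares training and test RMS error for the polynomial orders shown on the previous slides. The RMS formula on the slide is missing, so the code assumes the usual definition (square root of the mean squared residual). The data generator is hypothetical.

```python
import numpy as np

def rms_error(w, x, t):
    """Root-mean-square residual of the polynomial with coefficients w."""
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

rng = np.random.default_rng(1)

def make_data(n):
    # Hypothetical generator: noisy samples of sin(2*pi*x) on [0, 1].
    x = np.linspace(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

x_train, t_train = make_data(10)
x_test, t_test = make_data(100)

train_rms, test_rms = {}, {}
for order in (0, 1, 3, 9):
    w = np.polyfit(x_train, t_train, deg=order)
    train_rms[order] = rms_error(w, x_train, t_train)
    test_rms[order] = rms_error(w, x_test, t_test)
    print(f"order {order}: train RMS {train_rms[order]:.3f}, test RMS {test_rms[order]:.3f}")
```

The training error can only go down as the order grows, and with 10 points a 9th order polynomial interpolates them. The test error is what reveals whether the extra flexibility generalizes.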

Page 20: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (II)
• Polynomial Curve Fitting
  •  Over-fitting and regularization
  •  Effect of data set size (9th order polynomial)

Page 21: STATISTICAL TOOLS FOR AUDIO PROCESSING

Sample Example (II)
• Polynomial Curve Fitting
  •  Regularization
  •  Penalize large coefficient values
  •  9th order polynomial with
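One common way to penalize large coefficients, assumed here since the slide's formula is missing, is to add λ/2·‖w‖² to the sum-of-squares error (ridge regression). The sketch below solves this in closed form; the function name, the data and the two λ values are hypothetical.

```python
import numpy as np

def fit_ridge_polynomial(x, t, order, lam):
    """Minimize 1/2*sum (y - t)^2 + lam/2*||w||^2 in closed form.

    Returns coefficients from the constant term up to x**order.
    For simplicity the constant term is penalized too.
    """
    phi = np.vander(x, order + 1, increasing=True)   # design matrix
    a = phi.T @ phi + lam * np.eye(order + 1)
    return np.linalg.solve(a, phi.T @ t)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(x.size)

w_small_penalty = fit_ridge_polynomial(x, t, order=9, lam=1e-8)
w_large_penalty = fit_ridge_polynomial(x, t, order=9, lam=1e-3)
```

A larger λ shrinks the coefficient vector, which tames the oscillations of the 9th order fit at the cost of a slightly larger training error.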

Page 22: STATISTICAL TOOLS FOR AUDIO PROCESSING

TRAIN MACHINES

•  Interaction between
  • The machine
  • The designer

Page 23: STATISTICAL TOOLS FOR AUDIO PROCESSING

The machine
• Pattern recognition in action:
  •  Examples:
  •  Speaker Detection
  •  Music genre classification
  •  Many more

Page 24: STATISTICAL TOOLS FOR AUDIO PROCESSING

The designer
• Pattern recognition design cycle:
  •  Examples:
  •  Speaker Detection
  •  Music genre classification
  •  Many more

Page 25: STATISTICAL TOOLS FOR AUDIO PROCESSING

Feature Extraction
• Right features are critical
  •  Invariance under irrelevant modifications
  •  Theoretically equivalent features may act very differently in a particular classifier
  •  Representations make important aspects explicit
  •  Remove irrelevant information
  •  Feature design incorporates “domain knowledge”
  •  although more data -> less need for “cleverness”
• Smaller “feature space” (fewer dimensions)
  •  Simpler models (fewer parameters)
  •  Less training data needed
  •  Faster training

Page 26: STATISTICAL TOOLS FOR AUDIO PROCESSING

The right features for audio?
• Completely depends on the task at hand
  •  Speaker recognition
  •  Musical genre detection
• Most common perceptual dimensions
  •  Loudness (Amplitude)
  •  Pitch (Frequency)
  •  Timbre (Spectral Envelope)

Page 27: STATISTICAL TOOLS FOR AUDIO PROCESSING

What is important for humans?

Page 28: STATISTICAL TOOLS FOR AUDIO PROCESSING

Frequency Decomposition
• A great idea that can be implemented in various ways:
  •  Mechanically
  •  With analog electronics
  •  Digitally (fortunately)

Page 29: STATISTICAL TOOLS FOR AUDIO PROCESSING

Discrete Fourier Transform (DFT)
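The DFT definition on this slide was lost in extraction. Assuming the usual convention X[k] = Σ_n x[n]·exp(−2πj·k·n/N), which is also the one used by `numpy.fft.fft`, a direct (slow) implementation can be sketched as follows. The test signal is hypothetical.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(x.size)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / x.size) @ x

# A cosine with exactly 3 cycles in 32 samples peaks in bin 3 (and its mirror, bin 29).
signal = np.cos(2 * np.pi * 3 * np.arange(32) / 32)
spectrum = dft(signal)
```

In practice one would call `np.fft.fft` (or `np.fft.rfft` for real signals), which computes the same quantity in O(N log N).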

Page 30: STATISTICAL TOOLS FOR AUDIO PROCESSING

Short Time Fourier Transform (STFT)
• Want to localize energy in time and frequency
  •  Break sound into short-time pieces
  •  Calculate the DFT of each one
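The two steps above can be sketched in a few lines. The frame length, hop size, Hann window and test signal are assumptions made for illustration; the slides do not fix them.

```python
import numpy as np

def stft(x, frame_length=256, hop=128):
    """Return an array of shape (n_frames, frame_length // 2 + 1)."""
    window = np.hanning(frame_length)
    n_frames = 1 + (len(x) - frame_length) // hop
    # Break the sound into overlapping short-time pieces and window each one.
    frames = np.stack([x[i * hop : i * hop + frame_length] * window
                       for i in range(n_frames)])
    # DFT of each piece (real input, so only non-negative frequencies are kept).
    return np.fft.rfft(frames, axis=1)

# Hypothetical test signal: 0.5 s at 500 Hz followed by 0.5 s at 1500 Hz, 8 kHz sampling.
sr = 8000
time = np.arange(sr // 2) / sr
x = np.concatenate([np.sin(2 * np.pi * 500 * time), np.sin(2 * np.pi * 1500 * time)])

spectrogram = np.abs(stft(x)) ** 2            # power per (frame, frequency bin)
peak_hz = spectrogram.argmax(axis=1) * sr / 256
```

Plotting `spectrogram` with time on one axis and frequency on the other gives the spectrogram; here `peak_hz` jumps from the first tone to the second halfway through.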

Page 31: STATISTICAL TOOLS FOR AUDIO PROCESSING

The Spectrogram

Page 32: STATISTICAL TOOLS FOR AUDIO PROCESSING

Focus on the spectral envelope

Page 33: STATISTICAL TOOLS FOR AUDIO PROCESSING

MFCCs ?
1.  Take the Fourier transform of (a windowed excerpt of) a signal.
2.  Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3.  Take the logs of the powers at each of the mel frequencies.
4.  Take the discrete cosine transform (DCT) of the mel log powers.
5.  The MFCCs are the amplitudes of the resulting spectrum.
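The five steps can be sketched for a single frame with NumPy only. Several choices below are assumptions not given in the slides: the mel formula 2595·log10(1 + f/700), 26 filters, 13 coefficients, a Hann window, an unnormalized DCT-II, and all function names.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular, overlapping filters with centers equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2))
    bin_hz = np.arange(n_fft // 2 + 1) * sr / n_fft
    bank = np.zeros((n_filters, bin_hz.size))
    for i in range(n_filters):
        lo, center, hi = edges_hz[i : i + 3]
        rising = (bin_hz - lo) / (center - lo)
        falling = (hi - bin_hz) / (hi - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct_ii(v, n_coeffs):
    """Unnormalized DCT-II: c[k] = sum_m v[m] * cos(pi * k * (m + 0.5) / M)."""
    m = np.arange(v.size)
    k = np.arange(n_coeffs).reshape(-1, 1)
    return np.cos(np.pi * k * (m + 0.5) / v.size) @ v

def mfcc_frame(frame, sr, n_filters=26, n_coeffs=13):
    windowed = frame * np.hanning(frame.size)                        # step 1: window + DFT
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_energy = mel_filterbank(n_filters, frame.size, sr) @ power   # step 2: mel mapping
    log_energy = np.log(mel_energy + 1e-10)                          # step 3 (floor avoids log(0))
    return dct_ii(log_energy, n_coeffs)                              # steps 4-5: DCT amplitudes

# Hypothetical input: one 512-sample frame of a 440 Hz tone sampled at 16 kHz.
sr = 16000
frame = np.sin(2 * np.pi * 440 * np.arange(512) / sr)
coeffs = mfcc_frame(frame, sr)
```

Keeping only the first few DCT coefficients retains the smooth spectral envelope and discards fine detail such as pitch harmonics.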

Page 34: STATISTICAL TOOLS FOR AUDIO PROCESSING

MFCCs Rules ?


Page 35: STATISTICAL TOOLS FOR AUDIO PROCESSING

Example
• Audio

Page 36: STATISTICAL TOOLS FOR AUDIO PROCESSING

Benefits of the DCT step
• Pols observed that the main components capture most of the variance using a few smooth basis functions, smoothing away the pitch ripples
• Principal components of vowel spectra on a warped frequency scale aren't so far from the cosine basis functions
• Decorrelates the features.
  •  This is important because the MFCCs are in most cases modelled by Gaussians with diagonal covariance matrices

Page 37: STATISTICAL TOOLS FOR AUDIO PROCESSING

Classification
• Given some data x and some classes $C_i$, the optimal classifier picks the class with the highest posterior probability: $\hat{C} = \arg\max_{C_i} p(C_i \mid x)$
• Can model this posterior directly
  •  Nearest neighbor, SVMs, AdaBoost, neural nets
  •  Leads to a discriminative model
• Can consider the data likelihood $p(x \mid C_i)$ instead
  •  Thanks to Bayes’ rule
  •  Leads to a generative model

Page 38: STATISTICAL TOOLS FOR AUDIO PROCESSING

Basics on random variables
• Random variables have joint distributions p(x, y)
• Marginal distribution of y is $p(y) = \int p(x, y)\,dx$
• Knowing one value in a joint distribution constrains the remainder

Page 39: STATISTICAL TOOLS FOR AUDIO PROCESSING

Bayes Rule
• Bayes is powerful
  •  For generative models, it boils down to
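The equations on this slide did not survive extraction. Assuming the standard form of Bayes' rule for classes $C_i$ and an observation $x$ (notation chosen here to match the Classification slide), it can be written as below. For a generative model the denominator is the same for every class, so the decision reduces to comparing likelihood times prior.

```latex
p(C_i \mid x) = \frac{p(x \mid C_i)\, P(C_i)}{p(x)},
\qquad
p(x) = \sum_j p(x \mid C_j)\, P(C_j)

\hat{C} = \arg\max_{C_i} \; p(x \mid C_i)\, P(C_i)
```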

Page 40: STATISTICAL TOOLS FOR AUDIO PROCESSING

Gaussian models
• Easiest way to model distributions is via a parametric model
  •  Assume known form, estimate a few parameters
• Gaussian model is simple and useful:
• Parameters to fit:
  •  Mean
  •  Variance
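As an illustration of fitting these two parameters, the sketch below computes the maximum-likelihood mean and variance of a sample and evaluates the resulting density. The function names and the toy sample are hypothetical.

```python
import numpy as np

def fit_gaussian(samples):
    """Maximum-likelihood estimates of the mean and variance."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    variance = np.mean((samples - mean) ** 2)   # ML estimate divides by N, not N - 1
    return mean, variance

def gaussian_pdf(x, mean, variance):
    """p(x) = exp(-(x - mean)^2 / (2 * variance)) / sqrt(2 * pi * variance)."""
    return np.exp(-((x - mean) ** 2) / (2.0 * variance)) / np.sqrt(2.0 * np.pi * variance)

mean, variance = fit_gaussian([1.0, 2.0, 3.0, 4.0, 5.0])
```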

Page 41: STATISTICAL TOOLS FOR AUDIO PROCESSING

In d dimensions
• Described by
  •  A d-dimensional mean
  •  A d×d covariance matrix
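A minimal sketch of the d-dimensional case, assuming the standard multivariate Gaussian density. Working with the log-density avoids numerical underflow when d is large; the function name and the example values are hypothetical.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """log N(x | mean, cov) for a single d-dimensional point x."""
    x, mean, cov = (np.asarray(a, dtype=float) for a in (x, mean, cov))
    d = mean.size
    diff = x - mean
    _, log_det = np.linalg.slogdet(cov)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + mahalanobis)

# With a diagonal covariance the density factorizes into d one-dimensional Gaussians.
log_p = gaussian_log_density([1.0, 2.0], mean=[0.0, 0.0], cov=np.diag([1.0, 4.0]))
```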

Page 42: STATISTICAL TOOLS FOR AUDIO PROCESSING

Gaussian mixture models
• Single Gaussians cannot model
  •  distributions with multiple modes
  •  distributions with nonlinear correlations
• What about a weighted sum?
  •  Can approximate any distribution given enough components
  •  Interpretation: each observation is generated by one of the Gaussians, chosen with probability equal to its weight
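The weighted-sum view and the generative interpretation can both be sketched in one dimension. The weights, means and variances below are hypothetical values picked to give two clearly separated modes.

```python
import numpy as np

def gmm_pdf(x, weights, means, variances):
    """p(x) = sum_k weights[k] * N(x | means[k], variances[k]) for 1-D x."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    norm = np.sqrt(2.0 * np.pi * variances)
    components = np.exp(-((x - means) ** 2) / (2.0 * variances)) / norm
    return components @ weights

def gmm_sample(n, weights, means, variances, rng):
    """Pick a component per sample with probability weights[k], then draw from it."""
    k = rng.choice(len(weights), size=n, p=weights)
    return means[k] + np.sqrt(variances[k]) * rng.standard_normal(n)

# Hypothetical two-mode mixture that no single Gaussian could represent.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
variances = np.array([0.5, 1.0])

grid = np.linspace(-10.0, 12.0, 2201)
density = gmm_pdf(grid, weights, means, variances)
samples = gmm_sample(1000, weights, means, variances, np.random.default_rng(3))
```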

Page 43: STATISTICAL TOOLS FOR AUDIO PROCESSING

Gaussian mixtures
• Can approximate nonlinear correlation
• Problem: estimate the parameters of the model
  •  Easy if we knew which Gaussian generated each x

Page 44: STATISTICAL TOOLS FOR AUDIO PROCESSING

Expectation-maximisation (EM)
• General procedure for estimating model parameters when some data are unknown (hidden)
  •  e.g. which GMM component generated a point
  •  Iteratively update model parameters to maximize Q, the expected log-probability of observed data x and hidden data z
  •  E step: calculate the posterior of the hidden data z using the current model
  •  M step: find the model that maximizes Q using this posterior
  •  Can prove that the likelihood is non-decreasing
  •  Hence converges towards a maximum likelihood model
  •  Local optimum -> depends on initialization

Page 45: STATISTICAL TOOLS FOR AUDIO PROCESSING

Fitting GMMs with EM
• Want to find
  •  The parameters of the Gaussians
  •  Weights / priors on Gaussians
  •  That maximize the likelihood of training data x
  •  If one could assign each training sample x to a particular Gaussian, the estimation would be trivial (model fitting)
• Hence, we treat mixture indices, z, as hidden
  •  Want to optimize Q of the form
  •  Differentiate wrt model parameters
  •  Leads to update equations that are:

Page 46: STATISTICAL TOOLS FOR AUDIO PROCESSING

Update equations
• Parameters that maximize Q
• Each involves a “soft assignment” of xn to Gaussian k
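The update equations themselves were lost in extraction. The sketch below assumes the standard EM updates for a one-dimensional Gaussian mixture, where each parameter is a responsibility-weighted version of the usual estimate. The initialization scheme and the toy data are hypothetical.

```python
import numpy as np

def em_gmm_1d(x, n_components, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM; returns weights, means, variances, log-likelihoods."""
    x = np.asarray(x, dtype=float)
    # Hypothetical initialization: means spread over the data range, shared variance.
    means = np.linspace(x.min(), x.max(), n_components)
    variances = np.full(n_components, x.var())
    weights = np.full(n_components, 1.0 / n_components)
    history = []
    for _ in range(n_iter):
        # E step: soft assignment of every x_n to every Gaussian k.
        diff = x[:, None] - means
        joint = weights * np.exp(-diff ** 2 / (2 * variances)) / np.sqrt(2 * np.pi * variances)
        history.append(np.sum(np.log(joint.sum(axis=1))))
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M step: weighted versions of the usual mean / variance / prior estimates.
        n_k = resp.sum(axis=0)
        means = (resp * x[:, None]).sum(axis=0) / n_k
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / n_k
        weights = n_k / x.size
    return weights, means, variances, np.array(history)

rng = np.random.default_rng(4)
data = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(10.0, 1.0, 200)])
weights, means, variances, history = em_gmm_1d(data, n_components=2)
```

`history` holds the log-likelihood at each iteration, which should never decrease. A different initialization may converge to a different local optimum.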

Page 47: STATISTICAL TOOLS FOR AUDIO PROCESSING

E-M example
•  Start
(Fig. from A. Moore’s tutorial)

Page 48: STATISTICAL TOOLS FOR AUDIO PROCESSING

E-M example
•  1st iteration
(Fig. from A. Moore’s tutorial)

Page 49: STATISTICAL TOOLS FOR AUDIO PROCESSING

E-M example
•  2nd iteration
(Fig. from A. Moore’s tutorial)

Page 50: STATISTICAL TOOLS FOR AUDIO PROCESSING

E-M example
•  3rd iteration
(Fig. from A. Moore’s tutorial)

Page 51: STATISTICAL TOOLS FOR AUDIO PROCESSING

E-M example
•  4th iteration
(Fig. from A. Moore’s tutorial)

Page 52: STATISTICAL TOOLS FOR AUDIO PROCESSING

E-M example
•  5th iteration
(Fig. from A. Moore’s tutorial)

Page 53: STATISTICAL TOOLS FOR AUDIO PROCESSING

E-M example
•  6th iteration
(Fig. from A. Moore’s tutorial)

Page 54: STATISTICAL TOOLS FOR AUDIO PROCESSING

E-M example
•  20th iteration
(Fig. from A. Moore’s tutorial)

Page 55: STATISTICAL TOOLS FOR AUDIO PROCESSING

Density Estimation

(Fig. Wikipedia)

Page 56: STATISTICAL TOOLS FOR AUDIO PROCESSING

What about K-means then?
• A special case of EM for GMMs, where
  •  The membership assignment is thresholded (each point belongs to a single Gaussian)
  •  The Gaussians are fully described by their means
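Seen this way, K-means alternates a hard assignment step with a mean update. The sketch below is a plain version of Lloyd's algorithm; the deterministic initialization and the toy data are assumptions made so that the example is reproducible.

```python
import numpy as np

def kmeans(x, n_clusters, n_iter=20):
    """Lloyd's algorithm on an (n_samples, n_dims) array."""
    # Hypothetical deterministic initialization: evenly spaced samples of the data.
    centers = x[np.linspace(0, len(x) - 1, n_clusters).astype(int)].copy()
    for _ in range(n_iter):
        # Hard "E step": each point belongs entirely to its nearest center.
        distances = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # "M step": each center becomes the mean of its assigned points.
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(5)
x = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(5.0, 0.5, (100, 2))])
centers, labels = kmeans(x, n_clusters=2)
```

Compared with EM for GMMs there are no variances or weights to estimate, and the responsibilities are 0 or 1 instead of soft values.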

Page 57: STATISTICAL TOOLS FOR AUDIO PROCESSING

K-means

Page 58: STATISTICAL TOOLS FOR AUDIO PROCESSING

K-means

Page 59: STATISTICAL TOOLS FOR AUDIO PROCESSING

K-means

Page 60: STATISTICAL TOOLS FOR AUDIO PROCESSING

Now
•  You have
  •  Features for representing audio in a meaningful way
    •  The MFCCs are able to compactly describe the spectral envelope
  •  A tool to learn GMMs from training data (complete data for which you know the memberships)
  •  Thanks to Bayes’ theorem,
    •  you know that, given an observation, the model under which this observation has the maximum likelihood is the best one to consider.
•  You can
  •  Abstract recorded audio in a meaningful way
  •  Learn models for each class
  •  Given an unlabeled sample, decide which label is the most suitable
•  So,
  •  Have some rest and some food
  •  See you this afternoon for some hands-on practice

Page 61: STATISTICAL TOOLS FOR AUDIO PROCESSING

Resources

•  Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig, Prentice Hall.
•  Pattern Classification, R. Duda, P. Hart, D. Stork, Wiley Interscience, 2000.
•  Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006.
•  Introduction to Machine Learning, Ethem Alpaydin, MIT Press, 2004.
•  The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. Friedman, Springer Verlag, 2001. Also available online: http://www-stat.stanford.edu/~tibs/ElemStatLearn/