gabor presentation

Easy Does It: Robust Spectro-Temporal Many-Stream ASR without Fine Tuning Streams

Ravuri, Morgan, UC Berkeley

Presented by JJ


Post on 06-Apr-2018

Page 1: Gabor presentation

8/3/2019 Gabor presentation

http://slidepdf.com/reader/full/gabor-presentation 1/28

Motivation

• Physiological experiments in different mammal species: a large percentage of neurons in the primary auditory cortex (A1) respond differently to upward- versus downward-moving ripples in the spectrogram of the input (Depireux et al., 2001).

• Spectro-temporal receptive fields (STRFs): individual neurons are sensitive to specific spectro-temporal modulation frequencies in the incoming sound signal.

Introduction

• Cortically inspired time-frequency (TF) features capture spectral and temporal modulations that are relevant to speech recognition and discrimination.

• Basically, spectro-temporal features are derived by filtering spectrograms with particular filters.

• In this case, the Gabor filter is applied to the auditory spectrogram.
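
As a rough illustration of the filtering step described above, the sketch below correlates a toy spectrogram with a small 2D kernel. The function name `filter_spectrogram`, the 23-channel by 50-frame random input, and the zero padding at the edges are assumptions made for the example; none of them come from the slides.

```python
import numpy as np

def filter_spectrogram(spec, kernel):
    """Correlate a (channels x frames) spectrogram with a 2D kernel.

    The input is zero-padded so the output has the same shape as `spec`.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(spec, ((ph, kh - 1 - ph), (pw, kw - 1 - pw)))
    out = np.empty(spec.shape, dtype=np.result_type(spec, kernel))
    for k in range(spec.shape[0]):          # frequency channel index
        for n in range(spec.shape[1]):      # time frame index
            out[k, n] = np.sum(padded[k:k + kh, n:n + kw] * kernel)
    return out

# Toy input (assumed sizes): 23 frequency channels x 50 frames.
rng = np.random.default_rng(0)
spec = rng.standard_normal((23, 50))

# Sanity check with a kernel that passes the input through unchanged.
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
same = filter_spectrogram(spec, identity)
```

Replacing `identity` with a spectro-temporal kernel gives one filtered feature map per filter.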

Example

Example Gabor Filters

Gaussian envelope

complex sinusoid s(n, k)

1D Gabor

Gaussian envelope; complex sinusoid s(n, k)

2D Gabor

Gaussian envelope; complex sinusoid s(n, k)
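
The slide's equation did not survive in this transcript, so the following is only a minimal sketch of a 2D Gabor kernel built as a Gaussian envelope multiplied by a complex sinusoid. The parameter names (`omega_k`, `omega_n`, `sigma_k`, `sigma_n`), the use of k for the frequency-channel axis and n for the time axis, and the example values are assumptions, not the authors' parameterization.

```python
import numpy as np

def gabor_2d(size_k, size_n, omega_k, omega_n, sigma_k, sigma_n):
    """Return a complex 2D Gabor kernel of shape (size_k, size_n).

    k is assumed to index frequency channels and n time frames;
    omega_* are modulation frequencies in radians per sample and
    sigma_* are the widths of the Gaussian envelope.
    """
    k = np.arange(size_k) - (size_k - 1) / 2.0
    n = np.arange(size_n) - (size_n - 1) / 2.0
    K, N = np.meshgrid(k, n, indexing="ij")
    envelope = np.exp(-0.5 * ((K / sigma_k) ** 2 + (N / sigma_n) ** 2))
    sinusoid = np.exp(1j * (omega_k * K + omega_n * N))
    return envelope * sinusoid

# Hypothetical example values, chosen only for illustration.
g = gabor_2d(size_k=9, size_n=15, omega_k=0.5, omega_n=0.25,
             sigma_k=2.0, sigma_n=4.0)
```

Setting `omega_k = 0` gives a purely temporal filter and `omega_n = 0` a purely spectral one; the sign of their ratio selects upward or downward ripple direction.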

Their Gabor Filters

parameters

Dummy indices

Tons of Combinations!

System

[Block diagram: parallel streams (Stream, …, Stream) → Merge MLP outputs → PCA → combined with MFCC → Output]

• MLP (Multilayer Perceptron)

• The structure of the MLP depends on the type of feature and corpus.

Number of            Spectral              Cepstral
input units          567                   351
frames of context    9                     9
hidden units         160 for Aurora2,      160 for Aurora2,
                     500 for Numbers95     500 for Numbers95
output units         56                    56

Diagram labels: 56D, 32D, 56D, 45D
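
To make the layer sizes concrete, here is a minimal sketch of one stream's MLP forward pass using the spectral/Aurora2 sizes from the table (567 inputs, 160 hidden units, 56 outputs). The sigmoid hidden layer, the softmax output, and the random weights are assumptions for illustration; the slides give only the layer sizes. Note that 567 = 9 × 63 and 351 = 9 × 39, which is consistent with 9 frames of context; the per-frame sizes are inferred from that arithmetic.

```python
import numpy as np

def mlp_posteriors(x, W1, b1, W2, b2):
    """One hidden layer (sigmoid, assumed) followed by a softmax output."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    z = h @ W2 + b2
    z = z - z.max(axis=-1, keepdims=True)   # for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

# Sizes from the table: spectral stream on Aurora2.
n_in, n_hidden, n_out = 567, 160, 56

# Random, untrained weights: placeholders only.
rng = np.random.default_rng(0)
W1 = 0.01 * rng.standard_normal((n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.01 * rng.standard_normal((n_hidden, n_out))
b2 = np.zeros(n_out)

frames = rng.standard_normal((4, n_in))     # 4 stacked-context input vectors
post = mlp_posteriors(frames, W1, b1, W2, b2)
```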

• The outputs of each MLP stream provide an estimate of the posterior probability distribution for phones.

• Then, combine these phone probability estimates across streams by inverse entropy.
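
The slides name inverse entropy combination without giving a formula. The sketch below uses one common formulation, assumed here: for each frame, every stream is weighted by the reciprocal of the entropy of its posterior distribution, and the weights are normalized to sum to one. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def inverse_entropy_merge(posteriors, eps=1e-12):
    """Merge per-stream phone posteriors with inverse-entropy weights.

    posteriors: array of shape (streams, frames, phones).
    Returns an array of shape (frames, phones).
    """
    p = np.asarray(posteriors, dtype=float)
    entropy = -np.sum(p * np.log(p + eps), axis=-1)   # (streams, frames)
    inv = 1.0 / (entropy + eps)
    weights = inv / inv.sum(axis=0, keepdims=True)    # sums to 1 per frame
    return np.sum(weights[..., None] * p, axis=0)

# Two toy streams, one frame, four classes: one confident, one uninformative.
confident = [[0.97, 0.01, 0.01, 0.01]]
uniform = [[0.25, 0.25, 0.25, 0.25]]
merged = inverse_entropy_merge([confident, uniform])
```

Low-entropy (confident) streams dominate the merged estimate, so here the result stays much closer to the confident stream than a plain average would.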

• Then apply the KL transform (Principal Components Analysis) to the log probabilities of the merged MLPs.

• reduced to 32D

• orthogonalized

• the features are mean and variance normalized by utterance

• finally appended to the MFCC feature
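
The post-processing steps listed above can be sketched as follows, assuming 56 merged posteriors per frame, a 32-dimensional projection, and 39-dimensional MFCCs, which together give 71 dimensions. Estimating the transform from the covariance of the log posteriors, fitting it on the same toy data it is applied to, and the function names are assumptions for illustration; in practice the transform would presumably be estimated on training data.

```python
import numpy as np

def fit_klt(log_post, n_keep=32):
    """Estimate a KL transform (PCA) from log posteriors (frames x phones)."""
    mean = log_post.mean(axis=0)
    cov = np.cov(log_post - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_keep]       # keep largest n_keep
    return mean, eigvecs[:, order]

def tandem_features(post, mfcc, mean, basis, eps=1e-10):
    """Project log posteriors, normalize per utterance, append to MFCCs."""
    proj = (np.log(post + eps) - mean) @ basis       # decorrelate and reduce
    proj = (proj - proj.mean(axis=0)) / (proj.std(axis=0) + eps)
    return np.hstack([mfcc, proj])

# Toy utterance: 200 frames of random posteriors and random "MFCCs".
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(56), size=200)          # (200, 56), rows sum to 1
mfcc = rng.standard_normal((200, 39))

mean, basis = fit_klt(np.log(post + 1e-10), n_keep=32)
feats = tandem_features(post, mfcc, mean, basis)
```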

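• Features → HMM

Dimension labels in the diagram: 56D (MLP outputs), 32D (after PCA), 39D (MFCC), 71D (MFCC with the 32D features appended)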

Experiments

Database

• Aurora 2 (0–20 dB)

• Numbers95

  • consists of various numeric portions extracted from telephone dialogues

  • vocabulary size of 32 words

  • training set contains 3590 utterances of clean data, totaling roughly 3 hrs

  • 2 test sets contain 1227 utterances

    • The first contains only clean data

    • The second contains the same utterances with noise added at five SNRs (20 dB, 15 dB, 10 dB, 5 dB, and 0 dB)

• Additive noise

Baseline

• 39 MFCC

• 4-stream system

• 28-stream system

Uni-modulation system

• 150 streams

• spectral only and spectral/cepstral

Metric: Word Error Rate (WER)
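
For reference, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below is a minimal implementation of that standard definition; the function name and the digit-string example are made up for illustration and are not results from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))        # distances for an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("two" -> "too") and one deletion ("four"): 2 errors / 4 words.
wer = word_error_rate("one two three four", "one too three")
```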

Results

Aurora 2

Numbers 95

Discussion 1

Discussion 2

Discussion 3

Future Work

• Not just additive noise

• Another TF feature might not work

• Log-mel filterbank? Or power like PNCC?

• How to combine MLP? Inverse entropy?