
Reverberation Characterization

Spring 2005 E6820 Speech and Audio Processing and Recognition

Byung Suk Lee

2005-3-9


Columbia University


Introduction

Reverberant sound is the collection of all the reflected sounds in an auditorium.

• A conventional model: direct, early reflection, and exponential decay. → Focus on exponential decay.

• Reverberation Time: T60 is defined as the time for the sound to die away to a level 60 dB below its original level.

• Sabine’s model: $T_{60} = 0.16\,(\mathrm{s/m})\,\frac{V}{S_e}$ [meter units] $= 0.049\,(\mathrm{s/ft})\,\frac{V}{S_e}$ [foot units], where $V$ is the volume of the enclosure, $S_e$ = effective absorbing area $= a_1 S_1 + a_2 S_2 + a_3 S_3 + \cdots$, and $a_i$ is the absorption constant for surface $S_i$, $i = 1, 2, \cdots$.

• Air absorption: frequency dependent → effect of lowpass filtering.
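As a quick numerical check of Sabine's model, the sketch below evaluates the metric form for a hypothetical room. The function name `sabine_t60`, the room dimensions, and the absorption constants are illustrative assumptions, not values from the lecture.

```python
def sabine_t60(volume_m3, surfaces):
    """Return T60 in seconds from Sabine's formula (metric units).

    surfaces: iterable of (area_m2, absorption_constant) pairs.
    """
    effective_area = sum(area * alpha for area, alpha in surfaces)
    if effective_area <= 0:
        raise ValueError("effective absorbing area must be positive")
    return 0.16 * volume_m3 / effective_area


# Hypothetical 5 m x 4 m x 3 m room; absorption values are illustrative only.
room = [
    (20.0, 0.30),  # floor
    (20.0, 0.05),  # ceiling
    (54.0, 0.10),  # four walls: 2*(5*3) + 2*(4*3)
]
t60 = sabine_t60(5 * 4 * 3, room)
print(f"T60 = {t60:.2f} s")
```

Here the effective absorbing area is 6 + 1 + 5.4 = 12.4 m², so T60 = 0.16 × 60 / 12.4, roughly 0.77 s.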


Motivation

• Influence audio perception

◦ Degradation of speech intelligibility: RT larger than 0.5 s. (detrimental)

◦ Musical auditorium: reverberation adds richness of sound. (favorable to some extent)

• Application

◦ Reverberation estimation could give high-level information about the recording condition.

◦ Online RT estimation is an enabling technology for a real-time signal processing system that responds to the degree of reverberation.

◦ Enhance speech recognition and speaker verification.

◦ Speech quality improvement: intelligibility, richness.


Digital Reverberator

The spectrogram of reverberator output using pseudo-random decaying impulse response.



The waveform of reverberator output using pseudo-random decaying impulse response.
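One plausible reading of a pseudo-random decaying impulse response is Gaussian white noise multiplied by an exponential envelope that reaches -60 dB at the chosen RT. The sketch below builds such a response and convolves it with a dry signal. The sample rate, RT, seed, and function names are assumptions for illustration, and this is not necessarily how the plotted output was generated.

```python
import math
import random


def decaying_noise_ir(t60_s, fs_hz, length_s, seed=0):
    """Gaussian white noise under an exponential envelope that hits -60 dB at t60_s."""
    rng = random.Random(seed)
    tau = t60_s * fs_hz / (3.0 * math.log(10.0))  # decay constant in samples
    n_samples = int(length_s * fs_hz)
    return [rng.gauss(0.0, 1.0) * math.exp(-n / tau) for n in range(n_samples)]


def convolve(x, h):
    """Direct convolution, O(len(x) * len(h)); adequate for short test signals."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # skip silent input samples
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


fs = 8000                                   # assumed sample rate in Hz
h = decaying_noise_ir(t60_s=0.5, fs_hz=fs, length_s=0.6)
dry = [1.0] + [0.0] * 99                    # unit impulse as a trivial dry input
wet = convolve(dry, h)                      # equals h followed by zeros
```

Replacing `dry` with a speech segment gives a reverberated signal whose tail decays at the chosen RT.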



The spectrogram of reverberator outputs using: (1) 1st-order all-pole comb filter (2) 2nd-order all-pole oscillatory filter (3) 1st-order all-pass filter (4) 2nd-order all-pass filter.
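The two first-order filters in this list can be sketched with one delay line each. This assumes the common Schroeder forms, y[n] = x[n] + g·y[n−D] for the all-pole comb and y[n] = −g·x[n] + x[n−D] + g·y[n−D] for the all-pass. The slides do not give the difference equations, and the gain and delay values below are illustrative. The second-order variants are not shown.

```python
def comb_allpole(x, g, delay):
    """1st-order all-pole comb: y[n] = x[n] + g * y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        fb = y[n - delay] if n >= delay else 0.0
        y.append(xn + g * fb)
    return y


def allpass(x, g, delay):
    """1st-order all-pass: y[n] = -g*x[n] + x[n - delay] + g*y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y.append(-g * xn + x_d + g * y_d)
    return y


impulse = [1.0] + [0.0] * 999
h_comb = comb_allpole(impulse, g=0.7, delay=50)   # echoes 1, g, g^2, ... every 50 samples
h_ap = allpass(impulse, g=0.7, delay=50)          # -g, then (1 - g^2) * g^(k-1)
energy_ap = sum(v * v for v in h_ap)              # close to 1 for an all-pass
```

The all-pass impulse response has total energy close to 1 (up to truncation), consistent with a flat magnitude response, while the comb's echoes decay as g^k.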



1. Ratnam et al., 2003. → Exponential decay assumption

$$P(y; a, \sigma) = \frac{1}{a(0) \cdots a(N-1)} \left( \frac{1}{2\pi\sigma^2} \right)^{N/2} \exp\left( -\frac{\sum_{n=0}^{N-1} \left( y(n)/a(n) \right)^2}{2\sigma^2} \right) \tag{1}$$

(a) Solve the two conditions iteratively until convergence.

(b) Fast block algorithm: gradient-descent-type iteration.

(c) Fast online algorithm: calculate L(y; a, σ) for a set of candidate RTs.

2. Nakatani & Miyoshi, 2003. → Dereverberation by inverse filtering using the harmonic structure of the voiced speech.



(1) Likelihood → log-likelihood, with a(n) → a^n and a = exp(−1/τ):

$$L(y; a, \sigma) = -\frac{N(N-1)}{2} \ln(a) - \frac{N}{2} \ln\left( 2\pi\sigma^2 \right) - \frac{1}{2\sigma^2} \sum_{n=0}^{N-1} a^{-2n} y^2(n)$$
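The first term of the log-likelihood follows from substituting a(n) = a^n into the product in the prefactor of Eq. (1). Written out as a worked step (not quoted from the cited papers):

```latex
\prod_{n=0}^{N-1} a(n) = \prod_{n=0}^{N-1} a^{n} = a^{\sum_{n=0}^{N-1} n} = a^{N(N-1)/2}
\quad\Longrightarrow\quad
\ln \frac{1}{a(0) \cdots a(N-1)} = -\frac{N(N-1)}{2} \ln(a)
```

The remaining two terms are the logarithms of the Gaussian normalization and of the exponential, with y(n)/a(n) replaced by a^{-n} y(n).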

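To find a∗ that maximizes L(y; a, σ), take the derivatives with respect to a and σ and set them to 0:

$$\frac{\partial L(y; a, \sigma)}{\partial a} = -\frac{N(N-1)}{2a} + \frac{1}{a\sigma^2} \sum_{n=0}^{N-1} n\, a^{-2n} y^2(n) = 0$$

$$\frac{\partial L(y; a, \sigma)}{\partial \sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3} \sum_{n=0}^{N-1} a^{-2n} y^2(n) = 0$$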
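Setting the σ-derivative to zero gives σ² = (1/N) Σ a^(−2n) y²(n), so σ can be eliminated and the log-likelihood evaluated for each candidate decay rate. The sketch below does this over a grid of candidate RTs, in the spirit of the fast online algorithm (c). It is not the authors' implementation. It assumes τ is measured in samples, so that T60 = 3 ln(10) τ / fs, and the sample rate, grid, window length, and function names are hypothetical.

```python
import math
import random


def log_likelihood(y, a):
    """Log-likelihood for decay rate a, with sigma^2 set to its ML value for that a."""
    n_samples = len(y)
    # sigma^2 = (1/N) * sum a^(-2n) y(n)^2, from the sigma-derivative condition
    sigma2 = sum((yn / a ** n) ** 2 for n, yn in enumerate(y)) / n_samples
    return (-0.5 * n_samples * (n_samples - 1) * math.log(a)
            - 0.5 * n_samples * math.log(2.0 * math.pi * sigma2)
            - 0.5 * n_samples)


def t60_to_a(t60_s, fs_hz):
    """Per-sample decay rate a = exp(-1/tau) for a given T60 (assumes tau in samples)."""
    tau = t60_s * fs_hz / (3.0 * math.log(10.0))
    return math.exp(-1.0 / tau)


def estimate_t60(y, fs_hz, candidates_s):
    """Return the candidate RT whose decay rate maximizes the log-likelihood."""
    return max(candidates_s, key=lambda t: log_likelihood(y, t60_to_a(t, fs_hz)))


# Synthetic test: exponentially decaying white noise with a known RT.
fs = 1000                       # assumed sample rate in Hz
true_t60 = 0.30
rng = random.Random(1)
a_true = t60_to_a(true_t60, fs)
y = [rng.gauss(0.0, 1.0) * a_true ** n for n in range(400)]
candidates = [0.05 * k for k in range(1, 21)]   # 0.05 s to 1.00 s
est = estimate_t60(y, fs, candidates)
print(f"estimated T60 = {est:.2f} s")
```

A finer candidate grid trades computation for resolution. On real speech, the window would also have to fall in a free-decay region for the model to hold.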


Exponentially decaying white noise


RT estimation

An exponentially decaying white noise


RT estimation

Nakatani’s female speech sample


RT estimation

A Meeting Recorder sample


Project Goal

1. Metric: A fast, online, accurate, and robust RT estimation.

• Low computational complexity is desired.

• Ratnam claims that the approach is the first of its kind.

• First, implement Ratnam’s method.

• Incorporate frequency-dependent RT estimation.

• Reduce the variance of the estimate.

• Adjust the window length adaptively.

2. Online dereverberation of speech.

• Use the inverse of an all-pole filter. (invertible)

• Design an all-pole filter that has a desired RT.

• Estimate the RT and dereverberate using an all-pole filter designed for the estimated RT.


Preliminary result

1. RT estimation algorithm implementation

• olrvbest.m: fast online estimation by comparing the log-likelihood of candidate RTs → tested with various samples.

• Critical parameter: estimation window (N). (N ↓ ⇒ variance ↑ and N ↑ ⇒ estimated RT ↑) ⇒ two cases where the estimation yields a longer RT:

◦ not during free decay: onset of speech or ongoing speech

◦ gradual offset

• Does the minimum a give the right RT? No, because of the underlying stochastic process. → order statistics filter.

2. Dereverberation

Inverse filter method: using the inverse of the reverberation filter.


Preliminary result


Project Plan

• RT estimation based on the exponential decay model. Continue to work on characterizing the RT estimation for accuracy and robustness.

◦ Include an order statistics filter for automatic RT decision. → Find x such that $P(x) = \int_0^x p(a^*)\, da^* = \gamma$.

◦ Develop a way to estimate RT that captures the frequency-dependent decay of reverberation. → Calculate the likelihood along consecutive frequency bins.

◦ Reduce variance by applying a higher weight to the earlier portion of the exponential decay.

◦ Trace the frequency bin with greater energy to get a robust estimate: wider dynamic range of signal before decaying below the noise floor.

◦ Adaptively adjust the estimation window to get a robust RT estimation.
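The order statistics step in this plan amounts to picking the γ-quantile of the distribution of per-window estimates a∗. A minimal sketch using the empirical distribution of a batch of estimates is given below. The function name, the sample values, and the choice γ = 0.2 are assumptions for illustration, since the slides do not specify γ.

```python
import math


def order_statistic(estimates, gamma):
    """Return the smallest x whose empirical CDF over `estimates` is >= gamma."""
    if not estimates:
        raise ValueError("need at least one estimate")
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    ranked = sorted(estimates)
    index = math.ceil(gamma * len(ranked)) - 1
    return ranked[index]


# Hypothetical per-window decay-rate estimates a*. The larger values stand in
# for windows that were not in free decay and so overestimate the RT.
a_star = [0.9990, 0.9972, 0.9971, 0.9995, 0.9973,
          0.9970, 0.9988, 0.9969, 0.9974, 0.9992]
picked = order_statistic(a_star, gamma=0.2)
```

Choosing a low quantile rather than the minimum keeps the selection biased toward free-decay windows while being less sensitive to a single outlying estimate.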


Project Plan

• Dereverberation → by using the inverse of an all-pole type filter with an equivalent RT.

◦ Design an all-pole filter that yields a specific RT.

◦ Dereverberate speech using the designed filter.

◦ Devise a filter that can reverse reverberation of a specific RT. [True research]
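The first two steps of this plan can be illustrated with the simplest all-pole reverberator, a single feedback comb. This is a sketch under that assumption. The feedback gain is set so the echo amplitude drops by 60 dB after the target RT, and the inverse is the corresponding two-tap FIR filter. The delay, sample rate, and function names are hypothetical, and real room responses are not this simple.

```python
def comb_gain_for_t60(t60_s, delay_samples, fs_hz):
    """Feedback gain g so the comb's echoes fall by 60 dB after t60_s seconds."""
    return 10.0 ** (-3.0 * delay_samples / (fs_hz * t60_s))


def reverberate(x, g, delay):
    """All-pole comb: y[n] = x[n] + g * y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        y.append(xn + (g * y[n - delay] if n >= delay else 0.0))
    return y


def dereverberate(y, g, delay):
    """FIR inverse of the comb: x[n] = y[n] - g * y[n - delay]."""
    return [yn - (g * y[n - delay] if n >= delay else 0.0)
            for n, yn in enumerate(y)]


fs = 8000                        # assumed sample rate in Hz
delay = 400                      # 50 ms loop delay (illustrative)
g = comb_gain_for_t60(0.5, delay, fs)
dry = [1.0, 0.5, -0.25] + [0.0] * 2000
wet = reverberate(dry, g, delay)
recovered = dereverberate(wet, g, delay)
max_err = max(abs(r - d) for r, d in zip(recovered, dry))
```

With the true g and delay, the inverse recovers the dry signal up to rounding error. In the online setting, g would instead come from the estimated RT, so the recovery would only be as good as that estimate and the all-pole model.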


References

1. R. Ratnam, D. L. Jones, and W. D. O’Brien, Jr., "Fast algorithms for blind estimation of reverberation time," IEEE Signal Processing Letters, vol. 11, pp. 537, 2004.

2. R. Ratnam et al., "Blind estimation of reverberation time," The Journal of the Acoustical Society of America, vol. 114, no. 5, pp. 2877-2892, November 2003.

3. T. Nakatani and M. Miyoshi, "Blind Dereverberation of Single Channel Speech Signal Based on Harmonic Structure," IEEE ICASSP-2003, I-92-95, 2003.

