digital audio signal processing lecture-4: noise reduction

Digital Audio Signal Processing

Lecture-4: Noise Reduction

Marc Moonen / Alexander Bertrand
Dept. E.E./ESAT-STADIUS, KU Leuven

[email protected]

homes.esat.kuleuven.be/~moonen/


Digital Audio Signal Processing Version 2014-2015 Lecture-4: Noise Reduction p. 2

Overview

• Spectral subtraction for single-microphone noise reduction
  – Single-microphone noise reduction problem
  – Spectral subtraction basics (= spectral filtering)
  – Features: gain functions, implementation, musical noise, …
  – Iterative Wiener filter based on speech signal model

• Multi-channel Wiener filter for multi-microphone noise reduction
  – Multi-microphone noise reduction problem
  – Multi-channel Wiener filter (= spectral + spatial filtering)

• Kalman filter based noise reduction
  – Kalman filters
  – Kalman filters for noise reduction


Single-Microphone Noise Reduction Problem

• Microphone signal is

$$ y[k] = s[k] + n[k] $$

  where s[k] is the desired signal contribution and n[k] is the noise contribution.

• Goal: Estimate s[k] based on y[k]

• Applications: Speech enhancement in conferencing, handsfree telephony, hearing aids, …; digital audio restoration

• Will consider speech applications: s[k] = speech signal

[Figure: the desired signal s[k] and the noise signal(s) are picked up by a single microphone as y[k]; an unknown processing block (?) produces the desired signal estimate $\hat{s}[k]$.]


Spectral Subtraction Methods: Basics

• Signal chopped into `frames’ (e.g. 10..20 msec); for each frame a frequency domain representation is

$$ y[k] = s[k] + n[k] \quad\Rightarrow\quad Y_i(\omega) = S_i(\omega) + N_i(\omega) \qquad \text{(i-th frame)} $$

• However, the speech signal is an on/off signal, hence some frames have speech + noise, i.e.

$$ Y_i(\omega) = S_i(\omega) + N_i(\omega) $$

  and some frames have noise only, i.e.

$$ Y_i(\omega) = 0 + N_i(\omega), \qquad \text{frame } i \in \{\text{`noise-only' frames}\} $$

• A speech detection algorithm is needed to distinguish between these 2 types of frames (based on energy/dynamic range/statistical properties, …)


Spectral Subtraction Methods: Basics

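• Definition: $\mu(\omega)$ = average amplitude of noise spectrum

$$ \mu(\omega) = E\{ |N_i(\omega)| \} $$

• Assumption: noise characteristics change slowly, hence estimate $\mu(\omega)$ by (long-time) averaging over (M) noise-only frames

$$ \hat{\mu}(\omega) = \frac{1}{M} \sum_{M \text{ noise-only frames}} |Y_i(\omega)| $$

• Estimate clean speech spectrum $S_i(\omega)$ (for each frame), using corrupted speech spectrum $Y_i(\omega)$ (for each frame, i.e. short-time estimate) + estimated $\hat{\mu}(\omega)$:

$$ \hat{S}_i(\omega) = G_i(\omega) \, Y_i(\omega) $$

  based on `gain function’

$$ G_i(\omega) = f\big(\hat{\mu}(\omega), Y_i(\omega)\big) $$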
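The long-time averaging over noise-only frames can be sketched as follows. This is a minimal illustration, assuming the frame spectra are already available as a complex array and that a speech detector has produced a boolean noise-only flag per frame; the function and argument names are hypothetical.

```python
import numpy as np

def estimate_noise_magnitude(Y_frames, noise_only):
    """Average |Y_i(w)| over the frames flagged as noise-only.

    Y_frames   : complex array, shape (num_frames, num_bins)
    noise_only : boolean array, shape (num_frames,), True for noise-only frames
                 (assumed to come from a separate speech detection algorithm)
    """
    Y_frames = np.asarray(Y_frames)
    noise_only = np.asarray(noise_only, dtype=bool)
    if not noise_only.any():
        raise ValueError("need at least one noise-only frame")
    # Long-time average of the magnitude spectrum, one value per frequency bin
    return np.abs(Y_frames[noise_only]).mean(axis=0)
```

The returned vector plays the role of the noise amplitude estimate that enters every gain function.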


Spectral Subtraction: Gain Functions

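$$ G_i(\omega) = f\big(\hat{\mu}(\omega), Y_i(\omega)\big) $$

• Magnitude Subtraction:

$$ G_i(\omega) = 1 - \frac{\hat{\mu}(\omega)}{|Y_i(\omega)|} $$

• Spectral Subtraction:

$$ G_i(\omega) = \sqrt{ 1 - \frac{\hat{\mu}^2(\omega)}{|Y_i(\omega)|^2} } $$

• Wiener Estimation:

$$ G_i(\omega) = 1 - \frac{\hat{\mu}^2(\omega)}{|Y_i(\omega)|^2} $$

• Maximum Likelihood:

$$ G_i(\omega) = \frac{1}{2} \left( 1 + \sqrt{ 1 - \frac{\hat{\mu}^2(\omega)}{|Y_i(\omega)|^2} } \right) $$

• Non-linear Estimation:

$$ G_i(\omega) = f\big(\hat{\mu}(\omega), Y_i(\omega)\big) $$

• Ephraim-Malah (= most frequently used in practice, see next slide):

$$ G_i(\omega) = f(\mathrm{SNR}_{post}, \mathrm{SNR}_{prio}) $$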
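As a rough illustration of how the closed-form gain functions compare, the sketch below evaluates them for one frequency bin. It assumes the textbook forms of the magnitude subtraction, spectral subtraction, Wiener and maximum-likelihood gains, and it clips negative values to zero (half-wave rectification). The function name and the `rule` strings are hypothetical.

```python
import math

def gain(rule, y_mag, mu_hat):
    """Gain for one bin, given |Y_i(w)| (y_mag > 0) and the noise estimate mu_hat."""
    r = (mu_hat / y_mag) ** 2          # estimated noise-to-observation power ratio
    if rule == "magnitude":
        g = 1.0 - mu_hat / y_mag
    elif rule == "spectral":
        g = math.sqrt(max(1.0 - r, 0.0))
    elif rule == "wiener":
        g = 1.0 - r
    elif rule == "ml":
        g = 0.5 * (1.0 + math.sqrt(max(1.0 - r, 0.0)))
    else:
        raise ValueError("unknown rule: " + rule)
    return max(g, 0.0)                 # half-wave rectification

# Example: observation twice as large as the noise estimate
for rule in ("magnitude", "spectral", "wiener", "ml"):
    print(rule, round(gain(rule, 2.0, 1.0), 4))
```

For the same input, magnitude subtraction attenuates most and the maximum-likelihood rule least, which is one way to see the trade-off between noise reduction and speech distortion.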



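• Example 1: Ephraim-Malah Suppression Rule (EMSR)

$$ G_i(\omega) = \frac{\sqrt{\pi}}{2} \sqrt{ \frac{1}{\mathrm{SNR}_{post}} \cdot \frac{\mathrm{SNR}_{prio}}{1+\mathrm{SNR}_{prio}} } \; M\!\left[ \mathrm{SNR}_{post} \cdot \frac{\mathrm{SNR}_{prio}}{1+\mathrm{SNR}_{prio}} \right] $$

  with:

$$ M[\theta] = e^{-\theta/2} \left[ (1+\theta) \, I_0\!\left(\tfrac{\theta}{2}\right) + \theta \, I_1\!\left(\tfrac{\theta}{2}\right) \right] \qquad (I_0, I_1 = \text{modified Bessel functions}) $$

$$ \mathrm{SNR}_{post}(\omega) = \frac{|Y_i(\omega)|^2}{\hat{\mu}^2(\omega)} $$

$$ \mathrm{SNR}_{prio}(\omega) = (1-\alpha) \max\big(\mathrm{SNR}_{post}(\omega) - 1, \, 0\big) + \alpha \, \frac{|G_{i-1}(\omega) \, Y_{i-1}(\omega)|^2}{\hat{\mu}^2(\omega)} $$

  (skip formulas)

• This corresponds to an MMSE (*) estimation of the speech spectral amplitude $|S_i(\omega)|$ based on observation $Y_i(\omega)$ (estimate equal to $E\{ |S_i(\omega)| \,|\, Y_i(\omega) \}$), assuming Gaussian a priori distributions for $S_i(\omega)$ and $N_i(\omega)$ [Ephraim & Malah 1984].

• Similar formula for MMSE log-spectral amplitude estimation [Ephraim & Malah 1985].

(*) minimum mean squared error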
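The suppression rule can be evaluated numerically with a few lines of code. The sketch below assumes the commonly quoted form of the rule, G = (√π/2) · sqrt(v / SNR_post) · M[SNR_post · v] with v = SNR_prio / (1 + SNR_prio) and M[θ] = exp(−θ/2) [(1+θ) I0(θ/2) + θ I1(θ/2)], together with a decision-directed a priori SNR. The smoothing constant `alpha=0.98` and all function names are illustrative assumptions, and the Bessel functions are computed from their power series, which is only adequate for moderate arguments.

```python
import math

def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x) via its power series (moderate x only)."""
    return sum((x / 2.0) ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def emsr_gain(snr_post, snr_prio):
    """Ephraim-Malah gain for one bin (assumed form, see the lead-in)."""
    v = snr_prio / (1.0 + snr_prio)
    theta = snr_post * v
    M = math.exp(-theta / 2.0) * ((1.0 + theta) * bessel_i(0, theta / 2.0)
                                  + theta * bessel_i(1, theta / 2.0))
    return (math.sqrt(math.pi) / 2.0) * math.sqrt(v / snr_post) * M

def snr_prio_decision_directed(snr_post, prev_amp2, mu_hat2, alpha=0.98):
    """A priori SNR from the current a posteriori SNR and the previous frame's
    enhanced power prev_amp2 = |G_{i-1} Y_{i-1}|^2; alpha is an assumed constant."""
    return (1.0 - alpha) * max(snr_post - 1.0, 0.0) + alpha * prev_amp2 / mu_hat2
```

At high SNR the gain approaches SNR_prio / (1 + SNR_prio), i.e. a Wiener-like gain driven by the smoothed a priori SNR rather than by the fluctuating instantaneous spectrum.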



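• Example 2: Magnitude Subtraction

  – Signal model:

$$ Y_i(\omega) = S_i(\omega) + N_i(\omega) = |Y_i(\omega)| \, e^{j\theta_{y,i}(\omega)} $$

  – Estimation of clean speech spectrum:

$$ \hat{S}_i(\omega) = \big[ |Y_i(\omega)| - \hat{\mu}(\omega) \big] \, e^{j\theta_{y,i}(\omega)} = \underbrace{\left[ 1 - \frac{\hat{\mu}(\omega)}{|Y_i(\omega)|} \right]}_{G_i(\omega)} Y_i(\omega) $$

  – PS: half-wave rectification

$$ G_i(\omega) \;\Rightarrow\; \max\big(0, \, G_i(\omega)\big) $$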



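• Example 3: Wiener Estimation

  – Linear MMSE estimation: find linear filter $G_i(\omega)$ to minimize the MSE

$$ E\big\{ \big| S_i(\omega) - \underbrace{G_i(\omega) \, Y_i(\omega)}_{\hat{S}_i(\omega)} \big|^2 \big\} $$

  – Solution:

$$ G_i(\omega) = \frac{E\{ S_i(\omega) \, Y_i^*(\omega) \}}{E\{ Y_i(\omega) \, Y_i^*(\omega) \}} = \frac{P_{sy,i}(\omega)}{P_{yy,i}(\omega)} $$

    where $P_{sy,i}(\omega)$ is the cross-correlation and $P_{yy,i}(\omega)$ the auto-correlation in the i-th frame.

    Assume speech s[k] and noise n[k] are uncorrelated, then...

$$ G_i(\omega) = \frac{P_{ss,i}(\omega)}{P_{yy,i}(\omega)} = \frac{P_{yy,i}(\omega) - P_{nn,i}(\omega)}{P_{yy,i}(\omega)} \approx \frac{|Y_i(\omega)|^2 - \hat{\mu}^2(\omega)}{|Y_i(\omega)|^2} = 1 - \frac{\hat{\mu}^2(\omega)}{|Y_i(\omega)|^2} $$

  – PS: half-wave rectification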
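For completeness, the step from the MSE criterion to the Wiener gain can be written out as follows. This is a standard derivation, assuming zero-mean signals and treating $G_i(\omega)$ as a deterministic complex scalar per frequency; the cost symbol $J$ is introduced here only for the derivation.

```latex
\begin{align*}
J(G_i) &= E\{ |S_i - G_i Y_i|^2 \} \\
       &= E\{|S_i|^2\} - G_i^* \, E\{S_i Y_i^*\} - G_i \, E\{Y_i S_i^*\} + |G_i|^2 \, E\{|Y_i|^2\} \\
\frac{\partial J}{\partial G_i^*} &= -E\{S_i Y_i^*\} + G_i \, E\{|Y_i|^2\} = 0
  \quad\Rightarrow\quad G_i = \frac{E\{S_i Y_i^*\}}{E\{Y_i Y_i^*\}} = \frac{P_{sy,i}}{P_{yy,i}} \\
\text{with } Y_i = S_i + N_i,\ E\{S_i N_i^*\} = 0: \quad
  P_{sy,i} &= E\{|S_i|^2\} = P_{ss,i}, \qquad P_{yy,i} = P_{ss,i} + P_{nn,i}
\end{align*}
```

The last line is what allows the unknown speech spectrum to be replaced by the difference between the observed spectrum and the noise spectrum.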


Spectral Subtraction: Implementation

→ Short-time Fourier Transform (= uniform DFT-modulated analysis filter bank)

$$ Y[n,i] = \sum_{k=0}^{K-1} h[k] \, y[iM-k] \, e^{-j 2\pi n k / N} $$

  Y[n,i] = estimate for $Y(\omega_n)$ at time i (i-th frame)

  N = number of frequency bins (channels), n = 0..N-1
  M = downsampling factor
  K = frame length
  h[k] = length-K analysis window (= prototype filter)

→ frames with 50%...66% overlap (i.e. 2-, 3-fold oversampling, N=2M..3M)

→ subband processing:

$$ \hat{S}[n,i] = G[n,i] \, Y[n,i] $$

→ synthesis bank: matched to analysis bank (see DSP-CIS)

[Figure: y[k] → short-time analysis → Y[n,i] → gain functions (using $\hat{\mu}[n,i]$) → $\hat{S}[n,i]$ → short-time synthesis → $\hat{s}[k]$]
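A block-based version of this analysis / gain / synthesis chain can be sketched as below. It is not the filter-bank implementation of the slides but a common overlap-add equivalent, assuming a square-root Hann window used for both analysis and synthesis, 50% overlap (N = 2M) and a frame length equal to the DFT size. The function name and the `gain_fn` callback are hypothetical.

```python
import numpy as np

def stft_process(y, gain_fn, N=256):
    """Apply a per-bin gain in the STFT domain and resynthesize by overlap-add.

    y       : real input signal (1-D array)
    gain_fn : callable mapping a frame spectrum Y (rfft bins) to gains G
    N       : frame length = DFT size; hop size is M = N // 2 (50% overlap)
    """
    M = N // 2
    # Periodic sqrt-Hann window: analysis * synthesis windows overlap-add to 1
    win = np.sqrt(0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(N) / N)))
    out = np.zeros(len(y))
    for start in range(0, len(y) - N + 1, M):
        frame = y[start:start + N] * win          # short-time analysis
        Y = np.fft.rfft(frame)
        S_hat = gain_fn(Y) * Y                    # subband processing
        out[start:start + N] += np.fft.irfft(S_hat, n=N) * win  # synthesis
    return out
```

With a unit gain the interior of the signal is reconstructed exactly, which is a quick sanity check before plugging in any of the gain functions; the first and last half frame are only covered by one window and are therefore attenuated.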


Spectral Subtraction: Musical Noise

• Audio demo: car noise, y[k] → magnitude subtraction → $\hat{s}[k]$

• Artifact: musical noise

  What? Short-time estimates of $|Y_i(\omega)|$ fluctuate randomly in noise-only frames, resulting in random gains $G_i(\omega)$
  → statistical analysis shows that broadband noise is transformed into a signal composed of short-lived tones with randomly distributed frequencies (= musical noise)


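Spectral Subtraction: Musical Noise

• Solutions?
  - Magnitude averaging: replace $Y_i(\omega)$ in the calculation of $G_i(\omega)$ by a local average over frames

$$ \hat{S}_i(\omega) = \underbrace{G_i(\omega)}_{\text{average}} \; \underbrace{Y_i(\omega)}_{\text{instantaneous}} $$

  - EMSR (p7)
  - augment $G_i(\omega)$ with soft-decision VAD:

$$ G_i(\omega) \;\rightarrow\; P\big(H_1 \,|\, Y_i(\omega)\big) \cdot G_i(\omega) $$

    where $P(H_1 \,|\, Y_i(\omega))$ = probability that speech is present, given observation
  - …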
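Magnitude averaging can be illustrated with a small sketch: the gain is computed from a local average of the magnitude spectrum over neighbouring frames and is then applied to the instantaneous spectrum. The sketch assumes a symmetric moving average of `span` frames on each side and a magnitude-subtraction gain with half-wave rectification; both choices and all names are illustrative.

```python
import numpy as np

def smoothed_gains(Y_frames, mu_hat, span=2):
    """Magnitude-subtraction gains computed from locally averaged |Y_i(w)|.

    Y_frames : array, shape (num_frames, num_bins), frame spectra
    mu_hat   : noise magnitude estimate, scalar or shape (num_bins,)
    span     : number of neighbouring frames on each side in the local average
    """
    mag = np.abs(np.asarray(Y_frames))
    num_frames = mag.shape[0]
    avg = np.empty_like(mag)
    for i in range(num_frames):
        lo, hi = max(0, i - span), min(num_frames, i + span + 1)
        avg[i] = mag[lo:hi].mean(axis=0)        # local average over frames
    G = 1.0 - mu_hat / np.maximum(avg, 1e-12)   # gain from the averaged magnitude
    return np.maximum(G, 0.0)                   # half-wave rectification

# The enhanced spectra are then G * Y_frames (gain from the average,
# applied to the instantaneous spectrum).
```

Averaging reduces the frame-to-frame fluctuation of the gains at the cost of smearing speech onsets, so `span` is a compromise.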


Spectral Subtraction: Iterative Wiener Filter

Example of signal model-based spectral subtraction…

• Basic: Wiener filtering based spectral subtraction (p.9), with (improved) spectra estimation based on parametric models

• Procedure:
  1. Estimate parameters of a speech model from noisy signal y[k]
  2. Using estimated speech parameters, perform noise reduction (e.g. Wiener estimation, p. 9)
  3. Re-estimate parameters of speech model from the speech signal estimate
  4. Iterate 2 & 3



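[Figure: speech production model. A pulse train (with given pitch period) for voiced speech, or a white noise generator for unvoiced speech, produces the excitation u[k], which is scaled by the gain $g_s$ and passed through an all-pole filter to give the speech signal.]

frequency domain:

$$ S(\omega) = \frac{g_s}{1 - \sum_{m=1}^{M} \alpha_m e^{-jm\omega}} \, U(\omega) $$

time domain:

$$ s[k] = \sum_{m=1}^{M} \alpha_m \, s[k-m] + g_s \, u[k] $$

$\boldsymbol{\alpha} = [\alpha_1 \; \ldots \; \alpha_M]^T$ = linear prediction parameters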



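For each frame (vector) y[m] (i = iteration nr.)

1. Estimate $g_{s,i}$ and $\boldsymbol{\alpha}_i = [\alpha_{1,i} \; \ldots \; \alpha_{M,i}]^T$

2. Construct Wiener Filter (p.9)

$$ G(\omega) = \frac{P_{ss}(\omega)}{P_{ss}(\omega) + P_{nn}(\omega)} $$

   with:
   • $P_{nn}(\omega)$ estimated during noise-only periods
   • $P_{ss}(\omega) = \dfrac{g_{s,i}^2}{\left| 1 - \sum_{m=1}^{M} \alpha_{m,i} \, e^{-jm\omega} \right|^2}$

3. Filter speech frame y[m] to obtain $\hat{s}_i[m]$

Repeat until some error criterion is satisfied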
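The loop above can be sketched for a single frame as follows. This is a minimal illustration, assuming autocorrelation-method linear prediction for step 1, a known noise power spectrum `P_nn` (a scalar for white noise, or one value per rfft bin), and a fixed number of iterations instead of an error criterion. The function names, the model order and the small regularization constants are assumptions.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method estimate of (alpha, g) in
    s[k] = sum_m alpha_m s[k-m] + g u[k], with u[k] unit-variance white noise."""
    n = len(x)
    r = np.array([np.dot(x[:n - m], x[m:]) for m in range(order + 1)]) / n
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * (r[0] + 1e-12) * np.eye(order)   # light regularization
    alpha = np.linalg.solve(R, r[1:order + 1])
    g2 = max(r[0] - alpha @ r[1:order + 1], 1e-12)  # prediction error power
    return alpha, np.sqrt(g2)

def iterative_wiener(y, P_nn, order=10, n_iter=3):
    """Iterate: estimate AR speech model -> build Wiener gain -> filter frame."""
    N = len(y)
    Y = np.fft.rfft(y)
    w = 2.0 * np.pi * np.arange(len(Y)) / N          # bin frequencies
    s_hat = np.array(y, dtype=float)                 # start from the noisy frame
    for _ in range(n_iter):
        alpha, g = lpc(s_hat, order)                 # step 1 (or re-estimation)
        A = 1.0 - np.exp(-1j * np.outer(w, np.arange(1, order + 1))) @ alpha
        P_ss = g ** 2 / np.maximum(np.abs(A) ** 2, 1e-12)
        G = P_ss / (P_ss + P_nn)                     # step 2: Wiener gain
        s_hat = np.fft.irfft(G * Y, n=N)             # step 3: filter the frame
    return s_hat, G
```

In practice the frame would be windowed and the iteration count kept small, since too many iterations tend to over-sharpen the estimated formant peaks.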



Multi-Microphone Noise Reduction Problem

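$$ y_i[k] = s_i[k] + n_i[k], \qquad i = 1 \ldots M $$

  with $s_i[k]$ the speech part and $n_i[k]$ the noise part of the i-th microphone signal.

[Figure: the speech source s[k] and the noise source(s) n[k] are picked up by M microphones as microphone signals $y_i[k]$; an unknown processing block (?) produces (some) speech estimate $\hat{s}[k]$.]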



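Will estimate speech part in microphone 1 (*) (**)

$$ y_i[k] = s_i[k] + n_i[k], \quad i = 1 \ldots M \qquad\Rightarrow\qquad \hat{s}_1[k] = \; ? $$

(*) Estimating s[k] is more difficult, would include dereverberation...
(**) This is similar to the single-microphone model (p.3), where additional microphones (m=2..M) help to get a better estimate

[Figure: speech source s[k] and noise source(s) n[k] picked up by the microphone array.]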



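• Data model:

$$ \mathbf{Y}(\omega) = \begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \\ \vdots \\ Y_M(\omega) \end{bmatrix} = \begin{bmatrix} H_1(\omega) \\ H_2(\omega) \\ \vdots \\ H_M(\omega) \end{bmatrix} S(\omega) + \begin{bmatrix} N_1(\omega) \\ N_2(\omega) \\ \vdots \\ N_M(\omega) \end{bmatrix} = \mathbf{d}(\omega) \, S(\omega) + \mathbf{N}(\omega) $$

  See Lecture-2 on multi-path propagation, with q left out for conciseness.

  $H_m(\omega)$ is the complete transfer function from the speech source position to the m-th microphone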


Multi-Channel Wiener Filter (MWF)

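• Data model:

$$ \mathbf{Y}(\omega) = \mathbf{d}(\omega) \, S(\omega) + \mathbf{N}(\omega) $$

• Will use linear filters to obtain speech estimate (as in Lecture-2)

$$ \hat{S}_1(\omega) = \sum_{m=1}^{M} F_m^*(\omega) \, Y_m(\omega) = \mathbf{F}^H(\omega) \, \mathbf{Y}(\omega) $$

• Wiener filter (= linear MMSE approach)

$$ \min_{\mathbf{F}(\omega)} E\big\{ \big| S_1(\omega) - \mathbf{F}^H(\omega) \, \mathbf{Y}(\omega) \big|^2 \big\} $$

  Note that (unlike in DSP-CIS) the `desired response’ signal $S_1(\omega)$ is unknown here (!), hence the solution will be `unusual’…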



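• Wiener filter solution is (see DSP-CIS)

$$ \mathbf{F}(\omega) = \underbrace{E\{ \mathbf{Y}(\omega) \, \mathbf{Y}^H(\omega) \}^{-1}}_{\text{autocorrelation}} \; \underbrace{E\{ \mathbf{Y}(\omega) \, S_1^*(\omega) \}}_{\text{crosscorrelation}} = \ldots = E\{ \mathbf{Y}(\omega) \, \mathbf{Y}^H(\omega) \}^{-1} \big( E\{ \mathbf{Y}(\omega) \, Y_1^*(\omega) \} - E\{ \mathbf{N}(\omega) \, N_1^*(\omega) \} \big) $$

  (with speech and noise uncorrelated, i.e. $E\{ \mathbf{N}(\omega) \, S^*(\omega) \} = 0$)

  $E\{ \mathbf{Y} \mathbf{Y}^H \}$ and $E\{ \mathbf{Y} Y_1^* \}$: compute during speech+noise periods
  $E\{ \mathbf{N} N_1^* \}$: compute during noise-only periods

  – All quantities can be computed!
  – Special case of this is the single-channel Wiener filter formula on p.9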
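Per frequency bin, the solution only needs two covariance matrices: one estimated during speech+noise periods and one during noise-only periods. The sketch below assumes the form F = Φ_yy⁻¹ (Φ_yy e₁ − Φ_nn e₁), where e₁ selects the column of the reference microphone, and that STFT snapshots for one bin are stacked as columns. Function names and the `ref` parameter are hypothetical.

```python
import numpy as np

def sample_covariance(X):
    """Sample estimate of E{X X^H}; X is complex, shape (M, T), one column per frame."""
    return (X @ X.conj().T) / X.shape[1]

def mwf(Phi_yy, Phi_nn, ref=0):
    """Multi-channel Wiener filter for one frequency bin.

    Phi_yy : (M, M) covariance from speech+noise frames
    Phi_nn : (M, M) covariance from noise-only frames
    ref    : index of the reference microphone (0 = microphone 1)
    """
    # E{Y S_ref^*} = E{Y Y_ref^*} - E{N N_ref^*} when speech and noise are uncorrelated
    rhs = Phi_yy[:, ref] - Phi_nn[:, ref]
    return np.linalg.solve(Phi_yy, rhs)

# Usage, with Y a vector of microphone spectra at this bin:
#   s1_hat = np.vdot(F, Y)    # = F^H Y
```

With finite data the difference of two estimated covariances is not guaranteed to be a valid speech covariance, so practical implementations often add some form of regularization; that is outside the scope of this sketch.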


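• MWF combines spatial filtering (as in Lecture-2) with single-channel spectral filtering (as in single-channel noise reduction)

  if

$$ \mathbf{Y}(\omega) = \begin{bmatrix} Y_1(\omega) \\ Y_2(\omega) \\ \vdots \\ Y_M(\omega) \end{bmatrix} = \underbrace{\mathbf{d}(\omega)}_{\text{steering vector}} S(\omega) + \underbrace{\mathbf{N}(\omega)}_{\text{noise}}, \qquad \boldsymbol{\Phi}_{NN}(\omega) = E\{ \mathbf{N}(\omega) \, \mathbf{N}^H(\omega) \} $$

  then…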



• …then it can be shown that

$$ \mathbf{F}(\omega) = \boldsymbol{\Phi}_{NN}^{-1}(\omega) \, \mathbf{d}(\omega) \cdot (\text{scalar}) $$

• $\boldsymbol{\Phi}_{NN}^{-1}(\omega) \, \mathbf{d}(\omega)$ represents a spatial filtering (*)

  Compare to superdirective & delay-and-sum beamforming (Lecture-2)
  – Delay-and-sum beamformer maximizes array gain in white noise field
  – Superdirective beamformer maximizes array gain in diffuse noise field
  – MWF maximizes array gain in unknown (!) noise field.

  MWF is operated without invoking any prior knowledge (steering vector/noise field)! (the secret is in the voice activity detection… (explain))

(*) Note that spatial filtering can improve SNR, spectral filtering never improves SNR (at one frequency)



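• …then it can be shown that (continued)

$$ \mathbf{F}(\omega) = \boldsymbol{\Phi}_{NN}^{-1}(\omega) \, \mathbf{d}(\omega) \cdot \underbrace{\frac{\sigma_S^2(\omega) \, H_1^*(\omega)}{\sigma_S^2(\omega) \, \mathbf{d}^H(\omega) \, \boldsymbol{\Phi}_{NN}^{-1}(\omega) \, \mathbf{d}(\omega) + 1}}_{\text{scalar}}, \qquad \sigma_S^2(\omega) = E\{ |S(\omega)|^2 \} $$

• The scalar represents an additional `spectral post-filter’, i.e. a single-channel Wiener estimate (p.9), applied to the output signal of the spatial filter

  (prove it!)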
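One possible route to the proof uses the matrix inversion lemma. The sketch below assumes the rank-1 speech model $\mathbf{Y} = \mathbf{d} S + \mathbf{N}$ with speech and noise uncorrelated, and introduces the shorthands $\sigma_S^2 = E\{|S|^2\}$, $\rho = \mathbf{d}^H \boldsymbol{\Phi}_{NN}^{-1} \mathbf{d}$ and $Z = \mathbf{w}^H \mathbf{Y}$ (frequency argument dropped); these symbols are notation chosen here, not taken from the slides.

```latex
\begin{align*}
E\{\mathbf{Y}\mathbf{Y}^H\} &= \sigma_S^2 \, \mathbf{d}\mathbf{d}^H + \boldsymbol{\Phi}_{NN},
\qquad E\{\mathbf{Y} S_1^*\} = \sigma_S^2 H_1^* \, \mathbf{d}
\quad (S_1 = H_1 S) \\
\big(\sigma_S^2 \, \mathbf{d}\mathbf{d}^H + \boldsymbol{\Phi}_{NN}\big)^{-1} \mathbf{d}
 &= \boldsymbol{\Phi}_{NN}^{-1}\mathbf{d}
    - \frac{\sigma_S^2 \, \boldsymbol{\Phi}_{NN}^{-1}\mathbf{d} \, \rho}{1 + \sigma_S^2 \rho}
  = \frac{\boldsymbol{\Phi}_{NN}^{-1}\mathbf{d}}{1 + \sigma_S^2 \rho} \\
\Rightarrow \quad \mathbf{F} &= \boldsymbol{\Phi}_{NN}^{-1}\mathbf{d} \cdot
    \frac{\sigma_S^2 H_1^*}{1 + \sigma_S^2 \rho} \\[4pt]
\text{Spatial filter output: } Z &= \mathbf{w}^H \mathbf{Y}, \quad \mathbf{w} = \boldsymbol{\Phi}_{NN}^{-1}\mathbf{d}
 \;\Rightarrow\; Z = \rho \, S + \mathbf{w}^H \mathbf{N}, \quad E\{|\mathbf{w}^H\mathbf{N}|^2\} = \rho \\
\text{Single-channel Wiener gain on } Z: \quad
 \frac{E\{S_1 Z^*\}}{E\{|Z|^2\}} &= \frac{H_1 \sigma_S^2 \rho}{\sigma_S^2 \rho^2 + \rho}
  = \frac{\sigma_S^2 H_1}{1 + \sigma_S^2 \rho}
\end{align*}
```

Since $\hat{S}_1 = \mathbf{F}^H \mathbf{Y}$ applies the complex conjugate of the scalar to $Z$, the post-filter gain in the last line is exactly the scalar factor of the MWF.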



Multi-Channel Wiener Filter: Implementation

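• Implementation with short-time Fourier transform: see p.10

• Implementation with time-domain linear filtering:

$$ \mathbf{y}_i[k] = \big[ \, y_i[k] \;\; y_i[k-1] \;\; \ldots \;\; y_i[k-L+1] \, \big]^T = \mathbf{s}_i[k] + \mathbf{n}_i[k] $$

$$ \hat{s}_1[k] = \sum_{i=1}^{M} \mathbf{f}_i^T \, \mathbf{y}_i[k] \qquad (\mathbf{f}_i = \text{filter coefficients}) $$

[Figure: speech source s[k] and noise source(s) n[k]; microphone signals $y_i[k]$, starting with $y_1[k]$, are filtered and summed to give $\hat{s}_1[k]$.]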


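Multi-Channel Wiener Filter: Implementation

• Implementation with time-domain linear filtering:

$$ \min_{\mathbf{f}} E\big\{ \big| s_1[k] - \mathbf{f}^T \mathbf{y}[k] \big|^2 \big\} $$

  with

$$ \mathbf{f} = \big[ \, \mathbf{f}_1^T \;\; \mathbf{f}_2^T \;\; \ldots \;\; \mathbf{f}_M^T \, \big]^T, \qquad \mathbf{y}[k] = \big[ \, \mathbf{y}_1^T[k] \;\; \mathbf{y}_2^T[k] \;\; \ldots \;\; \mathbf{y}_M^T[k] \, \big]^T $$

  Solution is…

$$ \mathbf{f} = E\{ \mathbf{y}[k] \, \mathbf{y}^T[k] \}^{-1} \, E\{ \mathbf{y}[k] \, s_1[k] \} = E\{ \mathbf{y}[k] \, \mathbf{y}^T[k] \}^{-1} \big( E\{ \mathbf{y}[k] \, y_1[k] \} - E\{ \mathbf{n}[k] \, n_1[k] \} \big) $$

  $E\{ \mathbf{y}[k] \mathbf{y}^T[k] \}$ and $E\{ \mathbf{y}[k] y_1[k] \}$: compute during speech+noise periods
  $E\{ \mathbf{n}[k] n_1[k] \}$: compute during noise-only periods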



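Kalman Filter 1/12

State space model of a time-varying discrete-time system

$$ x[k+1] = A[k] \, x[k] + B[k] \, u[k] + v[k] \qquad (v[k] = \text{process noise}) $$

$$ y[k] = C[k] \, x[k] + D[k] \, u[k] + w[k] \qquad (w[k] = \text{measurement noise}) $$

with v[k] and w[k]: mutually uncorrelated, zero mean, white noises, with covariance matrices $V[k] = E\{ v[k] v[k]^T \}$ and $W[k] = E\{ w[k] w[k]^T \}$

Then: given A[k], B[k], C[k], D[k], V[k], W[k] and input/output-observations u[k], y[k], k=0,1,2,..., the Kalman filter produces MMSE estimates of the internal states x[k], k=0,1,...

PS: will use shorthand notation here, i.e. $x_k$, $y_k$, .. instead of x[k], y[k], ..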


Kalman Filter 2/12

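Definition: $\hat{x}_{k|l}$ = MMSE-estimate of $x_k$ using all available data up until time l

• `FILTERING’ = estimate $\hat{x}_{k|l}$ with l = k

• `PREDICTION’ = estimate $\hat{x}_{k|l}$ with l < k

• `SMOOTHING’ = estimate $\hat{x}_{k|l}$ with l > k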


Kalman Filter 3/12

Initialization: $\hat{x}_{0|-1}$ and error covariance matrix $P_{0|-1}$

‘Conventional Kalman Filter’ operation @ time k (k=0,1,2,..)

Given $\hat{x}_{k|k-1}$ and corresponding error covariance matrix $P_{k|k-1}$,
compute $\hat{x}_{k+1|k}$ and $P_{k+1|k}$ using $u_k$, $y_k$:

Step 1: Measurement Update (produces ‘filtered’ estimate $\hat{x}_{k|k}$; compare to standard RLS!)

Step 2: Time Update (produces ‘1-step prediction’ $\hat{x}_{k+1|k}$)

$P$ = error covariance matrix
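The two steps can be sketched in code using the standard textbook form of the conventional Kalman filter. The sketch assumes the state-space model x[k+1] = A x[k] + B u[k] + v[k], y[k] = C x[k] + D u[k] + w[k] with noise covariances V and W, real-valued quantities, and an explicit matrix inverse for readability; the function name and argument order are illustrative.

```python
import numpy as np

def kalman_step(x_pred, P_pred, u, y, A, B, C, D, V, W):
    """One conventional Kalman filter iteration at time k.

    x_pred, P_pred : 1-step prediction x_{k|k-1} and its error covariance
    u, y           : input and output observations at time k
    Returns (x_filt, P_filt, x_next, P_next).
    """
    # Step 1: measurement update -> 'filtered' estimate x_{k|k}
    S = C @ P_pred @ C.T + W                      # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)           # Kalman gain
    x_filt = x_pred + K @ (y - C @ x_pred - D @ u)
    P_filt = P_pred - K @ C @ P_pred
    # Step 2: time update -> '1-step prediction' x_{k+1|k}
    x_next = A @ x_filt + B @ u
    P_next = A @ P_filt @ A.T + V
    return x_filt, P_filt, x_next, P_next
```

This covariance form is the easiest to read but not the most numerically robust, which is the motivation for square-root formulations.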


Kalman Filter 4/12

PS: ‘Standard RLS’ is a special case of ‘Conventional KF’

Internal state vector is the FIR filter coefficients vector, which is assumed to be time-invariant


Kalman Filter 5/12

State estimation @ time k corresponds to an overdetermined set of linear equations, where the vector of unknowns contains all previous state vectors:




Kalman Filter 6/12

(Technical detail…)




Kalman Filter 7/12


Kalman Filter 8/12

‘Square-Root’ Kalman Filter

Propagated from time k-1 to time k

..hence requires only lower-right/lower part



Kalman Filter 9/12

Propagated from

time k-1 to time k



Kalman Filter 10/12

Propagated from

time k-1 to time k

Propagated from

time k to time k+1


Kalman Filter 11/12

QRD-

=QRD-


Kalman Filter 12/12

can be derived from square-root KF equations

: can be worked into measurement update eq.

: can be worked into state update eq.

is .

[details omitted]


Kalman filter for Speech Enhancement

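• Assume AR model of speech and noise

$$ s[k] = \sum_{m=1}^{M} \alpha_m \, s[k-m] + g_s \, u[k] $$

$$ n[k] = \sum_{m=1}^{N} \beta_m \, n[k-m] + g_n \, w[k] $$

  u[k], w[k] = zero mean, unit variance, white noise

• Equivalent state-space model is…

$$ \mathbf{x}[k+1] = \mathbf{A} \, \mathbf{x}[k] + \mathbf{v}[k] $$

$$ y[k] = \mathbf{c}^T \mathbf{x}[k] $$

  y = microphone signal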



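with:

$$ \mathbf{x}^T[k] = \big[ \, s[k-M+1] \;\; \ldots \;\; s[k] \;\; n[k-N+1] \;\; \ldots \;\; n[k] \, \big] $$

$$ \mathbf{A} = \begin{bmatrix} \mathbf{A}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_n \end{bmatrix}; \qquad \mathbf{c}^T = \big[ \, \underbrace{0 \; \ldots \; 0 \; 1}_{M} \;\; \underbrace{0 \; \ldots \; 0 \; 1}_{N} \, \big]; \qquad \mathbf{G} = \begin{bmatrix} \mathbf{g}_s & \mathbf{0} \\ \mathbf{0} & \mathbf{g}_n \end{bmatrix} $$

$$ \mathbf{A}_s = \begin{bmatrix} 0 & 1 & & 0 \\ \vdots & & \ddots & \\ 0 & 0 & & 1 \\ \alpha_M & \alpha_{M-1} & \ldots & \alpha_1 \end{bmatrix}; \qquad \mathbf{A}_n = \begin{bmatrix} 0 & 1 & & 0 \\ \vdots & & \ddots & \\ 0 & 0 & & 1 \\ \beta_N & \beta_{N-1} & \ldots & \beta_1 \end{bmatrix} $$

$$ \mathbf{v}[k] = \mathbf{G} \cdot \big[ \, u[k] \;\; w[k] \, \big]^T; \qquad \mathbf{g}_s^T = [ \, 0 \; \ldots \; 0 \;\; g_s \, ]; \qquad \mathbf{g}_n^T = [ \, 0 \; \ldots \; 0 \;\; g_n \, ] $$

$$ \mathbf{Q} = \mathbf{G} \, \mathbf{G}^T \qquad (\text{covariance matrix of } \mathbf{v}[k]) $$

(Strictly, the new state entries s[k+1] and n[k+1] are driven by u[k+1] and w[k+1]; since both are white, this one-sample shift of the driving noise does not change the statistics of v[k].)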
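Building these matrices from AR parameters is mechanical, as the following sketch shows. It assumes the state ordering [s[k-M+1], ..., s[k], n[k-N+1], ..., n[k]], companion matrices whose last row holds the AR coefficients in reverse order, and noise AR coefficients called `beta`; the function names are hypothetical.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix for x[k] = sum_m coeffs[m-1] * x[k-m],
    acting on the state [x[k-P+1], ..., x[k]] with P = len(coeffs)."""
    coeffs = np.asarray(coeffs, dtype=float)
    P = len(coeffs)
    A = np.zeros((P, P))
    A[:-1, 1:] = np.eye(P - 1)     # shift the delay line by one sample
    A[-1, :] = coeffs[::-1]        # last row: [c_P ... c_1]
    return A

def speech_noise_model(alpha, beta, g_s, g_n):
    """State-space matrices (A, G, c, Q) for AR speech + AR noise."""
    M, N = len(alpha), len(beta)
    A = np.zeros((M + N, M + N))
    A[:M, :M] = companion(alpha)   # speech block A_s
    A[M:, M:] = companion(beta)    # noise block A_n
    G = np.zeros((M + N, 2))
    G[M - 1, 0] = g_s              # u[k] drives the newest speech sample
    G[M + N - 1, 1] = g_n          # w[k] drives the newest noise sample
    c = np.zeros(M + N)
    c[M - 1] = 1.0                 # y[k] = s[k] + n[k]
    c[M + N - 1] = 1.0
    Q = G @ G.T                    # process noise covariance
    return A, G, c, Q
```

The resulting (A, c, Q) can be fed to any Kalman filter routine; the speech estimate is then read from entry M-1 of the filtered state vector.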



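Iterative algorithm

[Figure: y[k] → split signal in frames → estimate parameters ($\hat{g}_{s,i}$; $\hat{\alpha}_{m,i}$ and $\hat{g}_{n,i}$; $\hat{\beta}_{m,i}$) → Kalman Filter → $\hat{s}_i[m]$, $\hat{n}_i[m]$ → fed back to the parameter estimation (iterations) → reconstruct signal → $\hat{s}[k]$]

Disadvantages of the iterative approach:
• complexity
• delay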


CONCLUSIONS

• Single-channel noise reduction
  – Basic system is spectral subtraction
  – Only spectral filtering, hence can only exploit differences in spectra between noise and speech signal:
    • noise reduction at expense of speech distortion
    • achievable noise reduction may be limited

• Multi-channel noise reduction
  – Basic system is MWF
  – Provides spectral + spatial filtering (links with beamforming!)

• Iterative Wiener filter & Kalman filtering
  – Signal model based approach
