
Page 1: Speech Enhancement EE 516 Spring 2009

Speech Enhancement

EE 516 Spring 2009

Alex Acero

Page 2: Speech Enhancement EE 516 Spring 2009

Outline

• A model of the acoustical environment
• Simple things first!
• Microphones
• Echo cancellation
• Microphone arrays
• Single channel noise suppression

Page 3: Speech Enhancement EE 516 Spring 2009

Additive noise

• Stationary noise: properties don't change over time
  – White noise x[n]
    • Flat power spectrum: $S_{xx}(f) = q$
    • Samples are uncorrelated: $R_{xx}[n] = q\,\delta[n]$
  – White Gaussian noise
    • Pdf is Gaussian (see chapter 10)
  – Typical noise is colored
    • Pink noise: low-pass in nature
• Non-stationary noise: properties change over time
  – Babble noise
  – Cocktail party effect
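The white-noise properties above (autocorrelation equal to q at lag 0 and zero elsewhere) can be checked numerically. The following is a minimal sketch, assuming Gaussian samples and an arbitrary power q = 4; the helper name `autocorrelation` is illustrative only.

```python
import random

def autocorrelation(x, lag):
    """Biased sample estimate of R_xx[lag] = E{x[n] x[n + lag]}."""
    n = len(x)
    return sum(x[i] * x[i + lag] for i in range(n - lag)) / n

random.seed(0)
q = 4.0  # assumed noise power (variance), chosen only for illustration
noise = [random.gauss(0.0, q ** 0.5) for _ in range(20000)]

r0 = autocorrelation(noise, 0)  # should be close to q
r1 = autocorrelation(noise, 1)  # should be close to 0
r5 = autocorrelation(noise, 5)  # should be close to 0
```

The estimates only approach q and 0 as the number of samples grows; with a finite record they fluctuate around those values.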

Page 4: Speech Enhancement EE 516 Spring 2009

Reverberation

• Impulse response of an average office

[Figure: room impulse response versus time (samples), 0 to 2000 samples, amplitude from -3000 to 7000]

$$h[n] = \sum_{k=0}^{\infty} \frac{\rho^{k}}{r_k}\,\delta[n - T_k] = \frac{1}{c}\sum_{k=0}^{\infty} \frac{\rho^{k}}{T_k}\,\delta[n - T_k]$$

Page 5: Speech Enhancement EE 516 Spring 2009

Model of the Environment

[Diagram: x[m] passes through the filter h[m]; the noise n[m] is added to give y[m]]

$$y[m] = x[m] * h[m] + n[m]$$
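The environment model, y[m] = x[m] * h[m] + n[m], can be simulated directly. This is a minimal sketch in plain Python; the function names `convolve` and `corrupt` are hypothetical, and the output is truncated to the length of the input as a simplifying assumption.

```python
def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y

def corrupt(x, h, n):
    """Apply the channel h to x and add the noise n (same length as x)."""
    filtered = convolve(x, h)[:len(x)]
    return [f + v for f, v in zip(filtered, n)]

# A unit impulse through a two-tap channel, first without and then with noise.
clean_only = corrupt([1.0, 0.0, 0.0, 0.0], [1.0, 0.5], [0.0] * 4)
with_noise = corrupt([1.0, 0.0, 0.0, 0.0], [1.0, 0.5], [0.1] * 4)
```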

Page 6: Speech Enhancement EE 516 Spring 2009

Outline

• A model of the acoustical environment• Simple things first!• Microphones• Echo cancellation• Microphone arrays• Single channel noise suppression

Page 7: Speech Enhancement EE 516 Spring 2009

Cepstral Mean Normalization

• Compute mean of cepstrum

$$\bar{\mathbf{x}} = \frac{1}{T}\sum_{t=0}^{T-1} \mathbf{x}_t$$

• And subtract it from input

$$\hat{\mathbf{x}}_t = \mathbf{x}_t - \bar{\mathbf{x}}$$

• CMN is robust to channel distortion
• Normalizes average vocal tract or short filters
• Average must include > 2 sec of speech
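Cepstral mean normalization, computing the mean cepstrum over T frames and subtracting it from every frame, can be sketched as follows. The function name and the list-of-lists frame layout are assumptions made for illustration.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames from each frame."""
    T = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / T for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

# A fixed channel adds a constant offset to every cepstral vector;
# CMN removes it, so both versions normalize to the same result.
clean = [[1.0, 2.0], [3.0, 6.0]]
offset = [5.0, -3.0]
filtered = [[c + o for c, o in zip(f, offset)] for f in clean]

norm_clean = cepstral_mean_normalize(clean)
norm_filtered = cepstral_mean_normalize(filtered)
```

Because a short convolutional channel is additive in the cepstral domain, the constant offset stands in for the channel here.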

[Figure: word error rate (%) versus SNR (dB) at 10, 15, 20, and 30 dB, comparing No CMN and CMN-2]

Page 8: Speech Enhancement EE 516 Spring 2009

RASTA

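• CMN subtracts the output of a low-pass filter with a rectangular window:

$$\hat{\mathbf{x}}_t = \mathbf{x}_t - \frac{1}{T}\sum_{\tau=0}^{T-1} \mathbf{x}_{t-\tau}$$

• Can use other low-pass filters too
• RASTA filter is band-pass:

$$H(z) = 0.1\,z^{4} \cdot \frac{2 + z^{-1} - z^{-3} - 2z^{-4}}{1 - 0.98\,z^{-1}}$$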
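The RASTA transfer function can be applied to the time trajectory of a single coefficient with a direct difference equation. The sketch below is a causal variant: it drops the non-causal z^4 advance, which only delays the output by four frames. The function name and the zero initial conditions are assumptions.

```python
def rasta_filter(x, pole=0.98):
    """Causal RASTA band-pass filter over one coefficient trajectory x[t]."""
    # Numerator 0.1 * (2 + z^-1 - z^-3 - 2 z^-4); its taps sum to zero,
    # so a constant (DC) component is removed.
    b = [0.2, 0.1, 0.0, -0.1, -0.2]
    y = []
    for t in range(len(x)):
        acc = sum(b[k] * x[t - k] for k in range(len(b)) if t - k >= 0)
        if t > 0:
            acc += pole * y[t - 1]  # denominator 1 - 0.98 z^-1
        y.append(acc)
    return y

# A constant trajectory (e.g. a fixed channel offset) decays to zero.
dc_response = rasta_filter([1.0] * 2000)
```

The pole at 0.98 sets how slowly the start-up transient decays, which is why the example uses a long input.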

Page 9: Speech Enhancement EE 516 Spring 2009

Retrain with noisy data

• Mismatches between training and testing are bad for pattern recognition systems

• Retrain with noisy data
• Approximation: add noise to clean data and retrain

[Figure: word error rate (%) versus SNR (dB) from 0 to 30 dB, comparing Mismatched and Matched (Noisy) training]
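Adding noise to clean data at a chosen SNR, the approximation mentioned on this slide, can be sketched as below. The function name `mix_at_snr` and the use of average sample power to define the SNR are assumptions; the slides do not specify how the mixing was done.

```python
import math

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so the mixture has the requested SNR, then add it."""
    p_clean = sum(c * c for c in clean) / len(clean)
    p_noise = sum(v * v for v in noise) / len(noise)
    scale = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [c + scale * v for c, v in zip(clean, noise)]

clean = [1.0, -1.0, 1.0, -1.0]
noise = [1.0, 1.0, -1.0, -1.0]
noisy = mix_at_snr(clean, noise, 20.0)

# Measure the SNR that was actually obtained.
residual = [y - c for y, c in zip(noisy, clean)]
measured_snr_db = 10.0 * math.log10(
    sum(c * c for c in clean) / sum(r * r for r in residual))
```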

Page 10: Speech Enhancement EE 516 Spring 2009

Multi-condition training

• Very hard to predict exactly the type of noise we’ll encounter at test time

• Too expensive to retrain the system for each noise condition
• Train system offline with several noise types and levels

[Figure: word error rate (%) versus SNR (dB) from 5 to 30 dB, comparing Matched Noise and Multistyle training]

Page 11: Speech Enhancement EE 516 Spring 2009

Outline

• A model of the acoustical environment
• Simple things first!
• Microphones
• Echo cancellation
• Microphone arrays
• Single channel noise suppression

Page 12: Speech Enhancement EE 516 Spring 2009

Condenser Microphone

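[Diagram: condenser microphone with dimensions b and h, and its equivalent circuit: a source v(t) with impedance Z_M (microphone) driving a load R_L and a gain stage G (preamplifier)]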

Page 13: Speech Enhancement EE 516 Spring 2009

Omnidirectional microphones

• Polar response

[Figure: polar response of an omnidirectional microphone; sketch labeling the diaphragm and the mic opening]

Page 14: Speech Enhancement EE 516 Spring 2009

Bidirectional microphones

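[Diagram: speech sound wave from the front, noise sound wave from the side]

[Diagram: source at distance r from the origin, at distances r1 and r2 from the points (d, 0) and (-d, 0)]

[Figure: polar response of a bidirectional microphone, radial scale 5 to 30]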

Page 15: Speech Enhancement EE 516 Spring 2009

Bidirectional microphones

• Bidirectional microphone with d = 1 cm at 0°
• Solid line corresponds to far-field conditions ($d/r = 0.02$) and the dotted line to near-field conditions ($d/r = 0.5$)

[Figure: difference in air pressure (dB), from -30 to 0, versus frequency (Hz), from 10^2 to 10^4]

Page 16: Speech Enhancement EE 516 Spring 2009

Unidirectional microphones

[Figure: polar response of a unidirectional microphone, radial scale 5 to 30]

[Diagram: speech sound wave from the front, noise sound wave from the side]

Page 17: Speech Enhancement EE 516 Spring 2009

Dynamic microphones

[Diagram: dynamic microphone, labeling the diaphragm, coil, magnet, and output voltage]

Page 18: Speech Enhancement EE 516 Spring 2009

Outline

• A model of the acoustical environment
• Simple things first!
• Microphones
• Echo cancellation
• Microphone arrays
• Single channel noise suppression

Page 19: Speech Enhancement EE 516 Spring 2009

Acoustic Echo cancellation

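[Diagram: the speech signal x[n] drives the loudspeaker and the adaptive filter; the acoustic path H produces the echo d[n], which reaches the microphone together with s[n] and the local noise v[n] to form r[n]; the adaptive filter output $\hat{d}[n]$ is subtracted from r[n] to give e[n]]

$$\mathrm{ERLE}\ (\mathrm{dB}) = 10\log_{10}\frac{E\{d^{2}[n]\}}{E\{(d[n] - \hat{d}[n])^{2}\}}$$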
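The echo return loss enhancement (ERLE) compares the echo power with the power of the residual echo left after cancellation. A minimal sketch, assuming the expectations are replaced by sums over a finite record (the function name `erle_db` is illustrative):

```python
import math

def erle_db(d, d_hat):
    """ERLE in dB from the echo d[n] and its estimate d_hat[n]."""
    echo_power = sum(v * v for v in d)
    residual_power = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    return 10.0 * math.log10(echo_power / residual_power)

# An estimate that captures 90% of the echo amplitude leaves a residual
# 20 dB below the echo.
echo = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
erle = erle_db(echo, estimate)
```

The sketch does not guard against a perfect estimate, where the residual power is zero and the ratio is undefined.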

Page 20: Speech Enhancement EE 516 Spring 2009

Line echo cancellation

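[Diagram: speaker A's signal x[n] feeds the adaptive filter and the hybrid circuit H, which produces the echo d[n]; d[n], speaker B's signal s[n], and the noise v[n] add up to r[n]; the adaptive filter output $\hat{d}[n]$ is subtracted from r[n] to give e[n]]

$$\mathrm{ERLE}\ (\mathrm{dB}) = 10\log_{10}\frac{E\{d^{2}[n]\}}{E\{(d[n] - \hat{d}[n])^{2}\}}$$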

Page 21: Speech Enhancement EE 516 Spring 2009

Least Mean Squares (LMS)

• Given input

$$\mathbf{X}[n] = \{x[n],\ x[n-1],\ \ldots,\ x[n-L+1]\}$$

• Estimate output

$$y[n] = \sum_{k=0}^{L-1} w_k[n]\,x[n-k] = \mathbf{W}^{T}[n]\,\mathbf{X}[n]$$

• Compute error

$$e[n] = d[n] - y[n]$$

• Update filter

$$\mathbf{W}[n+1] = \mathbf{W}[n] + \mu\,e[n]\,\mathbf{X}[n]$$

• Need to tune step size $\mu$
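The four LMS steps (form the input vector, estimate the output, compute the error, update the filter) can be sketched in plain Python. The function name `lms`, the zero initial weights, and the zero padding before the first sample are assumptions.

```python
import random

def lms(x, d, num_taps, mu):
    """Adapt an FIR filter so that its output tracks d[n]; returns (weights, errors)."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # X[n] = {x[n], x[n-1], ..., x[n-L+1]}, zero-padded before the start
        X = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, X))          # estimate output
        e = d[n] - y                                      # compute error
        w = [wk + mu * e * xk for wk, xk in zip(w, X)]    # update filter
        errors.append(e)
    return w, errors

# Identify a hypothetical 3-tap echo path from white-noise input.
random.seed(1)
true_path = [0.5, -0.3, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(true_path[k] * x[n - k] for k in range(3) if n - k >= 0)
     for n in range(len(x))]
w, errors = lms(x, d, num_taps=3, mu=0.05)
```

With a step size that is too large the recursion diverges, which is the tuning issue the slide points out.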

Page 22: Speech Enhancement EE 516 Spring 2009

Normalized LMS

• Make step size adaptive to ensure convergence

$$\mu[n] = \frac{\mu_0}{L\,\hat{\sigma}_x^{2}[n]}$$

• Where we track the input energy

$$\hat{\sigma}_x^{2}[n] = (1-\beta)\,\hat{\sigma}_x^{2}[n-1] + \beta\,x^{2}[n]$$
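Normalized LMS divides the step size by a running estimate of the input energy. The sketch below follows that idea; the names `mu0`, `beta`, `init_power`, and `eps`, and the choice to start the energy tracker at 1.0 rather than 0, are assumptions made to keep the first updates well behaved.

```python
import random

def nlms(x, d, num_taps, mu0=0.2, beta=0.01, init_power=1.0, eps=1e-8):
    """LMS with step size mu0 / (L * sigma2[n]); returns (weights, errors)."""
    w = [0.0] * num_taps
    sigma2 = init_power  # running estimate of the input power
    errors = []
    for n in range(len(x)):
        X = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        # Track the input energy with a first-order recursion.
        sigma2 = (1.0 - beta) * sigma2 + beta * x[n] ** 2
        mu = mu0 / (num_taps * sigma2 + eps)
        e = d[n] - sum(wk * xk for wk, xk in zip(w, X))
        w = [wk + mu * e * xk for wk, xk in zip(w, X)]
        errors.append(e)
    return w, errors

# Same hypothetical 3-tap path as a test signal.
random.seed(1)
true_path = [0.5, -0.3, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(true_path[k] * x[n - k] for k in range(3) if n - k >= 0)
     for n in range(len(x))]
w, errors = nlms(x, d, num_taps=3)
```

Because the step is scaled by the input power, the same `mu0` works across input levels, which is the practical advantage over plain LMS.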

Page 23: Speech Enhancement EE 516 Spring 2009

Recursive Least Squares (RLS)

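• Newton-Raphson

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$

[Figure: Newton-Raphson iteration on f(x), stepping from x0 to x1]

• New weights

$$\mathbf{w}_{i+1} = \mathbf{w}_i - \mu[n]\left(\nabla^{2} e(\mathbf{w}_i)\right)^{-1}\nabla e(\mathbf{w}_i)$$

$$\nabla^{2} e(\mathbf{w}_i) = \mathbf{R}[n] = E\{\mathbf{x}[n]\,\mathbf{x}^{T}[n]\}$$

$$\mathbf{R}[n] = \lambda\,\mathbf{R}[n-1] + \mathbf{x}[n]\,\mathbf{x}^{T}[n]$$

• Faster convergence, but more CPU intensive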

Page 24: Speech Enhancement EE 516 Spring 2009

Outline

• A model of the acoustical environment
• Simple things first!
• Microphones
• Echo cancellation
• Microphone arrays
• Single channel noise suppression

Page 25: Speech Enhancement EE 516 Spring 2009

Microphone arrays: delay & sum

• 5 microphones spaced 5 cm apart. Source located at 5 m
• Angle: 0°

[Figure: four polar plots of the array response at 400 Hz, 880 Hz, 4400 Hz, and 8000 Hz; radial ticks at 12.5 and 25]

[Diagram: linear array of microphones M-2, M-1, M0, M1, M2 with spacing a, and a source S]

$$y[n] = \frac{1}{N}\sum_{i=0}^{N-1} y_i[n - i\,a\sin\theta]$$

$$\hat{\theta} = \arg\max_{\theta} \sum_{n}\left(\frac{1}{N}\sum_{i=0}^{N-1} y_i[n - i\,a\sin(\theta)]\right)^{2}$$
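The frequency dependence of the delay-and-sum beam pattern can be reproduced with a short calculation. This is a sketch under assumptions not stated on the slide: a far-field plane wave (the slide's source is at 5 m), a speed of sound of 343 m/s, and angles measured from the array normal. The function name `array_response` is illustrative.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def array_response(freq_hz, source_deg, steer_deg=0.0, num_mics=5, spacing_m=0.05):
    """Magnitude response of a delay-and-sum array steered to steer_deg."""
    theta = math.radians(source_deg)
    theta0 = math.radians(steer_deg)
    total = 0j
    for i in range(num_mics):
        # Residual delay (seconds) at microphone i after the steering delays
        tau = i * spacing_m * (math.sin(theta) - math.sin(theta0)) / SPEED_OF_SOUND
        total += cmath.exp(-2j * math.pi * freq_hz * tau)
    return abs(total) / num_mics

on_target = array_response(4400.0, 30.0, steer_deg=30.0)  # source in the look direction
low_freq_side = array_response(400.0, 90.0)               # little rejection at 400 Hz
high_freq_side = array_response(4400.0, 90.0)             # much stronger rejection
```

At low frequencies the 20 cm aperture is small compared with the wavelength, so the array is nearly omnidirectional, matching the trend across the four polar plots.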

Page 26: Speech Enhancement EE 516 Spring 2009

Microphone arrays: delay & sum

• 5 microphones spaced 5 cm apart. Source located at 5 m
• Angle: 30°

[Figure: four polar plots of the array response at 400 Hz, 880 Hz, 4400 Hz, and 8000 Hz; radial ticks at 12.5 and 25]

[Diagram: linear array of microphones M-2, M-1, M0, M1, M2 with spacing a, and a source S]

$$y[n] = \frac{1}{N}\sum_{i=0}^{N-1} y_i[n - i\,a\sin\theta]$$

$$\hat{\theta} = \arg\max_{\theta} \sum_{n}\left(\frac{1}{N}\sum_{i=0}^{N-1} y_i[n - i\,a\sin(\theta)]\right)^{2}$$

Page 27: Speech Enhancement EE 516 Spring 2009

WITTY: Who Is Talking To You?

$$Y(f) = X(f) + V(f)$$

$$B(f) = H(f)\,X(f) + G(f)\,V(f) + W(f)$$

Page 28: Speech Enhancement EE 516 Spring 2009

Bone microphone for noise robust ASR

• Conventional microphones are sensitive to noise
• Bone microphones are more noise resistant, but distort the signal

• Not enough data to retrain recognizer with bone microphone

• Fusion between acoustic microphone and bone microphone

Page 29: Speech Enhancement EE 516 Spring 2009

Acoustic Microphone

Page 30: Speech Enhancement EE 516 Spring 2009

Bone Microphone

Page 31: Speech Enhancement EE 516 Spring 2009

Microphone fusion

Page 32: Speech Enhancement EE 516 Spring 2009

Relationship between acoustic mic and bone mic

Acoustic

Contact

Page 33: Speech Enhancement EE 516 Spring 2009

Relationship between acoustic mic and bone mic

Page 34: Speech Enhancement EE 516 Spring 2009

WITTY: Who is talking to you?

Page 35: Speech Enhancement EE 516 Spring 2009

Blind source separation

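• Linear mixing

$$\mathbf{y}[n] = \mathbf{G}\,\mathbf{x}[n]$$

• Estimate filter

$$\mathbf{H} = \mathbf{G}^{-1}$$

• Separate signals

$$\mathbf{x}[n] = \mathbf{H}\,\mathbf{y}[n]$$

• Using assumption signals are independent

$$p_{\mathbf{y}}(\mathbf{y}[n]) = |\mathbf{H}|\;p_{\mathbf{x}}(\mathbf{H}\,\mathbf{y}[n])$$

$$p_{\mathbf{y}}(\mathbf{y}[0], \mathbf{y}[1], \ldots, \mathbf{y}[N-1]) = \prod_{n=0}^{N-1} p_{\mathbf{y}}(\mathbf{y}[n]) = |\mathbf{H}|^{N}\prod_{n=0}^{N-1} p_{\mathbf{x}}(\mathbf{H}\,\mathbf{y}[n])$$

• Do gradient descent:

$$\mathbf{H}_{n+1} = \mathbf{H}_n + \epsilon\left(\left(\mathbf{H}_n^{T}\right)^{-1} + \phi(\mathbf{H}_n\,\mathbf{y}[n])\,\mathbf{y}^{T}[n]\right), \qquad \phi(\mathbf{x}) = \frac{\partial \ln p_{\mathbf{x}}(\mathbf{x})}{\partial \mathbf{x}}$$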

Page 36: Speech Enhancement EE 516 Spring 2009

Blind source separation

Idea: estimate filters h11[n] and h12[n] that maximize the likelihood of z1[n] given an HMM.

Approximate the HMM by a Gaussian Mixture Model with LPC parameters => EM algorithm with a linear set of equations

[Diagram: two-input, two-output filter network; the inputs y1[n] and y2[n] pass through the filters h11[n], h12[n], h21[n], and h22[n], whose outputs are summed to give z1[n] and z2[n]]

Page 37: Speech Enhancement EE 516 Spring 2009

Outline

• A model of the acoustical environment
• Simple things first!
• Microphones
• Echo cancellation
• Microphone arrays
• Single channel noise suppression

Page 38: Speech Enhancement EE 516 Spring 2009

Spectral subtraction

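Corrupted signal

$$y[m] = x[m] + n[m]$$

Power spectrum

$$|Y(f)|^{2} = |X(f)|^{2} + |N(f)|^{2} + 2\,|X(f)|\,|N(f)|\cos\theta$$

where $\theta$ is the phase difference between $X(f)$ and $N(f)$, but

$$E\{\cos\theta\} = 0$$

So

$$|Y(f)|^{2} \approx |X(f)|^{2} + |N(f)|^{2}$$

Estimate noise power spectrum from M noise-only frames

$$|\hat{N}(f)|^{2} = \frac{1}{M}\sum_{i=0}^{M-1} |Y_i(f)|^{2}$$

Estimate clean power spectrum as

$$|\hat{X}(f)|^{2} = |Y(f)|^{2} - |\hat{N}(f)|^{2} = |Y(f)|^{2}\left(1 - \frac{1}{SNR(f)}\right)$$

$$SNR(f) = \frac{|Y(f)|^{2}}{|\hat{N}(f)|^{2}}$$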

Page 39: Speech Enhancement EE 516 Spring 2009

Spectral subtraction

Keep original phase

$$\hat{X}(f) = Y(f)\,H_{ss}(f)$$

Ensure it's positive

$$H_{ss}(f) = \sqrt{\max\left(1 - \frac{1}{SNR(f)},\ a\right)}$$

[Figure: gain (dB), from -12 to 0, versus instantaneous SNR (dB), from -5 to 20, for spectral subtraction, magnitude subtraction, and oversubtraction]
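Spectral subtraction with a gain floor can be sketched on one frame of complex spectrum values. The sketch assumes the gain is applied to the magnitude (hence the square root of the power-domain rule), keeps the noisy phase, and uses a hypothetical floor of 0.01 (-20 dB in power); none of the function names come from the slides.

```python
import math

def estimate_noise_power(noise_frames):
    """Average |Y_i(f)|^2 over M noise-only frames, one value per frequency bin."""
    M = len(noise_frames)
    bins = len(noise_frames[0])
    return [sum(abs(frame[k]) ** 2 for frame in noise_frames) / M
            for k in range(bins)]

def spectral_subtraction(Y, noise_power, floor=0.01):
    """Scale each bin of Y by sqrt(max(1 - 1/SNR, floor)), keeping its phase."""
    X_hat = []
    for Yf, Nf in zip(Y, noise_power):
        power = abs(Yf) ** 2
        inv_snr = Nf / power if power > 0.0 else float("inf")
        gain = math.sqrt(max(1.0 - inv_snr, floor))
        X_hat.append(Yf * gain)  # real gain, so the phase of Y is unchanged
    return X_hat

noise_power = estimate_noise_power([[1 + 0j, 2j], [1j, 0j]])
enhanced = spectral_subtraction([2 + 0j, 1j, 0j], [1.0, 1.0, 1.0])
```

Bins whose instantaneous SNR is at or below 0 dB would get a negative power estimate; the floor is what keeps the result positive.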

Page 40: Speech Enhancement EE 516 Spring 2009

Aurora 2

• ETSI STQ group
• TIDigits
• Added noise at SNRs: -5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB
• Set A: subway, babble, car, exhibition
• Set B: restaurant, airport, street, station
• Set C: one noise from set A and one noise from set B
• Aurora 3 recorded in car (no digital mixing!)
• Aurora 4 for large vocabulary
• Advanced Front-End (AFE) standard (2001) uses a variant of spectral subtraction

Page 41: Speech Enhancement EE 516 Spring 2009

Aurora 2 (Clean training)

Using SPLICE algorithm

| SNR | Subway (A) | Babble (A) | Car (A) | Exhibition (A) | Average (A) | Restaurant (B) | Street (B) | Airport (B) | Station (B) | Average (B) | Subway M (C) | Street M (C) | Average (C) | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clean | | | | | | | | | | | | | | |
| 20 dB | 98.16 | 98.52 | 98.72 | 98.27 | 98.42 | 98.65 | 97.58 | 98.81 | 98.7 | 98.44 | 98.34 | 98.04 | 98.19 | 98.38 |
| 15 dB | 96.65 | 97.64 | 98.09 | 96.61 | 97.25 | 97.88 | 96.89 | 97.97 | 97.84 | 97.65 | 96.81 | 96.4 | 96.61 | 97.28 |
| 10 dB | 93.77 | 94.68 | 95.71 | 93.09 | 94.31 | 94.75 | 93.44 | 95.85 | 94.6 | 94.66 | 93.18 | 91.23 | 92.21 | 94.03 |
| 5 dB | 87.47 | 84.46 | 88.46 | 85.53 | 86.48 | 85.08 | 83.71 | 87.03 | 84.94 | 85.19 | 84.31 | 80.35 | 82.33 | 85.13 |
| 0 dB | 65.92 | 57.13 | 63.67 | 63.78 | 62.63 | 59.72 | 57.83 | 63.11 | 57.42 | 59.52 | 59.23 | 52.9 | 56.07 | 60.07 |
| -5 dB | | | | | | | | | | | | | | |
| Average | 88.39 | 86.49 | 88.93 | 87.46 | 87.82 | 87.22 | 85.89 | 88.55 | 86.70 | 87.09 | 86.37 | 83.78 | 85.08 | 86.98 |

| SNR | Subway (A) | Babble (A) | Car (A) | Exhibition (A) | Average (A) | Restaurant (B) | Street (B) | Airport (B) | Station (B) | Average (B) | Subway M (C) | Street M (C) | Average (C) | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clean | | | | | | | | | | | | | | |
| 20 dB | 37.63% | 84.97% | 50.58% | 52.08% | 56.31% | 86.51% | 43.19% | 87.29% | 75.38% | 73.09% | 74.62% | 59.75% | 67.19% | 65.20% |
| 15 dB | 48.54% | 91.01% | 80.82% | 57.41% | 69.45% | 91.08% | 73.07% | 91.17% | 86.79% | 85.53% | 75.89% | 67.54% | 71.71% | 76.33% |
| 10 dB | 70.72% | 89.48% | 87.00% | 71.61% | 79.70% | 88.39% | 80.05% | 91.01% | 86.40% | 86.46% | 73.87% | 65.70% | 69.79% | 80.42% |
| 5 dB | 73.81% | 78.77% | 82.49% | 73.77% | 77.21% | 78.37% | 73.53% | 81.38% | 79.11% | 78.10% | 67.80% | 61.31% | 64.56% | 75.04% |
| 0 dB | 53.94% | 52.74% | 57.53% | 55.80% | 55.00% | 54.76% | 48.67% | 56.90% | 51.85% | 53.05% | 45.33% | 38.90% | 42.12% | 51.64% |
| -5 dB | | | | | | | | | | | | | | |
| Average | 61.96% | 73.03% | 71.90% | 63.75% | 68.48% | 73.03% | 63.33% | 75.52% | 70.02% | 70.83% | 59.73% | 52.14% | 55.93% | 67.39% |

Page 42: Speech Enhancement EE 516 Spring 2009

Aurora 2 (multi-condition training)

Using SPLICE algorithm

| SNR | Subway (A) | Babble (A) | Car (A) | Exhibition (A) | Average (A) | Restaurant (B) | Street (B) | Airport (B) | Station (B) | Average (B) | Subway M (C) | Street M (C) | Average (C) | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clean | | | | | | | | | | | | | | |
| 20 dB | 98.53 | 98.64 | 98.51 | 98.64 | 98.58 | 98.46 | 97.91 | 98.6 | 98.58 | 98.39 | 98.4 | 98.25 | 98.33 | 98.45 |
| 15 dB | 97.64 | 98.07 | 98.33 | 97.69 | 97.93 | 97.79 | 97.49 | 97.44 | 97.47 | 97.55 | 97.88 | 97.16 | 97.52 | 97.70 |
| 10 dB | 95.98 | 96.37 | 96.84 | 95.65 | 96.21 | 95.27 | 94.41 | 95.11 | 95.12 | 94.98 | 95.79 | 93.8 | 94.80 | 95.43 |
| 5 dB | 92.08 | 88.94 | 92.78 | 90.25 | 91.01 | 87.63 | 88.06 | 88.16 | 87.04 | 87.72 | 90.97 | 85.85 | 88.41 | 89.18 |
| 0 dB | 78.02 | 65.57 | 76.83 | 74.42 | 73.71 | 65.37 | 68.23 | 69.49 | 65.57 | 67.17 | 72.67 | 65.42 | 69.05 | 70.16 |
| -5 dB | | | | | | | | | | | | | | |
| Average | 92.45 | 89.52 | 92.66 | 91.33 | 91.49 | 88.90 | 89.22 | 89.76 | 88.76 | 89.16 | 91.14 | 88.10 | 89.62 | 90.18 |

| SNR | Subway (A) | Babble (A) | Car (A) | Exhibition (A) | Average (A) | Restaurant (B) | Street (B) | Airport (B) | Station (B) | Average (B) | Subway M (C) | Street M (C) | Average (C) | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clean | | | | | | | | | | | | | | |
| 20 dB | 38.49% | 40.09% | 24.37% | 47.49% | 37.61% | 50.80% | 13.64% | 45.31% | 52.51% | 40.56% | 40.74% | 49.28% | 45.01% | 40.27% |
| 15 dB | 33.14% | 34.80% | 30.13% | 30.63% | 32.17% | 52.98% | 31.98% | 34.02% | 43.40% | 40.59% | 41.92% | 36.47% | 39.19% | 36.95% |
| 10 dB | 27.70% | 23.09% | 25.82% | 26.15% | 25.69% | 41.17% | 1.06% | 27.12% | 31.56% | 25.23% | 36.79% | 17.33% | 27.06% | 25.78% |
| 5 dB | 31.96% | 11.16% | 40.82% | 21.37% | 26.33% | 24.85% | 17.03% | 13.89% | 21.36% | 19.28% | 48.66% | 19.00% | 33.83% | 25.01% |
| 0 dB | 33.60% | 9.04% | 50.24% | 28.23% | 30.27% | 14.93% | 17.82% | 12.55% | 21.54% | 16.71% | 48.61% | 24.10% | 36.35% | 26.06% |
| -5 dB | | | | | | | | | | | | | | |
| Average | 32.85% | 13.01% | 45.52% | 27.57% | 30.15% | 24.04% | 16.83% | 17.14% | 24.99% | 21.05% | 47.14% | 24.13% | 36.01% | 27.87% |

Page 43: Speech Enhancement EE 516 Spring 2009

Wiener Filtering

• Find linear estimate of clean signal

$$y[n] = x[n] + v[n]$$

$$\hat{x}[n] = \sum_{m} h[m]\,y[n-m]$$

• MMSE (Minimum Mean Squared Error): minimize

$$E\left\{\left(x[n] - \sum_{m} h[m]\,y[n-m]\right)^{2}\right\}$$

• Wiener-Hopf equation

$$R_{xy}[l] = \sum_{m} h[m]\,R_{yy}[l-m]$$

$$R_{xy}[l] = \sum_{m} x[m]\,y[m-l], \qquad R_{yy}[l] = \sum_{m} y[m]\,y[m-l]$$

• In freq domain

$$H(f) = \frac{S_{xy}(f)}{S_{yy}(f)}$$

• If noise and signal are uncorrelated

$$H(f) = \frac{S_{xx}(f)}{S_{xx}(f) + S_{vv}(f)}$$

Page 44: Speech Enhancement EE 516 Spring 2009

Wiener Filtering

• Find linear estimate of clean signal

$$y[n] = x[n] + v[n]$$

$$\hat{x}[n] = \sum_{m} h[m]\,y[n-m]$$

• If noise and signal are uncorrelated

$$H(f) = \frac{S_{xx}(f)}{S_{yy}(f)} = \frac{S_{yy}(f) - S_{vv}(f)}{S_{yy}(f)} = 1 - \frac{1}{SNR(f)}$$

• With

$$SNR(f) = \frac{S_{yy}(f)}{S_{vv}(f)}$$

• Compare with Spectral Subtraction

$$H_{ss}(f) = \sqrt{\max\left(1 - \frac{1}{SNR(f)},\ a\right)}$$
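The two forms of the Wiener gain, and its relation to the spectral subtraction gain, can be checked numerically for one frequency bin. This is a sketch with made-up power values; the spectral subtraction gain is taken here as the square root of the floored power-domain rule, which is an assumption about how the gain is applied to the magnitude.

```python
import math

def wiener_gain(s_xx, s_vv):
    """H(f) = S_xx / (S_xx + S_vv) for uncorrelated signal and noise."""
    return s_xx / (s_xx + s_vv)

def wiener_gain_from_snr(snr):
    """Same gain written as 1 - 1/SNR, with SNR = S_yy / S_vv."""
    return 1.0 - 1.0 / snr

def spectral_subtraction_gain(snr, floor=0.01):
    """Magnitude gain sqrt(max(1 - 1/SNR, floor))."""
    return math.sqrt(max(1.0 - 1.0 / snr, floor))

s_xx, s_vv = 3.0, 1.0          # hypothetical signal and noise power in one bin
snr = (s_xx + s_vv) / s_vv     # S_yy / S_vv = 4
h_wiener = wiener_gain(s_xx, s_vv)
h_wiener_snr = wiener_gain_from_snr(snr)
h_ss = spectral_subtraction_gain(snr)
```

Above the floor, the Wiener gain is the square of this spectral subtraction gain, so it attenuates low-SNR bins more strongly.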

Page 45: Speech Enhancement EE 516 Spring 2009

Spectral Subtraction

[Figure: word error rate (%) versus SNR (dB) from 0 to 30 dB, comparing Clean Speech Training, Spectral Subtraction, and Matched Noisy Training]

Page 46: Speech Enhancement EE 516 Spring 2009

Vector Taylor Series (VTS)

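• Acero, Moreno

$$y[m] = x[m] * h[m] + n[m]$$

• The power spectrum, on the average

$$|Y(f_i)|^{2} \approx |X(f_i)|^{2}\,|H(f_i)|^{2} + |N(f_i)|^{2}$$

• Taking logs

$$\ln|Y(f_i)|^{2} \approx \ln|X(f_i)|^{2} + \ln|H(f_i)|^{2} + \ln\left(1 + \exp\left(\ln|N(f_i)|^{2} - \ln|X(f_i)|^{2} - \ln|H(f_i)|^{2}\right)\right)$$

• Cepstrum is DCT (matrix C) of log power spectrum

$$\mathbf{y} = \mathbf{x} + \mathbf{h} + \mathbf{g}(\mathbf{n} - \mathbf{x} - \mathbf{h})$$

$$\mathbf{g}(\mathbf{z}) = \mathbf{C}\,\ln\left(1 + e^{\mathbf{C}^{-1}\mathbf{z}}\right)$$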

Page 47: Speech Enhancement EE 516 Spring 2009

Vector Taylor Series (VTS)

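• x, h, and n are Gaussian random vectors with means $\boldsymbol{\mu}_x$, $\boldsymbol{\mu}_h$, and $\boldsymbol{\mu}_n$ and covariance matrices $\boldsymbol{\Sigma}_x$, $\boldsymbol{\Sigma}_h$, and $\boldsymbol{\Sigma}_n$

• Expand y in first-order Taylor series

$$\mathbf{y} \approx \boldsymbol{\mu}_x + \boldsymbol{\mu}_h + \mathbf{g}(\boldsymbol{\mu}_n - \boldsymbol{\mu}_x - \boldsymbol{\mu}_h) + \mathbf{A}(\mathbf{x} - \boldsymbol{\mu}_x) + \mathbf{A}(\mathbf{h} - \boldsymbol{\mu}_h) + (\mathbf{I} - \mathbf{A})(\mathbf{n} - \boldsymbol{\mu}_n)$$

$$\mathbf{A} = \mathbf{C}\,\mathbf{F}\,\mathbf{C}^{-1}$$

where F is a diagonal matrix whose elements are given by

$$f(\boldsymbol{\mu}) = \frac{1}{1 + e^{\mathbf{C}^{-1}\boldsymbol{\mu}}}, \qquad \boldsymbol{\mu} = \boldsymbol{\mu}_n - \boldsymbol{\mu}_x - \boldsymbol{\mu}_h$$

$$\boldsymbol{\mu}_y \approx \boldsymbol{\mu}_x + \boldsymbol{\mu}_h + \mathbf{g}(\boldsymbol{\mu}_n - \boldsymbol{\mu}_x - \boldsymbol{\mu}_h)$$

$$\boldsymbol{\Sigma}_y \approx \mathbf{A}\boldsymbol{\Sigma}_x\mathbf{A}^{T} + \mathbf{A}\boldsymbol{\Sigma}_h\mathbf{A}^{T} + (\mathbf{I} - \mathbf{A})\boldsymbol{\Sigma}_n(\mathbf{I} - \mathbf{A})^{T}$$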
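The first-order VTS approximation can be compared against a Monte Carlo estimate in the simplest setting. The sketch below is a scalar version for one log-spectral bin, which amounts to assuming C is the identity; values are in natural-log units, and all names and numbers are illustrative.

```python
import math
import random

def mismatch(z):
    """g(z) = ln(1 + e^z), the scalar form of the mismatch function."""
    return math.log1p(math.exp(z))

def vts_moments(mu_x, var_x, mu_h, var_h, mu_n, var_n):
    """First-order Taylor approximation of the mean and variance of y."""
    z = mu_n - mu_x - mu_h
    a = 1.0 / (1.0 + math.exp(z))  # scalar counterpart of the matrix A
    mu_y = mu_x + mu_h + mismatch(z)
    var_y = a * a * var_x + a * a * var_h + (1.0 - a) ** 2 * var_n
    return mu_y, var_y

def monte_carlo_moments(mu_x, var_x, mu_n, var_n, count=20000, seed=0):
    """Sample y = ln(e^x + e^n) with h = 0 and return its mean and variance."""
    rng = random.Random(seed)
    ys = [math.log(math.exp(rng.gauss(mu_x, math.sqrt(var_x))) +
                   math.exp(rng.gauss(mu_n, math.sqrt(var_n))))
          for _ in range(count)]
    mean = sum(ys) / count
    var = sum((y - mean) ** 2 for y in ys) / count
    return mean, var

mu_y, var_y = vts_moments(5.0, 0.01, 0.0, 0.0, 2.0, 0.01)
mc_mean, mc_var = monte_carlo_moments(5.0, 0.01, 2.0, 0.01)
```

With small variances the linearization is accurate; it degrades as the spread of x and n grows, because g is evaluated only at the means.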

Page 48: Speech Enhancement EE 516 Spring 2009

Vector Taylor Series

• Distribution of corrupted log-spectra
• Noise with mean of 0 dB and std dev of 2 dB
• Speech with mean of 25 dB
• Monte Carlo simulation
• Std dev: 25 dB, 10 dB, 5 dB

[Figure: three histograms of the corrupted log-spectrum, one per std dev value]

Page 49: Speech Enhancement EE 516 Spring 2009

Phase matters

Corrupted signal

$$y[m] = x[m] + n[m]$$

Spectrum

$$|Y(f)|^{2} = |X(f)|^{2} + |N(f)|^{2} + 2\,|X(f)|\,|N(f)|\cos\theta$$

But

$$|Y(f)|^{2} \approx |X(f)|^{2} + |N(f)|^{2}$$

is only an approximation:

$$E\{\cos\theta\} = 0, \qquad 2\,|X_t(f)|\,|N_t(f)|\cos\theta_t \neq 0$$

[Figure: two scatter plots with axes running from -6 to 2]
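The point that the cross term vanishes only on average, not in each frame, can be illustrated with random phases. This is a sketch assuming unit-magnitude speech and noise components with independent uniform phases; the setup is illustrative and not taken from the slides.

```python
import cmath
import math
import random

def cross_term(X, N):
    """Return 2|X||N|cos(theta), where theta is the angle between X and N."""
    theta = cmath.phase(X) - cmath.phase(N)
    return 2.0 * abs(X) * abs(N) * math.cos(theta)

random.seed(0)
terms = []
for _ in range(10000):
    X = cmath.rect(1.0, random.uniform(-math.pi, math.pi))
    N = cmath.rect(1.0, random.uniform(-math.pi, math.pi))
    # Exact identity: |X + N|^2 = |X|^2 + |N|^2 + 2|X||N|cos(theta)
    assert abs(abs(X + N) ** 2 - (abs(X) ** 2 + abs(N) ** 2 + cross_term(X, N))) < 1e-9
    terms.append(cross_term(X, N))

mean_cross = sum(terms) / len(terms)   # close to zero on average
largest = max(abs(t) for t in terms)   # individual frames are far from zero
```

In a single frame the cross term can be as large as the speech and noise powers themselves, which is why the additive power model is only an approximation.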

Page 50: Speech Enhancement EE 516 Spring 2009

Non-stationary noise

• Speech/Noise decomposition (Varga et al.)

[Diagram: observations decomposed against a speech HMM and a noise HMM]