
Reduction of non-stationary noise using a non-negative latent variable decomposition
Mikkel M. Schmidt and Jan Larsen, DTU Informatics, Technical University of Denmark, www.imm.dtu.dk


  • Reduction of non-stationary noise using a non-negative latent variable decomposition

    Mikkel M. Schmidt and Jan Larsen, DTU Informatics, Technical University of Denmark, www.imm.dtu.dk

  • Noise Reduction

    • Single channel recording
    • Unknown speaker / signal of interest
    • Focus on modeling noise

    [Block diagram: noise reduction system]

  • Single channel source separation is hard

    • There is no spatial information; hence beamforming and independent component analysis are not feasible

    • Maybe higher-level cognitive capabilities such as context detection could help

    • We will use a data-driven approach to learn a good noise representation

  • The spectrum of alternative methods

    • Wiener filter (Wiener, 1949)
    • Spectral subtraction (Boll, 1979; Berouti et al., 1979)
    • AR codebook-based spectral subtraction (Kuropatwinski & Kleijn, 2001)
    • Minimum statistics (Martin et al., 1994, 2001, 2005)
    • Masking techniques (Wang; Weiss & Ellis, 2006)
    • Factorial models (Roweis, 2000, 2003)
    • Non-negative sparse coding (Casey & Westner, 2000; Wang & Plumbley, 2006; Schmidt & Olsson, 2006; Schmidt, Larsen & Hsiao, 2007)

    Several of these methods require a VAD, and they largely fail for fast-changing non-stationary noise.

  • Our approach in brief

    [Block diagram: a learning phase and a signal reconstruction phase, each consisting of a signal representation followed by a latent variable model; the learning phase produces a noise representation that is used during reconstruction. The extracted signal of interest is not in focus in this work.]

  • Signal Representation

    • Exponentiated magnitude spectrogram (see the sketch below)
        γ = 2: power spectrogram
        γ = 1: magnitude spectrogram
        γ = 0.67: cube root compression (Stevens' power law: perceived intensity)
    • Any other representation could be used: wavelets, perceptually weighted, etc.
    • Ignores phase information; reconstruct by re-filtering
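
    A minimal sketch of this representation, assuming NumPy and SciPy; the function name and the 64 ms default frame length are illustrative choices, not prescribed by the slides.

```python
import numpy as np
from scipy.signal import stft

def exponentiated_spectrogram(x, fs, gamma=1.0, frame_ms=64):
    """Exponentiated magnitude spectrogram |STFT(x)|**gamma.
    gamma = 2    -> power spectrogram
    gamma = 1    -> magnitude spectrogram
    gamma = 0.67 -> cube-root compression (Stevens' power law)"""
    nperseg = int(fs * frame_ms / 1000)           # frame length in samples
    _, _, X = stft(x, fs=fs, window='hann',
                   nperseg=nperseg, noverlap=nperseg // 2)
    return np.abs(X) ** gamma, np.angle(X)        # phase kept for re-filtering
```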

  • Non-negative latent variable model

    • Speech: non-negative basis times weights, S = B_s C_s
    • Noise: non-negative basis times weights, N = B_n C_n
    • Noisy speech: binary speech activation applied to the speech part, plus noise and a residual, X = A ∘ S + N + R

  • Non-negative latent variable model

    • Use a probabilistic Bayesian setting
    • Exponential priors (sparsity) on B and C (see the sketch below)
    • Gaussian residual R

    Goals:
    • Posterior of all parameters A, B, C, S, N and the noise variance
    • Marginalized mean estimate of the signal component
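
    The slides state the model (Gaussian residual, exponential priors on B and C) but not the inference equations. As a rough illustration, the MAP estimate under such a model corresponds to a sparsity-penalized least-squares NMF; the multiplicative-update sketch below is a common way to compute it and is not the authors' Bayesian procedure.

```python
import numpy as np

def sparse_nmf(X, n_components, lam_b=0.1, lam_c=0.0, n_iter=200, eps=1e-9):
    """Sparse NMF, X ~= B @ C, by multiplicative updates.
    Least-squares fit (Gaussian residual) with L1 penalties lam_b and lam_c,
    the MAP counterpart of exponential priors on B and C (illustrative only)."""
    rng = np.random.default_rng(0)
    B = rng.random((X.shape[0], n_components))
    C = rng.random((n_components, X.shape[1]))
    for _ in range(n_iter):
        C *= (B.T @ X) / (B.T @ B @ C + lam_c + eps)   # weight update
        B *= (X @ C.T) / (B @ C @ C.T + lam_b + eps)   # basis update
    return B, C
```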

  • A three-step simple approximate learning procedure for speech

    1. Compute speech activation using a state-of-the-art voice activity detector (Qualcomm-ICSI-OGI)
    2. Compute the noise basis representation using the non-speech signal frames
    3. Jointly compute the noise weights, speech basis, and speech weights, and reconstruct the speech signal (see the sketch below)
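
    A high-level sketch of the three steps, reusing the sparse_nmf sketch above; the function name, the VAD mask argument, and the joint update loop are hypothetical stand-ins for the actual implementation.

```python
import numpy as np

def denoise(X, vad_mask, n_noise=256, n_speech=256, lam=0.1, n_iter=200, eps=1e-9):
    """Three-step approximate procedure (sketch; reuses sparse_nmf defined above).
    X        : exponentiated magnitude spectrogram (freq x frames)
    vad_mask : boolean speech-activity flag per frame (step 1, external VAD)."""
    # Step 2: learn a noise basis from the non-speech frames only.
    B_n, _ = sparse_nmf(X[:, ~vad_mask], n_noise)
    # Step 3: keep B_n fixed; jointly fit noise weights, a speech basis,
    # and speech weights on the whole noisy signal.
    rng = np.random.default_rng(0)
    B_s = rng.random((X.shape[0], n_speech))
    C = rng.random((n_noise + n_speech, X.shape[1]))
    for _ in range(n_iter):
        B = np.hstack([B_n, B_s])
        C *= (B.T @ X) / (B.T @ B @ C + lam + eps)
        B_s *= (X @ C[n_noise:].T) / (B @ C @ C[n_noise:].T + lam + eps)
    return B_s @ C[n_noise:]                 # reconstructed speech spectrogram
```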

  • Experimental setup

    • Four different noise types: machine gun, string quartet, restaurant noise, traffic noise

    • Mixed with 100 sentences from the TIMIT database at SNRs in the range -9 dB to 6 dB

    • Signal represented by an STFT using 64 ms, 50% overlapping, Hann-windowed frames and mapped onto 32 mel frequency bins [20 Hz; 4 kHz] (see the sketch below)

    • 256 bases for signal and noise
    • Optimal sparsity (hyper-parameters): λ_B = 0.1, λ_C = 0
    • Qualcomm-ICSI-OGI voice activity detector (VAD)
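
    One possible realization of the stated front end, assuming librosa is available; the sampling rate below is an assumption, everything else mirrors the bullet points.

```python
import numpy as np
import librosa

def mel_representation(x, sr=16000, gamma=1.0):
    """64 ms Hann-windowed STFT with 50% overlap, mapped onto 32 mel bands
    between 20 Hz and 4 kHz, then exponentiated (sketch of the stated setup;
    sr=16000 is assumed, not given on the slide)."""
    n_fft = int(0.064 * sr)                   # 64 ms frames
    hop = n_fft // 2                          # 50% overlap
    S = np.abs(librosa.stft(x, n_fft=n_fft, hop_length=hop, window='hann'))
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=32, fmin=20, fmax=4000)
    return (mel_fb @ S) ** gamma
```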

  • Quality Measure

    • Signal-to-noise ratio
        Simple measure; has only an indirect relation to perceived quality (see the sketch below)
    • Representation-based metrics
        In systems based on time-frequency masking, evaluate the masks
    • Perceptual models
        Promising to use PEAQ or PESQ
    • High-level attributes
        For example, word error rate in a speech recognition setup
    • Listening tests
        Expensive and time-consuming; cover aspects such as comfort and intelligibility
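
    For reference, the SNR measure in the first bullet can be computed as follows; the slides do not spell out a formula, so the standard definition is used here.

```python
import numpy as np

def snr_db(clean, estimate):
    """Output SNR in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    residual = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))
```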

  • Example: Bursts of machine gun shots

  • Results

    • Highly non-stationary noise: spectral subtraction breaks down due to its stationarity assumption
    • Almost stationary noise: the proposed method works as well as or better than spectral subtraction

    [Figure: output SNR versus input SNR (-9 to 6 dB) for the proposed method and spectral subtraction, one panel per noise type (machine gun, string quartet, restaurant, traffic), ordered from non-stationary to stationary.]

    More sound examples at www.mikkelschmidt.dk

  • Potential ways ahead

    • Prior model for the speech activity pattern, e.g. using an HMM
    • Harmonic prior for the speech basis
    • Full Bayesian inference in the model (working on a Gibbs sampling approach)
    • Better residual models by including phase uncertainty (Rayleigh distribution) (Parry and Essa, 2007)
    • Advanced post-processing (weighted, thresholded spectral subtraction, and smoothing) can help (see the sketch below)
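
    For the post-processing bullet, a generic sketch of weighted, thresholded spectral subtraction is shown below; the over-subtraction factor and spectral floor are common textbook choices, not values from the slides.

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, alpha=2.0, beta=0.02):
    """Weighted, thresholded spectral subtraction:
    keep max(|X| - alpha * |N_hat|, beta * |X|) per time-frequency bin."""
    return np.maximum(noisy_mag - alpha * noise_mag, beta * noisy_mag)
```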

  • Conclusion

    • A probabilistic non-negative latent variable decomposition method was presented

    • Full Bayesian inference is possible; here we resorted to a simple step-wise procedure

    • The method shows potential over classical methods for handling strongly non-stationary noise conditions

    • Since an essential output of the method is a good noise estimate, it can also be used as an integral part of other methods that are based on noise estimation


    Thank you for your attention!
