

Page 1: Multiresolution STFT for Analysis and Processing of Audio

Alexey Lukin
Moscow State University, Russia; iZotope Inc., Cambridge, MA

Talk at B.U., Sept. 2010

Page 2: Short-Time Fourier Transform

A. Lukin, J. Todd “Adaptive Time-Frequency Resolution”

Most commonly used transform for audio:
► Spectral analysis
► Noise reduction (spectral subtraction algorithm)
► Time-variable filters and other effects

Very fast implementation for a large number of bands via FFT

Good energy compaction for many musical signals

Many oscillations in basis functions → ringing (Gibbs phenomenon)

Uniform frequency resolution → inadequate resolution at low frequencies

$$\mathrm{STFT}[n, \omega] = \sum_{m} w[m]\, x[n+m]\, e^{-j\omega m}$$
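The STFT sum can be evaluated for all frequency bins at once with one FFT per frame. The following is a minimal NumPy sketch, not the talk's implementation: the Hann window, the hop of half a window, and the function name `stft` are assumptions made for illustration.

```python
import numpy as np

def stft(x, win_size=1024, hop=None):
    """STFT[n, k] = sum_m w[m] * x[n + m] * exp(-2j*pi*k*m / win_size)."""
    hop = hop or win_size // 2             # assumed hop: half a window
    w = np.hanning(win_size)               # assumed analysis window w[m]
    starts = range(0, len(x) - win_size + 1, hop)
    frames = np.array([w * x[n:n + win_size] for n in starts])
    # One FFT per frame evaluates the sum for every frequency bin k.
    return np.fft.rfft(frames, axis=1)     # shape: (num_frames, win_size // 2 + 1)

fs = 8000                                  # sampling rate of the test signal, Hz
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)           # 1 kHz test tone
S = stft(x, win_size=256)
peak_bin = int(np.argmax(np.abs(S[0])))
print(peak_bin * fs / 256)                 # frequency of the strongest bin, Hz
```

The frequency variable is discretized here as ω = 2πk / win_size, so bin k corresponds to k · fs / win_size Hz.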

Page 3: Short-Time Fourier Transform

Spectrogram: displays evolution of spectrum in time

$$\mathrm{STFT}[n, \omega] = \sum_{m} w[m]\, x[n+m]\, e^{-j\omega m}$$

Page 4: Spectrograms

Problems:
► Most perceptually meaningful energy is concentrated in a narrow band below 4 kHz → can’t see enough details
► Time/frequency resolution trade-off

Figure: Conventional STFT spectrogram (linear frequency scale)

Page 5: Spectrograms

Problems:
► Poor frequency resolution at low frequencies → can’t separate bass harmonics from the bass drum
► Time/frequency resolution trade-off

Figure: Mel-scale STFT spectrogram (window size = 12 ms)

Page 6: Spectrograms

Problems:
► Poor time resolution at transients → time-smearing of drums and other percussive sounds

Figure: Mel-scale STFT spectrogram (window size = 93 ms)
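The trade-off between the 12 ms and 93 ms windows can be put in numbers. The sketch below assumes a 44.1 kHz sampling rate, which the slides do not state; at that rate the two window sizes correspond to 512 and 4096 samples. The helper name `window_stats` is hypothetical.

```python
def window_stats(n_samples, fs=44100):
    """Return (window duration in ms, spacing between FFT bins in Hz)."""
    duration_ms = 1000.0 * n_samples / fs   # time span covered by one window
    bin_spacing_hz = fs / n_samples         # distance between adjacent FFT bins
    return duration_ms, bin_spacing_hz

for n_samples in (512, 4096):
    duration_ms, bin_spacing_hz = window_stats(n_samples)
    print(f"{n_samples} samples: {duration_ms:.1f} ms, {bin_spacing_hz:.1f} Hz per bin")
```

Under this assumption the short window spaces its bins about 86 Hz apart, too coarse to separate low bass harmonics, while the long window reaches about 11 Hz per bin at the cost of averaging over roughly 93 ms. The effective resolution also depends on the window shape, which this arithmetic ignores.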

Page 7: Filter banks

Idea: x[n] → Decomposition → Processing of subband signals → Synthesis → y[n]

Decompositions of a time-frequency plane

Figure: tilings of the time-frequency plane (axes t and f) for the STFT and the DWT; label: uncertainty principle

Page 8: Filter banks

Perceptual coding of audio

Figure: Diagram of an mp3 encoder, with blocks x[n], filter bank, FFT, psychoacoustic model, Q, Huffman, and mp3 file

Page 9: Filter banks

Window size switching (guided by transient detection)

Figure: waveforms around a transient, labeled "Pre-echo", "Transient", and "Reduced pre-echo"

Page 10: Proposed approach

Transforms should vary their time-frequency resolution in a perceptually motivated way

► Imitation of time-frequency resolution of human hearing
► Adaptation of resolution to local signal features

Page 11: Spectrograms

Simple solution:
► Combine spectrograms with different resolutions: take bass from a spectrogram with good frequency resolution, take treble from a spectrogram with good time resolution

Figure: Combined resolution spectrogram (window sizes from 12 to 93 ms)

Page 12: Spectrograms

Simple solution: combine spectrograms with different resolutions

Each spectrogram is computed on the same grid of time-frequency points (using zero padding)

Figure: block diagram with blocks Analysis, Filter bank 1, Filter bank 2, and Mixer of coefficients (with a control input); signals x[t], a_{f,t,1}, a_{f,t,2}, and the mixed output a_{f,t}
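The combined-resolution idea can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' mixer: the Hann window, the normalization by the window sum, the hard 1 kHz crossover, and the names `spectrogram_on_grid` and `combine` are all hypothetical. Both spectrograms share one hop and one FFT size, with the short window zero-padded and centered in the long frame, so their coefficients land on the same time-frequency grid.

```python
import numpy as np

def spectrogram_on_grid(x, win_size, fft_size, hop):
    """Magnitude STFT using a window of win_size centered in a zero-padded frame of fft_size."""
    w = np.hanning(win_size)
    pad = (fft_size - win_size) // 2
    frames = []
    for n in range(0, len(x) - fft_size + 1, hop):
        seg = np.zeros(fft_size)
        seg[pad:pad + win_size] = w * x[n + pad:n + pad + win_size]
        frames.append(seg)
    # Dividing by the window sum keeps levels comparable across window sizes.
    return np.abs(np.fft.rfft(np.array(frames), axis=1)) / w.sum()

def combine(x, fs, short=512, long=4096, hop=256, crossover_hz=1000.0):
    a1 = spectrogram_on_grid(x, long, long, hop)    # good frequency resolution
    a2 = spectrogram_on_grid(x, short, long, hop)   # good time resolution
    freqs = np.fft.rfftfreq(long, d=1.0 / fs)
    # Control signal (assumed): 1 selects the long window, 0 the short one.
    weight_long = (freqs < crossover_hz).astype(float)
    return weight_long * a1 + (1.0 - weight_long) * a2

x = np.random.default_rng(0).standard_normal(16384)
a = combine(x, fs=44100)
print(a.shape)
```

A smooth crossfade over frequency, or more than two window sizes, would fit the same structure by changing how `weight_long` is built.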

Page 13: Spectrograms

Better approach: select best resolution for each time-frequency neighborhood

Criteria?
► Better frequency resolution at bass (reflects a-priori psychoacoustical knowledge)
► Maximal energy compaction (to minimize spectral smearing in both time and frequency, i.e. maximize sparsity)

Figure: candidate STFT window sizes of 6 ms, 12 ms, 24 ms, 48 ms, and 96 ms, with the best one marked

Page 14: Spectrograms

Calculation of sparsity (in a given block, for all T/F resolutions r)

Figure: candidate STFT window sizes of 6 ms, 12 ms, 24 ms, 48 ms, and 96 ms, with the best one marked

$$S_r = \sqrt{n}\,\frac{\mathrm{norm}_{L_2}(a)}{\mathrm{norm}_{L_1}(a)} = \frac{\sqrt{n \sum_{i=1}^{n} a_{i,r}^{2}}}{\sum_{i=1}^{n} a_{i,r}}, \qquad r_0 = \arg\max_{r} S_r$$

Here $a_{i,r}$ are STFT magnitudes in the block, $S_r$ is the spectrum sparsity for the given resolution $r$, and $r_0$ is the resolution with the best sparsity.
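Selecting the best resolution for one block could look like the sketch below. It assumes the sparsity measure is the L2 norm of the block's magnitudes divided by their L1 norm, scaled by √n; the function names and the handling of silent blocks are hypothetical.

```python
import numpy as np

def sparsity(a):
    """S = sqrt(n) * ||a||_2 / ||a||_1 for the n magnitudes in a block."""
    a = np.abs(np.ravel(a))
    l1 = a.sum()
    if l1 == 0.0:
        return 0.0   # silent block: no resolution is preferred (assumed convention)
    return float(np.sqrt(a.size) * np.sqrt(np.sum(a ** 2)) / l1)

def best_resolution(blocks):
    """blocks: one magnitude array per resolution r, all covering the same T/F block."""
    scores = [sparsity(b) for b in blocks]
    return int(np.argmax(scores)), scores

flat = np.ones(16)                    # energy smeared evenly over the block
peaky = np.zeros(16)
peaky[3] = 1.0                        # energy compacted into one coefficient
r0, scores = best_resolution([flat, peaky])
print(r0, scores)
```

With this scaling the measure runs from 1 for a completely flat block to √n for a block with a single nonzero coefficient, so the resolution that compacts energy best gets the highest score.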

Page 15: Spectrograms

Benefits:
► Sharper bass drum hits and other transients, even in mid-frequency range
► Sharper guitar harmonics at high frequencies

Figure: Adaptive resolution spectrogram (window sizes from 12 to 93 ms)

Page 16: Spectrograms

Simple solution:
► Combine spectrograms with different resolutions: take bass from a spectrogram with good frequency resolution, take treble from a spectrogram with good time resolution

Figure: Combined resolution spectrogram (window sizes from 12 to 93 ms)

Page 17: Spectrograms

More examples

Figure: Tone onset waveform

Figure: Conventional STFT spectrogram

Page 18: Spectrograms

More examples

Figure: Combined resolution spectrogram

Figure: Adaptive resolution spectrogram

Page 19: Processing framework

General framework for multi-resolution processing
► Perform processing with several different resolutions
► Adaptively combine (mix) results in a time-frequency space
► Mixing is controlled by a-priori knowledge of psychoacoustics and analysis of local signal features (e.g. transience or sparsity)

Figure: block diagram with blocks Processing 1, Processing 2, Analysis, two filter banks, Mixer of coefficients (with a control input), and Inverse filter bank; signals x[t], x1[t], x2[t], y[t]

Page 20: Noise reduction

Spectral subtraction algorithm
1. STFT of a noisy signal
2. Estimate power spectrum of noise (manually or automatically)
3. Subtract noise power spectrum from a signal power spectrum
4. Inverse STFT

Figure: block diagram with blocks STFT, Noise spectrum estimation, subtraction, and Inverse STFT; signals x[t], X[f,t], W[f], S[f,t], s[t]
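The four steps can be sketched with NumPy alone. Everything beyond the steps themselves is an assumption: the square-root Hann windows at 50% overlap (chosen so that analysis and synthesis windows overlap-add to one), the reuse of the noisy phase, the floor at zero after subtraction, and the function names.

```python
import numpy as np

def sqrt_hann(win_size):
    """Square root of a periodic Hann window; its square overlap-adds to 1 at 50% overlap."""
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win_size) / win_size))

def estimate_noise_power(noise_only, win_size=1024):
    """Step 2: average power spectrum over the frames of a noise-only excerpt."""
    w, hop = sqrt_hann(win_size), win_size // 2
    frames = [np.abs(np.fft.rfft(w * noise_only[n:n + win_size])) ** 2
              for n in range(0, len(noise_only) - win_size + 1, hop)]
    return np.mean(frames, axis=0)

def spectral_subtraction(x, noise_power, win_size=1024):
    w, hop = sqrt_hann(win_size), win_size // 2
    y = np.zeros(len(x))
    for n in range(0, len(x) - win_size + 1, hop):
        X = np.fft.rfft(w * x[n:n + win_size])                  # step 1: STFT frame
        power = np.abs(X) ** 2
        clean_power = np.maximum(power - noise_power, 0.0)      # step 3: subtract, floor at 0
        gain = np.sqrt(clean_power / np.maximum(power, 1e-12))  # keep the noisy phase
        y[n:n + win_size] += w * np.fft.irfft(gain * X, win_size)  # step 4: overlap-add
    return y

fs = 8000
rng = np.random.default_rng(0)
t = np.arange(8192) / fs
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = tone + 0.05 * rng.standard_normal(len(t))
noise_power = estimate_noise_power(0.05 * rng.standard_normal(len(t)))
denoised = spectral_subtraction(noisy, noise_power)
```

The first and last half-windows of the output are covered by only one frame, so comparisons should be made on the interior samples. Plain subtraction like this tends to leave isolated residual peaks ("musical noise"); practical systems smooth or over-subtract, which is outside this sketch.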

Page 21: Noise reduction

Figure: block diagram with blocks Spectral subtraction (short windows), Spectral subtraction (long windows), two STFT blocks, Transience analysis (control), Mixer of coefficients, and Synthesis; signals x1[t], x2[t], x3[t], y[t]

Example of adaptive resolution
► Better frequency resolution at low frequencies (according to the resolution of human hearing)
► Better temporal resolution near signal transients (for reduction of Gibbs phenomenon)
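A transience-controlled mixer might be sketched as follows. The transience measure used here (relative frame-to-frame energy rise) is a hypothetical stand-in, since the slides do not define one, and both function names are assumptions. The two coefficient arrays are assumed to already lie on the same time-frequency grid.

```python
import numpy as np

def transience(mag, eps=1e-12):
    """Per-frame onset strength in [0, 1) from a magnitude spectrogram (frames x bins)."""
    energy = np.sum(mag ** 2, axis=1)
    prev = np.concatenate(([energy[0]], energy[:-1]))   # energy of the previous frame
    rise = np.maximum(energy - prev, 0.0)               # only increases count as onsets
    return rise / (energy + prev + eps)                 # 0 when steady, near 1 at sharp onsets

def mix_coefficients(a_short, a_long, control):
    """Blend short-window and long-window results; control holds one weight per frame."""
    c = control[:, None]
    return c * a_short + (1.0 - c) * a_long

mag = np.array([[1.0, 0.0], [1.0, 0.0], [10.0, 0.0], [10.0, 0.0]])  # onset at frame 2
control = transience(mag)
mixed = mix_coefficients(np.ones((4, 2)), np.zeros((4, 2)), control)
print(control)
```

Near the onset the weight moves toward the short-window result, which limits pre-ringing; elsewhere the long-window result dominates. A frequency-dependent term favoring long windows at low frequencies could be multiplied into `control` in the same way.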

Page 22: Noise reduction

Results of single-resolution and multi-resolution algorithms

Figure: Noisy recording (guitar + castanets)

Page 23: Noise reduction

Results of single-resolution and multi-resolution algorithms

Figure: Single resolution

Figure: Multi-resolution (notice less pre-ringing on transients)

Page 24: Conclusion

When using the STFT, do care about the window size!

Choose the size wisely:

► Maximize sparsity (spectrogram sharpness)

► Account for human perception

Page 25: Your questions?

Demo web page: http://www.izotope.com/tech/aes_adapt/