5 4 3 2 1 music analysis sinewave synthesis nonspeech lecture...

38
E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820: Speech & Audio Processing & Recognition Lecture 7: Music analysis and synthesis Music and nonspeech Music synthesis techniques Sinewave synthesis Music analysis Transcription Dan Ellis <[email protected]> http://www.ee.columbia.edu/~dpwe/e6820/ 1 2 3 4 5

Upload: others

Post on 27-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1

EE E6820: Speech & Audio Processing & Recognition

Lecture 7:Music analysis and synthesis

Music and nonspeech

Music synthesis techniques

Sinewave synthesis

Music analysis

Transcription

Dan Ellis <[email protected]>http://www.ee.columbia.edu/~dpwe/e6820/

1

2

3

4

5

Page 2: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 2

Music & nonspeech

• What is ‘nonspeech’?

- according to research effort: a little music- in the world: most everything

attributes?

1

Originnatural man-made

Info

rmat

ion

cont

ent

low

high

wind &water

animalsounds

speech music

machines & enginescontact/

collision

Page 3: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 3

Sound attributes

• Attributes suggest model parameters

• What do we notice about ‘general’ sound?

- psychophysics: pitch, loudness, ‘timbre’- bright/dull; sharp/soft; grating/soothing- sound is not ‘abstract’:

tendency is to describe by source-events

• Ecological perspective

- what matters about sound is ‘what happened’

our percepts express this more-or-less directly

Page 4: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 4

Aside: Sound textures

• What do we hear in:

- a city street- a symphony orchestra

• How do we distinguish:

- waterfall- rainfall- applause- static

Levels

of ecological description...

time / s

freq

/ H

z

Applause04

0 1 2 3 40

1000

2000

3000

4000

5000

time / s

freq

/ H

z

Rain01

0 1 2 3 40

1000

2000

3000

4000

5000

Page 5: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 5

Motivations for modeling

• Describe/classify

- cast sound into model because want to use the resulting parameters

• Store/transmit

- model implicitly exploits limited structure of signal

• Resynthesize/modify

- model separates out interesting parameters

Sound Model parameterspace

Page 6: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 6

Analysis and synthesis

• Analysis is the converse of synthesis:

• Can exist apart:

- analysis for classification- synthesis of artificial sounds

• Often used together:

- encoding/decoding of compressed formats- resynthesis based on analyses-

analysis-by-synthesis

Sound

AnalysisSynthesis

Model / representation

Page 7: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 7

Outline

Music and nonspeech

Music synthesis techniques

- Framework- Historical development

Sinewave synthesis

Music analysis

Transcription

elements?

1

2

3

4

5

Page 8: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 8

Music synthesis techniques

• What is music?

- could be anything

flexible synthesis needed!

• Key elements of conventional music

- instruments

note-events (time, pitch, accent level)

melody, harmony, rhythm- patterns of repetition & variation

• Synthesis framework:

instruments: common framework for many notesscore: sequence of (time, pitch, level) note events

2

Page 9: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 9

The nature of musical instrument notes

• Characterized by instrument (register), note, loudness (emphasis), articulation...

distinguish how?

Time

Fre

quen

cy

Piano

0

1000

2000

3000

4000

0 1 2 3 4 Time

Violin

0

1000

2000

3000

4000

0 1 2 3 4

Time

Fre

quen

cy

Clarinet

0

1000

2000

3000

4000

0 1 2 3 4 Time

Trumpet

0

1000

2000

3000

4000

0 1 2 3 4

Page 10: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 10

Development of music synthesis

• Goals of music synthesis:

- generate realistic / pleasant new notes- control / explore timbre (quality)

• Earliest computer systems in 1960s(voice synthesis, algorithmic)

• Pure synthesis approaches:

- 1970s: Analog synths- 1980s: FM (Stanford/Yamaha)- 1990s: Physical modeling, hybrids

• Analysis-synthesis methods:

- sampling / wavetables- sinusoid modeling- harmonics + noise (+ transients)

others?

Page 11: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 11

Analog synthesis

• The minimum to make an ‘interesting’ sound

• Elements:

- harmonics-rich oscillators- time-varying filters- time-varying envelope- modulation: low frequency + envelope-based

• Result:

- time-varying spectrum, independent pitch

Oscillator Filter

Envelope

PitchTrigger

Vibrato

t

t

f

+

Cutofffreq

GainSound

++

Page 12: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 12

FM synthesis

• Fast freq. modulation

harmonic sidebands:

J

n

(

β

)

is a Bessel function:

Complex harmonic spectra by varying

β

ωct β ωmtsin+( )cos Jn β( ) ω0 nωm+( )cosn ∞–=

∞∑=

0 1 2 3 4 5 6 7 8 9-0.5

0

0.5

1J0

J1 J2 J3 J4Jn(β) ≈ 0 for β < n + 2

modulation index β

time / s

freq

/ H

z

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80

1000

2000

3000

4000

whatuse?

Page 13: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 13

Sampling synthesis

• Resynthesis from real notes

vary pitch, duration, level

• Pitch:

stretch (resample) waveform

• Duration: loop a ‘sustain’ section

• Level: cross-fade different examples

- need to ‘line up’ source samples

-0.2

-0.1

0

0.1

0.2

0 0.1 0.2 time

0 0.002 0.004 0.006 0.008 time / s-0.2

-0.1

0

0.1

0.2

0 0.002 0.004 0.006 0.008 time / s

596 Hz 894 Hz

-0.2

-0.1

0

0.1

0.2

-0.2

-0.1

0

0.1

0.2

-0.2

-0.1

0

0.1

0.2

0 0.1 0.2 0.3 0 0.1 0.2 0.3time / s time / s

0.204 0.2060.174 0.176

-0.2

-0.1

0

0.1

0.2

-0.2

-0.1

0

0.1

0.2

0 0.05 0.1 0.15 0 0.05 0.1 0.15time / s time / s

Soft Loud

veloc

mix

good& bad?

Page 14: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 14

Outline

Music and nonspeech

Music synthesis techniques

Sinewave synthesis (detail)- Sinewave modeling- Sines + residual ...

Music analysis

Transcription

1

2

3

4

5

Page 15: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 15

Sinewave synthesis

• If patterns of harmonics are what matter,why not generate them all explicitly:

- particularly powerful model for pitched signals

• Analysis (as with speech):- find peaks in STFT |S[ω,n]| & track

- or track fundamental ω0 (harmonics / autoco)

& sample STFT at k·ω0

→set of Ak[n] to duplicate tone:

• Synthesis via bank of oscillators

3

s n[ ] Ak n[ ] n k ω⋅ 0⋅( )cosk∑=

0 0.05 0.1 0.15 0.2 time / s time / s0

2000

4000

6000

8000

freq

/ H

z

freq / Hz

mag

00.1

0.2

05000

0

1

2

Page 16: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 16

Steps to sinewave modeling - 1

• The underlying STFT:

What value for N (FFT length & window size )?What value for H (hop size: n0 = r·H, r = 0, 1, 2...)?

• STFT window length determines freq. resol’n:

• Choose N long enough to resolve harmonics→ 2-3x longest (lowest) fundamental period- e.g. 30-60 ms = 480-960 samples @ 16 kHz- choose H ≤ N/2

• N too long → lost time resolution- limits sinusoid amplitude rate of change

X k n0,[ ] x n n0+[ ] w n[ ] j2πkn

N-------------

–exp⋅ ⋅n 0=

N 1–

∑=

Xw ejω( ) X e

jω( ) W ejω( )*=

Page 17: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 17

Steps to sinewave modeling - 2

• Choose candidate sinusoids at each time by picking peaks in each STFT frame:

• Quadratic fit for peak, lin. interp. for phase:

+ linear interp. of unwrapped phase

time / s

freq

/ H

zle

vel /

dB

0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.180

2000

4000

6000

8000

0 1000 2000 3000 4000 5000 6000 7000 freq / Hz-60

-40

-20

0

20

400 600 800 freq / Hz

-20

-10

0

10

20

400 600 800 freq / Hz

-10

-5

0

leve

l / d

B

phas

e / r

ady

x

y = ax(x-b)

b/2

ab2/4

Page 18: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 18

Steps to sinewave modeling - 3

• Which peaks to pick?Want ‘true’ sinusoids, not noise fluctuations- ‘prominence’ threshold above smoothed spec.

• Sinusoids exhibit stability...- of amplitude in time- of phase derivative in time→compare with adjacent time frames to test?

0 1000 2000 3000 4000 5000 6000 7000 freq / Hz-60

-40

-20

0

20

leve

l / d

B

Page 19: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 19

Steps to sinewave modeling - 4

• ‘Grow’ tracks by appending newly-found peaks to existing tracks:

- ambiguous assignments possible

• Unclaimed new peak- ‘birth’ of new track- backtrack to find earliest trace?

• No continuation peak for existing track- ‘death’ of track- or: reduce peak threshold for hysteresis

timefr

eq

death

birth

existingtracks

newpeaks

Page 20: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 20

Resynthesis of sinewave models

• After analysis, each track defines contours in frequency, amplitude fk[n], Ak[n] (+ phase?)

- use to drive a sinewave oscillators & sum up

• ‘Regularize’ to exactly harmonic fk[n] = k·f0[n]

0 0.05 0.1 0.15 0.2500

600

7000

1

2

3

0 0.05 0.1 0.15 0.2time / s

time / sfreq

/ Hz

leve

l

-3

-2

-1

0

1

2

3

Ak[n]·cos(2πfk[n]·t)

fk[n]

Ak[n]

n

0 0.05 0.1 0.15 0.20

2000

4000

6000

0 0.05 0.1 0.15 0.2550

600

650

700

time / s time / s

freq

/ H

z

freq

/ H

z

whatto do?

Page 21: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 21

Modification in sinewave resynthesis

• Change duration by warping timebase- may want to keep onset unwarped

• Change pitch by scaling frequencies- either stretching or resampling envelope

• Change timbre by interpolating params

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.50

1000

2000

3000

4000

5000

time / s

freq

/ H

z

0 1000 2000 3000 40000

10

20

30

40

freq / Hz

leve

l / d

B

0 1000 2000 3000 40000

10

20

30

40

freq / Hz

leve

l / d

B

Page 22: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 22

Sinusoids + residual

• Only ‘prominent peaks’ became tracks- remainder of spectral energy was noisy?→ model residual energy with noise!

• How to obtain ‘non-harmonic’ spectrum?- zero-out spectrum near extracted peaks?- or: resynthesize (exactly) & subtract waveforms

.. must preserve phase!

• Can model residual signal with LPC→flexible representation of noisy residual

es n[ ] s n[ ] Ak n[ ] 2πn f k n[ ]⋅( )cosk∑–=

0 1000 2000 3000 4000 5000 6000 7000 freq / Hz

mag

/ dB

-80

-60

-40

-20

0

20

original

sinusoids

residualLPC

Page 23: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 23

Sinusoids + noise + transients

• Sound represented as sinusoids and noise:

Parameters are {Ak[n], fk[n]}, hn[n]

• Separate out abrupt transients in residual?

- more specific → more flexible

s n[ ] Ak n[ ] 2πn f k n[ ]⋅( )cosk∑ hn n[ ] b n[ ]*+=

Sinusoids Residual es n[ ]

time / s

freq

/ H

z

0 0.2 0.4 0.60

2000

4000

6000

8000

2000

4000

6000

8000

0 0.2 0.4 0.60

2000

4000

6000

8000

{Ak[n], fk[n]}

hn[n]

es n[ ] tk n[ ]k∑ hn n[ ] b n[ ]*+=

Page 24: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 24

Outline

Music and nonspeech

Music synthesis techniques

Sinewave synthesis

Music analysis- Instrument identification- Pitch tracking

Transcription

1

2

3

4

5

Page 25: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 25

Music analysis

• What might we want to get out of music?

• Instrument identification- different levels of specificity- ‘registers’ within instruments

• Score recovery- transcribe the note sequence- extract the ‘performance’

• Ensemble performance- ‘gestalts’: chords, tone colors

• Broader timescales- phrasing & musical structure- artist / genre clustering and classification

4

Page 26: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 26

Instrument identification

• Research looks for perceptual ‘timbre space’

• Cues to instrument identification- onset (rise time), sustain (brightness)

• Hierarchy of instrument families- strings / reeds / brass- optimize features at each level

bright

dull

low fluxhi flux

low attack

hi attack

procedure?

Page 27: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 27

Pitch tracking• Fundamental frequency ( → pitch)

is a key attribute of musical sounds→pitch tracking as a key technology

• Pitch tracking for speech- voice pitch & spectrum highly dynamic- speech is voiced and unvoiced ground truth?

• Applications- voice coders (excitation description)- harmonic modeling

Page 28: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 28

Pitch tracking for music

• Pitch in music- pitch is more stable (although vibrato)- but: multiple pitches

• Applications- harmonic modeling- music transcription (→ storage, resynthesis)- source separation

• Approaches: “place” & “time”

Time

Fre

quen

cy

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 50

1000

2000

3000

4000

??

Page 29: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 29

Meddis & Hewitt pitch model

• Autocorrelation (time) based pitch extraction- fundamental period → peak(s) in autocorrelation

• Compute separately in each frequency band& ‘summarize’ across (perceptual) channels

x t( ) x t T+( )≈ r xx T( ) x t( )x t T+( )∫= max≈→

0 20 40 60 80 100-0.2-0.1

00.10.2

0 20 40 60 80 100-10123

time / samples lag / samples

Waveform x[n] Autocorrelation rxx[l]

80

328

866

1924

4000CF / Hz Autocorrelogram

2.5 5.0 7.5 10.0 12.50

1

lag / ms

Ban

dpas

sfil

ters

Rec

tific

atio

n &

low

-pas

s fil

ter

Periodicity detection

Cross-channel sum

sound

Summary ACG

Page 30: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 30

Tolonen & Karjalainen simplification• Multiple frequency channels can have different

pitches dominant...

• But equalizing (flattening) the spectrum works:

→ Summary AC as a function of time:

- ‘Enhancement’ = cancel subharmonics

Pre-whiteningsound

Highpass@ 1kHz

Rectify &low-pass

Lowpass@ 1kHz

Periodicitydetection

SACFenhance

ESACFPeriodicitydetection

+

time/s

50

100

200

400

1000f/Hz Periodogram for M/F voice mix

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8Summary autocorrelation at t=0.775 s

0.001 0.002 0.003 0.004 0.006 0.01 0.02 lag/s

200 Hz(0.005s)

125 Hz(0.008s)

lag vs.freq?

Page 31: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 31

Post-processing of pitch tracks

• Remove outliers with median filtering

• Octave errors are common:- if x(t) ≈ x(t + T ) then x(t) ≈ x(t + 2T ) etc.

→ dynamic programming/HMM

• Validity- “is there a pitch at this time?”- voiced/unvoiced decision for speech

• Event detection- when does a pitch slide indicate a new note?

time

5-pt median

Page 32: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 32

Music and nonspeech

Music synthesis techniques

Sinewave synthesis

Music analysis

Transcription- Bottom-up and top-down- Transcription from sinewave models

1

2

3

4

5

Page 33: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 33

Transcription

• Basic idea: Recover the score

• Is it possible? Why is it hard?- music students do it

... but they are highly trained; know the rules

• Motivations- for study: what was played?- highly compressed representation (e.g. MIDI)- the ultimate restoration system...

5

Time

Fre

quen

cy

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 50

1000

2000

3000

4000

Page 34: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 34

Transcription framework

• Recover discrete events to explain signal

- analysis-by-synthesis?

• Exhaustive search?- would be possible given exact note waveforms - .. or just a 2-dimensional ‘note’ template?

but superposition is not linear in |STFT| space

• Inference depends on all detected notes- is this evidence ‘available’ or ‘used’?- full solution is exponentially complex

Note events{tk, pk, ik}

ObservationsX[k,n]synthesis

?

notetemplate

2-Dconvolution

Page 35: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 35

Bottom-up versus top-down

• Bottom-up: observ’n directly gives description- e.g. peaks in 2-D convolution- but: few domains are that ‘linear’

• Top-down: pursue & confirm hypotheses- e.g. analysis-by-resynthesis matching- but: need to limit search space

• Generally, need to do both:

- bottom-up guides & limits search- top-down resolves ambiguities in low-levelhow to transcribe?

Abstractconstraints

guidance

tests

Rawobservations

Data-drivenanalyses

Hypothesissearch

Page 36: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 36

Transcription from sinewave models

• Form sinusoid model- as with synthesis, but signal

is more complex

• Break tracks- need to detect new ‘onset’

at single frequencies

• Group by onset & common harmonicity- find sets of tracks that start

around the same time- + stable harmonic pattern

• Pass on to constraint-based filtering...

time / s

freq

/ H

z

0 1 2 3 40

500

1000

1500

2000

2500

3000

bu/td? mistakes?

0 0.5 1 1.5 time / s0

0.020.040.06

Page 37: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 37

Problems for transcription

• Music is practically worst case!- note events are often synchronized

→ defeats common onset- notes have harmonic relations (2:3 etc.)

→ collision/interference between harmonics- variety of instruments, techniques, ...

• Listeners are very sensitive to certain errors- .. and impervious to others

• Apply further constraints- like our ‘music student’- maybe even the whole score (Scheirer)!

Page 38: 5 4 3 2 1 Music analysis Sinewave synthesis nonspeech Lecture 7dpwe/classes/e6820-2001-01/lecnotes/... · 2001-03-27 · E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 1 EE E6820:

E6820 SAPR - Music A & S - Dan Ellis 2001-03-06 - 38

Summary

• ‘Nonspeech audio’- i.e. sound in general- characteristics: ecological

• Music synthesis- control of pitch, duration, loudness, articulation- evolution of techniques- sinusoids + noise + transients

• Music analysis- different aspects: instruments, pitches,

performance- transcription complications:

representation, octaves, onsets, ...- rely on high-level structural constraints

and beyond?