![Page 1: Primitive probabilistic auditory scene analysis - UCLturner/PublicTalks/MachineAudition2010/MacAu...Primitive probabilistic auditory scene analysis Richard E. Turner (ret26@cam.ac.uk)](https://reader033.vdocuments.site/reader033/viewer/2022051509/5af8de3a7f8b9a5f588d5825/html5/thumbnails/1.jpg)
Primitive probabilistic auditory scene analysis
Richard E. Turner (ret26@cam.ac.uk)
Computational and Biological Learning Lab, University of Cambridge
Maneesh Sahani
Gatsby Computational Neuroscience Unit, UCL, London
Motivation
Levels: object, parts of objects, structural primitives, sensory input
Visual example: trees, bark, edges, photoreceptor activities
Auditory example: bird song, motif, AM tones and noise, auditory filter activities
Auditory Scene Analysis
Levels: object, parts of objects, structural primitives, sensory input
Visual example: trees, bark, edges, photoreceptor activities
Auditory example: bird song, motif, AM tones and noise, auditory filter activities
Grouping stages: superschema-based grouping, schema-based grouping, primitive grouping
Generative model: Probabilistic Auditory Scene Synthesis
Visual example: trees, bark, edges, photoreceptor activities
Auditory example: bird song, motif, AM tones and noise, auditory filter activities
Grouping stages: superschema-based grouping, schema-based grouping, primitive grouping
Generative distributions: p(source), p(parts|source), p(primitive|part), p(sound|primitives)
Recognition model: Probabilistic Auditory Scene Analysis
Auditory example: bird song, motif, AM tones and noise, auditory filter activities
Grouping stages: superschema-based grouping, schema-based grouping, primitive grouping
Generative distributions: p(source), p(parts|source), p(primitive|part), p(sound|primitives)
Recognition distributions: p(source|sound), p(part|sound), p(primitive|sound)
Input: sound waveform
Primitive Probabilistic Auditory Scene Analysis
Part 1: Probabilistic generative model: primitive auditory scene synthesis
What are the important low-level statistics of natural sounds?
Part 2: Probabilistic recognition model: primitive auditory scene analysis
Provocative computational theory: Grouping rules arise from inferences based on the statistics of natural sounds.
Heuristic Analysis: Fire sound
[Figure: fire sound waveform y and its sub-bands at 4.1, 2.4, 1 and 0.4 kHz against time /s (0.5 to 0.62 s), shown first after filtering and then after filtering and demodulation.]
Heuristic Analysis: Water
Heuristic Analysis: Rain
Heuristic Analysis: Speech
Summary
sound → filter bank → demodulate → transformed envelopes
• Important statistics include
– energy in sub-bands (power-spectrum)
– patterns of co-modulation (local power-sharing)
– time-scale of the modulation
– depth of the modulation (sparsity)
• Josh’s texture synthesis work tomorrow
• Formulate a probabilistic model to capture these statistics:
sound ← modulate carriers ← modulators ← transformed envelopes
Structural Primitives = Co-modulated narrow-band processes
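The heuristic pipeline above (filter bank, then demodulation, then envelope statistics) can be sketched as follows. This is a minimal illustration and not the analysis used in the talk: the brick-wall FFT filters, the Hilbert-envelope demodulation, and the names `subband_envelopes`, `envelope_statistics` and `rel_bandwidth` are all assumptions made here.

```python
import numpy as np

def subband_envelopes(y, fs, centres_hz, rel_bandwidth=0.3):
    """Band-pass y around each centre frequency and return Hilbert envelopes.

    Filtering and demodulation are done together in the frequency domain:
    keeping only the positive frequencies inside a band gives the analytic
    sub-band signal, whose magnitude is the envelope.
    """
    n = len(y)
    spectrum = np.fft.fft(y)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    envelopes = []
    for fc in centres_hz:
        half_width = 0.5 * rel_bandwidth * fc
        # Ideal (brick-wall) band on the positive-frequency side only.
        mask = (freqs >= fc - half_width) & (freqs <= fc + half_width)
        analytic = np.fft.ifft(2.0 * spectrum * mask)
        envelopes.append(np.abs(analytic))
    return np.array(envelopes)

def envelope_statistics(envelopes):
    """Three of the listed statistics, one row of `envelopes` per sub-band."""
    # Sub-band power: a narrow-band signal has mean square 0.5 * mean(env^2).
    power = 0.5 * np.mean(envelopes ** 2, axis=1)
    # Co-modulation pattern: correlation between envelopes of different bands.
    comodulation = np.corrcoef(envelopes)
    # Modulation depth: envelope standard deviation relative to its mean.
    depth = envelopes.std(axis=1) / envelopes.mean(axis=1)
    return power, comodulation, depth
```

The modulation time-scale is not computed here; one option would be the lag at which each envelope's autocorrelation decays.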
Model Schematic
[Figure: the waveform $y_t$ (time axis 0 to 0.1) is built as $y_t = a_{1,t} c_{1,t} + \dots + a_{D,t} c_{D,t}$, a sum of carriers $c_{d,t}$ multiplied by envelopes $a_{d,t}$; the final build adds the transformed envelopes $x_{1,t}, \dots, x_{D,t}$ beneath the envelopes.]
Probabilistic Model
• Transformed envelopes: slow and +/–
$$p(x_{e,1:T} \mid \Gamma_{e,1:T,1:T}) = \mathrm{Norm}\left(x_{e,1:T};\, 0,\, \Gamma_{e,1:T,1:T}\right)$$
$$\Gamma_{e,t-t'} = \sigma_e^2 \exp\left(-\frac{1}{2\tau_e^2}(t-t')^2\right)$$
• Envelopes: positive mixture of transformed envelopes with controllable sparsity
$$a_{d,t} = a\left(\sum_{e=1}^{E} g_{d,e}\, x_{e,t} + \mu_d\right), \quad \text{e.g. } a(z) = \log(1 + \exp(z)).$$
• Carriers: band-limited Gaussian noise
$$p(c_{d,t} \mid c_{d,t-1:t-2}, \theta) = \mathrm{Norm}\left(c_{d,t};\, \sum_{t'=1}^{2} \lambda_{d,t'}\, c_{d,t-t'},\, \sigma_d^2\right)$$
$$y_t = \sum_{d=1}^{D} c_{d,t}\, a_{d,t} + \sigma_y \epsilon_t$$
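As a rough illustration of the model above, the sketch below draws one sound by ancestral sampling: Gaussian-process transformed envelopes, softplus envelopes, AR(2) carriers, then the noisy sum. The function and argument names (`sample_sound`, `sig2_x` for $\sigma_e^2$, `sig2_c` for $\sigma_d^2$), the small jitter added to the covariance, and the zero start-up history of the carriers are assumptions made here, not details from the slides.

```python
import numpy as np

def softplus(z):
    """a(z) = log(1 + exp(z)), computed stably."""
    return np.logaddexp(0.0, z)

def sample_sound(T, G, mu, tau, sig2_x, lam, sig2_c, sig_y, rng):
    """Draw y_{1:T} from the model.

    G: (D, E) weights g_{d,e}; mu: (D,) offsets; tau, sig2_x: (E,) time-scales
    and variances of the transformed envelopes; lam: (D, 2) AR(2) coefficients;
    sig2_c: (D,) carrier innovation variances; sig_y: observation noise std.
    """
    D, E = G.shape
    t = np.arange(T)
    sq_dist = (t[:, None] - t[None, :]) ** 2
    x = np.empty((E, T))
    for e in range(E):
        Gamma = sig2_x[e] * np.exp(-0.5 * sq_dist / tau[e] ** 2)
        # Jitter keeps the squared-exponential covariance numerically
        # positive definite (an implementation convenience, assumed here).
        chol = np.linalg.cholesky(Gamma + 1e-6 * sig2_x[e] * np.eye(T))
        x[e] = chol @ rng.standard_normal(T)
    a = softplus(G @ x + mu[:, None])            # positive envelopes a_{d,t}
    c = np.zeros((D, T + 2))                     # two zero samples of history
    for n in range(2, T + 2):
        c[:, n] = (lam[:, 0] * c[:, n - 1] + lam[:, 1] * c[:, n - 2]
                   + np.sqrt(sig2_c) * rng.standard_normal(D))
    c = c[:, 2:]
    y = np.sum(c * a, axis=0) + sig_y * rng.standard_normal(T)
    return y, a, c, x
```

Building the full T by T covariance costs O(T^3), so this form only suits short excerpts; it is meant to show the structure of the model rather than to be efficient.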
What do the parameters control?
$\lambda_d$ control the centre frequency and bandwidth of each sub-band
$\sigma_d^2$ control the power in each sub-band
$\tau_e$ controls time-scale of modulators
$\mu_d$ and $\sigma_e^2$ control the modulation depth, skew and the sparsity
$g_{d,e}$ control the patterns of modulation across sub-bands
[Figure: bandwidth plotted against centre frequency.]
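To make the first two statements concrete, the sketch below maps a centre frequency and bandwidth to AR(2) coefficients and back, using the standard two-pole resonator relations. The function names, the use of normalised frequency (cycles per sample), and the narrow-band approximation r = exp(-pi * bandwidth) are assumptions made here; the pole angle is close to, but not exactly, the spectral peak.

```python
import numpy as np

def ar2_from_resonance(centre, bandwidth):
    """AR(2) coefficients for a pole pair at radius r and angle 2*pi*centre.

    centre and bandwidth are in cycles per sample; r = exp(-pi * bandwidth)
    is the usual narrow-band approximation to the 3 dB bandwidth.
    """
    r = np.exp(-np.pi * bandwidth)
    return np.array([2.0 * r * np.cos(2.0 * np.pi * centre), -r ** 2])

def resonance_from_ar2(lam):
    """Invert the mapping; requires complex poles, i.e. lam[1] < 0."""
    r = np.sqrt(-lam[1])
    centre = np.arccos(lam[0] / (2.0 * r)) / (2.0 * np.pi)
    bandwidth = -np.log(r) / np.pi
    return centre, bandwidth

def ar2_spectrum(lam, freqs, sig2=1.0):
    """Power spectrum of the AR(2) carrier; sig2 scales the overall power."""
    w = np.exp(-2j * np.pi * np.asarray(freqs))
    return sig2 / np.abs(1.0 - lam[0] * w - lam[1] * w ** 2) ** 2
```

Doubling `sig2` doubles the spectrum at every frequency, which is the sense in which the innovation variance sets the power in a sub-band.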
Intuitions for role of model parameters: Generation
Generation: Increasing sparsity
Generation: Increasing co-modulation
Generation: Decreasing time-scale
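One way to build intuition for the sparsity control is to look at a single envelope sample in isolation. With the soft-plus nonlinearity, the input to a(z) is Gaussian with mean $\mu_d$, so lowering the offset pushes most of the mass towards zero and leaves occasional large excursions. The sketch below measures this with the coefficient of variation; the function name `envelope_cv` and the choice of that measure are assumptions made here, not something taken from the slides.

```python
import numpy as np

def softplus(z):
    """a(z) = log(1 + exp(z)), computed stably."""
    return np.logaddexp(0.0, z)

def envelope_cv(mu, sigma=1.0, n=200_000, seed=0):
    """Coefficient of variation (std / mean) of a = softplus(z), z ~ N(mu, sigma^2).

    A larger value indicates a sparser, more deeply modulated envelope.
    """
    rng = np.random.default_rng(seed)
    a = softplus(mu + sigma * rng.standard_normal(n))
    return a.std() / a.mean()

# Sweep the offset from strongly positive (shallow modulation) to strongly
# negative (sparse envelope).
cv_by_offset = {mu: envelope_cv(mu) for mu in (3.0, 0.0, -3.0)}
```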
Inference and Learning
• Key Observation: Fix envelopes, posterior over carriers is Gaussian.
• Leads to MAP estimation of the envelopes (like HMMs, ICA, NMF)
$$X_{\mathrm{MAP}} = \arg\max_{X}\, p(X \mid Y)$$
$$p(X \mid Y) = \frac{1}{Z}\, p(X, Y) = \frac{1}{Z} \int dC\, p(X, Y, C) = \frac{1}{Z}\, p(X) \int dC\, p(Y \mid A, C)\, p(C)$$
• Compute integral efficiently using Kalman Smoothing
• Leads to gradient based optimisation for transformed amplitudes
• Learning: approximate Maximum Likelihood $\theta = \arg\max_{\theta}\, p(Y \mid \theta)$
• See Turner and Sahani, ICASSP 2010.
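The integral over carriers can be sketched with a Kalman filter: once the envelopes are fixed, the model is a linear Gaussian state-space model with a time-varying observation row. The code below is a minimal illustration and not the implementation from the cited paper. It runs only the forward (filtering) pass, which is enough for the log of the integral; a smoothing pass would additionally return the carrier posterior. The function name and the unit-covariance prior on the initial carrier state are assumptions made here.

```python
import numpy as np

def carrier_marginal_loglik(y, a, lam, sig2_c, sig_y):
    """log of the integral over C of p(y | a, C) p(C), for AR(2) carriers.

    y: (T,) waveform; a: (D, T) fixed envelopes; lam: (D, 2) AR(2)
    coefficients; sig2_c: (D,) innovation variances; sig_y: noise std.
    """
    D, T = a.shape
    # The state stacks [c_{d,t}, c_{d,t-1}] for every sub-band d.
    F = np.zeros((2 * D, 2 * D))
    Q = np.zeros((2 * D, 2 * D))
    for d in range(D):
        F[2 * d, 2 * d:2 * d + 2] = lam[d]   # AR(2) recursion for c_{d,t}
        F[2 * d + 1, 2 * d] = 1.0            # shift c_{d,t-1} into slot two
        Q[2 * d, 2 * d] = sig2_c[d]
    m = np.zeros(2 * D)        # assumed prior mean before the first sample
    P = np.eye(2 * D)          # assumed prior covariance (hypothetical choice)
    loglik = 0.0
    for t in range(T):
        # Predict the carrier state one step ahead.
        m = F @ m
        P = F @ P @ F.T + Q
        # Observation row: y_t = sum_d a_{d,t} c_{d,t} + sig_y * eps_t.
        h = np.zeros(2 * D)
        h[0::2] = a[:, t]
        s = h @ P @ h + sig_y ** 2           # innovation variance
        r = y[t] - h @ m                     # innovation
        loglik += -0.5 * (np.log(2.0 * np.pi * s) + r ** 2 / s)
        k = P @ h / s                        # Kalman gain
        m = m + k * r
        P = P - np.outer(k, h @ P)
    return loglik
```

Adding log p(X) to this quantity gives the MAP objective up to the constant Z. Differentiating it with respect to the transformed envelopes is what a gradient-based optimiser would need, for example through automatic differentiation.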
Synthetic Sounds
Stream Stream Stream Wind Wind Fire Fire Rain Twigs Speech
Orig Orig Orig Orig Orig Orig Orig Orig Orig Orig
Model Model Model Model Model Model Model Model Model Model
Part 2: Probabilistic recognition model: primitive auditory scene analysis
Provocative computational theory: Grouping rules arise from inferences based on the statistics of natural sounds.
Introduction to the psychophysics
• Probe model with stimuli used to study psychophysics of auditory scene analysis
• Fix carriers to be ‘auditory filter’ like
• Assume perceptual access to:
– instantaneous frequency of the carriers (not the phase)
– modulators
How does the model ‘hear’ these stimuli?
Continuity Illusion
$$y_t = a_{1,t}\, c_{1,t} + a_{2,t}\, c_{2,t}$$
[Figure: stimulus signal and frequency /Hz against time /s (0 to 1.5 s); the final build adds the components $y_{\mathrm{tone}}$ and $y_{\mathrm{noise}}$.]
Common Amplitude Modulation
$$y_t = (a_{1,t} + a_{3,t})\, c_{1,t} + (a_{2,t} + a_{3,t})\, c_{2,t}$$
[Figure: signal and frequency /Hz (100 to 200 Hz) against time /s (0 to 4 s).]
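The decomposition above can be illustrated by building a toy stimulus directly from the equation: two carriers, each with a private modulator, plus one modulator shared by both. Everything numerical here (sinusoidal carriers and modulators, the frequencies, depths and sample rate, and the name `common_am_stimulus`) is a hypothetical choice for illustration and is not taken from the slides.

```python
import numpy as np

def common_am_stimulus(fs=2000, duration=4.0, f1=120.0, f2=180.0):
    """Return t, y and the two total envelopes for
    y_t = (a1 + a3) * c1 + (a2 + a3) * c2."""
    t = np.arange(int(fs * duration)) / fs
    c1 = np.sin(2 * np.pi * f1 * t)                  # carrier 1 (hypothetical)
    c2 = np.sin(2 * np.pi * f2 * t)                  # carrier 2 (hypothetical)
    a1 = 0.2 * (1 + np.sin(2 * np.pi * 0.5 * t))     # private to carrier 1
    a2 = 0.2 * (1 + np.cos(2 * np.pi * 0.75 * t))    # private to carrier 2
    a3 = 0.5 * (1 + np.sin(2 * np.pi * 2.0 * t))     # shared by both carriers
    env1, env2 = a1 + a3, a2 + a3
    y = env1 * c1 + env2 * c2
    return t, y, env1, env2

t, y, env1, env2 = common_am_stimulus()
# Correlation between the two total envelopes; the shared term a3 drives it.
shared_correlation = np.corrcoef(env1, env2)[0, 1]
```

Because the shared modulator dominates both total envelopes, they are strongly correlated, which is the cue a co-modulation model can use to attribute the two carriers to one source.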
Old Plus New Heuristic
$$y_t = a_{1,t}(c_{1,t} + c_{2,t} + c_{3,t}) + a_{2,t}(c_{2,t} + c_{4,t}) + a_{3,t}\, c_{2,t} + a_{4,t}(c_{1,t} + c_{3,t})$$
[Figure: signal and frequency /Hz (0 to 400 Hz) against time /s (0 to 4 s); the final build adds the modulators $a_1$ to $a_4$.]
Comodulation Masking Release
$$y_t = a_{1,t}\, c_{1,t} + a_{2,t}\, c_{2,t}$$
[Figure: two stimuli (signal 1, signal 2), each shown as a waveform and as frequency /Hz (80 to 120 Hz); the final build adds the components $y_{\mathrm{noise}}$ and $y_{\mathrm{tone}}$ against time /s (0 to 2 s).]
![Page 68: Primitive probabilistic auditory scene analysis - UCLturner/PublicTalks/MachineAudition2010/MacAu...Primitive probabilistic auditory scene analysis Richard E. Turner (ret26@cam.ac.uk)](https://reader033.vdocuments.site/reader033/viewer/2022051509/5af8de3a7f8b9a5f588d5825/html5/thumbnails/68.jpg)
Good Continuation
$y_t = a_{1,t}c_{1,t} + a_{2,t}c_{2,t}$
Figure panels: signal 1 with frequency /Hz (0 to 100), signal 2 with frequency /Hz (80 to 160), against time /s (0 to 5).
![Page 70: Primitive probabilistic auditory scene analysis - UCLturner/PublicTalks/MachineAudition2010/MacAu...Primitive probabilistic auditory scene analysis Richard E. Turner (ret26@cam.ac.uk)](https://reader033.vdocuments.site/reader033/viewer/2022051509/5af8de3a7f8b9a5f588d5825/html5/thumbnails/70.jpg)
Good Continuation
$y_t = a_{1,t}c_{1,t} + a_{2,t}c_{2,t}$
Figure panels: signal 1 with frequency /Hz (0 to 100), signal 2 with frequency /Hz (160 to 320), against time /s (0 to 5).
![Page 71: Primitive probabilistic auditory scene analysis - UCLturner/PublicTalks/MachineAudition2010/MacAu...Primitive probabilistic auditory scene analysis Richard E. Turner (ret26@cam.ac.uk)](https://reader033.vdocuments.site/reader033/viewer/2022051509/5af8de3a7f8b9a5f588d5825/html5/thumbnails/71.jpg)
Proximity
$y_t = a_{1,t}c_{1,t} + a_{2,t}c_{2,t}$
Figure panels: signal 1 (−1 to 1) with frequency /Hz (0 to 100), signal 2 (−1 to 1) with frequency /Hz (0 to 100), against time /s (0 to 5).
![Page 74: Primitive probabilistic auditory scene analysis - UCLturner/PublicTalks/MachineAudition2010/MacAu...Primitive probabilistic auditory scene analysis Richard E. Turner (ret26@cam.ac.uk)](https://reader033.vdocuments.site/reader033/viewer/2022051509/5af8de3a7f8b9a5f588d5825/html5/thumbnails/74.jpg)
Conclusions
• Developed a model for natural sounds comprising quickly varying carriers and slowly varying modulators
• Inspired by natural sound statistics
• Inference replicates many of the characteristics of auditory scene analysis
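A model of this kind can be illustrated by drawing a sample from it. The sketch below is only a stand-in under stated assumptions: the modulator is taken to be the softplus of a slow AR(1) process and the carrier an AR(2) resonator driven by white noise. Those specific forms, every parameter value, and the function names `sample_modulator`, `sample_carrier` and `sample_sound` are assumptions made here for illustration, not details given on the slides.

```python
import math
import random

def softplus(x):
    # Numerically stable log(1 + exp(x)); maps a real value to a positive one.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sample_modulator(n, smooth=0.999, rng=random):
    """Slowly varying positive envelope: softplus of an AR(1) process (assumed)."""
    scale = math.sqrt(1.0 - smooth ** 2)  # keeps the stationary variance at 1
    x, out = 0.0, []
    for _ in range(n):
        x = smooth * x + scale * rng.gauss(0.0, 1.0)
        out.append(softplus(x))
    return out

def sample_carrier(n, freq_hz, fs, radius=0.99, rng=random):
    """Quickly varying carrier: AR(2) resonator centred near freq_hz (assumed)."""
    w = 2.0 * math.pi * freq_hz / fs
    b1, b2 = 2.0 * radius * math.cos(w), -radius ** 2
    c1 = c2 = 0.0
    out = []
    for _ in range(n):
        c = b1 * c1 + b2 * c2 + rng.gauss(0.0, 1.0)
        out.append(c)
        c1, c2 = c, c1  # shift the two-sample history
    return out

def sample_sound(n, freqs_hz, fs, seed=0):
    """y_t = sum_d a_{d,t} * c_{d,t} with one modulator per carrier."""
    rng = random.Random(seed)
    y = [0.0] * n
    for f in freqs_hz:
        a = sample_modulator(n, rng=rng)
        c = sample_carrier(n, f, fs, rng=rng)
        for t in range(n):
            y[t] += a[t] * c[t]
    return y

# Hypothetical usage: one second of a two-component sound at 8 kHz.
sound = sample_sound(8000, [400.0, 1200.0], 8000.0, seed=1)
```

Pushing `smooth` closer to 1 slows the envelope further, while `radius` sets how narrow each carrier's band is.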
Future work
• Train on a large corpus of natural sounds: quantitatively predict psychophysical results
• Trading between grouping principles learned from natural sounds
• Psychophysics: prediction of grouping on new stimuli
• Experimental stimulus generation: samples from the model are controlled, but natural sounding
![Page 75: Primitive probabilistic auditory scene analysis - UCLturner/PublicTalks/MachineAudition2010/MacAu...Primitive probabilistic auditory scene analysis Richard E. Turner (ret26@cam.ac.uk)](https://reader033.vdocuments.site/reader033/viewer/2022051509/5af8de3a7f8b9a5f588d5825/html5/thumbnails/75.jpg)
• Model extensions: hierarchical (cascades of modulators), asymmetries, more general filter shapes
![Page 76: Primitive probabilistic auditory scene analysis - UCLturner/PublicTalks/MachineAudition2010/MacAu...Primitive probabilistic auditory scene analysis Richard E. Turner (ret26@cam.ac.uk)](https://reader033.vdocuments.site/reader033/viewer/2022051509/5af8de3a7f8b9a5f588d5825/html5/thumbnails/76.jpg)
Additional Slides
![Page 77: Primitive probabilistic auditory scene analysis - UCLturner/PublicTalks/MachineAudition2010/MacAu...Primitive probabilistic auditory scene analysis Richard E. Turner (ret26@cam.ac.uk)](https://reader033.vdocuments.site/reader033/viewer/2022051509/5af8de3a7f8b9a5f588d5825/html5/thumbnails/77.jpg)
Inference with fixed amplitudes, $a_{d,t} = 1$
Figure panels: carrier spectra against frequency /Hz (700 to 1600); filter 1, filter 2, filter 3 and y against time /s (5.35 to 5.4).
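As a worked step, and assuming the general model is the sum over $D$ modulated carriers that the two-source equations on the earlier slides instantiate ($D$ is a symbol introduced here), fixing the amplitudes removes the modulators from the observation equation:

```latex
y_t = \sum_{d=1}^{D} a_{d,t}\, c_{d,t}
\quad\xrightarrow{\;a_{d,t} = 1\;}\quad
y_t = \sum_{d=1}^{D} c_{d,t}
```

Under that assumption, inference reduces to splitting $y_t$ into its carriers alone.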