04 efficient auditory coding
-
7/31/2019 04 Efficient Auditory Coding
Efficient auditory coding
Evan Smith & Michael Lewicki (2006)
Presented by Yuanliang Meng
-
Auditory system
-
Basilar membrane in the cochlea
A stiff structural element that separates two liquid-filled tubes that run along the coil of the cochlea.
Frequency dispersion: sound input of a given frequency vibrates certain locations of the basilar membrane more than other locations.
-
Basilar membrane impulse response
-
Optimal coding of an acoustic waveform
Goal: Predict the optimal transformation of the acoustic waveform from the statistics of the environment.
Find an efficient representation.
-
Block-based representations: do not yield time-relative codes
-
Problems of block-based representations:
They obscure transients and periodicities.
Small time shifts can produce large changes in the representation, e.g. consonants in speech, the onset of a gunshot.
Coding can be optimal only within a block.
Intuitively, we may want to increase the block rate to reduce shift sensitivity. This ultimately leads to a continuous filterbank.
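The shift sensitivity of a block code can be illustrated with a small sketch. It is only a toy demonstration, not taken from the article: the block size, sampling rate, and click-like burst are all assumed values, and the block code here is simply the FFT of non-overlapping blocks.

```python
import numpy as np

def block_fft_code(signal, block_size):
    """Cut the signal into non-overlapping blocks and return each block's FFT coefficients."""
    n_blocks = len(signal) // block_size
    blocks = signal[: n_blocks * block_size].reshape(n_blocks, block_size)
    return np.fft.rfft(blocks, axis=1)

fs = 16000          # assumed sampling rate (Hz)
block_size = 256    # assumed block length (samples)

# A short decaying 3 kHz burst (a stand-in for a transient), placed near the end of block 0.
t = np.arange(32) / fs
burst = np.exp(-2000 * t) * np.sin(2 * np.pi * 3000 * t)
signal = np.zeros(4 * block_size)
onset = block_size - 40
signal[onset : onset + len(burst)] = burst

# The same signal delayed by 20 samples (1.25 ms); the burst now straddles a block boundary.
shifted = np.roll(signal, 20)

code_a = block_fft_code(signal, block_size)
code_b = block_fft_code(shifted, block_size)

# Relative change of the code caused by the small time shift.
relative_change = np.linalg.norm(code_a - code_b) / np.linalg.norm(code_a)
print(f"relative change in block code: {relative_change:.2f}")
```

The waveform is identical up to a 1.25 ms delay, yet the block coefficients change by an amount comparable to the size of the code itself.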
-
Convolution representations
The representation is shift-invariant, but it does not reduce the information rate. It is therefore a highly inefficient code.
-
A sparse, shiftable kernel representation
The signal is decomposed in terms of discrete acoustic events, represented by the kernel functions $\phi_m$.
Each kernel function has a precise amplitude $s_{m,i}$ and temporal position $\tau_{m,i}$.
The kernels could be assumed to be any functions, such as gammatones; but they can also be learned from data.
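As a rough sketch of this representation, a signal can be synthesized by summing scaled, shifted kernels. The gammatone parameters below (fourth order, ERB-based bandwidth, 20 ms duration), the sampling rate, and the example spike list are assumptions chosen for illustration; the names `gammatone` and `synthesize` are hypothetical helpers, not from the article.

```python
import numpy as np

def gammatone(center_freq, fs, duration=0.02, order=4):
    """Unit-norm gammatone kernel; bandwidth from the ERB formula (an assumed choice)."""
    t = np.arange(int(round(duration * fs))) / fs
    erb = 24.7 * (4.37 * center_freq / 1000.0 + 1.0)
    b = 1.019 * erb
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * center_freq * t)
    return g / np.linalg.norm(g)

def synthesize(spikes, kernels, n_samples):
    """Sum s * kernel_m placed at sample tau, for each spike (m, tau, s)."""
    x = np.zeros(n_samples)
    for m, tau, s in spikes:
        k = kernels[m]
        end = min(tau + len(k), n_samples)   # truncate kernels that run past the end
        x[tau:end] += s * k[: end - tau]
    return x

fs = 16000  # assumed sampling rate (Hz)
kernels = [gammatone(f, fs) for f in (500.0, 1000.0, 2000.0)]

# Each spike: (kernel index m, temporal position tau in samples, amplitude s).
spikes = [(0, 100, 1.0), (2, 500, 0.5), (1, 900, -0.8)]
x = synthesize(spikes, kernels, n_samples=2000)
```

Three numbers per spike (kernel index, time, amplitude) describe the whole waveform here, which is what makes the code sparse.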
-
The signal (A) is represented in the spikegram (B) as a set of
ovals whose size and intensity indicate the amplitude of the
spike. The position of the oval indicates the kernel center
frequency (CF, y-axis) and timing (x-axis). The kernel
functions corresponding to the spikes (represented by each
oval) are overlaid in gray.
-
The word "canteen", IPA: [kæntin]
-
Encoding algorithms
The computational objective is to minimize the error while maximizing coding efficiency.
There is a tradeoff between the error and the computational complexity.
There are many possible encoding algorithms. Matching pursuit is chosen.
-
Matching pursuit
Iteratively approximate the input signal with successive orthogonal projections onto kernels.
The projection with the largest inner product minimizes the power of the residual $R_x(t)$, thereby capturing the most structure possible given a single kernel.
In each iteration, the kernel projection is subtracted from the signal, leaving a reduced residual.
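The iteration described above can be sketched as a brute-force greedy search. This is a minimal illustration rather than the authors' implementation: it assumes unit-norm kernels (so the inner product is the projection coefficient), uses a fixed number of spikes as the stopping rule, and the Hann-windowed sinusoids from `toy_kernel` are hypothetical stand-ins for gammatone or learned kernels.

```python
import numpy as np

def matching_pursuit(signal, kernels, n_spikes):
    """Greedy matching pursuit with unit-norm kernels; returns spikes and final residual."""
    residual = np.asarray(signal, dtype=float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for m, kernel in enumerate(kernels):
            # corr[tau] is the inner product of the kernel with the residual at shift tau.
            corr = np.correlate(residual, kernel, mode="valid")
            tau = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[tau]) > abs(best[2]):
                best = (m, tau, float(corr[tau]))
        m, tau, s = best
        # Subtract the projection onto the winning kernel, leaving a reduced residual.
        residual[tau : tau + len(kernels[m])] -= s * kernels[m]
        spikes.append((m, tau, s))
    return spikes, residual

def toy_kernel(freq, fs, n=128):
    """Hypothetical unit-norm kernel: a Hann-windowed sinusoid."""
    t = np.arange(n) / fs
    k = np.hanning(n) * np.sin(2 * np.pi * freq * t)
    return k / np.linalg.norm(k)

fs = 16000  # assumed sampling rate (Hz)
kernels = [toy_kernel(1000.0, fs), toy_kernel(3000.0, fs)]

# Test signal built from two known events, so the decomposition can be checked.
signal = np.zeros(1000)
signal[200:328] += 2.0 * kernels[0]
signal[600:728] += -1.5 * kernels[1]

spikes, residual = matching_pursuit(signal, kernels, n_spikes=2)
print(spikes)
```

On this toy signal the two recovered spikes match the events used to build it, and the residual is numerically zero. On real audio one would stop at a residual-power or spike-amplitude threshold instead of a fixed spike count.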
-
Learning
-
Sounds
Animal vocalizations: Cornell Macaulay library
Natural sounds: vocal:transient:ambient=1.0:0.8:1.2
Speech sounds: TIMIT corpus
-
Fidelity curves for Fourier, Daubechies wavelet, gammatone, and spike codes.
-
Compare the adapted kernels with revcor filters.
We can use the spike-triggered average to estimate the impulse response of auditory nerve fibers. These response functions are called revcor filters.
Even though the kernels are optimized independently of revcor filters, they turn out to be very similar.
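The spike-triggered average can be sketched on simulated data. Everything in this example is an assumption made for illustration: the "neuron" is a toy linear filter followed by a threshold, the stimulus is Gaussian white noise, and the gammatone-like `true_filter` is hypothetical. It is not a model of real auditory nerve recordings.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000            # assumed sampling rate (Hz)
n = 200_000           # number of stimulus samples
stimulus = rng.standard_normal(n)   # white-noise stimulus

# Hypothetical "true" impulse response of the toy neuron (gammatone-like, unit norm).
L = 64
t = np.arange(L) / fs
true_filter = t ** 3 * np.exp(-2 * np.pi * 400 * t) * np.cos(2 * np.pi * 2000 * t)
true_filter /= np.linalg.norm(true_filter)

# Causal linear filtering followed by a threshold gives the spike times.
drive = np.convolve(stimulus, true_filter, mode="full")[:n]
spike_times = np.flatnonzero(drive > 2.0)
spike_times = spike_times[spike_times >= L - 1]   # need a full window before each spike

# Spike-triggered average: mean of the stimulus windows that precede each spike.
windows = np.stack([stimulus[s - L + 1 : s + 1] for s in spike_times])
sta = windows.mean(axis=0)

# Reverse in time so that index k means "k samples before the spike".
revcor = sta[::-1]

similarity = np.dot(revcor, true_filter) / (
    np.linalg.norm(revcor) * np.linalg.norm(true_filter)
)
print(f"{len(spike_times)} spikes, cosine similarity to true filter: {similarity:.3f}")
```

For a white-noise stimulus the time-reversed average is proportional to the filter that drives the spikes, which is why it can serve as an estimate of the impulse response.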
-
The characteristics of sound influence the features of adapted
kernels.
Kernels learned from any of the three categories alone cannot reflect
the population distribution of revcor filters of natural sounds.
-
However, speech sounds seem to represent natural sounds much better than animal vocalizations or environmental sounds!
-
Some implications
Revcor filters have sharp onsets and decaying offsets. This had been assumed to be phenomenological, a consequence of the impulse response of the basilar membrane. However, the learned kernels share the same feature, so it may just be the nature of sounds.
Most languages prefer CV structure and dislike VC structure (fast change in the beginning).
A gunshot is likely to produce a very fast-changing onset.
This model still does not explain the role of stimulus intensity.
Speech is a compromise of natural sounds.
-
Further question (not in the article)
There is the million-dollar question in speech science: the lack of invariance in the speech signal.
Perception of sounds in connected speech often requires restructuring.
Formants and other acoustic cues do not give you reliable representations.
Speech conditions
Speakers (males, females and children produce different sounds!)
Solutions:
Motor theory: the invariance is rooted in motor control of the articulators.
Top-down processing: you have to anticipate what it is to perceive it.
Maybe the invariance can be found in some better representation, like the spike code? Lewicki is trying that, but no good results yet.