04 efficient auditory coding

Upload: vui-le-ba

Post on 05-Apr-2018

  • 7/31/2019 04 Efficient Auditory Coding

    1/35

    Efficient auditory coding

    Evan Smith & Michael Lewicki (2006)

    Presented by Yuanliang Meng

    Auditory system

    Basilar membrane in the cochlea

    A stiff structural element that separates two liquid-filled tubes that run along the coil of the cochlea.

    Frequency dispersion: sound input of a given frequency vibrates certain locations of the basilar membrane more than other locations.

    Basilar membrane impulse response

    Optimal coding of an acoustic waveform

    Goal: Predict the optimal transformation of an acoustic waveform from the statistics of the environment.

    Find an efficient representation.

    Block-based representations: they do not yield time-relative codes

    Problems of block-based representations:

    They obscure transients and periodicities.

    Small time shifts can produce large changes in the representation, e.g. for consonants in speech or the onset of a gunshot.

    Coding can be optimal only within a block.

    Intuitively, we may want to increase the block rate to reduce shift sensitivity. This ultimately leads to a continuous filterbank.

    Convolution representations

    The representation is shift-invariant, but it does not reduce the information rate. Therefore it is a highly inefficient code.
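Both properties can be checked numerically. The sketch below is only an illustration under assumed settings: the random kernels, the signal length and the 5-sample shift are placeholders, not values from the paper. It convolves a short burst with a small filterbank, confirms that shifting the input shifts every output channel by the same amount, and counts how many coefficients the code uses per input sample.

```python
import numpy as np

def filterbank(x, kernels):
    """Convolve x with every kernel; one output channel per kernel."""
    return np.stack([np.convolve(x, k, mode="full") for k in kernels])

rng = np.random.default_rng(0)
n, n_kernels, klen = 256, 8, 32
kernels = rng.standard_normal((n_kernels, klen))  # placeholder kernels (assumption)

# A short burst surrounded by silence, so shifting cannot wrap around.
x = np.zeros(n)
x[100:120] = rng.standard_normal(20)

y = filterbank(x, kernels)
y_shifted = filterbank(np.roll(x, 5), kernels)

# Shifting the input by 5 samples shifts every channel by 5 samples.
shift_ok = np.allclose(y_shifted, np.roll(y, 5, axis=1))

# Coefficients per input sample: roughly one per kernel, so nothing is compressed.
expansion = y.size / x.size
print(shift_ok, expansion)
```

With 8 kernels the output holds roughly 8 numbers per input sample, which is the sense in which the code is inefficient.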

    A sparse, shiftable kernel representation

    The signal is decomposed in terms of discrete acoustic events, represented by the kernel functions φ_m.

    Each kernel function instance has a precise amplitude s_{m,i} and temporal position τ_{m,i}.

    The kernels could be assumed to be any functions, such as gammatones, but they can also be learned from data.
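As a rough illustration of this decomposition, the sketch below builds a signal by summing scaled, time-shifted kernels. Everything specific here is an assumption made for the example: the gammatone parameterization (order 4, bandwidth from the Glasberg-Moore ERB formula), the 16 kHz sampling rate, and the three hand-picked spikes. The function names `gammatone` and `synthesize` are hypothetical.

```python
import numpy as np

def gammatone(cf_hz, fs=16000, duration_s=0.02, order=4):
    """Unit-norm gammatone kernel; bandwidth follows the ERB formula (assumption)."""
    t = np.arange(int(duration_s * fs)) / fs
    erb = 24.7 * (4.37 * cf_hz / 1000.0 + 1.0)
    b = 1.019 * erb
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * cf_hz * t)
    return g / np.linalg.norm(g)

def synthesize(spikes, kernels, n_samples):
    """Sum amplitude * kernel, placed at each spike time (times are in samples)."""
    x = np.zeros(n_samples)
    for m, tau, s in spikes:
        k = kernels[m]
        end = min(tau + len(k), n_samples)
        x[tau:end] += s * k[: end - tau]
    return x

kernels = [gammatone(cf) for cf in (500.0, 1000.0, 2000.0)]
# Each spike is (kernel index, temporal position, amplitude); values are made up.
spikes = [(0, 100, 1.0), (2, 400, -0.5), (1, 900, 0.8)]
x = synthesize(spikes, kernels, n_samples=1600)
```

The whole 1600-sample signal is described by three triples, which is what makes the representation sparse.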

    The signal (A) is represented in the spikegram (B) as a set of ovals whose size and intensity indicate the amplitude of the spike. The position of the oval indicates the kernel center frequency (CF, y-axis) and timing (x-axis). The kernel functions corresponding to the spikes (represented by each oval) are overlaid in gray.

    The word "canteen", IPA: [kæntin]

    Encoding algorithms

    The computational objective is to minimize the error while maximizing coding efficiency.

    There is a tradeoff between the error and the computational complexity.

    There are many possible encoding algorithms. Matching pursuit is chosen.

    Matching pursuit

    Iteratively approximate the input signal with successive orthogonal projections onto kernels.

    The projection with the largest inner product will minimize the power of the residual R_x(t), thereby capturing the most structure possible given a single kernel.

    In each iteration, the kernel projection is subtracted from the signal, leading to a reduced residual.
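The iteration described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes unit-norm kernels (so the inner product is the projection coefficient), searches every sample offset by brute force, and stops after a fixed number of spikes. The Hann-windowed sinusoid kernels and the two-event test signal are made up for the demonstration.

```python
import numpy as np

def matching_pursuit(x, kernels, n_spikes):
    """Greedy sparse code of x using unit-norm kernels placed at any sample offset."""
    residual = np.asarray(x, dtype=float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for m, k in enumerate(kernels):
            # Inner product of the residual with kernel m at every shift.
            corr = np.correlate(residual, k, mode="valid")
            tau = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[tau]) > abs(best[2]):
                best = (m, tau, corr[tau])
        m, tau, s = best
        # Subtract the projection onto the chosen kernel from the residual.
        residual[tau:tau + len(kernels[m])] -= s * kernels[m]
        spikes.append((m, tau, float(s)))
    return spikes, residual

def windowed_tone(freq_hz, fs=16000, length=128):
    """Hypothetical test kernel: a unit-norm Hann-windowed sinusoid."""
    n = np.arange(length)
    k = np.hanning(length) * np.sin(2 * np.pi * freq_hz * n / fs)
    return k / np.linalg.norm(k)

kernels = [windowed_tone(500.0), windowed_tone(1500.0)]

# Two well-separated events with known kernel, position and amplitude.
x = np.zeros(1024)
x[100:228] += 2.0 * kernels[0]
x[600:728] += -1.0 * kernels[1]

spikes, residual = matching_pursuit(x, kernels, n_spikes=2)
print(spikes, np.linalg.norm(residual))
```

Because the two events do not overlap, two iterations are enough to recover both spikes and drive the residual to numerical zero. On real sounds the residual shrinks gradually, and a stopping rule (a spike-amplitude threshold or a target SNR, for example) would replace the fixed spike count.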

    Learning

    Sounds

    Animal vocalizations: Cornell Macaulay library

    Natural sounds: vocal : transient : ambient = 1.0 : 0.8 : 1.2

    Speech sounds: TIMIT corpus

    Fidelity curves for Fourier, Daubechies wavelet, gammatone and spike codes.
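To make the idea of a fidelity curve concrete, the sketch below computes one for a Fourier code only. It rests on assumptions: the transcript does not preserve the figure's axes, so the number of retained coefficients is used here as a stand-in for coding cost, and the input is random noise rather than real sound. The function names `snr_db` and `fourier_fidelity` are hypothetical.

```python
import numpy as np

def snr_db(x, x_hat):
    """Signal-to-noise ratio of a reconstruction, in decibels."""
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))

def fourier_fidelity(x, counts):
    """SNR after keeping only the k largest-magnitude rfft coefficients."""
    spectrum = np.fft.rfft(x)
    order = np.argsort(-np.abs(spectrum))
    curve = []
    for k in counts:
        kept = np.zeros_like(spectrum)
        kept[order[:k]] = spectrum[order[:k]]
        curve.append(snr_db(x, np.fft.irfft(kept, n=len(x))))
    return curve

rng = np.random.default_rng(0)
x = rng.standard_normal(512)  # placeholder signal (assumption)
counts = [8, 32, 128]
curve = fourier_fidelity(x, counts)
print(list(zip(counts, curve)))
```

The same `snr_db` measure can be applied to the reconstruction from any other code, which is how curves for different codes become comparable.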

    Compare the adapted kernels with revcor filters.

    We can use the spike-triggered average to estimate the impulse response of an auditory nerve fiber. These response functions are called revcor filters.

    Even though the kernels are optimized independently of revcor filters, they turn out to be very similar.
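The spike-triggered average itself is simple to sketch. The example below is a toy simulation under stated assumptions, not physiological data: the "neuron" is a hypothetical linear filter followed by a fixed threshold, the stimulus is Gaussian white noise, and the filter shape, threshold and sampling rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n_taps, n_samples = 16000, 64, 200_000

# Hypothetical "true" impulse response: a unit-norm gammatone-like filter.
t = np.arange(n_taps) / fs
h = t ** 3 * np.exp(-2 * np.pi * 300.0 * t) * np.cos(2 * np.pi * 2000.0 * t)
h /= np.linalg.norm(h)

# Toy neuron: linear filter followed by a fixed threshold, driven by white noise.
stimulus = rng.standard_normal(n_samples)
drive = np.convolve(stimulus, h)[:n_samples]
spike_times = np.flatnonzero(drive > 1.5)
spike_times = spike_times[spike_times >= n_taps - 1]

# Average the n_taps stimulus samples that precede (and include) each spike.
offsets = np.arange(-n_taps + 1, 1)
sta = stimulus[spike_times[:, None] + offsets].mean(axis=0)

# The window runs forward in time, so reverse it to compare with h.
h_est = sta[::-1]
similarity = np.corrcoef(h_est, h)[0, 1]
print(len(spike_times), similarity)
```

For white-noise input and this kind of threshold model, the reversed average is proportional to the filter, so the correlation with `h` should be close to 1.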

    The characteristics of sound influence the features of adapted kernels.

    Kernels learned from any of the three categories alone cannot reflect the population distribution of revcor filters of natural sounds.

    However, speech sounds seem to represent natural sounds much better than animal vocalizations or environmental sounds!

    Some implications

    Revcor filters have sharp onsets and decaying offsets. This had been assumed to be phenomenological, being a consequence of the impulse response of the basilar membrane. However, the learned kernels share the same feature. So it may just be the nature of sounds.

    Most languages prefer CV structure and dislike VC structure (fast change in the beginning).

    A gunshot is likely to produce a very fast changing onset.

    This model still does not explain the role of stimulus intensity.

    Speech is a compromise of natural sounds.

    Further question (not in the article)

    There is the million dollar question in speech science: the lack of invariance in the speech signal.

    Perception of sounds in connected speech often requires restructuring.

    Formants and other acoustic cues do not give you reliable representations.

    Speech conditions

    Speakers (males, females and children produce different sounds!)

    Solutions:

    Motor theory: the invariance is rooted in motor control of articulators.

    Top-down processing: you have to anticipate what it is to perceive it.

    Maybe the invariance can be found in some better representations, like spike code? Lewicki is trying that, but no good results yet.