Linear Predictive Coding for Speech Compression

Dev Ghosh, ECE 463

9 March 2006

Overview

General Model for Speech Synthesis

Channel Vocoder

Linear Predictive Coder (LPC-10)

Code Excited Linear Prediction (CELP)

Novel Application: Sub-band adaptive filtering based on cochlear model

Model for Speech Synthesis

Speech is produced by forcing air through the vocal cords, larynx, pharynx, mouth and nose

At the transmitter, speech is divided into segments

Each segment is analyzed to determine the excitation signal and the parameters of the vocal tract filter

[Block diagram: Excitation source -> Vocal tract filter -> Speech]

Channel Vocoder - analysis

Each segment of input speech analyzed by a bank of (bandpass) analysis filters

Energy at output of each filter is estimated 50 times a second and transmitted to receiver

Decision is made whether the segment is voiced (e.g. /a/, /e/, /o/) or unvoiced (e.g. /s/, /f/)

Estimate of pitch period (period of fundamental harmonic) is determined

Voiced vs. Unvoiced Speech

Channel Vocoder - synthesis

Vocal tract filter implemented by bank of (bandpass) synthesis filters

For voiced segments, periodic pulse generator is input

For unvoiced segments, pseudonoise source is input

Period determined by pitch estimate

Scaled by output of energy estimate

First approach to speech compression

Linear Predictive Coder

Models vocal tract as a single linear filter

$y_n = \sum_{i=1}^{M} a_i y_{n-i} + G\epsilon_n$

Output: $y_n$, Input: $\epsilon_n$, Gain: $G$, Filter order: $M$

Input is random noise (unvoiced) or periodic pulse (voiced)

LPC-10 is a standard (2.4 kbit/s, 8000 samples/sec)
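The synthesis model above can be sketched in a few lines of Python. This is a minimal illustration, not the LPC-10 decoder: the function name `lpc_synthesize`, the second-order coefficients, the gains and the pitch period of 80 samples are all assumptions chosen for the example.

```python
import random

def lpc_synthesize(a, gain, n_samples, voiced, pitch_period=80, seed=0):
    """All-pole synthesis: y[n] = sum_i a[i] * y[n-1-i] + gain * eps[n]."""
    rng = random.Random(seed)
    y = []
    for n in range(n_samples):
        if voiced:
            # Voiced excitation: one unit pulse per pitch period
            eps = 1.0 if n % pitch_period == 0 else 0.0
        else:
            # Unvoiced excitation: white Gaussian noise
            eps = rng.gauss(0.0, 1.0)
        # Weighted sum of past outputs (samples before n = 0 are taken as zero)
        past = sum(a[i] * y[n - 1 - i] for i in range(min(len(a), n)))
        y.append(past + gain * eps)
    return y

# Hypothetical stable second-order vocal tract filter
voiced = lpc_synthesize([1.2, -0.6], gain=1.0, n_samples=240, voiced=True)
unvoiced = lpc_synthesize([1.2, -0.6], gain=0.1, n_samples=240, voiced=False)
```

The same filter produces both kinds of segment; only the excitation changes.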

LPC - Voiced/Unvoiced Decision

Voiced speech has more energy and lower frequency than unvoiced

Speech segment is lowpass filtered; energy at output relative to background noise is used to determine voicing

Zero-crossings counted to determine frequency

Continuity criterion: voicing decision of neighboring frames taken into account
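A rough sketch of the energy and zero-crossing tests follows. The moving-average lowpass filter, the thresholds `energy_ratio` and `max_zc_rate`, and the 180-sample test segments are assumptions for illustration, not the LPC-10 values, and the continuity check across neighboring frames is left out.

```python
import math
import random

def lowpass(x, taps=5):
    """Crude lowpass filter: causal moving average over up to `taps` samples."""
    out = []
    for n in range(len(x)):
        window = x[max(0, n - taps + 1): n + 1]
        out.append(sum(window) / len(window))
    return out

def energy(x):
    """Mean squared value of the segment."""
    return sum(v * v for v in x) / len(x)

def zero_crossings(x):
    """Number of sign changes between consecutive samples."""
    return sum(1 for p, q in zip(x, x[1:]) if (p >= 0) != (q >= 0))

def is_voiced(x, noise_energy, energy_ratio=4.0, max_zc_rate=0.25):
    """Voiced if lowpassed energy is well above the noise floor
    and the zero-crossing rate is low (low dominant frequency)."""
    zc_rate = zero_crossings(x) / (len(x) - 1)
    return energy(lowpass(x)) > energy_ratio * noise_energy and zc_rate < max_zc_rate

rng = random.Random(0)
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(180)]  # vowel-like
hiss = [rng.gauss(0.0, 0.1) for _ in range(180)]                     # fricative-like
tone_voiced = is_voiced(tone, noise_energy=0.01)
hiss_voiced = is_voiced(hiss, noise_energy=0.01)
```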

LPC - Estimating Pitch Period

Extracting pitch from short noisy segment is difficult

One approach is to maximize autocorrelation

Periodicity isn't strong enough

Threshold can't be used because maximum value not known in advance

LPC - Estimating Pitch Period

LPC-10 uses average magnitude difference function (AMDF)

$\mathrm{AMDF}(P) = \frac{1}{N} \sum_{i=1}^{N} |y_i - y_{i-P}|$

If {yn} is periodic with period P0, samples P0 apart will have values close to each other and AMDF will have a min at P0

AMDF is periodic for voiced and roughly flat for unvoiced

AMDF is at a minimum when P equals the pitch period; spurious minima in unvoiced segments are shallow
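The AMDF search can be sketched as below. The helper names, the lag range of 20 to 100 samples and the synthetic test signal with a period of 57 samples are assumptions for the example; a real coder would also check the depth of the minimum before trusting it.

```python
import math

def amdf(y, lag, start, n):
    """Average magnitude difference over n samples beginning at index start."""
    return sum(abs(y[i] - y[i - lag]) for i in range(start, start + n)) / n

def estimate_pitch(y, min_lag, max_lag, n):
    """Return the lag in [min_lag, max_lag] with the smallest AMDF."""
    start = max_lag  # guarantees i - lag >= 0 for every lag tested
    return min(range(min_lag, max_lag + 1), key=lambda p: amdf(y, p, start, n))

# Synthetic periodic signal: fundamental plus second harmonic, period 57 samples
y = [math.sin(2 * math.pi * n / 57) + 0.5 * math.sin(4 * math.pi * n / 57)
     for n in range(300)]
pitch = estimate_pitch(y, min_lag=20, max_lag=100, n=160)
```

Only absolute differences and additions are needed, which is one reason the AMDF is cheaper to evaluate than an autocorrelation.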

LPC - Obtaining Vocal Tract Filter

At transmitter, we want filter coeffs that best match the segment in a mean squared error sense

$e_n^2 = \left( y_n - \sum_{i=1}^{M} a_i y_{n-i} - G\epsilon_n \right)^2$

Autocorrelation approach assumes $\{y_n\}$ is stationary

$A = R^{-1}P$

Recursive solution uses Levinson-Durbin
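A minimal sketch of the Levinson-Durbin recursion for the autocorrelation approach is given below. The function names and the example autocorrelation values are assumptions, and the sign convention for the reflection coefficients varies between texts; here a positive coefficient corresponds to a positive final predictor tap.

```python
def autocorr(y, max_lag):
    """r[k] = sum_n y[n] * y[n-k], treating samples outside the segment as zero."""
    return [sum(y[n] * y[n - k] for n in range(k, len(y)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations sum_i a_i r(|i-j|) = r(j), j = 1..order.

    Returns predictor coefficients a_1..a_order, the reflection coefficients
    and the final prediction error energy."""
    a = []
    reflection = []
    err = r[0]
    for m in range(order):
        # Correlation not yet explained by the order-m predictor
        acc = r[m + 1] - sum(a[i] * r[m - i] for i in range(m))
        k = acc / err
        # Update lower-order coefficients, then append the new tap
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
        reflection.append(k)
        err *= (1.0 - k * k)
    return a, reflection, err

# Hypothetical autocorrelation values r(0), r(1), r(2)
coeffs, reflection, err = levinson_durbin([1.0, 0.5, 0.2], order=2)
```

Each step reuses the previous order's solution, so the cost grows with the square of the order instead of the cube needed for a general matrix inverse.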

LPC - Obtaining the Vocal Tract Filter

Covariance approach discards stationarity assumption (not valid for speech signals)

$c_{ij} = E[y_{n-i} y_{n-j}]$

yields $CA = S$

LPC - Obtaining the Vocal Tract Filter

$c_{ij}$ are estimated as

$c_{ij} = \sum_{n} y_{n-i} y_{n-j}$ (sum over the samples $n$ of the segment)

No longer assume values of $y_n$ outside of segment are zero

Cholesky decomposition required

Reflection coeffs used to update voicing decision
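The covariance approach can be sketched as follows, taking the right-hand side of $CA = S$ to be the vector with entries $c_{i0}$. The function names and the noiseless second-order test signal are assumptions for the example, and no safeguard is included for a segment whose matrix is not positive definite.

```python
import math

def covariance_system(y, order, start, n):
    """Build C (entries c_ij) and S (entries c_i0) from samples start..start+n-1.
    Samples before the segment are read from y, not assumed to be zero."""
    def c(i, j):
        return sum(y[t - i] * y[t - j] for t in range(start, start + n))
    C = [[c(i, j) for j in range(1, order + 1)] for i in range(1, order + 1)]
    S = [c(i, 0) for i in range(1, order + 1)]
    return C, S

def cholesky_solve(C, S):
    """Solve C a = S for symmetric positive definite C via C = L L^T."""
    m = len(S)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = []  # forward substitution: L z = S
    for i in range(m):
        z.append((S[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    a = [0.0] * m  # back substitution: L^T a = z
    for i in reversed(range(m)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, m))) / L[i][i]
    return a

# Noiseless test signal obeying y[n] = 1.2 y[n-1] - 0.6 y[n-2]
y = [1.0, 1.2]
for _ in range(38):
    y.append(1.2 * y[-1] - 0.6 * y[-2])
C, S = covariance_system(y, order=2, start=2, n=20)
coeffs = cholesky_solve(C, S)
```

Because C is symmetric but not Toeplitz, the Levinson-Durbin shortcut does not apply, which is why a Cholesky factorization is used here.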

LPC - Transmitting Parameters

Tenth order filter used for voiced speech and fourth order for unvoiced

Vocal tract filter is sensitive to errors in reflection coeffs with magnitude close to one

$g_i = \frac{1 + k_i}{1 - k_i}$ are quantized and sent instead of $k_i$
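The mapping and its inverse can be written directly from the formula above. The function names are hypothetical; the point of the sketch is that equal steps in k near one are stretched into much larger steps in g, so a uniform quantizer on g resolves that sensitive region more finely.

```python
def to_ratio(k):
    """g = (1 + k) / (1 - k), defined for |k| < 1."""
    return (1.0 + k) / (1.0 - k)

def from_ratio(g):
    """Inverse mapping: k = (g - 1) / (g + 1)."""
    return (g - 1.0) / (g + 1.0)

# The same step of 0.01 in k, taken near zero and near one
step_near_zero = to_ratio(0.01) - to_ratio(0.00)
step_near_one = to_ratio(0.99) - to_ratio(0.98)
round_trip = from_ratio(to_ratio(0.95))
```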

Code Excited Linear Prediction

Single pulse per pitch period leads to buzzy twang

Variety of excitation signals is allowed

For each segment encoder finds excitation vector that generates synthesized speech that best matches speech being coded
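The search for the best excitation vector can be sketched as an exhaustive loop over a codebook. Everything here is an assumption for illustration: the random Gaussian codebook of 16 vectors, the second-order filter, the plain squared-error measure and the function names. Practical CELP coders add refinements that are not shown.

```python
import random

def synthesize(a, excitation):
    """Pass an excitation vector through the all-pole vocal tract filter."""
    y = []
    for n, e in enumerate(excitation):
        past = sum(a[i] * y[n - 1 - i] for i in range(min(len(a), n)))
        y.append(past + e)
    return y

def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the code vector whose synthesized,
    optimally scaled output is closest to the target segment."""
    best = None
    for index, code in enumerate(codebook):
        s = synthesize(a, code)
        energy = sum(v * v for v in s)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, s)) / energy  # least-squares gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, s))
        if best is None or err < best[2]:
            best = (index, gain, err)
    return best

rng = random.Random(1)
a = [1.2, -0.6]  # hypothetical vocal tract filter
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(16)]
# Target built from entry 3, so the search should recover that entry
target = [0.7 * v for v in synthesize(a, codebook[3])]
index, gain, err = search_codebook(target, codebook, a)
```

Only the index and the gain need to be transmitted, since the receiver holds the same codebook.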

Sub-band adaptive filtering

Multi-channel speech enhancement system

The greater the number of sub-bands used, the faster the convergence of the overall system

Cochlear Modelling

Sub-band filters are distributed logarithmically in frequency to approximate distribution of filters in cochlea
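One simple way to obtain logarithmically spaced sub-bands is to keep a constant ratio between neighboring band edges. The function name and the 100 Hz to 3200 Hz range are assumptions for the example; the slides do not state the actual band layout.

```python
def log_band_edges(f_low, f_high, n_bands):
    """Return n_bands + 1 edge frequencies with a constant ratio between neighbors."""
    ratio = (f_high / f_low) ** (1.0 / n_bands)
    return [f_low * ratio ** i for i in range(n_bands + 1)]

# Five bands, each one octave wide: roughly 100, 200, 400, 800, 1600, 3200 Hz
edges = log_band_edges(100.0, 3200.0, 5)
```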

Adaptive Noise Cancellation

LMS algorithm is used to model differential transfer function between noise signals in a number of sub-bands

Lower power and shorter filters used in each sub-band

Convergence is equal across all bands if power is distributed equally and filter lengths are the same

Convergence dominated by sub-band with greatest power
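The LMS update for a single sub-band can be sketched as below. The function name, step size `mu`, two-tap noise path and sinusoidal stand-in for speech are assumptions for illustration; a sub-band system would run one such filter per band.

```python
import math
import random

def lms_cancel(primary, reference, n_taps, mu):
    """Adapt FIR weights w so that w applied to the reference tracks the noise
    in the primary input; the error signal is the enhanced output."""
    w = [0.0] * n_taps
    out = []
    for n in range(len(primary)):
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        noise_estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = primary[n] - noise_estimate
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, x)]  # LMS weight update
        out.append(e)
    return out, w

rng = random.Random(0)
n_samples = 2000
reference = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
speech = [0.1 * math.sin(2 * math.pi * n / 40) for n in range(n_samples)]
# Hypothetical differential transfer function between the two noise signals
primary = [speech[n] + 0.5 * reference[n]
           + 0.3 * (reference[n - 1] if n > 0 else 0.0)
           for n in range(n_samples)]
enhanced, weights = lms_cancel(primary, reference, n_taps=2, mu=0.01)
```

After adaptation the weights sit close to the assumed noise path (0.5, 0.3) and the output approximates the speech component. The speed of adaptation depends on the product of the step size and the input power, which is why the power in each band governs convergence.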
