Linear Predictive Coding for Speech Compression
Dev Ghosh, ECE 463
9 March 2006
Overview
General Model for Speech Synthesis
Channel Vocoder
Linear Predictive Coder (LPC-10)
Code Excited Linear Prediction (CELP)
Novel Application: Sub-band adaptive filtering based on cochlear model
Model for Speech Synthesis
Speech is produced by forcing air through the vocal cords, larynx, pharynx, mouth and nose
At the transmitter, speech is divided into segments
Each segment is analyzed to determine the excitation signal and the parameters of the vocal tract filter
[Block diagram: Excitation source -> Vocal tract filter -> Speech]
Channel Vocoder - analysis
Each segment of input speech analyzed by a bank of (bandpass) analysis filters
Energy at output of each filter is estimated 50 times a second and transmitted to receiver
Decision made whether segment is voiced (e.g. /a/, /e/, /o/) or unvoiced (e.g. /s/, /f/)
Estimate of pitch period (period of fundamental harmonic) is determined
Voiced vs. Unvoiced Speech
Channel Vocoder - synthesis
Vocal tract filter implemented by bank of (bandpass) synthesis filters
For voiced segments, periodic pulse generator is input
For unvoiced segments, pseudonoise source is input
Period of the pulse generator is determined by pitch estimate
Input is scaled by output of energy estimate
First approach to speech compression
Linear Predictive Coder
Models vocal tract as a single linear filter
$y_n = \sum_{i=1}^{M} a_i y_{n-i} + G\epsilon_n$
Output: $y_n$, Input: $\epsilon_n$, Gain: $G$, Filter order: $M$
Input is random noise (unvoiced) or periodic pulse (voiced)
LPC-10 is a standard (2.4 kbit/s, 8000 samples/sec)
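The synthesis model on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the LPC-10 decoder: the function names, the first-order coefficient 0.5, and the pulse period of 4 samples are all hypothetical values chosen so the output is easy to check by hand.

```python
import random

def lpc_synthesize(a, gain, excitation):
    """All-pole filter: y[n] = sum_i a[i-1] * y[n-i] + gain * excitation[n]."""
    y = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:  # samples before the segment are taken as zero
                acc += coeff * y[n - i]
        y.append(acc)
    return y

def pulse_train(length, period):
    """Voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

def white_noise(length, seed=0):
    """Unvoiced excitation: seeded Gaussian noise (assumed noise model)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

a = [0.5]  # hypothetical first-order filter, for illustration only
voiced = lpc_synthesize(a, 1.0, pulse_train(8, 4))
unvoiced = lpc_synthesize(a, 1.0, white_noise(8))
```

With these toy values the voiced output decays geometrically after each pulse, which is the filter's impulse response repeated at the pitch period.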
LPC - Voiced/Unvoiced Decision
Voiced speech has more energy and lower frequency than unvoiced
Speech segment is lowpass filtered; energy at output relative to background noise is used to determine voicing
Zero-crossings counted to determine frequency
Continuity criterion: voicing decision of neighboring frames taken into account
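A rough sketch of the three cues listed above (lowpass energy against the noise floor, zero-crossing count, and continuity with neighboring frames) might look as follows. The one-pole lowpass, the thresholds `energy_ratio` and `max_zc_fraction`, and the neighbor rule are all assumptions made for illustration; they are not the values or logic of the LPC-10 standard.

```python
import math

def lowpass(x, alpha=0.5):
    """One-pole lowpass filter (an assumed, very simple smoother)."""
    out, prev = [], 0.0
    for s in x:
        prev = alpha * prev + (1.0 - alpha) * s
        out.append(prev)
    return out

def energy(x):
    return sum(s * s for s in x) / len(x)

def zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return sum(1 for p, q in zip(x, x[1:]) if (p >= 0) != (q >= 0))

def is_voiced(frame, noise_energy, energy_ratio=4.0, max_zc_fraction=0.25):
    """Voiced if lowpassed energy is well above the noise floor
    and the zero-crossing rate is low (hypothetical thresholds)."""
    e = energy(lowpass(frame))
    zc = zero_crossings(frame) / (len(frame) - 1)
    return e > energy_ratio * noise_energy and zc < max_zc_fraction

def smooth_decisions(flags):
    """Continuity rule: flip a frame that disagrees with both neighbors."""
    out = list(flags)
    for k in range(1, len(flags) - 1):
        if flags[k - 1] == flags[k + 1] != flags[k]:
            out[k] = flags[k - 1]
    return out

# 100 Hz tone sampled at 8000 samples/sec vs. a rapidly alternating signal
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
hiss = [0.1 if n % 2 == 0 else -0.1 for n in range(160)]
```

Here `is_voiced(tone, 0.01)` comes out true and `is_voiced(hiss, 0.01)` false, since the alternating signal changes sign at every sample.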
LPC - Estimating Pitch Period
Extracting pitch from short noisy segment is difficult
One approach is to maximize autocorrelation
Periodicity isn’t strong enough
Threshold can’t be used because maximum value not known in advance
LPC - Estimating Pitch Period
LPC-10 uses average magnitude difference function (AMDF)
$\mathrm{AMDF}(P) = \frac{1}{N}\sum_{i} |y_i - y_{i-P}|$, summed over the $N$ samples of the segment
If $\{y_n\}$ is periodic with period $P_0$, samples $P_0$ apart will have values close to each other and AMDF will have a minimum at $P_0$
AMDF is periodic for voiced and roughly flat for unvoiced
AMDF is at a minimum when $P$ is the pitch period; spurious minima in unvoiced segments are shallow
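The AMDF search can be sketched as a scan over candidate lags that keeps the lag with the smallest average magnitude difference. The lag range (20 to 80 samples), the segment length of 160 samples, and the synthetic test signal with a period of 50 samples are assumptions for illustration only.

```python
import math

def amdf(y, lag, start, n):
    """Average magnitude difference over n samples beginning at index start."""
    return sum(abs(y[i] - y[i - lag]) for i in range(start, start + n)) / n

def estimate_pitch(y, min_lag, max_lag, n):
    """Return the lag in [min_lag, max_lag] that minimizes the AMDF."""
    start = max_lag  # ensures y[i - lag] never indexes before the signal
    return min(range(min_lag, max_lag + 1),
               key=lambda lag: amdf(y, lag, start, n))

# Synthetic "voiced" signal: fundamental plus second harmonic, period 50
period = 50
y = [math.sin(2 * math.pi * n / period)
     + 0.5 * math.sin(4 * math.pi * n / period) for n in range(400)]
pitch = estimate_pitch(y, 20, 80, 160)
```

For this noise-free signal the minimum falls at the true period. With real speech, a practical coder would also need to guard against shallow minima and multiples of the pitch period, which this sketch does not attempt.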
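LPC - Obtaining Vocal Tract Filter
At transmitter, we want filter coeffs that best match the segment in a mean squared error sense
$e_n^2 = \left(y_n - \sum_{i=1}^{M} a_i y_{n-i} - G\epsilon_n\right)^2$
Autocorrelation approach assumes $\{y_n\}$ is stationary
$A = R^{-1}P$, where $R$ is the autocorrelation matrix, $P$ the vector of autocorrelation values and $A$ the vector of filter coeffs
Recursive solution uses Levinson-Durbin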
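As a sketch of the autocorrelation approach, the following computes autocorrelation values for a segment (treating samples outside it as zero) and solves the normal equations with the Levinson-Durbin recursion instead of inverting the matrix directly. The function names are hypothetical, and the sign convention for the reflection coefficients is one of several in use.

```python
def autocorr(y, max_lag):
    """Autocorrelation r[0..max_lag], samples outside the segment taken as zero."""
    n = len(y)
    return [sum(y[i] * y[i - k] for i in range(k, n))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve sum_i a[i] * r[|i - j|] = r[j] for j = 1..order.

    Returns (predictor coeffs a[1..order], reflection coeffs, residual energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    reflection = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)
        reflection.append(k)
    return a[1:], reflection, err

# Hand-checkable example: r = [1, 0.5, 0.1] gives a = [0.6, -0.2]
coeffs, refl, residual = levinson_durbin([1.0, 0.5, 0.1], 2)
```

Substituting back confirms the result: 0.6(1) - 0.2(0.5) = 0.5 and 0.6(0.5) - 0.2(1) = 0.1, matching r[1] and r[2].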
LPC - Obtaining the Vocal Tract Filter
Covariance approach discards stationarity assumption (not valid for speech signals)
$c_{ij} = E[y_{n-i} y_{n-j}]$
yields $CA = S$
LPC - Obtaining the Vocal Tract Filter
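$c_{ij}$ are estimated as
$c_{ij} = \sum_{n} y_{n-i} y_{n-j}$, summed over the samples of the segment
No longer assume values of $y_n$ outside of segment are zero
Cholesky decomposition required to solve for the filter coeffs, since $C$ is symmetric but not Toeplitz
Reflection coeffs used to update voicing decision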
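A minimal sketch of the covariance approach: estimate the c_ij from samples that may lie before the segment start, then solve the resulting symmetric system with a Cholesky factorization. The function names and the noise-free second-order test signal are hypothetical, and the sketch assumes the matrix is positive definite.

```python
import math

def covariance_terms(y, start, n, order):
    """c[i][j] = sum over t in [start, start+n) of y[t-i] * y[t-j]."""
    return [[sum(y[t - i] * y[t - j] for t in range(start, start + n))
             for j in range(order + 1)] for i in range(order + 1)]

def cholesky_solve(c, s):
    """Solve c x = s for symmetric positive definite c via c = L L^T."""
    m = len(s)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            acc = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    z = []  # forward substitution: L z = s
    for i in range(m):
        z.append((s[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    x = [0.0] * m  # back substitution: L^T x = z
    for i in reversed(range(m)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, m))) / low[i][i]
    return x

def lpc_covariance(y, start, n, order):
    full = covariance_terms(y, start, n, order)
    c = [row[1:] for row in full[1:]]               # c_ij for i, j = 1..order
    s = [full[i][0] for i in range(1, order + 1)]   # c_i0 for i = 1..order
    return cholesky_solve(c, s)

# Noise-free test signal obeying y[n] = 0.6 y[n-1] - 0.2 y[n-2]
y = [1.0, 0.6]
for _ in range(30):
    y.append(0.6 * y[-1] - 0.2 * y[-2])
coeffs = lpc_covariance(y, 2, 20, 2)
```

Because the test signal follows the recursion exactly, the recovered coefficients match 0.6 and -0.2 up to rounding.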
LPC - Transmitting Parameters
Tenth order filter used for voiced speech and fourth order for unvoiced
Vocal tract filter is sensitive to errors in reflection coeffs $k_i$ with magnitude close to one
$g_i = \frac{1+k_i}{1-k_i}$
are quantized and sent instead of $k_i$
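To see why the transformed value is quantized rather than the reflection coefficient itself, the sketch below applies the mapping from this slide and its algebraic inverse. The function names are hypothetical, and no actual LPC-10 quantizer tables are reproduced here.

```python
def k_to_g(k):
    """Map a reflection coefficient k (|k| < 1) to g = (1 + k) / (1 - k)."""
    return (1.0 + k) / (1.0 - k)

def g_to_k(g):
    """Inverse mapping used at the receiver: k = (g - 1) / (g + 1)."""
    return (g - 1.0) / (g + 1.0)

# The mapping stretches the region near k = 1, where the filter is most sensitive
spread_mid = k_to_g(0.55) - k_to_g(0.50)   # small change in g
spread_high = k_to_g(0.95) - k_to_g(0.90)  # much larger change in g
```

The same step of 0.05 in k produces a far larger step in g near one than near 0.5, so a uniform quantizer on g spends more of its levels where errors in k matter most.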
Code Excited Linear Prediction
Single pulse per pitch period leads to buzzy twang
Variety of excitation signals is allowed
For each segment, the encoder finds the excitation vector that generates synthesized speech that best matches the speech being coded
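The search described above is often called analysis-by-synthesis: each candidate excitation is passed through the vocal tract filter and compared with the target segment. The sketch below uses a tiny hypothetical codebook and a plain squared-error criterion with a per-entry optimal gain. It deliberately omits the perceptual weighting and pitch (adaptive) codebook that practical CELP coders use.

```python
def synthesize(a, excitation):
    """All-pole synthesis: y[n] = sum_i a[i-1] * y[n-i] + excitation[n]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc += coeff * y[n - i]
        y.append(acc)
    return y

def best_excitation(target, codebook, a):
    """Return (index, gain, squared error) of the best-matching codebook entry."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(a, code)
        power = sum(s * s for s in synth)
        if power == 0.0:
            continue  # an all-zero entry cannot match anything
        gain = sum(t * s for t, s in zip(target, synth)) / power
        err = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or err < best[2]:
            best = (index, gain, err)
    return best

a = [0.5]  # hypothetical first-order vocal tract filter
codebook = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, -1.0],
            [1.0, 1.0, 1.0, 1.0]]
target = [2.0 * s for s in synthesize(a, codebook[1])]
index, gain, err = best_excitation(target, codebook, a)
```

Only the index and gain would need to be transmitted, since the receiver holds the same codebook.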
Sub-band adaptive filtering
Multi-channel speech enhancement system
The greater the number of sub-bands used, the faster the convergence of the overall system
Cochlear Modelling
Sub-band filters are distributed logarithmically in frequency to approximate distribution of filters in cochlea
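One simple way to obtain logarithmically spaced bands is to place the band edges in a geometric progression, so each band spans the same frequency ratio. The edge frequencies and band count below are hypothetical, and real cochlear models typically use more refined scales than a pure geometric spacing.

```python
def log_band_edges(f_low, f_high, bands):
    """Band edges spaced so that each band covers the same frequency ratio."""
    ratio = (f_high / f_low) ** (1.0 / bands)
    return [f_low * ratio ** k for k in range(bands + 1)]

# Four bands between 100 Hz and 1600 Hz: each band is one octave wide
edges = log_band_edges(100.0, 1600.0, 4)
```

Low-frequency bands come out narrow and high-frequency bands wide, which is the qualitative behavior the slide attributes to the cochlea.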
Adaptive Noise Cancellation
LMS algorithm is used to model differential transfer function between noise signals in a number of sub-bands
Lower power and shorter filters used in each sub-band
Convergence is equal across all bands if power is distributed equally and filter lengths are the same
Convergence dominated by sub-band with greatest power
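The LMS update used within each sub-band can be sketched for a single band as follows. Here the adaptive filter learns the path from a noise reference to the noise in the primary signal, and the error output is the enhanced signal. The two-tap noise path, the step size `mu`, and the uniform noise source are assumptions for illustration. The sub-band analysis and synthesis filterbanks are not shown.

```python
import random

def lms_cancel(primary, reference, taps, mu):
    """Adaptive noise canceller. Returns (error signal, final weights)."""
    w = [0.0] * taps
    out = []
    for n in range(len(primary)):
        # most recent `taps` reference samples, zero before the start
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = primary[n] - estimate
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]  # LMS weight update
        out.append(e)
    return out, w

rng = random.Random(1)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical noise path: the primary picks up 0.8 x[n] - 0.3 x[n-1]
primary = [0.8 * reference[n] - 0.3 * (reference[n - 1] if n else 0.0)
           for n in range(2000)]
cleaned, weights = lms_cancel(primary, reference, taps=2, mu=0.05)
```

In this noise-only example the weights approach the assumed path coefficients and the error decays toward zero. The convergence rate depends on `mu` and the reference power, which is consistent with the slide's remarks on power distribution across bands.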