CODING OF SPEECH SIGNALS
Speech Coding falls into three classes:
• Waveform Coding
• Analysis-Synthesis (Vocoders)
• Hybrid Coding
Waveform Coding: an attempt is made to preserve the original waveform.
Vocoders: a theoretical model of the speech production mechanism is
considered.
Hybrid Coding: uses techniques from the other two.
Speech Coders
Speech Quality vs. Bit Rate for Codecs
From: J. Woodard, "Speech coding overview",
http://www-mobile.ecs.soton.ac.uk/speech_codecs
Speech Coding Objectives
– High perceived quality
– High measured intelligibility
– Low bit rate (bits per second of speech)
– Low computational requirement (MIPS)
– Robustness to successive encode/decode cycles
– Robustness to transmission errors
Objectives for real-time only:
– Low coding/decoding delay (ms)
– Work with non-speech signals (e.g. touch tone)
Speech Information Rates
• Fundamental (phonemic) level:
– 10-15 phonemes/second for continuous speech
– 32-64 phonemes per language => 6 bits/phoneme
– information rate = 60-90 bps at the source
• Waveform level:
– speech bandwidth from 4-10 kHz => sampling rate from 8-20 kHz
– need 12-16 bit quantization for high-quality digital coding
– information rate = 96-320 kbps => more than 3 orders of magnitude difference between the production and waveform levels
MOS (Mean Opinion Scores)
• Why MOS? SNR is just not good enough as a subjective measure for most coders (especially model-based coders, where the waveform is not inherently preserved):
– the noise is not simple white (uncorrelated) noise
– the error is correlated with the signal:
• clicks/transients
• frequency-dependent (non-white) spectrum
• components due to reverberation and echo
– noise comes from at least two sources, namely quantization and background noise
– delay due to transmission, block coding, and processing
– transmission bit errors (can use unequal protection methods)
– tandem encodings
MOS Ratings
• Subjective evaluation of speech quality on a five-point scale:
– 5: excellent
– 4: good
– 3: fair
– 2: poor
– 1: bad
• Typical MOS scores:
– 4.5 for natural wideband speech
– 4.05 for toll-quality telephone speech
– 3.5-4.0 for communications-quality telephone speech
– 2.0-3.5 for lower-quality speech from synthesizers, low-bit-rate coders, etc.
• Other measures of intelligibility:
– DRT (Diagnostic Rhyme Test) => uses rhyming words
– DAM (Diagnostic Acceptability Measure)
Digital Speech Coding
Sampling Theorem
• Theorem: If the highest frequency contained in an analog signal xa(t) is Fmax = B, and the signal is sampled at a frequency Fs > 2B, then the analog signal can be exactly recovered from its samples using the following reconstruction formula:

xa(t) = Σn xa(nT) · [sin(π(t − nT)/T)] / [π(t − nT)/T],  T = 1/Fs

• Note that at the original sample instants (t = nT), the reconstructed analog signal equals the value of the original analog signal. At times between the sample instants, the signal is a weighted sum of shifted sinc functions.
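The reconstruction formula above can be sketched directly in code; this is a minimal numerical illustration (the signal, rate, and grid are chosen for the example, and the infinite sum is necessarily truncated):

```python
import numpy as np

def sinc_reconstruct(samples, T, t):
    """Evaluate the reconstruction formula: a weighted sum of shifted sincs.
    np.sinc(u) = sin(pi*u)/(pi*u), so np.sinc((t - n*T)/T) is the kernel."""
    n = np.arange(len(samples))
    return np.sum(samples[None, :] * np.sinc((t[:, None] - n[None, :] * T) / T), axis=1)

Fs, f0 = 8000.0, 1000.0                     # Fs > 2B for a 1 kHz tone
T = 1.0 / Fs
n = np.arange(64)
x = np.sin(2 * np.pi * f0 * n * T)          # samples of a bandlimited signal
t = np.arange(0.002, 0.005, T / 8)          # dense grid away from the truncation edges
err = np.max(np.abs(sinc_reconstruct(x, T, t) - np.sin(2 * np.pi * f0 * t)))
print(err < 0.1)                            # small; limited only by truncating the sum
```

At the sample instants t = nT every sinc but one vanishes, which is why the samples are reproduced exactly.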
GRAPHICAL INTERPRETATION OF THE SAMPLING THEOREM
RECONSTRUCTION VIA SINC(x) INTERPOLATION
TYPICAL SAMPLING FREQUENCIES IN SPEECH RECOGNITION
• 8 kHz: Popular in digital telephony. Provides
coverage of first three formants for most
speakers and most sounds.
• 16 kHz: Popular in speech research. Why?
• Sub 8 kHz Sampling: Can aliasing be useful in
speech recognition? Hint: Consumer
electronics.
Problems
• Sampling theorem for bandlimited signals.
• How can the sample rate of a signal be changed?
• How can this be implemented using time-domain interpolation (based on the Sampling Theorem)?
• How can this be implemented efficiently using digital filters?
Digital Speech Coding
Speech Probability Density Function
• The probability density function for x(n) is the same as for xa(t); since x(n) = xa(nT), the mean and variance are the same for both x(n) and xa(t).
• Need to estimate the probability density and power spectrum from speech waveforms:
– probability density estimated from a long-term histogram of amplitudes
– a good approximation is a gamma distribution, of the form:
p(x) = [√3 / (8π σx |x|)]^(1/2) · exp(−√3 |x| / (2σx))
– a simpler approximation is the Laplacian density, of the form:
p(x) = [1 / (√2 σx)] · exp(−√2 |x| / σx)
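As a numerical sanity check (not from the slides), the Laplacian model above should integrate to 1 and have variance σx²:

```python
import numpy as np

# Check that p(x) = exp(-sqrt(2)|x|/sigma_x) / (sqrt(2) sigma_x) is a proper
# density with variance sigma_x^2, using a simple Riemann sum.
sigma_x = 1.0
x = np.linspace(-40, 40, 400001)
dx = x[1] - x[0]
p = np.exp(-np.sqrt(2) * np.abs(x) / sigma_x) / (np.sqrt(2) * sigma_x)
area = p.sum() * dx
var = (x**2 * p).sum() * dx
print(round(area, 3), round(var, 3))   # 1.0 and sigma_x^2
```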
Measured Speech Densities
• Distributions are normalized so the mean is 0 and the variance is 1 (x̄ = 0, σx = 1).
• The gamma density more closely approximates the measured distribution for speech than the Laplacian.
• The Laplacian is still a good model and is used in analytical studies.
• Small amplitudes are much more likely than large amplitudes, by about a 100:1 ratio.
Speech AC and Power Spectrum
• Can estimate the long-term autocorrelation and power spectrum using time-series analysis methods:
φ(m) = (1/L) Σ_{n} x(n) x(n + m), where L is a large integer
• For 8 kHz sampled speech from several speakers:
– high correlation between adjacent samples
– lowpass speech is more highly correlated than bandpass speech
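The long-term estimate above is a one-liner in practice; here is a minimal sketch using lowpass noise as an illustrative stand-in for lowpass speech (the signal model is an assumption, not from the slides):

```python
import numpy as np

def long_term_autocorr(x, max_lag):
    """Biased long-term estimate phi(m) = (1/L) * sum_n x(n) x(n+m)."""
    L = len(x)
    return np.array([np.dot(x[:L - m], x[m:]) / L for m in range(max_lag + 1)])

# Stand-in for lowpass speech: a 3-point moving average of white noise,
# which (like lowpass speech) is highly correlated between neighbours.
rng = np.random.default_rng(0)
x = np.convolve(rng.standard_normal(80000), np.ones(3) / 3, mode="same")
rho = long_term_autocorr(x, 3)
rho = rho / rho[0]                 # normalized so rho[0] = 1
print(rho[1] > 0.5, rho[3] < 0.1)  # adjacent samples correlated; lag 3 is not
```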
Instantaneous Quantization
• Treat sampling and quantization as separate processes.
• Assume x(n) is obtained by sampling a bandlimited signal at a rate at or above the Nyquist rate.
• Assume x(n) is known to infinite precision in amplitude.
• Need to quantize x(n) in some suitable manner.
Quantization and Encoding
• Coding is a two-stage process:
– quantization: x(n) → x′(n)
– encoding: x′(n) → c(n)
where Δ is the (assumed fixed) quantization step size.
• Decoding is a single-stage process:
– decoding: c′(n) → x′′(n)
• If c′(n) = c(n) (no errors in transmission), then x′′(n) = x′(n).
• In general x′′(n) ≠ x(n): coding and quantization lose information.
B-bit Quantization
• Use B-bit binary numbers to represent the quantized samples => 2^B quantization levels.
• Information rate of the coder: I = B·Fs = total bit rate in bits/second
– B = 16, Fs = 8 kHz => I = 128 kbps
– B = 8, Fs = 8 kHz => I = 64 kbps
– B = 4, Fs = 8 kHz => I = 32 kbps
• The goal of waveform coding is to get the highest quality at a fixed value of I (kbps), or equivalently to get the lowest value of I for a fixed quality.
• Since Fs is fixed, need the most efficient quantization methods to minimize I.
Quantization Basics
• Assume |x(n)| ≤ Xmax (possibly ∞).
• For a Laplacian density (where Xmax = ∞), one can show that 0.35% of the samples fall outside the range −4σx ≤ x(n) ≤ 4σx => large quantization errors for 0.35% of the samples.
• Can safely assume that Xmax is proportional to σx.
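The 0.35% figure follows directly from the Laplacian tail probability and is easy to verify:

```python
import math

# For a Laplacian density with variance sigma^2, the tail probability is
# P(|x| > 4*sigma) = exp(-4*sqrt(2)), the 0.35% figure quoted above.
p_outside = math.exp(-4 * math.sqrt(2))
print(round(100 * p_outside, 2))   # 0.35 (percent)
```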
Quantization Process
Uniform Quantizer
• The quantization range and levels are chosen such that the signal can easily be processed digitally.
Mid-Riser and Mid-Tread Quantizers
• mid-riser
– origin (x=0) in middle of rising part of the staircase
– same number of positive and negative levels
– symmetrical around origin.
• mid-tread
– origin (x=0) in middle of quantization level
– one more negative level than positive
– one quantization level of 0 (where a lot of activity occurs)
• Code words have direct numerical significance (sign-magnitude
representation for mid-riser, two’s complement for mid-tread).
• Uniform quantizers are characterized by:
– the number of levels, 2^B (B bits)
– the quantization step size, Δ
• If |x(n)| ≤ Xmax and x(n) has a symmetric density, then
Δ · 2^B = 2Xmax  =>  Δ = 2Xmax / 2^B
• If we write
x′(n) = x(n) + e(n)
with x(n) the unquantized speech sample and e(n) the quantization error, then
−Δ/2 ≤ e(n) ≤ Δ/2
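A minimal mid-riser implementation of the step-size relation above, with the Laplacian speech model used for the input (the signal parameters are illustrative):

```python
import numpy as np

def uniform_quantize(x, B, x_max):
    """B-bit mid-riser uniform quantizer over [-x_max, x_max]:
    Delta = 2*x_max / 2**B, output levels at (k + 1/2)*Delta."""
    delta = 2 * x_max / 2 ** B
    k = np.clip(np.floor(x / delta), -(2 ** (B - 1)), 2 ** (B - 1) - 1)
    return (k + 0.5) * delta

rng = np.random.default_rng(1)
x = np.clip(rng.laplace(scale=1 / np.sqrt(2), size=100000), -4.0, 4.0)  # sigma_x = 1
e = uniform_quantize(x, B=8, x_max=4.0) - x
delta = 2 * 4.0 / 2 ** 8
print(bool(np.max(np.abs(e)) <= delta / 2 + 1e-12))   # |e(n)| <= Delta/2
```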
Quantization Noise Model
• Quantization noise is a zero-mean, stationary white noise process:
E[e(n)e(n+m)] = σe² for m = 0, and 0 otherwise
• Quantization noise is uncorrelated with the input signal:
E[x(n)e(n+m)] = 0 for all m
• The distribution of quantization errors is uniform over each quantization interval:
pe(e) = 1/Δ for −Δ/2 ≤ e ≤ Δ/2, and 0 otherwise
=> ē = 0 and σe² = Δ²/12
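The model can be checked empirically; this sketch uses a fine mid-tread quantizer on the Laplacian speech model (the parameters are illustrative):

```python
import numpy as np

# For a step size Delta much smaller than the signal's spread, the
# quantization error is roughly uniform on [-Delta/2, Delta/2], with
# zero mean and variance Delta^2/12.
rng = np.random.default_rng(2)
delta = 0.01
x = rng.laplace(scale=1 / np.sqrt(2), size=1_000_000)   # sigma_x = 1
e = delta * np.round(x / delta) - x                     # mid-tread error
print(abs(e.mean()) < 1e-4, abs(e.var() / (delta**2 / 12) - 1) < 0.05)
```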
SNR for Quantization
• With Δ = 2Xmax/2^B and σe² = Δ²/12:
SNR = σx²/σe² = 3 · 2^(2B) · (σx/Xmax)²
SNR(dB) = 6.02B + 4.77 − 20 log10(Xmax/σx)
Review of Quantization
Assumptions
• Input signal fluctuates in a complicated manner so a
statistical model is valid.
• Quantization step size is small enough to remove
any signal correlated patterns in quantization error.
• Range of quantizer matches peak-to-peak range of
signal, utilizing full quantizer range with essentially
no clipping.
• For a uniform quantizer with a peak-to-peak range of
±4σx, the resulting SNR(dB) is 6B-7.2.
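Under exactly these assumptions, the 6B − 7.2 figure can be reproduced empirically; the Laplacian input clipped to ±4σx is an illustrative choice:

```python
import numpy as np

def empirical_snr_db(B, n=1_000_000, seed=3):
    """SNR of a B-bit uniform quantizer whose +/-4*sigma_x range matches the
    signal's peak-to-peak range (essentially no clipping, as assumed above)."""
    rng = np.random.default_rng(seed)
    x_max = 4.0                                          # sigma_x = 1
    x = np.clip(rng.laplace(scale=1 / np.sqrt(2), size=n), -x_max, x_max)
    delta = 2 * x_max / 2 ** B
    k = np.clip(np.floor(x / delta), -(2 ** (B - 1)), 2 ** (B - 1) - 1)
    e = (k + 0.5) * delta - x
    return 10 * np.log10(np.mean(x**2) / np.mean(e**2))

for B in (6, 8, 10):
    print(B, round(empirical_snr_db(B), 1), round(6 * B - 7.2, 1))  # tracks 6B - 7.2
```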
Instantaneous Companding
• To obtain constant percentage error (rather than constant variance error), need logarithmically spaced quantization levels:
– quantize the logarithm of the input signal rather than the input signal itself
Insensitivity to Signal Level
y(n) = ln|x(n)|
x(n) = exp[y(n)] · sign[x(n)]
where sign[x(n)] = −1 for x(n) ≤ 0, and +1 for x(n) > 0.
The quantized log magnitude is
y′(n) = Q[ln|x(n)|] = ln|x(n)| + e(n)
where e(n) is a new error signal. Assume that e(n) is independent of ln|x(n)|. The inverse is
x′(n) = exp[y′(n)] · sign[x(n)] = x(n) · exp[e(n)]
Assume e(n) is small; then
exp[e(n)] ≈ 1 + e(n) + …
x′(n) ≈ x(n)[1 + e(n)] = x(n) + e(n)x(n) = x(n) + f(n)
Since x(n) and e(n) are independent,
σf² = σx² · σe²
SNR = σx² / σf² = 1/σe²
i.e. the SNR is independent of the signal variance σx².
Pseudo-Logarithmic Compression
• Unfortunately true logarithmic compression is not practical, since the dynamic range (ratio between the largest and smallest values) is infinite => need an infinite number of quantization levels.
• Need an approximation to logarithmic compression => µ-law/A-law compression.
µ-law Compression
• y(n) = F[x(n)] = Xmax · [ln(1 + µ|x(n)|/Xmax) / ln(1 + µ)] · sign[x(n)]
• When x(n) = 0, y(n) = 0.
• When µ = 0, y(n) = x(n): no compression.
• When µ is large and |x(n)| is large:
y(n) ≈ Xmax · [log(µ|x(n)|/Xmax) / log µ] · sign[x(n)]
i.e. the characteristic is approximately logarithmic.
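The compressor and its inverse (expander) can be sketched directly from the formula above; µ = 255 and Xmax = 1 are the usual telephony choices:

```python
import numpy as np

def mu_law_compress(x, mu=255.0, x_max=1.0):
    """y = Xmax * ln(1 + mu*|x|/Xmax) / ln(1 + mu) * sign(x)."""
    return x_max * np.log1p(mu * np.abs(x) / x_max) / np.log1p(mu) * np.sign(x)

def mu_law_expand(y, mu=255.0, x_max=1.0):
    """Inverse: x = (Xmax/mu) * ((1 + mu)**(|y|/Xmax) - 1) * sign(y)."""
    return (x_max / mu) * ((1 + mu) ** (np.abs(y) / x_max) - 1) * np.sign(y)

x = np.linspace(-1, 1, 2001)
round_trip_ok = np.allclose(mu_law_expand(mu_law_compress(x)), x)
small = mu_law_compress(np.array([0.01]))[0]
print(bool(round_trip_ok), bool(small > 0.2))  # 1% of full scale maps above 20%
```

The second check shows why companding helps: small amplitudes, which dominate speech, are expanded before the uniform quantizer sees them.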
SNR for µ-law Quantizer

SNR(dB) = 6B + 4.77 − 20 log10[ln(1 + µ)] − 10 log10[1 + √2·Xmax/(µσx) + (Xmax/(µσx))²]

• 6B dependence on B: good
• Much less dependence on Xmax/σx: good
• For large µ, SNR is less sensitive to changes in Xmax/σx: good
• µ-law has been used in wireline telephony for more than 30 years.
Companding
µ-Law Companding
Quantization for Optimum SNR
• The goal is to match the quantizer to the actual signal density to achieve optimum SNR.
• µ-law tries to achieve constant SNR over a wide range of signal variances => some sacrifice in SNR performance relative to a quantizer whose step size is matched to the signal variance.
• If σx is known, you can choose the quantizer levels to minimize the quantization error variance and maximise the SNR.
Quantizer Levels for Maximum SNR
• The variance of the quantization noise is
σe² = E[e²(n)] = E[(x′(n) − x(n))²]
with x′(n) = Q[x(n)].
• Assume quantization levels
{x′−M/2, …, x′−1, x′1, …, x′M/2}
• Associate each quantization level with a signal interval:
x′j = quantization level for the interval [xj−1, xj]
• For symmetric, zero-mean distributions with unbounded amplitudes it makes sense to define the boundaries
x0 = 0 (central boundary point), x±M/2 = ±∞
• The error variance is thus
σe² = Σj ∫[xj−1, xj] (x − x′j)² p(x) dx
Optimum Quantization Levels
Solution for Optimum Levels
• To solve for the optimum values of {x′i} and {xi}, differentiate σe² with respect to the parameters, set the derivatives to 0, and solve numerically (proof?):
xi = (x′i + x′i+1) / 2,  i = 1, 2, …, M/2 − 1
∫[xi−1, xi] (x − x′i) p(x) dx = 0,  i = 1, 2, …, M/2
• With boundary conditions x0 = 0, x±M/2 = ±∞
• Can also constrain the quantizer to be uniform and solve for the value of Δ that maximizes the SNR.
• The optimum boundary points lie halfway between adjacent quantizer levels.
• The optimum quantization level x′i is at the centroid of the probability density over the interval xi−1 to xi.
• Solve the above set of equations iteratively to obtain {x′i}, {xi}.
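The iterative solution (the Lloyd-Max procedure) is short in code. This sketch uses a unit-variance Gaussian rather than a speech density so the result can be compared against the well-known 2-bit optimum levels of about ±0.4528 and ±1.510:

```python
import numpy as np

def lloyd_max(pdf, levels, x_grid, iters=200):
    """Alternate the two optimality conditions: boundaries halfway between
    adjacent levels, levels at the centroid of the density per interval."""
    p = pdf(x_grid)
    for _ in range(iters):
        bounds = (levels[:-1] + levels[1:]) / 2             # halfway condition
        edges = np.concatenate(([x_grid[0]], bounds, [x_grid[-1]]))
        new = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            m = (x_grid >= lo) & (x_grid < hi)
            w = p[m]
            new.append(np.sum(x_grid[m] * w) / np.sum(w))   # centroid condition
        levels = np.array(new)
    return levels

x = np.linspace(-8, 8, 16001)
gauss = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
lv = lloyd_max(gauss, np.array([-1.5, -0.5, 0.5, 1.5]), x)
print(np.round(lv, 2))
```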
Uniform/ Non-uniform Quantizers
Adaptive Quantization
• Linear quantization => SNR depends on σx being constant (this is clearly not the case)
• Instantaneous companding => SNR only weakly dependent on Xmax/σx for large µ-law compression (µ = 100-500)
• Optimum SNR => minimize σe² when σx² is known, using a non-uniform distribution of quantization levels
• Quantization dilemma:
– want to choose the quantization step size large enough to accommodate the maximum peak-to-peak range of x(n);
– at the same time need to make the quantization step size small so as to minimize the quantization error.
• The non-stationary nature of speech (variability across sounds, speakers, and backgrounds) greatly compounds this problem.
Types of Adaptive Quantization
• Instantaneous: amplitude changes reflect the sample-to-sample variation of x(n) => rapid adaptation.
• Syllabic: amplitude changes reflect syllable-to-syllable variations in x(n) => slow adaptation.
• Feed-forward: adaptive quantizers that estimate σx² from x(n) itself.
• Feedback: adaptive quantizers that adapt the step size, Δ, on the basis of the quantized signal, x′(n) (or equivalently the codewords, c(n)).
Feed-Forward Adaptation
• Variable step size: assume a uniform quantizer with step size Δ(n).
• x(n) is quantized using Δ(n) => both c(n) and Δ(n) need to be transmitted to the decoder.
• If c′(n) = c(n) and Δ′(n) = Δ(n) (no errors in the channel), then x′′(n) = x′(n).
• The decoder does not have x(n) from which to estimate Δ(n), so Δ(n) must be transmitted; this is the major drawback of feed-forward adaptation.
Feed-Forward Quantizer
• Time-varying gain G(n) => c(n) and G(n) need to be transmitted to the decoder.
• G(n) cannot be estimated at the decoder => it has to be transmitted.
Feed-Forward Quantizers
• Feed-forward systems make estimates of σx², then make Δ or the quantization levels proportional to σx; the gain is made inversely proportional to σx.
• Assume σx² is proportional to the short-time energy
σ²(n) = Σm x²(m) h(n − m)
where h(n) is a lowpass filter, so that E[σ²(n)] ∝ σx².
• Consider h(n) = α^(n−1) for n ≥ 1, and 0 otherwise. Then
σ²(n) = Σ_{m=−∞}^{n−1} α^(n−1−m) x²(m),  0 < α < 1
which satisfies the recursion (proof?)
σ²(n) = α σ²(n−1) + x²(n−1)
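The recursion above gives a one-multiply-per-sample energy tracker; this sketch derives a step size from it and checks that it follows a level change (the scaling to ±4σ and the test signal are illustrative assumptions):

```python
import numpy as np

def feed_forward_delta(x, alpha=0.9, B=3):
    """Step size from sigma2(n) = alpha*sigma2(n-1) + x(n-1)^2, scaled so the
    B-bit quantizer range covers about +/- 4*sigma(n) (illustrative)."""
    sigma2 = np.zeros(len(x))
    for n in range(1, len(x)):
        sigma2[n] = alpha * sigma2[n - 1] + x[n - 1] ** 2
    sigma = np.sqrt((1 - alpha) * sigma2)  # undo the filter's 1/(1-alpha) gain
    return 2 * 4 * sigma / 2 ** B          # Delta(n) = 2*(4*sigma(n)) / 2^B

# A soft-to-loud level change: the step size should track it with a short lag.
rng = np.random.default_rng(4)
x = rng.standard_normal(4000) * np.concatenate([np.full(2000, 0.1), np.full(2000, 1.0)])
delta = feed_forward_delta(x)
ratio = delta[3500:].mean() / delta[1500:2000].mean()
print(bool(7 < ratio < 13))   # tracks the 10x (20 dB) amplitude step
```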
Feed-Forward Quantizers
• Δ(n) and G(n) vary slowly compared to x(n).
• They must be sampled and transmitted as part of the waveform coder parameters.
• The rate of sampling depends on the bandwidth of the lowpass filter h(n): for α = 0.99 the rate is about 13 Hz; for α = 0.9 it is about 135 Hz.
• It is reasonable to place limits on the variation of Δ(n) or G(n), of the form
Gmin ≤ G(n) ≤ Gmax
Δmin ≤ Δ(n) ≤ Δmax
• To obtain σy² ≈ constant over a 40 dB range in signal levels:
Gmax/Gmin = Δmax/Δmin = 100 (a 40 dB range)
Feed Forward Adaptation Gain
• Δ(n) or G(n) is evaluated after every M samples
• Use 128 to 1024 samples for estimation
• Adaptive quantizer achieves up to 5.6 dB better SNR than non-adaptive quantizers
• Can achieve this SNR with low "idle channel noise" and wide speech dynamic range by suitable choice of Δmin and Δmax
Feedback Adaptation
• σ²(n) is estimated from the quantizer output (or the codewords).
• Advantage of feedback adaptation: neither Δ(n) nor G(n) needs to be transmitted to the decoder, since they can be derived from the codewords.
• Disadvantage of feedback adaptation: increased sensitivity to errors in the codewords, since such errors affect Δ(n) and G(n).
Feedback Adaptation
• σ²(n) is based only on past values of x′(n):
σ²(n) = Σm x′²(m) h(n − m)
• Two typical windows/filters are:
h(n) = α^(n−1) for n ≥ 1, 0 otherwise
h(n) = 1/M for 1 ≤ n ≤ M, 0 otherwise  =>  σ²(n) = (1/M) Σ_{m=n−M}^{n−1} x′²(m)
• Can use very short window lengths (e.g. M = 2) and still achieve about 12 dB SNR for a B = 3 bit quantizer.
Alternative Approach to Adaptation
Input-output characteristic of a 3-bit adaptive quantizer
Optimal Step Size Multipliers
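A step-size-multiplier (Jayant-style) feedback quantizer applies Δ(n) = Δ(n−1)·M(|c(n−1)|), so the decoder can track the step size with no side information. The sketch below uses illustrative multiplier values, not the optimal ones from the table that accompanied this slide:

```python
import numpy as np

def jayant_quantize(x, B=3, delta0=0.01, d_min=1e-4, d_max=1.0):
    """Feedback-adaptive 3-bit mid-riser quantizer with codeword-dependent
    step-size multipliers (values below are illustrative, not optimal)."""
    half = 2 ** (B - 1)
    mult = np.array([0.9, 0.9, 1.25, 1.75])   # shrink on inner, expand on outer
    delta, xq = delta0, np.zeros(len(x))
    for n, s in enumerate(x):
        c = int(np.clip(np.floor(s / delta), -half, half - 1))
        xq[n] = (c + 0.5) * delta
        mag = c if c >= 0 else -c - 1          # level magnitude, 0 .. half-1
        delta = float(np.clip(delta * mult[mag], d_min, d_max))
    return xq

rng = np.random.default_rng(5)
x = rng.standard_normal(5000) * 0.3
e = jayant_quantize(x) - x
snr = 10 * np.log10(np.mean(x**2) / np.mean(e**2))
print(bool(snr > 8))   # recovers from the badly mismatched initial step size
```

Because the multiplier depends only on the previous codeword, the decoder runs the same `delta` update and stays in step, which is the whole point of feedback adaptation.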
Nonuniform Quantizer
• Transmitter: discrete samples → Compressor → Uniform Quantizer → digital signals → Channel
• Receiver: received digital signals → Decoder → Expander → output
• "Compressing-and-expanding" is called "companding."
Compression Techniques
• µ-law compressor (very popular internationally):
w2(t) = [ln(1 + µ|w1(t)|) / ln(1 + µ)] · sign[w1(t)],  |w1(t)| ≤ 1
In the U.S., µ = 255 is used.
• A-law compressor:
w2(t) = [A|w1(t)| / (1 + ln A)] · sign[w1(t)],  0 ≤ |w1(t)| ≤ 1/A
w2(t) = [(1 + ln(A|w1(t)|)) / (1 + ln A)] · sign[w1(t)],  1/A ≤ |w1(t)| ≤ 1
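The two-branch A-law characteristic and its exact inverse can be sketched as follows (A = 87.6 is the European standard value; the grid is illustrative):

```python
import numpy as np

def a_law_compress(w, A=87.6):
    """A-law compressor for |w| <= 1: linear below 1/A, logarithmic above."""
    aw = A * np.abs(w)
    y = np.where(aw <= 1, aw, 1 + np.log(np.maximum(aw, 1))) / (1 + np.log(A))
    return y * np.sign(w)

def a_law_expand(y, A=87.6):
    """Inverse of the A-law compressor."""
    ay = np.abs(y) * (1 + np.log(A))
    w = np.where(ay <= 1, ay / A, np.exp(np.minimum(ay, 1 + np.log(A)) - 1) / A)
    return w * np.sign(y)

w = np.linspace(-1, 1, 2001)
round_trip_ok = np.allclose(a_law_expand(a_law_compress(w)), w)
print(bool(round_trip_ok))
```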
Practical Implementation of µ-law compressor
Waveform Coders
Pulse Code Modulation (PCM)
• Needs the sampling frequency, fs, to be greater than the Nyquist rate (twice the maximum frequency in the signal)
• For n bits per sample there are 2^n quantization levels, and the quantisation noise power equals Δ²/12 (Δ = step size)
• Total bit rate = n·fs
• Can use non-uniform quantisation / variable-length codes
– logarithmic quantization (A-law, µ-law)
– adaptive quantization
G.711
• Pulse Code Modulation (PCM) codecs are the simplest form of waveform codecs.
• Narrowband speech is typically sampled 8000 times per second, and then each speech sample must be quantized.
• If linear quantization is used then about 12 bits per sample are needed, giving a bit rate of about 96 kbits/s.
• However this can be easily reduced by using non-linear quantization.
• For coding speech it was found that with non-linear quantization 8 bits per sample was sufficient for speech quality which is almost indistinguishable from the original.
• This gives a bit rate of 64 kbits/s; two such non-linear PCM codecs were standardised in the 1960s.
Waveform Coders
Differential Pulse Code Modulation (DPCM)
• Predict the next sample based on the last few decoded
samples
• Minimise mean squared error of prediction residual
– use LP coding
• Good prediction results in a reduction in the dynamic range
needed to code the prediction residual and hence a
reduction in the bit rate.
• Can use non-uniform quantisation or variable length codes
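The ideas above can be sketched as a first-order DPCM codec (illustrative: the predictor coefficient a = 0.9 and the coarse residual quantizer are my own choices). The predictor operates on locally decoded samples, so encoder and decoder stay in sync and quantization error does not accumulate:

```python
def dpcm_encode(samples, quantize, a=0.9):
    """First-order DPCM: predict each sample from the previous *decoded*
    sample, then quantize the prediction residual."""
    codes, x_hat = [], 0.0
    for x in samples:
        pred = a * x_hat
        d_q = quantize(x - pred)   # quantized residual: what gets transmitted
        codes.append(d_q)
        x_hat = pred + d_q         # local decoder keeps both ends in sync
    return codes

def dpcm_decode(codes, a=0.9):
    out, x_hat = [], 0.0
    for d_q in codes:
        x_hat = a * x_hat + d_q
        out.append(x_hat)
    return out
```

With an unbounded round-to-nearest residual quantizer of step 0.1, the reconstruction error stays within half a step at every sample.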
Differential PCM (DPCM)
• Fixed predictors can give from 4-11dB SNR
improvement over direct quantization (PCM)
• Most of the gain occurs with first order predictor
• Prediction up to 4th or 5th order helps
Another Implementation of DPCM
Quantization error is not accumulated.
For slowly varying signals, a future sample can be predicted from past samples.
A transversal filter can perform the prediction.

[Transmitter: s(t) minus the predictor output gives e(t); Receiver: e(t) plus the predictor output reconstructs s(t)]
DPCM with Adaptive Quantization
• Quantizer step size proportional to variance at quantizer input
• Can use d(n) or x(n) to control step size
• Get 5 dB improvement in SNR over µ-law non-adaptive PCM
• Get 6 dB improvement in SNR using the differential configuration with fixed prediction => ADPCM is about 10-11 dB better in SNR than PCM with a fixed quantizer.
Feedback ADPCM
• Can achieve same improvement in SNR as feed forward system
DPCM with Adaptive Prediction
• Need adaptive prediction to
handle non-stationarity of
speech.
• ADPCM encoders with
pole-zero decoder filters
have proved to be
particularly versatile in
speech applications.
• The ADPCM 32 kbits/s
algorithm adopted for the
G.721 CCITT standard
(1984) uses a pole-zero
adaptive predictor.
DPCM with Adaptive Prediction
• Prediction coefficients are assumed to be time-dependent, of the form

    x̃(n) = Σ_{k=1..p} α_k(n) x̃(n−k)

• Assume speech properties remain fixed over short time intervals.
• Choose the α_k(n)'s to minimize the average squared prediction error over
short intervals.
• The optimum predictor coefficients satisfy the relationships

    Σ_{k=1..p} α_k(n) R_n(j−k) = R_n(j),   j = 1, 2, ..., p

• Where R_n(j) is the short-time autocorrelation function of the form

    R_n(j) = Σ_m x(m) w(n−m) x(m+j) w(n−m−j),   0 ≤ j ≤ p

• w(n−m) is a window positioned at sample n of the input.
• Update every 10-20 msec.
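Because the normal equations above involve a Toeplitz autocorrelation matrix, they are usually solved with the Levinson-Durbin recursion. A pure-Python sketch (windowing omitted; the autocorrelation values R(0)..R(p) are assumed precomputed):

```python
def levinson_durbin(r, p):
    """Solve the normal equations sum_k alpha_k r[|j-k|] = r[j], j = 1..p.
    r = [R(0), R(1), ..., R(p)]; returns (coefficients, residual energy)."""
    a = [0.0] * (p + 1)        # a[1..p] hold the predictor coefficients
    e = r[0]                   # prediction error energy
    for i in range(1, p + 1):
        # reflection coefficient for order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= 1.0 - k * k
    return a[1:], e
```

For an AR(1)-like autocorrelation r[j] = 0.9^j the recursion recovers α1 = 0.9, α2 = 0, as expected.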
Prediction Gain for DPCM with
Adaptive Prediction
• Fixed prediction
10.5dB prediction gain
for large p.
• Adaptive prediction
14dB gain for large p.
• Adaptive prediction
more robust to
speaker and speech
material.
    Gp = 10 log10( E[x²(n)] / E[d²(n)] )   (dB)
Comparison of Coders
• 6 dB between curves
• Sharp increase in
SNR with both fixed
prediction and
adaptive quantization
• Almost no gain for
adapting first order
predictor
ADPCM G.721 Encoder
ADPCM G.721 Encoder
• The algorithm consists of an adaptive quantizer and an adaptive pole-zero predictor.
• The pole-zero predictor (2 poles, 6 zeros) estimates the input signal and hence reduces the error variance.
• The quantizer encodes the error sequence into a sequence of 4-bit words. The prediction coefficients are estimated using a gradient algorithm, and the stability of the decoder is checked by testing the two roots of A(z).
• The performance of the coder, in terms of the MOS scale, is above 4, but it degrades as the number of asynchronous tandem codings increases. The G.721 ADPCM algorithm was also modified to accommodate 24 and 40 kbits/s in the G.723 standard.
• The performance of ADPCM degrades quickly for rates below 24 kbits/s.
Delta Modulation
• Simplest form of differential quantization is in delta
modulation (DM).
• Sampling rate chosen to be many times the Nyquist
rate for the input signal => adjacent samples are
highly correlated.
• This leads to a high ability to predict x(n) from past
samples, with the variance of the prediction error
being very low, leading to a high prediction gain =>
can use simple 1-bit (2-level) quantizer =>the bit rate
for DM systems is just the (high) sampling rate of the
signal.
Linear Delta Modulation (LDM)
Linear Delta Modulation
2-level quantizer with fixed step size Δ:

    d′(n) = +Δ  if d(n) > 0   (c(n) = 1)
    d′(n) = −Δ  if d(n) < 0   (c(n) = 0)
Illustration of DM
• Basic equations of DM: x′(n) = α x′(n−1) + d′(n)
• When α ≈ 1 this is digital integration (accumulation of increments of ±Δ).
• d(n) = x(n) − α x′(n−1) ≈ x(n) − x(n−1) − e(n−1)
• d(n) is the first backward difference of x(n), i.e., an approximation to the derivative of the input.
• How big do we make Δ? At the maximum slope of xa(t) we need Δ/T ≥ max|dxa(t)/dt|,
• or else the reconstructed signal will lag the actual signal (the "slope overload" condition), producing quantization error called slope-overload distortion.
• Since x′(n) can only change by fixed increments of ±Δ, fixed-step DM is called linear DM or LDM.
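The LDM equations, with α = 1, amount to just a few lines of code. A sketch (function names and step sizes are illustrative):

```python
def ldm_encode(samples, delta):
    """Linear delta modulation: 1 bit per sample, fixed step size."""
    bits, x_hat = [], 0.0
    for x in samples:
        b = 1 if x - x_hat > 0 else 0   # c(n): sign of the difference
        bits.append(b)
        x_hat += delta if b else -delta
    return bits

def ldm_decode(bits, delta):
    """Accumulate +/- delta increments (digital integration)."""
    out, x_hat = [], 0.0
    for b in bits:
        x_hat += delta if b else -delta
        out.append(x_hat)
    return out
```

Running it on a ramp steeper than Δ per sample shows slope overload: the reconstruction cannot climb faster than Δ per sample and falls far behind the input.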
DM Waveform
DM Granular Noise
• When xa(t) has a small slope, Δ determines the peak error; when xa(t) = 0, the quantizer output will be an alternating sequence of 0's and 1's, and x′(n) will alternate around zero with peak variation Δ. This condition is called "granular noise".
• Need large step size to handle wide dynamic range
• Need small step size to accurately represent low level signals.
• With LDM we need to worry about the dynamic range and amplitude of the difference signal => choose Δ to minimize the mean-squared quantization error (a compromise between slope overload and granular noise).
Performance of DM Systems
• Normalized step size defined as Δ / E[(x(n) − x(n−1))²]^(1/2)
• Oversampling index defined as F0 = Fs/(2FN), where Fs is the sampling rate of the DM and FN is the Nyquist frequency of the signal.
• The total bit rate of the DM is BR = Fs = 2FN·F0
• For a given value of F0 there is an optimum value of Δ.
• Optimum SNR increases by 9 dB for each doubling of F0 => this is better than the 6 dB obtained by increasing the number of bits/sample by 1 bit.
• The curves are very sharp around the optimum value of Δ => SNR is very sensitive to the input level.
• For SNR = 35 dB with FN = 3 kHz => a 200 kbps rate; for toll quality a much higher rate is required.
Adaptive Delta Mod
• Step size adaptation for DM (from codewords)
– Δ(n) = M·Δ(n−1)
– Δmin ≤ Δ(n) ≤ Δmax
• M is a function of c(n) and c(n-1), since c(n) depends only on the sign of d(n)
– d(n) = x(n) - x’(n-1)
• The sign of d(n) can be determined before the actual quantized value d’(n) which needs the new value of Δ(n) for evaluation
• The algorithm for choosing the step size multiplier is
– M = P > 1 if c(n) = c(n-1)
– M = Q < 1 if c(n) ≠ c(n-1)
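The step-size rule above in code. A sketch with α = 1; P = 2 and Q = 1/2 are typical values, and everything else (names, limits) is an illustrative choice:

```python
def adm_codec(samples, d_min=0.01, d_max=1.0, P=2.0, Q=0.5):
    """Adaptive delta modulation: 1 bit/sample with step adaptation
    Delta(n) = M * Delta(n-1), clamped to [d_min, d_max].
    Returns (bits, reconstruction)."""
    bits, recon = [], []
    x_hat, step, prev_b = 0.0, d_min, None
    for x in samples:
        b = 1 if x - x_hat > 0 else 0      # sign of d(n): needs no new step size
        if prev_b is not None:
            m = P if b == prev_b else Q    # grow on runs, shrink on alternation
            step = min(d_max, max(d_min, step * m))
        x_hat += step if b else -step
        bits.append(b)
        recon.append(x_hat)
        prev_b = b
    return bits, recon
```

On a ramp far too steep for the minimum step, the step size grows exponentially during the run of 1's and the reconstruction catches up, unlike fixed-step LDM at the same bit rate.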
Adaptive DM Performance
• Slope overload in LDM causes runs of 0’s or 1’s
• Granularity causes runs of alternating 0’s and 1’s
• The figure above shows how adaptive DM performs with P = 2, Q = 1/2, α = 1.
• During slope overload, step size increases exponentially to follow
increase in waveform slope.
• During granularity, step size decreases exponentially to Δmin and
stays there as long as the slope is small.
ADM Parameter Behavior
• ADM parameters are P, Q, Δmin
and Δmax
• Choose Δmax/ Δmin to maintain
high SNR over range of input
signal levels.
• Δmin should be chosen to
minimize idle channel noise.
• P and Q should satisfy PQ ≤ 1 for
stability.
Comparison of LDM, ADM, and log PCM
• ADM is 8 dB better SNR at 20
kbps than LDM, and 14 dB
better SNR at 60 kbps than
LDM.
• ADM gives a 10 dB increase in
SNR for each doubling of the bit
rate; LDM gives about 6 dB.
• For bit rate below 40 kbps, ADM
has higher SNR than µ-law
PCM; for higher bit rates log
PCM has higher SNR.
Higher Order Prediction in DM
Waveform Coding versus Block
Processing
• Waveform coding
– sample-by-sample matching of waveforms
– coding quality measured using SNR
• Source modeling (block processing)
– block processing of signal => vector of outputs every block
– overlapped blocks
Adaptive Predictive Coder
Transmitter
Receiver
Adaptive Predictive Coder
• The use of adaptive long-term prediction in addition to short-term prediction provides additional coding gain (at the expense of higher complexity) and high-quality speech at 16 kbits/s. The long-term (long-delay) predictor

    A_L(z) = Σ_j a_j z^(−(i+j))

• provides for the pitch (fine) structure of the short-time voiced spectrum. The index i is the pitch period in samples and j is a small integer. The long-term predictor (ideally) removes the periodicity and thereby redundancy.
• At the receiver the long-term synthesis filter introduces periodicity while the synthesis filter associated with the short-term prediction polynomial represents the vocal tract.
• The parameters of the short-term predictors are computed for every frame (typically 10 to 30 ms). The long-term prediction parameters are computed more often.
Model-Based Speech Coding
• Waveform coding based on optimizing and maximizing SNR has been taken about as far as possible.
– achieved bit-rate reductions on the order of 4:1 (i.e., from 128 kbps PCM to 32 kbps ADPCM) while achieving toll-quality SNR for telephone-bandwidth speech
• To lower the bit rate further without reducing speech quality, we need to exploit features of the speech production model, including:
– source modeling
– spectrum modeling
– use of codebook methods for coding efficiency
• We also need a new way of comparing performance of different waveform and model-based coding methods:
– an objective measure like SNR isn't an appropriate measure for model-based coders, since they operate on blocks of speech and don't follow the waveform on a sample-by-sample basis.
– new subjective measures are needed that assess user-perceived quality, intelligibility, and robustness to multiple factors.
Frequency Domain Speech Coding
• All frequency domain methods for speech coding
exploit the Short-Time Fourier Transform using a
filter bank view with scalar quantization
• Sub-band Coding-use small number of filters with
wide and overlapping bandwidths
• 2-band sub-band coder
– advantage of sub-band coder is that the quantization noise
is limited to the sub-band that generated it => better
perceptual control of noise spectrum
– with careful design of filters, can get complete cancellation
of quantization noise that leaks across bands => use QMF-
Quadrature Mirror Filters
– can continue to split lower bands into 2 bands, giving
octave band filter bank => auditory front-end like analysis.
Sub-band coders
• Exploit the frequency sensitivity of the auditory
system.
• Split the signal into sub-band using band pass
filters.
• Code each sub-band at an appropriate resolution
– e.g. 4 bits per sample in the lower sub-bands and
– 2 bits per sample in the upper sub-bands
• Can also exploit auditory masking
– use fewer bits if a neighbouring sub-band is much louder
• Basis for the MPEG audio standard (5:1
compression of CD quality audio with no perceptual
degradation)
Sub-band coder
• The 16 kbits/s SBC compared favorably against 16 kbits/s ADPCM, and the 9.6 kbits/s SBC compared favorably against 10.3 and 12.9 kbits/s ADM.
• The low-band filters in speech-specific implementations are usually given narrower widths so that they can resolve more accurately the low-frequency narrowband formants.
• In the absence of quantization noise, perfect reconstruction can be achieved using Quadrature-Mirror Filter (QMF) banks.
AT&T Sub-band coder
AT&T Sub-band coder
• The AT&T SBC was used for voice storage at 16 or 24
kbits/s and consists of a five-band non-uniform tree-
structured QMF bank in conjunction with APCM coders.
• A silence compression algorithm is also part of the
standard. The frequency ranges for each band are: 0-0.5
kHz, 0.5-1 kHz, 1-2 kHz, 2-3 kHz, and 3-4 kHz. For the 16
kbits/s implementation the bit allocations are {4/4/2/2/0}
and for the 24 kbits/s the bit assignments are {5/5/4/3/0}.
The one-way delay of this coder is less than 18 ms. It
must be noted that although this coder was the
workhorse for the older AT&T voice store and forward
machines, the most recent AT&T machines use the new
16 kbits/s Low Delay CELP algorithm.
G.722 Sub-band coder
CCITT standard (G.722)
• The CCITT standard (G.722) for 7kHz audio at 64 kbits/s for
ISDN teleconferencing is based on a two-band sub-
band/ADPCM coder.
• The low-frequency sub-band is quantized at 48 kbits/s while the
high-frequency sub-band is coded at 16 kbits/s.
• The G.722 coder includes an adaptive bit allocation scheme
and an auxiliary data channel.
• Provisions for lower rates have been made by quantizing the
low-frequency sub-band at 40 kbits/s or at 32 kbits/s.
• The MOS at 64 kbits/s is greater than four for speech and
slightly less than four for music signals, and the analysis-
synthesis QMF banks introduce a delay of less than 3 ms.
Introduction to VQ
• Vector quantization (VQ) is a lossy data
compression method
• based on the principle of block coding.
• It is a fixed-to-fixed length algorithm.
• In 1980, Linde, Buzo, and Gray (LBG) proposed a VQ
design algorithm based on a training sequence.
• The use of a training sequence bypasses the need
for multi-dimensional integration. This algorithm is
referred to as the LBG-VQ algorithm.
• Toll quality speech coder (digital wireline phone)
– G.711 (A-LAW and μ-LAW at 64 kbits/sec)
– G.721 (ADPCM at 32 kbits/ sec)
– G.723 (ADPCM at 40, 24 kbps)
– G.726 (ADPCM at 16,24,32,40 kbps)
• Low bit rate speech coder (cellular phone/IP phone)
– G.728 low delay (16 kbps, delay <2ms, same or better quality
than G.721)
– G. 723.1 (CELP Based, 5.3 and 6.4 kbits/sec)
– G.729 (CELP based, 8 kbits/s)
– GSM 06.10 (13 and 6.5 kbits/sec, simple to implement, used in
GSM phones)
1-D VQ
• A VQ is nothing more than an approximator.
• It is similar to "rounding off". An example of a 1-dimensional VQ is shown below.
• Here, every number less than -2 is approximated by -3. Every number between -2 and 0 is approximated by -1. Every number between 0 and 2 is approximated by +1. Every number greater than 2 is approximated by +3. Note that the approximate values are uniquely represented by 2 bits. This is a 1-dimensional, 2-bit VQ. It has a rate of 2 bits/dimension.
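That 1-dimensional, 2-bit quantizer can be written out directly (a sketch; how boundary points are tie-broken is my own choice):

```python
CODEBOOK = [-3.0, -1.0, 1.0, 3.0]    # the four codevectors
BOUNDARIES = [-2.0, 0.0, 2.0]        # edges of the encoding regions

def vq_1d_encode(x):
    """Return the 2-bit index of the encoding region containing x."""
    idx = 0
    for b in BOUNDARIES:
        if x >= b:                   # boundary points go to the upper region
            idx += 1
    return idx

def vq_1d_decode(idx):
    """Map a 2-bit index back to its codevector."""
    return CODEBOOK[idx]
```

Each input is transmitted as a 2-bit index; the decoder only ever sees the codebook, which is the essence of fixed-to-fixed-length block coding.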
2-D VQ
• An example of a 2-
dimensional VQ is shown
below:
• Here, every pair of numbers
falling in a particular region
are approximated by a red star
associated with that region.
Note that there are 16 regions
and 16 red stars -- each of
which can be uniquely
represented by 4 bits. Thus,
this is a 2-dimensional, 4-bit
VQ. Its rate is also 2
bits/dimension.
• In the above two examples, the red stars are called
codevectors and the regions defined by the blue
borders are called encoding regions. The set of all
codevectors is called the codebook and the set of all
encoding regions is called the partition of the space.
Design Problem
• Given a
– vector source with its statistical properties known
– a distortion measure
– the number of codevectors
• To find
– codebook (the set of all red stars)
– partition (the set of blue lines) which result in the
smallest average distortion.
Design Problem Cont.
• Assume that there is a training sequence consisting of M source vectors: T = {x1, x2, ..., xM}.
• This training sequence can be obtained from some large database.
– For example, if the source is a speech signal, then the training sequence can be obtained by recording several long telephone conversations.
– M is assumed to be sufficiently large so that all the statistical properties of the source are captured by the training sequence.
Design Problem Cont.
• Assume that the source vectors are k-dimensional.
• Let N be the number of codevectors and let C = {c1, c2, ..., cN} represent the codebook.
• Each codevector cn is k-dimensional, e.g., cn = (cn,1, cn,2, ..., cn,k).
• Let Sn be the encoding region associated with codevector cn, and let P = {S1, S2, ..., SN} denote the partition of the space.
Design Problem Cont.
• If the source vector xm is in the encoding region Sn, then its
approximation (denoted by Q(xm)) is cn.
• Assuming a squared-error distortion measure, the average distortion is given by:

    Dave = (1/(Mk)) Σ_{m=1..M} ||xm − Q(xm)||²
• The design problem can be succinctly stated as follows: Given
T and N, find C and P such that Dave is minimized.
Optimality Criteria
• If C and P are a solution to the above minimization problem, then they must satisfy the following two criteria.
• Nearest Neighbor Condition:
– This condition says that the encoding region Sn should consist of all vectors that are closer to cn than to any of the other codevectors. For vectors lying on the boundary (blue lines), any tie-breaking procedure will do.
• Centroid Condition:

    cn = ( Σ_{xm ∈ Sn} xm ) / ( Σ_{xm ∈ Sn} 1 ),   n = 1, 2, ..., N

– This condition says that the codevector cn should be the average of all those training vectors that are in encoding region Sn. In implementation, one should ensure that at least one training vector belongs to each encoding region (so that the denominator in the above equation is never 0).
LBG Design Algorithm
• Iterative algorithm which solves the two optimality criteria.
• Requires initial codebook C(0).
• The initial codebook is obtained by the splitting method.
• In this method, an initial codevector is set as the average of the entire training sequence.
• This codevector is then split into two. The iterative algorithm is run with these two vectors as the initial codebook. The final two codevectors are split into four, and the process is repeated until the desired number of codevectors is obtained.
LBG Design Algorithm Cont.
1. Given the training sequence T. Fix ε > 0 to be a "small" number.
2. Let N = 1. Set c_1* to the average of the entire training sequence and calculate the initial average distortion D*_ave.
3. Splitting: for n = 1, 2, …, N set c_n^(0) = (1 + ε)·c_n* and c_(N+n)^(0) = (1 − ε)·c_n*. Set N = 2N.
4. Iteration: Let D_ave^(0) = D*_ave. Set the iteration index i = 0.
i. For m = 1, 2, …, M, find the minimum value of ||x_m − c_n^(i)||² over all n = 1, 2, …, N. Let n* be the index which achieves the minimum. Set Q(x_m) = c_(n*)^(i).
![Page 105: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/105.jpg)
LBG Design Algorithm Cont.
ii. For n = 1, 2, …, N, update the codevector: c_n^(i+1) = average of all training vectors x_m for which Q(x_m) = c_n^(i).
iii. Set i = i + 1.
iv. Calculate the average distortion D_ave^(i) = (1/Mk) Σ_m ||x_m − Q(x_m)||².
v. If (D_ave^(i−1) − D_ave^(i)) / D_ave^(i−1) > ε, go back to Step (i).
vi. Set D*_ave = D_ave^(i). For n = 1, 2, …, N, set c_n* = c_n^(i) as the final
codevectors.
5. Repeat steps 3 and 4 until the desired number of
codevectors is obtained.
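The whole design loop can be sketched end-to-end in NumPy (a minimal illustration, not a production implementation; the function name `lbg` and the array layout are assumptions, and the same ε is reused for both the splitting perturbation and the stopping test):

```python
import numpy as np

def lbg(training, n_codevectors, eps=0.01):
    """Sketch of LBG design: split the codebook, then alternate the
    nearest-neighbor and centroid rules until the relative drop in
    average distortion falls below eps."""
    cb = training.mean(axis=0, keepdims=True)              # step 2: N = 1
    while len(cb) < n_codevectors:
        cb = np.vstack([(1 + eps) * cb, (1 - eps) * cb])   # step 3: N = 2N
        d_prev = np.inf
        while True:                                        # step 4: iterate
            # nearest-neighbor rule: assign each vector to its closest codevector
            dists = ((training[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
            idx = dists.argmin(axis=1)
            d_ave = dists[np.arange(len(training)), idx].mean()
            # centroid rule: move each codevector to the mean of its region
            for n in range(len(cb)):
                members = training[idx == n]
                if len(members):
                    cb[n] = members.mean(axis=0)
            if (d_prev - d_ave) / d_ave <= eps:            # stop test
                break
            d_prev = d_ave
    return cb
```

On data with two well-separated clusters, the first split already pulls the two perturbed codevectors toward the two cluster means.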
![Page 106: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/106.jpg)
Performance
• The performance of VQ is typically given in terms of the signal-to-distortion ratio (SDR):
– SDR = 10·log10(σ_x² / D_ave) (in dB),
• where σ_x² is the variance of the source and D_ave is the average squared-error distortion. The higher the SDR, the better the performance.
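The definition above amounts to a one-line computation (the function name is illustrative):

```python
import math

def sdr_db(signal_variance, avg_distortion):
    """SDR = 10 * log10(sigma_x^2 / D_ave), in dB."""
    return 10 * math.log10(signal_variance / avg_distortion)
```

For example, a unit-variance source quantized with average distortion 0.01 has an SDR of 20 dB.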
![Page 107: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/107.jpg)
Toy Example of VQ Coding
• 2-pole model of the vocal tract => 4 reflection coefficients
• 4 possible vocal tract shapes => 4 sets of reflection coefficients
• 1. Scalar quantization: assume 4 values for each reflection coefficient => 2 bits x 4 coefficients = 8 bits/frame
• 2. Vector quantization: only 4 possible vectors => 2 bits to choose which of the 4 vectors to use for each frame (a pointer into a codebook)
• This works because the scalar components of each vector are highly correlated.
• If the scalar components are independent, VQ offers no advantage over scalar quantization.
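The bit counts in the toy example can be checked directly (variable names are invented for the illustration):

```python
import math

n_coeffs = 4           # reflection coefficients per frame
levels = 4             # quantizer levels / possible vectors

# Scalar quantization: 2 bits for each of the 4 coefficients
bits_scalar = n_coeffs * math.ceil(math.log2(levels))

# Vector quantization: one 2-bit index into a 4-entry codebook
bits_vector = math.ceil(math.log2(levels))

print(bits_scalar, bits_vector)   # 8 2
```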
![Page 108: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/108.jpg)
Comparison of Scalar and Vector
Quantization
![Page 109: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/109.jpg)
VQ Codebook of LPC Vectors
• 64 vectors in a codebook of spectral shapes.
• 10-bit VQ is comparable to 24-bit scalar quantization.
![Page 110: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/110.jpg)
VQ Properties
• In VQ a cell can have arbitrary size and shape.
• In scalar quantization a decision region can have arbitrary size, but its shape is fixed.
• VQ uses a distortion measure (a distance between the input and the output) both to design the codebook vectors and to choose the optimal reconstruction vector.
Iterative VQ Design Algorithm
• 1. Assume an initial set of points is given:
– map all vectors to the best (nearest) set of points;
– recompute the centroids from the ensemble of vectors in each cell.
• 2. Iterate until the change in reconstruction levels is small.
• Problem: need to know p_x(x) to correctly compute the centroids.
• Solution: use a training set as an estimate of the ensemble (the k-means clustering algorithm).
![Page 111: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/111.jpg)
VQ Performance
• The simplest form of a vector quantizer can be considered as a generalization of scalar PCM and is called Vector PCM (VPCM). In VPCM, the codebook is fully searched (full-search VQ, or F-VQ) for each incoming vector. The number of bits per sample in VPCM is given by
– B = (log2 N)/k
• and the signal-to-noise ratio for VPCM is given by
– SNR_k = 6B + K_k (dB)
• VPCM yields improved SNR since it exploits the correlation within the vectors. In the case of speech coding, it is reported that K_2 is larger than K_1 by more than 3 dB, while K_8 is larger than K_1 by more than 8 dB.
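The two formulas above translate directly into code (a sketch; the function names are invented, and K_k must be supplied, e.g. the values reported on this slide):

```python
import math

def vpcm_bits_per_sample(n, k):
    """B = (log2 N) / k for an N-entry codebook of dimension k."""
    return math.log2(n) / k

def vpcm_snr_db(b, k_k):
    """SNR_k = 6B + K_k (dB); K_k is the dimension-dependent constant."""
    return 6 * b + k_k
```

For example, an N = 256 codebook of dimension k = 8 costs B = 1 bit per sample.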
![Page 112: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/112.jpg)
VQ Performance
• Even though VQ offers significant coding gain by increasing N and k, its
memory and computational complexity grow exponentially with k for a given
rate.
• More specifically, the number of computations required for F-VQ (full-search
VQ) is of the order of 2^(Bk), while the number of memory locations required
is k·2^(Bk).
• In general, the benefits of VQ are realized at rates of 1 bit per sample.
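The exponential growth is easy to see numerically (an illustrative snippet; the function name is invented):

```python
# Full-search VQ at B bits/sample and dimension k: the codebook has
# N = 2**(B*k) codevectors, so the search cost grows as 2**(B*k)
# distance computations and storage as k * 2**(B*k) values.
def fvq_complexity(b, k):
    searches = 2 ** (b * k)   # distance computations per input vector
    memory = k * searches     # k stored components per codevector
    return searches, memory
```

At B = 1 bit/sample, doubling the dimension from k = 8 to k = 16 takes the search from 256 to 65536 distance computations per vector.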
![Page 113: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/113.jpg)
GS-VQ
• The complexity of VQ can also be reduced by normalizing the vectors of the codebook and encoding the gain separately. The technique is called Gain/Shape VQ (GS-VQ) and was introduced by Buzo and later studied by Sabin and Gray.
• The waveform shape is represented by a codevector from the shape codebook, while the gain is encoded from the gain codebook.
• Comparisons of GS-VQ with F-VQ in the case of speech coding at one bit per sample revealed that GS-VQ yields about 0.7 dB improvement at the same level of complexity.
(Figure: GS-VQ encoder and decoder block diagrams.)
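The gain/shape split can be sketched as follows (an illustrative NumPy sketch, not the exact encoder of the slide; codebook contents and function names are assumptions, with the shape chosen by maximum correlation against unit-norm shape codevectors):

```python
import numpy as np

def gs_vq_encode(x, shape_codebook, gain_codebook):
    """Quantize the unit-norm shape and the gain of x separately."""
    gain = np.linalg.norm(x)
    shape = x / gain if gain > 0 else shape_codebook[0]
    # best shape: highest correlation with the unit-norm codevectors
    s_idx = int(np.argmax(shape_codebook @ shape))
    # best gain: nearest entry in the gain codebook
    g_idx = int(np.argmin(np.abs(gain_codebook - gain)))
    return s_idx, g_idx

def gs_vq_decode(s_idx, g_idx, shape_codebook, gain_codebook):
    """Reconstruct as (quantized gain) * (quantized shape)."""
    return gain_codebook[g_idx] * shape_codebook[s_idx]
```

The two small codebooks replace one large product codebook, which is where the complexity saving comes from.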
![Page 114: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/114.jpg)
Adaptive VQ
• The VQ methods discussed thus far are associated with time-invariant (fixed) codebooks. Since speech is a non-stationary process, one would like to adapt the codebooks ("codebook design on the fly") to its changing statistics.
• VQ with adaptive codebooks is called adaptive VQ (A-VQ), and applications to speech coding have been reported.
• There are two types of A-VQ, namely, forward adaptive and backward adaptive.
• In backward adaptive VQ, codebook updating is based on past data which is also available at the decoder.
• Forward A-VQ updates the codebooks based on current (or sometimes future) data, and as such additional information must be encoded.
• The principles of forward and backward A-VQ are similar to those of scalar adaptive quantization.
• Practical A-VQ systems are backward adaptive, and they can be classified into vector predictive quantizers and finite-state quantizers. Vector predictive coders are essentially an extension of scalar predictive DPCM coders.
![Page 115: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/115.jpg)
Implementation Issues
• The complexity in high-dimensionality VQ can be reduced significantly
with the use of structured codebooks which allow for efficient search.
• Tree-structured and multistep vector quantizers are associated with
lower encoding complexity at the expense of loss of performance and in
some cases increased memory requirements.
• Gray and Abut compared the performance of F-VQ and binary tree
search VQ for speech coding and reported a degradation of 1 dB in the
SNR for tree-structured VQ.
• Multistep vector quantizers consist of a cascade of two or more
quantizers each one encoding the error or residual of the previous
quantizer.
• Gersho and Cuperman compared the performance of full search
(dimension 4) and multistep vector quantizers (dimension 12) for
encoding speech waveforms at 1 bit per sample and reported a gain of 1
dB in the SNR in the case of multistep VQ.
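A residual cascade of the kind described above can be sketched as follows (an illustrative NumPy sketch; the codebooks and names are invented for the example):

```python
import numpy as np

def _quantize(x, codebook):
    """Full search: return the codevector closest to x (squared error)."""
    d = ((codebook - x) ** 2).sum(axis=1)
    return codebook[int(np.argmin(d))]

def multistep_vq(x, stage_codebooks):
    """Cascade of quantizers: each stage encodes the residual (error)
    left over by the previous stage."""
    approx = np.zeros_like(x)
    for cb in stage_codebooks:
        residual = x - approx
        approx = approx + _quantize(residual, cb)
    return approx
```

Each stage searches only its own small codebook, so the total search cost is the sum, not the product, of the stage codebook sizes.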
![Page 116: CODING OF SPEECH SIGNALS• Instantaneous companding => SNR only weakly dependent on Xmax/ x for large µ-law compression (100-500) • Optimum SNR => minimize 2 e when 2 x is known,](https://reader031.vdocuments.site/reader031/viewer/2022011818/5e8c5fea66cbc429aa36bfa8/html5/thumbnails/116.jpg)
I SEMESTER M. TECH. SESSIONAL TEST 2005-06
VOICE AND PICTURE CODING (EL-653)
Differentiate between Vowels and Diphthongs. (4)
Calculate the pitch in mels for a 5000 Hz signal. (4)
Prove that the optimum quantizer output level is at the centroid of the probability density
function over the quantization interval.
For an AT&T sub-band coder the frequency ranges for each band are: 0-0.5 kHz, 0.5-1 kHz,
1-2 kHz, 2-3 kHz, and 3-4 kHz. For the bit allocation {4/4/2/2/0}, calculate the bit rate. Explain
why 0 bits are allocated to the 3-4 kHz band.
Ans: 4x1 + 4x1 + 2x2 + 2x2 = 16 kbps; only about 3.4 kHz of bandwidth is available in telephone speech, so the 3-4 kHz band carries little signal energy.
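The answer can be checked numerically: each sub-band is sampled at twice its bandwidth (Nyquist), so the rate is the sum of (2 x bandwidth x bits) over the bands.

```python
# Sub-band coder bit rate for bands 0-0.5, 0.5-1, 1-2, 2-3, 3-4 kHz
# with bit allocation {4/4/2/2/0}.
bands_khz = [0.5, 0.5, 1.0, 1.0, 1.0]   # bandwidth of each band, kHz
bits = [4, 4, 2, 2, 0]
rate_kbps = sum(2 * bw * b for bw, b in zip(bands_khz, bits))
print(rate_kbps)   # 16.0
```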