ece 556 basics of digital speech processingece556.cankaya.edu.tr/uploads/files/sunu2.pdf · nyquist...
TRANSCRIPT
![Page 1: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/1.jpg)
ECE 556 BASICS OF DIGITAL SPEECH PROCESSING
Assıst.Prof.Dr. Selma ÖZAYDIN
Spring Term-2017
Lecture 2
![Page 2: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/2.jpg)
Analog Sound to Digital Sound
![Page 3: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/3.jpg)
Characteristics of Sound
• Amplitude
• Wavelength (w)
• Frequency ( )
• Timbre
• Hearing: [20Hz – 20KHz]
• Speech: [200Hz – 8KHz]
![Page 4: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/4.jpg)
Digital Representation of Audio
• Must convert wave form to digital
–sample
–quantize
–compress
![Page 5: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/5.jpg)
Signal Sampling
• Sampling is converting a continuous time signal into a discrete time signal
• Categories: – Impulse (ideal) sampling
– Natural Sampling
– Sample and Hold operation
• Measure amplitude at regular intervals
![Page 6: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/6.jpg)
Impulse Sampling
![Page 7: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/7.jpg)
Impulse Sampling
• Impulse train spaced at Ts multiplies the signal x(t) in time domain, creating – discrete time,
– continuous amplitude signal xs(t)
• Impulse train spaced at fs convolutes the signal X(f) in frequency domain, creating – Repeating spectrum Xs(f)
– spaced at fs
![Page 8: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/8.jpg)
The Aliasing Effect
fs > 2fm
fs < 2fm
Aliasing happens
![Page 9: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/9.jpg)
Aliasing
Under sampling will result in
aliasing that will create spectral overlap
![Page 10: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/10.jpg)
Ideal Sampling and Aliasing
• Sampled signal is discrete in time domain with spacing Ts
• Spectrum will repeat for every fs Hz
• Aliasing (spectral overlapping) if fs is too small (fs < 2fm)
• Nyquist sampling rate fs = 2fm
• Generally oversampling is done fs > 2fm
![Page 11: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/11.jpg)
Natural Sampling
![Page 12: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/12.jpg)
Natural Sampling
• Sampling pulse train has a finite width τ
• Sampled spectrum will repeat itself with a ‘Sinc’ envelope
• More realistic modeling
• Distortion after recovery depends on τ/Ts
![Page 13: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/13.jpg)
Different Sampling Models
![Page 14: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/14.jpg)
Nyquist Theorem
For lossless digitization, the sampling rate should be at least twice the maximum frequency response.
• In mathematical terms:
fs > 2*fm
• where fs is sampling frequency and fm is the maximum frequency in the signal
![Page 15: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/15.jpg)
Nyquist Limit • max data rate = 2 H log2V bits/second, where
H = bandwidth (in Hz) V = discrete levels (bits per signal change)
• Shows the maximum number of bits that can be sent per second on a noiseless channel with a bandwidth of H, if V bits are sent per signal – Example: what is the maximum data rate for a 3kHz channel
that transmits data using 2 levels (binary) ? – Solution: H = 3kHz; V = 2; max. data rate = 2x3,000xln2=6,000bits/second
![Page 16: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/16.jpg)
Limited Sampling
• But what if one cannot sample fast enough?
• Reduce signal frequency to half of maximum
sampling frequency
– low-pass filter removes higher-frequencies
– e.g., if max sampling frequency is 22kHz, must low-pass filter a signal down to 11kHz
![Page 17: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/17.jpg)
Sampling Ranges
• Auditory range 20Hz to 22.05 kHz – must sample up to to 44.1kHz
– common examples are 8.000 kHz, 11.025 kHz, 16.000 kHz, 22.05 kHz, and 44.1 KHz
• Speech frequency [200 Hz, 8 kHz] – sample up to 16 kHz
– but typically 4 kHz to 11 kHz is used
![Page 18: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/18.jpg)
![Page 19: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/19.jpg)
![Page 20: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/20.jpg)
![Page 21: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/21.jpg)
![Page 22: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/22.jpg)
Quantization • Quantization is done to make the signal
amplitude discrete
Analog Signal
Sam
plin
g
Discrete Time Cont. Ampl. Signal
Qu
anti
zati
on
Discrete Time & Discrete
Ampl Signal
Map
pin
g
Binary Sequence
-1.5
-1
-0.5
0
0.5
1
1.5
![Page 23: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/23.jpg)
![Page 24: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/24.jpg)
Sampling and 4-bit quantization
![Page 25: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/25.jpg)
Quantization
• Typically use – 8 bits = 256 levels
– 16 bits = 65,536 levels
• How should the levels be distributed? – Linearly? (PCM)
– Perceptually? (u-Law)
– Differential? (DPCM)
– Adaptively? (ADPCM)
![Page 26: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/26.jpg)
![Page 27: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/27.jpg)
Linear Quantization
L levels
(L-1)q = 2Vp = Vpp
For large L
Lq ≈ Vpp
![Page 28: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/28.jpg)
Linear Quantization Summary • Mean Squared Error (MSE) = q2/12
• Mean signal power = E[m2(t)]
• Mean SNR = 12 E[m2(t)]/q2
• For binary PCM, L = 2n n bits/sample
• Let signal bandwidth = B Hz – If Nyquist sampling 2B samples/sec
– If 20% oversampling 1.2(2B) samples/sec
• Bit rate = 2nB bits/sec
• Required channel bandwidth = nB Hz
![Page 29: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/29.jpg)
![Page 30: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/30.jpg)
![Page 31: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/31.jpg)
![Page 32: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/32.jpg)
![Page 33: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/33.jpg)
![Page 34: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/34.jpg)
![Page 35: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/35.jpg)
![Page 36: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/36.jpg)
![Page 37: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/37.jpg)
![Page 38: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/38.jpg)
![Page 39: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/39.jpg)
![Page 40: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/40.jpg)
![Page 41: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/41.jpg)
![Page 42: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/42.jpg)
![Page 43: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/43.jpg)
![Page 44: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/44.jpg)
![Page 45: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/45.jpg)
![Page 46: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/46.jpg)
![Page 47: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/47.jpg)
![Page 48: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/48.jpg)
Pulse Code Modulation
• Pulse modulation – Use discrete time samples of analog signals
– Transmission is composed of analog information sent at different times
– Variation of pulse amplitude or pulse timing allowed to vary continuously over all values
• PCM – Analog signal is quantized into a number of discrete
levels
CS 414 - Spring 2012
![Page 49: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/49.jpg)
PCM Mapping
![Page 50: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/50.jpg)
PCM Quantization and Digitization
Digitization Quantization
![Page 51: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/51.jpg)
Linear Quantization (PCM)
• Divide amplitude spectrum into N units (for log2N bit quantization)
0
50
100
150
200
250
300
0 50 100 150 200 250 300
Sound Intensity
Qu
anti
zati
on I
nd
ex
![Page 52: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/52.jpg)
Non-uniform Quantization
![Page 53: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/53.jpg)
Non-Uniform Quantization
• In speech signals, very low speech volumes predominates
– Only 15% of the time, the voltage exceeds the RMS value
• These low level signals are under represented with uniform quantization
– Same noise power (q2/12) but low signal power
• The answer is non uniform quantization
![Page 54: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/54.jpg)
Non-uniform Quantization
Compress the signal first
Then perform linear quantization
Result in nonlinear quantization
![Page 55: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/55.jpg)
Uniform Non-Uniform
![Page 56: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/56.jpg)
Perceptual Quantization (u-Law)
• Want intensity values logarithmically mapped over N quantization units
0
50
100
150
200
250
300
0 50 100 150 200 250 300
Sound Intensity
Quan
tiza
tion
In
dex
![Page 57: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/57.jpg)
µ-law and A-law Widely used compression algorithms
![Page 58: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/58.jpg)
Differential Pulse Code Modulation (DPCM)
• What if we look at sample differences, not the samples themselves?
– dt = xt-xt-1
– Differences tend to be smaller
• Use 4 bits instead of 12, maybe?
![Page 59: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/59.jpg)
Differential Pulse Code Modulation (DPCM)
• Changes between adjacent samples small
• Send value, then relative changes
– value uses full bits, changes use fewer bits
– E.g., 220, 218, 221, 219, 220, 221, 222, 218,.. (all values between 218 and 222)
– Difference sequence sent: 220, +2, -3, 2, -1, -1, -1, +4....
– Result: originally for encoding sequence 0..255 numbers need 8 bits;
– Difference coding: need only 3 bits
![Page 60: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/60.jpg)
Adaptive Differential Pulse Code Modulation (ADPCM)
• Adaptive similar to DPCM, but adjusts the width of the quantization steps
• Encode difference in 4 bits, but vary the mapping of bits to difference dynamically
– If rapid change, use large differences
– If slow change, use small differences
![Page 61: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/61.jpg)
Signal-to-Noise Ratio (metric to quantify quality of digital audio)
![Page 62: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/62.jpg)
Signal To Noise Ratio
• Measures strength of signal to noise
SNR (in DB)=
• Given sound form with amplitude in [-A, A]
• Signal energy =
)(log10 10
energyNoise
energySignal
-1
-0.75
-0.5
-0.25
0
0.25
0.5
0.75
1A
0
-A
2
2A
CS 414 - Spring 2012
![Page 63: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/63.jpg)
Quantization Error
• Difference between actual and sampled value
– amplitude between [-A, A]
– quantization levels = N
• e.g., if A = 1,
N = 8, = 1/4
N
A2
-1
-0.75
-0.5
-0.25
0
0.25
0.5
0.75
1
CS 414 - Spring 2012
![Page 64: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/64.jpg)
Compute Signal to Noise Ratio
• Signal energy = ; Noise energy = ;
• Noise energy =
• Signal to noise =
• Every bit increases SNR by ~ 6 decibels
12
2
N
A2
2
2
3 N
A
2
3log10
2N
2
2A
![Page 65: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/65.jpg)
Example
• Consider a full load sinusoidal modulating signal of amplitude A, which utilizes all the representation levels provided
• The average signal power is P= A2/2
• The total range of quantizer is 2A because modulating signal swings between –A and A. Therefore, if it is N=16 (4-bit quantizer), Δ = 2A/24 = A/8
• The quantization noise is Δ2/12 = A2/768
• The S/N ratio is (A2/2)/(A2/768) = 384; SNR (in dB) 25.8 dB
![Page 66: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/66.jpg)
Data Rates
• Data rate = sample rate * quantization * channel
• Compare rates for CD vs. mono audio – 8000 samples/second * 8 bits/sample * 1 channel
= 8 kBytes / second
– 44,100 samples/second * 16 bits/sample * 2 channel = 176 kBytes / second ~= 10MB / minute
![Page 67: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/67.jpg)
Comparison and Sampling/Coding Techniques
CS 414 - Spring 2012
![Page 68: ECE 556 BASICS OF DIGITAL SPEECH PROCESSINGece556.cankaya.edu.tr/uploads/files/Sunu2.pdf · Nyquist Limit • max data rate = 2 H log 2 V bits/second, where H = bandwidth (in Hz)](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed50c8a3394b6616e09c19d/html5/thumbnails/68.jpg)
Summary