multimedia communications ecp 610moodle.eece.cu.edu.eg/pluginfile.php/2112/mod_resource... · 2015....
![Page 1: Multimedia communications ECP 610moodle.eece.cu.edu.eg/pluginfile.php/2112/mod_resource... · 2015. 5. 27. · Input: 8kHz speech, PCM, 12 bits/sample Frame size: 180 samples = 22.5](https://reader035.vdocuments.site/reader035/viewer/2022071609/61482ac4cee6357ef9252d84/html5/thumbnails/1.jpg)
Omar A. Nasr
April, 2015
Multimedia communications
ECP 610
Speech coding (compression)
A procedure to represent a digitized speech signal using as
few bits as possible while maintaining a
reasonable level of speech quality.
The standard defines the compression algorithm, not the
implementation platform (DSP, GPP, FPGA, ASIC, etc.)
Uncoded speech: 8 kHz sampling x 16 bits/sample =
128 kbps
Issues: effects of channel errors
A good speech coder
Low bit rate
High speech quality (intelligibility, naturalness, pleasantness,
and speaker recognizability)
Robustness across different speakers and languages (males,
females, adults, kids)
Robustness in the presence of channel errors
Low memory size and low computational complexity
Low coding delay
Coder delay
Classification of speech coders
Classification by coding technique
Waveform coders
Preserve the original shape of the signal waveform, and hence
can generally be applied to any signal
source.
Data rates: 24-64 kbps
Quality can be measured by SNR
Parametric coders
The speech signal is assumed to be generated from a model,
which is controlled by some parameters
Do not preserve the shape of the signal
Low bit rates (can go below 2 kbps)
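SNR, the quality measure mentioned above for waveform coders, compares the energy of the original signal with the energy of the coding error. A minimal sketch follows; the toy uniform quantizer and its step size of 0.1 are assumptions made only for illustration and do not correspond to any standard coder.

```python
import math

def snr_db(original, decoded):
    """Signal-to-noise ratio in dB between a signal and its decoded version."""
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / noise_energy)

# Toy "waveform coder" (assumption): uniform quantization with step 0.1
original = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
decoded = [round(x / 0.1) * 0.1 for x in original]
print(f"SNR = {snr_db(original, decoded):.1f} dB")
```

SNR is meaningful here because the decoded waveform is supposed to track the original sample by sample. For parametric coders, which do not preserve the waveform shape, it says little about perceived quality.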
Classification by coding technique
Hybrid coders
Parametric + waveform
Assume a model, then add more parameters to reach a
waveform that is close to the original waveform
Medium bit rate
Parametric speech coding
Models
Human auditory system
Speech production model
Phase perception
Linear prediction
Basic idea: approximate each speech sample as a linear combination of the past few samples
The weights are chosen to minimize the mean square prediction error
The resultant weights are the Linear Prediction Coefficients (LPCs)
LPCs change from frame to frame
LP can also be interpreted as a spectrum estimation method
By computing the LPCs of a signal frame, it is possible to generate another signal whose spectral content is close to that of the original
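One common way to obtain the weights that minimize the mean square prediction error is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes the predictor convention s_hat[n] = sum_i a[i] * s[n - i]; the function names, the order of 2, and the 400 Hz test tone are illustrative choices, not taken from the slides.

```python
import math

def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n - k], for k = 0..max_lag."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for the LPCs.

    Returns (a, err): a[i] weights the sample s[n - 1 - i], and err is the
    remaining prediction-error energy after the final order.
    """
    a = []
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame
        # Correlation not yet explained by the order (m - 1) predictor
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err  # reflection coefficient of stage m
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
    a += [0.0] * (order - len(a))
    return a, err

# One 180-sample frame of a 400 Hz tone sampled at 8 kHz (assumed test input)
frame = [math.sin(2 * math.pi * 400 * n / 8000) for n in range(180)]
r = autocorrelation(frame, 2)
coeffs, err = levinson_durbin(r, 2)
print("LPCs:", coeffs, "residual energy:", err)
```

In a real coder the frame would usually be windowed first and the order would be higher (around 10 for 8 kHz speech); those steps are left out to keep the recursion visible.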
Linear prediction
Prediction … redundancy removal
The problem of linear prediction
Derivation of the LPCs
Prediction Gain
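Prediction gain is commonly defined as the ratio of the signal energy to the prediction-error energy, expressed in dB. A minimal sketch under that definition, assuming the predictor s_hat[n] = sum_i coeffs[i] * s[n - 1 - i]; the function name and test signal are hypothetical.

```python
import math

def prediction_gain_db(signal, coeffs):
    """10 * log10(signal energy / prediction-error energy) over one frame."""
    order = len(coeffs)
    signal_energy = 0.0
    error_energy = 0.0
    for n in range(order, len(signal)):
        prediction = sum(coeffs[i] * signal[n - 1 - i] for i in range(order))
        error = signal[n] - prediction
        signal_energy += signal[n] ** 2
        error_energy += error ** 2
    if error_energy == 0.0:
        return math.inf  # perfectly predicted
    if signal_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal_energy / error_energy)

# A decaying exponential is predicted almost perfectly by a[0] = 0.9
signal = [0.9 ** n for n in range(50)]
print(prediction_gain_db(signal, [0.9]))  # very large gain
print(prediction_gain_db(signal, [0.0]))  # no prediction: 0 dB
```

A higher gain means the predictor removed more redundancy, so the error signal needs fewer bits than the speech itself.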
For voiced frames, capture the envelope
Reflection coefficients
There is a one-to-one (nonlinear) mapping between the reflection coefficients and
the linear prediction coefficients
Quantizing the reflection coefficients degrades the result less
than quantizing the LPCs directly
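The mapping between reflection coefficients and LPCs can be written as a pair of recursions, often called step-up and step-down. The sketch below assumes the convention in which the last predictor coefficient of each order equals that order's reflection coefficient; sign conventions differ between textbooks, and the function names are hypothetical.

```python
def reflection_to_lpc(ks):
    """Step-up recursion: reflection coefficients -> predictor coefficients."""
    a = []
    for k in ks:
        m = len(a)
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
    return a

def lpc_to_reflection(a):
    """Step-down recursion: predictor coefficients -> reflection coefficients.

    Assumes every intermediate |k| is strictly less than 1.
    """
    a = list(a)
    ks = []
    while a:
        k = a[-1]
        ks.append(k)
        m = len(a) - 1
        a = [(a[i] + k * a[m - 1 - i]) / (1.0 - k * k) for i in range(m)]
    return ks[::-1]

def is_stable(ks):
    """The all-pole synthesis filter is stable when every |k| < 1."""
    return all(abs(k) < 1.0 for k in ks)

ks = [0.9, -0.5, 0.2]
lpcs = reflection_to_lpc(ks)
print(lpcs, lpc_to_reflection(lpcs), is_stable(ks))
```

The stability test is one reason reflection coefficients quantize well: as long as each quantized value stays inside (-1, 1), the synthesis filter remains stable, which is not easy to guarantee when quantizing the LPCs directly.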
Long term linear prediction
With a single predictor, the prediction order should be > pitch period to accurately
model voiced signals
Problem: the pitch period is time varying, and a high order means a high bit rate (many LPCs)
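A common way around the high-order problem is a separate long-term predictor with a single tap, e[n] ≈ b * e[n - T], where T is the pitch lag. The sketch below searches for the lag and gain that minimize the squared error over one window. The lag range, window placement, and impulse-train test signal are assumptions for illustration.

```python
def best_pitch_lag(residual, start, length, min_lag, max_lag):
    """Return (lag, gain) minimizing sum (e[n] - gain * e[n - lag])^2.

    The window is n = start .. start + length - 1 and needs max_lag samples
    of history before it.
    """
    if start < max_lag:
        raise ValueError("not enough history before the window")
    best_lag, best_gain, best_score = None, 0.0, 0.0
    window = range(start, start + length)
    for lag in range(min_lag, max_lag + 1):
        cross = sum(residual[n] * residual[n - lag] for n in window)
        energy = sum(residual[n - lag] ** 2 for n in window)
        if energy == 0.0:
            continue
        score = cross * cross / energy  # error-energy reduction for this lag
        if score > best_score:
            best_lag, best_gain, best_score = lag, cross / energy, score
    return best_lag, best_gain

# Idealized voiced residual (assumption): one pulse every 50 samples
residual = [1.0 if n % 50 == 0 else 0.0 for n in range(300)]
print(best_pitch_lag(residual, start=150, length=100, min_lag=20, max_lag=120))
# prints (50, 1.0)
```

Only the lag and one gain are transmitted per window, instead of the many coefficients a short-term predictor would need to span a whole pitch period.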
Long Term Linear Prediction
Synthesis filters
Pre-emphasis of the speech waveform
Compensates for the roll-off of the high frequencies in the
spectrum
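Pre-emphasis is typically a first-order high-pass filter, y[n] = x[n] - alpha * x[n - 1], undone at the decoder by the inverse filter. A minimal sketch; the value alpha = 0.9375 is an assumed example (values slightly below 1 are typical), and the function names are hypothetical.

```python
def pre_emphasis(samples, alpha=0.9375):
    """y[n] = x[n] - alpha * x[n - 1]: boosts high frequencies."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - alpha * prev)
        prev = x
    return out

def de_emphasis(samples, alpha=0.9375):
    """Inverse filter: x[n] = y[n] + alpha * x[n - 1]."""
    out = []
    prev = 0.0
    for y in samples:
        prev = y + alpha * prev
        out.append(prev)
    return out

# A constant (zero-frequency) input is strongly attenuated after the first sample
print(pre_emphasis([1.0, 1.0, 1.0]))  # [1.0, 0.0625, 0.0625]
```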
Waveform CODECs
G.711
Objective: minimize average distortion.
You need to know the distribution of the input signal
G.711 standard
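G.711 uses logarithmic companding (μ-law or A-law) so that small amplitudes, which are the most frequent in speech, get finer quantization steps. The sketch below uses the continuous textbook μ-law formula with μ = 255 followed by a uniform 8-bit quantizer. It is an approximation for illustration, not the bit-exact segment tables of the standard, and the function names are hypothetical.

```python
import math

MU = 255.0

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1] with logarithmic compression."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def encode(x, bits=8):
    """Compress, then quantize uniformly to an integer code."""
    levels = 2 ** bits
    y = mu_law_compress(max(-1.0, min(1.0, x)))
    return min(levels - 1, int((y + 1.0) / 2.0 * levels))

def decode(code, bits=8):
    """Map the code to the middle of its cell, then expand."""
    levels = 2 ** bits
    y = (code + 0.5) / levels * 2.0 - 1.0
    return mu_law_expand(y)

for x in (0.01, 0.1, 0.9):
    print(x, encode(x), round(decode(encode(x)), 5))
```

At 8 kHz sampling and 8 bits per sample this gives the 64 kbps rate at the top of the waveform-coder range quoted earlier.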
G.726
Vector quantization
Every pair of numbers falling
in a particular region is
approximated by the codeword (red star in the figure) associated with that region
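The region-to-codeword rule above amounts to a nearest-neighbour search: each input pair is replaced by the index of the closest codeword. A minimal sketch with a made-up four-entry codebook (the codebook values are hypothetical; a real one would be trained on speech data).

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword with the smallest squared distance."""
    def distance(i):
        return sum((v - c) ** 2 for v, c in zip(vector, codebook[i]))
    return min(range(len(codebook)), key=distance)

# Hypothetical 2-D codebook with 4 entries: 2 bits per pair of samples
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]

index = nearest_codeword((0.9, 0.8), codebook)
print(index, codebook[index])  # 1 (1.0, 1.0)
```

Only the index is transmitted. The decoder holds the same codebook and looks the codeword up.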
Linear Prediction Coding
FS1015, 2.4 kbps, 1982
Originally for military applications. Its synthetic-sounding output speech often requires trained operators for reliable usage
Each frame is represented by a set of parameters
The encoder estimates the parameters
Linear Prediction Coding
Frame duration : 180 samples (22.5 ms)
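In an LPC vocoder the decoder rebuilds each frame from its parameters: a voiced/unvoiced flag, a pitch period, a gain, and the LPCs. The sketch below drives an all-pole synthesis filter with an impulse train (voiced) or noise (unvoiced). It is a simplified illustration, not the FS1015 decoder: the parameter names, the uniform noise source, and the zero initial filter state are assumptions.

```python
import random

def synthesize_frame(lpc, gain, voiced, pitch_period, frame_len=180, seed=0):
    """Generate one frame: s[n] = gain * excitation[n] + sum_i lpc[i] * s[n - 1 - i]."""
    rng = random.Random(seed)
    order = len(lpc)
    past = [0.0] * order  # filter memory, most recent sample last
    out = []
    for n in range(frame_len):
        if voiced:
            # Impulse train at the pitch period
            excitation = 1.0 if n % pitch_period == 0 else 0.0
        else:
            # Noise excitation
            excitation = rng.uniform(-1.0, 1.0)
        sample = gain * excitation + sum(lpc[i] * past[-1 - i] for i in range(order))
        out.append(sample)
        past.append(sample)
    return out

voiced_frame = synthesize_frame([0.5], gain=1.0, voiced=True, pitch_period=60)
unvoiced_frame = synthesize_frame([0.5], gain=1.0, voiced=False, pitch_period=60)
print(voiced_frame[:3], len(unvoiced_frame))
```

A real decoder would carry the filter memory across frames and interpolate parameters between them; both are omitted here.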
FS1015 (LPC10)
Input: 8 kHz speech, PCM, 12 bits/sample
Frame size: 180 samples = 22.5 ms
Possible pitch periods: only 60 values
54 bits per frame, hence bit rate = 54 * 8000 / 180 = 2400 bps
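The frame arithmetic above can be checked in a few lines. The constants come from the slide; the compression ratio and the minimum index size for 60 pitch values are derived here and are not stated in the original.

```python
import math

SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 180
BITS_PER_FRAME = 54
INPUT_BITS_PER_SAMPLE = 12
PITCH_VALUES = 60

frame_ms = 1000 * FRAME_SAMPLES / SAMPLE_RATE_HZ               # 22.5 ms
coded_bps = BITS_PER_FRAME * SAMPLE_RATE_HZ / FRAME_SAMPLES    # 2400 bps
input_bps = SAMPLE_RATE_HZ * INPUT_BITS_PER_SAMPLE             # 96000 bps
compression_ratio = input_bps / coded_bps                      # 40x
min_pitch_bits = math.ceil(math.log2(PITCH_VALUES))            # 6 bits to index 60 values

print(frame_ms, coded_bps, input_bps, compression_ratio, min_pitch_bits)
```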
Advantages and disadvantages
Advantages:
Low bit rate
Very simple encoder and decoder
Disadvantages:
Sometimes the speech frame cannot be classified as strictly
voiced or unvoiced
Using only noise or an impulse train is not a good model of the excitation
Poor speech quality
Samples:
REGULAR-PULSE EXCITATION CODERS
Multipulse excitation
Open loop
Use a certain criterion to select only a few pulses of the prediction error
Regular pulse excitation
Closed loop (Analysis by Synthesis)
GSM 6.10 (1988)
Regular pulse excited Long Term prediction (RPE-LTP)
Low computational cost
High quality reproduction
Robustness against channel errors
Coding efficiency
8 reflection coefficients
One LPC vector every 160 samples (20ms)
Selects one of 4 subsampled error sequences at each
subframe (40 samples)
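The grid selection in the last bullet can be sketched as follows: the 40-sample error subframe is subsampled at four different offsets, and the candidate with the highest energy is kept. The decimation factor of 3 and the 13 pulses per candidate are assumptions based on the usual description of RPE-LTP and are not stated on the slide. The weighting filter and pulse quantization of the real coder are omitted.

```python
def select_rpe_grid(subframe, num_grids=4, step=3, pulses=13):
    """Return (offset, sequence) of the subsampled candidate with maximum energy."""
    best_offset, best_seq, best_energy = 0, [], -1.0
    for offset in range(num_grids):
        # Candidate: every `step`-th sample starting at `offset`
        seq = subframe[offset::step][:pulses]
        energy = sum(x * x for x in seq)
        if energy > best_energy:
            best_offset, best_seq, best_energy = offset, seq, energy
    return best_offset, best_seq

# Hypothetical error subframe whose energy sits on the offset-2 grid
subframe = [0.0] * 40
subframe[2] = 1.0
subframe[5] = 2.0
offset, seq = select_rpe_grid(subframe)
print(offset, len(seq), seq[:2])  # 2 13 [1.0, 2.0]
```

Because the pulse positions are regular, only the grid offset has to be sent to describe where the pulses are, which keeps the computational cost low.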
GSM 6.10 (1988)
Code Excited Linear Prediction (CELP)
Excitation codebook can be fixed/adaptive,
deterministic/random
No strict voiced/unvoiced classification
CELP
Analysis by synthesis
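Analysis by synthesis means the encoder runs the decoder for every candidate excitation and keeps the one whose synthesized output is closest to the target. The sketch below is a bare-bones codebook search with an optimal gain per entry. Perceptual weighting, the adaptive codebook, and filter memory handling, which real CELP coders use, are left out, and all names and values are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis with zero initial state: s[n] = e[n] + sum_i lpc[i] * s[n - 1 - i]."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(lpc[i] * out[n - 1 - i] for i in range(len(lpc)) if n - 1 - i >= 0)
        out.append(e + pred)
    return out

def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) of the codevector that best matches the target."""
    best = None
    for index, codevector in enumerate(codebook):
        y = synthesize(codevector, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

lpc = [0.5]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
target = [2.0 * v for v in synthesize(codebook[1], lpc)]
print(search_codebook(target, codebook, lpc))  # (1, 2.0, 0.0)
```

The cost of this exhaustive loop, one filtering operation per codebook entry per subframe, is the main reason CELP encoders are computationally demanding.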
CELP
Advantages?
Disadvantages?
G.728 (LD-CELP)
20-sample frames, each with four 5-sample subframes
Pitch period: first a coarse estimate, then a fine estimate
Compared to the previous pitch to check for halving or doubling
Pitch: once per frame (obtained first in a domain decimated by a
factor of 4, then in the normal domain)
Bit rate: 16 kbps
Vector Sum Excited Linear Prediction
(VSELP)
A CELP coder with a particular codebook structure having
reduced computational cost.
IS-54 (7.95 kbps), GSM 6.20 (5.6 kbps) "Half Rate"
Basic idea:
Form the codebook from some basis functions
G.729 uses CELP
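The basis-function idea in VSELP can be sketched as follows: each codevector is a sum of M basis vectors with signs of +1 or -1 taken from the bits of the codebook index, so M stored vectors define 2^M codevectors. The bit-to-sign mapping and the tiny basis below are assumptions for illustration only.

```python
def vselp_codevector(index, basis):
    """Build a codevector as sum_i sign_i * basis[i], with sign_i set by bit i of index."""
    length = len(basis[0])
    out = [0.0] * length
    for i, vec in enumerate(basis):
        sign = 1.0 if (index >> i) & 1 else -1.0
        for n in range(length):
            out[n] += sign * vec[n]
    return out

# Hypothetical basis: M = 2 vectors of length 2, giving 2**2 = 4 codevectors
basis = [[1.0, 0.0], [0.0, 2.0]]
for index in range(4):
    print(index, vselp_codevector(index, basis))
```

Flipping one bit of the index only changes the sign of one basis vector, so the search can update the synthesized output incrementally instead of filtering every codevector from scratch. That structure is where the reduced computational cost comes from.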
GSM EFR ACELP
A-CELP based
12.2kbps bit rate + 10.6kbps channel coding = 22.8kbps
ETSI AMR (Adaptive Multirate)
All coders are based on ACELP
12.2 (EFR), 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, and 4.75 kbps.
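The mode rates above translate directly into bits per speech frame. The sketch assumes a 20 ms frame, which is not stated on the slide, and also checks the EFR gross-rate sum quoted earlier.

```python
AMR_MODES_KBPS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]
FRAME_MS = 20  # assumed frame duration

def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """kbit/s times ms gives bits; round to absorb floating-point noise."""
    return round(rate_kbps * frame_ms)

for rate in AMR_MODES_KBPS:
    print(f"{rate:5.2f} kbps -> {bits_per_frame(rate)} bits per frame")

# EFR: speech bits plus channel coding fill the full-rate channel
gross_kbps = 12.2 + 10.6
print(f"gross rate = {gross_kbps:.1f} kbps")
```

Lower modes leave more of the fixed channel rate for error protection, which is the trade-off the adaptive scheme exploits as channel conditions change.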
MELP (Mixed Excited Linear Prediction)
2.4 kbps
Fourier Magnitudes
Shaping filters
MELP bit allocation