8/10/2019 Ban Dich DigitalAudio
9/14/2006 Nguyen Chan Hung - Faculty of Electronics & Telecommunications - HUT 1
Part 2: Digital Audio Compression
-
MPEG Audio: Specifications
MPEG-1 (ISO/IEC 11172-3) provides:
  Single-channel ('mono') and two-channel ('stereo' or 'dual mono') coding of digitized sound waves at 32, 44.1, and 48 kHz sampling rates.
  Predefined bit rates from 32 to 448 kbit/s for Layer I, from 32 to 384 kbit/s for Layer II, and from 32 to 320 kbit/s for Layer III.
MPEG-2 BC (ISO/IEC 13818-3) provides:
  A backwards-compatible (BC) multichannel extension to MPEG-1: up to 5 main channels plus a 'low-frequency enhancement' (LFE) channel can be coded, and the bit-rate range is extended up to about 1 Mbit/s.
  An extension of MPEG-1 towards lower sampling rates (16, 22.05, and 24 kHz) for bit rates from 32 to 256 kbit/s (Layer I) and from 8 to 160 kbit/s (Layer II and Layer III).
-
MPEG Audio: Specifications (2)
MPEG-2 AAC (ISO/IEC 13818-7) provides:
  A very high-quality audio coding standard for 1 to 48 channels at sampling rates of 8 to 96 kHz, with multichannel, multilingual, and multiprogram capabilities.
  Bit rates from 8 kbit/s for a monophonic speech signal up to in excess of 160 kbit/s per channel for very-high-quality coding that permits multiple encode/decode cycles.
  Three profiles that provide varying levels of complexity and scalability.
MPEG-4 (ISO/IEC 14496-3) provides:
  Coding and composition of natural and synthetic audio objects.
  Scalability of the bit rate of an audio bitstream.
  Scalability of encoder or decoder complexity.
  Structured Audio: a universal language for score-driven sound synthesis.
  TTSI: an interface for text-to-speech conversion systems.
MPEG-7 (ISO/IEC 15938) will provide:
  Standardized descriptions and description schemes of audio structures and sound content.
  A language to specify such descriptions and description schemes.
-
Related specifications
MUSICAM (Masking pattern adapted Universal Sub-band Integrated Coding And Multiplexing): designed to be suitable for DAB (Digital Audio Broadcasting).
ASPEC (Adaptive Spectral Perceptual Entropy Coding): designed for high degrees of compression to allow audio transmission on ISDN.
NICAM 728: used for European PAL television audio.
Dolby AC-3: designed for ATSC digital TV.
-
Background of audio compression
Audio compression takes advantage of two facts:
  First, in typical audio signals, not all frequencies are simultaneously present.
  Second, because of the phenomenon of masking, human hearing cannot perceive every detail of an audio signal.
Audio compression splits the audio spectrum into bands by filtering or transforms, and includes less data when describing bands in which the level is low.
Where masking prevents or reduces audibility of a particular band, even less data needs to be sent.
-
Background of audio compression (2)
Audio compression is relatively harder than video compression because of the acuity of hearing.
1. Masking:
  Masking only works properly when the masking and the masked sounds coincide spatially.
  Spatial coincidence is always the case in mono recordings but not in stereo recordings, where low-level signals can still be heard if they are in a different part of the soundstage.
  Consequently, in stereo and surround sound systems, a lower compression factor is allowable for a given quality.
2. Speaker quality:
  Delayed resonances in poor loudspeakers actually mask compression artifacts.
  Testing a compressor with poor speakers gives a false result: signals which are apparently satisfactory may be disappointing when heard on good equipment.
-
The characteristics of human hearing
The top figure shows that the threshold of hearing is a function of frequency.
  Naturally, the greatest sensitivity is in the speech range.
The bottom figure shows the hearing threshold in the presence of a single tone.
  Note that the threshold is raised for tones at higher frequency and, to some extent, at lower frequency: this is the masking effect.
A complex input spectrum, such as music, raises the threshold at nearly all frequencies.
  As a result, the hiss from an analog audio cassette is only audible during quiet passages in music.
-
The characteristics of human hearing (2)
A sound must be present for at least about 1 millisecond before it becomes audible.
Because of this slow response, masking can still take place even when the two signals involved are not simultaneous.
Forward and backward masking occur when the masking sound continues to mask sounds at lower levels before and after the masking sound's actual duration.
-
Masking
Masking raises the threshold of hearing; compressors take advantage of this effect by raising the noise floor, which allows the audio waveform to be expressed with fewer bits.
The noise floor can only be raised at frequencies at which there is effective masking.
To maximize effective masking, it is necessary to split the audio spectrum into different frequency bands to allow introduction of different amounts of companding and noise in each band.
-
MPEG Audio: General encoder model
[Figure: encoder block diagram. Blocks: Sub-band Filter, Bit Allocation, Bit-stream Generation, Compute Masking. Terminals: Input, Output.]
-
MPEG Audio encoding algorithm
Use sub-band filters to divide the audio signal into 32 frequency sub-bands that roughly approximate the critical bands of hearing.
Determine the amount of masking for each band caused by nearby bands using the psychoacoustic model.
If the power in a band is below the masking threshold, ignore it.
Otherwise, determine the number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect (1 bit ≈ 6 dB).
Generate the bitstream.
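The per-band decision in the steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, the masking thresholds are taken as given rather than computed by a psychoacoustic model, and the "1 bit is about 6 dB" rule of thumb is applied directly.

```python
import math

DB_PER_BIT = 6.0  # rule of thumb from the slide: each bit buys about 6 dB


def bits_for_band(level_db, mask_db):
    """Return 0 for a masked band, else the bits that keep noise under the mask."""
    if level_db < mask_db:
        return 0  # band is below the masking threshold: ignore it
    # Quantization noise must sit (level - mask) dB below the signal level.
    return math.ceil((level_db - mask_db) / DB_PER_BIT)


def allocate(levels_db, masks_db):
    """Apply the per-band rule to every sub-band."""
    return [bits_for_band(s, m) for s, m in zip(levels_db, masks_db)]
```

A real encoder also works within a fixed bit budget per frame, so it cannot always grant every band the number of bits this rule asks for.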
-
MPEG Audio: Coding example
After analysis, the levels of the first 16 of the 32 bands are:
Band         1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Level (dB)   0  8 12 10  6  2 10 60 35 20 15  2  3  5  3  1
The level of the 8th band is 60 dB. If it gives a masking of 12 dB in the 7th band and 15 dB in the 9th, then:
  Level in the 7th band is 10 dB (< 12 dB), so ignore it.
  Level in the 9th band is 35 dB (> 15 dB), so encode it.
  It can be encoded with up to 2 bits (= 12 dB) of quantization error.
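The arithmetic of this example can be replayed with a short script. It is a sketch only: the helper name decide is hypothetical, and the tolerable error is taken as the largest whole number of 6 dB steps that still fits under the masking level.

```python
DB_PER_BIT = 6  # approximate noise contributed by each bit of quantization error

# Levels of bands 1..16 from the table (index 0 holds band 1).
levels_db = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]
# Masking produced by the 60 dB signal in band 8, as stated in the example.
masking_db = {7: 12, 9: 15}


def decide(band):
    """Return ('ignore', 0) or ('encode', bits of error the mask can hide)."""
    level = levels_db[band - 1]
    mask = masking_db[band]
    if level < mask:
        return ("ignore", 0)
    return ("encode", mask // DB_PER_BIT)


for band in sorted(masking_db):
    print(band, decide(band))
```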
-
Sub-band coding (SBC) - Companding
The figure shows a band-splitting compandor.
The band-splitting filter is a set of narrow-band, linear-phase filters that overlap and all have the same bandwidth.
The output in each band consists of samples representing a waveform.
In each frequency band, the audio input is amplified up to maximum level prior to transmission.
Afterwards, each level is returned to its correct value.
Noise picked up in the transmission is reduced in each band.
If the noise reduction is compared with the threshold of hearing, it can be seen that greater noise can be tolerated in some bands because of masking.
Consequently, in each band after companding, it is possible to reduce the wordlength of samples.
This technique achieves compression because the noise introduced by the loss of resolution is masked.
-
MPEG Audio Layer I
The figure shows a simplified band-splitting coder used in MPEG Layer I.
The digital audio input is fed to a band-splitting filter that divides the spectrum of the signal into a number of bands (32 bands).
The time axis is divided into blocks of equal length.
  In MPEG Layer I, this is 384 input samples, so in the output of the filter there are 12 samples in each of 32 bands.
Within each band, the level is amplified by multiplication to bring the level up to maximum.
  The gain required is constant for the duration of a block.
  A single scale factor is transmitted with each block for each band in order to allow the process to be reversed at the decoder.
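The block scaling described above can be illustrated as follows. This is a minimal sketch, not the standard's procedure: the function names are hypothetical, the scale factor is simply the block peak (the standard picks it from a fixed table), and a plain uniform quantizer stands in for the real one.

```python
def compand_block(block, wordlength):
    """Encode one block of subband samples as a scale factor plus integer codes."""
    # One gain for the whole block: the peak value brings the block to full scale.
    scale = max(abs(x) for x in block) or 1.0
    steps = 2 ** (wordlength - 1) - 1  # largest signed code for this wordlength
    codes = [round(x / scale * steps) for x in block]
    return scale, codes


def expand_block(scale, codes, wordlength):
    """Decoder side: undo the scaling using the transmitted scale factor."""
    steps = 2 ** (wordlength - 1) - 1
    return [c / steps * scale for c in codes]
```

With 12 samples per block and a wordlength of 4 bits, every decoded sample lands within half a quantizer step of the original, and that step shrinks with the block peak, which is what lets quiet blocks keep their resolution.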
-
MPEG Audio Layer I (cont.)
The filter bank output is also analyzed to determine the spectrum of the input signal.
This analysis drives a masking model that determines the degree of masking that can be expected in each band.
The more masking available, the less accurate the samples in each band can be.
The sample accuracy is reduced by requantizing to reduce wordlength.
This reduction is also constant for every word in a band, but different bands can use different wordlengths.
The wordlength needs to be transmitted as a bit allocation code for each band to allow the decoder to deserialize the bit stream properly.
-
MPEG Layer I audio bit stream
The top figure shows an MPEG Layer I audio bit stream, which includes:
  A synchronizing pattern and the header.
  32 bit allocation codes of four bits each.
    These codes describe the wordlength of samples in each subband.
  32 scale factors used in the companding of each band.
    These scale factors determine the gain needed in the decoder to return the audio to the correct level.
  Audio data in each band.
-
MPEG Layer I decoder
The synchronization pattern is detected by the timing generator, which deserializes the bit allocation and scale factor data.
The bit allocation data then allows deserialization of the variable-length samples.
The requantizing is reversed and the compression is reversed by the scale factor data to put each band back to the correct level.
These 32 separate bands are then combined in a combiner filter which produces the audio output.
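To make the deserialization step concrete, the sketch below splits a bit string into per-band sample codes using the per-band wordlengths. It is a toy model under stated assumptions: the function name is hypothetical, the bits are represented as a string of '0' and '1' characters, and samples are assumed to be interleaved one sample time at a time across the bands; the exact field layout of a real Layer I frame is not reproduced here.

```python
def deserialize(bits, wordlengths, samples_per_band=12):
    """Split a '0'/'1' string into integer codes, one list per subband."""
    bands = [[] for _ in wordlengths]
    pos = 0
    for _ in range(samples_per_band):
        for band, n in enumerate(wordlengths):
            if n == 0:
                continue  # a wordlength of 0 means the band was not transmitted
            bands[band].append(int(bits[pos:pos + n], 2))
            pos += n
    return bands, pos  # pos is the number of bits consumed
```

Without the bit allocation data the decoder could not tell where one sample ends and the next begins, which is why it must be sent ahead of the samples.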
-
MPEG Audio: The concept of Layers
Three layers in MPEG audio: Layer I, II, III.
The basic model is similar.
CODEC complexity increases with each layer.
A decoder for a higher layer can also decode streams of the lower layers (e.g., a Layer III decoder can decode a Layer II stream).
The psychoacoustic model is used to determine the bit allocation to each subband.
Compression ratios (the original bit rate is 1.4 Mbps for CD-quality audio):
  1:4 by Layer I (corresponds to 384 kbps for a stereo signal)
  1:6...1:8 by Layer II (corresponds to 256...192 kbps for a stereo signal)
  1:10...1:12 by Layer III (corresponds to 128...112 kbps for a stereo signal)
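The quoted ratios can be checked with a little arithmetic. The sketch assumes that "CD quality" means 44.1 kHz sampling, 16 bits per sample and 2 channels, which gives the 1.4 Mbps figure; the names below are illustrative only.

```python
# Assumed CD parameters: 44.1 kHz x 16 bits x 2 channels, in kbit/s.
CD_KBPS = 44.1 * 16 * 2


def ratio(coded_kbps):
    """Compression ratio of a coded stereo stream relative to CD audio."""
    return CD_KBPS / coded_kbps


stereo_rates_kbps = {
    "Layer I": [384],
    "Layer II": [256, 192],
    "Layer III": [128, 112],
}
for layer, rates in stereo_rates_kbps.items():
    print(layer, ["1:%.1f" % ratio(r) for r in rates])
```

The computed values are close to, but not exactly, the rounded ratios on the slide, which suggests the slide figures are nominal.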
-
MPEG Audio: Filter type
Layer I: DCT-type filter with one frame and equal frequency spread per band.
  The psychoacoustic model only uses frequency masking.
Layer II: uses three frames in the filter (1152 samples in total).
  The psychoacoustic model includes a little temporal masking.
Layer III: a better critical-band filter is used (non-equal frequencies).
  The psychoacoustic model includes temporal masking effects.
  Takes stereo redundancy into account.
  Uses a Huffman coder.
-
MPEG-1 Audio Encoder (Layer I & II)
[Figure: encoder block diagram. Blocks: Analysis filter bank (32 subbands, 0 to 31), Scaler, Quantizer, Quantized sample encoder, Psychoacoustic model, Bit-rate allocation, Bit-rate allocation encoder, Scale factor encoder, Multiplexer. Signals: PCM input, SMRn, Rn, SFn, Output.]
SF = Scale factor, R = Rate, SMR = Signal to Mask Ratio
-
MPEG-1 Audio Encoder (cont.)
The input audio stream passes through a filter bank that divides the input into multiple frequency subbands.
The input audio stream simultaneously passes through a psychoacoustic model that determines the ratio of the signal energy to the masking threshold for each subband.
The bit or noise allocation block uses the signal-to-mask ratios to decide how to apportion the total number of code bits available for the quantization of the subband signals to minimize the audibility of the quantization noise.
Finally, the multiplexer takes the representation of the quantized subband samples and formats this data and side information into a coded bitstream.
Ancillary data not necessarily related to the audio stream can be inserted within the coded bitstream.
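One common way to apportion a bit budget from signal-to-mask ratios is a greedy loop that keeps giving one more bit to the subband whose quantization noise is currently the most audible. The sketch below illustrates that idea only; it is not the allocation procedure of the standard. The function name, the 6 dB per bit approximation of quantizer SNR, and the default limits are assumptions.

```python
DB_PER_BIT = 6.0  # assumed SNR gain per extra bit of wordlength


def allocate_bits(smr_db, total_bits, samples_per_band=12, max_bits=15):
    """Greedily assign wordlengths so the worst noise-to-mask ratio shrinks first."""
    bits = [0] * len(smr_db)
    budget = total_bits
    while budget >= samples_per_band:
        # Noise-to-mask ratio: SMR minus the SNR the current wordlength provides.
        nmr = [
            s - DB_PER_BIT * b if b < max_bits else float("-inf")
            for s, b in zip(smr_db, bits)
        ]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # all quantization noise is already at or below the mask
        bits[worst] += 1
        budget -= samples_per_band  # one more bit for each sample in the block
    return bits
```

For example, with signal-to-mask ratios of 20, -3 and 7 dB and a generous budget, the masked second band receives no bits at all, while the first band receives the most.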
-
MPEG Audio: Subband sample grouping
Layer I: 12 * 32 = 384 samples per frame. Layer II, III: 12 * 3 * 32 = 1152 samples per frame.
[Figure: audio samples enter subband filters 1 to 32; each filter outputs successive groups of 12 samples. One group of 12 samples from every subband forms a Layer I frame; three consecutive groups from every subband form a Layer II or III frame.]
-
Psychoacoustic model: Layer I & II
The separator identifies and separates the tonal and noise-like (non-tonal) components of the audio signal because the masking abilities of the two types of signal differ.
[Figure: psychoacoustic model block diagram. Blocks: Fast Fourier Transform (FFT, 512 or 1024 frequencies), Tonal/nontonal separator, Compute tonal masking threshold function, Compute nontonal masking threshold function, Compute quiet threshold, Calculate minimum masking threshold function, Compute signal power. Signals: PCM input, Mn, Sn, SMRn.]
-
MPEG-1 Audio Layer III Encoder (mp3)
[Figure: encoder block diagram. Blocks: Analysis filter bank (32 subbands, 0 to 31), MDCT (sub-subbands), Scaler, Quantizer, Quantized sample Huffman encoder, Psychoacoustic model, "Calculate window sizes, scale factor bands, bit rate allocation and quantization taking buffer fullness into account", Side information encoder, Scale factor encoder, Multiplexer, Buffer. Signals: PCM input, SMRn, Scale_factors, Side information, Buffer fullness, Output.]
-
Frame formats of 3 layers
SCFSI = Scale Factor Selection Information.
Side information of an mp3 frame = 17 bytes (136 bits) in single-channel mode and 32 bytes (256 bits) in dual-channel mode.
CRC is optional.
While a Layer I frame contains only 384 samples, Layer II and Layer III frames contain 1152 samples.
Main data of mp3 may contain data of neighboring frames (see the next slide).
Frame fields (sizes in bits):
  Layer I:   Header (32), CRC (0,16), Bit Allocation (128,256), Scale factor (0-384), Samples, Ancillary data
  Layer II:  Header (32), CRC (0,16), Bit Allocation (128,256), SCFSI (0-60), Scale factor (0-384), Samples, Ancillary data
  Layer III: Header (32), CRC (0,16), Side information (136, 256), Main Data (may belong to other frames), Ancillary data
-
MP3 frame
The main data section contains the coded scale factor values and the Huffman-coded frequency lines.
  Its length depends on the bit rate and the length of the ancillary data.
The length of the scale factor part depends on whether scale factors are reused, and also on the window length (short or long).
  The scale factors are used in the requantization of the samples.
The demand for Huffman code bits varies with time during the coding process.
  The variable bit rate format can be used to handle this, but a fixed bit rate is often required for an application such as broadcasting.
  Therefore there is also a bit reservoir technique that allows unused main data storage in one frame to be used by up to two consecutive frames.
-
MP3 frame - Bit Reservoir
The design of the Layer III bitstream better fits the encoder's time-varying demand on code bits.
As with Layer II, Layer III processes the audio data in frames of 1,152 samples.
Unlike Layer II, the coded data representing these samples do not necessarily fit into a fixed-length frame in the code bitstream.
The encoder can donate bits to a reservoir when it needs fewer than the average number of bits to code a frame.
-
MP3 frame - Bit Reservoir (2)
Later, when the encoder needs more than the average number of bits to code a frame, it can borrow bits from the reservoir.
The encoder can only borrow bits donated from past frames; it cannot borrow from future frames.
The MP3 bitstream includes a 9-bit pointer, "main_data_begin", with each frame's side information, pointing to the location of the starting byte of the audio data for that frame.
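The donate-and-borrow behaviour can be simulated with a few lines of bookkeeping. This is a rough model, not an encoder: the function name is hypothetical, each frame is assumed to carry the same number of main-data bytes, and the back-pointer is modelled as the number of reservoir bytes available when the frame starts. The 511-byte cap follows from the pointer being 9 bits wide.

```python
MAX_RESERVOIR_BYTES = 2 ** 9 - 1  # a 9-bit pointer can reach back at most 511 bytes


def simulate_reservoir(demands, frame_bytes):
    """Track the reservoir for a list of per-frame byte demands.

    Returns one (main_data_begin, bytes_used, reservoir_after) tuple per frame.
    """
    reservoir = 0
    log = []
    for need in demands:
        available = frame_bytes + reservoir  # own space plus past donations only
        used = min(need, available)          # future frames cannot be borrowed from
        main_data_begin = reservoir          # how far back this frame's data starts
        reservoir = min(available - used, MAX_RESERVOIR_BYTES)
        log.append((main_data_begin, used, reservoir))
    return log
```

In a run such as simulate_reservoir([300, 300, 500, 500], 400), the first two frames donate spare bytes and the last two spend them, while the average stays at the fixed frame size.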
-
8/10/2019 Ban Dich DigitalAudio
54/71
9/14/2006 Nguyen Chan Hung - Faculty of Electronics & Telecommunications - HUT 54
p
Mcch Tng phn gii tn s trong cc bng con c m
ho nhn thc tt hn. Cho php gim bt nhiu rng ca gy ra bi cc b lc
bng con. MDCT (Modified Discrete Cosine Transform) - Bini
cosin ri rc ci tin. 50% bini gi nhau Ca sMDCT ngn : 6 bng con ph (12 im DCT)
trong mi bng con. Phn gii thi gian tt hn. Ca sMDCT di: 18 bng con ph(36 im DCT)
trong mi bng con. Phn gii tn s tt hn.
MP3: Hybrid frequency analysis
-
8/10/2019 Ban Dich DigitalAudio
55/71
9/14/2006 Nguyen Chan Hung - Faculty of Electronics & Telecommunications - HUT 55
y q y y
Purpose
  Increase the frequency resolution within the subbands for better perceptual coding.
  Allow for some cancellation of the aliasing caused by the polyphase analysis subband filters.
MDCT (Modified Discrete Cosine Transform)
  50% overlapped transform.
  Short-window MDCT: 6 sub-subbands (12-point DCT) in each subband. Better time resolution.
  Long-window MDCT: 18 sub-subbands (36-point DCT) in each subband. Better frequency resolution.
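A minimal sketch of a 50% overlapped MDCT, assuming the textbook MDCT definition, a sine window, and a 2/M scaling in the inverse transform. The function names are illustrative and this is not the normative Layer III filterbank. It shows that a 36-sample block yields 18 coefficients, a 12-sample block yields 6, and that overlap-adding neighbouring blocks cancels the time-domain aliasing of each individual block.

```python
import math


def mdct(block):
    """MDCT of 2*M input samples, returning M coefficients."""
    m = len(block) // 2
    n0 = 0.5 + m / 2
    return [
        sum(block[n] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
            for n in range(2 * m))
        for k in range(m)
    ]


def imdct(coeffs):
    """Inverse MDCT: M coefficients back to 2*M time-aliased samples."""
    m = len(coeffs)
    n0 = 0.5 + m / 2
    return [
        2.0 / m * sum(coeffs[k] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
                      for k in range(m))
        for n in range(2 * m)
    ]


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def overlap_add_roundtrip(signal, m):
    """Analyse and resynthesise with blocks of 2*m samples hopping by m.

    Samples covered by two blocks are reconstructed; the first and last
    m samples are covered only once and remain aliased.
    """
    w = sine_window(2 * m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        block = [signal[start + n] * w[n] for n in range(2 * m)]
        rec = imdct(mdct(block))
        for n in range(2 * m):
            out[start + n] += rec[n] * w[n]
    return out


long_coeffs = mdct([0.0] * 36)   # long window: 36 samples in, 18 out
short_coeffs = mdct([0.0] * 12)  # short window: 12 samples in, 6 out
```

Each hop of M new samples produces M coefficients, so the transform stays critically sampled despite the 50% overlap. The shorter block trades frequency resolution for time resolution.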
MP3 Decoder
MP3 Performance
-
8/10/2019 Ban Dich DigitalAudio
59/71
9/14/2006 Nguyen Chan Hung - Faculty of Electronics & Telecommunications - HUT 59
Sound quality     Bandwidth   Mode     Bitrate          Reduction ratio
Telephone sound   2.5 kHz     mono     8 kbps *         96:1
Short wave        4.5 kHz     mono     16 kbps          48:1
AM radio          7.5 kHz     mono     32 kbps          24:1
FM radio          11 kHz      stereo   56...64 kbps     26...24:1
Near-CD           15 kHz      stereo   96 kbps          16:1
CD                >15 kHz     stereo   112...128 kbps   14...12:1
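The reduction ratios in the table can be checked with a short calculation. The slide does not state the uncompressed reference, so the sketch below assumes 16-bit PCM sampled at 48 kHz, which reproduces most of the listed ratios. The function names and default parameters are illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def reduction_ratio(coded_kbps, channels, sample_rate_hz=48_000, bits_per_sample=16):
    """Uncompressed bitrate divided by coded bitrate.

    The 48 kHz / 16-bit reference is an assumption, not stated on the slide.
    """
    return pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels) / coded_kbps


mono_8 = reduction_ratio(8, channels=1)        # telephone-quality row
stereo_96 = reduction_ratio(96, channels=2)    # near-CD row
stereo_128 = reduction_ratio(128, channels=2)  # CD row, upper bitrate
```

Under this assumption mono PCM is 768 kbit/s and stereo PCM is 1536 kbit/s, giving 96:1 at 8 kbit/s mono, 16:1 at 96 kbit/s stereo, and 12:1 at 128 kbit/s stereo. The table's figures for the bitrate ranges are rounded, so not every entry matches this reference exactly.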
MPEG-2 Audio
Differences between MPEG-1 and MPEG-2 audio for two-channel stereo:
  The initial PCM sampling rates are extended to include 16, 22.05, and 24 kHz.
  The pre-assigned bitrates are extended down to 8 kbit/s.
  Better quantization tables are provided for the lower rates.
  The coding of scale_factor and intensity_mode stereo in Layer III is made more efficient.
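The sampling-rate extension can be summarised as a lookup. This is only an illustrative sketch: the constant and function names are hypothetical, and the returned labels are descriptive strings rather than values from the bitstream syntax.

```python
# Sampling rates named in these slides, in Hz.
MPEG1_RATES_HZ = (32_000, 44_100, 48_000)
MPEG2_LOW_RATES_HZ = (16_000, 22_050, 24_000)


def standard_for_sampling_rate(sample_rate_hz):
    """Return which standard covers a sampling rate, or None if neither does."""
    if sample_rate_hz in MPEG1_RATES_HZ:
        return "MPEG-1"
    if sample_rate_hz in MPEG2_LOW_RATES_HZ:
        return "MPEG-2 (lower sampling rates)"
    return None


# Each added MPEG-2 rate is exactly half of an MPEG-1 rate.
halved = tuple(rate // 2 for rate in MPEG1_RATES_HZ)
```

Halving the sampling rate halves the audio bandwidth that can be represented, which suits the low bitrates this extension targets.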
MPEG-2 Audio: Backward Compatible (BC)
Defines five-channel surround sound:
  Front left (L), front right (R), front center (C), side/rear left (LS), side/rear right (RS), and an optional low-frequency enhancement (LFE) channel.
  An MPEG-1 decoder can still decode the L and R signals.
Coding method:
  The L and R channels are coded as in MPEG-1.
  The additional channels are coded as ancillary data in the MPEG-1 audio stream.
3/2 stereo: L, R, C, LS, RS
5.1-channel stereo: L, R, C, LS, RS, LFE
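The backward-compatibility idea can be illustrated with a toy data model. It is a sketch under simplifying assumptions, not a bitstream parser: the class and function names are hypothetical, and the channels are plain sample lists instead of coded data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BcFrame:
    """Toy frame: an MPEG-1 compatible L/R pair plus optional extra channels.

    The extension dict stands in for the data carried as ancillary data;
    its layout here is purely illustrative.
    """
    left: List[float]
    right: List[float]
    extension: Dict[str, List[float]] = field(default_factory=dict)


def decode_as_mpeg1(frame: BcFrame) -> Dict[str, List[float]]:
    """An MPEG-1 decoder reads L and R and skips the ancillary data."""
    return {"L": frame.left, "R": frame.right}


def decode_as_mpeg2(frame: BcFrame) -> Dict[str, List[float]]:
    """An MPEG-2 decoder also reads the channels in the extension."""
    channels = decode_as_mpeg1(frame)
    channels.update(frame.extension)
    return channels


frame = BcFrame(left=[0.1], right=[0.2],
                extension={"C": [0.3], "LS": [0.4], "RS": [0.5]})
stereo = decode_as_mpeg1(frame)    # two channels
surround = decode_as_mpeg2(frame)  # five channels (3/2 stereo)
```

Because the extra channels live where an older decoder expects ancillary data, the same stream plays as stereo on MPEG-1 equipment and as surround on MPEG-2 equipment.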
MPEG-2 Audio frame
As the figure shows, the MPEG-2 Audio frame is an extension of the MPEG-1 frame that adds multi-channel and multi-lingual support.
[Figure: MPEG-2 Audio frame layout]
Frame fields: Header, CRC, Bit Allocation, SCFSI, Scale factor, Samples, Ancillary data 1
Multi-Channel (MC) audio data information: MC Header, MC CRC, MC Bit Allocation, MC SCFSI, MC Predictor, MC Samples, Multi-lingual Commentary, Ancillary data 2
MPEG-2 BC and MPEG-1 compatibility
[Figure: MPEG-1 and MPEG-2 BC compatibility]
MPEG-1: Layer I, Layer II, Layer III; Mono & Stereo; 32, 44.1, 48 kHz
MPEG-2, Low Frequency: Layer I, Layer II, Layer III; Mono & Stereo; 16, 22.05, 24 kHz
MPEG-2, Multi-Channel: Layer I, Layer II, Layer III; 5 channels; 32, 44.1, 48 kHz
MPEG-2 Audio: Advanced Audio Coding (AAC)
Goal: to further improve the quality of compressed audio using state-of-the-art technologies.
Also designated MPEG-2 NBC (Non-Backward-Compatible).
Initial PCM sampling rates: 8 kHz to 96 kHz.
Supports from mono up to 48 audio channels.
Key Points
MPEG Audio Specifications
MPEG Audio mechanism
  Human hearing & audio masking
  Sub-band coding (SBC) mechanism
  Psychoacoustic model
  The concept of layers
MPEG-1 Audio encoding/decoding
  Layer I
  Layer II
  Layer III
  Differences
MPEG-2 Audio BC
MPEG-2 AAC (NBC)