
8/10/2019 Ban Dich DigitalAudio


9/14/2006 Nguyen Chan Hung - Faculty of Electronics & Telecommunications - HUT 1

Part 2: Digital Audio Compression


MPEG Audio: Specifications

MPEG-1 (ISO/IEC 11172-3) provides:

Single-channel ('mono') and two-channel ('stereo' or 'dual mono') coding of digitized sound waves at 32, 44.1, and 48 kHz sampling rates. The predefined bit rates range from 32 to 448 kbit/s for Layer I, from 32 to 384 kbit/s for Layer II, and from 32 to 320 kbit/s for Layer III.

MPEG-2 BC (ISO/IEC 13818-3) provides:

A backward-compatible (BC) multichannel extension to MPEG-1: up to 5 main channels plus a 'low-frequency enhancement' (LFE) channel can be coded, and the bit-rate range is extended up to about 1 Mbit/s.

An extension of MPEG-1 towards lower sampling rates (16, 22.05, and 24 kHz) for bit rates from 32 to 256 kbit/s (Layer I) and from 8 to 160 kbit/s (Layer II and Layer III).


MPEG Audio: Specifications (2)

MPEG-2 AAC (ISO/IEC 13818-7) provides:

A very high-quality audio coding standard for 1 to 48 channels at sampling rates of 8 to 96 kHz, with multichannel, multilingual, and multiprogram capabilities. AAC works at bit rates from 8 kbit/s for a monophonic speech signal up to in excess of 160 kbit/s per channel for very-high-quality coding that permits multiple encode/decode cycles. Three profiles of AAC provide varying levels of complexity and scalability.

MPEG-4 (ISO/IEC 14496-3) provides:

Coding and composition of natural and synthetic audio objects.

Scalability of the bit rate of an audio bitstream.

Scalability of encoder or decoder complexity.

Structured Audio: a universal language for score-driven sound synthesis.

TTSI: an interface for text-to-speech conversion systems.

MPEG-7 (ISO/IEC 15938) provides:

Standardized descriptions and description schemes of audio structures and sound content.

A language to specify such descriptions and description schemes.


Related specifications

MUSICAM (Masking pattern adapted Universal Sub-band Integrated Coding And Multiplexing): designed to be suitable for DAB (Digital Audio Broadcasting).

ASPEC (Adaptive Spectral Perceptual Entropy Coding): designed for high degrees of compression to allow audio transmission on ISDN.

NICAM 728: used for European PAL television audio.

Dolby AC-3: designed for ATSC digital TV.




Background of audio compression

Audio compression takes advantage of two facts. First, in typical audio signals, not all frequencies are simultaneously present. Second, because of the phenomenon of masking, human hearing cannot perceive every detail of an audio signal.

Audio compression splits the audio spectrum into bands by filtering or transforms, and includes less data when describing bands in which the level is low. Where masking prevents or reduces audibility of a particular band, even less data needs to be sent.




Background of audio compression (2)

Audio compression is relatively harder than video compression because of the acuity of hearing.

1. Masking: masking only works properly when the masking and the masked sounds coincide spatially. Spatial coincidence is always the case in mono recordings but not in stereo recordings, where low-level signals can still be heard if they are in a different part of the soundstage. Consequently, in stereo and surround-sound systems, a lower compression factor is allowable for a given quality.

2. Speaker quality: delayed resonances in poor loudspeakers actually mask compression artifacts. Testing a compressor with poor speakers therefore gives a false result: signals that are apparently satisfactory may be disappointing when heard on good equipment.




The characteristics of human hearing

The top figure shows that the threshold of hearing is a function of frequency. Naturally, the greatest sensitivity is in the speech range.

The bottom figure shows the hearing threshold in the presence of a single tone. Note that the threshold is raised for tones at higher frequencies and, to some extent, at lower frequencies. This is the masking effect.

A complex input spectrum, such as music, raises the threshold at nearly all frequencies. As a result, the hiss from an analog audio cassette is only audible during quiet passages in music.
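The curve in the top figure can be approximated numerically. The minimal sketch below assumes the widely cited analytic fit for the threshold in quiet, often attributed to Terhardt. That formula is not taken from these slides, and the function name `threshold_in_quiet_db` is hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL.

    Assumption: uses the commonly cited analytic fit (often attributed to
    Terhardt), meaningful roughly over the audible range 20 Hz to 20 kHz.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    f = freq_hz / 1000.0  # the fit is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


if __name__ == "__main__":
    for f in (100, 1000, 3300, 15000):
        print(f"{f:>6} Hz: {threshold_in_quiet_db(f):6.1f} dB SPL")
```

With this fit, the threshold is lowest at a few kHz and climbs steeply towards both ends of the spectrum, matching the shape described for the top figure.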




The characteristics of human hearing (2)

A sound must be present for at least about 1 millisecond before it becomes audible. Because of this slow response, masking can still take place even when the two signals involved are not simultaneous. Forward and backward masking occur when the masking sound also masks lower-level sounds that occur before and after its actual duration.




Masking

Masking raises the threshold of hearing. Compressors take advantage of this effect by raising the noise floor, which allows the audio waveform to be expressed with fewer bits.

The noise floor can only be raised at frequencies at which there is effective masking. To maximize effective masking, it is necessary to split the audio spectrum into different frequency bands, so that different amounts of companding and noise can be introduced in each band.




MPEG Audio: General encoder model

[Figure: Input → Sub-band Filter → Bit Allocation → Bit-stream Generation → Output, with a Compute Masking block driving the bit allocation.]




MPEG Audio encoding algorithm

1. Use sub-band filters to divide the audio signal into 32 frequency sub-bands that approximate the 32 critical bands.

2. Using the psychoacoustic model, determine the amount of masking in each band caused by nearby bands.

3. If the power in a band is below the masking threshold, ignore it. Otherwise, determine the number of bits needed to represent the coefficients such that the noise introduced by quantization is below the masking effect (1 bit ≈ 6 dB).

4. Generate the bitstream.
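The per-band decision in step 3 can be sketched as below. This is a minimal illustration that assumes the slide's rule of thumb: each bit lowers quantization noise by about 6 dB (6.02 dB for a uniform quantizer). `bits_needed` is a hypothetical helper, not part of any MPEG reference code.

```python
import math

DB_PER_BIT = 6.02  # SNR gain per extra bit of a uniform quantizer (about 6 dB)


def bits_needed(signal_db: float, mask_db: float) -> int:
    """Smallest wordlength whose quantization noise falls below the mask.

    Returns 0 when the band is already below the masking threshold (the
    band is ignored); otherwise ceil(SMR / 6.02) bits.
    """
    smr = signal_db - mask_db  # signal-to-mask ratio in dB
    if smr <= 0:
        return 0
    return math.ceil(smr / DB_PER_BIT)
```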




MPEG Audio: Coding example

After analysis, the levels of the first 16 of the 32 bands are:

Band         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Level (dB)   0   8  12  10   6   2  10  60  35  20  15   2   3   5   3   1

The level of the 8th band is 60 dB. If it gives a masking of 12 dB in the 7th band and 15 dB in the 9th band, then:

The level in the 7th band is 10 dB (< 12 dB), so ignore it.

The level in the 9th band is 35 dB (> 15 dB), so encode it.

Because 1 bit corresponds to about 6 dB, band 9 can be encoded with up to 2 bits (= 12 dB) of quantization error, which still stays below its 15 dB masking level.
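The arithmetic of this example can be replayed in a few lines. The sketch hard-codes the levels from the table and the two masking values given for bands 7 and 9. Treating every other band as having a 0 dB mask is an assumption made only to keep the code total, and the function names are hypothetical.

```python
DB_PER_BIT = 6  # the slide's rule of thumb: 1 bit is about 6 dB

# Levels (dB) of the first 16 bands, from the table above.
LEVELS_DB = [0, 8, 12, 10, 6, 2, 10, 60, 35, 20, 15, 2, 3, 5, 3, 1]
# Masking produced by the 60 dB tone in band 8 (1-based band numbers).
# Assumption: bands not listed get a 0 dB mask.
MASK_DB = {7: 12, 9: 15}


def decide(band: int) -> str:
    """'encode' if the band rises above its mask, otherwise 'ignore'."""
    return "encode" if LEVELS_DB[band - 1] > MASK_DB.get(band, 0) else "ignore"


def bits_hidden_by_mask(band: int) -> int:
    """Whole bits of quantization error that stay below the band's mask."""
    return MASK_DB.get(band, 0) // DB_PER_BIT


for band in (7, 9):
    if decide(band) == "encode":
        print(f"band {band}: encode, {bits_hidden_by_mask(band)} bits of error masked")
    else:
        print(f"band {band}: ignore")
```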




Sub-band coding (SBC) - Companding

The figure shows a band-splitting compandor. The band-splitting filter is a set of narrow-band, linear-phase filters that overlap and all have the same bandwidth. The output in each band consists of samples representing a waveform.

In each frequency band, the audio input is amplified up to maximum level prior to transmission. Afterwards, each level is returned to its correct value, so noise picked up in the transmission is reduced in each band.

If the noise reduction is compared with the threshold of hearing, it can be seen that greater noise can be tolerated in some bands because of masking. Consequently, in each band after companding, it is possible to reduce the wordlength of samples. This technique achieves compression because the noise introduced by the loss of resolution is masked.
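Here is a minimal sketch of companding one band. It assumes the block's peak magnitude serves as the scale factor and a symmetric uniform quantizer does the requantizing (real MPEG coders pick scale factors from a fixed table). `compand_band` is a hypothetical name. The sketch shows the point of the slide: once the band is scaled up, a short wordlength still gives an error that is small relative to that band's own level.

```python
def compand_band(samples, bits):
    """Scale a block to full range, requantize to `bits`, then expand it again.

    Returns (scale_factor, transmitted_codes, decoded_samples).
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed code")
    scale = max(abs(s) for s in samples) or 1.0  # peak acts as the scale factor
    levels = 2 ** (bits - 1) - 1                 # codes run from -levels to +levels
    codes = [round(s / scale * levels) for s in samples]  # encoder: gain + requantize
    decoded = [c / levels * scale for c in codes]          # decoder: undo both steps
    return scale, codes, decoded


if __name__ == "__main__":
    quiet_band = [0.01, -0.02, 0.015, 0.005]
    for bits in (4, 8):
        scale, codes, out = compand_band(quiet_band, bits)
        err = max(abs(a - b) for a, b in zip(quiet_band, out))
        print(bits, codes, f"max error {err:.2e}")
```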




MPEG Audio Layer I

The figure shows a simplified band-splitting coder used in MPEG Layer I. The digital audio input is fed to a band-splitting filter that divides the spectrum of the signal into a number of bands (32 bands). The time axis is divided into blocks of equal length. In MPEG Layer I, a block is 384 input samples, so at the output of the filter there are 12 samples in each of the 32 bands.

Within each band, the level is amplified by multiplication to bring it up to maximum. The gain required is constant for the duration of a block. A single scale factor is transmitted with each block for each band in order to allow the process to be reversed at the decoder.
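The block structure described above can be sketched as follows. The filter bank itself is not implemented: the code assumes it has already produced 12 time slots of 32 subband samples for one 384-sample block. It uses the peak magnitude per band as a stand-in for the scale factor, whereas real Layer I scale factors are indices into a table. All function names are hypothetical.

```python
SUBBANDS = 32
SLOTS_PER_BLOCK = 12  # Layer I: 384 input samples / 32 bands


def group_by_band(slots):
    """Turn 12 time slots x 32 bands into 32 bands x 12 samples."""
    if len(slots) != SLOTS_PER_BLOCK or any(len(s) != SUBBANDS for s in slots):
        raise ValueError("expected 12 time slots of 32 subband samples")
    return [[slot[band] for slot in slots] for band in range(SUBBANDS)]


def block_scale_factors(bands):
    """One gain reference per band, constant for the whole block."""
    return [max(abs(x) for x in band) for band in bands]


def normalize(bands, scale_factors):
    """Bring every band up to full scale (|x| <= 1) before requantizing."""
    return [[x / sf if sf else 0.0 for x in band]
            for band, sf in zip(bands, scale_factors)]
```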




MPEG Audio Layer I (cont.)

The filter bank output is also analyzed to determine the spectrum of the input signal. This analysis drives a masking model that determines the degree of masking that can be expected in each band. The more masking available, the less accurate the samples in each band can be.

The sample accuracy is reduced by requantizing to reduce wordlength. This reduction is also constant for every word in a band, but different bands can use different wordlengths. The wordlength needs to be transmitted as a bit allocation code for each band to allow the decoder to deserialize the bitstream properly.
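A toy packer and unpacker shows why the decoder needs the bit allocation codes: without the per-band wordlengths, the variable-length sample words could not be separated again. This is a sketch under simplifying assumptions. The 4-bit code stores the wordlength directly (the real Layer I table maps codes to wordlengths differently), sample codes are unsigned integers, and all names are hypothetical.

```python
def pack(codes_per_band, wordlengths):
    """Serialize 4-bit allocation codes, then each band's sample codes, MSB first."""
    bits = []

    def put(value, width):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        bits.extend((value >> i) & 1 for i in reversed(range(width)))

    for wl in wordlengths:
        put(wl, 4)              # bit allocation code for this band
    for codes, wl in zip(codes_per_band, wordlengths):
        for c in codes:
            put(c, wl)          # wl == 0: the band carries no sample bits
    return bits


def unpack(bits, n_bands, n_samples):
    """Deserialize: read the allocation codes first, then use them to split samples."""
    pos = 0

    def get(width):
        nonlocal pos
        if pos + width > len(bits):
            raise ValueError("truncated bitstream")
        value = 0
        for b in bits[pos:pos + width]:
            value = (value << 1) | b
        pos += width
        return value

    wordlengths = [get(4) for _ in range(n_bands)]
    samples = [[get(wl) for _ in range(n_samples)] for wl in wordlengths]
    return wordlengths, samples
```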



MPEG Level 1 audio bit stream

The top figure shows an MPEG Level 1 audio bit stream, which includes:

A synchronizing pattern and the header.

32 bit allocation codes of four bits each. These codes describe the wordlength of samples in each subband.

32 scale factors used in the companding of each band. These scale factors determine the gain needed in the decoder to return the audio to the correct level.

Audio data in each band.
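These fields also fix how many bits are left for the audio samples. The sketch below assumes the standard MPEG-1 Layer I frame-length formula: 12 × bitrate / sampling rate slots of 4 bytes, plus an optional padding slot. It also assumes 6-bit scale factors, sent only for bands with a non-zero allocation. Joint-stereo modes and ancillary data are ignored, and the function names are hypothetical.

```python
HEADER_BITS = 32
ALLOC_BITS = 4         # one bit allocation code per band and channel
SCALE_FACTOR_BITS = 6  # Layer I scale factor index width
BANDS = 32


def layer1_frame_bytes(bitrate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Layer I frame length: floor(12 * bitrate / fs) + padding slots of 4 bytes."""
    return (12 * bitrate_bps // sample_rate_hz + padding) * 4


def layer1_sample_bits(bitrate_bps, sample_rate_hz, channels, active_bands, crc=False):
    """Bits left for samples after header, CRC, allocation codes and scale factors.

    active_bands counts (band, channel) pairs with a non-zero allocation,
    because only those carry a scale factor.
    """
    overhead = HEADER_BITS + (16 if crc else 0)
    overhead += BANDS * ALLOC_BITS * channels     # 128 bits mono, 256 stereo
    overhead += active_bands * SCALE_FACTOR_BITS  # 0 to 384 bits for stereo
    return layer1_frame_bytes(bitrate_bps, sample_rate_hz) * 8 - overhead
```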



MPEG Layer I decoder

The synchronization pattern is detected by the timing generator, which deserializes the bit allocation and scale factor data. The bit allocation data then allows deserialization of the variable-length samples.

The requantizing is reversed, and the compression is reversed by the scale factor data to put each band back to the correct level. These 32 separate bands are then combined in a combiner filter, which produces the audio output.
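The timing generator's first job, finding the synchronization pattern, can be sketched as a byte scan for the 12-bit MPEG audio sync word (twelve 1 bits). The scan is followed by reading the ID, layer and protection bits that come right after the sync word in the header. This is a simplified sketch. A real decoder also validates the remaining header fields and checks that another header follows one frame later, because the sync pattern can occur by chance inside audio data. The names are hypothetical.

```python
LAYER_FROM_BITS = {0b11: 1, 0b10: 2, 0b01: 3}  # 0b00 is reserved


def find_sync(data: bytes):
    """Byte offsets where a 12-bit sync word (0xFFF) starts on a byte boundary."""
    return [i for i in range(len(data) - 1)
            if data[i] == 0xFF and (data[i + 1] & 0xF0) == 0xF0]


def header_info(data: bytes, offset: int):
    """Decode (is_mpeg1, layer, has_crc) from the 4 bits after the sync word."""
    b = data[offset + 1]
    is_mpeg1 = bool((b >> 3) & 1)                 # ID bit: 1 = MPEG-1
    layer = LAYER_FROM_BITS.get((b >> 1) & 0b11)  # None if reserved
    has_crc = (b & 1) == 0                        # protection_bit 0 => CRC follows
    return is_mpeg1, layer, has_crc
```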



MPEG Audio: The concept of Layers

There are three layers in MPEG audio: Layer I, II, and III.

The basic model is similar, but CODEC complexity increases with each layer.

A decoder for a higher layer can decode the stream of a lower layer (e.g., a Layer III decoder can decode a Layer II stream).

A psychoacoustic model is used to determine the bit allocation to each subband.

Compression ratios (the original bit rate is about 1.4 Mbit/s for CD-quality audio):

1:4 by Layer 1 (corresponds to 384 kbit/s for a stereo signal),

1:6...1:8 by Layer 2 (corresponds to 256...192 kbit/s for a stereo signal),

1:10...1:12 by Layer 3 (corresponds to 128...112 kbit/s for a stereo signal).
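The quoted ratios follow from the CD bit rate. Here is a quick check, assuming CD audio is 44.1 kHz, 16-bit, two-channel PCM:

```python
CD_BITRATE = 44_100 * 16 * 2  # samples/s * bits/sample * channels = 1,411,200 bit/s


def compression_ratio(coded_bitrate_bps: int) -> float:
    """How many times smaller the coded stream is than CD-quality PCM."""
    return CD_BITRATE / coded_bitrate_bps


for layer, kbps in ((1, 384), (2, 256), (2, 192), (3, 128), (3, 112)):
    print(f"Layer {layer} at {kbps} kbit/s: 1:{compression_ratio(kbps * 1000):.1f}")
```

The exact values show that the slide's figures are rounded: about 1:3.7 at 384 kbit/s, roughly 1:5.5 to 1:7.4 for the Layer 2 rates, and 1:11 to 1:12.6 for the Layer 3 rates.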



MPEG Audio: Filter type

Layer I: DCT-type filter with one frame and equal frequency spread per band. The psychoacoustic model only uses frequency masking.

Layer II: uses three frames in the filter (a total of 1152 samples). The psychoacoustic model includes a little of the temporal masking.

Layer III: a better critical-band filter is used (non-equal frequencies). The psychoacoustic model includes temporal masking effects. Layer III also takes stereo redundancy into account and uses a Huffman coder.



MPEG-1 Audio Encoder (Layer I & II)

[Figure: encoder block diagram. Blocks: Analysis filter bank (32 subbands, 0 to 31), Scaler, Quantizer, Quantized sample encoder, Psychoacoustic model, Bit-rate allocation, Bit-rate allocation encoder, Scale factor encoder, Multiplexer. Signals: PCM input, Output, SFn, Rn, SMRn. SF = Scale factor, R = Rate, SMR = Signal-to-Mask Ratio.]



MPEG-1 Audio Encoder (cont.)

The input audio stream passes through a filter bank that divides the input into multiple frequency subbands. At the same time, the input audio stream passes through a psychoacoustic model that determines the ratio of the signal energy to the masking threshold for each subband.

The bit- or noise-allocation block uses the signal-to-mask ratios to decide how to apportion the total number of code bits available for the quantization of the subband signals, so as to minimize the audibility of the quantization noise.

Finally, the multiplexer takes the representation of the quantized subband samples and formats this data and side information into a coded bitstream. Ancillary data not necessarily related to the audio stream can be inserted within the coded bitstream.
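The bit- or noise-allocation step is often described as an iterative loop. Here is a minimal sketch under stated assumptions. Each extra bit per sample is modelled as 6.02 dB of SNR, whereas real Layer I/II encoders use tabulated SNR values per allocation step. Each bit per sample costs 12 bits of the frame budget (one Layer I block). Bands stop receiving bits once their quantization noise is below the mask. The function name is hypothetical.

```python
DB_PER_BIT = 6.02   # modelled SNR gain per bit per sample
MAX_BITS = 15       # largest wordlength considered here


def allocate_bits(smr_db, bits_available, samples_per_band=12):
    """Greedy allocation driven by the mask-to-noise ratio MNR = SNR - SMR.

    Each pass gives one more bit per sample to the band whose MNR is lowest
    (noise most audible), until every band's noise is masked or the budget
    runs out.
    """
    alloc = [0] * len(smr_db)
    while bits_available >= samples_per_band:
        audible = [b for b, n in enumerate(alloc)
                   if n < MAX_BITS and n * DB_PER_BIT < smr_db[b]]
        if not audible:
            break  # quantization noise is below the mask in every band
        worst = min(audible, key=lambda b: alloc[b] * DB_PER_BIT - smr_db[b])
        alloc[worst] += 1
        bits_available -= samples_per_band
    return alloc
```

With a generous budget, each band receives just enough bits to push its noise under the mask. With a tight budget, the bits go first to the band where the noise is most audible.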



MPEG Audio: Subband sample grouping

Layer I: 12 × 32 = 384 samples per frame. Layers II and III: 12 × 3 × 32 = 1152 samples per frame.

[Figure: audio samples in → Subband filter 1, Subband filter 2, ..., Subband filter 32, each producing groups of 12 samples; a Layer I frame holds one group of 12 samples per subband, a Layer II/III frame holds three.]
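The grouping fixes the frame length in samples, and therefore in time, for each sampling rate. Here is a small sketch; the helper names are hypothetical.

```python
SUBBANDS = 32
SAMPLES_PER_BLOCK = 12


def samples_per_frame(layer: int) -> int:
    """Layer I: one 12-sample block per subband; Layers II and III: three."""
    if layer not in (1, 2, 3):
        raise ValueError("layer must be 1, 2 or 3")
    blocks = 1 if layer == 1 else 3
    return SAMPLES_PER_BLOCK * blocks * SUBBANDS


def frame_duration_ms(layer: int, sample_rate_hz: int) -> float:
    return 1000.0 * samples_per_frame(layer) / sample_rate_hz


for fs in (32_000, 44_100, 48_000):
    print(fs, [round(frame_duration_ms(layer, fs), 2) for layer in (1, 2, 3)])
```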



Psychoacoustic model: Layer I & II

The separator identifies and separates the tonal and noise-like (non-tonal) components of the audio signal, because the masking abilities of the two types of signal differ.

[Figure: PCM input → Fast Fourier Transform (FFT, 512 or 1024 frequencies) → Compute signal power (Sn) and Tonal/non-tonal separator → Compute tonal masking threshold function and Compute non-tonal masking threshold function, combined with Compute quiet threshold into the masking threshold function → Calculate minimum (Mn) → SMRn.]
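The last two blocks of the diagram can be sketched as below: taking the minimum masking threshold per subband (Mn), then forming SMRn from the signal level Sn. This is a simplification under stated assumptions. Sn is taken as the largest spectral level in the subband and Mn as the smallest global-threshold value there, while the ISO psychoacoustic model also takes scale factors into account. Individual thresholds are combined by adding powers. All names are hypothetical.

```python
import math

SUBBANDS = 32


def combine_thresholds_db(*thresholds_db):
    """Add individual masking thresholds (and the quiet threshold) as powers."""
    return 10 * math.log10(sum(10 ** (t / 10) for t in thresholds_db))


def smr_per_subband(spectrum_db, global_threshold_db):
    """SMRn = Sn - Mn for 32 equal-width groups of spectral lines."""
    n = len(spectrum_db)
    if n != len(global_threshold_db) or n % SUBBANDS:
        raise ValueError("need equal-length inputs with a multiple of 32 lines")
    width = n // SUBBANDS
    smr = []
    for band in range(SUBBANDS):
        lines = range(band * width, (band + 1) * width)
        s_n = max(spectrum_db[i] for i in lines)          # signal level in band
        m_n = min(global_threshold_db[i] for i in lines)  # minimum masking threshold
        smr.append(s_n - m_n)
    return smr
```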



MPEG-1 Audio Layer III Encoder (mp3)

[Figure: Layer III encoder block diagram. Blocks: Analysis filter bank (32 subbands, 0 to 31), MDCT (sub-subbands), Scaler, Quantizer, Huffman encoder, Psychoacoustic model (SMRn), a control block that calculates window sizes, scale factor bands, bit rate allocation and quantization taking buffer fullness into account, Scale factor encoder, Side information encoder, Multiplexer, Buffer. Signals: PCM input, Output, Scale_factors, Side information, Buffer fullness.]



Frame formats of 3 layers

SCFSI = Scale Factor Selection Information.

The side information of an mp3 frame is 17 bytes (136 bits) in single-channel mode and 32 bytes (256 bits) in two-channel modes.

The CRC is optional.

While a Layer I frame contains only 384 samples, Layer II and Layer III frames contain 1152 samples.

The main data of an mp3 frame may contain data of neighboring frames (see the next slides).

[Figure: frame layouts, field sizes in bits.
Layer I: Header (32) | CRC (0, 16) | Bit allocation (128, 256) | Scale factors (0-384) | Samples | Ancillary data
Layer II: Header (32) | CRC (0, 16) | Bit allocation (128, 256) | SCFSI (0-60) | Scale factors (0-384) | Samples | Ancillary data
Layer III: Header (32) | CRC (0, 16) | Side information (136, 256) | Main data (may belong to other frames) | Ancillary data]
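The fixed side-information sizes determine how much room a Layer III frame leaves for main data. Here is a minimal sketch. It assumes the standard MPEG-1 Layer II/III frame-length formula: 144 × bitrate / sampling rate bytes, where 144 = 1152 samples / 8 bits, plus an optional padding byte. It also assumes a 4-byte header. The helper names are hypothetical.

```python
HEADER_BYTES = 4
CRC_BYTES = 2


def layer3_frame_bytes(bitrate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """MPEG-1 Layer II/III frame length in bytes."""
    return 144 * bitrate_bps // sample_rate_hz + padding


def main_data_slot_bytes(bitrate_bps, sample_rate_hz, channels, crc=False, padding=0):
    """Bytes this frame contributes to main data after header, CRC and side info.

    Because of the bit reservoir, a frame's main data may actually start in
    an earlier frame, so this is capacity, not the size of one frame's audio.
    """
    side_info = 17 if channels == 1 else 32
    return (layer3_frame_bytes(bitrate_bps, sample_rate_hz, padding)
            - HEADER_BYTES - (CRC_BYTES if crc else 0) - side_info)
```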


MP3 frame

The main data section contains the coded scale factor values and the Huffman-coded frequency lines. Its length depends on the bit rate and the length of the ancillary data. The length of the scale factor part depends on whether scale factors are reused, and also on the window length (short or long). The scale factors are used in the requantization of the samples.

The demand for Huffman code bits varies with time during the coding process. The variable bit rate (VBR) format can be used to handle this, but a fixed bit rate is often required for an application such as broadcasting. Therefore, there is also a bit reservoir technique that allows unused main data storage in one frame to be used by up to two consecutive frames.


MP3 frame: Bit Reservoir

The design of the Layer III bitstream better fits the encoder's time-varying demand for code bits.

As with Layer II, Layer III processes the audio data in frames of 1,152 samples.

Unlike Layer II, the coded data representing these samples do not necessarily fit into a fixed-length frame in the code bitstream.

The encoder can donate bits to a reservoir when it needs fewer than the average number of bits to code a frame.
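The donate-and-borrow behaviour of the bit reservoir can be simulated in a few lines. This is a simplified model, not an encoder's actual rate control:

- Each constant-bitrate frame offers a fixed number of main-data bytes.
- A frame may use its own space plus whatever earlier frames left unused, and nothing can be borrowed from future frames.
- The saved amount is capped at 511 bytes, the largest offset a 9-bit main_data_begin pointer can express.

The names are hypothetical.

```python
MAX_RESERVOIR_BYTES = (1 << 9) - 1  # 511: range of the 9-bit main_data_begin


def simulate_reservoir(demands, frame_capacity):
    """Return (granted, reservoir_after) per frame for a constant-bitrate stream.

    demands: main-data bytes each frame would like to spend.
    frame_capacity: main-data bytes each fixed-size frame provides.
    """
    reservoir = 0
    granted, levels = [], []
    for want in demands:
        available = frame_capacity + reservoir   # own space + bytes donated earlier
        use = min(want, available)               # cannot borrow from the future
        reservoir = min(available - use, MAX_RESERVOIR_BYTES)
        granted.append(use)
        levels.append(reservoir)
    return granted, levels


if __name__ == "__main__":
    # A quiet frame donates bytes that a following transient can spend.
    print(simulate_reservoir([100, 300, 200], frame_capacity=200))
```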


MP3 frame: Bit Reservoir (2)

Later, when the encoder needs more than the average number of bits to code a frame, it can borrow bits from the reservoir.

The encoder can only borrow bits donated from past frames; it cannot borrow from future frames.

The MP3 bitstream includes a 9-bit pointer, "main_data_begin", in each frame's side information, pointing to the location of the starting byte of the audio data for that frame.


MP3: Hybrid frequency analysis

Purpose:

Increase the frequency resolution in subbands for better perceptual coding.

Allow for some cancellation of aliasing caused by the polyphase analysis subband filters.

MDCT (Modified Discrete Cosine Transform): a 50% overlapped transform.

Short-window MDCT: 6 sub-subbands (12-point DCT) in each subband. Better time resolution.

Long-window MDCT: 18 sub-subbands (36-point DCT) in each subband. Better frequency resolution.
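The MDCT stage can be illustrated with a direct (slow, O(N²)) implementation. The sketch assumes a sine window on both analysis and synthesis. That window satisfies the Princen-Bradley condition, so the time-domain aliasing of two 50% overlapped blocks cancels on overlap-add. The sketch does not reproduce the encoder's window switching or its alias-reduction step. The names are hypothetical.

```python
import math


def _window(two_n):
    # Sine window: w[i]^2 + w[i + N]^2 == 1 (Princen-Bradley condition).
    return [math.sin(math.pi * (i + 0.5) / two_n) for i in range(two_n)]


def _kernel(i, k, n):
    return math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))


def mdct(block):
    """2N windowed input samples -> N coefficients (36 -> 18, or 12 -> 6)."""
    two_n = len(block)
    if two_n % 2:
        raise ValueError("MDCT needs an even block length")
    n, w = two_n // 2, _window(two_n)
    return [sum(w[i] * block[i] * _kernel(i, k, n) for i in range(two_n))
            for k in range(n)]


def imdct(coeffs):
    """N coefficients -> 2N windowed samples, scaled for sine-window overlap-add."""
    n = len(coeffs)
    two_n, w = 2 * n, _window(2 * n)
    return [w[i] * (2.0 / n) * sum(coeffs[k] * _kernel(i, k, n) for k in range(n))
            for i in range(two_n)]


if __name__ == "__main__":
    x = [math.sin(0.3 * i) + 0.01 * i for i in range(54)]   # three halves of N = 18
    a, b = imdct(mdct(x[0:36])), imdct(mdct(x[18:54]))      # 50% overlapped blocks
    middle = [a[18 + i] + b[i] for i in range(18)]           # overlap-add
    print(max(abs(m - o) for m, o in zip(middle, x[18:36])))  # close to 0
```

A 36-sample block gives the 18 long-window lines per subband, and a 12-sample block gives the 6 short-window lines. The overlap-add check shows why the 50% overlap does not cost extra data: each coefficient set is critically sampled, and the aliasing cancels between neighbouring blocks.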


MP3 Decoder



MP3 Performance

Sound quality     Bandwidth   Mode     Bitrate            Reduction ratio
Telephone sound   2.5 kHz     mono     8 kbps *           96:1
Short wave        4.5 kHz     mono     16 kbps            48:1
AM radio          7.5 kHz     mono     32 kbps            24:1
FM radio          11 kHz      stereo   56...64 kbps       26...24:1
Near-CD           15 kHz      stereo   96 kbps            16:1
CD                >15 kHz     stereo   112...128 kbps     14...12:1


MPEG-2 Audio

Differences between MPEG-1 and MPEG-2 audio for two-channel stereo:

The initial PCM sampling rates are extended to include 16, 22.05, and 24 kHz.

The pre-assigned bit rates are extended down to as low as 8 kbit/s.

Better quantization tables are provided for lower rates.

The coding efficiency of scale_factor and intensity_mode stereo coding in Layer III is improved.


MPEG-2 Audio: Backward Compatible (BC)

Defines five-channel surround sound: front left (L), front right (R), front center (C), side/rear left (LS), side/rear right (RS), and an optional low-frequency enhancement (LFE) channel.

An MPEG-1 decoder can decode the L and R signals.

Coding method:

The L and R channels are coded as in MPEG-1.

The additional channels are coded as ancillary data in the MPEG-1 audio stream.

3/2 stereo: L, R, C, LS, RS.

5.1-channel stereo: L, R, C, LS, RS, LFE.


MPEG-2 Audio frame

As can be seen in the figure, the MPEG-2 Audio frame is an extension of the MPEG-1 frame that supports multichannel and multilingual audio.

[Figure: MPEG-1 part fields: Header, CRC, Bit allocation, SCFSI, Scale factors, Samples, Ancillary data 1. Multi-Channel (MC) audio data information fields: MC Header, MC CRC, MC Bit allocation, MC SCFSI, MC Predictor, MC Samples, Multi-lingual commentary, Ancillary data 2.]

MPEG-2 BC and MPEG-1 compatibility

[Figure: MPEG-1 Layer I, Layer II and Layer III (mono & stereo, 32, 44.1, 48 kHz) are extended in MPEG-2 by a "Low Frequency" (lower sampling rate) extension and by a multi-channel extension (5 channels, 32, 44.1, 48 kHz).]


MPEG-2 Audio: Advanced Audio Coding (AAC)

Aims to further improve the quality of compressed audio using state-of-the-art technologies.

It was designated MPEG-2 NBC (Non-Backward Compatible).

Initial PCM sampling rate: 8 kHz to 96 kHz.

Supports from mono up to 48 audio channels.


Key Points

MPEG Audio Specifications

MPEG Audio mechanism:
Human hearing & Audio masking
Sub-band coding (SBC) mechanism
Psychoacoustic model
The concept of layers

MPEG-1 Audio encoding/decoding: Layer I, Layer II, Layer III, Differences

MPEG-2 Audio BC

MPEG-2 AAC (NBC)
