1 Digital audio coding: perceptual coding, international standards, MPEG-1 Layer 3
Post on 22-Dec-2015
3
Perceptual Coding
Non-uniform sensitivity of hearing
Spectral masking (masking across frequency)
Temporal masking (masking along the time axis)
4
Non-uniform sensitivity
Critical band
Band no.   Center freq (Hz)   Bandwidth (Hz)
1          50                 ~100
2          150                100~200
3          250                200~300
4          350                300~400
...        ...                ...
25         19500              15500~
7
Masking curve
[Figure: minimum masking threshold in dB (axis from 45 to 80) plotted against subband number (axis from 0 to 35)]
8
International standards for audio/video coding
Standards body        Audio standard   Typical application   Year finalized
ITU-T                 G.728/G.722      Videoconference       1991
ISO (MPEG-1)          11172-3          PC multimedia         1992
ISO/ITU-T (MPEG-2)    13818-3          Digital movie         1995
ITU-T                 G.723.1          Videophone            1996
ISO (AAC)             13818-7          Digital audio         1997
9
Origins of ISO MPEG-1
Simplified MUSICAM => MPEG-1 Layer 1
Modified MUSICAM => MPEG-1 Layer 2
Modified ASPEC => MPEG-1 Layer 3 (MP3)
Sound quality: ASPEC is better than MUSICAM
Complexity: MUSICAM is simpler than ASPEC
Coding delay: MUSICAM is shorter than ASPEC
10
MPEG-1 Layer I, II
[Block diagram. Encoder: PCM input -> analysis filterbank -> scaler and quantizer -> mux -> digital channel, with a side path FFT -> masking thresholds -> dynamic bit and scalefactor allocator and coder. Decoder: digital channel -> demux -> dequantizer and descaler -> synthesis filterbank -> PCM output, with a dynamic bit and scalefactor decoder.]
11
MPEG-1 Layer III
[Block diagram. Encoder: analysis filterbank -> MDCT with dynamic windowing -> scaler and quantizer -> Huffman coding -> mux -> digital channel, with a side path FFT -> masking thresholds and a block for coding of side information. Decoder: digital channel -> demux -> Huffman decoding -> dequantizer and descaler -> inverse MDCT with dynamic windowing -> synthesis filterbank, with a block for decoding of side information.]
12
Characteristics of the MPEG-1 Audio layers
                                 Layer I            Layer II           Layer III
Analysis/synthesis               32 subbands        32 subbands        Subband + MDCT
Output bit rate                  32-448 kbps        32-384 kbps        32-320 kbps
Efficient bit rate               160-224 kbps       96-128 kbps        64-96 kbps
Sampling frequency               32, 44.1, 48 kHz   32, 44.1, 48 kHz   32, 44.1, 48 kHz
Intensity stereo                 Yes                Yes                Yes
Quantization                     Uniform            Uniform            Non-uniform
Window                           Fixed              Fixed              Dynamic
Entropy coding                   No                 No                 Yes
Slot size                        4 bytes            1 byte             1 byte
Frame size                       384 samples        1152 samples       1152 samples
Bit-allocation representation    Explicit           Indexing           Indexing
Frame self-decodable             Yes                Yes                No
Suggested psychoacoustic model   Model 1            Model 1            Model 2
14
Syntax of MPEG-1 Audio
Bitstream: Frame | Frame | Frame | Frame | Frame
Each frame: Header | Error check | Audio data | Ancillary data
Header fields (width in bits):
Syncword (12)
ID (1)
Layer (2)
Protection bit (1)
Bit rate index (4)
Sampling frequency (2)
Padding bit (1)
Private bit (1)
Mode (2)
Mode extension (2)
Copyright (1)
Original/copy (1)
Emphasis (2)
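The field widths above add up to 32 bits, so a header fits in four bytes. The following is a minimal sketch of how those fields could be unpacked; the names `FrameHeader`, `FIELD_WIDTHS` and `parse_header` are illustrative and not part of the standard.

```python
from typing import NamedTuple


class FrameHeader(NamedTuple):
    """Raw header field values, in transmission order (hypothetical container)."""
    syncword: int
    id: int
    layer: int
    protection_bit: int
    bitrate_index: int
    sampling_frequency: int
    padding_bit: int
    private_bit: int
    mode: int
    mode_extension: int
    copyright: int
    original: int
    emphasis: int


# (field name, width in bits) exactly as listed on the slide; widths sum to 32.
FIELD_WIDTHS = [
    ("syncword", 12), ("id", 1), ("layer", 2), ("protection_bit", 1),
    ("bitrate_index", 4), ("sampling_frequency", 2), ("padding_bit", 1),
    ("private_bit", 1), ("mode", 2), ("mode_extension", 2),
    ("copyright", 1), ("original", 1), ("emphasis", 2),
]


def parse_header(data: bytes) -> FrameHeader:
    """Split the first four bytes of a frame into the 13 header fields."""
    if len(data) < 4:
        raise ValueError("a header needs 4 bytes")
    word = int.from_bytes(data[:4], "big")  # most significant bit comes first
    values = {}
    shift = 32
    for name, width in FIELD_WIDTHS:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    if values["syncword"] != 0xFFF:
        raise ValueError("syncword '1111 1111 1111' not found")
    return FrameHeader(**values)
```

For example, the four bytes `FF FB 90 00` parse to ID = 1, layer bits `01`, protection bit 1 and bit rate index `1001`. The fields are returned as raw codes; mapping them to meanings uses the tables on the following slides.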
15
Header (1)
Syncword = ‘1111 1111 1111’
ID:
1   MPEG audio
0   reserved
Layer:
11   Layer I
10   Layer II
01   Layer III
00   reserved
Protection bit:
0   CRC check present
1   CRC check absent
16
Header ( 2 ) bit rate index
Bit-rate index Layer I Layer II Layer III
0000 Free format Free format Free format
0001 32 kbits/s 32 kbits/s 32 kbits/s
0010 64 kbits/s 48 kbits/s 40 kbits/s
0011 96 kbits/s 56 kbits/s 48 kbits/s
0100 128 kbits/s 64 kbits/s 56 kbits/s
0101 160 kbits/s 80 kbits/s 64 kbits/s
0110 192 kbits/s 96 kbits/s 80 kbits/s
0111 224 kbits/s 112 kbits/s 96 kbits/s
1000 256 kbits/s 128 kbits/s 112 kbits/s
1001 288 kbits/s 160 kbits/s 128 kbits/s
1010 320 kbits/s 192 kbits/s 160 kbits/s
1011 352 kbits/s 224 kbits/s 192 kbits/s
1100 384 kbits/s 256 kbits/s 224 kbits/s
1101 416 kbits/s 320 kbits/s 256 kbits/s
1110 448 kbits/s 384 kbits/s 320 kbits/s
1111 forbidden forbidden forbidden
17
Header (3)
Sampling frequency:
00   44.1 kHz
01   48 kHz
10   32 kHz
11   reserved
Padding bit:
0   without padding slot
1   with padding slot
Private bit: 1 bit for private usage
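The bit rate index, sampling frequency and padding bit together determine how many bytes a frame occupies. A Layer III frame carries 1152 samples and its slot size is 1 byte, so the length is 1152/8 = 144 times the bit rate divided by the sampling frequency, plus one slot when padding is set. The sketch below assumes Layer III only; the function and table names are illustrative.

```python
# Layer III column of the bit-rate index table, in kbit/s.
# Index 0 is free format and index 15 is forbidden, so both map to None.
LAYER3_BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                        128, 160, 192, 224, 256, 320, None]

# Sampling-frequency codes from the header; code 0b11 is reserved.
SAMPLING_FREQ_HZ = {0b00: 44100, 0b01: 48000, 0b10: 32000}


def layer3_frame_length(bitrate_index: int, sampling_frequency: int,
                        padding_bit: int) -> int:
    """Frame length in bytes for a Layer III frame (1152 samples, 1-byte slots)."""
    kbps = LAYER3_BITRATES_KBPS[bitrate_index]
    if kbps is None:
        raise ValueError("free-format or forbidden bit-rate index")
    if sampling_frequency not in SAMPLING_FREQ_HZ:
        raise ValueError("reserved sampling-frequency code")
    fs = SAMPLING_FREQ_HZ[sampling_frequency]
    # 1152 samples / 8 bits per byte = 144; integer division drops the fraction,
    # and the padding slot makes up for it in some frames.
    return 144 * kbps * 1000 // fs + padding_bit
```

At 128 kbit/s and 44.1 kHz the exact value is about 417.96 bytes, which is why such streams alternate between 417-byte and 418-byte frames using the padding bit.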
18
Header (4)
Mode:
00   stereo
01   joint stereo
10   dual channel
11   single channel
Mode extension (Layers I, II):
00   subbands 4-31 in intensity stereo
01   subbands 8-31 in intensity stereo
10   subbands 12-31 in intensity stereo
11   subbands 16-31 in intensity stereo
Copyright: "1" means copyright protected
19
Joint Stereo
Stereo / Joint stereo
MS stereo (Layer III only): Left/Right are transformed to Middle/Side
Intensity stereo
Layers I, II: specify two channel scale factors on common data
Layer III: specify one ratio on common data
20
Header (5)
Original/copy:
0   copy
1   original
Emphasis:
00   no emphasis
01   50/15 microseconds
10   reserved
11   CCITT J.17
21
Error Check
A 16-bit parity-check word is used for optional error detection.
The generator polynomial is:
$$G(X) = X^{16} + X^{15} + X^{2} + 1$$
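The generator polynomial X^16 + X^15 + X^2 + 1 corresponds to the 16-bit constant 0x8005 (the X^16 term is implicit). A minimal bitwise sketch of the check word follows. It assumes the protected bits are supplied as whole bytes, most significant bit first, and that the register starts at all ones; the initial value is a parameter here because it is not stated on the slide. The function name is illustrative.

```python
def crc16_mpeg(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-16 with generator X^16 + X^15 + X^2 + 1 (0x8005), MSB first.

    `crc` is the initial register value (assumed all ones by default).
    """
    for byte in data:
        crc ^= byte << 8              # feed the next 8 bits into the top of the register
        for _ in range(8):
            if crc & 0x8000:          # top bit set: shift and subtract the generator
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

A useful property of this form is that running the same function over the protected bytes followed by the two check bytes leaves a zero register, which is how a receiver can verify the word.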
22
Protection Range
Layer I: bits 16-31 of the header, bit allocation
Layer II: bits 16-31 of the header, bit allocation, scalefactor selection information
Layer III: bits 16-31 of the header, side information (136 bits for single channel, 256 bits for other modes)
Frame layout: Header | Error check | Audio data | Ancillary data
23
Layer III Coding(MP3)
[Block diagram: audio in -> 32-band filterbank -> 18-line MDCT (576 spectral lines) -> non-uniform quantization -> Huffman coding -> multiplex -> bit stream output. A perception model (masking calculation) drives the bit allocation, and side information is fed to the multiplexer.]
24
Advanced Features in Layer III
- Higher frequency resolution (576 spectral lines)
- Variable length of data frame
- Adaptive windowing
- Variable length coding (32 VLC tables)
- Two granules in one frame
- Spectral region partition
- Non-uniform scale factor bands
- Special stereo mode extension
25
Variable Length of Main Data
Frame is not self-decodable
[Figure: four consecutive frames (frame1 to frame4), each beginning with sync and side info. The pointers main_data_begin1 to main_data_begin4 mark where the main data of each frame starts.]
28
Variable Length Coding
Fixed length coding
Symbol   Code   Probability
A        00     0.5
B        01     0.25
C        10     0.125
D        11     0.125
Average length: 2*0.5 + 2*0.25 + 2*0.125 + 2*0.125 = 2 bits
Variable length coding
Symbol   Code   Probability
A        0      0.5
B        10     0.25
C        110    0.125
D        111    0.125
Average length: 1*0.5 + 2*0.25 + 3*0.125 + 3*0.125 = 1.75 bits
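The two averages can be checked by weighting each code length with its symbol probability. This small sketch uses the two codebooks from the slide; the function name and the dictionary layout are illustrative.

```python
def average_code_length(codebook: dict) -> float:
    """Expected bits per symbol: sum of len(code) * probability."""
    return sum(len(code) * prob for code, prob in codebook.values())


# symbol -> (code word, probability), taken from the two tables on this slide
FIXED = {"A": ("00", 0.5), "B": ("01", 0.25), "C": ("10", 0.125), "D": ("11", 0.125)}
VARIABLE = {"A": ("0", 0.5), "B": ("10", 0.25), "C": ("110", 0.125), "D": ("111", 0.125)}

fixed_avg = average_code_length(FIXED)        # 2.0 bits per symbol
variable_avg = average_code_length(VARIABLE)  # 1.75 bits per symbol
```

The variable-length code gives the shortest word to the most probable symbol, which is where the saving of 0.25 bits per symbol comes from.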
29
Granule
1152 samples -> filter banks and transform ->
Granule 0: 576 spectral lines
Granule 1: 576 spectral lines
30
Spectral region partition
- Big value range (coded in pairs): spectral lines 0 to 2*big_values
- One value range (coded in quadruples): spectral lines 2*big_values to 2*big_values+4*count1
- Zero range: spectral lines 2*big_values+4*count1 to 576
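The three boundaries follow directly from big_values and count1. As a rough sketch (the function name and the returned dictionary are illustrative, and count1 is assumed to be already known from decoding):

```python
def partition_spectrum(big_values: int, count1: int, total_lines: int = 576) -> dict:
    """Split the spectral lines of one granule into the three coding regions."""
    big_end = 2 * big_values             # big values are coded in pairs
    count1_end = big_end + 4 * count1    # the next region is coded in quadruples
    if count1_end > total_lines:
        raise ValueError("regions exceed the number of spectral lines")
    return {
        "big_values": range(0, big_end),
        "count1": range(big_end, count1_end),
        "zero": range(count1_end, total_lines),  # not transmitted, all zero
    }
```

With big_values = 100 and count1 = 50, lines 0 to 199 are big values, 200 to 399 are coded in quadruples, and 400 to 575 are zero.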
31
Non-uniform Scale Factor Bands (Long, 32 kHz)
SF band   Range (spectral lines)
0 0~3
1 4~7
2 8~11
3 12~15
4 16~19
5 20~23
6 24~29
7 30~35
8 36~43
9 44~53
10 54~65
11 66~81
12 82~101
13 102~125
14 126~155
15 156~193
16 194~239
17 240~295
18 296~363
19 364~447
20 448~549
32
Non-uniform Scale Factor Bands (Short, 32 kHz)
SF band   Range (spectral lines)
0 0~3
1 4~7
2 8~11
3 12~15
4 16~21
5 22~29
6 30~41
7 42~57
8 58~77
9 78~103
10 104~137
11 138~179
33
Layer III Mode Extension
When mode = ’01’
Mode_extension Ms_stereo Intensity_stereo
00 - -
01 - On
10 On -
11 On On
34
MS_stereo
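Sum/difference coding is used instead of left/right.
This mode is effective when difference << sum: the S channel becomes sparse.
$$\begin{bmatrix} M \\ S \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} L \\ R \end{bmatrix}$$
$$\begin{bmatrix} L \\ R \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} M \\ S \end{bmatrix}$$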
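The sum/difference transform is its own inverse when both directions are scaled by 1/sqrt(2). A minimal sketch on plain Python lists follows; the function names are illustrative.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def ms_encode(left, right):
    """Left/right -> middle/side: M = (L + R)/sqrt(2), S = (L - R)/sqrt(2)."""
    mid = [(l + r) * INV_SQRT2 for l, r in zip(left, right)]
    side = [(l - r) * INV_SQRT2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Middle/side -> left/right, using the same matrix again."""
    left = [(m + s) * INV_SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) * INV_SQRT2 for m, s in zip(mid, side)]
    return left, right
```

When the two channels are nearly identical, the side values are close to zero, which is the sparse S channel the slide refers to.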
35
Intensity Stereo( Layer III)
Intensity stereo:
$$L_i = L_i^{O} \cdot \frac{r_i}{1 + r_i}$$
$$R_i = L_i^{O} \cdot \frac{1}{1 + r_i}$$
where $r_i = \tan\left(ip_i \cdot \frac{\pi}{12}\right)$
36
MS+Intensity Stereo( Layer III)
MS+Intensity stereo:
$$L_i = L_i^{O} \cdot \frac{r_i}{1 + r_i}$$
$$R_i = L_i^{O} \cdot \frac{1}{1 + r_i}$$
where $r_i = \tan\left(ip_i \cdot \frac{\pi}{12}\right)$
37
Layer III Audio Data
Frame layout: Header | Error check | Audio data | Ancillary data
Audio data consists of two fields.
Side information (description field):
- Main data begin, private bits
- Scale factor selection information
- Granule 0 information
- Granule 1 information
Main data field (variable length):
- Scale factors (granule 0)
- Huffman data field (granule 0)
- Scale factors (granule 1)
- Huffman data field (granule 1)
38
Side information (sizes in bits)
                                          Window_switching_flag=1   Window_switching_flag=0
Field                             Unit    1 ch      2 ch            1 ch      2 ch
Main_data_begin                   9       9         9               9         9
Private_bits                      5, 3    5         3               5         3
Scfsi[ch][scfsi_band]             1       4         8               4         8
Part2_3_length[gr][ch]            12      24        48              24        48
Big_values[gr][ch]                9       18        36              18        36
Global_gain[gr][ch]               8       16        32              16        32
Scalefac_compress[gr][ch]         4       8         16              8         16
Window_switching_flag[gr][ch]     1       2         4               2         4
Block_type[gr][ch]                2       4         8               -         -
Mixed_block_flag[gr][ch]          1       2         4               -         -
Table_select[gr][ch][region]      5       20        40              30        60
Subblock_gain[gr][ch][window]     3       18        36              -         -
Region0_count[gr][ch]             4       -         -               8         16
Region1_count[gr][ch]             3       -         -               6         12
Preflag[gr][ch]                   1       2         4               2         4
Scalefac_scale[gr][ch]            1       2         4               2         4
Count1table_select[gr][ch]        1       2         4               2         4
Sum                                       136       256             136       256
Bytes                                     17        32              17        32
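The totals in the table can be reproduced by adding the unit widths over two granules and the given number of channels. The sketch below assumes that all granules and channels share the same window switching setting, as the table columns do; the function name is illustrative.

```python
def side_info_bits(channels: int, window_switching: bool) -> int:
    """Total side-information size in bits for one Layer III frame."""
    bits = 9                              # main_data_begin
    bits += 5 if channels == 1 else 3     # private_bits
    bits += 4 * channels                  # scfsi: 1 bit for each of 4 scfsi bands

    # Fields sent once per granule and channel.
    # part2_3_length, big_values, global_gain, scalefac_compress, window_switching_flag
    per_gr_ch = 12 + 9 + 8 + 4 + 1
    if window_switching:
        # block_type, mixed_block_flag, table_select for 2 regions, subblock_gain for 3 windows
        per_gr_ch += 2 + 1 + 2 * 5 + 3 * 3
    else:
        # table_select for 3 regions, region0_count, region1_count
        per_gr_ch += 3 * 5 + 4 + 3
    per_gr_ch += 1 + 1 + 1                # preflag, scalefac_scale, count1table_select

    return bits + 2 * channels * per_gr_ch   # two granules per frame
```

Both settings of the window switching flag cost 59 bits per granule and channel, which is why the side information has a fixed size of 17 bytes for one channel and 32 bytes for two.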
39
Main Data Begin
A 9-bit unsigned integer that specifies where the main data starts, as a negative offset in bytes from the first byte of the audio syncword.
[Figure: frame1 to frame4, each beginning with sync and side info, with the pointers main_data_begin1 to main_data_begin4 marking the start of the main data of each frame.]
40
Scfsi Bands
If scfsi[scfsi_band] = 0: scale factors are transmitted for each granule.
If scfsi[scfsi_band] = 1: scale factors are transmitted for granule 0 only and are also valid for granule 1.
Scfsi band   SF bands   Range (spectral lines)
0            0~5        0~23
1            6~10       24~65
2            11~15      66~193
3            16~20      194~549
41
Part2_3_length
The number of main data bits.
Main data bits = scale factors + Huffman code data.
This is used to calculate the beginning of the next granule (or of the ancillary data).
[Figure: part2_3_length marks the extent of one granule's main data within the stream of frames, each of which begins with sync and side info.]
42
Big_values
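- Big value range = 2*big_values (big_values is a 9-bit field)
- One value range
- Zero range
[Figure: the 576 spectral lines divided at 2*big_values and 2*big_values+4*count1; the value axis is marked 8191.]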
43
Global_gain
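Requantization:
$$r_i = \operatorname{sign}(s_i) \cdot |s_i|^{4/3} \cdot 2^{\frac{1}{4}(\text{global\_gain}[gr][ch] - 210) - \text{subgain}}$$
Long blocks:
$$\text{subgain}[gr][ch][sfb] = \text{scalefac\_multiplier} \cdot \left(\text{scalefac\_l}[gr][ch][sfb] + \text{preflag}[gr][ch] \cdot \text{pretab}[sfb]\right)$$
Short blocks:
$$\text{subgain}[gr][ch][sfb][window] = 2 \cdot \text{subblock\_gain}[gr][ch][window] + \text{scalefac\_multiplier} \cdot \text{scalefac\_s}[gr][ch][sfb][window]$$
pretab[sfb]: value from the pre-emphasis table
210: constant used to scale the output properly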
44
Scalefac_compress
Select the number of bits for the transmission of scale factors
Scalefac compress Slen1 Slen2
0 0 0
1 0 1
2 0 2
3 0 3
4 3 0
5 1 1
6 1 2
7 1 3
8 2 1
9 2 2
10 2 3
11 3 1
12 3 2
13 3 3
14 4 2
15 4 3
45
Scale factor length (slen1, slen2)
Scale factor bands coded with slen1 bits and with slen2 bits:
Block_type   Mixed_block_flag   Slen1                       Slen2
0, 1, 3      -                  0~10                        11~20
2            0                  0~5                         6~11
2            1                  0~7 (long) + 3~5 (short)    6~11
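Combining the scalefac_compress table with the band ranges above gives the number of bits spent on scale factors in one granule and channel. The sketch below assumes that every scale factor is transmitted (scfsi sharing between granules is ignored) and that each short band carries one scale factor per window, with three windows; the function name is illustrative.

```python
# scalefac_compress -> (slen1, slen2), copied from the Scalefac_compress table
SLEN = [(0, 0), (0, 1), (0, 2), (0, 3), (3, 0), (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3), (4, 2), (4, 3)]


def scalefactor_bits(scalefac_compress: int, block_type: int,
                     mixed_block_flag: int = 0) -> int:
    """Bits used by the scale factors of one granule and channel."""
    slen1, slen2 = SLEN[scalefac_compress]
    if block_type != 2:
        # long blocks: bands 0~10 use slen1 (11 bands), bands 11~20 use slen2 (10 bands)
        return 11 * slen1 + 10 * slen2
    if mixed_block_flag:
        # 8 long bands plus short bands 3~5 (3 bands x 3 windows) use slen1,
        # short bands 6~11 (6 bands x 3 windows) use slen2
        return (8 + 3 * 3) * slen1 + 6 * 3 * slen2
    # pure short blocks: bands 0~5 and 6~11, each 6 bands x 3 windows
    return 6 * 3 * slen1 + 6 * 3 * slen2
```

For scalefac_compress = 15 in a long block this gives 11*4 + 10*3 = 74 bits; subtracting it from part2_3_length leaves the size of the Huffman data.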
46
Region0_count, Region1_count [gr][ch]
Region0_count = 4 bits. Region1_count = 3 bits.
Region0_count + 1 = number of scale factor bands in region 0
Region1_count + 1 = number of scale factor bands in region 1
[Figure: the big values region, from line 0 to 2*big_values, divided into REGION0, REGION1 and REGION2.]
47
Window_switching_flag
If window_switching_flag == 1, then block_type != 0:
Block_type   Mixed_block_flag   Region0   Region1
1, 3         -                  7         36
2            0                  7         36
2            1                  8         36
If window_switching_flag == 0, then block_type = 0:
Block_type   Mixed_block_flag   Region0      Region1
0            not present        designated   designated
49
Mixed_block_flag[gr][ch]
- If mixed_block_flag = 0:
- All 32 sub-bands are transformed with block_type[gr][ch]
- If mixed_block_flag = 1:
- The two lowest sub-bands are transformed with long blocks
- The 30 other sub-bands are transformed with block_type[gr][ch]
50
Table_select[gr][ch][region]
- 5 bits for each region, each channel, each granule
- Selects one of the 32 possible Huffman tables for the big values region
- Example table (Hlen = code length in bits, Hcod = code word):
X Y Hlen Hcod
0 0 1 1
0 1 3 010
0 2 6 000001
1 0 3 011
1 1 3 001
1 2 5 00001
2 0 5 00011
2 1 5 00010
2 2 6 000000
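Because no code word in this table is a prefix of another, a decoder can read bits one at a time and emit a pair as soon as the accumulated bits match an entry. The sketch below works on strings of '0' and '1' for readability and covers only the code words in the table; sign bits for nonzero values, which a real bitstream also carries, are left out. The function names are illustrative.

```python
# Hcod -> (x, y), copied from the table above
HUFFMAN_TABLE = {
    "1": (0, 0), "010": (0, 1), "000001": (0, 2),
    "011": (1, 0), "001": (1, 1), "00001": (1, 2),
    "00011": (2, 0), "00010": (2, 1), "000000": (2, 2),
}
ENCODE_TABLE = {pair: code for code, pair in HUFFMAN_TABLE.items()}


def encode_pairs(pairs) -> str:
    """Concatenate the code words of a sequence of (x, y) pairs."""
    return "".join(ENCODE_TABLE[pair] for pair in pairs)


def decode_pairs(bits: str):
    """Decode a bit string into (x, y) pairs by prefix matching."""
    pairs, code = [], ""
    for bit in bits:
        code += bit
        if code in HUFFMAN_TABLE:      # a complete code word has been read
            pairs.append(HUFFMAN_TABLE[code])
            code = ""
    if code:
        raise ValueError("bit string ends in the middle of a code word")
    return pairs
```

The most common pair (0, 0) costs a single bit, while the rare pair (2, 2) costs six.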
51
Subblock_gain[gr][ch][window]
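- Used only for block type 2
- 3 bits for each window, each channel, each granule
$$\text{subgain}[gr][ch][sfb][window] = 2 \cdot \text{subblock\_gain}[gr][ch][window] + \text{scalefac\_multiplier} \cdot \text{scalefac\_s}[gr][ch][sfb][window]$$
$$r_i = \operatorname{sign}(s_i) \cdot |s_i|^{4/3} \cdot 2^{\frac{1}{4}(\text{global\_gain}[gr][ch] - 210) - \text{subgain}}$$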
52
Preflag[gr][ch]
- If preflag = 1, internal pre-emphasis is used
- If block_type = 2, preflag is never used
SF band   Pretab   Range
0         0        0~3
1         0        4~7
2         0        8~11
3         0        12~15
4         0        16~19
5         0        20~23
6         0        24~29
7         0        30~35
8         0        36~43
9         0        44~53
10        0        54~65
11        1        66~81
12        1        82~101
13        1        102~125
14        1        126~155
15        2        156~193
16        2        194~239
17        3        240~295
18        3        296~363
19        3        364~447
20        2        448~549
53
Scalefac_scale[gr][ch]
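- If scalefac_scale = 0, scalefac_multiplier = 1/2
- If scalefac_scale = 1, scalefac_multiplier = 1
$$r_i = \operatorname{sign}(s_i) \cdot |s_i|^{4/3} \cdot 2^{\frac{1}{4}(\text{global\_gain}[gr][ch] - 210) - \text{subgain}}$$
- Long blocks:
$$\text{subgain}[gr][ch][sfb] = \text{scalefac\_multiplier} \cdot \left(\text{scalefac\_l}[gr][ch][sfb] + \text{preflag}[gr][ch] \cdot \text{pretab}[sfb]\right)$$
- Short blocks:
$$\text{subgain}[gr][ch][sfb][window] = 2 \cdot \text{subblock\_gain}[gr][ch][window] + \text{scalefac\_multiplier} \cdot \text{scalefac\_s}[gr][ch][sfb][window]$$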
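Putting the pieces together, the requantized value of one spectral line is sign(s) * |s|^(4/3) * 2^((global_gain - 210)/4 - subgain), where subgain is built from the scale factor, the pre-emphasis table and (for short blocks) subblock_gain. The following is a minimal sketch for a single line, assuming the side information and scale factors have already been decoded; the function names and argument layout are illustrative, and the pre-emphasis values are those of the Preflag table.

```python
import math

# Pre-emphasis value for each long scale factor band 0..20
PRETAB = [0] * 11 + [1, 1, 1, 1, 2, 2, 3, 3, 3, 2]


def scalefac_multiplier(scalefac_scale: int) -> float:
    """1/2 when scalefac_scale is 0, 1 when it is 1."""
    return 1.0 if scalefac_scale else 0.5


def _requantize(s: int, global_gain: int, subgain: float) -> float:
    # sign(s) * |s|^(4/3) * 2^((global_gain - 210)/4 - subgain)
    magnitude = abs(s) ** (4.0 / 3.0) * 2.0 ** ((global_gain - 210) / 4.0 - subgain)
    return math.copysign(magnitude, s) if s else 0.0


def requantize_long(s, global_gain, scalefac_l, sfb, preflag, scalefac_scale):
    """One spectral line of a long block in scale factor band `sfb`."""
    subgain = scalefac_multiplier(scalefac_scale) * (scalefac_l + preflag * PRETAB[sfb])
    return _requantize(s, global_gain, subgain)


def requantize_short(s, global_gain, subblock_gain, scalefac_s, scalefac_scale):
    """One spectral line of a short block, for one window."""
    subgain = 2 * subblock_gain + scalefac_multiplier(scalefac_scale) * scalefac_s
    return _requantize(s, global_gain, subgain)
```

With global_gain = 210 and all scale factors zero the gain term is 1, so a quantized value of 8 maps to 8^(4/3) = 16; each step of 4 in global_gain doubles the output.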