mpeg-1 lag 1, 2 og lag 3€¦ · we do not hear all sounds. 1. absolute threshold of hearing: 2....

38
INSTITUTT FOR INFORMATIKK UNIVERSITETET I OSLO MPEG-1 lag 1, 2 og lag 3 Sverre Holm

Upload: others

Post on 08-May-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 1

UNIVERSITETET I OSLO

MPEG-1 lag 1, 2 og lag 3

Sverre Holm

Page 2: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 2

UNIVERSITETET I OSLO

MPEG audiokoding• Motivasjon for de fleste kapitlene i

Manolakis, Ingle: Applied Digital Signal Processing: Theory and Practice.

Page 3: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 4

UNIVERSITETET I OSLO

Beethovens 5. symfoni

0 20 40 60 80 100 120-0.3

-0.25

-0.2

-0.15

-0.1

-0.05

0

0.05

0.1

0.15Bethovens 9., samplet med 44.1 kHz

Tid

Am

plitu

de

16 bit kvantisering4 bit kvantisering

16 bit kvantisering216= 65536 nivåer

4 bits kvantisering 24=16 nivåer

Kap 15: Finite Wordlength Effects

Page 4: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 9

UNIVERSITETET I OSLO

Bitrater• CD: 44.1 * 2 * 16 = 1.411 Mbit/s

• 4 bit/sample: 25% => 350 kbit/s låter forferdelig

• MP3, AAC etc: • 128 kbit/s ~ CD/12, dvs16/12 = 1,25 bit/sample• 256 kbit/s ~ CD/6, dvs 16/6= 2,5 bit/sample

• Hva er det lure trikset?

Page 5: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 8

UNIVERSITETET I OSLO

Direktesampling (PCM)

KILDE

A/D

LP-FILTER

D/A

LP-FILTER

BITSTRØM

Unngå aliasing i samplingen

Bruk nok bit til å få liten nok kvantiseringsfeil

KODER DEKODER

Page 6: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 10

UNIVERSITETET I OSLO

MPEG-1 Audio Psychoacoustics in sound compression

• How do we hear?• Digital representation of sounds• Sound compression• Psychoacoustics

– Masking– Adaptive quantization– Bit allocation

• Filterbanks000001010011

111110101

110

Tid

Page 7: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 11

UNIVERSITETET I OSLO

Øret1. Tinning2. Øregang3. Ytre øre4. Trommehinne5. Ovale vindu6. Hammeren7. Ambolt8. Stigbøylen9. Bueganger10. Sneglehuset11. Hørselnerve12. Øretrompeten Wikimedia Commons

Page 8: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 12

UNIVERSITETET I OSLO The Ear

Page 9: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 13

UNIVERSITETET I OSLO The frequency filters of the ear:

Mapping frequency to a location

Unwound

cochlea

Kap 4: Fourier Representation of Signals

Page 10: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 14

UNIVERSITETET I OSLO

Frekvenser• The frequency mapping of

the human cochlea.• Highest frequencies placed

at the base of the cochlea and lowest at the centre of the cochlea

• Image by courtesy of MED-EL & Ralf Greisiger, PhD thesis, 2016

Page 11: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 15

UNIVERSITETET I OSLO

Threshold for audible sounds

Reference 0 dB:20 μPa = 2�10-5 N/m2 Kap 4: Fourier

Representation of Signals

Page 12: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 16

UNIVERSITETET I OSLO

FILT

ERB

AN

K Scaling Quantize

A/DLP-filterSource

Scaling

Scaling

MU

X

PSYCHOACOUSTICANALYSIS

BIT STREAM

Filterbank ApproachEncoding

BIT STR

EAM Quantize

Quantize

BIT ALLOCATION

Kap 5: Transform Analysis of LTI Systems

Page 13: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 17

UNIVERSITETET I OSLO

Decoding is much simpler

FILT

ERB

AN

KScaling

Scaling

Scaling

D/A LP-filter

BIT STREAM

Kap 3: Discrete-time Signalsand Systems: LinearityKap 5: Transform Analysis of LTI Systems: Inverse systems

Page 14: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 18

UNIVERSITETET I OSLO

• Polyphase filterbank• 32 subbands, e.g. bw

44100/2/32 = 689 Hz• 512 tap FIR-filters• 80 adds and mults per output

• Equal width• Not perfect reconstruction• Frequency overlap

Filterbanks in MPEG-1 audio layer 1-3

Kap 10: Design of FIR Filters

Kap 5: Transform Analysis of LTI Systems

Kap 3: The z-transform

Page 15: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 19

UNIVERSITETET I OSLO

• The subbands overlap at 3 dB point with the adjacent bands.

• The leakage to the other bands is small.• The total response almost adds up to one (0 dB).

A closer look

Page 16: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 20

UNIVERSITETET I OSLO

White noise

• The white noise run through the filterbank.

• The samples from each band are played in the order of the subbands.

• The reconstructed sequence – The reconstruction

error is –84 dB.

Kap 6: Sampling of Continuous-Time Signals

Page 17: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 21

UNIVERSITETET I OSLO

• In a perfect filterbank the first part is the only part.

• The second part consists of the aliasing terms.

• The filterbank is designed so that the aliasing is small.

1

1

0)()(1)()( jA

kj

M

k

Rk

jj eHeHM

eXeY

0

21

0

1

1

2

)()(1)(

Mnj

Ak

jM

k

Rk

M

n

Mnj

eHeHM

eX

Reconstruction Using Nonideal Filterbanks

Page 18: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 22

UNIVERSITETET I OSLO

Tubthumper, a time domain view

The red line is the reconstruction error after splitting the signal in subbands, down sampling and applying the synthesisfilterbank. The reconstruction error is –84 dB and sounds like

Page 19: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 23

UNIVERSITETET I OSLO

Subband 1 2 4 8 16 32

Center frequency [kHz] 0.3 1.0 2.4 5.2 10.7 21.7

No subsampling

Subsampled 32 times

Tubthumper, frequency view

Page 20: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 24

UNIVERSITETET I OSLO

What is this Psychoacoustics that is used in the Encoder ?

Kap 8: The Discrete Fourier Transform:Fourier Analysis of Signals

Page 21: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 25

UNIVERSITETET I OSLO Masking

We do not hear all sounds.1. Absolute threshold of hearing: �2. Masking: One sound is inaudible in the

presence of another sound.1. Simultaneous masking

– Noise Masking Tone– Tone Masking Noise– Noise Masking Noise

2. Nonsimultaneous masking• Pre masking (2 ms)• Post masking (100 ms)

Page 22: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 26

UNIVERSITETET I OSLO Noise Masking Tone

Filtered NoiseCenter 410 Hz Width 111 Hz

Tone 1, 820 Hz5 dB below noise

Tone 2, 410 Hz 5 dB below noise

Noise+

Tone 1

Noise+

Tone 2

Not masked Masked

You can not hear a sinusoid that lies in the same critical band as a filtered noise if the sound pressure level is below a certain threshold.

This effect also stretches out beyond the critical band.

Page 23: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 27

UNIVERSITETET I OSLO Tone Masking Noise

Filtered NoiseCenter 1 kHz Width 162 Hz15 dB below

Tone 1, 2 kHz Tone 2, 1 kHz Noise+

Tone 1

Noise+

Tone 2

Not masked Masked

You can not hear a filtered noise that lies in the same critical band as a sinusoid if the sound pressure level is below a certain threshold.

This effect also stretches out beyond the critical band.

Page 24: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 28

UNIVERSITETET I OSLO Exploit Masking• If a sound is masked we can’t hear it.

• Do a frequency analysis of the signal and find the masking threshold.

• Put the quantization noise under the masking threshold and we won’t hear the quantization.

Kap 8: The Discrete Fourier Transform:Fourier Analysis of Signals

Page 25: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 29

UNIVERSITETET I OSLO Pre echo distortion

• The quantization noise is spread over a whole window.

• This makes the castanets sound less distinct.

• Audible effects can be avoided with shorter windows, exploiting premasking.

• The original sound of a castanet.

• The abruptness in time domain results in all frequencies being involved.

Page 26: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 30

UNIVERSITETET I OSLO

Vindus-svitsjing: l.1 og l.3 (= MP3)– Blokkstørrelse i transform og

delbåndskodere:• Små blokker: god transientgjengivelse, dårlig

koding pga mye overhead• Store blokker: god kodingsgevinst; gir pre-

ekko

– Vindus-svitsjing mellom N=64 (1.3 ms) og 1024 (21.3 ms) blokkstørrelse

• Små blokker ved ikke-stasjonæritet• Ellers store blokker

Page 27: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 31

UNIVERSITETET I OSLO

Scale factors and Quantization• When the dynamics change over time, only a

small subset of the quantization steps are used in regions with low magnitudes.

• Use scale factors instead:– Take a window of data. – Find the max magnitude in this window.

• Use the next larger scale factor from a table.

– Normalize with the scale factor. – Quantize.

• Now the whole dynamic range of the quantizer is used.

– Send scale factor and quantized samples.

Page 28: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 32

UNIVERSITETET I OSLO

Bit Allocation and Masking• The masking threshold in each

subband gives the Just Noticeable Distortion (JND) limit for that band.

• Bits are assigned to subbands so that the quantization noise falls below or as little over the JND as possible.

• Then the Signal to Quantization noise Ratio (SQR) falls below JND

Page 29: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 33

UNIVERSITETET I OSLO Castanets and Guitar

22050 Hz

Page 30: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 34

UNIVERSITETET I OSLO Bit allocation with 2 bits per sample

22050 Hz

Page 31: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 35

UNIVERSITETET I OSLO Bit allocation with 4 bits per sample

22050 Hz

Page 32: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 36

UNIVERSITETET I OSLO Signal to Quantization Noise Ratio

and the Just Noticeable Distortion

Frame at t=0.6 s Frame at t=1.1 s

Page 33: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 37

UNIVERSITETET I OSLO Examples on compression

Compression 2 4 8MP1 4 bit 2 bit

MP1 error (SQR) 22 dB 11 dB

Direct Quantization 8 bit 4bit 2 bit

Direct Quantization Error (SQR)

31 dB 7.8 dB 1.1 dB

Downsampling to 22 kHz bandwidth and quantization

16 bit 8 bit 4 bit

Page 34: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 38

UNIVERSITETET I OSLO

MPEG-1 layers I and II

P. Noll, MPEG digital audio coding, IEEE Sign. Proc. Mag., Sep 1997

Page 35: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 39

UNIVERSITETET I OSLO

MPEG-1 layer III = MP3

P. Noll, MPEG digital audio coding, IEEE Sign. Proc. Mag., Sep 1997

Page 36: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 40

UNIVERSITETET I OSLO

MPEG-1 audio (ca 1990)– Lag I: Delbåndskoding i 32 like frekvensbånd,

512 koeffisienters polyfase kvadratur speilfiltre og psykoakustisk modell som bestemmer adaptiv bit-tilordning, rammelengde 8 ms

• ~192 kbit/s pr kanal for CD-kvalitet, ~384 kbit/s for stereo

– Lag II: Rammelengde 24 ms• 92 kbit/s pr kanal, 192 kbit/s for stereo

– Lag III: kaskadekopler en 6 eller 18 punkts (dynamisk vindus-svitsjing) MDCT med lag Is filterbank => 32*18 = 576 frekvensbånd =>

• 64 kbit/s pr kanal (variabel) (128 kbit/s for stereo)– MPEG-1, layer III = MP3

Page 37: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 42

UNIVERSITETET I OSLO

MPEG-2 AAC - april 1997• MPEG-2 AAC (Advanced Audio Coding)

– Samplingsrater fra 8 til 96 kHz– opp til 48 audiokanler– 320 kbit/s for 5.1 kanaler høykvalitets audio,

kompresjon 12:1 i forhold til CD – Høyoppløselig filterbank (1024 punkts mod.

cosinustransform),– støyforming i tid, prediksjonsteknikker

Page 38: MPEG-1 lag 1, 2 og lag 3€¦ · We do not hear all sounds. 1. Absolute threshold of hearing: 2. Masking: One sound is inaudible in the presence of another sound. 1. Simultaneous

INSTITUTT FOR INFORMATIKK August 2009 - 44

UNIVERSITETET I OSLO Kapitler i læreboka – MP3-koding1 Introduction Praktiske eksempler som mp1/mp3

2 Discrete-time Signals and Systems Musikk og talesignal Linearitet: kan dele i frekvensbånd, behandle for seg og så sette

sammen igjen Tidsinvarians gjelder bare over kort tid for musikk og tale. Differanseligninger: FIR filtre

3 The z-Transform Analyse av filter i filterbanken: nøkkel til filterdesign

4 Fourier Representation of Signals Frekvensdomene er sentralt i modell av hørsel

5 Transform Analysis of LTI Systems Frekvensselektive filtre: båndpassfiltre Inverse systemer: kan dele i bånd i koder og addere sammen

igjen i dekoder

6 Sampling of Continuous-time Signals Sampling og samplingsrater

7 The Discrete Fourier Transform Frekvensanalyse av signaler

8 Computation of the Discrete Fourier Transform

Fast Fourier Transform (FFT) brukes i estimering av spektrum i koder.

Må estimere korttidsspektrum for å gjøre adaptiv bittildeling

9 Structures for Discrete-Time Systems Filterstrukturer, effektiv implementasjon av filterbank i koder og dekoder

10 Design of FIR Filters. Hvordan finne koeffisienter til bp-filtrene i filterbanken?

12 Multirate Signal Processing Multirate system: Hvert delfilter kan nedsamples pga bare 1/32 av total båndbredde => trenger bare 1/32 samplerate per filter

15 Finite Word Length Effects A/D-analyse: kvantiseringsstøy