
W.A.V.S. Compression

Alex Chen, Nader Shehad, Aamir Virani, Erik Welsh

Overview

    Approach
    Psychoacoustic Modeling
    Filter Banks
    Quantization
    Demonstration
    Results
    Further Research

Approach

Encoding: Input -> Filter Banks -> Psychoacoustic Model -> Quantization -> Encoded Signal

Decoding: Encoded Signal -> Inverse Quantization -> Reconstruction Filter Banks -> Output

Psychoacoustic Model

Based on studies showing that hearing capabilities are affected by:
    Environment
    Limitations of the human auditory system
Used to eliminate portions of the signal that the average human won’t hear
Two key properties:
    Absolute threshold of hearing
    Auditory masking

Absolute Threshold of Hearing

Experiment:
    Plot the audible threshold of a tone
Observations:
    The auditory system is more sensitive to some frequencies than to others
    Frequencies within a “critical bandwidth” are treated similarly
    Critical bandwidths are the basis for the Bark scale
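The threshold curve and the Bark scale mentioned above are often approximated in closed form. The sketch below uses Terhardt's approximation for the absolute threshold of hearing and Zwicker's formula for the Bark scale. Both formulas and the function names `ath_db` and `hz_to_bark` are assumptions brought in for illustration; the slides do not say which approximations, if any, the project used.

```python
import math

def ath_db(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def hz_to_bark(freq_hz):
    """Map a frequency in Hz to critical-band rate in Bark (Zwicker's formula)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

if __name__ == "__main__":
    # The threshold dips in the mid frequencies and rises toward both ends.
    for f in (100, 1000, 3300, 10000, 15000):
        print(f"{f:>6} Hz  threshold {ath_db(f):6.1f} dB SPL  {hz_to_bark(f):5.2f} Bark")
```

A component whose level falls below `ath_db` at its frequency is a candidate for removal, which is the sense in which the model eliminates portions of the signal.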

Auditory Masking

Tones and noise drown out less powerful sounds
    Affect neighboring frequencies
    Affect the critical bandwidth
Effects add to produce an overall masking threshold
The masking threshold is used to mask quantization noise
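One common convention for adding masking effects is to sum the individual thresholds as powers rather than as decibel values. The sketch below assumes that convention; the function name `combine_thresholds_db` and the sample values are hypothetical, and the slides do not state how the project combined its thresholds.

```python
import math

def combine_thresholds_db(thresholds_db):
    """Add masking thresholds (given in dB) as powers and return the total in dB."""
    total_power = sum(10.0 ** (t / 10.0) for t in thresholds_db)
    return 10.0 * math.log10(total_power)

# Hypothetical thresholds at one frequency: two maskers plus the threshold in quiet.
print(round(combine_thresholds_db([40.0, 40.0, 5.0]), 2))
```

Under this convention two equal maskers raise the combined threshold by about 3 dB, and a much weaker contribution barely changes it.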

Filter Banks Theory

Array of bandpass filters
Breaks up the signal into frequency subbands
Allows for a variable coding scheme

Analysis and Synthesis Banks

1) Analysis filters divide up the signal
2) Down-sample
3) Quantize
4) Up-sample
5) Synthesis filters remove distortions
6) Reconstruct the signal
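The six steps can be sketched with the smallest possible filter bank: a two-band Haar pair, swapped in here for brevity in place of the 32-band bank the project used. The function names and step sizes are hypothetical. Filtering and down-sampling are merged into one pass over sample pairs, and the input is assumed to have an even number of samples.

```python
import math

SQRT2 = math.sqrt(2.0)

def analyze(x):
    """Steps 1-2: split x into low and high subbands, each at half the input rate."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    return low, high

def quantize(band, step):
    """Step 3: uniform quantization of one subband with the given step size."""
    return [step * round(v / step) for v in band]

def synthesize(low, high):
    """Steps 4-6: up-sample, apply the synthesis filters, and sum the subbands."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / SQRT2)
        out.append((l - h) / SQRT2)
    return out

x = [0.1, 0.5, -0.3, 0.8, 0.0, -0.6]
low, high = analyze(x)
exact = synthesize(low, high)                                 # no quantization
coded = synthesize(quantize(low, 0.05), quantize(high, 0.2))  # coarser high band
print(max(abs(a - b) for a, b in zip(x, exact)))
print(max(abs(a - b) for a, b in zip(x, coded)))
```

Giving each subband its own step size is one simple form of the variable coding scheme a filter bank allows.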

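Filter Bank Design Phase

Tradeoff between fine and coarse frequency resolution
    Piccolo vs. castanets: a sustained tone favors fine frequency resolution, a sharp transient favors fine time resolution
Non-stationary signals
We used a non-adaptive approach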

Filter Bank Implementation

We used cosine-modulated PR (perfect reconstruction) filter banks with 32 filters each
Output is a delayed version of the input (linear phase)
Distortion arises from quantization only
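One well-known member of the cosine-modulated PR family is the MDCT with a sine window. The sketch below uses it with 32 subbands to show perfect reconstruction when nothing is quantized. It is an assumption that this resembles the project's bank: the slides do not give the prototype filter, and the names `analysis` and `synthesis` are hypothetical. Zero padding at both ends stands in for the delay a streaming implementation would have.

```python
import math

M = 32  # number of subbands, matching the 32 filters in the slides

# Sine window; satisfies w[n]^2 + w[n + M]^2 = 1 (Princen-Bradley condition).
WINDOW = [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

# Cosine modulation terms, indexed as COS[k][n].
COS = [[math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
        for n in range(2 * M)] for k in range(M)]

def analysis(x):
    """Return one list of M subband samples for every hop of M input samples."""
    pad = (-len(x)) % M  # zeros needed to reach a multiple of M
    padded = [0.0] * M + list(x) + [0.0] * (pad + M)
    frames = []
    for start in range(0, len(padded) - M, M):
        block = padded[start:start + 2 * M]
        frames.append([sum(WINDOW[n] * block[n] * COS[k][n] for n in range(2 * M))
                       for k in range(M)])
    return frames

def synthesis(frames, length):
    """Overlap-add the windowed inverse transform of each frame."""
    out = [0.0] * (M * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        for n in range(2 * M):
            value = sum(coeffs[k] * COS[k][n] for k in range(M))
            out[i * M + n] += (2.0 / M) * WINDOW[n] * value
    return out[M:M + length]  # drop the leading padding

x = [math.sin(0.1 * n) + 0.3 * math.cos(0.7 * n) for n in range(100)]
y = synthesis(analysis(x), len(x))
print(max(abs(a - b) for a, b in zip(x, y)))  # reconstruction error without quantization
```

Each hop of 32 input samples yields 32 subband samples, so the representation is critically sampled; any distortion in a full coder would then come from quantizing the subband samples.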

Quantization

Two types:
    Narrow-band
        Depends on current input
        Overhead cost
    Full-range
        Independent of current input
        No overhead

[Figure: Sampled Input, Quantized Version, Reconstructed Input]

Quantization

Narrow-Band
    More accurate
    Lower compression ratio
Full-Range
    Less accurate
    Higher compression ratio
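The compression-ratio difference can be made concrete by counting bits. The sketch below assumes the narrow-band quantizer stores its range once per block as side information; the block size, the number of side-information bits, and the function names are hypothetical values chosen for illustration.

```python
def full_range_bits(num_samples, bits_per_sample):
    """Fixed quantizer range: only the level numbers are stored."""
    return num_samples * bits_per_sample

def narrow_band_bits(num_samples, bits_per_sample, block_size, side_info_bits):
    """Range fitted per block: level numbers plus side information for each block."""
    num_blocks = -(-num_samples // block_size)  # ceiling division
    return num_samples * bits_per_sample + num_blocks * side_info_bits

# Hypothetical setup: 3-bit levels, blocks of 32 samples, 16 bits of range data per block.
print(full_range_bits(12000, 3), narrow_band_bits(12000, 3, 32, 16))
```

At equal bits per sample the narrow-band output is always the larger of the two, and the gap grows as blocks get shorter.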

Using 3-bit Quantization

Narrow-band:
    Input:        -.4   -.22   .14   .4
    Levels:        1     3      6     8
    Recon.:       -.4   -.2    .1    .3
    Total Error:  .16

Full-range:
    Input:        -.4   -.22   .14   .4
    Levels:        3     4      6     7
    Recon.:       -.5   -.25   .25   .50
    Total Error:  .34
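Both examples are consistent with a uniform 3-bit quantizer that rounds to the nearest of 8 levels and clips at the top level. The first uses a range fitted to the input (-0.4 to 0.4), the second the full range (-1 to 1). The sketch below makes those assumptions explicit; the ranges are inferred from the slide's numbers rather than stated there, and the function names are hypothetical.

```python
def quantize(samples, lo, hi, bits=3):
    """Uniform quantizer: return 1-based level numbers for samples in [lo, hi]."""
    num_levels = 2 ** bits
    step = (hi - lo) / num_levels
    levels = []
    for x in samples:
        index = round((x - lo) / step)
        index = max(0, min(num_levels - 1, index))  # clip to the valid levels
        levels.append(index + 1)
    return levels

def reconstruct(levels, lo, hi, bits=3):
    """Map level numbers back to amplitudes."""
    step = (hi - lo) / 2 ** bits
    return [lo + (level - 1) * step for level in levels]

def total_error(samples, recon):
    """Sum of absolute differences between the input and its reconstruction."""
    return sum(abs(x - r) for x, r in zip(samples, recon))

samples = [-0.4, -0.22, 0.14, 0.4]
for lo, hi in ((-0.4, 0.4), (-1.0, 1.0)):
    levels = quantize(samples, lo, hi)
    recon = reconstruct(levels, lo, hi)
    print(levels, [round(r, 2) for r in recon], round(total_error(samples, recon), 2))
# prints: [1, 3, 6, 8] [-0.4, -0.2, 0.1, 0.3] 0.16
#         [3, 4, 6, 7] [-0.5, -0.25, 0.25, 0.5] 0.34
```

The fitted range has a step of 0.1 against 0.25 for the full range, which is why its total error is roughly half as large.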

Demonstration

Sine wave: Full range, Narrow range
Chime: 8-bit, Full range, Narrow range
Percussion: Full range, Narrow range
Modern: 8-bit, Full range, Narrow range

Sine Wave (time) [figure panels: Full-Range Quantization, Narrow Quantization]

Sine Wave (freq) [figure]

Sine Wave (freq error) [figure]

Modern (time) [figure]

Modern (freq) [figure]

Modern (freq error) [figure]


Results

Full Range: Smallest File, Worst Sound Quality
Narrow Range: Better Sound Quality, Larger File
MP3: Industry Standard

File sizes in bytes:

Data            Full-Range  Narrow  8bit    16bit  Original (16*numsamples)  Original wav  Mp3
Pure sine       14854       18168   14000   27600  24000                     24044         50152
Separate sines  14374       17656   14000   27600  24000                     24044         50152
Near sines      14830       18142   14000   27600  24000                     24044         5015
Percussion      261726      311542  237584  X      460000                    460044        84427
Chimes          72330       89316   70000   X      120002                    120046        37198
Modern          252692      301260  246576  X      449232                    449276        82755
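The sizes can be turned into compression ratios by dividing the original wav size by the compressed size. The sketch below does this for the three recorded clips, using byte counts copied from the table; the helper name `compression_ratio` is hypothetical.

```python
# Sizes in bytes copied from the table: (full-range, narrow, original wav).
SIZES = {
    "Percussion": (261726, 311542, 460044),
    "Chimes": (72330, 89316, 120046),
    "Modern": (252692, 301260, 449276),
}

def compression_ratio(original_bytes, compressed_bytes):
    """Ratio above 1 means the compressed file is smaller than the original."""
    return original_bytes / compressed_bytes

for name, (full, narrow, wav) in SIZES.items():
    print(f"{name:<11} full-range {compression_ratio(wav, full):.2f}  "
          f"narrow {compression_ratio(wav, narrow):.2f}")
```

For every clip the full-range ratio comes out higher than the narrow ratio, in line with the tradeoff summarized above the table.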

Further Research

Filter Banks
    Wavelets
    Dynamic Frequency Ranges
Better Psychoacoustic Model
    Tone Designation
    Pre- and Post-Echo
Bit Allocation
Writing a File
