W.A.V.S. Compression
Alex Chen, Nader Shehad, Aamir Virani, Erik Welsh
Overview
  Approach
  Psychoacoustic Modeling
  Filter Banks
  Quantization
  Demonstration
  Results
  Further Research
Approach
Encoding: Input -> Filter Banks -> Psychoacoustic Model -> Quantization -> Encoded Signal
Decoding: Encoded Signal -> Inverse Quantization -> Reconstruction Filter Banks -> Output
Psychoacoustic Model
Based on studies showing that hearing capabilities are affected by:
  Environment
  Limitations of the human auditory system
Used to eliminate portions of the signal that the average human won’t hear
Two key properties:
  Absolute threshold of hearing
  Auditory masking
Absolute Threshold of Hearing
Experiment:
  Plot the audible threshold of a tone
Observations:
  The auditory system is more sensitive to some frequencies than to others
  Frequencies within a “critical bandwidth” are treated similarly
  Basis for the Bark scale
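To make the two observations concrete, here is a minimal sketch of a threshold-in-quiet curve and a Hz-to-Bark mapping. The slides do not say which formulas the project used; the functions below use widely cited approximations (Terhardt's threshold fit and the Zwicker-Terhardt Bark formula), and the names `ath_db_spl` and `hz_to_bark` are hypothetical.

```python
import math

def ath_db_spl(f_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's fit).

    Assumption: this published approximation stands in for whatever
    threshold curve the project actually used. Requires f_hz > 0.
    """
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def hz_to_bark(f_hz):
    """Map frequency in Hz to the Bark scale (Zwicker-Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# The ear is most sensitive near 3-4 kHz, so the threshold is lowest there.
for f in (100, 1000, 3300, 10000):
    print(f, round(ath_db_spl(f), 2), round(hz_to_bark(f), 2))
```

A component whose level falls below `ath_db_spl` at its frequency is a candidate for removal, and frequencies with nearly the same Bark value fall within the same critical band.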
Auditory Masking
Tones and noise drown out less powerful sounds
  Affect neighboring frequencies
  Affect the critical bandwidth
Effects add to produce an overall masking threshold
Masks quantization noise
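How the individual effects "add" is not spelled out on the slide. One common convention in perceptual coders is to sum the individual thresholds as powers rather than as decibel values; the sketch below assumes that rule, and the function names are hypothetical.

```python
import math

def combine_thresholds_db(thresholds_db):
    """Combine per-masker thresholds (in dB) into one overall threshold.

    Assumption: thresholds add in the power domain, so each dB value is
    converted to power, summed, and converted back to dB.
    """
    total_power = sum(10 ** (t / 10.0) for t in thresholds_db)
    return 10.0 * math.log10(total_power)

def is_masked(level_db, thresholds_db):
    """True if a component at level_db lies below the combined threshold."""
    return level_db < combine_thresholds_db(thresholds_db)

# Two equal 40 dB thresholds combine to roughly 3 dB more than either alone.
overall = combine_thresholds_db([40.0, 40.0])
print(round(overall, 2))
```

Under this rule, quantization noise that stays below the combined threshold in a subband would be treated as inaudible.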
Filter Banks Theory
Array of bandpass filters
Break up the signal into frequency subbands
Allows for a variable coding scheme
Analysis and Synthesis Banks
1) Analysis filters divide up the signal
2) Down-sample
3) Quantize
4) Up-sample
5) Synthesis filters remove distortions
6) Reconstruct the signal
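The six steps above can be sketched with the simplest perfect-reconstruction bank, a two-channel Haar pair. This is a stand-in chosen for brevity, not the 32-channel bank used in the project; the function names and the step sizes are hypothetical. Filtering and down-sampling are folded into `analysis`, and up-sampling and synthesis filtering into `synthesis`.

```python
import math

SQRT2 = math.sqrt(2.0)

def analysis(x):
    """Steps 1-2: split an even-length signal into low and high subbands,
    each at half the input rate (Haar filters, down-sampled by 2)."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def quantize(band, step):
    """Step 3: round every subband sample to the nearest multiple of step."""
    return [step * round(v / step) for v in band]

def synthesis(low, high):
    """Steps 4-6: up-sample, apply the synthesis filters, and interleave
    the results to rebuild the full-rate signal."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / SQRT2)
        out.append((l - h) / SQRT2)
    return out

x = [0.1, 0.4, -0.3, 0.2, 0.0, -0.5]
low, high = analysis(x)

# Without quantization the bank reconstructs the input (up to rounding).
exact = synthesis(low, high)

# Variable coding: a fine step for the low band, a coarse one for the high band.
lossy = synthesis(quantize(low, 0.05), quantize(high, 0.25))
print(max(abs(a - b) for a, b in zip(x, exact)))
print(max(abs(a - b) for a, b in zip(x, lossy)))
```

Because the unquantized path returns the input, any error in `lossy` comes from the quantizer alone, and each subband can be given its own step size.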
Filter Bank Design
Phase
Tradeoff between fine and coarse frequency resolution
  Piccolo vs. Castanets
  Non-stationary signals
We used a non-adaptive approach
Filter Bank Implementation
We used cosine-modulated PR (perfect reconstruction) filter banks with 32 filters each
Output is a delayed version of the input (linear phase)
Distortion arises from quantization only
Quantization
Two types:
  Narrow-band
    Depends on current input
    Overhead cost
  Full-range
    Independent of current input
    No overhead
Figure: Sampled Input, Quantized Version, Reconstructed Input
Quantization
Narrow Band
  More accurate
  Lower compression ratio
Full-Range
  Less accurate
  Higher compression ratio
Using 3-bit Quantization
Narrow-band:
  Input:  -.4  -.22  .14  .4
  Levels:  1    3     6    8
  Recon.: -.4  -.2   .1   .3
  Total Error: .16
Full-range:
  Input:  -.4  -.22  .14  .4
  Output:  3    4     6    7
  Recon.: -.5  -.25  .25  .50
  Total Error: .34
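The two 3-bit examples can be reproduced with a uniform quantizer. The numbers are consistent with eight levels spaced .1 apart starting at -.4 for the first example (levels fitted to the input's own range) and eight levels spaced .25 apart starting at -1 for the second (a fixed range of -1 to 1). That level layout is inferred from the slide values rather than stated on them, and the function names are hypothetical.

```python
def quantize(samples, lo, hi, bits=3):
    """Map each sample to the nearest of 2**bits levels (1-based index).

    Assumption: levels sit at lo, lo + step, ... with step = (hi - lo) / 2**bits,
    and samples beyond the top level are clamped to it.
    """
    n_levels = 2 ** bits
    step = (hi - lo) / n_levels
    indices = []
    for x in samples:
        k = round((x - lo) / step)
        k = max(0, min(n_levels - 1, k))
        indices.append(k + 1)
    return indices

def reconstruct(indices, lo, hi, bits=3):
    """Turn 1-based level indices back into sample values."""
    step = (hi - lo) / 2 ** bits
    return [lo + (k - 1) * step for k in indices]

def total_error(original, recon):
    """Sum of absolute differences between original and reconstruction."""
    return sum(abs(a - b) for a, b in zip(original, recon))

inputs = [-0.4, -0.22, 0.14, 0.4]

# Range taken from the input itself (min and max would be sent as overhead).
narrow = quantize(inputs, min(inputs), max(inputs))   # [1, 3, 6, 8]

# Fixed range of -1 to 1, independent of the input (no overhead).
full = quantize(inputs, -1.0, 1.0)                    # [3, 4, 6, 7]

print(narrow, round(total_error(inputs, reconstruct(narrow, -0.4, 0.4)), 2))
print(full, round(total_error(inputs, reconstruct(full, -1.0, 1.0)), 2))
```

The fitted range spends all eight levels on values the signal actually takes, which is why its total error is lower, but the decoder must be told the range for each block.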
Demonstration
Sine wave: Full range, Narrow range
Chime: 8-bit, Full range, Narrow range
Percussion: Full range, Narrow range
Modern: 8-bit, Full range, Narrow range
Figure: Sine Wave (time), Full-Range Quantization vs. Narrow Quantization
Figure: Sine Wave (freq), Full-Range Quantization vs. Narrow Quantization
Figure: Sine Wave (freq error), Full-Range Quantization vs. Narrow Quantization
Figure: Modern (time), Full-Range Quantization vs. Narrow Quantization
Figure: Modern (freq), Full-Range Quantization vs. Narrow Quantization
Figure: Modern (freq error), Full-Range Quantization vs. Narrow Quantization
Results
Full Range: Smallest File, Worst Sound Quality
Narrow Range: Better Sound Quality, Larger File
MP3: Industry Standard
All sizes in bytes.
Data            Full-Range  Narrow  8bit    16bit  Original (16*numsamples)  Original wav  Mp3
Pure sine       14854       18168   14000   27600  24000                     24044         50152
separate sines  14374       17656   14000   27600  24000                     24044         50152
near sines      14830       18142   14000   27600  24000                     24044         5015
percussion      261726      311542  237584  X      460000                    460044        84427
chimes          72330       89316   70000   X      120002                    120046        37198
Modern          252692      301260  246576  X      449232                    449276        82755
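For comparison, the byte counts can be turned into compression ratios relative to the original wav file. The sketch below copies the percussion, chimes, and Modern rows from the table; the dictionary layout and the function name are hypothetical.

```python
# Byte counts copied from the results table (wav = "Original wav" column).
sizes = {
    "percussion": {"wav": 460044, "full_range": 261726, "narrow": 311542, "mp3": 84427},
    "chimes":     {"wav": 120046, "full_range": 72330,  "narrow": 89316,  "mp3": 37198},
    "Modern":     {"wav": 449276, "full_range": 252692, "narrow": 301260, "mp3": 82755},
}

def compression_ratio(original_bytes, encoded_bytes):
    """Original size divided by encoded size; larger means more compression."""
    return original_bytes / encoded_bytes

for name, row in sizes.items():
    ratios = {
        scheme: round(compression_ratio(row["wav"], row[scheme]), 2)
        for scheme in ("full_range", "narrow", "mp3")
    }
    print(name, ratios)
```

On these three rows the full-range files come out smaller than the narrow-range files, and the MP3 files are smaller than both.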
Further Research
Filter Banks
  Wavelets
  Dynamic Frequency Ranges
Better Psychoacoustic Model
  Tone Designation
  Pre- and Post-Echo
Bit Allocation
Writing a File