Post on 04-Mar-2020
Introduction to Audio Signal Processing
Human-Computer Interaction
Angelo Antonio Salatino aasalatino@gmail.com
http://infernusweb.altervista.org
License
This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 4.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Overview
• Audio Signal Processing;
• Waveform Audio File Format;
• FFmpeg;
• Audio Processing with Matlab;
• Doing phonetics with Praat;
• Last but not least: Homework.
Audio Signal Processing
• Audio signal processing is an engineering field that focuses on the computational methods for intentionally altering auditory signals or sounds, in order to achieve a particular goal.
[Diagram: Input Signal → Audio Signal Processing → Output Signal (data with meaning)]
Audio Processing in HCI
Some HCI applications involving audio signal processing are:
• Speech Emotion Recognition
• Speaker Recognition
▫ Speaker Verification
▫ Speaker Identification
• Voice Commands
• Speech to Text
• Etc.
Audio Signals
Audio signals can be represented in either digital or analog form.
• Digital – the pressure waveform is represented as a sequence of symbols, usually binary numbers.
• Analog – a smooth wave of energy represented by a continuous stream of data.
Analog to Digital Converter (ADC)
• Don’t worry, it’s only a fast review!!!
[Diagram: Analog Signal → Sample & Hold → Quantization → Encoding → Digital Signal]
• Analog Signal: continuous in time, continuous in amplitude
• After Sample & Hold: discrete in time, continuous in amplitude
• After Quantization: discrete in time, discrete in amplitude
• After Encoding: discrete in time, discrete in amplitude
• For each measurement a number is assigned according to its amplitude.
• The sampling frequency and the number of bits used to represent a sample are the main features of a digital signal.
• How are these digital signals stored?
Sampling Frequency must be defined
# bits per sample must be defined
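As a rough illustration of these two parameters, the following C sketch measures a signal at a chosen sampling frequency and assigns each measurement a 16-bit number according to its amplitude. The function names (`ramp_signal`, `quantize16`, `sample_signal`) and the ramp test signal are assumptions made for this example; they are not part of the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "analog" signal: amplitude rises linearly from 0 to 1 in one second. */
double ramp_signal(double t_seconds)
{
    return t_seconds;
}

/* Quantization: map an amplitude in [-1.0, 1.0] to the nearest signed 16-bit code. */
int16_t quantize16(double amplitude)
{
    if (amplitude > 1.0)  amplitude = 1.0;   /* clip out-of-range input */
    if (amplitude < -1.0) amplitude = -1.0;
    double scaled = amplitude * 32767.0;
    /* Round half away from zero before truncating to an integer. */
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Sampling: measure signal(t) at t = i / sample_rate_hz for i = 0 .. n-1. */
void sample_signal(double (*signal)(double), double sample_rate_hz,
                   int16_t *out, size_t n)
{
    assert(signal != NULL && out != NULL && sample_rate_hz > 0.0);
    for (size_t i = 0; i < n; i++) {
        double t = (double)i / sample_rate_hz;
        out[i] = quantize16(signal(t));
    }
}
```

For instance, `sample_signal(ramp_signal, 4.0, out, 4)` measures the ramp at t = 0, 0.25, 0.5 and 0.75 s and stores the codes 0, 8192, 16384 and 24575.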
Waveform Audio File Format (WAV)
Endianness  Byte Offset  Field Name     Field Size (bytes)  Description
Big         0            ChunkID        4                   RIFF Chunk Descriptor
Little      4            ChunkSize      4                   RIFF Chunk Descriptor
Big         8            Format         4                   RIFF Chunk Descriptor
Big         12           SubChunk1ID    4                   Format SubChunk
Little      16           SubChunk1Size  4                   Format SubChunk
Little      20           AudioFormat    2                   Format SubChunk
Little      22           NumChannels    2                   Format SubChunk
Little      24           SampleRate     4                   Format SubChunk
Little      28           ByteRate       4                   Format SubChunk
Little      32           BlockAlign     2                   Format SubChunk
Little      34           BitsPerSample  2                   Format SubChunk
Big         36           SubChunk2ID    4                   Data SubChunk
Little      40           SubChunk2Size  4                   Data SubChunk
Little      44           Data           SubChunk2Size       Data SubChunk
A WAV file is an instance of the Resource Interchange File Format (RIFF), defined by IBM and Microsoft. RIFF is a generic container format that stores data in tagged chunks (its basic building blocks). It defines a file structure shared by a class of more specific file formats, such as wav, avi, rmi, etc.
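To make the idea of a tagged chunk concrete, here is a minimal C sketch that decodes the 8 bytes that open a chunk: a four-character tag followed by a 32-bit little-endian size. The type `riff_chunk_header` and the helpers `read_le32` and `parse_chunk_header` are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of the 8 bytes that start a RIFF chunk. */
typedef struct {
    char     id[5];   /* four-character tag plus a terminating NUL */
    uint32_t size;    /* number of payload bytes that follow the size field */
} riff_chunk_header;

/* Decode a 32-bit little-endian value from four bytes. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Fill *out from the first 8 bytes of buf. */
void parse_chunk_header(const uint8_t *buf, riff_chunk_header *out)
{
    assert(buf != NULL && out != NULL);
    memcpy(out->id, buf, 4);          /* the tag is stored as plain ASCII */
    out->id[4] = '\0';
    out->size = read_le32(buf + 4);   /* the size is stored little-endian */
}
```

Decoding the size byte by byte keeps the sketch independent of the endianness of the machine it runs on.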
ChunkID: Contains the letters «RIFF» in ASCII form (0x52494646 in big-endian form).
ChunkSize: The size of the rest of the chunk following this number, that is, the size of the entire file in bytes minus 8 bytes for the two fields not included in the count: ChunkID and ChunkSize.
Format: Contains the letters «WAVE» in ASCII form (0x57415645 in big-endian form).
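The relationship between ChunkSize and the file size can be checked in a few lines of C. This is only a sketch under stated assumptions: the function name `riff_descriptor_matches` is made up for the example, and the caller is assumed to already know the file size in bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit little-endian value from four bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Returns 1 when the 12-byte descriptor carries the «RIFF» and «WAVE» tags
   and ChunkSize equals file_size - 8; returns 0 otherwise. */
int riff_descriptor_matches(const uint8_t *desc, uint32_t file_size)
{
    assert(desc != NULL);
    if (file_size < 12)                    return 0;  /* too short for a descriptor */
    if (memcmp(desc, "RIFF", 4) != 0)      return 0;  /* ChunkID */
    if (memcmp(desc + 8, "WAVE", 4) != 0)  return 0;  /* Format */
    return read_le32(desc + 4) == file_size - 8;      /* ChunkSize */
}
```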
SubChunk1ID: Contains the letters «fmt » in ASCII form (0x666d7420 in big-endian form).
SubChunk1Size: 16 for PCM. This is the size of the rest of the SubChunk that follows this number.
AudioFormat: Format code or compression type:
• PCM = 0x0001 (linear quantization, uncompressed)
• IEEE_FLOAT = 0x0003
• Microsoft_ALAW = 0x0006
• Microsoft_MLAW = 0x0007
• IBM_ADPCM = 0x0103
• …
NumChannels: Mono = 1, Stereo = 2, etc. Note: channels are interleaved.
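Interleaving means that the data holds one sample per channel for the first instant, then one sample per channel for the next instant, and so on. A minimal C sketch of pulling a single channel out of such a buffer follows; it assumes 16-bit samples, and the function name `extract_channel` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy one channel out of an interleaved buffer of 16-bit samples.
   A "frame" holds one sample for every channel, so the sample of
   `channel` in frame i sits at index i * num_channels + channel. */
void extract_channel(const int16_t *interleaved, size_t num_frames,
                     unsigned num_channels, unsigned channel, int16_t *out)
{
    assert(interleaved != NULL && out != NULL);
    assert(channel < num_channels);
    for (size_t i = 0; i < num_frames; i++) {
        out[i] = interleaved[i * num_channels + channel];
    }
}
```

For a stereo buffer, channel 0 is the left channel and channel 1 the right one, so the buffer reads L0 R0 L1 R1 and so on.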
SampleRate: Sampling frequency in Hz, e.g. 8000, 16000, 44100, etc.
ByteRate: Average bytes per second. It is typically determined by Equation 1.
1) ByteRate = (SampleRate ⋅ NumChannels ⋅ BitsPerSample) / 8
BlockAlign: The number of bytes for one sample including all channels. It is determined by Equation 2.
2) BlockAlign = (NumChannels ⋅ BitsPerSample) / 8
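Equations 1 and 2 translate directly into C. The sketch below uses made-up function names (`wav_byte_rate`, `wav_block_align`) and assumes that each sample occupies a whole number of bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Equation 1: average number of bytes per second. */
uint32_t wav_byte_rate(uint32_t sample_rate, uint16_t num_channels,
                       uint16_t bits_per_sample)
{
    assert(bits_per_sample % 8 == 0);  /* assumption: whole bytes per sample */
    /* Multiply in 64 bits so that the intermediate product cannot overflow. */
    return (uint32_t)(((uint64_t)sample_rate * num_channels * bits_per_sample) / 8);
}

/* Equation 2: bytes for one sample including all channels. */
uint16_t wav_block_align(uint16_t num_channels, uint16_t bits_per_sample)
{
    assert(bits_per_sample % 8 == 0);  /* assumption: whole bytes per sample */
    return (uint16_t)(((uint32_t)num_channels * bits_per_sample) / 8);
}
```

For a mono, 16-bit signal sampled at 16000 Hz this gives ByteRate = 16000 ⋅ 1 ⋅ 16 / 8 = 32000 and BlockAlign = 1 ⋅ 16 / 8 = 2.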
BitsPerSample: 8 bits = 8, 16 bits = 16, etc.
SubChunk2ID: Contains the letters «data» in ASCII form (0x64617461 in big-endian form).
SubChunk2Size: This is the number of bytes in the Data field. If AudioFormat = PCM, then you can compute the number of samples (see Equation 3).
3) NumOfSamples = (8 ⋅ SubChunk2Size) / (NumChannels ⋅ BitsPerSample)
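Equation 3 can be sketched in C as follows. The function name `wav_num_samples` is an assumption for this example, and the result is only meaningful for PCM data.

```c
#include <assert.h>
#include <stdint.h>

/* Equation 3: number of samples per channel stored in the Data field. */
uint32_t wav_num_samples(uint32_t subchunk2_size, uint16_t num_channels,
                         uint16_t bits_per_sample)
{
    assert(num_channels > 0 && bits_per_sample > 0);  /* avoid division by zero */
    /* Widen to 64 bits: 8 * SubChunk2Size can exceed the 32-bit range. */
    uint64_t total_bits = 8ULL * subchunk2_size;
    return (uint32_t)(total_bits / ((uint32_t)num_channels * bits_per_sample));
}
```

With SubChunk2Size = 66034, one channel and 16 bits per sample, the result is 8 ⋅ 66034 / (1 ⋅ 16) = 33017 samples.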
Example of wave header
Chunk Descriptor:
• 52 49 46 46 → ChunkID = «RIFF»
• 16 02 01 00 → ChunkSize = 66070
• 57 41 56 45 → Format = «WAVE»
Fmt SubChunk:
• 66 6d 74 20 → SubChunk1ID = «fmt »
• 10 00 00 00 → SubChunk1Size = 16
• 01 00 → AudioFormat = 1 (PCM)
• 01 00 → NumChannels = 1
• 80 3e 00 00 → SampleRate = 16000
• 00 7d 00 00 → ByteRate = 32000
• 02 00 → BlockAlign = 2
• 10 00 → BitsPerSample = 16
Data SubChunk:
• 64 61 74 61 → SubChunk2ID = «data»
• f2 01 01 00 → SubChunk2Size = 66034
• … → Data
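The values of this example can be reproduced by decoding the 44 header bytes field by field. The C sketch below does that; the type `wav_fields`, the array `example_header` and the function `decode_wav_header` are names invented for the illustration, and the code assumes the canonical layout in which the «data» SubChunk follows the «fmt » SubChunk directly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the numeric header fields. */
typedef struct {
    uint32_t chunk_size, subchunk1_size;
    uint16_t audio_format, num_channels;
    uint32_t sample_rate, byte_rate;
    uint16_t block_align, bits_per_sample;
    uint32_t subchunk2_size;
} wav_fields;

/* The 44 header bytes of the example. */
const uint8_t example_header[44] = {
    0x52, 0x49, 0x46, 0x46,  0x16, 0x02, 0x01, 0x00,  0x57, 0x41, 0x56, 0x45,
    0x66, 0x6d, 0x74, 0x20,  0x10, 0x00, 0x00, 0x00,  0x01, 0x00,  0x01, 0x00,
    0x80, 0x3e, 0x00, 0x00,  0x00, 0x7d, 0x00, 0x00,  0x02, 0x00,  0x10, 0x00,
    0x64, 0x61, 0x74, 0x61,  0xf2, 0x01, 0x01, 0x00
};

/* Little-endian decoders: the least significant byte comes first. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a canonical 44-byte header. Returns 0 on success, -1 on a wrong tag. */
int decode_wav_header(const uint8_t *h, wav_fields *out)
{
    assert(h != NULL && out != NULL);
    if (memcmp(h, "RIFF", 4) != 0 || memcmp(h + 8, "WAVE", 4) != 0 ||
        memcmp(h + 12, "fmt ", 4) != 0 || memcmp(h + 36, "data", 4) != 0) {
        return -1;
    }
    out->chunk_size      = read_le32(h + 4);
    out->subchunk1_size  = read_le32(h + 16);
    out->audio_format    = read_le16(h + 20);
    out->num_channels    = read_le16(h + 22);
    out->sample_rate     = read_le32(h + 24);
    out->byte_rate       = read_le32(h + 28);
    out->block_align     = read_le16(h + 32);
    out->bits_per_sample = read_le16(h + 34);
    out->subchunk2_size  = read_le32(h + 40);
    return 0;
}
```

Note that ChunkSize = 36 + SubChunk2Size here (66070 = 36 + 66034), as expected when the header contains no additional chunks.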
Exercise
For the next 15 min, write a C/C++ program that takes a wav file as input and prints the following values on standard output:
• Header size;
• Sample rate;
• Bits per sample;
• Number of channels;
• Number of samples.
Good work!
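Solution
#include <stdint.h>  /* fixed-width integer types */

/* Canonical 44-byte PCM WAV header; the fields are in file order. */
typedef struct header_file
{
    char chunk_id[4];          /* «RIFF» */
    uint32_t chunk_size;       /* file size minus 8 bytes */
    char format[4];            /* «WAVE» */
    char subchunk1_id[4];      /* «fmt » */
    uint32_t subchunk1_size;   /* 16 for PCM */
    uint16_t audio_format;     /* 1 = PCM */
    uint16_t num_channels;     /* 1 = mono, 2 = stereo */
    uint32_t sample_rate;      /* sampling frequency in Hz */
    uint32_t byte_rate;        /* average bytes per second */
    uint16_t block_align;      /* bytes per sample, all channels */
    uint16_t bits_per_sample;  /* 8, 16, etc. */
    char subchunk2_id[4];      /* «data» */
    uint32_t subchunk2_size;   /* number of bytes in the Data field */
} header;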
/************** Inside Main()
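The body of main is not reproduced here. As a hedged sketch of how the exercise could be completed, the C code below reads the header straight into a struct and prints the requested values. It rests on several assumptions that are not stated in the slides: the host is little-endian, the struct has no padding (checked at compile time), and the file uses the canonical 44-byte header with no extra chunks before «data». The names `wav_header`, `read_wav_header`, `wav_num_samples` and `print_wav_info` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char     chunk_id[4];
    uint32_t chunk_size;
    char     format[4];
    char     subchunk1_id[4];
    uint32_t subchunk1_size;
    uint16_t audio_format;
    uint16_t num_channels;
    uint32_t sample_rate;
    uint32_t byte_rate;
    uint16_t block_align;
    uint16_t bits_per_sample;
    char     subchunk2_id[4];
    uint32_t subchunk2_size;
} wav_header;

/* The raw fread below only works if the struct mirrors the 44 file bytes. */
_Static_assert(sizeof(wav_header) == 44, "wav_header must not contain padding");

/* Read the header from fp. Returns 0 on success, -1 on a short read,
   an unexpected tag, or a field that would make Equation 3 divide by zero. */
int read_wav_header(FILE *fp, wav_header *h)
{
    assert(fp != NULL && h != NULL);
    if (fread(h, sizeof *h, 1, fp) != 1) return -1;
    if (memcmp(h->chunk_id, "RIFF", 4) != 0 ||
        memcmp(h->format, "WAVE", 4) != 0 ||
        memcmp(h->subchunk1_id, "fmt ", 4) != 0 ||
        memcmp(h->subchunk2_id, "data", 4) != 0) return -1;
    if (h->num_channels == 0 || h->bits_per_sample == 0) return -1;
    return 0;
}

/* Equation 3: samples per channel, valid for PCM data. */
uint32_t wav_num_samples(const wav_header *h)
{
    uint64_t total_bits = 8ULL * h->subchunk2_size;
    return (uint32_t)(total_bits / ((uint32_t)h->num_channels * h->bits_per_sample));
}

/* Print the values requested by the exercise on standard output. */
void print_wav_info(const wav_header *h)
{
    printf("Header size: %zu bytes\n", sizeof *h);
    printf("Sample rate: %u Hz\n", (unsigned)h->sample_rate);
    printf("Bits per sample: %u\n", (unsigned)h->bits_per_sample);
    printf("Number of channels: %u\n", (unsigned)h->num_channels);
    printf("Number of samples: %u\n", (unsigned)wav_num_samples(h));
}
```

A complete program would open the path given on the command line with `fopen(path, "rb")`, call `read_wav_header`, report an error if it returns -1, and otherwise call `print_wav_info`. Real-world files may carry additional chunks between «fmt » and «data», in which case a chunk-by-chunk reader is the safer choice.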