introduction to audio signal processing waveform audio file format (wav) chunkid contains the...


Post on 04-Mar-2020


  • Introduction to Audio Signal Processing

    Human-Computer Interaction

    Angelo Antonio Salatino aasalatino@gmail.com

    http://infernusweb.altervista.org


  • License

    This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

  • Overview

    • Audio Signal Processing;

    • Waveform Audio File Format;

    • FFmpeg;

    • Audio Processing with Matlab;

    • Doing phonetics with Praat;

    • Last but not least: Homework.

  • Audio Signal Processing

    • Audio signal processing is an engineering field that focuses on the computational methods for intentionally altering auditory signals or sounds, in order to achieve a particular goal.

    [Figure: Input Signal → Audio Signal Processing → Output Signal / Data with meaning]

  • Audio Processing in HCI

    Some HCI applications involving audio signal processing are:

    • Speech Emotion Recognition

    • Speaker Recognition

    ▫ Speaker Verification

    ▫ Speaker Identification

    • Voice Commands

    • Speech to Text

    • Etc.

  • Audio Signals

    You can find audio signals represented in either digital or analog format.

    • Digital – the pressure waveform is represented as a sequence of symbols, usually binary numbers.

    • Analog – the pressure waveform is a smooth wave, continuous in both time and amplitude.

  • Analog to Digital Converter (ADC)

    • Don’t worry, it’s only a fast review!!!

    [Figure: ADC pipeline. Analog Signal (continuous in time, continuous in amplitude) → Sample & Hold (discrete in time, continuous in amplitude) → Quantization (discrete in time, discrete in amplitude) → Encoding (discrete in time, discrete in amplitude) → Digital Signal]

    • For each measurement a number is assigned according to its amplitude.

    • Sampling frequency and the number of bits to represent a sample can be considered as main features for digital signals.

    • How are these digital signals stored?

    Sampling Frequency must be defined

    # bits per sample must be defined
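The quantization step can be illustrated with a minimal sketch. It assumes a uniform quantizer, amplitudes normalized to [-1.0, 1.0], and signed integer codes; the function name `quantize_sample` and these conventions are illustrative choices, not something the slides prescribe.

```c
#include <stdint.h>

/* Uniform quantizer (illustrative): maps a normalized amplitude in
 * [-1.0, 1.0] to a signed integer code using `bits` bits per sample.
 * Intended for bits in the range 2..16. */
int32_t quantize_sample(double amplitude, int bits)
{
    /* Largest positive code, e.g. 32767 for 16 bits. */
    int32_t max_code = (int32_t)((1L << (bits - 1)) - 1);

    /* Clamp out-of-range input instead of letting it wrap around. */
    if (amplitude > 1.0)
        amplitude = 1.0;
    if (amplitude < -1.0)
        amplitude = -1.0;

    double scaled = amplitude * max_code;

    /* Round to the nearest integer code (halves go away from zero). */
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

With 16 bits per sample, a full-scale amplitude of 1.0 maps to 32767 and 0.0 maps to 0. Fewer bits give a coarser grid of codes, which is why the number of bits per sample has to be stored along with the samples.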

  • Waveform Audio File Format (WAV)

    Chunk                  Endianness  Byte Offset  Field Name     Field Size (bytes)
    RIFF Chunk Descriptor  Big         0            ChunkID        4
                           Little      4            ChunkSize      4
                           Big         8            Format         4
    Format SubChunk        Big         12           SubChunk1ID    4
                           Little      16           SubChunk1Size  4
                           Little      20           AudioFormat    2
                           Little      22           NumChannels    2
                           Little      24           SampleRate     4
                           Little      28           ByteRate       4
                           Little      32           BlockAlign     2
                           Little      34           BitsPerSample  2
    Data SubChunk          Big         36           SubChunk2ID    4
                           Little      40           SubChunk2Size  4
                           Little      44           Data           SubChunk2Size

    The WAV file format is an instance of the Resource Interchange File Format (RIFF), defined by IBM and Microsoft. RIFF is a generic container format that stores data in tagged chunks (its basic building blocks). It is a file structure that defines a class of more specific file formats, such as wav, avi, rmi, etc.
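As a rough illustration of the tagged-chunk idea, the sketch below reads one chunk header: a 4-character ID followed by a 4-byte little-endian size, as in the table above. The type `riff_chunk_header` and the helpers `read_le32` and `parse_chunk_header` are hypothetical names introduced only for this example.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    char     id[5];   /* four ASCII characters plus a terminating NUL */
    uint32_t size;    /* number of bytes that follow the size field */
} riff_chunk_header;

/* Assemble a 32-bit little-endian value byte by byte, so the result
 * does not depend on the endianness of the host CPU. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the 8 bytes that start every chunk: ID, then size. */
riff_chunk_header parse_chunk_header(const unsigned char *bytes)
{
    riff_chunk_header h;
    memcpy(h.id, bytes, 4);
    h.id[4] = '\0';
    h.size = read_le32(bytes + 4);
    return h;
}
```

The same two-field layout applies to the outer RIFF chunk and to the "fmt " and "data" subchunks, so one helper is enough to step through all of them.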

  • Waveform Audio File Format (WAV)

    ChunkID: Contains the letters «RIFF» in ASCII form (0x52494646 in big-endian form).

    ChunkSize: The size of the rest of the chunk following this field, that is, the size of the entire file in bytes minus 8 bytes for the two fields not included in the count: ChunkID and ChunkSize.

    Format: Contains the letters «WAVE» in ASCII form (0x57415645 in big-endian form).
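A minimal sketch of how these three fields might be checked: the tags are compared byte by byte, and the expected file size follows from the ChunkSize definition. The function names `is_riff_wave` and `expected_file_size` are assumptions made for this example.

```c
#include <string.h>

/* Returns 1 when the first 12 bytes of a file carry the "RIFF" tag at
 * offset 0 and the "WAVE" tag at offset 8, and 0 otherwise. */
int is_riff_wave(const unsigned char *first12)
{
    return memcmp(first12, "RIFF", 4) == 0 &&
           memcmp(first12 + 8, "WAVE", 4) == 0;
}

/* ChunkSize excludes ChunkID and ChunkSize themselves (8 bytes),
 * so the whole file is expected to be ChunkSize + 8 bytes long. */
unsigned long expected_file_size(unsigned long chunk_size)
{
    return chunk_size + 8UL;
}
```

Because the tags are stored as plain ASCII characters in file order, `memcmp` against a string literal works regardless of the host's endianness.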

  • Waveform Audio File Format (WAV)

    SubChunk1ID: Contains the letters «fmt » in ASCII form (0x666d7420 in big-endian form).

    SubChunk1Size: 16 for PCM. This is the size of the rest of the SubChunk that follows this field.

  • Waveform Audio File Format (WAV)

    AudioFormat: Format code or compression type:

    • PCM = 0x0001 (linear quantization, uncompressed)

    • IEEE_FLOAT = 0x0003

    • Microsoft_ALAW = 0x0006

    • Microsoft_MLAW = 0x0007

    • IBM_ADPCM = 0x0103

    • …
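The format codes listed above can be turned into readable names with a small lookup. This is only a sketch: it covers the five codes from the slide, uses the slide's own labels, and reports everything else as "unknown". The function name `audio_format_name` is hypothetical.

```c
#include <stdint.h>

/* Map an AudioFormat code to the label used on the slide.
 * Only the codes listed there are covered. */
const char *audio_format_name(uint16_t code)
{
    switch (code) {
    case 0x0001: return "PCM";
    case 0x0003: return "IEEE_FLOAT";
    case 0x0006: return "Microsoft_ALAW";
    case 0x0007: return "Microsoft_MLAW";
    case 0x0103: return "IBM_ADPCM";
    default:     return "unknown";
    }
}
```

A reader that only supports uncompressed audio would typically continue only when the code is 0x0001 and reject the file otherwise.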

    NumChannels: Mono = 1, Stereo = 2, etc. Note: channels are interleaved.
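Interleaving means that the samples of all channels for one instant are stored next to each other (for stereo: left, right, left, right, and so on). The sketch below separates such data into one contiguous run per channel. It assumes 16-bit samples, and the name `deinterleave_pcm16` is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved frames into one contiguous run per channel:
 * out[c * num_frames + i] receives sample i of channel c.
 * `out` must have room for num_frames * num_channels samples. */
void deinterleave_pcm16(const int16_t *in, size_t num_frames,
                        size_t num_channels, int16_t *out)
{
    for (size_t i = 0; i < num_frames; i++) {
        for (size_t c = 0; c < num_channels; c++) {
            out[c * num_frames + i] = in[i * num_channels + c];
        }
    }
}
```

For three stereo frames stored as 1, -1, 2, -2, 3, -3, the output holds 1, 2, 3 for the first channel followed by -1, -2, -3 for the second.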

  • Waveform Audio File Format (WAV)

    SampleRate: Sampling frequency in Hz: 8000, 16000, 44100, etc.

    ByteRate: Average bytes per second. It is typically determined by Equation 1.

    1) $\mathrm{ByteRate} = \dfrac{\mathrm{SampleRate} \cdot \mathrm{NumChannels} \cdot \mathrm{BitsPerSample}}{8}$

    BlockAlign: The number of bytes for one sample including all channels. It is determined by Equation 2.

    2) $\mathrm{BlockAlign} = \dfrac{\mathrm{NumChannels} \cdot \mathrm{BitsPerSample}}{8}$
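Equations 1 and 2 translate directly into code. The sketch below uses hypothetical function names (`wav_byte_rate`, `wav_block_align`) and integer arithmetic, so it assumes BitsPerSample is a multiple of 8, as in the examples on these slides.

```c
#include <stdint.h>

/* Equation 2: bytes needed for one sample across all channels. */
uint32_t wav_block_align(uint16_t num_channels, uint16_t bits_per_sample)
{
    return (uint32_t)num_channels * bits_per_sample / 8u;
}

/* Equation 1: average number of bytes per second of audio.
 * The product is formed in 64 bits to avoid intermediate overflow. */
uint32_t wav_byte_rate(uint32_t sample_rate, uint16_t num_channels,
                       uint16_t bits_per_sample)
{
    return (uint32_t)((uint64_t)sample_rate * num_channels *
                      bits_per_sample / 8u);
}
```

For mono audio at 16000 Hz with 16 bits per sample, this gives a BlockAlign of 2 and a ByteRate of 32000.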

  • Waveform Audio File Format (WAV)

    BitsPerSample: 8 bits = 8, 16 bits = 16, etc.

    SubChunk2ID: Contains the letters «data» in ASCII form (0x64617461 in big-endian form).

    SubChunk2Size: This is the number of bytes in the Data field. If AudioFormat = PCM, then you can compute the number of samples (see Equation 3).

    3) $\mathrm{NumOfSamples} = \dfrac{8 \cdot \mathrm{SubChunk2Size}}{\mathrm{NumChannels} \cdot \mathrm{BitsPerSample}}$
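Equation 3 can be sketched as a small helper. The name `wav_num_samples` is an assumption, and the guard against a zero denominator is an addition for robustness that the slides do not discuss.

```c
#include <stdint.h>

/* Equation 3: number of samples stored in the Data field (PCM only).
 * Returns 0 when NumChannels or BitsPerSample is 0, which would
 * otherwise cause a division by zero on a malformed header. */
uint32_t wav_num_samples(uint32_t subchunk2_size, uint16_t num_channels,
                         uint16_t bits_per_sample)
{
    uint32_t bits_per_frame = (uint32_t)num_channels * bits_per_sample;
    if (bits_per_frame == 0)
        return 0;
    /* Multiply in 64 bits so that 8 * SubChunk2Size cannot overflow. */
    return (uint32_t)((uint64_t)subchunk2_size * 8u / bits_per_frame);
}
```

With SubChunk2Size = 66034, one channel, and 16 bits per sample, the result is 33017 samples.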

  • Example of wave header

    Bytes (hex)    Field            Value

    Chunk Descriptor
    52 49 46 46    ChunkID          «RIFF»
    16 02 01 00    ChunkSize        66070
    57 41 56 45    Format           «WAVE»

    Fmt SubChunk
    66 6d 74 20    SubChunk1ID      «fmt »
    10 00 00 00    SubChunk1Size    16
    01 00          AudioFormat      1 (PCM)
    01 00          NumChannels      1
    80 3e 00 00    SampleRate       16000
    00 7d 00 00    ByteRate         32000
    02 00          BlockAlign       2
    10 00          BitsPerSample    16

    Data SubChunk
    64 61 74 61    SubChunk2ID      «data»
    f2 01 01 00    SubChunk2Size    66034
    …              Data
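The values in this example can be cross-checked against each other. The sketch below decodes the little-endian fields of a 44-byte header by offset and tests three relations: ChunkSize = 36 + SubChunk2Size (which follows from the ChunkSize definition when the header is exactly 44 bytes), Equation 2, and Equation 1. The helper names are hypothetical, and the check assumes the canonical layout shown in the table with no extra subchunks.

```c
#include <stdint.h>

/* Decode little-endian fields byte by byte (host-independent). */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 when the derived fields of a 44-byte header agree with
 * each other, 0 otherwise. */
int wav_header_is_consistent(const unsigned char *h)
{
    uint32_t chunk_size     = le32(h + 4);
    uint16_t num_channels   = le16(h + 22);
    uint32_t sample_rate    = le32(h + 24);
    uint32_t byte_rate      = le32(h + 28);
    uint16_t block_align    = le16(h + 32);
    uint16_t bits           = le16(h + 34);
    uint32_t subchunk2_size = le32(h + 40);

    return chunk_size == 36u + subchunk2_size &&
           block_align == (uint32_t)num_channels * bits / 8u &&
           byte_rate == sample_rate * block_align;
}
```

For the bytes listed above the three relations hold: 66070 = 36 + 66034, 2 = 1 · 16 / 8, and 32000 = 16000 · 2.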

  • Exercise

    For the next 15 min, write a C/C++ program that takes a wav file as input and prints the following values on standard output:

    • Header size;

    • Sample rate;

    • Bits per sample;

    • Number of channels;

    • Number of samples.

    Good work!

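  • Solution

    typedef struct header_file
    {
        char chunk_id[4];           /* "RIFF" */
        int chunk_size;             /* file size minus 8 bytes */
        char format[4];             /* "WAVE" */
        char subchunk1_id[4];       /* "fmt " */
        int subchunk1_size;         /* 16 for PCM */
        short int audio_format;     /* 1 = PCM */
        short int num_channels;     /* 1 = mono, 2 = stereo */
        int sample_rate;            /* samples per second */
        int byte_rate;              /* see Equation 1 */
        short int block_align;      /* see Equation 2 */
        short int bits_per_sample;  /* 8, 16, ... */
        char subchunk2_id[4];       /* "data" */
        int subchunk2_size;         /* bytes in the Data field */
    } header;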

    /************** Inside Main()
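The body of the slide's main() is cut off in this transcript. What follows is not the author's code but a minimal sketch of one way the struct above could be used. It assumes a canonical 44-byte header with no extra subchunks, a little-endian host (so the raw bytes can be read straight into the struct), and fixed-width types in place of `int` and `short int`. The names `wav_header`, `read_wav_header`, `wav_sample_count`, and `print_wav_info` are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same layout as the struct above, with fixed-width types so that
 * every field has the size the WAV header expects. */
typedef struct {
    char    chunk_id[4];
    int32_t chunk_size;
    char    format[4];
    char    subchunk1_id[4];
    int32_t subchunk1_size;
    int16_t audio_format;
    int16_t num_channels;
    int32_t sample_rate;
    int32_t byte_rate;
    int16_t block_align;
    int16_t bits_per_sample;
    char    subchunk2_id[4];
    int32_t subchunk2_size;
} wav_header;

/* The raw fread below only works if the struct has no padding. */
_Static_assert(sizeof(wav_header) == 44, "wav_header must be 44 bytes");

/* Reads the header from the current position of fp. Returns 0 on
 * success, -1 on a short read or when the RIFF/WAVE tags are missing. */
int read_wav_header(FILE *fp, wav_header *h)
{
    if (fread(h, sizeof *h, 1, fp) != 1)
        return -1;
    if (memcmp(h->chunk_id, "RIFF", 4) != 0 ||
        memcmp(h->format, "WAVE", 4) != 0)
        return -1;
    return 0;
}

/* Number of samples per Equation 3; 0 if the header is malformed. */
long wav_sample_count(const wav_header *h)
{
    long bits_per_frame = (long)h->num_channels * h->bits_per_sample;
    if (bits_per_frame <= 0)
        return 0;
    return (long)((int64_t)h->subchunk2_size * 8 / bits_per_frame);
}

/* Prints the five values requested by the exercise. */
void print_wav_info(const wav_header *h)
{
    printf("Header size: %lu bytes\n", (unsigned long)sizeof *h);
    printf("Sample rate: %ld Hz\n", (long)h->sample_rate);
    printf("Bits per sample: %d\n", (int)h->bits_per_sample);
    printf("Number of channels: %d\n", (int)h->num_channels);
    printf("Number of samples: %ld\n", wav_sample_count(h));
}
```

A main() built on this sketch could open the path given in argv[1] with fopen in "rb" mode, call read_wav_header, and pass the result to print_wav_info. Files with additional subchunks before «data», or hosts that are not little-endian, would need a field-by-field parser instead of a single fread.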
