LE 460 L Acoustics and Experimental Phonetics L-13 Anu Khosla DRDO, Delhi khoslaanu@yahoo.co.in 9313979365

Upload: annice-henderson

Post on 25-Dec-2015


Page 1: LE 460 L Acoustics and Experimental Phonetics L-13 Anu Khosla DRDO, Delhi khoslaanu@yahoo.co.in 9313979365

LE 460 L Acoustics and Experimental Phonetics

L-13

Anu Khosla

DRDO, Delhi

khoslaanu@yahoo.co.in


Introduction

• Most analysis methods are not designed to analyse sounds whose characteristics change over time.

• The practical solution is to model the speech signal as a slowly varying function of time.

• During intervals of 5 to 25 ms the speech characteristics do not change much and are considered to be constant.

• The signal is therefore analysed in small segments, called analysis intervals.

• The optimal analysis interval length depends on the kind of information you want to extract from the speech signal.

• The analysis results therefore always represent some kind of average over the analysis interval.


Parameters for Analysis

Three parameters have to be decided for the analysis: window length, time step, and window shape.

Window length

• There is no single optimal window length that fits all circumstances.

• It depends on the type of analysis and the type of signal.

• For example:

– For spectrograms one often chooses either 5 ms (wideband spectrogram) or 40 ms (narrowband spectrogram).

– For pitch analysis a window length of 40 ms is more appropriate.


Time step

• This parameter determines the amount of overlap between successive segments.

• If the time step is much smaller than the window length, there is a lot of overlap.

• If the time step is larger than the window length, there is no overlap at all.

• In general we like to have at least 50% overlap between two succeeding frames, so we choose a time step no larger than half the window length.
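
The relation between window length, time step, and overlap can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function names `overlap_fraction` and `frame_bounds` and the example values (16 kHz sampling rate, 40 ms window, 10 ms time step) are assumptions made for the example.

```python
def overlap_fraction(window_length, time_step):
    """Fraction of a frame shared with the next frame (0.0 means no overlap)."""
    return max(0.0, 1.0 - time_step / window_length)


def frame_bounds(n_samples, fs, window_length, time_step):
    """Return (start, end) sample indices of every complete analysis frame.

    window_length and time_step are in seconds; fs is in samples per second.
    """
    win = round(window_length * fs)   # frame size in samples
    step = round(time_step * fs)      # distance between frame starts in samples
    return [(start, start + win)
            for start in range(0, n_samples - win + 1, step)]


# Hypothetical example: 1 s of audio at 16 kHz, 40 ms window, 10 ms time step.
frames = frame_bounds(n_samples=16000, fs=16000,
                      window_length=0.040, time_step=0.010)
print(len(frames), frames[:2])
print(overlap_fraction(0.040, 0.010))
```

A time step equal to half the window length gives exactly 50% overlap; anything smaller gives more.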


Window shape

• In general we want the sound segment’s amplitudes to start and end smoothly.

• Many different window shapes are popular in speech analysis:

– square window (or rectangular window)

– Hamming window

– Hanning window

– Bartlett window.

• In Praat the default windowing function is the Gaussian window.
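
For reference, the window shapes listed above can be written out with their common textbook definitions. This is a sketch under assumptions: the function names are invented for the example, each function expects at least two points, and the Gaussian here is a generic one with an assumed width parameter `sigma`, which is not necessarily the exact formula Praat uses.

```python
import math


def rectangular(n_points):
    """Square window: every sample keeps its full amplitude."""
    return [1.0] * n_points


def hamming(n_points):
    """Hamming window: tapers to 0.08 (not zero) at both ends."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def hann(n_points):
    """Hann (Hanning) window: tapers to zero at both ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def bartlett(n_points):
    """Bartlett (triangular) window: straight-line rise and fall."""
    half = (n_points - 1) / 2
    return [1.0 - abs(n - half) / half for n in range(n_points)]


def gaussian(n_points, sigma=0.4):
    """Generic Gaussian window; sigma (assumed) is relative to half the length."""
    half = (n_points - 1) / 2
    return [math.exp(-0.5 * ((n - half) / (sigma * half)) ** 2)
            for n in range(n_points)]


def apply_window(frame, window):
    """Multiply a frame by a window of the same length, sample by sample."""
    return [x * w for x, w in zip(frame, window)]
```

All shapes except the rectangular one make the segment start and end smoothly, which is the property the slide asks for.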


Speech Analysis

Short Time Analysis

• In the time domain

– Short-time energy: used to segment speech into smaller units

– Short-time zero-crossing rate (ZCR): used to help in making voicing decisions (a high ZCR indicates unvoiced speech)

– Short-time autocorrelation: used for pitch determination

• In the frequency domain

– Fourier analysis: spectrogram, formants
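
The three time-domain measures can be sketched for a single analysis frame as follows. This is a minimal illustration rather than the method used in the course: the function names, the 75 to 500 Hz pitch search range, and the synthetic 200 Hz test frame are all assumptions made for the example, and the autocorrelation is left unnormalised for simplicity.

```python
import math


def short_time_energy(frame):
    """Sum of squared sample values in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorr_pitch(frame, fs, f_min=75.0, f_max=500.0):
    """Pitch estimate (Hz) from the lag with the largest autocorrelation.

    Only lags that correspond to f_min..f_max are searched; returns None
    if no lag gives a positive autocorrelation.
    """
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_value:
            best_lag, best_value = lag, r
    return fs / best_lag if best_lag else None


# Hypothetical test frame: 40 ms of a 200 Hz sine sampled at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(320)]
print(short_time_energy(frame), zero_crossing_rate(frame),
      autocorr_pitch(frame, fs))
```

A frame of unvoiced speech would typically show a much higher zero-crossing rate than this periodic example, which is why the ZCR helps with voicing decisions.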


Computerized Speech

Precautions

• Avoid making recordings in reverberant rooms (a church is very reverberant).

• Avoid making recordings in places where the environment is noisy and uncontrollable.

• To avoid large intensity variations in the recording, keep the distance from the speaker’s mouth to the microphone as constant as possible.

• Avoid simultaneous speaking.


Computerized Speech

• Speech (sound) is analog.

• Computers are digital.

• We need to convert the analog signal into digital form.

[Figure: the speech signal level varies with time]


• Sampling is the reduction of a continuous signal to a discrete signal.

• The sampling frequency (or sampling rate) fs is defined as the number of samples obtained in one second (samples per second): fs = 1/T, where T is the time between successive samples.

• Nyquist (1928) and Shannon (1949) showed that, for the digital signal to be a faithful representation of the analog signal, a relation between the sampling frequency and the bandwidth of the signal has to be maintained.

• The Nyquist-Shannon sampling theorem: a sound s(t) that contains no frequencies higher than F hertz is completely determined by giving its sample values at a series of points spaced 1/(2F) seconds apart.

• The number of sample values per second is the sampling frequency.

• Sample values at intervals of 1/(2F) s therefore correspond to a sampling frequency of 2F hertz.
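
The relations fs = 1/T and fs = 2F can be checked numerically. The sketch below is illustrative only; the function names and the example frequencies are assumptions, not part of the lecture.

```python
import math


def sampling_period(fs):
    """T = 1 / fs: time between successive samples, in seconds."""
    return 1.0 / fs


def min_sampling_rate(max_frequency):
    """Sampling frequency 2F that corresponds to a sample spacing of 1/(2F)."""
    return 2.0 * max_frequency


def sample_sine(frequency, fs, duration, phase=0.0):
    """Sample sin(2*pi*f*t + phase) at times t = n * T for n = 0, 1, 2, ..."""
    period = sampling_period(fs)
    n_samples = round(duration * fs)
    return [math.sin(2 * math.pi * frequency * n * period + phase)
            for n in range(n_samples)]


# Hypothetical example: a signal with no content above F = 4000 Hz.
fs = min_sampling_rate(4000)
print(fs, sampling_period(fs))
print(len(sample_sine(frequency=100, fs=fs, duration=0.01)))
```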


Poor Sampling

[Figure: sampled sine wave. Sampling Frequency = 1/2 × Wave Frequency, i.e. sampling interval = 2 × wave period]


Even Worse

[Figure: sampled sine wave. Sampling Frequency = 1/3 × Wave Frequency]


Higher Sampling Frequency

[Figure: sampled sine wave. Sampling Frequency = 2/3 × Wave Frequency]


Getting Better

[Figure: sampled sine wave. Sampling Frequency = Wave Frequency]


Good Sampling

[Figure: sampled sine wave. Sampling Frequency = 2 × Wave Frequency]


Shannon-Nyquist's Sampling Theorem

• A sampled time signal must not contain components at frequencies above half the sampling rate (The so-called Nyquist frequency)

• The highest frequency which can be accurately represented is one-half of the sampling rate


Range of Human Hearing

• 20 – 20,000 Hz

• We lose high-frequency response with age

• Women generally have better response than men

• To reproduce 20 kHz requires a sampling rate of at least 40 kHz

– Sampling below this rate (twice the highest frequency) introduces aliasing


Effect of Aliasing

• The Fourier theorem states that any waveform can be reproduced as a sum of sine waves.

• Improperly sampled signals will contain other sine wave components (aliases) that were not present in the original signal.
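
The frequency at which an undersampled sine wave appears can be computed by folding it back into the range from 0 to half the sampling rate. The sketch below is an illustration under assumptions: the function names and the 1000 Hz wave frequency are made up for the example, and the sampling ratios are the ones used in the sampling plots earlier in the lecture.

```python
def nyquist_frequency(fs):
    """Half the sampling rate: the highest frequency that can be represented."""
    return fs / 2.0


def alias_frequency(frequency, fs):
    """Apparent frequency (between 0 and fs/2) of a sine sampled at fs."""
    folded = frequency % fs
    return min(folded, fs - folded)


wave = 1000.0  # hypothetical wave frequency in Hz
ratios = [("1/3", wave / 3), ("1/2", wave / 2), ("2/3", 2 * wave / 3),
          ("1", wave), ("2", 2 * wave)]
for label, fs in ratios:
    print(f"fs = {label} x wave frequency: appears at "
          f"{alias_frequency(wave, fs):.1f} Hz "
          f"(Nyquist frequency {nyquist_frequency(fs):.1f} Hz)")
```

Only the last case leaves the wave at its true frequency, and even there the wave sits exactly at the Nyquist frequency, so the recovered amplitude depends on where the samples fall within the cycle.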


Half the Nyquist Frequency

[Figure: waveform plot]


Nyquist Frequency

[Figure: waveform plot]


Recovery of a sampled sine wave for different sampling rates


Sampling

[Figure panels: Signal waveform, Impulse sampler, Sampled waveform]


Quantization and encoding of a sampled signal


Quantization Error

• When a signal is quantized, we introduce an error: the coded signal is an approximation of the actual amplitude value.

• The difference between the actual value and the coded value (the midpoint of the zone) is referred to as the quantization error.

• The more zones, the smaller each zone, which results in smaller errors.

• BUT the more zones, the more bits are required to encode the samples, which means a higher bit rate.
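
The trade-off between the number of zones and the quantization error can be sketched with a uniform quantizer that codes each sample as the midpoint of its zone. This is an illustrative sketch: the function names, the amplitude range of -1 to 1, and the sample value 0.3 are assumptions made for the example.

```python
def quantize(value, n_bits, v_min=-1.0, v_max=1.0):
    """Map value to the midpoint of one of 2**n_bits equal-width zones."""
    n_zones = 2 ** n_bits
    step = (v_max - v_min) / n_zones
    # Zone index, clamped so out-of-range values land in the end zones.
    index = int((value - v_min) / step)
    index = max(0, min(index, n_zones - 1))
    return v_min + (index + 0.5) * step


def max_quantization_error(n_bits, v_min=-1.0, v_max=1.0):
    """Largest |actual - coded| for in-range values: half a zone width."""
    return (v_max - v_min) / 2 ** n_bits / 2


sample = 0.3  # hypothetical sample amplitude
for bits in (3, 8, 16):
    coded = quantize(sample, bits)
    print(bits, coded, abs(sample - coded), max_quantization_error(bits))
```

Each extra bit doubles the number of zones and halves the maximum error, at the cost of one more bit per sample.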


Digitization of Analog Signal

• Sample the analog signal in time and amplitude

• Find the closest approximation

[Figure: original signal, sample values, and their approximation at 3 bits/sample]

Rs = bit rate = (bits/sample) × (samples/second)
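
The bit-rate formula is a single multiplication, shown here for a single channel. The function name and the example pairings of bits per sample with a sampling rate are assumptions chosen for illustration, not figures from the slide.

```python
def bit_rate(bits_per_sample, samples_per_second):
    """Rs = bits/sample x samples/second, in bits per second (one channel)."""
    return bits_per_sample * samples_per_second


# Hypothetical examples: 3 bits/sample and 16 bits/sample, both at 8000 samples/s.
print(bit_rate(3, 8000))
print(bit_rate(16, 8000))
```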


• All ADCs (analog-to-digital converters) have a fixed highest sampling frequency. To guarantee that the input contains no frequencies higher than half this frequency, we have to filter them out before sampling.

• If we don’t filter out these frequencies, they get aliased and also contribute to the digitized representation.
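
The filter described above sits in front of the converter and is therefore an analog circuit. The same idea applies in digital form when an already sampled signal is reduced to a lower sampling rate, and that case can be sketched in code. Everything below is an assumption made for illustration: the windowed-sinc design, the function names, the 101 taps, and the example rates (48 kHz reduced to 16 kHz with a 7 kHz cutoff).

```python
import cmath
import math


def lowpass_fir(cutoff, fs, n_taps=101):
    """Windowed-sinc low-pass FIR coefficients (Hamming window), gain 1 at 0 Hz."""
    fc = cutoff / fs                       # cutoff as a fraction of fs
    mid = (n_taps - 1) / 2
    taps = []
    for n in range(n_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def gain_at(taps, frequency, fs):
    """Magnitude of the filter's frequency response at the given frequency."""
    w = 2 * math.pi * frequency / fs
    return abs(sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps)))


def apply_fir(signal, taps):
    """Direct convolution, treating samples before the start as zero."""
    return [sum(t * signal[i - k] for k, t in enumerate(taps) if i - k >= 0)
            for i in range(len(signal))]


def downsample(signal, taps, factor):
    """Filter first, then keep every factor-th sample."""
    return apply_fir(signal, taps)[::factor]


# Hypothetical case: 48 kHz audio reduced to 16 kHz (new Nyquist frequency 8 kHz).
taps = lowpass_fir(cutoff=7000, fs=48000)
print(gain_at(taps, 1000, 48000), gain_at(taps, 12000, 48000))
```

A component at 12 kHz, which would alias if the signal were simply decimated to 16 kHz, is strongly attenuated by the filter, while a component at 1 kHz passes almost unchanged.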


For most phonemes, almost all of the energy is contained in the 5 Hz to 4 kHz range, allowing a sampling rate of 8 kHz. This is the sampling rate used by nearly all telephony systems.

CD-quality audio is recorded at 16 bits per sample, with a sampling rate of 44.1 kHz.