Characteristics of Speech

Upload: willa-greer

Post on 02-Jan-2016


Page 1:

Characteristics of Speech

- Long-term (sentence level, several seconds)
  - Drastic/irregular changes
- Short-term (frame level, 20 ms or so)
  - Regular periodic changes for voiced sounds
  - Noise-like for unvoiced sounds
- Hard to recognize without context information

Page 2:

Spectrum in Frequency-Domain

Three basic characteristics in a spectrum:
- Timbre: Spectrum after smoothing
- Pitch: Distance between harmonics
- Intensity: Magnitude of spectrum

[Figure: spectrum annotated with pitch frequency, first formant F1, second formant F2, and intensity]
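As a rough illustration of reading pitch and intensity from a spectrum, the sketch below builds a synthetic signal with three harmonics and locates its spectral peaks with `fft`. The sampling rate, fundamental frequency, and harmonic amplitudes are assumed values chosen so that each harmonic falls exactly on an FFT bin; real speech would need windowing and a more careful peak picker.

```matlab
% Assumed test signal: three harmonics of a 200 Hz fundamental
fs = 8000;                          % sampling rate in Hz (assumed)
N  = 800;                           % number of samples (10 Hz bin spacing)
f0 = 200;                           % fundamental frequency in Hz (assumed)
t  = (0:N-1)'/fs;                   % time axis in seconds
x  = sin(2*pi*f0*t) + 0.5*sin(2*pi*2*f0*t) + 0.25*sin(2*pi*3*f0*t);

mag   = abs(fft(x));                % magnitude spectrum (intensity per bin)
mag   = mag(1:N/2+1);               % keep non-negative frequencies only
freqs = (0:N/2)'*fs/N;              % frequency of each kept bin in Hz

[~, order] = sort(mag, 'descend');  % strongest bins first
peaks   = sort(freqs(order(1:3)));  % frequencies of the three harmonics
spacing = diff(peaks);              % distance between harmonics = pitch
```

Here `spacing` recovers the 200 Hz fundamental, and the peak heights in `mag` reflect the relative intensity of each harmonic.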

Page 3:

Timbre Demo: Real-time Spectrogram

Simulink model for real-time display of spectrogram:
- dspstfft_audio (before MATLAB R2011a)
- dspstfft_audioInput (R2012a or later)

[Figure: spectrum and spectrogram displays]

Page 4:

Audio Feature Extraction & Recognition

- Frame blocking
  - Frame duration of 20 ms
- Feature extraction
  - Volume, pitch, MFCC, LPC, etc.
- Endpoint detection
  - Based on volume & ZCR
- Recognition
  - DTW, HMM

Page 5:

Example: Audio Feature Extraction

- 256 points/frame
- 84 points overlap
- 11025/(256-84) ≈ 64 feature vectors per second

[Figure: waveform with a zoomed-in view marking one frame and its overlap with the next frame]
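The frame arithmetic above can be sketched as follows. This is a minimal example, assuming a sampling rate of 11025 Hz (inferred from the slide's calculation) and a hypothetical one-second test tone; only complete frames are kept.

```matlab
fs        = 11025;                  % sampling rate in Hz (assumed)
frameSize = 256;                    % points per frame
overlap   = 84;                     % points shared by neighboring frames
hop       = frameSize - overlap;    % 172 points between frame starts

y = 0.3*sin(2*pi*200*(0:fs-1)'/fs); % hypothetical 1-second signal

frameCount = floor((length(y) - overlap)/hop);  % complete frames only
frames = zeros(frameSize, frameCount);          % one frame per column
for i = 1:frameCount
    first = (i-1)*hop + 1;                      % first sample of frame i
    frames(:, i) = y(first:first+frameSize-1);
end

framesPerSecond = fs/hop;           % about 64 feature vectors per second
```

Each column of `frames` can then be passed to a feature extractor, giving one feature vector per frame.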

Page 6:

Three Basic Acoustic Features

Three basic speech features:
- Volume/Energy/Intensity: Vibration amplitude
- Pitch: Fundamental frequency (the reciprocal of the fundamental period)
- Timbre: The waveform within a fundamental period

These features are perceived subjectively by humans. However, we can use some mathematics to "emulate" human perception and capture these features.

Page 7:

Acoustic Feature: Energy

Energy is the square sum of a frame, also known as intensity or volume.

Characteristics:
- Noise and fricatives usually have low energy.
- Energy is strongly influenced by microphone setup.
- Taking the base-10 log of the square sum and multiplying by 10 gives energy in decibels (dB).
- Energy is commonly used in endpoint detection.
- In embedded system implementations, volume can be computed as the sum of absolute values of a frame to reduce computation.
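The three quantities described above can be computed in a few lines. This is a minimal sketch on a hypothetical 256-point frame (a sine wave with amplitude 0.5); the variable names are illustrative only. Note that an all-zero frame would give a log of zero, so practical code usually guards against that case.

```matlab
frame = 0.5*sin(2*pi*(0:255)'/32);  % hypothetical frame: 8 full sine periods

energy    = sum(frame.^2);          % square sum of the frame
energyDb  = 10*log10(energy);       % energy in decibels
volumeAbs = sum(abs(frame));        % cheaper alternative: sum of absolute values
```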

Page 8:

Acoustic Feature: Zero Crossing Rate

Zero crossing rate (ZCR)
- The number of zero crossings in a frame.

Characteristics:
- Noise and unvoiced sounds have high ZCR.
- ZCR is commonly used in endpoint detection, especially for detecting the start and end of unvoiced sounds.
- To distinguish noise/silence from unvoiced sound, we usually add a bias before computing ZCR.
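One way to read the bias idea is sketched below: shifting the signal by a small constant stops low-amplitude noise from crossing zero, while a stronger unvoiced sound still does. The two test signals, the bias value, and the helper `countZc` are all hypothetical and chosen only to make the effect easy to see.

```matlab
n        = (0:255)';
noise    = 0.01*(-1).^n;            % hypothetical low-level noise, alternating sign
unvoiced = 0.2*(-1).^n;             % hypothetical unvoiced sound, larger amplitude
bias     = 0.05;                    % assumed bias, between the two amplitudes

% Count strict sign changes between neighboring samples
countZc = @(x) sum(x(1:end-1).*x(2:end) < 0);

zcNoise          = countZc(noise);            % high ZCR without bias
zcNoiseBiased    = countZc(noise + bias);     % noise no longer crosses zero
zcUnvoicedBiased = countZc(unvoiced + bias);  % unvoiced sound still does
```

In practice the bias would be chosen relative to the measured background noise level rather than fixed in advance.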

Page 9:
Page 10:

Pitch

Computation
- Pitch frequency is the reciprocal of the fundamental period.
- Pitch in terms of semitone:

$\text{semitone} = 69 + 12 \log_2\left(\frac{\text{freq}}{440}\right)$
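A short sketch of both computations, assuming a hypothetical fundamental period of 50 samples at a 16000 Hz sampling rate; the helper name `freq2semitone` is illustrative only.

```matlab
fs            = 16000;              % sampling rate in Hz (assumed)
periodSamples = 50;                 % hypothetical fundamental period in samples
pitchFreq     = fs/periodSamples;   % reciprocal of the period in seconds: 320 Hz

% Semitone number, with 440 Hz mapped to semitone 69
freq2semitone = @(freq) 69 + 12*log2(freq/440);

semitoneA4   = freq2semitone(440);       % 69
semitoneA5   = freq2semitone(880);       % one octave up: 81
semitoneC4   = freq2semitone(261.63);    % close to 60 (middle C)
semitoneTest = freq2semitone(pitchFreq); % semitone value of the 320 Hz pitch
```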

Page 11:

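Basic Process of Sound Production and Reception

- Vibration of the sound source
- Pressure waves in the air
- Vibration of the eardrum
- Reception by the inner-ear nerves
- Recognition by the brain

Sound production mechanisms:
- Natural vibration frequency excited by striking (e.g., a tuning fork)
- Resonance frequency excited by air friction (e.g., a flute)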

Page 12:

Human Speech Production

Page 13:

The Vocal Tract

Page 14:

Glottal Volume Velocity & Resulting Sound Pressure (Voiced)

Page 15:

Speech Production

[Figure: glottal pulses → vocal tract → speech signal; (a) source spectrum + (b) filter function = (c) output energy spectrum]

Page 16:

Acoustical Analysis (speech signal of “七”, the Mandarin word for "seven")

Page 17:

Speech Production Modeling

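[Figure: block diagram of the speech production model. An impulse train generator (controlled by the pitch period) or a noise generator produces the excitation u(n), which is multiplied by the gain G and passed through a time-varying digital filter (controlled by the vocal tract parameters) to produce the speech signal s(n). Excitation labels: phonation, whispering, frication, compression, vibration.]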

Page 18:

Parametric Representation

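[Figure: the excitation u(n) is multiplied by the gain G and passed through the filter defined by A(z) to give s(n)]

G = gain of excitation
u(n) = excitation source (quasi-periodic pulse train or random noise)

Model:

$s(n) = G \cdot u(n) + \sum_{k=1}^{p} a_k s(n-k)$

Z-transform:

$S(z) = G \cdot U(z) + \sum_{k=1}^{p} a_k z^{-k} S(z)$

Written in A(z):

$H(z) = \frac{S(z)}{G \cdot U(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{1}{A(z)}$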

Page 19:

The Speech Model : A Summary

- Voiced/unvoiced classification,
- Pitch period for voiced sounds,
- The gain parameter, and
- The coefficients of the digital filter, $\{a_k\}$.

$s(n) = G \cdot u(n) + \sum_{k=1}^{p} a_k s(n-k)$

$\hat{s}(n) = \sum_{k=1}^{p} a_k s(n-k)$
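The model summarized above can be tried out with MATLAB's built-in `filter` function. The sketch below synthesizes a voiced-like signal from an impulse train, assuming the sign convention s(n) = G u(n) + sum of a_k s(n-k). The sampling rate, pitch period, gain, and the two filter coefficients are hypothetical values picked only to give a stable filter, not coefficients estimated from real speech.

```matlab
fs          = 8000;                 % sampling rate in Hz (assumed)
pitchPeriod = 80;                   % samples per period, i.e. 100 Hz pitch (assumed)
G           = 1;                    % gain of excitation (assumed)
a           = [1.3 -0.8];           % hypothetical coefficients a_1, a_2 (stable)
N           = 400;                  % number of samples to synthesize

u = zeros(N, 1);
u(1:pitchPeriod:end) = 1;           % voiced excitation: impulse train
% For an unvoiced sound, u could instead be random noise, e.g. randn(N, 1).

% All-pole filter 1/A(z) with A(z) = 1 - a_1 z^-1 - a_2 z^-2
s = filter(G, [1, -a], u);

% The same output from the difference equation, for comparison
sDirect = zeros(N, 1);
for n = 1:N
    acc = G*u(n);
    for k = 1:length(a)
        if n - k >= 1
            acc = acc + a(k)*sDirect(n-k);
        end
    end
    sDirect(n) = acc;
end
```

The vectors `s` and `sDirect` should agree up to rounding, which is a quick check that the filter call matches the difference equation.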

Page 20:

Glossary (English: Chinese)

- Cochlea: 耳蝸
- Phoneme: 音素、音位
- Phonics: 聲學;聲音基礎教學法 (a teaching method that starts from sounds and builds toward spelling)
- Phonetics: 語音學
- Phonology: 音系學、語音體系
- Prosody: 韻律學;作詩法
- Syllable: 音節
- Tone: 音調
- Alveolar: 齒槽音
- Silence: 靜音
- Noise: 雜訊
- Glottis: 聲門
- Larynx: 喉頭
- Pharynx: 咽頭
- Pharyngeal: 咽部的,喉音的
- Velum: 軟顎
- Vocal cords: 聲帶
- Esophagus: 食管
- Diaphragm: 橫隔膜
- Trachea: 氣管

Page 21:

Hints for Exercises

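How to generate a sine wave signal:

Math formula: $y = a \sin(2\pi f t)$

MATLAB code:

duration=3;                 % signal length in seconds
f=440;                      % frequency in Hz
fs=16000;                   % sampling rate in Hz
time=(0:duration*fs-1)/fs;  % time axis in seconds
y=0.8*sin(2*pi*f*time);     % sine wave with amplitude a = 0.8
plot(time, y);              % plot the waveform
sound(y, fs);               % play it back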