speech processing, spring 2006 - aalborg...

21
Speech Processing, I, Zheng-Hua Tan, 2006 1 Speech Processing, Spring 2006 - SIPCom Program - Zheng-Hua Tan Speech and Multimedia Communication Division Department of Communication Technology Aalborg University, Denmark [email protected] Lecture 1: Introduction and Model of Speech Production Speech Processing, I, Zheng-Hua Tan, 2006 2 Part I: Introduction Introduction Speech science and state-of-the-art Course overview Speech production and acoustic phonetics The anatomy of speech production Articulatory phonetics Acoustic phonetics Models of speech production Short-time analysis

Upload: vophuc

Post on 10-Aug-2018

238 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

1

Speech Processing, I, Zheng-Hua Tan, 2006 1

Speech Processing, Spring 2006- SIPCom Program -

Zheng-Hua TanSpeech and Multimedia Communication Division

Department of Communication Technology Aalborg University, Denmark

[email protected]

Lecture 1: Introduction and

Model of Speech Production

Speech Processing, I, Zheng-Hua Tan, 2006 2

Part I: Introduction

IntroductionSpeech science and state-of-the-artCourse overview

Speech production and acoustic phoneticsThe anatomy of speech productionArticulatory phoneticsAcoustic phoneticsModels of speech productionShort-time analysis

Page 2: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

2

Speech Processing, I, Zheng-Hua Tan, 2006 3

HAL talks, listens, reads lips and solves problems

Computer as dream of human being

(After 2001: A Space Odyssey, 1968 )

• Nature and effortless for huamn• Hard for computer• Dream of AI scientists and human• True in 2001: A Space Odyssey

Speech Processing, I, Zheng-Hua Tan, 2006 4

Computer as a reality: state-of-the-art

Demo

Microsoft demo video

Text to speech (TTS)

Festival TTS @ CSTR Edinburg University

Next generation TTS @ AT&T

Page 3: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

3

Speech Processing, I, Zheng-Hua Tan, 2006 5

Information in Speech

Speech coding data ratesRate (bits/sec)

200k 100k 64k 32k 16k 12k 9k 4.8k 2k 1k 500 100 60ADPCM, DPCM, PCM LPC, CELP, MELP, VocodersWaveform coding Parametric (source) coding

Human can understand text: 10 char/sec x 6 bits/ASCII char = 60 bits/sec

Is content in speech more than 60 bits/sec?

Speech Processing, I, Zheng-Hua Tan, 2006 6

Information in Speech – cont.

Examples

“That's one small step for man; one giant leap for mankind.”

-- Neil Armstrong, Apollo 11 Moon Landing Speech

"I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character. I have a dream today!"

-- Martin Luther King, Jr., I Have a Dream

Speech contains speaker identity, emotion, meaning, text. speech techniques

Page 4: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

4

Speech Processing, I, Zheng-Hua Tan, 2006 7

Speech is a complex process

Speech

Physiology

Acoustics

Linguistics

Speech Processing, I, Zheng-Hua Tan, 2006 8

Human speech communication process

Rabiner and Levinson, IEEE Tans. Communications, 1981

Speech coding

Speech synthesis

Speech recognition

Speech understanding

(After Rabiner & Levinson, 1981)

Page 5: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

5

Speech Processing, I, Zheng-Hua Tan, 2006 9

Course Outline

MM1 – Fundamentals of speech science, model of speech production

ZTMM2 – Speech perception

PRMM3-5 – Speech coding

MGCMM6-7 - Speech enhancement

SHJMM8-10 – Speech synthesis and recognition

ZT

Speech Processing, I, Zheng-Hua Tan, 2006 10

Literature

Textbook:J Deller, J Hansen and J Proakis, Discrete-Time Processing of Speech Signals, IEEE Press, 2000.

Reading:Huang, Acero and Hon, Spoken Language Processing, Prentice-Hall, 2001.D. O’Shaughnessy, Speech Communications, IEEE Press, 2000Rabiner and Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.Rabiner and Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978.

Page 6: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

6

Speech Processing, I, Zheng-Hua Tan, 2006 11

Part II: Speech production

IntroductionSpeech production, acoustic phonetics and speech modelling

The anatomy of speech productionArticulatory phoneticsAcoustic phoneticsModels of speech productionShort-time analysis

Speech Processing, I, Zheng-Hua Tan, 2006 12

The speech chain(After Denes & Pinson, 1993)

Page 7: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

7

Speech Processing, I, Zheng-Hua Tan, 2006 13

Schematic diagram of speech production

gross components

finer features

Vocal folds

Speech Processing, I, Zheng-Hua Tan, 2006 14

Block diagram of speech production

Page 8: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

8

Speech Processing, I, Zheng-Hua Tan, 2006 15

Cross section of the larynx

Larynx: the source of most speechVocal cords (folds): the two folds of tissue in the larynx. They can open and shut like a pair of fans. Glottis: the gap between the vocal cords. As air is forced through the glottis the vocal cords will start to vibrate and modulate the air flow. This process is known as phonation. The frequency of vibration determines the pitch of the voice (for a male, 50-200Hz; for a female, up to 500Hz).

Speech Processing, I, Zheng-Hua Tan, 2006 16

Vocal cords

Vocal cords form a relaxation oscillator (voiced excitation)

Page 9: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

9

Speech Processing, I, Zheng-Hua Tan, 2006 17

Glottal flow - excitation

Volume velocity(cc/sec)

Time (ms)50

Openingphase

Closingphase

Closure

Pitch Period = 12.5msFundamental frequency = 1/.0125 = 80Hz

Speech Processing, I, Zheng-Hua Tan, 2006 18

Source-filter model

Vocal tract is a concatenation of lossless tubes with varying cross-sectional areas

Source or excitation(air flow at the vocal cords)

Vocal tract modelling

Linear time-varying FilterOutput

Vocal tract(resonances of the vocal tract)

Page 10: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

10

Speech Processing, I, Zheng-Hua Tan, 2006 19

Digital model of speech production

Digital model of speech production

)(zR

)(zS

)()()()()( Voiced)()()()( Unvoiced

)1(1)(

1

1

0

1

0

zRzVzGzEzSzRzVzEzS

zp

V

zb

VzV N

kk

N

k

kk

==

−=

−=

∏∑=

=

)(zE

)(zG

E(z) also representsthe noise sequence

Speech Processing, I, Zheng-Hua Tan, 2006 20

Type of excitation

Voiced: produced by forcing air through the glottis

vowels (inc. diphthongs) are voicedUnvoiced: generated by forming a constriction at some point along the vocal tract and forcing air through the constriction

Page 11: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

11

Speech Processing, I, Zheng-Hua Tan, 2006 21

Role of the vocal tract

Vowels: produced by exciting a fixed vocal tract with quasi-periodic pulsed of air caused by vibration of the vocal cordsConsonants: a significant restriction and thus weaker in amplitude and noisy-like

Formants: resonances determined by the shape of vocal tract, which form the overall spectrum and the properties of the filter

Speech Processing, I, Zheng-Hua Tan, 2006 22

The speech signal

Speech is a sequence of highly changing soundsWhen producing sounds, the vocal cords and the various articulators slowly change over timeThere is a need to study speech sounds, their production, and the signs used to represent them phonetics

Page 12: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

12

Speech Processing, I, Zheng-Hua Tan, 2006 23

Phonetics

Phonetics: study of speech sounds, their production, and the signs used to represent them.

articulatory phonetics: how they are made by moving various organs in the vocal tract.acoustic phonetics: how they are perceived by the human ear and their physical properties. The study is conducted by observing and measuring the speech waveform and spectrum.

Speech Processing, I, Zheng-Hua Tan, 2006 24

Speech sounds and waveforms

sixteen /s/ /i/ /k/ /s/ /t/ /ee/ /n/

six periodicity, intensity, duration, boundary, etc

Page 13: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

13

Speech Processing, I, Zheng-Hua Tan, 2006 25

Observing pitch from waveforms

Male or female voice?

Speech Processing, I, Zheng-Hua Tan, 2006 26

Spectrogram

Spectrogram – short-time Fourier analysistwo-dimensional waveform (amplitude/time) is converted into a three-dimensional pattern (amplitude/frequency/time)Wideband spectrogram: analyzed on 15ms sections of waveform with a step of 1ms

voiced regions with vertical striations due to the periodicity of the time waveform (each vertical line represents a pulse of vocal folds) while unvoiced regions are solid/random, or ‘snowy’

Narrowband spectrogram: on 50mspitch for voiced intervals in horizontal lines

Page 14: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

14

Speech Processing, I, Zheng-Hua Tan, 2006 27

Sound Spectrogram: an example

Wideband spectrogram

waveform

narrowband spectrogram

F1

F2F3

Speech Processing, I, Zheng-Hua Tan, 2006 28

Phonemes in American English

(After J. Hansen)

Page 15: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

15

Speech Processing, I, Zheng-Hua Tan, 2006 29

Phoneme classification chart

Sound categorization according to the position of the articulators.

(After Rabiner and Schafer, 1978)

Speech Processing, I, Zheng-Hua Tan, 2006 30

Vowel production: examples

(After Joseph Picone )

• Fixed vocal tract shape

• Voiced

• Cross-sectional areaFi

• Tongue position

sound

Page 16: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

16

Speech Processing, I, Zheng-Hua Tan, 2006 31

The vowel space

by the locations of the first and second formant frequencies:

(After Peterson & Barney, 1952)

F1F2

F3

Speech Processing, I, Zheng-Hua Tan, 2006 32

The vowel triangle

Page 17: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

17

Speech Processing, I, Zheng-Hua Tan, 2006 33

Consonant production: examples

(After Joseph Picone )

Speech Processing, I, Zheng-Hua Tan, 2006 34

Diphthongs

A diphthongs involves an intentional movement from one vowel toward another vowelDiffer from two distinct vowels: representing a transition from one vowel target to another, yet neither vowel is actually reachedDiphthongs: (Fig. 2.14, pp129, John3 2000)

/Y/ hide /W/ down /O/ boy /X/ rose

Page 18: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

18

Speech Processing, I, Zheng-Hua Tan, 2006 35

Semivowels

Vowel-like, but weaker than most vowels due to their more constricted vocal tractVoicedSemivowels: (Fig. 2.15, pp130, John3 2000)

Liquids:/r/ ran/l/ liquid

Glides:/w/ want/y/ yard

Speech Processing, I, Zheng-Hua Tan, 2006 36

Nasals

Produced by the glottal waveform exciting an open nasal cavity and closed oral cavity.Similar to vowel but weaker due to limited ability of the nasal cavity to radiate soundNasals:

/m/ moon/n/ noon/G/ sing

Page 19: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

19

Speech Processing, I, Zheng-Hua Tan, 2006 37

Fricatives

Produced by exciting the vocal tract with a steady air-stream that becomes turbulent at some point of constrictionFricatives

Speech Processing, I, Zheng-Hua Tan, 2006 38

Affricates

formed by transitions from a stop to a fricativeAffricates:

/J/ just/C/ channel

Page 20: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

20

Speech Processing, I, Zheng-Hua Tan, 2006 39

Stops (or Plosives)

Stops consonants are transient, non-continuant sounds that are produced by building up pressure behind a total constriction somewhere along the vocal tract, and suddenly releasing this pressureStops

Speech Processing, I, Zheng-Hua Tan, 2006 40

Speech Tool

Speech Filing System- Tools for Speech Research

It performs standard operations such as recording, replay, waveform editing and labelling, spectrographic and formant analysis and fundamental frequency estimation. http://www.phon.ucl.ac.uk/resource/sfs/

Matlab

Page 21: Speech Processing, Spring 2006 - Aalborg Universitetkom.aau.dk/project/sipcom/SIPCom06/sites/sipcom8... · Speech Processing, Spring 2006 ... Processing of Speech Signals, IEEE Press,

21

Speech Processing, I, Zheng-Hua Tan, 2006 41

Summary

Speech technologyThe speech chainAnatomy of speech productionSpeech signals: waveform and spectrogramPhoneticsModelling

Next lecture: Speech Perception