speech processing introduction. 4 september 2015veton këpuska2 syllabus ece 5525 speech processing...

33
Speech Processing Introduction

Upload: sylvia-oneal

Post on 26-Dec-2015

221 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

Speech Processing

Introduction

Page 2: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 2

Syllabus ECE 5525 Speech Processing Contact Info:

Këpuska, VetonOlin Engineering Building, Office # 353Tel. (321) 674-7183E-mail: [email protected]://my.fit.edu/~vkepuska/ece5525

CRN   Subj   Crse   Sec   Credits   Title  81888   ECE  5525  E1  3.00   Speech Processing  

Instructor(s): Veton Këpuska  Textbook(s):

“Discrete-Time Speech Signal Processing: Principles and Practice”, Thomas F.Quatieri, Prentice Hall, 2002

Reference Material: “Digital Processing of Speech Signals”,

L.R. Rabiner and R.W. Schafer, Prentice Hall, 1978 “Digital Signal Processing”,

Alan V. Oppenheim and Ronald W. Schafer, Prentice Hall, 1975 “Digital Signal Processing A Practical Approach”,

Emmanuel C. Ifeachor and Barrie W. Jervis, Second Edition, Prentice Hall, 2002

Page 3: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 3

Syllabus Course Goals:

Teach modern methods that are used to process speech signals.

Subject Area: Digital Speech Processing

Prerequisites by Topic: Digital Signal Processing

Topics Covered: Discrete-Time Speech Signal Processing, Production and Classification of Speech Sounds, Acoustic Theory of Speech Production, Speech Perception, Speech Analysis, Speech Synthesis, Homomorphic Speech Processing, Short-Time Fourier Transform Analysis and Synthesis, Filter-Bank Analysis/Synthesis, Sinusoidal Analysis/Synthesis, Frequency Domain-Pitch Estimation, Nonlinear Measurement and Modeling Techniques Coding of Speech Signals, Speech Enhancement, Speaker Recognition, Methods for Speech Recognition, Digital Speech Processing for Man-Machine Communication by Voice

Recommended Grading Homework: 20% Exams: 30% Project(s): 50%

MATLAB Exercises and Homework Problems

Page 4: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 4

Course Information

http://my.fit.edu/~vkepuska/web or Directly

http://my.fit.edu/~vkepuska/ece5525/

Page 5: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

Discrete-Time Speech Signal Processing

Introduction

April 19, 2023 Veton Këpuska 6

Page 6: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

Introduction

The topic of this study is as old as our human language and as new the newest as computer chip.

Goal of creation of more efficient and effective systems for: human-to-human communication, as well

as More recently human-machine

communication, is studied.

April 19, 2023 Veton Këpuska 7

Page 7: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

History

1960’s Digital Signal Processing (DSP):Assumed a central role in speech communication studies:

It enabled development of a large number of applications

Advances in IC technology DSP algorithms Computer architecture

April 19, 2023 Veton Këpuska 8

Page 8: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

History

Created environment with virtually limitless opportunities for innovation in Speech Processing Image Processing Video Processing Radar & Sonar Medical Diagnostics, and Consumer Electronics

April 19, 2023 Veton Këpuska 9

Page 9: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

History

Three levels of understanding are required to appreciate the technological advancement of DSP Theoretical Conceptual Practical

April 19, 2023 Veton Këpuska 10

Ability to implement theory and concepts in working code

(MATLAB, C, C++, JAVA)

Basic understanding of how theory is applied

Mathematics, derivations, signal

processing

Page 10: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

Speech Technology

Theoretical: Acoustic Theory of Speech Production The basic Mathematics of Speech Signal

representation. Derivation of Various Properties of Speech

associated with each representation Basic Signal Processing mathematics:

Speech signal Sampling Aliasing Filtering and other DSP methods

April 19, 2023 Veton Këpuska 11

Page 11: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

Speech Technology

Conceptual Hos speech Processing theory is applied

in order to make various speech measurements and to estimate and quantify various attributes of the speech signal

April 19, 2023 Veton Këpuska 12

Page 12: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

Speech Technology

Practical For technology to realize its full potential, it is essential to

be able to convert theory and conceptual understanding to practice.

This process involves constraints: Knowledge of the goals of the application Knowledge of the engering tradeoffs and judgments, and Ability to provide implementations in working computer code

(e.g., MATLAB, C/C++ or java), Specialized code running in real-time signal-processing chips Specialized languages and technologies such as:

ASICs FPGAs (Field Programmable Gate Arrays) DSP chips

April 19, 2023 Veton Këpuska 13

Page 13: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

Speech Technology

What is the nature of speech signal? How do DPS techniques play a role in

learning about the speech signal? What are the basic digital

representations of speech signal and how are they used in algorithms for speech processing?

What are the important applications that are enabled by digital speech processing methods?

April 19, 2023 Veton Këpuska 14

Page 14: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 15

Discrete-Time Speech Signal Processing

Speech has evolved as a primary form of communication between humans.

Technological advancement has enhanced our ability to communicate: One early case is the transduction by a

telephone handset of the continuously-varying speech pressure signal at the lips output to continuously-varying (analog) electric voltage signal.

Digital technology has brought communication to a new level. This technology requires additional transduction of the signals: Analog-to-digital (A/D) and Digital-to-Analog

converter (D/A), vs Continuous Analog Signal

Page 15: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

Speech Communication Chain

April 19, 2023 Veton Këpuska 16

Page 16: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 17

Discrete-Time Speech Signal Processing

The topic of this course can be loosely defined as:

The manipulation of sampled speech signals by a digital processor to obtain a new signal with some desired properties.

Example: Changing a speaker’s rate of articulation with the use of digital computer. Modification of articulation rate (referred to as time-

scale modification of speech) has as an objective to: generate a new speech waveform that corresponds to a

person talking faster or slower than the original rate, maintain the character of the speaker’s voice (i.e., there

should be little change in the pitch and spectrum of the original utterance).

Page 17: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 18

Time-Scale Modification Example

Useful applications: Fast Scanning of a

long recording in a message playback system.

Slowing Down difficult to understand speech.

Page 18: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 19

The Speech-Communication Pathway

The linguistic level of communication: Idea is first formed in the mind of the speaker. This idea is then transformed to words, phrases, and

sentences according to the grammatical rules of the language.

The physiological level of communication: The brain creates electric signals that move along the

motor nerves these electric signals activate muscles in the vocal tract and vocal cords.

The acoustic level in speech communication pathway: This vocal tract and vocal cord movement results in

pressure changes within the vocal tract, and, in particular, at the lips initiating a sound wave that propagates in space.

Pressure changes at the ear canal cause vibrations at the ear drum of the listener.

Page 19: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 20

The Speech-Communication Pathway

The physiological level of communication pathway: Eardrum vibrations induce electric signals that move

along the sensory nerves to the brain. The linguistic level of the listener:

The brain performs speech recognition and understanding.

The linguistic and physiological activity of: The speaker => “transmitter” The listener => “receiver”

Page 20: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 21

The Speech-Communication Pathway

Transmitter and Receiver of the speech-communication system have other functions besides basic communication: Monitoring and correction of one’s own speech via

the feedback through speakers ear (importance of the feedback studied in the speech of the deaf). Control of articulation rate Adaptation of speech production to mimic voices, etc.

Receiver performs voice recognition: Robust to noise and other interferences Able to focus on a single low-volume speaker in a room

full with louder interfering multiple speakers (cocktail party effect).

Page 21: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 22

The Speech-Communication Pathway

Significant advances in reproducing parts of this communication system by synthetic means.

Far from emulating the human communication system.

Page 22: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 23

Analysis/Synthesis Based on Speech Production and Perception

This class does not cover entire speech communication pathway: Signal measurements of the acoustic waveform.

From these measurements and current understanding of speech of how the vocal tract and vocal cords produce sound waves production models are build.

Starting from analog representations which are then transformed to discrete-time representations (A/D).

From the receiver side the signal processing of the ear and higher auditory levels are covered however to significantly lesser extend only to account for the effect of speech processing on perception.

Page 23: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 24

Analysis/Synthesis Based on Speech Production and Perception Preview of a speech model.

Figure in the next slide shows a model of vowel production.

In a vowel production: Air is forced from the lungs (by contraction of the muscles

around the lung cavity). Air then flows pass the vocal cord/folds (two masses of

flesh) causing periodic vibration of the cords. The rate of vibration of the cords determines the pitch of

the sound. Periodic puffs caused by vibration of cords act as an

excitation input, or source, to the vocal tract. The vocal tract is the cavity between the vocal cords and

the lips: Vocal tract acts as a resonator that spectrally shapes the

periodic input (much like the cavity of the musical wind instrument).

Page 24: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 25

Analysis/Synthesis Based on Speech Production and Perception

Page 25: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 26

Analysis/Synthesis Based on Speech Production and Perception

From this basic understanding of the speech production mechanism a simple engineering model can be build, referred to source/filter model.

In this model the following is assumed: Vocal tract is a liner time-invariant system (or

filter), This linear time-invariant system is driven by a

periodic impulse-like input. Those assumptions imply that the output at the

lips that is itself periodic.

Page 26: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 27

Analysis/Synthesis Based on Speech Production and Perception Example of a vowel is “a” as in the word “father”. Vowel “a” is one of many basic sounds of a language -

called phonemes. For each phoneme a different production model is built. Typical speech utterance consists of a string of vowel and

consonant phonemes whose temporal and spectral characteristics change with time.

This change corresponds to the changes in excitation source and vocal tract system.

This fact implies: A time-varying source and system, Furthermore, in realty there is a complex non-linear interaction

of both. Therefore, even though a simple linear time-invariant model

seems plausible, it does not always represent well the real system.

Page 27: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 28

Analysis/Synthesis Based on Speech Production and Perception Using discrete-time modeling of speech

production the course will cover the design of speech analysis/synthesis systems as depicted in Figure 1.3.

Analysis part extracts underlying parameters of time-varying model from speech waveform.

Synthesis takes extracted parameters and models to put back together the speech waveform.

An objective in this development is to achieve an identity system for which the output equals to input when no manipulation is performed.

A number of other analysis/synthesis methods can be derived based on various useful mathematical representations in time or frequency.

These analysis/synthesis methods are the backbone for applications that transform the speech waveform into some desirable form.

Page 28: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 29

Applications (Project Areas) Speech Modification:

time-scale manipulations: Fitting the speech waveform -

In Radio and TV commercials into an allocated time slot and the synchronization of audio and video presentation.

Speeding up speech – Message playback Voice mail Reading machines and books for the blind

Slowing down speech – Learning a foreign language

Voice transformations using Pitch and spectral changes of speech signal: Voice disguise Entertainment Speech synthesis

Spectral change of frequency compression and expansion: may be useful in transforming speech as an aid to the partially deaf.

Many methods can be applied to music and special effects.

Page 29: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 30

Applications (Project Areas) Speech Coding

Goal is to reduce the information rate measured in bits per second while maintaining the quality of the original waveform. Waveform coders:

Represent the speech waveform directly and do not rely on a speech production model.

Operate in a high range of 16-64 kbps Vocoders:

Largely are speech model-based and rely on a small set of model parameters.

Operate at the low bit range of 1.2-4.8 kbps Lower quality then waveform coders.

Hybrid coders: Partly waveform based and partly speech model-based Operate in the 4.8 – 16 kbps range

Page 30: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 31

Applications (Project Areas)

Applications of speech coders include: Digital telephony over constrained

bandwidth channels Cellular Satellite Voice over IP (Internet) Video phones Storage of Voice messages for computer voice

mail applications.

Page 31: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 32

Applications (Project Areas) Speech Enhancement

Goal is to improve the quality of degraded speech. Preprocess speech before is degraded:

Increasing the broadcast range of transmitters constrained by a peak power transmission limits (e.g., AM radio and TV transmissions).

Enhancing the speech waveform after it is degraded. Reduction of additive noise in

(Digital) telephony Vehicle and aircraft communications

Reduction of interfering backgrounds and speakers for the hearing impaired,

Removal of unwanted convolutional channel distortion and reverberation

Restoration of old phonograph recordings degraded by: Acoustic horns Impulse-like scratches from age and wear

Page 32: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

April 19, 2023 Veton Këpuska 33

Applications (Project Areas) Speaker Recognition

Speech signal processing exploits the variability of speech model parameters across speakers. Verifying a person’s identity (Biometrics) Voice identification in forensic investigation.

Understanding of the speech model features that cue a person’s identity is also important in speech modification where model parameters can be transformed for the study of specific voice characteristics: Speech modification and speaker recognition can be

developed synergistically.

Speech (Voice) Recognition is covered in ECE 5526

Page 33: Speech Processing Introduction. 4 September 2015Veton Këpuska2 Syllabus  ECE 5525 Speech Processing  Contact Info: Këpuska, Veton Olin Engineering Building,

End