03/04/2005

ENEE408G Spring 2005 Multimedia Signal Processing 1

ENEE408G: Capstone Design Project:

Multimedia Signal Processing

Design Project 3: Digital Speech Processing


Outline of Design Project 3

Part I: Speech Analysis
Part II: Speech Coding: Linear Predictive Vocoder
Part III: Speech Recognition by IBM ViaVoice
Part IV: Speech Synthesis
Part V: Human Computer Interface
Part VI: Mobile Computing and Pocket PC Programming


Adjust the Microphone Device

Use Sound Recorder
By Accessories → Entertainment → Sound Recorder

Select Line-In 2/Mic 2
By Edit → Audio Properties → Sound Recording → Volume


Part I. Speech Analysis (1)

• Human Vocal Apparatus


Part I. Speech Analysis (2)

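• Vocal Tract Model

[Block diagram: an Impulse Train Generator (controlled by the Pitch Period) supplies the voiced excitation and a White Noise source supplies the unvoiced excitation. A Voiced/Unvoiced switch selects one of them, the result is multiplied by the gain G, and the Vocal Tract Model (set by the Vocal Tract Parameters) filters it to produce speech.]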
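The vocal tract model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocal tract is taken to be an all-pole filter, and the sampling rate, pitch period, gain and the single 500 Hz resonance are hypothetical values chosen for the example, not parameters from the course material.

```python
import math
import random

def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Impulse train for voiced frames, white Gaussian noise for unvoiced ones."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def all_pole_filter(x, a, gain=1.0):
    """Vocal tract model: s[n] = gain * x[n] - sum_k a[k-1] * s[n-k]."""
    s = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * s[n - k]
        s.append(acc)
    return s

# Hypothetical vocal tract: one resonance near 500 Hz at fs = 8 kHz.
fs = 8000
r, f_res = 0.95, 500.0
a = [-2.0 * r * math.cos(2.0 * math.pi * f_res / fs), r * r]

# Pitch period of 80 samples at 8 kHz corresponds to a 100 Hz pitch.
voiced = all_pole_filter(excitation(400, True, pitch_period=80), a, gain=0.5)
unvoiced = all_pole_filter(excitation(400, False), a, gain=0.5)
```

Switching the `voiced` flag plays the role of the Voiced/Unvoiced switch in the diagram, and `gain` plays the role of G.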


Part I. Speech Analysis (3)

COLEA toolbox:
• Waveform on Time Domain
• Spectrogram
• Pitch and Formant Tracking
• LPC Spectra

Record your own voice and analyze pitch and formants.
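As a rough idea of what pitch tracking involves, the sketch below estimates the pitch of one frame from the peak of its autocorrelation. It is not how COLEA does it; the function name, the 60 to 400 Hz search range and the synthetic test frame are assumptions made for this example.

```python
def estimate_pitch_hz(frame, fs, fmin=60.0, fmax=400.0):
    """Return a pitch estimate in Hz from the strongest autocorrelation lag."""
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag

# Synthetic frame: a decaying pulse repeated every 64 samples (125 Hz at 8 kHz).
fs = 8000
frame = [0.9 ** (n % 64) for n in range(480)]
pitch = estimate_pitch_hz(frame, fs)
```

On recorded speech the frame would first be checked for voicing, since unvoiced frames have no meaningful autocorrelation peak.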


Part I. Speech Analysis (4)


Part I. Speech Analysis (5)

[Block diagram: Training Set → LPC Analysis by proclpc.m → Features Extraction for Training Set → Gender Identification. Unknown gender wave files → Gender Identification → Male / Female]

Gender Identification: Use Auditory Toolbox to obtain linear predictive coefficients. Design your algorithm to identify the gender of samples in the training set. Test your algorithm on 9/26 with new samples.
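One common way to obtain linear predictive coefficients for a frame is the autocorrelation method with the Levinson-Durbin recursion. The sketch below shows that method in plain Python; it does not reproduce proclpc.m, and the function names and the toy test frame are assumptions for illustration only.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) where the prediction error filter is
    A(z) = 1 + a[0] z^-1 + ... + a[order-1] z^-order
    and err is the residual energy. Assumes r[0] > 0 (non-silent frame).
    """
    a = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - 1 - j] for j in range(len(a)))
        k = -acc / err                      # reflection coefficient
        a = [a[j] + k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= (1.0 - k * k)
    return a, err

def lpc(frame, order):
    return levinson_durbin(autocorrelation(frame, order), order)

# Toy frame: impulse response of 1 / (1 - 0.9 z^-1), so a[0] should be near -0.9.
frame = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc(frame, 2)
```

The coefficients (or quantities derived from them, such as formant positions) could then serve as features for the gender identification algorithm you design.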


Part II. Linear Predictive Vocoder: Encoder

[Block diagram of the encoder: Original Speech → Frame Segmentation & LPC analysis (proclpc.m) → {a_k}, k = 1~10 → LPC to LSP (lpcar2ls.m) → {w_k}, k = 1~10 → Q. Gain, UV/V, T → Q. The quantizer outputs form the 2.4 kbps compressed speech.]
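To make the Q blocks and the 2.4 kbps figure concrete, here is a minimal sketch of a uniform scalar quantizer together with a bit-budget calculation. The slides give neither the frame length nor the bit allocation, so the 22.5 ms frame and the per-parameter bit counts below are hypothetical values chosen so that the total comes to 2.4 kbps.

```python
import math

def quantize(value, lo, hi, bits):
    """Uniform scalar quantizer: map a value in [lo, hi] to an integer index."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)

def dequantize(index, lo, hi, bits):
    """Inverse quantizer (iQ): map an index back to a reconstruction value."""
    levels = (1 << bits) - 1
    return lo + index / levels * (hi - lo)

# Hypothetical bit allocation per frame (not taken from the slides).
LSP_BITS = [5, 5, 5, 4, 4, 4, 4, 4, 3, 3]   # ten LSP frequencies w_1..w_10
GAIN_BITS, PITCH_BITS, VOICING_BITS, SYNC_BITS = 5, 6, 1, 1
FRAME_MS = 22.5                              # assumed frame length

bits_per_frame = sum(LSP_BITS) + GAIN_BITS + PITCH_BITS + VOICING_BITS + SYNC_BITS
bit_rate = bits_per_frame * 1000 / FRAME_MS  # bits per second

# Quantize one LSP frequency (radians, between 0 and pi) with 5 bits.
lsp = 1.234
idx = quantize(lsp, 0.0, math.pi, LSP_BITS[0])
lsp_hat = dequantize(idx, 0.0, math.pi, LSP_BITS[0])
```

With this allocation each frame costs 54 bits. The reconstruction error of the uniform quantizer is at most half a step, which is the trade-off you tune when you choose the number of bits per parameter.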


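Part II. Linear Predictive Vocoder: Decoder

[Block diagram of the decoder: 2.4 kbps compressed speech → iQ → {w'_k}, k = 1~10 → LSP to LPC (lpcls2ar.m) → {a'_k}, k = 1~10. 2.4 kbps compressed speech → iQ → Gain', UV/V, T'. UV/V and T' select the Source (Impulse Train Generator or White Noise). Source, Gain' and {a'_k} → LPC synthesis & Frame combination (synlpc.m) → Reconstructed Speech.]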


Part III. Speech Recognition

IBM ViaVoice
ViaVoice Training: Operate PC by ViaVoice


Part III. IBM ViaVoice Training

Start from the BLUE word.

Keep speaking; the recognized words become GRAY.

If you hear a sound or the BLUE sign stops at a specific word, return to the blue word and read the BLACK sentence again.


Part III. IBM ViaVoice Dictation

Speak Pad

Menu Bar:
1. Menu Button
2. Microphone State
3. Status Area
4. ViaCenter Help
5. Current User


Part IV. Speech Synthesis

Text-To-Speech and Talking Head

Vowel Synthesis

Demo


Part V. Human Computer Interface

CSLU Human Computer Interface
• Rapid Application Developer (RAD)
• Start → Speech Toolkit → RAD

MIT Galaxy System
• JUPITER: Weather Information System
  http://www.sls.lcs.mit.edu/sls/applications/jupiter.shtml
  TEL: 1-888-573-8255
• PEGASUS: Airline Flight Planning System
  http://www.sls.lcs.mit.edu/sls/applications/pegasus.shtml
  TEL: 1-877-527-8255


Part VI. Pocket PC Programming

Apply what you learned in the previous parts to design a simple digital speech processing application using Microsoft eMbedded Tools for Pocket PC.


Announcement

Final Project:
• Return PC camera, Pocket PC
• Check out Pocket PC