what does speech “look” like

What does speech “look” like to an Automatic Speech Recognition System? Jiang Wu [email protected] Electrical Engineering Department

Post on 23-Feb-2016


TRANSCRIPT

Page 1: What does speech “look” like

What does speech “look” like

to an Automatic Speech Recognition System?

Jiang Wu [email protected] Electrical Engineering Department

Page 2: What does speech “look” like

What are “speech features”?

Suppose you are allowed only 39 values to represent a 1-second speech segment…

What are good features?

◦ Discriminative

◦ Compact enough to avoid the “Curse of Dimensionality”
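To get a feel for how tight a 39-value budget is, the short sketch below compares it with the number of raw waveform samples in one second. The 16 kHz sampling rate is an assumption, not something the slide states; it is chosen only because the spectrograms on a later slide run from 0 to 8000 Hz.

```python
# Assumption (not stated on the slide): 16 kHz sampling, which matches
# the 0-8000 Hz frequency axis of the spectrograms on a later slide.
SAMPLE_RATE_HZ = 16_000
SEGMENT_SEC = 1.0
N_FEATURES = 39  # the budget given on the slide

n_samples = int(SAMPLE_RATE_HZ * SEGMENT_SEC)  # raw waveform values in 1 s
reduction = n_samples / N_FEATURES             # raw samples per feature value

print(f"{n_samples} samples -> {N_FEATURES} features "
      f"(about {reduction:.0f}x fewer numbers)")
```

Under that assumption, each feature value has to stand in for several hundred raw samples, which is why the features must be chosen carefully.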

Page 3: What does speech “look” like

How do we extract features from speech?

From both time and frequency domain…

Page 4: What does speech “look” like

Frequency Domain Features : “Static” Features

For each frame of pre-recorded speech, we extract features that compress its spectrum.

[Figure: basis vectors BV0, BV1, BV2; x-axis: Frequency [kHz], 0 to 8; y-axis: Amplitude]
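One way to compress a frame's spectrum is to project it onto a few smooth basis vectors, so that each projection becomes one "static" feature. The sketch below assumes the basis vectors are plain cosines over frequency (a DCT-II style basis); the slides do not define BV0, BV1, BV2 exactly, so `cosine_basis` and `static_features` are illustrative names rather than the lab's actual code.

```python
import math

def cosine_basis(n_points, n_vectors):
    """Return n_vectors cosine basis vectors, each sampled at n_points bins.

    Row k plays the role of "BVk" (assumed shape, see the note above).
    """
    return [
        [math.cos(math.pi * k * (i + 0.5) / n_points) for i in range(n_points)]
        for k in range(n_vectors)
    ]

def static_features(log_spectrum, n_vectors=3):
    """Project one frame's log spectrum onto the first n_vectors basis vectors."""
    n = len(log_spectrum)
    basis = cosine_basis(n, n_vectors)
    return [sum(b * s for b, s in zip(row, log_spectrum)) / n for row in basis]

# Toy "spectrum": a smooth curve with one broad peak (not real speech data).
toy_spectrum = [math.exp(-((i - 20) / 10.0) ** 2) for i in range(64)]
print(static_features(toy_spectrum, n_vectors=3))
```

With this scaling, the first feature is the average level of the spectrum, and the following ones describe progressively finer spectral shape.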

Page 5: What does speech “look” like

[Figure: basis vectors BV0, BV1, BV2; x-axis: Time [ms], -60 to 60; y-axis: Amplitude]

Time Domain Features : Trajectories of Static Spectral Features over Time

Recent research has shown that spectral trajectories over time also play an important role in ASR.

Thus, we also want to let computers see what happens over time, around the center of each static feature.
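A trajectory feature can be sketched the same way as a static one, only along the time axis: take a window of frames centered on the current frame and project each static feature's values onto a few cosine terms. The cosine shape, the window length, and the name `trajectory_features` are assumptions made for illustration; the slides do not give these details.

```python
import math

def trajectory_features(frames, center, half_width, n_terms=3):
    """Summarize how each static feature moves over a window of frames.

    frames:     list of per-frame static feature vectors (equal lengths)
    center:     index of the frame the window is centered on
    half_width: number of frames taken on each side (clipped at the edges)
    Returns n_terms coefficients per static feature, feature by feature.
    """
    lo = max(0, center - half_width)
    hi = min(len(frames), center + half_width + 1)
    window = frames[lo:hi]
    n = len(window)
    out = []
    for d in range(len(frames[0])):
        track = [frame[d] for frame in window]  # one feature over time
        for k in range(n_terms):
            out.append(
                sum(track[t] * math.cos(math.pi * k * (t + 0.5) / n)
                    for t in range(n)) / n
            )
    return out

# Toy input: feature 0 rises steadily, feature 1 stays constant.
toy_frames = [[float(t), 2.0] for t in range(11)]
print(trajectory_features(toy_frames, center=5, half_width=5))
```

In the toy input, the constant feature yields only an average term, while the rising feature also yields a non-zero slope-like term, which is the kind of information a purely static feature cannot carry.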


Page 6: What does speech “look” like

So finally, to the ASR system, the speech features look like…

[Figure: two panels, “Original Spectrogram” and “Rebuilt Spectrogram”; x-axis: Time (Sec), 0.5 to 1.5; y-axis: Frequency (Hz), 0 to 8000]
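The "rebuilt" spectrogram can be understood as what is left of the spectrum after it has passed through the features. The sketch below shows the idea for a single frame, assuming cosine basis vectors (an assumption, since the slides do not specify the basis): analyze the spectrum into a few coefficients, then rebuild it from those coefficients alone. `analyze` and `rebuild` are illustrative names.

```python
import math

def analyze(spectrum, n_coeffs):
    """Compress a spectrum into n_coeffs cosine coefficients."""
    n = len(spectrum)
    return [
        sum(s * math.cos(math.pi * k * (i + 0.5) / n)
            for i, s in enumerate(spectrum)) / n
        for k in range(n_coeffs)
    ]

def rebuild(coeffs, n_points):
    """Rebuild an n_points spectrum from the cosine coefficients."""
    return [
        coeffs[0] + 2.0 * sum(
            c * math.cos(math.pi * k * (i + 0.5) / n_points)
            for k, c in enumerate(coeffs) if k > 0
        )
        for i in range(n_points)
    ]

def max_error(a, b):
    """Largest absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))

# Toy spectrum with one broad peak (not real speech data).
toy = [math.exp(-((i - 10) / 5.0) ** 2) for i in range(32)]
for n_coeffs in (2, 8, 32):
    approx = rebuild(analyze(toy, n_coeffs), len(toy))
    print(n_coeffs, max_error(toy, approx))
```

Using all 32 coefficients reproduces the toy spectrum; using fewer keeps only its smooth outline, which is the trade-off a rebuilt spectrogram makes visible.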

Page 7: What does speech “look” like

Other Features for Future Study…

Pitch Contour:

◦ Strongly related to “tones”

◦ Very popular feature type for tonal languages: Mandarin, Cantonese, some Korean dialects, etc.

“Perceptual” Features:

◦ Analyze the speech signal according to how the human auditory system “perceptually” processes sound

◦ Frequency resolution and time resolution both depend on frequency and time
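As an illustration of frequency-dependent resolution, the sketch below uses the mel scale, one widely used perceptual frequency scale. The slides do not say which perceptual scale is meant, so this is an example of the idea only; `band_edges_hz` is a hypothetical helper name.

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def band_edges_hz(f_lo, f_hi, n_bands):
    """Edges (in Hz) of n_bands bands that are equally wide in mel."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / n_bands
    return [mel_to_hz(m_lo + step * i) for i in range(n_bands + 1)]

edges = band_edges_hz(0.0, 8000.0, 8)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
for lo, hi in zip(edges, edges[1:]):
    print(f"{lo:7.1f} Hz to {hi:7.1f} Hz")
```

The bands are narrow at low frequencies and wide at high frequencies, so low frequencies are resolved more finely, roughly as in human hearing.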

[Figure: three panels. “Speech waveform”: Amplitude vs. Time (seconds); “Pitch”: Frequency (Hz), 0 to 300, vs. Time (seconds); “Spectrogram”: Frequency (Hz), 0 to 8000, vs. Time]
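A pitch contour like the one in the figure is a sequence of per-frame pitch estimates. The sketch below estimates the pitch of a single frame with plain autocorrelation, a simple textbook method; the slides do not say which pitch tracker was used, and the search range of 60 to 400 Hz is an assumption.

```python
import math

def estimate_pitch_hz(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate pitch from the lag with the strongest autocorrelation.

    Returns 0.0 when no positive correlation peak is found (e.g. silence).
    """
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        score = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0

# Toy input: 50 ms of a pure 200 Hz tone sampled at 8 kHz (not real speech).
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(400)]
print(estimate_pitch_hz(tone, SAMPLE_RATE))
```

Running this on successive frames of an utterance and plotting the results over time gives a pitch contour; real speech usually needs extra steps such as voicing detection and smoothing, which this sketch omits.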

Page 8: What does speech “look” like

Our Speech Lab

Dr. Montri Karnjanadecha (Forced Alignment)

Chandra Vootkuri (Ph.D.) (Landmark Theory)

Brian Wong (M.S.) (Freq. Non-linearity)

Andrew Hwang (M.S.) (Feature Transform)

Our Current Projects

Project 1: To create an open source multi-language audio database for spoken language processing applications.

Project 2: To understand tonal languages.

Page 9: What does speech “look” like

Background you need…

◦ Signal processing (A/D)

◦ Probability theory, pattern recognition and machine learning

◦ An understanding of the human auditory system, linguistics, or music will be a bonus!

Page 10: What does speech “look” like

Speech Research is interesting!

◦ Pronunciation therapy

◦ Singing voice processing

◦ Hearing aids

Tons of interesting applications!!

Page 11: What does speech “look” like

Talk to a public computer in your real life…

◦ Not just the Microsoft speech-text software on your PC..

Page 12: What does speech “look” like

Thank you! Questions?