Speech Recognition Principles

Posted on 24-Dec-2021


Speech Recognition Concepts

[Figure: speech synthesis (Text to Speech) and speech recognition (Speech to Text), each built from NLP and speech-processing stages with a phone sequence as the intermediate representation; speech understanding is also shown.]

Speech recognition is the inverse of speech synthesis.

Speech Recognition Approaches

Bottom-Up Approach

Top-Down Approach

Blackboard Approach

Bottom-Up Approach

[Figure: bottom-up recognition. The speech signal passes through successive stages of signal processing, feature extraction and segmentation (beginning with a voiced/unvoiced/silence decision) up to the recognized utterance; the stages draw on knowledge sources: sound classification rules, phonotactic rules, lexical access and a language model.]
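The first bottom-up stage labels stretches of the signal as voiced, unvoiced or silence. The sketch below uses a common textbook heuristic that the slide does not spell out, so treat it as an assumption: short-time energy separates silence from speech, and zero-crossing rate separates noise-like (unvoiced) frames from periodic (voiced) ones. The function name, frame length and thresholds are hypothetical.

```python
import numpy as np

def classify_frames(signal, frame_len=256, energy_thresh=0.01, zcr_thresh=0.25):
    """Label each non-overlapping frame as 'voiced', 'unvoiced' or 'silence'.

    Hypothetical heuristic: low mean energy -> silence; otherwise a high
    zero-crossing rate (noise-like frame) -> unvoiced, a low one -> voiced.
    All thresholds are illustrative assumptions and need tuning on real data.
    """
    signal = np.asarray(signal, dtype=float)
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = np.mean(frame ** 2)                   # short-time energy
        zcr = np.mean(np.diff(np.sign(frame)) != 0)    # fraction of sign changes
        if energy < energy_thresh:
            labels.append("silence")
        elif zcr > zcr_thresh:
            labels.append("unvoiced")
        else:
            labels.append("voiced")
    return labels

fs = 8000                                              # assumed sampling rate (Hz)
t = np.arange(256) / fs
voiced = 0.5 * np.sin(2 * np.pi * 150 * t)             # periodic, few zero crossings
unvoiced = 0.3 * np.random.default_rng(0).standard_normal(256)   # noise-like
print(classify_frames(np.concatenate([np.zeros(256), voiced, unvoiced])))
# ['silence', 'voiced', 'unvoiced']
```

Real systems usually add overlapping windows and further features (for example a pitch or periodicity measure); the two-feature rule is only meant to show the kind of decision this stage makes.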

Top-Down Approach

[Figure: top-down recognition. Feature analysis feeds a unit matching system (using an inventory of speech recognition units); lexical, syntactic and semantic hypotheses, constrained by a word dictionary, a grammar and a task model, are checked by an utterance verifier/matcher to produce the recognized utterance.]

Blackboard Approach

[Figure: blackboard approach. Environmental, acoustic, lexical, syntactic and semantic processes all communicate through a shared blackboard.]
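In the blackboard approach, independent processes share one data store instead of being wired into a fixed pipeline. The toy sketch below illustrates that control idea only: the lexicon, grammar, key names and the three process functions are hypothetical, environmental and semantic processes are omitted, and a real blackboard system would keep competing, scored hypotheses rather than single answers.

```python
# Hypothetical toy knowledge sources; real systems use far richer models.
LEXICON = {"k ae t": "cat", "s ae t": "sat"}   # phone string -> word
GRAMMAR = {("cat", "sat")}                      # accepted word sequences

def acoustic_process(bb):
    """Group frame-level phone labels into one phone string per word ('|' = pause)."""
    if "frames" in bb and "phones" not in bb:
        bb["phones"] = [seg.strip() for seg in " ".join(bb["frames"]).split("|")]
        return True
    return False

def lexical_process(bb):
    """Look each phone string up in the lexicon."""
    if "phones" in bb and "words" not in bb:
        bb["words"] = [LEXICON.get(p, "<unk>") for p in bb["phones"]]
        return True
    return False

def syntactic_process(bb):
    """Check the word hypothesis against the grammar."""
    if "words" in bb and "grammatical" not in bb:
        bb["grammatical"] = tuple(bb["words"]) in GRAMMAR
        return True
    return False

def run_blackboard(bb, processes):
    """Keep offering the shared blackboard to every process until none can add anything."""
    progress = True
    while progress:
        progress = False
        for process in processes:
            if process(bb):
                progress = True
    return bb

# Processes are listed "out of order" on purpose: the contents of the
# blackboard, not a fixed pipeline, decide when each one can contribute.
result = run_blackboard({"frames": ["k", "ae", "t", "|", "s", "ae", "t"]},
                        [syntactic_process, lexical_process, acoustic_process])
print(result["words"], result["grammatical"])  # ['cat', 'sat'] True
```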

An overall view of a speech recognition system

[Figure: bottom-up and top-down processing in a speech recognition system. From Ladefoged 2001.]

Recognition Theories

Articulatory-Based Recognition

◦ Uses the articulatory system for recognition

◦ This theory has been the most successful so far

Auditory-Based Recognition

◦ Uses the auditory system for recognition

Hybrid-Based Recognition

◦ A hybrid of the above theories

Motor Theory

◦ Models the intended gestures of the speaker

Recognition Problem

We have a sequence of acoustic symbols and want to find the words spoken by the speaker.

Solution: find the most probable word sequence given the acoustic symbols.

Recognition Problem

A: acoustic symbols

W: word sequence

We should find the word sequence ŵ such that

$$P(\hat{w} \mid A) = \max_{w} P(w \mid A)$$

Bayes' Rule

$$P(x \mid y)\,P(y) = P(x, y)$$

$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$

$$P(w \mid A) = \frac{P(A \mid w)\,P(w)}{P(A)}$$

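Bayes' Rule (Cont’d)

$$P(\hat{w} \mid A) = \max_{w} P(w \mid A) = \max_{w} \frac{P(A \mid w)\,P(w)}{P(A)}$$

Since P(A) does not depend on w, it can be dropped from the maximization:

$$\hat{w} = \arg\max_{w} P(w \mid A) = \arg\max_{w} P(A \mid w)\,P(w)$$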
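The decision rule ŵ = argmax over w of P(A|w)P(w) can be illustrated with a toy decoder. This is a minimal sketch, not a recognizer: the two candidate sentences and their probabilities are invented for illustration, and summing log probabilities instead of multiplying (an implementation choice, not from the slides) avoids numerical underflow on long utterances.

```python
import math

def decode(candidates):
    """Pick the hypothesis w maximizing log P(A|w) + log P(w).

    `candidates` maps each word-sequence hypothesis to (P(A|w), P(w)).
    P(A) is the same for every w, so it is omitted.
    """
    return max(candidates,
               key=lambda w: math.log(candidates[w][0]) + math.log(candidates[w][1]))

# Hypothetical scores for two hypotheses of the same acoustic input A.
candidates = {
    "recognize speech":   (0.002, 0.01),     # (acoustic P(A|w), language P(w))
    "wreck a nice beach": (0.004, 0.0001),
}
print(decode(candidates))  # recognize speech
```

The acoustic score alone would favour "wreck a nice beach"; the language-model term P(w) reverses the decision, which is why both factors appear in the rule.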

Simple Language Model

$$w = w_1 w_2 w_3 \ldots w_n$$

$$P(w) = \prod_{i=1}^{n} P(w_i \mid w_1 w_2 \ldots w_{i-1})$$

Computing this probability directly is very difficult and requires a very large database, so trigram and bigram models are used instead.

Simple Language Model (Cont’d)

Trigram:

$$P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-2} w_{i-1})$$

Bigram:

$$P(w) = \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$

Monogram (unigram):

$$P(w) = \prod_{i=1}^{n} P(w_i)$$
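A bigram model can be estimated from counts and then used in the product formula for P(w). This is a minimal sketch under stated assumptions: the three-sentence corpus is made up, whitespace tokenization is assumed, and "<s>" is an assumed sentence-start token so that the first word also has a history.

```python
from collections import Counter

def train_bigram(corpus):
    """Maximum-likelihood estimates P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1})."""
    history, pairs = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.split()
        history.update(words[:-1])                 # words that have a successor
        pairs.update(zip(words[:-1], words[1:]))
    return lambda prev, word: pairs[(prev, word)] / history[prev] if history[prev] else 0.0

def sentence_prob(prob, sentence):
    """Bigram approximation: P(w) = prod_i P(w_i | w_{i-1})."""
    words = ["<s>"] + sentence.split()
    p = 1.0
    for prev, word in zip(words[:-1], words[1:]):
        p *= prob(prev, word)
    return p

prob = train_bigram(["the cat sat", "the cat ran", "the dog sat"])
print(sentence_prob(prob, "the cat sat"))   # 1 * (2/3) * (1/2), about 0.333
print(sentence_prob(prob, "the dog ran"))   # 0.0: "dog ran" never occurs
```

The zero for an unseen pair shows why raw relative frequencies are usually combined with lower-order estimates.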

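Simple Language Model (Cont’d)

Estimating $P(w_3 \mid w_1 w_2)$

Computing method:

$$P(w_3 \mid w_1 w_2) = \frac{\text{number of times } w_3 \text{ occurs after } w_1 w_2}{\text{total number of occurrences of } w_1 w_2}$$

Ad hoc method:

$$P(w_3 \mid w_1 w_2) = \lambda_1 f(w_3 \mid w_1 w_2) + \lambda_2 f(w_3 \mid w_2) + \lambda_3 f(w_3)$$

where f denotes a relative frequency and the weights satisfy $\lambda_1 + \lambda_2 + \lambda_3 = 1$.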
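The ad hoc method is a linear interpolation of trigram, bigram and unigram relative frequencies. The sketch below computes it from a token list; the weights (0.6, 0.3, 0.1) and the toy sentence are hypothetical, and in practice the weights would be tuned on held-out data.

```python
from collections import Counter

def interpolated_trigram(tokens, lambdas=(0.6, 0.3, 0.1)):
    """Return P(w3 | w1 w2) = l1*f(w3|w1 w2) + l2*f(w3|w2) + l3*f(w3).

    f(.) are relative frequencies counted in `tokens`; the weights are
    illustrative assumptions and must sum to 1.
    """
    l1, l2, l3 = lambdas
    if abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    total = len(tokens)

    def prob(w1, w2, w3):
        # "Computing method": count(w1 w2 w3) / count(w1 w2)
        f_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
        f_bi = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
        f_uni = uni[w3] / total
        return l1 * f_tri + l2 * f_bi + l3 * f_uni
    return prob

p = interpolated_trigram("the cat sat on the mat the cat ran".split())
print(p("the", "cat", "sat"))   # 0.6*(1/2) + 0.3*(1/2) + 0.1*(1/9)
print(p("the", "cat", "on"))    # unseen trigram and bigram, still > 0 via f(w3)
```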

[Figure: from Ladefoged 2001.]

P(A|W) Computing Approaches

Dynamic Time Warping (DTW)

Hidden Markov Model (HMM)

Artificial Neural Network (ANN)

Hybrid Systems

Dynamic Time Warping Method (DTW)

To obtain a global distance between two speech patterns, a time alignment must be performed.

Ex.: a time alignment path between a template pattern “SPEECH” and a noisy input “SsPEEhH”.
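The slide's example can be reproduced with the classic DTW recursion. This minimal sketch assumes the symmetric step pattern (horizontal, vertical, diagonal) with no slope or window constraints, and a hypothetical letter-level local distance (0 if two letters match ignoring case, else 1); in a real recognizer the sequence elements would be acoustic feature vectors and the local distance something like a Euclidean distance.

```python
def dtw_distance(template, test, dist):
    """Global DTW distance with steps (i-1, j), (i, j-1) and (i-1, j-1).

    D[i][j] = dist(template[i-1], test[j-1]) + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    No slope or window constraints are applied (a simplifying assumption).
    """
    n, m = len(template), len(test)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(template[i - 1], test[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Toy local distance (assumption): letters match if equal ignoring case.
def letter_dist(a, b):
    return 0.0 if a.upper() == b.upper() else 1.0

print(dtw_distance("SPEECH", "SsPEEhH", letter_dist))  # 1.0: only "C" has no counterpart
```

The warping absorbs the repeated "s" and "h" at zero cost; the single unit of distance comes from the template's "C", which has no matching frame in the noisy input.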

Recognition Tasks

Isolated Word Recognition (IWR) and Continuous Speech Recognition (CSR)

Speaker-Dependent and Speaker-Independent

Vocabulary Size

◦ Small: <20 words

◦ Medium: >100, <1000 words

◦ Large: >1000, <10000 words

◦ Very large: >10000 words

Error-Producing Factors

Prosody (recognition should be prosody-independent)

Noise (noise should be prevented)

Spontaneous speech

Artificial Neural Network

[Figure: inputs x_0, x_1, …, x_{N-1}, weighted by w_0, w_1, …, w_{N-1}, feed a single unit with output y.]

$$y = f\left(\sum_{i=0}^{N-1} w_i x_i\right)$$

Simple Computation Element of a Neural Network
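The computation element forms a weighted sum of its inputs and passes it through a nonlinearity f. The slide does not say which f is used, so this sketch assumes a sigmoid by default and also shows a hard limiter; the function names are hypothetical.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def computation_element(x, w, f=sigmoid):
    """y = f(sum_{i=0}^{N-1} w_i * x_i) for inputs x_0..x_{N-1} and weights w_0..w_{N-1}.

    The choice of nonlinearity f is an assumption; a sigmoid is used by default.
    """
    if len(x) != len(w):
        raise ValueError("x and w must both have length N")
    return f(sum(wi * xi for wi, xi in zip(w, x)))

print(computation_element([1.0, 2.0], [0.5, -0.25]))          # weighted sum 0 -> 0.5
print(computation_element([1.0, 1.0], [1.0, 1.0],
                          f=lambda s: 1.0 if s > 0 else 0.0))  # hard limiter -> 1.0
```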

Artificial Neural Network (Cont’d)

Neural Network Types

◦ Perceptron

◦ Time Delay

◦ Time Delay Neural Network Computational Element (TDNN)

Artificial Neural Network (Cont’d)

[Figure: Single Layer Perceptron, with inputs x_0 … x_{N-1} and outputs y_0 … y_{M-1}.]

Artificial Neural Network (Cont’d)

[Figure: Three Layer Perceptron.]
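A multilayer perceptron applies layers of computation elements one after another. This sketch assumes sigmoid units, random weights and hypothetical layer sizes; a single (W, b) pair gives a single-layer perceptron with N inputs and M outputs, and three pairs give three layers of weights (conventions differ on whether the input layer is counted as a layer).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_forward(x, layers):
    """Forward pass: each (W, b) layer computes sigmoid(W @ a + b).

    The sigmoid nonlinearity and the weights used below are assumptions.
    """
    a = np.asarray(x, dtype=float)
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
N, H1, H2, M = 4, 5, 5, 3                      # hypothetical layer sizes
layers = [(rng.standard_normal((H1, N)), np.zeros(H1)),
          (rng.standard_normal((H2, H1)), np.zeros(H2)),
          (rng.standard_normal((M, H2)), np.zeros(M))]
y = perceptron_forward(np.ones(N), layers)
print(y.shape)                                 # (3,)
```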

Hybrid Methods

Hybrid Neural Network and Matched Filter for Recognition

[Figure: speech is converted to acoustic features, passed through delays into a pattern classifier, which drives the output units.]
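One plausible reading of the "Delays" block, in the spirit of time-delay networks, is a tapped delay line: each frame's acoustic feature vector is concatenated with those of the preceding frames before it reaches the pattern classifier. This is an interpretation, not something the slide states; zero-padding at the start and the function name are assumptions.

```python
import numpy as np

def delay_stack(features, n_delays):
    """Tapped delay line: row t becomes [x_t, x_{t-1}, ..., x_{t-n_delays}].

    features: array of shape (T, F), one acoustic feature vector per frame.
    Frames before the start are zero-padded (an assumption).
    Returns an array of shape (T, F * (n_delays + 1)).
    """
    features = np.asarray(features, dtype=float)
    T, F = features.shape
    padded = np.vstack([np.zeros((n_delays, F)), features])
    # Slice d is the feature sequence shifted d frames into the past.
    return np.hstack([padded[n_delays - d : n_delays - d + T] for d in range(n_delays + 1)])

frames = np.arange(6).reshape(3, 2)   # 3 frames, 2 features each
print(delay_stack(frames, 1))         # each row holds [x_t, x_{t-1}]
```

The stacked rows can then be fed to any frame-level classifier, which sees a short window of context instead of a single frame.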

Neural Network Properties

The system is simple, but training is highly iterative

Does not determine a specific structure

Despite its simplicity, the results are good

The training set is large, so training should be done offline

Accuracy is relatively good

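Hidden Markov Model

Observations: O1, O2, ...

States in time: q1, q2, ...

All states: s1, s2, ...

$$O_1, O_2, O_3, \ldots, O_t$$

$$q_1, q_2, q_3, \ldots, q_t$$

[Figure: two states S_i and S_j connected by transitions a_{ij} and a_{ji}, where a_{ij} = P(q_{t+1} = S_j | q_t = S_i) is the probability of moving from state S_i to state S_j.]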
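The transition probabilities a_ij make the state sequence q_1, q_2, … a first-order Markov chain. The sketch below computes the probability of one state path; the two-state matrix is hypothetical, the initial distribution pi is an assumption because this section does not define one, and observation probabilities (which a full HMM also needs) are not modelled.

```python
import numpy as np

def state_sequence_prob(pi, A, q):
    """P(q_1, ..., q_T) = pi[q_1] * prod_t A[q_t, q_{t+1}].

    A[i, j] = a_ij = P(q_{t+1} = s_j | q_t = s_i); each row of A sums to 1.
    pi[i] = P(q_1 = s_i) is an assumed initial-state distribution.
    """
    pi, A = np.asarray(pi, dtype=float), np.asarray(A, dtype=float)
    if not np.allclose(A.sum(axis=1), 1.0):
        raise ValueError("rows of A must sum to 1")
    p = pi[q[0]]
    for i, j in zip(q[:-1], q[1:]):
        p *= A[i, j]
    return float(p)

A = [[0.7, 0.3],     # hypothetical two-state transition matrix
     [0.4, 0.6]]
print(state_sequence_prob([1.0, 0.0], A, [0, 0, 1, 1]))  # 1.0 * 0.7 * 0.3 * 0.6
```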