
06.09.2005 Prof. Pushpak Bhattacharyya, IIT Bombay. 1

Application of Noisy Channel, Channel Entropy

CS 621 Artificial Intelligence

Lecture 15 - 06/09/05

Prof. Pushpak Bhattacharyya


S = {s1, s2, …, sq}    R = {t1, t2, …, tq}

S → [Noisy Channel] → R

SPEECH RECOGNITION

(ASR – Automatic Speech Recognition)

- Signal processing (low level).

- Cognitive Processing (higher level categories).


Noisy Channel Metaphor

Due to Jelinek (IBM) – 1970’s

Main field of study – speech.

Problem Definition

S = {Speech signals}

= {s1, s2, …, ss}

R = {w1, w2, …, wr}

{s1, s2, …, sp} → {w1, w2, …, wq}


Special and Easier case

Isolated word Recognition (IWR)

Complexity due to ‘Word Boundary’ will not arise.

Example : I got a plate

vs

I got up late


Homophones And Homographs

Homophones: words that have the same pronunciation.

Example: bear, beer

Homographs: words that have the same spelling but different meanings.

Example: bank (river bank vs. financial bank)


World Of Sounds

World of sounds – speech signals: Phonetics, Phonology

World of words – Orthography

Letters: Consonants, Vowels


The alphabet-to-sound mapping is not one to one.

Vowels: Tomato → Tomaeto / Tomaato


Sound Variations

Lexical variations

‘because’ → ‘cause / because

Allophonic variations

‘because’ → because / becase


Allophonic variations: More remarkable example

Do [ δ][U]

Go [G][0]


Socio-cultural variations

something → something (formal) / somethin (informal)

Dialectal variation

very – bheri in Bengal

apple – ieple in the south, eple in the north, aapel in Bengal


Orthography – Phonology mapping

A complex problem: very difficult to model using a ‘rule-governed’ system.


Probabilistic Approach

W* = best estimate for a word given S

S → [Noisy Channel] → W*

W* = ARGMAX [ P(w|s) ], where w belongs to the set of words


P(w|s) is called the ‘parameter’ of the system.

Estimation = Training

The probability values need to be estimated from “SPEECH CORPORA”.

Record speech of many speakers.


Look of Speech Corpora

Annotation – Unique pronunciation.

Signal

Apple


Repository of Standard Sound Symbols

IPA – International Phonetic Alphabet (maintained by the International Phonetic Association).

ARPABET – American phonetic standard (developed under ARPA).


Augment the Roman alphabet with Greek symbols

e [ɛ] ‘ebb’

[i] ‘need’

top [t] IPA

tool [θ] IPA


Speech corpora are annotated with IPA/ARPABET symbols.

Indian Scenario

Hindi – TIFR

Marathi – IITB

Tamil – IITM


How to Estimate P(w|s) from speech corpora

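P(w|s) = count(w, s) / count(s)

Not done this way: an exact speech signal s almost never recurs in a corpus, so these counts are too sparse to be reliable.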


Apply Bayes Theorem

P(w|s) = P(w) . P(s|w) / P(s)

W* = ARGMAX [ P(w) . P(s|w) / P(s) ]

P(s) is the same for every candidate w, so it does not affect the ARGMAX:

W* = ARGMAX [ P(w) . P(s|w) ], where w belongs to the set of words

P(w) = Prior = Language Model.

P(s|w) = Likelihood of w being pronounced as ‘s’ = Acoustic Model.
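
As a minimal sketch of the decision rule above, the following Python picks the word that maximises P(w) . P(s|w) for a single observed signal s. The candidate words and every probability value are made-up illustrative numbers, not estimates from any corpus, and the function name decode is hypothetical.

```python
# Toy noisy-channel decoder: W* = argmax over w of P(w) * P(s|w).
# All numbers below are invented for illustration only.

# P(w): language-model prior for each candidate word (hypothetical values).
prior = {"bear": 0.3, "beer": 0.7}

# P(s|w): acoustic-model likelihood of the one observed signal s
# given each candidate word (hypothetical values).
likelihood = {"bear": 0.6, "beer": 0.2}


def decode(prior, likelihood):
    """Return the best word and the unnormalised score P(w) * P(s|w) per word."""
    scores = {w: prior[w] * likelihood.get(w, 0.0) for w in prior}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = decode(prior, likelihood)

# Dividing every score by P(s) (the sum of the scores) gives P(w|s),
# but it rescales all candidates equally, so the winner is unchanged.
p_s = sum(scores.values())
posterior = {w: sc / p_s for w, sc in scores.items()}

print(best)
```

With these toy numbers the prior favours ‘beer’, but the likelihood favours ‘bear’ strongly enough that ‘bear’ wins, which is the kind of trade-off between the language model and the acoustic model that the ARGMAX resolves.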


Acoustic Model

Pronunciation dictionary (Finite State Automata).

Manually Built - Costly Resource.

Example

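[Figure: finite-state automaton for the pronunciations of ‘tomato’, with a start state s, numbered states, and arcs labelled t, o, m, aa | ae, t, o]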
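
A pronunciation dictionary entry of this kind can be sketched as a small finite-state automaton in Python. The state numbering, the phone symbols and the names TOMATO_FSA, FINAL_STATES and accepts are assumptions made for illustration; a real acoustic model would also attach probabilities to the arcs.

```python
# Hypothetical pronunciation automaton for the word "tomato".
# Each state maps an arc label (a phone symbol) to the next state.
TOMATO_FSA = {
    0: {"t": 1},
    1: {"o": 2},
    2: {"m": 3},
    3: {"aa": 4, "ae": 4},  # two alternative vowels: "tomaato" / "tomaeto"
    4: {"t": 5},
    5: {"o": 6},
}
FINAL_STATES = {6}


def accepts(fsa, finals, phones, start=0):
    """Return True if the phone sequence is a path from start to a final state."""
    state = start
    for phone in phones:
        next_state = fsa.get(state, {}).get(phone)
        if next_state is None:  # no arc with this label: reject
            return False
        state = next_state
    return state in finals


print(accepts(TOMATO_FSA, FINAL_STATES, ["t", "o", "m", "aa", "t", "o"]))
print(accepts(TOMATO_FSA, FINAL_STATES, ["t", "o", "m", "ae", "t", "o"]))
print(accepts(TOMATO_FSA, FINAL_STATES, ["t", "o", "m", "t", "o"]))
```

Both vowel variants are accepted because they share the same source and target state, while a sequence that skips the vowel is rejected.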


W* is obtained from P(w) and P(s|w)

Language model ?

Relative frequency of w in the corpora

Relative frequency ≡ unigram model

P(knee) > P(need)

I _ _ _ _ _

Knee High probability

need Low probability
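
The relative-frequency (unigram) estimate of P(w) can be sketched as follows. The tiny corpus simply reuses the two example sentences from the earlier slide and is far too small to give meaningful probabilities; the function name unigram_model is hypothetical.

```python
from collections import Counter


def unigram_model(tokens):
    """Estimate P(w) as count(w) / total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Toy corpus built from the lecture's example sentences (illustration only).
corpus = "i got a plate i got up late".split()
p = unigram_model(corpus)

for word in sorted(p):
    print(word, p[word])
```

A unigram model ignores the neighbouring words entirely, which is why it cannot use a context such as "I ..." to choose between candidate words.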


Language Modelling by N-grams

N:

2 – bigrams.

3 – trigrams (best empirically for English).
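
For N = 2, a bigram model estimates P(w2|w1) as count(w1, w2) / count(w1). The sketch below assumes an unsmoothed maximum-likelihood estimate and a sentence-start marker "<s>"; both are choices made for this illustration, as are the function name bigram_model and the toy sentences.

```python
from collections import Counter


def bigram_model(sentences):
    """Return a function prob(cur, prev) giving the unsmoothed estimate of P(cur | prev)."""
    context_counts = Counter()  # how often each word occurs as a left context
    pair_counts = Counter()     # how often each (prev, cur) pair occurs
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, cur)] += 1

    def prob(cur, prev):
        if context_counts[prev] == 0:  # unseen context: no estimate available
            return 0.0
        return pair_counts[(prev, cur)] / context_counts[prev]

    return prob


# Toy training data reusing the lecture's example sentences.
prob = bigram_model(["I got a plate", "I got up late"])

print(prob("got", "i"))
print(prob("a", "got"))
print(prob("late", "a"))
```

Unseen pairs get probability zero here; practical systems smooth these estimates, and a trigram model extends the same counting to two words of context.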
