06.09.2005 Prof. Pushpak Bhattacharyya, IIT Bombay. 1
Application of Noisy Channel, Channel Entropy
CS 621 Artificial Intelligence
Lecture 15 - 06/09/05
Prof. Pushpak Bhattacharyya
Noisy Channel
S = {s1, s2 … sq}, R = {t1, t2 … tq}
S → Noisy Channel → R
SPEECH RECOGNITION
(ASR – Automatic Speech Recognition)
- Signal processing (low level).
- Cognitive Processing (higher level categories).
Noisy Channel Metaphor
Due to Jelinek (IBM), 1970s.
Main field of study – speech.
Problem Definition
S = {speech signals} = {s1, s2 … ss}
R = {w1, w2 … wr}
{s1, s2 … sp} → {w1, w2 … wq}
Special and Easier case
Isolated word Recognition (IWR)
Complexity due to ‘Word Boundary’ will not arise.
Example: ‘I got a plate’ vs ‘I got up late’
Homophones And Homographs
Homophones: words that have the same pronunciation.
Example: bear, bare
Homographs: words that have the same spelling but different meanings.
Example: bank (river bank vs. financial bank)
World Of Sounds
World of sounds – speech signals: Phonetics, Phonology
World of words – Orthography
Letters: consonants, vowels
The alphabet-to-sound mapping is not one-to-one, especially for vowels.
Example: ‘Tomato’ is pronounced either ‘Tomaeto’ or ‘Tomaato’.
Sound Variations
Lexical variations
‘because’ → ‘cause, because
Allophonic variations
‘because’ → because, becase
Allophonic variations: a more remarkable example
Do [δ][U]
Go [G][0]
The letter ‘o’ is pronounced differently in the two words.
Socio-cultural variations
‘something’ → something (formal), somethin (informal)
Dialectal variation
very – bheri in Bengal
apple – ieple in the south, eple in the north, aapel in Bengal
Orthography to Phonology mapping: a complex problem.
Very difficult to model using a ‘Rule Governed’ system.
Probabilistic Approach
W* = best estimate for a word given S
S → Noisy Channel (NC) → W*
W* = ARGMAX [ P(w|s) ], where w belongs to the set of words
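A minimal sketch of the ARGMAX step, assuming the values P(w|s) for one observed signal s are already available as a table. The words and numbers below are made up for illustration; `best_word` is a hypothetical helper name.

```python
# Hypothetical table of P(w|s) for ONE observed signal s (illustrative numbers).
posterior = {"plate": 0.5, "late": 0.3, "play": 0.2}


def best_word(posterior):
    """Return W*: the word w with the largest P(w|s)."""
    return max(posterior, key=posterior.get)


w_star = best_word(posterior)
print(w_star)
```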
P(w|s) is called the ‘parameter’ of the system.
Estimation ≡ Training
The probability values need to be estimated from “SPEECH CORPORA”.
Record the speech of many speakers.
Look of Speech Corpora
Annotation – Unique pronunciation.
Signal → Apple
Repository of Standard Sound Symbols
IPA – International Phonetic Alphabet.
ARPABET – American phonetic standard.
Augment the Roman Alphabet with Greek symbols
e: [Є] ‘ebb’, [i] ‘need’
t: top [t] IPA, tool [θ] IPA
Speech corpora are annotated with IPA/ARPABET symbols.
Indian Scenario
Hindi – TIFR
Marathi – IITB
Tamil – IITM
How to Estimate P(w|s) from speech corpora
P(w|s) = count(w,s) / count(s)
Not done this way.
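For reference, the direct count-based estimate can be sketched as below. This assumes signals are reduced to discrete identifiers so they can be counted; the tiny corpus and the name `direct_estimate` are hypothetical. One plausible difficulty (an assumption, not stated on the slide) is that a signal never seen in the corpus gets a zero count.

```python
from collections import Counter

# Hypothetical annotated corpus of (signal_id, word) pairs.
corpus = [
    ("sig1", "apple"),
    ("sig1", "apple"),
    ("sig1", "ample"),
    ("sig2", "late"),
]

pair_counts = Counter(corpus)                    # count(w, s), keyed by (s, w)
signal_counts = Counter(s for s, _ in corpus)    # count(s)


def direct_estimate(w, s):
    """Estimate P(w|s) as count(w, s) / count(s); 0.0 if s was never seen."""
    if signal_counts[s] == 0:
        return 0.0
    return pair_counts[(s, w)] / signal_counts[s]


print(direct_estimate("apple", "sig1"))   # seen signal
print(direct_estimate("apple", "sig9"))   # unseen signal: no evidence at all
```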
Apply Bayes Theorem
P(w|s) = P(w) . P(s|w) / P(s)
W* = ARGMAX [ (P(w) . P(s|w)) / P(s) ]
P(s) is the same for every candidate w, so it can be dropped from the ARGMAX.
W* = ARGMAX (P(w) . P(s|w)), where w belongs to the set of words
P(w) = Prior = Language Model.
P(s|w) = Likelihood of w being pronounced as ‘s’ = Acoustic Model.
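A small sketch of this decoding rule for one fixed signal s. The prior and likelihood values are invented for illustration, and `decode` is a hypothetical name. The second part checks that dividing by P(s) does not change the winner.

```python
# Illustrative values only, for ONE fixed observed signal s.
prior = {"plate": 0.02, "late": 0.05}        # P(w): language model
likelihood = {"plate": 0.30, "late": 0.10}   # P(s|w): acoustic model


def decode(prior, likelihood):
    """Return W* = argmax over w of P(w) * P(s|w)."""
    return max(prior, key=lambda w: prior[w] * likelihood.get(w, 0.0))


w_star = decode(prior, likelihood)

# Normalising by P(s) = sum over w of P(w) * P(s|w) rescales every score
# by the same constant, so the argmax is unchanged.
p_s = sum(prior[w] * likelihood[w] for w in prior)
posterior = {w: prior[w] * likelihood[w] / p_s for w in prior}
same_winner = max(posterior, key=posterior.get) == w_star
print(w_star, same_winner)
```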
Acoustic Model
Pronunciation dictionary (Finite State Automata).
Manually Built - Costly Resource.
Example
[Figure: finite state automaton for ‘tomato’, with arcs t, o, m, (aa | ae), t, o]
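A pronunciation dictionary entry of this kind can be sketched as a small finite state automaton. The state numbering and the phone labels below are assumptions made for illustration; `accepts` is a hypothetical helper that checks whether a phone sequence is a valid pronunciation of ‘tomato’.

```python
# state -> {phone: next_state}; state numbers are illustrative.
TRANSITIONS = {
    0: {"t": 1},
    1: {"o": 2},
    2: {"m": 3},
    3: {"aa": 4, "ae": 4},   # the two pronunciation variants share a target state
    4: {"t": 5},
    5: {"o": 6},
}
START_STATE = 0
FINAL_STATES = {6}


def accepts(phones):
    """Return True if the phone sequence is accepted by the automaton."""
    state = START_STATE
    for phone in phones:
        next_state = TRANSITIONS.get(state, {}).get(phone)
        if next_state is None:
            return False   # no arc for this phone from the current state
        state = next_state
    return state in FINAL_STATES


print(accepts(["t", "o", "m", "aa", "t", "o"]))
print(accepts(["t", "o", "m", "ae", "t", "o"]))
print(accepts(["t", "o", "m", "t", "o"]))
```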
W* is obtained from P(w) and P(s|w).
Language model?
Relative frequency of w in the corpora.
Relative frequency ≡ unigram model.
P(knee) > P(need)
I _ _ _ _ _
knee – high probability
need – low probability
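A minimal sketch of a unigram language model built from relative frequencies. The toy token list stands in for a real text corpus and is purely illustrative; `unigram_model` is a hypothetical name.

```python
from collections import Counter


def unigram_model(tokens):
    """Return P(w) = count(w) / total number of tokens, for every w seen."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


# Toy corpus, assumed for illustration only.
tokens = "i need a plate i need it late".split()
model = unigram_model(tokens)
print(model["need"], model["plate"])
```

A unigram model ignores context: each word's probability depends only on how often it occurs in the corpus.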