[k mpjutey nl] [fown l d i] speech recognition and text-to-speech systems

Upload: rolf-reed

Post on 26-Dec-2015

Page 1:

Speech Recognition And Text-to-Speech Systems

Page 2:

Phonology

• Phonetic alphabets

• Phonological rules

• Computational Phonology

• Phonological Learning

• Optimality Theory

Kara Johnson
Use of phonetic alphabets to describe pronunciation.
Kara Johnson
Used to systematically record how sounds are realized differently in different environments. Also looks at how this system of sounds is related to the grammar in the rest of the sentence.
Kara Johnson
The study of computational mechanisms for modeling phonological rules. We will look at speech recognition systems and text-to-speech systems.
Kara Johnson
How phonological rules can be automatically induced by machine learning algorithms.
Page 3:

Phonetics

• Study of the pronunciation of words

• Words are strings of symbols which represent phones

• Can also include prosody

Kara Johnson
includes things like changes in pitch and duration
Page 4:

Phonetic Alphabets

• International Phonetic Alphabet (IPA)

– Evolving standard since 1888

– Goal is to be able to transcribe the sounds of all human languages

Page 5:

Phonetic Symbols

IPA

• International Phonetic Alphabet

• Evolving standard since 1888

• Goal is to be able to transcribe sounds of all human languages

ARPAbet

• Specifically for American English

• can be used where non-ASCII fonts are inconvenient (such as in online pronunciation dictionaries)
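As a rough illustration of why an ASCII-only alphabet is convenient, the sketch below maps a few ARPAbet symbols to their usual IPA counterparts. The table is deliberately partial, and the names `ARPABET_TO_IPA` and `arpabet_to_ipa` are made up for this example rather than taken from any library.

```python
# Partial ARPAbet-to-IPA table: only a handful of phones are listed here.
ARPABET_TO_IPA = {
    "p": "p", "t": "t", "k": "k", "s": "s", "f": "f", "n": "n", "l": "l",
    "sh": "ʃ", "th": "θ", "dx": "ɾ",
    "iy": "i", "ae": "æ", "uw": "u", "aa": "ɑ",
}


def arpabet_to_ipa(phones):
    """Convert a space-separated ARPAbet string into an IPA string."""
    return "".join(ARPABET_TO_IPA[p] for p in phones.lower().split())
```

For instance, `arpabet_to_ipa("k ae t")` gives the IPA string for "cat". A symbol missing from the partial table raises a `KeyError`, which is a reasonable failure mode for a sketch like this.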

Page 6:

Phonological Rules

• Not all [t]s are created equal

• Phones are pronounced differently in different contexts (phoneme vs. allophone)

• e.g. [t] in tunafish is aspirated

• e.g. [t] in starfish (following initial s) is unaspirated

• Broad transcription vs. narrow transcription

Kara Johnson
broad: leaves out a lot of predictable phonetic detail
narrow: includes allophonic variation, uses of various diacritics, environment, etc.
Page 7:

Phonological Rules

• ladder

• lotus

{t, d} → [ɾ] / V __ V
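The rule on this slide reads as flapping: a t or d between two vowels is realized as a flap. A minimal sketch of applying it to ARPAbet phone lists follows. It assumes the flap is written `dx`, uses a small hand-picked vowel set, and ignores stress, which fuller statements of the rule usually take into account. The function name `flap` is hypothetical.

```python
# ARPAbet vowel symbols used in this sketch (not an exhaustive inventory).
VOWELS = {"aa", "ae", "ah", "ax", "eh", "er", "ih", "iy", "ow", "uw"}


def flap(phones):
    """Replace t or d with the flap dx when it stands between two vowels."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        # Contexts are read from the input, so the rule applies simultaneously.
        if (phones[i] in {"t", "d"}
                and phones[i - 1] in VOWELS
                and phones[i + 1] in VOWELS):
            out[i] = "dx"
    return out


print(flap("l ae d er".split()))    # ladder -> ['l', 'ae', 'dx', 'er']
print(flap("l ow t ax s".split()))  # lotus  -> ['l', 'ow', 'dx', 'ax', 's']
```

A word-initial t, as in tunafish, is left untouched because it has no vowel to its left.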

Page 8:

Two-Level Morphology

• Koskenniemi (1983)

• Most phonological rules are independent

• Feeding and bleeding relations are rare

• Explicitly code when rule is obligatory or optional

Rule type            Interpretation
a:b ⇐ c ___ d        a is always realized as b in the context c ___ d
a:b ⇒ c ___ d        a may be realized as b only in the context c ___ d
a:b ⇔ c ___ d        a must be realized as b in the context c ___ d and nowhere else
a:b /⇐ c ___ d       a is never realized as b in the context c ___ d
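The four interpretations in the table can be checked mechanically against an aligned lexical/surface string. The sketch below is one simplified reading. The operators are spelled in ASCII as `<=`, `=>`, `<=>` and `/<=` (an assumption of this example), the context symbols c and d are matched against the lexical side only, and `check_rule` is a hypothetical helper, not part of any two-level toolkit.

```python
def check_rule(pairs, a, b, op, left, right):
    """Return True if the aligned (lexical, surface) pairs satisfy the rule.

    pairs: list of (lexical_symbol, surface_symbol) tuples.
    op:    "<=" (always), "=>" (only), "<=>" (always and only), "/<=" (never).
    """
    for i, (lex, surf) in enumerate(pairs):
        if lex != a:
            continue
        # Simplification: the context c ___ d is tested on lexical symbols.
        in_ctx = (0 < i < len(pairs) - 1
                  and pairs[i - 1][0] == left
                  and pairs[i + 1][0] == right)
        realized = surf == b
        if op == "<=" and in_ctx and not realized:
            return False
        if op == "=>" and realized and not in_ctx:
            return False
        if op == "<=>" and in_ctx != realized:
            return False
        if op == "/<=" and in_ctx and realized:
            return False
    return True
```

For example, the alignment `[("c", "c"), ("a", "a"), ("d", "d")]` violates `a:b <= c ___ d` but satisfies `a:b => c ___ d`, since the second rule only restricts where b may appear.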

Kara Johnson
Rule interaction. Feeding: one rule creates the environment for another. Bleeding: one rule destroys the environment for another.
Kara Johnson
<-- --> obligatory, --> optional rules. Two levels: lexical & surface. a:b means lexical a maps to surface b.
Page 9:

Optimality Theory (OT)

• Prince and Smolensky, 1993

• Is a Connectionist theory of language

• Views phonological derivation based on:

– Two functions (GEN and EVAL) and

– A set of ranked violable constraints (CON)

• Constraints are assumed to be cross-linguistic generalizations

Kara Johnson
Has roots in neural network research
Page 10:

Optimality Theory (OT)

• Given underlying form:

– GEN function produces all imaginable surface forms

– EVAL function then applies each constraint in CON to these surface forms in order of constraint rank

Kara Johnson
Even those that couldn't possibly be a legal surface form
Page 11:

Optimality Theory (OT)

• Constraints

– Faithfulness (checks how faithful the surface form is to the underlying form)

  • e.g. FaithV: says “Don’t delete or insert vowels”

  • e.g. FaithC: says “Don’t delete or insert consonants”

– Markedness (imposes requirements on the structural well-formedness of the output)

  • e.g. *Complex: says “no complex onsets or codas”

http://en.wikipedia.org/wiki/Optimality_Theory
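To make EVAL with ranked constraints concrete, here is a toy sketch. GEN is replaced by a short hand-written candidate list, faithfulness is approximated by comparing vowel and consonant counts rather than by a real alignment, and *Complex only looks at word edges. All function names are invented for the example, and the input "plat" is a made-up form.

```python
VOWELS = set("aeiou")


def count_vowels(form):
    return sum(ch in VOWELS for ch in form)


def count_consonants(form):
    return sum(ch not in VOWELS for ch in form)


def faith_v(underlying, surface):
    """FaithV (approximation): net number of vowels inserted or deleted."""
    return abs(count_vowels(underlying) - count_vowels(surface))


def faith_c(underlying, surface):
    """FaithC (approximation): net number of consonants inserted or deleted."""
    return abs(count_consonants(underlying) - count_consonants(surface))


def no_complex(underlying, surface):
    """*Complex (approximation): word edges with two consonants in a row."""
    if len(surface) < 2:
        return 0
    onset = surface[0] not in VOWELS and surface[1] not in VOWELS
    coda = surface[-1] not in VOWELS and surface[-2] not in VOWELS
    return int(onset) + int(coda)


def evaluate(underlying, candidates, ranking):
    """Pick the candidate whose violation profile is best in rank order."""
    return min(candidates,
               key=lambda c: tuple(con(underlying, c) for con in ranking))


candidates = ["plat", "pat", "palat"]  # stand-in for GEN's output
print(evaluate("plat", candidates, [no_complex, faith_c, faith_v]))  # palat
print(evaluate("plat", candidates, [faith_v, faith_c, no_complex]))  # plat
```

Comparing violation tuples lexicographically is what makes a higher-ranked constraint strictly dominate all lower ones: reranking the same three constraints selects a different winner.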

Page 12:

Optimality Theory (OT)

• Uses constraints to filter out unneeded surface forms

• Some constraints are more important than others

Page 13:

Optimality Theory (OT)

• Can OT be implemented by finite-state transducers?

• It is essential to enforce a constraint only if doing so does not reduce the set of possible surface forms to zero
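The "enforce only if something survives" idea can be sketched over a plain list of candidates, without any finite-state machinery. In the finite-state literature this kind of operation is often called lenient composition; the version below is only an analogy on Python lists, with binary constraints written as hypothetical predicates.

```python
VOWELS = set("aeiou")


def lenient_filter(candidates, constraints):
    """Apply constraints in rank order, skipping any that would leave nothing."""
    survivors = list(candidates)
    for satisfies in constraints:
        kept = [c for c in survivors if satisfies(c)]
        if kept:  # enforce the constraint only if some candidate passes it
            survivors = kept
    return survivors


# Hypothetical binary constraints for the made-up input "plat".
def no_initial_cluster(s):
    return len(s) < 2 or s[0] in VOWELS or s[1] in VOWELS


def keeps_all_consonants(s):
    return sum(ch not in VOWELS for ch in s) >= 3


def no_extra_vowel(s):
    return sum(ch in VOWELS for ch in s) <= 1


ranking = [no_initial_cluster, keeps_all_consonants, no_extra_vowel]
print(lenient_filter(["plat", "pat", "palat"], ranking))  # ['palat']
```

The last constraint is violated by the only remaining candidate, so it is skipped rather than allowed to empty the set.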

Page 14:

Optimality Theory (OT)

• Ordinal OT grammars

– Tesar & Smolensky (1998)

– No absolute ranking values

  • i.e. they accepted only an ordinal relation between the constraint rankings

– Learning algorithm (Error-Driven Constraint Demotion, EDCD)

  • changes the ranking order whenever the form produced is different from the adult form

– Fast and convergent, but extremely sensitive to errors in the learning data

http://www.fon.hum.uva.nl/praat/manual/OT_learning_1__Kinds_of_OT_grammars.html

Kara J
Only ranking order plays a role
Kara J
and does not show realistic gradual learning curves
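A single demotion step of the kind EDCD performs might look like the sketch below. It assumes the ranking is stored as stratum numbers (0 is highest) and that violation counts for the adult form and the learner's output are supplied by hand. The function name `demote` and the data layout are choices made for this example, not the published algorithm's notation.

```python
def demote(strata, adult_viol, learner_viol):
    """One constraint-demotion step on an error (learner output != adult form).

    strata:       dict constraint -> stratum number (0 = highest ranked).
    adult_viol:   dict constraint -> violations incurred by the adult form.
    learner_viol: dict constraint -> violations incurred by the learner's form.
    """
    # Constraints that favour the adult form (the learner's form does worse).
    prefer_adult = [c for c in strata
                    if learner_viol.get(c, 0) > adult_viol.get(c, 0)]
    # Constraints that favour the learner's wrong output.
    prefer_learner = [c for c in strata
                      if adult_viol.get(c, 0) > learner_viol.get(c, 0)]
    if not prefer_adult:
        return dict(strata)  # nothing can make the adult form win
    pivot = min(strata[c] for c in prefer_adult)
    new_strata = dict(strata)
    for c in prefer_learner:
        if new_strata[c] <= pivot:
            new_strata[c] = pivot + 1  # demote to just below the pivot
    return new_strata


start = {"*Complex": 0, "FaithC": 0, "FaithV": 0}
# Adult form has a complex onset; the learner inserted a vowel instead.
print(demote(start, {"*Complex": 1}, {"FaithV": 1}))
# -> {'*Complex': 1, 'FaithC': 0, 'FaithV': 0}
```

Only the order of strata matters here, which is also why a single erroneous data point can reorder the grammar outright.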
Page 15:

Optimality Theory (OT)

• Stochastic OT grammars

– Boersma (1997b) / Boersma (1998) / Boersma (2000)

– every constraint has a ranking value along a continuous ranking scale

– a small amount of noise is added to this ranking value at evaluation time

– associated error-driven learning algorithm (Gradual Learning Algorithm, GLA) effects small changes in the ranking values of the constraints with every learning step

– can learn languages with optionality and variation

http://www.fon.hum.uva.nl/praat/manual/OT_learning_1__Kinds_of_OT_grammars.html

Kara J
which was something that EDCD could not do
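The two ingredients of a stochastic OT grammar, noisy evaluation and small ranking adjustments, can be sketched as follows. The noise level, the plasticity value, and the function names are illustrative assumptions; they are not taken from Praat or from the cited papers.

```python
import random


def noisy_ranking(values, noise_sd=2.0, rng=random):
    """Order constraints by ranking value plus Gaussian evaluation noise."""
    noisy = {c: v + rng.gauss(0.0, noise_sd) for c, v in values.items()}
    return sorted(noisy, key=noisy.get, reverse=True)


def gla_update(values, adult_viol, learner_viol, plasticity=0.1):
    """One gradual learning step after the learner's output was wrong."""
    new_values = dict(values)
    for c in values:
        adult = adult_viol.get(c, 0)
        learner = learner_viol.get(c, 0)
        if adult > learner:
            new_values[c] -= plasticity  # penalizes the adult form: lower it
        elif learner > adult:
            new_values[c] += plasticity  # penalizes the learner's error: raise it
    return new_values


values = {"*Complex": 100.0, "FaithV": 100.0}
values = gla_update(values, {"*Complex": 1}, {"FaithV": 1})
print(noisy_ranking(values))  # order varies from call to call because of noise
```

Because two constraints with close ranking values can swap order from one evaluation to the next, the same grammar can produce variable outputs.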
Page 16:

SIGMORPHON

• ACL Special Interest Group on Computational Morphology and Phonology (SIGMORPHON)

• formerly known as the ACL Special Interest Group on Computational Phonology (SIGPHON)

• Recent research developments

• Matters of interest in computational phonology and morphology

http://salad.cs.swarthmore.edu/sigphon/systems.shtml