EE 516 Lecture 1 Geoffrey Zweig Microsoft Research 4/2/2009

EE 516 Lecture 1

Geoffrey Zweig

Microsoft Research

4/2/2009

Our Topics

From JHU 2002 SuperSID Final Presentation – Reynolds et al.

Introducing today!

Topic Coverage By Day

• Data Representations and Models (4/23)
  – Vector Quantization
  – Gaussian Mixtures
  – The EM Algorithm
• Speaker Identification (5/7)
• Language Identification (5/7)
• Hidden Markov Models (5/14)
  – Dynamic Programming
• Building a Speech Recognizer (5/14)

Language Identification – Why Do it?

• Multi-lingual society
  – Applications should be able to deal with anyone
• Businesses
  – Automated help systems
  – Reservations, account access, etc.
  – Travel
    • Airport Kiosks
    • Train stations
• Government
  – Funds research to identify languages
  – Runs evaluations in it

How Do You Do it?

English Acoustic Model

French Acoustic Model

Tamil Acoustic Model

Output Likeliest

Gaussian Mixture Models - 4/23
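
The diagram above (one acoustic model per language, output the likeliest) can be sketched in a few lines. This is only an illustration under stated assumptions: each acoustic model is taken to be a diagonal-covariance Gaussian mixture, the function names are hypothetical, and the 2-D model parameters and test frames are invented toy values, not trained on real speech.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a sequence of frames under one GMM.

    gmm is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in frames:
        # log-sum-exp over mixture components for numerical stability
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total

def identify_language(frames, models):
    """Score the frames with every language model and return the likeliest."""
    scores = {lang: gmm_log_likelihood(frames, gmm) for lang, gmm in models.items()}
    return max(scores, key=scores.get), scores

# Toy models with made-up parameters (assumption, for illustration only).
models = {
    "English": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "French": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 3.0], [1.0, 1.0])],
    "Tamil": [(0.5, [-4.0, 4.0], [1.0, 1.0]), (0.5, [-5.0, 3.0], [1.0, 1.0])],
}
frames = [[4.2, 3.9], [4.8, 3.1], [4.5, 3.5]]
best, scores = identify_language(frames, models)
print(best)
```

In a real system the frames would be acoustic feature vectors and each mixture would be trained with the EM algorithm listed for 4/23.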

How Do You Do It? (2)

After Zissman 1996

Simple HMMs – 5/14

Language Models – 4/30

“p ih n s” – probably English…

“k r p s t” – probably Czech…
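
One plausible reading of the phone-string examples above is that each language gets its own phone-sequence language model, and a recognized phone string is assigned to the language whose model scores it highest. The sketch below assumes a bigram model with add-one smoothing and `<s>`/`</s>` boundary markers; those choices, the function names, and the tiny training strings are all illustrative assumptions rather than the setup used in the lecture or in Zissman 1996.

```python
import math
from collections import Counter

def train_bigram(phone_strings, vocab):
    """Return a function giving add-one smoothed phone bigram log-probabilities."""
    pair_counts = Counter()
    left_counts = Counter()
    for phones in phone_strings:
        seq = ["<s>"] + phones + ["</s>"]
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            left_counts[a] += 1
    size = len(vocab) + 1  # possible successors: every phone plus </s>

    def log_prob(a, b):
        return math.log((pair_counts[(a, b)] + 1) / (left_counts[a] + size))

    return log_prob

def score(phones, log_prob):
    """Log-probability of a phone string under one language's bigram model."""
    seq = ["<s>"] + phones + ["</s>"]
    return sum(log_prob(a, b) for a, b in zip(seq, seq[1:]))

# Invented toy training data (assumption): a few phone strings per language.
english = [s.split() for s in ["p ih n s", "s ih t", "p ih t s", "t ih n"]]
czech = [s.split() for s in ["k r p s t", "s t r k", "p r s t", "k r k"]]
vocab = {p for s in english + czech for p in s}

lms = {"English": train_bigram(english, vocab), "Czech": train_bigram(czech, vocab)}

def likeliest_language(phones):
    return max(lms, key=lambda lang: score(phones, lms[lang]))

print(likeliest_language("p ih n s".split()))
print(likeliest_language("k r p s t".split()))
```

The phone strings themselves would come from a phone recognizer run on the audio; here they are typed in by hand.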

How Do You Do It (3)

After Zissman 1996

Same methods multiple times

Acero et al., Chapter 4

4/23

How Do You Do It? (4)

And we will see several other ways, and combinations!

Run a complete speech recognizer in each language

After Zissman 1996

Gauging Progress – The NIST Evaluations

• National Institute of Standards and Technology
• Has sponsored benchmark tests in multiple language processing areas for over a decade
  – Topic Detection & Tracking
  – Content Extraction
  – Video Analysis
  – Speech Recognition
  – Language Identification
  – Speaker Identification
  – Machine Translation
  – http://www.itl.nist.gov/iad/mig/tests/
• Coordination with site funding by Defense Advanced Research Projects Agency (DARPA)
• Along with business interest, the driving force in advancing the State-of-the-Art

For Example, Progress in Speech Recognition

Language Identification - How Well Can It Be Done – Who Salutes?

Organization | Location
Beijing Naphoo Technology Company+ | China
Brno University of Technology | Czech Republic
Georgia Institute of Technology | USA
Groupe des Ecoles des Telecommunication, Ecole Nationale Superieure des Telecommunications | France
IBM | USA
IKERLAN Technological Research Center | Spain
Institut de Recherche en Informatique de Toulouse | France
Institute for Infocomm Research | Singapore
Institute of Acoustics, Chinese Academy of Sciences+ | China
Institut National de Recherche sur les Transports et Leur Securite | France
International Computer Science Institute (USA) | USA
Laboratoire d'Informatique pour la Mecanique et les Sciences de l'Ingenieur | France
MIT Lincoln Laboratory | USA
Nanyang Technological University | Singapore
Politecnico di Torino | Italy
Spescom Datavoice | South Africa
Telefonica I & D | Spain
TNO Human Factors | The Netherlands
Tsinghua University | China
Universidad Autónoma de Madrid | Spain
University of the Basque Country | Spain
University of Stellenbosch | South Africa
University of Science and Technology of China+ | China

From NIST 2007 LRE Website

How Well Can it Be Done – What Languages?

From NIST 2007 LRE Website

How Well Can It Be Done? – Testing Conditions

• 26 languages and dialects
• Telephone speech
• Multiple duration conditions
  – 3, 10, 30 seconds
• Detection Error Tradeoff (DET) Curves used to measure performance
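
As a rough illustration of what goes into a DET curve: every trial gets a score, and sweeping a decision threshold trades misses (target trials rejected) against false alarms (non-target trials accepted). The sketch below computes those operating points. The function name, the accept rule (score at or above threshold), and the scores are invented for illustration and are not NIST's scoring software.

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) for each candidate threshold.

    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        # Miss: a true-target trial that falls below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial that reaches the threshold.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((t, miss, fa))
    return points

# Made-up detector scores (assumption): higher means "more likely the target".
target_scores = [2.1, 1.5, 0.3, 1.9]
nontarget_scores = [-1.0, 0.5, -0.2, 1.6]

points = det_points(target_scores, nontarget_scores)
for threshold, miss, fa in points:
    print(f"threshold={threshold:5.2f}  miss={miss:.2f}  false_alarm={fa:.2f}")
```

A DET plot draws false-alarm rate against miss rate, conventionally on normal-deviate axes, so that curves for different systems are easier to compare.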

How Well Can it Be Done – Some Numbers

From NIST 2007 LRE Website

Language Identification Project

• Build a language ID system with the Call Friend Data set
• Implement several of the main techniques
• Set up a demo on your laptop that will recognize someone’s language

Flavors of Speaker Recognition

From JHU 2002 SuperSID Final Presentation – Reynolds et al.

Our Focus!

Speaker Recognition – Why Do It?

• Personal Applications
  – Voice-print passwords
  – Voicemail transcription – who left that message?
• Business Applications
  – Calling your bank
• Government
  – Is that Osama calling from Pakistan?
  – Prison call monitoring
  – Automated parolee calling – is he where you think?

How Do You Do It?

The most basic approach:

Gaussian Mixture Models - 4/23

More recently: Support vector machines operating on GMMs (!)

How Do You Do It? (2)

Also use high-level information!

From JHU 2002 SuperSID Final Presentation – Reynolds et al.

How Well Can It Be Done – Who Salutes?

From NIST 2008 SRE Presentation, Martin & Greenberg

More Salutes

From NIST 2008 SRE Presentation, Martin & Greenberg

From Europe

From NIST 2008 SRE Presentation, Martin & Greenberg

More From Europe

From NIST 2008 SRE Presentation, Martin & Greenberg

U.S. Entries

From NIST 2008 SRE Presentation, Martin & Greenberg

How Well Can It Be Done – Testing Conditions

• Conditions for different amounts of data
  – 10 sec.
  – 3-5 minutes
  – 8 minutes
  – Separate channel and summed channel conditions
• English-speakers, non-English speakers, multilingual speakers

How Well Can It Be Done?

Speaker Verification Project

• Implement a Speaker-ID system
  – Template based
  – GMM based
  – SVM based
  – Vector space model
• Demonstrate it:
  – NIST data, e.g. 2001 Evaluation
  – Your own voice – implement on laptop

Speech Recognition Project

• Implement an HMM based recognition system
• Use, e.g., Phonebook isolated word data set or Aurora digit set
• Write features with existing front-end
• Build your own HMM trainer/decoder
• Set it up on your laptop for online word recognition (?!)
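
To give a feel for the decoder half of this project, here is a minimal sketch of Viterbi decoding, the standard dynamic-programming search for the most likely HMM state sequence. It assumes a discrete-observation HMM with log-domain tables. The two-state "sil"/"speech" model, its probabilities, and the "low"/"high" observation symbols are invented toy values, not the course's setup. A real recognizer would use continuous acoustic features and word or phone models.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state sequence for a discrete-observation HMM (log domain)."""
    # best[t][s]: best log score of any state path that ends in s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the score of arriving in s.
            prev = max(states, key=lambda p: best[t - 1][p] + log_trans[p][s])
            best[t][s] = best[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}

# Toy model (assumption): silence vs. speech, observed as low/high frame energy.
states = ["sil", "speech"]
log_start = {"sil": math.log(0.8), "speech": math.log(0.2)}
log_trans = to_log({"sil": {"sil": 0.7, "speech": 0.3},
                    "speech": {"sil": 0.2, "speech": 0.8}})
log_emit = to_log({"sil": {"low": 0.9, "high": 0.1},
                   "speech": {"low": 0.2, "high": 0.8}})

path, log_score = viterbi(["low", "low", "high", "high", "low"],
                          states, log_start, log_trans, log_emit)
print(path, log_score)
```

Working in the log domain avoids numerical underflow on long utterances, which matters once the observation sequence is hundreds of frames long.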

Highlights of Syllabus

• Required Texts:
  – Huang, Acero, Hon: Spoken Language Processing
  – Deng and O’Shaughnessy, Speech Processing
  – EE516 Reader, at Professional Copy ‘n Print, 4200 University Way
• Grading:
  – Projects: 50%
  – Final Exam: 30%
  – Homework: 20%
• Projects:
  – Small team or individual
    • Teams are self-forming
  – Presentation times TBD
  – Read ahead & pick an area!!!
    • Talk to relevant instructor
  – Suggest deciding no later than 4/30
• Office Hours at end of class and by appointment
• Please sign in on email list!