Open Text: Speech recognition in Opencast Matterhorn Stephen Marquard Centre for Educational Technology University of Cape Town June 2011



Project goals

• Integrate the CMU Sphinx speech recognition engine into Opencast Matterhorn
• Provide an easy mechanism for speaker training
• Generate automatic transcripts of recorded lectures
• Allow users to correct and improve the transcripts
• Use feedback to improve recognition accuracy (of the same, similar or subsequent recordings)

Why is it important?

• Video and audio are more useful if you can:
– Navigate them easily
– Locate relevant recordings from a large set
• Use by students:
– Catch up on missed lectures (continuous play or read the transcript)
– Revision: jump to a particular point or find the lectures which cover topic X
• On the public web:
– Discoverability (search indexing)
• Similar advantages to OCR recognition of slides (but harder)

Why is it difficult?

• Audio quality can dramatically affect speech recognition accuracy
– Echo and reverberation
– Background noise
– Microphone location

• Speaker-independent large-vocabulary continuous speech recognition is the hardest type of ASR

• Best case: good acoustics, single speaker (limited dialogue), accent match with the acoustic model, limited vocabulary.

Prior work in ASR for lectures

• MIT Lecture Browser (SUMMIT recognizer)
• U. Toronto / ePresence: PhD prototype by Cosmin Munteanu (SONIC recognizer)
• ETH Zurich: integration of CMU Sphinx with REPLAY by Samir Atitallah

Speech recognition software ecosystem

[Figure: speech recognition software ecosystem. Labels: Licensing and patents; Closed; Open; Proprietary; FOSS]

Accounting for context: Language model adaptation

• Adapt a language model to more closely resemble the target speech

• Using related text for
– Topic modelling (vocabulary, concepts)
– Style-of-speech modelling

“ok and um it's quite useful to have a very good diagnostic test of of acute hepatitis um you know to prevent kind of unnecessary um surgery um so hepatitis is really one um example of a cause of acute abdominal pain that doesn't need surgery”
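One common way to adapt a language model is linear interpolation: mix the probabilities of a general model with those of a model trained on related text. The slides do not say which adaptation method the project uses, so the sketch below is only an illustration under that assumption. It uses unigram models for brevity (real systems use smoothed n-gram models), and the function names, the weight `lam` and the toy texts are all hypothetical.

```python
from collections import Counter


def unigram_model(text):
    """Maximum-likelihood unigram probabilities from whitespace-split text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def interpolate(base, topic, lam=0.5):
    """Mix two unigram models: P(w) = lam * P_topic(w) + (1 - lam) * P_base(w)."""
    vocab = set(base) | set(topic)
    return {w: lam * topic.get(w, 0.0) + (1 - lam) * base.get(w, 0.0) for w in vocab}


# Toy texts, invented for illustration only.
base = unigram_model("the news today is about the election and the economy")
topic = unigram_model("acute hepatitis is a cause of acute abdominal pain")
adapted = interpolate(base, topic, lam=0.5)

# Words absent from the base model now have non-zero probability.
print(base.get("hepatitis", 0.0), adapted["hepatitis"])
```

The weight `lam` controls how far the adapted model moves towards the topic text; in practice it would be tuned on held-out data rather than fixed at 0.5.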

Using Wikipedia for LM adaptation

• Goal is to adapt a “standard” LM to be specific to the topic of the audio

• Start somewhere: title, keywords, text from slides
• Select a set of documents, adapt the LM
• Using Wikipedia, select by similarity: identify the set of documents most closely related to the starting point or keywords
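The "select by similarity" step can be sketched as a ranking of candidate articles against the seed text. The slides do not name the similarity measure, so TF-IDF weighting with cosine similarity is an assumption here, and the function names and the three toy "articles" are hypothetical stand-ins for real Wikipedia text.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case, whitespace-split, keep alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_by_similarity(seed, articles):
    """articles maps title -> text. Returns titles, most similar to the seed first."""
    titles = list(articles)
    vectors = tfidf_vectors([tokenize(seed)] + [tokenize(articles[t]) for t in titles])
    seed_vec, doc_vecs = vectors[0], vectors[1:]
    scored = sorted(zip(titles, doc_vecs), key=lambda p: cosine(seed_vec, p[1]), reverse=True)
    return [title for title, _ in scored]


# Toy data, invented for illustration only.
articles = {
    "Hepatitis": "hepatitis is inflammation of the liver",
    "French Revolution": "the french revolution was a period of political change in france",
    "Abdominal pain": "abdominal pain has many causes including hepatitis and appendicitis",
}
seed = "acute hepatitis diagnostic test abdominal pain surgery"
ranked = rank_by_similarity(seed, articles)
print(ranked)
```

The top-ranked documents would then supply the text used to adapt the LM; how many to keep is a tuning decision not covered by this sketch.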

Baseline performance with Sphinx4 (HUB4 acoustic and language models)

Lecture                                                             Word Error Rate (WER)
Thomas Pynchon, The Crying of Lot 49                                30%
Mating Systems and Parental Care                                    31%
Socratic Citizenship: Plato, Apology                                32%
Biomolecular Engineering: Engineering of Immunity                   32%
Dark Energy and the Accelerating Universe and the Big Rip           35%
Cell Culture Engineering                                            35%
The American Revolution: The Logic of Resistance                    40%
The "Afterlife" of the New Testament and Postmodern Interpretation  41%
What Is It Like to Be a Baby: The Development of Thought            42%
Death: Personal identity, Part IV; What matters?                    49%
Milton: Lycidas                                                     51%
Theory of Literature: The Postmodern Psyche                         55%
Maximilien Robespierre and the French Revolution                    61%

Lecture audio and transcripts from Open Yale Courses http://oyc.yale.edu/ Used under CC-BY-NC-SA license.
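For reference, WER is the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation using word-level edit distance. It is not the scoring tool used to produce the figures above, and the function name and the simple lower-case, whitespace-split normalization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# A seven-word fragment from the Pynchon lecture, punctuation removed by hand.
wer = word_error_rate("what we might call the real world",
                      "what we might call the reel well")
print(f"{wer:.2f}")  # 0.29 (2 substitutions / 7 reference words)
```

Because insertions count as errors, WER can exceed 100% when the recognizer emits many more words than the reference contains.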

Best-case comparison (30% WER)

Transcript, HUB4 LM, Wikipedia Similarity LM

before launching into not pynchon today route just take a few moments to look back cover the books that we've brad

and talk about the visions of language that they have offered up

and also just to reflect for mounted on the relationship

imagine between those visions of language

and what is happening outside of fiction in in what we might call the real

world we started this course talking about black boy and a weighing bat

a whole world of pressure political pressure racial tension

pushed on the borders and that work and actually changed its very nature eel for

Before launching into Pynchon today, I thought I would just take a few moments to look back over the books that we've read and talk about the visions of language that they have offered us, and also just to reflect for a moment on the relationship imagined between those visions of language and what is happening outside of fiction in what we might call the real world.

We started this course talking about Black Boy and the way that a whole world of pressure -- political pressure, racial tension -- pushed on the borders of that work and actually changed its very material form.

before launching into not mentioned today really does take a few moments to look back over the books that we've read

and talk about the visions of language that they have offered up

and also just to reflect for movement on the relationship

imagine between those visions of language

and what is happening outside of fiction in in what we might call the reel

well we started this course talking about black boy and a weighing of that

a whole world of pressure political pressure of racial tension

pushed on the borders of bad work and actually changed its very nature eel for

Worst-case comparison (61% WER)

Transcript, HUB4 LM, Wikipedia Similarity LM

i'd talk with the french revolution this party do in all the myself will forty-five minutes after throughout beginning

i'm in seoul on

on i wanted it to do

two things unless the revolution through

the eyes of maps that ulmus piano

member of a treaty of public safety arguably without fascists

i'd solicit were not member

ah is jacobo out into an away he incarnated death jacobin chapel back he imparted the french revolution

I'm going to talk about the French Revolution.

It's hard to do.

I'll leave myself about forty-five minutes after I screw around at the beginning.

I want to do two things.

I want to see the Revolution through the eyes of Maximilien de Robespierre, a member of the Committee of Public Safety --arguably, with Saint-Just, its most important member.

In a way, Jacobin -- he incarnated the French Revolution.

i've talked with the french are loose in this part to do in all the myself low forty five minutes after score of beginning

i'm in seoul on

bob and i wanted to do

two things i want the revolution through

the eyes of maps that elvis piano

a member of the treaty of public safety are giveaway with that fascists

i thought it were not member

ah gee i go back into a a way he imparted that chappel been the chapel back he imparted the first revolution

Work in progress

• Identify requirements for recording recognition-quality audio (equipment, acoustics)

• Implement dynamic language model adaptation
• Integrate into Opencast Matterhorn workflow
• Show transcript to users in UI, enable search
• Allow users to edit / improve transcript
• Use edits to improve recognition

Other integration possibilities

• External transcription services (automate the workflow, choice between manual or automatic transcript)

• External speech recognition services (e.g. nexiwave.com)

Find out more

Email me: [email protected]

Follow me on Twitter: http://twitter.com/stephenmarquard

Read my blog on open source language modelling and speech recognition: http://trulymadlywordly.blogspot.com

CMU Sphinx: http://cmusphinx.sourceforge.net/