0910 spktek talteknologi jg.ppt · Dialogsystem: Joakim Gustafson, Tal, musik och hörsel



Page 1: 0910 spktek talteknologi jg.ppt · Dialogsystem: Joakim Gustafson Tal, musik och hörsel 5 J. Gustafson, CTT, KTH Namn Nr Årskurs Perio d Poäng Talteknologi (Speech Technology)

Dialogsystem: Joakim Gustafson

Tal, musik och hörsel

1

J. Gustafson, CTT, KTH http://www.speech.kth.se

Joakim Gustafson

Centrum för Talteknologi (Centre for Speech Technology), KTH
Tal, musik och hörsel (Speech, Music and Hearing), KTH

Dialogue Systems

Agenda

• Introduction to speaker
• Spoken Dialogue Systems
• Research SDSs
• Commercial SDSs
• SDS Components
• An example SDS

Introduction

Joakim Gustafson

1987-1992 Electrical Engineering program at KTH

1992-1993 Linguistics at Stockholm University

1993-2000 PhD studies at CTT/KTH

2000-2007 Senior researcher at Telia R&D department

2007– Assistant Professor CTT/KTH

My research field: spoken dialogue systems

Multi-disciplinary field

CTT - Centrum för talteknologi (Centre for Speech Technology)

Research areas
• Speech production
• Speech perception
• Communication aids
• Multimodal speech synthesis
• Speech recognition
• Speaker verification
• Spoken conversational speech
• Interactive dialogue systems

The KTH speech group - Early days

Gunnar Fant and OVE I, 1953

Name | No. | Year | Period | Credits
Talteknologi (Speech Technology) | 2F1111 | D3-4, E4, F4 | 3 | 4
Talteknologi, utökad kurs (Speech Technology, extended course) | 2F1112 | Media 3-4 | 3 | 5
Spektrala transformer för Media (Spectral Transforms for Media) | 2F1120 | Media 3-4 | 2 | 4
Musikakustik (Music Acoustics) | 2F1212 | D4, E4, F4 | 4 | 4
Musical communication & music technology (Musikalisk kommunikation och musikteknologi) | 2F1213 | Media 3-4, E4, D4 | 4 | 5
Elektroakustik (Electroacoustics) | 2F1400 | E4-5, M4-5 | 1 | 4
Audioteknik (Audio technology) | 2F1410 | Media 3-4, E4, D4 | 2 | 5
Orkesterspelets teori (Theory of orchestral performance) | 2F1601 | all | | 6
Orkesterspelets praktik (Orchestral performance) | 2F1602 | all | | 6
Elektroprojekt (Electrical engineering project) | 2U1700 | E1 | 3-4 | 5
Medieteknik grundkurs, avsnitt Ljud (Media Technology basic course, Sound section) | 2D1574 | Media 2 | 1-2 | 12

Spoken Dialogue Systems

Speech application domains

• Information retrieval systems
  – e.g. train timetable information or directory inquiries
• Ordering
  – ticket booking, buying music
• Command control systems
  – home control (“turn the radio off”) or voice command shortcuts (“save”)
• Dictation

Application domains (cont.)

• Games and entertainment
  – Games can take advantage of the more social aspects of spoken dialogue
• Co-ordinated collaboration
  – The task of controlling or overseeing complex situations requires flexibility and efficiency
• Expert systems
  – Diagnosis and help systems that need to reason about facts and goals and may benefit from natural dialogue
• Learning and training
  – Naturalness, flexibility, and robustness are attractive features in training environments

The spoken dialogue system vision

An interface that allows speakers to interact with a computer using spontaneous, unconstrained speech

Talking machines as we know them from movies…

• Star Trek

• HAL 9000 (2001)

• C3PO (Star Wars)

...and, for the past couple of years, from real systems

(an example from 90200)

S: “Welcome to Telia… Here you describe your matter in your own words instead of keying your way through menus. If you say what you want help with, I can connect you. What is your call about?”
U: “Hello?”
S: “Yes, hello, what is your call about?”
U: “It's about an invoice.”
S: “OK, an invoice matter. [PAUSE] Is it about fixed telephony, mobile telephony, broadband or dial-up internet?”
U: “Broadband.”

80 categories
• Support Agent
• Billing info
• Mobile Refill
• Broadband order
• Broadband availability
• Mobile info agent

How do we want speaking machines to behave?

Like us...

...but only up to a point?

Interface metaphors

• Exploit specific knowledge that users already have from other domains
• Give the user instantaneous knowledge about how to interact
• Examples include
  – The desktop – the computer is a workplace
  – File system – the computer is a filing cabinet
  – Games – the computer is a toy

Metaphors for speech interfaces

• The tool/interface metaphor
  – Speech as an alternative interface technology in GUIs (multimodality)
• The servant/human metaphor
  – The computer as an entity with human-like conversational abilities

One is not necessarily better than the other!

Applications of machines with human metaphor interaction

• Research tool
  – Cassell: “a machine that acts human enough that we respond to it as we respond to another human”
• Second language learning
• Social robots
• Computer games

What would make a speech interface intuitive?

• The user should not have to learn:
  – When to speak
  – How to speak
  – What to say
• The system should support the user by:
  – Clearly showing when it will speak
  – Saying things that reflect what it understands

When to speak

• The computer understands when the user intends to stop or continue speaking
• The computer indicates when it intends to stop or continue speaking

Controlling the dialogue flow

• Conversation has two channels
  – Exchange of information
  – Control of the exchange of information
• Control
  – Turn-taking (who speaks when)
  – Feedback (attention, attitude…)

Dialogue systems need both controls

Using superficial methods to handle dialogue flow

• Talk after the beep
• Talk-now icon/cursor
• Push-to-talk button

Using human-like methods to handle dialogue flow

• Facial gestures and head movements
• Prosody and extra-linguistic sounds

The MonAMI Reminder

[Figure: system overview. Labels: Google Calendar, ECA, SMS notifications, Web GUI, Handwriting. Example calendar entry: Monday 13, 15.00 Meeting with Sara at Wayne’s Coffee.]

Attention and interaction controller

[Figure: state diagram. System-side labels: NonAttentive, Attentive, Listening, Speaking, Holding, Interrupted, Calling. User-side labels: Not attending, Attending, Speaking, Pausing. Transition labels: UserStartLook, UserStopLook, UserStartSpeak, SystemInitiative, SystemResponse (restart), SystemIgnore (resume), SystemStopSpeak, SystemAbortSpeak, Timeout.]
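A controller of this kind can be sketched as a small finite-state machine. The state and event names below are taken from the slide, but the diagram's arrows did not survive extraction, so the transition table itself is an assumption made purely for illustration.

```python
# Hypothetical transition table: names come from the slide, but which event
# leads to which state is assumed here, not recovered from the diagram.
TRANSITIONS = {
    ("NonAttentive", "UserStartLook"): "Attentive",
    ("NonAttentive", "SystemInitiative"): "Speaking",
    ("Attentive", "UserStopLook"): "NonAttentive",
    ("Attentive", "UserStartSpeak"): "Listening",
    ("Listening", "SystemResponse"): "Speaking",
    ("Listening", "SystemIgnore"): "Attentive",
    ("Speaking", "SystemStopSpeak"): "Attentive",
    ("Speaking", "UserStartSpeak"): "Interrupted",
    ("Interrupted", "SystemIgnore"): "Speaking",
    ("Interrupted", "SystemResponse"): "Speaking",
}

def step(state, event):
    """Return the next state; events with no entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

def run(events, state="NonAttentive"):
    """Feed a sequence of events through the controller and return the trace."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace

print(run(["UserStartLook", "UserStartSpeak", "SystemResponse", "SystemStopSpeak"]))
```

Keeping the transitions in a table rather than in nested conditionals makes it easy to compare the behaviour with a diagram and to add states such as Holding or Calling later.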

Facial animation

[Figure: facial animation in the states NonAttentive, Attentive, Listening and SystemIgnore]

Experiments in a tutoring setting

Tutoring setting: Tutor explaining the system to a user

Problem
• Is the user talking to the tutor or the system?
• Compare push-to-talk (PTT) with look-to-talk (LTT)

Subjects
• 4 elderly persons from the target group
• 4 colleagues

Example

Research Spoken Dialogue Systems

Classic research dialogue systems

• Historic research systems
  – Voyager (1989)
  – ATIS (1992)
  – SUNDIAL (1993)
  – TRAINS (1996)
• Application
  – Philips Train Information (1995)
• Large efforts
  – Communicator
  – Verbmobil
  – SmartKom

The Nordic scene

• Stockholm, Sweden
  – KTH (Waxholm, August, AdApt, Higgins)
  – Telia (Resebokning, NICE)
• Linköping, Sweden
  – IDA (LINLIN, Ornitology, Movie Finder)
• Göteborg, Sweden
  – Lingvistiken (TRINDI, GODIS, TALK)
  – Chalmers (State-chart xml)
• Aalborg, Odense, Denmark
• Helsinki, Tampere, Finland
• Trondheim, Norway

The Waxholm Project (92-95)

• Tourist information
  – Stockholm archipelago
  – timetables, hotels, hostels, camping and dining possibilities
• Mixed initiative dialogue
  – speech recognition
  – multimodal synthesis
• Graphic information
  – pictures, maps, charts and timetables

Waxholm System

The Waxholm system

The August kiosk at Kulturhuset 98

• Swedish spoken dialogue system for public use
• Animated agent named after August Strindberg
• Speech technology components developed at KTH
• Designed to be easily expanded and reconfigured
• Used to collect spontaneous speech data

The August system

The HIGGINS system (03-)

• The primary domain of HIGGINS is city navigation for pedestrians.

• Secondarily, HIGGINS is intended to provide simple information about the immediate surroundings.

Higgins

The Commercial Speech Applications

Access Convergence & Speech

Commercial applications of speech technology

• Dictation
  – Medical
  – Legal
  – Windows Vista (95-99% without training?)
• Customer care center automation
  – Call routing
  – Technical support
• Multimodal mobile applications
  – Information search
  – Content download
  – Entertainment

Problem solving

• SpeechCycle
  – Broadband support Agent
  – Video support Agent
  – IP Telephony Agent

Some nice dialogue features of SpeechCycle’s Agents

• Open prompts with free speech ASR

• References to user’s previous attempts to fix problem

• Answer to free prompts

• Error identification

• Memory of earlier turns

• Encouraging the user to stay

SDS Components

Spoken dialogue processing

The Components of Spoken Dialogue Systems

• Automatic Speech Recognition (ASR)
• Natural Language Understanding (NLU)
• Dialogue Management (DM)
• Natural Language Generation (NLG)
• Text-To-Speech synthesis (TTS)
• Multimodal modules

Speech Recognizer

Purpose: convert speech to text

Input: digital speech

Output: String(s) or lattice that represent the most probable utterance(s)

Considerations: Vocabulary size, Grammar and Speech type

Speech Recognition

• Identify the speech sounds
  – Measure the spectrum of the speech signal 100 times per second
• Match the recognized sounds against a pronunciation dictionary
  – Only the allowed words are active
• Construct a sentence of the most probable words
  – Taking the probability of the context into account
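The first step, measuring the spectrum 100 times per second, can be sketched as follows. The 16 kHz sample rate and the 25 ms window are assumptions (common choices, not stated on this slide), and the naive DFT stands in for the FFT a real front end would use.

```python
import cmath
import math

SAMPLE_RATE = 16000               # samples per second (assumed)
FRAME_RATE = 100                  # spectra per second, as stated on the slide
HOP = SAMPLE_RATE // FRAME_RATE   # 160 samples between frame starts
FRAME_LEN = 400                   # 25 ms analysis window (assumed)

def frames(signal):
    """Split the signal into overlapping frames, one every HOP samples."""
    return [signal[i:i + FRAME_LEN]
            for i in range(0, len(signal) - FRAME_LEN + 1, HOP)]

def magnitude_spectrum(frame, n_bins=16):
    """Naive DFT magnitudes for the first n_bins frequency bins."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n_bins)]

# One second of a 440 Hz test tone.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
all_frames = frames(tone)
spectrum = magnitude_spectrum(all_frames[0])
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
print(len(all_frames), peak_bin * SAMPLE_RATE / FRAME_LEN)
```

Each bin is 40 Hz wide with these settings, so the test tone lands exactly on one bin. No window function is applied here; a real front end would typically taper each frame first.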

Statistically based recognition

[Figure: block diagram of a statistical recognizer. Analysis: parameterisation of 16 kHz speech by FFT spectral analysis at a 100 Hz frame rate, then continuous or discrete classification. Comparison: a search using phoneme models (acoustic description), the lexicon (vocabulary + transcription) and the language model (possible word sequences), producing an N-best list. Knowledge: sentence understanding selects the chosen sentence.]

What makes speech recognition difficult?

Speaker, Channel, Listener

• Between speakers
  – Age
  – Gender
  – Anatomy
  – Dialect
• Within speaker
  – Stress
  – Mood
  – Health
  – Formal / Spontaneous
• Environment
  – Additive noise
  – Room
• Microphone, telephone
  – Bandwidth
  – Noise
• Listener
  – Age
  – First language
  – Level of hearing
  – Known / Unknown
  – Human / Machine

[Audio examples (.wav): several recordings of the sentence ”Flyget, tåget och bilbranschen tävlar om lönsamhet och folkets gunst” (“The airlines, the railways and the car industry compete for profitability and the public's favour”), including speakers born in the USA and in ex-Yugoslavia]

Results 2005 (earlier result in parentheses)

Corpus | Speech type | Lexicon size | Word Error Rate (%) | Human Error Rate (%)
Digit string (phone) | spontaneous | 10 | 0.3 | 0.009
Resource Management | read | 1000 | 3.6 | 0.1
ATIS | spontaneous | 2000 | 2 | -
Wall Street Journal | read | 64000 | 6.6 | 1
Radio News | mixed | 64000 | 13.5 (20) | -
Switchboard (phone) | conversation | 10000 | 19.3 (38) | 4
Call Home (phone) | conversation | 10000 | 30 (50) | -
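Word error rate is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below illustrates that definition; the example sentences are made up, and it does not reproduce the scoring tools behind the figures in the table.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match or substitution
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)

# One deletion ("to") and one substitution ("the" -> "a") against 6 reference words.
print(round(word_error_rate("i want to take the boat", "i want take a boat"), 1))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference. The function assumes a non-empty reference.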

Natural Language Understanding

Purpose: produce meaning representation from ASR output

Input: String(s) or lattice of words

Output: Meaning representation

Considerations: Type of grammar (Finite-state, Full parse or Word-spotting)
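Of the three grammar types, word-spotting is the simplest to illustrate: syntax is ignored and the meaning representation is built from whichever known keywords appear. The keyword tables and the frame layout below are hypothetical, chosen to fit a boat-timetable domain.

```python
# Hypothetical keyword tables for a boat-timetable domain (not from the source).
INTENT_KEYWORDS = {"boat": "timetable", "ferry": "timetable", "hotel": "lodging"}
PLACES = {"waxholm", "stockholm", "grinda"}

def understand(utterance):
    """Word-spotting NLU: ignore syntax and pick out known keywords."""
    frame = {"intent": None, "places": []}
    for word in utterance.lower().split():
        word = word.strip(".,?!")
        if word in INTENT_KEYWORDS and frame["intent"] is None:
            frame["intent"] = INTENT_KEYWORDS[word]
        elif word in PLACES:
            frame["places"].append(word)
    return frame

print(understand("I want to take the boat to Waxholm"))
```

Word-spotting tolerates recognition errors in the unimportant words, but it cannot tell "from Waxholm" apart from "to Waxholm"; that is where a finite-state or full-parse grammar earns its cost.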

Phrases and syntax

I want to take the boat

Robust Utterance Analysis

Pragmatics/world knowledge

S: There is a flight at 8.10.
U: Isn’t that usually delayed?

• Flights can be delayed
• To be delayed is bad
• ”Usually” → recurring in time. How often is ”usually”?
• Is this particular flight delayed a lot?
• What would that mean for connecting flights?

Dialogue Management

Purpose: decide the next system action

Input: a meaning representation from NLU

Output: High-level communicative goal(s)

Automating Dialogue

To automate a dialogue, we must model:

User’s Intentions
• Recognized from input media forms (multi-modality)
• Disambiguated by the context

System Actions
• Based on the task model and on output medium forms (multi-media)

Dialogue Flow
• Content selection (what to say)
• Information packaging (how much to say)
• Turn-taking (who can speak/interrupt)

Knowledge Sources for Dialogue Systems

• Context:
  – Dialogue history
  – Current task, slot, state
• Ontology:
  – World knowledge model
  – Domain model
• Language:
  – Models of conversational competence:
    • Turn-taking strategies
    • Discourse obligations
  – Linguistic models:
    • Grammars, dictionaries, corpora
• User model:
  – Information about user that could be relevant to the dialogue
    • Age, gender, spoken language, location, preferences, etc.
    • Dynamic information: the current mental state (beliefs, desires, intentions)

Dialogue Control Strategies

• Finite-state based systems
  – dialogue and states explicitly specified
• Frame based systems
  – dialogue separated from information states
• Agent based systems
  – model of intentions, goals, beliefs
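The frame-based strategy can be sketched in a few lines: the information state is a set of slots, and the dialogue logic only asks for whatever is still missing, so the user may supply slots in any order. Slot names and prompts here are hypothetical.

```python
# Hypothetical slots and prompts for a timetable task.
PROMPTS = {
    "destination": "Where do you want to go?",
    "day": "Which day do you want to travel?",
}

def next_action(frame):
    """Frame-based control: ask for the first unfilled slot, otherwise act."""
    for slot, prompt in PROMPTS.items():
        if frame.get(slot) is None:
            return ("ask", slot, prompt)
    return ("lookup", dict(frame), None)

def update(frame, nlu_result):
    """Merge whichever slots the user supplied, in any order."""
    merged = dict(frame)
    merged.update({k: v for k, v in nlu_result.items() if v is not None})
    return merged

frame = {"destination": None, "day": None}
print(next_action(frame))                    # asks for the destination
frame = update(frame, {"day": "friday"})     # user volunteers the day instead
print(next_action(frame))                    # still asks for the destination
frame = update(frame, {"destination": "waxholm"})
print(next_action(frame))                    # all slots filled: look up
```

A finite-state system would instead hard-code the order of questions, while an agent-based system would replace `next_action` with reasoning over goals and beliefs.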

Natural Language Generation

Purpose: produce an output string to be shown/spoken to the user

Input: Representation from DM

Output: String for TTS (possibly marked for prosody, etc.)

Utterance Generation

• Predefined utterances
• Frames with slots
• Generation based on grammar and underlying semantics
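The middle option, frames with slots, amounts to template filling. A minimal sketch using Python's standard `string.Template`, with hypothetical goal names and template texts:

```python
from string import Template

# Hypothetical templates keyed by communicative goal.
TEMPLATES = {
    "inform_departure": Template("There is a boat to $destination at $time."),
    "ask_slot": Template("Which $slot do you want?"),
}

def generate(goal, **slots):
    """Fill the template for a goal; raises KeyError if a slot value is missing."""
    return TEMPLATES[goal].substitute(**slots)

print(generate("inform_departure", destination="Waxholm", time="8.10"))
```

Predefined utterances are the special case of a template with no slots. Grammar-based generation replaces the fixed strings with rules that build the sentence from the semantic representation.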

Text-to-Speech Synthesis

Input: text string (with tags)

Purpose: speak string to user

Considerations: Human-like, Flexibility

What is speech synthesis?

• Recorded human speech
  – Words and phrases
  – Fixed vocabulary
• Systematically recorded fragments of speech
  – One fixed speaker
• Record the phoneme transitions ("diphones")
• Parametric synthesis (artificial)
  – Formant synthesis
  – Articulatory synthesis
• Multimodal synthesis (talking head)

Text-to-speech (TTS)

[Figure: processing pipeline from text (“abcd”) to sound. Stages: language identification, linguistic analysis, prosodic analysis, phonetic description, sound generation. Knowledge sources shown beside the stages: morphological analysis, lexicon and rules, syntax analysis; rules and lexicon; rules and unit selection; concatenation, rules.]

Text Preprocessing

• Sentence end detection (punctuation is ambiguous: a colon may mark a ratio or a time, and a period may be a decimal point or a sentence ending)
• Abbreviations (e.g. “e.g.” for “for instance”): changed to their full form with the help of lexicons
• Acronyms (e.g. I.B.M., which is read as a sequence of characters, or NASA, which is read as an ordinary word)
• Numbers (once detected, interpreted as rational numbers, times of day, dates or ordinals depending on their context)
• Idioms (e.g. “in spite of”, “as a matter of fact”): combined into a single FSU using a special lexicon

Olov Engwall, Speech synthesis, 2008
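Three of these steps (abbreviations, acronyms and numbers) can be sketched as a token-by-token normalizer. The lexicons are tiny hypothetical stand-ins, and numbers are simply read digit by digit, which is a simplification of the context-dependent interpretation described above.

```python
import re

# Tiny hypothetical lexicons; a real system would use far larger ones.
ABBREVIATIONS = {"e.g.": "for instance", "etc.": "et cetera"}
SPELLED_ACRONYMS = {"IBM", "KTH"}   # read letter by letter; others read as words
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def normalize(text):
    """Expand abbreviations, spell out listed acronyms, read numbers digit by digit."""
    out = []
    for token in text.split():
        core = token.rstrip(",;")            # drop trailing list punctuation
        if core.lower() in ABBREVIATIONS:
            out.append(ABBREVIATIONS[core.lower()])
        elif core in SPELLED_ACRONYMS:
            out.append(" ".join(core))       # "IBM" -> "I B M"
        elif re.fullmatch(r"\d+", core):
            out.append(" ".join(DIGITS[int(d)] for d in core))
        else:
            out.append(core)
    return " ".join(out)

print(normalize("IBM and NASA, e.g. room 42"))
```

Sentence end detection and idiom grouping are left out here because both need context beyond a single token.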

Grapheme-to-phoneme conversion

• Dictionary:
  – Store a maximum of phonological knowledge in a lexicon.
  – Compounding rules describe how the morphemes of dictionary items are modified.
  – Hand-corrected, expensive
  – The lexicon is never complete: an out-of-vocabulary pronouncer is needed, transcribing by rule.
• Rules:
  – A set of letter-to-sound (grapheme-to-phoneme) rules.
  – Words pronounced in such a particular way that they need their own rule are stored in an exceptions dictionary.
  – Fast and easy, but lower accuracy
• Machine learning:
  – CART tree
  – Pronunciation by analogy

Olov Engwall, Speech synthesis, 2008
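The dictionary and rule approaches are often combined: look the word up first and fall back to letter-to-sound rules for out-of-vocabulary words. The lexicon entries, the rules and the phoneme symbols in this sketch are hypothetical and far too small to be linguistically adequate.

```python
# Hypothetical mini-lexicon and rules; phoneme symbols are illustrative only.
LEXICON = {"boat": ["b", "ou", "t"], "the": ["dh", "ax"]}
LETTER_RULES = [("sh", "sh"), ("ch", "ch"), ("th", "th")]  # multi-letter graphemes

def letter_to_sound(word):
    """Fallback: greedy left-to-right rules; other letters map to themselves."""
    phones, i = [], 0
    while i < len(word):
        for grapheme, phone in LETTER_RULES:
            if word.startswith(grapheme, i):
                phones.append(phone)
                i += len(grapheme)
                break
        else:                                # no rule matched at position i
            phones.append(word[i])
            i += 1
    return phones

def pronounce(word):
    """Dictionary lookup first, rule-based transcription for unknown words."""
    word = word.lower()
    return LEXICON.get(word) or letter_to_sound(word)

print(pronounce("boat"))   # found in the lexicon
print(pronounce("ship"))   # out of vocabulary, transcribed by rule
```

The lookup-then-rules order mirrors the trade-off on the slide: the hand-corrected lexicon is accurate but incomplete, the rules are complete but less accurate.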

Sound generation approaches

• Parametric synthesis (synthesis by rule)
  – Formant synthesis or articulatory synthesis
  – Cognitive approach to the phonation mechanism
  – Speech is produced by mathematical rules that formally describe the influence of phonemes on one another
  – Flexible, but lower quality today
• Synthesis by concatenation
  – Diphones or larger units (unit selection)
  – Limited knowledge of the data to be handled
  – Elementary speech units are stored in a database and then concatenated and processed to produce the speech signal
  – One speaker
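For the concatenative approach with diphone units, the core bookkeeping is to turn a phoneme string into the sequence of phoneme-to-phoneme transitions and fetch each one from the unit database. In this sketch the unit naming scheme, the "_" silence symbol and the toy database are assumptions; real systems also smooth the joins and adjust prosody.

```python
def to_diphones(phonemes):
    """Turn a phoneme sequence into diphone unit names, padded with silence '_'."""
    padded = ["_"] + list(phonemes) + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

def concatenate(units, database):
    """Join the stored waveform snippets of the requested units end to end."""
    samples = []
    for unit in units:
        samples.extend(database[unit])   # KeyError if a unit was never recorded
    return samples

units = to_diphones(["b", "ou", "t"])
print(units)

# Toy database: each unit maps to a short list of (fake) waveform samples.
toy_db = {name: [index] * 3 for index, name in enumerate(units)}
print(concatenate(units, toy_db))
```

A sequence of n phonemes needs n + 1 diphones, and the database has to cover every transition that can occur in the language, which is why the units are recorded systematically from one speaker.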

Text-to-Speech Synthesis (TTS) Evolution

[Figure: timeline, 1962 to 2002]

• Formant synthesis (Bell Labs; Joint Speech Research Unit; MIT (DEC-Talk); Haskins Lab; KTH): poor intelligibility; poor naturalness
• LPC-based diphone/dyad synthesis (Bell Labs; CNET; Bellcore; Berkeley Speech Technology): good intelligibility; poor naturalness
• Unit selection synthesis (ATR in Japan; CSTR in Scotland; BT in England; AT&T Labs (1998); L&H in Belgium): good intelligibility; customer quality naturalness (limited context)
• HMM synthesis (HTS, Nagoya, Edinburgh)

Formant synthesis (1959-1987)

• Haskins, 1959
• KTH – Stockholm, 1962
• Bell Labs, 1973
• MIT, 1976
• MITalk, 1979
• Speak & Spell, 1980
• Bell Labs, 1985
• DECtalk, 1987

Source filter theory

[Figure: source (glottal flow) → filter (shape of vocal tract) → radiation (lips), shown in both the time domain and the frequency domain]
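The source, filter and radiation stages can be sketched as a chain of three simple signal operations. This is a minimal illustration under stated assumptions, not the synthesizer used in any of the systems named in these slides: the source is a bare impulse train, each formant is a second-order resonator, and lip radiation is approximated by a first difference. The formant frequencies and bandwidths are rough illustrative values for an open vowel.

```python
import math


def impulse_train(f0, fs, n):
    """Crude glottal source: one unit pulse per pitch period."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(x, freq, bw, fs):
    """Second-order digital resonator modelling one formant of the vocal tract."""
    r = math.exp(-math.pi * bw / fs)
    c1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    c2 = -r * r
    gain = 1.0 - c1 - c2  # normalises the gain at 0 Hz to one
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = gain * sample + c1 * y1 + c2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def radiate(x):
    """Lip radiation approximated by a first difference (a high-pass filter)."""
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]


def synthesize_vowel(formants, f0=110.0, fs=16000, dur=0.05):
    """Source -> cascade of formant filters -> radiation."""
    signal = impulse_train(f0, fs, round(fs * dur))
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return radiate(signal)


# Assumed (frequency, bandwidth) pairs in Hz, for illustration only.
samples = synthesize_vowel([(700, 80), (1200, 90), (2600, 120)])
print(len(samples), "samples generated")
```

Changing only the formant list changes the vowel quality, while changing only `f0` changes the pitch. That independence is the practical point of separating source and filter.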

Articulatory parameters

• Jaw opening

• Lip rounding

• Lip Protrusion

• Tongue position

• Tongue height

• Tongue tip

• Velum

• Hyoid

Concatenative synthesis

• Diphone synthesis
– Swedish (MBROLA)
• Unit selection
– Swedish weather (CSTR): good, less good

Diphone synthesis

• Mid-phone is more stable than edge:

Slide from Dan Jurafsky
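Because the middle of a phone is the stable part, a diphone runs from the middle of one phone to the middle of the next, so the cut points fall in stable regions. The sketch below only shows how a phone sequence maps to diphone names. The `sil` padding symbol and the `a-b` naming are assumptions made for the example.

```python
def to_diphones(phones, silence="sil"):
    """Map a phone sequence to the diphone units that span it.

    Each unit covers the transition from the middle of one phone to the
    middle of the next, with silence added at both ends.
    """
    padded = [silence] + list(phones) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


print(to_diphones(["h", "e", "l", "o"]))
```

A sequence of n phones always yields n + 1 diphones, which is why a diphone inventory needs roughly the square of the phone inventory size.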

Diphone Synthesis

• Well-understood, mature technology

• Problems:
– Signal processing still necessary for modifying durations

– Source data is still not natural

– Units are just not large enough; can’t handle word-specific effects, etc

Slide from Dan Jurafsky

Unit Selection Synthesis

• Natural data solves problems with diphones
– Diphone databases are carefully designed but:
  • Speaker makes errors
  • Speaker doesn’t speak intended dialect
  • Require database design to be right
– If it’s automatic
  • Labeled with what the speaker actually said
  • Coarticulation, schwas, flaps are natural
• “There’s no data like more data”
– Lots of copies of each unit mean you can choose just the right one for the context
– Larger units mean you can capture wider effects

Slide from Dan Jurafsky
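Choosing "just the right one for the context" is commonly framed as a search for the cheapest path through the candidate units, where a target cost measures how well a unit fits the wanted context and a join cost measures how smoothly two neighbours connect. The slides do not spell out a cost function, so the sketch below is an assumption: both costs use only pitch (F0), and the `Unit` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    start_f0: float  # pitch at the start of the recorded unit, in Hz
    end_f0: float    # pitch at the end of the recorded unit, in Hz


def target_cost(unit, wanted_f0):
    """How far the unit's mean pitch is from the wanted pitch."""
    return abs((unit.start_f0 + unit.end_f0) / 2 - wanted_f0)


def join_cost(left, right):
    """Pitch jump at the concatenation point."""
    return abs(left.end_f0 - right.start_f0)


def select_units(candidates, wanted_f0s):
    """Dynamic-programming (Viterbi) search for the cheapest unit sequence."""
    # best[i][j] = (cheapest total cost ending in candidate j, backpointer)
    best = [[(target_cost(u, wanted_f0s[0]), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for unit in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, unit), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(unit, wanted_f0s[i]), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path)), total


candidates = [
    [Unit("a1", 100, 110), Unit("a2", 120, 125)],
    [Unit("b1", 130, 135), Unit("b2", 108, 112)],
]
path, total = select_units(candidates, wanted_f0s=[110, 110])
print([u.name for u in path], total)
```

With more copies of each unit in the database, each position has more candidates, so the search is more likely to find a sequence with low cost everywhere.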

Unit Selection Summary

• Advantages
– Quality is far superior to diphones

– Natural prosody selection sounds better

• Disadvantages:
– Quality can be very bad in places

• HCI problem: mix of very good and very bad is quite annoying

– Synthesis is computationally expensive

– Can’t synthesize everything you want:

• Diphone technique can move emphasis

• Unit selection gives good (but possibly incorrect) result

Slide from Richard Sproat

Current trend – machine learning

• Problems with Unit Selection Synthesis
– Can’t modify signal

– (mixing modified and unmodified sounds bad)

– But database often doesn’t have exactly what you want

• Solution: HMM (Hidden Markov Model) Synthesis
– Won the last TTS bakeoff.

– Sounds unnatural to researchers

– But naïve subjects preferred it

– Has the potential to improve on both diphone and unit selection.

Slide from Dan Jurafsky

HMM synthesis

• A speech synthesis technique based on HTK (Hidden Markov Model Toolkit)
• Developed by the HTS working group at
– the Department of Computer Science, Nagoya Institute of Technology
– the Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
• http://hts.sp.nitech.ac.jp

Features used in our Swedish HMM synthesis training

• Segment features:
– immediate context
– position in syllable
• Syllable features:
– stress and lexical accent type
– position in word and phrase
• Word features:
– number of syllables
– position in phrase
– morphological feature (compound or not)
– part-of-speech tag (content or function word)
• Phrase features:
– phrase length in terms of syllables and words
• Utterance features:
– length in syllables, words and phrases
• Speaker:
– dialect
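A few of the features listed above can be derived mechanically from a syllabified transcription. The sketch below is a hypothetical illustration of that step, not the HTS label format and not the feature extractor used for the Swedish voices. The input structure, the feature names and the rough transcription of "tar slut" are all assumptions.

```python
def segment_features(words):
    """Build one context-feature dict per segment.

    words: list of (pos_tag, syllables), where syllables is a list of
    (stressed, phones) pairs.
    """
    flat = [p for _, syls in words for _, phones in syls for p in phones]
    feats = []
    idx = 0  # index of the current segment in the whole utterance
    for w_i, (pos_tag, syls) in enumerate(words):
        for s_i, (stressed, phones) in enumerate(syls):
            for p_i, phone in enumerate(phones):
                feats.append({
                    "phone": phone,
                    # immediate context, with silence at utterance edges
                    "prev": flat[idx - 1] if idx > 0 else "sil",
                    "next": flat[idx + 1] if idx + 1 < len(flat) else "sil",
                    "position_in_syllable": p_i + 1,
                    "stressed": stressed,
                    "syllable_in_word": s_i + 1,
                    "syllables_in_word": len(syls),
                    "word_in_utterance": w_i + 1,
                    "pos_tag": pos_tag,
                })
                idx += 1
    return feats


# Rough, assumed transcription of "tar slut", for illustration only.
utterance = [
    ("content", [(True, ["t", "a:", "r"])]),
    ("content", [(True, ["s", "l", "u:", "t"])]),
]
features = segment_features(utterance)
print(features[3])
```

Phrase, utterance and dialect features would be added to each dict in the same way. They are left out here to keep the sketch short.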

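Dialects: Dalarna, Norrland, Skåne, Gotland, Svealand, Götaland

Example sentences (Swedish, with English gloss):
• vart tar universum slut (where does the universe end)
• centerpartister och kristdemokrater menar dock att brudparet kan slippa betala det kan bådas föräldrar göra (Centre Party members and Christian Democrats think, however, that the bridal couple need not pay; both sets of parents can do that)
• gula sidorna finns från och med i fredags sökbart via mobiltelefonen (as of last Friday, Gula Sidorna, the Yellow Pages, is searchable via mobile phone)
• om man till exempel tar en telefon och frågar hur den fungerar så svarar vetenskapsmannen med att lyfta på luren och slå numret (if, for example, you take a telephone and ask how it works, the scientist answers by lifting the receiver and dialling the number)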

Multimodal Synthesis (talking heads)

Input: tagged text string

Purpose: support synthesis with lip movements

Considerations: Synchronization, Human-like?

Why Talking Heads?

• The most natural form of communication that we know of is face-to-face interaction

• A virtual talking head can improve information transfer

– Through expression, gaze, head movements

– Through lip, cheek and tongue movements

Tasks of an Animated Agent

• Provide intelligible synthetic speech
• Indicate emphasis and focus in utterances
• Support turn-taking
• Give spatial references (gaze, pointing etc.)
• Provide non-verbal back-channeling
• Indicate the system’s internal state

Applications

• Improved speech synthesis
• Human-Computer Interface in spoken dialogue systems
• Aid for hearing impaired
• Educational software
• Stimuli for perceptual experiments
• Entertainment: games, virtual reality, movies etc.

Techniques for facial animation

• Talking head synthesis requires a signal model and a control model.

The signal model may be
• Video-based (2D)
– Enables realistic reproduction
– Lacking flexibility
• Muscle-based (3D)
– Highly flexible
– Difficult to gather anatomical data
– Computationally intensive (although less of a problem today...)
• Direct parameterisation/morphing-based (3D)
– Good compromise between flexibility and realism
– Simple data acquisition using optical methods

Direct parameterisation

• 3D-modelling

• Deformation through high-level parameters

• Different possible parameterisations:

– Articulatory oriented

– MPEG-4 (low-level)

Articulatory oriented direct parameterisation

• High-level parameterisation tailored for visual speech animation

• Parameter set includes
– Jaw opening

– Lip rounding

– Bilabial closure

– Labiodental closure

• Parameters are normalized relative to spatial targets

MPEG-4 direct parameterisation

• Original purpose: model based video coding

• The standard defines a generic face object

• 84 feature points (FPs)
• 68 facial animation parameters (FAPs)

• FAPs are normalized relative to distances in the face

• Expressed in FAPU (FAP unit)
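Normalising FAPs to distances in the face means that one FAP value produces proportionally scaled motion on faces of different size. The sketch below assumes the common convention that a FAP unit is 1/1024 of a distance measured on the neutral face (for example the mouth-nose separation). The function names and the millimetre figures are made up for the example.

```python
def fap_unit(neutral_distance):
    """One FAPU, assumed here to be 1/1024 of a neutral-face distance."""
    return neutral_distance / 1024.0


def fap_to_displacement(fap_value, neutral_distance):
    """Convert a FAP value (in FAPU) to a displacement on a given face."""
    return fap_value * fap_unit(neutral_distance)


# Same FAP value on two hypothetical faces with different mouth-nose separation.
small_face_mm = fap_to_displacement(512, neutral_distance=30.0)
large_face_mm = fap_to_displacement(512, neutral_distance=40.0)
print(small_face_mm, large_face_mm)
```

The animation stream therefore stays the same for every face model. Only the per-face neutral distances change.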

Measuring talking faces

• Cyberware scanning
– High definition 3D shape and texture

– Static

– Expensive equipment

• Motion capture
– Captures points in 3D

– Dynamic

– Starting to become affordable

• Video analysis
– Can provide texture, contours and points

– Inexpensive hardware

– Typically less accurate than motion capture

Collection of audio-visual databases

• Eliciting technique: information-seeking scenario
• Focus on the speaker who has the role of information giver
• The speaker sits facing 4 infrared cameras, a digital video camera and a microphone. The other person is only video recorded.

Recording and model

Emotional synthesis

Happiness Anger Fear

Surprise Disgust Sadness

Interactions: emotion and articulation (from AV speech database – EU/PF_STAR project)

Data-driven facial synthesis with MPEG-4 model

Happy Angry

Sad Surprised

An example SDS

The NICE system

Background

• NICE was an EU project, 2002-2005
• Five partners

– Liquid Media, Sweden

– Scansoft (initially Philips, now Nuance), Germany

– Limsi, France

– TeliaSonera, Sweden

– Nislab, Denmark

• Two systems were developed
– a Swedish fairy-tale game

– an English HC Andersen story-teller

The NICE system setup

A computer game with multimodal dialogue

• It puts new & different demands on speech and dialogue technology
– young users
– a game lasts for many hours
– many kinds of dialogue: problem solving, negotiation, information-seeking, socializing, command & control, multi-party, …
– the setup is multimodal and asynchronous

• Many things are easier than for ‘normal’ dialogue systems
– low penalty for recognition errors
– voluntary suspension of disbelief
– gamers already know all the clichés

• $ Might pay off big some day $

The fairy-tale game

• Two scenes – the room and the fairy-tale world
– Scene one – Multimodal training session (put-that-there)
– Scene two – Multi-party negotiation dialogue

• Three characters
– Cloddy Hans – slow but friendly Helper character
– Karen – fast and sullen Gatekeeper character
– Thumbelina – silent supporting character with attitudinal gestures

Inside a fairy-tale brain

• World model: Interrelated (Java) objects representing things, characters, and their relationships.

• Discourse history: Who said what when?

• Agenda: Goals, actions and their causal relationships.
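The three stores above can be pictured as plain data structures. The sketch below is a hypothetical Python illustration only: the slide says the actual world model consisted of Java objects, and none of the class names, fields or example entries here come from the NICE system.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """World model item: a thing or character and its relations to others."""
    name: str
    relations: dict = field(default_factory=dict)  # relation -> entity name


@dataclass
class Turn:
    """Discourse history item: who said what, and when."""
    time: float
    speaker: str
    text: str


@dataclass
class Goal:
    """Agenda item: a goal plus the goals that must be achieved before it."""
    description: str
    requires: list = field(default_factory=list)
    done: bool = False


@dataclass
class CharacterBrain:
    world: dict = field(default_factory=dict)
    history: list = field(default_factory=list)
    agenda: list = field(default_factory=list)

    def said_by(self, speaker):
        """All turns in the discourse history by one speaker."""
        return [t for t in self.history if t.speaker == speaker]

    def next_goals(self):
        """Open goals whose prerequisites have all been achieved."""
        done = {g.description for g in self.agenda if g.done}
        return [g for g in self.agenda
                if not g.done and all(r in done for r in g.requires)]


# Invented example content.
brain = CharacterBrain()
brain.world["lamp"] = Entity("lamp", {"located_in": "room"})
brain.history.append(Turn(1.0, "user", "pick up the lamp"))
brain.history.append(Turn(2.5, "character", "okay"))
brain.agenda = [Goal("hold lamp"), Goal("move lamp", requires=["hold lamp"])]
print([g.description for g in brain.next_goals()])
```

Keeping the causal links in the agenda lets the character explain why it is doing something, not only what it will do next.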

Event-based dialogue

Attentional and emotional output behaviors

Body gestures and physical actions

The End