Post on 19-Dec-2015
Goteborg University Dialogue Systems Lab
Introduction to dialogue systems
Staffan Larsson
Dialogsystem HT04 (Dialogue Systems, autumn term 2004)
Overview
• Why Develop Speech Applications for the Telephone (Larson ch. 1)
• Dialogue and dialogue genres
• Dialogue modeling and dialogue systems
• Research areas & local projects
• History of dialogue systems
• Methodology for dialogue systems design
• (Agents, dialogue and speech acts)
• (Dialogue games)
Chapter 1.
Why Develop Speech Applications for the Telephone
©2002 Larson Technical Services
Discussion Questions
• Why are speaking and listening fundamental to our lives?
• Why use speech to interact with a computer?
• When might speech not be appropriate for interacting with a computer?
• Why are speech applications possible today?
• What are the limitations of speech as a user interface for a computer?
• What can you do with a single call?
• What are some example speech applications?
Speaking and Listening Are Fundamental to Our Lives.
• People interact by speaking and listening.
• It’s “unnatural” when people don’t speak.
• We spend the first three years of our lives learning how to speak and listen.
Section 1.1
Speaking and Listening to a Computer Is Natural and Convenient.
• Despite physical handicaps such as blindness or poor physical dexterity
• To bypass the limitations of small keyboards and screens
• When the device has no keyboard
• When callers work with their hands and eyes
• At any time during the day
• With instant connection, without being placed on “hold”
• When languages do not lend themselves to keyboarding
• To convey emotion (“earcons”)
• To access all types of time-sensitive data
• To access all types of location-sensitive data
• To access all types of public and personal information
• To control computerized processes and activities
Section 1.2
When Speaking and Listening to a Computer May Be Inappropriate.
• Graphics
• Pointing
• Selecting
• Limitation of human memory
• Impact of noise
  – The computer cannot hear
  – The caller cannot hear
• Concern about privacy
Section 1.3
Speech Applications Are Possible Today
• Speech application enablers
  – Increased computing power at less expense
  – Improved algorithms
  – Improved dialog design
  – Availability of telephones and cell phones
Section 1.4
Speech Application Challenges
• Limitations of speech interfaces
  – Speech technologies are not perfect.
  – Callers have false expectations.
  – Speech is a transient medium.
Section 1.5
Types of Speech Applications
Application classes (columns) and the technologies they use (rows):

Technology           Touch-Tone only   Speech only   Touch-Tone + speech   Multimodal
Speech in            No                Yes           Yes                   Yes
Keypad               Yes               No            Yes                   Yes
Mouse and keyboard   No                No            No                    Yes
Speech out           Yes               Yes           Yes                   Yes
Display monitor      No                No            No                    Yes
What Can You Do with a Single Phone Call?
• Commerce
  – Self-service queries and transactions
  – Support desks, order tracking, airline arrival and departure, cinema and theater booking, home banking, e-commerce
• Content
  – Public information: community information; local, national, and international news; entertainment information
  – Personal information: calendar, addresses, telephone lists, to-do lists, shopping lists, calorie and exercise logs, personal diaries
• Communication
  – Initiating telephone calls; sending and receiving e-mail and voice mail
Section 1.6
What Can You Do with a Single Phone Call? (continued)
• New dial tone: “How may I help you?”
• Voice portals: verbal Web sites that enable caller interaction with multiple services by speaking and listening
Section 1.6
Voice Portals
• Mass market voice portal
  – E-mail
  – Driving instructions
  – Traffic conditions
  – Weather
  – Telephone number search
  – Business reminders
  – Local information
  – Stock quotes
  – Personalized news
  – Entertainment information
  – Sports
  – Horoscopes
• Corporate external voice portal
  – Telephone attendant
  – Product and service information
  – Order entry
  – Help desk
  – Banking
  – Sales
• Corporate internal voice portal
  – Customer relationship management
  – Product availability and pricing
  – Order status
  – Human resource information
  – Supply chain management
  – Customer account information
Section 1.7
Key Concepts
• Speech enables new applications and new users.
• Speech enables hands-free and eyes-free computer access.
• Speech enables callers to access computers from wherever they are: at work but away from their desk, at home, or on the road.
• Speech has drawbacks:
  – No pointing
  – Stretches the limits of human short-term memory
  – Privacy
• Callers use voice applications to access commerce, content, and communication services.
Dialogue and dialogue genres
Dialogue & dialogue systems
• Dia logos (Greek) = through language
• What is interesting for dialogue system applications?
  – Spoken natural-language human-computer dialogue in specific domains
• But maybe also
  – Written?
  – Multimodal (incl. graphics)?
Discourse and Dialogue
• Discourse (in one sense of the word)
  – Text; monologue
• Dialogue
  – Speech; multiple participants
• Really two independent dimensions
  – Modality: text / speech (/ gesture / image)
  – Monologue or dialogue
Dahlbäck (1997) taxonomy
• Modality: spoken/written
• Kinds of agents: human/computer
• Interaction: dialogue/monologue
• Context: spatial, temporal
• Number & type of tasks
  – Simultaneous?
• Dialogue-task distance
  – Similarity between dialogue structure and task structure
• Kinds of shared knowledge exploited
  – Perceptual, linguistic, cultural
Discussion: Dahlbäck
• Several dimensions, some relevant but some not
  – We currently assume spoken human-computer dialogue
  – Dialogue-task distance perhaps too abstract
  – Context, kinds of shared knowledge used, and number of tasks relevant, but not yet included in our classification
  – Type of task similar to our concept of activity
Allen et al. (2001)

Technique used       | Example task                 | Dialogue phenomena handled
finite-state script  | long-distance dialing        | user answers questions
frame-based          | getting train timetable info | user asks questions, simple clarifications by system
sets of contexts     | travel booking agent         | shifts between predetermined topics
plan-based models    | kitchen design consultant    | dynamically generated topic structures, collaborative negotiation subdialogues
agent-based models   | disaster relief management   | different modalities (e.g. planned world and actual world)

Task complexity increases from the first row (least complex) to the last row (most complex).
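The least complex technique in the table, the finite-state script, can be sketched as a fixed sequence of system prompts in which the system keeps the initiative and the user only answers. The states, prompts and number formats below are hypothetical illustrations of a long-distance dialing script, not taken from Allen et al.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical finite-state script for long-distance dialing. Each state maps to
# (prompt, validator for the user's answer, next state); None marks the final state.
State = Tuple[str, Callable[[str], bool], Optional[str]]

SCRIPT: Dict[str, State] = {
    "area_code": ("Which area code?", lambda s: s.isdigit() and len(s) == 3, "number"),
    "number": ("Which number?", lambda s: s.isdigit() and len(s) == 7, "confirm"),
    "confirm": ("Shall I dial now?", lambda s: s in ("yes", "no"), None),
}


def run_script(answers: List[str], start: str = "area_code") -> Dict[str, str]:
    """Walk the script with the system in control; the user only answers.

    An invalid answer leaves the automaton in the same state, so the
    same prompt is issued again on the next turn.
    """
    state: Optional[str] = start
    collected: Dict[str, str] = {}
    for answer in answers:
        if state is None:          # final state reached; ignore extra input
            break
        prompt, is_valid, next_state = SCRIPT[state]
        print(f"S: {prompt}  U: {answer}")
        if is_valid(answer):
            collected[state] = answer
            state = next_state
    return collected


result = run_script(["031", "abc", "7721000", "yes"])
print(result)
```

The fixed transition order is what makes this the least complex row: a frame-based system would instead let the user fill several slots in one utterance and in any order.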
Discussion: Allen et al.
• Relates properties of the system to properties of the activity, BUT
• Based on technologies, not properties of activities
  – Dialogue phenomena don’t necessarily come in lumps
• Focus on information seeking and collaborative planning; some types of dialogue not included
  – Tutorial, explanatory, instructional…
Desiderata for a classification of dialogue
• Based on multiple independent properties of (dialogue in) different activities
• Relating properties of activity to properties of system
• Covering not only information seeking and collaborative planning dialogue
Some initial dimensions of classification
• Inquiry-oriented vs. action-oriented dialogue
• Type of result: simple/complex
• Type of external process: active/passive
• Distribution of decision rights: shared/disjoint
Inquiry-oriented vs. action-oriented dialogue
• IOD: raising and addressing issues
  – E.g. database search
• AOD: introducing (non-communicative) actions to be performed (requests)
  – E.g. programming a video recorder

Dialogue genre           | Moves/rules      | Information state components
Inquiry-oriented (IOD)   | ask, answer      | question stack
Action-oriented (AOD)    | request, confirm | action stack
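A minimal sketch of the two genres' moves and stacks, assuming a deliberately simple information state: `ask` pushes a question onto the question stack and `answer` pops it, while `request` pushes an action onto the action stack and `confirm` pops it once the action has been carried out. The class, method names and the VCR example are hypothetical; this is not the GoDiS implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InformationState:
    questions: List[str] = field(default_factory=list)          # IOD: question stack
    actions: List[str] = field(default_factory=list)            # AOD: action stack
    shared: List[Tuple[str, str]] = field(default_factory=list)  # resolved (issue/action, result)

    # Inquiry-oriented moves
    def ask(self, question: str) -> None:
        self.questions.append(question)        # raise an issue

    def answer(self, content: str) -> None:
        question = self.questions.pop()        # address the topmost issue
        self.shared.append((question, content))

    # Action-oriented moves
    def request(self, action: str) -> None:
        self.actions.append(action)            # introduce an action to be performed

    def confirm(self) -> None:
        action = self.actions.pop()            # report the topmost action as done
        self.shared.append((action, "done"))


state = InformationState()
state.request("record_program")   # U: Record a programme.
state.ask("?channel")             # S: Which channel?  (embedded IOD subdialogue)
state.answer("channel 2")         # U: Channel two.
state.confirm()                   # S: The VCR is now set to record.
print(state.shared)
```

Using stacks means the embedded question is resolved before the request that triggered it, which is why the same system can mix the two genres.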
Result type
• Is the primary result of the dialogue a simple or a complex information object?
  – Simple: proposition, action
  – Complex: plan, proof, explanation
• Complex results require update rules and information state components (e.g. a tree) enabling incremental construction
• Example: offline planning
  U: Get me coffee
  R: How do I do that?
  U: First, go to the kitchen.
  R: OK. And then?
  U: Go to the coffee machine.
  …
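The offline planning example can be sketched with a tree-shaped information state component that grows one user instruction at a time. The `PlanNode` class is hypothetical, and the last two steps ("take cup", "press button") are invented only to show why a tree is needed: a step can itself be refined later.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """A node in a plan tree: a goal and the ordered sub-steps that achieve it."""
    goal: str
    steps: List["PlanNode"] = field(default_factory=list)

    def add_step(self, goal: str) -> "PlanNode":
        node = PlanNode(goal)
        self.steps.append(node)
        return node

    def linearize(self) -> List[str]:
        """Leaf goals in execution order (depth-first, left to right)."""
        if not self.steps:
            return [self.goal]
        out: List[str] = []
        for step in self.steps:
            out.extend(step.linearize())
        return out


plan = PlanNode("get coffee")          # U: Get me coffee      R: How do I do that?
plan.add_step("go to kitchen")         # U: First, go to the kitchen.   R: OK. And then?
plan.add_step("go to coffee machine")  # U: Go to the coffee machine.
# Hypothetical continuation: a step refined into sub-steps.
pour = plan.add_step("pour coffee")
pour.add_step("take cup")
pour.add_step("press button")
print(plan.linearize())
```

Each user instruction is one update to the tree, so the result is built incrementally rather than delivered in a single move.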
Proactivity of external process
• Passive: database, simple device (e.g. video recorder)
• (Pro)active: device, e.g. robot, burglar alarm
  – May need to interrupt the current dialogue, perhaps even interrupt user utterances
• This dimension correlates with
  – the way the system is connected to the device: is the device interface a resource (passive) or a module (active)?
  – system initiative and turn-taking mechanisms
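The resource/module distinction can be sketched as follows, assuming a single-threaded dialogue loop: a passive device is a resource that the dialogue manager queries only when it needs to, while an active device is a module that posts events into a queue, which the dialogue manager checks at the start of every turn so the system can take the initiative. All class and function names are hypothetical; interrupting the user mid-utterance would additionally require asynchronous input handling, which this sketch does not attempt.

```python
from collections import deque
from typing import Deque, Dict, List


class VCRResource:
    """Passive device: only answers when the dialogue manager asks."""
    def __init__(self) -> None:
        self.settings: Dict[str, str] = {}

    def query(self, key: str) -> str:
        return self.settings.get(key, "unknown")


class AlarmModule:
    """Active device: pushes events on its own initiative."""
    def __init__(self, events: Deque[str]) -> None:
        self.events = events

    def trigger(self, what: str) -> None:
        self.events.append(what)


def dialogue_turn(user_utterance: str, events: Deque[str]) -> List[str]:
    """One turn of a toy dialogue manager: pending device events are handled
    before the user's utterance, giving the system the initiative."""
    outputs: List[str] = []
    while events:
        outputs.append(f"SYSTEM INTERRUPTS: {events.popleft()}")
    outputs.append(f"handling user input: {user_utterance}")
    return outputs


events: Deque[str] = deque()
vcr = VCRResource()
alarm = AlarmModule(events)
print(vcr.query("channel"))                      # resource: polled on demand
alarm.trigger("burglar alarm in the kitchen")    # module: event arrives unprompted
print(dialogue_turn("record channel 2", events))
```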
Distribution of decision rights
• Disjoint: each question is directed to a specific dialogue participant (DP); this DP decides on the answer and does not need to negotiate
• Shared: some question(s) should be answered jointly; negotiation may be needed
• Dialogue system requirements for negotiation:
  – Dialogue move: propose
  – Information state component: a stack of pairs of
    • an issue under negotiation, and
    • alternative solutions/answers to this issue
• N.B.: here we refer to collaborative negotiation (non-conflicting goals)
  – E.g. SunDial furniture selection task
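The two negotiation requirements can be sketched together: a `propose` move adds an alternative to the issue on top of a stack of (issue, alternatives) pairs, and the issue is settled only when both participants support the same alternative. The acceptance rule (proposing implies accepting; agreement needs both "usr" and "sys") and all names are assumptions for illustration, loosely modelled on a collaborative furniture selection task.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

PARTICIPANTS = {"usr", "sys"}


@dataclass
class NegotiationState:
    # Stack of (issue under negotiation, alternative solutions/answers proposed so far).
    stack: List[Tuple[str, List[str]]] = field(default_factory=list)
    # For each issue: which participants currently support each alternative.
    support: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)
    # Issues that have been settled jointly.
    agreed: Dict[str, str] = field(default_factory=dict)

    def open_issue(self, issue: str) -> None:
        self.stack.append((issue, []))
        self.support[issue] = {}

    def propose(self, participant: str, alternative: str) -> Optional[str]:
        """The 'propose' move: add an alternative to the topmost issue.
        Proposing an alternative is taken to imply accepting it."""
        _, alternatives = self.stack[-1]
        if alternative not in alternatives:
            alternatives.append(alternative)
        return self.accept(participant, alternative)

    def accept(self, participant: str, alternative: str) -> Optional[str]:
        """Accept a proposed alternative; the issue is settled (and popped)
        only when every participant supports the same alternative."""
        issue, alternatives = self.stack[-1]
        if alternative not in alternatives:
            raise ValueError(f"{alternative!r} has not been proposed for {issue}")
        supporters = self.support[issue].setdefault(alternative, set())
        supporters.add(participant)
        if supporters >= PARTICIPANTS:
            self.agreed[issue] = alternative
            self.stack.pop()
            return alternative
        return None


neg = NegotiationState()
neg.open_issue("?which_chair")
neg.propose("sys", "red chair")             # S: How about the red chair?
neg.propose("usr", "green chair")           # U: I'd rather have the green one.
result = neg.accept("sys", "green chair")   # S: OK, the green chair then.
print(result, neg.agreed, neg.stack)
```

With disjoint decision rights, the same issue would instead be closed by a single `answer` from the responsible participant, and no alternatives list would be needed.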
Activity | IOD/AOD | Result type | External process | Decision rights
database search | IOD | simple: price etc.; complex: itinerary | passive (database) | disjoint
ticket booking | AOD + IOD | simple: flight | passive (database) | disjoint
simple device control | AOD + IOD | simple: actions | passive or active | disjoint
instructional (sys instructs usr) | AOD + IOD | simple: actions | passive (manual) | disjoint
offline planning, incl. itinerary planning, complex device control | AOD | complex: plan(s) | passive (planner) | shared
online planning, e.g. TRIPS | AOD + IOD | complex: plan | active (device + planner) | shared
explanation | IOD | complex: proof or explanation | passive (inference engine) | shared
tutorial | IOD/AOD | complex? | passive (planner) | disjoint
narration | IOD | complex: narrative | passive | disjoint
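The table can be read as a mapping from activity features to system requirements. The sketch below encodes a few of its rows and derives the components that the preceding slides associate with each feature value: question and action stacks for IOD and AOD, a structure for incremental construction of complex results, a module interface for active processes, and a `propose` move plus a negotiation stack for shared decision rights. The `requirements` function is a hypothetical summary of those slides, not an existing tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Activity:
    name: str
    genres: FrozenSet[str]   # subset of {"IOD", "AOD"}
    complex_result: bool
    active_process: bool
    shared_rights: bool


def requirements(a: Activity) -> List[str]:
    """Dialogue system features needed to take part in dialogues in this activity."""
    req: List[str] = []
    if "IOD" in a.genres:
        req += ["moves: ask, answer", "IS: question stack"]
    if "AOD" in a.genres:
        req += ["moves: request, confirm", "IS: action stack"]
    if a.complex_result:
        req.append("IS: structure for incremental construction (e.g. a tree)")
    if a.active_process:
        req.append("device interface as module; system initiative / interruption")
    if a.shared_rights:
        req += ["move: propose", "IS: stack of (issue, alternatives) pairs"]
    return req


TABLE = [
    Activity("ticket booking", frozenset({"AOD", "IOD"}), False, False, False),
    Activity("offline planning", frozenset({"AOD"}), True, False, True),
    Activity("online planning (e.g. TRIPS)", frozenset({"AOD", "IOD"}), True, True, True),
    Activity("explanation", frozenset({"IOD"}), True, False, True),
]

for activity in TABLE:
    print(activity.name, "->", requirements(activity))
```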
Possible additional activity-related factors
• Distribution of information
  – Symmetric: DPs have the same kind of information
  – Asymmetric: DPs have different kinds of information
  – Relation to distribution of decision rights?
• Shared or conflicting goals
  – Conflicting goals may lead to non-collaborative negotiation, which would require argumentation acts, including rhetorical acts
• Number of simultaneous tasks (one or several)
  – But probably very few activities with just one task
• …
Comments
• What we are really classifying are activities
  – The table shows a classification of activities according to the features a dialogue system needs in order to participate in dialogues in these activities
• How specific should our activities, or activity types, be?
  – Action-oriented dialogue? Device control? VCR control? Dialogue with Panasonic VCR 4500?
• Is “genre” still a useful term?
  – Could perhaps be reserved for very basic properties, such as IOD/AOD
  – Or have genres like “AOD for active devices and collaborative negotiation and asymmetric distribution of information”
Dialogue modelling and dialogue systems
Dialogue modelling
• Theoretical motivations
  – find structure of dialogue
  – explain structure
  – relate dialogue structure to informational and intentional structure
• Practical motivations
  – build dialogue systems to enable natural human-computer interaction
  – speech-to-speech translation
  – ...
Informal approaches to dialogue modelling
• speech act theory (Austin, Searle, ...)
  – utterances are actions
  – illocutionary acts: ask, assert, instruct, etc.
• discourse analysis (Schegloff, Sacks, ...)
  – turn-taking, pre-sequences, etc.
• dialogue games (Sinclair & Coulthard, ...)
  – structure of dialogue segments (rather than separate utterances)
  – can e.g. be encoded as regular expressions or finite automata
    • qna-game -> question qna-game* answer
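The rule `qna-game -> question qna-game* answer` can be checked with a small recognizer. Because the rule embeds qna-game inside itself, recognizing unbounded nesting needs a stack, provided here by recursion; if the nesting depth is bounded, the same pattern can be compiled into a finite automaton or a regular expression. The move labels "question" and "answer" are assumed to come from an earlier, hypothetical move-classification step.

```python
from typing import List


def parse_qna(moves: List[str], i: int = 0) -> int:
    """Recognize one qna-game starting at position i.

    Returns the index just after the game, or raises ValueError.
    """
    if i >= len(moves) or moves[i] != "question":
        raise ValueError(f"expected question at position {i}")
    i += 1
    # qna-game*: zero or more embedded games, each starting with a question
    while i < len(moves) and moves[i] == "question":
        i = parse_qna(moves, i)
    if i >= len(moves) or moves[i] != "answer":
        raise ValueError(f"expected answer at position {i}")
    return i + 1


def is_qna_game(moves: List[str]) -> bool:
    """True if the whole move sequence is exactly one qna-game."""
    try:
        return parse_qna(moves) == len(moves)
    except ValueError:
        return False


# S: Where do you want to go?  U: Which cities do you serve?
# S: Paris and London.         U: Paris.
print(is_qna_game(["question", "question", "answer", "answer"]))
```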
Dialogue management frameworks (computational approaches)
• Industry systems
  – finite-state automata
  – form-based (VoiceXML)
• Research systems
  – plan-based
    • speech acts as plan operators
  – general reasoning (Sadek, ...)
    • often combined with plan-based
• Information state approach
  – generalises over all the above
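The form-based approach can be sketched as below: the system repeatedly selects the first slot of the form that is still unfilled and prompts for it, which is the core idea of the VoiceXML form interpretation algorithm. This is a simplified Python sketch, not VoiceXML itself; the slot names, prompts and the toy `interpret` function are hypothetical. Unlike a finite-state script, one user utterance may fill several slots at once.

```python
import re
from typing import Dict, List, Optional

SLOTS = ["origin", "destination", "date"]
PROMPTS = {"origin": "Where from?", "destination": "Where to?", "date": "What day?"}
WEEKDAYS = r"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"


def interpret(utterance: str) -> Dict[str, str]:
    """Toy interpreter: picks out 'from X', 'to Y' and weekday names."""
    found: Dict[str, str] = {}
    if m := re.search(r"\bfrom (\w+)", utterance):
        found["origin"] = m.group(1)
    if m := re.search(r"\bto (\w+)", utterance):
        found["destination"] = m.group(1)
    if m := re.search(WEEKDAYS, utterance):
        found["date"] = m.group(1)
    return found


def next_slot(form: Dict[str, str]) -> Optional[str]:
    """Select the first unfilled slot (None when the form is complete)."""
    for slot in SLOTS:
        if slot not in form:
            return slot
    return None


def run_form(utterances: List[str]) -> Dict[str, str]:
    form: Dict[str, str] = {}
    for utterance in utterances:
        slot = next_slot(form)
        if slot is None:
            break
        print(f"S: {PROMPTS[slot]}  U: {utterance}")
        # Over-answering: fill every still-empty slot the utterance mentions,
        # not only the one that was asked for.
        for key, value in interpret(utterance.lower()).items():
            form.setdefault(key, value)
    return form


print(run_form(["From Goteborg to Paris", "on Friday"]))
```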
Why build dialogue systems?
• theoretical: test theories
  – e.g. what kind of information does the system need to keep track of?
  – problems
    • complex system with many components
    • how to evaluate (the Turing test is not so useful)
• practical: natural language interfaces
  – databases (train timetables etc.)
  – electronic devices (mobile phones, ...)
  – instructional/helpdesk systems
  – booking flights etc.
  – tutorial systems
What does a system need to be able to do?
• speech recognition
• parsing, syntactic and semantic interpretation
  – resolve ambiguities
  – anaphora and ellipsis resolution, etc.
• dialogue management
  – how does an utterance change the state of the dialogue?
  – given the current state of the dialogue, what should the system do?
• natural language generation
• speech synthesis
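The components above can be sketched as a pipeline around a dialogue manager that performs the two steps named on the slide: an update (how the utterance changes the dialogue state) and a selection (what the system should do next). Speech recognition and synthesis are stubbed out as text pass-through, and the interpretation rule, generation rule and price database are toy assumptions that do not reflect any particular system.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str]   # (move type, content), e.g. ("ask", "price")


def recognize(audio: str) -> str:        # stub ASR: "audio" is already text
    return audio.lower().strip(" ?.!")


def interpret(words: str) -> Move:       # toy parsing and semantic interpretation
    if words.startswith("how much"):
        return ("ask", "price")
    return ("answer", words)


def update(state: Dict[str, List], move: Move) -> None:
    """Update: integrate the user's move into the dialogue state."""
    kind, content = move
    if kind == "ask":
        state["issues"].append(content)
    state["history"].append(move)


def select(state: Dict[str, List], db: Dict[str, str]) -> Move:
    """Selection: given the current state, decide the next system move."""
    if state["issues"]:
        issue = state["issues"].pop()
        return ("answer", f"{issue} is {db.get(issue, 'unknown')}")
    return ("ask", "how can I help you")


def generate(move: Move) -> str:         # toy NLG: capitalize and punctuate
    kind, content = move
    return content[:1].upper() + content[1:] + ("?" if kind == "ask" else ".")


def synthesize(text: str) -> str:        # stub TTS
    return text


def turn(audio: str, state: Dict[str, List], db: Dict[str, str]) -> str:
    update(state, interpret(recognize(audio)))
    system_move = select(state, db)
    state["history"].append(system_move)
    return synthesize(generate(system_move))


state: Dict[str, List] = {"issues": [], "history": []}
reply = turn("How much is a ticket?", state, {"price": "200 SEK"})  # hypothetical data
print(reply)
```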
Why spoken dialogue?
• Spoken dialogue is the natural way for people to communicate
  – computers should adapt to humans rather than the other way around
• It is important to enable system and user to communicate in a natural (human-like) way
  – mixed initiative
  – turn-taking, feedback, barge-in
  – handling embedded subdialogues
  – ...
What’s happening with dialogue systems
• Simple systems are being used commercially
  – Command systems (user command + system response)
  – Form-filling (system questions + user responses; system delivers info)
• Limited domains
  – need to encode domain-specific knowledge
  – a general system would require general world knowledge, which may not be feasible
  – speech recognition is harder with a large lexicon
• Need to bridge the gap between dialogue theory and working systems
[Diagram: framework (dataflow, datastructures, etc.)]
Framework level
• Framework
  – Takes care of low-level programming: dataflow, datastructures, etc.
• Examples
  – Current, TrindiKit, OAA, Communicator, SOAR
[Diagram layers: framework (dataflow, datastructures, etc.); basic dialogue theory; basic system]
Basic dialogue system
• Formulate an application-independent dialogue theory to instantiate the framework
• Examples
  – GoDiS, VoiceXML, TRIPS, ...
[Diagram layers: framework (dataflow, datastructures, etc.); basic dialogue theory; basic system; genre-specific theory additions; genre-specific system]
Genre- and modality-specific system
• Add genre- and modality-dependent components
[Diagram layers: framework (dataflow, datastructures, etc.); basic dialogue theory; basic system; genre-specific theory additions; genre-specific system; domain & language resources; application]
Application
• Add application-specific resources
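The layers can be sketched as successive extensions of one object: the framework supplies the dataflow (a loop that applies update rules to an information state), the basic dialogue theory supplies application-independent rules, a genre-specific layer adds rules (here for action-oriented dialogue), and the application adds domain and language resources. The rule format, the VCR lexicon and every name below are hypothetical and do not reproduce TrindiKit or GoDiS.

```python
from typing import Callable, Dict, List

InfoState = Dict[str, object]
Rule = Callable[[InfoState], bool]   # returns True if the rule applied


class Framework:
    """Framework layer: dataflow and datastructures only, no dialogue theory."""
    def __init__(self) -> None:
        self.rules: List[Rule] = []
        self.resources: Dict[str, Dict[str, str]] = {}

    def add_rules(self, *rules: Rule) -> "Framework":
        self.rules.extend(rules)
        return self

    def add_resource(self, name: str, resource: Dict[str, str]) -> "Framework":
        self.resources[name] = resource
        return self

    def process(self, state: InfoState) -> InfoState:
        # Apply the first applicable rule, then start over; stop when none applies.
        changed = True
        while changed:
            changed = any(rule(state) for rule in self.rules)
        return state


def integrate_ask(state: InfoState) -> bool:       # basic dialogue theory
    move = state.get("latest_move")
    if isinstance(move, tuple) and move[0] == "ask":
        state.setdefault("questions", []).append(move[1])
        state["latest_move"] = None
        return True
    return False


def integrate_request(state: InfoState) -> bool:   # genre-specific (AOD) addition
    move = state.get("latest_move")
    if isinstance(move, tuple) and move[0] == "request":
        state.setdefault("actions", []).append(move[1])
        state["latest_move"] = None
        return True
    return False


def basic_system() -> Framework:
    return Framework().add_rules(integrate_ask)


def aod_system() -> Framework:
    return basic_system().add_rules(integrate_request)


def vcr_application() -> Framework:
    # Application layer: domain and language resources (hypothetical lexicon).
    return aod_system().add_resource("lexicon", {"record": "record_program"})


app = vcr_application()
state: InfoState = {"latest_move": ("request", app.resources["lexicon"]["record"])}
app.process(state)
print(state)
```

Each layer only adds rules or resources to the one below it, so the basic system on its own leaves a request untouched, while the application handles it.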