Universität Ulm, Informationstechnik - Dialogsysteme
Dirk Bühler, Universität Ulm, 13.10.2004
Towards Embedding VoiceXML Applications Through Compilation
[email protected] (Harman/Becker AG)
Outline
• Introduction
• Compilation procedure
• Preliminary results
• Conclusions, Future work
Outline
• Introduction
  - Motivation
  - Problems with VXML interpretation
• Compilation procedure
• Preliminary results
• Conclusions, Future work
Application Scenario
Automotive Environment
• Closed environment
  - no need to access dynamic data via the web
• Tasks: control onboard devices such as
  - Navigation
  - Radio, CD, MP3 entertainment
  - Telephone (dialing)
Regarding VoiceXML, a static subset based on forms is sufficient for modeling the needed functionality.
Typical Integration / Automotive
• Integration into cars is currently done as
  - Hardware integration
  - Software integration
• Compilation: e.g. GDML (TEMIC dialog description language) → C code → native executable/hardware
• MTBF (mean time between failures) is an issue: typically > 10 years required
Problems with Traditional VXIs
• HTTP/file serving for documents
• VXML/ECMA parsing (SRGS?)
• ECMA interpretation
⇒ Resource consumption and control are problematic
[Diagram: VoiceXML Interpreter, Document Server, Implementation Platform; arrow labelled "request document"]
Why the VoiceXML Standard is Relevant
• Standard for dialogue development
• Need developers, e.g. external application providers
• “Buzzword” for clients
EcmaScript and its Role in VoiceXML
ECMA-262: Standardized version of JavaScript (1999)
• Dynamically typed imperative language
• Expressive: records + first-class functions = objects
• Compact Profile: ECMA-327
In VoiceXML, it may be used
• on the client side
• for event handling, cf. HTML,
• for simple calculations (e.g. checking credit card numbers)
• and for accessing platform objects
⇒ The VoiceXML specification is based on EcmaScript
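As an illustration of the "simple calculations" case, the sketch below checks a digit string with the Luhn checksum, which is one common way to validate credit card numbers. The slides do not say which check is meant, so the choice of Luhn and the function name luhnValid are assumptions. In a VoiceXML document such a function would typically sit in a <script> element and be called from a cond attribute or a <filled> handler.

```javascript
// Hypothetical helper (not from the slides): Luhn checksum test for a
// string of decimal digits.
function luhnValid(digits) {
  if (!/^\d+$/.test(digits)) { return false; }  // reject empty or non-digit input
  var sum = 0;
  var dbl = false;  // double every second digit, counting from the right
  for (var i = digits.length - 1; i >= 0; i--) {
    var d = digits.charCodeAt(i) - 48;  // '0' has character code 48
    if (dbl) {
      d *= 2;
      if (d > 9) { d -= 9; }  // same as adding the two digits of the product
    }
    sum += d;
    dbl = !dbl;
  }
  return sum % 10 === 0;
}
```

The code uses only language features available in the EcmaScript edition of that time (no block scoping, no arrow functions), so it stays close to what a compact-profile interpreter could run.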
„Lines like these“
Outline
• Introduction
• Compilation procedure
  - Approach
  - VoiceXML example
  - Form interpretation algorithm
• Preliminary results
• Conclusions, Future work
Compilation Approach
• Off-line document retrieval: VoiceXML dialogs, grammars, scripts
• No dynamically created documents
• Compilation to EcmaScript
  - Copy script code verbatim
  - Include VoiceXML <script> elements and external scripts
• Adherence to the Compact Profile (as far as possible)
• Platform integration through simple APIs for ASR and TTS
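A minimal sketch of what such a platform API could look like from the compiled code's point of view. The slides only state that the APIs are "simple"; the name prompt appears in the compiled example, while makeMockPlatform, recognize, and the convention of throwing "noinput" are assumptions made here for illustration. The mock collects TTS output in a log and replays scripted ASR results, which is enough to exercise compiled dialogs without speech hardware.

```javascript
// Hypothetical mock platform (names are assumptions, not the author's API).
// TTS output is collected in a log; ASR results are replayed from a
// scripted list, where null stands for "the user said nothing".
function makeMockPlatform(scriptedInputs) {
  var log = [];
  var next = 0;
  return {
    log: log,
    // TTS side: record what the system would say.
    prompt: function (text) {
      log.push("S: " + text);
    },
    // ASR side: return the next scripted utterance or throw "noinput".
    recognize: function (grammars) {
      var input = next < scriptedInputs.length ? scriptedInputs[next++] : null;
      if (input === null) { throw "noinput"; }
      log.push("U: " + input);
      return { utterance: input, grammars: grammars };
    }
  };
}
```

A real integration would replace the two functions with calls into the target's ASR and TTS engines while keeping the same narrow interface.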
Advantages
At run-time:
• No HTTP serving
• No XML parsing (VoiceXML and grammars)
• No EcmaScript parsing
⇒ Resources, control, optimization, integration
⇒ Intellectual property (?)
[Diagram: VoiceXML Application, Document Server, Implementation Platform; arrow labelled "request document"]
Translating a VoiceXML <form>
• Primary <form> components:
  - items: <field>, <block>, ...
  - <filled> handlers
  - <grammar> elements
• VoiceXML specifies a “dialog” data structure, corresponding to the current form.
• Idea: Generate this data structure to be processed by a separate Form Interpretation Algorithm (FIA).
Translating a <field> item
• Spec: Item value is represented as an EcmaScript variable in dialog data structure
• Shadow variables, for storing visit count, events, etc.
• Important components:
  - @cond, specifying arbitrary guard conditions
  - <grammar>, for ASR (incl. semantics)
  - <prompt>, for TTS and audio
  - event handlers
• Executable parts may be translated as methods (function objects)
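The following is a hand-written approximation of what the translation of a single field could look like; it is a sketch, not actual compiler output. The field name size, the variable wantsPizza, the placeholder grammar string, and the helper isSelectable are all hypothetical. Unlike the compiled code, where prompt calls the platform directly, the prompt method here returns the text so that the example is easy to check.

```javascript
// Hand-written approximation for a hypothetical
// <field name="size" cond="wantsPizza"> ... </field>.
var wantsPizza = true;             // hypothetical application variable
var dialog = {};
dialog.size = undefined;           // item variable: holds the field's value
dialog.size$ = {                   // shadow variable: bookkeeping and methods
  count: 0,                        // prompt (visit) counter
  cond: function () { return wantsPizza; },   // translated @cond guard
  grammars: ["small | medium | large"],      // placeholder, not real SRGS
  prompt: function (count$) {      // translated <prompt> elements
    return count$ >= 2 ? "Say small, medium or large." : "Which size?";
  }
};

// An item is eligible for selection while its variable is undefined and
// its guard condition holds.
function isSelectable(dialog, name) {
  return dialog[name] === undefined && dialog[name + "$"].cond();
}
```

Keeping the value in dialog.size and everything else in dialog.size$ mirrors the item/shadow-variable split described above, so scripts copied verbatim from the VoiceXML source can still refer to the field by its plain name.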
Example: VoiceXML → EcmaScript
// EcmaScript
document$.pizza_order = {
  grammars: [ loadGrammarFromURI("pizza.grxml") ],
  init: function (params$) {
    [...]
    dialog.initial1 = (undefined);
    dialog.initial1$ = {
      cond: function () {
        [...]
        return (document.doInit());
      },
      grammars: [],
      prompt: function (count$) {
        [...]
        if (true) {
          prompt("May I take your order?");
        }
        if (count$ == 2) {
          prompt("You can say something like: [...]");
        }
      },
      handle: function (_event, count$) {
        [...]
        if (count$ == 1 && (_event == "noinput")) {
          $doprompt = true;
          throw 'continue';
        } else if (count$ == 2 && (_event == "noinput")) {
          initial1 = ('nothing');
          $doprompt = true;
          throw 'continue';
        } else throw _event;
      }
    };
    [...]
<!-- VoiceXML -->
<vxml version="2.0">
<form id="pizza_order">
  <grammar src="pizza.grxml"/>
  <initial name="initial1" cond="document.doInit()">
    <prompt> May I take your order? </prompt>
    <prompt count="2"> You can say something like: [...] </prompt>
    <noinput count="1">
      <reprompt/>
    </noinput>
    <noinput count="2">
      <assign name="initial1" expr="'nothing'"/>
      <reprompt/>
    </noinput>
  </initial>
  [...]
Run-time trace:
Entering form pizza_order
Activate grammar pizza.grxml
Evaluate initial1
Evaluate document.doInit()
Update doInit$.count
Initiate prompts
System: May I take your order?
User: <no input>
Handle exception "noinput": reprompt
S: You can say something like: [...]
U: <no input>
Handle exception "noinput": assign initial1
S: Would you like ......
Form Interpretation Algorithm
• Coded completely in EcmaScript
While processing a form:
• Select field to execute, or quit
• Collect item:
  - Play (execute) item's prompts
  - Pass grammars to recognizer
  - Determine type of result
  - and handle exceptions
• Process result and execute fillers
• Goto 10
Data structure members used in these steps (slide annotations):
• dialog data structure
• item$, item$.cond()
• item$.count, item$.prompt()
• item$.grammars, application.lastresult$
• item$.handle()
• dialog.filled()
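The loop above can be sketched in a few lines of EcmaScript. This is a simplified illustration under stated assumptions, not the author's FIA: the function name runForm, the form.items list, and the platform.recognize call are hypothetical, and event handlers simply return instead of throwing 'continue' as in the compiled example. The small form at the end mirrors the two "no input" turns of the pizza example.

```javascript
// Minimal FIA sketch. form.items lists item names; for each name,
// form[name] is the item variable and form[name + "$"] its shadow object.
function runForm(form, platform) {
  for (;;) {
    // 1. Select the first item whose variable is undefined and whose guard holds.
    var name = null;
    for (var i = 0; i < form.items.length; i++) {
      var n = form.items[i];
      if (form[n] === undefined && form[n + "$"].cond()) { name = n; break; }
    }
    if (name === null) { return form; }   // nothing left to fill: quit
    var item$ = form[name + "$"];

    // 2. Collect: play the item's prompts, pass its grammars to the recognizer.
    item$.count += 1;
    item$.prompt(item$.count);
    var result;
    var event = null;
    try { result = platform.recognize(item$.grammars); }
    catch (e) { event = e; }

    if (event !== null) {
      // Events such as "noinput" go to the item's handler.
      item$.handle(event, item$.count);
    } else {
      // 3. Process the result and run the filled handler.
      form[name] = result;
      form.filled(name);
    }
    // 4. "Goto 10": continue with the next iteration.
  }
}

// Usage: a scripted recognizer that reports "no input" twice.
var out = [];
var inputs = [null, null];
var platform = {
  recognize: function (grammars) {
    var r = inputs.shift();
    if (r === null || r === undefined) { throw "noinput"; }
    return r;
  }
};
var form = {
  items: ["initial1"],
  initial1: undefined,
  initial1$: {
    count: 0,
    grammars: [],
    cond: function () { return true; },
    prompt: function (count$) {
      out.push(count$ == 2 ? "You can say something like: [...]"
                           : "May I take your order?");
    },
    handle: function (event, count$) {
      if (event == "noinput" && count$ == 2) { form.initial1 = "nothing"; }
      else if (event != "noinput") { throw event; }
    }
  },
  filled: function (name) { out.push("filled " + name); }
};
runForm(form, platform);
```

Because selection only looks at the item variable and the guard, assigning initial1 in the second handler is what ends the loop, just as the <assign> in the VoiceXML source does.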
Outline
• Introduction
• Compilation procedure
• Preliminary results
  - VXML to Java
  - VXML on PocketPC
  - Limitations
• Conclusions, Future Work
Compiler
• Written in Haskell (GHC)
• OS independent: Linux, Windows, Solaris, ...
• VoiceXML parsing and recursive document retrieval (about 300 lines of code)
• Compilation into text (about 300 lines of code)
Component      Format   Size [KB]
FIA            .JAR     32
Front-end      .JAR     57
Rhino          .JAR     600
Demo (Pizza)   .VXML    6
               .JS      15
               .JAR     42
Experiment 1: VoiceXML to Java
• Compilation to .class files using RHINO JavaScript interpreter (Mozilla Project)
• OS independent: Windows, Linux, ...
• Requires: JRE / Java Web Start
• Integration with FreeTTS,
• Typed input, SRGS processor written in Java
⇒ Shows that VXML → ECMA compilation is feasible.
Demo: http://it.e-technik.uni-ulm.de/~buehler
Experiment 2: VoiceXML goes PocketPC
• Compilation to EcmaScript
• Inclusion in HTML (“Embedding”)
• Viewing with Pocket Internet Explorer
⇒ VoiceXML interpretation on PocketPC by means of EcmaScript interpretation (without Java)
Demo: http://it.e-technik.uni-ulm.de/~buehler
Experiment 2: Some glitches
• (Very) Limited grammar support
• FIA has to yield control, or else page will be in “Loading” state all the time (no interaction)
• Small modifications in FIA:
  1. Save state (form, item) and interrupt execution.
  2. On user input, resume with saved state.
• Still, VoiceXML application on limited device.
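The save-and-resume modification can be sketched as follows. This is an illustrative reconstruction under assumptions, not the modified FIA itself: makeResumableFIA, the form layout with items and prompts, and the say callback are hypothetical names. Instead of blocking in a loop, step() runs until input is needed, records the pending item, and returns, so the page leaves the "Loading" state; a UI event handler later calls onInput to continue.

```javascript
// Event-driven FIA sketch: control is yielded to the browser whenever
// user input is required.
function makeResumableFIA(form, say) {
  var saved = null;                      // { name: ... } while waiting for input
  function step() {
    for (var i = 0; i < form.items.length; i++) {
      var name = form.items[i];
      if (form[name] === undefined) {
        say(form.prompts[name]);
        saved = { name: name };          // 1. save state and interrupt execution
        return true;                     // still waiting for input
      }
    }
    saved = null;
    return false;                        // form complete
  }
  return {
    start: step,
    // 2. On user input (e.g. from a text box handler), resume with saved state.
    onInput: function (text) {
      if (saved === null) { return false; }
      form[saved.name] = text;
      return step();
    },
    waitingFor: function () { return saved === null ? null : saved.name; }
  };
}

// Usage with a hypothetical two-item order form.
var said = [];
var order = {
  items: ["size", "topping"],
  prompts: { size: "Which size?", topping: "Which topping?" },
  size: undefined,
  topping: undefined
};
var fia = makeResumableFIA(order, function (text) { said.push(text); });
var waiting1 = fia.start();              // prompts for the size, then yields
var waiting2 = fia.onInput("large");     // fills size, prompts for the topping
var waiting3 = fia.onInput("salami");    // fills topping, form is complete
```

Since the only state that must survive between events is the pending item, the change to a loop-based FIA stays small, which matches the "small modifications" noted above.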
Limitations
VoiceXML 2.0 subset, similar to X+V:
• Mixed-initiative forms
• In-line and external grammars
• Links, exception handling
• Limited support for subdialogs
Important missing features:
• Application / document / session handling
Glitch: application/document scoping for <script>.
Outline
• Introduction
• Compilation procedure
• Preliminary results
• Conclusions, Future Work
Conclusions
• Practical compilation of a subset of VoiceXML
• No dynamically created documents
• Easy to use demos, also for off-line use
• VoiceXML’s dependency on EcmaScript is actually advantageous
• OS-independent compilation feasible and sensible in certain environments, i.e. towards embedded systems
Future Work
• Automotive environment:
  - Compiler from ECMA to assembler for integration in existing technology (TEMIC)
  - Embedded Java as environment for SW integration in future cars (BMW Car IT)
• Platform integration for ASR
• Extend coverage 1: documents, applications
• Extend coverage 2: VoiceXML 2.1 goodies, e.g. <data> for obtaining XML data without <submit>
• Simplify compilation: Perhaps XSLT?
• Useful for “traditional VXIs”: Cache?
• Go SourceForge?