tr2007-003 february 2007 ai magazine, vol. 28, no. 2 ... · ai magazine, vol. 28, no. 2, summer...

15
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com DiamondHelp: A Generic Collaborative Task Guidance System Charles Rich and Candace L. Sidner TR2007-003 February 2007 Abstract DiamondHelp is a generic collaborative task guidance system motivated by the current usabil- ity crisis in high-tech home products. It combines an application-independent conversational interface (adapted from online chat programs) with an application-specific direct manipulation interfaces. DiamondHelp is implemented in Java and uses Collagen for representing and using task models. AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved. Copyright c Mitsubishi Electric Research Laboratories, Inc., 2007 201 Broadway, Cambridge, Massachusetts 02139

Upload: others

Post on 26-Oct-2019

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

MITSUBISHI ELECTRIC RESEARCH LABORATORIEShttp://www.merl.com

DiamondHelp: A Generic CollaborativeTask Guidance System

Charles Rich and Candace L. Sidner

TR2007-003 February 2007

Abstract

DiamondHelp is a generic collaborative task guidance system motivated by the current usabil-ity crisis in high-tech home products. It combines an application-independent conversationalinterface (adapted from online chat programs) with an application-specific direct manipulationinterfaces. DiamondHelp is implemented in Java and uses Collagen for representing and usingtask models.

AI Magazine, Vol. 28, No. 2, Summer 2007.

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in partwithout payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies includethe following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment ofthe authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, orrepublishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. Allrights reserved.

Copyright c©Mitsubishi Electric Research Laboratories, Inc., 2007201 Broadway, Cambridge, Massachusetts 02139

Page 2: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

MERLCoverPageSide2

Page 3: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

DiamondHelp:A Generic Collaborative Task Guidance System

Charles Rich and Candace L. Sidner

2 DiamondHelp is a generic collaborativetask guidance system motivated by thecurrent usability crisis in high-tech homeproducts. It combines an application-independent conversational interface(adapted from online chat programs) withan application-specific direct manipulationinterfaces. DiamondHelp is implementedin Java and uses Collagen for representingand using task models.

Our diagnosis of the current usability crisisin high-tech home products (see sidebar thispage) identifies two fundamental underlyingcauses: the exhaustion of conventional in-teraction paradigms and the lack of consis-tency in user interface design. This paperaddresses both of these causes by introducinga new framework for building collaborativetask guidance systems, called DiamondHelp.

The dominant current paradigm forhuman-computer interaction is direct ma-nipulation (Shneiderman & Plaisant 2005),which can be applied with great effectivenessand elegance to those aspects of an interfacewhich afford natural and intuitive analogiesto physical actions, such as pointing, drag-ging, sliding, etc. In practice, however, mostinterfaces of any complexity also includean ad hoc and bewildering collection ofother mechanisms, such as tool bars, pop-upwindows, menus, command lines, etc.

To make matters worse, once one goes be-yond the most basic functions (such as open,save, etc.) there is very little design con-sistency between interfaces to different prod-ucts, even from the same manufacturer. Theresult is that it requires too large an invest-ment of the typical user’s time to learn all theintricacies of each new product.

The DiamondHelp framework consists ofthree components:

• an interaction paradigm based ontask-oriented human collaboration,

• a graphical user interface design which

The Usability Crisis in High-Tech Home Products

. . . technology remains far too hard for most folks to use and mostpeople can only utilize a tiny fraction of the power at their disposal.(Business Week, 2004)

Half of all “malfunctioning products” returned to stores byconsumers are in full working order, but customers can’t figure outhow to operate the device. (Reuters, 2006)

Figure 1. DiamondHelp for Combination Washer-Dryer.

combines conversational and directmanipulation interfaces,

• and a software architecture of reusableJava Beans.

The first two of these components are ori-ented towards the user; the third componentaddresses the needs of the software developer.Each of these components is described in de-tail in a section below.

1

Page 4: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

Figure 1 illustrates the DiamondHelp userinterface design. The top half of the screenis the generic conversational interface, whilethe bottom half is a direct manipulationinterface for a combination washer-dryer.Although our immediate motivation hasbeen the usability crisis in DVD recorders,programmable thermostats, combinationwasher-dryers, refrigerator-ovens, etc., Dia-mondHelp is not limited to home appliances;it can be applied to any software interface.

DiamondHelp grew out of a longstand-ing research thread on human-computer col-laboration, organized around the Collagensystem (see Collagen sidebar). Our workon Collagen, however, has been primarilyconcerned with the underlying semantic andpragmatic structures for modeling collabora-tion, whereas DiamondHelp is focused on theappearance (the “look and feel”) that a col-laborative system presents to the user. Al-though Collagen plays a key role in our im-plementations of DiamondHelp applications(see Software Architecture), the Diamond-Help framework can also be used indepen-dently of Collagen.

Finally, in relation to this special issue ofAI Magazine, we would say that mixed initia-tive is at the heart of DiamondHelp since col-laboration and mixed initiative are virtuallyco-extensive concepts: All collaborative sys-tems are mixed initiative; and most interest-ing mixed-initiative systems are collaborative(see further discussion in Mixed Initiative).

Interaction ParadigmAlthough an “interaction paradigm” is some-what abstract and difficult to define formally,it is crucial to the overall development of in-teractive systems. In particular, the consis-

GOALS (etc.)

GUI

SYSTEMcommunication

observe observe

action action

USER

Application

Figure 2. The Collaborative Paradigm.

tent expression and reinforcement of the in-teraction paradigm in the user interface de-sign (what the user sees) leads to systemsthat are easier to learn and use. The in-teraction paradigm also provides organizingprinciples for the underlying software archi-tecture, which makes such systems easier tobuild.

The collaborative paradigm for human-computer interaction, illustrated in Figure 2,mimics the relationships that typically holdwhen two humans collaborate on a task in-volving a shared artifact, such as two me-chanics working on a car engine together ortwo computer users working on a spreadsheettogether. This paradigm differs from the con-ventional view of interfaces in two key re-spects.

First, notice that the diagram in Figure 2is symmetric between the user and the sys-tem. Collaboration in general involves bothaction and communication by both partici-pants, and in the case of co-present collabo-ration (which is the focus of DiamondHelp)the actions of each participant are directlyobserved by the other participant.

One consequence of this symmetry is thatthe collaborative paradigm spans, in a prin-cipled fashion, a very broad range of interac-tion modes, depending on the relative knowl-edge and initiative of the system versus theuser. For example, “tutoring” (aka intelligentcomputer-aided instruction) is a kind of col-laboration in which the system has most ofthe knowledge and initiative. At the otherend of the range is “intelligent assistance,”wherein the user has most of the knowledgeand initiative. Furthermore, a collabora-tive system can easily and incrementally shiftwithin this range depending on the currenttask context.

Second, and at a deeper level, the primaryrole of communication in collaboration is notfor the user to tell the system what to do (thetraditional “commands”), but rather to es-tablish and negotiate about goals and how toachieve them. J.C.R. Licklider observed thisfundamental distinction over 40 years ago ina classic article, which is well worth revisiting(see first page sidebar and Lesh et al. 2004).

The problem with the command view ofuser interfaces is that it demands too muchknowledge on the part of the user. Conven-tional systems attempt to compensate for thisproblem by adding various kinds of ad hochelp facilities, tool tips, wizards, etc., whichare typically not integrated into any clearparadigm. (See the Related Work for specificdiscussion of the relation of DiamondHelp to

2

Page 5: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

J.C.R. Licklider in 1960 on Human-Computer Collaboration

[Compare] instructions ordinarily addressed to intelligent human beings with instructions ordinarily used withcomputers. The latter specify precisely the individual steps to take and the sequence in which to take them. Theformer present or imply something about incentive or motivation, and they supply a criterion by which the humanexecutor of the instructions will know when he has accomplished his task. In short: instructions directed tocomputers specify courses; instructions directed to human beings specify goals. (Licklider 1960)

wizards and Microsoft’s Clippy.) In contrast,collaboration encompasses all of these capa-bilities within the paradigm.

Finally, notice that Figure 2 does providesa place to include direct manipulation as partof the collaborative paradigm, i.e., one canchoose to implement the application’s graph-ical user interface (GUI) using direct manip-ulation. This is, in fact, exactly what wehave done in the user interface design for Di-amondHelp, described in the next section.

User Interface DesignOur overarching goal for the DiamondHelpuser interface design was to signal, as stronglyas possible, a break from the conventional in-teraction style to the new collaborative par-adigm. A second major challenge was to ac-comodate what is inescapably different abouteach particular application of DiamondHelp(the constructs needed to program a thermo-stat are very different from those needed toprogram a washing machine), while preserv-ing as much consistency as possible in thecollaborative aspects of the interaction. Ifsomeone knows how to use one DiamondHelpapplication, they should know how to use anyDiamondHelp application.

In order to address these concerns, we di-vided the screen into two areas, as shown inthe example DiamondHelp interfaces of Fig-ures 1, 3 and 4. The washer-dryer has beenimplemented in Java; the other two appli-ances are Flash simulations. The top half ofeach of these screens is the same distinctiveDiamondHelp conversational interface. Thebottom half of each screen is an application-specific direct manipulation interface. Divid-ing the screen into two areas is, of course, notnew; our contributions are the specific graph-ical interface design and the reusable softwarearchitecture described below.

Conversational Interface

To express and reinforce the human-computer collaboration paradigm, whichis based on human-human communica-tion, we adopted the scrolling speech bubble

Figure 3. DiamondHelp for DVD Recorder.

Figure 4. DiamondHelp for Programmable Thermostat.

3

Page 6: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

metaphor often used in online chat programs,which support human-human communica-tion. The bubble graphics nicely reflect thecollaborative paradigm’s symmetry betweenthe system and the user, discussed above.This graphical metaphor also naturallyextends to the use of speech synthesis andrecognition technology (see Conclusion).

The top half of every DiamondHelp inter-face is thus a conversation (“chat”) betweenDiamondHelp (the system), represented bythe Mitsubishi three-diamond logo on theleft, and the user, represented by the hu-man profile on the right. All communicationbetween the user and system takes place inthese bubbles; there are no extra toolbars,pop-up menus, etc.

The basic operation of the conversationalpart of the DiamondHelp interface is as fol-lows. Let’s start with the opening of the con-versation, which is the same for all Diamond-Help applications:

After the system says its welcome, a userbubble appears with several options for theuser to choose from. We call this the user’s“things to say” bubble. At the opening ofthe conversation, there are only four choices:“What next?,” “oops,” “done,” and “help.”Notice that the last three of these options arespecially laid out with icons to the right ofthe vertical line; this is because these threeoptions are always present (see PredefinedThings to Say below).

At this point, the user is free either toreply to the system’s utterance by selectingone of the four things to say or to interactwith the direct manipulation part of the in-terface. Unlike the case with traditional “di-alog boxes,” the application GUI is neverlocked. This flexibility is a key aspect of howwe have combined conversational and directmanipulation interfaces in DiamondHelp.

In this scenario, the user decides to re-quest task guidance from the system by se-lecting “What next?” in the things-to-saybubble. As we have argued elsewhere (seeCollagen sidebar article), every interactivesystem should be able to answer a “What(can I do) next?” question.

As part of the interface animation, when-

ever the user makes a things-to-say selection(by clicking or touching the desired word orphrase), the not-selected items are erased andthe enclosing bubble is shrunk to fit onlythe selected word or phrase. Furthermore, inpreparation for the next step of the conver-sation, the completed system and user bub-bles are “grayed out” and scrolled upward (ifnecessary) to make room for the system re-ply bubble and the user’s next things-to-saybubble:

DiamondHelp applications always replyto “What next?” by a “You can. . . ” utter-ance with the actual options for the possiblenext task goals or actions presented inside theuser’s things-to-say bubble:

In this scenario, from DiamondHelp for aprogrammable thermostat (see Figure 4), thedirect manipulation interface shows the fam-ily’s schedule for the week. The entries in thisschedule determine the thermostat’s temper-ature settings throughout the day. The userchooses the goal of removing a schedule en-try:

The system replies by asking the user toindicate, by selection in the direct manipula-tion interface, which schedule entry is to beremoved. Notice that the user’s things-to-say bubble below includes only three choices.“What next?” is not included because thesystem has just told the user what to do next:

Although the system is expecting the usernext to click (or touch) in the direct manip-ulation interface (see Figure 5), the user isstill free to use the conversational interface,e.g, to ask for help.

Scrolling History

As in chat windows, a DiamondHelp user canat any time scroll back to view parts of the

4

Page 7: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

conversation that have moved off the screen.This is particularly useful for viewing ex-planatory text, which can be quite long (e.g.,suppose the system utterance in Figure 3 wasseveral lines longer).

The oval beads on each side of the upperhalf of the screen in Figures 1, 3 and 4 arestandard scroll bar sliders. We also supportscrolling by simply dragging anywhere on thelined background of the upper window.

An interesting DiamondHelp extension toexplore is allowing the user to “restart” theconversation at an earlier point in the his-tory and move forward again with differentchoices. This could be a good way to sup-port the backtracking paradigm for problemsolving and is also closely related to our ear-lier work on history-based transformations inCollagen; see (Rich & Sidner 1998) in Colla-gen sidebar.

Things to Say

The user’s things-to-say bubble is partly anengineering compromise to compensate forthe lack of natural language understanding(see Conclusion). It also, however, partlyserves the function of suggesting to the userwhat she can do at the current point.

In the underlying software architecture,each choice in the things-to-say bubble is as-sociated with the semantic representation ofan utterance. When the user selects the de-sired word or phrase, the architecture treatsthe choice as if the missing natural-languageunderstanding system produced the associ-ated semantics.

From a design perspective, this meansthat the wording of the text displayed foreach things-to-say choice should read natu-rally as an utterance by the user in the con-text of the ongoing conversation. For ex-ample, in Figure 1, in reply to the system’squestion “How can I help you?”, one of thethings-to-say choices is “Explain wash agita-tion,” not “wash agitation.”

At a deeper level, to be true to the spiritof the collaborative paradigm, the content ofthe conversations in DiamondHelp, i.e., boththe system and user utterances, should con-cern not only primitive actions (“Remove aschedule entry”), but also higher-level goals(“Schedule a vacation”) and motivation(“Why?”).

When Collagen is used in the implemen-tation of a DiamondHelp application, all sys-tem utterances and the user’s things-to-saychoices are automatically generated from thetask model given for the application (see Col-lagen Collaboration Plug-in). Without Colla-

Figure 5. Selection in the Direct Manipulation Area.(See small triangular cursor in first column of schedule.)

gen, the appropriate collaborative content isprovided by the application developer, guidedby the principles elucidated here.

An obvious limitation of the things-to-say approach is that there is only room fora relatively small number of choices insidethe user bubble—six or eight without somekind of nested scrolling, which we would liketo avoid. Furthermore, since we would alsolike to avoid the visual clutter of drop-downchoices within a single user utterance, eachthing-to-say is a fixed phrase without vari-ables or parameters.

Given these limitations, our design strat-egy is to use the direct manipulation inter-face to enter variable data. For example, inDiamondHelp for a washer-dryer, instead ofthe system asking “How long is the dry cy-cle?” and generating a things-to-say bubblecontaining “The dry cycle is 10 minutes,” “. . .15 minutes,” etc., the system says “Please setthe dry cycle time,” points to the appropri-ate area of the direct manipulation interface,and expects the user to enter the time viathe appropriate graphical widget. Figure 5is a similar example of using the direct ma-nipulation interface to provide variable data,in this case to select the thermostat scheduleentry to be removed.

Use of a textual things-to-say list togetherwith Collagen and speech recognition hasbeen studied by Sidner and Forlines (Sidner& Forlines 2002) for a personal video recorderinterface and by Dekoven (DeKoven 2004) fora programmable thermostat.

5

Page 8: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

Predefined Things to Say

Most of the things to say are specific to an ap-plication. However, a key part of the genericDiamondHelp design is the following set ofuser utterances which should have the samemeaning in all applications: “What next?,”“Never mind,” “Oops,” “Done,” and “Help.”In addition, to save space and enhance ap-pearance, we have adopted a special layoutfor the last three of these utterances, whichare always present.

“What next?” has already been discussedabove. “Never mind” is a way to end aquestion without answering it (see Figure 1).“Oops,” “Done,” and “Help” each initiate asubdialog in which the system tries to deter-mine, respectively, how to recover from a mis-take, what level of task goal has been com-pleted, or what form of help the user desires.

When Collagen is used in the implemen-tation of DiamondHelp, the semantics of thepredefined things-to-say are automaticallyimplemented correctly; otherwise this is theresponsibility of the collaboration plug-inimplementor (see Software Architecture).

Task Bar

An additional component of the Diamond-Help conversational interface is the “taskbar,” which is the single line located im-mediately above the scrollable history andbelow the DiamondHelp logo in Figures 1and 3 (the task bar in Figure 4 happens tobe blank at the moment of the screen shot).For example, in Figure 1, the contents of thetask bar reads:Make a new cycle > Adjust wash agitation > Help

The task bar, like the scrollable historyand the predefined “What next?” utterance,is a mechanism aimed towards helping userswhen they lose track of their context, which isa common problem in complex applications.When Collagen is used in the implementationof DiamondHelp, the task bar is automati-cally updated with the path to the currentlyactive node in the task model tree (see Fig-ure 8 and Collagen sidebar). Otherwise, it isthe responsibility of the collaboration plug-in implementor to update the task bar withappropriate task context information.

Currently, the task bar is for display only.However, we have considered extending thedesign to allow users to click (touch) elementsof the displayed path and thereby cause thetask focus to move. This possible extension isclosely related to the scrollable history restartextension discussed above.

Application GUI

The bottom half of each screen in Figures 1,3 and 4 is an application-specific direct ma-nipulation interface. The details of theseGUI’s are not important to the main pointof this paper. If fact, there is nothing in theDiamondHelp framework that guarantees orrelies upon the application GUI being well-designed or even that it follow the direct ma-nipulation paradigm, though this is recom-mended. (The only way to guarantee thiswould be for DiamondHelp to automaticallygenerate the application GUI from a taskmodel. We have done some research on thisapproach (Eisenstein & Rich 2002), but it isstill far from practical.)

Furthermore, to achieve our overallgoal of consistency across applications, anindustrial-grade DiamondHelp would, likeconventional UI toolkits, provide standardcolor palettes, widgets, skins, etc. However,this is beyond the scope of a researchprototype.

Two design constraints which are reliedupon by the DiamondHelp framework are:(1) the user should be able to use the appli-cation GUI at any time, and (2) every useraction on the application GUI is reported tothe system (see Manipulation Plug-in below).

Finally, it is worth noting that, for thethree applications presented here, it is possi-ble to perform every function the applicationsupports entirely using the direct manipula-tion interface, i.e., totally ignoring the con-versational window. While this is a pleasantfact, we also believe that in more complexapplications, there may be some overall ad-vantage in relaxing this constraint. This is adesign issue we are still exploring.

Software ArchitectureIn contrast to most of the above discussion,which focuses on the the user’s view of Di-amondHelp, this section addresses the needsof the software developer. The overarchingissue in DiamondHelp’s software architectureis reuse, i.e., factoring the code so that thedeveloper of a new DiamondHelp applicationonly has to write what is idiosyncratic to thatapplication, while reusing as much genericDiamondHelp framework code as possible.

Figure 6 shows our solution to this chal-lenge, which we have implemented in JavaBeans and Swing. (Dotted lines indicate op-tional, but recommended, components.) Allthe application-specific code is contained intwo “plug-ins,” which are closely related tothe two halves of the user interface. The rest

6

Page 9: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

Model

View

ManipulationPlug−in

user action

userutterance

observation

Collagen

Task Model

utterancesystem

application GUI update

user bubble

system bubble

thingsto say

task bar

user manipulation

system action

userchoice

CollaborationPlug−in

DiamondHelp

Figure 6. DiamondHelp Software Architecture.

of the code (shaded region) is generic Dia-mondHelp framework code.

In addition to the issues discussed in thissection, there are a number of other standardfunctions of plug-and-play architectures (ofwhich DiamondHelp is an instance), such asdiscovering new devices connected to a net-work, loading the appropriate plug-ins, etc.,which are beyond the scope of this paper.

Manipulation Plug-in

Notice that the manipulation plug-in “sticksout” of the DiamondHelp framework box inFigure 6. This is because it is directly respon-sible for managing the application-specificportion of the DiamondHelp interface. Di-amondHelp simply gives this plug-in a Swingcontainer object corresponding to the bottomhalf of the screen.

We recommend that the manipulationplug-in provide a direct-manipulation styleof interface implemented using the model-view architecture, as shown by the dottedlines in Figure 6. In this architecture, allof the “semantic” state of the interface isstored in the model subcomponent; the viewsubcomponent handles the application GUI.

Regardless of how the manipulation plug-in is implemented internally, it must pro-vide an API with the DiamondHelp frame-work which includes two event types: out-going events (user action observation) whichreport state changes resulting from user GUIactions, and incoming events (system action)which specify desired state changes. Further-

more, in order to preserve the symmetry ofthe collaborative paradigm (see Figure 2), itis the responsibility of the plug-in to updatethe GUI in response to incoming events, sothat the user may observe system actions.As an optional but recommended feature, themanipulation plug-in may also provide theDiamondHelp framework with the graphicallocation of each incoming state change event,so that DiamondHelp can move a cursor orpointer to that location, to help the user ob-serve system actions.

Note that the content of both incomingand outgoing state change events is specifiedin semantic terms, e.g., “change dry cycletime to 20 min,” not “button click at pixel100,200.” Lieberman (Lieberman 1998) fur-ther discusses this and other issues relatedto interfacing between application GUI’s andintelligent systems.

Collaboration Plug-in

Basically, the responsibility of the collabora-tion plug-in is to generate the system’s re-sponses (actions and utterances) to the user’sactions and utterances. Among other things,this plug-in is therefore responsible for im-plementing the semantics of the predefinedutterances (see next section).

The collaboration plug-in has two in-puts: observations of user actions (receivedfrom the manipulation plug-in), and userutterances (resulting from user choicesin the things-to-say bubble). It also hasfour outputs: system actions (sent to the

7

Page 10: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

manipulation plug-in), system utterances(which go into system bubbles), things tosay (which go into user bubbles), and thetask bar contents.

Notice that the collaboration plug-in is re-sponsible for providing only the content of thesystem and user bubbles and the task bar.All of the graphical aspects of the conversa-tional window are managed by DiamondHelpframework code. It is also an important fea-ture of the DiamondHelp architecture thatthe collaboration plug-in developer does notneed to be concerned with the graphical de-tails of the application interface. The collab-oration plug-in developer and the manipula-tion plug-in developer need only to agree ona semantic model of the application state.

For a very simple application, the collabo-ration plug-in may be implemented by a state

ChooseDiscourse State

Agenda

DiscourseGeneration

Interpretation

user utterance

user action

system action

system utterance

observation

Task Model

Discourse

things to say

task bar

Figure 7. Collagen Architecture (See Figure 6).

select last day

add entry remove entry modify entry schedule vacation

program thermostat

select first day

Figure 8. Task Model for Programmable Thermostat.

machine or other ad hoc mechanisms. How-ever, in general, we expect to use the Colla-gen version described in the next section.

Collagen Collaboration Plug-in

The Collagen version of the collaborationplug-in includes an instance of Collagen, witha little bit of wrapping code to match the col-laboration plug-in API. Collagen has alreadybeen described in an earlier article (see side-bar). We will only highlight certain aspectshere which relate to DiamondHelp.

The most important reason to use theCollagen collaboration plug-in is that the ap-plication developer only needs to provide onething: a task model. All four outputs ofthe collaboration plug-in described above arethen automatically generated by Collagen asshown in Figure 7.

System utterances and actions areproduced by Collagen’s usual responsegeneration mechanisms (Rich et al. 2002).The semantics of some of the predefined Di-amondHelp user utterances, such as “Oops,”“Done,” and “Help,” “ are captured in ageneric task model, which is used by Collagenin addition to the application-specific taskmodel provided. Each of these predefinedutterances introduces a generic subgoal withan associated subdialog (using Collagen’sdiscourse stack). Other predefined utter-ances, such as “What next?”, and “Nevermind,” are built into Collagen.

The third collaborative plug-in outputproduced by Collagen is the user’s thingsto say. Basically, the things to say are theresult of filtering the output of Collagen’sexisting discourse (agenda) generation algo-rithm. The first step of filtering is removeeverything except expected user utterances.Then, if there are still too many choices, aset of application-independent heuristics areapplied based on the type of communicativeact (proposal, question, etc.). Lastly, prede-fined utterances are added as appropriate.The further details of this algorithm need tobe the topic of a separate paper.

Mixed InitiativeIn this section, we review DiamondHelp withrespect to the shared themes of this specialissue of AI Magazine on mixed-initiative as-sistants.

To begin, let us follow up on our earliercomment that mixed initiative and collab-oration are virtually co-extensive concepts.Horvitz provides as good a definition as anyof mixed initiative:

8

Page 11: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

Collagen: Applying Collaborative Discourse Theory to Human-Computer InteractionCharles Rich, Candace L. Sidner and Neal Lesh

AI Magazine, Vol. 22, No. 4, pp. 15–25, Fall 2001(Special Issue on Intelligent User Interfaces)

Collagen (for collaborative agent) isJava middleware for building collabo-rative interface agents based on Groszand Sidner’s SharedPlan theory of col-laborative discourse (Grosz & Sid-ner 1986; 1990; Grosz & Kraus 1996;Lochbaum 1998). Collagen has beenused to build more than a dozen pro-totypes, including:

• air travel planning assistant (Rich& Sidner 1998)

• email assistant (Gruen et al. 1999)• VCR programming assistant

(Sidner & Forlines 2002)• power system operation assistant

(Rickel et al. 2001)• gas turbine engine operation tutor

(Davies et al. 2001)• flight path planning assistant

(Cheikes & Gertner 2001)• recycling resource allocation

assistant• software design tool assistant• programmable thermostat helper

(DeKoven et al. 2001)• mixed-initiative multi-modal form

filling• robot hosting system (Sidner et al.

2005)

The two key data structures in Col-lagen’s architecture (see Figure 7) arethe task model and the discourse state.The two key algorithms are discourseinterpretation and generation.

Task Model. A task model is an ab-stract, hierarchical, partially orderedrepresentation of the actions typicallyperformed to achieve goals in the ap-plication domain. Figure 8 shows afragment of the task model for theprogrammable thermostat applicationin Figure 4. Although this exam-ple does not, task models in generalalso contain alternative decompositionchoices. Collagen currently providesa Java extension language for defin-ing task models, but we will adopt thenew CEA standard (see Conclusion) assoon as it is completed.

Discourse State. The discoursestate tracks the beliefs and intentionsof the participants in the current col-laboration and provides a focus of at-tention mechanism for tracking shiftsin the task and conversational con-text. Collagen’s discourse state isimplemented as a goal decomposition(plan) tree plus a stack of focus spaces(each of which includes a focus goal).

The top goal on the focus stack is the“current purpose” of the discourse.

Discourse Interpretation. The ba-sic job of discourse interpretation isto incrementally update the discoursestate by explaining how each occurringevent (action or utterance by the useror the system) contributes to the cur-rent purpose. An event contributes toa purpose if it either:

• directly achieves the purpose,• is a step in the task model for

achieving the purpose,• selects one of alternative

decomposition choices forachieving the purpose,

• identifies which participant shouldachieve the purpose, or

• identifies a parameter of thepurpose.

If the current event contributes,either directly or indirectly, to thecurrent purpose, then the plan treeis appropriately extended to includethe event; the focus stack may alsobe pushed or popped. Our dis-course interpretation algorithm ex-tends Lochbaum’s (1998) original for-mulation by incorporating plan recog-nition (Lesh, Rich, & Sidner 1999;2001) and handling interruptions.

Discourse Generation. The dis-course generation algorithm in Colla-gen is essentially the inverse of dis-course interpretation. Based on thecurrent discourse state, it producesa prioritized list, called the agenda(see Figure 7), of partially or totallyspecified utterances and actions whichwould contribute to the current dis-course purpose according to the fivecases itemized above.

In general, an agent may use any ap-plication-specific logic it wants to de-cide on its next action or utterance. Inmost cases, however, an agent can sim-ply choose the first item on the agendagenerated by Collagen.

References

Cheikes, B., and Gertner, A. 2001.Teaching to plan and planning to teachin an embedded training system. InProc. 10th Int. Conf. on Artificial In-telligence in Education, 398–409.

Davies, J.; Lesh, N.; Rich, C.; Sidner,C.; Gertner, A.; and Rickel, J. 2001.Incorporating tutorial strategies into

an intelligent assistant. In Proc. Int.Conf. on Intelligent User Interfaces,53–56.

DeKoven, E.; Keyson, D.; andFreudenthal, A. 2001. Designing col-laboration in consumer products. InProc. ACM Conf. on Computer Hu-man Interaction, Extended Abstracts,195–196.

Grosz, B. J., and Kraus, S. 1996. Col-laborative plans for complex group ac-tion. Artificial Intelligence 86(2):269–357.

Grosz, B. J., and Sidner, C. L. 1986.Attention, intentions, and the struc-ture of discourse. Computational Lin-guistics 12(3):175–204.

Grosz, B. J., and Sidner, C. L. 1990.Plans for discourse. In Cohen, P. R.;Morgan, J. L.; and Pollack, M. E.,eds., Intentions and Communication.Cambridge, MA: MIT Press. 417–444.

Gruen, D.; Sidner, C.; Boettner, C.;and Rich, C. 1999. A collaborative as-sistant for email. In Proc. ACM Conf.on Computer Human Interaction, Ex-tended Abstracts, 196–197.

Lesh, N.; Rich, C.; and Sidner,C. 1999. Using plan recognitionin human-computer collaboration. InProc. 7th Int. Conf. on User Mod-elling, 23–32.

Lesh, N.; Rich, C.; and Sidner, C.2001. Collaborating with focused andunfocused users under imperfect com-munication. In Proc. 9th Int. Conf. onUser Modelling, 64–73.

Lochbaum, K. E. 1998. A collab-orative planning model of intentionalstructure. Computational Linguistics24(4):525–572.

Rich, C., and Sidner, C. 1998.Collagen: A collaboration managerfor software interface agents. UserModeling and User-Adapted Interac-tion 8(3/4):315–350.

Rickel, J.; Lesh, N.; Rich, C.; Sid-ner, C.; and Gertner, A. 2001. Us-ing a model of collaborative dialogueto teach procedural tasks. In WorkingNotes of AI-ED Workshop on TutorialDialogue Systems, 1–12.

Sidner, C. L.; Lee, C.; Kidd, C.;Lesh, N.; and Rich, C. 2005. Explo-rations in engagement for humans androbots. Artificial Intelligence 166(1-2):104–164.

9

Page 12: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

...an efficient, natural interleaving ofcontributions by users and automatedservices aimed at converging on solu-tions to problems. (Horvitz 1999)

Compare this with the definition of collab-oration upon which Collagen (and thereforeDiamondHelp) is based:

...a process in which two or more par-ticipants coordinate their actions to-ward achieving shared goals. (Colla-gen sidebar article)

If the two participants in a collaboration arethe user and an automated system, then sucha collaboration is generally mixed-initiative;only in those rare or simple circumstanceswhere one participant provides no contribu-tion to any goal (including clarification, suc-cess/failure report, etc.) is mixed-initativelacking. Conversely, the only part of the de-finition of collaboration that is missing fromthe definition of mixed initiative is the ex-plicit mention of “shared goals” (and it is per-haps implicit in “converging on solutions”).One could imagine some sort of interactionwith interleaved contributions without sharedgoals, but it is hard to believe it would bevery efficient or natural.

Division of Labor and Control

In general in a collaboration, decisions aboutwho does what and when are matters for ne-gotiation between the participants. In manyconcrete settings, however, many of these de-cisions are conventionalized by the context—consider, for example, the collaboration be-tween the customer and clerk at a retailcounter. Similarly, DiamondHelp has beendesigned with a certain expected (though ad-justable, see Personalization below) divisionof labor and control.

Basically, DiamondHelp assumes thatthe user knows what she wants to do at ahigh level, but needs help carrying out thenecessary details. Furthermore, Diamond-Help normally returns control to the useras quickly as possible (by putting “Whatnext?” in the user bubble).

Architecture

The key to DiamondHelp’s power and flexi-bility as a mixed-initiative system is the highdegree of symmetry between the user and thesystem in all aspects of the architecture, fromthe user interface, to discourse interpretation,to the task model. Many interactive sys-tems, even mixed-initiative ones, have builtinasymmetries which limit their flexibility.

The symmetry of the chat-based user in-terface has already been discussed above. InFigure 7, notice that the discourse interpre-tation process is applied to both user and sys-tem utterances and actions. This is a reflec-tion of the fact that the collaborative dis-course state represents the mutual beliefs ofthe participants (see (Sidner 1994) for moredetails).

Finally, and most fundamentally, the taskmodel which drives a DiamondHelp interac-tion generally specifies only what needs tobe done, not who should do it. (The onlyexception is when one participant is funda-mentally incapable of performing some ac-tion.) For example, consider the two stepsto achieve the schedule vacation goal in Fig-ure 8. This task model accounts for an inter-action in which the user selects both the firstday and the last day via the direct manipu-lation area, and also an interaction in whichthe system selects both days (perhaps basedon other knowledge), or interactions in whichone day is selected by the user and one bythe system. Which interaction actually oc-curs depends on the actions and knowledgeof the user and the system in the particularsituation.

Communication and Shared Awareness

As discussed earlier, DiamondHelp is basedon a model of co-present collaboration involv-ing a shared artifact/application (see Fig-ure 2). Our user interface design attemptsto give the feeling of natural communication,without actually requiring natural languageor speech processing. For shared awareness ofthe application state, DiamondHelp relies to-tally on the graphical user interface providedvia the manipulation plug-in. DiamondHelpitself provides two mechanisms to supportshared awareness of the task/conversationstate: the task bar and the scrollable chatwindow. Collagen also provides a much morecomplete representation of task/conversationstate, called the segmented interaction his-tory (see Collagen sidebar article), which isnot currently incorporated into the design ofDiamondHelp.

Personalization

In addition to the significant personalizationand adaptation inherent in all collaborativeinteractions, there are at least two specificpersonalization mechanisms in DiamondHelpworth mentioning here. Both of these mecha-nisms take advantage of facilities in Collagen,such as the student model, developed for in-telligent tutoring (Rickel et al. 2002), which

10

Page 13: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

(C) DiamondHelp(A) No Guidance (B) Printed Manual

Figure 9. Three Conditions in Planned User Study.

is not surprising, since personalization is akey aspect of intelligent tutoring.

First, as discussed above, DiamondHelpnormally returns control to the user asquickly as possible. However, based onsimple observations of the user’s behavior,such as timing and errors, DiamondHelp canswitch into a mode where it takes controland guides the user through an entire taskor subtask, without the user having to ask“What next?” at each step.

A second personalization has to do withwhether the system asks the user to per-form certain manipulations on the applica-tion GUI, or simply performs them itself. Forexample, in the washer-dryer application, thesystem can either say “Please click here topop up the temperature dial,” or it can sim-ply pop up the temperature dial window atthe appropriate time itself. The former casehas the benefit that the user learns where toclick in the future; the latter case has the ben-efit of being faster. DiamondHelp can switchbetween these modes on a per action basis,depending on whether the user has alreadyperformed the action once or twice herself.

Evaluation

Figure 9 shows the three conditions in a userstudy we are currently planning to evalu-ate DiamondHelp using the washer-dryer ex-ample. The printed manual in condition Bwill contain the same information (literallythe same text) which is communicated dy-namically by DiamondHelp in condition C.Users in each condition will be assigned aset of tasks which will require them to usethe advanced programmability features of thewasher-dryer. We plan to obtain both objec-tive measures, such as time and quality oftask completion, and subjective evaluationsof experience.

Related WorkThe most closely related work to Diamond-Help, in terms of both application and ap-proach, is the Roadie system (Lieberman &Espinosa 2006), which is also directed to-wards consumer electronics and shares thetask-model approach. Roadie differs from Di-amondHelp primarily in not using a conversa-tional or collaborative model for its interface.Roadie also incorporates natural languageunderstanding and commensense knowledgecomponents for guessing the user’s desires,

Also closely related is the Writer’s Aidsystem (Babaian, Grosz, & Shieber 2002),which applies the same collaborative theoryupon which DiamondHelp is based to a singlespecific task, rather than developing a generaltool.

Of the papers in this special issue of AIMagazine, the most closely related is (Fergu-son & Allen 2006). The collaborative modeland implementation architecture used inthat work is very similar to those describedhere. The scope of their work, however,is broader than ours, since they also buildnatural language understanding components.Furthermore, their architecture includes aBDI (Beliefs, Desires and Intentions) compo-nent, which is present only in vestigial formin DiamondHelp/Collagen.

BDI architecture is at the heart of themixed-initiative personal assistant of (Meyers& others 2006). They describe the interactionparadigm used in their work as “delegative”,and contrast it with the collaborative para-digm. We would view delegation as being in-cluded within the range of collaboration; see(Grosz & Kraus 1996) in Collagen sidebar.

Another category of related work isgeneric software architectures for connectingarbitrary applications with help facilities.The best know example of this categoryis Microsoft’s Clippy. Clippy’s interactionparadigm might be described as “watching

11

Page 14: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

over the user’s shoulder and jumping inwhen you have some advice,” and is, inour opinion, the main reason for Clippy’swell-known unpopularity among users. Incontrast, the collaborative paradigm un-derlying DiamondHelp emphasizes ongoingcommunication between the system and userto maintain shared context.

Also in this category are so-called “wiz-ards,” such as software installation wizards,etc., which come in many different forms frommany different manufacturers. Our feelingabout wizards is that wizards embody theright paradigm, i.e., interactively guiding theuser, but in too rigid a form. DiamondHelpsubsumes the capabilities of wizards, but alsoallows users to take more initiative when theywant to.

Conclusion

The contributions of this work are twofold.First, we have explicated a novel user inter-face design, which expresses and reinforcesthe human-computer collaboration paradigmby combining conversational and direct ma-nipulation interfaces. Second, and more con-cretely, we have produced the DiamondHelpsoftware, which can be used by others to eas-ily construct such interfaces for new applica-tions.

A number of small extensions to Dia-mondHelp have already been discussed in thebody of this paper. Another general set of ex-tensions have to do with adding speech andnatural language understanding technology.

Adding text-to-speech generation to thesystem bubble is a very easy extension, whichwe have already done (it’s an optional featureof the software). Furthermore, we have foundthat using a system with synthesized speechis surprisingly more pleasant than without,even when user input is still by touch or click.

Adding speech recognition can be donein two ways. The first, more limited way,is to use speech recognition to choose one ofthe displayed things to say. We have alreadydone this (another optional feature) and havefound that the speech recognition is in gen-eral quite reliable, because of the small num-ber of very different utterances being recog-nized. Furthermore, because of the way thatthe things-to-say mechanism is implemented,this extension requires no changes in the col-laboration or manipulation plug-ins for anapplication.

A second, and much more ambitious ap-proach, is to dispense with the things-to-saybubble and try to recognize anything the user

says, which, of course, requires broad un-derlying speech and natural language under-standing capabilities. If one has broad nat-ural language understanding, another varia-tion is to support unrestricted typed inputfrom the user instead of speech.

Alternatively, it would be trivial (at leastfrom the user interface point of view) to re-vert the conversational interface to its origi-nal function as a chat between two humans,the user and a remote person. This could beuseful, for example, for remote instruction ortroubleshooting. The system could even au-tomatically shift between normal and “livechat” mode.

Finally, returning to the original motiva-tion of this work, namely the usability crisisin high-tech home products, the ConsumerElectronics Association (http://www.ce.org)has recently created a new working groupon Task-Based User Interface, which will de-velop a standard (CEA-2018) for representingtask models. This standard has the potentialof helping systems like DiamondHelp to havea significant impact on the usability problem.

Acknowledgements

The work described here took place at Mit-subishi Electric Research Laboratories be-tween approximately January 2004 and June2006. The basic DiamondHelp concept wasdeveloped in collaboration with Neal Leshand Andrew Garland. Shane Booth providedgraphic design input. Markus Chimani im-plemented key parts of the graphical interfacein Java.

ReferencesBabaian, T.; Grosz, B.; and Shieber, M. 2002. Awriter’s collaborative aid. In Proc. Int. Conf. onIntelligent User Interfaces, 7–14.

DeKoven, E. 2004. Help Me Help You: DesigningSupport for Person-Product Collaboration. Ph.D.Dissertation, Delft Inst. of Technology.

Eisenstein, J., and Rich, C. 2002. Agents andGUIs from task models. In Proc. Int. Conf. onIntelligent User Interfaces, 47–54.

Ferguson, G., and Allen, J. 2006. Mixed-initiative systems for collaborative problem solv-ing. AI Magazine. To appear.

Horvitz, E. 1999. Uncertainty, action and inter-action: In pursuit of mixed-initiative computing.IEEE Inteligent Systems 14(5):17–20.

Lesh, N.; Marks, J.; Rich, C.; and Sidner,C. 2004. “Man-computer symbiosis” revisited:Achieving natural communication and collabo-ration with computers. IEICE Transactions Inf.& Syst. E87-D(6):1290–1298.

12

Page 15: TR2007-003 February 2007 AI Magazine, Vol. 28, No. 2 ... · AI Magazine, Vol. 28, No. 2, Summer 2007. This work may not be copied or reproduced in whole or in part for any commercial

Licklider, J. C. R. 1960. Man-computer symbio-sis. IRE Trans. Human Factors in ElectronicsHFE-1:4–11.

Lieberman, H., and Espinosa, J. 2006. A goal-oriented interface to consumer electronics usingplanning and commonsense reasoning. In Proc.Int. Conf. on Intelligent User Interfaces, 226–233.

Lieberman, H. 1998. Integrating user interfaceagents with conventional applications. In Proc.Int. Conf. on Intelligent User Interfaces, 39–46.

Meyers, K., et al. 2006. An intelligent per-sonal assistant for task and time management.AI Magazine. To appear.

Rich, C.; Lesh, N.; Rickel, J.; and Garland, A.2002. A plug-in architecture for generating col-laborative agent responses. In Proc. 1st Int.J. Conf. on Autonomous Agents and MultiagentSystems.

Rickel, J.; Lesh, N.; Rich, C.; Sidner, C.; andGertner, A. 2002. Collaborative discourse theoryas a foundation for tutorial dialogue. In 6th Int.Conf. on Intelligent Tutoring Systems, 542–551.

Shneiderman, B., and Plaisant, C. 2005. De-signing the User Interface: Strategies for Ef-fective Human-Computer Interaction. Reading,MA: Addison-Wesley.

Sidner, C. L., and Forlines, C. 2002. Subsetlanguages for conversing with collaborative in-terface agents. In Int. Conf. on Spoken LanguageProcessing.

Sidner, C. L. 1994. An artificial discourse lan-guage for collaborative negotiation. In Proc. 12thNational Conf. on Artificial Intelligence, 814–819.

Charles Rich is a Distinguished Research Sci-entist and Associate Director of the Research Labat Mitsubishi Electric Research Laboratories. Heearned his Ph.D. at the MIT Artificial Intelli-gence Lab, where he was a founder and directorof the Programmer’s Apprentice project. He isa Fellow and past Councilor of AAAI, as well ashaving served as Chair of the 1992 Int. Conf.on Principles of Knowledge Representation andReasoning, Cochair of the 1998 National Conf.on Artificial Intelligence, and Program Cochairthe 2004 Int. Conf. on Intelligent User Inter-faces. His email address is [email protected].

Candace L. Sidner is a Senior Research Scien-tist at Mitsubishi Electric Research Laboratories.She earned her Ph.D. at the MIT Artificial Intel-ligence Laboratory, following which she has beena researcher at BBN Laboratories, DEC Cam-bridge Research Laboratory, and Lotus Develop-ment Corporation (IBM). She is a Fellow andpast Councilor of AAAI, currently General Chairof the 2007 Human Language Technology con-ference of the NAACL, as well as having servedas President of the Association for Computa-tional Linguistics, Chair of the 2001 and Pro-gram Cochair of the 2006 Int. Conf. on Intel-ligent User Interfaces, and Cochair of the 2004SIGdial Workshop on Discourse and Dialogue.Her email address is [email protected].

13