Automating the Generation of Coordinated Multimedia Explanations

Steven K. Feiner and Kathleen R. McKeown, Columbia University

We developed an explanation system to overcome the disadvantages of conventional authoring in multimedia applications. COMET not only determines what to say, but how to say it.

Sometimes a picture is worth the proverbial thousand words; sometimes a few well-chosen words are far more effective than a picture. Pictures often describe objects or diagram physical actions more clearly than words do. In contrast, language often conveys information about abstract objects, properties, and relations more effectively than pictures can. Pictures and language used together can complement and reinforce each other to enable more effective communication than can either medium alone. In this sense, multimedia information systems may greatly increase effective communication.

Fortunately, technical advances are beginning to reduce the cost of hardware for computer-based multimedia and hypermedia. First-generation authoring facilities let users create presentations that include text, graphics, animation, and video. Regardless of the basic functionality or interface provided, however, multimedia authoring systems require authors to possess even more skills than do single-medium authoring systems. Not only must authors be skilled in the conventions of each medium, but they must also be able to coordinate multiple media in a coherent presentation, determining where and when to use different media, and referencing material in one medium from another. Furthermore, since the presentation must be authored in advance, the ways in which it can be customized for an individual user or situation are limited to those built in by the author.

To overcome the disadvantages of this predesigned authoring, we have developed an experimental test bed for the automated generation of multimedia explanations. COMET (Coordinated Multimedia Explanation Testbed) has as its goal the coordinated, interactive generation of explanations that combine text and three-dimensional graphics, all of which is generated on the fly.

In response to a user request for an explanation, COMET dynamically determines the explanation's content using constraints based on the type of request, the information available in a set of underlying knowledge bases, and information about the user's background and goals. Having determined what to say, COMET also determines how to express it at the time of generation. The pictures and text that it uses are not canned: COMET does not select from a database of conventionally authored text, preprogrammed graphics, or prerecorded video.


Instead, COMET decides which information should be expressed in each medium, which words and syntactic structures best express the portion to be conveyed textually, and which graphical objects, graphical style, and picture structure best express the portion to be conveyed graphically. COMET's text and graphics are created by separate media generators, each of which can communicate with the other.

We first provide a brief overview of COMET's domain and architecture. Then we focus on the specific ways in which COMET can coordinate its text and graphics. Coordination begins with the choice of media in which specific information is communicated. For example, an object's complex shape may be shown in a picture, rather than described in text, while a sentence may describe a causal relation between several actions involving the object. Coordination also means applying knowledge about what information is expressed in text to influence the generation of graphics, and vice versa. Thus, the graphics generator may use the fact that a causal relation is being communicated in text to determine how it depicts other information, even though the relation itself is not depicted. Finally, coordination means using knowledge about how information is expressed in other media in decision-making. For example, if the graphics generator shows the location of an object by highlighting it, the text generator can refer to the highlighted object.

Much of our work on COMET is being done in a field maintenance and repair domain for a military radio receiver-transmitter. When provided with a set of symptoms, COMET generates multimedia explanations that instruct the user in how to carry out diagnostic tests. The user interacts with COMET by means of a simple menu and can initially choose to request instructions for a specific procedure or to invoke the diagnostic component. In the latter case, an underlying expert system is called to determine the problems that the radio is experiencing and to identify their causes.

[Figure 1: a screen titled "Clear display / Overview" showing the radio, with the instruction "Press the CLR button to clear the display."]

Figure 1. COMET's directions for clearing the radio display.

The user selects symptoms from a menu. If the expert system decides that a set of diagnostic tests must be run to determine the cause of the problem, it calls the explanation component to tell the user how to carry out these tests. Explanations consist of one or more steps that are presented in a series of displays. Although our emphasis thus far has been on generating explanations, rather than on navigating through them, COMET's menu interface also provides rudimentary facilities for exploring the explanation by paging forward and backward through its steps. It can also access steps by name.

[Figure 2a: a screen titled "Install the new holding battery. Step 1 of 7" with the text "Step 1: Stand the radio on its top side."]

Figure 2. Two steps from COMET's explanation for installing the holding battery.


Figure 1 shows COMET's explanation for clearing the radio's display. Figure 2 shows the beginning of COMET's multistep explanation for installing a new holding battery. (The holding battery provides power for the radio's memory when the main battery has been removed.) In these first two steps, the user is instructed to turn the radio upside down and remove the cover plate from the battery compartment. Replacing the holding battery is the first of a series of actions that COMET instructs the user to perform in the course of troubleshooting loss of radio memory, a symptom that the user selected from COMET's menu.

System organization. COMET consists of a set of parallel processes that cooperate in the design of an explanation, as shown in Figure 3. On receiving a request for an explanation, the content planner uses text plans, or schemas,2 to determine which information from the underlying knowledge sources should be included in the explanation. COMET uses three different domain knowledge sources: a static representation of domain objects and actions encoded in the Loom knowledge representation language,3 a diagnostic rule base, and a detailed geometric knowledge base needed for graphics generation. It also maintains a user model and a model of the previous discourse. These knowledge sources are used by all system components to construct the explanation, not just by the content planner. Consequently, they are shown separately at the bottom of the figure without arrows to each module.

[Figure 3: architecture diagram. Visible labels include "Render and typeset" and "Knowledge sources: domain knowledge sources, user model, model of previous discourse."]

Figure 3. System architecture.

[Figure 2b: a screen titled "Install the new holding battery. Step 2 of 7" with the text "Step 2: Remove the holding battery cover plate: Loosen the captive screws and pull the holding battery cover plate off of the radio."]

The content planner produces the full content for the explanation, represented as a hierarchy of logical forms (LFs), as explained in the sidebar titled "Unification in COMET." The LFs are then passed to the media coordinator. This component refines each LF by annotating it with directives that indicate which portions are to be produced by each of the media-specific generation systems. The text generator and graphics generator each process the same LFs, producing fragments of text and graphics that are keyed to the LFs they instantiate. The output from both generators is processed by the media-layout component, which formats the final presentation for the low-level rendering-and-typesetting software.
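As a rough summary of this dataflow, the following is a minimal, hypothetical sketch. COMET's components actually run as parallel communicating processes, and all names and structures below are illustrative, not COMET's implementation.

    # Hypothetical sketch of COMET-style pipeline stages; names are illustrative.

    def content_planner(request):
        # Produce a toy logical form: a nested dict of attribute-value pairs.
        return {"directive-act": "substeps",
                "substeps": [{"process-type": "action", "process-concept": "c-push"}]}

    def media_coordinator(lf):
        # Annotate each substep with media directives (see the sidebar on unification).
        for step in lf["substeps"]:
            if step["process-type"] == "action":
                step["media-text"] = "yes"
                step["media-graphics"] = "yes"
        return lf

    def text_generator(lf):
        # Realize the portions assigned to text, keyed to the LF they instantiate.
        return [f"text for {s['process-concept']}"
                for s in lf["substeps"] if s.get("media-text") == "yes"]

    def graphics_generator(lf):
        # Realize the portions assigned to graphics.
        return [f"illustration of {s['process-concept']}"
                for s in lf["substeps"] if s.get("media-graphics") == "yes"]

    def media_layout(text, graphics):
        # Pair the fragments for the rendering-and-typesetting software.
        return list(zip(text, graphics))

    lf = media_coordinator(content_planner("clear the display"))
    print(media_layout(text_generator(lf), graphics_generator(lf)))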

COMET's major components run in parallel on up to five networked workstations. Text and menus are displayed through the X Window System, while 3D shaded graphics are rendered by Hewlett-Packard's Starbase graphics package. Each example shown in the figures takes approximately 10-20 seconds to generate and display following the initial user request.

One main feature of COMET's architecture is the use of the LF as a type of blackboard facility. (A blackboard is a central repository in which a system component can record its intermediate decisions and examine those of other components.) Each component reads and annotates its LF, continually enriching it with further decisions and finer specifications until the explanation is complete. Annotations include directives (like the media coordinator's choice of medium) and details about how a piece of information will be realized in text or graphics (like the proper verb to convey an action). While the annotated LF serves as a blueprint for the final explanation, it also allows for communication between media-specific components. For example, when deciding which expressive possibilities best convey the specified content, a media generator can examine decisions made in other media by reading their annotated LFs and use that information to influence its own choices. COMET uses a single mechanism, FUF (Functional Unification Formalism), to make annotations throughout the system. This allows for additional bidirectional interaction between COMET's components through the use of unification, as described in the sidebar. Before describing coordination in more detail, we briefly discuss some of COMET's key components, focusing on their individual capabilities.



Unification in COMET

COMET uses FUF, an efficient extended version of Functional Unification Grammar,6 for media coordination. FUF also performs two text-generation tasks (selecting words and generating syntactic structure) and part of graphics generation (mapping the LFs to a communicative goal language supported by the graphics generator). Each component has its own grammar that is unified nondeterministically with the LF to annotate it with directives or further specifications. The result is a cascaded series of FUF grammars, each handling a separate task.

In FUF, both the input LF and the task grammar are represented using the same formalism, a set of attribute-value pairs. A value can be an atomic symbol or, recursively, a set of attribute-value pairs. For example, consider the following fragment of an input LF:

    (substeps
      [((process-type action)
        (process-concept c-push)
        (roles (...))
        ...)])

This fragment contains a single attribute, substeps, whose value is a set of attribute-value pairs. It contains three subattributes. The first, process-type, specifies the type of substep process as an action. The second, process-concept, indicates that the specific action is a knowledge-base concept called c-push that represents the action of pushing. The third, roles, specifies the actors and objects that participate in the action. The value of the roles attribute is a set (not shown here). Additional attribute-value pairs occur in a full LF.

Annotation is accomplished by unifying the task grammar with the input and is controlled by the grammar. For each attribute in the grammar that has an atomic value, any corresponding input attribute must have the same value. (Technically, it must have a compatible6 value. As one example, when the grammar attribute has the value Any, the input attribute can have any value.) When the values are different, unification fails. When the attributes match and both values are sets, unification is applied recursively to the values, and the result replaces the input value. When the input LF does not contain the grammar attribute, the attribute and its value are added to the input. Any attributes that occur in the input but not in the grammar remain in the input after unification. Thus, unification matches only the relevant subsections of the input. Note that unification is similar to set union, since enriched attribute-value pairs from both input and grammar are merged.

A fragment of the media coordinator's grammar states that an action should be realized in both graphics and text:

    (((process-type action)  ;; If process is an action
      (media-graphics yes)   ;; Use graphics
      (media-text yes)       ;; Use text
      ...))

On unifying this fragment of the grammar with the value of the substeps fragment of the input LF, FUF first checks whether the attribute process-type occurs in the input. Since it does and the value following it is action, FUF now checks whether the attribute media-graphics occurs in the input. Since it does not, FUF adds to the input the attribute-value pair (media-graphics yes), a directive indicating that the action is to be realized in graphics. Similarly, it adds the directive (media-text yes), specifying that the action is also to be realized in text. Thus, the first attribute-value pair is used as a test for this grammar portion. When it matches the input (the input LF describes an action), the input is annotated with the remaining attribute-value pairs. If the input LF had contained a different process-type (like abstract), unification with this portion of the grammar would have failed and FUF would have attempted unification with a new portion.

In most uses of the media coordinator grammar, a fragment is recursively applied to small nested segments of the input LF at least once. For example, the input LF can contain a causal relation, with two roles, both of which are actions. In this case, the grammar fragment as shown matches each role. A different fragment matches the causal relation (one that annotates it to be realized in text only). The roles of the actions are annotated recursively, as are their modifiers.
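The sidebar's unification procedure can be paraphrased in a short program. The following is a minimal sketch that models attribute-value sets as Python dictionaries; it is not FUF itself, and it omits FUF's nondeterministic backtracking and the Any wildcard discussed above.

    # Sketch of FUF-style unification over attribute-value structures.
    # A "set of attribute-value pairs" is modeled as a dict; values are
    # atoms (strings) or nested dicts. Not FUF: no backtracking, no Any.

    def unify(input_av, grammar_av):
        """Return the enriched input, or None if unification fails."""
        result = dict(input_av)  # attributes only in the input are kept
        for attr, g_val in grammar_av.items():
            if attr not in result:
                # Grammar attribute absent from input: add it (an annotation).
                result[attr] = g_val
            else:
                i_val = result[attr]
                if isinstance(g_val, dict) and isinstance(i_val, dict):
                    # Both values are sets: unify recursively.
                    sub = unify(i_val, g_val)
                    if sub is None:
                        return None
                    result[attr] = sub
                elif i_val != g_val:
                    # Atomic values differ: unification fails.
                    return None
        return result

    # The sidebar's media-coordinator fragment: if the process is an
    # action, realize it in both graphics and text.
    grammar = {"process-type": "action", "media-graphics": "yes", "media-text": "yes"}
    lf = {"process-type": "action", "process-concept": "c-push"}
    print(unify(lf, grammar))
    # -> {'process-type': 'action', 'process-concept': 'c-push',
    #     'media-graphics': 'yes', 'media-text': 'yes'}
    print(unify({"process-type": "abstract"}, grammar))  # -> None (test fails)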


Media coordinator. This component performs a fine-grained analysis of an input LF to decide whether each portion should be realized in either or both media. After conducting a series of informal experiments and a survey of literature on media effectiveness, we distinguished six different types of information that can appear in an LF. We have categorized each type as to whether it is more appropriately presented in text or graphics. We use graphics alone for location and physical attributes, and text alone for communicating abstract actions and expressing connectives that indicate relationships among actions, such as causality. Both text and graphics represent simple and compound actions. The media coordinator has a FUF grammar that maps these information types to media.
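As a concrete reading of this categorization, the sketch below encodes the six information types and their media assignments as a lookup table. The assignments are paraphrased from the text; in COMET itself this knowledge is expressed as a FUF grammar, not a table.

    # Table form of the media coordinator's categorization (illustrative).
    # In COMET this knowledge lives in a FUF grammar, not a dict.
    MEDIA_ASSIGNMENT = {
        "location":           {"graphics"},
        "physical-attribute": {"graphics"},
        "abstract-action":    {"text"},
        "connective":         {"text"},        # e.g., causal relations
        "simple-action":      {"text", "graphics"},
        "compound-action":    {"text", "graphics"},
    }

    def assign_media(info_type):
        return MEDIA_ASSIGNMENT[info_type]

    print(assign_media("location"))       # {'graphics'}
    print(assign_media("simple-action"))  # {'text', 'graphics'}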

Figure 4 shows a representative portion of the annotated LF produced by the media coordinator for Figure 1. The part of the LF in roman font was generated by the content planner, while the annotations added by the media coordinator are in boldface. This LF specifies a single substep and its effect, where the substep is a simple action (c-push) and the effect is also a simple action (c-clear). The c-push substep has one role (the medium, c-button-clr), and it also specifies that the location and size of the button should be included.

Figures 1 and 4 illustrate the fine-grained division of information among media. For example, location information is portrayed in graphics only, while the actions are realized in both text and graphics. In contrast, other information in the LF is communicated only in text, such as the causal relation between pushing the button and clearing the display.



Text generator. COMET's text generator2 realizes the LF segments it has been assigned in text. It must determine both the number of sentences needed to realize the segments and their type (compound, simple, declarative, or imperative). It must select verbs to express LF actions, and nouns and modifiers to refer to LF objects. Finally, it must construct the syntactic structure for each sentence and linearize the resulting tree as a sentence. The text generator divides into two modules that carry out these functions: the Lexical Chooser and the Sentence Generator. The Lexical Chooser selects the overall sentence type and the words, while the Sentence Generator produces individual sentences. Both modules are implemented using FUF.

The text generator can select words based on a variety of underlying constraints. This enables it to use a number of different words for the same LF concepts, depending on the context. The result is a wider variety of more appropriate output. COMET can use constraints from the underlying knowledge base, from previous discourse, from a user model, and from syntax. For example, for the knowledge-base concept c-install, the text generator can use "install," "reinstall," or "return." It makes a choice based on previous discourse (that is, what it has already told the user). Thus, after the user has installed the new holding battery, COMET instructs the user to remove the radio's main battery to check the new holding battery's functionality. At this point, COMET uses the previous discourse to select the verbs "reinstall" and "return" when instructing the user to put the main battery back in the radio. If the user had not been previously instructed to remove the battery, COMET would have selected the verb "install."

COMET can also avoid words that the user does not know. For example, it generates "Make sure the plus lines up with the plus" instead of "Check the polarity" when describing battery installation to a user not familiar with the word polarity. The novel use of FUF to represent the lexicon efficiently provides a variety of different interacting constraints on word choice.
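The discourse- and user-model-sensitive word choice just described can be illustrated with a small sketch. This hypothetical fragment (all names invented, and far simpler than the Lexical Chooser's FUF lexicon) selects among the verbs for c-install depending on whether the discourse model records a prior removal, and avoids vocabulary the user model lacks.

    # Hypothetical sketch of discourse- and user-model-constrained word choice.

    def choose_install_verb(discourse_history):
        # If the user was previously told to remove the object, prefer
        # "reinstall" (or "return"); otherwise plain "install".
        if ("remove", "main-battery") in discourse_history:
            return "reinstall"
        return "install"

    def phrase_polarity_check(user_vocabulary):
        # Avoid words the user model says are unknown to the user.
        if "polarity" in user_vocabulary:
            return "Check the polarity."
        return "Make sure the plus lines up with the plus."

    history = {("remove", "main-battery")}
    print(choose_install_verb(history))   # reinstall
    print(choose_install_verb(set()))     # install
    print(phrase_polarity_check(set()))   # Make sure the plus lines up with the plus.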

    ((cat lf)
     (directive-act substeps)
     (substeps
       [((process-type action)
         (process-concept c-push)
         (mood non-finite)
         (speech-act directive)
         (function ((type substeps)
                    (media-text yes)
                    (media-graphics no)))
         (roles ((medium ((object-concept c-button-clr)
                          (roles ((location ((object-concept c-location)
                                             (media-graphics yes)
                                             (media-text no)))
                                  (size ((object-concept c-size)
                                         (media-graphics yes)
                                         (media-text no)))))))))
         (cat lf)
         (media-graphics yes)
         (media-text yes))])
     (effects
       [((process-type action)
         (process-concept c-clear)
         (mood non-finite)
         (function ((type effects)
                    (media-text yes)
                    (media-graphics no)))
         (speech-act assertive)
         (roles ((agent ((object-concept c-display)
                         (roles ((location ((object-concept c-location)
                                            (media-graphics yes)
                                            (media-text no)))
                                 (size ((object-concept c-size)
                                        (media-graphics yes)
                                        (media-text no)))))))))
         (cat lf)
         (media-text yes)
         (media-graphics yes))]))

Figure 4. Logical form for Figure 1.

Graphics generator. IBIS (Intent-Based Illustration System) generates illustrations designed to satisfy the communicative goals specified in the annotated LFs that it receives as input. The communicative goals that IBIS currently supports include showing absolute and relative locations of objects, physical properties (such as size, shape, material, and color), state (such as a knob setting), change of state (such as the change in a knob setting), and a variety of actions (such as pushing, pulling, turning, and lifting). In designing an illustration, IBIS controls all aspects of the picture-making process: the objects included and their visual attributes, the lighting specification, the graphical style used in rendering the objects, the viewing specification, and the structure of the picture itself.

IBIS uses a generate-and-test approach. The IBIS rule base contains at least one design rule for each communicative goal that can appear in an input LF. Each design rule invokes a set of stylistic strategies that specify high-level visual effects, such as highlighting an object. These strategies are in turn accomplished by still lower level rules that realize the strategies. The lower level rules create and manipulate the graphical depictions of objects included in the illustration and modify the illustration's lighting specification, viewing specification, and rendering information.



For example, IBIS uses a combination of techniques to portray the location of the button in Figure 1, as requested in the LF of Figure 4. It selects a viewing specification that (1) locates the button panel centrally in the illustration, (2) makes additional, surrounding context visible, and (3) ensures that both the object and context are recognizable. It highlights the button by modifying the intensity of the lights that illuminate the objects in the illustration.

IBIS rules evaluate the success of each task that IBIS performs. This is important because of the complex interactions that can occur in an illustration. Consider object visibility. Each object may be obscured by or obscure other objects. IBIS must determine whether visibility constraints are violated and address these by modifying the illustration. If an IBIS strategy doesn't succeed, it can backtrack and try another one. For example, if illuminating an object doesn't make it brighter than surrounding objects, IBIS can try to decrease the intensity of the lights illuminating the surrounding objects.
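The generate-and-test loop with backtracking can be sketched as follows. The two strategies and the success test are invented stand-ins for IBIS's rule base, which operates on a full 3D scene rather than a table of light intensities.

    # Sketch of IBIS-style generate-and-test: try a stylistic strategy,
    # evaluate its success, and fall back to another strategy on failure.
    # Strategies and the success test here are invented stand-ins.

    def brighten_object(scene, obj):
        scene["light"][obj] = scene["light"].get(obj, 1.0) * 1.5

    def dim_surroundings(scene, obj):
        for other in scene["light"]:
            if other != obj:
                scene["light"][other] *= 0.5

    def highlighted(scene, obj):
        # Success test: the object is brighter than everything around it.
        return all(scene["light"][obj] > level
                   for other, level in scene["light"].items() if other != obj)

    def highlight(scene, obj):
        for strategy in (brighten_object, dim_surroundings):  # ordered rule base
            strategy(scene, obj)
            if highlighted(scene, obj):   # evaluate the task's success
                return True
            # otherwise try the next strategy
        return False

    scene = {"light": {"clr-button": 1.0, "display": 2.0, "panel": 2.0}}
    print(highlight(scene, "clr-button"), scene["light"])
    # Brightening alone fails (1.5 < 2.0), so the surroundings are dimmed.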

Media coordination

A multimedia explanation system must coordinate the use of different media in a single explanation. It must determine how to divide explanation content between different media such as pictures and text. Moreover, once the content has been divided, the system must determine how material can be generated in each medium to complement that of the other media.

A few researchers are addressing the automated generation of coordinated multimedia explanations with emphasis on how the media can complement each other. Integrated Interfaces8 produces US Navy briefing charts, using rules to map objects in the application domain (a database of information about ships) into objects in the presentation. Sage9 explains how and why quantitative models change over time, while WIP explains physical actions like those of COMET's domain.

Integrated Interfaces and Sage operate in a two-dimensional world of charts and graphs, and do not address the problems of describing objects and actions in 3D. Integrated Interfaces also uses many design rules specific to the particular kind of briefing chart it produces. Although WIP also emphasizes tight media coordination in application to 3D domains, its content planner takes an incremental approach. Each piece of information to be communicated is assigned sequentially to its generators. Evaluations of potential success are returned to the planner to help determine media assignments even before enough information is provided for an entire sentence or illustration.

In contrast, COMET provides its media generators with more information at a time in a common LF that describes what the other generators have been assigned. The LF can also be enriched with accomplishments of the other generators. Thus COMET gives its media generators more context from which to work in making their initial decisions, while still allowing feedback.

Here we focus on two aspects of media coordination in COMET. First, we show how the use of a common content-description language allows for more flexible interaction between media, making it possible for each generator to query and reference other generators. By passing the same annotated description of what is to be described to each generator, we permit each generator to use information about what the other generators present to influence its own presentation. Then, we show how bidirectional interaction between the media-specific generators is necessary for certain kinds of coordination. Bidirectional interaction allows COMET to generate explanations that are structurally coordinated and that contain cross-references between media.

Common content description

All components in COMET following the content planner share a common description of what is to be communicated. Just as modules accept input in the same formalism, they can also annotate the description as they carry out its directives. This design has the following ramifications:

• It lets text and graphics influence each other.
• Communicative goals are separated from the resources used to carry them out.
• It provides a mechanism for text and graphics generators to communicate.

Mutual influence. Since both the text generator and graphics generator receive the same annotated content description as input, each knows which goals are to be expressed in text, in graphics, or both. Even when a media-specific generator does not communicate a piece of information, it knows that the information is to be conveyed to the user; thus, it can use this knowledge to influence its presentation. Consider a portion of the explanation that COMET generates to instruct the user in how to install the holding battery. The second step of the explanation (Figure 2b) was generated from a complex LF that consists of one goal (to remove the holding battery cover plate) and two complex substeps that carry out that goal. As Figure 2b illustrates, the media coordinator determines that the goal is to be generated only in text ("Remove the holding battery cover plate:") and that the substeps are to be shown in both media.

Although IBIS depicts only the substeps of the LF, it receives the entire annotated LF as input. Since it receives the full LF, and not just the pieces assigned to graphics, IBIS knows that the actions to be depicted are steps that achieve a higher level goal. Although this goal is not itself realized in graphics, IBIS uses this information to create a composite illustration. This type of illustration consists of an integrated set of pictures that work together to achieve a common set of goals that cannot be accomplished in a single simple illustration. In this case, IBIS rules do not include any satisfactory way to show the radio with its cover plate and captive screws in different positions in one illustration.

If IBIS were to receive only the substeps, it would have no way of knowing that the substeps are being described in relation to a higher level goal. It may end up producing two separate illustrations, just as it does for each simple LF, such as that shown in Figure 2a. Thus, information conveyed in the explanation as a whole, but not in graphics, influences how IBIS depicts other information.
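A minimal sketch of this decision, with invented structure: because the generator sees the whole LF, it can detect that the depicted substeps serve one higher level goal and group them into a composite illustration rather than producing separate pictures.

    # Invented sketch: grouping depicted substeps into one composite
    # illustration when the full LF shows they serve a common goal.

    def plan_illustrations(lf):
        substeps = [s for s in lf["substeps"] if s.get("media-graphics") == "yes"]
        if lf.get("goal"):
            # Substeps achieve one higher level goal: one composite illustration.
            return [("composite", [s["action"] for s in substeps])]
        # No shared goal visible: a separate simple illustration per substep.
        return [("simple", [s["action"]]) for s in substeps]

    lf = {"goal": "remove-cover-plate",
          "substeps": [{"action": "loosen-screws", "media-graphics": "yes"},
                       {"action": "pull-plate-off", "media-graphics": "yes"}]}
    print(plan_illustrations(lf))
    # -> [('composite', ['loosen-screws', 'pull-plate-off'])]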

Separation of goals from resources. Because we are using a common content-description language, content must be specified at a level that is appropriate for all generators. We have found that by expressing content as a combination of communicative goals and the information needed to achieve these goals, each generator can select the resources it has at hand for accomplishing its assigned goals. In text generation, this means the selection of specific syntactic or lexical resources (using passive



voice to indicate focus, for example). In graphics generation, it means the selection of a conjunction of visual resources (modifying an object's material and the lights that illuminate it to highlight it, for example).

Consider again the explanation shown in Figure 1 and its associated annotated LF in Figure 4. The main goal of the first part of the LF is to describe an action (c-push) and its role (medium). Subgoals include referencing an object (for example, c-button-clr, the clear button) and conveying its location and size. IBIS and the text generator use different resources to achieve these goals. For example, the text generator selects a lexical item, the verb "press," to describe the action. "Press" can be used instead of other verbs because of the characteristics of the medium, c-button-clr. If the medium were a slider, a verb such as "push" or "move" would be required. In contrast, IBIS uses a metaobject, an object that does not itself represent any of the objects in the world being depicted. In this case, the metaobject is an arrow that IBIS generates to depict the action of pushing the button. To refer to the clear button itself, the Sentence Generator uses a definite noun phrase, whereas IBIS highlights the object in the picture.
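The same communicative goal dispatching to different medium-specific resources might look like the sketch below. The resource choices are paraphrased from the example above; the dispatch structure itself is invented for illustration.

    # Invented sketch: one communicative goal, different resources per medium.

    def text_resources(goal, concept):
        if goal == "describe-action":
            # Lexical resource: verb choice constrained by the medium's kind.
            return "press" if concept == "c-button-clr" else "push"
        if goal == "reference-object":
            return "the CLR button"      # definite noun phrase

    def graphics_resources(goal, concept):
        if goal == "describe-action":
            return "arrow metaobject"    # depicts pushing; not a world object
        if goal == "reference-object":
            return "highlight " + concept  # visual reference

    for goal in ("describe-action", "reference-object"):
        print(goal, "->", text_resources(goal, "c-button-clr"),
              "|", graphics_resources(goal, "c-button-clr"))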

A mechanism for communication. Since both generators understand the same formalism, they can provide more information to each other about the resources they have selected simply by annotating the content description. Thus, the content description serves as a blackboard to which all processes can write messages. We use this facility for coordinating the internal text structure with pictures.

Bidirectional interaction

Certain types of coordination between media can only be provided by incorporating interacting constraints between text and graphics. Two-way communication between the media-specific generators may be required as they carry out their individual realizations. Furthermore, coordination may only be possible once partial decisions have been made by the media-specific generators. For example, the text generator needs to know how the graphics generator has depicted an object before it can refer to the object's visual properties in the illustration. Here we discuss two types of coordination that require bidirectional interaction: coordination of sentence breaks with picture breaks, and cross-referencing text and graphics.

Coordinating sentence breaks with picture breaks. In addition to revealing several dimensions along which to assign information to media, our media coordination experiments also demonstrated a strong preference for tight structural coordination between text and graphics. Our subjects much preferred sentence breaks that coincided with picture breaks. While multiple sentences accompanying one picture were found satisfactory, subjects strongly objected to a single sentence that ran across several pictures.

Including this type of coordination in COMET requires two-way interaction between text and graphics. Both text and graphics have hard and fast constraints that must be taken into account to achieve sentence-picture coordination. IBIS uses a variety of constraints to determine picture size and composition, including how much information can easily fit into one picture, the size of the objects being represented, and the position of the objects and their relationship to each other. Some of these constraints cannot be overridden. For example, if too many objects are depicted in one picture, individual objects may be too small for clarity.

This situation suggests that constraints from graphics be used to determine sentence size and thereby achieve coordination between picture and sentence breaks. However, some grammatical constraints on sentence size cannot be overridden without creating ungrammatical, or at least very awkward, text. Each verb takes a required set of inherent roles. For example, "put" takes a medium and to-location. Thus, "John put." and "John put the book." are both ungrammatical. Once a verb is selected for a sentence, this can in turn constrain minimal picture size; the LF portion containing information for all required verb roles should not be split across two pictures. Consequently, constraints from text must also be taken into account.

In COMET we incorporate this interaction by maintaining two separate tasks that run independently, each annotating its own copy of the LF when a decision is made and querying the other when a choice about sentence or picture structure must be made. Once a verb is selected for a sentence, the text generator annotates its copy of the LF by noting the roles that must be included to make a complete sentence. At the same time, the graphics generator annotates its LF with the mapping from the pieces of information to be communicated by graphics to the identifiers of the illustrations in which it intends to communicate the information.

When different sentence structures are possible, the text generator uses the graphics generator's annotations to make a choice by unifying the graphics generator's LF with its own. Consider the example of clearing the display shown in Figure 1. IBIS generates one picture showing the action and its effect; the text generator produces one sentence incorporating the effect as a purpose role of the action ("... to clear the display"). IBIS could depict this action in many ways, depending on the situation. For example, a pair of "before" and "after" pictures can be used, the first showing the action "push" about to occur and the second showing the cleared display. This picture structure can be especially useful in locating the objects participating in the action by showing their appearance prior to this event. Figure 5 shows what happens when IBIS's style rule base is modified so that a composite "before" and "after" pair is preferred. After consulting the annotated logical form, the text generator produces two separate sentences.
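One way to picture this exchange in code: each generator annotates its own copy of the LF, and the text generator consults the graphics generator's picture assignments before fixing sentence boundaries. A hypothetical sketch (the annotation format and decision rule are invented; COMET performs this exchange through unification of the annotated LFs):

    # Hypothetical sketch of sentence-break/picture-break coordination.
    # The graphics generator's copy of the LF maps each piece of
    # information to the illustration in which it will appear.

    def sentence_plan(pieces, graphics_lf):
        # Keep the pieces in one sentence only if they share one picture,
        # so that no sentence runs across several pictures.
        pictures = {graphics_lf[p] for p in pieces}
        if len(pictures) == 1:
            return ["one sentence: " + ", ".join(sorted(pieces))]
        return ["sentence: " + p for p in sorted(pieces)]

    pieces = {"push-clr-button", "display-clears"}

    # Both pieces in one picture -> one sentence, as in Figure 1
    # ("Press the CLR button to clear the display.").
    one_picture = {"push-clr-button": "pic-1", "display-clears": "pic-1"}
    print(sentence_plan(pieces, one_picture))

    # A before/after pair -> two sentences, as in Figure 5.
    before_after = {"push-clr-button": "pic-1", "display-clears": "pic-2"}
    print(sentence_plan(pieces, before_after))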

Cross-references. One important goal of media coordination is to allow material in one medium to cross-reference material in another. COMET provides for two kinds of cross-referencing: structural and content. The former refers to the coarse structure of the material being referenced. For example, a sentence could refer to an action by mentioning that it is depicted in one of the two pictures on the display. In contrast, a content cross-reference refers to the material's content, such as an object's position in a picture or the way in which the object is highlighted. Structural cross-references require only high-level knowledge of how the material being referenced is structured, whereas content cross-references require low-level knowledge of how the material's communicative goals are realized.



[Figure 5: a screen titled "Clear display" showing two pictures, with the text "Press the CLR button. This will cause the display to clear."]

Figure 5. An alternative explanation for clearing the radio display. A composite illustration containing two pictures is generated with coordinated text (compare with Figure 1).


COMET's text generator can make both structural and content cross-references to IBIS illustrations. IBIS constructs a representation of each illustration it generates that is indexed by the LF. This representation contains information about the illustration's hierarchical structure, the identity and position of the objects that it depicts, and the kinds of illustrative effects used in constructing the illustration (like highlighting or cut-away views). The text generator queries this representation to generate cross-references. For example, it can refer to information that is communicated "... in the left picture" (structural cross-reference) or mention "... the old holding battery shown in the cut-away view" (content cross-reference).
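The kind of illustration record and queries involved might look like the sketch below. The record fields are invented; IBIS's actual representation is hierarchical and indexed by the LF.

    # Invented sketch of an illustration record and the two kinds of
    # cross-reference queries the text generator might pose.

    illustrations = [
        {"id": "pic-left", "position": "left",
         "objects": {"old-holding-battery"}, "effects": {"cut-away view"}},
        {"id": "pic-right", "position": "right",
         "objects": {"cover-plate"}, "effects": set()},
    ]

    def structural_xref(obj):
        # High-level structure only: which picture shows the object?
        for pic in illustrations:
            if obj in pic["objects"]:
                return f"in the {pic['position']} picture"

    def content_xref(obj):
        # Low-level content: how is the object depicted?
        for pic in illustrations:
            if obj in pic["objects"] and pic["effects"]:
                effect = next(iter(pic["effects"]))
                return f"the {obj} shown in the {effect}"

    print(structural_xref("old-holding-battery"))  # in the left picture
    print(content_xref("old-holding-battery"))     # ... shown in the cut-away view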

Current status and limitations

COMET can provide instructions for maintenance and repair procedures in two different contexts. A user can directly request instructions for a specific procedure through a menu interface (essentially, asking "How do I do x?"). Alternatively, during symptom diagnosis, the user can request an explanation for any diagnostic procedure the system specifies must be carried out. COMET can explain over 40 complex procedures represented in the knowledge base. A variety of explanations are provided for a single procedure, depending on user background, explanation context, or previous discourse. For example, COMET can vary the vocabulary (like not using the word polarity if it isn't in the user's vocabulary), the illustration design (like attempting to reuse the preceding illustration's viewing specification for the next illustration to avoid confusing camera motion), or the information communicated (like omitting an explanation for a procedural step if it was explained recently). COMET also has preliminary facilities for answering follow-up questions of the form: "What is an x?", "Where is the x?", or "Why should I do x?"

COMET was designed to be as domain independent as possible. It can be adapted with minimum effort to a new task-based domain when tasks and actions are encoded using a standard plan-based representation and objects are represented using frames. The content planner can produce the content for explanations of tasks or actions represented as plans, as well as for follow-up questions. Because many artificial intelligence systems use plans and frame-based representations, COMET's explanation facilities can be used in a wide range of applications. Similarly, the Sentence Generator grammar, the Lexical Chooser rules, and IBIS rules apply to any domain, task-based or not. In fact, these components have already been used in a number of applications under development at Columbia University. The media coordination rules we developed also apply to any domain that includes actions, physical objects, and abstract relations.

To handle a new domain, it would be necessary to augment the lexicon (adding new vocabulary), the Loom knowledge base (adding new plans and objects), and the graphics knowledge base (adding the new objects' detailed geometry and physical properties). These are currently substantial tasks, but so is the effort required to create conventionally authored explanations for a new domain. Also, note that some domain-dependent information, such as the graphics knowledge base, would ideally be available in CAD/CAM databases created when the objects to be documented were designed. COMET's rule bases were designed for domains involving physical objects. Therefore, handling domains that stress abstract relations among abstract concepts (like statistical analyses of numeric variables) would require major changes. The changes include different content-planning strategies, a different media coordinator grammar, and new graphics generation approaches (for example, adding a method for designing quantitative data displays).

The COMET testbed has allowed us to explore many ways to coordinate the generation of text and graphics. Our present and future work on COMET and its components includes the development of additional generators that support the temporal media of speech and animation. (IBIS already allows for direct, dynamic user control of the viewing specification.) Our work also includes the design of a browsing/navigation facility


for COMET's explanations. We are developing a media layout component that will rely on the media generators' annotations to determine the relationships among pieces of text and graphics. This will allow COMET to group related items spatially.

We plan to allow feedback from the media generators to affect assignments made by the media coordinator and the selection of communicative goals made by the content planner. Our use of unification has the potential to make such feedback possible. Currently, we use an overall control structure for efficiency, calling the unifier separately for each grammar. Instead, we could call the unifier once for the combined series of grammars, thus allowing complete interaction through unification among the types of constraints. In this scenario, a decision made at a later stage of processing can propagate back to undo an earlier one. For example, information about the syntactic form selected can propagate back to the Lexical Chooser to influence verb choice.

While COMET is a research prototype, we believe that far more powerful systems will someday generate high-quality multimedia explanations for users in a variety of domains. Potential applications include education (explaining scientific phenomena) or even basic home repairs (coaching the user through troubleshooting a broken appliance). Although the hardware needed to run our current system is beyond the reach of most users, rapid improvements in the price-performance ratio will soon make high-performance real-time 3D graphics a fundamental capability of any computer system. In our work on COMET, we have attempted to lay some of the groundwork for the kinds of knowledge-based user interfaces that technological advances will soon make feasible.

Acknowledgments

Research on COMET is partly supported by Defense Advanced Research Projects Agency Contract N00039-84-C-0165, the Hewlett-Packard Company AI University Grants Program, National Science Foundation Grant IRT-84-51438, New York State Center for Advanced Technology Contract NYSSTF-CAT(88)-5, and Office of Naval Research Contracts N00014-82-K-0256, N00014-89-J-1782, and N00014-91-J-1872. COMET's development is an ongoing group


effort and has benefited from the contributions of Michael Elhadad (FUF), Dorée Seligmann (IBIS), Andrea Danyluk (diagnostic rule base), Yumiko Fukumoto (media coordinator), Jong Lim (static knowledge base and content planner), Christine Lombardi (media coordinator), Jacques Robin (Lexical Chooser), Michael Tanenblatt (knowledge base), Michelle Baker, Cliff Beshers, David Fox, Laura Gabbe, Frank Smadja, and Tony Weida.

References

1. S. Feiner and K. McKeown, "Coordinating Text and Graphics in Explanation Generation," Proc. Eighth Nat'l Conf. Artificial Intelligence, AAAI Press/MIT Press, Menlo Park, Calif., 1990, pp. 442-449.

2. K.R. McKeown et al., "Language Generation in COMET," in Current Research in Language Generation, C. Mellish, R. Dale, and M. Zock, eds., Academic Press, London, 1990.

3. R. MacGregor and R. Bates, "The Loom Knowledge Representation Language," Tech. Report ISI/RS-87-188, Information Sciences Institute, University of Southern California, 1987.

4. J. Allen, Natural Language Understanding, Benjamin Cummings Publishing Company Inc., Menlo Park, Calif., 1987.

5. R. Reddy et al., "The Hearsay Speech Understanding System: An Example of the Recognition Process," Proc. IJCAI 73, Morgan Kaufmann Publishers, Palo Alto, Calif., 1973, pp. 185-193.

6. M. Kay, "Functional Grammar," Proc. Fifth Mtg. Berkeley Linguistics Soc., Berkeley Linguistics Society, Berkeley, Calif., 1979.

7. D. Seligmann and S. Feiner, "Automated Generation of Intent-Based 3D Illustrations," Computer Graphics (Proc. SIGGRAPH 91), Vol. 25, No. 4, July 1991, pp. 123-132.

8. Y. Arens, L. Miller, and N. Sondheimer, "Presentation Design Using an Integrated Knowledge Base," in Intelligent User Interfaces, J. Sullivan and S. Tyler, eds., Addison-Wesley, Reading, Mass., 1991, pp. 241-258.

9. S. Roth, J. Mattis, and X. Mesnard, "Graphics and Natural Language as Components of Automatic Explanation," in Intelligent User Interfaces, J. Sullivan and S. Tyler, eds., Addison-Wesley, Reading, Mass., 1991, pp. 207-239.

10. W. Wahlster et al., "Designing Illustrated Texts: How Language Production is Influenced by Graphics Generation," Proc. European Chapter Assoc. Computational Linguistics, 1991, pp. 8-14.

11. E. Sacerdoti, A Structure for Plans and Behavior, American Elsevier, New York, 1977.

12. J. Mackinlay, "Automating the Design of Graphical Presentations of Relational Information," ACM Trans. Graphics, Vol. 5, No. 2, Apr. 1986, pp. 110-141.

Steven K. Feiner is an associate professor of computer science at Columbia University. His research interests include image synthesis, applications of artificial intelligence to computer graphics, user interfaces, animation, hypermedia, virtual worlds, and visualization. Feiner received a PhD in computer science from Brown University in 1987. He serves on the editorial boards of Electronic Publishing and ACM Transactions on Information Systems, and has coauthored the book Computer Graphics: Principles and Practice, 2nd ed. Feiner received an Office of Naval Research Young Investigator Award in 1991 and is a member of the IEEE Computer Society and ACM.

Kathleen R. McKeown is an associate professor of computer science at Columbia University. Her research interests include natural language generation, user modeling, expert system explanation, and natural language interfaces. McKeown received a PhD in computer science from the University of Pennsylvania in 1982. She was elected president of the Association for Computational Linguistics for a term beginning in January 1992, and currently serves as its vice president. McKeown has served on the executive council of the Association for Artificial Intelligence and was program cochair of its annual conference in 1991. She received a National Science Foundation Presidential Young Investigator Award in 1985.

Readers may contact Steven K. Feiner and Kathleen R. McKeown at the Department of Computer Science, Columbia University, 500 W. 120 St., New York, NY 10027.