Multimodal tool support for creative tasks in the visual arts

J. Sedivy a,1, H. Johnson b,*

a Telepresence Systems Inc., 300-8 Market Street, Toronto, Canada M5E 1W5

b Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, UK

Abstract

This paper presents an investigation into computer tool support for creative tasks in the visual arts, in particular sketching. The research

draws from both theoretical models of creative processes and empirical observations of sketching activity in the domain of character

development in animation and illustration. These sources show that creative problem solving through sketching is a highly iterative process

during which various constraints are rapidly evolving and being adapted to. A set of requirements for an experimental, multimodal sketching

tool was developed, drawing from our theoretical and empirical studies. In order to support the intense, rapid nature of the task, voice input

was implemented allowing the user to access functionality without interrupting the hand activity. The multimodal tool also included radial

marking menus in order to provide rapid navigation to functionality. Informal evaluations were conducted that provided qualitative feedback

about the use of the system. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Sketching; Multimodal tools; Creativity

1. Introduction

The goal of the research reported in this paper was

to investigate ways in which creative tasks in the visual arts,

in particular sketching, could be supported by computer

tools. There were two initial aims of the research: to review

literature related to both psychological and philosophical

models of sketching and to identify the feasibility of

employing multimodal input as a means of supporting the

rapid dialogue between the artist and the sketch.

Investigation into a variety of domains was undertaken in

order to make decisions about which task categories to

support, and to analyse the requirements that any proposed

sketching support tool should meet. There were two important

facets of the research: collecting, analysing and modelling

data about creative sketching tasks, and discussing the possibilities

for computer tool support with professionals in the field.

The specific domain chosen to be supported by the proposed

multimodal tool was character development in animation and

illustration. As a result of the observational and modelling

phases, a multimodal vector-based drawing program was

developed and informally tested on a number of experienced,

professional users. The intention was that the informal evalua-

tions would provide information about design issues concern-

ing speech input for creative work tasks and that the sketching

tool could be used as a test-bed for further research on the role

of input modality on sketching.

The informal evaluations revealed the difficulty subjects

encountered when using new forms of input, new types of menu

and new tools. Alternative approaches to evaluation are therefore

suggested.

2. Creativity

Creativity can refer variously to problem solving, non-linear

thinking or inspiration (from the Muses). The term

can be applied to thought processes in the arts or in the

sciences. Creativity is not easy to define, but can be construed

(cf. Refs. [2,15]) as a process whereby there is an evolution

towards a solution to a problem, which makes use of a combi-

nation of logical and illogical mechanisms. When faced with

any creative task or problem, it is essential to be able to inves-

tigate a variety of alternative solutions and this process can

outwardly vary across disciplines. For example, scientists

might use diagrams and equations written on a chalk board

while musicians might doodle on the keyboard. In the case of

creativity in the visual arts, the production of rough sketches in

the initial stages of design is extremely common. Perceptually

and physically these three activities are quite different and yet

Knowledge-Based Systems 13 (2000) 441–450

0950-7051/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.

PII: S0950-7051(00)00064-2

www.elsevier.com/locate/knosys

* Corresponding author. Tel.: +41-122-532-3215; fax: +41-122-582-6492.

E-mail addresses: [email protected] (J. Sedivy), [email protected] (H. Johnson).

1 Tel.: +1-416-777-1177, ext. 32; fax: +1-416-777-1188.

Derived from 'Supporting creative work tasks: The potential of multimodal tools to support sketching', published in the Proceedings of the Third Conference on Creativity and Cognition, Loughborough, UK, October 10–13, 1999, pp 42–49. Reproduced with permission from ACM © 1999.

they all serve a similar purpose in the creative process. They

all can be seen as external representations of ideas that can

be manipulated, modified, communicated and used to stimulate

the development of new ideas.

From the perspective of providing tool support for crea-

tivity, it is important to understand the relationship between

the designer and the external representations they employ

and for what purpose. Within the domain of visual arts,

sketches and diagrams are the most common means by

which ideas can be explored; therefore, it is necessary to

analyse the activity of sketching.

3. Sketching

3.1. Psychological models of sketching

The relationship between the physical act of sketching,

the sketched images, and the cognitive processes of the

artists themselves is frequently described as a "conversation"

between the drawing and the artist [15,19,21]. In creat-

ing a sketch, designers are making an explicit representation

of their ideas, which then aids in further reasoning about the

problem. It is useful in this context to distinguish between

sketches that are used to explore ideas and therefore are a

means to an end and the drawings that are the end products

of the sketching process. "Reasoning" through the initial

sketches need not necessarily be logical, but can be "loosely

structured" or analogical. Moreover, different perceptual

"modes" can be stimulated by the same drawing. For example,

Goldschmidt [9] defines two different states, which she

refers to as 'seeing as' and 'seeing that'. 'Seeing as' refers to the

process of using "figural" thinking while sketching and

'seeing that' refers to the use of non-figural elements to reason

about the design.

Suwa and Tversky [21], in analysing how architects

perceive their sketches, devised a taxonomy of information

categories that could be used to classify what designers

described about their drawings. They further decomposed

Goldschmidt's 'seeing as' mode into emergent properties,

spatial and functional relations, and background knowledge.

These categories were then used to define chunks of infor-

mation that were perceived by the designers. Their analysis

revealed that detailed consideration of topics related to a

particular idea occurred during examination of local spatial

relations in a sketch.

Oxman [18] concentrates less on the data perceived by

designers than on the process of design itself. She also

argues that there are multiple abstracted representations

that the designer can extract from the sketches. These

abstractions are broadly divided into three categories: typological

schema, topology and formal systems. Each of these

abstraction categories is associated with operational methods

that can be observed as sketching actions such as refinement,

generalisation, scaling or symmetry operations. Her

model of visual reasoning in sketching is based on the

re-representation hypothesis, which defines creativity as using

a cycle of re-representations as a means for conceptual

exploration. She extends this theory by arguing that re-

representations are driven by external and internal

constraints and that the new designs adapt to satisfy these

constraints. The adaptation is achieved through the

designer's perception of the different abstraction levels

and the execution of associated operations. The resulting

model of re-representation is described as a cyclical activity

that passes through a series of distinct stages before repeat-

ing itself.

3.2. Character sketching for book illustration

The tasks to be considered in depth relate to character

development in illustration or animation. For the sake of

brevity, only our studies of illustrating will be outlined

here. Illustration, for instance in a book, can involve either


Fig. 1. First two renderings of a Winnie the Pooh scene.

the creation of new visual representations of fictional characters

or the incorporation of predefined personas into a scene.

Unless the author and the artist work in close collaboration,

it is typical for illustrators to receive a copy of the story text

and a short description of what should be depicted on each

page. It is then left to the artist's discretion as to the overall

composition of the picture and the precise positioning of the

characters relative to each other. This decision is based on

the illustrator's knowledge of what is happening in the story

and knowledge of the general style expected by the story-

writer. In the case of small independent writers, this may be

an informal description or in the case of large clients such as

Disney or Warner Brothers, there may be strict style guides.

Further informing this decision is knowledge of the

established rules governing the way compositional elements lead

the eye around the page so that it focuses on the desired

element, and of the established rules governing

the way compositional elements contribute to dramatic

feeling.

Initially, the illustrator works out different ideas for the

composition as thumbnail sketches. These are done with

thick, soft pencils in a light colour. As the ideas for the

composition become more refined, darker colours and thinner

pencils are used to draw over the existing drawing. Also,

as the ideas become more refined, the drawing is redone on a

larger scale so that smaller details can be worked out.

Throughout the sketching process, the artist pauses to rotate

the paper at different angles or tilts their head to get a

different perspective. While drawing a scene that involved

a lot of movement and characters jumping around, one artist

jiggled the paper up and down to aid in visualising how the

motion could appear.

This iterative process can be seen in the sketch samples in

Figs. 1 and 3. In the first image (Fig. 1a), the artist has

roughly laid out the appearance of the scene and the approximate

positioning of the characters. At this stage, the characters

are not recognisable; they merely serve as

compositional elements. The thickness and `fuzziness' of

the lines are very deliberate because they allow the artist

to imagine more than one possibility for a particular

element. In fact, several scene elements have been drawn

in more than one position for the purpose of experimenting

with the scene. For example, the leftmost character (Tigger)

and the rightmost character (Christopher Robin) have been

drawn both with their arms at their side and with one arm

waving in greeting. Rubbers are never used. If a "mistake" is

made, it is simply drawn over in a darker pencil or traced

over on a new piece of tracing paper. At the early stages of

the sketch, however, the artist does not consider these to be

mistakes; rather, they are experiments with different compositions

and poses. In many situations, the illustrator prefers

to have both representations visible to him so that he can

evaluate their dramatic and compositional impact simultaneously.

It is also common to experiment with elements of a

scene in whatever space is available on the drawing surface.

Fig. 3 shows a part of an illustration where the details of a

character's feet are experimented with in the margins.

Once the composition of the scene has been decided, the

artist places a sheet of tracing paper over it and roughly

retraces the scene that was last drawn in the darkest colours

(Fig. 1b). In this iteration, detailed features of the characters

and their facial expressions are developed on a larger scale

than the initial thumbnail. The peripheral elements of the

scene can be ignored for the moment, but are revisited later

for the final image, which integrates the story text with the

illustration (Fig. 2). At these later stages, the artist might

consult a book of style guides provided by the client to make

sure that the proportions have been drawn correctly and that


Fig. 3. Experimenting with details in the margins.

Fig. 2. Final version of the Winnie the Pooh scene.

the movements and facial expressions are "in character". As

the drawing nears completion, the image is photocopied

onto the page containing the story text to ensure that no

crucial elements of the drawing interfere with the text or

the spine of the book.

A photocopier is a common tool during the later stages of

the drawing, when the composition is being finalised.

Images, or parts of images, are often enlarged or reduced,

then cut out and glued into a new drawing. For example, the

artist might especially like the way a particular character

was drawn, except that it is the wrong size. This provides a clue as to

the types of operations (scaling, copying, rotation) that

would be useful to support in a sketching tool. These operations

are clearly difficult for humans to perform by hand, since an

accomplished artist would rather walk over to another

room, make an enlarged/reduced photocopy, cut out an

image and paste it into a scene, simply to save the effort

of redrawing a precise copy in a different size.

3.3. Tool support for sketching

The studies reported in Section 3.2 clearly indicate the

iterative, manipulative and reasoning elements of represent-

ing and re-representing ideas for the purposes of solving

creative design problems. We need to know if tools can

aid this activity and what current computer tool solutions

exist.

Although there are a wide variety of powerful CAD tools

to assist architects and graphic designers in creating representations

of their final designs, very little exists in the way

of support for the earlier stages of the design process. In

fact, some of the tools can actually hamper creativity [3]. It

is crucial from an HCI perspective to consider which part of

the activity or process of sketching should be supported.

There are broadly two possibilities. The first relates to

supporting the designer with the representation process

itself on an abstract level, for example by suggesting new or alternative

representations, defining constraints, or assisting the user

in lateral thinking. This approach has been

explored extensively by Candy and Edmonds [5], Suwa

and Tversky [21] and Gross [7,10] amongst others. The

second possibility relates to supporting only the physical

act of sketching. Under this paradigm, creativity is solely

the responsibility of the human using the tool. Support for

this level of the creative process has been relatively unex-

plored. A notable exception is the Electronic Cocktail

Napkin [10], which provides some support for drawing

management and allows users to sketch diagrams using a

stylus input. However, because most of the tool's features

are focused on assisting the user in defining spatial

relationships and constraints, it falls more readily into the

first category.

Our approach was to explore the potential of aiding the

physical act of sketching, thus supporting what Oxman terms

"operational methods" rather than "abstraction levels". How

can this be accomplished?

Since sketching consists of intense activity for the hands,

speech input could potentially be used to manipulate a

sketch. Speech input, in lieu of commands expressed

through traditional menu bars, would leave the hands free

to work on the art and for experienced users also save time.

A mixture of speech and gesture is also a natural combination

for certain types of expression that are difficult to

express through menus. Speech input alone has a number

of potential disadvantages. For instance, how is the novice

user given information about what commands are available

and how will they understand the structure of the command

set? An additional problem relates to the fact that we do not

know whether issuing linguistic commands will interfere

with conceptual thought processes, as in the Stroop effect

in psychology. It is therefore considered appropriate to

combine the respective strengths of two modes of input,

menu-driven and verbal input.

The view that speech and gestures are complementary

modes of communication rather than redundant is supported

in HCI studies by Oviatt [17], Mignot and Carbonell [16],

Cohen [6] and Hauptmann [11]. This notion also has formed

the basis of the architecture underlying the multimodal soft-

ware framework proposed by Vo and Waibel [22]. Proto-

type applications that use multimodal input have been

constructed successfully in recent years, but there is scant

evidence from user testing that the integration of input

modes was successful from a usability perspective as

opposed to a purely technical perspective. This means that

there is a lack of design principles or heuristics to guide

designers in constructing such systems.

4. Towards a designed solution

A first step towards a designed solution is to collect,

analyse and model data about how designers undertake

the creative activity of sketching. This will provide, along

with the models of sketching outlined earlier in the paper, a

basis for identifying user requirements to be satisfied by the

tool. The task was specific enough to identify relatively

narrowly defined user requirements, but the sketching activity

inherent in it is general enough for the results of the user

studies to be applied to other sketching applications.

Based on our firsthand observations, the sketching task

was shown to be highly iterative in nature and the psychological

model of sketching proposed by Oxman referred to

previously was found, with modification, to be an appropriate

representation of the activity of sketching observed and

described above. Although the test subjects in Oxman's

study were architects rather than animators, it was found

that there were similarities in the process of sketching across

the domains and only minor adjustments to the model were

required.

The abstraction levels of typological schema, topology

and a formal system still apply in this domain; however,

the associated operational methods are slightly different.


The most significant difference between the domains of

architectural sketches and character illustration appears to

be in the abstraction level of the formal system. In the latter

domain, the conceptual components tended to include

elements such as compositional rules, perspective and

style guides rather than the central, linear and axial concepts

employed by architects. In addition, the operational

elements differed between the domains. In particular,

symmetry operations were not observed, but translation,

rotation, scaling and zooming were commonplace. Fig. 4

shows Oxman's model [18], with modifications appropriate

to our domain. For a full account of the modelling process

and results, see Ref. [20].

The sources for the requirements are the observed

task behaviour of subjects and the relevant literature.

In addition, the requirements benefit from informal discus-

sions with designers and informal evaluations with existing

technology.

4.1. Requirements

The main requirement is to support a rapid iteration of

design ideas. This requirement appears to be of fundamental

importance and can be achieved through a number of design

features:

1. Allow access to functionality as quickly as possible by

ensuring dialogues and command sequences are kept as

short as possible.

2. Support the use of layers. Transparent layers allow

designers to create a modified version of a previous

sketch. Tracing paper is frequently used and could provide

a useful metaphor for a sketching program. It should be very

easy to add, remove, show or hide layers.

3. Support a variety of pencil thicknesses and colour intensities.

All the designers who used pencils to sketch used

either different sized or coloured pencils or varying

pressures to achieve the same effect.

4. Allow users to perform manipulations on their drawings.

It should be easy to move and manipulate lines. The

transformations to be supported should be rotation, scal-

ing, insertion and deletion.

5. Support the expected aesthetic. Users of design packages

now expect images to be anti-aliased so that their work

does not appear computer drawn.

6. Provide as much space on the screen as possible for

drawing. Several users complained that the assortment

of palettes and widgets on the edges of their screen

took up too much space.

4.2. System design

Stemming from the requirements are two fundamental

design principles that underlie the implementation of

Speak 'n' Sketch. The first is to minimise the time spent

in accessing functionality so that as much time as possible is

spent actually drawing. The second is to make the screen as

uncluttered as possible, making maximum space for

drawing.

Rough sketching is a rapid process during which ideas

can evolve from conception to completion in a matter of

minutes and access to drawing functionality therefore needs

to be rapid. Most solutions to this problem relate to provid-

ing palettes of widgets, which are always available, but this

then exacerbates the second problem of lack of screen space.

This becomes a major problem, highlighted by designers,

because they can then spend significant amounts of time

scrolling around their image, as it does not fit on the screen.

Driven by these two principles, the initial screen layout

for Speak 'n' Sketch does not have anything visible that is

not currently being used. All functionality can be accessed

via voice commands and pop-up menus. A layering system

has been implemented to simulate tracing behaviour (see

Fig. 5). As layers are added, the lines on the lower layers

become progressively lighter in colour thus allowing the


Fig. 4. Abstraction levels and their associated methods for character and illustration sketching.

artist to trace over a rougher version of the image. The

layers can be navigated by clicking on their associated tab

with the stylus. These tabs also become lighter in colour the

further down the pile they are. A thick black line

outlines the topmost active one. Up to five different sheets

can be visible at a time, but each can be arbitrarily hidden or

displayed by toggling the check box on the tab. Although

layering is used in applications like Photoshop, the layers there do not

have any opacity at all, so lower layers are indistinguishable

from the higher ones, and hiding individual layers is a multi-step

process. The drawn images are treated as vectors,

which means that rather than being represented as a collection

of pixels the image is represented as a set of lines. Each

line is an arbitrarily shaped filled polygon that can be individually

selected, translated, rotated and scaled. Lines can

be grouped together and treated as single entities that can

also be moved, rotated and scaled. Whenever an object is

selected, the corners of the bounding box are drawn in,

either as circular handles for rotation or edged corners for

scaling. The user can grab one of these handles by pressing

down on it and dragging.
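
The tracing-paper behaviour of the layers can be sketched as follows. This is a minimal illustration and not the Speak 'n' Sketch implementation: the linear fade, the function name `layer_shades` and the `lightest` parameter are assumptions made here. Only the limit of five visible sheets and the rule that lower layers appear lighter come from the description above.

```python
MAX_VISIBLE = 5  # the paper states that up to five sheets can be visible at once


def layer_shades(visible, lightest=0.8):
    """Return a grey level (0.0 = black, 1.0 = white) for each layer.

    `visible` holds one boolean per layer, bottom layer first. Hidden layers
    get None. Among the visible layers the topmost is drawn darkest and each
    layer beneath it is drawn one step lighter. The linear fade and the
    `lightest` limit are assumptions made for this sketch.
    """
    shown = [i for i, is_visible in enumerate(visible) if is_visible]
    if len(shown) > MAX_VISIBLE:
        raise ValueError("at most five layers can be visible at a time")
    step = lightest / (MAX_VISIBLE - 1)
    shades = [None] * len(visible)
    for depth, index in enumerate(reversed(shown)):  # depth 0 is the topmost
        shades[index] = round(depth * step, 3)
    return shades


# Three visible layers: the bottom one is lightest, the top one is black.
print(layer_shades([True, True, True]))
# Hiding the middle layer lets the bottom layer move one step darker.
print(layer_shades([True, False, True]))
```

Computing the shade from the depth among visible layers, rather than from the absolute layer index, means that hiding a sheet darkens the ones beneath it, much as removing a sheet of tracing paper would.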

Resizing is a common manipulation performed by illus-

trators and graphic designers. Photocopiers are used to

enlarge or reduce images and then they are glued onto a

new drawing. In addition to resizing an image for composi-

tional purposes, the scaling option provides a facility for

working on details of a drawing whereby a user can quickly

enlarge an image, work on the fine detail and then scale it

back down to the desired size. The direct manipulation of

the translation and rotation also mimics to some degree the

type of paper jiggling observed in artists drawing very

dynamic, active scenes. In bitmap-based packages such as

Photoshop, this type of direct manipulation is not possible.

In order to rotate or scale parts of an image, the area must


Fig. 5. The Speak 'n' Sketch interface.

first be selected and then a dialogue sequence entered, in

which the user must specify the angle and direction of the

rotation or the scaling ratio.
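
Because each line is stored as a polygon, translation, rotation and scaling reduce to arithmetic on its vertices. The following is a minimal sketch under that assumption. The function names and the representation of an outline as a list of (x, y) tuples are illustrative choices and are not taken from Speak 'n' Sketch.

```python
import math


def translate(points, dx, dy):
    """Shift every vertex of an outline by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


def scale(points, factor, origin=(0.0, 0.0)):
    """Scale an outline about `origin` by `factor`."""
    ox, oy = origin
    return [(ox + (x - ox) * factor, oy + (y - oy) * factor) for x, y in points]


def rotate(points, degrees, origin=(0.0, 0.0)):
    """Rotate an outline about `origin` (counter-clockwise when y points up)."""
    ox, oy = origin
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [(ox + (x - ox) * c - (y - oy) * s,
             oy + (x - ox) * s + (y - oy) * c) for x, y in points]


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y), e.g. for drawing selection handles."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Enlarge a shape to work on detail, then scale it back to its original size.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
enlarged = scale(square, 4.0)
restored = scale(enlarged, 0.25)
print(bounding_box(enlarged), restored == square)
```

The same functions apply to a group of lines by mapping them over every outline in the group, which is one way the grouped entities described above could be moved, rotated and scaled as a unit.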

The pop-up menus are implemented in a radial format

rather than the traditional linear one. Early research has

demonstrated that selection from radial menus is much

quicker than from linear ones [4]. Furthermore, because

the selection is based on broad angles rather than the precise

distance travelled down a linear path, commands can often

be executed from a radial menu without necessarily attend-

ing to it visually. Submenus are displayed by selecting a

node in a parent menu, similar to the way in which nested

menus work. The submenus are also radial so a complete

selection from hierarchical pie menus will consist of a series

of strokes with angles to differentiate them. Further inves-

tigation into the design of pie menus has been conducted by

Kurtenbach and Buxton [13,14]. In this work the concept of

a radial menu has been pared down even further by introdu-

cing the idea of marks to make menu selections. Essentially,

once the user is familiar with the location of menu items, it

is no longer necessary to display the entire menu and

selections can simply be made by making the appropriate

gesture. As a minimum confirmation of an action being

performed when the menu is not displayed, it is still helpful

for the user to see an ink trail of the cursor's movement [13].

This is referred to as marking mode. In this mode, there is

virtually nothing obstructing the view of the drawing, again

reducing visual interference to the absolute minimum. Fig. 6

shows pie menus in both regular and marking modes. The

design of the menus in Speak 'n' Sketch is based on the

guidelines for radial marking menus established by Kurten-

bach [13].
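
Selection by angle rather than by distance can be illustrated with a short sketch. It is not the Speak 'n' Sketch code: the clockwise layout starting at the top, the `dead_zone` radius, the function names and the example menu labels are all assumptions made for illustration.

```python
import math


def select_sector(start, end, items, dead_zone=10.0):
    """Pick a pie-menu item from the direction of a stylus stroke.

    Items are assumed to be laid out clockwise with the first item centred
    at the top. Screen coordinates are assumed, with y growing downwards.
    Strokes shorter than `dead_zone` select nothing.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    if math.hypot(dx, dy) < dead_zone:
        return None
    # Angle measured clockwise from "up", normalised to [0, 360).
    angle = math.degrees(math.atan2(dx, -dy)) % 360.0
    width = 360.0 / len(items)
    index = int(((angle + width / 2) % 360.0) // width)
    return items[index]


def select_path(strokes, menu):
    """Follow one stroke per level through nested pie menus.

    `menu` is a list of (label, submenu) pairs; `submenu` is None for a
    command. Returns the list of labels chosen, or None if a stroke is
    too short to count as a selection.
    """
    labels = []
    for start, end in strokes:
        choice = select_sector(start, end, menu)
        if choice is None:
            return None
        label, submenu = choice
        labels.append(label)
        if submenu is None:
            break
        menu = submenu
    return labels


# Hypothetical two-level menu: an upward stroke, then a downward stroke.
pen_menu = [("thicker pen", None), ("thinner pen", None)]
root = [("thickness", pen_menu), ("colour", None), ("layers", None), ("edit", None)]
print(select_path([((0, 0), (0, -40)), ((0, 0), (0, 40))], root))
```

Only the direction of each stroke matters, so the same code serves both the regular mode, where the menu is drawn, and the marking mode, where only the ink trail is shown.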

All of the commands that can be executed with the menus

can also be performed with verbal input with the added

advantage that the hierarchical structure of the menus

does not have to be navigated. For instance, in order to

obtain a thicker brush stroke, the user has to say only the

phrase "thicker pen" rather than saying "thickness menu"

followed by "thicker pen". By flattening out the grouped structure

of the command set, voice input permits a more direct

and less tedious dialogue sequence. Another advantage is

that simultaneous actions can be performed by using multi-

modal input. For example, the user can be rotating an object

using a stylus and change the colour of it using verbal

commands; an unwanted layer can be removed with the

menu commands while a new one is added with a voice

command, etc.
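
The flattening of the command hierarchy for voice input can be sketched as a lookup table built from the menu structure. This is an illustrative sketch only. Apart from the phrases "thickness menu" and "thicker pen", which appear in the text, every menu entry and command identifier below is hypothetical.

```python
# Hypothetical two-level menu: submenu name -> {spoken phrase: command id}.
MENU = {
    "thickness menu": {"thicker pen": "pen.thicker", "thinner pen": "pen.thinner"},
    "layer menu": {"add layer": "layer.add", "remove layer": "layer.remove"},
}


def flatten(menu):
    """Map every leaf phrase directly to its command, discarding the grouping."""
    flat = {}
    for submenu in menu.values():
        for phrase, command in submenu.items():
            if phrase in flat:
                raise ValueError(f"ambiguous voice command: {phrase!r}")
            flat[phrase] = command
    return flat


VOICE_COMMANDS = flatten(MENU)


def interpret(utterance):
    """Return the command for a recognised phrase, or None if it is unknown."""
    return VOICE_COMMANDS.get(utterance.strip().lower())


print(interpret("Thicker pen"))     # one utterance reaches the command
print(interpret("thickness menu"))  # submenu names are not voice commands here
```

Flattening only works if leaf phrases are unique across submenus, which is why the sketch rejects duplicates. A real command set would need to be designed with that constraint in mind.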

One of the design issues that must be addressed while

designing voice activated systems is the question of how

to provide adequate feedback to the user. A simple solution

is to provide a status bar that confirms their request or

informs them that their command has been understood.

Another solution would be to provide auditory feedback.

A status bar was provided in Speak 'n' Sketch on the

basis that this was likely to be less intrusive. However,

adequate studies have not been conducted yet to support

or refute this assumption, and it also could be a question

of personal preference.

The system design went through several iterations to take

into account feedback from professionals with artistic and

creative backgrounds who had different degrees of experi-

ence with technology. In summary, changes were made to

how the pop-up menus were activated, to the method

for hiding and showing individual layers and to shape

manipulation.

5. User studies

Informal evaluations with a small number of users have

been carried out on the redesigned system. Whilst we would

not want to claim that the results are rigorous and widely

generalisable, they do provide valuable data and lessons

learned about designing tool support for creative tasks.

First, assumptions about the nature of the task, i.e. speed

of drawing, and the implications for system performance are

discussed. Second, the study and partial results are

described. Finally, alternative ways of conducting the

evaluations are presented.

The perception of the analyst in observing the sketching

tasks being undertaken was that the task consisted of very

rapid strokes. This rapidity was supported by the system and

all system testing was based on this assumption. When

subjects were given a real task in the evaluation studies,

they drew far more slowly than expected. This difference

could be due to an illusion, an unintended perceptual

exaggeration on the part of the analyst observer. Alternatively,

the demand characteristics of the task and task

setting may have resulted in slower than normal performance from


Fig. 6. Pie menus: (a) in regular mode; and (b) in marking mode.

the artists and designers. The result was that the internal

representation of shapes contained several orders of magnitude

more points than was envisaged, and system performance

slowed accordingly. The speed of drawing was still

reasonable, but object manipulation became difficult.

System performance was therefore a significant usability

problem.
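
The link between drawing speed and the number of stored points can be made concrete with a small calculation. The paper does not say how strokes were sampled, so the sketch below assumes that the stylus position is recorded at a fixed rate. The function name, the sampling rate and the example speeds are illustrative assumptions, not measurements from the study.

```python
def points_in_stroke(length_px, speed_px_per_s, sample_rate_hz=100):
    """Number of samples recorded for a stroke drawn at constant speed.

    Assumes fixed-rate sampling of the stylus position, which the paper
    does not confirm. The extra point is the start of the stroke.
    """
    duration_s = length_px / speed_px_per_s
    return int(duration_s * sample_rate_hz) + 1


# The same 500-pixel stroke drawn quickly and drawn slowly.
fast = points_in_stroke(500, speed_px_per_s=1000)
slow = points_in_stroke(500, speed_px_per_s=50)
print(fast, slow)
```

Under this assumption the point count grows in inverse proportion to drawing speed, so every transformation of a slowly drawn shape has proportionally more vertices to process.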

There were a number of goals of the study: to investigate

the ways in which subjects evolved their character ideas via

sketching using traditional materials and the designed tool;

to examine the role of multimodal input in the context of a

creative task; and to consider the design issues related to

speech input and multimodal systems, in general. Because

of the slow performance of the tool, it was not realistic to

undertake experimental comparisons of traditional media

and the designed tool. Rather, it was decided to carry out

in-depth studies with a few subjects, allowing them to

undertake realistic sketching tasks and provide useful quali-

tative feedback about the design of the system.

The studies consisted of giving subjects a fictional character

description for "Roderick, the amazing daredevil

frog", whom they were then asked to represent in one of

his daredevil activities for a promotional poster. The type

of task was derived from examples given by professional

illustrators and animators. In addition, it is very common to

use reference materials during such a task and so they were

also provided with several images that could be referred to

or traced over. Prior to the task, a brief demonstration of the

system was carried out and the functionality of the marking

menus explained. The subjects were then given time to

familiarise themselves with the system, and to execute

small subtasks such as drawing lines, grouping, changing

colours, etc. They were then asked to dictate a series of

commands to the speech recognition system. This had the

dual purpose of fine-tuning the voice recognition to their

voice, and familiarising them with the command set. After

these preliminaries, they were asked to create a representa-

tion of Roderick the frog. During this phase they were

videotaped, with the intention that they might later provide a

retrospective protocol regarding the evolution of their concepts.

This technique was used in a similar sketching study

by Suwa and Tversky [21]. Prior to their study, the most

common technique was to ask the designer to give a running

commentary or concurrent protocol. A concurrent commentary is

likely to affect performance of the task [8]; moreover, Speak 'n' Sketch

would be unable to differentiate speech intended as input

from that intended as a commentary.

5.1. Results

The first subject was a professional illustrator who

worked exclusively with traditional media (paper and

pencils). Despite initial apprehension at using the new technology,

he adapted to the use of the tablet and stylus remarkably

quickly. He produced a number of different sketches, first to

work out the composition of the scene and to get a sense of what

Roderick's face would be like; the later sketches

were more final renditions of the scene. Because

he was accustomed to using only traditional media, he did

not seem to need to make use of any of the transformation

functions that were available. In his particular method of

working, these types of manipulation come towards the end

of the task rather than in the early stages. Hence he did not

attempt to group objects together, resize, rotate or translate

them. However, the colour palette of progressively lighter

and darker shades was particularly useful for his style of

sketching.

The second subject was a professional animator who

frequently designed web-based animations. She was highly

experienced with a tablet and stylus and was accustomed to

operating a tablet and keyboard simultaneously. Although

she attempted to perform transformations on the character,

performance problems ensued and she subsequently

confined her activity to drawing.

The third subject was a professional graphics designer

who was highly experienced in a wide range of drawing

packages and taught a course in computer graphic design

at a college of art. She had used tablets before, but primarily

used the keyboard and mouse in her work. Her version of

Roderick went through fewer iterations than that of the first

subject, and she made much more use of rotation and scaling

operations. Her initial image of the frog was drawn with him

facing upwards because she found him easier to draw that

way. He was then rotated until he was facing downwards and

scaled down, and a parachute was added. She did not

make use of the layer functionality.

6. Discussion and conclusions

Performing simple tasks gave the subjects enough experience

of the system to be able to comment positively and

negatively on the design features of Speak 'n' Sketch. All of

the subjects agreed that the design of the layering system

was very effective. They found that the tracing paper effect

was very useful for redrawing purposes and it helped to

know which layer was currently being worked upon. The

speech input was also met with great enthusiasm. In particular,

subjects who used computers regularly in their work

were convinced that this would increase their productivity

because it was faster and "allowed continuity of thought".

"It means you don't have to stop and think about where you

have to go", commented one subject. In a similar vein,

another subject noted that "I think those words a hundred

times a day: group, optimise, rotate. It would be so good to

just say it as soon as you think it". These remarks seem to

suggest that, in addition to speeding up the execution of

functions, verbally expressed commands also reduced

cognitive processing. We must be careful, however, not to

claim, on the basis of qualitative data and subjective comments

alone, without the benefit of cognitive modelling and performance

data, that the transition from thought to speech is more

direct than that from thought to hand movement. Further studies

combining the use of task analytic techniques such as

Task Knowledge Structures (TKS) [12], and cognitive

modelling of mental resources such as in Interacting Cognitive

Subsystems (ICS) [1], would aid in addressing this

question.

The subject who depended on a tablet for her work

remarked that the angle at which she worked in order to

operate both tablet and keyboard caused shoulder and

back strain. Voice input would alleviate this problem. She

felt that, at a minimum, voice input had the same functionality

as keyboard shortcuts and would improve the quality of

her work. A disadvantage of voice input, however, and

one often referred to in studies of multimodal input, is the

disruption to other workers in situations like open studios.

The disruption, whilst real and significant, is no worse than

someone talking on the telephone. In fact, it could be considered

less disruptive in that, to people not involved in the task,

the vocalisations consist of random (to them) commands out

of context and not potentially interesting conversations.

All of the users wanted more palettes available to them, as

far more space was available in Speak 'n' Sketch than in commercially

available packages, where palettes could be a problem.

Certain functionality, such as changing pen size, needed to be

immediately available in one easy click. The inexperienced

computer user made the analogy between picking up a

new pencil and clicking on a pen-size palette. He considered

this to be more natural than moving through a menu.

The radial menus received mixed reviews: two people

said they would use them in a real system, but the third

preferred keyboard shortcuts. Her rationale was that the

hand not being used for drawing could be used for the

keyboard commands. There is a deep issue here about the

degree of disruption and the cognitive demands that relate to

keyboard shortcuts and radial menus. This needs further

systematic investigation.

The evaluation had to be of an informal nature because it

would have been unrealistic to expect users to quickly learn a

new system, adapt to voice input, adapt to a new type of menu

and still expect them to be able to draw in their preferred

manner, naturally and easily. There were many new features

of the implementation; whilst this allowed us to obtain feedback

on a number of design issues and choices, it made it

impossible to isolate the effects of individual design

features as opposed to their use in combination. Such isolation

would have been necessary in any systematic experimental

manipulation of the factors for the purposes of

validating the psychological models of sketching,

comparing task performance and quality in traditional

and computer supported contexts, and in assessing the

behavioural and performance factors associated with the

cognitive load of different uses of modalities.

A longitudinal study would also be necessary to understand

how the tasks evolve through long-term use of

computer support. Some features might prove useful, while others

might prove unnecessary or need to be modified or enhanced.

In future research, we aim to use extensions to the

implemented system as a test-bed for undertaking further

studies, where we will isolate the effects of speech input

and changes to the menu system over the short and long

term. We also intend to undertake further theoretical

and empirical research into identifying, measuring and

assessing the impact on performance of the different cognitive

resources used in interacting with different modalities.

The ultimate research aim is to develop principles

for multimodal tools to support highly creative and

iterative work tasks.

Acknowledgements

We are grateful to the British Council for funding this

research. Many thanks to Justin Wyatt for contributing all

the sketches in this paper. Walt Disney Co copyrights all

sketches.

References

[1] P.J. Barnard, J. May, Representing cognitive activity in complex tasks, Human Computer Interaction (1999) 14.

[2] M. Boden, The Creative Mind: Myths and Mechanisms, Weidenfeld & Nicholson, London, 1992.

[3] S. Bhavnani, et al., CAD usage in an architectural office: from observations to active assistance, Automation in Construction 5 (3) (1996) 243–255.

[4] J. Callahan, et al., An empirical study of pie vs linear menus, CHI88, 1988, pp. 95–100.

[5] L. Candy, E.A. Edmonds, Supporting the creative user: a criteria-based approach to interaction design, Design Studies 18 (2) (1997) 184–194.

[6] P. Cohen, The role of natural language in a multimodal interface, UIST, 1992, pp. 143–149.

[7] E.Y. Do, M. Gross, Drawing as a means to design reasoning, Artificial Intelligence in Design '96, 1996, pp. 22–27.

[8] K.A. Ericsson, H.A. Simon, Protocol Analysis: Verbal Reports as Data, MIT Press, Cambridge, MA, USA, 1984.

[9] G. Goldschmidt, The dialectics of sketching, Creativity Research Journal 4 (2) (1991) 369–383.

[10] M. Gross, The electronic cocktail napkin: a computational environment for working with design diagrams, Design Studies 17 (1) (1996) 53–69.

[11] A. Hauptmann, Speech and gesture for graphic image manipulation, CHI89, 1989, pp. 241–245.

[12] H. Johnson, P. Johnson, Task knowledge structures: psychological basis and integration into system design, Acta Psychologica 78 (1991) 3–26.

[13] G. Kurtenbach, Some articulatory and cognitive aspects of marking menus: an empirical study, Human Computer Interaction 8 (2) (1993) 1–23.

[14] G. Kurtenbach, W. Buxton, The limits of expert performance using hierarchic marking menus, CHI93, 1993.

[15] B. Lawson, How Designers Think: the Design Process Demystified, Butterworths, London, 1990.

[16] C. Mignot, N. Carbonell, Commande orale et gestuelle: etude empirique, Technique et Science Informatiques 15 (10) (1996) 1399–1428.

[17] S. Oviatt, Multimodal interfaces for dynamic interactive maps, CHI96, 1996.

[18] R. Oxman, Design by re-representation: a model of visual reasoning in design, Design Studies 18 (4) (1997) 329–347.

[19] D. Schon, Designing as reflective conversation with the materials of the design situation, Knowledge Based Systems 5 (1992) 3.

[20] J. Sedivy, Multimodal tool support for sketching, QMW Technical Report, 1998.

[21] M. Suwa, B. Tversky, What do architects and students perceive in their design sketches? A protocol analysis, Design Studies 18 (4) (1997) 385–403.

[22] M.T. Vo, A. Waibel, A multimodal human computer interface: combination of speech and gesture recognition, InterCHI93, 1993.

