Multimodal tool support for creative tasks in the visual arts
J. Sedivy (a,1), H. Johnson (b,*)
(a) Telepresence Systems Inc., 300-8 Market Street, Toronto, Canada M5E 1W5
(b) Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, UK
Abstract
This paper presents an investigation into computer tool support for creative tasks in the visual arts, in particular sketching. The research
draws from both theoretical models of creative processes and empirical observations of sketching activity in the domain of character
development in animation and illustration. These sources show that creative problem solving through sketching is a highly iterative process
during which various constraints are rapidly evolving and being adapted to. A set of requirements for an experimental, multimodal sketching
tool was developed, drawing from our theoretical and empirical studies. In order to support the intense, rapid nature of the task, voice input
was implemented allowing the user to access functionality without interrupting the hand activity. The multimodal tool also included radial
marking menus in order to provide rapid navigation to functionality. Informal evaluations were conducted that provided qualitative feedback
about the use of the system. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Sketching; Multimodal tools; Creativity
1. Introduction
The goal of the research to be reported in this paper was
to investigate ways in which creative tasks in the visual arts,
in particular sketching, could be supported by computer
tools. There were two initial aims of the research: to review
literature related to both psychological and philosophical
models of sketching, and to assess the feasibility of
employing multimodal input as a means of supporting the
rapid dialogue between the artist and the sketch.
Investigation into a variety of domains was undertaken in
order to make decisions about which task categories to
support, and to analyse the requirements that any proposed
sketching support tool should meet. There were two important
facets of the research: collecting, analysing and modelling
data about creative sketching tasks, and discussing the possibilities
for computer tool support with professionals in the field.
The specific domain chosen to be supported by the proposed
multimodal tool was character development in animation and
illustration. As a result of the observational and modelling
phases, a multimodal vector-based drawing program was
developed and informally tested on a number of experienced,
professional users. The intention was that the informal evaluations
would provide information about design issues concerning
speech input for creative work tasks and that the sketching
tool could be used as a test-bed for further research on the role
of input modality in sketching.
The informal evaluations revealed the difficulty subjects
encountered when using new forms of input, new types of menu
and new tools. Alternative approaches to evaluation are
suggested.
2. Creativity
Creativity can refer to various kinds of problem solving, non-linear
thinking or inspirational creativity (from the Muses). The term
can be applied to thought processes in the arts or in the
sciences. Creativity is not easy to define, but can be construed
(cf. Refs. [2,15]) as a process whereby there is an evolution
towards a solution to a problem, which makes use of a combination
of logical and illogical mechanisms. When faced with
any creative task or problem, it is essential to be able to investigate
a variety of alternative solutions and this process can
outwardly vary across disciplines. For example, scientists
might use diagrams and equations written on a chalk board
while musicians might doodle on the keyboard. In the case of
creativity in the visual arts, the production of rough sketches in
the initial stages of design is extremely common. Perceptually
and physically these three activities are quite different and yet
they all serve a similar purpose in the creative process. They
all can be seen as external representations of ideas that can
be manipulated, modified, communicated and used to stimulate
the development of new ideas.
Knowledge-Based Systems 13 (2000) 441–450
0950-7051/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII: S0950-7051(00)00064-2
www.elsevier.com/locate/knosys
* Corresponding author. Tel.: +41-122-532-3215; fax: +41-122-582-6492.
E-mail addresses: [email protected] (J. Sedivy), [email protected] (H. Johnson).
1 Tel.: +1-416-777-1177, ext. 32; fax: +1-416-777-1188.
Derived from 'Supporting creative work tasks: The potential of multimodal
tools to support sketching', published in the Proceedings of the Third
Conference on Creativity and Cognition, Loughborough, UK, October 10–13,
1999, pp 42–49. Reproduced with permission from ACM © 1999.
From the perspective of providing tool support for creativity,
it is important to understand the relationship between
the designer and the external representations they employ
and for what purpose. Within the domain of visual arts,
sketches and diagrams are the most common means by
which ideas can be explored; therefore, it is necessary to
analyse the activity of sketching.
3. Sketching
3.1. Psychological models of sketching
The relationship between the physical act of sketching,
the sketched images, and the cognitive processes of the
artists themselves is frequently described as a "conversation"
between the drawing and the artist [15,19,21]. In creating
a sketch, designers are making an explicit representation
of their ideas, which then aids in further reasoning about the
problem. It is useful in this context to distinguish between
sketches that are used to explore ideas and therefore are a
means to an end and the drawings that are the end products
of the sketching process. "Reasoning" through the initial
sketches need not necessarily be logical, but can be "loosely
structured" or analogical. Moreover, different perceptual
"modes" can be stimulated by the same drawing. For example,
Goldschmidt [9] defines two different states, which she
refers to as seeing as and seeing that. Seeing as refers to the
process of using "figural" thinking while sketching and
seeing that refers to the use of non-figural elements to reason
about the design.
Suwa and Tversky [21], in analysing how architects
perceive their sketches, devised a taxonomy of information
categories that could be used to classify what designers
described about their drawings. They further decompose
Goldschmidt's seeing as mode into emergent properties,
spatial and functional relations and background knowledge.
These categories were then used to define chunks of information
that were perceived by the designers. Their analysis
revealed that detailed consideration of topics related to a
particular idea occurred during examination of local spatial
relations in a sketch.
Oxman [18] concentrates less on the data perceived by
designers than on the process of design itself. She also
argues that there are multiple abstracted representations
that the designer can extract from the sketches. These
abstractions are broadly divided into three categories: typological
schema, topology and formal systems. Each of these
abstraction categories is associated with operational methods
that can be observed as sketching actions such as refinement,
generalisation, scaling or symmetry operations. Her
model of visual reasoning in sketching is based on the re-representation
hypothesis, which defines creativity as using
a cycle of re-representations as a means for conceptual
exploration. She extends this theory by arguing that re-representations
are driven by external and internal
constraints and that the new designs adapt to satisfy these
constraints. The adaptation is achieved through the
designer's perception of the different abstraction levels
and the execution of associated operations. The resulting
model of re-representation is described as a cyclical activity
that passes through a series of distinct stages before repeating
itself.
3.2. Character sketching for book illustration
The tasks to be considered in depth relate to character
development in illustration or animation. For the sake of
brevity, only our studies of illustrating will be outlined
here. Illustration, for instance in a book, can involve either
the creation of new visual representations of fictional characters
or incorporating predefined personas into a scene.
Unless the author and the artist work in close collaboration,
it is typical for illustrators to receive a copy of the story text
and a short description of what should be depicted on each
page. It is then left to the artist's discretion as to the overall
composition of the picture and the precise positioning of the
characters relative to each other. This decision is based on
the illustrator's knowledge of what is happening in the story
and knowledge of the general style expected by the storywriter.
In the case of small independent writers, this may be
an informal description; in the case of large clients such as
Disney or Warner Brothers, there may be strict style guides.
Further informing this decision is knowledge of the
established rules governing the way compositional elements lead
the eye around the page so that it focuses on the desired
element, and of the way compositional elements contribute to
dramatic feeling.
Fig. 1. First two renderings of a Winnie the Pooh scene.
Initially, the illustrator works out different ideas for the
composition as thumbnail sketches. These are done with
thick, soft pencils in a light colour. As the ideas for the
composition become more refined, darker colours and thinner
pencils are used to draw over the existing drawing. Also
as the ideas become more refined, the drawing is redone on a
larger scale so that smaller details can be worked out.
Throughout the sketching process, the artist pauses to rotate
the paper at different angles or tilts their head to get a
different perspective. While drawing a scene that involved
a lot of movement and characters jumping around, one artist
jiggled the paper up and down to aid in visualising how the
motion could appear.
This iterative process can be seen in the sketch samples in
Figs. 1 and 3. In the first image (Fig. 1a), the artist has
roughly laid out the appearance of the scene and the approximate
positioning of the characters. At this stage, the characters
are not recognisable; they merely serve as
compositional elements. The thickness and 'fuzziness' of
the lines are very deliberate because they allow the artist
to imagine more than one possibility for a particular
element. In fact, several scene elements have been drawn
in more than one position for the purpose of experimenting
with the scene. For example, the leftmost character (Tigger)
and the rightmost character (Christopher Robin) have been
drawn both with their arms at their side and with one arm
waving in greeting. Rubbers are never used. If a "mistake" is
made, it is simply drawn over in a darker pencil or traced
over on a new piece of tracing paper. At the early stages of
the sketch, however, the artist does not consider these as
mistakes; rather, they are experiments with different compositions
and poses. In many situations, the illustrator prefers
to have both representations visible to him so that he can
evaluate their dramatic and compositional impact simultaneously.
It is also common to experiment with elements of a
scene in whatever space is available on the drawing surface.
Fig. 3 shows a part of an illustration where the details of a
character's feet are experimented with in the margins.
Once the composition of the scene has been decided, the
artist places a sheet of tracing paper over it and roughly
retraces the scene that was last drawn in the darkest colours
(Fig. 1b). In this iteration, detailed features of the characters
and their facial expressions are developed on a larger scale
than the initial thumbnail. The peripheral elements of the
scene can be ignored for the moment, but are revisited later
for the final image, which integrates the story text with the
illustration (Fig. 2). At these later stages, the artist might
consult a book of style guides provided by the client to make
sure that the proportions have been drawn correctly and that
the movements and facial expressions are "in character". As
the drawing nears completion, the image is photocopied
onto the page containing the story text to ensure that no
crucial elements of the drawing interfere with the text or
the spine of the book.
Fig. 2. Final version of the Winnie the Pooh scene.
Fig. 3. Experimenting with details in the margins.
A photocopier is a common tool during the later stages in
the drawing, when the composition is being finalised.
Images, or parts of images, are often enlarged or reduced
then cut out and glued into a new drawing. For example, the
artist might especially like the way a particular character
was drawn, but for its wrong size. This provides a clue as to
the types of operations (scaling, copying, rotations) that
could be useful to support in a sketching tool. These operations
are clearly difficult for humans to perform since an
accomplished artist would rather walk over to another
room, make an enlarged/reduced photocopy, cut out an
image and paste it into a scene, simply to save the effort
of redrawing a precise copy in a different size.
3.3. Tool support for sketching
The studies reported in Section 3.2 clearly indicate the
iterative, manipulative and reasoning elements of representing
and re-representing ideas for the purposes of solving
creative design problems. We need to know if tools can
aid this activity and what current computer tool solutions
exist.
Although there is a wide variety of powerful CAD tools
to assist architects and graphics designers to create representations
of their final designs, very little exists in the way
of support for the earlier stages of the design process. In
fact, some of the tools can actually hamper creativity [3]. It
is crucial from an HCI perspective to consider which part of
the activity or process of sketching should be supported.
There are broadly two possibilities. The first relates to
supporting the designer with the representation process
itself on an abstract level. This can be done by suggesting new
or alternative representations, defining constraints, or assisting
the user in lateral thinking. This approach has been
explored extensively by Candy and Edmonds [5], Suwa
and Tversky [21] and Gross [7,10] amongst others. The
second possibility relates to supporting only the physical
act of sketching. Under this paradigm, creativity is solely
the responsibility of the human using the tool. Support for
this level of the creative process has been relatively unexplored.
A notable exception is the Electronic Cocktail
Napkin [10], which provides some support for drawing
management and allows users to sketch diagrams using a
stylus input. However, because most of the tool's features
are focused towards assisting the user in defining spatial
relationships and constraints, it falls more readily into the
first category.
Our approach was to explore the potential of aiding the
physical act of sketching thus supporting what Oxman terms
ªoperational methodsº rather than ªabstraction levelsº. How
can this be accomplished?
Since sketching consists of intense activity for the hands,
speech input could be used potentially to manipulate a
sketch. Speech input, in lieu of commands expressed
through traditional menu bars, would leave the hands free
to work on the art and for experienced users also save time.
A mixture of speech and gesture is also a natural combination
for certain types of expression that are difficult to
express through menus. Speech input alone has a number
of potential disadvantages, for instance, how is the novice
user given information about what commands are available
and how will they understand the structure of the command
set? An additional problem relates to the fact that we do not
know whether issuing linguistic commands will interfere
with conceptual thought processes, as in the Stroop effect
in psychology. It is therefore considered appropriate to
combine the respective strengths of two modes of input,
menu-driven and verbal input.
The view that speech and gestures are complementary
modes of communication rather than redundant is supported
in HCI studies by Oviatt [17], Mignot and Carbonell [16],
Cohen [6] and Hauptmann [11]. This notion also has formed
the basis of the architecture underlying the multimodal software
framework proposed by Vo and Waibel [22]. Prototype
applications that use multimodal input have been
constructed successfully in recent years, but there is scant
evidence from user testing that the integration of input
modes was successful from a usability perspective as
opposed to a purely technical perspective. This means that
there is a lack of design principles or heuristics to guide
designers in constructing such systems.
4. Towards a designed solution
A first step towards a designed solution is to collect,
analyse and model data about how designers undertake
the creative activity of sketching. This will provide, along
with the models of sketching outlined earlier in the paper, a
basis for identifying user requirements to be satisfied by the
tool. The task was specific enough to identify relatively
narrowly defined user requirements, but the sketching activity
inherent in it is general enough for the results of the user
studies to be applied to other sketching applications.
Based on our firsthand observations, the sketching task
was shown to be highly iterative in nature and the psychological
model of sketching proposed by Oxman referred to
previously was found, with modification, to be an appropriate
representation of the activity of sketching observed and
described above. Although the test subjects in Oxman's
study were architects rather than animators, it was found
that there were similarities in the process of sketching across
the domains and only minor adjustments to the model were
required.
The abstraction levels of typological schema, topology
and a formal system still apply in this domain; however,
the associated operational methods are slightly different.
The most significant difference between the domain of
architectural sketches and character illustration appears to
be in the abstraction level of the formal system. In the latter
domain, the conceptual components tended to include
elements such as compositional rules, perspective and
style guides rather than the central, linear and axial concepts
employed by architects. In addition, the operational
elements differed between the domains. In particular,
symmetry operations were not observed, but translation,
rotation, scaling and zooming were commonplace. Fig. 4
shows Oxman's model [18], with modifications appropriate
to our domain. For a full account of the modelling process
and results, see Ref. [20].
The requirements draw on the observed task behaviour of
subjects and on the relevant literature.
In addition, the requirements benefit from informal discussions
with designers and informal evaluations with existing
technology.
4.1. Requirements
The main requirement is to support a rapid iteration of
design ideas. This requirement appears to be of fundamental
importance and can be achieved through a number of design
features:
1. Allow access to functionality as quickly as possible by
ensuring dialogues and command sequences are kept as
short as possible.
2. Support the use of layers. Transparent layers allow
designers to create a modified version of a previous
sketch. Tracing paper is frequently used and could provide
a useful metaphor for a sketching program. It should be
very easy to add, remove, show or hide layers.
3. Support a variety of pencil thicknesses and colour intensities.
All the designers who used pencils to sketch used
either different sized or coloured pencils or used varying
pressures to achieve the same effect.
4. Allow users to perform manipulations on their drawings.
It should be easy to move and manipulate lines. The
transformations to be supported should be rotation, scaling,
insertion and deletion.
5. Support the expected aesthetic. Users of design packages
now expect images to be anti-aliased so that their work
does not appear computer drawn.
6. Provide as much space on the screen as possible for
drawing. Several users complained that the assortment
of palettes and widgets on the edges of their screen
took up too much space.
4.2. System design
Stemming from the requirements are two fundamental
design principles that underlie the implementation of
Speak 'n' Sketch. The first is to minimise the time spent
in accessing functionality so that as much time as possible is
spent actually drawing. The second is to make the screen as
uncluttered as possible, making maximum space for
drawing.
Rough sketching is a rapid process during which ideas
can evolve from conception to completion in a matter of
minutes and access to drawing functionality therefore needs
to be rapid. Most solutions to this problem relate to providing
palettes of widgets, which are always available, but this
then exacerbates the second problem of lack of screen space.
This becomes a major problem, highlighted by designers,
because they then can spend significant amounts of time
scrolling around their image, as it does not fit on the screen.
Driven by these two principles, the initial screen layout
for Speak 'n' Sketch does not have anything visible that is
not currently being used. All functionality can be accessed
via voice commands and pop-up menus. A layering system
has been implemented to simulate tracing behaviour (see
Fig. 5). As layers are added, the lines on the lower layers
become progressively lighter in colour thus allowing the
artist to trace over a rougher version of the image. The
layers can be navigated by clicking on their associated tab
with the stylus. These tabs also become lighter in colour the
further down the pile they are. A thick black line
outlines the topmost active one. Up to five different sheets
can be visible at a time, but each can be arbitrarily hidden or
displayed by toggling the check box on the tab. Although
layering is used in applications like Photoshop, the layers there
do not have any opacity at all so lower layers are indistinguishable
from the higher ones and hiding individual layers is a multi-step
process. The drawn images are treated as vectors,
which means that rather than being represented as a collection
of pixels the image is represented as a set of lines. Each
line is an arbitrarily shaped filled polygon that can be individually
selected, translated, rotated and scaled. Lines can
be grouped together and treated as single entities that can
also be moved, rotated and scaled. Whenever an object is
selected, the corners of the bounding box are drawn in,
either as circular handles for rotation or edged corners for
scaling. The user can grab one of these handles by pressing
down on them and dragging.
Fig. 4. Abstraction levels and their associated methods for character and illustration sketching.
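The tracing-paper behaviour of the layers described above can be illustrated with a minimal sketch. It is not the Speak 'n' Sketch implementation: the names Layer, layer_lightness and fade_step, and the fixed fade per sheet, are assumptions made for illustration. Only the limit of five visible sheets and the rule that lower sheets are drawn lighter come from the description above.

```python
from dataclasses import dataclass, field

MAX_VISIBLE = 5  # the text states that up to five sheets can be visible at a time


@dataclass
class Layer:
    """One sheet of 'tracing paper' (hypothetical structure)."""
    name: str
    visible: bool = True  # toggled by the check box on the layer's tab
    strokes: list = field(default_factory=list)


def layer_lightness(layers, fade_step=0.2):
    """Return {layer name: lightness} for the sheets that would be drawn.

    `layers` is ordered bottom to top. Lightness 0.0 means full-strength ink
    (the topmost visible sheet); each visible sheet below it is lighter by
    `fade_step`, an assumed value. Hidden sheets are skipped entirely.
    """
    shown = [layer for layer in reversed(layers) if layer.visible][:MAX_VISIBLE]
    return {layer.name: min(1.0, depth * fade_step)
            for depth, layer in enumerate(shown)}


if __name__ == "__main__":
    pile = [Layer("thumbnail"), Layer("discarded", visible=False), Layer("clean-up")]
    print(layer_lightness(pile))
```

Because hidden sheets are skipped before the depth is counted, hiding an intermediate sheet makes the sheet beneath it darker again, which is one plausible reading of the behaviour; the paper does not say which rule was used.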
Resizing is a common manipulation performed by illus-
trators and graphic designers. Photocopiers are used to
enlarge or reduce images and then they are glued onto a
new drawing. In addition to resizing an image for compositional
purposes, the scaling option provides a facility for
working on details of a drawing whereby a user can quickly
enlarge an image, work on the fine detail and then scale it
back down to the desired size. The direct manipulation of
the translation and rotation also mimics to some degree the
type of paper jiggling observed in artists drawing very
dynamic, active scenes. In bitmap-based packages such as
Photoshop, this type of direct manipulation is not possible.
In order to rotate or scale parts of an image, the area must
first be selected and then a dialogue sequence entered into in
which the user must specify the angle and direction of the
rotation or the scaling ratio.
Fig. 5. The Speak 'n' Sketch interface.
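The vector manipulations discussed above (translation, rotation and scaling of an individually selectable line) can be sketched as follows. This is an illustrative assumption rather than the authors' code: the class name Stroke, the use of the bounding-box centre as the pivot, and the method names are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stroke:
    """A drawn line stored as the outline vertices of a filled polygon."""
    points: list  # [(x, y), ...]

    def bounding_box(self):
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return min(xs), min(ys), max(xs), max(ys)

    def centre(self):
        x0, y0, x1, y1 = self.bounding_box()
        return (x0 + x1) / 2, (y0 + y1) / 2

    def translate(self, dx, dy):
        self.points = [(x + dx, y + dy) for x, y in self.points]

    def scale(self, factor):
        # Scale about the bounding-box centre (assumed pivot).
        cx, cy = self.centre()
        self.points = [(cx + (x - cx) * factor, cy + (y - cy) * factor)
                       for x, y in self.points]

    def rotate(self, degrees):
        # Rotate counter-clockwise about the bounding-box centre (assumed pivot).
        cx, cy = self.centre()
        a = math.radians(degrees)
        cos_a, sin_a = math.cos(a), math.sin(a)
        self.points = [(cx + (x - cx) * cos_a - (y - cy) * sin_a,
                        cy + (x - cx) * sin_a + (y - cy) * cos_a)
                       for x, y in self.points]


if __name__ == "__main__":
    foot = Stroke([(0, 0), (2, 0), (2, 2), (0, 2)])
    foot.scale(2)    # enlarge to work on fine detail
    foot.scale(0.5)  # scale back down to the desired size
    print(foot.bounding_box())
```

The enlarge, edit, reduce sequence in the usage example corresponds to the detail-work pattern described in the text; because the geometry is stored as vertices rather than pixels, repeated scaling does not degrade the drawing in the way resampling a bitmap would.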
The pop-up menus are implemented in a radial format
rather than the traditional linear one. Early research has
demonstrated that selection from radial menus is much
quicker than from linear ones [4]. Furthermore, because
the selection is based on broad angles rather than the precise
distance travelled down a linear path, commands can often
be executed from a radial menu without necessarily attending
to it visually. Submenus are displayed by selecting a
node in a parent menu, similar to the way in which nested
menus work. The submenus are also radial so a complete
selection from hierarchical pie menus will consist of a series
of strokes with angles to differentiate them. Further investigation
into the design of pie menus has been conducted by
Kurtenbach and Buxton [13,14]. In this work the concept of
a radial menu has been pared down even further by introducing
the idea of marks to make menu selections. Essentially,
once the user is familiar with the location of menu items, it
is no longer necessary to display the entire menu and
selections can simply be made by making the appropriate
gesture. As a minimum confirmation of an action being
performed when the menu is not displayed it is still helpful
for the user to see an ink trail of the cursor's movement [13].
This is referred to as marking mode. In this mode, there is
virtually nothing obstructing the view of the drawing, again
reducing visual interference to the absolute minimum. Fig. 6
shows pie menus in both regular and marking modes. The
design of the menus in Speak 'n' Sketch is based on the
guidelines for radial marking menus established by Kurtenbach [13].
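Selection by broad angle, and a hierarchical selection as a series of strokes, can be illustrated with the short sketch below. It is an assumption-laden illustration, not the Speak 'n' Sketch code: the menu contents, the action names, the equal-width sectors and the convention that sector 0 points east with the y axis pointing up are all hypothetical.

```python
import math

# Hypothetical two-level pie menu: each entry is (label, submenu or action name).
MENU = [
    ("pen", [("thicker pen", "pen.thicker"), ("thinner pen", "pen.thinner")]),
    ("layer", [("add layer", "layer.add"), ("remove layer", "layer.remove")]),
    ("colour", [("darker", "colour.darker"), ("lighter", "colour.lighter")]),
    ("transform", [("rotate", "transform.rotate"), ("scale", "transform.scale")]),
]


def sector_for_stroke(start, end, n_items):
    """Map a stroke from `start` to `end` onto one of `n_items` equal sectors.

    Only the direction of the stroke matters, not its length. Sector 0 is
    centred on east (positive x); sectors are numbered counter-clockwise,
    assuming y increases upwards.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    width = 360.0 / n_items
    return int(((angle + width / 2) % 360.0) // width)


def select(menu, strokes):
    """Follow one stroke per menu level and return the action that is reached."""
    node = menu
    for start, end in strokes:
        node = node[sector_for_stroke(start, end, len(node))][1]
    return node


if __name__ == "__main__":
    # A roughly eastward stroke, then a westward one: "pen" then "thinner pen".
    mark = [((0, 0), (10, 2)), ((10, 2), (0, 2))]
    print(select(MENU, mark))
```

Because only the stroke direction is tested, the same function serves both the displayed menu and marking mode; whether the menu is drawn is a separate presentation decision.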
All of the commands that can be executed with the menus
can also be performed with verbal input with the added
advantage that the hierarchical structure of the menus
does not have to be navigated. For instance, in order to
obtain a thicker brush stroke, the user has to say only the
phrase "thicker pen" rather than saying "thickness menu"
followed by "thicker pen". By flattening out the grouped structure
of the command set, voice input permits a more direct
and less tedious dialogue sequence. Another advantage is
that simultaneous actions can be performed by using multimodal
input. For example, the user can be rotating an object
using a stylus and change the colour of it using verbal
commands; an unwanted layer can be removed with the
menu commands while a new one is added with a voice
command, etc.
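The flattening of the grouped command set for voice input can be sketched as a lookup table built from the menu hierarchy. Only the phrases "thickness menu" and "thicker pen" appear in the text; the remaining phrases, the action names and the function names are hypothetical, and speech recognition itself is outside the scope of the sketch.

```python
# Hypothetical menu hierarchy: submenu name -> {spoken phrase: action name}.
MENU_TREE = {
    "thickness menu": {"thicker pen": "pen.thicker", "thinner pen": "pen.thinner"},
    "layer menu": {"add layer": "layer.add", "remove layer": "layer.remove"},
}


def flatten(tree):
    """Collapse the two-level menu tree into one phrase -> action table."""
    flat = {}
    for submenu in tree.values():
        for phrase, action in submenu.items():
            if phrase in flat:
                # A flat vocabulary only works if every phrase is unique.
                raise ValueError(f"ambiguous voice phrase: {phrase!r}")
            flat[phrase] = action
    return flat


VOICE_COMMANDS = flatten(MENU_TREE)


def handle_utterance(text):
    """Return the action for a recognised phrase, or None if it is unknown."""
    return VOICE_COMMANDS.get(text.strip().lower())


if __name__ == "__main__":
    print(handle_utterance("thicker pen"))     # reached in a single step
    print(handle_utterance("thickness menu"))  # submenu names are not commands
```

The uniqueness check reflects a design consequence of flattening: a phrase that is unambiguous inside its submenu may collide with another once the hierarchy is removed.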
One of the design issues that must be addressed while
designing voice activated systems is the question of how
to provide adequate feedback to the user. A simple solution
is to provide a status bar that confirms their request or
informs them that their command has been understood.
Another solution would be to provide auditory feedback.
A status bar was provided in Speak 'n' Sketch on the
basis that this was likely to be less intrusive. However,
adequate studies have not been conducted yet to support
or refute this assumption, and it also could be a question
of personal preference.
The system design went through several iterations to take
into account feedback from professionals with artistic and
creative backgrounds who had different degrees of experience
with technology. In summary, changes were made to
how the pop-up menus were activated, to the method
for hiding and showing individual layers and to shape
manipulation.
5. User studies
Informal evaluations with a small number of users have
been carried out on the redesigned system. Whilst we would
not want to claim that the results are rigorous and widely
generalisable, they do provide valuable data and lessons
learned about designing tool support for creative tasks.
First, assumptions about the nature of the task, i.e. speed
of drawing, and the implications for system performance are
discussed. Second, the study and partial results are
described and finally alternative ways of conducting the
evaluations are presented.
The perception of the analyst in observing the sketching
tasks being undertaken was that the task consisted of very
rapid strokes. This rapidity was supported by the system and
all system testing was based on this assumption. When
subjects were given a real task in the evaluation studies,
they drew far more slowly than expected. This difference
could be due either to an illusion, an unintended perceptual
exaggeration on the part of the analyst observer, or to
the demand characteristics of the task and task
setting, which resulted in slower than normal performance from
the artists and designers. The result was that the internal
representation of shapes contained several orders of magnitude
more points than was envisaged, and system performance
slowed accordingly. The speed of drawing was still
reasonable, but object manipulation became difficult.
System performance was therefore a significant usability
problem.
Fig. 6. Pie menus: (a) in regular mode; and (b) in marking mode.
There were a number of goals of the study: to investigate
the ways in which subjects evolved their character ideas via
sketching using traditional materials and the designed tool;
to examine the role of multimodal input in the context of a
creative task; and to consider the design issues related to
speech input and multimodal systems, in general. Because
of the slow performance of the tool, it was not realistic to
undertake experimental comparisons of traditional media
and the designed tool. Rather, it was decided to carry out
in-depth studies with a few subjects, allowing them to
undertake realistic sketching tasks and provide useful qualitative
feedback about the design of the system.
The studies consisted of giving subjects a fictional character
description for "Roderick - the amazing daredevil
frog" whom they were then asked to represent in one of
his daredevil activities for a promotional poster. The type
of task was derived from examples given by professional
illustrators and animators. In addition, it is very common to
use reference materials during such a task and so they were
also provided with several images that could be referred to
or traced over. Prior to the task, a brief demonstration of the
system was carried out and the functionality of the marking
menus explained. The subjects were then given time to
familiarise themselves with the system, and to execute
small subtasks such as drawing lines, grouping, changing
colours, etc. They were then asked to dictate a series of
commands to the speech recognition system. This had the
dual purpose of fine-tuning the voice recognition to their
voice, and familiarising them with the command set. After
these preliminaries, they were asked to create a representation
of Roderick the frog. During this phase they were
videotaped, with the intention that they might provide a retrospective
protocol regarding the evolution of their concepts
later. This technique was used in a similar sketching study
by Suwa and Tversky [21]. Prior to their study, the most
common technique was to ask the designer to give a running
commentary or concurrent protocol. This obviously will
affect the ability to do the task [8] and Speak 'n' Sketch
would be unable to differentiate speech intended as input
from that intended as a commentary.
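The difficulty with a concurrent protocol can be illustrated with a minimal sketch. It assumes, purely for illustration, a recogniser that spots words from a fixed command vocabulary; the paper does not describe the internals of the speech component, and the vocabulary and the function name spot_commands are hypothetical.

```python
import re

# Hypothetical command vocabulary; the words are taken from functions the
# paper mentions, not from the actual Speak 'n' Sketch command set.
COMMANDS = {"group", "resize", "rotate", "translate"}


def spot_commands(utterance: str) -> list[str]:
    """Return every vocabulary word found in the utterance, in spoken order."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return [w for w in words if w in COMMANDS]


# An intended command and a think-aloud remark trigger the same function.
print(spot_commands("rotate"))
print(spot_commands("maybe I should rotate him so he faces down"))
```

Under this assumption a think-aloud remark that happens to contain a command word is indistinguishable from the command itself, which is one reason a retrospective protocol suits a speech-driven tool better.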
5.1. Results
The first subject was a professional illustrator who
worked exclusively with traditional media (paper and
pencils). Despite initial apprehension at using the new
technology, he adapted to the use of the tablet and stylus
remarkably quickly. He undertook a number of different sketches:
early ones to establish the composition of the scene and to get
a sense of what Roderick's face would be like, and later ones
that were more final renditions of the scene. Because
he was accustomed to using only traditional media, he did
not seem to need any of the transformation
functions that were available. In his particular method of
working, these types of manipulation come towards the end
of the task rather than in the early stages. Hence he did not
attempt to group objects together, or to resize, rotate or translate
them. However, the colour palette of progressively lighter
and darker shades was particularly useful for his style of
sketching.
The second subject was a professional animator who
frequently designed web-based animations. She was highly
experienced with a tablet and stylus and was accustomed to
operating a tablet and keyboard simultaneously. Although
she attempted to perform transformations on the character,
performance problems ensued and she subsequently
confined her activity to drawing.
The third subject was a professional graphics designer
who was highly experienced in a wide range of drawing
packages and taught a course in computer graphic design
at a college of art. She had used tablets before, but primarily
used the keyboard and mouse in her work. Her version of
Roderick went through fewer iterations than that of the first
subject, and she made much more use of rotation and scaling
operations. Her initial image of the frog was drawn with him
facing upwards because she found him easier to draw that
way. He was then rotated and scaled down until he was
facing downwards, and a parachute was added. She did not
make use of the layer functionality.
6. Discussion and conclusions
Performing simple tasks gave the subjects enough experi-
ence of the system to be able to comment positively and
negatively on the design features of Speak 'n' Sketch. All of
the subjects agreed that the design of the layering system
was very effective. They found that the tracing paper effect
was very useful for redrawing purposes and it helped to
know which layer was currently being worked upon. The
speech input was also met with great enthusiasm. In parti-
cular, subjects who used computers regularly in their work
were convinced that this would increase their productivity
because it was faster and "allowed continuity of thought".
"It means you don't have to stop and think about where you
have to go", commented one subject. In a similar vein,
another subject noted: "I think those words a hundred
times a day: group, optimise, rotate. It would be so good to
just say it as soon as you think it". These remarks seem to
suggest that, in addition to speeding up the execution of
functions, verbally expressed commands also reduced
cognitive processing. We must be careful, however, not to
claim, on the basis of qualitative data and subjective comments
alone, without the benefit of cognitive modelling and performance
data, that the transition from thought to speech is more
J. Sedivy, H. Johnson / Knowledge-Based Systems 13 (2000) 441–450
direct than thought to hand movement. Further studies
combining task analytic techniques such as
Task Knowledge Structures (TKS) [12] with cognitive
modelling of mental resources, as in Interacting Cognitive
Subsystems (ICS) [1], would aid in addressing this
question.
The subject who depended on a tablet for her work
remarked that the angle at which she worked in order to
operate both tablet and keyboard caused shoulder and
back strain. Voice input would alleviate this problem. She
felt that, at a minimum, voice input offered the same functionality
as keyboard shortcuts and would improve the quality of
her work. The disadvantage of voice input, however, and
one often referred to in studies of multimodal input, is the
disruption to other workers in settings such as open studios.
The disruption, whilst real and significant, is no worse than
someone talking on the telephone. In fact, it could be considered
less disruptive: to people not involved in the task,
the vocalisations consist of seemingly random commands out
of context rather than potentially interesting conversations.
All of the users wanted more palettes to be available to them,
as there was much more space in Speak 'n' Sketch than in
commercially available packages, where palettes could be a problem.
Certain functionality, such as changing pen size, needed to be
immediately available in one easy click. The inexperienced
computer user made the analogy between picking up a
new pencil and clicking on a pen-size palette. He considered
this to be more natural than moving through a menu.
The radial menus received mixed reviews: two people
said they would use them in a real system, but the third
preferred keyboard shortcuts. Her rationale was that the
hand not being used for drawing could be used for the
keyboard commands. There is a deep issue here about the
degree of disruption and the cognitive demands that relate to
keyboard shortcuts and radial menus. This needs further
systematic investigation.
The evaluation had to be of an informal nature because it
would have been unrealistic to expect users to learn a new
system quickly, adapt to voice input, adapt to a new type of menu
and still be able to draw in their preferred
manner, naturally and easily. There were many new features
in the implementation; whilst this allowed us to obtain feedback
on a number of design issues and choices, it made it
impossible to isolate the effects of individual design
features as opposed to their use in combination. Such isolation
would have been necessary in any systematic experimental
manipulation of the factors for the purposes of
validating the psychological models of sketching,
comparing task performance and quality in traditional
and computer supported contexts, and assessing the
behavioural and performance factors associated with the
cognitive load of different uses of modalities.
A longitudinal study would also be necessary to understand
how the tasks evolved through long term use of
computer support. Some features might prove useful, while
others might prove unnecessary or need to be modified or enhanced.
In future research, we aim to use extensions to the
implemented system as a test-bed for undertaking further
studies, where we will isolate the effects of speech input
and changes to the menu system over the short and long
term. We also intend to undertake further theoretical
and empirical research into identifying, measuring and
assessing the impact on performance of the different cognitive
resources used in interacting with different modalities.
The ultimate research aim is to develop principles
for multimodal tools to support highly creative and
iterative work tasks.
Acknowledgements
We are grateful to the British Council for funding this
research. Many thanks to Justin Wyatt for contributing all
the sketches in this paper. Walt Disney Co. holds the copyright
to all sketches.
References
[1] P.J. Barnard, J. May, Representing cognitive activity in complex
tasks, Human Computer Interaction (1999) 14.
[2] M. Boden, The Creative Mind: Myths and Mechanisms, Weidenfeld
& Nicholson, London, 1992.
[3] S. Bhavnani, et al., CAD usage in an architectural office: from
observations to active assistance, Automation in Construction 5 (3) (1996)
243–255.
[4] J. Callahan, et al., An empirical study of pie vs linear menus,
CHI88, 1988, pp. 95–100.
[5] L. Candy, E.A. Edmonds, Supporting the creative user: a criteria-based
approach to interaction design, Design Studies 18 (2) (1997)
184–194.
[6] P. Cohen, The role of natural language in a multimodal interface,
UIST, 1992, pp. 143–149.
[7] E.Y. Do, M. Gross, Drawing as a means to design reasoning, Artificial
Intelligence in Design '96, 1996, pp. 22–27.
[8] K.A. Ericsson, H.A. Simon, Protocol Analysis: Verbal Reports as
Data, MIT Press, Cambridge, MA, USA, 1984.
[9] G. Goldschmidt, The dialectics of sketching, Creativity Research
Journal 4 (2) (1991) 369–383.
[10] M. Gross, The electronic cocktail napkin: a computational environment
for working with design diagrams, Design Studies 17 (1) (1996)
53–69.
[11] A. Hauptmann, Speech and gesture for graphic image manipulation,
CHI89, 1989, pp. 241–245.
[12] H. Johnson, P. Johnson, Task knowledge structures: psychological
basis and integration into system design, Acta Psychologica 78
(1991) 3–26.
[13] G. Kurtenbach, Some articulatory and cognitive aspects of marking
menus: an empirical study, Human Computer Interaction 8 (2) (1993)
1–23.
[14] G. Kurtenbach, W. Buxton, The limits of expert performance using
hierarchic marking menus, CHI93, 1993.
[15] B. Lawson, How Designers Think: the Design Process Demystified,
Butterworths, London, 1990.
[16] C. Mignot, N. Carbonell, Commande orale et gestuelle: étude empirique,
Technique et Science Informatiques 15 (10) (1996) 1399–1428.
[17] S. Oviatt, Multimodal interfaces for dynamic interactive maps,
CHI96, 1996.
[18] R. Oxman, Design by re-representation: a model of visual reasoning
in design, Design Studies 18 (4) (1997) 329–347.
[19] D. Schon, Designing as reflective conversation with the materials of
the design situation, Knowledge Based Systems 5 (1992) 3.
[20] J. Sedivy, Multimodal tool support for sketching, QMW Technical
Report, 1998.
[21] M. Suwa, B. Tversky, What do architects and students perceive in
their design sketches? A protocol analysis, Design Studies 18 (4)
(1997) 385–403.
[22] M.T. Vo, A. Waibel, A multimodal human computer interface: combination
of speech and gesture recognition, InterCHI93, 1993.