


International Journal of Industrial Ergonomics 24 (1999) 631–645

Computer interface evaluation using eye movements: methods and constructs

Joseph H. Goldberg*, Xerxes P. Kotval1

The Pennsylvania State University, Department of Industrial and Manufacturing Engineering, 207 Hammond Building, University Park, PA 16802-1401, USA

Received 1 March 1998; received in revised form 13 March 1998; accepted 7 July 1998

Abstract

Eye movement-based analysis can enhance traditional performance, protocol, and walk-through evaluations of computer interfaces. Despite a substantial history of eye movement data collection in tasks, there is still a great need for an organized definition and evaluation of appropriate measures. Several measures based upon eye movement locations and scanpaths were evaluated here, to assess their validity for assessment of interface quality. Good and poor interfaces for a drawing tool selection program were developed by manipulating the grouping of tool icons. These were subsequently evaluated by a collection of 50 interface designers and typical users. Twelve subjects used the interfaces while their eye movements were collected. Compared with a randomly organized set of component buttons, well-organized functional grouping resulted in shorter scanpaths, covering smaller areas. The poorer interface resulted in more, but similar duration, fixations than the better interface. Whereas the poor interface produced less efficient search behavior, the layout of component representations did not influence their interpretability. Overall, data obtained from eye movements can significantly enhance the observation of users' strategies while using computer interfaces, which can subsequently improve the precision of computer interface evaluations.

Relevance to industry

The software development industry requires improved methods for the objective analysis and design of software interfaces. This study provides a foundation for using eye movement analysis as part of an objective evaluation tool for many phases of interface analysis. The present approach is instructional in its definition of eye movement-based measures, and is evaluative with respect to the utility of these measures. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Eye movements; HCI; Computer interface design; Software evaluation; Fixation algorithms

*Corresponding author. ¹Present address: Lucent Technologies, Bell Laboratories, Holmdel, NY.

0169-8141/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0169-8141(98)00068-7

1. Introduction

1.1. Interface evaluation

The software development cycle requires frequent iterations of user testing and interface modification. These interface evaluations, whether at initial design or at later test and evaluation stages, should assess system functionality and the impact of the interface on the user. Earlier, design evaluation methods include cognitive walkthroughs, heuristic, review-based, and model-based evaluations. At more mature product phases, performance-based experiments, protocol/observation, and questionnaires are frequently used as a basis for evaluation (Dix et al., 1998). Performance-based studies assess errors and time to complete specified operations or scenarios (Wickens et al., 1998).

Interface evaluation and usability testing are expensive, time-intensive exercises, often done with poorly documented standards and objectives. They are frequently qualitative, with poor reliability and sensitivity. Provision of an improved tool for rapid and effective evaluation of graphical user interfaces was the motivating goal underlying the present work assessing eye movements as an indicator of interface usability.

1.2. Eye movements on displays

While using a computer interface, one's eye movements usually indicate one's spatial focus of attention on a display. In order to foveate informative areas in a scene, the eyes naturally fixate upon areas that are surprising, salient, or important through experience (Loftus and Mackworth, 1978). Thus, current gazepoints on a display can approximate foci of attention over a time period. When considering short time intervals, however, one's attentional focus may lead or lag the gazepoint (Just and Carpenter, 1976). By choosing long enough sampling intervals for eye movements, temporal leads/lags should be averaged out.

Applied eye movement analysis has at least a 60 yr history in performance and usability assessments of spatial displays within information acquisition contexts such as aviation, driving, X-ray search, and advertising. Buswell (1935) measured fixation densities and serial scanpaths while individuals freely viewed artwork samples, noting that eyes follow the direction of principal lines in figures, and that more difficult processing produced longer fixation durations. Mackworth (1976) noted that higher display densities produced 50–100 ms longer fixation durations than lower density displays. Non-productive eye movements more than 20° from the horizontal scanning axis strongly increased as a percentage of all eye movements as the display width and density increased. Kolers et al. (1981) measured eye fixations (number, number per line, rate, duration, words per fixation) as a function of character and line spacing in a reading task. More fixations per line (and fewer fixations per word) were associated with more tightly-grouped, single-spaced material. Fewer, yet longer fixations were made with smaller, more densely packed text characters. Yamamoto and Kuto (1992) found improved Japanese character reading performance associated with series of sequential rather than backtracking eye movements. Eye tracking has aided the assessment of whether the order of product versus filler displays in a television commercial influences one's attention to that product (Janiszewski and Warlop, 1993). Using eye movement analyses while scanning advertisements on telephone yellow pages, quarter-page ad displays were much more noticed than text listings, and color ads were perceived more quickly, more often, and longer than black and white ads (Lohse, 1997).

Prior eye movement-based interface and usability characterizations have relied heavily upon cumulative fixation time and areas-of-interest approaches, dividing an interface into predefined areas. Transitions into and from these areas, as well as time spent in each area, are tallied. While these approaches can signal areas where more or less attention is spent while using a display, few investigations have considered the complex nature of scanpaths, defined from a series of fixations and saccades on the interface. Scanpath complexity and regularity measures are needed to approach some of the subtler interface usability issues in screen design.

1.3. Objective

Eye tracking systems are now inexpensive, reliable, and precise enough to significantly enhance system evaluations. While the hardware technology is quite mature (see Young and Sheena, 1975, for a general review), methods of evaluating data from eye tracking experiments are still somewhat immature and disorganized. The objective of the present paper is to provide an introduction and framework for eye movement data analysis techniques. These eye movement measures and algorithms are presented in light of results from an experiment presenting users with both "good" and "poor" interfaces.

Fig. 1. Interface designs. Left panel: good design; right panel: poor design.

2. Methods

Scanpaths were collected from 12 subjects while using both "good" and "poor" software interfaces. The resulting scanpaths were characterized using a number of quantitative measures, each designed to characterize different aspects of scanpath behaviors and relate to the cognitive behavior underlying visual search and information processing. A comparison of expected user search behavior using each interface with the results of scanpath measures was used to determine the relative effectiveness of each measure.

2.1. Interface stimuli

Example "good" and "poor" interfaces were programmed to provide a well-controlled and equally familiar environment for all subjects in this study. Their primary purpose was not to evaluate the usability of these particular interfaces per se; rather, they provided a means for validating the various created measures. Fig. 1 shows two of these interfaces. The interface showed a work area with a panel of tool buttons, much like a drawing package.

The good–poor distinction was based upon physical grouping of interface tool buttons. Users expect physically grouped components to be related by some common characteristic, whether physical or conceptual (Wickens and Carswell, 1995). Exploiting this, the "good" interface grouped eleven components into three functionally related groups: editing, drawing, and text manipulation tools (Fig. 1, left panel). These functional groupings were intended to allow relatively efficient tool search, compared with the poorly designed interface (Fig. 1, right panel) intended to cause less efficient visual search. The "poor" interface provided a randomized (i.e., not functional or conceptual) relationship within each tool group.

To verify a substantial difference in perceived quality, fifty typical users and thirty interface design experts rated each interface on a scale from 1 (excellent) to 5 (unacceptable). The functionally grouped interface averaged 1.35, between good and excellent, whereas the randomly grouped interface averaged 4.53, between unacceptable and poor. Thus, the two interfaces were confirmed as substantially different in design quality.

2.2. Apparatus and calibration

The experiment was hosted on a PC with a 13 in (33 cm) VGA monitor with mouse/windows control. A second computer, remotely activated by the host computer, controlled the eye tracking system, a DBA Systems Model 626 infrared corneal reflection system (Fig. 2). An infrared-sensitive CCD video camera was positioned just below the host computer's monitor. The camera contained an LED inline with its focal axis, generating an illuminated pupil and light glint (first Purkinje reflection) on the subject's cornea. The head posture and eye location were maintained with a head/chin rest, such that the eye was 22 in (56 cm) from the screen, and level with its center. At this distance, the screen subtended 21° and 16° of horizontal and vertical visual angle, respectively. Each 65×65 pixel tool button in the interface subtended 1 in, or 2.2° of horizontal visual angle.

Fig. 2. Experimental apparatus, showing eye tracker with infrared-sensitive camera lens.

Video images of the pupil and Purkinje reflection were captured at 60 Hz by the eye tracker and assigned light intensity values to each pixel in the digital image. An intensity threshold filtered the video image until the pupil image was isolated. Eye tracker software located the center of the pupil and calculated the vector from it to the corneal light glint. A calibration procedure related this vector with Cartesian coordinates on the interface screen, providing the subject's eyegaze location, or point-of-regard (POR). The POR coordinates were collected and stored in a datafile for later processing.

Calibration used a set of 9 screen locations, and was checked with each block (33 trials) in each subject's session. The criterion for a successful calibration equated to residuals that were less than 0.5 cm (0.5° visual angle) from the actual target location. In other words, the eye tracker software's estimate of target location was not more than 10 pixels away from the actual target location.

2.3. Subjects

Twelve subjects (7 female, 5 male) participated in this study. Ages ranged from 20 to 27 yr (mean 23 yr). Participants averaged 4.8 yr of experience using typical windowing software, spending an average of 15.3 h a week using software interfaces. Because corrective lenses produce additional surface reflections which interfere with the eye tracker's identification and processing of the Purkinje image, subjects performed the experiment without corrective lenses. All subjects had an uncorrected Snellen visual acuity of 20/35 or better, as determined by a Bausch and Lomb Vision Tester (Cat. 71-22-41).

2.4. Procedure and design

After adjusting the chinrest and workstation, each subject was carefully calibrated. Calibration was also repeated prior to each block. Practice, consisting of a block of 33 trials, was provided for each of the tested interfaces. Each trial in a block was initiated by the subject selecting a "Continue" button at the center of the work area with the mouse. The "Continue" button was then immediately replaced by the name of one of the eleven randomized tool buttons (e.g., CUT) in the middle of the workspace. The eye tracker initiated its POR data collection at this time. The subject, as quickly as possible, then located the tool button from the tool menu at the left of the display, and clicked the left button on the mouse, stopping POR collection. Feedback, consisting of a statement of "correct" or "incorrect" at the position of the initial instruction location, was provided after each trial. A 1 min break was provided between each block; the total subject testing time was 40 min.

Table 1
Classification of eye movement and scanpath measures

Within each of 12 subjects, the experiment presented 6 replicates of each of the 11 tool button components for each of the two interfaces presented here. The trial order was counterbalanced between subjects. A fully-crossed ANOVA for each dependent measure included Subjects (12 levels, random effect) × Interface (2 levels, fixed effect) × Tool Component (11 levels, fixed effect) × 6 replicates.

3. Scanpath generation

3.1. Classification of measures

Scanpaths are defined by a saccade–fixate–saccade sequence on a display. For information search tasks, the optimal scanpath is a straight line to a desired target, with relatively short fixation duration at the target. The derived scanpath measures discussed below attempt to quantitatively measure the divergence from this optimal scanpath in several ways. The measures each provide a single quantitative value, with some requiring no knowledge of the content of the computer interface. Table 1 provides a summary of these measures, categorizing them on two dimensions. Temporal measures describe the sequential, time-based nature of a scanpath, whereas spatial measures emphasize the spread and coverage of a scanpath. Furthermore, the measures may rely upon unprocessed, 60 Hz raw gazepoint samples, or may be more oriented to processed fixations and/or saccades within a scanpath. Typically, reported eye movement data have been pre-processed to form fixations and saccades, by one of many different algorithms (Goldberg and Schryver, 1993). The resulting set of fixations and saccades is further processed to characterize scanpaths and their dynamic change (Goldberg and Schryver, 1995). However, some of the measures presented here can be applied to the gazepoint samples, which is computationally easier, but with less behavioral meaning.

3.2. Fixations

The eyes dart from fixation to fixation in a typical search on a display. At least three processes take place within the 250–300 ms of a typical fixation (Viviani, 1990), as shown in Fig. 3. First, visual information is encoded, presumably to label the general scene (Loftus and Mackworth, 1978). Next, the peripheral visual field of the current gaze is sampled, to determine subsequent informative areas. Finally, the next saccade is planned and prepared. These processes overlap, and may occur in parallel.

Fig. 3. Events occurring within typical fixations.

Gazepoints sampled at 60 Hz represent the line-of-sight at the time of sampling, and may or may not be at a location within a fixation, as sampling may have occurred during a saccade or perhaps during a blink or other artifact. Most commercial eye tracking systems include software removal of these artifacts, and some also include fixation construction algorithms. Fixation algorithms may be based on cluster and other statistical analyses, and may be locally adaptive to the amplitude of ocular jumps (Goldberg and Schryver, 1995; Ramakrishna et al., 1993; Belofsky and Lyon, 1988; Scinto and Barnette, 1986). Most algorithms develop fixation clusters by using a constrained spatial proximity determination, but temporal constraints can also be used. Latimer (1988) used temporal information related to each sample gazepoint, but only to determine the cumulative fixation time after the cluster had been defined by spatial criteria. A fixation algorithm must produce fixations that meet certain minimum characteristics. The center of a typical fixation is within 2–3° of the observed target object (Robinson, 1979) and the minimum processing duration during a fixation is 100–150 ms (Viviani, 1990).

The present study used a data position variance method (Anliker, 1976), after removing blinks and other eye movement artifacts. Fixations were initially constrained to a 3° ± 0.5° spatial area, and had to be of at least 100 ms in duration. This corresponded to a minimum of 6 sample gazepoints per fixation (at 60 Hz), following Karsh and Breitenbach (1983), and agrees with descriptions of saccades lasting 20–100 ms (Hallet, 1986). Once a fixation was initially defined, its spatial diameter was computed. Subsequent gazepoint data samples falling within this diameter threshold were iteratively added to the fixation. The spatial diameter threshold was then raised or lowered within a subject, following the method of Krose and Burbeck (1989), with only one fixation diameter allowed per subject. Maximum fixation diameters were varied from 2° to 4°, in 0.5° increments, until defined fixations sufficiently fit the gazepoint data. Allowed fixation diameters were increased when too few fixations (of very short duration) were evident. Conversely, fixation durations longer than 900 ms indicated that fixation diameter should be decreased.

While these methods are useful in identifying critical areas of attentional focus on a display, information based on the temporal order of fixations is lost. When and how often a target is fixated during a scanpath provides valuable information for the evaluation of an interface. Fig. 4 illustrates the difference between spatially derived fixations and fixations derived on the basis of both spatial and temporal criteria. Areas A and B are areas of high interest, due to the number of gazepoint samples at each location. The left panel shows a temporally independent clustering (spatial constraint only), whereas the right panel shows a temporally sensitive clustering. Areas A and B are still shown as areas of high interest, but by keeping track of the temporal order of samples, better information about the relationship between A and B is obtained.

Fig. 4. Comparison of fixation algorithms. Spatial constraints (left panel); spatial plus temporal constraints (right panel).

Fig. 5. Fixation cluster definition, showing 80 pixel diameter.

The present study supplemented the preceding fixation method by testing sampled gazepoints in temporal order (Latimer, 1988; Tullis, 1983). Each of the 6 or more (defining at least 100 ms, at 60 Hz) temporally sequential gazepoint samples had to be within 40 pixels (0.6 in or 1.3°) of the centroid of the gazepoint sample, as shown in Fig. 5. This defined fixations that were within the 2–3° range described by Robinson (1979). If the total number of samples within a cluster was less than 6, then the cluster was categorized as part of a saccade. The general fixation algorithm applied to the present data is given in Table 2.

Table 2
Fixation cluster algorithm

Step 1  Place first node in current cluster
Step 2  Compute common mean location of all samples in current cluster, and the next temporally sequential sample
Step 3  If the new point is within 40 pixels of the common mean location, include the new point in the current cluster.
        If the new point is not within 40 pixels of the common mean, then the current cluster becomes an old cluster and the new point becomes the current cluster.
        If the number of points (n) in the old cluster ≥ 6, then the cluster is classified as a FIXATION of n × 16.67 ms duration.
        If n < 6, then the cluster is classified as a SACCADE of n × 16.67 ms duration.
        GOTO Step 2 until done
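The clustering procedure of Table 2 can be sketched in Python. This is an illustrative implementation under stated assumptions (gazepoints supplied as temporally ordered (x, y) pixel tuples sampled at 60 Hz; the 40-pixel radius and 6-sample minimum taken from the text); the function and variable names are hypothetical, not from the authors' software.

```python
import math

SAMPLE_MS = 1000.0 / 60.0   # 16.67 ms per gazepoint sample at 60 Hz
RADIUS_PX = 40              # cluster radius from Table 2
MIN_SAMPLES = 6             # >= 6 samples (>= 100 ms) counts as a fixation

def classify_clusters(samples):
    """Group temporally ordered (x, y) gazepoint samples into clusters,
    then label each cluster FIXATION or SACCADE per Table 2."""
    clusters = []
    current = [samples[0]]                      # Step 1: first sample seeds the cluster
    for pt in samples[1:]:
        n = len(current) + 1                    # Step 2: mean of cluster plus candidate
        cx = (sum(p[0] for p in current) + pt[0]) / n
        cy = (sum(p[1] for p in current) + pt[1]) / n
        if math.hypot(pt[0] - cx, pt[1] - cy) <= RADIUS_PX:
            current.append(pt)                  # Step 3: within 40 px, join cluster
        else:
            clusters.append(current)            # otherwise close cluster, start anew
            current = [pt]
    clusters.append(current)
    return [("FIXATION" if len(c) >= MIN_SAMPLES else "SACCADE",
             len(c) * SAMPLE_MS) for c in clusters]
```

For example, eight samples jittering near one button followed by two samples on a distant button would be classified as one FIXATION of about 133 ms followed by a short SACCADE.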

4. Measures of search

Illustrated descriptions of each of the eye movement measures and algorithms are provided below. Results from the good versus poor interfaces and other factors are also presented here. The same hypothetical scanpath is used in all examples below, for easy comparison. All of these measures may be used for a given scanpath, with each offering a slightly different interpretation of the data. Scanpaths may also be viewed as directed or undirected graphs, allowing additional characterizations of complexity and size from graph theory. The organized functional grouping of components in the good design was expected to induce subjects to find components quickly, producing rapid direct search patterns. In contrast, the randomized groups of the poor design were intended to mislead subjects, causing them to stay in an incorrect grouping or leave a correct grouping under the incorrect expectation that grouped components were related functionally. As a result, the poor design was expected to produce more extensive search behavior.

4.1. Scanpath length and duration

Scanpath length is a productivity measure that can be used for baseline comparisons or defining an optimal visual search based on minimizing saccadic amplitudes. This may be computed independently from the actual screen layout, and may be applied to gazepoint samples or to processed fixation data. The length (in pixels) is the summation of the distances between the gazepoint samples. An example scanpath is illustrated in Fig. 6. Lengthy scanpaths indicate less efficient scanning behavior, but do not distinguish between search and information processing times. Unless scanpaths are formed from computed fixations and saccades, the scanpaths should not be used to make detailed inferences about one's attentional allocation on a display.

Scanpath duration is more related to processing complexity than to visual search efficiency, as much more relative time is spent in fixations than in saccades. Using 60 Hz gazepoint samples, the number of samples is directly proportional to the temporal duration of each scanpath, or Scanpath Duration = n × 16.67 ms, where n = number of samples in the scanpath. However, using fixations, the scanpath duration must sum fixation durations with saccade durations.
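The length and raw-sample duration measures can be sketched as follows. This is an illustrative Python fragment under the stated assumptions (points as (x, y) pixel tuples; 60 Hz sampling); the function names are our own.

```python
import math

SAMPLE_MS = 1000.0 / 60.0  # duration contributed by each 60 Hz gazepoint sample

def scanpath_length(points):
    """Sum of straight-line (Pythagorean) distances between successive
    gazepoint samples or fixation centers, in pixels."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def scanpath_duration(points):
    """Duration of a raw-sample scanpath: n samples x 16.67 ms."""
    return len(points) * SAMPLE_MS

path = [(0, 0), (30, 40), (30, 100)]
print(scanpath_length(path))     # 50 + 60 = 110.0 pixels
scanpath_duration(path)          # 3 samples x 16.67 ms, about 50 ms
```

Note that the same length function applies to fixation centers, whereas the duration function is only valid for raw samples; fixation-based durations must sum fixation and saccade durations separately.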

Using gazepoint samples, there was no significant difference in the overall duration of scanpaths produced by the good and poor interfaces (F(1,1320) = 1.90, p > 0.05). The average duration from the good interface was 1439 ms (sd = 368.4), while the poor interface produced average durations of 1543 ms (sd = 566.5). The non-significant duration difference here, possibly due to (non-significant) variance differences, should not be interpreted alone as a sign of similar interface quality. Further measure comparisons should be conducted.

Extensive search behavior produces spatially lengthy scanpaths. Two fixation–saccade scanpaths may have the same temporal duration but considerably different lengths, due to the differences in the extent of search required. Using the summated lengths (by the Pythagorean theorem), scanpath lengths were computed, as in Fig. 6. The poor design did indeed produce longer scanpaths (F(1,1320) = 5.16, p < 0.05), averaging 2258 pixels (sd = 860), 14% longer than the better interface (1978 pixels, sd = 491). Note that 65 pixels, here, was equivalent to a screen distance of 2.5 cm.

Fig. 6. Example computations for scanpath duration (left panel) and length (right panel).

4.2. Convex hull area

Circumscribing the entire scanpath extends the length measures to consider the area covered by a scanpath. If a circle circumscribed the scanpath, small deviations in gazepoint samples would lead to dramatic changes in the area of the circumscribed circle, exaggerating actual differences in the scanpath area. As shown in Fig. 7, left panel, scanpaths A and B are similar, differing by only one excursion, but the area of the circle circumscribing scanpath B is 4 times the area of the circle circumscribing scanpath A. In contrast, scanpaths B and C are dramatically different in shape and range but produce the same circumscribed circle area.

Using the area of the convex hull circumscribing the scanpath, illustrated in Fig. 7, right panel, the exaggeration can be reduced. Note that scanpath areas A and B are now more similar, and B and C are less similar than with circumscribed circles. Table 3 provides an algorithm to construct convex hulls and hull area, of which steps 1–4 are illustrated in Fig. 8. Fig. 9 provides a simple example of a scanpath convex hull area. Alternative algorithms for generating convex hulls are provided by Sedgewick (1990). Triangle areas (e.g., ΔABC) were computed from Heron's formula:

ΔABC = √(P(P − AB)(P − BC)(P − CA)),

where the semi-perimeter

P = (AB + BC + CA)/2,

and the distance between any two vertices I and J is

IJ = √((X_I − X_J)² + (Y_I − Y_J)²).

Fig. 7. Relative comparison of areas defined by circumscribed circles (left panel) and convex hulls (right panel).

Fig. 8. Iterative example of convex hull generation algorithm.

Table 3
Algorithm for convex hull area

Step 1  Search all samples to identify and label the four samples with the Min x, Max y, Max x and Min y
Step 2  Set Min x as Vertex(1)
Step 3  Compute the slope of Vertex(n) with every sample in the scanpath
Step 4  IF {Min x < Vertex(n) < Max y} OR {Min y < Vertex(n) < Max x} THEN set Vertex(n) to the sample with the largest positive slope.
        IF {Max y < Vertex(n) < Max x} OR {Min y < Vertex(n) < Min x} THEN set Vertex(n) to the sample with the least negative slope.
        Store Vertex(n) in a list.
        IF Vertex(n) = Min x THEN GOTO Step 5.
        Increment n and GOTO Step 3
Step 5  Set n = 2
Step 6  Compute and store the area of the triangle created by the points Vertex(1), Vertex(n) and Vertex(n+1). Increment n by 1. Repeat Step 6 until done. Sum of stored areas equals convex hull area

While the convex hull area may seem to be a more comprehensive measure of search than the scanpath length, note that long scanpaths may still reside within a small spatial area. Used in conjunction, the two measures can determine if lengthy search covered a large or a localized area on a display. In the present experiment, the poor design produced 11% larger (31 339 pixel², sd = 14 952) search areas than the better design (28 168 pixel², sd = 12 009; F(1,1320) = 6.70, p < 0.05). The larger search area coupled with the longer scanpath produced by the poorer interface indicated that the disorganized interface produced a widely distributed search pattern.
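As one of the alternative hull algorithms alluded to above (cf. Sedgewick, 1990), the hull area can be computed with Andrew's monotone chain followed by the shoelace formula, which is equivalent in result to summing the triangle areas of Table 3. This Python sketch is ours, not the authors' code, and assumes gazepoints as (x, y) pixel tuples.

```python
def convex_hull(points):
    """Andrew's monotone chain: returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means no left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:                                # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):                      # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(points):
    """Convex hull area via the shoelace formula (equivalent to the
    triangle-summing of Table 3)."""
    h = convex_hull(points)
    return abs(sum(h[i][0] * h[(i + 1) % len(h)][1]
                   - h[(i + 1) % len(h)][0] * h[i][1]
                   for i in range(len(h)))) / 2.0

# A scanpath wandering inside a 100x100 pixel region; the interior
# sample does not enlarge the hull:
print(hull_area([(0, 0), (100, 0), (100, 100), (0, 100), (50, 50)]))  # 10000.0
```

Interior gazepoints are discarded by the hull construction, so only the outermost excursions of the scanpath determine the area, exactly the property motivating this measure.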


Fig. 9. Example area generated by convex hull algorithm.

4.3. Spatial density

Coverage of an interface due to search and processing may be captured by the spatial distribution of gazepoint samples. Evenly spread samples throughout the display indicate extensive search with an inefficient path, whereas targeted samples in a small area reflect direct and efficient search. The interface can be divided into grid areas representing either specific objects or physical screen area. In the present experiment, the display was divided into an evenly spaced 10×10 grid, with each cell covering 64×48 pixels. The spatial density index was equal to the number of cells containing at least one sample, divided by the total number of grid cells (100); an example is shown in Fig. 10. A smaller spatial density indicated more directed search, regardless of the temporal gazepoint sampling order. The poor interface produced 7% larger spatial density indices (mean index = 10.2%, sd = 3.1%) compared with the better interface (mean index = 9.5%, sd = 2.3%; F(1,1320) = 6.31, p < 0.05).
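The spatial density index can be sketched in a few lines. This illustrative Python fragment assumes the 10×10 grid of 64×48 pixel cells described above (i.e., a 640×480 display); the function name is hypothetical.

```python
GRID_COLS, GRID_ROWS = 10, 10     # 10x10 grid over a 640x480 display
CELL_W, CELL_H = 64, 48           # each cell covers 64x48 pixels

def spatial_density(samples):
    """Fraction of grid cells that contain at least one gazepoint sample."""
    occupied = {(min(int(x // CELL_W), GRID_COLS - 1),
                 min(int(y // CELL_H), GRID_ROWS - 1))
                for x, y in samples}
    return len(occupied) / (GRID_COLS * GRID_ROWS)

# Ten samples confined to two neighboring cells -> a directed search:
print(spatial_density([(10, 10)] * 5 + [(70, 10)] * 5))  # 0.02
```

Because the index depends only on which cells are visited, a long but tightly localized scanpath still yields a small density, which is why the paper pairs this measure with scanpath length.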

4.4. Transition matrix

A transition matrix expresses the frequency of eye movement transitions between defined Areas of Interest (AOIs) (Ponsoda et al., 1995). This metric considers both search area and movement over time. While the scanpath spatial density provides useful information about the physical range of search, a transition matrix adds the temporal component of search. Also known as link analysis (Jones et al., 1949), frequent transitions from one region of a display to another indicate inefficient scanning with extensive search. Consider the two simple spatial distributions presented in Fig. 11. Both distributions produce the same index of spatial density and convex hull area. However, the search behaviors are dramatically different. Scanpath A has a more efficient search pattern with a shorter scanpath length than scanpath B.

Fig. 10. Example spatial density computation.

Fig. 11. Relative scanpath differences between efficient (A) and inefficient (B) search.

The transition matrix is a tabular representation of the number of transitions to and from each defined area. As shown in Fig. 12, a directed scanpath from region 3 to region 5 forms a unique cell

640 J.H. Goldberg, X.P. Kotval / International Journal of Industrial Ergonomics 24 (1999) 631–645


Fig. 12. Example development of transition matrix from areas of interest on display.

pattern in the transition matrix. An unusually dense transition matrix, with most cells filled with at least one transition, indicates extensive search on a display, suggesting poor design. A sparse matrix indicates more efficient and directed search. The matrix may be characterized with a single quantitative value by dividing the number of active transition cells (i.e., those containing at least one transition) by the total number of cells. A large index value indicates a dispersed, lengthy, and wandering scanpath, whereas smaller values point to more directed and efficient search.

The de"ned AOI's may be of equal or unequalsize. A content-dependent analysis would assign

each AOI to a screen window or object, with a unique AOI expressing all non-interesting areas. A content-independent analysis would simply divide the display into a grid, assigning an AOI to each grid cell. The present experiment divided the display interface into 25 regions; 24 were of equal size, whereas the 25th was the larger workspace area. To better capture dynamic search activity within the scanpath, intra-cell transitions need not be included (these were not shown in Fig. 12). The transition matrix density is the number of non-zero matrix cells divided by the total number of cells (25×25 cells here). The poor interface had denser (1.69%, sd = 1.01) transition


matrices than the better interface (1.37%, sd = 0.65; F(1,1320) = 6.91, p < 0.05), consistent with the undirected and more extensive search behavior expected from the grouping of unrelated components in the poorer interface design.

4.5. Number of saccades

The number of saccades in a scanpath indicates the relative organization and amount of visual search on a display, with more saccades implying a greater amount of search. For applied purposes, the movement between each pair of successive fixations can generally be defined as a saccade, with both an amplitude and a duration; the number of saccades is thus at most the number of fixations minus one. A minimum amplitude of 160 pixels (5.3° visual angle) was required here to filter out small micro-saccades resulting from movement to the periphery of the prior fixation (only 80 pixels). Saccades may be quite large (e.g., 20°), so no upper bound was placed on saccadic amplitude. The saccade-counting algorithm thus tested the distance between successive fixation centers, incrementing the saccade count when it exceeded 160 pixels. Prior to acquiring the tool button target, the poor interface produced 17% more saccades (mean = 2.53, sd = 1.47) than the better interface (mean = 2.17, sd = 1.16; F(1,1320) = 8.74, p < 0.05).

4.6. Saccadic amplitude

A well-designed interface should provide sufficient cues to direct the user's scanning to desired targets very rapidly, with few interim fixations, leading to an expectation of larger saccadic amplitudes. If the provided cues are meaningless or misleading, the resulting saccades should be smaller, negotiating the interface until a meaningful cue appears. The average saccadic amplitude was computed as the sum of the distances between consecutive fixations, divided by the number of fixations minus one. Note that all saccades contributed to this sum, with no minimum-length criterion. There was no significant difference in average saccadic amplitude between the two interface designs (F(1,1320) = 0.22, p > 0.05): even with more extensive search in the poorer interface, saccade size was similar between the good (303 pixels, sd = 109) and poor (299 pixels, sd = 104) interfaces. While the local search step size was the same between the two interfaces, the overall extent of search was greater in the poor design. The functional grouping layout thus aids planning of visual search for proper tool selection, but does not affect individual saccadic motions.

In both interfaces, subjects typically moved to one group and sampled a component. In the good interface, a rapid determination could be made as to whether the desired component was in the same group (common function) or not (different function). If the group's functionality matched the desired component, small saccades could be made to each component within the tool grouping until the target was acquired; if it did not, a saccade was rapidly made to another group. The functional grouping layout, however, did not provide cues about the next group to sample, so a small saccade was still made to an adjacent group, continuing the search. The poorer interface provided little or no information about the other components within a grouping once a tool button was acquired. As a result, subjects again made small local saccades, exhaustively searching within the group before executing a saccade to an adjacent group.

Using these search measures, it was clear that functional component grouping reduces the extent of required visual search by allowing one to rapidly "zoom in" on the desired component, while maintaining a relatively small saccadic amplitude.

5. Measures of processing

Visual search is conducted to obtain information from an interface, where more extensive search allows more interface objects to be processed. This does not, however, consider the depth of required processing. In the present study, as the same representations were used in both interfaces, the depth of processing required to distinguish and interpret a component was not expected to differ.


5.1. Number of fixations

The number of "xations is related to the numberof components that the user is required to process,but not the depth of required processing. Whensearching for a single target, a large number of"xations indicates the user sampled many otherobjects prior to selecting the target, as if distractedor hindered from isolating the target. The poorinterface, intentionally designed to mislead the sub-ject, produced signi"cantly more "xations than thegood design (F

1,1320"8.36, p(0.05). The poor

interface produced on average 2.53 "xations(sd"1.16) for each component search, 17% more"xations than the good interface required (aver-age"2.17, sd"1.47). The functionally groupeddesign allowed one to `zoom ina on the correctcomponent more e$ciently, requiring fewer com-ponents to be processed.

5.2. Fixation duration

Longer "xations imply the user is spending moretime interpreting or relating the component repres-entations in the interface to internalized representa-tions. Representations that require long "xationsare not as meaningful to the user as those withshorter "xation durations. Maximum and average"xation times are context-independent measures,but the duration of single "xations on targets isdependent on the interface layout. Average "xationduration was calculated by summing the number ofgazepoint samples in all the "xations and dividingby the number of "xations.

The level of processing for graphic representations was expected to be the same between interfaces, as the icons were identical. Confirming this, average fixation durations did not significantly differ between the good interface (411 ms, sd = 144) and the poor interface (391 ms, sd = 144; F(1,1320) = 1.92, p > 0.05).

5.3. Fixation/saccade ratio

This content-independent ratio compares the time spent processing component representations (fixations) to the time spent searching for the components (saccades). Interfaces producing higher ratios involved either more processing or less search activity than interfaces with lower ratios; other measures can determine which was the case. As the fixation/saccade ratio did not significantly differ between the good (mean = 14.8, sd = 5.9) and poor (mean = 13.9, sd = 6.2) interfaces (F(1,1320) = 1.26, p > 0.05), where more search was required, a proportionate amount of processing was also required.
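The ratio itself reduces to a one-line computation over per-event durations (illustrative input format):

```python
def fixation_saccade_ratio(fixation_durations_ms, saccade_durations_ms):
    """Total time spent processing (fixating) divided by total time
    spent searching (in saccades)."""
    total_saccade = sum(saccade_durations_ms)
    if total_saccade == 0:
        return float("inf")  # no search movement recorded
    return sum(fixation_durations_ms) / total_saccade
```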

5.4. Other measures

The preceding measures describe only a portion of the potential universe of eye movement and scanpath characterization tools. Other measures may further aid interface analysis in certain circumstances. Several examples may be considered. First, a backtrack can be defined as any saccadic motion that deviates more than 90° in angle from its immediately preceding saccade; these sharp direction changes indicate rapid shifts due to changing goals or a mismatch between users' expectations and the observed interface layout. Second, the ratio of on-target to all-target fixations can be defined by counting the number of fixations falling within a designated AOI or target, then dividing by all fixations; this is a content-dependent measure of search efficiency, with smaller ratios indicating lower efficiency. Third, the number of post-target fixations, i.e., fixations on other areas following target capture, can indicate the target's meaningfulness to a user: high levels of non-target checking after initial target capture indicate target representations with poor meaningfulness or visibility. Fourth, measures of scanpath regularity, considering integrated error or deviation from a regular cycle, can indicate variance in search due to a poor interface or the user's state of training. Many potential measures of scanpath complexity are possible once cyclic scanning behavior is identified.
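The backtrack measure, for example, reduces to a sign test: a deviation of more than 90° between consecutive saccade vectors is equivalent to a negative dot product (an illustrative sketch, not the authors' implementation):

```python
def count_backtracks(fixation_centers):
    """Count saccades deviating more than 90 degrees from the direction
    of the immediately preceding saccade."""
    # One saccade vector per pair of successive fixation centers.
    vectors = [
        (x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(fixation_centers, fixation_centers[1:])
    ]
    backtracks = 0
    for (ux, uy), (vx, vy) in zip(vectors, vectors[1:]):
        if ux * vx + uy * vy < 0:  # direction change exceeds 90 degrees
            backtracks += 1
    return backtracks
```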

6. Discussion

Successful interaction with a computer clearly requires many elements, including good visibility, meaningfulness, transparency, and the requirement


of simple motor skills. Eye movement-based evaluation of the interface, as espoused here, can only address a subset of critical interface issues, revolving around software object visibility, meaningfulness, and placement. Though not a panacea for design evaluation, characterization of eye movements can help by providing easily comparable quantitative metrics for objective design iteration.

One particular strength of eye movement-based evaluation is in assessing users' strategies at the interface. Users are usually unaware of their own search processes; eye movements can provide a temporal and spatial record of search and flow while using a computer. Strategy differences are most evident during lengthy (e.g., 10–15 s) tasks, where a sufficient scanpath exists for characterization by the above methods. Tasks that are too rapid (e.g., under 1 s) do not provide sufficient data for many of the measures.

In the case of component grouping, the present study demonstrated very good agreement between users' and designers' ratings of good versus poor interfaces and many of the eye movement-based measures. The framework and measures proposed here allow improved objective estimation of users' strategies, and of the influence of interface design on those strategies. Compared with a randomly organized set of component buttons, well-organized functional grouping resulted in shorter scanpaths covering smaller areas. The poorer interface resulted in less directed search with more (though equal-amplitude) saccades. Though similar in duration, fixations were more numerous on the poor interface than on the better one. Whereas the poor interface produced less efficient search behavior, the layout of component representations did not influence their interpretability.

Ongoing investigations will further consider the sensitivity, reliability, and validity of eye movement-based measures for interface evaluation. By presenting several designed interfaces that vary more continuously in rated quality, it can be assessed which measures reflect these quality differences. Other factors are also being introduced, such as component meaningfulness and visibility. Ultimately, eye movements may lend some degree of diagnosticity to interface evaluations, and may possibly lead to design recommendations, similar to Tullis's (1983) methodology for text-based material.

References

Anliker, J., 1976. On line measurements, analysis and control. In: Monty, R.A., Senders, J.W. (Eds.), Eye Movements and Psychological Processes. Erlbaum Press, Hillsdale, NJ.

Belofsky, M.S., Lyon, D.R., 1988. Modeling eye movement sequences using conceptual clustering techniques. Air Force Human Resources Laboratory, Doc. AFHRL-TR-88-16, Air Force Systems, Brooks Air Force Base, TX.

Buswell, G.T., 1935. How People Look at Pictures: A Study of the Psychology of Perception in Art. The University of Chicago Press, Chicago, IL.

Dix, A., Finlay, J., Abowd, G., Beale, R., 1988. Human–Computer Interaction, 2nd ed. Prentice-Hall, London.

Goldberg, J.H., Schryver, J.C., 1993. Eye-gaze determination of user intent at the computer interface. In: Findlay, J.M., Walker, R., Kentridge, R.W. (Eds.), Eye Movement Research: Mechanisms, Processes and Applications. North-Holland Press, Amsterdam, pp. 491–502.

Goldberg, J.H., Schryver, J.C., 1995. Eye-gaze contingent control of the computer interface: Methodology and example for zoom detection. Behavior Research Methods, Instruments, and Computers 27 (3), 338–350.

Jones, R.E., Milton, J.L., Fitts, P.M., 1949. Eye fixations of aircraft pilots; IV: Frequency, duration and sequence of fixations during routine instrument flight. US Air Force Technical Report 5975.

Just, M.A., Carpenter, P.A., 1976. Eye fixations and cognitive processes. Cognitive Psychology 8, 441–480.

Kolers, P.A., Duchnicky, R.L., Ferguson, D.C., 1981. Eye movement measurement of readability of CRT displays. Human Factors 23 (5), 517–527.

Krose, B.J.A., Burbeck, C.A., 1989. Spatial interactions in rapid pattern discrimination. Spatial Vision 4, 211–222.

Latimer, C.R., 1988. Eye-movement data: Cumulative fixation time and cluster analysis. Behavior Research Methods, Instruments, and Computers 20 (5), 437–470.

Loftus, G.R., Mackworth, N.H., 1978. Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance 4 (4), 565–572.

Mackworth, N.H., 1976. Stimulus density limits the useful field of view. In: Monty, R.A., Senders, J.W. (Eds.), Eye Movements and Psychological Processes. Erlbaum, Hillsdale, NJ.

Ponsoda, V., Scott, D., Findlay, J.M., 1995. A probability vector and transition matrix analysis of eye movements during visual search. Acta Psychologica 88, 167–185.

Ramakrishna, S., Pillalamarri, B., Barnette, D., Birkmire, D., Karsh, R., 1993. Cluster: A program for the identification of eye-fixation-cluster characteristics. Behavior Research Methods, Instruments, and Computers 25 (1), 9–15.

Robinson, G.H., 1979. Dynamics of the eye and head during movement between displays: A qualitative and quantitative guide for designers. Human Factors 21 (3), 343–352.

Scinto, L., Barnette, B.D., 1986. An algorithm for determining clusters, pairs and singletons in eye-movement scan-path


records. Behavior Research Methods, Instruments, and Computers 18 (1), 41–44.

Sedgewick, R., 1990. Algorithms in C. Addison-Wesley, Reading, MA.

Tullis, T.S., 1983. The formatting of alphanumeric displays: Review and analysis. Human Factors 25 (6), 657–682.

Viviani, P., 1990. In: Kowler, E. (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes, Ch. 8. Elsevier Science, Amsterdam.

Wickens, C.D., Carswell, C.M., 1995. The proximity compatibility principle: Its psychological foundation and relevance to display design. Human Factors 37 (3), 473–494.

Wickens, C.D., Gordon, S.E., Liu, Y., 1998. An Introduction to Human Factors Engineering. Addison-Wesley and Longman, New York.

Yamamoto, S., Kuto, Y., 1992. A method of evaluating VDT screen layout by eye movement analysis. Ergonomics 35 (5/6), 591–606.
