recognition memory for realistic synthetic...

12
Recognition Memory for Realistic Synthetic Faces Yuko Yotsumoto 1 , Michael J. Kahana 2 , Hugh R. Wilson 3 and Robert Sekuler 1 1 Volen Center for Complex Systems, Brandeis University 3 Department of Psychology, University of Pennsylvania 3 Centre for Vision Research, York University A series of experiments examined short-term recognition memory for trios of briefly- presented, synthetic human faces derived from three real human faces. The stimuli were graded series of faces, which differed by varying known amounts from the face of the average female. Faces based on each of the three real faces were transformed so as to lie along orthogonal axes in a 3-D face space. Experiment 1 showed that the synthetic faces’ perceptual similarity stucture strongly influenced recognition memory. Results were fit by NEMo, a noisy exemplar model of perceptual recognition memory (Kahana & Sekuler, 2002). The fits revealed that recognition memory was influenced both by the similarity of the probe to series items, and by the similarities among the series items themselves. Non-metric multi-dimensional scaling (MDS) showed that faces’ perceptual representations largely preserved the 3-D space in which the face stimuli were arrayed. NEMo gave a better account of the results when similarity was defined as perceptual, MDS similarity rather than physical proximity of one face to another. Experiment 2 confirmed the importance of within-list homogeneity directly, without mediation of a model. We discuss the affinities and differences between visual memory for synthetic faces and memory for simpler stimuli. Kahana and Sekuler (2002) developed a computational model that successfully accounts for short-term recogni- tion memory with low-dimensional stimuli, compound si- nusoidal gratings whose spatial frequency and phase var- ied. Building on Nosofsky’s (1984, 1986) Generalized Con- text Model (GCM), Kahana and Sekuler’s Noisy Exemplar Model, NEMo, combines core aspects of GCM with new key assumptions. NEMo follows the tradition of multidi- mensional signal-detection theory (e.g., Ashby & Maddox, 1998) in assuming that stimulus representations are coded in a noisy manner, with different levels of noise associated with various dimensions. NEMo augments the summed-similarity framework of item recognition (Clark & Gronlund, 1996; Nosofsky, 1991, 1992; Humphreys, Pike, Bain, & Tehan, 1989) with the idea that recognition decisions are influenced not only by probe-to-list-item similarity, but also by the sim- ilarity of list items to one another, a variable that is called within-list homogeneity. Specifically, subjects appear to in- terpret probe-to-list similarity in light of within-list homo- geneity, with greater homogeneity leading to a greater ten- dency to reject lures that are similar to one or more of the studied items. This impact of within-list homogeneity has The authors acknowledge support from AFOSR F49620-03- 1-0376, and National Institutes of Health grants MH55687 and EY002158. Yuko Yotsumoto is now at the Martinos Center for Biomedical Imaging, Masschuetts General Hospital, Charlestown, MA. Correspondence concerning this article may be addressed to Robert Sekuler; e-mail may be sent to [email protected]. been confirmed by Nosofsky and Kantner (2006), using color patches as stimuli, and by Kahana, Zhou, Geller, and Sekuler (revision submitted), using compound gratings that were ad- justed to reflect individual subjects’ visual thresholds. In contrast to compound sinusoidal gratings, essential as- pects of visual processing of human faces take place sev- eral synapses beyond the primary visual cortex (Loffler, Gor- don, Wilkinson, Goren, & Wilson, 2005; Loffler, Yourganov, Wilkinson, & Wilson, 2005). Because the primary visual cortex participates not only in visual encoding but also in vi- sual memory and related phenomena (Magnussen & Green- lee, 1999; Kosslyn, Thompson, Kim, & Alpert, 1995; Klein, Paradis, Poline, Kosslyn, & Le Bihan, 2000), differences be- tween visual processing of compound gratings and of human faces might produce corresponding differences in recogni- tion memory for the two kinds of stimuli (Hole, 1996), and thereby undermine NEMo’s applicability to face stimuli. Our aim here is not to explore or adjudicate among competing psychophysical or neural accounts of face perception and/or memory for faces (for example, Gauthier, Skudlarski, Gore, & Anderson, 2000; Joseph & Gathers, 2002; Grill-Spector, Knouf, & Kanwisher, 2004; Riesenhuber, Jarudi, Gilad, & Sinha, 2004), but to test NEMo’s extensibility to memory for high dimensional stimuli, and to refine the methodology for measuring and modeling short-term memory. Studies of face perception and/or face memory have used a range of stimuli, including simple cartoons, such as the Brunswik faces (Brunswik & Reiter, 1937; Sigala, Gabbiani, & Logothetis, 2002; Peters, Gabbiani, & Koch, 2003), pho- tographs collected in convenience samples from sources such as school yearbooks, or images whose properties have been

Upload: others

Post on 17-Sep-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

Recognition Memory for Realistic Synthetic Faces

Yuko Yotsumoto1, Michael J. Kahana2,Hugh R. Wilson3 and Robert Sekuler1

1Volen Center for Complex Systems, Brandeis University3Department of Psychology, University of Pennsylvania

3Centre for Vision Research, York University

A series of experiments examined short-term recognition memory for trios of briefly-presented, synthetic human faces derived from three real human faces. The stimuli were gradedseries of faces, which differed by varying known amounts from the face of the average female.Faces based on each of the three real faces were transformed so as to lie along orthogonalaxes in a 3-D face space. Experiment 1 showed that the synthetic faces’ perceptual similaritystucture strongly influenced recognition memory. Results were fit by NEMo, a noisy exemplarmodel of perceptual recognition memory (Kahana & Sekuler, 2002). The fits revealed thatrecognition memory was influenced both by the similarity of the probe to series items, andby the similarities among the series items themselves. Non-metric multi-dimensional scaling(MDS) showed that faces’ perceptual representations largely preserved the 3-D space in whichthe face stimuli were arrayed. NEMo gave a better account of the results when similarity wasdefined as perceptual, MDS similarity rather than physical proximity of one face to another.Experiment 2 confirmed the importance of within-list homogeneity directly, without mediationof a model. We discuss the affinities and differences between visual memory for synthetic facesand memory for simpler stimuli.

Kahana and Sekuler (2002) developed a computationalmodel that successfully accounts for short-term recogni-tion memory with low-dimensional stimuli, compound si-nusoidal gratings whose spatial frequency and phase var-ied. Building on Nosofsky’s (1984, 1986) Generalized Con-text Model (GCM), Kahana and Sekuler’s Noisy ExemplarModel, NEMo, combines core aspects of GCM with newkey assumptions. NEMo follows the tradition of multidi-mensional signal-detection theory (e.g., Ashby & Maddox,1998) in assuming that stimulus representations are coded ina noisy manner, with different levels of noise associated withvarious dimensions. NEMo augments the summed-similarityframework of item recognition (Clark & Gronlund, 1996;Nosofsky, 1991, 1992; Humphreys, Pike, Bain, & Tehan,1989) with the idea that recognition decisions are influencednot only by probe-to-list-item similarity, but also by the sim-ilarity of list items to one another, a variable that is calledwithin-list homogeneity. Specifically, subjects appear to in-terpret probe-to-list similarity in light of within-list homo-geneity, with greater homogeneity leading to a greater ten-dency to reject lures that are similar to one or more of thestudied items. This impact of within-list homogeneity has

The authors acknowledge support from AFOSR F49620-03-1-0376, and National Institutes of Health grants MH55687 andEY002158. Yuko Yotsumoto is now at the Martinos Center forBiomedical Imaging, Masschuetts General Hospital, Charlestown,MA. Correspondence concerning this article may be addressed toRobert Sekuler; e-mail may be sent to [email protected].

been confirmed by Nosofsky and Kantner (2006), using colorpatches as stimuli, and by Kahana, Zhou, Geller, and Sekuler(revision submitted), using compound gratings that were ad-justed to reflect individual subjects’ visual thresholds.

In contrast to compound sinusoidal gratings, essential as-pects of visual processing of human faces take place sev-eral synapses beyond the primary visual cortex (Loffler, Gor-don, Wilkinson, Goren, & Wilson, 2005; Loffler, Yourganov,Wilkinson, & Wilson, 2005). Because the primary visualcortex participates not only in visual encoding but also in vi-sual memory and related phenomena (Magnussen & Green-lee, 1999; Kosslyn, Thompson, Kim, & Alpert, 1995; Klein,Paradis, Poline, Kosslyn, & Le Bihan, 2000), differences be-tween visual processing of compound gratings and of humanfaces might produce corresponding differences in recogni-tion memory for the two kinds of stimuli (Hole, 1996), andthereby undermine NEMo’s applicability to face stimuli. Ouraim here is not to explore or adjudicate among competingpsychophysical or neural accounts of face perception and/ormemory for faces (for example, Gauthier, Skudlarski, Gore,& Anderson, 2000; Joseph & Gathers, 2002; Grill-Spector,Knouf, & Kanwisher, 2004; Riesenhuber, Jarudi, Gilad, &Sinha, 2004), but to test NEMo’s extensibility to memory forhigh dimensional stimuli, and to refine the methodology formeasuring and modeling short-term memory.

Studies of face perception and/or face memory have useda range of stimuli, including simple cartoons, such as theBrunswik faces (Brunswik & Reiter, 1937; Sigala, Gabbiani,& Logothetis, 2002; Peters, Gabbiani, & Koch, 2003), pho-tographs collected in convenience samples from sources suchas school yearbooks, or images whose properties have been

Page 2: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

2 YOTSUMOTO, KAHANA, WILSON & SEKULER

tailored to the study’s specific purposes (e.g., Gold, Bennett,& Sekuler, 1999). We chose to work with realistic, synthetichuman faces generated by the methods introduced by Wilson,Loffler, and Wilkinson (2002). As test stimuli, Wilson facesmitigate problems arising from the availability of non-facialinformation or from the availability of distinctive featuraldifferences among faces (for example, Duchaine & Weiden-feld, 2003; Sadr, Jarudi, & Sinha, 2003). Most importantlyfor model-related research, the generating algorithm for Wil-son faces makes it easy to manipulate perceptual differencesamong faces. Furthermore, sets of Wilson faces can be repre-sented in n-dimensional perceptual spaces whose propertiescan be tailored to suit some particular theoretical objective.For example, the axes of a face space might be orthogonal-ized, or individual exemplar faces might be made equidistantfrom some mean or reference face. Also, small, graded dif-ferences between faces makes it difficult for subjects to learnand name each face in a reliable, consistent fashion. This isimportant because naming can subvert mnemonic reliance onvisual information (for example, Ashby & Ell, 2001; Gold-stein & Chance, 1971; Hwang et al., 2005). To reinforcereliance on visual information per se, we limited rehearsalby permitting subjects only a brief glimpse of each face,and then allowing only a short interval between successivefaces. Wilson’s synthesized faces are geometrically simple,compared to actual, unprocessed gray scale photographs offaces, but these synthesized faces manage to convey suffi-cient information to permit identification of individual facesWilson et al. (2002). This individuality, which is important inepisodic memory, is absent from some commonly used facestimuli, such as Brunswik faces (Brunswik & Reiter, 1937).

To preview, Experiment 1’s design allowed us to applythe NEMo model of visual short-term recognition memoryto performance on individual stimulus lists (i.e., diverse se-ries of stimulus items). The model was applied in alternativemodes, for example expressing ”similarity” either in terms offaces’ physical coordinates, or in terms of perceptual coordi-nates, assessed using multdimensional scaling. The design ofExperiment 2 provided a direct, model-free demonstration ofwithin-list homogeneity’s influence on recognition decisions.

Experiment 1

Perceptual similarity among stimuli plays a central role inthe structure of NEMo, and in other models of recognition.With compound gratings as stimuli, similarity has been de-fined by representing stimuli in a metric based on subjects’discrimination thresholds for spatial frequency (Zhou, Ka-hana, & Sekuler, 2004; Kahana et al., revision submitted).That same approach would likely fail with face stimuli be-cause the representational space is clearly anisotropic. Forexample, the difference threshold for discriminating betweenfaces varies substantially from one region of face space toanother, with smallest difference thresholds in the neighbor-hood of the average face (Wilson et al., 2002). To permitfull expression of potential anisotropies, we used non-metricmultidimensional scaling (MDS) to characterize the percep-tual space within which our face stimuli were located, and

to quantify the distances between faces in that space. Thedata used for MDS came from oddity judgments made onsimultaneously-presented trios of faces.

Subjects

Two male and six female volunteers aged from 20 to 25years participated in the main experiment. All were naiveto the purpose of this experiment. They had normal orcorrected-to-normal visual acuity as measured with Snellentargets, and normal contrast sensitivity as measured withPelli-Robson charts (Pelli, Robson, & Wilkins, 1988).

Stimuli

Stimuli were generated and displayed using Matlab andextensions from the Psychophysics and Video Toolboxes(Brainard, 1997; Pelli, 1997). Stimuli were presented on a15-inch computer monitor with a refresh rate of 95 Hz, andresolution set to 800 by 600 pixels. Routines from the VideoToolbox calibrated and linearized the display. Mean screenluminance was fixed at 36 cd/m2.

The Wilson faces used in all our experiments were basedon photographs of three Caucasian females whom we desig-nate A, B, and C. From mA, the vector of 37 measurementstaken on actual face A, we synthesized a realistic version ofthat face in a stimulus space of high dimensionality (n = 37).Vectors of measurements taken on faces A, B and C weretransformed mathematically so as to be mutually orthogonal,by Gram-Schmidt orthogonalization (Diamantaras & Kung,1996; Principle, Euliano, & Lefebvre, 2000). Consequently,variation in one face’s geometric properties is independent ofthe variation in geometric properties of the other two faces(Wilson et al., 2002). Additional details of the faces’ con-struction and properties are given in an Appendix.

After pre-processing and normalization, vectors of mea-surements from several different faces can be combined togenerate mavg, the vector of measurements for an averageface. To illustrate, the synthesized mean female face isshown at the left of Figure 1. This mean face is derived froma sample of 40 Caucasian female faces. Summing k(mavg)and (1−k)(mA), for some k, 0 < k < 1, generates a face thatis a mixture of the mean face and some particular individualface, in this case, A. Allowing k to vary, k = 0 . . .1, gener-ates a graded series of faces spanning a continuum from themean synthesized face (when k = 1) to a synthesized versionof face A alone (when k = 0). The same operation also cangenerate a graded series of faces, which span the distancefrom the mean face toward any other face, here, toward B orC. With these faces, it is easy to vary one stimulus’ similarityto another, a variable that is important in recognition memoryand central to NEMo.

The graded series for faces A-C are shown in the upperthree rows of Figure 1. Within each row, the value (1− k)ranges from 0.04 to 0.20, in increments of 0.04. This meansthat each face differs from its nearest neighbor by approxi-mately the mean discrimination threshold taken under view-ing conditions similar to the ones we used (Wilson et al.,

Page 3: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

MEMORY FOR SYNTHETIC FACES 3

2002). In geometric terms, faces A-C lie along the mutually-perpendicular axes of a 3D space, with the mean face at theorigin.

A final set of faces, D, was generated by averaging cor-responding exemplars of A, B, and C. The resulting facesare shown in the bottom row of Figure 1. Geometrically,the faces in row D lie along the diagonal of face space,which means that the geometric properties of the faces inD are equally well correlated with the geometric propertiesof each of the other faces, A, B, and C. Figure 2A shows thegeometric arrangement of all 21 synthetic faces in a spaceof three-orthogonal dimensions. To minimize potential in-fluences from emotional expressions on the stimulus faces(Goren & Wilson, 2003; Isaacowitz, Wadlinger, Goren, &Wilson, 2006), our set of Wilson-faces substituted constantgeneric shapes for features that would change shape or rela-tive position as emotions were expressed (Ekman & Friesen,1975).

Figure 1. Face stimuli used in Experiment 1. Constructed after themethod of Wilson et al. (2002), the stimulus labeled ’mean’ is theaverage of 40 female faces. In the matrix of faces, rows A-C showsfaces derived from three different faces; row D shows faces that arethe means of faces A-C. Over the matrix columns 1, . . . ,5, facesdeviate increasingly from the mean, by 0.04, in column 1, through0.20, in column 5. For additional details, see the text.

Procedure

On each trial, a study set of three faces, the study series,was followed by a single probe face (p). Subjects judgedwhether p had been among the items in the study series. Weuse the term target to designate a p that had been in the studyseries, and the term lure to designate a p that had not beenin the study series. Correspondingly, we can designate anytrial as either a target trial or a lure trial. Because the studyseries varied from trial to trial, subjects were forced to baseeach “yes”-“no” recognition judgment on the items they hadjust seen.

Each study face was presented for 110 msec, with an inter-stimulus interval of 200 msec. The use of brief presentationswas inspired by Wilson et al. (2002)’s use of this same dura-tion in their studies of face discrimination, and by the factthat fairly detailed processing of a face can be completedwithin the first 100 msec of viewing (Lehky, 2000).

A warning tone followed the study series. Then, after1200 msec retention interval, a p face was presented for 110msec. For each study list, p was chosen at random from theentire set of faces, with two constraints. First, on half ofall trials, p was forced to replicate one of the items in thestudy set (on half of all trials, p differed from all study items).Second, when p matched one of the study items, it matcheditems in each serial position equally often. Distinctive tonesfollowing each response gave subjects trial-wise knowledgeof results.

0.10.2

0.10.2

0

0.1

0.2

21

4B5

3

A53

1 24

112

32

4

D53

4C5

0.10.2

0.10.2

0

0.1

0.2

1 1 2

23

4

31

324

4

A5

B5

12

D534

C5

A B

Figure 2. Representations of face stimuli in alternative three-dimensional spaces. In each panel, the star indicates the location ofthe mean face; squares, diamonds, circles, and triangles representfaces from category A, B, C and D, respectively. The numbers 1-5designate faces’ distance from the mean; the numbers correspond tothe columns in Figure 1. Panel A. The 21 face stimuli arranged ac-cording to the Euclidean distances between faces’ physical descrip-tions, that is the physical distance of each face from the mean. PanelB. Arrangement of the stimulus faces in a three-dimensional per-ceptual space, using the MDS solution to position each face (Expt.2). The MDS space shown here has been Procrustes transformedto bring its dimensions into line with those of the space shown inPanel A.

Although there were small differences in size from face toface, each was approximately 5.5 degrees high by 3.8 degreeswide. To eliminate the usefulness of vernier-type cues, thevertical and horizontal position of each face was perturbedon each presentation by adding a pair of random displace-ments drawn from a uniform distribution with mean = 12.6minarc, and range = 2.1–23.1 minarc.

Sixty different stimulus series were generated were used.A preliminary study with different subjects and many morestimulus series identified these 60 series as likely to span awide range of recognition performance (Yotsumoto, Kahana,Wilson, & Sekuler, 2004). We reasoned that a wide range ofperformance would allow for the strongest test of NEMo. Ofthe 60 stimulus series, half comprised target trials, in whichp replicated one of three study items; the remaining listscomprised lure trials, in which p replicated none of the studyitems.

Subjects participated in four one-hour sessions of 490 tri-

Page 4: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

4 YOTSUMOTO, KAHANA, WILSON & SEKULER

als each. The first 10 trials of any session were treated aspractice, and were eliminated from data analysis; this left32 replications for each stimulus series and subject. Dur-ing testing, a subject sat with head supported by a chin-and-forehead rest, viewing the computer display binocularly froma distance of 114 cm. Trials were self-initiated.

Multidimensional Scaling. To characterize the perceptualsimilarity structure of the synthetic faces (Lee, Byatt, &Rhodes, 2000), the subjects who would serve in the recogni-tion memory experiment first took part in a study with non-metric multidimensional scaling (MDS). The data requiredfor such scaling were generated using the method of triads(Ennis, Mullen, Frijters, & Tindall, 1989) On each trial, threefaces were presented simultaneously, side by side, for 500ms. Simultaneous presentation was used in order to mini-mize likely effects of memory. From the set of three faces,subjects chose the one face that seemed most different fromthe other two (Romney, Brewer, & Batchelder, 1993; Wexler& Romney, 1972). We did not specify the characteristic(s)on which similarity judgments should be based.

To minimize the possibility that vernier-type cues mightcontribute to the dissimilarity judgments, each face’s verti-cal position was randomly offset by a sample from a uni-form distribution spanning ±16.8 min. Each possible stimu-lus pair (Face1, Face2) was presented with every other pos-sible stimulus, for example Face3. If stimulus Face3 wereselected as the stimulus most different from the others, thenby default the remaining stimuli, Face1 and Face2 were as-sumed to be the most similar to one another. A similaritymatrix was constructed by counting the number of times thata stimulus pair (e.g., Face1, Face2) is designated as ”similar”when placed in combination with various other stimuli (e.g.,Face3 . . .Face21).

To control the number of trials required for multidimen-sional scaling, we used a Balanced Incomplete Block design(Weller & Romney, 1988). For this design we generated tri-ads of faces (”blocks”), whose members were drawn fromthe complete set of 21 faces. This selection was constrainedso that each of the 210 pairs of faces occurred in the contextof 30 triads. This arrangement meant that the 30 trials whosetriads included any particular pair of faces were likely to havedifferent faces as their third member. The displacement ofthree faces were randomly determined for each trial.

Each subject participated in three, one-hour sessions of710 trials each. The first 10 trials of each session were treatedas practice, and were eliminated from our data analysis. Theremaining 2100 triadic comparisons per subject were con-verted into a dissimilarity matrix, which were processed bySPSS’ ALSCAL and INDSCAL routines, using a Euclideandistance model.

ResultsRecognition Memory. Figure 3 shows the mean propor-

tion correct for lure trials (arrow at right side of figure) andfor target trials as a function of p’s serial position (threegrey bars). An ANOVA showed that recognition on targettrials across the three serial positions varied significantly,

Figure 3. Proportion correct recognition responses on target tri-als for probe matching various study items. Error bars represent±1standard error of the mean, calculated using the correction sug-gested by ? (?). The arrow to the right of the graph represents themean proportion correct on lure trials. Each star shows the pro-portion correct generated by 29 different subjects who tested in thesame conditions during an experiment described elsewhere (Yot-sumoto et al., 2004).

F(2,14) = 24.43, p < 0.001, and an a priori comparisonrevealed a significant effect of recency, F(1,7) = 32.98,p < 0.01. For comparison, the stars in Figure 3 representresults with these same 60 stimulus series, in another exper-iment with different subjects and many more stimulus series(Yotsumoto et al., 2004). The comparability of results fromthe two independent replications is gratifying. Note that per-formance for each serial position is significantly greater thanchance, that is, each hit rate is significantly greater than thefalse alarm rate (paired two-tail t-tests, p <0.02, 0.01, and0.001 for serial position 1, 2, and 3, respectively). So al-though performance with the first two serial positions is low,it is above chance.

Multidimensional Scaling. MDS solutions were obtainedfor representations in one-to-six dimensions. Values of r2,which represent the proportion of variance accounted for inthe scaled data, increased as the number of dimensions variedfrom one to three, but saturated thereafter. Based on thesevalues and also on the dimensionality of the stimuli, sub-sequent modeling was based on the three dimensional MDSsolution. In that solution, Kruskal’s Stress measure was 0.26,and r2 was 0.62. The three-dimensional solution generatedby MDS is shown in Figure 2B. Each symbol represents onesynthetic face, and the pairwise distances between symbolsrepresent corresponding pairwise perceptual dissimilaritiesbetween the faces.

We used Procrustes analysis (Dryden & Mardia, 1998)to visualize relationships between this three-dimensional de-scription of similarity space and the three-dimensional struc-ture in which the faces were generated. The Procrustes anal-ysis linearly transformed the matrix of values from the MDSsolution to bring that matrix into best conformity with thematrix of pairwise distances in the faces’ physical space. The

Page 5: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

MEMORY FOR SYNTHETIC FACES 5

outcome, shown in Figure 2B, was based on the Euclideansimilarity transformations of translation, reflection, orthog-onal rotation, and isotropic scaling of points in the MDSsolution. If the perceptual representation of these syntheticfaces were identical to their physical representation, the post-Procrustes MDS solution would be perfectly congruent withthe representation in Figure 2A, where faces A, B, and C areorthogonal to each other, and exemplars of face D lie on thediagonal.

Clearly, although the transformed MDS solution doesresemble the arrangement of the face stimuli themselves,residual differences remain between the perceptual space,as represented by MDS, and the physical space. Afterthe Procrustes transformation, the sum of squared residualdiscrepancies between the physical representations and thetransformed-MDS representations was 0.26. To provide anintuition about the magnitude of this value, we used MonteCarlo methods to put this sum of squares into the same unitsas were used for the Euclidean physical space (Figure 2A).A series of Procrustes analyses were done on matrices inwhich the faces’ physical coordinates were randomly per-turbed to varying, known degrees, by the addition of inde-pendent, zero-mean, Gaussian random deviates to each of thethree coordinates for each face. This operation was carriedout 1000 times for Gaussian distributions with different stan-dard deviations. From the mean residual sums of squares as-sociated with each value of standard deviation, we identifiedthe random perturbation of face coordinates that producedthe same residual sum of squares as had been obtained withthe Procrustes transformation of the MDS solution. The stan-dard deviation of the residual difference between the MDSsolution and the original physical coordinates was equiva-lent to 6-7%, which corresponds to ∼1.5× the separationof neighboring faces within any single category of faces, A. . . D.

To examine the residuals on a finer scale, the mean residu-als between the MDS solution and the faces’ physical coordi-nates were calculated and then sorted into bins according tothe distance between faces in a study series and the mean ofthe 21 faces. These values are plotted in Figure 4. Note thatthe magnitude of the residuals grew with increasing distancefrom the mean face, confirming that perceptual and physicalrepresentations of faces were most discrepant for the moreextreme faces in our set.

As a further comparison between the MDS and physicalrepresentations of our 21 faces, we computed the vector an-gles between perceptual exemplars of A, B, and C. Thesevector angles are shown in Table 1. Faces just 4% away fromthe mean face were excluded from these calculations becausein MDS space those faces clustered tightly around the meanface, which made angle measurements for those faces mean-ingless. The mean angles based on the 8, 12, and 16% datawere 89, 70 and 93◦, suggesting that the perceptual similar-ity space preserved much, but not all of the orthogonality thathad been built into the faces’ original, 3-D space. However,all of the angle estimates dropped when the 0.20 faces wereincluded in the calculations, which confirms the demonstra-tion in Figure 4 that these extreme faces show the largest per-

ceptual deviations from the geometry of Figure 2A’s space.

Table 1Angles Between Perceptual Representation of Faces A-C

Face Distance AB Angle AC Angle BC Angle8% 125.7 65.3 116.1

12% 83.2 74.2 80.716% 57.4 71.5 83.420% 46.5 54.9 63.3

Mean for 8%-20% 78.2 66.5 85.9Mean for 8%-16% 88.8 70.3 93.4

Figure 4. Mean distance between faces’ representations in physi-cal space and in perceptual (MDS) space, as a function of distancefrom the mean face.

ModelWe applied NEMo to the recognition memory data. As

mentioned before, NEMo departs from the classic summedsimilarity models of item recognition (e.g., McKinley &Nosofsky, 1996; Nosofsky, 1986) by allowing recognitionjudgments to be determined not only by the similarity be-tween the probe, p, on one hand, and each study stimulus, onthe other, but also by similarities among study items them-selves. Given a series of L study items, s1...sL, and a probe,p, NEMo responds ”yes” if:

L

∑i=1

αη(p,si + εi)︸ ︷︷ ︸Summed Probe-Item Similarity

+β2

L(L−1)

L−1

∑i=1

L

∑j=i+1

η(si + εi,s j + ε j)︸ ︷︷ ︸Mean Within-List Homogeneity

>CL

(1)where η(p,si) is the perceptual similarity between p and theith study item (see Equation 2, below); ε is a vector repre-senting the noise associated with each stimulus dimension,αi is the weight given the ith study item, and CL representsan optimal criterion for a series of L study items. To al-low for the possibility that subjects’ decision rule might in-corporate within-list homogeneity, NEMo adds together (i)

Page 6: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

6 YOTSUMOTO, KAHANA, WILSON & SEKULER

summed similarity and (ii) within-list homogeneity, weight-ing the latter by a parameter β. If β = 0 the model reduces toa standard summed similarity model (Nosofsky, 1986) withnoisy item representations (Ennis, 1988) and a deterministicdecision rule. If β < 0, when s1...sL are similar to one an-other, a given lure becomes less tempting, that is, it attractsfewer “yes” responses. The opposite effect would accom-pany β > 0, that is, study items similar to one another wouldattract more “yes” responses.

In NEMo, similarity, η(si,s j), between item representa-tions, si and s j, is given by:

η(si,s j) = ae−τd(si,s j)c(2)

where d is the weighted distance between the two stimulusvectors, and τ, c and a jointly determine the form of the gen-eralization gradient.

With similarity defined in either physical or perceptual(MDS) space, we ran parallel simulations of NEMo, one setof simulations using each definition of similarity. Amongparameters in Equation 2, we fixed c = 1, implementing asimple exponential generalization function as suggested byprevious empirical results (Kahana & Sekuler, 2002). To re-duce NEMo’s free parameters further, we used an indepen-dent, empirical estimate of τ and a. We estimated similarityin physical space using data from a large-scale preliminaryexperiment (Yotsumoto et al., 2004) that generated an empir-ical approximation of the similarity tuning function in phys-ical space.1The similarity tuning function’s exponent and y-intercept, 9.20 and 0.84, were used for τ and a respectively, inone set of model simulations. Then, re-using results from thepreliminary experiment, we transformed inter-face distancesaccording to values from the MDS analysis, and fit a secondexponential similarity function. This generated a similaritytuning function in perceptual space. The function’s exponentand y-intercept, 11.14 and 0.91, were used as τ and a respec-tively, in a second set of model simulations. Finally, we fixedone other parameter, setting NEMo’s criterion to 0.5, whichis the empirical value found in previous model fits (Kahana& Sekuler, 2002).

Simulations and Application of Model. We fit NEMo tothe value of P(yes) obtained for each of the 60 different stim-ulus lists in Experiment 1. A genetic algorithm (Mitchell,1996) found NEMo’s best fitting parameter set by minimiz-ing the root-mean squared-difference (RMSD) between ob-served and predicted recognition scores. The genetic algo-rithm allowed a population of 1000 random parameter setsto evolve for 20 generations. At the end of every generationeach of the 500 least-fit parameter sets was replaced with anew parameter set, which randomly draw each of its param-eter values from one of the non-replaced, 500 best-fit param-eter sets. Then the non-replaced 500 best-fit parameter setswere mutated by a single, Gaussian parameter change witha standard deviation of 30% of a parameter’s range. Finally,to produce an estimate of RMSD, each parameter set ran for1000 simulated trials for each stimulus list.

Results of Model Simulations. We fit subjects’ averageperformance twice, expressing NEMo’s inter-face similarityvalues, τ, a, and d(si,s j), either as physical distances be-tween faces, or as values from the MDS descriptions of sub-jects’ perceptual space. Table 2 gives the best fitting modelparameters for NEMo derived from the genetic algorithm.The column Physical shows the best parameters using stim-ulus distances in physical space, and the column MDS showsthe best parameters using stimulus distances in MDS, per-ceptual space. The first three parameters, σ1, σ2, and σ3, arethe variances of noise distributions: one for each dimensionof the three dimensional, perceptual space. σ1, σ2, and σ3correspond to dimension 1, 2 and 3 in Figure 2. The nexttwo parameters, α1 and α2, represent forgetting for the firstand second item in a study series, respectively. (For the lastitem in a study series, α3 was set to one.)

As explained earlier, β represents the contribution ofwithin-list homogeneity. Its negative sign for both simula-tions, indicates that when study items were similar to oneanother, NEMo became more conservative, decreasing anytendency to treat a lure as a target. Note, finally, that theRMSD associated with the perception-based fit, 0.101, issmaller than the RMSD associated with the fit based on thefaces’ physical representation, 0.123.

0

0.2

0.4

0.6

0.8

1.0

0 0.2 0.4 0.6 0.8 1.0

Predicted Proportion "Yes"

r2=.75

B

0

0.2

0.4

0.6

0.8

1.0

0 0.2 0.4 0.6 0.8 1.0

Predicted Proportion "Yes"

r2=.67

A

Figure 5. Proportion ”yes” responses plotted against predictionsfrom NEMo. Panel A. Interstimulus distances used in NEMo weretaken from physical description of stimuli. Panel B. Interstimulusdistances used in NEMo were taken from MDS solution.

Each of the best-fit parameter sets was used to generateNEMo’s predictions for each of the 60 lists. Because NEMohas three non-deterministic, noise parameters, running themodel with any single trio of noise samples σ1 . . .σ3 couldnot produce a singular, exact set of predictions. Therefore,NEMo was used to simulate 1000 trials for each of the 60study-p lists, with new, independent random noise samplesdrawn for each trial. From these 1000 trials, we obtained pre-dicted proportion yes responses for each list. Figure 5 shows,for each list, the relationship between the predicted and theobserved proportion of yes responses. The predictions madewith similarity defined by the physical representations of facestimuli are shown in Figure 5A), predictions based on theMDS solution are shown in Figure 5B. NEMo produced a

1 The similarity tuning function was based on errors in same-different judgments with just a single study item, which minimizedmemory load, and with timing identical to that used here.

Page 7: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

MEMORY FOR SYNTHETIC FACES 7

better account of the data when it incorporated perceptualsimilarity among faces. Physical and perceptual similaritiesproduced r2 = 0.68 and 0.76, respectively. To gauge the re-liability of this difference in r2, we repeated the simulationan additional 1000 times for each list, again drawing newindependent noise samples for each trial, and calculating r2

after every new 1000 trials. For simulations based on phys-ical similarity between faces, the mean r2= 0.6838 (SD =0.0109); for simulations based on MDS-defined similarity,the mean r2= 0.7573 (SD = 0.0090). The difference betweenthese mean values of r2 is highly significant, z=3.43, p<.001.

Table 2Best fitting parameter values for NeMO’s fit to the data

Parameter Meaning Physical MDSσ1 Dimension1 noise 0.033 0.032σ2 Dimension2 noise 0.072 0.046σ3 Dimension3 noise 0.045 0.046α1 Forgetting of 1st item 0.47 0.57α2 Forgetting of 2nd item 0.40 0.45β Interitem similarity -0.53 -0.34τ Tuning function steepness 9.20 11.14

RMSD 0.123 0.101

The reliability of the MDS solutions. To assess the con-sistency of subjects’ triadic judgments we computed two dif-ferent mean MDS solutions. One solution was based onsubjects’ dissimilarity judgments on all odd-numbered trials(that is, first, third, fifth, etc.), the second solution was basedon judgments from all even-numbered trials (that is, second,fourth, sixth, etc.).2For each three dimensional solution, theEuclidean distances between all face pairs were taken, andthe correlation calculated between pairwise distances fromodd trials and pairwise distances from even trials. The re-sults are shown as a scatterplot in Figure 6. Despite the 50%reduction in the number of judgments on which each MDSsolution was based, the pair of MDS solutions produced bythis process had a relatively strong correlation with r2=0.79,which we take as confirmation that the triadic comparisonsproduced reliable measures of similarity.

As noted earlier, NEMo’s predictions were more accu-rate when psychophysical (MDS) rather than purely physi-cal similarities were taken account of, but those predictionshad a number of clear outliers. These were stimulus serieson which the model failed badly, that is, deviated by 0.20or more from the predicted P(yes). To identify the originof these failures, we examined the makeup of these series.Of the five outliers, three contained two or more faces thatdeviated from the mean face by the largest value possible,namely 0.20. To this outcome, Monte Carlo simulation as-signed a probability 0.02< p <0.01. So, faces with the great-est deformation relative to the mean face produced the largesterrors in NEMo’s predictions. As shown in Figure 4 and inTable 1, these extreme faces deviated appreciably from theorthogonal, physical representations of the faces. These de-viations might have arisen from perceptual transformations

Figure 6. Distance between items in the MDS solution calculatedby using even trials and them calculated by using odd trials.

associated with the faces’ ”strangeness,” which effectivelyintroduced an additional perceptual dimension limited to theextreme faces. Because this added dimension was associatedwith only a subset of faces, it was not represented fully in theMDS solution.

Discussion

NEMo’s account of the recognition results gives furthersupport to the idea that recognition decisions depend uponboth summed similarity and within-list homogeneity. Dis-crepancies between the faces’ representation in physicalspace (Figure 2) and their perceptual representation (Fig-ure 2) advantaged model fits that were based on perceptual,rather than physical similarities between faces. The percep-tual representation was based on MDS, which requires as in-put a matrix of similarity or dissimilarity judgments. Re-searchers have taken various approaches to generate suchmatrices. For example, the input matrix has been generatedby asking subjects to rate the distinctiveness of items pre-sented one at a time (Valentine & Bruce, 1986; Valentine,1991; Valentine & Endo, 1992; Lee et al., 2000), or to ratenumerically the similarity of items presented in pairs (Nosof-sky, 1991; Johnstone & Williams, 1997; Peters et al., 2003).We took a different approach, using triadic comparisons toproduce the input matrix for MDS. We chose this method inpart for its efficiency in generating many different compar-isons per pair of faces, and in part because the task resem-bled the recognition memory task we used. For one thing,by encouraging subjects to distinguish among members ofthe briefly-presented triad, the task was likely to engage therapid and sometime subtle distinctions required in the recog-nition judgments. At the same time, variation in the triad’s

2 The balanced incomplete block design forced us to take thisindirect approach. Differences in the makeup of triads on succes-sive trials prevented us from comparing judgments themselves. Asa result, our comparisons had to be mediated via MDS.

Page 8: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

8 YOTSUMOTO, KAHANA, WILSON & SEKULER

constituents from one presentation to another, mimicked thetrialwise variation among study series.

Torgerson (1958) described other variants of the methodof triadic comparisons in which subjects had to make multi-ple, explicit pairwise judgments per trial. Note that the sin-gle explicit judgment required on each trial in our applicationimplies that subjects have made one or more pairwise com-parisons, although such comparisons are not made explicit.Letting the stimuli in the triad be i, j, and k, our subjects’identification of one item as most dissimilar could reflectevaluations of inequalities among |i− j|, |i− k|, and | j− k|.Of course, when subjects are not forced to make such evalua-tions explicit, one cannot rule out the possibility that, partic-ularly with time pressures, subjects might sometimes makefewer than all pairwise comparisons.

Experiment 2

The simulations applied to data from Experiment 1 re-vealed that visual memory performance could be well pre-dicted by a model that takes account of both summed sim-ilarity and within-list homogeneity. Because the within-listhomogeneity term is a novel addition to the summed similar-ity framework, we sought an additional, direct, model-free,demonstration that within-list homogeneity was actually im-portant in face recognition memory. Therefore we designedstimulus series in which variation in both summed similarityand in within-list homogeneity were controlled. We expectedthat the responses produced by various combinations of thetwo factors would directly demonstrate the contribution ofeach factor, without the mediation of a computational model.

Methods

Apparatus, Stimuli and Procedure. The apparatus andstimuli were the same as in Experiment 1 except that mul-tiple subjects were tested simultaneously, using computersin a classroom cluster. Although subjects did not use chinrests, they were encouraged to maintain a constant viewingdistance of approximately 57 cm from their computer. Theprocedure was the same as Experiment 1 except that the threestudy faces for each trial were forced to come from threedifferent categories of faces, A. . .D (as shown in Figure 1).Study lists were first generated randomly. Because of theupper limit on possible pair-wise, physical differences be-tween faces that we used, differences among randomly se-lected faces tended to be small, which produced skewed dis-tributions of summed similarity and within-list homogeneity.Moreover, the randomly-generated lists, produced non-zerocovariance between summed similarity and within-list homo-geneity. To test wider ranges of both types of similarity in-dependently, summed similarity and within-list homogeneityhad to be to distributed uniformly. To generate series meetingthis criterion, the distributions of two kinds of similarity werecalculated after random generation of a set of series, and ex-isting stimulus series were replaced by newly generated onesuntil we had a set of study series that satisfied the distributionrequirement.

Subjects. Twenty nine Brandeis undergraduates partici-pated as part of a course requirement; during the session,each subject gave 436 trials. All subjects were naive to theexperimental purpose, and none had taken part in our otherexperiments.

Results and Discussion

For each series on which p was a lure, we used the faces’physical coordinates to calculate summed similarity betweenp and all study items, and the within-list homogeneity. Forthis purpose, similarity was assumed to be monotonic withEuclidean distance in the physical space represented in Fig-ure 2A. We sorted the trials into the cells of a 3×3 matrix,whose rows represented three levels of summed similarity,and whose columns represented three levels of within-listhomogeneity. At the end of the sorting, each cell of the3x3 matrix contained 24 trials per subject. The proportionof ”yes” responses was calculated separately for each of thenine cells in the matrix; these values are plotted in Figure 7.The parameter of the family of curves is within-list homo-geneity. The proportion of ”yes” responses increased withsummed similarity, but decreased with growth in within-listhomogeneity. A repeated measures ANOVA showed thatboth these effects were statistically significant, F(2,56) =85.6, p < .01, and F(2,56) = 7.08, p < .05, for summed sim-ilarity and within-list homogeneity, respectively. The inter-action was not significant, F(4,112) = 1.16, p = .33.

Note that the directions of the two effects observed herereproduced the corresponding effects seen in simulationswith NEMo for Experiment 1. Of particular interest wasthe direct confirmation that within-list homogeneity andsummed similarity operate in opposed directions to influencerecognition judgments, just as NEMo demonstrated they did.

Within-List Homogeneity

Figure 7. The proportion of ”yes” responses on lure trials as afunction of summed similarity. These false alarm rates are plottedseparately for three different levels of within-list homogeneity. Thediameter of each filled circle signifies the magnitude of within-listhomogeneity. Error bars represent ±1 standard error of the mean,corrected for within subject variability (?).

Page 9: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

MEMORY FOR SYNTHETIC FACES 9

General Discussion

Physical coordinates vs MDS solutions

NEMo’s account of visual recognition memory perfor-mance for face stimuli was improved when the model incor-porated faces’ perceptual similarity rather than purely phys-ical coordinates. It is important to note that, with either per-ceptual or physical representations of similarity, NEMo hadthe same number of free parameters, therefore this differencedid not result from a difference in the model complexity.

We should note that not every related study has demon-strated an advantage from describing stimuli in perceptual(MDS) terms rather than physical ones. Peters et al. (2003)found that the performance of categorization models was ei-ther unchanged or even slightly diminished when face stim-uli were described using MDS rather than a native, phys-ical metric. In their study, subjects were trained to cat-egorize stimuli including schematic Brunswik-Reiter faces(1937), and slightly more elaborate, cartoon faces. Thesefaces were defined in 4-dimensional physical spaces, whichsubjects learned to bifurcate using a simple, linear separablecriterion. After learning the category membership of vari-ous exemplars, subjects’ categorization was tested with mix-tures of previously-seen faces and new ones. Peters et al.(2003)’s modeling results favored the proposition that sub-jects stored a sparse, abstracted representation of categoryproperties, rather than the characteristics of individual ex-emplars. abstract, sparse the learned, long-term informa-tion as individual exemplars. This outcome with longer-termlearning of stable category properties is quite different fromthe case in our experiments on episodic recognition, wheresubjects seem to store individual exemplars, at least for thebrief duration of a single trial. In categorization tasks likePeters et al. (2003)’s, subjects can learn abstract rules overthe course of their experience with many exemplars. How-ever, such a strategy would not work in a episodic recognitiontasks in which targets and lures are drawn randomly from acommon stimulus space, and in which no simple rule couldwork in the face of trial to trial variation in the study andtest items. Of course, if some simplifying consistent biaswere introduced into a recognition experiment so that lureswere always drawn from one category, say Wilson faces fromclass A, and targets were always drawn from a different cat-egory, say Wilson faces from class B, subjects undoubtedlywould eventually learn the rule and be able to ignore thestudy items. Clearly, though, such an experiment would nolonger be an experiment on episodic recognition.

Differences from compound gratings

One of our purposes was to investigate short term visualmemory with higher dimensional stimuli, particularly by ap-plying NEMo to recognition of synthetic faces. The bestfitting parameters obtained here preserved the general char-acteristics observed in Kahana and Sekuler (2002)’s studieswith memory for compound gratings. For example, in bothstudies, recency effects were captured by values of α, and theempirical effects of within-list homogeneity were reflected in

significantly negative values of β. Moreover, despite the factthat τ values were derived in different ways for recognitionof gratings and of faces, the resulting τ values were closebetween two studies (8.8 and 10.7 with compound gratings,9.20 and 11.14 with synthetic faces).3 It seems, then, thatsimilar similarity-distance functions operate for both low-dimensional (gratings) and high-dimensional stimuli (syn-thetic faces).

However, memory for synthetic faces may differ in an im-portant way from the memory for compound gratings. Eventhough we found a recency effect with synthetic faces aswell as with gratings, attributes of the recency effect ob-served in this study differed from those found with gratings.With gratings, the recency effect persisted across the en-tire list, with performance increasing systematically from theleast to the most recently presented item (Kahana & Sekuler,2002). Here, though, the serial positions preceding the lastone produced essentially equivalent performance, though, asmentioned earlier, performance at all serial positions is bet-ter than the chance level defined by the false alarm rate.This suggests that with higher dimensional stimuli, insteadof forgetting previously-seen items gradually, the last seenitem may diminish the memory for all previously seen itemsequally. This resembles a result reported previously (Phillips,1974, 1983; Phillips & Christie, 1977).

Because procedures differed between the studies, we mustbe cautious in attributing various discrepancies in the resultsto differences in stimulus dimensionality alone. But we dobelieve that parallel studies of recognition memory for grat-ings, faces, and other high-dimensional stimuli, as well ascomparable stimuli from other sensory modalities (Visscher,Kahana, & Sekuler, 2006), will ultimately help us understandhow the number and structure of perceptual dimensions thatmake up stimuli contribute to human short term recognitionmemory.

We should note at least one possible application of thepresent results, to the validity of police lineups. When a wit-ness views a simultaneous lineup comprising several indi-viduals, the homogeneity of those individuals is an analogueto the within-list homogeneity whose potency was demon-strated in Experiments 1 and 2. Extrapolating from our re-sults, one would expect that within-list homogeneity in alineup would strongly affect a potential witness’ recognitionresponse. Recently, law enforcement officials in the UnitedStates have been encouraged to substitute sequential lineupsfor the traditional simultaneous procedure (Turtle, Lindsay,& Wells, 2003). Advocates of this procedure have claimedthat sequential lineups somehow enhance discriminability,and thereby promote accuracy. In sequential lineups, wit-nesses view one lineup member at a time and decide whetheror not that person is the perpetrator, prior to viewing the nextlineup member. Here, witnesses make a yes-no judgmentafter viewing each single person/face, one at a time, which is

3 We ran simulations with various τ values, and found that dif-ferences of this size did not appreciably alter the model’s resultingRMSD. So differences in the similarity-distance functions were notcritical to the success of model fits.

Page 10: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

10 YOTSUMOTO, KAHANA, WILSON & SEKULER

meant to minimize false alarms, by discouraging the sort ofrelative judgments that can contaminate simultaneous line-ups (Lindsay et al., 1991; Steblay, Dysart, Fulero, & Lind-say, 2001). However, Gronlund (2004) has shown that thebeneficial effect of sequential lineups arises not from height-ened discriminability, but from a change in criterion –witha sequential lineup promoting a more conservative criterion.Whatever their benefit, though, sequential lineups would beimmune to influences from within-list homogeneity only ifthere were zero carryover of memory from one face/lineupmember to another, an assumption that begs to be evaluated.

Finally, in all the studies we have reported here, possi-ble contamination of face recognition by emotional cues wasintentionally avoided. In fact, the set of Wilson-faces thatwe used substituted constant generic shapes for features thatwould change shape as emotions are expressed. In additionto equation on basic dimensions such as contrast and meanluminance, the consistent neutral expression of our facesruled out emotion as an aid to recognition memory. Givenemotional expression’s known role in recognition (for ex-ample, Gallegos & Tranel, 2005; Johansson, Mecklinger, &Treese, 2004; Kaufmann & Schweinberger, 2004), it is worthnoting that systematic variation in the position and/or orien-tation of eyes, mouth and brows (Ekman & Friesen, 1975)can generate Wilson faces that express distinct, easily iden-tified emotions, and in varying degree (e.g. Isaacowitz et al.,2006). We plan to evaluate how the presence of emotion sig-nals of varying, calibrated strength might combine with otherfacial information to influence short-term face recognition.

Appendix

Wilson et al. (2002) introduced a method for generatingsynthetic faces that are well-suited for for model-driven re-search on various topics, including visual memory. In theirscheme, individual synthetic faces are derived from gray-scale face photographs by digitizing 37 key points: 14 pointsdefining head shape, 9 points for the hairline, 4 points foreye locations, 4 points for nose length and width, 5 pointsdefining the mouth and lips, and one point for brow height.Synthetic faces are then reconstructed from these 37 mea-surements and bandpass filtered with a 2.0 octave wide dif-ference of Gaussians filter with a peak frequency of 10.0 cy-cles per face width (Wilson et al., 2002). Several studieshave shown that such filtering preserves frequencies neededfor face recognition (Gold et al., 1999; Nasanen, 1999).

By design, synthetic faces eliminate textures such as skin,hair, wrinkles, etc, and focus instead on geometric character-istics of faces. However,this raises the important question ofwhether synthetic faces are sufficiently accurate representa-tions of their original faces to be useful in psychophysical ex-perimentation. This question has been answered by requiringobservers to identify the gray-scale photograph from which asynthetic face was derived in a four alternative forced choiceexperiment. The mean across five observers was 97.4% cor-rect in matching between front view synthetic faces and pho-tographs, and even for matching between 20 side view pho-tographs and front view synthetic faces (or vice versa) per-

formance averaged 90.7% correct (Wilson et al., 2002). Aschance performance is 25% in these experiments, these dataclearly demonstrate that synthetic faces capture salient as-pects of individual face geometry. Furthermore, the data basecaptures known face gender differences: synthetic femalefaces have significantly smaller heads, rounder chins, thickerlips, and higher eyebrows than males. Finally, fMRI signalsfrom the fusiform face area show that synthetic faces pro-duce BOLD activation that is nearly as large as the originalgray scale faces from which they are derived (Loffler et al.,2005). Although the Wilson faces are relatively simple ge-ometrically, they still convey sufficient information to char-acterize individual faces. This essential individuality, whichis important in episodic memory, is absent from some com-monly used face stimuli, such as Brunswik faces (Brunswik& Reiter, 1937; Sigala et al., 2002; Peters et al., 2003).

ReferencesAshby, F. G., & Ell, S. W. (2001). The neurobiology of human

category learning. Trends in Cognitive Science, 5(May), 204-210.

Ashby, F. G., & Maddox, W. T. (1998). Stimulus categorization.In A. A. J. Marley (Ed.), Choice, decision, and measurement:Essays in honor of R. Duncan Luce (p. 251-301). Mahwah, NJ:ErIbaum.

Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision,10, 443-446.

Brunswik, E., & Reiter, L. (1937). Eindruckscharaktere schema-tisierter gesichter. Zeitschrift fr Psychologie, 142, 67-134.

Clark, S. E., & Gronlund, S. D. (1996). Global matching models ofrecognition memory: How the models match the data. Psycho-nomic Bulletin & Review, 3, 37-60.

Diamantaras, K. I., & Kung, S. Y. (1996). Principal ComponentNeural Networks. John Wiley & Sons, Inc.

Dryden, I. L., & Mardia, V. (1998). Statistical Shape Analysis. NewYork: J Wiley.

Duchaine, B. C., & Weidenfeld, A. (2003). An evaluation of twocommonly used tests of unfamiliar face recognition. Neuropsy-chologia, 41(6), 713-720.

Ekman, P., & Friesen, W. V. (1975). Unmasking the Face. Oxford:Prentice-Hall.

Ennis, D. M. (1988). Confusable and discriminable stimuli: Com-ment on Nosofsky (1986) and Shepard (1986). Journal of Ex-perimental Psychology: General, 117(4), 408-411.

Ennis, D. M., Mullen, K., Frijters, J. E. R., & Tindall, J. (1989). De-cision conflicts: Within-trial resampling in Richardson’s methodof triads. British Journal of Mathematical & Statistical Psychol-ogy, 42, 265-269.

Gallegos, D. R., & Tranel, D. (2005). Positive facial affect facil-itates the identification of famous faces. Brain and Language,93, 338-348.

Gauthier, I., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000).Expertise for cars and birds recruits brain areas involved in facerecognition. Nature Neuroscience, 3, 191-197.

Gold, J., Bennett, P. J., & Sekuler, A. B. (1999). Signal but notnoise changes with perceptual learning. Nature, 402, 176-178.

Goldstein, A. G., & Chance, J. E. (1971). Visual recognition mem-ory for complex configurations. Perception & Psychophysics, 9,237-241.

Page 11: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

MEMORY FOR SYNTHETIC FACES 11

Goren, D., & Wilson, H. R. (2003). Quantifying recognition abili-ties for four major emotional expressions based on facial geom-etry. Journal of Vision, 3(9), 300a.

Grill-Spector, K., Knouf, N., & Kanwisher, N. (2004). The fusiformface area subserves face perception, not generic within-categoryidentification. Nature Neuroscience, 7, 555-562.

Gronlund, S. D. (2004). Sequential lineups: shift in criterion ordecision strategy? Journal of Applied Psychology, 89, 362-368.

Hole, G. J. (1996). Decay and interference effects in visuospatialshort-term memory. Perception, 25(1), 53-64.

Humphreys, M. S., Pike, R., Bain, J. D., & Tehan, G. (1989). Globalmatching: A comparison of the SAM, Minerva II, Matrix, andTODAM models. Journal of Mathematical Psychology, 33, 36-67.

Hwang, G., Jacobs, J., Geller, A., Danker, J., Sekuler, R., & Kahana,M. J. (2005). EEG correlates of subvocal rehearsal in workingmemory. Behavioral and Brain Function, 1, 20.

Isaacowitz, D., Wadlinger, H., Goren, D., & Wilson, H. (2006). Se-lective preference in visual fixation away from negative imagesin old age? An eye tracking study. Psychology & Aging, 21,40-48.

Johansson, M., Mecklinger, A., & Treese, A.-C. (2004). Recog-nition memory for emotional and neutral faces: an event-relatedpotential study. Journal of Cognitive Neuroscience, 16, 1840-53.

Johnstone, A. B., & Williams, C. (1997). Do distinctive faces comefrom outer space? an investigation of the status of a multidimen-sional face-space. Visual Cognition, 4(1), 59-67.

Joseph, J. E., & Gathers, A. D. (2002). Natural and manufacturedobjects activate the fusiform face area. Neuroreport, 13(0959-4965), 935-8.

Kahana, M. J., & Sekuler, R. (2002). Recognizing spatial patterns:A noisy exemplar approach. Vision Research, 42, 2177-2192.

Kahana, M. J., Zhou, F., Geller, A., & Sekuler, R. (revision sub-mitted). Lure-similarity affects visual episodic recognition: De-tailed tests of a noisy exemplar model. Memory & Cognition.

Kaufmann, J. M., & Schweinberger, S. R. (2004). Expression influ-ences the recognition of familiar faces. Perception, 33, 399-408.

Klein, I., Paradis, A. L., Poline, J. B., Kosslyn, S. M., & Le Bihan,D. (2000). Transient activity in the human calcarine cortex dur-ing visual-mental imagery: an event-related fMRI study. Journalof Cognitive Neuroscience, 12(Supplement 2), 15-23.

Kosslyn, S., Thompson, W. L., Kim, I. J., & Alpert, N. M. (1995).Topographical representations of mental images in primary vi-sual cortex. Nature, 378(6556), 496-498.

Lee, K., Byatt, G., & Rhodes, G. (2000). Caricature effects, distinc-tiveness, and identification: Testing the face-space hypothesis.Psychological Science, 11(5), 379-385.

Lehky, S. R. (2000). Fine discrimination of faces can be performedrapidly. Journal of Cognitive Neuroscience, 12(5), 848-855.

Lindsay, R. C., Lea, J. A., Nosworthy, G. J., Fulford, J. A., Hector,J., LeVan, V., & Seabrook, C. (1991). Biased lineups: sequentialpresentation reduces the problem. Journal of Applied Psychol-ogy, 76, 796-802.

Loffler, G., Gordon, G. E., Wilkinson, F., Goren, D., & Wilson,H. R. (2005). Configural masking of faces: Evidence for high-level interactions in face perception. Vision Research, 45, 2287-2297.

Loffler, G., Yourganov, G., Wilkinson, F., & Wilson, H. R. (2005).fMRI evidence for the neural representation of faces. NatureNeuroscience, 8, 1386-1391.

Magnussen, S., & Greenlee, M. W. (1999). The psychophysics ofperceptual memory. Psychological Research, 62(2-3), 81-92.

McKinley, S. C., & Nosofsky, R. M. (1996). Selective attentionand the formation of linear decision boundaries. Journal of Ex-perimental Psychology: Human Perception & Performance, 22,294-317.

Mitchell, M. (1996). An Introduction to Genetic Algorithms. Cam-bridge: MIT Press.

Nasanen, R. (1999). Spatial frequency bandwidth used in the recog-nition of facial images. Vision Research, 39(23), 3824-3833.

Nosofsky, R. M. (1984). Choice, similarity, and the context theoryof classification. Journal of Experimental Psychology: Learn-ing, Memory & Cognition, 10(1), 104-114.

Nosofsky, R. M. (1986). Attention, similarity, and the identificationcategorization relationship. Journal of Experimental Psychol-ogy: General, 115, 39-57.

Nosofsky, R. M. (1991). Tests of an exemplar model for relatingperceptual classification and recognition memory. Journal ofExperimental Psychology: Human Perception & Performance,17(1), 3-27.

Nosofsky, R. M. (1992). Exemplar-based approach to relating cate-gorization, identification, and recognition. In F. G. Ashby (Ed.),Multidimensional models of perception and cognition (p. 363-394). Hillsdale, New Jersey: Lawrence Erlbaum and Associates.

Nosofsky, R. M., & Kantner, J. (2006). Familiarity, study-list ho-mogeneity, and short-term perceptual recognition. Memory &Cognition, 34, 112-124.

Pelli, D. G. (1997). The videotoolbox software for visual psy-chophysics: Transforming numbers into movies. Spatial Vision,10, 437-442.

Pelli, D. G., Robson, J. G., & Wilkins, A. J. (1988). Designinga new letter chart for measuring contrast sensitivity. ClinicalVision Sciences, 2, 187-199.

Peters, R. J., Gabbiani, F., & Koch, C. (2003). Human visual ob-ject categorization can be described by models with low memorycapacity. Vision Research, 43(21), 2265-2280.

Phillips, W., & Christie, D. (1977). Components of visual memory.Quarterly Journal of Experimental Psychology, 29, 117-133.

Phillips, W. A. (1974). On the distinction between sensory storageand short-term visual memory. Perception & Psychophysics, 16,283-290.

Phillips, W. A. (1983). Short-term visual memory. PhilosophicalTransactions of the Royal Society B, 302, 295-309.

Principle, J. C., Euliano, N. R., & Lefebvre, W. C. (2000). Neuraland Adaptive Systems. John Wiley & Sons, Inc.

Riesenhuber, M., Jarudi, I., Gilad, S., & Sinha, P. (2004). Faceprocessing in humans is compatible with a simple shape-basedmodel of vision. Proceeding of the Royal Society of London,Biological Sciences, 271, S448-S450.

Romney, A. K., Brewer, D. D., & Batchelder, W. H. (1993). Predict-ing clustering from semantic structure. Psychological Science,4, 28-34.

Sadr, J., Jarudi, I., & Sinha, P. (2003). The role of eyebrows in facerecognition. Perception, 32, 285-293.

Sigala, N., Gabbiani, F., & Logothetis, N. K. (2002). Visual cat-egorization and object representation in monkeys and humans.Journal of Cognitive Neuroscience, 14(2), 187-98.

Steblay, N., Dysart, J., Fulero, S., & Lindsay, R. C. (2001). Eyewit-ness accuracy rates in sequential and simultaneous lineup pre-sentations: a meta-analytic comparison. Law and Human Be-havior, 25, 459-473.

Page 12: Recognition Memory for Realistic Synthetic Facespeople.brandeis.edu/~sekuler/papers/faceMemory.final.pdfMEMORY FOR SYNTHETIC FACES 3 2002). In geometric terms, faces A-C lie along

12 YOTSUMOTO, KAHANA, WILSON & SEKULER

Torgerson, W. S. (1958). Theory and methods of scaling. J. Wiley& Sons.

Turtle, J., Lindsay, R. C., & Wells, G. L. (2003). Best practice rec-ommendations for eyewitness evidence procedures: New ideasfor the oldest way to solve a case. The Canadian Journal ofPolice and Security Services, 1(1), 5-18.

Valentine, T. (1991). A unified account of the effects of distinctive-ness, inversion, and race in face recognition. Quarterly Journalof experimental Psychology, A, 43(2), 161-204.

Valentine, T., & Bruce, V. (1986). The effects of distinctiveness inrecognising and classifying faces. Perception, 15(5), 525-535.

Valentine, T., & Endo, M. (1992). Towards an exemplar model offace processing: the effects of race and distinctiveness. Quar-terly Journal of Experimental Psychology, A, 44(4), 671-703.

Visscher, K., Kahana, M. J., & Sekuler, R. (2006). Short-term mem-ory for spectrally and temporally complex sounds: Comparingapples to apples. In Proceedings of the cognitive neurosciencesociety. San Francisco.

Weller, S. C., & Romney, A. K. (1988). Systematic Data Collection(Vol. 10). Newbury Park, CA: Sage Publications.

Wexler, K. N., & Romney, A. K. (1972). Individual variationsin cognitive structures. In A. K. Romney, R. M. Shepard, &M. Nerlove (Eds.), Multidimensional Scaling: Theory and Ap-plications in the Behavioral Sciences (2 ed., p. 73-92). SeminarPress.

Wilson, H. R., Loffler, G., & Wilkinson, F. (2002). Synthetic faces,face cubes, and the geometry of face space. Vision Research, 42,2909-2923.

Yotsumoto, Y., Kahana, M. J., Wilson, H. R., & Sekuler, R. (2004).Preliminary studies of recognition memory for synthetic faces(Tech. Rep. No. 2004-3). Visual Cognition Laboratory, VolenCenter for Complex Systems: Brandeis University.

Zhou, F., Kahana, M. J., & Sekuler, R. (2004). Short-term episodicmemory for visual textures: A roving probe gathers some mem-ory. Psychological Science, 153(2), 112-118.