retrieval processes in recognition and cued recall... · retrieval processes 385 as many items as...

30
Journal of Experimental Psychology: Learning, Memory, and Cognition 2001, Vol. 27, No. 2. 384-413 Copyright 2001 by the American Psychological Association, Inc. 0278-7393/01/$5.00 DOI: 10.1037//0278-7393.27.2.384 Retrieval Processes in Recognition and Cued Recall Peter A. Nobel and Richard M. Shiffrin Indiana University Bloomington The present studies used response time (RT) and accuracy to explore the processes and relation of recognition and cued recall. The studies used free-response and signal-to-respond techniques and varied list length and presentation rate. In Experiment 1, the free-RT distributions for recognition had much lower mean and variance than those for cued recall. Similarly, signal-to-respond curves showed fast rates of accumulation of information in recognition and slow rates in recall. (Quantitative models of the results are presented in the companion article by D. E. Diller, P. A. Nobel, and R. M. Shiffrin, 2001). To rule out the possibility that the slower responses in cued recall were due to a fast retrieval process followed by a slow process of cleaning up the retrieved trace for output, additional signal-to-respond tasks provided the relevant alternatives at test. Yet, these conditions showed slow growth rates, similar to those seen in recall. The results support the hypothesis that retrieval processes differ for single-item recognition and cued recall, with retrieval in cued recall (and associative recognition) due to a sequential search. The present studies used both response time (RT) and accuracy to explore the processes and relation of recognition and cued recall. The focus of this article is what is termed explicit or episodic memory. Experiment 1 reports free-response and signal- to-respond results for single-word recognition and cued recall (a quantitative model for these results is provided in the companion article by Diller, Nobel, and Shiffrin [2001]). Experiments 2 and 3 explore the relation between various types of recognition and cued recall and present additional evidence that the retrieval processes used in these paradigms differ qualitatively. Recognition and recall have been studied for many years, but the similarity of the processes that underlie them remains a matter of debate. In the present research project we collected and analyzed distributions of RTs, and accuracy in signal-to-respond proce- dures, in single-word, paired, and associative recognition, and in cued recall, in the hope that such measures would illuminate the similarities and differences between retrieval processes used in these cases. Whether recognition and recall are accomplished by similar retrieval processes is presently an open question. For some vari- ables manipulated in memory paradigms, recognition and recall respond in a similar fashion, whereas for others, the results differ. For example, the accuracy of recognition and recall is improved by increased study time (e.g., Ratcliff & Murdock, 1976; Shiffrin, 1970), decreased list length (e.g., Roberts, 1972), lessened delay, Peter A. Nobel and Richard M. Shiffrin, Department of Psychology, Indiana University Bloomington. Peter A. Nobel is now at Microsoft Corporation, Seattle, Washington, and is now named Peter A. Koss-Nobel. The research in this article was submitted by Peter A. Nobel in partial fulfillment of the requirements for a PhD in psychology at Indiana Uni- versity Bloomington. Funding for this research was provided by National Institute of Mental Health Grant MH12717. Correspondence concerning this article should be addressed to Richard M. Shiffrin, Department of Psychology, Indiana University Bloomington, Bloomington, Indiana 47405. Electronic mail may be sent to shiffrinO indiana.edu. and a shortened distractor task between study and test (e.g., Shep- ard, 1967). In contrast, (a) words that have a higher natural language frequency increase recall accuracy (at least for lists of one frequency; see Gillund & Shiffrin, 1984) but decrease recog- nition accuracy (e.g., Deese, 1960; Hall, 1979); (b) instructions for maintenance rehearsal improve recognition accuracy (Glenberg & Adams, 1978) but do not substantially change recall accuracy (e.g., Dark & Loftus, 1976); and (c) strengthening some list items (either by extra study time or extra repetitions) harms the free recall of other items, does not much alter cued recall of other items, or may even slightly help the recognition of other items (Ratcliff, Clark, & Shiffrin, 1990; Shiffrin, Ratcliff, & Clark, 1990). Whether these findings concerning accuracy of report support similar or different retrieval mechanisms for recall and recognition remains an open question because models of both types have been developed that account for many, if not all, of these results. Another line of evidence that sometimes is used to argue for different retrieval in recall and recognition is based on studies in which a given word is tested successively for recognition and recall (e.g., Flexser & Tulving, 1978). Study typically takes place in a pair setting in which a cue leads with high probability to a response (e.g., glass-VASE). In later tests, a word (e.g., VASE) may not be recognized as having been presented but then is recalled when prompted with the studied cue (e.g., glass). At- tempts have been made to argue for independent retrieval in such cases, but such conclusions are hotly contested, and models for the results include cases in which essentially the same retrieval oper- ation for recall and recognition can produce the results (e.g., Metcalfe, 1991; her composite holographic associative recall model (CHARM) assumes storage of both single-word informa- tion and pair information, with a recall probe accessing the pair information and a recognition probe accessing the single-word information). The difficulty of coming to definitive conclusions on the basis of accuracy results motivated us to take a careful look at RTs. RT measures can be particularly useful in illuminating the similarities and differences between the retrieval mechanisms that operate in recognition and recall. Free recall requires the participant to output 384

Upload: others

Post on 29-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

Journal of Experimental Psychology:Learning, Memory, and Cognition2001, Vol. 27, No. 2. 384-413

Copyright 2001 by the American Psychological Association, Inc.0278-7393/01/$5.00 DOI: 10.1037//0278-7393.27.2.384

Retrieval Processes in Recognition and Cued Recall

Peter A. Nobel and Richard M. ShiffrinIndiana University Bloomington

The present studies used response time (RT) and accuracy to explore the processes and relation ofrecognition and cued recall. The studies used free-response and signal-to-respond techniques and variedlist length and presentation rate. In Experiment 1, the free-RT distributions for recognition had muchlower mean and variance than those for cued recall. Similarly, signal-to-respond curves showed fast ratesof accumulation of information in recognition and slow rates in recall. (Quantitative models of the resultsare presented in the companion article by D. E. Diller, P. A. Nobel, and R. M. Shiffrin, 2001). To ruleout the possibility that the slower responses in cued recall were due to a fast retrieval process followedby a slow process of cleaning up the retrieved trace for output, additional signal-to-respond tasksprovided the relevant alternatives at test. Yet, these conditions showed slow growth rates, similar to thoseseen in recall. The results support the hypothesis that retrieval processes differ for single-item recognitionand cued recall, with retrieval in cued recall (and associative recognition) due to a sequential search.

The present studies used both response time (RT) and accuracyto explore the processes and relation of recognition and cuedrecall. The focus of this article is what is termed explicit orepisodic memory. Experiment 1 reports free-response and signal-to-respond results for single-word recognition and cued recall (aquantitative model for these results is provided in the companionarticle by Diller, Nobel, and Shiffrin [2001]). Experiments 2 and 3explore the relation between various types of recognition and cuedrecall and present additional evidence that the retrieval processesused in these paradigms differ qualitatively.

Recognition and recall have been studied for many years, but thesimilarity of the processes that underlie them remains a matter ofdebate. In the present research project we collected and analyzeddistributions of RTs, and accuracy in signal-to-respond proce-dures, in single-word, paired, and associative recognition, and incued recall, in the hope that such measures would illuminate thesimilarities and differences between retrieval processes used inthese cases.

Whether recognition and recall are accomplished by similarretrieval processes is presently an open question. For some vari-ables manipulated in memory paradigms, recognition and recallrespond in a similar fashion, whereas for others, the results differ.For example, the accuracy of recognition and recall is improved byincreased study time (e.g., Ratcliff & Murdock, 1976; Shiffrin,1970), decreased list length (e.g., Roberts, 1972), lessened delay,

Peter A. Nobel and Richard M. Shiffrin, Department of Psychology,Indiana University Bloomington.

Peter A. Nobel is now at Microsoft Corporation, Seattle, Washington,and is now named Peter A. Koss-Nobel.

The research in this article was submitted by Peter A. Nobel in partialfulfillment of the requirements for a PhD in psychology at Indiana Uni-versity Bloomington. Funding for this research was provided by NationalInstitute of Mental Health Grant MH12717.

Correspondence concerning this article should be addressed to RichardM. Shiffrin, Department of Psychology, Indiana University Bloomington,Bloomington, Indiana 47405. Electronic mail may be sent to shiffrinOindiana.edu.

and a shortened distractor task between study and test (e.g., Shep-ard, 1967). In contrast, (a) words that have a higher naturallanguage frequency increase recall accuracy (at least for lists ofone frequency; see Gillund & Shiffrin, 1984) but decrease recog-nition accuracy (e.g., Deese, 1960; Hall, 1979); (b) instructions formaintenance rehearsal improve recognition accuracy (Glenberg &Adams, 1978) but do not substantially change recall accuracy (e.g.,Dark & Loftus, 1976); and (c) strengthening some list items (eitherby extra study time or extra repetitions) harms the free recall ofother items, does not much alter cued recall of other items, or mayeven slightly help the recognition of other items (Ratcliff, Clark, &Shiffrin, 1990; Shiffrin, Ratcliff, & Clark, 1990). Whether thesefindings concerning accuracy of report support similar or differentretrieval mechanisms for recall and recognition remains an openquestion because models of both types have been developed thataccount for many, if not all, of these results.

Another line of evidence that sometimes is used to argue fordifferent retrieval in recall and recognition is based on studies inwhich a given word is tested successively for recognition andrecall (e.g., Flexser & Tulving, 1978). Study typically takes placein a pair setting in which a cue leads with high probability to aresponse (e.g., glass-VASE). In later tests, a word (e.g., VASE)may not be recognized as having been presented but then isrecalled when prompted with the studied cue (e.g., glass). At-tempts have been made to argue for independent retrieval in suchcases, but such conclusions are hotly contested, and models for theresults include cases in which essentially the same retrieval oper-ation for recall and recognition can produce the results (e.g.,Metcalfe, 1991; her composite holographic associative recallmodel (CHARM) assumes storage of both single-word informa-tion and pair information, with a recall probe accessing the pairinformation and a recognition probe accessing the single-wordinformation).

The difficulty of coming to definitive conclusions on the basisof accuracy results motivated us to take a careful look at RTs. RTmeasures can be particularly useful in illuminating the similaritiesand differences between the retrieval mechanisms that operate inrecognition and recall. Free recall requires the participant to output

384

Page 2: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 385

as many items as possible from the previous list, in any order. Thisis a sequential task that is spread out in time, typically over severalminutes at least, with the time between successive recalls tendingto grow to many seconds as the recall period proceeds. As a result,almost all models of free recall posit a sequential search of mem-ory (e.g., Atkinson & Shiffrin, 1968; Howard & Kahana, 1999;Metcalfe & Murdock, 1981; Raaijmakers & Shiffrin, 1980, 1981;Rohrer & Wixted, 1994; Shiffrin, 1970). However, both cuedrecall and recognition require a single response, so both retrievalby means of sequential search and retrieval by means of a singlestep of parallel retrieval remain plausible theoretical models. Forshorthand, we term these two approaches parallel versus serialretrieval, respectively.

In considering the distinction between recall and recognition,one needs to take into account another theoretical dichotomy: (a)access to a particular identifiable event with some episodic andpersonal content (assumed to underlie most successful recalls) and(b) a much less specific sense of familiarity (or activation, ormatching) without specific content (typically assumed to underlieat least a substantial portion of recognition performance). Forshorthand, we term these approaches event retrieval and familiar-ity, respectively. Parallel and serial models can be combined withevent retrieval and familiarity models into four general subclasses;a model in any of these subclasses could be proposed for eitherrecognition or cued-recall tasks. We are not aware of accuracymeasures sufficient to choose among these various models. In thisarticle, we shall see how the additional consideration of RT mea-sures can more tightly constrain the potential models for each typeof task.

The main datum to be extracted from RTs, as we elaborate indetail below, is the general slowness of cued recall relative torecognition. One interpretation of such results ascribes recall to aslow process like serial search and recognition to a fast processlike parallel retrieval, often termed familiarity. However, it seemsthat there is nothing to prevent whatever goes on in recall fromoccurring on occasion during recognition testing. Thus, somerecognition models propose recall-like processes as one compo-nent in a dual-route retrieval framework. The other component isbased on parallel global access (e.g., Horton, Pavlick, & Moulin-Julian, in press; Jacoby, 1991; Juola, Fischler, Wood, & Atkinson,1971; Mandler, Pearlstone, & Koopmans, 1969; Yonelinas, 1999).It is important to note that dual-route models for recognition can,in principle, be subdivided into two types. In one type (e.g., Juolaet al., 1971), the recall-like route is slower than the familiarityroute. In the other type, the recall-like route (which in recall tasksis extended in time and often quite slow) is truncated or cut off andis therefore much faster when used in recognition tasks than recalltasks. For example, a process of successive sampling and recoverycould (often) be slow when used in recall, due to many samples,but in a recognition task could be truncated to the first sample. Inthis second case, the two routes would differ qualitatively in someaspect other than time (e.g., the two routes may differ with respectto the variables that affect them; see Brainerd, Reyna, & Mojardin,1999; Jacoby, 1991).

Although both empirical results and plausible conceptual argu-ments can be mustered in favor of dual-route models for recogni-tion (at least for those with a truncated recall-like route), mostcurrent quantitative theorists fit recognition data with a singleprocess of parallel activation or parallel matching, sometimes

called "global matching" (e.g., Chappell & Humphreys, 1994; thesearch of associative memory [SAM] model of recall [Gillund &Shiffrin, 1984]; MINERVA [Hintzman, 1988]; CHARM1 [Met-calfe Eich, 1982, 1985]; the theory of distributed associativememory [TODAM; Murdock, 1982]; the matrix model [Pike,1984]; and the retrieving effectively from memory model [REM;Shiffrin & Steyvers, 1997]). The success of such single-routemodels is probably due in substantial part to two facts. First,latency data are often not collected or examined, making it mostdifficult to distinguish single- and dual-route models. Second,latency data may not be that helpful in distinguishing single-routerecognition models from dual-route models in which the recall-likeroute is truncated. In addition, if familiarity values that underlie thesingle-route decisions are highly correlated with recall success,then it might become almost impossible to distinguish single- anddual-route models.2 In fact, if such a correlation exists, participantsmight well choose to base almost all responses on familiarity, evenif a recall-like route were available. These arguments were fine-tuned by Hintzman and Curran (1994); they varied the similarity offoils in recognition and frequency judgment tasks and examinedretrieval dynamics. The results supported the view that most re-sponding was based on a familiarity process but that sometimesfalse alarms to highly similar foils could be inhibited by a (rela-tively) slow recall-like process involving access to specific traceinformation.

It should be noted that the possibility of a truncated recall-likeprocess operating as a second route in recognition does not openup the possibility that such a process could be posited as the soleroute for recognition. Such a model could possibly explain accu-racy and times for correct "old" responses but would have diffi-culty explaining the common finding that RTs for correct "new"responses are roughly equivalent to those for correct "old" re-sponses, a result that was also found in the present studies (also seeRatcliff & Murdock, 1976).

Thus, current theorists favor sequential search models for freerecall (usually based on event retrieval) and parallel global match-ing models for recognition (based on familiarity). The situation isless clear in the case of cued recall: Some models posit sequentialsearch (e.g., Raaijmakers & Shiffrin, 1980, 1981), whereas othersposit parallel retrieval processes akin to those in recognition (e.g.,Hintzman, 1988; Metcalfe Eich, 1982; Murdock, 1982). Our goalin this article was to use RTs and retrieval dynamics to compareand contrast retrieval processes in cued recall and in recognition.

Short-Term Memory and Long-Term Memory

We favor a memory model having distinguishable short-termand long-term components, with different roles and related butdistinguishably different rules for storage and retrieval (e.g., At-kinson & Shiffrin, 1968; Shiffrin, 1993). Therefore, the presentstudies were designed to minimize the likely contribution of re-

1 Technically, CHARM first retrieves by using the probe and thenmatches the result to the probe. We consider these two steps together as theglobal operation.

2 Research like that of Wiseman and Tulving (1975) and Flexser andTulving (1978) includes cases in which the correlation is not very high, butfor recall and recognition tasks like the ones in this article, it is possible thatsuch correlations could be high enough to make this argument plausible.

Page 3: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

386 NOBEL AND SHIFFRIN

trieval from short-term store: Arithmetic was used after list pre-sentations before retrieval, and study-test lags were generally keptquite long. Regardless, the conclusions should remain valid formodels that do not use the short/long-term distinction (e.g.,Dosher, McElree, Hood, & Rosedale, 1989; Hockley & Murdock,1987; Ratcliff, 1978).

Experiment 1

In this article, we present accuracy and RT data from recogni-tion and cued-recall paradigms. In Experiment 1, the participantswere tested concurrently in recognition and cued recall: The par-ticipants studied a list of words without knowing whether thefollowing test would be recognition or recall. The type of test wasindicated at the start of the test period.

RT measures have played a significant role in theorizing onperception, information processing, and cognitive processes (see,e.g., Luce, 1986; Townsend & Ashby, 1983). There is an importantbut smaller literature on data and models of RT in tasks involvinglong-term memory. Studying recognition memory, Ratcliff andMurdock (1976) described a series of four experiments in whichthey explored the relation between accuracy and RT under variousmanipulations (some of their results reprised those reported byMurdock & Anderson, 1975). Their studies allowed retrieval fromboth short- and long-term memory, but long enough lists were usedthat their results contained data of relevance for long-term re-trieval. Analyzing only high-confidence responses, they foundhigher accuracy and shorter RTs for decreased list length andhigher accuracy and longer RTs for slower presentation rates.

There is also a small literature on data and models of RT in tasksinvolving recall from long-term memory. In free-recall tasks, anal-ysis has been restricted mainly to interresponse times (see, e.g.,Murdock & Okada, 1970; Raaijmakers & Shiffrin, 1980; Rohrer &Wixted, 1994). The basic findings show that interresponse timesincrease in a positively accelerated fashion and that, at any givenoutput position, the interresponse time is a good predictor of thenumber of words yet to be recalled. Rohrer and Wixted fittedfree-recall latency distributions with an ex-Gaussian and interre-sponse times at a given output position as a function of the itemsnot yet recalled. Indow (1995) successfully modeled several as-pects of retrieval from long-term memory, including interresponsetimes, with Weibull distributions.

For cued recall, error latency is longer than correct latency (see,e.g., Millward, 1964), and there is a direct relation (positivecorrelation) between correct latency and error rate (see, e.g.,MacLeod & Nelson, 1984). In one approach, the correct and errorlatencies were analyzed separately and interpreted differently:Correct latency was seen as an index of the amount of informationabout the item available in memory, whereas error latency wasviewed as an index of the participant's willingness to continuesearching memory (e.g., Millward, 1964). This approach, ofcourse, presupposes a model involving some sort of sequentialsearch.

Ratcliff and Murdock (1976) stressed the importance of analyz-ing latency data at the level of the entire distribution rather than atthe level of the mean because distributions can provide much moreinformation. For instance, finding a lower mean RT for a particularcondition can be the result of a decrease in overall RT or speedingup of slower responses. Ratcliff (1978) suggested that models

should account for the shape of RT distributions (in particular,their skewness) and specify the relation between speed and accu-racy, despite the difficulty of obtaining reliable distributional data.We followed this advice in the design of the present study.

We used two different paradigms to track the time course ofretrieval. In both, participants took part in a large number ofsessions to enable the collection of data providing reliable char-acteristics of the RT distributions and reliable estimates of thedependence of accuracy on RT for each condition of length andstrength (list lengths were 10 or 40 pairs of words, and presentationtime per pair was two thirds of a second or 2 s). One paradigm useda free-response procedure (i.e., participants were asked to respondas quickly and accurately as possible). The free-response proce-dure is commonly used but is limited by the possibility thatparticipants might adopt different strategies in recognition andcued recall, such as differing biases to respond quickly in the twotasks. To control for possible strategy differences of this sort, inthe second paradigm we used a signal-to-respond procedure inwhich the participants were told to withhold responses until asignal was given and then to respond within a fixed short windowof time. Measuring accuracy at each of a range of controlledprocessing times allows independent assessment of, on the onehand, the dynamics of retrieval and, on the other hand, asymptoticperformance (i.e., the highest achievable accuracy; see, e.g.,Dosher, 1981, 1984a, 1984b; Wickelgren & Corbett, 1977). Thesame participants took part in both paradigms (on different days),each using similar study conditions.

To compare RTs for cued recall and recognition, it is essentialfor one to eliminate as much as possible artifactual or peripheralsources of differences, such as the use of one of two response keysto indicate an "old" or a "new" response in recognition, and thesequential use of 26 keys to type a response in recall. To equate thedemand characteristics of the tasks as much as possible, we had theparticipants give a verbal response that triggered a voice key; thisresponse latency was used as the basis for all measures of RT.Immediately after the verbal response, the participants provided atyped response that was used to assess accuracy.

Method

Participants

Ten Indiana University Bloomington graduate students received finan-cial compensation for their participation in the experiment. There were atotal of 30 experimental sessions: 15 for the free-response procedureand 15 for the signal-to-respond procedure. In addition, there were 1 or 2training sessions for signal to respond. Five students started with freeresponse and, after completion of 15 such sessions, moved on to thesignal-to-respond sessions. (One of these students left the experiment formedical reasons after finishing the full free-response part of the experi-ment.) Five students started with 15 experimental sessions of signal-to-respond and then moved on to the free-response sessions.

Materials and Apparatus

The stimuli were high-frequency English nouns (ranging from 21to 1,567 occurrences per million) from three to nine letters long (Francis &Kucera, 1982). Fifty words were used on two practice lists to start the 1stexperimental session, and 880 words were used in each experimentalsession. No word was studied on more than one list in a session, no wordwas tested more than once in a session, and a studied word could be tested

Page 4: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 387

in a session only in the test period immediately following its study list. Thesame pool of 880 words was permuted and used again for each session ofthe study.3 The words were displayed on IBM-compatible PCs. RTs wererecorded using a voice-activated response box; subsequent typed responsesused to verify accuracy were entered on the keyboard.

Design and Procedure

Each session was divided into 16 study-test cycles. During study,participants saw a list of pairs of words. All pairs on a given list werepresented at the same presentation rate. Varied between lists were thelength of the list (10 or 40 pairs) and the presentation time (670 or 2,000ms per pair). After presentation of the list, a simple arithmetic task(addition) was given for 16 s. Participants had 10 s to type their additionanswer on the keyboard.

Two test conditions were defined by the instructions given after thearithmetic period and at the start of the test list; the same type of test wasused for the entire test period for a given list. For both types of tests, singlewords were presented successively. In the recognition task, participantswere asked to differentiate between an old word (for old tests, the right orleft word from a pair was tested with equal frequency) and a new word. Inthe cued-recall task, one word from a study pair was given (the right or leftmember of a pair was tested with equal frequency), and participants had totry to generate the other word of the pair. A word presented for test alwayswas presented centered on the screen. Each of the four lists that had aparticular combination of length and strength was tested twice in eachcondition in each session.

Recognition test lists consisted of 20 words, regardless of the length ofthe studied list: 10 old words and 10 new words were tested, randomlyintermixed. Half of the old words tested had been studied on the right sideof a pair, and half had been studied on the left side. Only 1 word from agiven pair could be tested. In cued recall, the number of test words was 10,regardless of the length of the studied list. Thus, for both conditions, for theshort lists, 1 word from each pair was tested. Also for both conditions, forhalf of the long lists the first 10 pairs were tested, and for the other half ofthe long lists the last 10 pairs were tested. After the word pool waspermuted for each session, the 16 study-test cycles were generated, andtheir order was randomized.

On each trial, the participants had to give a verbal response through amicrophone headset that triggered a voice-activated response box (thesensitivity of which was adjusted as needed throughout training). This wasfollowed by a typed response on the keyboard (one of two keys forrecognition or the response word for recall). The verbal response was usedto obtain RTs, and the typed responses were used to assess accuracy.Participants were under the impression that their verbal responses werebeing recorded (though this was not in fact the case). They were told theyhad to type the response corresponding to their verbal response (even ifthey retrieved information from memory after the verbal response andbefore the typed response that would suggest another answer). Spot check-ing of each participant revealed what appeared to be universal compliancewith this instruction.

The verbal response in recognition was "P—yes" when the test wordwas recognized, and "P—no" if it was not recognized. This verbal responsewas followed by typing an F for positive responses or a J for negativeresponses. In cued recall, the verbal response was "P—word" (i.e., aparticular word was pronounced after the "P" sound) when the participantwas able to recall the other word of the pair (these responses could becorrect or could be intrusions) and "P—no" if the participant could notrecall an answer in the time allowed or gave up. This was followed bytyping the recalled word after a positive response or a J after a negativeresponse.

The "P" sound was inserted at the beginning of the verbal response toequate the onset times for different phonemes. Differences as large as 150

ms in initial phonemes have been reported (see, e.g., Pechmann, Reetz, &Zerbst, 1989). In all cases, if the microphone was triggered by anythingelse besides a valid "P—" response (e.g., breathing, extraneous noise, or"P—" followed by a pause), the participants were instructed to press a keylabeled trash, which marked the trial as invalid. Participants were in-structed and trained to respond without a pause between the "P" sound andthe following response; spot checking revealed universal compliance withthese instructions.

In the free-response sessions, participants were asked to respond ver-bally as quickly and as accurately as possible after presentation of the testitem and were given a maximum of 5 s to do so. If no verbal response hadbeen given after 4,500 ms, a brief tone was given (30 ms, 500 Hz) toindicate that a response had to be made within the next 500 ms. There wasno time limit for the typed response. At the end of each trial, feedback onaccuracy was given. After finishing the test list, participants receivedfeedback on proportion correct and mean overall RT for correct responses(such feedback helped maintain motivation).

In the signal-to-respond sessions, participants were instructed not torespond until a response signal (30 ms, 500 Hz) was given and then torespond verbally within 300 ms after the tone but not before 50 ms afterthe tone.4 The lag between onset of the test item and the signal wasvariable. Ten different lags were used: 100, 200, 300, 400, 550, 800,1,100, 1,400, 2,500, and 4,500 ms. The lags were assigned randomly tothe test trials in cued recall; in recognition, each lag was assignedrandomly to an old test word and a new test word. At the end of eachtrial, feedback was given for accuracy, and RT feedback was given ifthe response was too fast or too slow. After finishing the test list,participants were given feedback on proportion correct and mean over-all RT for correct responses.

3 Ratcliff and Murdock (1976) used repeated words in many studies,as we did in the present experiment. In a previous study with a paradigmvery similar to that of Experiment 1 in this article, we contrasted acondition that used a pool of 6,000 different words, of lower frequency,with a condition that repeated words, of higher frequency, from sessionto session. The data showed very similar patterns for the two conditions.Although the differences were not large, recognition performance (d')was better for the nonrepeated (low-frequency) words, and recall wasbetter for the repeated (high-frequency) words, as typically noted in theliterature. The recall advantage for repeated words occurred in the faceof any proactive interference that developed across sessions, but repe-titions could have led to a compensating increase in the effectiveness ofcoding techniques for individual words. (This experiment was notsuitable for the present purposes because the study times we used [2 svs. 6 s] and the list lengths [10 words vs. 20 words] did not producesignificant differences in accuracy.) In any event, we saw nothing inthis prior experiment to provide a compelling reason to use nonrepeatedwords in the present experiment. Most important, because the design ofthe present study would have required about 28,000 words if all were tobe different, it would have been difficult or impossible to find such anumber of words that were reasonably uniform in frequency, length,and type (e.g., nouns). We therefore decided to use a smaller set ofwords, repeated across sessions.

4 After giving a negative response in cued recall in the signal-to-respondconditions, participants were given a second chance to type in theirresponse. This feature was suggested by some participants after a pilotstudy in which the difficulty with cued recall at the short lags causeddistress; motivation increased considerably with a second opportunity forthe participants to show that they could recall if given enough time. Thesesecond-chance responses were not analyzed.

Page 5: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

388 NOBEL AND SHIFFRIN

Results5

In this article, we report the results relevant for the overallcomparison of recognition with cued recall, including accuracy,mean RTs, and certain summary distributions. The companionarticle by Diller et al. (2001) presents the distributions for theindividual conditions and the predictions from the associatedmodel. All results reported in this article are significant at the .005level or lower, unless otherwise indicated.

Free Response

Table 1 presents free-response accuracy for recognition andcued recall. Table 2 presents intrusion results for cued recall.Tables 3 and 4 present free-RT measures of central tendency andstandard deviation for recognition and cued recall, respectively.Figure 1 presents summaries of the free-RT distributions for rec-ognition and cued recall, averaged across conditions of length andstrength. As suggested by Tables 3 and 4 (and as discussed indetail by Diller et al., 2001), the RT distributions differed verylittle across variations in length and strength, so the summaries inFigure 1 are highly representative and sufficient for the compari-sons of recall and recognition.

Trials that were marked invalid by the participants were ex-cluded from all analyses, as were trials on which no response wasgiven within the 5-s interval. For recognition, about 0.03%and 1.54% were excluded, respectively, and in total, approxi-mately 1.6% of the responses were excluded; for cued recall,about 0.85% and 2.24% were excluded, respectively, and in total,about 3.1% of the responses were excluded.

Free-Response Accuracy

Recognition accuracy. Table 1 provides the probabilities of anold response for the four length-strength conditions, for length andstrength separately, and for targets (hits) and distractors (falsealarms); it also presents a" (calculated for each participant and thenaveraged). Analyses of variance (ANOVAs) revealed significant

Table 1Accuracy Results for Experiment 1 Recognitionand Cued Recall (Free Response)

Table 2Intrusion Results for the Experiment 1 Free-ResponseCued-Recall Condition

Condition

10-67010-2,00040-67040-2,00010 pairs40 pairs670 ms2,000 ms

HI

.69

.83

.65

.78

.76

.72

.67

.80

Recognition

FA

.21

.14

.31

.25

.17

.28

.26

.19

d'

1.402.150.971.581.781.281.191.86

Recall

CO

.17

.45

.11

.34

.31

.23

.14

.39

IN

.10

.08

.06

.08

.09

.07

.08

.08

Note. The conditions in the first four rows represent list length (10 or 40pairs) followed by presentation time (670 or 2,000 ms). The data for thefifth and sixth rows are for the two list lengths collapsed across presenta-tion time, and the data for the last two rows are for the two presentationtimes collapsed across list length. HI = hits; FA = false alarms; CO =corrects; IN = intrusions.

Category Proportion

Current listPrevious listSemantic—targetSemantic—cueUnknown

.391

.276

.114

.068

.152

effects for length and presentation time on all accuracy measures.Performance on the short lists was better than on the long lists: ford', F(l, 9) = 106.63, MSE = 0.35; for hits and misses, F(l,9) = 14.57, MSE = 0.34; for correct rejections and false alarms,F(l, 9) = 101.26, MSE = 0.31. In a similar manner, performanceon lists with slow rates of presentation was better than on lists withfast rates of presentation: for d', F(\, 9) = 38.51, MSE = 1.79; forhits and misses, F(l, 9) = 47.19, MSE = 1.07; for correct rejec-tions and false alarms, F(l, 9) = 18.41, MSE = 0.73. The manip-ulations of length and strength had substantial effects on recogni-tion performance: a 39% increase in d' for the shorter lists and a56% increase in d' for the stronger lists. The accuracy results forvariables that we regarded as being of secondary importance(variables that are not included explicitly in the models of Diller etal., 2001) may be found at the following website: http://www.psych.indiana.edu/publications.html. In short, there were differ-ences across participants and serial positions but not across ses-sions, list position within session, or test position.

Recall intrusion analysis. Trials on which the participantsgave a positive verbal response (i.e., "P—word") followed by anincorrect typed response were labeled as intrusions. Inspection ofthe intrusions showed that some were spelling errors (nonwordsthat matched the correct answer in most letters); these were re-corded as correct responses (a total of 10.1% of all intrusions). Theremaining intrusions were divided into five categories (in the orderlisted; when an intrusion fell into more than one category, it wascounted once only, in the first of the following categories): (a)presented on the current list; (b) presented on a previous list; (c)semantically related to the target (e.g., if the target was "pistol"and the response was "gun"); (d) semantically related to the cue(this happened mostly for word pairs with strong existing associ-ations; e.g., "cow-milk"); or (e) unknown, if the response did notfit any of the previous categories. Table 2 shows the proportion ofintrusions in each category. Most of the intrusions (68%) wereepisodic, representing words from the current or previous lists.

Recall accuracy. Table 1 gives the probabilities of correctresponses and of intrusions for the four length-strength conditionsand for length and strength separately. ANOVAs revealed signif-icant effects for length and presentation time on most accuracymeasures. Performance on the short lists was better than on thelong lists: for correct responses, p(c), F(l, 9) = 59.22,MSE = 0.36; for intrusions, F(l, 9) = 11.79, MSE = 0.08; forgive-ups, F(l, 9) = 111.43, MSE = 0.28. In a similar manner,

5 The data for this experiment (and subsequent experiments) are avail-able from the authors on request.

Page 6: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 389

Table 3

Experiment 1 Recognition: Response Time (RT) Central

Tendency Measures for the Free-Response Conditions

Response

Condition

10-67010-2,00040-67040-2,00010 pairs40 pairs670 ms2,000 ms

10-67010-2,00040-67040-2,00010 pairs40 pairs670 ms2,000 ms

10-67010-2,00040-67040-2,00010 pairs40 pairs670 ms2,000 ms

10-67010-2,00040-67040-2,00010 pairs40 pairs670 ms2,000 ms

HI

785 (280)760 (256)789 (273)796 (292)772 (268)793 (283)787 (277)778(275)

1.391.431.391.401.411.391.391.41

726706729730716729727718

1.401.441.401.391.421.401.401.42

FA

Mean RT

1,004 (322)1,017 (359)

971 (341)1,015 (400)1,009 (337)

990 (369)984 (334)

1,016 (386)

Mean 1/RT

1.171.091.201.111.141.161.191.10

Median RT

983988924969985946953978

Median 1/RT

1.081.081.141.091.081.111.111.08

CR

838 (286)834 (293)846 (284)862 (312)836 (290)855 (299)842 (285)847 (302)

1.301.301.291.271.301.281.291.28

789774793801782797791787

1.301.321.291.281.311.281.291.30

MI

965 (352)1,008 (389)

928 (295)999(370)980 (366)955 (328)945 (323)

1,003 (378)

1.141.101.181.111.131.151.161.11

909940872957924914890949

1.141.111.181.101.121.141.161.11

Note. Values were computed after adjustment of the data for nuisancevariables. For each set of conditions, the first four rows represent list length(10 or 40 pairs) followed by presentation time (670 or 2,000 ms). The datafor the fifth and sixth rows are for the two list lengths collapsed acrosspresentation time, and the data for the last two rows are for the twopresentation times collapsed across list length. Standard deviations are inparentheses. HI = hits; FA = false alarms; CR = correct rejections; MI =

performance on lists with slow rates of presentation was betterthan on lists with fast rates of presentation: for />(c), F(l,9) = 38.76, MSE = 4.75, for give-ups, F(l, 9) = 36.05,MSE = 5.24. The manipulations of length and strength had sub-stantial effects on recall performance: a 35% increase in p(c) forthe shorter lists and a 179% increase in p(c) for the stronger lists.The accuracy results for variables regarded as being of secondaryimportance (variables not modeled in Diller et al., 2001) are givenon the following website: http://www.psych.indiana.edu/publica-

tions.html. There were significant differences between participantsand across serial positions but no interactions.

Free-Response Times

The RT results for the variables of secondary importance, forboth recognition and recall, are also given on the following web-site: http://www.psych.indiana.edu/publications.html. The differ-ences that were found (e.g., differences among participants) werenot of much importance themselves. However, because these vari-

Table 4Experiment 1 Cued Recall: Response Time (RT) Central

Tendency Measures for the Free-Response Conditions

Response

Condition CO IN GU

MeanRT

10-67010-2,00040-67040-2,00010 pairs40 pairs670 ms2,000 ms

10-67010-2,00040-67040-2,00010 pairs40 pairs670 ms2,000 ms

10-67010-2,00040-67040-2,00010 pairs40 pairs670 ms2,000 ms

10-67010-2,00040-67040-2,00010 pairs40 pairs670 ms2,000 ms

1,486 (863)1,516(835)1,494(789)1,649(854)1,507(843)1,610 (840)1,489(834)1,573(846)

Mean

0.840.880.810.740.870.760.830.82

2,549 (1,001)2,934(1,066)2,666 (891)2,900(1,011)2,725 (1,048)2,799 (967)2,594 (960)2,917 (1,038)

1/RT

0.490.630.490.410.550.440.490.52

Median RT

1,2621,3451,3791,4731,3051,4301,3161,407

Median

0.890.810.800.740.850.770.850.77

2,5312,7162,6472,7372,6192,6932,5832,726

1/RT

0.450.440.410.420.450.410.430.43

2,111(1,012)2,656(1,141)1,960 (919)2,475 (1,058)2,324(1,097)2,172(1,011)2,031 (966)2,556(1,099)

0.730.640.670.520.690.610.700.57

1,8032,7281,6982,3892,2642,0431,7502,558

0.590.430.620.470.510.540.610.45

Note. Values were computed after adjustment of the data for nuisancevariables. For each set of conditions, the first four rows represent list length(10 or 40 pairs) followed by presentation time (670 or 2,000 ms). The datafor the fifth and sixth rows are for the two list lengths collapsed acrosspresentation time, and the data for the last two rows are for the twopresentation times collapsed across list length. Standard deviations are inparentheses. CO = corrects; IN = intrusions; GU = give-ups.

Page 7: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

390 NOBEL AND SHIFFRIN

1000 2000 3000 4000

Reaction Time (ms)

5000

UO

I)J

10 -

0.3 -

0.2 -

0.1 -

0.0 -

B1

/ \

A

1.

i—

n0m

1—r- A

— rA= .23= 998= 357= 965

1

P =u =i0 =m =

Ml26367346319

P(i

0

m

INI

= .08 "= 2758= 1013= 2653

1000 2000 3000 4000

Reaction time (ms)

5000

Figure 1. Experiment 1 response time (RT) distributions conditionalizedon correct responses (A: hits [HI], correct rejections [CR], and correct cuedrecalls [CO]) and incorrect responses (B: false alarms [FA], misses [MI],and recall intrusions [IN]). Data are collapsed across length and strength.The abscissa has RT in intervals of 50 ms. The ordinate has relativefrequency of responses in each RT interval. Each panel includes proportionof responses of that type (p), mean RT (fi), standard deviation (cr), andmedian RT (m).

ables can and did distort the shapes of the RT distributions (andbecause we fitted these distributions quantitatively in Diller et al.,2001), we adjusted the RTs to remove the effects of such variables.In the following sections, we present analyses based on the ad-justed RTs. Appendix A gives a detailed description of the adjust-ment method. It is important to point out that none of the patternsof data exhibited and discussed in this article and none of thestatistical effects are changed if the raw data are analyzed instead.(The tables present results for the adjusted RTs; the tabled RTmeans were unaffected by the adjustment procedures, but thetabled RT standard deviations were a little lower than for the rawdata.)

In keeping with traditional measures, we analyzed not onlydistributions but also measures of central tendency. When nuisancevariables are left in the data, different measures of central tendencycan produce quite different results (e.g., Ratcliff, 1993; Ulrich &Miller, 1994). For the adjusted data, the different measures tellmuch the same story. Nonetheless, we analyzed the adjusted datausing means and medians and did so for both RT and 1/RT. Theuse of 1/RT was suggested by Ratcliff (1993), who showed that

when variability among participant means was low relative to thestandard deviation of the distribution (as is true for the adjusteddata), the inverse transformation (1/RT) was always close to theoptimal cutoff for outliers and that the median provided an accu-rate estimate of the location of the distribution. As measures ofvariability, we focused on standard deviations, assuming that theadjustment techniques would prevent any significant distortions ofthe results.

Recognition RT. Table 3 presents the RT results (mean RTs,standard deviations, mean 1/RTs, median RTs, and median 1/RTs)for the four length-strength conditions, for length and strengthseparately, and for each of the possible responses: hits, falsealarms, correct rejections, and misses. The most surprising aspectof these results is the similarity across variations in length andstrength. Only a few of the comparisons among means werestatistically significant: Mean RT for hits was significantly higheron the long lists than on the short lists, F(l, 9) = 9.94,MSE = 0.09, p = .012, and mean RT for misses was lower on thefast lists than on the slow lists, F(l, 9) = 7.30, MSE = 0.28, p =.024; none of the other comparisons yielded statistically significantresults. In addition, a few comparisons among medians werestatistically significant: Median RT for false alarms was signifi-cantly higher on the long lists than on the short lists, F(l,9) = 11.89, MSE = 0.02, p = .007, and median RT for misses waslower on the fast lists than on the slow lists, F(l, 9) = 6.86,MSE = 0.07, p = .028; none of the other comparisons weresignificant. Analysis of 1/RT may be better justified, and in thiscase no significant differences for any comparisons across lengthand strength conditions were found for the means; for the medians,there was a significant difference for false alarms for list length,F(l, 9) = 8.62, MSE = 0.02, p = .017.

The striking similarities of measures of central tendency (and ofstandard deviations) across length and strength conditions aremirrored in the plots of the entire RT distributions, although thegraphs of the distributions for the various conditions are presentedin the companion article by Diller et al. (2001). The plots of thedistributions were constructed by cumulating observations in 50bins of size 100 ms (thereby covering the range from 0 to 5 s). Thegraphs of the RT distributions therefore give 50 proportions thatsum to 1.0; in each 100-ms bin is the proportion of all responsesof that type that had times within that interval. In addition, forconvenience, all graphs of distributions contain tabular informa-tion about the proportion of responses of that type (p), mean RT(jx), standard deviation (cr), and median RT (m).

The distributions were compared statistically with the nonpara-metric Kolmogorov-Smirnov (K-S) test for differences betweenthe cumulative distributions. Because the number of observationsin each distribution was quite large, this test was very sensitive.List-length differences were found for hits (D = .061) and correctrejections (D = .050). Presentation time differences were foundfor misses (D = .069). The other five comparisons were notsignificantly different statistically. The similarity of the RT distri-butions across variations in length and strength is particularlystriking when contrasted with marked differences in accuracyacross those same conditions. Such results may have importantpotential implications for theory, as taken up by Diller et al.(2001).

Figure 1 (summing across length and strength) illustrates thesmall differences found between the distributions for positive and

Page 8: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 391

negative recognition responses (solid lines vs. large dashes). Whenthe responses were correct (top panel; hits and correct rejections),the distributions were similar but not quite identical: Hits wereabout 60 ms faster than correct rejections, F(l, 9) = 2,728.30,MSE = 0.01, and F(l, 9) = 116.35, MSE = 0.51, for 1/RT.Differences were found for median RT, F(l, 9) = 327.65,MSE = 0.01, and the K-S test, D = .153. When the recognitionresponses were incorrect (lower panel; false alarms and misses),the distributions were again similar but not identical: False alarmswere about 30 ms slower than misses, F( l , 9) = 45.72,MSE = 0.03, for RT, but they did not differ for 1/RT. In addition,differences were found for median RT, F(l , 9) = 29.29,MSE = 0.02, and the K-S test, D = .068.

A comparison of the upper to lower panels in Figure 1 illustratesthe distributional differences for correct and incorrect responses,summed across length and strength conditions (recognition distri-butions are on the left of each panel). These differed on allmeasures. For targets (solid lines), there were large differencesbetween hits and misses. Hits were about 180 ms faster thanmisses: F(l, 9) = 3,746.65, MSE = 0.02, for RT; F(l, 9) =224.61, MSE = 0.71, for 1/RT; F(l, 9) = 1,220.16, MSE = 0.01,for median RT; and D = .346 for the K-S test. For distractors(large dashes), large differences between correct rejections andfalse alarms were found. Correct rejections were about 150 ms

faster than false alarms: F(l, 9) = 1,555.91, MSE = 0.03, for RT;F(l, 9) = 52.62, MSE = 0.77, for 1/RT; F(l, 9) = 476.18,MSE = 0.02, for median RT; and D = .266 for the K-S test. Therelation between accuracy and RT for the four separate conditionsis easier to see in an alternate plotting of the data: The top leftpanel of Figure 2 shows d' as a function of RT deciles for the shortlists for each presentation time; the bottom left panel shows this forthe long lists. In all conditions, slow responses were considerablyless accurate than fast responses.

We looked as well at other aspects of the RT distributions thatvarious investigators have suggested could be of importance. Forexample, the hazard functions have a shape that is fairly commonfor data from other sorts of tasks, such as simple RT, for which theunderlying distributions are ex-Gaussian (e.g., Luce, 1986, p. 134;Van Zandt & Ratcliff, 1995). An example of such a hazardfunction is given in Appendix B.

Recall RT. As with recognition, analyses were based on theadjusted data. With one minor exception discussed in Appendix A,none of the patterns of data exhibited and discussed subsequentlyand none of the statistical effects are changed if the raw data areanalyzed instead.

Table 4 presents the RT results (mean RTs, standard deviations,mean 1/RTs, median RTs, and median 1/RTs) for the four length-strength conditions, for length and strength separately, and for

3 - -

I 2Q.

1 " -

0 - -

10 pairs670 ms2000 ms

40 60

Percentile (RT)

80 100 40 60

Percentile (RT)

80 100

3 - -

•= 29-Q

1 - -

0 - -

I 1

40 pairs—o— 670 ms—v— 2000 ms -

20 40 60

Percentile (RT)

100 40 60

Percentile (RT)

80 100

Figure 2. Relation between accuracy and response time (RT) in free response in Experiment 1. Left panels forrecognition; right panels for recall. Performance as a function of RT deciles (Panels A: short lists; Panels B: longlists).

Page 9: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

392 NOBEL AND SHIFFRIN

each of the possible responses: corrects, intrusions, and give-ups.Comparisons of the means revealed several statistically significantresults: Mean RT for give-ups was significantly higher on the shortlists than on the long lists, F(l, 9) = 12.41, MSE = 4.00; mean RTwas lower on the fast lists than on the slow lists for corrects, F(l,9) = 5.87, MSE = 1.33, p = .038; intrusions, F(l, 9) = 10.48,MSE = 1.43, p = .010; and give-ups, F( l , 9) = 30.06,MSE = 17.94; the other comparisons did not yield significantresults. In addition, a few comparisons among medians werestatistically significant: Median RT for corrects was significantlyhigher on the slow lists than on the fast lists, F(l, 9) = 6.61,MSE = 0.18, p = .030; median RT for give-ups was lower on thelong lists than on the short lists, F( 1,9) = 11.03, MSE = 0.68, p =.009; and median RT for give-ups was lower on the fast lists thanon the slow lists, F(l, 9) = 31.98, MSE = 3.07; none of the othercomparisons were significant. Analysis of 1/RT may have beenbetter justified, and in this case no significant differences for anycomparisons across length and strength conditions were found forthe means; for the medians, there were significant differences forcorrects for list length, F(l, 9) = 8.07, MSE = 0.10, p = .019, andpresentation time, F(l, 9) = 16.01, MSE = 0.05, and for give-upsfor presentation time, F(l, 9) = 34.42, MSE = 0.11.

Although differences were found in the measures of centraltendency across length and strength conditions, the RT distribu-tions nonetheless overlapped considerably; although averagingacross length and strength was not quite as well justified as was thecase for recognition, we did so to allow ready comparison with therecognition results. Figure 1 shows the summary RT distributions(conditionalized for giving a particular response) for correct re-sponses and intrusions. The distributions for give-ups were gen-erally slow and skewed toward slow responses (the plots arepresented in Diller et al., 2001).

The distributions for each length-strength condition were com-pared statistically with the nonparametric K-S test for differencesbetween the cumulative distributions. Five out of six comparisonswere significantly different. List-length differences were found forcorrects (D = .094) and give-ups (D = .069). Presentation timedifferences were found for corrects (D = .081), intrusions (D =.160), and give-ups (D = .231). The considerable overlap of theRT distributions across variations in length and strength is partic-ularly striking when contrasted with the large differences in accu-racy across those same conditions. The relation between accuracyand RT for the four conditions is further explored in Figure 2. Thetop right panel shows the proportion of corrects as a function of RT

deciles for the short lists for each presentation time; the bottomright panel shows the proportion of corrects as a function of RTdeciles for the long lists. In all conditions, slow responses wereconsiderably less accurate than fast responses.

We looked as well at other aspects of the RT distributions thatvarious investigators have suggested could be of importance. Forexample, the hazard functions for the intrusion and give-up datahad a shape that is fairly common for data from other sorts of tasks,such as psychophysical judgments (e.g., Luce, 1986, p. 130). Anexample of such a hazard function is given in Appendix B.

Comparison of Recognition and Recall RTs

The recognition results were similar in pattern to the cued-recallresults, to a first degree of approximation: Despite the large dif-ferences in accuracy, the RT distributions changed little as afunction of length and strength. These length and strength findingsare discussed and modeled in Diller et al. (2001).

The between-paradigm comparisons, however, revealed largeRT differences. Table 5 presents the RT results for measures ofcentral tendency (mean RTs, standard deviations, mean 1/RTs,median RTs, and median 1/RTs), collapsed over length andstrength conditions, for each of the possible responses: (a) hits,false alarms, correct rejections, and misses in recognition and (b)corrects, intrusions, and give-ups in recall. Statistical comparisonsof the means yielded significant results: Mean RT in recall wassignificantly higher than in recognition—for corrects and hits, F(l,9) = 28,541.73, MSE = 0.04, and for intrusions and false alarms,F(l, 9) = 11,234.53, MSE = 0.17. In addition, the comparisonsamong medians were statistically significant: Median RT for cuedrecall was significantly higher than for recognition—for correctsand hits, F(l, 9) = 767.53, MSE = 0.15, and for intrusions andfalse alarms, F(l, 9) = 1,294.22, MSE = 0.53. Analysis of 1/RTmay have been better justified, and in this case all comparisonsyielded significant results for both mean 1/RT—for corrects andhits, F(l, 9) = 524.32, MSE = 1.38, and for intrusions and falsealarms, F(l, 9) = 105.28, MSE = 2.50—and median 1/RT—forcorrects and hits, F(l, 9) = 2,228.34, MSE = 0.05, and forintrusions and false alarms, F(l, 9) = 2,234.84, MSE = 0.05.These large differences in measures of central tendency weremirrored in the standard deviations, which were significantlylarger for the larger means (see Table 5), although statisticalanalysis was truncated for brevity.

Table 5Response Time (RT) Summary Results for Experiment 1 (Free Response)

Recognition Recall

Condition

Mean RTStandard deviationMean 1/RTMedian RTMedian 1/RT

HI

782276

1.40723

1.41

FA

998357

1.15965

1.10

CR

845294

1.29789

1.30

MI

967346

1.14919

1.13

CO

1,551843

0.821,364

0.81

IN

2,7581,013

0.512,653

0.43

GU

2,2421,054

0.652,153

0.53

Note. RTs are collapsed across length and strength conditions. HI = hits; FA = false alarms; CR = correctrejections; MI = misses; CO = corrects; IN = intrusions; GU = give-ups.

Page 10: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 393

The differences between recognition and cued-recall RTs areseen even more clearly in the plots of the RT distributions, asillustrated in Figure 1. The distributions look markedly differentfor recall and recognition (confirmed statistically with nonpara-metric K-S tests, ps < .001). Clearly, the cued-recall distributionsare shifted to the right, are more skewed to the right, and havehigher variance.

Appendix B contains examples of hazard functions (the proba-bility that an event will occur given that it has not occurred yet).It may be noteworthy that the shapes were quite different forrecognition (increasing then decreasing) and recall (increasing).Van Zandt and Ratcliff (1995) discussed the difficulty of drawingconclusions from hazard functions, partly because the interestingpart of the shape (the part after the rise to an initial peak) isdetermined by only a small part of the RT data in the tails of thedistributions. In addition, they demonstrated (and referred to arelevant literature) that mixtures of processes with different ratesoften produce nonmonotonic hazard functions, even when eachcomponent of the mixture has an increasing hazard function. Giventhat the hazard functions for both recognition and recall are likelyto be the result of mixtures, the different shapes for recognition andrecall may be another indication of different retrieval processes forthe two tasks. This finding could be the basis for a more thoroughinvestigation in the future, but for this article, we are satisfiedmerely to call attention to the difference.

Signal-to-Respond

Trials that were marked invalid by the participants were ex-cluded from all analyses (about 0.7% in recognition and 1.0% inrecall). An acceptance range for RTs between 25 and 300 msexcluded 17.4% of responses in recognition and 34.0% of re-sponses in recall, and a range between 25 and 400 ms ex-cluded 3.6% of responses in recognition and 9.6% of responses inrecall. Because none of the qualitative patterns or trends reportedhere and none of the statistical analyses changed for these tworanges, we decided to use responses that fell below the upperbound of our instructions, and we report observations with RTs inthe range of 25 to 300 ms.

Asymptotic Accuracy

We defined asymptotic accuracy as the average performanceover the three longest lags (1,400, 2,500, and 4,500 ms), becausethe performance seemed to have stabilized by this time (e.g.,Dosher, 1984a). Table 6 shows the asymptotic results (averagedover the three longest lags) for the four length-strength conditionsand for length and strength separately. For recognition, Table 6gives the probability of an old response for targets (hits), distrac-tors (false alarms), and a"; for cued recall, Table 6 gives theprobability of a correct response and an intrusion.

For recognition, ANOVAs showed significant effects for lengthand presentation time on all accuracy measures. Performance onthe short lists was better than on the long lists: for a", F(l,8) = 39.75, MSE = 0.27; for hits and misses, F(l, 8) = 13.55,MSE = 0.01, p = .006; and for correct rejections and false alarms,F(l, 8) = 70.80, MSE = 0.01. In a similar manner, performance onlists with slow rates of presentation was better than on lists withfast rates of presentation: for d', F(l, 8) = 20.31, MSE = 0.69; for

Table 6Asymptotic Accuracy Results for the Signal-to-RespondCondition for Experiment 1

Condition

10-67010-2,00040-67040-2,00010 pairs40 pairs670 ms2,000 ms

HI

.74

.83

.64

.79

.78

.71

.69

.81

Recognition

FA

.20

.09

.28

.19

.15

.24

.24

.14

response

d!

1.632.381.021.721.941.321.272.02

Cued-recall

CO

.25

.58

.16

.39

.42

.27

.21

.49

response

IN

.07

.08

.07

.08

.07

.07

.07

.08

Note. All values were computed by first averaging across participants andsessions and then averaging across the longest three lags. The conditions inthe first four rows represent list length (10 or 40 pairs) followed bypresentation time (670 or 2,000 ms). The data for the fifth and sixth rowsare for the two list lengths collapsed across presentation time, and the datafor the last two rows are for the two presentation times collapsed across listlength. HI = hits; FA = false alarms; CO = corrects; IN = intrusions.

hits and misses, F(l, 8) = 12.07, MSE = 0.03, p = .008; and forcorrect rejections and false alarms, F(l, 8) = 14.07, MSE = 0.02,p = .006.

For cued recall, performance on the short lists was better than onthe long lists: for corrects, F(l, 8) = 44.37, MSE = 0.25, and forgive-ups, F(l, 8) = 69.73, MSE = 0.16. In a similar manner,performance on lists with slow rates of presentation was betterthan on lists with fast rates of presentation: for corrects, F(l,8) = 28.35, MSE = 1.67, and for give-ups, F(l, 8) = 44.48,MSE = 1.15. No differences were found for intrusions.

Intrusion analysis. We analyzed the intrusion types in thesignal-to-respond condition by using the same criteria as thosedescribed for the free-response data. Again, typographical errors ofcorrect responses were recoded as correct (a total of 9.5% of allintrusions), and the remaining intrusions were divided into the fivecategories previously described. Table 7 shows the proportion ofintrusions in each category. Again, most intrusions (75%) camefrom the current list or the previous lists.

Retrieval dynamics assessed by exponential functions. Sensi-tivity of performance is typically plotted as a function of eithersignal delay or delay until response (also termed total processingtime). Because participants typically produce slower RTs aftershort signal delays, the two approaches are not equivalent. Moreoften than not, investigators have used total processing time, andwe followed this convention (given the unfortunately noisy char-acter of our signal-to-respond data, it would probably have beenunwise to expend much energy analyzing the data in both ways).Thus, a given d', or p(c), is plotted at a time corresponding to thesignal delay plus the average RT required for a signal at that delay.(The hit and false-alarm rates that together were used to calculatea" for each length and strength are given in the companion articleby Diller et al. [2001], along with model predictions.)

Examination of typical signal-to-respond functions shows aninitial period of chance performance, a subsequent period duringwhich accuracy increases rapidly, and a final period in which

Page 11: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

394 NOBEL AND StflFFRIN

Table 7Experiment 1 Cued Recall: Intrusion Resultsfor the Signal-to-Respond Condition

Category Proportion

Current listPrevious listSemantic—targetSemantic—cueUnknown

.464

.282

.114

.069

.071

accuracy reaches asymptote (e.g.> Dosher, 1984a, 1984b).6 Theseretrieval functions are typically described by an exponential func-tion with three parameters (see Equation 1): an asymptotic accu-racy parameter that reflects limitations in memory information, anintercept at which point accuracy first rises above chance, and arate of rise from chance to asymptote (see, e.g., Dosher, 1984a,1984b; Hintzman & Curran, 1994; Wickelgren & Corbett, 1977).The dynamics of retrieval are commonly summarized by theintercept and the rate parameters.

We compared the retrieval dynamics underlying the four con-ditions by fitting the following exponential function to the d' andp{c) data (see Dosher, 1981, 1984a; Hintzman & Curran, 1994;Wickelgren & Corbett, 1977):

f f<2'or/5(c)=Ajl - e x p

(f-/)

for t> l;d' = 0 , ^(c) = 0, for t < /. In this equation, A is theasymptote, G determines the rate of growth to asymptote, / is theintercept, and t is total processing time (delay + RT). Separateparameter values were fitted to recognition and recall.

The fits were evaluated by the proportion of variance accountedfor, adjusted for the number of free parameters, by using thefollowing equation for recognition:

(N-k)— , (2)

(N- 1)

where N is the total number of data points (d,0> d\ is the valuepredicted by Equation 1, 5' is the overall mean, and k is thenumber of free parameters (Hintzman & Curran, 1994; Reed,1976). For recall, the d's in Equation 2 were replaced by an arcsinetransformation of p(c).7 The data were fitted by minimizing thesum of the squared errors (and consequently maximizing the valueof r2) through a parameter estimation program that used combi-nations of the simplex method (Nelder & Mead, 1965) and theDavidson-Fletcher-Powell method (Fletcher & Powell, 1963).8

The top left panel of Figure 3 shows the observed recognition d'data, and the bottom left panel of Figure 3 shows the observedrecall proportion correct data, both with the best fitting exponentialfunctions for the four conditions. To fit the data for either recog-nition or recall (separately), we first let all parameters vary with

conditions (12 free parameters) and then fitted a restricted modelwith a common intercept and growth rate but different asymptotes(6 free parameters). The full and restricted models can be com-pared through an ANOVA (see, e.g., Neter, Wasserman, & Kutner,1985). The restricted models did not fit significantly worse thanthe full models: for recognition, F(6,28) = 0.77, MSE = 0.02, p =.600, and for recall, F(6, 28) = 0.95, MSE = 0.001, p = A16. Theexponential functions shown in Figure 3 are the predictions for therestricted models. The values of the best fitting parameters for bothmodels are given in Table 8.

As one would expect, the estimates of A, the asymptotic levels,closely mimicked the values reported earlier in the AsymptoticAccuracy section. They differed with length and strength varia-tions, and the results are not repeated here. For both recognitionand recall, measured either by averaging across the three longestlags or by using the asymptotic parameter estimates, accuracy wasuniformly superior in the signal-to-respond task than in the free-response task. Such findings are fairly typical (e.g., Dosher, 1982;Dosher & Rosedale, 1989). Although none of the individual com-parisons in the present study reached significance—for recogni-

6 Our cued-recall signal-to-respond functions in Experiment 1, and tosome extent in Experiment 2 as well, exhibited an atypical pattern, with aninitial period of above chance performance that was relatively flat. Giventhe variability of our signal-to-respond data, we chose to ignore this findingand fitted the cued-recall functions with the exponentials described in thetext, just as was done for the recognition conditions in which such a fit wasbetter justified. Nonetheless, the finding may well be real and deservessome discussion. The less interesting explanation involves variability inintercept across participants, items, trials, and so forth. Such variabilitywould produce an aggregate signal-to-respond function that is ogival, witha slowly rising initial edge, even if a pure exponential applies on each trial.Further such variability might well be higher for recall processes than forthe familiarity calculated for recognition. Possibly a more interestingexplanation is based on the idea that on some trials an extra process occursat the shortest lags: Suppose familiarity of the test word is assessed as thefirst step in retrieval (as posited in the model in the companion article byDiller et al., 2001). Occasionally, when familiarity is high, the participantmight initiate a "P" sound, anticipating (usually correctly) that successfulrecall will occur quickly enough to produce an output within the instruc-tional constraints of the experiment (consistent with the common findingthat RT after the signal is, on average, slower at the fastest signals). Suchan explanation might be consistent with hints in associative recognitiontasks, some in prior reports (e.g., Dosher, 1984b; Gronlund & Ratcliff,1989) and some in Experiments 2 and 3 in the present article, that there isa brief period of time with above chance but rather flat performance beforethe exponential growth starts in earnest. Notwithstanding the facts that suchobservations may be unreliable and may have many other explanations,there is other evidence that familiarity may be playing a role early inassociative conditions, and this is discussed after Experiments 2 and 3.

7 The variance of p(c) was not constant across lags because of a flooreffect at the lower lags. To stabilize the variances and to allow properstatistical comparisons, we transformed the proportions in recall by usingthe following equation (see, e.g., Montgomery & Peck, 1982): p(c)' = 2arcsineVp(c). Both the observed data and Equation 1 were transformedusing the arcsine equation.

8 These minimization methods were implemented in a program calledMINUTT, which was originally developed by F. James and M. Roos at theCERN, Geneva, Switzerland. See also Press, Teukolsky, Vetterling, andFlannery (1992) for a general discussion of these techniques, includingsource code.

Page 12: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 395

1000 2000 3000 4000

Total Processing Time (ms)

5000

0.0

1000 2000 3000 4000

Total Processing Time (ms)

5000

2.5

2.0

1.5

1.0

0.5

0.0

_ _ • o

670 ms2000 ms

0.6 -

„ 0.4 -

ST

0.2 -

0.0 -

B

T

- W

y

I

r'

3

y• T

-o—

I I I

_ f"

—o— 670 ms-••*•• 2000 ms

_

T

1000 2000 3000 4000

Total Processing Time (ms)

5000 1000 2000 3000 4000 5000

Total Processing Time (ms)

Figure 3. Experiment 1 signal-to-respond accuracy as a function of signal delay plus response time for the fourlength-strength conditions. Symbols are the observed data; lines are the predicted exponential functions for therestricted model with a common intercept and growth rate. Left panels: d' for the recognition conditions; rightpanels: p(c) for the cued-recall conditions. Best fitting parameters can be found in Table 8.

tion, f(8) = 2.21, p = .058, for the 10-pairs/670-ms list;f(8) = 1.44, p = .189, for the 10-pairs/2,000-ms list; t(8) = 0.68,p = .516, for the 40-pairs/670-ms list; and f(8) = 1.01, p = .340,for the 40-pairs/2,000-ms list; for recall, ?(8) = 2.27, p = .053, forthe 10-pairs/670-ms list; r(8) = 2.28, p = .052, for the 10-pairs/2,000-ms list; f(8) = 2.00, p = .081, for the 40-pairs/670-ms list;and t(8) = 0.70, p = .501, for the 40-pairs/2,000-ms list—theconsistent trends may suggest that participants in free responsesometimes terminated their memory retrieval processes at a pointin time at which less information was available than at the longestlags in signal-to-respond.9

For recognition and cued recall analyzed separately, the asymp-totic levels of performance in each case were higher for short listsand slower lists, but the retrieval dynamics did not differ acrossthese length-strength conditions.10 Of greatest present interest,however, are the comparisons of the retrieval dynamics betweenrecognition and cued recall. To compare these dynamics (we returnto this point in the discussion of Experiment 2), we made theassumption that p(c), the probability of correct recall, and d', themeasure of sensitivity for recognition, are appropriate measures.For reference, we also looked at the retrieval dynamics based onthe hit rate in recognition. To increase the power of the compari-sons and to reduce noise, the data were collapsed across length and

strength (a procedure at least marginally justified by our findingsthat only the asymptotes, and not the retrieval dynamics, differedacross length and strength).

We first separately fitted exponential functions to each of thethree measures, using nine parameters (three for each measure).Figure 4A shows the observed data with the resultant best fittingexponential functions forp(h), d', andp(c). The asymptotes are ondifferent scales and not directly comparable, but it is arguablyappropriate to compare retrieval dynamics. Therefore, two re-stricted four-parameter models were fitted to the data, models witha common intercept and growth rate but different asymptotes. Onerestricted model was fitted to the data forp(h) andp(c) (predictions

9 In the context of a rather different task using sentences, Dosher (1982)discussed cases in which differences in asymptotic accuracy might beexpected to produce differences in free-response RT. Her findings andassociated arguments highlight the need to find a way to model the smallfree-response RT differences found in Experiment 1 as a function of lengthand strength. Such a model is presented in Diller et al. (2001).

10 These findings may remind one of similar results by Dosher (1984b),using associative recognition. Her results are mentioned in the discussionof Experiments 2 and 3, both of which used associative recognition.

Page 13: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

396 NOBEL AND SHIFFRIN

Table 8Experiment 1 Signal-to-Respond: Best Fitting Parameters of theExponential Functions

ConditionFree

parameter

k = 12

GA

k = 6

GA

10 pairs 40 pairs 670 ms 2,000 ms r2

245135

1.89

27196

1.85

Recognition

249 28893 101

1.39 1.28

27196

1.41

27196

1.26

212131

2.03

271

2.02

.908

.893

Cued recall

12

GA

GA

102628

0.44

68762

0.46

01,239

0.32

68762

0.27

111619

0.22

68762

0.23

13921

0.54

68762

0.51

.924

.924

Note, k = number of free parameters; / = intercept (ms); G = growthrate (ms); A = asymptote.

are graphed in Figure 4B) and another to the data for d' and p(c)(predictions are graphed in Figure 4C). The values of the bestfitting parameters for the models are shown in Table 9. The fulland restricted models can be compared through an ANOVA (see,e.g., Neter et al., 1985). Both of the restricted models fittedsignificantly worse than the full model: F(2, 14) = 45.15,MSE = 0.001, for p(h) and p(c) and F(2, 14) = 7.56,MSE = 0.005, p = .006, for d' and p(c). These results confirmwhat appears to be the case after visual inspection: Retrievaldynamics were slower for cued recall than for recognition.

The parameter estimates showed a lower intercept for cuedrecall than for recognition, but this was a potentially unreliableresult, due largely to the well-known codependence of the growthrate and intercept parameters (e.g., Gronlund & Rateliff, 1989). Arestricted fit holding only the growth rate parameters constant wasrejected: F(l, 14) = 45.90, MSE = 0.006, for p(h) and p(c) andF(l, 14) = 12.95, MSE = 0.005, for a" andp(c). A restricted fitholding only the intercept parameters constant could not be re-jected for d', F(l, 14) = 2.11, MSE = 0.005, p = .169, but couldbe rejected for p(h), F(l, 14) = 6.01, MSE = 0.001, p = .028.These tests of the parameters of the exponential functions thatwere fitted to the signal-to-respond results thus confirm the free-response results: Retrieval dynamics measured in signal-to-respond (i.e., some combination of the intercept and growth rateparameters) were slower in cued recall than in recognition.

Discussion

When considered separately, both recognition and cued recallshowed differences in accuracy but little difference in RT whenlength and strength were varied. These very interesting results are

discussed and modeled in the companion article by Diller et al.(2001). However, in this article, we focus on the differences in RTsand retrieval dynamics between recognition and cued recall. Thefree-response procedure showed large RT differences betweencued recall and recognition. For correct responses (hits and correctrejections in recognition, and corrects in cued recall), the mean and

oa.

0.0

0 1000 2000 3000 4000 5000

Total Processing Time (ms)

0 1000 2000 3000 4000 5000

Total Processing Time (ms)

0.00 1000 2000 3000 4000 5000

Total Processing Time (ms)

Figure 4. Experiment 1 signal-to-respond results based on total process-ing time forp(h) and d' in recognition and pic) in cued recall. Symbols arethe observed data (collapsed across length and strength); lines are thepredicted exponential functions for the full model (A) and restrictedmodels (B and C) with a common intercept and growth rate. Best fittingparameters can be found in Table 9.

Page 14: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 397

Table 9

Experiment 1 Recognition and Recall Comparisons: Best FittingParameters of the Exponential Function

Freeparameter

k = 9IGA

k = 4/GA

IGA

Recognition,P(h)

263850.75

187160

0.77

Condition

Recognition,d'

239117

1.60

167200

1.65

Recall,p(c)

51854

0.38

187160

0.26

167200

0.27

.988

.991"

.989b

.941

.610

Note, k = number of free parameters; / = intercept (ms); G = growthrate (ms); A = asymptote.a Pairwise i2 for/?(h) andp(c), k = 6. b Pairwise r2 for d' andp(c), k = 6.

variance for the recognition distributions were quite a bit lowerthan for cued recall. In addition, responses in cued recall spannedthe entire response interval, whereas responses in recognitionshowed a cutoff around 2 s. It is not clear whether it is appropriateto compare incorrect responses in recognition with intrusions incued recall, but false alarms and misses in recognition showed RTdistributions with means and variability vastly different from thosefor intrusions in cued recall.

The results from the signal-to-respond procedure were unfortu-nately much noisier than those from the free-response procedure.The accuracy results were clearest and directly mimicked thosefound in free response. Asymptotic accuracy in signal-to-respondwas slightly higher than accuracy in free response, possibly sug-gesting that participants in free response terminated their retrievalprocesses before all of the relevant information was available.Despite the noise in the data, statistical comparisons of the param-eters of exponential functions fitted to the signal-to-respond curvesconfirmed the differences seen in the free-response data. Theretrieval dynamics, summarized by the growth rate (G) and theintercept (I), were quite different for single-item recognition andcued recall, with the growth rate in particular being faster inrecognition than in cued recall. These differences are fairly evidentin the functions graphed in Figure 3.

These differences in RTs (and the results of the additionalstudies reported in this article) provided the primary reason forfitting different models of retrieval to recognition and cued recall(Diller et al., 2001). In that article, recognition is treated as avariant of the REM model (Shiffrin & Steyvers, 1997; the generalapproach is similar in conception to that underlying the SAMmodel; e.g., Gillund & Shiffrin, 1984). Recognition involvesmatching the test word to all images in memory, in parallel,producing likelihood ratios for each image. The likelihood ratiosare summed, and a response is based on the result, so that retrievaloperates essentially as a one-step process. The cued-recall model isinstead a multistep process. Although the first step is that of therecognition model, namely, the production of likelihood ratios for

each image, there are additional steps, as in the sequential searchprocess of the SAM model for cued recall (e.g., Raaijmakers &Shiffrin, 1980, 1981). In each cycle of the search, an individualimage is sampled from memory, and then some of the contents ofthat image are recovered and examined. The cycles continue untila response is emitted or the participant decides to give up. Thissystem produces recall times that are spread out much farther intime than those of the recognition model.

At first glance, the differences between the recognition andrecall RTs seem to rule out models with a common retrievaloperation. The results specifically seem at odds with several mem-ory models that posit a single retrieval operation for both recog-nition and recall (e.g., Hintzman, 1988; Metcalfe Eich, 1982,1985;Murdock, 1982). For example, in CHARM (Metcalfe Eich, 1982,1985), episodic information about items is stored in a composite-distributed memory vector by a convolution operation. The re-trieval mechanism consists of correlating the test probe with thememory vector, which results in a retrieved trace. The retrievaloperation must be relatively fast to fit the rapid RTs found inrecognition. Because this mechanism operates in both recognitionand recall, the observed differences in RT distributions have to beexplained by what happens after retrieval. Similar reasoning op-erates in the case of other models positing a common retrievaloperation.

This argument can be clarified by distinguishing between re-trieval of episodic and lexical-semantic information. In an explicitmemory model like CHARM, the initial retrieval is of episodicinformation. This information is in the form of a noisy featurevector. In recognition, the subsequent decision could be quite rapidbecause the test alternative is available to the participant: Thenoisy vector could be compared directly with the test item, oraccess to the lexical entry of the test item could be speeded by thepresence of the test item. However, in cued recall, no comparisonitem is available. To produce an output, the retrieved vector mustbe compared with a lexicon separate from the episodic memory(though this process is left unspecified in CHARM). It is presum-ably this comparison with the lexicon that takes the extra time seenin cued recall. Similar arguments can be made for TODAM (Mur-dock, 1982), in which the postretrieval process is termed deblur-ring (though otherwise left unspecified). A related argument canbe used in the case of the matrix model (e.g., Pike, 1984). Sucharguments have some potential conceptual problems, including thefact that lexical access is observed to be an extremely rapid process(see, e.g., Hintzman & Curran, 1997) but nonetheless must betaken seriously.

Hintzman's (1988) MINERVA model stores episodic vectorsseparately (rather than in composite fashion). For recall,MINERVA calculates and uses echo content, rather than echointensity. It is unclear whether the time needed to retrieve echocontent for recall is different from that needed to retrieve echointensity for recognition. If the needed time is slower for recall,there are, by definition, different modes of retrieval for recognitionand recall. We focus, therefore, on the case in which retrieval ofecho intensity and echo content take the same time course. Con-sider that, as in CHARM, TODAM, and the matrix model, theresult in MINERVA of retrieval for recall is a noisy vector.However, because stored vectors are separate, the process ofdeblurring the retrieved vector can potentially consist of a series ofsuccessive retrievals, each using as a probe the result of the

Page 15: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

398 NOBEL AND SHIFFRDM

previous retrieval; with luck, these retrievals will converge on aparticular trace. Thus, this model could account naturally for theslower recall responses. Furthermore, these successive retrievalsoccur from episodic memory, so one would not want to refer to adistinction between retrieval and postretrieval in this case. None-theless, the extra recall time in this model, as in the three modelspreviously mentioned, is due to the process that turns the initiallyretrieved noisy vector into a response. It is this general notion, thatan initially fast retrieval like that occurring in recognition isfollowed by a slow process of cleaning up the retrieved vector toproduce a recall, that we wished to test in the following studies.

Experiment 2

Consider the possibility that participants begin retrieval in recalltasks by generating a noisy response with a time course thatparallels that in recognition and then spend additional time beyondthat needed in recognition to clean up the information to generatea response. If so, providing participants in recall tasks with thepotential answer might eliminate the need for additional time. Thisreasoning parallels that of Wickelgren and Corbett (1977) and ledus to adopt a variant of their signal-to-respond procedure (whichthey called "yes-no recall" but we term wickelcall). The results ofthis study paralleled those of Wickelgren and Corbett, demonstrat-ing similar retrieval dynamics for wickelcall and associative rec-ognition. If one assumes that wickelcall taps the processes of cuedrecall, both findings provide evidence against the hypothesis thatthe slower RTs in cued recall are due to a postretrieval processrequired to clean up a noisy vector and produce a response.However, the analyses suggest that a stronger, simpler, and morediagnostic study could provide even clearer evidence, and thisstudy is now reported as Experiment 2 (an account of the wickel-call study is reported as Experiment 3). The critical condition ofExperiment 2 used associative recognition as a diagnostic tool:Signal-to-respond tests consist of two words presented together;responses are withheld until a signal and then must be givenimmediately. The two words are either two words that had beenstudied together (targets) or two words from two different studypairs (distractors). The critical issue is the time course of retrievalin this case (which provides both alternatives and therefore obvi-ates the need for a postretrieval clean-up operation) compared withthat in cued recall and in single-word and pair recognition. Gron-lund and Ratcliff (1989) used STR techniques to examine variousforms of single-item, pair, and associative recognition. Theyfound, as did Dosher (e.g., 1984b) and Hintzman and Curran(1994) in somewhat different tasks, that there is an initial periodduring associative recognition when retrieval seems to be based onfamiliarity and similar foils tend to be given "old" responses.Related to this finding, Gronlund and Ratcliff found that trueassociative information first begins to determine responding asmuch as 200 ms after familiarity begins to accumulate. Thesefindings indicate one way that retrieval of associative informationdiffers from retrieval of familiarity. Gronlund and Ratcliff alsoobtained point estimates of rate of growth, and surprisingly, theseestimates indicated faster growth for associative conditions, but asthe authors pointed out, these estimates were highly unreliablebecause of (among other things) trade-offs of parameter values forintercept and rate of growth. The present studies address theseissues.

Outline of Experiment 2

Participants were tested on three types of recognition tasks anda cued-recall task in a signal-to-respond procedure. The cued-recall task and the single-word recognition task each used one testword. The associative recognition and paired recognition taskseach used two test words. We decided to eliminate the variationsof length and strength because they did not bear on the issues ofpresent interest.

In all cases, study lists consisted of pairs of items. Three typesof recognition tasks were used: (a) a single-item recognition taskthat was the same as that used in Experiment 1 (single words astargets and new words as distractors), (b) an associative recogni-tion task (intact pairs as targets and rearranged pairs as distractors),and (c) a paired recognition task (intact pairs as targets and pairsof new words as distractors). In addition, a cued-recall task wasincluded; this task was the same as that used in Experiment 1.

Method

Participants

Ten Indiana University Bloomington graduate students received finan-cial compensation for their participation in the experiment. There were atotal of 10 experimental sessions and 1 training session.

Materials and Apparatus

The stimuli were high-frequency English nouns (ranging from 21to 2,012 occurrences per million) from 3 to 10 letters long (Francis &Kucera, 1982). Forty-four words were used on two practice lists to start the1st experimental session, and 1,068 words were used in each experimentalsession. No word was studied on more than one list in a session, no wordwas tested more than once in a session, and a studied word could be testedin a session in the test period only immediately following its study list. Thesame pool of 1,068 words was permuted and used again for each sessionof the study. The words were displayed on IBM-compatible PCs. RTs wererecorded using a voice-activated response box; subsequent typed responsesused to verify accuracy were entered on the keyboard.

Design and Procedure

Each session was divided into 24 study-test cycles. During study,participants saw a list of pairs of words. All pairs on a given list werepresented at the same presentation rate. The length of each list was 20pairs, with a presentation time of 1,000 ms per pair.

Four test conditions were defined by the instructions given after thestudy period and at the start of the test list; the participants did not knowthe test condition during study. The same type of test was used for theentire test period for a given list. For two types of tests, single words werepresented for tests, one at a time (single-word recognition and cued recall),and for the other two types of tests, pairs of words were presented for tests,two at a time (associative recognition and paired recognition). In thesingle-item recognition task, participants were asked to differentiate be-tween an old word (the right or left word from a pair was tested with equalfrequency) and a new word. In the cued-recall task, one word from a studypair was given (the right or left member of a pair was tested with equalfrequency), and participants had to try to generate the other word of thepair. In the associative recognition task, participants were asked to differ-entiate between intact pairs and rearranged pairs of previously studieditems (in all cases, the left-right assignment remained intact). In the pairedrecognition task, participants were asked to differentiate between intactpairs and new pairs of items (for new pairs, neither word had been

Page 16: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 399

presented before). A test word or test pair always was presented centeredon the screen. Each of the four conditions was tested six times in eachsession.

All test lists consisted of 12 trials, regardless of condition. For therecognition conditions, 6 targets and 6 distractors, randomly intermixed,were tested. In single-item recognition and cued recall, half of the wordstested had been studied on the right side of a pair, and half had been studiedon the left side. For all conditions, the first and last pairs of each study listwere not tested, and the last three presentations of the "testable" pairs (18in total) were never tested first. The test list for associative recognition wasconstructed by randomly selecting 6 pairs out of the testable pairs (tar-gets); 6 distractor pairs were generated by randomly selecting a right sideof 1 of the remaining pairs (12 in total) and a left side of another pair (onlyone word per pair was used for the distractors). The test list for pairedrecognition was constructed by randomly selecting 6 pairs (targets) out ofthe testable pairs; 6 distractor pairs were generated by randomly selectingnew words from the word pool. The test list for single-item recognitionconsisted of 6 randomly selected items from different pairs of testablewords (half of which had been studied on the right side of a pair and halfof which had been studied on the left side); 6 distractors were generated byrandomly selecting 6 new words from the word pool. The test list for cuedrecall consisted of 12 randomly selected items from different pairs oftestable words; again, half were presented on the right side and half on theleft side. After the word pool was permuted for each session, the 24study-test cycles were generated, and their order was randomized.

A signal-to-respond procedure was used, with the procedures matchingthose used for Experiment 1. Response collection matched that for Exper-iment 1, with a voiced response followed by a typed response.

Results

Trials that were marked invalid by the participants were ex-cluded from all analyses (about 0.5% in total). We followedtraditional practice and our own previous analyses of Experiment 1by excluding also observations with RTs greater than 300 ms. Thetrials eliminated for each condition were about 14%, 19%, 16%,and 26% for single-item recognition, associative recognition,paired recognition, and cued recall, respectively. Another set ofanalyses excluding responses with RTs greater than 400 ms re-duced these percentages to 2.6, 3.8, 2.9, and 6.2, respectively. Thequalitative patterns of results did not change, however, and thestatistical conclusions reported below were not altered for thisenlarged data set. We therefore report the data with the 300-mscutoff, the cutoff that matched our instructions and feedback.

Cued-Recall Intrusions

For the cued-recall condition, trials on which the participantsgave a positive verbal response (i.e., "P—word"), followed by anincorrect typed response, were labeled as intrusions. The intru-sions that were typographical errors of correct responses wererecoded as correct responses (a total of 7.9% of all intrusions). Theother intrusions were analyzed the same way as in Experiment 1.Table 10 shows the proportion of intrusions in each category. Asin Experiment 1, most intrusions were words from the current listsor the previous lists (about 75% of the total).

Asymptotic Accuracy

We again assessed asymptotic accuracy by averaging resultsover the longest three lags (2,500, 3,500, and 4,500 ms). Table 11shows the asymptotic results for the probability of an "old" re-

Table 10Intrusion Results for Experiment 2

Category Proportion

Current listPrevious listSemantic—targetSemantic—cueUnknown

.464

.282

.114

.069

.071

sponse (hits and false alarms) for the three recognition conditions,d' for these conditions, and the probabilities of correct recall andintrusions.

For the recognition conditions, ANOVAs showed a significanteffect for d' and false alarms but not for bits: for d', F(2,18) = 10.29, MSE = 0.29, and for false alarms, F(2, 18) = 22.71,MSE = 0.26. Paired performance was superior to the other recog-nition conditions. The single-item and associative recognition con-ditions did not differ significantly on hits, false alarms, or d\ asurprising result that is taken up in the discussion.

Retrieval Dynamics Assessed by Exponential Functions

As in Experiment 1, we fitted exponential functions to d' for therecognition conditions and p(c) for the cued-recall conditions as afunction of total processing time. However, there are other ways tomeasure recognition performance, and there are a number ofpotential problems in comparing intercept and rate of growthparameters when the measurements lie on different scales, such asp(c) for cued recall and d' for recognition. We decided to tacklethese issues by (a) scoring the recognition data in four ways and(b) transforming the measurements to lie on similar scales.

The measures that we used to assess recognition were d' (on ascale from 0 or even less to infinity; see Green & Swets, 1966); A'assuming a standard deviation ratio of .8 and A' assuming astandard deviation ratio of 1.0 (see Snodgrass & Corwin, 1988),and Gamma (see Nelson, 1984). Gamma is defined as (H —F)/(H + F - 2HF), where H is the hit rate and F is the false-alarmrate (see Nelson, 1984), and is bound in [0, 1] when H > F. ForH < F, Gamma was set to 0. For the A' measures, H and F werefirst transformed to z scores, after which A' was computed (seeIverson & Sheu, 1992, Equation 8b). The resulting z score, A', wastransformed into a proportion that ranged from .5 to 1. Thesevalues were transformed by 2p — 1 (truncated at 0), where p is theproportion, to the range [0, 1] to yield measures comparable tocued recall.

The A' measures, Gamma, and p(c) lie in the interval [0, 1]; foreach of these, the goodness of fit used to fit the exponentialfunction was assessed after an arcsine transformation was appliedto better equate variances (see Footnote 4). The fits were thenevaluated by the proportion of variance accounted for, adjusted forthe number of free parameters, as described for Experiment 1.

Figure 5A gives the results for d'\ it shows the observed recog-nition data with the best fitting exponential functions. Figure 5Bshows the observed cued-recall data, pic), with the best fittingexponential function. These were obtained by letting all parame-ters vary with conditions, for a total of 12 free parameters, forwhich the best fitting values are shown in Table 12. The point

Page 17: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

400 NOBEL AND SHIFFRIN

Table 11Asymptotic Accuracy Results for Experiment 2

Condition

Single RNPaired RNAssociative RNCued recall

Response

HI

.78

.75

.75

FA

.21

.06

.21

d'

1.692.201.62

Response

CO IN

.24 .10

Note. All values were computed by first averaging across participants andsessions and then averaging across the longest three lags. HI = hits; FA =false alarms; CO = corrects; IN = intrusions; RN = recognition.

Table 12Best Fitting Parameters of the ExponentialFunction (Experiment 2)

Freeparameter

/GA

SingleRN

35756

1.68

PairedRN

279133

2.26

Condition

AssociativeRN

317319

1.54

Cuedrecall

173831

0.26

Note. RN = recognition; 7 = intercept (ms); G = growth rate (ms); A =asymptote.

estimates of these parameters are not too meaningful because ofhigh variability of the estimates in some cases and because of theinterdependencies of the growth rate and intercept parameters. Wecalculated the results for a variety of restricted models, withoutcomes that are discussed below, but we first provide a more

cQ.

2.5 -

2.0 -

1.5 -

1.0 -

0.5 -

0.0 -

-A

i

I' u

it

i

T7 ^ _ -

/ •

/ I 6

///f

1

1 1 !

v v _

6 •_^ a . &

A

Recognition—•— single—^— associative—^— paired

\ 1 11000 2000 3000 4000

Total Processing Time (ms)

5000

0.3 -

S 0 . 2 -

0.1 -

0.0 -

B

o o

/

o

oo

-

—o— Cued recall

1000 2000 3000 4000

Total Processing Time (ms)

5000

Figure 5. Experiment 2 signal-to-respond accuracy as a function of totalprocessing time. A: d' results for recognition; B: p(c) for cued recall.Symbols are the observed data; lines are the predicted exponential func-tions for the full models. Best fitting parameters can be found in Table 12.Proportion of variance explained: r2 = .939 for single-item recognition,r2 = .896 for paired recognition, r2 = .875 for associative recognition, andr2 = .872 for cued recall.

readily interpretable and easy-to-perceive measure of the differ-ences in intercepts and growth rates across the conditions.

Figure 6A shows the 95% confidence contours for the bestfitting retrieval dynamics parameters (/ and G), for the a" measure,and for cued recall p(c). These contours were obtained by com-paring the full model, in which all parameters were free to vary,with a restricted model, in which the asymptote was the only freeparameter. A grid search was performed for fixed values of / andG, and the sum of squared errors was minimized while A wasvaried. The error values of the full and restricted models were thencompared through an F test (see, e.g., Neter et al., 1985), with twodegrees of freedom for the numerator and seven degrees of free-dom for the denominator. The a level was set at .05; if therestricted model did not fit significantly worse, that particularcombination of / and G fell inside the 95% confidence region;otherwise, it fell outside the region. The combinations of / and Gthat formed the boundary are plotted in Figure 6. This procedurewas repeated separately for each of the four conditions.

This contour map is a useful way of displaying the resultsbecause it displays the well-known interdependencies of the inter-cept and growth rate parameters. The confidence intervals areelongated regions with a major diagonal having a negative slope.Thus, a slow growth rate tends to be associated with a smallintercept, and vice versa. One of these parameters can partiallycompensate for changes in the other.

The procedure that was used to produce the results for the a"measure that are depicted in Figure 6A was repeated for each ofthe other measures of recognition performance, with results shownin the next three panels: Gamma in Figure 6B, A' with a standarddeviation ratio of .8 in Figure 6C, and A' with a standard deviationratio of 1.0 in Figure 6D; the cued-recall plot is the same in all fourpanels but is repeated for convenience of comparison. The panelsof Figure 6 tell much the same story: Retrieval dynamics for cuedrecall and associative recognition were fairly similar with partiallyoverlapping confidence intervals; both exhibited slower growththan either item or pair recognition. (Item recognition tends to havelarger intercepts and faster growth rates than pair recognition.)

These points were confirmed statistically by comparing the fourconditions in pairwise fashion, in all combinations. In each case, afree fit of all parameters was compared (a) with a model that hada common intercept and growth rate and (b) with a model that hada common growth rate (for the d' recognition measure only). Fora", the only restricted model with a common intercept and growthrate that did not fit significantly worse than the full model was the

Page 18: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 401

23

500 -

400 -

300 -

200 -

100 -

A

. Ill

dy

i

1

i —i 11: Item recognitionII: Pair recognitionIII: Associative recognitionIV: Cued recall

—. -1 1 1

500 -

400 -

300-

200 -

100 -

0 -

B

I I" I

\

" v\

^

- • - • T "

IV

\ \

1—

1: Item recognitionII: Pair recognitionIII: Associative recognitionIV: Cued recall

\

\

1 1400 800 1200

Growth Rate (G)

1600 2000 400 800 1200 1600 2000

Growth Rate (G)

600

500

I: Item recognitionII: Pair recognitionIII: Associative recognitionIV: Cued recall 500 -

400 -

300 -

200 -

100 -

0 -

D

III J

IWi \

II \

IV

\

\ \

1_

i i1: Item recognitionII: Pair recognitionIII: Associative recognitionIV: Cued recall

-

\

\

1400 800 1200

Growth Rate (G)

1600 2000 400 800 1200

Growth Rate (G)

1600 2000

Figure 6. Experiment 2: 95% confidence contours for intercept and growth rate for single-item recognition,paired recognition, associative recognition, and cued recall. A shows d', B shows Gamma, C shows A' with astandard deviation ratio of .8, and D shows A' with a standard deviation ratio of 1.0.

one for associative recognition and cued recall, F(2,18) = 1.97,MSE = 0.018, p = .168; all other pairwise comparisons yieldedF values with ps < .001. When the growth rates were consid-ered alone, the associative recognition and cued-recall condi-tions did not differ significantly, F( l , 18) = 3.89,MSE = 0.018, p = .064, of course, and all other comparisonsdid. For Gamma, the only restricted model with a commonintercept and growth rate that did not fit significantly worsethan the full model was the one for associative recognition and

cued recall, F(2, 18) = 2.37, MSE = 0.028, p = .122; all otherpairwise comparisons yielded F values withps < .01. For bothA' measures (standard deviation ratios of 1.0 and .8), tworestricted models with a common intercept and growth rate didnot fit significantly worse than the full model, namely, forsingle-item recognition and paired recognition, F(2, 18) = 2.71,MSE = 0.007, p = .094, for a ratio of 1.0, and F(2, 18) = 2.29,MSE = 0.008, p = .130, for a ratio of .8, and for associativerecognition and cued recall, F(2,18) = 2.52, MSE = 0.003, p =

Page 19: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

402 NOBEL AND SHIFFRIN

.108, for a ratio of 1.0, and F(2, 18) = 2.74, MSE = 0.003, p =

.091, for a ratio of .8.The results of all these analyses suggest that the differences

among the growth rate and intercept parameters among the con-ditions were not due to differences in the range or type of thescales of measurement. Whether the measures we looked at are themeasures that ought to be used for recall and recognition is asomewhat different and more difficult question. This issue isaddressed in the discussion following Experiment 3.

Experiment 3

When measured in any of the standard ways, information growsmore rapidly for single-item old-new recognition and for pairedrecognition than for associative recognition and cued recall. Ex-periment 3 provides supporting evidence from a related proce-dure.11 The "yes-no recall" signal-to-respond procedure used byWickelgren and Corbett (1977) is very similar to cued recall. Theidea is to provide a cue for cued recall, which the participant is touse to try to recover an answer. However, the answer, if generated,is to be withheld until a later point in time. The participant isinterrupted at a certain time and provided with a candidate re-sponse word to which a "yes-no" response must be given at once.This candidate response is always a word from the list, and so isfamiliar, but may or may not have been studied with the cue word.If the participant has already retrieved a word in a form that can beoutput, it ought to be possible to very quickly match it to theprovided answer. The trick is to allow so little time for a responsethat a correct answer cannot be given unless the response hasalready been retrieved.

It is known that intact rearranged decisions are quite slow, andthe point in time at which such decisions begin to rise abovechance performance is a few hundred milliseconds later than thepoint in time at which old-new recognition decisions begin to riseabove chance performance (220 ms later in Gronlund & Ratcliff,1989; see also Dosher, 1981; Hintzman & Curran, 1994). Thus,requiring a rapid response allows time for matching to occur incases in which the correct answer has been generated by theparticipant during the interval between cue and probe (as demon-strated by Wickelgren & Corbett, 1977) but does not allow time forcoming to a correct decision if the cleaning-up process has notbeen finished. We term this procedure wickelcall.

If the sequential sampling model for cued recall is correct, thenperformance should gradually rise as signal delay increases, re-flecting the cases in which the correct image has been sampled upto that point in time. For example, if the free-response data showthat 10% of correct recalls occur after 3 s, then the new techniquewill still be (roughly) 10% short of asymptotic performance at a3-s test. In contrast, if cleaning up a retrieved trace is what takesthe extra time in recall, then the provided answer should eliminatethis step, and performance should reach asymptote no later than thepoint in time at which recognition responses cease to occur (about2 s; see Figure 1).

A wickelcall test requires participants to decide whether twowords were presented together in the same pair at study or werepresented in two different pairs at study. This is thus a sequentialversion of associative recognition (used in Experiment 2). A reg-ular associative recognition condition was also used in this study:Two words from the same pair or different pairs were presented

simultaneously at the start of the trial, and the participant re-sponded when a later signal arrived.

Method

Participants

Ten Indiana University Bloomington graduate students received finan-cial compensation for their participation in the experiment. There were atotal of 10 experimental sessions and 1 or 2 training sessions.

Materials and Apparatus

The stimuli were high-frequency English nouns (ranging from 21to 2,012 occurrences per million) from three to nine letters long (Francis &Kucera, 1982). Forty words were used on two practice lists to start the 1stexperimental session, and 960 words were used in each experimentalsession. Other details matched those for Experiment 2.

Design and Procedure

Each session was divided into 32 study-test cycles. During study,participants saw a list of pairs of words. All pairs on a given list werepresented at the same presentation rate. Varied between lists were thelength of the list (10 or 20 pairs) and the presentation time (2 or 6 s per pair;though these variations were not of major importance for the purposes ofthis article). After presentation of the list, a simple arithmetic task (addi-tion) was given for 16 s. Participants had 10 s to type their addition answeron the keyboard.

Two test conditions were defined by the instructions given after thearithmetic period and at the start of the test list; during list presentation, theparticipants did not know which test condition would be used. The sametype of test was used for the entire test period for a given list. In theassociative recognition condition, participants were asked to differentiatebetween intact pairs and rearranged pairs of previously studied items (in allcases, the left-right assignment remained intact). In the wickelcall condi-tion, one word from a study pair was given, and the other word wasreplaced by a question mark (the right or left member of a pair was testedwith equal frequency, and the left-right assignment remained intact), andparticipants had to try to generate the other word of the pair but withholdany overt response. After a variable lag, the question mark was replacedwith a word from the study list; this second word could be either the correctword from the studied pair or a word from another pair on the study list.The participants were asked whether the second word matched the one theystudied with the first word.

Each of the four lists that had a particular combination of length andstrength was tested four times in each condition in each session. A sessionconsisted of two equal parts divided by a small break. Each part containedtwo randomized blocks of the eight types of lists.

Test lists consisted of 10 pairs (5 intact and 5 rearranged) for the shortlists and 20 pairs (10 intact and 10 rearranged) for the long lists. In thewickelcall condition, the question mark appeared on each side on half ofthe trials. For the long lists, the testing order was blocked; the first 10

1 ' An early version of Experiment 1 used presentation times of 2 and 6 sand list lengths of 10 and 20 pairs; this study was redone (and reported asExperiment 1 in this article) with different manipulations of length andstrength to produce observable effects of these variables on accuracy. Thewickelcall study was carried out after this early study and therefore used itsvalues for length and strength manipulations. By the time we carried outthe study reported in this article as Experiment 2, we realized that manip-ulations of length and strength were not needed to study differences in RTsbetween different tasks.

Page 20: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 403

presented pairs during study were tested first, followed by the second 10presented pairs. After the word pool was permuted for each session, the 32study-test cycles were generated, and their order was randomized.

The response collection procedures followed those of Experiments 1and 2 for signal to respond, with the following differences: Participantsgave no voiced response; they provided binary typed responses and wereasked to respond within 200 ms after the tone. In the wickelcall condition,the probe consisted of just one word, and the response word at most delaysappeared 200 ms prior to the tone (in effect allowing 400 ms for partici-pants to respond from the presentation of the probe); lags of 100 and 200ms were identical to associative recognition (simultaneous presentation ofthe two word cues), and at a lag of 300 ms, the probe word preceded thetest word by only 100 ms. In both conditions, if no response had been given200 ms after the tone, the trial was marked as invalid.

Results

Trials that were marked invalid (RTs > 200 ms) were excludedfrom all analyses. In total, approximately 9% of the associativerecognition responses were excluded, and 13% of the wickelcallresponses were excluded.

Asymptotic Accuracy

Asymptotic accuracy, the average over the longest three lags(1,400, 2,500, and 4,500 ms), is given in Table 13 for hits, falsealarms, and d' for the various conditions. For the wickelcallcondition, ANOVAs showed significant effects for list length butnot for presentation time. Performance on the short lists was betterthan on the long lists: for d', F(l, 9) = 27.76, MSE = 0.09; for hitsand misses, F(l, 9) = 26.38, MSE = 0.01; and for correct rejec-tions and false alarms, F(l, 9) = 6.61, MSE = 0.01, p = .030. Nosignificant effects were observed for the variable presentationtime. For the associative recognition condition, ANOVAs showedsignificant effects for list length and presentation time on almostall accuracy measures. Performance on the short lists was betterthan on the long lists: for d', F(l, 9) = 53.31, MSE = 0.05, and forhits and misses, F(l, 9) = 32.36, MSE = 0.01; no significantdifferences were found for correct rejections and false alarms. In asimilar manner, performance on lists with slow rates of presenta-

Table 13Asymptotic Accuracy Results for Experiment 3

Condition

10-2,00010-6,00020-2,00020-6,00010 pairs20 pairs2,000 ms6,000 ms

HI

.41

.42

.33

.37

.42

.35

.35

.39

Wickelcall

FA

.18

.14

.17

.21

.16

.19

.17

.18

d'

0.750.910.540.540.850.560.640.70

Associative recognition

HI

.75

.81

.66

.74

.78

.70

.69

.76

FA

.24

.19

.26

.21

.22

.23

.25

.21

d'

1.511.881.171.611.731.371.291.70

Note. All values were computed by first averaging across participants andsessions and then averaging across the longest three lags. The conditions inthe first four rows represent list length (10 or 20 pairs) followed bypresentation time (2,000 or 6,000 ms). The data for the fifth and sixth rowsare for the two list lengths collapsed across presentation time, and the datafor the last two rows are for the two presentation times collapsed across listlength. HI = hits; FA = false alarms.

tion was better than on lists with fast rates of presentation: for d',F(l, 9) = 15.85, MSE = 0.31; for hits and misses, F(l, 9) = 7.71,MSE = 0.02, p = .022; and for correct rejections and false alarms,F(l, 9) = 10.14, MSE = 0.01, p = .011. In general, these resultsare similar to those from Experiment 2, showing asymptotic ac-curacy differences as a function of length and strength variations.The dependence was somewhat weaker in this study, possiblybecause in this study the list-length variation was less extreme andthe presentation times were slower overall, making the presenta-tion time variation less extreme because of the nonlinear increasein performance with increasing study time.

Retrieval Dynamics Assessed by Exponential Functions

We fitted exponential functions to d' as a function of totalprocessing time, as in the first two studies. For each combinationof length and strength, the left-hand panels of Figure 7 show theobserved data for wickelcall, and the right-hand panels show thesame for associative recognition. To fit the data for each task, wefirst let all parameters vary with conditions (12 free parameters)and then fitted a restricted model with a common intercept andgrowth rate but different asymptotes (6 free parameters). The fulland restricted models were compared through an ANOVA (see,e.g., Neter et al., 1985). For wickelcall, the restricted model did notfit significantly worse than the full model, F(6, 28) = 0.48,MSE = 0.01, p = .816. For associative recognition, the restrictedmodel also did not fit significantly worse than the full model, F(6,28) = 0.52, MSE = 0.02, p = .787. The exponential functionsshown in Figure 7 are the predictions for the restricted models. Thevalues of the best fitting parameters for both models are shown inTable 14. The slow and regular growth curves in Figure 7 and thegrowth rate parameter values are consistent with the time courseresults from Experiments 1 and 2 for cued recall, and not recog-nition. In general, the results are consistent with those of Dosher(1984a), who studied associative recognition and found that vari-ation of presentation time produced asymptotic accuracy differ-ences but not differences in retrieval dynamics.

Comparing Retrieval Dynamics in Recognition and Recall

Figure 8 shows the observed associative recognition and wick-elcall data, collapsed over length and strength. In addition, single-item recognition results are shown from a comparable experimentcarried out a few years earlier with a different set of participants(using the same list lengths, presentation times, and procedures,except that the time following the signal in which the participantshad to respond was limited to 300 ms; Nobel & Shiffrin, 1992).Each curve is shown with the best fitting exponential function.Table 15 gives the values of the best fitting parameters.

Figure 9 shows the 95% confidence contours for the best fittingretrieval dynamics parameters (/ and G). See the discussion inExperiment 2 for the construction methods and interpretation. Thelarge overlap of the associative recognition and wickelcall con-tours indicates the same underlying retrieval parameters. Pairwisecomparisons of the three conditions confirmed the conclusionsevident in the graph: The only restricted model with a commonintercept and growth rate that did not fit significantly worse thanthe full model was the one for associative recognition and wick-elcall, F(2, 14) = 1.97, MSE = 0.01,/? = .177. The other restricted

Page 21: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

404 NOBEL AND SHIFFRIN

0 1000 2000 3000 4000 5000

Total Processing Time (ms)

2.0 -

1.5 -

f 1.0-

0.5 -

0.0 -

0

V-cT

LJ.—| 1 1—o— 10 pairs• » - - 40 pairs

1000 2000 3000 4000 5000

Total Processing Time (ms)

1.25

1.00

a> 0.75

0.50

0.25

0.00

>— 670 ms• • - 2000 ms

1000 2000 3000 4000

Total Processing Time (ms)

50000.0

0 1000 2000 3000 4000 5000Total Processing Time (ms)

Figure 7. Experiment 3: The left-hand panels show wickelcall results, and the right-hand panels showassociative recognition results for the four length-strength conditions (d' as a function of total processing time).Symbols are the observed data; lines are the predicted exponential functions for the restricted model with acommon intercept and growth rate. Best fitting parameters can be found in Table 14.

models fitted significantly worse than the full models: for wick-elcall and single-item recognition, F(2,14) = 15.58, MSE = 0.01,and for associative recognition and single-item recognition, F(2,14) = 21.89, MSE = 0.01. We also compared the conditions inpairwise fashion for the case in which only the growth rate pa-rameter was fixed across conditions. In this case, wickelcall andassociative recognition did not differ significantly, of course, andthe other two comparisons did, F(l, 14) = 14.66, MSE = 0.002,for associative recognition and single-item recognition and F(l,14) = 8.22, MSE = 0.007, p = .012, for wickelcall and single-itemrecognition. Comparing the conditions in pairwise fashion with arestricted model in which only the intercept parameter was fixedacross conditions yielded no significant results. These results sug-gest slower retrieval for wickelcall and associative recognitionthan for single-item recognition.

Discussion

Using both free response and signal-to-respond, Experiment 1demonstrated slower retrieval for cued recall than recognition.According to one interpretation, recognition operates in one step ofparallel activation producing a value of familiarity, whereas cuedrecall proceeds as a sequence of sampling and recovery operations(e.g., Gillund & Shiffrin, 1984; Raaijmakers & Shiffrin, 1980).

Experiments 2 and 3 examined the alternative possibility thatepisodic retrieval in cued recall proceeds with a time course likethat in episodic recognition but that recall requires an additionalprotracted process that cleans up the retrieved information in orderto produce a response, probably through reference to a lexicalstore. The wickelcall and associative recognition conditions pro-vide the relevant words to the participants and therefore shouldhave eliminated the need for a protracted additional process. None-theless, the retrieval dynamics were very slow compared withsingle-word recognition and were statistically consistent withthose in cued recall, suggesting instead that retrieval in wickelcalland associative recognition proceeds as it does in cued recall. Wesuggest that in each of these cases retrieval operates as a sequentialsearch.

The astute reader will have noted that the asymptotic perfor-mance level for wickelcall was quite low, much lower than thecomparable associative recognition level. This result had beenfound by Wickelgren and Corbett (1977; for associative recogni-tion of both pairs and triples). A version of their explanation seemsreasonable: Suppose both wickelcall and associative recognitionoperate as a search process. In associative recognition, each of thetwo test items might be used as recall cues, with each routecontributing something independent to the observed performance

Page 22: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 405

Table 14Best Fitting Parameters of the ExponentialFunction (Experiment 3)

Freeparameter

k= 12

GA

IGA

k= 12

GA

IGA

10 pairs

381880

0.96

503612

0.91

413467

1.81

417445

1.79

Condition

20 pairs 2,000 ms

Wickelcall

529410

0.61

503612

0.66

501549

0.69

503612

0.71

Associative recognition

428410

1.42

417445

1.44

479311

1.33

417445

1.38

6,000 ms

539577

0.75

503612

0.74

372552

1.79

417445

1.74

r2

.921

.928

.959

.963

Note, k = number of free parameters; / = intercept (ms); G = growthrate (ms); A = asymptote.

level. In wickelcall, only one word is available as a recall cue untilthe second arrives, and then there is not sufficient time to carry outa recall operation with the second word, possibly accounting forthe performance difference. It is interesting that Wickelgren andCorbett compared wickelcall for learned pairs and triples, bothcued with one word at the start of retrieval and then tested after adelay with a second word. To perform well in the triple condition,

0.0

Item recognitionAssociative recognitionWickcall

1000 2000 3000 4000

Total Processing Time (ms)

5000

Figure 8. Comparison of wickelcall and associative recognition (bothfrom Experiment 3) and single-item recognition (data from Nobel &Shiffrin, 1992), showing d' as a function of total processing time. Symbolsare the observed data (all data are collapsed across length and strength);lines are the predicted exponential functions for the full model. Best fittingparameters can be found in Table 15. Proportion of variance explained:r2 = .933 for wickelcall, r2 = .973 for associative recognition, and r2 =.990 for single-item recognition.

Table 15Best Fitting Parameters of the Exponential Functions(Experiment 3 and Nobel & Shijfrin, 1992)

Freeparameter

/GA

Single-itemrecognition

376190

2.05

Condition

Wickelcall

475661

0.76

Associativerecognition

423422

1.52

Note. The wickelcall and associative recognition data are from Experi-ment 3 (collapsed across length and strength); the single-item recognitiondata are from a comparable experiment (Nobel & Shiffrin, 1992). / =intercept (ms); G = growth rate (ms); A = asymptote.

participants would have had to recall at least one and sometimesboth of the other words in a triple and had these ready for rapidmatching. The advantage of associative recognition over wickel-call was surprisingly similar for pairs and triples, consistent with amodel in which sampling is used during the delay period to locatethe relevant pair or triple, the contents of which are then used atfinal test to make a decision (see Shiffrin, Murnane, Gronlund, &Roth, 1989, for additional evidence and arguments supporting thispoint).

The associative recognition and wickelcall results from Exper-iment 3 come from a different study with different participantsthan the (earlier) single-word recognition study with which theyare compared. In addition, it is questionable whether it is proper tocompare single-word testing with two-word testing. However,Experiment 2 eliminated these potential problems, and the resultsgave rise to the same conclusions: Measured in any of the standard

700

600 - -

500 - -

400 - -

3 0 0 - -

200 - -

100

Item recognitionB = Associative recognitionC = Wickelcall

400 800 1200 1600 2000

Growth Rate (G)

Figure 9. Experiment 3: 95% confidence contours for intercept andgrowth rate for wickelcall, associative recognition, and single-item recog-nition.

Page 23: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

406 NOBEL AND SHIFFRIN

ways, information grows more rapidly for single-item old-newrecognition and for paired recognition than for associative recog-nition and cued recall.

It is now time to ask how commensurate are the measurementsfor cued recall and the various forms of recognition. For example,if in cued recall the cube root of performance—that is, of p(c)—was plotted as a function of time, the rate of growth would belarger, perhaps becoming similar to that in single-item recognition.There are several things one might say about such concerns. First,in the absence of some plausible model to justify measuringperformance in this way, such an approach is just another way ofdescribing the observation that the rates of growth differ. Second,even within the recognition domain, distinctly different rates ofgrowth occur for single-item and paired recognition comparedwith associative recognition. We propose an account of this find-ing in which associative recognition is carried out through the useof recall, thereby exhibiting slower rates of growth. Argumentspositing different measurement scales for cued recall and recog-nition would not provide a plausible account of this pattern ofresults.

Experiments 2 and 3 addressed the possibility that there is aretrieval phase in cued recall that operates as quickly as retrieval inrecognition but that the slow recall responses are due to the needto clean up the (rapidly) retrieved noisy trace. Such an argumentdoes not seem consistent with either the wickelcall or associativerecognition results: Both provide explicit response alternatives tothe participant and, hence, should not require slow cleaning up ofa retrieved trace.

Why then is associative recognition slow? Why cannot theparticipant use a parallel computation of familiarity (a fast process)to carry out this task? There are several answers to consider. Thefirst allows the possibility that familiarity is indeed used to carryout associative recognition but produces rather poor performancelevels in comparison with recall processes. It is therefore used inparallel with recall processes but changes performance too little tobe noticed. The best evidence for such a familiarity process actingin combination with recall would occur at very small lags in signalto respond, at a point in time before recall processes kick in.Unfortunately, the data at such observation intervals are notori-ously noisy and unreliable, in our experiments and others as well,making it most difficult even to demonstrate an early use offamiliarity, let alone demonstrate greater familiarity for intact thanrearranged test pairs. Perhaps the most that can be said is that thereare hints in our recall and associative recognition STR functions,and in related data from Dosher (1984b) and Gronlund and Ratcliff(1989), that there may be an early period during which associativeand recall performance is above chance, but the functions are notyet rising rapidly according to the exponential rates characteristicof the rest of the STR functions. It may not be wise to draw anystronger conclusions, partly because variability in intercept acrossparticipants, trials, and items could produce similar patterns.

A second answer also assumes that familiarity-based associativerecognition is relatively ineffective and that, for this reason, par-ticipants may choose not to use a retrieval strategy that uses thisbasis for response. There may be a number of related reasons whyassociative recognition is relatively ineffective when based onfamiliarity. Most of these reasons are based on quantitative detailsof the processes assumed to carry out associative recognition bymeans of familiarity, but a few general considerations are worth

noting. First, words in the test pairs are individually as familiarwhether they are in intact or rearranged pairs, if assessed inisolation. This fact alone may make it difficult to detect additionalfamiliarity due to configural properties, especially if there is con-siderable variability in individual item familiarity so that a pan-may seem familiar by virtue of very high familiarity of onecomponent alone. The degree of extra familiarity to be expectedfor intact pair testing would depend on the model or models usedto handle this case. One idea is that the context of other items maycause the encoding of a given item to change (e.g., diamond in thecontext of wedding may be coded differently than diamond in thecontext of card game; see Clark & Shiffrin, 1987; Dosher &Rosedale, 1989, for relevant data and discussion). Another idea isthat retrieval with a compound cue consisting of both words tendsto match or activate an intact pair to a degree beyond that expectedfor the sum of the activations of two rearranged pairs. Both thesepredictions depend on the details of the storage and activationmodels, including any limited capacity that would prevent full useof multiple words or their features as compound cues. Relatedissues concern the possibility of separate storage of information fora single item and for a pair containing that single item. Forexample, Dosher and Rosedale (1989, 1997) argued for functionalindependence of stored single-item information and stored pair (ortriple) item information (an "ensemble"). One might superficiallythink that separate storage of ensemble information would increasethe usefulness of familiarity as a retrieval strategy, but this possi-bility and the others mentioned would require quantitative model-ing to assess the strength of the arguments (see Shiffrin &Steyvers, 1998, for one approach to this issue). At this point, wecan say only that it is plausible that for any one of several reasonsthe use of joint familiarity to carry out associative recognitionmight be a relatively ineffective process. Whatever the reason, wehypothesize that associative recognition is carried out primarily bysampling and recovery cycles in a search process. Such a model issimilar to that presented for cued recall in the companion article byDiller et al. (2001) and is sketched below. If so, the similarretrieval dynamics for cued recall and associative recognitionwould be expected.

The similarity of accuracy in Experiment 2 for associativerecognition and single-item recognition (despite the RT difference)was unexpected at first glance. In previous studies (e.g., Clark &Shiffrin, 1987, 1992), associative recognition exhibited poorerperformance than single-item recognition. However, the presentresults argue for different retrieval mechanisms in the two tasks,and if so, there is no necessary relation between performance in thetwo cases. For example, if associative recognition operates bysampling of separately stored pair-image traces, a sampled pairimage could be judged "old" if both parts match the probe butjudged "new" if just one part matches (resampling probably wouldoccur if nothing matches). Such a source of information mightselectively enhance associative recognition performance in com-parison with single-word recognition carried out by a global sum-mation of familiarity. In addition to such retrieval-model consid-erations, storage strategies could also play a role. For example, ourwell-trained participants may have become efficient at associativecoding, coding that allows generation of an associate when theappropriate image is sampled. Alternatively, the participants mayhave put less emphasis on context (which allows discrimination oflist items from new items) and more emphasis on interitem coding.

Page 24: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 407

The observed higher level of performance for pair than single-word recognition would be expected under the assumptions ofalmost any model, because presenting two words rather than one,when the distractors consist of two new words, more or less givestwo chances of discriminating correctly. The dynamics of retrievalare more surprising: It appears that pair recognition showedslightly lower intercepts and slightly slower information retrievalthan single-word recognition (see Figures 5 and 6 and Table 12).The lower intercept might reflect variability in feature encoding,the minimum time for the first features to become active, gettinglower as the number of features increase. Slower growth rate mightbe due to some sort of limitation of parallel decision making, if, forexample, the response rule requires separate calculation of thefamiliarity values of the two words, and the product of these valuesdetermines the decision.

However one explains the differences in retrieval dynamics forsingle-word and pair recognition, the slowing of retrieval causedby the presence of two test words is quite small and does not comeclose to explaining the large amount of slowing of retrieval seen inassociative recognition. This provides some confidence for con-cluding that associative recognition is an extended process in time,quite possibly due to a search process containing cycles of sam-pling and recovery.

On Models for Recognition and Recall

In the companion article, Diller et al. (2001) provide quantita-tive simulation models for single-item recognition and cued recalland fit the models to the data from these tasks for both freeresponse and signal-to-respond. In that article, possible extensionsto pair recognition, associative recognition, and wickelcall are alsodiscussed. The reader is referred to that article for details. Here wegive only a brief summary of the basic approach for single-wordrecognition and cued recall.

The Assessment of Retrieval Completion(ARC)-REM Recognition Model

A word is represented as a vector of 20 nonzero feature values.A studied word pair is copied into memory as an image that isincomplete (some values are zero) and error-prone (some valuesare copied incorrectly). The study time determines the number ofnonzero entries. This episodic image for a pair of words is repre-sented as a vector of 40 feature values, 20 for each word, includingzero values. When a test word appears at a given moment in time,t, its (complete or incomplete) vector is matched in parallel to allthe episodic images of words from the list (for single-word rec-ognition, the double-length vectors for pairs are treated as twosingle-word vectors). For each episodic image, the matching con-sists of a count of the mismatching values, excluding zeros, and acount of the matching cases and their values. These numbers areconverted by formula (see Diller et al., 2001; Shiffrin & Steyvers,1997) to a likelihood ratio for each image. The likelihood ratiogives the probability that the image was stored during study of thetest word, divided by the probability that the image was storedduring study of something else. These likelihood ratios are aver-aged to produce the odds for that moment in time, t. How / ischosen for free response and signal-to-respond is discussed in the

next paragraph. A default decision of "old" is given if the odds aregreater than 1, but this criterion can be adjusted by the participant.

The ARC-REM model (in Diller et al., 2001) assumes that thefeatures of the probe become active individually, in highly variablefashion. While this is happening, the features of each image aredecaying exponentially. Comparison of a given feature takes placeonly when both a probe feature and a corresponding image featureare active at the same time. Once an image feature is compared,however, it is protected from further forgetting. In free response, aresponse is made when a high enough proportion (an estimatedparameter) of the probe features have become active (ARC is a keyassumption enabling the model to predict accuracy differences butnot RT differences when length and strength are varied). In signal-to-respond tasks, the odds are calculated on the basis of thecomparisons that have been made up to the time of the signal.

The ARC-REM Model for Cued Recall

The recognition and cued-recall models of Diller et al. (2001)are essentially identical to the point at which likelihood ratios havebeen calculated on the basis of matching values of a test word tomemory images (in cued recall using free response, the likelihoodsare calculated when all probe features have become active). Theselikelihood ratios are then used as the basis for cued recall. Thecued-recall model is more complicated than that for recognitionbecause it consists of a series of steps, each involving sampling ofan image, recovery of information from the sampled image, anddecisions. This is termed the memory search.

The memory search has the following stages. First, use the testword to generate likelihood ratios for the episodic images of eachword from the study list. These remain fixed through the rest of thesearch for that trial. Second, for each pair, select the maximum-likelihood ratio.12 Sample a pair proportional to these maximaraised to a fractional power. Third, examine the match between thetest word and the word image sampled: If the match is goodenough, accept this pair as the one being sought; if not, then skipto Step 6. The criterion for acceptance drops as the 5-s period foraccepting nears its end. Fourth, try to recover information from theother word image in the accepted pair. The ability to output ananswer depends on the number of features in that word image, orelse skip to Step 6. Fifth, if a word is output, the probability thatit will be correct depends on the likelihood ratio for the sampledpair, or else an intrusion is made. Sixth, an exit from Step 3 or 4triggers a decision whether to give up and terminate the search.This decision is based on the familiarity of the test word; that is,if the odds that the test word is "old" are below a criterion, thesearch is terminated. This decision becomes more liberal as the 5-speriod for responding nears its end. Seventh, if a decision is madeto continue, the search returns to Step 2, and sampling is madeagain with replacement.

All of the steps in the search have associated distributions oftimes. The time for Step 1 is that determined by the rules forrecognition, and the subsequent steps have times that are described

12 One could just as easily assume that a pair image is sampled on thebasis of the individual word likelihood ratios raised to a fractional power.The alternative models would likely be indistinguishable for the presentdata, but there might be testable differences for tasks in which the twowords of a pair vary in their similarity.

Page 25: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

408 NOBEL AND SHIFFRIN

in Diller et al. (2001). For signal-to-respond, one could imaginecomplicated modifications in which search continues despite lowfamiliarity of the probe word or is continued past the point whena plausible response is first encountered. However, we simplyassumed that the free-response model continues until the signal: Aresponse recovered by that time is output, or else a give-upresponse is made.

General Discussion

We are proposing that single-item and paired recognition arecarried out by a process of parallel access to recent episodicimages; the results of retrieval are combined and used to drive adecision process leading to a relatively fast response (either "old"or "new"). We are suggesting that cued recall and associativerecognition are carried out through a memory search, with succes-sive sampling and recovery until a relevant image is located,producing responses that are skewed toward longer RTs. Theassociative recognition conditions are particularly diagnostic be-cause the slow responses in cued recall could otherwise have beenascribed to a process of cleaning up (possibly through lexicalaccess) a memory trace retrieved in a single rapid step. Associativerecognition provides the two alternatives at test, obviating the needfor such a clean-up process. The wickelcall findings support thesame argument.

It is interesting to note that the pattern of results from Experi-ment 1 in single-word recognition is similar to that in cued recall(i.e., substantial accuracy differences and small or missing RTdifferences across length and strength conditions). It is quite pos-sible for this reason that a model similar to that proposed forrecognition, based on one retrieval step, could be applied to cuedrecall. However, the parameters of such a model would have to bequite different from those for recognition to predict the muchslower retrieval in cued recall. In the absence of any obviousreason why retrieval based on the single-word cue used in bothparadigms should have quite different temporal characteristics, weprefer to assume that different retrieval processes are operating inrecognition and cued recall (and in pair vs. associative recognitionparadigms as well).

Why should different retrieval processes be used in cued recalland associative recognition than in single-word and pair recogni-tion? We have touched on the answers in earlier discussion butelaborate a bit more here. This question has two components: (a)Why are recall processes not used to carry out the simpler sorts ofepisodic recognition? (b) Why are familiarity judgments not usedto carry out associative recognition? We take these up in turn.

In the case of single-word and pair recognition paradigms, theuse of a global familiarity process may be driven by convenienceand ease: Recent list items may be stored with sufficiently distinctcontext that such a mechanism is able to discriminate such itemsfrom new items with relative ease. Carrying out an extended searchmight involve considerable additional effort without a proportionalgain in performance.

This argument does not, however, preclude the possibility that afamiliarity process is augmented by certain types of recall-likeprocesses, as long as those processes operate quickly and easily.Dual-route (or multiple-route) models of recognition have, ofcourse, been suggested by many investigators (e.g., Brainerd et al.,1999; Dosher, 1984b; Hintzman & Curran, 1994; Horton et al., in

press; Jacoby, 1991; Juola et al., 1971; Mandler et al., 1969;Yonelinas, 1999). The present results argue that a recall-like routecannot be much extended in time, but it is, of course, possible toadopt an intermediate position. Recall carried out by search andsampling sometimes succeeds very quickly (when the first orperhaps the second sample selects the correct image; see, e.g.,Hintzman & Curran, 1994) and only occasionally requires thelarge number of samples that produce very slow responses. Sup-pose then that the familiarity response is collected in parallel witha sample-recovery from memory, with the first to succeed gov-erning a response. This model would indeed combine familiaritywith a recall-like mechanism and would still produce very fast RTs(even faster than the familiarity mechanism alone). Our own dataare almost certainly not sufficient to test between such a version ofthe "familiarity plus recall" model and the "familiarity only"model, especially because the two routes might lead to correlatedresponse decisions. We have not tried to fit a combined model forthis reason, and also because the complexity of the present modelsis already quite high. We have no theoretical or conceptual objec-tions to such a combined model, however.

Let us return to the converse of this issue: If recall processes dosometimes contribute meaningfully to recognition, even in trun-cated form, why would participants in a recognition task notcontinue search and sampling for an extended period of time,thereby improving performance? Even if such search is effortfuland aversive, participants are apparently willing to put out sucheffort in recall tasks. Probably the net gain is too low to merit suchactivity, given that familiarity may provide a good basis for re-sponse, and given that those cases in which familiarity judgmentsare in error may be highly correlated with cases in which anextended search may fail (if, e.g., the target image is stored quiteweakly). In cued recall, of course, the participant has no choicebecause familiarity of the test word does not provide a response;the relevant image must be sampled for recall to succeed, and itmay take quite a few samples for this to happen.

Thus, there seem to be plausible arguments for a single-stepparallel activation and matching process for old-new recognitiontasks (possibly augmented by a truncated recall-like process) andfor an extended search for cued-recall tasks. What then are theprocesses used in associative recognition? It would theoretically bepossible, according to the ARC-REM model, to probe memorywith both test words and use familiarity of the result to make adecision. The problem is that this procedure does not appear towork well when the distractor pairs contain familiar list items. Forexample, Clark and Shiffrin (1987) looked at this situation forword triples, doubles, and singles. Operating within the context ofthe SAM model (e.g., Gillund & Shiffrin, 1984), they found thatcalculating familiarity for various multiword cues could not pro-duce performance levels for various types of associative recogni-tion as high as those observed. They therefore augmented the SAMmodel with an encoding specificity hypothesis to solve this prob-lem. In retrospect it seems more likely that participants use a recallstrategy in these situations (e.g., Clark, 1992; Clark & Shiffrin,1992; Dosher, 1984b; Gronlund & Ratcliff, 1989; Hintzman &Curran, 1994). This issue was discussed in the context of the REMmodel by Shiffrin and Steyvers (1998).

If familiarity judgments in associative recognition produce per-formance levels that are unacceptably low, then the participantmight well be led to adopt a recall procedure, sampling images one

Page 26: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 409

at a time and making decisions on the basis of the informationrecovered from those images. Because this procedure is based onsequential image sampling, its time course ought to be roughlyakin to that seen in cued recall, which we argue is also based onsequential sampling. This pattern was observed in Experiment 2.Of course, various mixed models are conceivable, and we dis-cussed one variant in which guessing strategies might be based onfamiliarity.

The fact that associative recognition exhibits a slow rate ofretrieval is important because it provides evidence concerning thetype of retrieval occurring in cued recall. One argument that couldbe made concerning retrieval in cued recall goes as follows: (a) Aninitial phase of parallel retrieval occurs, with a time course like thatseen in recognition (perhaps because the same mechanism under-lies this phase in both tasks), (b) The result of Phase 1 retrieval isa "noisy" set of information that is often insufficiently precise toallow generation of an overt response, (c) A second phase ofretrieval then ensues, in which the noisy information is cleaned upto the point where an overt output can be made. This second phasecould consist of probing the lexical-semantic memory with thenoisy trace from Phase 1 and finding an acceptable match orpossibly could consist of using the noisy trace from Phase 1 as aprobe of the episodic memory a second time (and this recyclingcould continue for some number of additional steps). The secondphase of retrieval is needed, according to this argument, becausethere is no potentially correct word available with which the noisytrace from Phase 1 can be compared (as happens in recognitiontasks).

Associative recognition (and wickelcall) is therefore criticalbecause the participant is provided with the potentially correctwords with which the retrieved trace from Phase 1 might becompared directly. This direct comparison should circumvent theneed for a second phase of retrieval and should produce a fast timecourse of retrieval, as in single-item and paired recognition. Thedata showed quite slow retrieval in associative recognition, how-ever, and we have concluded that this line of argument is incorrectand that search and sampling is a better model for associativerecognition.

This argument on the basis of the associative recognition resultsis not entirely airtight, and some alternative hypotheses need to beconsidered. Perhaps a second phase of retrieval is needed inassociative recognition despite the presence of the response alter-natives. For example, suppose the participant uses a mnemonic orimaging technique to remember each pair (e.g., to encode table-horse, the participant imagines a rider sitting on a horse using atable rather than a saddle). Perhaps recognition of a test pairrequires recall of the mnemonic or image. The mnemonic or imageis not provided at test, of course, and might have to be recoveredin a second phase of retrieval.

We think it is perfectly reasonable that participants use mne-monic, imaging, and other coding techniques to remember pairs ofwords, and indeed such techniques would improve recall in asearch model (see, e.g., Raaijmakers & Shiffrin, 1980, 1981).However, the existence of such codes, and their usefulness in cuedrecall, does not convince us that a second phase of retrieval togenerate them would be needed in the hypothesized model forassociative recognition. After all, the retrieved trace from Phase 1presumably has noisy information concerning the two words in theimage as well as the code used to connect them. If the two noisy

words match the test pair, then this ought to be enough to respond"old," and it is not clear how coming up with a cleaned-up tracelater in time would improve performance. In addition, there is thelogical problem that the image or mnemonic to be recalled mightnot even be available in the participant's lexicon (as in the examplewith a table on a horse). In contrast, if Phase 2 of retrieval (in thehypothetical augmented model) takes place through the episodicimages in memory (or the one episodic image in the case ofcomposite storage models), then the whole problem has just beenregressed one step, and one is still left with the necessity ofsequential search or some similar slow process in Phase 2 ofretrieval. For reasons like these, we think the search model forcued recall and associative recognition is simpler, easier to defend,and generally to be preferred. We do note that if future researchreveals reliable differences between retrieval dynamics for cuedrecall and associative recognition, these might be due to the use ofmnemonic information in cued recall.

As a final comment, the arguments that have been made for adifferent and more extended retrieval mechanism in cued recalland associative recognition (and wickelcall), than in single-wordand pair recognition, do not depend critically on the details of themodels that have been fitted to the data in the companion article byDiller et al. (2001). Improved models may surface in the future, buton the basis of the data patterns reported here, we think it is likelythat such models will need to incorporate different retrieval mech-anisms for old-new single-item and paired-item recognition on theone hand and for cued recall, associative recognition, and wickel-call on the other hand. The most likely models, based on thedata in these three articles, would in our view attribute retrievalin pair and single-word recognition to parallel global activationand matching and would attribute retrieval in cued recall andassociative recognition (and wickelcall) to sampling of, recoveryfrom, and decisions about individual memory images sampledsuccessively.

References

Ashby, F. G., Tein, J. Y., & Balakrishnan, J. D. (1993). Response timedistributions in memory scanning. Journal of Mathematical Psychol-ogy, 37, 526-555.

Atkinson, R. C , & Shiffrin, R. M. (1968). Human memory: A proposedsystem and its control processes. In K. W. Spence & J. T. Spence (Eds.),The psychology of learning and motivation: Advances in research andtheory (Vol. 2, pp. 89-195). New York: Academic Press.

Brainerd, C. J., Reyna, V. F., & Mojardin, A. H. (1999). Conjoint recog-nition. Psychological Review, 106, 160-179.

Chappell, M., & Humphreys, M. S. (1994). An auto-associative neuralnetwork for sparse representations: Analysis and application to modelsof recognition and cued recall. Psychological Review, 101, 103—128.

Clark, S. E. (1992). Word frequency effects in associative and itemrecognition. Memory & Cognition, 20, 231-243.

Clark, S. E., & Shiffrin, R. M. (1987). Recognition of multiple-item probes.Memory & Cognition, 15, 367-378.

Clark, S. E., & Shiffrin, R. M. (1992). Cuing effects and associativeinformation in recognition memory. Memory & Cognition, 20, 580-598.

Dark, V. J., & Loftus, G. R. (1976). The role of rehearsal in long-termmemory performance. Journal of Verbal Learning and Verbal Behav-ior, 15, 479-490.

Deese, J. (1960). Frequency of usage and number of words in free recall:The role of association. Psychological Reports, 7, 337-344.

Diller, D. E., Nobel, P. A., & Shiffrin, R. M. (2001). An ARC-REM model

Page 27: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

410 NOBEL AND SHIFFRIN

for accuracy and response time in recognition and recall. Journal of Exper-imental Psychology: Learning, Memory, and Cognition, 27, 414-435.

Dosher, B. A. (1981). The effect of delay and interference: A speedaccuracy study. Cognitive Psychology, 13, 551-582.

Dosher, B. A. (1982). Effect of sentence size and network distance onretrieval speed. Journal of Experimental Psychology: Learning, Mem-ory, and Cognition, 8, 173-207.

Dosher, B. A. (1984a). Degree of learning and retrieval speed: Study timeand multiple exposures. Journal of Experimental Psychology: Learning,Memory, and Cognition, 10, 541-574.

Dosher, B. A. (1984b). Discriminating preexperimental (semantic) fromlearned (episodic) associations: A speed-accuracy study. Cognitive Psy-chology, 16, 519-555.

Dosher, B. A., McElree, B., Hood, R. M., & Rosedale, G. (1989). Retrievaldynamics of priming in recognition memory: Bias and discriminationanalysis. Journal of Experimental Psychology: Learning, Memory, andCognition, 15, 868-886.

Dosher, B. A., & Rosedale, G. (1989). Integrated retrieval cues as amechanism for priming in retrieval from memory. Journal of Experi-mental Psychology: General, 118, 191-211.

Dosher, B. A., & Rosedale, G. (1997). Configural processing in memoryretrieval: Multiple cues and ensemble representations. Cognitive Psy-chology, 33, 209-265.

Fletcher, R., & Powell, M. J. D. (1963). A rapidly convergent descentmethod for minimization. Computer Journal, 6, 163-168.

Flexser, A. J., & Tulving, E. (1978). Retrieval independence in recognitionand recall. Psychological Review, 85, 153-171.

Francis, W. N., & Kucera, H. (1982). Frequency analysis of English usage:Lexicon and grammar. Boston: Houghton Mifflin.

Gillund, G., & Shiffrin, R. M. (1984). A retrieval model for both recog-nition and recall. Psychological Review, 91, 1-67.

Glenberg, A., & Adams, F. (1978). Type I rehearsal and recognition.Journal of Verbal Learning and Verbal Behavior, 17, 455—463.

Green, D., & Swets, J. (1966). Signal detection theory and psychophysics.New York: Wiley.

Gronlund, S. D., & Ratcliff, R. (1989). Time course of item and associativeinformation: Implications for global memory models. Journal of Exper-imental Psychology: Learning, Memory, and Cognition, 15, 846-858.

Hall, J. F. (1979). Recognition as a function of word frequency. AmericanJournal of Psychology, 92, 497-505.

Hintzman, D. L. (1988). Judgments of frequency and recognition memoryin a multiple-trace memory model. Psychological Review, 95, 528-551.

Hintzman, D. L., & Cumin, T. (1994). Retrieval dynamics of recognitionand frequency judgments: Evidence for separate processes of familiarityand recall. Journal of Memory and Language, 33, 1—18.

Hintzman, D. L., & Curran, T. (1997). Comparing retrieval dynamics inrecognition memory and lexical decision. Journal of Experimental Psy-chology: General, 126, 228-247.

Hockley, W. E., & Murdock, B. B., Jr. (1987). A decision model foraccuracy and response latency in recognition memory. PsychologicalReview, 94, 341-358.

Horton, D. L., Pavlick, T. J., & Moulin-Julian, M. W. (in press). Retrieval-based and familiarity-based recognition and the quality of information inepisodic memory. Journal of Memory and Language.

Howard, M. W., & Kahana, M. (1999). Contextual variability and serialposition effects in free recall. Journal of Experimental Psychology:Learning, Memory, and Cognition, 25, 923—941.

Indow, T. (1995). Retrieval sequences and retention curves: An analysisbased on extreme statistics (Tech. Rep. No. MBS 95-31). Irvine: Univer-sity of California, Irvine, Institute for Mathematical Behavioral Sciences.

Iverson, G. J., & Sheu, C.-F. (1992). Characterizing random variables inthe context of signal detection theory. Mathematical Social Sciences, 23,151-174.

Jacoby, L. L. (1991). A process dissociation framework: Separating auto-

matic from intentional uses of memory. Journal of Memory and Lan-guage, 30, 513-541.

Juola, J. F., Fischler, I., Wood, C. T., & Atkinson, R. C. (1971). Recog-nition time for information stored in long-term memory. Perception &Psychophysics, 10, 8-14.

Luce, R. D. (1986). Response times: Their role in inferring elementarymental organization. New York: Oxford University Press.

MacLeod, C. M., & Nelson, T. O. (1984). Response latency and responseaccuracy as measures of memory. Ada Psyckologica, 57, 215-235.

Mandler, G., Pearlstone, Z., & Koopmans, H. J. (1969). Effects of orga-nization and semantic similarity on recall and recognition. Journal ofVerbal Learning and Verbal Behavior, 8, 410-423.

Metcalfe, J. (1991). Recognition failure and the composite memory trace inCHARM. Psychological Review, 98, 529-553.

Metcalfe, J., & Murdock, B. B. (1981). An encoding and retrieval model ofsingle-trial free recall. Journal of Verbal Learning and Verbal Behav-ior, 20, 161-189.

Metcalfe Eich, J. (1982). A composite holographic associative recallmodel. Psychological Review, 89, 627-661.

Metcalfe Eich, J. (1985). Levels of processing, encoding specificity, elab-oration, and CHARM. Psychological Review, 92, 1-38.

Miller, D. R., & Singpurwalla, N. D. (1977). Failure rate estimation usingrandom smoothing (NTIS No. AD-A040999/5ST).

Millward, R. (1964). Latency in a modified paired associate learningexperiment. Journal of Verbal Learning and Verbal Behavior, 3, 309-316.

Montgomery, D. C , & Peck, E. A. (1982). Introduction to linear regres-sion analysis. New York: Wiley.

Murdock, B. B., Jr. (1982). A theory for the storage and retrieval of itemand associative information. Psychological Review, 89, 609-626.

Murdock, B. B., Jr., & Anderson, R. E. (1975). Encoding, storage andretrieval of item information. In R. L. Solso (Ed.), Information process-ing and cognition: The Loyola symposium (pp. 145-194). Hillsdale, NJ:Erlbaum.

Murdock, B. B., Jr., & Okada, R. (1970). Interresponse times in single-trialfree recall. Journal of Experimental Psychology, 86, 263-267.

Nelder, J. A., & Mead, R. (1965). A SIMPLEX method for functionminimization. Computer Journal, 7, 308—313.

Nelson, T. O. (1984). A comparison of current measures of the accuracy offeeling-of-knowing predictions. Psychological Bulletin, 95, 109—133.

Neter, J., Wasserman, W., & Kutner, M. H. (1985). Applied linear statis-tical models: Regression, analysis of variance, and experimental de-signs. Homewood, IL: Irwin.

Nobel, P. A., & Shiffrin, R. M. (1992). Constraints on models of recog-nition and recall imposed by data on the time course of retrieval. In L.Erlbaum (Ed.), Proceedings of the Fourteenth Annual Conference of theCognitive Science Society (pp. 1014-1019). Hillsdale, NJ: Erlbaum.

Pechmann, T., Reetz, H., & Zerbst, D. (1989). [Criticism of a measurementtechnique: The inaccuracy of voice-key measurements]. Sprache undKognition, 8, 65-71.

Pike, R. (1984). Comparison of convolution and matrix distributed memorysystems for associative recall and recognition. Psychological Review, 91,281-294.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992).Numerical recipes in C: The art of scientific computing (2nd ed.).Cambridge, MA: Cambridge University Press.

Raaijmakers, J. G. W., & Shiffrin, R. M. (1980). SAM: A theory ofprobabilistic search of associative memory. In G. H. Bower (Ed.), Thepsychology of learning and motivation: Advances in research and theory(Vol. 14, pp. 207-262). New York: Academic Press.

Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Search of associativememory. Psychological Review, 88, 93-134.

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Re-view. 85, 59-108.

Ratcliff, R. (1979). Group reaction time distributions and an analysis ofdistribution statistics. Psychological Bulletin, 86, 446-461.

Page 28: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 411

Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psy-chological Bulletin, 114, 510-532.

Ratcliff, R., Clark, S., & Shiffrin, R. M. (1990). The list-strength effect: I.Data and discussion. Journal of Experimental Psychology: Learning,Memory, and Cognition, 16, 163-178.

Ratcliff, R., & Murdock, B. B., Jr. (1976). Retrieval processes in recog-nition memory. Psychological Review, 83, 190-214.

Reed, A. V. (1976). List length and the time-course of recognition inimmediate memory. Memory & Cognition, 4, 16-30.

Roberts, W. A. (1972). Free recall of word lists varying in length and rateof presentation: A test of total-time hypotheses. Journal of ExperimentalPsychology, 92, 365-372.

Rohrer, D., & Wixted, J. T. (1994). An analysis of latency and interre-sponse time in free recall. Memory & Cognition, 22, 511-524.

Shepard, R. N. (1967). Recognition memory for words, sentences, andpictures. Journal of Verbal Learning and Verbal Behavior, 6, 156-163.

Shiffrin, R. M. (1970). Memory search. In D. A. Norman (Ed.), Models ofhuman memory (pp. 375-447). New York: Academic Press.

Shiffrin, R. M. (1993). Short-term memory: A brief commentary. Memory& Cognition, 21, 193-197.

Shiffrin, R. M., Murnane, K., Gronlund, S., & Roth, M. (1989). On unitsof storage and retrieval. In C. Izawa (Ed.), Current issues in cognitiveprocesses: The Tulane Floweree Symposium on Cognition (pp. 25-68).Hillsdale, NJ: Erlbaum.

Shiffrin, R. M., Ratcliff, R., & Clark, S. (1990). The list-strength effect: II.Theoretical mechanisms. Journal of Experimental Psychology: Learn-ing, Memory, and Cognition, 16, 179-195.

Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory:

REM—retrieving effectively from memory. Psychonomic Bulletin andReview, 4, 145-166.

Shiffrin, R. M., & Steyvers, M. (1998). The effectiveness of retrieval frommemory. In M. Oaksford & N. Chater (Eds.), Rational models ofcognition (pp. 73-95). Oxford, England: Oxford University Press.

Snodgrass, J. G., & Corwin, J. (1988). Pragmatics of measuring recognitionmemory: Applications to dementia and amnesia. Journal of Experimen-tal Psychology: General, 117, 34-50.

Townsend, J. T., & Ashby, F. G. (1983). The stochastic modeling ofelementary psychological processes. Cambridge, England: CambridgeUniversity Press.

Ulrich, R., & Miller, J. (1994). Effects of truncation on reaction timeanalysis. Journal of Experimental Psychology: General, 123, 34-80.

Van Zandt, T., & Ratcliff, R. (1995). Statistical mimicking of reaction timedata: Single process models, parameter variability, and mixtures. Psy-chonomic Bulletin and Review, 2, 20-54.

Wickelgren, W. A., & Corbett, A. T. (1977). Associative interference andretrieval dynamics in yes-no recall and recognition. Journal of Experi-mental Psychology: Human Learning and Memory, 3, 189-202.

Wiseman, S., & Tulving, E. (1975). A test of confusion theory of encodingspecificity. Journal of Verbal Learning and Verbal Behavior, 14, 370-381.

Yonelinas, A. P. (1999). The contribution of recollection and familiarity torecognition and source-memory judgments: A formal dual-processmodel and an analysis of receiver operating characteristics. Journal ofExperimental Psychology: Learning, Memory, and Cognition, 25, 1415-1434.

Appendix A

Response Time Data Adjustment Method

The aim of this article and the companion article by Diller et al. (2001)was to examine and model not only the measures of central tendency forthe RTs but also the entire RT distributions. However, averaging acrossvariables of little interest for modeling efforts can affect various charac-teristics of the RT distributions when those variables have an effect. Oneapproach to this problem would involve including in the models compo-nents and parameters reflecting each of these variables. However, weregard these as "nuisance" variables and do not wish to clutter the modelswith extra complexities and parameters in an attempt to model them. Oneapproach to solving this problem involves "vincentizing" the RT distribu-tions, as suggested by Ratcliff (1978,1979). This method works well whenthere is one nuisance variable to be taken into account. In our case, themain differences were those introduced by different participants, so thevincentizing approach could have been used. However, this method doesnot generalize to multiple variables, so it is of some interest to present amethod that might do so. Thus, we adopted another approach in which aregression-like analysis is used to remove the effects of nuisance variables.The adjusted data are presented in the main body of this and the companionarticle and are modeled in the companion article. None of the patterns ofdata exhibited and discussed in the body of the article and none of thestatistical effects, are changed if the raw data are analyzed instead.

RT Adjustment Method

The raw data were adjusted in two passes. The first pass involvedadjustments for the means of sessions, participants, serial position of list

within session (termed list position, not to be confused with serial positionwithin list), and output position. The second pass involved adjustments forthe standard deviations of each of these nuisance variables. The adjust-ments were made separately for each type of response, that is, for hits, falsealarms, correct rejections, and misses. We included sessions, list positions,and output positions in these adjustments, even though these did not differsignificantly, in case there might have been differences that lack of powerprevented from reaching significance.

The adjustment model used for the means assumed that

Y(i, j , k, I) = X(i, j , k, l) + [G- m(i)] + [G - m(j)]

+ [G - m(k)] + [G - m(l)], (Al)

where X(i, j , k, I) is a given single RT having participant i, session j , listposition k, and output position /; Y(i, j , k, I) is the RT for that sameobservation adjusted for the means of the nuisance variables; G is theoverall mean RT for the entire study; m(i') is the overall mean for partic-ipant i; m(j) is the overall mean for session/, m(k) is the overall mean forlist position k; and m(l) is the overall mean for output position /. In the firstpass, all RTs were adjusted according to Equation Al.

The adjustment model for the variability assumed that

j , k, I) -

X [S(k)/s(k)][S(l)/s(t)l (A2)

(Appendixes continue)

Page 29: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

412 NOBEL AND StflFFRIN

where Z(i,;', k, I) is the final adjusted RT for this observation; s(t) is theoverall standard deviation for participant i, and S(i) is the mean of the s(i);s(f) is the overall standard deviation for session j , and S(f) is the mean ofthe s(j); s(k) is the overall standard deviation for list position k, and S(k) isthe mean of the s(k); and s(l) is the overall standard deviation for outputposition /, and S(l) is the mean of the £(/). In the second pass, allobservations were adjusted according to Equation A2.

Figure Al shows an example of the result of these corrections forfree-response recognition data from Experiment 1. RTs were cumulatedin 50 bins of size 100 ms. The unadjusted distribution in Figure Al showsRT for hits in the 40-pairs/2,000-ms condition; this distribution is bimodal,with an early bump, due mainly to 1 particularly fast participant. Theadjusted distribution is given by the dashed line; the participant and otherdifferences have been removed, and the distribution is unimodal andsmoother.

Figure A2 shows an example of the result of these corrections for thecued-recall times. The unadjusted distribution in Figure A2 shows RT for

0.3 --

I 0.2 - -

0.1 --

0.0

Rawp =.78M- =785a =393m = 712

•-•• Adjustedp =.72u =796o =292m = 730

•1000 2000 3000 4000

Reaction Time (ms)

5000

Figure Al. Example of response time distributions for recognition freeresponse (hits in the 40-pairs/2,000-ms condition) before and after adjust-ing for "nuisance" variables, p = proportion of responses of that type; /u. =mean response time; <r = standard deviation; m = median response time.

0.20

0.15 --

0.10 --

0.05 - -

0.00

Raw Adjustedp = .27 p = .27

1550 fi = 1551o = 868 a = 843

1253 m = 1364

1000 2000 3000 4000

Reaction Time (ms)

5000

Figure A2. Example of response time distributions for cued-recall freeresponse (correct responses collapsed across length and strength condi-tions) before and after adjusting for "nuisance" variables, p = proportionof responses of that type; pi = mean response time; o- = standard deviation;m = median response time.

correct responses collapsed across all length-strength conditions; thisdistribution includes the differences between participants and has a slightpeak after 4,500 ms (due to the signal that was given at that time to remindthe participants that they had to respond). The adjusted distribution is givenby the dashed line; the participant differences have been removed, and thedistribution is smoother. A consequence of the adjustment procedure is thatthe externally time-locked peak at 4,500 ms is smeared over a range oftimes and is no longer evident to the eye.

In general, the adjusted distributions are more similar in shape, aresmoother, and have smaller standard deviations (and for the cued-recalldata are missing the peak at 4,500 ms) than the unadjusted distributions.Because the pattern of results and statistical analyses are the same for theadjusted and unadjusted sets of data, and because the responses at 4,500 msin response to the cued-recall prompt are of relatively little theoreticalinterest, all free-response data reported in this article, and analyzed andmodeled in the companion article by Diller et al. (2001), were adjusted.

Page 30: Retrieval Processes in Recognition and Cued Recall... · RETRIEVAL PROCESSES 385 as many items as possible from the previous list, in any order. This is a sequential task that is

RETRIEVAL PROCESSES 413

Appendix B

Hazard Functions

In this appendix, we begin by giving the hazard functions for recognitionfor the four types of responses: hits, false alarms, misses, and correctrejections. The hazard function, h(t\ is defined by

h(t) =AD

1 - F(t) ' (A3)

where f(t) is the probability density function, ^(0 is the cumulativedistribution function, and I represents time (see, e.g., Luce, 1986). At eachparticular time t, the hazard function gives the conditional probabilitydensity that a response will be given in the next instant, given that noresponse has yet occurred. We estimated the hazard function by using therandom smoothing technique (Miller & Singpurwalla, 1977; also describedin Luce, 1986). Figure A3 shows hazard functions for the four types ofresponses.

5+A

4

3 - -

p _ _

1 - -

"i r- Corrects- Intrusions

1000 2000 3000 4000

Reaction Time (ms)

5000

CorrectsHitsCorrect Rejections

1000 2000 3000 4000

Reaction Time (ms)

5000

IncorrectsFalse Alarms

1000 2000 3000 4000

Reaction Time (ms)

5000

Figure A3. Recognition free response: hazard functions for the four typesof responses. Data are collapsed across length and strength.

1000 2000 3000 4000

Reaction Time (ms)

5000

Figure A4, Cued-recall free response: hazard functions for the three typesof responses. Data are collapsed across length and strength.

The hazard functions are similar to those obtained by Ashby, Tein, andBalakrishnan (1993) in a memory scanning paradigm and to those reportedby Luce (1986) for simple RT tasks: The hazard function rises to a peakand then drops before becoming constant. Van Zandt and Ratcliff (1995)discussed a number of factors, such as mixing, that should produce hazardfunctions with such shapes.

Figure A4 gives for cued recall the hazard functions for the three typesof responses: corrects, intrusions, and give-ups. The data were collapsedacross the four length-strength conditions. As we discussed in the text, thedifferently shaped hazard functions for recall and recognition might be anindicator of different retrieval processes in recall and recognition.

Received June 15, 1999Revision received September 13, 2000

Accepted September 21, 2000 •