
The Study of Cognitive Architecture

Zenon Pylyshyn, Rutgers University

Although much psychological theorizing has always been implicitly concerned with architectural issues (for that is what we are concerned with when we ask "what kind of a physical system is the mind?"), Allen Newell was arguably the first scientist to try to say something in detail about the nature of this architecture. In my reading of the history of the field this began with his 1973 paper on Production Systems as models of control structures (Newell, 1973a). It was here that the idea of proposing a new primitive architecture and discovering the constraints that it imposes on processes first took shape, in what we then referred to as a theory-laden programming language. I shall have more to say about this as a research strategy later. But first I begin by reviewing the concept of cognitive architecture and summarizing some things I have already written about why it is such a fundamental problem in cognitive science. In fact, in my view, it is the central problem of cognitive science and must be addressed from the start, or the science will continue to develop fragmented and ad hoc models for small domains of data.

I develop this claim by sketching some ideas I have argued at greater length elsewhere. Whether or not there is a distinct level of organization of an intelligent system that can be called its architecture is a long-term empirical question. What is not an empirical question, however, is whether the computational view of mind requires such a level. The very notion of computation, if taken literally, presupposes the existence of mechanisms whose variability is circumscribed in certain particular ways. Computation presupposes a fixed point in the organization of a system. This organization provides the physical mechanisms that carry out algorithmic processes which, in turn, manipulate symbol structures. How fixed this structure is, and with respect to which parameters it is fixed or changeable, is itself a fundamental question I shall address presently.

For any particular computational process there is only one level of the system's organization that corresponds to what we call its cognitive architecture. That is the level at which the states (data structures) being processed are representational, and where the representations correspond to the objects of thought (including percepts, memories, goals, beliefs, and so on).[1] Notice that there may be many other levels of system organization below this, but these do not constitute different cognitive architectures, because their states do not represent cognitive contents. Rather, they correspond to various kinds of implementations, perhaps at the level of some abstract neurology,[2] which realize (or implement) the cognitive architecture, or to biological (e.g. anatomical) or physical organizations. Similarly, there may be various levels of organization above this, but these need not constitute different cognitive architectures. They may represent the organization of the cognitive process itself, say in terms of hierarchies of subroutines, not a different level of the system structure.

Also, various regularities in a system's behavior may result from the particular experience or knowledge that the system has, or from the way that knowledge was acquired and organized. On the other hand, they may also be genuinely architectural: they may represent different macro-architectural organizations, as is the case with what Fodor (1983) calls mental modules. Such organizations are fixed in a way that qualifies them as architectural properties, i.e. they are cognitively impenetrable (see below).

1. It is non-trivial to specify what constitutes an "object of thought" as opposed to some other hypothetical construct or intervening variable, such as the internal stimuli of behaviorist theories (e.g. Osgoode's s_m; Osgoode, 1953). Roughly, objects of thought are intentional states in the sense of Fodor, 1980 or Pylyshyn, 1984, which represent equivalence classes of brain states characterized by semantic properties, i.e. by what they are about. These semantic properties then appear in broad generalizations that capture the systematicity of behavior.

2. Connectionists (e.g., Smolensky, 1988) sometimes talk about distributed representations as involving a level somewhere between implementation and full semantic representations, a sort of subcognitive level of representation. But this is a very misleading way of speaking. What so-called distributed representations do is encode some semantic objects in terms of sets of features (or "microfeatures", frequently statistically derived). They do this as a way of (i.e., in lieu of) representing the concepts or objects themselves. This is a feature-decomposition or componential view of representation. However, in this case the components into which the represented objects are decomposed are the semantically interpreted symbols; they are not some special lower level of quasi-representation.

I will argue that for purposes of cognitive science, the difference between cognitive architecture and other levels of system organization is fundamental; without an independently motivated theory of the functional architecture, a computational system cannot purport to be a literal model of some cognitive process. There are three important reasons for this, which I sketch below.

Architecture-relativity of algorithms and strong equivalence. For most cognitive scientists a computational model is intended to correspond to the cognitive process being modeled at what might roughly be characterized as the level of the algorithm[3] (this view of the proper level of correspondence is what is referred to as strong equivalence). Yet we cannot specify an algorithm without first making assumptions about the architecture: algorithms are relativized to architectures. An algorithm can only be realized on an architecture that provides the appropriate primitive operations and functional organization. For example, the discrimination-tree algorithm can only be realized in what is called a register architecture, in which items are stored in registers and retrieved by address. Similarly, a binary search algorithm can only be realized in an architecture that provides primitive arithmetic operations (or at least an ordering over addresses and an operation for deciding the order of a pair of items). Because an algorithm always presupposes some architecture, discovering the cognitive architecture of the mind must be a central concern of a cognitive science which takes its goal to be the discovery of mental processes, the algorithms used by the mind, or in other words the development of strongly equivalent models of cognition.
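
To make this relativity concrete, here is a minimal illustrative sketch in Python (mine, not from the text; the function names are invented). Binary search is statable only given architectural primitives for indexed retrieval and order comparison; an architecture offering only sequential access can compute the same input-output function, but only by a different, weakly equivalent algorithm.

    # Illustrative sketch: binary search presupposes architectural primitives,
    # namely indexed (random-access) retrieval and an ordering comparison.
    def binary_search(items, target):
        """Requires: items indexable by address and sorted under '<'."""
        lo, hi = 0, len(items) - 1
        while lo <= hi:
            mid = (lo + hi) // 2          # primitive arithmetic on addresses
            if items[mid] == target:
                return mid
            elif items[mid] < target:     # primitive order comparison
                lo = mid + 1
            else:
                hi = mid - 1
        return None

    # On an architecture with only sequential access (e.g., a tape or a
    # linked list), the same function is computable, but only by a
    # weakly equivalent linear scan, i.e., a different algorithm:
    def sequential_search(items, target):
        for i, item in enumerate(items):
            if item == target:
                return i
        return None

The two functions are behaviorally indistinguishable over any finite test, yet only a register architecture can literally realize the first; this is the sense in which a model's algorithmic claims stand or fall with its architectural assumptions.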

Architecture as a theory of cognitive capacity. Another way of looking at the role of architecture is as a way of understanding the set of possible cognitive processes that are allowed by the structure of the brain. This means that to specify the cognitive architecture is to provide a theory of the cognitive capacity of an organism. The architecture provides the cognitive constants, while the algorithms and representations provide the free empirical parameters set by the incoming variable information. This allows one to explain both individual differences and differences across occasions and contexts as differences in methods, with architecture held constant (at least to a first approximation).

Architecture as marking the boundary of representation-governed processes. Finally, for many of us, a fundamental working hypothesis of Cognitive Science is that there exists an autonomous (or at least partially autonomous) domain of phenomena that can be explained in terms of representations (goals, beliefs, knowledge, perceptions, etc.) and algorithmic processes that operate over these representations. Another way to put this is to say that cognitive systems have a real level of organization at what Newell (1982) has called the knowledge level. Reasoning and rational knowledge-dependent principles apply at this level. Because of this, any differences in behavioral regularities that can be shown to arise from such knowledge-dependent processes do not reveal properties of the architecture, which remains invariant with changes in goals and knowledge. Although this is really another way of saying the same thing as was already said above, the different emphasis leads to a novel methodological proposal: namely, that the architecture must be cognitively impenetrable. Differences in knowledge and goals do not lead directly to differences in architecture, though they can do so indirectly, as when one decides to take actions that themselves lead to such changes, such as deciding to ingest drugs.

3. I use this term in order to build on the general understanding most of us have of a certain level of description of a process, e.g. the level at which rules and basic operations are applied to encoded representations. The notion of algorithm, however, needs to be explicated with some care in cognitive science (see Pylyshyn, 1984), especially since the architecture of the mind is very likely to be quite different from that of a modern computer, and hence mental algorithms will likely look very different from conventional ones as well.

I have discussed these claims in various places in the past decade. For present purposes I want to concentrate on the notion of architecture as defining cognitive capacity, and then to discuss what approaches a scientist might take in understanding cognition if the distinction between architecture and process (or representation) is taken seriously. I begin with some general points about the purpose of theories.

One of the more insidious meta-scientific tenets of psychology (and social science generally) is the idea that our principal goal is to build theories that account for variance or make statistically reliable predictions. The main problem with that view is that it fails to distinguish what type of variance is theoretically relevant, and it fails to distinguish the different ways in which statistical prediction can be accomplished. Two ways of predicting the same behavior need not be equivalent from the point of view of their explanatory adequacy. This is a point I argued in several places, including in my dispute with John Anderson (Anderson, 1978; Pylyshyn, 1979) concerning his claim that the form of mental representation is not empirically decidable. Consider the following examples.

Suppose you wished to model some cognitive phenomenon, such as perhaps reading. Among the data you might wish to account for is the way in which task latency varies with changes in the task (say, reading time as a function of certain aspects of grammatical structure). You might also wish to predict subjects' eye movements, both their latencies and their loci. Let us suppose there are several models that make approximately equivalent quantitative and qualitative predictions in this case. Is there anything to choose among them? Consider the following different ways in which the predictions might arise. (Each of these models is assumed to be implemented in a computer and takes ASCII encodings of the reading text as input. What it prints out, and in what order, varies with the model.)

1. The first model is a form of table lookup. It makes its predictions by storing a corpus of text in some form, along with mean measurements obtained from previous subjects. (Or, if storing the actual text seems like a cheat, imagine storing some skeletal information such as strings of form classes rather than the actual words.) This model prints out time-stamped gaze positions, which constitute predictions of subjects' behavior while reading the input text.

2. A second model is based on an elaborate multivariate statistical predictor with a number of parameters whose values are estimated from independent data. The input to the predictor could be words, phrases, parenthesized word strings, n-gram frequency tables, or any other parameters shown to be correlated with the relevant dependent variables. As with the first model, this one also prints out a list of time-stamped gaze positions.

3. A third model actually goes through a sequence of states, each of which specifies where the subject's gaze would be in that state, as well as the time that elapsed since the previous state. In this model the sequence of intermediate states that the model goes through in its processing corresponds to the sequence of states that a subject goes through. Thus the time associated with each gaze position is computed in the order in which the modeled process actually unfolds. The computation itself could be based on a stochastic mathematical model of state transitions, such as the Markov model used by the Harpy speech recognition system (Reddy, 1975).


4. A fourth and final model not only goes through the same sequence of states as the process it is modeling, but the transitions from state to state are also governed by the same information and the same principles or rules that determine the human process. In particular, what the model examines at gaze location L_i determines where its gaze will saccade to on the next location L_(i+1), and the time between fixations is related in some systematic way to the number and nature of the operations performed during each state transition. (A schematic contrast between the first and third kinds of model is sketched below.)
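
Purely as an illustration of the contrast (my sketch; every number and rule in it is invented), the following shows how a table-lookup model and a state-sequence model can print identical time-stamped predictions while differing in explanatory status:

    # Hypothetical sketch: two models with identical predictions but
    # different explanatory status. All data and constants are invented.

    STORED_DATA = {"I am reading": [(0, 0.10), (1, 0.15), (2, 0.40)]}

    def model_1_table_lookup(text):
        """Model 1: retrieve previously measured (position, latency) pairs."""
        return STORED_DATA[text]

    def model_3_state_sequence(text):
        """Model 3: generate the same pairs by passing through one state per
        fixation, computing each latency as the process unfolds."""
        trace = []
        for i, word in enumerate(text.split()):
            latency = 0.05 + 0.05 * len(word)   # toy per-state latency rule
            trace.append((i, round(latency, 2)))
        return trace

    assert model_1_table_lookup("I am reading") == model_3_state_sequence("I am reading")

Only the second computes its predictions in the order in which the modeled process unfolds; the fourth kind of model would additionally require that each state transition be driven by the same information and rules that drive the human reader.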

Clearly these models are not equivalent from the point of view of their explanatory power, even if they are equally predictive over some particular body of data. There are two morals I wish to draw from this simple example. The first is that explanatory power is measured relative to a set of behaviors. The set need not even be representative; it need not be an ecologically valid sample. The reason is that when an organism is in its ecological niche, the range of behaviors that it exhibits can be highly restricted by the range of environmental variations, whereas a general theory ought to account for the set of possible behaviors over different environments. In other words, a general and principled cognitive theory must be able to account for the organism's cognitive capacity. This capacity is, by hypothesis, something that does not change over specified kinds of environmental variation (in particular, informational variation). Hence the capacity fixes certain parameters while others are fit to the particular circumstances. What models (1) and (2) above fail to do is fix certain parameters while fitting a more restricted set to a specific situation.

The difference between the third and fourth models is more subtle and perhaps even more important. The reason we intuitively prefer the fourth model is that we implicitly accept that a computational model should do more than just mimic the behavior actually observed, even a sufficiently large sample of behavior. It ought also to model the WAY in which the behavior is caused in the organism. Even going through the same intermediate states as the organism being modeled is not enough unless the state changes occur for the same reason. This point has been central to some of my disagreements with models of mental imagery phenomena. I will review some of these arguments for illustrative purposes, since they are central to the issue of which regularities are due to the architecture and which are due to knowledge-dependent processes.

When subjects reason using images they appear to be constrained in various ways; for example, the sequence and timing of various representational states appears to be regular and predictable. In fact, because the sequence is very much like that which unfolds in a real physical situation, this is often taken to suggest that there are architectural properties that mimic nature. For example, there are models that postulate mental analogs of space and other properties, a sort of "natural harmony" view of the mind.

I have discussed the shortcomings of this view at length. What I have objected to has been the particular assumptions about the architecture that are made in various models, rather than the idea that there may be something special (though unspecified) going on in imagery-based reasoning. I believe there is a lesson to be learned in these arguments concerning what architecture is supposed to be doing for us and how its properties show through in empirical phenomena.

A revealing example is the Shepard and Feng (1972) study of mental paper folding. In this experiment, subjects were shown a sheet of paper cut so that it could be folded to form a box. The sheets had marks on two edges, and the subjects' task was to say whether the two marks would join when the paper was folded along indicated edges. The reliable finding was that the amount of time to do the mental task increased linearly with the number of folds that it would actually take to bring the two relevant edges physically together. What Shepard and Feng concluded is that a "second order isomorphism" exists between the way subjects had to imagine the task and the way that physical constraints require it to be performed in reality. They further took this to be evidence that imagery mimics real actions in the world, a property which they refer to as "analogue".

Subjects in this task doubtlessly go through a sequence of mental representations corresponding to a sequence of individual folds. But the question still remains: why do they do so? Or, why does that constraint on the sequence of mental events hold? In the physical case it holds because of how the task must be done by hand and because of physical properties of real surfaces (e.g., they cannot interpenetrate without tearing). In the mental case it could hold for a number of quite different reasons, several of which are worth enumerating. Which one holds, i.e. what the reason is for the observed constraint holding, is always an empirical issue. My point here is simply to indicate that it matters which interpretation is true, because they each tell us something quite different about the nature of mind and of the mental process underlying the observation. Consider the following three options and their implications.

1. The observed constraint on the sequence of mental states that subjects go through might hold because that's one of the properties of mind, of the architecture deployed in image-based reasoning. This is certainly the way that many people interpret this and other similar results (e.g. Kosslyn, 1980). Another way of putting this is to attribute the regularity to properties of the medium in which the imaginal representation is embedded. Note, however, that if that were the case then the constraint could not be made to disappear by merely changing the way the task is understood, or the subjects' goals, or the knowledge that they have concerning the way that folding operations work. This would be the strongest and most interesting reason for the observed regularity, since it would lead directly to specifying certain properties of the cognitive architecture, the basic mechanisms and resources of cognition.

2. Another reason that subjects might have gone through the sequence that they did is that they might have chosen to do the task this way, for any of several reasons. On this story, subjects could equally have chosen to do it some other way, but instead chose to mimic the way the paper would have been folded in reality. I am not claiming that this is in fact the correct explanation in this case, but it is a logical possibility and would constitute quite a different interpretation of the observed constraint. I have, moreover, claimed that this is precisely the correct explanation for a variety of other imaginal processes which have been studied chronometrically in the literature.

For example, it has widely been observed that the amount of time it takes to switch attention from one place on an imagined map to another is a linear function of the actual distance on the corresponding real map. I have argued (Pylyshyn, 1981) that the best explanation for this is that subjects take the task to be the simulation of what would happen if they were looking at a map and scanning it for specified places. Since subjects know what would happen, e.g., that in order to look from A to B one has to scan the intermediate locations, they make this happen in their imagining. In other words, they imagine a sequence of events in which attention is first at A, then a small distance away from A, and so on as it approaches B. We showed empirically that if the subjects do not take their task to be the mimicking of such a scan, then the linearity constraint no longer holds. The same is true for a variety of other results, such as the finding that it takes longer to report features on an object imagined as small than on the same object imagined as large (Kosslyn, 1975).

3. A third possible reason for the sequence of mental representations being constrained as it is in the paper-folding task is that subjects' knowledge of the task domain is organized in such a way that the observed sequence is a natural consequence. For example, in order to do the paper-folding task one needs to know what happens when a sheet of paper is folded about a given line. But suppose, as seems reasonable, that people ordinarily only know what will happen when a single fold is made. In other words, knowledge of the results of folding is indexed to individual applications of the fold operator. Why should that be so? It could arise because of experience (we only experience the results of one fold at a time) or as a result of a general economy of storage (it would be wasteful to store the result of pairs of folds when this is rarely needed and it can be reconstructed from knowing the result of individual folds). However, in this case the architecture does not require this particular way of representing the knowledge of paper folding, and other organizations could also arise (e.g. origami experts, who deal with folding problems on a frequent basis, may well represent their knowledge in terms of clusters of folds). A toy rendering of this option is sketched just below.
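
To make option (3) concrete, here is a toy sketch (mine, not from the text; the operator and timing constant are invented): if folding knowledge is indexed to a single-fold operator, an n-fold problem takes n applications of that operator, so latency grows linearly with the number of folds even though nothing architectural enforces the sequence.

    # Hypothetical sketch of option (3): folding knowledge indexed to a
    # single-fold operator, so an n-fold problem takes n applications.

    T_FOLD = 0.4   # toy seconds per imagined fold (illustrative value only)

    def apply_single_fold(sheet_state):
        """The only stored piece of folding knowledge: the result of ONE fold."""
        return sheet_state + 1        # stand-in for updating the imagined sheet

    def mental_paper_folding(n_folds):
        state, elapsed = 0, 0.0
        for _ in range(n_folds):      # no stored two-fold shortcut exists
            state = apply_single_fold(state)
            elapsed += T_FOLD
        return elapsed                # latency is linear in n_folds

The linearity here comes from how the knowledge happens to be organized; nothing in the architecture prevents an expert from acquiring a clustered operator that handles several creases in one step.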

The difference between explanation (1) and the other two is that the constraint in the latter two emerges from much more general properties of mind, e.g., the property of being able to represent beliefs and to reason with them, and the property of tending to minimize memory demands for rarely-needed information. If, however, the correct explanation of the constraint is (1), that it is due directly to a property of the cognitive architecture, then discovery of such constraints would provide important evidence for the nature of mind. Unfortunately, it turns out, more often than not, that observed regularities do not directly reflect the cognitive architecture, but rather the way that general architectural features interact with task demands and the structure of knowledge representation.

The problem is that human behavior is remarkably flexible. Almost no behavior (putting aside its physical form, which is clearly circumscribed by the laws of physics) can be excluded under all possible sets of beliefs and goals. In physics one can, at the very least, claim that certain motions or states are impossible. But within the domain of cognitive behaviors, almost nothing can be so excluded. It is because of this that the mind is better modeled by some form of resource-limited Turing machine than as any kind of non-computational physical device, including a biological machine, even though it is clearly implemented in tissue, just as a Turing machine must be implemented in some physical form to produce actual instances of behavior. When we undertake to specify a cognitive architecture, what we are doing is attempting to discover which variant of a Turing machine corresponds to the mind. But because of the intimate way that architecture and encoded knowledge interact in generating behavior, it is a highly non-trivial matter to infer the architecture from behavioral regularities. In my observation there are three distinct approaches to this goal.

The first approach is to set boundary conditions on the architecture. This can serve as a powerful way of excluding putative architectural proposals. For example, the work sketched above criticizing certain imagery models (Pylyshyn, 1973, 1981) is in this category. By showing that observed regularities can be altered in a rational way by changing subjects' beliefs about the world or about the task (i.e., by showing that the regularities are cognitively penetrable), we demonstrate that the regularities do not reveal properties of the architecture.

The second approach is to attempt to uncover pieces of the architecture in areas where the behavioral regularity is modular, or cognitively impenetrable. A considerable degree of success has been achieved in this way in the study of early vision, pre-attentive perceptual processes, and arguably even the syntactic aspects of language processing. In these areas provisional proposals have been made concerning properties of the architecture.

The third approach is by far the most ambitious. It is to use all one's intuitive appreciation of the computational problems, and of the primitive operations that human cognition might possess, to try to postulate a complete and uniform architecture for cognition. Allen Newell is the only person to have attempted so grand and ambitious a scheme. The reasons for taking this high road were also well articulated by Newell (1973b). They are that cognition cannot be adequately analyzed piecemeal, by building models of little fragments of phenomena, because understanding cognition is highly holistic, and the small questions cannot be addressed without embedding them in a larger picture of what cognitive phenomena we ultimately hope to explain. This is not only the high road; it is also the most treacherous, since any mistaken assumptions one makes in the initial axioms will tend to permeate the rest of the theory.

And finally, as a generalization of the high road, one of the novel methodologies I have attempted to exploit is called the Minimal Mechanism Strategy (Pylyshyn, 1989). This strategy attempts to find the apparently simplest mechanism (or set of operations) that is sufficient for carrying out a task that one has independent reason to believe must be carried out as part of the larger process being studied. To my mind, Newell's (1973a) postulation of a production system (PSG) as a component of the rapid memory search task was one of the first examples of this strategy. My own attempt to postulate a set of primitive mechanisms that would allow one to build up a pattern-description and perceptual-motor coordination system is another example of this strategy.

In what follows I will attempt a quick analysis of these strategies and of the general problem of methodology in the study of Cognitive Architecture.

Design Questions about Architectures

If we accept the importance of specifying the architecture in building models of the mind, we will sooner or later be faced with a number of important design decisions.

1. The problem of control structure. This was not even seen as a problem for cognitive science until Newell's 1973 paper (Newell, 1973a). Where is there a problem about the control structure? It might seem that if you have a model in the form of a program in some language, you have specified all you need to; you don't have to worry about something called its "control structure". After all, the process unfolds as it does because of the sequence of commands you write down, including commands for branching on various conditions. What more is there? Newell argued very persuasively that such a program presupposes a particular (unstated) control structure that may not be empirically warranted. There are processes being carried out and memories being used that do not appear explicitly in a model presented in this way. Binding variables, passing arguments, tracking return addresses for subroutines, and maintaining the dynamic environments associated with individual subroutine calls involve considerable run-time resources and an elaborate machinery that remains hidden from view and unexamined. Newell proposed production systems, in the first instance, as a way to minimize the run-time mechanisms and to make explicit the assumptions concerning what controls the sequencing of the basic operations of a process.

2. The problem of choosing the primitives. Choosing the basic operations in an architecture involves taking a position on certain general organizational structures and tradeoffs, such as the discipline for invocation of operators, computation vs. lookup, top-down vs. bottom-up guidance, fetch-execute vs. recognize-act cycle (see Newell, 1973a), constraints on message and argument passing, parallelism, synchronization, and so on. These are issues that have to be faced in the course of designing an architecture, and they are likely to be approached on an incremental or even trial-and-error basis.

3. The problem of uniformity vs. multiplicity of architectures. On the face of it there appears to be a need for different architectures whenever very different forms of representation are involved. For example, in early vision the representation appears to be somatotopic, and the processing runs off as a parallel relaxation process across the image. Similarly, perceptual-motor coordination becomes almost reflexive and highly stratified for certain kinds of functions. There has also been a lot of speculation concerning whether imagery and other central functions that are sometimes found dissociated in pathological cases might not involve separate architectures. There is very little detailed study of these cases from a computational perspective, so the case for distinct architectures, while tempting, has not been made. At some level we know that something special is going on in these different skills. Yet at another level (say, the neural level) they are made of the same ingredients. Whether they are distinct at the special level we call cognitive architecture remains to be seen as more detailed theories are worked out, though I will have some cautionary comments to make about this later.

4. The problem of what is explicitly represented. The behavior of a system can arise from the way it is structured or from the representations it contains. The distinction between these two was raised earlier, in connection with the decision as to which aspects of behavior are attributable to the architecture and which to the knowledge that is represented. But sometimes the distinction is not empirically so clear, especially in the case of rules. Here the question arises: are the rules that are appealed to in explaining the behavior themselves represented, or do they merely describe how the system behaves? Are they explicit or implicit? Does the system follow the rule, or merely behave in accordance with the rule? The difference is an important one, because different entities are posited in each case. If a system follows a rule then the rule must be explicitly represented and accessed in generating the behavior. To use computer terminology, such a rule is being used in "interpreted mode": some encoding of the inscription of the rule is actually accessed and evaluated to generate an action. If the rule is explicitly encoded then the form of the rule is empirically significant. Consequently the theory must assume some internal encoding scheme (or language) and hypothesize that the rule is encoded using this scheme. If the rule is implicit, or if it merely describes the behavior, then the form of the rule, or the notation in which it is written, has no empirical consequences. That is the case with, for example, the laws of physics. It is only the regularity that the laws describe that is significant, not their form. Any extensionally equivalent form is equally valid, though some may be more parsimonious or easier to compute or more transparent, and so on. (A toy illustration of the follow/accord contrast is sketched just after this list.)

Given this distinction, the question then arises whether the rules that are hypothesized for human behavior are explicitly represented or are implicit. For example, are the rules of arithmetic, or grammar, or of social conduct explicitly represented? The answer is not always so clear. Of course, if people can tell you the rule they are following (as in the case of such mathematical procedures as taking a square root, or in the case of traffic rules), then one can be reasonably sure that the rule is explicitly represented. But when the rule is tacit and not reportable by the subject, the issue is more difficult. There we need additional evidence that the form of the rule is important. Even in the case of grammar, where the realist view was promoted most strongly by Chomsky (see Pylyshyn, 1991b), the issue remains unclear. Whether or not the rules of grammar are explicitly represented and accessed in the course of speaking or comprehending remains an empirical question, the evidence for which is still tenuous (but see Pinker and Prince, 1988). On the other hand, it is clear that at least some of what we call beliefs and goals are explicitly represented, and that the representational form must meet certain boundary conditions, such as compositionality and systematicity (see Fodor & Pylyshyn, 1988, for detailed arguments). This, in turn, places certain strong requirements on the nature of the architecture, requirements that are met by architectures which, like the original Turing machine, read and write symbolic expressions. Whether they can also be met by other classes of architecture remains to be seen, though none that has been proposed so far (e.g., connectionist or analogue architectures) meets these minimal conditions.


5. The problem of Change. Human behavior is highly plastic. There are almost no behaviors of which a person is physically capable that could not be elicited under some circumstances, under some beliefs and goals. Changes in behaviors may have short- or long-term consequences on patterns of behavior. Some of these are rational responses to decisions taken with certain information and certain beliefs and utilities. Others are the result of more subtle manipulation of information and memory, and of the often tacit discovery of the utilities of different actions (there is good reason to think that a great deal of human conditioning is of this sort; see Brewer, 1974). Others are the result of self-observation, playful discovery, or even trial-and-error. Others may be the result of a reorganization of knowledge into more efficient form, as occurs when the explicitly represented knowledge of novices becomes compiled into larger automated structures ("chunks") in experts. And finally, some may be the result of changes in aspects of the architecture. The latter are changes which need not bear an informational or logical relation to the causal events. They may, for example, result from the effects of mere repetition, or from simple exposure to triggering events, as happens when instinctive behavior is released by appropriate triggers. Very little is known about which changes are architectural and which are a form of knowledge-induction. For example, it is widely assumed that both natural concepts and linguistic competence are acquired through knowledge-induction rather than architecture-changing processes (see Piatelli-Palmarini, 1980). Historically, psychologists have referred to all of these forms of long-term behavioral alteration as "learning". But a science that takes cognitive architecture as a central concern has to make distinctions among them, since they may represent quite different ways in which an architecture interacts with an environment to produce changes in state. The study of learning has been a sad chapter in the history of psychology, with the ironic consequence that the most studied problem in psychology remains the least understood.
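
The follow/accord contrast in point 4 can be exhibited in a toy Python sketch (mine; the pluralization rule and all names are invented). In the first system the rule exists as an encoded inscription that is fetched and interpreted, so its form is empirically significant; the second produces identical behavior with no represented rule at all.

    # Toy contrast: following a rule vs. merely behaving in accordance with it.

    # Explicit: the rule is an encoded data structure that the system
    # retrieves and interprets at run time ("interpreted mode").
    RULES = [("ends_with_s", "add_es"), ("default", "add_s")]

    def pluralize_explicit(noun):
        for condition, action in RULES:    # the rule's inscription is fetched...
            matched = (condition == "default" or
                       (condition == "ends_with_s" and noun.endswith("s")))
            if matched:                    # ...and its parts are interpreted
                return noun + ("es" if action == "add_es" else "s")

    # Implicit: identical behavior, but no represented rule is consulted;
    # the regularity is simply how the procedure is wired.
    def pluralize_implicit(noun):
        return noun + ("es" if noun.endswith("s") else "s")

Behaviorally the two are indistinguishable, but only in the first does the notation of RULES carry any empirical commitment, which is why showing that the form of a rule matters is evidence that it is explicitly represented.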

Some Ways to Study Architecture

There is no formula for discovering the architecture of mind, any more than there is a formula for uncovering the secrets of the physical world. We cannot afford to leave out any source of evidence, including direct neurophysiological evidence. Indeed, we need to explore the space of options even more than we do in a mature science. It is in this spirit that I want to discuss some ways that people have studied cognitive architecture.

First there is the obvious distinction between the "high road" and the "low road": between those, like Allen Newell, who would build large-scale unified theories of the mind and those who take the approach much more common in psychology and build computational models that closely fit a set of fairly restricted empirical data. The mini-model builders very nearly have a monopoly in cognitive science, and perhaps there is good reason for it. It is here that the methodology of Information Processing Psychology is most developed (methods such as chronometric analysis, protocol analysis, stage analysis, and so on), and it is here that models lend themselves most readily to experimental tests. Over the past 50 years experimental psychology has been at its most innovative in the development of experimental paradigms for deciding between such binary properties as serial vs. parallel, attentive vs. preattentive, self-terminating vs. exhaustive, and so on (Newell lists a much larger number of such oppositions).

I shall not pass judgment on the relative merits of these two approaches (Allen Newell did a good job of that in his "Twenty Questions" paper; Newell, 1973b). However, I do want to comment on a couple of disadvantages of the low road. One problem with it, which David Marr (Marr, 1982) has been most articulate in pointing out, is that if we attempt to study mechanisms outside of their more general functional context, we are quite likely to misidentify the reason for a regularity, in the way I spoke about earlier when I referred to discovering why some regularity holds. In fact, if we look at a narrow range of behavioral contexts, it is entirely natural to postulate an architectural mechanism that generates each observed regularity. But the regularity may not reveal the constraints of a mechanism at all. It may simply reveal the existence of a very general capacity and the way it interacts with a particular task. John Anderson (1991) has also argued for this point using some specific examples (see also the response by Simon, 1991). The principle that behavioral regularities may be attributable to interactions with the demands of the environment (or, more precisely, with task demands) has been made in various contexts by ecological psychologists as far back as Egon Brunswick (1956), and reiterated by Simon (1969) in his famous ant-on-the-beach parable, as well as in Newell and Simon (1972). It is also the principal point behind my critique of a large class of experiments on mental imagery which purport to show that certain special (sometimes analog) mechanisms are involved in reasoning with mental images (Pylyshyn, 1981). The lesson ought to have become etched in the minds of Information Processing modelers, but it has not.

The reason for raising this issue at this time is that if one is concerned with methodologies for discovering properties of the cognitive architecture, one must be acutely aware of the fact that not all empirical regularities transparently reveal properties of the architecture. That's why injunctions such as "first find out what function is being computed" can have a salutary effect.

What this all leads to is that in attempting to understand the larger architectural problems, one is faced with a dilemma. If we look at particular experiments and try to model them, we run the risks already alluded to. But is there any other way to proceed? All I can do is reiterate a general injunction concerning minimizing commitment too early and avoiding the temptation to posit a mechanism for each reliable empirical regularity. We have already seen in discussing the mental imagery research that many robust behavioral regularities do not reveal properties of the architecture, because they can be shown to be cognitively penetrable, and hence knowledge-dependent. Cognitive penetrability remains a major diagnostic tool in determining whether a regularity arises from some architectural feature or at least in part from a knowledge-based cognitive process. The earliest application of this criterion (though not under that name) was the demonstration that the psychophysical threshold (see the historical review of this notion by Corso, 1963) did not reveal a property of the architecture, but was confounded with a cognitive decision process which could be penetrated by changes in beliefs and utilities. This was the important contribution of Signal Detection Theory (Green and Swets, 1966).
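
As a reminder of how Signal Detection Theory effects this separation (a standard textbook decomposition, sketched here in Python rather than taken from this paper): sensitivity (d') is treated as a property of the detection mechanism, while the response criterion is a decision variable that beliefs about signal probability and payoffs can shift, so the measured "threshold" can move with no change in the underlying mechanism.

    # Standard equal-variance signal detection sketch: a fixed sensory
    # sensitivity (d') plus a penetrable decision criterion (c).
    from statistics import NormalDist

    Z = NormalDist().inv_cdf   # z-transform of a probability

    def d_prime(hit_rate, false_alarm_rate):
        """Sensitivity: taken to reflect the impenetrable mechanism."""
        return Z(hit_rate) - Z(false_alarm_rate)

    def criterion(hit_rate, false_alarm_rate):
        """Response bias: shifts with beliefs about priors and payoffs."""
        return -0.5 * (Z(hit_rate) + Z(false_alarm_rate))

    # Two observers with the same mechanism but different utilities:
    print(d_prime(0.84, 0.16), criterion(0.84, 0.16))  # d' ~ 2.0, c ~ 0.0
    print(d_prime(0.69, 0.07), criterion(0.69, 0.07))  # d' ~ 2.0, c ~ 0.49

The two observers differ markedly in how often they say "yes", yet the theory attributes the difference entirely to the penetrable decision stage, leaving d' as the candidate architectural constant.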

In what follows I will sketch some ideas about the high road for inferring aspects of the architecture. In doing this I will refer to two general case studies. The first comes from Newell's own work (as described mainly in Newell, 1990) and shows how, as a theory matures, more and more of the empirical phenomena that were assumed to directly reflect architectural properties may turn out to be interaction effects among other factors. The second example is from my own work on visual location indexes, and illustrates a research strategy which I call the Minimal Mechanism Strategy.

When and why to posit particular architectural properties

Newell (1990) distinguished between what he called "technological" limitations of an architecture and design-induced functional limitations. The difference is subtle but important. Technological limitations are arbitrary brute facts about such things as the resources available to the computational machinery. There is no point asking why such a limitation occurs, beyond perhaps asking for neural substrates. If we say that STM has a certain size, say 7 plus or minus 2,[4] then we are claiming such a brute fact about how the brain happens to be structured. On the other hand, it may also be possible to explain the phenomena of STM by appealing to the way that task requirements interact with the design principles of an intelligent system, principles that are independently motivated because of what the system has to be able to accomplish in dealing with the world, solving problems, learning, and so on. In the case of SOAR, STM phenomena occur as a side effect of other more principled design properties: for example, symbols must be kept available as long as a relevant subgoal is active, and they disappear because the working memory changes with each new production cycle, and so on.
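
A schematic way to see how a capacity limit could fall out of design rather than stipulation (my illustration only; SOAR's actual mechanisms are far more elaborate): if working-memory elements persist only while the subgoal that produced them is active, then apparent "STM capacity" emerges from goal-stack dynamics, with no fixed buffer size declared anywhere.

    # Hypothetical sketch: no STM buffer size is stipulated, yet elements
    # vanish as a side effect of subgoals completing.

    class WorkingMemory:
        def __init__(self):
            self.goal_stack = []
            self.elements = []        # (subgoal, symbol) pairs

        def push_goal(self, goal):
            self.goal_stack.append(goal)

        def add(self, symbol):
            """Symbols are tied to whatever subgoal is currently active."""
            self.elements.append((self.goal_stack[-1], symbol))

        def pop_goal(self):
            done = self.goal_stack.pop()
            # Side effect: symbols tied to the finished subgoal disappear.
            self.elements = [(g, s) for g, s in self.elements if g != done]

        def contents(self):
            return [s for _, s in self.elements]

    wm = WorkingMemory()
    wm.push_goal("count-digits")
    wm.add("digit-7")
    wm.push_goal("rehearse")
    wm.add("digit-3")
    wm.pop_goal()                     # "rehearse" ends: digit-3 is gone
    print(wm.contents())              # -> ['digit-7']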

Whether or not this particular story is correct, the idea of allowing shortcomings (deviations from normative or optimality principles) to derive from interacting functional requirements is surely a good strategy. It's much like the strategy I have always advocated with respect to appealing to stochastic mechanisms: although some psychological mechanisms and processes may be inherently stochastic, a reasonable research strategy is to resist appealing to stochastic notions until forced to do so by the inability of deterministic structural hypotheses to deal with the phenomena. The reason for such a conservative strategy in the case of stochastic mechanisms is the same as the reason for resisting positing arbitrary technological limitations as a first move: namely, it is too easy to merely stipulate properties that map directly onto empirical phenomena, and in so doing miss deeper principles that have a wider domain of applicability but which relate to the phenomena in question by interacting with the particulars of the task at hand. We have already seen examples of this principle in connection with the discussion of hypotheses concerning analogue mechanisms for visual imagery phenomena.

Before moving on to my own work I want briefly to elaborate this claim that empirical observations may not directly reflect properties of underlying mechanisms. One example that Newell commented on some years ago (Newell, 1986) concerns the modularity of cognition. The first-order evidence suggests that the cognitive system may be completely interdependent, with perception, language processing, and other cognitive skills all talking to one another and all depending on reasoning. Subsequent refinement of the evidence suggests a second-order view: that when the proper distinctions are made (say, between detection and response selection, or between parsing and understanding sentences), there is much reason to believe that the cognitive system is architecturally highly restricted with respect to which processes can talk to which. This leads naturally to a stage view, wherein certain subprocesses must be completed without the intervention of sources of knowledge outside those designated for that stage (or perhaps stored inside that module). Although I believe that something like that is true of vision, I acknowledge that some forms of apparent modularity (cognitive impenetrability of certain subprocesses) need not arise directly from architectural constraints. They may arise indirectly from requirements that the architecture imposes on such things as argument passing, as well as from task-induced requirements on the ordering of processes, on the clustering of expert knowledge in subprocesses, and so on. Of course, in the end these do depend on the architecture, since without an appropriate architecture none of the observed behavior would occur. The point is that the observed regularities need not reveal properties of the architecture directly, because the observations may arise from the way some (perhaps accidental) architectural feature interacts with knowledge, habits, task demands, and other factors.

As Newell (1990) points out, certain phenomena of encapsulation, while real enough, may in the end be compatible with a uniform architecture which exhibits modularity as a consequence of the way that the general architectural constraints interact with other factors. For example, modularity may result from (a) the development of compiled expertise through the chunking of past experience (which itself depends on the architecture, though in a less direct way) and (b) the temporal order in which processes are executed, which itself may be a product of both architecture and represented knowledge (as I suggested in discussing the Shepard and Feng experiment). The last word is not in on exactly how architectural constraints manifest themselves in what appears to be a clearly fragmented system of processing. But the possibility that it too is an interaction effect remains a serious contender here, as it is in the case of other phenomena like STM. Of course, what is needed is not just the recognition of a logical possibility but a specific proposal, as we have in the case of STM. For what it's worth, my own bet is that in areas such as early vision and certain perceptual-motor effects, the encapsulation is a symptom of real architectural differences, though in other putative areas (e.g., face recognition, mental imagery) they are interaction effects. But only time will tell.

4. Asking why STM has a capacity of 7 symbols rather than 9 would be just like asking why we have 5 fingers on each hand instead of 4 or 6. The best answer may well be that it had to be some number, and 5 did not put us at a survival disadvantage. Many people believe there has to be a teleological or evolutionary explanation for all properties of an organism. But that is patently false, since there are indeterminately many properties that will always remain unaccounted for after all evolutionary factors are taken into account.

The Minimal Mechanism Strategy and FINST Index Theory

Finally, I want to devote some time to the question of methodology for taking the high road in designing a cognitive architecture. In recent years I have adopted a strategy that I believe bears a resemblance to that which motivated Newell's first exploration in the design of a general cognitive architecture. In doing this I have confined myself to relatively early stages in vision, in particular to the interface between the early stage in vision that is preattentive, automatic, impenetrable, and perhaps parallel, and later stages where processes become serial and under at least some minimal voluntary control. The strategy I have adopted is one I refer to as a "minimal mechanism strategy", which represents at least one relatively high-road approach towards the difficult but crucial problem of inferring aspects of the cognitive architecture.

The approach is something like that taken by Alan Turing in his famous paper on the Entscheidungsproblem. In that paper Turing asks himself: what are the minimal essential operations that a mathematician must perform in doing proofs? Turing then postulates a mechanism based on these minimal operations: the mechanism can write symbols on a tape, it can move the tape by one unit, and it can change its state as a function of its current state and the symbol currently under its read head. The reasoning that went into this design is actually made explicit in Turing's paper.
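
For concreteness, here is a machine of exactly that minimal kind, written out in Python (an illustrative sketch; the transition table, a unary incrementer, is my own toy example, not Turing's):

    # Minimal Turing machine: write a symbol, move one unit, change state,
    # all as a function of (current state, symbol under the read head).

    def run(transitions, tape, state="start", head=0, halt="halt"):
        cells = dict(enumerate(tape))          # sparse, two-way-infinite tape
        while state != halt:
            symbol = cells.get(head, "_")      # "_" is the blank symbol
            write, move, state = transitions[(state, symbol)]
            cells[head] = write
            head += {"R": 1, "L": -1}[move]
        return "".join(cells[i] for i in sorted(cells)).strip("_")

    # Toy transition table: append a "1" to a unary numeral.
    INCREMENT = {
        ("start", "1"): ("1", "R", "start"),   # scan right over the 1s
        ("start", "_"): ("1", "R", "halt"),    # write one more 1 and halt
    }

    print(run(INCREMENT, "111"))               # -> "1111"

Everything the machine ever does is determined by the pair (state, symbol): precisely the minimal basis Turing extracted from what a mathematician does with pencil and paper.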

    Newell (and earlier Newell and Simon) also considered what the very least was that could be assumed in building a system that did not presuppose a hidden run-time control structure. This led to the idea of building an architecture based on a recognize-act cycle, an idea that turned out to have many interesting side effects. For example, in attempting to design a process that would run on such an architecture and produce the observed behavioral regularity in a rapid memory-search task, Newell (1973a) was led to a novel hypothesis concerning the locus of the serial-exhaustive-search observations. This hypothesis (called the decoding hypothesis) was entirely natural in the production-system architecture, though it would have seemed unmotivated in a conventional architecture.
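    A minimal sketch of such a recognize-act cycle may help fix ideas. The rule format and the toy productions below are illustrative inventions, not Newell's actual notation; what matters is that the entire run-time control structure is the matching cycle itself.

```python
# Minimal sketch of a recognize-act cycle: no hidden run-time control
# structure, just a working memory (a set of facts) and an ordered list of
# productions. The rules and memory contents are invented for illustration.

def recognize_act(productions, memory, cycles=100):
    """productions: list of (condition, action) pairs of fact sets.
    Each cycle, fire the first production whose condition matches."""
    for _ in range(cycles):
        for condition, action in productions:
            if condition <= memory:          # recognize: condition satisfied?
                new = action - memory
                if new:                      # act only if it changes memory
                    memory |= new
                    break
        else:
            break                            # no production fired: quiescence
    return memory

# Toy rules standing in for "chunked" knowledge about a seen item.
productions = [
    (frozenset({"goal:classify", "seen:2"}), {"is-digit:2"}),
    (frozenset({"is-digit:2"}), {"is-symbol:2"}),
]
print(recognize_act(productions, {"goal:classify", "seen:2"}))
```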

    Another example of this minimalist strategy was Marr and Nishihara's (1976) proposal for a mechanism which brings a 3D representation of an image into correspondence with a canonical model of the object for purposes of recognition. The problem they addressed is how a viewer-centered image (with depth information) could be placed into canonical orientation in order to be looked up in a table of shapes so the object could be identified. This proposal postulates a simple and primitive piecewise image-rotation mechanism. This mechanism takes a pair of vectors in the 3D representation and computes their projection after they have been rotated by a small angle about a reference axis, called a Spasar. Using this simple idea they were able to offer the beginnings of a theory of object recognition by shape, as well as suggest an account of the Shepard and Metzler (1971) mental rotation experiments.
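    The geometric core of such a piecewise-rotation step can be sketched in a few lines. The function names and the orthographic projection below are illustrative assumptions of mine; Marr and Nishihara's actual proposal involves considerably more machinery.

```python
# Sketch of one piecewise image-rotation step: rotate a 3D vector by a small
# angle about a reference axis (Rodrigues' rotation formula), then project
# onto the image plane. The orthographic projection is an assumption.
import math

def rotate_about_axis(v, axis, theta):
    """Rotate vector v about a unit-length axis by angle theta (radians)."""
    ax, ay, az = axis
    vx, vy, vz = v
    dot = ax * vx + ay * vy + az * vz
    cross = (ay * vz - az * vy, az * vx - ax * vz, ax * vy - ay * vx)
    c, s = math.cos(theta), math.sin(theta)
    return tuple(c * vi + s * ci + (1 - c) * dot * ai
                 for vi, ci, ai in zip(v, cross, axis))

def project(v):
    """Orthographic projection onto the x-y image plane."""
    return (v[0], v[1])

# One small incremental step, as in a piecewise rotation:
v = (1.0, 0.0, 0.0)
axis = (0.0, 0.0, 1.0)            # the reference axis
print(project(rotate_about_axis(v, axis, math.radians(5))))
```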

    In all these cases what is involved is the postulation of a mechanism which is extremely primitive yet appears to be sufficient for the task at hand. Because the notion of simplicity is not well defined, such mechanisms are not unique, and their utility ultimately depends on how well the complexity of processes built out of them matches the empirically observed complexity of psychological processes. Despite the nonuniqueness of the selection of a primitive basis, the minimal mechanism strategy represents a departure from the usual way of building models to account for experimental findings. It follows the principle of least commitment in the design of an architecture. If the evidence requires a more complex basis, this can be provided without abandoning the original model, since complex operators can be built out of the simpler ones if the latter are appropriately chosen to be complete.

    This is neither the time nor the place to go into details of the mechanism we have proposed as part of an early vision process. However, a brief sketch may provide a flavor of what is involved. The mechanism in question is one which binds internal variables to primitive features in a visual scene. In so doing it individuates these features without necessarily recognizing their type or encoding their location in some coordinate frame of reference. Since the binding is assumed to be sticky, and once assigned remains with the feature independent of its retinal location, the postulated pointer (called a FINST) provides a way to refer to occupied places in a scene.

    Although the idea is extremely elementary, it turns out to have broad implications for a number of visual phenomena, including parallel tracking of multiple targets, stability of the visual world across eye movements, certain image-scanning experiments, subitizing, and the beginnings of a view of cross-modality binding of locations for perceptual-motor coordination. These implications are discussed in Pylyshyn (1989). I shall not have time here to do more than sketch the basic idea and suggest why it has such broad ramifications.

    A simple way to illustrate what is intended by such an indexing mechanism is to describe an experiment showing that multiple items can be tracked simultaneously. In these studies, which have now been repeated in dozens of ways and by several investigators (Pylyshyn & Storm, 1988; Yantis, 1992; Intriligator, Nakayama, & Cavanagh, 1991; Dror, 1992), subjects are shown a set of 10-15 identical objects on a screen. A subset of 3-5 of them is briefly distinguished, say by flashing them. Then all objects begin to move about in random non-colliding (Brownian) motion. Subjects are required to track the subset (usually with instructions not to move their eyes). Some time later a response is required which depends on subjects having successfully tracked all the moving objects. For example, in a typical experiment one of the items flashes or changes shape, and subjects' task is to say whether it was a member of the previously distinguished subset. Subjects are able to perform better than chance under a variety of conditions in which they are unlikely to have been tracking by continuously scanning all targets in round-robin fashion. We concluded (Pylyshyn & Storm, 1988) that they must have tracked all 4 (or perhaps 5) objects simultaneously.

    The model we have in mind is extremely simple, though it has wide ramifications. It is just this: some small number of primitive visual features attract indexes from a finite pool of available internal names or pointers. These indexes provide a way to access the objects in the visual field even if the objects are in motion. Such access is primitive and occurs despite the fact that the indexed objects need not be classified, nor their locations encoded in an explicit (e.g. Cartesian coordinate) fashion. What the index provides is a way to strobe the indexed places for additional properties. The basic assumption is that no visual properties of a scene can be computed unless the features involved are first indexed in this primitive way.
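    As a rough computational gloss, consider the following sketch. The class, its capacity of four, and the feature representation are illustrative assumptions rather than claims about the actual mechanism; the essential property is that the index names the object rather than its coordinates.

```python
# Minimal sketch of a pool of FINST-like indexes: a fixed number of internal
# names that get bound to visual features and stay bound as the features
# move. The class, the capacity of 4, and the Feature type are illustrative.

class IndexPool:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.bindings = {}                 # index name -> feature object

    def grab(self, feature):
        """A salient feature attracts an index, if one is free."""
        if len(self.bindings) < self.capacity:
            name = f"FINST-{len(self.bindings) + 1}"
            self.bindings[name] = feature  # sticky binding to the object
            return name
        return None                        # pool exhausted: feature unindexed

    def access(self, name):
        """Primitive access: follow the pointer to wherever the feature now
        is, without ever having encoded its location explicitly."""
        return self.bindings.get(name)

class Feature:
    def __init__(self, x, y):
        self.x, self.y = x, y              # location lives in the world...

pool = IndexPool()
f = Feature(10, 20)
name = pool.grab(f)
f.x, f.y = 30, 40                          # ...so it can change freely,
print(pool.access(name).x)                 # yet the index still finds it: 30
```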


    What indexes allow the visual system to do is leave some of the memory in the scene instead of having to encode all the details. In that sense the mechanism provides a primitive way of situating vision (to use the currently popular jargon). It also provides a way to convert visualization phenomena (e.g. mental scanning results such as those already discussed) into partially visual phenomena. By indexing places in a real visual field, subjects can actually scan their attention to those places and hence generate what appear to be mental scanning phenomena (Kosslyn, 1973). But without indexes no relative locations can be computed, and no directed movement of attention or eye movements can be carried out, thus explaining the instability of vision in the ganzfeld (Avant, 1965).

    Similarly, the phenomenon of subitizing can be viewed as counting the number of indexes in use. According to this view, when determining the cardinality of features in the subitizing range, subjects need not search for the features in the visual scene, provided they are already indexed. That is why subitizing is faster than counting, relatively insensitive to spatial layout, and not possible when the features do not induce indexes (i.e. when they are not what are called popout features), as reported in Trick & Pylyshyn (1994).
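    On this view the subitizing judgment reduces to reading off the number of active bindings, a point the following fragment makes explicit (the binding dictionary and the capacity of 4 continue the illustrative assumptions of the sketch above):

```python
# Subitizing as counting indexes in use: no search over the scene is needed.
# The binding dictionary and capacity of 4 are illustrative assumptions.
def subitize(bindings, capacity=4):
    """Report cardinality directly when every feature attracted an index;
    beyond the pool's capacity, serial counting would be required instead."""
    return len(bindings) if len(bindings) <= capacity else None

print(subitize({"FINST-1": "dot", "FINST-2": "dot", "FINST-3": "dot"}))  # 3
```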

    Many experiments have been done to explore the conditions under which indexes are automatically attracted to features (e.g. it appears that onsets and luminance changes attract indexes but equiluminant color changes do not). Also, in computing certain relational properties that require serial processing (e.g. what Ullman, 1984, calls Visual Routines), the relevant places in the scene must first be indexed; otherwise there is no way to direct processing serially to these features.

    If we do assume the existence of such indexes, however, we also have a way to understand what happens when a mental image is superimposed on a visual scene. These experiments include the mental scanning phenomena studied by Kosslyn (which remain robust when an actual visual display is involved), as well as a variety of superposition phenomena, ranging from motor adaptation to imagined arm locations (Finke, 1979) and comparison of seen and imagined patterns (Hayes, 1973; Shepard & Podgorny, 1978) to illusions produced by imagining features superimposed on visual ones (Bernbaum & Chung, 1981). In these and other such cases what appears to be happening is that imagined patterns are being superimposed on visually present ones. This experience of visual projection notwithstanding, we do not need to posit a pictorial object (the mental image) superimposed upon another pictorial object (the perceived image) in order to obtain the visual-like effects observed in these experiments. So long as we have a way of binding representations of features (regardless of the form this representation takes) to particular places in a scene, we provide a way for actual space to serve as a framework for locating imaginal features. Because of this we make it possible to convert aspects of the process of inspecting the merged information into an actual visual process. Take, for example, the image-scanning experiment. So long as subjects can think "this place is where the lighthouse is" and "that place is where the beach is" (where the locatives are replaced by FINST indexes to features in the scene), then they can move their attention or even their gaze to the indexed places, thereby taking more time when the distance is greater. Other examples are discussed in Pylyshyn (1989).
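    The distance effect that index-mediated scanning predicts can be sketched directly; the coordinates and the linear attention-speed constant below are illustrative stand-ins for whatever the real dynamics of attention movement are.

```python
# Sketch of index-mediated scanning: attention moves between two indexed
# places in the real scene, so scan time grows with the actual distance.
# The coordinates and the speed constant are illustrative assumptions.
import math

def scan_time(place_a, place_b, speed=100.0):
    """Time to shift attention between two indexed places (units/sec)."""
    dx, dy = place_b[0] - place_a[0], place_b[1] - place_a[1]
    return math.hypot(dx, dy) / speed

lighthouse = (0.0, 0.0)              # "this place is where the lighthouse is"
beach = (50.0, 0.0)                  # "that place is where the beach is"
print(scan_time(lighthouse, beach))  # greater distance -> more time (0.5)
```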

    Finally, indexes form the basis for perceptual-motor control. According to our working hypothesis, they are the mechanism by which places in space can be bound to internal arguments, both for visual predicates and for cross-modality binding of arguments in motor commands. Thus the FINST mechanism is an aspect of the architecture of early vision and motor control, a part of the cognitive system that Newell (1990) speculated might require special mechanisms.

    The minimal mechanism strategy suggests that even when it is not possible to obtain direct evidence for a particular mechanism such as FINSTs (at least none more direct than the multiple-object tracking results), positing such a mechanism is justified when it allows one to construct certain processes which would otherwise remain a mystery. This is in fact just an instance of the hypothetico-deductive method, with the additional twist that we give particular weight to the sufficiency criterion and attempt to begin with the simplest set of operations sufficient for the task, even if we might expect that the actual mechanisms are more complex and contain the hypothesized operations as just one logical component.

    Conclusion

    Psychologists have typically been a timid bunch, unwilling to make bold hypotheses or sharp distinctions. I have argued that progress in Cognitive Science depends crucially on making at least one critical distinction: the distinction between cognitive process and cognitive architecture. There are many ways to view this distinction. Moreover, as with any empirical distinction, we find that it leaks at its edges and its joints and has to be patched carefully in order to continue providing insights without getting bogged down in border disputes.

    The message of this essay has been that the very possibility of a strong form of computational psychology, one in which the computational process literally mirrors, or is strongly equivalent to, the cognitive process being modeled, depends on our being able to independently motivate the properties of the architecture that supports the processes we are modeling. Without such independent motivation, the processes that we postulate are implicitly conditioned by the computer languages and architectures that we happen to have around, which were not designed with psychological modeling in mind.

    Having said that, we now are faced with the enormity of the task at hand. For to design a cognitive architecture is to develop a theory of the mind's relatively fixed capacities, those capacities whose modification does not follow the same principles as the modification of behavior produced by the mind's apprehension of new information, knowledge and goals. The border between architectural change and knowledge-based (or inductive) change is difficult enough to adjudicate in practice. But the task of inferring the architectural structure that underwrites cognitive processes is even more difficult insofar as it must be quite indirect, though no more difficult than the task of uncovering the fixed principles in any domain. We can approach this task in various ways. We can develop small-scale models, as we have been doing over the past half century, and remain wary that any architecture we hypothesize (or implicitly assume) must meet certain boundary conditions, such as being cognitively impenetrable. We can then subject our assumed architectural mechanisms to these criteria and see if they stand up to scrutiny (as was done in Pylyshyn, 1981, 1991a).

    Alternatively, we can take the bold step of attempting to design a complete architecture from whole cloth and subject it to the widest range of possible investigations by attempting to build working models of many different tasks. This is the strategy that Allen Newell took. An architecture posited de novo is bound to be wrong in detail, and likely even in more basic ways, and we must be prepared to iterate the process of refinement tirelessly. But unless we take on that task we may be relegating psychology to the ranks of taxonomic and descriptive sciences, like botany and zoology. I am not suggesting that this would be a terrible disaster; it may even be the only option that will work in the end, and it may have considerable practical importance. But for those of us who hope that cognition and computation are both natural scientific domains for which common causal laws may eventually be formulated, this seems like a poor working hypothesis. From this perspective, Newell's seminal works seem like a proper step, though perhaps a small step, in the right direction.


    References

    Anderson, J.R. (1991). The place of cognitive architecture in a rational analysis. In K. VanLehn (Ed.), Architectures for Intelligence. Hillsdale, NJ: Lawrence Erlbaum Associates.

    Anderson, J.R. (1978). Arguments concerning representations for mental imagery. Psychological Review, 85, 249-277.

    Bernbaum, K. and Chung, C.S. (1981). Müller-Lyer illusion induced by imagination. Journal of Mental Imagery, 5, 125-128.

    Brewer, W.F. (1974). There is no convincing evidence for operant or classical conditioning in adult humans. In W.B. Weimer and D.S. Palermo (Eds.), Cognition and Symbolic Processes. Hillsdale, NJ: Erlbaum.

    Brunswik, E. (1956). Perception and the Representative Design of Psychological Experiments. Berkeley, CA: University of California Press.

    Corso, J.F. (1963). A theoretico-historical review of the threshold concept. Psychological Bulletin, 60, 356-370.

    Dror, I.E. (1992). Components of spatial awareness: visual extrapolation and tracking of multiple objects. Unpublished report, Department of Psychology, Harvard University.

    Finke, R.A. (1979). The functional equivalence of mental images and errors of movement. Cognitive Psychology, 11, 235-264.

    Fodor, J.A. (1983). The Modularity of Mind: An essay on faculty psychology. Cambridge, Mass:MIT Press, A Bradford Book.

    Fodor, J.A. (1980). Representations. Cambridge, Mass: MIT Press, A Bradford Book.

    Fodor, J.A. & Pylyshyn, Z.W. (1988). Connectionism and Cognitive Architecture: A critical analysis. Cognition, 28, 3-71.

    Green, D.M. and Swets, J.A. (1966). Signal Detection Theory and Psychophysics. New York: Wiley.

    Hayes, J.R. (1973). On the function of visual imagery in elementary mathematics. In W. Chase (Ed.), Visual Information Processing. New York: Academic Press.

    Intriligator, J., Nakayama, K., & Cavanagh, P. (1991). Attentive tracking of multiple moving objects at different scales. ARVO Annual Meeting Abstract Issue, 1040.

    Kosslyn, S.M. (1980). Image and Mind. Cambridge, Mass: Harvard Univ. Press.

    Kosslyn, S.M. (1975). The information represented in visual images. Cognitive Psychology, 7, 341-370.


    Kosslyn, S.M. (1973). Scanning visual images: Some structural implications. Perception and Psychophysics, 15, 90-94.

    Marr, D. (1982). Vision. San Francisco, CA: W.H. Freeman.

    Marr, D. and Nishihara, H.K. (1976). Representation and recognition of spatial organization of three-dimensional shapes. A.I. Memo 377. Cambridge, Mass: MIT Artificial Intelligence Laboratory.

    Newell, A. (1990) Unified Theories of Cognition. Cambridge, Mass: Harvard University Press.

    Newell, A. (1986). General discussion of Modularity of Mind. In Z.W. Pylyshyn and W. Demopoulos (Eds.), Meaning and Cognitive Structure: Issues in the Computational Theory of Mind. Norwood, NJ: Ablex Publishing.

    Newell, A. (1982). The knowledge level. Artificial Intelligence, 18(1), 87-127.

    Newell, A. (1973a). Production systems: models of control structures. In W. Chase (Ed.), Visual Information Processing. New York: Academic Press.

    Newell, A. (1973b). Why you can't play twenty questions with nature and win. In W. Chase (Ed.), Visual Information Processing. New York: Academic Press.

    Newell, A. and Simon, H.A. (1972). Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall.

    Osgood, C.E. (1953). Method and Theory in Experimental Psychology. New York: Oxford University Press.

    Piattelli-Palmarini, M. (Ed.). (1980). Language and Learning: The Debate between Jean Piaget and Noam Chomsky. Cambridge, Mass: Harvard Univ. Press.

    Pinker, S. & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-194.

    Pylyshyn, Z.W. (1991a). The role of cognitive architecture in theories of cognition. In K. VanLehn (Ed.), Architectures for Intelligence. Hillsdale, NJ: Lawrence Erlbaum Associates.

    Pylyshyn, Z.W. (1991b). Rules and representations: Chomsky and representational realism. In A. Kasher (Ed.), The Chomskyan Turn. Oxford: Basil Blackwell.

    Pylyshyn, Z.W. (1989). The role of spatial indexes in spatial perception: A sketch of the FINST spatial indexing model. Cognition, 32, 65-97.

    Pylyshyn, Z.W. (1984). Computation and Cognition. Cambridge, Mass: MIT Press, A Bradford Book.

    Pylyshyn, Z.W. (1981). The imagery debate: Analogue media versus tacit knowledge. Psychological Review, 88, 16-45.

    Pylyshyn, Z.W. (1979). Validating computational models: A critique of Anderson's indeterminacy of representation claim. Psychological Review, 86(4), 383-394.


    Pylyshyn, Z.W. (1973). What the mind's eye tells the mind's brain: A critique of mental imagery. Psychological Bulletin, 80, 1-24.

    Reddy, D.R. (1975). Speech Recognition. New York: Academic Press.

    Shepard, R.N. and Feng, C. (1972). A chronometric study of mental paper folding. Cognitive Psychology, 3, 228-243.

    Shepard, R.N. and Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171, 701-703.

    Shepard, R.N. and Podgorny, P. (1978). Cognitive processes that resemble perceptual processes. In W.K. Estes (Ed.), Handbook of Learning and Cognitive Processes, Vol. 5. Hillsdale, NJ: Lawrence Erlbaum.

    Simon, H.A. (1969). The Sciences of the Artificial. Cambridge, Mass: MIT Press.

    Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1-74.

    Trick, L.M. and Pylyshyn, Z.W. (1994). Why are small and large numbers enumerated differently? A limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.

    Ullman, S. (1984). Visual routines. Cognition, 18, 97-159.

    Yantis, S. (1992). Multielement visual tracking: Attention and perceptual organization. Cognitive Psychology, 24, 295-330.
