symbols and rules - university of colorado bouldermatt.colorado.edu/teaching/highcog/fall8/fabian...


Page 1: Symbols and Rules - University of Colorado Bouldermatt.colorado.edu/teaching/highcog/fall8/fabian symbolsAndRules.pdf · Illustrative examples I Symbols Represent Things (Priessler

Symbols and Rules


Becoming symbol-minded

Judy S. DeLoache


Symbols

def: A symbol is something that someone intends to represent something other than itself.

This is among the looser definitions of symbols. It places an emphasis on intention and use.


Illustrative examples

- Symbols Represent Things (Preissler and Carey)
  - 18–24 mo.
  - Shown picture of whisk, associated with the word
  - Will associate the word with the object alone, or with object and picture

- Symbols are General
  - ‘Baby Signs’ at 11 mo. → greater vocabulary at 3 years
  - 13–18 mo. accept non-verbal labels, 20 mo. show verbal bias


Symbols are intentional
- Infants and toddlers are sensitive to intentions
- Older children take their own intentions very seriously.

of this age subsequently improved them. Thus, having a communicative intent enhanced the children’s appreciation of the symbolic function of their drawings.

Learning symbol–referent relations
One might think that it goes without saying that a symbol always represents something ‘other than itself’, but only gradually do infants appreciate how some symbols differ from their referents. They have to figure out through experience that a depicted toy cannot be picked up and milk cannot be obtained from a photograph of a cup.

When presented with books containing highly realistic photographs of individual objects, 9-month-olds do not simply look at the pictures, as an older person would [29,30]. Instead, they behave like the infant in Figure 2a. They manually explore the images, frequently feeling, rubbing and striking the picture surface. Sometimes they even grasp at the pictures as if trying to pick up the depicted objects. Similar behavior occurs to images of still and moving objects on a television screen [31].

The manual exploration of depicted objects presumably arises from uncertainty. Although infants can perceive a difference between real and depicted objects, they do not understand the significance of that difference, so they investigate. This interpretation is supported by the fact that the more a depicted object looks like a real object, the more infants explore it. Color photographs elicit the most exploration and black-and-white line drawings the least [30]. Thus, because infants do not understand the nature of pictures, they sometimes respond to depicted objects as if they were real objects.

This conclusion has recently been challenged by Yonas and his colleagues (pers. commun.), who presented 9-month-old infants with pictures of an object, simple color patches, and textured patterns. Using a relatively stringent definition of grasping, they found that the infants in their study made very few grasping motions toward the depicted entities. From this, they concluded that although the infants did manually explore the depicted objects, they did not respond to them as if they were real objects. Inspired by this report, we recoded the videotapes of the infants in some of our studies using the more stringent criteria for grasping. We found substantial evidence of infants’ grasping at the depicted objects, supporting our original interpretation.

Furthermore, additional relevant evidence comes from behaviors like that depicted in Figure 2b. This 9-month-old has leaned over to place his lips on the nipple of a depicted baby bottle. Thus, infants’ behavior toward depicted objects is sometimes related to the specific meaning of the real objects they represent. With age – and presumably experience with pictures and video – manual exploration of depicted objects steadily declines. By 18 months of age, children point to and talk about pictured objects instead [29,31]. Thus, through experience, infants gradually come to treat pictures symbolically, as objects of contemplation and communication, not action.

Very young children’s use of symbolic artifacts as information
As mentioned earlier, a vital function of symbols is to enable humans to acquire information without direct experience. Our vast stores of cultural knowledge exist only because we can learn indirectly through symbolic representations.

Research that my colleagues and I have done has revealed many factors influencing very young children’s ability to exploit the informational potential of symbolic artifacts. In this research, very young children are provided with information about the location of a hidden toy via a symbolic object – scale model, picture, video, map. For example, in the model task, children observe an experimenter hide a miniature toy somewhere in a realistic scale model of a room, and they are told that a larger version of the object is hidden in the corresponding place in the room itself. If the child understands the relation between the model and the room, finding the toy is relatively easy. If, however, the child does not appreciate

Figure 1. When asked to draw a balloon and a lollipop, 4-year-olds produced drawings that could have been either one. The same was true of their renderings of the experimenter and themselves. Nevertheless, when asked to name a given picture, the children were adamant that it was whatever they had intended to draw when they produced it. Reproduced with permission from [27]. Panel labels: A lollipop; The experimenter; The child; A balloon.

Figure 2. (a) Nine-month-olds often manually explore realistic photographs, revealing that they do not fully understand the critical difference between a picture and its referent. This child is making grasping motions at a highly realistic color photograph of an object. (b) This 9-month-old boy is leaning over preparing to put his lips on the nipple of the bottle. He apparently recognizes the content of the photograph, but does not appreciate how the depicted bottle differs from a real one. Reproduced with permission from [29].

Review TRENDS in Cognitive Sciences Vol.8 No.2 February 2004 68

www.sciencedirect.com


A symbol represents something other than itself


- Nine-month-olds grasp at realistic color photographs.


Information about a hidden object via symbolic representation

- 2.5-year-olds fail, 3-year-olds succeed
- Dual Representation
  - Increasing salience of model as object decreases success for 3 yo
  - Decreasing salience as object increases success for 2.5 yo
    - viewing via window
    - shrinking machine
- something


Prefrontal cortex and flexible cognitive control: Rules without symbols

Rougier, Noelle, Braver, Cohen, O’Reilly


Mechanisms of the PFC embodied in a neural network model result in a self-organized abstract rule-like PFC representation that supports flexible cognitive control.


Properties of the PFC informing the model

1. PFC can maintain representations to be held in working memory
2. Representations are adaptively maintained by switching between active maintenance and rapid updating
   - Unexpected reward stabilizes the PFC
   - Lack of expected reward destabilizes PFC
   - Controlled by dopamine from the VTA (ventral tegmental area) or without dopamine from the basal ganglia
3. PFC modulates processing via extensive connections to other cortical areas
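
The switching described in property 2 can be illustrated with a small toy. This is only a sketch under stated assumptions, not the model's actual TD-based gating unit: the class name `GatedMemory`, the `threshold` parameter, and the use of a simple reward-minus-expectation difference as the dopamine-like signal are all hypothetical choices made for illustration.

```python
class GatedMemory:
    """Toy working-memory slot gated by a reward prediction error (illustrative only)."""

    def __init__(self, threshold=0.5):
        self.content = None         # representation currently held
        self.stable = False         # True: active maintenance; False: open to updating
        self.threshold = threshold  # hypothetical cutoff on the prediction error

    def step(self, candidate, reward, expected_reward):
        # While unstable, the slot is rapidly updated with whatever is presented.
        if not self.stable:
            self.content = candidate
        delta = reward - expected_reward  # dopamine-like prediction error
        if delta > self.threshold:
            self.stable = True            # unexpected reward stabilizes
        elif delta < -self.threshold:
            self.stable = False           # missing expected reward destabilizes
        return self.content


mem = GatedMemory()
first = mem.step("shape", reward=1.0, expected_reward=0.0)   # unexpected reward
second = mem.step("color", reward=1.0, expected_reward=1.0)  # reward as expected
third = mem.step("color", reward=0.0, expected_reward=1.0)   # expected reward omitted
fourth = mem.step("color", reward=1.0, expected_reward=0.0)  # gate is open again
```

In this toy, "shape" is held across the second and third steps, and "color" replaces it only after the omitted reward has reopened the gate.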


ically intact and frontally damaged people on benchmark tasks of cognitive control.

Methods
We tested a model implementing the three sets of PFC-specific mechanisms described above (Fig. 1a), as well as versions of it lacking these mechanisms by varying degree. These models were trained either on two (Task Pairs condition) or four tasks (All Tasks condition), to test the effects of restricted vs. broad training experience, respectively. The tasks were designed to simulate simple processing of multidimensional stimuli (e.g., varying along dimensions such as size, shape, color, etc.) and active maintenance. Critically, we constructed these tasks so they all shared a common requirement: only one stimulus dimension was relevant at a given time. For example, one task involved naming a stimulus feature value along a given dimension (e.g., if the stimulus was a blue large circular object, and the relevant dimension was shape, then the correct response was to activate the “circle” output unit; Fig. 1b). Other tasks included matching features of two stimuli (if they matched along the relevant dimension, the correct output was the name of the shared feature; otherwise, the “No Match” unit should be activated) or comparing their relative ordinal values (i.e., output the name of the larger/smaller feature within the relevant dimension).
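
The four task rules described above can be written out as plain functions to make the shared requirement concrete: every task takes the relevant dimension as an argument and ignores the rest of the stimulus. This is an illustrative sketch only. The dictionary encoding of stimuli, the function names, and the particular feature orderings in `ORDER` are assumptions, not the network's input format.

```python
# Hypothetical ordinal ordering of features within each dimension.
ORDER = {
    "size": ["small", "medium", "large"],
    "color": ["red", "orange", "green", "blue"],
    "shape": ["circle", "square", "triangle"],
}


def name_feature(stim, dim):
    """NF: report the stimulus feature along the relevant dimension."""
    return stim[dim]


def match_feature(left, right, dim):
    """MF: report the shared feature, or "No Match" if the two stimuli differ."""
    return left[dim] if left[dim] == right[dim] else "No Match"


def larger_feature(left, right, dim):
    """LF: report whichever feature ranks higher within the relevant dimension."""
    return max(left[dim], right[dim], key=ORDER[dim].index)


def smaller_feature(left, right, dim):
    """SF: report whichever feature ranks lower within the relevant dimension."""
    return min(left[dim], right[dim], key=ORDER[dim].index)


left = {"color": "blue", "size": "large", "shape": "circle"}
right = {"color": "red", "size": "large", "shape": "square"}

named = name_feature(left, "shape")              # the paper's example: "circle"
same_size = match_feature(left, right, "size")
diff_shape = match_feature(left, right, "shape")
```

Changing only the `dim` argument changes the correct response for the same pair of stimuli, which is the sense in which the relevant dimension acts as the rule.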

Thus, knowing the relevant dimension was a critical rule in each task, uniquely determining the mapping from stimulus to response. Because all of the tasks shared this requirement, attention to a single dimension, we predicted that during training, the PFC would develop abstract representations of these dimensions (i.e., learn the relevant set of rules), and that this would allow it to generalize its performance to novel stimuli in each task. To allow the current rule to be discovered solely by trial-and-error learning (even in networks without a PFC, which adapted relatively slowly to task rule changes), we kept the relevant dimension the same over blocks of trials (a variety of strategies for blocking task and dimension information were explored without substantial differences in results, as described in supporting information; the basic case was task switching every block of 25 trials, with dimension switching after two iterations through all of the tasks). These conditions were designed to simulate simple forms of real-world learning experience that humans encounter during development (e.g., in playing with blocks, a sustained focus on the shapes of these objects is necessary to construct desired structures). Furthermore, we also included the ability to provide explicit task instructions to the models by means of a dimension cue input, to provide as generous a test as possible of models lacking the ability to maintain task-relevant information internally (see supporting information for more details and effects of parametric variations).

To enable generalization testing, the model saw only a subset of the feature values along each dimension for a given task and a relatively small fraction (~30%) of all possible stimuli (i.e., combinations of features across dimensions). A given training run consisted of 100 epochs of 2,000 trials per epoch; it took the networks only ~10 epochs to achieve near-perfect performance on the training items, but we measured crosstask generalization performance every five epochs throughout the duration to find the best generalization for each network, unconfounded by any differences in architecture or in the raw amount of exposure to features across different training scenarios. Generalization testing measured the network’s ability to respond to stimuli it had not seen in that task.
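
The size of the training subset can be checked with a short count. The Fig. 3 caption reports 324 training stimuli out of 1,024 possible (31.6%), with one feature left out for each of four dimensions. The sketch below shows one reading that reproduces those counts, assuming five dimensions (A–E) with four features each and a single stimulus location; which feature is withheld (`HELD_OUT`) is a hypothetical choice.

```python
from itertools import product

N_DIMS, N_FEATURES = 5, 4            # dimensions A-E, features 1-4 (Fig. 1)
HELD_OUT = {0: 3, 1: 3, 2: 3, 3: 3}  # hypothetical: last feature withheld in dims A-D

# Every combination of one feature per dimension.
all_stimuli = list(product(range(N_FEATURES), repeat=N_DIMS))

# A stimulus is trainable only if it contains none of the withheld features.
train = {s for s in all_stimuli if all(s[d] != f for d, f in HELD_OUT.items())}
test = [s for s in all_stimuli if s not in train]

train_percent = round(100 * len(train) / len(all_stimuli), 1)
```

Under these assumptions the held-out set contains every stimulus with at least one withheld feature, so each test item is novel in at least one dimension.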

We trained and tested different network configurations to test the contribution made by constituent mechanisms to learning and performance. All network configurations had the same total number of processing units, to control for the effects of overall computing resources. The only differences among configurations were the patterns of connectivity and the presence or absence of the adaptive gating mechanism. The various configurations are described in Fig. 3. These ranged from a simple feedforward network with 145 hidden units (equaling the number of hidden plus PFC units in the full PFC model) to the complete model, including full recurrent connectivity within the PFC and an adaptive gating mechanism. For all networks, we ran 10 different random initial networks to generate statistics, and error bars in Figs. 3 and 4 reflect the standard error over these runs.

Fig. 1. Model and example stimuli. (a) The model with the complete PFC system. Stimuli are presented in two possible locations (left, right). Rows represent different stimulus dimensions (e.g., color, size, shape, etc., labeled A–E for simplicity), and columns represent different features (red, orange, green, and blue; small, medium, etc., numbered 1–4). Other inputs include a task input indicating current task to perform (NF, name feature; MF, match feature; SF, smaller feature; LF, larger feature), and, for the “instructed” condition (used to control for lack of maintenance in non-PFC networks), a cue to the currently relevant dimension. Output responses are generated over the response layer, which has units for the different stimulus features, plus a “No” unit to signal nonmatch in the matching task. The hidden layers represent posterior cortical pathways associated with different types of inputs (e.g., visual and verbal). The AG unit is the adaptive gating unit, providing a temporal differences (TD) based dynamic gating signal to the PFC context layer. The weights into the AG unit learn via the TD mechanism, whereas all other weights learn using the Leabra algorithm that combines standard Hebbian and error-driven learning mechanisms, together with k-winners-take-all inhibitory competition within layers and point-neuron activation dynamics (26) (also see supporting information). (b) Example stimuli and correct responses for one of the tasks (NF) across three trials where the current rule is to focus on the Shape dimension (the same rule was blocked over 200 trials to allow networks plenty of time to adapt to each rule). The corresponding input and target patterns for the network are shown below each trial, with the unit meanings given by the legend in the lower left. The network must maintain the current dimension rule to perform correctly.

Rougier et al. PNAS | May 17, 2005 | vol. 102 | no. 20 | 7339

NEUROSCIENCE

The model was implemented in the Leabra algorithm, which includes error-driven and associative (Hebbian) learning mechanisms, k-winners-take-all inhibitory competition within layers, and point-neuron ion-channel-based neural dynamics with bidirectional excitatory connectivity. Leabra integrates the most widely used neural modeling principles developed by a variety of researchers into one unified framework, which has been used to simulate >40 different cognitive models from perception and attention to learning, memory, language, and higher-level cognition (26), plus many more published simulations in other papers. In keeping with the goal of using the same set of mechanisms and parameters across a wide range of models, default parameters and mechanisms were used in this model. The details of these standard mechanisms and the PFC-specific mechanisms in our model are described in ref. 24 and supporting information.
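
To give a feel for what k-winners-take-all competition does within a layer, here is a deliberately simplified sketch. It uses hard top-k selection, which is an assumption made for brevity and not Leabra's actual inhibition computation; the function name and the list-of-floats representation are hypothetical.

```python
def k_winners_take_all(activations, k):
    """Keep the k largest activations and silence the rest (simplified kWTA)."""
    if k <= 0:
        return [0.0] * len(activations)
    # Rank unit indices by activation, highest first; ties keep their original order.
    ranked = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    winners = set(ranked[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]


layer = [0.2, 0.9, 0.1, 0.7, 0.4]
sparse = k_winners_take_all(layer, k=2)  # only the two most active units remain
```

The effect is a sparse activity pattern in which only a fixed number of units stay active at once, so units compete to represent the current input.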

Results
Representations and Generalization. Our primary finding was that, over the course of training on these tasks, the PFC layer in the full model developed synaptic weights and associated patterns of activity that encoded abstract rule-like representations of the relevant stimulus dimensions (Fig. 2d). That is, each PFC unit came to represent a single dimension and all features in that dimension. More precisely, these representations collectively formed a basis set of orthogonal vectors that spanned the space of task-relevant stimuli, and that were aligned with the dimensions along which features had to be distinguished for task performance. More generally, we can characterize rule-like representations as encoding and producing a common abstract pattern of behavior over a broad class of specific situations. These representations were only partially apparent in the configuration having a PFC but lacking an adaptive gating mechanism (Fig. 2b), as well as the full model trained only on task pairs (Fig. 2c), and were essentially absent from the model entirely lacking a PFC (Fig. 2a). These models tended to memorize specific combinations of stimulus features and responses rather than develop abstract representations of feature dimensions that could serve as more general rules. Additional principal components analysis supported this visual interpretation of the weights, showing that the non-PFC networks do not simply have a low-dimensional “rotated” representation of the dimensions (e.g., the posterior cortex model had 8 eigenvalues >1 and a smooth continuum down to a minimum of 0.4, which is still relatively large). As noted in Methods, the total number of training trials and stimulus inputs was equated across simulation conditions, so that the increased breadth of experience in the All Tasks condition was solely from exposure to more task contexts. Furthermore, models were trained well beyond convergence, so differences in overall learning rate are not a factor.

Fig. 2. Representations (synaptic weights) that developed in four different network configurations. (a) Posterior cortex only (no PFC) trained on all tasks. (b) PFC without the adaptive gating mechanism (all tasks). (c) Full PFC trained only on task pairs (name feature and match feature in this case). (d) Full PFC (all tasks). Each image shows the weights from the hidden units (a) or PFC (b–d) to the response layer. Larger squares correspond to units (all 30 in the PFC and a random and representative subset of 30 from the 145 hidden units in the posterior model), and the smaller squares within designate the strength of the connection (lighter = stronger) from that unit to each of the units in the response layer. Note that each row designates connections to response units representing features in the same stimulus dimension (as illustrated in e and Fig. 1). It is evident, therefore, that each of the PFC units in the full model (d) represents a single dimension and, conversely, that each dimension is represented by a distinct subset of PFC units. This pattern is less evident to almost entirely absent in the other network configurations (see text for additional analyses).

7340 | www.pnas.org/cgi/doi/10.1073/pnas.0502455102 Rougier et al.

The abstract rule-like representations that developed in the full PFC model supported task performance by providing top-down excitatory support for the relevant stimulus dimension in the rest of the network. The adaptive gating system learned to update the PFC layer activity when the relevant stimulus dimension (i.e., task rule) changed (due to rapid error-based destabilization of PFC activations), and the PFC actively maintained this rule while it remained in effect. In models without these active maintenance and updating mechanisms, synaptic learning mechanisms shifted the network’s processing to the relevant stimulus dimension, but these changes were necessarily slower than the rapid shifts that can be achieved by dynamic updating of activation states in PFC (26). This difference accounts for the increased levels of perseveration observed with PFC damage in the Wisconsin Card Sort Task (WCST) and other tasks, as has been demonstrated in several existing models (14, 15, 24) and as we report for our model below.

We hypothesized that the abstract rule-like representations that developed in the full PFC model should support more flexible cognitive control in this model relative to the others. We tested this idea by comparing the ability of each network to generalize its performance across the different tasks. Each network was trained on a subset of stimuli in each task and then tested on stimuli that it had not previously seen in that task. We theorized that the abstract dimensional representations in the PFC would be able to guide processing for the task-novel test stimuli in a similar manner as the trained stimuli. Indeed, only the Full PFC model exhibited substantial generalization, achieving 85% accuracy (i.e., only one-third as many errors as other networks) on stimuli for which it had no prior same-task experience (Fig. 3a). However, this was the case only for the All Tasks regimen; training on pairs of tasks resulted in more than four times as many generalization errors. This indicates that breadth of experience was critical for exploiting the mechanisms present in the PFC, just as we had earlier observed in the development of the abstract rule-like PFC representations. Indeed, Fig. 3b shows that, as we hypothesized, the degree to which different networks developed abstract dimensional representations was strongly correlated with the network’s generalization performance (r = 0.97).

There is a clear mechanistic explanation for why the combination of rapid updating and sustained active maintenance of task rule representations in the full PFC model (which depends on the adaptive gating mechanism) was critical for the formation of abstract rule-like representations during training. Within a block of trials with the same relevant dimension, the specific features within that dimension varied, but a constant PFC activity pattern was maintained due to the gating mechanism. This caused these PFC representations, which initially had random connections, to begin to encode all of the varying features within a dimension, resulting in an abstract dimensional representation. In contrast, other networks tended to activate new representations for each new stimulus (as the specific features changed) and thus were unable to form the dimensional abstraction across features. Interestingly, the dimensional alignment of PFC representations was greater for the All Tasks than the Task Pairs condition. This is because the pressure to use the same PFC representations across all tasks increased with the number of tasks: with only two tasks, it was possible for the network

Fig. 3. Generalization and learning results. (a) Cross-task generalization results (% correct on task-novel stimuli) for the full PFC network and a variety of control networks, with either only two tasks (Task Pairs) or all four tasks (All Tasks) used during training (n = 10 for each network; error bars are standard errors). Overall, the full PFC model generalizes substantially better than the other models, and this interacts with the level of training such that performance on the All Tasks condition is substantially better than the Task Pairs condition (with no differences in numbers of training trials or training stimuli). With one feature left out of training for each of four dimensions, training represented only 31.6% (324) of the total possible stimulus inputs (1,024); the ~85% generalization performance on the remaining test items therefore represents good productive abilities. The other networks are: Posterior, a single large hidden unit layer between inputs and response, a simple model of posterior cortex without any special active maintenance abilities; P + Rec, posterior + full recurrent connectivity among hidden units, allows hidden layer to maintain information over time via attractor dynamics; P + Self, posterior + self-recurrent connections from hidden units to themselves, allows individual units to maintain activations over time; SRN, simple recurrent network, with a context layer that is a copy of the hidden layer on the prior step, a widely used form of temporal maintenance; SRN-PFC, an SRN context layer applied to the PFC layer in the full model (identical to the full PFC model except for this difference), tests for role of separated hidden layers; NoGate, the full PFC model without the AG adaptive gating unit. (b) The correlation of generalization performance with the extent to which the units distinctly and orthogonally encode stimulus dimensions for the networks shown in Fig. 2. This was computed by comparing each unit's pattern of weights to the set of five orthogonal, complete dimensional target patterns (i.e., the A dimension target pattern has a 1 for each A feature, and 0s for the features in all other dimensions, etc.). A numeric value between 0 and 1, where 1 represents a completely orthogonal and complete dimensional representation, was computed for unit i as $d_i = \max_k |w_i \cdot t_k| / \sum_k |w_i \cdot t_k|$, where $t_k$ is the dimensional target pattern k, $w_i$ is the weight vector for unit i, and $|w_i \cdot t_k|$ represents the normalized dot product of the two vectors (i.e., the cosine). This value was then averaged across all units in the layer and then correlated with that network's generalization performance. (c) Relative stability of PFC and hidden layer (posterior cortex) in the model, as indexed by Euclidean distance between weight states at the end of subsequent epochs (epoch = 2,000 trials). The PFC takes longer to stabilize (i.e., exhibits greater levels of weight change across epochs) than the posterior cortex. For PFC, within-PFC recurrent weights were used. For Hidden, weights from stimulus input to Hidden were used. Both sets of weights are an equivalent distance from error signals at the output layer. The learning rate is reduced at 10 epochs, producing a blip at that point.
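The dimensional index described in panel (b) of the caption can be sketched in a few lines. This is a minimal illustration, assuming the index is the largest absolute cosine between a unit's weight vector and any target pattern, divided by the sum of absolute cosines over all target patterns. The function names (`cosine`, `unit_index`, `layer_index`) and the toy two-dimension example are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Normalized dot product (cosine) of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def unit_index(w, targets):
    """d_i = max_k |cos(w, t_k)| / sum_k |cos(w, t_k)| for one unit."""
    sims = [abs(cosine(w, t)) for t in targets]
    total = sum(sims)
    return max(sims) / total if total > 0.0 else 0.0


def layer_index(weights, targets):
    """Average of the per-unit index across all units in a layer."""
    return sum(unit_index(w, targets) for w in weights) / len(weights)


# Toy example (assumption): two dimensions with two features each.
targets = [[1, 1, 0, 0],   # dimension A: 1 for each A feature
           [0, 0, 1, 1]]   # dimension B: 1 for each B feature
aligned = [1.0, 1.0, 0.0, 0.0]  # weights cover dimension A only
mixed = [1.0, 0.0, 1.0, 0.0]    # weights split across A and B

d_aligned = unit_index(aligned, targets)
d_mixed = unit_index(mixed, targets)
d_layer = layer_index([aligned, mixed], targets)
```

In this form, a unit aligned with a single dimension scores 1, and a unit with nonzero weights spread evenly over K target patterns scores 1/K, which is the smallest value such a unit can take.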

Rougier et al. PNAS | May 17, 2005 | vol. 102 | no. 20 | 7341



to use different PFC representations for different tasks, but this strategy becomes less and less efficient as the number of tasks increases. The adaptive gating mechanism also caused the PFC representations to focus on single dimensions, instead of encoding features across multiple dimensions, because the gating mechanism caused all active PFC units to be inhibited upon a dimension switch, discouraging persistent activation across multiple dimensions. Thus, overall, the adaptive gating mechanism plays a critical role in shaping the PFC representations.
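The mechanistic argument above can be illustrated with a deliberately simplified toy. This sketch is not the paper's network or learning rule: it assumes a single symbolic "unit" per maintained pattern and a plain Hebbian-style weight increment, and the names `toy_hebbian_training`, `gated`, and `lr` are hypothetical. It only shows why holding one pattern active across a block lets that pattern accumulate weights to every feature of the relevant dimension, while resetting on a dimension switch keeps each pattern confined to one dimension.

```python
def toy_hebbian_training(trials, gated, lr=0.5):
    """Return {unit_id: {(dimension, feature): weight}} after training.

    trials: sequence of (dimension, feature) pairs, grouped in blocks.
    gated:  if True, the active unit is kept until the dimension switches
            (maintenance plus reset on switch); if False, every stimulus
            activates a fresh unit (no maintenance).
    """
    weights = {}
    current_unit = None
    previous_dimension = None
    next_unit = 0
    for dimension, feature in trials:
        switched = dimension != previous_dimension
        if switched or not gated:
            # Gate opens: the old pattern is dropped, a new unit becomes active.
            current_unit = next_unit
            next_unit += 1
        unit_weights = weights.setdefault(current_unit, {})
        key = (dimension, feature)
        # Hebbian-style increment between the active unit and the current feature.
        unit_weights[key] = unit_weights.get(key, 0.0) + lr
        previous_dimension = dimension
    return weights


# Toy blocks (assumption): three trials on dimension A, then two on dimension B.
trials = [("A", 0), ("A", 1), ("A", 2), ("B", 0), ("B", 3)]
with_gating = toy_hebbian_training(trials, gated=True)
without_gating = toy_hebbian_training(trials, gated=False)
```

With gating, two units emerge, one connected to all three A features and one to both B features. Without maintenance, five units emerge, each connected to a single feature, so no unit spans a dimension.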

Our model makes the additional prediction that PFC representations should stabilize later in development (training) than those in posterior areas, because it is necessary for representations in posterior systems to stabilize before the PFC can extract the dimensions of these representations relevant to task performance. We tested this by measuring the average magnitude of weight changes from projections into the main hidden (posterior cortex) layer and in the PFC layer. The hidden layer stabilized within 20 epochs (one epoch is 2,000 trials), whereas the PFC did not stabilize until 70 epochs (Fig. 3c). This slower development of PFC representations, together with the breadth of training required, is consistent with the protracted developmental course of the human PFC (extending into late adolescence), which allows a broad range of experience to shape PFC representations (9–11).
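A stability measure of this kind can be sketched as the Euclidean distance between weight snapshots taken at the end of successive epochs. The sketch below is an assumption-laden illustration: the source does not state a numerical stabilization criterion, so the threshold `tol`, the function names, and the toy snapshots are all hypothetical.

```python
import math


def weight_change(prev, curr):
    """Euclidean distance between two flattened weight snapshots."""
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, curr)))


def changes_per_epoch(snapshots):
    """Distance between each pair of consecutive end-of-epoch snapshots."""
    return [weight_change(a, b) for a, b in zip(snapshots, snapshots[1:])]


def first_stable_index(changes, tol):
    """Index of the first change after which every change stays below tol.

    Returns None if the weights never settle below the threshold.
    """
    stable_from = None
    for i, change in enumerate(changes):
        if change < tol:
            if stable_from is None:
                stable_from = i
        else:
            stable_from = None
    return stable_from


# Toy snapshots (assumption): a layer that settles early and one that settles late.
fast_layer = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0], [3.0, 4.0]]
slow_layer = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [6.0, 8.0]]

fast_changes = changes_per_epoch(fast_layer)
slow_changes = changes_per_epoch(slow_layer)
fast_stable = first_stable_index(fast_changes, tol=0.5)
slow_stable = first_stable_index(slow_changes, tol=0.5)
```

Comparing the two indices gives the ordering claimed in the text: the layer with the smaller index stabilized earlier.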

Neuropsychological Tasks. We next explored whether the rule-like PFC representations learned by our model can produce appropriate patterns of performance in tasks specifically associated with prefrontal function. To do so, we used the full PFC model trained in the All Tasks condition to perform simulations of the Stroop task and the WCST, two tasks that have been used widely as benchmarks of prefrontal function (27–30). Converging evidence from a variety of sources suggests that the kinds of dimensional stimulus representations found in our model are localized in dorsolateral areas of PFC (DLPFC) in humans (see supporting information for more discussion). Accordingly, we focused on DLPFC lesion data in both of these tasks.

In the Stroop task, participants are presented with color words printed in various colors and are asked to either read the word or name the color in which it is printed. Due to greater familiarity with word reading, it is relatively faster than color naming, and an incongruent word (e.g., "green" displayed in red) interferes with color naming (saying "red"), whereas word reading is relatively unaffected. To simulate these asymmetries of experience in our model, one of the stimulus dimensions was trained less (25% as much) than the other four dimensions, with all other factors unchanged from the first study. The model captures the characteristic effects seen in human Stroop performance (Fig. 4a). These results replicate previous modeling work showing that top-down excitation from PFC representations of the dimensions that define each task (colors vs. words) can partially compensate for the differences in relative strength of the relevant posterior pathways (13, 26). However, unlike these earlier models, PFC representations in our model developed through learning. Furthermore, Fig. 4b shows that simulated lesions to the model's PFC layer (30% unit removal, post training) replicate the color-naming impairments observed from PFC lesions (predominantly dorsolateral areas of PFC) in human patients (30), consistent with the observation that this PFC area supports abstract color dimension representations (29).
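A post-training lesion of the kind described above (removing a fraction of units from a layer) can be sketched as a binary mask applied to the layer's activations. This is a minimal illustration only: the source does not say how the removed units were chosen, so random selection, the layer size of 30, the seed, and the names `lesion_mask` and `apply_lesion` are all assumptions.

```python
import random


def lesion_mask(n_units, fraction, seed=0):
    """Return a 0/1 mask that silences a random fraction of units."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_removed = round(n_units * fraction)
    removed = set(rng.sample(range(n_units), n_removed))
    return [0.0 if i in removed else 1.0 for i in range(n_units)]


def apply_lesion(activations, mask):
    """Zero out the activations of lesioned units."""
    return [a * m for a, m in zip(activations, mask)]


# Toy layer (assumption): 30 units, all fully active, with a 30% lesion.
mask = lesion_mask(n_units=30, fraction=0.30)
lesioned = apply_lesion([1.0] * 30, mask)
```

Sweeping `fraction` over several values would give the kind of graded damage axis used for the perseveration results, under the same assumptions.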

In the WCST task, participants are provided with a deck of cards bearing multidimensional stimuli that vary in shape, size, color, and number. These must be sorted according to a particular dimension (rule), which must be discovered from trial-and-error feedback. The rule switches without warning after the participant makes a criterion number of correct responses in sequence (e.g., ref. 8). Patients with frontal damage typically are able to discover the first

Fig. 4. Neuropsychological task results. (a) Performance of the full PFC network on a simulated Stroop task, demonstrating the classic pattern of conflict effects on the subordinate task of color naming with unaffected performance on the dominant word reading task (human data from ref. 31). This was simulated by training one dimension (a) with one-fourth the frequency of the others, making it weaker. In the neutral condition, a single feature was active, whereas the conflict condition had two features present and the dimension cue input specified which was to be named. Reaction time (RT) was measured as the number of cycles to activate a feature in the response layer >0.75 (multiplied by 35 to match human RT in msec). (b) Stroop performance for a 30% lesion (removal) of PFC units in the model (posttraining), compared with data from ref. 30 on patients with left frontal (LF) lesions (six of eight include dorsolateral PFC) and matched controls (Ctrl) (data in seconds to complete a block of trials; model cycles were transformed as RT = cycles × 5.5 − 30 to fit this scale; the Conflict Word reading conditions were not run on the human subjects). The main effect of damage is an overall slowing of color naming, consistent with the notion that the PFC provides top-down support to this weaker pathway via abstract dimensional representations. (c) Performance in a simulated WCST task, demonstrating the classic pattern of increasing perseveration with increased PFC damage (% of units removed, posttraining). Perseverations = number of sequential productions of feature names corresponding to the previously relevant dimension after a switch. Clearly, the simulated PFC is critical for rapid flexible switching. (d) WCST results (perseverations) for the three different training conditions used by ref. 28 (128 is the standard case plotted before, whereas 64A involves providing instructions about the relevant dimensions along which cards could be sorted, and 64B has explicit instruction when the rule changes; see supporting information for details). n = 10 networks; error bars = standard error for all graphs.
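Two quantities defined in the Fig. 4 caption lend themselves to a short sketch: the conversion of model cycles to reaction times, and the perseveration count. The sketch assumes the two conversions are the linear maps RT = 35 × cycles (msec) and RT = 5.5 × cycles − 30 (seconds per block), and it assumes that "sequential productions" means the unbroken run of responses on the old dimension immediately after a switch. The function names and the example values are hypothetical.

```python
def rt_msec(cycles):
    """Model cycles scaled to milliseconds (assumed: cycles times 35)."""
    return cycles * 35


def block_time_seconds(cycles):
    """Model cycles mapped to block time (assumed: cycles times 5.5, minus 30)."""
    return cycles * 5.5 - 30


def count_perseverations(responses_after_switch, previous_dimension):
    """Length of the initial run of responses that still follow the old rule.

    responses_after_switch: the dimension of each named feature, in trial
    order, starting with the first trial after the rule switch.
    """
    count = 0
    for dimension in responses_after_switch:
        if dimension != previous_dimension:
            break
        count += 1
    return count


# Toy values (assumption): 20 cycles to threshold, and a short response run.
example_rt = rt_msec(20)
example_block = block_time_seconds(20)
example_perseverations = count_perseverations(
    ["color", "color", "shape", "color"], previous_dimension="color"
)
```

Under this reading, a later return to the old dimension (the fourth response in the example) is not counted, because the run was already broken by a response on the new dimension.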

7342 | www.pnas.org/cgi/doi/10.1073/pnas.0502455102 Rougier et al.


Language, embodiment, and the cognitive niche

Andy Clark


Words as targets in the material world

- Example of Sheba the chimpanzee

- “The simple act of labeling creates for the learner a new realm of perceptible objects upon which to target her more basic capacities of statistical and associative learning. The actual presence of tags or labels . . . alters the computational burdens involved in certain kinds of learning and problem-solving”


Words as constituents of hybrid thoughts: numbers

Dehaene et al. suggest that precise mathematical thought relies on...

- ‘1-ness’ ‘2-ness’ ‘3-ness’ ‘more-than-that-ness’

- Rough magnitude comparisons

- Learnt capacity to use specific number words of a language and appreciate that each distinct number represents a distinct quantity


Thinking about thinking

Linguaform reason

- Thinking about the thoughts and beliefs of others

- Being self-evaluative

- “To formulate thoughts in words is to create an object available to ourselves and to others, and, as an object, it is the kind of thing we can have thoughts about.”

- Stable attendable structure to which subsequent thinking can attach.


A stabilizing force in neural networks?

- Neural networks are ‘fluid’ and need stabilizing

- The models can be strongly context sensitive

- Words, instead of being cues, might be like encountered items and act directly on mental states