
On Activity-Dependent Development in the Mammalian Visual System

Prior to Visual Experience

by G. David Poznik

Thesis Submitted in partial fulfillment of the requirements for the Degree of Bachelor of Science with Honors Distinction in the Department of Biophysics at Brown University.

April 1999


This thesis by G. David Poznik

is accepted in its present form by the Division of Biology and Medicine as satisfying the thesis requirements for the degree of

Bachelor of Science with Honors in Biophysics

Date ________________     Prof. Leon N Cooper, advisor

Date ________________     Prof. Michael A Paradiso, second reader


Acknowledgments

First, I would like to thank Dr. Brian Blais. He has been a mentor to me from day one. My research was motivated by his work and was strongly influenced by his ideas. Brian was always available to answer my questions, whether they were about C, Matlab, LaTeX, simulation results, Biology, or Physics. He also saved me much time in writing this thesis, as several of the figures are his.

I’d like to thank my advisor, Professor Leon Cooper, for giving me the opportunity to work in his lab. I am grateful for having had the chance to synthesize my undergraduate studies of Mathematics, Physics, Biology, and Computer Science into a single research project in an exciting field. I’d also like to thank Professor Michael Paradiso for agreeing to be my second reader even though he is on sabbatical this semester. Thank you to the other members of the Cooper lab: Omer Artun, Ann Lee, Nicola Neretti, Pedja Neskovic, Dr. Gastone Castellani, and Dr. Harel Shouval for sharing your collective expertise.

I’d like to thank my best friend, Pranay Parikh, for getting me interested in Biology in the first place. Thanks also to the other guys in my house and on my ultimate team for making my last year of college a memorable one. And especially, thanks to my family for giving me the opportunity to get a first-class education, and for supporting me along the way.


Abstract

On Activity-Dependent Development in the Mammalian Visual System Prior to Visual Experience

We offer an explanation of the pre-eye-opening development of four properties of the mammalian visual system: cortical orientation selectivity, the localized nature of retinogeniculate connections, the retinotopic map in the LGN, and the eye-specific lamination of the LGN. We investigate the possibility that the development of these properties results from structure in the activity of the prenatal visual environment. Three separate approaches are taken to modeling this activity. We model the prenatal visual environment as consisting of either: explicitly correlated noise, retinally processed noise, or retinal waves. A study of the behavior of the BCM and PCA learning rules in these model environments leads to an understanding of the emergence of the four visual system properties. Both rules are consistent with the initial development of orientation selectivity, but, unlike BCM, the PCA rule is unable to account for the development of the LGN properties.


Contents

1 Introduction
  1.1 Background
      The Neuron
      The Visual System: The Retinofugal Pathway
  1.2 Some Properties of the LGN and Visual Cortex
      Cortical Orientation Selectivity
      LGN Properties
  1.3 The Problem
  1.4 The Prenatal Visual Environment
      Retinally Processed Noise
      Correlated Environments
      Retinal Waves

2 Methods
  2.1 Modeling Neural Behavior
      The Model Neuron
      Synaptic Modification
      Discrete Learning Rules and Simulations
  2.2 The PCA Learning Rule
      Analysis
      Low Dimensional Examples
  2.3 The BCM Learning Rule
      Analysis
      Low Dimensional Examples
  2.4 Training on Natural Scenes
      PCA
      BCM
  2.5 A Model Synthetic Environment
  2.6 Environments Defined by a Correlation Function
  2.7 Retinally Processed Noise
  2.8 Retinal Waves

3 Results
  3.1 Expectations
      Geniculo-Cortical Connections
      Retinal-Geniculate Connections
  3.2 Orientation Selectivity
      Correlation Function Environments
      Retinally Processed Noise
  3.3 Localized Retinogeniculate Connections
  3.4 Retinotopy in the LGN
  3.5 Eye-Specific Layers in LGN

4 Discussion
  4.1 Interpretations
      Cortical Orientation Selectivity
      The LGN
  4.2 Comparison of Learning Rules
  4.3 The Emergent Picture
      Retina to LGN Connections
      LGN to Cortex Connections
  4.4 Conclusion

A PCA Fixed Points


Chapter 1

Introduction

Being and thought are one.
Jean Dubuffet (1901–1985)

It is our abilities to think, to reason, to learn, to communicate, and to reflect on our own existence that define the human experience. The source of these capabilities lies in the brain’s aptitude for storing and processing information. Our goal in neuroscience is to reach an understanding of how the brain works; we want to be able to explain both its physiology and its methods of computation. Toward this understanding, researchers have proposed several models of neural information processing.

BCM (Bienenstock et al., 1982) and PCA (Oja, 1982) are examples of such models, or “learning rules.” They can be studied in simulations of visual environments. An important result is that a simulated visual cortical neuron can develop orientation selectivity (Law, 1994). That is, a model cell that at first responds randomly to stimulation can learn, through the application of a learning rule on a simulated visual environment, to respond strongly to a bar of light of a specific orientation and weakly to bars of different orientations. This property of real cortical neurons was first characterized in the sixties (Hubel and Wiesel, 1962). The simulations mentioned above model an animal’s visual experience, which has been shown to be important in cortical development (Blakemore, 1976; Wiesel and Hubel, 1965). However, visual experience does not tell the whole story in a cell’s development. Orientation selectivity exists to some extent in kittens before they open their eyes (Hubel and Wiesel, 1963) and in monkeys before birth (Wiesel and Hubel, 1974). There are also several properties of the LGN (a component of the visual pathway) that develop prior to visual experience. This work will attempt to explain how these properties may arise in an activity-dependent manner.

1.1 Background

The Neuron

This work will focus on learning at the single cell level. Here we will review the basic properties of the neuron, the primary information processing unit (Fig. 1.1). The cell body, or soma, has two types of projections (neurites): dendrites and axons. These projections transmit information in the form of electro-chemical pulses. A typical neuron has an array of dendrites that brings information to the cell from other parts of the nervous system. There is also a single axon through which information leaves the cell.


Figure 1.1: The Neuron (adapted from Bear, Connors, Paradiso, Neuroscience: Exploring the Brain). The dendrites collect information that contributes to the potential of the soma. Information is transmitted by the axon.

Neurons have a negative resting potential of about −65 mV. Innervation from other neurons can raise or lower the potential. When the cell’s potential is raised beyond a threshold potential of −40 mV (Bear et al., 1996), voltage-gated sodium channels open up. Positive sodium ions rush in, further depolarizing the cell. The potential reaches a peak just before the sodium channels close. This electrical potential burst is called an action potential. It is transmitted by the cell along its axon.

A basic property of neural systems is that these action potentials are “all or nothing.” That is, all action potentials are the same size—there are no half-bursts of potential. A neuron’s job, therefore, is to collect potential input and use it to decide whether or not to fire. Dendrites collect the information at junctions with other cells’ projections. These junctions are called synapses. Some synapses have a greater efficacy in eliciting action potentials than other synapses receiving the same stimulation. Which synapses a neuron regards as more important than others is not fixed. Synapses grow and shrink in importance as a function of the activity they are able to elicit (Hebb, 1949). This dynamic process is the basis of neural modeling.

The Visual System: The Retinofugal Pathway

This work will deal entirely with vision as a representative neural system. In the visual system, the eye focuses and collects light from its environment. The coded input leaves the eye via the optic nerve, and heads for the lateral geniculate nucleus (LGN), a small area of the thalamus (Fig. 1.2). From the LGN, the information is relayed to the primary visual cortex, located at the rear of the brain1. The visual cortex processes visual input and then sends it off to other areas of the brain for higher order processing and, ultimately, visual perception.

Raw visual input consists of a pixelated mesh of photons in varying intensities and wavelengths. We perceive variation in intensity as level of brightness and variation in wavelength as color. The intensity and wavelength data collection takes place in the retina, three layers of cells at the eye’s rear. Considering the visual acuity that the eye is capable of2, a strict bitmap of the visual environment constitutes an enormous amount of data. In normal visual environments, like natural scenes, this mass of data is highly structured. It is the visual system’s job to pick out the structure from the mass of information. Higher order visual processing enables us to recognize objects and plays a large role in our perception of our surroundings.

1We must note the curious fact that 80% of LGN innervation is actually feedback from the visual cortex (Bear et al., 1996). There is currently no explanation for the role of this feedback. This observation is, however, not important to our model.

2Humans, by no means the most visually gifted animals, can recognize letters that subtend 5 minutes of arc (Bear et al., 1996).


Figure 1.2: The visual pathway (adapted from Bear, Connors, Paradiso, Neuroscience: Exploring the Brain).


The process of identifying the structure of the environment takes place at many levels in the visual pathway. It begins as early as the retinal stage. The retinal layers are organized in a hierarchical fashion. The first layer, the photoreceptors (rods and cones), is tightly packed at the back of the retina. Bipolar cells occupy the next layer. They receive direct input from a small cluster of photoreceptors in addition to receiving indirect input, by means of horizontal cells, from photoreceptors surrounding the cluster (Fig. 1.3). The area of receptors that, when stimulated, will influence the bipolar cell is referred to as the receptive field. The particular architecture of the bipolar cell–photoreceptor interaction is a center-surround receptive field. Stimulation at the receptive field center invokes a response opposite to that of stimulation of the periphery. There are two types: ON-center and OFF-center. An ON-center receptive field, for example, will yield maximal stimulation when light is cast on the center and the surround is dark. There will be less of a response if the whole field is illuminated, and there will be merely spontaneous firing if there is no illumination (Fig. 1.4).

1.2 Some Properties of the LGN and Visual Cortex

This thesis will attempt to explain the development of four properties of the visual system which have been observed prenatally. To simplify matters, we treat these properties as arising from feed-forward interactions. That is, we try to explain them without reference to lateral connectivity (networking) or feedback mechanisms.



Figure 1.3: A bipolar cell’s receptive field (Bear, Connors, Paradiso, Neuroscience: Exploring the Brain).

Figure 1.4: An off-center ganglion cell, for example, will respond only spontaneously to light, more strongly to dark, and most strongly to darkness localized at the center of the receptive field (Bear, Connors, Paradiso, Neuroscience: Exploring the Brain).


Figure 1.5: Shown is a gray-scale of an orientation selective receptive field. At right is the corresponding tuning curve, which plots the response of the receptive field to bars of light at various orientations. This particular receptive field is selective for stimuli with an orientation of 20 degrees. Note the sharp peak of the tuning curve; there is almost no response from bars oriented at angles much different from 20 degrees.

Cortical Orientation Selectivity

Visual cortex cells respond strongly to oriented bars of light presented at a specific orientation and weakly to stimuli of different orientations (Hubel and Wiesel, 1962). This phenomenon, known as orientation selectivity, can be illustrated with a “tuning curve.” The curve is a plot of the response (firing rate) of a cortical cell as a function of the angular orientation of the stimulus (Fig. 1.5).

Orientation selectivity is an important property of the visual cortical cells. It facilitates object recognition. We can tell when there is an object in front of us, and we know where the object stops and open space starts because objects have edges. The orientation selective cells pick out these edges.

LGN Properties

Localized Selectivity in the LGN

The connections from the retina to the LGN are localized. That is, a mature LGN cell is influenced by one, or, at most, a few retinal cells3 (Mastronarde, 1992; Sherman and Koch, 1990).

The Retinogeniculate Map

The LGN is characterized by a retinotopic map. That is, it is organized in such a way that neighboring LGN cells receive input from neighboring retinal cells (Voigt et al., 1983; Sanderson, 1971).

Eye-Specific Layers

Another property of the mammalian visual system that this work will address is the presence of eye-specific layers in the LGN. In adult mammals, the ganglion axons from the eyes are segregated such that a given layer of LGN will be selective for input from only the left eye or only the right eye (Shatz, 1983).

3In other words, the LGN is, for the most part, a passive filter.



1.3 The Problem

It has been shown that visual experience plays a significant role in shaping postnatal cortical development. Kittens reared in the dark (binocular deprivation) for more than three weeks lose responsiveness in a large portion of their cells (Levental and Hirsh, 1980). Short-term binocular deprivation results in reduced orientation selectivity. Monocular deprivation studies, in which one eye of a normally reared kitten is deprived of patterned input, result in an ocular dominance shift, that is, cortical neurons that were responsive to one eye become responsive to the other eye instead (Blakemore, 1976; Wiesel and Hubel, 1965). Normal ocular dominance columns do not develop. Instead, the cortex becomes less responsive to stimulation of the deprived eye. Thus, we know that visual experience is necessary to maintain the properties that we are interested in.

Previous work has led to an understanding of how orientation selectivity is refined in normal visual experience (Law, 1994). A goal of this research is to extend our understanding to include prenatal development. What influences the initial development of orientation selectivity observed by Hubel and Wiesel in new-born kittens? We would also like to understand the development of the LGN properties described above. Since all these properties develop in a kitten before it opens its eyes, an understanding must include a study of the prenatal visual environment. Table 1.1 summarizes the development of the visual system in the cat. It serves as the general framework within which all our studies must fall.

Our starting point is that we presume these properties to be activity dependent. In kittens, the process of segregation normally takes place between embryonic day 45 (E45) and birth (E65). It has been shown that intra-ocular infusion of TTX4 during this period will prevent the segregation from taking place.

Thus, in the fetus, long before the onset of vision, spontaneous action potential activity is likely to be present in the visual system and to contribute to the segregation of the retinogeniculate pathway (Shatz and Stryker, 1988).

Further evidence that spontaneous activity is important includes earlier observations that ocular dominance column formation can also be prevented with intra-ocular infusion of TTX, but not by dark-rearing (Stryker and Harris, 1986). The only difference between these two cases is that the dark-rearing does not abolish the spontaneous activity (Shatz and Stryker, 1988). Hence, there is something quite important about this spontaneous activity.

1.4 The Prenatal Visual Environment

To study our problem, therefore, we will need both an understanding of the spontaneous activity and an understanding of how this activity influences the system.

We know that the elements of the visual system are influenced by the structure of the input environment in an activity-dependent manner. After eye-opening, the input is that of the visual world. It is not clear what constitutes the structure of the prenatal visual environment.

4Tetrodotoxin (TTX) is a sodium channel blocker. It is used to prevent action potentials.


Days    Event in Cats
19      Retinal ganglion cells (RGCs) present
22–32   Lateral geniculate nucleus (LGN) neurogenesis
26      Axons leave LGN
32–35   RGC axons project to LGN
36      LGN axons project to cortex
39      Functional retinogeniculate synapses
40      Peak numbers of RGCs
45      Horizontal and amacrine synapses
52      Retinal waves appear (Meister et al., 1991)
56      Conventional synapses onto ganglion cells (Maslim and Stone, 1986)
60      LGN axons invade cortex
60      LGN cell lamination complete
65      Birth
71–73   Photoreceptors mature
71–74   Cell lamination complete
75      Eye opening
75      Orientation columns in cortex
75–85   Highest synaptogenesis in retina
79      LGN dendritic growth
86      Direct LGN connections to layer IV of the visual cortex
95      Lateral synapses mature

Table 1.1: Chronology of important events in the development of the visual system in the cat (adapted from Haith, 1998; additions individually cited).


We have several ideas of what it could be like. This research is, in essence, an exploration of several proposals for what the prenatal visual environment is like. We model the environments mathematically and then present them to a simulated neuron in the hopes that one or more of these simulated environments will elicit responses in the neuron that are in accordance with the experimentally observed phenomena detailed above.

Since there is no external source of input prior to photoreceptor development and eye-opening, all activity in the prenatal visual environment must ultimately stem from spontaneous firing. We presume the random, spontaneous firing to be unstructured. The structure in the environment must come from neuronal interactions initiated by the spontaneous firing. This can include either processing effects along the retinofugal pathway or interactions of retinal, geniculate, or cortical cells with their neighbors. The three main approaches we take to modeling the prenatal environment are studies of retinally processed noise, an instance of processing; retinal waves, an instance of lateral interactions in the retina; and correlated noise, a general model incorporating several properties we know the environment must have.

Retinally Processed Noise

Law (Law, 1994) proposed an explanation for the prenatal development of orientation selectivity. He assumed the output to the ganglion cells to be essentially Gaussian noise5. This noise may be processed by the retina or LGN in the same way that visual input is processed in mature eyes, and this processing may grant appreciable structure to the input.

Correlated Environments

As opposed to the other two approaches, in which we model known physiological processes, in the correlated environment approach we make a direct assessment of the structure that we think may be present. We build on the work of Miller, who proposes that the prenatal visual environment may be dominated by correlated spontaneous activity (Miller et al., 1989). Although no one has measured what the correlation function looks like, there are a few properties that we know it should have. We explore various correlation functions that exhibit these properties.

Retinal Waves

Our final study of the structure in the prenatal visual environment is a simulation of retinal waves. In the late eighties, bursts of temporally correlated action potentials were measured in the fetal rat retina using extracellular electrodes in vivo (Galli and Maffei, 1988). Retinal waves have subsequently been studied in higher mammals, such as kittens. These rhythmic bursts of highly structured activity (Wong, 1997) occur during the period of retina-LGN connection refinement. It is known that the retinogeniculate synapse is extremely efficient at transmitting these potential bursts (Mooney et al., 1996). Thus, retinal waves are an excellent candidate for explaining the development of the LGN properties discussed above.

Retinal wave properties can be measured with fura-2, a fluorescent calcium indicator that gives an indirect measure of membrane depolarization (Feller et al., 1996). The waves start small and then propagate with no preferred direction (Feller et al., 1997) to cover a finite domain. An interesting property is that neighboring waves do not invade each other’s borders. As a result, the wave boundaries are quite ragged. This seems to imply some kind of cellular refractory period.

5Note that this activity does not come from the photoreceptors, which are not yet active (Friedlander and Tootle, 1990).



We must also note that the wave activity does not occur continuously. In the ferret, for example, a burst will last two to eight seconds. It will then be followed by a long period of inactivity (Wong, 1997).


Chapter 2

Methods

There are two essential components to solving the problems before us. Firstly, we must understand how the cells receive and interpret stimuli. We model these processes mathematically. Once we have a model of the neuron and its behavior, we can apply the model to simulations of the prenatal visual environment. The second component of the problem, therefore, is to characterize the nature of such an environment.

2.1 Modeling Neural Behavior

The Model Neuron

Figure 2.1 summarizes our model neuron. The typical cortical neuron has about a hundred LGN innervations (dendrites). Along each of the dendrites travels a series of action potentials, also referred to as a spike train. The spike trains carry coded information. For our purposes, we deal only with the amount of activity, that is, the rate of action potentials incident to the cortical cell (over a discrete time interval). There is a certain small baseline of spontaneous activity. Spike trains significantly above this baseline are highly active, while trains at or below the baseline are relatively inactive. We denote the input to a given cortical cell by the vector, x. Each component of x represents the rate of firing along a single dendrite.

Figure 2.1: The model neuron. The inputs, x_i, are collected by the dendrites. Each input is scaled by the corresponding weight, w_i, and contributes to the output, y, of the cell.


Figure 2.2: A typical sigmoid, σ(·). Note the plateaus for small and large values of x.

The job of the cortical neuron is to collect the information from all the dendrites and “decide” whether or not to fire1. The cell, of course, does not treat all dendrites equally. Input from some places is more important to a given cell than input from others. We model this by defining a weight at each synapse. We denote the collection of synaptic weights by the vector, w. Stronger weight values are assigned to synapses that have a greater efficacy in eliciting a response in the target cell.

We can thus define the output, y, of a cortical neuron with the Activation Equation:

y = w · x (2.1)

That is, the activity is the sum of the products of the weight and the input at all synapses. The cell will fire if and only if the output is above a critical level. This linear relationship between the input and the output is a good approximation for small values, but does not tell the whole story. There are certain minimum and maximum values for the output. When the dot product reaches a certain level, the cell saturates, and any additional input has no effect. We take this into account by applying a sigmoid. The sigmoid, typically a hyperbolic tangent, is a one-to-one function that is approximately linear around zero and has plateaus for small and large values (Fig. 2.2). Thus we have

y = σ(w · x). (2.2)
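To make the model concrete, here is a minimal C sketch of the activation equation (illustrative code, not code from the thesis; the function name and the choice of tanh as the sigmoid are our own assumptions):

#include <math.h>

/* Output of the model neuron, y = sigma(w . x) as in Eq. 2.2, with a
   hyperbolic tangent sigmoid. w and x each hold n synaptic values. */
double activation(const double *w, const double *x, int n)
{
    double dot = 0.0;
    for (int i = 0; i < n; i++)
        dot += w[i] * x[i];   /* weighted sum over all synapses            */
    return tanh(dot);         /* approximately linear near zero, saturates */
                              /* for large positive or negative sums       */
}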

Figure 2.3 summarizes our model of information flow from the environment to the visual cortex.

Synaptic Modification

To successfully model neural behavior, we need to know how the cells’ properties change as a function of activity. Such models are referred to as “learning rules.” They detail learning at the cellular level. For the most part, this thesis will refer to two such models: PCA and BCM. Both models are examples of rules that find structure in the environment they are exposed to. The structures that the two rules find are often quite different from one another.

The starting point for modeling synaptic modification lies in Hebb’s observation that

When an axon in cell A is near enough to excite cell B and repeatedly and persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency in firing B is increased (Hebb, 1949).

Or, more simply, “neurons that fire together, wire together.” We note that the actual biological processes underlying this phenomenon are somewhat understood for the case of NMDA-dependent long-term potentiation (LTP) in the hippocampus (Bear and Malenka, 1994).

1This is referred to as spatial integration. The accompanying phenomenon, known as temporal integration, is implicit in our treatment of the input as a time average of activity.


Figure 2.3: The model architecture of the first stages in the retinofugal pathway. Retinal cells collect information from the environment through their center-surround receptive fields. The information is relayed in a one-to-one manner to the LGN. An LGN cell, in turn, relays the information to a cortical cell. The input, x, is weighted at the synapse with a weight, w. The cortical cell’s resultant activity level is a sigmoided dot product of the weight and input vectors, y = σ(w · x).


A simple mathematical formulation of Hebb’s observation is

dw/dt = yx. (2.3)

This is called a Hebb rule. The problem, of course, is that there is no way for the weights to decrease. This rule implies that the weights continually grow. They would eventually saturate, thus rendering the cell unable to carry information. Various ways of controlling this problem have been proposed. Some include adding terms to the equation that explicitly normalize the weights. However, it is not very realistic to assume that the cell continually provides each synapse with information regarding all the other synapses.

Oja proposed the rule

dw/dt = yx − y²w. (2.4)

This is the PCA rule (Oja, 1982). As we shall see, the extra term has the effect of normalizing the weights, but it relies entirely on local information. Increasing the weights at some synapses results in a decrease at other synapses. Put another way, there is a spatial competition between the inputs.

The BCM rule takes the form

dw/dt = φ(y, θ)x (2.5)

where φ is a scalar function of the activity. It is negative below a particular output value and positive above it.


This zero-crossover value, θ, is referred to as the modification threshold. The effect of the phi function is to drive the weight vector in the direction of the input vector when the output is large (above the threshold) and in the opposite direction when the output is small (below the threshold).

A constant value of the threshold would lead to the same difficulty we have with the Hebb rule: a weight vector that responds strongly to all input patterns will grow in magnitude. The increased weight vector will lead to stronger activity, which will, in turn, cause the weight vector to continue to grow indefinitely. There will be instability. For biological plausibility, we must incorporate some form of negative feedback. One way to do this is to construct the φ function with a sliding threshold. That is, θ must grow with the activity. More precisely, it must grow faster than the activity. Otherwise, it would not catch up to the activity growth. Thus, we take θ to be a super-linear function of the neuron’s output. There are many ways to do this. To facilitate analysis, this thesis will use the quadratic form of the BCM modification equation (Intrator and Cooper, 1992):

dw/dt = ηy(y − θ)x
θ = E_τ[y²] (2.6)


Here φ is a parabola and θ is a windowed time average of the squared activity. Again, to permit analysis, we use the following exponentially weighted windowing function:

θ = (1/τ) ∫_{−∞}^{t} e^{−(t−t′)/τ} y²(t′) dt′ (2.7)

Discrete Learning Rules and Simulations

In order to study models of synaptic plasticity in simulations, it is necessary to have discrete versions of the models. Breaking up a continuous function for weight modification into discrete steps allows us to cast our models as iterative algorithms, well-suited for computer simulation. Each iteration corresponds to a small slice of time. The discrete BCM equations (see Blais, 1998 for derivation) are

y_n = w_n · x_n
θ_{n+1} = θ_n + (1/τ)(y_n² − θ_n) (2.8)
w_{n+1} = w_n + ηy_n(y_n − θ_{n+1})x_n.

Here, n is the iteration number. Note that the learning rule (i.e. the weight modification equation) is applied to each element of the weight vector at each iteration. The discrete PCA equations (Wyatt and Elfadel, 1995) are

y_n = w_n · x_n
w_{n+1} = w_n + ηy_n(x_n − y_n w_n). (2.9)

We note that it is not exactly clear how iteration time relates to actual time. That is, we cannot be sure as to whether an iteration can be interpreted as a constant time interval (in seconds) or as a time-independent period of input collection. For instance, we cannot be sure if the synaptic “clock” ticks in the absence of activity. However, these details are unimportant for our purposes. We can estimate an iteration to be approximately equivalent to 200 ms (Blais, 1998), with the caveat that this estimate should only be used to facilitate rough comparisons of simulation time with biological development.

We program our simulations in C and compile and run the code on Sun workstations running Solaris. We then use Matlab to analyze our results. A weight vector is considered to have converged to a stable fixed point when a plot of the activity (measured by θ) as a function of iteration number has leveled off—that is, when θ is stable.
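As an illustration of how Equations 2.8 and 2.9 translate into an iterative algorithm, here is a minimal C sketch of a single training iteration (this is not the thesis code; the function name, parameters, and in-place update style are our own assumptions):

/* One iteration of the discrete BCM rule (Eq. 2.8). w and x have n
   components; theta is the sliding modification threshold. */
void bcm_iteration(double *w, const double *x, double *theta,
                   int n, double eta, double tau)
{
    double y = 0.0;
    for (int i = 0; i < n; i++)
        y += w[i] * x[i];                 /* y_n = w_n . x_n                */

    *theta += (y * y - *theta) / tau;     /* theta_{n+1}: windowed average  */
                                          /* of the squared activity        */
    double dw = eta * y * (y - *theta);   /* eta * y_n * (y_n - theta_{n+1}) */
    for (int i = 0; i < n; i++)
        w[i] += dw * x[i];                /* weight update of Eq. 2.8       */
}

At each iteration an input vector is drawn from the environment and the function is called once. The discrete PCA rule (Eq. 2.9) needs no threshold and differs only in the weight line, which becomes w[i] += eta * y * (x[i] - y * w[i]).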

2.2 The PCA Learning Rule

Analysis

The PCA rule takes its name from the mathematical technique of principal component analysis. In order to understand PCA, we must first introduce the correlation function.

The correlation function is defined by

C ≡ E[(x − µ)(x − µ)^T] (2.10)


Figure 2.4: Some two dimensional examples: The first distribution has zero two-point correlation—information about the x-component tells us nothing about the y-component. The correlation matrix is diagonal. The second distribution has positive two-point correlation—a large x value tells us that we should expect a large y value. The correlation matrix has positive off-diagonal elements. The third distribution has negative two-point correlation—a large x value tells us that we should expect a large negative y value. The correlation matrix has negative off-diagonal elements.

where E[ψ] denotes the expectation (i.e. the average) value of ψ, and µ is the mean of the input vector, x. For an environment that is symmetric about the origin (as ours generally will be), Equation 2.10 reduces to

C = E[xx^T]. (2.11)

That is, it is a matrix whose (i, j) element is the average value of the product of x_i and x_j taken over the entire input environment.

Figure 2.4 shows some example two dimensional distributions and their correlation matrices. Note that the correlation matrix is symmetric. The diagonal elements are the average squares of the components of the vector. Because the mean is zero, these are just the single-point variances2. The off-diagonal elements of the matrix are the covariances of the data set. Specifically, the element in row i, column j of C is the covariance of the i-th and j-th element of the vector x. The covariance is zero if the points are uncorrelated, positive for positively correlated points, and negative for negatively correlated points. Put simply, a non-zero correlation between two data points means that information about one point gives us information about the other point.
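In a simulation, the correlation matrix of a zero-mean environment can be estimated by averaging outer products over the sample set. A brief C sketch (illustrative only; the function name and flattened row-major storage are our own conventions):

/* Estimate C = E[x x^T] from N zero-mean sample vectors of dimension n.
   samples holds the vectors row-wise; C is an n*n output buffer. */
void correlation_matrix(double *C, const double *samples, int N, int n)
{
    for (int k = 0; k < n * n; k++)
        C[k] = 0.0;
    for (int s = 0; s < N; s++) {
        const double *x = samples + s * n;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                C[i * n + j] += x[i] * x[j] / N;   /* average of x_i x_j */
    }
}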

With this background, we can analyze the PCA learning rule:

y = w · x
dw/dt = yx − y²w

We want to look for the fixed points of the weight vector. These are the weight values for which there is no average time change in the weight vector due to continued training on the input vectors from the environment. In other words, we are looking for the weight vectors that stay constant in a given environment. Thus:

0 = E[dw/dt]
  = E[yx − y²w]
  = E[x(x · w) − (x · w)²w]
  = E[(xx^T)w − (w^T xx^T w)w]
  = Cw − (w^T Cw)w

2Recall that the variance is σ² ≡ E[x²] − E²[x]. For a zero mean, the second term is zero.

The entity w^T Cw is just a scalar, so we can make the substitution λ ≡ w^T Cw. This gives us

Cw = λw (2.12)

Equation 2.12 is an eigenvalue equation. It tells us that the stable PCA weight vector is an eigenvector of the correlation matrix with corresponding eigenvalue, λ. The particular eigenvector is the one with the largest eigenvalue (Appendix A). We note that

λ = w^T Cw = w^T λw = λ|w|²  ⇒  |w|² = 1.

That is, the PCA final weight vector is always implicitly normalized to one.

This simple result means that PCA is very well understood. Oftentimes, we use the PCA rule to study an environment since we know what structures the rule picks out.
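As a concrete illustration (the matrix here is made up for exposition and is not taken from the thesis data), consider a two dimensional environment with correlation matrix C = [2 1; 1 2]. Its eigenvalues are 3 and 1, with corresponding eigenvectors (1, 1)/√2 and (1, −1)/√2. A PCA neuron trained in such an environment should therefore converge to w = (1, 1)/√2, the unit-length principal eigenvector, for which λ = w^T Cw = 3.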

Low Dimensional Examples

To gain an intuition for the models, we examine simulations in low dimensional input environments. Note that the dimensionality refers to the number of input channels (i.e. the number of model synapses). The two dimensional case is quite instructive, because we can easily plot the input and weight vectors with the two channels as our axes. Studying the convergences in these cases gives us an intuition for the type of structure the learning rule is looking to pick out of the environment. For the case of PCA trained in input environments with a large sample space, this structure is the principal eigenvector of the correlation matrix.

Let us first study an input environment consisting of two input vectors. Figure 2.5 shows the two stable weight vectors. Note that they are oriented in such a way as to have the largest possible squared activity. That is, they are quite close to lying parallel to the stronger of the input vectors.

It is often convenient to think of the final weight vector in terms of its output distribution. The output distribution is a histogram of {y} = w · {x} with y the activity3, w the final weight vector and {x} the set of all input vectors. The output distribution for the two point example is shown in Figure 2.5.

To study the behavior of PCA in an environment with a larger sample space, we consider a Gaussian cloud with a greater variance in one direction than the other (Fig. 2.6). The weight vector converges to orient along the direction of greatest variance. This is the direction of the principal component. The squared activity is maximal along this direction.

3Note that the activity we are analyzing here is the pre-sigmoid output.


Figure 2.5: The PCA fixed points in a two point environment (left) and the corresponding output distribution for one of the fixed points.

Figure 2.6: PCA trained on a Gaussian cloud with a large variance in one direction and a smaller variance in the other.


Figure 2.7: The stable BCM weight vectors resulting from training in a two dimensional environment consisting of two equally probable elements (left). Note that the weight vector orients in such a way as to lie orthogonal to one of the inputs. The result is a zero output every time the weight vector sees that input. We see this in the output distribution (right).

2.3 The BCM Learning Rule

Analysis

Analysis of BCM is more difficult. We can think of the BCM fixed points as maximizing a function of the activity, y = x · w. The form of BCM that we have been using maximizes a particular “cost function.” This cost function is a measure of multi-modality of the distribution of y (Intrator and Cooper, 1992). In other words, one can show that a BCM trained weight vector will converge in such a way as to yield more than one typical value for y. When the input environment is not well-suited to a bi-modal activity distribution, we will see that another property, the kurtosis, is maximized. These properties of BCM can be contrasted to PCA, which we can think of as maximizing the average square activity (Blais, 1998).

Low Dimensional Examples

We begin with training BCM in a two dimensional environment that consists of just two input vectors. Independent of where we initialize the weight vector to point, it will converge to lie orthogonally to one of the input vectors (Fig. 2.7). Since the output, y, is the scalar product of the weight vector with the input vector, the orthogonality means that one of the input vectors contributes nothing to the output.

Extending this input environment to consist of two clusters of input vectors gives a similar result. When we train in this environment, randomly selecting an input vector from the input space at each iteration, the weight vector will converge to be orthogonal to the center of mass of one of the clusters. The output distribution differs from that of the two-input environment in that it is delocalized. However, it is still clearly bimodal (Fig. 2.8).

This tendency to develop a multi-modal output distribution can be shown to be a general property of BCM in environments that consist of N linearly independent inputs (Intrator and Cooper, 1992). But real neurons are not necessarily exposed to such linearly independent environments. Most of the time, visual neurons will be presented with input from natural scenes. In natural scenes we have occlusions (for example), where one object lies in front of another, thus blocking parts of the second object. Occlusion is not a linearly separable process (although it may be approximately so). Thus, we can not always expect BCM to develop a multi-modal output distribution.


Figure 2.8: BCM trained on an environment consisting of two clusters of points (left). BCM’s behavior in this environment is the same as in the two point environment. The converged weight vector is orthogonal to the center of mass of one of the clusters. This gives a bi-modal output distribution (right).


When we analyze BCM in natural scenes we find that the output distribution is, indeed, not multi-modal. Instead, it looks very much like a Laplace distribution:

P(w · x) = (1/2λ) e^{−|w·x|/λ}

It turns out that in environments that do not contain clusters, BCM will pick out “kurtotic projections” (Blais, 1998). This can be reconciled with the bi-modality result in that in general BCM converges in such a way as to have sparse output distributions. It is sensitive to outliers.

Kurtosis is a measure of how spread out the tails of a distribution are (Fig. 2.9). A uniform distribution has no tails. There is exactly zero probability for any value beyond a well-defined range. The Gaussian distribution falls off exponentially with x². Its kurtosis is defined to be zero. The Laplace distribution falls off more slowly, thus its tails are more extended. It has a positive kurtosis.
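For reference, the standard definition (which the thesis does not state explicitly) of the excess kurtosis of a zero-mean distribution is

K = E[y⁴] / E²[y²] − 3.

With this normalization a Gaussian has K = 0, a uniform distribution has K = −6/5, and a Laplace distribution has K = +3, matching the ordering shown in Figure 2.9.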

We can study the kurtosis seeking property of BCM in low dimensional environments as well. Consider the Laplace distribution in one dimension (Fig. 2.9). Its extended tails tell us that some elements of the distribution will be large. The probability of occurrence of such an outlier is, however, small. As a result, in two dimensions, we see almost no points that are large in both dimensions4. Thus, we see “wisps” along the axes of the distribution (Fig. 2.10).

We observe that when we train BCM on the two dimensional Laplace distribution, the final weight vector will always orient along one of the wisps of the distribution. This is the case even when the distribution is rotated with respect to the coordinate system. With such an orientation, the weight vector has maximized the range of components in the direction of the weight vector. Thus, we have the widest range in w · x, that is, the widest possible output distribution.

Since the multi-dimensional Gaussian distribution is spatially symmetric5, there is no direction of maximum kurtosis. BCM has no stable fixed points in such an environment (Fig. 2.11). The weight vector grows to reach a fixed point circle. When the length is stable, the vector is free to change its angular orientation.

4The probability of a point having two large components would be the square of a small probability.

5In two dimensions the probability density, P(x)P(y) dx dy ∝ e^{−x²/σ²} e^{−y²/σ²} = e^{−(x²+y²)/σ²} = e^{−r²/σ²}, has no angular dependence.


Figure 2.9: Shown are the uniform, Gaussian, and Laplace distributions, each with unit variance. The Laplace distribution, with its extended tails, has positive kurtosis; the Gaussian distribution is defined to have zero kurtosis; and the uniform distribution, for which there is zero probability outside a small range, has negative kurtosis.

Figure 2.10: When we train BCM on a two dimensional Laplace distribution (left), the weight vector orients along one of the wisps. Doing so maximizes the kurtosis of the output distribution.


Figure 2.11: When trained in an environment defined by a multi-dimensional Gaussian distribution, a BCM weight vector is free to wander about a fixed point circle.


One final example of a two dimensional study involves an environment consisting of a band of input vectors (Fig. 2.12). This is the Gaussian cloud in which PCA has simple fixed points in the direction of maximum variance. Note that the majority of the input vectors in this environment are significantly stronger along one direction than along the other. If we animate the BCM weight vector as it evolves, we see that it initially shoots out to align with the direction of greater variance. Rather than remain pointing in that direction, it starts to drift, just as in the case where the variances are equal. Figure 2.12 shows the weight vector at several stages in its development. One can show that the fixed points for this situation actually lie on an ellipse. We need to understand why, if the fixed points lie on an ellipse on which all points are equally valid, the weight vector always initially shoots out along the direction in which the distribution has the greater variance. The simple answer is that this is the direction that gives the most activity. Since there is more input in that direction, the component of the weight vector in that direction is more strongly reinforced than the other component. Thus it grows the fastest. We will refer back to this example, as it turns out to be an instance of a quite general phenomenon.

2.4 Training on Natural Scenes

Once we have a good feel for the properties of our learning rules, we would like to test them in more realistic input environments—on images from natural scenes6. For our natural scenes we use photographs from Lincoln Woods State Park that were scanned as gray-scale images. Here the images are stored as square arrays of numbers. Each number represents a pixel’s relative light intensity. When we view the arrays, the brightest regions are white, the darkest are black, and those in between are shades of gray. The numerical intensity values correspond to photoreceptor spike trains, as the receptors fire in proportion to the intensity of light received.

Recall that the inputs in our models refer to the information relayed through the LGN. Thus, prior to training we replicate retinal pre-processing. Figure 2.13 summarizes the process. To model the ON-center retinal receptive fields, we apply a difference of Gaussians7 filter to the images (Law and Cooper, 1994). The DOG function takes the shape of a “Mexican hat”. The peak in the center corresponds to the ON-center, and the dips beyond the center correspond to the inhibitory OFF-surround of the receptive fields. The two dimensional filter is the Mexican hat rotated around the vertical axis. To filter, we perform a two dimensional convolution of the filter with the image8. Compare the sample natural scene image to the DOG filtered version. Note that the filter has the effect of sharpening the edges and smoothing out the image.

6Natural scenes are preferred because man-made objects tend to have very rigidly defined borders (compare a building to a tree), and we do not want to unfairly bias the images.


Figure 2.12: A BCM weight vector trained on a Gaussian cloud with a greater variance in one direction than the other. The weight vector initially shoots out along the direction of higher variance and then is free to wander along the fixed point ellipse.


Once we have the image as seen by the LGN, we would like to use it as the input environment in which to train our model neuron. Our images are typically 256 × 256 pixels. A cortical neuron typically has on the order of a couple hundred input synapses. So, for our input vectors we use small “patches” from the natural scene images, on the order of 13 × 13. We apply a circular “mask” to the patches before using them in calculations. This is to nullify any structure we would be imposing on the images were we to use square patches. The patches are now our input vectors, and we train on them the same way as in the low dimensional case. The patch vectors are of length n², where n is the side length of the patch. That is, each pixel value is a component. Each component (or synapse, if you will) is assigned a corresponding weight, and just as in the low dimensional cases, the output of the cell at any point is given by y = σ(w · x)9. At each iteration a patch vector is chosen at random from any of a dozen images, and the weights are modified according to the learning rule to be studied.

7A difference of Gaussians (or DOG) is a function constructed by subtracting a Gaussian of high variance from a Gaussian of low variance. Typically, we use standard deviations of 1 and 3: f = e^{−r²/1²} − (1/3²) e^{−r²/3²}, where r is the distance from the center of the filter, and f is the value of the filter.

8This has an important computational effect; it allows us to use a Fast Fourier Transform which greatly speeds up the algorithm.

9Note that for PCA, the activity is not sigmoided.
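To illustrate the filtering step, here is a brute-force C sketch of DOG filtering by direct two dimensional convolution (the thesis uses an FFT-based convolution; this simplified version, its function names, and the kernel half-width are illustrative assumptions only):

#include <math.h>

#define K 4   /* kernel half-width: the filter spans (2K+1) x (2K+1) pixels */

/* DOG value at squared radius r2, using standard deviations 1 and 3. */
static double dog(double r2)
{
    return exp(-r2 / (1.0 * 1.0)) - exp(-r2 / (3.0 * 3.0)) / (3.0 * 3.0);
}

/* Convolve an n x n image (row-major) with the DOG kernel. Border pixels
   whose kernel would extend outside the image are left at zero. */
void dog_filter(double *out, const double *img, int n)
{
    for (int k = 0; k < n * n; k++)
        out[k] = 0.0;
    for (int y = K; y < n - K; y++)
        for (int x = K; x < n - K; x++) {
            double s = 0.0;
            for (int dy = -K; dy <= K; dy++)
                for (int dx = -K; dx <= K; dx++)
                    s += dog((double)(dx * dx + dy * dy)) *
                         img[(y + dy) * n + (x + dx)];
            out[y * n + x] = s;
        }
}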


Figure 2.13: Natural scenes (top) are processed with a DOG filter (middle, left). We train our model cortical neurons on the resulting processed images (bottom). The filter values are those of a two dimensional difference of Gaussians function—positive (excitatory) in a small central region, negative (inhibitory) in the surround, and zero outside. A one dimensional difference of Gaussians is shown (middle, right).

Figure 2.14: A PCA weight vector trained on natural scenes.

PCA

Training PCA in a natural scene environment leads to an orientation selective weight vector (Fig. 2.14). It can be shown to have a simple cosine-like behavior (Shouval, 1994). This is a general result for PCA in radially symmetric environments.

BCM

The results from training BCM on natural scenes are striking. When we start with a set of randomly initialized weights, we end up with a stripe of high weights, against a background of low weights (Fig. 2.15). That is, the model BCM neuron learns to be orientation selective. We measure the selectivity with a tuning curve10.

10In practice, we present a sine grating as input to the weight vector. For many angular orientations, we determine the maximal activity over all frequencies and phases the sine grating may take. We then plot these maximal activities as a function of angle.


Figure 2.15: In training BCM on natural scenes, randomly initialized weights (top left) converge to an orientation selective weight vector (top right). Also shown is a tuning curve of the final weights (bottom).

This convergence may at first seem surprising. How does BCM pick out weight vectors with specific orientations? We get some insight into this question by looking at what the neuron actually sees—the patches (Fig. 2.16). Notice that many of the patches are oriented bars of light. This indicates that a large portion of our visual environment is defined by edges and object boundaries.

The natural scene environment is an instance in which BCM finds kurtotic projections (Fig. 2.17). That is, a natural scene environment contains a small number of patches that align with a given converged, orientation selective, BCM weight vector. These input patches elicit a large response in the model cell. If we remove from the input environment the patches that give the strongest 1% of responses and continue to train using a converged BCM weight vector, the receptive field will change (Blais et al., 1998). Removing these patches eliminates the kurtosis of the weight vector’s output distribution, thus making it unstable.


Figure 2.16: A sample natural scene and some patches taken from it. Note that many of the patches consist mainly of oriented bars of light.


Figure 2.17: Shown is an example BCM weight vector that was trained on natural scenes (left) and the corresponding output distribution on a linear (center) and log (right) scale.


2.5 A Model Synthetic Environment

Having verified that the BCM model is sufficient to explain experience-dependent development of orientation selectivity, we now turn to other input environments. Our ultimate goal is to test BCM with input structured like the pre-eye-opening visual environment.

The pre-eye-opening environment is dominated by spontaneous activity, or noise. Thus, one approach to creating our synthetic environments will be to impose various forms of structure on noise. As a first instructive example, let us consider an input environment structured to look like orientation selective final weight vectors.

To create such an environment, we start with a 169 × 20,000 array of Laplace distributed noise. We use the Laplace distribution since we know from our study of natural scenes and low dimensional cases that BCM seeks out kurtotic projections. We then rotate the array with a rotation matrix whose columns are orthonormal, orientation selective, weight vectors obtained from training on natural scenes (see Fig. 2.18 for a two dimensional explanation of this process). This has the effect of pointing the wisps of the distribution in the direction of the rotation columns (in 169-space), that is, in the direction of the selective weight vectors.

The columns of the rotation matrix were obtained by training BCM neurons on natural scenes and occasionally subtracting off the weights' projections onto all previously collected rotation columns11. The final weights were normalized before recording. This Gram-Schmidt procedure was necessary to ensure that we were collecting distinct vectors for our rotation matrix. We wanted to guard against biasing the environment with either multiple copies of a single vector or certain weight vectors having larger norms than others.
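A minimal sketch of the Gram-Schmidt step just described, assuming collected holds the previously recorded unit-norm rotation columns (the variable names are illustrative):

    import numpy as np

    def orthonormalize_against(w, collected):
        # Subtract from w its projections onto the previously collected unit
        # vectors, then normalize: one Gram-Schmidt step.
        w = w.astype(float).copy()
        for v in collected:
            w -= (w @ v) * v
        return w / np.linalg.norm(w)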

As an aside, it is interesting to note that this procedure also tells us about the natural scene environment we used in our studies. We can liken the weight-space to a stretched rubber sheet. Using the natural scenes is like scattering a collection of balls of various sizes (some bowling balls, some baseballs, and so on) onto the sheet. Initializing the weight vector is like placing a marble somewhere on the sheet. Normally, the marble will roll to the dominant structure wells, the bowling balls. Applying the Gram-Schmidt procedure is like removing the bowling balls. Doing so allows us to see the less dominant structure of our scenes.

11 This is just the Gram-Schmidt procedure for orthonormal bases from linear algebra.


Figure 2.18: Sample orthonormal weight vectors w1 and w2 (above) are put column-wise into a matrix, R = [w1 w2]. Multiplying a distribution by R has the effect of rotating it (below).

Figure 2.19: Nine of the orthonormal orientation selective weight vectors that went into the rotation matrix (A), and a converged BCM weight vector trained on the synthetic environment (B). This weight vector is almost identical to the second rotation column depicted.


Once we have a collection of orthonormal final weight vectors (Fig. 2.19, left), we put them into our rotation matrix and pad the right with zeros12. We rotate a matrix of Laplace noise, and then add to the result a matrix of Gaussian noise. This last step is to prevent our input space from lying entirely within a subspace of dimension equal to the number of final weight rotation columns; adding the Gaussian noise allows the weights to wander from this subspace if that is what they choose to do. We have

X = R · L + G (2.13)

where X is the environment matrix, R is the 169 × 169 rotation matrix, and L and G are matrices with 169 rows and a large number of columns of Laplacian and Gaussian distributed random numbers, respectively. The columns of X can now be used as input vectors on which we can train BCM.
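The construction of Equation 2.13 can be sketched as follows; R is assumed to be the 169 × 169 zero-padded rotation matrix described above, and the Gaussian noise amplitude shown here is illustrative.

    import numpy as np

    def synthetic_environment(R, n_inputs=20000, noise_std=0.1):
        # X = R L + G (Eq. 2.13): rotate Laplace distributed noise into the
        # directions of the receptive-field columns of R, then add Gaussian noise.
        dim = R.shape[0]                       # 169 for 13 x 13 patches
        L = np.random.laplace(size=(dim, n_inputs))
        G = np.random.normal(scale=noise_std, size=(dim, n_inputs))
        return R @ L + G                       # each column is one input vector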

The column-space approach to matrix multiplication gives us another way to think about this process. Each input vector is essentially a linear combination of the columns of the rotation matrix13. Each column is scaled by multiplication with a Laplace random number and then added to the running sum. The nature of the distribution (with its highly kurtotic tails) is such that in the calculation of each input vector, there are likely to be a few dominant Laplace numbers and, as a result, a few dominant columns making up the input vector.

The weights resulting from training BCM on such an input environment (Fig. 2.19, right) are what we would expect given the results from training in two dimensional uniform and Laplace distributions. The weights converge to look like one of the rotation columns (in this case, the second one). To check that this is not just a superficial resemblance and that the weights have actually converged to one of the columns, we calculate the angle between the final weight vector and each of the orthonormal columns. We observe that the weight vector is orthogonal to all but one of the columns. The one column with which it makes a small angle (always less than 30°) looks like the weight vector. Thus BCM has picked out a direction of high kurtosis.
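A sketch of this check, assuming R_cols holds the orthonormal rotation columns (one per column):

    import numpy as np

    def angles_to_columns(w, R_cols):
        # Angle, in degrees, between the normalized weight vector and each
        # orthonormal column of R_cols.
        w = w / np.linalg.norm(w)
        cosines = np.clip(np.abs(R_cols.T @ w), 0.0, 1.0)
        return np.degrees(np.arccos(cosines))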

The above observation is dependent on the use of Laplace distributed noise. Consider an n-dimensional Laplace distribution. Its wisps define specific directions of structure that BCM is able to pick out. This can be contrasted with the Gaussian distribution which, as we saw in our two dimensional examples, is spherically symmetric. When we rotate Laplace distributed numbers, we are selecting in which directions we want the structure to lie.

One may point out that input vectors selected from a matrix are not truly random, as the number of iterations in our runs was much larger than the number of columns in the matrix. To test whether this was a problem, we implemented a modified version of the program that created a new environment once the input vectors had been selected an average of ten times. The results were the same: the weights continued to converge to the same values when environments were changed.

Our model environment confirmed that BCM will pick out kurtotic structures even when trained in our artificial noise-based environments. This confirmation allows us to proceed with our study of correlated environments. The model environment also serves as a general framework for generating simple environments.

12 We found that the maximum number of receptive fields we could cull from the natural scenes was about fifty, whereas the rotation matrix must have 169 columns.

13 In two dimensions:

    [ a  c ] [ x ]       [ a ]       [ c ]
    [ b  d ] [ y ]  =  x [ b ]  +  y [ d ]


2.6 Environments Defined by a Correlation Function

It has been suggested that the development of cortical selectivity prior to visual experience may be due to correlated spontaneous retinal activity (Miller et al., 1989). Thus we investigated neural training in environments defined by specific two-point correlations. Recall, for environments with a zero mean, the correlation function is given by

C = E[xx^T]. (2.14)

In constructing this type of environment, we want to start with a collection of random vectors, {x}, and somehow impose on them a structure such that Equation 2.11 holds for a given correlation function. If we take vectors, n, of random numbers that are independently distributed with a variance normalized to one, and define a matrix M such that

M^T n = x, (2.15)

we have

C = E[xx^T] = E[(M^T n)(n^T M)]. (2.16)

Since E[nn^T] = I for uncorrelated random vectors with a normalized variance14, we have

M^T M = C. (2.17)

We can solve for M using the Cholesky decomposition15. We now have a method to create artificial environments with a given correlation function. Note the similarity of this method to the model synthetic environment. The input environment is determined by applying some type of mixing matrix to vectors of noise.
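A sketch of this recipe in Python, assuming C is a symmetric, positive-definite correlation matrix; note that numpy's Cholesky routine returns a lower-triangular factor A with A A^T = C, so M is taken to be A^T.

    import numpy as np

    def correlated_inputs(C, n_inputs=20000):
        # Generate inputs x = M^T n with M^T M = C, so that E[x x^T] = C (Eq. 2.17).
        A = np.linalg.cholesky(C)    # lower-triangular factor, A A^T = C
        M = A.T                      # then M^T M = A A^T = C
        n = np.random.normal(size=(C.shape[0], n_inputs))   # unit-variance white noise
        return M.T @ n               # each column is one correlated input vector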

An interesting application of our method is to calculate the correlation function of our natural scene images and create an environment based on that. We reproduced results from earlier work (Shouval, 1994) and determined that the correlation function of our natural scenes is approximately a log rule (Fig. 2.20):

C = −0.172 · ln(r) + 0.838 (2.18)

Training BCM on an input environment of noise correlated according to this logarithmic correlation function did not lead to orientation selective final weights. This suggests that the BCM learning rule depends on more than second order statistics.

What is the form of the pre-eye-opening environment? We make a few assumptions regarding its properties:

1. We take the correlation function to be translationally invariant and radially symmetric. This greatly facilitates the practical use of our method. As we can see from the calculations on natural scenes, which we showed to be correlated as a function of distance, this is not an unreasonable restriction to place on a visual input environment. We will see in our discussion of retinal waves, which constitute an important facet of the prenatal visual environment, that they are characterized by a radial spreading of excitation. The translational invariance is a statement that there is no special, preferred region of activity.

14 The diagonals of the correlation matrix are the single-point variances at each component, in this case, 1. The off-diagonal elements are the two-point correlations, which, for uncorrelated numbers, are zero.

15 A special case of the LU decomposition for symmetric, positive-definite matrices.


Figure 2.20: A logarithmic plot of the correlation as a function of separation distance for natural scenes is linear. The best fit is −0.172 · ln(r) + 0.838.

2. The correlations should fall off with distance so that widely separated inputs are uncorrelated. There is no reason for input components from widely separated places to have any similarity, unless there are long-range connections. Such connections have not been observed (Orban, 1984).

3. We must assume the particular form that the statistics of the environment take. We will study correlated noise, i.e., activity that is ultimately random in origin. The activity becomes correlated through structures within the visual system. We commonly use the Gaussian distribution to model noise.

Normally, we train on 13 × 13 patches. Thus, we have a 169-dimensional problem. The correlation function must be a 169 × 169 matrix. To arrive at the correlation function we must first calculate a matrix of distances. We define the distance matrix as the 169 × 169 matrix that contains the distance between every pair of points in a 13 × 13 patch. To construct the distance matrix we assign a label to each of the components of our 13 × 13 patch (Fig. 2.21). We call the center (0, 0) and the left corner (−6, −6). The rest of the labels follow accordingly. This way, each component's coordinate label tells us the horizontal and vertical distance from the center of the patch. We then squash a labeled 13 × 13 matrix into a one dimensional vector, stacking the 13 columns one on top of the next. This vertical vector defines the rows of the distance matrix. Its transpose defines the columns. For example, for element (50, 75) of the distance matrix, the vertical axis is the point (−4, 4)16 and the horizontal axis is the point (0, 3). Thus the distance element is:

R(50, 75) = [(−4 − 0)^2 + (4 − 3)^2]^(1/2) = √17 ≈ 4.123

This is done at every point. The resulting distance matrix is shown in Figure 2.22.
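A sketch of this construction; the labeling and column-stacking conventions follow the description above.

    import numpy as np

    def distance_matrix(size=13):
        # 169 x 169 matrix of distances between every pair of points in a
        # size x size patch, with the patch column-stacked into a vector.
        half = size // 2
        cols, rows = np.meshgrid(np.arange(-half, half + 1),
                                 np.arange(-half, half + 1))
        # stack the 13 columns one on top of the next
        coords = np.stack([cols.flatten(order='F'), rows.flatten(order='F')], axis=1)
        diff = coords[:, None, :] - coords[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))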

We tested several possible correlation functions that conform to the criteria listed above. Most notably:

1. An inverse exponential: C = exp(−r/c), where r denotes the distance and c is some constant radius (Fig. 2.23).

16 The 50th element of the squashed patch is the 11th row of the 3rd column (11 + 3 · 13 = 50). Thus the coordinate is 4 left of center and 4 down from center, that is, (−4, 4).


Figure 2.21: Outline of the procedure to define the distance matrix: the 13 × 13 patch is labeled from (−6,−6) to (6,6) and squashed into a vector that indexes the rows and columns of the 169 × 169 distance matrix.

Figure 2.22: The distance matrix. Note that the upper-right and lower-left corners of every 13 × 13 subregion have large (white) values. This is because at these corners the horizontal and vertical axes are at their opposite extremes.


Figure 2.23: The correlation matrix, C = exp(−r/c), and its first six eigenvectors (eigenvalues λ = 68.4, 18.6, 18.6, 7.7, 5.8, and 5.4). Note that the eigenvector with by far the strongest eigenvalue has all the weights constant. This is the DC eigenvector. It will not be present in a data set with a zero mean.

Figure 2.24: Shown are patches of Gaussian noise (A) and patches from an environment defined by a correlation matrix (B). Note the structure in the artificial environment.

2. A Gaussian: C = (1/σ^2) exp(−r^2/c^2), where σ is the standard deviation.

3. A difference of Gaussians: C = (1/σ1^2) exp(−r^2/(c^2 σ1^2)) − (1/σ2^2) exp(−r^2/(c^2 σ2^2)), where the standard deviation of the positive part, σ1, is greater than that of the negative part, σ2.

4. An inverse: C = 1/r, interpolating to set C(r = 0).

We used the Cholesky decomposition technique to create noise environments correlated according to each of these possible correlation functions (Fig. 2.24). Then we trained on the synthetic environment in exactly the same way we trained on natural scenes, sampling a small patch of the environment at each iteration.


Figure 2.25: Compare an image of DOG convolved noise (right) to unprocessed Gaussian noise (left). The filter used is a 2,6 difference of Gaussians (i.e., a DOG whose positive and negative portions have standard deviations of two and six, respectively).

Figure 2.26: Shown are patches of Gaussian noise (A) and patches from an environment of DOG convolved noise (B).

2.7 Retinally Processed Noise

As we have seen, retinal processing can be modeled by simply convolving the input with a difference of Gaussians (DOG) filter. Thus, we made an artificial environment by DOG convolving a large matrix of Gaussian distributed random numbers.
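A sketch of this processing step, using a 2,6 difference of Gaussians as in Figure 2.25; the kernel size, kernel normalization, and image size are illustrative.

    import numpy as np
    from scipy.signal import convolve2d

    def dog_kernel(sigma_center=2.0, sigma_surround=6.0, size=25):
        # Difference of Gaussians: excitatory center minus inhibitory surround.
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        r2 = x ** 2 + y ** 2
        center = np.exp(-r2 / (2 * sigma_center ** 2)) / (2 * np.pi * sigma_center ** 2)
        surround = np.exp(-r2 / (2 * sigma_surround ** 2)) / (2 * np.pi * sigma_surround ** 2)
        return center - surround

    noise = np.random.normal(size=(256, 256))                  # unprocessed Gaussian noise
    processed = convolve2d(noise, dog_kernel(), mode='same')   # retinally processed noise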

Convolving noise with a DOG imposes some structure on it (Fig. 2.25). Like the patches from the correlated environment, the patches from the DOG convolved noise environment (Fig. 2.26) look somewhat interesting. It seems as if many of them contain oriented bars of light.

Again, we trained on the artificial environment in much the same way as we do for natural scenes. In order to really understand what was happening, we had to watch the weight vector as it evolved. So, rather than saving the state of the system (that is, the weights and the threshold) only at the end of the simulation, we saved it at discrete intervals. This allowed us to animate the weight field and watch its time development.


Figure 2.27: The three possible states of an amacrine cell (normal, firing, refractory) and the criteria for transitions between states. Note that there are two possible ways for a cell in the normal state to begin firing: it may fire spontaneously, or it may be influenced to fire by active neighbors. A firing cell becomes refractory once it has fired for a full second, and a refractory cell returns to the normal state after an amount of time chosen from a Gaussian distribution.

2.8 Retinal Waves

For our retinal wave simulations, we implemented the model proposed by the Shatz group (Feller et al., 1997). The Shatz model accounts for many experimentally measured properties of retinal waves. Specifically, their model addresses:

1. the source of spontaneous activity.

2. the propagation of the activity in a compact, finite wave.

3. the discrepancy between the time interval between waves and the refractory period.

The model consists of two layers, an amacrine cell layer and a ganglion cell layer. The waves ultimately originate due to spontaneous activity in the amacrine layer. The amacrine cells are all characterized by one of three states (Fig. 2.27). They are either firing, refractory, or "normal."

A normal cell can be brought to fire by either of two mechanisms:

• Every cell has an innate ability to fire spontaneously. Thus, in our simulation, normal cells are assigned a given probability for spontaneous firing. Such spontaneous events, although relatively infrequent, are quite powerful, for they are the ultimate origin of the wave activity.

• What were initially spontaneous firings in some amacrine cells can spread and bring other cells to fire. Once a cell starts firing, it continues to fire for one second. During this time, the cell influences other surrounding cells in the form of excitation. The excitation level of an amacrine cell in the normal state evolves in time according to

X^A_{i,new} = X^A_{i,old} e^(−Δt/τ_A) + N^A_i. (2.19)

That is, a cell's new excitation level is obtained by diminishing the level at the previous time-step with a decay constant17 and augmenting it by N^A_i, the number of firing amacrine cells within the cell's dendritic arbor. When the excitation level of a non-refractory cell becomes greater than a threshold value (typically, we use 6 units), the excited normal cell will begin to fire. This models the propagation of retinal waves. Physiologically, the propagation may result from increased concentrations of K+ outside the cells (Burgi and Grzywacz, 1994). This increase comes from nearby cells that release potassium during action potential activity.

17 Typical values for the time-step, Δt, and the decay constant, τ_A, are 100 ms.


Figure 2.28: Depicted are all the connections of a single amacrine cell. It influences neighboring amacrine cells as well as ganglion cells in the readout layer (scanned from Feller et al., 1997).

Once a cell in the firing state has fired for the full second, it enters the refractory state. In simulations, we assign a refractory period drawn from a normal distribution with a mean of 120 seconds and a variance of 38 seconds. While in the refractory state, the cell cannot fire (neither spontaneously nor through wave propagation).

In understanding the model, it is instructive to think of a forest. Dead, combustible trees can be likened to our normal cells, and live trees are like our refractory cells. When the majority of trees are not combustible, a fire will not spread very far. But when there is a large percentage of trees that are ready to burn, a forest fire that has a small initiation site can become a great conflagration18. Similarly, when the field of cells is primarily in a refractory state, there may be small waves, but when the cells are primarily normal, the waves can sweep across the entire field.

The amacrine layer is coupled to a second, ganglion layer (Fig. 2.28). Ganglion cells of this second layer "read" the amacrine layer. That is, each ganglion cell is innervated by a number of contiguous amacrine cells. A ganglion cell's excitation level is updated according to

X^G_{j,new} = X^G_{j,old} e^(−Δt/τ_G) + N^A_j. (2.20)

Note that the ganglion cells are not influenced by their neighbors, but only by amacrine cells within the arbor. When a ganglion cell's excitation level is greater than about 10 units, the ganglion cell itself will become active. This occurs when a sufficiently high number of amacrine cells within the ganglion cell's "arbor" are active.
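The following is a simplified, single-time-step sketch of the two-layer model described above (Eqs. 2.19 and 2.20). The arbor radius, grid handling, and spontaneous firing probability are illustrative, and for brevity the amacrine and ganglion layers are taken to be arrays of the same size.

    import numpy as np
    from scipy.signal import convolve2d

    NORMAL, FIRING, REFRACTORY = 0, 1, 2

    def arbor_sum(field, radius):
        # Sum of a field over a square dendritic arbor of the given radius.
        kernel = np.ones((2 * radius + 1, 2 * radius + 1))
        return convolve2d(field, kernel, mode='same')

    def step(state, timer, X_A, X_G, dt=0.1, tau_A=0.1, tau_G=0.1,
             p_spont=1e-4, radius=2, theta_A=6.0, theta_G=10.0):
        # One time-step of the amacrine/ganglion model.
        firing = (state == FIRING).astype(float)
        N_A = arbor_sum(firing, radius)            # firing amacrine cells in each arbor
        X_A = X_A * np.exp(-dt / tau_A) + N_A      # Eq. 2.19
        # Normal cells fire spontaneously or when excitation crosses threshold.
        spont = np.random.rand(*state.shape) < p_spont
        start = (state == NORMAL) & (spont | (X_A > theta_A))
        state[start] = FIRING
        timer[start] = 0.0
        # Cells that have fired for a full second become refractory.
        timer[state == FIRING] += dt
        done = (state == FIRING) & (timer >= 1.0)
        state[done] = REFRACTORY
        timer[done] = np.random.normal(120.0, 38.0, size=int(done.sum()))
        # Refractory cells return to normal when their refractory period runs out.
        timer[state == REFRACTORY] -= dt
        state[(state == REFRACTORY) & (timer <= 0.0)] = NORMAL
        # Ganglion readout (Eq. 2.20): a cell becomes active above roughly 10 units.
        X_G = X_G * np.exp(-dt / tau_G) + arbor_sum(firing, radius)
        return state, timer, X_A, X_G, X_G > theta_G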

Incorporation of this second layer into the model has two important effects. The motivation was to account for the discrepancy between the measured inter-wave interval and the refractory period of the individual cells. Also, the activity in the ganglion layer looks more like the measurements of retinal waves. Some amacrine cells are always in a refractory state, so amacrine layer waves tend to be quite mottled. But since only the collective behavior of the amacrine layer is passed to the ganglion layer, whether or not an individual amacrine cell is firing is unimportant. The ganglion layer waves are well-defined (Fig. 2.29); that is, they have distinct boundaries, within which all cells are active.

The retinal wave model gives us yet another description of the prenatal visual environment. We used this and the other environments to study the possible mechanisms of activity-dependent development in the mammalian visual system.

18 This is what happened to Yellowstone National Park. Small fires were routinely put out, thus leaving the majority of trees in the "normal" state.


Figure 2.29: Some patches of the ganglion layer during a retinal wave.


Chapter 3

Results

3.1 Expectations

Geniculo-Cortical Connections

We would like to understand how orientation selectivity arises in visual cortical neurons prior to visual experience. If one or more of our approaches is an accurate depiction of the prenatal visual environment, we expect to find elongated subregions in our weight vectors. That is, our weight vectors should resemble the orientation selective receptive fields that we obtain by training on natural scenes (Fig. 2.15).

Retinal-Geniculate Connections

The following are the properties of retinal-geniculate connections whose development we would like to understand:

• The connections are localized. Thus, we expect that training in one of our environments will yield receptive fields with no more than a few strong weights.

• The LGN is characterized by a retinotopic map. If we consider the physics of axon growth, it makes sense that axons sprouting from approximately the same place will tend to end up close to one another. Thus, we can presume that prior to any activity there is some initial bias toward a retinotopic map. We study some simulations in which we initially bias the weight vector to be somewhat localized about a particular region. We expect that training such an initially biased weight vector in a model environment will cause the weight vector to sharpen (i.e., become more localized) about the region of initial bias.

• The LGN is laminated. Each layer is influenced by one or the other eye channel, but not both (Shatz, 1983). We therefore run simulations in which a single model neuron receives input along two channels, one for each eye. We expect the model cell to become selective to only one of the channels.


Figure 3.1: PCA weight vectors trained in an environment of noise correlated according to C = exp(−r/c). Note that the first weight vector is the DC, the principal component of the environment. The next five weight vectors were obtained by applying the Gram-Schmidt procedure, subtracting off the components along each of the previous weight vectors.

3.2 Orientation Selectivity

Correlation Function Environments

We trained BCM and PCA neurons in environments constructed according to the correlation functions listed in Section 2.6. The results were not highly dependent on the type of correlation function used.

PCA

We confirmed Miller's result that training PCA neurons in correlation-based environments leads to selective receptive fields (Miller, 1994). However, we can only obtain this result for an input environment with a zero mean. Otherwise, the weight vector just converges to the DC solution. Figure 3.1 illustrates a simulation in which we trained PCA in the correlated environment six times. After the first training we applied the Gram-Schmidt procedure (described in Section 2.5), subtracting off the components of the weight vector that lay in the directions of the previous receptive fields. Note that subtracting out the DC solution is equivalent to setting the mean of the input environment equal to zero.

If we compare the PCA weight vectors from this simulation to the eigenvectors of the correlation matrix (Fig. 2.23), we see that with each simulation the weight vector converges to the eigenvector with the next highest eigenvalue1. This is the result we expect based on the known properties of PCA.

BCM

The correlation function that seemed to precipitate the most selective BCM weight vectors was

C = exp(−r/c).

1 Note that the eigenvectors of similar eigenvalues may be interchanged in their order of occurrence, or even mixed in a linear combination. There is little preference between such eigenvectors. Note also that in comparing the eigenvectors with the weight vectors, light and dark regions may be interchanged. This is simply the effect of a sign change; it is the shape (i.e., direction) of the vector that is important.


Figure 3.2: Two stages of training BCM on an environment defined by C = exp(−r/c). Early on, the weights point in the direction of the principal component, which is orientation selective (left). The weights then dissipate into apparent structurelessness (right).

Figure 3.3: A PCA weight vector trained on DOG convolved noise.

In Figure 3.2 we show a BCM weight vector at two stages of training in this environment. We observe that soon after θ begins to rise, the weights start to look like the principal component of the input environment2 (see Fig. 2.23). But this turns out not to be a stable fixed point. If we animate the weight vector as it evolves, we observe that the receptive field rotates and ultimately dissipates to become a non-selective field.

Retinally Processed Noise

PCA

The PCA weight vector, when trained on DOG convolved noise, converges to a cosine (Fig. 3.3). This is, of course, the principal component of the input environment. Since the mean is zero for this environment, there is no DC solution to the eigenvector equation.

BCM

Training BCM in environments of retinally processed noise yields results similar to those obtained from training in correlated environments. The weight vectors initially appear selective, but then drift and become non-selective.

3.3 Localized Retinogeniculate Connections

We trained BCM model neurons in an environment constructed by implementing the Shatz readout model of retinal waves. The resulting weight vectors (e.g., Fig. 3.4, left) have a single strong component adjacent to a single weak component. That is, there is localized selectivity.

2 In these simulations, we subtract off the mean of the input environment. This eliminates the DC eigenvector.


Figure 3.4: BCM (left) and PCA (right) weight vectors trained on the Shatz retinal wave model. The BCM weight vector has converged to be selective to a single point, whereas the PCA weight vector is completely non-selective.

Figure 3.5: Each of the nine weight vectors in (B) was initialized with the bias located in the corresponding position in (A). An initial bias in the receptive field will be sharpened by training BCM in the retinal wave environment. The final weight vector becomes highly selective to a point close to the center of the initial bias.

Training PCA in the retinal wave environment does not lead to localized selectivity (Fig. 3.4, right). The weights converge to be approximately equal.

3.4 Retinotopy in the LGN

We observe that training BCM in the retinal wave environment will always lead to a single point with a large weight. The location of this point within the weight vector is not fixed: running many simulations indicated that it can be located virtually anywhere within the weight vector. It turns out that we can control where the high-weight point will lie if we bias the initial weight vector to be somewhat selective about a particular point (Fig. 3.5). As discussed above, this helps to explain the development of retinotopy within the LGN.


Figure 3.6: Here we trained a BCM weight vector in an environment in which each input vector has a retinal wave patch in one channel (left eye or right eye) and noise in the other. The cell becomes selective to only one eye.

3.5 Eye-Specific Layers in LGN

When we consider the time-line of kitten visual system development (Table 1.1), we see that retinal waves, present at embryonic day 52 (Meister et al., 1991), occur about a week before LGN lamination is completed (E60). Thus, it is reasonable to think that the retinal waves may play some role in the segregation of the layers.

To study eye-specific layer segregation we train BCM neurons in a two channel environment. In practice, the only difference from a single channel simulation is that the input and weight vectors are twice as long. Half of a vector's components represent information flow from the left eye, and the other half from the right eye.

If we compare a typical inter-wave interval of about 2 minutes (Feller et al., 1997) to a typical retinal wave duration (on the order of a few seconds), we see that during this stage in development, the retinal cells spend the majority of the time not involved in a retinal wave. Thus, the frequency with which both retinas are undergoing a retinal wave simultaneously is quite small. A reasonable model input environment, therefore, is for every input vector to carry retinal wave stimulus along one channel and Gaussian noise along the other3.
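A sketch of how such two channel input vectors could be assembled, assuming wave_patches is a collection of column-stacked patches taken from the retinal wave simulation; the noise amplitude is illustrative.

    import numpy as np

    def two_channel_input(wave_patches, noise_std=0.5):
        # Concatenate a retinal-wave patch on one randomly chosen eye channel
        # with Gaussian noise on the other.
        patch = wave_patches[np.random.randint(len(wave_patches))]
        noise = np.random.normal(scale=noise_std, size=patch.shape)
        if np.random.rand() < 0.5:
            return np.concatenate([patch, noise])   # wave in the left-eye channel
        return np.concatenate([noise, patch])       # wave in the right-eye channel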

Training in the two channel environment described above gives the result we would expect if retinal waves were responsible for the eye-specific layer segregation in the LGN (Fig. 3.6). The weights converge to be selective in one channel and non-selective in the other.

3 Input vectors that carry noise along both channels have no cumulative effect on the weight vector.


Chapter 4

Discussion

4.1 Interpretations

Cortical Orientation Selectivity

We confirmed that a PCA neuron will become orientation selective when trained in a rotationally symmetric environment with a zero mean (Shouval, 1994). We also showed that orientation selectivity can arise to a limited extent in BCM neurons trained in various forms of correlated activity environments. The orientation selective receptive fields that we get with BCM are not stable. However, we can argue that stability is not a requirement for prenatal receptive fields, as, by their nature, the environments in which they exist are transitory and are normally replaced by natural scenes. As we would expect from time-line considerations, retinal wave models do not help to explain orientation selectivity, but our studies of correlation function and DOG convolved noise environments give us insight into the process.

Our result for training BCM in an environment defined by the correlation function

C = exp (−r/c) .

(Fig. 3.2) is clearly analogous to a two dimensional example that we have previously discussed: a Gaussian cloud with a different variance in each direction (Fig. 2.12). In that case, we observe that the weight vector immediately points in the direction of maximum variance of the input, that is, the principal component. It subsequently drifts, apparently aimlessly. When we solve for the fixed points, however, we see that the weight vector is actually confined to an ellipse1.

The initial behavior of the weight vector in the low dimensional example is due to the fact that the direction of maximum input variance will have the strongest early effect on the growing weight vector. It is clear that the same phenomenon is at work in the higher dimensional case of correlation functions and DOG convolved noise. While, at first, the weight vector appears to be converging to a nice orientation selective pattern, it is merely growing toward the principal component, which is not stable.

The fact that the receptive field is not stable becomes unimportant if we slightly modify our learning rule. The learning rule that we usually use is:

ẇ = η y (y − θ) x,    θ = E[y^2] (4.1)

1 This involves maximizing the BCM cost function for this distribution.


Figure 4.1: The output distribution (probability density as a function of response) of the BCM weight vector trained on retinal waves.

The rule

ẇ = η (1/θ) y (y − θ) x (4.2)

is the same as Equation 4.1 except that it has an extra factor of 1/θ. This particular form of BCM has been studied (Law, 1994). It can be shown to exhibit the same sliding threshold qualities as the unmodified rule (Shouval, unpublished result). The rule has the same fixed points as the unmodified rule; only the dynamics are different.

The 1/θ factor has the effect of slowing down the learning. When there is a high amount of activity, the factor is small, and the time change of the weights is lessened. Such a modification lends some stability to the final weight vector. The receptive field remains orientation selective for quite a long time in simulation. This may be enough to account for the experimental result that kittens possess a limited degree of orientation selectivity when they first open their eyes. Exposure to natural scenes will lead to refinement of these receptive fields, and binocular deprivation will eventually destroy them.
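A sketch of a single training step under Equation 4.2; the learning rate and the time constant of the running estimate of E[y^2] are illustrative, and θ must be initialized to a positive value.

    import numpy as np

    def bcm_step(w, x, theta, eta=1e-4, tau_theta=100.0):
        # One update of the modified rule (Eq. 4.2):
        #   dw = (eta / theta) * y * (y - theta) * x,
        # with the sliding threshold theta tracking a running average of y^2.
        y = float(w @ x)
        w = w + (eta / theta) * y * (y - theta) * x
        theta = theta + (y ** 2 - theta) / tau_theta
        return w, theta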

The LGN

Localized Selectivity

The PCA receptive fields that we obtained were all non-local2. Thus, PCA does not help to explain this property of the LGN. To gain a greater understanding of the BCM retinal wave results, we look at the output distribution3 of the final weight vector (Fig. 4.1). There is a sharp peak at zero and two smaller peaks equidistant from zero.

We can construct a low dimensional environment to have a comparable output distribution (Fig. 4.2). The two dimensional environment consists of two heavily populated clusters, located 180° from one another, and two less populated clusters on opposite sides of the line defined by the first two. A BCM weight vector will orient orthogonally to both of the large clusters. This gives a great number of zero projections. There will be a smaller number of positive and negative projections localized about the projections of the weight vector onto the centers of mass of the two smaller clouds.

This low dimensional example gives us insight into BCM's behavior in the retinal wave environment. Recall that BCM seeks out kurtotic projections of the input environment. In the retinal wave environment, the BCM weight vector (Fig. 3.4) has oriented itself in such a way as to be orthogonal to the majority of the input vectors (as seen by the sharp peak at zero in the output distribution).

2 This is a general property of PCA (Shouval, 1994).
3 Recall that the output distribution is a histogram of the activity taken over the entire input space (i.e., all of the retinal wave patches): {yi} = w · {xi}.


Figure 4.2: A two dimensional BCM example with an output distribution (probability density as a function of activity) similar to that of the retinal wave environment.

Whenever the synapses with both the strong positive weight and the strong negative weight receive input from a retinal wave, their effects exactly cancel to give zero activity. The same is true when neither component is part of a retinal wave. However, when the positive weight sees a retinal wave and the negative weight does not see a wave4, or vice versa, the effects of the two components are cumulative. There is a nonzero activity level. The nonzero activity occurs when the points of positive and negative selectivity lie across a retinal wave border.

Retinotopy

We could not a priori eliminate the possibility that the BCM fixed points in the retinal wave environment were of unequal energies. If that were the case, biasing the initial weight vector would not have a strong effect. We observe, however, that in a retinal wave environment, training BCM with an initial bias of some selectivity about a point has a strong influence on where the fixed point will lie. This evidence suggests that the retinal waves may be responsible for the development of retinotopy in the LGN.

Eye-Specific Layers

Our two channel result from training BCM in a retinal wave environment (Fig. 3.6) is an instance of a more general property of BCM. This can be illustrated by means of a two dimensional, two channel example (Fig. 4.3). In this example, we have two channels, each of which has two dimensions, so we are actually dealing with a four dimensional problem. We construct the input environments as follows:

• Half of the input vectors have structure (Laplace distributed numbers) in their first two components and noise (Gaussian distributed numbers) in the second two components.

• The other half of the input vectors have noise in the first two components and structure in the third and fourth.

4 Note that for the sake of analysis, we assign a negative value to input components that are not a part of a retinal wave.


Figure 4.3: Two dimensional, two channel environment for BCM: at each iteration we present either channel with stimulus from a rotated Laplace distribution (left). This distribution contains structure that BCM seeks out. The other channel is presented with unstructured Gaussian noise (right).

Figure 4.4: When we train BCM in the environment depicted in Figure 4.3, the (four dimensional) weight vector becomes responsive to only one channel (in this case, the right eye channel). Note that we plot the left eye and right eye components of the weight vector against the structured stimulus distribution. We do this for the sake of readability. One must note that in simulation only one channel receives stimulus from this distribution at each iteration.

In our simulation, input vectors are selected at random, so at each iteration one channel is presented with structure and the other is presented with noise. The result (Fig. 4.4) is that the BCM weight vector becomes selective to only one channel.

PCA does not exhibit this behavior of becoming selective to a single channel when one channel sees noise while the other sees structure. We can construct an environment similar to the one described above with one difference: rather than Laplace distributed numbers, we present the structure channel with a Gaussian cloud that has higher variance in one direction than the other (Fig. 4.5). Thus, each input vector contains structure that is sought by PCA along one channel and noise along the other. PCA will not converge to a single channel. We can predict this by examining the eigenvectors of the correlation matrix of the environment. Two of the four eigenvectors have the same large eigenvalue. One of these eigenvectors corresponds to selectivity to one of the channels, and the other eigenvector corresponds to selectivity to the other channel. Since they have the same eigenvalue, they are equal energy solutions. PCA converges to a linear combination. That is, it need not be selective to a single channel.


Figure 4.5: Two dimensional, two channel environment for PCA: at each iteration we present either channel with stimulus from a rotated Gaussian distribution with greater variance in one direction than the other (left). This distribution contains structure sought by PCA. The other channel is presented with unstructured Gaussian noise (right).

We have seen that for BCM, but not for the PCA learning rule, when an input environment consists of structure along one eye channel and non-structure along the other, a cell will become responsive to a single channel. This is the case even when the channel along which there is structured input is not constant. Since there is no correlation of activity between the two retinas and the fraction of time that waves are active is small, retinal waves satisfy this condition. Our results indicate that they may, indeed, be responsible for the segregation of the LGN into eye-specific layers.

4.2 Comparison of Learning Rules

We observed that both the PCA and BCM rules can account for the development of orientation selectivity prior to visual experience. But only the BCM rule predicts the segregation of the LGN layers as well. Also, the PCA rule cannot account for the experimental result that the LGN cells have localized connectivity to the retina5. The PCA receptive field is non-local in the retinal wave environment. Thus, in describing the prenatal environment, the BCM rule is more robust.

Other learning rules have been proposed. One such rule (Haith, 1998; Sejnowski, 1998) is

ẇ_i = η w_i y x_i ,    w ← w/|w| (4.3)

This is a modified Hebb rule in which the weight change at each component is proportional to its current value, the current activity of the cell, and the stimulus. The weights are explicitly normalized at each iteration.

Scaling the learning rule with the current weight value causes a feedback cycle in which strong weights tend to become stronger. This rule leads to localized weight vectors that could explain the observed retinogeniculate connectivity. However, this rule has three flaws:

5 We know that mean activity in retinal waves is not zero, so the DC solution is the solution.


• It will always converge to a localized weight vector, even in training on natural scenes. Thus, this cannot be the correct learning rule for cortical neurons. The implication is that the geniculate and cortical neurons operate under different learning rules. We cannot dismiss the rule based on this alone, but we would tend to favor a single rule that applies to both types of neurons.

• One must incorporate other mechanisms, such as inhibition between layers and reinforcement within layers, to explain the eye-specific lamination in the LGN (Haith, 1998).

• It involves an explicit weight normalization. This is not all that biologically plausible since it would require the incorporation of information from all the synapses into a calculation at each individual synapse.

Of the three learning rules, BCM is the only one that can account for all four of the properties discussed in this thesis. It does so without the use of non-local information for local calculations.

4.3 The Emergent Picture

Our results suggest the following description of the development of selectivity in the retinofugal pathway prior to visual experience:

Retina to LGN Connections

The emergence of localized selectivity in the retinogeniculate connections, as well as the retinotopy and eye-specific lamination of the LGN, is attributable to the retinal wave activity that is present two weeks before birth (in kittens). This result is dependent on the use of the BCM learning rule. The critical period for the LGN is rather short. Its properties are fixed well before eye-opening, presumably so that it is ready to act as the relay and filter we see in adults.

LGN to Cortex Connections

Cortical orientation selectivity develops after the LGN properties. We have shown that it can begin to arise from BCM trained in various forms of correlated activity such as:

• Activity defined by a correlation function that falls off exponentially with distance.

• Retinal processing of spontaneous activity.

The fixed points are not stable, but we can lend stability to them if we modify our learning rule with a 1/θ factor. This factor does not change the behavior of the rule in the natural scene environments characteristic of the post-eye-opening visual world.

4.4 Conclusion

We have shown that these visual system characteristics need not be determined by separate, complicated developmental processes whose instructions would have to be specifically outlined in the genetic code. We also note that we did not need to incorporate network effects. We therefore have the elegant result that all four characteristics can be explained as arising from the basic properties of single-cell neural interaction. Nature has found an incredible economy.


Appendix A

PCA Fixed Points

Here we show that the particular eigenvector the PCA rule converges to is that with the greatest eigenvalue (Oja, 1982). Note that PCA is a modified Hebb rule. The second term serves only to normalize the weights, so we can start with the Hebb rule:

ẇ = y x

E[ẇ] = E[y x] = E[(x · w) x] = E[x x^T w] = C w

Since the eigenvectors form a complete basis, we can write

w = Σ_i a_i v_i.

Substituting and differentiating, we have

Σ_i ȧ_i v_i = C Σ_i a_i v_i = Σ_i a_i λ_i v_i.

Since the eigenvectors are orthonormal, we can take the dot product of both sides of this equation with v_j to get

ȧ_j = λ_j a_j.

The solution to this differential equation is

a_j(t) = a_j(0) e^(λ_j t).

Therefore, the coefficient corresponding to the greatest eigenvalue will grow the fastest.
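As a small numerical check of this result, one can iterate Oja's rule in a toy correlated environment and compare the converged weight vector with the leading eigenvector of C; the matrix and parameters below are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    C = np.array([[3.0, 1.0],
                  [1.0, 2.0]])                 # toy correlation matrix
    M = np.linalg.cholesky(C)                  # inputs x = M n satisfy E[x x^T] = C
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    eta = 0.01
    for _ in range(20000):
        x = M @ rng.normal(size=2)
        y = w @ x
        w += eta * y * (x - y * w)             # Oja's rule: Hebbian term plus normalization

    eigvals, eigvecs = np.linalg.eigh(C)
    v_top = eigvecs[:, -1]                     # eigenvector with the largest eigenvalue
    print(abs(w @ v_top))                      # approaches 1 as w aligns with v_top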


Bibliography

Bear, M. F., Connors, B. W., and Paradiso, M. A. (1996). Neuroscience: Exploring the Brain. Williams and Wilkins.

Bear, M. F. and Malenka, R. C. (1994). Synaptic plasticity: LTP and LTD. Curr. Opin. Neurobiol., 4:389–399.

Bienenstock, E. L., Cooper, L. N., and Munro, P. W. (1982). Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2:32–48.

Blais, B. S. (1998). The Role of the Environment in Synaptic Plasticity: Towards an Understanding of Learning and Memory. PhD thesis, Brown University, Institute for Brain and Neural Systems; Dr. Leon N Cooper, Thesis Supervisor.

Blais, B. S., Intrator, N., Shouval, H., and Cooper, L. N. (1998). Receptive field formation in natural scene environments: comparison of single cell learning rules. Neural Computation, 10(7).

Blakemore, C. (1976). The conditions required for the maintenance of binocularity in the kitten's visual cortex. Journal of Physiology, 261(2):423–44.

Burgi, P.-Y. and Grzywacz, N. M. (1994). Model based on extracellular potassium for spontaneous synchronous activity in developing retinas. Neural Computation, 6:983–1004.

Feller, M. B., Butts, D. A., Aaron, H. L., Rokhsar, D. S., and Shatz, C. J. (1997). Dynamic processes shape spatiotemporal properties of retinal waves. Neuron, 19:293–306.

Feller, M. B., Wellis, D. P., Stellwagen, D., Werblin, F. S., and Shatz, C. J. (1996). Requirements for cholinergic synaptic transmission in the propagation of spontaneous retinal waves. Science, 272:1182–1187.

Friedlander, M. J. and Tootle, J. (1990). Postnatal anatomical and physiological development of the visual system. In Coleman, J. R., editor, Development of Sensory Systems in Mammals, pages 61–124, New York. John Wiley and Sons.

Galli, L. and Maffei, L. (1988). Spontaneous impulse activity of rat retinal ganglion cells in prenatal life. Science, 242:90–91.

Haith, G. L. (1998). Modeling Activity-Dependent Development in the Retinogeniculate Projection. PhD thesis, Stanford University.

Hebb, D. O. (1949). The Organization of Behavior. Wiley.


Hubel, D. H. and Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol., 160:106–154.

Hubel, D. H. and Wiesel, T. N. (1963). Receptive fields of cells in striate cortex of very young, visually inexperienced kittens. J. Neurophysiol., 26:994–1002.

Intrator, N. and Cooper, L. N. (1992). Objective function formulation of the BCM theory of visual cortical plasticity: Statistical connections, stability conditions. Neural Networks, 5:3–17.

Law, C. and Cooper, L. (1994). Formation of receptive fields according to the BCM theory in realistic visual environments. Proceedings National Academy of Sciences, 91:7797–7801.

Law, C. C. (1994). Development of Primary Visual Cortex According to the BCM Theory of Synaptic Plasticity. PhD thesis, Brown University, Institute for Brain and Neural Systems; Dr. Leon N Cooper, Thesis Supervisor.

Leventhal, A. G. and Hirsch, H. V. B. (1980). Receptive field properties of different classes of neurons in visual cortex of normal and dark-reared cats. Journal of Neurophysiology, 43:1111–1132.

Maslim, J. and Stone, J. (1986). Synaptogenesis in the retina of the cat. Brain Res, 373:35–48.

Mastronarde, D. N. (1992). Nonlagged relay cells and interneurons in the cat lateral geniculate nucleus: Receptive-field properties and retinal input. Visual Neuroscience, 8:407–441.

Meister, M., Wong, R. O. L., Baylor, D. A., and Shatz, C. J. (1991). Synchronous bursts of action potentials in ganglion cells of the developing mammalian retina. Science Wash. DC, 252:939–943.

Miller, K. D. (1994). A model for the development of simple cell receptive fields and the ordered arrangement of orientation columns through activity-dependent competition between on- and off-center inputs. J. Neurosci., 14:409–441.

Miller, K. D., Keller, J., and Stryker, M. P. (1989). Ocular dominance column development: Analysis and simulation. Science, 240:605–615.

Mooney, R., Penn, A. A., Gallego, R., and Shatz, C. J. (1996). Thalamic relay of spontaneous retinal activity prior to vision. Neuron, 17:863–874.

Oja, E. (1982). A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15:267–273.

Orban, G. A. (1984). Neuronal Operations in the Visual Cortex. Springer Verlag.

Sanderson, K. J. (1971). Visual field projection columns and magnification factors in the lateral geniculate nucleus of the cat. Experimental Brain Research, 13:159–177.

Sejnowski, T. (1998). Constrained optimization for neural map formation: A unifying framework for weight growth and normalization. Neural Computation, 10:671–716.

Shatz, C. (1983). The prenatal development of the cat's retinogeniculate pathway. Journal of Neuroscience, 3:482–499.

Shatz, C. J. and Stryker, M. P. (1988). Prenatal tetrodotoxin blocks segregation of retinogeniculate afferents. Science, 242:87–89.


Sherman, M. and Koch, C. (1990). Thalamus. In Shepherd, G. M., editor, The Synaptic Organization of the Brain, pages 246–278, New York. Oxford University Press.

Shouval, H. Z. (1994). Formation and Organization of Receptive Fields, With An Input Environment Composed of Natural Scenes. PhD thesis, Brown University, Institute for Brain and Neural Systems; Dr. Leon N Cooper, Thesis Supervisor.

Stryker, M. and Harris, W. (1986). Binocular impulse blockade prevents the formation of ocular dominance columns in cat visual cortex. Journal of Neuroscience, 6:2117–33.

Voigt, T., Naito, J., and Wassle, H. (1983). Retinotopic scatter of optic tract fibres in the cat. Experimental Brain Research, 52:25–33.

Wiesel, T. N. and Hubel, D. H. (1965). Comparison of the effects of unilateral and bilateral eye closure on cortical unit responses in kittens. J. Neurophysiol., 28:1029–1040.

Wiesel, T. N. and Hubel, D. H. (1974). Ordered arrangement of orientation columns in monkeys lacking visual experience. J. Comp. Neurol., 158:307–318.

Wong, R. O. L. (1997). Patterns of correlated spontaneous bursting activity in the developing mammalian retina. Seminars in Cell and Developmental Biology, 8:90–91.

Wyatt, J. L. and Elfadel, I. M. (1995). Time-domain solutions of Oja's equations. Neural Computation, 7(5):915–922.
