

    2003 Special Issue

    Modeling the adaptive visual system: a survey of principled approaches

    Lars Schwabe *, Klaus Obermayer Department of Computer Science and Electrical Engineering, Berlin University of Technology, FR2-1, Franklinstrasse 28/29, Berlin 10587, Germany

    Received 2 December 2002; revised 28 July 2003; accepted 28 July 2003

    Abstract

Modeling the visual system can be done at multiple levels of description ranging from computer simulations of detailed biophysical models to firing rate and so-called black-box models. Re-introducing David Marr's analysis levels for the visual system, we motivate the use of more abstract models in order to answer the question of what the visual system is computing. The approaches we selected to review in this article concentrate on modeling the changes of sensory representations. The considered time-scales range from the developmental time-scale of receptive field formation to fast transient neuronal dynamics during a single stimulus presentation. Common to all approaches is their focus on providing functional interpretations, instead of only explanations in terms of mechanisms. Although the concrete approaches can be distinguished along different lines, a common theme is emerging which may qualify as a paradigm for providing functional interpretations for changes of receptive field properties, i.e. the dynamic adjustment of sensory representations to varying external or internal conditions. © 2003 Elsevier Ltd. All rights reserved.

    Keywords: Visual cortex; Models; Infomax; Bayesian inference; MDL principle

    1. Introduction

When should an organism exploit its environment in order to benefit from its previously acquired knowledge, and when should an organism continue to explore its environment to acquire even more knowledge for later exploitation? This famous and well-known exploration–exploitation tradeoff (see, e.g., Kaelbling, Littman, & Moore, 1996) is a rather high-level problem relevant for understanding the behavior of biological organisms and for building intelligent artificial agents. This problem, however, has a low-level counterpart which in turn is highly relevant for understanding biological sensory systems and, of course, for building the sensing devices of intelligent artificial agents. The so-called plasticity–stability tradeoff (Grossberg, 1976) refers to the problem of deciding when and how to adapt the internal sensory representation of the outside world.
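The exploration–exploitation tradeoff can be made concrete in the standard multi-armed bandit setting. The following sketch is purely illustrative and not from the paper; the arm reward means, the noise level, and the epsilon value are invented. It uses the classic epsilon-greedy strategy: explore a random arm with small probability, otherwise exploit the arm currently believed best.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy strategy on a multi-armed bandit: with probability
    epsilon pick a random arm (explore), otherwise pick the arm with the
    highest estimated mean reward (exploit)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = true_means[arm] + rng.gauss(0.0, 0.1)             # noisy payoff
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total_reward += reward
    return estimates, total_reward

estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(3), key=lambda a: estimates[a])
```

With enough exploration the agent identifies the best arm (here the third one), but every exploratory pull of an inferior arm costs reward — exactly the tension the tradeoff describes.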

On the one hand, subsequent processing stages rely on stable and reproducible sensory representations when performing their computations like, e.g., the decision

between exploration and exploitation. On the other hand, sensory representations need to be plastic in order to ensure a high fidelity representation of sensory stimuli whose behavioral relevance may not be known beforehand. In other words, sensory representations need to be adaptive. Indeed, adaptation is a widespread phenomenon in nervous systems, and it happens on multiple time-scales. In the context of the early visual system, the activity-dependent refinement of cortical maps happens during a critical phase which lasts weeks, perceptual learning occurs during hours and days, contrast-adaptation in the primary visual cortex works on the order of a few seconds, and the fast dynamics of receptive field properties on the order of a few tens or hundred milliseconds may also be viewed as an adaptive process.

Together with the apparent general applicability of the idea of adaptation comes a certain vagueness: for example, what exactly do contrast-adaptation and the development of cortical maps have in common? Is it indeed justified to think of dynamic receptive field properties or even attentional top-down modulations in the visual cortex as some sort of adaptation? Clearly, understanding adaptation only in the narrow sense of a neuronal fatigue after prolonged stimulation falls too short. However, with this review we want to demonstrate that adaptation, when understood in

0893-6080/03/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.neunet.2003.07.009

Neural Networks 16 (2003) 1353–1371. www.elsevier.com/locate/neunet

* Corresponding author. Tel.: +49-30-314-24753; fax: +49-30-314-73121. E-mail address: [email protected] (L. Schwabe).


a broader sense, may serve as a paradigm for providing functional interpretations of experimentally characterizable changes of receptive field properties on multiple time-scales. We suggest conceptualizing these changes as adjustments of sensory representations to changing external or internal conditions to ensure their optimal usage. In order to apply this view to a concrete phenomenon, however, one needs to make explicit (i) which conditions are changing, and (ii) what is the measure according to which the optimality of a representation is to be judged. Here we review models of the visual system at multiple time-scales and focus on so-called principled approaches, because they make both aspects explicit.

    This article is structured as follows: In Section 1.1 wegive a very short introduction to the experimentally well

characterized early part of the primate visual pathway. In Section 1.2 we justify our focus on principled approaches by re-introducing the three analysis levels for the visual system proposed by David Marr more than 20 years ago (Marr, 1982), and in Section 1.3 we suggest some taxonomies for grouping organization principles for the visual system. In Sections 2–4 we review models for adaptation processes in the visual system, and in Section 5 we finish with a short summary and a sketch of what we think are promising directions for future research.

    1.1. The primate early visual pathway

Most (but not all) approaches we review in this article are intended to be simple models of primary visual cortex. In these models the inputs are pixel images, and the neuronal responses are modeled as functions of these images. Thus, these models abstract from the early visual pathway. Often this can be justified, because explicitly modeling the signal transmission from the retina to the primary visual cortex would make the model too complicated and hide the main idea behind it. For completeness, however, let us briefly summarize the anatomy and physiology of the early visual pathway. A more detailed description can be found in (Kandel, Schwartz, & Jessell, 1996).

Fig. 1(a) sketches the overall anatomical structure of the early visual pathway. When a cat or monkey fixates its environment, light that falls into the eye is focused by the cornea and the lens to form two images on the left and right retinae. The part of the world that contributes to the image formed on the two retinae is called the visual field of the animal. Each retina transforms the incoming light intensity distribution into spike patterns, which are transmitted by the two optic nerves into the central nervous system. Each optic nerve consists of roughly 10⁶ fast conducting axons, almost all of which target two structures within the thalamus, which are called Lateral Geniculate Nuclei or LGN. At the optic chiasm, each optic nerve branches such that one half of the fibers targets the contralateral LGN at the side opposite to its origin, whereas the rest contacts the ipsilateral LGN at the same side as the eye of origin. The cross-over of

the nerve fibers occurs in a highly ordered fashion, which ensures that each LGN receives the fibers from the ipsilateral parts of both retinae. Thus, each LGN processes the signals from the contralateral hemisphere of the visual field.

Proceeding from the LGN, another less concentrated bundle of fibers, which is referred to as the optic radiation, contacts the primary visual cortex (also referred to as area V1). In contrast to the optic nerve, the optic radiation does not cross hemispheres. Hence, analogously to the LGN, each hemisphere of the primary visual cortex processes visual information from the contralateral hemisphere of the visual field. From V1, two major output streams can be distinguished. The first stream projects from V1 to higher visual areas, whereas the second stream projects back to the LGN

and other deep structures. So far we have summarized anatomical basics of the pathway from the retina to V1. Let us now move on to the prototypical response properties of cells in the retina, LGN and V1.

Neurons in the retina and the LGN are sensitive to stimuli only within a small, roughly circular part of the visual field, which is called their receptive field. If this receptive field is illuminated with a small bright spot, the responses of most cells are not uniform but depend on the spot's position within the receptive field. Some neurons are excited when the spot hits the center of the receptive field, whereas their response is suppressed by illumination of the surround. These cells are called ON-center cells. The so-called OFF-center cells are inhibited by illumination in the center and excited by surround illumination. Diffuse illumination evokes almost no response. However, compared to neurons in the retina and the LGN which have nearly circular receptive fields, cortical response characteristics show a qualitatively new feature: many cortical neurons respond selectively to contrast lines, bars, or gratings of a certain orientation within the receptive field.
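Such center–surround behavior is commonly modeled as a difference-of-Gaussians filter. The following is a minimal illustrative sketch, not the authors' model; the filter form and all parameter values are standard textbook assumptions.

```python
import math

def dog_response(stimulus, radius=8, sigma_c=1.0, sigma_s=2.0):
    """Response of an idealized ON-center cell modeled as a
    difference-of-Gaussians filter: a narrow excitatory center Gaussian
    minus a broader inhibitory surround Gaussian. `stimulus(x, y)` gives
    the light intensity at position (x, y) in the receptive field."""
    response = 0.0
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            r2 = x * x + y * y
            center = math.exp(-r2 / (2 * sigma_c**2)) / (2 * math.pi * sigma_c**2)
            surround = math.exp(-r2 / (2 * sigma_s**2)) / (2 * math.pi * sigma_s**2)
            response += (center - surround) * stimulus(x, y)
    return response

spot = lambda x, y: 1.0 if x * x + y * y <= 1 else 0.0  # small bright central spot
diffuse = lambda x, y: 1.0                              # uniform illumination

# A small central spot drives the cell strongly; diffuse illumination
# excites center and surround almost equally and evokes almost no response.
```

Running `dog_response(spot)` gives a clearly positive value, while `dog_response(diffuse)` is near zero, mirroring the "diffuse illumination evokes almost no response" observation above.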

Fig. 1(b) shows the spike train of a cortical neuron in macaque V1 in response to an oriented bar moved across the classical receptive field (after Hubel & Wiesel, 1968). The neuron responds most strongly for a given orientation of the bar, and its response declines strongly with increasing difference between the stimulus orientation and the preferred orientation of the cell. In Fig. 1(c) the average spike rate of an orientation selective neuron is plotted as a function of the orientation of a moving grating used for visual stimulation (after Levitt & Lund, 1997). This curve is called an orientation tuning curve. The response of this neuron is tuned to an orientation of approximately 30°. The neuron responds similarly to an orientation of 210°, which corresponds to a grating of the same orientation but moving into the opposite direction. Fig. 1(d) shows a hypothetical wiring scheme between the neurons in the LGN and an orientation-tuned neuron in V1: a projection from ON- and OFF-center cells in the LGN with their receptive field centers being located along an axis whose orientation matches the preferred orientation of the cortical neuron



would induce orientation selectivity. In other words, the cortical neuron becomes a detector of oriented edges with the strongest response to bright illumination in the ON-center regions and absence of illumination in the OFF-center regions. The response vanishes for diffuse stimulation.

Of course, response properties in the primary visual cortex have been characterized in much more detail than sketched here, but this so-called orientation selectivity is the paradigmatic example for feature selectivity in the visual system. It may come as a surprise, but until now the details

of the possibly species-specific mechanisms underlying orientation selectivity are not known, and the wiring scheme in Fig. 1(d) is only one possible explanation among a continuum ranging from the feed-forward scheme suggested by Fig. 1(d) which involves the LGN and V1 (Hubel & Wiesel, 1977) to pure recurrent models relying only on the intra-cortical circuitry within V1 (Adorjan, Levitt, Lund, & Obermayer, 1999a). Furthermore, it also applies only to simple cells, which in addition to a preferred orientation have a preferred phase, i.e. reversing the light and dark regions of a stimulus would silence such neurons.

Fig. 1. (a) The primate early visual pathway. (b) Responses of an orientation-selective neuron in the primary visual cortex when an oriented bar is moved across the classical receptive field (rectangular box). (c) Typical orientation tuning curve with the mean firing rate plotted against the orientation of the moving visual stimulus. (d) Spatial layout of how to connect ON-center and OFF-center neurons in the LGN to cells in primary visual cortex for producing orientation selectivity predicted by a feedforward model.



The phase-invariance of complex cells cannot be explained with such a wiring scheme. Recent and in-depth reviews of this topic can be found in (Ferster & Miller, 2000; Sompolinsky & Shapley, 1997). However, for our considerations the concrete wiring scheme itself is only of secondary interest. The approaches reviewed here ask why the receptive fields of cells in primary visual cortex are oriented at all (see (Olshausen & Field, 1996) in Section 2.2) or why complex cells are phase invariant (see (Wiskott & Sejnowski, 2002) in Section 2.2). Let us now re-introduce David Marr's analysis levels for the visual system in order to justify the principled approaches.

1.2. David Marr's analysis levels

David Marr (together with Tomaso Poggio) developed a general account of information-processing systems in general and of visual systems in particular in terms of three levels of analysis (Marr, 1982): (i) the level of the computational theory of the system, (ii) the level of algorithms and representations, which are the mathematical descriptions of the computations, and (iii) the level of the neuronal implementation of these algorithms and representations. In computer science a similar distinction between the (formal) specification of a problem, its algorithmic solution and the concrete implementation has emerged as the ideal case for developing algorithms/software. The challenge, however, of actually applying

Marr's three levels of analysis is twofold: first, in the neurosciences one is not designing systems as in computer science, but analyzing them. Second, a clear separation between the three levels may not always be possible (or even desirable), because in neuronal systems the actually realized algorithms are certainly not independent of the neuronal hardware they are running on. In other words, although a recurrent neuronal network can in principle simulate an arbitrary Turing machine (see (Minsky, 1967) and (Siegelmann & Sontag, 1991)), i.e. every conceivable algorithm can be implemented, this does not mean that the brain's microcircuits should be viewed in this particular way. Let us now illustrate Marr's basic idea with a metaphor taken from computer science.

Although in computer science algorithms are often developed in an ad hoc manner, the ideal case is a step-by-step development starting from an abstract specification of the problem to be solved and ending with executable code. Consider the problem of sorting a sequence of numbers S = i_0, i_1, …, i_N according to a relation <. The first step for developing a sorting algorithm is to formalize the requirements via, e.g., using the predicate sorted defined as

sorted(S) :⇔ i_0 < i_1 < … < i_N

Every sorting algorithm sort has to transform an input sequence S_in into an output sequence S_out = sort(S_in) so that after the sorting algorithm has terminated sorted(S_out) holds, i.e. the output sequence S_out is sorted. We would like to point out

three important aspects of this metaphor. First, only algorithms for which the predicate sorted holds for every input sequence S_in should be called a sorting algorithm. Second, multiple sorting algorithms could solve the problem of sorting a sequence of numbers, e.g. quick-sort, bubble-sort or selection-sort. They may differ, however, in terms of their computational efficiency. Third, each sorting algorithm could be formulated, compiled and executed on a variety of different hardware platforms, e.g. a personal computer, a mechanical computing device, a Nintendo GameCube or a mobile phone.
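The metaphor can be written out directly. In this hypothetical sketch (function names are ours), `is_sorted` plays the role of the specification-level predicate, and `selection_sort` is one of the many algorithms that satisfy it:

```python
def is_sorted(seq):
    """The specification-level predicate sorted(S): i_0 < i_1 < ... < i_N."""
    return all(a < b for a, b in zip(seq, seq[1:]))

def selection_sort(seq):
    """One of many algorithms satisfying the specification: repeatedly
    swap the smallest remaining element to the front."""
    out = list(seq)
    for i in range(len(out)):
        j = min(range(i, len(out)), key=out.__getitem__)
        out[i], out[j] = out[j], out[i]
    return out

s_in = [5, 3, 8, 1]
s_out = selection_sort(s_in)
# sorted(S_out) must hold for every input; which algorithm produced
# S_out (quick-sort, bubble-sort, selection-sort) and on which hardware
# it ran are questions at Marr's levels (ii) and (iii), not level (i).
```

Note that the predicate says nothing about how the output is produced: any function for which `is_sorted(selection_sort(s))` holds on every input meets the level-(i) specification equally well.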

Applied to David Marr's levels of analysis, the formulation of the predicate sorted corresponds to level (i). An algorithm can only be developed if its requirements have been formulated. Correspondingly, the coarsest

description of a sensory system is a (verbal) description of what the system is computing, even if the details of this computation are not known yet. The formulation of a particular sorting algorithm like quick-sort or bubble-sort corresponds to level (ii). In the same way as the problem of sorting a sequence of numbers could be solved by a multitude of different algorithms in different ways, a sensory system could compute representations of the outside world in a more or less efficient manner. Another parallel is the type of the used representations. Sorting algorithms are usually formulated for particular data-types like linked lists, arrays or heaps. For a particular sensory system one can ask whether the representation is discrete or continuous, noisy

or reliable. Finally, the hardware used to execute the software implementation of a sorting algorithm corresponds to level (iii). For example, once it is assumed that the representations of a particular sensory system are continuous, one has to make explicit the neuronal substrate of these representations. Continuous representations could be realized at the neuronal level in a multitude of different ways. It is conceivable that the time-averaged firing rate of individual neurons corresponds to the represented continuous quantity, but the average activity of a population of neurons as well as time-delays between action potentials are also candidate substrates. On the other hand, discrete representations could be realized locally simply by the presence or absence of action potentials of selected neurons. A more global realization of discrete representations could involve the synchronized activity of large groups of neurons. The set of neurons which at a given time synchronize their action potentials could then be identified with a discrete representation.

Of course, despite these parallels an important difference is obvious: once the problem to be solved has been specified, a computer scientist can design an algorithm to solve this problem with the only constraints being the expressive power of the used formal programming language. Evolution, in contrast, always had to re-build and adapt systems to the environmental demands, utilizing only the available resources. Thus, it could well be that evolution has produced solutions which are only just good



enough and not optimal, as would have been possible if the system were designed from scratch instead of being the result of modifying an already existing system.

However, during the last 15 years or so a systematic approach to modeling sensory systems has been developed. The general method is the following: First, one simply guesses the problem solved by the investigated sensory system. Second, one formalizes a mathematical criterion according to which the performance of the sensory system is evaluated. Third, one derives the optimal solution to this problem in terms of a mathematically formulated algorithm, which could in principle be implemented on a computer. Then, one tries to solve the same problem with a mathematical model, which is subject to the constraints of the modeled

sensory system and compares the performance of this solution with the optimal one. As a last step one needs to compare the resulting predictions of the constrained model with experimental data like measured receptive field properties. If the model predictions match the measured data, then one can hope to have gained some insight into what and how well the investigated system is computing, because this was formulated in the first place.

This approach is clearly a top-down approach in the sense that the sensory system's function is formulated first and the neuronal machinery used for realizing this function is derived afterwards. Marr's analysis levels seem to support this approach. Furthermore, it is similar to the ideal case in computer science, which starts from a specification of a problem and then proceeds via well-defined refinement steps towards the executable code. Therefore, one may also refer to it as analysis by design in the sense that a solution to a problem the sensory system is guessed to solve is designed. We already noted, however, that evolution may not have found optimal, but only good enough solutions. Furthermore, when modeling biological

visual systems the primary goal is still an analysis of the investigated system instead of designing an artificial vision system. Hence, one may question the validity of this analysis by design in favor of a bottom-up approach starting from models of neuronal circuits which are then analyzed in terms of the computations they perform. Although in this article we focus on top-down approaches only, we view both approaches as being distinct and almost equally important.

    1.3. A taxonomy for computational principles

At a very general level, one agrees that sensory systems are computing representations of the outside world which serve as the basis for an animal's behavior.

Concrete approaches, however, can be distinguished along multiple different lines. For example, consider the systematic top-down approach to modeling sensory systems sketched in Section 1.2. It involves a criterion for evaluating the sensory system's performance, which is usually formulated in terms of an objective function. Fig. 2(a) and (b) show the information flow between an agent's sensory system, the read-out and action selection, and the environment. In both cases the agent perceives its environment via its sensory system. Adapting the sensory system as dictated by an objective function, however, involves an error signal. We use the source of this error signal to derive a useful taxonomy according to which modeling approaches might be grouped.

Almost all proposed approaches use unsupervised or self-supervised organization principles, mainly because supervised principles have been viewed to be less plausible, since they require comparing the actual outputs of the sensory system with desired outputs in order to compute the error signal. Indeed, every biologically plausible organization principle needs to predict adaptation rules, which can

Fig. 2. Suggested taxonomy for organization principles for sensory representations. (a) Unsupervised and self-supervised principles do not rely on any feedback from the environment. All used information about the environment is already present in the feedforward input. (b) Feedback-based principles utilize feedback from the environment which is not present in the raw feedforward input, like reinforcements.



be computed based on locally available signals. In unsupervised and self-supervised organization these signals are computed within the sensory system itself (see solid arrows in Fig. 2(a)). Neurons being part of high-level representations may send their output signals back via feedback projections in order to improve low-level representations, but the computed error signals do not incorporate other information about the environment than already present in the feedforward input.

In contrast, feedback-based organization principles utilize feedback from the environment in order to improve sensory representations (see Fig. 2(b)), which would allow for a task-specific adaptation and reorganization of a sensory system. Note, however, that feedback-based principles do not necessarily need to be supervised in the

sense of relying on desired outputs of the sensory system directly provided by the environment, because those could be guessed internally, e.g. based on reinforcements received from the environment. To the best of our knowledge, no biologically plausible feedback-based organization principle for the visual system has been proposed so far, although the results of Ullman and co-workers (reviewed in Section 2.2) suggest that incorporating task-specific feedback from the environment is likely to play a role in shaping high-level representations. Although recent evidence from human psychophysics suggests that such a mechanism may also operate in human perceptual learning (Seitz & Watanabe, 2003) of low-level visual

features, it is not clear to what extent the organization of the visual system during the developmental time-scale depends on this type of feedback from the environment. Every approach we review in this article can be characterized as being either unsupervised or feedback-based. However, we decided to group the approaches reviewed here along the time-scale during which a change of receptive field properties occurs instead of along this unsupervised/feedback-based taxonomy, because we think of this as another useful taxonomy.

    2. The developmental time-scale

Most principled approaches to modeling the early visual system have been aimed at deriving receptive field properties. Since in this article we selected the biological perspective of adaptation at multiple time-scales as a way to group these and other approaches, we do not even attempt to provide a comprehensive review of these works, but focus on some selected and representative works only.

As already sketched in Section 1.2, the general method followed by many authors is to formulate an objective function and then to optimize the free model parameters which describe properties of the neuronal system like receptive field profiles. A very prominent approach is efficient encoding. The basic idea is to recode the raw sensory inputs into neuronal firing patterns in a way that

preserves all the information present in the input. The recoding, however, should also be efficient in the sense that the average neuronal description of the sensory input is as short, i.e. cheap, as possible, which can turn out to be useful for a biological system, if the energetic costs of producing action potentials are considered. Furthermore, short description lengths of sensory inputs allow for faster reaction times. Minimizing the average length of the neuronal description of sensory inputs could be achieved by assigning short descriptions to frequently occurring inputs and longer descriptions to only rarely occurring inputs.
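This is exactly the logic of minimum-redundancy codes from information theory. A small sketch using Huffman coding (our illustration, not from the paper; the symbol frequencies are made up) shows that frequent symbols receive short descriptions and rare symbols long ones:

```python
import heapq

def huffman_code_lengths(freqs):
    """Build a Huffman code for the given symbol frequencies and return
    the code length in bits per symbol: frequently occurring symbols get
    short descriptions, rarely occurring symbols get long ones."""
    # heap entries: (total frequency, tie-breaker, {symbol: depth})
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, tree1 = heapq.heappop(heap)
        f2, _, tree2 = heapq.heappop(heap)
        # merging two subtrees pushes every contained symbol one bit deeper
        merged = {s: d + 1 for s, d in {**tree1, **tree2}.items()}
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# 'a' occurs five times as often as 'd' and receives the shortest code.
lengths = huffman_code_lengths({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.1})
```

The resulting average code length (here 1.75 bits per symbol) sits close to the entropy of the source, which is the shortest achievable average description length.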

However, one may question whether it is a valid approach to characterize sensory systems as communication channels optimized for recoding their inputs so that all the information present in the input is preserved. For example, it is

conceivable that sensory systems perform a preprocessing that leads to a high fidelity representation of only the behaviorally relevant stimuli, which do not necessarily need to be the most frequently occurring ones. Nevertheless, applications of the efficient encoding idea to early sensory systems have been quite successful in the past, indicating that at least for early processing stages this idea might be applicable.

On a developmental time-scale, the idea of efficient encoding with efficiency corresponding to short neuronal descriptions of the sensory inputs might indeed be applicable. In the context of adaptation at the environmental time-scale, however, we suggest the extraction of invariants

as another goal of sensory systems (see Section 3.2), which turns out to be a side effect of optimizing a sensory system based on the idea of efficient encoding.

In Section 2.1 we first introduce the idea of efficient encoding in the context of Rissanen's minimum description length (MDL) framework (Rissanen, 1989), because it provides a clear mathematical framework which also allows one to express constraints on the sensory representations. This is important when modeling sensory systems, because those are always subject to biological constraints like the non-negativity and limited range of neuronal firing rates.

2.1. Probabilistic inference and efficient encoding

We now briefly introduce the Bayesian approach to modeling the visual system and Rissanen's MDL framework. We also compare both approaches with an application of regularization theory to visual information processing.

A nearly paradigmatic assumption in visual neuroscience is that the visual system's goal is to compute internal representations of the external world, but for which specific purpose these representations are computed is still an open question. It is also an important question, because some representations may be more suited for a given purpose than other candidate representations. One idea is to view the visual brain as performing probabilistic Bayesian inference. In a rather general setting, this has been formulated in



Kersten and Schrater (1999), and Barlow recently envisioned its potential usefulness for investigating neuronal representations (Barlow, 2001).

In order to perform probabilistic inference, at least two things are needed: a model for how a given state H of the environment gives rise to the two-dimensional retinal image I, and a model for the prior probability of the environment's state. Denoting with G(I|H) the probabilistic model for how the environment produces the images I and with P(H) the prior probability for the environment's state, we can apply Bayes' rule Q(H|I) = G(I|H) P(H) / P(I) to obtain the posterior Q(H|I) for the environment's state given the observed image I. It is not clear, however, whether the visual brain indeed performs this type of probabilistic inference. If it does, it is also an open question which model

G(I|H) and prior P(H) are used. Furthermore, how should neuronal activation patterns in the visual system be

interpreted in terms of the above inference rule? Do neuronal activation patterns induced by visual stimulation somehow represent the full probabilistic information contained in the posterior Q(H|I), as shown to be possible in Zemel, Dayan, and Pouget (1998), or do neuronal activation patterns represent the most likely state of the environment only? These issues are reviewed in greater detail in Pouget, Dayan, and Zemel (2000). Note that these questions directly relate to the levels (ii) and (iii) of the analysis levels proposed by David Marr.

    Considering visual information processing as Bayesian

inference is a mathematically elegant and indeed promising approach, but it might be applicable only to the computation of high-level representations much further downstream in the processing hierarchy than the primary visual cortex, e.g. the inferior temporal cortex in the ventral stream, which is thought to perform object recognition. Furthermore, it is also only applicable as long as the underlying states of the environment can be formulated as being mutually exclusive. If the neuronal activation patterns in the primary visual cortex are also thought of as representing (probabilistic) information about the state of the environment, then these states are far from being intuitive, because in the primary visual cortex the visual world is viewed through the small pinholes of cortical neurons' classical receptive fields. Although so-called contextual effects from outside the classical receptive fields of neurons in the primary visual cortex have been experimentally reported and well characterized (see also Section 4.2), the ever-changing visual stimulation due to eye movements makes it conceptually not straightforward to apply the perspective of Bayesian probabilistic inference to the primary visual cortex. It seems much more intuitive to consider activation patterns in the primary visual cortex as being representations based on which subsequent processing stages may perform probabilistic Bayesian inference.

    The key idea behind the Bayesian perspective is to consider every observed, processed and propagated quantity as a random variable characterized by its probability density or distribution function. If one is interested in single quantities instead of the full probabilistic information contained in, e.g., the posterior, the Bayesian framework dictates integrating over the probability density functions to obtain expectations or conditional expectations. Another method for extracting a single quantity from the posterior probability density is to select the state of the environment which maximizes the posterior given the observed image. This method is known as the maximum a posteriori (MAP) method, but it is much more closely related to parameter estimation than to Bayesian inference in its original sense.
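    As a toy illustration of the distinction between representing the full posterior and reading out a single quantity, the following sketch computes the posterior over a small set of hypothetical, mutually exclusive environment states, its MAP state and a conditional expectation. All probabilities are invented for illustration only.

```python
import numpy as np

# Hypothetical toy setup: three mutually exclusive environment states H
# and three observable images I. The prior P(H) and likelihood P(I|H)
# below are invented for illustration.
P_H = np.array([0.5, 0.3, 0.2])            # prior over states
P_I_given_H = np.array([[0.7, 0.2, 0.1],   # row h: P(I | H = h)
                        [0.1, 0.8, 0.1],
                        [0.3, 0.3, 0.4]])

def posterior(i):
    """Bayes' rule: Q(H | I = i) = P(I = i | H) P(H) / P(I = i)."""
    joint = P_I_given_H[:, i] * P_H
    return joint / joint.sum()

q = posterior(1)                       # the full probabilistic information
map_state = int(np.argmax(q))          # MAP: the single most likely state
mean_state = float(np.arange(3) @ q)   # a conditional expectation
print(q, map_state, mean_state)
```

Whether neurons carry something like the full vector q or only a summary such as map_state is exactly the question raised above.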

    Rissanen's MDL framework (Rissanen, 1989) focuses on the efficient description and communication of observed data instead of inferring underlying causes, which makes it a promising method for modeling neuronal representations even in early sensory systems like the primary visual cortex. Interestingly, for encoding models which utilize priors the MDL principle suggests a method formally equivalent to the MAP method (Rissanen, 1989).

    Let {M(I; w)} denote a family of (not necessarily probabilistic) models parameterized by w which describe observed images I. For a given observed image x the MDL principle dictates which model w to select in order to communicate the observed image x. The communication is accomplished by first sending the selected model w to the receiver and then the description of the image x using the already communicated model w. The MDL framework then dictates selecting the model w which minimizes the overall code length (see below for its definition), i.e. the length for encoding the model w plus the length for encoding the observed image x using the model w.

    If we consider probabilistic models, i.e. M(I; w) is the probability of observing image I given the model w, the MDL principle coincides with the maximum likelihood principle known from parameter estimation, which states to select the w maximizing the probability M(I = x; w). If the M(I; w) are probabilistic models composed of a model G(I; w) and a probability density function P(w) for the parameters, the MDL principle states to select the w which minimizes the description length L = −log G(I = x; w) − log P(w), which is equivalent to selecting the w maximizing the posterior Q(w|I = x) = G(I = x|w)P(w)/P(I = x). Within the MDL framework the negative logarithms of probabilities are used as approximations to code lengths, because a coding scheme which assigns to w a code of length −log P(w) asymptotically reaches the limit H(W) for the average code length (Cover & Thomas, 1991). Here, H denotes the entropy. In other words, in this case the MDL principle suggests a model selection formally equivalent to the MAP method from parameter estimation in a Bayesian framework. Note that now one can use the prior P(w) to express properties and constraints of the representation system instead of assumptions about the environment's state. As an example, the P(w) may express the desire to have so-called sparse representations (see Section 2.2). However,

    L. Schwabe, K. Obermayer / Neural Networks 16 (2003) 1353–1371


    the specific way in which the MDL principle should be applied to understanding neuronal representations in terms of population codes is not clear yet, and the approach reviewed in Section 4.2 may turn out to be just a starting point.

    Now we can define the efficiency of a representation as the average length of the image description. In other words, parameters which change on the developmental time-scale, like receptive fields, should be adjusted so that when selecting a representation w of an image x by minimizing L = −log G(I = x; w) − log P(w), the expected length E[L(x)] averaged over the stimulus ensemble is minimized.
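    A small numerical sketch (with made-up numbers) of the equivalence stated above: the model minimizing the two-part description length L(w) = −log G(I = x; w) − log P(w) is the same model that maximizes the posterior Q(w|I = x).

```python
import numpy as np

# Invented values of G(I = x; w) and P(w) for three candidate models w.
G = np.array([0.05, 0.40, 0.20])    # likelihood of the observed image x
P_w = np.array([0.60, 0.10, 0.30])  # prior, i.e. code lengths -log P(w)

L = -np.log2(G) - np.log2(P_w)      # two-part description length in bits
Q = G * P_w / np.sum(G * P_w)       # posterior Q(w | I = x)

w_mdl = int(np.argmin(L))           # shortest description ...
w_map = int(np.argmax(Q))           # ... selects the same w as MAP
print(w_mdl, w_map)
```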

    Finally, we would also briefly like to point out the formal similarity of this objective function with a regularization theory approach to visual information processing (Poggio, Torre, & Koch, 1985): Regularization theory uses regularizers in objective functions in order to constrain the space of possible solutions to so-called ill-posed problems. Note that the prior in MAP parameter selection and its negative logarithm in the MDL framework are formally equivalent to regularizers. If the goal of the visual system is to infer the states of the environment, then this is indeed an ill-posed problem, because the information present in the 2D activation patterns of the two retinae about the 3D world is necessarily incomplete. Hence, the regularization theory approach again allows us to formulate objective functions which need to be optimized for particular parameters of the sensory system, similar to the MAP method, but now the regularizers may correspond to prior knowledge about the environment instead of constraints on the sensory system as in the MDL framework.

    These considerations reveal a great flexibility in formulating objective functions, but they do not prevent one from explicitly stating the principles based on which the objective function has been formulated. These principles may differ for different brain areas. For example, the MDL principle might be applicable to the primary visual cortex, whereas the Bayesian perspective might be applicable to circuits in the prefrontal cortex reading out high-level representations in the inferior temporal cortex.

    2.2. Deriving receptive fields: low-level representations

    Let us now consider receptive field derivations for low-level representations. Although not stated in this way, the first example we consider (Olshausen & Field, 1996), intended for deriving receptive fields in the primary visual cortex, can be viewed as an instance of the MDL principle.

    The primary visual cortex is composed of recurrent microcircuits, and a mathematical model accounting for this recurrency is certainly desirable, but as a mathematically tractable first step one may model the neuronal responses simply as continuous-valued firing rates a_i which are used as coefficients in a linear basis function model. In Olshausen and Field (1996) the model

    G(I(r)|a) = N(Σ_i a_i f_i(r), σ²)(I(r))

    was used, where I(r) is the intensity value of the 2D image at the position r = (r₁, r₂), the firing rates a_i are coefficients, the f_i(r) are fixed basis functions, and N(μ, σ²) denotes a 1D Gaussian density with mean μ and variance σ² to account for additive noise. Defining a prior

    P(a) = Π_i (1/Z_S) exp(−S(a_i))

    for the coefficients, one can utilize the following stochastic approximation algorithm in order to optimize the model parameters: For a given small image patch I(r), selected randomly from a database of natural images, one finds the coefficients a which minimize L = −log G(I(r)|a) − log P(a) via gradient descent. Then one adapts the parameters of the basis functions f_i in order to minimize the expected length E[L] averaged over the stimulus ensemble. Sparse coding can be achieved by using so-called sparse priors, i.e. via S(a_i) = |a_i| or S(a_i) = log(1 + a_i²). These priors enforce representations where the neurons are only rarely active, which might be desirable from an energetic point of view, because firing action potentials is metabolically costly. Furthermore, the read-out of a sparse representation by subsequent processing stages might be easier compared to a non-sparse but dense representation.

    Now, one may ask which basis functions should be chosen. Olshausen and Field modeled the basis functions f_i(r) as long vectors w_i with 12 × 12 = 144 entries corresponding to the pixels in 12 × 12 image patches taken randomly from natural images. After the stochastic approximation algorithm converged, the 12 × 12 receptive fields (see Fig. 3, taken from Olshausen and Field (1996)) turned out to have shapes like Gabor functions: they are localized, oriented and bandpass, which is a property of experimentally measurable receptive fields in the primary visual cortex.
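    The alternating optimization just described can be sketched as follows. The dimensions, step sizes and the toy Gaussian "patches" are placeholders and much smaller than the 12 × 12 natural image patches of the original work, so this only illustrates the structure of the algorithm, not its results.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_basis, s2, lam = 16, 8, 0.1, 0.2    # toy sizes, not 12x12 patches
W = rng.normal(size=(n_pix, n_basis))        # columns = basis functions f_i
W /= np.linalg.norm(W, axis=0)

def description_length(x, a):
    # L(a) = -log G(x|a) - log P(a) up to constants, with the sparse
    # prior S(a_i) = log(1 + a_i^2)
    return np.sum((x - W @ a) ** 2) / (2 * s2) + lam * np.sum(np.log1p(a ** 2))

def infer(x, n_steps=300, eta=0.02):
    # inner loop: find coefficients a minimizing L(a) by gradient descent
    a = np.zeros(n_basis)
    for _ in range(n_steps):
        grad = -(W.T @ (x - W @ a)) / s2 + lam * 2 * a / (1 + a ** 2)
        a -= eta * grad
    return a

for _ in range(50):                          # outer loop over the ensemble
    x = rng.normal(size=n_pix)               # stand-in for an image patch
    a = infer(x)
    W += 0.01 * np.outer(x - W @ a, a)       # lower the expected length E[L]
    W /= np.linalg.norm(W, axis=0)

x_demo = rng.normal(size=n_pix)
a_demo = infer(x_demo)                       # a short description of x_demo
```

Run on actual whitened natural image patches instead of Gaussian noise, this kind of procedure is what produces the Gabor-like basis functions discussed in the text.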

    The most striking aspect of this work is that the resulting receptive fields share properties with experimentally measured receptive fields, which is not achieved when changing the sparse to a Gaussian prior. This points to an important role for this constraint. Recent work by van Vreeswijk (2000b), however, suggests that receptive fields similar to Gabor functions may emerge from an optimization procedure which maximizes the mutual information (see also Section 3.1) between patches taken from natural images and the neuronal responses even without enforcing sparseness. The striking difference between both approaches is that van Vreeswijk modeled the neuronal responses as Poisson processes instead of continuous firing rates. A deeper theoretical understanding of this difference still needs to be achieved. The approach of modeling neuronal responses as Poisson processes instead of continuous firing


    rates is a promising direction for future investigations, because this model accounts for both the noisiness and the discrete nature of neuronal spike responses. Furthermore, in van Vreeswijk (2001a) van Vreeswijk has shown that sparseness of neuronal representations modeled via renewal processes can be the natural outcome of a mutual information maximization approach instead of a constraint introduced ad hoc into an objective function.

    Of course, clarifying the role of sparseness for neuronal representations is certainly also an experimental issue. Simple extracellular single-cell recordings of neurons in the primary visual cortex stimulated with natural image sequences can reveal the sparsity of individual neurons' responses ("life-time sparseness"). Another idea behind sparse coding, however, is that for a given image only a few neurons in a whole population are active ("population sparseness"), which could be measured only via simultaneous recording of multiple neurons during stimulation with different images taken from the natural environment of the animal. These experimental methods are still being developed, but results may become available in the near future.

    Efficient and sparse coding are certainly not the only principled approaches to deriving receptive fields in the primary visual cortex. Another principle which has been proposed is temporal coherence, which is based on the assumption that objects in the environment change on a much slower time-scale than the sensory signals themselves. This idea is appealing because, despite the resulting learning rules being unsupervised, they capture an essential property of an animal's environment without resorting to a feedback-based paradigm. One can recover information about the objects in the environment by extracting the slowly changing features of the raw sensory signal, which leads to the algorithm of slow feature analysis (Wiskott & Sejnowski, 2002). In summary, given sensory signals x(t) = (x₁(t), x₂(t), …, x_N(t)) and a set F of real-valued basis functions, the goal is to find a slowly varying function g(x(t)) = (g₁(x(t)), g₂(x(t)), …, g_M(x(t))), which can be achieved via minimizing the time-averaged squared temporal derivative ⟨(dg_j/dt)²⟩ as the objective function (see Wiskott and Sejnowski (2002) and Berkes and Wiskott (2002) for further details like constraints). Interestingly, the resulting receptive fields resemble properties of complex cells in the primary visual cortex, like optimal stimuli similar to Gabor functions and phase invariance (Berkes & Wiskott, 2002). Although the emergence of these properties depends on the particular subset of natural image sequences selected to optimize the objective function, these results suggest temporal coherence as a plausible objective even for the early visual system. Furthermore, spatial coherence as an objective has also been used successfully in a principled approach which led to model neurons discovering depth in random dot stereograms of curved surfaces (Becker & Hinton, 1992), indicating that intuitive principles like temporal and spatial coherence are promising candidates to be placed beside principles relying solely on issues related to efficient encoding.
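    For linear functions g_j the slowness objective above reduces to a generalized eigenvalue problem. The following sketch solves it for an invented two-channel signal mixing a slow and a fast sinusoid; all parameters are illustrative.

```python
import numpy as np

# Two "sensory signals" that linearly mix a slow and a fast source.
t = np.linspace(0, 4 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(29 * t)
X = np.stack([slow + 0.5 * fast, 0.5 * slow - fast], axis=1)
X -= X.mean(axis=0)

C = X.T @ X / len(X)                    # covariance: unit-variance constraint
dX = np.diff(X, axis=0)
D = dX.T @ dX / len(dX)                 # <(dg/dt)^2> for linear g = w.x

# minimize w'Dw subject to w'Cw = 1: generalized eigenproblem D w = l C w;
# the smallest eigenvalue gives the slowest feature
evals, evecs = np.linalg.eig(np.linalg.solve(C, D))
w = np.real(evecs[:, np.argmin(np.real(evals))])
y = X @ w                               # the slowest linear feature

corr = abs(np.corrcoef(y, slow)[0, 1])
print(corr)                             # close to 1: the slow source is found
```

The nonlinear case used for the complex-cell results simply applies the same machinery to a fixed nonlinear expansion of x(t).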

    2.3. Deriving receptive fields: high-level representations

    So far we have considered the unsupervised derivation of receptive fields for low-level vision only, but one may speculate that simple principles may also explain receptive

    Fig. 3. Learned basis functions from natural images with sparseness prior. The image shows the result of optimizing linear receptive fields in order to minimize the average description length of 12 × 12 natural image patches (see text for details).


    fields in higher visual areas, like the inferior temporal cortex, which is thought to perform object recognition. So far, however, no explicit probabilistic model has been published that, when optimized according to the idea of efficient coding, produces model neurons that are selective for objects or parts of objects. It may come as a surprise, but the simple non-negative matrix factorization (NMF) algorithm (Lee & Seung, 1999) produces model neurons selective for parts of more complex objects. Physiological evidence supporting the idea that complex objects are represented in terms of their parts is rare, and this issue is controversial. Nevertheless, this idea is plausible.

    The basic idea of the NMF algorithm is to produce a representation of an ensemble of images via linearly combined basis images. The image ensemble is described as an n × m matrix V, where each of the m columns contains n non-negative image pixels. Then, the algorithm finds an approximate factorization V ≈ WH, or

    (V)_{im} ≈ (WH)_{im} = Σ_{a=1}^{r} W_{ia} H_{am},

    of the image ensemble, where the r columns of W are the basis images, and each of the m columns in H serves as the representation of the corresponding image column in V. The factorization of V, however, is subject to the constraint of having only non-negative entries in H and W, which leads

    to the receptive fields shown in Fig. 4(a) (taken from Lee & Seung, 1999). Visual inspection of the two enlarged basis images suggests that the algorithm indeed performs a decomposition into parts characteristic of the images in V. In this case, the parts correspond to eyes and a mouth. Relaxing the constraint of non-negativity, or imposing constraints corresponding to a simple quantization or to the well-known PCA algorithm, leads to receptive fields which do not correspond to object parts when inspected visually. At first, the constraint of non-negativity seems to be plausible when related to neuronal firing rates, because those cannot be negative, but receptive fields in the brain often come in opponent pairs, where the positive and negative ranges of a variable could be represented by the positive firing rates of two neurons. Nevertheless, the NMF algorithm may turn out to be an important starting point for learning how to decompose images of objects.
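    The factorization itself can be computed with the multiplicative update rules of Lee and Seung. The sketch below uses random non-negative toy data rather than the face-image ensemble, so the factors will not look like object parts; it only demonstrates the mechanics, the preserved non-negativity and the decrease of the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 20, 30, 5                    # toy sizes: V is n x m, rank r
V = rng.random((n, m))                 # non-negative "image" columns
W = rng.random((n, r))                 # basis images
H = rng.random((r, m))                 # encodings, one column per image

err_before = np.linalg.norm(V - W @ H)
for _ in range(200):
    # multiplicative updates: ratios of non-negative terms, so W and H
    # stay non-negative throughout (small constant avoids division by 0)
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
err_after = np.linalg.norm(V - W @ H)
print(err_before, err_after)           # reconstruction error has dropped
```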

    Note also that the resulting parts produced by the NMF algorithm reflect properties of the image ensemble used during training. If face images are decomposed into parts, it may not be a surprise to find the parts being noses, eyes and mouths. It might be interesting to test whether the NMF algorithm also decomposes an image ensemble of face images and images of artifacts into parts which are compatible with the intuitive notion of artifact parts. Furthermore, one may speculate that a description of object sets in terms of their parts may turn out to be efficient

    Fig. 4. Derived receptive fields for high-level representations. (a) Receptive fields resembling parts of objects produced by the NMF algorithm. Shown are all basis images and two enlarged ones. (b) Receptive fields produced by selecting the image fragments informative for a classification task (left), set up to distinguish faces from cars, and the dependence of the corresponding mutual information objective function on the size of the extracted features (right), with intermediate sizes being optimal.


    when transformed with g(x), as shown in Fig. 5(d). The probability density for the outputs caused by the Gaussian input distribution with mean μ₁ = 0 is much more uniform than the one produced by the Gaussian with μ₂ = 0.5, which can be understood by considering the transfer function in Fig. 5(b), which has its linear range matched to the region of high input density. For most inputs produced by the Gaussian with μ₂ = 0.5 this transfer function works in its saturation regime, with high output firing rates (see Fig. 5(d), dotted line) occurring much more frequently than low output firing rates. Plotting the output entropy for both input densities as a function of the transfer function shift Δx reveals that the maximum is always attained at the corresponding mean of the Gaussian, where most of the probabilistic mass is centered.

    Considering a particular neuronal system with, e.g., a sigmoidal transfer function, the Infomax principle predicts that the transfer function should change if the input distribution changes. Here we have considered only the shift of the transfer function, but neuronal transfer functions may also change their shape, slope, etc.
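    The illustration of Fig. 5 is easy to reproduce numerically. The sketch below uses the same parameter values as the figure (β = 5, σ = 0.3) together with a crude histogram estimate of the output entropy; it confirms that shifting the sigmoid's linear range onto the input mean maximizes H(Y).

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma = 5.0, 0.3

def output_entropy(mu, dx, n=100_000, bins=50):
    """Histogram estimate of H(Y) for y = g(x), x ~ N(mu, sigma^2)."""
    x = rng.normal(mu, sigma, size=n)
    y = 1.0 / (1.0 + np.exp(-beta * (x - dx)))   # sigmoid g(x) with shift dx
    p, _ = np.histogram(y, bins=bins, range=(0.0, 1.0))
    p = p[p > 0] / n
    return -np.sum(p * np.log(p))

shifts = np.linspace(-1.0, 1.0, 21)
H = [output_entropy(0.0, dx) for dx in shifts]
best_shift = float(shifts[int(np.argmax(H))])
print(best_shift)   # the maximum sits at (or next to) the input mean 0
```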

    3.2. Contrast adaptation

    Let us now consider an application of the Infomax principle to contrast adaptation of simple cells in the primary visual cortex which links all three analysis levels proposed by David Marr: the computational theory, the algorithmic level and the underlying neuronal mechanism.

    Fig. 6(a) shows the experimental protocol used in contrast adaptation experiments. First, the cell is adapted to a high (or low) contrast adapting stimulus, which in this case is a slowly moving grating. Then, the contrast response function is measured using short presentations of a test stimulus with varying contrast. Fig. 6(b) shows experimental results for intracellular recordings of a simple cell in cat primary visual cortex taken from Carandini and Ferster (1997). From left to right, Fig. 6(b) shows how the contrast response functions for the stimulus-dependent first harmonic of the membrane potential (F1 component), the mean membrane potential (DC component) and the first harmonic of the firing rate adapt to the contrast of the stimulus shown during the adaptation phase. The F1 component of

    Fig. 5. (a) The Infomax performance criterion is the mutual information between the stimuli X and the neuronal representation Y and involves the statistics P(X) of the environment and the probabilistic black-box description P(Y|X) of the representation system itself. (b) The neuronal transfer function g(x) for β = 5 and Δx = 0. (c) Two Gaussian densities for the input X with σ = 0.3 and μ₁ = 0 (solid line), μ₂ = 0.5 (dotted line). (d) Densities for the output y = g(x) resulting from transforming the corresponding input densities. (e) Output entropies as a function of the shift Δx of the transfer function for the two input densities.


    the membrane potential does not show adaptation (left), but its DC component does (middle). Furthermore, the contrast response function of the firing rate (right) shifts towards higher contrast values following a prolonged presentation of high contrast stimuli, and shifts in the opposite direction if the preceding adapting stimulus had a low contrast.

    During the last decade several ideas emerged to explain the underlying mechanism, but none of them was fully consistent with the available experimental data. Which adaptation mechanism could account for an adaptation of the mean membrane potential and the F1 component of the firing rate, but without predicting an adaptation of the membrane potential's F1 component? On the one hand, a group of studies suggested that plasticity of excitatory synaptic weights is responsible for contrast adaptation. On the other hand, an adaptation like the rescaling of synaptic weights would also predict an adaptation of the membrane potential's F1 component. In order to resolve this contradiction, we proposed a model which involves changing the dynamic properties of the afferent geniculocortical synapses from the LGN via modulating the transmitter release probability as the key mechanism (Adorjan, Piepenbrock, & Obermayer, 1999b). Thus, we modeled the afferent synapses not as static, but as dynamic synaptic weights, with their value being dependent both on the recent history of stimulation and the transmitter release probability. In other words, changing the transmitter release probability changes the dynamic properties of the synaptic transmission.

    Fig. 6(d) shows a sketch of an orientation column model we set up in order to explain contrast adaptation. In this model, the dynamic synapses are located at the feedforward connections. Fig. 6(c) shows simulation results of the model with high (solid lines, i.e. low contrast) and low (dotted lines, i.e. high contrast) synaptic release probability. The simulation results agree qualitatively well with the experimental contrast response functions measured after adaptation to stimuli with high and low contrast. This suggests the mechanism of changing the synaptic release probability as a candidate mechanism underlying contrast adaptation. Note that this mechanism also accounts for the indeed puzzling experimental observation that although the F1 component of the firing rate and the mean membrane potential adapt to the contrast of the stimulus, the F1 component of the membrane potential does not show adaptation.

    Now, one may ask how the synapses adapt their transmitter release probability. How can they "know" whether the adapting stimulus had a high or a low contrast? In Adorjan et al. (1999a) we derived an online adaptation rule for the synaptic release probability based on the Infomax objective function: Let MI(X; Y) = H(Y) − H(Y|X) denote the mutual information between the input firing rates X and the output firing rates Y of a simple cell in

    Fig. 6. Contrast adaptation of simple cells in the primary visual cortex. (a) The experimental protocol to probe contrast adaptation. (b) Experimental results for contrast adaptation in cat primary visual cortex. Shown are contrast response functions for the F1 component of the membrane potential (left), the DC component of the membrane potential (middle) and the F1 component of the firing rate after adaptation to a low contrast (solid circles) and high contrast (empty circles) stimulus. (c) Simulation results of a modeled orientation column with an online adaptation rule dictating how to change the release probability of the afferent synapses, with the solid (dotted) lines corresponding to contrast response functions after adaptation to a low (high) contrast stimulus. (d) Illustration of the orientation column model.


    primary visual cortex. The input firing rates X are transformed into output firing rates Y via a transfer function y = g(x; p) similar to the one used for our illustration in Section 3.1. Similar to Δx in Section 3.1, the transmitter release probability p is a parameter of this transfer function. Now, the goal is to maximize H(Y) with respect to p, since we assumed the transfer function to be deterministic, which implies that the mutual information as a function of the parameter p has its maximum where the output entropy H(Y) is maximal. Noting that

    H(Y) = −E[log p(y = g(x; p))]_{p(x)} = −E[log(p(x)/|∂g(x; p)/∂x|)]_{p(x)} = E[log|∂g(x; p)/∂x|]_{p(x)} − E[log p(x)]_{p(x)},

    we can utilize a stochastic approximation procedure, with the gradient of H(Y) with respect to the release probability p being sampled in an online manner, leading to an adaptation rule

    τ_adapt (dp/dt) = F(p, x),

    which dictates how to change the transmitter release probability p as a function of the actual presynaptic firing rate x (see Adorjan, Piepenbrock and Obermayer (1999b) for a definition of F). We set the time constant to τ_adapt = 7 s in

    order to account for the experimentally observed time-scale of contrast adaptation. The stochastic gradient approximation used in our approach is only one possible optimization method, but since it is an online learning rule, it is possible to implement it biologically. It is conceivable, however, that natural systems employ different optimization methods in order to optimize the quantity H(Y).
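    An online rule of this general kind is easy to sketch. The code below is not the F of Adorjan et al.: instead of the release probability p it adapts the shift parameter Δx of Section 3.1, for which the per-sample gradient of log|∂g/∂x| has the simple closed form β(2g(x) − 1); all numbers are illustrative. Since H(Y) = E[log|∂g/∂x|] + H(X) for a deterministic invertible transfer function, ascending this gradient sample by sample maximizes the output entropy online.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, eta = 5.0, 0.01
dx = -1.0                                  # transfer function starts misaligned

for step in range(20_000):
    mu = 0.0 if step < 10_000 else 0.5     # the input statistics switch half-way
    x = rng.normal(mu, 0.3)                # one presynaptic sample
    g = 1.0 / (1.0 + np.exp(-beta * (x - dx)))
    # per-sample gradient of log|dg/dx| w.r.t. the shift dx is beta*(2g - 1)
    dx += eta * beta * (2.0 * g - 1.0)

print(dx)   # the transfer function has tracked the new input mean of 0.5
```

The same logic, with p in place of Δx and a time constant τ_adapt, yields a rule of the form used in the model.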

    The results in Fig. 6(c) have been generated by simulating the experimental protocol for contrast adaptation, but with the above adaptation rule governing the adaptation of the release probability p. In the context of the Infomax principle, the shift of the firing rate function towards lower contrast values can be understood as a dynamic readjustment of the firing rate function's dynamic range to those contrast values which occurred in the recent past. Compare this with Fig. 5(b) and (c), which illustrate that shifting the linear part of a sigmoidal transfer function towards the mean value of the Gaussian input distribution maximizes the output entropy H(Y) by equalizing the output firing rates.

    These results suggest that contrast adaptation might indeed be understood as a consequence of an ongoing optimization of neuronal response functions to changing contrast statistics in order to ensure an accurate representation of all possible contrast values. Note, however, that for a read-out to estimate the absolute value of the stimulus contrast represented by the output firing rate of neurons in the primary visual cortex, it needs to adapt as well.

    Therefore, another interpretation of contrast adaptation, applicable to a read-out which does not adapt, is as a low-level preprocessing in order to ensure contrast invariance. In other words, a non-adapting read-out seeing the visual world via the firing rates of adapting cells in the primary visual cortex could focus on pattern vision independent of the average contrast values in the environment.

    With this exercise in modeling contrast adaptation we combined all three analysis levels: as the functional role of contrast adaptation in primary visual cortex we proposed the ongoing adaptation of the neuronal representation in order to ensure the accurate representation of absolute contrast values and/or the computation of contrast invariance. At the algorithmic level we assumed a stochastic sampling of stimulus contrasts in order to adapt the neuronal system's parameters as dictated by the Infomax objective function. At the level of the neuronal realization we assumed that contrast adaptation is mediated by adapting the transmitter release probability of afferent synapses.

    4. The perceptual time-scale

    Fast neuronal dynamics happen on the time-scale of a single stimulus presentation (the perceptual time-scale), which prevents considering them as an adaptation to changing statistics of the input (the environmental time-scale). Let us now consider two examples which assign a functional role to these fast neuronal dynamics.

    In Section 4.1 we propose an extension of the Infomax principle which takes into account a possibly time-varying model to encode the signals and assumes a continuous read-out, and in Section 4.2 we review an efficient encoding approach for explaining contextual effects in the primary visual cortex which suggests the view of fast neuronal dynamics as a search process for an efficient description of the sensory input.

    4.1. Rapid adaptation to internal states

    Recently, Bialek and coworkers (Brenner, Bialek, & de Ruyter van Steveninck, 2000) provided direct experimental evidence for the Infomax principle by showing that the input/output relation of the motion-sensitive H1 neuron in the fly visual system indeed adapts to the changing statistics of a selected stimulus ensemble, namely simulated visual velocity signals. They also showed that the time-scale of this adaptation is close to the physical limit imposed by statistical sampling. Therefore, it seems as if even extremely rapid adaptation could be interpreted as an optimization with respect to changing input statistics. But does this also apply to the mammalian visual system with its experimentally observed transient dynamics to flashed but otherwise static stimuli?

    We argued that the functional interpretation offeredby the Infomax principle is only applicable as long as


    the time-scale of adaptation is slow compared to the time-scale at which the statistics of the stimulus ensemble changes. Only in this case can an organism estimate changes of stimulus statistics and adapt its sensory system. For the fly visual system this interpretation may indeed be applicable, because the statistics of the input to the H1 neuron, i.e. the variance of the velocity signal, may change abruptly, corresponding to a transition from cruising flight to acrobatic or chasing behavior. In the mammalian visual system, however, even static environments are scanned with saccadic eye movements, and the neuronal responses in primary visual cortex are time-dependent on the time-scale of a typical fixation period (≈300 ms). Interpreting these fast dynamics as an adaptation in the sense of the Infomax principle is not straightforward, because the adaptation happens during a single stimulus presentation and can therefore not be viewed as an adaptation to changing statistics of the input, simply because statistics cannot be sampled based on a single observation.

    We have shown (Adorjan, Schwabe, Wenning, & Obermayer, 2002) that these dynamics can still be understood with information-theoretic concepts by formulating an objective function, and we suggested that the rapid adaptation mechanisms which underlie the fast and transient dynamics in sensory systems could be the result of optimizing this objective function. In particular, we extended the Infomax principle by proposing the maximization of the mutual information

    MI(S; R_t) = H(S) − E[H(S|R_t = r_t)]_{P(r_t; β_t)}

    between the stimuli S and the neuronal representation R_t at time t after stimulus onset as the objective function for the rapid adaptation mechanisms operating during the presentation of a single stimulus. In the above equation, H and E denote the entropy and expectation of a random variable, and the expectation is taken with respect to P(r_t; β_t) = ∫ ds P(s) P(r_t|s; β_t), with s being the stimulus variable and P(s) its assumed distribution. The time-dependent parameter β_t characterizes the state of the rapid adaptation mechanisms and is assumed here to be independent of the stimulus.

    The first term in the above equation reflects the prior uncertainty about the stimulus before having observed any neuronal response. The second term is the so-called noise entropy and corresponds to the average reduction of uncertainty about the actual stimulus after having observed the neuronal representation r_t at time t. The representations, however, depend on the parameterization of the sensory system, which may be time-dependent as indicated by the β_t. Thus, our extension of the Infomax principle is twofold: First, we assume a continuous read-out of the neuronal representation. Second, we allow the parameterization of the sensory system to change on the time-scale of a single stimulus presentation, which might become necessary when assuming a continuous read-out.
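    A toy numerical version of this objective (with an invented binary stimulus set and invented rates) shows how MI(S; R_t) grows towards H(S) as the accumulated spike count R_t becomes more informative:

```python
import numpy as np
from math import exp, log, lgamma

# Two equiprobable stimuli drive Poisson spiking; R_t is the spike count
# accumulated up to time t, so R_t ~ Poisson(rate[s] * t). The rates are
# illustrative placeholders.
rate = {0: 2.0, 1: 8.0}
P_s = {0: 0.5, 1: 0.5}

def pois(k, lam):
    # Poisson pmf computed in log space to stay numerically safe
    return exp(k * log(lam) - lam - lgamma(k + 1))

def mi_bits(t, kmax=100):
    """MI(S; R_t) = H(S) - E[H(S | R_t)], evaluated by direct summation."""
    mi = 0.0
    for k in range(kmax):
        p_k = sum(P_s[s] * pois(k, rate[s] * t) for s in P_s)
        for s in P_s:
            joint = P_s[s] * pois(k, rate[s] * t)
            if joint > 0.0 and p_k > 0.0:
                mi += joint * np.log2(joint / (p_k * P_s[s]))
    return mi

m_early, m_mid, m_late = mi_bits(0.05), mi_bits(0.5), mi_bits(2.0)
print(m_early, m_mid, m_late)   # grows towards H(S) = 1 bit
```

The question addressed in the text is whether changing β_t during the response can make this growth faster than any static parameterization allows.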

    Let us consider an example in the context of encoding visual information by a local orientation hypercolumn in the primary visual cortex which is composed of recurrently connected orientation columns. Fig. 7(a) shows how a static stimulus s induces a spatio-temporal response pattern (N₁, N₂, …, N_t) in the encoding population, which is relayed to a hypothesized downstream population, i.e. the read-out of the encoded information. Each N_i corresponds to the number of spikes observed in a short time-interval beginning at the ith time step, and the representation R_t denotes the accumulated spike count up to time step t. Now, even if the encoding population does not receive any feedback from the read-out, it is a valid question to ask whether the parameter β_t characterizing the instantaneous input/output relation of the encoding population should be adapted or not.

    We investigated this question by setting up an effective description of a monocular patch in a local cortical orientation hypercolumn in the primary visual cortex, with the real-valued inputs representing the intensities of local features, e.g. oriented edges. Given the effective network transfer function

    λ_{τ,i}(s) = τ exp(β_t s_i) / Σ_{j=1}^{L} exp(β_t s_j)

with β_t being the only free, but possibly time-dependent, parameter, each λ_{τ,i}(s) denotes the expected number of spikes the ith neuron fires in the tth time-interval of duration τ in response to the stimulus s. Fig. 7(b) illustrates this model. It is a so-called divisive normalization model, which is not intended to account for all possible biophysical details of the microcircuits in primary visual cortex, but mimics the experimentally observed phenomenon of recurrent competition between orientation columns. Presenting a stimulus that activates multiple orientation columns, like crossed bars, the orientation columns with higher activity can be viewed as suppressing orientation columns with lower activation, leading to a competition for neuronal activation. The parameter β parameterizes the level of this competition, with high values corresponding to strong and low values corresponding to weak competition. The outputs are modeled via Poisson processes with their time-varying rates given by the λ_{τ,i}. Assuming further a non-negative and sparse distribution P(s) for the stimuli s, we performed a numerical greedy optimization to obtain the optimal parameterization for the free network parameters β_t.

    We found that only a dynamic parameterization performs well, as quantified by our objective function. Fig. 7(c) illustrates this result. During the initial phase of the response higher, and at later times lower, values of β_t are preferred. Let us compare the dynamic parameterization with static strategies. Fig. 7(c) (dashed lines) shows static parameterizations for high and low values of β_t, and Fig. 7(d) shows

    L. Schwabe, K. Obermayer / Neural Networks 16 (2003) 1353–1371


    the corresponding mutual information. For the low value the increase is slow, whereas for the high value it is faster. On the other hand, the mutual information obtained with the high value saturates and will eventually be outperformed by the low value. In other words, a spike-counting read-out mechanism obtains on average more information about the stimulus if the sensory system is dynamic.

    Intuitively, for our particular model of a network transfer function with adjustable competition this can be understood as follows: Initially, the level of output noise is high, because only a few spikes are available for representation. Strong competition, however, biases the representation towards the most salient input, which is a sensible strategy, because our assumed input distribution implies that typically only a single input line is strongly active and needs to be represented. Later, when more spikes are available for representation and averaging over time allows the signal to be separated from the noise, reduced competition allows the signaling of possibly multiple active inputs and their faithful representation. Thus, the adaptation strategy with the dynamic parameterization first transmits information about the salient features of the input, then about the less salient details.
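The comparison between dynamic and static parameterizations can be probed with a small Monte Carlo sketch (ours, not the original simulation code; the stimulus ensemble, rate scale, and β schedules are arbitrary toy choices). Spike counts are accumulated over a sequence of intervals, and a plug-in estimate of the mutual information between the stimulus and the accumulated counts is computed from samples:

```python
import math
import random

def softmax_rates(s, beta, tau=0.2):
    z = [math.exp(beta * si) for si in s]
    norm = sum(z)
    return [tau * zi / norm for zi in z]

def sample_poisson(lam, rng):
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def estimate_mi(stimuli, beta_schedule, n_trials=3000, seed=1):
    """Plug-in Monte Carlo estimate of MI(S; R_t), where R_t is the
    vector of spike counts accumulated over the whole beta schedule."""
    rng = random.Random(seed)
    joint = {}
    for _ in range(n_trials):
        s_idx = rng.randrange(len(stimuli))
        counts = [0] * len(stimuli[0])
        for beta in beta_schedule:  # one interval per schedule entry
            for i, lam in enumerate(softmax_rates(stimuli[s_idx], beta)):
                counts[i] += sample_poisson(lam, rng)
        key = (s_idx, tuple(counts))
        joint[key] = joint.get(key, 0) + 1
    # Empirical marginals and plug-in mutual information (in bits).
    p_s, p_r = {}, {}
    for (s_idx, r), c in joint.items():
        p_s[s_idx] = p_s.get(s_idx, 0) + c
        p_r[r] = p_r.get(r, 0) + c
    mi = 0.0
    for (s_idx, r), c in joint.items():
        mi += (c / n_trials) * math.log2(c * n_trials / (p_s[s_idx] * p_r[r]))
    return mi

# Sparse, non-negative toy ensemble: mostly single active input lines.
stimuli = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.8, 0.8, 0.0]]
dynamic = estimate_mi(stimuli, beta_schedule=[8.0] * 3 + [1.0] * 7)
static_high = estimate_mi(stimuli, beta_schedule=[8.0] * 10)
static_low = estimate_mi(stimuli, beta_schedule=[1.0] * 10)
```

Note that the plug-in estimator is biased upward for small sample sizes; a quantitative comparison of the three schedules would require larger trial counts or bias correction, so the sketch is meant to show the bookkeeping, not to reproduce Fig. 7(d).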

    We also applied this idea to a much more detailed biophysical model of an orientation hypercolumn in the primary visual cortex and found that the mechanisms of spike-frequency adaptation of cortical pyramidal neurons (Adorján et al., 2002; Schwabe, Adorján, & Obermayer, 2001) or fast synaptic depression (Adorján et al., 2000) at the recurrent excitatory synapses may serve as the underlying substrate to realize the dynamic competition suggested by our abstract model as a promising strategy in terms of the extended Infomax principle.

    4.2. Contextual effects

    The idea of efficient encoding has also been applied using a hierarchical probabilistic model intended to explain contextual effects in the primary visual cortex. These so-called contextual effects refer to the phenomenon that

    Fig. 7. The idea of dynamic coding. (a) A static sensory stimulus S is applied and induces noisy neuronal responses, modeled as a Poisson process with time-varying mean firing rate λ_t. The instantaneous spike responses N_t are observed by a spike-counting read-out and summarized into a summed spike count R_t. (b) Illustration of the hypercolumn model with L inputs and outputs and time-dependent recurrent competition parameterized by β_t. (c) Predicted optimal dynamics of the competition β_t (solid line) and sub-optimal static competition (dotted lines). (d) Resulting mutual information between the input S and the accumulated spike count R_t as a function of the window used for counting the spikes, for different dynamics of the competition parameter β_t.



    although stimuli outside the classical receptive field (the surround) do not by themselves cause the neurons to fire, they modulate the responses to stimulation within the neuron's classical receptive field (the center). These contextual effects have been found to depend on the contrast and orientation of the stimuli within and outside the classical receptive field. The phenomenon reported most frequently by different groups is that if the stimulus orientations of the center and surround are equal, then the responses are suppressed compared to when the center stimulus is shown alone. Fig. 8(b) (solid line) shows the response of a complex cell in layer 2/3 of cat primary visual cortex as a function of the length of a bar used for stimulation. As the bar length increases, the center and surround are stimulated with the same orientation and the response of the center becomes suppressed. Although the detailed neuronal circuitry and the functional role of these effects are still under investigation, the reduced suppression when inactivating layer 6 (Fig. 8(b), dotted line), which is part of the feedback loop connecting the primary and secondary visual cortex, suggests that top-down feedback may contribute to these contextual effects, because blocking layer 6 may also impair the re-entry of feedback signals.

    Fig. 8. Contextual effects predicted via efficient encoding with a hierarchical model. (a) Model architecture. (b) End-stopping complex cell responses in layer 2/3 of cat visual cortex when inactivating layer 6 (dotted line) and for the control case (solid line). (c) Model predictions for the case of active (solid line) and inactive (dotted line) feedback from the next higher level. (d) Predicted responses for the case of stimulating center and surround with different (solid line) and the same (dotted line) orientations. Panels (b)–(d) taken from Rao and Ballard (1999).



    Rao and Ballard (1999) set up a hierarchical probabilistic model and applied the idea of efficient encoding in order to derive the receptive fields of, and connectivity between, two layers of neurons corresponding to the primary and secondary visual cortex. Their model architecture is shown in Fig. 8(a). The raw visual input to layer 1 (the modeled primary visual cortex) is provided by layer 0 (the raw retinal image). The outputs of layer 1 serve as the inputs to layer 2 (the modeled secondary visual cortex), which projects back to layer 1. They propose that the firing rates a^n at each level n in the processing hierarchy serve as the representation of the feedforward input I^n to this layer. The representations a^n in turn serve as the feedforward input I^{n+1} to the next level n + 1.

    In contrast to the sparse coding model reviewed in Section 2.2, their hierarchical model G(I^n | a^n) has the form

    G(I^n | a^n) = N(I^n; μ = f(U^n a^n), Σ = σ_n² 1)

where the feedforward input I^n is a long column vector, the entries in U^n serve as the weights for the feedback projections from layer n to layer n − 1, σ_n² is a constant variance for an assumed additive noise, and f is a sigmoidal neuronal transfer function. The effect of feedback from layer n + 1 to layer n is described by a model

    P(a^n | a^{n+1}) = N(a^n; μ = f(U^{n+1} a^{n+1}), Σ = σ_n² 1)

    which in this form can be viewed as a prior for the activations a^n propagated back from level n + 1 to layer n. Utilizing the feedback projections as a way to communicate information back to the lower representation level might be beneficial, because neurons at higher levels have larger receptive fields and therefore could help to disambiguate the representation at lower levels, leading to the experimentally observed contextual effects.
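The two conditional densities above define a top-down generative sweep: the higher-level activations set the mean of the lower-level activations, which in turn set the mean of the input. A minimal sketch (our own illustration; the weights and noise levels are arbitrary, and f = tanh stands in for the generic sigmoid):

```python
import math
import random

def generate_layer(u, a_above, sigma, rng):
    """Draw x ~ N(x; mu = f(U a_above), Sigma = sigma^2 * 1), with f = tanh."""
    mu = [math.tanh(sum(row[j] * a_above[j] for j in range(len(a_above))))
          for row in u]
    return [m + rng.gauss(0.0, sigma) for m in mu]

rng = random.Random(0)
U2 = [[0.7, -0.2], [0.1, 0.9], [0.4, 0.4]]    # feedback weights, layer 2 -> 1
U1 = [[0.5, -0.3, 0.2], [0.4, 0.2, -0.1],
      [-0.2, 0.6, 0.3], [0.3, 0.5, -0.4]]     # feedback weights, layer 1 -> 0
a2 = [1.0, -0.5]                              # top-level causes
a1 = generate_layer(U2, a2, sigma=0.1, rng=rng)     # sample from P(a^1 | a^2)
image = generate_layer(U1, a1, sigma=0.1, rng=rng)  # sample from G(I | a^1)
```

Inference then runs in the opposite direction: given an image, the activations at each level are adjusted so that the top-down predictions match the inputs.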

    Rao and Ballard set up the objective function

    E = −log G(I^n | a^n) − log P(a^n | a^{n+1}) + g(a) + h(U)

with g(a) and h(U) expressing constraints for the activations a and the synaptic weights U, like the sparseness constraint utilized in Section 2.2. Randomly selecting patches of natural images, then performing a search for the best representation a for nearly fixed synaptic weights U, and only slowly adapting these weights during this random sampling, they found receptive fields, in terms of the columns of U^T, which at level 1 look like Gabor functions as long as g(a) enforces sparseness. The columns of U^T can be interpreted as receptive fields, because in the gradient

    da/dt ∝ −dE/da = (k_1/σ_n²) U^T (∂f/∂x)^T (I − f(Ua)) + (k_1/σ_n²) (a^td − a) − (k_1/2) g′(a)

used for searching representations a, with k_1 being a constant, x = Ua the argument of the transfer function, a^td the top-down prediction propagated back from the next higher level, and g′ the derivative of g with respect to a, the term U^T acts as a feedforward weight matrix. With this interpretation, the ith row of U^T represents the synaptic weights of a single neuron with activity a_i. They also found that when

    inactivating the feedback term during the gradient descent search for a good representation a, their model produces responses similar to the ones observed experimentally when inactivating layer 6, as shown in Fig. 8(c).
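The gradient dynamics can be sketched as a simple descent loop (our own illustration with f = tanh, g(a) = Σ log(1 + a_j²) as the sparseness term, and arbitrary toy weights; it is not the original implementation):

```python
import math

def energy(I, U, a, a_td, s2, alpha):
    """E(a) = |I - f(Ua)|^2/(2 s2) + |a_td - a|^2/(2 s2) + alpha * sum log(1 + a_j^2)."""
    pred = [math.tanh(sum(row[j] * a[j] for j in range(len(a)))) for row in U]
    e_rec = sum((Ii - pi) ** 2 for Ii, pi in zip(I, pred)) / (2 * s2)
    e_td = sum((tj - aj) ** 2 for tj, aj in zip(a_td, a)) / (2 * s2)
    return e_rec + e_td + alpha * sum(math.log(1 + aj ** 2) for aj in a)

def infer(I, U, a_td, s2=1.0, alpha=0.05, lr=0.1, n_steps=300):
    """Gradient descent on the activations a: da/dt = -dE/da."""
    n_a = len(U[0])
    a = [0.0] * n_a
    for _ in range(n_steps):
        x = [sum(row[j] * a[j] for j in range(n_a)) for row in U]
        pred = [math.tanh(xi) for xi in x]
        err = [Ii - pi for Ii, pi in zip(I, pred)]
        grad = []
        for j in range(n_a):
            # U^T (df/dx)^T (I - f(Ua)): bottom-up prediction-error drive.
            drive = sum(U[i][j] * (1 - pred[i] ** 2) * err[i]
                        for i in range(len(I))) / s2
            # Top-down prior term plus derivative of the sparseness penalty.
            grad.append(drive + (a_td[j] - a[j]) / s2
                        - alpha * 2 * a[j] / (1 + a[j] ** 2))
        a = [aj + lr * gj for aj, gj in zip(a, grad)]
    return a

U = [[0.5, -0.3], [0.4, 0.2], [-0.2, 0.6], [0.3, 0.5]]
a_true = [1.0, -0.5]
I = [math.tanh(sum(row[j] * a_true[j] for j in range(2))) for row in U]
a_hat = infer(I, U, a_td=[0.0, 0.0])
```

Setting a^td to zero, as done here, mimics inactivated feedback; replacing it with the higher level's prediction f(U^{n+1} a^{n+1}) implements the prior of the hierarchical model.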

    For our consideration of neuronal dynamics at the perceptual time-scale, however, a different observation is more relevant. Fig. 8(d) shows the responses a_i of a selected representation neuron in the first layer to visual stimulation of center and surround with the same (dotted line) and different (solid line) orientations as a function of the time after stimulus onset. Note that here the fast neuronal dynamics of a at the perceptual time-scale of a single stimulus presentation are considered as the signature of a dynamic search process intended to minimize the proposed objective function for a given input image with respect to the neuronal activations a.

    Based on these simulation results, one could predict the time course of contextual effects, which, to the best of our knowledge, has not been explored experimentally.

    Although it needs to be investigated in greater detail whether transient dynamics in primary visual cortex could indeed be viewed as some sort of online optimization of an objective function for a given input image, it is at least another possible interpretation of transient dynamics at the perceptual time-scale.

    5. Summary

    In summary, we have reviewed principled approaches to modeling the visual system which consist essentially of formulating an objective function and then optimizing the free parameters of simple models with respect to this objective function. However, when trying to say more about what the visual system is computing than just "some sort of representation", one has to be precise and, in the ideal case, use a coherent mathematical framework. With this review we tried to give an overview of concrete principled approaches, with a focus on ideas we believe are under-represented in the literature, like the MDL principle and rapid adaptation to internal states.

    As a guideline we utilized the general idea of adaptation at multiple time-scales, because we believe that this idea may qualify as a general paradigm for deriving functional interpretations of changes of receptive field properties. Our own work reviewed in Section 4.1 exemplifies that this method is also applicable to fast neuronal dynamics, for which almost no explicit functional interpretations are present in the literature. However, in order to test whether the view we suggest is only a nice theoretical idea or actually realized in biological systems, further experiments are needed to pinpoint, for concrete phenomena, the conditions (be they external or internal) which are changing and inducing changes of receptive field properties.

    A recapitulation reveals that many proposed principlesmight be reduced to optimizing a regularized objective



    function, which often might be cast easily in terms of efficient coding using the MDL model selection principle, with the logarithm of the prior corresponding to the regularizer (hence our detailed review of the MDL principle, which is well suited for explicitly including biological constraints). As an example, consider the way we presented the idea of sparse coding, which was presented originally in terms of such a regularized objective function (Olshausen & Field, 1996). We simply considered sparseness as being a desirable property, as expressed via a sparseness prior. We believe that more constrained models need to be proposed, with the model and constraints being argued for from a biological perspective. Then, comparing model predictions with experimental measurements would allow one to sort out those models and constraints hopefully corresponding to the investigated sensory system.

    In this review article we have investigated mainly firing rate models. Neurons, however, are often considered to be noisy computing devices, and this noise (beside other constraints) needs to be taken into account when modeling neuronal systems. This problem is much more severe than it appears at first, because neuronal noise may have different characteristics than the additive noise usually assumed in probabilistic models of sensory data. Furthermore, neurons spike. Thus, (i) modeling neuronal responses with greater biological detail than firing rate models, and (ii) explicitly including other biological constraints, seem to become imperative for principled approaches.

    Acknowledgements

    This work was supported by the Wellcome Trust(061113/Z/00) and the German Science Foundation (SFB618 and DFG 120-3).

    References

    Adorján, P., Levitt, J. B., Lund, J. S., & Obermayer, K. (1999a). A model for the intracortical origin of orientation preference and tuning in macaque striate cortex. Visual Neuroscience, 16, 303–318.

    Adorján, P., Piepenbrock, C., & Obermayer, K. (1999b). Contrast adaptation and infomax in visual cortical neurons. Reviews in the Neurosciences, 10, 181–200.

    Adorján, P., Schwabe, L., Piepenbrock, C., & Obermayer, K. (2000). Recurrent cortical competition: Strengthen or weaken? In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.), Advances in Neural Information Processing Systems (Vol. 12, pp. 89–95). MIT Press.

    Adorján, P., Schwabe, L., Wenning, G., & Obermayer, K. (2002). Rapid adaptation to internal states as a coding strategy in visual cortex? NeuroReport, 13(3), 337–342.

    Barlow, H. B. (2001). Redundancy reduction revisited. Network, 7(2), 251–259.

    Becker, S., & Hinton, G. (1992). Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature, 355, 161–163.

    Berkes, P., & Wiskott, L. (2002). Applying slow feature analysis to image sequences yields a rich repertoire of complex cell properties. In Proceedings of the ICANN'02 (pp. 81–86). Springer.

    Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 23(3), 695–702.

    Carandini, M., & Ferster, D. (1997). A tonic hyperpolarization underlying contrast adaptation in cat visual cortex. Science, 276, 949–952.

    Cover, T., & Thomas, J. (1991). Elements of information theory. New York: Wiley.

    Ferster, D., & Miller, K. D. (2000). Neural mechanisms of orientation selectivity in the visual cortex. Annual Review of Neuroscience, 23.

    Grossberg, S. (1976). Adaptive pattern classification and universal recoding, I: Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121–134.

    Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195, 215–243.

    Hubel, D. H., & Wiesel, T. N. (1977). Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London B, 198, 1–59.

    Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–286.

    Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (1996). Principles of neural science. New York: McGraw-Hill.

    Kersten, D., & Schrater, P. W. (1999). Pattern inference theory: A probabilistic approach to human vision. New York: Wiley.

    Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401, 759–760.

    Levitt, J. B., & Lund, J. S. (1997). Contrast dependence of contextual effects in primate visual cortex. Nature, 387, 73–76.

    Marr, D. (1982). Vision. New York: Freeman.

    Minsky, M. (1967). Computation: Finite and infinite machines. Englewood Cliffs, NJ: Prentice Hall.

    Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.

    Poggio, T., Torre, V., & Koch, C. (1985). Computational vision and regularization theory. Nature, 317, 314–319.

    Pouget, A., Dayan, P., & Zemel, R. (2000). Information processing with population codes. Nature Reviews Neuroscience, 12, 125–132.

    Rao, R. P. N., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2.

    Rissanen, J. (1989). Stochastic complexity in statistical inquiry. Singapore: World Scientific Publishing.

    Schwabe, L., Adorján, P., & Obermayer, K. (2001). Spike-frequency adaptation as a mechanism for dynamic coding. Neurocomputing, 38–40, 351–358.

    Seitz, A. R., & Watanabe, T. (2003). Is subliminal learning really passive? Nature, 422(6927), 36.

    Siegelmann, H. T., & Sontag, E. D. (1991). Turing computability with neural nets. Applied Mathematics Letters, 4, 77–80.

    Sompolinsky, H., & Shapley, R. (1997). New perspectives on the mechanisms for orientation selectivity. Current Opinion in Neurobiology, 7, 514–522.

    Ullman, S., Vidal-Naquet, M., & Sali, E. (2002). Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 5.

    van Vreeswijk, C. (2000a). Using renewal neurons to transmit information. European Biophysics Journal, 29, 245.

    van Vreeswijk, C. (2000b). Whence sparseness? In Advances in Neural Information Processing Systems (Vol. 13, pp. 180–186). MIT Press.

    Wiskott, L., & Sejnowski, T. J. (2002). Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14(4), 715–770.

    Zemel, R., Dayan, P., & Pouget, A. (1998). Probabilistic interpretation of population codes. Neural Computation.
