
Page 1: TWO PROJECTS IN THEORETICAL NEUROSCIENCE

The Pennsylvania State University
The Graduate School

Eberly College of Science

TWO PROJECTS IN THEORETICAL NEUROSCIENCE:

A CONVOLUTION-BASED METRIC FOR NEURAL MEMBRANE

POTENTIALS AND A COMBINATORIAL CONNECTIONIST

SEMANTIC NETWORK METHOD

A Dissertation in

Physics

by

Garrett Nolan Evans

© 2015 Garrett Nolan Evans

Submitted in Partial Fulfillment of the Requirements

for the Degree of

Doctor of Philosophy

May 2015


This dissertation by Garrett Nolan Evans has been reviewed and approved¹ by:

John C. Collins, Distinguished Professor of Physics, Dissertation Adviser, Chair of the Committee

Reka Albert, Distinguished Professor of Physics, Professor of Biology

Steven J. Schiff, Brush Chair Professor of Engineering, Professor of Engineering Science & Mechanics, Professor of Neurosurgery, Professor of Physics

Brad Wyble, Assistant Professor of Psychology

Peter Molenaar, Distinguished Professor of Human Development and Family Studies, Special Signatory

Nitin Samarth, Professor of Physics, Downsbrough Head of the Department of Physics

¹ Signatures are on file in the Graduate School.


Abstract

In this work, I present two projects that both contribute to the aim of discovering how intelligence manifests in the brain.

The first project is a method for analyzing recorded neural signals, which takes the form of a convolution-based metric on neural membrane potential recordings. Relying only on integral and algebraic operations, the metric compares the timing and number of spikes within recordings as well as the recordings’ subthreshold features, summarizing differences in these with a single “distance” between the recordings. Like van Rossum’s (2001) metric for spike trains, the metric is based on a convolution operation that it performs on the input data. The kernel used for the convolution is carefully chosen such that it produces a desirable frequency-space response and, unlike van Rossum’s kernel, causes the metric to be first order both in differences between nearby spike times and in differences between same-time membrane potential values: an important trait.

The second project is a combinatorial syntax method for connectionist semantic network encoding. Combinatorial syntax has been a point on which those who support a symbol-processing view of intelligent processing and those who favor a connectionist view have had difficulty seeing eye to eye. Symbol-processing theorists have persuasively argued that combinatorial syntax is necessary for certain intelligent mental operations, such as reasoning by analogy. Connectionists have focused on the versatility and adaptability offered by self-organizing networks of simple processing units. With this project, I show that there is a way to reconcile the two perspectives and to ascribe a combinatorial syntax to a connectionist network. The critical principle is to interpret nodes, or units, in the connectionist network as bound integrations of the interpretations for the nodes that they share links with. Nodes need not correspond exactly to neurons and may correspond instead to distributed sets, or assemblies, of neurons.


Table of Contents

List of Figures vi

Acknowledgements viii

1 Introduction 1
  1.1 Overview 1
    1.1.1 A tool for metric space analysis of membrane potentials 2
    1.1.2 A connectionist semantic representation framework 3
  1.2 Preliminaries 6
    1.2.1 The Hodgkin-Huxley neuron 6
    1.2.2 The binary neuron and the perceptron 8
    1.2.3 Continuously-valued activations 10
    1.2.4 Continuous time 11
    1.2.5 Linear separability 12
    1.2.6 Observing neurons 12
  1.3 Hopfield networks 14
  1.4 Multi-layer feedforward networks and parallel distributed processing 16
  1.5 PDP, connectionism and intelligence theory 19
  1.6 Concept theory 21
  1.7 Neural signal analysis 23
    1.7.1 Spectral analysis 23
    1.7.2 Point process formalism 28
    1.7.3 Spike train spectral analysis 32
  1.8 Concluding remarks and itemization of specific contributions 33

2 Convolution-based Metric for Neural Membrane Potentials 35
  2.1 Introduction 35
  2.2 Background 37
  2.3 Methods 40
    2.3.1 A generalized convolution-based metric 42
    2.3.2 Overview 43
  2.4 Results 44
    2.4.1 Single spike recordings 44
    2.4.2 Many-spike recordings 46
    2.4.3 Fourier analysis 49
    2.4.4 Demonstrations 54
  2.5 Discussion 59
  2.6 Proof of Eq. (2.29) 62
  2.7 Triangle inequality for dgen 64
  2.8 Kernel for noisy data 66
  2.9 Estimating the spike area for the simulated Hodgkin-Huxley neuron 67

3 Combinatorial Connectionist Semantic Network Method 68
  3.1 Introduction 68
  3.2 Combinatorial syntax 70
  3.3 Role-binding, feature-binding and multi-binding 72
  3.4 Multi-binding as combinatorial syntax 74
  3.5 Partial implementation: A formal-language–MBN translator program 76
    3.5.1 The formal language representation 76
    3.5.2 The Multi-Binding Network representation 79
    3.5.3 Examples 81
  3.6 Discussion 86

References 88


List of Figures

2.1 Point-comparison metrics and membrane potentials. In A, the action potential timing is a closer match than in B, but according to dpt, since the mismatched action potentials do not overlap in either case, both recording pairs are the same distance apart. 36

2.2 Plots of the σHP, σDH, and σX kernel functions discussed in the text. In A, the functions are plotted. For the sake of comparison, all have been normalized to unit area. In B, we have the Fourier transforms in decibels relative to the zero-frequency amplitude. Here, we are using α = 2 for σHP and a = 1.8, b = 0.92 for σDH. Of the parameters found in the literature, these give σHP and σDH the slowest high-frequency fall-off while also maintaining a minimum-free spectrum. For σX, the high-frequency fall-off is considerably slower. 52

2.3 Gamma function, γX(x), showing the behavior of dC as spike pairs separate. 54

2.4 Demonstrations. Left column: dC (×), scaled D²vR (∗), and scaled DVP () metrics evaluated on several different types (A–E) of computer-generated neural data versus the offset of features of the data. The two arguments for the metrics are a randomly generated signal and a modified version of the same signal. See text for details. Right column: Sample raw and σX-convolved data of each type. Gray trace is 50-ms-offset version of black trace. (A) Delta-function Poisson spikes at 4 Hz (on average). (B) Hodgkin-Huxley (HH) neuron under current noise sufficient to cause 4 Hz (average) spiking. (C) HH neuron under current noise insufficient to cause spiking. (D) HH neuron under input similar to C plus non-offset 4 Hz Poisson pulse input. (E) HH neuron under input similar to C plus fixed (5 ms) offset 4 Hz Poisson pulses. 55

3.1 Connectionist relational encoding motif. Ellipses stand for repetition. 70

3.2 Multi-binding semantic network. (A) Left-hand side of partition: A network utilizing role- and feature-binding adapted from Hummel and Holyoak (2003). (B) Right-hand side of partition: A multi-binding extension of this network including relationship type nodes. 73

3.3 Simple network with single defined struct, item and assertion. 82

3.4 A network involving recursion. 83


3.5 Network corresponding to simplified car-engine example from text. Example is simplified so that the network may be followed with the eye. 83

3.6 Network corresponding to “Bill loves Mary” example from Fig. 3.2. 84

3.7 Network encoding the feature- and role-binding semantic structure, “The quick brown fox jumped over the lazy dog.” 85

3.8 More complex network telling a story. Network not intended to be followed with the eye. The point of the figure is to show that extended information can be translated to and from an MBN. 85


Acknowledgements

I would like to thank John Collins for his attention, advice and support throughout the development of the ideas and expression presented here. I am further grateful to Steven Schiff for numerous invaluable suggestions regarding the verbal presentation and the position of this work within the context of important results by others; to Reka Albert for attention and guidance far beyond the call of duty regarding the introductory chapter and general presentation; to Peter Molenaar for insightful comments and suggestions concerning how to make my work more relatable and concrete; to Brad Wyble for his selfless rescue heroics and much-appreciated perspective on my work; and finally to Jorge Sofo, whose efforts have greatly aided this work’s coming together.


1 Introduction

What I cannot create, I do not understand. — Richard P. Feynman

1.1 Overview

Neuroscience has undergone a dramatic expansion since the 1950s. Starting with the development of the Hodgkin-Huxley (1952) model of the neuronal cell membrane, our mathematical understanding of neurons and their behavior has been steadily increasing (see Dayan & Abbott, 2001). At the same time, advances in the field of computing have placed exhaustive and detailed calculations that at one time may have seemed unimaginable within reach. These advances allow researchers to directly simulate neurons and vast networks thereof (see, e.g., Markram, 2012).

By contrast, over the same period of time, despite the initial expectations of digital computing’s pioneers (see Herrmann & Ohl, 2009), classical approaches to computer intelligence have not achieved a facsimile of the functions of the human mind. More recently, computer performance at some human-like functions, especially categorization of complex stimuli such as road signs and handwritten characters, has been greatly enhanced by bringing inspiration from our increasing understanding of neural function into the computer science domain (see, e.g., Ciresan, Meier, & Schmidhuber, 2012; Fausett, 1994). Nonetheless, even with such advances, it is clear we have yet to see a computer that is what we would call intelligent (see Sendhoff, Korner, Sporns, Ritter, & Doya, 2009, Preface). For example, computer instructions, by and large, must still be written in an “idiot-proof” way using rigid, non-natural languages: presently, there are no computers that we can generally tell to do something, as we would tell a human, and have them work out how. It has been my presumption—and not just mine (see Sendhoff et al., 2009)—that we have many more lessons to learn from the brain about the nature of intelligence and that, correspondingly, we will not be able to say we have a scientific understanding of the brain until we are able to reproduce its essential functions (see Eliasmith, 2013).

This dissertation presents two steps I have taken along this twin path, one regarding the quantitative analysis of neural dynamics and another in the direction of a theory of connectionist symbolic representation.


1.1.1 A tool for metric space analysis of membrane potentials

In order to increase our understanding of the brain, we must hone our ability to observe it and to make sense of what we observe. A variety of observational protocols are available toward this end. One may perform an EEG, or electro-encephalogram, which is a highly aggregated, though temporally very precise, brain observation. Functional magnetic resonance imaging (fMRI) offers a more spatially localized yet indirect and temporally fuzzy representation of the operating brain. A measurement that is local in both space and time is provided by electrode recordings of the membrane potentials of individual neurons—with the drawback that only a relative few neurons (at most, hundreds) may be observed in this way (Baars & Gage, 2010). More recently, optical techniques that observe neurons by altering them so that they broadcast their state with a light signal are becoming increasingly available (Scanziani & Hausser, 2009). As compared with electrode recordings, these allow a more non-invasive observation of neural behavior that is still reasonably precise, both spatially and temporally (St-Pierre et al., 2014), and has the potential to observe extensive ensembles of neurons simultaneously (Yuste & Church, 2014). The analytical tools discussed in this work are geared toward the latter two types of observation.

The analysis of neural signals often concentrates on the sequence of action potentials, or spikes, occurring in a neuron or ensemble of neurons. Because spikes propagate over long axon distances to other neurons whereas subthreshold membrane potential variations do not, spikes are seen as the essence of what the neuron is “saying” to other neurons and, therefore, the most important facet of membrane potential recordings (Rieke, Warland, De Ruyter van Steveninck, & Bialek, 1997, Ch. 1). As a first step in processing, membrane potential time courses are often transformed into spike trains, that is, sequences of the times at which action potentials occur in the recordings. As we will discuss in more detail below, standard approaches to neural signal analysis include, among others, spectral analysis of spike trains (see, e.g., Mitra & Bokil, 2007), point-process modeling of the spike train and estimation of its intensity function (see Cox & Isham, 1980), analyses of the neural firing rate and its dynamics over time, the peri-stimulus time histogram (PSTH) of the spiking response to a stimulus (Dayan & Abbott, 2001), along with spike train metrics, which quantify a meaningful “distance” between two trains (van Rossum, 2001; Victor & Purpura, 1997). However, more information is available and relevant than what can be gleaned from the spike train reduction of the membrane potential alone.
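The convolution idea behind van Rossum’s (2001) spike-train metric can be sketched in a few lines of Python (my illustration only; the function name, time constant, and sampling grid below are arbitrary choices, not values from this work). Each spike train is convolved with a causal exponential kernel, and the distance is the L2 norm of the difference between the filtered traces. Note that for two single-spike trains this exponential kernel makes the distance grow like the square root of a small spike-time offset rather than linearly, which is the behavior the metric of Chapter 2 is designed to improve upon.

```python
import numpy as np

def van_rossum_distance(train_a, train_b, tau=0.01, t_max=1.0, dt=1e-4):
    """Approximate van Rossum (2001) distance between two spike trains.

    Each train (a sequence of spike times, in seconds) is convolved with a
    causal exponential kernel exp(-t/tau); the distance is the L2 norm of
    the difference of the filtered signals, normalized by tau.
    """
    t = np.arange(0.0, t_max, dt)

    def filtered(train):
        f = np.zeros_like(t)
        for s in train:
            mask = t >= s
            f[mask] += np.exp(-(t[mask] - s) / tau)
        return f

    diff = filtered(train_a) - filtered(train_b)
    return np.sqrt(np.sum(diff ** 2) * dt / tau)

# Identical trains are at distance zero; offsetting one spike gives a
# distance that increases smoothly with the offset.
d_same = van_rossum_distance([0.2, 0.5], [0.2, 0.5])
d_shift = van_rossum_distance([0.2, 0.5], [0.2, 0.51])
```

The nested loop over spikes keeps the sketch transparent; a production implementation would use a closed-form expression or an FFT-based convolution instead.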

Features within membrane potential recordings offer indication as to what a neuron is “hearing” along with what it is “saying,” and there is substantial indication that neurons spend a greater proportion of their time listening than they do speaking (Lennie, 2003; Shoham, O’Connor, & Segev, 2006). Since we are interested in understanding the mechanics of the neural networks of the brain, it is essential for us to look closely and analyze the time course of neural inputs in addition to outputs, and we may do this only by analyzing the membrane potential itself. I have developed a method for establishing a meaningful distance between membrane potential recordings that attends both to action potentials in the recordings and to the recordings’ subthreshold features, i.e., a metric for membrane potential recordings. This metric will be discussed in Chapter 2.

1.1.2 A connectionist semantic representation framework

My second contribution, which I will cover in Chapter 3 of this “extended discussion,” is a theory of structured information representation in connectionist networks (see Sendhoff et al., 2009). By “connectionist” we shall mean “consisting of numerous simple, interconnected processing units.” This definition includes the brain as a network of (relatively) simple biological neurons along with computer abstractions of neural networks consisting of even simpler computational atoms. Starting with pattern-classifying models of the 1960s and earlier, information has been represented in prominent connectionist paradigms in the form of simple associative connections by which active representations generate or reinforce activity in other representations (Fausett, 1994). With the perceptron models, along with the later and more advanced parallel distributed processing (PDP) theory (see Rumelhart & McClelland, 1985), association takes the form of direct connections between neuron-like units representing pattern members and units representing classification categories.

In the physicist John Hopfield’s approach, associative connections between pattern members play the central role, with pattern recognition occurring as a dynamical attractor state in which activity in all pattern members obtains under sufficient partial input to the members (Hopfield, 1982). Finally, in even more refined models, such as Grossberg and Carpenter’s adaptive resonance theory (see Carpenter & Grossberg, 1988), we see associative connections in the top-down direction. That is, in addition to connections from member units to category units, there are connections from category units to member units, with both sets of connections cooperating to give rise to pattern recognition. One of the great strengths of all these methods is that the connections between units that cause pattern recognition to take place can be automatically generated according to computable learning rules rather than having to be dictated by a network architect.
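The Hopfield attractor dynamics just described can be illustrated with a minimal sketch (a standard textbook construction, not code from this work): binary ±1 units, Hebbian outer-product weights storing one pattern, and asynchronous threshold updates that complete a corrupted partial input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store one pattern of +/-1 activities with the Hebbian outer-product rule.
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
n = pattern.size
weights = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(weights, 0.0)  # no self-connections

# Start from a corrupted version of the pattern (partial input).
state = pattern.copy()
state[:3] *= -1  # flip three of the eight units

# Asynchronous updates: each unit aligns with its weighted input.
for _ in range(5):
    for i in rng.permutation(n):
        state[i] = 1 if weights[i] @ state >= 0 else -1

# The dynamics settle into the stored attractor, recovering the pattern.
recovered = np.array_equal(state, pattern)
```

With a single stored pattern the corrupted state lies well within the attractor’s basin, so the updates restore the original pattern regardless of update order.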

Starting in the early 1990s with SHRUTI (Shastri & Ajjanagadde, 1993), less well-known theories have been under development concerning a certain incompleteness in these approaches to connectionist information representation. As was a chief complaint in the well-known critique by Fodor and Pylyshyn (1988) of the connectionist approach, connectionism, on the basis of the models discussed above, does not seem to be compatible with structured or combinatoric (roughly, combinable) representation. To see what we mean, consider the example situation, which should be familiar to those with formal training in computer keyboarding: “The quick brown fox jumped over the lazy dog.” This sentence offers a trivial example of a situation that cannot be captured by the simple collective association of its member components: if we simply associate quick, brown, fox, jumping, laziness and dog into a single representation, either with each other according to the Hopfield paradigm, or to or with a classifier representation according to the PDP approach, we will wind up with an ambiguous jumble in which it is totally unclear who is jumping over whom, who is quick, who is lazy and who is brown.

Shastri and Ajjanagadde (1993) and others, including, earlier, Hinton, McClelland, and Rumelhart (1985) and, more recently, Hummel and Holyoak (2003), have explored a connectionist means of resolving this issue, which we will discuss in detail. So far as I know, the basic principle these approaches rely upon, as well as the range of its potency, has not yet been identified and discussed. The principle is this: subdivision of the major conglomeration, being the situation one wishes to represent, into minor conglomerations for which association is sufficient to unambiguously infer meaning. For instance, the fox-jumping-the-dog example discussed above may be represented by using minor conglomerations involving the role-players in the situation (the fox and the dog), their respective descriptors (quick, brown; lazy), and their roles in the situation (jumper and jump-ee). By first forming these minor conglomerations, subsequent conglomeration (or binding) into a larger complex successfully captures the structure of the situation: it becomes clear who is jumping, who is jumped, etc.
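As a purely illustrative sketch (the data structures and names below are mine, not the dissertation’s Multi-Binding Network formalism), the subdivision principle can be made concrete: each minor conglomeration is a small binding whose internal association is unambiguous, and the full situation is a binding of those bindings, from which the structure can be recovered.

```python
# Minor conglomerations: feature-bindings tie a role-player to its
# descriptors; role-bindings tie a feature-bound entity to its role.
fox = frozenset({"fox", "quick", "brown"})   # feature-binding
dog = frozenset({"dog", "lazy"})             # feature-binding
jumper = ("jumper", fox)                     # role-binding
jumpee = ("jump-ee", dog)                    # role-binding

# Major conglomeration: a binding of the minor bindings. Unlike a flat
# association of all six words, the structure remains recoverable.
situation = {"relation": "jumps-over", "bindings": [jumper, jumpee]}

def filler_of(situation, role):
    """Recover which feature-bound entity fills a given role."""
    for r, entity in situation["bindings"]:
        if r == role:
            return entity
    return None

who_jumps = filler_of(situation, "jumper")      # the quick brown fox
who_is_jumped = filler_of(situation, "jump-ee")  # the lazy dog
```

A flat set union of all the words would lose exactly the role and descriptor assignments that the nested bindings preserve.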

We will discuss a project by which I demonstrate that this approach to connectionist representation is exceedingly potent in its expressive ability: it can recoverably represent a very wide class of information structures that includes recursion. The relevance is that the approach offers a theory for how structured information is represented in the brain and can be represented in further approaches to computer intelligence.

A scheme like this is becoming more and more needed as reports continue to come in of “concept cells” and researchers struggle to make sense of how such cells, previously thought impossible by many scholars, might be used in neural representations. Quiroga, Reddy, Kreiman, Koch, and Fried (2005) was an early study to report neurons, found in the medial temporal lobe of adult humans, that selectively respond to specific high-level concepts (such as landmarks and famous actresses) and no other known inputs. Strictly speaking, though, these are not the first examples of “concept” cells. Returning again to the 1950s, an important decade for neuroscience and related fields, Hubel and Wiesel (1959) made the landmark discovery of cells in the primary visual cortex of cats that selectively respond to a specific “kind” of input, including those which respond to small line segments in the visual field oriented to or close to a particular direction. These line segments may be viewed as a low-level perceptual concept, and the ensembles of cells responding to them (which form into columns) may be seen as a localized representation of the concepts in the brain.

Hubel and Wiesel (1962) also discovered cells that seem to respond to particular combinations of line segments, and the view that henceforth emerged is that perception is “built up” from successive levels of perceptual organization, each level being predicated on the representations in lower levels. It is not difficult to imagine, in principle, this build-up extending all the way to learned high-level concepts, provided the concepts appear frequently enough in an individual’s experience to aid her or him in processing that experience. That is, while I may not develop a concept cell for my acquaintance’s grandmother, whom I have only encountered in passing a single time and am unlikely to encounter again, I am well off to, through repeated experience, develop a concept cell for my own grandmother, which offers me a basis on which to “build up” representations of situations involving her, which are likely to be frequent. Experiments by Quiroga and colleagues seem to indicate that the brain is indeed using a representational principle not unlike this.

That said, researchers are acutely aware of a fundamental difficulty with the notion of a concept cell, which is that it is not sensible to suppose that the representation of a high-level concept resides within a single neuron or ensemble of neurons itself. After all, any single cell or ensemble thereof is not all that different from any other, so there is a paucity of distinction on which the representation can be predicated. Furthermore, it is clear that the reason a concept-selective cell responds to a concept in the way that it does is by virtue of its participation in a larger network. Thus, many argue, the representation of the concept must belong to the network and not to the cell itself.

However, the situation is precisely the same with the orientation-selective cells discovered by Hubel and Wiesel that we briefly discussed above. Orientation-selective cells do not represent oriented line segments by means of anything internal to themselves but rather by their position in a network that receives input from the retina and, more proximately, the lateral geniculate nucleus. Nonetheless, neuroscientists by and large have no problem saying that orientation-selective cells represent the oriented line segments they detect to the rest of the brain. We should not have a problem saying concept-selective cells represent the concepts they respond to, either, regardless of the network structures upon which that selective response is predicated.

The nature of those network structures needs to be clarified, however. To understand concept cells requires us to understand how they arise and how they contribute to the brain’s collective representation of information, including structured information. The connectionist representation scheme I will put forward may provide a basis for that.

Since we are discussing concept cells, we would be remiss to neglect some of the basic work that has been done by psychologists on concepts themselves. Scholars have been thinking about concepts since the days of Aristotle, for whom concepts were equivalent to sets of necessary and sufficient conditions. According to Aristotle, the concept of ‘chair’ has a set of criteria to go with it for which anything and everything satisfying those criteria is a chair, and also any and every chair satisfies those criteria. For millennia, this point of view stood. Starting in the 1950s and by the 70s, to return again to a pivotal period of time for our subject matter, the Aristotelian view of concepts fell from favor as scholars noticed that hard, rigid boundaries for concepts simply are not there.


For example, consider the concept of a ‘heap’. At what point does a collection of, say, toothpicks become a heap? How many toothpicks do there have to be? What is the necessary and sufficient number of toothpicks for a heap? Clearly the heap concept is not something that has binary status. While a single toothpick is definitely not a heap, and a stack of 50,000 definitely is, five or ten toothpicks loosely grouped and overlapping one another cannot be said to be definitely a heap nor definitely not a heap. Fuzzy boundaries for concepts have been well established by psychological experiments (see, e.g., Hampton, 1979). If a concept does not have a crisp definition, is it real, and if so, what can it be said to be? Two views have emerged among psychologists concerning these questions since the rejection of the classical (Aristotelian) perspective. One is abstraction theory, which says concepts are defined by an abstract representation of the features or properties that tend to be held in common by instances of a concept, together with the importance of those features. The other view is exemplar theory, which holds that human individuals also store multiple specific concrete instances of concepts in addition to the abstract representation (Murphy, 2002). We will discuss concept theory in more detail below and will ultimately see how its contentions square nicely with our model of structured information representation in connectionist networks.

1.2 Preliminaries

1.2.1 The Hodgkin-Huxley neuron

Biological neurons are dynamical computational structures which, to a very rough approximation, perform a weighted sum of their recent inputs (within 10–20 ms) and, if the sum is sufficiently large¹, make a report of this to other neurons in the form of an action potential, which then becomes a partial basis on which other neurons decide whether they will fire. A breakthrough for neuroscience came in the 1950s when Hodgkin and Huxley developed a faithful mathematical description of the dynamics, including action potential generation, of neurons. The Hodgkin-Huxley model is a set of coupled differential equations, which Hodgkin and Huxley arrived at by painstakingly manipulating and observing the giant squid axon (Hodgkin & Huxley, 1952; Mitra & Bokil, 2007; Rieke et al., 1997). The Hodgkin-Huxley and related “conductance-based” models continue to be used as a staple description of neurons in computer simulation studies.

¹ See post-inhibitory rebound, e.g., Dayan and Abbott (2001), for a well-known exception to this picture.


The Hodgkin-Huxley equations take the form (see, e.g., Dayan & Abbott, 2001):

cm dV/dt = m³ h gNa (ENa − V) + n⁴ gK (EK − V) + gL (EL − V) + iext    (1.1)

dm/dt = αm(V) (1 − m) − βm(V) m    (1.2)

dh/dt = αh(V) (1 − h) − βh(V) h    (1.3)

dn/dt = αn(V) (1 − n) − βn(V) n    (1.4)

Here, V is the electric potential difference between the interior and exterior of the neuron, i.e., the potential difference across the cell membrane, or membrane potential, for the neuron. The constant cm is the capacitance per unit area of the cell membrane; iext is the input current to the cell per unit area; gNa, gK, and gL are nominal conductances per unit area for the cell membrane, respectively, for its sodium ion channels, potassium ion channels, and non-specific ion flow (leakage) through the membrane. The multiplying factors m, h, and n are known as gating variables; they represent the state of the sodium (m, h) and potassium (n) ion channels. The gating variables take on values between 0 and 1, and the channels only realize their full conductance if their associated gating variable(s) are equal to 1. The gating variables are themselves dynamic. The dynamics for each has both a membrane-potential-sensitive activating (αX(V)) and de-activating (βX(V)) component (see Dayan & Abbott, 2001, for details), and it is these activating and de-activating functions that are ultimately responsible for producing the action potential, or spike, for the neuron.

Finally, for each ion channel, and also for the leakage current, there is a separate reversal potential, EX, which is the membrane potential at which the respective current is at equilibrium and there is no net current flow. Standard values for ENa, EK, and EL are, respectively, 50 mV, -77 mV, and -54 mV. A typical equilibrium value for the membrane potential is -65 mV. Because ENa is so far above the resting membrane potential, it acts as the driver for the action potential. When the membrane potential rises sufficiently to strongly activate the sodium ion channels, the membrane potential dramatically swings upward toward ENa. Shortly thereafter, however, the sodium ion channels deactivate, and the potassium channels activate, plunging the membrane potential sharply downward toward EK. The result is a spike.

After the spike, the membrane potential rises back toward equilibrium, and the cell soon (∼10 ms) becomes ready to generate another spike if the input to the cell is sufficient to bring the membrane potential above spiking threshold once again.

The all-or-none character of the Hodgkin-Huxley spike—either the membrane potential rises sufficiently high to trigger the cell's action-potential generating mechanism, or it does not and there is no spike—captures what is commonly seen as the most characteristic property of neurons (see Rieke et al., 1997): for practical purposes, they only communicate with one another via spikes, which either happen totally if the neuron's input is sufficiently strong, or not at all if the input is less than sufficient. The spikes propagate along the cable-like axons of neurons to locations, called synapses, where the axons make chemical contact with the dendrites of another neuron. When a spike traveling along an axon reaches a synapse, it triggers an influx of current into the subsequent dendrite, possibly contributing to a spike in the neuron the dendrite belongs to.

“Conductance-based” models similar to the Hodgkin-Huxley neuron, such as the faster-spiking Connor-Stevens model (see Connor, Walter, & McKown, 1977), are commonly used. Conductance-based models of neurons represent a relatively high degree of biological faithfulness by comparison to other models (see Izhikevich, 2004), although there are certainly much more realistic model neurons, e.g., Gold, Henze, Koch, and Buzsaki (2006).

The differential equations describing these models can be integrated by a modern computer in a reasonable amount of time for many neurons simultaneously (hundreds to hundreds of thousands, depending on the computer system). If interaction terms are included in the equations (usually in the form of an incoming-spike-dependent synaptic conductance, gsyn, times a factor, (Esyn − V), added to the dV/dt equation, where Esyn is a constant), large interacting networks of neurons can be and have been simulated.
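
A minimal simulation makes the spiking mechanism concrete. The sketch below integrates the Hodgkin-Huxley equations by forward Euler, using the standard textbook rate functions and parameter values (following Dayan & Abbott, 2001); the specific numbers, initial conditions, and integration step are illustrative assumptions, not taken from the text above.

```python
import math

# Standard Hodgkin-Huxley parameters (assumed textbook values); units are
# mV, ms, and per-area conductances/currents.
c_m, g_Na, g_K, g_L = 1.0, 120.0, 36.0, 0.3
E_Na, E_K, E_L = 50.0, -77.0, -54.4

def rates(V):
    """Activating (alpha) and de-activating (beta) rates for m, h, n."""
    a_m = 0.1 * (V + 40.0) / (1.0 - math.exp(-(V + 40.0) / 10.0))
    b_m = 4.0 * math.exp(-(V + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(V + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(V + 35.0) / 10.0))
    a_n = 0.01 * (V + 55.0) / (1.0 - math.exp(-(V + 55.0) / 10.0))
    b_n = 0.125 * math.exp(-(V + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n

def simulate(i_ext, t_max=50.0, dt=0.01):
    """Forward-Euler integration of the membrane and gating equations."""
    V, m, h, n = -65.0, 0.05, 0.6, 0.32      # near-rest initial conditions
    trace = []
    for _ in range(int(t_max / dt)):
        a_m, b_m, a_h, b_h, a_n, b_n = rates(V)
        i_ion = (g_Na * m**3 * h * (V - E_Na) + g_K * n**4 * (V - E_K)
                 + g_L * (V - E_L))
        V += dt * (i_ext - i_ion) / c_m
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        trace.append(V)
    return trace

trace = simulate(i_ext=10.0)   # input current strong enough to drive spiking
print(max(trace) > 0.0)        # the membrane potential swings above 0 mV
```

With a sufficient constant input current, the simulated membrane potential repeatedly swings up toward ENa and back down toward EK, reproducing the spiking behavior described above.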

1.2.2 The binary neuron and the perceptron

On the other end of the spectrum, the most simplified representation of a neuron is what is called the binary neuron (Fausett, 1994). For this neuron, the membrane potential is replaced with an “activation value” that can be 0 or 1, and time is treated discretely. The all-or-none facet of neural activity is captured in this case by an input threshold, θ. If the input to the neuron exceeds θ, the neuron’s activity is 1; otherwise it is 0.

Many useful models of category learning have been developed on the basis of this model. A rudimentary example is the perceptron (see Fausett, 1994; Rosenblatt, 1958). Consider a set of N binary neurons, represented by the binary variables, x1, . . . , xN, and let these neurons represent some form of unprocessed data. For example, they could represent the presence or absence of light on N different grid squares in a 2-D array, or they could represent the presence or absence of more high-level features, such as true or false answers on a questionnaire supplied to human participants.

Now suppose there is another set of M binary neurons representing the categories into which one wishes to divide the patterns over the N input variables, and let us represent each of these with one of the binary variables, y1, . . . , yM. The categorization may be achieved, with certain limitations, by way of a real-number-valued weight matrix, W = (wij), between the N input neurons and the M category neurons. The entries, wij, of W are a simplified representation of synaptic connections between the input neurons and the category neurons, with the strength of the synaptic interaction represented by the value of wij. In some implementations, these weights are allowed to be negative; in others, they are strictly positive.

If we set the input, Ii, to the ith category neuron, yi, as

I_i = \sum_j w_{ij} x_j    (1.5)

and use our binary rule:

y_i = \begin{cases} 1 & I_i > \theta \\ 0 & \text{otherwise} \end{cases}    (1.6)

where θ is an input threshold describing the minimum input necessary to produce an output, our weight matrix performs our categorization for us. The difficulty lies in assigning the weight entries, wij, that will accomplish the categorization we desire.

Fortunately, however, assigning them by hand is not necessary. The weight matrix may be generated according to a learning rule that requires us only to provide a sufficient number of examples of each category to the network. If there is a weight matrix that performs the categorization we seek (as will be discussed below, this is not guaranteed), the learning rule will settle on it (Fausett, 1994).

The perceptron learning rule for binary neurons is as follows:

\Delta w_{ij} = -\alpha\, \varepsilon_i\, x_j    (1.7)

where ∆wij is the update to the weight matrix entry, wij, at a given time, α is the positive learning rate constant (α ≪ 1), xj is the activity for the jth input unit at the same time, and εi is the error for the ith category unit at that time. That is, εi is the difference between yi, the actual value calculated on the basis of the present weight matrix, and ŷi, the correct binary value for the ith categorization unit for this input:

\varepsilon_i = y_i - \hat{y}_i = H\!\left( \sum_j w_{ij} x_j - \theta \right) - \hat{y}_i    (1.8)

where H(x) is the Heaviside theta function,

H(x) = \begin{cases} 1 & x > 0 \\ 0 & \text{otherwise} \end{cases}    (1.9)
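
The perceptron defined by Eqs. (1.5)–(1.9) can be sketched in a few lines. The OR task, the threshold θ = 0.5, and the learning rate α = 0.1 below are illustrative choices, not taken from the text.

```python
# A minimal binary perceptron implementing Eqs. (1.5)-(1.9).

def step(I, theta=0.5):
    """Binary activation, Eq. (1.6): 1 if the input exceeds the threshold."""
    return 1 if I > theta else 0

# Training set: (x1, x2) -> correct output; here, the OR classification.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w = [0.0, 0.0]        # one category neuron receiving two input neurons
alpha = 0.1           # learning rate (alpha << 1)

for epoch in range(100):
    for x, y_correct in data:
        I = sum(wj * xj for wj, xj in zip(w, x))   # Eq. (1.5)
        eps = step(I) - y_correct                  # error, Eq. (1.8)
        for j in range(2):
            w[j] += -alpha * eps * x[j]            # Eq. (1.7)

preds = [step(sum(wj * xj for wj, xj in zip(w, x))) for x, _ in data]
print(preds)   # → [0, 1, 1, 1]: the learned weights implement OR
```

Because OR is linearly separable, the rule settles on a working weight vector after a handful of epochs, as the perceptron convergence result guarantees.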


The weight update rule, Eq. (1.7), has an importance that goes beyond its algorithmic convenience. It is also reminiscent of synaptic plasticity in biological neural networks, which are known to possess chemical messengers (e.g., dopamine), the presence or absence of which serves as an indication of performance error (see, e.g., Schultz, 2002). These same messengers also play a role in determining synaptic plasticity between neurons.

1.2.3 Continuously-valued activations

Ideas that apply to binary neurons can in most cases be generalized to model neurons whose activation values take on continuous values, which is the next order in complexity of neural modeling that is generally seen. The activation of a neuron is usually restricted to the range [0, 1] and can be roughly thought of as corresponding to the normalized firing rate (spikes per second) of a biological neuron. Time is still treated as discrete unless otherwise specified.

Perceptrons utilizing continuously-valued activations for neurons map activations for input variables, xi, to activations for output categories, yi, using a weight matrix, W, as before. The input to the ith category, yi, is still described by the equation:

I_i = \sum_j w_{ij} x_j    (1.10)

although the xi are now continuous variables. The corresponding activations, yi, are determined from these inputs by a sigmoidal activation function, F(I):

y_i = F(I_i - \theta)    (1.11)

Weights are still determined through a self-organization process that occurs over time with repeated presentation of input patterns and their correct category assignments, by the update rule:

\Delta w_{ij} = -\alpha\, \varepsilon_i\, x_j    (1.12)

where

\varepsilon_i = y_i - \hat{y}_i = F\!\left( \sum_j w_{ij} x_j - \theta \right) - \hat{y}_i    (1.13)

and ŷi is the correct activation for the ith category neuron for the present input.
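
A minimal sketch of the continuous-activation version, Eqs. (1.10)–(1.13). The logistic sigmoid, the two training patterns, and the parameter values (θ = 0, α = 0.5) are illustrative assumptions, not specified in the text.

```python
import math

def F(I):
    """An illustrative sigmoidal activation function (logistic)."""
    return 1.0 / (1.0 + math.exp(-I))

# Input patterns with correct output activations (targets near 0 or 1).
data = [([1.0, 0.0], 0.9), ([0.0, 1.0], 0.1)]

w, theta, alpha = [0.0, 0.0], 0.0, 0.5

for _ in range(2000):
    for x, y_correct in data:
        I = sum(wj * xj for wj, xj in zip(w, x))   # Eq. (1.10)
        eps = F(I - theta) - y_correct             # error, Eq. (1.13)
        for j in range(2):
            w[j] += -alpha * eps * x[j]            # Eq. (1.12)

out = [F(sum(wj * xj for wj, xj in zip(w, x)) - theta) for x, _ in data]
print(out)   # the activations approach the targets [0.9, 0.1]
```

Note that, as in Eq. (1.12), the update multiplies the raw error by the input activity; no derivative of F appears, which distinguishes this rule from the gradient-based updates discussed later for multi-layer networks.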


1.2.4 Continuous time

Finally, neural activations may be modeled in continuous time. In this case, the activation value for a neuron is a function of time, x(t). Models utilizing either binary neurons or continuous-activation neurons in discrete time can typically be generalized to continuous time, which is often the next order of complexity used in artificial neural networks—although it is also possible to use binary neurons in continuous time.

For continuous-activation, continuous-time perceptron models, the input to a category neuron in response to the activations, xj, of the input neurons is still, essentially, Eq. (1.10)—although it is also possible to introduce a delay parameter into the input function:

I_i(t) = \sum_j w_{ij}\, x_j(t - \tau_{ij})    (1.14)

Here τij is the time delay between neuron j and neuron i. For the perceptron, since there are no recurrent connections (loops) in the network, models utilizing uniform delays (τij = τδ) are equivalent to models with zero delay.

The activations for the category neurons evolve continuously:

\tau \frac{dy_i}{dt} = F(I_i) - y_i    (1.15)

where F is a suitable sigmoidal activation function and τ is an evolution time constant.

The weight update rule for continuous-time perceptrons takes the form of a differential equation:

\tau \frac{dw_{ij}}{dt} = -\alpha\, \varepsilon_i(t)\, x_j(t)    (1.16)

where, as before,

\varepsilon_i(t) = y_i(t) - \hat{y}_i(\vec{x}(t))    (1.17)

is the error between the current activation value, yi(t), and the correct value, ŷi(t), given the present input, \vec{x}(t), where \vec{x} represents a vector filled with the input values, xi.

A time delay, τδ, may be introduced into these equations as well, yielding:

\tau \frac{dw_{ij}}{dt} = -\alpha\, \varepsilon_i(t)\, x_j(t - \tau_\delta)    (1.18)

\varepsilon_i(t) = y_i(t) - \hat{y}_i(\vec{x}(t - \tau_\delta))    (1.19)
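
The activation dynamics of Eq. (1.15) can be illustrated by forward-Euler integration for a fixed input: the activation simply relaxes toward the fixed point F(I). The weights, input, time constant, and step size below are illustrative choices.

```python
import math

def F(I):
    """An illustrative sigmoidal activation function (logistic)."""
    return 1.0 / (1.0 + math.exp(-I))

w = [1.5, -0.5]        # fixed weights onto one category neuron
x = [1.0, 1.0]         # constant input activations
tau, dt = 10.0, 0.1    # evolution time constant and Euler step (ms)

I = sum(wj * xj for wj, xj in zip(w, x))   # Eq. (1.10): here I = 1.0
y = 0.0                                    # initial activation
for _ in range(int(100.0 / dt)):           # integrate for 100 ms (10 tau)
    y += dt / tau * (F(I) - y)             # Eq. (1.15)

print(y)   # relaxes toward the fixed point F(1.0) ≈ 0.731
```

After about ten time constants, the activation is indistinguishable from its steady-state value F(I), which is what Eq. (1.15) predicts for any constant input.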


1.2.5 Linear separability

The categorizations that may be implemented by a perceptron have a fundamental limitation: only what are known as “linearly separable” classifications may be achieved. This is due to the fact that the input neurons connect directly to the output neurons with no intermediates—a single-layer neural network. To understand this limitation, consider the weights between the N input neurons and one of the category neurons, i∗, where i∗ is fixed. These weights, wi∗j, form a vector, \vec{w}_{i^*}. The input variables, xi, may also be treated as a vector, \vec{x}. The output of the (i∗)th category neuron is then:

y_{i^*} = F(\vec{x} \cdot \vec{w}_{i^*})    (1.20)

where F(I) = H(I − θ) in the case of binary neurons, and is in general a sigmoidal function of the input. This expression makes it explicit that the output of each category neuron, given input \vec{x}, is determined by the projection of \vec{x} along the direction of the category neuron’s weight vector. It follows that the surfaces in input space yielding equal outputs for the category neuron will be perpendicular to the weight vector, parallel to each other, and not in any way curved. They cannot, for example, close on themselves to form a sphere or other shape surrounding a compact region—a sweet spot. Nor may they form saddle-like structures. Also, because F is required to be a sigmoidal and therefore monotonically non-decreasing function, the category neuron’s response will never decrease in the direction of the weight vector. This would otherwise be necessary to solve the often-referenced “exclusive or” classification, which, for two binary inputs, includes only the inputs (1, 0) and (0, 1) and excludes both (0, 0) and (1, 1). There is no straight line that may be drawn on the plane for which the points (0, 1) and (1, 0) are on one side of the line and (0, 0) and (1, 1) are on the other side. Category neurons belonging to a perceptron or other single-layer network are therefore said to be only capable of making linear discriminations on the input space.
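
This limitation is easy to verify numerically: scanning a grid of weight vectors and thresholds for a single binary unit finds separating parameters for OR but none for “exclusive or.” The grid is an illustrative choice; the empty XOR result reflects the geometric argument above.

```python
import itertools

def classify(w, theta):
    """Outputs of a single binary unit, y = H(w.x - theta), on the 4 inputs."""
    return tuple(1 if w[0] * x1 + w[1] * x2 > theta else 0
                 for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)])

grid = [i / 4.0 for i in range(-8, 9)]   # weights/thresholds in [-2, 2]
found = {"xor": False, "or": False}
for w1, w2, theta in itertools.product(grid, repeat=3):
    out = classify((w1, w2), theta)
    if out == (0, 1, 1, 0):
        found["xor"] = True              # never occurs: XOR is not separable
    if out == (0, 1, 1, 1):
        found["or"] = True               # easily found, e.g. w = (1, 1), theta = 0.5

print(found)   # → {'xor': False, 'or': True}
```

A finer or wider grid gives the same result, since no linear threshold unit of any kind can realize XOR.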

1.2.6 Observing neurons

The nature of the brain creates obstacles to its observation. It is not a thing that, presently, can be perfectly tracked in time, like a comet or a cold front. It is a dense, opaque, three-dimensional, and highly complex object: containing some 100 billion individual neurons with on the order of 1 quadrillion (10^15) interconnections between them. The brain can be taken apart, dissected, and examined, and it has been. However, once this takes place, it stops working. There are several major methods that are used to observe the operating brain, each of which tackles the problem of doing so in a different way, with its own advantages and drawbacks. Some methods offer fine time resolution but low spatial resolution. Others offer intermediate resolution with regard to both. Still others offer good to excellent resolution in both categories but limit the proportion of the brain that can be observed at a time. The major neural observational methods presently in use by professionals include microelectrode recordings, electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and optical imaging. We will discuss each of these in turn.

Microelectrode recordings offer the greatest precision currently available for observing neurons. In this technique a very sharp electrode, usually 0.1–2 µm in diameter, is inserted into the brain such that its tip is placed inside (intracellular), on the surface of (patch), or nearby (extracellular) an individual neuron (or neurons) (Mitra & Bokil, 2007, Ch. 8). The microelectrode itself is either a tiny metal wire or a tiny glass tube (micropipette) filled with an electrolyte solution. Resistance values for microelectrodes are on the order of 1–100 MΩ. Intracellular microelectrodes offer the capability of more or less directly monitoring the membrane potential for neurons. Amplification electronics and data acquisition hardware and software make it possible to store sampled recordings of a neuron’s membrane potential at a density of up to 10 ms^{-1}.

Because intracellular and patch electrodes have a somewhat fragile positioning, the extracellular technique has often been used for awake, behaving animals (but not always: see Fee, Kozhevnikov, and Hahnloser (2004)). Extracellular data is often influenced by the activity in more than one neuron, with closer neurons exerting greater influence. Neuronal spikes can be extracted from the data, and there are methods for teasing apart which spikes in the signal belong to the same neuron. Major limitations of the microelectrode technique include its delicacy, difficulty, invasiveness, and the fact that only a limited number of neurons—usually fewer than 10 at a time—can be simultaneously observed, practically speaking, when using it: ten neurons is less than one-billionth of the brain.

Electroencephalography (EEG) is a significantly less invasive technique than microelectrode recording, requiring no surgery and relying on the placement of simple metal electrodes, usually on the outside surface of the skull (Mitra & Bokil, 2007, Ch. 10). These electrodes detect changes in the aggregate electric potential outside large regions (compared to the size of a neuron) of the cerebral cortex—the outermost layer of the brain. EEG data is precise in time (on the order of 1–10 ms resolution), and since EEG usually involves the placement of ten or more electrodes over the entire scalp, the data provides information about what is happening throughout the cerebral cortex.

The EEG signal mostly contains information about oscillations in neural activity. These oscillations occur at a variety of frequencies depending on the activity that the animal is engaged in and usually range between 5–50 Hz for awake humans. EEG may also contain so-called “event-related potentials” (ERPs): specific disturbances to the EEG signal that occur at a fixed time delay relative to a known discrete event—often a particular kind of external stimulus. While comprehensive, EEG data is lacking in detail, since EEG electrodes provide only an aggregate representation of the activity in the tens of thousands or more of neurons that influence each electrode.


Functional magnetic resonance imaging (fMRI) is a minimally invasive, high-technology method that is able to create an image representation of activity in the brain by detecting local chemical changes via the manner in which those changes affect the nuclear magnetic resonance properties of water protons (hydrogen nuclei) (Mitra & Bokil, 2007, Ch. 11). Usually, the so-called “blood-oxygen-level dependent” (BOLD) fMRI signal is used. As the name suggests, the signal reflects the amount of oxygen present in blood that is serving different small volumes—delineated by the experimenter and called voxels—in the brain. Since blood oxygenation local to a brain area responds to neural firing activity in that area, neural activity and the BOLD signal are correlated. The exact nature of the correspondence is an active research area. The fMRI method provides spatial resolution on the order of 1 mm (there are up to tens of thousands of neurons per 1 mm^3 volume) and allows the activity of the entire brain to be imaged simultaneously. The procedure is relatively non-invasive, requiring only that subjects lie still in an enclosed space. Unfortunately, however, since the blood oxygen level takes time to respond to changes in neural activity, the time resolution for fMRI is somewhat low: on the order of 1 second at best. Data for fMRI presents as a multivariate time series. Processed data reflect the aggregate neural activity in each voxel at a sequence of time points and have been used extensively to determine which parts of the brain “light up” or become active during various cognitive, behavioral, and stimulus-driven processes.

Finally, optical imaging is a new method that has been coming online recently which offers promise for temporal and spatial resolution approaching that of microelectrode recordings along with the ability to observe a much larger number of neurons at a time (Scanziani & Hausser, 2009). It is touted by some authors as the future of experimental neuroscience (Yuste & Church, 2014). The basic technique is to genetically modify neurons such that they express proteins with fluorescent properties that are dependent on a quantity of interest for the neuron, e.g., the calcium ion concentration or the neural membrane potential itself. Current temporal resolution for voltage-sensitive optical imaging is down to 1 ms, and spatial resolution is at the 10 µm (single neuron) level (Quirin, Jackson, Peterka, & Yuste, 2014). Obstacles and limitations for optical imaging include the difficulty of the technique along with the fact that the brain is naturally opaque—although the latter can be altered (see Chung & Deisseroth, 2013).

1.3 Hopfield networks

Hopfield networks are recurrent networks of identical model neurons. Input patterns are presented to the network in much the same fashion as they are for the perceptron. Neurons corresponding to specific values for specific input variables are activated when those values for those variables are present within an input to the network. However, recognition for the input does not manifest as an activation of a “categorization neuron” existing outside the set of input neurons. Recognition occurs when the network enters a stable activity pattern consisting of neurons corresponding to the variable-values for a stored input pattern. The virtue of Hopfield networks is that they are able to perform pattern completion: given an incomplete version of a stored input (a memory), a tuned network is able to access and enter the stable activity state for the completed memory.

The weight matrix, W = (wij), for a Hopfield network consisting of binary neurons is a square, symmetric matrix between the N input neurons and themselves, with diagonal entries zero. Typically the weight matrix is explicitly set such that, for i ≠ j:

w_{ij} = \sum_p f(x_{p,i}, x_{p,j})    (1.21)

where

f(x_1, x_2) = (2x_1 - 1)(2x_2 - 1) = \begin{cases} 1 & x_1 = x_2 \\ -1 & x_1 \neq x_2 \end{cases}    (1.22)

and where x_{p,i} is the activation for the ith neuron in the pth stored pattern, \vec{x}_p (Fausett, 1994). This weight matrix can clearly also be produced by starting with a zero matrix and presenting each pattern to the network using the discrete-time update rule:

\Delta w_{ij} = f(x_i, x_j), \quad i \neq j    (1.23)

Inputs to neurons in a Hopfield network are determined according to the recurrent versionof Eq. (1.5):

I_i = \sum_{j \neq i} w_{ij} x_j + E_i    (1.24)

where the Ii are the inputs to the xi neurons themselves, and Ei is an external input.

Neural activations are updated asynchronously—either in sequential order or by random choice—according to:

x_{i,\text{new}} = \begin{cases} 1 & I_i > \theta_i \\ 0 & \text{otherwise} \end{cases}    (1.25)

Here θi is a threshold that is allowed to be different for different neurons.

The idea for Hopfield networks is that the neurons involved in a stored pattern, \vec{x}_p, will tend to reinforce each other’s activity and suppress activity in neurons not in the pattern. In this way, the stored pattern most closely matching an input vector, \vec{E}, to the network is supposed to emerge victorious as a stable activation pattern in response to \vec{E}.
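
A minimal sketch of a binary Hopfield network implementing Eqs. (1.21)–(1.25): two patterns are stored, and a corrupted cue is completed back to the nearer stored pattern. The patterns and the sequential update schedule are illustrative choices (the text permits either sequential or random asynchronous updates).

```python
# Binary Hopfield network: Hebbian storage and asynchronous recall.
N = 8
patterns = [(1, 1, 1, 1, 0, 0, 0, 0),
            (0, 0, 0, 0, 1, 1, 1, 1)]

# Hebbian weight matrix, Eqs. (1.21)-(1.22); diagonal entries stay zero.
f = lambda a, b: (2 * a - 1) * (2 * b - 1)
w = [[0 if i == j else sum(f(p[i], p[j]) for p in patterns)
      for j in range(N)] for i in range(N)]

theta = [0.0] * N
x = list(patterns[0])
x[0], x[1] = 0, 0          # corrupt the cue: flip two bits of pattern 0

for _ in range(3):         # asynchronous updates in sequential order
    for i in range(N):
        I = sum(w[i][j] * x[j] for j in range(N) if j != i)  # Eq. (1.24), E_i = 0
        x[i] = 1 if I > theta[i] else 0                      # Eq. (1.25)

print(tuple(x))   # → (1, 1, 1, 1, 0, 0, 0, 0): the stored pattern re-emerges
```

The corrupted units receive strong positive input from the intact units of the stored pattern and are switched back on, while units belonging only to the other pattern are suppressed.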

The most interesting aspect of Hopfield networks is that it may be shown that they have a so-called Lyapunov function, or energy function, which always decreases or stays the same when a neuron is updated. The presence of such a function indicates that Hopfield networks have stable attractors, these being local minima of the energy function. These local minima are then the network’s catalogue of responses to inputs, \vec{E}. An input, \vec{E}, will initiate the network in a particular position in its state space, which will belong to the basin of attraction for one of the network’s energy minima. The network will then proceed to that minimum.

The energy function for Hopfield networks is (see Fausett, 1994, pg. 139):

U(\vec{x}, \vec{E}) = -\frac{1}{2} \sum_i \sum_{j \neq i} w_{ij} x_i x_j - \sum_i E_i x_i + \sum_i \theta_i x_i    (1.26)

Due to the symmetry of the weight matrix, W, when a neuron, x_i, updates by an amount, ∆x_i, the accompanying change in the energy, ∆U, is:

\Delta U = \frac{dU}{dx_i} \Delta x_i = -\left( \sum_{j \neq i} w_{ij} x_j + E_i - \theta_i \right) \Delta x_i    (1.27)

Thus, the condition that xi = 1 upon update, which means xi either increases or stays the same, is precisely the condition that U will decrease when xi increases. Conversely, the condition that xi = 0 upon update, meaning that xi either decreases or stays the same, is the condition that U decreases or stays the same when xi decreases. Therefore the energy function, U, always decreases or stays the same, and an updating neuron will always act to decrease U when possible.
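
The Lyapunov property can be checked numerically: evaluating Eq. (1.26) after every asynchronous update of an illustrative two-pattern network (with all E_i = 0) confirms that the energy never increases.

```python
# Numerical check that the Hopfield energy, Eq. (1.26), is non-increasing.
N = 8
patterns = [(1, 1, 0, 0, 1, 0, 1, 0), (0, 1, 1, 0, 0, 1, 1, 0)]
f = lambda a, b: (2 * a - 1) * (2 * b - 1)
w = [[0 if i == j else sum(f(p[i], p[j]) for p in patterns)
      for j in range(N)] for i in range(N)]
theta, E = [0.0] * N, [0.0] * N

def U(x):
    """Energy function, Eq. (1.26)."""
    quad = -0.5 * sum(w[i][j] * x[i] * x[j]
                      for i in range(N) for j in range(N) if j != i)
    return (quad - sum(E[i] * x[i] for i in range(N))
                 + sum(theta[i] * x[i] for i in range(N)))

x = [1, 0, 1, 0, 1, 0, 1, 0]          # arbitrary initial state
energies = [U(x)]
for _ in range(3):                    # asynchronous sequential updates
    for i in range(N):
        I = sum(w[i][j] * x[j] for j in range(N) if j != i) + E[i]
        x[i] = 1 if I > theta[i] else 0
        energies.append(U(x))

print(all(b <= a for a, b in zip(energies, energies[1:])))   # → True
```

Each recorded energy is less than or equal to its predecessor, exactly as Eq. (1.27) guarantees for a symmetric weight matrix with zero diagonal.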

Ideally, the local minima of the energy function will be the stored patterns. Hopfield and others have addressed the problem of determining when this is indeed the case, with Hopfield finding that if the number of stored patterns is less than about 0.15N, where N is the number of neurons, the accuracy is reasonable (see Fausett, 1994, pg. 140).

Hopfield networks have been extended to continuous-activation and continuous-time network models in the same vein as the extension of the perceptron network to such conditions discussed above. See Patterson (1996) or Hopfield (1984) for details.

1.4 Multi-layer feedforward networks and parallel distributed processing

In Sec. 1.2.5, it was noted that single-layer networks are only capable of performing linearly separable classifications of their inputs: positively classified data must lie on one side or another of a flat (linear) hyperplane in the input space, with negatively classified data lying on the opposite side. This is quite a constraint. As was noted, it means the network cannot implement the “exclusive or” classification. Nor can it perform what we might call a “Goldilocks” classification, which rules out input data that is in any way extreme and instead selects only input vectors that are in the near vicinity of a particular point that is “just right.” The brain is clearly able to make such discriminations, and so are neural network models, provided one makes the augmentation that the network contains more than one layer: there must be a set of neurons intermediate to the input and output neurons, which the input neurons connect to and which in turn “innervate” the output neurons. These intermediate neurons serve as intermediate, “hidden” categories facilitating the ultimate categorization being performed.

The intermediate neurons may themselves be segregated into any number of consecutive “layers” for which weighted connections exist between neurons in a given layer and those belonging to the layer or layers subsequent to that layer. In this section we are restricting our attention to so-called “feedforward networks,” for which connections in the reverse direction are not allowed. In practice, multi-layer feedforward networks are usually further restricted such that weighted connections only exist between a layer and its immediate successor, and that will also be our assumption here.

Due to the non-linear nature of the activation function, F(I), neurons in consecutive layers do not simply perform successive linear recombinations of the input variables. Instead, they generate a wide class of classifications, including those that are not linearly separable. However, this increase in capability comes with a price: the method for assigning weight vectors becomes something of a difficulty.

For the perceptron, updating the weight, wij, between an input neuron, xj, and an output neuron, yi, is as simple as multiplying the error on yi, εi = yi − ŷi, by the activity in xj, as in Eq. (1.7). This works, in part, because the direct connection between xj and yi and the non-decreasing nature of F ensure that this weight adjustment will move the input, Ii, to unit yi closer to what it would need to be for yi to take on its target value, ŷi, in the event that the same input is repeated.

In the case of a multi-layer feedforward network, the same procedure works well enough on the weights between the final hidden layer and the output layer. However, it does not apply so well to the connections received by a hidden layer from its predecessor. For one thing, it is not clear from the outset what the activations of hidden-layer units “ought to be” in response to input patterns, so one cannot use an error on hidden-unit activations as a basis for updating weights received by hidden units. At the same time, because weights other than those between the final hidden layer and the output layer do not directly affect the units in the output layer, it is not immediately transparent which direction of change (increasing or decreasing) for these weights will cause any particular output unit to change in a desired direction, or how much change is appropriate.

Parallel distributed processing (PDP) is a multi-layer network paradigm (see Rumelhart & McClelland, 1985) that usually uses feedforward networks and determines weight updates through a calculus called backpropagation. The crux of backpropagation is to use partial differentiation to determine by how much an update to each network weight would increase or decrease the sum of squared errors (SSE) over all output units in the network. Error-reducing updates to weights are then made in proportion to the magnitude of this dependency:

E^2 = \sum_i \varepsilon_i^2 = \sum_i (y_i - \hat{y}_i)^2    (1.28)

\Delta w = -\alpha \frac{\partial}{\partial w} E^2    (1.29)

The calculation is made efficient by exploiting the “chain rule,” which can be used to express the partial derivative of the SSE (E²) with respect to a given weight, w, in terms of the SSE’s partial derivatives with respect to the neural activations in the next layer, which may in turn be expressed in terms of the partial derivatives with respect to activations in the layer after that, and so on. Let x_pre be the activation of a neuron that makes a connection, with weight w, onto a neuron, x_post. We have that:

\frac{\partial}{\partial w} E^2 = \frac{\partial x_{\text{post}}}{\partial w} \frac{\partial}{\partial x_{\text{post}}} E^2    (1.30)

\Rightarrow \quad \frac{\partial}{\partial w} E^2 = x_{\text{pre}}\, F'(I_{\text{post}})\, \frac{\partial}{\partial x_{\text{post}}} E^2    (1.31)

where I_post is the aggregate input to x_post and F is our activation function. If x_post belongs to the output layer, i.e., it is one of the y_i, then ∂E²/∂x_post = 2ε_post. Otherwise, suppose x_post makes connections onto next-layer neurons, x_i, with weights w_i.

We have:

\frac{\partial x_i}{\partial x_{\text{post}}} = \frac{\partial I_i}{\partial x_{\text{post}}} \frac{dx_i}{dI_i} = w_i F'(I_i)    (1.32)

\frac{\partial}{\partial x_{\text{post}}} E^2 = \sum_i \frac{\partial x_i}{\partial x_{\text{post}}} \frac{\partial}{\partial x_i} E^2 = \sum_i w_i F'(I_i) \frac{\partial}{\partial x_i} E^2    (1.33)

This gives a recursion relation for the partial derivatives, ∂E²/∂x. The partial with respect to the activation of a given neuron (x_post in our case) is given in terms of the partials with respect to the activations of the neurons (x_i here) that the neuron links to in the next layer. Since the partials for the final layer are established (∂E²/∂y_i = 2ε_i), all partial derivatives may be found iteratively by “backpropagating” through the network. Once the partials with respect to unit activations are known, the partials with respect to weights may be immediately calculated through Eq. (1.31).
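
The recursion of Eqs. (1.30)–(1.33) can be verified against a finite-difference derivative for a small 2-3-1 feedforward network. The network size, random weights, and input below are illustrative assumptions, not taken from the text.

```python
import math, random

random.seed(1)
F = lambda I: 1.0 / (1.0 + math.exp(-I))   # sigmoidal activation
dF = lambda I: F(I) * (1.0 - F(I))          # its derivative, F'(I)

W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # input->hidden
W2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1)]  # hidden->output
x, y_target = [0.7, -0.2], [1.0]            # one input pattern and correct output

def forward(W1, W2, x):
    I1 = [sum(W1[i][j] * x[j] for j in range(2)) for i in range(3)]
    h = [F(I) for I in I1]                  # hidden-layer activations
    I2 = [sum(W2[i][j] * h[j] for j in range(3)) for i in range(1)]
    return I1, h, I2, [F(I) for I in I2]

def sse(W1, W2):
    y = forward(W1, W2, x)[3]
    return sum((yi - yt) ** 2 for yi, yt in zip(y, y_target))   # Eq. (1.28)

# Backpropagation: output partials (2*eps), then Eq. (1.33), then Eq. (1.31).
I1, h, I2, y = forward(W1, W2, x)
dE_dy = [2.0 * (y[i] - y_target[i]) for i in range(1)]
dE_dh = [sum(W2[i][j] * dF(I2[i]) * dE_dy[i] for i in range(1))
         for j in range(3)]                                     # Eq. (1.33)
grad = x[0] * dF(I1[0]) * dE_dh[0]          # dE^2/dW1[0][0] via Eq. (1.31)

# Finite-difference check of the same partial derivative.
base, delta = sse(W1, W2), 1e-6
W1[0][0] += delta
numeric = (sse(W1, W2) - base) / delta
print(abs(numeric - grad) < 1e-4)   # → True: backprop matches the numerics
```

The agreement holds for any weight in the network, since the recursion is just an organized application of the chain rule.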


The mathematics of backpropagation is further discussed in Fausett (1994); Patterson (1996); and Rumelhart, Hinton, and Williams (1985). While a mathematically elegant method, backpropagation seems to sacrifice the level of biological analogy found in the perceptron model. It seems difficult to propose a way that biological tissue could be innately performing a computation that requires synapses to have access to non-local information pertaining to downstream partial derivatives.

Backpropagation, while perhaps the most widely recognized and adopted, is not the only method used for determining weights for multi-layer feedforward networks. Fausett (1994) and Patterson (1996) discuss several additional well-known algorithms. A biological model for how synaptic weights might be learned in a specific single-hidden-layer network found in the songbird brain is presented in Fiete, Fee, and Seung (2007).

1.5 PDP, connectionism and intelligence theory

Parallel distributed processing is as well known for its position in the ongoing scientific inquiry regarding the mechanisms and nature of intelligence as it is as a widely implemented and discussed pattern recognition algorithm. PDP has been something of a canonical version of the so-called “connectionist” perspective on intelligence, which, in the literature, has sat in contraposition to the “symbolic” paradigm.

Proponents of the symbolic view assert that intelligence amounts to symbol processing. Chess playing, for example, revolves around the manipulation of tokens—chess pieces—among a discrete set of positions on a chess board. Symbolic intelligence theorists believe that the intelligent way to play this game is for a brain or other intelligent system to internally represent possible board configurations with internal symbols and to manipulate these symbols—playing through possible move sequences, calculating advantages and disadvantages for each along the way—in such a way that yields profitable move choices. Furthermore, they believe that the same basic approach of symbol manipulation is the intelligent approach to all problems, given the appropriate symbol-manipulation strategy for the problem (see, e.g., Newell & Simon, 1976).

Connectionist theorists advance that symbol processing is too rigid and constrained to be effective at solving the broad scope of problems encountered in the real world and that, rather, intelligence is something that emerges from the complex self-organized interactions of many simple, small, interconnected processing units such as neurons (see Rumelhart, 1989; Smolensky, 1989). Learned pattern recognition by a perceptron is a simple example of a connectionist success. According to Rumelhart (1989, Sec. 2), multi-layer feedforward networks with backpropagation learning have been successfully applied to problems involving character and speech recognition, sonar detection, molecular structure analysis, game-playing, and simple sentence-parsing.


The PDP program led by Rumelhart, Rogers, McClelland, Hinton, and others at Carnegie Mellon University (and elsewhere) has sought to push the envelope for their brand of connectionism—demonstrating some of the capacities connectionism has. Among the interesting work the group has undertaken has been the application of PDP systems to semantic representation and cognition—a classic domain of symbol-processing approaches to intelligence modeling.

In McClelland and Rogers (2003), the authors review one of the approaches taken by their group to this problem. They discuss a feedforward network originally envisioned by Rumelhart, who by the time of the publication was unfortunately unable to participate. The network consists of two input layers, a “representation” layer (a kind of hidden layer), another hidden layer, and an output layer. One of the input layers contains simple noun-concepts, such as oak tree and salmon. The second input layer contains the basic semantic relations, is-a, is, can, and has-part. The output layer contains relevant categories, adjectives, action-abilities, and parts for the items in the first input layer, such as plant, green, swim, and scales, respectively. The network’s job is to learn to associate pairs of inputs consisting of a single item and a single relation with the appropriate output items. For example, the network is to learn that an oak tree is-a plant and that a salmon has scales. That is, when the oak tree and is-a input units are activated, the network is to activate plant along with any other category units that are appropriate to oak tree, such as living thing—and similarly for all pairs of input items and relations.

The topology for the layers is the following: The noun-concept input layer connects to the representation layer; the relation input layer does not. In this way, the units in the representation layer serve to “represent” the noun-concept units in such a way that a combination of several representation layer units will respond to any given noun-concept. This is called a distributed representation of the noun-concepts. The representation layer and the relation layer connect to the (second) hidden layer, which then is the only layer to connect to the output layer.

The network is initialized with small random weights and is repeatedly exposed to training data consisting of combinations of single noun-concepts, single relations and all appropriate output units corresponding to the noun-relation pair. The backpropagation method is applied to achieve a working set of weights that correctly establishes the semantic mapping.
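A network with this structure can be sketched in a few lines of numpy. The miniature vocabulary, layer sizes, learning rate, and epoch count below are hypothetical stand-ins chosen for illustration, not the values used by Rumelhart; the sketch only exhibits the architecture just described: two input layers (items and relations), a representation layer fed by the items alone, a shared hidden layer, and plain backpropagation of squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
sig = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical miniature vocabulary (stand-ins for the dissertation's examples).
items = ["oak tree", "salmon"]                       # noun-concept input units
relations = ["is-a", "has-part"]                     # relation input units
values = ["plant", "animal", "bark", "scales"]       # output (value) units

# Training facts: (item index, relation index) -> indices of active value units.
facts = {(0, 0): [0],      # oak tree  is-a      plant
         (0, 1): [2],      # oak tree  has-part  bark
         (1, 0): [1],      # salmon    is-a      animal
         (1, 1): [3]}      # salmon    has-part  scales

n_rep, n_hid = 3, 5
W_ir = rng.normal(0, 0.1, (len(items), n_rep))       # item -> representation
W_rh = rng.normal(0, 0.1, (n_rep, n_hid))            # representation -> hidden
W_qh = rng.normal(0, 0.1, (len(relations), n_hid))   # relation -> hidden
W_ho = rng.normal(0, 0.1, (n_hid, len(values)))      # hidden -> output

def forward(i, q):
    rep = sig(W_ir[i])                   # distributed representation of item i
    hid = sig(rep @ W_rh + W_qh[q])      # hidden layer sees rep and relation q
    return rep, hid, sig(hid @ W_ho)

def target(i, q):
    t = np.zeros(len(values))
    t[facts[(i, q)]] = 1.0
    return t

def total_loss():
    return sum(np.sum((forward(i, q)[2] - target(i, q)) ** 2) for i, q in facts)

loss_before = total_loss()
eta = 0.5
for _ in range(6000):                    # plain per-pattern backpropagation
    for i, q in facts:
        rep, hid, out = forward(i, q)
        d_out = (out - target(i, q)) * out * (1 - out)
        d_hid = (d_out @ W_ho.T) * hid * (1 - hid)
        d_rep = (d_hid @ W_rh.T) * rep * (1 - rep)
        W_ho -= eta * np.outer(hid, d_out)
        W_rh -= eta * np.outer(rep, d_hid)
        W_qh[q] -= eta * d_hid
        W_ir[i] -= eta * d_rep
loss_after = total_loss()
```

After training, activating the salmon and has-part units activates scales most strongly; freezing all weights except the item-to-representation rows and training only those on a new item would reproduce, in miniature, the generalization demonstration described above.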

One impressive aspect of this semantic model is its ability to generalize its learning, which is a basic cognitive ability possessed by humans that has proved difficult to reproduce. In this case, after the initial set of item-relation-value combinations has been learned, it is possible to introduce a new item to the network, perhaps elm tree, which is similar to oak tree and can make use of the connections already learned that map oak tree to its related values. To demonstrate this, Rumelhart froze the connections between the hidden layer and the output-value layer and only performed weight updates on the connections between
the noun-concept input layer and the representation layer, and only using is-a training data. The network learned to map the new noun-concept onto an activation pattern in the representation layer similar to those for similar noun-concepts previously learned and thereby was able to activate output neurons corresponding to values the new concept shared with the previously-learned similar concepts.

Another impressive aspect of this work is that it achieves a measure of semantic encoding using what I will call a purely connectionist network. That is, it uses a network that has no semantic attachments added to its links, or edges. This is in contrast with the typical case for network models of semantic information, going back to Quillian (1967), in which edges are labelled with the type of relationship that exists between the concepts (represented by nodes) on either side of the edge. In the model presented in McClelland and Rogers (2003), relations are represented with neurons/nodes/units just like the noun-concepts are. The feedforward structure of the network and its strictly two-part relations somewhat limit its expressibility, however.

The Rumelhart model presented in McClelland and Rogers (2003) is actually a simplified (for the purposes of study) version of an earlier, more versatile recurrent semantic connectionist framework brought forward by Hinton, which is discussed in Hinton et al. (1985). (A “recurrent” network, in contrast to a feedforward network, has directed connections which form closed loops.) Among other things, Hinton’s model seems to be the first to utilize a little-known technique of establishing the roles that items play in a structured representation by using separate “role/identity combination” nodes, or patterns, that bind the items to the roles (see Hinton et al., 1985, pg. 107). This idea has resurfaced in Shastri and Ajjanagadde (1993) and Hummel and Holyoak (2003), and will figure heavily in Chapter 3 of this dissertation.

1.6 Concept theory

The topic of concepts is extremely broad, having been closely scrutinized in the psychological and philosophical traditions for centuries on the one hand and millennia on the other. Concepts are now receiving attention from the neuroscience community as we are seeing evidence for ‘concept neurons’ such as ‘Jennifer Aniston cells’ (see Quiroga et al., 2005). The classical view of concepts was discussed above in Sec. 1.1 and asserts that a concept is equivalent to a clearly delineated list of necessary and sufficient conditions for itself. Subsequent to the work of Eleanor Rosch (see, e.g., Rosch & Mervis, 1975) and others in the 1970s, the classical view of concepts has been all but dismissed by mainstream psychology. Major results contributing to this development include the graded category membership results due to Hampton (1979) mentioned before along with the work of McCloskey and Glucksberg (1978) showing items that are inconsistently judged as being concept members versus not, even by the same subject on trials separated by as little as a
couple of weeks. Psychologists have since recognized (see Murphy, 2002, pg. 21) the value of “fuzzy” concepts: by using them, people avoid the combinatoric explosion problem for stored concepts that has often been seen by neuroscientists as an argument against the possibility of single-concept neurons. Having a different concept for every variation one might encounter on, say, a coffee mug (mug with two handles, mug with double, ‘B’-shaped handle, mug with wide base to avoid spills, mug with a sip-top for insulation, etc.) results in an enormous number of independent concepts that may rarely if ever be used. Having a single loose concept of “coffee mug” is a more efficient system.

One of the major areas of investigation during the last half-century regarding concepts concerns “typicality.” Given a concept, such as chair, people will find different items more or less typical of the concept than others. For example, a high chair (for a toddler to eat from) is an atypical chair, whereas a four-legged, straight-backed dinner-table chair is more typical. Aristotelian concepts have no accounting for typicality: items either have the necessary and sufficient conditions for the concept or they do not. Individual assessments of typicality show reliability across time and across subjects.

Typicality has been shown to influence mental processing including reading and inference speed. For example, when reading a story about an atypical concept member (e.g., a story about a goose: an atypical bird), subjects take longer to process a sentence referring to the member by the concept name (“The bird came in the door”) than they do if the story is about a more typical concept member (e.g., a robin). Typicality also affects likelihood for spontaneous category member generation: when asked to produce examples of a concept, subjects are more likely to produce items that have been rated by others as more typical. For subjects deciding whether an item belongs to a concept or not, the best known predictor for the reliability of such decisions is the typicality of the item for the concept. Typicality is one of the primary phenomena that post-classical theories of concepts have sought to account for.

In general, these views come in two flavors: abstraction theories and exemplar theories (Murphy, 2002). Abstraction theories hold that the stored representation for a concept is at the general level rather than the concrete. Frequency theory and prototype theory are abstraction theories. Frequency theory says a concept is represented by varying levels of association with the features borne by members of the concept. The level of association reflects the proportion of cases for which the feature is present: the feature’s frequency. The more of these features that a given item has, the more typical it is of the concept. Prototype theory suggests that concept membership and typicality are determined by how well an item fits a prototype, which is more structured and may, for example, contain slots, or roles, to be filled by a concept member.

In exemplar theory, the concept is thought to be represented by a stored set of entities that exemplify the concept (Murphy, 2002). The mind decides whether a new stimulus entity is an instance of the concept by comparing the stimulus to the stored exemplars. If
the stimulus is close enough to one or more of them, the conclusion is positive. Similarity to multiple exemplars increases the level of affiliation with the concept in roughly additive fashion. According to Murphy (2002, pg. 73), actual exemplar theorists do not deny a mixed representation for concepts, with some representation occurring at the abstract level. Their intention is to draw attention to the importance of exemplars.

None of these views on concepts is totally consistent with all experimental results, and each of them is partially successful in its own right (Murphy, 2002). Concepts, as they are used by the human organism, seem to be an amalgamation of features, exemplars, prototypes and other aspects. This observation is especially relevant to Chapter 3, where I will present a network representation of concepts in which each concept is encoded by its weighted linkages with features, exemplars, roles, related concepts, and any other relevant items.

1.7 Neural signal analysis

In this section, I will describe some traditional approaches to neural signal analysis as a preface to my own work on the subject. We will turn first to the spectral analysis of time series data, discussing methods for estimating the spectral content of discretely sampled data of finite duration in time. Afterward we will discuss the point process framework for understanding discrete point-like events occurring at probabilistically determined times along with methods for applying this framework to the analysis of neural data. Finally, we will discuss the application of spectral analysis to neural data that presents as a spike train point process realization.

1.7.1 Spectral analysis

A standard technique for analyzing time series data of any sort is spectral analysis, by which the oscillating components of various frequencies within the data are identified and measured. This method allows signals that differ in duration, or whose oscillating components are offset from each other in time, to be compared on the basis of the relative magnitudes of those components. Neural data is no exception. Spectral analysis is a commonly employed technique for analyzing EEG, fMRI, micro-electrode and optical data.

Classically, the Fourier transform, 𝓕, is the operation that extracts frequency information, F(ω), from a function of continuous time, f(t):

$$F(\omega) = \mathcal{F}[f(t)\,;\omega] \equiv \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, \exp\left(-i\omega t\right) dt \qquad (1.34)$$


The amplitudes, F(ω), are complex numbers whose magnitudes indicate the degree to which oscillation at frequency ω is present in f(t) and whose phase angles in complex space provide information about the phase, relative to t = 0, of that oscillation.

The function, f(t), may be recovered from F(ω) by the inverse Fourier transform:

$$f(t) = \mathcal{F}^{-1}[F(\omega)\,; t] \equiv \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(\omega)\, \exp\left(i\omega t\right) d\omega \qquad (1.35)$$

The elegant mathematics of Fourier transforms, well-known to those with rudimentary mathematical training, is of limited direct use when it comes to real data. This is for two reasons. One is that truly continuous numerical data is never seen: data is always sampled at discrete points in time. The other is that infinite durations of time are never studied: data recordings have a start and stop time.

Assuming data samples are equally spaced by ∆t in time, this means the closest one can come to evaluating a Fourier transform on real data is the so-called partial sum,

$$F_N(\omega) = \frac{T}{N\sqrt{2\pi}} \sum_{j=0}^{N-1} f(j\Delta t)\, \exp\left(-i\,\omega\, j\Delta t\right) \qquad (1.36)$$

where N is the number of samples and T = N∆t is the total amount of time sampled (see Brillinger, 1975; Mitra & Bokil, 2007). The F_N symbol is used here to indicate that the partial sum is an estimate of a Fourier transform rather than a Fourier transform per se. The time, t_first, for the first time series data point is, without loss of generality, being treated as the origin in this formula. Note that T = t_last − t_first + ∆t.
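The partial sum is easy to evaluate directly. The numpy sketch below is an illustration under assumed parameters (a 5 Hz cosine sampled at 100 Hz for 40 s), using the Riemann-sum prefactor ∆t/√(2π) = T/(N√(2π)); it shows the estimate concentrating at the oscillation frequency actually present in the data.

```python
import numpy as np

def partial_sum(f_samples, dt, omega):
    """Partial-sum estimate of F(omega) from N equally spaced samples."""
    j = np.arange(len(f_samples))
    return dt / np.sqrt(2 * np.pi) * np.sum(f_samples * np.exp(-1j * omega * j * dt))

# Illustrative data: a 5 Hz cosine sampled at 100 Hz for 40 s.
dt, N = 0.01, 4000
t = np.arange(N) * dt
omega0 = 2 * np.pi * 5.0
f = np.cos(omega0 * t)

peak = abs(partial_sum(f, dt, omega0))        # estimate at the true frequency
off = abs(partial_sum(f, dt, 3 * omega0))     # estimate away from it
```

Here `peak` lands close to the analytic value (T/2)/√(2π) ≈ 7.98 for a unit cosine, while `off` is negligibly small because 15 Hz happens to lie on this sample's frequency grid, 2πj/T.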

For frequency components approaching the frequency, ω = π/∆t, which is called the Nyquist frequency, the resolution of the data begins to adversely affect the accuracy of the summation because only a few points are sampled per oscillatory period. When ω exceeds the Nyquist frequency, the partial sum is no longer valid even as an approximation. If the sampling (angular) frequency, 2π/∆t, is large compared to oscillation frequencies of interest, this is not so great an adversity.

What can be a much more significant issue is that the partial sum unfortunately gives inaccurate values for F(ω) due to the restriction of the sum to the finite interval, T. This inaccuracy may be understood by recognizing that the partial sum is really an approximation to the Fourier transform of the function, f_T(t) = b_T(t − T/2) · f(t), where b_T(t) is the “boxcar” function,

$$b_T(t) = \begin{cases} 1 &: -T/2 \le t \le T/2 \\ 0 &: \text{otherwise} \end{cases} \qquad (1.37)$$


The convolution theorem for Fourier transforms states that:

$$\mathcal{F}[f(t) \cdot g(t)] = \mathcal{F}[f(t)] * \mathcal{F}[g(t)] \qquad (1.38)$$

where ∗ is the convolution operation on functions:

$$F(\omega) * G(\omega) = \int_{-\infty}^{\infty} F(\omega - \omega')\, G(\omega')\, d\omega' \qquad (1.39)$$

Therefore,

$$\mathcal{F}[f_T(t)] = \mathcal{F}[f(t)] * \mathcal{F}[b_T(t - T/2)] = F(\omega) * \left(e^{-i\omega T/2}\, B_T(\omega)\right) \qquad (1.40)$$

$$\Rightarrow\quad \mathcal{F}[f_T(t)] = F(\omega) * \left(\sqrt{\frac{2}{\pi}}\; e^{-i\omega T/2}\, \frac{\sin(\omega T/2)}{\omega}\right) \qquad (1.41)$$

The convolution acts to diffuse frequency amplitudes together. Substantial diffusion occurs over a “narrowband” width, ∆ω = 2π/T. This effect is not such a terrible encumbrance and may be considered the price one has to pay for using a finite sample. As the sample size becomes large, this effect diminishes in inverse proportion. A larger issue is that, for the partial sum, diffusion of frequencies continues outside this narrowband to a degree inversely proportional to ∆ω. This “broadband” effect is quite detrimental to the fine details of the spectrum, which one generally wants to preserve as much as possible when analyzing and comparing spectra.

The issue is dealt with by introducing window functions, w_T(t), into the estimator, which act to soften the “hard” boundaries of the sampled interval:

$$F_N(\omega) = \frac{T}{N\sqrt{2\pi}} \sum_{j=0}^{N-1} w_T(j\Delta t - T/2)\, f(j\Delta t)\, \exp\left(-i\,\omega\, j\Delta t\right) \qquad (1.42)$$

The symmetric window function, w_T(t), is defined to be zero outside the domain −T/2 ≤ t ≤ T/2; however, the transition to zero is made much more smoothly.

The resulting expression is now an approximation to the Fourier transform,

$$\mathcal{F}[f(t) \cdot w_T(t - T/2)] = \mathcal{F}[f(t)] * \mathcal{F}[w_T(t - T/2)] = F(\omega) * \left(e^{-i\omega T/2}\, W_T(\omega)\right) \qquad (1.43)$$

There is still an unavoidable convolutional spreading of the frequency information. However, one now has control over that spreading by way of the shape of W_T(ω). Higher frequency elements in w(t) give rise to wider convolutional spreading, which is also called “spectral leakage.” Because the boxcar function, b_T(t), goes to zero at t = ±T/2 in perhaps the most dramatic and sudden way possible, the ordinary partial sum, Eq. (1.36),
suffers from a great deal of spectral leakage. Judicious choices for the window function, w_T(t), will approach zero at t = ±T/2 in a more graduated manner and have been the subject of considerable research and discussion spanning many decades (Brillinger, 1975; Harris, 1978). A particularly effective family of windows, known as the prolate spheroidal wavefunctions, or Slepian sequences, emerged starting in the 1960s.
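The broadband leakage of the boxcar, and its suppression by a window that reaches zero gradually, can be seen numerically. The Hann window used below is merely a convenient smooth stand-in (the prolate spheroidal windows discussed next are the optimized choice), and the tone frequency and scan band are arbitrary illustrative assumptions.

```python
import numpy as np

dt, N = 0.01, 1024
t = np.arange(N) * dt

# A tone deliberately off the frequency grid, so leakage is visible.
omega0 = 2 * np.pi * 10.25
f = np.cos(omega0 * t)

def windowed_sum(w, omega):
    j = np.arange(N)
    return dt / np.sqrt(2 * np.pi) * np.sum(w * f * np.exp(-1j * omega * j * dt))

boxcar = np.ones(N)
hann = 0.5 * (1.0 - np.cos(2 * np.pi * np.arange(N) / (N - 1)))  # smooth taper

# Worst-case leakage over a band far from the 10.25 Hz peak.
far = 2 * np.pi * np.arange(25.0, 40.0, 0.1)
leak_box = max(abs(windowed_sum(boxcar, om)) for om in far)
leak_hann = max(abs(windowed_sum(hann, om)) for om in far)
```

The smooth window suppresses broadband leakage by orders of magnitude, at the cost of a somewhat wider narrowband (main lobe).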

Prolate spheroidal wavefunctions

Since a window function, w(t), alters the spectrum of a data function, f(t), by convolution, $\mathcal{F}[w(t - T/2) \cdot f(t)\,;\omega] = \left(e^{-i\omega T/2}\, \mathcal{F}[w(t)\,;\omega]\right) * \mathcal{F}[f(t)\,;\omega]$, the priority for a window function is to restrict its bandwidth as much as possible under the constraint that w(t) goes to zero at the endpoints of its domain, −T/2 and T/2.

An approach to expressing this priority, mathematically, is to consider two operators on function space, one that restricts a function in the time domain and another that restricts it in the frequency domain (Slepian & Pollak, 1961):

$$R_t[f(t)\,;T] = b_T(t) \cdot f(t) \qquad (1.44)$$

$$R_\omega[f(t)\,;\Omega] = \mathcal{F}^{-1}\left[b_\Omega(\omega) \cdot \mathcal{F}[f(t)\,;\omega]\,; t\right] \qquad (1.45)$$

$$\phantom{R_\omega[f(t)\,;\Omega]} = B_\Omega(t) * f(t)\,, \quad \text{where } B_\Omega(t) \equiv \mathcal{F}^{-1}[b_\Omega(\omega)] \qquad (1.46)$$

Here bX is the boxcar function as before (see Eq. (1.37)). Both restriction operators are self-adjoint. (This is true by construction for the first operator and must also be true for the second operator since it is self-adjoint in the frequency basis.) It follows that their composition, Rω,t[f(t) ; Ω, T] = Rω[Rt[f(t) ;T] ; Ω], is also self-adjoint. Applying Rω,t is a way, roughly speaking, of finding the component of a function that is inside the range [−T/2, T/2] in the time domain and [−Ω/2, Ω/2] in the frequency domain.

Since Rω,t is self-adjoint, it has a complete set of eigenfunctions, ei(t), with eigenvalues, λi. The larger the eigenvalue, the more concentrated the eigenfunction is in the time and frequency domains, [−T/2, T/2] and [−Ω/2, Ω/2]. Since all suitably well-behaved functions will be a sum of the eigenfunctions of Rω,t, any function other than the eigenfunction, e1, with the largest eigenvalue, λ1, will be diminished more by application of Rω,t than e1 itself. This makes e1 optimally concentrated, in a meaningful sense, in the [−T/2, T/2] and [−Ω/2, Ω/2] time and frequency domains.

The e1 function is called the first prolate spheroidal wavefunction, for parameters Ω and T. Generally speaking, for i < ΩT/(2π) − 1, the ith prolate spheroidal wavefunction, ei, has an eigenvalue, λi, close to unity and therefore is a relevant choice of window function for desired frequency resolution, Ω.


Discrete versions of the prolate spheroidal wavefunctions may be obtained by applying the same basic method to functions of discrete time (or sequences). Consider the N-dimensional vector space of functions, f(tk), k ∈ {0, 1, . . . , N − 1}, defined at the discrete points, tk = −T/2 + k∆t, over the domain [−T/2, −T/2 + (N − 1)∆t]. One may define a frequency band-limitation operator on this vector space by way of the Fourier transform for the space:

$$\mathcal{F}_{N,T}[f(t_k)\,;\omega_j] = \frac{T}{N} \sum_{k=0}^{N-1} f(t_k)\, \exp\left(-i\,\omega_j\, t_k\right) \qquad (1.47)$$

where the ωj are the discrete frequencies 2πj/T, j ∈ ℤ. The inverse transform is:

$$\mathcal{F}_{N,T}^{-1}[F(\omega_k)\,; t_j] = \frac{1}{T} \sum_{k=0}^{N-1} F(\omega_k)\, \exp\left(i\,\omega_k\, t_j\right) \qquad (1.48)$$

Once again, we may use these transforms to implement a bandwidth limitation operator (the time domain for the functions is limited by construction):

$$R_\omega[f(t_k)\,;\Omega] = \mathcal{F}_{N,T}^{-1}\left[b_\Omega(\omega) \cdot \mathcal{F}_{N,T}[f(t_k)\,;\omega_j]\,; t_i\right] \qquad (1.49)$$

The N-dimensional eigenvectors of this self-adjoint operator are known as the discrete prolate spheroidal wavefunctions. For large N (and fixed T), they approach the prolate spheroidal wavefunctions. Because the partial sum is essentially the same operation as the finite N-dimensional Fourier transform, the discrete prolate spheroidal wavefunctions are better tuned to serve as window functions for the partial sum than their continuous counterparts. The general rule remains that for j < ΩT/(2π) − 1, the jth discrete prolate spheroidal wavefunction, ej(ti), has an eigenvalue sufficiently close to unity to be useful as a window function for the partial sum (Mitra & Bokil, 2007).

The multitaper method

Because the (discrete or continuous) prolate spheroidal wavefunctions are eigenvectors of a self-adjoint operator, they must all be mutually orthogonal. This means that the projections of any vector, g(ti), onto two (or more) different discrete prolate spheroidal wavefunctions will be independent quantities, provided the components of the original vector are also independent. Note also that the windowed partial sum, Eq. (1.42), is the projection of the vector, f(ti) · exp(−iωti), onto the window function, w(ti). This means that using different discrete prolate spheroidal wavefunctions in the windowed partial sum estimate of F(ω) yields independent estimates in case f is a realization of a white noise process.


Combined with the fact that the first ΩT/(2π) − 1 discrete prolate spheroidal wavefunctions are useful for spectral analysis at a given frequency resolution, Ω, this makes for a very useful method. Given the desired frequency resolution, Ω, and the time extent of the data, T, one may obtain up to ΩT/(2π) − 1 independent estimates of F(ω) by using the first ΩT/(2π) − 1 discrete spheroidal wavefunctions as window functions in the partial sum.

Once the ΩT/(2π) − 1 estimates have been made, statistical methods may be employed to yield a value-and-uncertainty estimate for |F(ω)|² (Mitra & Bokil, 2007; Thomson, 1982; Walden, McCoy, & Percival, 1994). The mean of the estimates of |F|² follows a chi-square distribution with 2ne degrees of freedom, where ne is the number of estimates (see Mitra & Bokil, 2007, pg. 195). With some caveats (see Walden et al., 1994), this holds if the process is Gaussian and, for large T, is approximately true in general.
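A bare-bones multitaper estimate can be assembled from these sequences: window the data with each of the K well-concentrated tapers, square the transforms, and average. The test signal (a sinusoid in unit-variance white noise) and the parameters N and NW below are assumptions for illustration, and the chi-square confidence-interval machinery is omitted; the sketch only shows the averaging of independent per-taper estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
N, NW = 512, 4.0
K = int(2 * NW) - 1              # number of well-concentrated tapers
W = NW / N

# Discrete prolate spheroidal sequences, as in the restriction-operator
# construction: top eigenvectors of the band-limiting kernel.
idx = np.arange(N)
diff = idx[:, None] - idx[None, :]
with np.errstate(invalid="ignore", divide="ignore"):
    A = np.sin(2 * np.pi * W * diff) / (np.pi * diff)
A[idx, idx] = 2 * W
tapers = np.linalg.eigh(A)[1][:, -K:]     # columns: the K most concentrated

# Illustrative data: a sinusoid at 0.1 cycles/sample in white noise.
x = np.cos(2 * np.pi * 0.1 * np.arange(N)) + rng.normal(0.0, 1.0, N)

# One direct spectral estimate per taper, then average the K of them.
freqs = np.fft.rfftfreq(N)
per_taper = np.abs(np.fft.rfft(tapers.T * x, axis=1)) ** 2
S = per_taper.mean(axis=0)

f_peak = freqs[np.argmax(S)]
```

Averaging K roughly independent estimates reduces the variance of the spectrum by about 1/K while limiting the resolution loss to the chosen bandwidth; the peak lands within the design bandwidth W of the true frequency.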

This is the multitaper method, and it is a standard approach to spectral analysis in a variety of contexts not limited to the neurosciences. With specific regard to the latter, the multitaper method is applicable to micro-electrode, EEG, fMRI or optical data, along with simulation data, and in general any data that presents as a time series.

1.7.2 Point process formalism

The point process framework describes discrete, point-like objects (points) that exist in a continuous space, with the understanding that the specific set of loci at which the points occur is a random variable (Cox & Isham, 1980). In the standard case, points are qualitatively identical, instantaneous events in time. The term sample path is used for specific realizations of this type of point process. Point process theory lends itself to neural signal analysis since the narrow-width and all-or-none character of spikes matches the ideal of identical point-like events well.

There are several ways of formulating the probabilities involved in point processes. One is to consider regions (possibly non-contiguous) within the space, S, in which the points occur. For any (countable) collection of such regions, a joint probability may be assigned to the different possibilities for the number of points occurring in each region.

Said another way, if each Ai (i ∈ ℕ) is a subset of S, and N(Ai) represents the number of points occurring in Ai, we may assign a joint probability:

$$P\left(N(A_1) = n_1;\, N(A_2) = n_2;\, \ldots\right) \qquad (1.50)$$

for each collection of subsets, Ai, and for each collection of non-negative integer values, ni, viz., the number of points in each set. If this probability is defined for every possible collection of subsets, Ai, and every possible number of points occurring in the Ai, then the point process is defined (Cox & Isham, 1980).


For the case that S is one-dimensional, as is the case for spike point processes for which S is T, the set of all times, there is another way of defining the point process, called the complete intensity function, which can be more convenient. The complete intensity function gives the probability density for a point to occur at time, t, in a particular sample path, given the history, Ht, of points occurring in the sample path at times less than t (Cox & Isham, 1980, pg. 9):

$$p_I(t\,;H_t) = \lim_{\delta \to 0^+} P\left(N(t, t + \delta) > 0 \mid H_t\right)/\,\delta \qquad (1.51)$$

where N(t1, t2) is the number of points occurring in the interval, [t1, t2). When using the intensity function description, it is generally assumed that coincident points occur with, at most, vanishing likelihood.

The final characterization we will consider for 1-dimensional point processes is that of assigning joint probabilities for sequences of intervals between consecutive points (Cox & Isham, 1980). For example, if ∆ti, i ∈ ℕ, describes a sequence of waiting times for point events starting from a suitable origin, t = 0, then, assuming waiting times of 0 duration only occur with vanishing likelihood, one may define the joint probability density:

$$p_\Delta(\Delta t_1, \Delta t_2, \ldots, \Delta t_n) = \lim_{\delta_1, \delta_2, \ldots, \delta_n \to 0} \frac{1}{\delta_1\, \delta_2 \cdots \delta_n}\, P\big(t_1 \in [\Delta t_1, \Delta t_1 + \delta_1)\,;\; t_2 - t_1 \in [\Delta t_2, \Delta t_2 + \delta_2)\,;\; \ldots\,;\; t_n - t_{n-1} \in [\Delta t_n, \Delta t_n + \delta_n)\big) \qquad (1.52)$$

where ti is the time of the ith point. When defined for all sequences of intervals, the density characterizes the point process.

According to Cox and Isham (1980, pg. 11), all three of these methods for characterizing point processes are equivalent in the case that multiple simultaneous events never or almost never occur (the former being the case for neural spikes); i.e., they may each be seen as offering a complete definition of the point process.

While, in principle, any conceivable history-dependence is possible for point processes, the more intricate the dependence the more difficult the process is to analyze and use as a model. For this reason, the two processes that receive the most emphasis are Poisson processes and renewal processes. In a Poisson process, the history dependence for the intensity function is completely absent, and separate point events are statistically independent of each other. Note that time-dependence for the Poisson process can still occur if the intensity function is time-dependent or dependent on external variables. In a renewal process, the history-dependence for the intensity function extends only to the time of the most recent event.


Poisson Processes

In a Poisson process, all events are independent, and the intensity function is:

$$p_I(t\,;H_t) = p_I(t) = \lambda \qquad (1.53)$$

where λ is the rate for the process (not to be confused with the firing rate for a neuron obeying the process, although the two may coincide). Strictly speaking, the case that λ is constant is called a homogeneous Poisson process. We may also have an inhomogeneous Poisson process for which λ is a function of time, λ(t), or is a function of other variables that are themselves functions of time, λ(t, X(t)).

The number of events occurring within an interval of time for a homogeneous Poisson process is given by the Poisson distribution:

$$P\left(N(t_1, t_2) = n\right) = \left(\lambda \cdot (t_2 - t_1)\right)^n \exp\left(-\lambda \cdot (t_2 - t_1)\right)/\,n! \qquad (1.54)$$

In the case of inhomogeneous point processes, this result, strictly speaking, does not hold. However, if one works with rescaled time, $z(t) = \int_0^t \lambda(t')\, dt'$, it has an analog (Mitra & Bokil, 2007):

$$P\left(N(t_1, t_2) = n\right) = \left(z(t_2) - z(t_1)\right)^n \exp\left[-\left(z(t_2) - z(t_1)\right)\right]/\,n! \qquad (1.55)$$

For both homogeneous and inhomogeneous Poisson processes, the ratio of variance to mean for the point count, Var[N(t1, t2)]/Exp[N(t1, t2)], is strictly unity. The numbers of points occurring in non-overlapping intervals of time are independent.
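These properties are easy to check by simulation. The sketch below draws homogeneous Poisson spike trains from i.i.d. exponential waiting times (the rate, duration, and trial count are arbitrary illustrative choices) and confirms that the variance-to-mean ratio of the counts is near unity.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, T_obs, n_trials = 20.0, 10.0, 2000    # rate (1/s), duration (s), trials

# A homogeneous Poisson process has i.i.d. exponential waiting times, so the
# count in [0, T_obs) is the number of cumulative waiting times below T_obs.
counts = np.empty(n_trials, dtype=int)
for k in range(n_trials):
    waits = rng.exponential(1.0 / lam, size=int(3 * lam * T_obs))
    counts[k] = int(np.sum(np.cumsum(waits) < T_obs))

mean_count = counts.mean()                 # should be near lam * T_obs = 200
fano = counts.var() / mean_count           # variance-to-mean ratio, near 1
```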

Renewal Processes

For renewal processes, the intensity function depends on the history of the process only by way of the time of the most recent event:

$$p_I(t\,;H_t) = \nu(t\,; t_{\text{last}}) \qquad (1.56)$$

In principle, ν can have any time-dependence or even depend on external variables. The case that ν is strictly a function of the time that has elapsed since t_last, ν(t; t_last) = ν(t − t_last), is called a stationary renewal process.

A key feature of stationary renewal processes concerns the sequence of intervals between points in a sample path for the process: ∆ti = ti − ti−1. Since the intensity function for stationary renewal processes depends only on the time since the most recent event,
all intervals in the process are statistically independent of each other. The process can therefore be characterized by the probability density for single intervals between points, p∆(∆t): the joint interval distribution, Eq. (1.52), being the product of factors, p∆(∆ti), if the origin coincides with a point. Otherwise the probability density for ∆t1 = t1 will be different (Cox & Isham, 1980). Independence of intervals also means that the power spectrum for the interval sequence is frequency independent, i.e., a white noise spectrum (Mitra & Bokil, 2007). If ν is a more general function, ν(t; t_last), intervals need not be statistically independent and a frequency-independent power spectrum is not guaranteed. This is particularly true if the time-dependence for ν contains oscillations.
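A quick simulation illustrates both points for one concrete stationary renewal process. Gamma-distributed intervals with shape 4 (a common, though here entirely illustrative, model of fairly regular firing) give independent successive intervals, while the counts over fixed windows are sub-Poisson, so the variance falls well below the Poisson benchmark.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stationary renewal process: i.i.d. gamma intervals (shape 4, mean 50 ms).
shape, mean_isi, n_int = 4.0, 0.05, 200_000
intervals = rng.gamma(shape, mean_isi / shape, size=n_int)

# Independence of successive intervals: lag-1 correlation should vanish.
lag1 = np.corrcoef(intervals[:-1], intervals[1:])[0, 1]

# Counts in 1 s windows: regular firing gives variance below the mean.
times = np.cumsum(intervals)
n_win = int(times[-1] // 1.0)
counts = np.histogram(times, bins=np.arange(n_win + 1, dtype=float))[0]
fano = counts.var() / counts.mean()
```

For gamma intervals with shape 4, the asymptotic variance-to-mean ratio of the counts is the squared coefficient of variation of the intervals, 1/4, so a Fano factor well below unity flags the departure from any Poisson model.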

Applications to Neural Analysis

Point process theory has been applied to neural analysis in many different ways. One has been to consider neurons to follow an inhomogeneous Poisson process and to examine the dependence of the rate parameter, λ, on variables pertaining to an external stimulus and on time.

The data for this is gathered by observing the same neuron across multiple trials during which the organism the neuron belongs to is exposed to qualitatively identical stimuli. A frequent approach to estimating λ is to define time as being relative to the stimulus onset, subdivide time into bins of suitably narrow width and then count the number of spikes, over all trials, that occur in each bin. This measure is called the peri-stimulus time histogram (PSTH). The spike counts, when normalized by the number of trials, serve as an estimate for the probability density for spikes in the vicinity of the bin, that is, as an estimate for the local rate parameter, λ(t).

One problem with the time-binning approach is that deciding on bin widths is a sensitive and non-systematic procedure. If the widths are too large, one misses out on the high frequency details of the time-dependence for λ. If the widths are too narrow, one suffers from an insufficient number of samples to accurately determine λ when λ is small.

Additionally, the edges of bins are rigid boundaries. As such they introduce noise into the rate parameter estimate. This issue may be resolved by using a smoothing kernel, such as a Gaussian kernel, applied to the spike trains to perform the estimation rather than using time-binning. This method still suffers from the issue of how to determine the width of the kernel, with wide kernels discarding rapidly changing components in λ(t) and narrow kernels suffering from imprecise estimates for low values of λ. Estimation methods addressing these remaining issues may be found in Mitra and Bokil (2007, Ch. 13).
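Both estimators take only a few lines. Below, spikes are drawn from a hypothetical inhomogeneous Poisson rate (a Gaussian bump on a 5 spikes/s baseline, simulated by thinning) and λ(t) is then estimated two ways: a binned PSTH and a Gaussian-kernel smoother. Every parameter, including the bin and kernel widths whose selection is exactly the issue discussed above, is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def lam(t):                       # hypothetical stimulus-locked rate (spikes/s)
    return 5.0 + 30.0 * np.exp(-((t - 0.5) ** 2) / (2 * 0.05 ** 2))

T_obs, lam_max, n_trials = 1.0, 40.0, 300
pooled = []                       # spike times pooled over trials
for _ in range(n_trials):
    # Thinning: homogeneous candidates at lam_max, kept w.p. lam(t)/lam_max.
    cand = rng.uniform(0.0, T_obs, rng.poisson(lam_max * T_obs))
    pooled.append(cand[rng.uniform(0.0, lam_max, cand.size) < lam(cand)])
spikes = np.concatenate(pooled)

# PSTH: bin counts, normalized by trial count and bin width.
dt = 0.02
edges = np.arange(0.0, T_obs + dt, dt)
psth = np.histogram(spikes, bins=edges)[0] / (n_trials * dt)

# Kernel estimate: replace each spike with a Gaussian bump instead of binning.
grid = np.arange(0.0, T_obs, 0.005)
sigma = 0.02
bumps = np.exp(-((grid[:, None] - spikes[None, :]) ** 2) / (2 * sigma ** 2))
rate_k = bumps.sum(axis=1) / (n_trials * sigma * np.sqrt(2 * np.pi))
```

Both estimates recover the bump near t = 0.5 s and the baseline elsewhere; the kernel version is smooth where the PSTH has bin-edge steps.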

Scholars have also developed inhomogeneous Poisson models for which the rate parameter depends not directly on time but rather, in an interesting way, on both internal network variables (such as the local phase of network activity oscillations) as well as external
situational or stimulus variables. See Brown, Frank, Tang, Quirk, and Wilson (1998) for an application of this method to hippocampal place cells.

Another point-process-oriented approach has been to examine the degree to which a Poisson process model fits neural behavior by examining the Fano factor,

$$F(T) \equiv \mathrm{Var}\left[N(T)\right]/\,\mathrm{Exp}\left[N(T)\right] \qquad (1.57)$$

where N(T) is the number of spikes occurring in intervals of time, T. As we have noted, this ratio is strictly unity for both homogeneous and inhomogeneous Poisson processes. By evaluating it for observed spike trains, the analyst may draw some inference as to the accuracy of the history-independent picture for the intensity function.

Analysts have also examined the degree to which a stationary renewal process model of neural activity applies by calculating the power spectrum of the interval sequence for spike trains. As discussed above, this spectrum should be flat. A spectrum that varies with frequency indicates the stationary renewal process point of view is incomplete.

1.7.3 Spike train spectral analysis

Spectral analysis on realizations of point processes may be performed by first transforming the sequence of point times,

$$s = (t_1, t_2, \ldots, t_i, \ldots, t_n) \qquad (1.58)$$

into a function of time, fs(t), by substituting each point (viz., each spike, in our case) with a Dirac delta function:

$$f_s(t) = \sum_{i=1}^{n} \delta(t - t_i) \qquad (1.59)$$

A Fourier transform may then be performed:

$$F_s(\omega) = \int_{-\infty}^{\infty} f_s(t)\, \exp(-i\omega t)\, dt \qquad (1.60)$$

$$\phantom{F_s(\omega)} = \sum_{i=1}^{n} \exp(-i\omega t_i) \qquad (1.61)$$

One may also use a windowing function, w(t), with the transform, giving:

$$F_s(\omega) = \sum_{i=1}^{n} w(t_i)\, \exp(-i\omega t_i) \qquad (1.62)$$
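The delta-function sum is trivial to evaluate directly. The spike train below is made-up data (roughly periodic 8 Hz spiking with a small assumed jitter); the power |F_s(ω)|² then peaks at the firing frequency.

```python
import numpy as np

rng = np.random.default_rng(5)

# Made-up data: roughly periodic 8 Hz spiking with 2 ms jitter, for 10 s.
t_spikes = np.arange(0.0, 10.0, 1.0 / 8.0) + rng.normal(0.0, 0.002, 80)

def F_s(omega):
    """Delta-function spike-train transform: sum of exp(-i omega t_i)."""
    return np.sum(np.exp(-1j * omega * t_spikes))

freqs = np.arange(1.0, 12.0, 0.05)                  # Hz
power = np.array([abs(F_s(2 * np.pi * nu)) ** 2 for nu in freqs])
f_peak = freqs[np.argmax(power)]
```

The peak at 8 Hz has power near n² = 6400 (all n phasors nearly aligned), while frequencies unrelated to the spiking rhythm contribute only a small incoherent residue.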


Another approach is to transform the spike train into a binary function, g_s(t), by subdividing the observation interval into finitely-many bins, [t_j, t_j + Δt), which are of sufficiently narrow width, Δt, that no bin contains more than one spike:

g_s(t) = \begin{cases} 1 & : t \text{ belongs to the same bin, } [t_j, t_j + Δt), \text{ as a spike, } t_i \in s \\ 0 & : \text{otherwise} \end{cases} \qquad (1.63)

and then computing the windowed partial sum, Eq. 1.42:

G_s(ω) = \frac{T}{\sqrt{2π}} \sum_{j=0}^{N-1} w_T(jΔt - T/2)\, g_s(jΔt) \exp(-iωjΔt) \qquad (1.64)

This approach has the advantage of not disguising the underlying frequency band limitation imposed by the experimental sampling process. Furthermore, when a large number of spikes is present, applying an efficient algorithm, such as the fast Fourier transform, makes this method of computing the spectrum roughly as fast as the delta function approach (Mitra & Bokil, 2007, Sec. 7.7).
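A sketch (mine) of the time-binned route: build the binary function g_s of Eq. (1.63), apply a window (a Hann window here, an illustrative choice of w_T), and compute the spectrum with the fast Fourier transform; the T/\sqrt{2π} prefactor of Eq. (1.64) is omitted since only relative power matters in this illustration.

```python
import numpy as np

def binned_spectrum(spike_times, T, dt):
    """Binary-bin a spike train (Eq. 1.63), apply a Hann window,
    and compute the power spectrum by FFT (cf. Eq. 1.64)."""
    n_bins = int(round(T / dt))
    g = np.zeros(n_bins)
    idx = np.rint(np.asarray(spike_times) / dt).astype(int)
    g[idx[idx < n_bins]] = 1.0            # binary: at most one spike per bin
    windowed = np.hanning(n_bins) * g     # illustrative choice of w_T
    freqs = np.fft.rfftfreq(n_bins, d=dt)
    return freqs, np.abs(np.fft.rfft(windowed))**2

# A 10 Hz periodic train, 10 s at 1 kHz sampling: power peaks at 10 Hz.
freqs, power = binned_spectrum(np.arange(0.0, 10.0, 0.1), T=10.0, dt=0.001)
```

The sampling step dt sets the highest resolvable frequency (1/(2 dt)), which is the band limitation the paragraph above refers to.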

The multitaper method may be used with either the delta function approach or the time-binned approach to yield a value-and-uncertainty estimate of the spectrum. Care must be taken, however, regarding the number of degrees of freedom, which, regardless of how many windowed estimates are used, does not exceed the total number of spikes (see Mitra & Bokil, 2007, Sec. 7.7.2).

1.8 Concluding remarks and itemization of specific contributions

In this chapter, we have reviewed some basics of neuroscience, neural signal analysis, artificial neural networks, and the psychology of concepts. These topics have prepared the ground for what follows. In the next chapter, I will present my contribution to the topic of neural signal analysis, which is a mathematical metric for neural membrane potential recordings. The metric provides a quantitative basis for describing differences between recordings, which could come either from microelectrode or optical imaging studies (see Sec. 1.2.6). This metric is designed to be sensitive to the presence of action potentials as important features in the signal. Such features make straightforward approaches to a membrane potential recording metric inadequate, as we will discuss.

The specific research contributions I have made with this project are:

1. The generalization of van Rossum's (2001) metric on spike trains to the idea of a convolution-based membrane potential metric (Sec. 2.3).


2. A condition on the kernel function used for a convolution-based membrane potential metric that yields first-order dependence for the metric on both spike time differences and membrane potential differences (Eq. (2.30)).

3. A kernel that fulfills the condition and yields other important properties (Eq. (2.49)).

4. The resulting convolution-based membrane potential metric (Eq. (2.52)).

In the third chapter, we will discuss a method I have developed for using connectionist networks to represent complex semantics. As discussed in Sec. 1.5, the connectionist approach to intelligence is well-known for its versatility, adaptability, and success at solving pattern detection and constraint satisfaction problems, which have often stumped classic symbolic approaches to artificial intelligence. On the other hand, representing situations and stimuli with complex internal relationships, components and structure has been a difficulty for connectionism, which, as was discussed in Sec. 1.5, has certainly been applied to semantics albeit in a somewhat limited fashion. In Chapter 3, I will present a scheme for representing a wide class of complex and combinable semantic structures, including recursion, within a purely connectionist paradigm. The framework is consistent with ideas from the psychology community regarding concepts as discussed in Sec. 1.6. I will discuss a computer program I have developed to implement and test my assertion that the connectionist networks are representing the structures I say they are by translating back and forth between formal language and connectionist representations.

With the project, I have made the following specific research contributions:

1. A generalized combinatorial syntax for nodes in a connectionist network and therefore a possible internal "language" for semantic representation in the brain (Sec. 3.4).

2. A proof-of-concept for the network syntax—a program that translates formal language syntax into the network syntax and vice versa (Sec. 3.5).


2 Convolution-based Metric for Neural Membrane Potentials

2.1 Introduction

Electrophysiological methods, voltage-sensitive fluorescence (Scanziani & Hausser, 2009) and computer simulations (e.g., Hellgren, Grillner, & Lansner, 1992) are all ways that neuroscientists access membrane potential time courses for (real or in silico) neurons and ensembles thereof. In the effort to analyze these signals, a variety of approaches are often taken, including spectral analysis, point-process intensity function estimation, the peri-stimulus time histogram and others (see Mitra & Bokil, 2007). While the results of such analyses provide valid and important avenues for signal comparison, it is also convenient to have a measure that compares signals more directly: without recourse to time-binning, a statistical model or other major processing (Paiva, Park, & Príncipe, 2009). It is therefore worthwhile to have a metric that, given two membrane potential trajectories, computes a meaningful, non-negative "distance" between them. Adding distances between trajectories for individual neurons gives an ensemble distance.

In the case of spike trains, considerable work has been done to develop this kind of measure (see Paiva, Park, & Príncipe, 2010). The conventional approach to comparing spike trains has been to use a time-binning procedure to transform trains into finite-dimensional vectors and to evaluate, e.g., a Euclidean distance on the vectors. Such methods have major drawbacks, however, if precise spike timing is of interest, as several authors note (Schreiber, Fellous, Whitmer, Tiesinga, & Sejnowski, 2003; van Rossum, 2001; Victor & Purpura, 1997). For example, the binning procedure is insensitive to timing differences that do not change a spike's bin, and it treats a timing difference of a single bin the same as a difference of any (nonzero) number. The drawbacks of time-binning have led to an effort to develop what are called binless spike train measures. Two well-known binless measures are the Victor-Purpura metric and the van Rossum metric (van Rossum, 2001; Victor & Purpura, 1997).

Spikes and relative spike timing are very meaningful components of the membrane potential time course; however, analyses attending exclusively to spikes provide only part of the information available in the signal. As Lennie (2003) has pointed out, energy considerations indicate that neurons spend the bulk of their time in an inactive, non-firing state, i.e., they spend more time "listening" than they do "talking." Lennie estimates about 0.16 spikes per neuron per second in the awake human brain, and suggests that optimally no more than 4% of neurons are actively firing at any time—which corresponds to a 4 Hz average



Figure 2.1: Point-comparison metrics and membrane potentials. In A, the action potential timing is a closer match than in B, but according to d_{pt}, since the mismatched action potentials do not overlap in either case, both recording pairs are the same distance apart.

spike rate for active neurons (4 Hz × 4% = 0.16 Hz).

Metrics that look only at spikes omit information concerning what is happening during inactivity and between spikes during active firing. Such tools offer a view into what the neuron is "saying" but not into what it is "hearing." The latter is certainly of interest to neuroscientists. For example, Ali, Deuchars, Pawelzik, and Thomson (1998); Bruno and Sakmann (2006); Long, Jin, and Fee (2010); Moore and Nelson (1998); Rosen and Mooney (2006); Steriade, Nunez, and Amzica (1993) and Ziburkus, Cressman, Barreto, and Schiff (2006) are just a few studies that look closely at non-spiking (i.e., subthreshold) features of neural recordings.

On the other hand, taking a simple-minded approach to membrane potential comparison leads to a metric that is mostly insensitive to spike timing: Suppose we apply a point-comparison metric, such as d_{pt}[V_1, V_2] \equiv \int_{-∞}^{∞} |V_1(t) - V_2(t)|\, dt, to membrane potentials. In this case, we get a metric for which non-overlapping action potentials (∼1–5 ms in width) are treated as equally different no matter how much time separates them. See Fig. 2.1 for an illustration. This is hardly satisfactory.

We need an integrated membrane potential time course metric that is sensitive both to relative action potential timing and to subthreshold membrane potential dynamics. In this chapter, I provide such a metric by building off van Rossum's metric for spike series (van Rossum, 2001). We will generalize the van Rossum metric in such a way that it applies to membrane potential recordings and adapt it so that it has a first-order response to both spike time differences and membrane potential differences. We will also tailor the metric so that it has a fat-tailed, low-pass response to input frequencies that is free of zeros or local minima. The metric is defined in Eq. (2.52).

This chapter is essentially a reproduction of Evans (2014).

2.2 Background

In the previous section, we discussed two popular binless spike-time sensitive metrics for spike trains: the Victor-Purpura metric (Victor & Purpura, 1997) and the van Rossum metric (van Rossum, 2001). We will now briefly review these two results, both of which will figure in what follows.

The Victor-Purpura metric works by assigning costs to basic transformations on spike trains, defining the distance between trains as the least total cost of a sequence of basic transformations that maps one train to the other. Three basic transformations are considered: spike insertion, spike removal and spike displacement in time. Spike insertion and removal are both given the same set cost, and spike displacement has a cost that is proportional to the amount of the displacement. This results in a metric, D_{VP}, that rises proportionally to spike time differences for nearby spikes.¹ The distance reaches a hard plateau when it becomes less costly to remove a displaced spike and re-insert it at its new location than it is to move it.

Victor and Purpura (1997) provide an algorithm that computes their metric with O(n_1 \cdot n_2) computational complexity, where n_1 and n_2 are the numbers of spikes in the trains. Let s_1 and s_2 be two spike trains, each consisting of a sequence of spike times:

s_1 = (t_{11}, t_{12}, \ldots t_{1i}, \ldots t_{1n_1}); \quad s_2 = (t_{21}, \ldots t_{2i}, \ldots t_{2n_2}) \qquad (2.1)

Victor and Purpura’s algorithm inductively builds up a minimum transformation costmatrix, G, between the trains, the ijth entry of which, gij , is defined as the minimumcost of a transformation from the first i spikes of s1 to the first j spikes of s2. With theboundary conditions, gi0 = cini, g0j = cinj, where cin is the spike insertion/removal cost,the matrix can be built up as:

g_{ij} = \min\left\{ g_{(i-1)j} + c_{in},\; g_{i(j-1)} + c_{in},\; g_{(i-1)(j-1)} + q \cdot |t_{1i} - t_{2j}| \right\} \qquad (2.2)

¹We will use a capital D for spike train metrics and a d for distances on functions of continuous time.


where the constant, q, is the proportionality factor for spike-displacement costs. The bottom right entry of G, g_{n_1 n_2}, yields D_{VP}(s_1, s_2)_q: the least cost of a transformation between s_1 and s_2. Victor and Purpura's algorithm is an adaptation of one Sellers (1974) introduced in the context of a metric for DNA sequences.
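The recursion of Eq. (2.2) translates directly into a small dynamic program; the following is my own rendering of the algorithm, with the insertion/removal cost exposed as a parameter c_in.

```python
import numpy as np

def victor_purpura(s1, s2, q, c_in=1.0):
    """D_VP via the minimum-transformation-cost matrix G of Eq. (2.2).
    q is the per-second spike-displacement cost; c_in the
    insertion/removal cost."""
    n1, n2 = len(s1), len(s2)
    G = np.zeros((n1 + 1, n2 + 1))
    G[:, 0] = c_in * np.arange(n1 + 1)        # boundary: g_i0 = c_in * i
    G[0, :] = c_in * np.arange(n2 + 1)        # boundary: g_0j = c_in * j
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            G[i, j] = min(G[i - 1, j] + c_in,           # remove spike i of s1
                          G[i, j - 1] + c_in,           # insert spike j of s2
                          G[i - 1, j - 1]
                          + q * abs(s1[i - 1] - s2[j - 1]))  # displace
    return G[n1, n2]
```

For single-spike trains the distance rises as q times the timing difference until removal-plus-insertion (cost 2 c_in) becomes cheaper, reproducing the hard plateau described above.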

The van Rossum distance² solves the problem another way: First, spike trains are mapped to real-valued functions of time by replacing each spike with a one-sided exponential decay, H(t - t_{ij})\, e^{-(t - t_{ij})/τ}. Here t_{ij} is the time of the spike being replaced, H(x) is the Heaviside step function, and τ is a parameter describing the timescale of the metric's sensitivity to spike time differences. The standard L2 norm, d_{L2}[f_1, f_2] \equiv \left( \int_{-∞}^{∞} (f_1(x) - f_2(x))^2\, dx \right)^{1/2}, then defines a distance between these functions which, after scaling by a factor of 1/τ, is taken to be the distance between the spike trains. Because the one-sided exponential profile spreads spikes out by the amount, τ, the L2 norm becomes responsive to time differences of order τ between spikes.

The resulting metric, D_{vR}^2, may be stated mathematically in the following way:

D_{vR}^2(s_1, s_2)_τ \equiv \frac{1}{τ} \int_{-∞}^{∞} \left( \sum_{i=1}^{n_1} σ_{vR}((t - t_{1i})/τ) - \sum_{i=1}^{n_2} σ_{vR}((t - t_{2i})/τ) \right)^2 dt \qquad (2.3)

where σ_{vR}(t/τ) \equiv \begin{cases} e^{-t/τ} & : t \geq 0 \\ 0 & : \text{otherwise} \end{cases} \qquad (2.4)

The distance squared, D_{vR}^2, is preferred to its square root (D_{vR}) because, for spike trains that only contain a single spike, we have (van Rossum, 2001, pg. 755, Eq. 2.8):

D_{vR}^2(s_1, s_2)_τ = 1 - e^{-|t_d|/τ} \qquad (2.5)

where t_d is the time difference, t_{21} - t_{11}, between the spikes in the two trains. The distance squared increases proportionally with the time difference and levels off as the difference becomes large compared to τ, giving a meaningful quantification of an interesting piece of abstract information: the relative incompatibility of the spikes.

As reported by Paiva et al. (2010, pg. 408, Eq. 6), in the case of arbitrarily many spikes, the van Rossum metric generalizes to³:

D_{vR}^2(s_1, s_2)_τ = \frac{1}{2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_1} e^{-|t_{1i} - t_{1j}|/τ} + \frac{1}{2} \sum_{i=1}^{n_2} \sum_{j=1}^{n_2} e^{-|t_{2i} - t_{2j}|/τ} - \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} e^{-|t_{1i} - t_{2j}|/τ} \qquad (2.6)

²This metric also appears in an earlier publication by Hunter, Milton, Thomas, and Cowan (1998).
³Here we are subtracting the third double sum from the overall expression rather than adding it as originally appeared in Paiva et al. (2010), which we take to be a mistype.
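Eq. (2.6) lets the metric be computed without performing any explicit convolution. The following sketch (mine) implements the closed form directly:

```python
import numpy as np

def van_rossum_sq(s1, s2, tau):
    """D^2_vR from the closed form of Eq. (2.6): pairwise exponentials
    of inter-spike time differences, O(n1 * n2) terms."""
    s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
    cross = lambda a, b: np.exp(-np.abs(a[:, None] - b[None, :]) / tau).sum()
    return 0.5 * cross(s1, s1) + 0.5 * cross(s2, s2) - cross(s1, s2)
```

Single-spike trains recover Eq. (2.5) exactly, identical trains give zero, and well-separated single spikes give the spike-counting value 1/2 (n_1 + n_2) = 1.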


In the third double sum, we see negative terms that mirror the spike-time comparison term in the single-spike expression, Eq. (2.5); they give initially proportional reporting for the spike time difference for all pairs of spikes not belonging to the same train. The first two sums contain positive terms that decrease with the timing difference between spikes in the same train. These terms function similarly to the '1' in Eq. (2.5) in that they exactly cancel the inter-train terms if the trains are identical—so that D_{vR}^2 = 0 in this case. We will discuss the function of these terms further in Sec. 2.4.2, where we will see that they tend to cancel out the least relevant inter-train spike time comparisons in the third sum. The overall expression allows the van Rossum metric to be calculated with O(n_1 \cdot n_2) computational complexity.

Eq. (2.6) may be rewritten in two important ways:

D_{vR}^2(s_1, s_2)_τ = \frac{1}{2}(n_1 + n_2) + \sum_{i=1}^{n_1} \sum_{j=1}^{i-1} e^{-|t_{1i} - t_{1j}|/τ} + \sum_{i=1}^{n_2} \sum_{j=1}^{i-1} e^{-|t_{2i} - t_{2j}|/τ} - \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} e^{-|t_{1i} - t_{2j}|/τ} \qquad (2.7)

and

D_{vR}^2(s_1, s_2)_τ = \frac{1}{2}(n_1 - n_2)^2 - \sum_{i=1}^{n_1} \sum_{j=1}^{i-1} γ_{vR}^2((t_{1i} - t_{1j})/τ) - \sum_{i=1}^{n_2} \sum_{j=1}^{i-1} γ_{vR}^2((t_{2i} - t_{2j})/τ) + \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} γ_{vR}^2((t_{1i} - t_{2j})/τ) \qquad (2.8)

where γ_{vR}^2(t/τ) \equiv 1 - e^{-|t|/τ}.

Van Rossum (2001) explored two important limits of his metric, which follow, respectively, from these two expressions. Eq. (2.7) shows that when all spikes are separated by much more than τ, D_{vR}^2 counts the spikes in the two series, returning \frac{1}{2}(n_1 + n_2). Alternatively, in Eq. (2.8), we see that if all spike pairs are close together compared to τ, the metric squares the difference in the number of spikes, yielding \frac{1}{2}(n_1 - n_2)^2.

Both the van Rossum metric, D_{vR}^2, and the Victor-Purpura metric, D_{VP}, share the useful property of rising linearly with spike time differences for nearby spikes and leveling off as spike time differences get large and spikes are (seemingly) unlikely to correspond. They also share the same O(n_1 \cdot n_2) order of computational complexity. As spike train metrics, they both offer a useful and meaningful quantification of difference.


2.3 Methods

While the Victor-Purpura metric is without question a fascinating and effective tool, it seems difficult in principle to convert its spike-train-transformation approach into a seamless method applicable to membrane potential recordings. Because action potentials do not have hard boundaries and never share exactly the same shape, it is problematic to transform between membrane potentials by cutting and pasting spikes.

By contrast, van Rossum’s metric generalizes easily to such a context. As pointed out byPaiva et al. (2010), the replacement of spikes with a one-sided exponential decay so centralto van Rossum’s method is equivalent to convolving the decay with a spike train function,fs, consisting of Dirac delta functions serving in the place of spikes:

f_s^{vR}(t; τ) \equiv \int_{-∞}^{∞} σ_{vR}((t - t')/τ) \cdot f_s(t')\, dt' \qquad (2.9)

where f_s(t) = \sum_{t_i \in s} δ(t - t_i) \qquad (2.10)

Here, s is a spike train (see Eq. (2.1)). In these terms, we have:

D_{vR}^2(s_1, s_2)_τ = \frac{1}{τ} \int_{-∞}^{∞} \left( f_{s_1}^{vR}(t; τ) - f_{s_2}^{vR}(t; τ) \right)^2 dt \qquad (2.11)

This way of writing D_{vR}^2 suggests an easy adaptation to membrane potentials. The membrane potential of a neuron as a function of time, V(t), with its tall, narrow action potentials, may be considered in analogy to the spike train function, f_s(t), which represents spikes as ideally tall and narrow delta functions. Where T is the (finite) time domain over which the neuron has been recorded from, the convolution, f_s^{vR}, is then analogous to:

V^{σ_{vR}}(t; τ)_T = \frac{1}{τ} \int_T σ_{vR}((t - t')/τ)\, V(t')\, dt' \qquad (2.12)

Here we are normalizing the convolution by the time width, τ. Since σ_{vR}(x) has unit area, this scaling allows V^{σ_{vR}} to be interpreted as a smoothing of V. Application of the L2 norm⁴ to smoothed potentials now gives a generalization of van Rossum's spike train metric to

⁴We scale the integration in the L2 norm by 1/|T| to make the metric intensive (see below).


membrane potential recordings: an example of a convolution-based metric⁵:

d_{vR}[V_1, V_2; τ]_T \equiv \sqrt{ \int_{-∞}^{∞} \left( V_1^{σ_{vR}}(t; τ)_T - V_2^{σ_{vR}}(t; τ)_T \right)^2 dt \,/\, |T| } \qquad (2.13)

= \sqrt{ \int_{-∞}^{∞} \left( \int_T σ_{vR}((t - t')/τ) \cdot (V_1(t') - V_2(t'))\, dt'/τ \right)^2 dt \,/\, |T| } \qquad (2.14)

The distinction in notation is important here: D_{vR}^2 refers to van Rossum's original distance squared, which is defined on spike trains (Eq. (2.3)); d_{vR} is defined for membrane potential recordings and is scaled differently. Like van Rossum's spike series metric, d_{vR}^2 compares nearby spike times in the recordings by virtue of the convolution operation's spreading of spikes in time by τ. The relationship between D_{vR}^2 and d_{vR} is:

D_{vR}^2(s_1, s_2)_τ = τ |T|\, d_{vR}[f_{s_1}(t), f_{s_2}(t); τ]_T^2 \qquad (2.15)

For convenience, we are treating membrane potential recordings as continuous functions of time, neglecting the fact that actual recordings are discretely sampled. This will be the approach taken throughout the chapter. Toward the end, we will address the problem of applying the convolution-based metric, d_C, to sampled time series.
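On sampled data, the inner convolution of Eq. (2.14) becomes a discrete sum. The sketch below is my own discretization (with the exponential tail truncated at 10τ, an arbitrary cutoff); it approximates d_vR for uniformly sampled recordings.

```python
import numpy as np

def d_vr_sampled(V1, V2, dt, tau):
    """Discrete approximation of d_vR (Eq. 2.14): convolve V1 - V2 with
    the causal exponential kernel (weights sigma_vR * dt / tau), then
    take the RMS over the recording length |T| = len(V1) * dt."""
    n_k = int(10 * tau / dt)                          # truncate infinite tail
    kernel = np.exp(-np.arange(n_k) * dt / tau) * dt / tau
    smoothed = np.convolve(np.asarray(V1) - np.asarray(V2), kernel)
    T = len(V1) * dt
    return np.sqrt((smoothed**2).sum() * dt / T)

V1 = np.zeros(20000)                                  # 20 s at 1 kHz
V2 = np.full(20000, 0.003)                            # constant 3 mV offset
d = d_vr_sampled(V1, V2, dt=0.001, tau=0.05)          # close to 3 mV
```

Because the metric is intensive, a constant offset between long recordings is reported (up to small discretization and edge effects) as just that offset, in Volts.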

The d_{vR} metric, Eq. (2.14), has a couple of issues. First, just as van Rossum needed to take D_{vR}^2 in order to get a spike train distance that is first order in spike time differences, one must take d_{vR}^2 in order to get the same property. The first-order property is advantageous to those interested in spike timing: the magnitude of timing differences (that are small compared to the chosen timing sensitivity, τ) correlates well with the magnitude of the metric. For analysts also interested in membrane potential differences, a metric first order in V_1 - V_2 offers a similar advantage. However, d_{vR}^2 is second order in V_1 - V_2. Secondly, the infinite tail of the exponential decay in σ_{vR} causes the hassle of needing to perform an integral, or an approximation thereof, over all recorded times prior to t when evaluating the interior integral in Eq. (2.14).

These two observations make it worthwhile, in constructing a membrane potential metric, to attempt to replace σ_{vR} in Eq. (2.14) with a kernel that changes these two properties for the metric. In addition, the kernel ought to produce a favorable response for the metric to non-spiking membrane potential dynamics. In this respect, while some neglect of high-frequency components is unavoidable due to the fact that we are convolving with a continuous function, which is a smoothing operation, we want to preserve as much high-frequency information in the time course as we can. Otherwise, we want the frequency

⁵Square brackets in the definition for d_{vR} (Eq. (2.14)) and throughout this chapter indicate that the symbol being defined is a functional, having some arguments that are functions.


response to lack any bias with regard to specific frequency bands. Finally, we want a kernel that causes the metric to behave like a metric, viz., to give 0 if and only if the recordings it evaluates are identical. We will find a kernel (Eq. (2.49)) that addresses each of these issues, namely one that:

1. yields a metric with a first order response to same-time membrane potential differences and nearby spike time differences for otherwise identical recordings

2. is symmetric and non-zero only over a finite domain

3. produces a metric with a smooth low-pass frequency response that lacks any zeros, minima or oscillations which unfairly bias or overlook certain frequencies

4. gives the metric a near-optimally fat-tailed frequency response curve.

5. causes the metric to return 0 if and only if recordings are identical

2.3.1 A generalized convolution-based metric

For the purpose of analysis, we will now define a generalized convolution-based metric, d_{gen}, that accepts a smoothing kernel, σ, as one of its arguments: We will use the definition for d_{vR} (Eq. (2.14)) with the van Rossum kernel, σ_{vR}, replaced by the argument kernel. This construction will allow us to study how the properties of a convolution-based metric depend on its kernel:

d_{gen}[σ][V_1, V_2; τ]_T \equiv N[σ] \sqrt{ \int_{-∞}^{∞} \left( \int_T σ((t - t')/τ) \cdot (V_1(t') - V_2(t'))\, dt'/τ \right)^2 dt \,/\, |T| } \qquad (2.16)

= N[σ] \sqrt{ \int_{-∞}^{∞} \left( V_1^σ(t; τ)_T - V_2^σ(t; τ)_T \right)^2 dt \,/\, |T| } \qquad (2.17)

Here, T is the time domain of the membrane potential recordings; τ is the range of time over which feature (e.g., spike) timing is compared; and V_1^σ(t; τ)_T and V_2^σ(t; τ)_T are σ-smoothings (in practice, σ(x) should be normalized to unit area) of V_1 and V_2:

V_i^σ(t; τ)_T \equiv \frac{1}{τ} \int_T σ((t - t')/τ)\, V_i(t')\, dt' \qquad (2.18)
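Numerically, Eq. (2.18) is a weighted moving average with weights σ((t − t')/τ)·dt/τ. The sketch below is mine and uses a Gaussian as a stand-in kernel for illustration (not the kernel σ_X developed later).

```python
import numpy as np

def sigma_smooth(V, dt, tau, sigma, half_width=5.0):
    """V^sigma of Eq. (2.18): discrete convolution with weights
    sigma((t - t')/tau) * dt / tau, which sum to ~1 for a unit-area
    kernel, so smoothing preserves the signal's scale."""
    half = int(half_width * tau / dt)
    x = np.arange(-half, half + 1) * (dt / tau)
    w = sigma(x) * dt / tau
    return np.convolve(V, w, mode='same')

gauss = lambda x: np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)   # unit area
V = np.zeros(1001)
V[500] = 1.0 / 0.001          # unit-area "spike" sampled at dt = 1 ms
V_sm = sigma_smooth(V, dt=0.001, tau=0.02, sigma=gauss)
```

The delta-like spike is spread over a width of order τ, with peak height σ(0)/τ, which is exactly the spreading that makes the metric sensitive to spike timing.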

We have scaled both internal and external integrals in Eq. (2.16) by the size of the relevant time domain to produce an intensive metric. An "intensive" metric indicates that if two signals preserve the same pattern of difference over an extended length of time, the distance between them is the same regardless of how long the recordings are (provided |T| \gg τ so that edge effects are negligible). The units of the metric are therefore Volts, not Volt-seconds. The interior integral is scaled by τ^{-1} since the width of σ(t/τ) is set by τ, the smoothing time. The exterior integral is scaled by |T|^{-1}. This is because its integrand, the convolved membrane potential difference, \int_T σ((t - t')/τ) \cdot (V_1(t') - V_2(t'))\, dt'/τ, will normally be very close to zero for t more than a few multiples of τ away from T. Assuming τ \ll |T|, while the limits of the integral are ±∞, the size of the region being integrated is roughly |T|.

N[σ] is a kernel-dependent normalization term. The constraint used to set N[σ], which will be explained in Sec. 2.4.1, is a regularization of the initial rate of increase for d_{gen} with spike time differences. What we will arrive at is N[σ] = 1/\sqrt{\int_{-∞}^{∞} σ'(x)^2\, dx}, assuming σ'(x) is square-integrable⁶, where σ'(x) is the derivative of σ with respect to its argument. For most kernels we will discuss, this latter condition holds. It does not hold for σ_{vR}; the initial rate of increase for d_{gen}[σ_{vR}] with spike time differences diverges, and our usual constraint cannot be met. Nonetheless, we must define N[σ_{vR}] in order for d_{gen}[σ_{vR}] to be defined. In general, we use N[σ] = 1 when σ'(x) is non-square-integrable. This gives d_{vR} = d_{gen}[σ_{vR}]. Eq. (2.31) states this definition for N[σ].

2.3.2 Overview

We will begin our analysis by addressing d_{gen}'s response to action potentials, modeling membrane potentials as sums over Dirac delta function spikes. This will allow us to obtain the above condition on σ yielding first order dependence for d_{gen} on individual spike time differences along with the normalization, N[σ]. We will see that, given a kernel that meets the condition, a convolution-based metric responds to spike timing differences that are small compared to τ in the same way that the Euclidean metric on coordinate spaces responds to individual coordinate differences. We will discuss other aspects of how convolution-based metrics respond to spikes and spike timing as well.

Turning our attention to d_{gen}'s response to non-spiking features of the membrane potential, we then benefit from some Fourier analysis. We will show that with the choice, σ_X (Eq. (2.49)), for the kernel, we get a near-ideally-gradual low-pass frequency response that is otherwise impartial to specific frequency bands. We will continue by showing that this response implies d_C \equiv d_{gen}[σ_X] returns zero only for identical membrane potentials. The triangle inequality for d_C is shown in App. 2.7.

Finally, we will evaluate d_C, D_{vR}^2, and D_{VP} numerically for several types of extended neural data, comparing the different metrics' performances. The inputs to the metrics will be pairs of data, one of which is a systematically modified version of the other. The data will include randomly generated delta function Poisson spike trains and simulations of a

⁶Note we have changed variables from t to x \equiv t/τ.


Hodgkin-Huxley neuron under randomly generated current input. We will scale the spike train metrics, D_{vR}^2 and D_{VP}, in such a way that they can be plotted on the same graph with d_C, and the values can be directly compared with each other. In each case, we will plot d_C versus a parameter that controls the timing offset for features in the recordings. We will see that the convolution-based metric, d_C, gives sensible output and verify that it provides the analyst with considerably more information about the difference between two neural signals than is available from either the van Rossum or Victor-Purpura spike train metrics.

2.4 Results

Let us consider the response d_{gen}[σ] has to spike timing. Given two spike trains, s_1 and s_2, as defined in Eq. (2.1), we will model their corresponding membrane potential time courses by treating the spikes as Dirac delta functions which capture, in idealized fashion, the large-amplitude, narrow-timescale character of the action potential:

V_{s_1}(t; α) = α \sum_{i=1}^{n_1} δ(t - t_{1i}), \quad V_{s_2}(t; α) = α \sum_{i=1}^{n_2} δ(t - t_{2i}) \qquad (2.19)

The parameter, α, sets the area under each spike; α ∼ 100 µV·s is realistic. We will assume that all spikes occur within the time domain of comparison, T.

2.4.1 Single spike recordings

In the case of single spike recordings, V_{s_1}(t; α) = α\, δ(t - t_{11}) and V_{s_2}(t; α) = α\, δ(t - t_{21}), suppose the spikes are t_d apart so t_{21} = t_{11} + t_d. Evaluating d_{gen} gives:

d_{gen}[σ][V_{s_1}(t; α), V_{s_2}(t; α); τ]_T = N[σ] \sqrt{ \int_{-∞}^{∞} \frac{α^2}{τ^2} \left( σ((t - t_{11})/τ) - σ((t - t_{11} - t_d)/τ) \right)^2 dt \,/\, |T| } \qquad (2.20)

= \frac{N[σ]\, α \sqrt{2}}{\sqrt{τ |T|}} \sqrt{ \int_{-∞}^{∞} σ(x)^2\, dx - \int_{-∞}^{∞} σ(x)\, σ(x - t_d/τ)\, dx } \qquad (2.21)

In the last line, we have changed variables from t, measured in seconds, to the dimensionless x \equiv t/τ. We will be switching back and forth between these as is convenient. We see that the d_{gen} metric is proportional to the square root of σ's autocorrelation at zero lag minus its autocorrelation at lag t_d/τ. Adopting the notation R[σ, σ](x) for the autocorrelation, we may write:

d_{gen}[σ][V_{s_1}(t; α), V_{s_2}(t; α); τ]_T = \frac{α}{\sqrt{τ |T|}}\, γ_{gen}[σ](t_d/τ) \qquad (2.22)

where γ_{gen}[σ](x) \equiv N[σ] \sqrt{2 \left( R[σ, σ](0) - R[σ, σ](x) \right)} \qquad (2.23)

for single spike time recordings. Noting that the autocorrelation for σvR (x) is:

R[σ_{vR}, σ_{vR}](x) \equiv \int_{-∞}^{∞} σ_{vR}(x')\, σ_{vR}(x' - x)\, dx' = \frac{1}{2} e^{-|x|}, \qquad (2.24)

and recalling N [σvR] ≡ 1, we have for single spikes:

d_{vR}[V_{s_1}(t; α), V_{s_2}(t; α); τ]_T = d_{gen}[σ_{vR}][V_{s_1}(t; α), V_{s_2}(t; α); τ]_T = \frac{α}{\sqrt{τ |T|}} \sqrt{1 - e^{-|t_d|/τ}} \qquad (2.25)

\Rightarrow d_{vR}[V_{s_1}(t; α), V_{s_2}(t; α); τ]_T^2 = \frac{α^2}{τ |T|} \left( 1 - e^{-|t_d|/τ} \right) \qquad (2.26)

Since f_{s_i}(t) = V_{s_i}(t; 1.0) (Eqs. 2.10, 2.19) and D_{vR}^2(s_1, s_2)_τ = τ |T|\, d_{vR}[f_{s_1}, f_{s_2}; τ]_T^2 (Eq. (2.15)), this is consistent with van Rossum's result (Eq. (2.5)).

With the formula, Eq. (2.25), we see the problem with d_{vR} and D_{vR} we previously discussed in Secs. 2.2 and 2.3: the initial rise for pairs of single spike trains is in proportion to \sqrt{|t_d|}. It is therefore necessary to square, as van Rossum and others have, to get a quantity that initially rises \propto |t_d|. But in our case this would produce a metric that is second order in V_{s_1} - V_{s_2}, with dimensions of membrane potential squared.

We can avoid having to do this by placing a proper constraint on the kernel, σ, namely that \int_{-∞}^{∞} σ'(x)^2\, dx converges to a non-zero finite value. To see that this suffices, we begin by expressing γ_{gen} as follows (letting x_d \equiv t_d/τ):

γ_{gen}[σ](x_d) = N[σ] \sqrt{ \int_{-∞}^{∞} \left( σ(x) - σ(x + x_d) \right)^2 dx } \qquad (2.27)

We may examine the behavior of γ_{gen} for x_d close to zero by Taylor expanding σ(x + x_d) in the integrand in powers of x_d about x, which gives:

γ_{gen}[σ](x_d) = N[σ] \sqrt{ \int_{-∞}^{∞} x_d^2 \cdot \left( σ'(x) + \frac{1}{2} σ''(x) \cdot x_d + \ldots \right)^2 dx } \qquad (2.28)

Since we are taking the limit x_d \to 0, we will dismiss all but the lowest order term in x_d in the integrand to give (recalling x_d = t_d/τ):

\lim_{t_d \to 0} γ_{gen}[σ](t_d/τ) = \frac{|t_d|}{τ}\, N[σ] \sqrt{ \int_{-∞}^{∞} σ'(x)^2\, dx } \qquad (2.29)

This dismissal may raise some doubt since σ'' can involve divergences, as it will in the case of σ_X. It may therefore seem questionable whether terms involving σ'' and higher derivatives can be reliably neglected as x_d \to 0 even though they involve higher powers of x_d. As it turns out, so long as \int_{-∞}^{∞} σ'(x)^2\, dx exists, Eq. (2.29) is valid. This is rigorously established in Appendix 2.6.

Our sufficient condition, then, for a proportional rise of d_{gen} with the spike time difference between single delta-function-spike recordings is that:

0 < \int_{-∞}^{∞} σ'(x)^2\, dx < ∞ \qquad (2.30)

Eq. (2.29) is also our basis for setting N[σ]. For any kernel satisfying Eq. (2.30), we ensure that γ_{gen} initially rises as |t_d|/τ (and that d_{gen} therefore rises as \frac{α}{τ^{3/2} \sqrt{|T|}} |t_d|) if we set N[σ] = 1/\sqrt{\int_{-∞}^{∞} σ'(x)^2\, dx}. This is a useful way to regularize the behavior of d_{gen}, and we use it. For any σ that does not satisfy Eq. 2.30 (in particular, σ_{vR}), we do not attempt any regularization and set N[σ] to unity:

N[σ] = \begin{cases} 1/\sqrt{\int_{-∞}^{∞} σ'(x)^2\, dx} & : 0 < \int_{-∞}^{∞} σ'(x)^2\, dx < ∞ \\ 1 & : \text{otherwise} \end{cases} \qquad (2.31)
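The condition of Eq. (2.30) and the normalization of Eq. (2.31) can be checked numerically. The sketch below (mine) uses a triangular kernel as a hypothetical compactly supported σ: its area is 1 and \int σ'(x)^2 dx = 2, so N[σ] = 1/\sqrt{2} and γ_gen should rise with unit initial slope before saturating.

```python
import numpy as np

def gamma_gen(sigma, x_d, x, N):
    """gamma_gen[sigma](x_d) of Eq. (2.27), integrated on the grid x."""
    dx = x[1] - x[0]
    diff = sigma(x) - sigma(x + x_d)
    return N * np.sqrt((diff**2).sum() * dx)

tri = lambda x: np.maximum(0.0, 1.0 - np.abs(x))   # unit-area triangle
x = np.arange(-4.0, 4.0, 1e-4)
N_tri = 1.0 / np.sqrt(2.0)                          # 1 / sqrt(int sigma'^2)

g_small = gamma_gen(tri, 0.01, x, N_tri)   # ~ |x_d|: first-order rise
g_large = gamma_gen(tri, 3.0, x, N_tri)    # saturated value gamma_gen(inf)
```

The saturated value is N[σ]·\sqrt{2\int σ(x)^2 dx} = \sqrt{2/3} for this kernel, illustrating the plateau behavior discussed for well-separated spikes.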

2.4.2 Many-spike recordings

Let us now move on to uncover how dgen performs when many spikes are involved:

V_{s_1}(t; α) = α \sum_{i=1}^{n_1} δ(t - t_{1i}), \quad V_{s_2}(t; α) = α \sum_{i=1}^{n_2} δ(t - t_{2i}) \qquad (2.32)


For such time courses, we have:

d_{gen}[σ][V_{s_1}(t; α), V_{s_2}(t; α); τ]_T = \frac{N[σ]\, α}{τ} \sqrt{ \int_{-∞}^{∞} \left( \sum_{i=1}^{n_1} σ((t - t_{1i})/τ) - \sum_{i=1}^{n_2} σ((t - t_{2i})/τ) \right)^2 dt \,/\, |T| } \qquad (2.33)

= \frac{N[σ]\, α}{\sqrt{τ |T|}} \left[ (n_1 + n_2)\, R[σ, σ](0) + 2 \sum_{i=1}^{n_1} \sum_{j=1}^{i-1} R[σ, σ]\!\left( \frac{t_{1i} - t_{1j}}{τ} \right) + 2 \sum_{i=1}^{n_2} \sum_{j=1}^{i-1} R[σ, σ]\!\left( \frac{t_{2i} - t_{2j}}{τ} \right) - 2 \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} R[σ, σ]\!\left( \frac{t_{1i} - t_{2j}}{τ} \right) \right]^{1/2} \qquad (2.34)

Note that for compactly supported σ, there will always be a critical value, x_c[σ], such that for |x| beyond x_c[σ], R[σ, σ](x) is strictly zero. The sums in Eq. (2.34) only need to be explicitly evaluated over terms for which the scaled time difference, |t_a - t_b|/τ, is less than x_c[σ]. Assuming a density of spikes that is, on average, constant in time, this leads to computational complexity for the exact calculation that grows as n_i + n_j. Conversely, for non-compact σ such as σ_{vR}, this truncation is not possible, leading to O(n_i \cdot n_j) complexity for the exact calculation, as one has for D_{vR}^2.

Because we have set N[σvR] = 1,

$$d_{vR}[V_{s1}(t;\alpha), V_{s2}(t;\alpha); \tau] = d_{gen}[\sigma_{vR}][V_{s1}(t;\alpha), V_{s2}(t;\alpha); \tau] \qquad (2.35)$$

Recall that

$$R[\sigma_{vR}, \sigma_{vR}](t/\tau) = \tfrac{1}{2}\,e^{-|t|/\tau} \qquad (2.36)$$

Since $D^{2}_{vR}(s_1, s_2)_\tau = \tau\,|T|\,d_{vR}[V_{s1}, V_{s2}; \tau]^{2}$, Eq. (2.34) is consistent with Eq. (2.7), and both are consistent with an increasing dvR as spikes in opposite trains separate.

In particular, for single-spike trains, both expressions recover Eq. (2.5):

$$D^{2}_{vR}(s_1, s_2)_\tau = \tau\,|T|\,d_{vR}[V_{s1}(t;\alpha), V_{s2}(t;\alpha); \tau]^{2} = 1 - e^{-|t_{21} - t_{11}|/\tau} \qquad (2.37)$$


We can clarify Eq. (2.34) by making use of γgen[σ](x) (Eq. (2.23)):

$$d_{gen}[\sigma][V_{s1}(t;\alpha), V_{s2}(t;\alpha); \tau]_T = \frac{\alpha}{\sqrt{\tau\,|T|}}\Bigg[(n_1 - n_2)^{2}\,N[\sigma]^{2}\,R[\sigma,\sigma](0) - \sum_{i=1}^{n_1}\sum_{j=1}^{i-1}\gamma_{gen}[\sigma]((t_{1i} - t_{1j})/\tau)^{2} - \sum_{i=1}^{n_2}\sum_{j=1}^{i-1}\gamma_{gen}[\sigma]((t_{2i} - t_{2j})/\tau)^{2} + \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\gamma_{gen}[\sigma]((t_{1i} - t_{2j})/\tau)^{2}\Bigg]^{1/2} \qquad (2.38)$$

Assuming the autocorrelation, R[σ,σ](x), vanishes as x goes to ∞ (which it does for σvR, along with any σ with compact support), we have:

$$d_{gen}[\sigma][V_{s1}(t;\alpha), V_{s2}(t;\alpha); \tau]_T = \frac{\alpha}{\sqrt{\tau\,|T|}}\Bigg[\tfrac{1}{2}(n_1 - n_2)^{2}\,\gamma_{gen}[\sigma](\infty)^{2} - \sum_{i=1}^{n_1}\sum_{j=1}^{i-1}\gamma_{gen}[\sigma]((t_{1i} - t_{1j})/\tau)^{2} - \sum_{i=1}^{n_2}\sum_{j=1}^{i-1}\gamma_{gen}[\sigma]((t_{2i} - t_{2j})/\tau)^{2} + \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\gamma_{gen}[\sigma]((t_{1i} - t_{2j})/\tau)^{2}\Bigg]^{1/2} \qquad (2.39)$$

where γgen[σ](∞) is shorthand for lim_{x→∞} γgen[σ](x). Finitude of γgen[σ](∞) is guaranteed if we further assume square-integrability for σ, since if R[σ,σ](x) vanishes with large x, $\gamma_{gen}[\sigma](\infty) = N[\sigma]\sqrt{2\int_{-\infty}^{\infty}\sigma(x)^{2}\,dx}$. For compactly supported σ, terms need only be explicitly evaluated if (ta − tb)/τ is not so large that γgen[σ]((ta − tb)/τ) = γgen[σ](∞). This, again, often gives an O(ni + nj) complexity calculation.

Eq. (2.39) allows us to better see the inner workings of the metric: d²gen has a positive term proportional to the squared difference in the number of spikes for the two recordings, and it has a sum over terms that increase with the time difference between pairs of spikes, one from each recording, in precisely the same functional manner as d²gen does for a pair of single-spike trains: provided our kernel constraint, Eq. (2.30), is met, each term initially increases as ((t1i − t2j)/τ)² and eventually reaches an upper bound as the difference sufficiently exceeds τ, if the kernel is compact and square-integrable. This leads to an initially hyperbolic rise of dgen with individual time differences (linear if the membrane potentials are otherwise equal). The parameter τ sets the scale for the distance's (bounded) dependence on spike time differences. The hyperbolic behavior is favorable because it is the same functional response that the Euclidean metric on coordinate spaces has with individual coordinate differences.

The two positive terms make a great deal of sense. Pairs of spikes between recordings are


compared in a sensible way, and the presence of unpaired spikes raises the distance squared as the square of the number of such spikes. Interestingly, two other terms appear in the distance squared, both of which decrease with the time difference between pairs of spikes in the same recording, according to exactly the same function by which the distance squared increases with the time difference between pairs in opposite recordings. These terms may be compared to the time dependence on pairs of same-train spikes in the first two double sums in Eq. (2.6). We know, from dgen's definition (Eq. (2.16)), that it is strictly positive and that these negative terms cannot overcome the positive ones. Furthermore, their presence can be anticipated from the form of Eq. (2.33).

With these recognitions quieting any doubts, it is useful to remark on the utility of the negative terms, which may be seen as handling the problem of spike pairing: given two spike trains, one of which differs from the other only by small timing shifts in the same spikes, it is natural to think that only the timing differences between the spikes that "go together" are important to the distance between the trains. The relative timing of unrelated spikes is not so important. The negative terms address this issue: they cause a convolution-based metric to neglect differences in less related spike pairs and to focus on differences in more related pairs, in the following way. Suppose two spikes, t1i and t2j, in opposite recordings, are nearby to one another, and that a third spike, say t1k, is far from both. Our dgen prioritizes the timing difference between t1i and t2j by "shielding" the effect of the positive inter-recording term, γgen[σ]((t1k − t2j)/τ)², with the negative intra-recording term, −γgen[σ]((t1k − t1i)/τ)². Due to the asymptotic behavior of γgen (for well-behaved σ), these terms will be similar in magnitude.

2.4.3 Fourier analysis

We now need to address how our metric handles oscillatory components of various frequencies in the input time courses. Since it looks at smoothed membrane potentials, dgen will, by necessity, have an attenuated response to high frequencies. In regard to sub-spike membrane potential fluctuations, we would usually like to limit this low-pass behavior, preserving as much information at all frequencies as we can.⁷ There should be no frequencies, however, that are entirely overlooked. Furthermore, we do not want any local maxima or minima in the frequency response other than the central peak⁸ at ω = 0, since this would be unjustifiably partial toward or against such frequencies.

⁷In some cases, such as in the presence of abundant high-frequency noise, this may not apply. This circumstance is addressed in App. 2.8.

⁸The tip of this peak will usually not contribute substantially since we are taking the difference between membrane potential recordings, which tends toward zero mean for sufficiently lengthy signals; the primary contribution to the metric will come from the slopes of the central response peak.


Recall our definition of dgen:

$$d_{gen}[\sigma][V_1, V_2; \tau]_T = N[\sigma]\sqrt{\int_{-\infty}^{\infty}\left(\int_T \sigma((t - t')/\tau)\cdot(V_1(t') - V_2(t'))\,dt'/\tau\right)^{2} dt \Big/ |T|} \qquad (2.40)$$

We may alternatively express this in the following way:

$$d_{gen}[\sigma][V_1, V_2; \tau]_T = N[\sigma]\sqrt{\int_{-\infty}^{\infty} s[\sigma;\tau](t)_T^{\,2}\,dt \Big/ |T|} \qquad (2.41)$$

where

$$s[\sigma;\tau](t)_T \equiv \int_{-\infty}^{\infty}\sigma((t - t')/\tau)\,h(t')_T\,dt'/\tau \qquad (2.42)$$

and

$$h(t)_T \equiv \begin{cases} V_1(t) - V_2(t) & : \; t \in T \\ 0 & : \; \text{otherwise} \end{cases} \qquad (2.43)$$

Parseval's theorem informs us that Eq. (2.41) is equivalent to an integral over the square modulus of the Fourier transform, $\tilde{s}(\omega)$, of s(t):

$$d_{gen}[\sigma][V_1, V_2; \tau]_T = \frac{N[\sigma]}{\sqrt{|T|}}\sqrt{\int_{-\infty}^{\infty}\left|\tilde{s}[\sigma;\tau](\omega)_T\right|^{2} d\omega} \qquad (2.44)$$

From the convolution theorem for Fourier transforms, we have that:

$$\tilde{s}[\sigma;\tau](\omega)_T = \tilde{\sigma}(\tau\omega)\,\tilde{h}(\omega)_T \qquad (2.45)$$

Therefore,

$$d_{gen}[\sigma][V_1, V_2; \tau]_T = \frac{N[\sigma]}{\sqrt{|T|}}\sqrt{\int_{-\infty}^{\infty}\left|\tilde{\sigma}(\tau\omega)\right|^{2}\cdot\left|\tilde{h}(\omega)_T\right|^{2} d\omega} \qquad (2.46)$$

We see that dgen responds to oscillations of frequency ω in h(t) (which, for large enough |T|, may be taken to be essentially the same as those in V1 − V2 over T) according to the presence of the τ-scaled frequency, τω, in σ. Said another way, σ's power spectrum, |σ̃(τω)|², defines the frequency response of dgen. We can therefore ensure that the frequency response does not unfairly bias or overlook specific frequencies or frequency bands by stipulating that our kernel function's Fourier transform is a smooth, oscillation-free function that has no zeros. As we will see below, the no-zeros property also guarantees a metric that returns 0 only for identical recordings.

Schrauwen and Campenhout (2007) offer three alternative kernels for spike train metrics.


Two of these, the Gaussian and Laplacian kernels, are non-compact, ruling them out for us. The third, a triangularly shaped kernel, has spectral oscillations and zeros. In the context of spectral analysis, compactly supported multiplicative window functions are sought out that minimize spectral leakage across bins when the finite Fourier transform is taken. The usual emphasis in designing these is a narrow main lobe for the window's power spectrum, not the elimination of zeros or oscillations occurring outside the main lobe, which are quite common and are present even for the reputed zeroth order prolate spheroidal wavefunction, or Slepian window.

Compactly supported windows with the no-spectral-oscillations property have been studied in the context of convergent spectral parameter estimation by Depalle and Helie (1997). They considered the forms:

$$\sigma_{HP}(x;\alpha) = \tfrac{1}{2}\left(1 + \cos(2\pi x)\right)e^{-2\alpha|x|} \;:\; |x| \le 1/2\,;\;(\alpha \ge 2) \qquad (2.47)$$

and

$$\sigma_{DH}(x;a,b) = (1 - 2|x|)^{a}\,e^{-4b\,x^{2}} \;:\; |x| \le 1/2\,;\;(\text{specific pairs } a, b) \qquad (2.48)$$

with both functions set to zero for |x| > 1/2.

The first of these is called the Hanning-Poisson window (see Harris, 1978). The second is due to Depalle and Helie themselves. I advance a different spectral-oscillation-free kernel function in this dissertation, σX. The reason is that, while for spectral analysis one is interested in window functions that concentrate as much power at low frequencies as possible (Harris, 1978), for a membrane potential metric, the slower the high-frequency fall-off, the better: we want a metric that preserves high-frequency information to the greatest extent it can⁹ while still meeting our other requirements. As we will now see, σX enables dgen to do this better than the Hanning-Poisson and Depalle-Helie windows do. The three kernels are plotted and compared in Fig. 2.2.

The σX kernel is an integral over triangular kernels of less than unit width:

$$\sigma_X(x) \equiv \begin{cases} 4\int_{2|x|}^{1}\left(1 - 2|x|/x_0\right)dx_0 = 4\left(1 + 2|x|\left(\log|2x| - 1\right)\right) & : \; 0 < |x| < 1/2 \\ 4 & : \; x = 0 \\ 0 & : \; \text{otherwise} \end{cases} \qquad (2.49)$$

It is monotonically decreasing in |x| and continuous and bounded at the origin. Its derivative, on the other hand, diverges at the origin, producing an infinitely sharp cusp (see Fig. 2.2). This critical feature causes σX to preserve a near-optimal amount of high-frequency information in the signals it is convolved with. Nonetheless, $\int_{-\infty}^{\infty}\sigma_X'(x)^{2}\,dx = 128$ converges, as required for first order spike time comparison. The gamma function, γX(x) = γgen[σX](x), is plotted in Fig. 2.3.

⁹As noted above, the inverse may apply to noisy recordings: we may want to disregard high-frequency components of the signal. App. 2.8 provides a kernel addressing this situation.

Figure 2.2: Plots of the σHP, σDH, and σX kernel functions discussed in the text. In A, the functions are plotted; for the sake of comparison, all have been normalized to unit area. In B, we have the Fourier transforms in decibels relative to the zero-frequency amplitude. Here, we are using α = 2 for σHP and a = 1.8, b = 0.92 for σDH. Of the parameters found in the literature, these give σHP and σDH the slowest high-frequency fall-off while also maintaining a minimum-free spectrum. For σX, the high-frequency fall-off is considerably slower.
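As a quick numerical check (a sketch, not part of the original text), one can confirm the unit area of σX and the value ∫σ′X(x)² dx = 128, which gives N[σX] = 1/√128 = 1/(8√2):

```python
import numpy as np

def sigma_X(x):
    """The kernel of Eq. (2.49)."""
    ax = np.abs(np.asarray(x, dtype=float))
    out = np.zeros_like(ax)
    inside = (ax > 0) & (ax < 0.5)
    out[inside] = 4.0 * (1.0 + 2.0 * ax[inside] * (np.log(2.0 * ax[inside]) - 1.0))
    out[ax == 0.0] = 4.0
    return out

def sigma_X_prime(x):
    """Closed-form derivative on 0 < |x| < 1/2: sign(x) * 8 * log(2|x|)."""
    x = np.asarray(x, dtype=float)
    ax = np.abs(x)
    out = np.zeros_like(ax)
    inside = (ax > 0) & (ax < 0.5)
    out[inside] = np.sign(x[inside]) * 8.0 * np.log(2.0 * ax[inside])
    return out

def trapezoid(y, x):
    """Simple trapezoidal rule (kept local for portability)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

xs = np.linspace(-0.5, 0.5, 2_000_001)
area = trapezoid(sigma_X(xs), xs)                  # unit area
deriv_sq = trapezoid(sigma_X_prime(xs) ** 2, xs)   # ~128 (integrable log^2 cusp)
n_X = 1.0 / np.sqrt(deriv_sq)                      # ~1/(8*sqrt(2)), per Eq. (2.31)
```

The derivative's square has only a mild, integrable log² singularity at the origin, so a dense trapezoidal grid suffices for this check.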

The Fourier transform of σX is (see Fig. 2.2 for a plot):

$$\tilde{\sigma}_X(k) = 2\sqrt{\frac{2}{\pi}}\left(\frac{2}{k}\right)^{2}\mathrm{Cin}(k/2) \qquad (2.50)$$

where $\mathrm{Cin}(x) \equiv \int_0^x t^{-1}\left(1 - \cos(t)\right)dt$ is a type of cosine integral function. The tails of σ̃X fall off roughly as log|k|/k². Discontinuous finite functions, such as σvR, preserve more of the high-frequency spectrum (the Fourier transform for such functions rolls off as 1/|k|), but they also cause $\int_{-\infty}^{\infty}\sigma'(x)^{2}\,dx$ to diverge. Meanwhile, Fourier transforms of functions with discontinuous but finite derivatives, such as σHP and σDH, roll off as 1/k² (see Harris, 1978, pg. 59). Our σX falls off more slowly without causing a divergent square integral for the derivative: it is just right for our application.

Furthermore:

$$\tilde{\sigma}_X'(k) = \sqrt{\frac{2}{\pi}}\left(\frac{2}{k}\right)^{3}\left(1 - \cos(k/2) - 2\,\mathrm{Cin}(k/2)\right) \qquad (2.51)$$

is strictly negative for k > 0 and antisymmetric in k. This means σ̃X(k) is strictly decreasing with |k|, as we require.
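The no-zeros, monotone-decrease property can also be verified numerically without relying on the closed form. The sketch below (an illustration, assuming the unitary transform convention, so that σ̃X(0) = 1/√(2π) for the unit-area kernel) evaluates the transform by direct quadrature and checks positivity and strict decrease on a sampled grid of k > 0:

```python
import numpy as np

def sigma_X(x):
    """Kernel of Eq. (2.49)."""
    ax = np.abs(np.asarray(x, dtype=float))
    out = np.zeros_like(ax)
    inside = (ax > 0) & (ax < 0.5)
    out[inside] = 4.0 * (1.0 + 2.0 * ax[inside] * (np.log(2.0 * ax[inside]) - 1.0))
    out[ax == 0.0] = 4.0
    return out

def sigma_X_ft(k, n=100_001):
    """Unitary Fourier transform at wavenumber k. sigma_X is even, so the
    transform is real: (1/sqrt(2 pi)) * integral of sigma_X(x) cos(k x) dx."""
    xs = np.linspace(-0.5, 0.5, n)
    ys = sigma_X(xs) * np.cos(k * xs)
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0) / np.sqrt(2.0 * np.pi)

ks = np.linspace(0.0, 150.0, 301)
spec = np.array([sigma_X_ft(k) for k in ks])   # positive, strictly decreasing
```

On this grid, the spectrum stays strictly positive and strictly decreasing, with no side lobes, consistent with the no-zeros requirement above.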

Noting that N[σX] = 1/(8√2), we then define our preferred metric, dC ≡ dgen[σX]:

$$d_C[V_1, V_2; \tau]_T \equiv \frac{1}{8\sqrt{2}}\sqrt{\int_{-\infty}^{\infty}\left(\int_T \sigma_X((t - t')/\tau)\,(V_1(t') - V_2(t'))\,dt'/\tau\right)^{2} dt \Big/ |T|} \qquad (2.52)$$

Identity of indiscernibles for dC

The frequency-space expression for dgen, Eq. (2.46), allows us to see that dC returns 0 if and only if the two membrane potentials it compares are equal over the time domain of interest, which is important for the validity of the metric:

$$d_C[V_1, V_2; \tau]_T = d_{gen}[\sigma_X][V_1, V_2; \tau]_T = \frac{1}{8\sqrt{2\,|T|}}\sqrt{\int_{-\infty}^{\infty}\left|\tilde{\sigma}_X(\tau\omega)\right|^{2}\cdot\left|\tilde{h}(\omega)_T\right|^{2} d\omega} \qquad (2.53)$$

Given that σX's Fourier transform,

$$\tilde{\sigma}_X(k) = 2\sqrt{\frac{2}{\pi}}\left(\frac{2}{k}\right)^{2}\mathrm{Cin}(k/2)\,, \qquad (2.54)$$

has no zeros, dC will be zero strictly for h̃(ω)T = 0, which will be the case if and only if h(t)T = 0, and this is equivalent to V1(t) = V2(t) over all of T (see Eq. (2.43)).

Figure 2.3: Gamma function, γX(x), showing the behavior of dC as spike pairs separate.

2.4.4 Demonstrations

We are now ready to compare the performance of our convolution-based metric, dC, to that of the van Rossum metric, D²vR, and the Victor-Purpura metric, DVP. We will do this by evaluating all three metrics with, as arguments, computer-generated neural data of several different kinds. One of the data arguments will be a systematically altered version of the other. We will plot each metric versus an "offset" parameter that controls the amount of alteration that occurs. The data include randomly generated Poisson spike trains and simulated Hodgkin-Huxley neuron data (standard model parameters used; see, e.g., Dayan and Abbott (2001)). The relevant plots are found in Fig. 2.4.

In order to compare the metrics on the same plot, we choose their parameters and scale them such that, for delta-function spike input, they all initially increase at the same rate with spike separation and reach the same maximum values as spikes get very far apart. Choosing a smoothing time of τX = 50 ms for dC, this requires a time constant τvR = τX γX(1) ≈ 9 ms for D²vR and a shift cost qVP = 2 cin/(τX γX(1)) ≈ 0.22 ms⁻¹ for DVP (using cin = 1). Leaving dC unscaled, we must scale D²vR by αγX(1)/√(n|T|τX), where T is our time domain, n is the number of spikes that occur, and α is the area of a spike. Finally, DVP must be scaled by αγX(1)/(2 cin √(n|T|τX)).

Performing these scalings requires an estimate of the spike area, α, for simulated neurons. The procedure used to produce this estimate is provided in App. 2.9.


Figure 2.4: Demonstrations. Left column: dC (×), scaled D²vR (∗), and scaled DVP metrics evaluated on several different types (A–E) of computer-generated neural data versus the offset of features of the data. The two arguments for the metrics are a randomly generated signal and a modified version of the same signal. See text for details. Right column: sample raw and σX-convolved data of each type. Gray trace is the 50-ms-offset version of the black trace. (A) Delta function Poisson spikes at 4 Hz (on average). (B) Hodgkin-Huxley (HH) neuron under current noise sufficient to cause 4 Hz (average) spiking. (C) HH neuron under current noise insufficient to cause spiking. (D) HH neuron under input similar to C plus non-offset 4 Hz Poisson pulse input. (E) HH neuron under input similar to C plus fixed (5 ms) offset 4 Hz Poisson pulses.


Fig. 2.4(A) shows the metrics evaluated for 1-second-long 4 Hz Poisson spike trains (generated by randomly inserting four spikes into a 1-second block of time). The delta function spike model is used as membrane potential input to dC, with α set to¹⁰ 138 µV·s. Each point in the graph on the left represents a metric distance between two versions of the same randomly generated Poisson spike train: the original and a copy offset by the amount of time indicated on the x-axis. To keep the endpoints of the recordings fixed, 0.11 seconds of padding silence is added before and after the 1 second of spiking. Only this 1 second during which spikes occur is offset, and by no more than the padding silence, so that the endpoints of the recording remain unchanged. A sample pair of spike trains is seen at the top right: the original randomly generated train uses the darker line; its copy, offset by 50 ms, appears in lighter gray.

On the left, markers indicate the average distance between an unaltered and offset Poisson process across 600,000 realizations of the process. Error bars indicate the sample standard deviation across realizations. The plot shows that the metrics share an initial region of linear increase transitioning into a more or less level maximum. The extent of the linear region is similar for all three metrics. Shape differences between the curves exist, particularly between the DVP curve and the others, reflecting the sudden transition between the offset-dependent and -independent parts of DVP's response. The D²vR and dC curves have a more graduated transition and closely agree.

On the bottom right side of Fig. 2.4(A), we have the σX-convolutions corresponding to the two rasters in the top right (assuming a delta function spike area of 138 µV·s). This illustrates the first implicit step in the processing that dC uses to compare data. Again, the darker line corresponds to the original, and the gray line is the offset version.

In Fig. 2.4(B), we see the first of our four plots that use simulated data from a Hodgkin-Huxley neuron as input to dC. Spike train inputs to D²vR and DVP are generated by applying a simple spike-detection algorithm to the recorded membrane potential: peaks in the potential occurring above −20 mV are recorded as spikes.

The input to the neuron is a simple injected current signal, generated by a Gaussian random walk process with a 50 ms time decay toward an equilibrium value of −1.5 µA/cm². Steps in the random walk are chosen from a normal distribution with zero mean and standard deviation $4.0\sqrt{\delta t/(50\ \mathrm{ms})}$ µA/cm², where δt is the step size for the simulation (0.01 ms in our case). The power spectrum for fluctuations in this process plateaus below 20 Hz, above which it falls off as 1/|f|². The input offers a simplified representation of the aggregate current a neuron might receive at some point in vivo.
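The input process can be sketched as follows (an illustrative reconstruction under the stated parameters; the exact update rule used in the dissertation is not spelled out, so the Ornstein-Uhlenbeck-style discretization here is an assumption):

```python
import numpy as np

def noise_current(duration_ms, dt_ms=0.01, tau_ms=50.0, i_eq=-1.5,
                  step_sd=4.0, rng=None):
    """Gaussian random walk decaying toward i_eq with a 50 ms time constant;
    per-step noise s.d. is step_sd * sqrt(dt/tau) (units: microamps/cm^2)."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(duration_ms / dt_ms))
    sd = step_sd * np.sqrt(dt_ms / tau_ms)
    steps = rng.normal(0.0, sd, size=n)          # pre-draw all noise steps
    i = np.empty(n)
    i[0] = i_eq
    for k in range(1, n):
        # decay toward equilibrium plus a fresh Gaussian step
        i[k] = i[k - 1] + (i_eq - i[k - 1]) * dt_ms / tau_ms + steps[k]
    return i
```

The stationary fluctuation amplitude of this process is step_sd/√2 about i_eq, and its spectrum is flat well below 1/(2π·50 ms) and falls as 1/f² above, matching the description in the text.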

Similarly to the procedure we use for Fig. 2.4(A), in the non-offset case the neuron receives this input only during a 1-second on-period that is surrounded on both sides by 0.11 seconds of zero-input padding. The offset input is generated by shifting the on-period only. As the on-period never approaches the end of the padding, the duration of the simulation is fixed. Trials are used only if the neuron spikes precisely four times in response to the 1-second on-period, so that we again have a neuron spiking at an average rate of 4 Hz, which is consistent with what Lennie (2003) suggests for active neurons.

¹⁰This value was chosen because it is consistent with the effective spike area found for the Hodgkin-Huxley neuron simulated in making the remaining four plots.

Sample membrane potential output for the neuron under a realization of this input is shown in the top right plot of Fig. 2.4(B). Once more, the lighter-colored trace is the 50-ms-offset version of the darker trace. The tops of spikes (∼40 mV) are clipped so as to show more subthreshold detail in the recording. The convolution of the two sample recordings by σX (for τ = 50 ms) appears on the bottom right.

Metric distances are plotted on the left versus the input offset. For each offset, the metrics are evaluated on 1,100 independently generated input pairs.¹¹ Means and sample standard deviations for each metric across realizations are represented in the graph, respectively, by the markers and error bars.

In the plot, we see that, at higher offsets, our convolution-based distance, dC, indicates a significantly larger distance than do the spike train metrics. Keep in mind that we have normalized these distances to each other, so the excess is not a simple effect of normalization: dC is genuinely reporting about a 100% greater distance at the larger offsets than either of the spike train metrics. This is important because spike time differences are only one of the ways that membrane potential recordings differ from each other under this type of input. They also differ in their subthreshold dynamics, due to the membrane potential fluctuations produced by the noise current. An agreement between the metrics here would indicate that dC offers no advantage over D²vR and DVP; too much of an excess would indicate an over-emphasis on subthreshold behavior. Instead, we see that dC monitors both subthreshold and spike time differences, the former of which are ignored by spike time metrics by construction.

In Fig. 2.4(C), we reduce the amplitude of the injected current fluctuations by a factor of one half, so that random walk steps are chosen from a normal distribution of mean 0 and standard deviation $2.0\sqrt{\delta t/(50\ \mathrm{ms})}$ µA/cm². The input produces substantial subthreshold variations but rarely any spikes. Realizations are chosen such that no spikes at all are produced during the 1-second input period, and the distance is plotted versus the on-period offset. Again, 1,100 realizations are generated for each point on the graph, which represents the mean and sample standard deviation of the distance across realizations. Sample raw and σX-convolved membrane potentials are shown on the right. Here we see the response of dC to purely subthreshold dynamics for the neuron. Naturally, D²vR and DVP have no response, as there are no spikes.

¹¹Fewer realizations than in (A) are used because dC takes considerably longer to evaluate on time-extended membrane potential recordings than the spike train metrics take on discrete spike trains.


In Fig. 2.4(D), the same model for the current input as in (C) is used again. This time, however, superthreshold pulse input to the neuron is added. This input takes the form of a sudden increase to the neuron's excitatory synapse conductance,¹² which then decays with time constant 5 ms. This is sufficient to produce a spike in almost all cases. Times for these inputs are chosen as a Poisson process at 4 Hz. Runs are discarded in the event the neuron does not happen to spike in response to all pulses. A total of 1,000 membrane potential trace pairs are used to generate the data points at each offset value.

In plotting the distances versus the offset, only the current input is offset. The pulse inputs are the same for both the original and offset neural responses. This allows us another way of gauging how significant the subthreshold variations in the membrane potential are to dC: this time in the presence of (fixed) spikes. In the sample membrane potential and convolution plots on the right, we can see that the spikes in the non-offset and 50-ms-offset versions are highly overlapping. On the left, we see that while all metrics respond to the current offset, dC's response is much greater. The spike train metrics respond because the current offset has a small but non-negligible effect on the spike latency in response to the superthreshold pulses. The response of dC is greater still, since the slight spike timing jitter this introduces is a minor effect compared to the changes to the membrane potential introduced by the current offset.

Finally, Fig. 2.4(E) shows a plot similar to (D), with the difference that instead of having the superthreshold pulse input exactly coincident in both the original and altered versions of the input (the neural responses to which the metrics compare), we offset the pulses by exactly 5 ms each time. This number does not change throughout the plot. Just as in (D), only the current has its offset varied. The importance is that (E) allows us to see the relative effect of varying the subthreshold current in the presence of fixed, non-zero differences in the superthreshold input. As we see, only our convolution-based distance, dC, shows significant variation over the course of the plot, with the van Rossum and Victor-Purpura distances holding steady on average. Spike latency effects are not systematically affecting the spike train metrics because, while offsets to the current do shift spikes, they make spike offsets smaller just as often as larger. Once more, we see that dC provides access to information that D²vR and DVP do not. As in (D), 1,000 independent trace pairs contribute at each offset plotted in the distance vs. offset graph. Sample raw and σX-convolved membrane potentials appear at right.

Our demonstrations indicate that, as regards spike-time offsets alone, dC has a response that is similar to D²vR and DVP. We have also seen several situations in which dC captures a substantial amount of additional, meaningful information regarding differences in the subthreshold behavior of the membrane potential, and therefore differences in the neuron's subthreshold inputs, beyond what is reported by D²vR or DVP.

¹²The increment is 0.1 mS/cm²; 0 mV is used for the excitatory synapse reversal potential.


2.5 Discussion

We have seen that a generalized convolution-based metric, dgen[σ], of the form Eq. (2.16), has a first order response both to same-time membrane potential differences, V1(t) − V2(t), and to differences in spike timing between V1 and V2, provided that its kernel, σ, satisfies the condition Eq. (2.30). After further exploring some of the generalized metric's response characteristics, both to spikes and to fluctuations of various frequencies in the membrane potential, we obtained a specific choice of kernel, σX (Eq. (2.49)), that satisfies Eq. (2.30), preserves a near-maximal amount of high-frequency information in the membrane potentials, is otherwise free of frequency bias, and causes the metric to return 0 only for identical recordings. Applying dgen to this kernel leads to the convolution-based metric, dC (Eq. (2.52)), argued for in this work. A kernel that discards high-frequency components (more suitable for applications with substantial high-frequency noise) while retaining the other properties appears in App. 2.8.

On the subject of computing dC for sampled data, one needs to do a quick and accurate job of approximating the integrals involved. Evaluating the convolution operation, $\int_T \sigma_X((t - t')/\tau)\cdot(V_1(t') - V_2(t'))\,dt'/\tau$, requires a separate integral for each point, t, within τ/2 of T and is the most intensive part of the computation. Time can be saved by thoughtfully (and sparsely) choosing the points at which the convolution is explicitly evaluated versus the points at which it can be interpolated.

One can also save computational overhead by omitting points from the membrane potential difference that can be linearly interpolated from surrounding points. If one takes this approach, it is crucial to remember that the product, σX((t − t′)/τ)·(V1(t′) − V2(t′)), will not be a straight line between the sampled points but rather a more complex (though analytic) function. The second derivative (usually) diverges at the point t − t′ = 0, so one should not attempt a trapezoidal integration without sampling the product at this point and in its vicinity. A piecewise analytic integral avoids this necessity. The density of sampled points, and thus the required number of computations, can often be further reduced by using a cubic spline or other polynomial approximation for V1(t) − V2(t).
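As an illustration, here is a direct discrete approximation of dC using plain Riemann sums (a sketch under the simplest choices, not the dissertation's code; it ignores the quadrature refinements just discussed and so is adequate only for densely sampled data):

```python
import numpy as np

def d_C(v1, v2, dt, tau, n_sigma=1.0 / (8.0 * np.sqrt(2.0))):
    """Discrete approximation of d_C (Eq. (2.52)) for two membrane potentials
    sampled at interval dt over the domain T. Riemann sums throughout."""
    def sigma_X(x):
        ax = np.abs(np.asarray(x, dtype=float))
        out = np.zeros_like(ax)
        inside = (ax > 0) & (ax < 0.5)
        out[inside] = 4.0 * (1.0 + 2.0 * ax[inside] * (np.log(2.0 * ax[inside]) - 1.0))
        out[ax == 0.0] = 4.0
        return out
    h = np.asarray(v1, dtype=float) - np.asarray(v2, dtype=float)  # zero outside T
    T_len = len(h) * dt
    half = int(np.ceil(0.5 * tau / dt))        # kernel support half-width, tau/2
    k = np.arange(len(h))
    ts = np.arange(-half, len(h) + half)       # all t within tau/2 of T
    # s(t) = sum_k sigma_X((t - t_k)/tau) h(t_k) dt / tau   (cf. Eq. (2.42))
    s = np.array([np.sum(sigma_X((t - k) * dt / tau) * h) * dt / tau for t in ts])
    return float(n_sigma * np.sqrt(np.sum(s ** 2) * dt / T_len))
```

By construction, this discrete version inherits the metric's key behaviors: it vanishes only for identical inputs, is invariant under a common shift of both potentials, and responds linearly to a uniform scaling of the difference.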

It is worthwhile to note that dC is an inner product metric: it applies the L² Hilbert space inner product norm to linearly transformed (convolved) membrane potentials. This means that the Hilbert space inner product itself induces a useful quantification:

$$C\langle V_1, V_2; \tau\rangle_T \equiv N[\sigma_X]^{2}\int_{-\infty}^{\infty} V_1^{\sigma_X}(t;\tau)_T \cdot V_2^{\sigma_X}(t;\tau)_T \, dt \,/\, |T|$$

$$= \frac{1}{128}\int_{-\infty}^{\infty}\left(\,\iint_{T\times T} \sigma_X\!\left(\frac{t - t'}{\tau}\right) V_1(t')\cdot\sigma_X\!\left(\frac{t - t''}{\tau}\right) V_2(t'')\,\frac{dt'\,dt''}{\tau^{2}}\right)\frac{dt}{|T|} \qquad (2.55)$$

This "convolution-based inner product" has a similar meaning to a dot product between vectors in a finite-dimensional vector space. Since the Cauchy-Schwarz inequality holds for the L² inner product, we may quantify the "collinearity" of two recordings by taking:

$$C\langle V_1, V_2; \tau\rangle_T \Big/ \sqrt{C\langle V_1, V_1; \tau\rangle_T \times C\langle V_2, V_2; \tau\rangle_T} \qquad (2.56)$$

This ratio has a maximum of unity that occurs strictly for V1 and V2 that differ by at most a multiplicative constant. It decreases with discrepancies in the timing of various features, including spikes, and with local membrane potential displacements up or down. Defining voltages in such a way that the means of V1 and V2 are close to zero makes this measure more informative. Paiva et al. (2009) and Schreiber et al. (2003) discuss similar measures for spike trains.
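A discrete sketch of this collinearity measure (illustrative only; Riemann sums, with σX as in Eq. (2.49)):

```python
import numpy as np

def sigma_X(x):
    """Kernel of Eq. (2.49)."""
    ax = np.abs(np.asarray(x, dtype=float))
    out = np.zeros_like(ax)
    inside = (ax > 0) & (ax < 0.5)
    out[inside] = 4.0 * (1.0 + 2.0 * ax[inside] * (np.log(2.0 * ax[inside]) - 1.0))
    out[ax == 0.0] = 4.0
    return out

def collinearity(v1, v2, dt, tau):
    """Discrete version of Eq. (2.56): the cosine of the 'angle' between the
    sigma_X-convolved recordings; equals 1 when they differ by at most a
    positive multiplicative constant."""
    v1, v2 = np.asarray(v1, dtype=float), np.asarray(v2, dtype=float)
    half = int(np.ceil(0.5 * tau / dt))        # kernel support half-width
    k = np.arange(len(v1))
    ts = np.arange(-half, len(v1) + half)
    def smooth(v):
        return np.array([np.sum(sigma_X((t - k) * dt / tau) * v) * dt / tau
                         for t in ts])
    a, b = smooth(v1), smooth(v2)
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))
```

By the Cauchy-Schwarz inequality, the returned value lies in [−1, 1], with 1 attained exactly for proportional recordings.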

Our metric may be beneficially applied to physiological recordings or computer simulation data any time that spike timing and subthreshold signal differences are both of interest. Some specific applications for dC include the following:

1. comparing simulated model neuron behavior to experimentally observed neurons

2. comparing simulated neural network behavior to experimental observations

3. comparing state trajectories for simulated or observed neural systems under differinginitial or external conditions

To extend to a metric over time courses for ensembles rather than individual neurons, one can combine distances between corresponding neural signals in the ensemble, for example by taking a root sum of squares (RSS) over distances between individual signals. If it is ambiguous or unimportant which neurons correspond to each other, one can apply the metric to the "bulk," ensemble-averaged signals. More refined constructions, such as the one Houghton and Sen (2008) apply to DvR, may be applied to dC as well.
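The RSS combination described above can be sketched trivially (an illustrative helper, not from the dissertation; it assumes the per-neuron dC distances have already been computed):

```python
import numpy as np

def ensemble_distance(per_neuron_distances):
    """Root sum of squares (RSS) over the d_C distances between corresponding
    neurons' recordings in two ensembles."""
    d = np.asarray(per_neuron_distances, dtype=float)
    return float(np.sqrt(np.sum(d ** 2)))
```

The RSS choice makes the ensemble metric behave like a Euclidean norm over the per-neuron distances, consistent with dC's own Euclidean-like response to individual differences.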

Recent advances in optical methods (e.g., Quirin et al., 2014; St-Pierre et al., 2014) and continuing advances in computational neuroscience (see Markram, 2012) make it possible to obtain simultaneous data for increasingly many neurons (Yuste & Church, 2014). As this data comes online, it is important to think about ways to process it. Our dC metric, which can be applied to chemical (e.g., calcium) data as well as to membrane potentials, may be particularly relevant here, since it extends so naturally to a metric over neural ensemble data.

By applying the metric to simulated neural data, we have confirmed that it responds in a desirable way to complex membrane potentials, increasing initially linearly with timing offset for several types of randomly generated data, including data containing multiple spikes and data containing no spikes. We have seen that dC provides a considerable amount of information additional to that available from spike train metrics.


The membrane potential contains information about what a neuron is "hearing" in addition to what it is "saying." The dC metric offers an elegant tool that simultaneously quantifies differences in both types of information.


2.6 Proof of Eq. (2.29)

In this appendix, we will prove the following:

Proposition: If $\int_{-\infty}^{\infty}\sigma'(x)^{2}\,dx$ converges, then for sufficiently small |xd|,

$$\gamma_{gen}[\sigma](x_d) \approx N[\sigma]\,|x_d|\sqrt{\int_{-\infty}^{\infty}\sigma'(x)^{2}\,dx} \qquad (2.57)$$

Proof:

The derivation begins with Eq. (2.27):

\[
\gamma_{\mathrm{gen}}[\sigma](x_d) = N[\sigma]\sqrt{\int_{-\infty}^{\infty}\left(\sigma(x)-\sigma(x+x_d)\right)^2 dx} \tag{2.58}
\]

Differentiating this expression, we find

\[
\frac{d}{dx_d}\gamma_{\mathrm{gen}}[\sigma](x_d) = N[\sigma]\,\frac{-2\int_{-\infty}^{\infty}\left(\sigma(x)-\sigma(x+x_d)\right)\sigma'(x+x_d)\,dx}{2\sqrt{\int_{-\infty}^{\infty}\left(\sigma(x)-\sigma(x+x_d)\right)^2 dx}} \tag{2.59}
\]
\[
= -N[\sigma]^2\,\frac{\int_{-\infty}^{\infty}\left(\sigma(x)-\sigma(x+x_d)\right)\sigma'(x+x_d)\,dx}{\gamma_{\mathrm{gen}}[\sigma](x_d)} \tag{2.60}
\]

Consider the one-sided limits,

\[
\lim_{x_d\to 0^\pm}\frac{d}{dx_d}\gamma_{\mathrm{gen}}[\sigma](x_d) \tag{2.61}
\]
\[
= -N[\sigma]^2 \lim_{x_d\to 0^\pm}\frac{\int_{-\infty}^{\infty}\sigma(x)\,\sigma'(x+x_d)\,dx - \int_{-\infty}^{\infty}\sigma(x+x_d)\,\sigma'(x+x_d)\,dx}{\gamma_{\mathrm{gen}}[\sigma](x_d)} \tag{2.62}
\]
\[
= -N[\sigma]^2 \lim_{x_d\to 0^\pm}\frac{\int_{-\infty}^{\infty}\sigma(x-x_d)\,\sigma'(x)\,dx - \int_{-\infty}^{\infty}\sigma(x)\,\sigma'(x)\,dx}{\gamma_{\mathrm{gen}}[\sigma](x_d)} \tag{2.63}
\]

Assuming \(\sigma\) is continuous, both numerator and denominator of Eq. (2.63) go to zero in these limits, in which case we may apply L'Hôpital's rule:


\[
\lim_{x_d\to 0^\pm}\frac{d}{dx_d}\gamma_{\mathrm{gen}}[\sigma](x_d) = -N[\sigma]^2\left(\lim_{x_d\to 0^\pm}\frac{d}{dx_d}\gamma_{\mathrm{gen}}[\sigma](x_d)\right)^{-1} \cdot \lim_{x_d\to 0^\pm}\frac{d}{dx_d}\int_{-\infty}^{\infty}\sigma(x-x_d)\,\sigma'(x)\,dx \tag{2.64}
\]
\[
\Rightarrow \left(\frac{1}{N[\sigma]}\lim_{x_d\to 0^\pm}\frac{d}{dx_d}\gamma_{\mathrm{gen}}[\sigma](x_d)\right)^2 = -\lim_{x_d\to 0^\pm}\frac{d}{dx_d}\int_{-\infty}^{\infty}\sigma(x-x_d)\,\sigma'(x)\,dx \tag{2.65}
\]
\[
= \lim_{x_d\to 0^\pm}\int_{-\infty}^{\infty}\sigma'(x-x_d)\,\sigma'(x)\,dx \tag{2.66}
\]
\[
= \int_{-\infty}^{\infty}\sigma'(x)^2\,dx \tag{2.67}
\]
\[
\Rightarrow \lim_{x_d\to 0^\pm}\frac{d}{dx_d}\gamma_{\mathrm{gen}}[\sigma](x_d) = \pm N[\sigma]\sqrt{\int_{-\infty}^{\infty}\sigma'(x)^2\,dx} \tag{2.68}
\]
\[
\Rightarrow \gamma_{\mathrm{gen}}[\sigma](x_d) \approx N[\sigma]\,|x_d|\sqrt{\int_{-\infty}^{\infty}\sigma'(x)^2\,dx} \,:\ |x_d| \text{ sufficiently small} \tag{2.69}
\]

The last two conclusions follow since \(\gamma_{\mathrm{gen}}\) has a minimum of 0 at \(x_d = 0\).

This proof assumes continuity for \(\sigma\) and that the integral, \(\int_{-\infty}^{\infty}\sigma(x-x_d)\,\sigma'(x)\,dx\), is differentiable. The convergence of \(\int_{-\infty}^{\infty}\sigma'(x)^2\,dx\) entails both, since squares of Dirac delta functions cannot be integrated and since:

\[
\left|\frac{d}{dx_d}\int_{-\infty}^{\infty}\sigma(x-x_d)\,\sigma'(x)\,dx\right| = \left|\int_{-\infty}^{\infty}\sigma'(x-x_d)\,\sigma'(x)\,dx\right| \le \left|\int_{-\infty}^{\infty}\sigma'(x)^2\,dx\right| \tag{2.70}
\]
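As a numerical sanity check of Eq. (2.69) (not part of the original derivation), one can evaluate \(\gamma_{\mathrm{gen}}\) on a Gaussian test function. The sketch below assumes the normalization \(N[\sigma] = \left(\int \sigma'(x)^2\,dx\right)^{-1/2}\), which is consistent with the value quoted for \(\sigma_N\) in Sec. 2.8; under that assumption the predicted small-offset slope is exactly 1:

```python
import numpy as np

# Gaussian test function sigma(x) = exp(-x^2/2) on a dense, wide grid
x = np.linspace(-10.0, 10.0, 200001)
sigma = lambda u: np.exp(-u ** 2 / 2.0)

dsigma = -x * sigma(x)            # sigma'(x) in closed form
I = np.trapz(dsigma ** 2, x)      # int sigma'(x)^2 dx (= sqrt(pi)/2 here)
N = 1.0 / np.sqrt(I)              # assumed normalization N[sigma]

def gamma_gen(xd):
    """gamma_gen[sigma](xd) = N[sigma] sqrt(int (sigma(x)-sigma(x+xd))^2 dx)."""
    return N * np.sqrt(np.trapz((sigma(x) - sigma(x + xd)) ** 2, x))

# Eq. (2.69): gamma_gen(xd) ~ N |xd| sqrt(I) = |xd| for small |xd|
print(gamma_gen(0.01) / 0.01)   # close to 1
```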


2.7 Triangle inequality for \(d_{\mathrm{gen}}\)

The triangle inequality for \(d_{\mathrm{gen}}\) will be more accessible if we first prove that:

Proposition:

For any three bounded functions, \(x(t)\), \(y(t)\), \(z(t)\), with \(|z(t)| \le |x(t)+y(t)|\):

\[
\sqrt{\int_T x(t)^2\,dt} + \sqrt{\int_T y(t)^2\,dt} \ge \sqrt{\int_T z(t)^2\,dt} \tag{2.71}
\]

where \(T\) is a finite domain.

Proof:

This inequality follows easily from the Cauchy inequality for \(L^2\) Hilbert spaces:

\[
\int_{-\infty}^{\infty} x(t)^2\,dt \int_{-\infty}^{\infty} y(t')^2\,dt' \ge \left(\int_{-\infty}^{\infty} x(t)\,y(t)\,dt\right)^2 \tag{2.72}
\]

for square-integrable x(t) and y(t).

Taking the square root, we have, for any bounded functions, \(x(t)\), \(y(t)\),

\[
\sqrt{\int_T x(t)^2\,dt \int_T y(t')^2\,dt'} \ge \int_T x(t)\,y(t)\,dt \tag{2.73}
\]

Doubling both sides and adding \(\int_T x(t)^2\,dt + \int_T y(t)^2\,dt\) gives:

\[
\int_T x(t)^2\,dt + \int_T y(t)^2\,dt + 2\sqrt{\int_T x(t)^2\,dt \int_T y(t')^2\,dt'} \ge \int_T x(t)^2\,dt + \int_T y(t)^2\,dt + 2\int_T x(t)\,y(t)\,dt \tag{2.74}
\]
\[
\left(\sqrt{\int_T x(t)^2\,dt} + \sqrt{\int_T y(t)^2\,dt}\right)^2 \ge \int_T \left(x(t)+y(t)\right)^2 dt \tag{2.75}
\]
\[
\sqrt{\int_T x(t)^2\,dt} + \sqrt{\int_T y(t)^2\,dt} \ge \sqrt{\int_T z(t)^2\,dt} \tag{2.76}
\]

With this inequality accessible, it is straightforward to prove:


Proposition, Triangle inequality for \(d_{\mathrm{gen}}\):

Given three well-behaved time courses, \(V_1(t)\), \(V_2(t)\), \(V_3(t)\), and a bounded kernel, \(\sigma(x)\):

\[
d_{\mathrm{gen}}[\sigma][V_1(t), V_2(t);\tau]_T + d_{\mathrm{gen}}[\sigma][V_2(t), V_3(t);\tau]_T \ge d_{\mathrm{gen}}[\sigma][V_1(t), V_3(t);\tau]_T \tag{2.77}
\]

Proof:

Define the functions, \(s_{12}[\sigma](t)_T\), \(s_{23}[\sigma](t)_T\), and \(s_{13}[\sigma](t)_T\), according to:

\[
s_{ij}[\sigma](t)_T \equiv \int_T \sigma\!\left((t-t')/\tau\right)\left(V_i(t') - V_j(t')\right)dt'/\tau \tag{2.78}
\]

This gives:

\[
d_{\mathrm{gen}}[\sigma][V_i(t), V_j(t);\tau]_T = N[\sigma]\sqrt{\int_{-\infty}^{\infty} s_{ij}[\sigma](t)_T^{\,2}\,dt\,/\,|T|} \tag{2.79}
\]

We will assume that well-behaved \(V_i\) means the \(s_{ij}\) are bounded. Recognizing that:

\[
s_{13}(t)_T = s_{12}(t)_T + s_{23}(t)_T \tag{2.80}
\]
\[
\Rightarrow |s_{13}(t)_T| \le |s_{12}(t)_T + s_{23}(t)_T| \tag{2.81}
\]

the proposition follows from Eq. (2.71).
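A numerical spot-check of this triangle inequality (illustrative, not part of the proof): the sketch below discretizes Eqs. (2.78)–(2.79) on a finite grid with a Gaussian stand-in kernel and random signals, setting \(N[\sigma]=1\) since a constant prefactor does not affect the inequality. Because \(s_{13} = s_{12} + s_{23}\) holds exactly even after discretization, the inequality holds for the discretized distances as well:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2001)          # time grid; |T| = 1 s
tau = 0.05
sigma = lambda u: np.exp(-u ** 2 / 2.0)  # bounded stand-in kernel

def d_gen(V1, V2):
    """Discretized d_gen per Eqs. (2.78)-(2.79), with N[sigma] = 1."""
    diff = V1 - V2
    # s(t) = int_T sigma((t - t')/tau) (V1(t') - V2(t')) dt'/tau
    s = np.array([np.trapz(sigma((ti - t) / tau) * diff, t) / tau for ti in t])
    return np.sqrt(np.trapz(s ** 2, t) / (t[-1] - t[0]))

V1, V2, V3 = (rng.standard_normal(t.size) for _ in range(3))
d12, d23, d13 = d_gen(V1, V2), d_gen(V2, V3), d_gen(V1, V3)
print(d12 + d23 >= d13)   # True
```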


2.8 Kernel for noisy data

In the presence of significant detection noise, i.e., noise that is not intrinsic to the neuron or network of neurons being observed but rather originates in the act of observation per se, one may prefer a metric that disregards high frequency information rather than preserves as much of it as possible. Nonetheless, one still wants the metric to avoid bias toward or against specific frequency bands and to yield zero strictly for identical inputs (identity of indiscernibles). The following kernel, \(\sigma_N\), is applicable to such a case:

\[
\sigma_N(x) \equiv \begin{cases} \dfrac{36}{10} - 192x^2 + 768\left(|x|^3 - x^4\right) - \dfrac{3072}{10}|x|^5 & : |x| \le \dfrac{1}{4} \\[6pt] \dfrac{48}{5}\left(1 - 2|x|\right)^5 & : \dfrac{1}{4} < |x| \le \dfrac{1}{2} \\[6pt] 0 & : \text{otherwise} \end{cases} \tag{2.82}
\]

Since it is a piecewise polynomial, \(\sigma_N\) satisfies our main restriction for kernels, Eq. (2.30), with \(\int_{-\infty}^{\infty}\sigma_N'(x)^2\,dx = \frac{3512}{35}\), giving \(N[\sigma_N] = \frac{1}{2}\sqrt{\frac{35}{878}}\). The Fourier transform for \(\sigma_N\) is:

\[
\tilde{\sigma}_N(k) = \frac{36}{\sqrt{2\pi}}\,\frac{\left(1 - \operatorname{sinc}(k/4)\right)^2}{(k/4)^4}, \quad \text{where } \operatorname{sinc} x \equiv \frac{\sin x}{x} \tag{2.83}
\]

This \(\tilde{\sigma}_N\) is zero-free, oscillation-free and rolls off as \(1/k^4\) owing to \(\sigma_N\)'s continuous second derivative. This is much faster than the other kernels we have discussed.
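The values quoted above, \(\int \sigma_N'(x)^2\,dx = 3512/35\) and \(N[\sigma_N] = \frac{1}{2}\sqrt{35/878}\), can be checked numerically from Eq. (2.82). A sketch (assuming, as before, the normalization \(N[\sigma] = (\int \sigma'^2)^{-1/2}\)):

```python
import numpy as np

def sigma_N(x):
    """The noise-robust kernel of Eq. (2.82)."""
    ax = np.abs(x)
    out = np.zeros_like(ax, dtype=float)
    m1 = ax <= 0.25
    m2 = (ax > 0.25) & (ax <= 0.5)
    out[m1] = (36/10 - 192 * ax[m1]**2 + 768 * (ax[m1]**3 - ax[m1]**4)
               - (3072/10) * ax[m1]**5)
    out[m2] = (48/5) * (1 - 2 * ax[m2])**5
    return out

x = np.linspace(-0.6, 0.6, 1_200_001)
s = sigma_N(x)
I = np.trapz(np.gradient(s, x) ** 2, x)   # int sigma_N'(x)^2 dx
print(I)                # close to 3512/35 ~ 100.343
print(1 / np.sqrt(I))   # N[sigma_N] = (1/2) sqrt(35/878) ~ 0.0998
```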


2.9 Estimating the spike area for the simulated Hodgkin-Huxley neuron

In Sec. 2.4.4, it is necessary to estimate the effective area of the simulated Hodgkin-Huxley neuron's spikes to produce Fig. 2.4. I make this estimate by repeatedly evaluating our convolution-based metric, \(d_C[V_1, V_2; \tau]\), on pairs of simulated membrane potentials, both of which contain a single spike, and varying the time difference, \(t_d\), between the spikes. The time sensitivity, \(\tau\), is set to \(\tau_X = 50\) ms, the same value we use for \(d_C\) in Sec. 2.4.4. Performing a linear regression on \(d_C\) versus \(\gamma_X(t_d/\tau_X)\) produces the estimate for \(\alpha\).

A separate estimate for \(\alpha\) is made for each of the panels, (B–E), in Fig. 2.4: the differences from panel to panel in the type and nature of the neuron's input could have some effect on the average area of its spikes. For each estimate, I generate sections of noise current input 0.25 seconds in length according to the same process that generates the noise current applied to the neuron in the relevant panel. These are down-selected on the requirement that the neuron does not spike during the 0.25 seconds. A superthreshold synaptic pulse is then added to the middle of the input, producing a single spike in the simulated neuron there. A second recording is produced according to the same procedure with the pulse shifted by an amount between 0–25 ms. Spike times are extracted from both recordings via a simple spike detection algorithm that labels local maxima above −20 mV as spikes. 2,200 pairs of recordings are produced in this way, not all of which are used: sometimes one or both recordings contain a number of spikes other than one, in which case the pair is discarded.

For the non-discarded recording pairs, I record the difference, \(t_d\), between the spike times, and evaluate the convolution-based distance, \(d_C[V_1, V_2; \tau_X]\), between the recordings. With these values in hand, a collection of ordered pairs, \((\gamma_X(t_d), d_C)\), is then constructed. Regressing the distance, \(d_C\), versus \(\gamma_X(t_d)\), produces a regression coefficient that, when multiplied by \(\sqrt{\tau_X |T|}\), where \(|T| = 0.25\) s, gives an estimate of \(\alpha\) for the simulated neuron valid for the form of input used. This estimate is then used to perform the scalings of \(D^2_{\mathrm{vR}}\) and \(D_{\mathrm{VP}}\) for that type of input (corresponding to one of the panels in Fig. 2.4). For each input type, the estimate is close to 138 µV·s, with estimation error ∼1%.
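The final regression step can be sketched as follows. This function is illustrative (the name and the regression-through-the-origin choice are my own assumptions; the text does not specify the regression form), operating on placeholder arrays of paired \(\gamma_X(t_d)\) and \(d_C\) values:

```python
import numpy as np

def estimate_alpha(gamma_vals, dC_vals, tau_X=0.05, T_len=0.25):
    """Estimate the effective spike area alpha from paired samples.

    gamma_vals, dC_vals : paired gamma_X(t_d) and d_C values for the
        retained single-spike recording pairs (placeholder inputs here).
    tau_X : time sensitivity in seconds (50 ms in the text).
    T_len : recording length |T| in seconds (0.25 s in the text).
    """
    g = np.asarray(gamma_vals, dtype=float)
    d = np.asarray(dC_vals, dtype=float)
    # Least-squares slope through the origin (an assumed regression form):
    slope = np.sum(g * d) / np.sum(g ** 2)
    # The regression coefficient times sqrt(tau_X |T|) gives alpha.
    return slope * np.sqrt(tau_X * T_len)
```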

The contribution of the estimation error for \(\alpha\) to the error on the scaled versions of the spike train metrics is included in the spike train metrics' error bars in Fig. 2.4.


3 Combinatorial Connectionist Semantic Network Method

3.1 Introduction

Experimental detections of neurons that apparently respond to single concepts have left the field of neuroscience, over the last decade or so, wondering how to incorporate them into its view of the brain. Some of these experiments are reviewed in Roy (2014) and include Quiroga's (2005) observations of, e.g., a "Halle Berry" cell: a cell selectively responsive to different photos of the actress and also to her written and spoken name. Other experiments have found category-specific cells in both humans and animals that respond to broader visual concepts, such as natural scene, house, nest/bed, animal face, and human face.

There has been a long-standing view in the neuroscience community that concept cells cannot be real, which has led to some resistance to these results. The stance taken against concept cells is largely based on the rationale that there is a combinatorial explosion problem for concept cells: there are more potential concepts than neurons—because every different combination of basic features (or other concepts) can form the basis for a new concept. While this observation is quite true, the logic that says it precludes concept cells seems to rely on certain notions that were widespread at the time that the rejection of concept cells took hold, notions which have since been revisited.

The first of these notions concerns neurons. Prior to the 1970s, it was held that the brain does not change very much after early childhood—that all its neurons and synapses are more or less rigidly in place by age 3, with very little subsequent change. From within this perspective, given the plain fact that a great proportion of the concepts that humans, anyway, use are first encountered after age 3, it would seem that in order for there to be concept cells in the brain, the cells for all concepts used during life would have to be present at age 3. Since the brain is not clairvoyant, this would require cells to exist for any possible concept that the brain might develop from age 3 forward.

It is now well-known, however, that the brain's synapses have tremendous flexibility, or plasticity, and are constantly changing throughout life. This makes it possible for neurons to become tuned and retuned to different concepts as those concepts are adopted—just as the output neurons for the perceptron are tuned to input patterns and categories by way of gradual weight changes as a result of their exposure to those inputs. Concept cells do not require the brain, at any stage, to contain a cell for every possible concept—but rather only those that are useful at the time.


The second notion concerns concepts, themselves. As discussed in Sec. 1.6, up until recently (again, ca. the 1970s), it was held that concepts had well-delineated boundaries and definitions. Psychological experiments, however, have made it clear that this is not the case: concepts, as used by humans, are fuzzy, non-rigid and non-discrete. Furthermore, psychologists have recognized the utility of fuzzy concepts: they help to avoid precisely the combinatoric explosion problem that has been seen as an argument against concept cells. When fuzzy concepts are used, not as many are needed to cover the range of phenomena that humans (or other animals) might encounter and need to deal with on any given day: stimuli that are minor modifications to a stored concept may still be recognized as belonging to the concept, and knowledge pertaining to that concept may be used to compute a response to the stimulus (Murphy, 2002, pg. 21).

A third notion concerns combinability of concepts. There has been the thought, in the neuroscience literature, that concept cells must correspond to so-called pontifical cells, which represent exact and entire stimuli, rather than cardinal cells, which stand for something many stimuli hold in common and represent a stimulus only through their mutual and collaborative activity (see Barlow, 1972, Sec. 12.3). The pontifical view, again, is not consistent with the way that concepts are recognized by psychologists, whose investigations have led them to view concepts as categories that can be combined, productively, to describe newly encountered situations and entities (Murphy, 2002, Ch. 12).

It is worthwhile to re-examine our prejudices against concept cells that were originally formed on the basis of information that has since been revised and to take seriously the view of the brain as an intricate and interdependent network of concept-representing neurons—especially in light of the concept cell experimental results cited above. Note that this view does not require that there be one and only one single neuron for every concept used by the brain. A more robust and therefore more likely strategy is to have distributed sets, or assemblies, of neurons corresponding to each concept.

Authors such as Roy (2014) are taking on precisely this task. However, there is an obstacle to be overcome, which concerns a classic objection made by the symbol-processing intelligence theorists, Fodor and Pylyshyn (1988), to the connectionist view of intelligence. Fodor and Pylyshyn made the point that, in order to achieve certain critical operations, such as reasoning by analogy, it is necessary for an intelligent system to possess what they call combinatorial symbols. Symbols are combinatorial for Fodor and Pylyshyn if, loosely speaking, they are combinable—if you can combine symbols to get composite symbols with meanings that combine the meanings of the original symbols.

The binding of semantic components either by structural network linkages or temporal correlation has been extensively discussed as a representational principle for connectionist¹

¹For clarity, the word 'connectionist', as I am using it, means "consisting of simple, more or less identical processing units that are interconnected by possibly weighted and/or directed links."


[Figure: a relation-instance node linked to role-instance nodes (Role 1 Instance through Role N Instance), each bound to a participant (A, B, …, Z) and to its role (Role 1 through Role N) in the relation type.]

Figure 3.1: Connectionist relational encoding motif. Ellipses stand for repetition.

networks (Bowers, 2009; Hummel & Biederman, 1992; Meyer & Damasio, 2009; Singer, 1999; von der Malsburg, 1994). The motif, Fig. 3.1 and cousins, has appeared in some of these places (e.g., Hinton et al., 1985; Hummel & Holyoak, 2003; Shastri & Ajjanagadde, 1993) as a technique for relational encoding that avoids the more common semantic network approach of attaching relationships to edges. In this chapter, I will extend the role-binding principle that the technique is using to a principle I will call multi-binding, and I will show how powerful this principle is. I will show that it provides a basis for combinatorial symbolism in a connectionist context.

Ideas and expression related to this chapter may be found in Evans and Collins (2013).

3.2 Combinatorial syntax

In their landmark paper, Fodor and Pylyshyn (1988) set forward the following definitive criteria for a combinatorial syntax:

(a) there is a distinction between structurally-atomic and structurally-molecular representations; (b) structurally-molecular representations have syntactic constituents that are themselves either structurally-molecular or structurally-atomic; and (c) the semantic content of a (molecular) representation is a function of the semantic contents of its syntactic parts, together with its constituent structure.

Let us parse this expression a bit. When Fodor and Pylyshyn refer to structurally-atomic and structurally-molecular representations, they are obviously drawing an analogy to chemistry. For the purposes of chemistry, matter can be thought of as ultimately consisting in indivisible units called atoms (and, probably, electrons). Atoms have properties, as atoms. However, the atoms may also be combined to form the structures we call molecules. Molecules have their own properties, and these properties come from the properties of the atoms they are made of, along with the specific manner and geometry by which the atoms have been combined to form the molecule.

For Fodor and Pylyshyn, combinatorial symbols are symbols that work in essentially the same way—with the difference that physical and chemical properties are to be replaced with semantic meaning. Structurally-atomic representations (atoms) are symbols that are indivisible in the sense that they cannot be broken up into parts that can still be ascribed a symbolic meaning. Given a set of symbol-atoms, there must be rules that allow symbol-atoms to be combined into symbol-molecules just as the laws of chemistry and physics allow physical atoms to be combined into physical molecules. Symbol-molecules may be further combined to form higher molecules (although this is not strictly required). Finally, Fodor and Pylyshyn say the meaning of a symbol-molecule must be a function of the semantic contents of the atoms and/or lower-order molecules that have been combined to produce the molecule. They are saying that the meaning for the symbol-molecule must come from the meanings of its parts along with the specific manner in which the parts have been combined to form the molecule—just as the physical and chemical properties of a physical molecule come from those of its parts along with the particular manner of their physical combination. That is, the meaning of the symbol-molecule cannot be something arbitrary—it must be determined by the meaning of the parts and the way that the parts have been put together to form the molecule.

Fodor and Pylyshyn's point was that certain types of well-known mental operations require combinatorial syntax, or symbolism. An example is reasoning by analogy. Reasoning by analogy occurs when there is some type of homology between two knowledge domains, one of which is well-understood and the other of which is not. The mind is able to map the knowledge in the known domain to new knowledge in the unknown domain. An apropos example of this is the exposition in the preceding paragraph. There, we used well-known knowledge from the domain of high-school chemistry and the relationship, "physical/chemical properties → semantic meaning," to clarify what Fodor and Pylyshyn mean by various expressions in their definition of combinatorial syntax.

In order for our minds to perform this mapping, Fodor and Pylyshyn argue, there must be a mirroring between their internal representations for the knowledge in the two domains. It must have been possible to unplug, as it were, the part of the representation for the chemistry knowledge that referred to "physical and chemical properties" and replace it with a part that refers to "meaning." For this to be possible requires the representation for both knowledge domains to be combinatorial.


3.3 Role-binding, feature-binding and multi-binding

Semantic representation requires the representation of relationships between ideas. For example, when storing geographical information about a town, it is not only important to record which streets and shops are in the town but also how these are related to each other and to other ideas: Which streets intersect? In what part of town is the intersection? Which streets run parallel? How far away from one another are they? Which shops are on a given street? Of these, which are on the same block? These are all relational questions. Scholars adopting the network paradigm for semantic representation have often chosen to place the representation for relationships onto the links between nodes, which represent ideas—see, e.g., Quillian (1967). This is combinatorial syntax, and it does provide a solution to the problem of relationship encoding.

However, it is not, strictly speaking, a connectionist approach, and it is not a basis for a theory of semantic representation in the brain. Connectionism, proper, does not allow for semantic attachments to the connections between units, and neither does the brain. Synapses between neurons are dynamical in nature. They may be differentiated from each other on the basis of their strength and the neurotransmitter used, but this is not a basis for saying that one synapse has a different semantics from another.

To address the problem of connectionist relation encoding, the role-binding motif found in Fig. 3.1 has been adopted by Hinton et al. (1985); Hummel and Holyoak (2003); Shastri and Ajjanagadde (1993) and other scholars. As nearly as I am aware, it was first put forward by Hinton. The motif works by binding participants in relationships to their roles in those relationships through the use of separate nodes for the participant–role bound element—which I will sometimes call role instances. As shown in Fig. 3.2(a), an adaptation of a figure found in Hummel and Holyoak (2003), having these role-binding nodes becomes a basis for, in this example, encoding a specific relationship between the entities, 'Bill' and 'Mary': that 'Bill loves Mary'. The role-binding nodes make clear that one entity is a lover, another is beloved, and which is which. The 'Bill loves Mary' relation node pairs this particular lover (Bill) with this particular beloved (Mary).
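The role-binding motif can be made concrete with a small sketch (illustrative only; the node names follow the figure, but the set-of-links encoding is my own choice, not part of any cited model):

```python
# Undirected links; each node's meaning is bound from its neighbours.
links = {
    ("Bill", "Bill-as-lover-of-Mary"),
    ("lover", "Bill-as-lover-of-Mary"),
    ("Mary", "Mary-as-beloved-of-Bill"),
    ("beloved", "Mary-as-beloved-of-Bill"),
    ("Bill-as-lover-of-Mary", "Bill-loves-Mary"),
    ("Mary-as-beloved-of-Bill", "Bill-loves-Mary"),
}

def neighbours(node):
    """Nodes whose meanings the given node binds together (and vice versa)."""
    return sorted({b for a, b in links if a == node} |
                  {a for a, b in links if b == node})

print(neighbours("Bill-loves-Mary"))
# ['Bill-as-lover-of-Mary', 'Mary-as-beloved-of-Bill']
```

The relation-instance node links only to the two role instances, which in turn bind the participants to their roles, mirroring the structure of Fig. 3.2(a).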

In Fig. 3.2(a), the 'Bill' and 'Mary' nodes, themselves, bind together a specific set of features, viz., 'male', 'adult', and 'human' in the case of 'Bill'. "Feature-binding" is normal for semantic network representations, wherein nodes are often seen as belonging to a kind of hierarchy. As one moves up the hierarchy, one moves from nodes that are more feature-like to nodes that combine the features. That is, from Hummel and Holyoak's point of view, whereas the 'Bill' node brings together the features, 'male', 'adult', and 'human' (a partial list), the 'Bill-as-lover' node brings together 'Bill' and 'lover' as its own features. The "Bill loves Mary" node then brings together 'Bill-as-lover' and 'Mary-as-beloved'. However, if we think of Bill's love for Mary as being part of what makes Bill Bill, the role-binder node, 'Bill-as-lover-of-Mary', can be included in the list of ideas that 'Bill' binds together.


[Figure: nodes for the relation participants 'Bill' and 'Mary'; participant aspects 'male', 'female', 'adult', 'human'; role instances 'Bill-as-lover-of-Mary' and 'Mary-as-beloved-of-Bill'; the relation instance "Bill loves Mary"; the roles 'lover' and 'beloved'; the 'loves' relation type; role aspects/supertypes 'has-emotion' and 'emotion-obj.'; the 'emotive relation' supertype; the associated 'attraction' relation with associated roles 'attracted' and 'attractor'; and the exemplar relation "Joni loves Chachi" with role exemplars 'Joni-as-lover-of-Chachi' and 'Chachi-as-beloved-of-Joni'.]

Figure 3.2: Multi-binding semantic network. (A) Left-hand side of partition: A network utilizing role- and feature-binding, adapted from Hummel and Holyoak (2003). (B) Right-hand side of partition: A multi-binding extension of this network including relationship type nodes.


This is a crucial shift because it breaks the feature–feature-combination hierarchical view. The 'Bill-as-lover-of-Mary' node is just as much a part of the meaning of 'Bill' as 'Bill' is a part of 'Bill-as-lover-of-Mary'.

We may take the same non-hierarchical approach with regard to the role-binder nodes also. The relationship, "Bill loves Mary," may be validly seen as providing the context for the role-participant-binding nodes, 'Bill-as-lover-of-Mary' and 'Mary-as-beloved-of-Bill', and therefore being part of their meaning. In fact, it is only on taking this approach that we may be justified in using the label, 'Bill-as-lover-of-Mary'. Binding together 'Bill' and 'lover' does not inform as to who or what Bill is the lover of. But when the relationship "Bill loves Mary" becomes incorporated into the meaning, this information is now present. The same goes for 'Mary-as-beloved-of-Bill'.

The role-binder nodes and the 'Bill' and 'Mary' nodes may therefore be validly interpreted as bringing together the meanings of other nodes that they link with. The multi-binding semantic network principle, which is the topic of this chapter, generalizes this idea and uses it as a systematic interpretational principle for all nodes in a connectionist semantic network. Specifically, multi-binding says that the meaning of each node in a network is the bound integration of the meanings of the nodes it links with.

3.4 Multi-binding as combinatorial syntax

In Fig. 3.2(b), we extend the network of Fig. 3.2(a)—adding a few nodes according to the multi-binding principle that serve to (partly) establish the meanings of the "Bill loves Mary", 'lover' and 'beloved' nodes from Fig. 3.2(a), which otherwise are unspecified by the network. With the extension added, the "Bill loves Mary" node integrates not only the 'Bill-as-lover-of-Mary' idea and the 'Mary-as-beloved-of-Bill' idea but also the general idea of the 'loves' relation, which provides information on what it is that, generally speaking, exists between a lover and beloved and brings that information into this specific context. The general 'loves' relation, itself, integrates all the different stored instances (exemplars) in which it occurs, the roles ('lover' and 'beloved') occurring within it, and other relations that it entails, suggests or is otherwise associated with. These concepts, which the 'loves' node brings together, provide grounds for inference, goal selection, and ultimately action selection in the presence of a specific "X loves Y" relation that links to the 'loves' relation concept. Likewise, the role nodes, 'lover' and 'beloved', integrate exemplars of lovers and beloveds in stored exemplars of love relations, along with characteristics of these roles, and associated roles in associated relations.

This principle of nodal integration should be thought of as extending all the way out to nodes that couple directly to events in the environment within which the network is situated—either sensory receptor nodes or motor effector nodes, which integrate these environmental events directly into their meaning. Nodes coupling to the environment are crucially important for a multi-binding network because without them there is no ultimate grounding for node meaning—each node would be a bound integration of other meaningless nodes, which would make it meaningless as well. To draw an analogy, a multi-binding network without environment-coupled nodes is as meaningless as a dictionary is to a person who does not know any of the words in a language.

The syntax of multi-binding satisfies Fodor and Pylyshyn's criteria for combinatorial syntax in the following way: Atomic symbols are environmental events that link with one or more nodes in the network. Molecular symbols are nodes linking with more than one event, node (or both). Nodes have, as their syntactic constituents, any and all nodes and environmental events they link with. Nodes assemble the meanings of nodes they link with by binding them together into a single integrated meaning. Since multi-binding is a combinatorial syntax, a connectionist framework utilizing it is not subject to the limitations Fodor and Pylyshyn (1988) discuss.

We will not write down a set of rules for how a node's linked meanings should be bound into a single one. Rather, we will assume that it is clear, given the linked meanings. For example, it is clear in Fig. 3.2(a) that the correct integration of 'Bill-as-lover-of-Mary', 'Mary-as-beloved-of-Bill', and 'loves' is "Bill loves Mary." At times it may be ambiguous how the linked concepts are to come together to form the bound concept. However, after working with numerous examples of this, my experience is that it is always possible to add nodes and links to the network that are sufficient to make the meaning unambiguous. Although I do not have a proof for this at the present time, it is my hypothesis that this is strictly true: any meaning that one desires to express through multi-binding may be achieved given sufficiently many nodes.

We may make as precise a statement as possible about the meaning of a link in a multi-binding network by saying that each link provides a constraint on the meaning of the node that has the link. The constraint is that the node on the other side of the link must be involved, somehow, in the original node's meaning. When retrieving meanings stored in a multi-binding network (MBN), it is up to the mind's own processing to solve the constraint problem of determining how the meanings of all the linked nodes fit together into a single coherent meaning. Recall that constraint-satisfaction problems are showcase solvable problems for connectionist systems (Rumelhart, 1989). This consolidated meaning will manifest in the application of the meaning to whatever mental process the mind is undertaking at the time of the consolidation—whether that be the active generation of behavior suited to a present situation, the determination of future goals, the development of new information to be stored for later use or another process.


3.5 Partial implementation: A formal-language–MBN translator program

A full implementation of the multi-binding semantic network principle in the form of a self-organizing semantic network—in the vein of parallel distributed processing or Hopfield networks—would be a very serious undertaking. I have not attempted this. To give a concrete demonstration of the expressive power of multi-binding, I have instead written a computer program that translates semantic information in the form of a formal language description into a multi-binding network (MBN) and back. By performing the translation in both directions without loss, the program proves that information encoded by the formal language can be unambiguously encoded with an MBN.

The networks we have discussed so far have been undirected, unweighted networks. Here, we will make the enhancement of using directed and weighted connections (edges). There will only be two weights for the edges we use: strong and weak. Directionality for edges will be used in the following way: a node's outgoing edges link to the nodes whose meanings it binds together to form its own meaning. Strong edges represent a higher degree of binding than weak edges. Directionality for links allows that the degree to which one node participates in another node's meaning is not necessarily the same as the degree to which the second participates in the first's.

3.5.1 The formal language representation

The formal language uses three kinds of symbols: item symbols, brackets and structs. Item symbols are simple alphabetical names, such as a, b, Sam, or airplane. Item symbols might stand for a simple item that you would like to ascribe a name to, like your car or favorite chair. They can also stand for more complex entities. A bracket is a pair of rounded parentheses, which may contain a pair of angled braces and a pair of curly braces inside them, i.e., (<...>,...). Inside the braces goes a list of other symbols, like so: (a,Sam) or (<a,b>,Sam,airplane). The symbol list may include item symbols, brackets and structs. A struct is an alphabetical name next to a bracket, e.g., R(<a,Sam>,b) or LivesIn(Sam,SamsHouse).

Brackets stand for bound integrations of the symbols appearing within the bracket. Items inside the angled braces are necessary members; the items in the curly braces are possible members. Items appearing in a bracket with no angled or curly braces are all necessary. Brackets correspond to nodes in the MBN. Necessary members of the bracket will receive strong (thick) links from the node representing the bracket; possible members will receive weak (thin) links.

Structs (short for structure) are brackets that have some additional implied structure within and among the members of the bracket. Structs appearing in the formal language description will correspond either to struct types or struct instances, depending on context. An example of a struct type is the 'loves' relation in Fig. 3.2. The 'loves' relation is a general concept—a type—for which the specific relation, "Bill loves Mary," is an instance. Nodes corresponding to struct types will link to other nodes that encode the implied structure of the struct. Nodes corresponding to struct instances, such as "Bill loves Mary", will gain access to this structure as part of their own meaning by having a strong link to the struct type node that they instantiate. The struct type node will also have weak links to any struct instance nodes instantiating it since they are exemplars of the abstract concept it represents.

There are two sections in the formal language description, the DEFINE section and the ASSERT section. The DEFINE section declares and defines item symbols and struct types. Each line, terminated by a semicolon, declares or defines a new symbol. An item symbol or struct type followed only by a semicolon has been declared. This means that it has been stated as existing, as a symbol, without (yet) being defined—so that it may be used in other symbol definitions without causing an error. When a struct type is declared, its roles are included in the struct symbol and will be declared as well.

Structs occurring on the left hand side of an equals sign are struct types. The items occurring inside the struct type's bracket are the struct roles for the struct type. Struct roles are positions within the struct that items participating in an instance of the struct take with respect to each other. These are the roles used for role-binding in the MBN. Examples of struct roles are the 'lover' and 'beloved' roles in the 'loves' relation, which is a struct type.

If the struct type or item symbol is followed by an equals sign and another symbol, the struct type or item is defined. A simple definition would be to assign an item symbol to a bracket, for example, which might take on the following appearance:

car = (<wheels,engine>,sunroof);

This definition says that a car is a combination of the necessary items, wheels and engine, and the possible item, sunroof.

A definition for a struct type will be a bracket that includes all the internal structure of the struct. For example, we might define a Turns(turner,turnee) struct:

Turns(turner,turnee) =

(CanRotate(turnee),Causes(turner,Rotates(turnee)));

This definition says that a Turns struct occurs between a turner and a turnee (i.e., a thing that is turned) when the turnee has the property that it can rotate and the turner causes the turnee to actually rotate. This definition uses three struct instances: an instance of the single-role struct, CanRotate, an instance of the double-role struct, Causes, and an instance of the single-role struct, Rotates. A struct instance is a particular example of the struct type it is associated with. The symbols listed inside the struct instance's bracket represent specific concepts occupying roles in the struct instance. The position of each concept symbol in the list will match the position of a struct role in the struct type for the struct. The matched role is the role that the symbol's concept takes in the struct. Note that in the case of Causes(turner,Rotates(turnee)), one of the symbols in the Causes struct instance is itself a struct (Rotates(turnee)).

CanRotate(turnee), Causes(turner,Rotates(turnee)) and Rotates(turnee) are all struct instances because, in constructing them, we are putting the roles for the Turns struct into the positions of the roles for the CanRotate, Causes and Rotates structs. The struct types for these should be defined or declared on an earlier line (although the translation program will automatically declare them if they have not been).

Struct instances can also be used in item definitions. For car, we might prefer:

car = (<wheels,engine,Turns(engine,wheels)>,sunroof);

Here we are putting the specific items, wheels and engine, into the roles for Turns.

The ASSERT section asserts items and struct instances as existing/true. Each line contains a single struct, bracket or item symbol, followed by a semicolon, and serves to make a new assertion: the symbol's referent is to be regarded as actual, or true. For example, having defined the items, Pope and Italy, and the struct, LivesIn(liver,placeLived), in the DEFINE section, we might want to assert,

LivesIn(Pope,Italy);

which says the pope lives in Italy. We will often (but not always) also want to assert the items appearing in the asserted struct as existing. In this case, the pope is a real person and Italy is a real place, so we should assert "Pope;" and "Italy;" to say that the pope and Italy are things that exist rather than fictional, hypothetical or purely abstract concepts, like ManInTheMoon for instance.

Structs and items appearing in the ASSERT section should be declared or defined in the DEFINE section prior to assertion—although the translator will correct for this, automatically declaring undeclared struct types or items. All structs found in the ASSERT section are necessarily struct instances.
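The two-section layout lends itself to a very small parser. The sketch below separates DEFINE statements into declarations and definitions and collects ASSERT statements; it is a toy illustration of the section rules only — it does not parse brackets, roles, or multi-line definitions, and it is not the actual translator program:

```python
def parse_description(text):
    """Split a formal-language description into DEFINE declarations,
    DEFINE definitions, and ASSERT statements. Statements end with ';';
    a DEFINE statement containing '=' is a definition, otherwise a
    declaration."""
    sections = {"DEFINE": [], "ASSERT": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in sections:
            current = line
        elif line.endswith(";") and current is not None:
            sections[current].append(line[:-1].strip())
    declared, defined = [], {}
    for stmt in sections["DEFINE"]:
        if "=" in stmt:
            lhs, rhs = (part.strip() for part in stmt.split("=", 1))
            defined[lhs] = rhs
        else:
            declared.append(stmt)
    return declared, defined, sections["ASSERT"]

source = """DEFINE
Pope;
Italy;
LivesIn(liver,placeLived);
car = (<wheels,engine>,sunroof);
ASSERT
LivesIn(Pope,Italy);
Pope;
Italy;
"""
declared, defined, asserted = parse_description(source)
```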


3.5.2 The Multi-Binding Network representation

As we have said, the principle of the MBN is that each node should be interpreted as integrating the meanings of the nodes it links to into a single coherent meaning. However, extending a network all the way out to event-coupled nodes will result in a vast and deep network for any network containing concepts at the level of abstraction we are used to speaking and thinking about as human beings. For this reason, for the purpose of brevity and effective discourse, we will need to make some omissions. Dashed boundaries indicate that some links are missing and that a node is not completely defined by the network. Like orthographic ellipses ("..."), they imply an omission. Nodes with solid boundaries are defined entirely by connections shown in the network.

There are five special nodes that generally occur in the MBN. They are the struct type, struct role, struct instance, struct role instance, and truth special nodes. These nodes only appear when they are needed, viz., when there is another node that needs to link to them. For the first four special nodes listed, nodes will link to or from them when there is a symbol of the corresponding kind in the formal language description. That is, every struct type in the formal language description will receive a node in the MBN. That node will link strongly to and weakly from the struct type special node. Each struct role belonging to a struct type will also be granted a node—linking strongly to and weakly from the struct role special node. Each struct instance of each struct type will likewise receive a node that links strongly to and weakly from the struct instance node. Finally, for every struct instance, there will be a unique role instance for every role appearing in the struct's type. Each of these role instances will link strongly to and weakly from the struct role instance node. These four nodes represent the general concepts of a struct type, struct role, struct instance and a struct role instance.

When a node links to one of them, it constrains the linking node's meaning such that it must involve the corresponding concept somehow—usually in that the linking node represents a concept of the kind (struct type, struct role, struct instance, struct role instance) represented by the special node. The weak link in the opposite direction (from the special node) for such nodes represents that the node is an exemplar of the kind represented by the special node. The special nodes all have dashed boundaries, indicating that their meaning is not established by the network. They should nonetheless be thought of as existing in a larger network for which their meaning is established, not by specialness, but by their connections to other nodes—as is the case for any node. Therefore, in the end, there is nothing really "special" about them.

The truth special node is used for assertions. All asserted concepts receive a node that links strongly to and weakly from the truth special node. The truth node should be thought of as representing the total integrated unity of what exists and is real, or true. An alternative name for the truth special node is the reality node, and it may be helpful to think of it in this way. When a concept is asserted in the formal language description, the meaning is that the concept represents something that is real or applies to what is real. Either way, the concept is involved in what is real, which is why the truth special node links to each asserted node. Likewise, all asserted concepts, since they are asserted as true or existing rather than merely being hypothetical, fictional or abstract, have truth, or reality, as part of what they are; therefore the nodes for asserted concepts should and do link to the truth node.

Struct type, struct role, struct instance, and struct role instance (role-binding) nodes are wired together essentially as in Fig. 3.2, with some modifications owing to the fact that our MBN is a weighted, directed network. Our example, Fig. 3.6, will encode the same information as Fig. 3.2 for the sake of comparison.

Nodes standing for struct types will link weakly to their struct instances, since their struct instances are their exemplars. Struct instances will link strongly to their struct types, since a struct instance's type is an important, necessary component of the instance's meaning. Nodes standing for struct roles will link strongly to the struct types they are involved in, since the struct type provides a major component of the context for the struct role. The struct type will link strongly to the struct role if the struct role is necessary (according to the bracket used in the struct type definition) and weakly to the struct role if the struct role is possible.

Nodes standing for struct role instances will link strongly to the struct roles they instantiate. As exemplars of their struct roles, they will also receive weak links from their struct role nodes. They will additionally link strongly to the struct instance they are involved in. This struct instance node will link strongly to the struct role instance nodes involved in it regardless of whether the role is necessary or possible in the struct type node. This is because, while the role may only be possible for the struct type, if there is an item taking on the role in the struct instance, the item-role unit is a necessary member of the struct instance.
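The wiring rules of the last few paragraphs can be summarized as a short routine that emits the edges for one struct instance. This is a sketch of the connectivity pattern as described here, with illustrative node names (role-instance nodes are labeled "filler as role", echoing the figures); it is not the translator program itself:

```python
def wire_struct_instance(inst, struct_type, bindings, necessary_roles):
    """Emit (source, target, weight) edges for one struct instance.

    bindings maps each struct role to the filler occupying it.
    """
    edges = [
        (inst, struct_type, "strong"),   # instance -> its type (necessary)
        (struct_type, inst, "weak"),     # type -> instance (exemplar)
    ]
    for role, filler in bindings.items():
        role_inst = f"{filler} as {role}"
        edges += [
            (role, struct_type, "strong"),   # role -> the type it belongs to
            # type -> role: strong if the role is necessary, else weak
            (struct_type, role,
             "strong" if role in necessary_roles else "weak"),
            (role_inst, role, "strong"),     # role instance -> its role
            (role, role_inst, "weak"),       # role -> exemplar
            (role_inst, filler, "strong"),   # role instance -> role filler
            (role_inst, inst, "strong"),     # role instance -> its instance
            (inst, role_inst, "strong"),     # always strong once filled
        ]
    return edges

edges = wire_struct_instance(
    "loves(Bill,Mary)", "loves",
    bindings={"loves.lover": "Bill", "loves.beloved": "Mary"},
    necessary_roles={"loves.lover", "loves.beloved"})
```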

This pattern of connectivity is, in normal cases, sufficient to easily reconstruct the formal language description—although it is possible to construct pathological input which confuses the program. The translator uses simple rules to perform the MBN reverse translation that rely partly on links with special nodes to determine whether a node is a struct type, a struct instance, a struct role or a struct role instance. This is not the only determining factor, however. The program also considers a node's other outgoing partners, or successors, in deciding whether it in fact belongs to the kind corresponding to the special node it links to. A struct type node, for example, will need to link to at least one struct role node before the program will decide to interpret it as a struct type.
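A simplified version of this decision rule might look as follows. The rule set is a stand-in for the translator's actual logic, which, as noted, considers more than special-node links; all names here are illustrative:

```python
SPECIAL_KINDS = {
    "StructType": "struct type",
    "StructRole": "struct role",
    "StructInst": "struct instance",
    "StructRoleInst": "struct role instance",
}

def classify_node(node, out_links, special_of):
    """Guess a node's kind from the special node it links to plus its
    successors. A struct-type candidate is accepted only if it also
    links to at least one struct-role node."""
    special = special_of.get(node)
    if special == "StructType":
        has_role = any(special_of.get(succ) == "StructRole"
                       for succ in out_links.get(node, ()))
        return "struct type" if has_role else "unknown"
    if special in SPECIAL_KINDS:
        return SPECIAL_KINDS[special]
    return "item"

special_of = {"loves": "StructType", "loves.lover": "StructRole",
              "orphanType": "StructType"}
out_links = {"loves": ("loves.lover",), "orphanType": ()}
```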


3.5.3 Examples

Figs. 3.3–3.8 provide some sample formal-language–MBN translation pairs. In each of these cases, the translation program has been used to translate from the formal language description to the MBN shown and back again from the MBN to the formal language description, producing a description equivalent to the original. Because the MBN stores no information concerning the order in which struct roles occur in structs, sometimes this order is shuffled on translation of the MBN. If this happens, the items taking on the roles in struct instances are correspondingly re-ordered.

Fig. 3.3 is a low-complexity information structure containing a single declared struct type, P(x), a single declared item, a, and an asserted instance of P(x), P(a). Notice in the network that the node for the struct role, x, belonging to P(x), has the prefix, P., attached to the x in the node label, producing P.x. This convention for struct roles is universally observed by the translator program.

The P(a) struct instance is established as a struct instance partly by its link to the Struct Inst node and additionally by the fact that its linkages follow the pattern for a struct instance: It links to a node that in turn links to Struct Type (likely a struct type) and to at least one other node that links to Struct Role Inst (likely a struct role instance). The node gains the specific meaning P(a) according to the role-binding motif, Fig. 3.1: its link to the P(x) struct type (or relation) indicates that P(x) is its type; its link to the role-binder/role-instance node, a as P.x, places a in the x position for the struct. The role instance node, a as P.x, gains its meaning by linking to the Struct Role Inst node, to the item, a, to the struct role, P.x, and to P(a). As a consequence, the struct instance, P(a), is defined. P(a) is asserted as true by a link to the Truth node.

The network in Fig. 3.4 is more complex but still possible to follow with the eye. This time we are declaring two struct types, P(x) and R(x). The R(x) struct type is defined to be a second order recursion of the P(x) struct. This is an important example because recursion is a classic combinatorial symbolic structure, and we are encoding it here with a multi-binding network. In the network, the definition for the R(x) struct type is provided by its link to the struct instance, P(P(R.x)). The meaning for P(P(R.x)) is encoded according to the same role-binding motif used to define P(a) in the previous example: it links to its struct type, P(x), and to an instance of the P.x struct role, namely, P(R.x) as P.x. This struct role instance binds the role, P.x, to the role-filler, P(R.x), placing the concept, P(R.x), into the position of the x role in the P(x) struct.

The P(R.x) node, which is being placed into the role, is itself encoded according to the same motif: it links to the struct type, P(x), and the role instance, R.x as P.x. The role instance binds R.x (the single struct role for R(x)) with the role for P(x), P.x. This structure is essentially the same as the node encoding P(a) in the previous example, with the exception that rather than having an item (a) fill the P.x role, this time a role (R.x) belonging to a separate struct type (R(x)) is filling P.x.

Figure 3.3: Simple network with single defined struct, item and assertion.

The structure here may seem confusing to the reader encountering and learning about this method for the first time. However, please note that the translator program has no problem decoding it and reproducing the code on the left of Fig. 3.4 from the network on the right.

Fig. 3.5 shows a formal-language–MBN pair for a less abstract example which we discussed above in Sec. 3.5.1. Here we are declaring three items, engine, wheels and car, along with a struct type, Turns(turnee,turner). The car item is defined as necessarily involving the items, engine, wheels, and the struct instance, Turns(wheels,engine). What we are saying with our definition for car is this: a car is a concept that needs to have wheels and an engine, and the engine must turn the wheels. For simplicity of the network diagram, we are omitting the possible item, sunroof, that was discussed before, and we are also omitting the definition for the Turns(turnee,turner) struct type. No assertions are made for this example. We have only defined a simplified version of what the concept of car is, without saying whether or not it exists or is found in the real world.

Fig. 3.6 is a more complex network that will be very challenging to follow with the eye in its computer-drawn form. The connection pattern is similar to that of Fig. 3.2 and encodes essentially the same information, except according to a directed, binary-weighted format. The example is included for comparing the formal language description and the MBN representation to the network in Fig. 3.2 and to show that more complex information structures can be faithfully encoded as MBNs.

Figure 3.4: A network involving recursion.

Figure 3.5: Network corresponding to simplified car-engine example from text. Example is simplified so that the network may be followed with the eye.

Figure 3.6: Network corresponding to "Bill loves Mary" example from Fig. 3.2.

A second semi-complex example involving both feature-binding and role-binding appears in Fig. 3.7. In this example, we are encoding the relational statement, "The quick brown fox jumped over the lazy dog," which was mentioned in Chapter 1. The QBFox (quick, brown fox) and LDog (lazy dog) nodes in this example perform feature binding on their respective concepts. They are asserted as existing in the "real world" by linking to the Truth node. The JumpOver(jumper,thingJumped) struct type is defined as integrating the structs, Jump(jumper) and Over(jumper,thingJumped). The JumpedOver(jumper,thingJumped) (past tense) struct type is defined as integrating JumpOver(jumper,thingJumped) and the concept, past. Finally, the struct instance, JumpedOver(QBFox,LDog), is wired according to the role-binding motif and asserted with a link to Truth.

Fig. 3.8 shows one last semi-complex example with several struct instances serving to relate the asserted items, Jack, JHouse, JMouse, and JCat, to each other. The formal language and the MBN serve to (independently) tell the same story which, if rendered in natural English, might appear as:

This is the house that Jack built. This is the mouse that lived in the house that Jack built. This is the cat that chased the mouse that lived in the house that Jack built.


Figure 3.7: Network encoding the feature- and role-binding semantic structure, "The quick brown fox jumped over the lazy dog."

Figure 3.8: More complex network telling a story. Network not intended to be followed with the eye. The point of the figure is to show that extended information can be translated to and from an MBN.


3.6 Discussion

The combinatorial connectionist semantic network method I have presented brings together known results regarding neural responses to abstract concepts, the psychology of concepts, and connectionist semantic network models in a way that answers a specific objection that has been made to connectionist approaches to intelligence modeling: that they do not yield combinatorial symbolism. The result is a framework that offers a starting point for future work concerning meaning and symbolism in neural and connectionist networks in general. The schema is built around the principle of multi-binding, viz., interpreting individual nodes as integrating the meanings of nodes they link to as well as environmental events they are causally connected to. I have shown that this principle can be used to encode, with a connectionist network, the same information encoded by a formal first order predicate syntax I have developed, which is clearly a combinatorial syntax.

It remains to be seen whether meaning can truly be stored by neural networks in the way that I suggest. Two levels of theoretical development and testing may be helpful in making this determination—the dynamic semantic network level and the neural network level. By the dynamic semantic network level, I mean a theory ascribing activity to nodes in a network and having principles that describe both the flow of activity among the nodes as well as structural changes to the network itself, such as the addition of new nodes and links and the adjustment of link weights. A dynamic semantic network theory of this kind will need to be based on experimentally observed principles for neural dynamics, and it will need to be able to store and recover information in the semantic network's structure via these dynamics.

Relevant to that point, experimental studies have suggested that temporally correlated activity may have a role in binding parts of a stimulus together into entities (Singer, 1999). While it has been implied throughout this chapter that the method presented is aimed at semantic representation via synaptically implemented connections between concept-representing neural assemblies, it is also possible to use the multi-binding principle in another way that is relevant to the synchrony results.

Patterns of synchrony among assemblies of neurons can be represented by hypergraphs, which extend the "edge" concept for ordinary graphs to that of the hyperedge—an edge that can join any number of nodes together. A hypergraph can represent a synchrony pattern by placing hyperedges among sets of nodes that share a common time-locked behavior. Since it is known that hypergraphs are equivalent to bipartite graphs, it is also possible to use a bipartite graph to represent a synchrony pattern. The multi-binding principle can be used as a generalized syntax for such a "synchrony" network just as easily as it can for the synaptic network. Because mathematical results, including the proven rigid phase conjecture, indicate that the emergent structure of time correlations among networks of dynamical units reflects the underlying structure of the network (see Golubitsky, Romano, & Wang, 2012; Stewart & Parker, 2007), it is possible that a multi-binding synchrony network is a way for the information encoded in a multi-binding synaptic network to be retrieved into the activity pattern so as to influence ongoing dynamics.
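The hypergraph-to-bipartite-graph correspondence invoked here is the standard construction: introduce one vertex per hyperedge and connect it to exactly the nodes that hyperedge joins. A minimal sketch (the assembly labels and "e0", "e1", ... vertex names are invented for illustration):

```python
def hypergraph_to_bipartite(hyperedges):
    """Convert hyperedges over neural assemblies into a bipartite edge
    set: one part holds the original nodes, the other holds one vertex
    per hyperedge ("e0", "e1", ...)."""
    edges = set()
    for i, members in enumerate(hyperedges):
        for member in sorted(members):
            edges.add((f"e{i}", member))
    return edges

# Synchrony pattern: assemblies A, B, C fire together; B and D fire together.
bipartite = hypergraph_to_bipartite([{"A", "B", "C"}, {"B", "D"}])
```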

If a dynamical framework is successfully proposed and validated through simulation and testing, one may search for and develop implementation possibilities using neurons, synapses, and biologically based neural dynamics.


References

Ali, A. B., Deuchars, J., Pawelzik, H., & Thomson, A. M. (1998). CA1 pyramidal to basket and bistratified cell EPSPs: Dual intracellular recordings in rat hippocampal slices. The Journal of Physiology, 507(1), 201–217.

Baars, B. J., & Gage, N. M. (2010). Cognition, brain, and consciousness: Introduction to cognitive neuroscience. New York: Academic Press.

Barlow, H. B. (1972). Single units and sensation: A neuron doctrine for perceptual psychology. Perception, 1(4), 371–394.

Bowers, J. S. (2009). On the biological plausibility of grandmother cells: Implications for neural network theories in psychology and neuroscience. Psychological Review, 116(1), 220.

Brillinger, D. R. (1975). Time series: Data analysis and theory. New York: Holt, Rinehart and Winston.

Brown, E. N., Frank, L. M., Tang, D., Quirk, M. C., & Wilson, M. A. (1998). A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. The Journal of Neuroscience, 18(18), 7411–7425.

Bruno, R. M., & Sakmann, B. (2006). Cortex is driven by weak but synchronously active thalamocortical synapses. Science, 312(5780), 1622–1627.

Carpenter, G. A., & Grossberg, S. (1988). The ART of adaptive pattern recognition by a self-organizing neural network. Computer, 21(3), 77–88.

Chung, K., & Deisseroth, K. (2013). CLARITY for mapping the nervous system. Nature Methods, 10(6), 508–513.

Ciresan, D., Meier, U., & Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. In Computer vision and pattern recognition (CVPR), 2012 IEEE conference on (pp. 3642–3649).

Connor, J. A., Walter, D., & McKown, R. (1977). Neural repetitive firing: Modifications of the Hodgkin-Huxley axon suggested by experimental results from crustacean axons. Biophysical Journal, 18(1), 81.

Cox, D. R., & Isham, V. (1980). Point processes (Vol. 12). USA: CRC Press.

Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience. Cambridge, MA: MIT Press.

Depalle, P., & Helie, T. (1997). Extraction of spectral peak parameters using a short-time Fourier transform modeling and no sidelobe windows. In Applications of signal processing to audio and acoustics, 1997 IEEE workshop on.

Eliasmith, C. (2013). How to build a brain: A neural architecture for biological cognition. Oxford University Press.

Evans, G. N. (2014). Convolution metric for neuron membrane potential recordings. arXiv preprint arXiv:1409.2182.

Evans, G. N., & Collins, J. C. (2013). Neurally implementable semantic networks. arXiv preprint arXiv:1303.4164.

Fausett, L. (1994). Fundamentals of neural networks: Architectures, algorithms, and applications. New Jersey: Prentice-Hall, Inc.

Fee, M. S., Kozhevnikov, A. A., & Hahnloser, R. H. R. (2004). Neural mechanisms of vocal sequence generation in the songbird. Annals of the New York Academy of Sciences, 1016(1), 153–170.

Fiete, I. R., Fee, M. S., & Seung, H. S. (2007). Model of birdsong learning based on gradient estimation by dynamic perturbation of neural conductances. Journal of Neurophysiology, 98(4), 2038–2057.

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1), 3–71.

Gold, C., Henze, D. A., Koch, C., & Buzsaki, G. (2006). On the origin of the extracellular action potential waveform: A modeling study. Journal of Neurophysiology, 95(5), 3113–3128.

Golubitsky, M., Romano, D., & Wang, Y. (2012). Network periodic solutions: Patterns of phase-shift synchrony. Nonlinearity, 25(4), 1045.

Hampton, J. A. (1979). Polymorphous concepts in semantic memory. Journal of Verbal Learning and Verbal Behavior, 18(4), 441–461.

Harris, F. J. (1978). On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE, 66(1), 51–83.

Haugeland, J. (1997). Mind design II: Philosophy, psychology, artificial intelligence. Cambridge, MA: MIT Press.

Hellgren, J., Grillner, S., & Lansner, A. (1992). Computer simulation of the segmental neural network generating locomotion in lamprey by using populations of network interneurons. Biological Cybernetics, 68(1), 1–13.

Herrmann, C. S., & Ohl, F. W. (2009). Cognitive adequacy in brain-like intelligence. In B. Sendhoff, E. Korner, O. Sporns, H. Ritter, & K. Doya (Eds.), Creating brain-like intelligence (pp. 314–327). Springer.

Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1985). Distributed representations. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Foundations (pp. 77–109). Cambridge, MA: MIT Press.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4), 500.

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558.

Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 81(10), 3088–3092.

Houghton, C., & Sen, K. (2008). A new multineuron spike train metric. Neural Computation, 20(6), 1495–1511.

Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 148(3), 574.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160(1), 106.

Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99(3), 480.

Hummel, J. E., & Holyoak, K. J. (2003). A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110(2), 220.

Hunter, J. D., Milton, J. G., Thomas, P. J., & Cowan, J. D. (1998). Resonance effect for neural spike time reliability. Journal of Neurophysiology, 80(3), 1427–1438.

Izhikevich, E. M. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15(5), 1063–1070.

Lennie, P. (2003). The cost of cortical computation. Current Biology, 13(6), 493–497.

Long, M. A., Jin, D. Z., & Fee, M. S. (2010). Support for a synaptic chain model of neuronal sequence generation. Nature, 468(7322), 394–399.

Markram, H. (2012). The human brain project. Scientific American, 306(6), 50–55.

McClelland, J. L., & Rogers, T. T. (2003). The parallel distributed processing approach to semantic cognition. Nature Reviews Neuroscience, 4(4), 310–322.

McCloskey, M. E., & Glucksberg, S. (1978). Natural categories: Well defined or fuzzy sets? Memory & Cognition, 6(4), 462–472.

Meyer, K., & Damasio, A. (2009). Convergence and divergence in a neural architecture for recognition and memory. Trends in Neurosciences, 32(7), 376–382.

Mitra, P. P., & Bokil, H. (2007). Observed brain dynamics. New York: Oxford University Press.

Moore, C. I., & Nelson, S. B. (1998). Spatio-temporal subthreshold receptive fields in the vibrissa representation of rat primary somatosensory cortex. Journal of Neurophysiology, 80(6), 2882–2892.

Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press.

Newell, A., & Simon, H. A. (1976). Computer science as empirical inquiry: Symbols and search. Communications of the Association for Computing Machinery, 19(3), 113–126. (reprinted in Haugeland (1997), pp. 81–110)

Paiva, A. R. C., Park, I., & Príncipe, J. C. (2009). A reproducing kernel Hilbert space framework for spike train signal processing. Neural Computation, 21(2), 424–449.

Paiva, A. R. C., Park, I., & Príncipe, J. C. (2010). A comparison of binless spike train measures. Neural Computing and Applications, 19(3), 405–419.

Patterson, D. W. (1996). Artificial neural networks: Theory and applications. Prentice Hall PTR.

Quillian, M. R. (1967). Word concepts: A theory and simulation of some basic semantic capabilities. Behavioral Science, 12(5), 410–430.

Quirin, S., Jackson, J., Peterka, D. S., & Yuste, R. (2014). Simultaneous imaging of neural activity in three dimensions. Frontiers in Neural Circuits, 8.

Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., & Fried, I. (2005). Invariant visual representation by single neurons in the human brain. Nature, 435(7045), 1102–1107.

Rieke, F., Warland, D., De Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. MIT Press.

Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7(4), 573–605.

Rosen, M. J., & Mooney, R. (2006). Synaptic interactions underlying song-selectivity in the avian nucleus HVC revealed by dual intracellular recordings. Journal of Neurophysiology, 95(2), 1158–1175.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage andorganization in the brain. Psychological Review , 65 (6), 386.

Roy, A. (2014). On findings of category and other concept cells in the brain: Sometheoretical perspectives on mental representation. Cognitive Computation, 1-6.

Rumelhart, D. E. (1989). The architecture of mind: A connectionist approach. In M. I. Pos-ner (Ed.), Foundations of cognitive science (pp. 314–327). MIT Press. (reprinted inHaugeland (1997), pp. 205–232)

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representa-tions by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Paralleldistributed processing: Explorations in the microstructure of cognition: Foundations(pp. 318–362). Cambridge, MA: MIT Press.

Rumelhart, D. E., & McClelland, J. L. (1985). Parallel distributed processing: Explorationsin the microstructure of cognition: Foundations. Cambridge, MA: MIT Press.

Scanziani, M., & Hausser, M. (2009). Electrophysiology in the age of light. Nature,461 (7266), 930–939.

Schrauwen, B., & Campenhout, J. V. (2007). Linking non-binned spike train kernels toseveral existing spike train metrics. Neurocomputing , 70 (7), 1247–1253.

Schreiber, S., Fellous, J. M., Whitmer, D., Tiesinga, P., & Sejnowski, T. J. (2003). A newcorrelation-based measure of spike timing reliability. Neurocomputing , 52 , 925–931.

Schultz, W. (2002). Getting formal with dopamine and reward. Neuron, 36 (2), 241–263.Sellers, P. H. (1974). On the theory and computation of evolutionary distances. SIAM

Journal on Applied Mathematics, 26 (4), 787–793.Sendhoff, B., Korner, E., Sporns, O., Ritter, H., & Doya, K. (2009). Creating brain-

like intelligence: From basic principles to complex intelligent systems (Vol. 5436).Springer.


Shastri, L., & Ajjanagadde, V. (1993). From simple associations to systematic reasoning: A connectionist representation of rules, variables and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences, 16, 417–451.

Shoham, S., O'Connor, D. H., & Segev, R. (2006). How silent is the brain: Is there a dark matter problem in neuroscience? Journal of Comparative Physiology A, 192(8), 777–784.

Singer, W. (1999). Neuronal synchrony: A versatile code for the definition of relations? Neuron, 24(1), 49–65.

Slepian, D., & Pollak, H. O. (1961). Prolate spheroidal wave functions, Fourier analysis and uncertainty - I. Bell System Technical Journal, 40(1), 43–63.

Smolensky, P. (1989). Connectionist modeling: Neural computation/mental connections. In L. A. Nadel, P. C. Cooper, & R. M. Harnish (Eds.), Neural connections, mental computation (pp. 49–67). Cambridge, MA: Bradford/MIT Press. (Reprinted in Haugeland (1997), pp. 233–250.)

Steriade, M., Nuñez, A., & Amzica, F. (1993). A novel slow (< 1 Hz) oscillation of neocortical neurons in vivo: Depolarizing and hyperpolarizing components. The Journal of Neuroscience, 13(8), 3252–3265.

Stewart, I., & Parker, M. (2007). Periodic dynamics of coupled cell networks I: Rigid patterns of synchrony and phase relations. Dynamical Systems, 22(4), 389–450.

St-Pierre, F., Marshall, J. D., Yang, Y., Gong, Y., Schnitzer, M. J., & Lin, M. Z. (2014). High-fidelity optical reporting of neuronal electrical activity with an ultrafast fluorescent voltage sensor. Nature Neuroscience, 17(6), 884–889.

Thomson, D. J. (1982). Spectrum estimation and harmonic analysis. Proceedings of the IEEE, 70(9), 1055–1096.

van Rossum, M. C. W. (2001). A novel spike distance. Neural Computation, 13(4), 751–763.

Victor, J. D., & Purpura, K. P. (1997). Metric-space analysis of spike trains: Theory, algorithms and application. Network: Computation in Neural Systems, 8(2), 127–164.

von der Malsburg, C. (1994). The correlation theory of brain function. Springer.

Walden, A., McCoy, E., & Percival, D. (1994, February). The variance of multitaper spectrum estimates for real Gaussian processes. IEEE Transactions on Signal Processing, 42(2), 479–482.

Yuste, R., & Church, G. M. (2014). The new century of the brain. Scientific American, 310(3), 38–45.

Ziburkus, J., Cressman, J. R., Barreto, E., & Schiff, S. J. (2006). Interneuron and pyramidal cell interplay during in vitro seizure-like events. Journal of Neurophysiology, 95(6), 3948–3954.

Vita

Garrett Nolan Evans
15925 Booth Circle, Volente, TX 78641
[email protected]

Areas of Academic Interest:

Connectionism, theoretical neuroscience, concept neurons, binding, neural synchrony, neural chaos

Education:

Ph.D. (expected), Physics Department, Penn State University, May 2015
Advisor: Professor John C. Collins
Skills developed: C++, MATLAB and Java computation
Dissertation: Two Projects in Theoretical Neuroscience: A Convolution-based Metric for Neural Membrane Potentials and a Combinatorial Connectionist Semantic Network Method

• Multi-disciplinary work involving mathematical psychology and neuroscience

B.S., Physics Department, University of Texas at Austin, December 2002

Awards and Honors:

May 2012: Graduate Teaching Award, Physics Department, Penn State University

August 2008 – August 2011: Academic Computing Fellowship, Penn State University

June – August 1997: Department of Defense Scientific and Engineering Apprenticeship Program, Applied Research Laboratories: The University of Texas

Research and Teaching Experience:

August 2011 – December 2014: Teaching Assistant, Physics Department, Penn State University

• Introduced and guided undergraduates through laboratory and problem-solving sessions

August 2008 – May 2011: Research supported by Penn State Academic Computing Fellowship

January 2007 – August 2008: Teaching Assistant, Physics Department, Penn State University

January 2006 – December 2007: Research Assistant, Physics Department, Penn State University

August 2003 – January 2006: Teaching Assistant, Physics Department, Penn State University

Papers and Presentations:

Evans, G. N. (2015). Convolution-Based Metric for Neural Membrane Potential Traces. Submitted to PLOS ONE (Manuscript PONE-D-15-14692).

Evans, G. N., & Collins, J. C. (2013). Neurally Implementable Semantic Networks. arXiv:1303.4164.

Evans, G. N. (2012). Semantic Networks and the Brain. Network Science Seminar, Physics Department, Penn State University, January 31, 2012.

Evans, G. N., & Collins, J. C. (2011). Modeling Knowledge Representation in Neuronal Networks. Poster presented at the 2011 International Joint Conference on Neural Networks, San Jose, CA.