Sparse Distributed Memory
A study of psychologically driven storage
Pentti Kanerva
Outline
- Neurons as address decoders
- Best match
- Sparse Memory
- Distributed Storage
Goal
The goal of Kanerva's paper is to present a method of storage that, given a test vector, can retrieve the best match to the vector among a set of previously stored vectors. He builds this model from the ground up, first using neurons as address decoders. He then describes a method of storage called sparse memory. This falls short of his goals, but he builds on the model, culminating in what he calls distributed storage.
Goal
Kanerva believes that his model may be a good generalization of how the human brain works. It is non-conventional in that it does not strive to provide a method for storing exactly that which one would like to retrieve. Rather, it provides a way to retrieve the vectors that are very similar to a given vector.
Address Decoder Neurons
- Have a standard weight vector [called input coefficients]
- Bipolar weights
- Linear threshold gate
- Unipolar output
Key Observation
A weight vector can be realized as an address that the neuron 'codes' for.
The threshold value can be realized as the allowed difference, in Hamming distance, between the desired address and the input address.
The Response Region of this type of neuron is the set of inputs for which the output of the neuron is 1.
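To make this concrete, here is a minimal sketch (mine, not the paper's) of an address decoder neuron in Python. The encoding of address bits {0, 1} as weights {-1, +1} and the threshold n - 2r are the standard trick: the bipolar dot product equals n - 2·d(address, input), so the neuron outputs 1 exactly when the input lies within Hamming radius r of its address.

    # A sketch of an address decoder neuron. Address bits {0,1} are
    # encoded as bipolar weights {-1,+1}; with threshold n - 2r the
    # neuron fires exactly when the input is within Hamming radius r.
    class AddressDecoderNeuron:
        def __init__(self, address, radius):
            self.weights = [1 if b else -1 for b in address]  # bipolar weights
            self.threshold = len(address) - 2 * radius

        def fire(self, x):
            # Bipolar dot product: equals n - 2 * hamming(address, x).
            s = sum(w * (1 if b else -1) for w, b in zip(self.weights, x))
            return 1 if s >= self.threshold else 0  # unipolar output

    neuron = AddressDecoderNeuron([1, 0, 1, 1, 0], radius=1)
    print(neuron.fire([1, 0, 1, 1, 0]))  # 1: exact address
    print(neuron.fire([1, 0, 1, 1, 1]))  # 1: distance 1
    print(neuron.fire([0, 0, 1, 1, 1]))  # 0: distance 2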
Mathematical Aside
It isn't necessary that the weights be restricted to ±1, but that restriction keeps the model simple and more useful: each bit of the address then contributes equally, so the distance function is not skewed.
Biological Aside
The addition and deletion of synapses will not greatly affect the address of an individual neuron. [This corresponds to expansion or contraction of the weight vector.] Furthermore, if synapses changed from excitatory to inhibitory or vice versa, the address would be affected, but neurons are not known to make this sort of change.
Assumption for Rest of Work
- Weights are constant over time, contributing to the idea that a neuron has an address
- Thresholds are allowed to vary
Best Match
Problem: given a data set of stored words, retrieve the best match to a test word.
Formally: describe a filing scheme for storing a set X of words so that one can retrieve, in minimum time, the stored word ζ that is the best match [closest in Hamming distance] to a test word z.
Recall: the Hamming distance between two binary sequences is the number of bit positions in which they disagree.
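As a quick reference, a one-line implementation (mine, not the paper's); the later sketches below reuse this function:

    # Hamming distance: number of bit positions in which two words disagree.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))  # 2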
Problem with Conventional Architecture
It is quicker to check each word of a trillion-word data set against each word of a trillion-word test set than it is to load the given data set into memory.
Kanerva's 'Machine'
An array of address-decoding neurons with the ability to change all their thresholds as a unit. The output stage probably works much like a hardware architecture in which the closest match comes out once the threshold is passed.
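A toy rendering of this idea (my own sketch, reusing hamming from above), assuming the stored words themselves serve as the neurons' addresses: widen the shared radius step by step until some neuron responds, and the first responders are the best matches.

    # A toy sketch of the machine: one decoder neuron per stored word, all
    # thresholds swept as a unit. Widening the radius r from 0, the first
    # words to respond are the closest matches to the test word z.
    def best_match(stored_words, z):
        for r in range(len(z) + 1):
            hits = [w for w in stored_words if hamming(w, z) <= r]
            if hits:
                return hits          # ties all surface at the same radius
        return []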
Sparse [Random Access] Memory
- Storage locations and addresses are given from the start; only the contents of locations are modifiable
- Storage locations are very few in comparison with N = 2^n
- Storage locations are randomly distributed over {0, 1}^n
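In code, setting up such a memory amounts to fixing a random sample of addresses once (a sketch with tiny illustrative parameters of my choosing; Kanerva's example uses n = 1000 and a million hard locations):

    import random

    n, num_hard = 20, 100    # tiny next to the 2**20 possible addresses
    hard_locations = [tuple(random.randint(0, 1) for _ in range(n))
                      for _ in range(num_hard)]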
Definitions
Hard locations, N′: the random sample of actual storage locations.
Nearest N′-neighbor, x′: the hard location closest to x ∈ N.
An example of a sparse memory
- Address space N = {0, 1}^1000
- Number of hard locations: |N′| = 1,000,000
- On average, each location x ∈ N is 424 bits from x′
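The 424 figure can be checked from the binomial distribution of distances: a random address lies within distance r of a fixed one with probability F(r) = Σ_{i≤r} C(1000, i)/2^1000, and the expected nearest-hard-location distance is Σ_r (1 − F(r))^|N′|. A back-of-the-envelope check (mine, not from the paper):

    from math import comb

    n, num_hard = 1000, 10**6
    total = 2 ** n
    F = 0.0          # P(a random hard location is within distance r of x)
    expected = 0.0   # E[distance to nearest hard location] = sum of P(D > r)
    for r in range(n):
        F += comb(n, r) / total
        expected += (1.0 - F) ** num_hard
    print(expected)  # comes out near 424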
Best Match with Sparse Memory
- Let X be a random data set of 10,000 words.
- Try storing each ξ ∈ X in ξ′.
- With test word z and ξ the word of X closest to z, what is the probability that the occupied hard location nearest to z contains ξ?
- Note: we are searching for ξ, located in ξ′; we are not looking for z′, since z is not necessarily in X.
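One can probe this question empirically with toy parameters (the numbers below are mine, scaled down from Kanerva's; hamming is reused from above). With 15% noise, ξ is almost surely the word of X closest to z, yet the nearest occupied hard location often fails to contain it:

    import random

    n, num_hard, num_words, trials = 100, 500, 200, 200

    def rand_word():
        return tuple(random.randint(0, 1) for _ in range(n))

    def nearest(points, x):
        return min(points, key=lambda p: hamming(p, x))

    hard = [rand_word() for _ in range(num_hard)]
    X = [rand_word() for _ in range(num_words)]

    store = {}                           # hard location -> word written there
    for xi in X:
        store[nearest(hard, xi)] = xi    # collisions silently overwrite

    occupied = list(store)
    hits = 0
    for _ in range(trials):
        xi = random.choice(X)
        z = tuple(b ^ (random.random() < 0.15) for b in xi)  # noisy probe
        hits += store[nearest(occupied, z)] == xi
    print(hits / trials)                 # typically well below 1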
Distributed Storage
- Many storage locations participate in a write/read operation
- Appearance of a RAM with a large address space
- If η is stored at ξ, then reading from ξ retrieves η
- Claim: reading from an address x similar to ξ retrieves a word y that is closer to η than x is to ξ
Definitions
Access radius, access circle: if r is the access radius, then, given an address x, we access all hard locations within r of x when reading from or writing to x.
Contents of a location: each storage location contains all of the words that have ever been written there.
Data of x, D(x): the data retrievable from address x is the multiset of all the words in the hard locations inside the access circle of x.
Reading at x: returns a single word.
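Putting these definitions together, here is a compact sketch of distributed write and read (standard SDM mechanics with counters and a bitwise majority vote; the class name and the illustrative parameters are mine, not Kanerva's):

    import random

    class SDM:
        def __init__(self, n, num_hard, access_radius):
            self.n = n
            self.radius = access_radius
            # Hard locations: a fixed random sample of the address space.
            self.addresses = [tuple(random.randint(0, 1) for _ in range(n))
                              for _ in range(num_hard)]
            # n counters per hard location; because writes accumulate, a
            # location effectively contains every word ever written there.
            self.counters = [[0] * n for _ in range(num_hard)]

        def _circle(self, x):
            # All hard locations within the access radius of x.
            return [i for i, a in enumerate(self.addresses)
                    if sum(p != q for p, q in zip(a, x)) <= self.radius]

        def write(self, x, word):
            # Every hard location in the access circle participates.
            for i in self._circle(x):
                for j, b in enumerate(word):
                    self.counters[i][j] += 1 if b else -1

        def read(self, x):
            # Pool the counters over the access circle of x, then take a
            # bitwise majority vote: reading returns a single word.
            sums = [0] * self.n
            for i in self._circle(x):
                for j in range(self.n):
                    sums[j] += self.counters[i][j]
            return tuple(1 if s > 0 else 0 for s in sums)

A quick autoassociative check of the claim on the previous slide, storing a word at its own address and reading at a corrupted one:

    mem = SDM(n=100, num_hard=2000, access_radius=42)
    eta = tuple(random.randint(0, 1) for _ in range(100))
    mem.write(eta, eta)                                       # store eta at eta
    noisy = tuple(b ^ (j < 10) for j, b in enumerate(eta))    # flip 10 bits
    print(sum(a != b for a, b in zip(mem.read(noisy), eta)))  # typically 0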
Best Match
Given a test word z, retrieve the target word ζ ∈ X. Candidate approaches:
1. Search D(z) for best match
2. Most frequent word of D(z)
3. Random word of D(z)
4. Compute archetypical element of D(z)
Finding the Best Match
Key Observation: If ζ is the word of X most similar to the test word z, we can expect it to occur the most often in D(z).
Obvious Attempt: Take the word that occurs most frequently... too much computation.
Weak Attempt: Take a random word... doesn't work.
Working Attempt: Take the average element... works, in most cases.
Computing the Archetypical Element
The Average Element: Take the bitwise sum of the words in D(z) and threshold each coordinate [a bitwise majority vote].
If d(z, ζ) < 209, then, over a series of steps, the word obtained from each successive read converges on ζ.
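Convergent retrieval can then be sketched as a short loop (reusing the SDM class from the sketch above; the step cap is my own safeguard, not part of the model):

    def iterated_read(mem, z, max_steps=10):
        # Read at z, then read again at the word just retrieved, and so on.
        # Within the critical distance the sequence converges on the stored
        # word; beyond it the reads wander off.
        x = z
        for _ in range(max_steps):
            y = mem.read(x)
            if y == x:              # fixed point: converged
                return y
            x = y
        return x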
Other Properties
- Convergence/divergence
- Memory capacity
- Storage location capacity
- Signal strength/noise/fidelity
What's it got to do with Neural Networks?
Knowing that one knows may be indicated by fast convergence.
Tip-of-the-tongue may correspond to the critical point, at which we may or may not recover the memory.
Rehearsal of some action might be indicated by the storage of that item over and over.
The effects of brain damage, which sometimes seem nonexistent, may be indicative of distributed storage, where it is still possible to retrieve memories.
Questions
???