
On working memory and mental imagery: How does the brain learn to think?

Victor Eliashberg, Ph.D.

Consulting professor, Stanford University, Department of Electrical Engineering

President, Avel Electronics

email: [email protected]

web site: www.brain0.com

The human brain has many remarkable cognitive characteristics that deeply puzzle scientists and engineers. Among the most important and most intriguing of these characteristics is the brain’s unmatched ability to learn. People are universal learners. We learn to see, to hear, and to move. We learn to speak and to think. We learn how to learn, and even how to learn how to learn, and so on. There is hardly any aspect of human behavior that is not affected by learning. In this lecture, I will address two fundamental computational questions associated with the universality of human learning:

1. How can a programmable computing system achieve universality (type 0) using basic computational resources similar to those of the brain? I argue that the brain does not have a counterpart of conventional RAM.

2. How can the above computing system achieve arbitrarily complex effects of programming via a process similar to associative learning?

I argue that the brain is too slow to use parsers, compilers, and other traditional computer science tricks: they require a fast sequential processor and a lot of RAM.

BIG PICTURE: general structure of the computational model for the cognitive system (Man,World)

B(t) is a formal representation of B at time t, where t=0 is the beginning of learning. B(0) is referred to as Brain 0 (Brain zero).

(Figure: the external world W and the sensorimotor devices D form the external system (W,D); the sensorimotor devices D and the computing system B, which simulates the work of the human nervous system, form the human-like robot (D,B).)

The mystery of human learning: How big is B(0)? How big is B(t)?

(Figure: learning transforms B(0) (megabytes?) into B(t), t > 20 years (terabytes?).)

Types of behavior and the levels of computing power [4]:

Type 0: Turing machines (the highest computing power)

Type 1: Context-sensitive grammars

Type 2: Context-free grammars (stack automata)

Type 3: Finite-state machines

Type 4: Combinatorial machines (the lowest computing power) [12]

No system can learn to do what it cannot do in principle! An elephant can learn to fly only in a Disney film!

Type 4: Combinatorial machines

Boolean functions, feed-forward artificial neural networks (ANN), support vector machines (SVM), etc., are systems of this type. Analog vs. digital distinction is not essential at this general level!

f: X*G → Y

(Figure: a universal programmable system of type 4: a PROM holding the program G computes y = f(x,g); example with input alphabet X={a,b,c} and output alphabet Y={0,1}.)

General structure of universal programmable systems of different types

Programmable logic array (PLA) is an example of a universal programmable system of type 4 [14]

Type 3: Finite-state machines

f: X*S*G → S*Y

(Figure: a universal programmable system of type 3: a PROM holding the program G plus a state register; at each step the PROM maps the input x, the current state s, and g to the output y and the next state snext. Example with X={a,b,c} and S=Y={0,1}, shown both as input/output/state sequences and as state-transition diagrams for states 0 and 1.)

Type 0: Turing machines (state machines coupled with a read/write external memory)

f: X*S*G*M → S*M*Y

(Figure: a universal programmable system of type 0: the type 3 structure (PROM plus state register) coupled with a read/write external memory M, a RAM or a tape.)
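To make the PROM-plus-register picture concrete, here is a minimal C sketch (my own illustration, with an invented toy program G, not the lecture's WTA.EXE demo) of the type 0 structure: a finite-state controller whose "PROM" is a lookup table, coupled with a read/write tape M.

/* Sketch: a type 0 machine = PROM (lookup table) + state register + read/write tape M.
   The particular program G below is a toy example (turn every 'a' into 'b', then halt),
   not taken from the lecture. */
#include <stdio.h>

#define STATES  2
#define SYMBOLS 3                      /* 'a', 'b', '_' (blank) */

typedef struct { int snext; char write; int move; } Cell;   /* move: +1/-1, 0 = halt */

static int sym(char c) { return c == 'a' ? 0 : c == 'b' ? 1 : 2; }

int main(void) {
    /* G: the "PROM" -- one cell per (state, tape symbol) pair */
    Cell G[STATES][SYMBOLS] = {
        /* state 0 */ { {0, 'b', +1}, {0, 'b', +1}, {1, '_', 0} },
        /* state 1 */ { {1, 'a', 0},  {1, 'b', 0},  {1, '_', 0} },   /* halting state */
    };
    char M[32] = "aabba___________";   /* tape: the read/write memory M */
    int s = 0, head = 0;               /* state register and head position */

    while (1) {
        Cell c = G[s][sym(M[head])];   /* look up (x, s) in the PROM */
        M[head] = c.write;             /* write to M */
        s = c.snext;                   /* update the state register */
        if (c.move == 0) break;        /* halt */
        head += c.move;
    }
    printf("tape: %s\n", M);           /* prints "bbbbb___________" */
    return 0;
}

The point of the sketch is only the division of labor: the table G is the programmable part, while the fixed stepping loop plays the role of the "hardware".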

I argue that the theoretically interesting issue of super-Turing computing [34],[6] is not relevant to the brain. It is easy to show that the presence of noise eliminates this issue in classical systems [35], and there is no good reason to assume that the brain uses quantum computing.

Time vs. space: combinatorial explosion associated with an attempt to replace time by space

Let N be the maximum length of input sequences in X, and let |X| = m. The size of the PROM grows as m^N.

a) Simulating behavior of type 3 using model of type 4

f: X^N * G → Y

(Figure: the last N input symbols are held in a shift register of length N whose contents, together with G, address the PROM.)

b) Simulating behavior of type 0 using model of type 3

Let N be the length of the tape, and let the external alphabet of a Turing machine have m symbols. The tape has m^N states. The size of the PROM in the system of type 3 -- representing the finite tape as the state s of the feedback register -- grows as m^N.

Minimum memory space cannot be traded for time. However, given enough memory space, any parallel process can be replaced by a sequential one. The opposite is not true. An attempt to replace time by space leads to a combinatorial explosion of the size of the required space.
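A rough worked example of this explosion (the numbers are chosen here for illustration only): with m = |X| = 3, as in the alphabet X={a,b,c} above, the shift-register scheme needs a PROM with m^N rows.

/* Illustration of the combinatorial explosion: the PROM of the shift-register
   scheme needs one row per possible input sequence, i.e. m^N rows. Toy numbers. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double m = 3.0;                       /* |X| = 3, as in the example X={a,b,c} */
    int lengths[] = {10, 20, 30, 40};
    for (int i = 0; i < 4; i++) {
        int N = lengths[i];
        printf("N = %2d  ->  m^N = %.3g rows\n", N, pow(m, (double)N));
    }
    return 0;
}
/* N = 10 -> ~5.9e4 rows, N = 20 -> ~3.5e9, N = 30 -> ~2.1e14, N = 40 -> ~1.2e19 */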

Computational universality and learning

OBSERVATION 1. A person with a sufficiently large external memory aid can perform, in principle, any effective computational procedure (algorithm).

OBSERVATION 2. We are not born with the knowledge of all possible algorithms. We can learn any algorithm.

OBSERVATION 3. While using an external memory aid, we learn to do similar mental computations using the corresponding imaginary memory aid [3].

How can a universal symbolic machine be implemented as a neural network? What learning algorithm is needed to make this network a universal learning machine?

Turing’s machine as a system (Robot, World) [38]

A “neural” brain for Turing’s robot [8],[12],[15],[18] (WTA.EXE)


“Neural” elementary operations: DECODING, CHOICE, and ENCODING

We need a read/write working memory to move to level 0. What is the brain’s working memory? It is clearly not a “RAM buffer.”

(Figure: the "neural" brain for Turing's robot is built from DECODING, CHOICE, ENCODING, and a next G-state procedure for data storage.)

Working memory and mental imagery

OBSERVATION 4. We can imagine new events, and we can remember and recall real sequences of events.

OBSERVATION 5. We memorize new information with references to the pieces of information we already have in our LTM.

OBSERVATION 6. Our ability to retain data in STM increases if similar data is stored in LTM.

OBSERVATION 7. To imagine different sensory events, we need to perform the mental motor reactions that would cause similar events.

What are working memory and mental imagery? How does working memory interact with LTM? What does motor control have to do with it?

Working memory and mental imagery as a simulation of external system (W,D). System (W,D) is the “Teacher” for system AS. [8],[15]

(Figure: the associative system AS forms S-S, S-M, M-S, and M-M associations between the sensory centers NS and the motor centers NM, while the external system (W,D), acting through the sensorimotor devices D and motor control, serves as the teacher.)

Mental computations (thinking) as an interaction between motor control and working memory (EROBOT.EXE) [8],[15]

Introducing dynamic short-term memory (STM) and intermediate-term memory (ITM): E-states and the concept of a primitive E-machine [7],[8],[9]

(Figure: the primitive E-machine adds MODULATION and a next E-state procedure to the DECODING, CHOICE, ENCODING, and next G-state procedures of the previous model.)

Simple example of a primitive E-machine

DECODING (compute the similarity between the input and the data in ILTM):
s[i] = Similarity(x[*], gx[*][i])   (1)

MODULATION (effect of "residual excitation", the E-state):
se[i] = s[i] + a*e[i] + b*s[i]*e[i]   (2)

CHOICE (randomly select a winner iwin from the set M):
iwin : M = {i / se[i] = max(se[*])}   (3)

ENCODING (retrieve data from the winner location of OLTM):
y[*] = gy[*][iwin]   (4)

Next E-state procedure (dynamics of STM):
if (s[i] > e[i]) e[i] = s[i]; else e[i] = (tau-1)/tau * e[i]   (5)

Next G-state procedure (data storage):
"tape-record" the XY-sequence in ILTM and OLTM   (6)
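A minimal C sketch of one input-output cycle of such a primitive E-machine (my own illustration: the array sizes, the alphabet, and the parameters a, b, and tau are invented, and this is not the WTA.EXE or EROBOT.EXE program):

/* Sketch of one cycle of a primitive E-machine, following equations (1)-(6). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 8    /* number of locations in ILTM/OLTM */
#define W 4    /* width of an input/output vector  */

double a = 0.5, b = 0.5, tau = 4.0;   /* illustrative parameters */

char   gx[N][W];    /* ILTM: stored input vectors  (G-state) */
char   gy[N][W];    /* OLTM: stored output vectors (G-state) */
double e[N];        /* E-state: residual excitation (STM/ITM) */
int    n_stored = 0;

/* similarity = fraction of matching components */
static double similarity(const char *x, const char *g) {
    int match = 0;
    for (int k = 0; k < W; k++) match += (x[k] == g[k]);
    return (double)match / W;
}

/* (6) Next G-state procedure: "tape-record" an x->y pair */
void store(const char *x, const char *yv) {
    for (int k = 0; k < W; k++) { gx[n_stored][k] = x[k]; gy[n_stored][k] = yv[k]; }
    e[n_stored] = 0.0;
    n_stored++;
}

/* One full cycle: x[] in, y[] out */
void cycle(const char *x, char *y) {
    double s[N], se[N], max = -1.0;
    for (int i = 0; i < N; i++) {
        s[i]  = similarity(x, gx[i]);            /* (1) DECODING   */
        se[i] = s[i] + a*e[i] + b*s[i]*e[i];     /* (2) MODULATION */
        if (se[i] > max) max = se[i];
    }
    int winners[N], nw = 0;                      /* (3) CHOICE: random winner among max */
    for (int i = 0; i < N; i++) if (se[i] == max) winners[nw++] = i;
    int iwin = winners[rand() % nw];
    for (int k = 0; k < W; k++) y[k] = gy[iwin][k];   /* (4) ENCODING */
    for (int i = 0; i < N; i++)                  /* (5) E-state: rise to s[i] or decay */
        e[i] = (s[i] > e[i]) ? s[i] : (tau - 1.0)/tau * e[i];
}

int main(void) {
    srand((unsigned)time(NULL));
    store("abca", "0110");
    store("bbca", "1001");
    char y[W];
    cycle("abcc", y);                 /* closest to "abca", so expect "0110" */
    printf("y = %.4s\n", y);
    return 0;
}

The store() procedure plays the role of the next G-state procedure (6): every experienced X-Y pair is simply appended to ILTM/OLTM.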

Effect of recency: simulating a RAM buffer without actually moving data [8],[11],[12]

(Figure: a sequence of write and read operations over addresses 0..3, with data symbols a and b, is "tape-recorded" as successive address-data associations in locations 0..7; the resulting profile of E-states over these locations reproduces the simulated state of the RAM, so each read of an address retrieves the most recently written data without any data being moved.)
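A minimal sketch of this recency effect (my own illustrative C code; the write/read sequence and the decay constant are invented): each write is simply recorded as a new (address, data) association, and a read selects, among all associations matching the address, the one with the largest residual excitation, i.e. the most recent one.

/* Sketch: simulating a RAM buffer via recency, without moving data.
   Every write appends an (address, data) pair; the E-state (recency)
   decides which of several matching pairs answers a read. Toy example. */
#include <stdio.h>

#define MAX 64

int    addr_ltm[MAX];    /* ILTM: stored addresses */
char   data_ltm[MAX];    /* OLTM: stored data      */
double e[MAX];           /* E-state of each stored association */
int    n = 0;
double tau = 8.0;

static void decay(void) {                        /* passage of time, eq. (5) */
    for (int i = 0; i < n; i++) e[i] *= (tau - 1.0) / tau;
}

void write_mem(int address, char d) {            /* next G-state: append, excite */
    addr_ltm[n] = address; data_ltm[n] = d; e[n] = 1.0; n++;
    decay();
}

char read_mem(int address) {                     /* DECODING + MODULATION + CHOICE */
    int best = -1; double best_se = -1.0;
    for (int i = 0; i < n; i++) {
        double s  = (addr_ltm[i] == address) ? 1.0 : 0.0;
        double se = s + s * e[i];                /* recency breaks the tie */
        if (s > 0 && se > best_se) { best_se = se; best = i; }
    }
    decay();
    return (best >= 0) ? data_ltm[best] : '?';
}

int main(void) {
    write_mem(0, 'a'); write_mem(1, 'b'); write_mem(0, 'b');   /* "overwrite" address 0 */
    printf("M[0]=%c M[1]=%c\n", read_mem(0), read_mem(1));     /* prints M[0]=b M[1]=b */
    return 0;
}

Nothing is ever overwritten; the "RAM" is just the momentary E-state landscape over an ever-growing LTM.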

Mental set and context

OBSERVATION 8. We can see different sub-pictures in a picture depending on what we expect to see. We can hear different tunes in the same sequence of sounds depending on what we want to hear.

OBSERVATION 9. We can selectively tune our attention to a voice we want to hear -- the so-called cocktail party phenomenon.

What mechanism available in neural networks can account for these phenomena? What is mental set? How can a system with a linearly growing size of its knowledge (software) handle an exponential number of possible contexts?

This problem was raised by neurophysiologist G.W. Zopf Jr. (1961) in his paper entitled “Attitude and context” [43]. Zopf argued that this killer issue undermined all traditional approaches to brain modeling and cognitive modeling.

Dynamic reconfiguration: simulating a combinatorial number of Boolean functions, N, w/o re-learning [7],[8],[11]

(Figure: a single stored table over binary inputs x1, x2 and output y1 is dynamically reconfigured, without re-learning, to compute XOR, AND, OR, NAND, NOR, or a noisy AND.)

N = 2^(2^m), where m is the number of binary inputs. Let m=10; then N = 2^1024.
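A minimal sketch of how E-states could do this (my own illustrative code; the encoding, the three stored functions, and the priming rule are assumptions): the LTM stores the rows of several Boolean functions at once, a "mental set" is imposed by raising the residual excitation of one function's rows, and the same unchanged LTM then computes that function.

/* Sketch: dynamic reconfiguration via E-states ("mental set").
   The LTM stores XOR, AND, and OR at once; priming a function raises the
   E-state of its rows, so the same LTM computes different functions. Toy code. */
#include <stdio.h>
#include <string.h>

#define NF 3                       /* functions stored: XOR, AND, OR */
#define ROWS (NF * 4)

const char *fname[NF] = { "XOR", "AND", "OR" };
int ltm_f[ROWS], ltm_x1[ROWS], ltm_x2[ROWS], ltm_y[ROWS];   /* G-state */
double e[ROWS];                                              /* E-state */

void build_ltm(void) {             /* store the truth tables once (learning) */
    int truth[NF][4] = { {0,1,1,0}, {0,0,0,1}, {0,1,1,1} };
    for (int f = 0, r = 0; f < NF; f++)
        for (int x = 0; x < 4; x++, r++) {
            ltm_f[r] = f; ltm_x1[r] = x >> 1; ltm_x2[r] = x & 1;
            ltm_y[r] = truth[f][x]; e[r] = 0.0;
        }
}

void prime(const char *name) {     /* mental set: excite the rows of one function */
    for (int r = 0; r < ROWS; r++)
        e[r] = (strcmp(fname[ltm_f[r]], name) == 0) ? 1.0 : 0.0;
}

int apply(int x1, int x2) {        /* DECODING + MODULATION + CHOICE + ENCODING */
    int best = 0; double best_se = -1.0;
    for (int r = 0; r < ROWS; r++) {
        double s  = ((ltm_x1[r] == x1) + (ltm_x2[r] == x2)) / 2.0;
        double se = s + e[r] + s * e[r];
        if (se > best_se) { best_se = se; best = r; }
    }
    return ltm_y[best];
}

int main(void) {
    build_ltm();
    prime("XOR"); printf("XOR(1,1)=%d XOR(0,1)=%d\n", apply(1,1), apply(0,1));
    prime("AND"); printf("AND(1,1)=%d AND(0,1)=%d\n", apply(1,1), apply(0,1));
    return 0;
}

Switching functions changes only the E-state, not the stored knowledge, which is the point of the slide.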

More complex example of a primitive E-machine [8],[11]

Elements with “residual excitation” (E-states)

General propositions

(Figure: the teacher and the external world W interact, through the sensorimotor devices D, with the brain B and its centers NS, NM, NH, and NI.)

Concept of "observable" behavior:

NS – sensory centers

NM – motor centers

NH – centers of emotions (‘H’ = “hedonic”)

NI – all other internal observable centers

Proposition 1: The results of learning depend only on the SMHI-sequence and not on the way this sequence is produced.

Proposition 2: The work of B(0) can be represented in the following general form: B(0) = (X, Y, G, E, Q, fy, fg, fe, fq, g(0)), where X and Y are the sets of values of the inputs (in-arrows) and outputs (out-arrows) of B; G is the set of states of LTM (called G-states); E is the set of states of STM and ITM (called E-states); Q is the set of states associated with delays of signals; g(0) is the initial state of LTM (the initial brain software); and fy: X*G*E*Q → Y, fg: X*G*E*Q → G, fe: X*G*E*Q → E, and fq: X*G*E*Q → Q are the functions calculating the values of the outputs and the next states. NOTE: fy is a probabilistic procedure.

Proposition 3: All functions fy, fg, fe, and fq are determined at the brain hardware level and are essentially fixed (not affected significantly by learning). The main result of learning is in changing g(0) into g(t), that is, in changing the initial brain software (“BIOS”) into the software of a trained brain B(t).
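Read as a data-type definition, Proposition 2 suggests something like the following minimal C sketch (purely illustrative; the concrete state types and the stub functions are my assumptions): the four next-state/output functions are fixed "hardware", and only the G-state is reshaped by learning.

/* Sketch of Proposition 2: B(0) = (X, Y, G, E, Q, fy, fg, fe, fq, g(0)).
   The functions are fixed; only the G-state changes with learning.
   The state types and stubs below are illustrative placeholders. */
typedef struct { int dummy; } GState;   /* LTM contents  */
typedef struct { int dummy; } EState;   /* STM/ITM state */
typedef struct { int dummy; } QState;   /* signal delays */
typedef int X;                          /* input value   */
typedef int Y;                          /* output value  */

typedef struct {
    /* fixed at the hardware level (not changed by learning); fy is probabilistic in principle */
    Y    (*fy)(X, const GState*, const EState*, const QState*);
    void (*fg)(X, GState*,       const EState*, const QState*);
    void (*fe)(X, const GState*, EState*,       const QState*);
    void (*fq)(X, const GState*, const EState*, QState*);
    /* the only part reshaped by learning: g(0) -> g(t) */
    GState g;
    EState e;
    QState q;
} Brain;

/* One step of observable behavior: compute the output, then update all states */
Y brain_step(Brain *b, X x) {
    Y y = b->fy(x, &b->g, &b->e, &b->q);
    b->fg(x, &b->g, &b->e, &b->q);
    b->fe(x, &b->g, &b->e, &b->q);
    b->fq(x, &b->g, &b->e, &b->q);
    return y;
}

/* trivial stubs, just to make the sketch compile and run */
static Y    fy0(X x, const GState *g, const EState *e, const QState *q) { (void)g; (void)e; (void)q; return x; }
static void fg0(X x, GState *g, const EState *e, const QState *q) { (void)x; (void)e; (void)q; g->dummy++; }
static void fe0(X x, const GState *g, EState *e, const QState *q) { (void)x; (void)g; (void)q; e->dummy++; }
static void fq0(X x, const GState *g, const EState *e, QState *q) { (void)x; (void)g; (void)e; q->dummy++; }

int main(void) {
    Brain b = { fy0, fg0, fe0, fq0, {0}, {0}, {0} };
    return brain_step(&b, 1) == 1 ? 0 : 1;
}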

Learning as an evolutionary software development

Possibilities of learning increase dramatically with the learner's level of computing power. Only systems of type 0 can have "serious software" – in such systems, the problem of learning becomes the problem of producing instances of behavior and reinforcing the right instances. More complex behavior results in more complex software, the latter producing even more complex behavior, and so on. This evolutionary loop, which can start with random behavior (e.g., babbling), can lead to extremely complex results. In contrast, there isn't much room for software development and learning in systems of types 4 and 3.

In the EROBOT, the problem of learning was reduced, essentially, to memorizing the robot's behavior. Trivial as it may look, this approach is universal and can be developed very far in more sophisticated universal learning robots. In contrast, the approaches to learning that employ "smart" learning algorithms are not universal. They try to optimize the learner's performance in selected contexts and throw away information needed in a large number of other possible contexts.

Pitfall of “partial modeling”: the whole can be simpler than its parts

It is a known fact that the behavior of a “whole” physical system may have an efficient formal representation whereas the parts (projections) of this behavior may not. The Maxwell equations give a good illustration of this phenomenon. There exists an efficient formal representation of the behavior of the “whole” electromagnetic field. In nontrivial cases, however, it is practically impossible to find separate formal representations of either the electric or the magnetic projections of this behavior.

I argue that the same holds for the “physical” cognitive system (W,D,B). There exists an efficient formal representation of the behavior of the “whole” B(0). It is practically impossible, however, to find separate formal representations of the projections of this behavior corresponding to different nontrivial cognitive phenomena.

This claim has fundamental implications for the general structure of the theory of human cognition. It suggests that the only practically possible way to develop an adequate mathematical theory of the behavior of the cognitive system (W,D,B(0)) is to reverse engineer B(0). Metaphorically speaking, B(0) is for the cognitive theory what the Maxwell equations are for classical electrodynamics [9],[11].

(Figure: the analogy. In classical electrodynamics, the Maxwell equations are the fundamental constraints and the boundary conditions and sources are the specific external constraints; in the cognitive theory, B(0) is the fundamental constraint and (W,D) provides the specific external constraints.)

It is impossible to find separate formal representations of the “symbolic” and “dynamical” parts of the behavior of an E-machine. The “whole” has a simple representation; the “parts” don't.

Note. In [8], E-machines were characterized as “non-classical” symbolic systems. Due to the random choice, the probabilities of sequential symbolic processes in these systems are controlled by massively parallel dynamical processes. This approach provides a powerful integration of symbolic and dynamical computations. To my knowledge, no other general style of symbolic/dynamical computation, e.g., [1],[5],[42], offers similar explanatory possibilities. I argue that the brain is an essentially probabilistic machine, not a fuzzy, etc., deterministic system. That is, fluctuations in nerve membranes are of critical importance [40].

(Figure: “symbolic” input LTM and “symbolic” output LTM are connected through DECODING, MODULATION, RANDOM CHOICE, and ENCODING, with the “dynamical” STM and ITM modulating the flow from the “symbolic” input to the “symbolic” output.)

NEURAL IMPLEMENTATION

Returning to the 3-layer associative neural network (the invention of a “3-neuron” association was a big step in the evolution of the brain! See “The Brain”, a Scientific American book, 1979).

A functional model of the previous network [7],[8],[11]

(WTA.EXE)


Cerebellar cortex network [37]

Implementation of a very big ILTM [8]

Implementation of a very big OLTM [8]

Sophisticated machinery of a cell

(Figure: a cell, with its nucleus, membrane, and membrane proteins.)

It took evolution much longer to create individual cells than to build systems containing many cells, including the human brain. Different cells differ in their shape and in the types of their membrane proteins.

(Figure: the cell membrane with embedded membrane proteins and the nucleus; 18 nm and 3 nm scale markers.)

Typical neuron

A neuron is a very specialized cell. There are several types of neurons, with different shapes and different types of membrane proteins. A biological neuron is a complex integrated unit – not the simple “atomic” element used in traditional ANN models! Where does this complexity come from?

Protein molecule as a probabilistic molecular machine (PMM) [13],[16]


Ensemble of PMMs (EPMM)

E-states as occupation numbers

EPMM as a statistical mixed-signal computer [16]

Ion channel as a PMM

Monte-Carlo simulation of patch clamp experiments
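As an illustration of what such a simulation involves (a deliberately minimal sketch, not the lecture's EPMM.EXE: a single two-state channel with invented opening and closing rates), a PMM can be treated as a Markov chain whose random transitions produce the step-like single-channel currents seen in patch clamp records.

/* Sketch: Monte-Carlo simulation of a single two-state ion channel (PMM).
   Rates, time step, and current amplitude are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double k_open  = 50.0;    /* closed -> open rate, 1/s */
    double k_close = 200.0;   /* open -> closed rate, 1/s */
    double dt      = 1e-4;    /* time step, s */
    double i_open  = 2.0;     /* single-channel current when open, pA */
    int    state   = 0;       /* 0 = closed, 1 = open */

    srand(12345);
    for (int step = 0; step < 2000; step++) {
        double u = (double)rand() / RAND_MAX;
        if (state == 0 && u < k_open * dt)       state = 1;   /* stochastic opening */
        else if (state == 1 && u < k_close * dt) state = 0;   /* stochastic closing */
        /* one line per sample: time (ms) and current (pA), a synthetic "patch clamp" trace */
        printf("%.2f %.1f\n", step * dt * 1e3, state ? i_open : 0.0);
    }
    return 0;
}

In an ensemble of many such channels, the fractions of channels in each state (the occupation numbers) play the role of the E-states mentioned above.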

Two EPMM’s interacting via a) electrical and b) chemical messages

Spikes produced by an HH-like model [21] with 5-state K+ and Na+ PMM’s. (EPMM.EXE)

A model of sensitization and habituation in a pre-synaptic terminal

A PMM implementation of a putative calcium channel with sensitization and habituation

Note. The PMM formalism allows one to naturally represent considerably more complex models. This level of complexity is not available in traditional ANN models.

BIBLIOGRAPHY (slide 1 of 4)

[4] Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, IT-2, 113-124.

[7] Eliashberg, V. (1967). On a class of learning machines. Proceedings of the Conference on Automation in the Pulp & Paper Industry, April 1967, Leningrad, USSR. Proc. of VNIIB, #54, 350-398.

[9] Eliashberg, V. (1981). The concept of E-machine: On brain hardware and the algorithms of thinking. Proceedings of the Third Annual Meeting of the Cognitive Science Society, 289-291.

[8] Eliashberg, V. (1979). The concept of E-machine and the problem of context-dependent behavior. Txu 40-320, US Copyright Office.

[2] Ashcroft, F.M. (2004). Ion channels and disease. Academic Press, London

[1] Anderson, J.R. (1976). Language, Memory, and Thought. Hillsdale, New Jersey: Lawrence Erlbaum Associates, Publishers.

[10] Eliashberg, V. (1988). Neuron layer with reciprocal inhibition as a mechanism of random choice. Proc. of the IEEE ICNN-88.

[5] Collins, A.M., & Quillian, M.R., (1972). How to make a language user. In E. Tulving & W. Donaldson (Eds.) Organization and memory. New York: Academic Press.

[3] Baddeley, A.D. (1982). Your memory. A user's guide. Macmillan Publishing Co., Inc.

[6] Deutsch, D. (1985). Quantum theory, the Church-Turing principle and the universal quantum computer. Proc. of the Royal Society of London A400, pp. 97-117.

BIBLIOGRAPHY (slide 2 of 4)

[15] Eliashberg, V. (2002). What Is Working Memory and Mental Imagery? A Robot that Learns to Perform Mental Computations. Web publication, www.brain0.com, Palo Alto, California

[14] Eliashberg, V. (1993). A relationship between neural networks and programmable logic arrays. 0-7803-0999-5/93, IEEE, 1333-1337.

[16] Eliashberg, V. (2005). Ensembles of membrane proteins as statistical mixed-signal computers. Proc. of IJCNN-2005, Montreal, Canada, pp. 2173-2178.

[19] Hille, B. (2001). Ion channels of excitable membranes. Sinauer Associates. Sunderland, MA

[21] Hodgkin, A.L., and Huxley, A.F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology, 117, pp. 500-544.

[13] Eliashberg, V. (1990b). Molecular dynamics of short-term memory. Mathematical and Computer modeling in Science and Technology. vol. 14, 295-299.

[12] Eliashberg, V. (1990a). Universal learning neurocomputers. Proceedings of the Fourth Annual Parallel Processing Symposium, California State University, Fullerton, April 4-6, 1990, 181-191.

[18] Grossberg, S. (1982). Studies of mind and brain. Boston: Reidel Press.

[20] Hinton, G.E., and Anderson, J.A. (1981). Parallel models of associative memory. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

[17] Gerstner, W., and Kistler, W.M. (2002). Spiking Neuron Models. Cambridge University Press.

[11]  Eliashberg, V. (1989). Context-sensitive associative memory: "Residual excitation" in neural networks as the mechanism of STM and mental set. Proceedings of IJCNN-89, June 18-22, 1989, Washington, D.C. vol. I, 67-75.

BIBLIOGRAPHY (slide 3 of 4)

[32] Rosenblatt, F. (1961). Principles of Neurodynamics: Perceptrons and the theory of brain mechanisms. Spartan Books.

[28] Meynert, T. (1884). Psychiatrie. Wien.

[31] Reichardt, W., and MacGinitie, G. (1962). Zur Theorie der Lateralen Inhibition. Kybernetik, B. 1, Nr. 4.

[29] Minsky, M.L. (1967). Computation: Finite and Infinite Machines. Prentice-Hall, Inc.

[27] McCulloch, W.S., and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.

[33] Rumelhart, D.E., McClelland, J.L., and the PDP research group (1986). Parallel distributed processing. Cambridge MA: MIT press.

[24] Kandel, E.R., and Spencer, W.A. (1968). Cellular Neurophysiological Approaches in the Study of Learning. Physiological Rev. 48, 65-134.

[23] Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 79, 2554-2558.

[25] Kandel, E., Jessell, T., and Schwartz, J. (2000). Principles of Neural Science. McGraw-Hill.

[26] Kohonen, T. (1984). Self-Organization and Associative Memory, Springer-Verlag.

[30] Pinker, S, and Mehler, J. eds. (1988). Connections and Symbols. The MIT Press.

BIBLIOGRAPHY (slide 4 of 4)

[42] Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, pp. 338-353.

[41] Vvedensky, N.E. (1901). Excitation, inhibition, and narcosis. In Complete collection of works. USSR, 1953.

[43] Zopf, G.W., Jr. (1961). Attitude and Context. In "Principles of Self-organization", Pergamon Press, pp. 325-346.

[38] Turing, A.M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Society, ser. 2, 42, pp. 230-265.

[37] Szentagothai, J. (1968). Structuro-functional considerations of the cerebellar neuron network. Proc. of IEEE, vol. 56, 6, pp. 960-966.

[36] Steinbuch, K. (1961). “Die Lernmatrix”, Kybernetik, vol. 1, pp.36-45.

[39] Varju, D. (1965). On the theory of lateral inhibition. Consiglio Nazionale delle Ricerche, Quaderni de "La Ricerca Scientifica", v. 31.

[40] Verveen, A.A., and Derksen, H.E. (1968). Fluctuation phenomena in nerve membrane. Proc. of the IEEE, v. 56, No. 6.

[34] Siegelmann, H.T. (2003). Neural and Super-Turing Computing. Minds and Machines, vol. 13, No. 1, pp. 103-114.

[35] Sima, J., and Orponen, P. (2003). General-Purpose Computation with Neural Networks: A Survey of Complexity-Theoretic Results. Neural Computation, 15, pp. 2727-2778.