
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 2, NO. 1, MARCH 1994, p. 129

A Gaussian Synapse Circuit for Analog VLSI Neural Networks

Joongho Choi, Bing J. Sheu, and Josephine C.-F. Chang

Abstract- Back-propagation neural networks with Gaussian function synapses have better convergence properties than those with linear-multiplying synapses. In digital simulation, more computing time is spent on Gaussian function evaluation. We present a compact analog synapse cell which is not biased in the subthreshold region, for fully parallel operation. This cell can approximate a Gaussian function with accuracy around 98% in the ideal case. Device mismatch induced by the fabrication process will cause some degradation to this approximation. The Gaussian synapse cell can also be used in unsupervised learning. Programmability of the proposed Gaussian synapse cell is achieved by changing the stored synapse weight W_ji, the reference current, and the sizes of the transistors in the differential pair.

I. INTRODUCTION

Artificial neural networks have shown great promise for complex pattern classification applications. There are five popular transfer functions in artificial neural network studies: the linear function, step function, ramp function, sigmoid function, and Gaussian function [1]. With the exception of the linear function, these transfer functions introduce a nonlinearity in the network behavior by bounding the output values within a fixed range. In addition, a high-gain region facilitates the pattern-classification capabilities of artificial neural networks. The step function and the ramp function are simplified versions of the sigmoid function. Due to its continuous derivative, the sigmoid function is widely used in back-propagation networks, which have been applied to various applications such as character recognition, sonar target recognition, image classification, signal encoding, knowledge processing, and a variety of other pattern-analysis problems [2],[3]. However, the conventional error back-propagation network usually requires a very long time to converge to the correct weights. The sigmoid function of a conventional multilayer network gives a smooth response over a wide range of input values. In contrast, the Gaussian function network responds significantly only to a very local region of the space of input values. Back-propagation training is more efficient in neural networks with Gaussian functions in the hidden layers than in those based on sigmoid functions. Speed-ups of up to two to three orders of magnitude in training have been reported for Gaussian function neural networks in pattern-recognition applications such as phoneme classification [4],[5].

We present a compact analog VLSI circuit design for Gaussian function neural networks. The proposed circuit is not biased in the subthreshold region, so a significant driving capability is achieved. It is also optimized for precision, operation speed, and cell compactness, to be suitable for scalable neural network implementation.

Manuscript received February 26, 1993; revised September 13, 1993. This work was supported in part by DARPA under Contract MDA 972-90-C-0037 and by TRW Inc. and Samsung Electronics Co.

J. Choi was with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA. He is now with the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598.

B. J. Sheu and J. C.-F. Chang are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-0271.

IEEE Log Number 9214289.

Fig. 1. A portion of a complete neural network with Gaussian synapse characteristics.

II. GAUSSIAN FUNCTION NETWORKS

Fig. 1(a) shows a portion of a complete Gaussian function neural network. The input neurons are fully connected to the output neurons through the synapse matrix. The input neurons can belong to the input layer or a previous hidden layer, while the output neurons can belong to the next hidden layer or the output layer of the network. The resultant operation of the input neurons, synapses, and output neurons can be expressed as

Y_j = \sum_{i=1}^{M} \exp(-(X_i - W_ji)^2 / (2 \sigma_ji^2)),  j = 1, 2, ..., N    (1)

where W_ji is the weight of the synapse cell which connects the ith input neuron and the jth output neuron. M input neurons and N output neurons are chosen for (1). Each synapse between an input neuron and an output neuron can perform the distributed Gaussian function with the weight value being the mean of the Gaussian function. Changing the mean value W_ji increases or decreases the connection strength from the input neuron X_i to the output neuron Y_j [6]. The parameter \sigma_ji determines the standard deviation of the Gaussian function characteristic.
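As a purely illustrative aid (not part of the original paper), the short Python sketch below evaluates the feedforward relation in (1) for a small layer, assuming the output neuron simply sums its Gaussian synapse contributions; the array sizes and values are arbitrary.

```python
import numpy as np

def gaussian_layer(x, W, sigma):
    """Feedforward pass of one Gaussian-synapse layer, following (1).

    x     : (M,) input values X_i
    W     : (N, M) synapse means W_ji
    sigma : (N, M) per-synapse standard deviations sigma_ji
    Returns the (N,) output values Y_j (sum of synapse outputs per neuron).
    """
    d = x[np.newaxis, :] - W                       # X_i - W_ji for every synapse
    return np.exp(-d**2 / (2.0 * sigma**2)).sum(axis=1)

# Hypothetical example: M = 3 inputs, N = 2 outputs
x = np.array([0.5, -1.0, 2.0])
W = np.array([[0.4, -1.2, 1.9],
              [0.0,  0.0, 0.0]])
sigma = np.full((2, 3), 0.78)                      # spread value used in Fig. 2(b)
print(gaussian_layer(x, W, sigma))
```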

In a Gaussian function network, each synapse needs to compute the exponential nonlinearity exp(-(X_i - W_ji)^2 / (2 \sigma_ji^2)) rather than a linear multiplication X_i \cdot W_ji. The linear operation is used in the conventional back-propagation networks. To accomplish the linear multiplication and the squared-difference operation, a differential-input Gilbert multiplier can be used in analog neural network processor design [7]. The exponential nonlinearity was usually computed in simulations on digital computers. Recently, one hardware implementation of analog VLSI neural network processors with transistors biased in the subthreshold region was reported [8]. In the subthreshold region, the drain current of an MOS transistor has an exponential dependence on the gate bias [9]. Subthreshold-region VLSI circuits are suitable for the implementation of biologically inspired artificial neural systems [10]. Millions of MOS transistors biased in the subthreshold region can be integrated on a single silicon chip because of the extremely low power consumption of each component.



Fig. 2. Basic Gaussian synapse cell with single-ended input. (a) Circuit schematic. (b) The solid line is the simulation result of the Gaussian synapse cell with mean 0 and standard deviation 0.78; the dashed line is the ideal Gaussian curve.

On the other hand, in the strong-inversion region, the drain currents of MOS transistors have a power-law dependence on the bias voltages. Since strong-inversion operation of MOS circuits provides high current drive, large dynamic range, and high noise immunity, it is desirable to build high-speed analog VLSI neural network processors with MOS transistors biased in the strong-inversion region for engineering applications.
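To make the contrast concrete, the sketch below compares first-order textbook saturation-region models of the MOS drain current in the two regimes: an exponential law in subthreshold and a square law in strong inversion. The parameter values (I_0, n, beta, V_TH) are illustrative assumptions, not device data from the paper.

```python
import numpy as np

# First-order textbook MOS drain-current models (saturation), used only to
# illustrate the exponential vs. power-law dependence discussed above.
# All parameter values below are illustrative, not from the paper.

def id_subthreshold(vgs, i0=20e-9, n=1.5, vt_thermal=0.0259):
    """Subthreshold: exponential dependence on the gate bias."""
    return i0 * np.exp(vgs / (n * vt_thermal))

def id_strong_inversion(vgs, beta=100e-6, vth=0.8):
    """Strong inversion (square law): power-law dependence on the gate bias."""
    vov = np.maximum(vgs - vth, 0.0)
    return 0.5 * beta * vov**2

for vgs in (0.3, 0.5, 1.2, 1.8):
    print(f"Vgs={vgs:4.1f} V  sub={id_subthreshold(vgs):9.2e} A  "
          f"strong={id_strong_inversion(vgs):9.2e} A")
```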

A. Circuit Analysis

The proposed Gaussian synapse approximates the exponential nonlinearity by a piecewise approximation in the current-voltage space. This approximation results from the fact that the I-V characteristics of a transistor in the strong-inversion region obey a power law instead of an exponential relation. Fig. 2(a) shows the circuit schematic diagram and the transistor sizes of a basic synapse cell with single-ended input data. The Gaussian function synapse cell consists of an MOS differential pair and several arithmetic computational units in a current-mode configuration. Transistors with large channel lengths are used to avoid the channel-length modulation effect. Since the current-mode computation used in the synapse cell depends upon the current-matching property of the current mirrors, a large channel length is helpful for obtaining high-precision operation, especially for reducing the offset current.

The input voltage is applied to the gate terminal of one transistor in the differential pair, and the weight value is stored on the total capacitance at the gate terminal of the other transistor. The two currents in the differential pair are expressed as

I_1 = I_r - (\beta/4)(V_in - V_w) \sqrt{8 I_r/\beta - (V_in - V_w)^2}    (2)

I_2 = I_r + (\beta/4)(V_in - V_w) \sqrt{8 I_r/\beta - (V_in - V_w)^2}    (3)

with the input voltage in a finite region of

|V_in - V_w| \le \sqrt{4 I_r/\beta}.    (4)

Here, I_r is the reference current, and \beta = \mu C_ox (W/L) is the transconductance parameter of transistors M_1 and M_2.

The output current of this synapse cell is determined by

I_out = a \cdot I_r - (I_6 + I_8),    (5)

where a is the drain current ratio of transistor M_12 to M_10. When V_in - V_w < 0, then I_1 > I_r and I_2 < I_r. In this situation, I_6 = I_1 - b \cdot I_r and I_8 = 0. Here, b is the corresponding drain current ratio of the mirror transistors that supply b \cdot I_r. On the other hand, when V_in - V_w > 0, then I_1 < I_r and I_2 > I_r. In this new situation, I_6 = 0 and I_8 = I_2 - b \cdot I_r. When the input voltage V_in is comparable to the synapse weight value V_w, transistors M_6 and M_8 and the associated mirror transistors turn off, and the output current is mainly contributed by transistor M_12. The values of the control parameters a and b can be chosen to better approximate the ideal Gaussian curve; their typical values are quite close to one. The SPICE-3 [11],[12] circuit simulation result with a weight value of zero is shown in Fig. 2(b). The simulated output current closely matches the original Gaussian function curve.
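The following behavioral Python sketch ties (2)-(5) together, using the differential-pair expressions reconstructed above and ideal current mirrors. It is a functional approximation for intuition, not a transistor-level model, and all parameter values are assumed.

```python
import numpy as np

def synapse_current(vin, vw, ir=20e-6, beta=50e-6, a=1.0, b=1.0):
    """Behavioral model of the basic Gaussian synapse cell, per (2)-(5).

    The differential pair splits a 2*Ir tail current into I1 and I2; the
    mirrors form I6 = max(I1 - b*Ir, 0) and I8 = max(I2 - b*Ir, 0), and the
    output Iout = a*Ir - (I6 + I8) peaks near vin = vw.
    Parameter values are illustrative only.
    """
    limit = np.sqrt(4 * ir / beta)               # finite region from (4)
    dv = np.clip(vin - vw, -limit, limit)
    swing = (beta / 4.0) * dv * np.sqrt(8 * ir / beta - dv**2)
    i1 = ir - swing                              # (2)
    i2 = ir + swing                              # (3)
    i6 = np.maximum(i1 - b * ir, 0.0)            # conducts only for vin - vw < 0
    i8 = np.maximum(i2 - b * ir, 0.0)            # conducts only for vin - vw > 0
    return a * ir - (i6 + i8)                    # (5)

vin = np.linspace(-3, 3, 7)
print(synapse_current(vin, vw=0.0))              # bell-shaped, peak a*Ir at vin = vw
```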

An enhanced synapse cell with differential input/weight has also been developed. The circuit schematic diagram and the transistor sizes are shown in Fig. 3(a). Fig. 3(b) shows the comparison of the simulated output current characteristic of this enhanced synapse cell and the ideal Gaussian function curve. A better matching than the basic synapse cell has been achieved due to the symmetric handling of the positive and negative signals. Both the symmetric input and the synapse voltages are obtained with reference to the analog ground by inverting voltage amplifiers, which consist of operational amplifiers with input and feedback resistors. The enhanced synapse cell approximates the Gaussian function with an accuracy around 98% over the input voltage range of 3 V in the ideal case, when device imperfections such as mismatch and offset are not considered. Device mismatch induced by the fabrication process will cause some degradation to this approximation. In fact, the usable output signal range of the enhanced synapse cell is almost doubled due to the use of the differential circuit architecture. However, the area of the enhanced synapse cell is approximately twice that of the basic synapse cell. The required silicon area is 125 λ x 69 λ for the basic synapse cell and 146 λ x 99 λ for the enhanced synapse cell in the scalable CMOS design rules from the MOSIS Service of the USC Information Sciences Institute at Marina del Rey, CA [13].

The synapse voltage V_w is stored on the capacitance formed by the gate capacitance of the MOS transistor.



Fig. 3. Enhanced Gaussian synapse cell with differential input/weight. (a) Circuit schematic. (b) The solid line is the simulation result of the Gaussian synapse cell with mean 0 and standard deviation 1.07; the dashed line is the ideal Gaussian curve.

An additional capacitor can be added to increase the effective storage strength. The differential input and weight voltages with a 0-V common mode are sampled with a unity-gain buffer for one signal polarity and with an inverting voltage amplifier for the other polarity. The weight voltage is written onto this capacitance during the learning phase to update the weight value and during the retrieving phase to refresh the weight information. This weight update is done through MOS switches which are controlled by the row decoder and the column decoder. The device sizes of the pass transistors have been chosen to be the minimum geometry of the given fabrication process. Therefore, the common-mode noise resulting from feedthrough or discharge is significantly reduced.
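As a back-of-the-envelope illustration of why periodic refreshing is needed, the sketch below estimates a refresh interval from an assumed storage capacitance and junction leakage current; neither value is given in the paper.

```python
# Rough droop estimate for the dynamically stored weight voltage.
# All three numbers below are hypothetical; the paper does not give the
# storage capacitance or the junction leakage of its process.
c_store = 0.5e-12      # assumed gate/storage capacitance, 0.5 pF
i_leak  = 0.1e-12      # assumed reverse-biased junction leakage, 0.1 pA
dv_max  = 10e-3        # tolerable weight droop, 10 mV

t_refresh = c_store * dv_max / i_leak   # from dV/dt = I_leak / C
print(f"refresh interval < {t_refresh * 1e3:.0f} ms")
```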

B. Programmability

The great power of an artificial neural network derives from its ability to adapt to an unknown and changing environment. Therefore, good programmability is of fundamental importance.


Fig. 4. Programmability of the enhanced Gaussian synapse cell. (a) Different amplitudes with I_r being 30, 20, and 10 μA. (b) Three different standard deviations, 1.708, 1.016, and 0.626, produced by W/L being 4λ/8λ, 4λ/4λ, and 4λ/2λ. (c) Different mean values with V_w being -1, 0, and 1 V.



Fig. 5. (a) An example network with four Gaussian synapse cells. (b) Speed response of the Gaussian network. (c) The summed current is shown in the dashed line; output currents of the four individual synapse cells are shown in the solid lines. (d) Measured results of the Gaussian synapse cell with different weight values of -1, 0, and 1 V.

In implementing the circuit building blocks of various Gaussian functions for an analog VLSI neuroprocessor, three values are to be programmed: the maximum magnitude, the mean value, and the standard deviation. From (2) to (6), the output current is controlled by the reference current I_r. By changing this reference current, the magnitude of the Gaussian output current can be adjusted.

Due to possible leakage through the reverse-biased pn junctions, periodic refreshing is necessary to maintain an accurate synapse value. If an EEPROM device is used, the mean value can be stored permanently at room temperature.

The standard deviation of the Gaussian function can be changed according to the constraint given by (4). For a fixed value of I_r, the shape of the output current curve can be varied by changing the sizes of the transistors in the input differential pair. In the differential pair, transistors with different sizes can be connected together through MOS switches which are controlled by the data stored in a local D flip-flop [14]. By combining these programmable data, the various sizes of the transistors in the differential pair yield the corresponding standard deviations of the Gaussian function.

The simulated results on the programmability of the proposed enhanced Gaussian synapse cell are shown in Fig. 4 for different values of amplitude, mean, and standard deviation. The curves shown in Fig. 4(a) and (b) are obtained by setting V_w to zero. In Fig. 4(a), three Gaussian curves are created by setting the reference current to 10, 20, and 30 μA, respectively. In Fig. 4(b), the Gaussian curve with a very small standard deviation of 0.626 is created by choosing the W/L ratio of the input differential pair to be 4λ/2λ. Three curves corresponding to different mean values are shown in Fig. 4(c). The amount of curve shifting is equal to the distance between the origin and the stored synapse value.
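The same three knobs can be exercised on the behavioral synapse_current() sketch given earlier (that block must be in scope); the mapping of amplitude to I_r, standard deviation to the differential-pair beta (i.e., W/L), and mean to V_w shown below uses illustrative values only.

```python
import numpy as np
# Reuses the behavioral synapse_current() sketch defined above.
vin = np.linspace(-3, 3, 121)

# (a) amplitude: scale the reference current Ir
for ir in (10e-6, 20e-6, 30e-6):
    peak = synapse_current(np.array([0.0]), vw=0.0, ir=ir)[0]
    print(f"Ir={ir*1e6:2.0f} uA -> peak Iout ~ {peak*1e6:2.0f} uA")

# (b) standard deviation: change the differential-pair W/L, i.e. beta
for beta in (12.5e-6, 25e-6, 50e-6):
    print(f"beta={beta*1e6:4.1f} uA/V^2 -> active range ±{np.sqrt(4*20e-6/beta):.2f} V")

# (c) mean: shift the stored weight voltage Vw
for vw in (-1.0, 0.0, 1.0):
    i = synapse_current(vin, vw=vw)
    print(f"Vw={vw:+.0f} V -> peak at Vin ~ {vin[np.argmax(i)]:+.2f} V")
```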


C. Network Considerations

Fig. 5(a) shows an example network for demonstrating the performance of the proposed Gaussian function network. The input neurons consist of unity-gain amplifiers as data buffers. The same input voltage value is applied to the four input neurons. A linear resistor in the output neuron converts the summed current into the output voltage. When the number of synapses increases, the summed current may also increase, and a smaller resistance value is to be used for dynamic-range adjustment. Since the inverting input of the output neuron is virtually grounded, the contribution of current from one synapse cell is independent of that from another synapse cell. Therefore, the neuron response does not depend on the accumulation of multiple g_ds8(9)'s and g_ds12's.

Fig. 5(b) shows the SPICE simulation results of the Gaussian network. The stored weight values of the four synapse cells are 2.6, 1.8, 0.2, and -3.0 V. A typical response time of less than 100 ns is achieved for the internal capacitive load. A large-scale neural network can be constructed using the proposed Gaussian synapse cells and the input/output neuron cells reported in [7],[15]. Fig. 5(c) shows the output current of each Gaussian cell in solid lines and their summed current in the dashed line as the input voltage changes continuously from -3 to 3 V. The measured results on the dc characteristics of the synapse cells with the stored weight values of -1, 0, and 1 V are shown in Fig. 5(d).
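A self-contained behavioral sketch of the four-synapse example of Fig. 5(a) is given below; each cell is idealized as a Gaussian current source, and the peak current, spread, and feedback resistance are assumed values rather than figures from the paper.

```python
import numpy as np

# Behavioral sketch of the four-synapse example network of Fig. 5(a): each
# cell is idealized as a Gaussian current source, and the output neuron
# converts the summed current to a voltage through a feedback resistor.
weights = np.array([2.6, 1.8, 0.2, -3.0])    # stored weight voltages (V), from Fig. 5(b)
i_peak, sigma, r_fb = 20e-6, 0.8, 10e3       # assumed peak current, spread, resistor

vin = np.linspace(-3, 3, 13)
i_cells = i_peak * np.exp(-(vin[:, None] - weights)**2 / (2 * sigma**2))
i_sum = i_cells.sum(axis=1)                  # virtual ground -> cell currents simply add
vout = -r_fb * i_sum                         # inverting current-to-voltage conversion
                                             # (sign depends on the current direction)
for v, i, vo in zip(vin, i_sum, vout):
    print(f"Vin={v:+.1f} V  Isum={i*1e6:6.2f} uA  Vout={vo:+.3f} V")
```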



Fig. 6. Schematic diagram of the analog Gaussian network with supporting modules for on-chip learning.


A complete Gaussian network can be supported by a DSP core, memory, and A/D and D/A converters for on-chip learning, as shown in Fig. 6. The feedforward operation is performed by the Gaussian network. The DSP core updates the synapse weight values during the learning phase. The digital weight values can be stored in the SRAM first. After digital-to-analog conversion, the analog weight signals are sent to the Gaussian network through the row and column decoders. Outputs from the network are digitized by the analog-to-digital converter and sent to the DSP core for the learning process.
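The sketch below emulates this learning loop in Python: the analog forward pass is replaced by a behavioral Gaussian model, the D/A and A/D converters by quantizers, and the DSP update by a simple gradient-style rule. The converter resolutions, learning rate, and update rule are assumptions for illustration; the paper does not specify the learning algorithm.

```python
import numpy as np

def quantize(x, bits, full_scale):
    """Emulate a converter of the given resolution over ±full_scale."""
    step = 2 * full_scale / (2**bits - 1)
    return np.round(np.clip(x, -full_scale, full_scale) / step) * step

def analog_forward(x, w, sigma=0.8):
    """Emulated Gaussian network: sum of Gaussian synapse currents."""
    return np.exp(-(x - w)**2 / (2 * sigma**2)).sum()

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, size=4)                 # digital weights held by the DSP core
targets = [(0.5, 2.0), (-1.0, 1.0)]            # hypothetical (input, desired output) pairs
lr, sigma = 0.05, 0.8

for epoch in range(100):
    for x, d in targets:
        w_analog = quantize(w, bits=8, full_scale=3.0)      # D/A: write weights to array
        y = quantize(analog_forward(x, w_analog, sigma),    # A/D: digitize network output
                     bits=8, full_scale=4.0)
        err = d - y                                         # DSP core computes the error
        grad = (x - w_analog) / sigma**2 * np.exp(-(x - w_analog)**2 / (2 * sigma**2))
        w += lr * err * grad                                # gradient-style weight update
print(np.round(w, 3))
```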

III. CONCLUSION

Programmable Gaussian synapse cells which are suitable for fast learning in artificial neural networks have been developed. The proposed Gaussian network is not biased in the subthreshold region, and therefore the drawback of slow operation is eliminated. The differential input/weight synapse cell can approximate a Gaussian function with 98% accuracy, in the ideal case without considering device mismatch, over the input voltage range of -3 to 3 V in a 2-μm CMOS technology.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their valuable comments and suggestions.

REFERENCES

[1] P. K. Simpson, "Foundations of neural networks," in Artificial Neural Networks: Paradigms, Applications, and Hardware Implementations, E. Sánchez-Sinencio and C. Lau, Eds. New York: IEEE Press, 1992.

[2] R. P. Lippmann, "An introduction to computing with neural nets," IEEE ASSP Mag., vol. 4, pp. 4-22, Apr. 1987.

[3] J. Dayhoff, Neural Network Architectures. New York: Van Nostrand Reinhold, 1990.

[4] J. Platt, "A resource-allocating neural network for function interpolation," Neural Computation, vol. 3, no. 2, pp. 213-225, Summer 1991.

[5] A. L. Dajani, M. Kamel, and M. I. Elmasry, "Single layer potential function neural network for unsupervised learning," in Proc. IEEE/INNS Int. Joint Conf. Neural Networks, vol. II, San Diego, CA, June 1990, pp. 273-278.

[6] T. Poggio and F. Girosi, "Networks for approximation and learning," Proc. IEEE, vol. 78, pp. 1481-1497, Sept. 1990.

[7] B. J. Sheu, J. Choi, and C.-F. Chang, "An analog neural network processor for self-organizing mapping," in Tech. Digest, IEEE Int. Solid-State Circuits Conf., San Francisco, CA, Feb. 1992, pp. 136-137.

[8] S. S. Watkins and P. M. Chau, "A radial basis function neurocomputer implemented with analog VLSI circuits," in Proc. IEEE/INNS Int. Joint Conf. Neural Networks, vol. II, Baltimore, MD, June 1992, pp. 607-612.

[9] P. R. Gray and R. G. Meyer, Analysis and Design of Analog Integrated Circuits, 2nd ed. New York: Wiley, 1984.

[10] C. A. Mead, Analog VLSI and Neural Systems. New York: Addison-Wesley, 1989.

[11] B. Johnson, T. Quarles, A. R. Newton, D. O. Pederson, and A. Sangiovanni-Vincentelli, SPICE3 Version 3E1 User's Guide, Dept. of EECS, Univ. of California, Berkeley, Apr. 1991.

[12] B. J. Sheu, D. L. Scharfetter, P.-K. Ko, and M.-C. Jeng, "BSIM: Berkeley short-channel IGFET model for MOS transistors," IEEE J. Solid-State Circuits, vol. 22, pp. 558-566, Aug. 1987.

[13] C. Tomovich, "MOSIS-A gateway to silicon," IEEE Circuits and Devices Mag., vol. 4, pp. 22-23, Mar. 1988.

[14] P. W. Hollis and J. J. Paulos, "Artificial neural networks using MOS analog multipliers," IEEE J. Solid-State Circuits, vol. 25, pp. 849-855, June 1990.

[15] S. Satyanarayana and Y. P. Tsividis, "A reconfigurable VLSI neural network," IEEE J. Solid-State Circuits, vol. 27, pp. 67-81, Jan. 1992.
