DESIGN OF DYNAMICAL ASSOCIATIVE MEMORIES
VIA FINITE-STATE RECURRENT NEURAL
NETWORKS
by
Mehmet Kerem MUEZZINOGLU
August, 2003
IZMIR
DESIGN OF DYNAMICAL ASSOCIATIVE MEMORIES
VIA FINITE-STATE RECURRENT NEURAL
NETWORKS
A Thesis Submitted to the
Graduate School of Natural and Applied Sciences of
Dokuz Eylul University
In Partial Fulfillment of the Requirements for
The Degree of Doctor of Philosophy in Electrical and Electronics Engineering,
Electrical and Electronics Program
by
Mehmet Kerem MUEZZINOGLU
August, 2003
IZMIR
ACKNOWLEDGMENTS
I would like to dedicate this work to my aunt Remziye Dalkıran, who initiated my
education. The steps up to this point would have been more troublesome and less
meaningful without her. I would like to thank my parents for their continuous support during
my education, and my wife Tulay Muezzinoglu for her patience and her help in improving
and writing this thesis. I also appreciate Irem Stratmann's prompt efforts in providing
the necessary scientific documents.
I am thankful to the Scientific and Technical Research Council of Turkey and the Munir Birsel
Foundation for providing the financial support to develop this work at the Computational
Intelligence Laboratory, University of Louisville. Prof. Dr. Jacek M. Zurada took care of
me and my work there; his kind supervision is gratefully acknowledged.
Beyond all, I am indebted to Prof. Dr. Cuneyt Guzelis, whose advice has never failed me
at any stage of this Ph.D. work. Not only has he patiently supervised my graduate
studies, but he has also consistently shaped my academic point of view.
Mehmet Kerem MUEZZINOGLU
ABSTRACT
The information retrieval capability of recurrent neural networks and the performance of
their formerly proposed design procedures are questioned in this thesis. Five novel design
methods are then introduced for the discrete Hopfield recurrent network model to restore
prototype static vectors from their distorted versions while operating on a finite state space.
The qualitative properties provided by these methods are verified analytically, while the
quantitative ones are estimated by conducting computer experiments. A comparison of each
proposed method with the conventional design procedures is presented in terms of these
properties. The performance of the resulting networks is finally demonstrated on benchmark
static information retrieval applications, namely character recognition and image reconstruction.
Keywords: Associative memory, Hopfield network, information storage, information
retrieval, image reconstruction.
OZET
In this thesis, the information retrieval performance of dynamical artificial neural networks
and their previously proposed design methods are questioned. Five new design methods are
proposed for the discrete Hopfield network, which operates on a finite state space, to restore
distorted static memory vectors. The qualitative properties provided by the methods are
verified analytically, while the quantitative ones are estimated through computer experiments.
Each proposed method is compared with the known design methods in terms of these
properties. The performance of the dynamical networks designed by these methods is
demonstrated on static information retrieval applications such as character recognition and
image restoration.
Keywords: Associative memory, Hopfield network, information storage, information retrieval,
image restoration.
CONTENTS

Contents
List of Tables
List of Figures

Chapter One  Introduction
  1.1 The Memory Concept
    1.1.1 Address Addressable Memory
    1.1.2 Content Addressable Memory
  1.2 Associative Memory
    1.2.1 Association in Engineering
    1.2.2 Formulation of Auto-Association
    1.2.3 The Nearest Codeword Problem
    1.2.4 Auto-Associative Memory Design
  1.3 Neural Associative Memories
    1.3.1 Feed-Forward Auto-Associators
      1.3.1.1 Optimal Linear Associative Memory
      1.3.1.2 The Hamming Network
    1.3.2 Dynamical Auto-Associators
      1.3.2.1 Brain-State-in-a-Box Model
      1.3.2.2 The Hopfield Era
      1.3.2.3 M-Model
  1.4 Organization of Thesis

Chapter Two  Discrete Hopfield Associative Memory
  2.1 Discrete Hopfield Network Topology
    2.1.1 Operation Modes of Network
    2.1.2 Implementation Notes
  2.2 Recurrent Associative Memory Design Criteria
    2.2.1 Criteria for Memory Representation
    2.2.2 Criteria for Error Correction
    2.2.3 Ideal Recurrent Associative Memory
  2.3 Milestones of Recurrent Associative Memory Design
    2.3.1 Outer-Product Method
    2.3.2 Projection Learning Rule
    2.3.3 Eigen-Structure Method
    2.3.4 Linear Inequality Systems to Store Fixed Points

Chapter Three  Two Graph Theoretical Design Methods for Recurrent Associative Memory
  3.1 The Boolean Hebb Rule
    3.1.1 Motivation
    3.1.2 A Graph Representation of a Binary Memory Set
      3.1.2.1 The Boolean Hebb Rule
      3.1.2.2 Formulation of Maximal Independent Sets
      3.1.2.3 Compatibility of a Binary Set
    3.1.3 Design Procedure
      3.1.3.1 Unipolar Discrete Hopfield Network
      3.1.3.2 A DHN Free from Spurious Memories: MIS Network
      3.1.3.3 All Fixed Points of MIS-N are Attractive
      3.1.3.4 An Update Rule Provides Attractiveness for Each Memory Vector
    3.1.4 Quantitative Properties of Boolean Hebb Rule
      3.1.4.1 Comparison with Outer-Product Method
    3.1.5 Simulation Results
      3.1.5.1 A Compatible Example
      3.1.5.2 A Compatibilization Procedure and its Character Recognition Application
  3.2 Recurrent Associative Memory Design via Path Embedding into a Graph
    3.2.1 Proposed Method
    3.2.2 Simulation Results

Chapter Four  Construction of Energy Landscape for Discrete Hopfield Associative Memory
  4.1 Motivation
  4.2 Discrete Quadratic Design
    4.2.1 Original Design Method
    4.2.2 Applicability of the Original Method
    4.2.3 An Extension of the Method
  4.3 Computer Experiments
    4.3.1 Applicability and Capacity of the Original Design Method
    4.3.2 A Design Example
    4.3.3 Character Recognition and Reconstruction
    4.3.4 A Classification Application
    4.3.5 An Application of the Extended Method

Chapter Five  Multi-State Recurrent Associative Memory Design
  5.1 Motivation
  5.2 Design Procedure
    5.2.1 Complex-Valued Multistate Hopfield Network
    5.2.2 Design of Quadratic Energy Function with Desired Local Minima
    5.2.3 Elimination of Trivial Spurious Memories
    5.2.4 Algorithmic Summary of the Method
  5.3 Simulation Results
    5.3.1 Complete Storage Performance
    5.3.2 Application of the Design Procedure

Chapter Six  Multi-Layer Recurrent Associative Memory Design
  6.1 Motivation
  6.2 Multi-Layer Recurrent Network
  6.3 Design Procedure
  6.4 Experimental Results

Chapter Seven  Conclusions

References
Appendix
Notes
LIST OF TABLES

Table 3.1  The maximum capacity Cmax(n), the probability pc(n | m ≤ Cmax(n)), and the best lossless compression ratio Rb.
Table 3.2  Percentages of complete storage in the DHNs designed by the Outer-Product Method (POPM%) and the Boolean Hebb rule (PBHR%) for uniformly distributed random sets.
Table 3.3  Complete storage percentages POPM% and PBHR% for different bit probabilities.
Table 3.4  Average percentages AvPOPM% and AvPBHR% for different bit probabilities.
Table 3.5  Simulation results of MIS-N.
Table 3.6  Average number of spurious memories for some n, m values.
Table 4.1  Percentages of memory sets that yielded feasible inequality systems.
Table 5.1  Percentages of memory sets that yielded feasible inequality systems.
Table 6.1  Performance of the proposed method in providing perfect storage and creating spurious memories and/or limit cycles depending on l.
LIST OF FIGURES

Figure 1.1  Block diagram of a typical address addressable memory operation.
Figure 1.2  A content addressable memory.
Figure 2.1  Conventional discrete Hopfield network model.
Figure 2.2  Implementation of the analog Hopfield neural network model.
Figure 3.1  (a) The graphs Gx and Gy having Sx = {1, 2} and Sy = {1, 2, 3}, respectively, as their unique MIS, where x = [1 1 0]^T and y = [1 1 1]^T. (b) The graph G into which both x and y are embedded.
Figure 3.2  (a) Gx, Gy, and Gz having Sx = {2, 3}, Sy = {1, 3}, and Sz = {1, 2}, respectively, as their unique MIS's. (b) The graph G has an extraneous MIS, namely Se = {1, 2, 3}.
Figure 3.3  (a) The original numerals to be stored as memory vectors. (b) The compatibilized characters. (c) Some distorted numerals. (d) Numerals recalled by MIS-N.
Figure 3.4  The graph indicating the binary vectors [0 1 0 1]^T and [1 0 1 1]^T as its paths between the nodes vd and vs.
Figure 3.5  Block diagram of the proposed associative memory.
Figure 4.1  An algorithmic summary of the overall design method.
Figure 4.2  Block diagram of the extended network.
Figure 4.3  Set of characters which are embedded by the original design method as memory vectors to DHN.
Figure 4.4  Reconstructions obtained by the resulting DHN.
Figure 4.5  Three memory patterns used in the classification application.
Figure 4.6  The input map (a) and the classification result (b).
Figure 5.1  An illustration of csign8(u) for u = −1.2 − 0.5i.
Figure 5.2  Test images used in the image reconstruction example.
Figure 5.3  Images corrupted by 20% salt-and-pepper noise (above) and their reconstructions obtained by the network (below).
Figure 5.4  Images corrupted by 40% salt-and-pepper noise (above) and their reconstructions obtained by the network (below).
Figure 5.5  Images corrupted by 60% salt-and-pepper noise (above) and their reconstructions obtained by the network (below).
Figure 5.6  Filtered images obtained from noisy images with 40% salt-and-pepper noise by the network (above) and by median filtering (below).
Figure 5.7  Lenna images obtained by the networks designed by the generalized Hebb rule and by the proposed method, respectively.
Figure 6.1  A two-layer recurrent network made up of discrete perceptrons.
CHAPTER ONE
INTRODUCTION
Intelligence necessitates the ability to process information, and any information first
needs to be stored in order to be processed. That is why a fundamental property
common to all intelligent systems is that they all employ well-organized, easily accessible
information storage devices. Thanks to these storage units, computers can perform the
high-speed computations that raise our living standards, while animals accumulate the
experience needed to maintain their lives. This work is devoted to the design of information
storage systems that can actually do more than just contain information. We begin by
introducing basic concepts and the terminology that will be used throughout the thesis.
1.1 The Memory Concept
A device that contains information and makes it available upon request is called a memory.
What is definitely expected from such a system is simply to preserve its content until it is
reloaded. A conventional memory contains inherently static information encoded using a
suitable alphabet, which is defined by the constraints imposed by physical implementation
of the system. For example, a RAM device used in digital computers can store only binary
information due to its CMOS structure, hence any information, e.g. a visual or an audio
track, should be encoded as a binary pattern every time before storage. As a consequence,
any information retrieved from this device should be decoded in order to be meaningful for
the user.
A memory performs its typical operation by presenting a single element of its content as
the output, which is implied by the input. Though systems employing a memory unit are
Figure 1.1: Block diagram of a typical address addressable memory operation.
conceptually dynamical, it should be noted that a conventional memory itself is algebraic,
because the information acquired from it is dependent only on the presented input according
to the definition above. This information is then transmitted to other processing units, where
it gains meaning, maintaining the possibly dynamical behavior of the overall system.
According to the way their contents are queried, i.e. the type of excitatory input
applied, memories fall into two categories, namely Address Addressable Memory
(AAM) and Content Addressable Memory (CAM).
1.1.1 Address Addressable Memory
AAM comprises information storage units designed in such a way that each item of the
content occupies a specific location, i.e. a unique address, in a specific storage unit. Any
desired item is requested from AAM by providing its address as the input to the system. A
mechanism to convert the presented address to the location information is involved in this
operation. This peripheral, the so-called address decoder, is usually considered an integral
part of the memory container. A block diagram of a conventional AAM is shown in
Figure 1.1.
A conventional AAM neither interprets the input nor processes its content. Thus, its
operation is not fault-tolerant. In other words, one has to provide the precise and correct
address of the desired item. Most digital computers use localized AAMs, in which each
item is stored in a single memory unit independently of the others. This is the actual reason
why inconsistencies sometimes occur in the execution of algorithms on digital hardware. In
most of these cases, either the information stored in the memory is modified outside the control
of the executed process, or wrong address information is presented to the memory, causing a
different piece of information than the desired one to be processed. If no hardware or software
action is taken against these faults, a software crash ensues.
1.1.2 Content Addressable Memory
Beyond information storage, a memory may be capable of far more sophisticated tasks, such as
processing its input. What we roughly mean by processing here is converting information
into a desired form. Like encoding and decoding, a typical information processing task
widely performed in engineering is filtering out the effect of an error, i.e. noise, that
might have corrupted the original information. This procedure is obviously crucial for
any system that deals with information, hence a noise-filtering ability is a favorable
property for a memory. More than an information container, a CAM is indeed an
information processing system that possesses an error correction capability.
As opposed to a localized organization, the content of a CAM may be distributed over different
locations in its information container, where each location may include information about
several items of the content. Such a system's superiority in preserving its content is due
to this distributed structure, as undesired modifications to an item of the content can be
compensated by its complementary parts stored at other locations.
Besides offering an alternative, distributed organization scheme, CAMs also differ from
AAMs in the way they are excited. Instead of an address, one applies either an item
of the content or a distorted version of it as the input; the system then presents as output the
single item of its content that is most similar to this input. This operation performed by a
CAM is illustrated in Figure 1.2. It will be called auto-association hereinafter and is discussed
in the next section.
Note that it is this auto-associative nature that renders a CAM tolerant of erroneous
inquiries. That is why CAMs are also cited as associative memories, though they constitute
only a subclass of associative memories, namely auto-associative memories.
Figure 1.2: A content addressable memory.
1.2 Associative Memory
An associative memory is a system that stores mappings of specific input representations
to specific output representations. That is to say, it associates two patterns such
that when one is encountered, the other can subsequently be reliably recalled. From this
point of view, an associative memory can be interpreted as a selective filter, which removes
the noise on the memory patterns only.
Depending on the relevance between the domain and the range of the mapping it
performs, an associative memory belongs to one of two categories, namely hetero-
associative and auto-associative. Memories of the former type associate between two
disjoint spaces. AAMs are examples of hetero-associative systems: any input-output pair
(a, m) obtained from the system is an element of A × M, where A is the set of all acceptable
addresses and M is the set including the content, so A ∩ M = ∅. The latter type of system
will be the focus of interest of this thesis; thus, ignoring their prefixes, they will hereinafter
be cited simply as Associative Memories (AMs).
1.2.1 Association in Engineering
Kohonen has pointed out an analogy between an associative memory and the function of
an adaptive filter in (Kohonen, 1988). The filter can be viewed as taking an ordered set of input
signals and transforming them into another set of signals, i.e. the output of the filter. It is the
notion of adaptation, allowing its internal structure to be altered by the transmitted signals,
which introduces the concept of memory to the system. The above-mentioned filtering
behavior is of course valid only for the patterns stored in the associative memory, which
excludes it from the traditional filter concept.
The most significant engineering application of such an information processing system
is the pattern association, which has been known to be the prominent feature of human
memory since Aristotle and used in all models of cognition as the basic operation
(Anderson, 1995).
1.2.2 Formulation of Auto-Association
AM is an information processing system that auto-associates, i.e. performs associations
between two nested pattern sets M and M , where M ⊆ M . The function performed by an
AM is expressed as follows:
f : M → M , f [x] = arg minm∈M
d(x,m) (1.1)
where M stands for the set of memory patterns, i.e. memory set, and d(·, ·) is a pre-
determined metric on M .
Evaluation of (1.1) for a given instance (x, M) is called the nearest neighbor
classification problem, whose solution is a crucial step not only for association but also for
many unsupervised learning procedures, e.g. clustering (Dogan & Guzelis, 2003). Nearest
neighbor classification belongs to a class of optimization problems, namely the NP-complete
problems, which are known to be among the hardest of all problems (Vavasis, 1991). That is
to say, obtaining an exact solution of this set-constrained program turns out to be excessively
time- and memory-consuming when |M| and/or the dimension of the pattern space, i.e. the
length of the memory patterns, increases.
Systems developed to solve the considered problem, i.e. Nearest Neighbor Classifiers
(NNCs) (Duda & Hart, 1973), can actually be considered perfect but expensive AMs.
Traditional NNCs follow an algebraic way of calculating (1.1): having been presented an
instance x, these systems first need to calculate the distances d(x, m) for every
m ∈ M, and then determine the minimum one by mutual comparisons. As implied by the
arg operator in (1.1), they finally reconstruct the pattern m∗ that attains this minimum
distance and output f[x] = m∗.
Though the algorithm given above is easily implementable on digital hardware, when
the cardinality of M is large its execution requires a large amount of computation and
memory resources: |M| distance calculations in the first step, the storage of
|M| distances in the second, and |M| · (|M| − 1)/2 comparisons in the third.
Note also that the last step necessitates the explicit reconstruction of the memory pattern,
so each element of M must be represented as-is, at least at this final step, occupying a
specific location in the system. That is why conventional NNCs must contain localized
information and hence do not benefit from the efficiency of compact representations, e.g.
memory compression.
1.2.3 The Nearest Codeword Problem
Since any finite static information can be encoded as a sequence of binary words of a specific
length, say n, in a distance-preserving fashion (Mano, 1991), an AM operating on the set
M̃ = {0, 1}^n would provide a significant computer-aided solution to the nearest neighbor
classification problem. That is why work on binary AMs is dominant in the related literature,
which is the case for this thesis, too.
This special form of auto-association, where M is given as a binary set in {0, 1}^n and
d(·, ·) in (1.1) is chosen as the Hamming distance (a well-known metric on the binary space),
has been formally defined as the nearest codeword problem (Garey & Johnson, 1979), which
is also NP-complete.
1.2.4 Auto-Associative Memory Design
Design of a system that evaluates (1.1) as its input-output relation is called AM design,
which is the ultimate goal of this thesis work. Nevertheless, since this evaluation is expensive,
as mentioned above, we are mostly interested here in approximating (1.1) by using
compatible models, which are relatively cheaper and faster than NNCs.
Having a system model at hand a priori, the design obviously reduces to the
determination of system parameters. This assumption will be valid for all design procedures
mentioned throughout this material.
Artificial neural networks have been considered as appropriate structures for such
models. This is partly due to their biological interpretation, which is out of the scope of
this work, as well as to their mathematically tractable information processing capabilities,
as discussed next.
1.3 Neural Associative Memories
A frequently-quoted formal definition given in (Hecht-Nielsen, 1990) introduces Artificial
Neural Networks (ANNs) as “parallel and distributed information processing structures
consisting of units, which can possess a local memory and carry out localized information
processing operations...”. According to this description, when designed suitably, ANNs are
supposed to be adequate devices to serve as AMs. Motivated by this reasoning, the information
storage and retrieval capabilities of ANNs have constituted one of the major research
areas in the literature.
1.3.1 Feed-Forward Auto-Associators
The processing units mentioned in the definition above, whose connectionism1 constitutes
an ANN, are called neurons. From a mathematical point of view, each neuron is assumed to
perform a basic two-step algebraic operation: it takes the weighted sum of
its inputs {x_i ∈ ℝ}_{i=1}^{l} and passes it through a nonlinearity φ(·), called the activation
function, to produce its output:

y = φ( Σ_{i=1}^{l} w_i · x_i + t ),   (1.2)

where {w_i ∈ ℝ}_{i=1}^{l} are the (synaptic) weights and t ∈ ℝ is the threshold.
1 This term has found wide usage in the ANN literature instead of connectedness.
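As an illustration, the neuron operation (1.2) can be sketched in a few lines of Python; the hard-limiting activation and the numerical values below are hypothetical choices, not prescribed by the text:

```python
# Illustrative single neuron computing (1.2): weighted sum plus threshold,
# passed through an activation function phi.

def neuron(x, w, t, phi):
    s = sum(wi * xi for wi, xi in zip(w, x)) + t  # weighted sum plus threshold
    return phi(s)

sign = lambda u: 1 if u >= 0 else -1  # a discrete (hard-limiting) activation

y = neuron(x=[1, -1, 1], w=[0.5, 0.2, -0.4], t=0.2, phi=sign)
print(y)  # -> 1
```

Replacing `sign` with a linear or sigmoidal function yields the other neuron types mentioned in this chapter.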
Every neuron in a network is considered to be a member of a layer, i.e. a processing
level, according to its turn of operation in the information flow within the assembly. In
feed-forward networks, each neuron contributes to this flow only in the direction from the
input layer towards the output layer. A network topology that allows connections from
the output of each neuron in a layer to an input of each neuron in the subsequent layer
is called fully-connected.
Note that feed-forward networks do not employ any memory, hence no information is
ever stored in their structures. This is the reason why they are also called algebraic
networks. On the other hand, they have been utilized for auto-association because they
avoid some critical problems, e.g. instability, that may arise in the design of their
dynamical alternatives. The following feed-forward auto-associators are worth mentioning
briefly here in order to emphasize their advantages and, especially, their disadvantages,
which directed research on AMs towards dynamical systems.
1.3.1.1 Optimal Linear Associative Memory
The basic idea behind the Optimal Linear Associative Memory (OLAM) is that a single
layer of n neurons possessing linear activation functions performs linear filtering on an
n-dimensional real signal space, which is indeed the simplest form of auto-association.
The goal of OLAM design is to make the network map each element of a given finite
memory set M ⊂ ℝ^n onto itself instantaneously. Utilizing the outer-product method
(Haykin, 1994), the parameters of the considered network topology are determined as

W = Σ_{∀p ∈ M} p · p^T,   (1.3)

and all thresholds as zero. Here w_ij denotes the real weight coefficient associated with the
connection from the j-th input to the i-th neuron.
Note that the basic design criterion given above neither implies error correction nor
prevents vectors other than the memory vectors from being stored, i.e. associated to
themselves. The first major drawback of OLAM design can also be explained in this way:
it is in general impossible to filter out the noise on an arbitrary signal by using a linear
filter. However, when M consists of orthogonal vectors, it can easily be shown that the
network designed by (1.3) becomes selective only for the memory vectors. As discussed by
Kohonen in (Kohonen, 1977), the very restrictive orthogonality assumption can be relaxed
to a milder one, namely linear independence of the memory vectors, if the weight parameters
are chosen as

W = M · (M^T · M)^{-1} · M^T,   (1.4)

where M denotes the n × |M| real matrix constructed by augmenting the memory vectors as
columns.
OLAM is a simple but definitely primitive AM because of the above-mentioned
restrictions on the memory set. This network has been proven in (Zurada, 1992) to exhibit
maximum performance when designed by (1.4), which also constitutes the background for
another well-known design technique described in the subsequent chapter.
1.3.1.2 The Hamming Network
To perform auto-association on a binary pattern space, the Hamming network proposed in
(Watta & Hassoun, 1991) employs a Hamming-distance calculator block incorporated with
a competitive network known as MAXNET (Suter & Kabrisky, 1992) at the first stage of its
operation. This crucial block contains the encodings of the binary memory patterns, possibly
as Boolean functions, and points out the one nearest to the input pattern by producing a 1
at one of its |M| outputs and a 0 at all others. The memory pattern indicated in this way is
then explicitly reconstructed in the second stage. As a result, the network exactly evaluates
(1.1) and thus is a perfect AM.
Although some alternative implementation techniques have been proposed to simplify
the costly operation of the first block, e.g. (Aksın, 2002), the Hamming network and its
derivatives (Ikeda et al., 2002) are simply NNCs realized on ANNs, and so suffer from the
disadvantages mentioned in the last paragraph of Section 1.2.1.
Auto-association is indeed a more complicated task than a simple nonlinear
transformation of a matrix-vector product. That is why a single-layer feed-forward network
topology is too ineffective to cope with this problem. On the other hand, as demonstrated
by the Hamming network, it takes sophisticated processors, like MAXNET, to perform this
task adequately via an algebraic input-output relation. Though many other attempts, omitted
here, have appeared in the literature to realize (1.1) using feed-forward ANNs,
each of them was subject to the above-mentioned trade-off between the simplicity of the
network topology and the auto-association performance. As a result, when the goal is to
obtain a cheap AM with an auto-association performance comparable to that of an NNC,
the designer should give up algebraicity and, of course, its natural benefits.
1.3.2 Dynamical Auto-Associators
What one can achieve by using a feed-forward ANN is a strict subset of the capabilities
of the same network with feedback provided from the output to the input, namely its
dynamical counterpart. Once designed properly, with possibly more effort than the design
of feed-forward ANNs, dynamical ANNs offer a topological complexity comparable to that
of the simple feed-forward ones while improving the auto-association performance
substantially. Moreover, their implementation schemes employing analog circuit elements,
e.g. (Ghosh et al., 1994), make them a much cheaper and faster alternative to the NNC,
which is better suited to software implementation on digital hardware.
Dynamical ANN models to be designed as AMs are usually considered as autonomous,
hence their design procedures aim to perform (1.1) not as an input-output relation but as a
mapping from the initial state vector to the steady-state solution. The trajectory of the state
vector produced along the operation is interpreted as an error correction, which does not
require any comparisons between the distorted pattern and the memory vectors. Moreover,
all memory vectors are encoded as system parameters and this allows for a distributed
representation, even data compression.
In the rest of this material, we focus on the design of such dynamical ANNs as AMs.
To put the original design methods described in the subsequent chapters in context, the
history of dynamical auto-associators is briefly sketched below and continued in Section 2.3.
1.3.2.1 Brain-State-in-a-Box Model
The auto-association performance of dynamical systems was first exploited by a
biologically-inspired work (Ritz et al., 1977) in 1977. The dynamical ANN proposed
therein was called the brain-state-in-a-box model and was made up of n coupled neurons
with piecewise-linear activation functions, which confined all solutions of the dynamical
system within the unit hypercube [0, 1]^n. Though it constitutes the first existence proof
of AMs realized on dynamical ANNs, this network lacked a design procedure, which is
probably why dynamical ANNs could not attract any significant attention until the 1980s
(Golden, 1986).
1.3.2.2 The Hopfield Era
The pioneering work by Hopfield (Hopfield, 1982) aimed to demonstrate the collective
computational capabilities of very simple algebraic processing units when properly
connected allowing for feedback. The problem chosen as an instance of collective operation
was exactly auto-association, hence this approach initiated the usage of dynamical ANNs
as AMs. This single work has been cited more than 4000 times over two decades by research
articles within the Science Citation Index.
The proposed network, the so-called Discrete Hopfield Network (DHN), consists of a
single layer of n coupled (bi-state) neurons with hard-limiter type activation functions².
The dynamics of the network is therefore constrained to the finite state-space {−1, 1}^n,
i.e. the vertices of the unit hypercube. Many design methods for DHN to perform (1.1),
including many modifications of the original model, have since been proposed. A formal
analysis of DHN, together with a comprehensive discussion of its capabilities, constitutes
the subject of the following chapter.
1.3.2.3 M-Model
A Hopfield-like dynamical network with a sigmoidal or a piecewise-linear nonlinearity
has been considered an effective binary AM since the qualitative result of Michel et
² Such neurons had been introduced as discrete perceptrons in (Rosenblatt, 1962) and have been utilized to implement dichotomies, which constitutes another major issue in the ANN literature.
al. (Michel et al., 1991), and handled as a different model, called the M-Model. Though
the analytical design procedure proposed for this model, called the eigen-structure method,
is superior in many aspects to some well-known methods for the conventional DHN, the
M-Model's infinite state-space is its major drawback in realizing (1.1), whose domain and
range are both finite. The model and its design procedure are explained in Section 2.3.3,
and it is demonstrated by simulation results in subsequent chapters that there indeed exist
better methods for the conventional DHN than the eigen-structure method.
1.4 Organization of the Thesis
The goal of this work is to derive efficient design techniques, primarily for the conventional
discrete Hopfield topology, to obtain an auto-associative memory while ensuring perfect
storage. The first two chapters of this material are devoted to the introduction of the memory
concept and the formulation of the associative memory design problem, respectively. A
critical review of some well-known design procedures is given in Chapter 2.
A binary associative memory design procedure that gives a discrete Hopfield network
with a symmetric binary weight matrix is introduced in the first part of Chapter 3.
The method, which was first proposed in (Muezzinoglu, 2000) and then extended in
(Muezzinoglu & Guzelis, 2001), is based on introducing the memory vectors as maximal
independent sets to an undirected graph, which is constructed by Boolean operations
analogous to the conventional Hebb rule. The parameters of the resulting network are then
determined via the adjacency matrix of this graph in order to find a maximal independent
set whose characteristic vector is close to the given distorted vector. The applicability of
the design method is finally investigated by a quantitative analysis, which was also given in
(Muezzinoglu & Guzelis, 2003a). The theoretical results presented therein are valuable as
they prove that, whenever the given memory vectors are correlated in the sense of being
compatible, a discrete Hopfield network with only binary parameters can be designed i) free
from spurious memories, ii) ensuring perfect storage, and iii) with high storage capacity.
The graph-theoretical concept of compatibility is introduced and a quantitative analysis is
conducted to show how restrictive this property really is.
Another graph theoretical design method for binary recurrent associative memory to
retrieve binary memory vectors from their distorted versions is introduced in the second
part of Chapter 3. The method, which was reported in (Muezzinoglu & Guzelis, 2002),
is based on the observation that an undirected graph of n + 2 nodes represents some n-
dimensional vectors as paths between a specific pair (s, d) of nodes, such that any edge of a
path indicates two successive entries of value 1 in the represented vector. We construct a
graph as a union of paths generated for a memory vector set. A memory vector embedded
into the graph in this way is recalled by using a discrete Hopfield network whose trajectory
begins at a distorted vector as the initial condition. Finally the network converges to a
binary vector indicating one of the nearest paths in this graph. As a result, the original
memory vector is reconstructed from the arc-based representation of this path. Quantitative
analysis supported by simulations shows that the method is superior to many recurrent
associative memory design methods in the following aspects: i) an arbitrary memory set can
be embedded as attractive fixed points of the resulting associative memory; ii) the number of
extraneous fixed points is in general greater than that caused by the conventional outer-
product method, but these unavoidable extraneous fixed points are all close to the memory
vectors in the resulting network, thus they cause relatively small errors in recall.
An energy function-based auto-associative memory design method to store a given
unipolar binary memory vector set as attractive fixed points of an asynchronous discrete
Hopfield network is presented in Chapter 4. The discrete quadratic energy function, whose
local minima correspond to the attractive fixed points of the network, is constructed by
solving a system of linear inequalities derived from the strict local minimality conditions.
This idea was introduced in (Muezzinoglu et al., 2003a). The parameters (weights and the
thresholds) of the network are then calculated using this energy function. If the inequality
system is infeasible, it can be concluded that no such asynchronous discrete Hopfield
network exists. In this case, we extend the method to design a discrete piecewise quadratic
energy function, which can be minimized by a generalized version of the conventional
discrete Hopfield network, also proposed therein. In spite of its computational complexity,
it is verified by computer simulations that the original method performs better than
the conventional design methods, in the sense that the memory can store, and provide
attractiveness for, almost all memory sets whose cardinality is less than or equal to
the dimension of their elements. A convincing character recognition application presented
in (Muezzinoglu et al., 2003b) is also included. The complete method, together with its
extension, guarantees the storage of an arbitrary collection of memory vectors that are
mutually at least two Hamming distances apart. The derivation of this method sheds light
on the achievable upper bound on the performance of a conventional discrete Hopfield
network in association tasks.
Motivated by the results derived in Chapter 4, a method to store each element of an
integral memory set M ⊂ {1, 2, . . . , K}^n as a fixed point of a complex-valued multi-state
Hopfield network is introduced in Chapter 5. This method, which was originally proposed
in (Muezzinoglu et al., 2003c), also employs a set of inequalities to render each memory
pattern a strict local minimum of a quadratic energy landscape. Based on the solution of this
system, it gives a recurrent network of n multi-state neurons with complex and Hermitian
synaptic weights, which operates on the finite state-space {1, 2, . . . , K}^n to minimize this
quadratic function. The maximum number of integral vectors that can be embedded into the
energy landscape of the network by this method is investigated by computer experiments.
Chapter 5 also presents an application of the proposed method to the reconstruction of noisy
gray-scale images, as was done in (Muezzinoglu et al., 2003d).
Finally, to achieve perfect storage beyond the capability of the conventional discrete
Hopfield network, a novel design procedure to embed binary memory vectors as attractive
fixed points into a recurrent multi-layer neural network is presented in Chapter 6. It
is first shown that an additional layer on top of the conventional Hopfield model is
necessary for providing perfect storage of some memory sets. The link between the number
of hidden-layer neurons and the association performance is then explored. In the proposed
design procedure, which was originally reported in (Muezzinoglu & Guzelis, 2003b)
and (Muezzinoglu & Guzelis, 2003c), we make use of the well-known back-propagation
learning algorithm to provide attractiveness for each memory vector. A pruning technique
is then employed to minimize the number of hidden-layer neurons, so that the network
topology is simplified. The performance of the proposed method is investigated by extensive
computer analysis.
CHAPTER TWO
DISCRETE HOPFIELD ASSOCIATIVE MEMORY
This chapter introduces the DHN, which is considered the fundamental model in the
design strategies developed in the subsequent chapters. The recurrent AM design
criteria are posed next, irrespective of the network model used in the design. Some major
procedures that have been previously suggested to make the DHN perform as an AM are
also mentioned herein, since they will constitute the references for evaluating the
proposed methods.
2.1 Discrete Hopfield Network Topology
The conventional fully-connected DHN topology (Hopfield, 1982), as illustrated in
Figure 2.1, consists of a single layer of n bi-state discrete perceptrons, each of which
takes a weighted sum of the delayed output values and then passes it through the signum
nonlinearity sgn(·) to produce the next output. The connection weight from the state
(output) of the i-th neuron to an input of the j-th one is a real number denoted by wji, and
each neuron possesses a real threshold (or bias) ti.
2.1.1 Operation Modes of Network
There are two operation modes defined for this discrete-time system, namely synchronous
and asynchronous modes. In synchronous mode, the system performs the recursion
x[k + 1] = sgn (W · x[k] + t) , (2.1)
Figure 2.1: Conventional discrete Hopfield network model.
where x[k] stands for the network state (output values of the neurons) at time instant k.
Alternatively, allowing the update of only one element, say the i-th one, of the state vector
at each iteration according to
xi[k + 1] = sgn ( Σ_{j=1}^{n} wij · xj[k] + ti )    (2.2)
prescribes the asynchronous operation mode of the same network.
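Both recursions can be sketched directly in NumPy. The weight matrix below is a hypothetical example (a zero-diagonal outer product storing a single pattern), and sgn(0) is fixed to +1 by convention, an assumption that the model leaves open:

```python
import numpy as np

def sgn(v):
    # Hard-limiter of the discrete perceptron; sgn(0) is taken as +1 here
    # (the zero case must be fixed by some convention).
    return np.where(v >= 0, 1, -1)

def sync_step(x, W, t):
    """One synchronous iteration (2.1): all neurons update at once."""
    return sgn(W @ x + t)

def async_step(x, W, t, i):
    """One asynchronous iteration (2.2): only neuron i updates."""
    x = x.copy()
    x[i] = sgn(W[i] @ x + t[i])
    return x

# Tiny hypothetical example: weights chosen so that p = (1, -1, 1) is a
# fixed point (zero-diagonal outer product, zero thresholds).
p = np.array([1, -1, 1])
W = np.outer(p, p) - np.eye(3, dtype=int)
t = np.zeros(3)

assert np.array_equal(sync_step(p, W, t), p)          # fixed under (2.1)
assert all(np.array_equal(async_step(p, W, t, i), p)  # and under (2.2), any i
           for i in range(3))
```

The final assertions illustrate the remark below: a fixed point of one mode is a fixed point of the other, since both recursions test the same per-neuron condition.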
Due to the nonlinearity sgn(·), the state-space of the recurrent network in either
operation mode is the bipolar binary space {−1, 1}^n, i.e. the 2^n vertices of the unit
hypercube¹. It is also obvious from the two recursions that the fixed points of the network
are invariant under the operation mode. Note, however, that as only one neuron is allowed
to change its state in asynchronous mode, for all initial conditions the state vector of the
network necessarily follows a path passing through adjacent vertices.
The update order applied in asynchronous mode may affect the trajectory of the DHN. In
other words, when two different update orders are applied to two identical asynchronous
DHNs with the same initial conditions, the sequences of binary vectors produced along the
two recursions are in general different. For natural reasons, generally no update order is
specified for an asynchronous DHN; which neuron to update is instead decided randomly at each time
¹ Due to the change of variables xu = (xb + e)/2, where e is the vector with all entries equal to 1, for every DHN operating on the bipolar binary space with state vector xb, one can obtain a network with the same properties but operating on the unipolar space {0, 1}^n with state vector xu. The converse is also true, as this transformation is bijective, i.e. xb = 2 · xu − e.
Figure 2.2: Implementation of the analog Hopfield neural network model.
instant k, but complying with the condition that all neurons should be updated within each
time interval n · l ≤ k < n · (l + 1), l = 0, 1, 2, . . .. This scheme is called the random
update. Due to the randomness involved in their operation, such asynchronous DHNs are
considered stochastic systems despite their deterministic physical parameters.
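A random update schedule complying with the condition above can be sketched by drawing a fresh random permutation of the neuron indices for each block of n time instants; the implementation below is an illustrative assumption, not a construction taken from the text:

```python
import numpy as np

def random_update_order(n, sweeps, rng):
    """Random asynchronous schedule: within each block of n consecutive
    time instants every neuron is updated exactly once, in random order."""
    return [i for _ in range(sweeps) for i in rng.permutation(n)]

rng = np.random.default_rng(0)
order = random_update_order(4, 3, rng)

# Every length-n block of the schedule covers all n neurons.
for l in range(3):
    assert sorted(order[4 * l: 4 * (l + 1)]) == [0, 1, 2, 3]
```

Sampling a permutation per sweep is one simple way to satisfy the all-neurons-per-interval condition; drawing neurons independently at every instant would not guarantee it.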
2.1.2 Implementation Notes
What makes the usage of DHN preferable in complicated tasks, such as auto-association
or optimization, is definitely its simplified implementation schemes employing basic circuit
elements instead of complicated digital processors. Though this issue is beyond the scope of
this work, which deals with AM design from the point of view given in Section 1.2.4 only,
a straightforward model proposed in (Michel & Liu, 2002) is included here in Figure 2.2 to
support the argument that Hopfield networks are much cheaper alternatives to NNCs and,
consequently, to other ANNs that resemble NNCs.
The circuit made up of the neurons given in Figure 2.2, with infinite-gain operational
amplifiers, performs the synchronous recursion (2.1) by making use of only n operational
amplifiers, n linear resistors, and n linear capacitors. Each operational amplifier in the
circuit acts as a weighted summer of the state variables xi[k], and its infinite open-loop gain
realizes the signum nonlinearity sgn(·). Choosing R = 1Ω, the real parameters wij of the
recursion become equal to Rij.
To realize the asynchronous operation mode, it is sufficient to incorporate an external
device into the given model to control the delay blocks, i.e. the latches denoted by D
in the figure, such that no two of them change their outputs at exactly the same time, if all
elements of the circuit are considered ideal. In practice, such unsystematic delays occur
anyway, without any need to take action to ensure them. Hence, the asynchronous DHN
with random update can indeed be considered a more realistic model developed to exhibit
the effects of non-ideal switching of these delay elements in synchronous mode. It should
be noted, however, that this is not our (and many other researchers') motivation for studying
the asynchronous DHN: we focus on this system not because a synchronous DHN can never
be implemented in reality, but because it ensures a vital AM design criterion described
below.
2.2 Recurrent Associative Memory Design Criteria
Autonomous recurrent networks, especially DHNs, are utilized in AM design to realize
the association function (1.1) as a map from their initial states to their fixed points. To be
precise, whenever a distorted pattern d is presented to a recurrent AM by injecting it as the
initial value of the state vector x[0], the evaluation of the association function should be
produced by the system as the steady-state solution: x[∞] = f(d). Since the network is
expected to perform this for every d ∈ M , it should be designed such that its state-space
includes the pattern space M. Whether this fundamental consideration is satisfied is of
course a matter of the choice of activation functions of the neurons in the output layer. For
example, the DHN automatically satisfies it for binary association, as its state-space is the
entire binary space, which is equal to M in the nearest code word problem (cf. Section 1.2.3). However,
in the design of DHNs to operate as AMs on other pattern spaces, the criterion mentioned
above should constitute the first concern of the designer. Further design considerations are
grouped into two categories as follows.
2.2.1 Criteria for Memory Representation
Since the steady-state solutions of the recurrent network represent the range M of the
association function, which consists of constant values only in the considered case of auto-
association, the network should not exhibit any kind of behavior other than convergence
towards a fixed point. Such dynamical systems are defined in (Hirsch & Smale, 1974) as
convergent. Another design condition then follows, namely convergence:
Condition 1 Each trajectory of the system should tend to a fixed point, i.e. the system
should be convergent.
To represent all static memory vectors properly, each fixed point of the network should be
matched by the designer to an element of M, which is a given collection of n-vectors in the
auto-association problem. Hence, recurrent AM design can be viewed as the design of a
dynamical system with prescribed fixed points.
Condition 2 The set of fixed points of the system should contain the given set of memory
vectors.
Moreover, this correspondence should be provided in a one-to-one fashion, because any
fixed point not contained in M would otherwise represent an undesired memory, namely a
spurious memory, in the state-space of the system.
Condition 3 The set of fixed points should contain no element other than the given memory
vectors, preventing spurious memories.
2.2.2 Criteria for Error Correction
Conditions 1 and 2 are necessary and sufficient to represent the given memory set as fixed
points of the considered recursion; thus a recurrent network satisfying them recalls each
memory successfully when initiated with the memory pattern itself. However, one cannot
yet expect such a network to perform error correction without any conditions on its
recurrence.
Error correction, or memory retrieval, has been defined as the evolution of the state vector
from a distorted pattern, which is injected as the initial state vector, towards a memory
pattern, which is a specific fixed point x∗ of the recurrence. Consequently, if this fixed point
can be introduced to the network such that other points in the state-space which are located
around x∗ are mapped to x∗ along the recurrence, then the network gains error-correction
capability, at least for the distorted patterns within this neighborhood. For binary recurrent
AM, this condition is expressed as:
Condition 4 Any fixed point x∗ of the system is attractive in the sense that, for any point x
within the 1-Hamming-distance neighborhood of x∗, there exists a trajectory which starts at
x and tends to this fixed point x∗.
Though the above definition of attractiveness becomes equivalent to asymptotic
stability in the sense of Lyapunov (Vidyasagar, 1993) for deterministic and uniquely-
solvable finite-state systems, it is necessary here to extend the stability concept to
stochastic processes, such as the asynchronous DHN with random update.
Finally, to satisfy the nearest-neighbor consideration imposed by (1.1), the network
should correct errors such that any point in the state-space is mapped to the nearest
fixed point. It can easily be shown that this claim is equivalent to the following final
condition.
Condition 5 The radii of attraction basins of attractive fixed points are almost equal.
Hence, these basins share the state space in an equal way.
Herein, attraction basins are defined based on the definition of attractiveness in Condition
4: A point x is in the attraction basin of a fixed point x∗ if there is a trajectory starting at x
and ending at x∗.
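For a DHN, Condition 4 can be checked directly on a given fixed point by testing whether an asynchronous update of the single disagreeing neuron maps each 1-Hamming neighbor back to the fixed point; one such update is itself a trajectory, so this is a sufficient check. The sketch below applies it to a hypothetical network with outer-product weights:

```python
import numpy as np

def sgn(v):
    # sgn(0) taken as +1 by convention.
    return 1 if v >= 0 else -1

def is_fixed_point(x, W, t):
    return all(sgn(W[i] @ x + t[i]) == x[i] for i in range(len(x)))

def is_attractive(xs, W, t):
    """Condition 4, checked directly: every 1-Hamming neighbor of the fixed
    point xs admits a trajectory into xs -- here, a single asynchronous
    update of the disagreeing neuron."""
    n = len(xs)
    for j in range(n):
        x = xs.copy()
        x[j] = -x[j]                       # neighbor differing in bit j
        if sgn(W[j] @ x + t[j]) != xs[j]:  # updating bit j must restore xs
            return False
    return True

# Hypothetical 4-neuron example storing p via a zero-diagonal outer product.
p = np.array([1, -1, 1, -1])
W = np.outer(p, p) - np.eye(4, dtype=int)
t = np.zeros(4)

assert is_fixed_point(p, W, t)
assert is_attractive(p, W, t)
```

Estimating the basin radii of Condition 5 would require following trajectories from vertices farther away, which grows with 2^n; the 1-neighbor test above is the part that stays cheap.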
2.2.3 Ideal Recurrent Associative Memory
A procedure for a specific network model that gives a recurrent AM satisfying all design
conditions for an arbitrary collection of memory vectors is called an ideal AM design
method. Such a method has not yet appeared in the literature for DHN, nor for any other
finite-state recurrent network model. Actually, an ideal design method can never be devised
for DHN, as proven in Chapter 4.
In representing the memory set, most procedures fail to satisfy Condition 3, since any
attempt to avoid spurious memories usually necessitates the handling of whole state-space
of the network (Athithan & Dasgupta, 1997), which is huge for pattern spaces of high
dimensions². Almost all existing methods suffer from the same problem also in satisfying
Condition 5. Therefore, methods satisfying the rest of the conditions are considered the
successful ones, which are still rare in the literature.
Definition 1 A design method that gives a recurrent network satisfying Conditions 1, 2, and
4 for an arbitrary collection of memory vectors is said to provide perfect storage.
The expression “an arbitrary collection” naturally includes memory sets of large
cardinalities. As will be illustrated, handling such sets is relatively harder for any design
method than storing small memory sets as fixed points. In other words, storing a set of
uncorrelated memory vectors becomes a more difficult task for any design method as the
cardinality increases. This is why most design methods are investigated quantitatively
to give an upper bound on the number of memory vectors that can be successfully stored.
The method in question is then considered to work properly for memory sets under this
limit.
Definition 2 The maximum number of memory vectors that can be stored as fixed points to
a recurrent AM by a design method is called the method’s memory capacity.
² Even determining the exact locations of spurious memories in a designed network has been reported as an NP-complete problem in (Bruck & Roychowdhury, 1990).
2.3 Milestones of Recurrent Associative Memory Design
As proven in (Bruck & Goodman, 1988), and in the next chapter for the unipolar binary
case, symmetry of the weight matrix is one of the two sufficient conditions for the
convergence of an asynchronous DHN. Symmetry is also advantageous in the sense that
such a weight matrix is characterized by its diagonal and upper-triangular part alone, i.e.
by (n² + n)/2 real values. Noting that some efficient but computationally costly methods,
such as (Sudharsanan & Sundareshan, 1991) and (Sompolinsky & Kanter, 1986), have been
proposed for non-symmetric DHNs, we restrict ourselves from this point on to symmetric
DHNs.
This section presents four major recurrent AM design methods in chronological order
that have been proposed for symmetric DHNs.
2.3.1 Outer-Product Method
Inspired by the Hebb rule (1.3) used in the design of OLAM, the first recurrent AM design
tool for DHN was proposed in (Hopfield, 1982). The outer-product method is very easy-to-
apply, but gives a rather primitive AM as explained below.
Given a memory set M ⊆ {−1, 1}^n, the integer-valued weight matrix of the
network is determined by

W = Σ_{p∈M} p · p^T − |M| · I,    (2.3)

where I denotes the n × n identity matrix, and the threshold of each neuron is chosen as zero.
It has been shown in (Bruck & Goodman, 1988) that the outer-product method ensures
the convergence of the resulting network. However, there is no guarantee that each given
memory vector will be mapped to a fixed point. Moreover, nothing can be said about
the attractiveness of the fixed points of the network. As a result, the outer-product method
cannot provide perfect storage. The reason can be explained by an energy-function approach:
As will be shown in Chapter 4, each DHN whose weight matrix W is symmetric
with zero diagonal entries locally minimizes the discrete quadratic

E(x) = −x^T · W · x    (2.4)

defined on the state-space {−1, 1}^n. In other words, the state vector necessarily tends to a
discrete local minimum of (2.4). However, the considered method in general does not map
M to the local minima of (2.4), since the local minima of the quadratic form

E(x) = −x^T · ( Σ_{p∈M} p · p^T ) · x + |M| · x^T · x    (2.5)

are in general different from the elements of M. This also explains why the method provides
perfect storage in the trivial case |M| = 1, because the single local minimum of (2.5)
is indeed the single memory vector in this case.
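The energy-descent property invoked here can be illustrated numerically: for a symmetric, zero-diagonal W built by the outer-product rule (2.3), the discrete quadratic (2.4) never increases along asynchronous updates. The memory set below is randomly generated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 8, 3
# Hypothetical memory set: m random bipolar n-vectors as rows.
P = rng.choice([-1, 1], size=(m, n))

# Outer-product rule (2.3): symmetric with zero diagonal.
W = P.T @ P - m * np.eye(n, dtype=int)

def energy(x):
    # Discrete quadratic (2.4) with zero thresholds.
    return -x @ W @ x

# Follow the asynchronous recursion from a random initial state and check
# that (2.4) never increases along the trajectory.
x = rng.choice([-1, 1], size=n)
for k in range(10 * n):
    i = rng.integers(n)
    e_before = energy(x)
    x[i] = 1 if W[i] @ x >= 0 else -1   # asynchronous update, zero threshold
    assert energy(x) <= e_before
```

The monotone descent holds for any symmetric zero-diagonal W, which is exactly why the trajectory must end in a local minimum of (2.4), whether or not that minimum coincides with a stored memory.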
Another negative outcome of the method is that spurious memories necessarily occur
in the resulting network, though some of these undesired points can easily be identified as
negatives of the stored memory patterns³.
Fact 1 If p is a fixed point of a DHN recursion with zero thresholds, then −p is also a fixed
point.
Finally, the method is also ineffective from a quantitative aspect. A theoretical result
presented in (Dembo, 1989) states that, given uncorrelated n-dimensional memory vectors,
the outer-product method is able to store only up to 0.138 · n of these points as
fixed points of the resulting DHN, without ensuring attractiveness, as will be verified
experimentally in Chapter 3.
Despite its above-mentioned shortcomings, the outer-product method initiated a major
research field for ANNs, and hence is valuable as the pioneering attempt to design AMs on
DHNs.
³ Such spurious memories will be introduced in Chapter 5 as trivial ones.
2.3.2 Projection Learning Rule
Utilizing pseudo-inverse techniques, Personnaz et al. developed a method that guarantees
Conditions 1 and 2 (Personnaz et al., 1986). The method, the so-called projection learning
rule, was originally introduced for the synchronous DHN, but, of course, works for the
asynchronous DHN, too.
The basic idea behind the procedure is that the storage of memory vectors would be
successfully accomplished if the equality
W · M = M (2.6)
holds, when the thresholds are chosen as zero. Here M is the matrix form of the given
memory set M ⊆ {−1, 1}^n. The minimum-norm solution to (2.6) is given by
W = M · M+, (2.7)
where M^+ denotes the Moore-Penrose pseudo-inverse of the binary matrix M. Note that,
when the memory vectors are linearly independent, one obtains the expression

W = M · (M^T · M)^{-1} · M^T    (2.8)

which is of the projection-matrix form, and so maps any vector onto a memory vector in a
single iteration of the synchronous DHN recursion.
The projection learning rule does not guarantee the attractiveness condition unless the
memory vectors are mutually orthogonal. However, it at least exploits a correlation, namely
orthogonality, of the given memory set to provide perfect storage.
Under the orthogonality assumption, the conditional memory capacity provided by the
method is n, which is relatively high compared to that of the outer-product method.
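The rule (2.7) amounts to a single pseudo-inverse call in NumPy. The memory matrix below is a hypothetical example with linearly independent columns, for which (2.7) and (2.8) coincide:

```python
import numpy as np

# Hypothetical memory matrix M: bipolar memory vectors as columns (n x |M|),
# chosen to be linearly independent.
M = np.array([[ 1, -1],
              [ 1,  1],
              [-1,  1],
              [ 1,  1]], dtype=float)

# Projection learning rule (2.7): minimum-norm solution of W M = M.
W = M @ np.linalg.pinv(M)

# The storage equation (2.6) holds, so each memory is a fixed point of the
# synchronous recursion with zero thresholds (sgn leaves W p = p unchanged,
# as the entries of p are already +/-1).
assert np.allclose(W @ M, M)

# With linearly independent memories, W is exactly the projector (2.8).
W_proj = M @ np.linalg.inv(M.T @ M) @ M.T
assert np.allclose(W, W_proj)
```

The pseudo-inverse form (2.7) is the more general one: it remains defined even when the memories are linearly dependent, where the explicit inverse in (2.8) does not exist.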
2.3.3 Eigen-Structure Method
One of the well-known works on recurrent AM design was proposed by Michel et al. in
(Michel et al., 1989). The eigen-structure method in this work was originally proposed
for a modified DHN, called the M-Model, where the signum nonlinearities are replaced
with piecewise-linear (or saturated linear) functions. The system thus operates within the
unit hypercube [−1, 1]^n in discrete time. A detailed analysis of the considered network
model and its applications can be found in (Michel & Farrell, 1989). The method was later
extended to continuous-time networks in (Li et al., 1989).
Given a memory set M = {p^1, . . . , p^{|M|}} ⊆ {−1, 1}^n, the eigen-structure method
consists of the following steps:
1. Choose a memory vector p^r and compute the n × (|M| − 1) matrix

Y = [p^1 − p^r · · · p^{r−1} − p^r  p^{r+1} − p^r · · · p^{|M|} − p^r].    (2.9)

2. Calculate the singular value decomposition of Y to obtain the matrices U, V, and Σ
such that Y = U · Σ · V^T. Let

Y = [y^1 · · · y^{|M|−1}],
U = [u^1 · · · u^n],
l = dimension of Span{y^1, . . . , y^{|M|−1}}.    (2.10)

3. Compute

W^+ = Σ_{i=1}^{l} u^i · (u^i)^T,
W^− = Σ_{i=l+1}^{n} u^i · (u^i)^T.    (2.11)

4. Choose a positive number τ and compute the weight matrix and the threshold vector
of the network as

W = W^+ − τ · W^−  and  t = p^r − W · p^r.    (2.12)
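The four steps can be sketched with NumPy's SVD as follows. The memory set and the value of τ are hypothetical, and the rank l is estimated by thresholding the singular values:

```python
import numpy as np

def eigenstructure(P, tau):
    """Eigen-structure design steps 1-4, sketched with numpy's SVD.
    P holds the memory vectors as columns; tau > 0 is the design parameter."""
    n, m = P.shape
    pr = P[:, 0]                      # step 1: reference memory p^r
    Y = P[:, 1:] - pr[:, None]        # difference vectors p^i - p^r
    U, s, Vt = np.linalg.svd(Y)       # step 2: Y = U diag(s) V^T
    l = np.sum(s > 1e-10)             # l = dim Span{y^1, ..., y^{|M|-1}}
    Wp = U[:, :l] @ U[:, :l].T        # step 3: W+ projects onto Span{y^i}
    Wm = U[:, l:] @ U[:, l:].T        #         W- projects onto its complement
    W = Wp - tau * Wm                 # step 4
    t = pr - W @ pr
    return W, t

# Hypothetical memory set: columns of P.
P = np.array([[ 1,  1, -1],
              [ 1, -1,  1],
              [-1,  1,  1],
              [ 1,  1,  1]], dtype=float)
W, t = eigenstructure(P, tau=0.5)

# Each memory satisfies W p + t = p exactly, hence is a fixed point.
assert np.allclose(W @ P + t[:, None], P)
```

The final check follows from the construction: for any memory p, the difference p − p^r lies in the span projected by W^+, so W·p + t = (p − p^r) + p^r = p regardless of τ.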
Some important properties of this procedure are as follows:
• The matrices W^+ and W^− depend only on the given memory set; they are independent
of the choice of p^r in the first step of the procedure.
• Each memory vector is a fixed point of the resulting network.
• For sufficiently small τ > 0, each memory vector is stored as an asymptotically stable
equilibrium of the network.
The most positive property of the method is that, without any restriction on the given
memory set M, the eigen-structure method is capable of storing each memory vector as
an asymptotically stable equilibrium of the resulting network. That is, the domain of
attraction of each fixed point is ensured to be a nonempty bounded set in ℝ^n. However,
this does not imply that each fixed point is attractive on the binary pattern space {−1, 1}^n.
Consequently, perfect storage is not guaranteed by the eigen-structure method.
The maximum number of memory vectors that can be stored by the method is 2^n, which
is the maximum capacity that any design method can achieve.
The results listed above show that the performance of the system is closely related to
the selection of the parameter τ. For increasing values of τ, up to a critical value above
which Condition 2 fails, a decrease in the number of spurious states in the network has
been observed by the authors. However, the occurrence of spurious states in the network
is not totally prevented in general, so we say that the method does not satisfy Condition 3.
2.3.4 Linear Inequality Systems to Store Fixed Points
Another effective recurrent AM design method was proposed in (Tan et al., 1991), treating
each neuron of a DHN separately and formulating the design considerations as a system
of linear inequalities. With this approach, described below, the design reduces to a linear
feasibility problem that can be solved by various strategies.
First ignoring the overall dynamical behavior of the system, each discrete perceptron in
the network can be considered to implement a dichotomy, producing the output

y = 1 if Σ_i wi · xi + t > 0, and y = −1 otherwise,    (2.13)
where x is the input vector applied. Then, a fixed point x* of the DHN recursion is identified
as a bipolar binary point satisfying the inequality system

x*_1 · ( w_1^T · x* + t_1 ) > 0
⋮
x*_n · ( w_n^T · x* + t_n ) > 0   (2.14)

where w_i^T denotes the i-th row of the weight matrix.
When this system of n inequalities is imposed for all p ∈ M, a solution (W, t) to the
overall system constitutes a set of desired coefficients ensuring Condition 2.
However, the above conditions are derived only for fixed points, so the network may
exhibit undesired behaviors during its recursion. In addition to the resulting n · |M| linear
inequalities, the symmetry condition is therefore imposed to ensure Condition 1:

w_ij = w_ji   ∀i, j ∈ {1, 2, . . . , n}.   (2.15)
Solving a system of linear inequalities can be formulated as an optimization problem,
called the linear feasibility problem (Mangasarian, 1994):

min_{w ∈ ℝ^d} c
s.t. A · w + b ≤ 0   (2.16)

where c is an arbitrary constant. Two effective tools to solve this problem are the well-known
simplex method (Bertsekas, 1995) and a useful learning algorithm proposed for discrete
perceptrons, the so-called discrete perceptron learning algorithm (Rosenblatt, 1962). One
should note, however, that a solution to the formulated problem might not exist, which means
that there exists no DHN that accepts all memory vectors as its fixed points.
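As a concrete illustration, the feasibility problem (2.16) assembled from the inequalities (2.14) and the symmetry condition (2.15) can be sketched in code. This is only an illustrative sketch: the helper name `store_fixed_points`, the margin `eps` (standing in for the strict inequality), and the variable bounds are our choices, and SciPy's `linprog` is used as the solver rather than the simplex or perceptron algorithms cited above.

```python
import numpy as np
from scipy.optimize import linprog

def store_fixed_points(memories, eps=1.0):
    """Find a symmetric W and thresholds t with x_i (w_i . x + t_i) >= eps
    for every bipolar memory x, as in (2.14)-(2.15).
    Returns (W, t), or None when the linear system is infeasible."""
    n = len(memories[0])
    # symmetry (2.15) is enforced by parameterizing only the upper triangle
    idx = {(i, j): k for k, (i, j) in
           enumerate((i, j) for i in range(n) for j in range(i, n))}
    nw = len(idx)
    A_ub, b_ub = [], []
    for x in memories:
        for i in range(n):
            row = np.zeros(nw + n)
            for j in range(n):
                row[idx[(min(i, j), max(i, j))]] += x[i] * x[j]  # coeff of w_ij
            row[nw + i] = x[i]                                   # coeff of t_i
            A_ub.append(-row)     # x_i (w_i . x + t_i) >= eps  ->  -row <= -eps
            b_ub.append(-eps)
    res = linprog(np.zeros(nw + n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(-10, 10)] * (nw + n), method="highs")
    if not res.success:
        return None
    W = np.zeros((n, n))
    for (i, j), k in idx.items():
        W[i, j] = W[j, i] = res.x[k]
    return W, res.x[nw:]
```

A returned `None` corresponds exactly to the infeasible case discussed above: no DHN accepts all given vectors as fixed points.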
The proposed method is superior to any other method in storing fixed points, but it does
not take the attractiveness condition into account and thus does not guarantee perfect storage.
On the other hand, it may be extended to render each fixed point attractive by
appending additional linear inequalities to the constraints. Beyond fixed points, even
desired trajectories can be embedded into the DHN recurrence in this way. In relation to
the discrete-perceptron-based design, a new method following a similar but indirect way in
recurrent AM design will be described in Chapter 4, further ensuring attractiveness and
also accounting for the memory capacity.
CHAPTER THREE
TWO GRAPH THEORETICAL DESIGN METHODS FOR
RECURRENT ASSOCIATIVE MEMORY
This chapter introduces two original design methods for asynchronous DHN operating
on unipolar binary pattern space. Both methods make use of graphs that provide effective
binary information representation.
3.1 The Boolean Hebb Rule
3.1.1 Motivation
Based on the observation of a one-to-one correspondence between the fixed points of a
specific DHN and the Maximal Independent Sets (MISs) of a given graph, a design method
for DHNs to solve the vertex cover problem was suggested in (Shrivastava
et al., 1992). This DHN, the so-called nonpositive Hopfield network, has a symmetric,
nonpositive weight matrix and a zero threshold vector. Though its aim is not to obtain an
AM but to solve a quadratic 0-1 problem, the proposed method constitutes a remarkable
contribution to binary recurrent AM design, because it actually provides an AM
satisfying both Conditions 1 and 2 under one condition, namely the compatibility of the given
set of memory vectors: there exists a graph accepting all these vectors as the characteristic
vectors of its MISs. However, this design method still fails to satisfy the attractiveness
consideration. As a consequence, it has been applied in (Shrivastava et al., 1995)
to correct unidirectional errors in binary codes, i.e. errors caused by transitions either
0 → 1 or 1 → 0, but not both.
The maximum number of binary codes that can be introduced as attractive fixed points
to the nonpositive Hopfield network has also been investigated and found to be 3^{n/3} for
n = 3l, where l is a positive integer (see (3.20) for other n values). This number is
fairly high when compared to the achievable capacities of the outer-product method and of the
projection learning rule.
This section reports an extended theoretical result, namely that a DHN satisfying Conditions
1-4 with high memory capacity exists whenever the given memory vectors are
compatible. A Hebbian-like design procedure for this network is described, and a
comprehensive capacity analysis is finally presented.
3.1.2 A Graph Representation of a Binary Memory Set
A graph G = ⟨V, E⟩ consists of a set V of nodes and a set E ⊂ V × V of edges which
represent connections between some pairs of nodes. Two nodes of a graph are said to be
adjacent if there exists an edge between them, and nonadjacent otherwise. A graph of n
nodes is represented by an n × n binary symmetric matrix A = [a_ij], called the adjacency
matrix, such that

a_ij = { 1 if (i, j) ∈ E
       { 0 otherwise.   (3.1)
A set S ⊂ V of nodes is called an independent set if the elements of S are pairwise
nonadjacent. An independent set S_M is maximal if none of its strict supersets, i.e. the sets
including S_M together with at least one node not included in S_M, is also an independent
set. An independent set S can be represented by an n-dimensional binary vector, the so-called
characteristic vector, defined by

x^S_i = { 1 if node i ∈ S
        { 0 otherwise.   (3.2)
3.1.2.1 The Boolean Hebb Rule
The adjacency matrix A of a graph with known independent sets x^{S_1}, x^{S_2}, · · · , x^{S_p} is the
Boolean complement of the following matrix Ā:

Ā = [ x^{S_1} · (x^{S_1})^T ] ∨ [ x^{S_2} · (x^{S_2})^T ] ∨ · · · ∨ [ x^{S_p} · (x^{S_p})^T ]   (3.3)

Herein, “·” stands for real vector multiplication, and “∨” for the bitwise Boolean OR operation.
Note that the above construction of the matrix Ā from a given set of vectors is similar to
Hebb’s rule (or the outer-product rule) used in the design of DHNs. The Boolean OR operation is
used here instead of real addition in order to accumulate the outer products.
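The rule (3.3) can be sketched in a few lines of code; the function name `boolean_hebb` is our illustrative choice, and characteristic vectors are passed as 0/1 rows.

```python
import numpy as np

def boolean_hebb(memories):
    """Boolean Hebb rule (3.3): OR-accumulate the outer products of the
    characteristic vectors, then take the Boolean complement to obtain
    the adjacency matrix A of the graph."""
    X = np.array(memories, dtype=bool)       # each row is a vector in {0,1}^n
    n = X.shape[1]
    A_bar = np.zeros((n, n), dtype=bool)
    for x in X:
        A_bar |= np.outer(x, x)              # bitwise OR of outer products
    return (~A_bar).astype(int)              # Boolean complement -> A
```

For instance, the memory set {[1 1 0]^T, [0 0 1]^T} yields the graph with edges (1,3) and (2,3), whose MISs are exactly {1, 2} and {3}.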
3.1.2.2 Formulation of Maximal Independent Sets
The following theorem enables identifying MISs in an algebraic way.

Theorem 1 ((Pardalos & Rodgers, 1992)) x* is the characteristic vector of an MIS in graph
G if and only if x* is a discrete local minimum of the quadratic function defined by

E_MIS(x) = x^T A x − e^T x,   x ∈ {0, 1}^n,   (3.4)

where e is the n-dimensional column vector whose entries are all unity.

The above-mentioned local minimality of x* means that E_MIS(x*) ≤ E_MIS(x) for
every x satisfying d_H(x*, x) = 1. Such a minimum point x* is called strict if the inequality
is strict, i.e. E_MIS(x*) < E_MIS(x) for every x satisfying d_H(x*, x) = 1, where d_H(·, ·)
denotes the Hamming distance.
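Theorem 1 can be checked exhaustively on small graphs; the sketch below (helper names `e_mis` and `local_minima` are ours) enumerates {0, 1}^n and keeps the discrete local minima of (3.4).

```python
import numpy as np
from itertools import product

def e_mis(x, A):
    """E_MIS(x) = x^T A x - e^T x over x in {0,1}^n, eq. (3.4)."""
    return x @ A @ x - x.sum()

def local_minima(A):
    """Exhaustively collect the discrete local minima of E_MIS, i.e. the
    points no worse than any of their 1-Hamming-distance neighbors."""
    n = A.shape[0]
    mins = set()
    for bits in product([0, 1], repeat=n):
        x = np.array(bits)
        neighbors = []
        for i in range(n):
            y = x.copy()
            y[i] ^= 1                        # flip one entry
            neighbors.append(y)
        if all(e_mis(x, A) <= e_mis(y, A) for y in neighbors):
            mins.add(bits)
    return mins
```

On the graph with edges (1,3) and (2,3), the local minima are (1,1,0) and (0,0,1), matching its two MISs {1, 2} and {3}.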
Fact 2 Any local minimum of (3.4) is necessarily strict.
Proof: Considering the result presented by Theorem 1, it suffices to prove that the
characteristic vectors of MISs in a graph are mutually at least 2 Hamming distance away from
each other. We proceed by contradiction. Let x and y denote the characteristic vectors
associated to two MISs X and Y in a graph G, and suppose d_H(x, y) = 1,
which means that x and y differ in a single entry, say the i-th one. Then, x is either equal to
y + u^i or to y − u^i, where u^i is the i-th unit vector: u^i_i = 1 and u^i_j = 0 ∀j ≠ i. The first
case implies that X is the set of nodes obtained by augmenting the node i to Y,
i.e. Y ⊂ X. Similarly, the second case implies X ⊂ Y. These inclusions contradict
the maximality of Y and X, respectively.
3.1.2.3 Compatibility of a Binary Set
Let the rule (3.3), used to construct the matrix A from its constituting characteristic
vectors, be applied using the elements of a given M = {x^i}_{i=1}^m ⊆ {0, 1}^n. In this case, the
correspondence of M to the set of local minima of the energy function E_MIS is ensured
to be one-to-one if and only if each vector represents a maximal independent set and no
extraneous maximal independent set occurs in the resulting graph. The following definition
and theorem are given to describe and to test this property, respectively.

Definition 3 A set M of n-dimensional binary vectors is called compatible if there exists a
graph G with n vertices such that the set of characteristic vectors of MISs in G is equal to
M.
There are obviously two cases in which the compatibility of M is violated:

Case 1. There exists a pair of distinct vectors x and y, both in M, such that the independent
set S_x represented by x is a superset of the set S_y represented by y.

Case 2. There exists an extraneous MIS in the graph G which does not correspond to any vector
in M.
These two cases are illustrated in Figures 3.1 and 3.2 respectively.
By the following theorem, we introduce the necessary and sufficient conditions for
compatibility which are checked directly on the memory vectors.
Figure 3.1: (a) The graphs G_x and G_y having S_x = {1, 2} and S_y = {1, 2, 3}, respectively, as
their unique MIS; x = [1 1 0]^T and y = [1 1 1]^T. (b) The graph G into which both x and y
are embedded.
Figure 3.2: (a) G_x, G_y and G_z having S_x = {2, 3}, S_y = {1, 3} and S_z = {1, 2}, respectively,
as their unique MISs. (b) The graph G has an extraneous MIS, namely S_e = {1, 2, 3}.
Theorem 2 A set M of binary vectors is compatible if and only if the following conditions
are satisfied:

COMP1 For every pair of distinct x, y ∈ M, there exist indices i, j such
that x_i = y_j = 1 and x_j = y_i = 0.

COMP2 Whenever x_j = x_k = y_i = y_k = z_i = z_j = 1 and
x_i = y_j = z_k = 0 for some x, y, z ∈ M, there
exists w ∈ M such that w_i = w_j = w_k = 1
and, in addition, x_l = y_l = z_l = 1 implies w_l = 1
for any l.
As explained in the proof of Theorem 2, which can be found in the Appendix, COMP1 is the
necessary and sufficient condition for representing each element of a binary vector set as an
MIS in the graph obtained by (3.3). Similarly, COMP2 characterizes the binary vector
sets that do not cause an extraneous MIS in the resulting graph after applying (3.3). In
other words, these are the necessary and sufficient conditions on a binary vector set
for avoiding the violations mentioned in Case 1 and Case 2, respectively.
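The two conditions of Theorem 2 can be tested directly on the memory vectors by brute force; the helpers `comp1` and `comp2` below are our illustrative names, and the cubic enumeration is only meant for small sets.

```python
from itertools import product

def comp1(M):
    """COMP1: every pair of distinct vectors x, y has an index i with
    x_i = 1, y_i = 0 and an index j with x_j = 0, y_j = 1."""
    for x in M:
        for y in M:
            if x is y:
                continue
            if not (any(a == 1 and b == 0 for a, b in zip(x, y)) and
                    any(a == 0 and b == 1 for a, b in zip(x, y))):
                return False
    return True

def comp2(M):
    """COMP2: every index pattern among triples x, y, z in M must be
    witnessed by some w in M, as stated in Theorem 2."""
    n = len(M[0])
    for x, y, z in product(M, repeat=3):
        for i, j, k in product(range(n), repeat=3):
            if (x[j] == x[k] == y[i] == y[k] == z[i] == z[j] == 1
                    and x[i] == y[j] == z[k] == 0):
                witnessed = any(
                    w[i] == w[j] == w[k] == 1
                    and all(w[l] == 1 for l in range(n)
                            if x[l] == y[l] == z[l] == 1)
                    for w in M)
                if not witnessed:
                    return False
    return True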
3.1.3 Design Procedure
Given a graph G, the problem of finding an independent set of maximum cardinality is called
the maximum independent set problem. It has been proven in (Pardalos & Rodgers, 1992)
that the global minimizer of EMIS corresponds to the characteristic vector of the maximum
independent set in G, hence the minimization of EMIS constitutes one of the well-known
solution strategies for the maximum independent set problem.
A gradient-like dynamical neural network can be employed to minimize E_MIS, as done in
(Jagota, 1995), (Sengor et al., 1999) and (Pekergin et al., 1999). However, if there exist
MISs of lower cardinality than the largest one in the given graph, then the solutions of these
networks may be trapped by a non-global local minimum, depending on the initial state.
Thus the exact solution of the maximum independent set problem is not guaranteed by these
networks. This disadvantage of gradient-based methods in solving the maximum independent
set problem turns out to be an advantage when applied to AM design, as this chapter suggests.
From this point of view, the suggested design procedure for binary recurrent AM is made
up of the following steps:
Step 1. Construct a graph G such that the set of characteristic vectors of MISs in G is equal
to the set of memory vectors and determine the adjacency matrix A of G.
Step 2. Design a convergent gradient-like dynamical system whose energy function is
equal to EMIS .
Note that, given a memory set M, the first step of the procedure can easily be performed
by applying (3.3) whenever M is compatible. As discussed in the previous section, each
memory vector is distinguishable as an MIS in the resulting graph if and only if M is
compatible. Under this assumption, it remains to implement the second step, namely the
synthesis of a dynamical network that retrieves the memory vector closest to its initial
state vector. To achieve this, we focus here on the DHN.
3.1.3.1 Unipolar Discrete Hopfield Network
The considered discrete Hopfield recursion is an asynchronous update of the entries of the
binary state vector:

x_i[k + 1] = φ( t_i − Σ_{j=1}^{n} w_ij x_j[k] ),   φ[α] = { 1, α > 0
                                                          { 0, α ≤ 0   (3.5)

where w_ij is the weight between neurons i and j, t_i is the threshold of neuron i, and
x_i denotes the state of neuron i.
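The recursion (3.5) translates directly into code; `dhn_step` and `dhn_run` are our illustrative helper names, and a fixed cyclic update order is used to make the asynchronous dynamics deterministic.

```python
import numpy as np

def dhn_step(x, W, t, i):
    """One asynchronous update (3.5) of entry i: x_i <- phi(t_i - w_i . x),
    with phi(a) = 1 if a > 0 else 0."""
    x = x.copy()
    x[i] = 1 if (t[i] - W[i] @ x) > 0 else 0
    return x

def dhn_run(x, W, t, order=None, max_sweeps=100):
    """Sweep the entries in a fixed cyclic order until no entry changes;
    returns the reached fixed point of the recursion."""
    order = list(range(len(x))) if order is None else order
    for _ in range(max_sweeps):
        new = x
        for i in order:
            new = dhn_step(new, W, t, i)
        if np.array_equal(new, x):
            return x
        x = new
    return x
```

With W set to the adjacency matrix of the graph with edges (1,3), (2,3) and t = e, the distorted input [1 0 0]^T is restored to the memory [1 1 0]^T.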
A quadratic energy function associated to this recursion and defined on the state-space
of the network is given in the following matrix form:

E(x) = x^T W x − t^T x,   (3.6)

where W := [w_ij] ∈ ℝ^{n×n} is the weight matrix, t := [t_i] ∈ ℝ^n is the threshold vector,
and x := [x_i] ∈ {0, 1}^n is the state vector. The following theorem, which is a modified
version of the one given in (Bruck & Goodman, 1988), provides a sufficient condition for
the convergence of the recursion (3.5).

Theorem 3 For a symmetric weight matrix W ∈ {0, 1}^{n×n} and a threshold vector t with
all unity entries, the recursion (3.5) is convergent, namely it converges to one of its fixed
points.
Proof: First we show that for a symmetric, binary weight matrix W and all-unity
thresholds, the energy function (3.6) is non-increasing along the recursion (3.5). Let the i-th
entry of the state vector be updated at an arbitrary time step k. The difference in the
n-dimensional state vector x can then be represented by the difference vector ∆x defined by

∆x_i = x_i[k + 1] − x_i[k] = { 1  if x_i[k + 1] = 1 and x_i[k] = 0
                             { −1 if x_i[k + 1] = 0 and x_i[k] = 1
                             { 0  otherwise   (3.7)

with all other n − 1 entries of ∆x being zero. Then, by using the symmetry of W, the difference
in the energy function E(x) can be written as

∆E = E(x[k + 1]) − E(x[k])
   = (x[k + 1])^T W x[k + 1] − (x[k])^T W x[k] − t^T (x[k + 1] − x[k])
   = (x[k + 1] − x[k])^T W (x[k + 1] + x[k]) − t^T (x[k + 1] − x[k])
   = ∆x^T W (2x[k] + ∆x) − t^T ∆x
   = ∆x_i ( 2 Σ_{j=1}^{n} w_ij x_j[k] − t_i ) + w_ii (∆x_i)².   (3.8)
The first term in (3.8) is nonpositive by the definition of the recursion together with
t_i = 1 for every i, while the second term w_ii (∆x_i)² can be either 1 or 0. On the other hand,
the term 2 Σ_{j=1}^{n} w_ij x_j[k] − t_i is a nonzero (odd) integer since W and x are binary and t_i = 1
for every i. Considering also (3.7), we conclude that the sum (3.8) is nonpositive. Since
there exist 2^n possible states, E(x) takes values from a finite set {E(x) : x ∈ {0, 1}^n}.
Then, the non-increasing E(x) eventually reaches one of these finitely many values and remains constant.
For the asynchronous update mode, ∆E = 0 implies one of three cases: i) ∆x_i = 0 ∀i;
ii) w_ii = 1, ∆x_i = −1, 2 Σ_{j=1}^{n} w_ij x_j[k] − t_i = 1; or iii) w_ii = 1, ∆x_i = 1,
2 Σ_{j=1}^{n} w_ij x_j[k] − t_i = −1. The first case implies that x[k] is a fixed point, while the second
one implies a 1 → 0 transition in the i-th entry of the state vector. Case iii) implies that
the sum 2 Σ_{j=1}^{n} w_ij x_j[k] is equal to zero. However, this contradicts the implications
of w_ii = 1 and ∆x_i = 1, which means that the third case is impossible. Since 1 → 0 transitions
can occur for at most n successive steps, the trajectory settles down to a fixed point
after a finite number of time steps.
3.1.3.2 A DHN Free from Spurious Memories: MIS Network
In the light of Theorem 1 and Theorem 3, choosing the weight matrix W equal to the
adjacency matrix A and t equal to e gives a convergent DHN whose energy function (3.6)
has local minima located exactly at the characteristic vectors of MISs, i.e. the memory
vectors. We call this specific DHN the Maximal Independent Set Network (MIS-N). The
following theorem, together with Theorem 1, establishes that MIS-N also satisfies Condition 3
for a compatible set of memory vectors.
Theorem 4 The set of fixed points of MIS-N has a one-to-one correspondence with the set
of discrete local minima of (3.6).
Proof: Let x* denote a local minimum of (3.6). Then it is necessarily strict by Fact 2.
Let x be a binary vector which lies 1 Hamming distance away from x*. Then, for some
index i, one of the following holds: i) x = x* + u^i, ii) x = x* − u^i, where u^i stands for
the i-th unit vector.

Since x* is a strict local minimum, one can write

(x*)^T A x* − e^T x* < x^T A x − e^T x.   (3.9)
In case i), the inequality (3.9) becomes

(x*)^T A x* − e^T x* < (x* + u^i)^T A (x* + u^i) − e^T (x* + u^i).

Since A is symmetric, for x*_i = 0 we get

0 < 2 (u^i)^T A x* + (u^i)^T A u^i − e^T u^i.   (3.10)

Note that (u^i)^T A u^i is either 0 or 1. Rearranging (3.10), one obtains
(u^i)^T A x* = Σ_{j=1}^{n} a_ij x*_j > 0 since a_ij, x*_j ∈ {0, 1}. This implies

φ( 1 − Σ_{j=1}^{n} a_ij x*_j ) = 0 for all i satisfying x*_i = 0.   (3.11)
Similarly, in case ii), we obtain the following inequality from (3.9):

(x*)^T A x* − e^T x* < (x* − u^i)^T A (x* − u^i) − e^T (x* − u^i).

Then, for x*_i = 1 we have

0 < −2 (u^i)^T A x* + (u^i)^T A u^i + e^T u^i.   (3.12)

Rearranging (3.12) yields (u^i)^T A x* < 1 for all i satisfying x*_i = 1, which is equivalent to

φ( 1 − Σ_{j=1}^{n} a_ij x*_j ) = 1 for all i satisfying x*_i = 1.   (3.13)
(3.11) and (3.13) imply that x* is a fixed point.

To show that the converse is also true, assume x* is a fixed point, so it satisfies (3.11) and
(3.13). By the definition of φ(·), (3.13) implies (3.12) and (3.11) implies (3.10). These two
implications prove that x* is a (strict) local minimum, since (3.9) is satisfied.
3.1.3.3 All Fixed Points of MIS-N are Attractive
The fixed point to which a specific initial state vector converges along the asynchronous
MIS-N recursion might be affected by the update order. In the following theorem, we prove
that for any point x which is in the 1-Hamming distance neighborhood of a fixed point x∗,
there exists at least one trajectory starting at x and ending at x∗.
Theorem 5 A fixed point x∗ in the MIS-N is attractive in the sense that for each vector
x, which is 1-Hamming distance away from x∗, there exists a trajectory starting at x and
ending at x∗.
Proof: Let x* be a fixed point of MIS-N and let x ∈ {0, 1}^n be a vector which is
1 Hamming distance away from x*, i.e. there exists an index j such that x*_i = x_i ∀i ≠ j,
and either i) x*_j = 0 and x_j = 1, or ii) x*_j = 1 and x_j = 0 holds. Suppose that x is injected
as the initial state vector to the network, i.e. x[0] = x, and the j-th entry of the state vector is
chosen to be updated first. Since x* is a fixed point of the network and it also represents an
MIS in the graph represented by A, one can write in case i)

Σ_i a_ji x_i = Σ_{i≠j} a_ji x*_i + a_jj x_j ≥ Σ_i a_ji x*_i ≥ 1.   (3.14)

(3.14) together with x*_j = φ(1 − Σ_i a_ji x*_i) = 0 imply that x_j[1] = φ(1 − Σ_i a_ji x_i) = x*_j = 0.
In case ii), we have

Σ_i a_ji x_i = Σ_{i≠j} a_ji x*_i + a_jj x_j ≤ Σ_i a_ji x*_i < 1.   (3.15)

(3.15) together with x*_j = φ(1 − Σ_i a_ji x*_i) = 1 imply that x_j[1] = φ(1 − Σ_i a_ji x_i) = x*_j = 1.
These two facts show that any fixed point x* is attractive in all directions if the j-th
entry, which distinguishes x* from its neighbor x, is chosen to be updated prior to the other
entries.
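Theorem 5 can be verified exhaustively on a small example; the 3-node graph below (edges between nodes 1-3 and 2-3) and the helper names are our illustrative choices.

```python
import numpy as np
from itertools import product

def phi(a):
    return 1 if a > 0 else 0

def update(x, A, j):
    """One MIS-N update of entry j (recursion (3.5) with t = e)."""
    x = x.copy()
    x[j] = phi(1 - A[j] @ x)
    return x

# 3-node graph with edges (1,3) and (2,3); its MISs are {1,2} and {3}
A = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]])
fixed = [np.array(b) for b in product([0, 1], repeat=3)
         if all(b[i] == phi(1 - A[i] @ np.array(b)) for i in range(3))]

# every 1-Hamming neighbor returns to x* when the differing entry moves first
for xs in fixed:
    for j in range(3):
        x = xs.copy()
        x[j] ^= 1                    # flip entry j to form the neighbor
        assert np.array_equal(update(x, A, j), xs)
```

The enumeration finds exactly the two fixed points [1 1 0]^T and [0 0 1]^T, and each of their six neighbors is pulled back in a single step, as the proof predicts.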
3.1.3.4 An Update Rule Providing Attractiveness for Each Memory Vector
The results derived above guarantee that, like the nonpositive Hopfield network (Shrivastava
et al., 1995), the MIS-N satisfies the first two design considerations irrespective of the
update order of the state vector entries. The fixed point to which a specific initial state
vector converges, however, might be affected by the update order. In the following
theorem, we prove that for any point x which is in the 1-Hamming-distance neighborhood
of a fixed point x*, there exists at least one trajectory starting at x and ending at x*.
Theorem 6 A fixed point x∗ in the MIS-N is attractive in the sense that for each vector
x, which is 1-Hamming distance away from x∗, there exists a trajectory starting at x and
ending at x∗.
Proof: Let x* be a fixed point of MIS-N and let x ∈ {0, 1}^n be a vector which is
1 Hamming distance away from x*, i.e. there exists an index j such that x*_i = x_i ∀i ≠ j,
and either i) x*_j = 0 and x_j = 1, or ii) x*_j = 1 and x_j = 0 holds. Suppose that x is applied
to the recurrence (3.5) as the initial state vector x[0] = x, and the j-th entry of the state vector
is chosen to be updated first. Since x* is a fixed point of the network and it also represents
an MIS in the graph represented by A, one can write in case i)

Σ_i a_ji x_i = Σ_{i≠j} a_ji x*_i + a_jj x_j ≥ Σ_i a_ji x*_i ≥ 1.   (3.16)

(3.16) together with x*_j = φ(1 − Σ_i a_ji x*_i) = 0 imply that x_j[1] = φ(1 − Σ_i a_ji x_i) = x*_j = 0.
In case ii), we have

Σ_i a_ji x_i = Σ_{i≠j} a_ji x*_i + a_jj x_j ≤ Σ_i a_ji x*_i < 1.   (3.17)

(3.17) together with x*_j = φ(1 − Σ_i a_ji x*_i) = 1 imply that x_j[1] = φ(1 − Σ_i a_ji x_i) = x*_j = 1.
These two facts show that any fixed point x* is attractive in all directions if the j-th entry
which distinguishes x* from its neighbor x = x[0] is chosen to be updated prior to the other
entries.
If the update order of the states in the discrete Hopfield recurrence (3.5) is chosen to
be random (as is usually done), then the network becomes nondeterministic. Hence, the
classical Lyapunov stability of a fixed point does not apply directly to such networks.
One can alternatively use a deterministic update order so as to make the network
deterministic, which is equivalent to saying that there exists a unique trajectory starting
at each point in the state space of the network. But this trajectory might not provide the
desired error correction if an entry of the state vector other than the one considered in the
proof of Theorem 6 is chosen to be updated first. In order to ensure Lyapunov stability
for the fixed points of MIS-N, we need i) to determine the cases in which some 1-Hamming-distance
neighbors of a fixed point have the possibility (depending on the update rule) of
converging to other fixed points, and ii) to propose an update order that avoids these cases.
In the following case study, x* is treated as a fixed point and x as a point which is
1 Hamming distance away from x*.

Case 1: Suppose x*_j = 0 and x_j = 1. For any k ≠ j, we can write

Σ_i a_ki x_i = Σ_{i≠j} a_ki x*_i + a_kj x_j = Σ_{i≠j} a_ki x*_i + a_kj = Σ_i a_ki x*_i + a_kj.   (3.18)
If x is applied to the MIS-N as the initial state vector x[0] and the k-th state is chosen to be
updated at the first step, then one of the following cases occurs:

1. If x*_k = φ(1 − Σ_i a_ki x*_i) = 0, then Σ_{i≠j} a_ki x*_i ≥ 1 and hence Σ_i a_ki x*_i + a_kj =
Σ_i a_ki x_i ≥ 1. Thus, x_k[1] = φ(1 − Σ_i a_ki x_i) = 0, which means that no state
transition occurs.

2. If x*_k = φ(1 − Σ_i a_ki x*_i) = 1, then Σ_i a_ki x*_i = 0 since a_ki, x*_i ∈ {0, 1}. So,
(3.18) implies either i) x_k[1] = φ(1 − Σ_i a_ki x_i) = 1 (when a_kj = 0), or ii)
x_k[1] = φ(1 − Σ_i a_ki x_i) = 0 (when a_kj = 1). Case i) means no transition:
x_k[1] = x_k[0] = 1. However, case ii) means a 1 → 0 transition whenever there
exists an edge between the nodes j and k in the corresponding graph.
Case 2: Suppose x*_j = 1 and x_j = 0. For any k ≠ j, we can write

Σ_i a_ki x_i = Σ_{i≠j} a_ki x*_i + a_kj x_j = Σ_{i≠j} a_ki x*_i = Σ_i a_ki x*_i − a_kj.   (3.19)

If x is applied to the MIS-N as the initial state vector x[0] and the k-th state is chosen to be
updated at the first step, then one of the following cases occurs:
1. For x*_k = 1, we have a_kj = 0 since x* represents an MIS in the graph represented
by A. Then, x*_k = φ(1 − Σ_i a_ki x*_i) = 1 and a_kj = 0 together with (3.19) imply that
x_k[1] = x_k[0] = 1. This means that no transition is possible for the k-th entry.

2. For x*_k = φ(1 − Σ_i a_ki x*_i) = 0, we have Σ_i a_ki x*_i ≥ 1. Then, (3.19) implies either
i) x_k[1] = φ(1 − Σ_i a_ki x_i) = 0 (when a_kj = 0, or when a_kj = 1 and Σ_i a_ki x*_i ≥ 2), or
ii) Σ_i a_ki x_i = 0 (when a_kj = 1 and Σ_i a_ki x*_i = 1), so x_k[1] = 1. In case i) no
transition occurs, i.e. x_k[1] = x_k[0] = 0. Case ii) means a 0 → 1 transition. This
case is possible but rare to face, since it occurs only when the k-th node is
connected to the j-th node but not to any other node of the MIS represented by x*.
In the light of the above discussion, we propose the following search procedure as an
update rule which provides a trajectory starting at the neighbor x and ending at x*, hence
making the fixed points of MIS-N asymptotically stable.

Set j = 1. If no transition is available for the j-th entry, then increment j by 1. If a
0 → 1 (resp. 1 → 0) transition is valid for an entry j satisfying x_j = 0 (resp. x_j = 1)
of the current state vector x, then before accepting it, check whether a transition is also
valid for any other zero entry k ≠ j with x_k = 0 (resp. unity entry k ≠ j with x_k = 1).
If invalid for all such k, then accept the transition in the j-th entry and increment j by 1.
If valid for some k, then check whether at least one of the neighbor states
obtained by updating these entries is a fixed point.¹ If such a neighbor is a fixed point,
then accept the transition leading to this fixed point. If not, then accept the valid transition
in the j-th entry and increment j by 1. Stop the procedure when all entries remain unchanged.
3.1.4 Quantitative Properties of Boolean Hebb Rule
As already shown in Section 3.1.3.2, an AM designed by the Boolean Hebb rule has no
spurious memories if and only if the given set of memory vectors is compatible. Then,
the maximum number of n-dimensional vectors that can be embedded by this method into the
DHN recursion is equal to the maximum number of MISs that a graph with n vertices may
contain.

¹A point x ∈ {0, 1}^n is a fixed point of MIS-N iff Φ(e − W · x) = x, where Φ(·) is the diagonal
transformation from ℝ^n to {0, 1}^n defined by Φ(u) = [φ(u_1) · · · φ(u_n)]^T.
A specific graph consisting of disjoint triangles was investigated by Moon and Moser in
(Moon & Moser, 1965). Erdos (Erdos & Erne, 1973) later showed that this specific graph
has the maximum number of MISs among all graphs with the same number of vertices. This
graph was shown independently by Furedi (Furedi, 1987) and by Moon and Moser
(Moon & Moser, 1965) to have exactly the following number of MISs:
Cmax(n) = { 3^{n/3}          if n ≡ 0 (mod 3)
          { 4 · 3^{(n−4)/3}  if n ≡ 1 (mod 3)
          { 2 · 3^{(n−2)/3}  if n ≡ 2 (mod 3)
, for n ≥ 2.   (3.20)
Shrivastava et al. have given this number in (Shrivastava et al., 1995) as the capacity of
their nonpositive Hopfield network. Based on the results of Section 3.1.3.2, we also
introduce this number as the reachable upper bound on the number of vectors that can be
stored in a DHN designed by the Boolean Hebb rule. Although Cmax(n) (see Table 3.1) is
much greater than the capacities achieved by many available methods, it is not an
effective capacity of the considered AM, since not every memory set consisting of Cmax(n)
binary vectors is compatible. Compatible sets of cardinality Cmax(n) are indeed rare among
all binary sets of cardinality Cmax(n), as a consequence of the very strict constraints on
the construction of the above-mentioned specific graphs. Thus, we call Cmax(n) the
maximum capacity.
A given compatible set uniquely determines a graph whose MIS set is identical to the
given set. By the one-to-one correspondence between the set of MIS’s in a graph and
a compatible set of binary vectors, we can say that the number of all compatible sets
containing n-dimensional binary vectors is equal to the number of different graphs with
n-vertices that can be obtained by the Boolean Hebb rule, which is actually the number of
different adjacency matrices that can be obtained by (3.3).
Any graph with n vertices and without self-loops can be represented by an n × n (symmetric)
adjacency matrix with all-zero diagonal entries. The number of all such graphs is

N_0(n) = 2^{n(n−1)/2}.   (3.21)

If some nodes of the considered graph have self-loops, as is possible under the Boolean Hebb
rule, then the diagonal of the adjacency matrix may contain 1’s.
Fact 3 The number of different adjacency matrices that can be obtained by the Boolean Hebb
rule is

N_c(n) = 2^{n(n−1)/2} + Σ_{k=1}^{n} (n choose k) 2^{(n−k)(n−k−1)/2}   (3.22)

where the subindex c, which will also be used below, refers to compatibility.
Proof: The first term in (3.22) is the number of adjacency matrices with all-zero
diagonal entries. Let A be an n × n adjacency matrix obtained by (3.3). It can be observed
that the i-th diagonal entry a_ii of A can be unity only if the i-th row (and consequently the
i-th column) of A has all-unity entries. In other words, the existence of any zero entry on
the i-th column of A implies a_ii = 0. The number of adjacency matrices with exactly k unity
diagonal entries is

(n choose k) · 2^{(n−k)(n−k−1)/2}.   (3.23)

The term (n choose k) in (3.23) is the number of all possible diagonals possessing exactly k
unity entries. Among the rows with the remaining (n − k) zero diagonal entries, there are
(n − k)(n − k − 1)/2 off-diagonal binary entries which are arbitrary. So, discarding the case
k = 0, which corresponds to all-zero diagonal entries, the number of adjacency matrices with
at least one unity diagonal entry is

N_1(n) = Σ_{k=1}^{n} (n choose k) 2^{(n−k)(n−k−1)/2}.   (3.24)

Finally, the number of graphs that can be obtained by the Boolean Hebb rule is the sum of the
two terms N_0(n) and N_1(n).
Dividing N_c(n) by the number of all n-dimensional binary vector sets gives the
probability that, under the uniform distribution assumption, an arbitrarily given set of
n-dimensional binary vectors is compatible when all sets of binary vectors are equiprobable:

p_c(n) = [ 2^{n(n−1)/2} + Σ_{k=1}^{n} (n choose k) 2^{(n−k)(n−k−1)/2} ] / 2^{2^n}.   (3.25)
Note that p_c(n) decreases sharply as n goes to infinity, which means that the performance of
the Boolean Hebb rule drastically decreases for increasing n if the number of memory vectors
is allowed to be arbitrary, with no care for exceeding the maximum capacity. However, since
we know that any set of cardinality more than Cmax(n) is necessarily incompatible,
another quantity can be introduced by pre-discarding these incompatible sets, as a measure
of the effective performance of the Boolean Hebb rule. By restricting the given memory set
to contain no more than Cmax(n) elements, we obtain the probability of a given binary set
M being compatible given |M| ≤ Cmax(n):
p_c(n | m ≤ Cmax(n)) = [ 2^{n(n−1)/2} + Σ_{k=1}^{n} (n choose k) 2^{(n−k)(n−k−1)/2} ] / Σ_{i=1}^{Cmax(n)} (2^n choose i)   (3.26)

where m = |M|. This quantity is calculated for some n values and listed in Table 3.1 for
comparison.
Although the probability pc (n|m ≤ Cmax(n)) is still very small, the Boolean Hebb rule
provides a very good compression ratio if a large memory set is compatible. A memory set
M consisting of n-dimensional binary vectors is embedded with perfect recall into an n × n
(symmetric and binary) adjacency matrix, which is represented by (n² + n)/2 bits, while
M itself requires |M| · n bits. Considering the fact that the cardinality |M| of a compatible
memory set may reach Cmax, we define the best lossless compression ratio as the proportion
of the number of bits used for representing M in the adjacency matrix A to |M| · n, evaluated
at |M| = Cmax:

R_b = (n + 1) / (2 · Cmax).   (3.27)

Table 3.1: The maximum capacity Cmax(n), the probability p_c(n | m ≤ Cmax(n)), and the
best lossless compression ratio R_b.

n    Cmax(n)   p_c(n | m ≤ Cmax(n))   R_b · 100
3    3         2.62 · 10^−1            66
4    4         6.36 · 10^−2            62
5    6         2.01 · 10^−3            50
6    9         2.20 · 10^−6            39
7    12        1.52 · 10^−10           33
8    18        2.60 · 10^−19           25
9    27        2.11 · 10^−34           19
10   36        1.91 · 10^−53           15

This ratio is also given in Table 3.1 for some n values. It should be pointed out that, as n
goes to infinity, the best lossless compression ratio goes to zero, indicating the remarkable
lossless compression performance of the Boolean Hebb rule for high-dimensional vectors.
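The ratio (3.27) is easy to check against the last column of Table 3.1; the function name `r_b` is our illustrative choice, and Cmax(n) is passed in explicitly.

```python
def r_b(n, cmax):
    """Eq. (3.27): ratio of the (n^2 + n)/2 adjacency-matrix bits to the
    |M| * n bits of the raw memory set, evaluated at |M| = cmax.
    ((n^2 + n)/2) / (cmax * n) simplifies to (n + 1) / (2 * cmax)."""
    return (n + 1) / (2 * cmax)
```

Truncating 100 · R_b to an integer reproduces the table entries, e.g. 100 · r_b(3, 3) = 66.6… → 66 and 100 · r_b(10, 36) = 15.2… → 15.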
3.1.4.1 Comparison with Outer-Product Method
As explained in Section 3.1.3.2, compatibility of a memory set is a desired property for the
application of the Boolean Hebb Rule (BHR), as it provides the one-to-one correspondence
of the memory vectors to the fixed points of the resulting AM. That is why, in the
quantitative analysis given in the previous subsection, we assumed that the given set
of binary vectors is compatible. In other words, the preceding subsection presents the
performance analysis of the design method under perfect recall of the embedded vectors,
which is indeed a very strict restriction.
In order to examine the applicability of the BHR, below we first do not insist on avoiding
spurious memories, but do insist on storing all elements of a given M completely, as imposed
by Condition 2. This indeed corresponds to relaxing the compatibility assumption in the
way stated by the following fact.

Fact 4 All elements of a binary vector set M are stored as fixed points of a DHN, i.e. M
is completely stored by the BHR, if i) M satisfies COMP1, and ii) there exists a compatible
n-dimensional binary vector set M̄ which is a superset of M.
Proof: If M satisfies COMP1, then the independent sets represented by the memory
vectors do not cover each other (see Theorem 2). Then, the only case in which a memory
vector x ∈ M is not represented as an MIS after the design procedure (3.3) is that M
does not satisfy COMP2 (which results in some extraneous MIS's) and, in addition, an
extraneous MIS covers the independent set represented by x. If the set M′, which is an
augmented version of M including the characteristic vectors of all extraneous MIS's, is
compatible, then each element x ∈ M′, and in particular each x ∈ M ⊆ M′, is represented as an MIS.
In order to compare the complete storage performances of BHR and the outer-product
method, we have produced 1000 sets containing exactly m different, randomly-chosen
binary vectors of dimension n drawn according to uniform distribution. Assuming each
of these sets as the given memory set, we have applied both of the methods to obtain DHNs.
Finally we have checked whether the considered set was completely stored in each network.
The percentages of the completely stored sets via the outer-product method (POPM%) and
via the BHR (PBHR%) in all 1000 sample sets are given in Table 3.2 for some m, n values.
Observe from Table 3.2 that the percentages obtained for the outer-product method are higher
than the corresponding ones obtained for BHR, and that, for the same m/n ratio, the complete
storage performances of both methods decrease as n increases. This shows that the outer-
product method is superior to BHR in the sense of complete storage for an arbitrary memory set
chosen according to the uniform distribution. However, this is not the case for sparse
memory sets, i.e. sets containing a relatively small number of 1-entries. To see this,
we have repeated the previous procedure for 1000 sparse memory sets drawn such that the
Table 3.2: Percentages of complete storage in the DHNs designed by the Outer-Product
Method (POPM%) and the Boolean Hebb rule (PBHR%) for uniformly distributed random
sets.
n m POPM% PBHR%
50 2 100 100
50 4 99 89
50 6 83 6
50 8 36 0
50 10 4 0
100 4 100 100
100 8 95 2
100 12 34 0
100 16 2 0
100 20 0 0
probability of choosing 0 as an entry of a memory vector is 66% while the probability of
choosing 1 is 33% (0 : 66%, 1 : 33%). The complete storage percentages obtained for such
memory sets are listed in Table 3.3. The percentages obtained for the bit probabilities 1 : 66%,
0 : 33%, which reflect the complete storage performances of the two methods for dense²
memory sets, are also given in Table 3.3.
Observe from Table 3.2 and Table 3.3 that the complete storage performance of the outer-
product method reaches its maximum for equiprobable bits and decreases symmetrically as
the bit probabilities deviate from 1 : 50%, 0 : 50%. On the other hand, the performance of
the BHR continuously increases as the chosen memory sets get sparser.
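The complete-storage experiment for the outer-product method can be sketched as follows. This is a hedged reconstruction: it assumes the standard zero-diagonal bipolar outer-product (Hebbian) rule and the sign-rule fixed-point test, since the exact conventions of the experiments above are given elsewhere in the thesis; the function names are illustrative.

```python
import random

def outer_product_weights(memories):
    """Zero-diagonal outer-product (Hebbian) weight matrix for bipolar patterns."""
    n = len(memories[0])
    W = [[0] * n for _ in range(n)]
    for x in memories:
        s = [2 * b - 1 for b in x]          # unipolar {0,1} -> bipolar {-1,+1}
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += s[i] * s[j]
    return W

def is_fixed_point(W, x):
    """x is fixed iff every neuron keeps its state under the sign rule."""
    s = [2 * b - 1 for b in x]
    for i in range(len(x)):
        h = sum(W[i][j] * s[j] for j in range(len(x)))
        if (1 if h >= 0 else -1) != s[i]:
            return False
    return True

def complete_storage_rate(n, m, p1, trials=200, seed=0):
    """Fraction of random memory sets whose members are all fixed points."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        M = [[1 if rng.random() < p1 else 0 for _ in range(n)] for _ in range(m)]
        W = outer_product_weights(M)
        ok += all(is_fixed_point(W, x) for x in M)
    return ok / trials
```

For n = 50, m = 2 at equiprobable bits (p1 = 0.5) the rate should be at or near 1, in line with the first row of Table 3.2, and it drops as m grows.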
We further relax the complete storage assumption by no longer insisting on storing all of
the given memory vectors in the resulting DHN and observe the proportion of the number of
stored vectors to the cardinality of the original (given) memory set. (This ratio will be called
the storage percentage.) For the above-mentioned bit probabilities, the average storage
²We call a set of binary vectors dense if the number of its 1-entries is greater than the number of its 0-entries.
Table 3.3: Complete storage percentages POPM% and PBHR% for different bit probabilities.
n m 0 : 33%, 1 : 66% 0 : 66%, 1 : 33%
    POPM% PBHR% POPM% PBHR%
50 2 100 100 100 100
50 4 89 9 88 98
50 6 19 0 18 79
50 8 1 0 0 38
50 10 0 0 0 6
100 4 99 84 98 100
100 8 0 0 0 85
100 12 0 0 0 9
100 16 0 0 0 1
100 20 0 0 0 0
percentages obtained by the outer-product method (AvPOPM%) and BHR (AvPBHR%) in
1000 random memory sets are presented in Table 3.4.
Table 3.3 and Table 3.4 confirm the well-known result (Dembo, 1989) on the storage
capacity of the outer-product method which states that the method stores 0.138n memory
vectors with probability almost 1. As m exceeds 0.138n, the storage percentages start to
decrease. Moreover, this decrement gets sharper as n increases for a fixed bit probability.
From these results, we can conclude that the outer-product method performs better
than BHR in storing randomly chosen memory vectors as fixed points (either attractive or
not) in the resulting DHN when the memory sets are dense or when the bit probabilities
are equal. However, BHR is a better alternative to the outer-product method in storing sparse
memory sets as fixed points, which is the case in some applications such as character
recognition (see Figure 3.3). We have observed that our method starts to become superior
to the outer-product method at the bit probabilities 1 : 35%, 0 : 65%. Moreover, as stated in
Theorem 6, all of these fixed points are attractive, which cannot be guaranteed by the
outer-product method. It should also be noted that the BHR provides a better compression
Table 3.4: Average percentages AvPOPM% and AvPBHR% for different bit probabilities.
n m 0 : 33%, 1 : 66% 0 : 50%, 1 : 50% 0 : 66%, 1 : 33%
    AvPOPM% AvPBHR% AvPOPM% AvPBHR% AvPOPM% AvPBHR%
50 2 100 100 100 100 100 100
50 4 95 60 100 95 97 99
50 6 61 7 97 57 54 98
50 8 30 3 88 23 24 88
50 10 10 0 68 5 11 75
100 4 99 84 100 100 99 100
100 8 28 1 99 33 29 99
100 12 4 0 90 2 3 77
100 16 1 0 65 1 1 39
100 20 0 0 38 0 0 10
ratio since the weight matrix of a DHN obtained by the outer-product method is in general
(signed) integer valued, while BHR always results in a binary weight matrix.
3.1.5 Simulation Results
3.1.5.1 A Compatible Example
The design procedure explained above is simulated for the following compatible set of
memory vectors.
x1 = [1 1 0 1 0]T, x2 = [0 0 1 1 0]T, x3 = [1 0 0 0 1]T.
For each initial state vector x0 ∈ {0, 1}n, the steady-state solution of the MIS-N and the
true mapping obtained by the binary association function (1.1) are listed in Table 3.5, where
the initial state vectors x0 are represented by their corresponding decimal numbers.
Table 3.5: Simulation results of MIS-N.
x0 MIS-N f [x0] x0 MIS-N f [x0]
0 x1 x2 or x3 16 x1 x3
1 x3 x3 17 x3 x3
2 x1 x1 18 x1 x1
3 x3 x2 or x3 19 x3 x3
4 x2 x2 20 x2 x2 or x3
5 x3 x2 or x3 21 x3 x3
6 x2 x2 22 x2 x2
7 x3 x2 23 x3 x2 or x3
8 x1 x1 24 x1 x1
9 x3 x3 25 x3 x3
10 x1 x1 26 x1 x1
11 x3 x1 27 x3 x1
12 x2 x2 28 x2 x1
13 x3 x2 or x3 29 x3 x3
14 x2 x2 30 x2 x1
15 x3 x2 31 x3 x1
As seen in Table 3.5, for most of the initial state vectors, the results obtained by the MIS-
N and the binary association function agree. However, for the initial state vectors 0, 7, 11,
15, 16, 27, 28 and 30, MIS-N converges to erroneous memory vectors, since the attraction
regions of the fixed points cannot be trimmed; hence, Condition 5 is not guaranteed by the
design method presented here.
3.1.5.2 A Compatibilization Procedure and its Character Recognition Application
In this example, we introduce a procedure for “compatibilizing” a given incompatible set of
binary vectors, via modifying some elements of the given set such that the resulting set is
compatible. By the modification of a binary vector, we mean complementing an entry of that
vector. To reach a compatible memory set still meaningful for applications, the number of
modifications performed in compatibilization should be as small as possible. It is a fact that
if a given set of vectors has cardinality |M | not greater than Cmax(n), then a compatible set
of cardinality |M | exists and can be obtained from the original incompatible one with a finite
number of bit modifications. At this point we introduce the nearest compatible set problem
as the problem of finding a compatible set of the same cardinality which is obtained by
applying the minimum number of modifications on a given incompatible set. For any given
set M of binary vectors, an approximate solution to this problem can be obtained by the
following algorithm. This algorithm basically determines the entries which cause violations
of COMP1 and COMP2 and modifies the associated vectors.
Algorithm 1
Step 1: Determine the sets:
C1 = {(x, y) ∈ M × M | (x, y) violates COMP1}
C2 = {(x, y, z) ∈ M × M × M | (x, y, z) violates COMP2}
If both C1 and C2 are empty, then go to Step 4.
Step 2: For each triple (x, y, z) ∈ C2, determine a triple of indices (i, j, k) such that
xi = yj = zk = 0 and xj = xk = yi = yk = zi = zj = 1.³ Set one of the entries
xj, xk, yi, yk, zi, zj to zero.
Step 3: For each couple (x,y) ∈ C1, determine the smallest index i such that xi = yi = 0.
If ‖x‖ > ‖y‖, then set yi = 1. Else, set xi = 1. If an index i such that xi = yi = 0 does
not exist, then determine the smallest index j such that xj = yj = 1. If ‖x‖ > ‖y‖, then set
xj = 0. Else, set yj = 0. Return to Step 1.
Step 4: Stop.
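The control flow of Algorithm 1 can be sketched as below. Since COMP1 and COMP2 are defined earlier in the chapter, they are passed in here as placeholder predicates on pairs and triples of vectors; only the repair steps specified above are implemented, so this is an illustrative sketch rather than the exact procedure.

```python
def compatibilize(M, violates_comp1, violates_comp2, max_iter=1000):
    """Approximate nearest compatible set: repair COMP2, then COMP1 violations.

    M is a list of 0/1 lists; the two predicates (assumed, defined earlier in
    the chapter) test pairs / triples of vectors against COMP1 / COMP2.
    """
    M = [list(x) for x in M]                       # work on a copy
    for _ in range(max_iter):
        C1 = [(a, b) for a in range(len(M)) for b in range(a + 1, len(M))
              if violates_comp1(M[a], M[b])]
        C2 = [(a, b, c) for a in range(len(M)) for b in range(a + 1, len(M))
              for c in range(b + 1, len(M)) if violates_comp2(M[a], M[b], M[c])]
        if not C1 and not C2:
            return M                               # Step 4: compatible
        # Step 2: for a COMP2 triple, zero one of the six offending entries.
        for (a, b, c) in C2:
            x, y, z = M[a], M[b], M[c]
            n, done = len(x), False
            for i in range(n):
                for j in range(n):
                    for k in range(n):
                        if (x[i] == y[j] == z[k] == 0 and
                                x[j] == x[k] == y[i] == y[k] == z[i] == z[j] == 1):
                            x[j] = 0               # modify one entry
                            done = True
                            break
                    if done: break
                if done: break
        # Step 3: for a COMP1 pair, set a shared-0 entry of the lighter vector
        # to 1, or zero a shared-1 entry if no shared 0 exists.
        for (a, b) in C1:
            x, y = M[a], M[b]
            zero = next((i for i in range(len(x)) if x[i] == y[i] == 0), None)
            if zero is not None:
                (y if sum(x) > sum(y) else x)[zero] = 1
            else:
                one = next(j for j in range(len(x)) if x[j] == y[j] == 1)
                (x if sum(x) > sum(y) else y)[one] = 0
    return M
```

With trivial predicates that never fire, the input set is returned unchanged; with a nontrivial predicate, the repair loop runs until no violation remains or the iteration cap is hit.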
Ten 10×10 black-and-white decimal numerals, given in the first row of Figure 3.3, are
to be stored as memory patterns in the recurrent AM. The binary vectors associated to the
digits are constructed by applying a lexicographic ordering to the columns of the image intensity
³Note that such a triple necessarily exists by Theorem 2.
Figure 3.3: (a) The original numerals to be stored as memory vectors. (b) The
compatibilized characters. (c) Some distorted numerals. (d) Numerals recalled by MIS-N.
matrix, which is a binary matrix indicating the black pixels by 1's and the white ones by
0's. Then, we have 10 binary vectors, each consisting of 100 entries. It can easily be
verified that this set is not compatible. For example, the vectors associated to the digits 3
and 8 violate COMP1.
Applying the above compatibilization algorithm, we obtain the modified vectors,
which represent the modified characters given in the second row of Figure 3.3. Using this
memory set, we have designed an MIS-N via the Boolean Hebb rule and applied some
distorted versions of the original numerals, given in the third row of Figure 3.3, as initial
states. The memory vectors recalled by the network for these distorted vectors are given in
the last row of Figure 3.3.
3.2 Recurrent Associative Memory Design via Path Embedding into a Graph
Following another graph theoretical approach, a new recurrent AM design method is
described in this section which ensures the storage of all memory vectors as attractive
equilibria of a convergent DHN.
3.2.1 Proposed Method
The tool we use for representing a given memory set M = {x1, x2, . . . , xm} ⊆ {0, 1}n
here is a directed graph G = 〈V, E〉 which consists of a set V of n + 2 nodes labelled with
Figure 3.4: The graph indicating the binary vectors [0 1 0 1]T and [1 0 1 1]T as its paths
between the nodes vs and vd.
vs, v1, v2, . . . , vn, vd, and a set E ⊂ V × V of ordered pairs (vi, vj) satisfying j > i, called
increasing arcs. (It is assumed that 1 > s and d > n.)
A path P is a subset of E which provides a route from vs to vd via its elements, so
produces an increasing sequence SP of nodes starting at vs and ending at vd. Given a path
P , there exists a unique binary vector [1... xT ... 1]T , so called the node-based characteristic
vector of P , whose 1-entries are indicated by SP as vi ∈ SP ⇔ xi = 1, i = 1, 2, . . . n.
Conversely, any n + 2-dimensional binary vector having 1’s as the first and the last entries,
indicates a unique increasing node sequence in the same way. Hence, it is indeed the node-
based characteristic vector of a path in the n + 2-node graph. As an example, the graph
indicating the binary vectors [0 1 0 1]T and [1 0 1 1]T is given in Figure 3.4. Based on this
representation, we introduce our embedding procedure as follows:
Step 1. For i = 1, 2, . . . , m, augment 1's as the first and the last entries of the memory
vector xi, and then construct the graph Gi = 〈V, Ei〉 containing the single path indicated
by the node-based characteristic vector [1 (xi)T 1]T.
Step 2. Combine G1, G2, . . . , Gm in a single graph by
G = 〈V, E〉 = 〈V, E1 ∪ E2 ∪ · · · ∪ Em〉. (3.28)
Note that the resulting graph G represents all memory vectors since it contains their
corresponding paths. Besides, this representation might provide binary data compression
when G is expressed in an upper-triangular, zero-diagonal, node-to-node incidence matrix
form T:
tij = 1 if (vi, vj) ∈ E, and tij = 0 otherwise, (3.29)
which is of dimension (n+2)×(n+2), so independent of the number m of memory vectors.
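The two embedding steps and the triangular matrix (3.29) can be sketched as follows; node index 0 plays the role of vs and index n + 1 of vd (an illustrative reconstruction):

```python
def path_edges(x):
    """Increasing arcs of the single path indicated by memory vector x (Step 1).

    The vector is augmented with 1's at both ends, so the node sequence is
    vs, the positions of the 1-entries of x, and vd.
    """
    n = len(x)
    nodes = [0] + [i + 1 for i, b in enumerate(x) if b == 1] + [n + 1]
    return list(zip(nodes, nodes[1:]))

def embed(memories):
    """Union of the single-path graphs (Step 2), as the (n+2)x(n+2) matrix T."""
    n = len(memories[0])
    T = [[0] * (n + 2) for _ in range(n + 2)]
    for x in memories:
        for (i, j) in path_edges(x):
            T[i][j] = 1                  # upper-triangular: all arcs increase
    return T
```

For the two vectors of Figure 3.4, [0 1 0 1]T contributes the arcs (vs, v2), (v2, v4), (v4, vd), and the union with the path of [1 0 1 1]T has six distinct arcs.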
Given a graph G of n + 2 nodes, the problem of retrieving an embedded memory vector is
equivalent to the problem of extracting its indicating path. To achieve this, we first label the
increasing arcs of G with e1, e2, . . . , el, where l = |E|. This labelling enables us to define
the arc-based characteristic vector y ∈ {0, 1}l associated to a path P in G as ei ∈ P ⇔ yi = 1, i = 1, 2, . . . , l. Then, the node-to-arc adjacency matrix form A of G:
aij = 1 if arc ej departs from node vi, aij = −1 if arc ej arrives at node vi, and aij = 0 otherwise, (3.30)
which is of dimension (n+2)×l, makes it possible to distinguish the arc-based characteristic
vectors of paths from other l-dimensional binary vectors in an algebraic way, as stated by
the following fact proven in (Bazaraa & Jarvis, 1977).
Fact 5 An l-dimensional unipolar binary vector y satisfies
Ay = b (3.31)
if and only if y is the arc-based characteristic vector of a path in G, where b stands for the
(n + 2)-dimensional vector defined as b1 = 1, bn+2 = −1 and bi = 0 for i = 2, 3, . . . , n + 1,
so that, under the sign convention of (3.30), the source node vs supplies one unit of flow and the destination node vd absorbs it.
Figure 3.5: Block diagram of the proposed associative memory.
It is easy to see that the unipolar binary solutions of (3.31) correspond to the discrete
local minima of the positive semi-definite quadratic:
Φ(y) = (1/2)‖Ay − b‖₂² = (1/2)yTATAy − bTAy + (1/2)bTb. (3.32)
Then, initiated with an l-dimensional binary vector y0, the unipolar DHN recursion (3.5) with
W = ATA and t = ATb converges to an arc-based characteristic vector associated to
a path in G. Together with an algebraic layer performing arc-to-node conversion of the
state vector y at the output, this network yields [1 (f[x0])T 1]T as its steady-state
response, where f[·] is the association function defined by (1.1), if y0 is chosen as the
arc-based characteristic vector indicated by the node-based characteristic vector
[1 (x0)T 1]T. This operation is illustrated in Figure 3.5.
3.2.2 Simulation Results
To demonstrate the performance of the proposed method, we have generated 100 random
memory sets containing m binary vectors of dimension n and observed that, for each instance,
the resulting dynamical network had fixed points located exactly at the augmented memory
vectors, but that there occurred some fixed points other than the desired ones. The average
number of these spurious memories over 100 trials is listed in the third column AvS0 of
Table 3.6 for some n and m values. The fourth column AvS2 of Table 3.6 gives the
average number of spurious memories which are located more than 2 in Hamming distance
away from the nearest memory vector.
Table 3.6: Average number of spurious memories for some n, m values.
n m AvS0 AvS2
5 3 2.1 0
5 5 4.9 0
5 10 10.7 0
7 4 7.9 0.2
7 7 26.7 0.7
7 15 57.8 0.4
10 5 45.8 14.7
10 10 226.4 68.4
10 20 535.7 121
As verified by these results, the proposed method is superior to the conventional methods
in the sense that it can store an arbitrary collection of memory vectors as fixed points in
the resulting network. However, it cannot avoid spurious memories, because the graph G
obtained by the embedding procedure possibly includes some extraneous paths whose node-
based characteristic vectors are not elements of M. Obviously, the arc-based characteristic
vectors of such paths also minimize (3.32), hence constitute spurious memories. It has
also been observed that the average number of spurious memories caused by the method
is greater than that of the outer-product rule for large n values. On the other hand, most
of these fixed points are located in a small neighborhood of the desired ones, hence cause
smaller errors in recall.
This method can be improved by assigning weights to the increasing arcs of G and
adjusting these weights such that the desired paths have a specific length, say 1, in order
to distinguish them from the undesired ones.
CHAPTER FOUR
CONSTRUCTION OF ENERGY LANDSCAPE FOR
DISCRETE HOPFIELD ASSOCIATIVE MEMORY
An energy function-based auto-associative memory design method to store a given set
of unipolar binary memory vectors as attractive fixed points of an asynchronous discrete
Hopfield network is presented in this chapter.
4.1 Motivation
A comprehensive stability analysis for DHN presented in (Bruck & Goodman, 1988) has
shown that the asynchronous recursion (2.2) necessarily tends to a fixed point if W is
symmetric and has nonpositive diagonal entries. The proof is based on analysis of a discrete
quadratic energy function
E(x) = xTQx + cTx, (4.1)
defined on the state-space of (2.2) with Q ∈ ℝn×n and c ∈ ℝn, which is non-increasing
along the asynchronous recursion.
It can be further shown that (2.2) indeed tends to a (discrete) local minimum of (4.1) if
the diagonal entries of the weight matrix are all zero, i.e. a network consisting of a single
layer of neurons without self-feedback. On the other hand, as proven in the next section,
one can always find a symmetric DHN without self-feedback which has fixed points located
exactly at the discrete local minima of a given quadratic form defined on the binary space.
As the network model to be designed in order to perform (1.1), we consider here again
an asynchronous DHN operating on {0, 1}n according to the recursion:
xi[k + 1] = φ( Σj=1..n wij xj[k] + ti ), i ∈ {1, . . . , n} (4.2)
where W = [wij] ∈ ℝn×n is the weight matrix and t = [ti] ∈ ℝn is the threshold vector.
Assuming that the given memory vectors are at least 2-Hamming distance away from
each other, the design of such a finite-state recurrent network, while ensuring perfect
storage, is in fact equivalent to the design of its energy function, i.e. determining coefficients
(Q, c) in (4.1), under the following condition.
Condition 6 x ∈ M implies that x is a strict local minimum of (4.1), i.e. E(x) < E(y) for
all y ∈ {0, 1}n such that d(x, y) = 1.
Instead of dealing directly with the considered DHN recursion, we follow this indirect
energy function-based approach in this chapter.
4.2 Discrete Quadratic Design
To simplify the notation, we first note that the n × n matrix Q in (4.1) can be considered
without loss of generality as symmetric, since for an arbitrary matrix P ∈ ℝn×n, there exists
a symmetric counterpart R = (P + PT)/2 such that xTRx = xTPx for all x ∈ ℝn.
Due to the unipolarity of the discrete variable x ∈ {0, 1}n, the linear term cTx can
be further expressed as the quadratic term xT diag(c1, c2, . . . , cn) x, so that E(·) is reformulated as a
single quadratic term
E(x) = xT Q̄ x (4.3)
on {0, 1}n, where Q̄ is a symmetric real matrix equal to Q + diag(c1, c2, . . . , cn). Expanding
(4.3) as the sum Σi=1..n Σj=1..n q̄ij xi xj and then using the symmetry of Q̄ provides an
alternative notation
E(x) = a(x)T w (4.4)
which is linear in the coefficient vector w = [q̄11 · · · q̄1n q̄22 · · · q̄2n · · · q̄nn]T ∈ ℝ(n²+n)/2,
obtained by a lexicographic ordering of the coefficients q̄ij. The column vector
a(x) represents the multiplicative nonlinearity of (4.3) in x:
a(x) := 2 [x1²/2 x1x2 · · · x1xn x2²/2 x2x3 · · · x2xn · · · xn²/2]T. (4.5)
Expressing (4.3) as a weighted sum of the parameters in this way enables computation of the
coefficient vector w∗ ∈ ℝ(n²+n)/2 under linear inequality constraints, so as to construct a Q̄ such
that (4.3), and consequently (4.1), satisfies Condition 6 for a given set of memory vectors
M ⊆ {0, 1}n.
We assume throughout the design that the condition
d(u, v) > 1 ∀(u, v) ∈ M × M, u ≠ v (4.6)
holds for a given memory set M. Then, in order to embed each memory vector as a strict
local minimum of the desired quadratic (4.1), as suggested by Condition 6, we obtain for
each memory vector p ∈ M the set of strict linear inequalities
a(p)T w < a(y)T w, y ∈ B1(p) − {p} (4.7)
to be solved for the parameter vector w. Here, "−" stands for the set difference and B1(u)
is defined as the 1-Hamming neighborhood of u, i.e. {x ∈ {0, 1}n : d(u, x) ≤ 1}. We
denote the polyhedral cone induced by these n linear inequalities by Sp. Since the desired
coefficient vector w∗ lies within the intersection
S = ∩p∈M Sp, (4.8)
its search is indeed the feasibility problem of the homogeneous linear inequalities which
induce the polyhedral cone S. By rearranging (4.7) and incorporating all inequalities
associated to all memory vectors p1, . . . , p|M|, we obtain the homogeneous inequality system
as
A(M)w < 0, (4.9)
where
A(M) := [N(p1); . . . ; N(p|M|)], (4.10)
and N(p) is the n × (n² + n)/2 matrix whose j-th row is determined as a(p)T − a(yj)T,
where yj is obtained from p by complementing its j-th entry.
4.2.1 Original Design Method
If the given memory set yields a feasible inequality system, then the coefficient matrix Q̄
of the desired quadratic (4.3), and so the coefficients (Q, c) of (4.1) which satisfy A1, can be
easily determined from a solution of (4.9). Hence, in our original design method, we look
for a solution of (4.9) by directly applying an available method, such as linear programming
(Luenberger, 1973) or Newton's method (Bertsekas, 1995). Having determined the discrete
quadratic energy function (4.1), which indicates the memory set as its local minima set,
the construction of a dynamical system whose limit points correspond to these local minima
completes the recurrent AM design. We describe this procedure in the following corollary.
Corollary 1 The fixed points of the asynchronous recursion (4.2) correspond to the local
minima of (4.1) for the weight matrix W = −2Q and the threshold vector t = −c.
Moreover, the recursion designed in this way is convergent; namely, for any initial state it
converges to one of its fixed points.
Proof: The state vector x of a finite-state dynamical system converges to a local
minimum of a discrete function E(x) if every state transition provides a decrement in E(x).
The proof is based on imposing this condition on the DHN dynamics, whose state-space is
the unipolar binary space {0, 1}n. Taking into account that only one entry of the state vector
is allowed to change at a single time step, we analyze the desired behavior of the network
in two separate cases:
1. Suppose xi = 0 and the i-th entry is updated at time instant k; then the value of this entry
in the next step should be:
xi[k + 1] = 1 if (x[k] + ei)TQ(x[k] + ei) + cT(x[k] + ei) < (x[k])TQx[k] + cTx[k], and 0 otherwise, (4.11)
where ei stands for the i-th unit vector. Since the diagonal entries of Q are all zero,
we rearrange (4.11) and formulate it as
xi[k + 1] = φ( −2 Σj=1..n qij xj[k] − ci ), (4.12)
where φ(·) is the unit-step nonlinearity.
2. Suppose now that xi = 1 and the i-th entry is updated at time instant k. Then we write:
xi[k + 1] = 0 if (x[k] − ei)TQ(x[k] − ei) + cT(x[k] − ei) < (x[k])TQx[k] + cTx[k], and 1 otherwise, (4.13)
which can be expressed exactly as (4.12).
Comparing (4.12) with (4.2), we conclude that the desired network can be obtained by
choosing W = −2 · Q and t = −c. The convergence follows from the well-known result of
Bruck and Goodman (Bruck & Goodman, 1988), as Q here is zero-diagonal.
Observe from the proof of Corollary 1 that the resulting network is an energy-minimizing
network, in the sense that a state transition is accepted if and only if it causes a decrement
in (4.1). Since a point in the 1-Hamming neighborhood of a local minimum x∗ has strictly
greater energy by the construction of (4.1), we conclude that each fixed point of the
network which corresponds to a memory vector is attractive. This implies that for each y
in the 1-Hamming neighborhood of a fixed point x∗, there exists an index i ∈ {1, 2, . . . , n}
such that the state vector necessarily converges to x∗ in a single step when the network is
initiated by y and the i-th neuron is updated first. By deciding on a random update order, we
obviously cannot ensure this convergence, because there is no guarantee that the i-th neuron
will be updated first in the case mentioned above. For this purpose, as we did in deriving
the Boolean Hebb rule in the previous chapter, we propose the following update order to be
followed by the resulting network, which ensures the correction of each 1-bit distortion of
the memory vectors.
Attempt to update the j-th neuron for the current state vector for j ∈ {1, . . . , n}. If any of
the state transitions leads to a fixed point¹, then accept that transition. Otherwise choose
an arbitrary transition.
4.2.2 Applicability of the Original Method
It is evident by the above derivation that the feasibility of (4.9), i.e. non-emptiness of the
polyhedral cone S, is necessary and sufficient to embed all memory vectors as attractive
fixed points of DHN, hence the success of the method is totally dependent on the given
memory set M , which is the only information used in the construction of the inequality
system (4.9). Although we know that S might be an empty set for some M , yet we can only
be sure of this by constructing (4.9) and then checking for its feasibility by attempting to
¹A point x ∈ {0, 1}n is a fixed point of the DHN if and only if Φ(Wx + t) = x, where Φ(·) is the diagonal transformation from ℝn to {0, 1}n defined by Φ(u) = [φ(u1) · · · φ(un)]T.
solve it. A simple result on the feasibility of the inequality system (4.9) is provided by the
following fact.
Fact 6 The inequality system (4.9) is feasible for any memory set containing a single, yet
arbitrary, binary vector p ∈ {0, 1}n.
Proof: Let us define u = x − p and observe that the positive definite quadratic Q(u) =
‖u‖₂² on {0, 1}n possesses a unique strict local minimum at u = 0. Then, p is the (unique)
strict local minimum of the quadratic P(x) = Q(x − p).
Although the original method described in Section 4.2.1 is not applicable to a
memory set which yields an infeasible inequality system, we extend this method in the
following subsection to carry on the design even if (4.9) is infeasible.
4.2.3 An Extension of the Method
In this subsection, we assume that the system (4.9) of n · |M | homogeneous inequalities has
no solution for a given M which satisfies (4.6). As no discrete quadratic energy function
possessing such a memory set as strict local minima exists in this case, the best one can do is
to construct a discrete piecewise quadratic function instead, to be minimized by a modified
version of DHN. For this purpose we need to partition the inequality system (4.9) into two
feasible systems as A1w < 0 and A2w > 0. Such a partitioning is always possible by the
following constructive proposition, since (4.9) contains no zero row.
Proposition 1 If A ∈ ℝl×k has no zero row, then the following algorithm provides a
w∗ ∈ ℝk such that
Σi=1..k aui wi∗ < 0 ∀u ∈ I1,
Σi=1..k avi wi∗ > 0 ∀v ∈ I2, (4.14)
where I1 and I2 are two disjoint integer sets with I1 ∪ I2 = {1, . . . , l}.
Step 0: Choose an arbitrary w0 ∈ ℝk. Construct three matrices A1, A2 and A3, each of
which consists of the rows of A whose inner products with w0 are negative, positive and zero,
respectively. Set k = 0.
Step 1: Let u be the number of rows of A3. If u = 0, then set w∗ = wk and stop. If not,
choose an arbitrary i ∈ {1, . . . , u} and a sufficiently small positive ε such that A1(wk +
ε · ai) < 0 and A2(wk + ε · ai) > 0, where ai is the transpose of the i-th row of A3. Set
wk+1 = wk + ε · ai.
Step 2: Augment the rows of A3 whose inner products with wk+1 are negative (positive) to
A1 (to A2). Delete these rows from A3, increment k by 1 and return to Step 1.
Proof: The algorithm finds a point which does not belong to any of the hyperplanes
defined by the rows of A; it is indeed an escape procedure from the null-spaces of the rows
that are orthogonal to the initial choice w0. If at any step k of the algorithm the vector
wk is orthogonal to a row of A, say (al)T (which is then a row of A3 at step k), then the
vector wk+1 = wk + ε · al calculated at Step 1 obviously does not belong to the hyperplane
represented by (al)Tw = 0 for any ε ≠ 0, since al ≠ 0 by assumption. On the other hand,
choosing ε ≠ 0 sufficiently small in magnitude guarantees that the wk+1 obtained in this
way does not belong to the hyperplanes defined by the rows of A1 or A2, as wk was indeed
in an open half-space bounded by each of these hyperplanes. So the algorithm eventually produces
a vector w∗ which is not orthogonal to any row of A. Note that this result also establishes
that A can be partitioned as suggested in (4.14).
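The algorithm of Proposition 1 can be sketched as below; the choice of ε follows the proof, taking half of the largest step that preserves the sign of every non-orthogonal row (an implementation detail not fixed by the statement above):

```python
def partition_rows(A, w0=None):
    """Proposition 1: perturb w until no row of A is orthogonal to it.

    Returns (w, I1, I2): rows in I1 have negative inner product with w,
    rows in I2 positive; together they cover all row indices of A.
    """
    k = len(A[0])
    w = list(w0) if w0 is not None else [1.0] * k
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    while True:
        zero = [r for r in range(len(A)) if dot(A[r], w) == 0]
        if not zero:
            break
        ai = A[zero[0]]                      # escape direction: an orthogonal row
        # eps small enough to keep the sign of every non-orthogonal row
        eps = 1.0
        for r in range(len(A)):
            d, s = dot(A[r], w), dot(A[r], ai)
            if d != 0 and s != 0:
                eps = min(eps, 0.5 * abs(d) / abs(s))
        w = [wi + eps * a for wi, a in zip(w, ai)]
    I1 = [r for r in range(len(A)) if dot(A[r], w) < 0]
    I2 = [r for r in range(len(A)) if dot(A[r], w) > 0]
    return w, I1, I2
```

Each pass makes the chosen orthogonal row non-orthogonal without disturbing the signs of the others, so at most l passes are needed, mirroring the argument in the proof.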
When the algorithm is applied to A(M) in (4.9), which is now considered as infeasible,
the matrices A1 and A2 produced by the algorithm induce two disjoint feasible subsystems
of the inequality system (4.9). Although the coefficient vector w∗ produced by the algorithm
satisfies all inequalities induced by A1, as A1w∗ < 0, the inequalities induced by A2 are all
violated: A2w∗ > 0. Consequently, the former inequality system gives rise to the quadratic
energy landscape coefficients (Q, c) constructed from w∗, while the latter yields (−Q, −c),
constructed from −w∗.
Recall from the construction of A(M) described just below (4.9) that each inequality in
(4.9), thus each row of A1 and A2, imposes a restriction on the energy of a specific
vector y ∈ B1(x) to satisfy E(x) < E(y), where x is a memory vector. Let D̄ denote the
set of binary vectors restricted in this way by the inequality subsystem A2w < 0, and let
D := {0, 1}n − D̄. Then, we conclude by the above discussion that each memory vector is
a strict local minimum of the piecewise quadratic function
EPQ(x) = xTQx + cTx if x ∈ D, and EPQ(x) = −xTQx − cTx if x ∈ D̄, (4.15)
where the coefficients (Q, c) are calculated by using w∗ as described at the beginning of
this section.
The best performance a conventional asynchronous symmetric DHN can achieve as a
binary associative memory is indeed provided by the original design method described in
Section 4.2.1. However, in the present case, no regular quadratic form (4.1) satisfying
A1 exists; thus there exists no asynchronous symmetric DHN having attractive fixed points
located at the memory vectors. A modification of (4.2) thus becomes necessary to minimize
the discrete piecewise quadratic energy function (4.15), which we have constructed instead
of (4.1) in this case.
To minimize a given continuous piecewise quadratic, the idea of choosing the weights
and thresholds of a continuous recurrent network dependent on the state vector is not new
(Park et al., 1993). But, to our knowledge, no asynchronous recursion has been proposed
for minimizing a discrete piecewise quadratic form yet. To minimize a discrete piecewise
quadratic function of the form (4.15), we propose a generalized version of (4.2) with state-
dependent weights and thresholds, and investigate its qualitative performance below.
Definition 4 The generalized version of the recursion (4.2) given by
x̃i[k] = φ( h[x[k]] ( Σj=1..n wij xj[k] + ti ) ), (4.16)
xi[k + 1] = ((1 − h[x̃[k]])/2) · xi[k] + ((1 + h[x̃[k]])/2) · x̃i[k], (4.17)
where h[·] : {0, 1}n → {−1, 1} is a discrete function that separates a subset D ⊆ {0, 1}n
from its complement D̄ as
h[u] = 1 if u ∈ D, and h[u] = −1 if u ∈ D̄, (4.18)
is called the Constrained One-Nested Discrete Hopfield Network (CON-DHN).
We use the term constrained here to point out the additional constraint imposed by (4.17)
on the original recursion (4.2), and the term one-nested for the parameter control
mechanism h[·], which can be realized as a discrete multi-layer perceptron (Rosenblatt,
1962), bringing an additional nonlinearity nested in the activation function φ(·).
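One asynchronous CON-DHN update, per (4.16)-(4.17), can be sketched as follows; h is supplied as a function from states to {−1, +1}, and the toy D used in the usage note is an assumption for illustration only:

```python
def con_dhn_step(W, t, h, x, i):
    """One CON-DHN update of neuron i, following (4.16)-(4.17).

    h maps a state (tuple of 0/1 entries) to +1 on D and -1 on its complement.
    """
    # (4.16): candidate value, with the activation argument scaled by h[x]
    u = h(tuple(x)) * (sum(W[i][j] * x[j] for j in range(len(x))) + t[i])
    cand = list(x)
    cand[i] = 1 if u >= 0 else 0
    # (4.17): accept the candidate only if it lies in D, i.e. h[cand] = +1;
    # otherwise keep the previous state
    return cand if h(tuple(cand)) == 1 else list(x)
```

For instance, with the (illustrative) region D = {states with at most one 1-entry}, a transition into D is accepted while a transition out of D is rejected and the state is held.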
Corollary 2 The asynchronous recursion (4.16-4.17) has fixed points located at the local
minima of (4.15) for the weight matrix W = −2 · Q and the threshold vector t = −c.
Moreover, these fixed points are all attractive.
Proof: Suppose first that the state vector x[k] of the network designed in this
way is in D at some time instant k. Then h[x[k]] is equal to 1, and the right-hand side of
(4.16) is the same as that of (4.2), which implies that any state transition causing a
decrement in the quadratic form (4.1) is accepted and the outcome x̃[k] is taken as the
new state candidate. If this candidate is also in D, then its value is assigned as the new
state vector x[k + 1], so the network operates as in the unconstrained case (4.2), which was
proven to be an energy minimizer in Corollary 1. However, if such a transition leads to a
point in D̄, i.e. if h[x̃[k]] = −1, then the second equation (4.17) imposes x[k + 1] = x[k]. In
this way CON-DHN restricts the state vector to stay in D. Then, for any initial state vector
in D, the state vector of the network evolves in D as guaranteed by (4.17).
On the other hand, if an initial state vector is in \bar{D}, i.e. h[x[0]] = −1, then the right-hand
side of (4.16) yields a state transition which causes a decrement in the quadratic form
associated with (−Q, −c), i.e. an increment in (4.1). If the new state candidate is in D, then x[1]
assigned by (4.17) is in D and the network operates as in the previous case in further time
steps. Otherwise, the candidate is not accepted as the new state, and in the next time step
(4.16) produces another candidate, until a point in D is obtained. Such a point is necessarily
produced by (4.16) at some time step because, by construction of D, any point z in \bar{D} has
a 1-Hamming-distance neighbor in D which has lower energy than z. This also establishes
the attractiveness of the fixed points, as one of these neighbors is indeed a local minimum of
(4.15).
Since the recursion (4.16-4.17) designed in this way is also an energy minimizing
network, which converges to a local minimum of (4.15), we can conclude that the resulting
CON-DHN corrects all possible 1-bit distortions of the original memory vectors, with the
update order of the neurons chosen as proposed at the end of the previous section. The
summary of the overall design method proposed in this section is illustrated as a flowchart
in Figure 4.1.
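The constrained update (4.16)-(4.17) can be sketched in a few lines. This is an illustrative sketch, not the thesis's implementation: the unipolar hard limiter for φ and the predicate `in_D` (standing in for the MLP that realizes h[·]) are assumptions.

```python
import numpy as np

def con_dhn_step(x, i, W, t, in_D):
    """One asynchronous CON-DHN step (4.16)-(4.17) for neuron i.

    in_D realizes h[.]: it returns True iff its argument lies in D,
    in which case h = +1, otherwise h = -1.
    """
    h_x = 1 if in_D(x) else -1
    # (4.16): candidate state; outside D the sign flip on the weighted
    # sum drives the trajectory back toward D
    u = h_x * (W[i] @ x + t[i])
    cand = x.copy()
    cand[i] = 1 if u > 0 else 0        # phi: unipolar hard limiter (assumed)
    # (4.17): accept the candidate only if it lies in D
    return cand if in_D(cand) else x

# Toy parameters: with D = {0,1}^2 the step reduces to the plain DHN (4.2)
W = np.array([[0.0, 2.0], [2.0, 0.0]])
t = np.array([-1.0, -1.0])
print(con_dhn_step(np.array([0, 1]), 0, W, t, lambda v: True))   # [1 1]
```

When `in_D` excludes the candidate, the same call returns the current state unchanged, mirroring the rejection mechanism of (4.17).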
The necessity of the algebraic computations h[x[k]] and h[\tilde{x}[k]], which are used in the update
of the state vector by (4.16)-(4.17), can be justified as follows: the CON-DHN performs a more
complicated task than that of the conventional DHN, as it can be designed to recall each
element of an arbitrary M from its distorted versions, even when M does not comply with
the restriction (4.9). This relaxation costs an additional Multi-Layer Perceptron (MLP) to
perform h[·] in addition to the computations required by conventional DHN recursion. The
proposed network is illustrated in Figure 4.2.
From the information storage point of view, the weights and thresholds of this
MLP need to be known in addition to the parameters W and t of the conventional recurrence,
which have been determined by Corollary 2. In order to minimize the amount
of this additional data that characterizes MLP, one should adjust the coefficient vector w
in the energy landscape design such that the number of hyperplanes which separate D
from \bar{D} is minimum. We note that an approximate solution to this problem is to minimize
the cardinality of \bar{D} in the design, which can be achieved approximately by finding a least-squares
solution to the infeasible inequality system (4.9) by Han's method (Han, 1980), or by its
variations, e.g. (Pınar & Chen, 1999) or (Bramley & Winnicka, 1996). An exact solution to
[Flowchart: given M ⊆ {0,1}^n satisfying (4.6), construct the inequality system A(M)w < 0. If it is feasible, solve it, obtain E(x) = x^T Q x + c^T x, and use Corollary 1 to find a DHN. If not, apply Proposition 1 to obtain the piecewise quadratic energy E_PQ(x), design an MLP to implement h(·), and use Corollary 2 to find a CON-DHN.]
Figure 4.1: An algorithmic summary of the overall design method.
[Block diagram: the state x[k] is fed both to a weight control mechanism realizing h[·] and to the weighted sum Σ with parameters W and t; the sum passes through the activation φ(·) and a unit delay z^{−1} to produce x[k+1].]
Figure 4.2: Block diagram of the extended network.
the following problem would obviously minimize the number of parameters of the MLP, so
would make our design more efficient.
Problem 1 Given D ⊆ {0,1}^n and \bar{D} = {0,1}^n − D, find a set of hyperplanes with
minimum cardinality which separates D from \bar{D}.
We leave this problem open and proceed with the simulation results.
4.3 Computer Experiments
4.3.1 Applicability and Capacity of the Original Design Method
The original design method proposed in Section 4.2.1 is applicable only for a memory set
which satisfies (4.6), and only when the linear homogeneous strict inequality system (4.9)
derived from the local minimality conditions is feasible. Hence, the probability that a
given memory set M yields a feasible inequality system is a measure of the performance
of the design method, provided that M satisfies (4.6), which is the case in many applications.
To quantify the applicability of the original method, we have investigated the mildness of these
restrictions.
For several (|M|, n) pairs, we have randomly generated 100 memory sets, each containing
|M| unipolar binary vectors of dimension n and satisfying (4.6), and constructed the
homogeneous inequality system associated to each set as described by (4.9). The
percentages (P%) of memory sets which resulted in a feasible inequality system are given
in Table 4.1.

Table 4.1: Percentages of memory sets that yielded feasible inequality systems.

  n   |M|   P%      n   |M|   P%
 10     5  100     50    25  100
 10    10   90     50    50  100
 10    15    9     50    75   78
 10    20    0     50   100    5
 20    10  100    100    50  100
 20    20  100    100   100  100
 20    30   62    100   150   86
 20    40    0    100   200    8
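The feasibility test behind Table 4.1 can be sketched as follows. The actual design matrix of (4.9) is not reproduced here, so small stand-in matrices are used; by homogeneity, the strict system A w < 0 is solvable exactly when the LP constraint set A w ≤ −1 is, and SciPy's `linprog` (assumed available) decides that.

```python
import numpy as np
from scipy.optimize import linprog

def strictly_feasible(A):
    """Decide whether the homogeneous strict system A w < 0 is solvable.

    Any solution can be scaled, so A w < 0 is feasible iff A w <= -1 is;
    a zero-objective LP with unrestricted variables settles the question.
    """
    m, n = A.shape
    res = linprog(c=np.zeros(n), A_ub=A, b_ub=-np.ones(m),
                  bounds=[(None, None)] * n)   # variables unrestricted in sign
    return res.status == 0                     # 0: solution found, 2: infeasible

# Toy stand-ins for the design matrix of (4.9):
A_feas = np.array([[1.0, -2.0], [-3.0, 1.0]])            # w = (1, 1) works
A_infeas = np.array([[1.0, 0.0], [-1.0, 0.0]])           # needs w1 < 0 and -w1 < 0
print(strictly_feasible(A_feas), strictly_feasible(A_infeas))   # True False
```

Repeating such a test over 100 randomly generated memory sets per (|M|, n) pair yields the percentages reported in the table.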
It can be observed from Table 4.1 that almost all memory sets with ratio |M|/n less
than 1 yield a feasible inequality system, so our original method is applicable for such sets.
Moreover, as n increases, this critical ratio also increases, meaning that our method
performs better for large n. The bound on this ratio (which is 1 in the worst case),
under which our method almost ensures the desired recurrent AM, is much greater than that
of the conventional outer product rule which ensures the storage of only 0.14n arbitrarily
chosen memory vectors as fixed points (without ensuring their attractiveness). Assuming
that the memory vectors are mutually orthogonal, the projection learning rule is capable of
embedding up to n binary vectors as attractive fixed points to a DHN. However, this bound
is not comparable to ours since orthogonality is a rather strict restriction on the memory
vectors. In other words, the projection learning rule has an acceptable performance when
applied to some specific memory sets among all memory sets with |M |/n < 1. The eigen-
structure method, which is probably the most effective design method yet, can store an
arbitrary binary memory set as attractive fixed points. As introduced at the end of Chapter 2,
this method has been proposed for the design of a continuous Hopfield network whose state
space is the n-dimensional hypercube [0, 1]n, including its interior region. The attractiveness
of a fixed point is defined on this space but not on {0,1}^n. The method does not guarantee
the correction of errors caused by a bit reversal of the memory vectors, despite providing
attractiveness in the continuous sense. The following design example demonstrates the
superiority of the proposed procedure to the former methods.
4.3.2 A Design Example
Consider that a memory set consisting of the following four vectors is to be stored as
attractive fixed points of a recurrent neural network.
x1 = [0 1 0 0 1]T , x2 = [0 1 1 1 1]T , x3 = [1 0 1 0 1]T , x4 = [1 1 0 1 1]T .
Note that these memory vectors satisfy (4.6). By applying the proposed method and making
use of linear programming for the solution of the homogeneous linear inequalities, we have
obtained the weight matrix and the threshold vector of the asynchronous DHN recursion
(4.2) as
W =
[    0    −15.2   −7.5    7.6    5
 −15.2      0    −15.2   15.1    9.2
  −7.5   −15.2      0     7.6    5
   7.6    15.1     7.6     0    −6.1
    5      9.2     5     −6.1     0  ],

t = [ −6.3  −3.3  −6.3  12.8  −1.3 ]^T.
It can be verified that each memory vector is an attractive fixed point of this AM. The
projection learning rule and the eigen-structure method (for design parameter τ = 0.5)
could also store each vector as a fixed point of their respective recurrent networks, while
the outer product rule could not store any of these vectors at all. By injecting each of the
32 binary vectors of dimension 5 into the networks obtained by these methods, we have also
checked their performance in terms of creating spurious states. This simulation has shown
that our method caused no spurious memory while the three extraneous binary vectors
[0 0 1 1 0]T , [1 0 0 1 0]T , [1 1 1 1 1]T were stored as fixed points in the network obtained
by the projection learning rule. The outer product rule also stored two spurious memories
73
Figure 4.3: Set of characters which are embedded by the original design method as memory
vectors to DHN.
[0 1 0 1 1]T and [1 0 1 0 0]T . For the design parameter τ = 0.5, the eigen-structure method
created four spurious memories, namely [0 0 1 0 1]T , [0 1 0 1 1]T , [1 0 0 0 1]T , [1 0 1 1 1]T
but they could be avoided by increasing τ . However, this effect also prevented the desired
memory vectors from being stored. As an example, for τ = 1, no binary fixed point could
be stored as a fixed point to the network by this method.
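The exhaustive check used in this comparison — injecting every binary vector and testing whether any single-neuron update changes it — can be sketched as below. The two-neuron network here is a hypothetical toy, not the designed five-neuron example above, whose printed parameters are not re-verified here.

```python
import itertools
import numpy as np

def fixed_points(W, t):
    """Enumerate binary fixed points of the asynchronous DHN (4.2).

    A state x is fixed iff no single-neuron update x_i <- phi((W x + t)_i)
    changes it, with phi taken as a unipolar hard limiter (assumed).
    """
    n = len(t)
    fps = []
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        u = W @ x + t
        if all(x[i] == (1 if u[i] > 0 else 0) for i in range(n)):
            fps.append(bits)
    return fps

# Hypothetical toy network storing (0,0) and (1,1) with no spurious memory:
W = np.array([[0.0, 2.0], [2.0, 0.0]])
t = np.array([-1.0, -1.0])
print(fixed_points(W, t))   # [(0, 0), (1, 1)]
```

Any fixed point outside the memory set reported by such a scan is, by definition, a spurious memory.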
4.3.3 Character Recognition and Reconstruction
We applied the design procedure for the set of characters given in Figure 4.3. The
lexicographic orderings of these 13 × 10 black-and-white characters, where 1 and 0 denote a
black and a white pixel, respectively, have been considered as the given memory vectors. It
has been observed that this memory set satisfies (4.6). These 130-dimensional vectors have
resulted in a consistent inequality system, so we have generated a regular quadratic energy
function by solving it. The fixed points of DHN obtained as stated by Corollary 1 were
identical with the original memory vectors. It can be verified that DHNs designed by the
outer product method and the projection learning rule cannot store this information without
any modification on the original characters. The network designed by the eigen-structure
method stores all memory vectors but it is incapable of correcting most of the errors caused
by 1-bit reversals on the original characters. Moreover, the convergence of the state vector
of this network to some non-binary fixed points in [0, 1]n was observed for some initial
conditions. This, of course, cannot be considered as correct behavior.
Although the original method ensures only the correction of 1-bit errors, many 10-bit
distortions, even some 20-bit distortions, of the memory vectors can be corrected by the
resulting DHN, i.e. the basins of attraction of some memory vectors include even some
of their 20-Hamming-distance neighbors. Some of these corrections are illustrated in Figure 4.4.

Figure 4.4: Reconstructions obtained by the resulting DHN.

Figure 4.5: Three memory patterns used in the classification application.
Interestingly, no spurious memory was detected during the simulations of the DHN designed
for this memory set, despite the fact that the method does not devise any procedure to avoid
spurious memories.
4.3.4 A Classification Application
We have also tested the performance of the recurrent AM as a classifier. The classification
network in this experiment consists of a pre-processing network cascaded to a recurrent AM
designed by our method for the lexicographic orderings of the three 7 × 7 memory patterns
in Figure 4.5. The pre-processing network is used here to scan the input image with a 7× 7
window and then to obtain the lexicographic ordering of each window. A distorted version
of a map, shown in Figure 4.6a, which contains three types of patterns, is presented to the
classification network. The network was able to classify the two recognized patterns in the
map (see Figure 4.6b), and the other patterns were associated with the blank, which had also
been introduced to the network as a memory pattern in the design phase. This example has
shown that the classification task can be performed by the proposed recurrent AM even in
a noisy environment, besides its general usage in pattern recognition applications.
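The pre-processing stage can be sketched as a window scan. The non-overlapping stride and the row-major (lexicographic) flattening are assumptions of this sketch.

```python
import numpy as np

def scan_windows(image, win=7, stride=7):
    """Slide a win x win window over a binary image and return the
    lexicographic (row-major) ordering of each window as a vector."""
    rows, cols = image.shape
    vectors = []
    for r in range(0, rows - win + 1, stride):
        for c in range(0, cols - win + 1, stride):
            vectors.append(image[r:r + win, c:c + win].flatten())
    return np.array(vectors)

# A 14x14 test image splits into four non-overlapping 7x7 windows:
img = np.zeros((14, 14), dtype=int)
img[:7, 7:] = 1                       # top-right block all black
vecs = scan_windows(img)
print(vecs.shape)                     # (4, 49)
```

Each returned 49-dimensional vector is then fed to the recurrent AM, which relaxes it to the nearest stored pattern (or to the blank).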
Figure 4.6: The input map (a) and the classification result (b).
4.3.5 An Application of the Extended Method
Finally, we present the results of another simple example to demonstrate the extension of
the method described in Section 4.2.3.
Consider a memory set consisting of the following vectors:
x1 = [0 0 0 0 0]T , x2 = [0 0 1 1 1]T , x3 = [0 1 0 1 1]T , x4 = [0 1 1 0 0]T ,
x5 = [1 0 0 1 1]T , x6 = [1 1 0 0 0]T , x7 = [1 1 1 0 1]T , x8 = [1 1 1 1 0]T .
The original method cannot be applied for this memory set, because the inequality system
(4.9) is infeasible and, thus, there exists no quadratic form (4.1) that has strict local minima
located at these vectors. By applying Proposition 1, we obtain a coefficient vector w∗ which
partitions the design inequalities as in (4.14). From this coefficient vector, we then construct
the piecewise quadratic form (4.15) with
Q =
[    0    −4.7    0.9   −0.9   −0.9
  −4.7      0    −4.7    2.9    2.9
   0.9   −4.7      0    −0.8   −0.8
  −0.9    2.9   −0.8     0    −7.5
  −0.9    2.9   −0.8   −7.5     0  ],

c = [ 7.6  −6.2  7.7  4.6  4.6 ]^T,
and
\bar{D} = { [0 0 0 1 1]^T, [0 1 0 0 0]^T, [0 1 1 1 1]^T, [1 1 0 1 1]^T, [1 1 1 0 0]^T, [1 1 1 1 1]^T }.
It can be easily verified that each memory vector is a strict local minimum of this discrete
function. The weight matrix and the threshold vector of CON-DHN are determined as
W = −2 · Q and t = −c, respectively, according to Corollary 2. The separating function
h[·] is finally realized by a discrete multi-layer perceptron, which responds with 1 to a
vector in D, and with −1 otherwise.
It has been observed that each memory vector has been stored as an attractive fixed point
of the resulting CON-DHN by the extended method, providing perfect storage. By initiating
the network from each of the 32 possible state vectors, we have observed that no spurious memory
occurred in the resulting network.
CHAPTER FIVE
MULTI-STATE RECURRENT ASSOCIATIVE MEMORY
DESIGN
In this chapter, a design procedure for an AM operating on multi-valued pattern space
is presented. A generalized DHN, namely complex-valued multi-state Hopfield network is
introduced, and its design is carried out in a way similar to the one followed in the previous
chapter.
5.1 Motivation
Though many methods have been proposed aiming to obtain a DHN as a binary associative
memory, only a few papers have appeared in the literature that generalize the design to
the non-binary case, i.e., for cases where the memory vectors are allowed to take integral
values other than −1 and 1.
To be able to recall n-dimensional integral memory vectors with entries in {1, 2, . . . , K}, the
conventional Hopfield model obviously needs to be generalized such that the state space of
the network contains I := {1, 2, . . . , K}^n. A straightforward way to achieve this is through
generalizing the conventional bi-state activation function to a K-stage quantizer as proposed
and analyzed in (Zurada et al., 1996). By replacing the activation functions of neurons in
the conventional Hopfield network with this nonlinearity, remarkable steps have been made
towards the design of multi-state associative memories (Shankmukh & Venkatesh, 1995),
(Elizade & Gomez, 1992), (Mertens et al., 1991). It has also been shown in (Nadal & Rau,
1991) that the maximum number of integral patterns that can be stored in such a network by
any design procedure is proportional to n · (K − 1) · f(K), where f(K) is of order 1.
An alternative dynamical finite-state system operating on I has been introduced in
(Jankowski et al., 1996) as the complex-valued multi-state Hopfield network. This
model is built on the complex neuron model (Aizenberg & Aizenberg, 1992), which employs the
complex-signum nonlinearity. Each neuron in this autonomous, single-layer, connectionist
network simply takes a complex weighted sum of the previous state values and passes it through
the complex-signum activation function to produce its next state. The complex-signum is
a K-stage phase quantizer for complex numbers and is defined as:

csign_K(u) :=
  e^{i·0}              if 0 ≤ arg(u) < 2π/K
  e^{i·2π/K}           if 2π/K ≤ arg(u) < 4π/K
  ⋮
  e^{i·2π(K−1)/K}      if (K−1)·2π/K ≤ arg(u) < 2π    (5.1)
Note that, by virtue of this nonlinearity, each state of the network is allowed to take one
of the equally spaced K points on the unit circle of the complex plane (see Figure 5.1). Each
neuron carries integral information modulated as the phase angle of its unit-magnitude
complex-valued state, which constitutes an element of the state vector of the dynamical
network. Hence, not the original integral vectors, but their transformed versions can be
stored and recalled by this network. This injective transformation, which basically maps
each entry of a vector in the integral lattice I as a point on the unit circle of the complex
plane, is expressed as:
p_K(·) : {1, 2, . . . , K}^n → { e^{i(2π/K)j} : j ∈ {0, . . . , K−1} }^n

p_K(u) := [ e^{i(2π/K)u_1}  e^{i(2π/K)u_2}  · · ·  e^{i(2π/K)u_n} ]^T.    (5.2)
The range of p_K(·), whose elements will be called the transformed vectors in the rest of
this thesis, can also be considered as the co-domain of the transformation. In this case, the
usage of the complex-valued multi-state Hopfield network, which actually operates on the transformed
Figure 5.1: An illustration of csign8(u) for u = −1.2 − 0.5i.
vector space, is meaningful in processing integral vectors. Each state of the network can be
uniquely transformed to an integral vector in I via p_K^{−1}(·).
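The quantizer (5.1) and the transformation (5.2) with its inverse can be sketched as follows; the boundary handling follows the half-open sectors of (5.1), and the convention of mapping phase 0 back to the integer K is an assumption of this sketch.

```python
import cmath
import math

def csign(u, K):
    """K-stage phase quantizer (5.1): snap arg(u), taken in [0, 2*pi),
    down to the nearest lower multiple of 2*pi/K and return the phasor."""
    j = int((cmath.phase(u) % (2 * math.pi)) * K / (2 * math.pi))
    return cmath.exp(1j * 2 * math.pi * j / K)

def p(u, K):
    """Transformation (5.2): integral vector -> unit-circle phasors."""
    return [cmath.exp(1j * 2 * math.pi * ui / K) for ui in u]

def p_inv(x, K):
    """Inverse transformation: phasors back to entries in {1, ..., K}."""
    out = []
    for xi in x:
        j = round((cmath.phase(xi) % (2 * math.pi)) * K / (2 * math.pi)) % K
        out.append(K if j == 0 else j)   # phase 0 corresponds to entry K
    return out

# Round trip over the integral lattice, and the example of Figure 5.1:
assert p_inv(p([1, 5, 8], K=8), K=8) == [1, 5, 8]
print(csign(-1.2 - 0.5j, K=8))   # the phasor e^{i*pi}, i.e. -1
```

For u = −1.2 − 0.5i the argument falls in the fifth sector of csign_8, which is consistent with the illustration in Figure 5.1.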
A generalized Hebb rule has been proposed in (Jankowski et al., 1996) as a learning
procedure for complex-valued multi-state Hopfield network to recall some specific phase
combinations from their distorted versions. However, as expected, this generalized rule,
which constitutes the only learning procedure proposed so far for this network model, suffers
from almost the same limitations as it does in the binary case. This is why no efficient
application of this network has been proposed yet. On the other hand, another
significant qualitative result addressed in (Jankowski et al., 1996) is that, if the complex
weight matrix of the network is Hermitian and its diagonal entries are all non-negative, the
state vector of the network necessarily converges, along the collective operation of the n
complex-valued neurons in asynchronous mode, to a local minimum of a specific real-valued
quadratic functional defined in terms of the network parameters. Such a network will
be called Hermitian hereafter.
Several design procedures that employ inequalities in the design of recurrent neural
networks have been reported, e.g. (Tan et al., 1991), (Schwarz & Mathis, 1992),
(Xiangwu & Hu, 1997). Such attempts mainly focused on embedding fixed points into
the conventional Hopfield network and constructed the design inequalities directly from
the nonlinear recursion performed by the network. Though a solution of these inequalities
gives the desired parameters of the recursion which has fixed points located at the given
binary points, networks designed in these ways might not be capable of restoring a memory
vector from its distorted versions, since attractiveness is not a design condition in such
methods. By posing this property as a constraint, an indirect method to construct the energy
landscape of the discrete Hopfield network via the solution of homogeneous linear inequalities
was proposed in (Muezzinoglu et al., 2003a). Nevertheless, these effective approaches
on designing conventional bi-state network have not yet been extended for multi-state
associative memories.
Based on the energy minimization performed by the complex-valued multi-state Hopfield
network, this chapter presents an indirect design procedure. The procedure gives a Hermitian
weight matrix such that each transformed memory vector is an attractive fixed point of the
resulting finite-state system. The proposed method basically employs homogeneous linear
inequalities to dig a basin for each transformed memory vector in the quadratic energy
landscape to ensure that they are all strict local minima. If the system of inequalities is
feasible, then its solution provides the desired quadratic form, and finally the complex
weights of the network are determined from the Hermitian coefficient matrix of this
quadratic.
Feasibility of the inequality system constructed in the design is actually not only
sufficient but also necessary for the existence of a Hermitian network that possesses
attractive fixed points located exactly at the transformed memory vectors. In other words,
if the constructed inequality system is infeasible, no Hermitian network can possess a limit
set that contains the transformed memory vectors. This implies that the proposed method
reveals the best performance of such a network as a multi-state associative memory.
5.2 Design Procedure
5.2.1 Complex-Valued Multistate Hopfield Network
Assume that a complex-valued multi-state Hopfield network consists of n fully connected
neurons, whose states at time instant k constitute the state vector x[k] of the network.
Let wij denote the complex-valued weight associated to the coupling from the state of
the j-th neuron to an input of the i-th one. The asynchronous operation of the network
is characterized as updating the state of a single neuron, say the l-th one, at time k according
to the recurrence

x_l[k+1] = csign_K( e^{iπ/K} \sum_j w_{lj} x_j[k] ),    (5.3)
while keeping all other states unchanged. Here K is the resolution factor of the network,
and it determines the cardinality of the finite state-space. Although the term e^{iπ/K} has
theoretically no effect on the network dynamics, it provides a phase margin of π/K against
phase noise in the weighted sum of the state-vector entries.
The qualitative properties of the proposed network can be investigated by introducing an
energy function defined on the state-space in terms of the weight coefficients:
E(x) := −(1/2) \sum_i \sum_j w_{ij} \bar{x}_i x_j,    (5.4)
similar to the way followed in the stability analysis of conventional Hopfield network
(Bruck & Goodman, 1988). A sufficient condition on the convergence of the recursion (5.3)
has been reported in (Jankowski et al., 1996) as a Hermitian weight matrix (W = W^*)
with nonnegative diagonal entries (w_ii ≥ 0). The proof of this statement is simply achieved
by showing that each state transition necessarily causes a decrement in the energy function
under these conditions, which also enable us to rewrite (5.4) in a real-valued quadratic form:
E(x) = −(1/2) x^* W x.    (5.5)
Since the network operates in a finite state-space by definition of csignK(·), then the domain
of (5.5) is finite. The state trajectory therefore ends at a local minimum of (5.5) within finitely many time
steps for any initial condition. In fact, the domain of the energy function (5.5) and the state
space of the asynchronous recursion (5.3) are the same spaces. Hence the energy function,
which is quadratic in the states but linear in the weight coefficients, not only establishes the
convergence analysis, but also defines attractive fixed points of the network as its strict local
minima.
It is assumed throughout the derivation that the update order of the neurons, i.e. the index
l in (5.3), is chosen at random, as is usually done in the conventional discrete Hopfield
network.
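The convergence property above can be observed numerically: with a Hermitian, zero-diagonal weight matrix, the energy (5.5) never increases along the asynchronous recursion (5.3). The random weights, the sweep length, and the cyclic update order are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 6, 8

# Random Hermitian weight matrix with zero diagonal
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
W = B + B.conj().T
np.fill_diagonal(W, 0)

def csign(u, K):
    """K-stage phase quantizer (5.1)."""
    j = np.floor((np.angle(u) % (2 * np.pi)) * K / (2 * np.pi))
    return np.exp(1j * 2 * np.pi * j / K)

def energy(x):
    """Quadratic energy (5.5)."""
    return -0.5 * np.real(x.conj() @ W @ x)

# Random initial state on the K-point unit-circle lattice
x = np.exp(1j * 2 * np.pi * rng.integers(0, K, size=n) / K)
energies = [energy(x)]
for k in range(200):                       # asynchronous sweeps per (5.3)
    l = k % n
    x[l] = csign(np.exp(1j * np.pi / K) * (W[l] @ x), K)
    energies.append(energy(x))

# Energy is non-increasing along the trajectory (Jankowski et al., 1996)
assert all(e2 <= e1 + 1e-9 for e1, e2 in zip(energies, energies[1:]))
print(round(energies[0], 3), round(energies[-1], 3))
```

Each update snaps x_l to the lattice phase nearest to the argument of the weighted sum, which maximizes the real part of the corresponding energy term and hence cannot increase (5.5).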
5.2.2 Design of Quadratic Energy Function with Desired Local Minima
We restrict ourselves to the synthesis of complex-valued multi-state Hopfield networks
with a Hermitian weight matrix having zero diagonal entries. Note that this assumption not
only reduces the number of parameters that describe the network, but also simplifies the
design as it already guarantees convergence. Indeed, the design of the network is equivalent
to the design of its energy function in this case, since the parameters (i.e. the Hermitian
weight matrix) of the network can be uniquely determined from the coefficients of its energy
function and vice versa. Thus, rather than the recursion (5.3) directly, our design method
described in the following mainly focuses on the energy function (5.5), which is necessarily
real-valued by the previous assumption.
Given a set of integral memory vectors M ⊂ {1, 2, . . . , K}^n, let M_c denote the set of
complex vectors obtained by transforming elements of M into their complex representation
by (5.2). In order to perform a search for a Hermitian coefficient matrix W such that the
real-valued discrete quadratic form (5.5) attains a local minimum at each element of Mc, we
simply apply the definition of a strict local minimum, and impose a set of strict inequalities:
E(x) < E(y),    ∀y ∈ B_1^K(x) − {x}    (5.6)

to be satisfied for each x ∈ M_c. Here B_1^K(u) is the 1-neighborhood of u, defined
formally as:
B_1^K(u) := \bigcup_{i=1}^{n} { v : v_i = u_i e^{i2π/K} ∨ v_i = u_i e^{−i2π/K}, v_j = u_j, j ≠ i } ∪ {u}.    (5.7)
83
By substituting (5.4) in (5.6), we express this condition as 2n inequalities to be satisfied by
the coefficient matrix W = [w_{ij}]:

\sum_i \sum_j w_{ij} \bar{x}_i x_j > \sum_i \sum_j w_{ij} \bar{y}_i y_j,    ∀y ∈ B_1^K(x) − {x}.    (5.8)
Incorporating now our initial design considerations w_{ij} = \bar{w}_{ji} and w_{ii} = 0, condition (5.8)
can be further expressed in terms of only the upper-triangle entries of W:

\sum_{1≤i<j≤n} w_{ij} [ \bar{x}_i x_j − \bar{y}_i y_j ] + \bar{w}_{ij} [ x_i \bar{x}_j − y_i \bar{y}_j ] > 0,    (5.9)
for all y ∈ B_1^K(x) − {x}. We then substitute the identity

w_{ij} \bar{x}_i x_j + \bar{w}_{ij} x_i \bar{x}_j = 2 Re{w_{ij}} Re{\bar{x}_i x_j} − 2 Im{w_{ij}} Im{\bar{x}_i x_j},    (5.10)

in (5.9) and obtain:
\sum_{1≤i<j≤n} Re{w_{ij}} [ Re{\bar{x}_i x_j} − Re{\bar{y}_i y_j} ] + Im{w_{ij}} [ Im{\bar{y}_i y_j} − Im{\bar{x}_i x_j} ] > 0    (5.11)
for all y ∈ B_1^K(x) − {x}. Recall from the definition of the transformation p_K(·) in (5.2) that
Re{\bar{x}_i x_j} = cos((2π/K)(u_j − u_i)) and Im{\bar{x}_i x_j} = sin((2π/K)(u_j − u_i)), where
u = p_K^{−1}(x) is the original integral vector from which the unit-magnitude complex vector x is obtained.
Hence, the design condition (5.11) can be expressed directly in terms of the original memory
vectors, i.e. the elements of M, instead of the transformed ones in M_c.
We finally gather all inequalities associated with all memory vectors, and formally impose
the overall system of inequalities derived above as the design condition as follows.
Corollary 3 The quadratic form (5.5) possesses a strict local minimum at each element of
M_c if and only if the homogeneous inequality

\sum_{1≤i<j≤n} Re{w_{ij}} [ cos((2π/K)(x_j − x_i)) − cos((2π/K)(y_j − y_i)) ]
             + Im{w_{ij}} [ sin((2π/K)(y_j − y_i)) − sin((2π/K)(x_j − x_i)) ] > 0    (5.12)

is satisfied by the Hermitian weight matrix W for all x ∈ M and for all y ∈ I_1^K(x) − {x}.
Here I_1^K(x) is the ball that contains the inverse-transformed versions of the vectors in
B_1^K(x), namely x and all of its 1-neighbors in the integral lattice {1, 2, . . . , K}^n:

I_1^K(u) := \bigcup_{i=1}^{n} { v : v_i = u_i + 1 (mod K) ∨ v_i = u_i − 1 (mod K), v_j = u_j, j ≠ i } ∪ {u}.
To find the real and imaginary parts of the desired weight coefficients, a solution to this system
of 2 · |M| · n inequalities must be calculated by an appropriate method. Note that
(5.12) is a linear feasibility problem, because the left-hand side of each inequality is linear in
the variables Re{w_{ij}} and Im{w_{ij}} for i, j = 1, 2, . . . , n. Due to this property, if (5.12) is
a feasible inequality system for a given M , any linear programming procedure, e.g. the
primal-dual method (Luenberger, 1973), or the perceptron learning algorithm (Rosenblatt,
1962), would provide a solution, so the complex parameters of the network could be
determined by reconstructing W from this solution. On the other hand, infeasibility of
(5.12) means that the given memory vectors cannot be altogether embedded as strict local
minima into (5.5), and consequently that there exists no Hermitian network which has
attractive fixed points located at each of these vectors.
5.2.3 Elimination of Trivial Spurious Memories
The goal of the design method described above is only to render each memory vector as an
attractive fixed point of the network. Since no additional condition has been imposed on
eliminating undesired fixed points that might occur in the resulting network, the Hermitian
weight matrix W obtained by solving (5.12) by any suitable procedure could also satisfy
a set of inequalities which imply that some vector other than the elements of M_c is a strict local
minimum of (5.5), although these inequalities are not explicitly imposed in the design.
Most of the associative memory design methods are known to cause spurious memories.
Unfortunately, neither the existence nor the location of many of these points in the state-
space of the dynamical network is predictable. Moreover, discrimination of these vectors
after the design is very difficult for large n since almost every point in the huge state
space of the system should be checked for this purpose. On the other hand, some of the
spurious memories are correlated with the memory vectors and their locations can be exactly
determined in terms of the memory vectors. For example, the conventional Hebb rule
used in the design of binary associative memory introduces many undesired fixed points
to the network beyond the desired ones, and most of these points cannot be determined
without checking each point in the entire state-space (Zurada, 1992). However, one can
easily conclude that if x is a fixed point of the discrete Hopfield network, then so is −x.
This property of networks designed by the Hebb rule enables the designer to identify some
spurious memories in advance, namely those directly related to the original memory vectors.
A similar relation can be extracted from our design method by observing from (5.12)
that only the differences between the entries of the integral memory vectors, not their actual
values, are used in the construction of the design inequalities. It can be easily verified that
the inequality system constructed for an integral memory vector x ∈ {1, 2, . . . , K}^n in
the way proposed in the previous subsection would be exactly the same as the one constructed
for each vector x + k · e (mod K), where k = 1, 2, . . . , K − 1 and e is the n-vector with all 1
entries. Hence, the weight matrix calculated from the solution of (5.12) not only makes each
element of Mc an attractive fixed point, but also introduces at least (K − 1)|M | additional
vectors, namely the transformed versions of the integral vectors obtained by incrementing
each element of M in modulo K by k · e, k = 1, . . . , K − 1, as spurious memories to the
network. Such vectors are called trivial spurious memories and an extension to the design
is proposed in the following to eliminate them.
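The coincidence of the design inequalities under a common shift can be checked directly: only the differences x_j − x_i and y_j − y_i enter the row coefficients of (5.12), and shifting both x and its neighbor y by k · e (mod K) changes each difference by a multiple of K, which leaves every cosine and sine term unchanged. A small numerical check, with a hypothetical x and 1-neighbor y:

```python
import math

def design_row(x, y, K):
    """Coefficient row of (5.12) for memory vector x and neighbor y."""
    n, c = len(x), 2 * math.pi / K
    row = []
    for i in range(n):
        for j in range(i + 1, n):
            row.append(math.cos(c * (x[j] - x[i])) - math.cos(c * (y[j] - y[i])))
            row.append(math.sin(c * (y[j] - y[i])) - math.sin(c * (x[j] - x[i])))
    return row

K = 5
x = (1, 3, 2)
y = (1, 3, 3)                                   # a 1-neighbor of x
shift = lambda v, k: tuple(((vi - 1 + k) % K) + 1 for vi in v)
for k in range(1, K):
    assert all(abs(a - b) < 1e-9
               for a, b in zip(design_row(shift(x, k), shift(y, k), K),
                               design_row(x, y, K)))
print("design rows coincide for every shift k = 1, ...,", K - 1)
```

Identical rows mean identical inequalities, so any weight matrix storing x automatically stores all of its shifted versions as well — the trivial spurious memories.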
Let us append an arbitrary integer, say 1, to each memory vector in M as the last entry and
apply the proposed procedure to obtain a complex-valued multi-state associative memory
of n + 1 neurons. Since the last entry of any trivial spurious memory is different from 1
by definition, one can simply exclude their transformed versions from the state-space of
the network by restricting the dynamics (5.3) to the subspace that consists of the vectors
whose last entries are equal to e^{i2π/K}. This is achieved by fixing the state of the (n + 1)-st
neuron to e^{i2π/K} along the recursion. Note that this state is connected to the inputs of the
other neurons via the weights {w_{l,n+1}}_{l=1}^{n}, thus this modification of the network model is
actually equivalent to introducing a complex threshold t_l = e^{i2π/K} w_{l,n+1} to the l-th neuron of the
original network (5.3) for l = 1, . . . , n, whose dynamical behavior can now be recast as:
x_l[k+1] = csign_K( e^{iπ/K} ( \sum_{j=1}^{n} w_{lj} x_j[k] + t_l ) ).    (5.13)
Although the method avoids the trivial spurious memories, some non-trivial spurious
memories might still occur in the network. It is expected that the number of such
attractive fixed points increases with K, since the cardinality of the state-space grows
with K. However, the method ensures that none of these spurious memories is
located in I_1^K(x) for any x ∈ M; therefore, the resulting network corrects all possible errors
caused by incrementing or decrementing a single entry of a memory vector by 1. In other
words, correction of the vectors in the 1-neighborhood of the memory vectors is guaranteed.
5.2.4 Algorithmic Summary of the Method
A summary of the proposed design method described in Section 5.2.2 together with its
improvement in Section 5.2.3 is given below.
Algorithm 2 Input to the algorithm is M ⊂ {1, 2, . . . , L}^n.

Step 0: Set a resolution factor K ≥ L for the network. Append 1 to every x ∈ M as the last
entry. Set A as the empty matrix.
Step 1: For each x ∈ M and for each y ∈ I^K_1(x) − {x}, calculate the row vector

[ c_{12} s_{12} c_{13} s_{13} · · · c_{1,n+1} s_{1,n+1} c_{23} s_{23} c_{24} s_{24} · · · c_{2,n+1} s_{2,n+1} · · · c_{n,n+1} s_{n,n+1} ],

where c_{ij} = cos( (2π/K)(x_j − x_i) ) − cos( (2π/K)(y_j − y_i) ) and
s_{ij} = sin( (2π/K)(y_j − y_i) ) − sin( (2π/K)(x_j − x_i) ), and append it as an additional
row to matrix A.
Step 2: Find a solution q* ∈ R^{n(n+1)} for the inequality system Aq > 0 by using any
appropriate method.
Step 3: Construct the Hermitian matrix

W =
[ 0                      q*_1 + iq*_2             q*_3 + iq*_4             · · ·   q*_{2n−1} + iq*_{2n}
  q*_1 − iq*_2           0                        q*_{2n+1} + iq*_{2n+2}   · · ·   q*_{4n−3} + iq*_{4n−2}
  ...                    ...                      ...                      . . .   ...
  q*_{2n−1} − iq*_{2n}   q*_{4n−3} − iq*_{4n−2}   q*_{6n−5} − iq*_{6n−4}   · · ·   0 ].

Step 4: Extract the parameters of recursion (5.13) from W as the weights w_{ij} for i, j =
1, 2, . . . , n and the thresholds t_j = e^{i2π/K} w_{j,n+1} for j = 1, 2, . . . , n.
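Steps 1 and 2 above can be sketched as follows (an illustrative sketch, not the thesis implementation; the relaxation loop is one possible "appropriate method" for Step 2 and assumes the system is feasible):

```python
import math
from itertools import combinations

def build_row(x, y, K):
    # Step 1: the row of A generated by a memory vector x and one of its
    # 1-neighbors y; entries interleave the cosine and sine differences
    # c_ij and s_ij over all index pairs i < j.
    w = 2 * math.pi / K
    row = []
    for i, j in combinations(range(len(x)), 2):
        row.append(math.cos(w * (x[j] - x[i])) - math.cos(w * (y[j] - y[i])))
        row.append(math.sin(w * (y[j] - y[i])) - math.sin(w * (x[j] - x[i])))
    return row

def solve_strict(A, max_sweeps=1000):
    # Step 2: perceptron-style relaxation for the strict system A q > 0;
    # whenever a row is violated, q is moved in that row's direction.
    q = [0.0] * len(A[0])
    for _ in range(max_sweeps):
        clean = True
        for a in A:
            if sum(ai * qi for ai, qi in zip(a, q)) <= 0:
                q = [qi + ai for qi, ai in zip(q, a)]
                clean = False
        if clean:
            return q
    return None  # infeasible, or the sweep budget was exhausted
```

For an (n + 1)-dimensional augmented memory vector, each call to `build_row` returns n(n + 1) entries, matching the dimension of q* in Step 2.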
As the dimension n of the memory vectors increases, manipulating the energy of each
memory vector in the way suggested by the second and third steps of this algorithm becomes
time and memory consuming when compared to the generalized Hebb rule. In practice,
this procedure is easily realizable for memory sets with a resolution factor of order 10 and a
dimension of order 10, which is sufficient for the reconstruction of gray-scale images. On the
other hand, the performance of the resulting network is much better than that of the one
designed by the generalized Hebb rule, as shown at the end of the next section.
5.3 Simulation Results
Results of computer experiments are presented below to illustrate the quantitative
performance of the method, i.e. the maximum cardinality of an arbitrary memory set that
can be successfully embedded into the network by the proposed design method. The recall
capability of the resulting network and its application on reconstructing gray-scale images
are also demonstrated.
5.3.1 Complete Storage Performance
Any fixed point of an n-th order dynamical system can be considered as an n-dimensional
static information encoded as system parameters. As demonstrated in the previous section,
dynamical associative memories are designed from this point of view by determining the
parameters of an a priori chosen network model such that a given set of static vectors are the
fixed points of this system. Hence, an associative memory realizes a dichotomy defined on
its state space: some specific points in this space are fixed points (constitute the limit set) of
the system, while the rest are not. However, the design of an ideal associative memory in this
way is generally not possible for every possible memory set, i.e. not every dichotomy can
be implemented, because of limitations of the chosen model, e.g. the number of parameters.
In our case, for example, the network model involves (n^2 + n)/2 complex coefficients
(weights and thresholds), whereas the number of all possible dichotomies is equal to 2^{K^n},
which is the number of subsets of the state space {1, 2, . . . , K}^n. If it were possible to
design the complex-valued multi-state Hopfield network as an ideal associative memory
for every possible memory set, then this design would be a very efficient compression tool
that enables the lossless compression of an arbitrary memory set into (n^2 + n)/2 complex
numbers. However, such a compression seems impossible from the information theory
point of view, since the number of free variables, i.e. parameters, is quadratic in n, while
the number of dichotomies grows exponentially with n. Therefore, if the design is based on
a network model, which is the case for many neural associative memories, then only some
of the possible memory sets can be introduced as fixed points to the network by any design
method.
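To make the counting argument concrete, the two quantities can be compared numerically (an illustrative calculation; Python's exact integers avoid overflow in 2^{K^n}):

```python
def parameter_count(n):
    # Number of complex free parameters of the network: the strictly
    # upper-triangular weights plus the thresholds, (n^2 + n) / 2.
    return (n * n + n) // 2

def dichotomy_count(n, K):
    # Number of subsets of the state space {1, ..., K}^n.
    return 2 ** (K ** n)
```

Already at n = 3 and K = 5 there are only 6 complex parameters against 2^125 dichotomies, so almost all dichotomies are unrealizable by the model.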
We say that a memory set M is stored completely by our design method if each element
of M constitutes a fixed point in the resulting network. We measure the quantitative
performance by the percentage of the number of completely stored memory sets among
a collection of memory sets generated randomly. Recall that the complete storage of a
memory set is equivalent to the feasibility of the inequality system (5.12) constructed for
this set.
For several n, |M |, and K values, 100 random memory sets have been generated and it has
been checked whether each of these sets yielded a feasible inequality system. The number
of sets that yielded a feasible inequality system for each experiment is listed in Table 5.1,
which shows that almost every set with |M | ≤ n can be completely stored independent of
the value of K.
Table 5.1: Percentages of memory sets that yielded feasible inequality systems.
 n   |M|   P% (K = 5)   P% (K = 10)      n   |M|   P% (K = 5)   P% (K = 10)
 5     3        100          100        20    10        100          100
 5     5         90           95        20    20        100          100
 5     7          2            5        20    30          3           17
10     4        100          100        50    25        100          100
10    10         97          100        50    50        100          100
10    15          0           12        50    75          9           21
The effect of K on complete storage performance is also shown in Table 5.1. The
probability of complete storage P% increases as the resolution factor K increases for fixed
n and |M |. However, this would cause the state space to grow enormously and, hence,
possibly cause more non-trivial spurious memories as illustrated in the next subsection.
5.3.2 Application of the Design Procedure
We first give an illustrative example of the proposed design procedure and investigate the
performance of the resulting network.
Example 1 Consider the memory set consisting of the following integral vectors:

x^1 = [3 5 5 3]^T,  x^2 = [4 3 1 5]^T,  x^3 = [4 4 5 4]^T,  x^4 = [5 2 4 3]^T,
which belong to the integral lattice {1, 2, . . . , 5}^4. We have first appended 1 to each vector
as the last entry and transformed them to their phase-modulated versions by (5.2):
x^1 = [e^{i6π/5}  e^{iπ}  e^{iπ}  e^{i6π/5}]^T,   x^2 = [e^{i8π/5}  e^{i6π/5}  e^{i2π/5}  e^{iπ}]^T,
x^3 = [e^{i8π/5}  e^{i8π/5}  e^{iπ}  e^{i2π/5}]^T,   x^4 = [e^{iπ}  e^{i4π/5}  e^{i8π/5}  e^{i6π/5}]^T;
assuming that the resolution factor K is equal to 5. The inequality system has been
constructed as in (5.12) and been solved by linear programming to obtain the weight matrix
and the threshold vector as
W =
[       0            7.4 − i68.4    −65.6 + i132.8    139.7 − i31.7
    7.4 + i68.4          0           108.4 − i17.8    −76.5 − i92.6
  −65.6 − i132.8    108.4 + i17.8        0             80.9 + i167.1
   139.7 + i31.7    −76.5 + i92.6    80.9 − i167.1         0        ],

t = [−73 − i134.1   −82.6 + i46.1   126.6 − i131.3   20.7 + i174.4]^T.
It can be verified that for these parameters each transformed memory vector xi is a fixed
point of the recursion (5.13). After injecting each 1-neighbor of each memory vector as the
initial state vector it has been observed that the network converged to the nearest memory
vector for each initial condition. Hence, it can be concluded that the design has been
successful. We have also identified the spurious memories by checking the transformed
version of each element of the integral lattice {1, 2, . . . , 5}^4 and observed that the network
has 15 spurious memories, none of which is trivial. Note that the same memory set can be
embedded for a larger resolution factor. When the design is repeated for K = 6, one can
see that the number of spurious memories increases by 2.
Figure 5.2: Test images used in image reconstruction example.
Since gray-scale images can be represented by integral vectors, reconstruction of such
images from their distorted versions constitutes a straightforward application of multi-
state associative memory, as investigated in (Zurada et al., 1994). The following example
illustrates the performance of the proposed method in performing this task.
Example 2 Gray-scale versions of three well-known test images, namely Lenna, peppers,
and cups images, have been used in this experiment. Due to computational limitations, the
original high-resolution 256-level images have been re-scaled to 100 × 100 resolution and
their gray-levels have been quantized down to 20 levels. Thus, each image can be considered
as a 100 × 100 matrix consisting of integral numbers where 1 and 20 denote a black and
a white pixel, respectively, and each integer value in between indicates a gray
tone. These three prototype images are shown in Figure 5.2.
Each image has been segmented into 500 20-dimensional vectors x^l_{uv} ∈
{1, 2, . . . , 20}^20 for u = 1, . . . , 5 and v = 1, . . . , 100, such that the v-th column of the l-th
image is represented by concatenating the 5 integral vectors x^l_{uv}, u = 1, . . . , 5. Here l
denotes the image index: 1 for Lenna, 2 for peppers, and 3 for cups. A 20-neuron complex-
valued multi-state associative memory has then been designed for each triple of memory
vectors x^1_{uv}, x^2_{uv}, x^3_{uv}, u = 1, 2, . . . , 5 and v = 1, 2, . . . , 100. Since we have attempted to
embed only 3 vectors into a 20-neuron network by our method, which is far below the
actual capacity investigated in Section 5.3.1, all 500 designs have been successful.
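The segmentation described above can be sketched as follows (a hypothetical helper, assuming the image is given as a 100 × 100 list of integer gray levels):

```python
def segment(image):
    # Cut a 100x100 integer image into 500 vectors of length 20: each of
    # the 100 columns is split into 5 stacked segments indexed by u.
    vectors = {}
    for v in range(100):
        column = [image[row][v] for row in range(100)]
        for u in range(5):
            vectors[(u + 1, v + 1)] = column[20 * u: 20 * (u + 1)]
    return vectors
```

Running this on the three prototype images yields the 500 triples of 20-dimensional memory vectors, one triple per (u, v) pair.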
Figure 5.3: Images corrupted by 20% salt-and-pepper noise (above) and their
reconstructions obtained by the network (below).
After the design phase the distorted versions of the prototype images have been obtained
by adding 20% salt-and-pepper noise, as shown in Figure 5.3a. Each of these distorted
images was segmented in the same way as described above, and the transformed version
of each vector obtained in this way was applied as the initial condition to the corresponding
network. After all 500 networks reached their steady states, i.e. fixed points, the integral
vectors have been obtained by the inverse transformation p_K^{−1}(·) and combined into a 100 ×
100 matrix. The reconstructed images obtained by this procedure for each distorted image
are shown in the corresponding column of Figure 5.3b. It can then be concluded that the
networks are capable of removing 20% salt-and-pepper noise on each image successfully.
In other words, almost none of these 500 networks converges to a spurious memory in this
experiment.
As the experiments were repeated for 40% and 60% noise (see Figures 5.4a and 5.5a,
respectively), non-trivial spurious memories became effective in the recall, so the
reconstruction performance decreased. This can be observed from the recalled images
shown in Figures 5.4b and 5.5b, respectively.
Figure 5.4: Images corrupted by 40% salt-and-pepper noise (above) and their
reconstructions obtained by the network (below).
Figure 5.5: Images corrupted by 60% salt-and-pepper noise (above) and their
reconstructions obtained by the network (below).
Figure 5.6: Filtered images obtained from noisy images with 40% salt-and-pepper noise by
the network (above) and by median filtering (below).
The tasks performed by a filter and by an associative memory are conceptually different:
A filter is usually expected to remove noise on any signal, while an associative memory is
designed to filter out the noise on prototype vectors only. However, despite the negative
effects of spurious memories, the performance of the network in filtering noisy images is
still comparable to that of median filtering, which is known to be one of the most effective
methods for filtering out salt-and-pepper noise. This can be verified by Figures 5.6a
and 5.6b, showing the reconstructed versions of 40% corrupted images obtained by our
method and by median filtering, respectively.
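For reference, the median filter used in such comparisons can be sketched as below (a plain 3 × 3 median filter with clamped borders; a generic sketch, not the exact filter used in the thesis experiments):

```python
def median3x3(img):
    # 3x3 median filter: each output pixel is the median of its 3x3
    # neighborhood; borders are handled by clamping the indices.
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = sorted(window)[4]  # median of 9 values
    return out
```

An isolated salt-and-pepper pixel never survives this filter, which is why median filtering is such a strong baseline for this noise model.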
The recall capability of our method was also compared with that of the generalized Hebb
rule proposed in (Jankowski et al., 1996). In this experiment, the three images in
Figure 5.2 were used as the prototype images in the generalized Hebb rule. The dominant
effect of spurious memories can be visually identified when the Lenna image is
reconstructed from its 20% distorted version by the network designed with the generalized
Hebb rule (see Figure 5.7a). Our method, on the other hand, enables an almost perfect recall,
as shown in Figure 5.7b.
Figure 5.7: Lenna images obtained by the networks designed by the generalized Hebb rule
and by the proposed method, respectively.
CHAPTER SIX
MULTI-LAYER RECURRENT ASSOCIATIVE MEMORY
DESIGN
To achieve perfect storage of binary memory vectors, a generalization of the discrete
Hopfield model to a multi-layer recurrent network of bi-state discrete perceptrons is
suggested in this chapter. The proposed design procedure employs the back-propagation
learning algorithm. In the training phase, the discrete perceptrons are replaced with
sigmoidal neurons having large gains. The number of neurons in the hidden layer is
assumed to be adjustable, and this flexible structure of the network allows the perfect storage
of arbitrary (uncorrelated) binary vectors. The performance of the proposed network is
investigated by intensive computer simulations.
6.1 Motivation
The discussion made in Section 4.2 yields the fact that any discrete Hopfield-based
dynamical associative memory design procedure which aims at a symmetric weight matrix
is indeed an attempt to map the given M to the set of discrete local minima of a quadratic.
The design method proposed in Chapter 4 achieves perfect storage with this motivation,
whenever this is achievable. However, there is no way to introduce all memory vectors as
fixed points to a DHN if M is not quadratic-distinguishable, as defined below.
Definition 5 A set M ⊆ {0, 1}^n is quadratic-distinguishable if there exists a functional of
the form (4.1) which has a discrete local minimum at each element of M . It is called strictly
quadratic-distinguishable if, in addition, this quadratic has no local minimum other than
the elements of M .
One can verify that quadratic-distinguishable sets constitute a rather small subset in the
set of all possible binary sets. An investigation performed by computer experiments in
Section 4.2.2 has shown that the validity of this property is closely related to the cardinality
|M | of M , and that almost all binary sets containing fewer than 1.5 · n elements are quadratic-
distinguishable, while such sets turn out to be rare as |M | increases. This result actually
explains the reason why the conventional model does not work adequately as a binary AM
in most cases, i.e. when the memory vectors are chosen arbitrarily, not correlated in this
way.
Another well-known defect of the model is that, even though all given memory vectors
can be perfectly stored by an appropriate design method, the state vector might converge to
a binary point, which does not correspond to any memory vector, i.e. a spurious memory.
Their occurrence is obviously due to the violation of the property in strict sense. Avoiding
spurious memories in the design, as well as addressing them in the resulting network, is not
an easy task for large dimensions.
From these aspects, a generalization of the conventional model is evidently necessary
to achieve the perfect storage of an arbitrary memory set. A successful generalization has
been proposed in Section 4.2.3 by incorporating an algebraic multi-layer perceptron into the
conventional asynchronous dynamics (4.2). In the following section, we suggest another
modification of the DHN, namely introducing an additional layer that comprises an
adjustable number of neurons.
6.2 Multi-Layer Recurrent Network
As an alternative model to the conventional discrete Hopfield network, we consider here two
cascaded layers of bipolar discrete perceptrons as illustrated in Figure 6.1. In synchronous
Figure 6.1: A two-layer recurrent network made up of discrete perceptrons.
mode, this network operates on the bipolar binary state-space {−1, 1}^n according to the
recurrence:

x[k + 1] = s_o ( W · s_h ( R · x[k] + b ) + t ).   (6.1)

Here W, R, t, and b denote the n × l output-layer weight matrix, the l × n hidden-layer
weight matrix, the n-dimensional output-layer bias vector, and the l-dimensional hidden-layer
bias vector, respectively. The vector-valued functions s_h(·) : R^l → {−1, 1}^l and s_o(·) :
R^n → {−1, 1}^n are diagonal transformations defined as [sgn(·) · · · sgn(·)]^T. An asynchronous
operation mode for the model could also be defined in the same way as was done in (2.2)
for the single-layer case. Note that the model enables the designer to adjust the number l
of hidden-layer neurons. With this flexibility, the two-layer network may be expected to
outperform the classical Hopfield model in the association task.
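A synchronous sweep of recursion (6.1) can be sketched as follows (a minimal illustration; sgn(0) is taken as +1 here by convention):

```python
def sgn(u):
    # Bipolar hard limiter; the value at zero is fixed to +1 here.
    return 1 if u >= 0 else -1

def step(x, W, R, t, b):
    # One synchronous update of recursion (6.1): hidden layer
    # h = s_h(R x + b), followed by the output layer x' = s_o(W h + t).
    h = [sgn(sum(r_j * x_j for r_j, x_j in zip(row, x)) + b_i)
         for row, b_i in zip(R, b)]
    return [sgn(sum(w_j * h_j for w_j, h_j in zip(row, h)) + t_i)
            for row, t_i in zip(W, t)]
```

Here R has l rows of n entries and W has n rows of l entries, so the state stays n-dimensional while the hidden representation is l-dimensional.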
6.3 Design Procedure
We pose the dynamical associative memory design problem for the two-layer recurrent
structure in Figure 6.1 as the determination of the parameters W, R, t, and b which
impose on recursion (6.1) (or on its asynchronous counterpart) an attractive fixed point
located exactly at each element of M , where M is in general quadratic-non-distinguishable.
Definition 6 i. A fixed point p* of an asynchronous recursion p[k + 1] = ρ(p[k]) defined
on a binary space is called attractive if there exists an update rule such that the recursion
converges to p* for all initial conditions in B_1(p*), where B_d(q) denotes the set of binary
points which are located at most d Hamming distance away from q. ii. For synchronous
recursions, the definition of attractiveness is equivalent to that of stability in the sense of
Lyapunov (Vidyasagar, 1993).
As described in Condition 4, attractiveness for a fixed point x ensures the correction
of any 1-bit distortion on x along the recursion (6.1), thus it is a key property for binary
dynamical associative memories. This is why it is considered as the crucial design condition
here. The search for network parameters under this constraint can now be shaped as a formal
supervised learning procedure ignoring the dynamicity of the network.
Problem 2 Determine real coefficients W, R, t, and b such that the equality

s_o ( W · s_h ( R · q + b ) + t ) = p   (6.2)

holds for all p ∈ M and for all q ∈ B_1(p).^1
In order to solve the design equations (6.2), which involve nested discontinuous
nonlinearities, one can make use of a systematic technique, namely back-propagation
algorithm. However, since the network output, i.e. the left-hand side of (6.2), is a
discontinuous functional of the considered parameters, back-propagation training algorithm
would not be applicable for the network. To overcome this problem, each activation function
sgn(·) should first be replaced with a continuous one which has a similar form to that of
sgn(·). A well-known sigmoid function is given by
sgm(u) = (1 − exp(−λ · u)) / (1 + exp(−λ · u)),   (6.3)
^1 It is assumed in the derivation of the design equalities (6.2) that the elements of M are located at least 2 Hamming distance away from each other. If memory sets violating this assumption are to be taken into account, then the equality should be imposed for all q ∈ B_1(p) − M , where “−” denotes the set difference.
where the parameter λ trims the gain of this sigmoidal nonlinearity^2. This function is known
to be a good candidate. The network then becomes ready to be trained to produce the desired
outputs (memory vectors) for the sample input vectors (their 1-Hamming neighbors). Mean-
Square-Error (MSE) is used as the performance index in the back-propagation learning
algorithm. It may be anticipated that the cumulative MSE for the training set diminishes
when a sufficiently large number of hidden-layer neurons is used in the design. After the
training phase, the activation functions are finally replaced back with sgn(·).
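The training set and the smooth activation described above can be sketched as follows (an illustration under the stated assumptions: a bipolar sigmoid that tends to sgn(·) as the gain λ grows, and each memory vector serving as the target for itself and all of its 1-Hamming neighbors):

```python
import math

def sgm(u, lam):
    # Bipolar sigmoid replacing sgn(.) during training; it tends to
    # sgn(.) pointwise (for u != 0) as the gain lam goes to infinity.
    return (1 - math.exp(-lam * u)) / (1 + math.exp(-lam * u))

def training_pairs(M):
    # Back-propagation samples: each memory vector p is the desired
    # output for every q in B_1(p), i.e. for p itself and for each
    # vector obtained by flipping a single bipolar entry of p.
    pairs = []
    for p in M:
        pairs.append((list(p), list(p)))
        for i in range(len(p)):
            q = list(p)
            q[i] = -q[i]
            pairs.append((q, list(p)))
    return pairs
```

Each n-dimensional memory vector thus contributes n + 1 input-output samples to the MSE objective.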
The complete stability issue has not been considered throughout the design, since the
recursive characteristic of the network was ignored. This may cause the actual network
designed in this way to oscillate for some initial conditions other than the 1-Hamming
neighbors of memory vectors. However, as will be illustrated below, these oscillations are
not catastrophic since they occur rarely, and the network interestingly converges to a fixed
point for almost all initial state vectors. The one-to-one correspondence between the fixed
points and the memory vectors is not guaranteed, either, since no additional constraint is
imposed in the design to avoid spurious memories.
6.4 Experimental Results
We present in this section the results of some computer experiments which were conducted
to illustrate the qualitative and quantitative performance of the proposed method.
Experiment 1: A straightforward design was first performed for the following randomly
generated memory set:
M = { [−1 −1 −1 −1]^T, [−1 −1 1 1]^T, [−1 1 −1 1]^T, [−1 1 1 −1]^T,
      [1 −1 −1 1]^T, [1 1 −1 −1]^T, [1 1 1 1]^T }.
Note that M is not quadratic-distinguishable, so there exists no DHN of 4 neurons which
has attractive fixed points located at the elements of M .
2It can easily be seen that sgm(·) approaches to sgn(·) as λ → ∞.
The sample input and desired output sets were then generated as described in the previous
section. When the proposed design procedure was applied for l = 4, the parameters of the
two-layer network with sgm(·) activation functions were obtained as:
W =
[  1.95  −0.59  −1.76   2.21
   9.45  −5.93  −5.09  −2.71
   6.67   8.81  −1.96  −5.52
  −0.77   1.18  −8.44  −0.30 ],   R =
[  2.11   2.13   1.78  −0.32
  −1.09  −1.22   2.90   0.48
  −1.55  −1.68  −0.11  −2.66
  −1.50  −1.49   0.07  −2.04 ],

t = [−6.45  −0.02  −2.42  −0.91]^T,   b = [−2.84  −0.09  1.62  −1.81]^T.
It can be verified that these parameter values globally minimize the MSE after replacing the
activation functions with sgn(·), so the resulting network satisfies all desired input-output
relations.
Each of the 2^4 = 16 binary vectors was then injected as the initial state vector to the network
and the recurrent behavior was observed in order to verify perfect storage: each element of
M constituted an attractive fixed point of the network. 8 of these binary vectors converged
to the same points as the ones obtained by a nearest-neighbor classifier, while the network
converged to different points for the remaining 8 initial conditions. However, the network at
least did not contain any spurious memories, although the procedure had not imposed any
condition to avoid them.
Experiment 2: In this experiment, we randomly generated memory sets for several n
and |M| values and investigated the effect of the number l of hidden-layer neurons on the
performance of the resulting network upon training.
For each randomly generated memory set consisting of |M | n-dimensional bipolar binary
vectors the proposed design was performed to obtain three two-layer recurrent networks
Table 6.1: Performance of the proposed method in providing perfect storage and creating
spurious memories and/or limit cycles depending on l.
 n   |M|      l = n/2            l = n             l = 3n/2
            PS  NPS%  NPC%    PS  NPS%  NPC%    PS  NPS%  NPC%
 6     6     √   20     0      √   12     0      √    0     0
 6    12     ×   32     0      √   17     0      √    5     0
 6    18     ×   48     3      √   23     0      √    9     0
 8     8     ×   22     0      √   12     0      √    0     0
 8    16     ×   20    10      √   14    10      √    8     0
 8    24     ×   33    16      ×   25    12      √   10     2
10    10     ×   32     4      √   24     0      √    2     0
10    20     ×   40     8      √   26     4      √    6     0
10    30     ×   60    14      ×   30    12      √   12     8
12    12     ×   18     6      √   12     0      √    2     0
12    24     ×   28    10      √   20     4      √    6     2
12    36     ×   54    34      ×   30    22      √   14    16
which comprise n/2, n, and 3n/2 hidden-layer neurons, respectively. The results
are listed in Table 6.1.
Here PS denotes the perfect storage and a check sign in this column indicates that the
perfect storage was achieved, while a cross indicates that some memory vectors could not
be embedded as an attractive fixed point to the corresponding network. The quantity NPS%
stands for the percentage of the initial conditions, for which the corresponding network
converged to a spurious memory, in all possible 2n initial states. Similarly, NPC% denotes
the percentage of initial state vectors, for which the network entered a limit cycle.
As can be observed from Table 6.1, the perfect storage is more likely to be achieved
for a relatively large number of hidden-layer neurons, because the number of adjustable
parameters, i.e. the dimensions of R and b, increases as l increases. Both the percentages of
spurious memories and limit cycles also decrease in this case, hence the resulting network’s
quantitative performance is improved. However, it should be noted that this effect slows
down the back-propagation algorithm and also increases the cost of identifying the network.
Unfortunately, there exists no procedure to find an optimal l. One strategy to approximate it
could be a pruning technique: choose l large enough to ensure perfect storage,
and then repeat the design, decrementing l, until perfect storage fails.
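The pruning strategy can be sketched as the following loop (a sketch only; `design` and `perfect_storage` are hypothetical helpers standing in for the back-propagation design of Section 6.3 and the storage check of the experiments above):

```python
def prune_hidden_layer(design, perfect_storage, l_start):
    # Start from an l known to give perfect storage and decrement it
    # until perfect storage fails; return the last successful network.
    best = None
    for l in range(l_start, 0, -1):
        net = design(l)
        if not perfect_storage(net):
            break
        best = (l, net)
    return best
```

The loop performs at most l_start designs, so its cost is dominated by the repeated back-propagation runs rather than by the search itself.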
CHAPTER SEVEN
CONCLUSIONS
Five novel design methods to improve the performance of DHNs in evaluating the
association function given by (1.1) as a mapping from an initial state vector to a fixed point
have been proposed in this thesis work.
After introducing the memory concept and the associative memory, first the universal
network model, called DHN, has been introduced in Chapter 2. Five recurrent AM design
criteria have then been derived therein. Three major DHN-based AM design methods have
been explained and their performances have been criticized in terms of their fulfillment of
these criteria.
A Boolean Hebb rule for the DHN, which admits binary parameters only, has first been
introduced in the first part of Chapter 3. The basic idea in this design method is simply
to embed each binary memory vector as a maximal independent set into a graph. We
have determined the conditions under which the proposed method gives a recurrent AM
satisfying most of these criteria that have not been simultaneously fulfilled by any available
DHN-based AM design method. We have also given a quantitative analysis of the designed
network and compared the storage capacity of the method to the ones provided by some
well-known methods. The simulations have shown that, even if the design conditions on the
memory set are violated, the method still outperforms the outer-product
method for sparse memory sets. In the second part of Chapter 3, another graph theoretical
approach, namely representing the memory vectors as paths between two specific nodes of a
directed graph, has been presented. Generally with the cost of a higher number of neurons,
this second method guarantees perfect storage. Though one cannot avoid the occurrence
of many spurious memories in the network obtained by the method, these undesired fixed
points may occur only in a small neighborhood of the original memory vectors, thus cause
small errors.
Another binary recurrent AM design method which employs homogeneous linear
inequalities derived from the local minimality conditions has been presented in Chapter 4.
A solution to this inequality system yields the coefficients of the discrete quadratic energy
function of the DHN, so the weight matrix and the threshold vector of the network can be
directly determined. Simulations have shown that almost all memory sets with cardinality
less than n, where n is the dimension of the memory vectors, can be completely stored
in the dynamical network and so perfectly recalled. The method eventually establishes
an encoding for an arbitrary set of n-dimensional memory vectors as (n2 + n)/2 weight
and threshold coefficients associated to the recursion. The simulation results have shown
that the DHN is suitable to recall each letter of the English alphabet when designed by the
proposed method. It should be noted that this had not been achieved by any formerly-
proposed design method. Probably the most valuable observation of this work, which
triggered the subsequently proposed methods, is that a DHN can only possess fixed points
which are correlated as being quadratic-distinguishable (cf. Definition 5). This condition
enlightens the upper bound on DHN’s performance in association task. To achieve a higher
performance beyond this limit, a generalization of the conventional DHN model has also
been proposed and demonstrated.
The approach presented in Chapter 4 has then been generalized to multi-state AM design
in Chapter 5. Besides some straightforward generalizations of the conventional DHN model,
complex-valued multi-state Hopfield network has been introduced as an efficient tool to
process static integral information. To support this idea, a design method for a subclass of
this model has been proposed, which uses the Hermitian network model to make it operate as a
multi-state associative memory. The new method was shown to outperform the generalized
Hebb rule, which had so far been the only known learning rule for this model in
associating phase-modulated integral information. The recall performance of the resulting
network has been illustrated on restoring gray-scale images, and the results have been found
satisfactory.
A design procedure to ensure perfect storage of uncorrelated memory vectors into a two-
layer recurrent network has been finally proposed in Chapter 6. The adjustability of the
number of neurons in the hidden-layer of the proposed model allows the designer to attain
any desired degree of performance with the cost of a longer training. Though the model
has already been shown to be much superior to the conventional discrete Hopfield network
in association task, it still needs to be analyzed theoretically to reveal the convergence
conditions and the energy function. The proposed method and its analysis should be
extended to be also applicable in the network’s asynchronous operation mode.
REFERENCES
Aizenberg, N., & Aizenberg, I. (1992). CNN based on multivalued neuron as a model of
associative memory for gray-scale images. Proc. 2nd Int. Workshop on Cellular Neural
Networks and their Applications (CNNA-92), Munich, Germany, 36.
Aksın, D. (2002). A high-precision high-resolution WTA-MAX circuit of O(N) complexity.
IEEE Trans. Circuits and Systems Part II, 49, 48–53.
Anderson, J. (1995). Introduction to neural networks. Cambridge, MA: MIT Press.
Athithan, G., & Dasgupta, C. (1997). On the problem of spurious patterns in neural
associative memory models. IEEE Trans. Neural Networks, 8, 1483–1491.
Bazaraa, M., & Jarvis, J. (1977). Linear programming and network flows. New York: John
Wiley & Sons.
Bertsekas, D. (1995). Nonlinear programming. Belmont, MA: Athena Scientific.
Bramley, R., & Winnicka, N. (1996). Solving linear inequalities in a least squares sense.
SIAM J. Sci. Comp., 17, 275–286.
Bruck, J., & Goodman, J. (1988). A generalized convergence theorem for neural networks.
IEEE Trans. Information Theory, 34, 1089–1092.
Bruck, J., & Roychowdhury, V. (1990). On the number of spurious memories in the Hopfield
model. IEEE Trans. Information Theory, 36, 393–397.
Dembo, A. (1989). On the capacity of associative memories with linear threshold functions.
IEEE Trans. Information Theory, 35, 709–720.
Dogan, H., & Guzelis, C. (2003). A gradient network for vector quantization and its image
compression applications. Lecture Notes in Computer Science, (to appear).
Duda, R., & Hart, P. (1973). Pattern classification and scene analysis. New York: Wiley.
Elizalde, E., & Gomez, S. (1992). Multistate perceptrons: Learning rule and perceptron of
maximal stability. J. Phys. A, Math. Gen., 25, 5039–5045.
Erdos, P., & Erne, M. (1973). Clique numbers of graphs. Discrete Mathematics, 59, 235–
242.
Furedi, Z. (1987). The number of maximal independent sets in connected graphs. Journal
of Graph Theory, 4, 463–470.
Garey, M., & Johnson, D. (1979). Computers and intractability: A guide to the theory of
NP-completeness. New York: W.H. Freeman.
Ghosh, J., Lacour, P., & Jackson, S. (1994). Ota-based neural network architectures with
on-chip tuning of synapses. IEEE Trans. Circuits and Systems-II, 41, 49–58.
Golden, R. (1986). The ’brain-state-in-a-box’ is a gradient descent algorithm. Journal of
Mathematical Psychology, 30, 73–80.
Han, S.-P. (1980). Least squares solution of linear inequalities.
Haykin, S. (1994). Neural networks: A comprehensive foundation. New York: McMillan
College.
Hecht-Nielsen, R. (1990). Neurocomputing. Reading, MA: Addison-Wesley.
Hirsch, M., & Smale, S. (1974). Differential equations, dynamical systems, and linear
algebra. New York: Academic Press.
Hopfield, J. (1982). Neural networks and physical systems with emergent collective
computational abilities. Proc. Natl. Acad. Sci. USA, 79, 2554–2558.
Ikeda, N., Watta, P., Artıklar, M., & Hassoun, M. (2002). A two-level Hamming network for
high performance associative memory. Neural Networks, 14, 1189–1200.
Jagota, A. (1995). Approximating maximum clique with a Hopfield network. IEEE Trans.
Neural Networks, 6, 724–735.
Jankowski, S., Lozowski, A., & Zurada, J. (1996). Complex-valued multistate neural
associative memory. IEEE Trans. Neural Networks, 7, 1491–1496.
Kohonen, T. (1977). Associative memory: A system-theoretical approach. Heidelberg:
Springer-Verlag.
Kohonen, T. (1988). Self-organization and associative memory. Berlin: Springer-Verlag.
Li, J., Michel, A., & Porod, W. (1989). Analysis and synthesis of a class of neural networks:
Linear systems operating on a closed hypercube. IEEE Trans. Circuits and Systems-I,
36, 1405–1422.
Luenberger, D. (1973). Introduction to linear and nonlinear programming. Reading, MA:
Addison-Wesley.
Mangasarian, O. (1994). Nonlinear programming. Philadelphia: SIAM.
Mano, M. (1991). Digital design. New York: Prentice Hall.
Mertens, S., Koehler, H., & Bos, S. (1991). Learning grey-toned patterns in neural networks.
J. Phys. A, Math. Gen., 24, 4941–4952.
Michel, A., Farrell, J., & Porod, W. (1989). Qualitative analysis of neural networks. IEEE
Trans. Circuits and Systems-I, 36, 229–243.
Michel, A., & Liu, D. (2002). Qualitative analysis and synthesis of recurrent neural
networks. New York: Marcel Dekker.
Michel, A., Si, J., & Yen, G. (1991). Analysis and synthesis of a class of discrete-time
neural networks described on hypercubes. IEEE Trans. Neural Networks, 2, 32–46.
Michel, A. N., & Farrell, J. (1989). Associative memories via artificial neural networks.
IEEE Control Systems Magazine, 10, 1405–1422.
Moon, J., & Moser, L. (1965). On cliques in graphs. Isr. J. Math., 3, 23–28.
Muezzinoglu, M. (2000). A graph theoretical approach to the binary dynamical associative
memory design. M.Sc. Thesis: Istanbul Technical University.
Muezzinoglu, M., & Guzelis, C. (2001). A Boolean Hebb rule for binary associative
memory design. Proc. 44th IEEE Midwest Symposium on Circuits and Systems
(MWSCAS'01), Dayton, OH, 713–716.
Muezzinoglu, M., & Guzelis, C. (2002). Associative memory design via path embedding
into a graph. Proc. 11th Turkish Symposium on Artificial Intelligence and Neural
Networks (TAINN’2002), Istanbul, Turkey, 65–71.
Muezzinoglu, M., & Guzelis, C. (2003a). A Boolean Hebb rule for binary associative
memory design. IEEE Trans. Neural Networks, -, (to appear).
Muezzinoglu, M., & Guzelis, C. (2003b). Perfect storage of binary patterns in recurrent
multilayer associative memory. Proc. 12th Turkish Symposium on Artificial Intelligence
and Neural Networks (TAINN'2003), Canakkale, Turkey, (to appear).
Muezzinoglu, M., & Guzelis, C. (2003c). Perfect storage of binary patterns in recurrent
multi-layer associative memory. Neural Processing Letters, -, (submitted, in review).
Muezzinoglu, M., Guzelis, C., & Zurada, J. (2003a). Construction of energy landscape
for discrete Hopfield associative memory with guaranteed attractiveness of fixed points.
Proc. 1st International IEEE EMBS Conference on Neural Engineering, Capri, Italy.
Muezzinoglu, M., Guzelis, C., & Zurada, J. (2003b). Construction of energy landscape for
discrete Hopfield associative memory. IEEE Trans. Neural Networks, -, (to appear).
Muezzinoglu, M., Guzelis, C., & Zurada, J. (2003c). A new design method for complex-
valued multistate Hopfield associative memory. Proc. International Joint Conference on
Neural Networks (IJCNN’03), Portland, OR, (to appear).
Muezzinoglu, M., Guzelis, C., & Zurada, J. (2003d). A new design method for the complex-
valued multistate Hopfield associative memory. IEEE Trans. Neural Networks, 14, 891–
899.
Nadal, J., & Rau, A. (1991). Storage capacity of Potts-perceptron. J. Phys. I France, 1,
1109–1121.
Pardalos, P., & Rodgers, G. (1992). A branch-and-bound algorithm for the maximum clique
problem. Computers Operations Research, 19, 363–375.
Park, J., Kim, Y., Eom, I., & Lee, K. (1993). Economic load dispatch for piecewise quadratic
cost function using Hopfield neural network. IEEE Trans. Power Systems, 8, 1030–1038.
Pekergin, F., Morgul, O., & Guzelis, C. (1999). A saturated linear dynamical network for
approximating maximum clique. IEEE Trans. Circuits and Systems-I, 46, 677–685.
Personnaz, L., Guyon, I., & Dreyfus, G. (1986). Collective computational properties of
neural networks: New learning mechanism. Phys. Rev. A, 34, 4217–4228.
Pınar, M., & Chen, B. (1999). ℓ1 solution of linear inequalities. IMA J. Numer. Anal., 19,
19–37.
Ritz, S., Anderson, J., Silverstein, J., & Jones, R. (1977). Distinctive features, categorical
perception, and probability learning: Some applications of a neural model. Psychological
Review, 84, 413–451.
Rosenblatt, F. (1962). Principles of neurodynamics: Perceptrons and the theory of brain
machines. Washington: Spartan.
Schwarz, S., & Mathis, W. (1992). Cellular neural network design with continuous
signals. Proc. 2nd Int. Workshop on Cellular Neural Networks and their Applications
(CNNA-92), Munich, Germany, 17.
Sengor, N., Cakır, Y., Guzelis, C., Pekergin, F., & Morgul, O. (1999). An analysis of
maximum clique formulations and saturated linear dynamical network. ARI, 268–276.
Shanmukh, K., & Venkatesh, Y. (1995). Generalised scheme for optimal learning in
recurrent neural networks. IEE Proc.-Vis. Image Signal Process., 142, 71–77.
Shrivastava, Y., Dasgupta, S., & Reddy, S. (1992). Guaranteed convergence in a class of
Hopfield networks. IEEE Trans. Neural Networks, 3, 951–961.
Shrivastava, Y., Dasgupta, S., & Reddy, S. (1995). Nonpositive Hopfield networks for
unidirectional error correcting coding. IEEE Trans. Circuits and Systems-I, 42, 293–
306.
Sompolinsky, H., & Kanter, I. (1986). Temporal association in asymmetric neural networks.
Physical Review Letters, 57, 2861–2864.
Sudharsanan, S., & Sundareshan, M. (1991). Equilibrium characterization of dynamical
neural networks and a systematic synthesis procedure for associative memories. IEEE
Trans. Neural Networks, 2, 509–521.
Suter, B., & Kabrisky, M. (1992). On a magnitude preserving iterative maxnet algorithm.
Neural Computation, 4, 224–233.
Tan, S., Hao, J., & Vandewalle, J. (1991). Determination of weights for Hopfield associative
memory by error back propagation. Proc. IEEE Int. Symposium on Circuits and Systems,
5, 2491.
Vavasis, S. (1991). Nonlinear optimization: Complexity issues. New York: Oxford
University Press.
Vidyasagar, M. (1993). Nonlinear systems analysis, second edition. New Jersey: Prentice-
Hall Publications.
Watta, P., & Hassoun, M. (1991). Exact associative neural memory dynamics utilizing
Boolean matrices. IEEE Trans. Neural Networks, 2(4), 437–448.
Xiangwu, M., & Hu, C. (1997). Using evolutionary programming to construct Hopfield
neural networks. Proc. IEEE Int. Conference on Intelligent Processing Systems, 1, 571.
Zurada, J. (1992). Introduction to artificial neural systems. St. Paul: West Pub. Co.
Zurada, J., Cloete, I., & van der Poel, E. (1994). Neural associative memories with multiple
stable states. Proc. 3rd Int. Conf. Fuzzy Logic, Neural Nets, and Soft Computing, Iizuka,
Japan, 45–51.
Zurada, J., Cloete, I., & van der Poel, E. (1996). Generalized Hopfield networks with
multiple stable states. Neurocomputing, 13, 135–149.
APPENDIX
Proof of Theorem 2
"Only if" Part: i) COMP1 is obviously necessary for compatibility: if COMP1 is
violated, then for some x, y ∈ M there exist no i, j such that x_i = y_j = 1 and
x_j = y_i = 0, so either x_i = 1 ⇒ y_i = 1 for all i, or y_i = 1 ⇒ x_i = 1 for all i.
This means either that the independent set S_y covers S_x, i.e. S_x ⊆ S_y, or that
S_y ⊆ S_x, both violating Case 1.
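The pairwise COMP1 test amounts to checking that neither support set covers the other, which is easy to verify mechanically. A minimal Python sketch, assuming stored vectors are encoded as 0/1 tuples (the function name comp1 is an illustrative choice, not notation from the text):

```python
def comp1(x, y):
    """COMP1 for a pair of distinct 0/1 vectors: there must exist
    indices i, j with x[i] = y[j] = 1 and x[j] = y[i] = 0, i.e.
    neither of the supports S_x, S_y may cover the other."""
    x_not_y = any(xi == 1 and yi == 0 for xi, yi in zip(x, y))
    y_not_x = any(xi == 0 and yi == 1 for xi, yi in zip(x, y))
    return x_not_y and y_not_x

print(comp1((1, 1, 0), (0, 1, 1)))  # True: supports are incomparable
print(comp1((1, 1, 0), (1, 0, 0)))  # False: S_y is a subset of S_x
```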
ii) To prove the necessity of COMP2, consider three distinct characteristic vectors
x, y, z ∈ M that mutually satisfy COMP1 but violate COMP2. There are two cases to
analyze:
I) x_j = x_k = y_i = y_k = z_i = z_j = 1 and x_i = y_j = z_k = 0, without any w ∈ M
such that w_i = w_j = w_k = 1.
II) x_j = x_k = y_i = y_k = z_i = z_j = 1, x_i = y_j = z_k = 0, and there exists some
w ∈ M such that w_i = w_j = w_k = 1; but, for each such w, w_l = 0 for some l with
x_l = y_l = z_l = 1.
Embedding this set of vectors into a graph G = (V, E) will cause an extraneous MIS
in both cases. To see this, suppose we begin with a fully connected graph, i.e.
E = {(i, j) ∈ V × V : i ≠ j}, and embed the vectors one by one. Embedding a binary
vector d ∈ M into G is then equivalent to removing from the graph every edge
(p, q) ∈ E with d_p = d_q = 1. Consequently, an existing edge (p, q) in G cannot be
excluded without embedding a vector whose p-th and q-th entries are both 1. Now, the
vectors x, y, and z considered above remove the edges (j, k), (i, k), and (i, j),
respectively. If there exists no w ∈ M such that w_i = w_j = w_k = 1 (as stated in
I)), then, after the embedding procedure, the resulting graph contains an MIS
{i, j, k} imposed by none of the embedded vectors, causing the violation mentioned
in Case 2. Now assume II), which means x_j = x_k = x_l = y_i = y_k = y_l = z_i =
z_j = z_l = w_i = w_j = w_k = 1 and x_i = y_j = z_k = w_l = 0 for x, y, z, w ∈ M,
and no v ∈ M with v_i = v_j = v_k = v_l = 1. Such a quadruple results in an
extraneous independent set {i, j, k, l}, which is necessarily a subset of an
extraneous MIS, causing the violation mentioned in Case 2.
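The embedding procedure above (start from the complete graph, then delete edge (p, q) whenever some embedded vector has 1s in both positions p and q) can be sketched in a few lines of Python; the helper name embed and the 0/1-tuple encoding are assumptions for illustration:

```python
from itertools import combinations

def embed(M, n):
    """Embed a set M of 0/1 vectors of length n into a graph on nodes
    0..n-1: begin with all edges present, then remove (p, q) whenever
    some vector d in M satisfies d[p] = d[q] = 1."""
    edges = set(combinations(range(n), 2))
    for d in M:
        edges -= {(p, q) for (p, q) in combinations(range(n), 2)
                  if d[p] == 1 and d[q] == 1}
    return edges

# (1, 1, 0) removes edge (0, 1); (0, 1, 1) removes edge (1, 2).
print(embed([(1, 1, 0), (0, 1, 1)], 3))  # -> {(0, 2)}
```

Each stored vector's support then forms an independent set of the resulting graph, which is exactly what the extraneous-MIS argument exploits.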
"If" Part: The proof is by contradiction.
i) Assume first that compatibility is violated by a pair of distinct characteristic
vectors x, y ∈ M in the way stated in Case 1. As a direct consequence of the
embedding procedure (3), S_x ⊆ S_y implies x_i = 1 ⇒ y_i = 1 for all i. This
violates COMP1.
ii) We will show that the existence of an extraneous MIS implies the existence of a
triple of characteristic vectors violating COMP2.
Any MIS of cardinality 2 is a pair {i, j} of nodes having no edge between them and
satisfying the other conditions required of an MIS. In the embedding procedure
explained above, the edge (i, j) cannot be removed from the graph without embedding
a vector x whose i-th and j-th entries are both unity. Since this missing edge must
be imposed by a vector from M, such an MIS cannot be extraneous. Hence, any
extraneous MIS must be of cardinality 3 or more.
Consider now the case where the assumed extraneous MIS has cardinality 3; denote it
by S^e_3 = {i, j, k}. Since it is extraneous, S^e_3 cannot be created by a single
vector x in M. By the definition of an independent set, a graph containing S^e_3 as
an independent set contains none of the edges (j, k), (i, k), and (i, j). For a pair
of distinct vectors x, y removing these three edges from the graph, there is only
one possibility that needs care: one of the vectors, say x, removes the edge (j, k)
and the other, y, removes the edges (i, k) and (i, j). This requires x_j = x_k = 1
and y_i = y_j = y_k = 1. But then S^e_3 is imposed by y, so it is not extraneous.
By the above analysis, any extraneous MIS must be of cardinality 3 or more, and an
extraneous MIS of cardinality 3 can be created neither by a single characteristic
vector nor by a pair of them. In fact, as described in the "only if" part, a triple
of vectors violating COMP2 while satisfying COMP1 creates an extraneous MIS of
cardinality 3.
Moreover, this is the unique way for a triple to cause an extraneous MIS of
cardinality 3, or merely an extraneous independent set of cardinality 3. (The other
ways can be eliminated similarly to the eliminations done for vector pairs.)
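The conclusion of this step, that a COMP2-violating triple yields a spurious memory, can be checked by brute force on small instances. A hypothetical Python sketch (all names are illustrative) that embeds a vector set as described above, enumerates the maximal independent sets of the resulting graph, and flags those not imposed by any stored vector:

```python
from itertools import combinations

def maximal_independent_sets(n, edges):
    """Brute-force all MISs of a graph on nodes 0..n-1 (adequate for
    the small illustrative sizes used here)."""
    def independent(S):
        return not any((min(p, q), max(p, q)) in edges
                       for p, q in combinations(S, 2))
    ind = [set(S) for r in range(1, n + 1)
           for S in combinations(range(n), r) if independent(S)]
    # keep only sets not strictly contained in another independent set
    return [S for S in ind if not any(S < T for T in ind)]

def extraneous(n, M):
    """MISs of the embedded graph that equal the support of no stored
    vector, i.e. the spurious memories discussed in the text."""
    # Re-derive the embedded graph: complete graph minus every edge
    # (p, q) covered by 1s of some stored vector.
    edges = {(p, q) for p, q in combinations(range(n), 2)
             if not any(d[p] == 1 and d[q] == 1 for d in M)}
    supports = [set(i for i, di in enumerate(d) if di == 1) for d in M]
    return [S for S in maximal_independent_sets(n, edges)
            if S not in supports]

# Case I triple: mutually COMP1-compatible, no w with w_i = w_j = w_k = 1.
M = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(extraneous(3, M))  # -> [{0, 1, 2}]
```

For this triple every edge of the complete graph on three nodes is removed, so the lone MIS {0, 1, 2} is the support of no stored vector; adding w = (1, 1, 1) to M restores COMP2 and the spurious set disappears.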
What remains to be proven is that an extraneous MIS S^e_{≥3} with cardinality not
less than 3 can only be created by some triple of vectors violating COMP2 while
mutually satisfying COMP1. Let X be the set of S^e_{≥3}-nonredundant vectors
responsible for the existence of S^e_{≥3}, i.e. each vector in X removes at least
one edge between two nodes both in S^e_{≥3}, and no two vectors in X remove the same
set of S^e_{≥3}-related edges. For any vector x ∈ X, define the index sets
I^0_x = {i ∈ S^e_{≥3} : x_i = 0} and I^1_x = {i ∈ S^e_{≥3} : x_i = 1}, and observe
that I^0_x ≠ ∅ and |I^1_x| ≥ 2. Also note that x removes none of the edges
(i, j) ∈ I^0_x × I^0_x, while it contributes to the existence of S^e_{≥3} by
removing some edges (j, k) ∈ I^1_x × I^1_x.
Let L^0_x be any strict subset of I^0_x. Then the set Σ^e_{≥3} = S^e_{≥3} − L^0_x is
also extraneous. To see this, suppose Σ^e_{≥3} is not extraneous while S^e_{≥3} is.
Then there should exist a vector u ∈ M which directly imposes Σ^e_{≥3} in the
resulting graph. By the definition of L^0_x, x contributes only to the
extraneousness of Σ^e_{≥3}, which is assumed non-extraneous; this contradicts
x ∈ X. Hence the extraneousness of S^e_{≥3} implies the extraneousness of Σ^e_{≥3}.
Now consider a specific x ∈ X and choose an L^0_x with |L^0_x| = |I^0_x| − 1.
Extracting the indices belonging to L^0_x from each vector in X yields a new set X'
of Σ^e_{≥3}-nonredundant vectors which creates Σ^e_{≥3}. Note that X' contains at
least three vectors. Let ξ denote the reduced form of x, and observe that
I^1_ξ = I^1_x while I^0_ξ consists of a single element, say i. The connections of
node i to the other nodes, whose indices constitute Σ^e_{≥3} − {i} = I^1_ξ, should
be removed by some other vectors in X'. For such a vector η ∈ X', we must have
η_i = 1. On the other hand, there should exist a j ∈ Σ^e_{≥3} with η_j = 0. Then
there is a third vector ζ ∈ X' which removes the edge (i, j), so ζ_i = ζ_j = 1.
There necessarily exists an index k ∈ Σ^e_{≥3} such that ζ_k = 0 while η_k = 1;
otherwise η_k = 0 whenever ζ_k = 0, contradicting the Σ^e_{≥3}-nonredundancy of η
and ζ.
This means that there exists a triple x, y, z ∈ M, the originals of ξ, η, ζ, having
the pattern mentioned in COMP2. The independent set {i, j, k} created by x, y, and z
is either extraneous, in which case there exists no w ∈ M such that
w_i = w_j = w_k = 1 and COMP2 is violated; or it is not extraneous. Assume now that
none of the triples {i_t, j_t, k_t}_t obtained in this way is extraneous. This means
that for each t there exists a w^t ∈ M such that w^t_{i_t} = w^t_{j_t} = w^t_{k_t}
= 1. Then the implication x_l = y_l = z_l = 1 ⇒ w^t_l = 1 for all l and t
contradicts the extraneousness of S^e_{≥3}. So this implication must be violated for
some l, which is equivalent to saying that there exists a triple of vectors
violating COMP2.
NOTES