
Page 1: Modular Neural Networks II - University of Calgary in Alberta (pages.cpsc.ucalgary.ca/.../Winter2000/CPSC533/Slides/05.2.5-ART.pdf)

Modular Neural Networks II

Presented by: David Brydon, Karl Martens, David Pereira

CPSC 533 - Artificial Intelligence, Winter 2000
Instructor: C. Jacob
Date: 16-March-2000

Page 2:

Presentation Agenda

A Reiteration Of Modular Neural Networks
Hybrid Neural Networks
Maximum Entropy
Counterpropagation Networks
Spline Networks
Radial Basis Functions

Note: The information contained in this presentation has been obtained from Neural Networks: A Systematic Introduction by R. Rojas.

Page 3:

A Reiteration of Modular Neural Networks

There are many different types of neural networks - linear, recurrent, supervised, unsupervised, self-organizing, etc. Each of these neural networks has a different theoretical and practical approach.

However, each of these different models can be combined.

How? Each of the aforementioned neural networks can be transformed into a module that can be freely intermixed with modules of other types of neural networks.

Thus, we have Modular Neural Networks.

Page 4:

A Reiteration of Modular Neural Networks

But WHY do we have Modular Neural Network Systems?

To Reduce Model Complexity
To Incorporate Knowledge
To Fuse Data and Predict Averages
To Combine Techniques
To Learn Different Tasks Simultaneously
To Incrementally Increase Robustness
To Emulate Its Biological Counterpart

Page 5:

Hybrid Neural Networks

A very well-known and promising family of architectures was developed by Stephen Grossberg. It is called ART - Adaptive Resonance Theory.

It is closer to the biological paradigm than feed-forward networks or standard associative memories. The dynamics of the network resemble learning in humans, and one-shot learning can be recreated with this model.

There are three different architectures in this family:
ART-1: Uses Boolean values
ART-2: Uses real values
ART-3: Uses differential equations

Page 6:

Hybrid Neural Networks

Each category in the input space is represented by a vector. The ART networks classify a stochastic series of vectors into clusters. All vectors located inside the cone around a weight vector are considered members of that specific cluster: each unit fires only for vectors located inside its associated 'cone' of radius 'r'. The value 'r' is inversely proportional to the attention parameter of the unit.

Large 'r' means the classification of the input space is coarse.

Small 'r' means the classification of the input space is fine.
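As an illustration of the cone test, a unit's membership check can be sketched as comparing the angle between an input vector and the unit's weight vector against a radius r. This is only a sketch; the function name and the angle-based formulation are mine, not the slides':

```python
import math

def in_cone(w, x, r):
    """Check whether input vector x lies inside the cone of angular
    radius r (in radians) around weight vector w."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    # Clamp to [-1, 1] to guard acos against floating-point drift.
    angle = math.acos(max(-1.0, min(1.0, dot / (norm_w * norm_x))))
    return angle <= r

w = [1.0, 0.0]
print(in_cone(w, [0.9, 0.1], 0.3))   # small angle -> inside the cone
print(in_cone(w, [0.0, 1.0], 0.3))   # 90 degrees  -> outside the cone
```

A larger r admits more of the input space into the cluster, which is why large r corresponds to a coarse classification.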

Page 7:

Hybrid Neural Networks

Fig. 1. Vector clusters and attention parameters

Page 8:

Hybrid Neural Networks

Once the weight vectors have been found, the network computes whether new data can or cannot be classified by the existing clusters. If not, a new cluster is created with a new associated weight vector.

ART networks have two major advantages:
Plasticity: the network can always react to unknown inputs (by creating a new cluster with a new weight vector, if the given input cannot be classified by the existing clusters).
Stability: existing clusters are not deleted by the introduction of new inputs (new clusters will just be created in addition to the old ones).

However, enough potential weight vectors must be provided.

Page 9:

Hybrid Neural Networks

Fig. 2. The ART-1 Architecture

Page 10:

Hybrid Neural Networks

The Structure of ART-1 (Part 1 of 2):

There are two basic layers of computing units.

Layer F1 receives binary input vectors from the input sites.

As soon as an input vector arrives it is passed to layer F1 and from there to layer F2.

Layer F2 contains elements which fire according to the "winner-takes-all" method. (Only the element receiving the maximal scalar product of its weight vector and the input vector fires.)

When a unit in layer F2 has fired, the negative weight turns off the attention unit. Also, the winning unit in layer F2 sends back a 1 through the connections between layers F2 and F1.

Now each unit in layer F1 receives as input the corresponding component of the input vector x and of the weight vector w.

Page 11:

Hybrid Neural Networks

The Structure of ART-1 (Part 2 of 2):

The i-th F1 unit compares xi with wi and outputs the product xiwi. The reset unit receives this information and also the components of x, weighted by p, the attention parameter, so that its own computation is

p(x1 + x2 + ... + xn) - x.w > 0, which is the same as

(x.w) / (x1 + x2 + ... + xn) < p

The reset unit fires only if the input lies outside the attention cone of the winning unit. A reset signal is sent to layer F2, but only the winning unit is inhibited. This in turn activates the attention unit and a new round of computation begins. Hence, there is resonance.
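The reset computation can be sketched directly from the inequality above (the function name is a hypothetical helper of mine; p plays the role of the attention parameter):

```python
def reset_fires(x, w, p):
    """Reset unit test: fires (inhibiting the winning F2 unit) when
    p*(x1 + ... + xn) - x.w > 0, i.e. when x.w / sum(x) < p."""
    dot = sum(xi * wi for xi, wi in zip(x, w))
    return p * sum(x) - dot > 0

# Poor match: x.w = 1, sum(x) = 2, so 0.8*2 - 1 = 0.6 > 0 -> reset fires.
print(reset_fires([1, 1, 0, 0], [1, 0, 0, 0], 0.8))   # True
# Perfect match: x.w = 2, so 0.8*2 - 2 = -0.4 -> no reset, resonance.
print(reset_fires([1, 1, 0, 0], [1, 1, 0, 0], 0.8))   # False
```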

Page 12:

Hybrid Neural Networks

The Structure of ART-1 (Some Final Details):

The weight vectors in layer F2 are initialized with all components equal to 1, and p is selected to satisfy 0 < p < 1. This ensures that eventually an unused vector will be recruited to represent a new cluster.

The selected weight vector w is updated by pulling it in the direction of x. This is done in ART-1 by turning off all components of w which are zero in x.

The purpose of the reset signal is to inhibit all units that do not resonate with the input. A unit in layer F2 which is still unused can be selected for the new cluster containing x. In this way, sufficiently different input data can create a new cluster. By modifying the value of the attention parameter p, we can control the number of clusters and how wide they are.
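Putting these pieces together, a minimal ART-1-style clustering loop might look like the following sketch. It follows the description above (weights initialized to all ones, vigilance test x.w / sum(x) >= p, update by switching off components of w that are zero in x); the function name, the use of sorting to emulate the reset-driven search, and the assumption of nonzero binary inputs are mine:

```python
def art1_cluster(inputs, n_units, p):
    """Minimal ART-1-style clustering of nonzero binary vectors (a sketch).
    Weight vectors start as all ones, so an unused unit always passes the
    vigilance test and can be recruited for sufficiently new inputs."""
    n = len(inputs[0])
    weights = [[1] * n for _ in range(n_units)]
    labels = []
    for x in inputs:
        # Emulate the reset-driven search: try units in order of
        # decreasing scalar product (the winner first).
        order = sorted(range(n_units),
                       key=lambda j: -sum(wi * xi
                                          for wi, xi in zip(weights[j], x)))
        for j in order:
            match = sum(wi * xi for wi, xi in zip(weights[j], x)) / sum(x)
            if match >= p:  # inside the attention cone: resonance
                weights[j] = [wi & xi for wi, xi in zip(weights[j], x)]
                labels.append(j)
                break
        else:
            labels.append(None)  # all potential weight vectors used up
    return labels, weights

labels, weights = art1_cluster([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]],
                               n_units=2, p=0.7)
print(labels)  # the first two inputs share a cluster; the third gets a new one
```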

Page 13:

Hybrid Neural Networks

The Structure of ART-2 and ART-3

ART-2 uses vectors that have real-valued components instead of Boolean components. The dynamics of the ART-2 and ART-3 models are governed by differential equations. However, computer simulations consume too much time. Consequently, implementations using analog hardware, or a combination of optical and electronic elements, are better suited to this kind of model.

Page 14:

Hybrid Neural Networks

Maximum entropy

So what's the problem with ART? It tries to build clusters of the same size, independently of the distribution of the data.

So, is there a better solution? Yes: allow the clusters to have varying radii, with a technique called the "Maximum Entropy Method".

What is "entropy"? The entropy H of a data set of N points assigned to k different clusters c1, c2, ..., ck is given by

H = -(p(c1)log(p(c1)) + p(c2)log(p(c2)) + ... + p(ck)log(p(ck)))

where p(ci) denotes the probability of hitting the i-th cluster when an element of the data set is picked at random. Since the probabilities add up to 1, the clustering that maximizes the entropy is one for which all cluster probabilities are identical. This means that the clusters will tend to cover the same number of points.
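As a sketch, the entropy of a clustering can be computed from the cluster occupancy counts (the function name is mine):

```python
import math

def clustering_entropy(counts):
    """Entropy H = -sum p(ci)*log(p(ci)) of a clustering, where counts[i]
    is the number of data points assigned to cluster ci."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

# Equal-sized clusters maximize the entropy.
balanced = clustering_entropy([50, 50])   # log 2, about 0.693
skewed = clustering_entropy([90, 10])     # about 0.325
print(balanced > skewed)                  # True
```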

Page 15:

Hybrid Neural Networks

Maximum entropy

However, there is still a problem whenever the number of elements of each class in the data set is different. Consider the case of unlabeled speech data: some phonemes are more frequent than others, and if a maximum entropy method is used, the boundaries between clusters will deviate from the natural solution and classify some data erroneously.

So how do we solve this problem? With the "Bootstrapped Iterative Algorithm":

cluster: Compute a maximum entropy clustering with the training data. Label the original data according to this clustering.
select: Build a new training set by selecting from each class the same number of points (random selection with replacement). Go to the previous step.
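The "select" step above can be sketched as a balanced resampling routine; the "cluster" step would require a maximum entropy clustering procedure, which is not shown here. The function and parameter names are my own:

```python
import random

def resample_balanced(points, labels, per_class, seed=0):
    """The 'select' step: build a new training set containing the same
    number of points from each cluster (random choice with replacement)."""
    rng = random.Random(seed)
    by_class = {}
    for point, label in zip(points, labels):
        by_class.setdefault(label, []).append(point)
    return [rng.choice(members)
            for members in by_class.values()
            for _ in range(per_class)]

# Two clusters of unequal size become equally represented after resampling.
sample = resample_balanced([1, 2, 3, 4, 5], [0, 0, 0, 1, 1], per_class=3)
print(len(sample))  # 6: three points drawn from each of the two clusters
```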

Page 16:

Hybrid Neural Networks

Counterpropagation network

Are there any other hybrid network models? Yes, the counterpropagation network, as proposed by Hecht-Nielsen.

So what are counterpropagation networks designed for? To approximate a continuous mapping f and its inverse f^-1.

A counterpropagation network consists of an n-dimensional input vector which is fed to a hidden layer consisting of h cluster vectors. The output is generated by a single linear associator unit. The weights in the network are adjusted using supervised learning.

The above network can successfully approximate functions of the form f: R^n -> R.
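A minimal sketch of the forward pass, assuming a winner-takes-all hidden layer chosen by nearest cluster vector (the slides do not fix the details; the Euclidean distance measure and all names are my assumptions):

```python
def counterprop_output(x, clusters, z):
    """Forward pass of a simplified counterpropagation network: the hidden
    cluster unit closest to x wins (winner-takes-all), and the output is
    that unit's weight z[j] on the single linear associator."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    winner = min(range(len(clusters)), key=lambda j: sqdist(x, clusters[j]))
    return z[winner]

# Two cluster vectors tiling the input space, each with a stored output value.
clusters = [[0.0, 0.0], [1.0, 1.0]]
z = [-1.0, 2.0]
print(counterprop_output([0.1, 0.2], clusters, z))   # nearest cluster 0 -> -1.0
print(counterprop_output([0.9, 0.8], clusters, z))   # nearest cluster 1 -> 2.0
```

The output is piecewise constant over the regions of the input-space tiling, which is exactly the staircase approximation shown in Fig. 4.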

Page 17:

Hybrid Neural Networks

Fig. 3 Simplified counterpropagation network

Page 18:

Hybrid Neural Networks

Counterpropagation network

The training phase is completed in two parts:

Training of the hidden layer into a clustering of the input space that corresponds to an n-dimensional Voronoi tiling. The hidden layer's output needs to be controlled so that only the element with the highest activation fires.
The zi weights are then adjusted to represent the value of the approximation for the cluster region.

This network can be extended to handle multiple output units.
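The two training parts can be sketched as follows, assuming a simple competitive update for the hidden layer and per-region averaging for the zi weights (both details, and all names, are my assumptions rather than the slides' exact procedure):

```python
def train_counterprop(samples, clusters, epochs=10, lr=0.1):
    """Two-part training sketch for the simplified counterpropagation net.
    samples: list of (x, y) pairs; clusters: initial hidden weight vectors."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    def winner(x):
        return min(range(len(clusters)), key=lambda j: sqdist(x, clusters[j]))
    # Part 1: move each winning cluster vector toward its inputs,
    # so a Voronoi-like tiling of the input space emerges.
    for _ in range(epochs):
        for x, _y in samples:
            j = winner(x)
            clusters[j] = [cj + lr * (xi - cj)
                           for cj, xi in zip(clusters[j], x)]
    # Part 2: set z[j] to the average target value over cluster j's region.
    sums = [0.0] * len(clusters)
    counts = [0] * len(clusters)
    for x, y in samples:
        j = winner(x)
        sums[j] += y
        counts[j] += 1
    return clusters, [s / c if c else 0.0 for s, c in zip(sums, counts)]

samples = [([0.0], 0.0), ([0.1], 0.2), ([1.0], 1.0), ([0.9], 0.8)]
clusters, z = train_counterprop(samples, [[0.0], [1.0]])
print(z)  # each z[j] is the average target value in cluster j's region
```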

Page 19:

Hybrid Neural Networks

Fig. 4 Function approximation with a counterpropagation network.

Page 20:

Hybrid Neural Networks

Spline networks

Can the approximation created by a counterpropagation network be improved on? Yes.

In the counterpropagation network, the Voronoi tiling is composed of a series of horizontal tiles, each of which represents an average of the function in that region.

The spline network solves this problem by extending the hidden layer in the counterpropagation network. Each unit is paired with a linear associator; the cluster unit is used to inhibit or activate the linear associator, which is connected to all inputs.

This modification allows the resulting set of tiles to be oriented differently with respect to each other, creating an approximation with a smaller quadratic error and a better solution to the problem.

Training proceeds as before, except that now the linear associators must also be trained.

Page 21:

Hybrid Neural Networks

Fig. 5 Function approximation with linear associators

Page 22:

Hybrid Neural Networks

Radial basis functions

Has a similar structure to that of the counterpropagation network. The difference is that the activation function used for each unit is Gaussian instead of sigmoidal.

The Gaussian approach uses locally concentrated functions. The sigmoidal approach uses a smooth step approach.

Which is better depends on the specific problem at hand. If the function is a smooth step, then the Gaussian approach would require more units, whereas if the function is Gaussian, then the sigmoidal approach will require more units.
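The contrast between the two activation functions can be sketched as follows (the function names and parameter choices are mine):

```python
import math

def gaussian_unit(x, center, sigma):
    """Locally concentrated activation: large only near the unit's center."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2 * sigma ** 2))

def sigmoid_unit(x, w, b):
    """Smooth-step activation: rises along one direction of input space."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-s))

center = [0.0, 0.0]
print(gaussian_unit([0.0, 0.0], center, 1.0))  # 1.0 at the center
print(gaussian_unit([3.0, 0.0], center, 1.0))  # nearly 0 far from the center
```

The Gaussian unit's response decays in every direction away from its center, while the sigmoid unit's response varies only along its weight direction; this is why each kind of target function is represented more economically by units of the matching shape.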