research article a sensitivity-based improving learning...

Research ArticleA Sensitivity-Based Improving Learning Algorithm forMadaline Rule II

Shuiming Zhong12 Yu Xue2 Yunhao Jiang2 Yuanfeng Jin3

Jing Yang4 Ping Yang2 Yuan Tian5 and Mznah Al-Rodhaan5

1 Jiangsu Engineering Center of NetworkMonitoring NanjingUniversity of Information Science andTechnology Nanjing 210044 China2 School of Computer amp Software Nanjing University of Information Science and Technology Nanjing 210044 China3Department of Mathematics Yanbian University Jilin Yanji 133002 China4 Beijing Normal University Zhuhai 519087 China5 Computer Science Department College of Computer and Information Sciences King Saud University Riyadh 11362 Saudi Arabia

Correspondence should be addressed to Yuanfeng Jin yfkimybueducn

Received 6 May 2014 Revised 30 July 2014 Accepted 2 August 2014 Published 27 August 2014

Academic Editor Yuri Vladimirovich Mikhlin

Copyright copy 2014 Shuiming Zhong et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

This paper proposes a new adaptive learning algorithm forMadalines based on a sensitivitymeasure that is established to investigatethe effect of aMadaline weight adaptation on its outputThe algorithm following the basic idea of minimal disturbance as theMRIIdid introduces an adaptation selection rule by means of the sensitivity measure to more accurately locate the weights in real needof adaptation Experimental results on some benchmark data demonstrate that the proposed algorithm has much better learningperformance than the MRII and the BP algorithms

1 Introduction

The ability to learn is the uppermost function of neural net-works Hence to build a proper learning mechanism is a keyissue for all kinds of neural networks This paper focuses onthe learning mechanism of a Madaline especially improvingits learning performance

A Madaline (many Adalines) [1] is a binary feedforwardneural network (BFNN) with supervised learning mecha-nism which is suitable for handling inherently discrete taskssuch as logical calculation pattern recognition and signalprocessing Theoretically a discrete task can be regarded as aspecial case of a continuous one and the BP algorithm [2]based on continuous techniques is by now the most maturelearning algorithm of feedforward neural networks that iswhy the continuous feedforward neural networks (CFNNs)with the BP algorithm are more popular than MadalinesHowever compared with the CFNNs Madalines do havesome obvious advantages in nature that is (1) it is easy fordescription of discrete tasks without extra requirement of dis-cretization (2) it is simple in computation and interpretation

with hard-limit activation function and limited input andoutput states and (3) it is facilitative for hardware imple-mentation with the available VLSI technology Further theprocess of discretizing CFNNrsquos output for classification tasksis quite application-dependent and not suitable to be involvedin a general learning algorithm So a learning algorithm forBFNNs without relying on continuous technique and dis-cretization is worthy of being explored

HoweverMadalines have not yet had an effective learningalgorithm In literatures there have been many studies onMadaline learning since Madaline model was brought for-ward in early 1960s On the whole two main approaches onMadaline learning are well known One is an adaptive appro-ach that extends the perceptron rule [3] or something like thatto Madalines For example Ridgwayrsquos algorithm [4] calledMRI (Madaline rule I) by Winter [5] and the MRII (anextension of theMRI) [5 6] applyMays rule [7] a variation ofthe perceptron rule to Madalines Unfortunately these algo-rithms are still too poor in performance to meet practicalapplications The other is called geometrical constructionapproach [8 9] which fabricates a set of hyperplanes based on

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2014 Article ID 219679 8 pageshttpdxdoiorg1011552014219679

2 Mathematical Problems in Engineering

Adaline (neuron) structure feature to meet the input-outputmapping in the training data set The main disadvantage ofthis approach is that it usually results in a Madaline withmuch larger architecture than the one resulted from an adap-tive learning algorithm and thus it not only complicates thecomputation andhardware implementation but also degradesthe generalization ability of the Madaline In fact thegeometrical construction approach is not a learning algo-rithm in adaptive sense Therefore it is still significant toinvestigate the learning algorithm of Madalines

It is obvious that the basic idea of minimal disturbance [15 6] is crucial to almost all adaptive supervised learning algo-rithms such as the MRII and BP As a key principle the ideatries in each iteration of weight adaptation to not only bettersatisfy the current training sample but also avoid as muchas possible the disturbance on the effects established by previ-ous training samples Although the BP algorithmwas derivedfrom the steepest descend method it actually also followsthis idea [1] Unfortunately the MRII does not well imple-ment the principle and this is the main cause of its poorperformance It can be found that the confidence level (sum-mation of weighted inputs of a neuron) [5 6] adopted by theMRII as a measure for implementing the principle cannotguarantee to select proper neurons for the weight adaptationduring learning

In Madaline learning one of the most important issuesis the effect of variation of network parameters on its outputso Madaline sensitivity (ie the effect of parameter variationon network output) can be used to properly assess this effectBased on the Madaline sensitivity measure a new learningalgorithm (SBALR) [10] of Madalines is proposed and it hasperformedwell in learningHowever why isMRII (a previouslearning algorithm for Madalines) poor in learning perfor-manceThis problem has still not been solved in theoryThispaper tries to theoretically analyze MRIIrsquos disadvantage andto further improve it

This paper presents an improving Madaline learningalgorithm based on aMadaline sensitivitymeasureThemaincontribution of the algorithm is that (1) it analyzes MRIIrsquosshortage in learning performance from the sensitivity point ofview and points out that the confidence level in MRII cannotproperly measure the output perturbation due to weightadaptation (2) it proposes an adaptation selection rule bymeans of the sensitivity measure to improveMRIIThe adap-tation selection rule for neurons couldmore accurately locatethe neurons and thus their weights in real need of adaptationduring learning so as to better implement the minimal dis-turbance principle for greatly improving the learning perfor-mance of MRII

Although both this paper and [10] take the Madalinesensitivity theory as the important theory there are twomaindifferences between them (1) In this paper the sensitivity ismainly taken as a measure to better locate the neurons in realneed of adaptation during learning while in [10] the sensi-tivity is used to guide weight learning rule development (2)in goal this paper adopts the sensitivity theory to analyzeMRIIrsquos shortages in performance and then to improve MRIIwhile [10] takes the sensitivity theory to guide learning ruledesign and thus to develop a completely new learning algo-rithm for Madalines independent of MRII

The rest of this paper is organized as follows In the nextsection the Madaline model and its sensitivity are brieflydescribed Measures for evaluating the effects of weightadaptation are discussed in Section 3 An adaptation selectionrule based on the Madaline sensitivity is proposed in Sec-tion 4 Following in Section 5 is the new Madaline learningalgorithm based on the rule Experimental evaluations andresults are given in Section 6 Finally Section 7 concludes thepaper

2 Madaline Model and Sensitivity Measure

21 Madaline Model and Notations A Madaline is a kind ofbinary multilayer feedforward neural network with a super-vised learning mechanism and consists of a set of neuronscalled Adalines with binary input output and hard limitactivation function The input of an Adaline which is rep-resented by 119883 = (119909

0 1199091 119909

119899)119879 including an extra element

1199090(equiv 1) corresponding to a bias119908

0 is weighted by the weight

119882 = (1199080 1199081 119908

119899)119879 containing the bias and then fed into

an activation function 119891(sdot) to yield an output of the Adalineas

119910 = 119891 (119883119879119882) =

minus1 119883119879119882 lt 0

+1 119883119879119882 ge 0(1)

Generally a Madaline has 119871 layers and each layer 119897 (1 le 119897 le119871) has 119899119897 (119899119897 ge 1) Adalines The form of 1198990 minus 1198991 minus sdot sdot sdot minus 119899119871 isused to represent the Madaline in which each 119899119897 (0 le 119897 le 119871)not only stands for a layer but also indicates the number ofAdalines in the layer 1198990 is an exception which denotes theinput dimension of the Madaline For the 119895th layer 119883119895 (1 le119895 le 119871) denotes the input of all Adalines in the layer and119884119895 (1 le 119895 le 119871) denotes the output of the layer They meet119884119896minus1 = 119883119896 (2 le 119896 le 119871) Particularly 1198831 denotes not only theinput of all Adalines in the first layer but also the input of theentire Madaline 119899119871 denotes the output layer and 119884119871 is theoutput of both the last layer and the entire Madaline

It is well known that a network with a single hiddenlayer and enough hidden neurons is adequate to deal with allmapping problems [11] For simplicity and without loss ofgenerality the following discussion only focuses on theMadalines with single hidden layer

22 Madaline Sensitivity Measure Usually an adaptivesupervised learning is a process of iterative weight adaptationuntil the input-output mapping indicated by a training dataset is established So in each iteration how to correctly locatethe weights in real need of adaptation is a key issue for thesuccess of a Madaline learning algorithm In order to suc-cessfully locate the weights in need of adaptation it is vital toanalyze Madaline output variation caused by the weightadaptation Since the study on Madaline sensitivity aims atexploring the effects of a Madaline weightsrsquo variation on itsoutput it is reasonable to investigate the sensitivity as ameasure to locate the weights

The following subsections will briefly introduce the latestresearch results on the Madaline sensitivity which will be

Mathematical Problems in Engineering 3

employed as a technical tool to support the investigationof Madaline learning mechanism For further details pleaserefer to [12ndash14]

221 Adaline Sensitivity

Definition 1 The sensitivity of an Adaline is defined as theprobability of the Adalinersquos output inversion due to its weightvariation with respect to all inputs which is expressed as

119904 (Δ119882) =119881var119881119899

(2)

where 119881var is the number of inputs whose Adalinersquos output isinversed due to the Adalinersquos weight variation and 119881

119899is the

number of all inputs

The research results have shown that the Adaline sensitiv-ity can be approximately computed as

119904 (Δ119882)

asymp 119904 (Δ119882)

=

arccos ((1198821198821015840) (|119882| 10038161003816100381610038161003816119882101584010038161003816100381610038161003816))

120587

(|Δ119882| |119882|)

120587 for |Δ119882| ≪ |119882|

(3)

where 119882 Δ119882 and 1198821015840 respectively refer to the originalweight the weight variation and the varied weight

Due to the information propagation between layers ina Madaline the Adaline sensitivity will lead to the corre-sponding input variation of all Adalines in the next layer Sothe Adaline sensitivity to its input variation also needs tobe taken into account However it can be easily tackled bytransforming the input variation to an equivalent weightvariation as

119904 (Δ119883) asymp 119904 (Δ119883)

=arccos (sum119899

119894=01199081198941199081015840119894sum119899

119894=01199082119894)

120587

1199081015840

119894= minus119908119894119894 isin 1198951 1198952 119895

119870

119908119894

others

(4)

where Δ119883 denotes the input variation in which only119870 inputelements are varied and 119895

119905(1 le 119905 le 119870) denotes that

the 119895119905th input element of the Adaline is varied and its cor-

responding equivalent varied weight element is 1199081015840119895119905

119899 is theinput dimension of the Adaline

Usually each weight element of an Adaline during train-ing is in the same magnitude thus according to the studyresult of [11] (4) can further be simplified to

119904 (Δ119883) asympradic4119870 (119899 + 1)

120587=(|Δ119883| |119883|)

120587for 119870 ≪ 119899 (5)

222Madaline Sensitivity Based on the structural character-istics of Madalines and the sensitivity of Adalines the sensi-tivity of a layer and a Madaline can separately be defined asfollows

Definition 2 The sensitivity of layer 119897 (1 le 119897 le 119871) is a vectorin which each element is the sensitivity of the correspondingAdaline in the layer due to its input and weight variationswhich is expressed as

119878119897= (119904119897

1 119904119897

2 119904

119897

119899119897)119879

(6)

Definition 3 The sensitivity of aMadaline is the sensitivity ofits output layer that is

119878net = 119878119871= (119904119871

1 119904119871

2 119904

119871

119899119871)119879

(7)

During training it could be helpful to quantitativelyevaluate the output variation of a Madaline due to its weightadaptation Usually there are two ways to evaluate the outputvariation One is the number of inputs at which theMadalineoutput is varied the other is the number of output elementswhose values are varied before and after the weight adap-tation Apparently for Madalines with a vector output thelatter can more truly reflect their output variation before andafter the weight adaptation Therefore the sensitivity of aMadaline can be further quantified as follows

119904net =sum119899119871

119894=1119904119871119894sdot 119881119899

119899119871 sdot 119881119899

=1

119899119871

119899119871

sum119894=1

119904119871

119894 (8)

where 119881119899is the number of all inputs

From (8) the Madaline sensitivity is equal to the averageof sensitivity values of all Adalines in the output layer

3 Measures for Evaluating the Effects ofWeight Adaptation

During the training of a Madaline a weight adaptation willinevitably lead to its output variation In order to make theMadaline obtain the desired output for the current inputsample by weight adaptation and meanwhile meet the mini-mal disturbance principle it is necessary to find a measure toevaluate if the effects of weight adaptation on the output ofthe Madaline are acceptable

31 Sensitivity Measure According to the above Madalinesensitivity definition a Madaline output variation due to itsweight adaptation is just the Madaline sensitivity due to itsweight adaptation that is

output variation = 119904net (Δ119882119897

119894) 1 le 119897 le 119871 1 le 119894 le 119899

119897

(9)

Considering the computation difference of the Adaline sensi-tivity between the hidden layer and the output layer we dividethe computation of (9) into the following two cases


Adaline sensitivity

MadalinesensitivityW and ΔW

(a) For an output-layer Adaline

Adaline sensitivity

Succeeding weightsMadaline sensitivity

W and ΔW

(b) For a hidden-layer Adaline

Figure 1 The computing process of Madaline sensitivity

(a) for the weight adaptation of the 119894th (1 le 119894 le 119899119871)Adaline in the output layer its sensitivity can becomputed by (3) as

1199042

119894(Δ1198822

119894)

=arccos ((1198822

119894(1198822119894+ Δ1198822

119894)) (

100381610038161003816100381610038161198822119894

10038161003816100381610038161003816

100381610038161003816100381610038161198822119894+ Δ1198822

119894

10038161003816100381610038161003816))

120587

(10)

(b) For the weight adaptation of the 119895th (1 le 119895 le 1198991)Adaline in the hidden layer the input variation ofAdalines in its succeeding layer will occur and thiswill propagate layer by layer to the output layer Thusthe sensitivity of the hidden-layer Adaline due to itsweight adaptation is firstly computed by (3) and thesensitivity of the Adalines in the output layer due toits corresponding input variation is computed by (4)and then the sensitivity of each Adaline in the outputlayer can be computed as

1199042

119894(Δ1198821

119895) asymp 1199041

119895(Δ1198821

119895) 1199042

119894(Δ1198832

119894)

= 1199041

119895(

arccos ((100381610038161003816100381610038161198822

119894

10038161003816100381610038161003816

2

minus 21199082119894119895) 100381610038161003816100381610038161198822119894

10038161003816100381610038161003816

2

)

120587)

1 le 119894 le 1198992

(11)

Based on the result of (10) or (11) theMadaline sensitivitydue to its weight adaptation can be calculated by (8)

32 Confidence Level In order to facilitate analysis it is nec-essary to firstly introduce the weight adaptation rule in theMRII namely Mays rule [5] as follows

1198821015840= 119882 + 119889120578(

(119877 minus 119889119883119879119882)

(119899 + 1))119883 for 10038161003816100381610038161003816119883

11987911988210038161003816100381610038161003816lt 120575 (12)

where 119882 1198821015840 and 119883 respectively represent the originalweight the variedweight and the current input of anAdaline119889 is the desired output of the Adaline for the current input120578 (gt 0) and 119877 (gt 0) respectively represent an adaptationconstant and an adaptation level 119899 is the input dimension ofthe Adaline and 120575 (gt 0) is a dead zone value

When the output of an Adaline needs to be reversed itwould have 119889119883119879119882 le 0 So according to Mays rule (12) itfurther has

Δ119882 = 119889120578((119877 minus 119889119883

119879119882)

(119899 + 1))119883 = 119889120578(

(119877 +1003816100381610038161003816100381611988311987911988210038161003816100381610038161003816)

(119899 + 1))119883

(13)

In theMRII the absolute value of weighted input summa-tion |119883119879119882| called confidence level is used as a measure toevaluate the effects of weight adaptation on Madaline outputduring training It is obvious that themeasure has some short-comings for evaluating the effects because the value of |119883119879119882|is only related to the current input and does not take allinputs into consideration However the Madaline sensitivitymeasure covers all inputs with no functional relation to anyindividual input In this sense the confidence level is a localmeasure for the network output variation at a given inputwhile the Madaline sensitivity is a global measure for allpossible inputs

From the sensitivity study one could make furtheranalysis about the shortcomings of the confidence level Theweight adaptation of an Adaline will directly affect the input-output mapping of the Adaline If the input-output mappingvaries this variation will propagate through the network andfinally may cause a variation of the input-output mapping ofthe Madaline Since both Adaline sensitivity and Madalinesensitivity are only functions of 119882 and Δ119882 they canrespectively reflect the output variations of Adalines andMadalines According to (10) and (11) the network outputvariation due to the weight adaptation of an Adaline in aMadaline can be illustrated as in Figure 1

However according to (13) Δ119882 is an increasing functionof the confidence level |119883119879119882| under given parameters 120578 119877119899 and 119883 and its direction is the same as 119889119883 Unfortunately119882 cannot be reflected by the confidence level |119883119879119882| either inthemagnitude or in the direction So it can be seen from Fig-ure 1 that the confidence level |119883119879119882| of an Adaline is unableto exactly reflect the output variation of the Adaline and thusthe output variation of the correspondingMadaline based onweight adaptation rule (12) This shortcoming of the confi-dence level makes it unable to correctly guide the design ofMadaline learning algorithm

33 Simulation Verification for the TwoMeasures In order toverify the correctness of the above theoretical analysis com-puter simulations were carried out A Madaline with thearchitecture of 10ndash5ndash1 and random weights was chosen For


Table 1 Comparison of three different measures

Hidden Adalinenumber

Measures Number of varied outputelements in simulationConfidence Sensitivity

1 2214 0023 262 0058 0008 53 1429 0054 644 1685 0022 285 2959 0038 47

each hidden-layer Adaline from the first one to the lastone its weights were adapted by (12) (in which parameter 120575was ignored) and then the corresponding values of the twomeasures (the confidence level |119883119879119882| and the sensitivitymeasure) and the number of varied output elements due tothe weight adaptation were computed and simulated Theexperimental results are listed in Table 1

According to the values of the two measures and simula-tion results in Table 1 all hidden-layer Adalines are queuedin a sequence with an ascending order Table 2 gives in threerows three sequences in which the first one is regarded as thestandard and each wrongly located Adaline in the other twosequences is marked with bold

From Tables 1 and 2 one could find that the Madalinesensitivity measure is obviously superior to the confidencelevel In Table 2 there are four wrong locations in thesequence of the confidence level |119883119879119882| and two wronglocations in the sequence of the Madaline sensitivity It couldbe further found fromTable 2 that the twoAdalines wronglylocated by Madaline sensitivity namely Adaline 4 and Ada-line 1 are adjacent Adalines in the standard sequence Inaddition one could find from Table 1 that the actually variedoutput elements of them 26 output elements forAdaline 1 and28 output elements for Adaline 4 are very close This slightmismatch of the Madaline sensitivity measure with simula-tion results may mainly come from the approximate compu-tation of the sensitivity measure

Tables 1 and 2 show that our conclusion drawn from theabove theoretical analysis about the two measures is consis-tent with the result of the experiments which further verifiesthe fact that the Madaline sensitivity is a more appropriatemeasure to evaluate the effects of weight adaptation on aMadaline output

4 An Adaptation Selection Rule

For CFNNs with the support of the steepest descent tech-nique all neurons take part in weight adaptation duringtraining However because of Madalinesrsquo discrete featuresthe determination of which Adaline being in need of adap-tation is more complicated

For a Madaline when output errors occur the easiestway is to directly adapt the weights of the Adalines in theoutput layer whose outputs are in error But it is well knownthat a single-layer neural network can handle only linearlyseparable problems So a precondition of being able todirectly adapt the Adalines in the output layer is that the

Table 2 Hidden-layer Adaline sequence ordered by the actualvariation and the two measures

Measures Sequence of Adalines in hidden layerNumber of varied outputelements in simulation 2 rarr 1 rarr 4 rarr 5 rarr 3

Sensitivity measure 2 rarr 4 rarr 1 rarr 5 rarr 3

Confidence level 2 rarr 3 rarr 4 rarr 1 rarr 5

hidden-layer outputs must be linearly separable If the pre-condition is not satisfied it is impossible to train a Madalineto solve a nonlinearly separable problem by only adaptingAdalines in output layer For this consideration in the layerlevel the priority of adaptation would be given to Adalinesin hidden layer As the information flow in a Madaline isalways one-way from the input layer to the output layer it isapparent that the former layer would be in general prior to itssucceeding layer in a Madaline with many hidden layers

In the same hidden layer when the network output errorfor the current input occurs there may be many selections ofAdaline for adaptation to reduce the error due to the binaryfeature of the Adaline output Then a question is how toselect the Adaline or the Adaline combination that is really inneed of adaptation for improving training precision Actuallythere are two aspects that need to be considered for theselection One is that the adaptation of the selectedAdaline(s)must be able to reduce output errors of the Madaline for thecurrent input This is easy to be judged by the following waycalled ldquotrial reversionrdquo reverse the output(s) of the selectedAdaline(s) and then compute the output of the Madaline tocheck if the number of output element errors for the currentinput is reduced If it is view this selection as a useful oneTheother is that the adaptation of the selected Adaline(s) alsomust minimize the Madalinersquos output disturbance for allnoncurrent inputs According to the analysis in Section 3 aMadalinersquos output disturbance due to its weight adaptationcan be properly evaluated by the Madaline sensitivity SoMadaline sensitivity measure can be used to establish anadaptation selection rule as follows ldquogive priority of adapta-tion to the Adaline(s) that can reduce the output errors andmeanwhile minimize the sensitivity measurerdquo

In order to simplify the computation of the sensitivitymeasure during training it is noted that the weight adap-tation according to (12) is always a small real value so theconstraint in (3) can be met Besides the constraint in (5) canbe also met as long as the number of hidden-layer Adalines1198991 is more than one Thus (11) can be further simplified into

1199042

119894(Δ1198821

119895) = 1199041

119895(Δ1198821

119895) 1199042

119894(Δ1198832

119894)

asymp (2

1205872)radic

1

(1198991 + 1)(

10038161003816100381610038161003816Δ1198821119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

)

(14)

From (14) it can be seen that the sensitivity of an output-layer Adaline in aMadaline due to its hiddenAdalinersquos weightadaptation only depends on the weight variation ratio that is|Δ1198821119895||1198821119895| (1 le 119895 le 1198991) Hence by (8) the sensitivity


Input A Madaline with given architecture and random initial weights a set of training datalearning parameters 119877 120578 and the requirements of training precision and the Maximalepochs(1) Randomly arrange training samples(2) Loop for all training samples stating with 1 rarr 119894

(21) Feed the 119894th training sample into the Madaline(22) If the output of the Madaline is correct for the 119894th sample 119894 + 1 rarr 119894 go to Step 2(23) For each hidden layer 119897 119897 from 1 to 119871 minus 1 do

(231) Determine weight adaptations of all Adalines in the 119897th layer by (12) and thencalculate values of their sensitivity measure by (9) or (15)

(232) Sort 119897th-layer Adalines according to their sensitivity measure values inascending order

(233) For 119895 from 1 to lfloor1198991198972rfloor + 1 do(2331) For all possible adjacent Adaline combinations with 119895 elements in the queue

doA Implement the trial reversion for the current Adaline combinationB If output errors of the Madaline donrsquot reduce reject the adaptation and

continue to do for next Adaline combinationCWeight(s) of Adaline(s) in the current combination are adapted by (12)

Count the Madalinersquos output errorsD If the Madaline errors are equal to zero 119894 + 1 rarr 119894 go to Step 2 Else 1 rarr 119897 and

go to Step 23(24) For the 119896th-Adaline in output layer 119896 from 1 to 119899119871 doIf the output of the 119896th Adaline isnrsquot correct to the 119894th sample its weight is adapted by (12)

(3) Go to Step 1 unless the training precision meets the given requirement for all trainingsamples or training epochs reach the given numberOutput all weights and training errors under all training samples

Algorithm 1

measure for hidden-layer Adalines can be further simplifiedinto

the sensitivity measure =10038161003816100381610038161003816Δ1198821

119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

1 le 119895 le 1198991 (15)

5 New Madaline Learning Algorithm

AMadaline learning algorithm aims to assign proper weightsto every Adaline so that the input-output mapping could beestablished to maximally satisfy all given training samplesThe basic idea of the Madaline learning algorithm can bebriefly described as follows All training samples are itera-tively trained one by one until output errors of the Madalinefor all the samples are zero or meet a given precision require-ment Each time one training sample is fed into theMadalineand then selected weight adaptations are conducted in a layerfrom the first layer to the output layer until the output ofthe Madaline meets the desired output of the current sampleAs to the selection of weights for adaptation it can be treatedby distinguishing two cases the selection of Adaline(s) in ahidden layer and the selection of Adaline(s) in the outputlayer In the former case Adalines in the layer are selected toadapt according to the adaptation selection rule in the lattercase those Adalines that have erroneous outputs are selectedto adapt The details of an adaptive learning algorithm for aMadaline based on its sensitivity measure can be pro-grammed as shown in Algorithm 1

6 Experimental Evaluations

Usually the learning performance and the generalizationperformance are two main indexes to evaluate a learningalgorithm Due to the discrete characteristic of MadalinesMSE (mean square error) is no longer suitable to evaluate thelearning performance of the learning algorithmofMadalinesHerein instead of MSE the sample success rate and thenetwork convergence rate are used to evaluate the learningperformance The success rate is the percentage of successfultraining samples by a Madaline in training while the con-vergence rate is the percentage of Madalines that reach acomplete solution under specified requirements among agroup of Madalines participating in training Besides thegeneralization rate that shows the percentage of the successfultesting samples by aMadaline after training is used to evaluatethe generalization performance

To evaluate the efficiency of the proposed algorithmsome experiments are carried out using the algorithms theMRII and the BP respectively In the experiments Madalinesand MLPs (multilayer perceptron) with a single hidden layerwere organized to solve several representative problems and5 of them are chosen fromUCI repository [15]They are threeMonks problems two Led display problems and the And-Xor problem The Monks problems are Monks-1 with 124training samples and 432 testing samples Monks-2 with 169training samples and 432 testing samples and Monks-3 with122 training samples and 432 testing samples The Leddisplay problems are Led-7 with 7 attributes and Led-24 with


0

20

40

60

80

100

120

Data set (network architecture)

Con

verg

ence

rat

e (

)

BPMRIIOurs

Mon

ks-1

(10

-4-1

)

Mon

ks-1

(10

-3-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)

(a) Convergence rate

0

20

40

60

80

100

120


Succ

ess

rate

()

Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

BPMRIIOurs

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)

(b) Success rate

0

20

40

60

80

100

120


Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

Gen

eral

izat

ion

rate

()

BPMRIIOurs

(c) Generalization rate

Figure 2 Performance comparison among BP MRII and our algorithm

24 attributes both of them have 200 training samples and1000 testing samples and the latter adds 17 irrelevantattributes on the basis of the former The And-Xor with twoinputs and two outputs is a representative nonlinear logicalcalculation problem in which one output implements theldquoANDrdquo calculation of two inputs while the other implementsthe ldquoXORrdquo calculation of them For each experiment thetraining goal that is output error was set to 0 and epochs ofMonks problems Led problems and And-Xor problem were

set to 2000 2000 and 200 respectively Besides for MLPsthe momentum gradient descent BP algorithm was used totrain them

In order to guarantee the validity of experimental resultsall results presented in Figure 2 are the average of 100 runsrsquoresults Figure 2 shows that our algorithm has better per-formance than MRII not only on learning performance butalso on generalization performance especially for the difficultclassification problems such as Led-24 only for several


relative simple problems such as And-Xor and Led-7 MRIIhas a good performance Compared with BP algorithm ouralgorithm also shows better learning performance and gen-eralization performance especially on convergence rate onlyfor monks-3 problem BP algorithm is slightly better thanour algorithm The experimental results of Figure 2(a) showthat the BP algorithm is rather poor on the convergence ratewhich highlight the BP algorithmrsquos shortage of easily fallinginto the local minimum

7 Conclusion and Future Work

This paper presents a new adaptive learning algorithm forMadalines based on aMadaline sensitivitymeasureThemainfocus of the paper is how to implement the minimal distur-bance principle in the algorithmAn adaptation selection rulebased on the sensitivity measure is proposed to carry outthe minimal disturbance principle Both theoretical analysisand experimental evaluations demonstrate that the sensitivitymeasure is superior to the confidence level used in the MRIIWith the proposed adaptation selection rule the algorithmcan more accurately locate the weights in real need ofadaptation Experiments on some representative problemsshow that the proposed algorithm has better learning abilitynot only than that of the MRII but also than BP algorithm

Although the proposed learning algorithm of Madalineshas better performance it is noticed that there still exist someweaknesses because of the usage of the Mays rule in thealgorithm One is that too many parameters need to be setin advance which can hamper the application of MadalinesThe other is that the Mays rule is unable to guarantee weightadaptation to exactly follow the minimal disturbance ideaIn our future works we will try to solve these two issues todevelop a more perfect Madaline learning algorithm

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is supported by the Research Foundation ofNanjing University of Information Science and Technology(20110434) the National Natural Science Foundation ofChina (11361066 61402236 and 61403206) Natural ScienceFoundation of Jiangsu Province (BK20141005) UniversityNatural Science Research Program of Jiangsu Province(14KJB520025) a Project Funded by the Priority AcademicProgram Development of Jiangsu Higher Education Institu-tions and the Deanship of Scientific Research at King SaudUniversity (RGP-264)

References

[1] B Widrow and M A Lehr ldquo30 years of adaptive neural net-works perceptron Madaline and backpropagationrdquo Proceed-ings of the IEEE vol 78 no 9 pp 1415ndash1442 1990

[2] E D Rumelhart E G Hinton and R J Williams ldquoLearninginternal representations by error propagationrdquo in Parallel Dis-tributed Processing Exploration in the Microstructure of Cogni-tion vol 1 chapter 8 MIT Press Cambridge Mass USA 1986

[3] F Rosenblatt ldquoOn the convergence of reinforcement proceduresin simple perceptronsrdquo Cornell Aeronautical LaboratoryReportVG-1796-G-4 Buffalo NY USA 1960

[4] W C Ridgway ldquoAn adaptive logic system with generalizingpropertiesrdquo Tech Rep 1557-1 StanfordElectron Lab StandfordCalif USA 1962

[5] RWinterMadalines rule II a newmethod for training networksfor adalines [PhD thesis] Department of Electrical Engineer-ing Stanford University 1989

[6] R Winter and B Widrow ldquoMadaline rule II a training algo-rithm for neural networksrdquo in IEEE International Conference onNeural Networks vol 1 pp 401ndash408 SanDiego Calif USA July1988

[7] C H Mays ldquoAdaptive threshold logicrdquo Tech Rep 1556-1Stanford Electronics Lab Stanford Calif USA 1963

[8] M Frean ldquoThe upstart algorithm a method for constructionand training feedforward networksrdquo Neural Computation vol2 no 2 pp 198ndash209 1990

[9] J H Kim and S K Park ldquoGeometrical learning of binary neuralnetworksrdquo IEEE Transactions on Neural Networks vol 6 no 1pp 237ndash247 1995

[10] S Zhong X Zeng S Wu and L Han ldquoSensitivity-based adap-tive learning rules for binary feedforward neural networksrdquoIEEE Transactions on Neural Networks and Learning Systemsvol 23 no 3 pp 480ndash491 2012

[11] N E Cotter ldquoThe Stone-Weierstrass theorem and its applicationto neural networksrdquo IEEE Transactions on Neural Networks vol1 no 4 pp 290ndash295 1990

[12] S Zhong X Zeng H Liu and Y Xu ldquoApproximate comput-ation of Madaline sensitivity based on discrete stochastic tech-niquerdquo Science China Information Sciences vol 53 no 12 pp2399ndash2414 2010

[13] Y Wang X Zeng D S Yeung and Z Peng ldquoComputation ofMadalinessensitivity to input andweight perturbationsrdquoNeuralComputation vol 18 no 11 pp 2854ndash2877 2006

[14] X Zeng Y Wang and K Zhang ldquoComputation of Adalinesrsquosensitivity to weight perturbationrdquo IEEE Transactions on NeuralNetworks vol 17 no 2 pp 515ndash519 2006

[15] httpwwwicsuciedusimmlearnMLRepositoryhtml

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of


Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of


Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of


Mathematical PhysicsAdvances in

Complex AnalysisJournal of


OptimizationJournal of


CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of


Operations ResearchAdvances in

Journal of


Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences


The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014


Algebra

Discrete Dynamics in Nature and Society



Decision SciencesAdvances in

Discrete MathematicsJournal of


Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of


Adaline (neuron) structure feature to meet the input-outputmapping in the training data set The main disadvantage ofthis approach is that it usually results in a Madaline withmuch larger architecture than the one resulted from an adap-tive learning algorithm and thus it not only complicates thecomputation andhardware implementation but also degradesthe generalization ability of the Madaline In fact thegeometrical construction approach is not a learning algo-rithm in adaptive sense Therefore it is still significant toinvestigate the learning algorithm of Madalines

It is obvious that the basic idea of minimal disturbance [15 6] is crucial to almost all adaptive supervised learning algo-rithms such as the MRII and BP As a key principle the ideatries in each iteration of weight adaptation to not only bettersatisfy the current training sample but also avoid as muchas possible the disturbance on the effects established by previ-ous training samples Although the BP algorithmwas derivedfrom the steepest descend method it actually also followsthis idea [1] Unfortunately the MRII does not well imple-ment the principle and this is the main cause of its poorperformance It can be found that the confidence level (sum-mation of weighted inputs of a neuron) [5 6] adopted by theMRII as a measure for implementing the principle cannotguarantee to select proper neurons for the weight adaptationduring learning

In Madaline learning one of the most important issuesis the effect of variation of network parameters on its outputso Madaline sensitivity (ie the effect of parameter variationon network output) can be used to properly assess this effectBased on the Madaline sensitivity measure a new learningalgorithm (SBALR) [10] of Madalines is proposed and it hasperformedwell in learningHowever why isMRII (a previouslearning algorithm for Madalines) poor in learning perfor-manceThis problem has still not been solved in theoryThispaper tries to theoretically analyze MRIIrsquos disadvantage andto further improve it

This paper presents an improving Madaline learningalgorithm based on aMadaline sensitivitymeasureThemaincontribution of the algorithm is that (1) it analyzes MRIIrsquosshortage in learning performance from the sensitivity point ofview and points out that the confidence level in MRII cannotproperly measure the output perturbation due to weightadaptation (2) it proposes an adaptation selection rule bymeans of the sensitivity measure to improveMRIIThe adap-tation selection rule for neurons couldmore accurately locatethe neurons and thus their weights in real need of adaptationduring learning so as to better implement the minimal dis-turbance principle for greatly improving the learning perfor-mance of MRII

Although both this paper and [10] take the Madalinesensitivity theory as the important theory there are twomaindifferences between them (1) In this paper the sensitivity ismainly taken as a measure to better locate the neurons in realneed of adaptation during learning while in [10] the sensi-tivity is used to guide weight learning rule development (2)in goal this paper adopts the sensitivity theory to analyzeMRIIrsquos shortages in performance and then to improve MRIIwhile [10] takes the sensitivity theory to guide learning ruledesign and thus to develop a completely new learning algo-rithm for Madalines independent of MRII

The rest of this paper is organized as follows In the nextsection the Madaline model and its sensitivity are brieflydescribed Measures for evaluating the effects of weightadaptation are discussed in Section 3 An adaptation selectionrule based on the Madaline sensitivity is proposed in Sec-tion 4 Following in Section 5 is the new Madaline learningalgorithm based on the rule Experimental evaluations andresults are given in Section 6 Finally Section 7 concludes thepaper

2 Madaline Model and Sensitivity Measure

21 Madaline Model and Notations A Madaline is a kind ofbinary multilayer feedforward neural network with a super-vised learning mechanism and consists of a set of neuronscalled Adalines with binary input output and hard limitactivation function The input of an Adaline which is rep-resented by 119883 = (119909

0 1199091 119909

119899)119879 including an extra element

1199090(equiv 1) corresponding to a bias119908

0 is weighted by the weight

119882 = (1199080 1199081 119908

119899)119879 containing the bias and then fed into

an activation function 119891(sdot) to yield an output of the Adalineas

119910 = 119891 (119883119879119882) =

minus1 119883119879119882 lt 0

+1 119883119879119882 ge 0(1)

Generally a Madaline has 119871 layers and each layer 119897 (1 le 119897 le119871) has 119899119897 (119899119897 ge 1) Adalines The form of 1198990 minus 1198991 minus sdot sdot sdot minus 119899119871 isused to represent the Madaline in which each 119899119897 (0 le 119897 le 119871)not only stands for a layer but also indicates the number ofAdalines in the layer 1198990 is an exception which denotes theinput dimension of the Madaline For the 119895th layer 119883119895 (1 le119895 le 119871) denotes the input of all Adalines in the layer and119884119895 (1 le 119895 le 119871) denotes the output of the layer They meet119884119896minus1 = 119883119896 (2 le 119896 le 119871) Particularly 1198831 denotes not only theinput of all Adalines in the first layer but also the input of theentire Madaline 119899119871 denotes the output layer and 119884119871 is theoutput of both the last layer and the entire Madaline

It is well known that a network with a single hiddenlayer and enough hidden neurons is adequate to deal with allmapping problems [11] For simplicity and without loss ofgenerality the following discussion only focuses on theMadalines with single hidden layer

22 Madaline Sensitivity Measure Usually an adaptivesupervised learning is a process of iterative weight adaptationuntil the input-output mapping indicated by a training dataset is established So in each iteration how to correctly locatethe weights in real need of adaptation is a key issue for thesuccess of a Madaline learning algorithm In order to suc-cessfully locate the weights in need of adaptation it is vital toanalyze Madaline output variation caused by the weightadaptation Since the study on Madaline sensitivity aims atexploring the effects of a Madaline weightsrsquo variation on itsoutput it is reasonable to investigate the sensitivity as ameasure to locate the weights

The following subsections will briefly introduce the latestresearch results on the Madaline sensitivity which will be





119904 (Δ119882) =119881var119881119899

(2)


119899is the



119904 (Δ119882)

asymp 119904 (Δ119882)

=

arccos ((1198821198821015840) (|119882| 10038161003816100381610038161003816119882101584010038161003816100381610038161003816))

120587

(|Δ119882| |119882|)

120587 for |Δ119882| ≪ |119882|

(3)



119904 (Δ119883) asymp 119904 (Δ119883)

=arccos (sum119899

119894=01199081198941199081015840119894sum119899

119894=01199082119894)

120587

1199081015840

119894= minus119908119894119894 isin 1198951 1198952 119895

119870

119908119894

others

(4)







119904 (Δ119883) asympradic4119870 (119899 + 1)

120587=(|Δ119883| |119883|)

120587for 119870 ≪ 119899 (5)



119878119897= (119904119897

1 119904119897

2 119904

119897

119899119897)119879

(6)


119878net = 119878119871= (119904119871

1 119904119871

2 119904

119871

119899119871)119879

(7)


119904net =sum119899119871

119894=1119904119871119894sdot 119881119899

119899119871 sdot 119881119899

=1

119899119871

119899119871

sum119894=1

119904119871

119894 (8)







119894) 1 le 119897 le 119871 1 le 119894 le 119899

119897

(9)



Adaline sensitivity



Adaline sensitivity


W and ΔW




1199042

119894(Δ1198822

119894)

=arccos ((1198822

119894(1198822119894+ Δ1198822

119894)) (

100381610038161003816100381610038161198822119894

10038161003816100381610038161003816

100381610038161003816100381610038161198822119894+ Δ1198822

119894

10038161003816100381610038161003816))

120587

(10)


1199042

119894(Δ1198821

119895) asymp 1199041

119895(Δ1198821

119895) 1199042

119894(Δ1198832

119894)

= 1199041

119895(

arccos ((100381610038161003816100381610038161198822

119894

10038161003816100381610038161003816

2

minus 21199082119894119895) 100381610038161003816100381610038161198822119894

10038161003816100381610038161003816

2

)

120587)

1 le 119894 le 1198992

(11)



1198821015840= 119882 + 119889120578(

(119877 minus 119889119883119879119882)

(119899 + 1))119883 for 10038161003816100381610038161003816119883

11987911988210038161003816100381610038161003816lt 120575 (12)



Δ119882 = 119889120578((119877 minus 119889119883

119879119882)

(119899 + 1))119883 = 119889120578(

(119877 +1003816100381610038161003816100381611988311987911988210038161003816100381610038161003816)

(119899 + 1))119883

(13)









1 2214 0023 262 0058 0008 53 1429 0054 644 1685 0022 285 2959 0038 47















1199042

119894(Δ1198821

119895) = 1199041

119895(Δ1198821

119895) 1199042

119894(Δ1198832

119894)

asymp (2

1205872)radic

1

(1198991 + 1)(

10038161003816100381610038161003816Δ1198821119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

)

(14)













Algorithm 1



119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

1 le 119895 le 1198991 (15)







0

20

40

60

80

100

120


Con

verg

ence

rat

e (

)

BPMRIIOurs

Mon

ks-1

(10

-4-1

)

Mon

ks-1

(10

-3-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)


0

20

40

60

80

100

120


Succ

ess

rate

()

Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

BPMRIIOurs

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)

(b) Success rate

0

20

40

60

80

100

120


Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

Gen

eral

izat

ion

rate

()

BPMRIIOurs













Acknowledgments


References























Volume 2014




Journal of











Journal of


Function Spaces






Algebra













119904 (Δ119882) =119881var119881119899

(2)


119899is the



119904 (Δ119882)

asymp 119904 (Δ119882)

=

arccos ((1198821198821015840) (|119882| 10038161003816100381610038161003816119882101584010038161003816100381610038161003816))

120587

(|Δ119882| |119882|)

120587 for |Δ119882| ≪ |119882|

(3)



119904 (Δ119883) asymp 119904 (Δ119883)

=arccos (sum119899

119894=01199081198941199081015840119894sum119899

119894=01199082119894)

120587

1199081015840

119894= minus119908119894119894 isin 1198951 1198952 119895

119870

119908119894

others

(4)







119904 (Δ119883) asympradic4119870 (119899 + 1)

120587=(|Δ119883| |119883|)

120587for 119870 ≪ 119899 (5)



119878119897= (119904119897

1 119904119897

2 119904

119897

119899119897)119879

(6)


119878net = 119878119871= (119904119871

1 119904119871

2 119904

119871

119899119871)119879

(7)


119904net =sum119899119871

119894=1119904119871119894sdot 119881119899

119899119871 sdot 119881119899

=1

119899119871

119899119871

sum119894=1

119904119871

119894 (8)







119894) 1 le 119897 le 119871 1 le 119894 le 119899

119897

(9)



Adaline sensitivity



Adaline sensitivity


W and ΔW




1199042

119894(Δ1198822

119894)

=arccos ((1198822

119894(1198822119894+ Δ1198822

119894)) (

100381610038161003816100381610038161198822119894

10038161003816100381610038161003816

100381610038161003816100381610038161198822119894+ Δ1198822

119894

10038161003816100381610038161003816))

120587

(10)


1199042

119894(Δ1198821

119895) asymp 1199041

119895(Δ1198821

119895) 1199042

119894(Δ1198832

119894)

= 1199041

119895(

arccos ((100381610038161003816100381610038161198822

119894

10038161003816100381610038161003816

2

minus 21199082119894119895) 100381610038161003816100381610038161198822119894

10038161003816100381610038161003816

2

)

120587)

1 le 119894 le 1198992

(11)



1198821015840= 119882 + 119889120578(

(119877 minus 119889119883119879119882)

(119899 + 1))119883 for 10038161003816100381610038161003816119883

11987911988210038161003816100381610038161003816lt 120575 (12)



Δ119882 = 119889120578((119877 minus 119889119883

119879119882)

(119899 + 1))119883 = 119889120578(

(119877 +1003816100381610038161003816100381611988311987911988210038161003816100381610038161003816)

(119899 + 1))119883

(13)









1 2214 0023 262 0058 0008 53 1429 0054 644 1685 0022 285 2959 0038 47















1199042

119894(Δ1198821

119895) = 1199041

119895(Δ1198821

119895) 1199042

119894(Δ1198832

119894)

asymp (2

1205872)radic

1

(1198991 + 1)(

10038161003816100381610038161003816Δ1198821119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

)

(14)













Algorithm 1



119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

1 le 119895 le 1198991 (15)







0

20

40

60

80

100

120


Con

verg

ence

rat

e (

)

BPMRIIOurs

Mon

ks-1

(10

-4-1

)

Mon

ks-1

(10

-3-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)


0

20

40

60

80

100

120


Succ

ess

rate

()

Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

BPMRIIOurs

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)

(b) Success rate

0

20

40

60

80

100

120


Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

Gen

eral

izat

ion

rate

()

BPMRIIOurs













Acknowledgments


References























Volume 2014




Journal of











Journal of


Function Spaces






Algebra










Adaline sensitivity



Adaline sensitivity


W and ΔW




1199042

119894(Δ1198822

119894)

=arccos ((1198822

119894(1198822119894+ Δ1198822

119894)) (

100381610038161003816100381610038161198822119894

10038161003816100381610038161003816

100381610038161003816100381610038161198822119894+ Δ1198822

119894

10038161003816100381610038161003816))

120587

(10)


1199042

119894(Δ1198821

119895) asymp 1199041

119895(Δ1198821

119895) 1199042

119894(Δ1198832

119894)

= 1199041

119895(

arccos ((100381610038161003816100381610038161198822

119894

10038161003816100381610038161003816

2

minus 21199082119894119895) 100381610038161003816100381610038161198822119894

10038161003816100381610038161003816

2

)

120587)

1 le 119894 le 1198992

(11)



1198821015840= 119882 + 119889120578(

(119877 minus 119889119883119879119882)

(119899 + 1))119883 for 10038161003816100381610038161003816119883

11987911988210038161003816100381610038161003816lt 120575 (12)



Δ119882 = 119889120578((119877 minus 119889119883

119879119882)

(119899 + 1))119883 = 119889120578(

(119877 +1003816100381610038161003816100381611988311987911988210038161003816100381610038161003816)

(119899 + 1))119883

(13)









1 2214 0023 262 0058 0008 53 1429 0054 644 1685 0022 285 2959 0038 47















1199042

119894(Δ1198821

119895) = 1199041

119895(Δ1198821

119895) 1199042

119894(Δ1198832

119894)

asymp (2

1205872)radic

1

(1198991 + 1)(

10038161003816100381610038161003816Δ1198821119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

)

(14)













Algorithm 1



119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

1 le 119895 le 1198991 (15)







0

20

40

60

80

100

120


Con

verg

ence

rat

e (

)

BPMRIIOurs

Mon

ks-1

(10

-4-1

)

Mon

ks-1

(10

-3-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)


0

20

40

60

80

100

120


Succ

ess

rate

()

Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

BPMRIIOurs

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)

(b) Success rate

0

20

40

60

80

100

120


Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

Gen

eral

izat

ion

rate

()

BPMRIIOurs













Acknowledgments


References























Volume 2014




Journal of











Journal of


Function Spaces






Algebra













1 2214 0023 262 0058 0008 53 1429 0054 644 1685 0022 285 2959 0038 47















1199042

119894(Δ1198821

119895) = 1199041

119895(Δ1198821

119895) 1199042

119894(Δ1198832

119894)

asymp (2

1205872)radic

1

(1198991 + 1)(

10038161003816100381610038161003816Δ1198821119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

)

(14)













Algorithm 1



119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

1 le 119895 le 1198991 (15)







0

20

40

60

80

100

120


Con

verg

ence

rat

e (

)

BPMRIIOurs

Mon

ks-1

(10

-4-1

)

Mon

ks-1

(10

-3-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)


0

20

40

60

80

100

120


Succ

ess

rate

()

Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

BPMRIIOurs

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)

(b) Success rate

0

20

40

60

80

100

120


Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

Gen

eral

izat

ion

rate

()

BPMRIIOurs













Acknowledgments


References























Volume 2014




Journal of











Journal of


Function Spaces






Algebra




















Algorithm 1



119895

10038161003816100381610038161003816100381610038161003816100381610038161198821119895

10038161003816100381610038161003816

1 le 119895 le 1198991 (15)







0

20

40

60

80

100

120


Con

verg

ence

rat

e (

)

BPMRIIOurs

Mon

ks-1

(10

-4-1

)

Mon

ks-1

(10

-3-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)


0

20

40

60

80

100

120


Succ

ess

rate

()

Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

BPMRIIOurs

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)

(b) Success rate

0

20

40

60

80

100

120


Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

Gen

eral

izat

ion

rate

()

BPMRIIOurs













Acknowledgments


References























Volume 2014




Journal of











Journal of


Function Spaces






Algebra










0

20

40

60

80

100

120


Con

verg

ence

rat

e (

)

BPMRIIOurs

Mon

ks-1

(10

-4-1

)

Mon

ks-1

(10

-3-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)


0

20

40

60

80

100

120


Succ

ess

rate

()

Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

BPMRIIOurs

And

-Xor

(2-2

-2)

And

-Xor

(2-3

-2)

(b) Success rate

0

20

40

60

80

100

120


Mon

ks-1

(10

-3-1

)

Mon

ks-1

(10

-4-1

)

Mon

ks-2

(10

-2-1

)

Mon

ks-2

(10

-3-1

)

Mon

ks-3

(10

-2-1

)

Mon

ks-3

(10

-3-1

)

Led-7

(7-4

-4)

Led-24

(24

-4-4

)

Gen

eral

izat

ion

rate

()

BPMRIIOurs













Acknowledgments


References























Volume 2014




Journal of











Journal of


Function Spaces






Algebra
















Acknowledgments


References























Volume 2014




Journal of











Journal of


Function Spaces






Algebra
















Volume 2014




Journal of











Journal of


Function Spaces






Algebra









research article a sensitivity-based improving learning...

Documents