
Design and Performance of Adaptive Systems Based on Structured Stochastic Optimization Strategies

D. J. Krusienski and W. K. Jenkins

Abstract

The theory and design of linear adaptive filters based on FIR filter structures is well developed and widely applied in practice. However, the same is not true for more general classes of adaptive systems such as linear infinite impulse response (IIR) adaptive filters and nonlinear adaptive systems. This situation results because both linear IIR structures and nonlinear structures tend to produce multi-modal error surfaces for which stochastic gradient optimization strategies may fail to reach the global minimum. After briefly discussing the state of the art in linear adaptive filtering, the attention of this paper is turned to IIR and nonlinear adaptive systems for potential use in echo cancellation, channel equalization, acoustic channel modeling, nonlinear prediction, and nonlinear system identification. Structured stochastic optimization algorithms that are effective on multimodal error surfaces are then introduced, with particular attention to the Particle Swarm Optimization (PSO) technique. The PSO algorithm is demonstrated on some representative IIR and nonlinear filter structures, and both performance and computational complexity are analyzed for these types of nonlinear systems.

Introduction

Adaptive digital signal processing is a subject that has attracted increasing attention in recent years due to demands for improved performance in high data rate digital communication systems and in wideband image/video processing systems. Adaptive system identification, adaptive noise cancellation, adaptive linear prediction, and adaptive channel equalization are just a few of the important application areas that have been significantly advanced with adaptive signal processing techniques. Much of the recent success in adaptive signal processing has been facilitated by improvements in VLSI digital signal processor (DSP) integrated circuit technology, which currently provides large amounts of digital signal processing power in a convenient and reliable form. Since improvements in integrated circuit technology continue to emerge, it is likely that adaptive techniques will assume an even more important role in high performance electronic systems of the future.

While the theory and design of linear adaptive filters based on FIR filter structures is a well developed subject, the same is not true for linear infinite impulse response (IIR) adaptive filters or nonlinear adaptive systems. Both linear IIR structures and nonlinear structures tend to produce multi-modal error surfaces for which stochastic gradient optimization strategies may fail to reach the global minimum. Also, with stochastic gradient algorithms, IIR adaptive filters must be carefully monitored for stability in case their poles move outside the unit circle during adaptation.

Linear FIR Adaptive Filters

An adaptive finite impulse response (FIR) filter consists of a digital tapped delay line with variable multiplier coefficients that are adjusted by an adaptive algorithm [1]. The adaptive algorithm attempts to minimize a cost function that is designed to provide an instantaneous on-line estimate of how closely the adaptive filter achieves a prescribed optimum condition. The cost function most frequently used is an approximation to the expected value of the squared error, E{|e(n)|^2}, where e(n) = d(n) − y(n) is the difference between a training signal d(n) (sometimes called the desired response) and the filter output y(n), and E{·} denotes the statistical expected value. The training signal d(n) is obtained by different means in different applications, and the acquisition of an appropriate training signal is often a limiting factor in performance.

The input vector and the coefficient weight vector of the adaptive filter at the nth iteration are defined as X(n) = [x(n), x(n − 1), . . . , x(n − N + 1)]^t and W(n) = [w_0(n), w_1(n), . . . , w_{N−1}(n)]^t, respectively, where the superscript t denotes vector transpose. The nth output is then given by

y(n) = Σ_{k=0}^{N−1} w_k(n) x(n − k) = W^t(n)X(n).   (1)

In the following discussion, the training signal d(n) and the input signal x(n) are assumed to be stationary and ergodic. An adaptive filter uses an iterative method by which the tap weights W(n) are made to converge to the optimal solution W* that minimizes the cost function. It is well known that W*, known in the literature as the Wiener solution, is given by

W* = R_{xx}^{−1} p_{xd},   (2)

where R_{xx} = E{X(n)X^t(n)} is the autocorrelation matrix of the input and p_{xd} = E{X(n)d(n)} is the cross-correlation vector between the input and the desired response. The most common iterative approach is to update each tap weight according to a steepest descent strategy; i.e., the tap weight vector is incremented in proportion to the gradient ∇w according to

W(n + 1) = W(n) − µ∇w, (3)

where µ is the step size, ∇w = [∇w_0(n), . . . , ∇w_{N−1}(n)]^t, and ∇w_k(n) = ∂E{e^2(n)}/∂w_k is the partial derivative of the cost function with respect to w_k(n), for k = 0, . . . , N − 1. However, the precise value of the cost function is not known, nor is the gradient known explicitly, so it is necessary to make some simplifying assumptions that allow gradient estimates to be computed on-line. Different approaches to estimating the cost function and/or the gradient lead to different adaptive algorithms, such as the well known Least Mean Squares (LMS) and Recursive Least Squares (RLS) algorithms. In particular, including the Hessian matrix in equation (3) to accelerate the steepest descent optimization strategy leads to the family of quasi-Newton algorithms, which are characterized by rapid convergence at the expense of greater computational complexity.

The LMS algorithm [1] makes the simplifying assumption that the expected value of the squared error is approximated by the squared error itself, i.e., E{|e(n)|^2} ≈ |e(n)|^2. In deriving the algorithm, the squared error is differentiated with respect to W to approximate the true gradient. In vector notation the LMS update relation becomes

W(n + 1) = W(n) + 2µe(n)X(n). (4)

The value of µ, usually determined experimentally, greatly affects both the convergence rate of the adaptive process and the minimum mean squared error after convergence. To ensure stability and guarantee convergence of both the coefficients and the mean squared error estimate, µ must satisfy the condition 0 < µ < 1/(N E{x^2(n)}), where E{x^2(n)} = (1/N)tr[R_{xx}] is the average input signal power, which can be calculated from R_{xx} if the input autocorrelation matrix is known. When µ is properly chosen, the weight vector converges to an estimate of the Wiener solution.
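To make the update concrete, the following sketch implements the LMS recursion of equation (4) for a simple FIR system identification problem. It is an illustrative Python example, not the authors' code; the filter length, step size, and the unknown system used to generate d(n) are assumptions.

```python
import numpy as np

def lms_fir(x, d, N=8, mu=0.01):
    """Adapt an N-tap FIR filter with the LMS rule of equation (4)."""
    w = np.zeros(N)                      # coefficient vector W(n)
    e = np.zeros(len(x))
    for n in range(N, len(x)):
        X = x[n::-1][:N]                 # X(n) = [x(n), x(n-1), ..., x(n-N+1)]
        y = w @ X                        # y(n) = W^t(n) X(n), equation (1)
        e[n] = d[n] - y                  # e(n) = d(n) - y(n)
        w = w + 2 * mu * e[n] * X        # LMS update, equation (4)
    return w, e

# Hypothetical usage: identify an unknown 8-tap FIR system driven by white noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
d = np.convolve(x, rng.standard_normal(8))[:len(x)]
w, e = lms_fir(x, d)
```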

Linear IIR Adaptive Filters

The mean squared error (MSE) approximation that led to the conventional LMS algorithm for FIR filters has also been applied to infinite impulse response (IIR) filters [2].

Recall that a direct form IIR digital filter is characterized by a difference equation

y(n) = Σ_{k=0}^{N_b−1} b_k x(n − k) + Σ_{j=1}^{N_a} a_j y(n − j),   (5)

where the b_k's are the coefficients that define the zeros of the filter and the a_j's define the poles. The LMS adaptive algorithm for IIR filters is derived in a similar manner as in the FIR case, although the recursive relation (equation (5)) is used instead of the convolution sum (equation (1)) to characterize the input-output relationship of the filter. The IIR derivation is more complicated because the recursive terms on the right side of equation (5) depend on past values of the filter coefficients.

If derivatives of the squared error function are calculated using the chain rule, so that first order dependencies are taken into account and higher order dependencies are ignored, the result is

∇E{|e|^2} = [−2e(n) ∂y(n)/∂a, −2e(n) ∂y(n)/∂b],

where

∂y(n)/∂b_k = x(n − k) + Σ_{j=1}^{N_a} a_j(n) ∂y(n − j)/∂b_k,   k = 0, . . . , N_b − 1,   (6a)

and

∂y(n)/∂a_k = y(n − k) + Σ_{j=1}^{N_a} a_j(n) ∂y(n − j)/∂a_k,   k = 1, . . . , N_a.   (6b)

This procedure does not result in a closed form expression for the gradient, but it does produce a recursive relation by which the gradients can be generated using equation (6).
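The recursion can be turned directly into an adaptive algorithm. The sketch below is an illustrative Python rendering under assumed filter orders and step size, not the authors' implementation; it propagates the gradient states of equations (6a) and (6b) and applies an LMS-style update to the numerator and denominator coefficients. In practice the pole locations would also be monitored for stability.

```python
import numpy as np

def iir_output_error_lms(x, d, Nb=2, Na=2, mu=1e-3):
    """Output-error LMS for a direct-form IIR filter using the gradient
    recursions of equations (6a) and (6b). Orders and step size are
    illustrative assumptions."""
    b = np.zeros(Nb)            # numerator (zero) coefficients b_0..b_{Nb-1}
    a = np.zeros(Na)            # denominator (pole) coefficients a_1..a_Na
    y_hist = np.zeros(Na)       # [y(n-1), ..., y(n-Na)]
    dy_db = np.zeros((Nb, Na))  # stored dy(n-j)/db_k, columns j = 1..Na
    dy_da = np.zeros((Na, Na))  # stored dy(n-j)/da_k, columns j = 1..Na
    e = np.zeros(len(x))
    for n in range(len(x)):
        X = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(Nb)])
        y = b @ X + a @ y_hist                  # filter output, equation (5)
        e[n] = d[n] - y
        grad_b = X + dy_db @ a                  # equation (6a)
        grad_a = y_hist + dy_da @ a             # equation (6b)
        b += 2 * mu * e[n] * grad_b             # move against the MSE gradient
        a += 2 * mu * e[n] * grad_a             # (poles should be checked for stability)
        # Shift stored gradients and past outputs by one sample
        dy_db = np.column_stack((grad_b, dy_db[:, :-1]))
        dy_da = np.column_stack((grad_a, dy_da[:, :-1]))
        y_hist = np.concatenate(([y], y_hist[:-1]))
    return b, a, e
```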

It is well known that the use of the output error in the formulation of the cost function prevents bias in the solution due to noise in the desired signal. However, the effect of this recursion is to make the problem nonlinear in terms of the coefficient parameters. Also, the current filter parameters now depend directly upon previous filter coefficients, which are time-varying. This often leads to MSE surfaces that are not quadratic in nature. There are many examples in the literature for which IIR MSE surfaces contain local minima in addition to the global minimum. In these cases gradient search techniques may become entrapped in a local minimum, resulting in poor performance due to improper convergence of the adaptive filter [3]–[6].

Figure 1. Model of a typical nonlinear communication channel (uplink channel: linear block, stationary nonlinearity, linear block; downlink).

Figure 2. LNL model: Linear (L), Nonlinear (N), Linear (L) cascade.

Figure 3. Example of scaling and random direction bounds for the vectors.

Nonlinear Adaptive Systems

There are many applications where voice signals, audio signals, images, or video signals are subjected to nonlinear processing, and which require nonlinear adaptive compensation to achieve proper system identification and parameter extraction. For example, a generic nonlinear communication system is shown in Figure 1. The overall communication channel between the transmitter and the receiver is often nonlinear, since the amplifiers located in the (satellite) repeaters usually operate at or near saturation in order to conserve power. The saturation nonlinearities of the amplifiers introduce nonlinear distortions in the signal they process. The path from the transmitter to the repeater as well as from the repeater to the receiver may each be modeled as a linear system. The amplifier characteristics are usually modeled using memoryless nonlinearities. In general, the static nonlinearity is comprised of a linear term and higher order polynomial terms; hence the output of such a system can be represented as the sum of a linear part and a nonlinear part. Such systems can be modeled by connecting nonlinear and linear filter modules into a series cascade configuration. In particular, many nonlinear systems can be represented by the LNL model shown in Figure 2 [7]. It is well known that a similar nonlinear model can be used for magnetic recording channels, where the interaction between the electronic bit stream and the magnetic recording medium via the read/write heads exhibits nonlinear behavior. As in the case of IIR adaptive structures, nonlinear adaptive filters can also produce multi-modal error surfaces on which stochastic gradient optimization strategies may fail to reach the global minimum due to premature entrapment. Neural networks and Volterra nonlinear adaptive filters are well known for their tendency to generate troublesome multimodal error surfaces [8]–[12].

Structured Stochastic Optimization Algorithms

Stochastic optimization algorithms aim at increasing the probability of encountering the global minimum, without performing an exhaustive search of the entire parameter space. Unlike gradient based techniques, the performance of stochastic optimization techniques in general is not dependent upon the filter structure. Therefore, these types of algorithms are capable of globally optimizing any class of adaptive filter structures or objective functions by assigning the parameter estimates to represent filter tap weights, neural network weights, or any other possible parameter of the unknown system model (even the exponents of polynomial terms) [13].

A brute-force, purely stochastic search would consist of continuously evaluating random, independent parameter estimates relative to a suitable objective/cost/fitness function until some minimal error condition is satisfied. A slightly more sophisticated stochastic search would be to evaluate a set of parameter estimates generated by an appropriate random distribution with respect to the parameter space. Although these types of stochastic searches can potentially find the global optimum in some simple cases, they lack any real structure and fail to utilize other information available from the combined search that would enable them to be more efficient.

Figure 4. Example of the possible search region for a single particle.

Figure 5. MPSO enhancements: MSE (dB) versus generation for PSO, MPSO, PSO+Mutation, and PSO+Adaptive Inertia.

The foundation of a structured stochastic search is to intelligently generate and modify the randomized estimates in a manner that efficiently searches the error space, based on some previous or collective information generated by the search [14]. Several different structured stochastic optimization techniques can be found in the adaptive filtering literature, most notably simulated annealing [15]–[19], evolutionary algorithms such as the genetic algorithm [18], [20]–[24], and swarm intelligence algorithms such as particle swarm optimization [25]–[37]. One interesting item to note is that all of the prominent structured stochastic techniques are inspired by a natural or biological process. This can be attributed to the observation that such natural processes exhibit a sense of structure and stability achieved through some sort of randomness or chaos. This section reviews the two prominent population based structured stochastic optimization strategies, evolutionary algorithms and particle swarm optimization, due to their superior convergence properties for adaptive filtering applications. Special emphasis is placed on PSO due to its relative novelty.

Evolutionary Algorithms

Evolutionary algorithms (EA) [38] begin with a random set of candidate solutions (the unknown parameters to be optimized), termed the population. Each candidate solution in the population is termed an individual. Each individual's set of parameters is termed a chromosome or genome, and each parameter is termed a gene. Depending on the nature of the problem, the chromosomes may represent real numbers or can be encoded as binary strings.

At every generation, the fitness of each individual is evaluated by a predetermined fitness function that is assumed to have an extremum at the desired optimal solution. An individual with a fitness value closer to that of the optimal solution is considered better fit than an individual with a fitness value farther from that of the optimal solution. The population is then evolved based on a set of principles rooted in evolutionary theory such as natural selection, survival of the fittest, and mutation. Natural selection is the mating of the fittest individuals (parents) within the population to produce a new individual (offspring). This equates to randomly swapping corresponding parameters (crossover) between the parents to produce a new, potentially better fit individual. The new offspring then replace the least fit individuals in the population, which is the survival of the fittest facet of the evolution. A portion of the population is then randomly mutated in order to add new parameters to the search. The expectation is that only the offspring that inherit the best parameters from the parents will survive, and the population will continually converge to the best possible fitness that represents the optimal or a suitable solution. Several EA paradigms exist, such as the genetic algorithm, evolutionary programming, and evolutionary strategies, each emphasizing only specific evolutionary constructs, encodings, and operators.
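For a real-coded population, one generation of such an algorithm can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation; the number of parents, mutation rate, and mutation spread are assumed values, and the crossover is the random weighted-average variety described later in Table 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def ga_generation(pop, fitness_fn, n_parents=6, mutation_rate=0.1, sigma=0.05):
    """One generation of a simple real-coded GA: ranked selection,
    weighted-average crossover, survival of the fittest, and mutation."""
    M, R = pop.shape
    fit = np.array([fitness_fn(ind) for ind in pop])   # lower MSE = fitter
    order = np.argsort(fit)
    parents = pop[order[:n_parents]]                   # fittest individuals survive
    n_offspring = M - n_parents
    offspring = np.empty((n_offspring, R))
    for i in range(n_offspring):
        p1, p2 = parents[rng.choice(n_parents, 2, replace=False)]
        alpha = rng.random(R)                          # random weights per gene
        offspring[i] = alpha * p1 + (1 - alpha) * p2   # crossover
    new_pop = np.vstack((parents, offspring))          # offspring replace least fit
    mutate = rng.random(new_pop.shape) < mutation_rate # mutate a fraction of genes
    new_pop[mutate] += sigma * rng.standard_normal(np.count_nonzero(mutate))
    return new_pop
```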

Previous work on EAs for adaptive filtering, specifically the GA, shows that the GA is capable of globally optimizing both IIR [18], [20], [22]–[24], [39] and nonlinear [23], [24] filter structures. A hybrid genetic-LMS approach is used in [39], which utilizes the GA to find suitable initial IIR filter coefficients for the LMS algorithm, resulting in improved performance over implementing a standard GA or LMS. The performance of the GA is examined for general nonlinear recursive adaptive filters in [24], along with a proof of convergence for the estimation error.

Figure 6. MPSO flowchart: initialize the swarm; evaluate particle fitness J_i; determine gbest, pbest_i, and J_i; re-randomize a portion of the swarm about a new gbest; mutate particles; adjust inertia; update particle velocities and positions; repeat until the minimum error condition is satisfied.

Particle Swarm Optimization

Particle swarm optimization was first developed in 1995 by Eberhart and Kennedy [40]–[45], rooted in the notion of the swarm intelligence of insects, birds, etc. The algorithm attempts to mimic the natural process of group communication of individual knowledge that occurs when such swarms flock, migrate, or forage in order to achieve some optimum property such as configuration or location.

Similar to EAs, conventional PSO begins with a random population of individuals, here termed a swarm of particles. As with EAs, each particle in the swarm is a different possible set of the unknown parameters to be optimized. The parameters that characterize each particle can be real-valued or encoded depending on the particular circumstances. The premise is to efficiently search the solution space by swarming the particles toward the best fit solution encountered in previous epochs, with the intent of encountering better solutions through the course of the process and eventually converging on a single minimum error solution.

The conventional PSO algorithm begins by initializing a random swarm of M particles, each having R unknown parameters to be optimized. At each epoch, the fitness of each particle is evaluated according to the selected fitness function. The algorithm stores and progressively replaces the most fit parameters of each particle (pbest_i, i = 1, 2, . . . , M) as well as a single most fit particle (gbest) as better fit parameters are encountered. The parameters of each particle (p_i) in the swarm are updated at each epoch (n) according to the following equations:

vel_i(n) = w*vel_i(n − 1) + acc1*diag[e_1, e_2, . . . , e_R]_{i1}*(gbest − p_i(n − 1)) + acc2*diag[e_1, e_2, . . . , e_R]_{i2}*(pbest_i − p_i(n − 1))   (7)

p_i(n) = p_i(n − 1) + vel_i(n),   (8)

where vel_i(n) is the velocity vector of particle i, e_r is a vector of random values ∈ (0, 1), acc1 and acc2 are the acceleration coefficients toward gbest and pbest_i, respectively, and w is the inertia weight.
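In vectorized form, the diagonal random matrices in equation (7) amount to scaling each coordinate of the two difference vectors by an independent random number. A minimal Python sketch of one epoch of the update is given below; the inertia weight and acceleration coefficients are typical assumed values, not the settings used in the experiments reported later.

```python
import numpy as np

rng = np.random.default_rng(2)

def pso_step(p, vel, pbest, gbest, w=0.7, acc1=1.5, acc2=1.5):
    """One epoch of the conventional PSO update, equations (7) and (8).
    p, vel, pbest are (M, R) arrays; gbest is an (R,) array."""
    M, R = p.shape
    e1 = rng.random((M, R))               # random values in (0, 1), per coordinate
    e2 = rng.random((M, R))
    vel = (w * vel
           + acc1 * e1 * (gbest - p)      # pull toward the swarm's best, gbest
           + acc2 * e2 * (pbest - p))     # pull toward each particle's own pbest_i
    return p + vel, vel                   # equation (8)

# Typical usage: evaluate fitness, refresh pbest and gbest, then call pso_step.
```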

It can be gathered from the update equations that the trajectory of each particle is influenced in a direction determined by the previous velocity and the locations of gbest and pbest_i. Each particle's previous best position (pbest_i) and the swarm's overall best position (gbest) are meant to represent the notion of individual experience memory and the group knowledge of a "leader or queen," respectively, that emerges during the natural swarming process. The acceleration constants are typically chosen in the range (0, 2) and serve dual purposes in the algorithm. For one, they control the relative influence toward gbest and pbest_i, respectively, by scaling each resulting distance vector, as illustrated for a two-dimensional case in Figure 3. Secondly, the two acceleration coefficients combined form what is analogous to the step size of an adaptive algorithm. Acceleration coefficients closer to 0 will produce fine searches of a region, while coefficients closer to 1 will result in less exploration and faster convergence. Setting the acceleration greater than 1 allows the particle to possibly overstep gbest or pbest, resulting in a broader search. The random e_i vectors have R different components, which are randomly chosen from a uniform distribution in the range (0, 1). This allows the particle to take constrained, randomly directed steps in a bounded region between gbest and pbest_i, as shown in Figure 3.

Figure 7. Learning curves (MSE in dB versus generation) for PSO, MPSO, PSO-LMS, CON-LMS, and GA; population = 50.

A single particle update is graphically illustrated in two dimensions in Figure 4. The new particle coordinates can lie anywhere within the bounded region, depending upon the weights and random components associated with each vector. The particle update bounds in Figure 4 are basically composed of all of the bounded regions for each vector as shown in Figure 3, with the addition of the previous velocity component.

When a new gbest is encountered during the update process, all other particles begin to swarm toward the new gbest, continuing the directed global search along the way. The search regions continue to decrease as new pbest_i's are found within the search regions. When all of the particles in the swarm have converged to gbest, the gbest parameters characterize the minimum error solution obtained by the algorithm.

One of the key advantages of PSO is the ease of implementation, both in the context of coding and in parameter selection. It is much simpler and more intuitive to implement than the complex, probability based selection and mutation operators required for evolutionary algorithms. Guidelines for selecting and optimizing the PSO parameters are detailed in [46]–[50].

The Particle Swarm—Least Mean Square (PSO-LMS) Hybrid Algorithm

One weakness of conventional PSO is that its local search is not guaranteed convergent; its local search capability lies primarily in the swarm size and search parameters. On the other hand, the problem with simply running a brute-force population of independent LMS algorithms [4] is that there is no collective information exchange between population members, which makes the algorithm inefficient and prone to the local minimum problem of standard LMS. Therefore, it is desirable to combine the convergent local search capabilities of the LMS algorithm [51] with the global search of PSO.

When initialized in the global optimum valley, the LMS algorithm can be tuned to provide an optimal rate of convergence without apprehension of encountering a local minimum. Therefore, by using a structured stochastic search, such as PSO, to quickly focus the population on regions of interest, an optimally tuned LMS algorithm can take over and provide better results than standard LMS. A generalized form of this PSO-LMS hybrid algorithm is presented here, which can be extended for IIR updates and back-propagation updates for nonlinear structures.

For a general adaptive filter structure, the LMS update takes the form:

w(n) = w(n − 1) + µe(n − 1)∇y(n − 1), (9)

where e(n) is the instantaneous error between the desired signal and the filter output, ∇y(n) is the gradient of the output with respect to the filter parameters, and µ is the step size. This update can be considered a directional vector, similar to those used to generate the particle updates of PSO. To form the PSO-LMS hybrid, the LMS update of equation (9) is combined with the PSO particle update of equations (7) and (8) to create the hybrid update:

p_i(n) = p_i(n − 1) + c_1 vel_i(n) + c_2 µe(n − 1)∇y(n − 1),   (10)

where c_1(n) and c_2(n) are scaling factors that control the relative influence of the PSO and LMS directions, respectively. These scaling factors should be chosen such that c_1(n) + c_2(n) = 1 in order to control the stability of the algorithm. The principle is to decrease the influence of the more global PSO component as the algorithm progresses, which acts to increase the influence of the LMS component in order to provide the desired convergence properties. In addition to the locally convergent properties, another advantage of this algorithm is that the LMS component enables tracking of a dynamic system.

Figure 8. Learning curves (MSE in dB versus generation) for PSO, MPSO, PSO-LMS, CON-LMS, and GA; population = 100.
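A minimal sketch of the hybrid update of equation (10), written in Python for a single particle, is given below. The gradient argument, the step size, and the simple c_1/c_2 schedule are illustrative assumptions rather than the settings used in the paper's experiments.

```python
import numpy as np

def pso_lms_step(p, vel, grad_y, err, c1, c2, mu=0.01):
    """Hybrid update of equation (10) for one particle: a convex combination
    (c1 + c2 = 1) of the PSO velocity and the LMS direction. grad_y is the
    gradient of the filter output with respect to the particle's parameters
    (for an FIR structure this is simply the input vector X(n))."""
    lms_dir = mu * err * grad_y             # LMS component, equation (9)
    return p + c1 * vel + c2 * lms_dir      # equation (10)

def schedule(n, n_switch=100):
    """One assumed schedule: shift influence from PSO to LMS over n_switch epochs."""
    c2 = min(1.0, n / n_switch)
    return 1.0 - c2, c2
```

A smooth schedule of this kind realizes the stated principle of letting the locally convergent LMS component take over once the swarm has focused on a region of interest.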

Modified Particle Swarm Optimization

Although PSO-LMS has several desirable traits, the LMS component of the update can grow complex and tends to slow the convergence of certain IIR and nonlinear structures. Therefore, several simple enhancements can be added to conventional PSO that significantly boost performance without much added complexity. The following enhancements address the two main weaknesses of conventional PSO: outlying particles and stagnation.

If the new gbest particle is an outlying particle with respect to the swarm, the rest of the swarm can tend to move toward the new gbest from the same general direction. This may potentially leave some critical region around the new minimum excluded from the search. To combat this problem, when a new gbest is encountered, a small percentage of randomly selected particles can be re-randomized about the new gbest. This does not affect the global search properties, and can actually improve the convergence rate.

Particles closer to gbest will tend to quickly converge on it and become stagnant (no longer update) while the other, more distant particles continue to search. A stagnant particle is essentially useless because its fitness continues to be evaluated but it no longer contributes to the search. Stagnancy of particles can be eliminated by slightly varying the random parameters of each particle at every epoch, similar to mutation in EAs. This has little effect on particles distant from gbest because this random influence is relatively small compared to the random update of equation (7). However, it eliminates any stagnant particles by forcing a finer search about gbest.

Figure 9. Learning curves (MSE in dB versus generation) for PSO, MPSO, PSO-LMS, CON-LMS, and GA; population = 20.

Figure 10. Learning curves (MSE in dB versus generation) for PSO, MPSO, PSO-LMS, CON-LMS, and GA; population = 40.

Because the mutation operator tends to slow the optimal convergence rate of PSO in general, the following adaptive inertia operator is included to compensate:

w_i(n) = 1 / (1 + e^{−ΔJ_i(n)/S}),

where w_i(n) is the inertia weight of the ith particle, ΔJ_i(n) is the change in particle fitness between the current and last generation, and S is a constant used to adjust the transition slope based on the expected fitness range. The adaptive inertia automatically adjusts to favor directions that result in large increases in the fitness value, while suppressing directions that decrease the fitness value. This modification does not prevent the hill-climbing capabilities of PSO; it merely increases the influence of potentially fruitful inertial directions, while decreasing the influence of potentially unfavorable inertial directions.

The relative effects of the suggested enhancements are illustrated for a simple IIR system identification example in Figure 5. A flowchart for such an algorithm, referred to here as MPSO, is given in Figure 6. This algorithm is capable of providing desirable performance and convergence properties in most any context. In addition to eliminating the concerns of conventional PSO, the algorithm is designed to balance the convergence speed and search quality tradeoffs of a stochastic search. At a minimum, a more compact form of MPSO, using only the additional mutation operator, will provide a significant increase in performance over conventional PSO. When computational complexity is an issue, another major advantage of these algorithms is that they are able to achieve robust results with smaller populations, since the computational complexity is directly related to the population size for population based algorithms.
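The three MPSO enhancements can be sketched as small helper routines, shown below in Python. The mutation spread, re-randomization fraction, and slope constant S are assumed values chosen only for illustration; the sign convention for the fitness change follows the definition in the text.

```python
import numpy as np

rng = np.random.default_rng(3)

def adaptive_inertia(delta_J, S=1.0):
    """Per-particle inertia weight w_i(n) = 1 / (1 + exp(-dJ_i(n)/S)),
    where delta_J is the change in particle fitness since the last generation."""
    return 1.0 / (1.0 + np.exp(-delta_J / S))

def mutate(p, sigma=0.01):
    """Slightly perturb every particle each epoch to prevent stagnation."""
    return p + sigma * rng.standard_normal(p.shape)

def rerandomize_about_gbest(p, gbest, fraction=0.1, spread=0.1):
    """Re-randomize a small fraction of particles about a newly found gbest."""
    M, R = p.shape
    idx = rng.choice(M, max(1, int(fraction * M)), replace=False)
    p[idx] = gbest + spread * rng.standard_normal((len(idx), R))
    return p
```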

Experimental Examples

In the following examples, the properties of the aforementioned algorithms are examined on several IIR and nonlinear system identification problems, using a windowed mean squared error between the desired training signal and the adaptive filter output as the fitness function. All algorithms were initialized with the same population of real-valued parameters and allowed to evolve. The population sizes were selected to provide reasonable performance, while revealing the performance characteristics of each algorithm.

Figure 11. Learning curves (MSE in dB versus generation) for PSO, MPSO, PSO-LMS, CON-LMS, and GA; population = 20.

Table 1. Algorithm descriptions.

PSO: The conventional PSO algorithm.

MPSO: The modified PSO algorithm, with matching PSO parameters.

PSO-LMS: The PSO-LMS algorithm, with matching PSO parameters. A smooth transition between algorithms was selected to occur at the point that conventional PSO stagnates.

CON-LMS: The congregational LMS algorithm [52] is implemented, where a population of independent LMS estimates is updated at each epoch.

GA: The genetic algorithm uses a ranked elitist strategy suggested in [20], where the 6 fittest members of the population are used to generate offspring, which replace the remaining least fit members of the population. For each offspring, two of the 6 parents are selected randomly and the crossover is performed by a random weighted average of each parent's coefficients. This scheme was chosen due to its accelerated convergence properties.

The fitness is defined as the MSE of the candidate solutions evaluated over a causal window of 100 input samples. Unless specified otherwise, the input signals are zero-mean Gaussian white noise with unit variance. For each simulation, the MSE is averaged over 50 independent Monte Carlo trials. The specifics of each algorithm are given in Table 1.
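A windowed-MSE fitness of this kind might be implemented as in the short Python sketch below; the callable interface for a candidate filter and the small floor added inside the logarithm are assumptions made for the illustration.

```python
import numpy as np

def windowed_mse_db(candidate_filter, x, d, window=100):
    """Fitness of one candidate: MSE (in dB) between the desired signal d and
    the candidate filter's output over a causal window of the most recent
    `window` samples. `candidate_filter` maps an input block to an output block."""
    y = candidate_filter(x[-window:])
    err = d[-window:] - y
    mse = np.mean(err ** 2)
    return 10.0 * np.log10(mse + 1e-12)   # small offset avoids log(0)
```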

Matched High-Order IIR Filter, White Noise Input

For this example, the plant given below is a fifth-order lowpass Butterworth filter example taken from [20]:

H_PLANT(z^−1) = (0.1084 + 0.5419z^−1 + 1.0837z^−2 + 1.0837z^−3 + 0.5419z^−4 + 0.1084z^−5) / (1 + 0.9853z^−1 + 0.9738z^−2 + 0.3864z^−3 + 0.1112z^−4 + 0.0113z^−5)   (11)

The matched adaptive filter structure is

H_AF(z^−1) = (p_i^1 + p_i^2 z^−1 + p_i^3 z^−2 + p_i^4 z^−3 + p_i^5 z^−4 + p_i^6 z^−5) / (1 + p_i^7 z^−1 + p_i^8 z^−2 + p_i^9 z^−3 + p_i^10 z^−4 + p_i^11 z^−5).   (12)

The learning curves for this example are given in Figures 7 and 8. The simulations having the larger population illustrate the case where the population size is sufficient and all of the algorithms rarely become trapped in a local minimum.

Table 2 lists the local minima statistics after swarm convergence for the decreased population IIR examples. One interesting observation is that population based searches are inherently less likely to produce unstable IIR filters, due to the number of estimates available at any given epoch, which is not the case for sequential searches such as LMS.

Reduced-Order IIR Filter, Colored Noise Input

For this example, the plant given below is an example taken from [20] and [53]:

H_PLANT(z^−1) = 1 / (1 − 0.6z^−1)^3   (13)

H_AF(z^−1) = p_i^1 / (1 + p_i^2 z^−1 + p_i^3 z^−2)   (14)

H_COLOR(z^−1) = (1 − 0.6z^−1)^2 (1 + 0.6z^−1)^2.   (15)

The adaptive filters use a colored input generated by filtering white noise with the FIR filter given in equation (15). This, in combination with the reduced order, creates a bimodal error surface. Again, the colored noise leads to smaller stable step sizes and slowed convergence rates. The learning curves for this example are given in Figures 9 and 10.
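For reference, the coloring operation of equation (15) amounts to convolving unit-variance white noise with the expanded FIR coefficients of H_COLOR. A small Python sketch, with an assumed sample count and seed, is given below.

```python
import numpy as np

def colored_input(n_samples, seed=0):
    """Colored input: white noise filtered by H_COLOR of equation (15),
    (1 - 0.6 z^-1)^2 (1 + 0.6 z^-1)^2."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n_samples)
    # Expand the coloring filter by convolving its first-order factors.
    h = np.convolve(np.convolve([1.0, -0.6], [1.0, -0.6]),
                    np.convolve([1.0, 0.6], [1.0, 0.6]))
    return np.convolve(white, h)[:n_samples]
```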

Matched Structure Volterra Identification

In this example, a matched structure truncated Volterra adaptive filter is used to identify the truncated Volterra plant taken from [54]:

y[n] = −0.64x[n] + x[n − 2] + 0.9x^2[n] + x^2[n − 1]   (16)

The matched adaptive filter output is

y[n] = p_i^1 x[n] + p_i^2 x[n − 1] + p_i^3 x[n − 2] + p_i^4 x^2[n] + p_i^5 x^2[n − 1] + p_i^6 x^2[n − 2] + p_i^7 x[n]x[n − 1] + p_i^8 x[n]x[n − 2] + p_i^9 x[n − 1]x[n − 2].   (17)

The learning curves for this example are given in Figure 11.
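Evaluating a candidate particle's fitness in this example only requires computing the output of equation (17). A direct Python rendering of that computation (an illustration, not the experimental code) is shown below.

```python
import numpy as np

def volterra_output(p, x):
    """Output of the truncated second-order Volterra filter of equation (17)
    for one candidate parameter vector p = [p1, ..., p9]."""
    y = np.zeros(len(x))
    for n in range(2, len(x)):
        x0, x1, x2 = x[n], x[n - 1], x[n - 2]
        y[n] = (p[0] * x0 + p[1] * x1 + p[2] * x2                     # linear terms
                + p[3] * x0**2 + p[4] * x1**2 + p[5] * x2**2          # squared terms
                + p[6] * x0 * x1 + p[7] * x0 * x2 + p[8] * x1 * x2)   # cross terms
    return y

# The plant of equation (16) corresponds to the special case
# p = [-0.64, 0, 1, 0.9, 1, 0, 0, 0, 0].
```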

Nonlinear LNL Cascade Identification

In this example, the identification of an LNL cascade taken from [54] is performed using an unmatched LNL adaptive filter. The LNL plant consists of a fourth-order Butterworth lowpass filter (equation (18)), followed by a fourth-power memoryless nonlinear operator, followed by a fourth-order Chebyshev lowpass filter (equation (19)), as shown in Figure 12.

This system is a common model for satellite communication systems in which the linear filters model the dispersive transmission paths to and from the satellite, and the nonlinearity models the traveling wave tube (TWT) transmission amplifiers operating near the saturation region.

H_B(z^−1) = [(0.2851 + 0.5704z^−1 + 0.2851z^−2)(0.2851 + 0.5701z^−1 + 0.2851z^−2)] / [(1 − 0.1024z^−1 + 0.4475z^−2)(1 − 0.0736z^−1 + 0.0408z^−2)]   (18)

H_C(z^−1) = [(0.2025 + 0.288z^−1 + 0.2025z^−2)(0.2025 + 0.0034z^−1 + 0.2025z^−2)] / [(1 − 1.01z^−1 + 0.5861z^−2)(1 − 0.6591z^−1 + 0.1498z^−2)]   (19)


Table 2. Percentage of trials converging to a local minimum.

MPSO: 2
GA: 6
PSO-LMS: 6
CON-LMS: 15
PSO: 20

The LNL adaptive filter structure is given as follows:

H_B(z^−1) = (p_i^1 + p_i^2 z^−1 + p_i^3 z^−2 + p_i^4 z^−3) / (1 + p_i^5 z^−1 + p_i^6 z^−2 + p_i^7 z^−3)   (20)

nonlinearity = p_i^8 [H_B(z^−1)]^4   (21)

H_C(z^−1) = (p_i^9 + p_i^10 z^−1 + p_i^11 z^−2 + p_i^12 z^−3) / (1 + p_i^13 z^−1 + p_i^14 z^−2 + p_i^15 z^−3).   (22)

The learning curves for this example are given in Figure 13.
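Evaluating one candidate of equations (20)-(22) is a cascade of two IIR filtering operations around a memoryless fourth-power nonlinearity. The Python sketch below uses scipy.signal.lfilter for the linear sections; the use of SciPy and the parameter packing p = [p1, ..., p15] are assumptions of this illustration.

```python
import numpy as np
from scipy.signal import lfilter

def lnl_output(p, x):
    """Output of the LNL adaptive structure of equations (20)-(22) for one
    candidate parameter vector p = [p1, ..., p15]."""
    p = np.asarray(p, dtype=float)
    u = lfilter(p[0:4], np.r_[1.0, p[4:7]], x)         # H_B(z^-1), equation (20)
    v = p[7] * u ** 4                                  # memoryless nonlinearity, eq. (21)
    return lfilter(p[8:12], np.r_[1.0, p[12:15]], v)   # H_C(z^-1), equation (22)
```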

Summary and Conclusions

The results presented demonstrate that the structured stochastic algorithms are capable of quickly and effectively adapting the coefficients of a wide variety of IIR and nonlinear structures. From the simulation results, it is observed that, with a sufficient population size, all of the structured stochastic algorithms are capable of converging rapidly to below −20 dB in most instances, which is an order of magnitude faster than most existing gradient based techniques.

In all cases, the congregational LMS algorithm exhibits the slowest convergence rate due to the fact that there is no information transfer between the estimates. It is observed from the learning curves that conventional PSO stagnates and PSO-LMS continues to improve from the stagnation point of conventional PSO at a rate similar to CON-LMS. Both of the LMS based algorithms are capable of eventually attaining the noise floor when the number of generations is increased, assuming that they are not trapped in a local minimum.

Since the GA does not have an explicit step size, the convergence rate can only be controlled to a limited extent through the crossover and mutation operations, and the algorithm must evolve at its own intrinsic rate. Due to the nature of the algorithm, these GA operators become increasingly taxed as the population decreases, resulting in deteriorating performance. On the other hand, as the population size increases, the performance gap between the GA and MPSO begins to diminish for large parameter spaces.

Though conventional PSO exhibits fast initial convergence, it fails to improve further because the swarm quickly becomes stagnant, converging to a suboptimal solution in all instances. However, with the same set of algorithm parameters, the MPSO particles do not stagnate, allowing it to reach the noise floor. Smaller acceleration coefficients can be used with conventional PSO to allow it to approach the noise floor, forsaking the rapid convergence rate. With the introduction of a simple mutation type operator, adaptive inertia weights, and re-randomization, MPSO can retain the favorable convergence rate with a smaller population, while still achieving the noise floor and avoiding local minima. This can offer considerable savings in cases where computational complexity is an issue.

A topic that warrants further investigation is an assessment of the performance of these algorithms on modified error surfaces. It is evident that structured stochastic search algorithms perform better on surfaces exhibiting relatively few local minima, because the local attractors offer little interference to the search. By incorporating alternate formulations of the adaptive filter structures or performing data preprocessing, such as input orthogonalization and power normalization, the error surface can be smoothed dramatically. These additions will likely result in improved performance of stochastic search algorithms.

Figure 12. Nonlinear LNL unknown system: Butterworth filter H_B(z), fourth-power polynomial nonlinearity, Chebyshev filter H_C(z).

Figure 13. Learning curves (MSE in dB versus generation) for PSO, MPSO, and GA; population = 50.

Related to this notion is the assertion that structured stochastic algorithms, particularly versions of PSO, have a tendency to excel on lower order parameter spaces. This is due partly to the fact that lower dimensional parameter spaces tend to exhibit fewer local minima in general, and partly because the volume of the hyperspace increases exponentially with each additional parameter to be estimated. Therefore it is hypothesized that dimensionality reduction techniques, such as implementing alternate formulations of adaptive filter structures that are able to accurately model unknown systems with a minimum number of parameters, could benefit the overall performance.

References

[1] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.

[2] J. J. Shynk, "Adaptive IIR filtering," IEEE ASSP Magazine, pp. 4–21, April 1989.

[3] H. Fan, "New adaptive IIR filtering algorithms," Ph.D. thesis, Dept. of Electrical and Computer Eng., University of Illinois at Urbana-Champaign, 1985.

[4] H. Fan and W. K. Jenkins, "A new adaptive IIR filter," IEEE Trans. Circuits and Systems, vol. CAS-33, no. 10, pp. 939–947, Oct. 1986.

[5] S. Pai, "Global optimization techniques for IIR phase equalizers," M.S. thesis, The Pennsylvania State University, University Park, PA, 2001.

[6] S. Pai, W. K. Jenkins, and D. J. Krusienski, "Adaptive IIR phase equalizers based on stochastic search algorithms," Proc. 37th Asilomar Conf. on Signals, Systems, and Computers, Nov. 2003.

[7] A. E. Nordsjo and L. H. Zetterberg, "Identification of certain time-varying nonlinear Wiener and Hammerstein systems," IEEE Trans. on Signal Processing, vol. 49, no. 3, pp. 577–592, March 2001.

[8] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, 1999.

[9] A. Ismail and A. P. Engelbrecht, "Training product units in feedforward neural networks using particle swarm optimization," Proceedings of the International Conference on Artificial Intelligence, Durban, South Africa, pp. 36–40, 1999.

[10] A. Ismail and A. P. Engelbrecht, "Global optimization algorithms for training product unit neural networks," Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), pp. 132–137, 2000.

[11] R. Mendes, P. Cortez, M. Rocha, and J. Neves, "Particle swarms for feedforward neural network training," in Proc. Int. Joint Conf. Neural Networks (IJCNN '02), vol. 2, pp. 1895–1899, 2002.

[12] F. van den Bergh and A. P. Engelbrecht, "Training product unit networks using cooperative particle swarm optimizers," Proceedings of the INNS-IEEE International Joint Conference on Neural Networks 2001, Washington, DC, USA, 2001.

[13] K. Y. Lee and M. A. El-Sharkawi, Eds., Tutorial on Modern Heuristic Optimization Techniques with Applications to Power Systems, IEEE Power Engineering Society, 2002.

[14] A. P. Engelbrecht, Computational Intelligence: An Introduction, John Wiley & Sons Ltd, 2002.

[15] E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines, John Wiley & Sons, 1989.

[16] N. Benvenuto, M. Marchesi, G. Orlandi, F. Piazza, and A. Uncini, "Finite wordlength digital filter design using an annealing algorithm," Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP-89), vol. 2, pp. 861–864, 1989.

[17] S. Kirkpatrick, C. D. Gelatt Jr., and M. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 498–516, 1983.

[18] R. Nambiar and P. Mars, "Genetic and annealing approaches to adaptive digital filtering," Proc. 26th Asilomar Conf. on Signals, Systems, and Computers, vol. 2, pp. 871–875, Oct. 1992.

[19] J. Radecki, J. Konrad, and E. Dubois, "Design of multidimensional finite-wordlength FIR and IIR filters by simulated annealing," IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 42, no. 6, June 1995.

[20] P. Mars, J. R. Chen, and R. Nambiar, Learning Algorithms: Theory and Applications in Signal Processing, Control, and Communications, CRC Press, Inc., 1996.

[21] A. Neubauer, "Genetic algorithms for adaptive non-linear predictors," 1998 IEEE International Conference on Electronics, Circuits and Systems, vol. 1, pp. 209–212, 7–10 Sept. 1998.

[22] A. Neubauer, "Non-linear adaptive filters based on genetic algorithms with applications to digital signal processing," Genetic Algorithms in Engineering Systems: Innovations and Applications (GALESIA 97), Second International Conference (Conf. Publ. No. 446), pp. 180–185, 2–4 Sept. 1997.

[23] K. S. Tang, K. F. Man, and Q. He, "Genetic algorithms and their applications," IEEE Signal Processing Magazine, pp. 22–37, Nov. 1996.

[24] L. Yao and W. A. Sethares, "Nonlinear parameter estimation via the genetic algorithm," IEEE Transactions on Signal Processing, vol. 42, April 1994.

[25] P. J. Angeline, "Evolutionary optimization versus particle swarm optimization: philosophy and performance differences," Evolutionary Programming VII: Proc. of the Seventh Annual Conference on Evolutionary Programming, 1998.

[26] A. Carlisle and G. Dozier, "Adapting particle swarm optimization to dynamic environments," Proceedings of the International Conference on Artificial Intelligence (ICAI 2000), Las Vegas, Nevada, USA, pp. 429–434, 2000.

[27] M. Clerc, "The swarm and the queen: towards a deterministic and adaptive particle swarm optimization," Proceedings of the IEEE Congress on Evolutionary Computation (CEC 1999), pp. 1951–1957, 1999.

[28] R. C. Eberhart and Y. Shi, "Evolving artificial neural networks," Proceedings of the International Conference on Neural Networks and Brain, Beijing, P. R. China, pp. PL5–PL13, 1998.

[29] R. C. Eberhart and Y. Shi, "Comparison between genetic algorithms and particle swarm optimization," Evolutionary Programming VII: Proceedings of the Seventh Annual Conference on Evolutionary Programming, San Diego, CA, 1998.

[30] R. Eberhart and Y. Shi, "Comparing inertia weights and constriction factors in particle swarm optimization," in Proc. Congr. Evolutionary Computation, vol. 1, pp. 84–88, 2000.

[31] R. Eberhart and Y. Shi, "Particle swarm optimization: developments, applications and resources," in Proc. Congr. Evolutionary Computation (CEC 01), vol. 1, pp. 81–86, 2001.

[32] X. Hu and R. C. Eberhart, "Adaptive particle swarm optimization: detection and response to dynamic systems," Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2002), Honolulu, Hawaii, USA, 2002.

[33] D. J. Krusienski, "Enhanced structured stochastic global optimization algorithms for IIR and nonlinear adaptive filtering," Ph.D. thesis, Dept. of Electrical Eng., The Pennsylvania State University, University Park, PA, 2004.

[34] D. J. Krusienski and W. K. Jenkins, "A particle swarm optimization-LMS hybrid algorithm for adaptive filtering," Proc. of the 38th Asilomar Conf. on Signals, Systems, and Computers, Nov. 2004.

[35] D. J. Krusienski and W. K. Jenkins, "Particle swarm optimization for adaptive IIR filter structures," Proc. of the 2004 Congress on Evolutionary Computation, pp. 965–970, June 2004.

[36] D. J. Krusienski and W. K. Jenkins, "The application of particle swarm optimization to adaptive IIR phase equalization," Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Montreal, Canada, pp. II-693–II-696, 17–21 May 2004.

[37] D. J. Krusienski and W. K. Jenkins, "Adaptive filtering via particle swarm optimization," Proc. of the 37th Asilomar Conf. on Signals, Systems, and Computers, pp. 571–575, Nov. 2003.

[38] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.

[39] S. C. Ng, S. H. Leung, C. Y. Chung, A. Luk, and W. H. Lau, "The genetic search approach: A new learning algorithm for adaptive IIR filtering," IEEE Signal Processing Magazine, pp. 38–46, Nov. 1996.

[40] R. C. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," Proceedings of the Sixth International Symposium on Micromachine and Human Science, Nagoya, Japan, pp. 39–43, 1995.

[41] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," Proceedings of the IEEE International Conference on Neural Networks, Piscataway, NJ, pp. 1942–1948, 1995.

[42] J. Kennedy and R. C. Eberhart, "A discrete binary version of the particle swarm algorithm," in Proc. IEEE Int. Conf. Systems, Man, and Cybernetics: Computational Cybernetics and Simulation, vol. 5, pp. 4104–4108, 1997.

[43] J. Kennedy, R. C. Eberhart, and Y. Shi, Swarm Intelligence, San Francisco: Morgan Kaufmann Publishers, 2001.

[44] J. Kennedy and R. Mendes, "Population structure and particle swarm performance," in Proc. Congr. Evolutionary Computation (CEC 02), vol. 2, pp. 1671–1676, 2002.

[45] J. Kennedy and J. M. Spears, "Matching algorithms to problems: An experimental test of the particle swarm and some genetic algorithms on the multimodal problem generator," in Proc. IEEE World Congr. Computational Intelligence Evolutionary Computation, pp. 78–83, 1998.

[46] M. Clerc and J. Kennedy, "The particle swarm—Explosion, stability, and convergence in a multidimensional complex space," IEEE Trans. Evol. Comput., vol. 6, pp. 58–73, Feb. 2002.

[47] A. I. El-Gallad, M. E. El-Hawary, A. A. Sallam, and A. Kalas, "Enhancing the particle swarm optimizer via proper parameters selection," Canadian Conference on Electrical and Computer Engineering, pp. 792–797, 2002.

[48] Y. Shi and R. Eberhart, "A modified particle swarm optimizer," in Proc. IEEE World Congr. Computational Intelligence Evolutionary Computation, pp. 69–73, 1998.

[49] Y. Shi and R. Eberhart, "Empirical study of particle swarm optimization," in Proc. Congr. Evolutionary Computation (CEC 99), vol. 3, pp. 1945–1950, 1999.

[50] F. van den Bergh and A. P. Engelbrecht, "Effects of swarm size on cooperative particle swarm optimizers," Proceedings of the Genetic and Evolutionary Computation Conference 2001, San Francisco, USA, 2001.

[51] S. Haykin, Adaptive Filter Theory, 4th ed., Prentice Hall, 2002.

[52] K. L. Blackmore, R. C. Williamson, I. M. Y. Mareels, and W. A. Sethares, "Online learning via congregational gradient descent," Mathematics of Control, Signals, and Systems, vol. 10, pp. 331–363, 1997.

[53] M. S. White and S. J. Flockton, chapter in Evolutionary Algorithms in Engineering Applications, D. Dasgupta and Z. Michalewicz, Eds., Springer Verlag, 1997.

[54] V. J. Mathews and G. L. Sicuranza, Polynomial Signal Processing, John Wiley & Sons, Inc., 2000.

Dean J. Krusienski (S'01–M'04) received the B.S.E.E., M.S.E.E., and Ph.D.E.E. degrees from The Pennsylvania State University, University Park, PA, in 1999, 2001, and 2004, respectively.

From 2000 to 2004, he was a graduate teaching assistant at Penn State. He is currently a postdoctoral researcher at the New York State Department of Health's Wadsworth Center in Albany, NY. At the Wadsworth Center, he is researching the application of adaptive signal processing and pattern recognition techniques to a non-invasive brain-computer interface that allows individuals with severe motor disabilities to communicate or control a computer/device via mental imagery. His current research interests include global optimization algorithms for adaptive filtering, computational intelligence, neural networks, blind source separation, and biomedical signal processing.

Dr. Krusienski is a member of Eta Kappa Nu and the IEEE.

W. Kenneth Jenkins received the B.S.E.E. degree from Lehigh University and the M.S.E.E. and Ph.D. degrees from Purdue University. From 1974 to 1977 he was a Research Scientist Associate in the Communication Sciences Laboratory at the Lockheed Research Laboratory, Palo Alto, CA. In 1977 he joined the University of Illinois at Urbana-Champaign, where he was a faculty member in Electrical and Computer Engineering from 1977 until 1999. From 1986 to 1999 Dr. Jenkins was the Director of the Coordinated Science Laboratory and Principal Investigator on the Joint Services Electronics Program (JSEP) at the University of Illinois. In 1999 he moved to his current position as Professor and Head of the Electrical Engineering Department at The Pennsylvania State University in State College, PA.

Dr. Jenkins' current research interests include digital filtering, signal processing algorithms, multidimensional array processing, computer imaging, one and two dimensional adaptive digital filtering, and biologically inspired optimization algorithms for signal processing. He co-authored a book entitled Advanced Concepts in Adaptive Signal Processing, published by Kluwer in 1996. He is a past Associate Editor for the IEEE Transactions on Circuits and Systems, and was President (1985) of the CAS Society. He served as General Chairman of the 1988 Midwest Symposium on Circuits and Systems, as the Program Chairman of the 1990 IEEE International Symposium on Circuits and Systems, as the General Chairman of the Thirty-Second Annual Asilomar Conference on Signals and Systems, and as the Technical Program Chair for the 2000 Midwest Symposium on Circuits and Systems. He is currently serving as President of the Electrical and Computer Engineering Department Heads Association (ECEDHA).

Dr. Jenkins is a Fellow of the IEEE and a recipient of the 1990 Distinguished Service Award of the IEEE Circuits and Systems Society. In 2000 he received a Golden Jubilee Medal from the IEEE Circuits and Systems Society and a 2000 Millennium Award from the IEEE. In 2000 he was named a co-winner of the International Award of the George Montefiore Foundation (Belgium) for outstanding career contributions to the field of electrical engineering and electrical science, and in 2002 he was awarded the Shaler Area High School Distinguished Alumnus Award.
