
Automatica 46 (2010) 1675–1681


Brief paper

Two nonlinear optimization methods for black box identification compared

Anne Van Mulders a,∗, Johan Schoukens a, Marnix Volckaert b, Moritz Diehl c

a Vrije Universiteit Brussel, Department ELEC, Pleinlaan 2, B1050 Brussels, Belgium
b K.U.Leuven, Department MECH, Celestijnenlaan 300b, B3001 Leuven, Belgium
c K.U.Leuven, Department ESAT, Kasteelpark Arenberg 10, B3001 Leuven, Belgium

Article info

Article history:
Received 16 June 2009
Received in revised form 16 April 2010
Accepted 27 May 2010
Available online 24 July 2010

Keywords:
Identification algorithms
Parameter estimation
Nonlinear systems
Nonlinear models
State-space models
Constraints

Abstract

In this paper, two nonlinear optimization methods for the identification of nonlinear systems are compared. Both methods estimate the parameters of e.g. a polynomial nonlinear state-space model by means of a nonlinear least-squares optimization of the same cost function. While the first method does not estimate the states explicitly, the second method estimates both states and parameters by adding an extra constraint equation. Both methods are introduced and their similarities and differences are discussed using simulation data. The unconstrained method appears to be faster and more memory efficient, but the constrained method has a significant advantage as well: it is robust for unstable systems of which bounded input–output data can be measured (e.g. a system captured in a stabilizing feedback loop). Both methods have been applied successfully on real-life measurement data.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Most real-life systems are to some extent nonlinear. For such systems, nonlinear modelling can improve the identification results, i.e. reduce the residual error. There exist several types of nonlinear models, among them black box models such as Volterra systems, block structured models, neural networks and fuzzy models. For more information, we refer to Giannakis and Serpedin (2001).

In this paper, the model structure is a conventional linear state-space model extended with polynomial nonlinear terms. It is called a Polynomial Nonlinear State-space Model (PNLSS) (Paduart, 2008). The PNLSS model serves as an example, but the ideas could be applied to other nonlinear state-space models as well.

The goal is to estimate the parameters of the nonlinear model given the exact input and the noisy output measurements. The states are assumed to be unknown, and to solve the problem we minimize the least-squares error between the measured and modelled output.

The material in this paper was partially presented at the 15th IFAC Symposium on System Identification, SYSID 2009, July 6–8, 2009, Saint-Malo, France. This paper was recommended for publication in revised form by Associate Editor Alessandro Chiuso under the direction of Editor Torsten Söderström.
∗ Corresponding author. Tel.: +32 2 629 36 65; fax: +32 2 629 28 50.
E-mail addresses: [email protected] (A. Van Mulders), [email protected] (J. Schoukens), [email protected] (M. Volckaert), [email protected] (M. Diehl).

0005-1098/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.automatica.2010.06.021

The contributions of this work are the following:

• Construction of a constrained optimization method that allows one to estimate the PNLSS model parameters from input–output data. The approach is related to multiple shooting methods for parameter estimation (Bock, 1987), but is now applied to black box system identification with a procedure that provides initial estimates of the parameters;
• Evaluation of its properties, such as the estimation of an unstable system based on bounded input–output data, by means of simulation examples;
• Comparison of the performances (least-squares errors) of the two optimization methods.

The structure of the paper is the following: in Section 2, we present the model structure. In Section 3, we briefly explain the generation of initial estimates for the parameters to be estimated. In Section 4, the two nonlinear optimization methods are discussed and Section 5 shows the simulation results. The last section recapitulates the main conclusions of this paper. Although the paper does not contain any experimental results, both methods have already been used in practice, such as in the benchmark session of SYSID 2009 (Paduart, Lauwers, Pintelon, & Schoukens, 2009; Schoukens, Suykens, & Ljung, 2009; Van Mulders, Volckaert, Diehl, & Schoukens, 2009).


2. Model structure

2.1. Class of discrete time systems considered

As is well known, state-space models are particularly well suited for multiple-input multiple-output (MIMO) systems. Let nu and ny represent the number of input and output signals, respectively. In general, a discrete time nonlinear state-space model can be formulated as:

x(t + 1) = f(x(t), u(t))
y(t) = h(x(t), u(t)).    (1)

Herein, t = 0, . . . , N − 1 is the discrete time instant, x ∈ R^{n×N} are the states and u ∈ R^{nu×N} and y ∈ R^{ny×N} are the input and output. n is the model order and N is the total number of time instants. The upper equation is called the state equation and describes the evolution of the states. The lower equation is called the output equation and describes the output as a function of the states and inputs.

In our case, we assume that the exact description of the nonlinear system is of the form:

x(t + 1) = Ax(t) + Bu(t) + Eζ(x(t), u(t))
y(t) = Cx(t) + Du(t) + Fη(x(t), u(t)).    (2)

The vectors ζ ∈ R^{nζ} and η ∈ R^{nη} contain monomials in x(t) and u(t); the matrices E ∈ R^{n×nζ} and F ∈ R^{ny×nη} contain the coefficients associated with those monomials. nη and nζ are the number of monomials in η and ζ, respectively.

The above mentioned model is called a polynomial nonlinear state-space model (PNLSS). It consists of a classical linear state-space model with nonlinear terms Eζ and Fη. The coefficients of the linear terms in x(t) (the states) and u(t) (the inputs) are given by the coefficient matrices A ∈ R^{n×n} and B ∈ R^{n×nu} in the state equation, and C ∈ R^{ny×n} and D ∈ R^{ny×nu} in the output equation.

The monomials can be any chosen set of combinations of x1^{α1} x2^{α2} · · · xn^{αn} u1^{β1} u2^{β2} · · · unu^{βnu} with α1, . . . , αn, β1, . . . , βnu ∈ N and Σj αj + Σi βi ≤ d. Herein, d ∈ N is called the nonlinear degree and has to be chosen by the user. The n state equations and ny output equations of a linear state-space system are extended by adding a polynomial to every equation. The major advantage of the PNLSS model is its capability of describing a very large class of nonlinear systems, such as bilinear models, affine models, nonlinear models with nonlinearities only in the states, nonlinear models with nonlinearities only in the input and certain block-structured nonlinear models (Wiener, Hammerstein, Wiener–Hammerstein and nonlinear feedback) (Paduart, 2008). In this reference, the model has been successfully used on several application examples. Consequently, it can be stated that the PNLSS model (2) is a generic ‘‘all-purpose’’ black-box model (although its approximation capabilities are quite large, nonsmooth state evolutions, for example, cannot be adequately represented). One drawback is that, in practice, when a full parameterisation with a high nonlinear degree is used, the number of parameters grows very large. Current research focuses on reducing the number of parameters by means of similarity transforms.
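To make the structure concrete, the following minimal sketch (ours, not the authors' code) simulates a SISO PNLSS model (2); the monomial generators zeta and eta are hypothetical user-supplied helpers, since the monomial set is a modelling choice.

```python
import numpy as np

def simulate_pnlss(A, B, C, D, E, F, zeta, eta, u, x0=None):
    """Simulate the SISO PNLSS model (2) for an input sequence u.

    zeta(x, u) and eta(x, u) return the monomial vectors of the state
    and output equations; choosing them is part of the model design.
    """
    n = A.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    y = np.empty(len(u))
    for t, ut in enumerate(u):
        y[t] = C @ x + D * ut + F @ eta(x, ut)   # output equation
        x = A @ x + B * ut + E @ zeta(x, ut)     # state equation
    return y

# Example monomial sets for order n = 2 and degree d = 2 (states only):
# zeta = lambda x, u: np.array([x[0]**2, x[0]*x[1], x[1]**2])
# eta  = lambda x, u: np.array([x[0]**2])
```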

2.2. Parameterization

Despite the ease with which the state-space model structure can handle MIMO systems, we will restrict ourselves without loss of generality to single-input single-output (SISO) systems (nu = ny = 1) in order to focus on the main topic of this paper.

Define θ ∈ R^{nθ} as a vector containing all the model parameters:

θ^T = [vec(A)^T  B^T  C  D  vec(E)^T  F]    (3)

with vec an operator that stacks the columns of a matrix onto each other. Since all model parameters are included, the model is overparameterised. This is a consequence of similarity transforms on the states that do not influence the input–output behaviour. Both linear and nonlinear transforms can exist. This problem is taken care of in methods A and B, respectively by means of a pseudo-inverse and a kind of Levenberg–Marquardt term. Other approaches exist, such as the use of canonical forms or Data Driven Local Coordinates (DDLC). The latter approach avoids the numerical ill-conditioning of the estimation problem in the case of a canonical parameterisation (McKelvey, Helmersson, & Ribarits, 2004). The DDLC approach has in fact been proven to be equivalent to the pseudo-inverse (as is used in method A) (Wills & Ninness, 2008). The advantage of the pseudo-inverse is that it can easily be implemented, while the DDLC method is up to now only feasible for linear, bilinear or LPV state-space models. The disadvantage is that all parameters need to be identified, although this is less important for nonlinear models with many parameters, since the relative gain is then small. It has been shown (Pintelon, Schoukens, McKelvey, & Rolain, 1996) that the choice of parameterisation does not affect the stochastic properties (i.e. the minimum variance bounds).
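As an illustration of (3), a possible numpy packing of the parameter vector is sketched below; the helper name pack_theta is ours, and ravel(order='F') implements the column-stacking vec operator.

```python
import numpy as np

def pack_theta(A, B, C, D, E, F):
    """Stack all model parameters into one vector theta, as in (3)."""
    vec = lambda M: np.asarray(M, dtype=float).ravel(order='F')  # column stacking
    return np.concatenate([vec(A), vec(B), vec(C),
                           np.atleast_1d(float(D)), vec(E), vec(F)])
```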

2.3. Stochastic framework

The input is assumed to be known exactly (without noise). If the model is capable of describing the system, the output measurements ym are related to the system output y(t, θ0):

ym(t) = y(t, θ0) + v(t)    (4)

with θ0 the true parameter values and v(t) the output measurement noise, which is here for simplicity assumed to be white Gaussian, zero mean and with finite variance.

Under these assumptions, the least-squares estimator corresponds to the maximum-likelihood estimator, which is asymptotically consistent, efficient and normally distributed (Kendall & Stuart, 1979). The noise condition can be relaxed to filtered white noise with existing second and fourth order moments.

3. Initial estimates

In both cases, the initial estimates for the linear parameters (A, B, C, D) are found by a two-step procedure (Paduart, 2008).

We choose to use a frequency domain approach. This has two advantages: bounded initial estimates can also be obtained for unstable systems, and nonparametric weighting is easier than in the time domain. This robustifies the method significantly.

First, the Best Linear Approximation (BLA) of the system is estimated (Pintelon & Schoukens, 2001):

G_BLA(k) = S_YU(k) / S_UU(k)    (5)

with G_BLA the estimated frequency response function, k the frequency line, S_YU the estimated cross-power spectrum between output and input and S_UU the estimated auto-power spectrum of the input. The BLA minimizes the output error in least-squares sense. The variance σ²_GBLA(k) can also be estimated to enhance the second step by using a weighted least-squares method.

This first step offers a number of advantages: the signal-to-noise ratio (SNR) is enhanced, the user can select – in a straightforward way – a frequency band of interest and, when periodic data are available, the measurement noise and the effect of the nonlinear behaviour can be separated. If the total variance lies close to the noise variance (i.e. the nonlinear variance is small), a linear model is sufficient; otherwise, a nonlinear model is needed.

The second step is to convert this nonparametric model into a linear parametric state-space model using the Frequency Domain Subspace identification method (McKelvey, Akçay, & Ljung, 1996; Pintelon, 2002).
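A minimal sketch of the BLA estimate (5) for periodic data, assuming one steady-state period per realization is available in the rows of u and y (the averaging scheme is simplified compared to Pintelon & Schoukens, 2001):

```python
import numpy as np

def estimate_bla(u, y):
    """Nonparametric BLA (5) from R realizations; u, y have shape (R, N)."""
    U = np.fft.fft(u, axis=1)
    Y = np.fft.fft(y, axis=1)
    S_YU = np.mean(Y * np.conj(U), axis=0)  # cross-power spectrum estimate
    S_UU = np.mean(U * np.conj(U), axis=0)  # auto-power spectrum estimate
    return S_YU / S_UU                      # G_BLA(k) at every frequency line
```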


The nonlinear parameter matrices (E and F) are initialized to zero. It is assumed that the nonlinear behaviour is not too strong, such that this initial point belongs to the domain of attraction of the iterative algorithm. If this is not the case, nonlinear initialisation is also possible.

4. Nonlinear optimization

In this section, two nonlinear optimization methods are discussed. The first one (A) is treated in less detail than the second one (B), because the former has already been described extensively in Paduart (2008).

4.1. Cost function

We define the measured output as

ym^T = [ym(0) · · · ym(N − 1)]    (6)

and the modelled output as

y(θ)^T = [y(0, θ) · · · y(N − 1, θ)]    (7)

with N the number of time samples. Both methods minimize the same least-squares cost function:

V(θ) = ε(θ)^T ε(θ)    (8)

with ε(θ) = y(θ) − ym.

Methods A and B differ in three respects: the choice of the free variables, the absence or presence of constraints, and the way in which the output is calculated. In method A, the free variables are only the model parameters, whereas in method B, both the model parameters and the states are free variables. The calculation of the modelled output is different because in method B, the modelled states are used instead of the state equations.

4.2. Method A: (unconstrained) Levenberg–Marquardt

The cost function (8) of the least-squares problem is nonlinear in the parameters. Using a nonlinear optimization method, such as the Levenberg–Marquardt method (Fletcher, 1991), a (local) minimum

θ̂ = arg min_θ V(θ)    (9)

is found in an iterative way:

(J^T J + λ²_LM I_{nθ×nθ}) δθ = −J^T ε    (10)

with λ_LM ∈ R+ the Levenberg–Marquardt factor and J the Jacobian matrix:

J = ∂ε/∂θ.    (11)

The parameters are updated by adding δθ to the previous value of θ. The parameter update δθ is calculated via a singular value decomposition (SVD) of the Jacobian matrix J = UJ ΣJ VJ^T (Paduart, 2008):

δθ = −VJ (ΣJ² + λ²_LM I_{nθ×nθ})^{−1} ΣJ UJ^T ε.    (12)

Doing so reduces the numerical errors compared to solving (10) directly, where the explicit formation of the product J^T J would deteriorate the numerical conditioning (Pintelon & Schoukens, 2001).
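In code, one iteration of (12) could look as follows (a sketch, not the authors' implementation); the SVD-based form avoids forming J^T J explicitly:

```python
import numpy as np

def lm_step(J, eps, lam_lm):
    """Levenberg-Marquardt update (12) via the thin SVD J = U @ diag(s) @ Vt."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    d = s / (s**2 + lam_lm**2)        # diagonal of (Sigma^2 + lam^2 I)^-1 Sigma
    return -Vt.T @ (d * (U.T @ eps))  # delta_theta
```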

4.3. Method B: constrained Levenberg–Marquardt (CLM)

The second method is inspired by the so-called ‘‘simultaneous approach’’ to parameter estimation in differential equations (Bock, 1983, 1987; Bock, Lohmann, & Schlöder, 1992), which we shall call the ‘‘constrained Levenberg–Marquardt’’ method.

The optimization variables ϑ ∈ R^{nϑ} are the states (at all time instants) and the parameters of the chosen model:

ϑ^T = [x^T(0) · · · x^T(N − 1)  θ^T].    (13)

In this case, (7) and (8) should be interpreted as functions of ϑ instead of θ.

In the CLM method, the cost (or objective) function (8) is minimized subject to the constraint function F(ϑ) = 0:

ϑ̂ = arg min_{ϑ, s.t. F(ϑ)=0} V(ϑ)    (14)

with

F = [ f(x(0), u(0)) − x(1)
      f(x(1), u(1)) − x(2)
      ...
      f(x(N − 1), u(N − 1)) − x(0) ].    (15)

This constraint function is built up from the differences between both sides of the state equation (1), evaluated at the states from (13). The last constraint equation imposes periodicity. For aperiodic signals, this equation should be deleted.
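A direct transcription of (15), assuming the state trajectory is stored row-wise (our layout choice):

```python
import numpy as np

def constraint_F(x, u, f, periodic=True):
    """Constraint residuals (15); x has shape (N, n), u has shape (N,).
    f(x_t, u_t) is the state equation of the current model."""
    N = len(u)
    rows = [f(x[t], u[t]) - x[t + 1] for t in range(N - 1)]
    if periodic:
        rows.append(f(x[N - 1], u[N - 1]) - x[0])  # periodicity constraint
    return np.concatenate(rows)
```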

4.3.1. Initial estimates for ϑ

The initial estimates of the parameter values θ are determined as in Section 3.

For periodic signals, the initial values of the states can be calculated in the time domain by using the state equation of (2), with u(t) the known input and A, B (and E = 0) parameters in θ. Transient effects can be eliminated by simply simulating several periods (starting from x(0) = 0) and retaining only the last period. A better alternative in the time domain is to estimate the initial state x(0) by adding an artificial input to the model (Paduart, 2008) and then to proceed as described above to obtain the states at the other time instants. Another option is to calculate the states in the frequency domain (z-domain) via:

X(zk) = (zk I − A)^{−1} B U(zk)    (16)

with zk = e^{j2πk/N} and X and U the discrete Fourier transforms of the time domain signals x and u. This provides bounded initial estimates, even in the case of unstable models (when bounded input–output measurements are available, e.g. by means of a stabilizing feedback). Note that there is one very weak assumption: (zk I − A) should be a regular matrix ∀zk.

For aperiodic signals, the time-domain implementation that starts from an estimate of x(0) can be used for stable models. However, this method does not work in the unstable case.

We choose to use the frequency domain approach for both periodic and aperiodic signals in this paper. For periodic signals, the resulting estimates are exact, but for aperiodic signals, we assume that the transient errors will be corrected in the iterations of the constrained minimization. The transient errors will not be important, as they are known to vanish as O(N^{−1/2}) relative to the dominant contribution (Pintelon & Schoukens, 2001).
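The frequency-domain initialisation (16) can be sketched as follows (exact for periodic signals; for aperiodic signals it leaves a transient error, as discussed above):

```python
import numpy as np

def initial_states_freq(A, B, u):
    """Initial states via (16): X(z_k) = (z_k I - A)^{-1} B U(z_k).
    Bounded even for unstable A, as long as z_k I - A is regular."""
    n, N = A.shape[0], len(u)
    U = np.fft.fft(u)
    z = np.exp(2j * np.pi * np.arange(N) / N)
    X = np.empty((n, N), dtype=complex)
    for k in range(N):
        X[:, k] = np.linalg.solve(z[k] * np.eye(n) - A, B * U[k])
    return np.fft.ifft(X, axis=1).real  # states x(0), ..., x(N-1) columnwise
```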

4.3.2. Iterative procedure

Minimization of the Lagrangian function

The (constrained) minimization is achieved by iteratively updating the variable vector ϑ. This minimization involves looking for a stationary point of the Lagrangian function V(ϑ) + λ^T F(ϑ), with λ the vector of Lagrange multipliers. In the minimization process, a new value of ϑ is calculated by adding δϑ to the current value. If we were to use the constrained Gauss–Newton method (Bock, 1983; Fletcher, 1991), δϑ would be the solution of:

[ 2J1^T J1   J2^T        ] [ δϑ ]       [ 2J1^T ε ]
[ J2         0_{Nst×Nst} ] [ λ  ]  = −  [ F       ]    (17)

where the left-hand matrix is the Karush–Kuhn–Tucker (KKT) matrix (Fletcher, 1991), and J1 ∈ R^{N×nϑ} and J2 ∈ R^{Nst×nϑ} are the Jacobians of the residual vector ε and the constraint function:

J1 = ∂ε/∂ϑ
J2 = ∂F/∂ϑ.    (18)

The Lagrange multipliers in vector λ ∈ R^{Nst} are a byproduct of the calculations and are a measure of the sensitivity of the objective to changes in each constraint. Nst is the total number of states (at all time instants): Nst = Nn. Note that the first Jacobian is different from the one in method A (11). The derivative of the error ε with respect to the model parameters of the state equation is now zero, since the states have become parameters themselves. The consequence is that J1 (like J2 and ε) does not consist of recursive expressions: it is unnecessary to simulate the state equation. Therefore, its computation does not suffer from instabilities of the model at any iteration.

Inversion of the KKT matrix

It might become time-consuming to invert the KKT matrix because it grows (in both dimensions) with the number of data points N. On the other hand, the matrix is very sparse, which allows for the use of more efficient direct inversion techniques available in software packages such as Matlab. Fig. 1 depicts the nonzero elements of the KKT matrix for 100 data points, model order 2 and nonlinear degree d = 3.

Penalty (or merit) function

The constraints are not satisfied during the iterative procedure.

For every new step δϑ, it needs to be determined whether the new solution is better or worse, and consequently whether a step should be taken or not. By using the so-called ‘‘L1 exact penalty function’’, defined as

φ(ϑ) = µ V(ϑ) + ‖F(ϑ)‖1    (19)

with µ > 0, the SQP (Sequential Quadratic Programming) method is turned into an SL1QP method (Fletcher, 1991; Nocedal & Wright, 1999). By means of this penalty function, the trade-off between the cost V and the constraints F is made. The penalty function is exact in the sense that it is locally minimised by the solution to the constrained problem if µ is chosen such that µ < 1/‖λ‖∞. In practice, the penalty function is minimised along the direction indicated by the KKT system of equations (line search). When, at some point during the iteration procedure, the relative changes of the penalty function are less than a user-defined tolerance level, the optimisation can be stopped. It can then be assumed that a local minimum of φ, and hence the solution to the constrained problem, has been reached. This solution is the stationary point of the Lagrangian function, which implies that the constraints are fulfilled.

Several other globalisation strategies can be used to ensure convergence, e.g. natural level functions or the restrictive monotonicity test (Bock, Kostina, & Schlöder, 2000), trust region methods (Conn, Gould, & Toint, 2000), merit functions (Powell, 1978) or filter SQP (Fletcher, Leyffer, & Toint, 2002).
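The penalty evaluation itself is a one-liner; a sketch under the paper's choice µ = 0.9/‖λ‖∞ (in the actual implementation µ is only updated after every successful step, see the implementation details below):

```python
import numpy as np

def penalty_phi(V_val, F_val, lam):
    """L1 exact penalty (19): phi = mu*V + ||F||_1, with mu = 0.9/||lambda||_inf."""
    mu = 0.9 / np.max(np.abs(lam))
    return mu * V_val + np.linalg.norm(F_val, 1)
```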

Avoid singularity

As discussed previously in Section 2.2, similarity transforms are linear or nonlinear transformations of the states.

Fig. 1. Position of the nonzero elements (grey dots) of the KKT matrix.

In our case, they prevent J1 from being a full-rank matrix. Unfortunately, these indeterminations cause the KKT matrix in (17) to be singular. This problem can be circumvented by replacing 2J1^T J1 by 2J1^T J1 + λ²_LM I_{nϑ×nϑ} (addition of a Levenberg–Marquardt parameter):

[ 2J1^T J1 + λ²_LM I_{nϑ×nϑ}   J2^T        ] [ δϑ ]       [ 2J1^T ε ]
[ J2                           0_{Nst×Nst} ] [ λ  ]  = −  [ F       ]    (20)

where the left-hand matrix is denoted KKT_LM.
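A sparse solve of (20) might look as follows (a sketch using scipy's general-purpose sparse solver; the paper notes that dedicated sparse solvers could do better):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def clm_step(J1, J2, eps, F_val, lam_lm):
    """Solve the regularised KKT system (20) for (delta_vartheta, lambda)."""
    n_var, n_con = J1.shape[1], J2.shape[0]
    H = 2 * (J1.T @ J1) + lam_lm**2 * sparse.eye(n_var)
    KKT = sparse.bmat([[H, J2.T], [J2, None]], format='csc')  # None = zero block
    rhs = -np.concatenate([2 * J1.T @ eps, F_val])
    sol = spsolve(KKT, rhs)
    return sol[:n_var], sol[n_var:]
```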

Since the left upper part of KKT_LM and its inverse are now positive definite and bounded, and J2 has full rank, the SL1QP method converges (Han, 1977). Also, Fletcher (1991) and Nocedal and Wright (1999) report its good performance in combination with a second order correction step.

Some implementation details

In our implementation, the penalty parameter µ is chosen to be 0.9/‖λ‖∞ and is updated after every successful step. This choice of µ guarantees that the condition for the exactness of the penalty function (µ < 1/‖λ‖∞) is always fulfilled. We also implemented a Second Order Correction (SOC), which is not used in every iteration step, but only when a normal SQP step could not provide a decrease of the merit function. The SOC step should prevent the method from suffering from the Maratos effect (Maratos, 1978; Nocedal & Wright, 1999), where convergence is slowed down because certain seemingly bad steps (for which φ increases) are not accepted, while they are actually beneficial to the convergence.

In the constrained framework, contrary to the unconstrained Levenberg–Marquardt method, convergence is no longer guaranteed by just adding a sufficiently high λ_LM (Fletcher, 1991). The convergence speed is also influenced by the choice of λ_LM. Choosing this term as small as possible is not a good idea: it sometimes results in a bad step because the matrix tends more towards singularity. On the other hand, a large term can result in slow convergence. Therefore, the regularisation term is altered during the iterations. When a step is successful, a smaller Levenberg–Marquardt parameter may be sufficient, so it is diminished (for instance by a factor 1.2). When a step is not successful, or when the line search resulted in a very small step while the added term is not very large, the Levenberg–Marquardt parameter is increased (for instance by a factor 10). This approach is inspired by the adaptation of the step length in gradient descent methods and the adaptation of the Levenberg–Marquardt term in a Levenberg–Marquardt method.

Why not use an SVD?

In a normal situation, one would immediately consider using the pseudo-inverse, but even for sparse matrices, singular value decompositions (SVD) require a huge amount of computation time. The number of flops required for the SVD of a full m×m matrix increases as O(m³) (Golub & Van Loan, 1989). Applied to the KKT matrix, with m = Nst + nϑ, this corresponds to an O(N³) increase for n, nθ ≪ N.


Fig. 2. Convergence curve: method A (crosses) and method B (circles).

Fig. 3. Constraint versus iteration number: method B.


5. Simulations

In this paper, two cases are used to illustrate the advantages of method B. The first simulation shows that it can cross unstable regions during the optimisation; the second simulation exemplifies its use for the modelling of unstable systems. Advantages of method A, on the other hand, are a higher computation speed and lower memory requirements. In general, this will render method A preferable, but in some cases one cannot solve the optimisation problem with method A while method B works as usual.

5.1. Simulation 1

The idea is to start with a known PNLSS model structure so that – for the right model settings – the cost function should converge to zero (or computer precision) in the noiseless case and to the noise level in the noisy case. The constraint function should also converge to computer precision, even in the noisy case. In general, the results of methods A and B are comparable, but nevertheless, some cases could be found where, although starting at the same initial estimate, method A could not converge while method B had no difficulties. The converse is not true. One such result is shown in Fig. 2, in which the convergence curves of methods A and B are plotted (cost versus iteration number) in a noiseless case. The selected model structure is of order n = 2 and has nonlinear degree d = 3. The input is aperiodic (with initial state zero) and Gaussian with zero mean. The evolution of the constraint function of method B is depicted in Fig. 3. For method A, this has no use: the constraint is always fulfilled. At every iteration, a trade-off is made between cost and constraint. The measure of this trade-off is determined by the Lagrange multipliers λ (Section 4.3.2) and by the value of µ in the L1 exact penalty function φ (19). Even when the constraints are fulfilled in the beginning, they will in general become nonzero after the first iteration if the constraints are nonlinear in the parameters (as is the case here). The difficulty in identifying the chosen model can be explained by its ‘nearly unstable’ nature: the poles of the

Fig. 4. Convergence curve for unstable system: method A (crosses) and method B (circles).

underlying linear system lie close to the unit circle (at radius 0.967) and the intermediate estimated models are often unstable. During convergence, method B manages to pass through an unstable region, as can be seen by looking at the simulated model output at certain iterations. This simulated output is not calculated with the state-parameters in ϑ, but with the entire state-space model (2), feeding the states back (via the state equation) and hence causing the problem. Method A is not capable of passing through unstable regions (as will be explained in Section 5.3) and gets stuck at their border.

5.2. Simulation 2

We now give an example of a third order unstable, nonlinear system that can be identified well only with method B:

[ x1(t + 1) ]       [ x1(t) ]              [  1.2 x1(t)² ]
[ x2(t + 1) ]  = A  [ x2(t) ]  + B u(t) +  [  0.3 x2(t)³ ]
[ x3(t + 1) ]       [ x3(t) ]              [ −0.1 x3(t)³ ]

y(t) = x1(t)    (21)

with

     [ 1.5  0    0   ]        [  1   ]
A =  [ 0.5  0.4  0.3 ]   B =  [ −0.2 ].    (22)
     [ 0.3  0.8 −0.6 ]        [  0.3 ]

The unstable system is placed inside a feedback loop, such that a feedback term w is added to the (Gaussian and aperiodic) reference signal r:

u(t) = r(t) + w(t)    (23)

with

w(t) = −1.5 y(t) − 1.2 y(t)².    (24)
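For reference, the simulation setup (21)–(24) can be reproduced as follows (the output stays bounded for the RMS-0.2 input used in the paper, but this is not guaranteed for arbitrary inputs):

```python
import numpy as np

def simulate_closed_loop(r):
    """Simulate the unstable system (21)-(22) in the feedback loop (23)-(24)."""
    A = np.array([[1.5, 0.0,  0.0],
                  [0.5, 0.4,  0.3],
                  [0.3, 0.8, -0.6]])
    B = np.array([1.0, -0.2, 0.3])
    x = np.zeros(3)
    u, y = np.empty(len(r)), np.empty(len(r))
    for t, rt in enumerate(r):
        y[t] = x[0]                                # output equation
        u[t] = rt - 1.5 * y[t] - 1.2 * y[t]**2     # u = r + w, (23)-(24)
        nl = np.array([1.2 * x[0]**2, 0.3 * x[1]**3, -0.1 * x[2]**3])
        x = A @ x + B * u[t] + nl                  # state equation (21)
    return u, y

# e.g. r = 0.2 * np.random.randn(4096); u, y = simulate_closed_loop(r)
```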

This feedback loop does not provide stabilization for any arbitrary input, but the output realisation was bounded for an input signal with root mean square (RMS) value 0.2, while the system is clearly unstable without the feedback terms. This example is thus suitable for demonstrating that method B can handle unstable systems.

The subspace method provides an initial estimate of the state-space parameters of the BLA. The initial states are obtained with (16). Since this initial linear model is unstable, method A cannot even start optimizing. It is, however, possible to stabilize the model, for instance by reflecting the poles into the unit circle. Of course, this will in general create an additional (bias) error on the estimate. Nevertheless, this idea is used to obtain starting values for method A. The convergence curves of both methods are shown in Fig. 4. The evolution of the constraint function of method B is depicted in Fig. 5.

In this example, the initial linear estimates are bad approximations of the true system. The RMS values of the output errors are respectively 0.17 and 0.06 for methods A and B, while the RMS value of the output itself is 0.2. In method B, the first successful step was made after 8 iterations. This is because the initial estimate of λ was chosen too small.


Fig. 5. Constraint versus iteration number for unstable system: method B.

Other examples that are not displayed here show that even when improved initial estimates are used (such as an initial output error of about 0.1), method A still fails to converge, while method B finds the solution.

5.3. Discussion

The problem with method A is that it calculates the output by means of the entire state-space model (2). If the system is unstable, the states and consequently the output (and output error) will grow very (even infinitely) large. In method B, the states are considered to be model parameters, and the output (and output error) is calculated by means of the output equation only. If the initial state estimates are bounded, so will be the output. In other words, method A suffers from the ill-posedness of the simulation problem for an unstable system, while method B does not.

6. Conclusion

The results of constrained and unconstrained optimization with the same settings were very comparable. Nevertheless, they differ in some respects:

• When using off-the-shelf sparse direct solvers, the constrained optimization cannot handle very large amounts of data points, because of an increased computation time and memory limitations. In this respect, dedicated sparse solvers could be used to improve the implementation performance.
• The constrained optimization proved to be robust in the case of nearly unstable nonlinear systems with a stable underlying linear part, while the unconstrained optimization sometimes failed. The constrained optimization was able to identify an unstable nonlinear system within a control loop that generates bounded input–output values, unlike the unconstrained optimization, which simply breaks down on the evaluation of the model output and gets stuck at the stability border. Moreover, the constrained method offers the advantage of being able to start with an unstable initial model.

Further research will focus on combining the positive aspects of both methods by gradually moving from a constrained optimization with few data points and all constraint equations towards the unconstrained method with many data points. That way, the optimization process can cross unstable regions and the variance of the results will be smaller because, in the end, the entire data set can be used.

Acknowledgements

This research was supported by the Methusalem grant of the Flemish Government (METH-1), the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian State, Prime Minister's Office, Science Policy programming (IUAP VI/4 - Dysco), the research council of the Vrije Universiteit Brussel (OZR), the Fund for Scientific Research (FWO - Vlaanderen), KUL EF/05/006 (OPTEC), IOF-SCORES4CHEM, FWO G.0320.08 (convex MPC), G.0558.08 (Robust MHE), ICCoS, EU FP7 223854 (HD-MPC), Helmholtz viCERP and Comet ACCM.

References

Bock, H. G. (1983). Recent advances in parameter identification techniques for ODE. Boston: Birkhäuser.

Bock, H. G. (1987). Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen. Vol. 183 of Bonner Mathematische Schriften. Bonn: Universität Bonn.

Bock, H. G., Kostina, E. A., & Schlöder, J. P. (2000). On the role of natural level functions to achieve global convergence for damped Newton methods. In M. Powell, et al. (Eds.), System modelling and optimization. Methods, theory and applications (pp. 51–74). Kluwer.

Bock, H. G., Lohmann, T., & Schlöder, J. P. (1992). Numerical methods for parameter estimation and optimal experimental design in chemical reaction systems. Industrial and Engineering Chemistry Research, 31, 54–57.

Conn, A., Gould, N., & Toint, P. L. (2000). Trust-region methods. Philadelphia, USA: SIAM.

Fletcher, R. (1991). Practical methods of optimization (2nd ed.). New York: John Wiley and Sons.

Fletcher, R., Leyffer, S., & Toint, P. L. (2002). On the global convergence of a filter SQP algorithm. SIAM Journal on Optimization, 13(1), 44–59.

Giannakis, G. B., & Serpedin, E. (2001). A bibliography on nonlinear system identification. Signal Processing, 81(3), 533–580.

Golub, G. H., & Van Loan, C. F. (1989). Matrix computations (2nd ed.). Baltimore: The Johns Hopkins University Press.

Han, S. P. (1977). A globally convergent method for nonlinear programming. Journal of Optimization Theory and Applications, 22, 297–309.

Kendall, M., & Stuart, A. (1979). The advanced theory of statistics. Vol. 2: inference and relationship (4th ed.). London: Griffin.

Maratos, N. (1978). Exact penalty function algorithms for finite dimensional and control optimization problems. Ph.D. thesis. Imperial College of Science and Technology, London.

McKelvey, T., Helmersson, A., & Ribarits, T. (2004). Data driven local coordinates for multivariable linear systems and their application to system identification. Automatica, 40, 1629–1635.

McKelvey, T., Akçay, H., & Ljung, L. (1996). Subspace-based multivariable system identification from frequency response data. IEEE Transactions on Automatic Control, 41(7), 960–979.

Nocedal, J., & Wright, S. J. (1999). Numerical optimization. New York: Springer-Verlag.

Paduart, J. (2008). Identification of nonlinear systems using Polynomial Nonlinear State Space models. Ph.D. thesis. Vrije Universiteit Brussel.

Paduart, J., Lauwers, L., Pintelon, R., & Schoukens, J. (2009). Identification of a Wiener–Hammerstein system using the Polynomial Nonlinear State Space approach. In 15th IFAC symposium on system identification, SYSID 2009 (pp. 1080–1085).

Pintelon, R. (2002). Frequency-domain subspace system identification using non-parametric noise models. Automatica, 38, 1295–1311.

Pintelon, R., & Schoukens, J. (2001). System identification: a frequency domain approach. New Jersey: IEEE Press.

Pintelon, R., Schoukens, J., McKelvey, T., & Rolain, Y. (1996). Minimum variance bounds for overparameterized models. IEEE Transactions on Automatic Control, 41(5), 719–720.

Powell, M. J. D. (1978). A fast algorithm for nonlinearly constrained optimization calculations. In G. A. Watson (Ed.), Lecture notes in mathematics: Vol. 630. Numerical analysis, Dundee 1977. Berlin: Springer.

Schoukens, J., Suykens, J., & Ljung, L. (2009). Wiener–Hammerstein benchmark. In 15th IFAC symposium on system identification, SYSID 2009.

Van Mulders, A., Volckaert, M., Diehl, M., & Schoukens, J. (2009). Two nonlinear optimization methods for black box identification compared. In 15th IFAC symposium on system identification, SYSID 2009 (pp. 1086–1091).

Wills, A. G., & Ninness, B. (2008). On gradient-based search for multivariable system estimates. IEEE Transactions on Automatic Control, 53(1), 298–306.

Anne Van Mulders received the degree of mechanical engineer in 2007 from the Vrije Universiteit Brussel (VUB), Brussels, Belgium. She is currently a Ph.D. student at the VUB at the Electrical Measurement Department (ELEC). Her main research interests are in the field of nonlinear system identification.

Johan Schoukens received the degree of engineer in 1980 and the degree of doctor in applied sciences in 1985, all from the Vrije Universiteit Brussel (VUB). He is presently a professor at the VUB. The prime factors of his research are in the field of system identification for linear and nonlinear systems, and growing tomatoes and melons in his greenhouse.


Marnix Volckaert received the degree of aerospace engineer in 2007 from the Technische Universiteit Delft, Delft, The Netherlands. He is currently a Ph.D. student at the Katholieke Universiteit Leuven (K.U.Leuven), Leuven, Belgium, at the division of Production engineering, Machine design and Automation, at the Department of Mechanical Engineering. His main research interests are in the field of iterative learning control and nonlinear systems.

Moritz Diehl is an associate professor at Katholieke Universiteit Leuven and principal investigator of K.U.Leuven's Optimization in Engineering Center OPTEC. He obtained his Ph.D. in numerical mathematics at the University of Heidelberg in 2001. He works on structure-exploiting optimization methods for nonlinear and convex optimization in engineering applications. The major focus of his work is fast real-time optimization methods for estimation and control.