
INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING
Int. J. Adapt. Control Signal Process. 2009; 23:547–566
Published online 18 June 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/acs.1051

Adaptive observers for linear stochastic time-variant systems with disturbances

Stefano Perabò∗,† and Qinghua Zhang

IRISA, INRIA Rennes, Campus Universitaire de Beaulieu, Avenue du General Leclerc, 35042 Rennes Cedex, France

SUMMARY

Motivated by fault detection and isolation problems, we present an approach to the design of state observers for linear time-variant stochastic systems with unknown parameters and disturbances. The novelties with respect to more conventional techniques are as follows: (a) the joint estimation of state, disturbances and parameters can be carried out; (b) it is a fully stochastic approach: the unknown parameters and disturbances are random quantities and prior information, in terms of means and covariances, can be easily taken into account; (c) the observer structure is not fixed a priori, but rather derived from the optimal one by means of a sliding window approximation; (d) contrary to descriptor system techniques, which estimate the state starting from a restricted set of disturbance-free equations, our approach is focused on disturbance estimation, from which state estimates are derived straightforwardly. Copyright © 2008 John Wiley & Sons, Ltd.

Received 23 January 2007; Revised 24 October 2007; Accepted 18 April 2008

KEY WORDS: adaptive observers; unknown input observers; fault detection; linear stochastic systems

1. INTRODUCTION

The following discrete time linear stochastic system is considered in this paper:

$$x_{k+1} = A_k x_k + B_k u_k + \Psi_k p + E_k d_k + w_k \quad (1a)$$

$$y_k = C_k x_k + v_k \quad (1b)$$

for $k\ge 0$, with $A_k\in\mathbb{R}^{n\times n}$, $B_k\in\mathbb{R}^{n\times m}$, $\Psi_k\in\mathbb{R}^{n\times q}$, $E_k\in\mathbb{R}^{n\times f}$ and $C_k\in\mathbb{R}^{l\times n}$ known time-variant matrices. As usual, the vector sequences $\{x_k\}$, $\{u_k\}$ and $\{y_k\}$ denote, respectively, the state, input and output stochastic processes. The sequences $\{w_k\}$ and $\{v_k\}$ are assumed to be zero mean, white and uncorrelated wide-sense stochastic processes, with $E[w_k w_k^T] = Q_k$ and $E[v_k v_k^T] = R_k \succ O$ (positive definite), where $E[\cdot]$ denotes the mathematical expectation operator.

∗Correspondence to: Stefano Perabò, IRISA, INRIA Rennes, Campus Universitaire de Beaulieu, Avenue du General Leclerc, 35042 Rennes Cedex, France.

†E-mail: [email protected], [email protected]


The initial condition $x_0$ has known mean $E[x_0]=\bar{x}_0$ and covariance $E[(x_0-\bar{x}_0)(x_0-\bar{x}_0)^T]=P_0$. Both the initial condition $x_0$ and the input process $\{u_k\}$ are assumed uncorrelated with the noise sequences.

The term $E_k d_k$ accounts for unknown disturbances acting on the system, where the sequence $\{d_k\}$ is an unknown (and uncontrolled) input modeled as a wide-sense stochastic process, not necessarily stationary. Stationary signals having a prescribed rational power spectrum, $\Phi_d(z)$, can then be viewed as a special case. The disturbances are further assumed uncorrelated with the initial state, the noise and the input processes, respectively, $x_0$, $\{w_k\}$, $\{v_k\}$ and $\{u_k\}$.

Finally, the term $\Psi_k p$ accounts for constant parameters that need to be estimated online, or for faults, such as some kinds of actuator faults (for an example see [1]). Here $p$ is a random variable uncorrelated with the noise, input and disturbance processes.

The problem to be solved is the following: find for each $N\ge 0$ the minimum variance linear estimator of the state $x_N$ given the past input and output sequences (say $u_0^N=\{u_j : j=0,1,\ldots,N\}$ and similarly $y_0^N$), in other words the optimal filter, denoted by $\hat{x}_{N|N}$, together with conditions guaranteeing the uniqueness of the corresponding estimate (by convention, italic characters denote samples of the corresponding random variables, which are instead denoted by upright characters). Observe that we seek the state estimator despite the presence of the unknown random parameters $p$ and disturbances $\{d_k\}$. When these are absent, the solution of the problem leads to the well-known Kalman filter.

The following two related problems will also be discussed in this paper. First, how to recursively and reliably compute the estimates $\hat{x}_{N|N}$ once sample paths (measurements) of the input and output processes, $u_0^N$ and $y_0^N$, become available. Second, how to weaken the uniqueness conditions by considering smoothed estimators $\hat{x}_{N|N+D}$, for some appropriate delay $D$. The importance of smoothing in the considered context will be discussed later in Section 3.2.2.

When there are no disturbances, i.e. $d_k\equiv 0$ for each $k$, the solutions of the stated problem are known as adaptive observers, or else adaptive filters, for the class of linear time-variant stochastic systems described in (1). In the complementary case, i.e. when $p\equiv 0$, they are called unknown input observers. Both these subproblems have been studied extensively in the system theory literature: adaptive observers have been considered, for example, in [2, 3], whereas the existing solutions for discrete time stochastic systems with unknown inputs can be roughly classified into two main approaches. The first is based on a fixed observer structure, for example, [4–7], chosen a priori and tuned in order to obtain unbiased minimum error variance estimates. Usually these methods work under the conditions guaranteeing the existence of the filtered estimates. The second is based on state estimation techniques for descriptor systems, for example, [8–11]. A singular disturbance-free state equation is obtained for each $k$ by multiplying on the left the state transition equation by a (usually rectangular) matrix $L_{k+1}$ whose null space contains the range of $E_k$. The state sequence is then usually estimated by (possibly recursive) maximum likelihood techniques under the Gaussian noise assumption (see, however, [12] for a generalization).

With respect to these methods, the one presented here is novel regarding the following points. First, it can handle both unmeasured disturbances and unknown parameters: these are modeled as random quantities and prior information, in terms of means, covariances, spectral densities, etc., can be easily taken into account. The Gaussian assumption is not invoked; rather, the optimal solution of the state estimation problem is sought within the restricted class of linear estimators. Only the first two moments of the random quantities involved are required, not their complete probability description.


Second, contrary to descriptor system techniques that try to annihilate the disturbances, the focus here is on their estimation, which is important for fault detection and isolation purposes. In fact it will be shown that some sufficient conditions for the existence of optimal filtered and/or smoothed-state estimates are equivalent to the conditions guaranteeing the existence and uniqueness of the disturbance and/or parameter estimates, from which state estimates can be derived straightforwardly.

Finally, the observer structure is not fixed, but rather derived from the optimal one by means of a recursive sliding window approximation. The user is required to balance the computational complexity and the estimation accuracy by choosing the window size.

The paper is organized as follows: Section 2 will review the basic results of linear estimation theory. Sections 3 and 4 will present the method in detail, focusing, respectively, on parameter and disturbance estimation, and on state estimation in general time-variant systems. A comparison of the proposed method with an existing approach, the parity space approach, is presented in Section 5. An application and the conclusions will then follow.

2. PRELIMINARIES ON LINEAR FILTERING THEORY

This section summarizes some basic results of linear estimation theory needed in the remainder of the paper. More details can be found in [13, Chapters 2–5; 14, Chapters 3–9].

The notations $\mu_x = E[x]$ and $\Sigma_x = E[(x-\mu_x)(x-\mu_x)^T]$ are used to denote the mean and the covariance matrix of a random vector $x$. Given two random vectors $x$ and $y$, the minimum variance unbiased linear estimator of $x$ given $y$ is
$$\hat{x} = \mu_x + K(y-\mu_y) \quad (2a)$$
where $K$ is any solution of $K\Sigma_y = \Sigma_{xy}$ and $\Sigma_{xy} = E[(x-\mu_x)(y-\mu_y)^T]$. The unambiguous notation $\hat{E}[x|y]$ will also be used. If $\Sigma_y$ is full rank, then the covariance matrix $\Sigma_{\tilde{x}}$ of the zero mean error $\tilde{x} = x - \hat{x}$ is
$$\Sigma_{\tilde{x}} = \Sigma_x - \Sigma_{xy}\Sigma_y^{-1}\Sigma_{yx} \quad (2b)$$

A linear subspace $\mathcal{Y}$ spanned by the set $\{y_j : j=0,\ldots,k\}$ will be denoted by $\mathcal{L}\{y_0^k\}$. With respect to the outer product $\langle x, y\rangle = E[(x-\mu_x)(y-\mu_y)^T]$, the minimum variance unbiased estimator $\hat{E}[x|\mathcal{Y}] = \hat{E}[x|y_0^k]$ can be considered as the orthogonal projection of the random vector $x$ onto $\mathcal{L}\{y_0^k\}$. Obviously, $\hat{E}[\cdot]$ is a linear operator and, moreover,
$$\mathcal{Y}\subset\mathcal{Z} \;\Rightarrow\; \hat{E}[\hat{E}[x|\mathcal{Z}]\,|\,\mathcal{Y}] = \hat{E}[x|\mathcal{Y}] \quad (3)$$

Assume now that in (1) $d_k\equiv 0$ for each $k$, and $p\equiv 0$. Recursive equations for the state predictor $\hat{x}_{k+1|k} = \hat{E}[x_{k+1}|y_0^k, u_0^k]$, the so-called innovation representation of system model (1) (Kalman filter), were first derived in [15]:
$$\hat{x}_{k+1|k} = A_k \hat{x}_{k|k-1} + B_k u_k + K_k e_k \quad (4a)$$
$$y_k = C_k \hat{x}_{k|k-1} + e_k \quad (4b)$$


where

$$K_k = A_k P_{k|k-1} C_k^T \Sigma_k^{-1} \quad (5a)$$
$$P_{k+1|k} = (A_k - K_k C_k) P_{k|k-1} (A_k - K_k C_k)^T + K_k R_k K_k^T + Q_k \quad (5b)$$
$$\Sigma_k = C_k P_{k|k-1} C_k^T + R_k \quad (5c)$$

The positive-definite matrices $\{\Sigma_k\}$ are the covariances of the innovation sequence
$$e_k = y_k - \hat{E}[y_k \,|\, y_0^{k-1}, u_0^{k-1}] \quad (5d)$$

and the recursion is initiated by setting $\hat{x}_{0|-1} = \bar{x}_0$ and its covariance $P_{0|-1} = P_0$. The assumption that the input process is uncorrelated with the initial state and the noises guarantees that there is no feedback from the process $\{y_k\}$ to $\{u_k\}$ (see [16, 17] for details), which is also equivalent to the following condition:

$$E[e_k u_h^T] = O \quad \forall k, h\ge 0 \quad (6)$$

Given any two positive integers $k$ and $h$ such that $k\ge h$, the following transition matrices can be defined recursively:
$$\Phi_h^h = I, \qquad \Phi_h^{k+1} = (A_k - K_k C_k)\Phi_h^k \quad (7)$$

Under the assumption that $(A_k, C_k)$ is uniformly completely observable and $(A_k, Q_k^{1/2})$ is uniformly completely reachable, it can be proved (see [18, p. 240; 19]) that there exist two positive constants $a$ and $b$ such that $\|\Phi_h^k\|_2 = \max_{\|x\|_2=1}\|\Phi_h^k x\|_2 \le a e^{-b(k-h)}$, from which the asymptotic stability of the filter can be established.
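As an illustration, the recursion (4)–(5) can be coded in a few lines. The following is a minimal sketch in Python/NumPy, assuming the time-variant matrices and the data at time $k$ are supplied by the caller (all names are illustrative, not from the paper):

```python
import numpy as np

def kalman_step(A, B, C, Q, R, x_pred, P_pred, u, y):
    """One step of the innovation representation (4)-(5).

    x_pred and P_pred are x_{k|k-1} and P_{k|k-1}; the function returns
    x_{k+1|k}, P_{k+1|k}, the innovation e_k and its covariance Sigma_k.
    """
    Sigma = C @ P_pred @ C.T + R                      # (5c)
    K = A @ P_pred @ C.T @ np.linalg.inv(Sigma)       # (5a)
    e = y - C @ x_pred                                # innovation, cf. (4b)
    x_next = A @ x_pred + B @ u + K @ e               # (4a)
    AKC = A - K @ C
    P_next = AKC @ P_pred @ AKC.T + K @ R @ K.T + Q   # (5b)
    return x_next, P_next, e, Sigma
```

The recursion is started with $\hat{x}_{0|-1}=\bar{x}_0$ and $P_{0|-1}=P_0$, as stated above.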

3. PARAMETERS AND DISTURBANCES ESTIMATION

3.1. Basic equations for estimation

The objective of this section is to derive a set of equations such that the unknown parameters and the disturbance sequence can be estimated from knowledge of the input and output sequences. Pretend for a while that they are known quantities, i.e. as if they were inputs of the system described by (1), and assume the following:

Assumption 1
$(A_k, C_k)$ is uniformly completely observable and $(A_k, Q_k^{1/2})$ is uniformly completely reachable.

Assumption 2
The parameters $p$ and the disturbance sequence $\{d_k\}$ are uncorrelated with the initial state $x_0$ and the noise sequences $\{w_k\}$ and $\{v_k\}$.

Hence, as recalled in Section 2, there is no feedback from the output to the parameters and disturbances and, analogously to the innovation representation (4), the following modified innovation representation can be derived:
$$\hat{x}^*_{k+1|k} = A_k \hat{x}^*_{k|k-1} + B_k u_k + \Psi_k p + E_k d_k + K_k e^*_k \quad (8a)$$
$$y_k = C_k \hat{x}^*_{k|k-1} + e^*_k \quad (8b)$$

the recursion being initiated by setting $\hat{x}^*_{0|-1} = \bar{x}_0$. The main difference between the two representations, emphasized by using the superscript $*$ in (8), is that the 'estimates' $\{\hat{x}^*_{k+1|k}\}$ cannot be computed, because the realizations of $p$ and $\{d_k\}$ are not actually available, whereas the estimates $\{\hat{x}_{k+1|k}\}$ obtained from (4) depend uniquely on the measured $\{y_k\}$ and $\{u_k\}$. Moreover, the innovation sequence and the one-step state predictor in (8) are defined, respectively, as follows:
$$e^*_k = y_k - \hat{E}[y_k \,|\, y_0^{k-1}, u_0^{k-1}, p, d_0^{k-1}] \quad (9)$$
$$\hat{x}^*_{k|k-1} = \hat{E}[x_k \,|\, y_0^{k-1}, u_0^{k-1}, p, d_0^{k-1}] \quad (10)$$

Observe, however, that the error $\tilde{x}^*_{k|k-1} = x_k - \hat{x}^*_{k|k-1}$ has zero mean and its covariance matrix, $P^*_{k|k-1}$, is given recursively by the same Riccati equation (5b) that holds in the disturbance-free case. In addition, the Kalman gain and the covariance of $e^*_k$ are also given by the same expressions reported before ((5a) and (5c), respectively) but, in order to keep the notation simple, no superscript $*$ will be added to $\Sigma_k$ and $K_k$.

Substitution of (8b) into (8a) results in
$$\hat{x}^*_{k+1|k} = (A_k - K_k C_k)\hat{x}^*_{k|k-1} + B_k u_k + \Psi_k p + E_k d_k + K_k y_k \quad (11)$$

By defining recursively the following quantities,
$$\Upsilon_0 = O, \qquad \Upsilon_{k+1} = (A_k - K_k C_k)\Upsilon_k + \Psi_k \quad (12a)$$
$$s_0 = 0, \qquad s_{k+1} = (A_k - K_k C_k)s_k + E_k d_k \quad (12b)$$
$$z_0 = \bar{x}_0, \qquad z_{k+1} = (A_k - K_k C_k)z_k + B_k u_k + K_k y_k \quad (12c)$$
it is not difficult to check that the quantity $\hat{x}^*_{k+1|k}$ in (11) can be rewritten in the following form:
$$\hat{x}^*_{k+1|k} = \Upsilon_{k+1} p + s_{k+1} + z_{k+1} \quad (13)$$

Note that a realization of the sequence $\{z_k\}$ can be computed from available data only, i.e. system matrices, input and output sequences. As a matter of fact, (12c) is exactly the Kalman filter equation that would be obtained if $p\equiv 0$ and $d_k\equiv 0$ for all $k$. Using (13) and (8b), the following is also true:

$$C_k s_k + C_k\Upsilon_k p + e^*_k = y_k - C_k z_k \quad (14)$$


It is possible to arrange in matrix form the set of equations obtained from (14) when $k=1,2,\ldots,N$. For example, for $N=4$ one obtains
$$
\begin{bmatrix}
C_4\Phi_1^4 E_0 & C_4\Phi_2^4 E_1 & C_4\Phi_3^4 E_2 & C_4 E_3 & C_4\Upsilon_4 \\
C_3\Phi_1^3 E_0 & C_3\Phi_2^3 E_1 & C_3 E_2 & O & C_3\Upsilon_3 \\
C_2\Phi_1^2 E_0 & C_2 E_1 & O & O & C_2\Upsilon_2 \\
C_1 E_0 & O & O & O & C_1\Upsilon_1
\end{bmatrix}
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \\ p \end{bmatrix}
+
\begin{bmatrix} e^*_4 \\ e^*_3 \\ e^*_2 \\ e^*_1 \end{bmatrix}
=
\begin{bmatrix} y_4 - C_4 z_4 \\ y_3 - C_3 z_3 \\ y_2 - C_2 z_2 \\ y_1 - C_1 z_1 \end{bmatrix}
\quad (15)
$$

where the transition matrices $\Phi_h^k$ are the same defined in (7). For an arbitrary $N$, left multiply the above system by the block diagonal matrix $\mathrm{blkdiag}\{\Sigma_N^{-1/2},\ldots,\Sigma_1^{-1/2}\}$ in such a way that the covariance of the zero mean vector
$$e^* = \mathrm{vec}[\Sigma_N^{-1/2} e^*_N \;\ldots\; \Sigma_1^{-1/2} e^*_1] \quad (16)$$
is equal to the identity matrix. A system of the form
$$\mathcal{A}g + e^* = r \quad (17)$$
is thus obtained, where matrix $\mathcal{A}\in\mathbb{R}^{lN\times(fN+q)}$ has the same structure as in (15), $g = \mathrm{vec}[d_0\;\ldots\;d_{N-1}\;p]$ is the unknown term, and the vector $r = \mathrm{vec}[r_N\;\ldots\;r_1]$ contains the computable residuals
$$r_k = \Sigma_k^{-1/2}(y_k - C_k z_k) \quad (18)$$
If $d_k\equiv 0$ for each $k$ and $p\equiv 0$, then $r = e^*$, i.e. the vector of residuals has zero mean and its covariance equals the identity matrix. Any statistical test indicating a deviation from this condition can be used to detect the presence of non-null disturbances and/or parameters.
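As a sketch of how the residuals (18) can be generated in practice, the following Python fragment propagates $z_k$ by (12c) and whitens the output error; a Cholesky factor of $\Sigma_k$ is used in place of $\Sigma_k^{-1/2}$, which also yields residuals with identity covariance. The sequences of matrices and the gains $K_k$, $\Sigma_k$ are assumed precomputed (e.g. by the Kalman recursion sketched in Section 2); all names are illustrative:

```python
import numpy as np

def residual_generator(A_seq, B_seq, C_seq, K_seq, Sigma_seq, u_seq, y_seq, x0_bar):
    """Compute z_k by (12c) and whitened residuals r_k as in (18)."""
    z = x0_bar.copy()                                  # z_0 = mean of x_0
    r_seq = []
    for A, B, C, K, Sigma, u, y in zip(A_seq, B_seq, C_seq, K_seq,
                                       Sigma_seq, u_seq, y_seq):
        L = np.linalg.cholesky(Sigma)                  # Sigma_k = L L^T
        r_seq.append(np.linalg.solve(L, y - C @ z))    # r_k, cf. (18)
        z = (A - K @ C) @ z + B @ u + K @ y            # z_{k+1}, eq. (12c)
    return r_seq
```

For instance, a test on the sample mean and covariance of $\{r_k\}$ deviating from $0$ and $I$ can serve as a simple detector.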

Since samples of $r$ are available whereas $e^*$ cannot be observed, the most appealing approach to estimate $g$ is to compute its minimum variance linear estimator given the random vector $r$. Thanks to Assumption 2, property (6) holds also for the parameters and the disturbance sequence, i.e. for any $k, h = 0,1,2,\ldots$,
$$E[e^*_k d_h^T] = O \quad\text{and}\quad E[e^*_k p^T] = O \quad (19)$$

As a result, $g$ and $e^*$ in (17) are in fact uncorrelated. Provided that prior information on the random vector $g$ is given in terms of its mean $\mu_g$ and covariance $\Sigma_g$, a straightforward application of formulas (2a) and (2b) gives
$$\hat{g} = \mu_g + \Sigma_g\mathcal{A}^T(\mathcal{A}\Sigma_g\mathcal{A}^T + I)^{-1}(r - \mu_r) \quad (20a)$$
$$\Sigma_{\tilde g} = \Sigma_g - \Sigma_g\mathcal{A}^T(\mathcal{A}\Sigma_g\mathcal{A}^T + I)^{-1}\mathcal{A}\Sigma_g \quad (20b)$$
where the mean $\mu_r$ of the residual vector $r$ is obtained from the relation $0 = E[e^*] = \mu_r - \mathcal{A}\mu_g$. Note that its covariance $\Sigma_r$ is equal to $(\mathcal{A}\Sigma_g\mathcal{A}^T + I)$ and is always invertible because $\mathcal{A}\Sigma_g\mathcal{A}^T$ is positive semidefinite. Under the assumption that also $\Sigma_g$ is invertible, by using well-known formulas derived from the so-called matrix inversion lemma [20, p. 18], the following can also be obtained:

$$\hat{g} = \mu_g + (\mathcal{A}^T\mathcal{A} + \Sigma_g^{-1})^{-1}\mathcal{A}^T(r - \mu_r) \quad (21a)$$
$$\Sigma_{\tilde g} = (\mathcal{A}^T\mathcal{A} + \Sigma_g^{-1})^{-1} \quad (21b)$$

By replacing matrix $\Sigma_g^{-1}$ by a factorization $B^T B$ (for example, the Cholesky factorization) with $B\in\mathbb{R}^{(fN+q)\times(fN+q)}$, it is easily recognized from (21a) that $\hat{g}$ can be obtained from
$$(\mathcal{A}^T\mathcal{A} + B^T B)(\hat{g} - \mu_g) = \mathcal{A}^T(r - \mathcal{A}\mu_g) \quad (22)$$

Observe that the estimator $\hat{g} = \hat{E}[g|r]$ gives exactly the projections of $p$ and of the disturbances $\{d_k\}$ onto the space $\mathcal{L}\{r_0^N\}$ generated by the residuals defined in (18), i.e. the linear estimators $\hat{d}_{k|N} = \hat{E}[d_k|r_0^N]$, for $k=0,\ldots,N-1$, and $\hat{p}_{|N} = \hat{E}[p|r_0^N]$ (since $p$ does not depend on time, the notation $\hat{p}_{|N}$ is used to denote its estimate based on all the measurements up to time $N$). Instead, appropriate blocks of $\Sigma_{\tilde g}$ give the correlation matrices between the errors $\tilde{p}_{|N} = p - \hat{p}_{|N}$ and $\tilde{d}_{k|N} = d_k - \hat{d}_{k|N}$, for $k=0,\ldots,N-1$.
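When prior mean and covariance are available, (21a)–(21b) translate directly into a short routine. A minimal sketch, assuming $\mathcal{A}$ and $r$ have already been stacked as in (17) and that $\Sigma_g$ is invertible (names are illustrative):

```python
import numpy as np

def estimate_g_with_prior(A_big, r, mu_g, Sigma_g):
    """Minimum variance linear estimate of g given r, eqs (21a)-(21b).

    mu_r = A_big @ mu_g follows from E[e*] = 0, as noted after (20).
    """
    info = A_big.T @ A_big + np.linalg.inv(Sigma_g)              # information matrix
    mu_r = A_big @ mu_g
    g_hat = mu_g + np.linalg.solve(info, A_big.T @ (r - mu_r))   # (21a)
    Sigma_err = np.linalg.inv(info)                              # (21b)
    return g_hat, Sigma_err
```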

One could suspect, at this point, that the information about the unknown terms, which is available from knowledge of the input and output sequences, is not fully exploited if the only quantities that are used for the estimation of the disturbances and parameters are the residuals defined in (18). The following proposition, however, provides a geometrical interpretation of the residuals that guarantees the optimality of the proposed method as long as linear estimators are considered (for a proof see [21]).

Proposition 1
If Assumptions 1 and 2 are satisfied, then the following orthogonal decomposition holds:
$$\mathcal{L}\{y_0^N, u_0^{N-1}\} = \mathcal{L}\{r_0^N\}\oplus\mathcal{L}\{u_0^{N-1}\} \quad (23a)$$
As a consequence also
$$\hat{E}[d_k|r_0^N] = \hat{E}[d_k|y_0^N, u_0^{N-1}] \quad\text{and}\quad \hat{E}[p|r_0^N] = \hat{E}[p|y_0^N, u_0^{N-1}] \quad (23b)$$

The stated result will also be used in Section 4 to derive the state estimators.

When sample paths of the input and output sequences, say $\{u_0^{N-1}\}$ and $\{y_0^N\}$, are available, one is faced with the problem of computing numerically the estimate $\hat{g}$ corresponding to the observed realization of $r$, with $\hat{g} = \mathrm{vec}[\hat{d}_{0|N}\;\ldots\;\hat{d}_{N-1|N}\;\hat{p}_{|N}]$. The most important case in practice, i.e. the lack of prior information about the disturbances and parameters, will be discussed in the following.

3.2. The case with no prior information

3.2.1. Estimability conditions. The absence of prior information about $g$ can be dealt with by setting $\mu_g = 0$ and letting $\Sigma_g\to\infty$ (or equivalently $\Sigma_g^{-1}\to 0$), which corresponds to a very large uncertainty. Formula (22) becomes $(\mathcal{A}^T\mathcal{A})\hat{g} = \mathcal{A}^T r$, which is the system of normal equations for computing the unique least-squares solution of
$$\mathcal{A}g = r \quad (24)$$


in the unknown $g$, provided that matrix $\mathcal{A}$ has full column rank. Hence:

Proposition 3
For a given $N\ge 1$, the estimates $\hat{p}_{|N}$ and $\hat{d}_{k|N}$ for $0\le k\le N-1$ are unique if and only if matrix $\mathcal{A}$ has full column rank.
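In the no-prior case the computation thus reduces to a rank check and one least-squares solve; a minimal sketch (the stacked matrix $\mathcal{A}$ and the residual vector $r$ are assumed given, the tolerance is illustrative):

```python
import numpy as np

def estimate_g_no_prior(A_big, r, tol=1e-10):
    """Unique least-squares solution of (24) when A_big has full column rank."""
    if np.linalg.matrix_rank(A_big, tol=tol) < A_big.shape[1]:
        raise ValueError("A is rank deficient: the estimates are not unique")
    g_hat, *_ = np.linalg.lstsq(A_big, r, rcond=None)
    return g_hat
```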

The following are two other easy-to-prove necessary conditions for the uniqueness of the estimates:

Proposition 4
For a given $N\ge 1$, the solution of the least-squares problem in (24) is unique only if the following necessary conditions are satisfied:
$$\text{(C1)}\quad \mathrm{rank}\begin{bmatrix} E_0 & O & \cdots & O & \Psi_0 \\ O & E_1 & \cdots & O & \Psi_1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ O & O & \cdots & E_{N-1} & \Psi_{N-1} \end{bmatrix} = fN+q \quad (25a)$$
$$\text{(C2)}\quad \mathrm{rank}\left(\sum_{k=1}^{N}\Upsilon_k^T C_k^T C_k \Upsilon_k\right) = q \quad (25b)$$
If $\mathrm{rank}(E_k) = f$ for all $k\ge 0$ and (C1) is true for a value $N = N_{\min}$, then it is satisfied for all values $N\ge N_{\min}$. Analogously, if (C2) is true for a value $N = N_{\min}$, then it is satisfied for all values $N\ge N_{\min}$.

Proof
By factorizing $\mathcal{A}$ as follows (continuing the example with $N=4$, as done in (15)),
$$\mathcal{A} = \begin{bmatrix} C_4\Phi_1^4 & C_4\Phi_2^4 & C_4\Phi_3^4 & C_4 \\ C_3\Phi_1^3 & C_3\Phi_2^3 & C_3 & O \\ C_2\Phi_1^2 & C_2 & O & O \\ C_1 & O & O & O \end{bmatrix}\cdot\begin{bmatrix} E_0 & O & O & O & \Psi_0 \\ O & E_1 & O & O & \Psi_1 \\ O & O & E_2 & O & \Psi_2 \\ O & O & O & E_3 & \Psi_3 \end{bmatrix} \quad (26)$$
it is clear that if the null space of the $nN\times(fN+q)$ right factor is nontrivial, then the null space of $\mathcal{A}$ is nontrivial too, hence condition (C1).

Instead, condition (C2) can be proved by denoting by $X$ the columns multiplied by the unknown parameters and recalling that $\mathrm{rank}(X) = \mathrm{rank}(X^T X)$ for any arbitrary matrix $X$. □

Condition (C2) can be considered as an excitation condition for the sequence of matrices $\{\Upsilon_k\}$ defined in (12a). In the case where there are no disturbances, it is also sufficient and has already been used in [2, 3] as a condition for the existence of an adaptive state observer.

Unfortunately, finding other general estimability conditions is a very complex task. However, from a practical point of view, it should be noted that the proposed method requires simply checking the rank of matrices and solving least-squares problems, for which efficient numerical tools are readily available [22, Chapters 2–3]. Moreover, it is clear that in order to compute the estimates using the method presented above, a least-squares problem of growing size has to be solved as $N$ increases.

The purpose of the discussion in the next subsections is then simply to illustrate, in a special case of practical importance, two points: (a) the concept of 'delayed' estimation, which is closely related to the design of adaptive state smoothers when adaptive filters do not exist, and (b) the possibility of setting up a recursive approximate estimation procedure leading to a sliding window approach to the estimation of the disturbances and unknown parameters.

3.2.2. Delayed estimation. Consider first the special case when there are no unknown parameters, $q=0$. A sufficient condition to ensure that $\mathcal{A}$ has full column rank for all $N\ge 1$, hence that the estimates $\hat{d}_{k|N}$ for $0\le k\le N-1$ are unique, is
$$\text{(C3)}\quad \mathrm{rank}(C_{k+1}E_k) = f \quad \forall k\ge 0 \quad (27)$$

This guarantees that for each $k$ the block of columns multiplied by $d_k$ has full rank and is linearly independent of the other columns, thanks to the particular structure of $\mathcal{A}$, which has only zeros in its lower right part.

Condition (C3) in (27) is often stated as an assumption to prove the existence of unknown input observers for both deterministic and stochastic systems. This statement, however, needs to be made clearer. It will be explained in Section 4 that it guarantees the existence of an optimal unknown input filter. In fact, the uniqueness of the estimates $\hat{d}_{k|N}$ for $0\le k\le N-1$ is sufficient for the computation of the filtered state estimate $\hat{x}_{N|N}$ of $x_N$ given measurements up to time $N$.

Instead, when (C3) is not satisfied, it may still be possible to design a fixed-lag smoother, i.e. to compute for some delay $D>0$ the estimate $\hat{x}_{N|N+D}$ of $x_N$ given measurements up to time $N+D$, a sufficient condition being the uniqueness of the estimates $\hat{d}_{k|N+D}$ for $0\le k\le N-1$. With some abuse of terminology, these will be called delayed estimates in the following.

To exemplify what has just been asserted, consider the case $\mathcal{R}(E_k)\subseteq\mathcal{N}(C_{k+1})$ for all $k\ge 0$ (the range of $E_k$ is a subset of the null space of $C_{k+1}$), so that $C_{k+1}E_k = O$ and thus (C3) is not satisfied. This situation may typically happen when $C_{k+1}$ and $E_k$ both have some zero entries, for example, $C_{k+1}=[1\;\;0]$ and $E_k=[0\;\;1]^T$. Then zero blocks appear in the term $\mathcal{A}g$ in (17) as shown on the left-hand side of the following scheme (for $N=4$):
$$
\begin{matrix} k=4 \\ 3 \\ 2 \\ 1 \end{matrix}
\begin{bmatrix} \times & \times & * & O \\ \times & * & O & O \\ * & O & O & O \\ O & O & O & O \end{bmatrix}
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix}
\;\longrightarrow\;
\begin{matrix} k=5 \\ 4 \\ 3 \\ 2 \\ 1 \end{matrix}
\begin{bmatrix} \times & \times & \times & * & O \\ \times & \times & * & O & O \\ \times & * & O & O & O \\ * & O & O & O & O \\ O & O & O & O & O \end{bmatrix}
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \\ d_4 \end{bmatrix}
\quad (28)
$$
It is evident that $d_{N-1}$ ($d_3$ in the example above) is not estimable from measurements collected up to time $N$ (in other words $\hat{d}_{3|4}$ is not unique; in fact it is arbitrary). However, if the blocks marked with a $*$, i.e. the matrices $C_{k+2}\Phi_{k+1}^{k+2}E_k$ in (17), have full column rank, then it is possible to estimate $d_0$ to $d_{N-2}$ ($\hat{d}_{0|4}$, $\hat{d}_{1|4}$ and $\hat{d}_{2|4}$). Moreover, in order to estimate the disturbances up to time $N-1$, it is sufficient to add the measurements at time $N+1$ (at time 5 to continue the example), so that the unique estimates $\hat{d}_{k|N+1}$ for $k=0,\ldots,N-1$ and, in particular, $\hat{d}_{N-1|N+1}$ (in the example $\hat{d}_{3|5}$) can be computed. The right-hand side of (28) depicts the situation.


If the blocks marked with $*$ are also zero, $C_{k+2}\Phi_{k+1}^{k+2}E_k = O$, but those immediately above, $C_{k+3}\Phi_{k+1}^{k+3}E_k$, have full column rank, then a two-step delay is sufficient to compute the estimates $\hat{d}_{k|N+2}$ for $k=0,\ldots,N-1$, and so on.

In summary, if the conditions

$$\text{(C4)}\quad \begin{bmatrix} C_{k+D}\Phi_{k+1}^{k+D}E_k \\ \vdots \\ C_{k+2}\Phi_{k+1}^{k+2}E_k \\ C_{k+1}E_k \end{bmatrix} = O, \qquad \mathrm{rank}(C_{k+D+1}\Phi_{k+1}^{k+D+1}E_k) = f \quad \forall k\ge 0 \quad (29)$$

are satisfied, then the estimates $\hat{d}_{k|N+D}$ for $0\le k\le N-1$ are unique. This result can thus be considered an extension of condition (27), because it guarantees the existence of some delayed estimates when $\mathcal{A}$ does not have full rank. Finally, observe that both (27) and (29) are sufficient but not necessary, and that (29) amounts to assuming a scalar delay structure between disturbances and output. More general estimability conditions can be found in [21].
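For a fixed time $k$, the smallest delay compatible with (29) can be found by a direct search. The following is a minimal sketch, assuming the products $A_j - K_j C_j$ are stored in `Kbar_seq` and that the check is repeated for every $k$ of interest (names and the cap `D_max` are illustrative):

```python
import numpy as np

def smallest_delay(C_seq, E_seq, Kbar_seq, k, f, D_max=10):
    """Search the smallest D such that the blocks of (29) vanish below a
    full-column-rank block C_{k+D+1} Phi^{k+D+1}_{k+1} E_k."""
    Phi = np.eye(C_seq[0].shape[1])          # Phi^{k+1}_{k+1} = I
    for D in range(D_max + 1):
        block = C_seq[k + D + 1] @ Phi @ E_seq[k]
        if np.linalg.matrix_rank(block) == f:
            return D                         # rank condition of (29) met
        if np.count_nonzero(block):
            return None                      # nonzero but rank-deficient block:
                                             # (C4) cannot hold at this k
        Phi = Kbar_seq[k + D + 1] @ Phi      # Phi^{k+D+2}_{k+1}
    return None
```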

When there are unknown parameters, the conditions in (27) or (29) are no longer sufficient, because some of the columns of $\mathcal{A}$, i.e. those multiplied by the parameters $p$, could be linearly dependent on the ones multiplied by the disturbances (recall that (C1) and (C2) in Proposition 4 are only necessary). Since sufficient conditions are not known in general, the rank of matrix $\mathcal{A}$ has to be checked numerically. However, note the following result:

Proposition 5
(a) Assuming that condition (C3) in (27) is satisfied, if the estimates $\hat{p}_{|N}$ and $\hat{d}_{k|N}$ for $0\le k\le N-1$ are unique (i.e. matrix $\mathcal{A}$ has full column rank) for a value $N=N_{\min}$, then they are unique also for all $N\ge N_{\min}$.
(b) Analogously, assuming that condition (C4) in (29) is satisfied, if the delayed estimates $\hat{p}_{|N+D}$ and $\hat{d}_{k|N+D}$ for $0\le k\le N-1$ are unique for a value $N=N_{\min}$, then they are unique also for all $N\ge N_{\min}$.

Proof
Part (a): when measurements at time $k+1$ are available, matrix $\mathcal{A}$ is built from the smaller previous one used at time $k$, say $\mathcal{A}_k$, as follows (up to a permutation of the columns):
$$\begin{bmatrix} \times & C_{k+1}E_k \\ \mathcal{A}_k & O \end{bmatrix} \quad (30)$$
Hence, it has full column rank as soon as $\mathcal{A}_k$ has full column rank, because $C_{k+1}E_k$ is not rank deficient by assumption. Part (b) is proved in the same manner. □

3.2.3. Approximate recursive estimation. Consider the hypotheses of Proposition 5(a), so that unique estimates $\hat{d}_{k|N}$ exist for $0\le k\le N-1$ and $N\ge N_{\min}$. The objective is then to find a recursive procedure to compute approximations of these estimates (strictly speaking, these approximations should carry a distinct symbol, but the notation will not be changed in the following in order not to overburden it).

Observe that the upper left blocks of matrix $\mathcal{A}$ tend to zero as $N$ grows, because the uniform observability and reachability assumption guarantees that the transition matrices $\Phi_h^k$ defined in (7) tend to the null matrix as the difference $k-h\to\infty$. Hence, it is possible to consider an approximate problem by replacing $\mathcal{A}$ with $\mathcal{A}+\mathcal{E}$, where $\mathcal{E}$ annihilates the blocks $\Sigma_k^{-1/2}C_k\Phi_h^k E_{h-1}$ such that $k-h\ge L$. The system $(\mathcal{A}+\mathcal{E})g=r$ has thus the structure shown in the left part of the following scheme (for $N=5$ and $L=3$):

[Scheme (31): banded structure of the approximate system $(\mathcal{A}+\mathcal{E})g=r$ with the initial data window boxed — not reproduced.]

The value of $L$ must satisfy $L\ge L_{\min}\ge 1$, where $L_{\min}$ is the minimum value guaranteeing that $\mathrm{rank}(\mathcal{A}) = \mathrm{rank}(\mathcal{A}+\mathcal{E})$ for all $N$, so that the estimability properties of the original problem are preserved in the approximate one. Obviously, the accuracy of the approximate solution increases as $L\to\infty$.

In the above, an initial data window has also been indicated with a box. Since $\mathcal{A}+\mathcal{E}$ has full column rank only for $N\ge N_{\min}$, the initial window size, say $W$, must satisfy $W\ge N_{\min}$ in order for the unknowns to be estimable. Moreover, in order to apply the procedure that is going to be explained, it is also required that $W\ge L$ (in the example in (31), the most natural choice $W=L=3$ is shown).

Using measurements in the initial window, i.e. up to time $W$, it is then possible to compute by least squares the estimates $\hat{p}_{|W}$ and $\hat{d}_{k|W}$ for $0\le k\le W-1$. Note that there are $lW$ equations in $fW+q$ unknowns. As is well known, if the least-squares problem $(\mathcal{A}+\mathcal{E})g=r$ is multiplied on the left by an orthogonal matrix, then the estimate $\hat{g}$ remains unchanged. Hence, if in particular only the equations in the initial window are multiplied on the left by an orthogonal matrix, then the initial estimates do not change. In addition, if the orthogonal matrix is chosen as the transpose of the factor $Q$ in the QR decomposition of the initial block, then the structure in the right part of (31) is obtained, where only the $R$ factor appears. Note that the null rows that are produced have not been indicated, since they do not contribute to the estimation problem. Moreover, to obtain the estimates, a linear system of $fW+q$ equations in $fW+q$ unknowns now has to be solved.

Observe that the upper triangular square block $R_0$, of dimension $f\times f$, certainly has full column rank because $d_0$ is estimable by assumption. It is also clear that, owing to the approximately banded structure, no equations involving $d_0$ are added as measurements at times $k>W$ become available. Consider then the set of equations at time $W+1$ (i.e. at time 4), relating the new unknown $d_W$ and the residual $r_{W+1}$ (i.e. $d_3$ and $r_4$, respectively). After a permutation of the rows of the initial block, the situation is as depicted in the left part of the following scheme:

[Scheme (32): the window equations after the QR step, with the triangular block $R_0$ isolated — not reproduced.]

The initial window equations are indicated by a dotted box, whereas the new window is indicated by a solid line box. In fact, in order to compute the estimate $\hat{d}_{0|W+1}$, one proceeds in two steps. First, solve by least squares the system of equations in the new window, a system of $f(W-1)+q+l$ equations in $fW+q$ unknowns. Second, substitute the estimates obtained in the first step into the equations involving $d_0$, i.e. the block with $R_0$ at the bottom, and solve a linear system of $f$ equations in $f$ unknowns.

Note, however, that the estimates obtained in the first step, i.e. by considering the new window, are exactly $\hat{p}_{|W+1}$ and $\hat{d}_{k|W+1}$ for $1\le k\le W$. Thus, in order to start a sliding window recursion, the estimate $\hat{d}_{0|W+1}$ is simply disregarded, i.e. the second step above is not carried out, and the least-squares problem in the new window, the first step, is solved as before by a QR decomposition. After the elimination of the null rows and a permutation, the situation is then as depicted in the right part of (32).

The equations corresponding to what has so far been called the new window are indicated by a dotted box. The next window, involving measurements up to time $W+2$, is the one marked with a solid line and permits the computation of the estimates $\hat{p}_{|W+2}$ and $\hat{d}_{k|W+2}$ for $2\le k\le W+1$. Instead, the estimate $\hat{d}_{1|W+2}$ is simply not computed. Observe, however, that the estimates $\hat{d}_{0|W+2}$ and $\hat{d}_{1|W+2}$ can in fact be computed by backward substitution, provided that the full rank triangular factors $R_0$ and $R_1$ are stored. And so on.

Under the hypotheses of Proposition 5(b), the same technique can be used to compute the delayed estimates over a sliding window. The details should be clear and are not presented here.
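The sliding window recursion can of course also be implemented, less efficiently, by rebuilding and re-solving the window problem at every step instead of updating a QR factorization. The following sketch of this simplified variant corresponds to the 'first step' described above, with the band length taken equal to the window size $W$; `Kbar_seq[k]` holds $A_k - K_k C_k$, `Ups_seq[k]` holds $\Upsilon_k$ and `r_seq[k]` the residual $r_k$ (all names are illustrative):

```python
import numpy as np

def window_estimate(C_seq, E_seq, Ups_seq, Sigma_seq, Kbar_seq, r_seq, k_end, W):
    """Estimate d_{k_end-W},...,d_{k_end-1} and p from the window ending at k_end."""
    f, q, l = E_seq[0].shape[1], Ups_seq[0].shape[1], C_seq[0].shape[0]
    k0 = k_end - W                           # first disturbance index in the window
    rows, rhs = [], []
    for k in range(k0 + 1, k_end + 1):       # equations k = k0+1, ..., k_end
        Linv = np.linalg.inv(np.linalg.cholesky(Sigma_seq[k]))   # whitening factor
        row = np.zeros((l, f * W + q))
        Phi = np.eye(C_seq[k].shape[1])      # Phi^k_{h+1}, built downwards in h
        for h in range(k - 1, k0 - 1, -1):
            row[:, (h - k0) * f:(h - k0 + 1) * f] = C_seq[k] @ Phi @ E_seq[h]
            Phi = Phi @ Kbar_seq[h]          # Phi^k_h = Phi^k_{h+1}(A_h - K_h C_h)
        row[:, f * W:] = C_seq[k] @ Ups_seq[k]
        rows.append(Linv @ row)
        rhs.append(r_seq[k])                 # r_k is already whitened, cf. (18)
    g_w, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    d_hat = [g_w[i * f:(i + 1) * f] for i in range(W)]   # d_{k0},...,d_{k_end-1}
    p_hat = g_w[f * W:]
    return d_hat, p_hat
```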

4. STATE ESTIMATION

4.1. General expression for smoothed estimators

The objective of this section is to find an expression for the smoothed-state estimate at any time $N$, defined as

$$\hat{x}_{N|N+D} = \hat{E}[x_N \,|\, y_0^{N+D}, u_0^{N+D}] \quad (33)$$

and for the covariance matrix $P_{N|N+D}$ of the error $x_N - \hat{x}_{N|N+D}$ (recall that $P^*_{N|N+D}$ denotes instead the covariance of the error $x_N - \hat{x}^*_{N|N+D}$). Here $D\ge 0$ is an integer delay. From (3), it is clear that it can be obtained by the projection $\hat{x}_{N|N+D} = \hat{E}[\hat{x}^*_{N|N+D} \,|\, y_0^{N+D}, u_0^{N+D}]$. It is known [14, Theorem 10.1.1] that
$$\hat{x}^*_{N|N+D} = \hat{x}^*_{N|N-1} + \sum_{k=N}^{N+D} G_N^k e^*_k \quad (34)$$

where
$$G_N^k = P^*_{N|N-1}(\Phi_N^k)^T C_k^T \Sigma_k^{-1} \quad (35)$$

By replacing the innovation $e^*_N$ by $(y_N - C_N\hat{x}^*_{N|N-1})$ and the state predictor $\hat{x}^*_{N|N-1}$ by using (13), the following is obtained (when $D=0$, the sum in the right-hand term is set to zero):
$$\hat{x}^*_{N|N+D} = (I - G_N^N C_N)(\Upsilon_N p + s_N) + \sum_{k=N+1}^{N+D} G_N^k e^*_k + (z_N + G_N^N r_N) \quad (36)$$


where, using (12b), $s_N = \sum_{k=0}^{N-1}\Phi_{k+1}^N E_k d_k$. Moreover, note that the recursion in (12c) can be expressed as
$$z_{k+1} = A_k z_k + B_k u_k + K_k r_k \;\in\; \mathcal{L}\{r_0^N\}\oplus\mathcal{L}\{u_0^{N-1}\} \quad (37)$$

Hence, by using the linearity of (36) and Proposition 1, it is easy to conclude that $\hat{x}_{N|N+D}$ is obtained by replacing the parameters $p$, the term $s_N$ and the innovations $e^*_k$ appearing in (36) with their best linear estimators based on $r_0^{N+D}$:

$$\hat{x}_{N|N+D} = (I - G_N^N C_N)(\Upsilon_N \hat{p}_{|N+D} + \hat{s}_{N|N+D}) + \sum_{k=N+1}^{N+D} G_N^k \hat{e}^*_{k|N+D} + (z_N + G_N^N r_N) \quad (38)$$

Thanks to the linearity of (12b), the estimator $\hat{s}_{N|N+D} = \sum_{k=0}^{N-1}\Phi_{k+1}^N E_k \hat{d}_{k|N+D}$ can be obtained recursively from the following:
$$\hat{s}_{0|N+D} = 0, \qquad \hat{s}_{k+1|N+D} = (A_k - K_k C_k)\hat{s}_{k|N+D} + E_k \hat{d}_{k|N+D} \quad (39)$$

Remark
If approximate recursive estimation techniques are applied, as explained in Section 3.2.3, then it is possible to compute an approximation of $\hat{s}_{N|N+D}$ by evaluating the truncated sum $\hat{s}_{N|N+D} = \sum_{k=N-W}^{N-1}\Phi_{k+1}^N E_k \hat{d}_{k|N+D}$, where the estimates $\hat{d}_{k|N+D}$ are obtained by using only the current data window. If the window length $W$ is increased, then (a) the stability of the transition matrices $\Phi_{k+1}^N$ guarantees an improved damping of the truncation error and (b) the differences between the window-based estimates $\hat{d}_{k|N+D}$ and the exact ones tend to zero, thus motivating this approach.
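A sketch of this truncated evaluation, under the same assumptions and naming as the previous fragments (`Kbar_seq[k]` holds $A_k - K_k C_k$, and `d_hat` maps the time index $k$ to the window estimate of $d_k$):

```python
import numpy as np

def s_hat_truncated(Kbar_seq, E_seq, d_hat, N, W):
    """Evaluate the truncated sum for s_hat_{N|N+D} via the recursion (39),
    started at k = N - W instead of k = 0."""
    s = np.zeros(Kbar_seq[0].shape[0])
    for k in range(N - W, N):
        s = Kbar_seq[k] @ s + E_seq[k] @ d_hat[k]      # cf. (39)
    return s
```

The state estimate (38) is then assembled from this approximate $\hat{s}$, the window estimate of $p$ and the innovation projections.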

4.2. Sufficient conditions for the uniqueness of the state estimates

It is now clear that knowledge of unique estimates $\hat{p}_{|N+D}$ and $\hat{d}_{k|N+D}$ for $k=0,\ldots,N-1$ is sufficient in order to compute a unique estimate $\hat{s}_{N|N+D}$. Section 3.2.2 and, in particular, Proposition 5 addressed this point under some specific hypotheses.

Instead, unique estimates $\hat{e}^*_{k|N+D}$ can always be computed, irrespective of the rank of $\mathcal{A}$. In fact, it is always possible to find a linear transformation $g = Tg'$ such that matrix $\mathcal{A}' = \mathcal{A}T$ has full column rank and to rewrite (17) in the form $\mathcal{A}'g' + e^* = r$. Hence, $\hat{E}[e^*|r] = \hat{e}^* = r - \mathcal{A}'\hat{g}'$, i.e. the projections $\hat{e}^*_{k|N+D} = \hat{E}[e^*_k | r_0^{N+D}]$ of the innovation $e^*_k$ onto $\mathcal{L}\{r_0^{N+D}\}$ can always be computed. Following the same line of reasoning as in Proposition 1, these projections coincide with the estimators $\hat{E}[e^*_k | y_0^{N+D}, u_0^{N+D-1}]$ that are required in (38).

To summarize, putting together Proposition 5 and the above observation, the following is true:

Proposition 6
(a) Assuming that condition (C3) in (27) is satisfied, if the estimates $\hat{p}_{|N}$ and $\hat{d}_{k|N}$ for $0\le k\le N-1$ are unique (i.e. matrix $\mathcal{A}$ has full column rank) for a value $N=N_{\min}$, then unique filtered state estimates $\hat{x}_{N|N}$ exist for all $N\ge N_{\min}$.
(b) Analogously, assuming that condition (C4) in (29) is satisfied, if the delayed estimates $\hat{p}_{|N+D}$ and $\hat{d}_{k|N+D}$ for $0\le k\le N-1$ are unique for a value $N=N_{\min}$, then unique smoothed-state estimates $\hat{x}_{N|N+D}$ exist for all $N\ge N_{\min}$.


4.3. State estimation error covariance

An expression for the state estimation error covariance can be obtained as follows: decompose $x_N - \hat{x}_{N|N+D}$ as $(x_N - \hat{x}^*_{N|N+D}) + (\hat{x}^*_{N|N+D} - \hat{x}_{N|N+D})$ and note that the first term is orthogonal to the space $\mathcal{L}\{y_0^{N+D}, u_0^{N+D}\}$ to which the second term belongs. Thus, it is possible to decompose the error covariance matrix as
$$P_{N|N+D} = P^*_{N|N+D} + \Delta P_{N|N+D} \quad (40)$$
where $P^*_{N|N+D}$ can be computed [14, Lemma 10.2.1] from knowledge of $P^*_{N|N-1}$ and $P^*_{k|k-1}$ for $k=N,\ldots,N+D$, which are known. Instead, $\Delta P_{N|N+D}$ accounts for the increase in uncertainty due to the fact that the parameter and disturbance terms cannot be observed but only estimated. By defining $\tilde{e}^* = e^* - \hat{e}^*$ and noting that $\tilde{e}^* = -\mathcal{A}\tilde{g}$, it is straightforward to see, but not detailed here, that a matrix $M$ can be defined such that $\hat{x}^*_{N|N+D} - \hat{x}_{N|N+D} = M\tilde{g}$. Hence, one obtains $\Delta P_{N|N+D} = M\Sigma_{\tilde g}M^T$, with the necessary adjustments in the case approximate estimation techniques are applied.

5. COMPARISON WITH THE PARITY SPACE APPROACH

The approach of generating the residuals, as described in Section 3.1, shares many similarities with the so-called parity space method (see [23–25; 26, Chapter 7.4; 27, Chapter 11.2]), which finds wide application in fault detection problems. In the parity space method, the parameters and disturbances are estimated from a set of relations that can be cast in the form $\bar{\mathcal{A}}g + w = \bar{r}$. Matrix $\bar{\mathcal{A}}$ differs from $\mathcal{A}$ in (17) only because the transition matrices $\Phi_h^k$ defined in (7) are replaced by $\bar{\Phi}_h^k = A_{k-1}\cdots A_{h+1}A_h$. Moreover, the covariance of the noise term $w$ does not equal the identity matrix and the residuals $\bar{r}$ are built in a different way.

The approach proposed here is new in that it makes explicit reference to the innovation representation of system (1), with the following advantages:

(a) The components of the noise term $e^*$ are independent and normalized, whereas an important drawback of the parity space approach is that the noise term $w$ has to be whitened, using its covariance, before computing the least-squares estimate, thus increasing the computational load, especially for large-scale problems.

(b) If the matrices $A_k$ are not stable, as can typically happen in control problems, matrix $\bar{\mathcal{A}}$ can be severely ill-conditioned, making it numerically harder to compute the estimate reliably, especially for large window sizes.

(c) The initial condition $x_0$ affects the residuals $r$ through the sequence $\{z_k\}$. However, the transition matrices $\Phi_h^k$ are stable. Hence, the effect of the initial condition is progressively forgotten as $k\to+\infty$. As a consequence, when using the sliding window estimation procedure, one does not have to take care of the estimation or rejection of the state at the initial time of the window. In other words, the disturbance and parameter estimation problem is, in a sense, asymptotically decoupled from the state estimation problem.

(d) State estimation is straightforward and can easily be performed on demand, once the disturbance and parameter estimates have been computed. Moreover, it does not require expensive matrix recalculations, it being sufficient to implement a truncated version of the recursion in (39), involving only the estimates obtained in the current data window.


6. AN APPLICATION

Consider the following general reaction mechanism (the so-called Brusselator [28]) involving six chemical species ($n=6$), where the supplied reactants are $A$ and $B$ ($m=2$) and the reaction products are $C$, $D$, $X$ and $Y$:
$$A \to X + D, \qquad B + X \to Y + C, \qquad 2X + Y \to 3X$$
The notation $\to$ means that the species on its left-hand side react together to form the ones on the right-hand side, at some reaction rate (not explicitly indicated here, for simplicity). It is an example of an auto-catalytic chemical reaction, i.e. a reaction in which a species (in this case $X$) acts to increase the rate of its producing reaction. Define the state vector $x = [x_1,\ldots,x_n]$ to contain the instantaneous quantities of the components $\{A, B, X, C, Y, D\}$ in the system. Assuming that all the reaction rate constants are equal to 1 and that all the components are continuously removed at a rate proportional to their quantity, a state space model can be derived in the following form [29]:
$$\dot{x} = S\,r(x) - \Lambda x + \Gamma u$$

[Figure 1. The first three non-measured components of the state vector (zoomed view): nominal values (dotted line), estimated by the adaptive observer (solid line).]


[Figure 2. The estimation error of the first three non-measured components of the state vector (absolute value).]

where
$$S = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \qquad r(x) = \begin{bmatrix} x_1 \\ x_2 x_3 \\ x_3 x_5 \end{bmatrix}, \qquad \Gamma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$
and $\Lambda$ is a diagonal matrix whose elements $\{\kappa_1,\ldots,\kappa_n\}$ are the removal rates, while $u = [u_1\;\; u_2]^T$ are the feed rates. Assuming that only the state variables $x_i$ for $i\in\{4,5,6\}$ can be measured ($l=3$), the objective is to design a state observer that can reject perturbations acting on the input $u$ ($f=2$) and that can estimate online the values of the removal rates $\kappa_3$ and $\kappa_5$ of $X$ and $Y$, respectively ($q=2$).

In order to apply the proposed adaptive observer, this non-linear system is first linearized and discretized around a nominal state trajectory $\bar{x}$: the nominal feed and removal rates are denoted, respectively, by $\bar{u}$ and $\bar{\kappa}_i$, and an explicit Euler scheme with sampling period $T$ is used. The result is a time-variant linear system in the form of (1), where the noise $w_k$ is supposed to account for linearization and discretization errors.


[Figure 3. The two disturbances estimated by the adaptive observer. The true sequences are two sinusoidal perturbations of periods 25 and 60 and amplitudes 0.4 and 0.6, respectively, starting from the time t=200.]

Hence, $x_k = x(kT) - \bar{x}(kT)$, $p = [\kappa_3 - \bar{\kappa}_3\;\; \kappa_5 - \bar{\kappa}_5]^T$, $d_k = u(kT) - \bar{u}(kT)$ and, in particular,
$$\Psi_k = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ -T\bar{x}_3(kT) & 0 \\ 0 & 0 \\ 0 & -T\bar{x}_5(kT) \\ 0 & 0 \end{bmatrix}, \qquad E_k = \begin{bmatrix} T & 0 \\ 0 & T \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad C_k = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

It is clear that $\mathrm{rank}(C_k E_{k-1}) = 0$, so that a smoother has to be designed. A delay $D=1$ turns out to be sufficient for the following nominal values:
$$\{\bar{\kappa}_1,\ldots,\bar{\kappa}_n\} = \{0.5,\, 0.5,\, 1,\, 1,\, 3,\, 1\}$$
$$\bar{u}_1(t) = \begin{cases} \tfrac{1}{2}t, & 0\le t\le 20 \\ 10, & t>20 \end{cases} \qquad \bar{u}_2(t) = \begin{cases} \tfrac{4}{5}t, & 0\le t\le 20 \\ 16, & t>20 \end{cases}$$
with $T=0.2$ and zero initial conditions.
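For reference, a minimal sketch of the linearization and Euler discretization step that produces $A_k$ together with the matrices $\Psi_k$ and $E_k$ displayed above; the Jacobian is evaluated along a stored nominal trajectory $\bar{x}(kT)$, and the symbols $S$, $\Lambda$, $\Gamma$ are those introduced earlier (the function names are illustrative):

```python
import numpy as np

T = 0.2                                           # sampling period
S = np.array([[-1, 0, 0], [0, -1, 0], [1, -1, 1],
              [1, 0, 0], [0, 1, -1], [0, 1, 0]])  # stoichiometric matrix
Gamma = np.vstack([np.eye(2), np.zeros((4, 2))])  # feed matrix

def jacobian_r(x):
    """Jacobian of r(x) = [x1, x2*x3, x3*x5] with respect to x."""
    J = np.zeros((3, 6))
    J[0, 0] = 1.0
    J[1, 1], J[1, 2] = x[2], x[1]
    J[2, 2], J[2, 4] = x[4], x[2]
    return J

def linearized_matrices(x_bar, kappa_bar):
    """Euler-discretized A_k, Psi_k, E_k around the nominal point x_bar."""
    A = np.eye(6) + T * (S @ jacobian_r(x_bar) - np.diag(kappa_bar))  # A_k
    Psi = np.zeros((6, 2))              # regressor of p = [k3 - k3_bar, k5 - k5_bar]
    Psi[2, 0] = -T * x_bar[2]
    Psi[4, 1] = -T * x_bar[4]
    E = T * Gamma                       # disturbance acts on the feed rates
    return A, Psi, E
```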


[Figure 4. The two parameters: nominal values (dotted line) and true values (dashed line), estimated by the adaptive observer (solid line).]

The true values of $\kappa_3$ and $\kappa_5$ are set, respectively, to 1.05 and 2.95, and sinusoidal perturbations of periods 25 and 60 and amplitudes 0.4 and 0.6 are added, respectively, to $\bar{u}_1$ and $\bar{u}_2$ from the time $t=200$. The noise covariances are set to $R = 2\cdot 10^{-6} I_p$ and $Q = 8\cdot 10^{-5} I_n$. The window size is $L=45$ (this value has been chosen by plotting the impulse responses of the linearized system and by evaluating the dominant time constant). The results of the simulation are presented in Figure 1 (state estimates, $\hat{x}_{k|k+1}$), Figure 2 (state estimation error, $|\hat{x}_{k|k+1}-x_k|$), Figure 3 (disturbance estimates, $\hat{d}_{k|k+2}$) and Figure 4 (parameter estimates, $\hat{p}_{|k}$).

Concerning the disturbance estimates, it is clear that, in this particular example, the sensitivity of the second component with respect to measurement noise is large compared with that of the first component. However, the periods and amplitudes of the two sinusoidal disturbances are well estimated.

Moreover, small biases are also present in the parameter estimates, and the tracking of the first component of the state worsens noticeably after the disturbances start to act on the system. The main explanation for this behavior is that linearization and discretization errors may have a mean value that is different from zero, thus violating the assumption on the noise $w_k$. The magnitude of these errors can only worsen when the working point of the system deviates from the nominal one, because the system matrices of the linearized model are not adapted to account for this change (for example, matrix $\Psi_k$ still depends on $\bar{x}$). As a consequence, the residuals $\{r_k\}$, which enter the estimation procedure, are also affected by modeling errors, whose effect is compensated by the algorithm at the cost of corrupting the disturbance and parameter estimates.

Hence, if the proposed adaptive observer is applied to linear systems obtained by linearization of non-linear ones, then its performance must clearly be expected to degrade as the amplitude of the disturbance terms and the severity of the non-linearities increase.

7. CONCLUSIONS

A general sliding window approach to the design of adaptive observers for linear time-variant stochastic systems with disturbances has been presented. It has been shown how to tackle the problem of the joint estimation of the unknown parameters and the disturbances, thus allowing the application of the method for both fault detection and adaptive estimation purposes. The inclusion of prior information on the unknown quantities is straightforward. The relation between disturbance and parameter estimation, on the one hand, and state estimation, on the other, has also been illustrated. In particular, it has been shown how state smoothers can be designed when popular conditions for the existence of state filters are not satisfied.

REFERENCES

1. Combastel C, Zhang Q. Robust fault diagnosis based on adaptive estimation and set-membership computations. Proceedings of the 6th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, Beijing, People's Republic of China, August 2006; 1273–1278.
2. Zhang Q. Adaptive observer for MIMO linear time varying systems. IEEE Transactions on Automatic Control 2002; AC-47(3):525–529.
3. Guyader A, Zhang Q. Adaptive observer for discrete time linear time varying systems. Proceedings of the 13th Symposium on System Identification, Rotterdam, Holland, August 2003; 1743–1748.
4. Kitanidis PK. Unbiased minimum-variance linear state estimation. Automatica 1987; 23:775–778.
5. Chen J, Patton RJ. Optimal filtering and robust fault diagnosis of stochastic systems with unknown disturbances. IEE Proceedings—Control Theory and Applications 1996; 143:31–36.
6. Darouach M, Zasadzinski M. Unbiased minimum variance estimation for systems with unknown exogenous inputs. Automatica 1997; 33(4):717–719.
7. Hou M, Patton RJ. Optimal filtering for systems with unknown inputs. IEEE Transactions on Automatic Control 1998; 43(3):445–449.
8. Nikoukhah R, Willsky AS, Levy BC. Kalman filtering and Riccati equations for descriptor systems. IEEE Transactions on Automatic Control 1992; AC-37(9):1325–1342.
9. Darouach M, Zasadzinski M, Onana AB, Nowakowsk S. Kalman filtering with unknown inputs via optimal state estimation of singular systems. International Journal of Systems Science 1995; 26(10):2015–2028.
10. Levy BC, Benveniste A, Nikoukhah R. High-level primitives for recursive maximum likelihood estimation. IEEE Transactions on Automatic Control 1996; AC-41(8):1125–1145.
11. Nikoukhah R, Campbell SL, Delebecque F. Kalman filtering for general discrete-time linear systems. IEEE Transactions on Automatic Control 1999; AC-44(10):1829–1839.
12. Germani A, Manes C, Palumbo P. Polynomial filtering for stochastic non-Gaussian descriptor systems. IEEE Transactions on Circuits and Systems 2004; 51(8):1561–1576.
13. Anderson BDO, Moore JB. Optimal Filtering. Prentice-Hall: Englewood Cliffs, NJ, 1979.
14. Kailath T, Sayed AH, Hassibi B. Linear Estimation. Prentice-Hall: Englewood Cliffs, NJ, 2000.
15. Kalman RE. A new approach to linear filtering and prediction problems. Journal of Basic Engineering—Transactions of the ASME 1960; 82(1):35–45.
16. Caines PE, Chan CW. Feedback between stationary stochastic processes. IEEE Transactions on Automatic Control 1975; AC-20(4):498–508.
17. Gevers MR, Anderson BDO. On jointly stationary feedback-free stochastic processes. IEEE Transactions on Automatic Control 1982; AC-27(2):431–436.


18. Jazwinski AH. Stochastic Processes and Filtering Theory. Academic Press: New York, 1970.
19. Delyon B. A note on uniform observability. IEEE Transactions on Automatic Control 2001; AC-46(8):1326–1327.
20. Horn RA, Johnson CR. Matrix Analysis. Cambridge University Press: Cambridge, 1985.
21. Perabò S, Zhang Q. Adaptive observer for linear time-variant stochastic systems with disturbances. Proceedings of the European Control Conference, Kos, Greece, July 2007.
22. Bjorck A. Numerical Methods for Least Squares Problems. SIAM: Philadelphia, PA, 1996.
23. Chow EY, Willsky AS. Analytical redundancy and the design of robust failure detection systems. IEEE Transactions on Automatic Control 1984; AC-29(7):603–614.
24. Gertler J. Fault Detection and Diagnosis in Engineering Systems. Marcel Dekker: New York, 1998.
25. Tornqvist D, Gustafsson F. Eliminating the initial state for the generalized likelihood ratio test. Proceedings of the 6th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, Beijing, People's Republic of China, August 2006; 643–648.
26. Basseville M, Nikiforov IV. Detection of Abrupt Changes: Theory and Application. Prentice-Hall: Englewood Cliffs, NJ, 1993.
27. Gustafsson F. Adaptive Filtering and Change Detection. Wiley: New York, 2001.
28. Prigogine I, Lefever R. Symmetry breaking instabilities in dissipative systems. Journal of Chemical Physics 1968; 48:1665–1700.
29. Bastin G, Levine J. On state accessibility in reaction systems. IEEE Transactions on Automatic Control 1993; AC-38(5):733–742.
