
ADAPTIVE LQG CONTROL OF INPUT-OUTPUT SYSTEMS—A COST-BIASED APPROACH∗

M. PRANDINI† AND M. C. CAMPI†

SIAM J. CONTROL OPTIM. © 2001 Society for Industrial and Applied Mathematics
Vol. 39, No. 5, pp. 1499–1519

Abstract. In this paper, we consider linear systems in input-output form and introduce a new adaptive linear quadratic Gaussian (LQG) control scheme which is shown to be self-optimizing. The identification algorithm incorporates a cost-biasing term, which favors the parameters with smaller LQG optimal cost, and a second term that aims at moderating the time-variability of the estimate. The corresponding closed-loop scheme is proven to be stable and to achieve an asymptotic LQG cost equal to the one obtained under complete knowledge of the true system (self-optimization).

The results of this paper extend in a nontrivial way previous results established along the cost-biased approach in other settings.

Key words. LQG adaptive control, least squares identification, cost-biased identification, self-optimality

AMS subject classifications. 93E20, 93E15, 93E24, 49L20

PII. S0363012999366369

1. Introduction. Since the appearance of the original contribution of Åström and Wittenmark [1], the analysis of self-tuning control systems has constituted a challenging topic for theorists working in the area of adaptive control. The first significant convergence results were obtained in the late 1970s for minimum-variance control schemes. In particular, a global convergence result for an adaptive control system based on the stochastic gradient algorithm was established in [13]. Extensions to the least squares (LS) algorithm are dealt with in [32] by introducing a suitable modification to the standard recursive least squares algorithm. Such a modification is in fact not necessary in order to achieve optimality [20].

The common result of all the above-mentioned contributions is that a minimum-variance self-tuning control system obtains under various operating conditions the same performance as the one achievable under complete knowledge of the true plant (self-optimization). It is important, however, to emphasize that the minimum-variance control law calls for the restrictive (and often unrealistic) assumption that the plant is minimum-phase. Extending these results to more general control techniques suitable for nonminimum-phase plants has attracted much attention in the last decade. The corresponding analysis, however, is far more complex.

It is by now well known (see, e.g., [21, 22, 18, 25, 33]) that the self-optimization result does not hold true for general control laws based on the minimization of multistep performance indexes. As a matter of fact, the interplay between identification and control in a certainty equivalence adaptive control scheme may result in the convergence of the parameter estimate to a parameterization different from the true one in absence of suitable excitation conditions (see, e.g., [5, 18, 2, 6]). When a cost criterion other than the output variance is considered, this identifiability problem results in a strictly suboptimal performance. In particular, the identifiability problem is significant in infinite-horizon linear quadratic Gaussian (LQG) control and, in fact, in [33] it is proven that for a state space system subject to Gaussian noise the set of the parameterizations leading to optimality of LQG control is strictly contained in the set of the potential convergence points.

∗Received by the editors December 20, 1999; accepted for publication July 28, 2000; published electronically January 25, 2001. Research supported by MURST under the projects on “Identification and Control of Industrial Systems” and “Synthesis of Adaptive and Robust Controllers.”
http://www.siam.org/journals/sicon/39-5/36636.html

†Dipartimento di Elettronica per l’Automazione, Università degli Studi di Brescia, Via Branze 38, 25123 Brescia, Italy ([email protected], [email protected]).

A first approach to achieve optimality consists of securing the parameter consistency by introducing suitable probing signals in the control system. The probing signals should be sufficiently exciting so that consistency is achieved and, at the same time, mild enough in order not to degrade the control system performance. In [8, 9, 10, 14, 28], this is obtained by a careful selection of an asymptotically vanishing dither noise. This approach is useful only in the case when noise injection is feasible.

A second approach, adopted in this paper, is based on the so-called cost-biased method originally introduced in [21]. In order to better focus on the basic idea underlying this approach and to highlight the main contributions given in the present paper, we proceed as follows: first we introduce the dynamic systems we consider; then we outline the cost-biased approach with specific reference to our class of systems; finally we put our results into perspective with the other existing results obtained along the cost-biased approach.

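We consider dynamic systems in input-output form described by the following equation:

$$A(\vartheta^\circ; q^{-1})\, y_t = B(\vartheta^\circ; q^{-1})\, u_{t-1} + n_t, \tag{1.1}$$

where $A(\vartheta^\circ; q^{-1}) = 1 - \sum_{i=1}^{n} a_i^\circ q^{-i}$ and $B(\vartheta^\circ; q^{-1}) = \sum_{i=1}^{m} b_i^\circ q^{-i+1}$ are polynomials in the unit-delay operator $q^{-1}$ and $\vartheta^\circ = [\, a_1^\circ \; a_2^\circ \; \ldots \; a_n^\circ \; b_1^\circ \; b_2^\circ \; \ldots \; b_m^\circ \,]^T$ is the system parameter vector. The control objective is to minimize the quadratic cost

$$\limsup_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} [\, y_t^2 + \beta u_t^2 \,], \tag{1.2}$$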

where the control weighting coefficient $\beta$ is strictly positive.

The basic idea of the cost-biased approach can be outlined as follows.

Suppose a standard LS algorithm is used for the identification of system (1.1) and let $\vartheta_t^{LS}$ be the corresponding LS estimate at time $t$. According to the certainty equivalence principle, the control action is obtained by the relation $u_t = u_t(\vartheta_t^{LS})$, where $u_t(\vartheta)$ indicates the optimal LQG control law for system (1.1) with parameter $\vartheta$. For ease of reference, let us introduce the symbol $S(\vartheta_1, \vartheta_2)$ for the control system formed by system (1.1) with parameter $\vartheta_1$ with the loop closed by $u_t = u_t(\vartheta_2)$. Since the identification is performed in closed-loop, it is expected that the behavior of $S(\vartheta^\circ, \vartheta_t^{LS})$ will be the same, at least in the long run, as the one of $S(\vartheta_t^{LS}, \vartheta_t^{LS})$. Then, the LQG cost for $S(\vartheta^\circ, \vartheta_t^{LS})$, i.e., the incurred cost, will be the same as the LQG cost for $S(\vartheta_t^{LS}, \vartheta_t^{LS})$. However, one should note that the latter configuration is optimal for the estimated model, whereas the incurred cost obviously cannot be lower than the optimal cost for the true system. From this, one concludes that the least squares algorithm has a natural tendency to return estimates with an optimal cost that is not smaller than the optimal cost associated with the true system and that, when it is strictly larger, the adaptive scheme attains a suboptimal performance.
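The claim that the incurred cost cannot fall below the optimal cost of the true system can be illustrated numerically. The following is a minimal sketch, not the authors' code, restricted to the scalar case $n = m = 1$, where the state is $x_t = y_t$ and the control law reduces to $u_t = L\, y_t$. The function names are hypothetical, the noise is assumed white with variance `sigma2`, and the stationary closed-loop cost $(1 + \beta L^2)\sigma^2 / (1 - f^2)$ with $f = a + bL$ is the standard formula for a stable first-order loop.

```python
import math

def riccati_scalar(a, b, beta, iters=500):
    """Fixed-point iteration of the scalar Riccati equation
    p = a^2 * (p - p^2 b^2 / (b^2 p + beta)) + 1, i.e. H = 1."""
    p = 1.0
    for _ in range(iters):
        p = a * a * (p - (p * b) ** 2 / (b * b * p + beta)) + 1.0
    return p

def lqg_gain(a, b, beta):
    """Scalar LQG feedback gain L, so that u_t = L * y_t."""
    p = riccati_scalar(a, b, beta)
    return -(b * p * a) / (b * b * p + beta)

def optimal_cost(a, b, beta, sigma2=1.0):
    """Optimal cost J = sigma^2 * p for the model (a, b)."""
    return sigma2 * riccati_scalar(a, b, beta)

def incurred_cost(true_ab, model_ab, beta, sigma2=1.0):
    """Average cost of the true system (a0, b0) when the loop is closed
    with the gain that is optimal for the model: the system S(true, model)."""
    a0, b0 = true_ab
    gain = lqg_gain(*model_ab, beta)
    f = a0 + b0 * gain          # closed-loop pole of the true system
    if abs(f) >= 1.0:
        return math.inf         # unstable loop: unbounded average cost
    return sigma2 * (1.0 + beta * gain * gain) / (1.0 - f * f)
```

With the true parameter used as the model, `incurred_cost` coincides with `optimal_cost`; with any other model it can only be larger or equal, which is the asymmetry the cost-biasing term is designed to exploit.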

In the cost-biased approach an extra term that favors parameters with smaller optimal cost is added to the LS identification cost. This extra term is selected with a twofold objective. On the one hand, it should be strong enough so that the optimal LQG cost associated with the estimated model is asymptotically not larger than the optimal cost for the true system. If, on the other hand, it is so mild that the closed-loop identification property $S(\vartheta^\circ, \vartheta_t^{LS}) = S(\vartheta_t^{LS}, \vartheta_t^{LS})$ is preserved, then one still has that the incurred cost is equal to the cost for $S(\vartheta_t^{LS}, \vartheta_t^{LS})$. From this, optimality of the adaptive control scheme is achieved.

The cost-biased approach has been successfully applied in a number of different settings. Controlled Markov chains with a finite parameter set are considered in [21]. The results of this paper have been extended to Markov chains with an infinite parameter set in [24] and to systems with a general state space but still with a finite parameter set in [19].

Linear systems in a state space representation are dealt with in [18] and [7]. In these papers, the restrictive assumption that the state is fully accessible is made. Moreover, it is assumed that the noise system affects all state variables. This assumption is crucial for the correct functioning of the proposed identification procedure. As a matter of fact, the presence of a full-range noise sheds light on the existing difference between the true system and the estimated model and this helps the identification task. In the paper [7], it is in fact shown that this mechanism is effective enough so as to counteract the biasing effect of the cost-biasing term, thus guaranteeing the closed-loop identification property. Unfortunately, the assumption that the noise is full-range is so restrictive that it cannot be applied to many situations of interest. In particular, a state space realization of the input-output system (1.1) does not satisfy this condition.

In the present paper, an optimal adaptive control scheme for system (1.1) still based on the cost-biasing idea is presented. Extending the cost-biased approach to systems such as (1.1) is important in that input-output systems are widely used in adaptive control applications. Moreover, assuming only that the input and output are measurable is much more realistic than assuming full state accessibility. As a side remark we also note that, in contrast with [18] and [7], our approach does not require the noise to be Gaussian.

The paper is organized as follows: in section 2, we describe the cost-biased adaptive LQG control scheme and recall some relevant properties of the standard LS estimates. The study of the cost-biased identification algorithm is presented in section 3. Section 4 is devoted to the analysis of the closed-loop stability and the characterization of the self-tuning LQG control performance. Finally, section 5 presents conclusions and suggestions for future research.

2. The cost-biased adaptive LQG control system.

2.1. The LQG optimal control problem. In this section, we summarize some known facts on infinite-horizon LQG control relevant for the subsequent developments. This is also useful in order to introduce the assumptions and the notations we shall use throughout the paper.

Consider the discrete time single input, single output (SISO) system (1.1) where signal $n_t$ is a stochastic disturbance precisely described in the following.

Assumption 2.1. $\{n_t\}$ is a martingale difference sequence with respect to a filtration $\{\mathcal{F}_t\}$, satisfying the following conditions:

1. $\sup_t E[\,|n_t|^p \mid \mathcal{F}_{t-1}] < \infty$ almost surely (a.s.) $\forall p > 0$;

2. $\lim_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} n_t^2 = \sigma^2 > 0$ a.s.

Note that Assumption 2.1 is satisfied when $\{n_t\}$ is an independent and identically distributed (i.i.d.) Gaussian sequence, but it includes many other situations.

We make the assumption on system (1.1) that $n > 0$ (nontrivial autoregressive part). Note that if $n = 0$, the trivial control law $u_t = 0$, $t \ge 0$, is obviously optimal irrespective of the value of $\vartheta^\circ$.

We further assume that system (1.1) belongs to a known set of stabilizable models according to the following.

Assumption 2.2. $\vartheta^\circ \in \Theta$, where $\Theta$ is a compact set such that $\Theta \subset \{\vartheta \in \mathbb{R}^{n+m} : q^{s} A(\vartheta; q^{-1})$ and $q^{s-1} B(\vartheta; q^{-1})$ have no unstable pole-zero cancellations$\}$, $s = \max\{n, m\}$ being the order of the system.

System (1.1) is initialized at time $t = 0$ with $y_t = u_{t-1} = 0$, $t \le 0$.

For the determination of an optimal control law, it is convenient to represent system (1.1) in a state space form such that the state is accessible and then apply the well-known solution to the optimal LQG control problem for full state accessible state space systems (see, e.g., [10], [3]).

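Defining $x_t := [\, y_t \; y_{t-1} \; \ldots \; y_{t-(n-1)} \; u_{t-1} \; u_{t-2} \; \ldots \; u_{t-(m-1)} \,]^T$, system (1.1) can be given the following state space representation of order $n + m - 1$:

$$\begin{cases} x_{t+1} = A(\vartheta^\circ)\, x_t + B(\vartheta^\circ)\, u_t + C\, n_{t+1}, \qquad x_0 = [\, 0 \; 0 \; \ldots \; 0 \,]^T, \\ y_t = H x_t \end{cases} \tag{2.1}$$

with matrices

$$A(\vartheta) = \begin{bmatrix}
a_1 & \cdots & a_{n-1} & a_n & b_2 & \cdots & b_{m-1} & b_m \\
1 & & & 0 & 0 & \cdots & \cdots & 0 \\
 & \ddots & & \vdots & \vdots & & & \vdots \\
 & & 1 & 0 & 0 & \cdots & \cdots & 0 \\
0 & \cdots & \cdots & 0 & 0 & \cdots & \cdots & 0 \\
0 & \cdots & \cdots & 0 & 1 & & & 0 \\
\vdots & & & \vdots & & \ddots & & \vdots \\
0 & \cdots & \cdots & 0 & & & 1 & 0
\end{bmatrix},$$

$$B(\vartheta) = [\, b_1 \; 0 \; \ldots \; 0 \; 1 \; 0 \; \ldots \; 0 \,]^T, \qquad C = [\, 1 \; 0 \; \ldots \; 0 \; 0 \; 0 \; \ldots \; 0 \,]^T, \qquad H = [\, 1 \; 0 \; \cdots \; 0 \; 0 \; \cdots \; 0 \,].$$

Here the first $n$ rows and columns refer to the output components of $x_t$ and the remaining $m - 1$ to the input components, so that the entry 1 of $B(\vartheta)$ sits in position $n + 1$.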
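As an illustration of the representation (2.1), the matrices can be assembled directly from the coefficient lists. The sketch below is an assumption-laden example rather than part of the paper: the function name `build_state_space` and the list-of-lists layout are hypothetical, and the state ordering is the one of $x_t$ above (outputs first, then past inputs).

```python
def build_state_space(a, b):
    """Return (A, B, C, H) for y_t = sum_i a_i y_{t-i} + sum_i b_i u_{t-i} + n_t.

    a = [a_1, ..., a_n] with n >= 1, b = [b_1, ..., b_m] with m >= 1.
    The state is x_t = [y_t, ..., y_{t-(n-1)}, u_{t-1}, ..., u_{t-(m-1)}].
    """
    n, m = len(a), len(b)
    dim = n + m - 1
    A = [[0.0] * dim for _ in range(dim)]
    # First row: y_{t+1} depends on past outputs and on u_{t-1}, ..., u_{t-(m-1)}.
    for j in range(n):
        A[0][j] = float(a[j])
    for j in range(1, m):
        A[0][n + j - 1] = float(b[j])
    # Shift register for the output part of the state.
    for i in range(1, n):
        A[i][i - 1] = 1.0
    # Row n (0-based) stays zero: u_t enters that component through B.
    # Shift register for the input part of the state.
    for i in range(n + 1, dim):
        A[i][i - 1] = 1.0
    B = [0.0] * dim
    B[0] = float(b[0])
    if m >= 2:
        B[n] = 1.0              # stores u_t as the first input component
    C = [1.0] + [0.0] * (dim - 1)
    H = [1.0] + [0.0] * (dim - 1)
    return A, B, C, H
```

For example, `build_state_space([0.5, -0.2], [1.0, 0.3])` produces a three-dimensional realization whose third row of `A` is zero.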

In this way, the LQG regulation problem for the system in input-output representation (1.1) is reformulated as a complete state information control problem where the performance index to be minimized is given by $\limsup_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} [\, x_t^T T x_t + \beta u_t^2 \,]$, where $T = H^T H \ge 0$ and $\beta > 0$.

Note that, in the case when $n > 1$ and $m > 1$, the state space representation (2.1) of system (1.1) is nonminimal (the order of system (1.1) is $s = \max\{n, m\}$, whereas the dimension of matrix $A(\vartheta^\circ)$ is $n + m - 1$). However, from the block triangular matrix structure of $A(\vartheta^\circ)$ it is easily seen that the added eigenvalues are identically equal to zero. Hence from Assumption 2.2 it follows that $(A(\vartheta^\circ), B(\vartheta^\circ))$ is stabilizable and $(A(\vartheta^\circ), H)$ is detectable and the standard approach based on the solution to a Riccati equation can be used to determine the control law.

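Specifically, the solution to the original LQG control problem has the following expression [10]:

$$u_t = S(\vartheta^\circ; q^{-1})\, y_t + R(\vartheta^\circ; q^{-1})\, u_t, \tag{2.2}$$

where $S(\vartheta^\circ; q^{-1}) = \sum_{i=0}^{n-1} s_i(\vartheta^\circ) q^{-i}$ and $R(\vartheta^\circ; q^{-1}) = \sum_{i=1}^{m-1} r_i(\vartheta^\circ) q^{-i}$, and coefficients $\{s_i(\vartheta^\circ)\}$ and $\{r_i(\vartheta^\circ)\}$ are computed as follows.

Set $L(\vartheta^\circ) := [\, s_0(\vartheta^\circ) \; s_1(\vartheta^\circ) \; \ldots \; s_{n-1}(\vartheta^\circ) \; r_1(\vartheta^\circ) \; \ldots \; r_{m-1}(\vartheta^\circ) \,]$. Then

$$L(\vartheta^\circ) = -\big(B(\vartheta^\circ)^T P(\vartheta^\circ) B(\vartheta^\circ) + \beta\big)^{-1} B(\vartheta^\circ)^T P(\vartheta^\circ) A(\vartheta^\circ), \tag{2.3}$$

where $P(\vartheta^\circ)$ is the unique positive semidefinite solution to the discrete time algebraic Riccati equation

$$P = A(\vartheta^\circ)^T \big[ P - P B(\vartheta^\circ) \big( B(\vartheta^\circ)^T P B(\vartheta^\circ) + \beta \big)^{-1} B(\vartheta^\circ)^T P \big] A(\vartheta^\circ) + H^T H.$$

Moreover, the optimal LQG cost is given by $J(\vartheta^\circ) = \sigma^2\, \mathrm{trace}(P(\vartheta^\circ) C C^T)$, a.s.

Remark 2.3. Since the positive semidefinite solution $P(\vartheta)$ to

$$P = A(\vartheta)^T \big[ P - P B(\vartheta) \big( B(\vartheta)^T P B(\vartheta) + \beta \big)^{-1} B(\vartheta)^T P \big] A(\vartheta) + H^T H \tag{2.4}$$

is analytic as a function of the parameter vector $\vartheta$ in the set $\mathcal{C} = \{\vartheta \in \mathbb{R}^{n+m} : q^{s} A(\vartheta; q^{-1})$ and $q^{s-1} B(\vartheta; q^{-1})$ have no unstable pole-zero cancellations$\}$ (see [12]), it is easily seen that $s_i(\vartheta)$, $r_i(\vartheta)$, and $J(\vartheta)$ are analytic functions of $\vartheta$, $\vartheta \in \mathcal{C}$, as well.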
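For a concrete parameter value, the gain (2.3) and the cost $J(\vartheta)$ can be approximated by iterating the Riccati map in (2.4) until it settles. The following is a rough sketch under stated assumptions, not the authors' procedure: the helper names are hypothetical, plain fixed-point iteration started from $H^T H$ is used in place of a dedicated Riccati solver, and the number of iterations is an arbitrary choice. Because the system is single input, $B^T P B + \beta$ is a scalar and no matrix inversion is needed.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def solve_riccati(A, B, H, beta, iters=2000):
    """Iterate P <- A^T [P - P B (B^T P B + beta)^{-1} B^T P] A + H^T H.

    A is a d x d list of rows; B (column) and H (row) are length-d lists.
    """
    d = len(A)
    At = [list(col) for col in zip(*A)]
    Q = [[H[i] * H[j] for j in range(d)] for i in range(d)]
    P = [row[:] for row in Q]
    for _ in range(iters):
        PB = [sum(P[i][k] * B[k] for k in range(d)) for i in range(d)]
        denom = sum(B[i] * PB[i] for i in range(d)) + beta   # scalar
        # P is symmetric, so B^T P is the transpose of P B.
        M = [[P[i][j] - PB[i] * PB[j] / denom for j in range(d)]
             for i in range(d)]
        P = mat_mul(mat_mul(At, M), A)
        P = [[P[i][j] + Q[i][j] for j in range(d)] for i in range(d)]
    return P

def lqg_gain_and_cost(A, B, C, H, beta, sigma2):
    """Return the gain row L of (2.3) and the cost J = sigma^2 trace(P C C^T)."""
    d = len(A)
    P = solve_riccati(A, B, H, beta)
    PB = [sum(P[i][k] * B[k] for k in range(d)) for i in range(d)]
    denom = sum(B[i] * PB[i] for i in range(d)) + beta
    L = [-sum(PB[i] * A[i][j] for i in range(d)) / denom for j in range(d)]
    J = sigma2 * sum(C[i] * P[i][j] * C[j] for i in range(d) for j in range(d))
    return L, J
```

In the scalar case $a_1 = 0.5$, $b_1 = 1$, $\beta = 1$ the Riccati equation reduces to $p^2 - 0.25\,p - 1 = 0$, which gives a closed-form value against which the iteration can be checked.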

2.2. The cost-biased identification algorithm. Introducing the observation vector $\varphi_t := [\, y_t \; \ldots \; y_{t-(n-1)} \; u_t \; \ldots \; u_{t-(m-1)} \,]^T$, system (1.1) can be given the regression-like form

$$y_t = \varphi_{t-1}^T \vartheta^\circ + n_t, \tag{2.5}$$

and the LS identification index for the estimate of $\vartheta^\circ$ is [26]

$$V_t(\vartheta) = \sum_{s=1}^{t} (y_s - \varphi_{s-1}^T \vartheta)^2. \tag{2.6}$$
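The regression form (2.5) and the index (2.6) translate almost literally into code. The sketch below is only illustrative: the names `regressor` and `ls_cost` are hypothetical, the signals are assumed to be stored as lists indexed from time 0, and samples before time 0 are taken as zero in line with the initialization of system (1.1).

```python
def regressor(y, u, t, n, m):
    """phi_t = [y_t, ..., y_{t-(n-1)}, u_t, ..., u_{t-(m-1)}].

    Samples with a time index outside the stored range are taken as zero.
    """
    def sample(seq, k):
        return seq[k] if 0 <= k < len(seq) else 0.0
    return ([sample(y, t - i) for i in range(n)]
            + [sample(u, t - i) for i in range(m)])

def ls_cost(y, u, theta, n, m, t):
    """LS index V_t(theta) = sum_{s=1}^{t} (y_s - phi_{s-1}^T theta)^2."""
    total = 0.0
    for s in range(1, t + 1):
        phi = regressor(y, u, s - 1, n, m)
        prediction = sum(p * th for p, th in zip(phi, theta))
        total += (y[s] - prediction) ** 2
    return total
```

On noise-free data generated by a given parameter vector, `ls_cost` evaluated at that vector is zero, while any parameter producing a different prediction gives a strictly positive value.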

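In the theorem below, we recall a fundamental result for the LS estimate proven in [23, Theorem 1].

Theorem 2.4. Suppose that $u_t$ is $\mathcal{F}_t$-measurable. Then

$$(\vartheta^\circ - \vartheta_t^{LS})^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T \, (\vartheta^\circ - \vartheta_t^{LS}) = O\Big( \log \lambda_{\max}\Big( \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T \Big) \Big) \quad \text{a.s.} \tag{2.7}$$

In particular, this implies that under the conditions

(i) $\lambda_{\min}\Big( \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T \Big) \to \infty$ a.s.,

(ii) $\log \lambda_{\max}\Big( \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T \Big) = o\Big( \lambda_{\min}\Big( \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T \Big) \Big)$ a.s.,

the LS estimate is consistent.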

In adaptive control, identification is performed in closed-loop. Therefore, one cannot ensure the satisfaction of conditions (i) and (ii) and the true parameter vector is generally not consistently estimated. Nevertheless, property (2.7) still provides a valuable bound on the discrepancy between the estimated parameter and the true parameter. We call this property “closed-loop identification property” to emphasize that it holds even in closed-loop. On the other hand, as discussed in section 1, the LS identification algorithm generally provides estimates with an optimal LQG cost larger than the optimal cost associated with the true system. This is the reason why optimality of an LS-based adaptive control scheme is not guaranteed.

Motivated by these considerations, we introduce a cost-biased identification algorithm with the twofold objective of preserving the LS property (2.7) and forcing the estimates to lie asymptotically in the parameter region with an optimal cost not larger than the optimal cost associated with the true system.

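Consider the estimate $\vartheta_t$ computed through the following algorithm:

$$\vartheta_t = \begin{cases} \arg\min_{\vartheta \in \Theta} D_t(\vartheta) & \text{if } t = t_i,\ i = 0, 1, 2, \ldots, \\ \vartheta_{t-1} & \text{otherwise,} \end{cases} \tag{2.8}$$

where the time instants $\{t_i\}$ are obtained by the recursive equation $t_{i+1} = t_i + T_i$ initialized with $t_0 = 0$ and the cost-biased identification index $D_t(\vartheta)$ is given by

$$D_t(\vartheta) = V_t(\vartheta) + \alpha_t J(\vartheta) + \gamma_t \|\vartheta - \vartheta_{t-1}\|, \qquad \vartheta_{-1} = 0, \tag{2.9}$$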

where $V_t(\vartheta)$ is the LS cost (2.6) and $J(\vartheta)$ is the optimal LQG cost for system (1.1) with parameter $\vartheta$. The identification algorithm is completely defined by specifying the sequences of

• freezing time intervals $\{T_i\}$,
• cost-biasing weights $\{\alpha_t\}$,
• friction parameters $\{\gamma_t\}$.

We discuss hereafter the meaning of these parameters, while their actual choice is postponed to the following section.

The freezing parameter $T_i$ is used to ensure stability of the closed-loop system. Since the parameter estimate changes with time and the control law is tuned to such an estimate, the adaptive control system is time-varying. On the other hand, it is well known that guaranteeing a stability property at each time instant for the “frozen dynamics” does not imply that the overall time-varying system has a stable dynamics. This problem can be solved by updating the estimate at a slower rate than the updating of the system variables, and this is achieved by a suitable choice of $T_i$. This same approach is exploited, for instance, in [17], [27], and [29].

The cost-biasing term $\alpha_t J(\vartheta)$ is introduced with the objective of penalizing those parameterizations with high optimal LQG cost. The weight $\alpha_t$ has to be appropriately selected so as to balance the contrasting objectives of preserving the closed-loop identification property (2.7) and forcing the asymptotic estimate to correspond to a model with value of the optimal LQG performance index not larger than the optimal performance value for the true system.

Finally, the friction term $\gamma_t \|\vartheta - \vartheta_{t-1}\|$ is introduced so as to avoid the estimate $\vartheta_t$ being subject to undesired jumps in the time instants $t_i$ when it is updated. This is necessary to prove optimality of the adaptive scheme.
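To make the three terms of the index (2.9) tangible, the sketch below evaluates $D_t(\vartheta)$ for a first-order model ($n = m = 1$, $\vartheta = (a, b)$) and minimizes it over a finite list of candidates. This is an illustration under several assumptions and not the method of the paper: the finite candidate list stands in for the minimization over the compact set $\Theta$, the noise variance `sigma2` is treated as known, the friction term uses the Euclidean norm, and all function names are hypothetical.

```python
import math

def optimal_cost(a, b, beta, sigma2, iters=500):
    """J(theta) = sigma^2 * p, with p from the scalar Riccati iteration."""
    p = 1.0
    for _ in range(iters):
        p = a * a * (p - (p * b) ** 2 / (b * b * p + beta)) + 1.0
    return sigma2 * p

def ls_cost(y, u, theta, t):
    """V_t(theta) for n = m = 1, where phi_{s-1} = [y_{s-1}, u_{s-1}]."""
    a, b = theta
    return sum((y[s] - a * y[s - 1] - b * u[s - 1]) ** 2
               for s in range(1, t + 1))

def cost_biased_estimate(y, u, t, candidates, previous,
                         alpha, gamma, beta, sigma2):
    """Minimize D_t = V_t + alpha * J + gamma * ||theta - previous||
    over a finite candidate list (a stand-in for the set Theta)."""
    def index(theta):
        friction = math.hypot(theta[0] - previous[0], theta[1] - previous[1])
        return (ls_cost(y, u, theta, t)
                + alpha * optimal_cost(theta[0], theta[1], beta, sigma2)
                + gamma * friction)
    return min(candidates, key=index)
```

With `alpha = gamma = 0` the rule reduces to plain least squares over the candidates; increasing `alpha` pulls the estimate toward candidates with a smaller optimal cost, and increasing `gamma` pulls it toward the previous estimate.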


3. Selection of $\{T_i\}$, $\{\alpha_t\}$, $\{\gamma_t\}$ and properties of $\vartheta_t$. The adaptive control law is given by the optimal control law (2.2) with the estimate $\vartheta_t$ in place of $\vartheta^\circ$ (certainty equivalence principle):

$$u_t = S(\vartheta_t; q^{-1})\, y_t + R(\vartheta_t; q^{-1})\, u_t.$$

The system

$$\begin{cases} y_{t+1} = [1 - A(\vartheta_t; q^{-1})]\, y_{t+1} + B(\vartheta_t; q^{-1})\, u_t, \\ u_t = S(\vartheta_t; q^{-1})\, y_t + R(\vartheta_t; q^{-1})\, u_t \end{cases} \tag{3.1}$$

is then given the name of time-varying estimated system. We will select $T_i$ so as to stabilize system (3.1) and later on in section 4 we shall see that this leads to the stability of the true closed-loop system. Letting $x_t := [\, y_t \; \ldots \; y_{t-(n-1)} \; u_{t-1} \; \ldots \; u_{t-(m-1)} \,]^T$, this system can be given the state space representation

$$x_{t+1} = F(\vartheta_t)\, x_t \tag{3.2}$$

with

$$F(\vartheta) = A(\vartheta) + B(\vartheta) L(\vartheta), \tag{3.3}$$

where matrices $A(\vartheta)$, $B(\vartheta)$, and $L(\vartheta)$ have been introduced in section 2.1.

Choose now a constant $\mu < 1$ (contraction constant). The time interval $T_i$ is then defined as

$$T_i := \inf\{\tau \in \mathbb{Z}_+ : \|F(\vartheta_{t_i})^{\tau}\| \le \mu\} \tag{3.4}$$

(note that such a $T_i$ exists since $\vartheta_{t_i}$ belongs to $\Theta$ and therefore corresponds to a stabilizable system). In this way, the time-varying system (3.1) is kept constant until its dynamics is contracted by a factor $\mu$, thus guaranteeing its stability. The following proposition makes this precise.

Proposition 3.1. The autonomous estimated system $x_{t+1} = F(\vartheta_t)\, x_t$ is a.s. exponentially stable, uniformly in time.

Proof. The proof is given in the appendix.
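The freezing interval (3.4) can be computed by raising the closed-loop matrix to successive powers until its norm drops below $\mu$. The sketch below is a hedged illustration: the function names are hypothetical, $\mathbb{Z}_+$ is read as the positive integers, and the Frobenius norm is used as an easily computed stand-in for whichever matrix norm is intended in (3.4). Since the Frobenius norm upper-bounds the spectral norm, the resulting interval is at worst conservative with respect to the spectral-norm version.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def frobenius_norm(X):
    return math.sqrt(sum(v * v for row in X for v in row))

def freezing_time(F, mu, max_tau=10000):
    """Smallest tau >= 1 with ||F^tau|| <= mu (Frobenius norm assumed)."""
    power = [row[:] for row in F]
    for tau in range(1, max_tau + 1):
        if frobenius_norm(power) <= mu:
            return tau
        power = mat_mul(power, F)
    raise ValueError("no contraction within max_tau steps; is F stable?")
```

For a scalar closed loop with pole 0.5 and $\mu = 0.2$, three steps are needed because $0.5^3 = 0.125$ is the first power not exceeding 0.2.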

The choice of {αt} and {γt} is discussed in the next theorem.

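Theorem 3.2. Suppose that $u_t$ is $\mathcal{F}_t$-measurable. Given $\delta > 0$, select

$$\alpha_t := \log^{1+\delta} \lambda_{\max}\Big( \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T \Big) \tag{3.5}$$

and $\{\gamma_t\}$ to be a positive diverging sequence of real numbers satisfying $\gamma_t = o(\alpha_t)$. Then,

(i) $(\vartheta^\circ - \vartheta_{t_i})^T \sum_{s=1}^{t_i} \varphi_{s-1} \varphi_{s-1}^T \, (\vartheta^\circ - \vartheta_{t_i}) = O\Big( \log^{1+\delta} \lambda_{\max}\Big( \sum_{s=1}^{t_i} \varphi_{s-1} \varphi_{s-1}^T \Big) \Big)$ a.s.,

(ii) $\limsup_{t\to\infty} J(\vartheta_t) \le J(\vartheta^\circ)$ a.s.,

(iii) if $\sum_{t=1}^{N} \|\varphi_{t-1}\|^2 = O(N)$ a.s., then $\sum_{t=1}^{N} \|\vartheta_t - \vartheta_{t-1}\| = o(N)$ a.s.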


Proof. The proof is given in the appendix.

According to (3.5), $\{\alpha_t\}$ is chosen to be an increasing sequence of real numbers adaptively selected on the basis of the data generated by the controlled system. According to result (ii), this selection is effective in pushing the estimate towards the region where the optimal LQG cost is not larger than $J(\vartheta^\circ)$. In turn, result (i) shows that the closed-loop identification property (2.7) is preserved with two slight differences: (1) the exponent $1 + \delta$ appears in the right-hand side, (2) the rate of divergence in point (i) of Theorem 3.2 concerns only the time instants $t_i$ when the estimate $\vartheta_t$ is updated, while the original closed-loop identification property refers to all $t$'s.
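The weight (3.5) depends on the data only through the largest eigenvalue of the accumulated regressor matrix. A minimal sketch of its computation follows, with several assumptions made explicit: the function names are hypothetical, the eigenvalue is approximated by power iteration started from the all-ones vector (which could fail if that vector happened to be orthogonal to the dominant eigenvector), and the code insists on $\lambda_{\max} > 1$ so that the logarithm is positive and the real power $1 + \delta$ is well defined.

```python
import math

def lambda_max(M, iters=500):
    """Largest eigenvalue of a symmetric positive semidefinite matrix,
    approximated by power iteration with a Rayleigh quotient."""
    d = len(M)
    v = [1.0] * d
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        Mv = [sum(M[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(v[i] * Mv[i] for i in range(d))
    return lam

def cost_bias_weight(phis, delta):
    """alpha_t = (log lambda_max(sum_s phi_s phi_s^T))^(1 + delta)."""
    d = len(phis[0])
    M = [[sum(p[i] * p[j] for p in phis) for j in range(d)] for i in range(d)]
    lam = lambda_max(M)
    if lam <= 1.0:
        raise ValueError("lambda_max must exceed 1 for a positive weight")
    return math.log(lam) ** (1.0 + delta)
```

A friction sequence compatible with Theorem 3.2 then has to diverge more slowly than this weight; any specific choice beyond $\gamma_t = o(\alpha_t)$ is left open here.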

Before ending this section, we state a proposition regarding the estimation error

$$e_t := \varphi_t^T [\vartheta^\circ - \vartheta_t]. \tag{3.6}$$

The technical proof of this proposition is given in the appendix and is obtained by a suitable manipulation of the sole result (i) in Theorem 3.2.

Proposition 3.3. The estimation error $e_t = \varphi_t^T [\vartheta^\circ - \vartheta_t]$ satisfies the following equation:

$$\sum_{t=0,\ t \notin B_N}^{N} |e_t|^p = o\Big( \sum_{t=0}^{N} \|\varphi_t\|^p + N \Big), \quad p \ge 2, \quad \text{a.s.},$$

where $B_N$ is a set of instant points which depends on $N$, whose cardinality is bounded: $|B_N| \le C_B$ $\forall N$.

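4. Stability and optimality. The closed-loop system

$$\begin{cases} y_{t+1} = [1 - A(\vartheta^\circ; q^{-1})]\, y_{t+1} + B(\vartheta^\circ; q^{-1})\, u_t + n_{t+1}, \\ u_t = S(\vartheta_t; q^{-1})\, y_t + R(\vartheta_t; q^{-1})\, u_t \end{cases} \tag{4.1}$$

can be represented as a variation system with respect to the so-called estimated system of (3.1) as follows:

$$\begin{cases} y_{t+1} = [1 - A(\vartheta_t; q^{-1})]\, y_{t+1} + B(\vartheta_t; q^{-1})\, u_t + n_{t+1} + e_t, \\ u_t = S(\vartheta_t; q^{-1})\, y_t + R(\vartheta_t; q^{-1})\, u_t, \end{cases} \tag{4.2}$$

where $e_t$ is defined in (3.6). The uniform stability property of the estimated system (3.1) (Proposition 3.1) and the property of $e_t$ stated in Proposition 3.3 are exploited in the next theorem to prove stability of system (4.1).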

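Theorem 4.1 ($L^p$-stability). The adaptive LQG control scheme

$$\begin{cases} y_{t+1} = [1 - A(\vartheta^\circ; q^{-1})]\, y_{t+1} + B(\vartheta^\circ; q^{-1})\, u_t + n_{t+1}, \\ u_t = S(\vartheta_t; q^{-1})\, y_t + R(\vartheta_t; q^{-1})\, u_t \end{cases}$$

is $L^p$-stable: $\limsup_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} [\, |y_t|^p + |u_t|^p \,] < \infty$ a.s. $\forall p > 0$.

Proof. Fix a time point $N > 0$ and an integer $d \ge 1$.

From Proposition 3.3, there exists a set of instant points $B_{N-1}$ such that

$$\sum_{t=0,\ t \notin B_{N-1}}^{N-1} e_t^{2^d} = o\Big( \sum_{t=0}^{N-1} \|\varphi_t\|^{2^d} + N \Big) \quad \text{a.s.} \tag{4.3}$$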


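In view of representation (4.2) of system (4.1), it is easily seen that the state vector

$$x_t = [\, y_t \; \ldots \; y_{t-(n-1)} \; u_{t-1} \; \ldots \; u_{t-(m-1)} \,]^T$$

is governed by the equation

\begin{align}
x_{t+1} &= F^\circ(\vartheta_t)\, x_t + C n_{t+1} \tag{4.4} \\
&= F(\vartheta_t)\, x_t + C[e_t + n_{t+1}], \tag{4.5}
\end{align}

where $F^\circ(\vartheta) = A(\vartheta^\circ) + B(\vartheta^\circ) L(\vartheta)$; $A(\vartheta^\circ)$, $B(\vartheta^\circ)$, $L(\vartheta)$, and $C$ are defined in section 2.1, and $F(\vartheta)$ is given in (3.3).

For the following derivations, it is convenient to use representation (4.4) in the time instants $t \in B_{N-1}$ and representation (4.5) for $t \notin B_{N-1}$, thus finally leading to

$$x_{t+1} = \begin{cases} F^\circ(\vartheta_t)\, x_t + C n_{t+1}, & t \in B_{N-1}, \\ F(\vartheta_t)\, x_t + C[e_t + n_{t+1}], & t \notin B_{N-1}. \end{cases} \tag{4.6}$$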

Note now that since $\vartheta_t$ belongs to the compact set $\Theta$ and $F^\circ(\vartheta)$ is a continuous function of $\vartheta$, $\vartheta \in \Theta$, we then have that $\|F^\circ(\vartheta_t)\|$ is uniformly bounded. From this fact and the uniform exponential stability of the autonomous system $x_{t+1} = F(\vartheta_t)\, x_t$ (Proposition 3.1), and the fact that $|B_{N-1}| \le C_B$ $\forall N$ (see Proposition 3.3), it is easy to show that the state vector $x_t$ generated by system (4.6) can be bounded as follows:

$$\|x_t\| \le k_1 \Big\{ \sum_{i=1}^{t} \nu^{t-i} |n_i| + \sum_{i=0,\ i \notin B_{N-1}}^{t-1} \nu^{t-i} |e_i| \Big\}, \quad t \le N,$$

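where $k_1$ and $\nu \in (0, 1)$ are suitable constants. We now have

$$\begin{aligned}
\Big\{ k_1 \Big[ \sum_{i=1}^{t} \nu^{t-i} |n_i| + \sum_{i=0,\ i \notin B_{N-1}}^{t-1} \nu^{t-i} |e_i| \Big] \Big\}^{2^d}
&\le k_1^{2^d} \Big\{ \Big[ \sum_{i=1}^{t} \nu^{t-i} |n_i| + \sum_{i=0,\ i \notin B_{N-1}}^{t-1} \nu^{t-i} |e_i| \Big]^2 \Big\}^{2^{d-1}} \\
&\le k_1^{2^d} \Big\{ 2 \Big[ \sum_{i=1}^{t} \nu^{\frac{t-i}{2}} \big( \nu^{\frac{t-i}{2}} |n_i| \big) \Big]^2 + 2 \Big[ \sum_{i=0,\ i \notin B_{N-1}}^{t-1} \nu^{\frac{t-i}{2}} \big( \nu^{\frac{t-i}{2}} |e_i| \big) \Big]^2 \Big\}^{2^{d-1}} \\
&\le k_1^{2^d} \Big\{ 2 \sum_{i=1}^{t} \nu^{t-i} \sum_{i=1}^{t} \nu^{t-i} n_i^2 + 2 \sum_{i=0,\ i \notin B_{N-1}}^{t-1} \nu^{t-i} \sum_{i=0,\ i \notin B_{N-1}}^{t-1} \nu^{t-i} e_i^2 \Big\}^{2^{d-1}} \\
&\le k_1^{2^d} \Big( \frac{2}{1 - \nu} \Big)^{2^{d-1}} \Big\{ \sum_{i=1}^{t} \nu^{t-i} n_i^2 + \sum_{i=0,\ i \notin B_{N-1}}^{t-1} \nu^{t-i} e_i^2 \Big\}^{2^{d-1}}.
\end{aligned}$$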
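The chain of inequalities above rests on three elementary facts, spelled out here for a generic nonnegative sequence $\{c_i\}$ standing for either $\{|n_i|\}$ or $\{|e_i|\}$; the symbols $a$, $b$, and $c_i$ are introduced only for this illustration.

```latex
% (1) Splitting the square of a sum of two nonnegative terms:
(a + b)^2 \le 2a^2 + 2b^2 .

% (2) Schwarz inequality applied to the factorization
%     \nu^{t-i} c_i = \nu^{(t-i)/2} \cdot \nu^{(t-i)/2} c_i :
\Big( \sum_{i=1}^{t} \nu^{\frac{t-i}{2}} \, \nu^{\frac{t-i}{2}} c_i \Big)^2
  \le \Big( \sum_{i=1}^{t} \nu^{t-i} \Big) \Big( \sum_{i=1}^{t} \nu^{t-i} c_i^2 \Big) .

% (3) Geometric series bound, valid for 0 < \nu < 1:
\sum_{i=1}^{t} \nu^{t-i} = \sum_{j=0}^{t-1} \nu^{j} \le \frac{1}{1 - \nu} ,
\qquad
\sum_{i=0}^{t-1} \nu^{t-i} = \sum_{j=1}^{t} \nu^{j} \le \frac{1}{1 - \nu} .
```

Each application halves the outer exponent (from $2^d$ to $2^{d-1}$) while squaring the terms inside the sums, which is why repeating the step $d$ times brings the exponent $2^d$ onto $n_i$ and $e_i$.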

Iterating this same equation $d$ times, we then obtain

$$\|x_t\|^{2^d} \le k_2 \Big\{ \sum_{i=1}^{t} \nu^{t-i} n_i^{2^d} + \sum_{i=0,\ i \notin B_{N-1}}^{t-1} \nu^{t-i} e_i^{2^d} \Big\}, \quad t \le N, \tag{4.7}$$


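$k_2$ being a suitable constant, from which we finally get

$$\frac{1}{N} \sum_{t=1}^{N} \|x_t\|^{2^d} \le k_3 \Big\{ \frac{1}{N} \sum_{t=1}^{N} n_t^{2^d} + \frac{1}{N} \sum_{t=0,\ t \notin B_{N-1}}^{N-1} e_t^{2^d} \Big\}, \tag{4.8}$$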

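where $k_3$ is a suitable constant, independent of $N$.

We next bound the two terms in the right-hand side of (4.8).

The term $\frac{1}{N} \sum_{t=1}^{N} n_t^{2^d}$ is handled as follows. Define $v_t := n_t^{2^d} - E[n_t^{2^d} \mid \mathcal{F}_{t-1}]$. Then $\{v_t\}$ is a martingale difference satisfying

$$\sum_{t=1}^{\infty} \frac{1}{t^2} E[v_t^2 \mid \mathcal{F}_{t-1}] \le \sum_{t=1}^{\infty} \frac{1}{t^2} E[n_t^{2^{d+1}} \mid \mathcal{F}_{t-1}] < \infty,$$

due to Assumption 2.1. By applying Theorem 2.18 in [16], we then conclude that $\frac{1}{N} \sum_{t=0}^{N-1} \big[ n_t^{2^d} - E[n_t^{2^d} \mid \mathcal{F}_{t-1}] \big]$ tends to zero, a.s. Since $\frac{1}{N} \sum_{t=0}^{N-1} E[n_t^{2^d} \mid \mathcal{F}_{t-1}]$ is bounded by Assumption 2.1, we finally have $\limsup_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} n_t^{2^d} < \infty$, a.s.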

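The term $\frac{1}{N} \sum_{t=0,\ t \notin B_{N-1}}^{N-1} e_t^{2^d}$ is immediately bounded by means of (4.3) and the final bound for $\frac{1}{N} \sum_{t=1}^{N} \|x_t\|^{2^d}$ is obtained:

$$\frac{1}{N} \sum_{t=1}^{N} \|x_t\|^{2^d} = O(1) + o\Big( \frac{1}{N} \sum_{t=0}^{N-1} \|\varphi_t\|^{2^d} \Big) \quad \text{a.s.}$$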

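Since $\frac{1}{N} \sum_{t=0}^{N-1} \|\varphi_t\|^{2^d} \le \frac{1}{N} \sum_{t=0}^{N} \|x_t\|^{2^d}$, this implies that $\frac{1}{N} \sum_{t=0}^{N-1} \|\varphi_t\|^{2^d}$ remains bounded. Then, the thesis immediately follows from the arbitrariness of $d$ and the fact that $\frac{1}{N} \sum_{t=0}^{N-1} \|\varphi_t\|^{p} \le \frac{1}{N} \sum_{t=0}^{N-1} [\, \|\varphi_t\|^{2^d} + 1 \,]$, $p \le 2^d$.

In the next theorem we show that the LQG adaptive control scheme is self-optimizing.

Theorem 4.2 (optimality). The adaptive LQG control scheme

$$\begin{cases} y_{t+1} = [1 - A(\vartheta^\circ; q^{-1})]\, y_{t+1} + B(\vartheta^\circ; q^{-1})\, u_t + n_{t+1}, \\ u_t = S(\vartheta_t; q^{-1})\, y_t + R(\vartheta_t; q^{-1})\, u_t \end{cases}$$

is self-optimizing: $\limsup_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} [\, y_t^2 + \beta u_t^2 \,] = J(\vartheta^\circ)$ a.s.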

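Proof. We start by showing that $x_t := [\, y_t \; \ldots \; y_{t-(n-1)} \; u_{t-1} \; \ldots \; u_{t-(m-1)} \,]^T$ satisfies the following equation:

$$\|x_t\|^p = o(t), \quad \forall p > 0, \quad \text{a.s.} \tag{4.9}$$

This condition will be useful in the subsequent derivations. By contradiction, suppose that there exist $\{t_k\}_{k \ge 0}$ and a real number $\eta > 0$, such that $\|x_{t_k}\| > \eta\, t_k$ $\forall k$. Then

$$\limsup_{N\to\infty} \frac{1}{N} \sum_{t=1}^{N} \|x_t\|^{1+p} \ge \limsup_{k\to\infty} \frac{1}{t_k} \|x_{t_k}\|^{1+p} \ge \limsup_{k\to\infty} \frac{1}{t_k}\, \eta^{1+p}\, t_k^{1+p} = \infty,$$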

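which contradicts Theorem 4.1.

Observe now that the dynamic programming equation for the estimated model $x_{t+1} = A(\vartheta_t)\, x_t + B(\vartheta_t)\, u_t + C n_{t+1}$ reads

$$J(\vartheta_t) + x_t^T P(\vartheta_t)\, x_t = x_t^T T x_t + \beta u_t^2 + E\big[ (A(\vartheta_t) x_t + B(\vartheta_t) u_t + C n_{t+1})^T P(\vartheta_t) (A(\vartheta_t) x_t + B(\vartheta_t) u_t + C n_{t+1}) \mid \mathcal{F}_t \big], \tag{4.10}$$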


where $P(\vartheta)$ is the solution to the Riccati equation (2.4). By (2.1) and the definition (3.6), $x_{t+1}$ can be given the following expression:

$$x_{t+1} = A(\vartheta_t)\, x_t + B(\vartheta_t)\, u_t + C n_{t+1} + C e_t. \tag{4.11}$$

Substituting (4.11) in (4.10), we then get

$$J(\vartheta_t) + x_t^T P(\vartheta_t)\, x_t = x_t^T T x_t + \beta u_t^2 + E\big[ (x_{t+1} - C e_t)^T P(\vartheta_t) (x_{t+1} - C e_t) \mid \mathcal{F}_t \big],$$

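from which

$$\begin{aligned}
\frac{1}{N} \sum_{t=0}^{N-1} J(\vartheta_t) - \frac{1}{N} \sum_{t=0}^{N-1} [\, x_t^T T x_t + \beta u_t^2 \,]
= {} & - \frac{1}{N} \sum_{t=0}^{N-1} \big[ x_t^T P(\vartheta_t)\, x_t - E[ x_{t+1}^T P(\vartheta_{t+1})\, x_{t+1} \mid \mathcal{F}_t ] \big] \\
& + \frac{1}{N} \sum_{t=0}^{N-1} E\big[ x_{t+1}^T (P(\vartheta_t) - P(\vartheta_{t+1}))\, x_{t+1} \mid \mathcal{F}_t \big] \\
& + \frac{1}{N} \sum_{t=0}^{N-1} E\big[ e_t^T C^T P(\vartheta_t) C e_t \mid \mathcal{F}_t \big] \\
& - 2\, \frac{1}{N} \sum_{t=0}^{N-1} E\big[ x_{t+1}^T P(\vartheta_t) C e_t \mid \mathcal{F}_t \big].
\end{aligned} \tag{4.12}$$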

From property (ii) in Theorem 3.2, we get $\limsup_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} J(\vartheta_t) \le J(\vartheta^\circ)$ a.s. Therefore, the thesis will be proved if we show that all the terms in the right-hand side of (4.12) tend to zero as $N \to \infty$. We shall study each term separately.

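First term:

$$\begin{aligned}
\frac{1}{N} \sum_{t=0}^{N-1} \big[ x_t^T P(\vartheta_t)\, x_t - E[ x_{t+1}^T P(\vartheta_{t+1})\, x_{t+1} \mid \mathcal{F}_t ] \big]
= {} & - \frac{1}{N} x_N^T P(\vartheta_N)\, x_N + \frac{1}{N} x_0^T P(\vartheta_0)\, x_0 \\
& + \frac{1}{N} \sum_{t=0}^{N-1} \big[ x_{t+1}^T P(\vartheta_{t+1})\, x_{t+1} - E[ x_{t+1}^T P(\vartheta_{t+1})\, x_{t+1} \mid \mathcal{F}_t ] \big].
\end{aligned}$$

The term $\frac{1}{N} x_0^T P(\vartheta_0)\, x_0$ equals zero. As for $\frac{1}{N} x_N^T P(\vartheta_N)\, x_N$, observe that

$$\frac{1}{N} x_N^T P(\vartheta_N)\, x_N \le k_1 \frac{1}{N} \|x_N\|^2,$$

$k_1$ being a suitable constant, since $P(\vartheta)$ is uniformly bounded on the compact set $\Theta$ (see Remark 2.3). Therefore, from (4.9) we get

$$\lim_{N\to\infty} \frac{1}{N} x_N^T P(\vartheta_N)\, x_N = 0.$$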

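Define $w_t := x_{t+1}^T P(\vartheta_{t+1})\, x_{t+1} - E[ x_{t+1}^T P(\vartheta_{t+1})\, x_{t+1} \mid \mathcal{F}_t ]$. Then $\{w_t\}$ is a martingale difference. Hence, $\frac{1}{N} \sum_{t=0}^{N-1} w_t$ asymptotically vanishes if $\sum_{t=0}^{\infty} \frac{1}{t^2} E[w_{t+1}^2 \mid \mathcal{F}_t] < \infty$ (see Theorem 2.18 in [16]). We have

$$E[w_{t+1}^2 \mid \mathcal{F}_t] \le E\big[ (x_{t+1}^T P(\vartheta_{t+1})\, x_{t+1})^2 \mid \mathcal{F}_t \big] \le k_2\, E\big[ \|x_{t+1}\|^4 \mid \mathcal{F}_t \big] \le k_3 \big[ \|x_t\|^4 + \|u_t\|^4 + 1 \big],$$

$k_2$, $k_3$ being suitable constants, since $P(\vartheta)$ is uniformly bounded over $\Theta$ and $\{n_t\}$ satisfies point 1 in Assumption 2.1. We then need to prove that $\sum_{t=0}^{\infty} \frac{1}{t^2} [\, \|x_t\|^4 + \|u_t\|^4 \,] < \infty$. This is easily shown through (4.9) with $p = 8$, which implies $\|x_t\|^4 = o(t^{1/2})$ and $u_t^4 = o(t^{1/2})$, since $\sum_{t=0}^{\infty} \frac{1}{t^2} [\, \|x_t\|^4 + \|u_t\|^4 \,] = \sum_{t=0}^{\infty} \frac{1}{t^{3/2}} \frac{1}{t^{1/2}} [\, \|x_t\|^4 + \|u_t\|^4 \,]$, where $\sum_{t=0}^{\infty} \frac{1}{t^{3/2}}$ converges.

Second term:

Observe that $\{v_t\} := \{ x_{t+1}^T (P(\vartheta_t) - P(\vartheta_{t+1}))\, x_{t+1} - E[ x_{t+1}^T (P(\vartheta_t) - P(\vartheta_{t+1}))\, x_{t+1} \mid \mathcal{F}_t ] \}$ is a martingale difference. By derivations similar to those for the first term, we can prove that $\frac{1}{N} \sum_{t=0}^{N} v_t \to 0$. Then

$$\lim_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} E\big[ x_{t+1}^T (P(\vartheta_t) - P(\vartheta_{t+1}))\, x_{t+1} \mid \mathcal{F}_t \big] = 0$$

is proven by showing that

$$\lim_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} x_{t+1}^T (P(\vartheta_t) - P(\vartheta_{t+1}))\, x_{t+1} = 0. \tag{4.13}$$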

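To prove (4.13), apply the Schwarz inequality to obtain

$$\begin{aligned}
\Big| \frac{1}{N} \sum_{t=0}^{N-1} x_{t+1}^T (P(\vartheta_t) - P(\vartheta_{t+1}))\, x_{t+1} \Big|
&\le \frac{1}{N} \sum_{t=0}^{N-1} \|P(\vartheta_t) - P(\vartheta_{t+1})\| \, \|x_{t+1}\|^2 \\
&\le \Big( \frac{1}{N} \sum_{t=0}^{N-1} \|P(\vartheta_t) - P(\vartheta_{t+1})\|^2 \Big)^{\frac{1}{2}} \Big( \frac{1}{N} \sum_{t=0}^{N-1} \|x_{t+1}\|^4 \Big)^{\frac{1}{2}}.
\end{aligned}$$

By Theorem 4.1, $\frac{1}{N} \sum_{t=0}^{N-1} \|x_{t+1}\|^4$ is bounded. Moreover, $\lim_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} \|P(\vartheta_t) - P(\vartheta_{t+1})\|^2 = 0$ because of property (iii) in Theorem 3.2 and the Lipschitz continuity of $P(\vartheta)$ over $\Theta$ ($P(\vartheta)$ is analytic on $\Theta$ and $\Theta$ is compact). This concludes the proof of (4.13).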
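Property (iii) of Theorem 3.2 controls the sum of the increments of the estimate, whereas the bound above involves their squares. The two can be linked as follows; $K_P$ (a Lipschitz constant of $P$ on $\Theta$) and $\operatorname{diam}(\Theta)$ are names introduced here only for illustration, and the condition $\sum_{t=1}^{N} \|\varphi_{t-1}\|^2 = O(N)$ required by (iii) is taken from Theorem 4.1.

```latex
% Lipschitz continuity of P on the compact set \Theta, then boundedness of \Theta:
\| P(\vartheta_t) - P(\vartheta_{t+1}) \|^2
  \le K_P^2 \, \| \vartheta_t - \vartheta_{t+1} \|^2
  \le K_P^2 \, \operatorname{diam}(\Theta) \, \| \vartheta_{t+1} - \vartheta_t \| .

% Averaging over t = 0, ..., N-1 and re-indexing the sum:
\frac{1}{N} \sum_{t=0}^{N-1} \| P(\vartheta_t) - P(\vartheta_{t+1}) \|^2
  \le K_P^2 \, \operatorname{diam}(\Theta) \,
      \frac{1}{N} \sum_{t=1}^{N} \| \vartheta_t - \vartheta_{t-1} \|
  = K_P^2 \, \operatorname{diam}(\Theta) \, \frac{o(N)}{N}
  \longrightarrow 0 \quad \text{a.s.}
```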

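Third term:

Since $\vartheta_t \in \Theta$ and $P(\vartheta)$ is uniformly bounded on $\Theta$, then

$$0 \le \frac{1}{N} \sum_{t=0}^{N-1} E\big[ e_t^T C^T P(\vartheta_t) C e_t \mid \mathcal{F}_t \big] = \frac{1}{N} \sum_{t=0}^{N-1} e_t^T C^T P(\vartheta_t) C e_t \le h_1 \frac{1}{N} \sum_{t=0}^{N-1} e_t^2,$$

$h_1$ being a suitable constant. We now show that

$$\lim_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} e_t^2 = 0 \quad \text{a.s.} \tag{4.14}$$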

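From Proposition 3.3 it follows that there exists a set of instant points $B_{N-1}$ whose cardinality is upper bounded by a constant $C_B < \infty$ such that $\frac{1}{N} \sum_{t=0,\ t \notin B_{N-1}}^{N-1} e_t^2 = \frac{1}{N}\, o\big( \sum_{t=0}^{N-1} \|\varphi_t\|^2 \big)$ a.s. Then, recalling the definition (3.6) of $e_t$, we have

$$\frac{1}{N} \sum_{t=0}^{N-1} e_t^2 = \frac{1}{N}\, o\Big( \sum_{t=0}^{N-1} \|\varphi_t\|^2 \Big) + \frac{1}{N} \sum_{t \in B_{N-1}} |\varphi_t^T (\vartheta^\circ - \vartheta_t)|^2.$$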


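By Theorem 4.1, the first term tends to zero. As for the second term, we have that it can be upper bounded as follows:

$$\frac{1}{N} \sum_{t \in B_{N-1}} |\varphi_t^T (\vartheta^\circ - \vartheta_t)|^2 \le h_2\, C_B\, \frac{1}{N} \max_{0 \le t \le N-1} \|\varphi_t\|^2, \qquad h_2 = \text{suitable constant},$$

since $\vartheta_t$ is bounded uniformly in time. Noting that $\|\varphi_t\|^2 \le \|x_{t+1}\|^2 + \|x_t\|^2$, from (4.9) we get

$$\|\varphi_t\|^2 = o(t) \quad \text{a.s.}, \tag{4.15}$$

which implies $\frac{1}{N} \max_{0 \le t \le N-1} \|\varphi_t\|^2 \to 0$.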

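Fourth term:

$$\begin{aligned}
\frac{1}{N} \sum_{t=0}^{N-1} E\big[ x_{t+1}^T P(\vartheta_t) C e_t \mid \mathcal{F}_t \big]
&= \frac{1}{N} \sum_{t=0}^{N-1} E\big[ (A(\vartheta^\circ) x_t + B(\vartheta^\circ) u_t + C n_{t+1})^T P(\vartheta_t) C e_t \mid \mathcal{F}_t \big] \\
&= \frac{1}{N} \sum_{t=0}^{N-1} x_t^T A(\vartheta^\circ)^T P(\vartheta_t) C e_t + \frac{1}{N} \sum_{t=0}^{N-1} u_t^T B(\vartheta^\circ)^T P(\vartheta_t) C e_t.
\end{aligned}$$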

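We next show that each term on the right-hand side goes to zero as $N$ tends to infinity.

Since $\vartheta_t \in \Theta$, with $\Theta$ compact, and $P(\vartheta)$ is analytic on $\Theta$, by using the Schwarz inequality, we have

$$\Big| \frac{1}{N} \sum_{t=0}^{N-1} x_t^T A(\vartheta^\circ)^T P(\vartheta_t) C e_t \Big| \le k \Big( \frac{1}{N} \sum_{t=0}^{N-1} \|x_t\|^2 \Big)^{\frac{1}{2}} \Big( \frac{1}{N} \sum_{t=0}^{N-1} e_t^2 \Big)^{\frac{1}{2}}.$$

Then $\lim_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} x_t^T A(\vartheta^\circ)^T P(\vartheta_t) C e_t = 0$ a.s. follows from Theorem 4.1 and (4.14).

Similarly, one can prove that $\lim_{N\to\infty} \frac{1}{N} \sum_{t=0}^{N-1} u_t^T B(\vartheta^\circ)^T P(\vartheta_t) C e_t = 0$ a.s.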

5. Conclusions. The most commonly adopted strategy for the design of adaptive control laws is the certainty equivalence approach. Although the approach is conceptually simple, working out stability and optimality results for certainty equivalence adaptive control schemes is a difficult task even in the ideal case when the true system belongs to the model class. This is due to the intricate interaction between control and identification in closed-loop, which can cause identifiability problems.

We introduced a new LQG adaptive control scheme based on the certainty equivalence principle. By adopting a cost-biased approach, the scheme ensures both stability and optimality irrespective of the excitation characteristics of the involved signals.

This paper presents the following limitations:

• the true system is described as an ARX system subject to white noise. This hypothesis is necessary mainly for the applicability of the proposed cost-biased LS identification method, whose properties are in fact derived on the basis of the LS estimate properties. As a consequence of this fact, the extension to the ARMAX system case is not straightforward.

• the proposed identification method is nonrecursive. The cost-biased identification index has, in general, multiple local minima and its minimization is not straightforward. Therefore, it should be minimized by resorting to some global optimization algorithm (see, e.g., [30, 31, 4, 15]). This limitation must be removed by introducing some recursive way to minimize our performance index so as to retain all the properties relevant to control.

These problems constitute interesting research issues. In particular, inspired by the result obtained for the white noise case, one can conceive of introducing appropriate cost-biased identification algorithms for the colored noise case. In this regard, much work has to be done, but an encouraging starting point is represented by the fact that the extended LS algorithm satisfies closed-loop properties similar to those valid for the LS algorithm (see, e.g., [10]).

Appendix. Proofs of the results in section 3.

Proof of Proposition 3.1. Recall that $\vartheta_t \in \Theta$, $t \ge 0$, where $\Theta$ is compact and is such that all the parametrizations in $\Theta$ correspond to stabilizable models. We start by proving that $T(\vartheta) := \inf\{\tau \in \mathbb{Z}_+ : \|F(\vartheta)^\tau\| \le \mu\}$ is uniformly bounded in the compact set $\Theta$, i.e., $\sup_{\vartheta \in \Theta} T(\vartheta) < \infty$.

Condition $\vartheta \in \Theta$ implies that the system $A(\vartheta; q^{-1}) y_{t+1} = B(\vartheta; q^{-1}) u_t$ associated with parameter $\vartheta$ is stabilizable and therefore stabilized by the control law $u_t = S(\vartheta; q^{-1}) y_t + R(\vartheta; q^{-1}) u_t$. From this it follows that the dynamic matrix $F(\vartheta)$ of the time-invariant system

\[
x_{t+1} = F(\vartheta)\, x_t \tag{A.1}
\]

is exponentially stable.

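Denote by $\{\lambda_i(\vartheta)\}_{i=1,\ldots,n+m-1}$ the eigenvalues of $F(\vartheta)$.

By the observation that $F(\vartheta)$ is a continuous function of $\vartheta$ on $C = \{\vartheta \in \mathbb{R}^{n+m} : q^{s} A(\vartheta; q^{-1}) \text{ and } q^{s-1} B(\vartheta; q^{-1}) \text{ have no unstable pole-zero cancellations}\}$, we have that $\lambda(\vartheta) := \max_{i \in \{1,\ldots,n+m-1\}} |\lambda_i(\vartheta)|$ is also a continuous function of $\vartheta$, $\vartheta \in C$. Since $\Theta$ is compact and included in $C$, the conclusion is finally drawn that

\[
\bar\lambda := \max_{\vartheta \in \Theta} \lambda(\vartheta) < 1.
\]

Fix now a real number $\nu \in (\bar\lambda, 1)$ and introduce the system

\[
w_{t+1} = \frac{1}{\nu} F(\vartheta)\, w_t. \tag{A.2}
\]

System (A.2) is exponentially stable $\forall \vartheta \in \Theta$, since $\left|\lambda_i(\vartheta)/\nu\right| \le \bar\lambda/\nu < 1$ $\forall i$, $\forall \vartheta \in \Theta$. Hence, the solution $S(\vartheta)$ to the Lyapunov equation associated with matrix $\frac{1}{\nu} F(\vartheta)$,

\[
\frac{1}{\nu} F(\vartheta)^T S(\vartheta)\, \frac{1}{\nu} F(\vartheta) - S(\vartheta) = -I,
\]

is positive definite. Moreover, it is a standard fact that the state vector $w_t$ of system (A.2) can be bounded as follows in terms of $S(\vartheta)$:

\[
\|w_t\| \le \sqrt{\frac{\lambda_{\max}(S(\vartheta))}{\lambda_{\min}(S(\vartheta))}}\; \|w_{\bar t}\|, \qquad t \ge \bar t \ge 0, \tag{A.3}
\]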


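where $\lambda_{\max}(S(\vartheta))$ and $\lambda_{\min}(S(\vartheta))$ are, respectively, the maximum and minimum eigenvalues of $S(\vartheta)$. Since $S(\vartheta)$ is continuous in the closed set $\Theta$ (see [12]), we can define $c := \max_{\vartheta \in \Theta} \sqrt{\lambda_{\max}(S(\vartheta)) / \lambda_{\min}(S(\vartheta))}$ and rewrite inequality (A.3) as $\|w_t\| \le c\, \|w_{\bar t}\|$, $t \ge \bar t$, $\forall \vartheta \in \Theta$. Setting $w_{\bar t} = x_{\bar t}$, we finally get a bound on the state vector $x_t$ of the time-invariant system (A.1):

\[
\|x_t\| \le c\, \nu^{t - \bar t}\, \|x_{\bar t}\|, \qquad t \ge \bar t, \ \forall \vartheta \in \Theta. \tag{A.4}
\]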

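Set $\bar T = \inf\{\tau \in \mathbb{Z}_+ : c\, \nu^{\tau} \le \mu\} < \infty$. Since $\|x_{\bar T + \bar t}\| = \|F(\vartheta)^{\bar T} x_{\bar t}\| \le \mu \|x_{\bar t}\|$ $\forall \vartheta \in \Theta$, $\forall x_{\bar t}$, then $\|F(\vartheta)^{\bar T}\| = \sup_{\|x\| \ne 0} \frac{\|F(\vartheta)^{\bar T} x\|}{\|x\|} \le \mu$ $\forall \vartheta \in \Theta$, and therefore $T(\vartheta) = \inf\{\tau \in \mathbb{Z}_+ : \|F(\vartheta)^{\tau}\| \le \mu\}$ satisfies $T(\vartheta) \le \bar T$ $\forall \vartheta \in \Theta$. This finally implies that

\[
\sup_{\vartheta \in \Theta} T(\vartheta) \le \bar T < \infty. \tag{A.5}
\]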
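The construction leading to (A.4) and (A.5) can be illustrated numerically for a single fixed matrix. The following is a minimal sketch, not the paper's procedure: the 2x2 matrix `F`, the values of `nu` and `mu`, and all function names are hypothetical, and the Lyapunov equation is solved by truncating the series $S = \sum_k (G^k)^T G^k$ with $G = F/\nu$, which is one standard way to obtain its solution when $G$ is stable.

```python
import math

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sym_eigs_2x2(M):
    """(smallest, largest) eigenvalue of a symmetric 2x2 matrix."""
    half_tr = (M[0][0] + M[1][1]) / 2.0
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(max(half_tr * half_tr - det, 0.0))
    return half_tr - disc, half_tr + disc

def spectral_norm(A):
    """Induced 2-norm of a 2x2 matrix: sqrt of the largest eigenvalue of A^T A."""
    return math.sqrt(sym_eigs_2x2(matmul(transpose(A), A))[1])

def lyapunov_solution(G, terms=500):
    """Truncated series S = sum_k (G^k)^T G^k, which solves G^T S G - S = -I
    when the spectral radius of G is below one."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]          # P holds G^k, starting from k = 0
    for _ in range(terms):
        PtP = matmul(transpose(P), P)
        S = [[S[i][j] + PtP[i][j] for j in range(2)] for i in range(2)]
        P = matmul(P, G)
    return S

def decay_bound(F, nu, mu):
    """Return (c, T_bar) with ||F^t|| <= c * nu**t and
    T_bar = min{tau >= 0 : c * nu**tau <= mu}."""
    G = [[f / nu for f in row] for row in F]
    lam_min, lam_max = sym_eigs_2x2(lyapunov_solution(G))
    c = math.sqrt(lam_max / lam_min)
    T_bar = 0
    while c * nu ** T_bar > mu:
        T_bar += 1
    return c, T_bar

# Hypothetical data: F has eigenvalues 0.5 and 0.6, so any nu in (0.6, 1) is admissible.
F = [[0.5, 1.0], [0.0, 0.6]]
nu, mu = 0.8, 0.5
c, T_bar = decay_bound(F, nu, mu)
```

In the proof the constant $c$ is a maximum over the whole compact set $\Theta$; the sketch evaluates it at one parameter value only, which is enough to see why $\|F^{\tau}\| \le \mu$ is reached within a finite number of steps.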

Let us now turn to the time-varying system $x_{t+1} = F(\vartheta_t)\, x_t$.

Since $\vartheta_t \in \Theta$, $t \ge 0$, from (A.5) it follows that the updating time interval $T_i$ in (3.4) is uniformly bounded:

\[
T := \sup_{i \ge 0} T_i < \bar T. \tag{A.6}
\]

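We are now in a position to establish the uniform exponential stability. We apply (A.4) to the state vector $x_t$ on each finite time interval $[t_i, t_{i+1}]$, thus getting

\[
\|x_t\| \le c\, \nu^{t - \bar t}\, \|x_{\bar t}\|, \qquad t_i \le \bar t \le t \le t_{i+1}. \tag{A.7}
\]

If we choose $\bar t = t_i$, we have $\|x_t\| \le c\, \nu^{t - t_i} \|x_{t_i}\|$, $t \in [t_i, t_{i+1}]$. From the definition (3.4) of $\{T_k\}$, it follows that $\|x_{t_i}\| \le \mu^{i-j} \|x_{t_j}\|$, $j \le i$. By applying (A.7) in the time interval $[t_{j-1}, t_j]$ with $t = t_j$, we get $\|x_{t_j}\| \le c\, \nu^{t_j - \bar t} \|x_{\bar t}\|$, $\bar t \in [t_{j-1}, t_j]$. These last three inequalities lead to

\[
\|x_t\| \le c\, \nu^{t - t_i}\, \mu^{i-j}\, c\, \nu^{t_j - \bar t}\, \|x_{\bar t}\|, \qquad t_{j-1} \le \bar t \le t_j \le t_i \le t \le t_{i+1}, \ j \le i.
\]

By setting $\bar\nu = \max\{\nu, \mu^{1/T}\} < 1$, we have that $\mu \le \bar\nu^{T_k}$ $\forall k$ and therefore

\[
\|x_t\| \le c^2\, \bar\nu^{t - t_i}\, \bar\nu^{t_i - t_{i-1}} \cdots \bar\nu^{t_{j+1} - t_j}\, \bar\nu^{t_j - \bar t}\, \|x_{\bar t}\| = c^2\, \bar\nu^{t - \bar t}\, \|x_{\bar t}\|, \qquad t_{j-1} \le \bar t \le t_j \le t_i \le t \le t_{i+1}.
\]

Finally, from this last inequality and inequality (A.7), we get $\|x_t\| \le c^2\, \bar\nu^{t - \bar t} \|x_{\bar t}\|$, $\bar t \le t$, i.e., the thesis.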
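The step $\mu \le \bar\nu^{T_k}$ used in the proof above can be spelled out as follows. This is only a sketch: it assumes $0 < \mu < 1$ (implied by $\bar\nu < 1$), writes $\bar\nu = \max\{\nu, \mu^{1/T}\}$ with $T = \sup_i T_i$, and reads the updating intervals as $T_k = t_{k+1} - t_k$; definition (3.4) is not reproduced in this appendix, so that last reading is an assumption.

\[
\bar\nu^{T_k} \ \ge\ \left(\mu^{1/T}\right)^{T_k} = \mu^{T_k / T} \ \ge\ \mu,
\qquad \text{since } 0 < \frac{T_k}{T} \le 1 \text{ and } 0 < \mu < 1 .
\]

Multiplying these bounds over $k = j, \ldots, i-1$ gives

\[
\mu^{i-j} \ \le\ \prod_{k=j}^{i-1} \bar\nu^{T_k} = \bar\nu^{\sum_{k=j}^{i-1} T_k} = \bar\nu^{t_i - t_j},
\]

which, together with $\nu \le \bar\nu$, turns the product $\nu^{t - t_i}\, \mu^{i-j}\, \nu^{t_j - \bar t}$ into a quantity no larger than $\bar\nu^{t - \bar t}$.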

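Proof of Theorem 3.2. Point (i): $D_t(\vartheta) - V_t(\vartheta^{LS}_t)$ can be written as follows:

\[
\begin{aligned}
D_t(\vartheta) - V_t(\vartheta^{LS}_t)
&= \sum_{s=1}^{t} (y_s - \varphi_{s-1}^T \vartheta)^2 + \alpha_t J(\vartheta) + \gamma_t \|\vartheta - \vartheta_{t-1}\| - \sum_{s=1}^{t} (y_s - \varphi_{s-1}^T \vartheta^{LS}_t)^2 \\
&= \vartheta^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T \vartheta - 2 \vartheta^T \sum_{s=1}^{t} \varphi_{s-1} y_s + \alpha_t J(\vartheta) + \gamma_t \|\vartheta - \vartheta_{t-1}\| \\
&\quad - (\vartheta^{LS}_t)^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T \vartheta^{LS}_t + 2 (\vartheta^{LS}_t)^T \sum_{s=1}^{t} \varphi_{s-1} y_s .
\end{aligned}
\tag{A.8}
\]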


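The LS estimate $\vartheta^{LS}_t$ minimizing $V_t(\vartheta)$ satisfies the following equality:

\[
\sum_{s=1}^{t} \varphi_{s-1} y_s = \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T \vartheta^{LS}_t .
\]

Substituting this last expression in (A.8), we obtain

\[
D_t(\vartheta) - V_t(\vartheta^{LS}_t) = (\vartheta - \vartheta^{LS}_t)^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T (\vartheta - \vartheta^{LS}_t) + \alpha_t J(\vartheta) + \gamma_t \|\vartheta - \vartheta_{t-1}\| . \tag{A.9}
\]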
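Setting aside the two bias terms $\alpha_t J(\vartheta)$ and $\gamma_t \|\vartheta - \vartheta_{t-1}\|$, identity (A.9) says that the excess of the LS cost over its minimum is a quadratic form in $\vartheta - \vartheta^{LS}_t$. The sketch below checks this numerically for a two-parameter regression; the data and the function names are hypothetical, and the 2x2 normal equations are solved by Cramer's rule purely for self-containment.

```python
def least_squares_2d(phis, ys):
    """Solve the 2x2 normal equations R theta = b, with
    R = sum phi phi^T and b = sum phi y. Returns (theta_ls, (r11, r12, r22))."""
    r11 = sum(p[0] * p[0] for p in phis)
    r12 = sum(p[0] * p[1] for p in phis)
    r22 = sum(p[1] * p[1] for p in phis)
    b1 = sum(p[0] * y for p, y in zip(phis, ys))
    b2 = sum(p[1] * y for p, y in zip(phis, ys))
    det = r11 * r22 - r12 * r12          # nonzero when the regressors are exciting
    theta_ls = ((r22 * b1 - r12 * b2) / det, (r11 * b2 - r12 * b1) / det)
    return theta_ls, (r11, r12, r22)

def residual_sum(phis, ys, theta):
    """V_t(theta) = sum_s (y_s - phi_{s-1}^T theta)^2."""
    return sum((y - p[0] * theta[0] - p[1] * theta[1]) ** 2
               for p, y in zip(phis, ys))

def identity_sides(phis, ys, theta):
    """Return (V(theta) - V(theta_ls), (theta - theta_ls)^T R (theta - theta_ls))."""
    theta_ls, (r11, r12, r22) = least_squares_2d(phis, ys)
    d0, d1 = theta[0] - theta_ls[0], theta[1] - theta_ls[1]
    lhs = residual_sum(phis, ys, theta) - residual_sum(phis, ys, theta_ls)
    rhs = r11 * d0 * d0 + 2.0 * r12 * d0 * d1 + r22 * d1 * d1
    return lhs, rhs

# Hypothetical regressors and outputs.
phis = [(1.0, 0.5), (0.3, -1.2), (2.0, 0.7), (-0.4, 1.5)]
ys = [1.1, -0.7, 2.9, 0.2]
lhs, rhs = identity_sides(phis, ys, (1.0, -2.0))
```

The two returned values agree up to rounding for any trial parameter, which is the content of (A.9) once the bias terms are added back to both sides.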

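Set $\vartheta_t := \arg\min_{\vartheta \in \Theta} D_t(\vartheta)$. By definition of $\vartheta_t$ we have

\[
D_t(\vartheta_t) - V_t(\vartheta^{LS}_t) \le D_t(\vartheta) - V_t(\vartheta^{LS}_t), \qquad \vartheta \in \Theta .
\]

By choosing $\vartheta = \vartheta^\circ$ and using expression (A.9), we then get

\[
\begin{aligned}
&(\vartheta_t - \vartheta^{LS}_t)^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T (\vartheta_t - \vartheta^{LS}_t) + \alpha_t J(\vartheta_t) + \gamma_t \|\vartheta_t - \vartheta_{t-1}\| \\
&\qquad \le (\vartheta^\circ - \vartheta^{LS}_t)^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T (\vartheta^\circ - \vartheta^{LS}_t) + \alpha_t J(\vartheta^\circ) + \gamma_t \|\vartheta^\circ - \vartheta_{t-1}\| \\
&\qquad = O(\alpha_t) \quad \text{a.s.},
\end{aligned}
\tag{A.10}
\]

where the last equality follows from Theorem 2.4, the definition (3.5) of $\alpha_t$, the fact that $\|\vartheta^\circ - \vartheta_{t-1}\|$ is bounded, and the relation $\gamma_t = o(\alpha_t)$. Since $\alpha_t J(\vartheta_t) + \gamma_t \|\vartheta_t - \vartheta_{t-1}\| \ge 0$, we have $(\vartheta_t - \vartheta^{LS}_t)^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T (\vartheta_t - \vartheta^{LS}_t) = O(\alpha_t)$ a.s. From definition (3.5) of $\alpha_t$ and Theorem 2.4, we then have

\[
\begin{aligned}
(\vartheta_t - \vartheta^\circ)^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T (\vartheta_t - \vartheta^\circ)
&\le 2 \Big[ (\vartheta_t - \vartheta^{LS}_t)^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T (\vartheta_t - \vartheta^{LS}_t) \\
&\qquad + (\vartheta^{LS}_t - \vartheta^\circ)^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T (\vartheta^{LS}_t - \vartheta^\circ) \Big] \\
&= O(\alpha_t) \quad \text{a.s.},
\end{aligned}
\]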

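thus concluding the proof of point (i), since $\hat\vartheta_t = \vartheta_t$ for $t = t_i$, $i = 0, 1, \ldots$.

Point (ii): A simple elaboration of (A.10) shows that

\[
\begin{aligned}
J(\vartheta_t)
&\le \frac{(\vartheta^\circ - \vartheta^{LS}_t)^T \sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T (\vartheta^\circ - \vartheta^{LS}_t)}{\alpha_t} + J(\vartheta^\circ) + \frac{\gamma_t}{\alpha_t} \|\vartheta^\circ - \vartheta_{t-1}\| \\
&= \frac{O\big(\log \lambda_{\max}\big(\sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T\big)\big)}{\log^{1+\delta} \lambda_{\max}\big(\sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T\big)} + \frac{o(\alpha_t)}{\alpha_t} + J(\vartheta^\circ) \quad \text{a.s.},
\end{aligned}
\]

where in the second equation we have used the definition of $\alpha_t$ given in equation (3.5) and the fact that $\gamma_t = o(\alpha_t)$. To conclude the proof, it suffices to show that $\lim_{t \to \infty} \log \lambda_{\max}\big(\sum_{s=1}^{t} \varphi_{s-1} \varphi_{s-1}^T\big) = \infty$. The easy proof of this fact is omitted.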

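Point (iii): By the definition (2.8) of $\vartheta_t$, we have

\[
V_t(\vartheta_t) + \alpha_t J(\vartheta_t) + \gamma_t \|\vartheta_t - \vartheta_{t-1}\| \le V_t(\vartheta_{t-1}) + \alpha_t J(\vartheta_{t-1}),
\]

which implies

\[
\sum_{t=1}^{N} \gamma_t \|\vartheta_t - \vartheta_{t-1}\| \le \sum_{t=1}^{N} [V_t(\vartheta_{t-1}) - V_t(\vartheta_t)] + \sum_{t=1}^{N} \alpha_t [J(\vartheta_{t-1}) - J(\vartheta_t)] . \tag{A.11}
\]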

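The first term in the right-hand side of (A.11) can be bounded as follows:

\[
\begin{aligned}
\sum_{t=1}^{N} [V_t(\vartheta_{t-1}) - V_t(\vartheta_t)]
&\le V_1(\vartheta_0) - V_N(\vartheta_N) + \sum_{t=1}^{N-1} [V_{t+1}(\vartheta_t) - V_t(\vartheta_t)] \\
&\le V_1(\vartheta_0) + \sum_{t=1}^{N-1} [\varphi_t^T (\vartheta^\circ - \vartheta_t) + n_{t+1}]^2 \\
&\le V_1(\vartheta_0) + 2 \sum_{t=1}^{N-1} [\varphi_t^T (\vartheta^\circ - \vartheta_t)]^2 + 2 \sum_{t=1}^{N-1} n_{t+1}^2 \\
&\le k_1 \Big[ 1 + \sum_{t=1}^{N} \|\varphi_{t-1}\|^2 + \sum_{t=1}^{N-1} n_{t+1}^2 \Big],
\end{aligned}
\]

$k_1$ being a suitable constant, where we used the boundedness of $\vartheta_t$.

By Remark 2.3, the second term in the right-hand side of (A.11) can be bounded as follows: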

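\[
\begin{aligned}
\sum_{t=1}^{N} \alpha_t [J(\vartheta_{t-1}) - J(\vartheta_t)]
&= \alpha_1 J(\vartheta_0) - \alpha_N J(\vartheta_N) + \sum_{t=1}^{N-1} (\alpha_{t+1} - \alpha_t) J(\vartheta_t) \\
&\le \alpha_1 J(\vartheta_0) + \max_{\vartheta \in \Theta} J(\vartheta) \sum_{t=1}^{N-1} (\alpha_{t+1} - \alpha_t) \\
&\le k_2 [1 + \alpha_N],
\end{aligned}
\]

where $k_2$ is a suitable constant.

Substituting these bounds in (A.11), we then have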

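\[
\frac{1}{N} \sum_{t=1}^{N} \gamma_t \|\vartheta_t - \vartheta_{t-1}\| \le k \Big[ \frac{1}{N} + \frac{\alpha_N}{N} + \frac{1}{N} \sum_{t=1}^{N} \|\varphi_{t-1}\|^2 + \frac{1}{N} \sum_{t=1}^{N-1} n_{t+1}^2 \Big], \tag{A.12}
\]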

with $k$ a suitable constant. Observe now that all the terms in the right-hand side of (A.12) are $O(1)$. This, in particular, follows from the assumption of point (iii) in Theorem 3.2 that $\sum_{t=1}^{N} \|\varphi_{t-1}\|^2 = O(N)$ and Assumption 2.1, point 2. Then $\frac{1}{N} \sum_{t=1}^{N} \gamma_t \|\vartheta_t - \vartheta_{t-1}\| = O(1)$. Since $\gamma_t$ tends to infinity, this last equation implies $\frac{1}{N} \sum_{t=1}^{N} \|\vartheta_t - \vartheta_{t-1}\| = o(1)$, that is, the thesis.

Proof of Proposition 3.3. Fix a real number $\varepsilon > 0$ and a time instant $N$. Consider the set of instant points in the interval $[0, N]$ where $\tilde\vartheta_t := \vartheta^\circ - \hat\vartheta_t$ changes: $t_0, t_1, \ldots, t_{i(N)}$, where $i(N) := \max\{i : t_i \le N\}$. In these instant points we define a set of subspaces $\{S_{t_i}\}_{i=0}^{i(N)}$ through the following backward recursive procedure:

for $i = i(N) + 1$, set $S_{t_i} = \emptyset$;

for $i = i(N), i(N) - 1, \ldots, 0$, set (here and throughout the symbol $\tilde\vartheta_{t,S}$ stands for the projection of vector $\tilde\vartheta_t$ onto the subspace $S$)

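\[
S_{t_i} =
\begin{cases}
S_{t_{i+1}} & \text{if } \big\| \tilde\vartheta_{t_i, S^{\perp}_{t_{i+1}}} \big\| \le \varepsilon, \\[4pt]
S_{t_{i+1}} \oplus \operatorname{span}\{\tilde\vartheta_{t_i}\} & \text{otherwise.}
\end{cases}
\tag{A.13}
\]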
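The backward recursion (A.13) can be sketched in code as follows. This is an illustrative reading rather than the authors' implementation: the function names and the sample vectors are hypothetical, each subspace is represented by an orthonormal basis, and the "enlarge" branch adds the normalized residual of the new vector, which spans the same subspace as $S_{t_{i+1}} \oplus \operatorname{span}\{\tilde\vartheta_{t_i}\}$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_residual(v, basis):
    """Component of v orthogonal to span(basis); basis must be orthonormal."""
    r = list(v)
    for b in basis:
        coeff = dot(r, b)
        r = [ri - coeff * bi for ri, bi in zip(r, b)]
    return r

def backward_subspaces(errors, eps):
    """errors[i] plays the role of the parameter error at the i-th updating instant.
    Returns (bases, enlarged): bases[i] is an orthonormal basis of the i-th subspace,
    enlarged lists the indices at which the subspace grows (in increasing order).
    Assumes eps >= 0, so a residual is only normalized when its norm is positive."""
    basis = []                       # subspace attached to index i(N) + 1 is trivial
    bases = [None] * len(errors)
    enlarged = []
    for i in range(len(errors) - 1, -1, -1):
        r = orthogonal_residual(errors[i], basis)
        norm = math.sqrt(dot(r, r))
        if norm > eps:               # the error sticks out of the current subspace
            basis = basis + [[x / norm for x in r]]
            enlarged.append(i)
        bases[i] = basis
    enlarged.reverse()
    return bases, enlarged

# Hypothetical parameter errors in R^3, listed in increasing time order.
errors = [(1.0, 0.0, 0.0), (0.0, 0.01, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
bases, enlarged = backward_subspaces(errors, eps=0.1)
```

Because each enlargement raises the dimension by one, the number of enlargement instants can never exceed the dimension of the parameter space, which is the fact the proof exploits through $\dim(S_{t_0}) \le n + m$.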


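For each $t \in [0, N]$, with the notation $i(t) := \max\{i : t_i \le t\}$, we have

\[
|\varphi_t^T \tilde\vartheta_t|^p \le c_p \Big| \varphi^T_{t, S^{\perp}_{t_{i(t)}}} \tilde\vartheta_{t, S^{\perp}_{t_{i(t)}}} \Big|^p + c_p \Big| \varphi^T_{t, S_{t_{i(t)}}} \tilde\vartheta_{t, S_{t_{i(t)}}} \Big|^p, \tag{A.14}
\]

where $c_p$ is a suitable constant depending on $p$. By definition (A.13), the first term in the right-hand side can be upper bounded as follows:

\[
\Big| \varphi^T_{t, S^{\perp}_{t_{i(t)}}} \tilde\vartheta_{t, S^{\perp}_{t_{i(t)}}} \Big|^p \le \varepsilon^p \|\varphi_t\|^p . \tag{A.15}
\]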

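To handle the second term, we first work out a basis in $S_{t_{i(t)}}$. For this purpose, consider the subset $\{\tau_j\}_{j=1}^{\dim(S_{t_0})}$ of instant points $\{t_i\}_{i=0}^{i(N)}$ such that subspace $S_{t_i}$ enlarges: $S_{\tau_j} \supset S_{t_i}$, $t_i > \tau_j$. The sought basis is $\{\tilde\vartheta_{\tau_j}\}_{j = \dim(S_{t_0}) - \dim(S_{t_{i(t)}}) + 1}^{\dim(S_{t_0})}$.

In view of the uniform boundedness of $\tilde\vartheta_t$ and also considering the very definition of subspaces $S_{t_i}$ (equation (A.13)), it is easy to see that vectors $\{\tilde\vartheta_{\tau_j}\}$ are spread in subspace $S_{t_{i(t)}}$ in such a way that the angle between each pair of vectors tends to zero only when $\varepsilon \to 0$. Consequently, there exists a constant $c(\varepsilon)$, depending on $\varepsilon$, but independent of $N$, such that term $\big| \varphi^T_{t, S_{t_{i(t)}}} \tilde\vartheta_{t, S_{t_{i(t)}}} \big|^p$ in the right-hand side of inequality (A.14) can be bounded as follows:

\[
\begin{aligned}
\Big| \varphi^T_{t, S_{t_{i(t)}}} \tilde\vartheta_{t, S_{t_{i(t)}}} \Big|^p
&\le \Delta^p \big\| \varphi_{t, S_{t_{i(t)}}} \big\|^p \\
&\le \Delta^p c(\varepsilon) \sum_{j = \dim(S_{t_0}) - \dim(S_{t_{i(t)}}) + 1}^{\dim(S_{t_0})} \big\| \varphi_{t, \operatorname{span}\{\tilde\vartheta_{\tau_j}\}} \big\|^p,
\end{aligned}
\tag{A.16}
\]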

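where $\Delta = \max_{\vartheta_1, \vartheta_2 \in \Theta} \|\vartheta_1 - \vartheta_2\|$.

By plugging estimates (A.15) and (A.16) in (A.14), we obtain

\[
|\varphi_t^T \tilde\vartheta_t|^p \le c_p\, \varepsilon^p \|\varphi_t\|^p + c_p\, \Delta^p c(\varepsilon) \sum_{j = \dim(S_{t_0}) - \dim(S_{t_{i(t)}}) + 1}^{\dim(S_{t_0})} \big\| \varphi_{t, \operatorname{span}\{\tilde\vartheta_{\tau_j}\}} \big\|^p .
\]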

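Summing up these relations from time $t = 0$ to $t = N$, we finally have

\[
\sum_{t=0}^{N} |\varphi_t^T \tilde\vartheta_t|^p \le c_p\, \varepsilon^p \sum_{t=0}^{N} \|\varphi_t\|^p + c_p\, \Delta^p c(\varepsilon) \sum_{t=0}^{N} \ \sum_{j = \dim(S_{t_0}) - \dim(S_{t_{i(t)}}) + 1}^{\dim(S_{t_0})} \big\| \varphi_{t, \operatorname{span}\{\tilde\vartheta_{\tau_j}\}} \big\|^p . \tag{A.17}
\]

Introduce now the time-varying set of instant points

\[
B_N := \bigcup_{j=1}^{\dim(S_{t_0})} \{\tau_j, \tau_j + 1, \ldots, \tau_j + T - 1\},
\]

where $T := \sup_{i \ge 0} T_i < \infty$ (see (A.6) in the proof of Proposition 3.1). Since $\dim(S_{t_0}) \le n + m$, we obviously have $|B_N| \le T (n + m)$.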

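Then

\[
\sum_{t=0,\ t \notin B_N}^{N} \ \sum_{j = \dim(S_{t_0}) - \dim(S_{t_{i(t)}}) + 1}^{\dim(S_{t_0})} \big\| \varphi_{t, \operatorname{span}\{\tilde\vartheta_{\tau_j}\}} \big\|^p \le \sum_{j=1}^{\dim(S_{t_0})} \ \sum_{t=0}^{\tau_j - 1} \big\| \varphi_{t, \operatorname{span}\{\tilde\vartheta_{\tau_j}\}} \big\|^p .
\]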

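We now show that

\[
\sum_{t=0}^{t_i - 1} |\varphi_t^T \tilde\vartheta_{t_i}|^p = o\Big( \sum_{t=0}^{t_i - 1} \|\varphi_t\|^p \Big) \quad \text{a.s.}, \tag{A.18}
\]

from which it follows that

\[
\sum_{t=0,\ t \notin B_N}^{N} \ \sum_{j = \dim(S_{t_0}) - \dim(S_{t_{i(t)}}) + 1}^{\dim(S_{t_0})} \big\| \varphi_{t, \operatorname{span}\{\tilde\vartheta_{\tau_j}\}} \big\|^p \le \frac{n + m}{\varepsilon^p} \Big[ o\Big( \sum_{t=0}^{N} \|\varphi_t\|^p \Big) + O(1) \Big], \tag{A.19}
\]

where we used the fact that $\dim(S_{t_0}) \le n + m$ $\forall N$.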

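Observe first that

\[
\sum_{t=0}^{t_i - 1} \|\varphi_t\|^2 = O\Big( \sum_{t=0}^{t_i - 1} \|\varphi_t\|^p \Big) \quad \text{a.s.} \tag{A.20}
\]

Indeed, using Jensen's inequality ([11, Corollary 1 in section 4.3]),

\[
\begin{aligned}
\sum_{t=0}^{t_i - 1} \|\varphi_t\|^2
&= t_i \Big[ \Big( \frac{1}{t_i} \sum_{t=0}^{t_i - 1} \|\varphi_t\|^2 \Big)^{p/2} \Big]^{2/p} \\
&\le t_i \Big[ \frac{1}{t_i} \sum_{t=0}^{t_i - 1} \|\varphi_t\|^p \Big]^{2/p} \\
&= \sum_{t=0}^{t_i - 1} \|\varphi_t\|^p \Big[ \frac{t_i}{\sum_{t=0}^{t_i - 1} \|\varphi_t\|^p} \Big]^{1 - 2/p},
\end{aligned}
\]

where

\[
\limsup_{i \to \infty} \frac{t_i}{\sum_{t=0}^{t_i - 1} \|\varphi_t\|^p} < \infty \quad \text{a.s.} \tag{A.21}
\]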

This last equation is easily derived as follows. From the regression-like form $y_t = \varphi_{t-1}^T \vartheta^\circ + n_t$, it follows that $|n_t|^p \le 2^{p-1} \max\{\|\vartheta^\circ\|, 1\} [\, |y_t|^p + \|\varphi_{t-1}\|^p \,]$. Taking into account that the autoregressive part of the system is not trivial ($n > 0$), this in turn implies that $|n_t|^p \le h_1 [\, \|\varphi_t\|^p + \|\varphi_{t-1}\|^p \,]$, from which it is easily shown that $\sum_{t=1}^{N-1} |n_t|^p \le h_1 \sum_{t=0}^{N-1} \|\varphi_t\|^p$, where $h_1$ is a suitable constant. Since $\frac{1}{N} \sum_{t=1}^{N-1} |n_t|^p \ge \big[ \frac{1}{N} \sum_{t=0}^{N-1} n_t^2 \big]^{p/2}$ (using Jensen's inequality), from Assumption 2.1, we then get

\[
\limsup_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} \|\varphi_t\|^p > 0 \quad \text{a.s.},
\]

from which (A.21) follows.
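The Jensen step behind (A.20) can be checked numerically. The sketch below assumes $p \ge 2$, which is what makes $x \mapsto x^{p/2}$ convex, and uses hypothetical nonnegative numbers in place of the norms $\|\varphi_t\|$; the function name is illustrative only.

```python
def jensen_bound(norms, p):
    """Return (sum of squares, upper bound), where the bound is
    sum_p * (n / sum_p) ** (1 - 2/p) with sum_p = sum of p-th powers.
    Assumes p >= 2 and at least one strictly positive entry."""
    n = len(norms)
    sum_sq = sum(a ** 2 for a in norms)
    sum_p = sum(a ** p for a in norms)
    bound = sum_p * (n / sum_p) ** (1.0 - 2.0 / p)
    return sum_sq, bound

# Hypothetical regressor norms.
norms = [0.2, 1.5, 3.0, 0.7, 2.2]
sum_sq, bound = jensen_bound(norms, p=4.0)
```

For $p = 2$ the two quantities coincide, and for larger $p$ the bound is loose exactly by the factor $[t_i / \sum \|\varphi_t\|^p]^{1 - 2/p}$ that (A.21) keeps bounded.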

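By means of (A.20), we now show that $\sum_{t=0}^{t_i - 1} |\varphi_t^T \tilde\vartheta_{t_i}|^p = o\big( \sum_{t=0}^{t_i - 1} \|\varphi_t\|^p \big)$ a.s., which implies (A.18). This equation is easily derived from property (i) in Theorem 3.2 as follows:

\[
\begin{aligned}
\sum_{t=0}^{t_i - 1} |\varphi_t^T \tilde\vartheta_{t_i}|^p
&\le \Big| \sum_{t=0}^{t_i - 1} |\varphi_t^T (\vartheta^\circ - \hat\vartheta_{t_i})|^2 \Big|^{p/2} \\
&= o\Big( \Big( \operatorname{Log} \sum_{t=0}^{t_i - 1} \|\varphi_t\| \Big)^{p(1+\delta)/2} \Big) \quad \text{(by property (i))} \\
&= o\Big( \sum_{t=0}^{t_i - 1} \|\varphi_t\|^2 \Big) \\
&= o\Big( \sum_{t=0}^{t_i - 1} \|\varphi_t\|^p \Big) \quad \text{(by (A.20))}.
\end{aligned}
\]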

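By using inequality (A.17) and inequality (A.19), we obtain

\[
\begin{aligned}
\sum_{t=0,\ t \notin B_N}^{N} |\varphi_t^T \tilde\vartheta_t|^p
&\le c_p\, \varepsilon^p \sum_{t=0}^{N} \|\varphi_t\|^p + c_p\, \Delta^p c(\varepsilon)\, \frac{n + m}{\varepsilon^p} \Big[ o\Big( \sum_{t=0}^{N} \|\varphi_t\|^p \Big) + O(1) \Big] \\
&\le c_p\, \varepsilon^p\, O\Big( \sum_{t=0}^{N} \|\varphi_t\|^p + N \Big) + c_p\, \Delta^p c(\varepsilon)\, \frac{n + m}{\varepsilon^p}\, o\Big( \sum_{t=0}^{N} \|\varphi_t\|^p + N \Big),
\end{aligned}
\]

which finally implies that

\[
\limsup_{N \to \infty} \frac{\sum_{t=0,\ t \notin B_N}^{N} |\varphi_t^T \tilde\vartheta_t|^p}{\sum_{t=0}^{N} \|\varphi_t\|^p + N} \le c_p\, \varepsilon^p .
\]

Since $\varepsilon$ is arbitrarily chosen, the thesis follows.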

REFERENCES

[1] K. Astrom and B. Wittenmark, On self-tuning regulators, Automatica, 9 (1973), pp. 185–189.

[2] A. Becker, P. R. Kumar, and C. Z. Wei, Adaptive control with the stochastic approximation algorithm: Geometry and convergence, IEEE Trans. Automat. Control, AC-30 (1985), pp. 330–338.

[3] D. P. Bertsekas, Dynamic Programming and Optimal Control, Vols. I and II, Athena Scientific, Belmont, MA, 1995.

[4] B. Betro and F. Schoen, Sequential stopping rules for the multistart algorithm in global optimisation, Math. Programming, 38 (1987), pp. 271–286.

[5] V. Borkar and P. P. Varaiya, Adaptive control of Markov chains, I: Finite parameter set, IEEE Trans. Automat. Control, AC-24 (1979), pp. 953–958.

[6] M. C. Campi, The problem of pole-zero cancellation in transfer function identification and application to adaptive stabilization, Automatica, 32 (1996), pp. 849–857.

[7] M. C. Campi and P. R. Kumar, Adaptive linear quadratic Gaussian control: The cost-biased approach revisited, SIAM J. Control Optim., 36 (1998), pp. 1890–1907.

[8] H. F. Chen and L. Guo, Convergence rate of least-squares identification and adaptive control for stochastic systems, Internat. J. Control, 44 (1986), pp. 1459–1476.

[9] H. F. Chen and L. Guo, Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost, SIAM J. Control Optim., 25 (1987), pp. 845–867.

[10] H. F. Chen and L. Guo, Identification and Stochastic Adaptive Control, Birkhauser, Boston, 1991.

[11] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, 3rd ed., Springer Texts Statist., Springer-Verlag, New York, 1997.

[12] D. F. Delchamps, Analytic feedback control and the algebraic Riccati equation, IEEE Trans. Automat. Control, AC-29 (1984), pp. 1031–1033.

[13] G. C. Goodwin, P. J. Ramadge, and P. E. Caines, Discrete-time stochastic adaptive control, SIAM J. Control Optim., 19 (1981), pp. 829–853.

[14] L. Guo, Self-convergence of weighted least-squares with applications to stochastic adaptive control, IEEE Trans. Automat. Control, 41 (1996), pp. 79–89.

[15] B. Hajek, A tutorial survey of theory and applications of simulated annealing, in Proceedings of the 24th IEEE Conference on Decision and Control, Fort Lauderdale, FL, December 1985, IEEE, Piscataway, NJ, 1985, pp. 755–760.

[16] P. Hall and C. Heyde, Martingale limit theory and its application, Probability and Mathematical Statistics, Z.W. Birnbaum and E. Lukacs, eds., Academic Press, New York, 1980.

[17] J. Kanniah, O. P. Malik, and G. S. Hope, Self-tuning regulator based on dual-rate sampling, IEEE Trans. Automat. Control, AC-29 (1984), pp. 755–759.

[18] P. R. Kumar, Optimal adaptive control of linear-quadratic-Gaussian systems, SIAM J. Control Optim., 21 (1983), pp. 163–178.

[19] P. R. Kumar, Simultaneous identification and adaptive control of unknown systems over finite parameter sets, IEEE Trans. Automat. Control, AC-28 (1983), pp. 68–76.

[20] P. R. Kumar, Convergence of adaptive control schemes using least-squares parameter estimates, IEEE Trans. Automat. Control, AC-35 (1990), pp. 416–424.

[21] P. R. Kumar and A. Becker, A new family of optimal adaptive controllers for Markov chains, IEEE Trans. Automat. Control, AC-27 (1982), pp. 137–146.

[22] P. R. Kumar and W. Lin, Optimal adaptive controllers for unknown Markov chains, IEEE Trans. Automat. Control, AC-27 (1982), pp. 765–774.

[23] T. L. Lai and C. Z. Wei, Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems, Ann. Statist., 10 (1982), pp. 154–166.

[24] W. Lin and P. Kumar, Stochastic control of a queue with two servers of different rates, in Analysis and Optimization of Systems, A. Bensoussan and J.-L. Lions, eds., Lecture Notes in Control and Inform. Sci. 44, Springer-Verlag, New York, 1982, pp. 719–728.

[25] W. Lin, P. R. Kumar, and T. I. Seidman, Will the self-tuning approach work for general cost criteria?, System Control Lett., 6 (1985), pp. 77–85.

[26] L. Ljung, System Identification: Theory for the User, Prentice-Hall, Englewood Cliffs, NJ, 1999.

[27] R. Ortega, R. Kelly, and R. Lozano-Leal, On global stability of adaptive systems using an estimator with parameter freezing, IEEE Trans. Automat. Control, AC-34 (1989), pp. 343–346.

[28] M. Prandini, Adaptive LQG Control: Optimality Analysis and Robust Controller Design, Ph.D. Thesis, University of Brescia, Brescia, Italy, 1998.

[29] M. Prandini, S. Bittanti, and M. C. Campi, A penalized identification criterion for securing controllability in adaptive control, J. Math. Systems Estim. Control, 8 (1998), pp. 491–494. (Retrieval code for the electronic version: 29460.)

[30] A. H. G. Rinnooy Kan and G. T. Timmer, Stochastic global optimization methods. I. Clustering methods, Math. Programming, 39 (1987), pp. 27–56.

[31] A. H. G. Rinnooy Kan and G. T. Timmer, Stochastic global optimization methods. II. Multilevel methods, Math. Programming, 39 (1987), pp. 57–78.

[32] K. S. Sin and G. C. Goodwin, Stochastic adaptive control using a modified least squares algorithm, Automatica, 18 (1982), pp. 315–321.

[33] J. H. van Schuppen, Tuning of Gaussian stochastic control systems, IEEE Trans. Automat. Control, AC-39 (1994), pp. 2178–2190.
