

Automatica, Vol. 25, No. 4, pp. 531-544, 1989. Printed in Great Britain.

0005-1098/89 $3.00 + 0.00 Pergamon Press plc

© 1989 International Federation of Automatic Control

Quadratic Regulator Theory for Analytic Non-linear Systems with Additive Controls†

TAKETOSHI YOSHIDA‡ and KENNETH A. LOPARO§

A quadratic regulator theory, developed for a class of non-linear systems with additive control, is based on Carleman linearization and provides a systematic methodology for feedback control synthesis; several examples illustrate the results for finite and infinite time problems.

Key Words--Optimal control; non-linear systems; optimal regulators; control systems; minimum principle; dynamic programming.

Abstract—Quadratic regulator problems over finite and infinite time intervals are solved for non-linear systems which are analytic in the state and linear in the control. The solution approach is based on using a formal power series expansion method for solving a non-linear Hamiltonian system. The technique of Carleman linearization facilitates the approach. The unique solution to the non-linear optimal regulator problem is obtained in terms of a convergent power series which satisfies the Hamilton-Jacobi-Bellman equation of dynamic programming. The realization of the optimal regulator is investigated and sufficient conditions are derived for the optimal regulator using the technique of Carleman linearization. Several examples are given to illustrate the theory.

1. INTRODUCTION

OPTIMAL CONTROL THEORY has played an important role in many problems in modern control engineering. The most notable is linear optimal control theory, in particular the linear quadratic regulator (LQR) problem. Qualitative and quantitative aspects of the LQR problem are well documented in the literature (Anderson and Moore, 1971).

For the general class of non-linear systems the optimal control formalism has not been that successful. The construction and realization of optimal control laws for non-linear systems is an immature field from an implementation perspective. There have been a wide variety of techniques and approximations used to circumvent the difficulties associated with the non-linear optimal control problem; the primary emphasis is usually on the synthesis of suboptimal control laws. For example, a perturbation method was introduced in Nishikawa et al. (1971) for the design of a suboptimal control law.

† Received 2 April 1985; revised 19 August 1986; revised 18 August 1987; received in final form 5 December 1988. The original version of this paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor V. Utkin under the direction of Editor H. Kwakernaak.

‡ Author to whom correspondence should be addressed. IBM Research, Tokyo Research Laboratory, IBM Japan, Ltd., 5-19 Sanbancho, Chiyoda-ku, Tokyo 102, Japan.

§ Department of Systems Engineering, Case Western Reserve University, Cleveland, OH 44106, U.S.A.

The class of control systems of interest in this work is characterized by smooth (or analytic) vector fields. Thus, formal power series expansion methods are applicable. There have been many papers which use a power series expansion method (e.g. Al'brekht (1962), Chap. 4 of Lee and Marcus (1967), Lukes (1969), Willemstein (1977), Zubov (1975), and Sain (1985)). The works by Lukes, Willemstein, and Zubov are the closest to the work described in this paper.

The work of Lukes (1969) deals with a general class of non-linear optimal control problems where the system dynamics and cost functions are smooth vector-valued and real-valued functions of the state and control vectors, respectively. For this set-up the main result is the existence of a unique stabilizing feedback control in the neighborhood of the origin in the state space. If the smoothness assumption is strengthened to analyticity, this work proves that the cost-to-go function associated with the stabilizing feedback control is an analytic function in the neighborhood of the origin in the state space. Willemstein (1977) extended the results of Lukes to finite time problems using the same techniques. The approach taken by Lukes and Willemstein shows that the optimal control (or cost-to-go) can be expanded in a power series about the initial state and that this power series converges in a neighborhood of the origin. Because such an approach is local in nature, it cannot be extended directly without further detailed analysis.


The work of Zubov (1975)† is essentially the same in spirit as the work of Lukes and Willemstein in that a formal power series expansion method is used to characterize the optimal control of a non-linear regulator type problem. In the work of Zubov, a perturbation-type approach is used to show that the first term of an expansion of the optimal feedback control and the optimal cost-to-go function are the same as those associated with a linear quadratic regulator problem. The behavior of the power series is only considered in the neighborhood of the origin in the state space; no global analysis is presented. It is interesting to note that references to Zubov's work from 1966 (and 1975) do not appear in the work of Lukes or Willemstein, and no references to Lukes' work appear in the later work of Zubov in 1975. In any case, the work published in these three references provided a foundation for the power series expansion method but does not establish a global theory. The usefulness of the power series expansion method is exhibited by some illustrative examples.

There are many aspects of the development which need to be investigated in more detail. The practical requirements are the realization of optimal or suboptimal control laws, the systematic development of algorithms, and so on. Moreover, a sound theoretical foundation needs to be established relating to the existence, uniqueness and convergence aspects of the (formal) power series representation of the optimal control law. The purpose of this paper is to investigate a class of free end point non-linear optimal control problems over a finite time interval. The system is assumed to be analytic in the state and linear additive in the control. This is a special case of the problem treated by Lukes, Willemstein, and Zubov, for which stronger results are obtained in this paper. A minor assumption is that the cost criterion is quadratic in the state and control. The main objectives of the paper are:

(i) the construction of the optimal control law in a systematic fashion;

(ii) the convergence analysis of the power series which defines the optimal control law;

(iii) an investigation of the realizability of the optimal control laws;

(iv) the construction of suboptimal control laws.

Throughout the paper the technique of Carleman linearization and the calculus of vectors and matrices are used (see Brockett (1973), Vetter (1973), Krener (1974), Baillieul (1976), Loparo and Blankenship (1977, 1978), or Brewer (1978) for details).

2. PROBLEM DEFINITIONS AND ASSUMPTIONS

Consider the optimal control problem
$$\text{Minimize}_{u \in U} \; J = \frac{1}{2}\int_0^T [x'(t)Qx(t) + u'(t)Ru(t)]\,dt$$
subject to
$$\frac{dx}{dt} = f(x) + Bu; \qquad x(0) = x_0, \quad t \ge 0 \qquad (1)$$
where $U$ is the set of square integrable controls defined on the time interval $[0, T]$, i.e. $L_2[0, T]$.

Assumptions.
(A1) $x \in R^n$, $u \in R^m$.
(A2) $f: R^n \to R^n$ is an analytic vector field and satisfies $f(0) = 0$; that is, $x = 0$ is an equilibrium state of the system (1) with $u \equiv 0$.
(A3) $Q = Q' = C'C \ge 0$, $R = R' > 0$.
(A4) The triple $(A, B, C)$ is completely controllable and observable, where $A$ is the Jacobian of the function $f(x)$ evaluated at the equilibrium state $x = 0$:
$$A = \left.\frac{\partial f(x)}{\partial x'}\right|_{x=0}.$$

Remarks. (i) The above problem includes the LQR problem as a special case. (ii) In the standard LQR theory the assumption (A4) guarantees the existence and uniqueness of the solution of a Riccati type equation. This assumption can be relaxed to the assumption that the triple $(A, B, C)$ is stabilizable and detectable (see Kailath and Ljung (1976) for details).

3. CONSTRUCTION OF THE OPTIMAL REGULATOR

The Hamiltonian for the problem is defined by
$$H(x, u, p, t) = \tfrac{1}{2}(x'Qx + u'Ru) + p'(f(x) + Bu)$$
where $p$ is an $n \times 1$ costate vector. Pontryagin's minimum principle asserts that along an optimal trajectory the necessary conditions are
$$\frac{\partial H}{\partial u} = 0; \qquad \frac{dx}{dt} = \frac{\partial H}{\partial p}; \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial x}$$

† The authors would like to thank Prof. Otomar Hajek for providing a review of the Russian manuscript.


where
$$\frac{\partial H}{\partial u} = Ru + B'p; \qquad \frac{\partial^2 H}{\partial u\,\partial u'} = R > 0 \quad \text{(by Assumption (A3))}.$$

Because the Hamiltonian is quadratic in the control u, the optimal control law, if it exists, minimizes the Hamiltonian globally. In this case the necessary conditions are also sufficient.

Let $u_*$ and $p_*$ represent the optimal control and costate vectors for the problem, respectively; then the optimal control law $u_*$ is given by
$$u_* = -R^{-1}B'p_* \qquad (2)$$
where the optimal costate vector $p_*$ satisfies the non-linear Hamiltonian system evaluated along an optimal state and costate trajectory pair $(x_*, p_*)$:
$$\frac{dx_*}{dt} = f(x_*) - BR^{-1}B'p_* \qquad (3)$$
$$\frac{dp_*}{dt} = -Qx_* - \left.\frac{\partial f'(x)}{\partial x}\right|_{x=x_*}p_* \qquad (4)$$
with boundary conditions
$$x(0) = x_0$$
$$p(T) = 0, \quad \text{where } T \text{ is fixed and } T < \infty.$$

In general, it is impossible to find a closed-form solution of the system (3) and (4). Note that the right-hand sides of (3) and (4) are analytic in the state and linear in the costate. Hence, the formal power series expansion method can be applied to solving the non-linear Hamiltonian system (3) and (4).

By Assumption (A2) the function $f(x)$ can be expanded in a power series about the origin; let
$$f(x) = Ax + \sum_{k=2}^{\infty} F_kx^{[k]} \qquad (5)$$
thus
$$\frac{\partial f'(x)}{\partial x} = A' + \sum_{k=2}^{\infty}\frac{\partial x^{[k]'}}{\partial x}F_k' \qquad (6)$$
where the $n \times n$ matrix $A$ is the Jacobian of $f(x)$ evaluated at the origin in $R^n$ and $x^{[k]}$ is a lexicographic listing vector (Loparo and Blankenship, 1977, 1978).
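As a concrete illustration (not part of the original paper), the following sketch constructs $x^{[k]}$ in the weighted lexicographic convention used in the examples of Section 7, where $x^{[2]} = [x_1^2, \sqrt{2}x_1x_2, \sqrt{2}x_1x_3, x_2^2, \sqrt{2}x_2x_3, x_3^2]'$ for $n = 3$; with these weights $\|x^{[k]}\| = \|x\|^k$ by the multinomial theorem.

```python
import math
import numpy as np

def multi_indices(n, k):
    """All exponent tuples (k1,...,kn) with k1+...+kn = k, lexicographically."""
    if n == 1:
        yield (k,)
        return
    for i in range(k, -1, -1):
        for rest in multi_indices(n - 1, k - i):
            yield (i,) + rest

def lex_listing(x, k):
    """Lexicographic listing vector x^[k]: degree-k monomials weighted by the
    square root of the multinomial coefficient (one common convention; it
    gives ||x^[k]|| = ||x||^k)."""
    comps = []
    for exps in multi_indices(len(x), k):
        coef = math.sqrt(math.factorial(k) /
                         math.prod(math.factorial(e) for e in exps))
        comps.append(coef * math.prod(xi**e for xi, e in zip(x, exps)))
    return np.array(comps)

x = np.array([1.0, 2.0, 3.0])
print(lex_listing(x, 2))                                        # 6 = N(3; 2) components
print(np.linalg.norm(lex_listing(x, 2)), np.linalg.norm(x)**2)  # both 14.0
```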

For the non-linear Hamiltonian system (3) and (4) assume a formal power series solution of the form
$$p_* = p_*(t; x) = \sum_{k=1}^{\infty} P_k(t)x^{[k]}. \qquad (7)$$
The boundary condition, $p_*(T; x) = 0$ for any $x$, requires that
$$P_k(T) = 0 \quad \text{for all } k.$$
From (2), the optimal regulator is formally represented by
$$u_* = u_*(t; x) = -R^{-1}B'\sum_{k=1}^{\infty} P_k(t)x^{[k]} \qquad (8)$$

for all $t \in [0, T]$. Substitute (8) into (3) to obtain
$$\frac{dx_*}{dt} = \tilde{A}(t)x_* + \sum_{j=2}^{\infty}[F_j - KP_j(t)]x_*^{[j]} \qquad (9)$$
along an optimal trajectory $x_*$, where the matrices $K$ and $\tilde{A}(t)$ are defined by
$$K = BR^{-1}B'; \qquad \tilde{A}(t) = A - KP_1(t).$$
The "moment" equations along an optimal trajectory are given by
$$\frac{dx_*^{[k]}}{dt} = \tilde{A}^{[k]}(t)x_*^{[k]} + \sum_{j=k+1}^{\infty}\big(F_{j-k+1} - KP_{j-k+1}(t)\big)_{[k,\,j-k+1]}\,x_*^{[j]}, \qquad k = 2, 3, 4, \ldots \qquad (10)$$

Substitution of (6) and (7) into (4) yields, using (9) and (10) and collecting like powers of $x_*$, the following system of differential equations for the $P_k(t)$, $k = 1, 2, 3, \ldots$:
$$\left[\frac{dP_1}{dt} + A'P_1 + P_1A - P_1KP_1 + Q\right]x_* + \sum_{k=2}^{\infty}\left[\frac{dP_k}{dt} + \tilde{A}'(t)P_k + P_k\tilde{A}^{[k]}(t) + X_k(t)\right]x_*^{[k]} = 0. \qquad (11)$$
Defining
$$\tilde{P}_{kj}(t)\,x_*^{[k+j-1]} = \left.\frac{\partial x^{[k]'}}{\partial x}\right|_{x=x_*}F_k'\,P_j(t)\,x_*^{[j]}, \qquad k = 2, 3, 4, \ldots;\ j = 1, 2, 3, \ldots$$
yields
$$X_2(t)x_*^{[2]} = [P_1(t)F_2 + \tilde{P}_{21}(t)]x_*^{[2]}$$
and
$$X_k(t)x_*^{[k]} = \left[P_1(t)F_k + \sum_{j=2}^{k-1} P_j(t)\big(F_{k-j+1} - KP_{k-j+1}(t)\big)_{[j,\,k-j+1]} + \sum_{j=2}^{k}\tilde{P}_{j,k-j+1}(t)\right]x_*^{[k]}, \qquad k = 3, 4, 5, \ldots$$

As (11) must hold for any $x_*$, we obtain

$$\frac{dP_1}{dt} + A'P_1 + P_1A - P_1KP_1 + Q = 0 \qquad (12)$$
$$\frac{dP_k}{dt} + \tilde{A}'P_k + P_k\tilde{A}^{[k]} + X_k = 0; \qquad k = 2, 3, 4, \ldots \qquad (13)$$


where the boundary conditions are
$$P_k(T) = 0; \qquad k = 1, 2, 3, \ldots$$

Remarks. (i) Equation (12) is a Riccati type equation for the linearized system, which has a unique positive definite solution. (ii) Equations (13) are linear in the $P_k(t)$, $k = 2, 3, 4, \ldots$ The following lemma facilitates the solution for the $P_k(t)$.

Lemma 3.1. Let $P(t)$, $D(t)$, $E(t)$, $F(t)$ be $n \times N$, $N \times N$, $n \times n$, $n \times N$ matrices, respectively, continuous on $[0, T]$; then the linear matrix differential equation
$$\frac{dP}{dt} + EP + PD + F = 0; \qquad P(T) = P_T \qquad (14)$$
has a unique solution of the form
$$P(t) = \Phi_E'(T, t)P_T\Phi_D(T, t) + \int_t^T \Phi_E'(r, t)F(r)\Phi_D(r, t)\,dr \qquad (15)$$
where $\Phi_D(r, t)$ and $\Phi_E(r, t)$ are defined as the unique solutions of
$$\frac{\partial \Phi_D(r, t)}{\partial r} = D(r)\Phi_D(r, t); \qquad \frac{\partial \Phi_E(r, t)}{\partial r} = E'(r)\Phi_E(r, t)$$
$$\Phi_D(t, t) = I_N; \qquad \Phi_E(t, t) = I_n \qquad (16)$$
for all $r, t \in [0, T]$, $r \ge t$, where $I_N$ and $I_n$ are the identity matrices on $R^{N \times N}$ and $R^{n \times n}$, respectively.

Proof. Differentiation of (15) with respect to $t$ yields
$$\frac{dP(t)}{dt} = \frac{\partial \Phi_E'(T, t)}{\partial t}P_T\Phi_D(T, t) + \Phi_E'(T, t)P_T\frac{\partial \Phi_D(T, t)}{\partial t} - F(t) + \int_t^T\left[\frac{\partial \Phi_E'(r, t)}{\partial t}F(r)\Phi_D(r, t) + \Phi_E'(r, t)F(r)\frac{\partial \Phi_D(r, t)}{\partial t}\right]dr$$
(the term $-F(t)$ arises from the lower limit of the integral, since $\Phi_D(t, t) = I_N$ and $\Phi_E(t, t) = I_n$). Note that
$$\Phi_D(r, t)\Phi_D(t, r) = I_N; \qquad \frac{\partial \Phi_D(r, t)}{\partial t} = -\Phi_D(r, t)D(t)$$
and
$$\Phi_E(r, t)\Phi_E(t, r) = I_n; \qquad \frac{\partial \Phi_E'(r, t)}{\partial t} = -E(t)\Phi_E'(r, t).$$
Hence, $P(t)$ satisfies (14). Set $t = T$ in (15) to see that $P(T) = P_T$; the solution is unique. This completes the proof. QED
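In practice (12) is integrated backward from $P_1(T) = 0$; once $P_1$ (and hence $\tilde{A}(t)$) is known, each linear equation (13) can be integrated backward in the same way, or evaluated via (15). A minimal numerical sketch of the Riccati step, with illustrative matrices that are not from the paper:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of the Riccati equation (12),
#   dP1/dt = -(A'P1 + P1 A - P1 K P1 + Q),  P1(T) = 0,  K = B R^{-1} B'.
# A, B, Q, R below are illustrative placeholders.
n, T = 2, 5.0
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.eye(n); Q = np.eye(n); R = np.eye(n)
K = B @ np.linalg.inv(R) @ B.T

def riccati_rhs(t, p):
    P = p.reshape(n, n)
    return -(A.T @ P + P @ A - P @ K @ P + Q).ravel()

# solve_ivp accepts a decreasing time span, so integrate from T down to 0.
sol = solve_ivp(riccati_rhs, [T, 0.0], np.zeros(n * n), dense_output=True)
P1 = lambda t: sol.sol(t).reshape(n, n)
print(P1(0.0))
# With P1 known, A~(t) = A - K P1(t), and each equation (13) is linear in
# Pk(t), so it can be integrated backward from Pk(T) = 0 in the same way.
```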

A formal representation of the optimal regulator of the form (8) has been derived. The coefficient matrices of the power series are the solutions of (12) and (13). In order to complete the construction procedure for the optimal regulator we must establish the convergence of the power series given in (8). For this purpose it is sufficient to examine the convergence of the power series given in (7).

Lemma 2.2. For sufficiently small $\|x_0\|$ the power series
$$p_*(t; x)\big|_{x=x_*} = \sum_{k=1}^{\infty} P_k(t)x_*^{[k]}(t); \qquad x(0) = x_0$$
converges absolutely and uniformly on $[0, T]$, where $x_*$ is the optimal trajectory and the coefficient matrices $P_k(t)$ are the solutions of (12) and (13).

Remark. This is essentially the same as the result of Willemstein (1977). For an alternate proof, using Weierstrass' M-test, the interested reader is referred to Yoshida (1984) for details.

4. HAMILTON-JACOBI-BELLMAN EQUATION

Lemma 2.2 guarantees that the optimal regulator satisfies the Hamilton-Jacobi-Bellman (H-J-B) equation, at least locally, as well as the necessary conditions for optimality. The solution of the H-J-B equation can be used to evaluate the cost performance of the optimal regulator. This section concentrates on sufficient conditions for optimality and the performance evaluation of the optimal regulator constructed in the previous section. Moreover, we will prove the global convergence of the power series which characterizes the optimal regulator in connection with the H-J-B equation.

Define the value function of the problem by
$$v(x(t), t) = \min_{u \in U}\frac{1}{2}\int_t^T [x'(r)Qx(r) + u'(r)Ru(r)]\,dr \qquad (17)$$
where the minimization on the right is performed subject to
$$\frac{dx}{dt} = f(x) + Bu.$$
The value function $v(x(t), t)$ satisfies the H-J-B equation
$$\frac{\partial v}{\partial t} + \min_u\left[\tfrac{1}{2}(x'Qx + u'Ru) + \frac{\partial v'}{\partial x}(f(x) + Bu)\right] = 0 \qquad (18)$$

with boundary condition
$$v(x(T), T) = 0$$
for any $x(T)$, where $x$ is evaluated along an optimal trajectory. The optimal control law, $u_*$, is given by
$$u_* = -R^{-1}B'\frac{\partial v}{\partial x}. \qquad (19)$$

The relationship between the minimum principle and dynamic programming for the problem is given along an optimal trajectory by
$$p_*(t; x) = \frac{\partial v(x, t)}{\partial x} \qquad (20)$$
where $p_*(t; x)$ is given by
$$p_*(t; x) = \sum_{k=1}^{\infty} P_k(t)x^{[k]}. \qquad (21)$$

In Lemma 2.2 the series (21) has been shown to be absolutely and uniformly convergent on $[0, T]$ in a neighborhood of the origin in $R^n$. Thus one can conclude that there exists a power series representation for the value function
$$v(x(t), t) = \sum_{k=1}^{\infty} x'(t)V_k(t)x^{[k]}(t) \qquad (22)$$
which is an absolutely and uniformly convergent series on $[0, T]$, in the same neighborhood of the origin. The boundary condition, $v(x(T), T) = 0$ for any $x(T)$, implies
$$V_k(T) = 0 \quad \text{for all } k.$$
The matrices $V_k$ are $n \times N(n; k)$ matrices, where
$$N(n; k) = \binom{n + k - 1}{k}.$$

Define
$$\mu(u) = \frac{d}{dt}\left[\sum_{k=1}^{\infty} x'(t)V_k(t)x^{[k]}(t)\right] + \tfrac{1}{2}\big(x'(t)Qx(t) + u'(t)Ru(t)\big) \qquad (23)$$
where
$$\frac{dx}{dt} = f(x) + Bu; \qquad x(0) = x_0, \quad t \in [0, T].$$

Theorem 4.1. For any $u \in U$, given $p_*(t; x)$ as in (21), we have $\mu(u) \ge 0$, where the equality holds if
$$u = u_*(t; x)\big|_{x=x_*} = -R^{-1}B'\sum_{k=1}^{\infty} P_k(t)x_*^{[k]};$$
that is, the optimal regulator $u_*(t; x)$ satisfies the H-J-B equation. Furthermore, the minimum cost is given by
$$J_{\min} = \sum_{k=1}^{\infty}\frac{1}{(k+1)}\,x_0'P_k(0)x_0^{[k]}.$$

Proof. Refer to Yoshida (1984) for details.

Theorem 4.1 asserts that if the optimal control has the representation (8) as a convergent power series, then this control satisfies the H-J-B equation. Thus, the H-J-B equation can be used to study the convergence properties of (8). The works of Lukes, Willemstein, and Zubov, as well as Lemma 2.2, state that locally, in a neighborhood of the origin, the series (21) is convergent. Next we will show that this region of convergence can be extended to an open subset of $R^n$ for which (1) has a unique well-defined solution for all admissible controls $u$.

Consider the H-J-B equation; that is,
$$\frac{d}{dt}\left[\sum_{k=1}^{\infty}\frac{1}{(k+1)}\,x_*'(t)P_k(t)x_*^{[k]}(t)\right] + \tfrac{1}{2}\big(x_*'(t)Qx_*(t) + u_*'(t)Ru_*(t)\big) = 0 \qquad (24)$$
along the optimal trajectory $x_*$, where $P_k(T) = 0$ for all $k$.

Integration of (24) over $[t, T]$, $0 \le t \le T$, yields
$$\sum_{k=1}^{\infty}\frac{1}{(k+1)}\,x_*'(t)P_k(t)x_*^{[k]}(t) = \frac{1}{2}\int_t^T [x_*'(r)Qx_*(r) + u_*'(r)Ru_*(r)]\,dr. \qquad (25)$$

Define
$$v(x_*(t), t) = \sum_{k=1}^{\infty}\frac{1}{(k+1)}\,x_*'(t)P_k(t)x_*^{[k]}(t) \ge 0 \qquad (26)$$
along the optimal trajectory $x_*$; then from (25)
$$v(x_*(t), t) \le \frac{1}{2}\int_t^T [x'(r)Qx(r) + u'(r)Ru(r)]\,dr \qquad (27)$$
for any $u \in U$ and $x(t)$ satisfying (1).

Set $u = 0$ in (27) to obtain
$$v(x_*(t), t) \le \frac{1}{2}\int_t^T x'(r)Qx(r)\,dr \qquad (28)$$
where $x$ on the right-hand side is governed by the homogeneous differential equation
$$\frac{dx}{dt} = f(x); \qquad x(0) = x_0 \in W \subseteq R^n \qquad (29)$$
where $W$ is an open subset of $R^n$ such that if $x_0 \in W$ then the solution of (29) is well defined and has no finite escape time on $[0, T]$; from now on we will assume $W = R^n$.

Since the matrix $Q$ in (28) is positive semidefinite, we obtain
$$v(x_*(t), t) \le \frac{1}{2}\int_0^T x'(t)Qx(t)\,dt.$$

Thus, the series $v(x_*(t), t)$ defined by (26) is uniformly bounded on $[0, T]$. This fact will play an important role in the following theorem. Before proceeding with the theorem we state a property of solutions of analytic systems.

Lemma 4.1. Consider a system described by
$$\frac{dx}{dt} = f(x), \qquad x(0) = x_0 \in R^n \qquad (30)$$
where $f(0) = 0$ and $f: R^n \to R^n$ is an analytic vector field. Then for each $x_0 \in R^n$ there exists a vector $z_0 \in R^n$ such that if $z(t)$ satisfies (30) with $z(0) = z_0$ then $\|x(t)\| < \|z(t)\|$ for all $t \in [0, T]$.

Proof. Step 1. Suppose $f(x) = Ax$, a linear vector field. Then the unique solution of (30) with $x(0) = x_0$ is given by
$$x(t) = e^{At}x_0.$$
Then, if $z_0 = \alpha x_0$ with $\alpha > 1$, $z(t) = \alpha x(t)$, which proves the lemma for linear systems.

Step 2. Loparo and Blankenship (1978) proved an approximation theorem for systems of the type (30) using the technique of Carleman linearization. Essentially, for systems of the type (30) defined on a compact interval $[0, T]$, given any $\varepsilon > 0$, $T < \infty$, $x_0 \in R^n$, there exists a finite dimensional linear system defined on $R^p$, $p = p(\varepsilon, T, x_0)$,
$$\frac{d}{dt}y_p(t) = A_p y_p(t); \qquad y^p(t) = C_p y_p(t)$$
with $y_p(0)$ chosen appropriately, $y^p(0) = x_0$, and
$$\sup_{t \in [0, T]}\|y^p(t) - x(t)\| < \varepsilon. \qquad (31)$$
If $x_0 = 0$ then the desired result follows trivially, so assume that $x_0 \neq 0$. Then $y_p(t) \neq 0$ for all $t \in [0, T]$, and from (31) it follows that
$$\|x(t)\| < \sigma\|y_p(t)\| + \varepsilon \quad \text{for some } \sigma > 0 \text{ and all } t \in [0, T]. \qquad (32)$$

As $y_p(t)$ is a continuous function defined on a compact set, define $\delta_p$ such that
$$0 < \delta_p = \min_{t \in [0, T]}\|y_p(t)\|. \qquad (33)$$
Then, from (32) and (33) it follows that
$$\|x(t)\| < \beta_p\|y_p(t)\|, \qquad \sigma + \varepsilon/\delta_p < \beta_p. \qquad (34)$$
Define $z_0 = \gamma_p x_0$ with $\beta_p < \gamma_p$; then it follows that with this set of initial conditions $\|x(t)\| < \|z(t)\|$ on $[0, T]$. QED

Theorem 4.2. Suppose the solution of (1) with $u = 0$ has no finite escape time. The optimal regulator
$$u_*(t; x) = -R^{-1}B'\sum_{k=1}^{\infty} P_k(t)x^{[k]}(t), \qquad x(0) = x_0$$
converges absolutely and uniformly on $[0, T]$ for any $x_0 \in R^n$.

Proof. Let $x_*(t)$ be the optimal trajectory for the system with $x(0) = x_0$ given. Since $v(x_*(t), t)$ is uniformly bounded as a function of $t$ on $[0, T]$, it follows that there exists a constant $C_k < \infty$, independent of $x_*(t)$ and $t$, such that
$$\frac{1}{k+1}\,\big|x_*'(t)P_k(t)x_*^{[k]}(t)\big| < C_k\|x_*(t)\|^{k+1}.$$
The matrix valued functions $P_k(t)$ are continuous on $[0, T]$ and, as such, there exist constant matrices $P_k^m$ and $P_k^M$ such that
$$|x'P_k^m y| \le |x'P_k(t)y| \le |x'P_k^M y| \quad \text{for all } x \in R^n,\ y \in R^{N(n;k)} \text{ and } t \in [0, T].$$
Using Lemma 4.1 select a vector $z_0 \in R^n$ such that if $z_*(t)$ is the associated optimal trajectory then:
(i) $\|x_*(t)\| < \|z_*(t)\|$;
(ii) $|x_*'(t)P_k^M x_*^{[k]}(t)| \le |z_*'(t)P_k^M z_*^{[k]}(t)|$, $k = 1, 2, 3, \ldots$
It follows that
$$\frac{1}{k+1}\,\big|x_*'(t)P_k(t)x_*^{[k]}(t)\big| < \left\{\frac{\|x_*(t)\|}{\|z_*(t)\|}\right\}^{k+1}\big|z_*'(t)P_k(t)z_*^{[k]}(t)\big|.$$
Then
$$\sum_{k=1}^{\infty}\big|x_*'(t)P_k(t)x_*^{[k]}(t)\big| \le \sum_{k=1}^{\infty} C a^{k+1}$$
where $0 < C < \infty$ and
$$a = \sup_{t \in [0, T]}\frac{\|x_*(t)\|}{\|z_*(t)\|} < 1.$$


Therefore, the series on the left in the above inequality converges absolutely and uniformly on $[0, T]$ for any $x_0 \in R^n$. Differentiation of $v(x(t), t)$ with respect to $x$ along an optimal path yields
$$\left.\frac{\partial v}{\partial x}\right|_{x=x_*} = \sum_{k=1}^{\infty} P_k(t)x_*^{[k]}(t).$$
This series also converges absolutely and uniformly on $[0, T]$ for any $x_0 \in R^n$; the proof is complete. QED

Remark. If the solution of (1) with $u = 0$ has a finite escape time $t_f = t_f(x_0)$, then define a set $W$ such that
$$W = \{x_0 \in R^n \mid T < t_f(x_0)\}, \qquad T \text{ given}.$$
In this case the region of convergence of the series which defines the optimal control is restricted to $W \subseteq R^n$.

5. REALIZATION OF OPTIMAL REGULATORS

We have obtained the optimal regulator in the form
$$u_*(t; x) = -R^{-1}B'\sum_{k=1}^{\infty} P_k(t)x^{[k]}(t). \qquad (35)$$
In order for the optimal regulator to be realizable, the series in (35) must be of finite length or must converge to a computable analytic function. Here, the trivial situation of the former case is $P_k(t) = 0$ for all $k \ge 2$, and the latter case is equivalent to the situation where the Hamiltonian system defined by (3) and (4) has a closed-form solution.

We begin the investigation with the following theorem.

Theorem 5.1. The optimal regulator is realizable as a finite series by
$$u_*(t; x) = -R^{-1}B'\sum_{k=1}^{N-1} P_k(t)x^{[k]}(t)$$
if and only if there exists a positive integer $N \ge 2$ such that
$$X_k(t) = 0$$
for all $t \in [0, T]$ and all $k \ge N$. Moreover, the minimum cost is given by
$$J_{\min} = \sum_{k=1}^{N-1}\frac{1}{(k+1)}\,x_0'P_k(0)x_0^{[k]}.$$

Proof. What we have to show is that
$$P_k(t) = 0 \quad \text{iff} \quad X_k(t) = 0.$$
The matrices $P_k(t)$ are given by
$$P_k(t) = \int_t^T \phi_{\tilde{A}}'(r, t)X_k(r)\phi_{\tilde{A}^{[k]}}(r, t)\,dr \qquad (36)$$
for all $t \in [0, T]$. Suppose $X_k(r) = 0$ for all $r \in [t, T]$; then it is obvious that $P_k(t) = 0$ for all $t \in [0, T]$. Suppose $P_k(t) = 0$ for all $t \in [0, T]$. As $\phi_{\tilde{A}}$ and $\phi_{\tilde{A}^{[k]}}$ are transition maps, $X_k(r)$ must be equal to zero for all $r \in [t, T]$. This completes the proof. QED

As a corollary to the above theorem we have the well-known result in LQR theory.

Corollary 5.1. If $f(x) = Ax$ the optimal regulator is realizable by
$$u_*(t; x) = -R^{-1}B'P_1(t)x(t)$$
where $P_1(t)$ is the solution of the Riccati equation (12). Moreover, the minimum cost is given by
$$J_{\min} = \tfrac{1}{2}\,x_0'P_1(0)x_0.$$
Proof. The fact that $F_k = 0$ for $k = 2, 3, 4, \ldots$ implies that $X_k(t) = 0$ for $k \ge 2$. From Theorem 5.1, $P_k(t) = 0$ for all $k \ge 2$ and the desired result follows. QED

In general, the realizability condition in Theorem 5.1 is difficult to verify or check. However, restricting ourselves to special situations, we can further investigate realizability conditions. Of particular interest is the extension of the results of Athans et al. (1963), which use a norm-invariant condition.

Consider the optimal control problem
$$\text{Minimize } J = \frac{1}{2}\int_0^T [x'(t)Qx(t) + u'(t)Ru(t)]\,dt$$
subject to
$$\frac{dx}{dt} = Ax + \sum_{i=2}^{k} F_i x^{[i]} + Bu; \qquad x(0) = x_0.$$
Suppose that
$$x'P_1(t)F_i x^{[i]} = 0 \quad \text{for any } x \in R^n,\ i = 2, 3, \ldots, k. \qquad (37)$$

Then we have
$$x'X_2 x^{[2]} = x'(P_1F_2 + \tilde{P}_{21})x^{[2]} = 3x'P_1F_2x^{[2]} = 0$$
and thus $P_2(t) = 0$ for all $t \in [0, T]$ by Theorem 5.1. Similarly, we obtain in order
$$x'X_i x^{[i]} = (i+1)\,x'P_1F_i x^{[i]} = 0; \qquad i = 3, 4, 5, \ldots$$
and thus $P_i(t) = 0$ for all $t \in [0, T]$, $i = 3, 4, 5, \ldots$ Therefore, for this control problem the optimal regulator is given by
$$u_*(t; x) = -R^{-1}B'P_1(t)x(t)$$


where $P_1(t)$ is the solution of the Riccati equation
$$\frac{dP_1}{dt} + A'P_1 + P_1A - P_1KP_1 + Q = 0; \qquad P_1(T) = 0$$
and the minimum cost is
$$J_{\min} = \tfrac{1}{2}\,x_0'P_1(0)x_0.$$

Note that if we have the conditions:
(i) $x'Ax = 0$ for any $x \in R^n$;
(ii) $BR^{-1}B' = K = \delta I_n$; $Q = \gamma I_n$ for real positive constants $\delta$ and $\gamma$;
then condition (37) reduces to
$$x'F_i x^{[i]} = 0, \qquad i = 2, 3, \ldots, k.$$
In this case we have
$$u_*(t; x) = -R^{-1}B'P_1(t)x(t)$$
where
$$P_1(t) = (\gamma/\delta)^{1/2}\tanh\,[(\gamma\delta)^{1/2}(T - t)]\,I_n.$$
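A quick numerical check (not in the original) of this closed form: under conditions (i) and (ii), (12) reduces to the scalar equation $dp/dt = \delta p^2 - \gamma$, $p(T) = 0$, with $P_1(t) = p(t)I_n$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Check p(t) = sqrt(gamma/delta)*tanh(sqrt(gamma*delta)*(T - t)) against
# backward integration of dp/dt = delta*p^2 - gamma, p(T) = 0.
# gamma, delta, T are illustrative values.
gamma, delta, T = 2.0, 0.5, 3.0
sol = solve_ivp(lambda t, p: delta * p**2 - gamma, [T, 0.0], [0.0],
                dense_output=True, rtol=1e-10, atol=1e-12)
for t in (0.0, 1.0, 2.0):
    closed = np.sqrt(gamma / delta) * np.tanh(np.sqrt(gamma * delta) * (T - t))
    print(closed, sol.sol(t)[0])    # the two values agree at each t
```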

6. SUBOPTIMAL REGULATORS

In the previous section we have seen that a closed-form analytic realization of the optimal regulator is impossible in most general situations. It is natural to consider truncating the optimal regulator series to obtain a suboptimal regulator. In this section we investigate this intuitive idea.

Consider the system described by
$$\frac{dx}{dt} = f(x) + Bu \qquad (38)$$
with the regulator
$$u_* = -R^{-1}B'\sum_{k=1}^{\infty} P_k(t)x^{[k]}$$
where
$$f(x) = Ax + \sum_{k=2}^{\infty} F_kx^{[k]}.$$
Hence, (38) becomes
$$\frac{dx}{dt} = \tilde{A}(t)x + \sum_{k=2}^{\infty}\tilde{F}_k(t)x^{[k]}$$
where
$$K = BR^{-1}B'; \qquad \tilde{A}(t) = A - KP_1(t); \qquad \tilde{F}_k(t) = F_k - KP_k(t).$$
The moment equations are given by
$$\frac{dx^{[k]}}{dt} = \tilde{A}^{[k]}(t)x^{[k]} + \sum_{j=k+1}^{\infty} G_{k,j-k+1}(t)x^{[j]}, \qquad k = 2, 3, 4, \ldots$$
where
$$G_{k,j-k+1}(t) = (\tilde{F}_{j-k+1}(t))_{[k,\,j-k+1]}.$$
Thus, one can write (Loparo and Blankenship, 1977, 1978)
$$\frac{d}{dt}\begin{bmatrix} x \\ x^{[2]} \\ \vdots \\ x^{[p]} \end{bmatrix} = A_p(t)\begin{bmatrix} x \\ x^{[2]} \\ \vdots \\ x^{[p]} \end{bmatrix} + r_p(x, t).$$
Let $y_p(t)$ be the solution of the truncated (linear) system (i.e. $r_p(x, t) \equiv 0$); that is
$$\frac{dy_p}{dt} = A_p(t)y_p; \qquad y_p(0) = y_{p0}$$
$$x^{(p)}(t) = C_p y_p(t); \qquad C_p = [I_n \mid 0]$$
where
$$y_p(0) = \begin{bmatrix} x_0 \\ x_0^{[2]} \\ \vdots \\ x_0^{[p]} \end{bmatrix}; \qquad A_p(t) = \begin{bmatrix} \tilde{A}(t) & G_{12}(t) & \cdots & G_{1,p}(t) \\ 0 & \tilde{A}^{[2]}(t) & \cdots & G_{2,p-1}(t) \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \tilde{A}^{[p]}(t) \end{bmatrix}.$$

Theorem 6.1. (Approximation theorem.) Let $\varepsilon > 0$, $T < \infty$, $x_0 \in W \subseteq R^n$ be given and let $x(t)$ be the solution of (38) such that $x(0) = x_0$, where $W$ is an open set such that if $x_0 \in W$ then $x(t)$ has no finite escape time on $[0, T]$; then there exists a positive integer $p = p(\varepsilon, T, x_0)$ such that
$$\sup_{0 \le t \le T}\|x(t) - x^{(p)}(t)\| < \varepsilon. \qquad (39)$$

Proof. See Loparo and Blankenship (1978).

The above theorem suggests the existence of a suboptimal regulator which is almost optimal in the sense of (39), and it suggests a natural way of constructing one. Hence, the systems which give rise to solutions with finite escape times are of no interest.

A suboptimal regulator of order $s$ is given by
$$u_s(t; x) = -R^{-1}B'\sum_{k=1}^{s} P_k(t)x^{[k]}(t).$$

The cost is given by the corresponding truncation
$$J_s = \sum_{k=1}^{s}\frac{1}{(k+1)}\,x_0'P_k(0)x_0^{[k]}.$$
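The construction behind Theorem 6.1 is easy to exhibit for a scalar polynomial drift (the closed-loop case is identical in structure, with $\tilde{A}(t)$ and $\tilde{F}_k(t)$ in place of constant coefficients). The sketch below, with illustrative constants of our choosing, builds the truncated Carleman matrix $A_p$ for $dx/dt = ax + f_2x^2 + f_3x^3$ and compares $x^{(p)}(t)$ with the true solution:

```python
import numpy as np
from scipy.integrate import solve_ivp

# For dx/dt = a*x + f2*x^2 + f3*x^3 the moments obey
#   d(x^k)/dt = k*a*x^k + k*f2*x^(k+1) + k*f3*x^(k+2),
# so truncating at order p gives a linear system dy/dt = A_p y with
# y = (x, x^2, ..., x^p)' and x^(p)(t) = C_p y(t), C_p = [1, 0, ..., 0].
a, f2, f3, p = -1.0, 0.3, -0.2, 6

Ap = np.zeros((p, p))
for k in range(1, p + 1):
    Ap[k - 1, k - 1] = k * a                  # diagonal block k*a
    if k + 1 <= p: Ap[k - 1, k] = k * f2      # coupling to x^(k+1)
    if k + 2 <= p: Ap[k - 1, k + 1] = k * f3  # coupling to x^(k+2)

x0, T = 0.5, 5.0
y0 = np.array([x0**k for k in range(1, p + 1)])
lin = solve_ivp(lambda t, y: Ap @ y, [0, T], y0, dense_output=True)
nonlin = solve_ivp(lambda t, x: a*x + f2*x**2 + f3*x**3, [0, T], [x0],
                   dense_output=True)
for t in np.linspace(0, T, 6):
    print(t, nonlin.sol(t)[0], lin.sol(t)[0])  # x(t) vs its Carleman approximation
```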


7. EXAMPLES

Example 1.

Minimize
$$J = \frac{1}{2}\int_0^T 2(x^2 + u^2)\,dt$$
subject to
$$\frac{dx}{dt} = x^3 + u; \qquad x(0) = x_0.$$
Here, $A = 0$, $B = 1$, $Q = R = 2$, and $f(x) = x^3$; thus
$$F_3 = 1 \quad \text{and} \quad F_k = 0 \text{ for } k = 2, 4, 5, \ldots$$

The Riccati equation becomes
$$\frac{dP_1}{dt} - \tfrac{1}{2}P_1^2 + 2 = 0, \qquad P_1(T) = 0.$$
The solution is given by
$$P_1(t) = 2\tanh(T - t).$$

For $P_2(t)$ we have
$$\frac{dP_2}{dt} + \tilde{A}'P_2 + P_2\tilde{A}^{[2]} + X_2 = 0; \qquad P_2(T) = 0$$
where
$$K = BR^{-1}B' = \tfrac{1}{2}$$
$$\tilde{A} = \tilde{A}' = A - KP_1 = -\tanh(T - t)$$
$$\tilde{A}^{[2]} = -2\tanh(T - t)$$
$$X_2x^{[2]} = P_1F_2x^{[2]} + \frac{\partial x^{[2]'}}{\partial x}F_2'P_1x = 0 \quad \text{since } F_2 = 0.$$
Thus $X_2 = 0$ and by Lemma 3.1 we have that $P_2(t) = 0$.

For $P_3(t)$ we have
$$\frac{dP_3}{dt} + \tilde{A}'P_3 + P_3\tilde{A}^{[3]} + X_3 = 0; \qquad P_3(T) = 0$$
where
$$\tilde{A}^{[3]} = -3\tanh(T - t)$$
$$X_3(t) = 8\tanh(T - t).$$
Hence
$$P_3(t) = 2[1 - \{\cosh(T - t)\}^{-4}].$$

The matrices $P_k(t)$, $k = 4, 5, 6, \ldots$, can be obtained in the same manner. The optimal regulator is given by
$$u_*(t; x) = -[\tanh(T - t)]x(t) - [1 - \{\cosh(T - t)\}^{-4}]x^3(t) + \cdots$$
The minimum cost is given by
$$J_{\min} = x_0^2\tanh T + \tfrac{1}{2}x_0^4[1 - (\cosh T)^{-4}] + \cdots$$

Remarks. (i) This result is the same as that obtained by Willemstein (1977) and in Fliess and Bourdache-Siguerdidjane (1984), where different methods for obtaining the coefficients of the power series expansion are presented.

(ii) Consider the system
$$\frac{dx}{dt} = x^3; \qquad x(0) = x_0.$$
The solution is given by
$$x(t) = x_0(1 - 2x_0^2t)^{-1/2}.$$
Hence the solution has a finite escape time $t_f = 1/(2x_0^2)$; thus the convergence of the optimal regulator and the minimum cost function are guaranteed only on the region $W$ defined by
$$W = \{x_0 \in R : |x_0| < 1/\sqrt{2T}\}; \qquad T \text{ given}.$$
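The computations of this example are easily checked numerically. The sketch below (with $T = 2$ and $x_0 = 0.4$ chosen for illustration, inside $W$) verifies the closed forms of $P_1$ and $P_3$ by backward integration of (12) and (13), and simulates the third-order regulator:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Example 1: dx/dt = x^3 + u, Q = R = 2, so u = -(1/2)*(P1*x + P3*x^3).
T = 2.0

def backward(rhs, pT):
    return solve_ivp(rhs, [T, 0.0], [pT], dense_output=True, rtol=1e-10).sol

P1 = backward(lambda t, p: 0.5 * p[0]**2 - 2.0, 0.0)             # from (12)
P3 = backward(lambda t, p: 4*np.tanh(T-t)*p[0] - 8*np.tanh(T-t), 0.0)  # from (13)
print(P1(0.0)[0], 2*np.tanh(T))                                  # agree
print(P3(0.0)[0], 2*(1 - np.cosh(T)**-4))                        # agree

def closed_loop(t, x):
    u = -0.5 * (2*np.tanh(T-t)*x[0] + 2*(1 - np.cosh(T-t)**-4)*x[0]**3)
    return [x[0]**3 + u]

x0 = 0.4                         # |x0| < 1/sqrt(2T) = 0.5, so inside W
traj = solve_ivp(closed_loop, [0.0, T], [x0])
print(traj.y[0, -1])             # state regulated toward the origin
```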

Example 2.

Minimize
$$J = \frac{1}{2}\int_0^T [\|x(t)\|^2 + \|u(t)\|^2]\,dt$$
subject to
$$\frac{d}{dt}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} a\,x_2x_3 \\ b\,x_3x_1 \\ c\,x_1x_2 \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}; \qquad x(0) = x_0. \qquad (40)$$
Let $x = [x_1, x_2, x_3]'$; using the notation introduced earlier, (40) can be written as
$$\frac{dx}{dt} = F_2x^{[2]} + u; \qquad x(0) = x_0$$
where
$$F_2 = \begin{bmatrix} 0 & 0 & 0 & 0 & a/\sqrt{2} & 0 \\ 0 & 0 & b/\sqrt{2} & 0 & 0 & 0 \\ 0 & c/\sqrt{2} & 0 & 0 & 0 & 0 \end{bmatrix}$$
$$x^{[2]} = [x_1^2, \sqrt{2}x_1x_2, \sqrt{2}x_1x_3, x_2^2, \sqrt{2}x_2x_3, x_3^2]'$$
$$u = [u_1, u_2, u_3]'$$
and $A = 0$, $B = I_3$, $Q = R = I_3$, $F_k = 0$ for $k = 3, 4, 5, \ldots$

The Riccati equation becomes
$$\frac{dP_1}{dt} - P_1^2 + I_3 = 0; \qquad P_1(T) = 0.$$
The solution is given by
$$P_1(t) = [\tanh(T - t)]I_3.$$

For $P_2(t)$ we have
$$\frac{dP_2}{dt} + \tilde{A}'P_2 + P_2\tilde{A}^{[2]} + X_2 = 0; \qquad P_2(T) = 0 \qquad (41)$$
where
$$\tilde{A}(t) = -P_1(t) = -[\tanh(T - t)]I_3$$
$$\tilde{A}^{[2]}(t) = -[\tanh(T - t)]I^{[2]} = -[2\tanh(T - t)]I_6$$
$$X_2(t) = P_1(t)F_2 + \tilde{P}_{21}(t)$$
$$\tilde{P}_{21}(t)x^{[2]} = \frac{\partial x^{[2]'}}{\partial x}F_2'P_1(t)x$$

with
$$\frac{\partial x^{[2]'}}{\partial x} = \begin{bmatrix} 2x_1 & \sqrt{2}x_2 & \sqrt{2}x_3 & 0 & 0 & 0 \\ 0 & \sqrt{2}x_1 & 0 & 2x_2 & \sqrt{2}x_3 & 0 \\ 0 & 0 & \sqrt{2}x_1 & 0 & \sqrt{2}x_2 & 2x_3 \end{bmatrix}.$$

Thus
$$\tilde{P}_{21}(t) = \tanh(T - t)\begin{bmatrix} 0 & 0 & 0 & 0 & (b+c)/\sqrt{2} & 0 \\ 0 & 0 & (a+c)/\sqrt{2} & 0 & 0 & 0 \\ 0 & (a+b)/\sqrt{2} & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Hence the matrix $X_2(t)$ becomes
$$X_2(t) = \tanh(T - t)\begin{bmatrix} 0 & 0 & 0 & 0 & q/\sqrt{2} & 0 \\ 0 & 0 & q/\sqrt{2} & 0 & 0 & 0 \\ 0 & q/\sqrt{2} & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (42)$$
where $q = a + b + c$. Therefore, the solution of (41) is given by
$$P_2(t) = \tfrac{1}{3}[1 - \{\cosh(T - t)\}^{-3}]\bar{X}_2$$
where
$$\bar{X}_2 = \begin{bmatrix} 0 & 0 & 0 & 0 & q/\sqrt{2} & 0 \\ 0 & 0 & q/\sqrt{2} & 0 & 0 & 0 \\ 0 & q/\sqrt{2} & 0 & 0 & 0 & 0 \end{bmatrix}.$$

The matrices $P_k(t)$, $k = 3, 4, 5, \ldots$ can be obtained in the same manner. The optimal regulator is given by
$$u_*(t; x) = -P_1(t)x(t) - P_2(t)x^{[2]}(t) + \cdots$$
and the minimum cost becomes
$$J_{\min} = \tfrac{1}{2}\,x_0'P_1(0)x_0 + \tfrac{1}{3}\,x_0'P_2(0)x_0^{[2]} + \cdots$$

Example 3. (Rotational motion of a rigid body.) Consider Example 2 and suppose $a + b + c = 0$; then
$$P_1(t) = [\tanh(T - t)]I_3.$$
From (42) we have $X_2(t) = 0$ and thus $P_2(t) = 0$. Similarly, we can show that $P_k(t) = 0$, $k = 3, 4, 5, \ldots$ Therefore, the optimal regulator becomes
$$u_*(t; x) = -[\tanh(T - t)]x(t)$$
and the minimum cost is given by
$$J_{\min} = \tfrac{1}{2}(\tanh T)\|x_0\|^2.$$

This result is obvious from the investigation of Section 5.
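The collapse to a linear regulator can be traced to norm invariance of the drift: $x'f(x) = (a + b + c)x_1x_2x_3$, which vanishes identically when $a + b + c = 0$, so condition (37) holds with $P_1$ proportional to the identity. A small numeric check (ours, not from the paper):

```python
import numpy as np

# For f(x) = (a*x2*x3, b*x3*x1, c*x1*x2)' one has
# x'f(x) = (a + b + c)*x1*x2*x3, so a + b + c = 0 makes the drift
# norm-invariant and the higher-order Pk vanish.
rng = np.random.default_rng(0)
for a, b, c in [(1.0, 2.0, -3.0), (1.0, 2.0, 3.0)]:
    x1, x2, x3 = rng.standard_normal(3)
    f = np.array([a*x2*x3, b*x3*x1, c*x1*x2])
    print(np.dot([x1, x2, x3], f), (a + b + c)*x1*x2*x3)  # always equal
```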

8. INFINITE TIME PROBLEM

Consider an optimal control problem formulated as follows:
$$\text{Minimize}_{u \in U} \; J = \frac{1}{2}\int_0^{\infty} [x'(t)Qx(t) + u'(t)Ru(t)]\,dt \qquad (43)$$
subject to
$$\dot{x} = f(x) + Bu; \qquad x(0) = x_0$$
where $U$ is the set of square integrable controls defined on $[0, \infty)$.

Assumptions.
(A1) $x \in R^n$, $u \in R^m$.
(A2) $f: R^n \to R^n$ is an analytic vector field and satisfies $f(0) = 0$, where $x = 0$ is an equilibrium state of the system (1) when $u = 0$.
(A3) $Q = Q' = C'C \ge 0$, $R = R' > 0$.
(A4) The system (1) is completely controllable and the free system $\dot{x} = f(x)$, $y = x'Qx$ is completely observable as defined in Moylan and Anderson (1973). That is, for any initial state $x(t_0)$ and any other state $x_1$ there exists a square integrable control and a time $t_1$ such that $x(t_1) = x_1$. Furthermore, if $y(t) = 0$ for $t_0 \le t \le t_1$, where $t_0$ and $t_1$ are arbitrary, then this implies that $x(t) = 0$ for all $t$.
(A5) The system (1) when $u = 0$ is asymptotically stable within the domain of attraction, $D$, defined by
$$D = \{x_0 \in R^n \mid \lim_{t \to \infty} x(t; x_0) = 0\}$$
where $x(t; x_0)$ denotes the solution of the system (1) with $u = 0$ such that $x(0; x_0) = x_0$.

The next theorem extends the finite time solution of the LQR problem to infinite time by investigating the asymptotic behavior of the finite-time solution.

Theorem 8.1. For the infinite-time problem a unique optimal regulator is given by
$$u_*(x) = -R^{-1}B'\sum_{k=1}^{\infty} S_kx^{[k]}. \qquad (44)$$
The $n \times N(n; k)$ coefficient matrices $S_k$ are the solutions of the algebraic equations
$$A'S_1 + S_1A - S_1KS_1 + Q = 0 \qquad (45)$$
$$\tilde{A}'S_k + S_k\tilde{A}^{[k]} + \tilde{X}_k = 0; \qquad k = 2, 3, 4, \ldots \qquad (46)$$
$$\tilde{X}_2x^{[2]} = (S_1F_2 + \tilde{S}_{21})x^{[2]}$$


$$\tilde{X}_kx^{[k]} = \left[S_1F_k + \sum_{i=2}^{k-1} S_i\big(F_{k-i+1} - KS_{k-i+1}\big)_{[i,\,k-i+1]} + \sum_{i=2}^{k}\tilde{S}_{i,k-i+1}\right]x^{[k]}; \qquad k = 3, 4, 5, \ldots$$
where
$$K = BR^{-1}B'; \qquad \tilde{A} = A - KS_1$$
$$\tilde{S}_{ji}x^{[j+i-1]} = \frac{\partial x^{[j]'}}{\partial x}F_j'S_ix^{[i]}, \qquad i = 1, 2, 3, \ldots;\ j = 2, 3, 4, \ldots$$

Proof. In Section 3 we constructed the optimal finite-time regulator. From Anderson and Moore (1971) it follows that there exists a constant $n \times n$ matrix $S_1$ such that
$$S_1 = \lim_{T \to \infty} P_1(t; T) = \lim_{T \to \infty} P_1(0; T - t)$$
where $P_1(t; T)$ is the solution of (12), the notation emphasizing the dependence on the terminal time $T$. This implies that
$$\lim_{T \to \infty}\frac{dP_1}{dt}(t; T) = 0.$$
Therefore, the matrix $S_1$ satisfies (45). The constant $n \times n$ matrix $\tilde{A}$ is now defined by
$$\tilde{A} = \lim_{T \to \infty}(A - KP_1(t; T)) = A - KS_1.$$
From the standard LQR theory the matrix $\tilde{A}$ is asymptotically stable. Thus the matrices $\tilde{A}^{[k]}$ are also asymptotically stable. The matrices $P_k(t; T)$ are calculated by
$$P_k(t; T) = \int_t^T \phi_{\tilde{A}}'(r, t)X_k(r; T)\phi_{\tilde{A}^{[k]}}(r, t)\,dr, \qquad k = 2, 3, 4, \ldots \qquad (47)$$

k = 2, 3, 4 . . . . (47)

where the Xk(t) in (36) are rewritten as Xk(t; T) to express the dependency on T.

Consider the case where $k = 2$: define
$$\tilde{S}_{21}x^{[2]} = \frac{\partial x^{[2]'}}{\partial x}F_2'S_1x.$$
Then define the matrix $\tilde{X}_2$ by
$$\tilde{X}_2x^{[2]} = \lim_{T \to \infty} X_2(t; T)x^{[2]} = (S_1F_2 + \tilde{S}_{21})x^{[2]}.$$
Hence, as $T \to \infty$ (47) can be written as
$$S_2 = \lim_{T \to \infty} P_2(0; T) = \int_0^{\infty} e^{\tilde{A}'t}\,\tilde{X}_2\,e^{\tilde{A}^{[2]}t}\,dt$$
where the integral exists since $\tilde{A}$ and $\tilde{A}^{[k]}$ are asymptotically stable. This implies that $S_2$ satisfies (46) for $k = 2$. Note that a similar argument holds for all $k$. This completes the proof. QED
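Computationally, (45) is a standard algebraic Riccati equation and each (46) is a linear (Sylvester-type) equation. The sketch below (not from the paper) carries this out for the infinite-time version of Example 1, where $F_3 = 1$, $\tilde{X}_3 = S_1F_3 + \tilde{S}_{31} = 2 + 6 = 8$ and, in the scalar case, $\tilde{A}^{[3]} = 3\tilde{A}$:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_sylvester

# Infinite-time version of Example 1: dx/dt = x^3 + u, Q = R = 2.
A = np.array([[0.0]]); B = np.array([[1.0]])
Q = np.array([[2.0]]); R = np.array([[2.0]])
K = B @ np.linalg.inv(R) @ B.T

S1 = solve_continuous_are(A, B, Q, R)    # equation (45); gives S1 = 2
At = A - K @ S1                          # A~ = -1 (stable)
At3 = 3 * At                             # A~[3] for the scalar case
X3 = np.array([[8.0]])                   # X~3 = S1*F3 + S~31 = 2 + 6
S3 = solve_sylvester(At.T, At3, -X3)     # equation (46): A~'S3 + S3 A~[3] + X~3 = 0
print(S1, S3)                            # S1 = 2, S3 = 2, so u = -x - x^3 - ...
```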

Define the value function by
$$v(x(t), t) = \frac{1}{2}\int_t^{\infty} [x'(r)Qx(r) + u'(r)Ru(r)]\,dr$$
and define
$$\mu(u) = \frac{d}{dt}\left[\sum_{k=1}^{\infty} x'(t)V_kx^{[k]}(t)\right] + \tfrac{1}{2}\big(x'(t)Qx(t) + u'(t)Ru(t)\big)$$
where the $V_k$ are constant $n \times N(n; k)$ matrices and
$$\dot{x} = f(x) + Bu; \qquad x(0) = x_0 \in D.$$

Theorem 8.2. For any $u \in U$ we have $\mu(u) \ge 0$, where the equality holds if
$$u = u_*(x(t)) = -R^{-1}B'\sum_{k=1}^{\infty} S_kx^{[k]}(t);$$
that is, the optimal regulator satisfies the H-J-B equation. Furthermore, the minimum cost is given by
$$J_{\min} = \sum_{k=1}^{\infty}\frac{1}{(k+1)}\,x_0'S_kx_0^{[k]}; \qquad x_0 \in D.$$

Proof. Follow a similar argument as in Theorem 4.1.

In the above discussion we have not considered the convergence of the series. To address this problem define
$$v(x_*(t), t) = \sum_{k=1}^{\infty}\frac{1}{(k+1)}\,x_*'(t)S_kx_*^{[k]}(t)$$
where $x_*$ denotes the optimal trajectory; then
$$v(x_*(t), t) \le \frac{1}{2}\int_t^{\infty} x'(r)Qx(r)\,dr$$
where $x$ on the right-hand side of the inequality is governed by
$$\dot{x} = f(x); \qquad x(0) = x_0 \in D.$$

Therefore, we have the following theorem.

Theorem 8.3. The optimal regulator $u_*(x(t))$ given in Theorem 8.1 converges absolutely and uniformly for any $x_0 \in G$ on $[0, \infty)$, where $G$ is any open subset of $D$ which has compact closure in $D$.

Remark. The proof of the above theorem follows the same procedure as in Theorem 4.2; the only difference is that the convergence in this case is guaranteed only within the domain of attraction $D$ on the time interval $[0, \infty)$.


This completes the construction of the optimal regulator. The optimal regulator is given by (44) and the optimal feedback system becomes
$$\dot{x} = f(x) + Bu_*; \qquad x_0 \in D. \qquad (48)$$

Finally, the stability property of the optimal system (48) can be investigated.

Theorem 8.4. The optimal feedback system (48) is asymptotically stable at least within the domain of attraction, $D$, of the free system
$$\dot{x} = f(x); \qquad x(0) = x_0 \in D.$$

Proof. Define a scalar-valued function $V(x(t))$ by
$$V(x(t)) = \sum_{k=1}^{\infty}\frac{1}{(k+1)}\,x'(t)S_kx^{[k]}(t). \qquad (49)$$
From Theorem 8.2 we have
$$\frac{dV}{dt} + \tfrac{1}{2}(x'Qx + u_*'Ru_*) = 0. \qquad (50)$$

Therefore, V(x(t)) is a Lyapunov function for the optimal system (48). As the convergence of the series in (49) is guaranteed only on G, (50) holds on G. Thus, the optimal system (48) is asymptotically stable within D. This completes the proof. QED

Remarks. (i) The domain of attraction, $D$, is introduced to guarantee the convergence of the series in (49). If the system (43) is stabilizable then it is possible to prove global convergence, as the stabilizing feedback is admissible. If the series in (49) is of finite length and satisfies (50), then the optimal system will be asymptotically stable in the large regardless of the domain of attraction $D$. In this case assumption (A5) is not necessary.

(ii) A Lyapunov function of the optimal system is defined by (49) and the convergence of the series is guaranteed within D. This shows that the region of convergence of the series defines a domain of attraction of the optimal feedback system.

9. REALIZATION OF OPTIMAL AND SUBOPTIMAL REGULATORS

In Sections 5 and 6 some remarks on the realizability of optimal and suboptimal regulators for the finite time problem were presented. The same arguments for the realizations hold here as well; the only difference is that the $P_k$ are replaced by $S_k$ for all $k$.

Example. Consider the problem
$$\text{Minimize}_{u \in U} \; J = \frac{1}{2}\int_0^{\infty} [x'(t)Qx(t) + u'(t)Ru(t)]\,dt$$
subject to
$$\dot{x} = Ax + F_3x^{[3]} + Bu; \qquad x(0) = x_0$$
where (entries illegible in the source are denoted by $\cdot$)
$$A = \begin{bmatrix} \cdot & 9.5 \\ -4.5 & \cdot \end{bmatrix}; \qquad B = \begin{bmatrix} \cdot & \cdot \\ -4 & \cdot \end{bmatrix};$$
$$F_3 = \begin{bmatrix} -0.3 & 1.8 & 0.3 & 0.5 \\ -1.8 & -2.7 & -0.5 & -0.7 \end{bmatrix};$$
$$Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}; \qquad R = \begin{bmatrix} 0.001 & 0 \\ 0 & 0.01 \end{bmatrix};$$
$$x = [x_1, x_2]'; \qquad x^{[3]} = [x_1^3, \sqrt{3}x_1^2x_2, \sqrt{3}x_1x_2^2, x_2^3]'; \qquad u = [u_1, u_2]'.$$

Step 1. Free response ($u = 0$). The eigenvalues of the matrix $A$ are $-0.5 \pm j7.85$. The phase trajectories are given in Fig. 1, in which the singular points are the stable couples $(0, 0)$, $(2.7, -0.254)$, $(-2.7, 0.254)$ and the unstable couples $(1.01, 0.75)$, $(-1.01, -0.75)$.

FIG. 1. Phase trajectories of the system when $u = 0$.

Step 2. Response of the controlled system. From (45) and (46) the matrices $S_1$, $S_2$, $S_3$ are given as
$$S_1 = \begin{bmatrix} 11.66 & -0.272 \\ -0.272 & 4.333 \end{bmatrix} \times 10^{-2}$$
$$S_2 = 0$$
$$S_3 = \begin{bmatrix} -0.0003 & 0.0008 & -0.00057 & -0.00005 \\ 0.0010 & -0.0002 & 0.00018 & -0.00022 \end{bmatrix} \times 10^{-4}, \quad \text{etc.}$$

Defining
$$u_1 = -R^{-1}B'S_1x; \qquad u_2 = 0; \qquad u_3 = -R^{-1}B'S_3x^{[3]}$$
the responses of the control system are given in Fig. 2 for (a) $u = 0$; (b) $u = u_1$; (c) $u = u_1 + u_3$.

FIG. 2. Response of the system: (a) free response; (b) $u = u_1$; (c) $u = u_1 + u_3$.


10. CONCLUSIONS

A quadratic regulator theory has been established for analytic non-linear systems with additive controls. The optimal regulator was constructed in power series form based on the minimum principle. Compared with earlier results, two improvements can be recognized in the construction procedure of Section 3. First, the algorithm is independent of the evaluation of the cost performance. Furthermore, the construction procedure is systematic because of the introduction of lexicographic listing vectors, and the convergence properties are established.

It was shown in Section 4 that the optimal regulator satisfies the H-J-B equation of dynamic programming. The minimum cost was evaluated explicitly in connection with the solution of the H-J-B equation. The optimal regulator in power series form was shown to converge globally in Theorem 4.2. This is an important result from a practical perspective.

Section 5 presented the realization of optimal regulators. A necessary and sufficient condition for realizability was given in Theorem 5.1. A special case was investigated to obtain further realizability conditions. Sufficient conditions for realizability by a linear optimal regulator were given for polynomial systems. A theoretical explanation was given for a typical example of realization using a linear optimal regulator, i.e. Example 3 in Section 7.

In Section 6 suboptimal realizations were discussed in connection with an approximation theorem based on the technique of Carleman linearization. In Sections 8 and 9 the infinite time regulator problem was discussed.

It is observed that the theory presented in this paper includes the standard LQR theory. Of practical interest is the development of a computer aided design package for the present work. This is closely related to the development of computational algorithms for the technique of Carleman linearization (Loparo and Blankenship, 1977).

Acknowledgements—The first author wishes to express his gratitude to Prof. S. Kawamoto for his encouragement throughout this research. This work was supported in part under U.S. Department of Energy Contract #DEAC01-79-ET 29363.

REFERENCES

Al'brekht, E. G. (1962). On the optimal stabilization of nonlinear systems. J. Appl. Math. Mech., 25, 1254-1266.

Anderson, B. D. O. and J. B. Moore (1971). Linear Optimal Control. Prentice-Hall, Englewood Cliffs, New Jersey.

Athans, M. et al. (1963). Time-, fuel-, and energy-optimal control of nonlinear norm-invariant systems. IRE Trans. Aut. Control, AC-8, 196-202.

Baillieul, J. (1976). Multilinear optimal control. Proc. Conf. on Geometry for Control Engineering, pp. 337-359. Math. Sci. Press, Brookline, Massachusetts.

Baillieul, J. (1980). The geometry of homogeneous polynomial dynamical systems. Nonlinear Analysis: Theory, Meth. Applic., 4, 879-900.

Brewer, J. W. (1978). Kronecker products and matrix calculus in system theory. IEEE Trans. Circuit Syst., CAS-25, 772-781.

Brockett, R. W. (1973). Lie algebras and Lie groups in control theory. In D. Q. Mayne and R. W. Brockett (Eds), Geometric Methods in System Theory, pp. 43-82. D. Reidel, Dordrecht, Holland.

Fliess, M. and H. Bourdache-Siguerdidjane (1984). Quelques remarques élémentaires sur le calcul des lois de bouclage en commande optimale non linéaire. In A. Bensoussan and J. L. Lions (Eds), Proc. Conf. on Analysis and Optimization of Systems, Nice. Lect. Notes Control Inf. Sci., 63, pp. 499-512. Springer, Berlin.

Kailath, T. and L. Ljung (1976). The asymptotic behavior of constant-coefficient Riccati differential equations. IEEE Trans. Aut. Control, AC-21, 385-388.

Krener, A. J. (1974). Linearization and bilinearization of control systems. Proc. Allerton Conf. on Circuit and Syst. Theory, pp. 834-843. University of Illinois, Urbana.

Lee, E. B. and L. Marcus (1967). Foundations of Optimal Control Theory, Chap. 4. Wiley, New York.

Loparo, K. A. and G. L. Blankenship (1977). Algebraic features of some computational problems in nonlinear stability theory. Proc. Control Decision Conf.

Loparo, K. A. and G. L. Blankenship (1978). Estimating the domain of attraction of nonlinear feedback systems. IEEE Trans. Aut. Control, AC-23, 602-608.

Lukes, D. L. (1969). Optimal regulation of nonlinear dynamical systems. SIAM J. Control Opt., 7, 75-100.


Moylan, P. J. and B. D. O. Anderson (1973). Nonlinear regulator theory and an inverse optimal control problem. IEEE Trans. Aut. Control, AC-18, 460-465.

Nishikawa, Y. et al. (1971). A method for suboptimal design of nonlinear feedback systems. Automatica, 7, 703-712.

Okubo, S. and T. Kitamori (1975). Construction of nonlinear regulator (in Japanese). Trans. SICE, 11, 541-549.

Sain, M. K. (Ed.) (1985). Applications of tensors to modelling and control. Control Systems Technical Report, #38, Department of Electrical Engineering, Notre Dame University, December 1985.

Vetter, W. J. (1973). Matrix calculus operations and Taylor expansions. SIAM Rev., 15, 352-369.

Willemstein, A. P. (1977). Optimal regulation of nonlinear dynamical systems on a finite interval. SIAM J. Control Opt., 15, 1050-1069.

Yoshida, T. (1984). Quadratic regulator theory for analytic nonlinear systems with additive controls. Ph.D. Dissertation, Case Western Reserve University, Cleveland, Ohio.

Zubov, V. I. (1975). Lectures on Control Theory (in Russian). Nauka, Moscow.