
IEEE JOURNAL OF ROBOTICS AND AUTOMATION, VOL. 4, NO. 1, FEBRUARY 1988

On the Iterative Learning Control Theory for Robotic Manipulators

PAOLA BONDI, GIUSEPPE CASALINO, AND LUCIA GAMBARDELLA

Abstract—A high-gain feedback point of view is considered within the iterative learning control theory for robotic manipulators. Basic results concerning the uniform boundedness of the trajectory errors are established, and a proof of convergence of the algorithm is given.

I. INTRODUCTION

THE set of proposed methodologies for the control of movements of robotic manipulators can be substantially subdivided into three main categories. The first refers to the use of control systems designed on the basis of classical servomechanism concepts [1]-[3], where the fact that a manipulator is actually a highly nonlinear system is almost completely neglected. Consequently, although the resulting regulators turn out to be very simple and are commonly used in normal applications, they cannot face the increasing demand in control performance.

The second category includes control methodologies such as inverse control [4] and decoupling control [5], [6]. Since in this case the nonlinearities of the system are explicitly taken into account, very good performance can be obtained. The disadvantage is represented by the fact that very accurate mathematical models are required, together with a precise forecasting of the load. Moreover, they give rise to costly and very complicated control systems.

The third category is represented by all the methodologies whose goal is to overcome the previously cited drawbacks by using adaptive control concepts. They generally refer to the so-called adaptive model following techniques [7], which do not require a precise description of the mathematical model of the system and can face parameter variations, but they still require a complicated structure for the regulator due to the complexity of the adaptation mechanism. Instead, this paper refers to the so-called iterative learning technique applied to robotic manipulators, a technique which has been recently, and independently, proposed by different researchers, including Arimoto et al. [12]-[14], Craig [16], and the authors [8]-[11]. As is now well known, such techniques are based on the use of repeated trials of tracking of a preassigned trajectory.

Manuscript received June 2, 1986; revised June 18, 1987. This paper was presented in part at the 1986 IEEE Conference on Robotics and Automation, San Francisco, CA, April 7-10, 1986. P. Bondi and L. Gambardella are with the Dipartimento di Matematica e Applicazioni R. Caccioppoli, Università di Napoli, Via Mezzocannone 8, 80134 Naples, Italy. G. Casalino is with the Dipartimento di Informatica, Sistemistica e Telematica, Università di Genova, Via Opera Pia 11, 16145 Genoa, Italy. IEEE Log Number 8718069.

At each trial, the position, velocity, and acceleration errors with respect to the reference trajectory are recorded, to be exploited by a suitable algorithm in the next trial with the aim of progressively reducing the trajectory errors. Provided it works, the proposed methodology turns out to be very simple and requires only a very rough description of the manipulator to be controlled, but its main disadvantage is obviously represented by the necessity of performing different trials for any new trajectory which must be tracked. At the present moment, research activities are devoted to investigations about the possibility of decomposing the problem of following a complex trajectory into a set of subproblems, each one referring to a simpler (and already learned) trajectory.

The paper is partly based on previous work of the authors [8]-[11] and presents an inherently nonlinear analysis of the learning procedure. In particular, a high-gain feedback point of view is instrumentally used to prove the possibility of setting up uniform upper bounds on the trajectory errors occurring at each trial. Since such a bounding condition is essential for the proof of convergence of the procedure, it has always been assumed, even if the possibility of its achievement has never been proved completely.

The proof of such possibility actually represents the main contribution of the present work. More particularly, the subsequent analysis of convergence allows us to show that, apart from minor conditions, the existence of a finite (but not necessarily narrow) bound on the trajectory deviations may substantially suffice for guaranteeing the zeroing of the errors after a sufficient number of trials. This fact in turn leaves open the possibility of obtaining exact tracking of the desired motion, even in the presence of moderate values assigned to the feedback gains.

The paper is organized as follows. In Section II the learning methodology is described and the underlying rationale explained. In Section III the results concerning the conditions for the uniform boundedness of the trajectory errors are presented, while in Section IV the convergence analysis is performed. Further, in Section V some comments concerning the practical implementability of the procedure are given, and, finally, simulation experiments are presented in Section VI.

II. DESCRIPTION OF THE LEARNING METHODOLOGY

Let us consider the dynamic equation of a manipulator, relating the vector q ∈ R^n of the chosen (generalized) coordinates to the vector M ∈ R^n of the corresponding (generalized) torques:

$$A(q)\ddot{q} + B(q, \dot{q})\dot{q} + C(q) = M \qquad (1)$$

where the matrices A(q), B(q, q̇) and the vector C(q) are continuous and locally Lipschitzian functions of their arguments. The matrices A(q) (positive definite) and B(q, q̇) are supposed to be completely unknown, and the same assumption substantially holds also with respect to the vector C(q), with the exception that knowledge of its value C(q(0)) at the initial time is allowed (more will be said about this point in this section).
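Before proceeding, it may help to see (1) instantiated. The following sketch (not from the paper; all parameters are hypothetical placeholders) writes out A(q), B(q, q̇), and C(q) for a one-link pendulum, the simplest manipulator fitting the assumptions above.

```python
import numpy as np

# Hypothetical one-link pendulum as an instance of (1):
#   A(q) q_dd + B(q, q_d) q_d + C(q) = M
m, l, b, g = 1.0, 0.5, 0.1, 9.81   # placeholder mass, length, friction, gravity

def A(q):
    """Inertia term (positive definite 'matrix'; a scalar in this 1-DOF case)."""
    return m * l**2

def B(q, q_d):
    """Velocity-dependent term; here just viscous friction at the joint."""
    return b

def C(q):
    """Gravity term; the only part whose initial value C(q(0)) the scheme needs."""
    return m * g * l * np.sin(q)

def inverse_dynamics(q, q_d, q_dd):
    """Torque M realizing a given (q, q_d, q_dd) according to (1)."""
    return A(q) * q_dd + B(q, q_d) * q_d + C(q)
```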

Further, it is assumed that the manipulator is operating in closed loop with a linear feedback law of the type

$$M = m + Q - K\ddot{q} - L\dot{q} - Pq \qquad (2)$$

where m represents the vector of the effective driving signals, Q another constant vector of external signals, while the matrices P, L, and K are the position, velocity, and acceleration gains, respectively.

Remark 1: Note that the acceleration feedback introduces an algebraic loop in the closed-loop system. This actually implies that control law (2) cannot be exactly implemented, but only approximated by a suitable choice of the operating conditions. The presence of this drawback will, however, be temporarily neglected and reconsidered in greater detail at the end of the paper.

By substituting (2) in (1), the closed-loop equation is obtained in the form

$$[K + A(q)]\ddot{q} + [L + B(q, \dot{q})]\dot{q} + Pq + [C(q) - Q] = m \qquad (3)$$

whose structure allows the interpretation of the terms A(q), B(q, q̇), and [C(q) − Q] as (unknown) perturbations occurring in the linear and known model

$$K\ddot{q} + L\dot{q} + Pq = m. \qquad (4)$$

For such a known model the computation of the signal m_0(t) allowing the output to follow a specified trajectory q*(t) (hereafter assumed bounded together with its first and second derivatives, and such that q̇*(0) = 0 and q̇*(t) = 0 for t ≥ T > 0, where T is a priori assigned) is straightforward and given by

$$m_0 = K\ddot{q}^* + L\dot{q}^* + Pq^* \qquad (5)$$

which also results in a bounded signal. Obviously, if the computed signal m_0(t) is applied to the closed-loop system (3), the corresponding trajectory q_0(t) will be different from the desired one q*(t).

Nevertheless, assume that a first trial of movement has been carried out using m_0(t), starting from the rest position q(0) = q*(0) (this requires Q = C[q*(0)] and explains the assumption about the knowledge of C[q(0)]); that the corresponding trajectory errors δq_0(t) ≜ q_0(t) − q*(t), δq̇_0(t), δq̈_0(t), for t ∈ [0, T], have resulted to be bounded; and that they have been recorded by a suitable device. Then the proposed learning procedure suggests repeating the trial, starting from the same initial rest position q*(0) and using the new driving signal

$$m_1 = m_0 - \delta m_1^\circ \qquad (6)$$

where the correcting term δm°_1 is given by

$$\delta m_1^\circ = K\,\delta\ddot{q}_0 + L\,\delta\dot{q}_0 + P\,\delta q_0. \qquad (7)$$

The same procedure is in turn applied for all successive trials (provided the errors remain bounded), thus implying that the driving signal applied at the generic ith (i ≥ 1) trial takes on the form

$$m_i = m_{i-1} - \delta m_i^\circ = m_0 - \sum_{k=1}^{i} \left( K\,\delta\ddot{q}_{k-1} + L\,\delta\dot{q}_{k-1} + P\,\delta q_{k-1} \right). \qquad (8)$$

Remark 2: Obviously, the adoption of the proposed learning procedure requires the presence of a repositioning system to be used at the end of each trial, after time T. The problem of devising such a system, however, represents a further step in the development of the learning theory and will not be considered here (see [8] for details of the argument).

In the remainder of this section we shall restrict ourselves to considering some other aspects of the learning process which represent the basis for the discussion of Section III and the convergence analysis of Section IV. To this aim, let us consider a generic driving signal m_{i+1} corresponding to a trial successive to the first, and rewrite it in the following form:

$$m_{i+1} = m^* + \delta m_{i+1}, \qquad i = 0, 1, \dots \qquad (9)$$

where m*(t) represents the (unknown) signal allowing the closed-loop system to follow the desired trajectory q*(t), and δm_{i+1} the corresponding driving signal error. Then by substituting into m_{i+1} its expression given by (8) (first part) and representing m_i(t) in a form analogous to (9), we immediately get

$$\delta m_{i+1}^\circ = \delta m_i - \delta m_{i+1}, \qquad i = 0, 1, \dots \qquad (10)$$

Such a relation allows interpreting the signal δm°_{i+1}(t) as an estimate for the driving signal error occurring while performing the present ith movement, thus explaining the rationale underlying the use of −δm°_{i+1}(t) as the compensating term for the next trial. Moreover, the same relation (10) also shows that the estimation error at the ith trial actually coincides with the driving signal error that will occur during the next trial, i + 1.
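Equations (5)-(8) amount to a very short algorithm. The following is a minimal self-contained sketch of one possible realization (not the paper's code): the plant is a hypothetical one-link pendulum standing in for (1), and the gains, time grid, and minimum-jerk reference are illustrative assumptions; the trial loop applies exactly the update (6)-(8).

```python
import numpy as np

# Hypothetical plant standing in for (1): one-link pendulum (unknown to the learner).
mp, l, b, g = 1.0, 0.5, 0.1, 9.81
A_ = lambda q: mp * l**2                 # inertia, positive
B_ = lambda q, qd: b                     # viscous friction
C_ = lambda q: mp * g * l * np.sin(q)    # gravity

K, L, P = 2.0, 30.0, 100.0               # gains of the linear feedback law (2)

# Reference q*(t): rest-to-rest minimum-jerk motion (illustrative choice).
T, N = 2.0, 2000
t = np.linspace(0.0, T, N); dt = t[1] - t[0]
s = t / T
q_ref = 1.0 * (10*s**3 - 15*s**4 + 6*s**5)
qd_ref = np.gradient(q_ref, dt)
qdd_ref = np.gradient(qd_ref, dt)

Q = C_(q_ref[0])                         # Q = C[q*(0)], as the scheme requires

def run_trial(m_sig):
    """Integrate the closed-loop equation (3) under the driving signal m_i(t)."""
    q, qd = q_ref[0], 0.0                # same initial rest position at every trial
    qs, qds, qdds = [], [], []
    for k in range(N):
        qdd = (m_sig[k] - (L + B_(q, qd))*qd - P*q - C_(q) + Q) / (K + A_(q))
        qs.append(q); qds.append(qd); qdds.append(qdd)
        q, qd = q + qd*dt, qd + qdd*dt   # explicit Euler, adequate for a sketch
    return np.array(qs), np.array(qds), np.array(qdds)

m_i = K*qdd_ref + L*qd_ref + P*q_ref     # m_0 from the known linear model, eq. (5)
for i in range(8):
    q, qd, qdd = run_trial(m_i)
    dq, dqd, dqdd = q - q_ref, qd - qd_ref, qdd - qdd_ref
    print(f"trial {i}: max |dq| = {np.abs(dq).max():.2e}")
    m_i = m_i - (K*dqdd + L*dqd + P*dq)  # correcting term (7), applied as in (8)
```

Run as is, the printed position-error bound should shrink from trial to trial; doubling the gains makes it shrink faster, in line with the observations reported in Section VI.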

Analytic expressions for δm_i(t) can be obtained as described in the following. Consider the overall closed-loop equation (3) when the generic ith trial is performed (m = m_i(t)), and rewrite it in the following form:

$$[K + A(q^* + \delta q_i)](\ddot{q}^* + \delta\ddot{q}_i) + [L + B(q^* + \delta q_i,\, \dot{q}^* + \delta\dot{q}_i)](\dot{q}^* + \delta\dot{q}_i) + P(q^* + \delta q_i) + [C(q^* + \delta q_i) - Q] = m^* + \delta m_i, \qquad i = 0, 1, \dots \qquad (11)$$

Then by substituting into m*(t) its expression given by

$$m^* = [K + A(q^*)]\ddot{q}^* + [L + B(q^*, \dot{q}^*)]\dot{q}^* + Pq^* + [C(q^*) - Q] \qquad (12)$$

with straightforward manipulations, we immediately get

$$\delta m_i = K\,\delta\ddot{q}_i + L\,\delta\dot{q}_i + P\,\delta q_i + A_i\,\delta\ddot{q}_i + B_i\,\delta\dot{q}_i + (A_i - A^*)\ddot{q}^* + (B_i - B^*)\dot{q}^* + (C_i - C^*), \qquad i = 0, 1, \dots \qquad (13)$$

where, for the ease of notation, we have defined

$$A_i \triangleq A(q^* + \delta q_i), \quad A^* \triangleq A(q^*), \quad B_i \triangleq B(q^* + \delta q_i,\, \dot{q}^* + \delta\dot{q}_i), \quad B^* \triangleq B(q^*, \dot{q}^*), \quad C_i \triangleq C(q^* + \delta q_i), \quad C^* \triangleq C(q^*). \qquad (14)$$

In (13) the first three terms represent δm°_{i+1}(t), thus implying (see (10)) that the last five terms necessarily coincide with δm_{i+1}(t). This is in turn equivalent to saying, for i ≥ 1 only, that δm_i(t) can also be expressed as

$$\delta m_i = A_{i-1}\,\delta\ddot{q}_{i-1} + B_{i-1}\,\delta\dot{q}_{i-1} + (A_{i-1} - A^*)\ddot{q}^* + (B_{i-1} - B^*)\dot{q}^* + (C_{i-1} - C^*), \qquad i = 1, 2, \dots \qquad (15)$$

Finally, observe that by equating the right sides of (13) and (15), with very simple algebra we obtain

$$(K + A_i)\delta\ddot{q}_i + (L + B_i)\delta\dot{q}_i + P\,\delta q_i = A_{i-1}\,\delta\ddot{q}_{i-1} + B_{i-1}\,\delta\dot{q}_{i-1} - (A_i - A_{i-1})\ddot{q}^* - (B_i - B_{i-1})\dot{q}^* - (C_i - C_{i-1}), \qquad i = 1, 2, \dots \qquad (16)$$

which represents a set of differential equations relating, for i ≥ 1, the evolution of the present trajectory errors with those which occurred during the previous trial of movement.

III. TRAJECTORY ERRORS BOUNDEDNESS

In this section it is shown that suitable values for the feedback gains K, L, and P can always be devised guaranteeing the uniform boundedness of the trajectory errors δq_i(t), δq̇_i(t), and δq̈_i(t) for any time t and any trial i. This actually corresponds to proving the possibility of setting up a domain of attraction around the desired trajectory and, consequently, shows that the correcting signals used at each trial never push the system out of the attraction domain.

This result, though not representing a convergence one, will be fully exploited in Section IV, where the convergence analysis will be effectively performed. To this aim, let us first express the feedback matrices K, L, and P as functions of a common scalar factor α > 0, i.e.,

$$K = \alpha\bar{K}, \quad L = \alpha\bar{L}, \quad P = \alpha\bar{P}, \qquad \alpha > 0 \qquad (17)$$

and, consequently, represent both the closed-loop system (3) and the reference model (4) in the following equivalent ("normalized") form:

$$[\bar{K} + \bar{A}(q)]\ddot{q} + [\bar{L} + \bar{B}(q, \dot{q})]\dot{q} + \bar{P}q + [\bar{C}(q) - \bar{Q}] = u \qquad (18)$$

$$\bar{K}\ddot{q} + \bar{L}\dot{q} + \bar{P}q = u \qquad (19)$$

where u ≜ m/α, Ā(q) ≜ A(q)/α, B̄(q, q̇) ≜ B(q, q̇)/α, C̄(q) ≜ C(q)/α, and Q̄ ≜ Q/α.

For the foregoing normalized representation we have that Ā(q), B̄(q, q̇), and C̄(q) − Q̄ all converge to zero for α → ∞ and any finite q, q̇, thus implying, contrary to (3), that (18) can also be interpreted as a class of systems parametrized by α and admitting (19) as limit equation for α → ∞. The following lemmas further characterize the system class (18).

Lemma 1: Let u*_α(t) represent the normalized signal allowing the corresponding system of class (18) to follow the desired trajectory q*(t). Denote as δu(t) an input perturbation superimposed to u*_α(t), and denote as x_0 ≜ (δq(0), δq̇(0)) an initial state perturbation. Let (δq_α(t), δq̇_α(t), δq̈_α(t)) represent the vector of the corresponding trajectory errors with respect to q*(t), q̇*(t), q̈*(t) for a given choice of α > 0. Then, provided the limit system (19) is asymptotically stable, the following property holds true for class (18): for all ε > 0 there exist α_ε > 0, ρ_ε > 0, δ_ε > 0 such that

$$\alpha \ge \alpha_\varepsilon, \quad \|x_0\| < \rho_\varepsilon, \quad \|\delta u(t)\| \le \delta_\varepsilon \ \text{ for all } t \qquad (20)$$

imply

$$\|(\delta q_\alpha(t), \delta\dot{q}_\alpha(t), \delta\ddot{q}_\alpha(t))\| < \varepsilon \quad \text{for all } t. \qquad (21)$$

Proof: For this lemma the main part of the proof is actually quite general, as will be apparent in the following discussion. Consider a family of autonomous systems, parametrized by α > 0, of the form

$$\dot{x} = f_\alpha[x, \gamma(x, t)] \qquad (22)$$

where x ∈ R^m, ‖γ(x, t)‖ ≤ h(x) for all t, with h(·) a scalar continuous function, and f_α(·, ·) continuous in its arguments, such that for any α > 0 the following conditions are satisfied:

1) ∃ lim_{α→∞} f_α(x, γ) = f(x), uniformly in any compact set W ⊂ R^m;
2) f_α(0, γ) = 0 for all γ ∈ R^m;
3) the null solution of the "limit system,"

$$\dot{x} = f(x) \qquad (23)$$

which exists by 1) and 2), is uniformly asymptotically stable.

Then we want first to show that the following property holds true for the class (22): for all ε > 0 there exist α_ε > 0, ρ_ε > 0, δ̄_ε > 0 such that

$$\alpha \ge \alpha_\varepsilon, \quad \|x_0\| < \rho_\varepsilon, \quad \|g(x, t)\| \le \bar{\delta}_\varepsilon \;\Longrightarrow\; \|x_\alpha(t, x_0)\| < \varepsilon \quad \text{for all } t \qquad (24)$$

where x_α(t, x_0) denotes the solution of the perturbed system

$$\dot{x} = f_\alpha(x, \gamma) + g(x, t), \qquad x_\alpha(0, x_0) = x_0. \qquad (25)$$

To prove the foregoing property, first observe that assumption 3) implies the total stability of the limit system (see, for instance, [15]); that is, for the perturbed limit system

$$\dot{x} = f(x) + g(x, t), \qquad x(0, x_0) = x_0, \qquad (26)$$

the following property holds true: for all ε > 0 there exist ρ_ε > 0, δ_ε > 0 such that

$$\|x_0\| < \rho_\varepsilon, \quad \|g(x, t)\| \le \delta_\varepsilon \;\Longrightarrow\; \|x(t, x_0)\| < \varepsilon \quad \text{for all } t. \qquad (27)$$

Then, due to the foregoing property and taking into account that (25) can be equivalently rewritten as

$$\dot{x} = f(x) + [g(x, t) + f_\alpha(x, \gamma) - f(x)], \qquad x(0, x_0) = x_0 \qquad (28)$$

(i.e., in a form analogous to (26) where the perturbation is now [g(x, t) + f_α(x, γ) − f(x)]), it immediately follows that ‖x_α(t, x_0)‖ < ε provided

$$\|x_0\| < \rho_\varepsilon, \qquad \|g(x, t) + f_\alpha(x, \gamma) - f(x)\| \le \delta_\varepsilon \qquad (29)$$

where the second of (29) is in turn certainly satisfied if the perturbation g(x, t) is such that

$$\|g(x, t)\| + \|f_\alpha(x, \gamma) - f(x)\| \le \delta_\varepsilon. \qquad (30)$$

Then, by keeping in mind that due to assumption 1) we have in the compact set {x ∈ R^m: ‖x‖ ≤ ε} the property: for all δ > 0 ∃ α_δ > 0 such that

$$\alpha \ge \alpha_\delta \;\Longrightarrow\; \|f_\alpha(x, \gamma) - f(x)\| \le \delta \qquad (31)$$

it follows that by letting (for instance)

$$\delta = \lambda\delta_\varepsilon, \quad 0 < \lambda < 1, \quad \text{and} \quad \alpha_\varepsilon = \alpha_{\lambda\delta_\varepsilon} \qquad (32)$$

condition (30) (and then the second of (29)) turns out to be satisfied for all α ≥ α_ε by any perturbation such that

$$\|g(x, t)\| \le (1 - \lambda)\delta_\varepsilon \qquad (33)$$

and this completes the proof of property (24) (with δ̄_ε = (1 − λ)δ_ε).

At this point let us turn to consider system class (18). By expressing the normalized signal u(t) as

$$u = u_\alpha^* + \delta u = [\bar{K} + \bar{A}(q^*)]\ddot{q}^* + [\bar{L} + \bar{B}(q^*, \dot{q}^*)]\dot{q}^* + \bar{P}q^* + [\bar{C}(q^*) - \bar{Q}] + \delta u \qquad (34)$$

and substituting in (18), we obtain a normalized equation analogous to (13), that is,

$$(\bar{K} + \bar{A})\delta\ddot{q} + (\bar{L} + \bar{B})\delta\dot{q} + \bar{P}\delta q + (\bar{A} - \bar{A}^*)\ddot{q}^* + (\bar{B} - \bar{B}^*)\dot{q}^* + (\bar{C} - \bar{C}^*) = \delta u \qquad (35)$$

where Ā ≜ A(q* + δq)/α, Ā* ≜ A(q*)/α, and the same holds true for the other matrices B̄, B̄* and vectors C̄, C̄*. From (35), by defining the vector x ≜ (x_1, x_2) ≜ (δq, δq̇), we immediately get the state-space representation

$$\dot{x}_1 = x_2$$
$$\dot{x}_2 = -(\bar{K} + \bar{A})^{-1}\left[(\bar{L} + \bar{B})x_2 + \bar{P}x_1 + (\bar{A} - \bar{A}^*)\ddot{q}^* + (\bar{B} - \bar{B}^*)\dot{q}^* + (\bar{C} - \bar{C}^*) + \delta u\right]. \qquad (36)$$

Then, since −(K̄ + Ā)^{-1}δu plays the same role as g(x, t) in (25), and by noting that the autonomous part of (36) actually satisfies assumptions 1)-3) (with f(x) corresponding to the limit system K̄δq̈ + L̄δq̇ + P̄δq = 0), it follows that property (24) directly applies to (36), meaning that Lemma 1 actually holds with respect to the vector (δq_α(t), δq̇_α(t)). However, by taking into account the uniform boundedness of q*(t), q̇*(t), and q̈*(t) and, consequently (since the functions A(·), B(·, ·), C(·) are locally Lipschitzian), the Lipschitz condition uniformly satisfied with respect to α ≥ α_ε by the right side of (36) in the compact sets {x: ‖x‖ < ε}, {δu: ‖δu‖ ≤ δ_ε}, it easily follows that the same norm limitations can also be made to hold for the whole vector (δq_α(t), δq̇_α(t), δq̈_α(t)).

Lemma 2: Let u_0(t) represent the normalized signal allowing the limit system (19) to follow the desired trajectory q*(t). Consider u_0(t) applied to a generic system of class (18) and denote as δu°_α(t) the error with respect to u*_α(t). Then

$$\exists \lim_{\alpha \to \infty} \delta u_\alpha^\circ = 0, \quad \text{uniformly with respect to } t. \qquad (37)$$

Proof: To prove the lemma, first observe that the input signal error δu°_α takes on the following form (see (34), and let u = u_0 = K̄q̈* + L̄q̇* + P̄q*):

$$\delta u_\alpha^\circ = -\bar{A}(q^*)\ddot{q}^* - \bar{B}(q^*, \dot{q}^*)\dot{q}^* - [\bar{C}(q^*) - \bar{Q}]. \qquad (38)$$

Then, noting that for α → ∞, Ā(q*), B̄(q*, q̇*), and C̄(q*) − Q̄ all converge to zero uniformly with respect to t (since q*(t), q̇*(t), q̈*(t) are bounded signals by assumption), the lemma directly follows.

At this point, by reconsidering the proposed learning procedure and exploiting the results of the previous lemmas, the following main theorem can be stated.

Theorem 1: Let (δq_{i,α}(t), δq̇_{i,α}(t), δq̈_{i,α}(t)) denote the vector of the trajectory errors corresponding to a generic trial i, with initial condition (δq_i(0), δq̇_i(0)), and relevant to a specific choice of the feedback parameter α. Then, provided the normalized reference model (19) is an asymptotically stable one, for every ε > 0 the parameter α can be chosen sufficiently high, and a ρ_ε > 0 exists, such that

$$\|(\delta q_{i,\alpha}(t), \delta\dot{q}_{i,\alpha}(t), \delta\ddot{q}_{i,\alpha}(t))\| < \varepsilon \quad \text{for all } i, \text{ for all } t \qquad (39)$$

provided ‖(δq_i(0), δq̇_i(0))‖ < ρ_ε at the beginning of every trial.

Proof: Let us start by considering the first trial of the procedure. Due to (37) of Lemma 2, we have that α can be sufficiently increased to satisfy the following two conditions:

$$\alpha \ge \alpha_\varepsilon, \qquad \|\delta u_{0,\alpha}^\circ(t)\| \le \delta_\varepsilon \quad \text{for all } t \qquad (40)$$

where α_ε, δ_ε are the positive numbers appearing in Lemma 1 for a fixed ε > 0, thus implying that δq_{0,α}(t), δq̇_{0,α}(t), δq̈_{0,α}(t) are all uniformly norm bounded by ε, provided that ‖(δq_0(0), δq̇_0(0))‖ < ρ_ε.

Now refer to the second trial and consider the expression of the new normalized input error δu°_{1,α}(t), given by (15) after division by α; i.e.,

$$\delta u_{1,\alpha}^\circ = \frac{1}{\alpha}\left[ A_0\,\delta\ddot{q}_{0,\alpha} + B_0\,\delta\dot{q}_{0,\alpha} + (A_0 - A^*)\ddot{q}^* + (B_0 - B^*)\dot{q}^* + (C_0 - C^*) \right]. \qquad (41)$$

Then, since q*(t), q̇*(t), and q̈*(t) are uniformly bounded, the functions A(·), B(·, ·), C(·) are locally Lipschitzian, and due to the fact that (δq_{0,α}(t), δq̇_{0,α}(t), δq̈_{0,α}(t)) range inside the compact set P_ε ≜ {(δq, δq̇, δq̈): ‖(δq, δq̇, δq̈)‖ < ε}, it follows that the whole part in brackets appearing in (41) is Lipschitzian in P_ε, meaning that

$$\|\delta u_{1,\alpha}^\circ(t)\| \le \frac{1}{\alpha}\,\lambda_\varepsilon\,\varepsilon \quad \text{for all } t, \text{ for all } \alpha \ge \alpha_\varepsilon \qquad (42)$$

where λ_ε is the Lipschitz constant in P_ε.

From the previous inequality it follows that if α is chosen a priori such that

$$\alpha \ge \max\left\{ \alpha_\varepsilon,\; \frac{\lambda_\varepsilon\,\varepsilon}{\delta_\varepsilon} \right\} \qquad (43)$$

we certainly have both of the following conditions satisfied:

$$\alpha \ge \alpha_\varepsilon, \qquad \|\delta u_{1,\alpha}^\circ(t)\| \le \delta_\varepsilon \quad \text{for all } t \qquad (44)$$

which guarantees ε to be a norm upper bound for the trajectory errors occurring during both the first and the second trial (obviously, provided that the same bound ρ_ε is maintained for the initial conditions). At this point, by repeating the same reasoning, it is easy to see that the same choice of α as in (43) guarantees condition (44) to hold for all successive normalized input errors, which consequently implies ‖(δq_{i,α}(t), δq̇_{i,α}(t), δq̈_{i,α}(t))‖ < ε for all i, for all t. This completes the proof of the theorem.

IV. CONVERGENCE ANALYSIS

In this section it is shown that, in the case of null perturbations in the initial states, suitable values for the feedback gains K, L, and P can always be devised guaranteeing not only the uniform boundedness of the trajectory errors δq_i(t), δq̇_i(t), δq̈_i(t), but also their uniform convergence to zero with the increasing number of trials. To introduce the argument, first assume that an asymptotically stable normalized reference model (19) has been chosen, and that the feedback parameter α has been fixed to a sufficiently high value to assure the trajectory error vectors to be uniformly norm bounded by some ε > 0 at any trial (see Theorem 1). Moreover, consider the differential equation (16) rewritten as

$$\delta\ddot{q}_i = [\bar{K} + \bar{A}_i]^{-1}\,\frac{1}{\alpha}\left[ -L\,\delta\dot{q}_i - P\,\delta q_i - B_i\,\delta\dot{q}_i + A_{i-1}\,\delta\ddot{q}_{i-1} + B_{i-1}\,\delta\dot{q}_{i-1} - (A_i - A_{i-1})\ddot{q}^* - (B_i - B_{i-1})\dot{q}^* - (C_i - C_{i-1}) \right]. \qquad (45)$$

The hypothesis about the uniform boundedness of q*(t), q̇*(t), q̈*(t) and the local Lipschitzianity of the functions A(·), B(·, ·), C(·) assure that the vector f̃(δq_i, δq̇_i, δq_{i−1}, δq̇_{i−1}, δq̈_{i−1}) ≜ (δq̇_i, f(δq_i, δq̇_i, δq_{i−1}, δq̇_{i−1}, δq̈_{i−1})), with f equal to the right member of (45), is Lipschitzian in the vectors (δq_i, δq̇_i) and (δq_{i−1}, δq̇_{i−1}, δq̈_{i−1}), with constants G and R, respectively, in the compact set S_ε ≜ {(δq_i, δq̇_i, δq̈_i), (δq_{i−1}, δq̇_{i−1}, δq̈_{i−1}) norm bounded by ε}. Moreover, since R is monotonically decreasing toward zero for increasing α (as can easily be verified from (45)), we suppose that the feedback parameter α has been further increased to assure that R is smaller than 1.

Let Ĉ²([0, T], R^n) be the set of functions r ∈ C²([0, T], R^n) such that r(t), ṙ(t), r̈(t) range inside the compact set P_ε, and let F be the operator mapping Ĉ²([0, T], R^n) into Ĉ²([0, T], R^n) such that F(δq_{i−1}) = δq_i, with δq_i the solution of (45) corresponding to zero initial data. F admits as a fixed point the identically vanishing function in [0, T]. Finally, for every r ∈ Ĉ²([0, T], R^n) we set ‖r‖_ρ ≜ sup_{t∈[0,T]} e^{−ρRt}‖(r(t), ṙ(t), r̈(t))‖, where ρ > 0.

Now we shall prove that F is a contractive operator with respect to the norm ‖·‖_ρ; that is, for all δq'_{i−1}, δq''_{i−1} ∈ Ĉ²([0, T], R^n) we have ‖δq'_i − δq''_i‖_ρ ≤ h‖δq'_{i−1} − δq''_{i−1}‖_ρ, where δq'_i = F(δq'_{i−1}), δq''_i = F(δq''_{i−1}), and 0 < h < 1.

At first we observe that, for all t ∈ [0, T],

$$\|(\delta q_i'(t), \delta\dot{q}_i'(t)) - (\delta q_i''(t), \delta\dot{q}_i''(t))\| \le G\int_0^t \|(\delta q_i'(\tau), \delta\dot{q}_i'(\tau)) - (\delta q_i''(\tau), \delta\dot{q}_i''(\tau))\|\,d\tau + R\int_0^t \|(\delta q_{i-1}'(\tau), \delta\dot{q}_{i-1}'(\tau), \delta\ddot{q}_{i-1}'(\tau)) - (\delta q_{i-1}''(\tau), \delta\dot{q}_{i-1}''(\tau), \delta\ddot{q}_{i-1}''(\tau))\|\,d\tau.$$

By the generalized Gronwall lemma we then have

$$\|(\delta q_i'(t), \delta\dot{q}_i'(t)) - (\delta q_i''(t), \delta\dot{q}_i''(t))\| \le R\,\frac{e^{\rho R t} - 1}{\rho R - G}\,\|\delta q_{i-1}' - \delta q_{i-1}''\|_\rho$$

where ρ is a positive constant such that ρR − G > 0, which we shall specify in the following. Then, carrying out the analogous estimate for the second-derivative errors through (45) and passing to the norm ‖·‖_ρ, we have

$$\|\delta q_i' - \delta q_i''\|_\rho \le \left[\frac{R}{\rho R - G} + \frac{RG}{\rho R - G} + R\right] \|\delta q_{i-1}' - \delta q_{i-1}''\|_\rho \qquad (49)$$

and by choosing ρ in order that R/(ρR − G) + RG/(ρR − G) + R < 1 (which is always possible for ρ sufficiently large, since R < 1), we have that F is a contractive operator. Then the Banach contraction principle assures that lim_{i→∞} ‖δq_i‖_ρ = 0, which implies

$$\lim_{i\to\infty} \|(\delta q_i(t), \delta\dot{q}_i(t), \delta\ddot{q}_i(t))\| = 0, \quad \text{for all } t \in [0, T]. \qquad (50)$$

At this point, it turns out possible to formalize all the considerations previously done via the following conclusion.

Theorem 2: A number ᾱ > 0 exists such that the convergence of the learning procedure is guaranteed for all the feedback gains of the form

$$K = \alpha\bar{K}, \quad L = \alpha\bar{L}, \quad P = \alpha\bar{P}, \quad \text{with } \alpha \ge \bar{\alpha} \qquad (52)$$

provided null position and velocity errors occur at the beginning of any trial.

V. THE ACCELERATION FEEDBACK PROBLEM

In the previous sections a complete solution to the problem of convergence of the learning procedure was presented by assuming the presence of ideal acceleration sensors (with adjustable gains) in the feedback loop. However, since it was noted (see Remark 1) that this assumption makes the presence of algebraic loops unavoidable, it is quite clear that the cases for which the theory has been developed actually represent ideal situations that the use of real acceleration sensors can only approximate. More specifically, the approximation introduced in reality is due to the intrinsic time delay of the acceleration sensors, which (though preserving causality in the loops) always gives rise to a mismatch between the actual acceleration q̈(t) and the measured one.

This consideration, however, leads to the natural conjecture that, in practice, convergence to good approximations of the desired trajectory q*(t) can be obtained, provided the mismatches between the actual and measured accelerations are kept sufficiently small at each trial; a condition that, as can be argued, sets an upper bound on the rate of change of q̈*(t), which in turn depends on the acceleration sensors used.

Even if a quantitative and more rigorous analysis would be required, it is important to keep in mind that the simulation experiments described in the next section actually confirm the previous conjecture, thus maintaining the value of the developed theory for practical applications.

However, in this section it seems important to focus attention on the fact that, in practice, the possibility actually exists of devising cases where the results of the learning theory directly apply in exact form, without using any acceleration sensors. To clarify this idea, reconsider the dynamic equation (1) of a manipulator and rewrite matrix A(q) as

$$A(q) = K + \tilde{A}(q) \qquad (53)$$

where K now represents a constant (even rough) estimate of the whole (coordinate-dependent) matrix A(q). Then rewrite (1) as

$$[K + \tilde{A}(q)]\ddot{q} + B(q, \dot{q})\dot{q} + C(q) = M \qquad (54)$$

and consider a feedback control law (with no acceleration sensors) of the form

$$M = m + Q - L\dot{q} - Pq \qquad (55)$$

giving rise to the closed-loop equation

$$[K + \tilde{A}(q)]\ddot{q} + [L + B(q, \dot{q})]\dot{q} + Pq + [C(q) - Q] = m. \qquad (56)$$

Then, since (56) is actually the same as that which would have been obtained for the fictitious manipulator

$$\tilde{A}(q)\ddot{q} + B(q, \dot{q})\dot{q} + C(q) = M$$

controlled by the noncausal control law

$$M = m + Q - K\ddot{q} - L\dot{q} - Pq,$$

it directly follows that the same reasoning of Section II can be repeated here, thus allowing the same learning procedure to be devised.

The obvious advantages, in this case, derive directly from the absence of the acceleration sensors in the feedback loop and, moreover, from the opportunity of evaluating the acceleration errors via an off-line (and even noncausal) differentiation of the errors relevant to the velocity. With regard to the asymptotic properties, it is straightforward from Theorem 2 that (provided L̄, P̄ are fixed in such a way that condition (51) is satisfied) the convergence is assured if the same theorem applies with ᾱ < 1.
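As a side note, the off-line differentiation mentioned above is a one-liner in practice. The sketch below (an illustrative choice, not the paper's procedure) estimates the acceleration errors from the recorded velocity errors with central differences, which are noncausal in exactly the harmless sense allowed here, since the whole record of the trial is available after time T.

```python
import numpy as np

def acceleration_errors(dqd: np.ndarray, dt: float) -> np.ndarray:
    """Estimate delta_q_dd(t) from the recorded velocity errors delta_q_d(t).

    np.gradient uses central differences, so each estimate draws on both past
    and future samples: noncausal, but unproblematic for off-line processing
    of a completed trial.
    """
    return np.gradient(dqd, dt)

# usage after a trial: dqdd_rec = acceleration_errors(dqd_rec, dt)
```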

VI. SIMULATION EXPERIMENTS

The proposed learning methodology has been preliminarily tested via simple simulation examples. They are briefly described and discussed in the present section.

The manipulator considered for the experiments is the simple one depicted in Fig. 1, represented as a two-link inverted pendulum with masses concentrated at the end of each link and driven by the torque vector M = [M_1, M_2]^T acting in correspondence with each joint. A natural choice for the generalized coordinates to be associated with M is represented by the angle vector q = [γ_1, γ_2]^T, which is related to the vector x = [x_1, x_2]^T (the Cartesian coordinates of the end effector) by the following easily devisable, purely geometric relations:

$$x = f(q), \qquad \dot{x} = J(q)\dot{q}.$$

Fig. 1. Structure of the simple kinematic chain used for the simulation experiments.

As is well known, another interesting geometric relation is

represented by the following:

$$M = J^T(q)F \qquad (57)$$

which allows one to interpret the action of any force vector F applied to the end effector as equivalent to one produced by the torque vector M.

For the chosen example, it is easy to see that all the foregoing relations turn out to be of one-to-one type, provided the invariance of the sign of the angle γ_2 can be somehow assured for all the movements of concern (more will be said about this point in this section). Then, provided that this is actually the case, it follows that the vector x can also be considered as a set of generalized coordinates for our example, thus in turn implying that a dynamic equation of the form

$$A(x)\ddot{x} + B(x, \dot{x})\dot{x} + C(x) = F \qquad (58)$$

actually exists, allowing a complete description of the behavior of the manipulator under the assumed conditions. Since this fact suggests the possibility of implementing the learning procedure by considering desired trajectories and related errors expressed in terms of the Cartesian coordinates of the end effector, the learning-control scheme of Fig. 2 has been devised and used for all the simulation experiments to be described.
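For concreteness, here is one standard set of conventions under which f(q), J(q), and relation (57) can be written down for such an arm; the relative-angle parametrization and the unit link lengths are assumptions of this sketch, since the paper does not spell them out.

```python
import numpy as np

l1, l2 = 1.0, 1.0        # link lengths (placeholders)

def f(q):
    """End-effector position x = f(q); gamma_2 measured relative to link 1."""
    g1, g2 = q
    return np.array([l1*np.cos(g1) + l2*np.cos(g1 + g2),
                     l1*np.sin(g1) + l2*np.sin(g1 + g2)])

def J(q):
    """Jacobian of f, so that x_dot = J(q) q_dot."""
    g1, g2 = q
    s1, s12 = np.sin(g1), np.sin(g1 + g2)
    c1, c12 = np.cos(g1), np.cos(g1 + g2)
    return np.array([[-l1*s1 - l2*s12, -l2*s12],
                     [ l1*c1 + l2*c12,  l2*c12]])

def torque_from_force(q, F):
    """Relation (57): joint torques equivalent to an end-effector force F."""
    return J(q).T @ F
```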

In such a scheme the computed force F is on-line translated into the equivalent torque vector M via relation (57); x and ẋ are deduced from the measurements of q, q̇ via the geometric relations given above; and ẍ is obtained from ẋ via the use of a high-pass filter. Moreover, to maintain the condition about the invariance of the sign of the angle γ_2, an elastic torque feedback, proportional to the deviation (γ_0 − γ_2) of the second joint angle from a fixed reference γ_0, has been inserted a priori (see Fig. 1); it is, however, embedded within the term C(x) of (58).

Fig. 2. Simulated learning control scheme.

In our simulation examples, the adopted values for the feedback gains K, L, P correspond to

$$K = \begin{bmatrix} 2.5 & 0 \\ 0 & 2.5 \end{bmatrix}, \quad L = \begin{bmatrix} 65 & 0 \\ 0 & 65 \end{bmatrix}, \quad P = \begin{bmatrix} 50 & 0 \\ 0 & 50 \end{bmatrix} \qquad (59)$$

while the high-pass filters employed for measuring the accelerations each amount to a transfer function of the form

$$T(s) = 10s/(s + 10). \qquad (60)$$

All the considered reference trajectories x*(t) have always been assigned as a couple of components, each one obtained as the output of a critically damped second-order system of the form

$$\frac{1}{\omega_0^2}\ddot{x}_j + \frac{2}{\omega_0}\dot{x}_j + x_j = f_j, \qquad j = 1, 2 \qquad (61)$$

driven by a step function with settable amplitude.

By varying the settling time (i.e., ω_0) and the amplitude of the step functions relevant to each component of x*(t), different forms for the reference trajectories, in the phase plane, can be obtained, all characterized by a bell-shaped velocity profile. Some of the obtained results are reported in Fig. 3(a)-(c), where the continuous lines represent the reference trajectories (each one having a settling time of the order of 2-4 s), while the dotted ones correspond to the various trials. In these conditions an almost complete convergence to the reference trajectories has always been obtained by the fifth or the seventh trial.

As was expected, by augmenting the values of the feedback gains, an increase of the convergence rate occurs. For example, by simply doubling the values of the feedback gains and referring to the same desired trajectories, convergence is obtained within the second and the third trial.

Fig. 3. Some of the resulting simulation experiments.
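A minimal discrete-time sketch of this setup's signal machinery follows; the forward-Euler discretization, the value ω_0 = 2, and the step amplitude are illustrative assumptions, while the generator realizes (61) and the filter realizes the transfer function (60) as reconstructed above.

```python
import numpy as np

dt, t_end = 1e-3, 6.0
n = int(t_end / dt)
w0, amp = 2.0, 0.5              # settling time of a couple of seconds; illustrative

# Reference component per (61): critically damped second-order system
# driven by a step of amplitude `amp` -> bell-shaped velocity profile.
x, xd = 0.0, 0.0
x_ref, xd_ref = np.empty(n), np.empty(n)
for k in range(n):
    xdd = w0**2 * (amp - x) - 2.0 * w0 * xd   # (1/w0^2) xdd + (2/w0) xd + x = amp
    x, xd = x + xd * dt, xd + xdd * dt        # forward Euler
    x_ref[k], xd_ref[k] = x, xd

# Acceleration measurement per (60): T(s) = 10 s / (s + 10) applied to the
# velocity signal, realized as 10 * (v - lowpass(v)).
def hp_derivative(v: np.ndarray, dt: float, p: float = 10.0) -> np.ndarray:
    out, state = np.empty_like(v), 0.0        # state = first-order lowpass of v
    for k, vk in enumerate(v):
        out[k] = p * (vk - state)             # p*s/(s+p) acting on v
        state += dt * p * (vk - state)
    return out

xdd_meas = hp_derivative(xd_ref, dt)          # mismatch vs. the true acceleration
                                              # stays small for slow references
```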

VII. CONCLUSION

This paper has presented an inherently nonlinear analysis of the so-called iterative learning control procedure applied to robotic manipulators, by adopting a high-gain feedback point of view. In particular, it has been shown that by simply setting a sufficient (but not necessarily high) amount of linear feedback in the system, it is always possible to achieve the following conditions:

a) the closed-loop system has a domain of attraction around the desired trajectory;
b) the correcting signals used at each trial never push the system out of the attraction domain.

Such conditions are actually essential for the subsequent proof of convergence of the procedure, which has also been reported in the paper. Investigations aiming to give a quantitative measure of the minimal amount of linear feedback required for guaranteeing the learning of a given trajectory are currently under development.

REFERENCES

[1] D. E. Whitney, "Resolved motion rate control of manipulators and human prostheses," IEEE Trans. Man-Machine Syst., vol. MMS-10, no. 2, pp. 47-53, June 1969.
[2] —, "The mathematics of coordinated control of prosthetic arms and manipulators," J. Dynamic Syst., Meas., Contr., Trans. ASME, pp. 303-309, Dec. 1972.
[3] R. P. Paul, B. Shimano, and G. E. Mayer, "Kinematic control equations for simple manipulators," IEEE Trans. Syst., Man, Cybern., vol. SMC-11, June 1981.
[4] —, "Modeling, trajectory calculation and servoing of a computer controlled arm," Stanford Univ., Stanford, CA, AI Memo 177, Sept. 1972.
[5] E. Freund, "Path control for a redundant type of industrial robot," presented at the 7th Int. Symp. Industrial Robots, Tokyo, Japan, Oct. 1977.
[6] J. R. Hewit and J. Padovan, "Decoupled feedback control of robot and manipulator arms," in Proc. 3rd CISM-IFToMM Symp. Theory and Practice of Robots and Manipulators, Udine, Sept. 1978.
[7] S. Dubowsky and D. T. DesForges, "The application of model referenced adaptive control to robotic manipulators," ASME J. Dynamic Syst., Meas., Contr., p. 193, Sept. 1979.
[8] G. Casalino, "Control of robotic manipulators via a trial and error approach," Univ. Genoa, Genoa, Italy, Internal Rep. DIST, 1984.
[9] G. Casalino and G. Bartolini, "A learning procedure for the control of movements of robotic manipulators," presented at the IASTED Symp. Robotics and Automation, Amsterdam, The Netherlands, June 1984.
[10] —, "A learning approach to the control of robotic manipulators," presented at the Nat. Meeting Italian Soc. for Automation (ANIPLA), Genova, Italy, Dec. 1985.
[11] G. Casalino and L. Gambardella, "Learning of movements in robotic manipulators," presented at the 1986 IEEE Conf. Robotics and Automation, San Francisco, CA, April 1986.
[12] S. Arimoto, S. Kawamura, and F. Miyazaki, "Bettering operation of robots by learning," J. Robotic Syst., pp. 123-140, 1984.
[13] —, "Can mechanical robots learn by themselves?" in Robotics Research, 2nd Int. Symp. Cambridge, MA: MIT Press, 1985.
[14] S. Arimoto, S. Kawamura, F. Miyazaki, and S. Tamaki, "Learning control theory for dynamical systems," in Proc. 24th Conf. Decision and Control, Ft. Lauderdale, FL, Dec. 1985.
[15] T. Yoshizawa, Stability Theory by Lyapunov's Second Method. Math. Soc. Japan, 1966.
[16] J. J. Craig, "Adaptive control of manipulators through repeated trials," in Proc. 1984 Amer. Control Conf., San Diego, CA, June 1984.

Paola Bondi was born in Caserta, Italy, on April 8, 1951. She received the Dr. Mathematics degree from the University of Naples, Naples, Italy, in 1974. Since 1974 she has been with the Department of Mathematics and Applications, University of Naples, where she is currently an Associate Professor. Her research interests include stability theory, adaptive control, and robotic manipulators.

Giuseppe Casalino was born in Genoa, Italy, in 1949. He graduated in electronic engineering at the University of Genoa in 1975. From 1976 to 1978 he was a Research Fellow in Automatic Control at the Institute of Electrical Engineering of the University of Genoa. Since 1978 he has been an Assistant Professor of Automatic Control and Multivariable Control Theory at the same institute. From October to December 1983 he was a Visiting Fellow at the Australian National University, Department of Systems Engineering. He is currently an Associate Professor at the Department of Communications, Computer, and Systems Science (DIST) of the University of Genoa. His research interests include the fields of adaptive control, multivariable and decentralized control, industrial process control, and robotics.

Lucia Gambardella was born in Torre del Greco (Napoli), Italy, on December 11, 1945. She received the degree in mathematics from the University of Napoli in 1969. Since 1970 she has been with the University of Napoli, where she is presently an Associate Professor. Her present interests are in the areas of stability theory, adaptive control, and learning automata.