
Control of a two-DOF manipulator equipped with a pnr-Variable Stiffness Actuator

Francesco Romano, Luca Fiorio, Giulio Sandini and Francesco Nori¹

Abstract: Recently, new trends in robotics have proposed variable stiffness actuators (VSA) as an alternative to the classical actuator design, which is based on a rigid coupling between motors and actuated joints. These novel technologies call for novel control and planning strategies. In the present paper we consider the problem of motion planning for a specific class of VSA. This class, named passive noise rejecting VSA (pnrVSA), is capable of rejecting noise without explicitly resorting to feedback. We apply an open-loop control obtained as the solution of a stochastic optimal control problem solved with the path integral (PI) approach. Specific focus is on obtaining a feedback-free numerical solution which emphasizes the peculiar characteristics of the pnrVS actuator. The proposed numerical technique is applied to control a two-DOF manipulator equipped with pnrVS actuators. It is shown in simulation that stochastic optimal control can successfully solve a highly unstable task, consisting of pushing against a wall in the presence of instability and noise. The proposed task is reminiscent of real tasks such as screwing and carving.

I. INTRODUCTION

Forthcoming robotic applications require robots to physically interact with the environment. This requirement is extremely different from current industrial applications, where robots are primarily required to perform positioning tasks. These novel applications call for novel design principles and technologies. Within this context compliance, achieved either with proper control strategies or with mechanical flexible elements, is becoming a fundamental requirement to deal with uncertainties in modeling the physical interaction. Compliant actuators have recently been proposed for safe interaction [1], mechanical robustness and energy efficiency [2].

In the context of human motor control, there is substantial evidence supporting the idea that humans regulate compliance via muscle co-activation. Burdet et al. [3] suggested that humans rely on muscle co-activation also to cope with sensorimotor delays and noise in the presence of instabilities. The finding that monkeys actually specify the intrinsic musculoskeletal impedance even when deafferented [4] further reinforced the idea that stiffness regulation is indeed a crucial movement feature that (in biological systems) is not realized with explicit feedback loops.

This aspect has been weakly explored and exploited in the field of robotics. The present paper aims at exploiting this potential by showing (in a stochastic scenario) that certain types of variable stiffness actuators can cope with instabilities in an open-loop manner (i.e. without explicitly relying on the feedback term).

This paper was supported by the FP7 EU projects CoDyCo (No. 600716, ICT 2011.2.1 Cognitive Systems and Robotics) and Koroibot (No. 611909, ICT-2013.2.1 Cognitive Systems and Robotics).

¹The authors are with the Robotics, Brain and Cognitive Sciences Department, Italian Institute of Technology, Genoa, [email protected]

Starting from these findings, we recently designed and built a passive noise rejecting Variable Stiffness Actuator (pnrVSA) prototype based on quadratic non-linear springs [5], [6]. For a single actuated joint, our system adopts two motor-gear groups in agonist-antagonist configuration, coupled to the joint via serial non-linear springs. The novelty of this actuator resides in two parallel non-linear springs connecting the internal motor-gear groups to the actuator frame. These additional elastic elements create a closed force path that mechanically attenuates the effects of external noise. Co-activation (the simultaneous activation of both motors) increases both the joint stiffness and the joint passive noise rejection, which is the ability to reduce the effect of noise without relying on explicit feedback loops. The idea behind this choice is that digital controllers are always band-limited and this limitation should reflect in the disturbance rejection property of the system. In this paper we study the control of a two-DOF manipulator equipped with pnrVS actuators. In particular we focus on the effect of neglecting the feedback term in the control and on exploiting the mechanical feedback provided by the actuators. Results are provided in simulation, showing the effectiveness of the pnrVS actuators.

In this paper we propose a numerical solution for quite a difficult task: controlling a two-DOF planar manipulator equipped with pnrVSAs in an unstable task. Quite a number of solutions exist for closed-loop stochastic planning [7], [8], [9]; in our case we use a recent approach called path-integral stochastic optimal control to construct controllers that can cope with an issue which is often neglected for practical¹ and technical reasons: the absence of feedback (i.e. pure open-loop planning). The objective of the present work is to show how it is possible to exploit mechanical features available in variable stiffness actuators such as pnrVSA to cope with uncertainties without explicitly relying on feedback.

The paper is organized as follows. Section II illustrates the variable stiffness actuator considered in this paper and describes its model and its properties. Section III briefly describes the open-loop control framework we applied to achieve the proposed task. Section IV presents the considered task performed by the two-DOF manipulator equipped with two pnrVSAs and shows the results we obtained.

¹Robots nowadays rely on fast feedback loops with delays and latencies of approximately 1 ms; it therefore makes no sense to force solutions which do not take this efficient feedback into account. Our motivation is slightly different, and therefore the request for open-loop solutions becomes a necessity.

Fig. 1: Schema of the pnrVSA prototype

II. PASSIVE NOISE REJECTING VARIABLE STIFFNESS ACTUATOR

The mathematical model and the properties of our variable stiffness actuator prototype (pnrVSA) are presented here. For a full description of the device, from both the mechanical and the control point of view, we refer to [10], [11], [12].

This type of actuator presents the unique property of being able to mechanically reject external disturbances without relying on explicit feedback. The actuator is composed of two motors connected to the joint through two “serial” nonlinear elastic elements, and it is connected to the frame by means of two “parallel” nonlinear elastic elements. A model of the actuator is shown in Fig. 1. In the following we will consider, for the sake of simplicity, cubic springs as nonlinear elastic elements.

The dynamics of the system can be derived by using the Lagrangian formalism, thus obtaining the following equations:

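$$
\begin{aligned}
I_q \ddot{q} &= k_1(\theta_1 - q)^3 + k_2(\theta_2 - q)^3 - b_q \dot{q} + \tau_{ext} \\
I_1 \ddot{\theta}_1 &= k_1(q - \theta_1)^3 + k'_1(\theta'_1 - \theta_1)^3 - b_1 \dot{\theta}_1 + m_1 \\
I_2 \ddot{\theta}_2 &= k_2(q - \theta_2)^3 + k'_2(\theta'_2 - \theta_2)^3 - b_2 \dot{\theta}_2 + m_2,
\end{aligned}
\tag{1}
$$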

where $q$ is the joint angle, $\theta_1, \theta_2$ are the motor angles, $\theta'_1, \theta'_2$ are the angles between the frame and the torsional springs, $I_q, I_1, I_2$ are the inertias of the joint and of the two motors, $k_1, k_2$ are the serial spring elastic coefficients and $k'_1, k'_2$ are the parallel spring elastic coefficients, $b_q, b_1, b_2$ are the viscous friction coefficients, $\tau_{ext}$ is an external torque acting on the joint and $m_1, m_2$ are the motor torques.
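As a minimal sketch of how Eq. 1 can be evaluated numerically, the following Python functions compute the three accelerations and the motor torques that hold a given configuration at rest. All numerical values in `PARAMS` and the helper names are assumptions made for illustration; they are not the prototype's parameters.

```python
def pnrvsa_accelerations(state, m1, m2, tau_ext, p):
    """Evaluate Eq. 1 and return (ddq, ddtheta1, ddtheta2).

    state = (q, dq, th1, dth1, th2, dth2); p is a dict of model parameters.
    """
    q, dq, th1, dth1, th2, dth2 = state
    ddq = (p["k1"] * (th1 - q) ** 3 + p["k2"] * (th2 - q) ** 3
           - p["bq"] * dq + tau_ext) / p["Iq"]
    ddth1 = (p["k1"] * (q - th1) ** 3 + p["k1p"] * (p["th1p"] - th1) ** 3
             - p["b1"] * dth1 + m1) / p["I1"]
    ddth2 = (p["k2"] * (q - th2) ** 3 + p["k2p"] * (p["th2p"] - th2) ** 3
             - p["b2"] * dth2 + m2) / p["I2"]
    return ddq, ddth1, ddth2


def holding_torques(q, th1, th2, p):
    """Motor torques that keep both motors at rest (zero velocity and acceleration)."""
    m1 = -(p["k1"] * (q - th1) ** 3 + p["k1p"] * (p["th1p"] - th1) ** 3)
    m2 = -(p["k2"] * (q - th2) ** 3 + p["k2p"] * (p["th2p"] - th2) ** 3)
    return m1, m2


# Hypothetical parameter values, chosen only for illustration.
PARAMS = {"Iq": 0.1, "I1": 2.0, "I2": 2.0, "k1": 10.0, "k2": 10.0,
          "k1p": 10.0, "k2p": 10.0, "th1p": 0.0, "th2p": 0.0,
          "bq": 0.5, "b1": 4.0, "b2": 4.0}

# Symmetric co-activation: motors displaced by +/-0.2 rad around q = 0.
state = (0.0, 0.0, 0.2, 0.0, -0.2, 0.0)
m1, m2 = holding_torques(0.0, 0.2, -0.2, PARAMS)
acc = pnrvsa_accelerations(state, m1, m2, 0.0, PARAMS)
print(m1, m2, acc)
```

With equal spring coefficients, the symmetric configuration is an equilibrium: the two motors push with opposite torques and the joint stays still.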

The net internal torque acting on the joint $q$ is given by the following equation:

$$
\tau = k_1(\theta_1 - q)^3 + k_2(\theta_2 - q)^3, \tag{2}
$$

and the equivalent stiffness at joint q is given by:

$$
k_q = -\frac{\partial \tau}{\partial q} = 3k_1(\theta_1 - q)^2 + 3k_2(\theta_2 - q)^2. \tag{3}
$$

We can notice that by controlling $\theta_1$ and $\theta_2$ we can vary both the torque at the joint and its intrinsic stiffness.
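The independence of torque and stiffness can be checked with a short sketch of Eqs. 2 and 3. The spring coefficients and motor angles below are illustrative assumptions: a symmetric co-activation leaves the net torque at zero while the stiffness grows with the square of the motor displacement.

```python
def joint_torque(q, th1, th2, k1, k2):
    """Net internal torque of Eq. 2 for cubic serial springs."""
    return k1 * (th1 - q) ** 3 + k2 * (th2 - q) ** 3


def joint_stiffness(q, th1, th2, k1, k2):
    """Equivalent joint stiffness of Eq. 3, i.e. minus the derivative of Eq. 2 in q."""
    return 3 * k1 * (th1 - q) ** 2 + 3 * k2 * (th2 - q) ** 2


# Illustrative (assumed) values: unit spring coefficients, joint at q = 0.
k1 = k2 = 1.0
low = (0.1, -0.1)    # mild co-activation
high = (0.2, -0.2)   # stronger co-activation

tau_low = joint_torque(0.0, *low, k1, k2)
tau_high = joint_torque(0.0, *high, k1, k2)
kq_low = joint_stiffness(0.0, *low, k1, k2)
kq_high = joint_stiffness(0.0, *high, k1, k2)

# Central finite difference of Eq. 2 as a consistency check of Eq. 3.
h = 1e-6
q0, th1, th2 = 0.05, 0.3, -0.1
fd = -(joint_torque(q0 + h, th1, th2, k1, k2)
       - joint_torque(q0 - h, th1, th2, k1, k2)) / (2 * h)

print(tau_low, tau_high, kq_low, kq_high, fd)
```

Doubling the co-activation angle quadruples the stiffness in this example, since Eq. 3 is quadratic in the spring deflection.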

In our simulation we consider a two-DOF planar manipulator. Each joint is actuated by a pnrVS actuator. The dynamic equation of the manipulator is [13]:

$$
M(q)\ddot{q} + C(q, \dot{q})\dot{q} - J_c^\top \lambda = \tau, \tag{4}
$$

where $q = [q_1 \;\; q_2]^\top$ is the generalized coordinate vector, $M(q)$ is the mass matrix, $C(q, \dot{q})\dot{q}$ collects the velocity-dependent torque terms, $\lambda$ is the vector of reaction forces due to contacts, $J_c$ is the Jacobian of the contact points and $\tau = [\tau_1 \;\; \tau_2]^\top$ is the generalized force vector. Because the actuation is provided by pnrVS actuators, each joint torque term has the form of Eq. 2.

III. OPEN-LOOP STOCHASTIC OPTIMAL CONTROL

In our simulations we generate the control policies by means of a stochastic optimal control algorithm. We consider stochastic nonlinear control-affine systems in the following form:

$$
dx = a(x(t), t)\, dt + B(x(t), t)\, u(x(t), t)\, dt + C(x, t)\, dw, \tag{5}
$$

where $x \in \mathbb{R}^n$ is the state of the system, $u \in \mathbb{R}^m$ is the control input, $w \in \mathbb{R}^m$ is Brownian noise, $a(\cdot)$ is the drift term, $B(\cdot)$ is the control matrix and $C(\cdot)$ is the diffusion matrix. Note that the equality is defined in the sense of Itô.

We define the cost-to-go at time t0 starting in state x0 as:

$$
J(x_0, t_0) = E\left[\phi(x(t_f)) + \int_{t_0}^{t_f} \left( L(x, t) + \frac{1}{2} u^\top R u \right) dt\right], \tag{6}
$$

where $\phi(\cdot)$ is a final state-dependent cost and $L(\cdot)$ is the Lagrangian cost term.

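The optimal cost-to-go $J^*$ satisfies the stochastic version of the Hamilton-Jacobi-Bellman (HJB) equation:

$$
-\frac{\partial J^*}{\partial t} = \min_u \left[ L + \frac{1}{2} u^\top R u + (a + Bu)^\top \frac{\partial J^*}{\partial x} + \frac{1}{2} \operatorname{tr}\left( C C^\top \frac{\partial^2 J^*}{\partial x^2} \right) \right],
$$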

and the optimal control $u^*$ can be expressed as $u^* = -R^{-1} B^\top \frac{\partial J^*(x,t)}{\partial x}$.

The HJB equation can be transformed into a linear second order partial differential equation by performing a logarithmic transformation $\psi = \exp(-\frac{1}{\lambda} J)$ and by assuming that

$$
C = B \sqrt{\lambda R^{-1}}, \tag{7}
$$

with $0 < \lambda \in \mathbb{R}$. Solutions of the linearized PDE can be obtained by using the Feynman-Kac formula [7], [14], [15].
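The intermediate steps of this linearization can be sketched as follows. This is a condensed version of the standard argument in [7], [15], assuming that $R$ is symmetric positive definite so that Eq. 7 implies $C C^\top = \lambda B R^{-1} B^\top$. Minimizing over $u$ gives the optimal control; substituting it back and applying $J^* = -\lambda \log \psi$ makes the two quadratic terms in $\partial\psi/\partial x$ cancel:

```latex
\begin{align*}
0 &= R u + B^\top \frac{\partial J^*}{\partial x}
  \;\Rightarrow\; u^* = -R^{-1} B^\top \frac{\partial J^*}{\partial x}, \\
-\frac{\partial J^*}{\partial t} &= L
  - \frac{1}{2} \left(\frac{\partial J^*}{\partial x}\right)^{\!\top} B R^{-1} B^\top \frac{\partial J^*}{\partial x}
  + a^\top \frac{\partial J^*}{\partial x}
  + \frac{1}{2} \operatorname{tr}\!\left( C C^\top \frac{\partial^2 J^*}{\partial x^2} \right), \\
\frac{\partial J^*}{\partial x} &= -\frac{\lambda}{\psi} \frac{\partial \psi}{\partial x}, \qquad
\frac{\partial^2 J^*}{\partial x^2} = -\frac{\lambda}{\psi} \frac{\partial^2 \psi}{\partial x^2}
  + \frac{\lambda}{\psi^2} \frac{\partial \psi}{\partial x} \left(\frac{\partial \psi}{\partial x}\right)^{\!\top}, \\
-\frac{\partial \psi}{\partial t} &= -\frac{1}{\lambda} L \, \psi
  + a^\top \frac{\partial \psi}{\partial x}
  + \frac{1}{2} \operatorname{tr}\!\left( C C^\top \frac{\partial^2 \psi}{\partial x^2} \right),
  \qquad \psi(x, t_f) = \exp\!\left( -\frac{1}{\lambda} \phi(x) \right).
\end{align*}
```

The last line is linear in $\psi$, which is what allows the Feynman-Kac representation as an expectation over uncontrolled noisy trajectories.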

It can then be shown that the optimal control can be expressed at each state/time as a path integral [7], which can be approximated via importance sampling methods. In this paper we use a slightly modified version of the original PI² algorithm [16], [17] which produces open-loop solutions instead of feedback ones.
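The sampling idea can be illustrated with a deliberately simplified sketch. It is not the modified PI² algorithm used in this paper: it only shows the generic mechanism of weighting perturbed open-loop control sequences by the exponentiated cost of their rollouts. The scalar unstable system, the cost, the exploration noise level and all function names are assumptions made for illustration.

```python
import math
import random


def pi_weights(costs, lam):
    """Weights proportional to exp(-S_k / lam), normalised to sum to one."""
    s_min = min(costs)  # subtract the minimum for numerical stability
    unnorm = [math.exp(-(s - s_min) / lam) for s in costs]
    total = sum(unnorm)
    return [w / total for w in unnorm]


def open_loop_update(u, perturbations, costs, lam):
    """Add the cost-weighted average perturbation to the control sequence.

    The update depends only on time, never on the measured state,
    so the resulting control is open loop.
    """
    w = pi_weights(costs, lam)
    return [u_i + sum(w_k * eps[i] for w_k, eps in zip(w, perturbations))
            for i, u_i in enumerate(u)]


def rollout_cost(u, dt=0.01, x0=0.1, a=1.0, c=0.1, r=1.0, rng=random):
    """Cost of one noisy rollout of the toy system dx = (a x + u) dt + c dw."""
    x, cost = x0, 0.0
    for u_i in u:
        cost += (x * x + 0.5 * r * u_i * u_i) * dt
        x += (a * x + u_i) * dt + math.sqrt(dt) * c * rng.gauss(0.0, 1.0)
    return cost + x * x  # terminal cost


if __name__ == "__main__":
    rng = random.Random(0)
    u = [0.0] * 50                      # initial open-loop control sequence
    for _ in range(20):                 # policy improvement iterations
        eps = [[rng.gauss(0.0, 0.5) for _ in u] for _ in range(30)]
        costs = [rollout_cost([u_i + e for u_i, e in zip(u, ek)], rng=rng)
                 for ek in eps]
        u = open_loop_update(u, eps, costs, lam=0.1)
    print("first control values:", u[:3])
```

A smaller `lam` concentrates the weights on the cheapest rollouts, while a larger one averages more evenly across samples.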

Fig. 2: Two-DOF manipulator equipped with two pnrVSAs. The pushing task against the wall is shown together with the reaction force $\lambda_{wall}$ and the divergent force field $F_x(x)$.

To solve the optimal control problem numerically, we consider a discrete-time representation of the system in Eq. 5:

$$
x_{i+1} = x_i + a_i \Delta t + B_i u_i \Delta t + \sqrt{\Delta t}\, C_i \epsilon_i, \tag{8}
$$

where $i = 1, \dots, n \in \mathbb{N}$, $\Delta t = T/n$, $T = t_f - t_0$, $x_i = x(i \Delta t + t_0)$, $\epsilon_i \sim \mathcal{N}(0, 1_m)$ and $1_m \in \mathbb{R}^{m \times m}$ denotes the identity matrix of size $m$.
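One step of Eq. 8 can be sketched as below, using plain Python lists for vectors and matrices. The double-integrator example and the function name are assumptions for illustration; setting the diffusion matrix to zero recovers the explicit Euler step.

```python
import math
import random


def euler_maruyama_step(x, a, B, u, C, dt, rng):
    """One step of Eq. 8: x + a*dt + B*u*dt + sqrt(dt)*C*eps, eps ~ N(0, 1_m)."""
    m = len(u)
    eps = [rng.gauss(0.0, 1.0) for _ in range(m)]
    x_next = []
    for j in range(len(x)):
        control = sum(B[j][k] * u[k] for k in range(m))
        noise = sum(C[j][k] * eps[k] for k in range(m))
        x_next.append(x[j] + a[j] * dt + control * dt + math.sqrt(dt) * noise)
    return x_next


# Toy double integrator (an assumption for illustration): state = (position, velocity).
rng = random.Random(0)
x = [1.0, 0.0]
a = [x[1], 0.0]            # drift evaluated at the current state
B = [[0.0], [1.0]]         # the control acts on the velocity only
C_zero = [[0.0], [0.0]]    # no noise: the step reduces to explicit Euler
C_noisy = [[0.0], [0.5]]   # noise enters through the same channel as the control

x_det = euler_maruyama_step(x, a, B, [2.0], C_zero, 0.1, rng)
x_noisy = euler_maruyama_step(x, a, B, [2.0], C_noisy, 0.1, rng)
print(x_det, x_noisy)
```

Note that the noise term scales with the square root of the time step, which is what distinguishes this scheme from a deterministic Euler step with an added disturbance.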

IV. SIMULATION RESULTS

In this paper we illustrate the effectiveness of pnrVSA for unstable tasks. In the simulated task a multi-joint arm equipped with pnrVSAs pushes against a wall with its end effector. Fig. 2 shows the experimental setup. At the contact point an unstable, divergent force acts parallel to the wall. The force is modeled as $F_x(x) = (x - x_0) + a \tanh(b(x - x_0))$, where $x_0$ is a reference point. The properties of the two-DOF arm are based on human-like measures: $l_1 = 0.3$ m, $l_2 = 0.4$ m, $m_1 = 1.4$ kg, $m_2 = 1.1$ kg, center of mass positions $c_1 = 0.11$ m, $c_2 = 0.16$ m. The moment of inertia of each link is computed as $I_i = m_i c_i^2$. The viscous friction coefficients of the internal motors are $b_{ij} = 4$ N m s/rad, and the inertias of the motors are assumed to be 2 N m s²/rad.
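The force field and the link inertias can be written down directly. In the sketch below the values of `a`, `b` and `x0` are placeholders, since the text does not report them; the masses and center of mass positions are the ones listed above.

```python
import math


def divergent_force(x, x0=0.0, a=1.0, b=10.0):
    """F_x(x) = (x - x0) + a*tanh(b*(x - x0)); a, b, x0 are assumed values."""
    return (x - x0) + a * math.tanh(b * (x - x0))


def link_inertia(mass, com):
    """Moment of inertia I_i = m_i * c_i**2 used for each link."""
    return mass * com ** 2


I1 = link_inertia(1.4, 0.11)   # kg m^2
I2 = link_inertia(1.1, 0.16)   # kg m^2

# The force has the same sign as the displacement, so it pushes away from x0.
f_right = divergent_force(0.01)
f_left = divergent_force(-0.01)
print(I1, I2, f_right, f_left)
```

Because the force grows with the displacement from $x_0$, the reference point is an unstable equilibrium: any deviation is amplified unless the arm is stiff enough to resist it.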

The tasks the actuator has to accomplish are the following: i) push with a constant force of 10 N against the wall along the y-axis; ii) keep the end effector position $p = [x_{ee} \;\; y_{ee}]^\top$ at $p^* = [0.3\,\mathrm{m} \;\; 0.4\,\mathrm{m}]^\top$; iii) minimize the control effort needed to accomplish the tasks.

The dynamic equation of a two-DOF planar manipulator is the one in Eq. 4. We added the rigid contact constraint equation, expressed at the acceleration level:

$$
J_c \ddot{q} + \dot{J}_c \dot{q} = 0
$$
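One common way to use this constraint, sketched here under the assumption that $M(q)$ is invertible and $J_c$ has full row rank, is to eliminate $\ddot{q}$ between Eq. 4 and the constraint and solve for the contact force:

```latex
\begin{align*}
\ddot{q} &= M^{-1} \left( \tau - C(q, \dot{q}) \dot{q} + J_c^\top \lambda \right), \\
0 &= J_c M^{-1} \left( \tau - C(q, \dot{q}) \dot{q} \right)
   + J_c M^{-1} J_c^\top \lambda + \dot{J}_c \dot{q}, \\
\lambda &= -\left( J_c M^{-1} J_c^\top \right)^{-1}
   \left[ J_c M^{-1} \left( \tau - C(q, \dot{q}) \dot{q} \right) + \dot{J}_c \dot{q} \right].
\end{align*}
```

Substituting $\lambda$ back into the first line gives the constrained joint accelerations, so that the wall reaction can be evaluated at every simulation step.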

We can rewrite Eq. 4 together with the actuator dynamic equations in (1) in state space form, as this is needed by the algorithm introduced in Section III. We define the state vector $x \in \mathbb{R}^{12}$ and the control vector $u \in \mathbb{R}^{4}$ as follows:

$$
x := \begin{bmatrix} x_{1,2} \\ x_{3,4} \\ x_{5,6} \\ x_{7,8} \\ x_{9,10} \\ x_{11,12} \end{bmatrix}
= \begin{bmatrix} q \\ \dot{q} \\ \theta_1 \\ \theta_2 \\ \dot{\theta}_1 \\ \dot{\theta}_2 \end{bmatrix},
\qquad
u := \begin{bmatrix} m_1 \\ m_2 \end{bmatrix},
$$

where $\theta_i = [\theta_{i1} \;\; \theta_{i2}]^\top$ and $m_i = [m_{i1} \;\; m_{i2}]^\top$. It is now possible to write the system in state space form. For use in a numerical algorithm, we discretize it with the Euler method with a time step of 10 ms, thus obtaining a dynamical system in the form of Eq. 8.

TABLE I: Cost matrices used to compute the control policies.

Matrix    Value
Q         10^10 · 1_4
q_λ       10^6
R_s       10^3 · 1_2
R         1_4

TABLE II: Total costs and their components in the two different scenarios for the two different control solutions.

                   Low noise                        High noise
                   High stiffness   Low stiffness   High stiffness   Low stiffness
Task cost          1.62 × 10^8      1.53 × 10^8     2.84 × 10^8      4.04 × 10^9
Stiffness cost     2.02 × 10^6      1.42 × 10^6     1.95 × 10^6      8.45 × 10^5
Trajectory cost    4.82 × 10^8      4.21 × 10^8     8.62 × 10^8      3.82 × 10^9
Control cost       5.58 × 10^4      3.15 × 10^4     5.58 × 10^4      3.15 × 10^4
End cost           4.62 × 10^7      5.68 × 10^7     1.05 × 10^8      1.22 × 10^9
Total cost         5.28 × 10^8      4.78 × 10^8     9.67 × 10^8      5.04 × 10^9

We now define our cost function as in Eq. 6. The Lagrange cost term has been chosen as:

$$
L(x, t) = \begin{bmatrix} (p - p^*)^\top & \dot{p}^\top \end{bmatrix} Q \begin{bmatrix} p - p^* \\ \dot{p} \end{bmatrix} + q_\lambda (\lambda_{wall} - \lambda_{des})^2 + k_q^\top R_s k_q,
$$

where $k_q = [k_{q_1} \;\; k_{q_2}]^\top$ is the stiffness of the two joints. The actual values of the cost matrices used in our simulation are listed in Table I. The final cost $\phi(x(t_f))$ is equal to the state-dependent running cost $L(x, t)$.
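A minimal sketch of this running cost, using the weights of Table I, is given below. Since $Q$ and $R_s$ are scalar multiples of the identity, the quadratic forms reduce to weighted squared norms. Taking $\lambda_{des}$ equal to the 10 N pushing force of task i) is an assumption, as are the function and constant names.

```python
P_STAR = (0.3, 0.4)   # desired end effector position [m]
Q_SCALE = 1e10        # Q = 10^10 * identity (Table I)
Q_LAMBDA = 1e6        # weight on the wall force error (Table I)
RS_SCALE = 1e3        # R_s = 10^3 * identity (Table I)
LAMBDA_DES = 10.0     # assumed desired wall force [N], from task i)


def running_cost(p, p_dot, lam_wall, k_q):
    """State-dependent running cost L(x, t): trajectory, force and stiffness terms."""
    err = (p[0] - P_STAR[0], p[1] - P_STAR[1])
    trajectory = Q_SCALE * (err[0] ** 2 + err[1] ** 2
                            + p_dot[0] ** 2 + p_dot[1] ** 2)
    force = Q_LAMBDA * (lam_wall - LAMBDA_DES) ** 2
    stiffness = RS_SCALE * (k_q[0] ** 2 + k_q[1] ** 2)
    return trajectory + force + stiffness


at_target = running_cost((0.3, 0.4), (0.0, 0.0), 10.0, (0.0, 0.0))
one_mm_off = running_cost((0.301, 0.4), (0.0, 0.0), 10.0, (0.0, 0.0))
one_newton_off = running_cost((0.3, 0.4), (0.0, 0.0), 9.0, (0.0, 0.0))
print(at_target, one_mm_off, one_newton_off)
```

With these weights a 1 mm position error costs about $10^4$ per unit time and a 1 N force error costs $10^6$, which gives a sense of the scale of the entries in Table II.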

In order to show how stiffness regulation can impact the manipulator performance, we compute two different control solutions by varying the noise level entering the system. To do that we changed the parameter $\lambda$ in Eq. 7. We denote the solution with $\lambda = 1$ as the “high stiffness” solution and the solution with $\lambda = 0.01$ as the “low stiffness” solution. Once the control policy has been computed in the two aforementioned ways, we evaluated it by applying it to two different scenarios: a “low noise” scenario, i.e. the evolution of the system has been computed with $\lambda = 0.01$, and a “high noise” scenario, i.e. $\lambda = 1$ in the simulation.

We performed 20 evaluations of 10 s for each scenario. The resulting costs are shown in Table II. Fig. 3 and Fig. 5 show the end effector average position (in red) together with the standard deviation (the blue shaded area).

Fig. 3: Simulation of the actuator behavior with low noise. (a) High stiffness solution: end effector x position, x_ee. (b) High stiffness solution: end effector y position, y_ee. (c) Low stiffness solution: end effector x position, x_ee. (d) Low stiffness solution: end effector y position, y_ee. The mean is plotted in red. The shaded blue region represents the standard deviation.

Fig. 4: Torques and joint stiffness with high noise. (a) High stiffness solution. Top: k_q1; Bottom: k_q2. (b) High stiffness solution. Top: τ1; Bottom: τ2. (c) Low stiffness solution. Top: k_q1; Bottom: k_q2. (d) Low stiffness solution. Top: τ1; Bottom: τ2. The mean is plotted in red. The shaded blue region represents the standard deviation.

Fig. 5: Simulation of the actuator behavior with high noise. (a) High stiffness solution: end effector x position, x_ee. (b) High stiffness solution: end effector y position, y_ee. (c) Low stiffness solution: end effector x position, x_ee. (d) Low stiffness solution: end effector y position, y_ee. The mean is plotted in red. The shaded blue region represents the standard deviation.

Figure 3 shows the performance of the two control policies in the low noise scenario. As expected, the two solutions are almost equivalent, with almost no error along the y-axis and a low variance along the x-axis. The first two columns of Table II show the various components of the (averaged) cost. The task-related costs are almost the same, while the stiffness and control cost components of the low stiffness control policy are markedly lower than those of the high stiffness solution (about 30% and 44% lower, respectively). This shows that high stiffness is unnecessary and expensive to adopt in the case of low noise.

Figure 5 shows instead how the same control policies perform in a noisier scenario. While the stiffer solution manages to stay around the desired position, the low stiffness control rapidly drifts away from the reference position, as shown in Fig. 5c. The last two columns of Table II show the costs in this scenario. Unlike the previous scenario, in this case the task-related costs are one order of magnitude larger in the low stiffness control solution.

Figures 6 and 4 show the joint torques and stiffness in the low noise and high noise scenarios, respectively. In both scenarios, and in both the high and low stiffness control solutions, the torque and stiffness at q1 are greater than the ones at q2. This fact, even if it may seem counterintuitive, is easily explained by computing the Jacobian of the end effector. Indeed, in the considered manipulator configuration torques at the second joint can only generate forces along the x direction, while torques at the first joint can generate forces in both the x and y directions.
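The Jacobian argument can be checked numerically with the standard two-link planar kinematics. The joint configuration q = (0, π/2) used below is an assumption: it is one configuration that places the end effector at p* = (0.3 m, 0.4 m) with the given link lengths. In this sketch the static relation τ = Jᵀ F shows that a force along y loads only the first joint, while a force along x loads both.

```python
import math

L1, L2 = 0.3, 0.4  # link lengths [m]


def forward_kinematics(q1, q2):
    """End effector position of a two-link planar arm."""
    return (L1 * math.cos(q1) + L2 * math.cos(q1 + q2),
            L1 * math.sin(q1) + L2 * math.sin(q1 + q2))


def jacobian(q1, q2):
    """End effector Jacobian: rows are (x, y), columns are (q1, q2)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-L1 * s1 - L2 * s12, -L2 * s12],
            [L1 * c1 + L2 * c12, L2 * c12]]


def joint_torques(J, fx, fy):
    """Static mapping tau = J^T F from an end effector force to joint torques."""
    return (J[0][0] * fx + J[1][0] * fy,
            J[0][1] * fx + J[1][1] * fy)


q = (0.0, math.pi / 2)             # assumed configuration reaching p*
p = forward_kinematics(*q)
J = jacobian(*q)
tau_push = joint_torques(J, 0.0, 10.0)   # 10 N along y (pushing direction)
tau_side = joint_torques(J, 1.0, 0.0)    # 1 N along x (parallel to the wall)
print(p, tau_push, tau_side)
```

In this configuration the 10 N push is balanced by the first joint alone, which is consistent with the larger torque and stiffness observed at q1.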

V. CONCLUSIONS

Recently, a number of novel actuator designs with variable passive properties have been proposed. One of the proposed designs (pnrVIA [10]) allows for simultaneous control [11] of passive stiffness and passive noise rejection. The specific features of this new actuator principle call for novel methods for movement planning: in this paper we analyzed how open-loop stochastic optimal control can be used to properly exploit the advantages offered by this actuator.

The paper relied on a slight modification of available stochastic optimal control methodologies. The method used falls into a class which relies on stochastic sampling of diffusion processes to approximate path integrals. The optimal control algorithm has been used for open-loop movement and stiffness planning with pnrVSA. An unstable task was performed by exploiting the unique property of this class of VSA actuators, namely the ability to change the system passive stiffness and noise rejection by actuator co-activation (and therefore without relying on sensory feedback). Simulation results showed how, by taking uncertainties into account during the planning phase, it is possible to successfully accomplish the desired task even without resorting to a feedback control law.

Fig. 6: Torques and joint stiffness with low noise. (a) High stiffness solution. Top: k_q1; Bottom: k_q2. (b) High stiffness solution. Top: τ1; Bottom: τ2. (c) Low stiffness solution. Top: k_q1; Bottom: k_q2. (d) Low stiffness solution. Top: τ1; Bottom: τ2. The mean is plotted in red. The shaded blue region represents the standard deviation.

REFERENCES

[1] A. Bicchi and G. Tonietti, “Fast and soft-arm tactics,” IEEE Robotics & Automation Magazine, no. June, 2004.

[2] B. Vanderborght, R. Van Ham, D. Lefeber, T. G. Sugar, and K. W. Hollander, “Comparison of Mechanical Design and Energy Consumption of Adaptable, Passive-compliant Actuators,” The International Journal of Robotics Research, vol. 28, no. 1, pp. 90–103, Jan. 2009.

[3] E. Burdet, R. Osu, D. W. Franklin, T. E. Milner, and M. Kawato, “The central nervous system stabilizes unstable dynamics by learning optimal impedance,” Nature, vol. 414, pp. 446–449, 2001.

[4] A. Polit and E. Bizzi, “Characteristics of motor programs underlying arm movements in monkeys,” Journal of Neurophysiology, vol. 42, pp. 183–194, 1979.

[5] B. Berret, S. Ivaldi, F. Nori, and G. Sandini, “Stochastic optimal control with variable impedance manipulators in presence of uncertainties and delayed feedback,” in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, Sept. 2011, pp. 4354–4359.

[6] B. Berret, I. Yung, and F. Nori, “Open-loop stochastic optimal control of a passive noise-rejection variable stiffness actuator: Application to unstable tasks,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, Nov. 2013, pp. 3029–3034.

[7] H. Kappen, “Linear Theory for Control of Nonlinear Stochastic Systems,” Physical Review Letters, vol. 95, no. 20, p. 200201, Nov. 2005.

[8] E. Todorov, “Efficient computation of optimal actions,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 28, pp. 11478–83, July 2009.

[9] E. Theodorou, J. Buchli, and S. Schaal, “Reinforcement learning of motor skills in high dimensions: A path integral approach,” 2010 IEEE International Conference on Robotics and Automation, vol. 1, pp. 2397–2403, 2010.

[10] L. Fiorio, A. Parmiggiani, B. Berret, G. Sandini, and F. Nori, “pnrVSA: human-like actuator with non-linear springs in agonist-antagonist configuration,” 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), no. 1, pp. 502–507, Nov. 2012.

[11] F. Nori and L. Fiorio, “Control of a Single Degree of Freedom Noise Rejecting - Variable Impedance Actuator,” in 10th IFAC Symposium on Robot Control, P. Ivan, Ed., Sept. 2012, pp. 473–478.

[12] L. Fiorio, F. Romano, A. Parmiggiani, G. Sandini, and F. Nori, “On the effects of internal stiction in pnrVIA actuators,” in Humanoid Robots, 13th IEEE-RAS International Conference on, Atlanta, USA, 2013.

[13] B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo, Robotics: Modelling, Planning and Control, 2009.

[14] B. Oksendal, Stochastic Differential Equations. An Introduction with Applications, 2005.

[15] E. Theodorou, “Iterative path integral stochastic optimal control: theory and applications to motor control,” Ph.D. dissertation, 2011.

[16] E. Theodorou and F. Stulp, “An Iterative Path Integral Stochastic Optimal Control Approach for Learning Robotic Tasks,” World Congress, 2011.

[17] E. Theodorou and E. Todorov, “Relative entropy and free energy dualities: Connections to path integral and KL control,” in CDC. IEEE, 2012, pp. 1466–1473.
