towards the development of a soft manipulator as an …...2018/08/06 · research article towards...

Research Article

Towards the development of a softmanipulator as an assistive robot forpersonal care of elderly people

Yasmin Ansari1, Mariangela Manti1, Egidio Falotico1,Yoan Mollard2, Matteo Cianchetti1 and Cecilia Laschi1

AbstractManipulators based on soft robotic technologies exhibit compliance and dexterity which ensures safe human–robotinteraction. This article is a novel attempt at exploiting these desirable properties to develop a manipulator for an assistiveapplication, in particular, a shower arm to assist the elderly in the bathing task. The overall vision for the soft manipulatoris to concatenate three modules in a serial manner such that (i) the proximal segment is made up of cable-based actuationto compensate for gravitational effects and (ii) the central and distal segments are made up of hybrid actuation toautonomously reach delicate body parts to perform the main tasks related to bathing. The role of the latter modules iscrucial to the application of the system in the bathing task; however, it is a nontrivial challenge to develop a robust andcontrollable hybrid actuated system with advanced manipulation capabilities and hence, the focus of this article. We firstintroduce our design and experimentally characterize its functionalities, which include elongation, shortening, omnidir-ectional bending. Next, we propose a control concept capable of solving the inverse kinetics problem using multiagentreinforcement learning to exploit these functionalities despite high dimensionality and redundancy. We demonstrate theeffectiveness of the design and control of this module by demonstrating an open-loop task space control where it suc-cessfully moves through an asymmetric 3-D trajectory sampled at 12 points with an average reaching accuracy of 0.79 cm+ 0.18 cm. Our quantitative experimental results present a promising step toward the development of the softmanipulator eventually contributing to the advancement of soft robotics.

KeywordsAssistive robotics, soft robotics, machine learning, robot control

Date received: 20 May 2016; accepted: 10 December 2016

Topic: Special Issue - Soft Robotics Interacting with the Real WorldTopic Editor: Li Wen

Introduction

Activities of daily living (ADLs) refer to the daily tasks

people should be able to independently accomplish1 and

are grouped into two general categories: (i) basic ADLs

(BADLs) and (ii) instrumental ADLs. However, achieving

these is a significant challenge for the elderly generation

with chronic ailments and loss of abilities.2,3 Initially, chil-

dren took care of their elders; however, nowadays the

majority of the population is dependent upon professional

healthcare services. This has brought about an interest to

the development of advanced assistive technical devices

(also referred to as assistive robotics) in order to promote

independent living and consequently provide the possibil-

ity to the elderly generation to improve their quality of

life.2,4,5 This work focuses on the development of assistive

devices for BADLs, in particular, the task of bathing which

1The BioRobotics Institute, Scuola Superiore Sant’Anna, Pisa, Italy2Flowers Lab, Inria, France

Corresponding author:

Yasmin Ansari, The BioRobotics Institute, Scuola Superiore Sant’Anna, via

Rinaldo Piaggio, 34 Pontedera 56025, Italy.

Email: [email protected]

International Journal of AdvancedRobotic Systems

March-April 2017: 1–17ª The Author(s) 2017

DOI: 10.1177/1729881416687132journals.sagepub.com/home/arx

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 3.0 License

(http://www.creativecommons.org/licenses/by/3.0/) which permits any use, reproduction and distribution of the work without

further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/

open-access-at-sage).

mailto:[email protected]

https://doi.org/10.1177/1729881416687132

http://journals.sagepub.com/home/arx

https://us.sagepub.com/en-us/nam/open-access-at-sage

https://us.sagepub.com/en-us/nam/open-access-at-sage

http://crossmark.crossref.org/dialog/?doi=10.1177%2F1729881416687132&domain=pdf&date_stamp=2017-04-11

is considered challenging because the entire body is

exposed in this scenario. Literature reveals, that this area

has been scarcely investigated, with only two commercially

available technical solutions.

� Oasis Seated Shower system:6 provides an independent,

safe, and hygienic shower experience. The system is

equipped with a push-button soap application and rinse

system. On the downside, it completely lacks active

interaction between the user and the tool which also

includes lack of functionalities such as rubbing/wiping

the senior. Furthermore, it is bulky and expensive.

� Seat Lift Device:7 is much simpler, partially auto-

mated, and helps positioning the user in the bathing

task. However, it does not provide any assistance in

showering task.

There still exists a multitude of unaddressed issues in

this domain where the system should: (i) be client-oriented

i.e. translate customer requests into technical requirements;

(ii) be safe, reliable, and adaptable for human-robot inter-

action;8,9 (iii) be highly dexterous with a large reachable

workspace; and finally (iv) autonomously reach difficult

and sensitive regions of the body (refer to Figure 1).

Soft robotic manipulators

In order to address the above limitations and increase the

user autonomy at home, a suitable approach is to exploit

principles and technologies from the rapidly growing

research field of Soft Robotics10 for developing a manipu-

lator as a shower arm in the bathing task. Soft robotics refers

to a new generation of flexible robots that take inspiration

from muscular hydrostats such as octopus tentacles, elephant

trunks, etc., in the use of soft materials to achieve advanced

manipulation capabilities while being safe to interact

with.11–13 These natural manipulators have inspired the

development of novel soft robotic manipulators for obtain-

ing a desired gamut of motion. These manipulators comprise

of actuators elements that can be arranged into the single

body or repeated into single identical modules. The advan-

tages of the modular approach to soft robots, introduced by

Onal and Rus14 for the first time, demonstrates how actua-

tion units can be composed in various, possibly redundant

arrangements to achieve exceedingly complex motions. Tak-

ing advantage of this achievement, a mechanical modular

manipulator can be designed by concatenating modular

blocks, which, in turn, are an assembly of soft actuators that

can be intrinsic (actuators are located within the body thus

forming part of the animated mechanism), extrinsic (the

actuator is remote and motion is transferred into the mechan-

ism via a mechanical linkage), or hybrid (combination of

intrinsic and extrinsic actuation).15

We have considered that the technical requirements for the

design of a soft manipulator are defined by the workspace that

it should cover, which includes a volumetric space around and

close to the users’ body as shown in Figure 1. This will be

translated into a mechanical modular manipulator by conca-

tenating three modular blocks such that (i) the proximal mod-

ule is a structural module based on radially arranged cables

that are used to compensate for gravitational effects and (ii)

the centre and the distal modules on hybrid actuation are

identical functional modules. The success of the manipulator

is highly dependent upon the functional capabilities and

working of the hybrid actuated module which is a nontrivial

challenge. Consequently, the focus of this work lies in the

development of a reliable and controllable distal module.

Challenges in design

The challenge of developing manipulators that rely on the

combination of fluidic actuators and cables i.e. hybrid actua-

tion has been largely faced in the surgical robotics field open-

ing the door to other applications.16 Good examples of soft

manipulators based on tendons pulled in relation to pressur-

ized chambers are: the KSI tentacle manipulator powered by a

hybrid system of pneumatic bellows and electric motors,17 the

continuum robot proposed by Pritts19 uses 6-8 contracting/

extending McKibben actuators for providing two-axis bend-

ing, the Air-Octor manipulator that uses three cables and a

single chamber,20 and the OCTARM continuum robot21 that

is able to perform adaptive manipulation in challenging envir-

onments by using air muscle actuators. These examples have

been integrated into a more comprehensive literature analy-

sis, as reported in Table 1, that provides a chronological order

of some examples for bioinspired manipulators that cover all

three categories focusing on the actuation strategy that

enables specific functionalities (i.e. elongation/contraction,

bending and variable stiffness). This table also indicates the

major achievements in Soft Robotics concerning the design of

Figure 1. General overview of the bathing task. The softmanipulator approaches the back region of the user, one of themost critical and difficult regions to reach. The lower limbs havealso been identified as difficult task. An additional soft manipulatorcould be mounted on the wall for the lower limbs.

2 International Journal of Advanced Robotic Systems

Tab

le1.

Asu

mm

ary

ofth

edes

ign

and

contr

olofso

me

exam

ple

sofso

ftro

botic

man

ipula

tors

.

Man

ipula

tor

Public

atio

n

Tim

elin

eA

pplic

atio

nD

esig

n

Act

uat

ion

Funct

ional

itie

sC

ontr

ol

Typ

eA

ctuat

ors

KSI

tenta

cle1

71995

Endosc

opy

Agr

iculture

Nucl

ear

Modula

rH

ybri

dBel

low

-shap

edpneu

mat

icbla

dder

Rad

ially

arra

nge

dca

ble

s

Var

iable

stiff

nes

s

Short

enin

g

Elo

nga

tion

Om

nid

irec

tional

ben

din

g

Model

bas

ed

Act

ive

Hose

18

2001

Sear

chan

dR

escu

eM

odula

rIn

trin

sic

Wounded

tube

actu

ator

Om

nid

irec

tional

ben

din

g—

Art

ifici

alm

usc

leco

ntinuum

robot1

9

2004

NSS

*M

odula

rIn

trin

sic

(Exte

nso

rþ

contr

acting)

pneu

mat

ic

Cham

ber

s

Om

nid

irec

tional

ben

din

g—

Air

-Oct

or2

02005

Sear

chan

dre

scue

Modula

rH

ybri

dSi

ngl

epneu

mat

icch

amber

Rad

ially

arra

nge

dca

ble

s

Var

iable

stiff

nes

s

Short

enin

g

Elo

nga

tion

Om

nid

irec

tional

ben

din

g

—

Oct

Arm

21

2005–2008

Indust

rial

Spac

e

Mili

tary

Modula

rIn

trin

sic

Rad

ially

arra

nged

(Ext

enso

rþ

Cont

ract

ing)

pneu

mat

ics

cham

bers

Short

enin

g

Elo

nga

tion

Om

nid

irec

tional

ben

din

g

Model

bas

ed22

Hyb

rid

23

Oct

opus2

42010–2016

Mar

ine

Singl

ebody

Extr

insi

cC

able

Elo

nga

tion

Ben

din

g

Model

bas

ed

Lear

nin

gbas

ed

BH

A25

2011–2016

Indust

rial

Modula

rIn

trin

sic

Rad

ially

arra

nge

dbel

low

-shap

edpneu

mat

ics

cham

ber

s

Elo

nga

tion

Om

nid

irec

tional

ben

din

g

Lear

nin

gbas

ed26

Hig

hly

articu

late

man

ipula

tor

enab

led

by

jam

min

gm

edia

27

2012

NSS

*M

odula

rH

ybri

dG

ranula

rja

mm

ing

Rad

ially

arra

nge

dca

ble

s

Spri

ngs

Var

iable

stiff

nes

s

Short

enin

g

Om

nid

irec

tional

ben

din

g

—

Shri

nka

ble

,st

iffnes

s-co

ntr

olla

ble

soft

man

ipula

tor2

8

2014

NSS

*M

odula

rH

ybri

dLa

tex

cham

ber

Rad

ially

arra

nge

dca

ble

s

Bra

ided

fabri

c

Var

iable

stiff

nes

s

Short

enin

g

Elo

nga

tion

Om

nid

irec

tional

ben

din

g

—

STIIFF

–FL

OP

29,3

02014–2016

Surg

ery

Modula

rIn

trin

sic

Rad

ially

arra

nge

dpneu

mat

ics

cham

ber

s

Gra

nula

rja

mm

ing

Var

iable

stiff

nes

s

Elo

nga

tion

Om

nid

irec

tional

ben

din

g

—

Hyb

rid

Rad

ially

arra

nge

dpneu

mat

ics

Rad

ially

arra

nge

dca

ble

s

Var

iable

stiff

nes

s

Short

enin

g

Elo

nga

tion

Om

nid

irec

tional

ben

din

g

—

Multi-se

gmen

tso

ftflu

idic

actu

ator3

12015

NSS

*M

odula

rIn

trin

sic

Anis

otr

opic

pneu

mat

icch

amber

sO

mnid

irec

tional

ben

din

gM

odel

-bas

ed

*NSS

:not

spec

ifica

llyst

ated

.**

Sim

ula

tion.

manipulators based on variable stiffness mechanisms. More-

over the Application field of Table 1 highlights how none of

these is intended for assistive robotics. It is clear how our

proposal of using Soft Robotics methodologies and mechan-

isms for building the first assistive soft robot is challenging

and completely new. In order to develop a manipulator as a

shower arm in the bathing task, we investigate an actuation

strategy that enables all the mentioned specific functionalities

(i.e. elongation/contraction, bending and variable stiffness).

Challenges in control

Soft robots often lack an analytical model due to their nonlinear

material properties and high-dimensional articulated structure.

This makes predictable and controlled movements with posi-

tional accuracy a significantly challenging task. Consequently,

one of the open research questions is to achieve task space

control that allows these robotic systems to accurately follow

a trajectory, that is, point-to-point motion control. This is

essential for a bathing task where close contact is required

between the manipulator and the human to perform scrubbing

and cleaning. The objective of point-to-point motion control is

to find the correct combination of input actuation at each dis-

crete sample, that is, inverse kinematics/kinetics. Immega and

Antonelli17 proposed a cascaded spline arc approach that con-

trols the cables of the KSI tentacle that roughly positions it in 3-

D space, which is further improved through a closed-loop

vision feedback. Giorelli et al.24 used a Jacobian based

approach to reach an average tip accuracy of 6% of the total

manipulator length. Marchese et al.31 applied a closed-loop

controller on a 3D soft-arm to position the end effector to reach

a ball with a diameter of 0.04m. These traditional model-based

methods are limited in accuracy and computational costs by

modelling assumptions that need to be improved for practical

application of soft manipulators. Learning based control23,24,26

is a promising approach to automate low-level sensorimotor

skills. The underlying principle is to allow the manipulator to

autonomously explore its environment and correlate sensory

and motor spaces. This work investigates the application of

these methods for the previously unaddressed challenge of the

control of hybrid actuated systems.

The paper is organized as follows: Section ‘Design’

presents the design, fabrication, and characterization of

the single distal module. Section ‘Control Framework’

presents the inverse kinematics control approach that is

incorporated as a low-level module in the hierarchical

controller presented in Section ‘Hierarchical Controller’

Experimental methodology and results in point-to-point

motion control is reported in Section ‘Experiments’ fol-

lowed by a comprehensive discussion of the main out-

comes of the proposed work is stated in the last section

of the paper.

Design

The overall design of the distal segment relies on the total length

which should be able to reach the target workspace which

includes back and/or lower limbs (refer to Figure 1). On the one

hand, a McKibben-based actuation maximizes the reachable

length due to the elongation performances of the fluidic actua-

tors, thus, reducing the number of modules required to achieve

the desired length. However, it is unstable due to gravitational

Figure 2. Flowchart showing the fabrication process for the single McKibben actuator.


effects. On the other hand, the use of cables enables multiple

functionalitiesof shortening,omnidirectionalbending,andcom-

pensation for gravity effects. However, it requires a higher num-

ber of modules for covering the same length. We propose to

embed both technologies within the distal module in order to

complement the limitations of each other while increasing

the overall functionality of the complete module. Thus, the

distal module is hybrid actuated. The innovative element in

its mechanical design is represented by the flexible fluidic

actuators that are McKibben based but with bellow-shaped

surface, discussed subsequently.

Design and manufacturing of the distal module

Three McKibben actuators and three cables are alternately dis-

placed at an angle of 60� along a circle with a radius of 3 cm. A

central hollow chamber of 1.4 cm diameter has been inserted to

facilitate the flow of water and soap. The total length of the

module is 20.5 cm, while the total weight is 180 g (refer

to Figure 2). The cables and chambers are decoupled, meaning

that each one has a dedicated activation line for tension and

pressure regulation, respectively.

Design of the McKibben actuator. The manufacturing process

for the McKibben actuator32–34 comprises of the following

steps: (i) forming the external chamber: A braided sheath (Pro

Power PETBK3B10, Farnell Components) with a length of

60 cm and internal nominal diameter of 0.3 cm is expanded

and inserted on a metallic cylinder with a diameter of 0.8 cm.

A mechanical deformation is applied in the inferior/superior

direction to introduce a fiber deformation until a cylindrical

bellow structure is created. The final shape is ‘‘memorized’’

by applying a cyclic uniform heat along the structure after

which the actuator has an outer diameter of 1 cm and a total

length of 19.5 cm (refer to Figure 2). A recommended tem-

perature is around 350�C for 3 min. (ii) Combining the inter-

nal and external chambers: A balloon (latex) is inserted into

Figure 3. CAD design of the single module; on the top right asection view of the module with the three McKibben-basedactuators (identified by A1–A2–A3), the three cables (identifiedby B1–B2–B3), and the internal channel for the provision of water.

Figure 4. Experimental setup for the single modulecharacterization.

Figure 5. Elongation–contraction module characterization. Theblue marker represents the starting configuration without activeelements.

Ansari et al. 5

the braided sheath. It is delimited at both ends by endcaps

which provide an air-source from one end and an anchoring

point to the lateral surface from the other. The bellow-shaped

external chamber facilitates a radial containment effect of the

internal elastic chamber, thus, improving the elongation per-

formances of the combined structure. (iii) Structural support:

The entire module is completed by adding a custom-made

helicoidally shaped structure along its length (Figure 3). It

houses the three cables, the three chambers, and the internal

channel, thus, providing a constraint for the structural ele-

ments and avoiding lateral expansions of the actuators. The

last part of the module consists of the lateral interfaces (end

plates) for confining the module.

Kinetic characterization of the distal module

The kinetic characterization refers to the evaluation of

the complete behavior of the system that includes

contraction/elongation, bending/stiffness, and the com-

plete reachable workspace, as discussed in the subse-

quent subsections.

Experimental setup. The experimental set-up for the character-

ization of the distal module consists of: (1) a rigid rectangular

frame made of 0.8 cm acrylic sheet that orients the module

vertically downward; it is custom designed to encompass the

actuation systems; (2) Pneumatic Set-up: three proportional

pressure-controlled electronic valves (K8P Series by EVP Sys-

tems, Italy, Input: 0 - 10 V, Output: 0 -3 bar), one filter (EVP

Systems, Italy: MC-104FB0); one manometer (EVP Systems,

Italy: M043-p12 0-12bar); one stand-alone air compressor; (3)

Cable Set-up: three Hs-645 Hitech Servomotors; (4) A six-

DOF electromagnetic sensing probe (Aurora® Tracking by

Northern Digital Inc., Canada) has been fixed at the tip of the

module base which is capable of sub-millimetre position track-

ing. This is a visualization tool that provides a log of tracking

movements at the end of an experiment (Figure 4).

Elongation/contraction. The linear motion of the module was

captured by moving it from a completely contracted state to a

completely extended state. The contracting state corresponds

to the cables pulled by the servomotors without the chambers

being inflated. All the servomotors are then simultaneously

rotated through 180� in steps of 20�. The position obtained

Figure 6. Activation pattern with a single cable (B3 in Figure 3) up to the maximum position and activation of the opposite chamber (A1in Figure 3). (a) Global bending movement of the element; (b) xz view; (c) yz view. Arrows with different colors show the orientation ofthe module: x, y, z directions in green, blue, and red, respectively. Arrows with different colors show the orientation of the module: x, y,z directions in green, blue, and red, respectively.


after the completion of this rotation corresponds to the neutral

point of the manipulator. Next, the pneumatic chambers are

inflated from 0.5 to 1 bar. As explained in,35 we start from 0.5

bar as it is the inferior pressure limit of a single actuator i.e.,

below this limit the actuator does not detect any global elon-

gation movement. Figure 5 illustrates the displacement of the

module along the z-axis from the contracted state to the

extended state which is found to be 5 cm. With respect to the

neutral position, this implies that there is a contraction of 3 cm

and elongation of 2 cm.

Bending and stiffening. In order to perform the bending tests,

we evaluated two different activation patterns: (i) bending

due to a single cable and single chamber: a single randomly

chosen cable of the system is contracted from an untensioned

state into a tensioned state by rotating the servomotor from

the neutral position through 180� in incremental steps of 20�.When the maximum tensioned state is achieved, the diame-

trically opposite pneumatic chamber is inflated by applying

a pressure from 0.5 to 1 bar, with an incremental step of 0.1

bar. Referring to Figure 3, this activation pattern is achieved

when A1 collaborates with B3, A2 with B1, or A3 with B2.

Figure 6 illustrates the resulting behavior from this activa-

tion pattern. The global bending of the system contributed by

the cable undergoes a displacement of 60 mm along the z-

axis (viewed as the xz-view and yz-view in Figure 6). A

stiffness variation can be observed when pneumatic actua-

tion begins which is due to an antagonistic effect. (ii) Bend-

ing due to two cables and two chambers: two adjacent cables

of the system are contracted from an untensioned state into a

tensioned state by rotating the corresponding servomotors

simultaneously through 180� in incremental steps of 20�.Referring to Figure 3, this activation pattern can be achieved

when first B1–B2 or B2–B3 or B3–B1 are activated. When

the maximum tensioned state is achieved, the diametrically

opposite pneumatic chambers corresponding to each servo-

motor, that is, A2–A3, A3–A1, or A1–A2, respectively, are

inflated similar to the procedure explained previously.

Figure 7. Activation pattern with two cables (B1–B2 in Figure 3) up to the maximum position and activation of the two oppositechambers (A2–A3 in Figure 3). (a) Global bending movement of the element. (b) xz view. (c) yz view. Arrows with different colors showthe orientation of the module: x, y, z directions in green, blue, and red, respectively. Arrows with different colors show the orientationof the module: x, y, z directions in green, blue, and red, respectively.

Ansari et al. 7

Fig

ure

8.(

a)W

ork

spac

eev

aluat

ion

for

the

singl

em

odule

.The

blu

epoin

tre

pre

sents

the

initia

lposi

tion

oft

he

module

inan

unac

tiva

ted

stat

e.(b

)W

ork

spac

evi

ewfr

om

the

X–Y

pla

ne.

(c)

Work

spac

evi

ewfr

om

the

X–Z

pla

ne.

(d)

Work

spac

evi

ewfr

om

the

Y–Z

pla

ne.

8

Figure 7 depicts resulting behavior from this activation pat-

tern. The global bending of the system contributed by the

cables undergoes a displacement of 5 cm along the z-axis.

The reduced value in comparison to the first activation pat-

tern is due to the double-cable activation that pulls the cables

with an increased force. It can also be concluded from these

figures that the chambers contribute to adapt the tip orienta-

tion of the module. The hybrid actuation is promising

because of the stiffness variation for applying restrained

forces in the interaction with the user.

Reachable workspace. The workspace measurement (see

Figure 8) aims to measure all the reachable positions of the

module tip. A total of 8000 points were recorded which

resulted from a permutation of all six inputs applied using

the following values: (i) servomotors: 0�, 45�, 90�, 135�,180�; (ii) pneumatic pressure: 0 bar, 0.5 bar, 0.8 bar, and 1 bar.

Figure 8(a) depicts the isometric view of the resulting

point cloud, highlighting the workspace in the range of 21.5

cm � 17.2 cm � 10 cm in the x–y–z axes, respectively. The

blue marker corresponds to the starting configuration of the

end effector. The overall shape of the workspace can be cate-

gorized as a volumetric convex with the lower limits defined by

the maximum bending capabilities (approximately 90�) of the

module, as expected. The asymmetry of the shape can be cred-

ited to the manufacturing procedure i.e. the irregularities that

are introduced during the assembly phase which are not easily

identifiable when the module is at rest. For a detailed analysis,

the workspace is also presented from a 2D viewpoint from all

three planes. Figure 8b shows the top view of the workspace

from the X-Y plane. The shape is circular, as theoretically

expected. Figure 8c indicates an asymmetric response of the

system from the X-Z planar viewpoint, whereas a symmetric

response of the system from the Y-Z planar viewpoint is

depicted in Figure 8d. Data points are more pronounced toward

the left portion in the former graph as a result of a tightened

cable attachment to the respective servo arm.

Control framework

Learning based controllers36,37 offer a promising solution to

address the control challenge. In particular, we take inspiration

from a previous work38 where each actuator is considered an

autonomous agent that resides within and shares an environ-

ment forming a distributed multi-agent system (MAS).39 The

underlying objective is to enable the agents to coordinate their

actions to learn a joint optimum behaviour. Reinforcement

Learning (RL)40 is a very attractive online optimization tech-

nique in milieu of automating a solution for MAS as it assists in

(i) model-free learning and (ii) online learning. The former is

beneficial to learn optimal behaviour in unpredictable environ-

ments where the dynamics are not known a-priori, whereas, the

latter is particularly useful to learn behaviour without specify-

ing the underlying actuation technology of the system. This,

however, requires that the agents learn in an independent man-

ner41 similar to the Iterated Prisoner’s Dilemma.42

Reinforcement learning preliminaries

RL is a sequential decision-making tool where an agent

iteratively interacts with an environment modeled as a

Markov decision process (MDP). An MDP is a 4-tuple

<S, A, T, R>, where S is the state space, A is the action

space, T is the environment dynamics given by the condi-

tional probability Pr fs0 | s, ag that taking action a in state s

will lead to new state s0, and R is the reward received when

an action a in state s leads to a new state s0. An agent acts on

the environment by selecting actions according to a policy �,

that is, a mapping from states to actions. Starting from an

initial state, the agent iteratively interacts with the MDP

according to the policy to generate a trajectory which

refers to a sequence of state-action-reward tuples

< s0 ! a0 ! r1 ! s1 ! a1 ! r2 ! s2 ! a2; . . . >.

The return over an infinite horizon trajectory refers to the

cumulative discounted reward as Rt ¼P1

k¼0 gtrkþtþ1,

where g 2 ½0; 1Þ, and r is the reward received at each

time-step. This can also be extended to a finite horizon set-

ting for episodic tasks where g ¼ 1. In the context of con-

trol, an action-value function (the utility is described using

the action-value function as opposed to the more commonly

used value-function V�ðsÞ ¼ E ½Rt jst ¼ s� as a matter of

convenience to the underlying work) Q is widely employed

to evaluate the utility of executing an action a in a state s and

following a given policy � thereafter, given as43

Qðs; aÞ ¼ E

(X�k¼0

gtrkþtþ1jst ¼ s; at ¼ a; �

)(1)

where � is the terminating condition, s is the state of the

system at time t, and a is the action selected in state s at

time t in given policy �. This work considers trajectories that

finish within a finite horizon; hence, � represents when either

the goal sg is reached or a maximum number of action trials

are attempted in an episode. In optimal control problems, the

goal is to find an optimal control policy�� that maximizes the

expected cumulative discounted return regardless of the ini-

tial state. The focus of this work is to derive �� via policy

iteration (PI),44 one of the two main classes of dynamic pro-

graming to solve MDPs.

Formally, PI is a subroutine that runs trajectories iteratively

for a given duration, wherein it alternates the processes of

policy evaluation and policy improvement as follows: Starting

from any initial position at time-step t, an action in the current

state is chosen according to a policy �t and is evaluated by Qt.

In the next time-step t þ 1, an improved policy �tþ1 is gener-

ated by acting greedily (exploitation) with respect to Q�t. In

this framework, the policy is also referred to as the actor and the

Q-function the critic. The goal of PI is to learn an optimal Q-

function Q* such that it satisfies the condition

Q�ðs; aÞ > Q�ðs; aÞ 8 s 2 S ; a 2 A; 8 � (2)

Consequently, an optimal policy �* can be automatically

derived from Q* as

Ansari et al. 9

��ðsÞ ¼ argmaxa Q�ðs; aÞ (3)

For a finite-dimensional MDP, the Bellman equation is

employed to analytically optimize the Q-function. However,

for most real-world applications, including the problem-at-

hand, prior knowledge of environment dynamics is rendered

unknown due to stochasticity. The most powerful approach

in this scenario is temporal difference (TD)-based PI which

is model free and learns incrementally from the data.40 There

are two classical TD learning algorithms applied in the con-

text of control, that is, Q-learning (off-policy TD learning)

and SARSA (on-policy TD learning), where the focus of this

work lies on the latter approach.

In SARSA, an arbitrarily initialized Q-function is incre-

mentally updated with a tuple < st; at; rtþ1; stþ1; atþ1 >as follows

Qtþ1ðst; atÞ ¼ Qtðst; atÞþ � � ½rtþ1 þ gQtðstþ1; atþ1Þ � Qtðst; atÞ�

(4)

where � 2 ½0; 1Þ is the learning-rate/step-size parameter;

the term between square brackets is the TD error between

the current estimate Qtðst ; atÞ and the updated estimate

rtþ1 þ gQtðstþ1 ; atþ1Þ. Equation (4) highlights that every

update aims to estimate the Q-function of the policy being

followed, that is, on-policy TD learning.

This learning process can be further sped-up by

incorporating eligibility traces e 2 R which update Q

not only by the current tuple but rather with all visited

state-action pairs that lead to this tuple as it is the causal

result of an entire trajectory. This variant of SARSA,

more commonly known as SARSA(l) with accumulat-

ing eligibility traces,45 has memory parameters associ-

ated with each state-action pair that are initialized to 0

at the beginning of each trajectory and updated as

et ðs; aÞ ¼�

glet�1ðs; aÞ þ 1 if s ¼ st ; a ¼ at

glet�1ðs; aÞ otherwise(5)

where l 2 ½0; 1Þ is the accumulating trace-decay error that

controls the degree of how fast a trace is decreased to zero.

Then, equation (3) is modified to update Q as follows

Qtþ1ðst; atÞ ¼Qtðst; atÞ þ � � ½rtþ1 þ gQtðstþ1; atþ1Þ� Qtðst; atÞ� � etðs; aÞ (6)

Convergence to Q* is dependent upon two factors: (i)

Robbins-Monro criteria:46 the learning rate � should be opti-

mized in the range � 2 ½0; 1� to satisfyP1

i¼0 ai ¼ 1 andP1i¼0 ai

2 <1; (ii) exploration:47 to ensure that every state-

action pair is visited an infinite number of times. The former

criteria can be achieved through a manual selection process

whereas the latter requires that the updates occur by following

a stochastic policy (e.g. ] –greedy, softmax, etc.) that will

explore the environment by selecting random actions with

certain probability measure. Additionally for SARSA, this

exploration should be reduced to 0 as t !1. Under these

conditions, an optimal memoryless stationary policy, that is,

�*¼ � 8 t can be extracted from the Q-function to reach the

target according to equation (3). PI is motivating due to its

ability to reach optimality with asymptotic performance.

However, fulfilling criteria (ii) is nontrivial for a con-

tinuous domain environment, necessitating techniques to

approximate the Q-function, as discussed subsequently.

Parametric linear function approximation

For continuous domains, the Q-function can be estimated

either with linear/non-linear function approximation. The-

oretical stability and convergence properties favor linear

approximators over non-linear approximators.48

The choice opted for the parametric linear function

approximator is tile coding, which partitions the state space

into multiple layers called tilings in the spherical coordinate

system such that R¼ [0 max(R;)]; (ii) ϕ¼[0� 360�]; and (iii)

� ¼ [max(�) 90�]; where max(R) refers to the length of the

manipulator in a fully extended state; ϕ refers to the azimuth

which for omnidirectional bending will always be 360�; and

� refers the contraction capability of the manipulator mea-

sured from the z-axis. The complete feature space is then

represented by rectangular shaped tilings (refer to Figure 9)

and is automatically restricted to the reachable workspace of

the manipulator. Each tiling is divided into elemental blocks

called a tile that allows for local generalization of distances

close in Euclidean space. The resolution is different in each

dimension of the state-space, given mathematically as

’i¼li

N(7)

where i ¼ 1, . . . , n and n is the total number of dimensions

of the state-space; ’ represents the resolution of a tile in a

Figure 9. 3-D Cartesian plane is abstracted into 2-layeredrectangular tilings in R, ϕ, and � dimensions. There are 7 tiles,8 tiles, and 1 tile with a width of wR, wϕ, and w� in each dimension,respectively. The origin of the tilings are offset with respect toeach other. The position of the soft manipulator end effectoractivates one tile per tiling.


given dimension; l represents the width of the tile in given

dimension; N represents the total number of tilings. The role

of resolution is crucial on the learning performance of the

algorithm: A larger resolution implies slower learning but more

optimal solutions, whereas smaller tiles imply faster learning

but less optimal solutions. The selection of the resolution is

mostly done through a manual trial and error process. Tiles act

as receptive fields, that is, only one tile per tiling is activated if

and only if the given state falls in the region delineated by that

tile. The origin of each tiling is offset to avoid singularities at

the boundaries of the tiles. Q-function is then simply approxi-

mated by a sum of the indexes of these activated tiles as

Q ðs; a Þ ¼Xk

j ¼ 1�

jðs Þwj (8)

where j ¼ 1 . . . k, where k is the total number of tilings;

�: S x A ! Rk is a vector of binary basis functions

½�1ðs; aÞ; . . . ; �kðs; aÞ�T ; w 2 Rk is the parameter vector.

Equation (8) shows that the computational complexity of

tile coding is linearly dependent on the number of til-

ings. Parameter tuning for tile coding is done offline and

kept constant throughout the experimental work.

SARSA(l) can be combined with linear approximation

by using gradient-based updates which will change the

parameter vector according to,

wtþ1 ¼wt þ �t � ½rtþ1 þ g�ðstþ1; atþ1ÞT wt � �ðst; atÞT wt�

� @

@wt

Qðst; atÞ � et

(9)

Reward structure and policy learning

In order to facilitate learning, we guide the policies through

the technique of reward shaping such that the MAS is

motivated to select actions that progressively lead towards

the goal. (Note: In this work, the optimal policy is learnt

from a single starting state towards the goal). This is

implemented in an information theoretic manner by

defining the reward using a distance measure from the

goal as follows: A bounded monotonically increasing

reward function is implemented representing unequally

spaced concentric spheres centered on the target, where

the number of spheres in this work are selected randomly

but once created, is applied in a similar manner for any

target. The motivation behind this reward structure is to

allot the distribution of information throughout the

abstracted state-space such that motor commands that

result in positions with distances further from the goal

will be updated with a higher negative value in compar-

ison to targets closer to the goal. Even though this pro-

vides efficient distribution of information, it would still

suffer from the curse of dimensionality49 as the policy

space increases exponentially with the number of agents.

Thus, we modularize the learning process that is the pol-

icy learns to reach the goal by learning by moving

through the state-space in the direction of greater reward.

This is achieved by taking advantage of our reward struc-

ture such that once it encounters an action that is

rewarded with a scalar value higher than previously

encountered ones, it will be the first action taken by the

system in the next episode onwards.

Hierarchical controller

A schematic overview of the motion controller is depicted

in Figure 10. It has a hierarchical architecture such that at

the higher level, a trajectory is generated within the reach-

able workspace of the manipulator using the spherical

Figure 10. Schematic view of the hierarchical control architecture for a point-to-point motion controller for soft robotic modules.

Ansari et al. 11

coordinates. It is sampled at n points ½s0; s1; . . . ; sn� and

follows them in a sequential order. In structured environ-

ments, this could be used to generate circular, curved, and

so on, trajectories offline. However, in this work, we are

motivated by the unpredictability of the showering envi-

ronment and, hence, aim to test the movement of the mod-

ule through an asymmetrical trajectory which essentially

implies that the task space is sampled at random points in

the state-space in a sequential order with the aim of the

manipulator to reach these points in the same order. These

samples are then fed sequentially (si) to the low-level mod-

ule where it generates a solution to reach the given Carte-

sian spatial target position using the control framework

discussed in section ‘‘Control framework’’. Once gener-

ated, the next sample is requested.

Experiments

The implementation of the tile coding software was

achieved through typically available software. Before

using it, all the necessary parameters required by the

algorithm are set as follows (refer to Table 2): (i) reach-

able workspace: It is defined in spherical coordinates in

consistence with the characterization of the module

shown in Figure 8, that is, R ¼ [10 cm 30 cm], ϕ ¼[0� 360�], � ¼ [0� 90�]. The upper range of the radius

was overextended to compensate for the unseen scenar-

ios, for example, the tear of a cable resulting in a larger

range for the workspace. (ii) Resolution: 10, 12, and 10

tiles were initialized in the R, ϕ, and � dimensions with

a total of 4 tilings offset with respect to one another

such that the state space has a total of 4800 tiles. (iii)

Learning parameters: The step-size (�) and exploration

(]) are set to 0.16 and 0.05, standard values found in

text. The eligibility traces (7) are set to 0.9. (iv) Accu-

racy: The radius of the target circle was kept at a ran-

domly chosen value of .99 cm. This indicates the

maximum accuracy that is acceptable as a solution, the

minimum being 0. (v) Action-set: The module is allowed

to choose actions from a heuristically initialized discrete

Table 2. A summary of applied parameters for the proximal module.

Module length 20.5 cm

Reachable workspace R 10–30 cm nR 10 wR 0.02ϕ 0�–360� nϕ 12 wϕ 0.52� 0�–45.8� n� 10 w� 0.08

State-space parameters T 4� 0.16] 0.1l 0.9! 1

Action set Pneumatic chambers [�0.087 �0.0349 �0.0175 0.0175 0.0349 0.087] (bar)Servomotors [�25 �20 �10 �5 0 5 10 20 25]�

Figure 11. Experimental setup.


action-set that can simultaneously decrease/increase/keep

unchanged the length of the cables and pneumatics. The

length of the cable is controlled by a servomotor rotation

range of [0�, 180�], whereas the length of the chamber is

controlled by pressure-regulating valves in the range of

[0,1] bar. The cardinality of the action-set implies that the

module has to able to learn the required motor command

from a total of 96 ¼ 531,441 input action combinations.

(vi) Algorithmic settings: The algorithm runs online for a

total of 100 episodes with 50 trials per episode and is

evaluated on two criteria: (a) mean reaching error and

(b) trials needed for convergence.

Experimental setup

The experimental setup is similar to the one mentioned in

section ‘‘Kinetic characterization of the distal module’’;

however, an OptiTrack Motion Capture V120: Trio vision

feedback system was employed to provide real-time 3-D

positioning data. There are seven retroreflective markers

attached to the end effector in a random manner such that

four markers need to be detected for the 3-D reconstruction

of the tip position. The complete setup with the initial

position of the robotic module is shown in Figure 11. The

system represents a mapping from an R6 motor space to an

R3 Cartesian space.

Table 3. A summary of the obtained results for point-to-point motion control of the module through a sampled asymmetrictrajectory.a

No. of goals reached 12Desired accuracy 0.99 cmAverage reaching accuracy 0.79 cm + 0.18 cm

Sample number 1 2 3 4 5 6 7 8 9 10 11 12

Reaching accuracy (cm) .93 .59 .63 .93 .81 .79 .94 .66 .88 .98 .4 .93Actions taken per point before convergence 5394 4396 16 1020 787 410 30 423 27 8 230 661

aNote: The data from this table are color coordinated with the data provided in Figure 13.

Figure 12. The distal module moving point-to-point through an asymmetric 3-D trajectory generated offline and sampled at 12 points.Each frame corresponds to the learnt kinematic solution of the manipulator for a given sample. The green line depicts the trajectoryfollowed by the manipulator from sample to sample.

Ansari et al. 13

Results

We generated 12 points in an uneven shaped curve starting

from the top right end with respect to the module in the

resting position moving downward toward the resting posi-

tion. The higher-level controller then fed each sample

sequentially to the low-level controller.

Reaching error. Figure 12 depicts the kinematic solution

learnt by the manipulator for each sample. The algorithm

was found to converge for all 12 points with an average

reaching accuracy of 0.79 cm (+ 0.18) as summarized in

Table 2. It is worthy to mention that this result for accuracy

opens the possibility to apply this module to reach sensitive

areas for providing future services of scrubbing and rinsing.

Furthermore, the possibility to control both pneumatic and

cable actuators simultaneously implies that the control can

inherently produce stiffness, which otherwise require

tedious kinematic and force mappings. Thus, it facilitates

a model-free form of handling internal/external loads in an

efficient manner. The authors would also like to mention

that this accuracy can be improved by reducing the target

region size, provided that the system can also physically

reach that desired accuracy. The green line depicts the

trajectory the module moves through to go from one point

to another. The trajectory from the resting position to the

first target point is not shown as it obstructs the view from

the other targets.

Convergence time. For a given sample, the algorithm allows

the module to test a total of 50,000 actions (i.e. 100 epi-

sodes � 50 trials per episode); however, Table 3 demon-

strates that the algorithm is able to generate solutions for all

the given samples within a couple of hundred of actions

apart from targets 1 and 2, which will be discussed in the

next paragraph. This is a direct consequence of the reward

structure mentioned in the previous section. In a state

where the manipulator has experienced actions with both

lower and higher scalar reward, it will prefer actions with

higher reward. This implies high data efficiency but also

the ability to quickly learn optimal solutions high in accu-

racy. However, as a future research goal, the authors intend

to use the hierarchical control in conjunction with a learnt

forward model. A forward mapping is a causal unique rela-

tion between the motor and task space, and thus, are accu-

rate to learn. The inverse kinematic solutions can then be

learnt without executing real-time trajectories, saving both

time and hardware resources.

Targets 1 and 2 from Table 3 show that a significant

amount of actions were required to achieve convergence. This

is because the actions taken by the module to reach these two

targets positioned the retroreflective markers in such a way

that it was challenging to reconstruct the 3-D position for the

Opti-track system. As mentioned previously, an occlusion for

more than three markers at various places in the state space

returns no information to the algorithm which implies that

(0,0,0) is returned to the algorithm. This state has been given a

reward of�1000. The red plots in Figure 13(b) illustrate the

reward accumulation for these two targets, which is signifi-

cantly much higher as compared to the other samples. Despite

this challenge, the algorithm was able to learn a solution as

long as the target point was not occluded.

The quality of the learnt policies. As mentioned in section

‘‘Parametric linear function approximation’’, the worst per-

formance of this algorithm will be generating a suboptimal

policy. In this practical setup, this was found to be true. The

algorithm is tested from various starting positions; how-

ever, in majority of the cases, the solution was generated

from the positions that are closer to the goal rather than

further from the goal. One reason for this could be that the

resolution of the state-space is not optimized, that is, it

should be increased. However, this will result in an expo-

nential increase in computational time, whereas the policy

currently learnt very well meets our requirements. Conse-

quently, the authors believe that such an approach should

not be precluded from practical applications.

Figure 13 depicts the reward accumulation for the all

samples. The same trend is exhibited for all experiments

which is that the reward accumulation will increase until

convergence where it will receive a reward of 0, hence

becoming constant.

Discussion and conclusions

Soft robotics has already demonstrated promising results to

revolutionize key sectors such as industry and surgery pri-

marily because this technology guarantees safe human–

robot interaction. This article aims to apply the powerful

properties of soft robotic systems to design the first system

to assist the elderly community in the task of bathing. We

proposed an overall modular system; however, in order to

Figure 13. The reward accumulation for all samples of theasymmetric trajectory.


reach this ambitious goal, we first needed to design, fabri-

cate, and test the key component module that will perform

the main bathing activities, that is the distal module which

will come in direct contact with the human.

For the design, a hybrid actuated system was proposed.

It comprises of radially arranged bellow-shaped pneumatic

chambers and cables. The kinetic characterization of the

module pointed out the following functional capabilities

of the robotic platform: The novel bellow shape of the

fluidic actuators contributes toward elongation and con-

traction, whereas the fluidic chambers contribute to adapt

the tip orientation. The overall advantages provided by the

combination of these actuators are (i) reachability to the

user workspace: It guarantees to cover larger distances with

a reduced number of modules and orient the module with

respect to the target region of the user’s body. This was

observed through the measured reachable workspace

which was 21.5 cm � 17.2 cm � 10 cm in the x–y–z axes,

respectively. Consequently, a single module has the capa-

bility to match the dimensions of the back region of the

human. However, further experimentations are required to

test how well this module can be oriented toward this

region in a modular form. We also acknowledge that these

results are specific to the experimental setup described in

section ‘‘Experimental setup’’. However, this is not con-

sidered a source of error as the overall behavior (apart

from the asymmetries) is in congruence with theoretical

expectations. For future work, a more optimized setup will

be prepared by introducing motors instead of the servo-

motors to facilitate variable adjustment of the module

motion, due to the extrinsic cable-driven actuation. (ii)

Stiffness variation: Hybrid actuation provides an antago-

nistic motion between cables and pneumatic actuators that

results in a stiffening which is particularly important to

compensate for internal/external loading in the bathing

environment.

For the control, a point-to-point motion hierarchical

controller based on a multiagent RL was used to control

the inverse kinetics of a high-dimensional, nonlinear,

hybrid actuated manipulator. We tested the controller to

move the manipulator through an asymmetric trajectory

generated offline sample at 12 discrete points. We found

the algorithm to provide solutions to all points with posi-

tional accuracy of 0.79 cm + 0.18 cm. It is worthy to

mention that this accuracy opens the possibility to apply

this module to reach sensitive areas for providing future

services of scrubbing and rinsing. The accuracy can also

be improved by setting the target region to a lower value, of

course provided that the system can physically reach that

limit. Furthermore, as the positions are being reached with

all six input activations, this implies that the control is able

to exploit the stiffness properties in a model-free manner

while reaching which otherwise is achieved through com-

plex kinematic and force models. The authors also demon-

strated the effectiveness of applying this approach despite

its ability to generate only suboptimal policies. A limitation

of the approach lies in the possible high number of steps

needed to reach a single point. This will be addressed

through the computation of the forward model on which

the proposed method can be applied and then the inverse

solution is computed, this can be executed by the robot.

Future works will focus on this improvement as well as on

the control of the tip orientation which is not considered in

this implementation. Also, it is also demonstrated that

vision-based systems may introduce additional constraints

and hence, it is recommended to find alternative sources

of feedback.

Acknowledgements

The authors would like to acknowledge the support by the

People Programme (Marie Curie Actions) of the European

Union’s Seventh Framework Programme FP7/2007-2013/

under REA grant agreement number 608022; the European

Commission through the I-SUPPORT project (#643666); and

by the Italian Ministry of Foreign Affairs, General Directorate

for the Promotion of the Country System’’, Bilateral and Mul-

tilateral Scientific and Technological Cooperation Unit, for the

support through the Joint Laboratory on Biorobotics Engineer-

ing project.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect

to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, author-

ship, and/or publication of this article.

References

1. Bock T, Georgoulas C and Linner T. Towards robotic assisted

hygienic services: concept for assisting and automating daily

activities in the bathroom. Gerontechnology 2012; 11(2): 362.

2. Brose SW, Weber DJ, Salatin BA, et al. The role of assistive

robotics in the lives of persons with disability. Am J Phys Med

Rehabil 2010; 89(6): 509–521.

3. Katz S, Ford AB, Moskowitz RW, et al. Studies of illness in the

aged: the index of ADL: a standardized measure of biological

and psychosocial function. JAMA 1963; 185(12): 914–919.

4. Miskelly FG. Assistive technology in elderly care. Age Age-

ing 2001; 30(6): 455–458.

5. Zollo L, Wada K and Van der Loos HF. Special issue on

assistive robotics [from the guest editors]. Robot Autom Mag

IEEE 2013; 20(1): 16–19.

6. http://www.seatedshower.com/

7. http://www.drivemedical.co.uk/sections/bathroom-toilet-aids

8. Kirchner EA, Albiez JC, Seeland A, et al. Towards Assistive

Robotics for Home Rehabilitation. Biodevices 2013; 168–177.

9. Meng Q and Lee MH. Design issues for assistive robotics for

the elderly. Adv Eng Inform 2006; 20(2): 171–186.

10. Laschi C, Mazzolai B and Cianchetti M. Soft robotics: tech-

nologies and systems pushing the boundaries of robot abil-

ities. Sci Robot 2016; 1: eaah3690.

Ansari et al. 15

http://www.seatedshower.com/

http://www.drivemedical.co.uk/sections/bathroom-toilet-aids

11. Kier WM and Smith KK. Tongues, tentacles and trunks: the

biomechanics of movement in muscular-hydrostats. Zool J

Linn Soc 1985; 83(4): 307–324.

12. Kim S, Laschi C and Trimmer B. Soft robotics: a bioinspired

evolution in robotics. Trends Biotechnol 2013; 31(5):

287–294.

13. Trivedi D, Rahn CD, Kier WM, et al. Soft robotics: biological

inspiration, state of the art, and future research. Appl Bionics

Biomech 2008; 5(3): 99–117.

14. Onal CD and Rus D. A modular approach to soft robots. In:

Biomedical robotics and biomechatronics (BioRob), 2012 4th

IEEE RAS & EMBS International Conference, 24 June 2012,

pp. 1038–1045. IEEE.

15. Robinson G and Davies JB. Continuum robots-a state of the

art. In: Robotics and Automation, 1999. Proceedings. 1999

IEEE International Conference on 1999, Vol. 4, pp.

2849–2854. IEEE.

16. Loeve A, Breedveld P and Dankelman J. Scopes too flexible

and too stiff. IEEE Pulse 2010; 1(3): 26–41.

17. Immega G and Antonelli K. The KSI tentacle manipulator.

In: robotics and automation, 1995. proceedings., 1995 IEEE

International Conference on, 21 May 1995, Vol. 3, pp.

3149–3154. IEEE.

18. Tsukagoshi H, Kitagawa A, Segawa M, et al. Active hose: an

artificial elephant’s nose with maneuverability for rescue oper-

ation. In: Robotics and automation, 2001. proceedings 2001

ICRA. IEEE international conference on, 2001, Vol. 3. IEEE.

19. Pritts MB and Rahn CD. Design of an artificial muscle con-

tinuum robot. In: Robotics and automation, 2004, Proceed-

ings. ICRA’04. 2004 IEEE International Conference on, 26

April 2004, vol. 5, pp. 4742–4746. IEEE.

20. McMahan W, Jones BA and Walker ID. Design and implemen-

tation of a multi-section continuum robot: Air-Octor. In: Intelligent

Robots and Systems, 2005. (IROS 2005). 2005 IEEE/RSJ Interna-

tional Conference on, 2 August 2005, pp. 2578–2585. IEEE.

21. McMahan W, Chitrakaran V, Csencsits M, et al. Field trials

and testing of the OctArm continuum manipulator. In: Pro-

ceedings 2006 IEEE International Conference on Robotics

and Automation, 2006. ICRA, 2006. IEEE.

22. Kapadia A and Walker I. Task-space control of extensible

continuum manipulators. In: 2011 IEEE/RSJ International

Conference on Intelligent Robots and Systems.

23. Braganza D, Dawson M, Walker ID, et al. A neural network

controller for continuum robots. IEEE Trans Robot 2007;

23(6): 1270–1277.

24. Giorelli M, Renda F, Ferri G, et al. A feed-forward neural

network learning the inverse kinetics of a soft cable-driven

manipulator moving in three-dimensional space. In: Intelli-

gent Robots and Systems (IROS), 2013 IEEE/RSJ Interna-

tional Conference on, 3 Nov 2013, pp. 5033–5039. IEEE.

25. Grzesiak A, Becker R and Verl A. The bionic handling assis-

tant: a success story of additive manufacturing. Assem Autom

2011; 31(4): 329–333.

26. Rolf M and Steil JJ. Efficient exploratory learning of inverse

kinematics on a bionic elephant trunk. Neural Networks and

Learning Systems. IEEE Trans 2014; 25(6): 1147–1160.

27. Cheng NG, Lobovsky MB, Keating SJ, et al. Design and

analysis of a robust, low-cost, highly articulated manipulator

enabled by jamming of granular media. In: Robotics and

Automation (ICRA), 2012 IEEE International Conference

on, 2012. IEEE.

28. Stilli A, Helge AW, Althoefer K, et al. Shrinkable, stiff-

ness-controllable soft manipulator based on a bio-inspired

antagonistic actuation principle. In: 2014 IEEE/RSJ Inter-

national Conference on Intelligent Robots and Systems,

2014. IEEE.

29. Ranzani T, Cianchetti M, Gerboni G, et al. A Soft modular

manipulator for minimally invasive surgery: design and char-

acterization of a single module.

30. Shiva A, Stilli A, Noh Y, et al. Tendon-based stiffening for a

pneumatically actuated soft manipulator. IEEE Robot Autom

Lett 2016; 1(2): 632–637.

31. Marchese AD and Rus D. Design, kinematics, and control of

a soft spatial fluidic elastomer manipulator. Int J Robot Res

2016; 35(7).

32. Yang Q, Zhang L, Bao G, et al. Research on novel flexible

pneumatic actuator FPA. In: Robotics, Automation and

Mechatronics, 2004 IEEE Conference on, 1 December

2004, Vol. 1, pp. 385–389. IEEE.

33. Kang R, Branson DT, Zheng T, et al. Design, modeling and

control of a pneumatically actuated manipulator inspired by

biological continuum structures. Bioinspir Biomim 2013;

8(3): 036008.

34. De Volder M, Moers AJ and Reynaerts D. Fabrication and

control of miniature mckibben actuators. Sensor Actuat A

2011; 166(1): 111–116.

35. Manti M, Pratesi A, Falotico E, et al. Soft Assistive Robot for

Personal Care of Elderly People. In: 6th IEEE RAS / EMBS

International Conference on Biomedical Robotics and Bio-

mechatronics, 2016

36. Vannucci L, Cauli N, Falotico E, et al. Adaptive visual pur-

suit involving eye-head coordination and prediction of the

target motion. In: IEEE-RAS International Conference on

Humanoid Robots, 2014, pp. 541–546

37. Vannucci L, Falotico E, Di Lecce N, et al. Integrating feed-

back and predictive control in a bio-inspired model of visual

pursuit implemented on a humanoid robot. Lecture Notes in

Computer Science (including subseries Lecture Notes in Arti-

ficial Intelligence and Lecture Notes in Bioinformatics) 2015;

9222: 256–267

38. Ansari Y, Falotico E, Mollard Y, et al. A Multi-agent rein-

forcement learning approach for inverse kinematics of high

dimensional manipulators with precision positioning. In: 6th

IEEE RAS / EMBS International Conference on Biomedical

Robotics and Biomechatronics, 2016.

39. Wooldridge MJ. An introduction to multiagent systems. John

Wiley & Sons, 2009.

40. Sutton RS and Barto AG. Reinforcement learning: An intro-

duction. Cambridge: MIT Press, 1998.

41. Claus C and Boutilier C. The dynamics of reinforcement

learning in cooperative multiagent systems. In: AAAI/IAAI,

26 July 1998, pp. 746–752.


42. Sandholm TW and Robert HC. Multiagent reinforcement

learning in the iterated prisoner’s dilemma. Biosystems

1996; 37(1): 147–166.

43. Watkins C. Learning from delayed rewards. University of

Cambridge, 1989.

44. Howard RA. Dynamic programming and Markov processes,

1960.

45. Rummery GA and Mahesan N. On-line Q-learning using

connectionist systems, University of Cambridge, Department

of Engineering, 1994.

46. Robbins H and Sutton M. A stochastic approximation

method. Ann Math Stat 1951: 400–407.

47. Singh S, Jaakkola T, Littman ML, et al. Convergence results

for single-step on-policy reinforcement-learning algorithms.

Mach Learn 2000; 38(3): 287–308.

48. Singh S, Jaakkola T, Littman ML, et al. Convergence results

for single-step on-policy reinforcement-learning algorithms.

Mach Learn 2000; 38(3): 287–308.

49. Bellman RE. Dynamic programming. Princeton University-

Press, BellmanDynamic programming, 1957: 151.

Ansari et al. 17

towards the development of a soft manipulator as an …...2018/08/06 · research article towards...

Documents