motion planning and reactive control on learnt skill manifolds · 1 motion planning and reactive...

1

Motion Planning and Reactive Control onLearnt Skill ManifoldsIoannis Havoutis and Subramanian Ramamoorthy

Abstract—We address the problem of encoding and executingskills, i.e., motion tasks involving a combination of specificationsregarding constraints and variability. We take an approach thatis model-free in the sense that we do not assume an explicitand complete analytical specification of the task - which canbe hard to obtain for many realistic robot systems. Instead,we learn an encoding of the skill from observations of aninitial set of sample trajectories. This is achieved by encodingtrajectories in a skill manifold which is learnt from data andgeneralizes in the sense that all trajectories on the manifold satisfythe constraints and allowable variability in the demonstratedsamples1. In new instances of the trajectory generation problem,we restrict attention to geodesic trajectories on the learnt skillmanifold, making computation more tractable. This procedure isalso extended to accommodate dynamic obstacles and constraints,and to dynamically react against unexpected perturbations,enabling a form of model-free feedback control with respect to anincompletely modelled skill. We present experiments to validatethis framework using various robotic systems -ranging from a3-link arm to a small humanoid robot- demonstrating significantcomputational improvements without loss of accuracy. Finally,we present a comparative evaluation of our framework againsta state-of-the-art imitation learning method.

I. INTRODUCTION

REALISTIC humanoid robotic tasks are often defined byskill specifications that involve a combination of high-dimensional nonlinear dynamics, kino-dynamic constraintsand task variability that is often idiosyncratic when capturedin the language of cost functions in an optimal control setting.Although, in principle, such robotic skills may be handled bywell understood analytical methods, it can often be difficultand expensive to formulate models that enable such an ana-lytical approach.

As an instance of a task that motivates our general approach,consider the Nao humanoid robot as used in robotic soccercompetitions. The agent as a whole needs a variety of dif-ferent skills involving locomotion in the presence of dynamicobstacles, full body manipulation such as for kicks and dives,etc. One specific instance of a flexible skill would be thatof kicking - where the robot needs to be able to produce avariety of different kicking motions that cover the entire spacereachable by the robot’s leg, taking into account considerationsranging from staying upright, to dealing with constraints to leg

I. Havoutis is with the Department of Advanced Robotics at theItalian Institute of Technology, and S.Ramamoorthy is with the Insti-tute of Perception, Action and Behaviour, School of Informatics, Uni-versity of Edinburgh, Edinburgh EH8 9AB, United Kingdom. Email:[email protected], [email protected].

1This paper unifies and extends previous work appearing in Havoutis andRamamoorthy (2010a,b).

and body movement, to the idiosyncracies of what defines a‘good’ useful kick. The focus of this paper is on such skillsthat are variations to a basic type of motion, which we wishto encode in such a way as to enable efficient (in terms ofcomputation time and performance) motion synthesis.

To recapitulate the standard approach to dealing with suchproblems, in a classical optimal control setting, e.g. Todorovand Li (2005), one typically begins with analytical models ofthe robot’s dynamics and kinematics, as well as models ofconstraints such as due to joint limits or friction, the effectsof impact, etc. In addition, one specifies the task in termsof objectives such as cost functions that specify in additionto time and energy considerations, preferences with respectto style, redundancy resolution and other somewhat moresubjective factors. Armed with these, one is in a positionto solve the problem using a variety of different search andoptimization techniques. In practice, such an approach presentstwo types of difficulties. On the one hand, it can be hard tocompletely specify skills of the kind we have in mind in termsof the models and specifications mentioned above. However,even when this can be done, the problem of searching throughthe possible trajectories can be computationally intractable formany realistic problem instances.

In practice, motion synthesis problems are renderedtractable by the fact that the solutions to such optimizationand search problems are not arbitrarily complex paths throughhigh-dimensional configuration spaces but instead are typicallyrestricted to subspaces that are defined by the task specifica-tions. In a sense, all variations of the trajectories representinga parameterized skill can be described in terms of somequalitative structure that is common to the solution trajectories.Typically, this qualitative structure admits a geometric descrip-tion and it can be learnt from data, which is the approach ofthis paper.

This observation regarding qualitative structure and its usein motion synthesis has a rich history. To identify a fewkey instances from this literature, Full and Koditschek (1999)demonstrate how biological motion is structured through col-lapse of dimensionality due to trimming away of degrees offreedom, exploiting synergies and symmetries. The resultingreduced model, a ‘template’, can be used as the target fora reduced-complexity control strategy which may then be‘anchored’ in the original system. This has a parallel inrobotics, e.g., Burridge et al. (1999), where complex controlstrategies are composed of simpler pieces that have welldefined semantics in terms of stabilization within a basinof attraction. In Conner et al. (2009, 2003, 2006), we seean extension of this idea to the case of motion planning,

2

Fig. 1. Overview of our approach. Skills are encoded as skill manifolds thatare learnt from example solution sets offline. The learnt skill set is the usedby the robot to generate trajectories that accommodate novel goals and/orconstraints, and to react to unforeseen disturbances in an on line manner.

where motion plans are composed from local parameterizedfeedback plans. Recent work such as Tedrake et al. (2010)brings together the continuous control and discrete motionplanning angles by defining motion synthesis in terms of a treestructured object whose nodes are linear quadratic regulatorswith well defined domains of applicability.

A related observation from the control of autonomous flyingrobots, e.g., Dever et al. (2006, 2004) and Frazzoli et al.(2005, 2003), is that the solutions to optimal control problemsare nicely ordered within maneuver sets. So, a lowest leveldescription of a skill is in terms of a set of trajectories thatcapture the variability of the task while respecting all of thespecifications regarding dynamics, constraints and costs. Suchskills may then be composed and dealt with as discrete objectsfor higher levels of deliberation. For the purposes of this paper,we note that these sets typically admit a description in the formof manifolds embedded in the configuration and phase spaceof the system being controlled.

The focus of our work in this paper is on learning suchsolution sets from data in the form of a manifold, generalizingfrom a sample set of trajectories that share the underlyingqualitative structure, and utilizing this manifold as the basisfor subsequent motion synthesis, by a process of constrainedgeodesic trajectory generation. We learn the skill in an offlinephase directly from demonstrated data. Such data can be eitherthe result of an expensive optimization procedure (which isfeasible offline) or demonstrations of an expert. The learntskill manifold can be then used by the robot online, in anautonomous manner, accommodating novel constraints andgoals, see Figure 1 for a visual depiction of this pipeline.

In our work, we bring together two different ways in whichthe concept of manifolds have often been used. On the onehand, control theorists and motion planning researchers arewell aware that system dynamics of many interesting systemsadmit geometric structure yielding distinguished trajectoriesthat are sub-manifolds in the configuration and phase spacesof the system. Indeed, some of our above mentioned examplesare drawn from this literature, (e.g. Kobilarov and Marsden(2011), Lewis (2007)). Separately from this, machine learningresearchers have looked at manifolds as a way of repa-rameterizing data, reducing from an initial high-dimensional

description to some lower dimension that captures the modesof variation in a form that is suitable for visualization oroperations such as clustering. Typically, in this way of usingthe notion of manifolds, one transforms the problem intothe low-dimensional space and abandons the original high-dimensional space altogether as the objective was always theextraction of the low-dimensional coordinates and associateddistances (e.g. Tenenbaum et al. (2000), Bitzer et al. (2008)).In this paper, we use the notion of a manifold in a sense thatis closer to the control or motion planning usage - so we treattrajectories as being restricted to a subspace within a higherdimensional ambient space. Indeed, when we do reactivecontrol in the face of disturbances, we use the learnt manifoldto induce a distance in the ambient space. However, we alsodraw on the machine learning approach by utilizing thosealgorithms to identify the subspace without having recourseto an initial analytical characterization. So, a key aspect ofthe work described in this paper is the combination of thesetwo notions within a novel framework for motion synthesis.

The remainder of this paper is organized as follows. Thenext section provides the background of our approach, cov-ering both the motion planning and the machine learningdomains. Section III introduces the manifold learning methodthat is core to our proposed framework. Section IV demon-strates how one can learn such a representation in a datadriven manner, how to use a manifold encoding to generatenovel trajectories, how to add novel constraints to such learntskill manifolds, and how to utilize these for online reactivecontrol. The next three sections (section V, section VI, sectionVII) present experimental evaluations on three robotic setups.The first is a planar 3-link arm suitable for demonstrating allthe concepts of our approach in an easy to visualize manner.The second is a set of experiments on an anthropomorphicrobotic arm. The third section evaluates our approach on aset of experimental setups using the Nao humanoid robot,performing tasks as walking and balancing on one leg. Finally,section (VIII) provides a comparison of our framework witha state-of-the-art immitation learning method, namely theGMM/GMR framework (Gribovskaya et al., 2010).

II. BACKGROUNDAt a high level, we are interested in solving the problem of

finding optimal trajectories - where the specifications involve avariety of considerations already outlined above - in a way thatmirrors the model-free nature of established algorithms such asfor sampling-based motion planning. We would like to be ableto minimize the artistry demanded of the user, by devising areasonably general representation, an efficient learning methodutilizing this representation and efficient planning methodsthat utilize this encoding. So, we begin by briefly discussingbackground in control and planning methods. Also, we arenot the first to consider the notion of learning skills fromdemonstration, although we do present a novel take on theproblem in terms of our task encoding and motion synthesismethod. We review the prior work in this domain to set thescene for our own work.

Some of the most successful approaches to path planningare randomized algorithms that exploit sampling. For example,

3

the Rapidly Exploring Random Tree (LaValle, 2006) is aremarkably simple yet effective algorithm for planning a pathbetween two points in configuration space. Many modern RRT-based samplers make use of two trees (Kuffner and LaValle,2000, Diankov et al., 2008), rooted at the initial and the goalpoints, where in every step the trees are grown and a checkfor matching the trees is also performed. The key limitation,however, is that sampling a high dimensional space denselyenough is computationally infeasible.

Other studies in this direction include obstacle based sam-pling (Thomas et al., 2007, Rodriguez et al., 2006), Gaus-sian sampling (Boor et al., 1999) and related approaches.Kuffner et al. (2002) demonstrated an interesting approachto this problem based on a combination of the RRT algorithm(LaValle and Kuffner, 2001) and post-processing in the form ofdynamics filtering to ensure stability of the resulting behaviour.

Nonetheless, random sampling in high dimensions canbe excessively wasteful when the underlying task has morerestrictive structure. As it is becoming increasingly clear thatmany interesting robotic behaviours are restricted to low-dimensional subspaces (Vijayakumar et al., 2005, Bitzer et al.,2008, Full and Koditschek, 1999, Ramamoorthy and Kuipers,2006, Jenkins and Mataric, 2004), for a variety of reasonsranging from joint limits and self-collisions to stability andenergy constraints, it is desirable to leverage this in the processof achieving more focused coverage.

As known from the study of biological behaviours, natu-ral systems utilize synergies and coordination strategies thatallow for efficient locomotion and fast planning. Biologicalstrategies usually have a musculoskeletal basis that is inherentto the dynamics of the system, that restricts movement to asubset of all possible solutions. In a robotics context, systemand (possibly artificial) task constraints can serve the samepurpose. Robotics (Ramamoorthy and Kuipers, 2008, Isto andSaha, 2006) and graphics (Safonova et al., 2004) researchershave utilized this fact to devise efficient motion synthesisstrategies. Some recent works (Berenson et al., 2009, Berensonand Srinivasa, 2010, Stilman, 2007, Bretl et al., 2004) alsoaddress this issue by considering how task space constraints,e.g., end-effector constraints, can be used to structure planningin configuration space with local Jacobian mappings. Howeverthe low-dimensional nature of the solutions may not alwaysbe taken into account explicitly.

Along this direction, Kobilarov and Marsden (2011) con-sidered optimal control of mechanical systems that evolve ona finite-dimensional Lie group. They use a discrete variationalprinciple that yields a set of trajectories that approximatelysatisfy the dynamics and that respect the state-space structure(manifold) of the system in question, while they also devel-oped an approximate numerical solution for the computationof such trajectories.

An aspect of the control problem associated with motionsynthesis in complex systems such as humanoid robots isthat of identifying desirable paths, executing evolution alongthese paths and correcting deviations away from these paths.In simple systems and systems where analytical models areavailable, there exist numerous control approaches that canprovide principled and provably correct solutions (Stengel,

1995, Siciliano and Khatib, 2008, Levine, 1996). For manyrealistic robot systems, explicit design of controllers utilizingthese established analytical methodologies appears infeasible.

Instead, one can break the state space of the system insmaller regions where a well defined control law can beimplemented, this way assembling a global solution throughlocal decompositions. For example, Zhang et al. (2009) per-form an approximate cell decomposition of the free spaceand compute vector fields within cells by considering theiradjacency relations, with the constraints and goals in mind.Conner (2008) decomposes the full space in smaller domainsand sets up local predefined feedback laws with specific state-evolution properties. Such decomposition can also becomedifficult as the dimensionality of the space increases.

Imitation learning approaches attempt to encode motions,and feedback characteristics, directly from demonstration ex-amples (Schaal et al., 2003). This sometimes requires littleto no prior knowledge of the system characteristics while itis an intuitive way to teach complex motions with multi-DoFsystems. Popular imitation learning paradigms include Learn-ing by Demonstration (LbD) and Dynamic Motion Primitives(DMPs). Demonstration-Guided Motion Planning (DGMP)(Ye and Alterovitz, 2011) is another recent development onthe imitation learning front.

LbD estimates the dynamics of a given set of demonstrationsin a statistical manner, using Gaussian Mixture Model (GMM),approximating the behaviour with a nonlinear combination ofGaussian kernels. Gaussian Mixture Regression (GMR) is thenused to sample the GMM and produce a control input giventhe current system state (Calinon and Billard, 2008, 2009,Hersch et al., 2008). Disadvantages of such methods includethe appearance of spurious attractors that can trap the evolutionof the state of the system and numerical difficulties that arisefrom matrix singularities, something we discuss further insection VIII.

DMPs provide a framework for learning the dynamics ofdemonstrated motions by using a simple dynamical systemencoding with guaranteed attractor properties. Such dynamicalsystems though can only exhibit trivial behaviour, thus a non-linear function is being learnt from the examples that makesup for the idiosyncrasies of the demonstrated movements. Theadvantage of the formulation is that the generated trajectoriesretain the spatial and temporal characteristics of the motionthat has been used for learning, providing discrete and rhyth-mic movement formulations, while allow for easy changes ofstart positions, goal positions and temporal scaling. In additionconvergence to the goal is guaranteed by the dynamics ofthe underlying simple dynamical system, as the effect of thenon-linearity vanishes with the evolution of the time variable(Ijspeert et al., 2002, 2001). A difficulty common to almostall imitation learning approaches is that of scaling up tosystems with multiple DoFs and complex dynamics. Mostrecent results consider Cartesian space motions, limited to a 2or 3 dimensional Cartesian space (Gribovskaya et al. (2010),Pastor et al. (2009)), making the variables in question muchmore “well-behaved”, while an additional inverse kinematicslayer is used to produce the actual joint movement of the plant.

The machine learning literature includes many examples

4

of dimensionality reduction methods used to abstract and/ormake problem spaces manageable. For example Chalodhornet al. (2006) use a low-dimensional sensory-motor mapping tooptimize demonstrated motions over the robot’s dynamics. Inthe same spirit (Abbeel and Ng, 2004) presented Apprentice-ship Learning as an inverse reinforcement leaning formulationthat approximates the unknown reward function of a Markovdecision process (MDP) that the demonstrations are assumedto be optimizing. Wang et al. (2008) introduced GPDM, aGaussian process based dimensionality reduction with a dy-namical model of the evolution of the state, that can learn mod-els of human kinematic trajectories. Bitzer et al. (2008) usea Gaussian Process-based nonlinear dimensionality reductiontechnique to arrive at an underlying model of demonstrateddata, while using a parameterized path generation method overthe learnt representation to generate novel movements. Wecontribute to this growing literature by explicitly addressingthe issues of task variability and efficient feedback controldespite incomplete model specifications.

A. Motivation and comparison to dim. reduction methodsOne of our goals is to learn a geometric structure, i.e.,

a skill manifold, that captures the intrinsic structure of thespace of trajectories by approximating the tangent space fromdemonstration data. So, if one begins with a set of motionexamples from a specific class, e.g., due to a path optimizationor redundancy resolution principle or even a more complexkinodynamic constraint, then one seeks a representation thatintrinsically captures both the restriction of states to a low-dimensional space and the evolution of the trajectories in thatspace - as opposed to imposing a trajectory generation scheme,post hoc.

In the usual formulation, manifold learning is aimed atfinding an embedding or ‘unrolling’ of a nonlinear manifoldonto a lower dimensional space while preserving metric prop-erties such as inter-point distances. Examples include MDS(Hastie et al., 2001), LLE (Roweis and Saul, 2000) and Isomap(Tenenbaum et al., 2000). Much of this work has been focusedon summarization, visualization or analysis that explains someaspect of the observed data. On the other hand, as alreadymentioned, our goal is to learn a manifold encoding of motiontasks that allows us to perform control and planning operationsin the spirit of above mentioned methodologies - whichrequires us to address a different problem from visualizationor clustering.

Examples of state of the art manifold learning algorithmsinclude the following:• Coordinated factor analysis (CFA) has been introduced

in Verbeek (2006) for modeling data sampled from mani-folds. It uses a mixture of factor analyzers to approximatea non-linear factor manifold.

• Locally linear coordination (LLC) (Teh and Roweis(2003)) aligns the hidden representations used by eachcomponent of a mixture of dimensionality reducers into asingle global representation of the data throughout space.

• Manifold charting (Brand (2003)) coordinates local para-metric models to obtain a globally valid nonlinear embed-ding function. Like LLC, this charting method defines a

quadratic cost function and finds the optimal coordinationdirectly.

• Isomap, (Tenenbaum et al. (2000)), globally coordinatesproximal pairwise distances using all-pairs shortest pathsdistances computed from a neighbourhood graph on thedataset. Isomap assumes underlying structure is mani-fested as a bordered manifold. This underlying manifoldcan be uncovered by Isomap, given the input data set isdense enough to cover the entire manifold and forms asingle connected component.

• Locally smooth manifold learning (LSML) (Dollár et al.(2007)), rather than posing manifold learning as theproblem of recovering an embedding, poses the problemin terms of learning a warping function for traversingthe manifold. LSML explicitly focuses on generalizingto unseen portions of the manifold by making a smooth-ness assumption, something very beneficial for use in arobotics learning context as in our proposed framework.

Building on LSML, we learn a model of the tangent spaceof the low-dimensional nonlinear manifold, conditioned onthe (temporally driven) adjacency relations of the high dimen-sional data. Such a learnt manifold model can then be used tocompute geodesic distances, to find projections of points onthe manifold and to directly generate geodesic paths betweenpoints. So, an important point that differentiates our approachfrom alternate data-driven approaches that also utilize someform of dimensionality reduction is that although we have alow-dimensional representation to reduce complexity, we solvean integrated planning and control problem in the ambient highdimensional space. This gives a clearer interpretation to whatthe controller is achieving: enforcing a large domain vectorfield towards the manifold and along the manifold. This makesthe consideration of obstacles (Havoutis and Ramamoorthy(2010b)) and disturbances much more natural, without havingto worry about how they themselves may be mapped to anartificial low dimensional space. Issues such as this latter pointtend to be rather delicate in many alternate approaches.

III. MANIFOLD LEARNING

Our approach, building on LSML introduced in Dollár et al.(2006), attempts to learn an approximation of the tangent spacethat is locally common to neighboring data points. This waywe can start from a point in the high dimensional data spaceand generate its neighbors, even beyond the support of thetraining data and up to a locally quadratic approximation. Thekey difference between embedding methods, that try to find astructure preserving embedding, and our approach is the goalof learning how to traverse such structure rather than embedit to a new coordinate frame.

A. Model formulation

Assume a given data set of D dimensions, in our case aset of example trajectories in a robot’s state space. Due totask aspects including kinematic and/or dynamic constraintsand task constraints, the data lies on a smooth d-dimensionalmanifold that is embedded in the D-dimensional state space.

5

The d dimensions of the manifold effectively explain the localmodes of variation as presented in the dataset.

The data set can be described as a set of points x ∈ RD,while the image of the points on the manifold is y ∈ Rd.There exists a continuous bijective mapping M that convertslow dimensional points y from the manifold, to points x ofthe high dimensional space,

x =M(y).

The central observation is that given two neighboring pointson the manifold, xi and xj , the difference between thesepoints, ∆i.j , should be a linear combination of the tangentvectors at that point on the manifold, scaled by an unknownalignment factor �ij . Assume a mapping H from a point onthe manifold to its tangent basis H(x),

H : x ∈ RD 7→[∂

∂y1M(y) · · · ∂

∂ydM(y)

]∈ RD×d,

where each column of H(x) is a basis vector of the tangentspace of the manifold at y, i.e. the partial derivative ofM withrespect to y. Taking ∆i.j to be the centered estimate of thedirectional derivative at x̄ij (where x̄ij ≡ (xi+xj)/2) and �ijto be the unknown alignment factor, we haveH(x̄ij)�ij ≈ ∆i.j ,that holds given � is small enough and the manifold can belocally approximated with a quadratic form.

B. Formulation of the learning problem

We learn a model of H from data, parametrized by θ,giving us the tangent space model Hθ. Learning requires thefollowing error to be minimized:

err(θ) = min{�ij}

∑i,j∈Ni

∥∥Hθ(x̄ij)�ij −∆i.j∥∥22 . (1)The goal of training is to find the θ that minimizes Equation 1,where �ij are additional free parameters that are optimizedover and do not affect model complexity.

To enforce the smoothness of the mapping Hθ we add anexplicit regularization term, in addition to implicit smoothnessthat may come from the form of the model itself. The intuitionbehind the first term is that the learned tangents at twoneighboring locations, x̄ij and x̄ij

′, should be similar, i.e.,∥∥∥Hθ(x̄ij)−Hθ(x̄ij′)∥∥∥2

F, should be small. The second term is

used to avoid numerical instabilities, ensuring that Hθ’s donot get very small and �’s very large, thus

∥∥�ij∥∥22

is alsoconstrained. This way the following regularization term isadded to Equation 1:

r = λE∑∥∥�ij∥∥2

2+ λθ

∑∥∥∥Hθ(x̄ij)−Hθ(x̄ij′)∥∥∥2F.(2)

To simplify the error term we can rewrite it using a single λ aswe can treat Hθ and αHθ as the same for any α > 0. This waythe error of Hθ with regularization terms (λE , λθ), is equal tothe error of αHθ with regularization terms (α2λE , 1α2λθ), inessence translating in a single λE = λθ = λ. Throughout thiswork we set λ equal to 10−3.

C. Learning algorithm

In principle any regression technique can be employed formodeling Hθ, e.g. Vijayakumar et al. (2005), Rasmussen andWilliams (2006). A linear model of radial basis functions(RBFs) is particularly well suited for the task (Bishop (2007)).The RBFs we use are of the form:

fk(x) = exp(−||x− µk||22

2σ2k),

where the basis centers µ are computed with K-means cluster-ing on the given dataset and the basis width σ is set to be twicethe average of the minimum distance between each cluster andits nearest neighbor center. This is a common choice for σ inRBF methods (Bishop, 2007), that ensures coverage of thespace between kernels, i.e. no disconnected components exist.From the RBFs we can compute a feature vector fi from x̄ijof the form fi = [f1(x̄ij), . . . , fk(x̄ij)], where the number ofbasis function k controls the smoothness of the final mappingHθ. This way a large k would result in a mapping that betterfits local variations of the dataset but whose generalizationabilities to other points on the manifold would be weaker,following the bias-variance trade-off. We can then define,

Hθ(x̄ij) = [Θ1fij , . . . ,ΘDfij ],

where the Θs are randomly initialized d× f matrices.To solve the system we compute the thin singular value

decomposition (SVD) of ∆i. From the linearity assumptionthere are at most d non-zero singular values. In practice wecan compute the truncated SVD that keeps at most 2d singularvalues and allows for significant computational reduction. Thisway we have ∆i = U iΣiV iT and by plugging in this andHθ(x̄ij) into Equation 1, and rearranging we get:

err(θ) = minEi

n∑i=1

D∑k=1

∥∥∥fiTΘkTEi − U ik.Σi∥∥∥22. (3)

To solve Equation 3 simultaneously for E and Θ is complexbut by keeping either one constant we can optimize iterativelythe other variable, in an EM–like fashion, becoming a leastsquares problem. This way we solve for Ei keeping the Θksfixed as:

Ei = argminEi

D∑k=1

∥∥∥fiTΘkTEi − U ik.Σi∥∥∥22

(4)

= argminEi

∥∥HiEi − U ik.Σi∥∥2F (5)= Hi

+U iΣi. (6)

In the next iteration we keep Ei fixed and optimize for Θksby:

Θk = argminΘk

n∑i=1

∥∥∥fiTΘkTEi − U ik.Σi∥∥∥22

(7)

= argminΘk

∥∥∥(EiT ⊗ fiT) vec(ΘkT)− vec (U ik.Σi)∥∥∥2F.(8)

6

Since vec(U ik.Σ

i)

= ΣiU ik.T the least squares solution for Θk

becomes

vec(Θk)

=

E1T ⊗ f1T

...EnT ⊗ fnT

+ Σ

1U1k.T

...ΣnUnk.

T

(9)We continue iterating the error minimization procedure untilthe system converges. The iterative nature of the algorithmdoes not guarantee the convergence to a global minima of theerror function, thus a number of random restarts are performedto avoid bad local minima. In all experiments we performed3 random restarts and continue with the model that achievesthe smallest error. In practice the error difference between therandom restarts was never significant and, to our knowledge,there is no evidence that we suffered from serious local minimaissues.

D. Intrinsic manifold dimensionality

The manifold dimensionality d is a parameter of the mani-fold learning approach. Throughout this work we have used asimple cross-validation approach to determine this parameter.We begin with a dimensionality of 1 and iteratively increasethe model dimensionality while keeping track of the modelerror. The point at which the error derivative with respect tothe dimensionality flattens is a good candidate for the modeldimension. In other words beyond that point adding moredimensions to the model produces only a small decrement tothe model error, sometimes also interpreted as over-fitting. Anumber of methods for estimating the intrinsic dimensionalityof manifolds are available in the literature (e.g. Costa and Hero(2003), Kégl, Levina and Bickel (2004), Hein and Audibert(2005), Eriksson and Crovella (2012)). This is an active area ofmachine learning research, however, intrinsic dimensionalityestimation is beyond the scope of this paper.

IV. PATH PLANNING AND CONTROL WITH MANIFOLDS

The core of a planning and control framework requires a setof procedures relevant to learning, generalizing and executingmotions. To begin with, a procedure for learning motions isimportant. In essence this would be a way to add new skill-representations to the existing skill-set. Another importantfeature of the framework would be to generalize from learntskills and be able to refine them according to the constraintsand goals of novel planning queries. In addition, the controlphase of the framework requires a method for observing thenominal execution of the planned trajectories and reacting ina corrective manner should unexpected perturbations occur.

This section presents the main methods developed andutilized for learning manifold skill encodings and for planningand controlling a robotic system through such representations.The first subsection describes leaning skills from examplesand generating unconstrained trajectories. The second subsec-tion presents the method for incorporating novel constraintson learnt manifolds without the need to re-learn. The thirdsubsection describes how we can control the system on-line

(a) Manifold geometry (b) Initialization

(c) Gradient optimization (d) Geodesic path

Fig. 2. Sketch of the geodesic trajectory generation procedure. (a) We beginwith a learnt manifold model, a starting point on the manifold, often beingthe current state of the system, and a goal point, that is the state we want toreach. (b) We initialize with a trajectory that can traverse the ambient spaceof the system. This can be as crude as a simple manifold interpolation for lowdimensional spaces. (c) We subsequently optimize the trajectory with respectto the manifold geometry by following the gradients of two errors defined overthe geometry. The first drives the points on the manifold while the secondminimizes the path length. (d) The outcome is an optimal length geodesictrajectory. In other words the shortest trajectory that connects the start andgoal states and does not diverge from the underlying manifold geometry.

given a learnt skill manifold. In essence we present step-by-step how a manifold representation can be used to tightlyintegrate path planning and control of robotic systems.

A. The manifold encoding for robotic skills

We are interested in preserving properties (e.g. spatial prop-erties) of trajectories in the data set. Thus, our goal is to learna model of the tangent space –a geometric representation– ofthe low-dimensional nonlinear manifold, conditioned on theadjacency relations of the high dimensional data. The learntmanifold can be used to compute geodesic distances, to findprojections of points on the manifold and to directly generategeodesic paths between points.

1) Learning the manifold model: A manifold encodingis grounded in a set of example solutions. As mentionedearlier, such a set is produced either from a computationallyintensive optimization procedure, an optimal controller, oreven an expert demonstrator. The key characteristic is thatsuch examples are trajectories corresponding to a specificobjective that represents the skill, and these trajectories definethe underlying manifold geometry. Thus, similar queries havesimilar solutions, directly deriving from a local smoothnessassumption.

Training data are points from the system’s state space thatrepresent trajectories of a particular skill. These are of theform,

Q = [q1, . . . ,qk]T , (10)qi = qi1, . . . , q

in, i = 1, . . . , k, (11)

where the number of examples is k and the length of eachtrajectory is represented by n. Each datapoint qij belongs toa D-dimensional space, the system’s state space, and lies ona locally smooth d-dimensional manifold. This subspace isembedded in the D-dimensional space. To approximate thiswe learn a mapping from each point of the D-dimensionalspace to the tangent basis of the manifold, H(q).

7

The datapoints belong to the state space of the system andsuch a state space can contain configuration variables, poses,higher order terms, as velocities and accelerations, or anycombination of the above. Most of our experiments focus onthe configuration space of the systems rather than the taskspace.

Modeling Hθ is done with a linear model of radial basisfunctions (RBF’s) with features over the evidence (Hastie et al.(2001)), where the RBF’s, µ’s and σ’s are computed directlyfrom the data in an unsupervised manner. The model is trainedwith the procedure presented in the previous section.

2) Optimal geodesic paths: Our goal is to find the shortestpath between two prespecified poses q1, qn ∈ RD, D beingthe dimensionality of the configuration space, that respects thegeometry of the learnt manifold. In a robotics context, beingon the manifold essentially means that the constraints (e.g.,optimality w.r.t. a particular task-specific cost) inherent in thetraining data are respected. In practice we, discretize our pathinto a set of n via points, q = q1, . . . , qn, with the q1 and qnbeing fixed, and we follow a combination of gradient descentsteps to minimize the length of the path while not leaving thesupport of the manifold. Figure 2 presents sketches of eachstep of the trajectory generation procedure that is elaboratedbelow.

The geodesic trajectory generation procedure consists oftwo phases. In the first phase an initial estimate of the pathis created and in the second phase this path is optimized withrespect to the manifold geometry and the overall path length.

3) Geodesic path initialization: The initialization proce-dure is based on the start and goal points, that are keptconstant, and the average distance estimate `, that is derivedfrom the training data set. The process begins with the distancebetween the two initial points, the edges of the path. Thedistance is split in half and either the left or right edge (startor end) is grown recursively towards the middle.

Growing the path involves taking consecutive smallgeodesic steps, i.e. finding points that move towards themidpoint of the distance and are on the manifold. The pointreached is set to the new estimate of left (or right) and theprocedure continues with recursion. The recursion stops whenthe distance between the two points in consideration is equal orless to `. Pseudocode of this phase is available in Algorithm 1while Figure 2(b) provides a sketch of the outcome of the ini-tialization step. This produces the geodesic path initializationwhich is subsequently optimized with respect to the manifoldgeometry. This recursion is guaranteed to find a path given theassumptions of the manifold learning phase described earlier(III-A,III-C), i.e. the manifold is a single connected componentand smooth up to a locally quadratic approximation.

4) Geodesic path optimization: The initial estimate of thepath is a manifold interpolation that roughly follows the man-ifold geometry but is not the shortest geodesic path possible.Since we have learnt the tangent space of the manifold we canfind a minimum energy solution that is a geodesic path andits length can be minimized.

The optimization of the path is performed with an iterativegradient descent procedure that is performed in two steps. Thefirst step follows the orthonormal (to the manifold) component

Algorithm 1 Initial Geodesic TrajectoryINPUT: M, `, qstart, qendOUTPUT: q ≡ {qstart, q2, . . . , qi, qend}qLeft = qstart, qRight = qend, distPrev =∞loopdiff = qLeft− qRightdist = norm(diff)if dist < ` thenbreak {Distance reached}

else if dist < distPrev thenmidpoint = (qLeft+ qRight)/2qLeft = recursion(qLeft,midpoint)qRight = recursion(qRight,midpoint)break {Recursion}

elsedistPrev = distqNext = qLeft (or qNext = qRight)H = Hθ(qNext)diff = H ×H ′ × (diff/norm(diff))qNext = qNext+ diff × ` {Geodesic step}

end ifend loop

of the gradient of:

errM(q) = min{�ij}

∑i,j∈Ni

∥∥Hθ(q̄ij)�ij − (qi − qj)∥∥22 ,that essentially makes the qi’s “stick” to the learnt manifoldby iteratively moving them to points where neighbouring(consecutive) bases are aligned. The second gradient descentstep follows the parallel (to the manifold) component of:

errlength(q) =

n∑i=2

∥∥qi − qi−1∥∥22,

that iteratively minimizes the length of the path withoutleaving the support of the learnt manifold, while keeping theendpoints fixed.

Thus we are able to generate novel unconstrained trajec-tories that are geodesic paths over the learnt manifold. Suchpaths are locally optimal with respect to path length while alsofollowing the underlying geometry, specific to each roboticskill and observed in the demonstrated solution set.

B. Changing environments and dynamic constraints

Often in systems that act in a changing environment novelconstraints can be introduced dynamically. For example con-sider the task of walking for a humanoid robot. We can learn anunconstrained walking manifold that produces stable steppingmotions of variable step start and end points. Such trajectoriesassume that there are no obstacles in the way of the swingingleg. Now if obstacles are introduced, imagine the play-room ofa kid, the system will have to generate motions that are ableto avoid such obstacles in task space, e.g. step over a pileof Legos, avoid randomly dropped toys. Such obstacles werenot present in the manifold learning step and are regarded asconstraints that are not encoded in the skill manifold.

8

(a) Unconstrained geodesic (b) Obstacles introduced

(c) Constrained geodesic path

Fig. 3. Example sketch of a constrained optimization scenario and ofthe proposed solution procedure. (a) An unconstrained generated geodesictrajectory for a path planning query originating qstart and reaching qend.(b) The appearance of an obstacle set O introduces further constraints in thegeodesic generation procedure as the intersection of the set with the manifoldgeometry, in effect, creates a patch that the solutions now have to avoid. (c)The obstacle set, O, taken into consideration serves to drive the geodesictrajectory away from the red square obstacle. This is achieved through aspring-system model that extends the geodesic trajectory generation procedureand produces solutions that avoid such “no-go” while following the underlyingmanifold geometry.

Re-learning a representation that takes account such ran-domly occurring constraints would be infeasible in continu-ally changing environments. Instead, our method reuses thepreviously learnt unconstrained manifold representations andincorporates new constraints in the motion planning phase.Figure 3 provides a sketch of such a scenario. In this settingwe can generate unconstrained trajectories of the learnt skillmanifold as explained previously (Figure 3(a)) but as thesystem’s environment changes a set of obstacle points isintroduced. What our method achieves is the generation ofgeodesic trajectories that can successfully navigate away fromsuch an obstacle while not leaving the support of the learntmanifold, Figure 3(c). This is achieved with a spring-systemmodel that optimizes geodesic trajectories with respect tonovel constraints. The next subsection describes the procedurein detail.

1) Constrained geodesic paths: The algorithm presentedin section IV-A2 generates unconstrained geodesic trajectoriesthat generalize from the solution set presented to the manifoldlearning procedure in the training phase. In practice, we oftenrequire more control over the generated trajectories as, often,the system would need to avoid task space and joint spaceobstacles. This is a constrained trajectory generation problemover the manifold. We now describe a procedure for generatingconstrained geodesic paths that avoid “no-go” patches on themanifold surface. These are defined as sets of obstacle pointsO ∈ RD that are uniformly sampled from the “no-go” taskspace region and trace the obstacle patch in joint space. Forexample such points can be samples from the faces of a cubeshaped obstacle or a set of points sampled from the surfaceof a sphere (Figure 3(b)).

The intersection of the manifold set and the obstacle set,M∩O, is the region that we would like to take into consider-ation when generating a constrained geodesic path. This pointset would drive the geodesic path away from the patch that

(a) Calculation of forces (b) Manifold projection

Fig. 4. (a) An obstacle point ok affects only the path points that are withinits range (β`) and exerts on them repulsive forces (red). In contrast the pathpoints can exert repulsive (not shown) and attractive forces (blue) to theirpath neighbors. (b) All forces that act on each path point are averaged andthe resulting mean vector is subsequently projected on the learnt manifoldM.

we want to avoid but given the learnt tangent space the pathwill not leave the surface of the manifold, thus properties thatthe manifold represents are still respected.

We treat the affected consecutive geodesic path points, q, asa system of springs that can either exert attractive or repulsiveforces to their neighbors. A force fqij , with magnitude∣∣fqij∣∣ = `−√(qj − qi)2,between two consecutive path points qi and qj , is repulsiveif the distance between them is less than `, and attractive ifthe distance is greater than `. The distance ` is a metric thatis derived from the manifold learning step and is estimateddirectly from data (section IV-A2). The magnitude of the forceis directly proportional to the distance of the points in question.

The obstacle point set, O, exerts repulsive forces to thepath points. The area of effect of the obstacle point set is alsodefined relative to `; each obstacle point, ok exerts a force foik,of magnitude

|foik| = β × `−√

(qi − qo)2,

to every path point, qi, within a hypersphere of radius set toβ`. The area of effect of O can be increased or decreasedby tuning the scalar β ∈ (1,+∞) with obstacle clearance inmind, i.e. a small β will result in trajectories that evolve closeto the obstacle set while a large β has the opposite effect.

We calculate all forces that act on each affected path pointand compute a mean force vector for each point,

f̄i =1

k

k∑foik +

1

j

j∑fqij , (∀j ∈ {neigh(i)}).

This vector is then projected onto the manifold, fMi =f̄iHqθH

q′

θ , and each point is moved by a small step, γ,accordingly. In our formulation γ is statically defined inthe interest of simplicity. Established numerical optimizationmethods (Nocedal and Wright, 2006) exist that can vary γ,thus achieving faster convergence, e.g a line-search step can beperformed to compute the optimal step size for every iteration.Figure 4(b) provides a sketch of the average force projectionstep. In effect this makes the points take small geodesic stepsaway from the obstacle while not leaving the support of themanifold. We repeat the procedure until all path points havecleared obstacle points (O = {�}) or the algorithm hasconverged.

9

Algorithm 2 Constrained Geodesic Trajectory GenerationINPUT: M, qstart, qend, OOUTPUT: q ≡ {qstart, . . . , qi, qend}q← Optimal Geodesic Path(qstart, qend)repeatdO ← Compute Distances(q, O)[foik, f

qij ]← Calculate Spring Forces(q, dO)

f̄i ← foi. + fqi.

f̄Mi ← f̄iHqθH

qθ′

q← q + γf̄MiCi ← ∂q2/∂s2 {Curvature}C̄ ← 1/n

∑Ci

C̄Mi ← (C̄ − Ci)HqθH

qθ′

q← q + γC̄Miδ ← q− qold

until dO = {�} or δ ≤ 10−6

2) Curvature smoothing: The above procedure only acts onthe geodesic path points that are in the area of effect (distance< β`) of obstacle points. This tends to lead to trajectoriesthat are not smooth when only small portions of the paths areconsidered. To alleviate this, we introduce a step that considersthe full set of path points and interplays with the constraintoptimization.

We use the path curvature as a measure for smoothingthe generated geodesic paths in a structured fashion. Wecalculate the curvature, C, over the discrete geodesic points,q = q1, . . . , qn as

Ci = ∂q2

∂s2, i = 1, . . . , n− 2,

where s is the distance between two consecutive points. Wecalculate the mean curvature C̄ and the error gradient C̄ − Ci(vector), for each triple of path points. Each point is thenmoved by a small step along the error gradient and pro-jected on the manifold tangent space. In general the curvaturesmoothing step is not mandatory but rather complementaryto our obstacle avoidance formulation. The entire constrainedgeodesic trajectory generation procedure is summarized inAlgorithm 2.

C. Control on and to a skill manifold

In the previous subsections we concentrated our efforts onsolutions that restrain the system’s state to evolve on the learntskill manifold hypersurface. The intuition is that the set ofsolutions used as examples to our manifold learning phase,imply that the manifold geometry is indeed the set of desiredstates. Such a measure of “value” can be sometimes difficult toexplicitly encode in terms of an analytical cost function, andin practice may encode task specifications, system constraintsand general costs.

In effect, we demonstrated how a trajectory can be opti-mized to follow the manifold geometry from start to goal ina manner similar to a vector field on the learnt hypersurface.Perturbations that make the system’s state jump to differentpoints that belong to the manifold can be lazily handled withstraightforward replanning, utilizing the tools presented earlier.

(a) Unforeseen perturbation. (b) State projection.

(c) On-manifold replanning.

Fig. 5. A sketch of an example where the ability to project would benecessary. (a) The system executes a geodesic trajectory when an unforeseenperturbation drives the state of the system to an off-manifold point. Theremaining trajectory points are discarded. (b) Replanning from an off-manifoldpoint would be insensible to the desired state evolution. Instead, we find theprojection of the off-manifold state on the underlying geometry. This is theclosest point that we then control for in a reactive manner. (c) A new geodesictrajectory is replanned, starting from the projection state and reaching to thegoal state.

Imagine a scenario where a mobile manipulator would needto transfer a tray with drinks. The task manifold in such a casewould include all these poses that keep the tray in a horizontalconfiguration and consequently the set of joint states that resultin such an end-effector configuration. If the manipulator getsperturbed, e.g. a push from a guest who is not careful whilethe robot passes by, then the state of the system is likely toreach a point that lies beyond the support of the task manifold,i.e. the tray is tilted. In this case first our framework needs torecognize that the state has deviated from the nominal plannedpath and a reactive control input would serve to lead the stateback to the desired manifold. The straightforward choice ofaction would serve to minimize the time that the state is inoff-manifold space, thus would follow the shortest path to themanifold geometry.

It is often the case that perturbations will cause the stateto leave the manifold (Figure 5(a)). Replanning from an off-manifold state would then be infeasible as the set of desiredstates is the manifold itself. A sensible choice is to find theclosest on-manifold point and try to reach this in a reactivemanner (Figure 5(b)). Once the state has returned to themanifold hypersurface, it becomes possible to replan as before(Figure 5(c)).

This empowers our framework with a global behaviourthat covers the ambient space of the system in its entiretyand enables us to reason quantitatively about the value (orcost) of off-manifold states. It can be viewed as endowingthe space around the learnt hypersurface with a vector fieldcontroller that reactively seeks to return the state on themanifold should a perturbation occur. This is achieved via aprojection operation as detailed in the following subsection.

1) Projection of states on manifold: We are given a learntmanifold model, Hθ, and a point q that belongs to the ambientspace of the system. We seek a point q′ that minimizesthe distance between q and q′, while q′ must belong to thesubspace that the manifold represents, i.e. q′ is the closest

10

point on the manifold.The projection of q on to the manifold Hθ cannot be

computed in closed form. Instead a gradient descent approachis utilized to find a new point q′ on H that minimizes thedistance, d = ‖q − q′‖22 . First we need to initialize theprocedure with an on-manifold point. One can use q′ to be thenearest point in the training data or, in a control setting to thelast state space point of a tracked trajectory, i.e. the point thatthe perturbation occurs and state of the system left the supportof the manifold. Note that the distance formulation is notlimited to the Euclidean norm as above. It can be substitutedwith any continuous and consistent distance metric.

Since Hθ is defined over the whole RD we calculate theorthonormalized tangent space at q′, H ′ ≡ orth(Hθ(q′)), andH ′H ′T the corresponding projection matrix. We follow thegradient to the local minima on the manifold, using the updaterule for q′:

q′ ← q′ + αH ′H ′T (q − q′),

with α being a step size. The resulting q′ is an on-manifoldstate that is closest, in a local sense, to the off-manifold stateq.

D. Benefits of manifold control

Our primary objective in this setting is to devise a vectorfield in the configuration or joint space of the robot thatenforces convergence to the skill manifold and approximatelyoptimal evolution along the skill manifold. Beginning on theskill manifold, if the system were asked to achieve a goal thatis infeasible (as indicated by the data), then one should be ableto compute and execute a ‘next-best’ trajectory consisting ofconvergence to the desired subspace and subsequent evolutionalong it. So, the problem is essentially one of using the learntstructure to define an appropriately structured error functionfor control purposes.

Since all trajectories subject to the task specifications mustlie on the manifold, the desired movement from off-manifoldpoints is along the projection back to the manifold. Notethat the precisely optimal corrective path segment may wellbe slightly different from this projection, depending on thespecific nature of the overall task specification. However, thistrue underlying specification could not be known from dataalone. So, the projection on to the manifold is the logicalchoice under this level of information. Deviations along themanifold, either due to dynamic obstacles or due to unforeseenperturbations, may be dealt with differently. Along the mani-fold, even if one were pushed away from the originally plannedtrajectory, one is still assured that the system is performinga reasonable movement subject to specifications. So, these

Fig. 6. A diagrammatic outline of the proposed control strategy utilizing thelearnt manifold.

(a) RBF metric. (b) Manifold metric.

Fig. 7. a) Volumetric plot of the distance metric that can be evaluated directlyfrom the model. The metric breaks down as the distance to the manifoldgeometry get smaller, leading to overestimation of the true distance. Blackdots represent RBF centers, over which the evaluation is based. b) Volumetricplot of metric derived from our model. The metric evaluates the distance tothe modeled surface and we see that it smoothly surrounds the underlyingmanifold. Distances range from dark blue (small) to red (large), while theclosest distances are completely transparent for clarity.

two types of corrections may be handled differently, e.g. withdifferent levels of stiffness.

Figure 6 outlines a controller architecture that achieves thistype of behaviour. Geodesic paths along the manifold pro-vide the feedforward component. Feedback corrects deviationsalong and away from the manifold - with different levels ofgain. For sufficiently large perturbations away from the desiredpaths (such as in examples to follow), it may be more desirableto replan, in a receding horizon sense.

In addition, the feedback gain KM(q) is state dependentand serves to scale the control input with respect to thesystems distance from the desired manifold2. This couplingyields a system that is compliant with respect to on-manifoldperturbations, that are in a sense indifferent to the task cost,but stiffens-up against perturbations that drive the state to cost-expensive parts of the space.

A key benefit of the manifold representation (as opposed to,say, a probabilistic model of possible velocities at each state) isthat it provides a clear notion of deviation from skill sets anddeviations within that set. With this, control is conceptuallyno more complicated than a simple proportional-derivativescheme but implemented in terms of a more sophisticatednotion of ‘error’.

In the absence of the manifold representation, one couldstill have implemented alternate error metrics such as, forinstance, based on distances to centers of clusters of demon-strated data points. Stated in terms of our model, this issimilar to defining errors in terms of the centers of RBFsused in our approximation. However, such a metric wouldyield suboptimal and non-smooth behaviours depending onparameters of the statistical model such as the number ofclusters. We illustrate the behaviour of this metric within abounded volume surrounding the desired subspace in Figure

2 In the simplest case KM(q) can be equal to the manifold metric. As withany feedback gain, this can be appropriately scaled for faster/slower systemresponse.

11

7(a)3. What is plotted is the distance to the shortest kernelcenter, distances range from deep blue to red while the lowerspectrum of blue is completely transparent for clarity. Thisresults essentially in a number of low-distance spheres aroundthe centers of the model. The metric becomes smoother asdistances become larger but, on a local scale - which is theone of real interest for control purposes, such a metric wouldoverestimate the distance to the manifold and be undesirablynon-smooth. In contrast, by considering the distance to themodeled subspace directly we can get an accurate and smoothmetric, that captures the distance accurately as shown in Figure7(b).

An interesting possibility that arises from this (to be ex-plored in future work) is that one could devise planningschemes based on variants of the A* algorithm, as the costdefined using the manifold is admissible - strictly less thanor equal to the true cost of the underlying function. Thisproperty would not be present in the previous naive metric,which is prone to producing overestimates. In turn, the use ofan admissible cost guarantees that A* (and variants) returnsa path that is optimal and in practice greatly reduces runningtime.

In the following sections, we demonstrate the utility of theseideas for practical applications. The first example presentsexperiments on a simulated 3-link arm where both the man-ifold and the learnt model can be visualized. For the secondexample, we use the Kuka LWR anthropomorphic robot arm,with which we demonstrate how our method scales to morecomplex systems and more challenging tasks. Last, we presentan example on the Nao humanoid robot, generating motionsthat stand stably on one leg.

V. EXPERIMENTS WITH A THREE LINK ARM

A. Reaching with a robotic arm

Our first set of experiments are designed to elucidatethe basic concepts underlying our approach. We use a 3-link planar arm where we can explicitly visualize both theconfiguration space and the optimization manifold (surfacethat corresponds to a specific redundancy resolution strategy),along with possible obstacle points. The arm is a series of threerigid links, of 1/3 length, that are coupled with hinge joints,producing a redundant system with 3 degrees of freedom(DoFs) that is constrained to move on a 2 dimensional plane(task space).

1) Reaching examples: We start with a 21 × 31 grid intask space and compute the joint positions for each goal pointwith an iterative optimization procedure detailed below. Wesubsample 100 grid points to get a random permutation forlearning, as in Figure 8(a). We have chosen to start with train-ing data that follow a grid in task space in order to visualizethe corresponding geometry in the system’s joint space. Thisstructure arises from the optimization procedure that resolvesthe system’s redundancy, i.e. a different redundancy resolutionstrategy would shape the state space geometry differently.

3The example here is based on our experiments with the 3-link arm. Thegray trimesh represents the surface geometry generated by a minimum anglesreaching behaviour. The details are clarified in the following sections.

−0.5 0 0.5 1

1.5

1.6

1.7

1.8

1.9

2

2.1

2.2

2.3

2.4

2.5

x

y

(a) Task space

−0.5 0 0.5 1

1.5

1.6

1.7

1.8

1.9

2

2.1

2.2

2.3

2.4

2.5

x

y

(b) Neighborhood graph

00.2

0.40.6

0.81

0.8

1

1.2

1.40.6

0.8

1

1.2

q2

q1

q 3

(c) Joint space

00.2

0.40.6

0.81

0.8

1

1.2

1.40.6

0.8

1

1.2

q2

q1

q 3

(d) Tangent space

Fig. 8. Learning the optimality manifold of a 3-link arm. (a) The planartask space of the arm and subsampled points (blue) used for leaning. (b)The neighborhood graph used for learning a manifold. (c) The optimalitymanifold that we wish to learn. Light gray points are not used for learningbut are plotted to give a better estimate of the geometry of the manifold. Notethat the manifold is not planar in the ambient space but twist and turns as wemove down the q3 axis. (d) The learnt tangent space model. Blue and greenarrows are basis vectors evaluated at points that correspond to the originalgrid.

−0.5 0 0.5 1

1.5

1.6

1.7

1.8

1.9

2

2.1

2.2

2.3

2.4

2.5

x

y

(a) Interpolation

−0.5 0 0.5 1 1.5 2 2.5 31

1.2

1.4

1.6

1.8

2

2.2

2.4

2.6

2.8

3

x

y

(b) Extrapolation

(c) Generalization errors (d) Time

Fig. 9. Results of the 3-link arm experiments. Novel task space trajectoriesproduced with random start and end points where (a) demonstrates gener-alization within the region of support of the data, while (b) demonstratesgeneralization beyond the region of support of the training data. (c) RMSEerror of generated trajectories against ground truth for the two cases. In theinterpolation scenario the error is practically zero (y axis in log-scale). (d)Absolute planning time for the two cases. Note that in the interpolation casethe length of the paths is consistently low.

12

In our experiment, we first have to choose a redundancyresolution strategy, which implicitly specifies the manifold thatwe will subsequently learn. Here, we choose the joint spaceconfiguration, q, that minimizes the distance to a convenience(robot default or minimum strain) pose, qc. Formally,

min ‖q− qc‖2 , (12)subject to f(q)− x = 0, (13)

where f is the forward kinematics and x is the goal endpointposition on the plane.

The resulting q’s trace a smooth nonlinear manifold in jointspace, depicted in Figure 8(c). We note that the manifold doesnot lie on a plane but on a convex strip that twists clockwiseand tightens as we travel down the q3 axis. Also differentredundancy resolution strategies would produce different op-timality manifolds. We note that, in general, this kind ofinformation may not be explicitly known (in the case of humandemonstration) or visualizable for more complex problems.

2) Implementation: The first step in data-driven learningof the desired manifold is to compute the neighborhood graphof the training data. We evaluate the task space distances tocompute the neighborhood graph with the constraint that thegraph contains a single connected component. In practice wegradually increase the neighborhood distance until all pointsare connected, as in Figure 8(b).

The tangent space that we wish to learn is inherently twodimensional. We learn a model of Hθ with 10 RBF’s and 100points, the blue points in Figure 8(c). We can subsequentlyevaluate Hθ at any point in our joint space. Figure 8(d) showsthe tangent bases evaluated at every point of the previouslygenerated grid. Note that the basis vectors are aligned andvary smoothly, i.e. we obtain a good generalization within theregion of support of the data.

3) Generation of novel reaching solutions: For measuringthe goodness of our learnt manifold, we use two metrics.Central to our aims is the generalization ability of the model.Thus we quantitatively evaluate the error of planned motionsagainst the poses that the original optimization procedurewould produce. We distinguish between two scenarios for ourmotion planning. The first evaluates the model’s interpolationability, generating trajectories that in task space lie within thegrid from which 100 points have been sampled for learning.The second case evaluates the extrapolation ability of themodel by generating trajectories, the endpoints of which lieoutside the original grid. In both cases start and endpointpositions in task space were random, while results are averagedover 10 trials for each scenario.

We create 50 optimal geodesic paths, with random start andend points for each case, with the method detailed in sectionIV-A2. Samples of such paths for both generalization casesare depicted in Figure 9(a) and (b) (grid points in light grayfor comparison).

We then collect all the intermediate points and computethe optimal solutions of their forward kinematics with theredundancy resolution algorithm detailed in section V-A1, asground truth. We compute the RMSE, for each trial and foreach case, between ground truth and prediction of model, fora total of 10 trials.

−0.5 0 0.5 1 1.5 2 01

2

0.5

1

1.5

2

2.5

q2(rad)

Manifold surface

q1(rad)

q 3(r

ad) −0.5 0 0.5

0.20.40.60.8

x(m)

y(m

)

Samples

(a) Samples.

01

2 01

2

0

0.5

1

1.5

2

2.5

q2(rad)

Tangent space basis

q1(rad)

q 3(r

ad) −0.5 0 0.5

0.20.40.60.8

x(m)

y(m

)

Neighborhood graph

(b) Learnt manifold.

−0.5 0 0.5 1 1.5 2 01

2

0.5

1

1.5

2

2.5

q2(rad)

Geodesic paths

q1(rad)

q 3(r

ad) −0.5 0 0.5

0.2

0.4

0.6

0.8

x(m)

y(m

)

Task space

(c) Generated geodesic paths.

Fig. 10. The manifold learning and usage for the 3link arm example. a)Starting with 100 datapoints in joint space, that correspond to task spacecoordinates as in the inset plot. b) The neighborhood graph in task space (insetplot), and the learnt tangent space that the model predicts for the RBF centersin the high dimensional space. c) Randomly sampled optimal (unconstrained)geodesic paths and corresponding task space trajectories in the inset plot.The thin gray trimesh is a densely sampled reconstruction of the underlyingsurface, used only for comparison and as a visualization aid.

The averaged errors are depicted in Figure 9(c). Note thatthe RMSE axis is in log-scale while the difference of the twobars is of 2 orders of magnitude. To be precise the averageRMSE for paths generated within the region of support ofthe data is 1.8935× 10−4 ± 3.6013× 10−5(practically zero),while beyond the support of the data the average RMSE is6.84×10−2±2.19×10−2. In addition, computing the optimalgeodesic paths takes less time on average (Figure 9(d)) in bothcases.

B. Constrained reaching on a robotic arm

Following the preliminary evaluation of the previous sec-tion, we show that the method can handle a more interestingredundancy resolution strategy. To demonstrate this we beginwith another reaching example where we also visualise thegenerated trajectories in the configuration space of the system.

1) Reaching examples: We randomly sample 100 Cartesianpoints from the top semicircle of the task space of the system.The dataset is 100 points of x and y pairs, where

x ={x ∈ [−1, 1]y ∈ [0, 1] (14)

13

We run the task space dataset through an iterative optimizationprocedure detailed below and get the corresponding joint spacedatapoints, q = (q1, q2, q3). A set of 100 such points isdepicted in Figure 10(a)), as black dots in joint space andtask space plots.

We densely sample the space with 900 more points thatare used solely for visualization purposes and play no furtherpart in the learning procedure. For visualization purposes,we use all 1000 points to compute a Delaunay triangulationof the joint space structure as sampled, and then plot thetrimesh (triangle mesh) for comparison with the paths thatour algorithm produces. This trimesh surface is depicted in allfigures with thin gray edges.

We choose a different redundancy resolution strategy, im-plicitly specifying the geometry of the manifold (Figure 10(a),thin gray mesh). Here, we choose the joint space configuration,q, that minimizes the absolute sum of joint angles, in adifferent view it minimizes the distance to a convenience(robot default or minimum strain) pose, qc = (0, 0, 0), withan additional weighting on the cost of each joint movement,wi. Formally,

min ‖wq− qc‖2 , (15)subject to f(q)− x = 0, (16)

where w is a weighting vector, f is the forward kinematicsand x is the goal endpoint position on the plane. We set w =(4, 2, 1), which means that the cost of the first joint offsetwill be four times as significant as the last joints angle, thuspenalizing more any motion of the first link. Such weightingis often used in robotics as, in a physical setting, moving thelast link only is much more energy efficient than moving thefirst link, as the first link will have to move the rest of thesystem’s weight as well.

The resulting q’s trace a smooth nonlinear manifold in jointspace, depicted in Figure 10(a). We note that the manifoldsurface resembles a convex strip that bends backwards towardsthe edges, much like a section cut of a bent tube. This is thesurface that points of the specific optimality criterion trace.Also different redundancy resolution strategies would producedifferent optimality manifolds. We note that, in general, thiskind of information is not explicitly known (in the case ofhuman demonstration) or even visualizable, for many complexproblems.

2) Implementation: As before we compute the neighbor-hood graph of the dataset and then learn the manifold model,Hθ (Figure 10(a) and 10(b)). We can subsequently evaluateHθ at any point in our joint space. For example Figure 10(b)shows the tangent basis evaluated at the centers of the RBFsused. Note that the basis vectors are aligned and vary smoothly,i.e. we obtain good generalization within the region of supportof the data. This way, in order to “walk” on the manifold weneed to evaluate the learned tangent basis and follow eachlocal frame for each consecutive step, in other words followthe blue and green arrows of Figure 10(b) for each point inquestion.

3) Generation of constrained reaching motions: Once wehave learnt a model of the manifold tangent basis we have

−0.5 0 0.5 1 1.5 2 01

2

0.5

1

1.5

2

2.5

q2(rad)

#3

Constrained geodesic paths

q1(rad)

#2

#1

q 3(r

ad)

−0.4−0.2 0 0.2 0.4 0.6

0.4

0.6

#1

#2 #3

x(m)

y(m

)

Task space avoidance

Fig. 11. Example geodesic trajectories that avoid point set obstacles on thelearnt manifold. The unconstrained geodesic trajectories in red are the initialestimates that either collide or come unsuitably close to the obstacle points.The blue trajectories are the outcome of optimizing these geodesic trajectorieswith the constrained geodesic trajectory generation procedure that allows toavoid smoothly novel obstacle points in the system’s state space. The insetplot demonstrates how the joint space trajectories are appear in the system’stask space. See text for details.

access to the geometric properties of the surface. Subse-quently the geometry of the manifold can be used to generateoptimal geodesic paths. The procedure for generating theseunconstrained paths was described in section IV-A2, andthe key advantage is that the generated paths will be ofshortest distance and adhere to the manifold geometry. Optimalgeodesic paths generated from randomly sampled start andend points are depicted in Figure 10(c), where the manifoldgeometry is also plotted (thin gray trimesh) for comparison.Note how the generated paths trace the underlying manifoldgeometry while also minimizing the deviation from a straightline connecting start and end points (non-geodesic minimumdistance). The resulting task space trajectories –the geodesicpaths run through forward kinematics– are also displayed inthe inset plot at the same figure. Note that the resulting taskspace trajectories are curved, an observation discussed in thenext subsection.

In most realistic scenarios, we need more control overthe geodesic paths that we generate. We would like to beable to specify patches on the manifolds that we would likethe generated paths to avoid, while preserving their geodesicproperties. This is accomplished by the procedure detailed insection IV-B1, and results are depicted in Figure 11. We startwith a set of random start and end points and pick a list ofobstacle points that intersect the manifold. In Figure 11 thepoints to be avoided appear as red circles, and effectivelytrace a patch that can be viewed as a “no-go” region onthe manifold. The red lines are the predicted geodesic pathsthat travel through the obstacle regions. The blue lines are theconstrained geodesic paths that are optimized with the obstaclepatches in consideration. The resulting task space behaviourfor this set of examples is visualized in the inset plot of thesame picture.

4) Remarks: One observation, regarding the shape of thetask space trajectories generated by geodesic paths, is that

14

the shortest path in the 3 dimensional joint space would be astraight line connecting the start and end points. The optimalgeodesic paths are the joint space trajectories that connect startand end points and minimize the deviation from a straight linewith respect to the manifold geometry. Now, the manifold isthe surface defined as the union of all joint space paths thatare optimal with respect to a specific redundancy resolutionstrategy. These are shortest paths that satisfy the optimalityrequirements implicitly encoded. In our scenario, the predictedtrajectories would be composed of a series of points thatminimize the weighted sum of joint angles, thus the task spacetrajectories would be subsequently optimized with respect tominimum angular change.

Another point is that the generation of geodesic paths ismore efficient – and much faster (∼ 1 sec vs. ∼ 3 sec) – thannumerical optimization as described in section V-B1, whileachieving (practically) equal results (approximation RMSE∼ 10−3). In other words we are able to approximate thesolution set of the costly optimization with an approximationthat is able to generate accurate solutions in a fraction of thecomputational cost. In addition we are able to lazily add novelconstraints and take them into account in the motion generativeprocess without impacting the optimality with respect to themanifold hypersurface.

C. Manifold control on the 3-link armFirstly, we encode the task in a skill manifold, a subspace,

in the underlying state space that is defined by the equivalenceclass of trajectories corresponding to various instances ofthe general task. Then, we define cost hypersurfaces thatpenalize deviations from this subspace of states within theambient space. For instance, if a task is defined by the needto gyrate ones body in a certain style of configurations thenall variations on that style are captured as trajectories alongthe skill manifold. Then, deviations from that style of gyration(i.e., away from the set of all feasible trajectories) would bepenalized according to the specific structure of the task - bydefining cost hypersurfaces with respect to that underlyingskill manifold. This yields a vector field in the ambient spacethat constitutes both a basic plan from an initial condition anda controller to counteract perturbations, as also discussed insection II-A.

1) Implementation: We use the implementation that wepresented previously in section V-B2.

2) Evaluation: To evaluate the accuracy of the model werandomly pick 100 start and end points and plan a trajectorybetween them, first with our method and second, with anaive quintic polynomial method as in Craig (1989) - ourchosen benchmark. We distinguish two cases; an unperturbedtrajectory, and a random perturbation occurring at t = 0.25(Figure 13). We calculate the average cost per trajectory andaverage over the results for each case (Table I). The evaluationshows that with the use of the manifold controller we achieveconsistently lower cost trajectories, while the difference ismultiplied in the case where a perturbation occurs. The in-terpretation being that the naive planner is forced to stay in ahigh cost patch of the state space while the manifold finds theappropriate short path to the cost-optimal surface.

00.5

11.5

2

00.5

11.5

2

0.5

1

1.5

2

2.5

q1(rad)q2(rad)

q 3(r

ad)

0 1 2

0.5

1

1.5

2

2.5

q1(rad)

q 3(r

ad)

Fig. 12. Randomly sampled points in state space (red) and correspondingmanifold projections (green). The inset plot is a different perspective of thesame figure. Note that these projections are indicative of the control action,i.e., control vector field, which is naturally different in different regions of theambient state space. (The thin gray trimesh is a densely sampled reconstructionof the underlying surface, the extra points being used only for comparisonand as a visualization aid.) A rotating 3D version of the plot is also availablein the accompanying video.

0 0.5 1 1.5 2 01

2

0.5

1

1.5

2

2.5

q2(rad)

projection

q1(rad)

start

end

q 3(r

ad)

(a) Perturbed geodesic path.

−0.4 −0.2 0 0.2 0.4 0.6 0.8

−0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

start

end

x(cm)

y(cm

)(b) Perturbation example projections.

Fig. 13. A typical trajectory resulting from this method. A geodesic pathfrom start to end is computed, with a random perturbation occurring at timet = 0.25 that pushes the state away from the manifold. This new stateis projected back on to the manifold to find the closest feasible state. Apath from the projected point to the goal is then executed before continuingalong. The task space trajectory with perturbation. The dashed blue line is theinitial predicted trajectory while the red line is the motion due to the (severe)perturbation occurring at the first red star. The state is then pushed away fromthe initial trajectory and a new path to the goal is replanned after the novelstate is projected on the learnt manifold.

VI. EXPERIMENTS WITH THE KUKA LIGHTWEIGHT ARM

A. Serving example with the Kuka Lightweight Robot arm

The next set of experiments demonstrates how our methodcan scale up to a higher dimensional problem and capturemore interesting behaviours. For this, we use the 7-DOF KukaLightweight Robot (LWR-III), shown in Figure 14(a).

The chosen movement corresponds to that of carrying atray (it helps if we imagine that it may be loaded with hightumblers) from a randomized start position to a randomizedgoal position while trying to minimize total joint motion.This task can be broken down into two costs that need tobe simultaneously optimized. The first of these must penalizedeviation from a flat end-effector configuration, while thesecond must minimize the angular displacement.

15

This problem is defined as:

min(J), (17)subject to f(q)− x = 0, (18)

where the cost, J , can be separated into two factors: J =J1 + J2, of the form:

J1 = wq2, (19)

J2 = T2pitch + T

2roll. (20)

Where T is the end-effector orientation with respect to theglobal frame of reference. As the system is redundant we usea non-linear optimization method to obtain random trainingpoints.

1) Implementation: As in the 3-link arm example, we startwith the nearest neighbour (NN) graph computation, wherewe gradually increase the neighbourhood distance until nodisconnected subsets exist. An NN graph is shown in Figure14(c), where we plot the end-effector positions that correspondto the sampled configurations and the graph edges that resultfrom the computation.

Our training set in this case consists of 100 data pointsthat have been collected by sampling random end-effectorpositions and then optimizing with the procedure describedearlier, Figure 14(b). The dimensionality of the system in thiscase is quite high, thus the sampling is necessarily sparse. Thedimensionality of the manifold has been set to 4, while using20 RBF kernels, as lower dimensional models did not achievean acceptable test error.

The dimensionality of the system prevents any meaningfulvisualization or qualitative observation about its geometry.Nonetheless the evaluation presented below reveals that thelearnt manifold is an accurate model of the cost metric presentin the demonstrated data, while planning and replanning withthis model is highly beneficial.

2) Evaluation: The experimental procedure was as follows.We collect a set of 100 joint space trajectories that areproduced by our method where the start and goal points aresampled randomly from the reachable space of the system. Wefurther generate trajectories that correspond to the same startand goal positions using interpolation with quintic polynomialsas in Craig (1989). We generate a random perturbation attime t = 0.25 and replan with both methods (Figure 15). Weevaluate the true cost for all sets and compute the average costper trajectory.

By comparing the resulting average costs we can see thatour method produces significantly lower cost trajectories, i.e.the deviation from a flat end-effector configuration is lowerwhile the angular displacement cost is also lower. The resultingbenefit is magnified in the case where the trajectories areperturbed as the resulting average costs show (Table I), wherethe naive method on average tilts the end-effector (tray) by0.55rad, which would be considered a task failure. This occursbecause in our method the system seeks to return to thespace of the demonstration data as soon as the perturbationceases and replans thereafter while following the manifold of(approximately) optimal cost. The naive method seeks to reach

(a) Robot.

−1−0.5

00.5

1

−1

−0.5

0

0.5

1−0.5

0

0.5

1

x(m)y(m)

z(m

)

XYZ

(b) Samples.

−1

−0.5

0

0.5

1

−1

−0.5

0

0.5

10

0.5

1

x(m)y(m)

z(m

)

(c) Neighborhood graph.

Fig. 14. The redundant Kuka Lightweight Arm. (a) Picture of the robot.(b) The kinematic model of the Kuka LightWeight Arm along with 100randomly sampled joint space endpoint positions. (c) The neighborhood graphthat results from the sampled joint space points.

−0.6−0.4

−0.20

−0.6

−0.4

−0.2

0

0.2

0.4

−0.2

0

0.2

0.4

0.6

x(m)y(m)

z(m)

start

e ndXY

Z

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2−2

−1.5

−1

−0.5

0

0.5

1

1.5

time(sec)

q1:7(rad)

Fig. 15. An example planned trajectory. The unperturbed trajectory appearsas a dashed line. The perturbation pushes the state away from the plannedtrajectory (red line). The solid blue line, originating at the point when theperturbation ceases (red star), is the replanned trajectory that extends fromthe projected state point and ends at the goal position. Inset plot: Examplejoint space trajectories including a large perturbation.

the goal without considering the underlying cost-optimal sub-structure thus is prone to spend the remaining time in a non-optimal part of the state space. In table I the evaluated costfor the predicted trajectories is also broken down to its twocomponents, the average angular displacement (J1) and theaverage displacement from a flat end-effector configuration(J2). Both metrics assert that the manifold method providessuperior results regarding the underlying cost metrics.

Figure 15 shows a typical run of the control strategy.Random start and end points are selected and we use ourmanifold-based method to generate a trajectory that reachesthe goal while satisfying the learnt cost (dashed blue line). At

16

Fig. 16. Two examples of perturbed trajectories with the Kuka LWR arm.The red line traces the path of the end-effector (the perturbation is seen asthe discontinuity in this trace). Time flows from left to right. Also availablein the accompanying video.

time t = 0.25, a random perturbation occurs that pushes thestate of the system away from the planned path. The controllerrecognizes the discrepancy between the planned and currentstate and computes the projection of the new state space ontothe manifold, along a direction that is orthogonal to costhypersurface. In other words, the system moves back towardsthe closest point that satisfies the learnt cost modeled by themanifold. A new path is replanned from this state towardsthe goal, the solid blue line originating from the red star andreaching the goal. Figure 16 contains snapshots from exampletrajectories on a realistic simulator4. See accompanying videofor further examples.

The computational cost of the proposed procedure scaleslinearly with the size of the model, i.e. the complexity iscoupled with the number of RBF kernels used. Planning atrajectory on average requires less than 0.1sec while projec-tions from random state space points on the learnt manifold onaverage require 0.09± 0.05sec, on standard commodity hard-ware running not particularly optimized code. This suggeststhat manifold (re)planning can be a viable solution for onlineusage.

As a final remark regarding evaluation, we note that ourmanifold learning algorithm was provided with data fromanalytically optimized trajectories rather than human demon-stration. We choose to do this with the aim of having preciselycontrolled experiments with clear ground truth. So, we are ableto make concrete statements about the ability of our methodsto recover the ‘correct’ solutions. As such there would beno difference if we replace these trajectories by, e.g. motioncapture data, although we wouldn’t be in a position to makesimilarly quantitative statements regarding performance.

VII. EXPERIMENTS WITH THE NAO HUMANOID ROBOT

This section presents how our framework can be used witha humanoid robot. We show that we are able to learn, froma limited number of stepping examples, a representation thatcaptures an approximately complete set of quasi-static walkingmotions. Furthermore, the produced motions can accommo-date novel starting positions and goals while producing good(stable) generalizations of the examples provided.

4The simulator is developed in SLMC and uses a rigorous analytical modelof the robot’s dynamics. Note that the dynamics model is used for physicallyrealistic simulation but is not available to the learning, planning or controlalgorithms.

TABLE IMEAN COSTS EVALUATED AGAINST THE TRUE COST FUNCTIONS. THE

RESULTS ARE MEAN±STANDARD DEVIATION OVER SETS OF 100 RANDOMSAMPLED TRIALS.

System Method Unperturbed Perturbedtraj. cost traj. cost

3-link naive 0.9239± 0.1799 1.401± 0.3610manifold 0.8724± 0.1723 1.214± 0.2597

Ku

motion planning and reactive control on learnt skill manifolds · 1 motion planning and reactive...

Documents