
Large-scale integrating project

Deliverable D1.1

Kinematic model of the human hand

Project acronym: DEXMART

Project full title: DEXterous and autonomous dual-arm/hand robotic manipulation with sMART sensory-motor skills: A bridge from natural to artificial cognition

Grant agreement no: FP7 216239

Project web site: www.dexmart.eu

Due date: January 31, 2009 Submission date: February 2, 2009

Start date of project: February 1, 2008 Duration: 48 months

Lead beneficiary: OMG Revision: 1

Nature: R Dissemination level: PU

R = Report

P = Prototype

D = Demonstrator

O = Other

PU = Public

PP = Restricted to other programme participants (including the Commission Services)

RE = Restricted to a group specified by the consortium (including the Commission Services)

CO = Confidential, only for members of the consortium (including the Commission Services)



ICT FP7 216239 DEXMART Deliverable D1.1

TABLE OF CONTENTS

1 Introduction
2 Hand modelling: state of the art
3 Motion capture
4 Kinematic model calibration
4.1 Least square calibration
4.2 Error analysis
4.3 Marker motion model
5 Methods for experimental validation
5.1 Statistical validation
6 Results
6.1 Marker motion model
6.2 Kinematic model selection
6.3 Evaluation of finger joints interdependencies
6.4 Marker set selection: human hand wearing a data-glove
7 Conclusions
A Bayes factors for marker regression: algebraic form




Figure 1: Human hand skeleton and articulations.

1 Introduction

The articulations of the human hand are more complex than the comparable articulations of other animals. The skeleton alone consists of 27 bones: 14 for the fingers, five metacarpals forming the palm and eight carpal bones in the wrist (Fig. 1; see footnote 1). Thanks to this complexity and to highly sensitive tactile feedback, humans can manipulate objects in the environment and execute complex tasks that are still out of reach for state-of-the-art robotic systems. As many of the tools that we use in our everyday life were designed for humans, robots that can manipulate the same tools would gain enhanced human interaction capabilities. Also, a natural ability to interact with the environment could ease the creation of richer training data sets and consequently enhance the robot's artificial intelligence.

To improve robotic capabilities we need a model that can reproduce the vast majority of hand manipulation tasks, that can be measured with available motion capture technology and that can serve as a path indicator for the evolution of robotic hands. Unfortunately, a model that emulates the human hand in toto leads to two major problems. First, the subtle movements of the wrist (carpal) bones are difficult to measure using non-invasive techniques. Second, a detailed robotic replica of the human hand is complex to implement. The good news is that an approximated articulation model that is capable of reproducing most of the common manipulation tasks can be implemented and, application-wise, it is often sufficient.

This report describes the research on kinematic models of the human hand conducted within the DEXMART project. In Sec. 2 we review existing state-of-the-art approaches adopted in a variety of research fields and applications. In particular, we analyse the different choices in terms of number of bones, number of joints and marker configurations. After a brief overview of the Vicon motion capture system (Sec. 3) we describe our contribution toward a more accurate subject calibration procedure (Sec. 4). To this end, we first present the standard procedure, then we analyse the dominant errors using Magnetic Resonance Imaging (MRI) and finally we propose a novel solution that aims at reducing soft tissue artefacts by explicitly modelling marker movements. Before presenting the experimental results we briefly discuss best practices for kinematic model evaluation (Sec. 5). The experimental validation in Sec. 6 assesses the performance of the proposed calibration procedure, compares different articulation models, and evaluates the level of interdependencies between joint parameters. Also, preliminary work on best marker set selection for combined optical and data glove based motion capture is presented. Finally, in Sec. 7 we draw our conclusions.

1 The original image is courtesy of Mariana Ruiz Villarreal, http://en.wikipedia.org/wiki/File:Scheme_human_hand_bones-en.svg.

2 Hand modelling: state of the art

Approximated kinematic models of the hand are a common research tool. The model accuracy mainly depends on the application requirements and consequently on the research field. The computer-vision community has long used kinematic models for research on hand tracking and gesture recognition [1, 2, 3, 4] (for complete overviews the reader should refer to [1, 3]). Due to the ambiguity of the measurements (i.e., the images), simple models are usually preferred. In their seminal work Rehg and Kanade [4] extract gradient-based image features and use a quadratic error minimisation procedure to track the hand motion. The kinematics are summarised by a 21+6 Degrees of Freedom (DoF) model which uses five DoF for the thumb and four for each of the other fingers; the remaining six DoF define the global position and rotation of the wrist in 3D space. In the decade that followed this publication other researchers adopted the same kinematic model [5, 6]. The agreement between independent publications suggests that a 21+6 DoF model is sufficient for gesture recognition and that tracking more complex models may not be feasible with state-of-the-art vision-based technologies. In fact, to the best of our knowledge none of the markerless trackers attempts to measure subtle articulation movements like palm arching.

The biomechanical research community has widely studied the human hand using optical or magnetic motion capture. As modern motion capture systems can provide a set of discrete positions with sub-millimetre accuracy, measurement ambiguity is a smaller problem than in markerless computer vision applications. Thanks to more accurate measurements, biomechanical researchers have been able to explore a wide variety of articulation models, looking for a faithful reproduction of the human hand kinematics. Baseline models for research on biomechanics also assume a rigid palm where the carpal bones are approximated by a single segment [7, 8] and the relative position of the four metacarpal bones (excluding the thumb) is fixed a priori [9, 10]. However, this type of model may not be appropriate to model object manipulation and grasping activities where the subtle deformations of the palm become relevant. A more accurate model is part of the Santos virtual human, a complete human body model used to simulate biological activity and ergonomics for military and industrial purposes [11]. The Santos hand has 25+6 DoF, with four DoF modelling flexion and adduction of the ring finger and pinky Carpometacarpal (CMC) joints. Several research works analyse the motion of the thumb in depth [12, 13, 14, 15]. Results show that the standard thumb model with three segments and five DoF is quite a crude approximation [15]. Although the motion of the CMC joint is dominated by the two DoF associated with flexion-extension (FE) and adduction-abduction (AA), significant variations in the pronation-supination (PS) direction exist. Further complexity is introduced by non-orthogonal and non-intersecting joint axes. To improve the model Hollister et al. [16] propose to replace the universal CMC joint with a two DoF saddle joint. More recently Chang et al. [15] found that with a saddle joint the PS residual is still significant, and they introduce a model with a third DoF.

The hand models proposed for computer graphics animation are also more complex than those used in vision-based applications. To model palm arching Yasumuro et al. [17] add to the 21+6 DoF model one DoF for each of the CMC finger joints. Albrecht et al. [18] use three DoF for the finger Metacarpophalangeal (MCP) joints and the thumb CMC joint and control the articulation via pseudo muscles. Sueda et al. [19] use tendons and muscles to introduce forces and pose constraints and to predict the 3D shape of the hand.

The study of the skin surface has also attracted the attention of the computer graphics and biomechanical communities. In computer graphics the aim is to produce realistic deformations of character surface meshes [20]. Usually, the positions of the mesh vertices are transformed according to the character's pose. Single Weight Enveloping (SWE) (Softimage, 1992), or "skinning" (Alias|Wavefront, 1998), is undoubtedly the most widely used and understood technique to model skin deformations. SWE computes the position of each vertex as a linear combination of the joint poses. Typically, only the poses of the closest segments are used and the animator manually adjusts the weights to improve the quality of the rendering. Although this technique is simple, it is also prone to the classic "pinching" at the joints, caused by the sub-space of the deformation function becoming degenerate and collapsing when joints are extremely flexed. To reduce pinching artefacts and increase the level of realism Wang et al. [21] still use linear transformations but add multiple weights per segment. Singh and Kokkevis [20] instead extend the work of Sederberg and Parry [22] and use control points aligned with the character skin and 3D Bezier control volumes to deform the skin according to the poses of the character skeleton. This method yields much more realistic results than SWE due to the Bezier sub-space deformation. Lewis et al. [23] propose a different approach that does not rely on initial manual tuning of the weights. Their method allows CG animators to bind the skin to a character skeleton in a number of key poses; the algorithm then interpolates the intermediate deformations using non-linear kernels. Unfortunately a large number of poses is required to generate the mapping for complex limbs like the hand. Anguelov et al. [24] propose a data-driven approach to learn secondary motion like muscle bulging. Their method applies a linear regression from the vertex positions to the skeleton poses. The problem is formulated as a large optimisation procedure that learns the model parameters and enforces mesh smoothness. Although the mesh deforms realistically, this approach is not suited to biomechanical analysis as it requires several range scans of the subject. Park and Hodgins [25] instead model the dynamic movements of the skin from a large set of markers.
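To make the SWE formulation concrete, the following minimal Python sketch blends, for each vertex, the positions obtained by applying each joint pose to the rest-pose vertex. The function name `skin_vertices`, the array layout and the two-joint toy data are assumptions made for illustration only; they are not taken from the cited works.

```python
import numpy as np

def skin_vertices(rest_vertices, joint_transforms, weights):
    """Single-weight (linear blend) skinning.

    rest_vertices:    (V, 3) vertex positions in the rest pose.
    joint_transforms: (J, 4, 4) homogeneous transforms mapping the rest
                      pose to the current pose, one per joint/segment.
    weights:          (V, J) non-negative weights, each row summing to 1.
    """
    n_vertices = rest_vertices.shape[0]
    homogeneous = np.hstack([rest_vertices, np.ones((n_vertices, 1))])  # (V, 4)
    # Position of every vertex as moved by every joint: (J, V, 4)
    per_joint = np.einsum("jab,vb->jva", joint_transforms, homogeneous)
    # Weighted linear combination over the joints: (V, 4)
    blended = np.einsum("vj,jva->va", weights, per_joint)
    return blended[:, :3]

def rot_z(angle):
    """Homogeneous rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    transform = np.eye(4)
    transform[:2, :2] = [[c, -s], [s, c]]
    return transform

# Toy example (hypothetical data): joint 0 stays fixed, joint 1 flexes by 90 degrees.
rest = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
transforms = np.stack([np.eye(4), rot_z(np.pi / 2)])
weights = np.array([[1.0, 0.0],   # vertex 0 follows joint 0 only
                    [0.5, 0.5]])  # vertex 1 is shared by both joints
deformed = skin_vertices(rest, transforms, weights)
```

In this toy case the shared vertex ends up closer to the joint than in the rest pose, a small-scale instance of the collapse that produces pinching at strongly flexed joints.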

Unlike computer graphics animators, biomechanicists model skin motion to reduce capture artefacts and consequently improve the accuracy of the joint angle measurements. Standard approaches to model fitting and calibration model the motion as Gaussian noise added to rigid marker positions [26]. Rigid markers are then fitted to the deformed 3D reconstructions in a least squares sense. Andriacchi et al. [27] propose an interesting solution to the soft tissue artefacts problem. Their design relies on distributing as many markers as possible on each segment. A mass is then assigned to each marker, and the centre of mass and the inertia tensor of the cluster are calculated on a frame-by-frame basis. By changing the masses the model adapts the motion of the markers to the skin. Despite its elegance in design and implementation, this method has been shown to be remarkably unstable by Cereatti et al. [28]. Cappello et al. [29] extend the popular CAST technique (Calibrated Anatomical System Technique) from Cappozzo et al. [30] by calibrating two distinct poses at the same time. The two sets of marker positions are interpolated using a linear function of the joint angle. Although simple and ultimately extensible to many poses, this method is limited by the linear transformation, which may not fit the skin motion.
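The per-frame quantities used by the marker cluster approach of [27] can be sketched as follows. This is only an illustration of how a centre of mass and an inertia tensor are obtained from weighted markers; the function name `cluster_frame` and the four-marker example are assumptions, and the mass adaptation step of the original method is not shown.

```python
import numpy as np

def cluster_frame(points, masses):
    """Centre of mass, inertia tensor and principal axes of a marker cluster.

    points: (N, 3) marker positions for one frame.
    masses: (N,) mass assigned to each marker.
    """
    points = np.asarray(points, dtype=float)
    masses = np.asarray(masses, dtype=float)
    centre = (masses[:, None] * points).sum(axis=0) / masses.sum()
    r = points - centre
    # I = sum_i m_i * (|r_i|^2 * Id - r_i r_i^T)
    squared_norms = (r ** 2).sum(axis=1)
    inertia = (masses * squared_norms).sum() * np.eye(3) \
        - np.einsum("i,ia,ib->ab", masses, r, r)
    # Principal moments (ascending) and principal axes (columns)
    moments, axes = np.linalg.eigh(inertia)
    return centre, inertia, moments, axes

# Hypothetical cluster: four unit-mass markers on a planar cross.
markers = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
centre, inertia, moments, axes = cluster_frame(markers, [1, 1, 1, 1])
```

The principal axes give a segment-fixed frame for each capture frame; changing the masses changes that frame, which is how the method adapts to skin motion.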

The interaction between bones, tendons and skin constrains the possible poses that the human hand can take. To reduce the tracking search space, and consequently the computational cost, Lin et al. [31] and the Santos model [11] impose constraints on the joint angle ranges. Also, although the parametrisation of the hand can have between 20 and 30 DoF, the physical action of the tendons imposes strong dependencies between different joints. These dependencies can be used to reduce the number of free parameters [32]. A simple rule of thumb for the Proximal Inter-Phalanx (PIP) and Distal Inter-Phalanx (DIP) joint angles is that $\mathrm{DIP} = \frac{2}{3}\,\mathrm{PIP}$. As long as the hand moves freely this rule holds. However, if the subject grasps an object like a pen or a knife, DIP can strongly deviate from the predicted angle $\frac{2}{3}\,\mathrm{PIP}$. Lin et al. [31] instead perform a Principal Component Analysis (PCA) of the finger motions. The analysis leads to a principled model of the inter-parameter dependencies. The authors show that, for tracking purposes, it is possible to represent the vast majority of the hand poses with just seven degrees of freedom. Also, the kinematic poses of the hand are weakly correlated with the body poses. Jin et al. [33] exploit this dependency to generate realistic hand animations.
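The effect of such dependencies on the number of free parameters can be illustrated with a small Python sketch. It generates synthetic (not measured) PIP angles for four fingers, derives the DIP angles from the two-thirds rule, and runs a PCA on the resulting eight parameters. The function names and the synthetic data are assumptions for illustration; this is not the analysis of [31].

```python
import numpy as np

def dip_from_pip(pip_angle):
    """Rule of thumb for a freely moving hand: DIP = (2/3) * PIP."""
    return 2.0 / 3.0 * pip_angle

def pca(joint_angles):
    """PCA via SVD of the centred data.

    joint_angles: (frames, parameters) array.
    Returns the principal directions (rows) and the fraction of the
    total variance explained by each of them.
    """
    centred = joint_angles - joint_angles.mean(axis=0)
    _, singular_values, components = np.linalg.svd(centred, full_matrices=False)
    variance = singular_values ** 2
    return components, variance / variance.sum()

# Synthetic data: 500 frames of PIP angles (degrees) for four fingers.
rng = np.random.default_rng(0)
pip = rng.uniform(0.0, 90.0, size=(500, 4))
dip = dip_from_pip(pip)            # DIP fully determined by the rule
angles = np.hstack([pip, dip])     # eight parameters per frame

components, explained = pca(angles)
# Number of components carrying a non-negligible share of the variance
n_effective = int(np.sum(explained > 1e-12))
```

With an exact dependency the eight parameters collapse onto four components. On real data the dependency is only approximate, so the trailing components carry a small but non-zero share of the variance and a threshold has to be chosen.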

Biomechanical research has investigated joint dependencies in more depth [34, 35, 36]. It is well known that when healthy humans attempt to move just one finger, the other fingers move as well. Also, the movements of the thumb, index finger and little finger are typically more independent than the movements of the middle or ring fingers. Simultaneous motion of non-instructed digits may result in part from passive mechanical connections between the digits, in part from the organization of multitendoned finger muscles, and in part from distributed neural control of the hand. Recent studies have demonstrated that mechanical coupling between the fingers, rather than limits in neuromuscular control, appears to be a major factor limiting the complete independence of finger movements [37]. Finger independence is generally similar during passive and active movements, but shows a trend toward less independence in the middle, ring and little fingers during active, large-arc movements. Mechanical coupling limits the independence of the index, middle and ring fingers to the greatest degree, followed by the little finger, and places only negligible limitations on the independence of the thumb. Studies involving simple grasping or skilled tasks have shown that a small number of combined joint motions (i.e., synergies) can account for most of the variance in observed hand postures, and that these postures are representative of most naturalistic postures during object manipulation. These synergies are used broadly across a variety of tasks, from simple hand motions, such as reaching for and grasping objects that vary in width, curvature and angle, to skilled motions such as precision pinch. These studies suggest that this small set of synergies represents the basic building blocks underlying natural human hand motions [38]. The degree of interdependence of the fingers depends on the flexion and extension movement of the fingers and on the frequency of rhythmic movements. Angular motion tended to be greatest at the middle joint of each digit, with increased angular motion at the proximal and distal joints during 3 Hz movements [35]. Nakamura et al. [34] discovered that the correlation between distal and proximal joints may depend on the grasped object. Also, while Hager-Ross and Schieber [35] simply quantify the dependency level between different fingers, Lee and Zhang [36] propose a control model that uses finger interactions to simulate the natural motion.

Another critical element for capturing the motion of the human hand is the marker configuration. Motion capture systems are a well established technology to measure the motion of the main human limbs. However, only recent advances in sensor resolution have allowed researchers to use small markers in medium-size (3 m or more) capture volumes, where natural movements of the hand are easier to reproduce. Nevertheless, selecting the correct number of markers and their positions is critical to minimise occlusions. In [39] 13 colour-coded markers are located at key hand locations: 5 on the finger tips, 4 on the PIP joints (not on the thumb), 3 on the MCP joints of the thumb, pinkie and index finger, and one on the wrist. In this case the large size of the markers heavily constrained their positioning and therefore subtle palm movements remained unobserved. Zhang et al. [8] use 21 markers for their experiments: one per finger tip, one per joint right above the joint, and one on the wrist. The knowledge of the relative position between markers and joints is then used to estimate the centres of rotation. A similar setup is also presented in [28] but without markers on the finger tips. For more accurate experiments up to six markers are used to capture the complex motion of the thumb CMC joint alone [15]. A larger number of markers is positioned on the hand by Cerveri et al. [9]. Although this setup requires a longer preparation, 42 markers can provide a certain level of redundancy in case of occlusions. Recently Baker et al. [40] used 24 markers with 4 mm diameter to capture the finger and wrist movements during computer keyboard usage. Finally, Cerveri et al. [9] showed that 24 markers are sufficient to capture simple tasks in a constrained scenario with fixed wrist position.

3 Motion capture

This section gives a brief overview of the VICON motion capture procedures. For the life sciences market, Vicon provides a motion capture software package called Nexus. This application allows a user to control the hardware, to post-process the marker data and to compute joint angles. A typical optical motion capture session includes the following steps:

1. Hardware setup.

2. Subject setup.



3. Motion data capture.

4. Subject calibration.

5. Kinematic fitting.

A brief summary of each operation follows:

Hardware setup: the first stage is to define the capture volume and position the VICON cameras so that the markers will be inside the field of view of at least two cameras at any given time. Then Nexus is used to calibrate the cameras and define the origin of the capture volume to produce accurate 3D data. Camera calibration enables Nexus to work out the positions, orientations and lens properties of all the cameras. The cameras are calibrated by waving a special wand in front of them throughout the area where 3D data will be captured. The same wand is then used to set the global coordinate system in the capture volume, so that subjects are displayed the right way up in the Nexus workspace. Fig. 2 (b) shows a visualisation of a calibrated camera setup for hand motion capture.

Subject setup: to capture motion information the user attaches a set of markers onto the subject's body surface. Then Nexus requires a description of the skeletal structure of the subject and of the marker set. This information is stored in a Vicon Skeleton Template (VST) file. In particular, for each subject the user specifies the approximate bone lengths, which markers are attached to which segment and the position of the markers in the segment coordinate system.

Motion data capture: the VICON cameras capture motion data by shining light from a strobe around the camera lens onto retro-reflective markers attached to the subject. These markers show up as very bright blobs in the camera image; the camera then fits circles to the blobs and extracts the blob centres. Every camera sends the positions of the extracted circles to the main processing unit in Nexus, which uses the calibration data from the cameras to calculate the positions of the centroids in 3D space. These are the so-called marker reconstructions (reckons). An example of reconstructed markers is shown in Fig. 2 (c).

Subject calibration: the information in a VST does not need to be 100% accurate. For example, the VST file for a hand may say that the index finger is 80 mm long, while the index finger of the specific subject is 90 mm long. As the name says, the VST is a template that should generalise across similar subjects. To this end a calibration procedure scales the VST parameters describing segment lengths, orientations and marker positions to the physical size of the subject. To calibrate the subject a special trial is usually recorded. In this trial the subject is asked to perform a set of actions that expose the Range Of Motion (ROM) of the articulations.

Kinematic fitting: for all the captured trials Nexus uses the calibrated subject to label the reckons and compute the joint angles by fitting the kinematic model to the data. The procedure works correctly as long as the markers are attached in the same configuration. Fig. 2 (c) shows a set of reconstructed and labelled markers. To compute the joint angles Nexus calculates the best fit between the specified marker positions on the subject and the reconstructed points from the captured data. These joint angles are often considered the final output of a motion capture pipeline and are used to drive 3D models of animated characters in films and games, or analysed in life sciences and engineering applications. Fig. 2 (e)-(f) shows examples of the hand subject kinematically fitted in different poses.
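As an illustration of the motion data capture step, the sketch below reconstructs one 3D marker position from the blob centres seen by two calibrated cameras using standard linear (DLT) triangulation. The internal algorithm used by Nexus is not described in this report, so this is a generic textbook technique, and the camera matrices `P1`, `P2` and the marker position are hypothetical values.

```python
import numpy as np

def triangulate(projections, centroids):
    """Linear (DLT) triangulation of a single marker.

    projections: list of 3x4 camera projection matrices (from calibration).
    centroids:   list of (u, v) blob centres, one per camera.
    Returns the 3D point that best satisfies all projections in the
    algebraic least squares sense.
    """
    rows = []
    for P, (u, v) in zip(projections, centroids):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.array(rows)
    # Solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    point = vt[-1]
    return point[:3] / point[3]

def project(P, point):
    """Pinhole projection of a 3D point with projection matrix P."""
    x = P @ np.append(point, 1.0)
    return x[:2] / x[2]

# Two hypothetical cameras with identity intrinsics, one unit apart along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
marker = np.array([0.2, -0.1, 3.0])
reckon = triangulate([P1, P2], [project(P1, marker), project(P2, marker)])
```

At least two views are needed for the linear system to have a unique solution, which matches the requirement that each marker stays in the field of view of at least two cameras.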

4 Kinematic model calibration

Figure 2: Vicon system (a) and motion capture intermediate results (b)-(f). (b): visualization of a calibrated camera setup; (c): 3D reconstructions of markers on a hand; (d): Vicon Skeleton (subject) for a human hand; (e)-(f): examples of the hand subject kinematically fitted in different poses.

As anticipated in the previous section, the VST template contains information about the subject. In particular, it defines the skeleton topology (i.e., the number of segments/bones $n_b$ and their hierarchy), the mapping between the $n_m$ markers and their parent segments, and the marker positions $M = \{m_i\}_{i=1}^{n_m}$ in the parent segment coordinate systems (see footnote 2). For each marker $i$ we define $S_i(\theta, \phi)$, a $4 \times 4$ matrix that transforms the local coordinates $m_i$ to the world coordinates. This transform depends on the joint angle state $\theta$ and on the vector of subject parameters $\phi$ (i.e., bone lengths and orientations). Given the kinematic chain we can decompose $S_i$ as the product of paired transformations, where each pair is composed of a fixed segment transformation $P(\phi)$ and a time-varying joint transformation $T(\theta)$. For example, if a marker $i$ is attached to a segment $c$, and $c$ is the third segment of a chain $a$, $b$ and $c$, then

$$S_i(\theta, \phi) = P_a(\phi)\,T_a(\theta)\,P_b(\phi)\,T_b(\theta)\,P_c(\phi)\,T_c(\theta). \tag{1}$$

Subject calibration adapts the model to the physical dimensions of the subject and is formulated as an optimisation problem over the segment parameters $\phi = \{l_j\}_{j=1}^{n_b}$ as well as over the marker positions $M$.
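The chain decomposition of Eq. (1) can be sketched in Python as a product of alternating fixed segment transforms and joint transforms. The conventions used here are assumptions chosen to keep the example small: each segment transform is a pure offset along the local x axis, and each joint has a single rotational DoF about the local z axis. Real hand joints have more DoF and arbitrary axes.

```python
import numpy as np

def segment_transform(length):
    """Fixed segment transform P: offset along the local x axis (assumed convention)."""
    P = np.eye(4)
    P[0, 3] = length
    return P

def joint_transform(angle):
    """Time-varying joint transform T: one-DoF rotation about local z (assumed)."""
    c, s = np.cos(angle), np.sin(angle)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def marker_to_world(lengths, angles):
    """Compose S = P_a T_a P_b T_b P_c T_c ... along the chain, root first."""
    S = np.eye(4)
    for length, angle in zip(lengths, angles):
        S = S @ segment_transform(length) @ joint_transform(angle)
    return S

# Hypothetical three-segment chain a, b, c (units: mm and radians).
lengths = [0.0, 40.0, 25.0]          # subject parameters
angles = [0.0, np.pi / 2, 0.0]       # joint angle state for one frame
m_local = np.array([10.0, 0.0, 0.0, 1.0])   # marker in segment c, homogeneous
m_world = marker_to_world(lengths, angles) @ m_local
```

Changing `lengths` corresponds to subject calibration, while changing `angles` corresponds to kinematic fitting of a single frame.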

4.1 Least square calibration

VICON subject calibration is formulated as a non-linear least squares problem where each marker generates one observation (i.e., a reckon) $r_{i,k}$ per frame $k$ according to a Gaussian model, that is

$$r_{i,k} = S_i(\theta_k, \phi)\,\left[m_i + n_{i,k}\right], \tag{2}$$

where $n_{i,k}$ is a Gaussian distributed random vector with zero mean and covariance $\Sigma_i$.

Given a set $K$ of preselected key frames from the ROM trial, the objective function $f(\cdot)$ to be minimised is the sum of squared differences between marker positions and the reckons, that is

$$f(\Theta, M, \phi) = \sum_{k \in K} \sum_{i=1}^{n_m} \left\| f_{i,k}(\theta_k, m_i, \phi) \right\|^2 \tag{3}$$

$$\phantom{f(\Theta, M, \phi)} = \sum_{k \in K} \sum_{i=1}^{n_m} \left\| \Sigma_i^{-1/2} \left( S_i(\theta_k, \phi)^{-1}\, r_{i,k} - m_i \right) \right\|^2, \tag{4}$$

where $\Theta = \{\theta_k\}_{k \in K}$ denotes the joint angles for all the key frames and $f_{i,k}(\cdot)$ outputs the per-frame and per-marker residual. In a typical calibration scenario the number of free parameters may be high, adding up to several thousand. The solution of such a large problem is found via a conjugate gradient-based iterative procedure. The quality of the final results depends above all on whether the model in Eq. (2) can predict the reconstructed data. To point out the limitations of the calibration procedure, in the next section we analyse the residual errors.
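The structure of the residual can be illustrated with a deliberately simplified Python sketch: for a single marker, each reckon is mapped back into the parent segment frame and compared with the local marker position. The full calibration jointly optimises joint angles, subject parameters and marker positions with an iterative solver; here the segment poses are assumed to be known and the covariance is assumed to be the identity, in which case the best marker position is simply the mean of the back-projected reckons. All names and values are hypothetical.

```python
import numpy as np

def pose(angle, tx):
    """Hypothetical segment-to-world transform: rotation about z plus x offset."""
    c, s = np.cos(angle), np.sin(angle)
    S = np.eye(4)
    S[:2, :2] = [[c, -s], [s, c]]
    S[0, 3] = tx
    return S

def residuals(poses, reckons, m):
    """Per-frame residuals: reckon mapped to the segment frame minus m."""
    return np.array([np.linalg.solve(S, r) - m for S, r in zip(poses, reckons)])

def objective(poses, reckons, m):
    """Sum of squared residuals for one marker (identity covariance assumed)."""
    return float((residuals(poses, reckons, m) ** 2).sum())

def best_marker_position(poses, reckons):
    """Minimiser over m with the poses held fixed: mean of back-projected reckons."""
    local = np.array([np.linalg.solve(S, r) for S, r in zip(poses, reckons)])
    return local.mean(axis=0)

# Four key frames with known poses and a marker that slides on the skin.
poses = [pose(0.0, 0.0), pose(0.3, 5.0), pose(0.9, -2.0), pose(1.4, 1.0)]
m_true = np.array([10.0, 5.0, 2.0, 1.0])
offsets = np.array([[1.0, 0, 0, 0], [-1.0, 0, 0, 0],
                    [0, 0.5, 0, 0], [0, -0.5, 0, 0]])   # zero-mean displacements
reckons = [S @ (m_true + n) for S, n in zip(poses, offsets)]

m_hat = best_marker_position(poses, reckons)
f_min = objective(poses, reckons, m_hat)
```

Even at the optimum the objective stays above zero, because a single rigid marker position cannot explain displacements that change from frame to frame.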

4.2 Error analysis

The model in Eq. (2) treats the marker positional error as a residual covariance. This assumption simplifies the mathematical formulation of the subject calibration problem; however, as shown in Figure 3, the distribution of the marker residuals may differ significantly from a Gaussian function.

Model prediction errors have two major causes: (i) sensor noise that affects the reconstruction of the 3D marker positions; (ii) marker movements due to skin and, in general, soft tissue deformations caused by pose changes and bulging muscles. Although we can reasonably assume Gaussianity of the sensor noise, the residual marker movements due to skin or cloth sliding require further study. As we will see in the following sections, a better understanding of skin motion will lead us to formulate an improved calibration procedure where marker movements with respect to the parent segments are explicitly accounted for.

2 Note that the positional coordinates (local and global) are defined by homogeneous 4D vectors, e.g., $m = [m_x \; m_y \; m_z \; 1]$.


Figure 3: Example of 3D residuals on hand motion capture (four 3D scatter plots with axes x, y and z). The red dots show the difference vectors between predicted marker positions and reconstructed measurements. The standard VICON calibration assumes the residuals to be Gaussian distributed. The plots show that a Gaussian does not approximate the data distribution well.



Figure 4: Right hand - labeled marker.

At the Second University of Naples this point was investigated using hand capture data from a Magnetic Resonance Imaging (MRI) device. The hand was captured with different marker setups. We also performed static and sequential MRI acquisitions of different hand poses that simulate the tasks proposed in the DEXMART testing scenario [41]. We then extracted from the MRI data the displacements of markers placed on the back of the hand. In particular, in this study we analyse the marker movements caused by a set of predefined flexions of the fingers. The displacement of each marker relative to the underlying bone is observed and quantified.

First, we used the MRI equipment to capture a static hand in two different poses and we reconstructed the three-dimensional models of the hand bones. Then, reflective markers were attached to the subject's hand (see Fig. 4) and a sequential protocol was used to track their position in two different postures. To validate the static data a dynamic MRI scan of the sensorised hand was also performed. No significant differences were measured between the static and dynamic displacements. Some authors [42, 43], reporting on kinematic studies based on MRI acquisition techniques, have stressed the importance of acquiring joint motion actively, because statistically significant variations exist between active and passive acquisitions. Unlike for other articulations such as the knee and hip [44, 45, 46], this does not apply to the evaluation of soft tissue artefacts on the back of the hand. In active acquisition, no abnormal tracking patterns due to the influence of hand muscles and tendons were observed during the flexion and extension of the fingers.

Although MRI data usually show relevant differences across subjects in soft tissue elasticity, and these differences depend on the subject's weight, height and age, for the purpose of our experiment we considered the variation of the distance between marker and bone reference to be small and therefore subject independent. Our experiments were executed on a healthy male subject. The hand is approximately 20.5 cm long. The subject consented to the use of his anatomical data for scientific purposes.

MRI acquisition

The MRI scanning was performed at the University of Naples Federico II with a 1.5 T station manufactured by Philips Medical Systems. We captured the subject in supine position with the right arm on top of the body. Two high-resolution MRI scans of the right hand containing thin axial slices were obtained. The two series have a small Field Of View (FOV) as they measure one hand only. Two series of T1-weighted spin echo images and two series of T1-weighted gradient echo images were acquired with one frame every 11.4 ms, a 4.4 ms echo and a 250 mm FOV. The surface markers in the MR images look like small cylinders, as highlighted by the arrows in Figs. 6, 7 and 8.

Figure 5: MRI processing pipeline.

For each hand pose one hundred images, each representing a slice of the hand, were generated. The spacing between slices is 1.5 mm. Each image is 256 × 256 pixels in size, with 8 bits per pixel, and each pixel covers a physical area 0.98 mm wide. In the first posture the slicing plane is parallel to the longitudinal direction of the fingers, while in the second pose the plane is perpendicular to the longitudinal direction of the fingers.
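The acquisition parameters above fix the conversion between voxel indices and physical distances, which is what the marker-to-bone measurements rely on. The following Python sketch shows that conversion. The axis ordering, the choice of origin at the first voxel and the two example voxels are assumptions for illustration; the actual DICOM headers would provide the authoritative geometry.

```python
import numpy as np

PIXEL_SIZE_MM = 0.98      # in-plane pixel size
SLICE_SPACING_MM = 1.5    # distance between consecutive slices
IMAGE_SIZE = 256          # pixels per image side
N_SLICES = 100            # slices per hand pose

def voxel_to_mm(row, col, slice_index):
    """Voxel index to volume coordinates in mm (origin at the first voxel, assumed)."""
    return np.array([col * PIXEL_SIZE_MM,
                     row * PIXEL_SIZE_MM,
                     slice_index * SLICE_SPACING_MM])

def distance_mm(voxel_a, voxel_b):
    """Euclidean distance in mm between two voxels given as (row, col, slice)."""
    return float(np.linalg.norm(voxel_to_mm(*voxel_a) - voxel_to_mm(*voxel_b)))

# In-plane extent of one image and span between the first and last slice.
in_plane_extent_mm = IMAGE_SIZE * PIXEL_SIZE_MM
slice_span_mm = (N_SLICES - 1) * SLICE_SPACING_MM

# Hypothetical marker and bone-head voxels, 10 pixels and 4 slices apart.
d = distance_mm((100, 100, 10), (100, 110, 14))
```

The in-plane extent of 256 × 0.98 mm is about 250.9 mm, which is consistent with the 250 mm FOV quoted for the acquisition. Because the slice spacing is larger than the pixel size, the resolution of any distance measurement depends on its direction relative to the slicing plane.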

MRI processing

In the first stage of the processing a semi-automatic analysis was conducted. We used the software Vitrea ver. 2.0 by Vital Images Inc. for 2D and 3D visualization and editing of the MR image data. The software can convert the scans (DICOM format) into many different image formats, segment the region of interest and generate iso-surfaces. We also performed a manual segmentation to distinguish bones and surface markers from soft tissues. First, unwanted pixels were removed by tuning the threshold value to 60. Then, a contour tracing method was used to identify the object edges.

In the second stage of the processing, a more accurate measurement of the sliding of the metacarpal markers was carried out without any manual editing of the recorded images. For this purpose, we used a method for the co-registration of the MR images of the two hand poses (pose 1: open hand; pose 2: closed hand). The method was developed at the Biostructure and Bioimaging Institute (IBB) of the Italian National Research Council (CNR). We used the SPM software to implement the automatic coregistration of the MRI data; SPM is a suite of MATLAB functions and subroutines, typically used for functional PET and MRI brain image analysis, that implements "statistical parametric mapping". First, the MR image sequences (pose 1 and pose 2) are smoothed and filtered to eliminate artefacts in the coregistration process caused by the fat and skin visible in the recorded images. The algorithm works by minimizing the sum of squared differences between the images that are to be coregistered. The first step of the process is to determine the optimum 12-parameter affine transformation. Initially, the coregistration is performed by matching the whole hand in the two poses. Following this, the registration proceeds by matching only the metacarpal bones, through appropriate weighting of the voxels. A Bayesian framework is used, such that the registration searches for the solution that maximizes the a posteriori probability of being correct, i.e., it maximizes the product of the likelihood function (derived from the residual squared difference) and the prior function (which is based on the probability of obtaining a particular set of zooms and shears). The affine registration is followed by the estimation of nonlinear deformations, which are defined by a linear combination of three-dimensional discrete cosine transform (DCT) basis functions. The parameters represent coefficients of the deformations in three orthogonal directions. The matching involves simultaneously minimizing the bending energies of the deformation fields and the residual squared difference between the images.

Table 1: Pair-wise marker distances with open and closed hand. All distances are in millimetres.

| First marker | Second marker | Open | Closed | Diff. |
|---|---|---|---|---|
| RMM4 | RH4 | 19.1 | 25.4 | -6.3 |
| RMM4 | RH6 | 27.8 | 31.8 | -4.0 |
| RMM2 | RH1 | 34.0 | 36.2 | -2.2 |
| RMM2 | RH3 | 20.5 | 24.3 | -3.8 |
| RMF1 | RH3 | 13.5 | 16.9 | -3.4 |

Table 2: Distances between a marker and the relative bone head. The radius bone is used as a reference. All distances are in millimetres.

| Marker ID | Direction | Bone | Open | Closed | Diff. |
|---|---|---|---|---|---|
| RH6 | Prox. | Carpal | 15.3 | 13.8 | +1.5 |
| RH6 | Distal | Carpal | 18.6 | 14.2 | +4.4 |
| RMM4 | Prox. | 4th metac. | 14.6 | 18.6 | -4.0 |
| RMM4 | Distal | 4th metac. | 43.5 | 40.5 | -3.0 |
| RMM2 | Prox. | 3rd metac. | 24.7 | 30.4 | -5.7 |
| RMM2 | Distal | 3rd metac. | 44.6 | 37.8 | +6.8 |
| RMF2 | Prox. | 2nd mid. pha. | 6.3 | 5.8 | +0.5 |
| RMF2 | Distal | 2nd mid. pha. | 21.2 | 23.8 | -2.6 |
| RH3 | Distal | 3rd metac. | 20.1 | 13.8 | +6.3 |

Results

In the first stage of the analysis, after registration of the 3D hands, we performed two different measurements. First we measured the distance of a marker from a reference bone in both poses (Fig. 6 (c)-(f)). Then, we measured the pair-wise distances between metacarpal markers and their variations due to pose changes (Fig. 7 (e)-(f)). Tables 1 and 2 summarize the results.

The results in Tab. 1 show that when the hand is flexed the distance between markers increases, due to skin stretch and muscle deformation. The markers slide over the bones while the hand moves, and this

Figure 6: MRI measurements for the marker RH6 on open (left column) and closed (right column) hand. (a)-(b): marker spatial position (yellow arrow); (c)-(d): distance from radius proximal head; (e)-(f): distance from metacarpal proximal head.

Figure 7: MRI measurements for the marker RMM4 on open (left column) and closed (right column) hand. (a)-(b): marker spatial position (yellow arrow); (c)-(d): distance from metacarpal proximal head; (e)-(f): distances from markers RH4 and RH6.

Figure 8: MRI measurements for the marker RMF2. The blue lines show the distances between the marker and the 3rd middle phalanx bone heads.

causes large residual errors during calibration and fitting with the optical system. In absolute value, RMM4 moves by 6.3 mm and 4.0 mm with respect to RH4 and RH6, respectively. This indicates that the skin deformation follows a non-linear law: the distance between RH4 and RMM4 changes by about 33% of the initial distance, while the distance between RMM4 and RH6 changes by about 14%. These measurements give us a good starting point to correct passive optical data. The results for the RH1, RMM2 and RH3 chain are even clearer. The slide between RMM2 and RH1 is 6%, while RH3 slides by 18%. RMF1 slides by 25% with respect to RH3, and we still have to consider that RH3 is moving too. The results in Tab. 2 show that the sliding of RH6 is apparently incongruent: closing the hand does not stretch the skin, as one might expect, but contracts it. This happens because the subject not only closes the hand, but also moves it. This causes a larger skin slide of about 24% toward the metacarpal and 9% toward the radius. As a result RH6 moves toward the metacarpal. RMM4 instead shifts toward the radius because of the relative hand-radius movement, while RMM2 moves toward the middle finger. Conversely, RMF2 moves toward the metacarpal zone (Fig. 8). Finally, the skin sliding causes RH3 to move toward the 3rd phalanx.
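The relative slides quoted above can be re-derived from the open and closed distances of Table 1. The following is only a minimal sketch of that arithmetic; the names `TABLE_1` and `relative_slide` are introduced here for illustration and are not part of the original analysis. The computed values agree with the quoted percentages to within about one percentage point.

```python
# Open/closed pair-wise marker distances in mm, copied from Table 1.
TABLE_1 = {
    ("RMM4", "RH4"): (19.1, 25.4),
    ("RMM4", "RH6"): (27.8, 31.8),
    ("RMM2", "RH1"): (34.0, 36.2),
    ("RMM2", "RH3"): (20.5, 24.3),
    ("RMF1", "RH3"): (13.5, 16.9),
}


def relative_slide(open_mm, closed_mm):
    """Absolute distance change as a fraction of the open-hand distance."""
    return abs(closed_mm - open_mm) / open_mm


# Relative slide for every marker pair (hypothetical helper dictionary).
slides = {pair: relative_slide(*dist) for pair, dist in TABLE_1.items()}

for (first, second), frac in slides.items():
    print(f"{first}-{second}: {100 * frac:.1f}% of the open-hand distance")
```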

In the second stage of our analysis, the output of the coregistration process provided a more reliable measurement of the displacement of the metacarpal markers between the two hand poses (Fig. 9). The first step of the coregistration process required filtering the two MRI series. The lack of the antenna during the acquisition phase produces noise, which is reduced by dividing the images by a parabolic fit. Another filtering step was necessary to reduce anisotropic noise while preserving the parts of the images with higher gradients (edge preserving). Then, a level setting process was applied to eliminate the parts of the images that are not of interest (Fig. 10). Table 3 summarizes the results.

The results in Tab. 3 show that the metacarpal markers move significantly over the bones while the hand moves from pose 1 to pose 2. The most important contribution is given by the displacement along the axial direction of the hand (Y RANGE). The largest displacement happens on the third metacarpal bone in the vicinity of the proximal phalanx (middle finger). RH3 slips by about 11 mm along the direction of the 3rd metacarpal bone, while RH2 and RH4 cover a slightly shorter distance. The distances covered by markers RMM2 and RMM4 are 68% and 45% of the maximum displacement, respectively. Also, the marker RH5 slips by approximately 68% of the maximum displacement. We can see that the sliding is maximum in the middle of the back of the hand and decreases more towards the wrist and the little finger than towards the thumb. The significant variations along the other two directions must also be considered in order to reduce the residual errors during subject calibration and fitting with an optical capture system.
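As a quick consistency check, the first distance column of Table 3 can be recomputed as the Euclidean (L2) norm of the three per-axis ranges, and the shares of the maximum displacement quoted above follow from it. This is a minimal sketch; `TABLE_3`, `l2_norm`, `dist` and `share` are illustrative names, not part of the original processing chain.

```python
import math

# Per-axis displacement ranges (X, Y, Z) in mm, copied from Table 3.
TABLE_3 = {
    "RH2": (0.94, 10.85, 1.07),
    "RH3": (2.01, 10.91, 2.23),
    "RH4": (1.29, 9.15, 4.05),
    "RH5": (0.01, 6.07, 4.83),
    "RMM2": (0.24, 7.61, 1.13),
    "RMM4": (0.54, 4.96, 1.32),
    "RH1": (0.55, 5.24, 1.68),
    "RH6": (1.01, 3.82, 3.03),
}


def l2_norm(vec):
    """Euclidean length of a displacement vector."""
    return math.sqrt(sum(c * c for c in vec))


# Total displacement per marker and the marker that slides the most.
dist = {marker: l2_norm(xyz) for marker, xyz in TABLE_3.items()}
max_marker = max(dist, key=dist.get)

# Displacement of each marker as a fraction of the maximum one.
share = {marker: d / dist[max_marker] for marker, d in dist.items()}

for marker in TABLE_3:
    print(f"{marker}: {dist[marker]:.2f} mm, {100 * share[marker]:.0f}% of max")
```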

We have also analyzed the MRI scans of a gloved hand in the two hand poses. The considerable noise due to the presence of the glove made it impossible to obtain good image sequences after the filtering and segmentation phases of the coregistration process. As can be seen (Fig. 11 (c)-(d)), many markers

Figure 9: Coregistration of MR images for the two hand poses. (a)-(b): the two poses with labelled markers; (c)-(d): the MR images of the two poses, with labelled metacarpal markers.

Figure 10: Coregistration of MR images for the two hand poses. (a)-(b): filtered images (left column) and MR images after the coregistration (right column).


Table 3: Distances between metacarpal markers in the two hand poses after the coregistration process. The distance measures are in millimetres.

MARKER ID   DISTANCES
            L2-DISTANCE   X RANGE   Y RANGE   Z RANGE
RH2         10.94         0.94      10.85     1.07
RH3         11.31         2.01      10.91     2.23
RH4         10.08         1.29       9.15     4.05
RH5          7.76         0.01       6.07     4.83
RMM2         7.70         0.24       7.61     1.13
RMM4         5.16         0.54       4.96     1.32
RH1          5.52         0.55       5.24     1.68
RH6          4.98         1.01       3.82     3.03

are lost after the filtering process and the segmentation of the metacarpal bone images is not correct. We believe that a different MRI acquisition protocol must be investigated to obtain a reliable coregistration process for the gloved hand.

Residual correlation analysis

To better understand the soft tissue artefacts we analysed the calibration result from a right hand ROM trial. A preliminary visual inspection of the results using NEXUS showed that a dependency between marker motion and joint angles does indeed exist. The plots in Fig. 12 (a)-(c) show three typical marker motion behaviours. Each graph displays one component of the unnormalised marker to reckon residual

$$ d_{i,k} = S_i(\theta_k, \phi)^{-1}\, r_{i,k} - m_i, \quad \text{with } d_{i,k} \in D, \tag{5} $$

plotted against the maximally correlated joint angle. As a first approximation, a good portion of the residuals present a simple linear relationship with the joint angle (Fig. 12 (a)). The approximation holds when the joint angle range is small. However, extreme finger flexion may result in a non-linear relationship (Fig. 12 (b)). Finally, as shown by the residual-parameter correlation matrix in Fig. 12 (d), marker motion can be highly correlated with more than one joint parameter (i.e., matrix rows with more than one red cell). Consequently a model with a single input cannot provide good predictions (Fig. 12 (c)).

4.3 Marker motion model

The residual error analysis showed that it may be possible to improve the predictive power of the kinematic model by explicitly accounting for marker movements. To this end we propose to model the marker position as a parametric function $m_i(\theta, w_i)$ of the joint state $\theta$ with parameters $w_i \in W$. In particular we chose a linear form, $m_i(\theta, w_i) = F(\theta) w_i$, where

$$ F(\theta) = \begin{bmatrix} \psi_x(\theta)^T & 0 & 0 & 0 \\ 0 & \psi_y(\theta)^T & 0 & 0 \\ 0 & 0 & \psi_z(\theta)^T & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$

Figure 11: Gloved hand for the coregistration process; open (left column) and closed (right column) hand. (a)-(b): labelled markers; (c)-(d): output after the filtering and segmentation process.

Figure 12: Visual analysis of the relationship between joint parameters and unnormalised marker to reckon residuals. (a)-(c): sample residual components d (in mm) plotted against the maximally correlated joint parameter (joint angle, in radians). (d): correlation coefficient absolute values.


contains the regressor vectors $\psi(\theta)$ for each of the three positional coordinates. In our implementation the regressors are simple polynomial components. The modified variable position marker model becomes

$$ r_{i,k} = S_i(\theta_k, \phi)\,[F(\theta_k)\, w_i + n_{i,k}]. \tag{6} $$

Note that, if the polynomials are zero-order (i.e., $F(\theta) = I$), Eq. (6) reduces to Eq. (2) with $m_i = w_i$. To limit the number of additional parameters it is not desirable to fix the polynomial order for all the residuals or to have polynomial components for all the joint parameters in $\theta$. As shown in Fig. 12, each residual is usually correlated with a limited number of joint angles, and in some cases non-linear components may not be necessary. At the same time, favouring simpler models with a low number of extra parameters improves the computational efficiency of the calibration step and prevents model overfitting. A model selection procedure is required to find the correct balance between the number of extra parameters and the quality of the fit. As in other model selection problems, our goal is to add only those parameters which contribute to a significant reduction of the overall residual.
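To make the linear form of Eq. (6) concrete, the sketch below assembles a block-diagonal regressor matrix with polynomial components and evaluates the marker position as the product of that matrix with a weight vector. It is an illustration under stated assumptions, not the implementation used for the experiments: the function names, the `orders` dictionaries (joint index mapped to highest polynomial order) and the numeric values are all hypothetical.

```python
def poly_regressors(theta, orders):
    """Regressor vector [1, theta_j, theta_j**2, ...] for one coordinate.

    `orders` maps a joint index j to the highest polynomial order used
    for that joint; an empty dict gives the zero-order model [1].
    """
    psi = [1.0]
    for j, order in sorted(orders.items()):
        psi.extend(theta[j] ** p for p in range(1, order + 1))
    return psi


def build_F(theta, orders_xyz):
    """Block-diagonal matrix: one regressor row per x, y, z plus a final 1."""
    blocks = [poly_regressors(theta, o) for o in orders_xyz] + [[1.0]]
    width = sum(len(b) for b in blocks)
    rows, col = [], 0
    for block in blocks:
        row = [0.0] * width
        row[col:col + len(block)] = block
        rows.append(row)
        col += len(block)
    return rows


def marker_position(F, w):
    """Homogeneous marker position m = F w."""
    return [sum(f * wi for f, wi in zip(row, w)) for row in F]


theta = [0.3, -0.1]                      # two joint angles in radians (made up)
F_static = build_F(theta, [{}, {}, {}])  # zero-order: F is the 4x4 identity
F_moving = build_F(theta, [{0: 2}, {}, {}])  # x depends on theta_0 up to order 2
m = marker_position(F_moving, [10.0, 2.0, 1.0, 5.0, 7.0, 1.0])
```

With zero-order regressors the matrix is the identity, so the weights coincide with a static marker position, which mirrors the remark that Eq. (6) reduces to Eq. (2).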

Given a measure of model quality, the optimal model selection scheme would require optimising the parameters for all the combinations of regressors. Unfortunately, as the parameter optimisation is an expensive procedure on its own, a suboptimal but faster methodology is necessary. To this end we propose a procedure that requires the optimisation of two models only.

The outline of the procedure is in Algorithm 1. First we calibrate the standard model defined in Eq. (2).Then we perform an analysis of the residual error to select the additional polynomial parameters. Finally,we calibrate the new model where the markers move according to the polynomial functions.

To perform model selection we treat the residual components independently. Therefore, it is convenient to define $d_k = [d_{1,k} \dots d_{n_m,k}]$ as the concatenation of the unnormalised residuals at frame $k$, and an index $u = 1, \dots, U$ for the single residual components $d^{(u)}_k$. Also, we define the 1D polynomial function $g^{(u)}(\theta, w^{(u)}) = \psi^{(u)}(\theta)^T w^{(u)}$ modelling the marker motion for the component $u$. For each residual the model selection procedure is composed of two steps: (i) first we analyse the correlation between $\theta_k$ and $d^{(u)}_k$; those parameters with correlation larger than a threshold $T$ are selected as active inputs for the skin model function $g^{(u)}$ (Algorithm 2); (ii) then we initialise the set $\psi^{(u)} = \{1\}$ with the zero order regressor only and for each input we add higher order regressors (i.e., $\theta$, $\theta^2$, $\theta^3$, etc.) in a greedy fashion (Algorithm 3); the greedy procedure comes to a halt when none of the more complex models under test improves the performance with respect to the current best model.
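Step (i) can be sketched as a plain Pearson correlation test between one residual component and each joint angle. The sketch assumes that the threshold is applied to the absolute value of the correlation coefficient (Fig. 12 (d) reports absolute values); the function names, the threshold of 0.5 and the synthetic data are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant signal carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def active_inputs(theta, d, threshold):
    """Indices of the joint angles whose |correlation| with d exceeds threshold.

    theta: list of frames, each frame a list of joint angles.
    d:     one residual component, one value per frame.
    """
    n_joints = len(theta[0])
    return [
        j for j in range(n_joints)
        if abs(pearson([frame[j] for frame in theta], d)) > threshold
    ]


# Synthetic example: the residual depends on joint 0 only.
pattern = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
frames = [[0.1 * k, pattern[k]] for k in range(8)]
residual = [2.0 * frame[0] + 0.5 for frame in frames]
selected = active_inputs(frames, residual, threshold=0.5)
```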

The performance of two models $g$ and $g'$ on the data $d$ is compared by computing the Bayes factor

$$ \frac{p(d|g)}{p(d|g')} = \frac{\int p(d|w, g)\, p(w|g)\, dw}{\int p(d|w', g')\, p(w'|g')\, dw'}. \tag{7} $$

As in Eq. (6) we use a Gaussian noise model with independent marker residuals. Thus we can write the likelihood as

$$ p(d|w, g) = \prod_k N(d_k \,|\, g(\theta_k, w), \Sigma), $$

where $N$ is a Gaussian with mean $g(\theta_k, w)$ and variance $\Sigma$ evaluated at $d_k$. The definition of the prior $p(w|g)$ requires some preliminary considerations. The model selection step does not recalibrate the subject for each marker motion model, but compares the models on a fixed residual obtained assuming static markers. Therefore we can expect a bias between this approximated residual and the actual one. The bias magnitude is unknown a priori and should not affect the model selection result. Consequently we use a uniform prior for the zero order parameter $w_0 \in w$, while for all the other parameters we use a standard Gaussian regulariser, that is

$$ p(w|g) \propto N(\bar{w} \,|\, 0, \Sigma_w), \tag{8} $$



Algorithm 1 Skin model calibration
 1: INPUT: {$r_{i,k}$, $M_{std}$}
 2: OUTPUT: {$M_{skin}$, $\phi$, $W$}
 3: Calibrate the standard model $M_{std}$ (Eq. (2)) on the frame subset $K$.
 4: Fix $\phi$ and $M$ (the bone and target parameters)
 5: for all frames in the ROM do
 6:   Compute the joint angles $\theta_k$ and the residuals $d_k$
 7: end for
 8: {Approximated model selection}
 9: $M_{skin} = M_{std}$
10: for all the components $d^{(u)}_k \in D$ do
11:   Select the active input indexes $A_u$ via correlation analysis (Algorithm 2)
12:   Perform greedy model selection (Algorithm 3)
13:   Add polynomial regressors $\psi^{(u)}$ to $M_{skin}$
14: end for
15: Calibrate the new model $M_{skin}$ via Eq. (6).

Algorithm 2 Active inputs pre-selection
 1: INPUT: {$\theta$, $d^{(u)}$}
 2: OUTPUT: set of active inputs $A_u$
 3: Compute the correlation $c^{(u)}_j$ between $d^{(u)}_k$ and $\theta_{j,k}$ (a row in Fig. 12 (d)).
 4: for all $j$ do
 5:   if $c^{(u)}_j > T$ then
 6:     Add $j$ to the set of active inputs $A_u$.
 7:   end if
 8: end for

where $\bar{w}$ is the vector containing all the parameters but $w_0$ (i.e., $w = [w_0\ \bar{w}]$), and $\Sigma_w$ is a diagonal prior covariance. Also, note that the regression steps 4 and 9 in Algorithm 3 use the same regulariser. Given this choice of likelihood and prior, an algebraic solution of the integrals in Eq. (7) exists. The derivation is presented in Appendix A.

To conclude this section we comment on the assumption of independent residual components. In general the marker noise components are not independent; for example the sensor noise depends on the camera position and on the pose of the subject in the capture volume. However the dominant component of the residual may still be due to unmodelled soft tissue artefacts like those caused by abrupt limb accelerations. Although the formulation in Appendix A could be easily extended to full covariance matrices, in practice, before the subject calibration step, the user is often unable to provide a good estimate for the residual error covariance and isotropic models are usually the default.
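The greedy, Bayes-factor-driven selection of polynomial orders described in this section can be sketched with NumPy as follows. This is not the derivation of Appendix A: as a simplifying assumption, the uniform prior on the zero order weight is replaced by a very broad Gaussian (`bias_var`), so that the marginal likelihood of the linear-Gaussian model has the familiar closed form. The values of `noise_var`, `prior_var` and `bias_var`, as well as all function names, are illustrative choices.

```python
import numpy as np


def design(theta, orders):
    """Design matrix: a constant column, then theta_v**p for p = 1..orders[v]."""
    cols = [np.ones(len(theta))]
    for v, order in sorted(orders.items()):
        cols += [theta[:, v] ** p for p in range(1, order + 1)]
    return np.column_stack(cols)


def log_evidence(d, Phi, noise_var=0.01, prior_var=1.0, bias_var=1e4):
    """log p(d|g) for d = Phi w + noise with a zero-mean Gaussian prior on w."""
    S = np.full(Phi.shape[1], prior_var)
    S[0] = bias_var  # broad prior standing in for the uniform prior on w0
    C = noise_var * np.eye(len(d)) + (Phi * S) @ Phi.T  # marginal covariance of d
    _, logdet = np.linalg.slogdet(C)
    quad = d @ np.linalg.solve(C, d)
    return -0.5 * (len(d) * np.log(2.0 * np.pi) + logdet + quad)


def greedy_orders(theta, d, active, max_order=3):
    """Raise one polynomial order at a time while the Bayes factor exceeds 1."""
    orders = {v: 0 for v in active}
    current = log_evidence(d, design(theta, orders))
    while True:
        best, best_orders = current, None
        for v in active:
            if orders[v] >= max_order:
                continue
            trial = {**orders, v: orders[v] + 1}
            score = log_evidence(d, design(theta, trial))
            if score > best:  # log Bayes factor > 0
                best, best_orders = score, trial
        if best_orders is None:
            return orders
        orders, current = best_orders, best


# Synthetic residuals: one linear and one quadratic in the single joint angle.
theta = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
d_linear = 0.5 + 2.0 * theta[:, 0]
d_quadratic = 0.5 + 2.0 * theta[:, 0] - 1.5 * theta[:, 0] ** 2
orders_linear = greedy_orders(theta, d_linear, active=[0])
orders_quadratic = greedy_orders(theta, d_quadratic, active=[0])
```

Because the marginal likelihood penalises regressors that do not reduce the residual, the search stops at the lowest order that explains the synthetic data, which is the behaviour the stopping rule of the greedy procedure relies on.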

5 Methods for experimental validation

Ideally an objective comparison between different kinematic models would require the actual joint angle values measured with a procedure that guarantees higher accuracy than motion capture measurements. Unfortunately, acquiring accurate ground truth data almost always involves invasive procedures. Many have used intra-cortical (bone) pin mounted markers or capitalised on external bone fixation already in place. For example Fuller et al. [47] use bone pins to assess the effect of skin movements on the joint estimation



Algorithm 3 Greedy selection of the polynomial components.
 1: INPUT: {$\theta$, $d^{(u)}$, $A_u$}
 2: OUTPUT: set of regressors $\psi^{(u)}$
 3: Initialize zero order model: $\psi^{(u)} = \{1\}$
 4: Regress model $g^{(u)}(\theta, w^{(u)})$ to the data $d^{(u)}$.
 5: repeat
 6:   $\psi^{(best)}(\theta) = \psi^{(u)}(\theta)$
 7:   for all active inputs $v \in A_u$ do
 8:     Add a component: $\psi^{(test)} = \psi^{(u)} \cup \{\theta_v^{o_v+1}\}$ with order $o_v + 1$
 9:     Regress model $g^{(test)}(\theta, w^{(test)})$ to the data $d^{(u)}$.
10:     if $p(d^{(u)}|g^{(test)}) / p(d^{(u)}|g^{(best)}) > 1$ then
11:       $\psi^{(best)}(\theta) = \psi^{(test)}(\theta)$
12:     end if
13:   end for
14:   if $p(g^{(best)}|d^{(u)}) / p(g^{(u)}|d^{(u)}) > 1$ then
15:     $\psi^{(u)}(\theta) = \psi^{(best)}(\theta)$
16:     stop = false
17:   else
18:     stop = true
19:   end if
20: until stop

procedure. Percutaneous fixation of markers has also been used, but this is only marginally less invasive and still requires ethical approval [48]. Less invasive X-ray studies have been performed, but these invariably require the attachment of radio-opaque markers to the bone [49]. Despite this less invasive approach, ionising radiation still carries with it the need for ethical approval. Instead of comparing the joint angles, Veber and Bajd [50] compare the subject calibration results for the phalanx segments with data from statistically-based anthropometry (i.e., hand lengths and palm widths). However, in our opinion the accuracy of the bone lengths may not be sufficient to differentiate similar models.

As direct measurements are impractical, researchers have evaluated other desirable model qualities [51, 10] or have used synthetic or semi-synthetic data [28]. A well established procedure is to measure the repeatability of the calibration results [51, 10]. In these tests the researchers capture 20 to 30 trials of the same subject performing the same movement. Then a calibration step is run on each trial and the results in terms of bone lengths and joint angles are analysed. Good models and good calibration procedures should produce consistent results with small cross-trial variance (see ANOVA [52]). Cereatti et al. [28] generate 3D data with a synthetic model of the knee. The authors animate the model with real gait data and then add real and synthetic soft tissue artefacts to the predicted marker positions. Finally they compare the joint angles and the calibration parameters with those of the synthetic subject.

Although repeatability analysis and ANOVA are well established techniques for model validation, they do not tell us how well the model explains the data. For example, an over-simple model that cannot explain some movements could still be highly consistent on a particular movement. The problem is similar to the skin parameter selection of Sec. 4.3, as the goal is again to find a compromise between complexity, descriptiveness and what we can effectively measure. Broadly speaking, complex models (with many segments and more DoF per joint) have the potential to better explain the data; however they are less stable and slower to optimise than simpler models. Also, given the data noise level, model selection should tell us if the model is overfitting the data. In this regard, in the next section we report on our study of statistical model selection methods to evaluate hand kinematic models.

5.1 Statistical validation

Given the reconstructed data $R$ from a trial, the selection of a kinematic model $M$ is a similar problem to the marker model selection step described in Sec. 4.3. While the marker procedure optimises the model for a single marker, we are now required to evaluate complete hand models. The parametrisation $\Theta$ of a hand model is the union of all time-constant and time-variable parameters (i.e., $\Theta = [\theta\ \phi\ W]$). Unfortunately the formulation of the marginal $p(R|M)$ is not as straightforward as in Eq. (7). Thus we have to resort to approximate methods.

Popular approaches to model selection use a model quality score composed of two terms: one term favours models with a good fit to the training data; the second term assigns a larger penalty to more complex parametrisations. The Akaike information criterion (AIC) [53], which is based on the Kullback-Leibler information number, uses as a distance score (i.e., the lower the better) the unbiased estimator of the expected log-likelihood, that is

$$ \mathrm{AIC} = -2 \log p(R|\hat{\Theta}, M) + 2n, \tag{9} $$

where $n$ is the number of parameters in $\Theta$. A variant of AIC is the Consistent AIC (CAIC), which accounts for the number of samples $n_R$. The CAIC formulation is

$$ \mathrm{CAIC} = -2 \log p(R|\hat{\Theta}, M) + 2n(\log n_R + 1). \tag{10} $$
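Both scores are straightforward to evaluate once the maximised log-likelihood of a calibrated model is available. The sketch below follows Eqs. (9) and (10) as printed in this section; the function names, the candidate models and their log-likelihood values are hypothetical and serve only to show how a ranking is obtained.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion, Eq. (9): the lower the better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def caic(log_likelihood, n_params, n_samples):
    """Consistent AIC with the penalty term as printed in Eq. (10)."""
    return -2.0 * log_likelihood + 2.0 * n_params * (math.log(n_samples) + 1.0)


def rank_models(candidates, n_samples):
    """Sort model names by CAIC, best (lowest score) first.

    candidates: {name: (maximised log-likelihood, number of parameters)}
    """
    return sorted(candidates, key=lambda name: caic(*candidates[name], n_samples))


# Made-up numbers: the richer model fits slightly better but pays a larger penalty.
candidates = {"simple": (-1000.0, 10), "complex": (-995.0, 30)}
ranking = rank_models(candidates, n_samples=500)
```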

The Bayesian Information Criterion (BIC) [54] instead assigns a score to each model according to an approximation of the marginal $p(R|M)$ under the assumption that the data distribution is in the exponential family. This results in

$$ p(R|M) \approx p(R|\hat{\Theta}, M)\, n_R^{-n/2}. \tag{11} $$

The BIC approximation is quite crude, especially for the parameter prior $p(\Theta|M)$. A more elegant solution is to bootstrap the data and estimate, under a Gaussian assumption, the prior covariance $V$ as well [55]. Algorithm 4 outlines the procedure used to compute an approximated $p(R|M)$. First we calibrate the model as explained in Sec. 4.1. Then we bootstrap the unnormalised residuals and we create a set of semi-synthetic trials. Each trial is again calibrated and the parameter covariance $V$ is computed. Finally we use the covariance estimate to compute the approximated marginal (Algorithm 4, step 13).

Although the estimates produced by the bootstrap procedure are potentially more accurate than the BIC ones, its application is limited to small subjects. In fact, bootstrapping typically requires at least 1000 samples, which in our case correspond to 1000 computationally expensive subject calibrations.
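The bootstrap procedure can be illustrated on a toy problem in which the "calibration" is a one-parameter least-squares fit, so that each of the repeated calibrations is cheap. The sketch assumes a Laplace-style final step in which the marginal is approximated by the maximised likelihood multiplied by $(2\pi)^{n/2}$ and by the square root of the determinant of the bootstrap covariance; the model, the noise variance and every name below are illustrative and unrelated to the hand model itself.

```python
import math
import random


def fit_slope(x, r):
    """Maximum likelihood (least-squares) slope of r = a * x under Gaussian noise."""
    return sum(xi * ri for xi, ri in zip(x, r)) / sum(xi * xi for xi in x)


def gaussian_loglik(x, r, a, noise_var):
    """Log-likelihood of the data for slope a and known noise variance."""
    sq = sum((ri - a * xi) ** 2 for xi, ri in zip(x, r))
    return -0.5 * (len(r) * math.log(2.0 * math.pi * noise_var) + sq / noise_var)


def bootstrap_log_marginal(x, r, noise_var, n_boot=200, seed=0):
    """Residual bootstrap of the fit, then a Laplace-style log marginal."""
    rng = random.Random(seed)
    a_ml = fit_slope(x, r)
    residuals = [ri - a_ml * xi for xi, ri in zip(x, r)]
    samples = []
    for _ in range(n_boot):
        drawn = rng.choices(residuals, k=len(residuals))  # draw with replacement
        r_syn = [a_ml * xi + di for xi, di in zip(x, drawn)]  # semi-synthetic trial
        samples.append(fit_slope(x, r_syn))  # recalibrate on the synthetic trial
    mean = sum(samples) / n_boot
    var = sum((s - mean) ** 2 for s in samples) / (n_boot - 1)
    n_params = 1
    log_marginal = (0.5 * n_params * math.log(2.0 * math.pi)
                    + gaussian_loglik(x, r, a_ml, noise_var)
                    + 0.5 * math.log(var))
    return a_ml, var, log_marginal


# Synthetic data with a true slope of 2 and noise standard deviation 0.1.
noise = random.Random(1)
x = [0.1 * k for k in range(1, 51)]
r = [2.0 * xi + noise.gauss(0.0, 0.1) for xi in x]
a_ml, var, log_marginal = bootstrap_log_marginal(x, r, noise_var=0.01)
```

Even in this toy setting the cost is dominated by the `n_boot` refits, which is the reason the text restricts the bootstrap approach to small subjects.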

6 Results

6.1 Marker motion model

We evaluate the kinematic model with moving markers described in Sec. 4.3 on capture data acquired with a rig of nine 4-megapixel Vicon MX cameras. As a proof of concept we limited the capture to two fingers: the right thumb and index of one healthy subject. 31 markers with 3 mm diameter were glued to the latex glove worn by the subject, as shown in Fig. 13. Also, to ensure the accuracy of the global position we glued one larger (7 mm) marker over the wrist and limited the capture volume to about 1 m. Finally, to reduce the occurrence of marker occlusions under wrist rotations we pointed two of the nine cameras upwards. As in the Santos model [11] we defined the index and thumb articulations with two DoF for CMC and TMC joints and one DoF for thumb IP, PIP and DIP joints.



Algorithm 4 Bootstrap marginal approximation.
 1: INPUT: {$R$, $M$}
 2: OUTPUT: $p(R|M)$
 3: Calibrate (Maximum Likelihood solution): $\Theta^{(ML)} = \arg\max_{\Theta} p(R|\Theta, M)$
 4: Compute residuals $D^{(ML)}$ {Eq. (5)}
 5: for $s = 1, \dots, n_s$ {$n_s$ is the number of bootstrap samples} do
 6:   Draw with replacement $D^{(s)}$ from $D^{(ML)}$
 7:   for all $k \in K$ and $i = 1, \dots, n_m$ do
 8:     Create synthetic reconstructions: $r^{(s)}_{i,k} = S_i(\theta^{(ML)}_k, \phi^{(ML)})\, m^{(ML)}_i + d^{(s)}_{i,k}$
 9:   end for
10:   Calibrate: $\Theta^{(s)} = \arg\max_{\Theta} p(R^{(s)}|\Theta, M)$
11: end for
12: Compute the covariance $V$ of $\Theta$ over the $n_s$ samples
13: Approximate marginal as: $p(R|M) \approx (2\pi)^{n/2}\, p(R|\Theta^{(ML)}, M)\, |V|^{1/2}$

Figure 13: High density markerset. The thumb and the index are sensorised with 32 markers. The markersare glued to a latex glove.

We compared the results of the standard Static Marker (SM) model (Eq. (2)) and the enhanced model with Moving Markers (MM) (Eq. (6)) on four capture trials. Trials 1, 2 and 3 are ROM trials. In Trial 4 the subject repeatedly picks a piece of plastic cutlery (a knife) from a small container that he holds with the other hand. For our experiments we used 100 frames from Trial 1 to calibrate the two subjects. Then, for the remaining frames in Trial 1 and for the other three trials, we computed the joint angles and the Root Mean Square Error (RMSE) of the unnormalised marker residuals. Also, we set the maximal degree of the polynomials in the marker motion model to three. Fig. 14 and Table 4 summarise the results. For all four trials MM has a significantly lower RMSE than SM. The reduction was expected on Trial 1, as this is the trial used to calibrate the subject and the proposed model has a larger number of free parameters than the standard one. However the improvement on the other three trials shows that MM generalises well to unseen data. This result is consistent for motions similar to the training ones (i.e., Trial 2 and Trial 3) as well as for fairly dissimilar movements, as in Trial 4. Also, Fig. 14 shows that the performance improvement is consistent over time. The marker motion model outperforms the standard model both on extreme poses, when the fingers are fully flexed (see RMSE peaks in Fig. 14), and near the mean pose. Also, during the experiments we noted that sometimes the marker motion shows a hysteretic behaviour that is caused by the glove sliding over the skin and not returning to the original position. The problem affects both models,

Figure 14: RMSE comparison between the standard calibration model with static markers (SM) and the proposed model with polynomial moving markers (MM), plotted against frame number for Trial 1 (ROM), Trial 2, Trial 3 and Trial 4. For all test trials MM better predicts the marker positions.

Table 6: Five best kinematic models of the hand according to the CAIC and BIC scores when training on a ROM trial (CAIC score: the lower the better. BIC score: the higher the better).

Model type                CAIC
Thumb    Index    Palm    score x10^3
TM5      FM2      PM4     29.6
TM6a     FM2      PM4     30.6
TM6b     FM2      PM4     30.7
TM5      FM2      PM6     31.2
TM7      FM2      PM4     31.7

Model type                BIC
Thumb    Index    Palm    score x10^3
TM5      FM2      PM4     -15.0
TM6a     FM2      PM4     -15.5
TM6b     FM2      PM4     -15.5
TM5      FM2      PM6     -15.7
TM7      FM2      PM4     -16.0

plausible Finger Models (FM) exist. The most common has two DoF on the MCP joints; the other has three DoF. We name these two models FM2 and FM3. Finally, the Palm Model (PM) can be rigid (PMR), with two CMC Hardy-Spicer joints for ring and pinky (PM4), or with a third CMC Hardy-Spicer joint for the index (PM6). To capture the motion we sensorised the hand with 22 markers positioned as shown in Fig. 2 (c),(e): one marker per phalanx and the rest on the back of the hand. For the evaluation we ran the model selection methods on three capture trials. While the first trial is a classic ROM, in the other two trials the subject was asked to perform two actions that are particularly relevant to the benchmarking scenario of the DEXMART project [41]. In one trial the subject unscrews a jar lid; in the other trial the subject repeatedly picks a piece of cutlery from a small box. Finally, because the bootstrap procedure is too computationally demanding on full hand subjects, we present results for the CAIC and BIC methods only.

Tables 6, 7 and 8 show the best five models according to BIC and CAIC for the three trials respectively. The first observation is that the BIC and CAIC outputs are fully coherent. Therefore, without loss of generality, we can comment on the results of either one. The ROM trial results (Table 6) show which models produce a good fit to generic hand movements. In this case the Santos model (TM5-FM2-PM4) achieves the best score. The other high ranked models are more complex than Santos and have extra DoF on the thumb (TM6a, TM6b and TM7) or on the palm (PM6). None of the models uses three DoF for the finger MCP joints. The Santos model is also the best scorer on the jar lid unscrewing movement (Tab. 7). However, the other high ranked models present a larger number of DoF than in the ROM trial case. This indicates that, to achieve high accuracy on this specific movement, a more complex palm model like PM6 can be a viable option. Finally, the results in Tab. 8 show that the cutlery picking task also triggers extra DoF on the palm as well as on the thumb. In this case the best score is produced by the kinematics using the most complex palm model.

6.3 Evaluation of finger joints interdependencies

The experimental study in this section reports a quantitative analysis of motion coordination from the thumb to the little finger, and examines the kinematic synergies during reaching and hand grasping activity. We asked four male subjects, with height-to-weight ratio between the 40th and 90th percentile, to perform two types of cylinder grasping with their right hand, which involved concurrent voluntary flexion of the fingers. Two other types of movement, voluntary flexion and extension of each individual finger, the first with a support for the palm and the second without, were recorded for each volunteer. The acquisition was performed with a Vicon motion capture system (Oxford Metrics Ltd., UK) with five cameras. We measured the trajectories of 23 reflective markers of 3.0 mm diameter on the back of the right hand (Fig. 17) at a sampling frequency of 60 Hz, and then output the time-varying marker coordinates in a three-dimensional laboratory coordinate system (x y z) established through prior calibration.



Figure 16: Hand joint names.

Figure 17: Marker set.



Table 7: Five best kinematic models of the hand according to the CAIC and BIC scores. The training is done on motion capture data of a hand opening a jar lid (CAIC score: the lower the better. BIC score: the higher the better).

Model type                CAIC
Thumb    Index    Palm    score x10^3
TM5      FM2      PM4     30.3
TM5      FM2      PM6     30.7
TM6b     FM2      PM4     31.2
TM6b     FM2      PM6     31.3
TM7      FM2      PM6     32.4

Model type                BIC
Thumb    Index    Palm    score x10^3
TM5      FM2      PM4     -15.3
TM5      FM2      PM6     -15.5
TM6b     FM2      PM4     -15.8
TM6b     FM2      PM6     -15.8
TM7      FM2      PM6     -16.4

Table 8: Five best kinematic models of the hand according to the CAIC and BIC scores. The training is done on motion capture data of a hand picking up pieces of cutlery from a small box (CAIC score: the lower the better. BIC score: the higher the better).

Model type                CAIC
Thumb    Index    Palm    score x10^3
TM5      FM2      PM6     35.6
TM6a     FM2      PM6     36.1
TM6b     FM2      PM4     36.7
TM6a     FM2      PM4     37.2
TM6b     FM2      PM6     37.3

Model type                BIC
Thumb    Index    Palm    score x10^3
TM5      FM2      PM6     -17.9
TM6a     FM2      PM6     -18.2
TM6b     FM2      PM4     -18.5
TM7      FM2      PM4     -18.7
TM6b     FM2      PM6     -18.8

After labelling the markers and a tracking process, we performed a dynamic subject calibration and a fitting of the subject motion. The joint angle values calculated during the trials are exported to Matlab in CSV format. The movements are acquired with the performers seated in an initial pose with the torso approximately upright, the right upper arm vertical and the forearm horizontal. The fingers are in natural full extension and the palm is supported by a desk. The execution of the tasks involved small forearm pronation/supination and torso assistance (see footnote 4).

In the first two tasks (Fig. 18), subjects reached forward over a distance of approximately 250 mm to grasp two different vertical cylinders with diameters of 50 mm and 65 mm (once for each trial). The observation is focused on the concurrent voluntary flexion of all digits throughout the grasp task. Before the subject returns to the initial posture the cylinder is placed at 150 mm from its initial position and a concurrent voluntary extension of all fingers is observed.

In the third and fourth tasks, subjects maintained the same initial posture as in the first two tasks. Each subject performed two consecutive repetitions of voluntary flexion of individual fingers, one digit at a time. For the latter task, the palm of the performer is placed on a special support without any other constraints. In these two tasks, the subjects were instructed not to consciously control involuntary joint flexion of the non-intended fingers; they completed 10 trials (five different movements, two repetitions) for each task.

A local coordinate system x0y0z0 was established to facilitate kinematic descriptions and definitions.

4. The subjects moved each finger into flexion and extension while attempting to keep the other, non-instructed fingers still.



Figure 22: Estimated variance of the correlation indices in the grasp trials (both axes: DOF number).

Figure 23: Normalised singular values of PCA in the flexion-extension task (horizontal axis: principal component number; vertical axis: singular values of the covariance matrix).

Figure 24: Normalised singular values of PCA in the grasp task (horizontal axis: principal component number; vertical axis: singular value).

the high degree of freedom of the hand. In particular, there are two very difficult problems to solve: the first is the reduction of the number of markers to place on the small area of the back of the hand, and the second is the reduction of marker occlusions caused by dexterous hand movements in the capture area (the relevance of the ghost marker problem also increases with the number of markers in the small field of view of each camera).

Most researchers in this area use fewer markers than the minimum number required to reconstruct the motion of all the bones constituting the human hand. The reduction has been made possible by using a mathematical model of the hand or by the use of additional sensors, e.g. data gloves. The second strategy will also be used in DEXMART; in fact the planned activities already include the integration of a data glove (under development in WP5) into the OMG optical motion capture system. The main motivation is to minimise failures in the motion tracking of hand bones due to the marker occlusion problem, which is very frequent during manipulation tasks. Some preliminary measurements have already been conducted, in which the motion of a single finger was captured using the typical marker set used in the literature (one marker for each finger bone); even in such a simple case, occlusions proved to be very frequent, even with five cameras well distributed around the hand workspace. To improve the quality of the acquired kinematic data and to reduce the number of markers, a sensor fusion algorithm for hand motion tracking will be realised. In particular our sensorised glove is equipped with only three markers and three low cost angular sensors per finger. In detail, three markers are used for defining a reference system fixed to the hand wearing the glove, and three markers placed on the index finger are used to estimate the joint angles between the phalanges. The three angular sensors are then mounted on the same finger (see Fig. 25).

To perform the experiments we used a Vicon 460 motion capture system equipped with 5 high resolution M2 cameras. Figure 26 shows the marker trajectories for four consecutive flexion and extension movements of the index. The experiment was executed in the two cases under the same constraint conditions (the palm of the hand is held in a fixed position and the index motion is performed without any other constraints on the other fingers).

The results show high variability of the marker trajectories across consecutive movements. This is due to the sliding of the glove with respect to the phalanx bones. To evaluate and reduce this effect it will be



Figure 25: Sensorized data glove.

Figure 26: Trajectories generated by three markers mounted on the data glove.
