Coordinate Frames in Robotic Teleoperation (lahiatt/pubs/for_iros2006.pdf)


Coordinate Frames in Robotic Teleoperation

Laura M. Hiatt
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
Email: [email protected]

Reid Simmons
Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213
Email: [email protected]

Abstract— An important mode of human-robot interaction is teleoperation, in which a human operator directly controls a robot via hardware such as a joystick or mouse. Such control is not always easy, however, as the viewpoint of the human, the alignment of the input device, and the local coordinate frame of the robot are rarely all aligned. These discrepancies force the user to reconcile the involved coordinate frames during teleoperation. Therefore, the choice of coordinate frames is critical since an unintuitive coordinate frame mapping will likely lead to higher mental workload and reduced efficiency. We discuss this concern, describe the various difficulties involved with natural remote teleoperation of a robot, and report experiments that demonstrate the effects of using different frames of reference on task performance and user mental workload.

I. INTRODUCTION

Teleoperation has traditionally been an important mode of robotic control. Teleoperation here refers to direct robotic control using a variety of interaction modalities, including dialogue, joysticks, and graphical interfaces. Under this paradigm, the human controls all robotic movements and the robot simply follows the low-level instructions. A recent trend in robotics also uses teleoperation in combination with autonomy, allowing robots to exercise some autonomy while still involving the human in execution-time decisions and commands [1], [2]. For example, the system may run autonomously while both asking for assistance and accepting interventions from the human operator as necessary. In both of these paradigms, teleoperation is a powerful technique for robot control.

In order for teleoperation to be successful, however, it must allow for meaningful and natural interactions between robot and human. Because teleoperation necessarily involves providing input in some predefined frame of reference, one possible approach to improving these interactions is to explicitly consider the coordinate frame used during the teleoperation interaction.

The concept of coordinate frames in teleoperation is not new. There is a growing body of research on frames of reference used in dialogue, both in human-human interaction and in human-robot interaction [3]. When humans speak amongst themselves, they frequently switch between different frames of reference. Possible reference frames include the speaker (“move to my right”), the listener (“move to your left”), a third-party object (“go behind the table”), and global references (“go north”). For human-human interaction, such references are second nature and are usually easily understood. For human-robot verbal interaction, however, this is not as trivial a matter. Although some of these frames of reference are more common than others in typical conversation, it is important to distinguish between the different cases in order to ensure that such verbal instructions result in the desired robot behavior.

These concepts, fundamental in verbal communication, transfer to situations where manual teleoperation is the mode of interaction. The difference is that instead of saying “move forward” or “turn right,” the operator pushes an input device forward or twists it right. The ability to use a natural coordinate mapping, however, is not as straightforward in manual teleoperation as it is in speech, as the robot generally has no way to determine the operator’s intended coordinate frame. Further, instead of being able to flexibly switch back and forth between whatever predefined coordinate frame seems natural and intuitive, or even dynamically defining them as the interaction progresses, a fixed frame of reference is usually attached to the hardware device or interface used to teleoperate. For example, when pushing forward on the input device, the teleoperator cannot clarify what ‘forward’ means, as they could in dialogue; they can only use the mapping built into the system, whether they find it intuitive or not.

It is thus imperative to ensure that when an operator manipulates an input device, the robot moves in the direction and the manner that the operator expects. In addition, the coordinate frame of the input device ideally should be related to the coordinate frame of the robot or robot workspace in such a way that the human operator expends little thought on mental spatial transformations to reconcile their movement commands with the input device’s coordinate mapping. As we will show, small changes in this relationship between device input and robotic motion result in corresponding changes in individual operator mental workload, satisfaction, and efficiency during teleoperation.

Our initial hypothesis was that using a task-oriented coordinate frame while teleoperating, such as a coordinate mapping aligned with the object being manipulated, would lead to less operator workload and more successful and efficient human-robot interactions. In order to test this claim, we conducted an experiment in which users were asked to complete a simple teleoperation task using three different mappings between the input device’s coordinate frame and the direction of movement of the robot. They viewed the workspace using several displays, including a stationary camera and a workspace visualization tool. The first mapping relied upon the camera’s position in the workspace: moving the input device forward and backward moved the robot directly away from and toward the camera. The second mapping used the robot’s own coordinate frame. Finally, the third mapping was task-oriented, and used the coordinate frame of the object being manipulated.

The results from these experiments, while not conclusive, indicate that although all mappings had similar user mental workloads, the task-oriented mapping enabled the task to be completed in less time. This suggests that although a task-oriented mapping may require just as many, if not more, spatial transformations as other mappings, it is ultimately a more effective frame of reference for teleoperation due to its intuitive relation to the task’s goals.

In the following section, we discuss work related to our investigation. In Section 3, we give an overview of different theoretical mappings that could exist between a teleoperation device and the resulting robotic movement, and hypothesize about the benefits and drawbacks of each. Section 4 introduces our experimental domain and the possible mappings we can use in our system. We then describe our experimental method, discuss our results, and talk about future work in this area.

II. RELATED WORK

Two of the main goals of teleoperation research are to reduce operator workload and to make the entire experience as efficient as possible for both the human teleoperator and the human-robot team. One large body of research addresses this question by focusing on the operator’s viewpoint of the workspace. Commonly, an operator will have two views of the world. The first is an egocentric, or first-person, view – the operator views the world as if they were in the workspace. The second is an exocentric, or third-person, view – the operator views the world from a vantage point outside of the workspace. Reconciling these dual displays, however, can often lead to extra effort on the part of the operator. Wang and Milgram [4] address this problem with a new type of ‘tether’ view, where the user has just one display that is neither completely local nor completely global. Instead, it simulates a camera flying behind the robot like a kite tied to the robot body. Other similar work has used virtual reality to teleoperate robotic arms in space, which allows operators to see simulations of parts of the environment that may otherwise not be visible [5].

Another area of research focuses on improving the teleoperator’s overall interface, especially the arrangement and choice of components, in order to maximize the efficiency of the operator’s interactions with the robot. For example, Kaber and Chow [6] investigated how to best define and develop teleoperator interface design principles. Some of the measures they consider while designing an interface are the current operator goals, operator situational awareness, and operator workload.

DeJong, et al. [7] investigated optimizing the teleoperator’s interface in order to reduce the mental workload by minimizing the number of mental rotations and translations that the teleoperator must do in order to gain situational awareness. They showed that by organizing the interface and displays in an effective way, the transformations required to convert the robot’s desired motion – as seen in the operator’s view of the world – into a command to the input device can be minimized, reducing training and task times and increasing task performance.

While all of the above projects have the same overall goal as the work presented here, they focus more on efficiently gaining situational awareness and reducing mental workload while interpreting and understanding output from the robot and its workspace. This contrasts with our own goals of determining how to best provide commands back into the workspace. A project that does look at how to best command a robot is described by Diftler, et al. [8]. In this system, a teleoperator wears body trackers at the arms, neck and waist, and gloves on both hands. They can then control Robonaut, an anthropomorphic robot, simply by moving their own body, hands and fingers. Although this approach is highly appropriate for an anthropomorphic robot, translating the operator’s movements into teleoperation commands becomes quite difficult if the robot lacks the necessary degrees of freedom. Instead, we focus on optimizing simple hardware devices by intelligently designing the software that integrates them into our system.

III. TELEOPERATION WITH DIFFERENT COORDINATE FRAMES

In order to teleoperate a robot, a human needs an input device to translate their intentions into a form that the robot can understand. As previously mentioned, one question that naturally arises is how exactly the user’s input should map to movement by the robot in the real world.

There are many ways in which one can think of what exactly they are controlling while moving a joystick during teleoperation. In remote teleoperation, teleoperators typically use an external camera to view the workspace. Thus, one possible mental model of the relationship between the joystick and the world is that the operator is commanding the robot from their own vantage point – i.e., from the position of the camera in the robot workspace. Here, pushing forward on the joystick moves the robot away from the camera, and pulling back on the joystick moves the robot towards it. A potential advantage of this coordinate frame is that it would take very little time to adjust to it: no mental transformations or rotations are required of the operator to predict the results of their input. Instead, the robot moves in the exact same way as the joystick from the teleoperator’s point of view. A disadvantage is that it may not always feel intuitive during task-oriented teleoperation, such as when the operator has to manipulate some object into a certain position. Here, it may be more difficult to complete the task, since more mental transformations are required to determine how the robot or end effector needs to move in order to complete the task at hand.

A second mental model is that the teleoperator imagines sitting on top of the robot, controlling it using the frame of the robot itself: pushing forward moves the robot (and in a sense the teleoperator) forward, and back moves it back. This coordinate frame has the advantage of being easy to define, especially for robots with obvious fronts and backs, such as rovers and humanoids. A disadvantage is that not all robots have obvious directionality. For example, a symmetric robot’s directionality is both completely arbitrary and often hard for the operator to remember later on. A second disadvantage is that unless a teleoperator has the ability to view the world from this vantage point, they would have to manipulate their mental model of the world to figure out how the workspace would look from that position.

A third possibility is modeling the workspace in a task-oriented manner. While this is very task-specific, it usually means adopting the coordinate frame of the object being manipulated, whether that is the end effector of the robot, an object it is grasping, or some other relevant item in the environment. Thus, instead of being perched on the robot, using this model the teleoperator would think of themselves as directly facing the object they are trying to manipulate and controlling it from that viewpoint. For example, if the task were to use a square end effector to pick up an object, the coordinate frame would be aligned with the end effector’s flat sides, with the user ‘facing’ the end effector while standing near the target object. A disadvantage of this choice of coordinate frames is that if the camera providing the video feed is not aligned with the task’s coordinate frame, extra spatial transformations are required in order to map the desired movement according to the camera view to movement of the input device. Also, as a given task changes and evolves, its associated task mapping may change with it, which may jar the user.

These three possible mappings loosely correspond to the three main kinds of reference frames developed in the psychological community [9]. The first, based on the teleoperator’s view of the workspace, corresponds to the deictic, or viewer-centered, reference frame, since it is with respect to the operator’s virtual position in the environment. This is similar to the concept of “to my right” or “in front of me.” The second, based on the robot’s position, corresponds (albeit weakly) to the extrinsic, or environment-centered, reference frame, since it relates to an extrinsic object in the environment that is sometimes arbitrarily defined and is not the focus of the current task. “North” and “south” are two examples of concepts in the extrinsic reference frame. Finally, the task-oriented mapping corresponds to the intrinsic, or object-centered, reference frame, since it focuses on the object in the environment the teleoperator is trying to control. Language illustrating this reference frame is “in front of the receptacle,” or “to the right of it.”
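To make the distinction concrete, the sketch below shows how the same forward push on an input device yields different workspace motions depending on the yaw of the active reference frame. The function name and the example yaw angles are ours, chosen purely for illustration; the paper does not publish an implementation.

```python
import math

def device_to_world(cmd_xy, frame_yaw_rad):
    """Rotate a 2-D device command (forward, left) into workspace
    coordinates, given the yaw of the chosen reference frame."""
    fwd, left = cmd_xy
    c, s = math.cos(frame_yaw_rad), math.sin(frame_yaw_rad)
    # Standard 2-D rotation of the command vector into the world frame.
    return (c * fwd - s * left, s * fwd + c * left)

# Hypothetical yaws for the three kinds of frames discussed above:
frames = {
    "deictic (camera)":   math.radians(0.0),    # camera looks along world +x
    "extrinsic (robot)":  math.radians(180.0),  # robot 'front' faces away
    "intrinsic (object)": math.radians(90.0),   # object frame at right angles
}

push_forward = (1.0, 0.0)  # operator pushes the device straight ahead
for name, yaw in frames.items():
    wx, wy = device_to_world(push_forward, yaw)
    print(f"{name}: world motion = ({wx:+.2f}, {wy:+.2f})")
```

The same input produces three different world motions; whichever rotation is active is exactly the transformation the operator must mentally invert while teleoperating.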

IV. ROBOTIC ASSEMBLY

The previous section described three different frames of reference and their associated strengths and weaknesses. In order to determine quantitatively how the strengths and weaknesses compare, we ran an experiment in the domain of robotic assembly. This domain is part of the larger Trestle project [10] and involves the assembly of a square structure that is built using four beams and four planarly compliant nodes (Figure 1). In order to weakly simulate conditions in space, the nodes are mounted on casters that roll easily along the floor when pushed. Thus, during beam insertion a node must be braced to keep it from rolling away.

Fig. 1. The fully assembled four-beam structure.

Fig. 2. The Mobile Manipulator (left) and the Roving Eye.

This assembly task has three different roles that must be filled: sensing the environment, bracing the node for insertion, and inserting the beam into the node. In our work, each of these roles is filled by a different robot. The Roving Eye (Figure 2) is a synchro-drive robot built on an RWI B24 base and equipped with a stereo camera pair, which is mounted on a pan-tilt unit. It provides a color video feed to the user, as well as relative 3D object position information to the other robots during visual servoing and to the user via a visualizer (described below). The Mobile Manipulator (Figure 2) is an RWI ATRV-2 skid-steered base with a Metrica/TRACLabs 5-DOF anthropomorphic arm mounted on the front, which it uses to insert the beams into the nodes. Finally, the Crane (Figure 3) is the NIST-built RoboCrane [11], which consists of a 6-DOF inverted Stewart platform with mounted receptacles that allow it to brace the nodes during beam insertion.

Part of this assembly process includes allowing a human operator to assist the system in its construction during times when the system is unable to complete the task itself, or when the human is believed to be more efficient. In order to do so, the human operator has the ability to teleoperate each of the three robots. This teleoperation is done via a six degree of freedom “SpaceMouse” input device [12] (Figure 4). The SpaceMouse is similar to a joystick, but with an additional vertical translational dimension, as well as roll, pitch and yaw.

Fig. 3. The NIST RoboCrane and four mounted receptacles.

Fig. 4. The SpaceMouse.

The question of mapping input from the SpaceMouse to robot movement during teleoperation is especially pertinent for the RoboCrane. There are two main phases to bracing a node using the Crane: moving the Crane’s receptacle to the vicinity of the node, and then using finer movements to align the Crane appropriately and lower it onto the node. The Crane, however, does not have an obvious front (Figure 3): instead, it has four symmetric receptacles, which are needed due to its limited workspace. Because of this, without knowledge of the robot’s defined coordinate frame, the ‘front’ of the Crane appears to be whichever side of the Crane the human is currently observing.

In the context of teleoperating the RoboCrane, we define three coordinate frames that correspond to the three theoretical mappings presented in Section 3 (Figure 5). The first is ViewPoint-oriented (VP), which uses the workspace’s camera as the basis for its coordinate frame. The camera is mounted on the Roving Eye, which is positioned outside the Crane’s workspace, facing it. Thus, pushing forward on the SpaceMouse moves the Crane away from the camera, pulling backward on the SpaceMouse moves the Crane toward the camera, left moves it left, etc. The second, Robot-Centered (RC), is with respect to the Crane’s body, using its predefined front and back. In our setup, the front of the Crane roughly faces away from the task and the workspace camera. The third is Task-Centered (TC). Since the task is to grab on to the node with one of the Crane’s receptacles, this uses the coordinate frame of the side of the receptacle facing the node: moving the mouse forward moves the Crane towards its receptacle, roughly towards the node.

Fig. 5. The three coordinate mappings between SpaceMouse and Crane movement. (1) corresponds to the VP mapping, (2) to the RC mapping, and (3) to the TC mapping. (1) and (2) both rotate around the center of the Crane; (3) rotates around the receptacle.

Our initial hypothesis was that the RC mapping would be the most difficult to use and the TC mapping would be the easiest. Since the RC mapping requires users to remember an arbitrary frame of reference potentially misaligned with the task or information at hand, intuition told us that users would have to concentrate more on their teleoperation in this condition. The TC mapping, however, would theoretically result in a lower mental workload, since no reconciliation would be required in order to merge the mental model of the task workspace and the corresponding control space of the Crane. This would then allow teleoperators to complete their required tasks more efficiently. We also hypothesized that the VP mapping, while initially easy to understand, would ultimately be more difficult to use than the TC mapping, since it also did not align with the coordinate frame of the task.

V. EXPERIMENTATION

To test our hypothesis, we conducted an experiment in which novice human users were asked to teleoperate the RoboCrane to complete a bracing task using the three different Crane coordinate mappings described in the previous section. Although the results obtained were not statistically significant, they are consistent with our hypothesis: subjects were in fact fastest while using the TC mapping, even though as a whole they believed the VP mapping was easier to use.

A. Method

Subjects interacted with the Crane using a teleoperation workstation. The workspace layout is shown in Figure 5. They were not allowed to view the robot workspace directly. Instead, they received two visual representations of the workspace. The first was a live video feed from a camera in the workspace (Figure 6); this provided the basis for the viewpoint-oriented coordinate frame. Since there was only output from one camera displayed in the video feed, this view lacked depth perception and rotational accuracy. It was more useful for the initial gross movements than the later fine motions. The second visual representation was virtual, showing the relative translational and rotational offset between the receptacle and the node (Figure 7). The translational offsets were represented by synthesizing views of the node from the front, side and top. Thus, if the Crane receptacle were too high, the front view would show the bottom of the receptacle above the node; if it were too far in front, the side view would show it to the left of the node. The rotational offsets were shown by simulating a top-down view of the Crane overlaid on top of the node. In Figure 7, the Crane receptacle (represented by the top-most outline) is docked onto the node (represented by the solid box with the pentagon on top), and the Crane is rotated just slightly clockwise with respect to the node.

Fig. 6. The video display from the Roving Eye. Here, the Crane receptacle is close to the node and is preparing for the fine positioning required for bracing.

Fig. 7. The visualizer window, showing synthesized front, side and top-down views of the Crane receptacle and node.
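The offsets the visualizer renders amount to a relative pose between receptacle and node. A minimal 2-D sketch of that computation (the function name, the 2-D simplification, and the example poses are ours; the actual fiducial-based computation is not given in the paper):

```python
import math

def relative_pose(node_pose, receptacle_pose):
    """Express the receptacle pose in the node's frame.
    Poses are (x, y, yaw_rad) in a common workspace frame."""
    nx, ny, nyaw = node_pose
    rx, ry, ryaw = receptacle_pose
    dx, dy = rx - nx, ry - ny
    c, s = math.cos(-nyaw), math.sin(-nyaw)
    # Rotate the world-frame offset into the node's frame.
    local_x = c * dx - s * dy
    local_y = s * dx + c * dy
    # Wrap the heading error into [-pi, pi).
    dyaw = (ryaw - nyaw + math.pi) % (2 * math.pi) - math.pi
    return (local_x, local_y, dyaw)

# Hypothetical situation resembling Figure 7: the receptacle is nearly
# over the node but rotated slightly clockwise.
offset = relative_pose((0.0, 0.0, 0.0), (0.0, 0.1, -0.05))
```

Each synthesized view in the visualizer can then be drawn from one component of this offset: the front and side views from the translational part, the top-down view from the heading error.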

The visualization was generated using the relative pose information calculated by the Roving Eye’s fiducial recognition software. Due to its detail, the visualization tool was very useful for fine-grained movements at the conclusion of the bracing procedure; however, due to field of view limitations, if the Crane was not close to the node the visualizer provided no information, rendering it useless for the initial gross movements toward the node.

Each subject was asked to use the SpaceMouse, GUI and video feed to brace one of the nodes with the Crane three times, using each of the three different coordinate frame mappings introduced in Section 4 (Figure 5). There was no lag time introduced between their input and the movement of the Crane.

The viewpoint coordinate frame (VP) depended on the camera providing the live video feed to the subject’s display. This camera was mounted on the Roving Eye, which remained stationary during the experiment. In this condition, input from the SpaceMouse translated directly to movement of the Crane as seen by the Roving Eye’s cameras – no mental spatial transformations were required while viewing the video feed. Rotation was centered at the center of the Crane.

The coordinate frame associated with the Crane’s body (RC) roughly faced away from the camera and node. According to the Crane’s predefined frame, pushing forward or pulling backward on the mouse caused the Crane to move laterally, along its edges. Rotation in this condition was also centered in the middle of the Crane.

Finally, the task-centered coordinate frame (TC) used the orientation of the Crane’s receptacle used to complete the task. The front was defined as the side of the receptacle closest to the node. Since this viewpoint simulated the teleoperator facing the receptacle, the left and right were the user’s left and right, not the receptacle’s. This made this condition the only left-handed coordinate frame¹. Rotation was centered at the receptacle.

At the beginning of each condition, subjects were told which coordinate mapping that condition used. They were then asked to complete a practice task until they felt familiar with the frame of reference. The practice task consisted of translating the robot in a certain direction and then rotating it (for example, moving the Crane forward and then rotating it counter-clockwise). Once subjects indicated they were satisfied with their completion of the practice task, the Crane was reset to its starting position and they were told to begin the bracing task. On average, subjects took less than a minute to complete the practice task.

Each of the 18 subjects performed the three conditions in different orders, with each of the six different possible orders being performed by three subjects. This was done to account for both learning and ordering effects. Since none of the subjects had worked with our system before, their teleoperation skills in general improved as the experiment progressed. The ordering also prevented biasing of the data by the order in which conditions were administered.
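This counterbalancing is a fully crossed ordering: all 3! = 6 orders of the three conditions, each assigned to three subjects. A sketch of the enumeration (the assignment scheme is illustrative; the paper does not say how subjects were mapped to orders):

```python
from itertools import permutations

conditions = ["VP", "RC", "TC"]
orders = list(permutations(conditions))  # the 6 possible condition orders

# Three subjects per order gives the paper's 18 subjects; cycling
# through the orders is one simple way to assign them.
assignment = {subj: orders[subj % 6] for subj in range(18)}
```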

Subjects were also given a distractor task in order to increase their mental workload while completing these experiments: we did not believe the bracing task alone was hard enough to impose sufficient workload on the subjects to see measurable differences in workload between conditions. The task was to watch a TV show segment while performing the bracing task and to summarize the segment at the end of each bracing.

¹It has been shown that in describing symmetric objects, speakers give inanimate objects their own left and right (e.g. “to the right of the table”), while at the same time assigning the ‘front’ of the object to be the side facing their viewpoint [9].

TABLE I
AVERAGE TASK COMPLETION TIMES, IN SECONDS

           VP               RC               TC
       µ       σ        µ       σ        µ       σ
     276.1   200.4    253.0   189.9    224.4   165.0

After finishing each condition, subjects were given a NASA-TLX survey [13] to determine their mental workload while performing the task. The survey reports overall workload, as well as mental load in several different categories: mental demand, temporal demand, effort, performance and frustration. Each factor, including overall workload, is rated on a scale from 0 to 100.

After filling out the TLX survey, subjects completed a written survey in which they evaluated the coordinate frame mapping of that task. They rated the overall mapping, just translation, and just rotation, using a scale from 1 (very easy/intuitive) to 5 (very hard/unintuitive). The survey also included three qualitative questions asking about what aspects of the mapping they most liked or disliked, and whether this mapping was one the subject would want to use again.

B. Results

During the experiment, the results of the TLX and written surveys were recorded, as well as the time it took each user to complete the bracing task under each of the conditions. Three subjects' results were discarded and replaced in order to eliminate outlier effects, as their mean task times were more than three standard deviations above the mean time of the group.

The subjects' timing and survey data were analyzed using a repeated measures ANOVA test. Because of very high inter-subject and intra-subject variability, the tests did not yield statistically significant results. However, in Section VI we discuss the observed trends, as well as the qualitative feedback from the users.
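For reference, the F statistic of a one-way repeated-measures ANOVA on per-subject, per-condition scores like these can be computed directly. This is a generic sketch with made-up numbers, not the authors' analysis code:

```python
import numpy as np

def repeated_measures_f(data):
    """One-way repeated-measures ANOVA F statistic.

    data: (n_subjects, n_conditions) array, one score per cell.
    Partitions total variance into condition, subject, and residual
    components; subjects act as their own controls.
    """
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_total = ((data - grand) ** 2).sum()
    ss_err = ss_total - ss_cond - ss_subj  # condition x subject residual
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_err / df_err)
```

With high residual variance relative to the condition effect, as in this study, the resulting F falls below the significance threshold.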

The timing data that was collected showed that the VP mapping condition took the longest time, and the TC the shortest (Table I)². The subjects' TLX results versus completion time are shown in Figure 8. These data agree with our initial hypothesis that the TC mapping leads to more efficient interactions between the human operator and the robot.

The TLX survey revealed small differences in subjects' workload in each of the three conditions (Table II). Interestingly, its results do not agree with the timing results. Instead, the VP mapping had the lowest average workload, and the TC mapping had the highest.

User evaluations were in accordance with the TLX surveys (Table III), and were also statistically insignificant. In all three categories, subjects mostly preferred the VP mapping to the other two, with the TC mapping generally least favored. Also, when asked whether they would like to use the current coordinate mapping again in the future, the majority (78%) of subjects responded that they would definitely use the VP mapping again if it were an option. This was followed by the TC mapping at 56%, with the RC mapping receiving the fewest positive responses at 50%.

²In all tables, µ refers to the mean and σ to the standard deviation of the data shown.

Fig. 8. Subjects' TLX results by performance.

TABLE II
TLX WORKLOAD RESULTS

Workload Type      VP           RC           TC
                 µ     σ      µ     σ      µ     σ
Overall         46.4  21.7   49.3  19.7   50.7  19.3
Mental          55.9  26.2   57.0  24.3   59.0  19.9
Temporal        35.9  23.3   37.5  28.7   38.7  26.2
Effort          51.2  22.6   54.0  24.1   53.9  24.4
Performance     39.8  27.0   39.6  23.9   40.6  21.1
Frustration     45.0  25.3   42.8  25.7   49.9  22.9

VI. DISCUSSION

As mentioned, the results from these experiments did not yield statistically significant values. The main reason for this was high inter-user variability: different subjects were simply good or bad at the task in general, which lessened the measured differences between conditions. Still, the results in combination with the users' qualitative survey feedback allow us to draw some conclusions about coordinate frames in teleoperation.

TABLE III
SUBJECT SURVEY RESULTS

User Preference    VP           RC           TC
                 µ     σ      µ     σ      µ     σ
Overall          1.9   .72    2.3   1.1    2.5   1.2
Translation      1.6   .85    2.4   1.2    2.3   1.1
Rotation         2.1   1.1    1.8   1.0    2.2   1.4
Use Again?       78%   –      50%   –      56%   –

Our initial conjecture was that the viewpoint mapping required more mental spatial transformations than the other mappings in order to reconcile the different frames of reference the user makes use of during the task, and thus would be the most difficult to use. However, the results from the TLX survey show that although all three mappings yield similar workloads, the VP mapping has a slightly lower workload. One possible reason for this is that with the VP mapping users performed fewer, not more, spatial transformations compared to the other mappings. To explain this, recall that the subjects initially had two main views of the workspace: the Roving Eye's cameras and the visualizer in the GUI. Many subjects reported that they used the live video feed for the initial gross movements required to get the Crane close to the node, and then switched to the visualizer in order to fine-tune the Crane's position and line it up to brace the node. This lower workload could be a result of the correspondence between the VP mapping and the display initially used, making it easier to get comfortable with manipulating the Crane using that mapping. With the task-centered and robot-centered mappings, users might have had a higher workload due to initially having to construct a mapping between what they saw on the monitor and the direction of Crane movement, making it harder to get accustomed to controlling the robot in that way.

The user rating surveys correlated strongly with the TLX survey results. Users reported finding the VP mapping more intuitive and easier to use than the other two mappings, again suggesting that it was easier to adjust to controlling the robot in that coordinate frame than in the other two. Subjects also indicated qualitatively that this mapping was very easy to adjust to, since what they saw was what they got, resulting in less frustration during the task. The TC mapping, in contrast, received much lower user ratings. This contrast also indicates that subjects may have been attempting to visualize the entire workspace while trying to complete the task, instead of using the TC mapping to simply complete the task locally, as we initially believed they would.

Before we ran the experiments, we assumed that all three measures – TLX survey results, user satisfaction and timing – would agree about which coordinate mapping was 'best.' The timing data collected, however, while still statistically correlated with the other data, is correlated inversely. That is, users were fastest with the mapping that gave them the highest workload and that they least preferred. The data show that although users reported they preferred the VP coordinate mapping to the TC and RC mappings, and had a lower mental workload with it, they were still able to complete the task more quickly using the TC mapping than the VP or RC mappings. This suggests that although the TC mapping requires slightly more mental effort, its intuitive relationship to the task workspace and goals ultimately makes it more useful and efficient than the other two mappings.

Finally, while our results are statistically insignificant, our initial conclusions are strengthened when one considers Figure 9, which shows the least squares mean times for each condition, separated by the order in which the condition was done. This graph suggests that the results could be affected by a steeper learning curve than we originally anticipated. Although the practice effect is not statistically significant, the graph clearly shows that the differential between mapping methods increased as the experiment progressed, suggesting that the true significance of the data is simply hidden by a large initial variance due to large learning and practice effects.

Fig. 9. Least squares means of each condition, reported by the order in which the condition was done.
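In a balanced design such as this one, the least squares means plotted in Figure 9 reduce to simple cell means per condition and order position. A sketch of that computation (the data tuples are hypothetical):

```python
from collections import defaultdict

def mean_times_by_order(trials):
    """Compute mean completion time per (condition, order_position).

    trials: iterable of (condition, order_position, seconds) records.
    Returns a dict mapping each (condition, order_position) cell to its
    mean time, the quantity plotted to reveal practice effects.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for cond, pos, seconds in trials:
        acc = sums[(cond, pos)]
        acc[0] += seconds
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Comparing the same condition's mean across order positions 1, 2 and 3 separates the practice effect from the mapping effect.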

Based on these results, we can make some recommendations about coordinate frames in teleoperation. As Figure 8 suggests and Tables I, II and III make explicit, a user's perceived workload is inversely related to their measured performance. This relationship implies that, at least in teleoperation, there is a trade-off of user workload against performance. There are several possible causes of this. One is that in situations where the task at hand is more difficult, an operator will compensate for the difficulty by concentrating more on the task, ultimately improving their performance. A second is that although the VP mapping does feel easier, it simply takes more commands to complete the task due to the mapping's indirect spatial relationship to the task. Thus, if efficiency is a key factor in teleoperation, and a high mental workload is acceptable, then the human operator should use the TC mapping; otherwise, if the operator wants a lower workload, or wants to concentrate on multiple tasks at once, the VP mapping might be the most appropriate.

Another solution to this trade-off is to give the operator the ability to adjust the coordinate mapping based not only on which task they are performing, but on which component of that task they are currently working. As previously mentioned, users in this study often indicated that the VP mapping was very natural to use while making gross movements to get the Crane close to the node, but that it was much harder to use for the fine movements required to actually align the Crane with the node and brace it. Similarly, users indicated that the TC mapping was easier to use during fine manipulation, but disliked how it felt while doing larger translational movements. If they had had the option of switching between these two mappings, they might have found both components of the task easier to do. This would be analogous to how humans smoothly switch between different frames of reference while giving directions or engaging in some other spatially oriented conversation, such as "move to the left, next to the table." This study suggests a need for the same ability in teleoperation.
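The phase-dependent switching proposed here could be as simple as thresholding distance to the target. The radius and labels below are illustrative, not values from the study:

```python
def select_mapping(dist_to_target, switch_radius=0.5):
    """Pick a coordinate mapping based on task phase: viewpoint (VP)
    for gross motion far from the target, task-centered (TC) for fine
    alignment once within switch_radius (meters, hypothetical value)."""
    return "TC" if dist_to_target < switch_radius else "VP"
```

A real interface would likely add hysteresis around the threshold, and perhaps an explicit operator override, so the mapping does not flip back and forth at the boundary.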

VII. FUTURE WORK

One obvious next step for this research is to conduct sufficient trials to achieve statistical significance. As mentioned in the previous section, one possible explanation for the high variance of these results is a large practice effect skewing the first two thirds of the data. Therefore, an experiment that allows users to become fully accustomed to teleoperating the Crane before actually beginning the task may yield more precise results.

Another extension is further investigation into whether the training times differ between coordinate mappings. It could be that although the VP mapping initially has a shorter training time and longer task completion time than the TC mapping, the training continues past the initial period, and that after continued use the task completion time decreases. Another viable theory is that although the TC mapping was perceived as more difficult because it was harder to learn, it would gain more favor from its users over time.

We would also like to see confirmation or refutation of our earlier conjectures about the effect of the use of the video displays on the teleoperator's preferred coordinate mapping. One approach would be to limit the GUI to only one of the two possible displays. By limiting users in this fashion, we would potentially be able to determine whether users do in fact prefer the VP mapping because it corresponds to the video display used for the initial movements, or if some other factor is involved.

Another issue to investigate further is the idea of changing the frame of reference based on the current task. One example is to have subjects teleoperate the Crane to do two grabbing tasks in a row, using two of the four different receptacles of the Crane, and changing the SpaceMouse mapping between them from the frame of the first receptacle to the frame of the second. Although the results and feedback from the two tasks would likely be the same, it is possible that the users would not like the switch of coordinate mappings, even though it only changes in order to retain the task-oriented aspect of that mapping. This would tell us whether the user is grounding their mental frame in the world or in the task space.

VIII. CONCLUSIONS

This paper discussed various issues involved with using different coordinate frames in robotic teleoperation. It described a series of experiments comparing these frames with one another, each with its own theoretical justification. The results, although not statistically significant, showed that while users preferred using a viewpoint-oriented coordinate frame, and had a lower mental workload while doing so, the time it took them to complete a teleoperation task was ultimately lowest while using a task-centered coordinate frame. It is thus apparent that user perception of performance and the performance itself are not always correlated in robotic teleoperation, suggesting that teleoperation system designers should decide which factors are more important and adjust their interfaces accordingly.

This notion of different coordinate frames during teleoperation generalizes to the other robots used in the Trestle assembly task. The Mobile Manipulator, in particular, has a number of possible coordinate mappings that could be used for teleoperation. Its base has an obvious front, which could be used as the coordinate frame of an RC mapping. Its end effector, which grasps the beams in order to dock them into the nodes, also has a coordinate frame that would correspond to an RC mapping. The coordinate frame of the end of the beam being inserted would be an appropriate choice for a TC mapping. And, of course, the position of the camera in the workspace could also be used as a basis for a VP mapping for teleoperation of the Mobile Manipulator as well. These generalizations can easily broaden further to include other assembly and teleoperation tasks.

ACKNOWLEDGMENT

The authors would like to thank Brennan Sellner, Frederik Heger, Sanjiv Singh and Nancy Pollard for their help with and support of this work.

REFERENCES

[1] T. Fong, C. Thorpe, and C. Baur, "Robot, asker of questions," Robotics and Autonomous Systems, vol. 42, 2003.

[2] M. Hearst, "Trends & controversies: Mixed-initiative interaction," IEEE Intell. Syst., vol. 14, no. 5, pp. 14–23, 1999.

[3] M. Skubic, D. Perzanowski, A. Schultz, and W. Adams, "Using spatial language in a human-robot dialog," in 2002 IEEE Int'l Conference on Robotics and Automation, Washington, DC, May 2002.

[4] W. Wang and P. Milgram, "Viewpoint optimization for virtual environment navigation using dynamic tethering – a study of tether rigidity," in Proc. 47th Annual Meeting of the Human Factors & Ergonomics Society, Denver, CO, 2003.

[5] A. K. Bejczy, "Virtual reality in telerobotics," in Proc. International Conference on Advanced Robotics, Barcelona, Spain, 1995.

[6] D. B. Kaber and M. Chow, "Human-robot interaction research and an approach to mobile-telerobot interface design," in Proc. Fifteenth Triennial Congress of International Ergonomics Association, Seoul, Korea, Aug. 2003.

[7] B. P. DeJong, J. E. Colgate, and M. A. Peshkin, "Improving teleoperation: Reducing mental rotations and translations," in American Nuclear Society 10th International Conference on Robotics and Remote Systems for Hazardous Environments, Gainesville, FL, Mar. 2004.

[8] M. A. Diftler, R. Platt, Jr., C. J. Culbert, R. O. Ambrose, and W. J. Bluethmann, "Evolution of the NASA/DARPA Robonaut control system," in Proc. IEEE International Conference on Robotics and Automation, Taipei, Taiwan, 2003.

[9] H. A. Taylor and B. Tversky, "Perspective in spatial descriptions," Journal of Memory and Language, vol. 35, pp. 371–391, 1996.

[10] J. Brookshire, S. Singh, and R. Simmons, "Preliminary results in sliding autonomy for assembly by coordinated teams," in Proc. International Conference on Intelligent Robots and Systems (IROS), Sendai, Japan, 2004.

[11] R. Bostelman, J. Albus, N. Dagalakis, and A. Jacoff, "A RoboCrane project: An advanced concept for large scale manufacturing," in Proc. AUVSI Conference, Orlando, FL, July 1996.

[12] SpaceMouse manual. [Online]. Available: http://www.3dconnexion.com/spacemouseplus.htm

[13] S. G. Hart and L. E. Staveland, Development of the NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research, P. A. Hancock and N. Meshkati, Eds. Amsterdam: North-Holland, 1988.