
NuNav3D: A Touch-less, Body-driven Interface for 3D Navigation
Charilaos Papadopoulos∗ Daniel Sugarman† Arie Kaufman‡

Computer Science, Stony Brook University

ABSTRACT

We introduce NuNav3D, a body-driven 3D navigation interface for large displays and immersive scenarios. While 3D navigation is a core component of VR applications, certain situations, like remote displays in public or large visualization environments, do not allow for using a navigation controller or prop. NuNav3D maps hand motions, obtained from a pose recognition framework which is driven by a depth sensor, to a virtual camera manipulator, allowing for direct control of 4 DOFs of navigation. We present the NuNav3D navigation scheme and our preliminary user study results under two scenarios, a path-following case with tight geometrical constraints and an open-space exploration case, while comparing our method against a traditional joypad controller.

Index Terms: H.5.2 [User Interfaces]: Evaluation/methodology—Interaction Styles; I.3.7 [Three-Dimensional Graphics and Realism]: Virtual Reality

1 INTRODUCTION & RELATED WORK

Traditionally, navigation interfaces for VR and immersive scenarios require some sort of tracked navigation device, prop, or marker-based tracking system. World-in-Miniature [9] requires a projection either inside the VR environment or on a surface and must be manipulated by touch or props. The concept of gesture-based navigation for VR was discussed in [5] and [7]. Head-directed navigation [3] maps head orientations to navigation operations; however, it does not permit looking in a direction different from that of the movement vector. Hand-tracking has been used to allow the user to grasp virtual objects and either orbit around them or pull himself towards them [8], and to point towards a destination in the virtual space [6]. Locomotion modalities such as Walking-in-Place [10] also allow navigation on a 2D plane while providing a greater sense of presence than alternatives. LaViola et al. [4] have proposed merging foot gestures with rotational tracking of the torso for hands-free navigation in VR. The Pengufly [11] paradigm utilizes 2D projections of the tracked positions of the user's hands and head to define a navigation vector.

As already stated, all of the methods described above require some degree of marker-based tracking of either a prop or the user himself, which might be infeasible in certain situations. Additionally, most techniques limit the number of DOFs exposed and potentially constrain the user to navigating on a 2D plane. Our proposed technique, NuNav3D, utilizes body pose recognition driven by a single depth camera [1][2] and directly maps rotations and translations to simple hand movements, for simultaneous manipulation of a total of 4-6 DOFs without requiring any tracking markers or props.

∗e-mail: [email protected]  †e-mail: [email protected]  ‡e-mail: [email protected]

2 THE NUNAV3D PARADIGM

NuNav3D operates on poses. We have developed a simple pose recognition scheme, based on the concept of pose uniformization. A uniform pose $P^{\text{Unif}}_A$ results from a pose $P_A$ by placing the skeleton at the center of the coordinate system, compensating for torso rotation by applying the inverse rotation to all skeletal joints, and then normalizing all bone segments to unit length. Pose recognition is performed by comparing two poses in uniform space using the metric $s_{P_A P_B} = \frac{1}{n}\sum_{i=0}^{n} |P^{\text{Unif}}_A(i) - P^{\text{Unif}}_B(i)| < s_{\text{Thres}}$ ($s_{\text{Thres}}$ being the pose recognition threshold scalar) and utilizing a simple voting scheme over a sliding window of poses. The above method appears to be relatively robust against variance in body geometry.
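As a concrete illustration of the uniformization and recognition steps, the following Python sketch implements the per-joint distance metric and a sliding-window vote. The torso-root index, bone ordering, threshold, and window size are illustrative assumptions; the paper does not specify these values.

```python
# Illustrative sketch of pose uniformization and the pose-distance metric.
# Joint layout, bone ordering, threshold, and window size are assumptions,
# not values taken from the NuNav3D implementation.
from collections import deque
import numpy as np

def uniformize(joints, bones, torso_rotation):
    """Map a pose (3 x N joint positions) into uniform space: center on the
    torso root, undo the torso rotation, rescale each bone to unit length."""
    centered = joints - joints[:, [0]]          # joint 0 assumed to be the torso root
    derotated = torso_rotation.T @ centered     # inverse of the (orthonormal) torso rotation
    uniform = derotated.copy()
    for parent, child in bones:                 # bones ordered root-outward (parent before child)
        seg = derotated[:, child] - derotated[:, parent]
        uniform[:, child] = uniform[:, parent] + seg / (np.linalg.norm(seg) + 1e-9)
    return uniform

def pose_distance(a_unif, b_unif):
    """Mean per-joint Euclidean distance between two uniform poses
    (the metric s_{P_A P_B} from the text)."""
    return np.mean(np.linalg.norm(a_unif - b_unif, axis=0))

class PoseRecognizer:
    """Sliding-window voting: the navigation pose counts as recognized
    when a majority of recent frames fall below s_thres."""
    def __init__(self, template_unif, s_thres=0.15, window=15):
        self.template = template_unif
        self.s_thres = s_thres
        self.votes = deque(maxlen=window)

    def update(self, pose_unif):
        self.votes.append(pose_distance(pose_unif, self.template) < self.s_thres)
        return sum(self.votes) > len(self.votes) // 2
```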

We define a navigation pose as illustrated in Figure 1. Once the navigation pose is detected, the system stores the user's position (defined as $P^{\text{Nav-Base}}$), relaxes $s_{\text{Thres}}$, and on each following update calculates $v^{\text{Left}}_{\text{Offset}} = P^{\text{Nav-Current}}_{\text{LeftHand}} - P^{\text{Nav-Base}}_{\text{LeftHand}}$ and $v^{\text{Right}}_{\text{Offset}} = P^{\text{Nav-Current}}_{\text{RightHand}} - P^{\text{Nav-Base}}_{\text{RightHand}}$ (which are also scaled based on the relaxed pose recognition threshold to achieve roughly unit length).

Figure 1: An example NuNav3D navigation pose, with the user's arms parallel to the torso and the forearms perpendicular to the arms. The system stores a snapshot of the user's pose $P^{\text{Nav-Base}}$ when the navigation pose is detected and calculates offset vectors for the user's left and right hands, which are mapped to translations and rotations respectively.
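To make the offset-vector bookkeeping described above concrete, here is a minimal Python sketch of capturing the base pose and computing the per-hand offsets on each update. The hand-joint indices and the scaling by the relaxed threshold are assumptions for illustration only.

```python
# Sketch of base-pose capture and per-hand offset vectors.
# Hand-joint indices and the relaxed-threshold scaling are assumptions.
import numpy as np

LEFT_HAND, RIGHT_HAND = 7, 11   # hypothetical joint column indices in the 3 x N pose array
S_THRES_RELAXED = 0.4           # relaxed recognition threshold, reused here as a scale factor

class NavigationState:
    """Created when the navigation pose is first recognized."""
    def __init__(self, base_pose):
        self.base = base_pose.copy()    # snapshot of P_Nav-Base

    def offsets(self, current_pose):
        """v_Offset^Left and v_Offset^Right, scaled to roughly unit length."""
        v_left  = (current_pose[:, LEFT_HAND]  - self.base[:, LEFT_HAND])  / S_THRES_RELAXED
        v_right = (current_pose[:, RIGHT_HAND] - self.base[:, RIGHT_HAND]) / S_THRES_RELAXED
        return v_left, v_right
```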

On each update, $v^{\text{Left}}_{\text{Offset}}$ and $v^{\text{Right}}_{\text{Offset}}$ are used to manipulate a virtual camera. Assuming a virtual camera with position $C_{\text{Pos}}$ and basis vectors $\hat{C}_x$, $\hat{C}_y$, $\hat{C}_z$, for time steps $t$ and $t+1$ we define a translation vector $T_{t+1} = (v^{\text{Left}}_{\text{Offset}}.y)^{k_{\text{Translate}}} \cdot \hat{C}_z + (v^{\text{Left}}_{\text{Offset}}.x)^{k_{\text{Translate}}} \cdot \hat{C}_x$, with $k_{\text{Translate}}$ being a ramp exponent. The position of the virtual camera $C$ is defined as $C_{t+1} = C_t + T_{t+1} \cdot \lambda_{\text{Translate}}$, with $\lambda_{\text{Translate}}$ being a scalar scaling factor. Effectively, this maps the left hand's vertical movements to translations along the camera's view direction and horizontal movements to strafing. Rotations are handled by defining rotation transformations $R_y$ and $R_x$ about the camera's $y$ and $x$ axes (for yaw and pitch respectively). The angle value for $R_y$ is equal to $-(v^{\text{Right}}_{\text{Offset}}.x)^{k_{\text{Rotate}}}$ radians (similarly for the $x$ axis). The quaternions are composed into $R_{t+1}$ and used to rotate $\hat{C}_x$ and $\hat{C}_z$. Effectively, this maps horizontal movements of the right hand to camera yaw and vertical movements to camera pitch. For both translation and rotation, if the offset vector magnitude is smaller than an experimentally defined threshold, no transformation occurs (thus filtering out minor, naturally occurring hand movements). Figure 2 graphically demonstrates the above mapping.

Figure 2: An indicative application of offset vectors, manipulating a virtual camera. The top subfigure illustrates the calculation of the per-hand offset vectors, while the bottom subfigure shows a third-person view of a virtual camera at time step $t$ (before applying the offset vectors) and $t+1$, after applying the corresponding transformations. The resulting rotation, denoted by the blue shaded area, is the difference in orientation of the $\hat{C}_x$ basis vector between $t$ and $t+1$.
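Putting the two offset vectors to work, the following Python sketch applies one camera update in the spirit of the mapping above. The dead-zone value, ramp exponents, and scaling factors are placeholder assumptions (the paper only states that such parameters exist), and quaternion composition is delegated to SciPy's Rotation for brevity.

```python
# Minimal sketch of one NuNav3D-style camera update: left hand -> translation,
# right hand -> rotation. All constants below are placeholder assumptions.
import numpy as np
from scipy.spatial.transform import Rotation as R

DEAD_ZONE   = 0.05   # offsets below this magnitude are ignored (filters hand jitter)
K_TRANSLATE = 3      # ramp exponent for translation (odd, so the sign is preserved)
K_ROTATE    = 3      # ramp exponent for rotation
LAM_TRANS   = 0.1    # lambda_Translate: translation scale per update
LAM_ROT     = 0.02   # additional rotation scale (radians) per update

def update_camera(cam_pos, cam_x, cam_y, cam_z, v_left, v_right):
    """Apply the translation T_{t+1} and the composed yaw/pitch rotation R_{t+1}."""
    # Translation: vertical left-hand offset moves along the view direction C_z,
    # horizontal offset strafes along C_x.
    if np.linalg.norm(v_left) > DEAD_ZONE:
        t = (v_left[1] ** K_TRANSLATE) * cam_z + (v_left[0] ** K_TRANSLATE) * cam_x
        cam_pos = cam_pos + t * LAM_TRANS

    # Rotation: horizontal right-hand offset yaws about C_y, vertical offset pitches about C_x.
    if np.linalg.norm(v_right) > DEAD_ZONE:
        yaw   = R.from_rotvec(-(v_right[0] ** K_ROTATE) * LAM_ROT * cam_y)
        pitch = R.from_rotvec(-(v_right[1] ** K_ROTATE) * LAM_ROT * cam_x)
        composed = yaw * pitch                   # R_{t+1}
        cam_x, cam_z = composed.apply(cam_x), composed.apply(cam_z)

    return cam_pos, cam_x, cam_y, cam_z
```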

3 EVALUATION

We conducted a user study, comparing NuNav3D against a conventional joypad controller under a general, open-space exploration task (discovery of hidden objects) and a path-following task with strict geometrical constraints (which was based on simulated virtual colonoscopy data). Figure 3 shows two indicative views of the trial datasets. Our quantitative metric was elapsed time (ET). Subjects were asked to rate their familiarity with 3D navigation on a scale of 1 to 5 (5 being the most experienced). The study was conducted on a total of 12 individuals, with a mean age of 24.1 (SD 1.89). The mean 3D navigation experience level was 3.25 (SD 1.65).

Figure 3: Path-following dataset, inspired by virtual colonoscopy (top), and open-space exploration dataset (bottom; the inset shows hidden object positions and the starting camera position).

Over the entirety of the study subjects, NuNav3D presented lower performance than the joypad. Specifically, for the path-following task, NuNav3D was on average 79.9% slower than the joypad controller. We observed that users tended to overtranslate the virtual camera when using NuNav3D, which resulted in corrective maneuvers when exiting the tight constraints of the path-following task. In the exploration task, mean ET was 31% longer for NuNav3D. However, breaking down the results by experience level yields interesting indications. Specifically, for users identifying themselves as belonging to the 1-2 experience bracket, the path-following task ET using NuNav3D was on average 37.7% (SD 20.9%) slower. In the exploration task, the same user group was only 3.73% (SD 24.1%) slower in ET. In contrast, users in the 3-5 experience bracket were 100.9% (SD 83.9%) slower in the path-following task and 45% (SD 82.2%) slower in the exploration task.

The subjects were also asked post-study qualitative evaluation questions, including which interface they would choose to use in the future and which interface they found less intrusive to the experience. 66.6% commented that NuNav3D was less intrusive and 50% stated that they would choose to use NuNav3D in the future.

4 DISCUSSION AND FUTURE WORK

While the sample size of the user study was not large enough to draw conclusions, it provides interesting indications for directing future work. For non-expert users, who were unfamiliar with the joypad controller as well as with NuNav3D, performance was comparable in the open-space exploration task. Going forward, we plan to investigate solutions for the over-translation issue observed in the path-following task, including ways to visually represent the navigation data to the user, as well as options for applying the navigation offset vectors to the virtual camera. Additionally, we plan to conduct a much more thorough experimental evaluation of the system, including reviewing user fatigue over time and exploring other metrics such as simultaneous DOF utilization and precision of movement. Finally, we wish to explore a multi-modal interaction scheme that includes NuNav3D for navigation and other body-driven modalities for tasks such as picking.

ACKNOWLEDGEMENTS

We wish to thank AMD for providing the FirePro GPU and AVNet for the tiled-display array used in our experimental setup. This work was supported by NSF grants IIS0916235 and CNS0959979 and NIH grant R01EB7530.

REFERENCES

[1] Microsoft Kinect. URL: http://www.xbox.com/en-US/kinect.

[2] PrimeSense NITE Middleware. URL: http://www.primesense.com/.

[3] A. Fuhrmann, D. Schmalstieg, and M. Gervautz. Strolling through cyberspace with your hands in your pockets: Head directed navigation in virtual environments. In Virtual Environments '98 (Proceedings of the 4th EUROGRAPHICS Workshop on Virtual Environments).

[4] J. J. LaViola, D. A. Feliz, D. F. Keefe, and R. C. Zeleznik. Hands-free multi-scale navigation in virtual environments. In Proceedings of the 2001 Symposium on Interactive 3D Graphics.

[5] M. Lucente, G. J. Zwart, and A. D. George. Visualization space: A testbed for deviceless multimodal user interface. Computer Graphics, 31(2), 1997.

[6] M. Mine. Virtual environment interaction techniques. Technical report, UNC Chapel Hill CS Dept., 1995.

[7] R. O'Hagan and A. Zelinsky. Visual gesture interfaces for virtual environments. In Proceedings of the First Australasian User Interface Conference. IEEE Computer Society, 2000.

[8] J. S. Pierce, A. S. Forsberg, M. J. Conway, S. Hong, R. C. Zeleznik, and M. R. Mine. Image plane interaction techniques in 3D immersive environments. In Proceedings of the 1997 Symposium on Interactive 3D Graphics.

[9] R. Stoakley, M. J. Conway, and R. Pausch. Virtual reality on a WIM: Interactive worlds in miniature. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1995.

[10] M. Usoh, K. Arthur, M. C. Whitton, R. Bastos, A. Steed, M. Slater, and F. P. Brooks, Jr. Walking > Walking-in-Place > Flying, in virtual environments. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '99, pages 359-364, 1999.

[11] A. von Kapri, T. Rick, and S. Feiner. Comparing steering-based travel techniques for search tasks in a CAVE. In IEEE Virtual Reality 2011.
