
Analysis of IR-based Virtual Reality Tracking Using Multiple Kinects

Srivishnu Satyavolu*
Dept. of Computer Science, Univ. of Minnesota Duluth

Gerd Bruder
Dept. of Computer Science, Univ. of Würzburg

Pete Willemsen
Dept. of Computer Science, Univ. of Minnesota Duluth

Frank Steinicke
Dept. of Computer Science, Univ. of Würzburg

Figure 1: Multiple Kinects are arranged in a circular manner. The circled Kinect is the primary tracker. Kinects circled with dotted lines provide IR interference.

ABSTRACT

This article presents an analysis of using multiple Microsoft Kinects to track users in a VR system. More specifically, we analyze the capability of Kinects to track infrared points for use in VR applications. Multiple Kinect sensors may serve as a low-cost, affordable means to track position information across a large lab space in applications where precise location tracking is not necessary. We present our findings and analysis of the tracking range of a Kinect sensor in situations in which multiple Kinects are present. Overall, the Kinect sensor works well for this application, and in lieu of more expensive options, Kinect sensors may be a viable choice for very low-cost tracking in VR applications.

1 INTRODUCTION

This article analyzes the capability of Microsoft's Kinect sensor to act as a tracking system in VR applications. The introduction of Microsoft's Kinect instigated much interest across the VR community, as the device contains a rich array of sensors for capturing 3D scene information at a very low cost of approximately U.S. $150. Traditional tracking systems are robust and precise, but tend to be quite expensive compared with the price of the Kinect. We present our efforts using multiple Microsoft Kinect sensors to provide an inexpensive, infrared position tracking system for virtual reality. We analyze the Kinect's ability to robustly track an infrared marker across a 7 m x 9 m lab space, focusing on the accuracy of position data acquired over distance and in the presence of multiple Kinect sensors that can potentially cause severe IR interference.

2 RELATED WORK

The Kinect has seen widespread adoption outside the traditional game console market. Since its launch, the Kinect has been used in numerous projects integrating with either the Microsoft Kinect SDK or with open-source Linux drivers such as libfreenect and OpenNI. Specifically, a Kinect sensor is a motion-sensing input device that comes with the Microsoft Xbox 360 console. The sensor contains an RGB camera, a structured infrared (IR) light projector, an infrared camera, and microphones. Some background information on the device is available from both Microsoft and other sources [6]. Kinects need to be accurately calibrated to achieve robust depth estimates [4, 2]. Microsoft's KinectFusion system demonstrated robust acquisition of depth data for 3D reconstruction [3].

*e-mail: [email protected]

3 TRACKING POSITIONS WITH KINECTS

The overall objective of this work is to understand the efficacy of using multiple Microsoft Kinects as tracking devices for monitoring a user's position and skeletal system in VR applications. While a single Kinect is capable of tracking an IR light across the lab, multiple Kinects afford different views of users within the tracked space that are not possible with a single Kinect. Combining multiple views can strengthen the skeletal tracking mechanisms provided by APIs such as Microsoft's Kinect SDK or the OpenNI framework. However, these APIs' effective ranges are limited to about 1.2 to 3.5 meters, which could limit usable mobility in VR applications where longer tracking ranges are desired. Understanding the tracking range of the Kinect is therefore an important factor for Kinect-based position tracking.

The operational range of skeletal tracking APIs is limited, likely because the depth resolution of the Kinect decreases with distance from the sensor. Hence, it becomes difficult to estimate skeletal joint positions as users get farther away. The tracking system described in this paper exploits the fact that, even though it is difficult to estimate the entire set of skeletal positions beyond a certain range, it is still possible to get depth values reliable enough to estimate the user's position across a large lab space. To obtain reliable position tracking, we use a single IR marker attached to the user's head that can be tracked over a large space.

While this article focuses on IR tracking, we specifically test the tracking sensitivity to IR interference from multiple Kinects. Interference from multiple Kinects may be an issue when they are used together to improve skeletal tracking: each IR projector will potentially project structured IR light into the scene, and possibly into other Kinects' IR sensors. In theory, this interference could cause problems with the depth values from the Kinect. Some research has been done to reduce interference by using a Kinect-shuttering approach [5, 1]. Our experiments specifically focus on testing the Kinect-based IR tracking system against interference from multiple Kinects.

3.1 Tracking IR Marker

There are a couple of issues that need to be resolved in order to obtain the marker's world position. First, the IR marker creates its own small circle of IR interference that makes it difficult to precisely identify it in the IR image. Moreover, interference from the other Kinects does cause problems for detecting the IR marker. In these situations, our tracker can confuse the actual IR marker with the bright IR interference from the additional Kinects. To overcome these issues, we apply locality-of-neighborhood and static background separation techniques.


IEEE Virtual Reality 2012, 4-8 March, Orange County, CA, USA. 978-1-4673-1246-2/12/$31.00 ©2012 IEEE

Figure 2: Images from (left to right): background depth image, current depth image with the tracked object, extracted foreground image with just the tracked object.

Background depth information of the entire scene is captured by merging depth images of the same scene obtained over time. The acquisition of the background depth image is done just once, upon starting the tracker system. The background image is used to extract foreground images in real time with a simple background separation technique using OpenCV. A series of OpenCV image morphological operations is performed on the extracted foreground depth and IR images to remove external noise. These images can then be used to extract the position and depth information of the IR marker.
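As a rough illustration, the merge-and-subtract step might be sketched as follows. This is a pure NumPy stand-in for the OpenCV calls the paper uses; the median merge and the threshold value are our own illustrative assumptions, not the authors' exact parameters:

```python
import numpy as np

def build_background(depth_frames):
    # Merge depth frames captured over time; a per-pixel median rejects
    # transient readings, and zeros (Kinect "no reading") are ignored.
    stack = np.stack(depth_frames).astype(float)
    stack[stack == 0] = np.nan
    background = np.nanmedian(stack, axis=0)
    return np.nan_to_num(background)

def extract_foreground(depth, background, thresh=50.0):
    # A pixel belongs to the foreground when its depth differs from the
    # stored background by more than `thresh` raw depth units.
    mask = np.abs(depth.astype(float) - background) > thresh
    return np.where(mask, depth, 0), mask
```

In the actual system, OpenCV morphological operations (e.g. an opening) would then clean speckle noise from the foreground mask before the marker search.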

Each IR image from the Kinect is analyzed in real time to find the brightest pixel in the foreground (i.e., our IR marker). Figure 2 illustrates the segmented depth information. Pixel depth is calculated using a locality-of-neighborhood principle, which estimates the IR marker's depth value by examining the surrounding pixels outside the marker's small circle of interference in the depth image. Using the mean of valid depths in a 15 by 15 pixel window worked as a good estimate for this implementation.
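A minimal sketch of this search, assuming raw depth images in which 0 marks an invalid reading. The outer window size matches the 15-by-15 figure above; the inner blind-spot radius is our own illustrative guess:

```python
import numpy as np

def find_marker_pixel(ir_image, fg_mask):
    # The candidate marker is the brightest IR pixel inside the foreground.
    masked = np.where(fg_mask, ir_image, 0)
    return np.unravel_index(np.argmax(masked), masked.shape)

def estimate_marker_depth(depth, row, col, inner=3, outer=7):
    # Locality of neighborhood: the marker blinds the depth sensor in a
    # small circle around itself, so average valid depths in the
    # (2*outer+1) x (2*outer+1) window while skipping the inner blind spot.
    h, w = depth.shape
    vals = []
    for r in range(max(0, row - outer), min(h, row + outer + 1)):
        for c in range(max(0, col - outer), min(w, col + outer + 1)):
            if abs(r - row) <= inner and abs(c - col) <= inner:
                continue  # inside the marker's interference circle
            if depth[r, c] > 0:  # 0 means "no reading" on the Kinect
                vals.append(float(depth[r, c]))
    return sum(vals) / len(vals) if vals else 0.0
```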

The real-world IR marker position P_world is then computed from the extracted raw values using the following equations, given in posts by Nicolas Burrus and Stéphane Magnenat:

z_world = 0.1236 * tan(d_image / 2842.5 + 1.1863)
x_world = (x_image - cx_d) * z_world / fx_d
y_world = (y_image - cy_d) * z_world / fy_d

where fx_d, fy_d, cx_d, and cy_d are the intrinsics of the depth camera. The point reprojected onto the color image, P'_world, is then:

P'_world = R * P_world + T
x_rgb = (x'_world * fx_rgb / z'_world) + cx_rgb
y_rgb = (y'_world * fy_rgb / z'_world) + cy_rgb

where fx_rgb, fy_rgb, cx_rgb, and cy_rgb are the intrinsics of the color camera, and R and T are the rotation and translation parameters estimated during stereo calibration.
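Assuming known calibration values, this two-stage conversion might look like the following in code. The numeric intrinsics used in any call are placeholders, not the paper's calibration results:

```python
import math
import numpy as np

def depth_to_world(x_img, y_img, d_raw, fx_d, fy_d, cx_d, cy_d):
    # Raw-value-to-metres formula from the Burrus/Magnenat posts,
    # followed by a pinhole back-projection with the depth intrinsics.
    z = 0.1236 * math.tan(d_raw / 2842.5 + 1.1863)
    x = (x_img - cx_d) * z / fx_d
    y = (y_img - cy_d) * z / fy_d
    return x, y, z

def world_to_rgb(p_world, R, T, fx_rgb, fy_rgb, cx_rgb, cy_rgb):
    # Transform into the colour camera frame (stereo extrinsics R, T),
    # then project with the colour camera intrinsics.
    x, y, z = R @ np.asarray(p_world) + np.asarray(T)
    return x * fx_rgb / z + cx_rgb, y * fy_rgb / z + cy_rgb
```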

3.2 Experimental Setup

Our experimental setup consists of 5 Kinects mounted on chairs of the same height, with one acting as the primary Kinect (Figure 1) while the remaining four introduce interference. They are all arranged to provide tracking centered on a 7 m by 9 m space. We used a WorldViz IR marker mounted on a cart as the tracked point. The depth sensor of the primary Kinect is calibrated using intrinsic parameters calculated with RGBDemo v0.4, Nicolas Burrus' implementation of OpenCV's standard chessboard recognition techniques. The tracked space consists of 30 manually placed points that are roughly a meter apart from each other. At each point, 1000 samples of the IR marker position are collected, which are then averaged to get the final estimate of the tracked position. This is done both with and without interference from the other Kinects, and the entire step is repeated for all 30 points. The jitter in these IR marker positions is then analyzed over all 1000 samples, with and without interference.
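The per-point reduction described above, taking the mean of the 1000 samples as the position estimate and the sample standard deviation as the jitter measure, amounts to something like the following; the function name is ours:

```python
import numpy as np

def summarize_point(samples):
    # One tracked floor point: `samples` is an (N, 3) array of raw IR
    # marker positions.  The mean is the final position estimate and the
    # per-axis sample standard deviation quantifies the jitter.
    s = np.asarray(samples, dtype=float)
    return s.mean(axis=0), s.std(axis=0, ddof=1)
```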

Figure 3: Comparison of the Kinect tracking data aligned with data acquired from a WorldViz PPTH tracking system.

4 RESULTS

Figure 3 compares the tracked positions with and without interference. The green circles represent the estimates of the points measured without interference, while the red crosses represent the estimates of the points measured with interference from other Kinects. Standard deviations of reported pixel positions were nearly zero in all cases except for some points with high interference. At the single point with maximized interference, the mean depth value was 1016.0 (SD = 89.6723); without interference at this location, the mean depth value was 1039.6 (SD = 1.0484). Additionally, when compared with our WorldViz PPTH system, the Kinect was off by roughly 3 cm on average.

5 CONCLUSION

We observed that interference is not a major concern for IR tracking with the Kinect. Thus, based on our initial experiments, we believe the Kinect can serve as a low-cost tracking system. With multiple Kinects, it should be possible to implement a more functional tracking system that allows multiple users to interact using gestures across VR lab spaces.

REFERENCES

[1] K. Berger, K. Ruhl, C. Brümmer, Y. Schröder, A. Scholz, and M. Magnor. Markerless motion capture using multiple color-depth sensors. In Proceedings of Vision, Modeling and Visualization (VMV), pages 317-324, 2011.

[2] C. D. Herrera, J. Kannala, and J. Heikkilä. Accurate and practical calibration of a depth and color camera pair. In Proceedings of the International Conference on Computer Analysis of Images and Patterns (CAIP), pages 437-445, 2011.

[3] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, and A. Fitzgibbon. KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), pages 559-568, 2011.

[4] K. Khoshelham. Accuracy analysis of Kinect depth data. In ISPRS Workshop on Laser Scanning 2011, 2011.

[5] Y. Schröder, A. Scholz, K. Berger, K. Ruhl, S. Guthe, and M. Magnor. Multiple Kinect studies. Technical Report 09-15, ICG, TU Braunschweig, 2011.

[6] J. Smisek, M. Jancosek, and T. Pajdla. 3D with Kinect. Technical report, CTU Prague, 2011.
