[IEEE 2010 IEEE Virtual Reality Conference (VR) - Boston, MA, USA (2010.03.20-2010.03.24)] 2010 IEEE Virtual Reality Conference (VR) - Mixed reality in virtual world teleconferencing
Mixed Reality in Virtual World Teleconferencing
Tuomas Kantonen (1), Charles Woodward (1), Neil Katz (2)
(1) VTT Technical Research Centre of Finland, (2) IBM Corporation
ABSTRACT
In this paper we present a Mixed Reality (MR) teleconferencing application based on Second Life (SL) and the OpenSim virtual world. Augmented Reality (AR) techniques are used for displaying virtual avatars of remote meeting participants in real physical spaces, while Augmented Virtuality (AV), in the form of video-based gesture detection, enables capturing of human expressions to control avatars and to manipulate virtual objects in virtual worlds. The use of Second Life for creating a shared augmented space to represent different physical locations allows us to incorporate the application into existing infrastructure. The application is implemented using the open source Second Life viewer, ARToolKit and OpenCV libraries.
KEYWORDS: mixed reality, virtual worlds, Second Life, teleconferencing, immersive virtual environments, collaborative augmented reality.
INDEX TERMS: H.4.3 [Information System Applications]: Communications Applications - computer conferencing, teleconferencing, and video conferencing; H.5.1 [Information Systems]: Multimedia Information Systems - artificial, augmented, and virtual realities.
1 INTRODUCTION
The need for effective teleconferencing systems is increasing, mainly for economic and environmental reasons, as transporting people to face-to-face meetings consumes a lot of time, money and energy. Massively multi-user virtual 3D worlds have lately gained popularity as teleconferencing environments. This interest is not only academic: one of the largest virtual conferences was held by IBM in late 2008 with over 200 participants. The conference, hosted in a private installation of the Second Life virtual world, was a great success, saving an estimated $320,000 compared to the expense of holding the conference in the physical world .
In this paper, we present a system for mixed reality teleconferencing where a mirror world of a conference room is created in Second Life and the virtual world is displayed in the real-life conference room using augmented reality techniques. The gestures of the real participants are reflected back to Second Life. The participants are also able to interact with shared virtual objects on the conference table. A synthetic illustration of such a setting is shown in Figure 1.
The structure of the paper is as follows. Section 2 describes the background and motivation for our work. Section 3 explains previous work related to the subject. Section 4 goes into some explanation of Second Life technical detail. Section 5 gives an overview of the system we are developing. Section 6 gives a description of our prototype implementation. Section 7 provides a discussion of results, as well as items for future work. Conclusions are given in Section 8.
2 BACKGROUND
There are several existing teleconference systems, ranging from old but still often used audio teleconferencing and video teleconferencing to web-based conferencing applications. 2D groupware and even massively multi-user 3D virtual worlds have also been used for teleconferencing.
Each of these existing systems has its pros and cons. Conference calls are quick and easy to set up with no hardware other than a mobile phone, yet they are limited to audio only and require a separate channel, e.g. for document sharing. Videoconferencing adds a new modality as pictures of participants are transferred, but it requires more hardware and bandwidth and is quite expensive at the high end. Web conferencing is lightweight and readily supports document and application sharing, but it lacks natural interaction between users.
We see several advantages of using a 3D virtual environment, such as Second Life or OpenSim among many other platforms, as an alternative means for real-time teleconferencing and collaboration. First, the users are able to see all meeting participants and get a sense of presence not possible in a traditional conference call. Second, the integrated voice capability of 3D virtual worlds provides spatial and stereo audio. Third, the 3D environment itself provides a visually appealing shared meeting environment that is not possible with other means of teleconferencing. However, the lack of natural gestures constitutes a major drawback for real interaction between the participants.
Figure 1. Illustration of Mixed Reality teleconference: Second Life avatar among real people, wearing ultra lightweight data glasses, sharing a virtual object on the table, inside virtual room, displayed in CAVE.
IEEE Virtual Reality 2010, 20 - 24 March, Waltham, Massachusetts, USA. 978-1-4244-6238-4/10/$26.00 ©2010 IEEE
3 RELATED WORK
In our work, virtual reality and augmented reality are combined in a similar manner as in the original work by Piekarski et al. . Their work was quite limited in the amount of augmented virtuality, as only the position and orientation of users were transferred into the virtual environment. Our work focuses on interaction between augmented reality and a virtual environment. Therefore our work is closely related to immersive telepresence environments such as [3, 4]. Several different immersive 3D video conferencing systems are described in .
Local collaboration in augmented reality has been studied for example in [6, 7]. Collaboration is achieved by presenting co-located users the same virtual scene from their respective viewpoints and providing the users simple collaboration tools such as virtual pointers. Remote AR collaboration has mostly been limited to augmenting live video such as in  or later augmenting a 3D model reconstructed from multiple video cameras as in . Remote sharing of the augmented virtual objects and applications has been studied for example in .
Our work uses Second Life and the open source implementation of the Second Life server called OpenSim, which are multi-user virtual worlds, as the virtual environment for presenting shared virtual objects. Using Second Life in AR has been previously studied by Lang et al.  as well as Stadon  although their work does not include augmented virtuality.
In the simplest case, augmented virtuality can be achieved by displaying real video inside a virtual environment as in . This approach has also been used for virtual videoconferencing in  and augmenting avatar heads in . Another form of augmented virtuality is avatar puppeteering, where human body gestures are recognized and used to control the avatar, either only the avatar's face as in  or the whole avatar body as in . However, only little previous work has been presented on augmenting Second Life avatars with real-life gestures. The main exception is the VR-Wear system  for controlling an avatar's facial expressions.
4 SECOND LIFE VIRTUAL WORLD
Second Life is a free, massively multi-user on-line game-like 3D virtual world for social interaction. It is based on community-created content and it even has a thriving economy. The virtual world users, called residents, are represented by customizable avatars and can take part in different activities provided by other residents.
For interaction, Second Life features spatial voice chat, text chat and avatar animations. Only the left hand of the avatar can be freely animated on-the-fly, while all other animations rely on pre-recorded skeletal animations that the user can create and upload to the SL server.
For non-expert SL users, however, meetings in SL can be quite static, with the "who is currently speaking" indicator being the only active element. In our experience, actively animating the avatar while talking takes considerable training and directs the user's focus away from the discussion.
Second Life has a client-server architecture and each server is scalable to tens of thousands of concurrent users. The server is proprietary to Linden Lab, but there also exists the community-developed, SL-compatible server OpenSimulator .
5 SYSTEM OVERVIEW
In this project we developed a prototype and proof-of-concept of a video conference meeting taking place between Second Life and the real world. Our system combines an immersive virtual environment, collaborative augmented reality and human gesture recognition in a way that supports collaboration between real and virtual worlds. We call the system Augmented Collaboration in Mixed Environments (ACME).
In the ACME system, some participants of the meeting occupy a space in Second Life while others are located around a table in the real world. The physical meeting table is replicated in Second Life to support virtual object interactions as well as avatar occlusions. The people in the real world see the avatars augmented around a real-world table, displayed by video see-through glasses, immersive stereoscopic walls or within a video teleconference screen. Participants in Second Life see the real-world people as avatars around the meeting table, augmented with hand and body gestures. Both the avatars and real people can interact with virtual objects shared between them, on the virtual and physical conference tables respectively.
The main components of the system are: co-located users wearing video see-through HMDs, a laptop for each user running the modified SL client, a ceiling-mounted camera above each user for hand tracking, and remote users using the normal SL client.
The system is designed for restricted conference room environments where meeting participants are seated around a well-lit, uniformly colored table. As an alternative to HMDs, a CAVE-style stereo display environment or plain video screens can be used.
Figure 2 shows how the ACME system is experienced in a meeting between two participants, one attending the meeting in Second Life and the other one in real life. It should be noted that the system is designed for multiple simultaneous remote and co-located users. A video of the ACME system is available at .
6.1 General
The ACME system is implemented by modifying the open source Second Life viewer . The viewer is kept backward compatible with original Second Life so that, even though more advanced features might require server-side changes, all major ACME features are also available when the user is logged in to the original Second Life world.
The SL client was run on Dell Precision M6400 laptops (Intel Mobile Core 2 Duo 2.66 GHz, 4 GB DDR3 533 MHz). Logitech QuickCam Pro for Notebooks USB cameras (640x480 RGB, 30 FPS) were used for video see-through functionality, while a Unibrain Fire-I firewire camera (640x480 YUV, 7.5 FPS) was used for hand tracking. eMagin Z800 (800x600, 40-degree diagonal FOV) and MyVu Crystal 701 (640x480, 22.5-degree diagonal FOV) HMDs were used as video see-through displays.
Usability studies of the system are currently limited to the project's internal testing of individual components. The author has evaluated the technical feasibility of each feature, and comments have been collected during multiple public demonstrations, including a demo at ISMAR 2009. We have been able to identify key points where the application can overcome limitations of current systems, and also points where improvements are needed to create a really usable system. A proper user study will be conducted during 2010 with HIT Lab NZ, comparing the ACME system with other means of telecommunication. Detailed plans of the study have not yet been made.
6.2 Augmenting reality
To be able to use SL for video see-through AR, three steps are required: video capture, camera pose estimation and rendering of correctly registered virtual objects.
Currently the ACME system supports two different video sources, either ARToolKit  video capture routines for USB devices or the CMU  firewire camera driver API. ARToolKit OpenGL subroutines are used for video rendering.
The HMD camera pose is estimated by ARToolKit marker tracking subroutines. Multiple markers are placed around the walls of the conference room and the table so that at least one marker is always seen by the user wearing an HMD. We experimented with 20 cm by 20 cm and 50 cm by 50 cm markers at distances of 1 to 3 meters from the user. The distance between markers was about three times the width of the marker.
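The multi-marker idea can be illustrated with a small sketch: if the pose of a marker in the room is known and the tracker reports that marker's pose in camera coordinates, the camera pose in the room follows by composing the two transforms. This is plain Python and not the ARToolKit API; the 4x4 row-major matrix layout and all function names are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t):
    """Invert a rigid transform [R | p]; the inverse is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    inv_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [inv_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def camera_pose_in_room(room_from_marker, camera_from_marker):
    """room_from_camera = room_from_marker * inverse(camera_from_marker)."""
    return mat_mul(room_from_marker, invert_rigid(camera_from_marker))


# Hypothetical example: a marker 2 m along x and 1 m up in the room,
# seen 3 m straight ahead of the camera, with no rotation involved.
room_from_marker = [[1.0, 0.0, 0.0, 2.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0],
                    [0.0, 0.0, 0.0, 1.0]]
camera_from_marker = [[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 3.0],
                      [0.0, 0.0, 0.0, 1.0]]
pose = camera_pose_in_room(room_from_marker, camera_from_marker)
camera_position = [pose[i][3] for i in range(3)]
```

With several markers visible, the same computation could be run per marker and the result from the most reliably tracked one used; how the ACME system chooses between markers is not stated in the text.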
The real-world coordinate system is defined by a marker that lies on the conference table. Registration with SL coordinates is done by fixing one SL object to the real-world origin and using the object's coordinate axes as unit vectors. This anchor object is selected in the ACME configuration file. If the marker is not on the table, the anchor object must be transformed accordingly.
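As a minimal sketch of this registration, assuming the anchor object's position and its three axes are available as vectors in SL coordinates, a point measured in marker coordinates maps to SL coordinates as the anchor origin plus a weighted sum of the anchor axes. The function name and the numeric values are hypothetical.

```python
def real_to_sl(point, anchor_origin, anchor_axes):
    """Map a point from marker (real-world) coordinates to SL coordinates.

    anchor_origin: SL position of the anchor object (the real-world origin).
    anchor_axes:   the anchor object's x, y and z axes as unit vectors,
                   each expressed in SL coordinates.
    """
    return tuple(
        anchor_origin[i] + sum(point[k] * anchor_axes[k][i] for k in range(3))
        for i in range(3)
    )


# Hypothetical anchor: placed at (128, 128, 22) in SL and rotated
# 90 degrees about the vertical axis relative to the SL region axes.
anchor_origin = (128.0, 128.0, 22.0)
anchor_axes = ((0.0, 1.0, 0.0),   # anchor x axis in SL coordinates
               (-1.0, 0.0, 0.0),  # anchor y axis in SL coordinates
               (0.0, 0.0, 1.0))   # anchor z axis in SL coordinates

sl_point = real_to_sl((1.0, 0.5, 0.2), anchor_origin, anchor_axes)
```

Because the axes are unit vectors, one real-world meter maps to one SL unit in this sketch; any other scale would need an explicit factor.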
Occlusion is the ability of a physical object to cover those parts of virtual objects that are physically behind it. In the ACME system, occlusion is implemented by modeling the physical space in the virtual world and using the virtual model as a mask when rendering virtual objects. The virtual model itself is not visible in the augmented image, as otherwise it would cover the very physical objects we want to see. A similar method was used in .
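The masking logic can be sketched per pixel. The example below is an illustration only: it assumes depth values are available for both the occlusion model and the virtual objects, and it operates on flat Python lists rather than on the GPU depth buffer that a real viewer would typically use. The paper does not state how the mask is realized in the renderer.

```python
def composite(video, mask_depth, virtual_color, virtual_depth):
    """Composite virtual objects over video, honoring an occlusion model.

    All arguments are flat lists with one entry per pixel. A depth of None
    means nothing was rendered at that pixel. The occlusion model only
    contributes depth, never color, so the video stays visible there.
    """
    out = []
    for v, md, vc, vd in zip(video, mask_depth, virtual_color, virtual_depth):
        if vd is None:
            out.append(v)    # no virtual object at this pixel
        elif md is not None and md <= vd:
            out.append(v)    # physical object is in front: show video
        else:
            out.append(vc)   # virtual object is in front: draw it
    return out


video = ["V", "V", "V", "V"]
mask_depth = [None, 1.0, 3.0, 1.0]      # modeled physical geometry
virtual_color = ["A", "A", "A", None]   # e.g. avatar pixels
virtual_depth = [2.0, 2.0, 2.0, None]

frame = composite(video, mask_depth, virtual_color, virtual_depth)
```

In the second pixel the modeled table (depth 1.0) lies in front of the avatar (depth 2.0), so the camera image is kept and the avatar appears to be behind the real table.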
The ACME system does not place any restrictions on what kind of virtual objects can be augmented. Any virtual object can also be used as an occlusion model. However, properly augmenting transparent objects has not yet been implemented.
6.3 Hand tracking
For hand tracking, a camera is set up over the conference room table. The camera is oriented downwards so that the whole table is visible in the camera image. The current implementation supports only one hand tracking camera.
Hand tracking video capturing and processing is done in a separate thread from rendering so that a lower video frame rate can be used without affecting rendering of the augmented video.
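The threading arrangement can be sketched as follows. This is a generic pattern in Python, not code from the modified SL viewer: a worker thread runs at the slow camera rate and publishes only its latest result, which the render loop reads without waiting. The names LatestValue, tracking_loop, start_tracker and grab_and_process are hypothetical; the 7.5 FPS default mirrors the hand tracking camera mentioned earlier.

```python
import threading


class LatestValue:
    """Thread-safe holder for the most recent tracking result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def set(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value


def tracking_loop(grab_and_process, latest, stop_event, period_s):
    """Capture and process frames at the camera rate until asked to stop."""
    while not stop_event.is_set():
        latest.set(grab_and_process())
        stop_event.wait(period_s)  # sleeps, but wakes early on stop


def start_tracker(grab_and_process, period_s=1 / 7.5):
    """Start the hand tracking worker; returns (latest, stop_event, thread)."""
    latest = LatestValue()
    stop_event = threading.Event()
    thread = threading.Thread(
        target=tracking_loop,
        args=(grab_and_process, latest, stop_event, period_s),
        daemon=True,
    )
    thread.start()
    return latest, stop_event, thread
```

The render loop would call latest.get() once per frame and reuse the previous hand position whenever the tracker has not produced a new one, so rendering never blocks on the slower camera.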
Hands are recognized from the video image by HSV (hue, saturation and value) segmentation. The HSV color space has been shown to perform well for skin detection . Each HSV channel is thresholded and the results are combined into a single binary mask. A calibration utility was created for calibrating threshold limits to take different lighting conditions into account.
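A minimal sketch of the per-channel thresholding, using only the Python standard library instead of OpenCV, is given below. The threshold ranges are illustrative assumptions and not the calibrated values used in the ACME system.

```python
import colorsys


def skin_mask(image, h_range, s_range, v_range):
    """Return a binary mask (rows of 0/1) for an RGB image.

    image:   rows of (r, g, b) tuples with channel values in 0..255.
    *_range: (low, high) limits in 0..1 for hue, saturation and value.
    A pixel is set only if all three channels fall inside their limits.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            inside = (h_range[0] <= h <= h_range[1]
                      and s_range[0] <= s <= s_range[1]
                      and v_range[0] <= v <= v_range[1])
            mask_row.append(1 if inside else 0)
        mask.append(mask_row)
    return mask


# One row with a skin-like pixel, a blue table pixel and a white pixel.
image = [[(224, 172, 105), (30, 60, 200), (255, 255, 255)]]
mask = skin_mask(image, h_range=(0.0, 0.14), s_range=(0.2, 0.7),
                 v_range=(0.35, 1.0))
```

A calibration step would adjust the six limits for the current lighting. Note that hue wraps around at red, so a range covering both ends of the hue scale would need two intervals in this sketch.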
The current implementation uses only a single camera for hand tracking; therefore proper 3D hand tracking has not yet been implemented. The user's hand is always assumed to hover 15 cm above the table so that the user can do simple interactions with virtual objects on the table.
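Under the fixed-height assumption, a hand pixel in the ceiling camera image can be turned into a 3D position with a simple pinhole model. The sketch below assumes a camera looking straight down, no lens distortion, and table axes aligned with the image axes; the intrinsic parameters and the camera height are hypothetical values.

```python
HAND_HEIGHT_M = 0.15  # hover height above the table assumed in the text


def pixel_to_table(u, v, camera_height_m, fx, fy, cx, cy):
    """Project an image pixel onto the plane 15 cm above the table.

    camera_height_m: height of the camera above the table surface.
    fx, fy, cx, cy:  pinhole intrinsics in pixels (assumed known).
    Returns (x, y, z) in meters, with the origin on the table directly
    below the camera and z pointing up.
    """
    depth = camera_height_m - HAND_HEIGHT_M  # camera to hand plane
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, HAND_HEIGHT_M)


# Hypothetical setup: 640x480 camera mounted 2.15 m above the table.
hand = pixel_to_table(420, 240, camera_height_m=2.15,
                      fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

A pixel 100 columns to the right of the image center maps to a point about 0.4 m from the point below the camera in this setup. The error grows when the real hand height deviates from the assumed 15 cm.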
6.4 Gesture interaction
Interaction in the ACME system is divided into two categories: interacting with other avatars and interacting with virtual objects.
Avatar interaction is more relaxed, as the intent of body language is conveyed even when avatar movements don't precisely match the user's motion. Object interaction requires finer control, as objects can be small and in many cases the precise relative position of objects is of importance.
The orientation of the user's face is a strong cue about where the user is currently focusing. When the user is wearing a video see-through HMD, we use the orientation of the camera, already computed for augmented reality visualization, to rotate the avatar's head accordingly.
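One way to turn the tracked camera orientation into head rotation angles is sketched below. It assumes the camera's viewing direction is available as a vector in a Z-up room coordinate system; the function name and the yaw and pitch convention are assumptions, and how the modified viewer actually drives the avatar's head is not described in the text.

```python
import math


def head_angles(forward):
    """Return (yaw, pitch) in degrees for a viewing direction.

    forward: camera viewing direction (fx, fy, fz) in a Z-up frame.
    Yaw is measured in the horizontal plane from the +x axis,
    pitch is the elevation above that plane.
    """
    fx, fy, fz = forward
    yaw = math.degrees(math.atan2(fy, fx))
    pitch = math.degrees(math.atan2(fz, math.hypot(fx, fy)))
    return yaw, pitch


# Looking along +y, level with the horizon.
yaw_a, pitch_a = head_angles((0.0, 1.0, 0.0))
# Looking along +x and upwards at 45 degrees.
yaw_b, pitch_b = head_angles((1.0, 0.0, 1.0))
```

In practice the angles would likely be clamped to a plausible range for a human neck before being applied to the avatar.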
User hands are tracked by the hand...