

An Extensible Mirror World from User-Generated Content

Severi Uusitalo, Peter Eskolin, Yu You, Petros Belimpasakis

Nokia Research Center

ABSTRACT

In this paper we describe a system for creating a navigable mirror world, utilizing community photographs of real-life environments. We present the essential architecture and a prototype solution for not only geotagging but also spatially structuring content. Mash-up interfaces are available to 3rd parties for linking and georeferencing their content.

INDEX TERMS: H.3.3 [Information Systems]: Information Storage and Retrieval – information search and retrieval; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism – virtual reality

1 INTRODUCTION

Web-based photo services have become exceedingly popular for sharing multimedia content. These services often provide a map for viewing the photos, for discovering mutually related photos, as well as for providing contextual understanding for the viewer. However, simply georeferencing photos does not lead to easy navigation or understanding of the spatial relations between the photos, particularly when they were taken by different people. It is possible to structure sets of photos further by using structure-from-motion computation, but this approach alone typically leaves many photos out of the presentation. Camera phones equipped with a Global Positioning System (GPS) receiver and other sensors can nowadays automatically embed geo-coordinates in the metadata of a photo. Our architecture and prototype implementation focus on a complete system that presents structured content, originating from popular photo sharing services, as a spatial presentation. Additionally, the system can be used as the basis for future use cases, by providing web mash-up interfaces to 3rd parties.

2 RELATED WORK

Davis et al. [1] describe the use of metadata for organizing, searching, and browsing digital photos, and also for creating new experiences. They emphasize automatic means of collecting the metadata from the context. Google Earth provides features for georeferencing photos and showing them in their respective places on a geographically contoured and 3D-navigable view.

The Aspen moviemap [2] utilized a series of spatially structured images to provide an interactive navigation experience through the town of Aspen. Panoramas allow the viewer to “look around”, while a moviemap allows moving around. Currently, Google Street View links sequences of panorama photos captured by specially equipped cars. The system of Torniai et al. [5] uses a separate sensor-equipped device to record the camera heading at the time a photo is taken. A browsing interface uses this metadata to provide arrows for moving towards photos taken in the respective direction. The system does not capture the full attitude, as pitch and roll are not detected.

3 THE CORE SERVICE

Our service has similarities to the interactive moviemaps, where a feeling of immersion is obtained through spatially structured imagery and navigation through it. Together with a map and a satellite image, the neighboring photos and videos provide the user with cognitive cues for understanding the semantic structure of a place [6]. An eight-week user study [4] conducted in France and in Finland has guided the development of the prototype described in this paper.

It all starts when the user takes photos with the regular camera application of a mobile camera phone, as usual. A software component identifies the exact moment of content capture and records the sensor parameters needed to present the captured content in 3D space [3]. Those include the exact GPS location and the yaw, pitch and roll angles of the phone. The metadata can be embedded directly in the Exchangeable image file format (EXIF) header of the file, which is supported by the JPEG format. The user can then upload the photo to an on-line sharing service (e.g. Flickr), utilizing the built-in sharing software that ships with the device.
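
The paper does not specify which EXIF fields the prototype uses to carry the capture pose, so the following is only a minimal sketch of the idea, assuming the piexif library and our own tag choices (GPSImgDirection for yaw, a JSON-encoded user comment for pitch and roll).

```python
# Sketch: embedding capture pose into a JPEG's EXIF block with piexif.
# The field choices are illustrative, not the prototype's documented format.
import json
import piexif

def to_dms_rationals(deg_float):
    """Convert a decimal degree value into EXIF (deg, min, sec) rationals."""
    deg = int(abs(deg_float))
    minutes_float = (abs(deg_float) - deg) * 60
    minutes = int(minutes_float)
    seconds = round((minutes_float - minutes) * 60 * 100)
    return ((deg, 1), (minutes, 1), (seconds, 100))

def embed_pose(jpeg_path, lat, lon, yaw_deg, pitch_deg, roll_deg):
    exif_dict = piexif.load(jpeg_path)
    gps = exif_dict.setdefault("GPS", {})
    gps[piexif.GPSIFD.GPSLatitudeRef] = b"N" if lat >= 0 else b"S"
    gps[piexif.GPSIFD.GPSLatitude] = to_dms_rationals(lat)
    gps[piexif.GPSIFD.GPSLongitudeRef] = b"E" if lon >= 0 else b"W"
    gps[piexif.GPSIFD.GPSLongitude] = to_dms_rationals(lon)
    # Yaw maps naturally onto GPSImgDirection; pitch and roll have no standard
    # EXIF tag, so here they are stored as JSON in the user comment field.
    gps[piexif.GPSIFD.GPSImgDirectionRef] = b"T"  # relative to true north
    gps[piexif.GPSIFD.GPSImgDirection] = (round(yaw_deg * 100), 100)
    comment = json.dumps({"pitch": pitch_deg, "roll": roll_deg}).encode("ascii")
    exif_dict.setdefault("Exif", {})[piexif.ExifIFD.UserComment] = b"ASCII\x00\x00\x00" + comment
    piexif.insert(piexif.dump(exif_dict), jpeg_path)

# Example usage: a photo taken in central Tampere, facing south-east.
embed_pose("photo.jpg", 61.4978, 23.7610, yaw_deg=135.0, pitch_deg=-5.0, roll_deg=1.5)
```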

As a result of their activity in taking photos, the users are provided with an ever-changing representation of their part of the world: a mirror world whose level of detail is defined by them. The mirror world is presented primarily via a standard web browser.

Figure 1. Web UI of Image Space, displaying nearby photos in the 3D space.

Photos are displayed in a 3D space, and other photos visible from the current viewpoint are shown as angled rectangles based on their orientation and perspective. These serve as hyperlinks to the respective photos. The rectangles are presented in the mirror world in the same attitude the camera phone had when the photo was taken (Figure 1). Selecting a rectangle moves the virtual camera to the respective photo, first displaying a flying animation towards that image in order to give the illusion of moving in the space. By navigating through a scene by selecting photos, the user gains awareness of the surroundings of an individual photo.
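
Placing a photo quad in the same attitude as the capturing phone amounts to a plain 3D rotation of an image rectangle by the recorded yaw, pitch and roll. The sketch below (plain NumPy; the function and parameter names are ours, not the prototype's rendering code) computes the world-space corners of such a quad.

```python
# Generic geometry sketch: orient a photo rectangle by the captured camera attitude.
import numpy as np

def rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """Compose Z (yaw) * Y (pitch) * X (roll) rotations; angles in degrees."""
    y, p, r = np.radians([yaw_deg, pitch_deg, roll_deg])
    rz = np.array([[np.cos(y), -np.sin(y), 0], [np.sin(y), np.cos(y), 0], [0, 0, 1]])
    ry = np.array([[np.cos(p), 0, np.sin(p)], [0, 1, 0], [-np.sin(p), 0, np.cos(p)]])
    rx = np.array([[1, 0, 0], [0, np.cos(r), -np.sin(r)], [0, np.sin(r), np.cos(r)]])
    return rz @ ry @ rx

def photo_quad_corners(position, yaw_deg, pitch_deg, roll_deg, width=1.0, height=0.75):
    """Corners of a photo rectangle centered at `position`, rotated by the capture attitude."""
    w, h = width / 2.0, height / 2.0
    local = np.array([[-w, -h, 0], [w, -h, 0], [w, h, 0], [-w, h, 0]])  # flat quad in local frame
    rot = rotation_matrix(yaw_deg, pitch_deg, roll_deg)
    return local @ rot.T + np.asarray(position)

# Example: a quad 1.6 m above ground, yawed 135 degrees, slightly tilted.
print(photo_quad_corners((10.0, 2.0, 1.6), yaw_deg=135.0, pitch_deg=-5.0, roll_deg=1.5))
```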

e-mail: {first.lastname}@nokia.com, P.O. Box 1000, 33721 Tampere, Finland


4 THE SYSTEM ARCHITECTURE

The system consists of three main entities, namely the mobile client, the backend infrastructure and the Web UI client, as shown in Figure 2.

Figure 2. High-level architecture of our system

We have been envisioning Image Space as an open platform, rather than a stand-alone service. Following the web mash-up paradigm, there are two ways of achieving that. Firstly, Image Space can utilize external content repositories and photo sharing services, so that content can reside elsewhere yet still be used with the Image Space visual experience. Secondly, the service provides web interfaces to 3rd parties for linking their services and offerings to the mirror world, thus making Image Space a 3D front end to further content.

4.1 Utilizing External Content Repositories

As on-line media sharing services are nowadays widely used, people already have their content and social networks there, and we do not expect them to move them to yet another new service. Any such content repository service could be linked to Image Space by fulfilling a set of requirements. This way, a large number of content repository services could be linked to the core Image Space service, as shown in Figure 3, providing their content and social network associations to the 3D mirror world.

Figure 3. Utilizing multiple external content repositories

The core requirements imposed on these repositories include:

1. Provide interfaces so that other services can access content.
2. Allow uploading of media, along with enhanced geo-location and camera pose metadata.
3. Allow other services to search for and retrieve the matched media within a given geographical area (e.g. bounded-box search).
4. Support search result clustering, so that a group of search results can be grouped under one entity.
5. Allow the metadata of a piece of media to be updated (e.g. for updating media orientation metadata if it is manually edited by the user in the mirror world 3D space).

For supporting social extensions, the core requirements are:

6. Provide the social network details, such as who the user's friends are and which content items are shared.
7. Provide commenting, tagging, rating, etc.

Many popular on-line services already support these requirements (e.g. the widely used Flickr photo service).
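
As an illustration of requirement 3, Flickr's public REST API already supports a bounded-box search; a minimal sketch follows (the API key is a placeholder, and this is simply one way a repository could satisfy the requirement, not the prototype's actual integration code).

```python
# Sketch of a bounded-box search (requirement 3) against Flickr's REST API:
# fetch geo-tagged photos inside a bounding box. Supply your own API key.
import requests

FLICKR_REST = "https://api.flickr.com/services/rest/"

def photos_in_bbox(api_key, min_lon, min_lat, max_lon, max_lat):
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "bbox": f"{min_lon},{min_lat},{max_lon},{max_lat}",
        "has_geo": 1,        # only return geo-tagged photos
        "extras": "geo",     # include latitude/longitude in each result
        "format": "json",
        "nojsoncallback": 1,
    }
    response = requests.get(FLICKR_REST, params=params, timeout=10)
    response.raise_for_status()
    return response.json()["photos"]["photo"]

# Example: photos around central Tampere, Finland.
for photo in photos_in_bbox("YOUR_API_KEY", 23.74, 61.49, 23.78, 61.51):
    print(photo["id"], photo.get("latitude"), photo.get("longitude"))
```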

4.2 Opening up Interfaces for 3rd Parties

Moreover, as a web platform Image Space provides interfaces to other 3rd parties, allowing them to mash up their services with ours. Via a secure authentication framework, the user can authorize the "linkage" of the two services. The external 3rd-party service can then make calls to the Image Space API on behalf of that user. An example of external elements that can be inserted in the Image Space mirror world view is micro-blogging entries linked to a geo-location. There are already a few services that combine micro-blogging with location and visualize the user posts on top of a 2D map. This kind of data (a simple blog post along with its associated geo-metadata) can be visualized in our system as part of the mirror world, along with the user-generated multimedia content.
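
The Image Space API itself is not documented in this paper, so the endpoint, payload fields and token below are purely hypothetical placeholders; the sketch only illustrates what a geo-referenced micro-blog entry pushed by a 3rd-party service on the user's behalf might look like.

```python
# Purely hypothetical sketch: the URL, payload fields and bearer token are our
# own placeholders, not the documented Image Space API.
import requests

IMAGE_SPACE_API = "https://example.com/imagespace/api/v1/entries"  # hypothetical endpoint

def push_microblog_entry(access_token, text, lat, lon):
    """Push a geo-referenced micro-blog post into the mirror world on the user's behalf."""
    payload = {
        "type": "microblog",
        "text": text,
        "location": {"latitude": lat, "longitude": lon},
    }
    response = requests.post(
        IMAGE_SPACE_API,
        json=payload,
        headers={"Authorization": f"Bearer {access_token}"},  # token granted when the user linked the services
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

push_microblog_entry("USER_DELEGATED_TOKEN", "Great coffee by the cathedral!", 61.4981, 23.7603)
```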

Different types of content could also be mashed up with Image Space, such as articles related to points of interest, user contextual information or advertising material.

5 CONCLUSION

We have presented a mirror world solution based on user-generated content. By exploiting a sensor set of camera, GPS, accelerometer and magnetometer in a single device, the system structures and spatially presents the photos created by the user in an easy-to-navigate environment. We implemented a prototype of the system for incremental development of a web service platform. The web interfaces to external services allow our system to evolve towards an expandable mirror world platform that links to other services and brings their content into the virtual 3D space.

REFERENCES

[1] M. Davis, S. King, N. Good, and R. Sarvas. From context to content: leveraging context to infer media metadata. In Proceedings of the 12th Annual ACM International Conference on Multimedia (MULTIMEDIA '04), New York, NY, USA, October 2004, pages 188-195.
[2] D. Kirk, A. Sellen, C. Rother, and K. Wood. Understanding photowork. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2006), ACM Press, 2006, pages 761-770.
[3] M. Kähäri and D. J. Murphy. MARA - Sensor Based Augmented Reality System for Mobile Imaging. In Proceedings of the Fifth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2006), Santa Barbara, USA, October 2006.
[4] A. Lucero, M. Boberg, and S. Uusitalo. Image Space: Capturing, Sharing and Contextualizing Personal Pictures in a Simple and Playful Way. In Proceedings of ACM Advances in Computer Entertainment (ACE 2009), Athens, Greece, October 2009.
[5] C. Torniai, S. Battle, and S. Cayzer. The Big Picture: Exploring Cities through Georeferenced Images and RDF Shared Metadata. CHI 2007 Workshop "Imaging the City", San Jose, USA, April 2007.
[6] S. Uusitalo, P. Eskolin, and P. Belimpasakis. A Solution for Navigating User-Generated Content. In Proceedings of the Eighth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2009), Orlando, Florida, USA, October 2009.
