[ieee 2013 ieee virtual reality (vr) - lake buena vista, fl (2013.3.18-2013.3.20)] 2013 ieee virtual...

2
Webizing Mobile AR Contents Sangchul Ahn University of Science and Technology Heedong Ko Korea Institute of Science and Technology Steven Feiner Columbia University ABSTRACT This paper presents a content structure to build mobile AR applica- tions in HTML5 to achieve a clean separation of mobile AR con- tents from their application logic to scale like the web. By ex- tending POIs (Point of Interest) to objects and places with Uni- form Resource Identifier (URI), we could build objects of inter- est for mobile AR application as DOM (Document Object Model) elements and control their behavior and user interactions through DOM events. Using our content structure, a mobile AR applica- tions can be developed as normal HTML documents seamlessly under current web eco-system. Index Terms: K.5.1 [Information Interface and Presentation (e.g., HCI)]: Multimedia Information Systems—Artificial, augmented, and virtual realities; K.5.4 [Information Interface and Presentation (e.g., HCI)]: Hypertext/Hypermedia—Architectures 1 I NTRODUCTION Most mobile AR applications have been developed in two extreme cases: one at a small scale indoors on a table top or in a room envi- ronment with a handful of objects in place, and the other at a very large scale outdoors with mostly location and orientation informa- tion with little or no object recognition and tracking. Recently, in- door AR applications are getting off of the desktop and becoming mobile to the next room, across the hallway, and to other buildings or across town. These mobile AR applications must be aware of their own location context in order to bring the objects of interest to bear dynamically for augmenting.[2] KARML[4] extended KML, location-based contents standard by OGC, to include the AR XML elements. KML is very popular through Google Maps and Google Earth applications but it is not based on HTML so the interoperability between KARML and ex- isting web contents is limited. Our approach is to make use of ex- isting HTML standard but extend the use of URI to identify and reference physical objects and places. Any HTML DOM elements can be linked to these physical URI’s and DOM eventing mech- anism can be handled by javascript layer for behaviour and user interaction. X3DOM[1] also attempts to integrate X3D scene el- ements as HTML DOM elements and events seamlessly like us. However, it does not have the notion of physical URI’s as objects and places of interest for the in-situ AR and no notion of location context. In our approach, location-based AR contents can be inter- mixed within regular HTML documents by augmenting any HTML DOM element with its location context through extension of CSS3. There are some commercial HTML5-based AR platforms like Junaio and ARchitect. Junaio provides AREL as a javascript AR API for location-based AR and GLUE as its XML extension for image recognition-based AR. ARchitect is another approach of javascript SDK for AR by Mobilizy. Both provide javascript li- brary for building AR application with location and image recog- nition based tracking. Their approach introduces AR functional- ities procedurally through javascript objects. This procedural ap- e-mail: [email protected] proach makes AR contents difficult to intermix and interoperate with standard HTML web contents and their interaction through DOM event handling mechanism. Our AR content structure is based on HTML and selection/binding mechanism of CSS with their associated javascript event handling of DOM events. 2 MOBILE AR CONTENT STRUCTURE IN HTML Webizing mobile AR content refers to the process of combining virtual and real world objects through three components: use of URI for dealing with physical objects, CSS Place Media module, and the DOM event extensions for situated interaction. 2.1 Point of Interest In our model, POI refers to physical object and it is viewed as a physical resource, an extension of web resource. It is assumed that any POI has its URI and accessible through HTTP. However identifying and tracking physical objects require the feature sam- ples suited for the recognition method. To address this, we use the REST (REpresentational State Transfer) to provide the various types of feature samples as a representation of its URI via HTTP content negotiation. And the browser can choose one of available tracking methods and request the feature sample data to URI of tar- get POI. As a result, the AR content does not need to include all feature samples for tracking but just refer to URIs of target POIs. Place is an extension of POI that can contain other POIs their locations. In addition to the default properties of POI, they have several environmental properties: boundary, Local Coordinate Ref- erence System (LCRS), and local POI map. Boundary is the ge- ometry separating the inside/outside of the place. The subspace of place is defined a LCRS description represented to a simpli- fied schema of Geography Markup Language (GML) CRS package model.[6] A local POI map contains the locations of POIs within the place boundary. By this extension, the space partitioning tree of the physical world can be constructed recursively. The scope is dynamically determined by the user location. The place is regarded as a POI when the user is outside. However it forms a spatial envi- ronment when the user enters it (e.g., building, room, and earth). 2.2 Augmenting HTML Elements with CSS Elements in a HTML document have been rendered to a 2D plain page and their layout is defined by Cascading Style Sheets (CSS). HTML elements describing POIs and their augmented model must be rendered in 3D environment, not on a Page. For this reason, we have developed a place-based augmentation model to situate the virtual objects to physical world in human scale. The AR contents are situated to a Place (e.g., building or room) providing its own CRS and POI map. To manage 3D augmentation model, we ex- tended CSS media type to Place media[3]. Place media module selects the HTML element to augment to the target POI and its 3D position and orientations. A CSS rule has two main parts: a selector, and one or more dec- larations consisting of property-value pairs. A selector determines the valid elements to apply the CSS declarations. -ar-target property is used for specifying the target POI or place. With this property, the selected HTML element #about tower is trans- formed relative to the target POI. -ar-transform applies 3D transform to the elements using three methods: translate(), rotate(), 131 IEEE Virtual Reality 2013 16 - 20 March, Orlando, FL, USA 978-1-4673-4796-9/13/$31.00 ©2013 IEEE

Upload: steven

Post on 11-Mar-2017

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: [IEEE 2013 IEEE Virtual Reality (VR) - Lake Buena Vista, FL (2013.3.18-2013.3.20)] 2013 IEEE Virtual Reality (VR) - Webizing mobile AR contents

Webizing Mobile AR ContentsSangchul Ahn∗

University of Science and Technology

Heedong KoKorea Institute of Science and Technology

Steven FeinerColumbia University

ABSTRACT

This paper presents a content structure to build mobile AR applica-tions in HTML5 to achieve a clean separation of mobile AR con-tents from their application logic to scale like the web. By ex-tending POIs (Point of Interest) to objects and places with Uni-form Resource Identifier (URI), we could build objects of inter-est for mobile AR application as DOM (Document Object Model)elements and control their behavior and user interactions throughDOM events. Using our content structure, a mobile AR applica-tions can be developed as normal HTML documents seamlesslyunder current web eco-system.

Index Terms: K.5.1 [Information Interface and Presentation (e.g.,HCI)]: Multimedia Information Systems—Artificial, augmented,and virtual realities; K.5.4 [Information Interface and Presentation(e.g., HCI)]: Hypertext/Hypermedia—Architectures

1 INTRODUCTION

Most mobile AR applications have been developed in two extremecases: one at a small scale indoors on a table top or in a room envi-ronment with a handful of objects in place, and the other at a verylarge scale outdoors with mostly location and orientation informa-tion with little or no object recognition and tracking. Recently, in-door AR applications are getting off of the desktop and becomingmobile to the next room, across the hallway, and to other buildingsor across town. These mobile AR applications must be aware oftheir own location context in order to bring the objects of interest tobear dynamically for augmenting.[2]

KARML[4] extended KML, location-based contents standard byOGC, to include the AR XML elements. KML is very popularthrough Google Maps and Google Earth applications but it is notbased on HTML so the interoperability between KARML and ex-isting web contents is limited. Our approach is to make use of ex-isting HTML standard but extend the use of URI to identify andreference physical objects and places. Any HTML DOM elementscan be linked to these physical URI’s and DOM eventing mech-anism can be handled by javascript layer for behaviour and userinteraction. X3DOM[1] also attempts to integrate X3D scene el-ements as HTML DOM elements and events seamlessly like us.However, it does not have the notion of physical URI’s as objectsand places of interest for the in-situ AR and no notion of locationcontext. In our approach, location-based AR contents can be inter-mixed within regular HTML documents by augmenting any HTMLDOM element with its location context through extension of CSS3.

There are some commercial HTML5-based AR platforms likeJunaio and ARchitect. Junaio provides AREL as a javascript ARAPI for location-based AR and GLUE as its XML extension forimage recognition-based AR. ARchitect is another approach ofjavascript SDK for AR by Mobilizy. Both provide javascript li-brary for building AR application with location and image recog-nition based tracking. Their approach introduces AR functional-ities procedurally through javascript objects. This procedural ap-

∗e-mail: [email protected]

proach makes AR contents difficult to intermix and interoperatewith standard HTML web contents and their interaction throughDOM event handling mechanism. Our AR content structure isbased on HTML and selection/binding mechanism of CSS withtheir associated javascript event handling of DOM events.

2 MOBILE AR CONTENT STRUCTURE IN HTMLWebizing mobile AR content refers to the process of combiningvirtual and real world objects through three components: use ofURI for dealing with physical objects, CSS Place Media module,and the DOM event extensions for situated interaction.

2.1 Point of InterestIn our model, POI refers to physical object and it is viewed as aphysical resource, an extension of web resource. It is assumedthat any POI has its URI and accessible through HTTP. Howeveridentifying and tracking physical objects require the feature sam-ples suited for the recognition method. To address this, we usethe REST (REpresentational State Transfer) to provide the varioustypes of feature samples as a representation of its URI via HTTPcontent negotiation. And the browser can choose one of availabletracking methods and request the feature sample data to URI of tar-get POI. As a result, the AR content does not need to include allfeature samples for tracking but just refer to URIs of target POIs.

Place is an extension of POI that can contain other POIs theirlocations. In addition to the default properties of POI, they haveseveral environmental properties: boundary, Local Coordinate Ref-erence System (LCRS), and local POI map. Boundary is the ge-ometry separating the inside/outside of the place. The subspaceof place is defined a LCRS description represented to a simpli-fied schema of Geography Markup Language (GML) CRS packagemodel.[6] A local POI map contains the locations of POIs withinthe place boundary. By this extension, the space partitioning treeof the physical world can be constructed recursively. The scope isdynamically determined by the user location. The place is regardedas a POI when the user is outside. However it forms a spatial envi-ronment when the user enters it (e.g., building, room, and earth).

2.2 Augmenting HTML Elements with CSSElements in a HTML document have been rendered to a 2D plainpage and their layout is defined by Cascading Style Sheets (CSS).HTML elements describing POIs and their augmented model mustbe rendered in 3D environment, not on a Page. For this reason, wehave developed a place-based augmentation model to situate thevirtual objects to physical world in human scale. The AR contentsare situated to a Place (e.g., building or room) providing its ownCRS and POI map. To manage 3D augmentation model, we ex-tended CSS media type to Place media[3]. Place media moduleselects the HTML element to augment to the target POI and its 3Dposition and orientations.

A CSS rule has two main parts: a selector, and one or more dec-larations consisting of property-value pairs. A selector determinesthe valid elements to apply the CSS declarations. -ar-targetproperty is used for specifying the target POI or place. With thisproperty, the selected HTML element #about tower is trans-formed relative to the target POI. -ar-transform applies 3Dtransform to the elements using three methods: translate(), rotate(),

131

IEEE Virtual Reality 201316 - 20 March, Orlando, FL, USA978-1-4673-4796-9/13/$31.00 ©2013 IEEE

Page 2: [IEEE 2013 IEEE Virtual Reality (VR) - Lake Buena Vista, FL (2013.3.18-2013.3.20)] 2013 IEEE Virtual Reality (VR) - Webizing mobile AR contents

and scale(). This property has come from the CSS 3D transform butdiffers in the dynamic determination of origin with -ar-target.

��������������� �������������������������������� �������������� �!��� ���"��� ������������#$� �������%�&��������%����� '()*+)',#$� �������%�&����������� ',)-')*.)*+(���#$� �������%�&���������� *)*)*#$/� �������0�������������������������������%����������� �������������%���"�-������0���� �-��� 0�����

2.3 DOM Event Extensions for Situated InteractionThe DOM level 2 Event specification defines the primitive inter-face to generate, handle, and register event listeners.[5] By default,five UI events are provided: FocusEvent, MouseEvent, Keyboard-Event, WheelEvent, and CompositionEvent. Recently TouchEventhas been introduced to support mobile devices. To address the re-quirements for situated interaction, we have extended UI events:PlaceEvent, PoiEvent, and SightEvent.PlaceEvent occurred when the place is changed by the move-

ment of user. During a situated interaction, the environment contextis changed when the user moves from one place to another place. Itis determined by continuous containment checking of the boundaryof nearby places and the user’s position. A PlaceEvent contains aplace identifier, timestamp, and the local CRS description. It hasthree event types: placein, placeout, and placechanged.PoiEvent is generated when the state of POI is changed. When

the user enters a certain place, POIs in the place are also situated.The initial state comes from the place URI using the restful ser-vice invocation and is updated by trackers. A PoiEvent includes aPOI identifier, timestamp, and its position. It has three event types:poiin, poiout, and poimove.SightEvent is created when the target POIs enter or leave the

view frustum. It is motivated from MouseEvent, one of most fre-quently used UI events. Like mousein, mouseout, and mouseoverevents are fired according to the spatial relationship between thepointer and the visible HTML elements, SightEvent is created whenthe virtual and physical objects enter or leave the view frustum. ASightEvent contains a URI, its position, and timestamp. It also hasthree event types: targetin, targetout, and targetmove.

3 IMPLEMENTATION

To validate the proposed content model, we have developed a proto-type of mobile AR web browser for iOS and Android. Here we ex-plain the distinguished components as contrasted with typical page-based web browsers. It consists of four major components: targetmanager, positioning subsystems, place manager, and situated lay-out engine. Target manager picks the target POIs from stylesheetsand gets the feature samples through REST access to their URIs.The types of feature samples are extensible for the supported track-ing methods. Currently RFID and image recognition features aresupported. Positioning subsystem determines position and orienta-tion of the user and the movable POIs. Its major components areclassified into Mapper and Tracker. Mappers correlate the differ-ent location representations from diverse local positioning systemsinto LCRS of the situated place. Tracker includes WifiTracker, Mo-tionTracker, and ImageTracker. WifiTracker deals with the WiFifingerprint-based positioning engine. MotionTracker brings thevalue of gyro and compass of mobile device. ImageTracker is theimage recognition-based tracker. We have two separated Image-Tracker implementations, KistNftImageTracker and VuforiaImage-Tracker based on Qualcomm Vuforia. Place manager serves the

descriptions of place environment and the local POI map. Situatedrenderer manages the unified scene graph to combine physical ob-jects, virtual elements, and user viewpoint into the situated place.

Situated Renderer

Physical Resources

Place Manager

.

.

.

Room

Building

Country

z

x

y

V

P

U

SituatedPlace Model

Positioning Subsystem

Mapper Tracker

HTML5 Mobile AR Applications

HTML Elements CSS Place Media AR DOM Events

- Place Description- POI Map

http://example.com/obj1

Camera

CSS 3D Transforms

Targetmanagement

WebGL

The Insight: Mobile AR Browser

V: Virtual ResourceP: Physical ResourceU: User Viewpoint

To address the device-specific requirements Adobe PhoneGapis used as the hybrid mobile application platform that providesJavaScript APIs to access the device capabilities and the plug-inarchitecture to extend additional native functions.

4 CONCLUSION

‘Webizing’ mobile AR contents cannot be achieved partially byembedding HTML and XML tag sets, but rather by fundamentallyadapting AR content model according to web principles of separa-tion of content from its structure and behavior. We have presenteda HTML-based mobile AR content model without introducing anynew tag sets and their structure and behavior can be captured withinCSS and javascript framework of HTML5 so that the current page-based web contents can be leveraged for mobile AR contents usingweb eco-system fully.

ACKNOWLEDGEMENTS

This research is supported by Ministry of Culture, Sports andTourism (MCST) and Korea Creative Content Agency (KOCCA)in the Culture Technology (CT) Research & Development Program2012.

REFERENCES

[1] J. Behr, Y. Jung, T. Drevensek, and A. Aderhold. Dynamic and in-teractive aspects of x3dom. In Proceedings of the 16th InternationalConference on 3D Web Technology, Web3D ’11, pages 81–87, NewYork, NY, USA, 2011. ACM.

[2] T. Hollerer, S. Feiner, T. Terauchi, G. Rashid, and D. Hallaway. Ex-ploring mars: developing indoor and outdoor user interfaces to a mo-bile augmented reality system. Computers & Graphics, 23(6):779–785,1999.

[3] B. J. Lie, H. W. Css3 paged media module, work in progress.[4] B. MacIntyre, A. Hill, H. Rouzati, M. Gandy, and B. Davidson. The ar-

gon ar web browser and standards-based ar application environment. InMixed and Augmented Reality (ISMAR), 2011 10th IEEE InternationalSymposium on, pages 65 –74, oct. 2011.

[5] F. P. Miller, A. F. Vandome, and J. McBrewster. DOM Events. AlphaPress, 2010.

[6] D. Misev, M. Rusu, and P. Baumann. A semantic resolver for coordinatereference systems. In Proceedings of the 11th international conferenceon Web and Wireless Geographical Information Systems, W2GIS’12,pages 47–56, Berlin, Heidelberg, 2012. Springer-Verlag.

132