
A Middleware Infrastructure for Building Mixed Reality Applications in Ubiquitous Computing Environments

Eiji TOKUNAGA, Andrej van der Zee, Makoto KURAHASHI, Masahiro NEMOTO, Tatsuo NAKAJIMA

Department of Information and Computer Science, Waseda University

3-4-1 Okubo, Shinjuku, Tokyo 169-8555, JAPAN
TEL&FAX: +81-3-5286-3185

{eitoku,andrej,mik,nemoto,tatsuo}@dcl.info.waseda.ac.jp

Abstract

Mixed reality is one of the most important techniques for achieving the vision of ubiquitous computing. Traditional middleware for mixed reality provides high-level abstraction to hide the complex algorithms for analyzing video images, but application programmers still need to take into account distribution and automatic reconfiguration when developing mixed reality applications for ubiquitous computing.

Our middleware infrastructure hides all these complexities of building mixed reality applications for ubiquitous computing. Therefore, the development does not require advanced skills in ubiquitous computing. The paper describes the design and implementation of our infrastructure, and presents some scenarios and the current status showing its effectiveness.

1 Introduction

Mixed reality[3] is a promising technique for enhancing our real world by superimposing computer-generated graphics on video images. The technique is important in ubiquitous computing environments[1, 20, 25, 27] for enhancing our real world with information from cyber spaces. However, in ubiquitous computing environments, application programmers need to deal with extreme heterogeneity to support various devices and environments, and handling continuous media such as audio and video is very hard. They also need to take into account complex issues such as distribution and dynamic reconfiguration, which increase the development cost of continuous media ubiquitous computing applications. To solve the problem, it is important to provide a middleware infrastructure that hides these complexities and makes it easy to develop such applications.

This paper describes the design and implementation of a software infrastructure for building mixed reality applications in ubiquitous computing environments. Traditional toolkits for mixed reality provide high-level abstraction that makes it easy to build mixed reality applications, but application programmers still need to take into account distribution and automatic reconfiguration, which make the development of applications very hard. The high-level abstraction provided by our software infrastructure hides these complex issues from application programmers. Therefore, the cost of developing mixed reality applications is reduced dramatically by using our software infrastructure. Although this paper focuses on how our system is used to build mixed reality applications for ubiquitous computing, our middleware can also be used to build many other ubiquitous computing applications that deal with continuous media.

The remainder of this paper is structured as follows. In Section 2, we describe related work and compare our framework's characteristics with existing middleware. In Section 3, we show the design issues of our infrastructure. Section 4 presents the design and implementation of our middleware for distributed mixed reality. Section 5 presents two scenarios that show the effectiveness of our approach. In Section 6, we describe the current status of our system. In Section 7, we discuss strengths and weaknesses of our current design. We conclude the paper in Section 8.

2 Related Work

ARToolkit[2] is a software library that allows us to develop mixed reality applications easily. It provides several functions to detect square-shaped visual markers in a video frame and to superimpose OpenGL-based 3D objects on the markers in the video frame. ARToolkit is quite useful for mixed reality prototyping, but it does not provide a distributed programming framework or heterogeneous device adaptation. We implemented continuous media components for mixed reality by reusing programs provided by ARToolkit. Therefore, we can utilize most ARToolkit functions in our distributed multimedia programming model and dynamic adaptation framework.

DWARF[4] is a component-based framework for distributed mixed reality applications using CORBA. Our system also uses CORBA as its communication infrastructure. In this respect, our framework is very similar to DWARF. However, our system differs from DWARF in that it offers automatic reconfiguration for developing mixed reality applications suitable for ubiquitous computing. This is an essential part of our framework because dynamic adaptation according to application context is one of the main issues in ubiquitous computing.

The VuSystem[16] is a framework for compute-intensive multimedia applications. It is divided into an in-band partition and an out-of-band partition. The out-of-band partition is written in Tcl and controls the in-band media processing modules written in C++. Compute-intensive means that computers perform analysis on multimedia data and can take actions based on the findings. In our framework, we intend to use visual marker information contained within video frames more extensively. A visual marker might contain any kind of information.

Infopipes[15] proposes an abstraction for building distributed multimedia streaming applications. Components such as sources, sinks, buffers, and filters are defined, and multimedia applications are built by connecting them. In our framework, we explicitly specify the connections among components as in Infopipes, but the connections are dynamically changed according to the current situation.

The Fault Tolerant CORBA specification[23] allows us to create a replicated object to make a service highly reliable. In the specification, when we adopt the primary/backup scheme, one of the replicated objects actually receives a request. The primary replica is specified in an object reference that is passed to a client. When the object reference becomes invalid, the reference to the primary replica is returned by using the location-forward mechanism of the IIOP protocol. The scheme is very similar to our automatic reconfiguration support.

A programmable network[5] allows us to change the functionality of the network according to the characteristics of each application. Each entity in a programmable network, such as a router, has a programmable interface designed to change its functionality. In our approach, an application can configure each continuous media component according to the characteristics of the application. This capability is similar to a programmable network.

The LocALE[18] framework provides a simple management interface for controlling the life cycle of CORBA distributed objects. It adds mobility support to the CORBA life cycle management mechanism. Objects can be moved anywhere in a location domain by an explicit request from a client. In contrast, our framework provides implicit stream reconfiguration driven by a reconfiguration policy.

3 Design Issues

3.1 Mixed Reality

Mixed reality is a technology for superimposing computer-generated graphics onto video images capturing the real world. (Some researchers use the term augmented reality rather than mixed reality.) Several mixed reality applications have been developed and have proved the effectiveness of the technology [3]. For example, a surgeon trainee can use the technique to visualize instructions during an operation[12], or a mechanic can use it as a tool for the maintenance and repair of complex machinery[7]. Also, NaviCam[26] has shown that the technology can be used for building many ubiquitous computing applications that enhance our daily life.

Developing mixed reality applications is not easy because of the complex algorithms needed for analyzing video streams and generating graphical images. Middleware systems like ARToolkit[2] and DWARF[4] have been developed to reduce the effort of programmers, but they do not satisfy the requirements for building mixed reality applications for ubiquitous computing, as described in the next section.

3.2 Requirements for Mixed Reality for Ubiquitous Computing

When developing mixed reality applications for ubiquitous computing, the programmer is faced with complexities inherent to ubiquitous computing environments. Existing mixed reality toolkits such as ARToolkit[2] are not designed for such environments and do not address these complexities. We found that the following two requirements must be satisfied for building mixed reality applications in ubiquitous computing environments.

High-level abstraction to hide heterogeneity: Ubiquitous computing environments consist of various types of computers and networks. Networks may contain a mix of resource-constrained and specialized computers. Some computers may not be appropriate for heavy computation such as video analysis. For example, cellular phones and PDAs are not suited to heavy computations, but they might still want to use mixed reality features. Also, in ubiquitous computing environments, we need to use various types of devices. For example, continuous media applications for ubiquitous computing should take into account various types of cameras, displays, microphones, and speakers. Without support, application programmers may have to develop a different application program for each platform and device. A middleware providing high-level abstraction to hide such differences from application programmers is therefore necessary[19, 21] in order to reduce development costs.

Automatic reconfiguration to cope with environmental changes: In ubiquitous computing environments there will be many cameras and displays everywhere, and we believe that a middleware infrastructure should provide a mechanism for dynamic configuration that changes the machines executing components according to the current situation. Mixed reality applications in such environments should be able to select the most suitable device according to the user's current situation. For example, a user may want to see a video stream captured by the camera nearest to him on his cellular phone's display. However, implementing automatic reconfiguration directly in an application is very difficult. An application programmer does not want to be concerned with such complexities, and therefore we believe it is desirable to handle automatic reconfiguration in our framework.

4 Middleware supporting Mixed Reality

In this section, we describe the design and implementation of MiRAGe (Mixed Reality Area Generator), the middleware we have developed to support mixed reality for ubiquitous computing.

4.1 Overview of Architecture

Figure 1: Overview of MiRAGe Architecture

MiRAGe consists of the multimedia framework, the communication infrastructure and the application composer, as shown in Figure 1. The multimedia framework, described in Section 4.2, is a CORBA-based component framework for processing continuous media streams. The framework defines CORBA interfaces to configure multimedia components and the connections among them.

Multimedia components supporting mixed reality can be created from the MR class library. The library contains several classes that are useful for building mixed reality applications. By composing several instances of these classes, mixed reality multimedia components can be constructed without taking into account the various complex algorithms that realize mixed reality. The MR class library is described in Section 4.4.

The communication infrastructure based on CORBA, described in Section 4.3, consists of the situation trader and OASiS. The situation trader is a CORBA service that supports automatic reconfiguration and is collocated with an application program. Its role is to manage the configuration of connections among multimedia components when the current situation changes. OASiS is a context information database that gathers context information, such as location information about objects, from sensors. Also, in our framework, OASiS behaves like a Naming and Trading service that stores object references. The situation trader communicates with OASiS to detect changes in the current situation.

Finally, the application composer, written by an application programmer, coordinates an entire application. A programmer needs to create several multimedia components and connect them. He also specifies a policy on how to reconfigure these components to reflect situation changes. By using our framework, the programmer does not need to be concerned with the detailed algorithms for processing media streams, because these algorithms can be encapsulated in existing reusable multimedia components. Also, distribution is hidden by our CORBA-based communication infrastructure, and automatic reconfiguration is hidden by the situation trader service. Therefore, developing mixed reality applications becomes dramatically easier by using our framework.

MiRAGe satisfies the requirements described in the previous section in the following way.

High-Level Abstraction: MiRAGe provides a multimedia framework for constructing mixed reality components in an easy way. Complex programs, such as detecting visual markers and drawing 3D objects, are encapsulated in respective multimedia components. Also, the detailed characteristics of respective devices are encapsulated in components that offer a common interface. All components offer an identical CORBA interface for standardized inter-component access. In our framework, a complex, distributed and automatically reconfigurable mixed reality application can be developed by writing an application composer program that composes reusable multimedia components.

System-Level Automatic Reconfiguration: In the MiRAGe framework, the communication infrastructure is designed as a CORBA-compliant system that supports automatic reconfiguration. The infrastructure supports user mobility by automatically updating object references and reconfiguring media streams. Also, the infrastructure allows us to select the most suitable component to process media streams automatically and to reconnect components according to the characteristics of each computer platform and the situation of a user, by specifying policies. With our middleware infrastructure, an application program does not need to take into account how the application's configuration changes when the current situation changes.

4.2 Multimedia Framework

The main building blocks in our multimedia framework are software entities that internally and externally stream multimedia data in order to accomplish a certain task. We call them multimedia components. In this section, we describe the components in more detail and provide programs to illustrate how a developer can configure multimedia components.

4.2.1 Multimedia Components

A multimedia component consists of a CORBA interface and one or more multimedia objects. For example, Figure 1 shows three connected components: one component that contains a camera source object for capturing video images, one component that contains the MRDetector and MRRenderer filter objects implementing mixed reality functionality as described in Section 4.4, and one component that contains a display sink object for showing the mixed reality video images.

In a typical component configuration, video or audio data is transmitted between multimedia objects, possibly contained by different multimedia components running on remote machines. Through the CORBA MConnIface interface, described in the next subsection, connections can be created in order to control the streaming direction of data items between multimedia objects. Multimedia components register themselves with the CORBA Naming Service under a user-specified name.

4.2.2 CORBA Interface

A component can be remotely accessed through one of three CORBA interfaces: MCompIface, MConnIface and MServIface.

The MCompIface interface is added to the component to provide a single object reference through which references to the other CORBA interfaces can be obtained. The benefit of adding such an interface is that it gives clients access to all inter-component functionality through a single reference. In addition, the MCompIface interface provides functions to query individual objects and the component as a whole. The MCompIface interface is identical for all components.

The MConnIface interface provides methods to establish connections between objects, possibly contained by different multimedia components running on remote sites. More specifically, the interface provides functions to create sockets for inter-component streaming, to update the streaming information managed by individual multimedia objects, and to start and stop streams. The MConnIface interface is also identical for all components.

The MServIface interface provides methods for controlling specific multimedia objects within a multimedia component. Clients may find it useful to query and/or change the state of a multimedia object. For example, a client may want to query a display object for the resolutions it supports and may want to change the resolution to its needs. The MServIface interface varies from component to component, depending on the internal multimedia objects it contains.
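As an illustration, a client might use MServIface roughly as follows. This is only a sketch: the paper does not define a concrete MServIface, so the operation names getSupportedResolutions and setResolution, the sequence type ResolutionSeq, the component name and the object identifier used below are hypothetical.

// Hypothetical example: querying and changing the resolution of a display
// object (assumed ObjectId 3) through a display component's service interface.
IFACE::MCompIface_var comp = MNaming::resolve("Display_Component");  // name is illustrative
IFACE::MServIface_var serv = comp->getServIface();

ResolutionSeq_var resolutions = serv->getSupportedResolutions(3);    // hypothetical operation
serv->setResolution(3, 640, 480);                                    // hypothetical operation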

The interfaces are part of the module IFACE and are written in CORBA IDL. Here follows a snapshot of the module:

module IFACE {

  interface MConnIface {
    ObjectId createServerSocket(out SocketInfo info);
    ObjectId createClientSocket(in SocketInfo info);
    void addStreamInfo(in ObjectId nTargetId, in StreamInfo info);
    void startStream(in ObjectId nTargetId, in StreamId nStreamId);
    void stopStream(in ObjectId nTargetId, in StreamId nStreamId);
  };

  interface MCompIface {
    MConnIface getConnIface();
    MServIface getServIface();
    boolean isInput(in ObjectId nTargetId);
    boolean isOutput(in ObjectId nTargetId);
    DataType getDataType(in ObjectId nTargetId);
  };
};

(The MServIface interface is not included since it varies for different component configurations. Also, detailed definitions of data types and exceptions are omitted.)



4.2.3 Multimedia Objects

In our approach, the central focus is the stream of data from data producers to data consumers through zero or more data manipulators, similar to the VuSystem[16]. Data producers typically are interfaces to video or audio capture hardware or to media storage hardware. In our framework we call them sources. Data manipulators perform operations on the media data that runs through them. Data manipulators get their data from sources or other data manipulators and stream the modified data to a consumer or another manipulator. In our framework we call them filters. Data consumers are multimedia objects that eventually process the data. Data consumers typically interface to media playback devices or to media storage devices. In our framework we call them sinks.

More concretely, our framework provides the abstract classes MSource, MFilter and MSink. (The M preceding the class names indicates that they are part of the framework and stands for multimedia.) Developers extend these classes and override the appropriate hook methods to implement functionality. Multimedia objects need only be developed once and can be reused in any component.
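For instance, the red-blue swapper filter used in the example of Section 4.2.5 could look roughly like the sketch below. The hook-method name onData and the MVideoData accessors getPixels and getSize are assumptions; the paper does not list the exact signatures of MFilter.

#include <algorithm>

// A minimal filter sketch: swap the red and blue channels of each video frame
// and pass the modified frame on to the next object in the stream.
class RBSwapper : public MFilter {
protected:
    virtual MVideoData* onData(MVideoData* pFrame) {    // hypothetical hook method
        unsigned char* pixels = pFrame->getPixels();     // hypothetical accessor
        for (long i = 0; i + 2 < pFrame->getSize(); i += 3) {
            std::swap(pixels[i], pixels[i + 2]);         // assumes packed RGB data
        }
        return pFrame;
    }
};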

The multimedia framework defines two specialized classes of multimedia objects for handling inter-component data streaming, namely MClientSocket and MServerSocket. Socket objects can be created and connected through the appropriate function calls defined in the CORBA MConnIface interface.

4.2.4 Streams

A typical mixed reality component might contain a filter object that adds digital images to video frames at specified positions within the frames. Different client components may want to use the service at the same time by sending video frames to the component and afterwards receiving them back for playback. This implies that different data items streamed through filter objects within multimedia components might have different destinations. Solely setting up direct connections between objects does not satisfy the scenario described above: if each client were connected to the filter object as a destination, how would the filter object know which data is to be sent to which destination?

To solve this issue we do not use direct connections between multimedia objects. Rather, we assign a unique stream identifier to each stream and use stream tables, managed by output objects, to hold stream direction information. Source objects add a stream identifier to each data item they produce, identifying the stream the data is part of.

The stream table managed in the outport of each output object stores tuples of type [StreamId, ObjectId]. The stream identifier sent with each data item is used for finding the target multimedia object. If found, the data is sent to the inport of the target object and put into the appropriate buffer, also identified by its stream identifier.
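A minimal sketch of such a stream table is shown below; the container and member names are illustrative and not the framework's actual declarations.

#include <map>

class MData;                       // a streamed data item (illustrative forward declaration)
typedef long StreamId;
typedef long ObjectId;

// Per-outport routing table: each stream identifier maps to the target object
// registered for that stream.
class OutPort {
public:
    void addRoute(StreamId sid, ObjectId target) { m_table[sid] = target; }

    // Look up the stream identifier carried by the data item and deliver the
    // item to the inport of the registered target object.
    void forward(StreamId sid, MData* pItem) {
        std::map<StreamId, ObjectId>::iterator it = m_table.find(sid);
        if (it != m_table.end())
            deliverToInport(it->second, sid, pItem);   // hypothetical helper
    }

private:
    void deliverToInport(ObjectId target, StreamId sid, MData* pItem);
    std::map<StreamId, ObjectId> m_table;              // [StreamId, ObjectId] tuples
};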

Our framework defines an MStream class that acts as a facade for the primitive low-level CORBA MConnIface interface functions. It provides easy means for the developer to set up a distributed stream between multimedia objects. The interface defines methods for setting the source and sink object of a stream and for adding one or more filters. In addition, methods are provided to start and stop a media stream.

In order to identify a multimedia object within a component, the framework assigns a unique object identifier to each object after it is added to the component. Universally, we use a tuple of type [MCompIface, ObjectId] to denote one specific object. Such universal identifiers are used by MStream objects to compose a stream.

4.2.5 Component Configuration

In our framework, we use a component abstraction that hides many of the details that deal with CORBA and streaming. By extending the abstraction, a developer can configure a component. More specifically, a developer specializes the MComponent class provided by the framework. In its constructor it typically creates multimedia objects, possibly creates a stream, and finally adds the objects to the container component. A program for the example component in Figure 2 might look something like this:

Figure 2: Example Component

MyComponent::MyComponent() : MComponent()
{
  m_pCamera  = new Camera;
  m_pSwapper = new RBSwapper;
  m_pDisplay = new Display;

  addObject(m_pCamera);
  addObject(m_pSwapper);
  addObject(m_pDisplay);
}

MCompIface_ptr pComponent = MNaming::resolve("Some_Name");

MStream stream(MStream::NORMAL);
stream.setSource(pComponent, 1);
stream.addFilter(pComponent, 2);
stream.setSink(pComponent, 3);
stream.start();

The above illustrative code retrieves a CORBA object reference from the Naming Service registered under the name Some_Name and assumes such a reference exists. Next, a stream with normal priority is set up between the participating multimedia objects (in the example we assume the object identifiers are known in advance). After the stream is started, data is streamed from the camera to the display object through the red-blue swapper.

4.3 Communication Infrastructure

Our CORBA-based communication infrastructure consists of two subsystems, namely the situation trader and OASiS, as introduced in Section 4.1. Complex issues concerning automatic reconfiguration are handled by the situation trader and are therefore hidden from the application programmer. A configuration manager, owned by the situation trader, manages stream reconfiguration by updating connections between multimedia objects. The situation trader is linked into the application program.

In our framework, a proxy object in an application composer refers to an Adaptive Pseudo Object, or APO, managed by the situation trader. Each APO is managed by exactly one Pseudo Object Manager, or POM, which is responsible for replacing object references when it receives a notification message from OASiS upon a situation change.

4.3.1 Automatic Invocation Forwarding

In our system, an application programmer uses a POM to access multimedia components. Each POM manages one APO, a CORBA object that has the same interface as its target multimedia object. The APO forwards client invocations to the most appropriate target component. The application programmer can specify a reconfiguration policy with a POM to control the reconfiguration strategy. Upon startup, a POM retrieves an initial IOR from OASiS. OASiS finds the most appropriate object reference in its context information database according to the registered reconfiguration policy and updates the reference upon situation change.

Figure 3 shows how dynamic invocation forwarding works, and the following describes the depicted sequence in more detail.

Figure 3: Automatic Invocation Forwarding

An application registers a reconfiguration policy with a POM (1). The POM passes the policy to OASiS (2). OASiS returns an initial IOR that is appropriate for the current situation (3). The POM passes the IOR to its APO (4). The application requests the activation of the APO (5). The application invokes a method on the APO (6). The APO returns a LOCATION FORWARD message containing the IOR held in the POM (7). The application resends the previously issued request by using the enclosed IOR (8). When the current situation changes, OASiS notifies the POM of a new IOR that is appropriate for the new situation (9). The POM updates the received IOR in the APO (10). Then, the POM reverts the current object reference, and the object reference of the new target object is retrieved by using a LOCATION FORWARD message again (11). Thus, a client invocation is forwarded transparently according to the new situation (12, 13, 14).
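The LOCATION FORWARD replies in steps (7) and (11) can be produced with standard POA machinery. The sketch below is illustrative rather than MiRAGe's actual source: it assumes the APO is backed by a servant locator whose preinvoke raises PortableServer::ForwardRequest carrying the IOR currently held by the POM; ORB-specific base classes, registration and reference counting are omitted.

// Illustrative sketch: the POM keeps m_currentTarget up to date (steps 2-4 and
// 9-10), and every invocation on the APO is then redirected by the ORB.
class APOForwarder : public virtual PortableServer::ServantLocator {
public:
    void updateTarget(CORBA::Object_ptr newTarget) {
        m_currentTarget = CORBA::Object::_duplicate(newTarget);
    }

    virtual PortableServer::Servant preinvoke(
        const PortableServer::ObjectId&,            // oid
        PortableServer::POA_ptr,                    // adapter
        const char*,                                // operation
        PortableServer::ServantLocator::Cookie&)    // cookie
    {
        // Answer the request with a LOCATION FORWARD reply to the current target.
        throw PortableServer::ForwardRequest(m_currentTarget.in());
    }

    virtual void postinvoke(
        const PortableServer::ObjectId&, PortableServer::POA_ptr,
        const char*, PortableServer::ServantLocator::Cookie,
        PortableServer::Servant) {}

private:
    CORBA::Object_var m_currentTarget;   // IOR selected by OASiS for the current situation
};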

4.3.2 Situation Trader Service

Figure 4 shows the relation between the situation trader and the continuous media framework. As described in the overview, the situation trader is a CORBA service that manages the reconfiguration of multimedia components. This subsection presents in detail how the situation trader works.

The following sample C++ code illustrates what an application program might look like:

CORBA::Object_var obj =
    orb->resolve_initial_references("SituationTraderService");    // (1)

STrader::SituationTraderFactory_var factory =
    STrader::SituationTraderFactory::_narrow(obj);

STrader::POManager_var camera_pom =
    factory->createPOManager("IFACE::MCompIface:1.0");            // (2)
STrader::POManager_var display_pom =
    factory->createPOManager("IFACE::MCompIface:1.0");

STrader::ConfigurationManager_var com =
    factory->createConfigurationManager();

STrader::ReconfigurationPolicy camera_policy;
STrader::ReconfigurationPolicy display_policy;

camera_policy.locationScope   = "Distributed Computing Laboratory";
camera_policy.locationTarget  = "Eiji TOKUNAGA";
camera_policy.locationContext = "nearest";
camera_policy.serviceType     = "Camera";                         // (3)

display_policy.locationScope   = "Distributed Computing Laboratory";
display_policy.locationTarget  = "Andrej van der Zee";
display_policy.locationContext = "nearest";
display_policy.serviceType     = "Display";

camera_pom->setPolicy(camera_policy);                             // (4)
display_pom->setPolicy(display_policy);

IFACE::MCompIface_ptr camera_apo  = camera_pom->activateAPObject();
IFACE::MCompIface_ptr display_apo = display_pom->activateAPObject();

MStream stream(MStream::NORMAL);
stream.setSource(camera_apo, 1);
stream.setSink(display_apo, 1);

StreamAdapter_i* adapter_i = new StreamAdapter_i(stream);         // (5)
STrader::ConfigurationAdapter_ptr adapter = adapter_i->_this();

com->setAdapter(adapter);                                         // (6)
com->addPOManager(camera_pom);                                    // (7)
com->addPOManager(display_pom);

// Start streaming.
stream.start();

Figure 4: Situation Trader Service

As shown in Figure 4, the situation trader consists of a situation trader factory, a configuration manager and several POMs. The object reference to the situation trader is retrieved by invoking the resolve_initial_references method provided by CORBA (line 1).

The situation trader factory is used for creating POMs and the configuration manager (line 2). The method createPOManager expects as a parameter the ID of the target object, which specifies the object's type, and returns a reference to the POM that manages the APO.

A reconfiguration policy needs to be set for each POM (line 4). The policy is passed to OASiS through the POM, and OASiS selects the most appropriate target object according to the policy.

In the current design, a reconfiguration policy has three location parameters: locationScope, locationTarget and locationContext (line 3). LocationScope denotes the scope for selecting a suitable multimedia component. When a POM passes the policy to OASiS, OASiS searches for a target multimedia component in the specified scope. LocationTarget specifies a physical object used to represent the current situation. LocationContext specifies the relation between a target multimedia component and the physical object specified by locationTarget. LocationTarget might be a person's name or a device's name. Currently, locationContext can specify "nearest" or "exact". "Nearest" means that the multimedia component nearest to the physical object specified by locationTarget should be selected. For example, if locationContext is "nearest" and locationTarget is "Eiji TOKUNAGA", the pair means "nearest to Eiji TOKUNAGA". "Exact" means that a multimedia component that resides with the physical object specified by locationTarget should be selected. We are considering defining more policies in a future version of our framework.

Several POMs can be added to the configuration manager. The configuration manager retrieves the APOs from its registered POMs and controls automatic reconfiguration of the APOs. A stream adapter needs to be set for the configuration manager for automatic stream reconfiguration (line 6). When one of the POMs is updated by OASiS, the stream adapter reconfigures the connections between multimedia components in order to reflect the situation change (lines 5 to 7). Stream reconfiguration is explained in more detail in the next subsection.

4.3.3 Stream Reconfiguration

A stream adapter controls the MStream object described in Section 4.2. Upon situation change, a callback handler in the stream adapter is invoked in order to reconfigure affected streams through its MStream object.

Figure 5: Stream Reconfiguration

Figure 5 depicts how connections among multimedia objects are changed in order to reconfigure an existing stream. In the figure, the camera object streams media data to the display object through the red-blue swapper rbs1. When the current situation changes and the red-blue swapper rbs2 becomes the most appropriate object, the callback handler of the stream adapter is invoked, passing the old and the new POM, that is, rbs1 and rbs2. The callback handler updates the MStream object and restarts the stream.

More concretely, the MStream object controlled by the stream adapter stops the old stream by removing its stream identifier from its source object. Next, the old reference is replaced by the new one. Finally, the newly configured MStream object is restarted. Internally, restarting is done by setting up the appropriate TCP connections between remote components, updating the stream information of the participating objects and adding a new unique stream identifier to the source object.
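The following sketch shows how the stream adapter's callback might be structured; it is illustrative only. The callback name onUpdate and the replaceFilter helper are assumptions, the skeleton base class is assumed from the use of _this() in the listing of Section 4.3.2, and only stop(), start() and the stream-composition methods come from Section 4.2.

// Illustrative sketch of a stream adapter: when a POM is updated by OASiS, the
// configuration manager invokes the callback with the old and new components.
class StreamAdapter_i : public virtual POA_STrader::ConfigurationAdapter {
public:
    StreamAdapter_i(const MStream& stream) : m_stream(stream) {}

    virtual void onUpdate(IFACE::MCompIface_ptr oldComp,     // hypothetical callback
                          IFACE::MCompIface_ptr newComp)
    {
        m_stream.stop();                              // remove the stream id from the source
        m_stream.replaceFilter(oldComp, newComp, 1);  // hypothetical: e.g. swap rbs1 for rbs2
        m_stream.start();                             // reconnect sockets, add a new stream id
    }

private:
    MStream m_stream;
};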

4.4 Mixed Reality Components

The MR class library, shown in Figure 6, is part of the MiRAGe framework. The library defines multimedia mixed reality objects for detecting visual markers in video frames and for superimposing graphical images on visual markers in video frames. These mixed reality multimedia objects are for a large part implemented using the ARToolkit. Application programmers can build mixed reality applications by configuring multimedia components with the mixed reality objects and streaming data between them. In addition, the library defines data classes for the video frames that are streamed through the MR objects.

Figure 6: Mixed Reality Class Library

MRFilter is a subclass of MFilter and is used as a base class for all mixed reality classes. The class MVideoData encapsulates raw video data. The MRVideoData class is a specialization of MVideoData and contains an MRMarkerInfo object for storing information about visual markers in its video frame. Since different types of markers will be available in our framework, the format of marker information must be defined in a uniform way.

The class MRDetector is a mixed reality class and inherits from MRFilter. The class expects an MVideoData object as input and detects visual markers in the MVideoData object. The class creates an MRVideoData object and adds information about the markers detected in the video frame. The MRVideoData object is sent as output. The class ARTkDetector is a subclass of MRDetector that implements the marker detection algorithm using the ARToolkit.

The MRRenderer class is another mixed reality class derived from MRFilter. The class expects an MRVideoData object as input and superimposes graphical images at the positions specified in the MRMarkerInfo object. The superimposed image is sent as output. The OpenGLRenderer is a specialization of MRRenderer and superimposes graphical images generated by OpenGL.

The MRSensor class is a specialization of MFilter and sends the current marker information to OASiS for detecting the location of a physical object. ARTkSensor inherits from MRSensor and uses the ARToolkit for its implementation.

Figure 7: An Example MR Application

Figure 7 illustrates how mixed reality components can be configured and connected in an application. In the example, a visual marker attached to a person is captured and superimposed with information about this person's profile before display.

In detail, the camera object sends MVideoData objects, representing the captured video frames, to the visual marker detector object. The detector object adds information about visual tags to an MRVideoData object and sends it to the superimposer object. The superimposer object substitutes digital images for the visual markers and sends the superimposed video frames to the display object.

If a powerful computer is available, the two filter components for detection and superimposing can be merged into one component, as sketched below. Our framework can create multimedia components that are suitable for the respective platforms.
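A merged component could be written in the style of the MyComponent example from Section 4.2.5, as in the sketch below; the no-argument constructors of the MR classes are an assumption.

// Illustrative sketch: one component that both detects visual markers and
// superimposes graphics, intended to run on a powerful server.
class MRServerComponent : public MComponent {
public:
    MRServerComponent() : MComponent() {
        m_pDetector = new ARTkDetector;     // marker detection based on the ARToolkit
        m_pRenderer = new OpenGLRenderer;   // superimposes OpenGL images on the markers

        addObject(m_pDetector);
        addObject(m_pRenderer);
    }

private:
    ARTkDetector*   m_pDetector;
    OpenGLRenderer* m_pRenderer;
};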


5 Sample Scenarios

This section describes two scenarios showing the effectiveness of MiRAGe. In the first scenario, we describe a follow-me application that dynamically changes camera and display devices according to user location. In the second scenario, we describe how mobile mixed reality can be used on less powerful devices such as PDAs and cellular phones.

5.1 A Follow-Me Application

In this section, we consider an application that receives a video stream from a camera and displays it on the display nearest to the user. As shown in Figure 8, there are two continuous media components: the first one is a camera component, and the second one is a display component. The two components are connected by an application composer. However, the actual display component changes according to user location. The application composer holds a POM managing several display objects and changes the target reference of the APO to the display nearest to the user. A configuration manager reconfigures the stream when the current situation changes.

Figure 8: A Follow-me Application

When the user moves, a location sensor detects the movement of the user. As a result, OASiS is notified by the location sensor (1). OASiS sends the IOR of the nearest display to the POM, and the POM changes the target reference in the APO (2). Therefore, a method invocation is forwarded to the nearest display component (3). In addition, when the callback handler in the configuration manager is invoked, the configuration of the stream is changed (4).

5.2 Mobile Mixed Reality

In a typical mobile mixed reality application, our real world is augmented with virtual information. For example, the door of a classroom might have a visual tag attached to it. If a PDA or a cellular phone equipped with a camera and an application program captures the visual tag, the tag is superimposed with the schedule of today's lecture.

We assume that in the future our environment will deploy many mixed reality servers. In the example, the nearest server stores information about today's lecture schedule and provides a service for detecting visual tags and superimposing them with the information about the schedule, as depicted in Figure 9.

Other mixed reality servers, located on a street, might contain information about which shops or restaurants can be found on the street and until how late they are open.

Figure 9: Mobile Mixed Reality

To build the application, an application composer uses components for capturing video data, detecting visual markers, superimposing information on video frames and displaying them. The application composer contacts the situation trader service to retrieve a reference to a POM that manages references to the mixed reality server nearest to a user. When the user moves, a location sensor component sends the sensed location information to OASiS, and OASiS notifies the situation trader to replace the current object reference with the reference of the nearest mixed reality server. In this way, the nearest mixed reality server can be selected dynamically according to the user's location, while the automatic reconfiguration is hidden from the application programmer.

6 Current Status

In our current prototype, we are using C++ and omniORB [17] for our CORBA-based communication infrastructure. OmniORB is open source and very efficient. In our design of continuous media components, the respective multimedia objects run in separate threads. Therefore, a fully multi-threaded CORBA-compliant ORB is required.

In this section we describe an evaluation that shows the effectiveness of our approach. We also show an actual implementation of a sample scenario described in the previous section and present some discussion of the current prototype implementation.

6.1 Evaluation of Distributed Mixed Reality

This section presents an evaluation showing the effectiveness of the automatic reconfiguration supported by MiRAGe. The result shows the impact of delegating heavy computation to a powerful server.

The picture shown in Figure 10 presents the evaluation environment for distributed mixed reality. The laptop computer on the right side has a Pentium III 866 MHz processor and 256 MB of memory. The machine on the left is a high-performance server with a Pentium 4 1.9 GHz processor and 512 MB of memory. Our infrastructure is currently running on Linux. The computers are connected by 100Base-T Ethernet. A user with the laptop computer can watch superimposed video images on its screen.

Figure 10: Three Cases of MR Stream

To evaluate the merit of distribution in our component-based approach, we compared the performance in the three cases shown in Figure 10. The graph in Figure 11 shows the time required to display 2000 superimposed video frames on the laptop's display when a data source generates 30 video frames per second. In the evaluation, "none" means that a video image captured by the server is rendered by the server without mixed reality processing, "server" means that mixed reality processing runs on the server, and "laptop" means that mixed reality processing runs on the laptop. The results show that the processing time to analyze video images and superimpose a graphic image on them on the laptop computer increases dramatically with the data size. On the other hand, when a powerful server executes the mixed reality processing, the heavy computation does not seriously affect the performance. Therefore, our approach of delegating heavy computation to a powerful server near a user improves the performance of mixed reality applications significantly.

Figure 11: Processing Time for 2000 Frames

In the evaluation we used a high-bandwidth network; when low-bandwidth networks are used, we can add a component that decreases the quality of the video streams in order to reduce the bandwidth before transmitting them over the low-bandwidth networks. In the near future, however, even small mobile devices may use high-bandwidth networking based on UWB (Ultra Wide Band) technologies.

6.2 Mobile Mixed Reality Prototype

Figure 12: Mobile Mixed Reality Prototype

Figure 12 is a picture showing a prototype of a mobile mixed reality application. The PDA that the person in the picture holds in his hand is a Compaq iPAQ H3800 with a wireless LAN card and a TOSHIBA 2 GB PC Card hard disk. We attached an RFID tag to this PDA for detecting its location. The refrigerator on the right side is a TOSHIBA IT refrigerator named Feminity. The refrigerator is equipped with sensors that let us know how many bottles are inside.

The following is a simplified program for the mobile mixed reality prototype:

camera_policy.locationScope   = "Laboratory";
camera_policy.locationTarget  = "Feminity";
camera_policy.locationContext = "nearest";
camera_policy.serviceType     = "Camera";

filter_policy.locationScope   = "Laboratory";
filter_policy.locationTarget  = "Feminity";
filter_policy.locationContext = "nearest";
filter_policy.serviceType     = "MRFilter";

display_policy.locationScope   = "Laboratory";
display_policy.locationTarget  = "Feminity";
display_policy.locationContext = "nearest";
display_policy.serviceType     = "Display";

camera_pom->setPolicy(camera_policy);
filter_pom->setPolicy(filter_policy);
display_pom->setPolicy(display_policy);

MStream stream(MStream::NORMAL);
stream.setSource(camera_apo, 1);
stream.addFilter(filter_apo, 1);
stream.addFilter(filter_apo, 2);
stream.setSink(display_apo, 1);

StreamAdapter_i* adapter_i = new StreamAdapter_i(stream);
STrader::ConfigurationAdapter_ptr adapter = adapter_i->_this();

com->setAdapter(adapter);
com->addPOManager(camera_pom);
com->addPOManager(filter_pom);
com->addPOManager(display_pom);

stream.start();

In this scenario, when a user comes near the refrigerator, the RFID reader recognizes the RFID tag attached to the PDA. The RFID reader then sends the tag information to OASiS. OASiS recognizes the change in the situation and notifies the mobile mixed reality application of the IOR of the display service running on the PDA, so that the application shows video frames on the display service nearest to the refrigerator.

In this case, a mixed reality component that detects visual tags and superimposes digital images on video frames is running on a powerful machine not shown in the picture. In the example, the mixed reality component retrieves the number of bottles from the refrigerator and superimposes a graphical image showing this number.

Currently, cellular phones in Japan have cameras, and the next generation of cellular phones will adopt sophisticated operating systems such as Linux and provide wireless LAN support. We believe that our middleware infrastructure can be used on such cellular phones to offer new services.

7 Discussions

We believe that our design is very effective. Especially, the automatic reconfiguration of an application to reflect a situation change seems very promising. However, in our approach, a number of issues still need to be addressed. In this section, we discuss strengths and weaknesses of our current design.

7.1 Reconfiguration Policy

In our approach, an application programmer needs to specify reconfiguration policies to make the application behave as desired. Most infrastructure software for ubiquitous computing adopts a different approach. For example, the Context Toolkit[6] offers a mechanism to deliver events upon situation change, which makes the development of context-aware applications more difficult. In our approach, we choose to hide as much detail as possible from the programmer in order to reduce the development costs of ubiquitous computing applications.

Some policies are very difficult to implement, however. For example, assume a location policy that always chooses the service nearest to the user's location. The presence of a wall between a user and a server complicates the implementation: clearly, the nearest server might not be the most suitable one. We need to investigate a way to specify policies that does not depend on the sensor technologies used to monitor situation information.

We are currently in the process of extending the stream reconfiguration abstraction. We will adopt a TOAST[8]-like hierarchical concept of stream binding for recursive dynamic reconfiguration. In this concept, an application programmer specifies reconfiguration policies for stream objects as binding objects. For example, she can specify that the continuous media stream must choose the camera device nearest to the user and the filter service nearest to the user. The stream object then dynamically creates lower-layer binding objects reflecting specific location-aware technologies such as RFID, wireless LAN, ultrasonic and vision-based tracking. In this case, the binding object of the camera device is vision-based tracking, and the other one can choose any binding object. We believe such a hierarchical binding abstraction can provide appropriate high-level abstraction and flexibility.

7.2 ID Recognition-Based Interaction

Using our framework, we have implemented a vision-based direct interaction application like u-Photo[14]. In this application, using camera-equipped mobile devices, we can obtain a graphical user interface for controlling particular embedded devices, such as a remote control GUI for a VCR, by taking images of the visual markers attached to them. Thanks to our framework abstraction, it is easy to take images with the mobile devices and send them to the remote visual marker recognition filter. In such a mixed reality application domain, however, applications have to recognize the addition, removal and updating of visual tags in the captured images and handle the events relevant to these actions.

Implementing these functions requires some effort from programmers, because our framework does not provide any abstractions for firing and handling visual tag recognition events, although the visual tag recognition processes themselves can be handled in our filter objects. In future work, we will incorporate into our framework a high-level event model for ID recognition and utilization like that of Papier-Mache[13]. The event flow in that model should be reconfigurable in the same style as stream reconfiguration.

7.3 Media Recognition-Based Interaction

We have also built a simple gesture recognition-based application on our framework. It recognizes a user's hand movement and moves the active window on the user's nearest X Window System accordingly. In this case, the application needs to accumulate several video frames and analyze the differences between them. Selecting a camera object to capture the user's hand movement is easy, since it is dynamically adapted by an automatic reconfiguration policy specified as "nearest to user".

Other development costs remain, however, in the recognition process, because the gesture recognition process in the filter object must initialize and manage buffers that hold video frames, while our framework does not provide any programming model for handling multiple video frames. The same problem arises in most media recognition applications, such as speech recognition. We need to take into account an appropriate programming model for handling multiple subsequences.

7.4 Component State Management

In our framework, we assume that continuous media components do not have state. Consequently, if multimedia components are reconfigured as a result of a situation change, restoring state information in the new target components is not necessary. However, we found that components controlling devices might hold some state information used to configure device parameters, such as the resolution of images. Currently, in our approach, the application composer restores such parameters after receiving a change-of-situation notification, which means we must describe an application-specific restoring process from scratch when building new multimedia components and application composers. This approach increases the development cost; the state restoring process should be managed automatically.
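For illustration, the manual restoration currently amounts to something like the sketch below, invoked by the application composer after a change-of-situation notification; the function and the setResolution operation are hypothetical, as in the MServIface sketch of Section 4.2.2.

// Illustrative sketch: re-apply device parameters to the newly selected component.
void restoreDisplayState(IFACE::MCompIface_ptr newDisplay)
{
    IFACE::MServIface_var serv = newDisplay->getServIface();
    serv->setResolution(1, 640, 480);   // hypothetical: restore the previously used resolution
}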

Using the CORBA Persistent State Service[22] and migrating component state along with stream reconfiguration could be a solution if many stable ORBs provided it, but few ORBs offer a stable and fully interoperable PSS. We therefore plan to adopt Ice (Internet Communications Engine)[9] instead of CORBA as the base of our next communication infrastructure. Ice provides a next-generation, language-independent, object-oriented distributed environment together with a built-in, stable persistent state service.
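As a rough sketch of how state migration could be automated on top of such a persistence facility, the following hypothetical interface shows the idea; Checkpointable and StateMigrator are our own names and do not correspond to any existing ORB API.

    // Hypothetical state-migration interface that a persistent state service
    // could back; names are ours for illustration only.
    interface Checkpointable {
        byte[] saveState();              // serialize device parameters (e.g. resolution)
        void restoreState(byte[] blob);  // reinstall them on the replacement component
    }

    // During reconfiguration, the infrastructure (rather than the application
    // composer) would move the saved state from the old component to the new one.
    final class StateMigrator {
        void migrate(Checkpointable from, Checkpointable to) {
            to.restoreState(from.saveState());
        }
    }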

7.5 System Independent Framework

Our framework provides the situation trader service to reconfigure connections among multimedia components automatically. Our approach does not require any modification of the CORBA runtime, so it is easy to port our framework to different CORBA implementations and to other object-oriented middleware such as Ice. The design extensions described above will be ORB independent as well.

On the other hand, the current implementation of our context information database OASiS is rather system dependent, because it combines ORB-specific services such as the CORBA Naming Service and the CORBA Trading Service, and it does not provide a system-independent subscription and notification framework. We therefore need to redesign OASiS as a combination of a publish/subscribe system like EventHeap[11] and a context information database, including a naming service, in a system-independent way.

We believe an XML-based messaging protocol such as XMPP[10] or the Macromedia Flash XML Socket is appropriate both for publish/subscribe transactions and for registering with the naming service. Such XML-based messaging protocols are highly extensible and system independent by the nature of XML. An XML-based messaging context information database will be discussed in future work.
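For illustration only, the sketch below constructs the kind of XML registration message we envisage; the element names are invented and do not follow any fixed schema (an XMPP deployment would of course use its own stanza format).

    // Illustrative construction of an XML registration message for a context
    // database; the element names are hypothetical, not a defined schema.
    final class RegistrationMessage {
        static String forComponent(String name, String type, String location) {
            return "<register>"
                 + "<component name='" + name + "' type='" + type + "'/>"
                 + "<location>" + location + "</location>"
                 + "</register>";
        }

        public static void main(String[] args) {
            // e.g. announce a camera component in room 101 to the database
            System.out.println(forComponent("camera-1", "VideoSource", "room-101"));
        }
    }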

8 Conclusion

In this paper, we have described our middleware framework to support mixed reality for ubiquitous computing. We have described the design and implementation of our system and presented some experiences with our current prototype. These experiences show that our system is very useful for developing several kinds of mixed reality applications for ubiquitous computing.

In the future, we would like to continue improving our middleware framework and to develop attractive mixed reality applications such as games, navigation, and enhanced communication applications. Currently, our system runs on Linux, and we would like to exploit the real-time capabilities provided by Linux to process video streams in a timely fashion. We are also interested in using the device proposed in [24], since it can augment the real world without a display by projecting computer-generated graphics directly onto real objects.

References

[1] G.D. Abowd, E.D. Mynatt, "Charting Past, Present, and Future Research in Ubiquitous Computing", ACM Transactions on Computer-Human Interaction, 2000.

[2] ARToolKit, http://www.hitl.washington.edu/people/grof/SharedSpace/Download/ARToolKitPC.htm.

[3] R.T. Azuma, "A Survey of Augmented Reality", Presence: Teleoperators and Virtual Environments, Vol.6, No.4, 1997.

[4] Martin Bauer, Bernd Bruegge, et al., "Design of a Component-Based Augmented Reality Framework", The Second IEEE and ACM International Symposium on Augmented Reality, 2001.

[5] Andrew T. Campbell, Herman G. De Meer, Michael E. Kounavis, Kazuho Miki, John B. Vicente, Daniel Villela, "A Survey of Programmable Networks", ACM SIGCOMM Computer Communications Review, Vol.29, No.2, 1999.

[6] A.K. Dey, G.D. Abowd, D. Salber, "A Conceptual Framework and a Toolkit for Supporting the Rapid Prototyping of Context-Aware Applications", Human-Computer Interaction, Vol.16, No.2-4, 2001.

[7] Steven Feiner, Blair MacIntyre, and Doree Seligmann, "Knowledge-based Augmented Reality", Communications of the ACM, Vol.36, No.7, July 1993, pp.52-62.

[8] T. Fitzpatrick, J. Gallop, G.S. Blair, C. Cooper, G. Coulson, D. Duce, I. Johnson, "Design and Implementation of TOAST: An Adaptive Multimedia Middleware Platform", Proceedings of IDMS'01, Lancaster, UK, September 2001.

[9] Michi Henning, "A New Approach to Object-Oriented Middleware", IEEE Internet Computing, January-February 2004, pp.66-75.

[10] IETF Internet-Draft, "Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence", http://www.ietf.org/internet-drafts/draft-ietf-xmpp-im-22.txt

[11] Brad Johanson and Armando Fox, "Extending Tuplespaces for Coordination in Interactive Workspaces", Journal of Systems and Software, Vol.69, No.3, pp.243-266, January 2004.

[12] Anantha R. Kancherla, Jannick P. Rolland, Donna L. Wright, and Grigore Burdea, "A Novel Virtual Reality Tool for Teaching Dynamic 3D Anatomy", Proceedings of Computer Vision, Virtual Reality, and Robotics in Medicine '95 (CVRMed '95), April 1995.

[13] Scott Klemmer, "Papier-Mache: Toolkit Support for Tangible Interaction", Proceedings of the 16th Annual ACM Symposium on User Interface Software and Technology (UIST 2003) Doctoral Consortium.

[14] N. Kohtake, T. Iwamoto, G. Suzuki, S. Aoki, D. Maruyama, T. Kouda, K. Takashio, H. Tokuda, "u-Photo: A Snapshot-based Interaction Technique for Ubiquitous Embedded Information", Second International Conference on Pervasive Computing (PERVASIVE 2004), Advances in Pervasive Computing, 2004.

[15] R. Koster, A.P. Black, J. Huang, J. Walpole, and C. Pu, "Thread Transparency in Information Flow Middleware", In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms, 2001.

[16] Christopher J. Lindblad, David L. Tennenhouse, "The VuSystem: A Programming System for Compute-Intensive Multimedia", In Proceedings of the ACM International Conference on Multimedia, 1994.

[17] S. Lo, S. Pope, "The Implementation of a High Performance ORB over Multiple Network Transports", In Proceedings of Middleware '98, 1998.

[18] Diego Lopez de Ipina and Sai-Lai Lo, "LocALE: a Location-Aware Lifecycle Environment for Ubiquitous Computing", In Proceedings of the 15th IEEE International Conference on Information Networking (ICOIN-15), 2001.

[19] T. Nakajima, "System Software for Audio and Visual Networked Home Appliances on Commodity Operating Systems", In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms, 2001.

[20] T. Nakajima, H. Ishikawa, E. Tokunaga, F. Stajano, "Technology Challenges for Building Internet-Scale Ubiquitous Computing", In Proceedings of the Seventh IEEE International Workshop on Object-oriented Real-time Dependable Systems, 2002.

[21] T. Nakajima, "Experiences with Building Middleware for Audio and Visual Networked Home Appliances on Commodity Software", ACM Multimedia 2002.

[22] OMG, "CORBAservices Specification", http://www.omg.org/technology/documents/corbaservices spec catalog.htm

[23] OMG, "Final Adopted Specification for Fault Tolerant CORBA", OMG Technical Committee Document ptc/00-04-04, Object Management Group, March 2000.

[24] C. Pinhanez, "The Everywhere Displays Projector: A Device to Create Ubiquitous Graphical Interfaces", In Proceedings of Ubicomp'01, 2001.

[25] K. Raatikainen, H.B. Christensen, T. Nakajima, "Application Requirements for Middleware for Mobile and Pervasive Systems", Mobile Computing and Communications Review, October 2002.

[26] Jun Rekimoto, "Augmented Interaction: Interacting with the Real World through a Computer", HCI International, 1995.

[27] M. Weiser, "The Computer for the 21st Century", Scientific American, Vol.265, No.3, 1991.
