
Acoustically Enriched Virtual Worlds with Minimum Effort
Julia Fröhlich* Ipke Wachsmuth†

AI & VR Lab, Faculty of Technology

Bielefeld University

ABSTRACT

To improve user experiences and immersion within virtual environments, auditory experience has long been claimed to be of notable importance [1]. This paper introduces a framework in which objects, enriched with information about their sound properties, are processed to generate virtual sound sources. This is done by automatic processing of the 3D scene and therefore minimizes the effort needed to develop a multimodal virtual world. In order to create a comprehensive auditory experience, different types of sound sources have to be distinguished. We propose a differentiation into three classes: locally bound static sounds, dynamically created event-based sounds, and ambient sounds that create a spatial atmosphere.

Index Terms: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Virtual Reality; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Auditory (non-speech) feedback; H.5.5 [Information Interfaces and Presentation]: Sound and Music Computing—Methodologies and techniques

1 INTRODUCTION

Most virtual reality applications focus on realistic graphical output, while other modalities (e.g. sound) are left aside. Integrating realistic sound is a difficult and complex task since most methods do not meet requirements like real-time computation. While similar problems in the graphics domain were mastered by approximation methods (e.g. the Phong lighting model) years ago, the field of audio rendering still lacks such approaches.

The biggest problem with designing a multimodal virtual world is that it is a time-consuming task, because many factors have to be considered. Often this effort does not seem worthwhile with regard to benefits such as improved immersion. One approach to simplify the creation of virtual worlds is to semantically enrich virtual objects. This concept has proven efficient in creating Intelligent Virtual Environments (IVEs) [5]. Until now, however, this idea has mostly been used to store additional knowledge about the graphical representation.

In this contribution we propose a framework in which semantically annotated objects are analyzed with regard to their potential properties, using acoustic features as an example. The idea behind this is to create 'smart objects' that know how they have to sound. Moreover, the object itself should, to some extent, decide what an appropriate sound would be. Figure 1 shows the semantic enrichment of such objects by assigning descriptive values. To achieve a realistic auditory experience it is not sufficient to generate single independent sounds; rather, different kinds of sound sources have to be distinguished.

*e-mail: [email protected]
†e-mail: [email protected]

<Object name='cave' url='cave.x3d'/>
<MetadataString name='sound' value='cave'/>
<Object name='ball' url='ball.x3d'/>
<MetadataString name='sound' value='ball'/>
<Object name='waterfall' url='waterfall.x3d'/>
<MetadataString name='sound' value='waterfall'/>

Figure 1: Semantically enriched virtual world, exemplified by a cave (ambient), a ball (possible event sound), and a waterfall (static).

2 SMART VIRTUAL SOUND SOURCES

Basic properties of virtual sound sources are volume and direction. The volume can be calculated directly from the distance to the user inside the virtual environment. As a rule of thumb, doubling the distance results in a decrease of the sound level by 6 dB [2].
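As a rough sketch of this distance-based attenuation (not part of the original implementation), one could apply the 6 dB rule as follows; the base level and the reference distance of 1 m are assumed values:

import math

def attenuated_level(base_level_db, distance, reference_distance=1.0):
    """Apply the 6 dB-per-doubling rule of thumb to a base sound level.

    base_level_db: assumed level of the source at the reference distance
    distance: current distance between listener and source in metres
    """
    if distance <= reference_distance:
        return base_level_db  # do not amplify beyond the base level
    # each doubling of the distance lowers the level by 6 dB
    doublings = math.log2(distance / reference_distance)
    return base_level_db - 6.0 * doublings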

Since a realistic simulation of a virtual environment contains different kinds of sound sources, further information has to be taken into account. This includes the accompanying graphical object, a matching sound file, a base volume, and the play mode. We propose a differentiation into three classes: locally bound static sounds, dynamically created event-based sounds, and ambient sounds that create an atmosphere. In order to make virtual sound sources smart, some kind of knowledge has to be provided by the object itself. This means the virtual object takes properties like size, weight, or material into account when choosing an appropriate sound. The transformation from normal to smart sound sources is discussed in more detail in Section 3.
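For illustration only, the per-source information listed above could be grouped into a simple record; all field names and the class labels below are hypothetical and merely mirror the three classes named in the text:

from dataclasses import dataclass
from enum import Enum, auto

class SourceClass(Enum):
    STATIC = auto()    # locally bound, coupled to one object
    EVENT = auto()     # created dynamically, e.g. on collisions
    AMBIENT = auto()   # directionless atmosphere for a whole area

@dataclass
class SoundSource:
    object_name: str       # accompanying graphical object
    sound_file: str        # matching sound file
    base_volume_db: float  # base volume at a reference distance
    play_mode: str         # e.g. 'constant', 'random', 'hourly'
    source_class: SourceClass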

The technical realization is based on the scene graph structure of virtual environments. The position of the sound node, and thus the direction to the user, can be calculated directly by means of traversal. Many spatial audio rendering methods exist to create an appropriate sound output with regard to the hardware setup.
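As a sketch of how such a traversal could determine the direction to a sound source, assume each node along the path from the scene root to the sound node provides a local 4x4 transform; none of the names below are part of X3D or InstantReality:

import numpy as np

def world_position(transforms_along_path):
    """Accumulate the local transforms from root to sound node and
    return the node's position in world coordinates."""
    m = np.eye(4)
    for local in transforms_along_path:  # 4x4 matrices, root first
        m = m @ local
    return m[:3, 3]

def direction_to_source(transforms_along_path, listener_position):
    """Unit vector pointing from the listener towards the sound source."""
    offset = world_position(transforms_along_path) - np.asarray(listener_position, dtype=float)
    distance = np.linalg.norm(offset)
    return offset / distance if distance > 0 else offset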

Static Sounds are the most common ones in virtual environments, each closely coupled to a virtual object. In conjunction with this accompanying object, some particular features have to be considered. On the one hand there are simple objects like a waterfall, which emit a constant sound without changing their position. On the other hand, more dynamic objects move in space (e.g. a car) or do not sound as constant (e.g. a singing bird).

Technically, static sounds are easy to implement: due to the scene graph structure, the changing position of a moving object is updated automatically. The rate at which an object emits a sound, be it constant or at a more or less regular interval, can be set up through the play mode. In the case of the singing bird there are gaps of random length between two played sounds. In contrast, church clocks emit sounds at very regular intervals. These can be defined by a fixed time step or coupled to the full hours of the system time.
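A minimal sketch of such play modes, with hypothetical mode names that merely mirror the examples above (continuously looping waterfall, randomly pausing bird, hourly church clock):

import random

def next_start_time(play_mode, now, last_end):
    """Return the time (in seconds) at which the sound should play next.

    play_mode is an assumed string field of the sound source:
      'constant' - loop seamlessly (waterfall)
      'random'   - insert a random gap of up to 10 s (singing bird)
      'hourly'   - trigger on the next full hour of system time (church clock)
    """
    if play_mode == 'constant':
        return last_end
    if play_mode == 'random':
        return last_end + random.uniform(0.0, 10.0)
    if play_mode == 'hourly':
        seconds_per_hour = 3600
        return (int(now) // seconds_per_hour + 1) * seconds_per_hour
    raise ValueError('unknown play mode: %s' % play_mode)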

Event Sounds are dynamically created. There is a distinction between object-object interaction on the one hand and user-object interaction on the other. To detect object-object interaction events, some sort of physics simulation is needed. The user, too, is part of the physics simulation and can thus interact with dynamic and static objects. As a result, the event of two objects colliding should generate a matching sound.

Our solution is implemented with the help of the Open Dynamics Engine (ODE) [6]. To play an event sound at the right time, without a delay relative to the visual experience, the trajectory of the moving object is calculated in advance and the most probable points of impact are chosen. The possibly matching sound files are cached, and the correct one is played as soon as the collision event occurs. User-object interaction is realized via tracking technologies, so the virtual user representation is updated automatically.
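The following sketch illustrates the idea of pre-caching candidate impact sounds; it is not tied to the ODE API, and the trajectory predictor, sound lookup, and playback callback are assumed placeholders:

def precache_impact_sounds(moving_object, predict_impacts, lookup_sound, cache):
    """Estimate likely impact points ahead of time and load their sounds.

    predict_impacts: assumed helper extrapolating the object's trajectory,
                     yielding (probability, other_object) pairs
    lookup_sound:    assumed helper choosing a sound file for an object pair
    cache:           dict mapping (object, other_object) -> loaded sound
    """
    for probability, other in predict_impacts(moving_object):
        if probability > 0.2:  # only cache the most probable impacts
            cache[(moving_object, other)] = lookup_sound(moving_object, other)

def on_collision(obj_a, obj_b, cache, play):
    """Collision callback: play the pre-cached sound without loading delay."""
    sound = cache.get((obj_a, obj_b)) or cache.get((obj_b, obj_a))
    if sound is not None:
        play(sound)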

Ambient Sound represents a base level of output that is more or less constant over a larger region of the scene and creates an acoustic atmosphere. This idea is already widely used by audio engineers (e.g. in the movie industry). In the real world, humans can easily distinguish between different surrounding scenarios solely by acoustic cues; for example, it is easy to tell whether one is inside or outside a building. In addition, this concept allows for the definition of environmental properties that influence the audio rendering to fit the environment, such as an outdoor scenario, a cave, or a concert hall. In this manner, appropriate equalizers can be chosen to affect all sounds belonging to a specific zone (e.g. adding reverb to all sound sources within a cave) [4].

Ambient sound nodes are added to the scene graph by defining major group nodes which describe self-contained areas of the virtual world. As long as the user is within the defined area, this sound is played without direction and always at the same volume. All static and event sounds in this area are modified according to the environmental properties.
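A sketch of such zones, with hypothetical axis-aligned bounding boxes standing in for the self-contained areas and a single reverb flag standing in for the environmental properties:

from dataclasses import dataclass

@dataclass
class AmbientZone:
    name: str
    bounds_min: tuple    # assumed axis-aligned bounding box of the area
    bounds_max: tuple
    ambient_file: str    # directionless sound played while the user is inside
    reverb: bool = False # example environmental property (e.g. a cave)

    def contains(self, position):
        return all(lo <= p <= hi for lo, p, hi in
                   zip(self.bounds_min, position, self.bounds_max))

def active_zone(zones, user_position):
    """Return the zone the user currently resides in, if any."""
    for zone in zones:
        if zone.contains(user_position):
            return zone
    return None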

3 REALIZATION OF THE FRAMEWORK

Our framework is designed to support the development of acoustically enriched virtual worlds with minimum effort. The idea behind this is to create smart objects that 'know' how they have to sound. This knowledge is not stored in a separate knowledge base, but is embedded directly inside the object itself.

We are developing a virtual reality audio component based on the InstantReality (IR) framework, which utilizes the X3D standard. The IR framework was designed by Fraunhofer IGD and ZGDV to provide a simple and consistent tool for AR and VR developers [3]. So-called metadata, which is not part of the rendered scene, can be assigned to X3D objects. We decided to use these metadata tags for the assignment of semantic information. As seen in Figure 1, this annotation describes the object's type without further specification of its features. This additional semantic information is utilized by a preprocessing unit, as shown in Figure 2.

The creation of nodes is achieved by a three-step process, starting with parsing the original scene in order to get access to the metadata information. The second step matches these entries against a database that holds insertable code fragments containing prototypical information about objects. These prototypes carry default values for the object's features. At this point our smart objects adopt the fitting default values while overwriting others with specific information retrieved from their graphical representation. As a result, a graphical representation of the Niagara Falls, for example, leads to a completely different acoustic experience than the representation of a small waterfall, solely based on the size of the object.

Figure 2: Schematic overview of the preprocessing unit. A parser reads the original X3D file and extracts its MetadataString entries; a mapping stage retrieves the matching insertable code fragment for each type from a database; a writer inserts these fragments and replaces the scene with the new one. In the depicted example, the original scene fragment

<X3D version='3.0'> ...
<ExternProtoDeclare name='waterfall' url='waterfall.x3d'/>
<MetadataString name='sound' value='waterfall' containerField='value'/>
...

is rewritten to

<X3D version='3.0'> ...
<ExternProtoDeclare name='waterfall' url='waterfall.x3d'/>
<SoundNode ID='1' file='data/sounds/waterfall.wav' location='0 0 -2' play='TRUE'/>
...

The third step uses the extracted information to replace the old scene with a new one, including the created nodes, and the application is started.
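To summarize the three steps, the preprocessing unit could be sketched as follows; the prototype database, the size-based override, and all function and attribute names are assumptions for illustration and do not reflect the actual implementation:

def preprocess_scene(scene, prototype_db, write_scene):
    """Three-step preprocessing: parse metadata, match prototypes, write new scene.

    scene:        parsed X3D document offering iteration over annotated objects
    prototype_db: dict mapping a type string (e.g. 'waterfall') to default values
    write_scene:  assumed helper writing the rewritten scene to disk
    """
    sound_nodes = []
    # Step 1: parse the original scene and collect the metadata annotations.
    for obj in scene.objects_with_metadata('sound'):
        sound_type = obj.metadata_value('sound')
        # Step 2: adopt prototypical defaults for this type ...
        defaults = dict(prototype_db[sound_type])
        # ... and overwrite some of them with information taken from the
        # graphical representation, e.g. raising the base volume for large objects.
        if obj.bounding_box_volume() > 100.0:
            defaults['base_volume_db'] += 6.0
        defaults['location'] = obj.world_position()
        sound_nodes.append(defaults)
    # Step 3: replace the old scene with a new one containing the created nodes.
    write_scene(scene, sound_nodes)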

The semantic annotation frees the developer from the time-consuming task of assigning sound sources to every object himself. Furthermore, since only a vague type annotation is required, he can create an acoustically enriched virtual world with minimum effort.

4 CONCLUSION AND FUTURE WORK

We introduced a framework for 3D audio generation which minimizes the effort of designing a virtual world. This is achieved by semantically annotated objects, which are automatically assigned appropriate sound files based on a brief description. For all sounds created in this manner, the distinction between ambient, event, and static sounds leads to a reasonable approximation of a realistic acoustic environment. Furthermore, semantically enriched IVEs have a high potential for enriching virtual worlds not only with visual or acoustic content, but with all kinds of multimodal output as well. For this reason we are already in the process of extending the proposed framework with haptic content generation.

Our ultimate goal is to create virtual worlds in which the user can experience the environment he resides in, maybe even in the absence of graphical output. Thus a step towards a realistic environmental experience may be achieved.

REFERENCES

[1] J. Bates. Virtual reality, art and entertainment. Presence, 1(1):133–138, 1992.

[2] D. R. Begault. 3D Sound for Virtual Reality and Multimedia. Academic Press, 1994.

[3] D. Fellner, J. Behr, and U. Bockholt. InstantReality: a framework for industrial augmented and virtual reality applications. In 2nd Sino-German Workshop "Virtual Reality & Augmented Reality in Industry", Shanghai Jiao Tong University, volume 16, page 17, 2009.

[4] J. Fröhlich and I. Wachsmuth. A Phong-based concept for 3D-audio generation. In Smart Graphics, pages 184–187. Springer, 2011.

[5] M. Luck and R. Aylett. Applying artificial intelligence to virtual reality: Intelligent virtual environments. Applied Artificial Intelligence, 14(1):3–32, 2000.

[6] R. Smith et al. Open dynamics engine, 2005.
