
3/10/17


Ville Pulkki, Tapio Lokki

Virtual acoustics and spatial audio

ELEC-E5620 Audio Signal Processing, 2017

Agenda 10.3.2017
• Virtual acoustics
• Spatial audio techniques
• Vector-base amplitude panning, VBAP (separate slides)
• Directional audio coding, DirAC (separate slides)


Basics of sound
Sound propagates as waves
• Audible frequency range 20 … 20 000 Hz
• Speed of sound in air ~340 m/s
• Wavelength 17 m … 17 mm
• Dynamics 0 … 120 dB
• Scattering, diffraction, interference...
Modeling of room acoustics: source – medium – receiver model
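The wavelength range quoted above follows from the relation wavelength = speed of sound / frequency. A minimal sketch, assuming the slide's approximate value of 340 m/s (the function name is illustrative only):

```python
SPEED_OF_SOUND = 340.0  # m/s, the approximate value quoted on the slide


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Return the wavelength in metres for a frequency in hertz."""
    return c / frequency_hz


# Limits of the audible range plus one mid-band frequency.
for f in (20.0, 1000.0, 20000.0):
    print(f"{f:8.0f} Hz -> {wavelength(f):.3f} m")
```

The two limits reproduce the 17 m and 17 mm figures on the slide.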


Virtual Acoustics (Väänänen, 2003)


Impulse response of a room

[Figure labels: 10 meters, 7 meters]

Impulse response of a room


Impulse response
A linear time-invariant (LTI) system can be modeled with an impulse response.

The output y(t) is the convolution of the input x(t) and the impulse response h(t).

Discrete form (the convolution becomes a sum).
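The discrete form can be written as y[n] = sum over k of h[k] x[n - k]. The following is a minimal sketch of that sum in plain Python; the short "dry" and "room" sequences are made-up illustration data, not measured responses.

```python
def convolve(x, h):
    """Discrete convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, x_val in enumerate(x):
        for k, h_val in enumerate(h):
            # Input sample i, delayed by k samples and scaled by h[k],
            # contributes to output sample i + k.
            y[i + k] += x_val * h_val
    return y


dry = [1.0, 0.5, 0.0, -0.5]          # hypothetical anechoic signal
room = [1.0, 0.0, 0.6, 0.0, 0.3]     # hypothetical impulse response
wet = convolve(dry, room)
print(wet)
```

For long room responses a practical implementation would use FFT-based (partitioned) convolution; the double loop is only meant to mirror the formula.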


Measured impulse responses of real concert halls

[Figure: impulse responses measured in concert halls at a distance of 15 m from the stage. Panels: Promenadisali, Sibeliustalo, Musiikkitalo. Horizontal axis: time [s], 0 to 1; vertical axis: amplitude, −1 to 1.]

Measured impulse responses of real concert halls

[Figure: impulse responses measured in concert halls at a distance of 15 m from the stage, on a decibel scale. Panels: Promenadisali, Sibeliustalo, Musiikkitalo. Horizontal axis: time [s], 0 to 0.5; vertical axis: magnitude [dB], −40 to 0.]

Two goals of room acoustics modeling
Goal 1: room acoustics prediction
• Static source and receiver positions
• No real-time requirement

Goal 2: auralization, sound rendering
• Possibly moving source(s) and listener, even geometry
• Both off-line and interactive (real-time) applications
• Need for anechoic stimulus signals


Goal 1: Prediction of room acoustics


(Springer Handbook of Acoustics 2007)

Prediction of acoustics of a room (goal 1)
Input data:
• Geometry, materials, source(s) and receiver(s) locations and orientations
Goal:
• Impulse response(s)
- room acoustical parameters (T60, C80, EDT, LEF...), ISO 3382-1:2009
- low-frequency behaviour
Modeling:
• Source(s)
- omnidirectional, sometimes directional
• Medium
- sound propagation in air and reflections
- as accurate as possible
• Receiver(s)
- output mono, figure-of-eight, binaural


Goal 2: Auralization / sound rendering

“Auralization is the process of rendering audible, by physical or mathematical modeling, the sound field of a source in a space, in such a way as to simulate the binaural listening experience at a given position in the modeled space.” (Kleiner et al. 1993, JAES)

Sound rendering: plausible 3-D sound, e.g., in games

3-D model ⇒ spatial IR * dry signal = auralization


Auralization, sound rendering (goal 2)
Input data:
• Anechoic stimulus signal(s)!
• Geometry, materials, source(s) and receiver(s) locations and orientations
Goal:
• Plausible spatial sound, authentic auralization
Modeling:
• Source(s)
- omnidirectional, directional
• Medium
- physically-based sound propagation in a room
- perceptual models, i.e., artificial reverb
• Receiver
- spatial sound reproduction (binaural or multichannel)


Dynamic auralization (≈ sound rendering)
Method 1: A grid of impulse responses is computed, and convolution is performed with interpolation between 2/4/8 nodes
• Applied in CATT software (http://www.catt.se)
Method 2: "Parametric rendering"


Source Modeling
Stimulus
• Sound signal synthesis
• Anechoic recordings
- https://mediatech.aalto.fi/en/research/virtual-acoustics/research/acoustic-measurement-and-analysis/85-anechoic-recordings
Radiation
• Directivity is a measure of the directional characteristic of a sound source.
• Point sources
- omnidirectional
- frequency-dependent directivity characteristics
• Line and volume sources
• Database of loudspeakers: http://www.clfgroup.org/


Directivity of musical instruments
Pätynen and Lokki (AAuA 2010)
Data available: https://mediatech.aalto.fi/en/research/virtual-acoustics/research/acoustic-measurement-and-analysis/77-directivity-of-instruments


Room acoustics modeling
• Scale models
- 1:10, 1:20, 1:50
• Wave-based methods
- Element methods (FEM, BEM)
- Time-domain methods (FDTD, e.g., waveguide mesh)
• Ray-based methods
- Image-source method, beam tracing
- Ray tracing, particles, phonon tracing, sonel mapping, etc.
- Acoustic radiance transfer
• Statistical methods
- SEA
- Sabine, Eyring, etc.
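As an illustration of the statistical methods named above, the Sabine and Eyring formulas estimate the reverberation time T60 from the room volume and the absorption of its surfaces. This is a minimal sketch that is not taken from the slides: the constant 0.161 s/m is the usual metric value, and the room (10 m x 7 m x 3 m, uniform absorption coefficient 0.2) is a hypothetical example.

```python
import math


def sabine_t60(volume_m3, surfaces):
    """Sabine: T60 = 0.161 * V / sum(S_i * alpha_i).

    surfaces is a list of (area_m2, absorption_coefficient) pairs.
    """
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption


def eyring_t60(volume_m3, surfaces):
    """Eyring: T60 = 0.161 * V / (-S * ln(1 - mean_alpha))."""
    total_area = sum(area for area, _ in surfaces)
    mean_alpha = sum(area * alpha for area, alpha in surfaces) / total_area
    return 0.161 * volume_m3 / (-total_area * math.log(1.0 - mean_alpha))


# Hypothetical shoebox room, every surface with alpha = 0.2.
lx, ly, lz = 10.0, 7.0, 3.0
volume = lx * ly * lz
surfaces = [(lx * ly, 0.2)] * 2 + [(lx * lz, 0.2)] * 2 + [(ly * lz, 0.2)] * 2
print(f"Sabine: {sabine_t60(volume, surfaces):.2f} s")
print(f"Eyring: {eyring_t60(volume, surfaces):.2f} s")
```

Eyring always predicts a slightly shorter reverberation time than Sabine, and the difference grows with the mean absorption.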


Room acoustics modeling
[Diagram of modeling methods: scale models; wave-based methods (FEM, BEM; FDTD, e.g., waveguide mesh); ray-based methods (image-source method, beam tracing; ray tracing, particles, phonon tracing, sonel mapping; acoustic radiance transfer); statistical methods (SEA). Bracket label: physically-based modeling methods.]

Building acoustics, such as structural coupling, is not covered in this presentation

Room acoustics modeling methods


Scale models 1:10

(Tachibana et al. 2004)

Copenhagen 2006

Musiikkitalo 2009


Reproduction
The most intuitive way to study room acoustic prediction results
• Not only for experts
Anechoic stimulus signal
• Only a few recordings available
Reproduction with binaural or multichannel techniques

The impulse response also has to contain spatial information
• Binaural IR
• IR for each loudspeaker (e.g., SIRR, Spatial Impulse Response Rendering, or SDM, Spatial Decomposition Method)


Spatial audio (a.k.a. 3D sound)
Humans are able to perceive the direction of a sound event using only two ears. The mechanisms are based on monaural and binaural analysis of the ear canal signals.
A three-dimensional sound illusion can be achieved using headphones or a pair of loudspeakers. We can "cheat" the auditory system using 3-D audio!
• We need to know:
- basics of human hearing
- basics of spatial hearing
- signal processing
- basics of loudspeakers
- room acoustics


Modeling the acoustics of the listener
HRIR = head-related impulse response
HRTF = head-related transfer function


Binaural hearing
Humans have two ears
First studies already in 1876 and 1907 by Lord Rayleigh
Binaural hearing is based on:
• interaural time difference (ITD), below ~700 Hz
• interaural level difference (ILD), above ~2000 Hz
• in between (700-2000 Hz) both ITD and ILD (also other features)
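To give a feeling for the size of the ITD, the following sketch uses the classic Woodworth rigid-sphere approximation, ITD = (a / c) (θ + sin θ), for a far-field source at azimuth θ between 0° and 90°. The formula and the head radius of 8.75 cm are assumptions brought in for illustration; they do not appear on the slide.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=340.0):
    """Approximate ITD in seconds for a far-field source.

    Valid for azimuths from 0 (front) to 90 degrees (side);
    head_radius_m is an assumed average head radius.
    """
    theta = math.radians(azimuth_deg)
    return head_radius_m / c * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:2d} deg -> {woodworth_itd(az) * 1e6:.0f} microseconds")
```

The maximum, for a source directly to the side, comes out at roughly 0.66 ms under these assumptions.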


Each human pinna is individual

The torso, shoulders, head and pinna modify the perceived spectrum as a function of the incoming angle of sound
Room acoustics (early reflections), head movements and vision also contribute to the spatial hearing sensation
HRTF definition:
• Free-field impulse response from a point in space to a point in the listener's ear canal


HRTF modeling and filter design
Choice of HRTFs
• Individual HRTFs yield the best results
Minimum-phase reconstruction
• Reconstruction of ITD using a delay line
Spectral preprocessing
• Equalization (diffuse-field, free-field)
• Auditory smoothing
Filter design
• FIR, IIR, warped structures
• Least squares, Chebyshev, Hankel norm designs


HRTF measurements
Microphones in ears
Turntable


http://www.ais.riec.tohoku.ac.jp/Lab3/localization/

HRTF measurements, new development
[Huttunen et al. Rapid generation of personalized HRTFs. In AES 55th Conf., 2014]
Scanning head geometry, mathematical model, simulation (FM-BEM)
• Multiple cameras, 3D laser scanners, video, etc. (http://ownsurround.com)
Reciprocal measurement: loudspeaker in the ear, N receivers


Binaural reproduction with headphones
Separate sound signals for both ears
• HRTFs required
Head tracking is needed to perceive sources outside the head (externalization)


Cross-talk cancellation: binaural reproduction with two loudspeakers
Cross-talk cancellation was first introduced by Atal and Schroeder (1963 publication, 1966 patent)
Originally intended for playing back dummy head recordings over loudspeakers
Symmetry exploited in the shuffler structure (transaural processing) by Cooper and Bauck (1989), covered by many patents
Loudspeaker setups
• 60 degrees
• 10 degrees (stereo dipole)

Nintendo DS, Nokia phones
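The idea of cross-talk cancellation can be sketched at a single frequency: for a symmetric setup the acoustic paths form a 2x2 matrix with an ipsilateral term and a contralateral term, and the canceller is the inverse of that matrix. The transfer values below (contralateral path 0.6 times the level and 0.2 ms later) are made-up illustration numbers, not measured HRTFs, and a real system would repeat this per frequency bin with regularization.

```python
import cmath
import math


def crosstalk_canceller(h_ipsi, h_contra):
    """Invert the symmetric 2x2 matrix [[h_ipsi, h_contra], [h_contra, h_ipsi]]."""
    det = h_ipsi * h_ipsi - h_contra * h_contra
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular at this frequency")
    return ((h_ipsi / det, -h_contra / det),
            (-h_contra / det, h_ipsi / det))


def apply_2x2(m, left, right):
    """Multiply a 2x2 matrix with the column vector (left, right)."""
    return (m[0][0] * left + m[0][1] * right,
            m[1][0] * left + m[1][1] * right)


frequency_hz = 1000.0
extra_delay_s = 0.0002  # hypothetical extra travel time of the cross-talk path
h_ipsi = 1.0 + 0.0j
h_contra = 0.6 * cmath.exp(-2j * math.pi * frequency_hz * extra_delay_s)

acoustic = ((h_ipsi, h_contra), (h_contra, h_ipsi))
canceller = crosstalk_canceller(h_ipsi, h_contra)

# A binaural signal meant for the left ear only ...
speaker_left, speaker_right = apply_2x2(canceller, 1.0, 0.0)
# ... arrives at the ears after the acoustic paths.
ear_left, ear_right = apply_2x2(acoustic, speaker_left, speaker_right)
print(abs(ear_left), abs(ear_right))
```

After cancellation the left ear receives the intended signal and the right ear receives (numerically) nothing.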


Vector base amplitude panning (VBAP)
Developed by Ville Pulkki at TKK, 1997
Amplitude panning extended to 3D
Simple calculation of gain factors
Arbitrary positioning of loudspeakers


Ambisonics
Invented by Michael Gerzon (1973)
Both a recording and a reproduction technique
• Soundfield microphone
Based on spherical harmonics theory
2D panning, 1st-order gain factors
IRCAM recently had a 9th-order system (>300 loudspeakers)


Quadraphonic loudspeaker setup: first-order Ambisonics

Quadraphonic loudspeaker setup: second-order Ambisonics
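First-order 2D Ambisonic panning on a quadraphonic setup can be sketched as an encode step followed by a decode step. The encoding below uses the traditional B-format convention (W scaled by 1/sqrt(2)); the decoder is a basic mode-matching decoder for a regular loudspeaker ring, which is one common choice and an assumption here, not necessarily the decoder behind the figures.

```python
import math


def encode_first_order_2d(sample, azimuth_deg):
    """Encode a mono sample into horizontal B-format (W, X, Y)."""
    a = math.radians(azimuth_deg)
    return (sample / math.sqrt(2.0), sample * math.cos(a), sample * math.sin(a))


def decode_to_ring(w, x, y, speaker_azimuths_deg):
    """Basic decode to loudspeakers spaced evenly on a horizontal ring."""
    n = len(speaker_azimuths_deg)
    feeds = []
    for az in speaker_azimuths_deg:
        a = math.radians(az)
        feeds.append((math.sqrt(2.0) * w
                      + 2.0 * (x * math.cos(a) + y * math.sin(a))) / n)
    return feeds


quad = (45.0, 135.0, 225.0, 315.0)
w, x, y = encode_first_order_2d(1.0, 45.0)   # source in the first loudspeaker direction
feeds = decode_to_ring(w, x, y, quad)
print([round(g, 3) for g in feeds])
```

Even when the source coincides with a loudspeaker, all four loudspeakers receive signal, and the opposite one is fed in antiphase; this is characteristic of low-order Ambisonics.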


Wave field synthesis (WFS)
Idea proposed by Berkhout (JAES, 1988) at the University of Delft
• Based on the Huygens–Fresnel principle
Requires a lot of loudspeakers
Large listening area
Great animations: http://www.syntheticwave.de/wfs-properties.htm
• http://www.syntheticwave.de/Principle%20of%20wave%20field%20synthesis.htm
• http://www.syntheticwave.de/pictures/wave_field_synthesis.swf
IOSONO
• http://www.iosono-sound.com


Multichannel loudspeaker setups

[Figure: multichannel loudspeaker layouts; labels: 10.2, 5.1, 22.2]

Loudspeaker-setup agnostic systems

• 40-120 discrete channels are transmitted to the user• Each channel contains spatial metadata (panning direction,

spread, etc) • Supports in principle any number of loudspeakers• Cinema audio format for 3D loudspeaker setups• Blue-ray also


- Dolby Atmos: http://www.dolby.com/us/en/professional/technology/cinema/dolby-atmos.html

- DTS:X: http://dts.com/dtsx

- MPEG-H: https://en.wikipedia.org/wiki/MPEG-H_3D_Audio

Virtual loudspeakers with headphones

Each loudspeaker signal is convolved with the corresponding HRTFs
• Head tracking requires a real-time implementation


Dolby Headphones (5.1 or 7.1 with headphones)

Sony Playstation VR utilizes VBAP with N virtual loudspeakers, head tracking affects the virtual source directions, not HRTFs


Case: Auralization in DIVA system (Savioja et al., JAES 1999)

1. Scene definition

2. Parametric presentation of sound paths

3. Auralization with parametric DSP structure


Auralization parameters in DIVA

• Input contains (given to image-source calculation)
– geometry data, material data, positions and orientations of sources and the listener
– anechoic recording
• For the direct sound and each image source the following set of auralization parameters is provided:
– Distance from the listener
– Azimuth and elevation angles with respect to the listener
– Source orientation with respect to the listener
– Set of filter coefficients which describe the material properties in reflections
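The geometric part of these parameters can be sketched for a shoebox room: each first-order image source is the source mirrored across one wall, and its distance, propagation delay, 1/r gain and direction follow from the image and listener positions. This is an illustrative sketch, not the DIVA implementation; the room size, the positions, and the assumption that the listener faces the +x axis are all hypothetical, and material filters and source orientation are left out.

```python
import math


def first_order_image_sources(source, room):
    """Mirror the source across the six walls of a shoebox room.

    room = (Lx, Ly, Lz); the walls lie at 0 and L on each axis.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images


def auralization_parameters(image, listener, c=340.0):
    """Distance, delay, 1/r gain and direction of one (image) source."""
    dx, dy, dz = (image[i] - listener[i] for i in range(3))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return {
        "distance_m": distance,
        "delay_s": distance / c,
        "gain": 1.0 / distance,
        "azimuth_deg": math.degrees(math.atan2(dy, dx)),
        "elevation_deg": math.degrees(math.asin(dz / distance)),
    }


room = (10.0, 7.0, 3.0)
source = (2.0, 3.0, 1.5)
listener = (8.0, 4.0, 1.5)
images = first_order_image_sources(source, room)
for image in [source] + images:   # direct sound first, then six reflections
    print(auralization_parameters(image, listener))
```

Higher reflection orders are obtained by mirroring the image sources again, followed by visibility checks in non-shoebox geometries.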



Treatment of one image source – a DSP view

• Directivity
• Air absorption
• Distance attenuation
• Reflection filters
• Listener modeling

• Linear system
• Commutation
• Cascading


DIVA auralization block diagram



Treatment of each image source


Late reverberation algorithm
• A special version of a feedback delay network (Väänänen et al. 1997)
– also a time-variant method (Lokki & Hiipakka, 2001)
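For orientation, a generic feedback delay network (FDN) can be sketched as follows. This is a textbook four-line FDN with a scaled Hadamard feedback matrix, not the special version of Väänänen et al.; the delay lengths and the feedback gain are arbitrary illustration values, and a practical reverberator would add absorptive filters in the loops.

```python
def fdn_reverb(x, n_out, delays=(149, 211, 263, 293), feedback_gain=0.85):
    """Generic 4-line feedback delay network (illustrative values)."""
    # 0.5 * Hadamard matrix is orthogonal, so the loop is lossless before
    # the feedback gain (< 1) is applied; this keeps the network stable.
    hadamard = ((1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1))
    buffers = [[0.0] * d for d in delays]
    index = [0] * 4
    y = []
    for n in range(n_out):
        x_n = x[n] if n < len(x) else 0.0
        # Reading before writing returns the sample written `delay` steps ago.
        outs = [buffers[i][index[i]] for i in range(4)]
        y.append(sum(outs))
        for i in range(4):
            feedback = 0.5 * sum(hadamard[i][j] * outs[j] for j in range(4))
            buffers[i][index[i]] = x_n + feedback_gain * feedback
            index[i] = (index[i] + 1) % delays[i]
    return y


impulse_response = fdn_reverb([1.0], 20000)
print(impulse_response[149], max(abs(v) for v in impulse_response[-1000:]))
```

The first output sample appears after the shortest delay, and the response then becomes denser and decays as energy circulates between the delay lines.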



A Case Study: a Lecture Room


Image sources 1st order



Image sources up to 2nd order


Image sources up to 3rd order



Distance attenuation


Distance attenuation (zoomed)



Gain + air absorption


Gain + air and material absorption



All monaural filtering


All monaural filtering (zoomed)



Treatment of each image source


Only ITD for pure impulse



Only ITD for pure impulse (zoom)


ITD + minimum phase HRTF



Monaural filterings + ITD


Monaural filterings + ITD + HRTF



DIVA auralization block diagram


Reverb



Image sources + reverberation







Dynamic Sound Rendering

• Dynamic rendering
– properties of image sources are time variant
• The coefficients of filters are changing all the time
– every single parameter has to be interpolated
– in delay line pick-ups, fractional delay filters have to be used to avoid clicks and artifacts
– late reverberation is static
• Update rate ⇔ latency
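A fractional delay pick-up can be sketched with first-order (linear) interpolation between the two nearest samples of a circular delay line. Linear interpolation is the simplest option and is chosen here only for illustration; the slides do not say which fractional delay filter is used, and higher-order designs (e.g., Lagrange or allpass) behave better at high frequencies.

```python
def read_fractional(delay_line, write_index, delay_samples):
    """Read a circular delay line at a non-integer delay.

    write_index points at the most recently written sample;
    delay_samples may be fractional, e.g. 2.5.
    """
    n = len(delay_line)
    whole = int(delay_samples)
    frac = delay_samples - whole
    newer = delay_line[(write_index - whole) % n]
    older = delay_line[(write_index - whole - 1) % n]
    return (1.0 - frac) * newer + frac * older


# A ramp makes the result easy to check: the sample value equals its time index.
line = [float(i) for i in range(10)]
print(read_fractional(line, 9, 2.5))   # value "at time" 9 - 2.5
```

When a source or the listener moves, the delay is updated smoothly sample by sample, so the pick-up point slides along the line instead of jumping, which is what avoids clicks.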



Auralization quality

• What is the wanted quality?
– Assessment of quality is possible only by case studies
• Objectively:
– Acoustical attributes
– With auditory modeling
• Subjectively:
– Listening tests


A case study, lecture hall T3



Example impulse responses

• Image sources up to 4th order auralized
• First-order diffraction modeled
• Statistical late reverberation


Quality of auralization

Stimuli: clarinet, drum

Results, clarinet: recording, auralization
Results, drum: recording, auralization


Conclusions on room acoustics modeling
Two goals:
• Prediction of acoustical attributes
• Auralization, sound rendering
Many different methods are applied in room acoustic modeling
• All of them have weaknesses
• A hybrid method, combining many techniques, would be the ideal solution
• A few commercial software packages are available and in everyday use by consultants
• Some methods are still much too complex for modern computers (computation time and memory)


Literature
• Required reading (for exam):
– Savioja, L., Huopaniemi, J., Lokki, T., and Väänänen, R. 1999. Creating virtual acoustic environments. Journal of the Audio Engineering Society, vol. 47, no. 9, pp. 675-705, September 1999.
• Recommended reading:
– Avendano, C., Jot, J-M. Frequency domain techniques for stereo to multichannel upmix. Proc. AES 22nd international conference, June 15-17, 2002, Espoo, Finland, pp. 121-130.
– Lokki, T. Tasting music like wine: Sensory evaluation of concert halls. Physics Today, vol. 67, no. 1, pp. 27-32, 2014. http://dx.doi.org/10.1063/PT.3.2242
– TKK doctoral dissertations, see http://lib.tkk.fi/Diss/
  • Huopaniemi, J. Virtual acoustics and 3-D sound in multimedia signal processing, 1999
  • Savioja, L. Modeling Techniques for Virtual Acoustics, 1999
  • Pulkki, V. Spatial Sound Generation and Perception by Amplitude Panning Techniques, 2001
  • Lokki, T. Physically-based Auralization – Design, Implementation, and Evaluation, 2002
  • Väänänen, R. Parametrization, Auralization, and Authoring of Room Acoustics for Virtual Reality Applications, 2003
  • Merimaa, J. Analysis, Synthesis, and Perception of Spatial Sound – Binaural Localization Modeling and Multichannel Loudspeaker Reproduction, 2006
  • Siltanen, S. Efficient Physics-Based Room-Acoustics Modeling and Auralization, 2010
  • Pätynen, J. A virtual symphony orchestra for studies on concert hall acoustics, 2011
  • Tervo, S. Localization and tracing of early acoustic reflections in enclosures, 2012
  • Laitinen, M-V. Techniques for versatile spatial-audio reproduction in time-frequency domain, 2014

Vector-base amplitude panning (VBAP)
Directional audio coding (DirAC)

Ville Pulkki
Professor of Acoustics (Associate Professor)
Department of Signal Processing and Acoustics
School of Electrical Engineering
Aalto University, Helsinki, Finland

Installation lecture

January 19, 2016

These slides
• Vector base amplitude panning (VBAP)
• Variants and enhancement of VBAP
• Time-frequency-domain parametric spatial audio
• Directional audio coding

A music student with MSc (Eng) needs extra income (1995)

• Sibelius Academy chamber music hall had lots of loudspeakers on walls and ceiling
• SibA wanted to have a "panning tool" for their loudspeaker system (one month's salary for a student)
• 1-month joint project between TKK and SibA


Reformulation of amplitude panning

• Tried to generalize the sine panning law to 3D, no luck
• "Could this be formulated with vector bases?" – "Yes!"
• Vector base amplitude panning (VBAP) was born
• Divide the setup into triplets, and compute gain factors for each

[Figure: loudspeakers k, m and n with direction vectors l_k, l_m and l_n, and a virtual source in direction p]


Vector base amplitude panning

PhD degree in 2001.




Dissemination of VBAP

• Published the VBAP paper in JAES in 1997
• Provided free software implementations of the method
• The article has been cited 990 times in Google Scholar (2017)
• Second among all JAES papers when ranked by the number of citations (Scopus)


Products with "VBAP inside"

• ISO/IEC MPEG-H audio standard (broadcast)
• DTS:X audio format (cinema + Blu-ray)
• Sony PlayStation VR (gaming)
• Dedicated audio programming software


VBAP maths

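[Figure: loudspeakers k, m and n with direction vectors l_k, l_m and l_n, and a virtual source in direction p]

$$\mathbf{p} = g_1 \mathbf{l}_1 + g_2 \mathbf{l}_2 + g_3 \mathbf{l}_3$$

$$\mathbf{g} = [p_1\ p_2\ p_3] \begin{bmatrix} l_{11} & l_{12} & l_{13} \\ l_{21} & l_{22} & l_{23} \\ l_{31} & l_{32} & l_{33} \end{bmatrix}^{-1}$$

• g holds the barycentric coordinates of the virtual source in the vector base
• Loudspeaker signals: $y_i(t) = g_i x(t)$
• $g_i$ controls the amplitude of the signal in each loudspeaker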
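The gain computation for one loudspeaker triplet can be sketched by solving p = g1 l1 + g2 l2 + g3 l3 directly. The sketch below uses Cramer's rule instead of an explicit matrix inverse (the result is the same); the helper names and the example triplet are hypothetical, and the gains are not yet normalized.

```python
import math


def unit_vector(azimuth_deg, elevation_deg):
    """Cartesian unit vector for a direction given in degrees."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def vbap_gains(p, l1, l2, l3):
    """Solve p = g1*l1 + g2*l2 + g3*l3 for (g1, g2, g3) by Cramer's rule."""
    d = det3(l1, l2, l3)
    if abs(d) < 1e-12:
        raise ValueError("loudspeaker directions do not form a vector base")
    return (det3(p, l2, l3) / d, det3(l1, p, l3) / d, det3(l1, l2, p) / d)


# Hypothetical triplet: two front loudspeakers and one elevated loudspeaker.
l1, l2, l3 = unit_vector(30, 0), unit_vector(-30, 0), unit_vector(0, 45)
p = unit_vector(0, 15)
g = vbap_gains(p, l1, l2, l3)
print([round(v, 3) for v in g])
```

All three gains are non-negative exactly when p lies inside the triangle spanned by the triplet, which is what the triplet selection at run time relies on.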


VBAP maths

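[Figure: virtual source p midway between loudspeakers l_1 and l_2 for three opening angles of the loudspeaker base (40°, 90°, 120°); gain labels g_1 = g_2 = 0.5, g_1 = g_2 = 0.7, g_1 = g_2 = 1.0; (q = 2)]

• $g_i$ depend on the opening angle of the loudspeaker base: not good!
• The length of g must be normalized to avoid changes in loudness:

$$\mathbf{g}_{\mathrm{norm}} = \frac{\mathbf{g}}{\left(\sum_i g_i^q\right)^{1/q}}$$

• Thus $\left(\sum_i g_{\mathrm{norm},i}^q\right)^{1/q} = 1$
• q = 1 for anechoic cases (also headphones with virtual loudspeakers)
• q = 2 for normal rooms

Matlab code available: https://se.mathworks.com/matlabcentral/fileexchange/53884-vector-base-amplitude-panning-library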
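The normalization can be sketched in a few lines. The three unnormalized gain values are the ones shown in the figure labels (0.5, 0.7 and 1.0 for a source midway between two loudspeakers); the function name is illustrative only.

```python
def normalize_gains(gains, q=2.0):
    """Scale gains so that (sum_i g_i^q)^(1/q) equals 1.

    Gains are assumed non-negative, as they are for the selected
    loudspeaker pair or triplet; abs() only guards against rounding.
    """
    norm = sum(abs(g) ** q for g in gains) ** (1.0 / q)
    return [g / norm for g in gains]


for g in (0.5, 0.7, 1.0):
    print(g, normalize_gains([g, g], q=2.0), normalize_gains([g, g], q=1.0))
```

After normalization the result no longer depends on the opening angle: with q = 2 both gains are about 0.707 (constant power), and with q = 1 both are 0.5 (constant amplitude sum).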


VBAP runtime cycle

init: Feed in loudspeaker directions
init: 2D: form pairs. 3D: form triplets. Compute inverse matrices
run: Multiply input sound x(t) with g, output to loudspeakers

intrpt1: Start interrupt when the virtual source direction changes
intrpt2: Compute gain factors for each LS pair/triplet, select the pair/triplet with positive gains
intrpt3: Normalize gains

Max demo
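The cycle above can be sketched for the 2D case. This is a minimal illustration under stated assumptions, not the published implementation: the function names are invented, loudspeakers are paired with their angular neighbours, and the layout is assumed to surround the listener with less than 180° between neighbours.

```python
import math


def direction(azimuth_deg):
    a = math.radians(azimuth_deg)
    return (math.cos(a), math.sin(a))


def init_pairs(speaker_azimuths_deg):
    """init: form adjacent loudspeaker pairs and precompute inverse matrices."""
    order = sorted(range(len(speaker_azimuths_deg)),
                   key=lambda i: speaker_azimuths_deg[i] % 360.0)
    pairs = []
    for a, b in zip(order, order[1:] + order[:1]):
        l1 = direction(speaker_azimuths_deg[a])
        l2 = direction(speaker_azimuths_deg[b])
        det = l1[0] * l2[1] - l1[1] * l2[0]
        if abs(det) < 1e-9:
            continue  # opposite or coincident loudspeakers span no base
        inverse = ((l2[1] / det, -l1[1] / det),
                   (-l2[0] / det, l1[0] / det))
        pairs.append((a, b, inverse))
    return pairs


def pan(azimuth_deg, pairs, n_speakers, q=2.0):
    """intrpt: gains for every pair, pick the non-negative one, normalize."""
    p = direction(azimuth_deg)
    best = None
    for a, b, inv in pairs:
        g1 = p[0] * inv[0][0] + p[1] * inv[1][0]   # g = p^T L^-1
        g2 = p[0] * inv[0][1] + p[1] * inv[1][1]
        if best is None or min(g1, g2) > min(best[2], best[3]):
            best = (a, b, g1, g2)
    a, b, g1, g2 = best
    g1, g2 = max(g1, 0.0), max(g2, 0.0)            # remove rounding residue
    norm = (g1 ** q + g2 ** q) ** (1.0 / q)
    gains = [0.0] * n_speakers
    gains[a], gains[b] = g1 / norm, g2 / norm
    return gains


layout = (-30.0, 30.0, 110.0, -110.0)   # hypothetical four-loudspeaker setup
pairs = init_pairs(layout)
print(pan(0.0, pairs, len(layout)))
print(pan(30.0, pairs, len(layout)))
```

In the run phase each loudspeaker signal is simply the input multiplied by its gain; in practice the gains are cross-faded when the direction changes.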


Amplitude panning audio quality

• The direction of an amplitude-panned source is perceived relatively accurately in the best listening position
• Outside the sweet spot, directional perception is dominated by the nearest loudspeaker
• No prominent coloration issues in normal rooms, inside or outside the sweet spot
• The most-used virtual source positioning method: all mixers have "panpot" controls
• Coloration issues in anechoic listening


Amplitude panning, mechanism behind formation of perceived direction
• Perceiving a virtual source between the loudspeakers does not correspond to the actual situation. If you hold a red LED in both your left and right hands, you see two LEDs, not one in between.
• Amplitude panning causes cross-talk and affects both ITD and ILD in a complex way

[Figure: amplitude-panned impulse responses at the ear canals (L and R) for loudspeaker gains g_1 = g_2 and g_1 > g_2; the arrivals from the two loudspeakers are about 0.2 ms apart]

Amplitude panning, mechanism behind formation of perceived direction

• The loudspeaker amplitude difference changes to an interaural time difference at low frequencies
• The loudspeaker amplitude difference changes to an interaural level difference at high frequencies

[Figure: two plots, ITDA [degrees] and ILDA [degrees], as a function of ERB channel (frequency 0.2 to 18.2 kHz), with one curve for each panning angle θ_T from 0° to 30°]

Pulkki, Ville. Spatial sound generation and perception by amplitude panning techniques. PhD thesis. Helsinki University of Technology, 2001.

Spread issue

The perceived spread of amplitude-panned virtual sources depends on the virtual source direction p

• When p is coincident with a loudspeaker direction: "point-like"
• When p is in between loudspeakers: "more or less spread"
• Frequency-dependent ITD and ILD cues do not match those of a real source


Multiple-direction amplitude panning

Make the spread even!

• Define p
• Define a number of vectors $\mathbf{p}_i$ around p, within a given angular range
• Compute $\mathbf{g}_i$ for each $\mathbf{p}_i$
• Sum over i: $\mathbf{g} = \sum_i \mathbf{g}_i$
• Result: always more than one LS has considerable gain.

Max demo

Pulkki, V. (1999). Uniform spreading of amplitude panned virtual sources. In Applications of Signal Processing to Audio and Acoustics, 1999 IEEE Workshop on (pp. 187-190). IEEE.


Coloration of amplitude-panned sources

[Figure: sound paths from loudspeakers 1 and 2 to the left (l) and right (r) ears, shown in two panels]

• Direct sounds from the loudspeakers interfere → comb filter effect → audible coloration
• Reflected and reverberated sound paths arrive at the ear canals in an incoherent manner, and no comb filter effect occurs
→ Amplitude-panned sources are not perceived as colored in normal rooms


Coloration of amplitude-panned sources

• Amplitude-panned sources are colored in anechoic listening
• Isn't anechoic listening just a niche that nobody cares about?
• Headphone listening with virtual loudspeakers + panning: that is anechoic listening (Sony PlayStation VR + many other VR applications)
• We should do something about this
• An easy solution is to use more loudspeakers: when the angle between loudspeakers is smaller, the travel time difference is smaller, and the comb-filter effect migrates to higher frequencies and becomes less salient
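The migration of the comb-filter effect can be illustrated with the simplest possible model: two coherent arrivals at one ear, the second delayed by the travel time difference. The magnitude response is |1 + a e^{-j 2π f Δt}|, and the first notch lies at f = 1 / (2 Δt). The delay values below are illustrative only, and head shadowing is ignored.

```python
import cmath
import math


def comb_magnitude(frequency_hz, delay_s, gain=1.0):
    """Magnitude of a direct arrival plus one delayed, scaled copy."""
    return abs(1.0 + gain * cmath.exp(-2j * math.pi * frequency_hz * delay_s))


def first_notch_hz(delay_s):
    """Lowest frequency at which the two arrivals are in antiphase."""
    return 1.0 / (2.0 * delay_s)


for delay in (0.0002, 0.0001):   # 0.2 ms and 0.1 ms travel time difference
    notch = first_notch_hz(delay)
    print(f"delay {delay * 1e3:.1f} ms: first notch at {notch:.0f} Hz, "
          f"|H| there = {comb_magnitude(notch, delay):.3f}")
```

Halving the travel time difference doubles the frequency of the first notch, which is the effect the last bullet describes.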


Coloration of amplitude-panned sources in anechoic listening

Characteristics
• Dip around 1-2 kHz
• At high frequencies a slightly lower level
• The effect depends on
- panning angle
- loudspeaker directions
- room effect

Can we compensate for this by equalization or other means?


Coloration of amplitude-panned sources in anechoic listening

• Gain factor normalization: $\mathbf{g}_{\mathrm{norm}} = \mathbf{g} / \left(\sum_i g_i^q\right)^{1/q}$
• q = 1 or q = 2
• In anechoic listening only frequencies below about 800 Hz satisfy the q = 1 condition
• At frequencies above 2 kHz it is not intuitively clear what happens.

Let's make q depend on frequency and on the listening room response


Coloration of amplitude-panned sources in anechoic listening
A solution has been proposed in [1]

• Gain factor normalization: $\mathbf{g}_{\mathrm{norm}} = \mathbf{g} / \left(\sum_i g_i^q\right)^{1/q}$
• A solution has been obtained numerically using auditory models and room measurements:

$$q(f, \mathrm{DTT}) = \left(q_0(f) - 2\right)\sqrt{\mathrm{DTT}} + 2$$

$$q_0(f) = 1.5 - 0.5 \cos\left[4.7 \tanh(a_1 f)\right] \max(0,\ 1 - a_2 f)$$

where $a_1 = 0.00045$ and $a_2 = 0.000085$
• DTT is the direct-to-total energy ratio

[1] Laitinen, M. V., Vilkamo, J., Jussila, K., Politis, A., & Pulkki, V. (2014, August). Gain normalization in amplitude panning as a function of frequency and room reverberance. In Audio Engineering Society Conference: 55th International Conference: Spatial Audio. Audio Engineering Society.
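The frequency-dependent exponent can be evaluated numerically. The sketch below assumes the reading q(f, DTT) = (q0(f) − 2) √DTT + 2 with q0(f) = 1.5 − 0.5 cos[4.7 tanh(a1 f)] max(0, 1 − a2 f); the equations on the slide are damaged in this copy, so this form is a reconstruction and should be checked against [1] before use.

```python
import math

A1 = 0.00045
A2 = 0.000085


def q0(frequency_hz):
    """Anechoic normalization exponent (reconstructed form, see lead-in)."""
    return (1.5 - 0.5 * math.cos(4.7 * math.tanh(A1 * frequency_hz))
            * max(0.0, 1.0 - A2 * frequency_hz))


def q(frequency_hz, dtt):
    """Exponent for a given direct-to-total energy ratio DTT in [0, 1]."""
    return (q0(frequency_hz) - 2.0) * math.sqrt(dtt) + 2.0


for f in (0.0, 500.0, 1800.0, 5000.0, 20000.0):
    print(f"{f:7.0f} Hz: anechoic q = {q(f, 1.0):.2f}, reverberant q = {q(f, 0.0):.2f}")
```

Under this reading the limits agree with the earlier slides: q = 1 at low frequencies in anechoic conditions (DTT = 1) and q = 2 at all frequencies in fully reverberant conditions (DTT = 0).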


Coloration of amplitude-panned sources in anechoic listening

• The figures show q(f, DTT) values
• Results simulated with a large number of listening conditions, with loudspeaker spans from 30° to 80°
• Requires a frequency-domain implementation of panning
• Mitigates coloration issues
• Readily implementable in time-frequency-domain processing, such as in DirAC
• Can be implemented with IIR filters (?)


Directional audio coding (DirAC)

Developed in Ville Pulkki's research group from 2001 onward

• Reproduce recorded spatial sound
• Synthesize spatial properties to sound (e.g., game sound engine)
• Time-frequency-domain parametric method
• Non-linear: processing depends on the signal and on the spatial properties of the sound field


How could a sound field be reproduced

Problems with existing techniques




Spatial sound reproduction

Target: relay the perception of sound!




Analogy with video

How does a video camera work?

• Lens
• Light from a distinct direction is projected to one position on the CCD
• The CCD encodes the light energy in three frequency channels (RGB)
• Visible light wavelengths 380 nm - 780 nm (less than one octave)
• Very similar to the eye



Could we do the same with sound as with a video camera?

• Create a narrow beam for each loudspeaker
• Audible sound includes wavelengths from 2 cm to about 30 m
• Impossible to build a microphone with a constant narrow beam width without coloration and noise problems
• Higher-order Ambisonics / beam steering try to do it



Holography then, perhaps?

• Lots of spaced microphones
• Lots of loudspeakers
• Wave field synthesis
• Problems
- High price
- Directivity of microphones should be matched with directivity of loudspeakers: is it possible?


Sound fields reproduced with WFS and HOA

• Monochromatic plane waves reproduced
• Valid sound field only in a limited listening area ("sweet spot")
• Huge errors at high frequencies

Daniel, Jérôme, Sébastien Moreau, and Rozenn Nicol. "Further investigations of high-order ambisonics and wavefield synthesis for holophonic sound imaging." Audio Engineering Society Convention 114. Audio Engineering Society, 2003.


Parametric spatial sound reproduction

Are there any workarounds?

• Human spatial hearing can be fooled easily
• E.g., two coherent sources produce one virtual source in the middle
• Compare with vision: coherent sources do not produce virtual sources
• Assumption: in one frequency band humans perceive only one direction and one coherence cue


Parametric spatial sound reproduction

• Capture the sound
• Analyze spatial parameters
• Reproduce the sound in a way which recreates the spatial parameters

[Block diagram: microphones → time-frequency analysis → spatial analysis → spatial synthesis → loudspeaker or headphone signals. Signal labels: microphone signals in TF domain; spatial metadata in TF domain.]


Assumptions in DirAC

• Assumption 1: the listener is able to localize only one sound object at one time-frequency position
• Assumption 2: good reproduction quality is obtained if we correctly reproduce the
- direction,
- diffuseness, and
- spectrum of sound


Example implementation




B-format microphones




B-format directional patterns

[Figure: directional patterns of orders N = 0, N = 1, N = 2, N = 3]

• N denotes the order of the patterns (and of the microphone)
• 0th-order (omni) microphones capture the pressure signal [W]
• 1st-order dipole microphones capture the particle velocity signals [X, Y, Z]


DirAC details, some of them

• Time-frequency transform
- Filter banks
- STFT
- The system can be seen as a filter changing its weights fast in time; aliasing issues have to be taken into account
• $p \propto W$ is the pressure signal, $\mathbf{u} \propto [X\ Y\ Z]$ is the 3D velocity vector
• Active intensity vector: $\mathbf{I}_a = \Re\left[p^*(k,n)\,\mathbf{u}(k,n)\right]$
• Energy density: $e = \dfrac{\rho_0}{2}\,\|\mathbf{u}\|^2 + \dfrac{|p|^2}{2\rho_0 c^2}$
• Direction of arrival: $\mathrm{DOA} = -\mathbf{I}_a$
• Diffuseness: $\psi = 1 - \dfrac{\|\mathrm{E}[\mathbf{I}_a]\|}{c\,\mathrm{E}[e]}$
• Temporal integration of parameters
- Short time constants for DOA and diffuseness
- Longer for loudspeaker gains


Matlab code for directional analysis
% diranalysis.m
% Author: V. Pulkki
% Example of directional analysis of simulated B-format recording
Fs=44100; % Generate signals
sig1=2*(mod([1:Fs]',40)/80-0.5) .* min(1,max(0,(mod([1:Fs]',Fs/5)-Fs/10)));
sig2=2*(mod([1:Fs]',32)/72-0.5) .* min(1,max(0,(mod([[1:Fs]+Fs/6]',Fs/3)-Fs/6)));
% Simulate two sources in directions of 50 and -170 degrees
w=(sig1+sig2)/sqrt(2);
x=sig1*cos(50/180*pi)+sig2*cos(-170/180*pi);
y=sig1*sin(50/180*pi)+sig2*sin(-170/180*pi);
% Add fading in diffuse noise with 36 sources evenly in the horizontal plane
for dir=0:10:350
  noise=(rand(Fs,1)-0.5).*(10.^((([1:Fs]'/Fs)-1)*2));
  w=w+noise/sqrt(2);
  x=x+noise*cos(dir/180*pi);
  y=y+noise*sin(dir/180*pi);
end
hopsize=256; % Do directional analysis with STFT
winsize=512; i=2; alpha=1./(0.02*Fs/winsize);
Intens=zeros(hopsize,2)+eps; Energy=zeros(hopsize,2)+eps;

Pulkki, Ville, Tapio Lokki, and Davide Rocchesso. "Spatial effects." DAFX: Digital Audio Effects, Second Edition (2011): 139-183.




Matlab code for directional analysis
for time=1:hopsize:(length(x)-winsize)
  % Moving to frequency domain
  W=fft(w(time:(time+winsize-1)).*hanning(winsize));
  X=fft(x(time:(time+winsize-1)).*hanning(winsize));
  Y=fft(y(time:(time+winsize-1)).*hanning(winsize));
  W=W(1:hopsize); X=X(1:hopsize); Y=Y(1:hopsize);
  % Intensity computation
  tempInt = real(conj(W) * [1 1] .* [X Y])/sqrt(2); % Instantaneous
  Intens = tempInt * alpha + Intens * (1 - alpha); % Smoothed
  % Compute direction from intensity vector
  Azimuth(:,i) = round(atan2(Intens(:,2), Intens(:,1))*(180/pi));
  % Energy computation
  tempEn=0.5 * (sum(abs([X Y]).^2, 2) * 0.5 + abs(W).^2 + eps); % Instantaneous
  Energy(:,i) = tempEn*alpha + Energy(:,(i-1)) * (1-alpha); % Smoothed
  % Diffuseness computation
  Diffuseness(:,i) = 1 - sqrt(sum(Intens.^2,2)) ./ (Energy(:,i));
  i=i+1;
end

% Plot variables
figure(1); imagesc(log(Energy)); title('Energy'); set(gca,'YDir','normal');
xlabel('Time frame'); ylabel('Freq bin');
figure(2); imagesc(Azimuth); colorbar; set(gca,'YDir','normal');
title('Azimuth'); xlabel('Time frame'); ylabel('Freq bin');
figure(3); imagesc(Diffuseness); colorbar; set(gca,'YDir','normal');
title('Diffuseness'); xlabel('Time frame'); ylabel('Freq bin');


"HQ" implementation

The coherence between the virtual microphone channels is too high; the reproduction is enhanced as follows:
• Diffuse stream: loudspeaker-specific, frequency-dependent delay (decorrelation)
• Non-diffuse stream: panning factors used as gates


Applications of DirAC

• Teleconferencing [1]
• Realistic reproduction of spatial sound environments [2]
– especially for head-mounted displays (VR) [3]
• Virtual reality (game) audio engines [4]
• Spatial audio effects [5]

[1] Ahonen, Jukka. "Microphone front-ends for spatial sound analysis and synthesis with Directional Audio Coding." PhD thesis. Aalto University (2013).
[2] V. Pulkki, M.-V. Laitinen, J. Vilkamo, and J. Ahonen. "First-order directional audio coding (DirAC)." Parametric time-frequency-domain spatial audio. Wiley (2017), in press, ask for a copy.
[3] Laitinen, M. V., and Pulkki, V. (2009, October). Binaural reproduction for directional audio coding. In Applications of Signal Processing to Audio and Acoustics, 2009. WASPAA'09. IEEE Workshop on (pp. 337-340). IEEE.
[4] Laitinen, M. V., Pihlajamäki, T., Erkut, C., and Pulkki, V. (2012). Parametric time-frequency representation of spatial sound in virtual worlds. ACM Transactions on Applied Perception (TAP), 9(2), 8.
[5] Politis, A., Pihlajamäki, T., and Pulkki, V. (2012). Parametric spatial audio effects. York, UK, September.


Capturing the reality

Omnidirectional cameraat least 6 lensesstitched to spherical video

3D microphonegeneric representation of spatialaudiocan be reproduced over DirAC

Gomez Bolanos AND Pulkki Immersive Audiovisual Environment with 3D audio

Fig. 7: Omnidirectional camera (Ladybug 3) setupwith the A-format microphone (SPS200).

Fig. 8: Video cropping utility (MAX patch). Crop-ping our group meeting recording.

and the e�ect of widening the sound source is alsoperceived correctly.

In general, the system has a good match betweenthe spatial distribution of the auditory events andthe spatial distribution of the visual events, whichtogether with the wide field of view, improves thesensation of immersion and, in consequence, realismin the scene.

5. CONCLUSIONS

An implementation of an immersive audiovisual en-vironment was presented. The system is based onthe use of acoustically transparent screens which re-duces the necessity of complex filtering for correctionof the loudspeaker responses. This environment con-sists of 29 active loudspeakers and three high defini-tion video projectors controlled by a computer. Thesystem also includes a tracking system for interactivepurposes. The loudspeakers are disposed around thelistener in a spherical disposition. The visual dis-play spans 226� in the horizontal plane and 57� inthe vertical plane. The system is easy to assem-ble and disassemble, and allows modifications in theconfiguration of the loudspeakers and the projectorsin order to perform other tasks. With this flexibil-ity, the system can be used for researching into otherfields as crossmodal interaction and psychoacoustics,auralization and gaming. An audiovisual capturingsystem consisting of an omnidirectional camera andan A-format microphone is utilized for acquiring au-diovisual material for the system. Several recordingswere done using the capturing system. It has beenfound, from informal listening tests of the recordedmaterial, that the system presents a good match be-tween visual and audio events, providing good spa-tial audio quality. The non-anechoic characteristicsof the room do not seem to a�ect the spatial audioquality of the reproduction system when DirAC isutilized.

6. ACKNOWLEDGEMENTS

The research leading to these results has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement no. [240453].

AES 132nd Convention, Budapest, Hungary, 2012 April 26–29





Head-mounted audiovisual displays

Reproduction
• Head-mounted visual display + headphones
• Both video and spatial audio are updated with head-tracking information
• Generic representation of audio in DirAC is well suited for this

Demonstration by Aalto → Fraunhofer IIS demonstration
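The head-tracking update can be illustrated with a minimal sketch. It assumes the scene is available as traditional first-order B-format (W scaled by 1/√2, X pointing to the front, Y to the left) and that only head yaw is tracked; the function names `encode_bformat`, `compensate_head_yaw` and `bformat_azimuth` are hypothetical and are not taken from any DirAC implementation. The idea is simply that turning the head by an angle is equivalent to rotating the sound field by the opposite angle before rendering.

```python
import numpy as np

def encode_bformat(s, az, el=0.0):
    """Encode mono signal s as a plane wave from (az, el), angles in radians.
    Assumed convention: W = s/sqrt(2), X front, Y left, Z up."""
    w = s / np.sqrt(2.0)
    x = s * np.cos(az) * np.cos(el)
    y = s * np.sin(az) * np.cos(el)
    z = s * np.sin(el)
    return np.stack([w, x, y, z])

def compensate_head_yaw(bfmt, yaw):
    """Rotate the sound field by -yaw so that directions become head-relative.
    W and Z are unaffected by a rotation about the vertical axis."""
    w, x, y, z = bfmt
    c, sn = np.cos(yaw), np.sin(yaw)
    return np.stack([w, c * x + sn * y, -sn * x + c * y, z])

def bformat_azimuth(bfmt):
    """Broadband azimuth estimate from the time-averaged products W*X and W*Y."""
    w, x, y, _ = bfmt
    return np.arctan2(np.mean(w * y), np.mean(w * x))

rng = np.random.default_rng(0)
signal = rng.standard_normal(4800)
scene = encode_bformat(signal, np.radians(30.0))          # source 30 degrees to the left
head_relative = compensate_head_yaw(scene, np.radians(30.0))  # listener turns to face it
print(np.degrees(bformat_azimuth(head_relative)))
```

A full head tracker would also report pitch and roll, which require a 3-D rotation of the X, Y and Z channels; the yaw-only case is shown here because it is the simplest to verify.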




Head-mounted audio-visual displays (VR displays)

• Virtual content (computer-generated world)
• Recorded content (surrounding camera + 3D sound)
• Very strong feeling of being somewhere else to subject
• Ability to produce both externalized and internalized sound scenes




DirAC as virtual reality audio engine

[Figure: block diagram of DirAC as a virtual reality audio engine. Block labels: sound synth 1 … sound synth N; propagation simulation; directional parameters; DirAC-monosynth; mono DirAC stream (single audio channel + directional parameters); DirAC to B-format; mono reverb and B-format reverb, each with parameters; mux; B-format stream; B-format DirAC encoding/decoding; loudspeaker or headphone signals.]

• Control spatial extent of virtual sources
• With headphones: creation of external and internal sources
• Loudspeaker-setup-independent reverberator
• Efficient transmission of spatial sound




Development

• Different versions of DirAC available
• 1st-order B-format input
  • some artifacts in acoustically challenging situations: applause, broad-band sources in opposite directions, very strong early reflections
  • decorrelation process causes artifacts
  • covariance-domain rendering minimizes the level of decorrelated sound [1]
  • different assumptions about the parameters yield different approaches [2], which have slightly different problems
• With a higher number of microphones, higher quality is obtained, and the parametric processing needs to be less aggressive
• A number of different techniques that use a parametric approach in spatial audio have been proposed, see overview in [3]

[1] Vilkamo, Juha, and Ville Pulkki. "Minimization of decorrelator artifacts in directional audio coding by covariance domain rendering." Journal of the Audio Engineering Society 61.9 (2013): 637-646.
[2] Barrett, Natasha, and Svein Berge. "A new method for B-format to binaural transcoding." Audio Engineering Society Conference: 40th International Conference: Spatial Audio: Sense the Sound of Space. Audio Engineering Society, 2010.
[3] A. Politis, S. Delikaris-Manias and V. Pulkki. "Overview to time-frequency-domain parametric spatial audio techniques." Parametric time-frequency-domain spatial audio. Wiley (2017), in press, ask for a copy.
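The parameters that first-order DirAC works with, direction of arrival and diffuseness, can be sketched from a B-format signal as follows. This is a simplified broadband, time-averaged illustration and not the published implementation: real DirAC performs the analysis separately in each time-frequency tile. The B-format convention (W scaled by 1/√2) and the names `encode_bformat` and `dirac_analysis` are assumptions made for this example. The second case reproduces one of the challenging situations listed above: two uncorrelated broad-band sources in opposite directions cancel each other in the intensity vector, so the analysis reports an almost fully diffuse field even though both sources are point-like.

```python
import numpy as np

def encode_bformat(s, az, el=0.0):
    """Encode mono signal s as a plane wave from (az, el), angles in radians.
    Assumed convention: W = s/sqrt(2), X front, Y left, Z up."""
    return np.stack([s / np.sqrt(2.0),
                     s * np.cos(az) * np.cos(el),
                     s * np.sin(az) * np.cos(el),
                     s * np.sin(el)])

def dirac_analysis(bfmt, eps=1e-12):
    """Return (azimuth, elevation, diffuseness) from time-averaged energetic analysis."""
    w = bfmt[0]
    v = bfmt[1:]                                        # X, Y, Z as a 3 x T array
    # Under the assumed convention this vector points toward the source.
    direction_vec = np.sqrt(2.0) * np.mean(w * v, axis=1)
    energy = np.mean(w ** 2 + 0.5 * np.sum(v ** 2, axis=0))
    diffuseness = 1.0 - np.linalg.norm(direction_vec) / max(energy, eps)
    azimuth = np.arctan2(direction_vec[1], direction_vec[0])
    elevation = np.arctan2(direction_vec[2],
                           np.hypot(direction_vec[0], direction_vec[1]))
    return azimuth, elevation, diffuseness

rng = np.random.default_rng(0)
s1 = rng.standard_normal(48000)
s2 = rng.standard_normal(48000)

# Case 1: a single plane wave, expected to be analysed as non-diffuse.
single = encode_bformat(s1, np.radians(40.0))
az_single, el_single, psi_single = dirac_analysis(single)

# Case 2: two uncorrelated sources at 0 and 180 degrees.
opposite = encode_bformat(s1, 0.0) + encode_bformat(s2, np.pi)
_, _, psi_opposite = dirac_analysis(opposite)

print(np.degrees(az_single), psi_single, psi_opposite)
```

With a single direction and diffuseness value per tile, the two-source case cannot be represented faithfully, which is one motivation for using more microphones and less aggressive parametric processing.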




Higher-order microphones with TF-domain parametric processing

[Figure: block diagram of higher-order DirAC. Labels: sound field; SF divided virtually into sectors; 2nd-order B-format; 1st-order B-format; DirAC 1, DirAC 2, DirAC 3; energetic analysis; covariance-domain rendering.]

Higher-order DirAC (Politis & Pulkki): divide sound field into virtual sound fields




Dissemination of DirAC

• DirAC was published first as SIRR [1] for impulse responses, and later for continuous sound [2]
• First TF-domain parametric audio method where the parameters are measured using a microphone setup
• 346 references to [2] in 10 years, ninth of all JAES articles
• Corresponding patents transferred to Fraunhofer IIS
• Commercialization

[1] Merimaa, Juha, and Ville Pulkki. "Spatial impulse response rendering I: Analysis and synthesis." Journal of the Audio Engineering Society 53.12 (2005): 1115-1127.
[2] Pulkki, Ville. "Spatial sound reproduction with directional audio coding." Journal of the Audio Engineering Society 55.6 (2007): 503-516.



