3/10/17
Ville Pulkki, Tapio Lokki
Virtual acoustics and spatial audio
ELEC-E5620 Audio Signal Processing, 2017
Agenda 10.3.2017
• Virtual Acoustics
• Spatial audio techniques
• Vector-base amplitude panning VBAP (separate slides)
• Directional audio coding DirAC (separate slides)
Basics of sound
Sound propagates as waves
• Audible frequency range 20 … 20 000 Hz
• Speed of sound in air ~340 m/s
• Wavelength 17 m … 17 mm
• Dynamics 0 … 120 dB
• Scattering, diffraction, interference...
Modeling of room acoustics: source – medium – receiver model
Virtual Acoustics (Väänänen, 2003)
Impulse response of a room
[Figure: room with dimensions of 10 meters and 7 meters]
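Impulse response
A linear time-invariant (LTI) system can be modeled with an impulse response
The output y(t) is the convolution of the input x(t) and the impulse response h(t):
$y(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t - \tau)\, d\tau$
Discrete form (the convolution is a sum):
$y[n] = \sum_{k} h[k]\, x[n - k]$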
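The discrete convolution sum can be illustrated with a minimal sketch. The impulse response below is a made-up toy example (direct sound plus one reflection), not a measured room response.

```python
def convolve(x, h):
    """Direct-form discrete convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

# Hypothetical toy "room": direct sound plus one reflection 3 samples later.
h = [1.0, 0.0, 0.0, 0.5]
x = [1.0, 2.0]          # short dry input signal
y = convolve(x, h)      # the input followed by a half-amplitude copy
print(y)
```

For real room responses, which are tens of thousands of samples long, FFT-based convolution would normally replace this double loop.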
Measured impulse responses of real concert halls
[Figure: impulse responses in concert halls at a distance of 15 m from the stage. Panels: Promenadisali, Sibeliustalo, Musiikkitalo. Amplitude −1 … 1 versus time 0 … 1 s]
Measured impulse responses of real concert halls
[Figure: the same impulse responses on a decibel scale. Panels: Promenadisali, Sibeliustalo, Musiikkitalo. Magnitude −40 … 0 dB versus time 0 … 0.5 s]
Two goals of room acoustics modeling
Goal 1: room acoustics prediction
• Static source and receiver positions
• No real-time requirement
Goal 2: auralization, sound rendering
• Possibly moving source(s) and listener, even geometry
• Both off-line and interactive (real-time) applications
• Need for anechoic stimulus signals
Goal 1: Prediction of room acoustics
(Springer Handbook of Acoustics 2007)
Prediction of acoustics of a room (goal 1)
Input data:
• Geometry, materials, source(s) and receiver(s) locations and orientations
Goal:
• Impulse response(s)
- room acoustical parameters (T60, C80, EDT, LEF...), ISO 3382-1:2009
- low-frequency behaviour
Modeling:
• Source(s)
- omnidirectional, sometimes directional
• Medium
- sound propagation in air and reflections
- as accurate as possible
• Receiver(s)
- output mono, fig-of-eight, binaural
Goal 2: Auralization / sound rendering
“Auralization is the process of rendering audible, by physical or mathematical modeling, the sound field of a source in a space, in such a way as to simulate the binaural listening experience at a given position in the modeled space.” (Kleiner et al. 1993, JAES)
Sound rendering: plausible 3-D sound, e.g., in games
3-D model → spatial IR * dry signal = auralization
Auralization, sound rendering (goal 2)
Input data:
• Anechoic stimulus signal(s)!
• Geometry, materials, source(s) and receiver(s) locations and orientations
Goal:
• Plausible spatial sound, authentic auralization
Modeling:
• Source(s)
- omnidirectional, directional
• Medium
- physically-based sound propagation in a room
- perceptual models, i.e., artificial reverb
• Receiver
- spatial sound reproduction (binaural or multichannel)
Dynamic auralization (≈ sound rendering)
Method 1: A grid of impulse responses is computed, and convolution is performed with interpolation between 2/4/8 nodes
• Applied in CATT software (http://www.catt.se)
Method 2: “Parametric rendering”
Source Modeling
Stimulus
• Sound signal synthesis
• Anechoic recordings
- https://mediatech.aalto.fi/en/research/virtual-acoustics/research/acoustic-measurement-and-analysis/85-anechoic-recordings
Radiation
• Directivity is a measure of the directional characteristic of a sound source.
• Point sources
- omnidirectional
- frequency dependent directivity characteristics
• Line and volume sources
• Database of loudspeakers: http://www.clfgroup.org/
Directivity of musical instruments
Pätynen and Lokki (AAuA 2010)
Data available: https://mediatech.aalto.fi/en/research/virtual-acoustics/research/acoustic-measurement-and-analysis/77-directivity-of-instruments
Room acoustics modeling
Scale models
• 1:10, 1:20, 1:50
Wave-based methods
• Element methods (FEM, BEM)
• Time-domain methods (FDTD, e.g., waveguide mesh)
Ray-based methods
• Image-source method, beam tracing
• Ray tracing, particles, phonon tracing, sonel mapping, etc.
• Acoustic radiance transfer
Statistical methods
• SEA
• Sabine, Eyring, etc.
Room acoustics modeling methods
[Diagram: the taxonomy of scale models, wave-based methods (FEM, BEM, FDTD, e.g., waveguide mesh), ray-based methods (image-source method, beam tracing, ray tracing, particles, phonon tracing, sonel mapping, acoustic radiance transfer) and statistical methods (SEA), with the physically-based modeling methods marked]
Building acoustics, such as structural coupling, is not covered in this presentation
Scale models 1:10
(Tachibana et al. 2004)
Copenhagen 2006
Musiikkitalo 2009
Reproduction
The most intuitive way to study room acoustic prediction results
• Not only for experts
Anechoic stimulus signal
• Only a few recordings available
Reproduction with binaural or multi-channel techniques
The impulse response also has to contain spatial information
• Binaural IR
• IR for each loudspeaker (e.g., SIRR, Spatial Impulse Response Rendering, or SDM, Spatial Decomposition Method)
Spatial audio (a.k.a. 3D sound)
Humans are able to perceive the direction of a sound event using only two ears. The mechanisms are based on monaural and binaural analysis of ear canal signals.
A three-dimensional sound illusion can be achieved using headphones or a pair of loudspeakers. We can "cheat" the auditory system using 3-D audio!
• We need to know:
- basics of human hearing
- basics of spatial hearing
- signal processing
- basics of loudspeakers
- room acoustics
Modeling the acoustics of the listener
HRIR = head-related impulse response
HRTF = head-related transfer function
Binaural hearing
Humans have two ears
First studies already in 1876 and 1907 by Lord Rayleigh
Binaural hearing is based on:
• interaural time difference (ITD), below ~700 Hz
• interaural level difference (ILD), above ~2000 Hz
• in between (700-2000 Hz) both ITD and ILD (also other features)
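To give a feeling for the size of the ITD cue, here is a rough sketch. The slides give no ITD formula; the code uses Woodworth's classical spherical-head approximation, ITD = (a / c)(θ + sin θ), and the head radius of 8.75 cm is an assumed typical value.

```python
import math

SPEED_OF_SOUND = 340.0   # m/s, as on the "Basics of sound" slide
HEAD_RADIUS = 0.0875     # m, assumed average head radius

def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a distant source at the given azimuth
    (0 = straight ahead, 90 = directly to one side), spherical-head model."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(az, round(woodworth_itd(az) * 1e6), "microseconds")
```

Under these assumptions the ITD stays below roughly 0.7 ms even for a source directly to the side.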
Each human pinna is individual
Torso, shoulders, head and pinna modify the perceived spectrum as a function of the incoming angle of sound
Room acoustics (early reflections), head movements and vision also contribute to the spatial hearing sensation
HRTF definition:
• Free-field impulse response from a point in space to a point in the listener's ear canal
HRTF modeling and filter design
Choice of HRTFs
• Individual HRTFs yield best results
Minimum-phase reconstruction
• Reconstruction of ITD using a delay line
Spectral preprocessing
• Equalization (diffuse-field, free-field)
• Auditory smoothing
Filter design
• FIR, IIR, warped structures
• Least squares, Chebyshev, Hankel norm designs
HRTF measurements
• Microphones in ears
• Turntable
http://www.ais.riec.tohoku.ac.jp/Lab3/localization/
HRTF measurements, new development
[Huttunen et al. Rapid generation of personalized HRTFs. In AES 55th conf, 2014]
Scanning head geometry, mathematical model, simulation (FM-BEM)
• Multiple cameras, 3D laser scanners, video, etc. (http://ownsurround.com)
Reciprocal measurement: loudspeaker in the ear, N receivers
Binaural reproduction with headphones
Separate sound signals for both ears
• HRTFs required
Head-tracking needed to perceive sources outside the head (externalization)
Cross-talk cancellation: binaural reproduction with two loudspeakers
Cross-talk cancellation was first introduced by Atal and Schroeder (1963 publication, 1966 patent)
Originally intended for playing back dummy head recordings over loudspeakers
Symmetry exploited in the shuffler structure, transaural processing, by Cooper and Bauck (1989), covered by many patents
Loudspeaker setups
• 60 degrees
• 10 degrees (stereo-dipole)
Nintendo DS, Nokia phones
Vector base amplitude panning (VBAP)
• Developed by Ville Pulkki, 1997 at TKK
• Amplitude panning extended to 3D
• Simple calculation of gain factors
• Arbitrary positioning of loudspeakers
Ambisonics
Invented by Michael Gerzon (1973)
Both a recording and a reproduction technique
• Soundfield microphone
Based on spherical harmonics theory
2D panning, 1st order gain factors
IRCAM recently had a 9th order system (>300 loudspeakers)
[Figures: quadraphonic loudspeaker setup with first-order Ambisonics, and with second-order Ambisonics]
Wave field synthesis (WFS)
Idea proposed by Berkhout (JAES, 1988) at Delft University of Technology
• Based on the Huygens–Fresnel principle
Requires a lot of loudspeakers
Large listening area
Great animations: http://www.syntheticwave.de/wfs-properties.htm
• http://www.syntheticwave.de/Principle%20of%20wave%20field%20synthesis.htm
• http://www.syntheticwave.de/pictures/wave_field_synthesis.swf
IOSONO
• http://www.iosono-sound.com
Multichannel loudspeaker setups
[Figure: loudspeaker layouts 5.1, 10.2 and 22.2]
Loudspeaker-setup agnostic systems
• 40-120 discrete channels are transmitted to the user
• Each channel contains spatial metadata (panning direction, spread, etc.)
• Supports in principle any number of loudspeakers
• Cinema audio format for 3D loudspeaker setups
• Also on Blu-ray
- Dolby Atmos: http://www.dolby.com/us/en/professional/technology/cinema/dolby-atmos.html
- DTS:X: http://dts.com/dtsx
- MPEG-H: https://en.wikipedia.org/wiki/MPEG-H_3D_Audio
Virtual loudspeakers with headphones
Each loudspeaker signal is convolved with the corresponding HRTFs
• Head-tracking needs a real-time implementation
Dolby Headphone (5.1 or 7.1 with headphones)
Sony PlayStation VR uses VBAP with N virtual loudspeakers; head tracking affects the virtual source directions, not the HRTFs
Case: Auralization in DIVA system (Savioja et al., JAES 1999)
1. Scene definition
2. Parametric presentation of sound paths
3. Auralization with parametric DSP structure
Auralization parameters in DIVA
• Input contains (given to image-source calculation)
– geometry data, material data, positions and orientations of sources and the listener
– anechoic recording
• For the direct sound and each image source the following set of auralization parameters is provided:
– Distance from the listener
– Azimuth and elevation angles with respect to the listener
– Source orientation with respect to the listener
– Set of filter coefficients which describe the material properties in reflections
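The geometric part of this parameter set can be sketched as follows. This is an illustrative toy, not the DIVA implementation: the function names, the single wall at x = 0, the listener facing the +x axis, and the plain 1/r gain are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, as on the "Basics of sound" slide

def mirror_in_wall(source, wall_x):
    """First-order image of a source in a wall lying in the plane x = wall_x."""
    x, y, z = source
    return (2.0 * wall_x - x, y, z)

def auralization_params(source, listener):
    """Distance, delay, 1/r gain, azimuth and elevation of a (possibly image)
    source, for a listener assumed to face the +x axis."""
    dx, dy, dz = (s - l for s, l in zip(source, listener))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    return {
        "distance": dist,                       # metres
        "delay": dist / SPEED_OF_SOUND,         # seconds
        "gain": 1.0 / dist,                     # distance attenuation
        "azimuth": math.degrees(math.atan2(dy, dx)),
        "elevation": math.degrees(math.asin(dz / dist)),
    }

source = (2.0, 1.0, 1.5)      # hypothetical positions in metres
listener = (5.0, 1.0, 1.5)
direct = auralization_params(source, listener)
reflection = auralization_params(mirror_in_wall(source, 0.0), listener)
print(direct["distance"], reflection["distance"])
```

Higher-order image sources follow by mirroring the images again; a complete implementation would also have to check the visibility of each image source.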
Treatment of one image source – a DSP view
• Directivity
• Air absorption
• Distance attenuation
• Reflection filters
• Listener modeling

• Linear system
• Commutation
• Cascading
DIVA auralization block diagram
Treatment of each image source
Late reverberation algorithm
• A special version of feedback delay network (Väänänen et al. 1997)
– also time-variant method (Lokki & Hiipakka, 2001)
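For orientation, a generic textbook feedback delay network (FDN) can be sketched in a few lines. This is not the special DIVA version cited above: the four delay lengths, the scaled Hadamard feedback matrix and the loop gain are all assumed values chosen only to produce a decaying response.

```python
def fdn_impulse_response(n_samples, delays=(149, 211, 263, 293), loop_gain=0.97):
    """Impulse response of a 4-line FDN with a scaled Hadamard feedback matrix."""
    hadamard = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
    # 0.5 * Hadamard is orthogonal; loop_gain < 1 makes the response decay.
    feedback = [[loop_gain * 0.5 * v for v in row] for row in hadamard]
    lines = [[0.0] * d for d in delays]   # circular delay-line buffers
    pos = [0] * len(delays)
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0        # unit impulse input
        taps = [lines[i][pos[i]] for i in range(4)]   # oldest sample per line
        out.append(sum(taps))
        for i in range(4):
            fed_back = sum(feedback[i][j] * taps[j] for j in range(4))
            lines[i][pos[i]] = x + fed_back
            pos[i] = (pos[i] + 1) % delays[i]
    return out

ir = fdn_impulse_response(2000)
print(ir.index(1.0))   # sample index of the first output, set by the shortest delay
```

Practical reverberators add frequency-dependent attenuation filters in the loops so that the decay time can be set per frequency band.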
A Case Study: a Lecture Room
Image sources 1st order
Image sources up to 2nd order
Image sources up to 3rd order
Distance attenuation
Distance attenuation (zoomed)
Gain + air absorption
Gain + air and material absorption
All monaural filtering
All monaural filtering (zoomed)
Treatment of each image source
Only ITD for pure impulse
Only ITD for pure impulse (zoom)
ITD + minimum phase HRTF
Monaural filterings + ITD
Monaural filterings + ITD + HRTF
Image sources + reverberation
Dynamic Sound Rendering
• Dynamic rendering
– properties of image sources are time variant
• The coefficients of filters are changing all the time
– every single parameter has to be interpolated
– in delay line pick-ups, fractional delay filters have to be used to avoid clicks and artifacts
– late reverberation is static
• Update rate ⇔ latency
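The fractional pick-up from a delay line can be sketched with linear interpolation, the simplest (first-order) fractional delay filter. The slides do not say which filter order DIVA used, so this is only an illustration of the idea; the function name and the buffer layout are assumptions.

```python
def read_fractional(buffer, delay):
    """Read a delay line at a non-integer delay (in samples) using linear
    interpolation. buffer[-1] is the newest sample; delay = 0 returns it."""
    whole = int(delay)
    frac = delay - whole
    newer = buffer[-1 - whole]
    older = buffer[-2 - whole]
    return (1.0 - frac) * newer + frac * older

ramp = [float(n) for n in range(10)]     # test signal 0, 1, ..., 9
print(read_fractional(ramp, 2.5))        # value halfway between samples 6 and 7
```

When a source moves, the delay changes by a small fraction of a sample on every output sample, so reading at the interpolated position avoids the clicks that jumping between integer delays would cause.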
Auralization quality
• What is the wanted quality?
– Assessment of quality is possible only by case studies
• Objectively:
– Acoustical attributes
– With auditory modeling
• Subjectively:
– Listening tests
A case study, lecture hall T3
Example impulse responses
• Image sources up to 4th order auralized
• First order diffraction modeled
• Statistical late reverberation
Quality of auralization
Stimuli: clarinet, drum
Results, clarinet: recording vs. auralization
Results, drum: recording vs. auralization
Conclusions on room acoustics modeling
Two goals:
• Prediction of acoustical attributes
• Auralization, sound rendering
A lot of different methods are applied in room acoustic modeling
• All of them have weaknesses
• A hybrid method, combining many techniques, would be the ideal solution
• A few commercial software packages are available and in everyday use by consultants
• Some methods are still much too complex for modern computers (computation time and memory)
Literature
• Required reading (for exam):
– Savioja, L., Huopaniemi, J., Lokki, T., and Väänänen, R. 1999. Creating virtual acoustic environments. Journal of the Audio Engineering Society, vol. 47, no. 9, pp. 675-705, September 1999.
• Recommended reading:
– Avendano, C., Jot, J-M. Frequency domain techniques for stereo to multichannel upmix. Proc. AES 22nd international conference, June 15-17, 2002, Espoo, Finland, pp. 121-130
– Lokki, T. Tasting music like wine: Sensory evaluation of concert halls. Physics Today, vol. 67, no. 1, pp. 27-32, 2014. http://dx.doi.org/10.1063/PT.3.2242
– TKK doctoral dissertations, see http://lib.tkk.fi/Diss/
  • Huopaniemi, J. Virtual acoustics and 3-D sound in multimedia signal processing, 1999
  • Savioja, L. Modeling Techniques for Virtual Acoustics, 1999
  • Pulkki, V. Spatial Sound Generation and Perception by Amplitude Panning Techniques, 2001
  • Lokki, T. Physically-based Auralization – Design, Implementation, and Evaluation, 2002
  • Väänänen, R. Parametrization, Auralization, and Authoring of Room Acoustics for Virtual Reality Applications, 2003
  • Merimaa, J. Analysis, Synthesis, and Perception of Spatial Sound – Binaural Localization Modeling and Multichannel Loudspeaker Reproduction, 2006
  • Siltanen, S. Efficient Physics-Based Room-Acoustics Modeling and Auralization, 2010
  • Pätynen, J. A virtual symphony orchestra for studies on concert hall acoustics, 2011
  • Tervo, S. Localization and tracing of early acoustic reflections in enclosures, 2012
  • Laitinen, M-V. Techniques for versatile spatial-audio reproduction in time-frequency domain, 2014
Vector-base amplitude panning VBAP
Directional audio coding DirAC
Ville Pulkki
Professor of Acoustics (Associate Professor)
Department of Signal Processing and Acoustics
School of Electrical Engineering
Aalto University, Helsinki, Finland
Installation lecture
January 19, 2016
These slides
• Vector base amplitude panning (VBAP)
• Variants and enhancement of VBAP
• Time-frequency-domain parametric spatial audio
• Directional audio coding
Vector-base amplitude panning VBAP Directional audio codingDirAC
2/46
Pulkki January 19, 2016Dept Signal Processing and Acoustics
A music student with MSc (Eng) needs extra income (1995)
• Sibelius Academy chamber music hall had lots of loudspeakers on walls and ceiling
• SibA wanted to have a "panning tool" for their loudspeaker system (one month's salary for a student)
• 1-month joint project between TKK and SibA
Reformulation of amplitude panning
• Tried to generalize the sine panning law to 3D, no luck
• "Could this be formulated with vector bases?" – "Yes!"
• Vector base amplitude panning (VBAP) was born
• Divide setup into triplets, and compute gain factors for each
[Figure: loudspeakers k, m and n with direction vectors l_k, l_m and l_n, and a virtual source in direction p between them]
Vector base amplitude panning
PhD degree in 2001.
Dissemination of VBAP
• Published VBAP paper in JAES 1997
• Provided free software implementations of the method
• Article has been cited 990 times in Google Scholar (2017)
• Second most cited of all JAES papers (Scopus)
Products with "VBAP inside"
• ISO/IEC MPEG-H audio standard (broadcast)
• DTS:X audio format (cinema + Blu-ray)
• Sony PlayStation VR (gaming)
• Dedicated audio programming software
VBAP maths
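[Figure: loudspeakers k, m and n with direction vectors l_k, l_m and l_n, and a virtual source in direction p between them]
$p = g_1 l_1 + g_2 l_2 + g_3 l_3$
$g = [p_1 \; p_2 \; p_3] \begin{bmatrix} l_{11} & l_{12} & l_{13} \\ l_{21} & l_{22} & l_{23} \\ l_{31} & l_{32} & l_{33} \end{bmatrix}^{-1}$
g holds the barycentric coordinates of the virtual source in the vector base
Loudspeaker signals: $y_i(t) = g_i x(t)$
$g_i$ controls the amplitude of the signal in each loudspeaker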
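The gain computation $p = g_1 l_1 + g_2 l_2 + g_3 l_3$ can be sketched without any matrix library by solving the 3×3 system with Cramer's rule. The loudspeaker triplet below (azimuth ±30° at ear level plus one at 45° elevation) is a hypothetical setup chosen for the example.

```python
import math

def unit_vector(azimuth_deg, elevation_deg):
    """Cartesian unit vector for a direction given in degrees."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose rows are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def vbap_gains(p, l1, l2, l3):
    """Unnormalized gains g with p = g1*l1 + g2*l2 + g3*l3 (Cramer's rule)."""
    d = det3(l1, l2, l3)
    return (det3(p, l2, l3) / d, det3(l1, p, l3) / d, det3(l1, l2, p) / d)

# Hypothetical triplet: left, right and an elevated centre loudspeaker.
l1, l2, l3 = unit_vector(30, 0), unit_vector(-30, 0), unit_vector(0, 45)
p = unit_vector(10, 15)            # virtual source direction
g = vbap_gains(p, l1, l2, l3)
print([round(v, 3) for v in g])
```

All three gains are non-negative only when p lies inside the triangle spanned by the triplet, which is what the triplet selection step relies on.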
VBAP maths
[Figure: virtual source p midway between loudspeakers l1 and l2. Unnormalized gains: g1 = g2 = 0.5 for a 40° opening angle, g1 = g2 = 0.7 for 90°, and g1 = g2 = 1.0 for 120°. Label: (q = 2)]
The $g_i$ depend on the opening angle of the loudspeaker base: not good!
The length of g must be normalized to avoid changes in loudness:
$g_{norm} = g / (\sum_i g_i^q)^{1/q}$
Thus: $(\sum_i g_{norm,i}^q)^{1/q} = 1$
q = 1 for anechoic cases (also headphones with virtual loudspeakers)
q = 2 for normal rooms
Matlab code available: https://se.mathworks.com/matlabcentral/fileexchange/53884-vector-base-amplitude-panning-library
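A small sketch of the normalization, independent of the Matlab library linked above. For a source midway between two loudspeakers the unnormalized gains are 1 / (2 cos(opening/2)), which follows from p = g(l1 + l2); the helper names are made up for this example.

```python
import math

def centre_gains(opening_deg):
    """Unnormalized VBAP gains for a source midway between two loudspeakers
    separated by the given opening angle."""
    g = 1.0 / (2.0 * math.cos(math.radians(opening_deg / 2.0)))
    return [g, g]

def normalize_gains(g, q=2.0):
    """Scale g so that (sum_i g_i^q)^(1/q) equals 1."""
    norm = sum(gi ** q for gi in g) ** (1.0 / q)
    return [gi / norm for gi in g]

for opening in (40.0, 90.0, 120.0):
    raw = centre_gains(opening)
    print(opening, round(raw[0], 2),
          [round(v, 3) for v in normalize_gains(raw, q=2.0)],
          [round(v, 3) for v in normalize_gains(raw, q=1.0)])
```

The raw gains grow with the opening angle (about 0.53, 0.71 and 1.0), while the normalized gains are the same for every opening angle: about 0.707 each for q = 2 and 0.5 each for q = 1.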
VBAP runtime cycle
init: Feed in loudspeaker directions
init: 2D: form pairs. 3D: form triplets. Compute inverse matrices
run: Multiply input sound x(t) with g, output to loudspeakers
intrpt1: Start interrupt when virtual source direction changes
intrpt2: Compute gain factors for each LS pair/triplet, select the pair/triplet with positive gains
intrpt3: Normalize gains
Max demo
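The runtime cycle can be sketched for the 2D case. This is a minimal illustration under stated assumptions, not the code behind the Max demo: the function names, the 5-loudspeaker layout, and the small negative tolerance used when testing for positive gains are all made up for the example.

```python
import math

def init_pairs(speaker_azimuths_deg):
    """init: form adjacent loudspeaker pairs and precompute 2x2 inverse matrices."""
    az = sorted(speaker_azimuths_deg)
    pairs = []
    for a, b in zip(az, az[1:] + az[:1]):          # includes the wrap-around pair
        l1 = (math.cos(math.radians(a)), math.sin(math.radians(a)))
        l2 = (math.cos(math.radians(b)), math.sin(math.radians(b)))
        det = l1[0] * l2[1] - l1[1] * l2[0]
        if abs(det) < 1e-9:
            continue                                # skip degenerate pairs
        inverse = ((l2[1] / det, -l1[1] / det), (-l2[0] / det, l1[0] / det))
        pairs.append(((a, b), inverse))
    return pairs

def pan(pairs, source_azimuth_deg, q=2.0):
    """intrpt: compute g = p * L^-1 for each pair, pick the pair with
    non-negative gains, and normalize."""
    p = (math.cos(math.radians(source_azimuth_deg)),
         math.sin(math.radians(source_azimuth_deg)))
    for (a, b), inv in pairs:
        g1 = p[0] * inv[0][0] + p[1] * inv[1][0]
        g2 = p[0] * inv[0][1] + p[1] * inv[1][1]
        if g1 >= -1e-9 and g2 >= -1e-9:
            g1, g2 = max(g1, 0.0), max(g2, 0.0)
            norm = (g1 ** q + g2 ** q) ** (1.0 / q)
            return {a: g1 / norm, b: g2 / norm}
    raise ValueError("no loudspeaker pair encloses the source direction")

pairs = init_pairs([-110, -30, 0, 30, 110])   # hypothetical 5.0-style layout
gains = pan(pairs, 15.0)
print(gains)   # only the loudspeakers at 0 and 30 degrees are active
```

In the run phase each output channel is simply the input signal multiplied by its gain; loudspeakers that are not in the selected pair stay silent.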
Amplitude panning audio quality
• Direction of an amplitude-panned source is perceived relatively accurately in the best listening position
• Outside the sweet spot, directional perception is dominated by the nearest loudspeaker
• No prominent coloration issues in normal rooms, inside or outside the sweet spot
• Most-used virtual source positioning method: all mixers have "panpot" buttons
• Coloration issues in anechoic listening
Amplitude panning, mechanism behind formation of perceived direction
Perceiving a virtual source between the loudspeakers does not correspond to the actual situation. If you hold a red LED in both your left and right hands, you see two LEDs, not one in between.
Amplitude panning causes cross-talk and affects both ITD and ILD in a complex way
[Figure: amplitude-panned impulse responses at the ear canals for g1 = g2 and for g1 > g2, with time differences of about 0.2 ms marked]
Amplitude panning, mechanism behind formation of perceived direction
• Loudspeaker amplitude difference changes to interaural time difference at low frequencies
• Loudspeaker amplitude difference changes to interaural level difference at high frequencies
[Figures: ITDA and ILDA in degrees as a function of ERB channel (center frequencies 0.2 … 18.2 kHz), with curves for θT from 0° to 30°]
Pulkki, Ville. Spatial sound generation and perception by amplitude panning techniques. PhD thesis. Helsinki University of Technology, 2001.
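The low-frequency claim above (a loudspeaker amplitude difference turning into an interaural time difference) can be checked with a deliberately crude phasor model. The assumptions are loud: each ear receives the near loudspeaker directly and the far loudspeaker 0.2 ms later, with no head shadow at all. Because head shadow is ignored, this model produces no ILD and says nothing about high frequencies.

```python
import cmath
import math

def summed_itd(g_left, g_right, freq_hz, crosstalk_delay=0.2e-3):
    """ITD in seconds (positive: left ear leads) of the summed ear signals for
    a sinusoid played from two loudspeakers with gains g_left and g_right."""
    w = 2.0 * math.pi * freq_hz
    cross = cmath.exp(-1j * w * crosstalk_delay)   # delayed far-loudspeaker path
    left_ear = g_left + g_right * cross
    right_ear = g_right + g_left * cross
    return (cmath.phase(left_ear) - cmath.phase(right_ear)) / w

for gains in ((0.71, 0.71), (0.8, 0.6), (1.0, 0.0)):
    print(gains, round(summed_itd(gains[0], gains[1], 200.0) * 1e6, 1), "microseconds")
```

With equal gains the ITD is zero (centre image); as the gain difference grows, the ITD moves smoothly towards the full cross-talk delay, which is how a pure level difference between loudspeakers becomes a time difference between the ears.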
Spread issue
Perceived spread of amplitude-panned virtual sources depends on the virtual source direction p
• When p is coincident with a loudspeaker direction: "point-like"
• When p is in between loudspeakers: "more or less spread"
• Frequency-dependent ITD and ILD cues do not match those of a real source
Multiple-direction amplitude panning
Make the spread even!
• Define p
• Define a number of vectors $p_i$ around p, within an angular range around p
• Compute $g_i$ for each $p_i$
• Sum over i: $g = \sum_i g_i$
Result: always more than one LS has considerable gain.
Max demo
Pulkki, V. (1999). Uniform spreading of amplitude panned virtual sources. In Applications of Signal Processing to Audio and Acoustics, 1999 IEEE Workshop on (pp. 187-190). IEEE.
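The multiple-direction idea can be sketched in 2D. The loudspeaker layout, the ±20° spread, the five panning directions and the final q = 2 normalization are assumptions for this example; the cited paper should be consulted for the actual details.

```python
import math

SPEAKERS = [-60.0, 0.0, 60.0]   # hypothetical frontal setup, azimuths in degrees

def pair_gains(src, a, b):
    """Unnormalized 2D VBAP gains for a source between loudspeakers a < b."""
    s = math.sin(math.radians(b - a))
    return (math.sin(math.radians(b - src)) / s,
            math.sin(math.radians(src - a)) / s)

def vbap(src):
    """Plain VBAP: gains for the one pair that encloses the source direction."""
    g = [0.0] * len(SPEAKERS)
    for i in range(len(SPEAKERS) - 1):
        a, b = SPEAKERS[i], SPEAKERS[i + 1]
        if a <= src <= b:
            g[i], g[i + 1] = pair_gains(src, a, b)
            break
    return g

def mdap(src, spread=20.0, n_directions=5, q=2.0):
    """Sum VBAP gains over several directions around src, then normalize."""
    total = [0.0] * len(SPEAKERS)
    for k in range(n_directions):
        direction = src - spread + 2.0 * spread * k / (n_directions - 1)
        for i, gi in enumerate(vbap(direction)):
            total[i] += gi
    norm = sum(g ** q for g in total) ** (1.0 / q)
    return [g / norm for g in total]

print(vbap(0.0))                              # a single loudspeaker is active
print([round(g, 3) for g in mdap(0.0)])       # neighbours now also get gain
```

With plain VBAP a source exactly at a loudspeaker direction uses that loudspeaker alone; with the spread directions the neighbouring loudspeakers also receive gain, which evens out the perceived spread across panning directions.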
Coloration of amplitude-panned sources
[Figure: sound paths from the two loudspeakers to the left and right ears]
Direct sounds from loudspeakers interfere → comb filter effect → audible coloration
Reflected and reverberated sound paths arrive at the ear canals in an incoherent manner, and no comb filter effect occurs
→ Amplitude-panned sources are not perceived as colored in normal rooms
Coloration of amplitude-panned sources
• Amplitude-panned sources are colored in anechoic listening
• Isn’t anechoic listening just a niche that nobody cares about?
• Headphone listening with virtual loudspeakers + panning: that is anechoic listening (Sony PlayStation VR + many other VR applications)
• We should do something about this
• An easy solution is to use more loudspeakers: when the angle between loudspeakers is smaller, the travel-time difference is smaller, and the comb-filter effect migrates to higher frequencies and becomes less salient
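Why a smaller travel-time difference pushes the comb filter upwards can be seen from the simplest possible model: two equal-level copies of a signal, one delayed by τ, summed at the ear. The equal levels and the delay values below are assumptions for illustration; real ear signals also include head shadow.

```python
import math

def comb_gain(freq_hz, delay_s):
    """Magnitude of 1 + exp(-j*2*pi*f*tau), which equals 2*|cos(pi*f*tau)|."""
    return 2.0 * abs(math.cos(math.pi * freq_hz * delay_s))

def first_notch_hz(delay_s):
    """Lowest frequency at which the two copies cancel: f = 1 / (2*tau)."""
    return 1.0 / (2.0 * delay_s)

for delay in (0.2e-3, 0.1e-3, 0.05e-3):
    print(delay, round(first_notch_hz(delay)), "Hz")
```

Halving the delay doubles the frequency of the first notch, in line with the statement that more closely spaced loudspeakers move the coloration to higher frequencies.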
Coloration of amplitude-panned sources in anechoic listening
Characteristics
• Dip around 1-2 kHz
• At high frequencies a slightly lower level
• Effect depends on
- panning angle
- loudspeaker directions
- room effect
Can we compensate for this by equalizing or by other means?
Coloration of amplitude-panned sources in anechoic listening
Gain factor normalization: $g_{norm} = g / (\sum_i g_i^q)^{1/q}$, with q = 1 or q = 2
In anechoic listening only frequencies below about 800 Hz satisfy the q = 1 condition
At frequencies above 2 kHz it is not intuitively clear what happens.
Let's make q depend on frequency and on the listening-room response
Coloration of amplitude-panned sources in anechoic listening
A solution has been proposed in [1]
Gain factor normalization: $g_{norm} = g / (\sum_i g_i^q)^{1/q}$
A solution has been numerically obtained using auditory models and room measurements:
$q(f, DTT) = (q_0(f) - 2) \sqrt{DTT} + 2$
$q_0(f) = 1.5 - 0.5 \cos[4.7 \tanh(a_1 f)] \max(0, 1 - a_2 f)$
where $a_1 = 0.00045$ and $a_2 = 0.000085$
DTT is the direct-to-total energy ratio
[1] Laitinen, M. V., Vilkamo, J., Jussila, K., Politis, A., & Pulkki, V. (2014, August). Gain normalization in amplitude panning as a function of frequency and room reverberance. In Audio Engineering Society Conference: 55th International Conference: Spatial Audio. Audio Engineering Society.
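A short numerical sketch of the frequency-dependent exponent, reading the slide's formula as q(f, DTT) = (q0(f) − 2)·√DTT + 2 with q0(f) = 1.5 − 0.5·cos(4.7·tanh(a1·f))·max(0, 1 − a2·f). That reading is an interpretation of the extracted slide text and should be checked against [1] before use; the function names are made up for the example.

```python
import math

A1 = 0.00045
A2 = 0.000085

def q0(freq_hz):
    """Anechoic normalization exponent as a function of frequency."""
    return 1.5 - 0.5 * math.cos(4.7 * math.tanh(A1 * freq_hz)) * max(0.0, 1.0 - A2 * freq_hz)

def q(freq_hz, dtt):
    """Exponent for a room with direct-to-total energy ratio dtt in [0, 1]."""
    return (q0(freq_hz) - 2.0) * math.sqrt(dtt) + 2.0

for f in (100.0, 800.0, 2000.0, 8000.0):
    print(f, round(q(f, 1.0), 2), round(q(f, 0.5), 2), round(q(f, 0.0), 2))
```

Under this reading the exponent is close to 1 at low frequencies in anechoic conditions (DTT = 1) and equals 2 at all frequencies in a fully reverberant field (DTT = 0), matching the q = 1 and q = 2 cases of the earlier slides.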
Coloration of amplitude-panned sources in anechoic listening
The figures show q(f, DTT) values
Results simulated with a large number of listening conditions, with loudspeaker span from 30° to 80°
• Requires frequency-domain implementation of panning
• Mitigates coloration issues
• Readily implementable in time-frequency-domain processing, such as in DirAC
• Can be implemented with IIR filters (?)
Directional audio coding (DirAC)
Developed in Ville Pulkki’s research group since 2001
• Reproduce recorded spatial sound
• Synthesize spatial properties to sound (e.g., game sound engine)
• Time-frequency-domain parametric method
• Non-linear: processing depends on the signal and on the spatial properties of the sound field
How could a sound field be reproduced
Problems with existing techniques
Spatial sound reproduction
Target: relay the perception of sound!
Analogy with video
How does a video camera work?
• Lens
• Light from a distinct direction is projected to one position on the CCD
• The CCD encodes the light energy in three frequency channels (RGB)
• Visible light wavelengths 380 nm - 780 nm (about one octave)
• Very similar to the eye
Spatial sound reproduction
Could we do the same with sound as with a video camera?
• Create a narrow beam for each loudspeaker
• Audible sound includes wavelengths from about 17 mm to about 17 m
• Impossible to build a microphone having a constant narrow beam width without coloration and noise problems
• Higher-order Ambisonics / beam steering try to do it
Spatial sound reproduction
Holography then, perhaps?
• Lots of spaced microphones
• Lots of loudspeakers
• Wave field synthesis
• Problems
- High price
- Directivity of microphones should be matched with directivity of loudspeakers: is it possible?
Sound fields reproduced with WFS and HOA
• Monochromatic plane waves reproduced
• Valid sound field only in a limited listening area ("sweet spot")
• At high frequencies huge errors
Daniel, Jérôme, Sebastien Moreau, and Rozenn Nicol. "Further investigations of high-order ambisonics and wavefield synthesis for holophonic sound imaging." Audio Engineering Society Convention 114. Audio Engineering Society, 2003.
Parametric spatial sound reproduction
Are there any workarounds?
• Human spatial hearing can be fooled easily
• E.g., two coherent sources produce one virtual source in the middle
• Compare with vision: coherent sources do not produce virtual sources
• Assumption: at one frequency band humans perceive only one direction and one coherence cue
Parametric spatial sound reproduction
• Capture the sound
• Analyze spatial parameters
• Reproduce the sound in a way which recreates the spatial parameters
[Block diagram: microphones → time-frequency analysis (microphone signals in TF domain) → spatial analysis (spatial metadata in TF domain) → spatial synthesis → loudspeaker or headphone signals]
Assumptions in DirAC
Assumption 1: the listener is able to localize only one sound object at one time-frequency position
Assumption 2: good reproduction quality is obtained if we correctly reproduce the
• direction,
• diffuseness, and
• spectrum of sound
Example implementation
B-format microphones
B-format directional patterns
[Figure: directional patterns for orders N = 0, 1, 2 and 3]
N denotes the order of the patterns (and of the microphone)
0th-order (omni) microphones capture the pressure signal [W]
1st-order dipole microphones capture the particle velocity signals [X, Y, Z]
DirAC details, some of them
Time-frequency transform
• Filter banks
• STFT
• The system can be seen as a filter changing its weights fast in time; aliasing issues have to be taken into account
$p \propto W$ is the pressure signal, $u \propto [X \; Y \; Z]$ is the 3D velocity vector
$I_a = \Re[p^*(k, n)\, u(k, n)]$ (active intensity vector)
$e = \frac{\rho_0}{2} \|u\|^2 + \frac{|p|^2}{2 \rho_0 c^2}$ (energy density)
Direction of arrival: DOA = $-I_a$
Diffuseness $= 1 - \frac{\| E[I_a] \|}{c\, E[e]}$
Temporal integration of parameters
• Short constants for DOA and Diffuseness
• Longer for loudspeaker gains
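The analysis quantities can be checked on idealized plane waves. The sketch below assumes air density 1.2 kg/m³, a speed of sound of 340 m/s, and the plane-wave relation u = p / (ρ0 c) in the propagation direction; the expectation E[·] is taken as a plain average over frames. The function names are made up for the example.

```python
import math

RHO0 = 1.2     # kg/m^3, assumed density of air
C = 340.0      # m/s, speed of sound

def plane_wave(pressure, propagation_dir):
    """(p, u) of a plane wave: u = p / (rho0 * c) along the propagation direction."""
    return pressure, tuple(pressure / (RHO0 * C) * d for d in propagation_dir)

def dirac_parameters(frames):
    """DOA unit vector (or None) and diffuseness for one frequency band, from a
    list of (p, u) frames with complex p and a 3-component complex u."""
    n = len(frames)
    mean_intensity = [0.0, 0.0, 0.0]
    mean_energy = 0.0
    for p, u in frames:
        for k in range(3):
            mean_intensity[k] += (p.conjugate() * u[k]).real / n
        u_sq = sum(abs(c) ** 2 for c in u)
        mean_energy += (RHO0 / 2.0 * u_sq + abs(p) ** 2 / (2.0 * RHO0 * C ** 2)) / n
    length = math.sqrt(sum(v * v for v in mean_intensity))
    diffuseness = 1.0 - length / (C * mean_energy)
    doa = [-v / length for v in mean_intensity] if length > 0.0 else None
    return doa, diffuseness

single = [plane_wave(1.0 + 0.0j, (1.0, 0.0, 0.0))]          # travelling along +x
opposing = [plane_wave(1.0 + 0.0j, (1.0, 0.0, 0.0)),
            plane_wave(1.0 + 0.0j, (-1.0, 0.0, 0.0))]       # alternating directions
print(dirac_parameters(single))
print(dirac_parameters(opposing))
```

A single plane wave gives a diffuseness of practically zero and a DOA pointing back towards the source; frames whose intensity vectors cancel on average give a diffuseness of one.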
Matlab code for directional analysis
% diranalysis.m
% Author: V. Pulkki
% Example of directional analysis of simulated B-format recording
Fs=44100; % Generate signals
sig1=2*(mod([1:Fs]',40)/80-0.5) .* min(1,max(0,(mod([1:Fs]',Fs/5)-Fs/10)));
sig2=2*(mod([1:Fs]',32)/72-0.5) .* min(1,max(0,(mod([[1:Fs]+Fs/6]',Fs/3)-Fs/6)));
% Simulate two sources in directions of 50 and -170 degrees
w=(sig1+sig2)/sqrt(2);
x=sig1*cos(50/180*pi)+sig2*cos(-170/180*pi);
y=sig1*sin(50/180*pi)+sig2*sin(-170/180*pi);
% Add fading-in diffuse noise with 36 sources evenly in the horizontal plane
for dir=0:10:350
    noise=(rand(Fs,1)-0.5).*(10.^((([1:Fs]'/Fs)-1)*2));
    w=w+noise/sqrt(2);
    x=x+noise*cos(dir/180*pi);
    y=y+noise*sin(dir/180*pi);
end
hopsize=256; % Do directional analysis with STFT
winsize=512; i=2; alpha=1./(0.02*Fs/winsize);
Intens=zeros(hopsize,2)+eps; Energy=zeros(hopsize,2)+eps;
Pulkki, Ville, Tapio Lokki, and Davide Rocchesso. "Spatial effects." DAFX: Digital Audio Effects, Second Edition (2011): 139-183.
Matlab code for directional analysis (continued)
for time=1:hopsize:(length(x)-winsize)
    % moving to frequency domain
    W=fft(w(time:(time+winsize-1)).*hanning(winsize));
    X=fft(x(time:(time+winsize-1)).*hanning(winsize));
    Y=fft(y(time:(time+winsize-1)).*hanning(winsize));
    W=W(1:hopsize); X=X(1:hopsize); Y=Y(1:hopsize);
    % Intensity computation
    tempInt = real(conj(W) * [1 1] .* [X Y])/sqrt(2); % Instantaneous
    Intens = tempInt * alpha + Intens * (1 - alpha); % Smoothed
    % Compute direction from intensity vector
    Azimuth(:,i) = round(atan2(Intens(:,2), Intens(:,1))*(180/pi));
    % Energy computation
    tempEn=0.5 * (sum(abs([X Y]).^2, 2) * 0.5 + abs(W).^2 + eps); % Inst
    Energy(:,i) = tempEn*alpha + Energy(:,(i-1)) * (1-alpha); % Smoothed
    % Diffuseness computation
    Diffuseness(:,i) = 1 - sqrt(sum(Intens.^2,2)) ./ (Energy(:,i));
    i = i+1;
end
% Plot variables
figure(1); imagesc(log(Energy)); title('Energy');
set(gca,'YDir','normal'); xlabel('Time frame'); ylabel('Freq bin');
figure(2); imagesc(Azimuth); colorbar;
set(gca,'YDir','normal'); title('Azimuth'); xlabel('Time frame'); ylabel('Freq bin');
figure(3); imagesc(Diffuseness); colorbar;
set(gca,'YDir','normal'); title('Diffuseness'); xlabel('Time frame'); ylabel('Freq bin');
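As a rough cross-check of the analysis equations in the Matlab listing, the following Python sketch applies the same intensity, energy and diffuseness expressions to a single time-frequency bin. It leaves out the STFT and the temporal smoothing, so it is a simplification and not a port of the listing. The function names `analyse_bin` and `encode_plane_wave` are illustrative assumptions; the B-format convention (W scaled by 1/sqrt(2)) is taken from the simulated signals in the listing.

```python
import math
import sys


def encode_plane_wave(s, azimuth_deg):
    """Encode amplitude s arriving from azimuth_deg into horizontal B-format (w, x, y)."""
    a = math.radians(azimuth_deg)
    return s / math.sqrt(2), s * math.cos(a), s * math.sin(a)


def analyse_bin(w, x, y):
    """Return (azimuth_deg, diffuseness) for one time-frequency bin.

    Uses the same expressions as tempInt and tempEn in the listing,
    but without temporal smoothing (an assumption of this sketch).
    """
    # Intensity vector: real part of conj(W) times [X, Y], divided by sqrt(2)
    ix = (w.conjugate() * x).real / math.sqrt(2)
    iy = (w.conjugate() * y).real / math.sqrt(2)
    # Energy estimate; the small epsilon avoids division by zero for silent bins
    eps = sys.float_info.epsilon
    energy = 0.5 * (0.5 * (abs(x) ** 2 + abs(y) ** 2) + abs(w) ** 2 + eps)
    azimuth_deg = math.degrees(math.atan2(iy, ix))
    diffuseness = 1.0 - math.hypot(ix, iy) / energy
    return azimuth_deg, diffuseness


if __name__ == "__main__":
    # Single plane wave from 50 degrees
    print(analyse_bin(*encode_plane_wave(1 + 0j, 50.0)))
    # Two identical waves from opposite directions (0 and 180 degrees)
    front = encode_plane_wave(1 + 0j, 0.0)
    back = encode_plane_wave(1 + 0j, 180.0)
    print(analyse_bin(*(f + b for f, b in zip(front, back))))
```

For a single plane wave the sketch should report an azimuth close to the encoded direction and a diffuseness close to 0. For two identical waves from opposite directions the intensity cancels while the energy does not, so the diffuseness approaches 1.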
"HQ" implementation
Excessive coherence between the virtual microphone channels is remedied by
• diffuse stream: loudspeaker-specific frequency-dependent delay (decorrelation)
• non-diffuse stream: panning factors used as gates
Applications of DirAC
• Teleconferencing [1]
• Realistic reproduction of spatial sound environments [2]
  – especially for head-mounted displays (VR) [3]
• Virtual reality (game) audio engines [4]
• Spatial audio effects [5]
[1] Ahonen, Jukka. "Microphone front-ends for spatial sound analysis and synthesis with Directional Audio Coding." PhD thesis. Aalto University (2013).
[2] V. Pulkki, M.-V. Laitinen, J. Vilkamo, and J. Ahonen. "First-order directional audio coding (DirAC)." Parametric time-frequency-domain spatial audio. Wiley (2017), in press, ask for a copy.
[3] Laitinen, M. V., and Pulkki, V. (2009, October). Binaural reproduction for directional audio coding. In Applications of Signal Processing to Audio and Acoustics, 2009. WASPAA'09. IEEE Workshop on (pp. 337-340). IEEE.
[4] Laitinen, M. V., Pihlajamäki, T., Erkut, C., and Pulkki, V. (2012). Parametric time-frequency representation of spatial sound in virtual worlds. ACM Transactions on Applied Perception (TAP), 9(2), 8.
[5] Politis, A., Pihlajamäki, T., and Pulkki, V. (2012). Parametric spatial audio effects. York, UK, September.
Capturing the reality
• Omnidirectional camera
  – at least 6 lenses
  – stitched to spherical video
• 3D microphone
  – generic representation of spatial audio
  – can be reproduced over DirAC
Excerpt shown on the slide: Gomez Bolanos and Pulkki, "Immersive Audiovisual Environment with 3D audio", AES 132nd Convention, Budapest, Hungary, 2012 April 26–29, page 7 of 9.
Fig. 7: Omnidirectional camera (Ladybug 3) setup with the A-format microphone (SPS200).
Fig. 8: Video cropping utility (MAX patch). Cropping our group meeting recording.
... and the effect of widening the sound source is also perceived correctly.
In general, the system has a good match between the spatial distribution of the auditory events and the spatial distribution of the visual events, which together with the wide field of view, improves the sensation of immersion and, in consequence, realism in the scene.
5. CONCLUSIONS
An implementation of an immersive audiovisual environment was presented. The system is based on the use of acoustically transparent screens which reduces the necessity of complex filtering for correction of the loudspeaker responses. This environment consists of 29 active loudspeakers and three high definition video projectors controlled by a computer. The system also includes a tracking system for interactive purposes. The loudspeakers are disposed around the listener in a spherical disposition. The visual display spans 226° in the horizontal plane and 57° in the vertical plane. The system is easy to assemble and disassemble, and allows modifications in the configuration of the loudspeakers and the projectors in order to perform other tasks. With this flexibility, the system can be used for researching into other fields as crossmodal interaction and psychoacoustics, auralization and gaming. An audiovisual capturing system consisting of an omnidirectional camera and an A-format microphone is utilized for acquiring audiovisual material for the system. Several recordings were done using the capturing system. It has been found, from informal listening tests of the recorded material, that the system presents a good match between visual and audio events, providing good spatial audio quality. The non-anechoic characteristics of the room do not seem to affect the spatial audio quality of the reproduction system when DirAC is utilized.
6. ACKNOWLEDGEMENTS
The research leading to these results has received funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement no. [240453].
Head-mounted audiovisual displays
• Reproduction
  – Head-mounted visual display + headphones
  – Both video and spatial audio are updated with head tracking information
  – Generic representation of audio in DirAC is well-suited for this
• Demonstration by Aalto → Fraunhofer IIS demonstration
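The remark that the DirAC representation suits head-tracked reproduction can be illustrated with a small sketch. Because DirAC carries direction as an explicit parameter per time-frequency tile, head tracking can in principle be applied by offsetting that parameter before rendering. The code below shows only this yaw compensation step. The function names and the counter-clockwise angle convention are assumptions made for illustration; a real renderer would also have to handle pitch, roll and the diffuse stream.

```python
def wrap_degrees(angle_deg):
    """Wrap an angle to the interval [-180, 180) degrees."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def head_relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Azimuth of an analysed direction relative to the listener's nose.

    Both angles are measured counter-clockwise in the horizontal plane
    (assumed convention); only head yaw is compensated.
    """
    return wrap_degrees(source_azimuth_deg - head_yaw_deg)


if __name__ == "__main__":
    # Analysed directions from the simulated example: 50 and -170 degrees,
    # with the head turned 30 degrees to the left (hypothetical tracker value)
    for azimuth in (50.0, -170.0):
        print(azimuth, head_relative_azimuth(azimuth, 30.0))
```

With the head turned 30 degrees, the source analysed at 50 degrees is rendered at 20 degrees relative to the head, and the source at -170 degrees wraps around to 160 degrees.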
Head-mounted audio-visual displays (VR displays)
• Virtual content (computer-generated world)
• Recorded content (surrounding camera + 3D sound)
• Gives the subject a very strong feeling of being somewhere else
• Ability to produce both externalized and internalized sound scenes
DirAC as virtual reality audio engine
[Figure: block diagram of DirAC as a virtual reality audio engine. Block labels: sound synth 1 … sound synth N; propagation simulation; directional parameters; DirAC-monosynth; mono DirAC stream (single audio channel + directional parameters); mono reverb and B-format reverb with their parameters; DirAC to B-format; mux; B-format stream; B-format DirAC encoding/decoding; loudspeaker or headphone signals.]
• Control spatial extent of virtual sources
• With headphones: creation of external and internal sources
• Loudspeaker-setup-independent reverberator
• Efficient transmission of spatial sound
Development
• Different versions of DirAC available
• 1st-order B-format input
  – some artifacts in acoustically challenging situations: applause, broad-band sources in opposite directions, very strong early reflections
  – decorrelation process causes artifacts
  – covariance-domain rendering minimizes the level of decorrelated sound [1]
  – different assumptions about the parameters yield different approaches [2], which have somewhat different problems
• With a higher number of microphones, higher quality is obtained and the parametric processing needs to be less aggressive
• A number of different techniques that use a parametric approach in spatial audio have been proposed, see the overview in [3]
[1] Vilkamo, Juha, and Ville Pulkki. "Minimization of decorrelator artifacts in directional audio coding by covariance domain rendering." Journal of the Audio Engineering Society 61.9 (2013): 637-646.
[2] Barrett, Natasha, and Svein Berge. "A new method for B-format to binaural transcoding." Audio Engineering Society Conference: 40th International Conference: Spatial Audio: Sense the Sound of Space. Audio Engineering Society, 2010.
[3] A. Politis, S. Delikaris-Manias and V. Pulkki. "Overview to time-frequency-domain parametric spatial audio techniques." Parametric time-frequency-domain spatial audio. Wiley (2017), in press, ask for a copy.
Higher-order microphones with TF-domain parametric processing
[Figure: higher-order DirAC processing. Labels: sound field; SF divided virtually into sectors; DirAC 1, DirAC 2, DirAC 3; 2nd-order B-format; 1st-order B-format; energetic analysis; covariance domain rendering.]
• Higher-order DirAC (Politis & Pulkki)
  – divide sound field into virtual sound fields
Dissemination of DirAC
• DirAC was first published as SIRR [1] for impulse responses, and later for continuous sound [2]
• First TF-domain parametric audio method where the parameters are measured using a microphone setup
• 346 citations of [2] in 10 years, ranking ninth among all JAES articles
• Corresponding patents transferred to Fraunhofer IIS
• Commercialization
[1] Merimaa, Juha, and Ville Pulkki. "Spatial impulse response rendering I: Analysis and synthesis." Journal of the Audio Engineering Society 53.12 (2005): 1115-1127.
[2] Pulkki, Ville. "Spatial sound reproduction with directional audio coding." Journal of the Audio Engineering Society 55.6 (2007): 503-516.