advances in the parallelization of music and audio applications
DESCRIPTION
Advances in the Parallelization of Music and Audio Applications. Eric Battenberg, David Wessel & Juan Colmenares. Overview. Parallelism today in the popular interactive music languages Parallel Partitioned Convolution - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/1.jpg)
+ Advances in the Parallelization
of Music and Audio Applications
Eric Battenberg, David Wessel & Juan Colmenares
![Page 2: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/2.jpg)
+ Overview
Parallelism today in the popular interactive music languages
Parallel Partitioned Convolution Accelerating Non-Negative Matrix Factorization (NMF)
for use in audio source separation and music information retrieval and the importance of Selective, Embedded Just In Time Specialization (SEJITS)
Real-time in the Tessellation OS A plea for more flexible I/0 with GPUs
![Page 3: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/3.jpg)
+Current Support for Parallelism is Copy-Based The widely used languages for music and audio
applications are fundamentally sequential in character – this includes Max/MSP, PD, SuperCollider, and CHUCK among others.
Limited multithreading One approach to exploiting multi-core processors is to run
copies of the applications on separate cores. Max/MSP provides a useful multi-threading mechanism
called poly~ . PD provides PD~ each instance of which runs in a separate
thread inside a PD patch.
![Page 4: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/4.jpg)
+Partitioned Convolution
First real-time app in the Par Lab.
Partitioned Convolution – an efficient way to do low-latency filtering with a long (> 1 sec) impulse response.
Important in real-time reverb processing for environment simulation.
Sound examples:
Acoustic Guitar …in a giant mausoleum …convolved with a sine sweep
Impulse response Impulse response
![Page 5: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/5.jpg)
+Partitioned Convolution Convolution: a way to do linear filtering with a finite impulse
response (FIR) filter.
Direct convolution: For length L filter, O(L) ops per output point, zero delay. L can be greater than 100,000 samples (> 3 sec of audio)
Block FFT Convolution: Only O(log(L)) ops per output point, but delay of L.
How can we trade off between complexity and latency?
€
y[n] = h[k]x[n − k]k
∑
FFT Complex Mult IFFTx y
H H = FFT(h)
![Page 6: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/6.jpg)
+Uniform Partitioned Convolution We would like the latency to be less than 10ms (512
samples) Cut an impulse response up into equal-sized blocks.
Then we can use a parallellayout of Block FFT convolverswith delays to implement the filter.
The latency is now N, and we still get complexity savings.
L N
1 43 52
delay(N)
delay(N)
delay(N)
delay(N)
1
2
3
4
5
+
x
y
Block FFT Convolver
![Page 7: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/7.jpg)
+Frequency Delay Line Convolution We can also exploit linearity of the FFT so that only one
FFT/IFFT is required.
So the parallel Block FFT Convolver above becomes a Frequency Delay Line (FDL) Convolver:
delay(N)
delay(N)
1
2
3+
x
y
FFT Complex Mult IFFTH1
Block FFT Convolver
delay(N)
delay(N) +
x
y
Complex Mult
H1
Complex Mult
H2
Complex Mult
H3
FFT
IFFT
Frequency Delay Line Convolver
![Page 8: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/8.jpg)
+Multiple FDL Convolution If L is big (e.g. > 100,000) and N is small (e.g. < 1000),
our FDL will have 100’s of partitions to handle.
We can connect multiple FDL’s in parallel to get the best of both worlds.
xdelay(Nx6
)
delay(4Nx4)
FDL 1
FDL 2
FDL 3
+ y
x FDL y
![Page 9: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/9.jpg)
+Scheduling Multiple FDLs FDLs are run in separate threads.
Each is allowed to compute for a length of time corresponding to its block size.
Synchronization is performed at the vertical lines.
![Page 10: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/10.jpg)
+Auto-Tuning for Real-Time We are not trying to only maximize throughput. We are trying to improve our ability to make real-time
guarantees. For now, we estimate a Worst-Case Execution Time
(WCET) for each size of FDL. Then we combine the FDLs that are most likely to meet
their scheduling deadlines.
In the future, we will use a notion of predictability along with more robust scheduling.
We are finishing development on a Max/MSP object, Audio Unit plugin, and a portable standalone version of this.
![Page 11: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/11.jpg)
+ Accelerating Non-Negative Matrix Factorization (NMF)
NMF is widely used in audio source separation. The idea is to factor the time/frequency representation (spectogram) into source coupled spectral (W) and gain (H) matricies.
![Page 12: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/12.jpg)
+ The Importance of SEJITSin Developing an Information Retrieval (MIR) Application
Rather using a domain restricted language developers write in a full blown scripting language such as PYTHON or RUBY.
Functions are selected by annotation as performance critical. If efficiency layer implementations of these functions are
available appropriate code is generated and JIT compiled. If not the selected function is executed in the scripting
language itself. The scripted implementation remains as the portable
reference implementation.
![Page 13: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/13.jpg)
With this simple music computer application we expect to initially show that Tessellation can provide acceptable performance and time predictability
In cooperation with the OS Group
2nd-level RT scheduler A
Cell A
2nd-level RT scheduler B
Cell B Initial Cell
Sound card
Shell
F
Output
Input
Music Program
End-to-end Deadline
Intermediate Deadline
Audio Processing & Synthesis Engine
Channel
F
Most of the engine’s
functionality
FilterParallel version of a partition-based convolution algorithm
Audio Input
Additional Cells
A real-time application in Tessellation
![Page 14: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/14.jpg)
2nd-level Scheduling
Cell
Tessellation Kernel(Partition Support)
(*) Bottom part of the diagram was adapted from Liu and Asanovic, “Mitosys: ParLab Manycore OS Architecture,” Jan. 2008.
1.A) Cell and Space PartitioningA Spatial Partition (or Cell) comprises a group of processors acting within a hardware boundary
Each cell receives a vector of basic resources– Some number of processors, a portion of
physical memory, a portion of shared cache memory, and potentially a fraction of memory bandwidth
A cell may also receive – Exclusive access to other resources
(e.g., certain hardware devices and raw storage partition)
– Guaranteed fractional services (i.e., QoS guarantees) from other partitions (e.g., network service and file service)
CPUL1
L2Bank
DRAM
DRAM & I/O Interconnect
L1 Interconnect
CPUL1
L2Bank
DRAM
CPUL1
L2Bank
DRAM
CPUL1
L2Bank
DRAM
CPUL1
L2Bank
DRAM
CPUL1
L2Bank
DRAM
(+) Fraction of memory bandwidth
![Page 15: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/15.jpg)
+
Time-sensitive Network
Subsystem
Network Service
(Net Partition)
Input device(Pinned/TT Partition)
Graphical Interface
(GUI Partition)
Audio-processing / Synthesis Engine(Pinned/TT partition)
Output device(Pinned/TT Partition)
GUI Subsystem
Communication with other audio-processing nodes
Music program
Preliminary
Example of Music Application
![Page 16: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/16.jpg)
+
A plea for more flexible GPU I/O
![Page 17: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/17.jpg)
+
Thanks for your attention.
![Page 18: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/18.jpg)
+ Reserve Slides
![Page 19: Advances in the Parallelization of Music and Audio Applications](https://reader036.vdocuments.site/reader036/viewer/2022081517/56816737550346895ddbe782/html5/thumbnails/19.jpg)
Tessellation OS
+Tessellation: 19
November 12th, 2009
Tessellation in Server Environment
DiskI/O
Drivers
OtherDevices
NetworkQoS
MonitorAnd
Adapt
Persistent Storage &Parallel File System
Large Compute-BoundApplication
Large I/O-BoundApplication
DiskI/O
Drivers
OtherDevices
NetworkQoS
MonitorAnd
Adapt
Persistent Storage &Parallel File System
Large Compute-BoundApplication
Large I/O-BoundApplication
DiskI/O
Drivers
OtherDevices
NetworkQoS
MonitorAnd
Adapt
Persistent Storage &Parallel File System
Large Compute-BoundApplication
Large I/O-BoundApplication
DiskI/O
Drivers
OtherDevices
NetworkQoS
MonitorAnd
Adapt
Persistent Storage &Parallel File System
Large Compute-BoundApplication
Large I/O-BoundApplication
QoS
Guarantees
Cloud StorageBW QoS
QoS
Guarantees
QoSGuarantees
QoS
Guarantees