dave murray: developing fast dsp libraries for advanced processors dsp libraries need to be...

4
Dave Murray: Developing Fast DSP Libraries for Advanced Processors DSP libraries need to be efficient Efficiency is expensive to achieve Liberator is our tool for minimising development expense while maximising efficiency. It targets processors with a SIMD capability Everything that can be automated is done by Liberator We do the rest by hand

Upload: kerry-carroll

Post on 04-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dave Murray: Developing Fast DSP Libraries for Advanced Processors DSP libraries need to be efficient Efficiency is expensive to achieve Liberator is our

Dave Murray: Developing Fast DSP Libraries for Advanced Processors

DSP libraries need to be efficient Efficiency is expensive to achieve Liberator is our tool for minimising

development expense while maximising efficiency. It targets processors with a SIMD capability

Everything that can be automated is done by Liberator

We do the rest by hand

Page 2: Dave Murray: Developing Fast DSP Libraries for Advanced Processors DSP libraries need to be efficient Efficiency is expensive to achieve Liberator is our

Liberator Factorises libraries into four levels:

API: how the routines will be called. Current APIs: Full VSIPL (including double precision); CSIPL (a “plain C” lookalike); FFTW; several proprietary APIs.

Algorithm description: including multi-algorithms for efficiency in varying situations

Code Generation: Packages data into vectors; handles edge effects; performs multithreading; blocks the data; unrolls loops; prefetches when that's helpful; manages cache; handles edge effects (data size not a multiple of SIMD length); handles unaligned and strided data.

Processor-dependent back end:Needs to handle only MIMD-sized vectors. We have back ends for PPC/G4; Intel/SSE; MIPS64

Page 3: Dave Murray: Developing Fast DSP Libraries for Advanced Processors DSP libraries need to be efficient Efficiency is expensive to achieve Liberator is our

Example Performance Figures: multiple FFTs (N by N)

Single precision multiple FFT perfomance

0

2000

4000

6000

8000

10000

12000

14000

256 x 256 1K x 100 4K x 50 16K x 20 64K x 20 128K x 20

Matrix size

MF

LO

PS

8641D

IA T7400

IA SL9400

IA SL9400 -two threads

Page 4: Dave Murray: Developing Fast DSP Libraries for Advanced Processors DSP libraries need to be efficient Efficiency is expensive to achieve Liberator is our

For more details:

See our poster Chat to me (Dave Murray)

See us at www.nasoftware.co.uk