SPL: A Language and Compiler for DSP Algorithms
Jianxin Xiong¹, Jeremy Johnson², Robert Johnson³, David Padua¹
¹ Computer Science, University of Illinois at Urbana-Champaign
² Mathematics and Computer Science, Drexel University
³ MathStar Inc.
http://polaris.cs.uiuc.edu/~jxiong/spl
Supported by DARPA
Overview
• SPL: a domain-specific language
    • DSP core algorithms
    • Matrix factorization
• SPL compiler: SPL ⇒ Fortran/C programs
    • Efficient implementation
• Part of SPIRAL (www.ece.cmu.edu/~spiral):
    • Adaptive framework for optimizing DSP libraries
    • Search over different SPL formulas using the SPL compiler
Outline
• Motivation
• Mathematical formulation of DSP algorithms
• SPL Language
• SPL Compiler
• Performance Evaluation
• Conclusion
Motivation
• What affects performance?
    • Architecture features: pipeline, FU, cache, …
    • Compiler:
        • Ability to take advantage of architecture features
        • Ability to handle large / complicated programs
• An ideal compiler would perform perfect optimization based on the architecture
• Practical compilers have limitations
Motivation (continued)
• Manual performance tuning
    • Modify the source based on profiling information
    • Requires knowledge about the architecture features
    • Requires considerable work
    • The performance is not portable
• Automatic performance tuning?
    • Very difficult for general programs
    • DSP core algorithms: SPIRAL
SPIRAL Framework
[Figure: block diagram of the SPIRAL framework. Blocks: Formula Generator, SPL Compiler, Performance Evaluation, Search Engine. Labels on the connections: DSP Transform, Architecture, SPL Formulae, C/FORTRAN Programs, DSP Libraries.]
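Fast DSP Algorithms as Matrix Factorizations
• A DSP transform: $y = Mx \Rightarrow y = M_1 M_2 \cdots M_k\, x$
• Example: n-point DFT, $y = F_n x$

$$F_4 = (F_2 \otimes I_2)\, T^4_2\, (I_2 \otimes F_2)\, L^4_2$$

$$
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & i \end{bmatrix}
\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$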
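Tensor Product
• A linear algebra operation for representing repetitive matrix structures

$$
A_{m \times n} \otimes B_{m' \times n'} =
\begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}_{mm' \times nn'}
$$

$$
I \otimes B = \begin{bmatrix} B & & \\ & \ddots & \\ & & B \end{bmatrix}
$$

• $I \otimes B$ corresponds to a loop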
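Tensor Product (continued)

$$
A \otimes I =
\begin{bmatrix} a_{11} I & \cdots & a_{1n} I \\ \vdots & \ddots & \vdots \\ a_{m1} I & \cdots & a_{mn} I \end{bmatrix},
\qquad
a_{ij} I = \begin{bmatrix} a_{ij} & & \\ & \ddots & \\ & & a_{ij} \end{bmatrix}
$$

• $A \otimes I$ corresponds to vector operations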
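The loop and vector readings of the tensor product can be sketched in plain Python. This is only an illustration under stated assumptions: matrices are lists of rows, the identity size `n` is passed explicitly, and the helper names (`matvec`, `kron`, `apply_I_tensor_B`, `apply_A_tensor_I`) are hypothetical and not part of SPL.

```python
def matvec(M, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def kron(A, B):
    """Tensor (Kronecker) product of two list-of-rows matrices."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def apply_I_tensor_B(n, B, x):
    """y = (I_n tensor B) x, written as a loop over n contiguous blocks of x."""
    cols = len(B[0])
    y = []
    for k in range(n):
        y.extend(matvec(B, x[k * cols:(k + 1) * cols]))
    return y

def apply_A_tensor_I(A, n, x):
    """y = (A tensor I_n) x, written as operations on length-n vectors."""
    blocks = [x[j * n:(j + 1) * n] for j in range(len(A[0]))]
    y = []
    for row in A:
        acc = [0] * n
        for a, blk in zip(row, blocks):
            acc = [s + a * v for s, v in zip(acc, blk)]  # scaled vector add
        y.extend(acc)
    return y

F2 = [[1, 1], [1, -1]]
x = [1, 2, 3, 4]
print(apply_I_tensor_B(2, F2, x))   # same result as the dense product below
print(matvec(kron(identity(2), F2), x))
print(apply_A_tensor_I(F2, 2, x))
print(matvec(kron(F2, identity(2)), x))
```

Neither structured routine ever builds the large matrix; the dense `kron` product is there only to check the results.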
Rules for Recursive Factorization
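• Cooley-Tukey factorization for DFT:

$$F_{rs} = (F_r \otimes I_s)\, T^{rs}_s\, (I_r \otimes F_s)\, L^{rs}_r$$

• General k-way factorization for DFT:

$$
F_n = \left[ \prod_{i=1}^{k} (I_{n_{i-}} \otimes F_{n_i} \otimes I_{n_{i+}})\,(I_{n_{i-}} \otimes T^{n_i n_{i+}}_{n_{i+}}) \right] \cdot \prod_{i=k}^{1} (I_{n_{i-}} \otimes L^{n_i n_{i+}}_{n_i})
$$

where $n = n_1 \cdots n_k$, $n_{i-} = n_1 \cdots n_{i-1}$, $n_{i+} = n_{i+1} \cdots n_k$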
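The Cooley-Tukey rule can be checked numerically by applying its four factors from right to left and comparing with the DFT definition. The sketch below is not the SPL compiler's code. It assumes W(n, k) = exp(2πik/n), that the stride permutation L gathers the input at stride r, and that the twiddle matrix T has entry W(rs, i·k) at position i·s + k. Those indexing conventions are assumptions chosen to agree with the F_4 example (T = diag(1, 1, 1, i), L swapping the two middle elements), and the function names are hypothetical.

```python
import cmath

def omega(n, k):
    """W(n, k) = exp(2*pi*i*k/n); assumed sign convention, so W(4, 1) = i."""
    return cmath.exp(2j * cmath.pi * k / n)

def dft(x):
    """Direct n-point DFT, y = F_n x, straight from the definition."""
    n = len(x)
    return [sum(omega(n, i * j) * x[j] for j in range(n)) for i in range(n)]

def cooley_tukey(x, r, s):
    """y = (F_r tensor I_s) T (I_r tensor F_s) L x, applied right to left."""
    n = r * s
    assert len(x) == n
    # L: stride permutation, gathers x at stride r into r blocks of length s.
    t = [x[j * r + i] for i in range(r) for j in range(s)]
    # I_r tensor F_s: an s-point DFT on each contiguous block.
    u = []
    for i in range(r):
        u.extend(dft(t[i * s:(i + 1) * s]))
    # T: diagonal twiddle factors, entry (i*s + k) is W(rs, i*k).
    v = [omega(n, i * k) * u[i * s + k] for i in range(r) for k in range(s)]
    # F_r tensor I_s: an r-point DFT across the blocks, at stride s.
    y = [0j] * n
    for k in range(s):
        col = dft([v[i * s + k] for i in range(r)])
        for m in range(r):
            y[m * s + k] = col[m]
    return y

def max_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

x = [complex(k + 1, -k) for k in range(8)]
print(max_error(cooley_tukey(x, 2, 4), dft(x)))  # rounding-level difference
print(max_error(cooley_tukey(x, 4, 2), dft(x)))
```

Both splits of n = 8 reproduce the direct DFT up to floating-point rounding.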
Formulas
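Variations of DFT(8):

$$F_8 = (F_2 \otimes I_4)\, T^8_4\, (I_2 \otimes F_4)\, L^8_2$$

$$F_8 = (F_2 \otimes I_4)\, T^8_4\, \bigl(I_2 \otimes ((F_2 \otimes I_2)\, T^4_2\, (I_2 \otimes F_2)\, L^4_2)\bigr)\, L^8_2$$

$$F_8 = (F_2 \otimes I_4)\, T^8_4\, (I_2 \otimes F_2 \otimes I_2)\,(I_2 \otimes T^4_2)\,(I_4 \otimes F_2)\, R_8$$

In the last form the permutations are collected into $R_8 = (I_2 \otimes L^4_2)\, L^8_2$.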
The SPL Language
• Domain-specific programming language for describing matrix factorizations

$$F_4 = (F_2 \otimes I_2)\, T^4_2\, (I_2 \otimes F_2)\, L^4_2$$

    (compose (tensor (F 2) (I 2)) (T 4 2) (tensor (I 2) (F 2)) (L 4 2))

• compose, tensor: matrix operations
• F, I, T, L: primitives (parameterized special matrices)
SPL in a Nutshell
• SPL expressions
    • General matrices
        (matrix (a11 … a1n) … (am1 … amn))
        (diagonal (a11 … ann))
        (sparse (i1 j1 a1) … (ik jk ak))
    • Parameterized special matrices
        (I n), (L mn n), (T mn n), (F n)
    • Matrix operations
        (compose A1 … Ak)
        (tensor A1 … Ak)
        (direct_sum A1 … Ak), where A⊕B = diag(A,B)
• Others: definitions, directives, template, comments
A Simple SPL Program
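    ; This is a simple SPL program
    (define A (matrix (1 2) (2 1)))
    (define B (diagonal (3 3)))
    #subname simple
    (tensor (I 2) (compose A B))
    ;; This is an invisible comment

• Definition: the two define lines
• Directive: #subname simple
• Formula: (tensor (I 2) (compose A B))
• Comment: the lines starting with ; or ;;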
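What the formula in this program denotes can be checked by evaluating it to a dense matrix. The Python sketch below is only an illustration. It assumes nested tuples as a stand-in for SPL syntax, and the `evaluate` function is hypothetical. The SPL compiler itself generates Fortran/C code rather than building matrices.

```python
def matmul(A, B):
    """Dense product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    """Tensor (Kronecker) product of two list-of-rows matrices."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def evaluate(expr, env):
    """Turn a nested-tuple SPL-like formula into a dense matrix."""
    if isinstance(expr, str):          # a name introduced by define
        return env[expr]
    op, *args = expr
    if op == "matrix":
        return [list(row) for row in args]
    if op == "diagonal":
        d = args[0]
        return [[d[i] if i == j else 0 for j in range(len(d))]
                for i in range(len(d))]
    if op == "I":
        n = args[0]
        return [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    mats = [evaluate(a, env) for a in args]
    if op in ("compose", "tensor"):
        combine = matmul if op == "compose" else kron
        result = mats[0]
        for m in mats[1:]:
            result = combine(result, m)
        return result
    raise ValueError(f"unknown operator: {op}")

# The two definitions, then the formula from the program.
env = {
    "A": evaluate(("matrix", (1, 2), (2, 1)), {}),
    "B": evaluate(("diagonal", (3, 3)), {}),
}
M = evaluate(("tensor", ("I", 2), ("compose", "A", "B")), env)
for row in M:
    print(row)
```

The result is a 4×4 block-diagonal matrix with two copies of the product AB on the diagonal.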
The SPL Compiler
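[Figure: SPL compiler pipeline. Inputs: SPL formula, template definition, symbol definition. Stages: Parsing, Intermediate Code Generation, Intermediate Code Restructuring, Optimization, Target Code Generation. Parsing produces the abstract syntax tree, the symbol table, and the template table; I-code is passed between the later stages; the output is FORTRAN or C.]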
Template Based Intermediate Code Generation
• Why use templates?
    • User-defined semantics
    • Language extension
    • Compiler extension without modifying the compiler
    • Can be integrated into the search space
• Structure of a template
    • Pattern, condition, code
• Template matching
    • Generate I-code from the matching template
    • Template matching is a recursive procedure
I-Code
• I-code is the intermediate code of the SPL compiler
• Internally, I-code consists of four-tuples <op, src1, src2, dest>
• The external representation of I-code
    • Fortran-like
    • Used in templates
Template
    (template (F n) [ n >= 1 ]
      ( do i=0,n-1
          y(i)=0
          do j=0,n-1
            y(i)=y(i)+W(n,i*j)*x(j)
          end
        end ))

• Pattern: (F n)
• Condition: [ n >= 1 ]
• I-code: the do-loop nest
Code Generation and Template Matching
(F 2) matches the pattern (F n) and assigns 2 to n. Because n=2 satisfies the condition n>=1, the following I-code is generated from the template:

    do i = 0,1
      y(i) = 0
      do j = 0,1
        y(i) = y(i)+W(2,i*j)*x(j)
      end
    end

Unrolling and optimization then reduce this to:

    y(0)=x(0)+x(1)
    y(1)=x(0)-x(1)
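The step from the loop nest to the two straight-line statements can be imitated in a few lines of Python. This is a sketch, not the compiler's algorithm. It assumes full unrolling and only handles twiddle factors W(n, k) that are exactly +1 or -1, which is all that (F 2) needs. The names `W` and `unroll_F` are hypothetical helpers.

```python
def W(n, k):
    """Scalar function W(n, k), limited to the cases equal to +1 or -1."""
    k %= n
    if k == 0:
        return 1
    if 2 * k == n:
        return -1
    raise ValueError("this sketch only handles twiddles equal to +1 or -1")

def unroll_F(n):
    """Fully unroll the (F n) template and fold the constant twiddles."""
    lines = []
    for i in range(n):
        expr = ""
        for j in range(n):
            sign = "+" if W(n, i * j) == 1 else "-"
            term = f"x({j})"
            if not expr:
                # First term: drop the initial y(i)=0 and a leading plus sign.
                expr = term if sign == "+" else f"-{term}"
            else:
                expr += f"{sign}{term}"
        lines.append(f"y({i})={expr}")
    return lines

print("\n".join(unroll_F(2)))
```

For n = 2 this prints the same two statements as the slide: the multiplications by W(2, 0) = 1 and W(2, 1) = -1 have been folded into additions and subtractions.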
Define A Primitive
    (primitive J)
    (template (J n) [ n >= 1 ]
      ( do i=0,n-1
          y(i) = x(n-1-i)
        end ))

$$
J_n = \begin{bmatrix} & & 1 \\ & ⋰ & \\ 1 & & \end{bmatrix}_{n \times n}
$$
Define An Operation
    (operation rcompose)
    (template (rcompose A B) [ B.nx == A.ny ]
      ( t = A(x)
        y = B(t) ))

$y = (A \circ B)\,x \;\equiv\; t = Ax,\; y = Bt$
Compound Template Matching
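• (rcompose (J 2)(F 2)) matches the pattern (rcompose A B), with A = (J 2) and B = (F 2)
• (J 2) matches the pattern (J n); (F 2) matches the pattern (F n)

I-code from the rcompose template:

    t = (J 2) x
    y = (F 2) t

After expanding (J 2) and (F 2):

    t(0)=x(1)
    t(1)=x(0)
    y(0)=t(0)+t(1)
    y(1)=t(0)-t(1)

After optimization:

    y(0)=x(1)+x(0)
    y(1)=x(1)-x(0)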
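The same compound match can be mimicked with Python closures, where each primitive returns a function from input vector to output vector. This is a sketch of the semantics only, and the Python names are hypothetical stand-ins for the SPL templates.

```python
def J(n):
    """Primitive (J n): reversal, y(i) = x(n-1-i)."""
    return lambda x: [x[n - 1 - i] for i in range(n)]

def F2():
    """(F 2) after unrolling: y(0)=x(0)+x(1), y(1)=x(0)-x(1)."""
    return lambda x: [x[0] + x[1], x[0] - x[1]]

def rcompose(A, B):
    """(rcompose A B): first t = A(x), then y = B(t)."""
    def apply(x):
        t = A(x)
        return B(t)
    return apply

f = rcompose(J(2), F2())
print(f([5, 3]))  # y(0) = x(1)+x(0), y(1) = x(1)-x(0)
```

With x = (5, 3) the temporary t is (3, 5), so the result is (8, -2), matching the optimized statements.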
Intermediate Code Restructuring
• Loop unrolling
    • Degree of unrolling can be controlled globally or case by case
• Scalar function evaluation
    • Replace scalar functions with constant values or array accesses
• Type conversion
    • Type of input data: real or complex
    • Type of arithmetic: real or complex
    • Same SPL formula, different C/Fortran programs
Optimizations
• Low-level optimizations:
    • Instruction scheduling, register allocation, instruction selection, …
    • Leave them to the native compiler
• Basic high-level optimizations:
    • Constant folding, copy propagation, CSE, dead code elimination, …
    • The native compiler is supposed to handle these, but in practice it does not do enough
• High-level scheduling, loop transformations:
    • Formula transformation
    • Integrated into the search space
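Copy propagation, one of the basic optimizations listed here, is easy to sketch on four-tuples of the form <op, src1, src2, dest>. The Python below is an illustration, not the SPL compiler's implementation. It assumes straight-line code in which every name is assigned at most once and inputs are never overwritten, and it encodes a plain copy as op "=" with src2 set to None, which is a hypothetical encoding.

```python
def copy_propagate(code, outputs):
    """Forward copy propagation that also drops the copies it made dead.

    code: list of four-tuples (op, src1, src2, dest).
    outputs: names that must survive (the y variables).
    """
    alias = {}
    rewritten = []
    for op, s1, s2, dest in code:
        # Read through any copies recorded so far.
        s1 = alias.get(s1, s1)
        s2 = alias.get(s2, s2)
        if op == "=" and dest not in outputs:
            alias[dest] = s1      # later uses of dest read s1 directly
        else:
            rewritten.append((op, s1, s2, dest))
    return rewritten

# t(0)=x(1); t(1)=x(0); y(0)=t(0)+t(1); y(1)=t(0)-t(1)
code = [
    ("=", "x1", None, "t0"),
    ("=", "x0", None, "t1"),
    ("+", "t0", "t1", "y0"),
    ("-", "t0", "t1", "y1"),
]
for quad in copy_propagate(code, {"y0", "y1"}):
    print(quad)
```

The two copies into the temporaries disappear and the remaining tuples read the inputs directly, which corresponds to y(0)=x(1)+x(0) and y(1)=x(1)-x(0).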
Basic Optimizations (FFT, N = $2^5$, Ultra5)
Basic Optimizations (FFT, N = $2^5$, Origin 200)
Basic Optimizations (FFT, N = $2^5$, PC)
Performance Evaluation
• Platforms: Ultra5, Origin 200, PC
• Small-size FFT ($2^1$ to $2^6$)
    • Straight-line code
    • K-way factorization
    • Dynamic programming
• Large-size FFT ($2^7$ to $2^{20}$)
    • Loop code
    • Binary right-most factorization
    • Dynamic programming
• Accuracy, memory requirement
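Dynamic programming over binary right-most factorizations can be sketched as follows: to plan size 2^m, try each small left factor 2^a as a leaf and reuse the best plan already found for the right factor 2^(m-a). Everything numeric here is an assumption. The two toy cost functions are invented for illustration, whereas the search described in these slides evaluates the performance of generated code. The function names are hypothetical.

```python
def best_rightmost_plan(k, max_leaf, leaf_cost, combine_cost):
    """Dynamic programming over binary right-most factorizations of F_(2^k).

    Returns (cost, plan), where plan is a list of leaf exponents
    [a1, a2, ...] with a1 + a2 + ... = k. Requires max_leaf >= 1.
    """
    best = {}
    for m in range(1, k + 1):
        candidates = []
        if m <= max_leaf:                     # implement 2^m directly as a leaf
            candidates.append((leaf_cost(m), [m]))
        for a in range(1, min(max_leaf, m - 1) + 1):
            sub_cost, sub_plan = best[m - a]  # best plan for the right factor
            cost = combine_cost(a, m - a, leaf_cost(a), sub_cost)
            candidates.append((cost, [a] + sub_plan))
        best[m] = min(candidates, key=lambda c: c[0])
    return best[k]

# Hypothetical cost model, for illustration only.
def toy_leaf_cost(a):
    return a * 2 ** a

def toy_combine_cost(a, b, left_cost, right_cost):
    # 2^b copies of the left leaf, 2^a copies of the right plan, plus twiddles.
    return 2 ** b * left_cost + 2 ** a * right_cost + 2 ** (a + b)

cost, plan = best_rightmost_plan(10, 3, toy_leaf_cost, toy_combine_cost)
print(cost, plan)
```

Because the combined cost grows with the cost of the right factor, reusing the best sub-plan is safe for this toy model; with measured run times that property is only a working assumption.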
FFTW
• An FFT package
• Codelet: optimized straight-line code for small-size FFTs
• Plan: factorization tree
• Use dynamic programming to find the plan
• Make recursive function calls to the codelets according to the plan
• Measure and estimate
FFT Performance (N = $2^1$ to $2^6$, Ultra5)
FFT Performance (N = $2^1$ to $2^6$, Origin 200)
FFT Performance (N = $2^1$ to $2^6$, PC)
FFT Performance (N = $2^7$ to $2^{20}$, Ultra5)
FFT Performance (N = $2^7$ to $2^{20}$, Origin 200)
FFT Performance (N = $2^7$ to $2^{20}$, PC)
FFT Accuracy (N = $2^1$ to $2^{18}$)
FFT Memory Utilization (N = $2^7$ to $2^{20}$)
Conclusion
• The SPL compiler is capable of producing efficient code on a variety of platforms.
• The standard optimizations carried out by the SPL compiler are necessary to get good performance.
• The template mechanism makes the SPL language and the SPL compiler highly extensible.
Related Work

| Name | Domain | Code Generator | Tuning |
|---|---|---|---|
| FFTW | FFT | Fixed algorithms | DP |
| WHT Package | WHT | Built-in | DP, GA |
| EXTENT | Block recursive | Built-in | Manual |
| ATLAS | BLAS | Hand coded, blocking, unrolling | Search |
| PHiPAC | BLAS | Hand coded | Search |
| Iterative Compilation | Compiler option | N/A | Search |
Performance Evaluation: Platforms
• Ultra5: Solaris 7, Sun Workshop 5.0; 333MHz UltraSPARC IIi, 128MB, 16KB/16KB/2MB
• Origin 200: IRIX64 6.5, MIPSpro 7.3.1.1m; 180MHz MIPS R10000, 384MB, 32KB/32KB/1MB
• PC: Linux kernel 2.2.18, egcs 1.1.2; 400MHz Pentium II, 256MB, 16KB/16KB/512KB