Open64
A Software Pipelining Framework for Simple Processor Cores
• Juergen Ributzka
• David Stephenson
• Timothy Kong
• Dee Lee
• Fred Chow
• Guang R. Gao
Copyright © 2009 - Juergen Ributzka. All rights reserved.
Overview
• SiCortex Multiprocessor
• Software Pipelining Framework
• Results
• Future Work
3/23/2009
SiCortex Multiprocessor
• RISC
• 6 cores (MIPS 5KF)
• 500 – 700 MHz
• In-order execution
• Limited dual issue
• Low power (12 – 16 Watt)
Software Pipelining Framework
[Figure: Open64 compiler structure. Front-End: FORTRAN, C/C++, Java. Middle-End: Loop Nest Optimizer, Global Optimizer. Back-End: Code Generator. Software pipelining phases: DDG, MII, MS, MVE, RA, CG.]
Data Dependence Graph (DDG)
• Cyclic data dependence graph
• No register anti- and output dependencies
• Only single basic blocks (BBs)
[Figure: example data dependence graph over the statements S1 and S2]
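The kind of graph described on this slide can be sketched as follows. This is a minimal illustration, not Open64 code: the names Op, Edge, and build_flow_ddg are hypothetical, the loop body is assumed to be a single basic block given as a list of register operations, and memory dependences are left out for brevity. Only true (flow) register dependences are recorded, so no anti- or output edges appear.

```python
from typing import NamedTuple, Tuple


class Op(NamedTuple):
    """Hypothetical operation: one destination register, several sources."""
    name: str
    dest: str
    srcs: Tuple[str, ...]


class Edge(NamedTuple):
    """Flow dependence; distance is the number of iterations it spans."""
    src: str
    dst: str
    distance: int


def build_flow_ddg(body):
    """Return flow-dependence edges for a single-basic-block loop body."""
    edges = []
    for j, use in enumerate(body):
        for reg in use.srcs:
            # A definition earlier in the same iteration: distance 0.
            earlier = [i for i in range(j) if body[i].dest == reg]
            if earlier:
                edges.append(Edge(body[earlier[-1]].name, use.name, 0))
                continue
            # Otherwise the value comes from the previous iteration: distance 1.
            later = [i for i in range(j, len(body)) if body[i].dest == reg]
            if later:
                edges.append(Edge(body[later[-1]].name, use.name, 1))
    return edges


# Assumed example body: S1: x = y + 1, S2: y = x * 2
example_body = [Op("S1", "x", ("y",)), Op("S2", "y", ("x",))]
example_edges = build_flow_ddg(example_body)
```

For the assumed two-statement body, the result contains an edge from S1 to S2 within an iteration and a loop-carried edge from S2 back to S1, which is what makes the graph cyclic.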
Minimum Initiation Interval
• Limited by:
  – Resource requirements
  – Recurrence requirements
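The two limits are commonly combined as MII = max(ResMII, RecMII). The sketch below shows one way to compute both, assuming a resource usage table and DDG edges annotated with latency and iteration distance. The function names, the latencies, and the resource counts are illustrative assumptions, not values from the slides or from Open64.

```python
import math


def res_mii(uses, units):
    """Resource bound: each resource needs ceil(uses / available units) cycles."""
    return max(math.ceil(uses[r] / units[r]) for r in uses)


def has_positive_cycle(nodes, edges, ii):
    """Longest-path Floyd-Warshall with edge weight latency - ii * distance."""
    neg = float("-inf")
    dist = {(u, v): neg for u in nodes for v in nodes}
    for src, dst, latency, distance in edges:
        dist[(src, dst)] = max(dist[(src, dst)], latency - ii * distance)
    for k in nodes:
        for u in nodes:
            for v in nodes:
                through_k = dist[(u, k)] + dist[(k, v)]
                if through_k > dist[(u, v)]:
                    dist[(u, v)] = through_k
    return any(dist[(v, v)] > 0 for v in nodes)


def rec_mii(nodes, edges, max_ii=64):
    """Recurrence bound: smallest II for which no dependence cycle is violated."""
    for ii in range(1, max_ii + 1):
        if not has_positive_cycle(nodes, edges, ii):
            return ii
    raise ValueError("no feasible II up to max_ii")


def mii(uses, units, nodes, edges):
    return max(res_mii(uses, units), rec_mii(nodes, edges))


# Assumed example: S1 -> S2 (latency 3, same iteration),
# S2 -> S1 (latency 2, carried over one iteration).
example_nodes = ["S1", "S2"]
example_edges = [("S1", "S2", 3, 0), ("S2", "S1", 2, 1)]
example_mii = mii({"alu": 3, "mem": 2}, {"alu": 1, "mem": 1},
                  example_nodes, example_edges)
```

In this assumed example the recurrence has total latency 5 over a distance of 1 iteration, so RecMII is 5, while the busiest resource gives ResMII 3; the recurrence is the limiting factor.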
Modulo Scheduling (MS)
• Huff's Modulo Scheduling
  – Lifetime sensitive
  – Uses backtracking
• Hypernode Reduction Modulo Scheduling (HRMS)
  – Lifetime sensitive
  – No backtracking
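Neither of the two algorithms is reproduced here. The sketch below only illustrates the constraint that modulo schedulers in general have to respect: an operation placed at cycle c occupies its resource in slot c mod II of every iteration, so two operations whose cycles are congruent modulo II compete for the same unit. The class and function names are hypothetical.

```python
class ModuloReservationTable:
    """Tracks resource usage per (cycle mod II, resource) slot."""

    def __init__(self, ii, units):
        self.ii = ii
        self.units = units   # resource name -> number of available units
        self.used = {}       # (slot, resource) -> units already reserved

    def fits(self, cycle, resource):
        slot = cycle % self.ii
        return self.used.get((slot, resource), 0) < self.units[resource]

    def reserve(self, cycle, resource):
        if not self.fits(cycle, resource):
            raise ValueError("resource conflict in modulo slot")
        key = (cycle % self.ii, resource)
        self.used[key] = self.used.get(key, 0) + 1


def first_free_cycle(mrt, resource, earliest):
    """Scan one full II window; return None if every slot is taken."""
    for cycle in range(earliest, earliest + mrt.ii):
        if mrt.fits(cycle, resource):
            return cycle
    return None
```

When first_free_cycle returns None, a scheduler with backtracking would evict an already placed operation, whereas a scheduler without backtracking would have to retry with a larger II.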
Modulo Variable Expansion (MVE)
• Separate TN (temporary name) set for every iteration
• Number of unrollings depends on the longest lifetime
[Figure: kernel code for successive iterations (axes: Iteration, Cycle)]
TN10 op1 TNXX … TN20 op2 TN10 TNYY
TN11 op1 TNXX … TN21 op2 TN11 TNYY
TN10 op1 TNXX … TN20 op2 TN10 TNYY
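A common way to derive the number of kernel copies is to divide the longest lifetime by the initiation interval and round up. The sketch below assumes that rule and mimics the TN10/TN11 alternation from the slide. The function names, the lifetimes, and the numbering scheme (base number plus iteration modulo the unroll factor) are illustrative assumptions, not the framework's actual naming logic.

```python
import math


def mve_unroll_factor(lifetimes, ii):
    """Kernel copies needed so that no value is overwritten while still live."""
    return max(1, max(math.ceil(life / ii) for life in lifetimes.values()))


def tn_for_iteration(base, iteration, factor):
    """Pick the TN used by a given iteration (hypothetical numbering scheme)."""
    return f"TN{base + iteration % factor}"


# Assumed lifetimes in cycles, with an assumed II of 3.
example_factor = mve_unroll_factor({"TN10": 5, "TN20": 3}, ii=3)
example_names = [tn_for_iteration(10, i, example_factor) for i in range(3)]
```

With the assumed numbers, TN10 lives for 5 cycles at an II of 3, so two copies are required and consecutive iterations alternate between TN10 and TN11.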
Register Allocation (RA)
• Only kernels are fully register allocated
• No regions
  – SWP kernels are no longer a black box for GRA and LRA
• Currently no register spilling support
Code Generation (CG)
• Prologues and epilogues are partially register allocated
• Need several epilogues due to different register sets
[Figure: generated code layout with prologues (P, P0), kernels (K0, K1), and epilogues (E0, E1, E2, E)]
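The sketch below models the general shape of a software-pipelined loop: a prologue that fills the pipeline, a kernel in steady state, and an epilogue that drains it. It also shows why an unrolled kernel can be left from different copies depending on the trip count, which is one plausible reading of why several epilogues are needed. The stage names, the function names, and the assumption that the loop exits directly from whichever kernel copy ran last are illustrative and are not taken from the framework.

```python
def pipelined_steps(stages, trip_count):
    """Split the overlapped execution into prologue, kernel, and epilogue.

    Each step is one II-long time slot listing (stage, iteration) pairs.
    """
    depth = len(stages)
    if trip_count < depth:
        raise ValueError("sketch assumes trip_count >= number of stages")
    steps = []
    for t in range(trip_count + depth - 1):
        row = [(stages[t - i], i) for i in range(trip_count)
               if 0 <= t - i < depth]
        steps.append(row)
    prologue = steps[:depth - 1]        # pipeline still filling
    kernel = steps[depth - 1:trip_count]  # all stages busy
    epilogue = steps[trip_count:]       # pipeline draining
    return prologue, kernel, epilogue


def last_kernel_copy(stages, trip_count, unroll):
    """Index of the unrolled kernel copy that executes last."""
    _, kernel, _ = pipelined_steps(stages, trip_count)
    return (len(kernel) - 1) % unroll
```

With three assumed stages and a kernel unrolled twice, a trip count of 4 leaves the loop after the second kernel copy and a trip count of 5 after the first. Because each copy uses its own register set, the code that drains the pipeline has to match the copy it follows.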
Benchmark Results
[Figure: NAS Parallel Benchmark speedup (y-axis from 0.9 to 1.2) for BT, CG, EP, FT, IS, LU, MG, SP, and UA; series: ABI n32 and ABI 64]
Benchmark Results
[Figure: SPEC2006 speedup (y-axis from 0.95 to 1.05) for 410.bwaves, 433.milc, 434.zeusmp, 435.gromacs, 436.cactusADM, 437.leslie3d, 444.namd, 447.dealII, 450.soplex, 453.povray, 454.calculix, 459.GemsFDTD, 465.tonto, 470.lbm, 481.wrf, 482.sphinx3, 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, and 473.astar; series: ABI n32 and ABI 64]
Conclusion and Future Work
• Spilling support needed to enable more loops for SWP
• Screen out loops with small trip counts at run time to reduce SWP overhead
• Hit-under-miss support to overcome current hardware limitations
Questions?