derivation of efficient fsm from polyhedral loop nests tomofumi yuki, antoine morvan, steven derrien...
Post on 14-Dec-2015
216 Views
Preview:
TRANSCRIPT
Derivation of Efficient FSM from Polyhedral Loop Nests
Tomofumi Yuki, Antoine Morvan, Steven Derrien
INRIA/Université de Rennes 1
1
High-Level Synthesis
Writing HDL is too costly Led to emergence of HLS tools
HLS is sensitive to input source Must be written in “HW-aware” manner
Source-to-Source transformations Common in optimizing compilers (semi-)automated exploration at HLS
stage Further enhance
productivity/performance
2
HLS Specific Transformations Not all optimizing compiler
transformations make sense in embedded context Its converse is also true
Finite State Machines is an example
for loops are preferred ingeneral purpose context
3
for i for j S0 for k S1
while (…) if (…) S0; if (…) S1; if (…) k = k+1; if (…) i=i+1; j=0;
FSMderivation
Contributions
Analytical model of Loop Pipelining Understanding when to use Nested LP w.r.t. Single Loop Pipelining
Derivation of Finite State Machines Handles imperfectly nested loops Based on polyhedral techniques
Pipelining of the control-path Computing n-states ahead Improves performance of the control-
path
4
Outline
Modeling Loop Pipelining Single Loop Pipelining Nested Loop Pipelining NLP vs SLP
FSM Derivation Evaluation Conclusion
5
Single Loop Pipelining
Overlapped execution of innermost loop
6
for i=1:M for j=1:N S(i,j);
for i=1:M for j=1:N stage0(i,j); stage1(i,j); stage2(i,j); stage3(i,j);
for i=1:M s0(i,1); s1(i,1); s0(i,2); s2(i,1); s1(i,2); … s3(i,1); s2(i,2); … s0(i,N); s3(i,2); … s1(i,N); … s2(i,N); s3(i,N);
Pipeline flush/fill Overhead
Overhead for each iteration of the outer loop
7
i=1 i=2 i=3
for i=1:M for j=1:N s0(i,j); s1(i,j); s2(i,j); s3(i,j);
under-utilized stages
Nested Loop Pipelining
“Compress” by pipelining alltogether
8
i=1i=2
i=3
for i=1:M for j=1:N s0(i,j); s1(i,j); s2(i,j); s3(i,j);
for i=1:M j=j+1; j<N s0(i,j); s1(i,j); s2(i,j); s3(i,j);
while(has_next) i,j=next(i,j) s0(i,j); s1(i,j); s2(i,j); s3(i,j);
NLP Overhead
Larger control-path FSM for loop nest, instead of a single
loop FSM for SLP is a simple check on loop
bound Hinders maximum frequency
Complex control-path may take longer than one data-path stage
Savings in flush/fill overhead must be greater than the loss in frequency
9
Modeling Trade-offs
Important parameters: Frequency Degradation due to NLP Innermost trip count Number of pipeline stages
f*: NLP frequency normalized to SLP f* = 0.9 means 10% degradation in
frequency α= #stages / trip count
larger α means large flush/fill overhead
10
When is NLP Beneficial?
11
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0.2 0.12 0.24 0.36 0.48 0.60 0.72 0.84 0.96 1.08 1.20
0.4 0.14 0.28 0.42 0.56 0.70 0.84 0.98 1.12 1.27 1.41
0.6 0.16 0.32 0.48 0.64 0.80 0.96 1.12 1.28 1.45 1.59
0.8 0.18 0.36 0.54 0.72 0.90 1.08 1.27 1.45 1.61 1.79
1 0.20 0.40 0.60 0.80 1.00 1.20 1.41 1.59 1.79 2.00
1.2 0.22 0.44 0.66 0.88 1.10 1.32 1.54 1.75 1.96 2.22
1.4 0.24 0.48 0.72 0.96 1.20 1.45 1.67 1.92 2.17 2.38
1.6 0.26 0.52 0.78 1.04 1.30 1.56 1.82 2.08 2.33 2.63
1.8 0.28 0.56 0.84 1.12 1.41 1.67 1.96 2.22 2.50 2.78
2 0.30 0.60 0.90 1.20 1.49 1.79 2.08 2.38 2.70 3.03
f*: higher = less degradation
α: larger = small trip count (innermost)
Program Characteristic
(cannot change)
Improving control-path is
possible
Model speedup as a function of f* and α
Outline
Modeling Loop Pipelining FSM Derivation
Polyhedral Representation Computing Transitions State Look Ahead
Evaluation Conclusion
12
Polyhedral Representation
Represent loops as mathematical objects
13
for i = 0:N for j = 0:M S
S M
N
for i = 0:N for j = 0:i S0 for k = 0:N-i S1
S0S1
FSM Derivation
next function Find a piece-wise function that gives the
immediate successor in lexicographic order
Proposed in 1998 for low-level code generation
Direct Application to FSM Each piece = condition of transition Function = transition
Can be composed to obtain nextn
14
State Look Ahead
Pipelining the control-flow When data-path is heavily pipelined,
control-path becomes the critical path Computing n-states ahead
Allows n-stage pipelining of the control-path
15
datapath
i,j i,’j’ i”,j”
next2
datapath
i,j i’,j’
next
Other Optimizations
Merging transitions next computed can have many
transitions Some can be merged by looking at its
context
Common Sub-expressions HLS tools sometimes fail to catch
16
next(i,j) = (i,j+1) if i<N
next(i,j) = (N,j+1) if i=Nnext(i,j) = (i,j+1) if i≤N
if (a>b && c>d) A;if (a>b && e>f) B;
x = a>b;if (x && c>d) A;if (x && e>f) B;
Evaluation Methodology
Focus on control-path empty data-path (incrementing arrays) independent iterations loops with different shapes
3 versions: different pipelining SLP : innermost loop NLP: all loops FSM-LA2: while loop of FSM with next2
17
Evaluation: HLS Phase
Maximum Target Frequency
18
rect 2d rect 3d triangular 2d
triangular 3d
0
100
200
300
400
500
SLP NLP FSM-LA2
Evaluation: Synthesized Design Achieved Frequency
19
rect 2d rect 3d triangular 2d
triangular 3d
0
100
200
300
400
500
SLP NLP FSM-LA2
Conclusion
Improved FSM generation from for loops Example of HLS specific transformation State look ahead to pipeline control-path HLS tools currently lack compiler
optimizations Applied to Nested Loop Pipelining
Enlarge applicability by reducing its overhead
Future Directions Other uses of next function Other HLS-specific transformations
20
21
top related