Derivation of Efficient FSM from Polyhedral Loop Nests
Tomofumi Yuki, Antoine Morvan, Steven Derrien
INRIA/Université de Rennes 1
High-Level Synthesis
Writing HDL is too costly
  Led to emergence of HLS tools
HLS is sensitive to input source
  Must be written in “HW-aware” manner
Source-to-Source transformations
  Common in optimizing compilers
  (Semi-)automated exploration at HLS stage
  Further enhance productivity/performance
HLS Specific Transformations
Not all optimizing compiler transformations make sense in embedded context
  The converse is also true
Finite State Machines are an example
  for loops are preferred in general purpose context

  for i
    for j
      S0
    for k
      S1

FSM derivation turns the loop nest above into:

  while (…)
    if (…) S0;
    if (…) S1;
    if (…) k = k+1;
    if (…) i=i+1; j=0;
Contributions
Analytical model of Loop Pipelining
  Understanding when to use Nested LP w.r.t. Single Loop Pipelining
Derivation of Finite State Machines
  Handles imperfectly nested loops
  Based on polyhedral techniques
Pipelining of the control-path
  Computing n-states ahead
  Improves performance of the control-path
Outline
Modeling Loop Pipelining
  Single Loop Pipelining
  Nested Loop Pipelining
  NLP vs SLP
FSM Derivation
Evaluation
Conclusion
Single Loop Pipelining
Overlapped execution of innermost loop
Original loop nest:

  for i=1:M
    for j=1:N
      S(i,j);

Body split into pipeline stages:

  for i=1:M
    for j=1:N
      stage0(i,j); stage1(i,j); stage2(i,j); stage3(i,j);

Pipelined execution of the innermost loop:

  for i=1:M
    s0(i,1);
    s1(i,1); s0(i,2);
    s2(i,1); s1(i,2); …
    s3(i,1); s2(i,2); … s0(i,N);
             s3(i,2); … s1(i,N);
                      … s2(i,N);
                        s3(i,N);
Pipeline flush/fill Overhead
Overhead for each iteration of the outer loop
  for i=1:M
    for j=1:N
      s0(i,j); s1(i,j); s2(i,j); s3(i,j);

[Figure: pipeline occupancy for i=1, i=2, i=3, with under-utilized stages at each outer iteration]
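The flush/fill overhead can be made concrete with a small cycle count. The sketch below is only an illustration: it assumes one new iteration is issued per cycle (initiation interval of 1) and one cycle per stage, and the function names `slp_cycles` and `nlp_cycles` are hypothetical, not taken from the slides.

```python
def slp_cycles(M, N, stages):
    """Cycles when only the innermost loop is pipelined (assumed II = 1)."""
    # Each outer iteration issues N inner iterations, then needs
    # (stages - 1) extra cycles to drain the pipeline before the next i.
    return M * (N + stages - 1)


def nlp_cycles(M, N, stages):
    """Cycles when the whole nest is pipelined as one stream (assumed II = 1)."""
    # All M*N iterations are issued back to back; the pipeline drains once.
    return M * N + stages - 1


if __name__ == "__main__":
    M, N, stages = 3, 4, 4
    print(slp_cycles(M, N, stages))  # 21
    print(nlp_cycles(M, N, stages))  # 15
```

Under these assumptions the gap between the two counts grows with the number of outer iterations and with the pipeline depth, and shrinks in relative terms as the innermost trip count N grows.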
Nested Loop Pipelining
“Compress” by pipelining all together

[Figure: pipeline occupancy for i=1, i=2, i=3 after compression]

  for i=1:M
    for j=1:N
      s0(i,j); s1(i,j); s2(i,j); s3(i,j);

  for i=1:M
    j=j+1; j<N
      s0(i,j); s1(i,j); s2(i,j); s3(i,j);

  while(has_next)
    i,j=next(i,j)
    s0(i,j); s1(i,j); s2(i,j); s3(i,j);
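The while-loop form can be sketched for the rectangular nest `for i=1:M, for j=1:N`. This is a hand-written illustration, assuming inclusive bounds and M, N ≥ 1; `has_next`, `next_rect`, and `run_flattened` are hypothetical names, and the sketch visits the current state before advancing, a small variation on the ordering shown in the slide.

```python
def has_next(i, j, M, N):
    """True while (i, j) is not the last iteration of the nest."""
    return (i, j) != (M, N)


def next_rect(i, j, M, N):
    """Immediate successor of (i, j) in lexicographic order."""
    if j < N:
        return (i, j + 1)   # stay in the same row
    return (i + 1, 1)       # wrap to the start of the next row


def run_flattened(M, N):
    """Visit the whole nest with a single loop and one state update per step."""
    visited = []
    i, j = 1, 1
    while True:
        visited.append((i, j))  # stands in for s0(i,j); ...; s3(i,j)
        if not has_next(i, j, M, N):
            break
        i, j = next_rect(i, j, M, N)
    return visited


if __name__ == "__main__":
    nested = [(i, j) for i in range(1, 3) for j in range(1, 4)]
    print(run_flattened(2, 3) == nested)  # True
```

Because there is a single loop, a pipeliner sees one stream of M*N iterations, at the cost of evaluating `next_rect` on every step.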
NLP Overhead
Larger control-path
  FSM for loop nest, instead of a single loop
  FSM for SLP is a simple check on loop bound
Hinders maximum frequency
  Complex control-path may take longer than one data-path stage
Savings in flush/fill overhead must be greater than the loss in frequency
Modeling Trade-offs
Important parameters:
  Frequency degradation due to NLP
  Innermost trip count
  Number of pipeline stages
f*: NLP frequency normalized to SLP
  f* = 0.9 means 10% degradation in frequency
α = #stages / trip count
  larger α means larger flush/fill overhead
When is NLP Beneficial?
Model speedup as a function of f* and α (rows: α, columns: f*)

| α \ f* | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 0.12 | 0.24 | 0.36 | 0.48 | 0.60 | 0.72 | 0.84 | 0.96 | 1.08 | 1.20 |
| 0.4 | 0.14 | 0.28 | 0.42 | 0.56 | 0.70 | 0.84 | 0.98 | 1.12 | 1.27 | 1.41 |
| 0.6 | 0.16 | 0.32 | 0.48 | 0.64 | 0.80 | 0.96 | 1.12 | 1.28 | 1.45 | 1.59 |
| 0.8 | 0.18 | 0.36 | 0.54 | 0.72 | 0.90 | 1.08 | 1.27 | 1.45 | 1.61 | 1.79 |
| 1 | 0.20 | 0.40 | 0.60 | 0.80 | 1.00 | 1.20 | 1.41 | 1.59 | 1.79 | 2.00 |
| 1.2 | 0.22 | 0.44 | 0.66 | 0.88 | 1.10 | 1.32 | 1.54 | 1.75 | 1.96 | 2.22 |
| 1.4 | 0.24 | 0.48 | 0.72 | 0.96 | 1.20 | 1.45 | 1.67 | 1.92 | 2.17 | 2.38 |
| 1.6 | 0.26 | 0.52 | 0.78 | 1.04 | 1.30 | 1.56 | 1.82 | 2.08 | 2.33 | 2.63 |
| 1.8 | 0.28 | 0.56 | 0.84 | 1.12 | 1.41 | 1.67 | 1.96 | 2.22 | 2.50 | 2.78 |
| 2 | 0.30 | 0.60 | 0.90 | 1.20 | 1.49 | 1.79 | 2.08 | 2.38 | 2.70 | 3.03 |

f*: higher = less degradation
  Improving control-path is possible
α: larger = small trip count (innermost)
  Program characteristic (cannot change)
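The tabulated values agree, to within rounding, with speedup = f* · (1 + α). The sketch below works under that reading, which is inferred from the table and not stated on the slide: it corresponds to SLP spending roughly (trip count + #stages) cycles per outer iteration against roughly (trip count) cycles for NLP. All function names are hypothetical.

```python
def nlp_speedup(f_star, alpha):
    """Estimated speedup of NLP over SLP under the inferred model.

    f_star: NLP frequency normalized to SLP (1.0 means no degradation).
    alpha:  #stages / innermost trip count.
    """
    return f_star * (1.0 + alpha)


def break_even_f_star(alpha):
    """Smallest normalized frequency at which NLP matches SLP."""
    return 1.0 / (1.0 + alpha)


def nlp_is_beneficial(f_star, alpha):
    """True when the flush/fill savings outweigh the frequency loss."""
    return nlp_speedup(f_star, alpha) > 1.0


if __name__ == "__main__":
    for alpha in (0.2, 1.0, 2.0):
        print(alpha, break_even_f_star(alpha), nlp_is_beneficial(0.8, alpha))
```

Read this way, a loop with α = 1 tolerates up to a 50% frequency drop before NLP stops paying off, while a loop with α = 0.2 tolerates only about 17%.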
Outline
Modeling Loop Pipelining
FSM Derivation
  Polyhedral Representation
  Computing Transitions
  State Look Ahead
Evaluation
Conclusion
Polyhedral Representation
Represent loops as mathematical objects
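  for i = 0:N
    for j = 0:M
      S

[Figure: rectangular iteration domain of S, with extents labeled N and M]

  for i = 0:N
    for j = 0:i
      S0
    for k = 0:N-i
      S1

[Figure: iteration domains of S0 and S1]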
FSM Derivation
next function
  Find a piece-wise function that gives the immediate successor in lexicographic order
  Proposed in 1998 for low-level code generation
Direct application to FSM
  Each piece = condition of transition
  Function = transition
Can be composed to obtain next^n (next applied n times)
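To illustrate what a piece-wise next function looks like, here is a hand-written sketch for the imperfect nest `for i = 0:N { for j = 0:i S0; for k = 0:N-i S1 }`. It is not the output of the polyhedral derivation itself; it assumes inclusive loop bounds, and the state encoding `(i, stmt, x)` and the names `next_state`, `next_n`, and `walk` are hypothetical.

```python
def next_state(state, N):
    """Immediate lexicographic successor; each branch is one FSM transition."""
    i, stmt, x = state          # stmt 0 is S0 (x = j), stmt 1 is S1 (x = k)
    if stmt == 0 and x < i:
        return (i, 0, x + 1)    # next j within S0
    if stmt == 0:
        return (i, 1, 0)        # j == i: move on to the first S1 of this i
    if x < N - i:
        return (i, 1, x + 1)    # next k within S1
    if i < N:
        return (i + 1, 0, 0)    # k == N-i: start S0 of the next i
    return None                 # last iteration reached


def next_n(state, N, n):
    """Compose next_state n times (next^n)."""
    for _ in range(n):
        if state is None:
            return None
        state = next_state(state, N)
    return state


def walk(N):
    """Enumerate all statement instances by repeatedly applying next_state."""
    order, state = [], (0, 0, 0)
    while state is not None:
        order.append(state)
        state = next_state(state, N)
    return order


def reference(N):
    """The same order produced by the original loop nest."""
    order = []
    for i in range(N + 1):
        order += [(i, 0, j) for j in range(i + 1)]
        order += [(i, 1, k) for k in range(N - i + 1)]
    return order


if __name__ == "__main__":
    print(walk(3) == reference(3))   # True
    print(next_n((0, 0, 0), 3, 2))   # (0, 1, 1)
```

Each `if` guard plays the role of a transition condition and each returned tuple the role of the transition, which is the correspondence the slide describes.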
State Look Ahead
Pipelining the control-flow
  When data-path is heavily pipelined, control-path becomes the critical path
Computing n-states ahead
  Allows n-stage pipelining of the control-path

[Figure: two block diagrams of the datapath and its state registers: one driven by next2 (states i,j, i',j', and i'',j''), one driven by next (states i,j and i',j')]
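A software model can show why computing two states ahead leaves room for a two-stage control-path. The sketch below is an assumption-laden illustration for a rectangular nest with inclusive bounds starting at 1; the queue of length two stands in for two pipeline registers, and `next_rect`, `next2`, and `run_with_lookahead` are hypothetical names.

```python
from collections import deque


def next_rect(state, M, N):
    """Immediate successor of (i, j), or None after the last iteration."""
    i, j = state
    if j < N:
        return (i, j + 1)
    if i < M:
        return (i + 1, 1)
    return None


def next2(state, M, N):
    """next composed with itself: the state two steps ahead."""
    first = next_rect(state, M, N)
    return None if first is None else next_rect(first, M, N)


def run_with_lookahead(M, N):
    """Issue states in order while only ever computing next2."""
    start = (1, 1)
    # The first two states are known up front; afterwards every new state
    # comes from next2 and is consumed two steps after it was requested.
    in_flight = deque([start, next_rect(start, M, N)])
    issued = []
    while in_flight:
        current = in_flight.popleft()
        if current is None:
            continue                      # past the end of the nest
        issued.append(current)            # the datapath would use this state
        in_flight.append(next2(current, M, N))
    return issued


if __name__ == "__main__":
    nested = [(i, j) for i in range(1, 3) for j in range(1, 4)]
    print(run_with_lookahead(2, 3) == nested)  # True
```

The iteration order is unchanged, but each `next2` result is not needed until two steps later, which is the slack a two-stage pipelined control-path would exploit.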
Other Optimizations
Merging transitions
  The computed next function can have many transitions
  Some can be merged by looking at their context

  next(i,j) = (i,j+1) if i<N
  next(i,j) = (N,j+1) if i=N

  merged into:

  next(i,j) = (i,j+1) if i≤N

Common sub-expressions
  HLS tools sometimes fail to catch

  if (a>b && c>d) A;
  if (a>b && e>f) B;

  rewritten as:

  x = a>b;
  if (x && c>d) A;
  if (x && e>f) B;
Evaluation Methodology
Focus on control-path
  Empty data-path (incrementing arrays)
  Independent iterations
  Loops with different shapes
3 versions: different pipelining
  SLP: innermost loop
  NLP: all loops
  FSM-LA2: while loop of FSM with next2
Evaluation: HLS Phase
Maximum Target Frequency
[Figure: bar chart of maximum target frequency, axis from 0 to 500, comparing SLP, NLP, and FSM-LA2 on rect 2d, rect 3d, triangular 2d, and triangular 3d]
Evaluation: Synthesized Design
Achieved Frequency
[Figure: bar chart of achieved frequency, axis from 0 to 500, comparing SLP, NLP, and FSM-LA2 on rect 2d, rect 3d, triangular 2d, and triangular 3d]
Conclusion
Improved FSM generation from for loops
  Example of HLS specific transformation
  State look ahead to pipeline control-path
  HLS tools currently lack compiler optimizations
Applied to Nested Loop Pipelining
  Enlarge applicability by reducing its overhead
Future Directions
  Other uses of next function
  Other HLS-specific transformations