Compiler Research in HPC Lab
R. Govindarajan, High Performance Computing Lab
![Page 2: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/2.jpg)
Organization
• HPC Lab Research Overview
• Compiler Analysis & Optimizations
  • Precise Dataflow Analysis
  • Energy Reduction for Embedded Systems
    • Array Allocation for Partitioned Memory Arch.
    • Dynamic Voltage Scaling
  • Integrated Spill Code Generation & Scheduling
• Conclusions
![Page 3: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/3.jpg)
HPC Team (or HPC-XI)
• Mruggesh Gajjar
• B.C. Girish
• R. Karthikeyan
• R. Manikantan
• Santosh Nagarakatte
• Rupesh Nasre
• Sreepathi Pai
• Kaushik Rajan
• T.S. Rajesh Kumar
• V. Santhosh Kumar
• Aditya Thakur
Coach: R. Govindarajan
![Page 4: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/4.jpg)
HPC Lab Research Overview
• Compiler Optimizations: traditional analysis & optimizations, power-aware compiling techniques, compilation techniques for embedded systems
• Computer Architecture: superscalar architecture, architecture-compiler interaction, application-specific processors, embedded systems
• High Performance Computing: cluster computing, HPC applications
![Page 5: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/5.jpg)
Compiler Research in HPC Lab.
• ILP Compilation Techniques
• Compiling Techniques for Embedded Systems
• Compiling Techniques for Application-Specific Systems
• Dataflow Analysis
![Page 6: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/6.jpg)
ILP Compilation Techniques
• Instruction Scheduling
• Software pipelining
• Register Allocation
• Power/Energy Aware Compilation techniques
• Compiling Techniques for embedded systems/application specific processors (DSP, Network Processors, …)
![Page 7: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/7.jpg)
Compiling Techniques for Embedded Systems
• Power-aware software pipelining method (using integer linear program formulation)
• Simple Offset Assignment for code-size reduction.
• Loop transformation and memory bank assignment for power reduction.
• Compiler Assisted Dynamic Voltage Scaling
• Memory layout problem for embedded systems
• MMX code generation using vectorization
![Page 8: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/8.jpg)
Compiling Techniques for Application Specific Systems
• Framework for exploring application design space for network application
• Compiling techniques for Streaming Applications and Program Models
  • Buffer-Aware, Schedule-size Aware, Throughput Optimal Schedules
![Page 9: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/9.jpg)
Compiler Analysis
• Precise Dataflow Analysis
• Pointer Analysis
![Page 10: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/10.jpg)
So, What is the Connection?
Compiler problems are
• Optimization problems: solved by formulating the problem as an Integer Linear Program.
  • Involves non-trivial effort!
  • Efficient formulation for reducing exec. time!
  • Other evolutionary approaches can also be used.
• Graph theoretic problems: leverage existing well-known approaches
• Modelled using automata: elegant problem formulation to ensure correctness
![Page 11: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/11.jpg)
Precise Dataflow Analysis
The Problem: Improve precision of data-flow analysis used in compiler optimization
![Page 12: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/12.jpg)
Constant Propagation
[Figure: control-flow graph with nodes start, A, B: x=1;, C: x=2;, D, E, F, G: y = x;, H, I, J, end. B and C merge at D; G branches to H and I. "…" marks statements unrelated to x or y.]
Data-flow information (shown in { }; nc: not constant): {x = 1} and {x = 2} on the edges into D, {x = nc} after the merge.
Can’t replace the use of x at G with a constant.
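The loss of precision at the merge can be reproduced with a minimal sketch of the usual flat constant-propagation lattice. The names `TOP`, `NC`, `meet` and `meet_env` are illustrative and do not come from the slides; the sketch only shows why {x = 1} and {x = 2} collapse to {x = nc} when the two paths join.

```python
# Flat constant-propagation lattice: TOP (no information yet), a concrete
# constant, or NC (not constant). All names here are illustrative.
TOP = "top"
NC = "nc"

def meet(a, b):
    """Combine the facts about one variable arriving on two incoming edges."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else NC

def meet_env(env1, env2):
    """Pointwise meet of two {variable: value} maps."""
    return {v: meet(env1.get(v, TOP), env2.get(v, TOP))
            for v in set(env1) | set(env2)}

out_B = {"x": 1}   # data-flow fact after B: x=1;
out_C = {"x": 2}   # data-flow fact after C: x=2;
in_D = meet_env(out_B, out_C)
print(in_D)        # {'x': 'nc'}
```

Because the two incoming constants differ, the merged fact is `nc`, so the use of x at G cannot be folded unless the paths are kept apart.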
![Page 13: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/13.jpg)
Overview of our Solution
[Figure: the original control-flow graph next to the restructured graph, whose nodes are start, A0, B0, C0, D1, E1, F1, G1, D2, E2, F2, G2, H0, I0, J0, end.]
Can replace uses of x at G1 and G2
![Page 14: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/14.jpg)
Challenges
• The Problem: Improve precision of data-flow analysis
• Approach: Restructuring control-flow of the program
• Challenges:
  • Developed generic framework
  • Guarantees optimization opportunities
  • Handles the precision and code size trade-off
  • Approach is simple and clean
![Page 15: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/15.jpg)
A brief look at our example.
[Figure: the example control-flow graph (start, A, B: x=1;, C: x=2;, D, E, F, G: y = x;, H, I, J, end). "…" marks statements unrelated to x or y.]
Data-flow information (shown in { }; nc: not constant): {x = 1} and {x = 2} on the edges into D, {x = nc} after the merge.
At control-flow merge D, we lose precision.
![Page 16: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/16.jpg)
[Figure: the example control-flow graph (start, A, B: x=1;, C: x=2;, D, E, F, G: y = x;, H, I, J, end).]
Need to duplicate this in order to optimize node G…
![Page 17: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/17.jpg)
[Figure: the example control-flow graph (start, A, B: x=1;, C: x=2;, D, E, F, G: y = x;, H, I, J, end).]
…such that paths with differing dataflow information do not intersect.
![Page 18: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/18.jpg)
[Figure: the example control-flow graph (start, A, B: x=1;, C: x=2;, D, E, F, G: y = x;, H, I, J, end).]
No need to duplicate this.
![Page 19: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/19.jpg)
Control-flow Graph = Automaton
View a control-flow graph G as a finite automaton with
• states as nodes
• start state as entry node
• accepting state as exit node
• transitions as the edges
![Page 20: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/20.jpg)
The Automaton
[Figure: the example control-flow graph next to the Split Automaton for D, which has states 0, 1, 2 and transitions labelled B-D, C-D, G-H, G-I.]
![Page 22: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/22.jpg)
CFG x Automaton = Split Graph
[Figure: the example control-flow graph, the Split Automaton for D (states 0, 1, 2; transitions labelled B-D, C-D, G-H, G-I), and the resulting split graph with nodes start, A0, B0, C0, D1, E1, F1, G1, D2, E2, F2, G2, H0, I0, J0, end.]
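The product construction can be sketched as a reachability search over (CFG node, automaton state) pairs. This is a minimal illustration under stated assumptions, not the authors' implementation: the slide text only confirms the edges B-D, C-D, G-H and G-I, so the remaining CFG edges and the exact source and target states of the automaton transitions are guesses at the figure. The names `CFG`, `SPLIT` and `split_graph` are hypothetical.

```python
from collections import deque

# ASSUMED edges of the example CFG. Only B->D, C->D, G->H and G->I are
# confirmed by the slide text; the rest is a guess at the figure.
CFG = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["E", "F"], "E": ["G"], "F": ["G"],
    "G": ["H", "I"], "H": ["J"], "I": ["J"], "J": [],
}

# ASSUMED split automaton for D: state 0 is the start state, edge B-D moves
# to state 1, edge C-D moves to state 2, and G-H / G-I return to state 0.
# Every other CFG edge leaves the automaton state unchanged.
SPLIT = {("B", "D"): 1, ("C", "D"): 2, ("G", "H"): 0, ("G", "I"): 0}

def split_graph(cfg, split, entry):
    """Return the nodes and edges of the product graph reachable from (entry, 0)."""
    start = (entry, 0)
    seen, edges, work = {start}, [], deque([start])
    while work:
        node, state = work.popleft()
        for succ in cfg[node]:
            nxt = (succ, split.get((node, succ), state))
            edges.append(((node, state), nxt))
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen, edges

nodes, edges = split_graph(CFG, SPLIT, "A")
print(sorted(f"{n}{s}" for n, s in nodes))
```

Under these assumptions the reachable pairs are exactly the node names listed for the split graph: D, E, F and G appear twice (states 1 and 2), so the facts {x = 1} and {x = 2} never meet, while H, I and J are not duplicated because the automaton returns to state 0 after G.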
![Page 23: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/23.jpg)
Energy Reduction: Array Alloc. for Partitioned Memory Arch.
• Dynamic energy reduction in the memory subsystem.
  • Memory subsystem consumes significant energy
  • Many embedded applications are array intensive
  • Memory architecture with multiple banks
• Exploiting various low-power modes of partitioned memory architectures.
  • Put idle memory banks in low-power mode
  • Allocate arrays to memory banks s.t. more memory banks can be in low-power mode for longer duration
![Page 24: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/24.jpg)
Partitioned Memory Architectures
• Memory banks with low-power modes: Active, Stand-by, Napping, Power-down, Disabled.
• Resynchronization time: time to move from a lower-power mode to Active mode

| Mode | Resynch. Time (cycles) | Energy Consumed (nJ) |
|---|---|---|
| Active | 0 | 0.718 |
| Standby | 2 | 0.468 |
| Napping | 30 | 0.0206 |
| Power Down | 9000 | 0.00875 |
![Page 25: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/25.jpg)
Motivating Example
Example:

    float a[N], d[N];
    double b[N], c[N];

    L1: for (ia = 0; ia < N; ia++)
            d[ia] = a[ia] + k;
    L2: for (ia = 0; ia < N; ia++)
            a[ia] = b[ia] * k;
    L3: for (ia = 0; ia < N; ia++)
            c[ia] = d[ia] / k;
    L4: for (ia = 0; ia < N; ia++)
            b[ia] = c[ia] - k;
    L5: for (ia = 0; ia < N; ia++)
            b[ia] = d[ia] + k;

• Arrays a, d ~ 1 MB each
• Arrays b, c ~ 2 MB each
• Memory bank size = 4 MB

[Figure: Array Relation Graph over the arrays a, b, c, d, with edge weights 2N, N, 8N, 4N, N.]
Memory banks active for a total of 32N cycles!
![Page 26: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/26.jpg)
Motivating Example -- Our Approach
• Array allocation requires partitioning the Array Relation Graph (ARG)!
• Graph partitioning such that each subgraph can be accommodated in a memory bank.
• Weights of edges across subgraphs are the cost of keeping multiple banks active together. Minimize them!
• Arrays b and c in one subgraph and a and d in another

[Figure: Array Relation Graph over the arrays a, b, c, d, with edge weights 2N, N, 8N, 4N, N.]
Memory banks active for a total of 23N cycles!
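The partitioning step can be illustrated with a small exhaustive search. This is a sketch under stated assumptions, not the method used in the talk: the array sizes and the 4 MB bank size come from the example, but the transcript does not preserve which ARG edge carries which weight, so the mapping in `arg_weights` is a guess, and `partition_arrays` is a hypothetical helper that simply tries every assignment.

```python
from itertools import product

def partition_arrays(sizes, weights, bank_size, n_banks):
    """Exhaustively assign arrays to banks, minimising cross-bank edge weight."""
    names = sorted(sizes)
    best_cost, best_assign = None, None
    for banks in product(range(n_banks), repeat=len(names)):
        assign = dict(zip(names, banks))
        used = [0] * n_banks
        for name, bank in assign.items():
            used[bank] += sizes[name]
        if max(used) > bank_size:
            continue  # some bank would overflow
        # Cost = total weight of edges whose endpoints sit in different banks.
        cost = sum(w for (u, v), w in weights.items() if assign[u] != assign[v])
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign

sizes_mb = {"a": 1, "d": 1, "b": 2, "c": 2}   # from the example
# HYPOTHETICAL edge-to-weight mapping, in units of N. The five values appear
# in the figure, but their placement on edges is an assumption.
arg_weights = {("b", "c"): 8, ("c", "d"): 4, ("a", "d"): 2,
               ("a", "b"): 1, ("b", "d"): 1}

cost, assign = partition_arrays(sizes_mb, arg_weights, bank_size=4, n_banks=2)
print(cost, assign)
```

With this assumed weighting the cheapest feasible split places b and c in one bank and a and d in the other, which agrees with the allocation stated on the slide. Exhaustive search is only workable for a handful of arrays; a real allocator would need a graph-partitioning heuristic.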
![Page 27: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/27.jpg)
Dynamic Voltage Scaling
• Dynamically vary the CPU frequency and supply voltage.
• Dynamic power is proportional to $C \cdot V^2 \cdot f$, where C is the capacitance, V the supply voltage, and f the operating frequency.
• Processors support different voltage (and frequency) modes and can switch between them.
• AMD, Transmeta, XScale provide support for DVS and have multiple operating frequencies.
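A quick numeric check of the proportionality shows why scaling voltage together with frequency pays off. The function name and the scale factors are illustrative, and the capacitance is assumed to stay fixed.

```python
def relative_dynamic_power(v_scale, f_scale):
    """Dynamic power relative to nominal when V and f are scaled (C fixed)."""
    return v_scale ** 2 * f_scale

# Halving both supply voltage and frequency cuts dynamic power to one eighth.
print(relative_dynamic_power(0.5, 0.5))  # 0.125
```

Lowering the frequency alone only stretches the same work over more time; the quadratic voltage term is what reduces the energy spent per unit of work.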
![Page 28: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/28.jpg)
Compiler Assisted DVS
• Identify program regions where DVS can be performed.
• For each program region, identify the voltage (freq.) mode to operate on, s.t. energy is minimized
• Ensure that performance is not degraded.
![Page 29: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/29.jpg)
Motivating Example
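| Freq. | | P1 | P2 | P3 | P4 | P5 | Total |
|---|---|---|---|---|---|---|---|
| 200 | Exec. Time | 151 | 6827 | 335 | 6827 | 335 | 14475 |
| 200 | Energy | 82 | 125 | 39 | 125 | 39 | 410 |
| 300 | Exec. Time | 100 | 4552 | 223 | 4552 | 223 | 9650 |
| 300 | Energy | 149 | 163 | 72 | 163 | 72 | 619 |
| 400 | Exec. Time | 76 | 3414 | 168 | 3414 | 168 | 7240 |
| 400 | Energy | 198 | 274 | 176 | 274 | 176 | 1098 |
| DVS | Freq. | 200 | 400 | 300 | 400 | 300 | -- |
| DVS | Exec. Time | 151 | 3414 | 223 | 3414 | 223 | 7425 |
| DVS | Energy | 82 | 274 | 72 | 274 | 72 | 778 |

With DVS, compared with running every region at 400: 2 % increase in execution time, 30 % decrease in energy.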
![Page 30: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/30.jpg)
DVS Problem Formulation
• Program divided into a number of regions.
• Assign an operating frequency for each program region.
• Constraint: marginal increase in exec. time of the program.
• Objective: minimizing program energy consumption.
• This is a Multiple Choice Knapsack Problem.
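The formulation can be tried out on the numbers from the motivating example. The sketch below is a brute-force solver for this small multiple-choice knapsack instance, not the solution method of the talk: one frequency is picked per region, total execution time must stay within a budget, and energy is minimised. The budget of 7425 is taken from the DVS row of the table; `choose_frequencies` is a hypothetical helper name.

```python
from itertools import product

FREQS = (200, 300, 400)
# (exec. time, energy) per region and frequency, copied from the motivating example.
REGIONS = {
    "P1": {200: (151, 82), 300: (100, 149), 400: (76, 198)},
    "P2": {200: (6827, 125), 300: (4552, 163), 400: (3414, 274)},
    "P3": {200: (335, 39), 300: (223, 72), 400: (168, 176)},
    "P4": {200: (6827, 125), 300: (4552, 163), 400: (3414, 274)},
    "P5": {200: (335, 39), 300: (223, 72), 400: (168, 176)},
}

def choose_frequencies(regions, time_budget):
    """Pick one frequency per region, minimising energy within the time budget."""
    names = sorted(regions)
    best = None
    for choice in product(FREQS, repeat=len(names)):
        time = sum(regions[n][f][0] for n, f in zip(names, choice))
        energy = sum(regions[n][f][1] for n, f in zip(names, choice))
        if time <= time_budget and (best is None or energy < best[0]):
            best = (energy, time, dict(zip(names, choice)))
    return best

energy, time, freqs = choose_frequencies(REGIONS, time_budget=7425)
print(energy, time, freqs)
```

The search selects the same assignment as the DVS row (200, 400, 300, 400, 300) with a total time of 7425. Summing the per-region energies of that row gives 774, whereas the total on the slide reads 778; the slide does not say what accounts for the difference. Enumeration is exponential in the number of regions, so larger programs would call for a dynamic-programming or ILP formulation of the knapsack.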
![Page 31: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/31.jpg)
Compiler Problem as Optimization Problem
• Integrated register allocation, spill code generation and scheduling in a software pipelined loop
• Problem: Given machine M, loop L, and a software pipelined schedule S with initiation interval II, perform register allocation, generate spill code if necessary, and schedule it such that the register requirement of the schedule ≤ number of registers and the resource constraints are met!
![Page 32: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/32.jpg)
Modeling Liverange
Live Range Representation
[Figure: live range of variable A, defined at time 0 (def) and used at time 7 (use), drawn against Register R0 … Register Rn over time steps 0 to 7. Decision variables TN A,r,t for r = 0 … n and t = 0 … 7.]
![Page 33: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/33.jpg)
Modeling Spill Stores
Store decision variables
[Figure: live range of variable A (def at time 0, later uses) drawn against Register R0 … Register Rn over time steps 0 to 7, with possible spill stores marked. Decision variables STN A,r,t for r = 0 … n and t = 1 … 4.]
Latencies: Load: 1, Store: 1, Instruction: 1
![Page 34: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/34.jpg)
Modeling Spill Loads
Load decision variables
[Figure: live range of variable A (def at time 0, later uses) drawn against Register R0 … Register Rn over time steps 0 to 7, with possible spill loads marked. Decision variables LTN A,r,t for r = 0 … n and t = 3 … 6.]
Latencies: Load: 1, Store: 1, Instruction: 1
![Page 35: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/35.jpg)
Constraints - Overview
• Every live range must be in a register at the definition time and the use time.
• Spill load can take place only if the spill store has already taken place.
• After a spill store, a live range can continue or cease to exist.
• Ensure that the spill loads and stores don't saturate the memory units.
• Minimize the number of spill loads and stores.
![Page 36: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/36.jpg)
Objective
• No objective function: just a constraint-solving problem!
• Minimize the number of spill loads and stores:
  $\min \sum_{i,r,t} \left( STN_{i,r,t} + LTN_{i,r,t} \right)$
![Page 37: Compiler Research in HPC Lab R. Govindarajan High Performance Computing Lab. govind@serc.iisc.ernet.in](https://reader034.vdocuments.site/reader034/viewer/2022052701/56649e115503460f94afca3f/html5/thumbnails/37.jpg)
Conclusions
• Compiler research is fun!
• It is cool to do compiler research!
• But, remember Proebsting’s Law: Compiler Technology Doubles CPU Power Every 18 YEARS!!
Plenty of opportunities in compiler research!
However, NO VACANCY in HPC lab this year!