towards automated design space exploration and code ... · towards automated design space...
TRANSCRIPT
![Page 1: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/1.jpg)
GPRM
Towards Automated Design Space Exploration
and Code Generation using
Type Transformations
S Waqar Nabi & Wim Vanderbauwhede
First International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'15)
Sunday, November 15, 2015
Austin, TX
www.tytra.org.uk
![Page 2: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/2.jpg)
Using Safe Transformations and a
Cost-Model for HPC on FPGAs
The TyTra project context
o Our approach, blue-sky target, down-to-earth target, where
we are now, how we are different
Key contributions
o (1) Type transformations to create design-variants, (2) a new
Intermediate Language, and (3) an FPGA Cost model
The cost model
o Performance and resource-usage estimates, some results
Using safe transformations and an associated light-weight cost-model opens the
route to a fully automated design-space exploration flow
![Page 3: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/3.jpg)
THE CONTEXTOur approach, blue-sky target, down-to-earth target, where we are now,
how we are different
![Page 4: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/4.jpg)
Blue Sky Target
![Page 5: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/5.jpg)
Blue Sky Target
Cost Model
Legacy Scientific Code
Heterogeneous HPC Target Description
Optimized HPC
solution!
The goal that keeps us motivated!
(The pragmatic target is somewhat more modest…)
![Page 6: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/6.jpg)
The Short-Term Target
Our focus is on FPGA targets, and we currently require design entry in a
Functional Language using High-Level Functions (maps, folds) [a kind of DSL]
![Page 7: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/7.jpg)
The cunning plan…
1. Use the functional programming paradigm to (auto) generate
program-variants which translate to design-variants on the
FPGA.
2. Create an Intermediate Language that:
• Is able to capture points entire design-space
• Allows a light-weight cost-model to be built around it
• Is a convenient target for front-end compiler
3. Create a light-weight cost-model that can estimate the
performance and resource-utilization for each variant.
7
A performance portable code-base that builds on a purely software programming
paradigm.
![Page 8: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/8.jpg)
And you may very well ask…
8
The jury is still out…
![Page 9: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/9.jpg)
How our work is different
Our observations on limitations of current tools and flows:
1. Design-entry in a custom high-level language which nevertheless has
hardware-specific semantics
2. Architecture of the FPGA-solution specified by programmer; compilers
cannot optimize it.
3. Solutions create soft-processors on the FPGA; not optimized for HPC
(orientation towards embedded applications)
4. Design-space exploration requires prohibitively long time
5. Compiler is application specific (e.g. DSP applications)
We are not there yet, but in principle, our approach entirely eliminates the first
four, and mitigates the fifth.
![Page 10: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/10.jpg)
KEY CONTRIBUTIONS(1) Type transformations for generating program variants, (2) a new
Intermediate Language, and (3) a light-weight Cost Model
![Page 11: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/11.jpg)
1. Type Transformations to Generate
Program Variants
Functional Programming
Types
o More general than types in C
o Our focus is on types of functions that perform array
operations
o reshape, maps and folds
Type transformations
o Can be derived automatically
o Provably correct
o Essentially reshape the arrays
A functional paradigm with high-level functions allows creation of design-variants
that are correct-by-construction.
![Page 12: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/12.jpg)
Illustration of Variant Generation through
Type-Transformation
• typeA :Vect (im*jm*km) dataType --1D data
• Single execution thread
• typeB :Vect km (Vect im*jm dataType)--transformed 2D data
• (km concurrent execution threads)
• output = mappipe kernel_func input --original program
• inputTr = reshapeTo km input --reshaping data
• output = mappar (mappipe kernel_func) inputTr --new program
Simple and provably correct transformations in a high-level functional language
translates to design-variants on the FPGA.
![Page 13: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/13.jpg)
• Manage-IR
• Deals with
• memory objects (arrays)
• streams (loops over arrays)
• offset streams
• loops over work-unit
• block-memory transfers
Strongly and statically typed
All computations expressed as SSA (Single-Static Assignments)
Largely (and deliberately) based on the LLVM-IR
• Compute-IR
• Streaming model
• SSA instructions define
the datapath
2. A New Intermediate Language
![Page 14: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/14.jpg)
2. A New Intermediate Language
![Page 15: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/15.jpg)
Design Space
Estimation Space
The Cost Model
3. Cost Model
![Page 16: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/16.jpg)
THE FPGA COST-MODELPerformance Estimate, Resource-utilization estiamte, Experimental
Results
![Page 17: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/17.jpg)
The Cost-Model Use-Case
17
A set of standardized experiments feed target-specific empirical data to the cost
model, and the rest comes from the IR descripition.
![Page 18: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/18.jpg)
Two Types of Estimates
Resource-Utilization Estimates
o ALUTs, REGs, DSPs
Performance Estimates
o Estimating memory-access
bandwidth for specific data
patterns
o Estimating FPGA operating
frequency
18
Both estimates needed to allow compiler to choose the best design variant.
![Page 19: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/19.jpg)
1. Resource Estimates
Observation
o Regularity of FPGA fabric allows some very simple first or second order
expressions to be built up for most instructions based on a few
experiments.
Key Determinants
o Primitive (SSA) instructions used in IR of the kernel functions
o Data-types
o Structure of various functions (par, comb, par, seq)
o Control logic over-head
19
A set of one-time simple synthesis experiments on the target device helps us
create a very accurate resource-utilization cost model
![Page 20: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/20.jpg)
Resource Estimates - Example
20
Integer Division
Integer Multiplication
Light-weight cost expressions associated with every legal SSA instruction in the
TyTra-IR
![Page 21: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/21.jpg)
2. Performance Estimate
Effective Work-Unit Throughput (EWUT)
o Work-Unit = Executing the kernel over the entire index-space
Key Determinants
o Memory execution model
o Sustained memory bandwidth for the target architecture and design-
variant
• Data-access pattern
o Design configuration of the FPGA
o Operating frequency of the FPGA
o Compute-bound or IO-bound?
21
Performance model is trickier, especially calculating estimates of sustained
memory bandwidth and FPGA operating frequency.
![Page 22: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/22.jpg)
2. Performance Estimate
Effective Work-Unit Throughput (EWUT)
o Work-Unit = Executing the kernel over the entire index-space
Key Determinants
o Memory execution model
o Sustained memory bandwidth for the target architecture and design-
variant
• Data-access pattern
o Design configuration of the FPGA
o Operating frequency of the FPGA
o Compute-bound or IO-bound?
22
Performance model is trickier, especially calculating estimates of sustained
memory bandwidth and FPGA operating frequency.
![Page 23: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/23.jpg)
Performance Estimate
Dependence on Memory Execution Model
Time
Activity
Host
Device-DRAM
Device-DRAM
Device-Buffers
Device-Buffers
Offset-Buffers
Kernel Pipeline
Execution
Three Types of memory executions
A given design-variant can be categorized based on:
- Architectural description
- IR description
![Page 24: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/24.jpg)
Performance Estimate
Dependence on Memory Execution Model
Time
Activity
Host
Device-DRAM
Device-DRAM
Device-Buffers
Device-Buffers
Offset-Buffers
Kernel Pipeline
Execution
Three Types of memory executions
A given design-variant can be categorized based on:
- Architectural description
- IR description
![Page 25: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/25.jpg)
Performance Estimate
Dependence on Memory Execution Model
Time
Activity
Host
Device-DRAM
Device-DRAM
Device-Buffers
Device-Buffers
Offset-Buffers
Kernel Pipeline
Execution
Work-Unit Iterations
Type AAll iterations
![Page 26: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/26.jpg)
Performance Estimate
Dependence on Memory Execution Model
Time
Activity
Host
Device-DRAM
Device-DRAM
Device-Buffers
Device-Buffers
Offset-Buffers
Kernel Pipeline
Execution
First Iteration
only
Last Iteration
only
Work-Unit Iterations
Type B
All other
iterations
![Page 27: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/27.jpg)
Performance Estimate
Dependence on Memory Execution Model
Time
Activity
Host
Device-DRAM
Device-DRAM
Device-Buffers
Device-Buffers
Offset-Buffers
Kernel Pipeline
Execution
First Iteration
only
Last Iteration
only
Work-Unit Iterations
Type C
All other
iterations
Once a design-variant is categorized, performance can be estimated accordingly
![Page 28: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/28.jpg)
2. Performance Estimate
Effective Work-Unit Throughput (EWUT)
o Work-Unit = Executing the kernel over the entire index-space
Key Determinants
o Memory execution model
o Sustained memory bandwidth for the target architecture and
design-variant
• Data-access pattern
o Design configuration of the FPGA
o Operating frequency of the FPGA
o Compute-bound or IO-bound?
28
Performance model is trickier, especially calculating estimates of sustained
memory bandwidth and FPGA operating frequency.
![Page 29: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/29.jpg)
Performance Estimate
Dependence on Data Access Pattern
We have defined a rho (ρ) factor defined as a scaling factor of the
peak memory bandwidth
Varies from 0-1
Based on data-access pattern
Derived empirically through one-time standardized experiments on
target node
29
![Page 30: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/30.jpg)
2. Performance Estimate
Effective Work-Unit Throughput (EWUT)
o Work-Unit = Executing the kernel over the entire index-space
Key Determinants
o Memory execution model
o Sustained memory bandwidth for the target architecture and design-
variant
• Data-access pattern
o Design configuration of the FPGA
o Operating frequency of the FPGA
o Compute-bound or IO-bound?
30
Performance model is trickier, especially calculating estimates of sustained
memory bandwidth and FPGA operating frequency.
Determined from the IR
description of design-variant
![Page 31: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/31.jpg)
Performance Estimates
The Parameters and their Evaluation
31
![Page 32: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/32.jpg)
Performance Estimates
Parameters from Architecture Description
32
![Page 33: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/33.jpg)
Performance Estimates
Parameters Calculated Empirically
33
![Page 34: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/34.jpg)
Performance Estimates
Parameters derived from IR description of Kernel
34
![Page 35: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/35.jpg)
Performance Estimates
The Expressions
35
![Page 36: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/36.jpg)
Performance Estimates
The Expressions
36
![Page 37: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/37.jpg)
Performance Estimates
The Expressions
37
![Page 38: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/38.jpg)
Performance Estimates
The Expressions
38
![Page 39: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/39.jpg)
Performance Estimates
The Expressions
39
![Page 40: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/40.jpg)
Performance Estimates
The Expressions
40
![Page 41: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/41.jpg)
Performance Estimates
Experimental Results (Type C)
41
Estimated (E) vs actual (A) cost and throughput for C2 and
C1 configurations of a successive over-relaxation kernel. [Note that the cycles/kernel are estimated very accurately, but the Effective Workgroup Throughput
(EWGT) is off because of inaccuracy of frequency estimate for the FPGA]
![Page 42: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/42.jpg)
Does the TyTra Approach Work?
![Page 43: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/43.jpg)
Design-Space Exploration?
![Page 44: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/44.jpg)
CONCLUSION
![Page 45: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/45.jpg)
The route to Automated Design Space
Exploration on FPGAs for HPC Applications
The larger aim is to create a turn-key compiler for:
Legacy scientific code Heterogeneous HPC Platform
o Current focus is on FPGAs, and on using a Functional
Language design entry
Our main contributions are:
o Type transformations to create design-variants,
o New Intermediate Language, and
o FPGA Cost model
Our FPGA Cost Model
o Works on the TyTra-UIR, is light-weight, accurate (enough),
and allows us to evaluate design-variants
Using safe transformations on a functional language paradigm and a light-weight
cost-model to brings us closer to a turn-key HPC compiler for legacy code
![Page 46: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/46.jpg)
46
AcknowledgementWe wish to acknowledge support
by EPSRC through grant EP/L00058X/1.
The woods are lovely, dark and deep, But I have promises to keep,
And lines to code before I sleep, And lines to code before I sleep.
![Page 47: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/47.jpg)
EXTRAS
![Page 48: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/48.jpg)
Parallel Approaches
• What we do is very similar to:
• Loop optimizations to accelerate a scientific application
• Using skeletons to create a high-level abstraction for parallel
programming
• Tools that automatically explore design-space
![Page 49: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/49.jpg)
Our Approach to a Light-Weight
Cost Model An IR sufficiently low-level to expose the parameters needed for the
o The TyTra-IR has sufficient structural information to associate it directly
with resources on an FPGA
Because TyTra-IR is a customized language, we can ensure that:
o All legal instructions (and structures) have a cost associated with them
o As long as the front-end compiler can target a HLL on the TyTra-IR, we
can cost HL program variants
o Costing resources on specific FPGA devices, and estimating memory
bandwidth for various patterns on the target node, requires some
empirical data.
• We are working on creating a set of standardized experiments that
49
We are not there yet, but in principle, our approach entirely eliminates all these
limitations.
![Page 50: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/50.jpg)
Quite a few avenues…• Experiment with more kernels, their program-variants, estimated vs actual costs, (correct) code-generation. Use
(CHStone) benchmarks.
• Computation-aware caches, optimized for halo-based scientific computations
• Integrate with Altera-OpenCL platform for host-device communication
• Back-end optimizations, LLVM passes, LLVM TyTra-IR translation
• Route to TyTra-IR from SAC
• Integrate Tytra-FPGA flow with SACGPU(OpenCL flow) for heterogeneous targets
• Use of Multi-party Session Types to ensure correctness of transformations• Even code-generation for clusters?
• Abstract descriptions of target hardware
• SystemC-TLM model to profile application and high-level partitioning in a heterogeneous environment
50
![Page 51: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/51.jpg)
Quite a few avenues…• Experiment with more kernels, their program-variants, estimated vs actual costs, (correct) code-generation. Use
(CHStone) benchmarks.
• Computation-aware caches, optimized for halo-based scientific computations
• Integrate with Altera-OpenCL platform for host-device communication
• Back-end optimizations, LLVM passes, LLVM TyTra-IR translation
• Route to TyTra-IR from SAC
• Integrate Tytra-FPGA flow with SACGPU(OpenCL flow) for heterogeneous targets
• Use of Multi-party Session Types to ensure correctness of transformations• Even code-generation for clusters?
• Abstract descriptions of target hardware
• SystemC-TLM model to profile application and high-level partitioning in a heterogeneous environment
51
etcetera, etcetera,
etcetera
![Page 52: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/52.jpg)
The platform model
for TyTra (FPGA)
52
![Page 53: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/53.jpg)
The Manage-IR; Memory Objects
53
TyTra-IR OpenCL view LLVM-SPIR View Hardware (FPGA)
CmemConstant Memory Constant Memory 3: Constant
ImemInstruction Memory Constant Memory DistRAM / BRAM
PipememPipeline registers DistRAM
PmemPrivate Memory(Data Mem for Instruc’ Proc’) Private Memory 0: Private DistRAM
CachememData (and Constant) Cache DistRAM / BRAM
LmemLocal (shared) memory Local Memory 4: Local M20K (BRAM) or Dist RAM
GmemGlobal memory Global Memory 1: Global On-board DRAM
HmemHost memory Host Memory Host communication
![Page 54: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/54.jpg)
The Manage-IR; Stream Objects
54
Can have a 1-1 or Many-1 relation with memory
objects
Have a 1-1 relation with arguments to pipe functions
(i.e. port connections to compute-cores)
![Page 55: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/55.jpg)
The Manage-IR; repeat blocks
• Repeatedly call a kernel without referring back to the host
(outer-loop)
• May involve block memory transfers between iterations
55
![Page 56: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/56.jpg)
The Manage-IR; stream windows
• Access offsets in streams
• Use on-chip buffers for storing data read from memory
56
![Page 57: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/57.jpg)
The Compute-IR
• Structural semantics• @function_name (…args…) par
• @function_name (…args…) seq
• @function_name (…args…) pipe
• @function_name (…args…) comb
• Nesting these functions gives us the expressiveness to explore various
parallelism configurations
• Streaming ports
• Counters and nested counters
• SSA data-path instructions
57
![Page 58: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/58.jpg)
Example: Simple Vector Operation
The Kernel
![Page 59: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/59.jpg)
Version 1 – Single Pipeline (C2)
![Page 60: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/60.jpg)
Core_Compute
addWrmul
ad
daddRd
lmem
a
lmem
b
lmem
c
lmem
y
Str
ea
m C
on
tro
l
Str
ea
m C
on
tro
l
Version 1 – Single Pipeline (C2)
![Page 61: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/61.jpg)
Core_Compute
addWrmul
ad
d
Version 1 – Single Pipeline
addRd
lmem
a
lmem
b
lmem
c
lmem
y
Str
ea
m C
on
tro
l
Str
ea
m C
on
tro
l
The parser can also
automatically find ILP
and schedule in an
ASAP fashion
![Page 62: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/62.jpg)
Version 2 – 4 Parallel Pipelines (C1)
![Page 63: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/63.jpg)
Core_Compute
addWrmul
ad
d
Version 2 – 4 Parallel Pipelines
addRd
lmem
y
Str
ea
m C
on
tro
l
Core_Compute
addWrmul
ad
daddRd
Core_Compute
addWrmul
ad
daddRd
Core_Compute
addWrmul
ad
daddRd
lmem
a
lmem
b
lmem
c
Str
eam
Contr
ol
![Page 64: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/64.jpg)
Core_Compute
addWrmul
ad
d
Version 2 – 4 Parallel Pipelines
addRd
lmem
y
Str
ea
m C
on
tro
l
Core_Compute
addWrmul
ad
daddRd
Core_Compute
addWrmul
ad
daddRd
Core_Compute
addWrmul
ad
daddRd
lmem
a
lmem
b
lmem
c
Str
eam
Contr
ol
![Page 65: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/65.jpg)
Version 3 – Scalar Instruction Processor
(C4)
![Page 66: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/66.jpg)
Core_Compute
Version 3 – Scalar Instruction Processor
(C4)
lmem
a
lmem
b
lmem
c
lmem
yS
tre
am
Co
ntr
ol
Str
eam
Contr
ol
PE(Instruction Processor)
{addaddmuladd
}
AL
U
The ALU would be
customized for the
instructions mapped to this
PE at compile-time
![Page 67: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/67.jpg)
Core_Compute
Version 3 – Single Sequential Processor
lmem
a
lmem
b
lmem
c
lmem
yS
tre
am
Co
ntr
ol
Str
eam
Contr
ol
Generic PE
{addaddmuladd
}
AL
U
![Page 68: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/68.jpg)
Version 4 – Multiple Processors /
Vectorization (C5)
![Page 69: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/69.jpg)
Core
_C
om
put e
Version 4 – Multiple Processors / Vectorization
(C5)
Generic PE
{ addaddmuladd }
AL
U
lmem
a
lmem
b
lmem
c
Str
ea
m C
on
tro
l
lmem
y
Str
ea
m C
on
tro
l
Co
re_
Co
mp
ut e
Generic PE
{ addaddmuladd }
AL
U
Co
re_
Co
mp
ut e
Generic PE
{ addaddmuladd }
AL
U
Core
_C
om
put e
Generic PE
{ addaddmuladd }
AL
U
![Page 70: Towards Automated Design Space Exploration and Code ... · Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede](https://reader034.vdocuments.site/reader034/viewer/2022042612/5f57c19f95d167585809ae41/html5/thumbnails/70.jpg)
Core
_C
om
put e
Version 4 – Multiple Sequential
Processors (Vectorization)
Generic PE
{ addaddmuladd }
AL
U
lmem
a
lmem
b
lmem
c
Str
ea
m C
on
tro
l
lmem
y
Str
ea
m C
on
tro
l
Co
re_
Co
mp
ut e
Generic PE
{ addaddmuladd }
AL
U
Co
re_
Co
mp
ut e
Generic PE
{ addaddmuladd }
AL
U
Core
_C
om
put e
Generic PE
{ addaddmuladd }
AL
U
Note the continued use of stream
abstractions even through the PEs are
Instruction Processors now