Compilers as Collaborators and Competitors of High-Level Specification Systems
David Padua
University of Illinois at Urbana-Champaign
Towards a Synthesis
There is much interaction and overlap between compilers and code generation from very high level specifications.
Both technologies could merge into a “supercompiler” technology. Thesis, antithesis, synthesis.
Higher Levels of Abstraction…
One of the main goals of Software Research is to facilitate program development.
Raise the level of abstraction: what rather than how.
- Subroutines: control abstraction
- Data abstraction mechanisms
… Higher Levels of Abstraction
Programming is simplified by using macro operations from a catalog.
- Modules (subroutines/classes/…)
  - Part of the language (Fortran 90, MATLAB, SETL)
  - Standard libraries: hand-written or automatically generated
  - Application specific (usually hand-written)
Performance and Abstraction
In many cases the main mechanism to attain high performance is to develop high-performance library routines. For example, the MATLAB programming style is to use library functions as much as possible.
This approach does not always work. Real applications make little use of pre-existing libraries.
- One reason: data structures are not always in the right format.
- Another: the overhead associated with class accesses.
For this reason, with current technology, higher level => lower performance.
Automatic Generation of Modules from Specifications…
Several systems aim at generating the fastest possible routines for certain classes of computations:
- The algorithms are relatively simple.
- A very high-performance implementation can be tedious and time-consuming to write by hand.
Examples of these systems include ATLAS, FFTW, and SPIRAL.
… Automatic Generation of Modules from Specifications
Other systems try to simplify the generation of complete applications. Although performance is also a concern, language design and correctness are the most important issues. Examples: ELLPACK, GPSS, many CAD systems.
ATLAS
Generate several versions of BLAS routines:
- Different tile sizes
- Different degrees of unrolling
- Loop ordering is fixed
Run all and choose the fastest.
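The generate-and-measure loop above can be sketched in a few lines. The pure-Python blocked multiply, the matrix size, and the candidate tile sizes below are illustrative stand-ins (ATLAS generates and times C kernels); only the empirical-search structure is the point.

```python
import random
import time

def blocked_matmul(A, B, n, bs):
    """Multiply two n x n matrices (lists of lists) using bs x bs tiles."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a_ik, B_k, C_i = A[i][k], B[k], C[i]
                        for j in range(jj, min(jj + bs, n)):
                            C_i[j] += a_ik * B_k[j]
    return C

def autotune(n=64, candidates=(4, 8, 16, 32)):
    """ATLAS-style empirical search: run every candidate tile size on the
    same inputs and keep the fastest one."""
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    B = [[random.random() for _ in range(n)] for _ in range(n)]
    best_bs, best_time = None, float("inf")
    for bs in candidates:
        start = time.perf_counter()
        blocked_matmul(A, B, n, bs)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_bs, best_time = bs, elapsed
    return best_bs
```

In a real generator the loop order and unrolling factors would be search dimensions as well; here only the tile size varies.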
FFTW
Recursive divide-and-conquer.
- Plan: a factorization tree; factorization stops at certain sizes.
- Execution: call codelets.
Codelets are subroutines for small-size FFTs, optimized and fully unrolled, generated by a dedicated compiler.
FFTW adapts to the environment at run time using dynamic programming.
[Figure: an example factorization tree. F1024 splits into F128 and F8; F128 splits into F16 and F8.]
F_rs = (I_r ⊗ F_s) L (F_r ⊗ I_s) T

where ⊗ is the tensor (Kronecker) product, L a stride permutation, and T a diagonal matrix of twiddle factors.
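The dynamic-programming planner can be sketched as follows. The codelet sizes, their costs, and the merge-cost model are made-up placeholders (FFTW measures real codelets on the target machine); the sketch only illustrates searching over factorization trees and memoizing the best plan for each size.

```python
from functools import lru_cache

# Hypothetical codelet costs; in FFTW these come from timing real codelets
# on the target machine.
CODELET_COST = {2: 1.0, 4: 1.8, 8: 3.0, 16: 7.5}

@lru_cache(maxsize=None)
def plan(n):
    """Return (cost, tree) for an FFT of size n.

    Dynamic programming over binary factorizations n = r * s: take the
    cheaper of calling a codelet directly (when one exists) and recursing
    on the two factors.  The 0.5 * n combine cost is a made-up model.
    """
    best = (CODELET_COST[n], n) if n in CODELET_COST else (float("inf"), None)
    r = 2
    while r * r <= n:
        if n % r == 0:
            cost_r, tree_r = plan(r)
            cost_s, tree_s = plan(n // r)
            cost = cost_r + cost_s + 0.5 * n  # placeholder merge cost
            if cost < best[0]:
                best = (cost, (tree_r, tree_s))
        r += 1
    return best
```

Calling `plan(1024)` returns a nested tuple describing a factorization tree like the one in the figure; memoization is what makes the search over all trees affordable.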
SPIRAL
[Diagram: the SPIRAL code generation loop. A DSP transform specification enters the Formula Generator, which emits SPL formulae; the SPL Compiler translates them into C/FORTRAN programs; Performance Evaluation times the generated code on the target architecture and against existing DSP libraries; a Search Engine feeds the results back to guide formula generation.]
Supercompilers …
Integration of Very High Level Specifications with Conventional Languages
Besides conventional subroutines (selected from a catalog), the languages accepted by supercompilers would also call “macros” which could be used to generate code as a function of:
- the target machine
- the value of the data
- the structure of the data
- the shape of the data
- the rest of the program
- numerical properties
… Supercompilers …
Macros could be subroutines or class methods.
Expanding classes could include data representation selection (including data distribution):
- SETL
- Automatic dense/sparse techniques
- Automatic data distribution techniques
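Automatic dense/sparse selection can be illustrated with a small sketch. The function and its 10% density threshold are illustrative assumptions, not a rule from the talk; the point is that the representation is chosen from a property of the data rather than fixed in the source program.

```python
# Hypothetical sketch of automatic dense/sparse representation selection:
# pick a representation for a matrix from the fraction of nonzeros.  The
# 10% threshold is an illustrative assumption.

def choose_representation(matrix, threshold=0.10):
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    nnz = sum(1 for row in matrix for x in row if x != 0)
    if rows * cols and nnz / (rows * cols) < threshold:
        # Sparse: keep only (row, col, value) triples.
        return ("sparse", [(i, j, x)
                           for i, row in enumerate(matrix)
                           for j, x in enumerate(row) if x != 0])
    # Dense: keep the full 2-D array.
    return ("dense", matrix)
```

A supercompiler would make this decision at compile time (or via multiversioning at run time) and specialize all downstream code to the chosen layout.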
… Supercompilers
In theory at least, generating code from specifications rather than from specific HLL implementations should lead to better performance.
All the benefits of abstraction without the performance penalty.
Vectorizers and High Level Specifications
Original loop:

do i=1,n
  a(i) = b(i) + c(i)
  d(i) = a(i) + d(i-1)
  if (m > d(i)) m = d(i)
end do

After loop distribution:

do i=1,n
  a(i) = b(i) + c(i)
end do
do i=1,n
  d(i) = a(i) + d(i-1)
end do
do i=1,n
  if (m > d(i)) m = d(i)
end do

After vectorization (lin-rec solves the linear recurrence):

a(1:n) = b(1:n) + c(1:n)
d(1:n) = lin-rec(a, d, 1, n)
m = min(m, minval(d(1:n)))
Back End Compilers and Supercompilers …
Back-end compilers take care of:
- Machine code generation
- Register allocation
- Conventional optimizations
But they are not really trusted by today’s module generation systems (competitors):
- The existence of ATLAS is just an indictment of current compiler technology.
- FFTW does clustering to improve register allocation.
- SPIRAL does a variety of conventional optimizations.
Optimizations in Spiral
- Formula Generator: high-level scheduling, loop transformations
- SPL Compiler: high-level optimizations (constant folding, copy propagation, common subexpression elimination, dead code elimination)
- C/Fortran Compiler: low-level optimizations (instruction scheduling, register allocation)
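Two of the SPL compiler's listed optimizations can be illustrated on toy expression trees written as nested tuples. This is a minimal sketch, not SPIRAL's implementation.

```python
# Minimal sketch (not SPIRAL's implementation) of two listed optimizations,
# applied to expression trees written as nested tuples like ("+", x, y).

def fold(expr):
    """Constant folding: evaluate operators whose operands are constants."""
    if not isinstance(expr, tuple):
        return expr
    op, a, b = expr[0], fold(expr[1]), fold(expr[2])
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return {"+": a + b, "-": a - b, "*": a * b}[op]
    return (op, a, b)

def cse(expr, seen=None, out=None):
    """Common subexpression elimination: name each distinct subexpression
    once.  Returns (value, assignments), assignments as (temp, expr) pairs."""
    if seen is None:
        seen, out = {}, []
    if not isinstance(expr, tuple):
        return expr, out
    op, a, b = expr
    a, _ = cse(a, seen, out)
    b, _ = cse(b, seen, out)
    key = (op, a, b)
    if key not in seen:
        seen[key] = f"t{len(seen)}"
        out.append((seen[key], key))
    return seen[key], out
```

On fully-unrolled FFT codelets, which are long straight-line expression sequences, exactly these passes do most of the work.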
Can Module Generators Rely on Back End Compilers ?
Not always, but using back-end compilers will always be necessary for portability (collaborators).
But compilers can also hinder efforts to get good performance. For example, bad register allocation can have a serious negative impact.
We need a standard set of commands to control the transformations applied by the compiler.
… Back End Compilers and Supercompilers
In Supercompilers transformations should be done by the Back End whenever possible.
Reason: the back end applies to all parts of the program, not only to the very high-level components.
Search …
Search is an important component of module generators.
Search is also used by conventional compilers, but compilers usually work with static predictions rather than actual execution times.
- KAP tried all possible loop permutations.
- SGI-PRO tries many combinations of unrolling.
- Superoptimizer and similar systems.
Most compiler optimization algorithms are heuristics with no search involved.
… Search …
In Supercompilers search could also be done across several algorithms looking for a good data representation and data distribution for the whole program.
… Search …
The search strategy could make use of actual execution times combined with static performance prediction:
- Static prediction is not very accurate today.
- Tight performance bounds could prune the search.
Some decisions could be made at run time:
- IF statements / multiversion loops
- JIT compilers
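The multiversion-loop idea can be illustrated with a small sketch. The function and its dependence test are hypothetical; the point is that one of two compiled versions, a dependence-free "vector" form and a sequential recurrence form, is selected by an IF on the data at run time.

```python
# Hypothetical sketch of a multiversion loop: the same computation compiled
# two ways, with a run-time IF selecting the version.  When shift == 0 there
# is no cross-iteration dependence and a "vector" version can run; otherwise
# a sequential version handles the recurrence d(i) = a(i) + d(i - shift).

def update(d, a, shift):
    if shift == 0:
        # Dependence-free version: every element computed independently.
        return [ai + di for ai, di in zip(a, d)]
    # Sequential version: each element depends on an earlier result.
    out = list(d)
    for i in range(len(a)):
        out[i] = a[i] + (out[i - shift] if i >= shift else 0)
    return out
```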
… Search
Some search could be based on data-dependent behavior:
- Profiling
- A “representative” data set
The search strategy is important because the space of possibilities is often large and not monotonic, and it is difficult to know how far the search process is from the optimum. We need to develop tight bounds.
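How tight bounds prune an empirical search can be sketched as follows. Here `lower_bound` (static performance prediction) and `measure` (actual execution time) are hypothetical callables supplied by the caller.

```python
# Sketch of bound-based pruning in an empirical search.  `lower_bound`
# (static performance prediction) and `measure` (actual execution time)
# are hypothetical callables supplied by the caller.

def search(candidates, lower_bound, measure):
    """Return the best candidate, never measuring one whose static lower
    bound already exceeds the best time observed so far."""
    best, best_time = None, float("inf")
    # Visit the most promising predictions first so that early measurements
    # prune as much of the remaining space as possible.
    for cand in sorted(candidates, key=lower_bound):
        if lower_bound(cand) >= best_time:
            break  # sorted order: everything after this point is pruned too
        t = measure(cand)
        if t < best_time:
            best, best_time = cand, t
    return best
```

The tighter the bound, the earlier the break fires; with a trivial bound this degenerates to exhaustive measurement.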
Size of Search Space

N     # of formulas      N      # of formulas
2^1   1                  2^9    20,793
2^2   1                  2^10   103,049
2^3   3                  2^11   518,859
2^4   11                 2^12   2,646,723
2^5   45                 2^13   13,649,969
2^6   197                2^14   71,039,373
2^7   903                2^15   372,693,519
2^8   4,279              2^16   1,968,801,519
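The table can be reproduced by a simple counting rule: take the number of formulas for a size-2^N transform to be the number of ways to rewrite it, recursively, as an ordered product of two or more smaller power-of-two transforms, with F_2 as the base case. This rule is inferred from the numbers, not stated on the slide; it matches every entry except N = 2^13, for which it gives 13,648,869, suggesting the 13,649,969 above is a transcription error.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_formulas(n):
    """Formulas for a size-2**n transform: ways to split the exponent n into
    an ordered sum of two or more parts, expanding each part recursively."""
    if n == 1:
        return 1  # F_2 has a single formula
    # Compositions of n with >= 2 parts: first part k, then any ordered
    # sequence of expanded parts summing to n - k (counted by seqs).
    return sum(num_formulas(k) * seqs(n - k) for k in range(1, n))

@lru_cache(maxsize=None)
def seqs(n):
    """Ordered sequences of expanded parts summing to n (empty for n = 0)."""
    if n == 0:
        return 1
    return sum(num_formulas(k) * seqs(n - k) for k in range(1, n + 1))
```

The growth is roughly a factor of five per doubling of the transform size, which is why exhaustive search stops being viable and pruning and dynamic programming matter.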
Coverage
Need a class of specifications large enough to represent most of the computation.
The effectiveness of the approach will depend on coverage.
- Current libraries are a good start.
- But it is not clear how much these libraries typically cover.
To impact programming in general, current approaches would have to be extended to other domains such as sparse computations, sorting, searching, …
Conclusions
As we better understand algorithm choices and their impact on performance, it becomes feasible to automate much of the process of selecting data structures and algorithms to maximize performance.
A first step: a repository of routines/classes with several implementations for each subroutine.
But generation based on context could lead to better performance.
In particular, generation from very high-level specifications could allow the generation of code that combines several operations in ways that are impossible to conceive with current encapsulation mechanisms.