Compiler Research - Louisiana State University
Compilers as Collaborators and Competitors of High-Level Specification Systems
David Padua
University of Illinois at Urbana-Champaign

Towards a Synthesis
There is much interaction and overlap between compilers and code generation from very high-level specifications.
Both technologies could merge into a "supercompiler" technology.

Thesis, antithesis → synthesis
Higher Levels of Abstraction…

One of the main goals of software research is to facilitate program development.
Raise the level of abstraction: what rather than how.

- Subroutines: control abstraction
- Data abstraction mechanisms
… Higher Levels of Abstraction

Programming is simplified by using macro operations from a catalog.
Modules (subroutines/classes/…) can be:

- Part of the language (Fortran 90, MATLAB, SETL)
- Standard libraries
  - Hand-written
  - Automatically generated
- Application specific (usually hand-written)
Performance and Abstraction

In many cases the main mechanism to attain high performance is to develop high-performance library routines.

This approach does not always work: real applications make little use of pre-existing libraries.

For example, MATLAB programming style is to use functions as much as possible.
- One reason: data structures are not always in the right format.
- Another: the overhead associated with class accesses.

For this reason, with current technology:
Higher level => lower performance
Automatic Generation of
Modules from Specifications…

Several systems aim at generating the
fastest possible routines for certain classes
of computations



Relatively simple (algorithms)
Very high performance implementation can be
tedious and time consuming.
Examples of these systems include



ATLAS
FFTW
Spiral
… Automatic Generation of Modules from Specifications

Other systems try to simplify the generation of complete applications. Although performance is also a concern, language design and correctness are the most important issues:

- Ellpack
- GPSS
- Many CAD systems
ATLAS

Generates several versions of BLAS routines:

- Different tile sizes
- Different degrees of unrolling
- Loop ordering is fixed

Runs them all and chooses the fastest.
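The generate-and-measure strategy can be sketched in miniature. The following Python sketch is my own illustration, not ATLAS code: it times a pure-Python tiled matrix multiply at several candidate tile sizes and keeps the fastest, whereas ATLAS generates and compiles specialized C kernels.

```python
import random
import timeit

def tiled_matmul(A, B, n, tile):
    # C = A @ B with square blocking; one candidate "version" of the kernel.
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + tile, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + tile, n)):
                            Ci[j] += a * Bk[j]
    return C

def autotune(n=48, tiles=(4, 8, 16, 48)):
    # Empirical search in the ATLAS style: run every candidate version
    # on real inputs and keep the one with the best measured time.
    random.seed(0)
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    B = [[random.random() for _ in range(n)] for _ in range(n)]
    timings = {t: min(timeit.repeat(lambda: tiled_matmul(A, B, n, t),
                                    number=1, repeat=3))
               for t in tiles}
    return min(timings, key=timings.get), timings
```

The winning tile size depends on the machine running the search, which is exactly the point of measuring instead of predicting.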
FFTW

Based on the Cooley–Tukey factorization (for N = r·s):

    F_rs = (F_r ⊗ I_s) T (I_r ⊗ F_s) L

where T is the diagonal matrix of twiddle factors and L is a stride permutation.

Recursive divide-and-conquer:

- Plan: a factorization tree (e.g., F1024 → F8 × F128, F128 → F8 × F16), chosen by dynamic programming; factorization stops at certain sizes.
- Codelets: subroutines for small-size FFTs, optimized and fully unrolled, generated by a dedicated compiler.
- Execution: call the codelets following the plan.

FFTW thus adapts to the environment at run time.
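The dynamic-programming plan selection can be sketched as a memoized recursion over factorization trees. This Python sketch is illustrative only: the codelet sizes and the cost model (quadratic codelet cost, linear twiddle/permutation overhead) are invented stand-ins for the run-time measurements a real planner would use.

```python
from functools import lru_cache

# Sizes for which fully unrolled "codelets" are assumed to exist.
CODELET_SIZES = {2, 4, 8, 16, 32}

def codelet_cost(n):
    # Stand-in for a measured codelet time.
    return n * n * 0.1

@lru_cache(maxsize=None)
def best_plan(n):
    """Return (cost, plan) for an n-point FFT, n a power of two.

    A plan is ('codelet', n) or ('split', r, s), meaning F_n is computed
    recursively from F_r and F_s with n = r * s.
    """
    best = (float('inf'), None)
    if n in CODELET_SIZES:
        best = (codelet_cost(n), ('codelet', n))
    r = 2
    while r * r <= n:
        if n % r == 0:
            s = n // r
            cr, _ = best_plan(r)
            cs, _ = best_plan(s)
            # Cooley-Tukey: s copies of F_r plus r copies of F_s,
            # plus an assumed linear twiddle/permutation term.
            cost = s * cr + r * cs + n
            if cost < best[0]:
                best = (cost, ('split', r, s))
        r *= 2
    return best
```

With this (arbitrary) cost model, larger sizes are solved by splitting until a codelet size is reached; memoization plays the role of the dynamic-programming table over subproblem sizes.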
SPIRAL

A DSP transform is fed to a formula generator, which produces SPL formulae; the SPL compiler translates these into C/FORTRAN programs, whose performance is evaluated on the target architecture. A search engine uses the measurements to steer formula generation, and the best implementations populate the DSP libraries.
Supercompilers …

Integration of very high-level specifications with conventional languages.
Besides conventional subroutines (selected from a catalog), the languages accepted by supercompilers would also call "macros," which could be used to generate code as a function of the:

- Target machine
- Value of data
- Structure of data
- Shape of data
- Rest of the program
- Numerical properties
… Supercompilers …

Macros could be subroutines or class methods. Expanding classes could include data representation selection (including data distribution):

- SETL
- Automatic dense → sparse techniques
- Automatic data distribution techniques
… Supercompilers

In theory at least, generating code from specifications rather than from specific HLL implementations should lead to better performance:
all the benefits of abstraction without the performance penalty.
Vectorizers and High Level Specifications

Original loop:

    do i=1,n
      a(i)=b(i)+c(i)
      d(i)=a(i)+d(i-1)
      if (m > d(i)) m=d(i)
    end do

After loop distribution:

    do i=1,n
      a(i)=b(i)+c(i)
    end do
    do i=1,n
      d(i)=a(i)+d(i-1)
    end do
    do i=1,n
      if (m > d(i)) m=d(i)
    end do

Vectorized form:

    a(1:n)=b(1:n)+c(1:n)
    d(1:n)=lin-rec(a,d,1,n)
    m=min(m,d(1:n))
Back End Compilers and Supercompilers …

Back end compilers take care of:

- Machine code generation
- Register allocation
- Conventional optimizations

But they are not really trusted by today's module generation systems (competitors):

- The existence of ATLAS is just an indictment of current compiler technology.
- FFTW does clustering to improve register allocation.
- Spiral does a variety of conventional optimizations.
Optimizations in Spiral

Across the formula generator, the SPL compiler, and the C/Fortran compiler:

* High-level scheduling
* Loop transformation
* High-level optimizations
  - Constant folding
  - Copy propagation
  - CSE
  - Dead code elimination
* Low-level optimizations
  - Instruction scheduling
  - Register allocation
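Of the high-level optimizations listed, CSE combined with copy propagation can be sketched as local value numbering over straight-line three-address code. This is an illustrative sketch assuming single-assignment temporaries, not Spiral's actual implementation.

```python
def cse(code):
    """Local value numbering over straight-line three-address code.

    code: list of (dest, op, src1, src2) tuples. A repeated
    (op, src1, src2) computation is eliminated, and later uses of its
    destination are redirected to the first variable holding that value.
    Assumes every dest is a fresh temporary (single assignment).
    """
    seen = {}    # (op, arg1, arg2) -> variable already holding that value
    canon = {}   # eliminated variable -> canonical variable (copy prop.)
    out = []
    for dest, op, a, b in code:
        # Propagate copies into the operands first.
        a, b = canon.get(a, a), canon.get(b, b)
        key = (op, a, b)
        if key in seen:
            canon[dest] = seen[key]   # redundant: reuse the earlier value
        else:
            seen[key] = dest
            out.append((dest, op, a, b))
    return out
```

Note that eliminating `t3 = x + y` only exposes the redundancy of `t4 = t3 * z` after copy propagation rewrites it to `t1 * z`, which is why the two passes are interleaved here.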
[Charts: impact of the basic optimizations on FFT performance (N=2^5) on SPARC (f77 -fast -O5), PII (g77 -O6 -malign-double), and MIPS (f77 -O3).]
Can Module Generators Rely on Back End Compilers?

Not always, but using back end compilers will always be necessary for portability (collaborators).

But … compilers can hinder efforts to get good performance:

- For example, bad register allocation can have a serious negative impact.
- We need a standard set of commands to control the transformations applied by the compiler.
… Back End Compilers and Supercompilers

In supercompilers, transformations should be done by the back end whenever possible.
Reason: they then apply to all parts of the program, not only to the very high-level components.
Search …

Search is an important component of module generators.
It is also used by conventional compilers, but compilers usually work with static predictions rather than actual execution times:

- KAP tried all possible loop permutations.
- SGI-PRO tries many combinations of unrolling.
- Superoptimizer and similar systems.

Most compiler optimization algorithms are heuristics with no search involved.
… Search …

In supercompilers, search could also be done across several algorithms, looking for a good data representation and data distribution for the whole program.
… Search …

The search strategy could make use of actual execution times combined with static performance prediction:

- Static prediction is not very accurate today.
- Tight performance bounds are needed to prune the search.

Some decisions could be made at run time:

- IF statements/multiversion loops
- JIT compilers
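The multiversion idea can be sketched as a run-time dispatch on input size. Every name below is hypothetical; the threshold stands in for a value a search procedure would have tuned, and the two reduction variants merely illustrate having specialized bodies to choose between.

```python
import bisect

def make_multiversion(versions, thresholds):
    """Build a dispatcher that picks an implementation by input size.

    versions[i] handles lengths up to thresholds[i]; the last version is
    the fallback. This mirrors compiler-generated multiversion loops,
    where an IF on the trip count selects the specialized body.
    """
    def dispatch(xs):
        i = bisect.bisect_left(thresholds, len(xs))
        return versions[min(i, len(versions) - 1)](xs)
    return dispatch

def sum_loop(xs):
    # Simple accumulation: low overhead, good for short inputs.
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_pairwise(xs):
    # Pairwise (tree) reduction: better rounding behavior on long inputs.
    if len(xs) <= 8:
        return sum_loop(xs)
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])

# Threshold of 64 is an assumed, untuned example value.
adaptive_sum = make_multiversion([sum_loop, sum_pairwise], [64])
```

A JIT compiler makes the analogous choice later still, specializing once the actual data is visible.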
… Search

Some search could be based on data-dependent behavior:

- Profiling
- "Representative" data sets

The search strategy is important, given that the space of possibilities is often large and not monotonic, and that it is difficult to know how far the search process is from the optimum.

We need to develop tight bounds.
Size of Search Space

    N      # of formulas        N      # of formulas
    2^1    1                    2^9    20,793
    2^2    1                    2^10   103,049
    2^3    3                    2^11   518,859
    2^4    11                   2^12   2,646,723
    2^5    45                   2^13   13,648,869
    2^6    197                  2^14   71,039,373
    2^7    903                  2^15   372,693,519
    2^8    4,279                2^16   1,968,801,519
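The formula counts in the table appear to follow the little Schröder numbers, which satisfy the recurrence n·S(n) = (6n−9)·S(n−1) − (n−3)·S(n−2) with S(1) = S(2) = 1. A short sketch reproducing the counts (the identification with this sequence is my observation, checked against the table's entries, not a claim from the talk):

```python
def formula_counts(nmax):
    """Counts of FFT factorization formulas for N = 2^1 .. 2^nmax.

    Computed via the little Schroeder number recurrence
    n*S(n) = (6n - 9)*S(n-1) - (n - 3)*S(n-2), S(1) = S(2) = 1.
    The division is exact, so integer arithmetic suffices.
    """
    s = [0, 1, 1]                 # s[n] = S(n); s[0] is unused padding
    for n in range(3, nmax + 1):
        s.append(((6 * n - 9) * s[n - 1] - (n - 3) * s[n - 2]) // n)
    return s[1:]
```

The roughly 5x growth per doubling of N is what makes exhaustive search infeasible beyond small sizes and motivates the pruning bounds discussed above.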
Coverage

- Need a class of specifications large enough to represent most of the computation.
- The effectiveness of the approach will depend on coverage.
- Current libraries are a good start.
- But … it is not clear how much these libraries typically cover.
- To impact programming in general, current approaches would have to be extended to other domains such as sparse computations, sorting, searching, …
Conclusions

- As we better understand algorithm choices and their impact on performance, it becomes feasible to automate much of the process of selecting data structures and algorithms to maximize performance.
- A first step: a repository of routines/classes with several implementations of each subroutine.
- But generation based on context could lead to better performance.
- In particular, generation from very high-level specifications could allow the generation of code combining several operations in ways that are impossible to conceive with current encapsulation mechanisms.