The Riverside Optimizing Compiler for Configurable
Compiled code acceleration on FPGAs
W. Najjar, B. Buyukkurt, Z. Guo, J. Villareal, J. Cortes, A. Mitra
Computer Science & Engineering
University of California Riverside
Why? Are FPGAs a New HPC Platform?
Comparison of a dual core Opteron (2.5 GHz) to Virtex 4 & 5 FPGAs on double-precision floating point (dp fp)

(dp) Gflop/s    Opt     V-4     V-5
MAc             10      15.9    28.0
Mult            5       12.0    19.9
Add             5       23.9    55.3

Watts           Opt     V-4     V-5
                95      ~35     25

Balanced allocation of adders, multipliers and registers
Use both DSP and logic for multipliers, run at lower speed
Logic & wires for I/O interfaces
David Strensky, FPGAs Floating-Point Performance -- a pencil and paper evaluation, in HPCwire.com
28 September 2007
Future of Computing - W. Najjar
ROCCC
Riverside Optimizing Compiler for Configurable Computing
Code acceleration
By mapping circuits to the FPGA
Achieves the same speed as hand-written VHDL code
Improved productivity
Allows design and algorithm space exploration
Keeps the user fully in control
We automate only what is very well understood
Challenges
FPGA is an amorphous mass of logic
Structure provided by the code being accelerated
Repeatedly applied to a large data set: streams
Languages reflect the von Neumann execution model:
Highly structured and sequential (control driven)
Vast randomly accessible uniform memory
CPUs (& GPUs)           FPGAs
Temporal computing      Spatial computing
Sequential              Parallel
Centralized storage     Distributed storage
Control flow driven     Data flow driven
ROCCC Overview
[Figure: ROCCC compilation flow]
Input: C/C++, Java
High level transformations: procedure, loop and array optimizations
Hi-CIRRF
Low level transformations: instruction scheduling, pipelining and storage optimizations
Lo-CIRRF
Code generation
Outputs: VHDL, SystemC, binary
Targets: FPGA, DSP, CPU, GPU, custom unit
CIRRF: Compiler Intermediate Representation for Reconfigurable Fabrics
Limitations on the code:
•No recursion
•No pointers
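To make the two restrictions concrete, the sketch below sets a recursive, pointer-walking reduction beside an equivalent counted loop over an indexed array with a fixed bound. The function names and the size LEN are hypothetical, chosen only for illustration. Whether a particular ROCCC release accepts the second form exactly as written is an assumption; it simply mirrors the loop style of the FIR example in this talk.

```c
#define LEN 8  /* hypothetical array size, for illustration only */

/* Uses recursion and pointer arithmetic: outside the stated limits. */
int sum_recursive(const int *p, int n)
{
    if (n == 0)
        return 0;
    return *p + sum_recursive(p + 1, n - 1);
}

/* Same result using only a counted loop and array indexing. */
int sum_indexed(const int a[LEN])
{
    int i;
    int acc = 0;
    for (i = 0; i < LEN; i = i + 1)
        acc = acc + a[i];
    return acc;
}
```

Both functions return the same sum for the same LEN-element array; only the second has a trip count and an access pattern that are visible at compile time.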
A Decoupled Execution Model
Decoupled memory access from datapath
Parallel loop iterations
Pipelined datapath
Smart buffer (input) does data reuse
Memory fetch and store units, data path configured by compiler
Off chip accesses platform specific
[Figure: decoupled execution model. Input memory (on or off chip) -> Mem Fetch Unit -> Input Buffer -> Multiple loop bodies, unrolled and pipelined -> Output Buffer -> Mem Store Unit -> Output memory (on or off chip)]
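The data reuse attributed to the smart buffer can be illustrated with a small host-side model. This is a sketch under stated assumptions, not the hardware ROCCC generates: it assumes a 3-element sliding window, and the names fetch, fetch_count, window3_naive and window3_buffered are hypothetical. It models only the number of memory reads, and it is not meant as input to the compiler.

```c
#include <stddef.h>

/* Hypothetical counter standing in for memory traffic. */
unsigned long fetch_count = 0;

/* Every read of the input memory goes through here so it can be counted. */
int fetch(const int *mem, size_t idx)
{
    fetch_count = fetch_count + 1;
    return mem[idx];
}

/* No reuse: each iteration re-reads its whole 3-element window. */
void window3_naive(const int *in, int *out, size_t n, const int t[3])
{
    size_t i;
    for (i = 0; i + 2 < n; i++)
        out[i] = t[0] * fetch(in, i)
               + t[1] * fetch(in, i + 1)
               + t[2] * fetch(in, i + 2);
}

/* With reuse: the window lives in three registers and only one
   new element is fetched per iteration. */
void window3_buffered(const int *in, int *out, size_t n, const int t[3])
{
    size_t i;
    int w0, w1, w2;
    if (n < 3)
        return;
    w0 = fetch(in, 0);
    w1 = fetch(in, 1);
    for (i = 0; i + 2 < n; i++) {
        w2 = fetch(in, i + 2);
        out[i] = t[0] * w0 + t[1] * w1 + t[2] * w2;
        w0 = w1;   /* slide the window by one element */
        w1 = w2;
    }
}
```

For n input elements the naive version performs 3(n-2) reads and the buffered version performs n, while both produce the same outputs. Wider or two-dimensional windows reuse more elements per fetch, so the saving grows with window size.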
So far, working compiler with …
Extensive optimizations and transformations
Traditional and FPGA specific
Systolic array, pipelined unrolling, look-up tables
Compiler and hardware support for data reuse
> 98% reduction in memory fetches on image codes
Efficient code generation and pipelining
Within 10% of hand-optimized HDL codes
Import of existing IP cores
Leverages the huge wealth of existing IP, integrated with the C source code
Support for dynamic partial reconfiguration
Example: 3-tap FIR
#define N 516
void begin_hw();
void end_hw();
int main()
{
    int i;
    const int T[3] = {3,5,7};   /* coefficients */
    int A[N], B[N];
    begin_hw();
    L1: for (i=0; i<=(N-3); i=i+1)
    {
        /* indices of A[]: i, i+1, i+2 */
        B[i] = T[0]*A[i] +
               T[1]*A[i+1] + T[2]*A[i+2];
    }
    end_hw();
    return 0;
}
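As a sketch of what "parallel loop iterations" means for this loop, the version below unrolls L1 by two by hand. It is an illustration only: the compiler's output is a hardware datapath, not C, and the function names fir3_rolled and fir3_unrolled2 are hypothetical. The coefficient array is assumed to have three entries, matching the 3-tap filter.

```c
#define N 516

/* Loop L1 as a plain function, used as the reference. */
void fir3_rolled(const int A[N], int B[N], const int T[3])
{
    int i;
    for (i = 0; i <= (N - 3); i = i + 1)
        B[i] = T[0] * A[i] + T[1] * A[i + 1] + T[2] * A[i + 2];
}

/* Unrolled by two. The two bodies do not depend on each other,
   so they could be evaluated side by side; they also share the
   inputs A[i+1] and A[i+2]. */
void fir3_unrolled2(const int A[N], int B[N], const int T[3])
{
    int i;
    for (i = 0; i + 1 <= (N - 3); i = i + 2) {
        B[i]     = T[0] * A[i]     + T[1] * A[i + 1] + T[2] * A[i + 2];
        B[i + 1] = T[0] * A[i + 1] + T[1] * A[i + 2] + T[2] * A[i + 3];
    }
    /* Remainder iteration for an odd trip count (not taken when N = 516). */
    for (; i <= (N - 3); i = i + 1)
        B[i] = T[0] * A[i] + T[1] * A[i + 1] + T[2] * A[i + 2];
}
```

Both functions write the same N-2 outputs. The unrolled form reads four distinct inputs for every two outputs instead of six, which is the kind of overlap an input buffer can exploit.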
RC Platform Models
[Figure: three RC platform models, numbered 1 to 3. Labels in the diagram: CPU, FPGA, memory interface, CPU memory, fast network]
What we have learned so far
Big speedups are possible
10x to 1,000x on application codes (molecular dynamics, bio-informatics, etc.) over Xeon and Itanium
Works best with streaming data
New paradigms and tools
For spatio-temporal concurrency
Algorithms, languages, compilers, run-time systems, etc.
Future? Very wide use of FPGAs
Why?
High throughput (> 10x) AND low power (< 25%)
How?
Mostly in Models 2 and 3, initially
Model 2: See Intel QuickAssist, XtremeData & DRC
Model 3: SGI, SRC & Cray
Contingency
Market brings price of FPGAs down
Availability of some software stack for savvy programmers, initially
Potential
Multiple “killer apps” (to be discovered)
Conclusion
We as a research community should be ready
Stamatis was
Thank you