Coarse Grain Reconfigurable Architectures


THE NINTH WORLD CONFERENCE ON
INTEGRATED DESIGN & PROCESS
TECHNOLOGY
San Diego, CA, USA, June 25 - 30, 2006
Reiner Hartenstein
TU Kaiserslautern
The Transdisciplinary
Responsibility
of CS Curricula
The Basic Model Paradigm Trap
frustrates interdisciplinary education,
in CS even between subdisciplines
High performance computing
stalled for decades by the
von Neumann paradigm trap:
the wrong road map.
The right roadmap kept by
another trap for decades !
© 2006, [email protected]
stolen from Bob Colwell
http://hartenstein.de
Transdisciplinary Education?
Computer Science not prepared
Lacking intradisciplinary cohesion
between the mind sets of:
•Theoreticians (Math background)
•Hardware People
•Computer Architects
•Embedded Syst. Designers
•Software People (Application Development)
for decades: the Hardware / Software chasm
turns into: the Configware / Software chasm
Introducing Data Streams
H. T. Kung paradigm (systolic array), 1980
Data streams define:
... which data item
at which time
at which port
DPA*: a pipe network
*) DataPath Array (array of DPUs)
execution is transport-triggered
no instruction streams needed
[Figure: a DPA with its input data streams and output data streams, each drawn as a skewed pattern of data items over port # and time]
CS Mathematicians' hobby, early 1980s
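The scheduling rule of this slide (which data item, at which time, at which port) can be illustrated with a small Python sketch. It is not taken from the talk: the names `skewed_streams` and `systolic_matmul`, the restriction to square matrices, and the choice of matrix multiplication as the example are assumptions made for illustration only. Each DPU fires only when both operands have arrived, which is one simple way to model transport-triggered execution without any instruction stream.

```python
def skewed_streams(rows):
    """For each port p, list (time, value) pairs: item k of stream p arrives at time k + p."""
    return [[(k + p, v) for k, v in enumerate(row)] for p, row in enumerate(rows)]


def systolic_matmul(A, B):
    """Multiply two n x n matrices on a simulated n x n array of DPUs.

    Rows of A enter from the left and columns of B enter from the top,
    both as skewed data streams. There is no program counter: a DPU
    accumulates a product whenever two operands meet in it.
    """
    n = len(A)
    a_reg = [[None] * n for _ in range(n)]  # operand moving left to right
    b_reg = [[None] * n for _ in range(n)]  # operand moving top to bottom
    acc = [[0] * n for _ in range(n)]       # one accumulator per DPU

    # Per port: a dict mapping arrival time to the arriving data item.
    a_in = [dict(s) for s in skewed_streams(A)]
    b_in = [dict(s) for s in skewed_streams([list(c) for c in zip(*B)])]

    # The last operand pair meets in DPU (n-1, n-1) at time 3n - 3.
    for t in range(3 * n - 2):
        # Transport step: every operand moves one DPU further along its pipe.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            a_reg[i][0] = a_in[i].get(t)    # None means "no item at this time"
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            b_reg[0][j] = b_in[j].get(t)
        # Execution step: triggered by the arrival of data, not by instructions.
        for i in range(n):
            for j in range(n):
                a, b = a_reg[i][j], b_reg[i][j]
                if a is not None and b is not None:
                    acc[i][j] += a * b
    return acc


if __name__ == "__main__":
    print(skewed_streams([[1, 2], [3, 4]]))
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The skew (item k of port p arrives at time k + p) is what makes matching operands meet in the right DPU at the right time. The regular array and the linear pipes are exactly the uniformity that the algebraic synthesis methods discussed on the next slide rely on.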
Synthesis Method?
of course, algebraic !
Algebraic means linear projection, restricted to
uniform arrays, only with linear pipes
useful only for applications with
strictly regular data dependencies:
Mathematicians caught by their own paradigm trap
for more than a decade
rDPA: generalization by a transdisciplinary hardware guy:
Rainer Kress discarded their algebraic synthesis
methods and replaced them with simulated annealing (1995).
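To make the contrast with algebraic synthesis concrete, here is a minimal Python sketch of placement by simulated annealing. It is not Kress's mapper: the cost function (total Manhattan wire length), the geometric cooling schedule, and the names `wire_length` and `anneal_placement` are assumptions chosen for illustration. The point is only that this kind of search accepts arbitrary, irregular dependency graphs, which a linear projection cannot handle.

```python
import math
import random


def wire_length(placement, edges):
    """Total Manhattan distance over all edges; placement maps node -> (row, col)."""
    return sum(
        abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
        for a, b in edges
    )


def anneal_placement(nodes, edges, rows, cols, steps=20000,
                     t_start=5.0, t_end=0.01, seed=0):
    """Place the nodes of a dataflow graph onto a rows x cols array of DPUs."""
    rng = random.Random(seed)
    slots = [(r, c) for r in range(rows) for c in range(cols)]
    assert 2 <= len(slots) and len(nodes) <= len(slots)
    rng.shuffle(slots)
    # occupant[k] is the node sitting in slots[k], or None for an empty DPU.
    occupant = list(nodes) + [None] * (len(slots) - len(nodes))
    placement = {n: slots[k] for k, n in enumerate(occupant) if n is not None}

    def swap(i, j):
        """Exchange the occupants of two slots and update the placement."""
        occupant[i], occupant[j] = occupant[j], occupant[i]
        for k in (i, j):
            if occupant[k] is not None:
                placement[occupant[k]] = slots[k]

    cost = wire_length(placement, edges)
    best, best_cost = dict(placement), cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(len(slots)), 2)
        swap(i, j)
        new_cost = wire_length(placement, edges)
        delta = new_cost - cost
        # Always accept improvements; accept a worse placement with
        # probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            swap(i, j)  # rejected: undo the move
    return best, best_cost


if __name__ == "__main__":
    # An irregular dependency graph: no uniform array structure is assumed.
    nodes = ["in", "mul", "add", "sub", "out"]
    edges = [("in", "mul"), ("in", "add"), ("mul", "sub"), ("add", "sub"),
             ("mul", "out"), ("sub", "out")]
    print(anneal_placement(nodes, edges, rows=2, cols=3))
```

A real mapper would also have to route the connections and respect the limited interconnect of the array. This sketch covers the placement step only.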
The right road map to HPC: ignored there for decades
no memory wall: massively avoiding memory cycles
DPA: DPU operation is transport-triggered
no instruction streams
no message passing, nor thru common memory
where are the supercomputing people ?
(took >2 decades)
[Figure: a DPA with its input data streams and output data streams]
The supercomputing paradigm trap
this did not prevent supercomputing from
following the wrong road map for decades,
imprisoned by the von Neumann paradigm trap
No technology transfer from Mathematics:
caught by the algebraic paradigm trap
(systolic array scene)
Monstrous Steam Engines of Computing
Crossbar weight: 220 t, 3000 km of cable,
5120 Processors, 5000 pins each
ES 20: TFLOPS
peak or sustained?
Dead Supercomputer Society
Research 1985 – 1995 [Gordon Bell, keynote ISCA 2000]
•ACRI
•Alliant
•American Supercomputer
•Ametek
•Applied Dynamics
•Astronautics
•BBN
•CDC
•Convex
•Cray Computer
•Cray Research
•Culler-Harris
•Culler Scientific
•Cydrome
•Dana/Ardent/Stellar/Stardent
•DAPP
•Denelcor
•Elexsi
•ETA Systems
•Evans and Sutherland Computer
•Floating Point Systems
•Galaxy YH-1
•Goodyear Aerospace MPP
•Gould NPL
•Guiltech
•ICL
•Intel Scientific Computers
•International Parallel Machines
•Kendall Square Research
•Key Computer Laboratories
•MasPar
•Meiko
•Multiflow
•Myrias
•Numerix
•Prisma
•Tera
•Thinking Machines
•Saxpy
•Scientific Computer Systems (SCS)
•Soviet Supercomputers
•Supertek
•Supercomputer Systems
•Suprenum
•Vitesse Electronics
The Language and Tool Disaster
End of April a DARPA brainstorming conference
Software people do not speak VHDL
Hardware people do not speak MPI
Bad quality of the application development tools
A poll at FCCM’98 revealed that
86% of hardware designers hate their tools
Escaping the Paradigm Trap
The underground success story of FPGAs
The fastest growing segment
of the semiconductor market
Massive speed-up
Slashing the electricity bill
However, this is not supported
by our education systems
Some published speed-up factors: the RC paradox
[Figure: relative performance (logarithmic axis from 10^0 to 10^9) over the years 1980 to 2010, with the microprocessor curve from the 8080 to the Pentium 4. Published speed-up factors on the chart range from 20 to 6000. Application labels: DSP and wireless, Reed-Solomon decoding, Viterbi decoding, MAC, FFT, image processing, pattern matching, multimedia, real-time face detection, video-rate stereo vision, pattern recognition, SPIHT wavelet-based image compression, crypto, bioinformatics, protein identification, BLAST, Smith-Waterman pattern matching, molecular dynamics simulation, astrophysics (GRAPE).]
These speed-ups are obtained although the effective integration density of FPGAs
is 4 orders of magnitude behind the Moore curve
(wiring overhead, reconfigurability overhead, routing congestion).
Pervasiveness of Reconfigurable Computing (RC)
# of hits by Google for “FPGA and ….”
•Hardware People
•Computer architects
•Embedded system designers
[Figure: Google hit counts: 647,000; 1,490,000; 398,000; 1,620,000; 915,000; 272,000]
Embedded Systems scene: not imprisoned by the von Neumann paradigm trap
The Pervasiveness of RC
Math/SW-savvy scene: unqualified for RC ?
# of hits by Google for “FPGA and ….”
[Figure: Google hit counts: 647,000; 1,490,000; 171,000; 194,000; 398,000; 1,620,000; 127,000; 113,000; 158,000; 162,000; 915,000; 272,000]
Unqualified for RC ?
Using FPGAs for scientific computation?
hiring a student from the EE dept. ?
application disciplines use their own trick boxes:
transdisciplinary fragmentation of methodology
CS is responsible for providing a common RC model
•for transdisciplinary education
•and, to fix its intradisciplinary fragmentation
Joint Task Force for Computing Curricula 2004
fully ignores
Reconfigurable Computing
Curricula ?
FPGA & synonyms: 0 hits
(Google: 10 million hits)
not even here
Curriculum Recommendations, v. 2005
Upon my complaints, the only change: an addition
to the last paragraph of the survey volume:
"programmable hardware (including
FPGAs, PGAs, PALs, GALs, etc.)."
However, no structural changes at all
v. 2005 intended to be the final version (?)
torpedoing the transdisciplinary
responsibility of CS curricula
This is criminal !
We need a
SIG on CS education
and RC education
to identify intra-disciplinary
communication gaps in CS
a task force for Reconfigurable Computing education
to send delegates to the
Joint Task Force for Computing Curricula
to develop a roadmap for CS to assume
intradisciplinary responsibility for education
thank you
END
Backup for
Discussion:
Data stream generators
use data counters, no program counter
non-von-Neumann machine paradigm
ASM: AutoSequencing Memory (GAG + RAM; data counter)
implemented by distributed memory
50 & more on-chip ASM are feasible
32 ports, or n x 32 ports
[Figure: a reconfigurable pipe network (rDPA) surrounded by ASM blocks that generate its input data streams and receive its output data streams]
other example
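The idea of a memory that sequences itself by a data counter can be sketched in a few lines of Python. This is an illustration under assumptions, not the design from the talk: the class name `AutoSequencingMemory`, the function `address_generator`, and the restriction to linear (base, stride, count) address sequences are all hypothetical.

```python
from typing import Iterable, Iterator, List


def address_generator(base: int, stride: int, count: int) -> Iterator[int]:
    """Data counter: yields base, base + stride, ... for count addresses."""
    for k in range(count):
        yield base + k * stride


class AutoSequencingMemory:
    """A RAM block that produces or consumes a data stream by itself.

    No program counter and no instruction fetch are involved: the address
    sequence comes from a data counter attached to the memory.
    """

    def __init__(self, ram: List[int]) -> None:
        self.ram = ram

    def read_stream(self, addresses: Iterable[int]) -> Iterator[int]:
        """Emit the data items stored at the generated addresses, in order."""
        for addr in addresses:
            yield self.ram[addr]

    def write_stream(self, addresses: Iterable[int], data: Iterable[int]) -> None:
        """Store an incoming data stream at the generated addresses."""
        for addr, value in zip(addresses, data):
            self.ram[addr] = value


if __name__ == "__main__":
    # A 4 x 4 block stored row by row; stride 4 walks down column 1.
    source = AutoSequencingMemory(list(range(16)))
    sink = AutoSequencingMemory([0] * 4)

    column = source.read_stream(address_generator(base=1, stride=4, count=4))
    doubled = (2 * x for x in column)  # stands in for a DPU inside the pipe network
    sink.write_stream(address_generator(base=0, stride=1, count=4), doubled)
    print(sink.ram)
```

Several such memories around the array, each with its own data counter, would feed the ports in parallel. That is the sense in which the slide speaks of distributed memory.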
Liaison to the Organic Computing Initiative ?
German
section
http://www.organic-computing.de
http://www.organic-computing.org
Illustrating the von Neumann paradigm trap
the watering pot model
[Hartenstein]
The instruction-stream-based approach: von Neumann bottleneck
The data-stream-based approach: many watering pots, has no von Neumann bottleneck
Data meeting the processing unit (PU)
We have 2 choices:
•by Software: routing the data (instruction streams are memory-cycle-hungry)
•by Configware: placement of the execution locality (optimize pipe network: place PU in data stream)
Co-Compiler Enabling Technology
is available from academia
only a small team needed for
commercial re-implementation
on the road map to the
Personal Supercomputer
Compilation: Software vs. Configware
Software Engineering:
source program (C, FORTRAN, MATLAB) -> software compiler -> software code
Configware Engineering:
source „program“ -> configware compiler (mapper: placement & routing; data scheduler) -> configware code, flowware code
Nick Tredennick’s Paradigm Shifts
explain the differences
Software Engineering (CPU)
•resources: fixed
•algorithm: variable
•1 programming source needed: software
Configware Engineering
•resources: variable
•algorithm: variable
•2 programming sources needed: configware, flowware
Co-Compilation
source: C, FORTRAN, MATLAB
Software / Configware Co-Compiler: automatic SW / CW partitioner
•software compiler -> software code
•configware compiler (mapper, data scheduler) -> configware code, flowware code
Co-Compiler for Hardwired Kress/Kung Machine
[e. g. Brodersen]
source
Software / Flowware Co-Compiler: automatic SW / CW partitioner
•software compiler -> software code
•flowware compiler (data scheduler) -> flowware code