Coarse Grain Reconfigurable Architectures


Reconfigurable Computing
THALES Research & Technologies,
Orsay, France, Friday, Sept.18, 2003
Reiner Hartenstein
Kaiserslautern
University of
Technology
Reconfigurable Computing and
its Enabling Technologies
-for the Personal Supercomputer
(PS) to replace the PC
>> outline <<
• Preface
• Flowware
• Datastream-based Computing
• The Anti Machine Paradigm
• Final Remarks
http://www.uni-kl.de
© 2003, [email protected]
http://hartenstein.de
Ubiquitous Embedded Systems
Embedded System Engineering (ESE) requires:
• HW / CW / ESW co-design onto
highly programmable platforms (SoC)
ESW and CW become the main vehicle
for product differentiation
ESE and CW become the
main focus in system design
(Performance
and) Flexibility
are key issues
Reconfigurable Computing:
a second programming domain
Migration of programming to the structural domain
The structural domain has become RAM-based
Currently running: the next fundamental revolution
after introduction of the microprocessor
However, CS curricula ignore this impact of Reconfigurable
Computing – key issue in embedded systems ...
... causing the coming disaster by unqualified CS
graduates pushing up the unemployment rate ?
focusing on coarse grain
• Fine Grain morphware platforms
already mainstream: reconfigurable logic
just logic design on a strange platform
• Coarse Grain platforms:
Reconfigurable Computing:
not that new – but shaking the
fundamentals of CS curricula
speed-up of up to 3
orders of magnitude
an order of magnitude more
MIPS/mW than fine grain
mapping algorithms efficiently onto rDPA
[Figure: SNN filter on KressArray; array size: 10 x 16 = 160 rDPUs. Legend: rDPU not used; used for routing only; operator and routing; backbus connect; port location; not used marker]
by the way: example of scalability / relocatability by EDA support
>> flowware
• Preface
• Flowware
• Datastream-based Computing
• The Anti Machine Paradigm
• Final Remarks
IT ages
[Figure: timeline of IT ages from 1957 to 2007 in ten-year steps: mainframe age; computer age (PC age); morphware age (flowware, data streams ...); marker "here?"]
von Neumann
does not support morphware
Flowware defines:
... which data item
at which time
at which port
[Figure: input data streams entering a DPA and output data streams leaving it; each stream is drawn as data items (x) plotted over time and port #]
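The definition above (which data item, at which time, at which port) can be written down as a small schedule table. The following Python sketch is illustrative only: the function name `skewed_flowware` and the skewing rule (the item in row i, column j arrives at port j at time i + j) are assumptions chosen to resemble the staggered streams in the figure, not a notation taken from the talk.

```python
def skewed_flowware(matrix):
    """Return a flowware-style schedule as (time, port, item) triples.

    Assumption (not from the slides): the item in row i, column j
    enters port j at time step i + j, so each port starts one step
    later than its left neighbour.
    """
    events = [
        (i + j, j, item)
        for i, row in enumerate(matrix)
        for j, item in enumerate(row)
    ]
    # Sorting by time first gives the order in which items are streamed.
    return sorted(events)


if __name__ == "__main__":
    for time, port, item in skewed_flowware([[1, 2], [3, 4]]):
        print(f"time {time}: port {port} receives {item}")
```

The schedule contains no operations on the data at all; it only states where and when each item appears, which is the role the slides assign to flowware.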
Paradigm Shifts:
Nick Tredennick‘s view
why 2 program sources?
instruction-stream-based computing:
  algorithms variable, resources fixed
  program source: Software
data-stream-based reconfigurable computing:
  algorithms variable, resources variable
  program sources: Configware, Flowware
Configware / Flowware Compilation
[Figure: compilation flow. Labels: high level source; intermediate; mapper, producing configware; scheduler, producing flowware; wrapper; rDPA (r. Data Path Array); memory banks (M, asM); address generator / data sequencer; data streams. Note in figure: „instruction“ fetch before runtime]
>> Datastream-based Computing
• Preface
• Flowware
• Datastream-based Computing
• The Anti Machine Paradigm
• Final Remarks
computing paradigms and methodologies
1946: machine paradigm (von Neumann)
1980: data streams (Kung, Leiserson)
1990: rDPU (Rabaey)
1994: anti machine high level programming language
1995: super systolic rDPA
flowware
1989: anti machine paradigm
1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
1997+: discipline of distributed memory architecture
1997: configware / software partitioning compiler
Flowware heading toward mainstream
• Data-stream-based Computing is heading for mainstream
– 1997 SCCC (LANL): Streams-C Configurable Computing
– SCORE (UCB): Stream Computations Organized for Reconfigurable Execution
– ASPRC (UCB): Adapting Software Pipelining for Reconfigurable Computing
– 2000 Bee (UCB), ...
– Most stream-based multimedia systems, etc.
– Many other areas ....
Flowware ..... mostly not yet modelled that
way: most flowware is hidden by its indirect
instruction-stream-based implementation
>> The Anti Machine Paradigm
• Preface
• Flowware
• Datastream-based Computing
• The Anti Machine Paradigm
• Final Remarks
Programming Language Paradigms
language category            | Computer Languages                | Languages f. Anti Machine
both deterministic           | procedural sequencing: traceable, checkpointable
operation sequence driven by | read next instruction,            | read next data item,
                             | goto (instr. addr.),              | goto (data addr.),
                             | jump (to instr. addr.),           | jump (to data addr.),
                             | instr. loop, loop nesting,        | data loop, loop nesting,
                             | no parallel loops, escapes,       | parallel loops, escapes,
                             | instruction stream branching      | data stream branching
state register               | program counter                   | data counter(s)
address computation          | massive memory cycle overhead     | overhead avoided
Instruction fetch            | memory cycle overhead             | overhead avoided
parallel memory bank access  | interleaving only                 | no restrictions
language features            | control flow + data manipulation  | data streams only (no data manipulation)
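The "data loop, loop nesting" and "data counter(s)" entries of the table can be made concrete with a small software model of a data sequencer. This is a hedged sketch, not the anti machine's actual hardware or language: the name `data_sequencer` and the `(count, stride)` loop description are hypothetical, chosen only to show that the address sequence is fully determined by loop parameters and needs no instruction stream.

```python
from itertools import product


def data_sequencer(base, loops):
    """Yield memory addresses for nested data loops.

    `loops` is a list of (count, stride) pairs, outermost loop first.
    Both the name and the parameter format are illustrative assumptions.
    """
    counts = [range(count) for count, _ in loops]
    strides = [stride for _, stride in loops]
    # product() advances the last (innermost) index fastest,
    # which matches ordinary loop nesting.
    for indices in product(*counts):
        yield base + sum(i * s for i, s in zip(indices, strides))


if __name__ == "__main__":
    # 2 rows of 3 words each, rows 10 words apart, starting at address 100.
    print(list(data_sequencer(100, [(2, 10), (3, 1)])))
```

In this model the "state register" is simply the tuple of loop indices, the data-side counterpart of a program counter.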
Machine paradigms
von Neumann machine (instruction-stream-based):
• CPU = DPU + instruction sequencer
• memory (M), I/O
• driven by an instruction stream; program source: Software
(reconf.) data-stream machine (anti machine):
• DPU or rDPU, (r)DPA built from (r)DPUs
• asM** = memory (M) + data address generator (data sequencer)
• distributed memory architecture*, I/O
• driven by data streams; program sources: Flowware, Configware
*) the new discipline came just in time:
see Herz et al.: Proc. IEEE ICECS, 2002;
also see books by Francky Catthoor et al.
What‘s the problem ?
Crossing the Hardware /
Software Chasm [Mike Butts]
µprocessor
accelerators
It‘s the gap between the procedural and the structural mindset
Traditional CS: programming is (control-)procedural,
instruction-stream-based – sources: software
The typical programmer has difficulty understanding
function evaluation without machine mechanisms ....
... we need a second machine paradigm
PC replaced by PS
[Figure: timeline of IT ages from 1957 to 2007 in ten-year steps: mainframe age; computer age (PC age); morphware age (flowware, data streams ...); a co-compiler targets µProc + rDPA]
PC replaced by PS
(personal supercomputer)
>> final remarks
• Preface
• Flowware
• Datastream-based Computing
• The Anti Machine Paradigm
• Final Remarks
All enabling technologies are available
• literature from last 30 years
• languages & (co-)compilation techniques
• anti machine and all its architectural resources
• parallel memory IP cores and generators
• morphware vendors like PACT ....
• anything else needed
thank you
>>> for discussion <<<
SoC means Embedded Systems
90% by 2010
The real labor market:
10 times more programmers
will write embedded applications
than computer software by 2010
*) Department of Trade and Industry, London
[Figure: chart with vertical axis "factor" (0, 1, 2) and horizontal axis in months (10, 12, 18)]
Xplorer Plot: SNN Filter Example
[13]
http://kressarray.de
[Figure: Xplorer plot of the SNN filter mapping; rDPUs with 2 hor. NNports and 3 vert. NNports, 32 bit each. Legend: operator (+) with operands and result; route-thru-only rDPU; route thru; backbus connect]
KressArray principles
• take systolic array principles
• replace classical synthesis by simulated annealing
• yields the super systolic array
• a generalization of the systolic array
• no longer restricted to regular data dependencies
• now reconfigurability makes sense
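The step "replace classical synthesis by simulated annealing" can be illustrated with a toy placement of operators onto a rectangular array. This is a minimal sketch under stated assumptions, not the KressArray mapper itself: the cost function (total Manhattan distance between connected operators), the swap move, the cooling schedule, and all names (`wire_length`, `anneal_placement`) are hypothetical choices made for illustration.

```python
import math
import random


def wire_length(placement, nets):
    """Total Manhattan distance over all (source, sink) operator pairs."""
    return sum(
        abs(placement[a][0] - placement[b][0])
        + abs(placement[a][1] - placement[b][1])
        for a, b in nets
    )


def anneal_placement(ops, nets, rows, cols, steps=4000,
                     t_start=4.0, t_end=0.05, seed=0):
    """Place operators on a rows x cols grid by simulated annealing."""
    n_cells = rows * cols
    if len(ops) > n_cells:
        raise ValueError("more operators than cells")
    rng = random.Random(seed)
    # One slot per cell; None marks an unused cell.
    slots = list(ops) + [None] * (n_cells - len(ops))
    rng.shuffle(slots)

    def to_placement(current):
        return {op: divmod(i, cols)
                for i, op in enumerate(current) if op is not None}

    cost = wire_length(to_placement(slots), nets)
    best_slots, best_cost = list(slots), cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.randrange(n_cells), rng.randrange(n_cells)
        slots[i], slots[j] = slots[j], slots[i]          # trial move
        new_cost = wire_length(to_placement(slots), nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost                              # accept
            if cost < best_cost:
                best_slots, best_cost = list(slots), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]      # reject: undo
    return to_placement(best_slots), best_cost


if __name__ == "__main__":
    ops = ["a", "b", "c", "d"]
    nets = [("a", "b"), ("b", "c"), ("c", "d")]
    print(anneal_placement(ops, nets, rows=3, cols=3))
```

Because the search accepts any connectivity in `nets`, it is not limited to the regular data dependencies that linear projection requires, which is the point the bullets above make.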
Super Pipe Networks
array                              | systolic array                            | supersystolic rDPA*
applications                       | regular data dependencies only            | no restrictions
pipeline properties: shape         | linear only                               | no restrictions
pipeline properties: resources     | uniform only                              | no restrictions
mapping                            | linear projection or algebraic synthesis  | simulated annealing or P&R algorithm
scheduling (data stream formation) | (e.g. force-directed) scheduling algorithm
*) KressArray [1995]
KressArray Family generic Fabrics:
a few examples
[Figure: example rDPU fabrics. Select Function Repertory; select mode, number and width of NNports (values shown: 2, 4, 8, 16, 24, 32); rDPU used as rout-through only, or as rout-through and function; more NNports: rich Rout Resources]
select Nearest Neighbour (NN) Interconnect: an example
http://kressarray.de
Examples of 2nd Level Interconnect:
laid out over the rDPU cell, no separate routing areas!
KressArray Xplorer
DPSS KressArray (Design Space) Platform Space Explorer
[Figure: block diagram of the Xplorer tool flow. Blocks: User Interface; ALE-X Code; ALE-X Compiler; Application Set; Inference Engine (FOX); Improvement Proposal Generator; Suggestion Selection; Architecture Editor; Mapping Editor; Architecture Estimator; Mapper; Scheduler; Delay Estim.; Power Estimator; Analyzer; Simulator; HDL Generator (VHDL, Verilog); Datapath Generator; Kress rDPU Layout; Design Rules; Bus & I/O. Data exchanged: interm. form, Schedule, statist. Data, Power Data]
http://kxplorer.informatik.uni-kl.de
The Secret of Success: Co-Compilation
supporting platform-based design
[Figure: co-compilation flow. A High level PL source enters the Partitioner, which is supported by an Analyzer / Profiler and by Resource Parameters (supporting different platforms). The SW compiler ("vN" machine paradigm) emits SW code; the CW compiler (anti machine paradigm) emits CW Code]
Loop Transformation Examples
resource parameter driven Co-Compilation
sequential processes:
  loop 1-16
    body
  endloop
strip mining:
  fork
    loop 1-8            loop 9-16
      body                body
    endloop             endloop
  join
loop unrolling:
  loop 1-8
    body
    body
  endloop
host (triggering the reconf. array):
  loop 1-8 trigger endloop
  loop 1-4 trigger endloop
  loop 1-2 trigger endloop
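The two transformations named on this slide, strip mining and loop unrolling, can be checked against the sequential loop in a few lines. The sketch below is illustrative only; the function names and the use of plain Python lists in place of parallel hardware are assumptions, and it says nothing about how the co-compiler chooses the transformation from its resource parameters.

```python
def sequential(body, n=16):
    """Reference: loop 1..n, one body evaluation per iteration."""
    return [body(i) for i in range(1, n + 1)]


def strip_ranges(n, strips):
    """Split 1..n into equally sized contiguous strips."""
    if n % strips:
        raise ValueError("n must be divisible by the number of strips")
    size = n // strips
    return [(s * size + 1, (s + 1) * size) for s in range(strips)]


def strip_mined(body, n=16, strips=2):
    """Fork one loop per strip, then join the partial results in order."""
    parts = [[body(i) for i in range(lo, hi + 1)]
             for lo, hi in strip_ranges(n, strips)]
    return [value for part in parts for value in part]


def unrolled(body, n=16, factor=2):
    """n // factor iterations, each holding `factor` copies of the body."""
    if n % factor:
        raise ValueError("n must be divisible by the unrolling factor")
    results = []
    for trip in range(n // factor):
        first = trip * factor + 1
        results.extend(body(first + k) for k in range(factor))
    return results


if __name__ == "__main__":
    square = lambda i: i * i
    print(strip_ranges(16, 2))
    print(sequential(square) == strip_mined(square) == unrolled(square))
```

With 16 iterations and two strips the ranges are 1-8 and 9-16, matching the loop bounds shown on the slide; on a reconfigurable array the strips or the replicated bodies would occupy separate rDPUs rather than run one after another.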
We introduce: Co-Compilation
[Figure: a high level programming language source enters the partitioning compiler. It emits Software running on the Computer (µProcessor, Machine Paradigm) and Configware running on the Xputer (Reconfigurable Accelerators, “Soft” Machine Paradigm); both sides are connected by an interface]
Reconfigurable Architecture (RA)
-- instead of hardwired
Significance of Address Generators
• Address generators have the potential to reduce
computation time significantly.
• In a grid-based design rule check a speed-up of
more than 2000 has been achieved, compared to a
VAX-11/750
• Dedicated address generators contributed a
factor of 10 - avoiding memory cycles for address
computation overhead
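To show what kind of address sequences a grid-based design rule check needs, the following sketch slides a small window over a flattened grid and compares each window with a reference pattern. It is a software model under stated assumptions: the names `window_addresses` and `count_matches`, the row-major layout, and the tiny example grid are hypothetical and are not taken from the MoM implementation. In a dedicated address generator the address arithmetic inside `window_addresses` would be produced by hardware rather than by processor instructions.

```python
def window_addresses(width, height, win):
    """Yield, per window position, the addresses of a win x win window.

    Assumes a row-major grid stored at addresses 0 .. width*height - 1.
    """
    for y in range(height - win + 1):
        for x in range(width - win + 1):
            yield [(y + dy) * width + (x + dx)
                   for dy in range(win) for dx in range(win)]


def count_matches(grid, width, height, pattern, win):
    """Count window positions whose contents equal the reference pattern."""
    return sum(
        1
        for addresses in window_addresses(width, height, win)
        if [grid[a] for a in addresses] == pattern
    )


if __name__ == "__main__":
    grid = [1, 1, 0,
            1, 1, 0,
            0, 0, 0]
    print(count_matches(grid, 3, 3, [1, 1, 1, 1], 2))
```

Every window read here costs several multiplications and additions just to find the data; that is the address computation overhead the slide says a dedicated generator avoids.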
>> final remarks <<
• Embedded System Design Crisis
• Computing Crisis
• CS for Embedded Systems?
• Flowware-based Computing
• Enabling Architectural Resources
• New Machine Paradigm
• final remarks
Why a dichotomy of machine paradigms?
vN: unbalanced
vN bottleneck
data stream machine:
• bad news:
caches do not help
• good news:
no vN bottleneck
• caches not needed
stolen from Bob Colwell
The anti machine has no
von Neumann bottleneck
Acceleration Mechanisms
• parallelism by multi bank memory architecture
• auxiliary hardware for address calculation
• address calculation before run time
• avoiding multiple accesses to the same data
• avoiding memory cycles for address computation
• improve parallelism by storage scheme transformations
• improve parallelism by memory architecture transformations
• alleviate interconnect overhead (delay, power and area)
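One of the mechanisms listed above, improving parallelism by storage scheme transformations on a multi bank memory, can be illustrated with a classic skewed storage scheme. This is a generic textbook technique swapped in for illustration, not necessarily the scheme used in the talk; the function names and the choice of four banks are assumptions.

```python
def bank_naive(i, j, banks):
    """Straightforward scheme: the column index selects the bank."""
    return j % banks


def bank_skewed(i, j, banks):
    """Skewed scheme: each row is rotated by one bank."""
    return (i + j) % banks


def conflict_free(cells, bank_of, banks):
    """True if all cells lie in different banks (one parallel access)."""
    used = [bank_of(i, j, banks) for i, j in cells]
    return len(set(used)) == len(used)


if __name__ == "__main__":
    banks = 4
    row = [(0, j) for j in range(banks)]
    column = [(i, 0) for i in range(banks)]
    for name, scheme in (("naive", bank_naive), ("skewed", bank_skewed)):
        print(name,
              "row ok:", conflict_free(row, scheme, banks),
              "column ok:", conflict_free(column, scheme, banks))
```

With the naive scheme a column access hits one bank four times and must be serialized; after the transformation both a row and a column can be fetched in a single parallel access.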
Synthesizable distributed memory architecture...
for a Stream-based Soft Machine
[Figure: block diagram. Labels: Compiler; “instructions”; Scheduler; rDPA; Memory (data memory) built from several memory banks; Sequencers (data stream generator)]
Reconfigurable Computing:
a second programming domain
Migration of programming to the structural domain
The structural domain has become RAM-based
The opportunity to introduce
the structural domain to programmers ...
... to bridge the gap by clever abstraction mechanisms
using a simple new machine paradigm
Algorithmic cleverness
Very high throughput on slow, low-power
FPGAs can be obtained only through algorithmic
cleverness, which is not yet taught by CS & CSE at
universities – an urgent educational problem.
Summary of the Anti Machine Paradigm
• anti language primitives are
almost the same (slightly extended)
• anti machine execution potential
is dramatically more powerful
• provides drastically more flexibility
• not always replacing von Neumann
roadmap
old CS lab course philosophy:
given an application: implement it by a program -/-
new CS freshman lab course environment:
Given an application:
a) implement it by writing a program
b) implement it as a morphware prototype
c) Partition it into P and Q
c.1) implement P by software
c.2) implement Q by morphware
c.3) implement P / Q communication interface
Semiconductor Paradigm Shifts
“Mainstream Silicon Application
is switching every 10 Years”
“The Programmable System-on-a-Chip
is the next wave“
[Figure: Tredennick’s Paradigm Shifts; timeline from 1957 to 2007 in ten-year steps, phases labelled standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; rDPU, rDPA]
hardwired: algorithm: fixed, resources: fixed
instruction stream programming (vN machine paradigm): algorithm: variable, resources: fixed
structural programming, datastream-based operation (anti machine paradigm): algorithm: variable, resources: variable
PACT XPP: Reference Module: XPU128 Co-Processor
[Figure: XPP128 rDPA. Labels: rDPU, PAE core, ALU, Ctrl, CFG; buses not shown. © PACT AG, Munich]
• Full 32 or 24 Bit Design, working silicon
• 2 Configuration Hierarchies
• Evaluation Board available, and
• XDS Development Tool with Simulator
http://pactcorp.com
There are exceptions
GI / ITG Fachgruppe PARS (Parallele Algorithmen,
-Rechnerstrukturen und Systemsoftware)
also turns its attention to Reconfigurable Computing
(keynote at joint 19th PARS / 33rd speed-up workshop
Basel, Switzerland, March 2003)
Arndt Bode (general chair ISCA 2004, Munich)
is now also interested in Reconfigurable Computing
Mask & NRE cost
[ST microelectronics]
MPU designs more complex
new kinds of concurrency are becoming important
chip-level multiprocessing +
simultaneous multithreading
many bugs relate to concurrency issues
greatly complicates the verification process
Steroids for the aging microprocessor:
The Impact of
Reconfigurable
Computing
processor/memory communication bottleneck
vN bottleneck
vN: unbalanced
stolen from Bob Colwell
Throughput vs. Flexibility
coarse grain
goes far beyond
bridging the gap
[Figure: axes: throughput in MOPS / mW (log scale from 0.001 to 1000), feature size in µ (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07), flexibility; curves: hardwired, coarse grain*, FPGAs, von Neumann]
[Diagram (except *) by Hugo De Man]
T. Claasen et al.: ISSCC 1999
*) R. Hartenstein: ISIS 1997
wide variety of speed-up factors
key issue: algorithmic cleverness
platform | application example | speed-up factor | method
PACT Xtreme 4-by-4 array [2003] | 16 tap FIR filter | x16 MOPS/mW | straight forward
MoM anti machine with DPLA* [1983] | grid-based DRC**, 1-metal 1-poly nMOS, 256 reference patterns | > x1000 (computation time) | multiple aspects
*) MPC fabrication via E.I.S. multi university project
**) Design Rule Check
Improving RTL-only design cost model [ITRS* 2001]
*) former SIA
[Figure: design cost curves: RTL methodology only; w. future improvements. Improvements shown: tall thin engineer; small block reuse; large block reuse; IC implementation tools; Intelligent testbench; ES level methodology]
http://public.itrs.net/Files/2001ITRS/Design.pdf
„EDA industry shifts into CS mentality“
[Wojciech Maly]
• Microprogramming to replace FSM design
• Hardware languages replace EE-type schematics
• EDA Software and its interfacing languages
• Newer system level languages like SystemC etc.
• Small and large module re-use
• Hierarchical organization of designs, EDA, et al.
• .....................
Embedded System Design Crisis
the 2nd design crisis
Foundries: Adoption Rate By Process
[Nick Tredennick]
„Pollack‘s Law“ (simplified) [intel]
[Figure: growth factor of area efficiency and of performance over feature size in µm]
Future Trends in Microelectronics
• Predictions require some history
[Gordon Bell]
• Organizations usually behave worse
than anyone can predict [Gordon Bell]
• This especially holds
for research funding
and politics in Europe
[R. H.]
The European Paradigm Shift Paradox
• When major funding agencies and their prominent
advisors exclude an area, it becomes mainstream (e. g. elsewhere)
• the EU commission decided to exclude HDLs from
funding* [CAVE workshop, Nice, France, 1989]
*) after having spent about 100 million ECU 1983-1989
• Similar errors:
Ken Olsen, 1977:
„There is no reason
anyone would want a
computer in their home“
Heinz Nixdorf:
„We won‘t switch
from Mercedes
to Bicycle“
• the EU commission rejected all consortia applications for
funds for research in Reconfigurable Computing [2003]