Coarse Grain Reconfigurable Architectures


VLSI-SoC 2001
IFIP - LIRMM
December 2-4, 2001, Montpellier, France
Reiner Hartenstein
University of Kaiserslautern
Stream-based Arrays: Converging Design Flows for both Reconfigurable and Hardwired ....
>> Stream-based Computing
Xputer Lab
University of Kaiserslautern
• Stream-based Computing
• Stream-based Compilation Techniques
• Use in Co-Design
• Now it’s up to You !
http://www.uni-kl.de
© 2001, [email protected]
http://www.fpl.uni-kl.de
rDPA (coarse grain) becoming important
commercial rDPAs:
• XPU family (IP cores): PACT Corp., Munich
• CALISTO: Silicon Spice **
• CS2000 family: Chameleon Systems
• MECA family: Malleable **
• flexible array: MorphICs
• ACM: Quicksilver Tech
• CHESS array: Elixent
• MorphoSys: Morpho Tech
• FIPSOC: SIDSA
**) bought
[Figure: XPU128]
SNN filter Example: KressArray Family
http://kressarray.de
You may use it on your Netscape
KressArray Xplorer: array size: 10 x 16 = 160 rDPUs
Legend:
• rDPU not used
• rDPU used for routing only (rout thru only)
• operator and routing
• backbus connect
• port location marker
• not used
Rapidly toward the Break-through
• replace Concurrent Processes by more efficient parallelism:
  stream-based DPAs
  stream-based rDPAs
• Kress: a generalization of systolic array synthesis: super systolic synthesis
  (Generalization of the Systolic Array)
terms:
DPU: datapath unit
DPA: datapath array
rDPU: reconfigurable DPU
rDPA: reconfigurable DPA
compare Concurrent Computing
[Figure: several CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or switch box: extremely inefficient]
• massive bottleneck phenomena at run time
• control flow overhead
• instruction fetch / interpretation overhead
• address computation overhead - may be massive
... with Stream-based Computing: (r)DPA
• “instruction fetch”: at compile time
• driven by data stream from / to memory or from / to peripheral interface
• no instruction sequencer inside !
• transport-triggered execution
for both,
• reconfigurable, and
• hardwired [Brodersen]
[Figure: array of DPUs]
avoids run time overhead and bottleneck phenomena
rDPA: drastically reduced reconfigurability overhead
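A minimal Python sketch of the idea on this slide, not the KressArray implementation: each DPU is modeled as a generator whose operator is fixed before run time and which fires whenever an operand arrives. No program counter or instruction fetch appears in the run-time loop. The names `dpu` and `configure_pipeline`, and the chosen operators, are illustrative assumptions.

```python
from typing import Callable, Iterable, Iterator


def dpu(op: Callable[[int], int], stream: Iterable[int]) -> Iterator[int]:
    """Model of one DPU: a fixed operator applied to every arriving operand.

    The operator is chosen before run time (the "configuration"); at run
    time the unit is driven only by the arrival of data.
    """
    for operand in stream:
        yield op(operand)


def configure_pipeline(source: Iterable[int]) -> Iterator[int]:
    """Wire three hypothetical DPUs into a chain computing (x * 3 + 1) >> 1."""
    scaled = dpu(lambda x: x * 3, source)    # DPU 1: multiply
    offset = dpu(lambda x: x + 1, scaled)    # DPU 2: add
    return dpu(lambda x: x >> 1, offset)     # DPU 3: shift right


# The data stream alone drives execution through the configured chain.
result = list(configure_pipeline(range(5)))
print(result)  # [0, 2, 3, 5, 6]
```

Changing the computation here means rewiring `configure_pipeline`, which stands in for reconfiguration; the per-operand work never consults an instruction stream.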
Soft rDPA ?
[Figure: soft rDPA; labels: HLL Compiler, soft CPU, Memory, miscellaneous]
• 50 million system gates soon
• even large rDPAs as soft IPs become feasible
• by >2005: don’t care about area efficiency ?
>> Stream-based Compilation Techniques
Systolic Stream-based Computing System
Systolic Array [H. T. Kung, 1980]: a DPA (Data Path Array)
[Figure: systolic array of DPUs, DPU architecture with * and + operators; input data streams x1, x2, x3 and a11 ... a33 (padded with "-"), initial values y1(0), y2(0), y3(0), output stream y1, y2, y3]
• equations → linear projection or algebraic mapping → systolic arrays
• placement, data streams
• computing in space or computing in time: migration by re-timing and other transformations
• this dichotomy is completely ignored by our CS curricula
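The stream labels on the systolic slide (x1 ... x3, a11 ... a33, y1(0) ... y3(0)) suggest a matrix-vector product. The following Python sketch simulates one possible linear arrangement under that assumption: each cell keeps one result y[i], the x stream moves one cell per step, and the matching coefficient arrives skewed in time. The function name and the choice of keeping y stationary are assumptions made for illustration; the arrangement in the original figure may differ.

```python
from typing import List, Optional


def systolic_matvec(a: List[List[int]], x: List[int],
                    y0: List[int]) -> List[int]:
    """Simulate a linear array of multiply-add DPUs computing y = y0 + A x.

    Cell i keeps y[i]. The x values enter at cell 0 and move one cell to
    the right per step, so cell i sees x[j] at step i + j, together with
    the time-skewed coefficient a[i][j].
    """
    n, m = len(a), len(x)
    y = list(y0)
    x_regs: List[Optional[int]] = [None] * n   # x value held in each cell
    for step in range(n + m - 1):
        # Shift the x stream right; feed the next x or an empty slot.
        x_regs = [x[step] if step < m else None] + x_regs[:-1]
        for i in range(n):
            if x_regs[i] is not None:
                j = step - i                   # index of the x now in cell i
                y[i] += a[i][j] * x_regs[i]    # the DPU's * and + operators
    return y


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(systolic_matvec(A, [1, 0, 2], [0, 0, 0]))  # [7, 16, 25]
```

All cells work in the same step on different data, which is the "computing in space" side of the dichotomy; a single multiply-add unit looping over i and j would be the "computing in time" side.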
General Stream-based Computing System
heterogeneous DPA or rDPA
[Figure: expression tree with operators *, +, -, sh, xf mapped onto an array of DPUs; labels: DPU architectures, expression tree, simultaneous placement & routing, data streams, Mapper, Scheduler]
Memory Communication Architecture …
• hot research topic in embedded systems
• storage context transformations [Catthoor, Herz, Kougia, Soudris]
• Synthesizable Memory Communication Architecture
• an example by Nageldinger’s KressArray Xplorer
• startups provide memory IPs or generators
• GAG: generic sequencer methodology available
• Optimized Parallel Memory Controller
Legend: application; sequencers; memory ports; not used
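The address computation overhead listed on the Concurrent Computing slide is what a data sequencer takes out of the datapath: addresses come from a parameterized counter rather than from instructions. A minimal sketch, assuming a simple row-major scan over a 2-D window; the function name and the parameters `base`, `row_stride`, `rows`, `cols`, and `step` are hypothetical and are not the GAG's actual parameter set.

```python
from typing import Iterator


def scan_addresses(base: int, row_stride: int, rows: int, cols: int,
                   step: int = 1) -> Iterator[int]:
    """Yield memory addresses for a row-major scan of a rows x cols window.

    The scan pattern is fixed by the parameters before run time; at run
    time the generator only counts.
    """
    for r in range(rows):
        for c in range(0, cols, step):
            yield base + r * row_stride + c


addresses = list(scan_addresses(base=100, row_stride=16, rows=2, cols=4))
print(addresses)  # [100, 101, 102, 103, 116, 117, 118, 119]
```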
>> Use in Co-Design
“von Neumann” Computer: the wrong Machine Paradigm
“von Neumann” Computer:
• Datapath (hardwired) and Sequencer (program counter, state register), tightly coupled by compact instruction code
• Compiler, Memory
• does not support soft data paths
Xputer: The Soft Machine Paradigm
• Datapath Array (reconfigurable, also for hardwired [Brodersen]) and (multiple) sequencer with data counter(s), loosely coupled by decision data bits only
• Compiler, Scheduler, Memory
• enabling technology: published 10 years ago, now a hot topic area
• full day course last week at Tampere, Finland
Co-Compilation
Jürgen Becker’s Co-DE-X Co-Compiler [ASP-DAC’95]
[Figure: co-compilation flow, supporting different platforms]
• X-C: high level programming language source
• Partitioner with Analyzer / Profiler, driven by Resource Parameters
• partitioning: GNU C compiler → Software running on µProcessor (Computer Machine Paradigm)
• X-C compiler, DPSS → Configware running on Reconfigurable KressArray Accelerators (Xputer “Soft” Machine Paradigm)
• interface
Loop Transformation Examples
resource parameter driven Co-Compilation
sequential processes:
  loop 1-16
    body
  endloop
strip mining:
  fork
    loop 1-8  body endloop
    loop 9-16 body endloop
  join
loop unrolling:
  loop 1-8
    body
    body
  endloop
host:
  loop 1-8 trigger endloop
  loop 1-4 trigger endloop
  loop 1-2 trigger endloop
reconf. array: executes the transformed loop bodies
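The two transformations named on this slide can be sketched in Python as follows. This is only an illustration of strip mining and loop unrolling on a 16-iteration loop, not the Co-Compiler's output; `body` is a hypothetical side-effect-free loop body, and the strips run one after another here where a real target would run them on separate units.

```python
from typing import List


def body(i: int) -> int:
    """Hypothetical loop body: any side-effect-free function of the index."""
    return i * i


def sequential(n: int = 16) -> List[int]:
    return [body(i) for i in range(1, n + 1)]           # loop 1-16


def strip_mined(n: int = 16, strips: int = 2) -> List[int]:
    """Split the index range into strips (assumes n divisible by strips)."""
    size = n // strips
    partial = []
    for s in range(strips):                             # fork
        lo = s * size + 1
        partial.append([body(i) for i in range(lo, lo + size)])
    return [y for strip in partial for y in strip]      # join


def unrolled(n: int = 16) -> List[int]:
    """Unroll by two (assumes n even): half the iterations, two bodies each."""
    out = []
    for k in range(1, n // 2 + 1):                      # loop 1-8
        out.append(body(2 * k - 1))                     # body
        out.append(body(2 * k))                         # body
    return out


print(sequential() == strip_mined() == unrolled())  # True
```

Both transformed versions produce the same results as the sequential loop; which one fits depends on how many datapath units the resource parameters make available.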
>> Now it’s up to You !
… However, current CS Education is based on the Submarine Model
[Figure: Submarine Model: Algorithm → procedural high level Programming Language → Assembly Language → Hardware]
• Hardware invisible: under the surface
• Brain usage: procedural-only
This model disables ...
... this model disables
Hardware and Software as Alternatives
[Figure: Algorithm → partitioning → Software (procedural) and Hardware, Configware (structural)]
• alternatives: Software only; Software & Hardw/Configw; Hardw/Configw only
• Brain Usage: both Hemispheres
The Dominance of the Submarine Model ...
[Figure: Submarine Model: (procedural) above the surface, Hardware below: structurally disabled]
... indicates that our CS education system produces zillions of mentally disabled Persons
… completely disabled to cope with solutions other than software only
It’s time to attack the software faculty dictatorship. Get involved!
>>> thank you
thank you for listening
It’s up to You !
>>> END
The Impact of Reconfigurable Logic
• Reconfigurable platforms bring a new dimension to digital system development and have a strong impact on SoC design.
• Flexibility promises turn-around times down to minutes instead of months for real time in-system debugging, profiling, verification, tuning, field-maintenance, and field upgrades.
• A rapidly growing large user base of HDL-savvy designers with FPGA experience.
• A New Business Model (in-field debugging and upgrading ... )
• A Fundamental Paradigm Shift in Silicon Application
[Figure [T. Kean]: Revenue / month versus Time / months (1, 10, 20, 30) for ASIC Product, reconfigurable Product, and Product with download (Update 1, Update 2)]
The History of Paradigm Shifts
“Mainstream Silicon Application is switching every 10 Years”
“The Programmable System-on-a-Chip is the next wave”
[Figure: wave diagram over 1957, 1967, 1977, 1987, 1997, 2007, alternating between standard (TTL; µproc., memory) and custom (LSI, MSI; ASICs, accel’s); the following waves are marked “?”]
How’s next Wave ?
[Figure: Hartenstein’s Curve over 1957, 1967, 1977, 1987, 1997, 2007, alternating between standard and custom; FPGAs, Coarse grain RAs; no further wave !]
Tredennick’s Paradigm Shifts:
• hardwired: algorithm: fixed, resources: fixed
• procedural programming: algorithm: variable, resources: fixed
• structural programming: algorithm: variable, resources: variable
The Impact of Makimoto’s Paradigm Shifts
Dr. Makimoto: FPL 2000 keynote
[Figure: wave diagram over 1957, 1967, 1977, 1987, 1997, 2007, alternating between standard (TTL; µproc., memory) and custom (LSI, MSI; ASICs, accel’s)]
• Software Industry’s Secret of Success: procedural personalization via RAM-based Machine Paradigm
• Configware Success Story by new Machine Paradigm: structural personalization, RAM-based, before run time
The History of Paradigm Shifts
“Mainstream Silicon Application is switching every 10 Years”
[Figure: wave diagram over 1957, 1967, 1977, 1987, 1997, 2007, alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; FPGAs; coarse grain]
KressArray Family generic Fabrics: a few examples
• Select Function Repertory
• Select mode, number, width of NNports (figure labels: 2, 4, 8, 16, 24, 32)
• rDPU: Wired by Abutment
• select Nearest Neighbour (NN) Interconnect: an example
• more NNports: rich Rout Resources
• rDPU use: rout-through only; rout-through and function
• Examples of 2nd Level Interconnect: laid out over rDPU cell, no separate routing areas !