Coarse Grain Reconfigurable Architectures

10th Reconfigurable Architectures Workshop (RAW 2003)
Nice, France, April 22, 2003
(invited paper)

Reiner Hartenstein, IEEE Fellow
Kaiserslautern University of Technology

Are we really ready for the break-through?
>> history & terminology
Xputer Lab
University of Kaiserslautern
http://www.uni-kl.de
• history & terminology
• skyrocketing requirements
• destructive von Neumann monopoly
• high mask cost
• low battery capacity
• new compilation model
• conclusions
© 2003, [email protected]
http://hartenstein.de
Semiconductor Revolutions
“Mainstream Silicon Application is switching every 10 Years”
[Figure: Makimoto's wave, 1957 to 2007, alternating between standard and custom silicon: TTL; LSI, MSI; µproc., memory; ASICs, accel’s. Annotations: 1st design crisis, new breed needed; 2nd design crisis, new breed (M&C); software people vs. hardware people]
Terminology: DPU versus CPU ...
• DPU: data path unit
• DPA: DPU array
• GA: gate array
• rDPU: reconfigurable DPU
• rDPA: reconfigurable DPA
• rGA: reconfigurable GA
• A DPU is not a CPU: it has no instruction sequencer, so there is nothing central, just as in a DPA.
[Figure: CPU = DPU + instruction sequencer, compared with an (r)DPU and an (r)DPA]
Digital System Platforms clearly distinguished (1)
platform | program source running on it | machine paradigm
hardware | (not programmable) | none
morphware, fine grain: rGA (FPGA) | configware | none
morphware, coarse grain: rDPU, rDPA | configware | none
reconfigurable data stream processor | flowware & configware | anti machine
data stream processor (hardwired) | flowware | anti machine
instruction stream processor | software | von Neumann machine
flowware defines ...
... which data item at which time at which port.
Flowware manipulates the data counter(s) ... software manipulates the program counter.
[Figure: a DPA fed by time-staggered input data streams and producing output data streams; each stream plotted as time vs. port #]
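Here is a minimal sketch of what “which data item at which time at which port” can mean. The Python below builds staggered input streams like those in the figure. It assumes, as in many systolic arrays, that port i receives its data delayed by i clock cycles. The function name and the use of None for idle slots are illustrative choices, not part of the talk.

```python
from typing import List, Optional, Sequence

def skewed_streams(rows: Sequence[Sequence[int]]) -> List[List[Optional[int]]]:
    """Build one input data stream per DPA port (a flowware schedule).

    Port i receives rows[i] delayed by i clock cycles; None marks an idle
    slot (a '-' in the figure). streams[port][t] is the item at time t.
    """
    length = max((len(row) + port for port, row in enumerate(rows)), default=0)
    streams = []
    for port, row in enumerate(rows):
        lead = [None] * port                         # delay grows with port #
        tail = [None] * (length - port - len(row))   # pad to a common length
        streams.append(lead + list(row) + tail)
    return streams

if __name__ == "__main__":
    for port, s in enumerate(skewed_streams([[1, 2, 3], [4, 5, 6], [7, 8, 9]])):
        print(port, " ".join("-" if x is None else str(x) for x in s))
```

For three ports, the script prints the rows "0 1 2 3 - -", "1 - 4 5 6 -" and "2 - - 7 8 9". The time is fixed by data counters stepping through the rows; no program counter is involved.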
History of data-streams
1980: data streams (Kung, Leiserson)
1995: super systolic rDPA (Kress)
1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
(tutorials and courses available on all this)
>> skyrocketing requirements
What are the Challenges? (1)
[ST microelectronics, MorphICs, Dataquest, eASIC]
[Figure: growth factor vs. time in months]
Changing Models of Computing
[Figure: software design: procedural software downloaded into the RAM of a “von Neumann” machine (instruction sequencer, data path, I/O); hardware/software co-design: software downloaded to a host, CAD for hardwired accelerator(s)]
The problem with typical CS people:
- the dominance of von Neumann
- they cannot partition
- they cannot migrate
Hardware people needed.
>> destructive von Neumann monopoly
Which machine paradigm?
von Neumann does not support morphware
What about CS people?
[Figure: Makimoto's wave timeline (1957 TTL; 1967 LSI, MSI; 1977 µproc., memory; 1987 ASICs, accel’s; 1997 FPGAs; 2007 soft CPUs, coarse grain), annotated with the topics of CS people: procedural programming languages, compiler, computer architecture]
Flagship example: annual IEEE ISCA conference series
Statistics [David Padua, John Hennessy, et al.]
• the Dataflow Machine is dead
• vN Parallelism: Resignation?
• Interconnect Fabrics: taken over by the opposition: Reconfigurable Computing
There are more Levels of Parallelism
• Process Level
• Loop Level (data-stream-based, pipe nets, etc.)
• Instruction Level (VLIW etc.)
• RT Level (special architectures etc.)
• Logic Level (FPGAs)
What are the Challenges? (2)
[ST microelectronics, MorphICs, Dataquest, eASIC]
[Figure: growth factor vs. time (months, years); annotation: 90% by 2010 *)]
*) Department of Trade and Industry, London
Changing Models of Computing
[Figure: three models: software design (procedural software downloaded into the RAM of a “von Neumann” machine); hardware/software co-design (software on a host, CAD for hardwired accelerator(s)); hardware/configware/software co-design (software on a host, structural configware downloaded into reconfigurable accelerator(s) built from morphware)]
No von Neumann bottleneck?
typical CS people:
• how to provide more performance to these people?
• they think in terms of machine models: sequencing instruction by instruction
• they cannot be turned into hardware people
• a new machine paradigm is needed which does not have a von Neumann bottleneck
• the anti machine has no von Neumann bottleneck
• data streams instead of an instruction stream
• flowware instead of software
Matter & Antimatter
The World of Matter: machine paradigm: the Atom
The World of Anti Matter: machine paradigm: the Anti Atom
[Figure: atom and anti atom with opposite charges]
Matter & Antimatter of Informatics: Anti Machine paradigm
[Figure: CPU (matter) vs. DPU (antimatter)]
DPU: nothing central!
heavy anti atoms: DPA = DPU array
flowware: data streams spinning around
[Figure: a DPA built from an array of DPUs]
Machine paradigms
[Figure: von Neumann instruction stream machine: a CPU (instruction sequencer + DPU) runs software fetched as an instruction stream from memory (M), with I/O. Data-stream machine: a DPU or rDPU, or an (r)DPA set up by configware, receives data streams from an embedded memory architecture* of asM* blocks (M) and I/O, sequenced by flowware in a data sequencer (data address generator)]
Just in time
The new distributed memory discipline:
just in time to implement the anti machine.
[3] M. Herz et al. (invited): Memory Organization for
Data-Stream-based Reconfigurable Computing; Proc. ICECS 2002
Machine Paradigms
machine category | Computer (the Machine: “v. Neumann”) | The Anti Machine (also hardwired implementations*)
driven by | instruction streams | data streams (no “dataflow”)
engine principles | instruction sequencing | sequencing data streams
state register | single program counter | (multiple) data counter(s)
communication path set-up | at run time (“instruction fetch”) | at load time
resource | DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.
operation | sequential | parallel pipe network etc.
*) e.g. Bee project, Prof. Brodersen
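This hedged sketch makes the contrast in the table concrete by writing one computation, an FIR filter, in both styles. The von Neumann version steps a program counter through nested loops. The pipe-network version assumes one DPU per filter weight. It receives one data item per clock and passes partial sums along pipeline registers. This is the familiar transposed-form FIR, chosen here for illustration; it is not taken from the slides, and the function names are hypothetical.

```python
from typing import List, Sequence

def fir_von_neumann(weights: Sequence[int], xs: Sequence[int]) -> List[int]:
    """Instruction-stream style: a program counter walks nested loops."""
    ys = []
    for n in range(len(xs)):
        acc = 0
        for k, w in enumerate(weights):
            if n - k >= 0:
                acc += w * xs[n - k]
        ys.append(acc)
    return ys

def fir_pipe_network(weights: Sequence[int], xs: Sequence[int]) -> List[int]:
    """Data-stream style: one DPU per weight, partial sums in pipeline registers."""
    if not weights:
        raise ValueError("need at least one weight (one DPU)")
    k = len(weights)
    regs = [0] * (k - 1)                     # registers between neighbouring DPUs
    ys = []
    for x in xs:                             # one input item per clock cycle
        products = [w * x for w in weights]  # all DPUs work in parallel
        ys.append(products[0] + (regs[0] if regs else 0))
        # each register takes the next DPU's product plus the partial sum
        # behind it; the right-hand side still reads the old register values
        regs = [products[i + 1] + (regs[i + 1] if i + 1 < k - 1 else 0)
                for i in range(k - 1)]
    return ys

if __name__ == "__main__":
    w, x = [1, 2, 3], [1, 2, 3, 4]
    print(fir_pipe_network(w, x))   # [1, 4, 10, 16]
    print(fir_von_neumann(w, x))    # [1, 4, 10, 16]
```

Both functions compute the same outputs. In the pipe network, the operator placement is fixed once (at load time), and only data moves at run time.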
>> high mask cost
What are the Challenges? (3)
[ST microelectronics, MorphICs, Dataquest, eASIC]
Avoid application-specific silicon!
[Figure: growth factor vs. time (months, years)]
*) Department of Trade and Industry, London
Coarse grain vs. Fine grain
Reconfigurability:
fine grain (FPGAs, rGAs)
coarse grain (PACT AG, Munich)
multi grain (e.g. by slice bundling)
Throughput vs. Efficiency
[Figure: efficiency in MOPS/mW (0.001 to 1000) vs. feature size (2 µ down to 0.07 µ) for S and L data points; annotations: area used by application, resources needed for reconfigurability, 1 Bit CLB, wiring by abutment: 32 Bit example. Sources: T. Claasen et al., ISSCC 1999; *) R. Hartenstein, ISIS 1997]
Throughput vs. Flexibility
coarse grain goes far beyond bridging the gap
[Figure: throughput (MOPS/mW, 0.001 to 1000) vs. flexibility over feature sizes from 2 µ to 0.07 µ: hardwired, coarse grain, FPGAs, von Neumann. Sources: T. Claasen et al., ISSCC 1999; *) R. Hartenstein, ISIS 1997]
PACT XPP: Reference Module: XPU128 Co-Processor [Jürgen Becker, Univ. Karlsruhe]
[Figure: XPP128 ALU-Array; an ALU-PAE with ALU, Ctrl and CFG in the PAE core]
• 2 × PACs (Cluster)
• 128 × ALU-PAEs
• 32 × 1 Kbyte RAM-PAEs
• 8 × I/O Elements
• Full 32 or 24 Bit Design
• 2 Configuration Hierarchies
• Evaluation Board (2001)
• XDS Development Tool with Simulator
• PAE Core is 32- or 24-Bit ALU with DSP-Instruction Set and Controller
• Connections: Inputs + Outputs (Channels) + Events
>> low battery capacity
What are the Challenges? (4)
[ST microelectronics, MorphICs, Dataquest, eASIC]
[Figure: growth factor vs. time (months, years); annotation: Battery capacity (1.03/year)]
*) Department of Trade and Industry, London
Algorithmic cleverness
Very high throughput on low-power, slow FPGAs may be obtained only by algorithmic cleverness, which is not yet taught by CS & CSE at universities: an urgent educational problem.
>> new compilation model
What are the Challenges? (5)
[ST microelectronics, MorphICs, Dataquest, eASIC]
New compilation techniques needed, supported by a new machine paradigm!
[Figure: growth factor vs. time (months, years); annotation: Battery capacity (1.03/year)]
*) Department of Trade and Industry, London
computing paradigms and methodologies
1946: machine paradigm (von Neumann)
1980: data streams (Kung, Leiserson)
1989: anti machine paradigm introduced
1990: anti machine implementation methodology
1990: rDPU (Rabaey)
1994: anti machine high level programming language
1995: super systolic rDPA (Kress)
1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
1997: configware / software partitioning compiler (Becker)
2000: generator for rDPA with high memory bandwidth
(tutorials and courses available on all this)
Configware / Flowware Compilation
[Figure: a high level source program passes through a wrapper into an intermediate form; the mapper generates configware for the rDPA (reconfigurable Data Path Array), and the scheduler generates flowware for the data sequencer, whose address generators produce the data streams between the memory blocks (M) and the rDPA]
Tredennick’s Paradigm Shifts
[Figure: Makimoto's wave timeline (1957 TTL; 1967 LSI, MSI; 1977 µproc., memory; 1987 ASICs, accel’s; 1997; 2007), alternating standard and custom]
• hardwired: algorithm fixed, resources fixed
• procedural programming: algorithm variable, resources fixed (vN machine paradigm)
• structural programming: algorithm variable, resources variable, 2 sources (new machine paradigm needed)
>> conclusions
Conclusion
No, we are not ready for the break-through, since our computing education is obsolete because of the von Neumann monopoly.
But all ingredients are available to jazz up our CS & CSE curricula.
>>> thank you
thank you for your patience
The Dominance of the Submarine Model ...
[Figure: procedural software above the surface, hardware below it: structurally disabled]
... indicates that our CS education system produces zillions of mentally disabled persons,
... completely disabled to cope with solutions other than software only.
It’s time to attack the software faculty dictatorship. Get involved!
>>> END
>>> Appendix
Appendix for discussion
© 2001, [email protected]
http://KressArray.de
Jürgen Becker’s Co-DE-X Co-Compiler, supporting platform-based design
X-C is C language extended by MoPL.
[Figure: an X-C program is split by a Partitioner with Analyzer/Profiler between the Computer machine paradigm (GNU C compiler → Software for the Host) and the Xputer machine paradigm (X-C compiler → Configware for the KressArray); DPSS supports different platforms via Resource Parameters]
http://www.fpl.uni-kl.de
CS Education ...
... However, current CS education is based on the Submarine Model.
This model disables ...
[Figure: Algorithm → procedural high level Programming Language → Assembly Language; Hardware invisible: under the surface. Brain usage: procedural-only]
Hardware and Software as Alternatives
[Figure: an Algorithm is partitioned into a procedural part (Software) and a structural part (Hardware, Configware): Software only, Software & Hardw/Configw, or Hardw/Configw only. Brain Usage: both Hemispheres]
Impact of Makimoto’s wave
Software Industry’s Secret of Success: procedural personalization via a RAM-based Machine Paradigm
Configware Industry: structural personalization, RAM-based, before run time
Repeat the Success Story by a new Machine Paradigm!
[Figure: Makimoto's wave timeline (1957 TTL; 1967 LSI, MSI; 1977 µproc., memory; 1987 ASICs, accel’s; 1997; 2007), alternating standard and custom; personalization (CAD) before fabrication]
scalability
The Scalability Problem
The routing congestion problem grows with the size of the FPGA.
Structured Configware Design (Mead & Conway Revival)
SNN filter KressArray Mapping Example
[Figure: array size 10 × 16 = 160 rDPUs; legend: rDPU not used; used for routing only (route thru only); operator and routing; port location marker; backbus connect]
Conclusion: all knowledge needed is available
courses / embedded tutorials / full day courses on:
• machine paradigm
• languages
• compilation techniques
• anti architectural resources
• sequencing methodology: hw & sw
• hw / sw partitioning methodology
• parallel memory IP core and module generator vendors
• anything else needed
... has a chance
Configware Industry has a Chance
Conclusions
• the anti machine is the way to go for massive parallelism, also for data-intensive applications
• reconfigurable anti machine for high performance with short product life cycles and unstable standards
• reconfigurable for low cost, low volume production
• spare part problem: needs new infrastructures
• Giga FPGAs are highly promising, but only with a new design flow: configware could repeat the success of the software industry
Paradigm Shifts: Nick Tredennick’s view
why 2 program sources?
• instruction-stream-based computing (programmable): algorithms variable, resources fixed
• reconfigurable computing: algorithms variable, resources variable
Compilation for (r)DPA of anti machine
[Figure: a high level source program (software notation) passes through a wrapper into an expression tree; the mapper, using parameters and a DPU library, generates configware for the morphware, and the scheduler with code generators generates flowware (streamware)]
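The scheduling step can be illustrated, under strong simplifications, by ASAP (as soon as possible) levelization of the expression tree. Each operator is placed in the earliest pipeline stage its operands allow. This is a standard textbook technique swapped in for illustration. It is not the KressArray mapper or scheduler, and it does not model placement, routing or the DPU library. Node and asap_stages are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    """Expression-tree node: an operator with operands, or a leaf variable."""
    op: str
    args: List["Node"] = field(default_factory=list)

def asap_stages(root: Node) -> Tuple[int, Dict[int, List[str]]]:
    """Assign every operator the earliest pipeline stage its operands allow.

    Leaves (input data streams) are stage 0; an operator sits one stage after
    its latest operand. Returns the pipeline depth and the operators per stage.
    """
    stages: Dict[int, List[str]] = {}

    def visit(node: Node) -> int:
        if not node.args:
            return 0
        stage = 1 + max(visit(a) for a in node.args)
        stages.setdefault(stage, []).append(node.op)
        return stage

    return visit(root), stages

if __name__ == "__main__":
    # y = (a * b) + (c * d): two multipliers in stage 1, one adder in stage 2
    expr = Node("+", [Node("*", [Node("a"), Node("b")]),
                      Node("*", [Node("c"), Node("d")])])
    print(asap_stages(expr))   # (2, {1: ['*', '*'], 2: ['+']})
```

Each stage would become one column of DPUs in a pipe network. In a real data-stream schedule, the flowware would then time the operand streams so that each stage receives its inputs in the same clock cycle.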
Misleading predictors
Moore's Law is becoming a misleading
predictor of future developments.
High mask cost
High mask cost may be avoided completely by using morphware, or partly by GAs (ASICs).
Fault tolerance
Morphware is the only way to
obtain fault-tolerant ICs.
World-wide services
FPGAs may provide an important benefit for world-wide services and all other after-sales consequences.
“Re-configurable Hardware”??
Terminology has been highly confusing: “Re-configurable Hardware”??
This “Hardware” is not hard! It’s Morphware.
We need a concise terminology: a consensus is on the way.
What are the Challenges?
[ST microelectronics, MorphICs, Dataquest, eASIC]
[Figure: growth factor vs. time (months, years); annotation: Battery capacity (1.03/year)]
*) Department of Trade and Industry, London
What are the Challenges?
[ST microelectronics, MorphICs, Dataquest, eASIC]
• design complexity: +40%/year, doubling every 2 years [SIA roadmap]
• design productivity: +15%/year, doubling every 5 years [SIA roadmap]
[Figure: growth factor vs. time (months, years); annotation: Battery capacity (1.03/year)]
*) Department of Trade and Industry, London
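The doubling periods quoted above follow from the annual growth rates via t = ln 2 / ln(1 + r). Here is a small Python check, assuming steady compound growth; the function name is illustrative.

```python
import math

def doubling_time(rate_per_year: float) -> float:
    """Years until a quantity growing by rate_per_year (0.40 = +40%) doubles."""
    return math.log(2) / math.log1p(rate_per_year)

if __name__ == "__main__":
    for label, rate in [("design complexity", 0.40),
                        ("design productivity", 0.15),
                        ("battery capacity", 0.03)]:
        print(f"{label}: +{rate:.0%}/year -> doubles in {doubling_time(rate):.1f} years")
```

With these rates, the script reproduces the roughly 2-year and 5-year figures of the SIA roadmap (2.1 and 5.0 years). Battery capacity at 1.03/year doubles only about every 23 years (23.4).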
>> Outline
• Morphware
• Changing Models by SoC Development
• New Machine Paradigm needed
• The Dichotomy of Paradigms
• Outlook
The Morphware Market
• coarse-grained: rDPUs: configurable functional blocks (e.g. PACT AG, Munich, Germany, http://pactcorp.com)
• fine-grained: CLBs, rLBs: configurable logic blocks
Top 4 PLD Manufacturers 2000 (total: $3.7 billion): Xilinx 42%, Altera 37%, Lattice 15%, Actel 6%
• [Dataquest] > $7 billion by 2003
• fastest growing semiconductor market segment
• PLD vendors and their alliances provide libraries of “soft IPs”: the Configware Market
Xplorer Plot: SNN Filter Example [13]
http://kressarray.de
[Figure: KressArray mapping with 2 horizontal and 3 vertical 32-bit nearest-neighbour (NN) ports; legend: operator, operand, result, route thru, route-thru-only rDPU, backbus connect]
Morphware only: some soft CPU core examples
core | architecture | platform
MicroBlaze | 32 bit standard RISC, 32 reg. by 32, LUT-RAM-based reg., 125 MHz, 70 D-MIPS | Xilinx, up to 100 on one FPGA
Nios | 16-bit or 32-bit instr. set, 50 MHz | Altera Mercury
Leon | SPARC, 25 MHz |
ARM7 clone | ARM |
uP1232 | 8-bit CISC, 32 reg. | 200 XC4000E CLBs
REGIS | 8 bits instr. + ext. ROM | 2 Xilinx 3020 LCA
Reliance-1 | 12 bit DSP | Lattice 4 isp30256, 4 isp1016
1Popcorn-1 | 8 bit CISC | Altera, Lattice, Xilinx
gr1040 | 16-bit |
gr1050 | 32-bit |
My80 | i8080A | FLEX10K30 or EPF6016
YARD-1A | 16-bit RISC, 2 opd. instr. | old Xilinx FPGA board
DSPuva16 | 16 bit DSP | Spartan-II
xr16 | RISC integer C | SpartanXL
Acorn-1 | | 1 Flex 10K20
soft CPUs in academic teaching
• UCSC: 1990!
• Mälardalen University
• Chalmers University
• Cornell University
• Gray Research
• Georgia Tech
• Hiroshima City Univ.
• Michigan State
• Univ. de Valladolid
• Virginia Tech
• Washington U. St. Louis
• New Mexico Tech
• UC Riverside
• Tokai University
>> New Machine Paradigm needed
>> The Dichotomy of Paradigms
>> Outlook
Why fine grain?
• no specific silicon: low production volume (aerospace, automotive, military, industrial controllers, et al.)
• the spare part problem
• design flow
• coming Giga-FPGA
Configware Industry vs. Software Industry
can configware industry repeat the success story?
• RAM-based
• Compatibility
• Scalability
• Education problems
Problems of Parallelism
enormous speed-ups: factor of 3 to >10 000
Software to FPGA migration: algorithmic cleverness missing, no education; no methodology for interconnect estimation
Software to rDPA migration: methodology only in special areas (DSP, wireless, ...)
The area of parallel algorithms needs a complete re-orientation of its scope ... far beyond traditional platforms
Evolution of FPGA and its design flow [à la S. Guccione]
[Figure: from Schematics/HDL → Netlister → Netlist → Place and Route → Bitstream for an FPGA, to FPGA cores combined with CPU and memory cores and interfaces (User Code → HLL Compiler → Executable), to soft rDPA cores with HLL compilers, as soon as a Giga FPGA is available]
ASIC emulation
• ASIC emulation / Rapid Prototyping: to replace simulation
• Quickturn (Cadence), IKOS (Synopsys), Celaro (Mentor)
• hours of compilation run: inefficient since netlist-based ...
• ... ASIC emulators will become obsolete soon
• by RTR: in-circuit execution debugging instead of emulation
• new business model: upgradable morphware is the product
• emulation for solving the spare part problem in many areas
Nasty Matter: the wrong machine paradigm
[Figure: CPU = instruction sequencer + data path, with RAM; central von Neumann bottleneck; reconfigurable?]
extremely power hungry and area inefficient:
• Instruction Fetch Overhead
• Address Computation Overhead
Matter vs. Antimatter: CPU vs. DPU
[Figure: CPU (instruction sequencer + data path) driven by an instruction stream, vs. DPU (Data Path Unit) driven by data streams]
CPU: RAM-based
+ simple machine paradigm
+ scalability
+ relocatability
+ compatibility
= secret of success of the software industry
[Figure: CPU = instruction sequencer + data path, with RAM]
Success Factors
property | instruction stream based | data stream based: reconfigurable, fine grain (FPGA) | data stream based: reconfigurable, coarse grain | data stream based: hardwired
RAM-based | yes | yes | yes | (hardwired)
machine paradigm | yes | no (available**) | available | available
compatibility | yes | limited (feasible**) | feasible | feasible
scalability | yes | no (good**) | good* | (hardwired)
code relocatability | yes | no (good**) | good* | (hardwired)
*) if KressArray used
**) mapping coarse grain onto FPGA
• instruction stream based: the success of the software industry
• for the configware industry, what is missing: FPGA compatibility, fully scalable FPGAs, relocatable configuration code
• rDPUs and rDPAs do much better than FPGAs
>>> Problems with Concurrency
• The Computer Architecture Crisis
• The Impact of Reconfigurable Platforms
• The Dichotomy of Models
• Parallelism
• Conclusions
Parallelism by Concurrency
[Figure: several CPUs (instruction sequencer + data path), each driven by its own independent instruction stream, coupled by bus(es) or a switch box]
• independent instruction streams
• difficult coordination
• massive run time overhead
>> The Dominance of Embedded Systems
Summary of the Anti Machine Paradigm
• anti language primitives are
almost the same (slightly extended)
• anti machine execution potential
is dramatically more powerful
• provides drastically more flexibility
• not always replacing von Neumann
JPEG zigzag scan pattern
[Figure: pixel map traversed by data counters in x and y; scan segments numbered 1 to 4]

    *> Declarations
    goto PixMap[1,1]
    HalfZigZag;
    SouthWestScan
    uturn (HalfZigZag)

    HalfZigZag is
      EastScan
      loop 3 times
        SouthWestScan
        SouthScan
        NorthEastScan
        EastScan
      endloop
    end HalfZigZag;

    EastScan is
      step by [1,0]
    end EastScan;

    SouthScan is
      step by [0,1]
    end SouthScan;

    NorthEastScan is
      loop 8 times until [*,1]
        step by [1,-1]
      endloop
    end NorthEastScan;

    SouthWestScan is
      loop 8 times until [1,*]
        step by [-1,1]
      endloop
    end SouthWestScan;
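The MoPL program above can be imitated in Python to check that it yields the JPEG zigzag order. This sketch rests on assumptions. Steps move a data counter [x, y] over an 8 × 8 block. uturn (HalfZigZag) is read as replaying the first half backwards, point-mirrored through the block centre; the slide does not define uturn. The block size must be even. The function names are hypothetical.

```python
from typing import List, Tuple

Pos = Tuple[int, int]   # data counter [x, y], 1-based as in PixMap[1,1]

def half_zigzag(n: int = 8) -> List[Pos]:
    """goto PixMap[1,1], then HalfZigZag: visits [1,1] up to [n,1]."""
    x, y = 1, 1
    path: List[Pos] = [(x, y)]

    def step(dx: int, dy: int) -> None:   # "step by [dx,dy]"
        nonlocal x, y
        x, y = x + dx, y + dy
        path.append((x, y))

    step(1, 0)                            # EastScan
    for _ in range((n - 2) // 2):         # "loop 3 times" for n = 8
        while x != 1:                     # SouthWestScan ... until [1,*]
            step(-1, 1)
        step(0, 1)                        # SouthScan
        while y != 1:                     # NorthEastScan ... until [*,1]
            step(1, -1)
        step(1, 0)                        # EastScan
    return path

def zigzag(n: int = 8) -> List[Pos]:
    """Full scan: HalfZigZag; SouthWestScan; uturn (HalfZigZag)."""
    first = half_zigzag(n)
    x, y = first[-1]
    middle: List[Pos] = []
    while x != 1:                         # SouthWestScan along the anti-diagonal
        x, y = x - 1, y + 1
        middle.append((x, y))
    # Assumed meaning of uturn (HalfZigZag): replay the first half backwards,
    # point-mirrored through the block centre. Its first position would be
    # the mirror of [n,1], i.e. [1,n], which the anti-diagonal already visited.
    second = [(n + 1 - px, n + 1 - py) for px, py in reversed(first[:-1])]
    return first + middle + second

if __name__ == "__main__":
    print(zigzag()[:6])   # [(1, 1), (2, 1), (1, 2), (1, 3), (2, 2), (3, 1)]
```

Only data counters move. The whole scan order is a flowware property, with no index arithmetic in the data path.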
>> Address Generators for Data Streams
(data streams introduced earlier in this session)
• Introduction
• Smart Address Generators
• Address Generators for Data Streams
• Customized Memory Organization
• Conclusions
2-D Generic Data Sequence Examples
[Figure: seven 2-D data sequence examples, panels a) to g)]
GAU (generic address unit) Scheme
GAG = Generic Address Generator
[Figure: a GAU consisting of a Base Slider, a Limit Slider and an Address Stepper (inputs B0, L0, DA; outputs address A and limit L); all 3 are copies of the same BSU stepper circuit]
GAG: Address Stepper
GAG = Generic Address Generator
[Figure: address stepper with base B0, limit L and step DA (parameters stepVector, maxStepCount, init, tag); a Step Counter and a +/– adder produce the address A, and an End Detect with Escape Clause signals endExec]
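Here is a one-dimensional software model of the address stepper, offered as a hedged sketch. It uses scalar addresses instead of step vectors, with a base B0, a step DA and a limit L. Stepping ends (endExec) on either the limit check or maxStepCount. The Python names are illustrative and not taken from the original GAG design.

```python
from typing import Callable, Iterator

def address_stepper(base: int, step: int, limit: int,
                    max_step_count: int = -1) -> Iterator[int]:
    """Yield base, base + step, ... as long as the address stays within limit.

    Stepping ends (endExec) when the next address would pass the limit
    (end detect) or when max_step_count addresses have been produced;
    -1 means no step count limit.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    within: Callable[[int], bool] = (
        (lambda a: a <= limit) if step > 0 else (lambda a: a >= limit))
    address, count = base, 0
    while within(address) and count != max_step_count:
        yield address
        address += step          # the +/- adder
        count += 1               # the step counter

if __name__ == "__main__":
    print(list(address_stepper(base=0, step=2, limit=7)))    # [0, 2, 4, 6]
    print(list(address_stepper(base=10, step=-3, limit=0)))  # [10, 7, 4, 1]
```

Because this is a generator, the step check runs when iteration starts rather than at the call.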
GAG Slider Model
[Figure: the Generic Address Generator: an Address Stepper (B0, DA, L0; output A) between a floor and a ceiling, with a Base Stepper and a Limit Stepper acting as sliders for B0 and L0]
GAG Complex Sequencer Implementation
[Figure: a Generic Address Generator (GAG) built from several GAUs, each with a Base Slider, a Limit Slider and an Address Stepper (parameters L0, DA, B0; output A), controlled by a VLIW stack and an SDS]
all of this was published in 1990
GAG Slider Operation Demo Example
[Figure: x-y address space between a floor (F) and a ceiling (C); addresses step by DA from base B0 toward limit L0, while the base moves by DB and the limit by DL]
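The slider idea can be sketched as follows: the base and limit are moved between passes of the address stepper. The parameter names follow the demo (B0, DA, L0, DB, DL). Two simplifications belong to this sketch and not to the slide: each slider is assumed to move once per pass, and floor and ceiling checks are omitted.

```python
from typing import Iterator

def slider_scan(b0: int, da: int, l0: int, db: int, dl: int,
                passes: int) -> Iterator[int]:
    """Address stepper whose base and limit are shifted by two sliders.

    Pass p steps from b0 + p*db up to l0 + p*dl (inclusive) by da > 0.
    """
    if da <= 0:
        raise ValueError("this sketch assumes a positive address step da")
    for p in range(passes):
        base = b0 + p * db                 # base slider position
        limit = l0 + p * dl                # limit slider position
        yield from range(base, limit + 1, da)   # inner address stepper

if __name__ == "__main__":
    # memory rows of 8 words: scan a lower-left triangle, one more word per row
    print(list(slider_scan(b0=0, da=1, l0=0, db=8, dl=9, passes=3)))
    # [0, 8, 9, 16, 17, 18]
```

With DB = DL, the same circuit scans a rectangular block instead. When DB and DL differ, the scanned region widens or narrows from pass to pass without any instruction being fetched.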
The microelectronics spare part problem [Hartenstein 2002]
• Demand: several decades of availability
• e.g. car price: ~25% electronics
• ICs do not survive storage time
• The original fab line no longer exists
[Figure: feature size axis from 2 µ to 0.07 µ]
The microelectronics spare part problem [Hartenstein 2002]
key problem in many application areas: medical, aerospace, automotive, other transportation, military, industrial equipment controllers, et al.
[Figure: feature size axis from 2 µ to 0.07 µ]
Dead Supercomputer Society [Gordon Bell, keynote at ISCA 2000]
• ACRI
• Alliant
• American Supercomputer
• Ametek
• Applied Dynamics
• Astronautics
• BBN
• CDC
• Convex
• Cray Computer
• Cray Research
• Culler-Harris
• Culler Scientific
• Cydrome
• Dana/Ardent/Stellar/Stardent
• DAPP
• Denelcor
• Elexsi
• ETA Systems
• Evans and Sutherland Computer
• Floating Point Systems
• Galaxy YH-1
• Goodyear Aerospace MPP
• Gould NPL
• Guiltech
• ICL
• Intel Scientific Computers
• International Parallel Machines
• Kendall Square Research
• Key Computer Laboratories
• MasPar
• Meiko
• Multiflow
• Myrias
• Numerix
• Prisma
• Tera
• Thinking Machines
• Saxpy
• Scientific Computer Systems (SCS)
• Soviet Supercomputers
• Supertek
• Supercomputer Systems
• Suprenum
• Vitesse Electronics
CS: young? dynamic?
... but the von Neumann paradigm is still the dominant doctrine after more than 10 technology generations ...
... still pushing the basic models from the times of the mainframe dinosaurs
Microelectronics is ignored (except for the falling cost of computational effort)
... the vN microprocessor is a Methuselah, the steam engine of the silicon age.
[Figure: processor generations 1st to 11th: 4004, 8008, 8086, 80286, 80386, 80486, P5 (Pentium), P6 (Pentium Pro / Pentium II), Pentium III, ...]
better to go for reconfigurable platforms
• [Dataquest] PLD market > $7 billion by 2003
• fastest-growing segment of the semiconductor market
• IP reuse and silicon reuse
• FPGAs are going into every type of application
Throughput vs. Flexibility
the anti machine* goes far beyond bridging the gap
[Figure: throughput in MOPS/mW (0.001 to 1000, log scale) vs. feature size (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07 µm); regions from highest to lowest throughput: hardwired, anti machine, FPGAs, von Neumann; arrow labelled flexibility]
T. Claasen et al.: ISSCC 1999
*) R. Hartenstein: ISIS 1997
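The throughput axis is an energy efficiency. Converting MOPS/mW into energy per operation makes the spread between the platforms more tangible. The helper below is a sketch with a hypothetical name; the conversion itself is plain unit arithmetic: 1 MOPS/mW = 10^6 op/s ÷ 10^-3 W = 10^9 op/J, i.e. 1 nJ per operation.

```python
def energy_per_op_pj(mops_per_mw: float) -> float:
    """Energy per operation in picojoules for an efficiency given in MOPS/mW.

    1 MOPS/mW = 1e6 op/s per 1e-3 W = 1e9 op/J, i.e. 1 nJ = 1000 pJ per op.
    """
    if mops_per_mw <= 0:
        raise ValueError("efficiency must be positive")
    ops_per_joule = mops_per_mw * 1e9
    return 1e12 / ops_per_joule  # 1 J = 1e12 pJ
```

Across the figure's axis, 1000 MOPS/mW corresponds to 1 pJ per operation and 0.001 MOPS/mW to 10^6 pJ (1 µJ), a spread of six orders of magnitude.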
Why coarse grain?
Terminology (consensus is near)
• DPU: data path unit
• rDPU: reconfigurable DPU
• DPA: data path array (DPU array)
• rDPA: reconfigurable DPA
• RA: reconfigurable array
• ISP: instruction set processor
• AM: anti machine
• AMP: data stream processor*
• rAMP: reconfigurable AMP
• FPGA: field-programmable gate array
• FPL: field-programmable logic
• PLD: programmable logic device
• CPLD: complex PLD
*) no “dataflow machine”
Digital system platforms (platform category: programming source; machine paradigm):
• hardware: not programmable; none
• instruction set processor (ISP): software; von Neumann
• data stream processor (AMP): streamware; anti machine
• morphware, reconfigurable logic (FPGA): configware; none
• morphware, reconfigurable AMP (rAMP): streamware & configware; anti machine
Categories of morphware (granularity (path width): (re)configurable blocks; use):
• fine grain (~1 bit): CLBs; reconfigurable logic
• coarse grain (e.g. 32 bits): rDPUs (e.g. ALU-like); reconfigurable computing
• multi-granular, by slice bundling: rDPU slices (e.g. 4 bits)
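The terminology lists multi-granular morphware built by slice bundling, e.g. from 4-bit rDPU slices. A minimal sketch of the idea, assuming each slice is a narrow adder whose carry-out feeds the next slice (a ripple carry chain; real rDPU slices may bundle differently): eight 4-bit slices then behave like one 32-bit adder. The function name and parameters are hypothetical.

```python
from typing import Tuple


def bundled_add(a: int, b: int, slice_bits: int = 4,
                slices: int = 8) -> Tuple[int, int]:
    """Add two unsigned words on a chain of narrow slices.

    Returns (sum modulo 2**(slice_bits * slices), final carry-out). Operand
    bits above the bundled width are ignored.
    """
    mask = (1 << slice_bits) - 1
    carry = 0
    result = 0
    for i in range(slices):
        shift = i * slice_bits
        a_i = (a >> shift) & mask      # this slice's share of operand a
        b_i = (b >> shift) & mask      # this slice's share of operand b
        s = a_i + b_i + carry          # narrow addition inside one slice
        result |= (s & mask) << shift  # slice output bits
        carry = s >> slice_bits        # carry-out to the next slice
    return result, carry
```

bundled_add(0xFFFFFFFF, 1) returns (0, 1): the carry ripples through all eight slices, as in a monolithic 32-bit adder. Changing `slices` regroups the same hardware into a different path width.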
>> Problems to be solved
• Configware Market
• FPGA Market
• Embedded Systems (Co-Design)
• Hardwired IP Cores on Board
• Run-Time Reconfiguration (RTR)
• Rapid Prototyping & ASIC Emulation
• Evolvable Hardware (EH)
• Academic Expertise
• ASICs dead
• Soft CPU
• HLLs
• Problems to be solved
EDA industry shift into CS mentality [Wojciech Maly]
• patches instead of engineering
• innovation stalled many years ago
• 85% of users hate their tools
• netlist-based: they do not care about efficiency, ...
• ... and do not care about transistor density
[Jonathan Rose] FPGAs Give You
• Instant Fabrication
– Get to Market Fast
– Fix ‘em quick
• Zero NRE Charges
– Low Risk
– Low Cost at good volume
The Crisis of Computing Sciences
• Computing Sciences are in a severe crisis
• Computing curricula are obsolete because of strictly enforced “procedural-only” blinders
• Computer Architecture and related areas have lost leadership in digital system implementation
• CS ignores the > 90% of µprocessors that are in embedded systems: by 2010, 10 times more programmers will write embedded applications than computer software
• A disruptive, promising therapy: the new approaches coming with Reconfigurable Computing
Programming Language Paradigms
Both language categories are deterministic, with procedural sequencing: traceable, checkpointable.
Each row: computer languages vs. languages for the anti machine
• operation sequence driven by: read next instruction, goto (instr. addr.), jump (to instr. addr.), instruction loop, loop nesting, no parallel loops, escapes, instruction stream branching vs. read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching
• state register: program counter vs. data counter(s)
• address computation: massive memory cycle overhead vs. overhead avoided
• instruction fetch: memory cycle overhead vs. overhead avoided
• parallel memory bank access: interleaving only vs. no restrictions
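To make the state-register row concrete, a minimal sketch under stated assumptions: a dot product over two arrays in one memory, computed once by a toy program-counter-driven interpreter (a hypothetical four-instruction ISA, not any real instruction set) and once with the operator fixed beforehand and only data counters advancing. The fetch count is a modelling convention; the Python code itself of course runs on a von Neumann machine. All names are hypothetical.

```python
from typing import List, Tuple


def von_neumann_dot(mem: List[int], a: int, b: int, n: int) -> Tuple[int, int]:
    """Program-counter driven dot product; returns (result, instruction fetches)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    program = [("mac",), ("inc",), ("blt", 0), ("halt",)]
    pc, acc, i, ia, ib = 0, 0, 0, a, b
    fetches = 0
    while True:
        instr = program[pc]  # instruction fetch at run time
        fetches += 1
        pc += 1
        if instr[0] == "mac":
            acc += mem[ia] * mem[ib]
        elif instr[0] == "inc":
            ia, ib, i = ia + 1, ib + 1, i + 1  # address computation by instructions
        elif instr[0] == "blt" and i < n:
            pc = instr[1]  # instruction stream branching
        elif instr[0] == "halt":
            return acc, fetches


def data_stream_dot(mem: List[int], a: int, b: int, n: int) -> Tuple[int, int]:
    """Data-counter driven dot product; returns (result, instruction fetches)."""

    def mac(acc: int, x: int, y: int) -> int:
        # the configured operator: fixed before run time, never fetched
        return acc + x * y

    acc = 0
    for step in range(n):  # two data counters advance in lock step
        acc = mac(acc, mem[a + step], mem[b + step])
    return acc, 0
```

With mem = [1, 2, 3, 4, 5, 6], a = 0, b = 3 and n = 3, both versions compute 1*4 + 2*5 + 3*6 = 32, but the first needs 3n + 1 = 10 instruction fetches, the second none.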
Ubiquitous embedded systems
• 20 billion µprocessors (2001), > 90% of them in embedded systems
• 10 times more programmers will write embedded applications than computer software by 2010
• That’s where our graduates will go
Embedded systems means:
• hardware / software co-design
• configware / software co-design
• hardware / configware / software co-design
The Situation in Computing Sciences
• Computing Sciences are in a severe crisis
• New fundamentals and R&D directions are inevitable
• my mission: getting you involved
• All knowledge needed is readily available ...
• ... even from Computing Sciences
• Silicon application and EDA provide useful concepts
• Reconfigurable Computing has the remedy
The edu gap has dramatic consequences
• Key R&D scenes are drying out or dying because of a lack of qualified researchers
• The embedded system design crisis gets worse because of a lack of qualified designers
• Many innovative products cannot be sold because of a lack of qualified customers
• The edu gap is widening dramatically because of a lack of qualified educators
Super Pipe Networks: systolic array vs. supersystolic DPA (e.g. KressArray*)
• applications: regular data dependencies only vs. no restrictions
• pipeline properties of the systolic array: linear shape only, uniform resources only
• mapping: linear projection or algebraic synthesis vs. simulated annealing or a P&R algorithm
• scheduling (data stream formation): a scheduling algorithm (e.g. force-directed)
*) KressArray [ASP-DAC-1995]
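As an example of the linear, uniform pipeline that systolic arrays are restricted to, a FIR filter can be mapped onto a chain of identical multiply-accumulate cells. The sketch uses the transposed direct form, a swapped-in textbook choice rather than a mapping from the talk. Because every cell sees the current input sample in the same cycle, it is strictly semi-systolic (a fully systolic variant would also pass the samples through registers). Names are hypothetical.

```python
from typing import List


def pipelined_fir(weights: List[int], xs: List[int]) -> List[int]:
    """Cycle-by-cycle model of a transposed-form FIR pipeline.

    Cell j holds weight w[j]; regs[j] is the partial sum registered by cell j
    in the previous cycle (regs[k] is a constant 0 feeding the last cell).
    Output y[t] = sum_j w[j] * x[t - j], with x[t] = 0 for t < 0.
    """
    if not weights:
        raise ValueError("need at least one cell")
    k = len(weights)
    regs = [0] * (k + 1)
    out: List[int] = []
    for x in xs:
        # all cells compute in parallel from the old register contents
        new = [weights[j] * x + regs[j + 1] for j in range(k)]
        out.append(new[0])    # cell 0 delivers the filter output
        regs[1:k] = new[1:k]  # clock edge: partial sums move one cell towards the output
    return out
```

Feeding an impulse shows the regular data dependencies: pipelined_fir([1, 2, 3], [1, 0, 0, 0]) returns the weights [1, 2, 3, 0].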
... it’s an alternative culture ...
• now the area is going mainstream: a rapidly widening audience of non-specialists is getting interested ...
• severe communication gaps due to educational deficits
• not only with users: many hardware and EDA experts still ask: isn’t it just logic design on a strange platform?
• it is time to clarify and popularize fundamental aspects and to explain that it is a fundamentally different culture
“von Neumann” Computer: the wrong Machine Paradigm
[Figure: Computer vs. Xputer.
Computer: Compiler → instructions in RAM → Sequencer (state register: program counter) → hardwired Datapath; sequencer and datapath are tightly coupled by compact instruction code; “von Neumann” does not support soft data paths.
Xputer (anti machine), the Soft Machine Paradigm: Compiler and Scheduler → “instructions” → (multiple) sequencer (state register: data counters), data in RAM → reconfigurable Datapath Array (also for hardwired); sequencer and datapath array are loosely coupled by decision data bits only.]
Semiconductor Revolutions
“Mainstream Silicon Application is switching every 10 Years”
“The Programmable System-on-a-Chip is the next wave”
[Figure: Makimoto’s wave, alternating between standard and custom, with decade marks from 1957 to 2007; waves labelled TTL; LSI, MSI; µproc., memory; ASICs, accel’s]
Paradigm Shifts (Tredennick’s paradigm):
• hardwired: algorithm fixed, resources fixed
• procedural programming (vN machine paradigm): algorithm variable, resources fixed
• structural programming (anti machine paradigm): algorithm variable, resources variable
Impact of Makimoto’s wave
Software Industry’s Secret of Success: procedural personalization via a RAM-based Machine Paradigm
[Figure: Makimoto’s wave, standard vs. custom, 1957 to 2007; waves labelled TTL; LSI, MSI; µproc., memory; ASICs, accel’s]
Impact of Makimoto’s wave
Configware Industry: repeat the success story by a new Machine Paradigm!
structural personalization: RAM-based, before run time
[Figure: Makimoto’s wave, standard vs. custom, 1957 to 2007; waves labelled TTL; LSI, MSI; µproc., memory; ASICs, accel’s]
Impact of Data-stream-based Embedded ... Hardware/Configware Industry
Repeat the success story by a new Machine Paradigm!
structural personalization: hardwired, before fabrication
[Figure: Makimoto’s wave, standard vs. custom, 1957 to 2007; waves labelled TTL; LSI, MSI; µproc., memory; ASICs, accel’s]
Rapidly growing CS education gap
• Our computing curricula are obsolete
• the introduction is strictly “procedural-only”
• vN-only use of terms like “computer organisation”, “computer structures”, “computer architecture”
• graduates are not prepared for the real world
– most applications are in embedded systems (>90% by 2010)
• our graduates are unable to compete with EE graduates
• only a few % of the curricula need to be changed
• my mission: getting you involved
Binding Time vs. Computing Domain
[Figure: binding time, i.e. set-up of communication channels (at run time, at loading time, at compile time, later fabrication step, before fabrication), vs. programming domain (time domain (procedural), time & space (hybrid), space domain (structural)); platforms placed in it: microprocessor, parallel computer, array processor, Reconfigurable Computing, supersystolic arrays, systolic arrays, ASICs, full custom ICs]
Why Coarse Grain instead of FPGA?
[Figure: transistors per chip (1 000 to 100 000 000 000, log scale) from 1980 to 2010 for physical transistors, FPGA physical, FPGA routed and FPGA logical, with gaps annotated ~10 000 and ~10. Sources: Proc. ISSCC, ICSPAT, DAC, DSPWorld]
Coarse grain:
• reconfigurability overhead reduced by up to ~1000
• drastically smaller configuration memory
• much faster loading
• a lot more benefits
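The claimed reduction of the reconfigurability overhead by up to ~1000 can be read off the two annotations in the figure. Assuming, as an interpretation of the chart rather than a statement elsewhere in the talk, that ~10 000 is the ratio of physical to logical transistor count for FPGAs and ~10 the corresponding ratio for coarse-grain arrays, the arithmetic is:

```latex
\text{overhead}_{\text{FPGA}} \approx \frac{N_{\text{physical}}}{N_{\text{logical}}} \approx 10^{4},
\qquad
\text{overhead}_{\text{coarse}} \approx 10,
\qquad
\frac{\text{overhead}_{\text{FPGA}}}{\text{overhead}_{\text{coarse}}} \approx \frac{10^{4}}{10} = 10^{3}
```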
What are the differences?
vN* computing:
• computing in time
• instruction fetch at run time
• procedural programming
• instruction scheduling
Reconfigurable Computing:
• computing in space and time
• “instruction” fetch at compile time
• structural programming
• data scheduling, i.e. data-stream-based
• also hardwired implementations**, with “instruction” fetch before fabrication
*) vN stands for “von Neumann”
**) e.g. BEE project, Prof. Brodersen
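A rough cycle-count model of “computing in time” versus “computing in space”, under idealized assumptions that are not from the talk: every operation takes one cycle, memory delivers one data item per cycle, instruction fetch is ignored, and the spatial datapath is a simple linear pipeline with one DPU per operation. The function names are hypothetical.

```python
def cycles_in_time(n_items: int, n_ops: int) -> int:
    """Computing in time: one ALU executes the n_ops operations of each data
    item one after another (instruction fetch overhead ignored)."""
    return n_items * n_ops


def cycles_in_space(n_items: int, n_ops: int) -> int:
    """Computing in space: one DPU per operation, wired as a pipeline. After a
    fill latency of n_ops - 1 cycles, one result leaves per cycle."""
    if n_items == 0:
        return 0
    return n_items + n_ops - 1
```

For 1000 data items and 4 operations per item, the model gives 4000 cycles in time and 1003 in space; for long data streams the gain approaches the number of operations running concurrently.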
Basics of Binding Time
“Instruction” generalized: including complex expressions and other datapaths
The time of “instruction fetch” (run time, loading time, compile time) has a strong impact on the machine paradigm!
[Figure: time of “instruction fetch” for microprocessor, parallel computer and Reconfigurable Computing]
Data-stream-based Parallelism
See my other talk: ICECS 2002, IEEE 9th International Conference on Electronics, Circuits and Systems, Dubrovnik, Croatia, September 15-18, 2002 (invited paper):
Memory Organisation for Datastream-based Reconfigurable Computing
Michael Herz (Agilent Technologies), Reiner Hartenstein (University of Kaiserslautern), Miguel Miranda, Erik Brockmeyer, Francky Catthoor (IMEC, Leuven)
Machine paradigms
[Figure: von Neumann instruction-stream machine vs. data-stream machine.
von Neumann: memory M, I/O and a CPU consisting of an instruction sequencer and a data path (ALU); programmed by Software through the instruction stream.
Data-stream machine: memory M with a data address generator (data sequencer), an asM*, sends data streams to a data path unit (DPU or rDPU) or to a (r)DPA of (r)DPUs with I/O; programmed by Flowware, plus Configware for reconfigurable data paths.
*) embedded memory architecture: several memory banks M around the (r)DPA]
Synthesizable Memory Communication
An example by Nageldinger’s KressArray Xplorer (http://kressarray.de): efficient memory communication should be directly supported by the mapper tools.
[Figure legend: optimized parallel memory controller, memory ports, sequencers, application, not used]
Data-Stream-based Soft Machine
[Figure: Compiler, Scheduler, “instructions”, rDPA; data memory with several memory banks; Sequencers (data stream generator)]
Terminology has been highly confusing
[Figure: battery capacity, growing by a factor of 1.03/year, over time (axis marks 10 to 48 months and 4, 10, 30 years); source: Department of Trade and Industry, London]
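The chart's only stated growth rate is the battery capacity factor of 1.03 per year. The compound-growth arithmetic behind it, as a short sketch (function names hypothetical):

```python
import math


def battery_factor(years: float, annual_growth: float = 1.03) -> float:
    """Capacity relative to today after `years` of compound growth."""
    return annual_growth ** years


def years_to_double(annual_growth: float = 1.03) -> float:
    """Years of compound growth needed to double the capacity."""
    return math.log(2) / math.log(annual_growth)
```

battery_factor(10) is about 1.34 and years_to_double() about 23.4, so at 3% per year battery capacity takes more than two decades to double. As a purely hypothetical comparison rate, a demand that doubles every 18 months grows by 2**(120/18), roughly 101.6, in the same ten years.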
No vN bottleneck
The anti machine has no von Neumann bottleneck.
3 different mind sets
[Figure: Makimoto’s wave 1957 to 2007 (TTL; LSI, MSI; µproc., memory; ASICs, accel’s; FPGAs; soft CPUs; coarse grain) with three groups: hardware people, CS people, new breed needed]
Common terminology needed
Programming sources
• von Neumann instruction stream machine (hardwired only): hardware resources fixed; algorithms variable, programmed by software
• anti machine, data stream machine (reconfigurable or hardwired): morphware resources variable, programmed by configware; algorithms variable, programmed by streamware (flowware)
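On the anti machine, configware fixes the resources while flowware (streamware) programs the data streams. Following the earlier definition that flowware determines which data item enters at which time at which port, a minimal sketch can represent a flowware schedule as per-port address sequences over a memory. The dictionary layout, the port names and None as an idle slot are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def flowware_schedule(memory: Sequence[int],
                      port_addresses: Dict[str, List[Optional[int]]]
                      ) -> List[Dict[str, Optional[int]]]:
    """Expand per-port address sequences into a cycle-by-cycle data schedule.

    port_addresses[p][t] is the memory address of the item that enters port p
    in clock cycle t, or None for an idle slot. Shorter sequences are padded
    with idle slots.
    """
    length = max((len(seq) for seq in port_addresses.values()), default=0)
    schedule: List[Dict[str, Optional[int]]] = []
    for t in range(length):
        cycle: Dict[str, Optional[int]] = {}
        for port, seq in port_addresses.items():
            addr = seq[t] if t < len(seq) else None
            cycle[port] = memory[addr] if addr is not None else None
        schedule.append(cycle)
    return schedule
```

With memory [10, 20, 30, 40], port a reading addresses 0, 1 and port b idle in cycle 0 and then reading 3, 2, the schedule is a=10 and b idle, then a=20 and b=40, then a idle and b=30. The address sequences themselves could come from address generators such as the GAG.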