Topic: Multithreaded Execution and Architecture Models


Dataflow Model of Computation
(From Dataflow to Multithreading)
Guang R. Gao
ACM Fellow and IEEE Fellow
Endowed Distinguished Professor
Electrical & Computer Engineering
University of Delaware
652-11S-Topic-Gao-Dataflow
Coarse-Grain vs. Fine-Grain Multithreading
[Figure: two CPU-and-memory diagrams, each with a thread unit (executor locus)]
• Coarse-grain thread, the “family home” model: the thread unit hosts a single thread.
• Fine-grain non-preemptive thread, the “hotel” model: the thread unit hosts a pool of threads.
[Gao: invited talk at Fran Allen’s Retirement Workshop, 07/2002]
Evolution of Multithreaded Execution and Architecture Models
[Figure: genealogy chart of multithreaded models; the entries recovered from its labels are listed below]
Non-dataflow based:
• CDC 6600 (1964)
• Flynn’s Processor (1969)
• CHoPP’77, CHoPP’87
• HEP (B. Smith, 1978)
• Cosmic Cube (Seitz, 1985)
• MASA (Halstead, 1986)
• J-Machine (Dally, 1988-93)
• Alewife (Agarwal, 1989-96)
• Tera (B. Smith, 1990-)
• M-Machine (Dally, 1994-98)
• Eldorado, CASCADE
• Others: Multiscalar (1994), SMT (1995), etc.
Dataflow model inspired:
• Static Dataflow (Dennis, 1972, MIT)
• LAU (Syre, 1976)
• MIT TTDA (Arvind, 1980)
• Manchester (Gurd & Watson, 1982)
• Arg-Fetching Dataflow (Dennis, Gao, 1987-88)
• SIGMA-I (Shimada, 1988)
• Monsoon (Papadopoulos & Culler, 1988)
• Iannucci’s (1988-92)
• P-RISC (Nikhil & Arvind, 1989)
• MDFA (Gao, 1989-93)
• TAM (Culler, 1990)
• *T/Start-NG (MIT/Motorola, 1991-)
• EM-5/4/X, RWC-1 (1992-97)
• MTA (Hum, Theobald, Gao 94)
• EARTH (PACT95, ISCA96, Theobald99)
• CARE (Marquez04)
• Cilk (Leiserson)
The Von Neumann-type Processing
Source Code:
begin
  for i = 1 …
    …
  endfor
end
[Figure: Source Code → Compiler → Sequential Machine Representation → Load → CPU (Processor)]
A Multithreaded Architecture
[Figure: one processing element (PE), with links to other PEs]
McGill Data Flow Architecture Model (MDFA)
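Argument-flow Principle vs. Argument-fetching Principle
[Figure: two graphs over nodes n1, n2 and n3. In the argument-flow graph, n1 feeds n2 and n3 directly. In the argument-fetching graph, the arcs are labeled store and fetch.]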
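The contrast between the two principles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say what n2 and n3 compute, so the operations below, the function names, and the `copies` counter are all hypothetical.

```python
# Illustrative sketch only: the node operations and the copy counter are
# assumptions, not taken from the slides.

def run_argument_flow(a, b):
    """Argument-flow: n1's result travels to each consumer as a data token."""
    x = a + b                      # n1 produces a value
    tokens = {"n2": x, "n3": x}    # one copy of the value per consumer
    copies = len(tokens)
    y = tokens["n2"] + 1           # n2 consumes its token (assumed operation)
    z = tokens["n3"] * 2           # n3 consumes its token (assumed operation)
    return y, z, copies


def run_argument_fetching(a, b):
    """Argument-fetching: n1 stores once; consumers fetch from memory."""
    memory = {}
    memory["x"] = a + b            # n1: store the result in memory
    copies = 0                     # only data-less signals go to n2 and n3
    y = memory["x"] + 1            # n2: fetch its operand
    z = memory["x"] * 2            # n3: fetch its operand
    return y, z, copies


print(run_argument_flow(2, 3))      # (6, 10, 2)
print(run_argument_fetching(2, 3))  # (6, 10, 0)
```

Both versions compute the same values; the sketch differs only in whether the value is copied per consumer or stored once and fetched.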
A Dataflow Program Tuple
Program Tuple = { P-Code . S-Code }
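P-Code:
N1: x = a + b;
N2: y = c - d;
N3: z = x * y;
S-Code:
[Figure: P-Code and S-Code diagrams with operands a, b, c, d, nodes n1 and n2, numeric labels 2 and 3, and the unit labels IPU and ISU]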
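A program tuple for the three statements above might be represented as follows. This is a minimal sketch, not the slide's encoding: the dictionary layout, the field names `signals` and `count`, and the `run` helper are assumptions, and the counts are chosen so that N3 waits for one signal each from N1 and N2.

```python
# P-code: what each node computes, reading and writing named memory cells.
P_CODE = {
    "N1": ("x", lambda m: m["a"] + m["b"]),
    "N2": ("y", lambda m: m["c"] - m["d"]),
    "N3": ("z", lambda m: m["x"] * m["y"]),
}

# S-code (assumed layout): whom each node signals when done, and how many
# signals each node must receive before it is enabled.
S_CODE = {
    "N1": {"signals": ["N3"], "count": 0},
    "N2": {"signals": ["N3"], "count": 0},
    "N3": {"signals": [], "count": 2},
}


def run(p_code, s_code, memory):
    """Execute nodes as they become enabled; return memory and firing order."""
    pending = {node: s["count"] for node, s in s_code.items()}
    enabled = [node for node, c in pending.items() if c == 0]
    order = []
    while enabled:
        node = enabled.pop(0)
        dest, op = p_code[node]
        memory[dest] = op(memory)          # operands are fetched from memory
        order.append(node)
        for target in s_code[node]["signals"]:
            pending[target] -= 1           # a signal carries no data
            if pending[target] == 0:
                enabled.append(target)
    return memory, order


memory, order = run(P_CODE, S_CODE, {"a": 1, "b": 2, "c": 7, "d": 3})
print(memory["z"], order)  # 12 ['N1', 'N2', 'N3']
```

Keeping the operations and the signaling information in separate tables mirrors the P-Code / S-Code split: the first never mentions scheduling, and the second never touches data.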
The McGill Dataflow Architecture Model
[Figure: the Pipelined Instruction Processing Unit (PIPU) and the Dataflow Instruction Scheduling Unit (DISU), linked by Fire and Done; the DISU contains the Enable Memory & Controller and Signal Processing]
The McGill Dataflow Architecture Model
Important Features
• The pipeline can be kept fully utilized provided that the program has sufficient parallelism.
[Figure: the Pipelined Instruction Processing Unit (PIPU) and the Dataflow Instruction Scheduling Unit (DISU), linked by Fire and Done; the DISU holds Enabled Instructions and Waiting Instructions; a legend entry reads “= PC”]
The Scheduling Memory (Enable)
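[Figure: inside the Dataflow Instruction Scheduling Unit (DISU), the scheduling (enable) memory is drawn as a grid of 0/1 entries next to a CONTROLLER. Fire leaves the unit; Done enters through Signal Processing as Count Signal(s). The entries are labeled Enabled Instructions and Waiting Instructions.]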
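One way to picture the scheduling memory is as one enable bit per instruction, updated by counting signals. The sketch below is a toy model under assumptions: the class name `ToyDISU`, its methods, and the rule that bit value 1 means "enabled" are illustrative choices, not details confirmed by the slide.

```python
class ToyDISU:
    """Toy scheduling unit: one enable bit and one signal count per instruction."""

    def __init__(self, counts, signal_lists):
        self.count = list(counts)          # signals each instruction still needs
        self.signal_lists = signal_lists   # instruction index -> indices to signal
        # Enable memory (assumed encoding): bit i is 1 when instruction i may fire.
        self.enable = [1 if c == 0 else 0 for c in counts]

    def fire(self):
        """Pick one enabled instruction, clear its bit, and return its index."""
        for i, bit in enumerate(self.enable):
            if bit:
                self.enable[i] = 0
                return i
        return None                        # nothing is enabled right now

    def done(self, i):
        """Signal processing: count i's signals and set bits that reach zero."""
        for j in self.signal_lists[i]:
            self.count[j] -= 1
            if self.count[j] == 0:
                self.enable[j] = 1


# Instructions 0 and 1 are enabled at the start; instruction 2 waits for both.
disu = ToyDISU(counts=[0, 0, 2], signal_lists={0: [2], 1: [2], 2: []})
fired = []
while (i := disu.fire()) is not None:
    fired.append(i)                        # the processing unit would execute i here
    disu.done(i)
print(fired)  # [0, 1, 2]
```

The loop at the end plays the role of the Fire/Done exchange: the scheduler hands out an enabled instruction, and the completion report drives the count signals.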
Advantages of the McGill Dataflow Architecture Model
• Eliminates unnecessary token copying and transmission overhead.
• Instruction scheduling is separated from the main datapath of the processor (e.g., asynchronous, decoupled).
Von Neumann Threads as Macro Dataflow Nodes
• A sequence of instructions is “packed” into a macro-dataflow node.
• Synchronization is done at the macro-node level.
[Figure: a macro-dataflow node containing instructions 1, 2, 3, …, k]
Hybrid Evaluation: “Von Neumann Style Instruction Execution” on the McGill Dataflow Architecture
• Group a “sequence” of dataflow instructions into a “thread” or a macro dataflow node.
• Data-driven synchronization among threads.
• “Von Neumann style sequencing” within a thread.
Advantage:
Preserves the parallelism among threads but avoids unnecessary fine-grain synchronization between instructions within a sequential thread.
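The hybrid scheme can be sketched as threads that run their instructions in order, with synchronization counted only between threads. Everything named here is hypothetical: the `run_threads` helper, the thread names T1 to T3, and the example computation are assumptions made for illustration.

```python
def run_threads(threads, deps, memory):
    """threads: name -> list of (dest, fn); deps: name -> prerequisite threads."""
    sync = {name: len(deps.get(name, [])) for name in threads}
    ready = [name for name, c in sync.items() if c == 0]
    order, syncs_performed = [], 0
    while ready:
        name = ready.pop(0)
        # Von Neumann style sequencing: no synchronization inside the thread.
        for dest, fn in threads[name]:
            memory[dest] = fn(memory)
        order.append(name)
        # Data-driven synchronization: signal only the dependent threads.
        for other, prerequisites in deps.items():
            if name in prerequisites:
                sync[other] -= 1
                syncs_performed += 1
                if sync[other] == 0:
                    ready.append(other)
    return memory, order, syncs_performed


threads = {
    "T1": [("t", lambda m: m["a"] + m["b"]), ("x", lambda m: m["t"] * 2)],
    "T2": [("y", lambda m: m["c"] - m["d"])],
    "T3": [("z", lambda m: m["x"] * m["y"])],
}
deps = {"T3": ["T1", "T2"]}

memory, order, syncs = run_threads(threads, deps, {"a": 1, "b": 2, "c": 7, "d": 3})
print(memory["z"], order, syncs)  # 24 ['T1', 'T2', 'T3'] 2
```

Four instructions execute here but only two synchronizations occur, both at thread boundaries; the two instructions inside T1 rely on program order alone.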
What Do We Get?
• A hybrid architecture model without sacrificing the advantage of fine-grain parallelism! (latency-hiding, pipelining support)
A Realization of the Hybrid Evaluation
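[Figure: the Pipelined Instruction Processing Unit (PIPU) and the Dataflow Instruction Scheduling Unit (DISU), linked by Fire and Done, plus a Shortcut path and a Von Neumann bit; the instructions of a thread are numbered 1, 2, …, k]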
Case Studies: Dataflow Model Inspired Multithreading
• McGill Dataflow Model (1988-1993)
• EARTH Model (1993 to mid-2000s)
• The UHPC/Runnemede Model (2010-)
7/17/2015
421-10-F/Topic-3-II-FineGrain-Cases