Transcript Slide 1

ROUTING ARCHITECTURE AND ALGORITHMS FOR A
SUPERCONDUCTIVITY CIRCUITS-BASED COMPUTING HARDWARE
Farhad Mehdipour, Hiroaki Honda, Hiroshi Kataoka, Koji Inoue,
Kazuaki Murakami
Kyushu University, Japan
CCECE 2011
CREST-JST (2006~): Low-power, high-performance, reconfigurable processor using single-flux quantum (SFQ) circuits
SFQ-LSRDP
• Yokohama National Univ.: SFQ-FPU chip, cell library (N. Yoshikawa et al.)
• Nagoya Univ.: SFQ-RDP chip, cell library, and wiring (A. Fujimaki et al.)
• Nagoya Univ.: CAD for logic design and arithmetic circuits (N. Takagi (Leader) et al.)
• Kyushu Univ.: Architecture, Compiler and Applications (K. Murakami, K. Inoue, H. Honda, F. Mehdipour, H. Kataoka)
• Superconducting Research Lab. (SRL): SFQ process (S. Nagasawa et al.)
Our mission: Architecture, compiler and application development
2
Outline of Large-Scale Reconfigurable Data-Path
(LSRDP) Processor
[Figure labels: single flux quantum, superconducting loop, Josephson junction]
SFQ Features:
• High-speed switching and signal transmission
• Low power consumption
• Compact implementation (smaller area)
• Suitable for pipeline processing
3
How it works
inst;
inst;
…
conf_LSRDP();
Loop:
    rearrange_input_data();
    set_IO_info();
    run_LSRDP();
    inst;
    …
    sync_lsrdp();
    rearrange_output_data();
End_Loop
inst;
…
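[Figure: system organization with the GPP, Memory Controller, Memory, Buffers, and the LSRDP; the conf. bit-stream is loaded into the LSRDP]
[Figure: execution timeline. The GPP issues conf_LSRDP(), rearrange_input_data, set_IO_info(), and run_LSRDP, continues with inst, and at sync_lsrdp waits for the LSRDP terminating the operation before rearrange_output_data]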
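The call sequence on this slide can be mimicked in software. The following is a minimal sketch, assuming the LSRDP behaves like an asynchronous worker: `MockLSRDP`, `offload`, and the use of a Python function as the "configuration" are hypothetical stand-ins for illustration, not the real hardware interface.

```python
from concurrent.futures import ThreadPoolExecutor


class MockLSRDP:
    """Software stand-in for the accelerator (hypothetical, for illustration)."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._kernel = None
        self._inputs = None
        self._pending = None

    def conf(self, kernel):
        # Stands in for conf_LSRDP(): load the "configuration" (here, a function).
        self._kernel = kernel

    def set_io_info(self, inputs):
        # Stands in for set_IO_info(): tell the accelerator where its data is.
        self._inputs = inputs

    def run(self):
        # Stands in for run_LSRDP(): start the data path and return at once.
        self._pending = self._pool.submit(self._kernel, self._inputs)

    def sync(self):
        # Stands in for sync_lsrdp(): block until the data path has finished.
        return self._pending.result()

    def close(self):
        self._pool.shutdown()


def offload(blocks, kernel):
    """Run `kernel` on each block, following the loop shown on the slide."""
    lsrdp = MockLSRDP()
    lsrdp.conf(kernel)                  # configure once, before the loop
    results = []
    try:
        for block in blocks:
            data = list(block)          # rearrange_input_data()
            lsrdp.set_io_info(data)
            lsrdp.run()
            # The GPP is free to execute other instructions here.
            out = lsrdp.sync()          # wait for the LSRDP to terminate
            results.append(out)         # rearrange_output_data()
    finally:
        lsrdp.close()
    return results
```

For example, `offload([[1, 2], [3, 4]], sum)` processes two blocks in turn. The point of the sketch is the ordering: configuration happens once, while run and sync bracket the region in which the GPP and the LSRDP work concurrently.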
4
Architecture Exploration
LSRDP Layouts
[Figure: Layout-I, Layout-II, and Layout-III, each drawn as rows of ADD/SUB and MUL PEs together with ORNs]
PE structures
[Figure: Basic PE arch., PE arch. I, and PE arch. II, each drawn with an FU and TUs; port counts shown: 4-inps/3-outs, 3-inps/3-outs, 3-inps/2-outs]
ORN structures
Parameters shown in the figure: MCL = 1, 1, 2; Number of rows = 1.5×M, 2×M, 1.5×M; Number of columns = 4×MCL, 6×MCL+2, 4×MCL+1
5
LSRDP Tool Chain
[Figure: tool chain flowchart. Boxes: Application C code; Modifying application code (inserting LSRDP instructions in the code); Modified application code; LSRDP library file (function definitions & declarations); LSRDP architecture description; DFG Extraction; ISAcc or COINS compiler; Data flow graphs; Placing and Routing Tool; .asm code for MIPS-based GPP; Configuration file + various text & schematic reports; Simulator; Performance evaluation]
1: flow of the assembly code generation for GPP
2: flow of configuration bit-stream generation for the LSRDP
6
Mapping DFGs onto LSRDP
• Inputs
– DFG
– LSRDP Architecture Description
• Steps
– Placing Input Nodes
– Placing Operational & Output Nodes
– Routing Nets
– Routing IO Nets
• Output
– Final Map
– Longest connections
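The placement steps can be illustrated with a small sketch. The slides do not state the placement heuristic, so this example assumes a simple level-based rule: input nodes go to row 0 and every other node goes one row below its deepest predecessor. The function names and the DFG encoding (node mapped to its list of predecessors) are hypothetical.

```python
from collections import defaultdict


def place_by_level(dfg):
    """Assign each DFG node a (row, column) slot.

    Assumption: input nodes sit in row 0 and each other node sits one row
    below its deepest predecessor. This is not the tool's documented heuristic.
    """
    rows = {}

    def row_of(node):
        if node not in rows:
            preds = dfg[node]
            rows[node] = 0 if not preds else 1 + max(row_of(p) for p in preds)
        return rows[node]

    for node in dfg:
        row_of(node)

    next_col = defaultdict(int)
    placement = {}
    for node in dfg:                     # columns follow insertion order
        r = rows[node]
        placement[node] = (r, next_col[r])
        next_col[r] += 1
    return placement


def longest_connection(dfg, placement):
    """Return (row span, source, destination) of the longest vertical connection."""
    spans = [
        (placement[node][0] - placement[pred][0], pred, node)
        for node, preds in dfg.items()
        for pred in preds
    ]
    return max(spans) if spans else None


dfg = {"a": [], "b": [], "c": [], "add": ["a", "b"], "mul": ["add", "c"]}
placement = place_by_level(dfg)
print(placement)
print(longest_connection(dfg, placement))
```

In this toy graph the edge from `c` to `mul` spans two rows, which is the kind of long connection the routing steps then have to carry across intermediate rows.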
7
Global routing algorithms
Routing DFG connections between source and destination PEs
• Exhaustive search-based: very time consuming
• Branch and bound alg.: very fast
[Figure: example routes from src PEs to dest PEs across a grid containing vacant and fully-occupied PEs]
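The difference between the two approaches can be sketched on a toy model. The following assumes a grid in which a connection descends one row per step, may shift horizontally by at most `mcl` columns per step, and may only pass through vacant PEs. The grid model, the cost function (total horizontal shift), and the function name are assumptions for illustration, not the authors' algorithm.

```python
def route_branch_and_bound(free, src, dest, mcl):
    """Find a row-by-row route from src to dest with minimal horizontal shift.

    free[r][c] is True when the PE at row r, column c can carry the connection.
    Returns (list of columns from the source row to the destination row, cost),
    or (None, inf) when no route exists.
    """
    (r0, c0), (r1, c1) = src, dest
    if r1 <= r0:
        raise ValueError("destination must lie in a later row than the source")
    width = len(free[0])
    best_cost = float("inf")
    best_path = None

    def search(row, col, cost, path):
        nonlocal best_cost, best_path
        rows_left = r1 - row
        remaining = abs(c1 - col)
        if remaining > mcl * rows_left:      # destination no longer reachable
            return
        if cost + remaining >= best_cost:    # bound: cannot beat the best route
            return
        if rows_left == 0:                   # reached dest (col == c1 here)
            best_cost, best_path = cost, path
            return
        nxt = row + 1
        lo, hi = max(0, col - mcl), min(width - 1, col + mcl)
        # Branch: try the columns closest to the destination first.
        for c in sorted(range(lo, hi + 1), key=lambda c: abs(c - c1)):
            if nxt == r1:
                if c != c1:
                    continue
            elif not free[nxt][c]:
                continue
            search(nxt, c, cost + abs(c - col), path + [c])

    search(r0, c0, 0, [c0])
    return best_path, best_cost


free = [[True] * 5 for _ in range(4)]
free[1][1] = False                           # one fully-occupied PE
print(route_branch_and_bound(free, (0, 0), (3, 2), mcl=1))
```

An exhaustive search would enumerate every column choice in every intermediate row. The two `return` checks at the top of `search` are what cut that enumeration down: one discards branches that can no longer reach the destination, the other discards branches that cannot improve on the best route found so far.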
8
Micro-Routing-Problem Definition
• Inputs
– LSRDP basic specifications
•Layout, Width (W), MCL, PE arch., etc.
•List of connections b/w consecutive rows
– ORN structure including
•The number of CBs and T2s in each row
•The number of CB rows
•Topology of connections among CBs
• Output
– Detailed routes via cross-bar switches
•The list of CBs used for routing each connection
•Configuration of CBs
[Figure: the ORN between the PEs (FU, T) of the i-th row and the PEs of the (i+1)-th row]
A micro-routing algorithm has been implemented for the LSRDP with
underlying layout II and PE arch. III
ORN Micro-routing
CB: 2-input/2-output
½CB: 1-input/2-output
Example
Micro-nets
(PE1 → PE5)
(PE2 → PE5, PE6, PE7)
(PE3 → PE6, PE8)
(PE4 → PE7, PE8)
[Figure: the four micro-nets routed from PE1 to PE4 down to PE5 to PE8 through CBs and ½CBs; each switch is labeled with the net numbers it carries and its configuration bits (00, 01, 10, 11)]
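One way to produce the list of CBs used by a micro-net is a breadth-first search over the switch topology. The sketch below routes the net from PE2 to PE5, PE6, and PE7 over an invented three-switch topology: the switch names, their wiring, and the function name are hypothetical and do not reproduce the ORN in the figure. Conflicts between different nets sharing a switch port are deliberately ignored.

```python
from collections import deque


def route_micro_net(topology, src, dests):
    """Return, for each destination PE, the list of switches on its route.

    topology maps a node (PE or switch) to the nodes its outputs drive.
    Assumption: a single net is routed in isolation, so port conflicts
    between nets are not checked.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:                              # plain breadth-first search
        node = queue.popleft()
        for nxt in topology.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)

    routes = {}
    for dest in dests:
        if dest not in parent:
            raise ValueError(f"{dest} is not reachable from {src}")
        path = []
        node = parent[dest]
        while node is not None and node != src:
            path.append(node)                 # collect switches back to src
            node = parent[node]
        routes[dest] = path[::-1]
    return routes


# Hypothetical topology: one switch fanning out to two more switches.
topology = {
    "PE2": ["CB1"],
    "CB1": ["CB2", "CB3"],
    "CB2": ["PE5", "PE6"],
    "CB3": ["PE7", "PE8"],
}
print(route_micro_net(topology, "PE2", ["PE5", "PE6", "PE7"]))
```

Because the routes to PE5 and PE6 share `CB1` and `CB2`, a real micro-router would also have to set each switch so that one input drives both outputs where the net fans out, which is what the per-switch configuration bits record.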
10
ORN Micro-Routing
Example: Heat 8x2 - ORN b/w 3rd and 4th Rows
[Figure: micro-routing of the nets between the PEs in the 3rd row and the PEs in the 4th row; each switch is annotated with the numbers of the nets passing through it]
Specifications of Attempted DFGs
DFG             total # of nodes   # of inputs   # of outputs   # of ops
Heat-8x1        34                 6             4              16
Heat-8x2        60                 8             4              32
Heat-16x2       172                16            12             96
Poisson-3x3     62                 18            1              33
Vibration-4x2   48                 8             4              24
Vibration-8x2   136                16            12             72
Vibration-8x4   168                16            8              96
ERI-1           76                 16            9              51
ERI-2           67                 19            1              47
12
Example of a DFG Mapping
Vibration-8x2
13
Results of routing nets using the proposed algorithms
DFG             avg. hor. C.L.   avg./max. ver. C.L.   # of global/micro nets to route   Time to map (sec)
Heat-8x1        0.35             0.75/3                36/64                             0.015
Heat-8x2        0.44             1.32/5                68/114                            1.75
Heat-16x2       0.47             1.64/7                204/343                           1.05
Poisson-3x3     0.68             2.4/16                67/120                            2074.5
Vibration-4x2   0.46             1.58/9                50/88                             0.34
Vibration-8x2   0.42             2.15/10               154/332                           2.20
Vibration-8x4   2.48             3.72/16               348/610                           6721.3
ERI-1           0.75             2.21/9                111/374                           53.61
ERI-2           0.78             2.99/9                95/332                            0.327
14
Thank you for your attention!