Transcript Slide 1
ROUTING ARCHITECTURE AND ALGORITHMS FOR A SUPERCONDUCTIVITY CIRCUITS-BASED COMPUTING HARDWARE
Farhad Mehdipour, Hiroaki Honda, Hiroshi Kataoka, Koji Inoue, Kazuaki Murakami
Kyushu University, Japan
CCECE 2011

Slide 2
CREST-JST (2006~): Low-power, high-performance, reconfigurable processor using single-flux quantum (SFQ) circuits
SFQ-LSRDP:
• Yokohama National Univ.: SFQ-FPU chip, cell library (N. Yoshikawa et al.)
• Nagoya Univ.: SFQ-RDP chip, cell library, and wiring (A. Fujimaki et al.)
• Nagoya Univ.: CAD for logic design and arithmetic circuits (N. Takagi (Leader) et al.)
• Kyushu Univ.: Architecture, Compiler and Applications (K. Murakami, K. Inoue, H. Honda, F. Mehdipour, H. Kataoka)
• Superconducting Research Lab. (SRL): SFQ process (S. Nagasawa et al.)
Our mission: Architecture, compiler and application development

Slide 3
Outline of Large-Scale Reconfigurable Data-Path (LSRDP) Processor
[Figure labels: single flux quantum, superconducting loop, Josephson junction]
SFQ Features:
• High-speed switching and signal transmission
• Low power consumption
• Compact implementation (smaller area)
• Suitable for pipeline processing

Slide 4
How it works
    inst; inst; …
    conf_LSRDP();
    Loop:
        rearrange_input_data();
        set_IO_info();
        run_LSRDP();
        inst; …
        sync_lsrdp();
        rearrange_output_data();
    End_Loop
    inst; …
[Figure labels: Memory Controller, Memory, GPP, Buffers, LSRDP, conf. bit-stream; caption: GPP waiting for the LSRDP to terminate the operation]

Slide 5
Architecture Exploration
[Figure: LSRDP layouts (rows of ADD/SUB and MUL PEs separated by ORNs) and PE structures (FU and TU blocks)]
LSRDP layouts: Layout-I, Layout-II, Layout-III
PE structures: PE arch. I, PE arch. II, Basic PE arch. (4-inps/3-outs, 3-inps/3-outs, 3-inps/2-outs)
ORN structures (three shown, in slide order):
• MCL: 1, 1, 2
• Number of rows: 1.5×M, 2×M, 1.5×M
• Number of columns: 4×MCL, 6×MCL+2, 4×MCL+1

Slide 6
LSRDP Tool Chain
Diagram elements:
• Application C code
• Modifying application code (inserting LSRDP instructions in the code)
• Modified application code
• LSRDP library file (function definitions & declarations)
• LSRDP architecture description
• DFG Extraction
• ISAcc or COINS compiler
• Data flow graphs
• Placing and Routing Tool
• .asm code for MIPS-based GPP
• Configuration file + various text & schematic reports
• Simulator
• Performance evaluation
Legend:
1: flow of the assembly code generation for GPP
2: flow of configuration bit-stream generation for the LSRDP

Slide 7
Mapping DFGs onto LSRDP
Inputs: DFG, LSRDP Architecture Description
Steps: Placing Input Nodes, Placing Operational & Output Nodes, Routing Nets, Routing IO Nets
Output: Final Map
[Figure label: Longest connections]

Slide 8
Global routing algorithms
Routing DFG connections between source and destination PEs
• Exhaustive search-based: very time consuming
• Branch and bound alg.: very fast
[Figure labels: src, dest, vacant, fully-occupied]

Slide 9
Micro-Routing Problem Definition
• Inputs
  – LSRDP basic specifications
    • Layout, Width (W), MCL, PE arch., etc.
    • List of connections between consecutive rows
  – ORN structure including
    • The number of CBs and T2s in each row
    • The number of CB rows
    • Topology of connections among CBs
• Output
  – Detailed routes via cross-bar switches
    • The list of CBs used for routing each connection
    • Configuration of CBs
[Figure: i-th row of PEs (FU, T), the ORN, and the (i+1)-th row of PEs (FU, T)]
A micro-routing algorithm has been implemented for the LSRDP with underlying Layout-II and PE arch. III

Slide 10
ORN Micro-routing Example
CB: 2-input/2-output
1/2CB: 1-input/2-output CB
Micro-nets:
• (PE1 → PE5)
• (PE2 → PE5, PE6, PE7)
• (PE3 → PE6, PE8)
• (PE4 → PE7, PE8)
[Figure: PE1 to PE4 connected to PE5 to PE8 through CBs and 1/2CBs; CB configurations labeled 00, 01, 10, 11]

Slide 11
ORN Micro-Routing Example: Heat 8x2, ORN between 3rd and 4th Rows
[Figure: routed micro-nets between the PEs in the 3rd row and the PEs in the 4th row]

Slide 12
Specifications of Attempted DFGs
DFG             total # of nodes   # of inputs   # of outputs   # of ops
Heat-8x1        34                 6             4              16
Heat-8x2        60                 8             4              32
Heat-16x2       172                16            12             96
Poisson-3x3     62                 18            1              33
Vibration-4x2   48                 8             4              24
Vibration-8x2   136                16            12             72
Vibration-8x4   168                16            8              96
ERI-1           76                 16            9              51
ERI-2           67                 19            1              47

Slide 13
Example of a DFG Mapping
[Figure: mapping of Vibration-8x2]

Slide 14
Results of routing nets using the proposed algorithms
DFG             avg. hor. C.L.   avg./max. ver. C.L.   # of global/micro nets to route   Time to map (sec)
Heat-8x1        0.35             0.75/3                36/64                             0.015
Heat-8x2        0.44             1.32/5                68/114                            1.75
Heat-16x2       0.47             1.64/7                204/343                           1.05
Poisson-3x3     0.68             2.4/16                67/120                            2074.5
Vibration-4x2   0.46             1.58/9                50/88                             0.34
Vibration-8x2   0.42             2.15/10               154/332                           2.20
Vibration-8x4   2.48             3.72/16               348/610                           6721.3
ERI-1           0.75             2.21/9                111/374                           53.61
ERI-2           0.78             2.99/9                95/332                            0.327

Slide 15
Thank you for your attention!
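
Supplementary sketch: the micro-routing slides list "Configuration of CBs" as an output and show 2-bit labels (00, 01, 10, 11) next to the cross-bar switches. The following minimal Python sketch illustrates one way such a configuration could be interpreted. The bit encoding, the fan-out behaviour of the 1/2CB, and the function names `crossbar_outputs` and `half_crossbar_outputs` are assumptions made for illustration; the slides do not define them.

```python
from typing import Sequence, Tuple


def crossbar_outputs(inputs: Sequence[str], config: str) -> Tuple[str, str]:
    """Return the signals on (out0, out1) of a 2-input/2-output CB.

    Assumed encoding (not taken from the slides): config is two bits "ab",
    where bit a selects the input that drives out0 and bit b selects the
    input that drives out1.
    """
    if len(inputs) != 2:
        raise ValueError("a CB has exactly two inputs")
    if len(config) != 2 or any(bit not in "01" for bit in config):
        raise ValueError("config must be two bits, e.g. '01'")
    return inputs[int(config[0])], inputs[int(config[1])]


def half_crossbar_outputs(signal: str) -> Tuple[str, str]:
    """Model a 1/2CB (1-input/2-output) as a plain fan-out (assumption)."""
    return signal, signal


if __name__ == "__main__":
    # Show which input reaches each output under every 2-bit configuration.
    for config in ("00", "01", "10", "11"):
        print(config, crossbar_outputs(("in0", "in1"), config))
```

Under this assumed encoding, 01 passes both inputs straight through, 10 swaps them, and 00 or 11 copies a single input to both outputs, which is the kind of fan-out a multi-destination micro-net such as (PE2 → PE5, PE6, PE7) would need.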