Transcript Document
Logical Effort of Higher Valency Adders
David Harris
Harvey Mudd College
November 2004

Slide 2: Outline
Are higher valency adders worth the hassle?
Prefix networks
Building blocks
Critical paths
Results
Conclusions

Slide 3: Prefix Networks
[Figure: block diagram of a 4-bit adder. Inputs A4..A1, B4..B1, and Cin feed the Precomputation stage, which produces the bitwise signals Gi:i and Pi:i for i = 0..4. The Prefix Network produces the group generate signals G0:0, G1:0, G2:0, G3:0, labeled as carries C0..C3. The Postcomputation stage produces the sums S1..S4 and Cout (C4).]

Slide 4: Pre/post computation
[Figure: transistor-level schematics, with transistor widths, of the Bitwise and Sum XOR cells in Inverting Static CMOS and Footless Domino.]

Cell      Term    Noninverting CMOS   Inverting CMOS   Footed Domino      Footless Domino
Bitwise   LEbit   9/3                 9/3              6/3 * 5/6          4/3 * 5/6
          PDbit   6/3 + 1             6/3              7/3 + 5/6          5/3 + 5/6
Sum XOR   LExor   9/3                 9/3              3/3 * 5/6          2/3 * 5/6
          PDxor   9/3 + 12/3          9/3 + 12/3       7/3 + 5/6          5/3 + 5/6
Buffer    LEbuf   1 * 1/2             1 * 1/2          2/3 * 5/6 * 1/2    1/3 * 5/6 * 1/2

Slide 5: Valency 2 Networks
[Figure: six 16-bit valency-2 prefix networks, each labeled with its (l, f, t) coordinates:
  (a) Brent-Kung (3, 0, 0)
  (b) Sklansky (0, 3, 0)
  (c) Kogge-Stone (0, 0, 3)
  (d) Han-Carlson (1, 0, 2)
  (e) Knowles [2,1,1,1] (0, 1, 2)
  (f) Ladner-Fischer (1, 2, 0)
A taxonomy cube places the networks on three axes: l (Logic Levels), f (Fanout), and t (Wire Tracks). It also marks Knowles [4,2,1,1] and a New (1, 1, 1) network.]

Slide 6: Valency 3: BK
[Figure: 27-bit valency-3 Brent-Kung network; inputs 26..0, outputs 26:0..0:0.]

Slide 7: Valency 3: LF
[Figure: 27-bit valency-3 Ladner-Fischer network; inputs 26..0, outputs 26:0..0:0.]

Slide 8: Valency 3: Sklansky
[Figure: 27-bit valency-3 Sklansky network; inputs 26..0, outputs 26:0..0:0.]

Slide 9: Valency 3: KS
[Figure: 27-bit valency-3 Kogge-Stone network; inputs 26..0, outputs 26:0..0:0.]

Slide 10: Valency 3: HC
[Figure: 27-bit valency-3 Han-Carlson network; inputs 26..0, outputs 26:0..0:0.]

Slide 11: Building Blocks
Black Cells, Gray Cells, Buffers
  – Black cells compute G and P
  – Gray cells compute G
Circuit families
  – Inverting Static CMOS
  – Noninverting Static CMOS
  – Footed Domino
  – Footless Domino

Slide 12: Valency 2
[Figure: transistor-level schematics, with transistor widths, of valency-2 cells that combine G1, P1, G0, P0 into G1:0 and P1:0; the domino version also combines K1 and K0 into K1:0.]

Slide 13: Valency 4
[Figure: transistor-level schematics, with transistor widths, of valency-4 cells that combine G3..G0 and P3..P0 into G3:0 and P3:0; the domino version also combines K3..K0 into K3:0.]

Slide 14: Delay Parameters (v=2)
Term    Cell    Inverting CMOS   Noninverting CMOS   Footed Domino   Footless Domino
PDg             7/3              7/3 + 1             6/3 + 5/6       4/3 + 5/6
PDp             2                2 + 1               3/3 + 5/6       2/3 + 5/6
LEg1            5/3                                  1/2 * 5/6       1/3 * 5/6
LEg0            6/3                                  3/3 * 5/6       2/3 * 5/6
LEp1    Gray    6/3                                  3/3 * 5/6       2/3 * 5/6
LEp1    Black   10/3                                 6/3 * 5/6       4/3 * 5/6
LEp0    Black   4/3                                  3/3 * 5/6       2/3 * 5/6
Note: in this table and in the v=3 and v=4 tables, each LE row gives a single static CMOS value, listed here under Inverting CMOS.

Slide 15: Delay Parameters (v=3)
Term    Cell    Inverting CMOS   Noninverting CMOS   Footed Domino   Footless Domino
PDg             13/3             13/3 + 1            10/3 + 5/6      7/3 + 5/6
PDp             3                3 + 1               4/3 + 5/6       3/3 + 5/6
LEg2            7/3                                  4/9 * 5/6       1/3 * 5/6
LEg1            8/3                                  2/3 * 5/6       1/2 * 5/6
LEg0            9/3                                  4/3 * 5/6       3/3 * 5/6
LEp2    Gray    5/3                                  4/3 * 5/6       3/3 * 5/6
LEp1    Gray    9/3                                  4/3 * 5/6       3/3 * 5/6
LEp2    Black   10/3                                 8/3 * 5/6       6/3 * 5/6
LEp1    Black   14/3                                 8/3 * 5/6       6/3 * 5/6
LEp0    Black   5/3                                  4/3 * 5/6       3/3 * 5/6

Slide 16: Delay Parameters (v=4)
Term    Cell    Inverting CMOS   Noninverting CMOS   Footed Domino   Footless Domino
PDg             17/3             17/3 + 1            13/3 + 5/6      10/3 + 5/6
PDp             4                4 + 1               5/3 + 5/6       4/3 + 5/6
LEg3            9/3                                  5/12 * 5/6      1/3 * 5/6
LEg2            10/3                                 5/9 * 5/6       4/9 * 5/6
LEg1            10/3                                 5/6 * 5/6       2/3 * 5/6
LEg0            12/3                                 5/3 * 5/6       4/3 * 5/6
LEp3    Gray    8/3                                  5/3 * 5/6       4/3 * 5/6
LEp2    Gray    6/3                                  5/3 * 5/6       4/3 * 5/6
LEp1    Gray    12/3                                 5/3 * 5/6       4/3 * 5/6
LEp3    Black   14/3                                 10/3 * 5/6      8/3 * 5/6
LEp2    Black   12/3                                 10/3 * 5/6      8/3 * 5/6
LEp1    Black   18/3                                 10/3 * 5/6      8/3 * 5/6
LEp0    Black   6/3                                  5/3 * 5/6       4/3 * 5/6

Slide 17: Wire Capacitance
Wires contribute capacitance per length
  – Count tracks t spanned by each wire
    $w$ = (wire cap / track) / unit inverter capacitance
    $w_i = w \, t_i$
    $W = \sum_i w_i$

Slide 18: Delay Estimation
Ideally choose best size for each gate
    $D = D_F + P = N F^{1/N} + P$
  – Only valid if wire parasitics are negligible
Alternatively, make each stage have unit drive
    $D = D_F + P + W = \sum_{i=1}^{N} f_i + P + W$   (inverting static CMOS)
    $D = D_F + P + W = \sum_{i=1}^{N/2} 2\sqrt{f_{2i} f_{2i-1}} + P + W$   (others, 2 gates/stage)
How much performance does this cost?
  – Very little unless stage efforts are nonuniform
  – Overestimates delay of Sklansky / LF networks

Slide 19: Method
Create MATLAB models of adders
  – For each stage, list f, p, t
    • Depends on architecture, valency, circuit family
  – Use MATLAB to calculate total delays
Compare ideal and unit drive delays for w = 0
  – Verifies unit drive simplification
Plot D vs. # of bits

Slide 20: Results
[Figure: plots of Delay (FO4), axis 0 to 30, versus # of bits, axis 0 to 50, for the Brent-Kung, Ladner-Fischer, Sklansky, Kogge-Stone, and Han-Carlson networks in each circuit family (Inverting CMOS, Noninverting CMOS, Footed Domino, Footless Domino), with curves for Valency 2, Valency 3, and Valency 4.]

Slide 21: Conclusions
For fast prefix networks, the logical effort model predicts that valency barely affects delay
  – Valency 2 designs are simpler
  – But most commercial designs use valency 4
Weaknesses of logical effort model
  – Overpredicts g for higher valency
  – Underpredicts p for higher valency
  – Calibrate model through simulation
  – Or simulate entire designs
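
Sketch: ideal vs. unit drive delay
The two estimates on the Delay Estimation slide, together with the wire term from the Wire Capacitance slide, can be sketched in a few lines. This is an illustration only: the deck's own models were written in MATLAB and are not reproduced here, so the sketch uses Python, and the function names and the example stage efforts are hypothetical. The two-gates-per-stage formula is assumed to be a sum of 2*sqrt(f_2i * f_2i-1) terms.

```python
from math import prod, sqrt


def ideal_delay(efforts, parasitic):
    """Ideal sizing: D = N * F**(1/N) + P, with F the product of stage efforts."""
    n = len(efforts)
    return n * prod(efforts) ** (1.0 / n) + parasitic


def wire_delay(tracks, w):
    """W = sum(w_i), with w_i = w * t_i for a wire spanning t_i tracks."""
    return sum(w * t for t in tracks)


def unit_drive_delay(efforts, parasitic, tracks=(), w=0.0):
    """Unit drive, one gate per stage: D = sum(f_i) + P + W."""
    return sum(efforts) + parasitic + wire_delay(tracks, w)


def unit_drive_delay_two_gate(efforts, parasitic, tracks=(), w=0.0):
    """Unit drive, two gates per stage (assumed form):
    D = sum(2 * sqrt(f_2i * f_2i-1)) + P + W."""
    if len(efforts) % 2:
        raise ValueError("expected an even number of gate efforts")
    pairs = zip(efforts[0::2], efforts[1::2])
    stage_sum = sum(2.0 * sqrt(a * b) for a, b in pairs)
    return stage_sum + parasitic + wire_delay(tracks, w)


if __name__ == "__main__":
    # Hypothetical stage efforts and parasitic delay, not taken from the slides.
    uniform = [4.0, 4.0, 4.0, 4.0]
    skewed = [2.0, 8.0, 2.0, 8.0]
    p = 8.0
    for name, f in (("uniform", uniform), ("skewed", skewed)):
        print(name, ideal_delay(f, p), unit_drive_delay(f, p),
              unit_drive_delay_two_gate(f, p))
```

With the uniform efforts the ideal and unit drive estimates agree. With the skewed efforts the one-gate-per-stage estimate is 28 against an ideal of 24, which matches the slide's remark that unit drive costs little unless stage efforts are nonuniform.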