VLSI Arithmetic Adders & Multipliers

VLSI Arithmetic
Adders
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Introduction
• Digital computer arithmetic belongs to computer architecture, but it is also an aspect of logic design.
• The objective of computer arithmetic is to develop algorithms that use the available hardware in the most efficient way.
• Ultimately, speed, power and chip area are the most often used measures, which creates a strong link between the algorithms and the implementation technology.
Oklobdzija 2004
Computer Arithmetic
2
Basic Operations
• Addition
• Multiplication
• Multiply-Add
• Division
• Evaluation of Functions
• Multi-Media
Addition of Binary Numbers
Full Adder. The full adder is the fundamental building block
of most arithmetic circuits:
[Figure: full adder block with inputs ai, bi and Cin and outputs si and Cout.]
The sum and carry outputs are described as:
$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i$
$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i \bar{c}_i + a_i b_i c_i = a_i b_i + a_i c_i + b_i c_i$
Addition of Binary Numbers
Inputs        Outputs
ci  ai  bi    si  ci+1
0   0   0     0   0
0   0   1     1   0      Propagate
0   1   0     1   0      Propagate
0   1   1     0   1      Generate
1   0   0     1   0
1   0   1     0   1      Propagate
1   1   0     0   1      Propagate
1   1   1     1   1      Generate
Full-Adder Implementation
Full-adder operation is defined by the equations:
$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$
$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$
Carry-propagate: $p_i = a_i \oplus b_i$
Carry-generate: $g_i = a_i \cdot b_i$
A one-bit adder can be implemented as shown:
[Figure: one-bit adder with inputs ai, bi and cin and outputs si and cout.]
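The propagate/generate form of the full adder can be checked against ordinary binary addition with a short sketch. The function names `full_adder` and `truth_table` are illustrative only and do not come from the slides.

```python
from itertools import product

def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One-bit full adder written with propagate and generate signals."""
    p = a ^ b                # carry-propagate: p_i = a_i XOR b_i
    g = a & b                # carry-generate:  g_i = a_i AND b_i
    s = p ^ c_in             # sum:   s_i = p_i XOR c_i
    c_out = g | (p & c_in)   # carry: c_{i+1} = g_i + p_i c_i
    return s, c_out

def truth_table():
    """Rows (c_i, a_i, b_i, s_i, c_{i+1}) in the order used on the slide."""
    return [(c, a, b, *full_adder(a, b, c)) for c, a, b in product((0, 1), repeat=3)]

if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```

Each row satisfies a + b + c = s + 2 c_out, which is the arithmetic meaning of the two Boolean equations.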
High-Speed Addition
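$c_{i+1} = g_i + p_i c_i$
$g_i = a_i \cdot b_i$
$p_i = a_i \oplus b_i$
$s_i = p_i \oplus c_i$
[Figure: one-bit adder with inputs ai, bi and cin, outputs si and cout, and a multiplexer (select input s, data inputs 0 and 1) in the carry path.]
A one-bit adder can be implemented more efficiently with a multiplexer in the carry path, because the MUX is faster.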
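A minimal sketch of the multiplexer carry follows. It assumes the multiplexer is selected by p_i and passes c_in when p_i = 1 and g_i when p_i = 0; the exact wiring of the figure is not recoverable here, so treat that as an assumption. The names `mux` and `mux_carry` are illustrative.

```python
def mux(select: int, d0: int, d1: int) -> int:
    """Two-input multiplexer: returns d1 when select is 1, otherwise d0."""
    return d1 if select else d0

def mux_carry(a: int, b: int, c_in: int) -> int:
    """Carry-out chosen by a multiplexer (assumed wiring, see lead-in)."""
    p = a ^ b   # propagate selects the incoming carry
    g = a & b   # generate is used when the carry is not propagated
    return mux(p, g, c_in)

def and_or_carry(a: int, b: int, c_in: int) -> int:
    """Reference form c_{i+1} = g_i + p_i c_i."""
    return (a & b) | ((a ^ b) & c_in)
```

The two forms agree for all eight input combinations because g_i and p_i can never both be 1.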
The Ripple-Carry Adder
[Figure: four-bit ripple-carry adder. Four full-adder (FA) cells with inputs A0..A3 and B0..B3 and sum outputs S0..S3; the carry-out of each cell drives the carry-in of the next (for example Co,0 = Ci,1).]
Worst-case delay is linear in the number of bits:
$t_d = O(N)$
$t_{adder} \approx (N-1)\,t_{carry} + t_{sum}$
Goal: make the carry path circuit as fast as possible.
From Rabaey
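The ripple-carry structure and its linear delay estimate can be sketched as below. Bit lists are least significant bit first, and the helper names (`ripple_carry_add`, `to_bits`, `from_bits`, `rca_delay`) and the unit delay values are assumptions made for illustration.

```python
def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists (LSB first) one full adder at a time."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        p, g = a ^ b, a & b
        sums.append(p ^ carry)      # s_i = p_i XOR c_i
        carry = g | (p & carry)     # c_{i+1} = g_i + p_i c_i
    return sums, carry

def to_bits(x, n):
    """Integer to a list of n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """List of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))

def rca_delay(n, t_carry=1.0, t_sum=1.0):
    """Worst-case delay estimate (N - 1) * t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum

if __name__ == "__main__":
    s, c = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(s), c, rca_delay(4))
```

Doubling N roughly doubles `rca_delay`, which is the motivation for the faster carry schemes that follow.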
Inversion Property
[Figure: two full-adder (FA) cells with inputs A, B, Ci and outputs S, Co; the second cell has all inputs and outputs inverted.]
$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$
$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i)$
From Rabaey
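The inversion property can be confirmed by exhaustive enumeration. This is a small sketch with illustrative function names.

```python
from itertools import product

def fa_sum(a, b, c):
    """Full-adder sum output."""
    return a ^ b ^ c

def fa_carry(a, b, c):
    """Full-adder carry output (majority function)."""
    return (a & b) | (a & c) | (b & c)

def inv(x):
    """Logical inversion of a single bit."""
    return 1 - x

def inversion_property_holds():
    """Check that inverting all inputs inverts both outputs, for all 8 cases."""
    return all(
        inv(fa_sum(a, b, c)) == fa_sum(inv(a), inv(b), inv(c))
        and inv(fa_carry(a, b, c)) == fa_carry(inv(a), inv(b), inv(c))
        for a, b, c in product((0, 1), repeat=3)
    )
```

Because both outputs are self-dual in this sense, a cell can consume inverted inputs and deliver inverted outputs without extra inverters in the carry path.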
Minimize Critical Path by Reducing Inverting Stages
[Figure: four-bit ripple-carry adder built from alternating even and odd FA' cells, with inputs A0..A3, B0..B3 and Ci,0, sum outputs S0..S3 and carries Co,0..Co,3.]
Exploit the inversion property.
Note: this needs 2 different types of cells.
From Rabaey
Ripple Carry Adder
Carry chain of an RCA implemented using a multiplexer from the standard cell library:
[Figure: three adjacent bit slices with inputs ai, bi through ai+2, bi+2 and sums si through si+2. The critical path runs along the multiplexer carry chain from cin through ci and ci+1 to cout.]
Oklobdzija, ISCAS’88
Manchester Carry-Chain
Realization of the Carry Path
• Simple and very popular scheme for implementation of
carry signal path
[Figure: Manchester carry chain running from Carry in to Carry out. Each bit position has a generate device, a propagate device, and a predischarge & kill device.]
Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers:
A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.
Carry-Skip Adder
MacSorley, Proc IRE 1/61
Lehman, Burla, IRE Trans on Comp, 12/61
Carry-Skip Adder
[Figure: four-bit adder block shown twice. Top: a ripple chain of four FA cells with propagate and generate inputs, carry-in Ci,0 and carries Co,0 through Co,3. Bottom: the same chain with a bypass multiplexer at its output, controlled by BP = P0 P1 P2 P3.]
From Rabaey
Idea: if (P0 and P1 and P2 and P3) = 1, then Co,3 = Ci,0; otherwise the carry is either "killed" or "generated" inside the block.
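The bypass idea can be sketched for one block as follows: when every bit propagates, the block carry-out is taken directly from the block carry-in instead of waiting for the ripple. Bit lists are LSB first; the function names are illustrative and not taken from the slides.

```python
from itertools import product

def ripple_block(a_bits, b_bits, c_in):
    """Ripple a carry through one block; returns sums, carry-out and propagates."""
    sums, props, carry = [], [], c_in
    for a, b in zip(a_bits, b_bits):
        p, g = a ^ b, a & b
        sums.append(p ^ carry)
        props.append(p)
        carry = g | (p & carry)
    return sums, carry, props

def carry_skip_block(a_bits, b_bits, c_in):
    """Same block with a bypass multiplexer on the carry-out."""
    sums, rippled, props = ripple_block(a_bits, b_bits, c_in)
    bp = int(all(props))                 # BP = P0 P1 P2 P3
    c_out = c_in if bp else rippled      # multiplexer selects the bypass when BP = 1
    return sums, c_out, bp

def bypass_matches_ripple(width=4):
    """The bypass changes timing only: the logic value of the carry is unchanged."""
    for a_bits in product((0, 1), repeat=width):
        for b_bits in product((0, 1), repeat=width):
            for c_in in (0, 1):
                _, rippled, _ = ripple_block(a_bits, b_bits, c_in)
                _, skipped, _ = carry_skip_block(a_bits, b_bits, c_in)
                if rippled != skipped:
                    return False
    return True
```

The check shows that the multiplexer never changes the result; its only effect is to shorten the path an incoming carry takes through a fully propagating block.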
Carry-Skip Adder: N bits, k bits/group, r = N/k groups
[Figure: N-bit carry-skip adder made of r groups of k-bit adders, from a0, b0 and Cin at the least significant end to aN-1, bN-1 and Cout at the most significant end. Each group j has a group generate Gj and a group propagate Pj; an AND gate and an OR gate per group form the carry passed to the next group.]
Critical path delay = $2(k-1) + (N/k - 2)$
Carry-Skip Adder
[Figure: propagation delay tp versus number of bits N for the ripple adder and the bypass adder; the range N = 4..8 is marked on the N axis.]
$t_d = 2(k-1)\,t_{RCA} + \left(\frac{N}{k} - 2\right) t_{SKIP}$
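A small sketch of how the group size k trades ripple delay against skip delay. It assumes the delay expression 2(k-1) t_RCA + (N/k - 2) t_SKIP with unit values for t_RCA and t_SKIP, and it only considers group sizes that divide N and leave at least two groups. The function names are illustrative.

```python
def carry_skip_delay(n, k, t_rca=1.0, t_skip=1.0):
    """Critical-path estimate: ripple in the first and last group, skip the rest."""
    return 2 * (k - 1) * t_rca + (n / k - 2) * t_skip

def best_group_size(n, t_rca=1.0, t_skip=1.0):
    """Group size k (dividing n, at least two groups) with the smallest estimate."""
    candidates = [k for k in range(1, n // 2 + 1) if n % k == 0]
    return min(candidates, key=lambda k: carry_skip_delay(n, k, t_rca, t_skip))

if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        print(k, carry_skip_delay(32, k))
```

With these unit assumptions, small groups are dominated by the number of skips and large groups by the ripple inside the first and last group, so the minimum lies in between.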
Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
[Figure: carry signal path of the variable block adder from Cin to Cout. Groups 0 through m, spanning bits a0, b0 to aN-1, bN-1, produce group signals G0..Gm and P0..Pm; the carry either ripples inside a group or skips over it.]
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
Block sizes (bits) along the carry chain: 1, 3, 4, 5, 6, 5, 4, 3, 1
Any-point-to-any-point delay = 9Δ, as compared to 12Δ for CSKA.
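The comparison can be reproduced with a simple counting model. The model is an assumption, since the slide's own delay model is not available here: one Δ per bit rippled inside a block and one Δ per block skipped, with the worst path starting at the first bit of one block and ending at the last bit of another. The CSKA figure is taken to mean eight constant blocks of 4 bits, which is also an assumption. The function name is illustrative.

```python
def worst_case_carry_delay(block_sizes):
    """Worst any-point-to-any-point carry delay, in units of Δ (assumed model)."""
    # A carry that starts and ends in the same block only ripples.
    worst = max(size - 1 for size in block_sizes)
    for i, s_i in enumerate(block_sizes):
        for j in range(i + 1, len(block_sizes)):
            s_j = block_sizes[j]
            ripple_out = s_i - 1      # ripple to the end of the starting block
            skips = j - i - 1         # one Δ for every block skipped in between
            ripple_in = s_j - 1       # ripple to the last bit of the final block
            worst = max(worst, ripple_out + skips + ripple_in)
    return worst

VARIABLE_BLOCKS = [1, 3, 4, 5, 6, 5, 4, 3, 1]   # block sizes from the slide (32 bits)
CONSTANT_BLOCKS = [4] * 8                       # assumed constant-block CSKA

if __name__ == "__main__":
    print(worst_case_carry_delay(VARIABLE_BLOCKS), worst_case_carry_delay(CONSTANT_BLOCKS))
```

Small blocks at the ends shorten the ripple where the number of skips is largest, and large blocks in the middle cost little because few skips remain around them.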
Delay Calculation for Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
[Figure: delay model for a four-bit group with inputs P0..P3 and G0..G3, carry-in Ci,0, carry-out Co,3 and bypass signal BP.]
Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
Variable Group Length
$t_d = c_1 + c_2 \sqrt{N + c_3}$
Oklobdzija, Barnes, Arith’85
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)
Variable Block Lengths
• No closed-form solution for delay
• It is a dynamic programming problem
Delay Comparison: Variable Block Adder
[Figure: delay (vertical axis, 0 to 16) versus adder size N (horizontal axis, 4 to 60) with curves for VBA, CLA and multi-level VBA.]
VLSI Arithmetic
Lecture 4
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Carry-Lookahead Adder
(Weinberger and Smith, 1958)
ARITH-13: Presenting Achievement Award to Arnold Weinberger of IBM (who
invented CLA adder in 1958)
Ref: A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”,
National Bureau of Standards, Circ. 591, p.3-12, 1958.
CLA Definitions: One-bit adder
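$c_{i+1} = g_i + p_i c_i$
$g_i = a_i \cdot b_i$
$p_i = a_i \oplus b_i$
$s_i = p_i \oplus c_i$
[Figure: one-bit adder with inputs ai, bi and cin, outputs si and cout, and a multiplexer (select input s, data inputs 0 and 1) in the carry path.]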
CLA Definitions: 4-bit Adder
[Figure: four one-bit adder cells for bits i through i+3, with inputs ai, bi through ai+3, bi+3, signals gi, pi through gi+3, pi+3 and carries Ci through Ci+4.]
$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$
$c_{i+2} = g_{i+1} + p_{i+1} c_{i+1} = g_{i+1} + p_{i+1}(g_i + p_i c_i) = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i$
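Carry-Lookahead Adder: 4 bits
[Figure: four one-bit adder cells for bits i through i+3, with inputs ai, bi through ai+3, bi+3, signals gi, pi through gi+3, pi+3 and carries Ci through Ci+4.]
$c_{i+3} = g_{i+2} + p_{i+2} c_{i+2} = g_{i+2} + p_{i+2}(g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i) = g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i$
$c_{i+4} = g_{i+3} + p_{i+3} c_{i+3} = g_{i+3} + p_{i+3}(g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i) = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i + p_{i+3} p_{i+2} p_{i+1} p_i c_i$
In the last expression, the first four terms form the group generate $G_j$ and the product $p_{i+3} p_{i+2} p_{i+1} p_i$ is the group propagate $P_j$.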
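The expanded lookahead equations can be checked against a bit-by-bit ripple of the same carries. This sketch uses i = 0 and LSB-first bit lists; the function names are illustrative.

```python
from itertools import product

def cla4_carries(a_bits, b_bits, c0):
    """Carries c1..c4 of a 4-bit group from the expanded lookahead equations."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def ripple_carries(a_bits, b_bits, c0):
    """Reference: the same carries obtained one stage at a time."""
    carries, carry = [], c0
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | ((a ^ b) & carry)
        carries.append(carry)
    return carries

def lookahead_matches_ripple():
    """Exhaustive comparison over all 4-bit operands and both carry-in values."""
    return all(
        cla4_carries(a, b, c0) == ripple_carries(a, b, c0)
        for a in product((0, 1), repeat=4)
        for b in product((0, 1), repeat=4)
        for c0 in (0, 1)
    )
```

Every carry is a two-level function of g, p and c0, so none of them waits for the carry of the previous bit.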
Carry-Lookahead Adder
$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$
$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$
$c_{4(j+1)} = G_j + P_j c_{4j}$
[Figure: four-bit P, G group. Inputs ai, bi through ai+3, bi+3 produce gi, pi through gi+3, pi+3; the group produces Gj and Pj and the carries C4j+1, C4j+2, C4j+3 and C4(j+1) from the group carry-in.]
One gate delay Δ to calculate p, g.
One Δ to calculate P and two for G.
Three gate delays to calculate C4(j+1).
Compare that to 8Δ in RCA!
Carry-Lookahead Adder
(Weinberger and Smith)
$G^*_j = G_{i+3} + P_{i+3} G_{i+2} + P_{i+3} P_{i+2} G_{i+1} + P_{i+3} P_{i+2} P_{i+1} G_i$
$P^*_j = P_{i+3} P_{i+2} P_{i+1} P_i$
$c_{4(j+1)} = G^*_k + P^*_k c_{4j}$
[Figure: lookahead block over four groups with inputs Gj, Pj through Gj+3, Pj+3 and carry-in C4j; it produces G*, P* and the carries C4j+1, C4j+2, C4j+3 and C4(j+1).]
Additional two gate delays.
C16 will take a total of 5Δ vs. 32Δ for RCA!
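The second lookahead level applies the same combining equations to group signals instead of bit signals. The sketch below computes the carry-out of a 16-bit adder from four 4-bit groups and compares it with integer addition. The function names and the choice of a 16-bit width with carry-in c0 are assumptions for illustration.

```python
def group_pg(g, p):
    """Combine four (generate, propagate) pairs, LSB first, into one pair."""
    big_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
             | (p[3] & p[2] & p[1] & g[0]))
    big_p = p[3] & p[2] & p[1] & p[0]
    return big_g, big_p

def carry_out_16(a, b, c0):
    """C16 of a 16-bit addition using two levels of 4-way lookahead."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(16)]   # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]   # bit propagate
    # First level: one (G, P) pair per 4-bit group.
    groups = [group_pg(g[4 * j:4 * j + 4], p[4 * j:4 * j + 4]) for j in range(4)]
    # Second level: the same equations applied to the four group pairs.
    g_star, p_star = group_pg([gg for gg, _ in groups], [pp for _, pp in groups])
    return g_star | (p_star & c0)

if __name__ == "__main__":
    print(carry_out_16(0xFFFF, 0x0001, 0), carry_out_16(0x7FFF, 0x0001, 0))
```

Reusing `group_pg` at both levels mirrors the way the G*, P* equations repeat the form of the G, P equations one level down.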
32-bit Carry Lookahead Adder
[Figure: 32-bit carry-lookahead adder with inputs ai, bi, carry-in Cin, carry-out Cout and intermediate carries C4, C8, C12, C16, C20, C24 and C28. It is built from:
• individual adders generating gi, pi and sum Si;
• carry-lookahead blocks of 4 bits generating Gi, Pi and Cin for the adders;
• carry-lookahead super-blocks of 4-bit blocks generating G*i, P*i and Cin for the 4-bit blocks;
• a group producing the final carry Cout and C16.]
Critical path delay = Δ (for gi, pi) + 2×2Δ (for G, P) + 3×2Δ (for Cin) + 1 XOR-Δ (for Sum) = approx. 12Δ of delay
Carry-Lookahead Adder
(Weinberger and Smith: original derivation, 1958 )
Carry-Lookahead Adder (Weinberger and Smith)
Please notice the similarity with parallel-prefix adders!
Motorola: CLA Implementation Example
A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”, Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.
Critical path in Motorola's 64-bit CLA
[Figure: block diagram of the 64-bit CLA. PG blocks combine the bit signals P0..P63 and G0..G63 into group signals (P3:0, G3:0, ..., P15:0, G15:0, ..., P47:0, G47:0, P63:0, G63:0); the carries C4, C8, C12, C16, C32, C48, C52, C56 and C60 feed the groups, and a carry block produces C61, C62 and C63. Arrival times annotated on the diagram: 1.05 ns, 1.7 ns, 2.0 ns, 2.35 ns, 2.7 ns, 3.75 ns, 4.8 ns.]
Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63
Motorola's 64-bit CLA
Conventional PG Block
[Figure: conventional PG block; the carry ripples locally, with 5 transistors in the path.]
No better situation here! Basically, this is MCC (Manchester carry chain) performance with carry-skip. One should not expect any better results than VBA.
Motorola's 64-bit CLA
Modified PG Block
Intermediate propagate signals Pi:0 are generated to speed up C3. The critical path still resembles MCC.
Motorola's 64-bit CLA
[Figure: block diagram of the 64-bit CLA (PG blocks and carry block) annotated with arrival times at intermediate nodes: 1.05 ns, 1.7 ns, 1.8 ns, 2.0 ns, 2.2 ns, 2.35 ns, 2.7 ns, 2.9 ns, 3.2 ns, 3.55 ns, 3.75 ns, 3.9 ns, 4.8 ns.]
Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63