CSE 477. VLSI Systems Design - University of California

Download Report

Transcript CSE 477. VLSI Systems Design - University of California

CSE477
VLSI Digital Circuits
Fall 2002
Lecture 20: Adder Design
Mary Jane Irwin ( www.cse.psu.edu/~mji )
www.cse.psu.edu/~cg477
[Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
CSE477 L20 Adder Design.1
Irwin&Vijay, PSU, 2002
Review: Basic Building Blocks

Datapath

Execution units
- Adder, multiplier, divider, shifter, etc.



Control


Finite state machines (PLA, ROM, random logic)
Interconnect


Register file and pipeline registers
Multiplexers, decoders
Switches, arbiters, buses
Memory

Caches (SRAMs), TLBs, DRAMs, buffers
CSE477 L20 Adder Design.2
Irwin&Vijay, PSU, 2002
The 1-bit Binary Adder
Cin
A
B
1-bit Full
Adder
(FA)
Cout
G = A&B
P=AB
K = !A & !B
S
A
B
Cin
Cout
S
carry status
0
0
0
0
0
kill
0
0
1
0
1
kill
0
1
0
0
1
propagate
0
1
1
1
0
propagate
1
0
0
0
1
propagate
1
0
1
1
0
propagate
1
1
0
1
0
generate
1
1
1
1
1
generate
S = A  B  Cin
= P  Cin
Cout = A&B | A&Cin | B&Cin (majority function)
= G | P&Cin

How can we use it to build a 64-bit adder?

How can we modify it easily to build an adder/subtractor?

How can we make it better (faster, lower power, smaller)?
CSE477 L20 Adder Design.3
Irwin&Vijay, PSU, 2002
FA Gate Level Implementations

The way you learned to design in CSE271 and CSE471
A B Cin
A
t1
B Cin
t0
t2 t1
t0
t2
Cout
S
Cout
S
CSE477 L20 Adder Design.4
Irwin&Vijay, PSU, 2002
Review: XOR FA
Cin
A
S
B
Cout
16 transistors
CSE477 L20 Adder Design.5
Irwin&Vijay, PSU, 2002
Review: CPL FA
!B
!Cin
B
Cin
A
!S
!A
S
B
!B
Cin
A
!Cin
!Cout
B
Cin
!A
Cout
!B
!Cin
20+8 transistors, dual rail – beware of threshold drops
CSE477 L20 Adder Design.6
Irwin&Vijay, PSU, 2002
Review: Mirror Adder
24+4 transistors
A
8 B
8 B
8
4 B
4 Cin
4
kill
0-propagate
8
A 8
Cin
4
1-propagate
A
A
4 B
A 4
4 B
4
!Cout
2
generate
4
Cout = A&B | B&Cin | A&Cin
A
2 B
2 Cin 2
B
6
A
6
Cin 6
!S
Cin 3
A
3
B
3
SUM = A&B&Cin | COUT&(A | B | Cin)
Sizing: Each input in the carry circuit has a logical effort of 2 so the
optimal fan-out for each is also 2. Since !Cout drives 2 internal and 2
inverter transistor gates (to form Cin for the nms bit adder) should
oversize the carry circuit. PMOS/NMOS ratio of 2.
CSE477 L20 Adder Design.8
Irwin&Vijay, PSU, 2002
Mirror Adder Features

The NMOS and PMOS chains are completely
symmetrical with a maximum of two series transistors in
the carry circuitry, guaranteeing identical rise and fall
transitions if the NMOS and PMOS devices are properly
sized.

When laying out the cell, the most critical issue is the
minimization of the capacitances at node !Cout (four
diffusion capacitances, two internal gate capacitances,
and two inverter gate capacitances). Shared diffusions
can reduce the stack node capacitances.

The transistors connected to Cin are placed closest to the
output.

Only the transistors in the carry stage have to be
optimized for optimal speed. All transistors in the sum
stage can be minimal size.
CSE477 L20 Adder Design.9
Irwin&Vijay, PSU, 2002
A 64-bit Adder/Subtractor
add/subt


Ripple Carry Adder (RCA)
built out of 64 FAs
Subtraction – complement
all subtrahend bits (xor
gates) and set the low
order carry-in
RCA

advantage: simple logic,
so small (low cost)

disadvantage: slow (O(N)
for N bits) and lots of
glitching (so lots of energy
consumption)
CSE477 L20 Adder Design.10
A0
1-bit
FA
C1
S0
A1
1-bit
FA
C2
S1
A2
1-bit
FA
C3
S2
B0
B1
B2
...

C0=Cin
C63
A63
B63
1-bit
FA
S63
C64=Cout
Irwin&Vijay, PSU, 2002
Ripple Carry Adder (RCA)
A3 B3
A2 B2
A1 B1
A0 B0
FA
FA
FA
FA
Cout=C4
S3
S2
S1
C0=Cin
S0
Tadder  TFA(A,BCout) + (N-2)TFA(CinCout) + TFA(CinS)
T = O(N) worst case delay
Real Goal: Make the fastest possible carry path
CSE477 L20 Adder Design.11
Irwin&Vijay, PSU, 2002
Inversion Property

Inverting all inputs to a FA results in inverted values for
all outputs
A
Cout
B
FA
A
Cin

Cout
S
B
FA
Cin
S
!S (A, B, Cin) = S(!A, !B, !Cin)
!Cout (A, B, Cin) = Cout (!A, !B, !Cin)
CSE477 L20 Adder Design.12
Irwin&Vijay, PSU, 2002
Exploiting the Inversion Property
A3 B3
A2 B2
A1 B1
A0 B0
FA’
FA’
FA’
FA’
S3
S2
S1
S0
Cout=C4
C0=Cin
inverted cell regular cell
Minimizes the critical path (the carry chain) by
eliminating inverters between the FAs (will need to
increase the transistor sizing on the carry chain portion of
the mirror adder).

Now need two “flavors” of FAs
CSE477 L20 Adder Design.13
Irwin&Vijay, PSU, 2002
Fast Carry Chain Design

The key to fast addition is a low latency carry network

What matters is whether in a given position a carry is




generated
propagated
annihilated (killed)
Gi = Ai & Bi = AiBi
Pi = Ai  Bi (sometimes use Ai | Bi)
Ki = !Ai & !Bi
Giving a carry recurrence of
Ci+1 = Gi | PiCi
C1 = G0 | P0C0
C2 = G1 | P1G0 | P1P0 C0
C3 = G2 | P2G1 | P2P1G0 | P2P1P0 C0
C4 = G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 C0
CSE477 L20 Adder Design.15
Irwin&Vijay, PSU, 2002
Manchester Carry Chain

Switches controlled by Gi and Pi
!Ci+1
!Ci
Gi
Pi
clk

Total delay of



time to form the switch control signals Gi and Pi
setup time for the switches
signal propagation delay through N switches in the worst case
CSE477 L20 Adder Design.16
Irwin&Vijay, PSU, 2002
4-bit Sliced MCC Adder
A3
B3
A2
B2
A1
B1
A0
B0
clk
&

G P
&

G P
&

G P
&

G P
!C4
!C0
!C3
!C1
!C2




S3
S2
S1
S0
CSE477 L20 Adder Design.17
Irwin&Vijay, PSU, 2002
Domino Manchester Carry Chain Circuit
3
3
P3
1
Ci,4
P2
3
2
3
P1
3
P0
3
clk
4
1
G3 2
G2 3
G1 4
G0 5
Ci,0
2
3
4
5
6
clk
!(G0 | P0 Ci,0)
!(G2 | P2G1 | P2P1G0 | P2P1P0 Ci,0)
!(G1 | P1G0 | P1P0 Ci,0)
!(G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 Ci,0)
CSE477 L20 Adder Design.18
Irwin&Vijay, PSU, 2002
Binary Adder Landscape
synchronous word parallel adders
ripple carry adders (RCA)
carry prop min adders
T = O(N), A = O(N)
signed-digit
adders
fast carry prop
adders
residue adders
T = O(1), A = O(N)
Manchester
carry chain
T = O(N)
A = O(N)
CSE477 L20 Adder Design.19
carry
select
parallel
prefix
conditional
sum
T = O(log N)
A = O(N log N)
carry
skip
T = O(N),
A = O(N)
Irwin&Vijay, PSU, 2002
Carry-Skip (Carry-Bypass) Adder
Co,3
A3 B3
A2 B2
A1 B1
A0 B0
FA
FA
FA
FA
Ci,0
Co,3
S3
BP = P0 P1 P2 P3
S2
S1
S0
“Block Propagate”
If (P0 & P1 & P2 & P3 = 1) then Co,3 = Ci,0 otherwise
the block itself kills or generates the carry internally
CSE477 L20 Adder Design.20
Irwin&Vijay, PSU, 2002
Carry-Skip Chain Implementation
block carry-out
carry-out
BP
block carry-in
P3
P2
P1
P0
!Cout
Cin
G3
G2
G1
G0
BP
CSE477 L20 Adder Design.21
Irwin&Vijay, PSU, 2002
4-bit Block Carry-Skip Adder
bits 12 to 15
bits 8 to 11
bits 4 to 7
bits 0 to 3
Setup
Setup
Setup
Setup
Carry
Propagation
Carry
Propagation
Carry
Propagation
Carry
Propagation
Ci,0
Sum
Sum
Sum
Sum
Worst-case delay  carry from bit 0 to bit 15 = carry generated
in bit 0, ripples through bits 1, 2, and 3, skips the middle two
groups (B is the group size in bits), ripples in the last group from
bit 12 to bit 15
Tadd = tsetup + B tcarry + ((N/B) -1) tskip +B tcarry + tsum
CSE477 L20 Adder Design.22
Irwin&Vijay, PSU, 2002
Optimal Block Size and Time

Assuming one stage of ripple (tcarry) has the same delay
as one skip logic stage (tskip) and both are 1
TCSkA = 1 + B + (N/B-1) + B + 1
tsetup
ripple in
block 0
skips
ripple in
last block
tsum
= 2B + N/B + 1

So the optimal block size, B, is
dTCSkA/dB = 0  (N/2) = Bopt

And the optimal time is
Optimal TCSkA = 2((2N)) + 1
CSE477 L20 Adder Design.23
Irwin&Vijay, PSU, 2002
Carry-Skip Adder Extensions

Variable block sizes

A carry that is generated in, or absorbed by, one of the inner
blocks travels a shorter distance through the skip blocks, so can
have bigger blocks for the inner carries without increasing the
overall delay
Cout

Cin
Multiple levels of skip logic
Cout
Cin
skip level 1
skip level 2
CSE477 L20 Adder Design.24
AND of the
first level skip
signals (BP’s)
Irwin&Vijay, PSU, 2002
Carry-Skip Adder Comparisons
70
60
50
40
30
B=4
20
B=2
B=5
B=6
RCA
CSkA
VSkA
B=3
10
0
8 bits
CSE477 L20 Adder Design.25
16 bits
32 bits
48 bits
64 bits
Irwin&Vijay, PSU, 2002
Parallel Prefix Adders (PPAs)

Define carry operator € on (G,P) signal pairs
(G’’,P’’) (G’,P’)
G’’
G’
€
where
G = G’’  P’’G’
P = P’’P’
(G,P)

!G
P’’
€ is associative, i.e.,
[(g’’’,p’’’) € (g’’,p’’)] € (g’,p’) = (g’’’,p’’’) € [(g’’,p’’) € (g’,p’)]
€
€
€
CSE477 L20 Adder Design.31
€
Irwin&Vijay, PSU, 2002
PPA General Structure

Given P and G terms for each bit position, computing all
the carries is equal to finding all the prefixes in parallel
(G0,P0) € (G1,P1) € (G2,P2) € … € (GN-2,PN-2) € (GN-1,PN-1)

Since € is associative, we can group them in any order

but note that it is not commutative
Pi, Gi logic (1 unit delay)

Measures to consider


Ci parallel prefix logic tree
(1 unit delay per level)




Si logic (1 unit delay)
CSE477 L20 Adder Design.32

number of € cells
tree cell depth (time)
tree cell area
cell fan-in and fan-out
max wiring length
wiring congestion
delay path variation (glitching)
Irwin&Vijay, PSU, 2002
Brent-Kung PPA
G15 G14 G13 G12
p15 p14 p13 P12
€
€
G11 G10 G9
p11 P10 p9
€
€
G8 G7
P8 P7
€
G6 G5
P6 P5
€
€
G4 G3
P4 P3
€
€
€
€
G2
p2
G1
P1
G0
P0 C
in
€
€
C2
C1
€
€
€
€
€
€
€
€
€
C16 C15 C14 C13 C12 C11
CSE477 L20 Adder Design.34
€
€
C10 C9
€
C8 C7
€
C6 C5
€
C4 C3
Irwin&Vijay, PSU, 2002
Kogge-Stone PPF Adder
G15 G14 G13 G12
P15 P14 P13 P12
G11 G10 G9
P11 P10 P9
G8
P8
G7
P7
G6
P6
G5
P5
G4
P4
G3
P3
G2 G1
P2 P1
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
C8
C7
C16 C15 C14 C13 C12 C11 C10 C9
CSE477 L20 Adder Design.35
C6
C5 C4
Tadd = tsetup + log2N t€ + tsum
C3
G0
P0 C
in
€
€
C2
C1
Irwin&Vijay, PSU, 2002
More Adder Comparisons
70
60
50
RCA
CSkA
VSkA
KS PPA
40
30
20
10
0
8 bits
CSE477 L20 Adder Design.36
16 bits
32 bits
48 bits
64 bits
Irwin&Vijay, PSU, 2002
Next Lecture and Reminders

Next lecture

Multiplier Design
- Reading assignment – Rabaey, et al, 11.4

Reminders




Project final reports due December 5th
HW5 (last one!) due November 19th
Final grading negotiations/correction (except for the final
exam) must be concluded by December 10th
Final exam scheduled
- Monday, December 16th from 10:10 to noon in 118 and 121
Thomas
CSE477 L20 Adder Design.40
Irwin&Vijay, PSU, 2002