
The Art of Digital Design and
Fast Adder Circuits
Lecture Notes # 4
Shantanu Dutt
Electrical & Computer Eng.
University of Illinois at Chicago
Outline
• Different dependency aspects in divide-and-conquer (D&C)
• Techniques for tackling dependency aspects in D&C
• Application to adder designs: ripple carry, tree-based carry-lookahead, carry select
Dependency Aspects in D&C
[Figure: D&C tree of root problem A, split into subproblems A1 (with A1,1 and A1,2) and A2 (with A2,1 and A2,2). Legend: D&C tree arc; data flow arc; stitch-up of solns to A1 and A2 to form the complete soln to A]
• Q: Is there a data dependency between A1 and A2, i.e., does the solution of A2 depend on some o/p generated by A1, or vice versa?
• If there is no dependency, then A1 and A2 can be solved independently, and some stitch-up logic is used to combine the o/ps of A1 and A2 to obtain the o/p of A. Example design problems are n-bit comparison and sorting of n numbers
• If there is a dependency between A1 and A2, there are a few strategies that can be used to design such circuits; note that stitch-up logic may still be needed when a design problem with dependencies is partitioned by D&C.
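As an illustration of the no-dependency case, here is a minimal Python sketch of n-bit comparison by D&C. The bit-list representation (M.S. bit first) and the helper names `compare` and `to_bits` are assumptions of this sketch, not part of the notes; in hardware the two recursive calls would be concurrent sub-circuits, and the last line is the stitch-up logic.

```python
def compare(a_bits, b_bits):
    """D&C n-bit comparison of two equal-length bit lists (M.S. bit first).

    Returns -1 if a < b, 0 if a == b, +1 if a > b.
    """
    if len(a_bits) == 1:
        return (a_bits[0] > b_bits[0]) - (a_bits[0] < b_bits[0])
    mid = len(a_bits) // 2
    # A1 and A2 need nothing from each other, so in hardware they run concurrently.
    ms = compare(a_bits[:mid], b_bits[:mid])  # subproblem on the M.S. half
    ls = compare(a_bits[mid:], b_bits[mid:])  # subproblem on the L.S. half
    # Stitch-up logic: the M.S. half decides unless it is a tie.
    return ms if ms != 0 else ls


def to_bits(x, n):
    """n-bit binary representation of x, M.S. bit first."""
    return [(x >> i) & 1 for i in reversed(range(n))]


print(compare(to_bits(11, 4), to_bits(9, 4)))  # 1  (1011 > 1001)
```

Because the stitch-up only needs the two half-results, the delay is roughly one stitch-up stage per level of the D&C tree, i.e., logarithmic in n.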
Dependency Aspects in D&C
The Wait Strategy
[Figure: root problem A with subproblems A1 and A2; data flows from A1 to A2]
• Strategy 1: Wait for required o/p of A1 and then perform A2, e.g.,
as in a ripple-carry adder: A = n-bit addition, A1 = (n/2)-bit addition
of the L.S. n/2 bits, A2 = (n/2)-bit addition of the M.S. n/2 bits
• No concurrency between A1 and A2: t(A) = t(A1) + t(A2) + t(stitch-up) = 2*t(A1) + t(stitch-up) if A1 and A2 are the same problem of the same size (w/ different i/ps)
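A minimal Python sketch of the wait strategy applied recursively to addition follows. It assumes bit lists with the L.S. bit first and a hypothetical unit delay `T_FA` per 1-bit full adder; the recursion reproduces t(A) = t(A1) + t(A2), so the total delay grows linearly with n, as in a ripple-carry adder.

```python
T_FA = 1  # hypothetical delay of one 1-bit full adder, in arbitrary units


def full_add(a, b, cin):
    """1-bit full adder: returns (sum, carry-out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def wait_add(a_bits, b_bits, cin=0):
    """n-bit addition by D&C with the wait strategy (bit lists are L.S. bit first).

    Returns (sum_bits, carry_out, delay).
    """
    if len(a_bits) == 1:
        s, c = full_add(a_bits[0], b_bits[0], cin)
        return [s], c, T_FA
    mid = len(a_bits) // 2
    s_lo, c_mid, t1 = wait_add(a_bits[:mid], b_bits[:mid], cin)    # A1: L.S. half
    s_hi, c_out, t2 = wait_add(a_bits[mid:], b_bits[mid:], c_mid)  # A2 waits for c_mid
    # Stitch-up is just placing the two sum halves side by side (no extra delay),
    # so t(A) = t(A1) + t(A2).
    return s_lo + s_hi, c_out, t1 + t2


def bits_ls(x, n):
    """n-bit representation of x, L.S. bit first."""
    return [(x >> i) & 1 for i in range(n)]


def value_ls(bits):
    """Integer value of an L.S.-bit-first list."""
    return sum(b << i for i, b in enumerate(bits))


s, c, t = wait_add(bits_ls(200, 8), bits_ls(100, 8))
print(value_ls(s) + (c << 8), t)  # 300 8
```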
Dependency Aspects in D&C
The “Design-for-all-cases and Select” Strategy
[Figure: root problem A with subproblem A1 and four copies of subproblem A2 with hardwired i/ps 00, 01, 10 and 11; a 4-to-1 Mux whose select i/p is the 2-bit o/p of A1 picks one of the A2 o/ps]
• Strategy 2: For a k-bit i/p from A1 to A2, design 2^k copies of A2, each with a different hardwired k-bit i/p in place of the one from A1.
• Select the correct o/p from all the copies of A2 via a (2^k)-to-1 Mux whose select i/p is the k-bit o/p from A1, once that o/p becomes available
• E.g., carry-select adder
• t(A) = max(t(A1), t(A2)) + t(Mux) + t(stitch-up) = t(A1) + t(Mux) + t(stitch-up) if A1 and A2 are the same problem
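A minimal Python sketch of Strategy 2 for addition (a carry-select adder) follows. Here the i/p from A1 to A2 is the single carry bit into the M.S. half (k = 1), so two copies of A2 and a 2-to-1 mux suffice. The delays `T_FA` and `T_MUX` are hypothetical unit values chosen only to show t(A) = max(t(A1), t(A2)) + t(Mux).

```python
T_FA = 1   # hypothetical 1-bit full-adder delay (arbitrary units)
T_MUX = 1  # hypothetical 2-to-1 mux delay


def ripple_add(a_bits, b_bits, cin):
    """Ripple-carry addition of bit lists (L.S. bit first); returns (sum_bits, cout, delay)."""
    s_bits, c = [], cin
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ c)
        c = (a & b) | (c & (a ^ b))
    return s_bits, c, len(a_bits) * T_FA


def carry_select_add(a_bits, b_bits, cin=0):
    """Design-for-all-cases-and-select with k = 1 (the carry into the M.S. half)."""
    mid = len(a_bits) // 2
    # A1: L.S. half; its carry-out is the 1-bit i/p that A2 depends on.
    s_lo, c_mid, t1 = ripple_add(a_bits[:mid], b_bits[:mid], cin)
    # 2**1 = 2 copies of A2 with hardwired carry-ins 0 and 1, concurrent with A1.
    s_hi0, c0, t2 = ripple_add(a_bits[mid:], b_bits[mid:], 0)
    s_hi1, c1, _ = ripple_add(a_bits[mid:], b_bits[mid:], 1)
    # 2-to-1 mux: A1's carry-out selects the correct copy.
    s_hi, c_out = (s_hi1, c1) if c_mid else (s_hi0, c0)
    return s_lo + s_hi, c_out, max(t1, t2) + T_MUX


def bits_ls(x, n):
    return [(x >> i) & 1 for i in range(n)]


def value_ls(bits):
    return sum(b << i for i, b in enumerate(bits))


s, c, t = carry_select_add(bits_ls(200, 8), bits_ls(100, 8))
print(value_ls(s) + (c << 8), t)  # 300 5
```

Compared with 8 units for an 8-bit ripple-carry adder under the same delay model, the price is a duplicated M.S. half plus the mux.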
• Other variations, e.g., the “Predict Strategy”: Have a single copy of A2, but choose a highly likely value for the k-bit i/p and perform A1 and A2 concurrently. If, once the k-bit i/p from A1 is available, the choice turns out to be incorrect, redo A2 with the correct value.
• Expected t(A) = p(correct-choice)*max(t(A1), t(A2)) + [1-p(correct-choice)]*[t(A1) + t(A2)] + t(Mux) + t(stitch-up), where p(correct-choice) is the probability that our choice of the k-bit i/p for A2 is correct; in the incorrect case, A2 is redone once A1's o/p is available
• A completion signal is needed to indicate when the final o/p of A is available; designing for the worst-case time (when the choice is incorrect) would defeat the purpose of such designs
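The following minimal Python sketch applies the predict strategy to an 8-bit addition. It guesses the carry into the M.S. half, `guess=0` (an assumption, chosen because a 0 carry-out of the L.S. half is slightly more likely for uniformly distributed operands), and measures p(correct-choice) exhaustively. The expected-delay line assumes that on an incorrect guess A2 is restarted as soon as A1's carry arrives, and it ignores the mux and stitch-up terms. `T_A1` and `T_A2` are hypothetical unit delays.

```python
T_A1 = T_A2 = 4  # hypothetical delays of the two 4-bit halves (A1 and A2)


def ripple_add(a_bits, b_bits, cin):
    """Ripple-carry addition of bit lists (L.S. bit first); returns (sum_bits, cout)."""
    s_bits, c = [], cin
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ c)
        c = (a & b) | (c & (a ^ b))
    return s_bits, c


def predict_add(a_bits, b_bits, guess=0):
    """Predict strategy: one copy of A2, started with a guessed carry-in.

    Returns (sum_bits, carry_out, correct), where `correct` says whether the guess held.
    """
    mid = len(a_bits) // 2
    s_lo, c_mid = ripple_add(a_bits[:mid], b_bits[:mid], 0)      # A1
    s_hi, c_out = ripple_add(a_bits[mid:], b_bits[mid:], guess)  # A2, concurrently
    correct = c_mid == guess
    if not correct:  # redo A2 with the actual carry once A1 is done
        s_hi, c_out = ripple_add(a_bits[mid:], b_bits[mid:], c_mid)
    return s_lo + s_hi, c_out, correct


def bits_ls(x, n):
    return [(x >> i) & 1 for i in range(n)]


pairs = [(a, b) for a in range(256) for b in range(256)]
p = sum(predict_add(bits_ls(a, 8), bits_ls(b, 8))[2] for a, b in pairs) / len(pairs)
# Expected delay: max(t(A1), t(A2)) if correct, t(A1) + t(A2) if A2 must be redone.
t_expected = p * max(T_A1, T_A2) + (1 - p) * (T_A1 + T_A2)
print(p, t_expected)  # 0.53125 5.875
```

A single copy of A2 saves hardware relative to the carry-select sketch, but for uniformly random operands the guess here is right only about half the time, which is why a completion signal (rather than a fixed worst-case clock) is what makes this strategy worthwhile.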
Dependency Aspects in D&C---The “Lookahead” Strategy
[Figure (concept): root problem A with subproblems A1 and A2; A2 is split into A2_indep (or A2_lookahd), which works concurrently with A1, and A2_dep, which needs the data flow a1 from A1]
[Figure (example of an unstructured logic for A2): original logic for a2, critical path after a1 avail = 8-unit delay; restructured logic with A2_indep and A2_dep, critical path after a1 avail = 4-unit delay]
• Strategy 3: Restructure the design of A2 so that it does as much processing as possible independently of the i/p from A1 (A2_indep, also called A2_lookahd). This is the “lookahead” computation that prepares for the final computation of A2 (A2_dep), which can start once A2_indep and A1 are done.
• t(A) = max(t(A1), t(A2_indep)) + t(A2_dep) + t(stitch-up)
• E.g., carry-lookahead adder: it does lookahead computation; moreover, the lookahead computation is associative, so it can be done in O(log n) time. The overall computation can also be done in O(log n) time.
• A less structured example: Let a1 be the i/p from A1 to A2, and let A2 compute a2 = v’x’ + uvx + w’xy + wz’a1 + u’xa1. If this is implemented as a tree of 2-i/p AND/OR gates, the delay is 8 delay units after a1 is available (1 unit = delay per gate i/p, so each 2-i/p gate costs 2 units). If the logic is restructured as a2 = (v’x’ + uvx + w’xy) + (wz’ + u’x)a1, and the logic inside the 2 brackets is computed before a1 is available (these form A2_indep), then the delay is only 4 delay units after a1 is available: one AND with a1, then one OR.
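The restructuring can be checked mechanically. The minimal Python sketch below builds both versions of a2 as trees of 2-i/p gates, confirms by exhaustive truth table that they compute the same function, and measures the delay after a1 arrives, using 2 units per 2-i/p gate. The particular balanced OR-tree chosen for the unstructured version is an assumption of this sketch; it is one arrangement that gives the 8-unit figure. The helper names (`AND`, `OR`, `evaluate`, `delay_after_a1`) are hypothetical.

```python
from itertools import product

GATE_DELAY = 2  # 1 unit per gate i/p, so a 2-i/p gate costs 2 units
NEG_INF = float("-inf")  # marks signals settled before a1 arrives


def AND(x, y):
    return ("and", x, y)


def OR(x, y):
    return ("or", x, y)


def evaluate(node, env):
    """Evaluate a gate tree; a literal like "v'" is the complement of variable v."""
    if isinstance(node, str):
        return (not env[node[:-1]]) if node.endswith("'") else env[node]
    op, x, y = node
    vx, vy = evaluate(x, env), evaluate(y, env)
    return (vx and vy) if op == "and" else (vx or vy)


def delay_after_a1(node):
    """Arrival time of a node's o/p, with a1 arriving at t = 0 and every other
    literal (and anything computed only from them) ready earlier."""
    if isinstance(node, str):
        return 0 if node == "a1" else NEG_INF
    t = max(delay_after_a1(node[1]), delay_after_a1(node[2]))
    return t if t == NEG_INF else t + GATE_DELAY


t1 = AND("v'", "x'")
t2 = AND(AND("u", "v"), "x")
t3 = AND(AND("w'", "x"), "y")
# Unstructured: a2 = v'x' + uvx + w'xy + wz'a1 + u'xa1 as a tree of 2-i/p gates
unstructured = OR(OR(OR(t1, t2), OR(t3, AND(AND("w", "z'"), "a1"))),
                  AND(AND("u'", "x"), "a1"))
# Restructured: a2 = (v'x' + uvx + w'xy) + (wz' + u'x)a1; the brackets form A2_indep
a2_indep = OR(OR(t1, t2), t3)
coeff = OR(AND("w", "z'"), AND("u'", "x"))
restructured = OR(a2_indep, AND(coeff, "a1"))

names = ["u", "v", "w", "x", "y", "z", "a1"]
same = all(evaluate(unstructured, dict(zip(names, bits))) ==
           evaluate(restructured, dict(zip(names, bits)))
           for bits in product([False, True], repeat=len(names)))
print(same, delay_after_a1(unstructured), delay_after_a1(restructured))  # True 8 4
```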
Adder Circuits—From Slow to Fast
Tree CLA Adders
• First of all, can we generate multi-bit P, G signals from the single-bit ones?
• Secondly, can we generate them fast, say, in O(log n) time using a tree-structured circuit?
• The answer is “Yes” to both Qs. For the 2nd Q, the answer is “Yes” since the P and G operations are associative!
• Concept of the propagate Pk for k bits: Pk = 1 iff the carry into the least-significant of the k bits is passed on as the carry-out of the most-significant of the k bits. In terms of the 1-bit pi’s, this happens if and only if all k bits are in “propagate mode”, i.e., pi = 1 for all i, 0 <= i <= k-1. Thus Pk = p(k-1) p(k-2) ... p1 p0. Since “and” is associative, the propagate is an associative operation and can thus be generated by a tree-circuit in O(log n) time.
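A minimal Python sketch of the tree idea for the propagate follows. The generic `tree_reduce` helper (a hypothetical name) combines adjacent items pairwise, level by level, so any associative 2-i/p operation is evaluated in ceil(log2 k) gate levels; for Pk the operation is a 2-i/p AND.

```python
def tree_reduce(op, items):
    """Combine items with an associative 2-i/p op as a balanced binary tree.

    Adjacent items are paired left to right, so operand order is preserved.
    Returns (result, levels), where levels = ceil(log2(len(items))).
    """
    level, levels = list(items), 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # unpaired item passes straight to the next level
        level, levels = nxt, levels + 1
    return level[0], levels


def propagate(p_bits):
    """Pk = p(k-1) p(k-2) ... p0, computed by a tree of 2-i/p ANDs."""
    return tree_reduce(lambda x, y: x & y, p_bits)


print(propagate([1, 1, 1, 1, 1, 1, 1, 1]))  # (1, 3)
print(propagate([1, 1, 0, 1, 1, 1, 1, 1]))  # (0, 3)
```

Preserving operand order does not matter for AND, but it becomes essential for the generate operation, which is associative but not commutative.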
Tree CLA Adders (contd)
• Concept of generate Gk for k bits: Gk = 1 iff the carry-out of the k bits is 1 irrespective of the carry-in to the k bits
• For k=2, this happens whenever g1=1 or (g0=1 and p1=1): G2 = g1 + p1g0
• Now consider k=3. Conceptually speaking, G3 = 1 iff g2 = 1, or (G2(1-0) = 1 and p2 = 1), where G2(1-0) is the 2-bit generate of bits 1 & 0. This operates on the 1-bit g and 1-bit p for bit 2 and the 2-bit G for bits 1 & 0:
G3 = g2 + p2 G2(1-0) = g2 + [p2 (g1 + p1g0)] = g2 + p2g1 + p2p1g0
• However, we can equally say that G3 = 1 iff G2(2-1) = 1, or (g0 = 1 and P2(2-1) = 1). This operates on the 2-bit G and P for bits 2 & 1 and the 1-bit g and 1-bit p for bit 0:
G3 = G2(2-1) + P2(2-1)g0 = [g2 + p2g1] + [p2p1g0] = g2 + p2g1 + p2p1g0 (same as above!)
• In other words, (g2,p2) gen [(g1,p1) gen (g0,p0)] = [(g2,p2) gen (g1,p1)] gen (g0,p0); you can also reach the same conclusion using a truth table (TT).
• Hence generate (gen) is also an associative operation and can thus be computed by a tree-circuit in O(log n) time.
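The truth-table argument and the tree evaluation can both be sketched in a few lines of Python. Here `gen` acts on (g, p) pairs, with the more-significant group on the left, so that (gl, pl) gen (gr, pr) = (gl + pl gr, pl pr). Associativity is checked exhaustively, and a gen tree over the bits of two operands is compared against real addition. The single-bit definitions g_i = a_i b_i and p_i = a_i XOR b_i are assumptions of this sketch, since the notes above do not restate them; with them, the carry-out of a group is G + P·cin.

```python
from itertools import product


def gen(left, right):
    """gen on (g, p) pairs: `left` is the more-significant group, `right` the adjacent
    less-significant one. Result: (g_left + p_left g_right, p_left p_right)."""
    gl, pl = left
    gr, pr = right
    return gl | (pl & gr), pl & pr


def tree_reduce(op, items):
    """Balanced binary tree of a 2-i/p associative op; keeps operand order."""
    level = list(items)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Associativity, checked by exhaustive truth table over all (g, p) values.
pairs = list(product((0, 1), repeat=2))
assoc = all(gen(x, gen(y, z)) == gen(gen(x, y), z)
            for x in pairs for y in pairs for z in pairs)


def group_gp(a, b, k):
    """(G, P) of k-bit operands a, b via a gen tree, with g_i = a_i b_i and
    p_i = a_i XOR b_i (the XOR form of p_i is an assumption of this sketch)."""
    bits = [((a >> i) & 1, (b >> i) & 1) for i in reversed(range(k))]  # M.S. first
    return tree_reduce(gen, [(ai & bi, ai ^ bi) for ai, bi in bits])


G, P = group_gp(0b1011, 0b0101, 4)
print(assoc, G, P)  # True 1 0
```

For 1011 + 0101 the tree yields G = 1 and P = 0, matching G4 = g3 + p3g2 + p3p2g1 + p3p2p1g0 = 1 (the carry out of 11 + 5 = 16).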
[Figure: gen trees over (p0,g0) to (p3,g3): G3 formed from (p2,g2) and G2(1-0), or from G2(2-1) and (p0,g0); G4 formed from G2(3-2) and G2(1-0), or through the chain G2(1-0), G3(2-0), G4]
Tree CLA Adders (contd)
• In practice, instead of computing generates and propagates in a binary tree of 2-bit (P,G) operations, 4-bit (P,G) operations are used as the basic modules, and the higher-level generates and propagates are computed using a 4-ary tree.
[Figure (a): 4-bit G generation using 2-bit G-operations on (p3,g3), (p2,g2), (p1,g1), (p0,g0): G2(1-0), then G3(2-0), then G4. Figure (b): Basic 4-bit (P,G)-module]
4-bit gen: G4 = g3 + p3g2 + p3p2g1 + p3p2p1g0
Similarly, for 4-bit propagates: P4 = p3p2p1p0
• We thus have the following 4-ary (P,G)-tree, which uses the 4-bit (P,G)-generation logic as its basic module
[Figure (c): 4-ary (P,G)-tree]
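A minimal Python sketch of the 4-ary (P,G)-tree follows. `pg4` implements the flat 4-bit module formulas for G4 and P4, and `pg_tree4` applies the same module at every level, which is valid because gen is associative: at higher levels the four inputs are group (G, P) pairs rather than single-bit ones. The helper names, the power-of-4 width restriction, and the single-bit definitions g_i = a_i b_i and p_i = a_i XOR b_i are assumptions of this sketch.

```python
import random


def pg4(groups):
    """Basic 4-bit (P,G)-module on four (g, p) pairs, M.S. first:
    G4 = g3 + p3g2 + p3p2g1 + p3p2p1g0 and P4 = p3p2p1p0."""
    (g3, p3), (g2, p2), (g1, p1), (g0, p0) = groups
    G = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
    P = p3 & p2 & p1 & p0
    return G, P


def pg_tree4(pairs):
    """4-ary (P,G)-tree built from pg4 modules.

    Assumes len(pairs) is a power of 4. Returns ((G, P), number of module levels).
    """
    level, levels = list(pairs), 0
    while len(level) > 1:
        level = [pg4(level[i:i + 4]) for i in range(0, len(level), 4)]
        levels += 1
    return level[0], levels


def bit_pairs(a, b, n):
    """(g_i, p_i) = (a_i AND b_i, a_i XOR b_i) for i = n-1 down to 0 (assumed definitions)."""
    return [(((a >> i) & 1) & ((b >> i) & 1), ((a >> i) & 1) ^ ((b >> i) & 1))
            for i in reversed(range(n))]


rng = random.Random(1)
a, b = rng.getrandbits(16), rng.getrandbits(16)
(G, P), levels = pg_tree4(bit_pairs(a, b, 16))
print(G == (a + b) >> 16, levels)  # True 2
```

A 16-bit operand needs only 2 levels of 4-bit modules (and 64 bits would need 3), i.e., log4 n levels, each a two-level AND-OR circuit.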
Tree CLA Adders (contd)