PowerPoint Presentation: EE5324 Multipliers and Shifters


EE 5324 – VLSI Design II
Part III: Multipliers and Shifters
Kia Bazargan
University of Minnesota
Spring 2006
EE 5324 - VLSI Design II - © Kia Bazargan
References and Copyright
• Textbooks referenced
  - [WE92] N. H. E. Weste, K. Eshraghian, “Principles of CMOS VLSI Design: A System Perspective”, Addison-Wesley, 2nd Ed., 1992.
  - [Rab96] J. M. Rabaey, “Digital Integrated Circuits: A Design Perspective”, Prentice Hall, 1996.
  - [Par00] B. Parhami, “Computer Arithmetic: Algorithms and Hardware Designs”, Oxford University Press, 2000.
References and Copyright (cont.)
• Slides used (modified by Kia when necessary)
  - [©Hauck] © Scott A. Hauck, 1996-2000; G. Borriello, C. Ebeling, S. Burns, 1995, University of Washington
  - [©Prentice Hall] © Prentice Hall 1995, © UCB 1996. Slides for [Rab96]: http://bwrc.eecs.berkeley.edu/Classes/IcBook/instructors.html
  - [©Oxford U Press] © Oxford University Press, New York, 2000. Slides for [Par00], with permission from the author: http://www.ece.ucsb.edu/Faculty/Parhami/files_n_docs.htm
Why Multipliers?
• Used in a lot of DSP applications
  - Vector product, matrix multiplication
  - Convolution
  - Filtering (tap filters, FIR, …)
  - ...
“At least one good reason for studying multiplication and division is that there is an infinite number of ways of performing these operations and hence there is an infinite number of PhDs (or expense-paid visits to conferences in USA) to be won from inventing new forms of multiplier”
Alan Clements, The Principles of Computer Hardware, 1986 [Par00]
Outline
• Serial Multiplier
• Multiplier arrays
• Carry save adder (CSA) and multiple operand addition
• Booth encoding
• Pipelined multipliers
• Wallace tree
• Signed multiplication
• Shifters
Multiplication Example
• Example: 12 x 5
  Multiplicand:          1 1 0 0     (12)
  Multiplier:        x   0 1 0 1     (5)
                         1 1 0 0
                       0 0 0 0         4 partial products
                     1 1 0 0
                   0 0 0 0
                   -------------
  Product:         0 1 1 1 1 0 0     (60)
• The partial products can be generated using an array of AND gates
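The 12 x 5 example above can be reproduced with a short Python sketch. The function name `partial_products` and its parameters are illustrative and not from the slides. Each row is the multiplicand ANDed with one multiplier bit, then shifted to its column.

```python
def partial_products(mcand: int, mplier: int, n: int) -> list[int]:
    """Return the n shifted partial products of two unsigned n-bit numbers."""
    rows = []
    for i in range(n):
        bit = (mplier >> i) & 1            # i-th multiplier bit, LSB first
        # ANDing every multiplicand bit with `bit` yields mcand or 0
        rows.append((mcand if bit else 0) << i)
    return rows


rows = partial_products(0b1100, 0b0101, 4)   # 12 x 5
for row in rows:
    print(f"{row:07b}")
print(f"{sum(rows):07b} = {sum(rows)}")      # last line: 0111100 = 60
```

Summing the rows is where the rest of the deck spends its effort: the adder structure, not the AND array, sets the delay.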
Sequential Multiplier
• Shift register
  - Originally holds multiplicand
  - Shifts it left for each partial product
• One bit of multiplier at a time presented to the AND gates
[Figure: sequential multiplier datapath with a 2N-bit shift register (initialized w/ mcand, shifts it left), one bit of mplier applied each cycle, an adder, and a result register] [©Hauck]
Sequential Multiplier – Resource Requirements
• Adder: 2N-bit
• Registers: 2N-bit wide
• Better design:
  - Shift result register to right
  - Uses N AND gates
  - Uses N-bit adder
[Figure: the two datapaths, each built from a shift register, an adder, and a register] [©Hauck]
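A minimal Python sketch of the "better design" follows, in which the result register shifts right and only an N-bit add is needed per cycle. Holding the multiplier in the low half of the result register is an assumption made here for compactness; the slide only states that the result register shifts right. The name `sequential_multiply` is illustrative.

```python
def sequential_multiply(mcand: int, mplier: int, n: int) -> int:
    """Shift-and-add multiply of two unsigned n-bit numbers, one bit per cycle."""
    assert 0 <= mcand < (1 << n) and 0 <= mplier < (1 << n)
    acc = 0          # upper half of the result register (n bits plus carry-out)
    low = mplier     # lower half; assumed to hold the multiplier initially
    for _ in range(n):
        if low & 1:                  # current multiplier bit gates the AND array
            acc += mcand             # N-bit addition, carry kept as bit n
        # shift the combined (acc, low) register right by one position
        low = (low >> 1) | ((acc & 1) << (n - 1))
        acc >>= 1
    return (acc << n) | low


print(sequential_multiply(12, 5, 4))   # 60
```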
Combinational Multiplier: Idea
• Use an array of AND gates to generate the partial products in parallel
[Figure: AND-gate array combining each multiplicand bit with each multiplier bit (LSBs marked) to form the partial products] [©Hauck]
Combinational Multiplier: Adding PProds
[Figure: 4x4 array multiplier. Multiplicand bits X3..X0 are combined with multiplier bits Y0..Y3; rows of HA and FA cells add the partial products and produce the result bits Z7..Z0] [WE92] p547, [Rab96] p.409
Combinational Multiplier: Critical Path(s)
• A lot of critical paths: same delay. (AND gates not shown)
[Figure: MxN multiplier array of HA and FA cells with Critical Path 1 and Critical Path 2 marked]
$\text{Delay} = (M+N-2)\,t_{carry} + (N-1)\,t_{sum} + t_{AND}$
[Rab96] p.410
Combinational Multiplier: Layout
• Better floorplan for compact layout:
  - Send partial product diagonally
  - Results in better area
  - (AND gates and hence the first row not shown)
[Figure: rectangular floorplan of the HA/FA array] [WE92] p548, [Rab96] p.412
Carry-Save Adder: the Idea
• When adding k n-bit numbers, don’t need to optimize the carry chain of each of the rows
  - Below is the old-style ripple-adder
[Figure: ripple-carry array of HA and FA cells]
Carry-Save Adder: structure
• Postpone the “carry propagation” operation to the last stage
$\text{Delay} = N \cdot t_{carry} + t_{and} + t_{merge}$
[Figure: carry-save adder (CSA) array of HA and FA cells followed by a vector merging stage] [Rab96] p.411
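The carry-save idea can be sketched on Python integers, where one call stands for a whole row of independent full adders. The names `carry_save_add` and `multi_operand_add` are illustrative, and the operands are assumed to be non-negative. No carry ripples inside a row; only the last step does a carry-propagate addition, which plays the role of the vector merging stage.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three operands to a (sum, carry) pair without propagating carries."""
    sum_bits = a ^ b ^ c                            # per-bit full-adder sum
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1  # per-bit carry, moved up one column
    return sum_bits, carry_bits


def multi_operand_add(operands: list[int]) -> int:
    """Add many operands with CSA rows, then one carry-propagate addition."""
    ops = list(operands)
    while len(ops) > 2:
        a, b, c = ops.pop(), ops.pop(), ops.pop()
        ops.extend(carry_save_add(a, b, c))          # three operands become two
    return sum(ops)                                  # final carry-propagate add


print(multi_operand_add([0b0001100, 0b0000000, 0b0110000, 0b0000000]))   # 60
```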
Carry-Save Adder: Details
[Figure: carry-save array of full-adder (F) and half-adder (H) cells]
CSA: Intermediate FA Cells
• Better to have the same sum and carry delays (both contribute to critical path)
[Figure: full-adder cell with inputs A, B, Ci, a setup stage generating P, and outputs S and Co] [Rab96] p.410
Booth Multiplier: an Introduction
• Recode each 1 in multiplier as “+2-1”
  - Converts sequences of 1 to 10…0(-1)
  - Might reduce the number of 1’s
  Multiplier:    0   0   1   1   1   1   1   1   0   0   0   0
  Each 1 becomes (+1 -1); adjacent +1 and -1 cancel
  Recoded:       0  +1   0   0   0   0   0  -1   0   0   0   0
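A hedged Python sketch of the recoding: each output digit is the multiplier bit to the right minus the current bit, which is what the "+2-1" substitution amounts to once adjacent terms cancel. The function name `booth_recode` is illustrative, and the multiplier is assumed to be unsigned, so one extra digit on the left is produced.

```python
def booth_recode(mplier: int, n: int) -> list[int]:
    """Radix-2 Booth digits (each -1, 0 or +1) of an unsigned n-bit multiplier, LSB first."""
    digits = []
    prev = 0                              # implicit bit to the right of the LSB
    for i in range(n + 1):                # one extra step for the leading 0
        bit = (mplier >> i) & 1
        digits.append(prev - bit)         # +1 ends a run of 1's, -1 begins one
        prev = bit
    return digits


digits = booth_recode(0b001111110000, 12)
print(digits[::-1])                                   # MSB first
print(sum(d << i for i, d in enumerate(digits)))      # same value as the multiplier
```

For the multiplier above only two digits are non-zero, so six additions shrink to one addition and one subtraction.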
Booth Multiplier: Recoding (Encoding) Example
  Multiplier:      0   1   1   0   1   1   1   0   0   0   1   0
  Each 1 becomes (+1 -1)
  Booth recoded:  +1   0  -1  +1   0   0  -1   0   0  +1  -1   0
• If you use the last row in multiplication, you should get exactly the same result as using the first row (after all, they represent the same number!)
Booth Recoding: Multiplication Example
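            0 1 1 0      (6)    multiplicand
      x     1 1 1 0      (14)   multiplier, Booth recoded as +1 0 0 -1 0

    0 0 0 0 0 0 0 0      digit  0
    1 1 1 1 0 1 0 0      digit -1: (-6) shifted left by 1, with sign extension
    0 0 0 0 0 0 0 0      digit  0
    0 0 0 0 0 0 0 0      digit  0
    0 1 1 0 0 0 0 0      digit +1: (+6) shifted left by 4
    ---------------
    0 1 0 1 0 1 0 0      (84)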
Booth Recoding: Advantages and Disadvantages
• Depends on the architecture
  - Potential advantage: might reduce the # of 1’s in multiplier
• In the multipliers that we have seen so far:
  - Doesn’t save in speed (still have to wait for the critical path, e.g., the shift-add delay in sequential multiplier)
  - Increases area: recoding circuitry AND subtraction
Modified Booth Multiplier: Idea
• Group pairs, leaving -2, -1, 0, 1, 2
  - Grouping reduces # of partial products by half
• Booth recoding gets rid of 3’s (sequences of 1’s in general)
  Multiplier:       0   1 |  1   0 |  1   1 |  1   0 |  0   0 |  1   0
  Booth recoded:   +1   0 | -1  +1 |  0   0 | -1   0 |  0  +1 | -1   0
  Grouped pairs:     +2   |   -1   |    0   |   -2   |   +1   |   -2
[©Hauck]
Modified Booth Multiplier: Idea (cont.)
• Can encode the digits by looking at three bits at a time
• Booth recoding table:
  i+1   i   i-1  |  add
   0    0    0   |   0*M
   0    0    1   |   1*M
   0    1    0   |   1*M
   0    1    1   |   2*M
   1    0    0   |  -2*M
   1    0    1   |  -1*M
   1    1    0   |  -1*M
   1    1    1   |   0*M
  - Must be able to add multiplicand times -2, -1, 0, 1 and 2
  - Since Booth recoding got rid of 3’s, generating partial products is not that hard (shifting and negating)
[©Hauck]
Modified Booth Multiplier: Idea (cont.)
• Interpretation of the Booth recoding table:
  i+1   i   i-1  |  add   |  Explanation
   0    0    0   |   0*M  |  No string of 1’s in sight
   0    0    1   |   1*M  |  End of a string of 1’s
   0    1    0   |   1*M  |  Isolated 1
   0    1    1   |   2*M  |  End of a string of 1’s
   1    0    0   |  -2*M  |  Beginning of a string of 1’s
   1    0    1   |  -1*M  |  End one string, begin new one
   1    1    0   |  -1*M  |  Beginning of a string of 1’s
   1    1    1   |   0*M  |  Continuation of string of 1’s
[Par00] p. 160
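The recoding table translates directly into a lookup, sketched below in Python. `BOOTH_TABLE` and `modified_booth_recode` are illustrative names. The multiplier is assumed to be an n-bit 2's complement pattern with n even, and the bit to the right of the LSB is taken as 0.

```python
# (bit i+1, bit i, bit i-1) -> multiple of the multiplicand M to add
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def modified_booth_recode(mplier: int, n: int) -> list[int]:
    """Radix-4 Booth digits of an n-bit 2's complement pattern, LSB first."""
    assert n % 2 == 0, "sketch assumes an even bit-width"
    # bits[0] is the implicit 0 right of the LSB; bits[j + 1] is multiplier bit j
    bits = [0] + [(mplier >> j) & 1 for j in range(n)]
    digits = []
    for i in range(0, n, 2):              # overlapping triplets, two bits per step
        digits.append(BOOTH_TABLE[(bits[i + 2], bits[i + 1], bits[i])])
    return digits


digits = modified_booth_recode(0b111010, 6)           # pattern for -6
print(digits[::-1])                                   # MSB first: [0, -1, -2]
print(sum(d * 4**i for i, d in enumerate(digits)))    # -6
```

Because the top triplet includes the sign bit, the digits sum to the signed value of the pattern with no separate correction step.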
(Modified) Booth Multiplier: Example
• Retire two bits per shift operation
• Addition: signed
  - Sign extend 2 bits if adding two partial products at a time
• Example (digits taken from the Booth recoding table, columns i+1, i, i-1):
                      0 0 1 1 0 1      (13)   multiplicand M
                x     1 1 1 0 1 0      (-6)   multiplier, recoded as 0 -1 -2

              1 1 1 1 1 0 0 1 1 0      -2*M
              1 1 1 1 0 0 1 1 0 0      -1*M, shifted left by 2
              0 0 0 0 0 0 0 0 0 0       0*M, shifted left by 4
              -------------------
              1 1 1 0 1 1 0 0 1 0      (-78)
Modified Booth Recoding: Summary
• Grouping multiplier bits into pairs
  - Orthogonal idea to the Booth recoding
  - Reduces the num of partial products to half
  - If Booth recoding not used → have to be able to multiply by 3 (hard: shift+add)
• Applying the grouping idea to Booth → Modified Booth Recoding (Encoding)
  - We already got rid of sequences of 1’s → no mult by 3
  - Just negate, shift once or twice
Modified Booth Multiplier: Summary (cont.)
• Uses high-radix to reduce number of intermediate addition operands
  - Can go higher: radix-8, radix-16
  - Radix-8 should implement *3, *-3, *4, *-4
  - Recoding and partial product generation becomes more complex
• Can automatically take care of signed multiplication
  - (we will see why)
Pipelined Multipliers
• Insert registers (latches) between rows
• Insert registers for bits of multiplier
  - Schedule MSB bits to arrive later
[Figure: HA/FA array with pipeline registers between rows]
Pipelined Multiplier: Example
[Figure: pipelined 5x5 array multiplier with inputs a4..a0 and x0..x4 and product bits p9..p0. Each array cell is an FA with AND gate and latches (for ai, intermediate sum and carry); the sum/carry path and the latched FA cells are marked] [Par00] p186 [© Oxford U Press]
Wallace Tree: Idea
• Idea: divide & conquer
• Why add the k numbers one by one?
  - Tree structure → logarithmic
[Figure: dot-notation diagram of tree-structured operand reduction] [Par00] p131
Wallace Tree Example
Delay = 4 CSA + 1 CLA
[Par00] p130
[© Oxford U Press]
Wallace Tree: Structure for 7 k-bit Numbers
[Figure: Wallace tree for seven k-bit numbers, built from five k-bit CSAs followed by one k-bit CPA. The inputs occupy bit positions [0,k-1]; CSA outputs are labelled with their bit ranges (e.g. [0,k-1], [1,k], [2,k+1]); the result is assembled from bits [0], [1], [2,k+1] and [k+2]] [Par00] p131
Wallace Tree: Timing
• At each step, # of operands reduces to 2/3
[Figure: n k-bit numbers enter the first level of CSAs; $(2/3)\,n$ numbers remain after one level, $(2/3)^2\,n$ after two, and so on until $(2/3)^h\,n = 2$ after h levels]
Wallace Tree: Timing (cont.)
• Delay depends on height h
• h = O(log n) → logarithmic delay
Max # N of k-bit numbers that can be added using a Wallace tree of height h:
   h    N   |   h     N   |   h      N
   0    2   |   7    28   |  14    474
   1    3   |   8    42   |  15    711
   2    4   |   9    63   |  16   1066
   3    6   |  10    94   |  17   1599
   4    9   |  11   141   |  18   2398
   5   13   |  12   211   |  19   3597
   6   19   |  13   316   |  20   5395
[Par00] p132 [© Oxford U Press]
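The table values are reproduced by a simple recurrence, sketched here in Python: a tree one level taller accepts floor(3N/2) operands, since every group of three operands reduces to two. The names `max_operands` and `wallace_height` are illustrative, and the sketch counts CSA levels only, not the final carry-propagate adder.

```python
def max_operands(h: int) -> int:
    """Largest number of operands a Wallace tree with h CSA levels can reduce to 2."""
    n = 2                      # height 0: two operands go straight to the final adder
    for _ in range(h):
        n = (3 * n) // 2       # one more level: every 2 outputs can come from 3 inputs
    return n


def wallace_height(n: int) -> int:
    """Number of CSA levels needed to reduce n operands to 2."""
    levels = 0
    while n > 2:
        n -= n // 3            # each full group of three operands becomes two
        levels += 1
    return levels


print([max_operands(h) for h in range(7)])   # [2, 3, 4, 6, 9, 13, 19]
print(wallace_height(7))                     # 4 CSA levels for seven operands
```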
Multiplying Signed Numbers
• Coding of the numbers
  - Signed-magnitude → trivial
  - 2’s complement?
• 2’s complement
  - Mplier positive, Mcand +/-:
    o Sign extend the partial products when adding up
    o Example (partial products shown sign extended to 7 bits):
                0 1 0 1    (+5)                      1 0 1 1    (-5)
            x   0 0 1 1    (+3)                  x   0 0 1 1    (+3)
          0 0 0 0 1 0 1                        1 1 1 1 0 1 1
          0 0 0 1 0 1 0                        1 1 1 0 1 1 0
          0 0 0 0 0 0 0                        0 0 0 0 0 0 0
          0 0 0 0 0 0 0                        0 0 0 0 0 0 0
          -------------                        -------------
          0 0 0 1 1 1 1    (+15)               1 1 1 0 0 0 1    (-15)
Multiplying Signed Numbers (cont.)
• 2’s complement (cont.)
  - Mplier negative, Mcand +/-:
    o Ad-hoc solution: convert negative Mplier to positive, do the multiplication, negate the result
    o Example: 1 0 1 1 (-5) x 1 1 0 1 (-3)
        Negate the mplier:   1 1 0 1 (-3)  →  0 0 1 1 (+3)
        Multiply:            1 0 1 1 (-5) x 0 0 1 1 (+3) = 1 1 1 0 0 0 1 (-15)
        Negate the result:   0 0 0 1 1 1 1 (+15)
Multiplying Signed Numbers: Efficient Method
• Using almost the same architecture, we can do signed mult w/o negating the result
• Idea: “What if we had negated the mplier?”
  M  = 0 1 0 1 = +5
  1α = 1 1 0 1 = -3     (sign bit 1, low bits α = 1 0 1)
  negate: 1 1 0 1 = -3  →  0 0 1 1 = +3 = 0β     (low bits β = 0 1 1)
• Consider α and β as positive magnitudes (forget about the 2’s complement convention for now)
• We want to use computation: α . M
  - Previously, we negated 1α to get 0β, then computed β . M and negated it
Multiplying Signed Numbers: Efficient Method
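• The negation process (bit positions k-1, k-2, ..., 1, 0; sign bit at position k-1)
  - Negating 1α gives 0β, where α and β are the low k-1 bits read as positive magnitudes
  - On k-bit patterns, negation is subtraction from $2^k$:
    $0\beta = 2^k - 1\alpha$
    $= 2^k - (2^{k-1} + \alpha) = 2^k - 2^{k-1} - \alpha$
    $= (2^k - 2^{k-1}) - \alpha = 2^{k-1} - \alpha$
  - So $\beta = 2^{k-1} - \alpha$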
Multiplying Signed Numbers: Efficient Method
Machine’s understanding (bit positions k-1, k-2, ..., 1, 0):
  1α = -(0β), e.g. 1 1 0 1 = -(0 0 1 1), where α and β are the low k-1 bits of the mplier and of its negation
Our interpretation:
  $\beta = 2^{k-1} - \alpha$, e.g. $3 = 2^3 - 5$
• We used to compute: $-(\beta \cdot M)$
  $-\beta \cdot M = -(2^{k-1} - \alpha) \cdot M = -2^{k-1} \cdot M + \alpha \cdot M$
  - $-2^{k-1} \cdot M$: subtract the mcand for the last bit
  - $\alpha \cdot M$: normal mult for the first k-1 bits
Multiplying Signed Numbers: Example
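Normal mult for the first k-1 bits; use a subtractor for the last pproduct

            0 1 0 1    (+5)
        x   1 1 0 1    (-3)
      0 0 0 0 1 0 1    bit 0 = 1: add mcand
      0 0 0 0 0 0 0    bit 1 = 0
      0 0 1 0 1 0 0    bit 2 = 1: add mcand, shifted left by 2
    - 0 1 0 1 0 0 0    bit 3 = 1 (sign bit): subtract mcand shifted left by 3, i.e. add (-5) shifted
      -------------
      1 1 1 0 0 0 1    (-15)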
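A hedged Python sketch of the efficient method: add the shifted multiplicand for each of the low k-1 multiplier bits, and subtract it for the sign bit. The name `signed_multiply` is illustrative. The multiplicand is passed as an ordinary Python integer, so sign extension is implicit rather than modelled bit by bit.

```python
def signed_multiply(mcand: int, mplier_bits: int, k: int) -> int:
    """Multiply mcand by a k-bit 2's complement multiplier given as a bit pattern."""
    total = 0
    for i in range(k - 1):                  # normal mult for the first k-1 bits
        if (mplier_bits >> i) & 1:
            total += mcand << i
    if (mplier_bits >> (k - 1)) & 1:        # sign bit has weight -2^(k-1)
        total -= mcand << (k - 1)           # subtract the mcand for the last bit
    return total


print(signed_multiply(5, 0b1101, 4))    # +5 x -3 = -15
print(signed_multiply(-5, 0b1101, 4))   # -5 x -3 = +15
```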
Booth Recoding: Signed Numbers
• For unsigned numbers, increase bit-width on mplier & mcand (add 0 to the left)
[Figure: Booth recoding of example multipliers after adding a 0 on the left]
• If dealing with signed numbers, discard the extra bit
  - Why does it work? Let $X$ be the k-bit mplier pattern read as an unsigned number. With the sign bit set, its 2’s complement value is $X - 2^k$, so
    $M \cdot (X - 2^k) = -M \cdot (2^k - X) = -M \cdot \bar{X}$
    ($\bar{X} = 2^k - X$ is the positive 2’s complement of $X$)
Booth Recoding: Signed Mult Example
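                      1 0 1 1 0      (-10)   multiplicand
                x     1 0 1 0 1      (-11)   multiplier, Booth recoded as -1 +1 -1 +1 -1

              0 0 0 0 0 0 1 0 1 0      digit -1: (+10)
              1 1 1 1 1 0 1 1 0 0      digit +1: (-10) shifted left by 1
              0 0 0 0 1 0 1 0 0 0      digit -1: (+10) shifted left by 2
              1 1 1 0 1 1 0 0 0 0      digit +1: (-10) shifted left by 3
              0 0 1 0 1 0 0 0 0 0      digit -1: (+10) shifted left by 4
              -------------------
              0 0 0 1 1 0 1 1 1 0      (+110)
Note: the column which has ‘1111’ generates a carry of ‘10’ if calculating by hand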
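The signed example can be checked with a short Python sketch of radix-2 Booth multiplication. The name `booth_multiply` is illustrative. Only k digits are generated, which corresponds to discarding the extra bit for signed operands, and the multiplicand is an ordinary Python integer, so sign extension is implicit.

```python
def booth_multiply(mcand: int, mplier_bits: int, k: int) -> int:
    """Radix-2 Booth multiplication by a k-bit 2's complement multiplier pattern."""
    total = 0
    prev = 0                               # implicit bit to the right of the LSB
    for i in range(k):                     # k digits only: the extra digit is discarded
        bit = (mplier_bits >> i) & 1
        digit = prev - bit                 # Booth digit: -1, 0 or +1
        total += digit * (mcand << i)      # add, skip, or subtract the shifted mcand
        prev = bit
    return total


print(booth_multiply(-10, 0b10101, 5))     # -10 x -11 = 110
```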
Multiplier: Summary
• Goals different than addition
  - In some structures, sum and carry delay equal
  - Analysis more difficult: multiple critical paths
• Different levels of optimization
  - Data encoding (Booth)
  - Architecture-level: Wallace Tree
  - Gate-level: pipelining
  - Transistor-level: equal sum, carry delays
• More to cover:
  - Constant multiplication
  - Floating point, precision
Shift and Rotate Operations
• Used in:
  - Microprocessors
  - Encryption algorithms
• If fixed shift, simply wire the inputs to the correct output positions
• Variable shift
  - One-bit shifter
  - Barrel shifter
  - Logarithmic shifter
One-bit Shifter
[Figure: bit-slice i of a one-bit shifter; control signals Right, NOP and Left connect inputs Ai and Ai-1 to outputs Bi and Bi-1] [©Prentice Hall]
Simple n-bit Shifter
• Quadratic number of transistors
• One switch per path
[Figure: switch matrix connecting inputs in1..in4 to outputs out1..out4] [©Hauck]
Barrel Shifter
[Figure: 4-bit barrel shifter. Data wires A3..A0 connect to outputs B3..B0 through switches driven by control wires Sh0..Sh3; bit 3 wrapped around]
• Area dominated by wiring
[©Prentice Hall]
Barrel Shifter: Layout Example
[Figure: barrel shifter layout with data inputs A3..A0, control lines Sh0..Sh3, and output buffers] [©Prentice Hall]
Logarithmic Shifter
[Figure: 4-bit logarithmic shifter with inputs i1..i4 and outputs o1..o4; two stages controlled by S1/S1' and S2/S2']
• Simplified structure but more stages (greater delay)
[©Hauck]
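A behavioural Python sketch of a logarithmic shifter: stage j either passes the word through or shifts it by 2^j, selected by bit j of the shift amount. The function name `log_shift_right`, the choice of a logical right shift, and the power-of-two width are assumptions made here; the figure does not fix the shift direction.

```python
def log_shift_right(value: int, amount: int, n: int) -> int:
    """Logical right shift of an n-bit word using log2(n) conditional stages."""
    assert n > 0 and (n & (n - 1)) == 0, "sketch assumes n is a power of two"
    assert 0 <= amount < n
    value &= (1 << n) - 1                  # keep only the n data bits
    stage = 0
    while (1 << stage) < n:
        if (amount >> stage) & 1:          # control bit for this stage
            value >>= 1 << stage           # shift by 1, 2, 4, ... positions
        stage += 1
    return value


print(f"{log_shift_right(0b1011, 2, 4):04b}")   # 0010
```

An 8-bit word needs three such stages, where a barrel shifter would use a single stage with one switch per input/output pair.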
Logarithmic Shifter: Layout
[©Prentice Hall]
Shift: Summary
• Trade-off between area, delay
  - Barrel shifter: fastest O(1), $n^2$ transistors
  - Logarithmic shifter: O(log n), n log n transistors
  - One-bit shifter: O(n), n transistors
• Barrel shifter: wire-dominated circuit