Transcript L3_Slides

Integer Multipliers
1
A
B
X
P
Multipliers
• A must-have circuit in most DSP applications
• A variety of multipliers exists; the choice depends on the required performance
• Serial, Serial/Parallel, Shift and Add, Array, Booth, Wallace Tree, ...
2
A
B
X
P
[Block diagram: registers RA, RB and RC (with en and reset inputs), three converters, and a 16x16 multiplier]
3
A
B
X
P
Multiplication Algorithm
X = Xn-1 Xn-2 ... X0     (Multiplicand)
Y = Yn-1 Yn-2 ... Y0     (Multiplier)

Partial products (row i is shifted left by i positions before the addition):
Yn-1X0    Yn-2X0    Yn-3X0    ...  Y1X0    Y0X0
Yn-1X1    Yn-2X1    Yn-3X1    ...  Y1X1    Y0X1
Yn-1X2    Yn-2X2    Yn-3X2    ...  Y1X2    Y0X2
...
Yn-1Xn-2  Yn-2Xn-2  Yn-3Xn-2  ...  Y1Xn-2  Y0Xn-2
Yn-1Xn-1  Yn-2Xn-1  Yn-3Xn-1  ...  Y1Xn-1  Y0Xn-1
-------------------------------------------------
P2n-1  P2n-2  P2n-3  ...  P2  P1  P0
4
.
1. Multiplication Algorithms
Implementing multiplication of binary numbers boils down to how the additions are done.
Consider two 8-bit numbers A and B that generate the 16-bit product P. First generate the 64
partial products and then add them up.
A = A7 A6 A5 A4 A3 A2 A1 A0
B = B7 B6 B5 B4 B3 B2 B1 B0

Partial products (row j is shifted left by j positions before the addition):
A7.B0 A6.B0 A5.B0 A4.B0 A3.B0 A2.B0 A1.B0 A0.B0
A7.B1 A6.B1 A5.B1 A4.B1 A3.B1 A2.B1 A1.B1 A0.B1
A7.B2 A6.B2 A5.B2 A4.B2 A3.B2 A2.B2 A1.B2 A0.B2
A7.B3 A6.B3 A5.B3 A4.B3 A3.B3 A2.B3 A1.B3 A0.B3
A7.B4 A6.B4 A5.B4 A4.B4 A3.B4 A2.B4 A1.B4 A0.B4
A7.B5 A6.B5 A5.B5 A4.B5 A3.B5 A2.B5 A1.B5 A0.B5
A7.B6 A6.B6 A5.B6 A4.B6 A3.B6 A2.B6 A1.B6 A0.B6
A7.B7 A6.B7 A5.B7 A4.B7 A3.B7 A2.B7 A1.B7 A0.B7

P = P15 P14 P13 P12 P11 P10 P9 P8 P7 P6 P5 P4 P3 P2 P1 P0

The equation is:
$P(m+n) = A(m) B(n) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} a_i b_j 2^{i+j}$
5
A
B
X
Multiplier Design
P
[Block diagram: Storage, input register (REG IN), MU (16X16 Multiplier Unit), output register (REG OUT), Control Unit]
6
Serial multiplier example (4-bit operands), animation Slides 1 to 21
X: x3x2x1x0
Y: y3y2y1y0
Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i

Input Sequence for G1 (the rightmost symbol enters first):
X input: 0 | 0 x3 x2 x1 x0 | 0 x3 x2 x1 x0 | 0 x3 x2 x1 x0 | 0 x3 x2 x1 x0
Y input: 0 | 0 y3 y3 y3 y3 | 0 y2 y2 y2 y2 | 0 y1 y1 y1 y1 | 0 y0 y0 y0 y0
Reset:   0 | 1 0  0  0  0  | 1 0  0  0  0  | 1 0  0  0  0  | 1 0  0  0  0

[Circuit: AND gate G1 forms the product bit; an adder (+) adds it to the bit fed back from the Serial Register through gate G2 (G2 outputs 0 when Reset=1) and to the carry held in a 1-bit register (REG, clocked by CLK); the sum is shifted into the Serial Register. Clock labels in the figure: CLK and CLK/(N+1).]

Slide  Reset  G1 inputs  G1 output  Fed back to G2  Adder sum  Carry out
1      0      x0, y0     x0y0       0               x0y0 (S0)  -
2      0      x1, y0     x1y0       0               x1y0       -
3      0      x2, y0     x2y0       0               x2y0       -
4      0      x3, y0     x3y0       0               x3y0       -
5      1      0, 0       0          S0 (gated off)  0          -
6      0      x0, y1     x0y1       x1y0            S1         C1
7      0      x1, y1     x1y1       x2y0            S20        C20
8      0      x2, y1     x2y1       x3y0            S30        C30
9      0      x3, y1     x3y1       0               S40        C40
10     1      0, 0       0          S1 (gated off)  S50        C50=0
11     0      x0, y2     x0y2       S20             S2         C21
12     0      x1, y2     x1y2       S30             S31        C31
13     0      x2, y2     x2y2       S40             S41        C41
14     0      x3, y2     x3y2       S50             S51        C51
15     1      0, 0       0          S2 (gated off)  S60        C60=0
16     0      x0, y3     x0y3       S31             S3         C32
17     0      x1, y3     x1y3       S41             S4         C42
18     0      x2, y3     x2y3       S51             S5         C52
19     0      x3, y3     x3y3       S60             S6         C61
20     1      0, 0       0          S3 (gated off)  S7         -
21     0      0, 0       0          -               -          -

After Slide 21 the Serial Register holds the final result S7 S6 S5 S4 S3 S2 S1 S0.
Serial/parallel multiplier example (4-bit operands), animation Slides 1 to 8
Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i

[Circuit: the multiplier bits y0 ... y3 are applied in parallel; the bits of X enter serially and are delayed from stage to stage by D flip-flops; each stage forms the product of its X bit and its y bit, and a chain of adders (+) with D flip-flops holding the carries produces one result bit per clock.]

Slide  X bit entering  Partial products formed    Carries produced  Output bit
1      x0              x0y0                       -                 S0
2      x1              x1y0, x0y1                 C1                S1
3      x2              x2y0, x1y1, x0y2           C20, C21          S2
4      x3              x3y0, x2y1, x1y2, x0y3     C30, C31, C32     S3
5      0               x3y1, x2y2, x1y3           C40, C41, C42     S4
6      0               x3y2, x2y3                 C50, C51          S5
7      0               x3y3                       C6                S6
8      0               none                       -                 S7

After Slide 8 the output holds S7 S6 S5 S4 S3 S2 S1 S0.
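The clock-by-clock behaviour of the serial/parallel multiplier can be imitated in software. The sketch below is a behavioural model only, written for illustration: it lumps all carries of a column into a single integer rather than modelling the individual adders and D flip-flops, and the name `serial_parallel_multiply` is an assumption, not something taken from the slides.

```python
def serial_parallel_multiply(x: int, y: int, n: int = 4) -> list:
    """Return the product bits S0..S(2n-1), LSB first, one bit per clock."""
    x_bits = [(x >> i) & 1 for i in range(n)]
    y_bits = [(y >> j) & 1 for j in range(n)]
    carry = 0                       # stands in for all stored carry bits
    result = []
    for clock in range(2 * n):
        column = carry
        # Stage j sees the X bit that entered j clocks earlier.
        for j in range(n):
            i = clock - j
            if 0 <= i < n:
                column += x_bits[i] & y_bits[j]
        result.append(column & 1)   # output bit S_clock
        carry = column >> 1         # carried into the next column
    return result


if __name__ == "__main__":
    bits = serial_parallel_multiply(0b1011, 0b1101)
    print("".join(str(b) for b in reversed(bits)))
```

In clock k the model adds exactly the partial products x_i y_j with i + j = k, which is the same grouping used in the per-slide walk-through of the animation.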
Shift Add Multiplier Design Implementation
[Block diagram: INPUT Ain (7 downto 0) and INPUT Bin (7 downto 0); registers REGA, REGB and REGC; a MUX with a 0 input; an 8 bit Adder; CLOCK; outputs Result (15 downto 8) and Result (7 downto 0)]
37
A
B
X
P
Synchronous Shift and Add Multiplier
Controller Design
• Multiplication process:
• 5 states: Idle, Init, Test, Add, and Shift&Count.
• Idle: The process starts when the Start signal is received;
• Init: The multiplicand and the multiplier are loaded into a load register and a shift register, respectively;
• Test: The LSB of the shift register that holds the multiplier is tested to decide the next state;
• Add: If the LSB is '1', the new partial product is added to the accumulated result, and the state machine moves to the Shift&Count state;
• Shift&Count: If the LSB is '0' (or after an Add), the two shift registers shift their contents one bit to the right, and the counter counts up by one step. The state machine then returns to the Test state;
• When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state;
• Idle: In the Idle state, a Done signal is asserted to indicate the end of the multiplication.
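The five-state controller described above can be sketched as a small software state machine. This is an illustrative model under stated assumptions: the register names `a` (accumulator) and `q` (multiplier shift register), the function name `shift_add_fsm`, and the choice to shift the accumulator LSB into the top of `q` are mine, not taken from the slides.

```python
def shift_add_fsm(multiplicand: int, multiplier: int, n: int = 4):
    """Run the Idle/Init/Test/Add/Shift&Count controller; return (product, states)."""
    state = "Idle"
    trace = [state]
    a = q = count = 0
    done = False
    while not done:
        if state == "Idle":
            state = "Init"                       # Start signal received
        elif state == "Init":
            a, q, count = 0, multiplier, 0       # load the registers
            state = "Test"
        elif state == "Test":
            state = "Add" if q & 1 else "Shift&Count"
        elif state == "Add":
            a += multiplicand                    # accumulate the partial product
            state = "Shift&Count"
        else:                                    # Shift&Count
            q = (q >> 1) | ((a & 1) << (n - 1))  # accumulator LSB moves into q
            a >>= 1
            count += 1
            if count == n:                       # Stop: back to Idle, Done asserted
                state = "Idle"
                done = True
            else:
                state = "Test"
        trace.append(state)
    return (a << n) | q, trace


if __name__ == "__main__":
    product, states = shift_add_fsm(0b1011, 0b1101)
    print(product, " -> ".join(states))
```

An Add state is visited only when the tested LSB is 1, so the number of clock cycles depends on the number of ones in the multiplier.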
39
n-bit Multiplier:
Q0=1: The multiplicand is added to register A and the result is stored in register A; then registers C, A, Q are shifted one bit to the right
Q0=0: Registers C, A, Q are shifted one bit to the right
[Block diagram: the Multiplicand and register A (An-1 ... A0) feed an n-bit Adder whose carry goes to C; register Q (Qn-1 ... Q0) holds the Multiplier; the Shift and Add Control Logic reads Q0 and issues the Add and Shift Right signals]
Example: 4-bit Multiplier
Multiplicand = 1011, Multiplier = 1101

Step                  Add  Shift Right  C  A     Q
Initial Values        -    -            0  0000  1101
First Cycle, Add      1    0            0  1011  1101
First Cycle, Shift    0    1            0  0101  1110
Second Cycle, Shift   0    1            0  0010  1111
Third Cycle, Add      1    0            0  1101  1111
Third Cycle, Shift    0    1            0  0110  1111
Fourth Cycle, Add     1    0            1  0001  1111
Fourth Cycle, Shift   0    1            0  1000  1111

The product is held in A and Q: 10001111 (1011 * 1101 = 11 * 13 = 143).
P
A
B
X
P
4*4 Synchronous Shift and Add Multiplier Design
Layout Design
Floor plan of the 4*4 Synchronous
Shift and Add Multiplier
49
A
B
X
P
Comparison between Synchronous and Asynchronous
Approaches
.
50
Example : (simulated by Ovais Ahmed, Fall_03,project)
A
B
Multiplicand = 10001001_2 = 89_16
Multiplier = 10101011_2 = AB_16
Expected Result = 101101110000011_2 = 5B83_16
51
P
A
B
X
P
Array Multiplier
• Regular structure based on the add and shift algorithm.
• Addition is mainly done with the carry-save algorithm.
• Sign bit extension results in a higher capacitive load and slows down the circuit.
52
A
B
X
Addition with CLA
P
A = a3a2a1a0
B = b3b2b1b0
[Figure: each multiplier bit b0 ... b3 gates the multiplicand bits a3 a2 a1 a0; three cascaded Four-bit Adders (Cin = 0, Cout passed on) accumulate the rows to form the Product (A*B)]
A
B
X
P
Array Multiplier with CSA
Pij = Ai Bj, 0 <= i <= 3, 0 <= j <= 3 (total of 16 gates)
[Figure: 4x4 carry-save array of full adders (F.A) with inputs A3 ... A0 and B0 ... B3 and partial products P00 ... P33; each F.A produces a sum Si and a carry Ci]
54
Critical Path with Array Multipliers
[Figure: array of FA and HA cells producing the outputs R7 ... R0, showing two of the possible paths for the Ripple-Carry based 4*4 Multiplier]
Area = (N*N) AND gates + (N-1)*N full adders
$Delay = \tau_{HA} + (2N-1) \tau_{FA}$
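As a worked instance of the area and delay expressions for the ripple-carry based array multiplier, the following sketch simply evaluates them for a given N. The function name and the unit delays passed as defaults are assumptions chosen for illustration; real values of the half-adder and full-adder delays depend on the technology.

```python
def array_multiplier_cost(n: int, tau_ha: float = 1.0, tau_fa: float = 1.0):
    """Evaluate the slide's formulas: (AND gates, adder cells, critical-path delay)."""
    and_gates = n * n                        # N*N partial-product AND gates
    adder_cells = (n - 1) * n                # (N-1)*N adder cells
    delay = tau_ha + (2 * n - 1) * tau_fa    # tau_HA + (2N-1) * tau_FA
    return and_gates, adder_cells, delay


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, array_multiplier_cost(n))
```

Area grows quadratically with N while the delay grows only linearly, which is why faster reduction schemes target the adder array rather than the AND plane.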
55
A
B
X
P
56
Wallace Tree
[Figure: Wallace tree for a 5x5 multiplication; the 25 partial products x0y0 ... x4y4 are grouped by column weight and reduced by adders (+) to the product bits P0 ... P9]
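The column-reduction idea behind the Wallace tree can be sketched in software. The code below is a generic 3:2 (full-adder) reduction written for illustration; it is not claimed to reproduce the exact wiring of the figure, and the name `wallace_multiply` is an assumption.

```python
def wallace_multiply(x: int, y: int, n: int = 5) -> int:
    """Multiply two n-bit unsigned numbers by reducing partial-product columns."""
    # columns[k] holds every bit of weight 2^k.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Apply full adders to groups of three bits until each column has at most two.
    while any(len(col) > 2 for col in columns):
        reduced = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            full = len(col) // 3
            for g in range(full):
                a, b, c = col[3 * g: 3 * g + 3]
                reduced[k].append(a ^ b ^ c)                        # sum bit
                reduced[k + 1].append((a & b) | (a & c) | (b & c))  # carry bit
            reduced[k].extend(col[3 * full:])                       # leftovers pass through
        columns = reduced

    # The remaining two rows go to a carry-propagate adder, modelled by a plain sum.
    return sum(bit << k for k, col in enumerate(columns) for bit in col)


if __name__ == "__main__":
    print(wallace_multiply(19, 27), 19 * 27)
```

Each pass preserves the weighted sum of all bits while shrinking the tallest column, so only one slow carry-propagate addition is needed at the end.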
57
A
B
X
P
Array Multiplier + Wallace Tree
58
A
B
X
Background
• Baugh-Wooley Algorithm
$X \cdot Y = \left(-x_{k-1} 2^{k-1} + \sum_{i=0}^{k-2} x_i 2^i\right) \left(-y_{k-1} 2^{k-1} + \sum_{i=0}^{k-2} y_i 2^i\right)$
$= \left(x_{k-1} y_{k-1} 2^{2k-2} + \sum_{i=0}^{k-2} \sum_{j=0}^{k-2} x_i y_j 2^{i+j}\right) - \sum_{i=0}^{k-2} x_{k-1} y_i 2^{k-1+i} - \sum_{i=0}^{k-2} y_{k-1} x_i 2^{k-1+i}$
• Convert negative partial products to positive representation
• No sign-extension required
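The decomposition of a two's complement product into positive and negative partial-product groups can be checked with a short script. This sketch only verifies the algebraic split into the four sums; it does not model the later step of converting the negative terms to a positive representation. The function names are illustrative assumptions.

```python
def to_signed(pattern: int, k: int) -> int:
    """Interpret a k-bit pattern as a two's complement value."""
    return pattern - (1 << k) if (pattern >> (k - 1)) & 1 else pattern


def baugh_wooley_split(x: int, y: int, k: int) -> int:
    """Evaluate X*Y from the k-bit patterns x and y using the four-term split."""
    xb = [(x >> i) & 1 for i in range(k)]
    yb = [(y >> i) & 1 for i in range(k)]
    positive = (xb[k - 1] & yb[k - 1]) << (2 * k - 2)
    positive += sum((xb[i] & yb[j]) << (i + j)
                    for i in range(k - 1) for j in range(k - 1))
    negative = sum((xb[k - 1] & yb[i]) << (k - 1 + i) for i in range(k - 1))
    negative += sum((yb[k - 1] & xb[i]) << (k - 1 + i) for i in range(k - 1))
    return positive - negative


if __name__ == "__main__":
    # 5-bit patterns for 13 and -5
    print(baugh_wooley_split(0b01101, 0b11011, 5))
```

Only the terms that involve exactly one sign bit carry a negative weight, which is what lets the hardware handle them with complemented partial products instead of sign extension.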
4/13/2015
Concordia VLSI Lab
59
59
A
X
B
P
examples of 5-by-5 Baugh-Wooley
[Figure: array of full adders (FA) fed by the partial products a_i b_j, the complemented terms a4b0', a4b1', a4b2', a4b3', a3'b4, a2'b4, a1'b4, a0'b4, and the correction inputs a4, a4', b4, b4' and 1; outputs P9 ... P0]
The schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two’s complement array multiplier
A
B
X
P
[Figure: partial-product matrix for squaring the 8-bit operand a7 ... a0 (row j holds a7*aj ... a0*aj for j = 0 ... 7), followed by the folded first row a7*a6, a7*a5, ..., a7*a0, a6*a0, ..., a1*a0, '0', a0]
61
A
X
B
P
Example of an 8bit squarer
N*N, N = 8 bits
[Figure: folded partial-product array of the squarer; entries are the cross terms a_i a_j (i > j), the single-bit terms a0 ... a7 and constant '0' entries; outputs S15 ... S0]
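A squarer needs roughly half the partial products of a general multiplier. The sketch below assumes the usual folding identities, a_i a_j + a_j a_i = 2 a_i a_j and a_i a_i = a_i; the slides show the folded array but do not state these identities explicitly, so treat them as my reading of the figure. The name `square_folded` is illustrative.

```python
def square_folded(a: int, n: int = 8) -> int:
    """Square an n-bit unsigned number using the folded partial-product array."""
    bits = [(a >> i) & 1 for i in range(n)]
    total = 0
    for i in range(n):
        total += bits[i] << (2 * i)            # diagonal: a_i * a_i = a_i
        for j in range(i):
            # a_i a_j appears twice in the full matrix, so it is added once
            # with doubled weight, i.e. shifted one extra position.
            total += (bits[i] & bits[j]) << (i + j + 1)
    return total


if __name__ == "__main__":
    print(square_folded(0xAB), 0xAB ** 2)
```

For N = 8 this uses 28 cross terms plus 8 diagonal bits instead of 64 AND terms.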
62
A
B
X
P
Array Multiplier
32bits by 32bits multiplier
63
Booth (Radix-4) Multiplier
• Radix-4 (3-bit recoding) halves the number of partial products to be added.
• Great saving in area and increased speed.
$A = -a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + a_{n-3} 2^{n-3} + \dots + a_1 2 + a_0$
$B = -b_{n-1} 2^{n-1} + b_{n-2} 2^{n-2} + b_{n-3} 2^{n-3} + \dots + b_1 2 + b_0$
• The base-4 redundant signed-digit representation of B is
$B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i$
• K_i is calculated by the following equation
$K_i = -2 b_{2i+1} + b_{2i} + b_{2i-1}, \quad i = 0, 1, 2, \dots, (n-2)/2$
• The 3 bits of multiplier B, b_{2i+1}, b_{2i}, b_{2i-1}, are examined and the corresponding K_i is calculated.
• B is always appended on the right with a zero (b_{-1} = 0), and n is always even (B is sign extended if needed).
• The product AB is then obtained by adding n/2 partial products.
$AB = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A$
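The recoding equation for K_i translates directly into code. The following Python sketch is an illustration only; the function names `booth_radix4_digits` and `booth_multiply` are assumptions. It takes the multiplier as an n-bit two's complement pattern (n even) and the multiplicand as an ordinary signed integer.

```python
def booth_radix4_digits(b: int, n: int) -> list:
    """Return K_0 .. K_(n/2 - 1) for the n-bit two's complement pattern b."""
    def bit(k: int) -> int:
        return 0 if k < 0 else (b >> k) & 1      # b_{-1} = 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(n // 2)]


def booth_multiply(a: int, b: int, n: int) -> int:
    """Compute A*B as the sum of n/2 partial products 2^(2i) * K_i * A."""
    return sum((k * a) << (2 * i)
               for i, k in enumerate(booth_radix4_digits(b, n)))


if __name__ == "__main__":
    print(booth_radix4_digits(0b011101, 6))      # multiplier +29 in 6 bits
    print(booth_multiply(3, 0b011101, 6))
```

Every digit lies in {-2, -1, 0, +1, +2}, so each partial product is obtained from the multiplicand by at most a shift and a negation.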
65
A
B
X
P
Booth Algorithm
Decoding of the multiplier to generate signals for hardware use

Xi+1  Xi  Xi-1  OP  NEG  ZERO  TWO
0     0   0     0   0    1     0
1     0   0     2   1    0     1
0     1   0     1   0    0     0
1     1   0     1   1    0     0
0     0   1     1   0    0     0
1     0   1     1   1    0     0
0     1   1     2   0    0     1
1     1   1     0   1    1     0
66
A
B
X
P
Booth Algorithm
A Booth recoded multiplier examines three bits of the multiplier at a time.
It determines whether to add zero, 1, -1, 2, or -2 times the multiplicand at that rank.
The operation to be performed is based on the current two bits of the multiplier and the previous bit.

Xi+1  Xi  Xi-1  Zi/2
0     0   0     0
0     0   1     1
0     1   0     1
0     1   1     2
1     0   0     -2
1     0   1     -1
1     1   0     -1
1     1   1     0
67
BIT (weights 2^1, 2^0, 2^-1)
Xi  Xi+1  Xi+2  OPERATION                                              M is multiplied by
0   0     0     add zero (no string)                                   +0
0   0     1     add the multiplicand (end of string)                   +X
0   1     0     add the multiplicand (a string)                        +X
0   1     1     add twice the multiplicand (end of string)             +2X
1   0     0     subtract twice the multiplicand (beginning of string)  -2X
1   0     1     subtract the multiplicand (-2X and +X)                 -X
1   1     0     subtract the multiplicand (beginning of string)        -X
1   1     1     subtract zero (center of string)                       -0
68
A
B
X
P
Booth Algorithm-a higher radix Multiplication
[Dot diagram: 4-bit Multiplicand A; Multiplier B grouped as (B3B2)(B1B0); partial product bits (B1B0)_2 A 4^0 and (B3B2)_2 A 4^1; 8-bit Product P]
69
A
Example
B
X
P
The following example shows how the calculation is done.
Multiplicand X = 000011
Multiplier Y = 011101
A zero is appended to the right of the multiplier: 0 1 1 1 0 1 0
After Booth decoding, Y is recoded so that X is multiplied by +2, -1 and +1 separately; each partial product is shifted two bits relative to the previous one, and they are added together.

X * +1    000000000011
X * -1    1111111101
X * +2    00000110
----------------------
Result    000001010111
70
A
B
X
P
Sign Extension
71
A
B
X
P
Sign extension
• Traditional sign-extension scheme
– Segment the input operands based on the size of the embedded blocks
– Multiply the segmented inputs and extend the sign bit of each partial product
– Sum all partial products
[Figure: the segmented input operands are multiplied (×); the sign-extended partial products are summed (+) into the final result]
A
B
X
P
Booth Algorithm-Example 1
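Example 1:
Multiplicand    000011      (+3)
Multiplier      011101 0    (+29)
Booth digits    +2 -1 +1
X * +1    000000000011
X * -1    1111111101
X * +2    00000110
Carry out of the 12-bit sum (discarded): 1
Product   000001010111 (+87)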
73
A
B
X
P
Booth Algorithm Example 2
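Multiplicand    111101      (-3)
Multiplier      011101 0    (+29)
Booth digits    +2 -1 +1
X * +1    111111111101    (notice the sign extension)
X * -1    0000000011      (2s complement of the multiplicand)
X * +2    11111010        (notice the sign extension)
Carry out of the 12-bit sum (discarded): 1
Product   111110101001 (-87)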
74
A
B
X
P
Booth Algorithm-Example 3
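Multiplicand    111101      (-3)
Multiplier      100011 0    (-29)
Booth digits    -2 +1 -1
X * -1    000000000011    (2s complement of the multiplicand)
X * +1    1111111101      (notice the sign extension)
X * -2    00000110        (shifted 2s complement)
Carry out of the 12-bit sum (discarded): 1
Product   000001010111 (+87)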
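The three examples differ only in the signs of the operands, and in each case the sign-extended rows add up to the correct 12-bit result once the carry out is dropped. The sketch below reproduces the rows by masking each shifted partial product to the product width; the helper name `booth_rows` and the choice of a 12-bit width are assumptions made to match the examples, and the Booth digits are passed in rather than derived.

```python
def booth_rows(x: int, digits: list, width: int = 12):
    """Return the sign-extended partial-product rows and their sum modulo 2^width."""
    mask = (1 << width) - 1
    # Row i is K_i * X shifted left by 2i bits; masking a negative Python int
    # yields its two's complement pattern, i.e. the sign-extended row.
    rows = [((k * x) << (2 * i)) & mask for i, k in enumerate(digits)]
    return rows, sum(rows) & mask            # the carry out is discarded


if __name__ == "__main__":
    for x, digits in ((3, [1, -1, 2]), (-3, [1, -1, 2]), (-3, [-1, 1, -2])):
        rows, total = booth_rows(x, digits)
        print([format(r, "012b") for r in rows], format(total, "012b"))
```

Digits are listed least significant first, so `[1, -1, 2]` corresponds to the "+2 -1 +1" annotation read from right to left.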
A
B
X
P
Comparison of Booth and parallel
multiplier shift and Add
76
Template to reduce sign extensions for the Booth algorithm (for hardware implementation)
Note that each operand is 17 bits wide, i.e. the 17th bit is the sign bit. Negative
numbers are entered in 1's complement, which is why the S must be added on the
right-hand side of the diagram. If 2's complement is used, the S's on the
right-hand side of the diagram can be removed.
77
A
B
X
P
Comparison of Template and the
sign extension
[Figure: the same partial products drawn twice. Sign template: one sign entry per row (S1, S2, S3, S4) plus a constant 1. Sign extension: the sign bits S1, S2, S3 repeated to the left of each row.]
78
[Figure: partial product matrix generated for a 16 * 16 bit multiplication, using Booth and the template given in the previous slide; bit positions 32 down to 0, rows of partial-product bits A0 ... A8 with sign entries S and leading 1s]
79
A
B
X
Example of using the template
P
25 * -35 with -35 as the multiplier, using an 8-bit representation
Multiplicand (25)                00011001
Multiplier (-35), 0 appended     110111010

Using the template, 25 * -35:
* 1     10000011001
* -1    1011100111
* 2     100110010
* -1    1100111
Template annotations in the figure: sign bit; add SS; add inverted S; add inverted sign and add 1; add inverted sign bit; no sign bit

Sum     11110010010101
This is a negative number. Converting it gives
        00001101101011 = 512 + 256 + 64 + 32 + 8 + 2 + 1 = 875
so the product is -875.
80
A
B
X
P
Booth Multiplier Components
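[Block diagram: the Multiplier feeds the Booth Encoder; the Multiplicand and the encoder outputs feed the PPU (partial products unit), followed by the PPA (partial products adding unit), which delivers the Product]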
81
A
B
X
Wallace Tree and Ripple Carry Adder Structure of 8*8 Multiplier with Pipeline
[Figure: Partial Products PP0, PP1, PP2 (15 downto 0) enter the first row of adders (+); Partial Product PP3 (15 downto 0) enters the next row; a Pipeline Register separates the tree from the final Ripple Carry Adder, which produces P16 ... P0; the Critical Path is marked]
82
Hardware implementation of Booth with shift and add
[Block diagram: datapath built from a 17-bit load register (LD), shift registers with Shift and Doubleshift (SH), *2 shifters, a 2s complement block, sign expansion, multiplexers (Mux0, Mux11, Mux12, mux4-32, Mux37), a 37-bit Adder with Cin and Cout, the 37-bit result register Register37, Counter20 and an FSM; control signals include Start, Init, Shift, Doubleshift, Mulbegin, Mulend, Stop, Finish and CLR]
83
P
A
B
X
P
Simulation Plan
[Block diagram: 32-bit Signal Generator A (A[31:0]) and 32-bit Signal Generator B (B[31:0]) drive both the Behavioral Multiplier (A*B, Result P[63:0]) and My Multiplier (My_P[63:0]); a 64-bit Comparator compares the two results and reports the Failed Number. My Multiplier is one of: Array Multiplier, Modified Booth Multiplier, Wallace Tree Multiplier, Modified Booth-Wallace Tree Multiplier, Twin Pipe Serial-Parallel Multiplier]
84
A
B
X
P
Testing the Design
85
A
B
X
P
Simulation For Parallel Multipliers
Signed
Number:
Unsigned
Number:
86
A
B
X
P
Simulation For Signed S/P Multipliers
There is a 340 ns delay between the operands and the result because of the
delay of the D flip-flops.
87
A
B
X
P
FPGA after implementation, areas of
programming shown clearly
88
A
B
X
P
Another implementation of the above after pipelining; place and
route has placed the design in different locations.
89
A
B
X
P
Spartacus FPGA board
90
A
B
X
P
Testing the multiplication system
91
A
B
X
P
Comparison of Multipliers
Metric                              Array      Modified Booth  Wallace-Tree  Modified Booth-Wallace Tree  Twin Pipe Serial-Parallel  Behavioral
Area, total CLBs (#)                3076.50    2649.50         3325.50       2672.50                      490.00                     2993.50
Maximum Delay D (ns)                35.78      24.43           18.93         18.53                        107.52 (3.36x32)           49.33
Total Dynamic Power P (W)           7.52       6.33            7.46          6.41                         0.28                       6.24
Delay·Power Product DP (ns W)       268.98     154.64          141.14        118.76                       30.62                      307.58
Area·Power Product AP (# W)         23128.20   16771.60        24793.93      17127.79                     139.54                     18665.07
Area·Delay Product AD (# ns)        1.10E+05   6.47E+04        6.30E+04      4.95E+04                     5.27E+04                   1.48E+05
Area·Delay^2 Product AD2 (# ns^2)   3.94E+06   1.58E+06        1.19E+06      9.18E+05                     5.66E+06                   7.28E+06

Table 7. Performance comparison for two's complement multipliers. By Chen Yaoquan, M.Eng. 2005
Comparison of Multipliers
Metric                              Array      Modified Booth  Wallace-Tree  Modified Booth-Wallace Tree  Twin Pipe Serial-Parallel  Behavioral
Area, total CLBs (#)                3280.50    2800.00         3321.50       2845.50                      487.00                     3003.00
Maximum Delay D (ns)                37.23      25.33           18.93         18.33                        107.52                     44.50
Total Dynamic Power P (W)           7.57       6.66            7.32          6.66                         0.29                       6.26
Delay·Power Product DP (ns W)       281.88     168.77          138.60        122.13                       30.66                      278.53
Area·Power Product AP (# W)         24837.98   18656.40        24319.36      18959.57                     138.89                     18795.78
Area·Delay Product AD (# ns)        1.22E+05   7.09E+04        6.29E+04      5.22E+04                     5.24E+04                   1.34E+05
Area·Delay^2 Product AD2 (# ns^2)   4.55E+06   1.80E+06        1.19E+06      9.56E+05                     5.63E+06                   5.95E+06

Table 7. Performance comparison for unsigned multipliers. By Chen Yaoquan, M.Eng. 2005
A
B
X
P
Comparison of Multipliers
Effect of changing the value of "set_max_delay" in the script file:

set_max_delay (ns)  0       10      20      30      40      50      60      >60
Area (#)            3014.5  3013.0  3110.0  3193.5  3019.5  2999.5  2978.5  2978.5
Power (W)           6.6499  6.6470  7.5683  8.1878  8.0645  8.0419  8.0156  8.0156
Delay (ns)          31.98   31.98   30.93   30.08   39.93   49.88   59.63   59.63

[Figure: the relation of Area and Delay for the behavioral multiplier, the "banana curve"; x axis Delay (ns) from 0 to 80, y axis Area (#) from 2950 to 3250]
A
B
X
P
Comparison of Multipliers
Criterion          Array    Modified Booth  Wallace-Tree  Modified Booth-Wallace Tree  Twin Pipe Serial-Parallel  Behavioral
Area               Medium   Small           Large         Small                        Smallest                   Medium
Critical Delay     Medium   Fast            Very Fast     Fastest                      Very Large                 Large
Power Consumption  Large    Medium          Large         Medium                       Smallest                   Medium
Complexity         Simple   Complex         More Complex  More Complex                 Simple                     Simplest
Implementation     Easy     Medium          Difficult     Difficult                    Easy                       Easiest

By Chen Yaoquan, M.Eng. 2005
95
A
B
X
P
Pipelining Simulation
96
A
B
X
P
Synthesis for Signed Multipliers
Array
Modified Booth
Wallace Tree
Modified Booth
-Wallace Tree
Twin Pipe S/P
Behavioral
97
A
B
X
P
Synthesis for Unsigned Multipliers
Array
Modified Booth
Wallace Tree
Modified Booth
-Wallace Tree
Twin Pipe S/P
Behavioral
98
A
B
X
P
Conclusion
• Modified Booth and Wallace Tree are the best techniques for high speed multiplication.
• Wallace Tree has the best performance, but it is hard to implement.
• Booth algorithm based multipliers have a lower area than the other parallel multipliers.
• For behavioral multipliers, the area increases as the delay decreases.
99
A
B
X
Comparison
P
Metric                                          Array                 Modified Booth       Wallace Tree             Modified Booth & Wallace Tree  Twin Pipe Serial-Parallel
Area, total CLBs (#)                            1165                  1292                 1659                     1239                           133
Maximum Delay (ns)                              187.87                139.41               101.14                   101.43                         22.58 (722.56)
Power Consumption at highest speed (mW)         16.6506 (at 188 ns)   23.136 (at 140 ns)   30.95 (at 101.14 ns)     30.862 (at 101.43 ns)          2.089 (at 722.56 ns)
Delay·Power Product DP (ns mW)                  3128.15               3225.39              3130.28                  3130.33                        1509.42
Area·Power Product AP (# mW)                    19.397 x 10^3         29.891 x 10^3        51.346 x 10^3            38.238 x 10^3                  277.837
Area·Delay Product AD (# ns)                    218.868 x 10^3        180.118 x 10^3       167.791 x 10^3           125.671 x 10^3                 96.101 x 10^3
Area·Delay^2 Product AD2 (# ns^2)               41.119 x 10^6         25.110 x 10^6        16.970 x 10^6            12.747 x 10^6                  69.438 x 10^6
A
B
X
P
NOTICE
• The rest of these slides are for extra information only and are not part of the lecture
101
Array Addition
102
Addition of 8 binary numbers using the Wallace tree principle
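[Block diagram: unit MULT320 with inputs A, B, CLK, RESET and START and outputs Done and RESULT; internal blocks include COUNTER20, INVERTER, AND_2, Adder37 and the 37-bit register REGSTER37 holding LAST_RESULT; internal signals BEGIN0, END0, FINISH0 and CLR]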
Baugh-Wooley two's complement
multiplier:
[Figure: array of full adders (FA) fed by the partial products a_i b_j, the complemented terms a4b0', a4b1', a4b2', a4b3', a3'b4, a2'b4, a1'b4, a0'b4, and the correction inputs a4, a4', b4, b4' and 1; outputs P9 ... P0]
The schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two’s complement array multiplier
107
Baugh-Wooley 5-by-5 partial-product layout and worked examples
a4 a3 a2 a1 a0  X  b4 b3 b2 b1 b0

a4b0'  a3b0   a2b0   a1b0   a0b0
a4b1'  a3b1   a2b1   a1b1   a0b1
a4b2'  a3b2   a2b2   a1b2   a0b2
a4b3'  a3b3   a2b3   a1b3   a0b3
a4b4   a3'b4  a2'b4  a1'b4  a0'b4
Correction terms added to the array: a4', b4', a4, b4 and a constant 1
Result bits: p9 ... p0

Worked examples in the figure: 13 X (-5) = -65, (-5) X 13 = -65, 13 X 5 = 65, (-13) X (-5) = 65
108
A
B
X
P
Cluster Multipliers
Divide the multiplier into smaller multipliers
109
A
B
X
Cluster Multipliers
P
[Block diagram: the 8-bit Multiplicand and Multiplier are split into 4-bit halves; 8-bit latches with /CLR, gated by the enable signals EN0 ... EN3 and CLK, feed four 4-bit multipliers; their 8-bit results go through 8-bit latches to the Final Addition Stage, which produces the 16-bit product P. A separate circuit generates the enable signals.]
8-bit cluster low power multiplier
Cluster Multipliers
• Divide the multiplication circuit into clusters (blocks) of smaller multipliers
• Apply clock gating techniques to disable the blocks that are producing a zero result
• Features
– Low power (claims 13.4% savings)
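The clustering idea can be sketched in software: an 8x8 product is assembled from four 4x4 block products, and a block is skipped when its result is known to be zero. This is only a behavioral model under stated assumptions: unsigned operands, and "a block is disabled when either of its 4-bit inputs is zero" as the enable condition (the slides do not spell the condition out). The name `cluster_multiply_8x8` is hypothetical.

```python
def cluster_multiply_8x8(a, b):
    """Return (product, active_blocks) for unsigned 8-bit operands a and b."""
    if not (0 <= a < 256 and 0 <= b < 256):
        raise ValueError("operands must be unsigned 8-bit values")
    a_lo, a_hi = a & 0xF, a >> 4
    b_lo, b_hi = b & 0xF, b >> 4
    # Each tuple is (multiplicand nibble, multiplier nibble, weight shift).
    blocks = (
        (a_lo, b_lo, 0),
        (a_hi, b_lo, 4),
        (a_lo, b_hi, 4),
        (a_hi, b_hi, 8),
    )
    product = 0
    active_blocks = 0
    for x, y, shift in blocks:
        if x == 0 or y == 0:
            # Assumed enable rule: the block output is zero, so the block
            # stays disabled (this stands in for the gated clock).
            continue
        active_blocks += 1
        product += (x * y) << shift  # the 4-bit multiplier plus final addition stage
    return product, active_blocks


print(cluster_multiply_8x8(0x0F, 0x0F))  # only the low-by-low block is active
print(cluster_multiply_8x8(0xFF, 0xFF))  # all four blocks are active
```

Operands with small magnitudes leave most blocks idle, which is where a power saving would come from in this model.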
111
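Multiplexer-Based Array Multipliers
$$P = \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=1}^{n-1} Z_j 2^{j}$$
$$Z_j = x_j Y_j + X_j y_j$$
$$X_j = x_{j-1} x_{j-2} \ldots x_0$$
[Figure: triangular array of the bits of Z_j (Z_10, Z_20, Z_21, Z_30, Z_31, Z_32, Z_40, Z_41, Z_42, Z_43) together with the x_j y_j terms.]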
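The decomposition above can be verified with a short Python sketch. Here X_j and Y_j are taken to be the numbers formed by the j low-order bits of X and Y, and Z_j is picked by a 4-to-1 selection on the bit pair (x_j, y_j), which is how a multiplexer would realize Z_j = x_j Y_j + X_j y_j. The function name `mux_based_multiply` is an assumption for illustration.

```python
def mux_based_multiply(x, y, n):
    """Unsigned n-bit multiply using P = sum x_j y_j 2^(2j) + sum Z_j 2^j."""
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values")
    total = 0
    for j in range(n):
        xj = (x >> j) & 1
        yj = (y >> j) & 1
        low_mask = (1 << j) - 1
        X_j = x & low_mask  # bits x_(j-1) ... x_0
        Y_j = y & low_mask  # bits y_(j-1) ... y_0
        # 4-to-1 multiplexer indexed by (x_j, y_j):
        # 00 -> 0, 01 -> X_j, 10 -> Y_j, 11 -> X_j + Y_j
        Z_j = (0, X_j, Y_j, X_j + Y_j)[(xj << 1) | yj]
        total += (xj & yj) << (2 * j)  # diagonal term x_j y_j 2^(2j)
        total += Z_j << j              # multiplexed term Z_j 2^j (Z_0 is always 0)
    return total


print(mux_based_multiply(13, 11, 4))
```

No recoding of the operands is needed; each row only selects among 0, X_j, Y_j and their sum.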
112
Multiplexer-Based Array Multipliers
Two types of cells:
Cell 1: produces the terms Z_ij 2^j and includes a full adder of the carry-save adder array
Cell 2: produces the terms x_j y_j 2^(2j) and includes a full adder of the carry-save adder array
113
Multiplexer-Based Array Multipliers
• Characteristics
– Faster than Modified Booth
– Unlike Booth, does not require encoding logic
– Requires approximately N^2/2 cells
– Has a zigzag shape, thus not layout-friendly
114
Multiplexer-Based Array Multipliers
• Improvement
– More rectangular layout
– Saves up to 40 percent of the area without penalties
– Outperforms the modified Booth multiplier in both speed and power by 13% to 26%
115
Gray-Encoded Array Multiplier
Dec   Hyb    | Dec   Hyb    | Dec   Hyb    | Dec   Hyb
  0   0000   |   4   0100   |  -8   1100   |  -4   1000
  1   0001   |   5   0101   |  -7   1101   |  -3   1001
  2   0011   |   6   0111   |  -6   1111   |  -2   1011
  3   0010   |   7   0110   |  -5   1110   |  -1   1010
2’s complement Hybrid Coding
– Most consecutive values differ in a single bit (always so within each group of four values in the table)
– Reduces the number of transitions, and thus power (for highly correlated streams)
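The table is consistent with one simple rule: take the 2's complement bit pattern and Gray-code each 2-bit group separately. That rule is inferred here from the sixteen table entries, not stated on the slide, so the sketch below should be read as an assumption that happens to reproduce the table; `hybrid_encode` is a hypothetical name.

```python
def hybrid_encode(value, width=4):
    """Encode a signed integer by Gray-coding each 2-bit group of its
    two's complement pattern (rule inferred from the slide's table)."""
    if width % 2 != 0:
        raise ValueError("width must be even")
    if not (-(1 << (width - 1)) <= value < (1 << (width - 1))):
        raise ValueError("value does not fit in the given width")
    word = value & ((1 << width) - 1)  # two's complement bit pattern
    encoded = 0
    for shift in range(0, width, 2):
        pair = (word >> shift) & 0b11
        gray = pair ^ (pair >> 1)      # 00->00, 01->01, 10->11, 11->10
        encoded |= gray << shift
    return format(encoded, "0{}b".format(width))


for v in (0, 1, 2, 3, -8, -5, -1):
    print(v, hybrid_encode(v))
```

Within a group of four consecutive values only the low pair changes, and it changes one bit at a time, which is the property the slide relies on for correlated data.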
116
Gray-Encoded Array Multiplier
An 8-bit wide 2’s complement radix-4 array multiplier
117
Gray-Encoded Array Multiplier
• Characteristics
– Uses Gray code to reduce the switching activity of the multiplier
– Uses 45.6% less power than Modified Booth
– Uses 26.4% more area than Modified Booth
118
Ultra-high Speed Parallel Multiplier
• How is ultra-high speed achieved?
– Based on the Modified Booth algorithm and a tree structure (column compression)
– Chooses efficient counters (3:2 and 5:3)
– Uses the new compressor (20% faster)
– Uses the First Partial product Addition (FPA) algorithm (reducing the width of the CLA by 50%)
119
Ultra-high Speed Parallel Multiplier
– Divide into 3 rows or 5 rows only (most efficient).
– Calculate the partial products as soon as possible.
– The final CLA is only 16-bit instead of 32-bit.
[Figure: calculation process using parallel counters in the 16x16 case]
– Overall, the delay is reduced by about 30%
120
ULLRLF Multiplier
• ULLRLF stands for Upper/Lower Left-to-Right Leapfrog.
• Combines the following techniques:
– Signal flow optimization in the [3:2] adder array for partial product reduction,
– Left-to-right leapfrog (LRLF) signal flow,
– Splitting of the reduction array into upper/lower parts.
121
ULLRLF Multiplier
1) Signal flow optimization in [3:2] adder array
-- PPij is always connected to pin A.
-- Sin/Cin are connected to pins B/C; most Sin signals are connected to C.
-- For n = 32, the delay is reduced by 30 percent.
-- Power is also saved.
122
ULLRLF Multiplier
2) Left-to-Right Leapfrog (LRLF) Structure
-- The sum signals skip over alternate rows.
-- Signal delays are better balanced.
-- Low power.
123
ULLRLF Multiplier
3) Upper/Lower Split Structure
-- The long data path is broken into parallel short paths, which saves power.
-- The delay of partial product reduction is reduced.
-- Only n+2 bits (figure annotation)
124
ULLRLF Multiplier
• ULLRLF multipliers consume less power than optimized tree multipliers for n ≤ 32 while keeping similar delay and area.
• With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures.
[Figure: floorplan of ULLRLF (n = 32)]
125
Signed Array Multiplier
[Figure: 32*32-Bit Array Multiplier for Signed Number. Each multiplier bit B0, B1, ..., B31 gates a row of the multiplicand bits A31..A0; each row feeds one stage of carry save adder built from full adders (FA) and half adders (HA); a 32-bit carry look ahead adder produces the upper product bits. Outputs P63..P0.]
STAGE 4 TO 30 (each stage includes 32 AND gates, 31 full adders, 1 half adder and 1 NOT gate)
126
Unsigned Array Multiplier
[Figure: 32*32-Bit Array Multiplier for Unsigned Number. Each multiplier bit B0, B1, ..., B31 gates a row of the multiplicand bits A31..A0; each row feeds one stage of carry save adder built from full adders (FA) and half adders (HA); a 32-bit carry look ahead adder produces the upper product bits. Outputs P63..P0.]
STAGE 4 TO 30 (each stage includes 32 AND gates, 31 full adders and 1 half adder)
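The structure of the unsigned array multiplier (a row of AND gates per multiplier bit, one carry-save stage per row, and a single carry-propagate adder at the end) can be modelled at word level. This is a hedged behavioral sketch in Python: it works on whole 2n-bit words rather than on individual FA/HA cells, and the name `array_multiply_unsigned` is an assumption.

```python
def array_multiply_unsigned(a, b, n=32):
    """Word-level model of an unsigned n x n carry-save array multiplier."""
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values")
    mask = (1 << (2 * n)) - 1
    sum_word = 0
    carry_word = 0
    for i in range(n):
        # Row of AND gates: multiplicand gated by multiplier bit b_i, weight 2^i.
        partial = (a << i) if (b >> i) & 1 else 0
        # One carry-save stage: three words in, two words out.
        # No carry ripples sideways inside a stage.
        new_sum = sum_word ^ carry_word ^ partial
        new_carry = ((sum_word & carry_word)
                     | (sum_word & partial)
                     | (carry_word & partial)) << 1
        sum_word, carry_word = new_sum & mask, new_carry & mask
    # Final carry-propagate addition (the carry look ahead adder in the figure).
    return (sum_word + carry_word) & mask


print(array_multiply_unsigned(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFF ** 2)
```

Carries are deferred until the last step, so every stage has the delay of a single full adder regardless of word length.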
127
Signed Modified Booth Multiplier
[Figure: partial-product dot diagram of the 32*32-bit Booth multiplier for signed numbers, bit columns 63 down to 0, 16 rows of partial products. The multiplier B is scanned from LSB to MSB in overlapping groups of three bits (B i-1, B i, B i+1), with a 0 appended below the LSB.]
E = the inversion of the sign bit in each row
S = the B i+1 bit in the three encoded bits
32*32-bit Booth Multiplier for Signed Number
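The 16 rows come from recoding the 32-bit signed multiplier into 16 radix-4 digits, one per overlapping triplet (B i+1, B i, B i-1). A minimal Python sketch of that recoding follows; it models the arithmetic only (no sign-extension tricks such as the E and S bits), and the function names are hypothetical.

```python
def booth_radix4_digits(b, n=32):
    """Recode a signed n-bit multiplier into n/2 digits from {-2,-1,0,1,2}."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    if not (-(1 << (n - 1)) <= b < (1 << (n - 1))):
        raise ValueError("b must fit in n-bit two's complement")
    padded = (b & ((1 << n) - 1)) << 1  # append the 0 below the LSB
    digits = []
    for k in range(n // 2):
        triplet = (padded >> (2 * k)) & 0b111
        b_prev = triplet & 1         # B i-1
        b_cur = (triplet >> 1) & 1   # B i
        b_next = (triplet >> 2) & 1  # B i+1
        digits.append(b_prev + b_cur - 2 * b_next)
    return digits


def booth_multiply_signed(a, b, n=32):
    """Sum the n/2 partial products digit * a * 4^k."""
    return sum(d * a * 4 ** k for k, d in enumerate(booth_radix4_digits(b, n)))


print(len(booth_radix4_digits(-5)))      # number of partial-product rows
print(booth_multiply_signed(13, -5))
```

Each digit selects 0, ±A or ±2A, so a row needs only a shift and an optional inversion rather than a general multiple of A.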
128
Signed Modified Booth Multiplier
[Figure: 32*32-Bit Modified Booth Multiplier for Signed Number. Booth encoders take the overlapping multiplier groups B[1:0]0, B[3:1], B[5:3], ... and drive X1[n], X2[n] and INVERT n; rows of partial-product selectors (SEL) choose from the bits of A (A31..A0 and 0); the rows are summed by stages of full adders (FA) and half adders (HA), and a 64-bit carry look ahead adder produces P63..P0.]
STAGE 3 TO 15 (each stage includes 33 PP selectors, 31 full adders, 1 half adder and 1 NOT gate)
129
Unsigned Modified Booth Multiplier
[Figure: partial-product dot diagram of the 32*32-bit Booth multiplier for unsigned numbers, bit columns 63 down to 0, 17 rows of partial products. The multiplier B is scanned from LSB to MSB in overlapping groups of three bits (B i-1, B i, B i+1).]
S = the B i+1 bit in the three encoded bits
S' = the inversion of S
32*32-bit Booth Multiplier for Unsigned Number
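The unsigned case needs a 17th row because the top bit of an unsigned multiplier carries positive weight, so the recoding must look at one more triplet made of zeros above B[31]. A short Python sketch of this, with a hypothetical function name, under the assumption that the extension bits are zero:

```python
def booth_radix4_digits_unsigned(b, n=32):
    """Recode an unsigned n-bit multiplier into n/2 + 1 radix-4 Booth digits."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    if not (0 <= b < (1 << n)):
        raise ValueError("b must be an unsigned n-bit value")
    padded = b << 1  # 0 below the LSB; Python ints supply the zeros above the MSB
    digits = []
    for k in range(n // 2 + 1):
        triplet = (padded >> (2 * k)) & 0b111
        b_prev = triplet & 1
        b_cur = (triplet >> 1) & 1
        b_next = (triplet >> 2) & 1
        digits.append(b_prev + b_cur - 2 * b_next)
    return digits


digits = booth_radix4_digits_unsigned(0xFFFFFFFF)
print(len(digits))   # rows of partial products
print(digits[-1])    # the extra top digit equals B[31]
```

The last digit is either 0 or 1, so the extra row is just the multiplicand gated by B[31].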
130
Unsigned Modified Booth Multiplier
[Figure: 32*32-Bit Modified Booth Multiplier for Unsigned Number. Booth encoders take the groups B[1:0]0, B[3:1], B[5:3], ..., B[i+1, i, i-1], ..., 00B[31] and drive X1[i], X2[i] and S[i] for i = 0 to 16; rows of partial-product selectors (SEL, with SEL_END cells at the row ends) choose from the bits of A (A31..A0 and 0); the rows are summed by stages of full adders (FA) and half adders (HA), and a 64-bit carry look ahead adder produces P63..P0.]
STAGE 3 TO 15 (each stage includes 33 PP selectors, 32 full adders, 1 half adder and 1 NOT gate)
131
Wallace Tree multipliers
[Figure: data flow. A[31:0] and B[31:0] form 32 partial products, which are added in the Wallace tree adder to give C[63:0] and S[63:0]; a 64-bit carry look-ahead adder then produces P[63:0].]
132
Wallace Tree multipliers
• Uses 3:2 counters and 2:2 counters
• Number of levels: log (32/2) / log (3/2) ≈ 6.8 as an estimate; the reduction shown takes 8 levels
• Irregular structure
• Fast
[Figure: partial-product dot matrix for 32 rows, reduced in levels 1 to 8. Legend: a 2:2 counter takes 2 input bits and a 3:2 counter takes 3 input bits; each outputs a Sum bit and a Carry bit.]
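The level count can be reproduced by repeatedly replacing every group of three rows with two. The sketch below is a word-level Python model under that assumption: it applies 3:2 compression to whole rows rather than to individual columns (so it does not model the 2:2 counters or the exact dot pattern), and both function names are hypothetical.

```python
def wallace_levels(rows):
    """Count 3:2 reduction levels needed to bring `rows` operands down to 2."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3  # each full group of 3 becomes 2
        levels += 1
    return levels


def wallace_multiply_unsigned(a, b, n=32):
    """Return (product, levels) using row-wise 3:2 carry-save reduction."""
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values")
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        full = len(rows) - len(rows) % 3
        for k in range(0, full, 3):
            x, y, z = rows[k:k + 3]
            reduced.append(x ^ y ^ z)                           # sum word
            reduced.append(((x & y) | (x & z) | (y & z)) << 1)  # carry word
        reduced.extend(rows[full:])  # leftover rows pass through unchanged
        rows = reduced
        levels += 1
    # The two remaining rows go to the final carry look-ahead adder.
    return sum(rows), levels


print(wallace_levels(32), wallace_levels(16))
print(wallace_multiply_unsigned(123456789, 987654321))
```

The row counts follow 32, 22, 15, 10, 7, 5, 4, 3, 2, which is why the logarithmic formula underestimates the number of levels.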
133
Wallace Tree multipliers
[Figure: 64-Bit Carry Look Ahead Adder, 2-level hierarchical. A carry propagate/generate unit turns A63..A0, B63..B0 and Cin into P63..P0 and G63..G0. Eight 8-bit BCLA blocks each take eight P/G pairs (P7-P0, G7-G0 up to P63-P56, G63-G56), output the group signals PM0..PM7 and GM0..GM7, and return the carries C7-C0 up to C63-C56. A second-level 8-bit BCLA combines the PM/GM signals into the block carries C8, C16, ..., C56. A 64-bit summation unit combines P63..P0 with C63..C0 to give S63..S0 and C64.]
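The two-level organization (bit propagate/generate, eight 8-bit blocks producing group PM/GM, a second-level block producing the block carries, then a summation unit) can be sketched in Python as follows. The loops evaluate the same recurrences that the hardware flattens into parallel look-ahead logic, so this is a functional model only; the name `cla_add_64` is an assumption.

```python
def cla_add_64(a, b, cin=0):
    """Return (sum, carry_out) of two unsigned 64-bit values, organized as a
    2-level hierarchical carry look-ahead adder with 8-bit blocks."""
    width, block = 64, 8
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width) and cin in (0, 1)):
        raise ValueError("operands must be unsigned 64-bit values, cin 0 or 1")
    # Carry propagate/generate unit.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    # First level: group propagate (PM) and group generate (GM) per 8-bit block.
    pm, gm = [], []
    for blk in range(width // block):
        group_p, group_g = 1, 0
        for i in range(blk * block, (blk + 1) * block):
            group_g = g[i] | (p[i] & group_g)
            group_p &= p[i]
        pm.append(group_p)
        gm.append(group_g)
    # Second level: carries into each block (C0, C8, C16, ..., C56) and C64.
    block_carry = [cin]
    for blk in range(width // block):
        block_carry.append(gm[blk] | (pm[blk] & block_carry[blk]))
    # Bit carries inside each block, then the summation unit S_i = P_i xor C_i.
    total = 0
    for blk in range(width // block):
        c = block_carry[blk]
        for i in range(blk * block, (blk + 1) * block):
            total |= (p[i] ^ c) << i
            c = g[i] | (p[i] & c)
    return total, block_carry[-1]


print(cla_add_64(2 ** 64 - 1, 1))
```

Because the block carries depend only on PM, GM and Cin, every block can compute its internal carries without waiting for its neighbours to finish.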
134
Modified Booth-Wallace Tree Multipliers
135
Modified Booth-Wallace Tree Multipliers
• Uses 3:2 counters and 2:2 counters
• Number of levels: log (16/2) / log (3/2) ≈ 5.1 as an estimate; the reduction shown takes 6 levels
• Irregular structure
• Fast
• Less area
[Figure: PP Dot Matrix of Booth-Wallace Multiplier for Signed Number. The partial products are rearranged and then reduced in levels 1 to 6.]
136
Twin pipe serial-parallel multipliers
[Figure: block diagram of 32*32-bit signed twin pipe serial-parallel multiplier with serial/parallel conversion logic. The even multiplier bits (B30 B28 ... B2 B0) and the odd multiplier bits (B31 B29 ... B3 B1) enter through two parallel in, serial out shift registers; A31..A0 is applied in parallel to the 32-bit twin pipe serial-parallel multiplier unit; the even product bits (P62 P60 ... P2 P0) and the odd product bits (P63 P61 ... P3 P1) leave through two serial in, parallel out shift registers. Control signals: Result_ready, Load/Shift, Reset, Clock, Sign.]
137
Signed twin pipe serial-parallel multipliers
[Figure: 32*32-bit twin pipe serial-parallel multiplier for signed number. Two pipes of full adders (FA), half adders (HA) and D flip-flops, clocked on rising_edge and falling_edge, share the parallel inputs A31..A0. One pipe takes the even data bits (B0, B2, ...) and gives the even product; the other takes the odd data bits (B1, B3, ..., B31) and gives the odd product; a MUX merges them into Product. The repeated cell is drawn once with the note "Repeat 28 units more". Control inputs: Sign, Reset, Clock.]
Figure annotations: "Even data bits on rising clock", "Odd data bits on rising clock"
• Uses the “Sign” control line and the sign-change hardware
138
Unsigned twin pipe serial-parallel multipliers
[Figure: 32*32 bit twin pipe serial-parallel multiplier for unsigned number. Two pipes of full adders (FA), half adders (HA) and D flip-flops, clocked on rising_edge and falling_edge, share the parallel inputs A31..A0. One pipe takes the even data bits (B0, B2, ...) and gives the even product; the other takes the odd data bits (B1, B3, ...) and gives the odd product; a MUX merges them into Product. The repeated cell is drawn once with the note "Repeat 28 units more". Control inputs: Reset, Clock.]
Figure annotations: "Even data bits on rising clock", "Odd data bits on rising clock"
• Does not need the “Sign” control line and the sign-change hardware
139