Advanced Dividers Lecture 10

Lecture 10
Advanced Dividers
Required Reading
Behrooz Parhami,
Computer Arithmetic: Algorithms and Hardware Design
Chapter 13, Basic Division Schemes
13.4, Non-Restoring and Signed Division
Chapter 15, Variations in Dividers
15.3, Combinational and Array Dividers
15.6, Combined Multiply/Divide Units
Chapter 16, Division by Convergence
Division versus Multiplication
Division is more complex than multiplication:
Need for quotient digit selection or estimation
Overflow possibility: the high-order k bits of z
must be strictly less than d; this overflow check
also detects the divide-by-zero condition.
Pentium III latencies
Instruction | Latency | Cycles/Issue
Load / Store | 3 | 1
Integer Multiply | 4 | 1
Integer Divide | 36 | 36
Double/Single FP Multiply | 5 | 2
Double/Single FP Add | 3 | 1
Double/Single FP Divide | 38 | 38
The ratios haven’t changed much in later Pentiums, Atom, or AMD products*
*Source: T. Granlund, “Instruction Latencies and Throughput for AMD and Intel x86 Processors,” Feb. 2012
May 2012
Computer Arithmetic, Division
Classification of Dividers
Sequential
    Radix-2
        Restoring
        Non-restoring
            • regular
            • SRT
            • using carry-save adders
            • SRT using carry-save adders
    High-radix
Array Dividers
Dividers by Convergence
Notation and Basic Equations
Notation
z | Dividend | $z_{2k-1} z_{2k-2} \ldots z_2 z_1 z_0$
d | Divisor | $d_{k-1} d_{k-2} \ldots d_1 d_0$
q | Quotient | $q_{k-1} q_{k-2} \ldots q_1 q_0$
s | Remainder ($s = z - dq$) | $s_{k-1} s_{k-2} \ldots s_1 s_0$
Basic Equations of Division
$z = dq + s$
$|s| < |d|$
$\mathrm{sign}(s) = \mathrm{sign}(z)$
$z > 0$: $0 \le s < |d|$
$z < 0$: $-|d| < s \le 0$
Sequential Integer Division
Basic Equations
$s^{(0)} = z$
$s^{(j)} = 2 s^{(j-1)} - q_{k-j} (2^k d)$ for $j = 1, \ldots, k$
$s^{(k)} = 2^k s$
Restoring Unsigned Integer Division
s(0) = z
for j = 1 to k
    if 2 s(j-1) - 2^k d ≥ 0
        q_{k-j} = 1
        s(j) = 2 s(j-1) - 2^k d
    else
        q_{k-j} = 0
        s(j) = 2 s(j-1)
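A minimal Python sketch of the restoring recurrence above, assuming nonnegative integer operands, a k-bit divisor, and the no-overflow condition (high-order k bits of z less than d). The function name restoring_divide is hypothetical, and Python's unbounded integers stand in for the 2k-bit partial-remainder register.

```python
def restoring_divide(z: int, d: int, k: int) -> tuple[int, int]:
    """Restoring unsigned division of a 2k-bit dividend z by a k-bit divisor d.

    Returns (q, s) with z = q*d + s and 0 <= s < d.
    """
    if not (0 < d < 2**k and 0 <= z < 2**(2 * k)):
        raise ValueError("operands out of range")
    if (z >> k) >= d:
        raise ValueError("overflow: high-order k bits of z must be less than d")
    shifted_d = d << k                # 2^k d, aligned with the upper half of s
    s, q = z, 0                       # s(0) = z
    for j in range(1, k + 1):
        trial = 2 * s - shifted_d     # tentative subtraction
        if trial >= 0:                # q_{k-j} = 1: keep the difference
            q |= 1 << (k - j)
            s = trial
        else:                         # q_{k-j} = 0: "restore", keep 2 s(j-1)
            s = 2 * s
    return q, s >> k                  # s(k) = 2^k s, so shift back by k
```

For example, restoring_divide(117, 10, 4) returns (11, 7), matching 117 = 11 × 10 + 7.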
Example of restoring unsigned division
Shift/subtract sequential restoring divider
Non-Restoring Unsigned Integer Division
s(1) = 2 z - 2^k d
for j = 2 to k
    if s(j-1) ≥ 0
        q_{k-(j-1)} = 1
        s(j) = 2 s(j-1) - 2^k d
    else
        q_{k-(j-1)} = 0
        s(j) = 2 s(j-1) + 2^k d
end for
if s(k) ≥ 0
    q_0 = 1
else
    q_0 = 0
Correction step
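Non-Restoring Unsigned Integer Division
Correction step
$s = 2^{-k} \cdot s^{(k)}$
$z = qd + s$, with $z, q, d \ge 0$
If $s < 0$: $z = (q-1)\,d + (s+d)$, i.e. $z = q' d + s'$ with $q' = q - 1$ and $s' = s + d$
Here q is the quotient with $q_0 = 1$; setting $q_0 = 0$ when $s^{(k)} < 0$ (the final test in the algorithm) is exactly the step $q' = q - 1$, so only the remainder still needs $s' = s + d$.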
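The non-restoring steps and the correction step can be sketched in Python as follows, assuming nonnegative integers and no overflow (z < 2^k d). The name nonrestoring_divide is hypothetical; the exhaustive check confirms that it agrees with ordinary integer division for small k.

```python
def nonrestoring_divide(z: int, d: int, k: int) -> tuple[int, int]:
    """Non-restoring unsigned division following the slide's steps.

    Returns (q, s) with z = q*d + s and 0 <= s < d.
    """
    if not (0 < d < 2**k and 0 <= z < (d << k)):
        raise ValueError("operands out of range or overflow")
    shifted_d = d << k                  # 2^k d
    s = 2 * z - shifted_d               # s(1): the first subtraction is unconditional
    q = 0
    for j in range(2, k + 1):
        if s >= 0:                      # q_{k-(j-1)} = 1, subtract next
            q |= 1 << (k - (j - 1))
            s = 2 * s - shifted_d
        else:                           # q_{k-(j-1)} = 0, add instead of restoring
            s = 2 * s + shifted_d
    if s >= 0:
        q |= 1                          # q_0 = 1
    # Correction step: s(k) is an exact multiple of 2^k.
    # Setting q_0 = 0 above already performed q - 1, so only s needs fixing.
    rem = s >> k
    if rem < 0:
        rem += d
    return q, rem
```

For 9 / 10 with k = 4, the final partial remainder is negative (s = -1), so q_0 = 0 and the remainder is corrected to -1 + 10 = 9.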
Example of nonrestoring unsigned division
Partial remainder variations for restoring and nonrestoring division
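Non-Restoring Unsigned Integer Division
Justification
Assume $s^{(j-1)} \ge 0$, $2 s^{(j-1)} - 2^k d < 0$, and $2 (2 s^{(j-1)}) - 2^k d \ge 0$.
Restoring division:
$s^{(j)} = 2 s^{(j-1)}$
$s^{(j+1)} = 2 s^{(j)} - 2^k d = 4 s^{(j-1)} - 2^k d$
Non-Restoring division:
$s^{(j)} = 2 s^{(j-1)} - 2^k d$
$s^{(j+1)} = 2 s^{(j)} + 2^k d = 2 (2 s^{(j-1)} - 2^k d) + 2^k d = 4 s^{(j-1)} - 2^k d$
Both methods reach the same $s^{(j+1)}$, so the restoring step can be skipped.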
Convergence of the Partial Quotient to q
Example: $(0111\,0101)_{two} / (1010)_{two}$, i.e. $(117)_{ten} / (10)_{ten} = (11)_{ten} = (1011)_{two}$
[Figure: partial quotients q(0) through q(4) plotted against iterations 0 to 4, for restoring and nonrestoring division]
In restoring division, the partial quotient converges to q from below.
In nonrestoring division, the partial quotient may overshoot q, but converges to it after some oscillations.
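The two trajectories in this example can be reproduced with a short Python sketch. For the nonrestoring case it assumes quotient digits from {-1, 1}, as in the signed algorithm with a positive divisor, and applies no final correction; the function name partial_quotients is hypothetical.

```python
def partial_quotients(z: int, d: int, k: int) -> tuple[list[int], list[int]]:
    """Partial quotients q(0)..q(k) for restoring and nonrestoring division."""
    shifted_d = d << k                       # 2^k d
    restoring, nonrestoring = [0], [0]       # q(0) = 0 for both
    s, q = z, 0
    for j in range(1, k + 1):                # restoring: digits in {0, 1}
        if 2 * s - shifted_d >= 0:
            q += 1 << (k - j)
            s = 2 * s - shifted_d
        else:
            s = 2 * s
        restoring.append(q)
    s, q = z, 0
    for j in range(1, k + 1):                # nonrestoring: digits in {-1, 1}
        digit = 1 if s >= 0 else -1
        q += digit * (1 << (k - j))
        s = 2 * s - digit * shifted_d
        nonrestoring.append(q)
    return restoring, nonrestoring
```

For 117 / 10 with k = 4, the restoring sequence is 0, 8, 8, 10, 11 (never above q = 11), while the nonrestoring sequence is 0, 8, 12, 10, 11 (overshooting to 12 before settling).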
Non-Restoring Signed Integer Division
s(0) = z
for j = 1 to k
    if sign(s(j-1)) == sign(d)
        q_{k-j} = 1
        s(j) = 2 s(j-1) - 2^k d = 2 s(j-1) - q_{k-j} (2^k d)
    else
        q_{k-j} = -1
        s(j) = 2 s(j-1) + 2^k d = 2 s(j-1) - q_{k-j} (2^k d)
q = BSD_2’s_comp_conversion(q)
Correction_step
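Non-Restoring Signed Integer Division
Correction step
$s = 2^{-k} \cdot s^{(k)}$
$z = qd + s$, and we require $\mathrm{sign}(s) = \mathrm{sign}(z)$
If $s \neq 0$ and $\mathrm{sign}(s) \neq \mathrm{sign}(z)$:
• if $\mathrm{sign}(z) = \mathrm{sign}(d)$: $z = (q-1)\,d + (s+d) = q' d + s'$
• otherwise: $z = (q+1)\,d + (s-d) = q'' d + s''$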
Example of nonrestoring signed division
========================
z            0010 0001
2^4 d      1 1001
-2^4 d     0 0111
========================
s(0)       0 0010 0001
2s(0)      0 0100 001     sign(s(0)) ≠ sign(d), so set q_3 = -1 and add
+2^4 d     1 1001
––––––––––––––––––––––––
s(1)       1 1101 001
2s(1)      1 1010 01      sign(s(1)) = sign(d), so set q_2 = 1 and subtract
+(-2^4 d)  0 0111
––––––––––––––––––––––––
s(2)       0 0001 01
2s(2)      0 0010 1       sign(s(2)) ≠ sign(d), so set q_1 = -1 and add
+2^4 d     1 1001
––––––––––––––––––––––––
s(3)       1 1011 1
2s(3)      1 0111         sign(s(3)) = sign(d), so set q_0 = 1 and subtract
+(-2^4 d)  0 0111
––––––––––––––––––––––––
s(4)       1 1110         sign(s(4)) ≠ sign(z), so perform corrective subtraction
+(-2^4 d)  0 0111
––––––––––––––––––––––––
s(4)       0 0101
s               0101
q          -1 1 -1 1
========================
p =        0 1 0 1
           1 1 0 1 1      Shift, compl MSB
           1 1 0 0        Add 1 to correct
Check: 33/(-7) = -4, remainder 5
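A Python sketch of the signed algorithm and its correction step, assuming |z| < 2^k |d| and treating a zero partial remainder as positive. The function name is hypothetical. One extra check is an addition beyond the slide: when a partial remainder becomes exactly zero, the final remainder can come out as +d or -d, which the sketch folds into the quotient.

```python
def nonrestoring_signed_divide(z: int, d: int, k: int) -> tuple[list[int], int, int]:
    """Returns (BSD digits q_{k-1}..q_0, quotient q, remainder s)."""
    if d == 0 or abs(z) >= (abs(d) << k):
        raise ValueError("divide by zero or overflow")
    shifted_d = d << k                        # 2^k d (<< preserves the sign)
    s, digits = z, []
    for _ in range(k):
        same_sign = (s >= 0) == (d >= 0)      # zero is treated as positive
        digit = 1 if same_sign else -1
        s = 2 * s - digit * shifted_d         # s(j) = 2 s(j-1) - q_{k-j} (2^k d)
        digits.append(digit)
    q = sum(digit * (1 << (k - 1 - i)) for i, digit in enumerate(digits))
    s >>= k                                   # s = 2^-k s(k), exact
    # Correction step: the remainder must have the sign of the dividend.
    if s != 0 and (s >= 0) != (z >= 0):
        if (z >= 0) == (d >= 0):
            q, s = q - 1, s + d
        else:
            q, s = q + 1, s - d
    # Extra check (not on the slide): a zero partial remainder can leave s = +/-d.
    if s == d:
        q, s = q + 1, 0
    elif s == -d:
        q, s = q - 1, 0
    return digits, q, s
```

Calling nonrestoring_signed_divide(33, -7, 4) reproduces the example: digits [-1, 1, -1, 1], corrected quotient -4, remainder 5.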
BSD → 2’s Complement Conversion
$q = (q_{k-1} q_{k-2} \ldots q_1 q_0)_{BSD} = (\bar{p}_{k-1}\, p_{k-2} \ldots p_1 p_0 1)_{\text{2's complement}}$
where
q_i = -1  →  p_i = 0
q_i = 1   →  p_i = 1
No overflow if $p_{k-2} = \bar{p}_{k-1}$ (i.e., $q_{k-1} \neq q_{k-2}$)
Example:
q_BSD = 1 -1 1 1
p = 1 0 1 1
q_2’s comp = 0 0111 = 0111
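The conversion rule can be checked in Python: map each digit to p_i, shift left while appending a 1, then complement the MSB. The helper names below are hypothetical, and the result is returned as a raw (k+1)-bit pattern.

```python
def bsd_to_twos_complement(digits: list[int]) -> int:
    """Convert BSD digits q_{k-1}..q_0 (each -1 or 1) to a (k+1)-bit two's-complement pattern."""
    k = len(digits)
    p = 0
    for digit in digits:                   # p_i = 0 for q_i = -1, p_i = 1 for q_i = 1
        p = (p << 1) | ((digit + 1) // 2)
    pattern = (p << 1) | 1                 # shift left and append a 1
    return pattern ^ (1 << k)              # complement the MSB (the old p_{k-1})

def signed_value(pattern: int, bits: int) -> int:
    """Interpret a bit pattern as a two's-complement number of the given width."""
    return pattern - (1 << bits) if (pattern >> (bits - 1)) & 1 else pattern

def fits_in_k_bits(pattern: int, k: int) -> bool:
    """No overflow when the top two bits of the (k+1)-bit result agree."""
    return ((pattern >> k) & 1) == ((pattern >> (k - 1)) & 1)
```

For the slide's example, [1, -1, 1, 1] converts to 0b00111 (= 7); for the signed-division quotient [-1, 1, -1, 1], the pattern 0b11011 reads as -5 in 5 bits.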
Nonrestoring Hardware Divider
[Figure: nonrestoring hardware divider with quotient, partial remainder, and divisor registers and a k-bit adder; labeled signals include q_{k-j}, MSB of 2s^{(j-1)}, divisor sign, add/sub, complement, cout, cin, and complement of partial remainder sign]
Multiply/Divide Unit
[Figure: shared datapath with registers for the partial product p or partial remainder s, the multiplier x or quotient q, and the multiplicand a or divisor d; a multiplexer (Mux 1) and an adder; multiply/divide control driven by x_j, q_{k-j}, the MSB of 2s^{(j-1)}, the divisor sign, and the MSB of p^{(j+1)}, with shift control]
The control unit proceeds through the necessary steps for multiplication or division (including using the appropriate shift direction)
The slight speed penalty owing to a more complex control unit is insignificant
Fig. 15.9 Sequential radix-2 multiply/divide unit.
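Fractional Division
Unsigned Fractional Division
$z_{frac}$ | Dividend | $.z_{-1} z_{-2} \ldots z_{-(2k-1)} z_{-2k}$
$d_{frac}$ | Divisor | $.d_{-1} d_{-2} \ldots d_{-(k-1)} d_{-k}$
$q_{frac}$ | Quotient | $.q_{-1} q_{-2} \ldots q_{-(k-1)} q_{-k}$
$s_{frac}$ | Remainder | $.000\ldots0\, s_{-(k+1)} \ldots s_{-(2k-1)} s_{-2k}$ (the leading $000\ldots0$ is k bits)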
Integer vs. Fractional Division
For Integers:
$z = qd + s$
Multiplying both sides by $2^{-2k}$:
$z\,2^{-2k} = (q\,2^{-k})(d\,2^{-k}) + s\,2^{-2k}$
For Fractions:
$z_{frac} = q_{frac}\,d_{frac} + s_{frac}$
where
$z_{frac} = z\,2^{-2k}$
$d_{frac} = d\,2^{-k}$
$q_{frac} = q\,2^{-k}$
$s_{frac} = s\,2^{-2k}$
Unsigned Fractional Division Overflow
Condition for no overflow:
$z_{frac} < d_{frac}$
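The scaling identities and the no-overflow condition can be checked exactly with rational arithmetic. This sketch uses Python's fractions.Fraction; the function name integer_to_fractional is hypothetical and assumes nonnegative operands with a k-bit divisor.

```python
from fractions import Fraction

def integer_to_fractional(z: int, d: int, k: int) -> dict[str, Fraction]:
    """Scale an integer division z = q d + s into fractional form (0 <= z, 0 < d < 2^k)."""
    q, s = divmod(z, d)
    return {
        "z_frac": Fraction(z, 2**(2 * k)),   # z 2^-2k
        "d_frac": Fraction(d, 2**k),         # d 2^-k
        "q_frac": Fraction(q, 2**k),         # q 2^-k
        "s_frac": Fraction(s, 2**(2 * k)),   # s 2^-2k
    }
```

With z = 117, d = 10, k = 4 this gives z_frac = 117/256, d_frac = 5/8, q_frac = 11/16, s_frac = 7/256, and z_frac < d_frac holds exactly when the quotient fits in k bits (q_frac < 1).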
Sequential Fractional Division
Basic Equations
$s^{(0)} = z_{frac}$
$s^{(j)} = 2 s^{(j-1)} - q_{-j}\,d_{frac}$ for $j = 1, \ldots, k$
$2^k \cdot s_{frac} = s^{(k)}$
$s_{frac} = 2^{-k} \cdot s^{(k)}$
Fig. 13.2 Examples of sequential division with integer and fractional operands.
Array Dividers
Sequential Fractional Division
Basic Equations
$s^{(0)}_{frac} = z_{frac}$
$s^{(j)} = 2 s^{(j-1)} - q_{-j}\,d_{frac}$
$s^{(k)}_{frac} = 2^k s_{frac}$
Restoring Unsigned Fractional Division
s(0) = z
for j = 1 to k
    if 2 s(j-1) - d ≥ 0
        q_{-j} = 1
        s(j) = 2 s(j-1) - d
    else
        q_{-j} = 0
        s(j) = 2 s(j-1)
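The fractional version differs from the integer one only in alignment: d is subtracted directly and each quotient digit has weight 2^-j. A sketch with exact fractions, assuming 0 ≤ z < d < 1 (no overflow); the function name is hypothetical.

```python
from fractions import Fraction

def restoring_fractional_divide(z: Fraction, d: Fraction, k: int) -> tuple[Fraction, Fraction]:
    """Restoring division of fractions 0 <= z < d < 1; returns (q_frac, s_frac)."""
    if not (0 <= z < d < 1):
        raise ValueError("need 0 <= z < d < 1 (no overflow)")
    s, q = z, Fraction(0)            # s(0) = z
    for j in range(1, k + 1):
        trial = 2 * s - d
        if trial >= 0:               # q_{-j} = 1
            q += Fraction(1, 2**j)
            s = trial
        else:                        # q_{-j} = 0, keep 2 s(j-1)
            s = 2 * s
    return q, s / 2**k               # s_frac = 2^-k s(k)
```

The fractional form of the earlier example, z = 117/256 and d = 10/16, yields q = 11/16 and s = 7/256.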
Restoring Array Divider
[Figure: restoring array divider with dividend z = .z_{-1} z_{-2} z_{-3} z_{-4} z_{-5} z_{-6}, divisor d = .d_{-1} d_{-2} d_{-3}, quotient q = .q_{-1} q_{-2} q_{-3}, and remainder s = .0 0 0 s_{-4} s_{-5} s_{-6}; each cell contains a full subtractor (FS)]
Non-Restoring Unsigned Fractional Division
s(-1) = z - d
for j = 0 to k-1
    if s(j-1) ≥ 0
        q_{-j} = 1
        s(j) = 2 s(j-1) - d
    else
        q_{-j} = 0
        s(j) = 2 s(j-1) + d
end for
if s(k-1) ≥ 0
    q_{-k} = 1
else
    q_{-k} = 0
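A Python sketch of the non-restoring fractional steps above, with exact fractions and the assumption 0 ≤ z < d < 1, so that q_0 comes out 0. The function name is hypothetical, and the final remainder correction (adding d back when s(k-1) < 0) is added here by analogy with the integer correction step; it is not part of the pseudocode.

```python
from fractions import Fraction

def nonrestoring_fractional_divide(z: Fraction, d: Fraction, k: int) -> tuple[Fraction, Fraction]:
    """Non-restoring division of fractions 0 <= z < d < 1; returns (q, s_frac)."""
    if not (0 <= z < d < 1):
        raise ValueError("need 0 <= z < d < 1 (no overflow)")
    s = z - d                        # s(-1)
    q = Fraction(0)
    for j in range(0, k):            # decide q_{-j} from s(j-1), then form s(j)
        if s >= 0:
            q += Fraction(1, 2**j)
            s = 2 * s - d
        else:
            s = 2 * s + d
    if s >= 0:                       # s now holds s(k-1)
        q += Fraction(1, 2**k)       # q_{-k} = 1
    else:
        s += d                       # assumed correction: restore a nonnegative remainder
    return q, s / 2**k
```

Note that the sign tests use ≥ 0: with a strict > 0, an exact division such as z = 1/4, d = 1/2 would produce q_{-1} = 0 instead of 1.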
Nonrestoring Array Divider
[Figure: nonrestoring array divider with dividend z = z_0 .z_{-1} z_{-2} z_{-3} z_{-4} z_{-5} z_{-6}, divisor d = d_0 .d_{-1} d_{-2} d_{-3}, quotient q = q_0 .q_{-1} q_{-2} q_{-3}, and remainder s = 0 .0 0 s_{-3} s_{-4} s_{-5} s_{-6}; each cell contains an XOR gate and a full adder (FA), and the critical path is marked]
Similarity to array multiplier is deceiving
Division by Convergence
Division by Convergence
Chapter Goals
Show how by using multiplication as the
basic operation in each division step,
the number of iterations can be reduced
Chapter Highlights
Digit-recurrence as convergence method
Convergence by Newton-Raphson iteration
Computing the reciprocal of a number
Hardware implementation and fine tuning
16.1 General Convergence Methods
Sequential digit-at-a-time (binary or high-radix) division
can be viewed as a convergence scheme
As each new digit of q = z / d is determined, the quotient value
is refined, until it reaches the final correct value
Convergence is from below in restoring division and oscillating
in nonrestoring division
Meanwhile, the remainder $s = z - qd$ approaches 0; the scaled remainder is kept in a certain range, such as $[-d, d)$
[Figure: quotient value versus digit position, converging to q = 0.101101]
Elaboration on Scaled Remainder in Division
The partial remainder $s^{(j)}$ in the division recurrence isn’t the true remainder but a version scaled by $2^j$
Division with left shifts:
$s^{(j)} = 2 s^{(j-1)} - q_{k-j} (2^k d)$ (shift: $2 s^{(j-1)}$; subtract: $q_{k-j} (2^k d)$)
with $s^{(0)} = z$ and $s^{(k)} = 2^k s$
Quotient digit selection keeps the scaled remainder bounded (say, in the range $-d$ to $d$) to ensure the convergence of the true remainder to 0
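To make the scaling concrete, the sketch below runs the restoring recurrence and records, for each step, the partial remainder s(j) and a partial quotient Q_j holding the digits chosen so far (Q_j is a name introduced here). It checks that s(j) = 2^j (z - Q_j d): dividing the scaled remainder by 2^j gives the true remainder, which shrinks toward 0. The function name is hypothetical.

```python
def scaled_remainder_trace(z: int, d: int, k: int) -> list[tuple[int, int, int]]:
    """Restoring recurrence; returns (j, s(j), Q_j) where Q_j holds the digits chosen so far."""
    shifted_d = d << k                            # 2^k d
    s, partial_q = z, 0
    trace = [(0, s, partial_q)]
    for j in range(1, k + 1):
        digit = 1 if 2 * s - shifted_d >= 0 else 0   # q_{k-j}
        s = 2 * s - digit * shifted_d             # s(j) = 2 s(j-1) - q_{k-j} (2^k d)
        partial_q += digit << (k - j)
        trace.append((j, s, partial_q))
    return trace
```

For 117 / 10 with k = 4, the scaled remainders are 117, 74, 148, 136, 112, while the true remainders s(j) / 2^j are 117, 37, 37, 17, 7.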
Recurrence Formulas for Convergence Methods
$u^{(i+1)} = f(u^{(i)}, v^{(i)})$ (converges to a constant)
$v^{(i+1)} = g(u^{(i)}, v^{(i)})$ (converges to the desired function)
$u^{(i+1)} = f(u^{(i)}, v^{(i)}, w^{(i)})$
$v^{(i+1)} = g(u^{(i)}, v^{(i)}, w^{(i)})$
$w^{(i+1)} = h(u^{(i)}, v^{(i)}, w^{(i)})$
Guide the iteration such that one of the values converges
to a constant (usually 0 or 1)
The other value then converges to the desired function
The complexity of this method depends on two factors:
a. Ease of evaluating f and g (and h)
b. Rate of convergence (number of iterations needed)
16.2 Division by Repeated Multiplications
Motivation: Suppose add takes 1 clock and multiply 3 clocks
64-bit divide takes 64 clocks in radix 2, 32 in radix 4
⇒ Division via multiplications is faster if 10 or fewer multiplications are needed
Idea:
$q = \frac{z}{d} = \frac{z\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}$
The numerator converges to q; the denominator is forced to 1
Remainder often not needed, but can be obtained
by another multiplication if desired: s = z – qd
To turn the identity into a division algorithm, we face three questions:
1. How to select the multipliers x(i) ?
2. How many iterations (pairs of multiplications)?
3. How to implement in hardware?
Formulation as a Convergence Computation
Idea:
$q = \frac{z}{d} = \frac{z\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}$ (numerator converges to q; denominator forced to 1)
$d^{(i+1)} = d^{(i)}\,x^{(i)}$
$z^{(i+1)} = z^{(i)}\,x^{(i)}$
Set $d^{(0)} = d$; make $d^{(m)}$ converge to 1
Set $z^{(0)} = z$; obtain $z/d = q \cong z^{(m)}$
Question 1: How to select the multipliers $x^{(i)}$?
$x^{(i)} = 2 - d^{(i)}$
This choice transforms the recurrence equations into:
$d^{(i+1)} = d^{(i)} (2 - d^{(i)})$
$z^{(i+1)} = z^{(i)} (2 - d^{(i)})$
which fits the general form $u^{(i+1)} = f(u^{(i)}, v^{(i)})$, $v^{(i+1)} = g(u^{(i)}, v^{(i)})$
Set $d^{(0)} = d$; iterate until $d^{(m)} \cong 1$
Set $z^{(0)} = z$; obtain $z/d = q \cong z^{(m)}$
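A floating-point sketch of division by repeated multiplications with x(i) = 2 - d(i) (often called Goldschmidt division), assuming a normalized divisor 1/2 ≤ d < 1 and a caller-chosen iteration count. The function name is hypothetical, and Python floats stand in for fixed-point hardware, so results are only accurate to double-precision rounding.

```python
def goldschmidt_divide(z: float, d: float, iterations: int) -> float:
    """Division by repeated multiplications with x(i) = 2 - d(i).

    Assumes a normalized divisor 1/2 <= d < 1; returns z(m), an approximation of z/d.
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("divisor must be normalized to [1/2, 1)")
    for _ in range(iterations):
        x = 2.0 - d        # x(i) = 2 - d(i)
        d = d * x          # d(i+1) = d(i) x(i), converges to 1
        z = z * x          # z(i+1) = z(i) x(i), converges to q
    return z
```

Each loop iteration is one pair of multiplications plus one 2's complementation; six iterations are enough here because the error 1 - d(i) is squared each time.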
Determining the Rate of Convergence
$d^{(i+1)} = d^{(i)}\,x^{(i)}$
$z^{(i+1)} = z^{(i)}\,x^{(i)}$
Set $d^{(0)} = d$; make $d^{(m)}$ converge to 1
Set $z^{(0)} = z$; obtain $z/d = q \cong z^{(m)}$
Question 2: How quickly does $d^{(i)}$ converge to 1?
We can relate the error in step i + 1 to the error in step i:
$d^{(i+1)} = d^{(i)} (2 - d^{(i)}) = 1 - (1 - d^{(i)})^2$
$1 - d^{(i+1)} = (1 - d^{(i)})^2$
For $1 - d^{(i)} \le \varepsilon$, we get $1 - d^{(i+1)} \le \varepsilon^2$: quadratic convergence
In general, for k-bit operands, we need $2m - 1$ multiplications and m 2’s complementations, where $m = \lceil \log_2 k \rceil$
Quadratic Convergence
Table 16.1 Quadratic convergence in computing z/d by repeated multiplications, where 1/2 ≤ d = 1 – y < 1
–––––––––––––––––––––––––––––––––––––––––––––––––––––––
i | d^(i) = d^(i–1) x^(i–1), with d^(0) = d | x^(i) = 2 – d^(i)
–––––––––––––––––––––––––––––––––––––––––––––––––––––––
0 | 1 – y = (.1xxx xxxx xxxx xxxx)two ≥ 1/2 | 1 + y
1 | 1 – y^2 = (.11xx xxxx xxxx xxxx)two ≥ 3/4 | 1 + y^2
2 | 1 – y^4 = (.1111 xxxx xxxx xxxx)two ≥ 15/16 | 1 + y^4
3 | 1 – y^8 = (.1111 1111 xxxx xxxx)two ≥ 255/256 | 1 + y^8
4 | 1 – y^16 = (.1111 1111 1111 1111)two = 1 – ulp |
–––––––––––––––––––––––––––––––––––––––––––––––––––––––
Each iteration doubles the number of guaranteed leading 1s (convergence to 1 is from below)
Beginning with a single 1 (d ≥ ½), after $\lceil \log_2 k \rceil$ iterations we get as close to 1 as is possible in a fractional representation
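Table 16.1 can be checked with exact rational arithmetic: iterate d(i+1) = d(i)(2 - d(i)) and count the leading 1 bits of each d(i). This is a sketch with hypothetical helper names; the worst case d = 1/2 gives exactly 1, 2, 4, 8, 16 leading ones.

```python
from fractions import Fraction

def leading_ones(x: Fraction, width: int) -> int:
    """Number of leading 1 bits (up to width) in x = (.b1 b2 ...)two, 0 <= x < 1."""
    count = 0
    for _ in range(width):
        x *= 2
        if x >= 1:
            count += 1
            x -= 1
        else:
            break
    return count

def convergence_table(d: Fraction, iterations: int) -> list[Fraction]:
    """Return d(0), d(1), ..., d(iterations) with d(i+1) = d(i) (2 - d(i))."""
    rows = []
    for _ in range(iterations + 1):
        rows.append(d)
        d = d * (2 - d)
    return rows
```

For d = 1/2 the rows are 1/2, 3/4, 15/16, 255/256, 65535/65536, matching the table's lower bounds.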
Graphical Depiction of Convergence to q
[Figure: d^{(i)} rises from d toward 1 – ulp while z^{(i)} rises from z toward q – ε, over iterations i = 0 to 6]
Fig. 16.1 Graphical representation of convergence in division by repeated multiplications.
16.5 Hardware Implementation
Repeated multiplications: each pair of operations involves the same multiplier $x^{(i)}$
$d^{(i+1)} = d^{(i)} (2 - d^{(i)})$
$z^{(i+1)} = z^{(i)} (2 - d^{(i)})$
Set $d^{(0)} = d$; iterate until $d^{(m)} \cong 1$
Set $z^{(0)} = z$; obtain $z/d = q \cong z^{(m)}$
[Figure: pipeline timing of the products d^{(i)} x^{(i)}, z^{(i)} x^{(i)}, d^{(i+1)} x^{(i+1)}, z^{(i+1)} x^{(i+1)}, with a 2's complementer forming the next multiplier]
Fig. 16.6 Two multiplications fully overlapped in a 2-stage pipelined multiplier.
16.3 Division by Reciprocation
The Newton-Raphson method can be used for finding a root of $f(x) = 0$
Start with an initial estimate $x^{(0)}$ for the root
Iteratively refine the estimate via the recurrence
$x^{(i+1)} = x^{(i)} - f(x^{(i)}) / f'(x^{(i)})$
Justification:
$\tan \alpha^{(i)} = f'(x^{(i)}) = f(x^{(i)}) / (x^{(i)} - x^{(i+1)})$
[Figure: the tangent to f(x) at x^{(i)} crosses the x axis at x^{(i+1)}; successive estimates x^{(i)}, x^{(i+1)}, x^{(i+2)} approach the root]
Fig. 16.2 Convergence to a root of f(x) = 0 in the Newton-Raphson method.
Computing 1/d by Convergence
1/d is the root of $f(x) = 1/x - d$
$f'(x) = -1/x^2$
Substitute in the Newton-Raphson recurrence $x^{(i+1)} = x^{(i)} - f(x^{(i)}) / f'(x^{(i)})$ to get:
$x^{(i+1)} = x^{(i)} (2 - x^{(i)} d)$
[Figure: graph of f(x) = 1/x – d, which crosses zero at x = 1/d and tends to –d]
One iteration = Two multiplications + One 2’s complementation
Error analysis: Let $\delta^{(i)} = 1/d - x^{(i)}$ be the error at the ith iteration
$\delta^{(i+1)} = 1/d - x^{(i+1)} = 1/d - x^{(i)} (2 - x^{(i)} d) = d (1/d - x^{(i)})^2 = d (\delta^{(i)})^2$
Because $d < 1$, we have $\delta^{(i+1)} < (\delta^{(i)})^2$
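A minimal sketch of the reciprocal iteration using exact fractions, so the error identity δ(i+1) = d (δ(i))² can be checked with exact equality rather than a tolerance. The function name reciprocal_newton is hypothetical.

```python
from fractions import Fraction

def reciprocal_newton(d: Fraction, x0: Fraction, iterations: int) -> list[Fraction]:
    """Newton-Raphson estimates of 1/d using x(i+1) = x(i) (2 - x(i) d)."""
    estimates = [x0]
    for _ in range(iterations):
        x = estimates[-1]
        estimates.append(x * (2 - x * d))   # two multiplications, one 2's complementation
    return estimates
```

With d = 3/4 and x(0) = 3/2, the errors 1/d - x(i) are -1/6, 1/48, 1/3072, ...; after the first step the error d(δ)² is never negative, so the estimates approach 1/d from below.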
Choosing the Initial Approximation to 1/d
With $x^{(0)}$ in the range $0 < x^{(0)} < 2/d$, convergence is guaranteed
Justification:
$|\delta^{(0)}| = |x^{(0)} - 1/d| < 1/d$
$|\delta^{(1)}| = |x^{(1)} - 1/d| = d (\delta^{(0)})^2 = (d\,|\delta^{(0)}|)\,|\delta^{(0)}| < |\delta^{(0)}|$
For d in [1/2, 1):
Simple choice: $x^{(0)} = 1.5$, max error $= 0.5 < 1/d$
Better approx.: $x^{(0)} = 4(\sqrt{3} - 1) - 2d = 2.9282 - 2d$, max error $\approx 0.1$
[Figure: 1/x for x in [1/2, 1) together with the simple and the better initial approximations]
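A rough numerical check of the two quoted maximum errors, sampling d on a uniform grid over [1/2, 1); the grid size and all function names are arbitrary choices made for this sketch.

```python
import math

def max_initial_error(approx, samples: int = 100_000) -> float:
    """Largest |1/d - approx(d)| over a uniform grid of d in [1/2, 1)."""
    worst = 0.0
    for n in range(samples):
        d = 0.5 + 0.5 * n / samples
        worst = max(worst, abs(1.0 / d - approx(d)))
    return worst

def simple_choice(d: float) -> float:
    return 1.5

def better_choice(d: float) -> float:
    return 4 * (math.sqrt(3) - 1) - 2 * d      # = 2.9282 - 2d
```

The simple choice reaches its maximum error of 0.5 at d = 1/2. The linear choice peaks near d = 1/√2, where the error is 4(√3 - 1) - 2√2 ≈ 0.0998, consistent with "max error ≈ 0.1".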
16.4 Speedup of Convergence Division
$q = \frac{z}{d} = \frac{z\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}$
Compute y = 1/d
Do the multiplication yz
Division can be performed via $2 \lceil \log_2 k \rceil - 1$ multiplications
This is not yet very impressive:
64-bit numbers, 3-ns multiplier ⇒ 33-ns division
Three types of speedup are possible:
Fewer multiplications (reduce m)
Narrower multiplications (reduce the width of some x(i)s)
Faster multiplications
Initial Approximation via Table Lookup
Convergence is slow in the beginning: it takes 6 multiplications to get 8 bits of convergence and another 5 to go from 8 bits to 64 bits
$d\, x^{(0)} x^{(1)} x^{(2)} = (0.1111\,1111 \ldots)_{two}$
Read the product $x^{(0)} x^{(1)} x^{(2)}$, a better approximation to 1/d called $x^{(0+)}$, directly from a table, thereby reducing 6 multiplications to 2
A $2^w \times w$ lookup table is necessary and sufficient for w bits of convergence after 2 multiplications
Example with 4-bit lookup: $d = 0.1011\,xxxx \ldots$ ($11/16 \le d < 12/16$)
Inverses of the two extremes are $16/11 \cong 1.0111$ and $16/12 \cong 1.0101$
So, 1.0110 is a good estimate for 1/d
$1.0110 \times 0.1011 = (11/8) \times (11/16) = 121/128 = 0.1111001$
$1.0110 \times 0.1100 = (11/8) \times (3/4) = 33/32 = 1.000010$
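The 4-bit lookup example can be checked with exact fractions. This sketch covers only the single table entry 1.0110 from the example (how a full table is built is not specified here); TABLE_ENTRY and lookup_then_iterate are hypothetical names. It verifies the two extreme products and that one more iteration squares the remaining error.

```python
from fractions import Fraction

# Table entry for divisors d = 0.1011 xxxx... (11/16 <= d < 12/16), as in the example:
# 1.0110 (binary) = 11/8.
TABLE_ENTRY = Fraction(0b10110, 2**4)

def lookup_then_iterate(d: Fraction, extra_steps: int) -> Fraction:
    """Return d * x(0+) followed by `extra_steps` iterations dd <- dd (2 - dd)."""
    if not (Fraction(11, 16) <= d < Fraction(12, 16)):
        raise ValueError("this single table entry only covers 11/16 <= d < 12/16")
    dd = d * TABLE_ENTRY          # first multiplication replaces several iterations
    for _ in range(extra_steps):
        dd = dd * (2 - dd)         # one more iteration: 1 - dd is squared
    return dd
```

Over the whole interval, d · x(0+) stays within 7/128 of 1 (better than 4 bits), and one further iteration brings it within (7/128)² < 1/256 of 1.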
Visualizing the Convergence with Table Lookup
[Figure: d rises toward 1 – ulp and z toward q – ε; the table lookup and 1st pair of multiplications replace several iterations, followed by the 2nd pair of multiplications]
Fig. 16.3 Convergence in division by repeated multiplications with initial table lookup.
Convergence Does Not Have to Be from Below
[Figure: d approaches 1 ± ulp and z approaches q ± ε from either side over the iterations]
Fig. 16.4 Convergence in division by repeated multiplications with initial table lookup and the use of truncated multiplicative factors.
Sequential Dividers with Carry-Save Adders
Block diagram of a radix-2 SRT divider with partial remainder in stored-carry form
Pentium bug (1)
October 1994
Thomas Nicely, Lynchburg College, Virginia
finds an error in his computer calculations, and traces
it back to the Pentium processor
November 7, 1994
First press announcement, Electronic Engineering Times
Late 1994
Tim Coe, Vitesse Semiconductor
presents an example with the worst-case error
c = 4 195 835/3 145 727
Pentium result = 1.333 739 06...
Correct result = 1.333 820 44...
Pentium bug (2)
Intel admits “subtle flaw”
November 30, 1994
Intel’s white paper about the bug and its possible consequences
Intel - average spreadsheet user affected once in 27,000 years
IBM - average spreadsheet user affected once every 24 days
Replacements based on customer needs
December 20, 1994
Announcement of no-question-asked replacements
Pentium bug (3)
Error traced back to the look-up table used by
the radix-4 SRT division algorithm
2048 cells, 1066 non-zero values {-2, -1, 1, 2}
5 non-zero values not downloaded correctly
to the lookup table due to an error in the C script
Follow-up Courses
DIGITAL SYSTEMS DESIGN
1. ECE 681 VLSI Design for ASICs
(Fall semesters)
H. Homayoun, project/lab, front-end and back-end
ASIC design with Synopsys tools
2. ECE 699 Digital Signal Processing Hardware Architectures
(Fall semesters)
A. Cohen, project, FPGA design for DSP
3. ECE 682 VLSI Test Concepts
(Spring semesters)
T. Storey, homework
NETWORK AND SYSTEM SECURITY
1. ECE 646 Cryptography and Computer Network Security
(Fall semesters)
K. Gaj, hardware, software, or analytical project
2. ECE 746 Advanced Applied Cryptography
(Spring semesters)
J.-P. Kaps, hardware, software, or analytical project
3. ECE 899 Cryptographic Engineering
(Spring semesters)
J.-P. Kaps, research-oriented project