A Fast Hardware Approach for
Approximate, Efficient Logarithm
and Anti-logarithm Computation
Suganth Paul
Nikhil Jayakumar
Sunil P. Khatri
Department of Electrical and Computer Engineering
Texas A&M University, College Station

Introduction
• The fast generation of functions such as the logarithm and antilogarithm is important in areas such as DSP, computer graphics, scientific computing, artificial neural networks and logarithmic number systems.
• Various hardware approaches have been proposed in the past to accurately approximate the logarithm and antilogarithm functions.
• Of these approaches, look-up table (LUT) based methods such as the Brubaker, Maenner, Kmetz and SBTM methods are widely used.
• Some hardware approaches combine LUTs with polynomial approximations, but these need multiplications or divisions.
• Our approach combines an LUT with linear interpolation, implemented in an area- and delay-efficient manner.
• The novelty of our approach is that we need no multiplier or divider to perform the interpolation. We also use the same hardware structure to implement both log and antilog.
• The number format used for the computation is
$N = 2^{e}(1 + m)$
where m, with $0 \le m < 1$, is the mantissa and e is the exponent.
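As a minimal software sketch of this number format (an illustration, not part of the hardware), a positive floating-point value can be split into e and m with Python's standard `math.frexp`. The helper name `decompose` is hypothetical. The sketch ignores zero, negative and non-finite inputs.

```python
import math


def decompose(x: float) -> tuple[int, float]:
    """Split a positive, finite x into (e, m) with x == 2**e * (1 + m) and 0 <= m < 1."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("x must be positive and finite")
    frac, exp = math.frexp(x)  # x == frac * 2**exp with 0.5 <= frac < 1
    # Rescale so the mantissa carries the implicit leading one: 1 + m == 2 * frac
    return exp - 1, 2.0 * frac - 1.0


print(decompose(10.0))  # (3, 0.25), since 10 = 2**3 * (1 + 0.25)
```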
Mitchell Approximation
The logarithm of a number is found as
$N = 2^{e}(1 + m)$
$\log_2(N) = e + \log_2(1 + m)$
Mitchell's approximation is given by
$\log_2(N) \approx e + m$
where
$\log_2(1 + m) \approx m$
The error due to this approximation is
$E_m = \log_2(1 + m) - m$
[Figure: plot of the Mitchell error E_m against m]
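For a quick numerical check of Mitchell's approximation and its error, the sketch below samples $E_m$ on a fine grid. The helper names `mitchell_log2` and `mitchell_error` are hypothetical. Setting $dE_m/dm = 1/((1+m)\ln 2) - 1 = 0$ shows the maximum lies at $m = 1/\ln 2 - 1 \approx 0.443$, where $E_m \approx 0.086$.

```python
import math


def mitchell_log2(x: float) -> float:
    """Mitchell's approximation: log2(2**e * (1 + m)) ~= e + m, for positive x."""
    frac, exp = math.frexp(x)          # x == frac * 2**exp, 0.5 <= frac < 1
    e, m = exp - 1, 2.0 * frac - 1.0   # x == 2**e * (1 + m), 0 <= m < 1
    return e + m


def mitchell_error(m: float) -> float:
    """E_m = log2(1 + m) - m, the error of Mitchell's approximation."""
    return math.log2(1.0 + m) - m


# Sample E_m on a grid of m in [0, 1); it is zero at m = 0 and never negative
samples = [mitchell_error(i / 4096) for i in range(4096)]
print(max(samples))  # about 0.0861, near m = 1/ln(2) - 1
```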
Kmetz Approximation
• In the Kmetz method, the Mitchell error curve $E_m$ is sampled at $2^{t}$ points and stored in an LUT.
• The LUT is indexed by the first t bits of the mantissa m.
• If the error value looked up from the LUT is a, the logarithm is found as
$\log_2(N) \approx e + \log_2(1 + m)$
where
$\log_2(1 + m) \approx m + a$
• The error due to approximating the logarithm of the mantissa portion in this case is
$E_k = \log_2(1 + m) - (m + a)$
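The Kmetz lookup can be modelled in software to see how much the LUT reduces the error. This is a sketch under assumptions: t = 6 (a 64-entry table) is a hypothetical choice, and `KMETZ_LUT` and `kmetz_log2_mantissa` are hypothetical names. Because the slope of $E_m$ is at most $1/\ln 2 - 1 \approx 0.443$ (at m = 0), truncating m to its first t bits leaves an error of at most about $0.443 / 2^{t}$.

```python
import math

T = 6  # hypothetical: t = 6 leading mantissa bits, i.e. a 2**6 = 64-entry LUT

# Kmetz LUT: Mitchell error E_m sampled at m = i / 2**t for i = 0 .. 2**t - 1
KMETZ_LUT = [math.log2(1.0 + i / 2**T) - i / 2**T for i in range(2**T)]


def kmetz_log2_mantissa(m: float) -> float:
    """Approximate log2(1 + m) as m + a, where a is selected by the first t bits of m."""
    i = int(m * 2**T)  # first t bits of the mantissa (truncation), 0 <= i < 2**t
    a = KMETZ_LUT[i]
    return m + a


# Worst-case |E_k| over a grid of 16-bit mantissas
worst = max(
    abs(math.log2(1 + j / 2**16) - kmetz_log2_mantissa(j / 2**16))
    for j in range(2**16)
)
print(worst)  # roughly 0.0067
```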

Our Approach
• In our method we interpolate between values stored in the LUT to get a more accurate result.
• The logarithm of the mantissa part of the number is obtained as
$\log_2(1 + m) \approx m + a + \frac{(b - a)\, n}{2^{k - t}}$
where
a is the error value from the LUT at location i
t is the number of leading bits of the mantissa that index the table
b is the next value in the LUT, at location i + 1
k is the total number of bits used to represent the mantissa
n is the decimal value of the last k − t bits of the mantissa
• The multiplication step $(b - a) \cdot n$ is found as
$\mathrm{antilog}_2(\log_2(b - a) + \log_2(n))$
• $\log_2(n)$ is found using the same LUT as above.
• We consider two alternative approximations for each of $\log_2(b - a)$ and $\mathrm{antilog}_2(\log_2(b - a) + \log_2(n))$; their errors are compared below.
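The interpolation formula above can be checked with a reference model that still uses an ordinary multiply for $(b - a) \cdot n$. This sketch assumes hypothetical values k = 13 and t = 6, hypothetical names `LUT` and `interp_log2_mantissa`, and an extra table entry at $i = 2^{t}$ (equal to $E_m(1) = 0$) so that b exists for the last interval. The slides do not say how that last interval is handled. Because m is linear in n, the formula is exactly linear interpolation of $\log_2(1+m)$, so its error is bounded by $h^2/8 \cdot \max|f''| = 2^{-12}/8 \cdot 1/\ln 2 \approx 4.4 \times 10^{-5}$.

```python
import math

K = 13  # hypothetical: total number of mantissa bits k fed to the interpolator
T = 6   # hypothetical: leading bits t indexing the table (2**t = 64 entries)

# Error table a_i = E_m(i / 2**t). The extra entry at i = 2**t (E_m(1) = 0) is an
# assumption of this sketch, so that b = LUT[i + 1] exists for the last interval too.
LUT = [math.log2(1 + i / 2**T) - i / 2**T for i in range(2**T + 1)]


def interp_log2_mantissa(mbits: int) -> float:
    """Approximate log2(1 + m) for a k-bit mantissa field, where m = mbits / 2**k."""
    i = mbits >> (K - T)              # first t bits: LUT location i
    n = mbits & ((1 << (K - T)) - 1)  # decimal value of the last k - t bits
    a, b = LUT[i], LUT[i + 1]
    m = mbits / 2**K
    # Reference model only: (b - a) * n uses an ordinary multiply here, whereas the
    # slides replace this product with a log/antilog computation.
    return m + a + (b - a) * n / 2 ** (K - T)


worst = max(abs(math.log2(1 + x / 2**K) - interp_log2_mantissa(x)) for x in range(2**K))
print(worst)  # on the order of 4e-5
```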
Errors for Various Interpolation Methods and Table Sizes
1. $\log_2(b - a)$ is found by
   a) Mitchell approximation
   b) Kmetz approximation using another LUT
2. $\mathrm{antilog}_2(\log_2(b - a) + \log_2(n))$ is found by
   a) Mitchell approximation
   b) Kmetz approximation using another LUT
We find from the table below that combining 1.b) with 2.b) gives the best error performance, and hence we use LUTs to approximate the multiplication.
[Table: maximum error for each interpolation method and table size; max error in units of 10^{-3}]
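To illustrate the multiplier-free product, the sketch below implements options 1.a) and 2.a), that is, Mitchell's classic log/antilog multiplication. It does not implement the LUT-corrected 1.b)/2.b) variant chosen above. The function names are hypothetical, and only positive operands are handled. Because $b - a$ can be negative, the sign is assumed to be dealt with separately. A short algebraic check shows this scheme never overestimates the true product, and its worst relative error is 1/9 (about 11.1%), at $m_1 = m_2 = 0.5$.

```python
import math


def mitchell_log2(x: float) -> float:
    """Mitchell: log2(2**e * (1 + m)) ~= e + m (x must be positive)."""
    frac, exp = math.frexp(x)              # x == frac * 2**exp, 0.5 <= frac < 1
    return (exp - 1) + (2.0 * frac - 1.0)


def mitchell_antilog2(y: float) -> float:
    """Mitchell: 2**(e + f) ~= 2**e * (1 + f), with e = floor(y) and 0 <= f < 1."""
    e = math.floor(y)
    return math.ldexp(1.0 + (y - e), e)


def mitchell_multiply(x: float, y: float) -> float:
    """Multiplier-free product of positive x, y: antilog2(log2(x) + log2(y))."""
    return mitchell_antilog2(mitchell_log2(x) + mitchell_log2(y))


print(mitchell_multiply(3.0, 5.0))  # 14.0 (the exact product is 15)
```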
Block Diagram of the Log Engine
• The block diagram shows the implementation of $\log_2(1 + m)$, where m is the 23-bit mantissa.
• The number of leading mantissa bits going to the interpolator depends on the size of the LUTs used in the interpolator.
• In this case we use an LUT that holds 64 values, and 13 bits of the mantissa are required.
• The Interpolator block is shown below.
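For the configuration described above, 64 entries imply t = 6 index bits. The remaining 13 − 6 = 7 leading bits then form the interpolation weight n. The sketch below slices those fields out of an IEEE-754 single-precision value, whose mantissa field is 23 bits wide, using Python's standard `struct` module. The helper name `interpolator_fields` is hypothetical, the input is assumed positive and normal, and the use of the 10 low-order mantissa bits is not modelled.

```python
import struct

T = 6   # leading bits that index the 64-entry LUT (64 = 2**6)
K = 13  # leading mantissa bits used by the interpolator, per the slide


def interpolator_fields(x: float) -> tuple[int, int, int]:
    """Return (e, i, n) for a positive, normal IEEE-754 single-precision value x."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    e = ((bits >> 23) & 0xFF) - 127   # unbiased exponent
    mant23 = bits & 0x7FFFFF          # 23-bit mantissa field m
    top = mant23 >> (23 - K)          # the k leading mantissa bits
    i = top >> (K - T)                # first t bits: LUT location i
    n = top & ((1 << (K - T)) - 1)    # last k - t bits: interpolation weight n
    return e, i, n


print(interpolator_fields(10.0))  # (3, 16, 0), since 10 = 2**3 * (1 + 16/64)
```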
Interpolator Block Diagram
• The implementation can be pipelined to achieve higher throughput.
• The COMPARE block determines whether the final stage performs an addition or a subtraction.
• The LOD (leading-one detector) block finds the position of the leading one; the remaining bits are used to access the LUT.
• The same LUT is used to find a and $\log_2(n)$; it is implemented as a dual-port ROM.
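The LOD and COMPARE roles described above can be sketched in software as follows. This is our reading, an assumption rather than the authors' exact datapath. The LOD gives the integer part p of $\log_2(n)$. The bits below the leading one then index the same $E_m$ table as for the mantissa, which is a Kmetz-style lookup. The comparison of a and b selects add or subtract, because the log-domain product only handles the magnitude $|b - a|$. All names are hypothetical. With a 7-bit n and a 64-entry table, at most 6 bits lie below the leading one, so in this sketch the lookup happens to be exact.

```python
import math

T = 6  # hypothetical: 64-entry error LUT
LUT = [math.log2(1 + i / 2**T) - i / 2**T for i in range(2**T)]  # a_i = E_m(i / 2**t)


def leading_one(n: int) -> int:
    """Leading-one detector: bit position of the most significant 1 in n > 0."""
    return n.bit_length() - 1


def log2_int(n: int) -> float:
    """Approximate log2(n) for an integer n > 0, reusing the mantissa error LUT."""
    p = leading_one(n)
    rest = n - (1 << p)        # bits below the leading one
    m = rest / (1 << p)        # n == 2**p * (1 + m), 0 <= m < 1
    idx = (rest << T) >> p     # first t bits of the remaining bits
    return p + m + LUT[idx]    # log2(1 + m) ~= m + a


def final_stage(base: float, a: float, b: float, magnitude: float) -> float:
    """COMPARE (assumed role): add the correction when b >= a, subtract otherwise."""
    return base + magnitude if b >= a else base - magnitude
```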


Antilog Computation
• Let $M = \log_2(N) = e + m$. The antilogarithm of this number is found as
$\mathrm{antilog}_2(M) = 2^{M} = 2^{e} \cdot 2^{m}$
• Using Mitchell's method we make the approximation
$2^{m} \approx 1 + m$
• A Kmetz approximation can be made by storing the error due to this approximation in an LUT and adding the looked-up error value to the above equation for the antilogarithm.
• In our approach, we compute the antilogarithm by efficiently interpolating between two adjacent values stored in the LUT, without needing a multiplier.
• We follow the same flow as for the logarithm. The error incurred with different table sizes when computing the antilogarithm is shown below.
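The same interpolation idea applied to the antilogarithm can be sketched as follows. The table stores the error $2^{m} - (1 + m)$ of the Mitchell antilog approximation. This is a reference model under assumptions: t = 6 and the names `ALOG_LUT` and `interp_antilog2` are hypothetical, an extra entry at $i = 2^{t}$ (equal to 0, since $2^1 = 1 + 1$) is assumed, and the product uses an ordinary multiply with a continuous weight instead of the slides' multiplier-free scheme.

```python
import math

T = 6  # hypothetical: 2**t = 64 table entries

# Antilog error table c_i = 2**(i / 2**t) - (1 + i / 2**t), plus an assumed extra
# entry at i = 2**t (value 0, since 2**1 == 1 + 1) for the last interval.
ALOG_LUT = [2 ** (i / 2**T) - (1 + i / 2**T) for i in range(2**T + 1)]


def interp_antilog2(M: float) -> float:
    """Approximate 2**M = 2**e * 2**m, with 2**m ~= 1 + m + interpolated table error."""
    e = math.floor(M)
    m = M - e                  # fractional part, 0 <= m < 1
    if m >= 1.0:               # guard against rounding when M is just below an integer
        e, m = e + 1, 0.0
    pos = m * 2**T
    i = int(pos)               # LUT location i (first t bits of m)
    frac = pos - i             # plays the role of n / 2**(k - t)
    a, b = ALOG_LUT[i], ALOG_LUT[i + 1]
    # Reference model: ordinary multiply for (b - a) * frac
    return math.ldexp(1 + m + a + (b - a) * frac, e)
```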
Comparison of FPGA Resources Used by the Log Engine
• We implemented our method and the Symmetric Bipartite Table Method (SBTM) on a Xilinx Virtex-II Pro (Virtex2P) FPGA.
• Our method requires less on-chip Block RAM.
• Both methods occupied less than 1% of the FPGA resources.
• Both methods supported clock speeds of a little over 350 MHz.
Comparison of LUT Size used and
Accuracy of the Log Computation
Conclusion
• Our approach requires less memory than other methods to provide better accuracy.
• Compared with the SBTM, for every two extra bits of accuracy,
– we need a factor of 2 increase in the LUT size
– the SBTM needs a factor of 3 increase in the LUT size.
Hence our method scales well to higher accuracies.
• We are more area-efficient than polynomial interpolation methods, as we do not need a multiplier or divider to perform interpolation.
• The implementation can be pipelined, and the number of pipeline stages can be varied depending on the required throughput.
• We have presented an approach to efficiently compute the logarithm and antilogarithm of a number in hardware.
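The "factor of 2 per two bits" scaling agrees with a standard property of linear interpolation: its error shrinks as $h^2$. Doubling the table halves the spacing h, which cuts the error by about 4, or 2 bits. The sketch below checks this for ideal linear interpolation of $\log_2(1 + m)$ using exact multiplies. It ignores errors from the LUT-based multiplication and does not model the SBTM. The function name is hypothetical.

```python
import math


def max_interp_error(t: int, samples_per_cell: int = 64) -> float:
    """Max |log2(1 + m) - linear interpolation| with 2**t equal intervals on [0, 1)."""
    size = 2**t
    worst = 0.0
    for i in range(size):
        a, b = math.log2(1 + i / size), math.log2(1 + (i + 1) / size)
        for j in range(samples_per_cell):
            w = j / samples_per_cell          # position inside the interval
            m = (i + w) / size
            worst = max(worst, abs(math.log2(1 + m) - (a + (b - a) * w)))
    return worst


errors = {t: max_interp_error(t) for t in range(4, 9)}
for t in range(5, 9):
    print(t, errors[t - 1] / errors[t])  # close to 4, i.e. about 2 extra bits per doubling
```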