Automating Transformations from
Floating Point to Fixed Point for Implementing
Digital Signal Processing Algorithms
Prof. Brian L. Evans
Embedded Signal Processing Laboratory
Dept. of Electrical and Computer Engineering
The University of Texas at Austin
Based on work by PhD student Kyungtae Han (now at Intel Research Labs)
July 4, 2006
2
Outline
• Introduction
• Background
• Optimize fixed-point wordlengths
• Reduce power consumption in arithmetic
• Automate transformations of systems
• Conclusion
3
Introduction
Implementing Digital Signal
Processing Algorithms
[Figure: digital signal processing algorithms begin as a floating-point program; code conversion produces a fixed-point program with a uniform wordlength, and wordlength optimization produces a fixed-point program with optimized wordlengths. Implementation targets range from a floating-point processor (high hardware price and power consumption) to a fixed-point processor and a fixed-point ASIC (low hardware price and power consumption). ASIC: application-specific integrated circuit]
4
Introduction
Transformations to Fixed Point
• Advantages
 Lower hardware complexity
 Lower power consumption
 Faster processing speed
• Disadvantages
 Introduces distortion due to quantization error
 Searching for optimum wordlengths by trial and error is time-consuming
• Research goals
 Automate transformations to fixed point
 Control distortion vs. complexity tradeoffs
[Flow: floating-point program → code conversion → wordlength optimization → fixed point (optimized wordlength)]
5
Outline
• Introduction
• Background
• Optimize fixed-point wordlengths
• Reduce power consumption in arithmetic
• Automate transformations of systems
• Conclusion
6
Background
Fixed-Point Data Format
• Integer wordlength (IWL)
 Number of bits assigned to the integer part
 Includes the sign bit
• Fractional wordlength (FWL)
 Number of bits assigned to the fraction
• Wordlength: WL = IWL + FWL (SystemC format, www.systemc.org)
• Example: π = 3.14159…(10) [floating point]
 3.140625(10) = 011.001001(2) [WL = 9; IWL = 3; FWL = 6]
 3.141479492(10) = 011.0010010000111(2) [WL = 16; IWL = 3; FWL = 13]
 (see the sketch below)
[Figure: bit layout S | integer bits | (binary point) | fractional bits; the sign bit S counts toward the integer wordlength]
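A minimal MATLAB sketch (an illustration added here, not from the slides) that reproduces the two quantized values above by truncating π to a given fractional wordlength:

% Quantize by truncation to FWL fraction bits (both values fit in IWL = 3 integer bits).
quantize = @(x, FWL) floor(x * 2^FWL) / 2^FWL;
q9  = quantize(pi, 6);    % WL = 9,  FWL = 6  -> 3.140625
q16 = quantize(pi, 13);   % WL = 16, FWL = 13 -> 3.1414794921875
fprintf('quantization error at FWL=6:  %g\n', pi - q9);
fprintf('quantization error at FWL=13: %g\n', pi - q16);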
7
Background
Distortion vs. Complexity Tradeoffs
• Different wordlengths have different application distortion and implementation complexity tradeoffs
 $w = \{w_0, w_1, w_2, \ldots, w_{N-1}\}$: vector of wordlengths
 c(w): implementation cost (complexity) function
 d(w): application distortion function
 Cmax: constant for maximum implementation cost
 Dmax: constant for maximum application distortion
 $\underline{w}$, $\overline{w}$: wordlength lower and upper bounds
• Goals: minimize implementation cost c(w) and minimize application distortion d(w)
[Figure: application distortion d(w) vs. implementation complexity c(w); the feasible region lies above the optimal tradeoff curve, which forms its boundary]
8
Background
Wordlength Optimization
• Single-objective optimization:
 $\min_{w \in I^n} \; \alpha_c\, c(w) + \alpha_d\, d(w)$
 subject to $d(w) \le D_{\max}$, $c(w) \le C_{\max}$, $\underline{w} \le w \le \overline{w}$
• Multiple-objective optimization:
 $\min_{w \in I^n} \; [\,c(w),\ d(w)\,]$
 subject to $d(w) \le D_{\max}$, $c(w) \le C_{\max}$, $\underline{w} \le w \le \overline{w}$
• The proposed work fixes the integer wordlengths and searches for the fractional wordlengths
9
Background
Genetic Algorithm
• Evolutionary algorithm
 Inspired by Holland (1975)
 Mimics the processes of plant and animal evolution
 Finds the optimum of a complex function
[Figure: genetic algorithm cycle, parental genes → mating → child genes → mutation → new gene pool → function evaluation → genes with measures → selection → parental genes]
[Greg Rohling, Ph.D. Defense, Georgia Tech, 2004]
10
Background
Pareto Optimality
• Pareto optimality: "best that could be achieved without disadvantaging at least one group" [Schick, 1970]
• Pareto optimal set is the set of nondominated solutions (a small sketch for identifying nondominated solutions follows below)
 E is dominated by C, since all objectives for C are less than the corresponding objectives for E
 Solutions A, B, C, D are nondominated (not dominated by any solution)
• Pareto front is the boundary (tradeoff curve) that connects the Pareto optimal set of solutions
[Figure: solutions A-I in the objective-1 vs. objective-2 plane; the nondominated solutions A, B, C, D lie on the Pareto front, while the remaining points are dominated]
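As an illustration, here is a minimal MATLAB sketch (a hypothetical helper, not part of the released tool) that flags the nondominated rows of a sampled population when both objectives are minimized:

function keep = pareto_front(obj)
% obj: P-by-2 matrix of objective values (e.g., complexity and distortion)
% keep(i) is true if solution i is not dominated by any other solution
P = size(obj, 1);
keep = true(P, 1);
for i = 1:P
  for j = 1:P
    % j dominates i if it is no worse in every objective and better in at least one
    if j ~= i && all(obj(j,:) <= obj(i,:)) && any(obj(j,:) < obj(i,:))
      keep(i) = false;
      break;
    end
  end
end
end

For example, keep = pareto_front([c(:) d(:)]) marks the Pareto optimal set of a sampled population.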
11
Outline
• Introduction
• Background
• Optimize fixed-point wordlengths
• Reduce power consumption in arithmetic
• Automate transformations of systems
• Conclusion
Optimize Fixed-Point Wordlengths
Search for Optimum Wordlength
• Exhaustive search is impractical for many variables
• Gradient-based search (single objective)
 Uses gradient information to determine the next candidates
 Complexity measure (CM) [Sung & Kum, 1995]
 Distortion measure (DM) [Han et al., 2001]
 Complexity-and-distortion measure (CDM) [Han & Evans, 2004] (covered next)
• Guided random search
 Genetic algorithm for a single objective [Leban & Tasic, 2000]
 Multiple-objective genetic algorithm [Han, Olson & Evans, 2006] (covered next)
12
13
Optimize Fixed-Point Wordlengths
Complexity-and-Distortion Measure
• Weighted combination of measures:
 $f_{cd}(w) = \alpha_c\, c(w) + \alpha_d\, d(w)$, where $\alpha_c + \alpha_d = 1$, $0 \le \alpha_c \le 1$, $0 \le \alpha_d \le 1$
• Single objective function:
 $\min_{w \in I^n} f_{cd}(w)$ subject to $d(w) \le D_{\max}$, $c(w) \le C_{\max}$, $\underline{w} \le w \le \overline{w}$
• Gradient-based search
 Initialization
 Iterative greedy search based on complexity and distortion gradient information (see the sketch below)
• Notation: c(w) complexity function; d(w) distortion function; Dmax constant for maximum distortion; Cmax constant for maximum complexity
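The following MATLAB sketch illustrates one possible greedy CDM-style search under stated assumptions (user-supplied c(w) and d(w) function handles, one bit added per iteration); it is an illustration of the idea, not the released implementation:

function w = cdm_search(c, d, w, Dmax, alpha_c, wmax)
% Greedy search: grow one wordlength per iteration, guided by the weighted
% complexity-and-distortion measure, until the distortion constraint is met.
alpha_d = 1 - alpha_c;
f = @(w) alpha_c * c(w) + alpha_d * d(w);
while d(w) > Dmax
  best_gain = -Inf;  best_n = 0;
  for n = 1:numel(w)                       % candidate: one extra bit for variable n
    if w(n) < wmax(n)
      wtry = w;  wtry(n) = wtry(n) + 1;
      gain = f(w) - f(wtry);               % decrease in the weighted measure
      if gain > best_gain
        best_gain = gain;  best_n = n;
      end
    end
  end
  if best_n == 0, break; end               % no variable can grow further
  w(best_n) = w(best_n) + 1;
end
end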
14
Optimize Fixed-Point Wordlengths
Case Study I: Filter Design
• Infinite impulse response (IIR) filter
 Complexity measure: area model of a field-programmable gate array (FPGA) [Constantinides, Cheung & Luk, 2003]
 Distortion measure: root mean square (RMS) error
 Seven fixed-point variables (indicated by slashes in the diagram)
[Figure: direct-form IIR filter section with input x[n], output y[n], delays, and coefficients b0, b1, and -a1]
15
Optimize Fixed-Point Wordlengths
Case Study I: Gradient-Based Search
• CDM could lead to lower complexity and fewer system simulations than DM and CM

Search Method | Measure | System Simulations | Complexity Estimate (LUT) | Distortion (RMS)*
Gradient      | DM      | 316                | 51.05                     | 0.0981
Gradient      | CDM     | 145                | 49.85                     | 0.0992
Gradient      | CM      | 417                | 51.95                     | 0.0986
Complete      | -       | 16^7 **            | -                         | -

* Maximum distortion, measured by root mean square (RMS) error, is 0.1
** 16^7 = 268,435,456 simulations (about 8.5 years at 1 second per simulation)
16
Optimize Fixed-Point Wordlengths
Case Study I: Genetic Algorithm
• Searches for the Pareto optimal set (nondominated solutions)
• Handles multiple objectives: error (RMS) and area (LUTs)
[Figure: error (RMS) vs. area (LUTs) scatter plots with Pareto fronts after the 100th generation (9,000 simulations), the 250th generation (22,500 simulations), and the 500th generation (45,000 simulations); population per generation is 90; the nondominated fractions across the three snapshots are 67/90, 76/90, and 90/90. LUT: lookup table]
17
Optimize Fixed-Point Wordlengths
Case Study I: Comparison
• Gradient-based search (GS) results vs. genetic algorithm (GA) results
[Figure: GS solutions (DM, CDM, CM) overlaid on the GA population in the error (RMS) vs. area (LUTs) plane at the 50th generation (4,500 simulations; 35/90 nondominated) and the 500th generation (45,000 simulations; 90/90 nondominated)]
* Required RMS limits for the gradient-based search are Dmax ∈ {0.12, 0.1, 0.08}
• GS methods can get stuck in a local minimum
• GS methods reduce running time (CDM: 145 simulations)
18
Optimize Fixed-Point Wordlengths
Case Study II: Communication System
• Simple binary phase shift keying (BPSK) system
 Complexity measure: area model of a field-programmable gate array (FPGA) [Constantinides, Cheung & Luk, 2003]
 Distortion measure: bit error rate (BER)
 Four fixed-point variables (indicated by slashes in the diagram)
[Figure: BPSK link with source data (1 or -1), carrier modulation, additive white Gaussian noise (AWGN), integrate-and-dump receiver, decision device, and BER measurement]
19
Optimize Fixed-Point Wordlengths
Case Study II: Gradient-Based Search
• CDM could lead to lower complexity and fewer system simulations than DM and CM

Search Method | Measure | System Simulations | Complexity Estimate (LUT) | Distortion (BER)*
Gradient      | DM      | 66                 | 40.65                     | 0.083
Gradient      | CDM     | 65                 | 43.65                     | 0.085
Gradient      | CM      | 193                | 41.95                     | 0.081
Complete      | -       | 65,536             | -                         | -

* Maximum distortion, measured by bit error rate (BER), is 0.1
20
Optimize Fixed-Point Wordlengths
Case Study II: Genetic Algorithm
• Searches for the Pareto optimal set
• Handles multiple objectives
• Gradient-based results for comparison (from the previous slide): DM 0.083 BER / 40.65 LUTs; CDM 0.085 / 43.65; CM 0.081 / 41.95
[Figure: error (bit error rate) vs. area (LUTs) Pareto fronts at the 50th generation (4,500 simulations), the 100th generation (9,000 simulations), and the 200th generation (18,000 simulations); population per generation is 90; preliminary results. LUT: lookup table]
21
Optimize Fixed-Point Wordlengths
Comparison of Proposed Methods
                      | Gradient-based search algorithm | Genetic algorithm
Type of Solution      | One point                       | Family of points
Tradeoff Curve Found  | No                              | Yes
Execution Time        | Short                           | Long
Amount of Computation | Low                             | High
Parallelism           | Low                             | High
22
Outline
• Introduction
• Background
• Optimize fixed-point wordlengths
• Reduce power consumption in arithmetic
• Automate transformations of systems
• Conclusion
Reduce Power Consumption in Arithmetic
Lower Power Consumption in DSP
• Minimize power dissipation due to limited battery power and cooling capability
• Multipliers are often a major source of dynamic power consumption in typical DSP applications
 Multi-precision multipliers select a smaller multiplier (8, 16, or 24 bits) to reduce power consumption
 Wordlength reduction selects any word size [Han, Evans & Swartzlander, 2004] (covered next)
• In general, what reductions in power are possible in software when the hardware has fixed wordlengths?
23
Reduce Power Consumption in Arithmetic
Wordlength Reduction in Multiplication
• Input data wordlength reduction
 Fewer bits can be enough to represent the data (e.g., π × π ≈ 9)
 Two reduction schemes: truncation and signed right shift (a sketch follows the example below)
• Truncation: the least significant bits are zeroed out
• Signed right shift
 Moves the data toward the least significant bit (LSB)
 The sign bit is extended for the arithmetic right shift
Example (sign bit is the leftmost bit):
 (a) Original multiplication: 0001 0010 0011 0100 × 1101 1100 1010 1001
 (b) Reduction by truncation: 0001 0010 0000 0000 × 1101 1100 0000 0000
 (c) Reduction by signed right shift: 0000 0000 0001 0010 × 1111 1111 1101 1100
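A minimal MATLAB sketch (illustrative values only) that reproduces the two reduction schemes on the first 16-bit operand shown above, keeping the upper M = 8 bits:

L = 16;  M = 8;  N = L - M;
x = 4660;                                  % 0001 0010 0011 0100 in binary
trunc = floor(x / 2^N) * 2^N;              % (b) truncation: zero the N LSBs
shifted = floor(x / 2^N);                  % (c) arithmetic right shift by N bits
fprintf('original : %s\n', dec2bin(x, L));
fprintf('truncated: %s\n', dec2bin(trunc, L));
fprintf('shifted  : %s\n', dec2bin(shifted, L));
% For a negative operand, the shifted result carries a sign extension in the upper bits.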
24
Reduce Power Consumption in Arithmetic
Power Reduction via Wordlength Reduction
• Power consumption has switching and static components
 Switching power: $P_{\text{switching}} = \alpha\, C_L\, V_{dd}^2\, f_{clk}$
 α: switching activity parameter; C_L: load capacitance; V_dd: operating voltage; f_clk: operating frequency
• Wordlength reduction lowers the switching activity parameter α (a worked example follows below)
• Question: what is the relationship between reduced wordlength and the switching parameter α in power consumption?
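A worked example with assumed, purely illustrative numbers shows how the switching power scales:

% Assumed values: alpha = 0.5, C_L = 10 pF, Vdd = 1.2 V, f_clk = 1 MHz.
alpha = 0.5;  C_L = 10e-12;  Vdd = 1.2;  f_clk = 1e6;
P_switching = alpha * C_L * Vdd^2 * f_clk;   % 7.2e-6 W, i.e., 7.2 microwatts
% Halving alpha through wordlength reduction halves P_switching directly.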
25
26
Reduce Power Consumption in Arithmetic
Analytical Method
[Figure: bit layouts of the full-length L-bit input, the input truncated to M bits (N bits removed), and the N-bit signed-right-shifted input with sign extension]

Expected number of switching bits per input (wordlength L = 16):
 Full length: L/2
 Truncate N bits: M/2
 N-bit signed right shift: L/2
27
Reduce Power Consumption in Arithmetic
Dynamic Power Consumption
for Wallace Multiplier (1 MHz)
[Figure: dynamic power consumption of a 16-bit × 16-bit Wallace multiplier at 1 MHz versus reduced wordlength, simulated on a Xilinx XC3S200-5FT256 FPGA, with separate curves for truncating the first and the second argument; up to a 56% reduction]
Wallace multiplier is used in the TI TMS320C64 DSP
28
Reduce Power Consumption in Arithmetic
Dynamic Power Consumption for Radix-4
Modified Booth Multiplier (1 MHz)
[Figure: dynamic power consumption of a 16-bit × 16-bit radix-4 modified Booth multiplier at 1 MHz versus reduced wordlength, simulated on a Xilinx XC3S200-5FT256 FPGA, with curves for truncating the first and second arguments (recoded and nonrecoded); one argument is power-sensitive (13%), while truncating the other gives up to a 31% reduction]
Swapping the operands could therefore have a benefit
Radix-4 modified Booth multiplier is used in the TI TMS320C62 DSP
Reduce Power Consumption in Arithmetic
Comparison of Proposed Methods
• Truncation to 8 bits reduces est. power consumption by
56% in Wallace and 31% in Booth 16-bit multipliers
• Signed right shift has no est. power reduction in
Wallace multiplier (for any shift) and 25% reduction in
Booth (for 8-bit shift) multiplier
• Operand swapping reduces power consumption for
Booth but has negligible savings for Wallace multiplier
• Power consumption in tree-based multiplier
 Highly dependent on input data
 Simulation matches analysis
29
30
Outline
• Introduction
• Background
• Optimize fixed-point wordlengths
• Reduce power consumption in arithmetic
• Automate transformations of systems
• Conclusion
31
Automatic Transformations of Systems
Automating Transformations from
Floating Point to Fixed Point
• Existing fixed-point tools
 Support fixed-point simulation
 Convert floating-point code to raw fixed-point code
 Leave finding the optimum wordlengths to manual trial and error
 Examples: SNU gFix, Autoscaler; CoWare SPW HDS; Synopsys CoCentric; MATLAB Fixed-Point Toolbox; MATLAB Fixed-Point Blockset; AccelChip DSP Synthesis; Catalytic RMS, MCS
• Automating transformations
 Fully automate code conversion and wordlength optimization
[Flow: floating-point program → code conversion → wordlength optimization → wordlength-optimized fixed-point program]
32
Automatic Transformations of Systems
Automatic Transformation Flow
• Code generation
 Parse the floating-point program
 Generate the raw fixed-point program and auxiliary programs
• Range estimation
 Estimate the range of each variable to avoid overflow (analytical or simulation based)
 Determine the integer wordlength (IWL); see the sketch below
• Wordlength optimization
 Optimize wordlengths for the given input data and error specification (analytical or simulation based)
 Determine the fractional wordlength (FWL)
[Flow: code generation → range estimation → wordlength optimization]
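A minimal MATLAB sketch of the simulation-based range-estimation step (a hypothetical helper, simplified from the actual tool flow): pick an integer wordlength, including the sign bit, that covers the largest observed magnitude.

function IWL = estimate_iwl(samples)
% samples: data recorded for one variable during a floating-point simulation
peak = max(abs(samples(:)));
IWL  = max(1, ceil(log2(peak + eps)) + 1);   % +1 for the sign bit
end

For example, a variable whose samples stay near π would yield IWL = 3, matching the format shown on the earlier fixed-point data format slide.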
33
Automatic Transformations of Systems
Automating Transformation Environment
for Wordlength Optimization
• Given a floating-point program and options, the auxiliary programs are generated automatically
• Given input data, the optimum wordlength is searched for by a gradient-based or genetic algorithm
[Figure: the top program drives the floating-point program, the fixed-point program, and the evaluation program (objectives); the evaluation program performs range estimation, complexity estimation, and error estimation, and the search engine uses these objectives together with the input data to produce the optimum wordlength]
Automatic Transformations of Systems
Demo of Released Software
34
Conclusion
Conclusion
• Search for optimum wordlengths
 Gradient-based search reduces execution time, but its solutions can be trapped in a local optimum
 A genetic algorithm can find the distortion vs. complexity tradeoff curve, but it requires a longer execution time
• Reduce power consumption by wordlength reduction of the multiplicands
• Automate transformations from floating-point programs to fixed-point programs
• Freely distributable software release available at
http://www.ece.utexas.edu/~bevans/projects/wordlength/converter/
35
Conclusion
Future Work
• Advanced wordlength search algorithms
 Hybrid wordlength optimization
 Prune redundant wordlength variables (e.g. delay, adder)
 Adaptive step size for gradient-based search methods
• Further analysis on search algorithms
 Analysis of genetic algorithms with different settings
 Comparison with simulated annealing
• Low power consumption
 System level including memory [Powell and Chau, 1991]
 Wordlength reduction for floating-point multipliers
36
Conclusion
Future Work (continued)
• Electronic design automation software
 Enhanced code generator (e.g. rounding preferences)
 Hybrid analytical/simulation range estimation
• Optimum DSP algorithms
 Rearranging subsystems at the block diagram level
 Rearranging mathematical expressions in the algorithm
• Developing more sophisticated hardware area models
 Avoids having to route each design through synthesis tools
 Transcendental functions
37
38
End
39
Backup Slides
Publications
Publications-I
• Conference Papers
1. K. Han, A. G. Olson, and B. L. Evans, "Automatic floating-point to fixed-point transformations," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Nov. 2006, Pacific Grove, CA, USA (invited paper).
2. K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Low-Power Multipliers with Data Wordlength Reduction," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Oct. 30-Nov. 2, 2005, pp. 1615-1619, Pacific Grove, CA, USA.
3. K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Data Wordlength Reduction for Low-Power Signal Processing Software," Proc. IEEE Workshop on Signal Processing Systems, Oct. 13-15, 2004, pp. 343-348, Austin, TX, USA.
4. K. Han and B. L. Evans, "Wordlength Optimization with Complexity-and-Distortion Measure and Its Applications to Broadband Wireless Demodulator Design," Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, May 17-21, 2004, vol. 5, pp. 37-40, Montreal, Canada.
5. K. Han, I. Eo, K. Kim, and H. Cho, "Numerical Word-Length Optimization for CDMA Demodulator," Proc. IEEE Int. Symp. on Circuits and Systems, May 2001, vol. 4, pp. 290-293, Sydney, Australia.
6. K. Han, I. Eo, K. Kim, and H. Cho, "Bit Constraint Parameter Decision Method for CDMA Digital Demodulator," Proc. CDMA Int. Conf. & Exhibition, Nov. 2000, vol. 2, pp. 583-586, Seoul, Korea.
7. S. Nahm, K. Han, and W. Sung, "A CORDIC-based Digital Quadrature Mixer: Comparison with ROM-based Architecture," Proc. IEEE Int. Symp. on Circuits and Systems, Jun. 1998, vol. 4, pp. 385-388, Monterey, CA, USA.
40
Publications
Publications-II
• Journal Articles

K. Han and B. L. Evans, "Optimum Wordlength Search Using a Complexity-and-Distortion Measure," EURASIP Journal on Applied Signal Processing, special issue on Design Methods for DSP Systems, vol. 2006, no. 5, pp. 103-116, 2006.
• Other Publications
1. K. Han, E. Soo, H. Jung, and K. Kim, Apparatus and Method for Short-Delay Multipath Searcher in Spread Spectrum Systems, U.S. patent pending, Nov. 2001.
2. K. Han, I. Lim, E. Soo, H. Seo, K. Kim, H. Jung, and H. Cho, Apparatus and Method for Separating Carrier of Multicarrier Wireless Communication Receiver System, U.S. patent pending, Sep. 2001.
3. K. Han, "Carrier Synchronization Scheme Using Input Signal Interpolation for Digital Receivers," Master's Thesis, Seoul National University, Seoul, Korea, Feb. 1998.
41
Backup
Research on Transformation
42
43
Backup
Simulation Flow
[Flow: both search paths start from the desired specification. Gradient-based search algorithm: set up the desired specification → search for a wordlength set → generate the optimized fixed-point program. Genetic search algorithm: set up the desired specification → search for wordlength sets → generate the Pareto front → pick one of the sets → generate the optimized fixed-point program]
44
Backup
Algorithm Design and Implementation
[Figure: algorithm design and implementation flow. Floating-point programs run on a floating-point processor; code conversion yields uniform-wordlength fixed-point programs for a fixed-point processor; wordlength optimization yields optimized fixed-point programs for a fixed-point IC. Along this flow, hardware complexity and power consumption decrease while design time increases]
45
Backup
Wordlength Optimization Constraints
• Distortion constraint: d(w) ≤ Dmax
• Complexity constraint: c(w) ≤ Cmax
[Figure: application-specific distortion d(w) versus implementation complexity c(w), with the maximum-distortion bound Dmax and the maximum-complexity bound Cmax marked]
46
Backup
Gradient-Based Search
• Gradient information can be used as the update direction
• The gradient is measured with respect to design parameters such as implementation complexity, precision distortion, or power consumption
 Complexity measure (CM) [Sung and Kum, 1995]
 Distortion measure (DM) [Han et al., 2001]
 Complexity-and-distortion measure (CDM) [Han and Evans, 2004] (proposed)
47
Backup
Gradient Information
• Finite-difference estimate of the gradient with respect to wordlength variable n at iteration h:
 $\nabla_n f \approx \dfrac{f(\mathbf{w}^{(h)}) - f(\{w_1^{(h)}, w_2^{(h)}, \ldots, w_n^{(h-1)}, \ldots, w_N^{(h)}\})}{w_n^{(h)} - w_n^{(h-1)}}$
• Notation: N number of variables; h iteration index; n variable index; w wordlength vector; f(w) objective function
[Figure: objective values on a grid of wordlengths (w1, w2); the search direction follows the largest finite-difference gradient]
48
Backup
Gradient-Based Search Direction
• Wordlength update (s: step size): $\mathbf{w}_{j+1} = \mathbf{w}_j + s\,\boldsymbol{\delta}_j$
• Direction: $\boldsymbol{\delta}_j$ is the unit vector for the variable with the largest distortion gradient, i.e., $\boldsymbol{\delta}_j = \mathbf{e}_n$ where $m_j = \dfrac{\partial d}{\partial w_n} = \max\!\left(\dfrac{\partial d}{\partial w_1}, \dfrac{\partial d}{\partial w_2}, \ldots, \dfrac{\partial d}{\partial w_N}\right)$
• The partial derivatives are evaluated by finite differences
Backup
Complexity and Distortion Function
• Complexity function, c(w)
 The number of multiplications is counted
 Hardware complexity is estimated by assuming that complexity increases linearly with wordlength
 A given hardware model yields an accurate complexity estimate
• Distortion function, d(w)
 Difficult to derive as a closed-form mathematical expression
 Estimated by computer simulation, measuring output SNR or bit error rate in digital communication systems
49
Backup
Complexity Measure [Sung and Kum, 1995]
• Uses complexity sensitivity information as the direction in the search for optimum wordlengths
• Advantage: minimizes complexity
• Disadvantage: demands a large number of iterations
 Objective function: $f(\mathbf{w}) = c(\mathbf{w})$
 Optimization problem: $\min_{\mathbf{w} \in I^n} \{\, f(\mathbf{w}) \mid d(\mathbf{w}) \le D_{\max},\ \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \,\}$
 Update direction: $\nabla c(\mathbf{w})$
50
Backup
Distortion Measure [Han et al., 2001]
• Applies the application performance information to the search for the optimum wordlengths
• Advantage: fewer iterations
• Disadvantage: not guaranteed to yield the optimum wordlength for complexity
 Objective function: $f(\mathbf{w}) = d(\mathbf{w})$
 Optimization problem: $\min_{\mathbf{w} \in I^n} \{\, f(\mathbf{w}) \mid d(\mathbf{w}) \le D_{\max},\ c(\mathbf{w}) \le C_{\max},\ \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \,\}$
 Update direction: $\nabla d(\mathbf{w})$
51
52
Backup
Feasible Solution Search [Sung and Kum, 1995]
• Exhaustive search of all possible wordlengths
• Advantages
 Does not miss optimum points
 Simple algorithm
• Disadvantage
 Many trials (= experiments)
• Distance: $d = dw_1 + dw_2 + \ldots + dw_N$
• Expected number of iterations: $E_{FS}(d) = \dfrac{(d+N-1)\cdots(d+2)(d+1)\,d}{N!}$
[Figure: full search in the (w1, w2) plane from the minimum wordlengths {2,2} toward the optimum wordlengths {5,5}; d = 6, 24 trials]
53
Backup
Sequential Search [K. Han et al. 2001]
• Greedy search based on sensitivity (gradient) information
• Wordlength update: $\mathbf{w}_{j+1} = \mathbf{w}_j + s\,\boldsymbol{\delta}_j$
• Example
 Minimum wordlengths {2,2}
 Optimum wordlengths {5,5}
 12 iterations along the sequential search direction
• Advantage: fewer trials
• Disadvantage: could miss the global optimum point
[Figure: sequential search path in the (w1, w2) plane from {2,2} to {5,5}]
54
Backup
Case Study: Receiver Design
[Figure: transmitter (data source, encoder, multicarrier modulator), wireless channel, and receiver (multicarrier demodulator, channel estimator, channel equalizer) followed by a bit error rate tester; the fixed-point variables w0-w3 are marked on the receiver]
 w0: input wordlength of the multicarrier demodulator, which performs a fast Fourier transform (FFT)
 w1: input wordlength of the equalizer
 w2: input wordlength of the channel estimator
 w3: output wordlength of the channel estimator
55
Backup
Simulation Results
• CDM leads to lower complexity compared to DM
• CDM reduces the number of trials compared to CM, feasible solution search [Sung and Kum, 1995], and exhaustive search
 Fast searching

Search Method | Measure | αc  | Trials | Simulations | Wordlengths {w0,w1,w2,w3} | Complexity Estimate | Distortion (BER)*
Gradient      | DM      | 0   | 16     | 64          | {10,9,4,10}               | 10781               | 0.0009
Gradient      | CDM     | 0.5 | 15     | 60          | {7,10,4,6}                | 7702                | 0.0012
Gradient      | CM      | 1   | 69     | 69          | {7,7,4,6}                 | 7699                | 0.0015
Feasible      | -       | -   | 210    | 210         | {7,7,4,6}                 | 7699                | 0.0015
Exhaustive    | -       | -   | 26364  | 26364       | -                         | -                   | -

* Required BER ≤ 1.5 × 10^-3
56
Backup
Simulation Environments
• Assumptions
 Internal wordlengths of the blocks have already been decided
 Complexity increases linearly as wordlength increases
• Required application performance
 Bit error rate of 1.5 × 10^-3 (without error-correcting codes)
• Simulation tool
 LabVIEW 7.0
• Complexity vector (weight per input): FFT 1024; equalizer (right input) 1; estimator 128; equalizer (upper input) 2
 Total complexity: C(w) = c^T · w
Backup
FFT Cost
• Cost of an N-tap FFT: $\text{Cost}_{FFT} = \dfrac{N}{2} \log_2 N$
• Cost of a 256-tap FFT: $\text{Cost}_{FFT} = \dfrac{256}{2} \log_2 256 = 1024$
57
Backup
Minimum Wordlengths
• Change one wordlength variable at a time while keeping the other variables at high precision
 {1,16,16,16}, {2,16,16,16}, …
 {16,1,16,16}, {16,2,16,16}, …
 …
 …, {16,16,16,15}, {16,16,16,16}
• The minimum wordlength vector is {5,4,4,4} (see the sketch below)
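A minimal MATLAB sketch of this sweep (a hypothetical distortion(w) function handle and a 16-bit upper bound are assumed):

function wmin = minimum_wordlengths(distortion, Dmax, Nvar)
% Sweep each variable alone, keeping the others at 16 bits, and record the
% smallest wordlength that still satisfies the distortion specification.
wmin = 16 * ones(1, Nvar);
for n = 1:Nvar
  for b = 1:16
    w = 16 * ones(1, Nvar);
    w(n) = b;
    if distortion(w) <= Dmax
      wmin(n) = b;
      break;
    end
  end
end
end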
58
Backup
Number of Trials
• Start at the {5,4,4,4} wordlength vector
• Next wordlength vectors for the complexity measure (α = 1.0): {5,4,4,4}, {5,5,4,4}, …
• Increase wordlengths one by one until the required application performance is satisfied
59
60
Low-Power Signal Processing
Power Consumption
• Power consumption in CMOS circuits: $P_{avg} = P_{\text{switching}} + P_{\text{short-circuit}} + P_{\text{leakage}}$
• Significant power in CMOS circuits is dissipated when they are switching:
 $P_{\text{switching}} = \alpha\, C\, V^2 f$, where α is the transition factor, C the capacitance, V the supply voltage, and f the frequency
• Power reduction in hardware [Chandrakasan and Brodersen, 1995]
 Scaling down, minimizing area
 Adjusting voltage and frequency during operation
• Power reduction in software [Tiwari, Malik and Wolfe, 1994] [Lee et al., 1997]
 Instruction ordering and packing
 Energy reductions varying from 26% to 73%
Low-Power Signal Processing
Wordlength for Low-Power Consumption
• Power model of wordlength [Choi and Burleson, 1994]
 Wordlength is considered as capacitance
 Power consumption is proportional to wordlength
 Switching activity is not considered
• Data wordlength reduction technique
[Han, Evans, and Swartzlander, 2004] (proposed)
 Count node transitions for switching activity
 Reduce input data wordlength to decrease power
consumption
61
Backup
Dynamic and Static Power
Trends in dynamic and static power dissipation showing
increasing contribution of static power
[S. Thompson, P. Packan, and M. Bohr. MOS Scaling: Transistor Challenges for the 21st Century. Intel
Technology Journal, Q3 1998]
62
Backup
Power Dissipation of Multiplier Unit
• Multiply unit is usually a major source of power
consumption in typical DSP applications
 Multiply unit required
for digital communication
& digital signal processing
algorithms
 Digital filters, equalizers,
FFT/IFFT, digital down/
upconverter, etc.
TMS320C5x Power Dissipation Characteristics
from www.ti.com
63
64
Backup
Wallace vs. Booth Multipliers
[Figure: tree dot diagram of a 4-bit Wallace multiplier (symmetric) and a radix-4 multiplier based on Booth's recoding (asymmetric, one operand recoded), computing X · a = P]
Backup
Radix-4 Modified Booth Multiplier
• One multiplicand is recoded: three bits of X at a time are recoded to z
• a and X are the multiplicands; P is the product of the multiplication
65
Backup
Switching Activity in Multipliers
• Logic delay and propagation cause glitches
• Proposed analytical method
 Glitches are hard to estimate in closed form
 Analyzes switching activity with respect to the input data wordlength
 Does not consider the multiplier architecture
• Simulation method
 Counts all switching activity (transition counts in the logic)
 Power estimation with Xilinx XPower
 Considers the multiplier architecture
66
67
Reduce Power Consumption in Arithmetic
Analytical Method
• Consider the stream of data applied to one multiplicand and compare two adjacent numbers in the stream after wordlength reduction
• Expected number of switching bits $X$ with probability $P_X(x)$: $E(X) = \sum_{x=0}^{L} x\, P_X(x)$
 Full-length L-bit input data: $E_L(X) = \dfrac{L}{2}$
 Input truncated to M bits (N bits removed): $E_{tr}(X) = \dfrac{L-N}{2} = \dfrac{M}{2}$
 N-bit signed right shift of the L-bit input (Y is the sign bit): $E_{rs}(X) = \dfrac{1}{2}E(X \mid Y=0) + \dfrac{1}{2}E(X \mid Y=1) = \dfrac{L}{2}$
 (a simulation sketch follows below)
[Figure: bit layouts of the L-bit word, the M-bit truncated word, and the sign-extended word after an N-bit signed right shift]
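A minimal MATLAB sketch (random data, illustrative only) that checks these expectations by counting toggling bits between adjacent 16-bit samples:

L = 16;  N = 8;
x = randi([-2^(L-1), 2^(L-1) - 1], 1, 1e5);            % random L-bit two's-complement stream
trunc = floor(x / 2^N) * 2^N;                          % truncation: zero the N LSBs
shifted = floor(x / 2^N);                              % N-bit arithmetic right shift
toggles = @(v) mean(sum(dec2bin(mod(v(1:end-1), 2^L), L) ~= ...
                        dec2bin(mod(v(2:end),   2^L), L), 2));
fprintf('full length : %.2f bits\n', toggles(x));       % about L/2 = 8
fprintf('truncated   : %.2f bits\n', toggles(trunc));   % about M/2 = 4
fprintf('right shift : %.2f bits\n', toggles(shifted)); % about L/2 = 8 (sign extension)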
Backup
Analytical Method
• X has a binomial distribution
• $E_{rs}(X) = \dfrac{1}{2}E(X \mid Y=0) + \dfrac{1}{2}E(X \mid Y=1)$ is always $\dfrac{L}{2}$, independent of M and N
68
69
Backup
Power Reduction in TI DSP
• TI TMS320VC5416 DSP Starter Kit
 Radix-4 modified Booth multiplier
 Measure the average current for wordlength reduction of the multiplicands
• Assembly program (data_a and data_b hold random data with wordlength w):
loop:
 STM data_a, AR2
 STM data_b, AR3
 MPY *AR2+, *AR3+, a
 MPY *AR2+, *AR3+, a
 …
 MPY *AR2+, *AR3+, a
 B loop
[Figure: measured average current (roughly 574-581 mA) versus wordlength w from 0 to 16 for the operand configurations (w,w), (16,w), (w,16), and (wrsh,wrsh)]
70
Backup
Code Generation for Fixed-Point Program
• Adder function in MATLAB

(a) Floating-point program for the adder:
function [c] = adder(a, b)
c = 0;
c = a + b;

(b) Raw fixed-point program (wordlengths determined by designers with trial and error):
function [c] = adder_fx(a, b)
c = 0;
a = fi(a, 1, 32, 16);
b = fi(b, 1, 32, 16);
c = fi(c, 1, 32, 16);
c(:) = a + b;

(c) Converted fixed-point program for automating optimization:
function [c] = adder_fx(a, b, numtype)
c = 0;
a = fi(a, numtype.a);
b = fi(b, numtype.b);
c = fi(c, numtype.c);
c(:) = a + b;

fi(a, S, WL, FWL) is a constructor for a fixed-point object in the Fixed-Point Toolbox [S: signed, WL: wordlength, FWL: fraction length]
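A hypothetical usage of the converted program (c), assuming the MATLAB Fixed-Point Toolbox: the search engine supplies one numerictype per variable and calls the generated function.

% Wordlengths shown here are illustrative, not the optimizer's output.
numtype.a = numerictype(1, 16, 8);   % signed, WL = 16, FWL = 8
numtype.b = numerictype(1, 16, 8);
numtype.c = numerictype(1, 17, 8);   % one extra bit for the sum
y = adder_fx(0.75, -1.25, numtype);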
Backup
Code Generation
[Screenshots: a floating-point program and the result of running code generation on it]
71
Backup
Running Transformation
• Just call the top function with the input data:
 >> in = rand(1,1000)
 >> mac_top(in)
• The range and the optimum wordlengths depend on the input statistics
72
Backup
Advantages/disadvantages of
wordlength search algorithms
73