Class Report
林常仁
Low Power Design: System and
Algorithm Levels
Why Low Power
• Battery life in portable systems
• Packaging and cooling cost
• Digital noise immunity
• Power supply rail design
• Environmental concerns
• Goal: reduce power dissipation while
maintaining an adequate throughput rate
Low Power Design Approaches
• Run at the minimum allowable voltage
• Reduce the effective switching capacitance per sample
• System: Hardware-software partitioning, power
distribution
• Algorithms: Complexity, concurrency, locality,
regularity, data representation
• Architecture: Parallelism, pipelining, signal
correlations
• Circuit/Logic: Size, logic design, logic style
• Technology: Scaling, threshold reduction,
advanced packaging
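Both approaches can be related to the standard first-order model of CMOS switching power, P = C_eff * Vdd^2 * f, which the slides do not state explicitly. The following is a minimal sketch under that assumption; the function name and the example values are hypothetical and chosen only for illustration.

```python
def dynamic_power_watts(c_eff_farads, vdd_volts, f_hz):
    """First-order switching power: P = C_eff * Vdd^2 * f (assumed model)."""
    return c_eff_farads * vdd_volts ** 2 * f_hz


# Hypothetical operating point: 1 nF effective capacitance, 1 MHz sample rate.
p_high = dynamic_power_watts(1e-9, 5.0, 1e6)
p_low = dynamic_power_watts(1e-9, 2.5, 1e6)

# Halving the supply voltage cuts switching power to one quarter,
# while halving C_eff only cuts it to one half.
print(f"power ratio after halving Vdd: {p_low / p_high:.2f}")
```

The quadratic dependence on voltage is why running at the minimum allowable voltage is listed first; reducing the switched capacitance per sample gives a linear saving on top of it.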
Level of Power Reduction
Level of Abstraction: Expected Saving
• Algorithm: 10-99%
• Architecture: 10-90%
• Logic Level: 20-40%
• Layout Level: 10-30%
• Device Level: 10-30%
[Figure arrows alongside the levels: "Increasing Leverage" and "General Purpose Applicable"]
System Level Optimization
• System partitioning is very important for a low-power
implementation of a time-slicing OFDM receiver or a
system-on-chip (SOC) application
• Energy consumption determines the battery life.
• Functions are implemented in different modes:
-- Active modes with different clocks (voltage)
-- Standby mode with slow clock
-- Sleep or suspend mode (slowest clock or shut
down)
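The link between operating modes and battery life can be sketched as a time-weighted average of the power drawn in each mode. This is only an illustration: the function names, the mode powers, the duty-cycle fractions and the battery capacity are all made-up assumptions, not figures from the slides.

```python
def average_power_mw(modes):
    """modes: list of (fraction_of_time, power_mw) pairs; fractions must sum to 1."""
    total_fraction = sum(fraction for fraction, _ in modes)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("mode fractions must sum to 1")
    return sum(fraction * power for fraction, power in modes)


def battery_life_hours(capacity_mwh, avg_power_mw):
    """Idealized battery life: stored energy divided by average power."""
    return capacity_mwh / avg_power_mw


# Hypothetical duty cycle: active 10%, standby 20%, sleep 70% of the time.
modes = [(0.10, 100.0), (0.20, 10.0), (0.70, 1.0)]
avg = average_power_mw(modes)
print(f"average power: {avg:.1f} mW")
print(f"battery life: {battery_life_hours(1270.0, avg):.0f} h")
```

With these numbers the active mode dominates the average even though it runs only a tenth of the time, which is why moving functions into standby or sleep modes pays off.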
Power Reduction by Clock Gating
[Figure: a common Clock feeds Module Unit 1, Module Unit 2, ..., Module Unit N; each unit's clock is gated by its own enable signal (Enable 1, Enable 2, ..., Enable N).]
• A circuit that keeps running in standby or active
mode is needed to generate the enable signals
• Modules are partitioned by
-- application function
-- speed of implementation
• In SOC applications, the global clock
might activate the local clock generator
• Power consumption can be reduced with a
globally asynchronous, locally synchronous
(GALS) design style
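The effect of per-module enable signals can be illustrated by counting how many clock pulses actually reach the modules. This is a behavioral sketch only, not a circuit description; the schedule format, the function name and the example enable pattern are assumptions made for illustration.

```python
def clock_activity(enable_schedule):
    """Count clock pulses delivered to modules with and without gating.

    enable_schedule: one tuple per clock cycle, holding a 0/1 enable bit
    for each module unit (hypothetical data layout).
    """
    num_units = len(enable_schedule[0])
    ungated = len(enable_schedule) * num_units          # every unit clocked every cycle
    gated = sum(sum(cycle) for cycle in enable_schedule)  # only enabled units clocked
    return gated, ungated


# Hypothetical enable pattern for three module units over four cycles.
schedule = [
    (1, 0, 0),
    (1, 1, 0),
    (0, 1, 0),
    (0, 0, 1),
]
gated, ungated = clock_activity(schedule)
print(f"clock pulses delivered: {gated} gated vs {ungated} ungated")
```

The count ignores the power spent by the enable-generation logic itself, which the slide notes must stay active.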
Stopping Clock of Unused Block
[Figure: Function A and Function B blocks selected through the 0 and 1 inputs of a multiplexer, shown in two configurations.]
Algorithm Level Optimization
• Apply fast algorithms to reduce the average
switched capacitance CL per sample
• Multiplications are traded off against additions
• Can be combined with other low-area/low-power
techniques via voltage scaling
• Select a suitable algorithm that meets the
requirements and reduces the computation
• Algorithm transforms: parallel/pipelined
processing, look-ahead, retiming, folding,
unfolding, strength reduction
Algorithm Optimization - Example
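[Figure: two signal-flow graphs computing outputs y0 and y1 from inputs x0, x1, x2 and coefficients h0, h1.]
• Direct form: 4 multipliers, 2 adds
• Transformed form, using x0+x1, x1+x2 and h1-h0: 3 multipliers, 5 adds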
Winograd’s algorithm reduces the number of multiplications
at the cost of more additions
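One reading of the example that is consistent with the stated operation counts is y0 = h0*x0 + h1*x1 and y1 = h0*x1 + h1*x2. The sketch below assumes that reading (the original figure is not fully recoverable) and checks that a three-multiplier form built from x0+x1, x1+x2 and h1-h0 gives the same outputs. The function names are hypothetical.

```python
def direct_form(x0, x1, x2, h0, h1):
    """4 multiplications, 2 additions."""
    y0 = h0 * x0 + h1 * x1
    y1 = h0 * x1 + h1 * x2
    return y0, y1


def reduced_form(x0, x1, x2, h0, h1):
    """3 multiplications, 5 additions/subtractions."""
    m1 = h0 * (x0 + x1)      # add 1, multiply 1
    m2 = (h1 - h0) * x1      # add 2, multiply 2 (shared by both outputs)
    m3 = h1 * (x1 + x2)      # add 3, multiply 3
    y0 = m1 + m2             # add 4
    y1 = m3 - m2             # add 5
    return y0, y1


sample = (1, 2, 3, 4, 5)
print(direct_form(*sample), reduced_form(*sample))
```

When the coefficients are fixed, h1-h0 can be computed once in advance, so the run-time cost of the reduced form drops to 3 multiplications and 4 additions per output pair.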
Precomputation-Based Optimization
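[Figure: n-bit comparator computing A>B. The most significant bits A(n-1) and B(n-1) always reach the comparator; the registers holding the lower bits A(n-2), ..., A(0) and B(n-2), ..., B(0) have a load-disable input.]
Load is disabled when A(n-1) ≠ B(n-1)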
• Achieves up to 75% power reduction with 3% area overhead
• In the worst case, adds 1 to 5 gate delays
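The idea behind the comparator example can be sketched behaviorally: when the most significant bits differ, the result of A>B is already decided, so the registers for the lower bits need not be loaded. The code below assumes unsigned n-bit operands and is not a model of the actual circuit; the function name and return convention are hypothetical.

```python
def compare_with_precomputation(a, b, n):
    """Return (a > b, low_bits_loaded) for unsigned n-bit integers a and b."""
    msb_a = (a >> (n - 1)) & 1
    msb_b = (b >> (n - 1)) & 1
    if msb_a != msb_b:
        # Result is decided by the MSBs alone; lower-bit registers stay disabled.
        return msb_a == 1, False
    low_mask = (1 << (n - 1)) - 1
    # MSBs are equal, so the lower n-1 bits must be loaded and compared.
    return (a & low_mask) > (b & low_mask), True


n = 4
pairs = [(a, b) for a in range(1 << n) for b in range(1 << n)]
loads = sum(compare_with_precomputation(a, b, n)[1] for a, b in pairs)
print(f"lower-bit registers loaded in {loads} of {len(pairs)} cases")
```

For uniformly distributed inputs the lower-bit registers are loaded in only half of the cases; the actual power saving depends on the input statistics and on the overhead of the disable logic.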
Don’t Care Optimization
f  x1 x2 ( x3  x4    xn )  h( x3 ,, xn )
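[Figure: two implementations of f with inputs x1, x2, x3, ..., xn, registers R1 and R2, and blocks h and f; the second implementation adds a load enable (LE) and a flip-flop (FF).]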
Comparison of 8x8 DCT Algorithms
Algorithm            Multiplications   Additions
Brute Force          4096              4096
Row-Column           1024              1024
Chen [CSF77]         256               416
Ligtenberg [LV86]    208               464
Arai [AAN88]         80                464
Feig [FW92]          54                462
Lee [CL92]           112               472
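The multiplication counts in the first two rows can be reproduced with a short sketch. It assumes the brute-force method evaluates each of the 64 outputs as a 64-term sum with a precomputed 2-D kernel, and that the row-column method applies a direct 8-point 1-D DCT to every row and then every column. The function names are hypothetical, and only multiplications are counted.

```python
import math

N = 8


def basis(k, n):
    """Orthonormal DCT-II basis value for frequency k at sample n."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct2d_brute_force(block):
    mults = 0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for i in range(N):
                for j in range(N):
                    kernel = basis(u, i) * basis(v, j)  # treated as a precomputed constant
                    acc += block[i][j] * kernel
                    mults += 1
            out[u][v] = acc
    return out, mults


def dct1d(vec):
    """Direct 8-point DCT: 8 outputs, 8 multiplications each."""
    out = [sum(vec[n] * basis(k, n) for n in range(N)) for k in range(N)]
    return out, N * N


def dct2d_row_column(block):
    mults = 0
    rows = []
    for row in block:                      # 8 row transforms
        y, m = dct1d(row)
        rows.append(y)
        mults += m
    out = [[0.0] * N for _ in range(N)]
    for j in range(N):                     # 8 column transforms
        y, m = dct1d([rows[i][j] for i in range(N)])
        mults += m
        for i in range(N):
            out[i][j] = y[i]
    return out, mults


block = [[(3 * i + 5 * j) % 11 for j in range(N)] for i in range(N)]
brute, brute_mults = dct2d_brute_force(block)
rowcol, rowcol_mults = dct2d_row_column(block)
print(brute_mults, rowcol_mults)
```

The remaining rows of the table come from fast 1-D or 2-D factorizations that this sketch does not implement.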
References
• A. P. Chandrakasan and R. W. Brodersen,
Minimizing Power Consumption in Digital CMOS
Circuits, Proceedings of the IEEE, pp. 498-523, April
1995.
• M. Mehendale and S. D. Sherlekar, VLSI Synthesis
of DSP Kernels, Kluwer Academic Publishers,
2001.
• K. K. Parhi, VLSI Digital Signal Processing
Systems – Design and Implementation, John
Wiley & Sons, 1999.
• S. S. Rofail and K. Yeo, Low-Voltage, Low-Power
Digital BiCMOS Circuits, Prentice Hall, 2000.