Class Report, 林常仁
Low Power Design: System and Algorithm Levels

Why Low Power
• Battery life in portable systems
• Packaging and cooling cost
• Digital noise immunity
• Power supply rail design
• Environmental concerns
• Goal: reduce power dissipation while maintaining an adequate throughput rate

Low Power Design Approaches
• Run at the minimum allowable supply voltage
• Reduce the effective switched capacitance per sample
Techniques at each level of design:
• System: hardware-software partitioning, power distribution
• Algorithms: complexity, concurrency, locality, regularity, data representation
• Architecture: parallelism, pipelining, signal correlations
• Circuit/Logic: sizing, logic design, logic style
• Technology: scaling, threshold reduction, advanced packaging

Level of Power Reduction
Level of abstraction    Expected saving
Algorithm               10-99%
Architecture            10-90%
Logic level             20-40%
Layout level            10-30%
Device level            10-30%
Leverage increases toward the higher levels of abstraction, while the lower-level techniques are more general-purpose and broadly applicable.

System Level Optimization
• System partitioning is very important for the low-power implementation of a time-slicing OFDM receiver or a system-on-chip (SOC) application.
• Energy consumption determines the battery life.
• Functions are implemented in different modes:
-- Active modes with different clocks (and supply voltages)
-- Standby mode with a slow clock
-- Sleep or suspend mode (slowest clock, or clock shut down)

Power Reduction by Clock Gating
[Figure: a global clock drives Module Units 1 to N, each gated by its own enable signal, Enable 1 to Enable N.]
• A circuit that stays in standby or active mode is needed to generate the enable signals.
• Modules are partitioned by:
-- application function
-- speed of implementation
• In SOC applications, the global clock may activate the local clock generators.
• A globally asynchronous, locally synchronous (GALS) design style can also be used to reduce power consumption.

Stopping Clock of Unused Block
[Figure: two blocks, Function A and Function B, each with a clock-enable bit (0 or 1); the clock of whichever block is not in use is stopped.]

Algorithm Level Optimization
• Apply a fast algorithm to reduce the average switched capacitance $C_L$ per sample
• Multiplications are traded off against additions
• Can be combined with other low-area/low-power techniques through voltage scaling
• Select a suitable algorithm that meets the requirements while reducing the amount of computation
• Algorithm transforms: parallel/pipelined processing, look-ahead, retiming, folding, unfolding, strength reduction

Algorithm Optimization: Example
Two outputs of a 2-tap filter, computed directly (4 multipliers, 2 adds):
$y_0 = h_0 x_0 + h_1 x_1$
$y_1 = h_0 x_1 + h_1 x_2$
The same outputs computed with 3 multipliers and 5 adds:
$y_0 = (x_0 + x_1)\,h_0 + x_1\,(h_1 - h_0)$
$y_1 = (x_1 + x_2)\,h_1 - x_1\,(h_1 - h_0)$
The product $x_1\,(h_1 - h_0)$ is computed once and shared by both outputs. Winograd's algorithm reduces the number of multiplications at the cost of additional additions.

Precomputation-Based Optimization
[Figure: n-bit comparator computing A > B. The MSBs A(n-1) and B(n-1) feed precomputation logic that disables loading of the registers holding the lower bits, A(n-2) to A(0) and B(n-2) to B(0).]
• Loading of the lower-bit registers is disabled when A(n-1) ≠ B(n-1), because the MSBs alone then decide whether A > B.
• Achieves up to 75% power reduction with 3% area overhead.
• In the worst case, 1 to 5 additional gate delays are introduced.

Don't Care Optimization
The function has the form $f = x_1 x_2 \, h(x_3, \ldots, x_n)$. Whenever $x_1 x_2 = 0$, f = 0 regardless of h, so the value of h is a don't care and its inputs need not be loaded.
[Figure: original and don't-care-optimized implementations of f, using input registers R1 and R2, the subfunction h, a load-enable (LE) signal, and a flip-flop (FF).]

Comparison of 8x8 DCT Algorithms
Algorithm            Multiplications   Additions
Brute Force          4096              4096
Row-Column           1024              1024
Chen [CSF77]         256               416
Ligtenberg [LV86]    208               464
Arai [AAN88]         80                464
Feig [FW92]          54                462
Lee [CL92]           112               472
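The two strategies under Low Power Design Approaches, lowering the supply voltage and reducing the switched capacitance per sample, follow from the standard CMOS dynamic-power relation: average power = C_eff x Vdd^2 x sample rate, where C_eff is the effective capacitance switched per sample. The Python sketch below evaluates that relation; the capacitance, voltages and sample rate are hypothetical values chosen only for illustration, and leakage power is ignored.

```python
def energy_per_sample(c_eff: float, vdd: float) -> float:
    """Dynamic energy (J) per sample: effective switched capacitance times Vdd^2."""
    return c_eff * vdd ** 2


def average_power(c_eff: float, vdd: float, sample_rate: float) -> float:
    """Average dynamic power (W) = energy per sample * samples per second."""
    return energy_per_sample(c_eff, vdd) * sample_rate


# Hypothetical operating point, for illustration only.
C_EFF = 50e-12   # 50 pF switched per sample (assumed)
RATE = 10e6      # 10 Msample/s required throughput (assumed)

p_nominal = average_power(C_EFF, 3.3, RATE)
p_low_vdd = average_power(C_EFF, 1.65, RATE)     # halve the supply voltage
p_low_cap = average_power(C_EFF / 2, 3.3, RATE)  # halve the switched capacitance

print(f"nominal      : {p_nominal * 1e3:.3f} mW")
print(f"Vdd halved   : {p_low_vdd * 1e3:.3f} mW ({p_low_vdd / p_nominal:.2f}x)")
print(f"C_eff halved : {p_low_cap * 1e3:.3f} mW ({p_low_cap / p_nominal:.2f}x)")
```

At a fixed throughput, halving Vdd cuts power to a quarter, while halving C_eff only halves it. A lower Vdd also slows the logic, so the required throughput must then be recovered, for example with the parallel/pipelined transforms listed under Algorithm Level Optimization.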
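For the System Level Optimization slide, battery life depends on the time-weighted average power across the operating modes. A minimal sketch, assuming hypothetical per-mode power figures, time fractions and battery capacity (none of these numbers come from the report), compares an always-active design with one that drops into standby and sleep modes:

```python
from dataclasses import dataclass


@dataclass
class Mode:
    name: str
    power_mw: float   # average power in this mode (hypothetical value)
    fraction: float   # fraction of total time spent in this mode


def average_power_mw(modes: list[Mode]) -> float:
    """Time-weighted average power over all modes."""
    if abs(sum(m.fraction for m in modes) - 1.0) > 1e-9:
        raise ValueError("mode time fractions must sum to 1")
    return sum(m.power_mw * m.fraction for m in modes)


def battery_life_hours(capacity_mwh: float, modes: list[Mode]) -> float:
    """Battery life = stored energy / average power (ignores battery non-idealities)."""
    return capacity_mwh / average_power_mw(modes)


BATTERY_MWH = 3000.0  # hypothetical battery capacity

always_active = [Mode("active", 200.0, 1.0)]
duty_cycled = [
    Mode("active", 200.0, 0.10),
    Mode("standby", 20.0, 0.30),  # slow clock
    Mode("sleep", 0.5, 0.60),     # clock stopped
]

for label, modes in (("always active", always_active), ("duty cycled", duty_cycled)):
    print(f"{label}: {average_power_mw(modes):.1f} mW average, "
          f"{battery_life_hours(BATTERY_MWH, modes):.1f} h battery life")
```

Because the active mode dominates the average, the benefit comes mostly from reducing the time spent active, which is what the clock-gating and mode partitioning described above make possible.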
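The Algorithm Optimization example can be checked numerically. This Python sketch writes out both forms from that slide (the function names are hypothetical) and confirms on random integer inputs that the 3-multiplier form produces the same two outputs as the 4-multiplier direct form.

```python
import random


def direct_2tap(x0, x1, x2, h0, h1):
    """Two outputs of a 2-tap filter, direct form: 4 multiplies, 2 adds."""
    y0 = h0 * x0 + h1 * x1
    y1 = h0 * x1 + h1 * x2
    return y0, y1


def fast_2tap(x0, x1, x2, h0, h1):
    """Same two outputs with 3 multiplies and 5 adds/subtracts."""
    d = h1 - h0          # coefficient-only term; precomputable when h is fixed
    m1 = (x0 + x1) * h0
    m2 = x1 * d          # shared by both outputs
    m3 = (x1 + x2) * h1
    return m1 + m2, m3 - m2


random.seed(0)
for _ in range(1000):
    args = [random.randint(-128, 127) for _ in range(5)]
    assert direct_2tap(*args) == fast_2tap(*args), args
print("3-multiplier form matches the direct form on 1000 random inputs")
```

For fixed coefficients, the subtraction h1 - h0 is done once offline, leaving 4 additions per pair of outputs in hardware.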
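The Precomputation-Based Optimization slide can be modeled behaviorally. The Python sketch below is an assumed cycle-by-cycle model, not the report's circuit: the MSB pair decides the comparison whenever the two bits differ, and only otherwise are the lower n-1 bits loaded and compared. It counts how often the lower-bit registers stay idle on uniformly random inputs; the function name and the 8-bit width are illustrative choices.

```python
import random


def compare_with_precomputation(a: int, b: int, n: int):
    """Return (a > b, lower_loaded) for n-bit unsigned a and b.

    If the MSBs differ, they alone decide the result and the lower-bit
    registers are not loaded (load disabled). Otherwise the lower n-1
    bits are loaded and compared.
    """
    msb = 1 << (n - 1)
    a_msb, b_msb = bool(a & msb), bool(b & msb)
    if a_msb != b_msb:
        return a_msb, False          # A > B exactly when A's MSB is 1
    mask = msb - 1
    return (a & mask) > (b & mask), True


random.seed(1)
N, TRIALS = 8, 10_000
idle = 0
for _ in range(TRIALS):
    a, b = random.getrandbits(N), random.getrandbits(N)
    result, loaded = compare_with_precomputation(a, b, N)
    assert result == (a > b)         # precomputation never changes the answer
    idle += not loaded
print(f"lower-bit registers idle in {idle / TRIALS:.1%} of cycles")
```

With only the MSB pair in the predictor, the lower registers are idle in roughly half of the cycles for uniform random data. Checking more bit pairs in the predictor raises this fraction at the cost of more precomputation logic; the actual saving depends on the input statistics.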
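The first two rows of the Comparison of 8x8 DCT Algorithms table can be reproduced by counting multiplications. The Python sketch below assumes the orthonormal DCT-II definition, treats products of basis constants as precomputed coefficients, and counts only data multiplications, not additions. The fast algorithms in the remaining rows (Chen, Ligtenberg, Arai, Feig, Lee) factor the 1-D transform further and are not shown.

```python
import math

N = 8  # block size for the 8x8 DCT


def scale(k: int) -> float:
    """Orthonormal DCT-II scale factor (assumed normalization)."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# BASIS[k][n] = scale(k) * cos((2n + 1) * k * pi / (2N)); constants in hardware.
BASIS = [[scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
         for k in range(N)]


def dct2_brute_force(x):
    """2-D DCT evaluated directly: every output uses every input."""
    out = [[0.0] * N for _ in range(N)]
    mults = 0
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for i in range(N):
                for j in range(N):
                    # BASIS[u][i] * BASIS[v][j] is a precomputable constant,
                    # so only the data multiply is counted.
                    acc += (BASIS[u][i] * BASIS[v][j]) * x[i][j]
                    mults += 1
            out[u][v] = acc
    return out, mults


def dct1(vec):
    """8-point 1-D DCT as a matrix-vector product (N*N multiplies)."""
    return [sum(BASIS[k][n] * vec[n] for n in range(N)) for k in range(N)], N * N


def dct2_row_column(x):
    """Separable 2-D DCT: 1-D DCT on each row, then on each column."""
    mults = 0
    rows = []
    for r in range(N):
        y, m = dct1(x[r])
        rows.append(y)
        mults += m
    cols = []
    for v in range(N):
        y, m = dct1([rows[r][v] for r in range(N)])
        cols.append(y)  # cols[v][u] holds output coefficient (u, v)
        mults += m
    out = [[cols[v][u] for v in range(N)] for u in range(N)]
    return out, mults


block = [[(3 * i + 5 * j) % 17 for j in range(N)] for i in range(N)]
X_bf, mults_bf = dct2_brute_force(block)
X_rc, mults_rc = dct2_row_column(block)
max_err = max(abs(X_bf[u][v] - X_rc[u][v]) for u in range(N) for v in range(N))
print(f"brute force: {mults_bf} multiplies, row-column: {mults_rc} multiplies, "
      f"max difference {max_err:.2e}")
```

The row-column version gives the same coefficients with a quarter of the multiplications because the 2-D DCT kernel separates into a row transform followed by a column transform.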