Transcript Document
Keeping Hot Chips Cool
Circuits R-US
Ruchir Puri, Leon Stok, Subhrajit Bhattacharya
IBM T.J. Watson Research Center, Yorktown Heights, NY

So, What’s Going On?
[Figure: power density (W/cm^2, log scale) vs. gate length (microns), showing active power and subthreshold power with a shrinking margin between them]
- At the 65nm node, static power is equal to active power
- Clock distribution accounts for half of active power

Why Can’t We Keep Scaling Vt?
[Figure: Device Leakage vs Delay. Leakage (nA/um) and delay (ps) plotted against threshold voltage]

Low Power Opportunities
[Figure: Power4 Timing Histogram. Late mode timing checks (thousands) vs. timing slack (psec); annotations: 5%, 10%, 15%, 20%, "Exploiting positive slacks"]
- Most of the power reduction techniques exploit this positive slack.

Low Power Levers
Structural Techniques
- Voltage islands
- Multi-threshold devices
- Multi-oxide devices
- Minimize capacitance by custom design
- Power efficient circuits
- Parallelism in micro-architecture
Dynamic Techniques
- Clock gating
- Power gating
- Variable frequency
- Variable voltage supply
- Variable device threshold

Outline
- Voltage Islands: Active Power
- Clock & Latch Optimization: Clock Power
- Power Gating: Leakage Power

Minimizing Active Power: Coarse Grained Voltage Islands
[Figure: block diagram; labels: Vdd0, Vdd1, Vdd2, SWITCH, High VT, LOGIC, IP 1, IP 2, Power Management Unit]
- Trade off power for delay by running functional blocks at different voltages
- Can use a mix of low and high Vt to balance performance and leakage
- Switch off inactive blocks to reduce leakage power
- E.g.: Telecom ASIC with 1.0/1.2 V islands saved 16% active power and 50% standby power

Fine-Grained Voltage Islands
[Figure: PowerPC 405; labels: Secondary power drop, Vddl = 1.2V, Vddh = 1.5V]
- No timing degradation and no area increase for the core!
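To make the voltage-island tradeoff concrete, here is a minimal sketch of how active power scales when part of a design moves to a lower supply. It assumes the standard CMOS dynamic power relation (active power proportional to C * V^2 * f) with switching activity and frequency held fixed, and it ignores level-shifter overhead. The function name and the 50% capacitance split are hypothetical choices for illustration; only the 1.2V and 1.5V supplies come from the slides.

```python
def active_power_ratio(vdd_low, vdd_high, low_fraction):
    """Relative active power after moving part of the load to a lower supply.

    Assumes P_active ~ C * V^2 * f with activity and frequency unchanged.
    low_fraction is the share of switched capacitance placed on vdd_low
    (a hypothetical input, not a figure from the slides).
    """
    if not 0.0 <= low_fraction <= 1.0:
        raise ValueError("low_fraction must be between 0 and 1")
    # Power of the logic moved to the low supply scales with (Vlow/Vhigh)^2.
    scale = (vdd_low / vdd_high) ** 2
    return (1.0 - low_fraction) + low_fraction * scale


# Supplies from the fine-grained island slide; the 50% split is assumed.
ratio = active_power_ratio(vdd_low=1.2, vdd_high=1.5, low_fraction=0.5)
savings_percent = round((1.0 - ratio) * 100, 1)
print(f"active power savings: {savings_percent}%")
```

Under these assumptions the logic moved to Vddl burns 64% of its original active power, so the achievable saving depends directly on how much switched capacitance has enough positive slack to tolerate the lower supply.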

Minimizing Clock Power: Local Clock Buffer - Latch Clustering
- Clocks consume a large amount of power in high-performance designs
- A large portion of that power goes to the last stage of the clock tree
- Minimize the capacitive loading on local clock buffers by clustering latches around them
- Tradeoff between latch placement flexibility and clock power savings
- Reduction in clock skew between capturing and launching latch compensates for loss in latch placement flexibility

Clock Power Savings
[Figure: % capacitance savings (wire and total) per clock net, for nets c1_0 to c1_12 and c2_0 to c2_12]
- Reduces total capacitance on the local clock buffer by 25%
- Direct savings in clock power in the Random Control Logic

Minimizing Leakage Power: Power Supply Gating
[Figure: logic block with a footer switch controlled by SLEEP]
- Leakage power is now more than switching power
- Limits the performance of microprocessors
- Power gating is one of the most effective ways of minimizing leakage power
- Cut off power to inactive units/components
- Dynamic/workload based power gating
- Reduces both gate and sub-threshold leakage
- 20x to 2000x reduction in leakage with little or no cycle time penalty

Power Gating Concept: Performance on Demand
[Figure: two chip configurations with processors P1 to P4 and L2, with the dedicated units off or on]
- Dedicated units off: more power available to scalar units, for higher SPEC performance
- Dedicated units on: dedicated units available for higher application performance

Normal Operation Mode
[Figure: power-gated core and IDS vs. VDS curves; labels: VDDL, CORE, VGND, GNDL, IDS,MAX, IACTIVE, VDS,LINEAR, VGS = VDD, VGS = 0 V]
- To reduce the performance degradation, the voltage drop across the SLEEP transistor should be minimized to reduce active leakage current.
- Requires sizing up of footer device.

Sleep Mode
[Figure: power-gated core and IDS vs. VDS curves; labels: VDDL, CORE, VGND, GNDL, IDS,MAX, VGS = VDD, VGS = 0 V]
- During the sleep mode, all of the internal capacitive nodes and the VGND node are charged up to near VDD.
- Requires sizing down of footer device to reduce standby leakage.

Wake-Up Mode
[Figure: power-gated core and IDS vs. VDS curves; labels: VDDL, CORE, VGND, GNDL, IDS,MAX, ITURN_ON, Rs, VGS = VDD, VGS = 0 V]
- When the SLEEP transistor is turned on, the maximum instant current can flow.
- Requires sizing up of footer device.

Sleep / Wake / Run State Control
[Figure: state diagram for entering and exiting the sleep state; labels include off, wake, run, wake/run, charge, discharge, run enable, fence assert, fence deassert]

[Figure: Power Supply Current (leakage), gxpsi_channel_mac. Current (mA) vs. time (nsec), with annotated regions: run (idle), charge cycles, sleep, discharge cycle (wake)]

Footer Selection and Sizing
[Figure: Power Gate Area vs. Frequency and Leakage Reduction. Frequency loss (%) and % of reference leakage vs. footer gate width (um); series: Reg. Vth, Reg. Vth lkg; annotations: 10x-20x leakage reduction, < 1% frequency loss]

Power vs Performance Tradeoff (130nm Hardware)
- ~8% performance degradation due to sleep transistor with 1% area overhead
- Target specification: 250MHz at 0.9V to 500MHz at 1.4V
- 1% footer size is used for a 2-stage pipelined 40-bit ALU

Sleep Transistor Sizing and Performance (130nm Hardware)
[Figure: annotations: "Less Than 2% Performance Degradation", "More Than 8% Performance Degradation"]

Leakage Power Reduction (130nm Hardware)
- Leakage suppression using VDD scaling: ~8.4x
- Leakage suppression using power gating structure with 1% area overhead: ~2000x

Physical Design: External Footer Switch
[Figure: labels: Global Grid, GND, VGND, Macro/Core, M1 metal, Virtual Grid, M2 metal, Footer Switch Location]

Physical Design: Internal Footer Switch
[Figure: labels: GND, VDD, VGND, M1 metal, M2 metal, Footer Locations]
- Internal fine-grained power gating is more efficient in addressing electro-migration and current delivery.

Ground Redistribution
- The ‘real’ chip-level ground distribution is M4 and above.
- It is unchanged by power gating.
- This part of the redistribution is electrically similar to an unmodified distribution.
[Figure: metal stack; labels: Virtual ground, M3, V2, M2, V1, M1, Contact, Logic Device, Global ground, Footer Cell]

Physical Design: Footer Insertion
[Figure: layouts without footers and with footers, showing footer rows]

Power Gating in High-Performance
[Figure: layouts of non-gated logic and gated logic]
- Gated and non-gated logic have identical width
- 5% total area overhead for power gating
- 20X leakage reduction
- <1% performance degradation

Power Gating: Footer Area Overhead
[Figure: % WC area overhead per macro (1 to 15) for Custom and RLM macros; annotations: 10.4%, 5.7%, 10mV Virtual Ground]

Conclusions
- Power is the limiting factor in traditional CMOS scaling and must be dealt with aggressively
- Controlling leakage is crucial for future scaling
- Power gating and voltage islands are effective techniques to minimize leakage and active power
- Special consideration must be given to clock distribution in high performance designs to minimize clock power
- To keep hot chips cool, a holistic power minimization approach across the whole design stack is required, which must include:
  - Device level techniques
  - Circuit level techniques
  - System level power management