Various Low-Power SoC
Design Techniques
Chong-Min Kyung
KAIST
Contents
• Introduction
• Power Management
using Voltage Island Technique
• Energy (Power) Management Approach by
ARM
• Low Power Design Example with Samsung
AP based on ARM 920T
• IBM Low Power Design using PowerPC
• Conclusions
Why Low Power?
• Limited Battery Capacity (Mobile
Devices)
• For Minimal Heat Dissipation (Heat
Sink, Cooler, System
Size/Weight/Cost)
• For Chip/System Reliability
• Save Energy; it’s limited after all!
Power vs. Energy
• Power-Critical Applications ;
– Heat Dissipation Requirement
– Power/Ground Metal Line Width
– Power/Ground Bounce due to IR drop
• Energy-Critical Applications ;
– Battery Lifetime
– Heat Dissipation Requirement
Applications for Low Power
Technology
• Medical ; Implantable hearing-aid,
cardiac pacemaker
• Mobile Devices ; cellular phone
• Military Devices ;
• Hard-to-access points ; Space
• Too-many-to-access points ;
Sensors/Actuators in Ubiquitous
World
• Power Management
using Voltage Island Technique
Typical Power Optimization Procedure
[Flowchart; boxes: Applications; Constraints (Delay, Power, Area, Noise); Functional Partitioning; H/W Description and Synthesis; Gate-Level Power Optimization; Cell/Interconnect Delay and Power Modeling; Switching Activity; Technology Files; Vdd, Vt, Wg, Wint Optimization; Power-optimized Net List; Initial Layout (Standard Cell/Wire Place/Route and Layout); Parasitic (Resistance, Capacitance) Interconnects from layout; Customized Layout (Parameterized Cell/Wire Design, Place/Route and Layout); Verification for Min-Power, Delay, Area, Noise (Y/N decision); Optimized Vdd, Vt, Wg, Wint.]
Power Challenge
• Active power density increasing with device scaling and increased frequency
• Leakage power density increasing due to lower Vt and gate leakage
• Stressing packaging, cooling, battery life, etc.
– Complicates IDDq testing as well
[Figure: thinning gate oxides increase gate tunneling leakage]
Source: Bergamaschi
Low Power Levers
• Dynamic Techniques
– Clock gating
– Data gating
– Power gating
– Variable frequency
– Variable voltage supply
– Variable device threshold
• Structural Techniques
– Voltage Islands
– Multi-threshold devices
– Multi-oxide devices
– Minimize capacitance by custom design
– Power efficient circuits
– Parallelism in micro-architecture
Standby Mode Leakage Suppression
• Disconnect inactive logic from supply in standby mode
– Multi-threshold
  • Use higher Vt header/footer
  • Suppresses logic leakage (gate & sub-threshold)
– Multi-oxide
  • Use thick oxide header/footer
  • Suppresses gate leakage
– Header/footer gate voltage
  • Overdrive: increases freq.
  • Under-drive: reduces leakage
– Header/footer well bias
  • Forward bias: increases freq.
  • Reverse bias: reduces leakage
• Voltage Islands
Standby Power Reduction Mechanism
• On-chip supervisor manages standby power
– Clock gating
  • Functional clock gating (fine clock control)
– Voltage scaling, shutdown
– SOC latch save/restore
– Timeout and interrupt driven
[Block diagram; labels: DC/DC Supplies (Select, Shutdown); 1.0-1.8V Scalable VDD Domain; 3.3V I/O; Battery-Backed Domain with RTC, Suspend Ctrl Logic, Scan Ctrl Logic and Reset Logic; SoC Logic LSSD Latches and Scan Chains; IIC Ctrl and Serial NVRAM (Clk, Data); signals System Clks Freeze, I/O Freeze, Clk, Irq, PG, Wake, Reset.]
Voltage Island Concept
[Diagram; labels: Vdd1, Vdd2, Vddo, two SWITCH blocks, Logic, Low VT Logic, IP1, IP2, Power Management Unit.]
• Trade off power for delay by running functional blocks at different voltages
• Can use mix of Low and High Vt to balance performance and leakage
• Switch off inactive blocks to reduce leakage power
• Requires IP standards for power management, clock gating, etc.
E.g.: Telecom ASIC with 1.0/1.2 V islands saved 16 % active power and 50 % standby power
[Plot: Delay vs. Voltage. Δdelay (ps, 0 to 30) against Vdd (0.7 to 1.3 V) for Std. Vt and Low Vt devices.]
Source: Bergamaschi
Power Management Unit
[Block diagram; labels: Bus Interfaces; IP Core Interfaces; Reconfigurable Register Units; Power Management State Machine; Performance Control Unit; Clock Control Unit; Power Control Unit; Monitor Unit (Device Performance Monitor, Thermal Monitor, Timer / Counter); DC/DC Converter; Well-bias generator; Clock generator; Clock & Power-Gating.]
Busses with Different Voltages
• One clock & one signaling voltage
• Some approaches:
– Temporarily scaling V & F for communication
– Separate different voltages with bridges
[Diagram: Hot Bus, Cold Bus and Cool Bus joined by bridges.]
Power Management
[Chip floorplan with I/O’s, VReg and Gnd on all four sides; labels in source order: ROM Vdd 1; DSP Vdd 2; Vdd 1; RLM 1; RLM 2; Microcontroller Vdd 2; ROM Vdd 1; Analog Vdd 5; Memory Arrays Vdd 3 (Low Vt device arrays, optimized for low active power); RLM 3; Monitor Logic Vdd 4; Memory Arrays Vdd 4 (High Vt device arrays, optimized for low active power).]
• Independently controlled domain power switches
• Multiple On-Chip Voltage Islands
• On-Chip Voltage Regulators
Functional Partitioning
• Identifying functional components with similar inactive periods
• Assigning functional components to possible chip-level power sources capable of providing required voltage level
• Identifying the optimal grouping of components, based upon power sequencing (affects static power) and operating voltage (affects active power) that minimizes chip power within the limits (such as peak power) of the SoC
• Identifying or creating, and connecting, logic signals that will be used to control power-sequencing circuitry or control clock gates
• Connecting alternate voltage sources to latches or arrays used to save state across power sequencing
Controlling VDD and VTH for low power

               Active        Stand-by
Multiple VTH   Dual-VTH      MTCMOS
Variable VTH   VTH hopping   VTCMOS
Multiple VDD   Dual-VDD      Boosted gate MOS
Variable VDD   VDD hopping

Side labels: Software-hardware cooperation; Technology-circuit cooperation
• MTCMOS : Multi-Threshold CMOS
• VTCMOS : Variable Threshold CMOS
• Multiple : spatial assignment
• Variable : temporal assignment
Dynamic power reduction
• Through Software-hardware cooperation
– OS and application programming
[Figure: normalized power (P ∝ fV²) versus required speed (∝ f), both axes from 0 to 1; power grows super-linearly with required speed. A software controller sets Clock & VDD for the hardware processor according to the required speed.]
If you don’t need to hustle, relax and save power
Voltage Scaling Mechanism
• Four power domains
• On-chip supervisor for SOC voltage supplies
• Level shifting and latching circuits at domain interfaces
[Block diagram of the four domains: voltage-scalable 1.0V-1.8V logic supply domain (CPU Core, Caches, I/O Intf Logic, Memory Intf, Accelerators); persistent 1.8V battery-backed domain (Suspend Ctrl Logic, RTC Logic); regulated 1.0V PLL supply domain (Linear Regulator); constant 3.3V I/O domain (Drivers, Recvrs). Other labels: DC/DC Supplies (3.3V and 1.0V-1.8V) with Select and Shutdown; Battery.]
Dynamic Voltage/Frequency Scaling
Freq. changed and Vdd dropped from 1.8V to 1.0V
• PLL locked at 533MHz with CPU clock switched from 266MHz to 66MHz to 266MHz
• Continues to execute Dhrystone benchmark
Low Leakage Cells – Standby Power Reduction
• Dual-Vt Storage Cells
– Low Vt for high performance
– High Vt for low leakage
• Gated Vdd and DRG
– Power Switch
• Sub-threshold leakage current dominates
• Energy (Power) Management
Approach by ARM
Need for Energy Management
• Today’s mobile consumers want:
– longer battery life and
– smaller, lighter products
• Manufacturers are adding new features
and applications to add product appeal:
– media players (audio, video)
– gaming
– video capture
• Increasing processing power requirements and longer battery life are conflicting requirements
• Battery technology alone offers only incremental
improvement over the next several years
Higher performance, higher power
[Plot: power consumption (mW, log scale from 1 to 1000) versus Dhrystone MIPS (0 to 500) for ARM7, ARM9 and ARM10/11 cores on 0.18um and 0.13um processes.]
Layers of power optimizations
– Software (OS, applications)
– System – Architecture
– Micro-architecture
– Circuits
– Ambient environment
– Si conditions
– Power delivery
• Important to optimize design at each level
• ARM’s partners have widely varying design-time, technology, legacy, cost constraints.
• IEM: current focus on top two layers
– Widely applicable dynamic power-optimizations
– Optimize for the requirements of the specific workload
Conventional Power Management
• Conventional power management schemes manage the
transitions between defined power states
[State diagram: a Power Manager handles transitions among the ON, IDLE, STANDBY and RESTART states.]
– STANDBY is off but with state retained with clocks stopped
– IDLE is a lower power mode with a slow clock running
– ON state is fully powered up at maximum clock frequency
• Despite the changing software workload, system runs at
maximum performance while there is any work to be
done
Optimizing for utilization characteristics
• Conventional power management optimizes power
consumption when there is nothing to do (sleep modes).
• IEM optimizes power when work is being done.
– Only run fast enough to meet deadlines!
– Running fast and idling wastes power.
• The active- and sleep-mode techniques are orthogonal.
[Figure: two panels, "Utilization" and "Dynamic Voltage Scaling", each showing energy used on a 0% to 100% scale.]
Meeting the Performance Requirement
• Effective Energy Management requires:
1. Automatic Performance Prediction technology
• Determining the lowest performance level that will get the software workload done just in time
2. Performance Scaling technology
• Delivering just enough performance to meet the current requirement
• Responding rapidly to changing performance levels
[Diagram; labels: Performance Prediction and Monitoring; Scaling Technology (Voltage Scaling, Threshold Scaling).]
Energy Management Control Components
• Software component
– To automatically predict future software workloads by
interacting with instrumented Operating Systems and
application software
– To determine the software deadlines
– To balance workload and deadlines with performance
• Hardware component
– To accurately measure the actual system performance
– To independently manage the transitions of hardware scaling
blocks. e.g., clock generators and power controllers
• Together these components determine and manage
the lowest performance level that gets the work done
Adaptive Voltage Scaling (AVS)
• AVS is a closed loop control mechanism.
– Feedback from the PMU indicates the earliest opportunity to
change processor frequency based on the voltage levels
being output to the SoC.
– APC monitors the difference between the requested
performance level and the actual level achieved.
• Taking into account variations due to differences in
process technology and ambient temperature the
system dynamically changes the voltage applied.
• The lowest energy consumption is achieved OR
a specified performance level can be met.
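The closed-loop idea above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the APC's actual algorithm: `hpm_reading` is a made-up stand-in for a hardware performance monitor, the 400 mV offset, the 25 mV step and the 900 to 1800 mV range are invented, and peeking at the next-lower step stands in for whatever slack margin a real monitor reports.

```python
def hpm_reading(vdd_mv, speed_factor):
    """Hypothetical monitor: achievable performance grows with VDD.

    speed_factor models process and temperature (fast silicon > 1.0).
    """
    return speed_factor * (vdd_mv - 400) / 1000.0


def avs_settle(target, speed_factor, vdd_mv=1800, step_mv=25,
               vmin_mv=900, vmax_mv=1800):
    """Step VDD until it is the lowest setting that still meets `target`."""
    while True:
        if hpm_reading(vdd_mv, speed_factor) < target:
            if vdd_mv + step_mv > vmax_mv:
                return vdd_mv              # cannot raise the supply further
            vdd_mv += step_mv              # too slow: raise voltage
        elif (vdd_mv - step_mv >= vmin_mv
              and hpm_reading(vdd_mv - step_mv, speed_factor) >= target):
            vdd_mv -= step_mv              # slack available: lower voltage
        else:
            return vdd_mv                  # lowest voltage meeting the target


slow_part_mv = avs_settle(target=1.0, speed_factor=1.0)
fast_part_mv = avs_settle(target=1.0, speed_factor=1.2)
```

In this sketch the faster part settles at a lower supply than the slower one for the same requested performance, which is the process-dependent saving that a fixed voltage table cannot capture.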
• Low Power Design Example with
Samsung AP based on ARM 920T
Limited Battery Improvement
• Power Increase vs. Battery Improvement

Year                          2001  2004  2007  2010  2013  2016
Feature Size (nm)              130    90    65    45    32    22
Dynamic Power Reduction (X)      0   1.5   2.5   4.0   7.0    20
Stand-by Power Reduction (X)     2     6    15    30   150   800
[ITRS 2001]

[Figure: volumetric energy density (Whr/L, 100 to 800; higher is smaller) versus gravimetric energy density (Whr/Kg, 200 to 900; higher is lighter) for Ni-MH, Li-Ion / Polymer and Fuel Cell. Callouts: Cellular Phone, Talk Time: 2Hrs ~ 4Hrs, Standby: about 1 week; Cellular Phone, Talk Time: about 12Hrs, Standby: about 1 month.]
• Only 4~5 X improvement in battery lifetime!
Problem Statement
• Power Analysis on CMOS Inverter
(a) Capacitive Current: input switching to '1' or '0'; Cload is charged and discharged
(b) Short Circuit Current: Vthn < Input < VDD-|Vthp|
(c) Static Leakage Current: input at '1' or '0' steady state
Problem Statement
• Dynamic Power
$$P_{switching} = C_{switching} \cdot V_{DD}^{2} \cdot f$$
• Average Short Circuit Current
$$I_{SC} = \frac{\beta \, \tau_{in}}{12 \, V_{DD}} \, (V_{DD} - 2V_{th})^{3} \cdot f$$
gain factor: $\beta_n = \beta_p = \beta$; threshold voltage: $V_{thn} = |V_{thp}| = V_{th}$; $\tau_{in}$: input transition time
• Sub-threshold Leakage Current
$$I_{DS} = K \cdot e^{(V_{GS} - V_{th})\,q / nkT} \cdot \left(1 - e^{-V_{DS}\,q / kT}\right)$$
$K$: function of technology, $V_{GS}$: gate-to-source voltage, $V_{DS}$: drain-to-source voltage,
$V_{th}$: threshold voltage, $q$: electronic charge, $k$: Boltzmann constant, $T$: temperature,
$n$: nonlinearity constant (1 ~ 2), $kT/q \approx 0.0259$ V at room temperature
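As a rough numerical illustration of the three expressions above, the following Python sketch evaluates them directly. The function names, the default n = 1.5 and the sample values are illustrative assumptions, not figures from the slides; clamping the short-circuit term to zero when VDD is at or below 2·Vth is likewise an assumption about where that model stops applying.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q in volts near room temperature


def switching_power(c_switching, vdd, f):
    """Dynamic switching power: C * VDD^2 * f."""
    return c_switching * vdd ** 2 * f


def short_circuit_current(beta, tau_in, vdd, vth, f):
    """Average short-circuit current for a symmetric inverter.

    Assumption: the term is clamped to zero once VDD <= 2 * Vth.
    """
    overdrive = max(vdd - 2.0 * vth, 0.0)
    return beta * tau_in / (12.0 * vdd) * overdrive ** 3 * f


def subthreshold_current(k_tech, vgs, vds, vth, n=1.5):
    """Sub-threshold leakage current; n is the nonlinearity constant (1 to 2)."""
    gate_term = math.exp((vgs - vth) / (n * THERMAL_VOLTAGE))
    drain_term = 1.0 - math.exp(-vds / THERMAL_VOLTAGE)
    return k_tech * gate_term * drain_term


# Halving VDD at a fixed frequency cuts switching power by 4x.
ratio_power = switching_power(1e-9, 1.8, 100e6) / switching_power(1e-9, 0.9, 100e6)

# Lowering Vth by 0.1 V (device off, VGS = 0) multiplies leakage by
# exp(0.1 / (n * kT/q)), roughly 13x for n = 1.5.
ratio_leak = (subthreshold_current(1.0, 0.0, 1.0, 0.3)
              / subthreshold_current(1.0, 0.0, 1.0, 0.4))
```

The two ratios show the tension the later slides return to: supply voltage has a quadratic effect on switching power, while threshold voltage has an exponential effect on leakage.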
Problem Statement
• Domination of Leakage Current

Feature Size       > 0.25um                   0.18/0.13/0.09um…
Performance (AP)   < 200MHz                   300/400/533MHz, 1GHz
Core Voltage       5.0/3.3/2.5V               1.8/1.2/1.0V …
VTH (Threshold)    > +/- 0.6V                 +/- 0.5, 0.4, 0.3V …
TR Leakage         Negligible                 Exponentially growing (SD/Gate)
Stand-by Mode      PLL-off (Clock-off)        VTCMOS/MTCMOS, High VTH/High VDD
Low Power          Focus on Operating Power   Focus on Operating/Stand-by
Active and Leakage Power with CMOS
Scaling
• As CMOS scales down, the following stand-by leakage currents rise rapidly:
– Source to drain leakage
(diffusion+tunneling) as
Lg scales down
– Gate leakage current
(tunneling) as Tox scales
down
– Body to drain leakage
current (tunneling) as
channel doping scales
up
Two cases of Leakage Mechanism
• Turn off (Vg=0V, Vd=Vdd): sub-threshold leakage, source to drain tunneling, drain to body tunneling (BTB)
• Turn on (Vg=Vdd, Vd=0V): gate oxide tunneling
Gate Leakage Current Reduction with
High-K Gate Dielectric
[Plot: current density (A/cm², log scale from 10^-6 to 10^1) versus Tox (Å, 20 to 40), with curves for drain leakage, gate leakage, and gate leakage with a high-K gate dielectric.]
$$C_{ox} = \frac{k \, \varepsilon_0 \, A}{T_{physical}}$$
Gate Leakage Current Reduction with
High-K Gate Dielectric
• As Tox scales gate leakage current increases
exponentially due to exponential increase of
tunneling probability with reduction of physical
tunneling distance.
• Physically thicker gate dielectric allows lower
leakage current but lower oxide capacitance
reducing on-current
• Using high k (dielectric constant) material, both
thicker physical thickness and higher oxide
capacitance can be achieved.
• Applying high-k gate dielectric, several orders of
magnitude lower gate leakage current can be
achieved with similar oxide capacitance
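The trade-off in these bullets follows from the parallel-plate relation Cox = k·ε0·A / T_physical. The sketch below, a minimal illustration rather than a process model, computes how much thicker a high-k film can be while keeping the same capacitance as an SiO2 film. The value k = 20 and the 15 Å starting thickness are hypothetical examples; only the SiO2 permittivity of 3.9 and ε0 are standard constants.

```python
EPS0 = 8.854e-12   # vacuum permittivity, F/m
K_SIO2 = 3.9       # relative permittivity of SiO2


def oxide_capacitance(k, area, t_physical):
    """Parallel-plate capacitance: k * eps0 * A / T_physical."""
    return k * EPS0 * area / t_physical


def equal_capacitance_thickness(k_high, t_sio2, k_ref=K_SIO2):
    """Physical thickness of a high-k film matching the SiO2 capacitance."""
    return t_sio2 * k_high / k_ref


t_sio2 = 15e-10      # 15 angstrom SiO2 film (illustrative)
k_high = 20.0        # hypothetical high-k dielectric
t_high = equal_capacitance_thickness(k_high, t_sio2)   # about 77 angstrom

area = 1e-12         # 1 um^2 gate area (illustrative)
c_sio2 = oxide_capacitance(K_SIO2, area, t_sio2)
c_high = oxide_capacitance(k_high, area, t_high)
```

Because tunneling probability falls off exponentially with physical distance, a film about five times thicker at equal capacitance is what makes the several-orders-of-magnitude leakage reduction plausible.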
Power Saving vs. Abstraction Layers
• Power Saving v.s. Abstraction Layers
Design Time
System/Algorithm/Architecture
have a large potential!
System Level Consideration
for Low Power Design
• Mobile Device’s Behavior according to
Time (Operation Time is less than 10%)
Periodic
Wakeup
Wakeup &
Operation
Idle/Stand-by
Time
“Need Various Power Modes In
System”
Power Management : Example
General Clock Gating
Controlling the individual clock source for
each IP block by the on/off controlling of
each corresponding clock source enable bit
IDLE
Turn off the clock source to the CPU
STOP
Turn off all of the clock sources including
the external X-tal and internal PLLs
SLEEP
Turn off all of the clock sources and also
the power-supply for the internal-logic
except for the wake-up logic circuitry
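The four schemes above can be summarised as a small lookup of what each mode switches off. This is only a sketch: the resource names (`cpu_clk`, `ip_clks`, `xtal`, `pll`, `internal_logic_power`, `wakeup_logic`) and the `NORMAL` mode are illustrative labels, not register names from any Samsung part.

```python
CLOCK_SOURCES = frozenset({"cpu_clk", "ip_clks", "xtal", "pll"})

# What each mode switches off (resource names are illustrative).
MODE_TURNS_OFF = {
    "NORMAL": frozenset(),
    "IDLE": frozenset({"cpu_clk"}),
    "STOP": CLOCK_SOURCES,
    "SLEEP": CLOCK_SOURCES | {"internal_logic_power"},
}


def is_running(mode, resource):
    """True if `resource` stays on in `mode`; wake-up logic is never switched off."""
    if resource == "wakeup_logic":
        return True
    return resource not in MODE_TURNS_OFF[mode]


def set_clock_enable(enable_bits, ip_index, on):
    """General clock gating: set or clear one IP block's clock-enable bit."""
    mask = 1 << ip_index
    return (enable_bits | mask) if on else (enable_bits & ~mask)


# Gate the clock of IP block 2 while leaving blocks 0, 1 and 3 running.
bits = set_clock_enable(0b1111, 2, False)
```

Each mode switches off a superset of what the previous one does, so deeper modes save more power at the cost of a longer wake-up path.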
Dynamic Voltage Scaling
(DVS)
• Reduction of Stand-by Power in Leaky Process
– By Monitoring Data Bus Congestion
– By Monitoring/Guessing Performance Needed,
for Specific Application
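[Figure: supply voltage versus time for a task, without and with DVS; with DVS the task runs at a supply lowered by ΔV, so the task execution time needs to be predicted. Power gain ∝ ΔV²]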
Dynamic Voltage Scaling (DVS)
• Stretch the execution by lowering the supply
voltage
– Quadratic Power saving
– No later than the deadline
• Processors supporting DVS
– Intel Xscale
– Transmeta Crusoe
• DVS Algorithms
– Can be implemented as HW or SW
– Optimal solution in continuous voltage domain,
but not in discrete voltage domain
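A minimal sketch of the discrete case, assuming a table of hypothetical operating points: the ideal frequency is cycles divided by deadline, but with only a few voltage/frequency pairs available the scheduler has to round up to the next point that is fast enough. The operating points and workload below are invented for illustration.

```python
def pick_operating_point(cycles, deadline_s, points):
    """Return the slowest (freq_hz, vdd) pair that still meets the deadline.

    Falls back to the fastest point when no point is fast enough.
    """
    ordered = sorted(points)
    for freq_hz, vdd in ordered:
        if cycles / freq_hz <= deadline_s:
            return freq_hz, vdd
    return ordered[-1]


def relative_energy(vdd, vdd_max):
    """Dynamic energy per cycle scales with VDD^2."""
    return (vdd / vdd_max) ** 2


# Hypothetical operating points: (frequency in Hz, supply voltage in V).
POINTS = [(66e6, 1.0), (133e6, 1.2), (200e6, 1.5), (266e6, 1.8)]

# A task of 100 million cycles due in 1 s ideally needs 100 MHz; the nearest
# discrete point that is fast enough is 133 MHz.
freq, vdd = pick_operating_point(100e6, 1.0, POINTS)
saving = 1.0 - relative_energy(vdd, 1.8)
```

Rounding up leaves some idle time before the deadline, which is one reason a solution that is optimal over continuous voltages is not optimal over a discrete set.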
Voltage Scaling for Low Power
[Diagram: Low Power (P ∝ VDD²) calls for Low VDD, which gives Low Speed (Ids ∝ (VDD - Vth)^(1~2)). Speed Up calls for Low Vth (Ids ∝ (VDD - Vth)^(1~2)), which gives High Leakage (Ileakage ∝ e^(-C × Vth)) and so requires Leakage Suppression.]
Low-Leakage Solution – Technology
[Plot: dynamic power (W, 100n to 100m) versus leakage power (W, 1p to 100n) for high-speed and low-speed designs, with arrows for VDD control (VDD: 1.5V, 1.0V), VTH control (VTH: 0.25V, 0.5V) and MTCMOS.]
VTCMOS & MTCMOS
Multi-Threshold CMOS (MTCMOS)
[Schematic: Low-Vth logic between VDD and GND with a Hi-Vth transistor driven by Sleep.]
Principle
• On-off control of internal VDD or VSS
• Special F/Fs, Two Vth’s
Merit
• Low leakage in stand-by mode.
• Conventional design Env.
Demerit
• Large serial MOSFET
• Ground bounce noise
• Ultra-low voltage region? (1V)
Variable-Threshold CMOS (VTCMOS)
[Schematic: Low Vt logic between VDD and GND; a Vt control circuit biases the N-well (Vpb = VDD or V+) and the P-well (Vnb = 0 or V-).]
Principle
• Threshold control with bulk-bias
• Triple well is desirable
Merit
• Low leakage in stand-by mode.
• Conventional design Env.
Demerit
• Scalability? (junction leakage)
• TR reliability under 0.1 μm
• Latch-up immunity, Vth controllability, Substrate noise, Gate oxide reliability
• Gate leakage current
MTCMOS : Reduce Stand-by Power with
High Speed
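[Schematic: two circuits of normal or low VTH MOSFETs between Vdd and Vss. Without the High VTH switch the logic connects directly to Vss; with the High VTH switch (MTCMOS) the logic connects to a virtual ground, and the High VTH switch sits between the virtual ground and Vss.]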
• With High VTH switch, much lower leakage current flows
between Vdd and Vss
• High VTH MOSFET should have much lower ( >10X) leakage
current compared to normal VTH MOSFET
Multi-Threshold CMOS (MTCMOS)
• Mobile Applications
– Mostly in the idle state
– Sub-threshold leakage Current
• Power Gating
– Low VTH Transistors for High Performance Logic Gates
– High VTH Transistors for Low Leakage Current Gates
[Schematic: a logic component of Low Vth MOS between VDD and a virtual ground (VGND); a High Vth MOS cutoff switch, driven by Sleep Control (SC), between VGND and VSS. Timing: SC sets the operating mode over time (Active, Sleep, Active).]
CCS Sizing
• The effect of CCS size
– As the size decreases, logic performance also
decreases.
– As the size increases, leakage current and chip
area also increase.
– Proper sizing is very important.
– CCS size should be decided within 2%
performance degradation.
[Schematic: Low Vt logic between VDD and a High Vt switch (with Switch Control) to GND.]
Vop = VDD - ΔV
ΔV must be sized within 2% performance degradation.
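One way to turn the 2% budget into a number is sketched below. It assumes the alpha-power-law delay model (delay proportional to V / (V - Vth)^α), treats the drop across the switch as a reduction of the effective supply, and uses invented values for VDD, Vth, α and peak current; none of these come from the slides, and a real sizing flow would rely on circuit simulation.

```python
def relative_delay(delta_v, vdd, vth, alpha=1.3):
    """Delay at supply VDD - delta_v relative to delay at VDD (alpha-power law)."""
    def delay(v):
        return v / (v - vth) ** alpha
    return delay(vdd - delta_v) / delay(vdd)


def max_delta_v(vdd, vth, budget=0.02, alpha=1.3):
    """Largest drop across the switch that keeps delay within `budget` (bisection)."""
    lo, hi = 0.0, 0.5 * (vdd - vth)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if relative_delay(mid, vdd, vth, alpha) <= 1.0 + budget:
            lo = mid
        else:
            hi = mid
    return lo


dv = max_delta_v(vdd=1.2, vth=0.3)    # illustrative supply and threshold
r_switch_max = dv / 10e-3             # on-resistance bound for a 10 mA peak current
```

Under these assumptions the allowed drop is only a few tens of millivolts, which is why the switch ends up large: its on-resistance has to stay below the allowed drop divided by the peak current of the gated logic.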
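Energy Management System
– Open loop
[Block diagram of a System-on-Chip (SoC); labels: ARM Core running Apps, OS and IEM (Intelligent Energy Manager); IEC (Intelligent Energy Controller), receiving Performance requests; PC (Power Controller), with a Comms link to the PMU (Power Management Unit), which supplies Vdd; DCG (Dynamic Clock Generator), which supplies CPU Clk.]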
• IEM and IEC components work together to predict
lowest acceptable processor performance level
• Power Controller, PMU and Clock Generator work
together to deliver that lowest performance level
Energy Management System
– Closed loop
[Block diagram of a System-on-Chip (SoC); labels: ARM Core running Apps, OS and IEM (Intelligent Energy Manager), with a Hardware Performance Monitor; IEC (Intelligent Energy Controller), receiving Performance requests; APC (Adaptive Power Controller); PowerWise Interface to the EMU ("PowerWise" Energy Management Unit), which supplies Vdd; Dynamic Clock Generator.]
• APC operates in closed loop control mode using HPM
to adapt to actual process and temperature
• PowerWise™ Interface provides fast control of EMU
and feedback of status for optimum control
MPEG video playback comparison
[Figure: two stacked-bar charts, "Legendary MPEG" and "Danse De Cable MPEG", showing the fraction of time at each performance level (300, 400, 500 and 600 MHz) under LongRun and under Vertigo.]
• Classical interval-based algorithms (e.g.
LongRun) are too conservative – choose
higher performance than necessary.
Interactive app: Konqueror
[Figure: stacked-bar chart for Konqueror showing the fraction of time at each performance level under LongRun and under Vertigo.]
• Exactly repeating the run of interactive apps is
difficult.
• Our methodology: LongRun in control, estimate
what IEM would have done on that same run.
Energy Management in Action
[Plot: performance (50%, 66%, 83%, 100%) over about 2 seconds of MPEG video, comparing the performance level requested by the algorithm with the closest available performance level of the system. The benchmarked system has 4 performance (frequency and voltage) levels available.]
DVS Control Sub-system
[Block diagram of the IEC and its neighbours; components:]
• Configuration Interface (APB)
• DPM: Dynamic Performance Monitor
• DEM: DVS Emulation
• DPC: Dynamic Performance Controller
• DCG: Dynamic Clock Generator (SoC specific)
• DVC: Dynamic Voltage Controller (SoC specific), with a Voltage vs. Frequency lookup table
• CPU, PMU
• Signals and interfaces: MAXPERF, cpuclk, CLKGEN (Perf. Index, Target, Current), PWRREQ (Target, Current), Interrupts, CLOCK, DATA
DVS operation (with MAXPERF Signalling)
[Waveform: IECMAXPERF, performance level (0% to 100% in 25% steps), VDD and IECCRNTDVCIDX[7] to IECCRNTDVCIDX[4], with three annotated phases:]
• New performance target (50%) requested by IEM S/W: index changes as voltage ramps down and the respective stable point is reached
• Maximum performance requested: index changes as VDD ramps up due to IECMAXPERF
• Back to software programmed performance as IECMAXPERF is cleared: index changes as VDD ramps down
Prototype IEM test chip
• ARM926EJ-S core
• Multiple power domains
• Voltage and frequency scaling of
CPU, caches and TCMs
• First full DVS silicon with National
Semiconductor PowerWise™
technology
• NSC Adaptive Power Controller
(APC) implemented in FPGA
• Includes DVS emulation mode for
comparative tests
• TSMC 0.13μm - CL013G - April Cyber Shuttle
– Packaged parts – 11 August 2003
• Developed by ARM, Synopsys and National
Semiconductor using Synopsys EDA tools
Conclusions
• Along with Process Technology
Scaling, Signal Integrity, SoC
Integration and System Verification,
Low-Power Design is a critical issue.
• Low-power design needs to be
approached at every level, from the
system level (including software and
algorithms) down to the device/process level.
Thank you for your kind
attention!
• IBM Low Power Design using
PowerPC
Platforms for Information Appliances
IBM PowerPC platforms enable highly integrated, power efficient
Information Appliance (IA) chips
[Diagram: the PowerPC Platform brings together the CoreConnect™ Architecture, SOC uP Cores (405/440), IP Cores, Low Power Optimizations and SOC ASIC Tools to produce Custom IA Chips and Application-Specific IA Chips.]
Scalable PowerPC 405 CPU Core
CPU Goals
• Expanded operating voltage range (0.9V to 1.95V)
• Maintain full software and tools compatibility with existing PowerPC 405
• Provide a high performance core capable of high efficiency low power operation
CPU Optimizations
• Redesigned custom circuits within CPU that were sensitive to low voltage operation
• Re-optimize design and timing for extended voltage range
• Verification of equivalence
[Block diagram of the PowerPC 405 Core; labels: 64-bit Processor Local Bus; I-cache and I-cache Control; D-cache and D-cache Control; MMU; Instruction Unit; Branch Unit; Execution Unit; GPRs; Load / Store Pipe; MAC; Timers; Power Mgmt.; Debug/Trace; Interrupts.]
Embedded PowerPC Cores
• PowerPC 405
– 32-bit data, 32-bit address, MMU
– Single-issue, 5-stage pipeline: 1.52 DMIPS / MHz
– 266 – 400 MHz
– L1 Cache to 16KB/16KB
– Voltage-scalable versions (405LP-1, 405LP-2)
• PowerPC 440
– 32-bit data, 36-bit address, MMU
– Dual-issue, 7-stage pipeline: 2.0 DMIPS / MHz
– 400 – 800 MHz
– L1 Cache 32KB/32KB; L2 256 KB; L3
Low Power Optimizations
IBM low-power SOC designs include a wide range of
optimizations to reduce both active and standby power
Active Power Reductions
• Voltage Scaling
• Frequency Scaling
• Flexible Clock Distribution
• Clock Gating
• Hardware Accelerators
Standby Power Reductions
• Clock Freezing
• Hibernation
• “Cryo” Standby
Reducing operating voltage greatly reduces active power in CMOS
Voltage Scaling Benefits
[Plot: CMOS ring oscillator delay and power vs. VDD]
• Complementary CMOS scales well over a wide voltage range
• Can be used widely over entire chip
• Can optimize power/performance (MIPS / W) over a 4X range
Voltage Scaling Challenges
• Custom circuits, PLLs, analog, and I/O drivers don’t voltage scale easily
• Avoiding increases in standby power in low active power circuits (the VTH dilemma)
Operating at 1/2 normal Vdd increases delay 2.4 to 3.2X but reduces power by > 10X
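The half-Vdd figure can be checked with a back-of-the-envelope calculation, assuming that active power follows f·VDD², that the clock is slowed in proportion to the delay increase, and that leakage is ignored. These are simplifying assumptions, not measured ring-oscillator data.

```python
def active_power_reduction(vdd_scale, delay_increase):
    """Factor by which dynamic power (f * VDD^2) drops.

    vdd_scale: new VDD / old VDD; delay_increase: new delay / old delay.
    """
    return delay_increase / vdd_scale ** 2


low = active_power_reduction(0.5, 2.4)    # 9.6x at the low end of the delay range
high = active_power_reduction(0.5, 3.2)   # 12.8x at the high end
energy_per_op = 1.0 / 0.5 ** 2            # 4.0x less energy per operation
```

The estimate lands at roughly 10 to 13X, in line with the figure quoted on the slide. Note that the energy per operation improves only by the VDD² factor; the rest of the power drop comes from running more slowly.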
IBM Low-Power SOC Designs
“Palmtops to Teraflops” in a single ISA
Optimized for high-performance handheld applications, e.g., high-end
PDA
• PowerPC 405LP-1
– Joint project of IBM Research and IBM Microelectronics
– First silicon Oct. 2001
– 0.18 μm process
– Frequency-scalable, < 66 – 266 MHz
– Voltage-scalable, 1.0 – 1.8 V (0.9 – 1.65 V)
– Technology evaluation platform
All power and performance data from 405LP-1 systems
• PowerPC 405LP-2
– 0.13 μm process
– Scalable to 333 MHz @ 1.5 V (est.)
– Optimized for multimedia processing
– Well into design
405LP-1 System on a Chip
[Block diagram; supply domains: 3.3V I/O Supply, 1.0V – 1.8V Logic, 1.8V Battery-Backed, 1.0V Internal Reg. Buses: Processor Local Bus (PLB) and On-chip Peripheral Bus (OPB), joined by a PLB-OPB Bridge (bus widths labeled 64-bit and 32-bit). Blocks: PPC405 CPU Core with 16K I-Cache and 16K D-Cache; SDRAM Controller; DMA Controller; LCD Controller; PCMCIA/CFII RAM/ROM/Peripheral Controller; Code Decompression; Crypto Accel; Speech Accel; Interrupt Controller; GPIO; two UARTs; IIC; CODEC INTRFC; Passive Sensor INTRFC; RTC; Standby Power Management; Clock Power Management; Scalable Low Power PLL. Legend: New Core, Pre-existing Core.]
Reducing Standby Power
• Cryo mode uses
– Customers/designs comfortable with clock-stop standby
– Low-latency periodic sleep/wake with minimal standby
power
– IP cores with hidden state can cause problems for SW-based save/restore
• Other methods under review
– Voltage islands and power gating
– State-saving latches
Standby Power Modes
Standby power modes enable longer battery life and “instant on”

Mode                System Clock  Logic VDD  State Saved                   Restore Time                                        Logic Power
Clock Freeze Mode   0 Hz          1V         All                           Observe wake-up condition (< 1 ms)                  CMOS leakage at 1V
Hibernation Mode    0             0          Software state                OS restore (100s of ms)                             ~0
Cryo Mode           0             0          Registers and software state  “Instant On”: scan restore of state (20 - 200 ms)   ~0
Cryo mode sequence
– Shutdown: Save CPU core state → Flush caches and TLBs → Clocks stopped → State scanned to internal/external non-volatile storage → Power removed from logic
– Suspend: Monitor system for wake-up condition or RTC timer
– Restore: On Wake indicator → Restore power to logic → State scanned in from non-volatile storage → Restore clocks → Restore CPU state
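The ordering of the shutdown and restore steps can be modelled with a toy class. This is a sketch for illustration only: the class, its fields and the dictionary standing in for scanned latch state are all hypothetical, and the real mechanism scans latches through hardware scan chains rather than copying a data structure.

```python
class CryoSoC:
    """Toy model of the cryo shutdown / restore ordering (names are illustrative)."""

    def __init__(self, cpu_state):
        self.cpu_state = dict(cpu_state)   # live state held in powered logic
        self.nvram = None                  # non-volatile storage
        self.clocks_on = True
        self.logic_powered = True
        self.log = []

    def shutdown(self):
        self.log.append("save CPU core state")
        self.log.append("flush caches and TLBs")
        self.clocks_on = False
        self.log.append("clocks stopped")
        self.nvram = dict(self.cpu_state)
        self.log.append("state scanned to non-volatile storage")
        self.logic_powered = False
        self.cpu_state = {}                # unpowered logic loses its contents
        self.log.append("power removed from logic")

    def restore(self):
        if self.nvram is None:
            raise RuntimeError("no saved state to restore")
        self.logic_powered = True
        self.log.append("power restored to logic")
        self.cpu_state = dict(self.nvram)
        self.log.append("state scanned in from non-volatile storage")
        self.clocks_on = True
        self.log.append("clocks restored")


soc = CryoSoC({"pc": 0x1000, "r1": 42})
soc.shutdown()
state_while_off = dict(soc.cpu_state)
soc.restore()
```

The point of the ordering is that state leaves the logic before power does, and returns only after power is back, so the application resumes where it stopped.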
Dynamic Power Management
• System-Wide power management (PM) during application
execution
• Examples:
– Peripheral PM, including core clock gating
– PM at idle (including low-latency sleep modes)
– Memory PM
– Dynamic voltage and frequency scaling
– Energy policy management
• DPM is proposed as an architecture for policy-guided
dynamic power management.
DPM Motivation
• Embedded application requirements
– Long battery life
– System-specific policy requirements
• Highly variable system designs
• Watch, cell phone, personal server, PDA, tablet
• Soft real-time (multimedia) requirements
• Task-specific policy requirements
– General-purpose systems and applications
• No/minimal application software changes for PM
– Minimal/variable firmware
• PM must be in the OS/applications
DPM Motivation
• Technology
– SOC
• CPU + peripheral PM
– Complex clocking architectures
• Decoupled CPU/bus frequencies
– Heterogeneous processor architectures
• Example: 405LP-2 - Asynchronous heterogeneous
processing in a common voltage/memory domain
– New performance and leakage control mechanisms at the
circuit level
DPM Motivation
• Linux
– Platform independence desired
– Community acceptance required
• Simplicity – ease of maintenance
• Integration with pre-existing facilities
– Linux Device Model
• Minimal core kernel changes
– 5 lines of new code in the “core” kernel
– Scalability to server/SMP systems
DPM: An Architecture for Policy-Guided PM
Is:
• A generic software architecture for policy-guided dynamic power management proposed by IBM and MontaVista Software
• Flexible enough to implement a number of system-specific DVFS and static PM approaches
• Available in an embedded Linux distribution for several embedded processors
Is Not:
• PowerPC or Linux specific
• A DVFS algorithm
• Fully implemented yet
DPM Overview
[Layered diagram, software above hardware:]
• Software: Power-aware Applications, Operating System and Device Drivers signal operating/task state changes and supply requirements and power-mgmt. information to DPM
• Policy/Power Managers provide and manage policies
• DPM sets operating points, changing power-performance levels
• Hardware: System Clock Generation, Power Supplies, Memory Controller, CPU
Dynamic Voltage and Frequency Scaling
[Oscilloscope trace: total chip power (0 to 600mW, split into logic power and I/O power) and logic VDD (0 to 2.0V) while CPU/memory frequency (MHz) steps 266/133 → 66/66 → 266/133.]
• Dynamic Frequency Scaling: 266MHz CPU to 66MHz CPU
• Dynamic Voltage Scaling: 1.8V → 1.0V at up to 1V/100us
• Uninterrupted operation: Linux 2.3.17 running Dhrystone 2.1 code, 400 loops per cycle
• Power consumption for the CPU and logic was reduced by 13X dynamically under the control of the Linux kernel (no PLL relock and no stopping of the application)
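The 13X figure is consistent with a simple estimate, assuming CPU and logic power follow f·VDD² and ignoring leakage and I/O power; the function name is illustrative.

```python
def dvfs_power_ratio(f_high, f_low, v_high, v_low):
    """Ratio of dynamic power before and after scaling, using P ~ f * VDD^2."""
    return (f_high / f_low) * (v_high / v_low) ** 2


frequency_part = 266 / 66                 # about 4.0x from frequency alone
voltage_part = (1.8 / 1.0) ** 2           # about 3.2x from voltage alone
total = dvfs_power_ratio(266e6, 66e6, 1.8, 1.0)   # about 13.1x
```

Frequency scaling alone would give about 4X; the remaining factor of about 3.2 comes from dropping the supply, which only becomes possible once the clock is slow enough.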
Idle Scaling Trace (MPEG4)
[Trace: core voltage and battery power]

Application  Default  Idle Scaling  Sys. Savings  Core Savings
MPEG4 A/V    2.76 W   2.63 W        4.7 %         11.4 %
MP3          1.42 W   1.1 W         22.5 %        47.8 %
Load Scaling Trace (MPEG4/spmt)
[Trace: core voltage and battery power, with marked points A, B, D, E, F]

Application  Default  Load Scaling  System Savings
MPEG4 A/V    2.76 W   2.54 W        8.0 %
MP3          1.42 W   1.03 W        27.7 %
Application Scaling Trace
[Trace with marked points D, E, F: the video thread's task state steps among Task-1, Task and Task+1, ranging between "More Performance Required" and "Working Ahead".]
AS Results
• AS achieved close to an “ideal” LS result with a simple
policy manager and a straightforward modification of
the application

Application  No DPM  DPM: Application Scaling  DPM Savings  “Ideal” Savings
MPEG4 A/V    2.76 W  2.46 W                    10.8 %       10.8 %
Operating Point Usage for MPEG4 by Strategy
[Bar chart: percentage of time (0 to 70%) at each operating point (Idle/33, 100, 133, 166, 200, 266) for Idle Scaling, Load Scaling, App. Scaling and "Ideal".]
References
• Nowka et al., “A 32-bit PowerPC System-on-a-chip With Support for Dynamic Voltage Scaling and Dynamic Frequency Scaling”, IEEE Journal of Solid-State Circuits, vol. 37(11), Nov. 2002, pp. 1441-1447.
• IBM Austin Research Laboratory (www.research.ibm.com/arl)
– Dynamic Power Management for Embedded Systems (Whitepaper)
http://www.research.ibm.com/arl/projects/papers/DPM_V1.1.pdf
• Linux 2.4 kernel including DPM implementation (Bitkeeper)
bk://source.mvista.com/linuxppc_2_4_devel-pm