Minshu Zhao
Power Management in Multicores
Outline
Introduction
Review of power management techniques
Power management in multicore
◦ Identify multicore characteristics
◦ Apply power management techniques
Future of multicore
Review of low-power techniques
Clock gating
[Figure: flip-flop (FF) whose clock (CK) is gated by an enable signal (EN)]
◦ + Gating can be done at fine granularity
◦ + Saves dynamic power
◦ - Does not affect static power
Power Gating
[Figure: flip-flop (FF) whose Vdd supply is switched by an enable signal (EN)]
◦ + Saves both dynamic and static power
◦ - Needs microseconds to power up again
◦ - Data is lost unless some form of state retention is used
Review of low-power techniques
Voltage (Frequency) Scaling
◦ Scale down frequency and/or voltage, sacrificing performance for power
$I \propto (V_{dd} - V_t) \approx V_{dd}$
$f \propto V_{dd}$
$P \propto C V_{dd}^2 f \propto V_{dd}^3$
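As a rough worked example of the relations above, the sketch below assumes that frequency tracks Vdd exactly and ignores leakage and the threshold voltage. The function names are illustrative and not from the slides.

```python
def relative_dynamic_power(v_scale: float) -> float:
    """Dynamic power relative to nominal when Vdd is scaled by v_scale.

    Assumes f is proportional to Vdd, so P ~ C * V^2 * f ~ V^3.
    """
    f_scale = v_scale  # f ∝ Vdd (idealized)
    return v_scale ** 2 * f_scale


def relative_energy_per_task(v_scale: float) -> float:
    """Energy for a fixed amount of work, relative to nominal.

    Run time grows as 1/f, so energy = P * t scales as V^2 under this model.
    """
    return relative_dynamic_power(v_scale) / v_scale


if __name__ == "__main__":
    for s in (1.0, 0.9, 0.8, 0.5):
        print(f"Vdd x{s}: power x{relative_dynamic_power(s):.3f}, "
              f"energy per task x{relative_energy_per_task(s):.3f}")
```

Under this idealized model, halving Vdd cuts dynamic power to one eighth, at the cost of running at half the frequency.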
Variable device threshold
◦ Use high-Vt transistors to reduce leakage
◦ + Reduces leakage
◦ - Vt is generally fixed for a given transistor
Identify Multicore Characteristics
Half of the chip is cores
◦ Large dynamic power
◦ Unbalanced power consumption among cores
The other half of the chip is cache
◦ Large leakage power
Outline
Introduction
Review of power management techniques
Power management in multicore
◦ Identify multicore characteristics
◦ Apply power management techniques
To cores
To caches
Future of multicore
Traditional DVFS
Motivation
◦ Large computation/memory gap
Problems when applied to multicore
◦ Slow
Microsecond timescales
◦ Coarse-grained adjustment
Done in the operating system
◦ All cores arrive at a single chip-wide VF setting
Loses potential power savings
[Figure: power supply and off-chip regulator feeding Core0, Core1, Core2, Core3]
Per-core DVFS & on-chip regulator
On-chip vs. off-chip regulator
◦ Tens of nanoseconds vs. microseconds
Per-core vs. chip-wide DVFS
◦ Benefits heterogeneous workloads
[Figure: power supply, off-chip regulator, on-chip regulator, Core0 to Core3]
Wonyoung Kim; Gupta, M.S.; Gu-Yeon Wei; Brooks, D. "System level analysis
of fast, per-core DVFS using on-chip switching regulators," High Performance
Computer Architecture, 2008. HPCA 2008.
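A minimal sketch of why per-core DVFS helps heterogeneous workloads, assuming the idealized cubic model (dynamic power proportional to the cube of the voltage/frequency scale) and assuming each core has a known minimum VF scale it needs. The demand values and function names are hypothetical.

```python
def chip_wide_power(demands):
    """Relative dynamic power when every core runs at the highest demanded setting."""
    s = max(demands)
    return len(demands) * s ** 3


def per_core_power(demands):
    """Relative dynamic power when each core runs at its own demanded setting."""
    return sum(s ** 3 for s in demands)


if __name__ == "__main__":
    # Hypothetical workload: one compute-bound core, three memory-bound cores.
    demands = [1.0, 0.5, 0.5, 0.5]
    print("chip-wide:", chip_wide_power(demands))
    print("per-core: ", per_core_power(demands))
```

With identical demands on every core the two schemes cost the same in this model; the gap only opens up when the demands differ.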
Per-core DVFS & on-chip regulator
Application
◦ Multi-Core Global Power Management
Monitor power & performance
Apply policies by per-core DVFS
Problem
◦ Overhead is large
Thread Motion
[Figure: activity over time for App A and App B on a High-VF and a Low-VF core, with phases labeled High IPC and Low IPC]
Cores have different voltage-frequency settings
Migrate threads between cores
Apply DVFS benefits to program variability by observing microarchitectural events
Fast movement between cores creates an effective intermediate voltage level
Krishna K. Rangan, Gu-Yeon Wei, and David Brooks. 2009. Thread motion: fine-grained power management for multi-core systems. In Proceedings of the 36th
annual international symposium on Computer architecture (ISCA '09).
Thread Motion
Application
◦ Thread Motion framework
Evaluation driven by microarchitectural events
Time-driven
Miss-driven
Predict IPC for the next interval
Move thread if needed
Problem
◦ Potential cache penalty
Addressed by a clustered multicore with a shared L1 cache within each cluster
◦ Register file transfer penalty
Addressed by storing register state in the shared cache
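The predict-and-move step can be sketched as follows. This is only an illustration: it assumes a last-value IPC predictor and a policy that places the thread with the highest predicted IPC on the highest-VF core, neither of which is spelled out in the slides. All names are hypothetical.

```python
def predict_ipc(history):
    """Last-value predictor (an assumption): next-interval IPC equals the latest sample."""
    return history[-1]


def assign_threads(ipc_history, core_vf):
    """Pair threads with cores so that higher predicted IPC gets a higher VF setting.

    ipc_history: dict mapping thread name -> list of past IPC samples
    core_vf:     dict mapping core name -> relative VF setting
    Returns a dict mapping thread name -> core name.
    """
    threads = sorted(ipc_history, key=lambda t: predict_ipc(ipc_history[t]), reverse=True)
    cores = sorted(core_vf, key=core_vf.get, reverse=True)
    return dict(zip(threads, cores))


def migrations(old, new):
    """Threads whose core changes between two assignments."""
    return [t for t in new if old.get(t) != new[t]]


if __name__ == "__main__":
    core_vf = {"core0": 1.0, "core1": 0.6}          # core0 is the high-VF core
    current = {"A": "core0", "B": "core1"}
    ipc_history = {"A": [1.8, 0.4], "B": [0.5, 1.6]}  # A entered a low-IPC phase
    target = assign_threads(ipc_history, core_vf)
    print(target, "move:", migrations(current, target))
```

A real framework would also weigh the migration cost (cache and register-file transfer) before moving a thread; this sketch ignores it.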
Heterogeneous Cores
Motivation
◦ Different applications have different resource requirements
Large ILP -> VLIW
◦ Different power conditions
Full battery vs. low battery
Combine existing processor architectures and do core selection to minimize energy
Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi,
and Keith I. Farkas. 2004. Single-ISA Heterogeneous Multi-Core Architectures
for Multithreaded Workload Performance. In Proceedings of the 31st annual
international symposium on Computer architecture (ISCA '04).
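A minimal sketch of energy-driven core selection, assuming each core type is summarized by a sustained instruction rate and an average power for the application at hand. The core names and numbers are made up for illustration and do not come from the cited paper.

```python
def select_core(cores, instructions, deadline_s=None):
    """Pick the core that finishes the work with the least energy.

    cores: dict mapping core name -> (instructions_per_second, power_watts)
    deadline_s: optional run-time limit; cores that miss it are skipped.
    Returns (core_name, energy_joules), or None if no core meets the deadline.
    """
    best = None
    for name, (ips, power_w) in cores.items():
        runtime_s = instructions / ips
        if deadline_s is not None and runtime_s > deadline_s:
            continue
        energy_j = power_w * runtime_s
        if best is None or energy_j < best[1]:
            best = (name, energy_j)
    return best


if __name__ == "__main__":
    # Hypothetical core types: (instructions per second, watts).
    cores = {"big": (4e9, 8.0), "little": (1e9, 1.0)}
    print(select_core(cores, 2e9))                  # no deadline
    print(select_core(cores, 2e9, deadline_s=1.0))  # performance constraint
```

Without a deadline the slower, lower-power core wins on energy here; a tight deadline forces the faster core.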
Gated-Vdd cache
Use a high-Vt transistor to turn off the power supply
+ Reduces power when turned off
- Data stored in a cell is lost in low-power mode
[Figure: SRAM cell between Vdd and Gnd with a gated-Vdd control transistor]
Michael Powell, Se-Hyun Yang, Babak Falsafi, Kaushik Roy, and T. N. Vijaykumar.
2000. Gated-Vdd: a circuit technique to reduce leakage in deep-submicron
cache memories. In Proceedings of the 2000 international symposium on Low
power electronics and design (ISLPED '00). ACM, New York, NY, USA, 90-95.
Gated-Vdd cache
Application
◦ Dynamically resizable i-cache
Evaluate the miss rate at every time interval and upsize/downsize the cache using gated-Vdd
Problem
◦ Data remapping on the fly
Yang, S.; Powell, M.D.; Falsafi, B.; Roy, K.; Vijaykumar, T.N. "An integrated
circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches," High-Performance Computer Architecture, 2001. HPCA.
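The interval-based resizing decision can be sketched as below. This assumes a single miss-rate bound, power-of-two sizes, and hypothetical minimum and maximum sizes; it leaves out the address remapping that the slide lists as the problem.

```python
def resize(size_kb, miss_rate, miss_bound, min_kb=1, max_kb=64):
    """Return the cache size to use for the next interval.

    Upsize (double) when the interval's miss rate exceeds the bound,
    downsize (halve) when it is below the bound, within [min_kb, max_kb].
    """
    if miss_rate > miss_bound and size_kb < max_kb:
        return size_kb * 2
    if miss_rate < miss_bound and size_kb > min_kb:
        return size_kb // 2
    return size_kb


def run_intervals(miss_rates, miss_bound, start_kb):
    """Apply resize() once per interval and return the size chosen after each one."""
    sizes = []
    size_kb = start_kb
    for miss_rate in miss_rates:
        size_kb = resize(size_kb, miss_rate, miss_bound)
        sizes.append(size_kb)
    return sizes


if __name__ == "__main__":
    # Hypothetical per-interval miss rates.
    print(run_intervals([0.001, 0.002, 0.05, 0.004], miss_bound=0.01, start_kb=64))
```

The portion of the cache that is sized away is the part that would be turned off with gated-Vdd.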
Gated-Vdd cache
Application
◦ Cache decay
Turn a cache line off if a set number of cycles has elapsed since its last access
The decay interval can adapt to the program
Problem
◦ Data in a sleeping cache line is lost, so the next access suffers a cache miss
Kaxiras, S.; Zhigang Hu; Martonosi, M. "Cache decay: exploiting generational
behavior to reduce cache leakage power," Proceedings of the 28th Annual
International Symposium on Computer Architecture, pp. 240-251, 2001.
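A minimal single-line sketch of the decay mechanism, assuming a per-line idle counter that is reset on every access and a fixed decay interval. It also assumes every access would have hit if the line had stayed powered. The class and function names are illustrative.

```python
class DecayLine:
    """One cache line with an idle counter; gated off after decay_interval idle cycles."""

    def __init__(self, decay_interval):
        self.decay_interval = decay_interval
        self.idle = 0
        self.powered = True

    def tick(self):
        """Advance one cycle without an access."""
        if self.powered:
            self.idle += 1
            if self.idle >= self.decay_interval:
                self.powered = False  # gated-Vdd: contents are lost

    def access(self):
        """Access the line; returns False if it had decayed (a decay-induced miss)."""
        hit = self.powered
        self.powered = True
        self.idle = 0
        return hit


def simulate(trace, decay_interval):
    """trace: iterable of bools, True when the line is accessed in that cycle.

    Returns (decay_induced_misses, cycles_spent_powered_off).
    """
    line = DecayLine(decay_interval)
    extra_misses = 0
    off_cycles = 0
    for accessed in trace:
        if accessed:
            if not line.access():
                extra_misses += 1
        else:
            line.tick()
        if not line.powered:
            off_cycles += 1
    return extra_misses, off_cycles


if __name__ == "__main__":
    trace = [True, False, False, False, False, True, False]
    print(simulate(trace, decay_interval=3))
    print(simulate(trace, decay_interval=10))
```

A shorter decay interval keeps the line off for more cycles but risks more decay-induced misses, which is the trade-off an adaptive interval tries to balance.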
ABB-Multi-threshold CMOS
Increase Vsb in sleep mode
Effectively increases Vth to reduce leakage
+ State preserved in sleep mode
- Needs a long time to switch out of sleep mode
[Figure: SRAM cell bias voltages, labeled 1.0V, 1.0V / 3.3V, 0V / 1.0V, and 0V]
K. Nii et al. A low power SRAM using auto-backgate-controlled
MT-CMOS. Proc. of Int. Symp. Low Power Electronics
and Design, 1998, pp. 293-298.
Drowsy Caches
Apply DVFS to the cache
+ Wake-up cost is small
+ State preserved
- Saves less leakage power than gated-Vdd
[Figure: SRAM cell whose supply is switched by drowsy control signals between Vdd (1V) and a drowsy voltage (0.3V)]
Krisztián Flautner, Nam Sung Kim, Steve Martin, David Blaauw, and Trevor
Mudge. 2002. Drowsy caches: simple techniques for reducing leakage power. In
Proceedings of the 29th annual international symposium on Computer architecture
(ISCA '02). IEEE Computer Society, Washington, DC, USA, 148-157.
Drowsy Caches
Application
◦ Simple policy
Periodically put all lines into drowsy mode and wake them up afterwards
◦ No-access policy
Put only the lines that were not accessed during the window into drowsy mode
◦ 90% of the lines can be in drowsy mode
Average results:
Normalized total energy    Normalized leakage energy    Run-time increase
0.46                       0.29                         0.41%
Problem
Leakage power:
Drowsy cache    Gated-Vdd
6.24nW          0.02nW
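The two policies can be compared with a small window-based sketch. It assumes accesses are given as one set of line indices per window and counts a wake-up whenever a drowsy line is accessed. The function name and the access pattern are hypothetical.

```python
def simulate(windows, num_lines, policy="noaccess"):
    """Simulate drowsy-line management over a sequence of windows.

    windows: list of sets, each holding the line indices accessed in that window
    policy:  "simple"   -> all lines go drowsy at the end of every window
             "noaccess" -> only lines not accessed in the window go drowsy
    Returns (wakeups, drowsy_fraction_after_each_window).
    """
    all_lines = set(range(num_lines))
    drowsy = set()
    wakeups = 0
    fractions = []
    for accessed in windows:
        # Accessing a drowsy line forces a wake-up (a small delay).
        wakeups += len(accessed & drowsy)
        if policy == "simple":
            drowsy = set(all_lines)
        elif policy == "noaccess":
            drowsy = all_lines - accessed
        else:
            raise ValueError(f"unknown policy: {policy}")
        fractions.append(len(drowsy) / num_lines)
    return wakeups, fractions


if __name__ == "__main__":
    windows = [{0, 1}, {1, 2}, {1}]  # hypothetical access pattern on 8 lines
    for policy in ("simple", "noaccess"):
        print(policy, simulate(windows, 8, policy))
```

In this toy pattern the simple policy keeps more lines drowsy but pays more wake-ups, while the no-access policy spares the lines that are still in use.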
Future multicore
Dark silicon (transistor under-utilization)
◦ Power constraints
Transistors are powered down to reduce power
◦ Memory wall
Computation waits for memory before it can continue
◦ Lack of parallelism
Not enough work to keep the transistors busy
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and
Doug Burger. 2011. Dark silicon and the end of multicore scaling. In Proceedings of
the 38th annual international symposium on Computer architecture (ISCA '11).
Future multicore
Power constraints
◦ New device: FinFET
Memory wall
◦ New technology: 3D IC
Lack of parallelism
◦ Automatic parallelization
Thank you!