Transcript Slide 1
Runtime Power Monitoring and Phase Analysis Methods for Power Management
Canturk Isci and Margaret Martonosi
Princeton University
Motivation and Research Overview
Counter Based Power Estimation:
Idealized view: For all components on a chip….
Our Work:
Runtime Monitoring
Hardware Performance Counters
Power of component I =
▪ Monitor application execution:
- Performance behavior via performance monitoring counters (PMCs)
- Control flow via dynamic instrumentation
▪ Estimate power behavior from PMC information
▪ Apply phase tracking, detection and prediction strategies under real-system effects based on PMC and control flow features
▪ Employ real power measurements to provide feedback to runtime power estimations and to evaluate phase characterizations
[Diagram labels: Application; PMC and control flow samples; Power Estimation & Phase Analysis; From Microarch. Properties; Die Area + Stressmarks]
Phase Detection Under Real-System Variability:
Problem Definition: Variability effects on phases
+ Fast (Real-time)
+ Offers estimated view of on-chip detail for real systems
+ Real measurement validation
[Figure: phase-label sequences (A to E); panels: Glitch, Gradient; axis label: Billions of Instructions]
Control flow (Basic Block Vectors / BBVs):
[Figure: phase-label sequences (A to F) with binary sequences over t; panels: Mutation, Time Dilation; axis label: Billions of Instructions]
Phase Tracking: By evaluating the similarity among PMC vectors (PVs):
Similarity Criterion: L1-Distance between PVs
$\mathrm{Similarity}(r, c) = \sum_{i=1}^{N} \left| PV_r(i) - PV_c(i) \right|$
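The L1-distance criterion between PVs can be illustrated with a minimal Python sketch. Only the distance itself comes from the poster; the first-match classification in `track_phases`, the `threshold` parameter, and the sample values are assumptions made for illustration, not the authors' tracking scheme.

```python
from typing import List, Sequence


def l1_distance(pv_r: Sequence[float], pv_c: Sequence[float]) -> float:
    """L1 distance between two PMC vectors of equal length N."""
    if len(pv_r) != len(pv_c):
        raise ValueError("PMC vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(pv_r, pv_c))


def track_phases(samples: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Label each sample with a phase id (hypothetical first-match policy).

    A sample joins the first known phase whose representative vector is
    within `threshold` in L1 distance; otherwise it starts a new phase.
    """
    representatives: List[List[float]] = []
    labels: List[int] = []
    for pv in samples:
        for phase_id, rep in enumerate(representatives):
            if l1_distance(rep, pv) <= threshold:
                labels.append(phase_id)
                break
        else:
            # No existing phase is close enough: record a new one.
            representatives.append(list(pv))
            labels.append(len(representatives) - 1)
    return labels
```

For example, `track_phases([[1, 0], [0.9, 0.1], [0, 1], [1, 0]], 0.5)` groups the first, second and fourth samples into one phase and the third into another.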
PVs achieve <5 W within-phase variation with fewer than 10 phases
[Figure: Power [W] vs. Time [s] for Gcc Run1, Gcc Run2 and Gcc Run3; annotations: Metric Variability, Time Variability]
Very high detect threshold: P{hit} = 0, P{false alarm} = 0
Experimentation:
Application Binary
Power
Pintool
0 detect threshold: P{hit} = 1, P{false alarm} = 1
Best detection scheme achieves 100% hit detection with <5% false alarms
[Figure series: Predicted_IPC]
Power
Task1
Task2
Core/μP 1
Both approaches bring significant insights into application power behavior
[Figure legend: Random, BBV, PMC, Oracle]
Core/μP 2
Speed up!
Power Meas. via Current Probe
Performance Counter Hardware
Evaluation:
Swap hot task
Hardware
[Figure series: Orig_IPC, L3_Refs]
Power Balancing for Multiprocessor Systems / Activity Migration:
OS serial device file
[Figure series: DVS State]
Can predict >90% of DVFS’able phases, with less than 5% prediction overshoots!
Desired operating point: P{hit} ~ 1, P{false alarm} ~ 0
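The trade-off between P{hit} and P{false alarm} as the detect threshold moves can be sketched in Python. This is only an illustration under assumptions: a detection is declared when a matching score is at least the threshold, and the `scores` and ground-truth flags used in the usage note are hypothetical values, not measurements from the poster.

```python
from typing import Sequence, Tuple


def detection_rates(
    scores: Sequence[float],
    is_recurrent: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (P{hit}, P{false alarm}) for one detect threshold.

    Assumption: a recurrence is declared when score >= threshold.
    """
    hits = false_alarms = positives = negatives = 0
    for score, truth in zip(scores, is_recurrent):
        detected = score >= threshold
        if truth:
            positives += 1
            hits += int(detected)
        else:
            negatives += 1
            false_alarms += int(detected)
    p_hit = hits / positives if positives else 0.0
    p_false_alarm = false_alarms / negatives if negatives else 0.0
    return p_hit, p_false_alarm
```

With hypothetical scores `[0.9, 0.8, 0.2, 0.1]` and truth flags `[True, True, False, False]`, a threshold of 0 detects everything, a threshold above every score detects nothing, and an intermediate threshold such as 0.5 reaches the desired operating point.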
Real-System Effects on Phases:
Metric and time variability
OS
Mutations → Transition based tracking
Glitches and gradients → Glitch/Gradient Filtering
Shifts → ~Binary cross correlations
Time Dilations → Near-neighbor blurring
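How near-neighbor blurring and a binary cross correlation might work together can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each run is encoded as a binary signal with 1 at every phase transition, and the function names, the `radius` and `max_shift` parameters, and the tie-break toward the smallest shift are all hypothetical.

```python
from typing import List, Tuple


def blur(transitions: List[int], radius: int) -> List[int]:
    """Near-neighbor blurring: spread each 1 to neighbors within `radius`."""
    n = len(transitions)
    out = [0] * n
    for i, value in enumerate(transitions):
        if value:
            for j in range(max(0, i - radius), min(n, i + radius + 1)):
                out[j] = 1
    return out


def match_score(a: List[int], b: List[int], shift: int) -> float:
    """Fraction of transitions in `a` that line up with a 1 in `b` at a+shift."""
    total = sum(a)
    if total == 0:
        return 0.0
    hits = 0
    for i, value in enumerate(a):
        j = i + shift
        if value and 0 <= j < len(b) and b[j]:
            hits += 1
    return hits / total


def best_shift(
    a: List[int], b: List[int], max_shift: int, radius: int
) -> Tuple[int, float]:
    """Search shifts in [-max_shift, max_shift] against the blurred `b`."""
    blurred = blur(b, radius)
    scores = {s: match_score(a, blurred, s) for s in range(-max_shift, max_shift + 1)}
    # Prefer the highest score; break ties with the smallest absolute shift.
    best = max(scores, key=lambda s: (scores[s], -abs(s)))
    return best, scores[best]
```

In this sketch, a second run whose transitions are shifted by two samples and whose last transition arrives one sample late matches only partially without blurring, and matches fully once `radius` is set to 1.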
Imperfect repeatability
Lack of detail
[Figure series: DVS Oracle; x-axis: Time [s]]
Dynamic Instrumentation via Pin
Runtime monitoring
Strong relation to power
Application
Runtime applicability
BBV phases ≢ power phases
No physical binding to power
Event counters (PMCs):
Perfect repeatability
Architectural independence
Detail at program level
Shift
Power can also exhibit phase behavior
[Figure axis: Power [W]]
Proposed Solution:
Transition-guided phase detection framework:
Initialization and computation phases
Initialization with high complex IA32 instructions
FP intensive mesh computation phase
Long-Term Value and Duration Prediction of Memory Bound Phases for DVFS:
[Figure labels: IPC, Ideal]
Evaluating Control-Flow-Based and Event-Counter-Based Approaches:
[Figure axis: Percent Error w.r.t. Actual Power]
Per-Component Estimates: Ex. Equake
Applications of Power Phase Analysis
Slow down!
Conclusions
▪ Certain compositions of event counters can provide reasonably accurate runtime estimates for processor power consumption and for the distribution of power among architectural components
▪ Workloads exhibit phases in their performance as well as power behavior
- Performance counter vectors help identify different (recurring) power phases of applications
▪ Real-system variability effects impose additional challenges for detecting recurrent phases
- A phase transition guided approach, together with supporting methods such as glitch/gradient filtering and near-neighbor blurring, enables detection of repetitive power phase behavior
▪ Both control flow and event counter based application features provide insight into application power behavior
- PMC based approaches generally provide a better proxy for application power phase behavior, due to their strong physical binding to processor power consumption
▪ These phase oriented methods can be employed to guide a range of applications in current and next generation systems
[Figure: AVE Error (BBV) and AVE Error (PMC) for AVE(SPECint), AVE(SPECfp), AVE(OTHER), AVE(Overall)]
PMCs achieve, on average, 40% less error than BBVs in power phase characterization
[Figure labels: Error, IPC, Mem Refs]
Crafty
Empirical Multimeter Measurements
… + NonGatedPower[I]
Power Phase Analysis on Real Systems
Phases: Distinct and often-recurring regions of program behavior
Ex: Vortex
Gap
Realistic view: Handle non-linear scaling…
▪ Use application phase information to guide dynamic/adaptive power management techniques
[Diagram labels: Real Measurements; CPU; Performance Counters!; Dynamic Management]
[Figure label: Vortex]
Gzip Vpr
Gcc
MaxPower[I] * ArchScaling[I] * AccessRate[I]
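The counter-based estimate can be written out as a minimal Python sketch: the idealized product MaxPower[I] * ArchScaling[I] * AccessRate[I], plus the NonGatedPower[I] term that the poster lists under the realistic view. The class, field and function names and all numeric values are hypothetical, and the non-linear scaling mentioned in the realistic view is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Component:
    """Per-component constants (illustrative names, hypothetical values)."""
    name: str
    max_power_w: float              # MaxPower[I], in watts
    arch_scaling: float             # ArchScaling[I], dimensionless
    non_gated_power_w: float = 0.0  # NonGatedPower[I], in watts


def component_power(component: Component, access_rate: float) -> float:
    """MaxPower * ArchScaling * AccessRate + NonGatedPower for one component."""
    dynamic = component.max_power_w * component.arch_scaling * access_rate
    return dynamic + component.non_gated_power_w


def total_power(components: Iterable[Component], access_rates: Dict[str, float]) -> float:
    """Sum the per-component estimates; access rates are keyed by component name."""
    return sum(component_power(c, access_rates[c.name]) for c in components)
```

In a runtime setting, the access rates would be derived from performance counter readings for each sampling interval, while the per-component constants stay fixed.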
▪ Represent application execution as a stream of PMC and control flow samples
Dynamic Program Flow
Total Power Estimates and Measurement Validation:
[Figure labels: Number of Phases, L3 Refs]
▪ Power is the primary design constraint for current systems
- Power density: Cooling / Thermal constraints
- Energy: Battery life
▪ Workloads exhibit drastically different behavior both within applications and among different applications (Phases)
▪ These can be exploited by workload directed dynamic management techniques
- Dynamically reconfigurable hardware
- Power balancing / Activity migration
▪ Need methods to track application power behavior and identify different (repetitive) regions of operation
Live, real-system experiments:
Reflect behavior of real, modern processors
Observe long time periods
Guide on-the-fly adaptations
Live, Runtime Power Monitoring and Estimation