Transcript Slide 1

Runtime Power Monitoring and Phase Analysis Methods for Power Management
Canturk Isci and Margaret Martonosi
Princeton University
Motivation and Research Overview
Our Work:
▪ Monitor application execution:
- Performance behavior via performance monitoring counters (PMCs)
- Control flow via dynamic instrumentation
▪ Estimate power behavior from PMC information
▪ Apply phase tracking, detection and prediction strategies under real-system effects, based on PMC and control flow features
▪ Employ real power measurements to provide feedback to runtime power estimations and to evaluate phase characterizations
[Figure: Application → Runtime Monitoring (Hardware Performance Counters) → PMC and control flow samples → Power Estimation & Phase Analysis]

Counter Based Power Estimation:
▪ Idealized view: For all components on a chip…
Power of component I:
$$\text{Power}[I] = \text{MaxPower}[I] \times \text{ArchScaling}[I] \times \text{AccessRate}[I] + \text{NonGatedPower}[I]$$
- MaxPower[I]: from die area + stressmarks
- ArchScaling[I]: from microarchitectural properties
- AccessRate[I]: from hardware performance counters
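A minimal Python sketch of the idealized per-component model, Power[I] = MaxPower[I] × ArchScaling[I] × AccessRate[I] + NonGatedPower[I], evaluated for one counter sample. The class and function names, the example components, and all numeric values are hypothetical. The sketch assumes AccessRate[I] has already been derived from raw PMC counts (for example, as events per cycle). It does not model the realistic view's non-linear scaling.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Component:
    """Per-component model parameters (all example values are made up)."""
    name: str
    max_power: float        # MaxPower[I], watts
    arch_scaling: float     # ArchScaling[I], unitless
    non_gated_power: float  # NonGatedPower[I], watts


def component_power(c: Component, access_rate: float) -> float:
    """Power[I] = MaxPower[I] * ArchScaling[I] * AccessRate[I] + NonGatedPower[I]."""
    return c.max_power * c.arch_scaling * access_rate + c.non_gated_power


def estimate_power(components: List[Component],
                   access_rates: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Per-component estimates for one PMC sample, plus their sum.

    access_rates maps a component name to AccessRate[I] (assumed to be
    precomputed from counter values); a component with no entry is treated
    as idle (AccessRate = 0) and contributes only its non-gated power.
    """
    per_component = {
        c.name: component_power(c, access_rates.get(c.name, 0.0)) for c in components
    }
    return per_component, sum(per_component.values())


if __name__ == "__main__":
    parts = [
        Component("L1 cache", max_power=8.0, arch_scaling=1.0, non_gated_power=0.5),
        Component("FP unit", max_power=6.0, arch_scaling=0.5, non_gated_power=0.25),
    ]
    estimates, total = estimate_power(parts, {"L1 cache": 0.5, "FP unit": 0.2})
    print(estimates, round(total, 2))
```

Because each term is a plain product plus an offset, a runtime monitor only has to refresh AccessRate[I] from the counters at each sample. The other three parameters stay fixed after calibration.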
Advantages of counter-based power estimation:
+ Fast (real-time)
+ Offers an estimated view of on-chip detail for real systems
+ Real measurement validation

Phase Detection Under Real-System Variability:
▪ Problem Definition: Variability effects on phases
[Figure: Example phase sequences (e.g., A B C B D) distorted by glitches, gradients, mutations, and time dilation]
[Figure: Traces plotted against Billions of Instructions]
▪ Application features for phase analysis: control flow (Basic Block Vectors / BBVs) and event counters (PMCs)
▪ Phase Tracking: By evaluating the similarity among PMC vectors (PVs):
- Similarity Criterion: L1-Distance between PVs, where N is the number of entries in each PV (a smaller distance means more similar vectors)
$$\text{Similarity}(r, c) = \sum_{i=1}^{N} \left| PV_r(i) - PV_c(i) \right|$$
- PVs achieve < 5 W within-phase power variation with < 10 phases
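The L1 similarity criterion can be turned into a phase tracker in several ways. The sketch below assumes a simple leader-style clustering. Each PMC vector joins the closest existing phase if it lies within a distance threshold of that phase's representative (its first PV); otherwise it starts a new phase. The threshold, the two-entry example PVs, and the function names are illustrative assumptions, not the poster's settings.

```python
from typing import List, Sequence


def l1_distance(pv_r: Sequence[float], pv_c: Sequence[float]) -> float:
    """Similarity(r, c): sum over i of |PV_r(i) - PV_c(i)|; 0 means identical."""
    if len(pv_r) != len(pv_c):
        raise ValueError("PMC vectors must have the same length N")
    return sum(abs(a - b) for a, b in zip(pv_r, pv_c))


def track_phases(pvs: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Give each PMC vector a phase ID.

    A sample joins the closest existing phase when its L1 distance to that
    phase's representative (first PV) is at most `threshold`; otherwise it
    opens a new phase with itself as representative.
    """
    representatives: List[List[float]] = []
    phase_ids: List[int] = []
    for pv in pvs:
        best_id, best_dist = -1, float("inf")
        for pid, rep in enumerate(representatives):
            d = l1_distance(pv, rep)
            if d < best_dist:
                best_id, best_dist = pid, d
        if best_id >= 0 and best_dist <= threshold:
            phase_ids.append(best_id)
        else:
            representatives.append(list(pv))
            phase_ids.append(len(representatives) - 1)
    return phase_ids


if __name__ == "__main__":
    # Hypothetical normalized two-counter PVs.
    samples = [[0.9, 0.1], [0.85, 0.15], [0.2, 0.8], [0.88, 0.12], [0.25, 0.75]]
    print(track_phases(samples, threshold=0.2))  # [0, 0, 1, 0, 1]
```

Raising `threshold` merges samples into fewer, looser phases. Lowering it yields more, tighter phases, which is the trade-off between phase count and within-phase power variation.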
[Figure: Power [W] vs. Time [s] for three runs of Gcc (Run1, Run2, Run3), illustrating metric variability and time variability]
▪ Experimentation: [Diagram: Application Binary, Pintool, Power]
▪ Detection threshold trade-off:
- Very high detect threshold: P{hit} = 0, P{false alarm} = 0
- Zero detect threshold: P{hit} = 1, P{false alarm} = 1
▪ Best detection scheme achieves 100% hit detection with < 5% false alarms
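The threshold trade-off can be reproduced with a toy transition detector. Each sample carries a transition score, assumed here to be something like the L1 distance between consecutive PVs. It also carries a reference flag marking true phase transitions, assumed to come from measured power. Both the data and the helper names are hypothetical.

An infinite threshold gives P{hit} = 0 and P{false alarm} = 0, and a zero threshold gives 1 and 1. On this toy data, a middle threshold reaches P{hit} = 1 with P{false alarm} = 0.

```python
from typing import List, Sequence, Tuple


def detect(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag sample t as a phase transition when its score reaches the threshold."""
    return [s >= threshold for s in scores]


def hit_and_false_alarm(detected: Sequence[bool],
                        actual: Sequence[bool]) -> Tuple[float, float]:
    """Return (P{hit}, P{false alarm}) with respect to reference transitions."""
    positives = sum(actual)
    negatives = len(actual) - positives
    hits = sum(1 for d, a in zip(detected, actual) if d and a)
    false_alarms = sum(1 for d, a in zip(detected, actual) if d and not a)
    p_hit = hits / positives if positives else 0.0
    p_false_alarm = false_alarms / negatives if negatives else 0.0
    return p_hit, p_false_alarm


if __name__ == "__main__":
    # Hypothetical per-sample transition scores and reference transitions.
    scores = [0.1, 2.0, 0.3, 1.5, 0.2]
    actual = [False, True, False, True, False]
    for threshold in (float("inf"), 0.0, 1.0):
        print(threshold, hit_and_false_alarm(detect(scores, threshold), actual))
```

Sweeping `threshold` over a real trace traces out the full hit/false-alarm curve. A "best" scheme is one whose curve passes closest to P{hit} = 1, P{false alarm} = 0.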
▪ Evaluation: the desired operating point is P{hit} ~ 1, P{false alarm} ~ 0
▪ Experimental setup: [Diagram: Power measured via current probe; Performance Counter Hardware; OS serial device file]
▪ Both approaches provide significant insight into application power behavior
[Figure: Error of Random, BBV, PMC, and Oracle phase classifications]
▪ Can predict > 90% of DVFS-able phases, with < 5% prediction overshoots!
[Figure: Orig_IPC, Predicted_IPC, L3_Refs, and DVS State over time]
▪ Power Balancing for Multiprocessor Systems / Activity Migration:
[Diagram: Task1 and Task2 running on Core/μP 1 and Core/μP 2; swap the hot task ("Speed up!")]
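A toy sketch of the "swap hot task" idea for two cores. It assumes a per-core heat proxy: an exponentially decayed sum of the counter-based power estimate of whichever task runs there. When the gap between the hottest and coolest cores exceeds a margin, their tasks are swapped. The proxy, the decay factor, the margin, and all names are hypothetical rather than the poster's policy.

```python
from typing import Dict, Tuple


def migrate_if_unbalanced(assignment: Dict[str, str], heat: Dict[str, float],
                          margin: float) -> bool:
    """Swap the tasks of the hottest and coolest cores when their heat gap exceeds margin."""
    hot = max(heat, key=heat.get)
    cool = min(heat, key=heat.get)
    if heat[hot] - heat[cool] > margin:
        assignment[hot], assignment[cool] = assignment[cool], assignment[hot]
        return True
    return False


def simulate(task_power: Dict[str, float], assignment: Dict[str, str],
             steps: int, margin: float, decay: float = 0.9) -> Tuple[Dict[str, float], int]:
    """Each step, every core's heat decays and accumulates its task's estimated power.

    `assignment` (core -> task) is updated in place; returns final heat and swap count.
    """
    heat = {core: 0.0 for core in assignment}
    swaps = 0
    for _ in range(steps):
        for core, task in assignment.items():
            heat[core] = decay * heat[core] + task_power[task]
        if migrate_if_unbalanced(assignment, heat, margin):
            swaps += 1
    return heat, swaps


if __name__ == "__main__":
    tasks = {"Task1": 30.0, "Task2": 10.0}  # hypothetical per-task power estimates (W)
    cores = {"Core1": "Task1", "Core2": "Task2"}
    heat, swaps = simulate(tasks, cores, steps=6, margin=5.0)
    print(cores, swaps)
```

With these numbers the loop swaps the tasks back and forth. A practical policy would add hysteresis or a minimum residency time before migrating again.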
▪ Real-System Effects on Phases: metric and time variability, each addressed by a dedicated technique:
- Mutations → Transition-based tracking
- Glitches and gradients → Glitch/gradient filtering
- Shifts → ~Binary cross-correlations
- Time dilations → Near-neighbor blurring
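One plausible reading of two of these remedies, sketched in Python:

- A glitch filter absorbs phase runs shorter than `min_len` samples into the preceding run.
- Near-neighbor blurring lets a transition match a reference transition up to `k` samples away, which tolerates time dilation.

Transitions are represented as binary vectors, in the spirit of transition-based tracking. Gradient filtering and the ~binary cross-correlation for shifts are not shown. `min_len`, `k`, and all function names are assumptions.

```python
from typing import List, Sequence


def glitch_filter(phases: Sequence[int], min_len: int = 2) -> List[int]:
    """Absorb any run shorter than min_len into the run before it."""
    out = list(phases)
    i = 0
    while i < len(out):
        j = i
        while j < len(out) and out[j] == out[i]:
            j += 1
        if j - i < min_len and i > 0:
            out[i:j] = [out[i - 1]] * (j - i)
        i = j
    return out


def transitions(phases: Sequence[int]) -> List[int]:
    """Binary transition vector: 1 where the phase ID differs from the previous sample."""
    if not phases:
        return []
    return [0] + [int(a != b) for a, b in zip(phases, phases[1:])]


def blur(bits: Sequence[int], k: int) -> List[int]:
    """Near-neighbor blurring: position t is set if any bit within +/- k samples is set."""
    n = len(bits)
    return [int(any(bits[max(0, t - k):min(n, t + k + 1)])) for t in range(n)]


def transition_match(reference: Sequence[int], observed: Sequence[int], k: int) -> float:
    """Fraction of reference transitions that land on a blurred observed transition."""
    total = sum(reference)
    if total == 0:
        return 1.0
    blurred = blur(observed, k)
    return sum(1 for r, b in zip(reference, blurred) if r and b) / total


if __name__ == "__main__":
    run_a = glitch_filter([1, 1, 2, 2, 2, 7, 2, 3, 3])  # the lone 7 is a glitch
    run_b = [1, 1, 1, 2, 2, 2, 2, 3, 3]                 # same phases, dilated in time
    print(run_a)
    print(transition_match(transitions(run_a), transitions(run_b), k=0))
    print(transition_match(transitions(run_a), transitions(run_b), k=1))
```

After the glitch is removed, the filtered run matches only half of its transitions exactly against the time-dilated run (k = 0). It matches all of them once blurring allows a one-sample offset (k = 1).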
▪ Proposed Solution: Transition-guided phase detection framework
▪ Power can also exhibit phase behavior
[Figure: Power [W] vs. Time [s], run1]

Evaluating Control-Flow-Based and Event-Counter-Based Approaches:
Control flow (Basic Block Vectors / BBVs):
+ Perfect repeatability
+ Architectural independence
+ Detail at program level
− Limited runtime applicability
− BBV phases ≢ power phases
− No physical binding to power
Event counters (PMCs):
+ Runtime monitoring
+ Strong relation to power
− Imperfect repeatability
− Lack of detail
[Diagram: Application → Dynamic Instrumentation via Pin]

Per-Component Estimates: Ex. Equake
[Figure: Per-component power estimates for Equake, showing its initialization and computation phases: initialization with highly complex IA32 instructions, followed by an FP-intensive mesh computation phase]
[Figure: Percent Error w.r.t. Actual Power]

Applications of Power Phase Analysis
[Diagram: Activity migration ("Slow down!")]
▪ Long-Term Value and Duration Prediction of Memory Bound Phases for DVFS:
[Figure: IPC over Time [s], with Ideal and DVS Oracle]
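A sketch of long-term value prediction for memory-bound phases, in two steps:

1. Each sample is labeled memory bound or not using a hypothetical threshold on a memory-intensity metric, such as memory references per instruction.
2. A pattern-history table predicts the next label from the last `depth` labels. It falls back to last-value prediction for unseen patterns.

The pattern-history table is a common phase-prediction technique, assumed here rather than taken from the poster. Only value prediction is sketched; duration prediction is not. The metric, the threshold, and all names are assumptions.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence, Tuple

MEM_BOUND, CPU_BOUND = 1, 0


def classify(mem_per_instr: Sequence[float], threshold: float) -> List[int]:
    """Label each sample memory bound (a DVFS candidate) or not, by a hypothetical threshold."""
    return [MEM_BOUND if m > threshold else CPU_BOUND for m in mem_per_instr]


class HistoryPredictor:
    """Predict the next phase label from the last `depth` labels via a pattern table."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.history: Deque[int] = deque(maxlen=depth)
        self.table: Dict[Tuple[int, ...], int] = {}

    def predict(self) -> int:
        key = tuple(self.history)
        if key in self.table:
            return self.table[key]
        # Unseen pattern: fall back to last-value prediction.
        return self.history[-1] if self.history else CPU_BOUND

    def update(self, actual: int) -> None:
        # Record what followed the current history, then shift the history.
        if len(self.history) == self.depth:
            self.table[tuple(self.history)] = actual
        self.history.append(actual)


def accuracy(phases: Sequence[int], depth: int) -> float:
    """Fraction of samples whose label was predicted correctly before being observed."""
    predictor = HistoryPredictor(depth)
    correct = 0
    for actual in phases:
        correct += int(predictor.predict() == actual)
        predictor.update(actual)
    return correct / len(phases) if phases else 0.0


if __name__ == "__main__":
    phases = classify([0.01, 0.02, 0.3, 0.4] * 4, threshold=0.1)  # hypothetical metric values
    print(accuracy(phases, depth=2), accuracy(phases, depth=0))
```

With `depth=0` the table has a single empty key, so the predictor reduces to last-value prediction. On the repeating toy pattern above, `depth=2` predicts 87.5% of samples correctly versus 56.25% for last-value, because the table learns when a memory-bound stretch ends.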
Conclusions
▪ Certain compositions of event counters can provide reasonably accurate runtime estimates of processor power consumption and of the distribution of power among architectural components
▪ Workloads exhibit phases in their power behavior as well as in their performance
- Performance counter vectors help identify different (recurring) power phases of applications
▪ Real-system variability effects pose additional challenges for detecting recurrent phases
- A phase-transition-guided approach, together with supporting methods such as glitch/gradient filtering and near-neighbor blurring, enables detection of repetitive power phase behavior
▪ Both control-flow-based and event-counter-based application features provide insight into application power behavior
- PMC-based approaches generally provide a better proxy for application power phase behavior, due to their strong physical binding to processor power consumption
▪ These phase-oriented methods can be employed to guide a range of applications in current and next-generation systems
[Figure: Average power phase characterization error, BBV vs. PMC, for AVE(SPECint), AVE(SPECfp), AVE(OTHER), and AVE(Overall)]
[Figure: IPC and Mem Refs traces]
▪ PMCs achieve, on average, 40% lower error than BBVs in power phase characterization
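The poster does not define its error metric here, so the following sketch assumes one. Each measured power sample is compared with the average power of its assigned phase, and the deviation relative to that average is averaged over all samples and reported in percent. Under this assumption, any phase assignment (from BBVs, PMCs, random, or an oracle) can be scored against the same measured power trace. The data and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def characterization_error(power: Sequence[float], phases: Sequence[int]) -> float:
    """Mean |P(t) - mean power of phase(t)| relative to that mean, in percent.

    Assumes strictly positive power samples (watts).
    """
    if len(power) != len(phases) or not power:
        raise ValueError("power and phases must be non-empty and equally long")
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for p, ph in zip(power, phases):
        sums[ph] += p
        counts[ph] += 1
    means = {ph: sums[ph] / counts[ph] for ph in sums}
    deviations = [abs(p - means[ph]) / means[ph] for p, ph in zip(power, phases)]
    return 100.0 * sum(deviations) / len(deviations)


if __name__ == "__main__":
    measured = [40.0, 41.0, 20.0, 21.0, 39.0]  # hypothetical watts
    phases_a = [0, 0, 1, 1, 0]                 # labels that follow the power trace
    phases_b = [0, 1, 0, 1, 0]                 # labels that cut across it
    print(characterization_error(measured, phases_a))
    print(characterization_error(measured, phases_b))
```

Phase labels that follow the power trace score a much lower error than labels that cut across it. This is the kind of comparison behind an average BBV vs. PMC error figure.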
Counter Based Power Estimation:
▪ Realistic view: Handle non-linear scaling…
Power of component I:
$$\text{Power}[I] = \text{MaxPower}[I] \times \text{ArchScaling}[I] \times \text{AccessRate}[I] + \text{NonGatedPower}[I]$$
- AccessRate[I]: from CPU performance counters
Total Power Estimates and Measurement Validation:
[Figure: Total power, counter-based estimates vs. real (empirical multimeter) measurements; benchmarks shown include Gzip, Vpr, Gcc, Crafty, Gap, and Vortex]

Power Phase Analysis on Real Systems
▪ Phases: Distinct and often-recurring regions of program behavior
- Ex: Vortex
▪ Use application phase information to guide dynamic/adaptive power management techniques
▪ Represent application execution as a stream of …
[Diagram: Dynamic Program Flow, Dynamic Management]
[Figure: Error vs. Number of Phases]
Motivation:
▪ Power is the primary design constraint for current systems
- Power density → Cooling / thermal constraints
- Energy → Battery life
▪ Workloads exhibit drastically different behavior both within applications and among different applications (phases)
▪ These differences can be exploited by workload-directed dynamic management techniques
- Dynamically reconfigurable hardware
- Power balancing / activity migration
▪ Need methods to track application power behavior and identify different (repetitive) regions of operation
▪ Live, real-system experiments:
- Reflect behavior of real, modern processors
- Observe long time periods
- Guide on-the-fly adaptations
Live, Runtime Power Monitoring and Estimation