36x48 vertical poster template - people


Can Hankendi
Ayse K. Coskun
Electrical and Computer Engineering Department,
Boston University, MA, USA {hankendi, acoskun}@bu.edu
ABSTRACT
As multi-threaded workloads start to emerge on the cloud, providing energy-efficient
consolidation strategies for these high-performance computing (HPC)-type loads is
becoming an important research problem. This work proposes an adaptive resource
provisioning technique for multi-threaded workloads to improve the energy efficiency of a
virtualized multi-core server. The proposed technique adjusts the resources available to a virtual
machine (VM) based on the application's power efficiency, while delivering the desired
performance guarantees. Experiments on a real-life multi-core server show that the
proposed technique improves system throughput-per-watt by 15% on average (and by
up to 21%) over existing co-scheduling techniques.
PERFORMANCE ISOLATION ON VIRTUAL SYSTEMS
• Consolidating multiple workloads can degrade performance due to resource contention
• CPU binding and NUMA balancing can mitigate the performance variation
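As a rough illustration of CPU binding, the sketch below partitions the cores of a two-node machine and assigns each VM's vCPUs to cores of a single node. The contiguous core numbering, the function names `numa_nodes` and `bind_vms`, the VM names, and the first-fit policy are all assumptions made for illustration; the poster does not describe how binding was configured, and real pinning would go through the hypervisor's own affinity controls.

```python
from typing import Dict, List


def numa_nodes(total_cores: int, cores_per_node: int) -> List[List[int]]:
    """Split core IDs into NUMA nodes, assuming contiguous numbering."""
    return [list(range(start, start + cores_per_node))
            for start in range(0, total_cores, cores_per_node)]


def bind_vms(vcpus_per_vm: Dict[str, int],
             nodes: List[List[int]]) -> Dict[str, List[int]]:
    """Give each VM dedicated cores taken from a single node.

    Placement is first-fit: a VM goes to the first node that still has
    enough free cores, so its vCPUs never straddle two nodes.
    """
    free = [list(node) for node in nodes]  # copy so the input is not mutated
    binding: Dict[str, List[int]] = {}
    for vm, count in vcpus_per_vm.items():
        for cores in free:
            if len(cores) >= count:
                binding[vm] = cores[:count]
                del cores[:count]
                break
        else:
            raise ValueError(f"no single node has {count} free cores for {vm}")
    return binding


# Hypothetical layout: 12 cores as two 6-core nodes, three VMs.
nodes = numa_nodes(total_cores=12, cores_per_node=6)
binding = bind_vms({"vm_a": 4, "vm_b": 6, "vm_c": 2}, nodes)
for vm, cores in binding.items():
    print(vm, cores)
```

Keeping each VM inside one node is only one possible policy; it trades placement flexibility for local memory access and less interference between VMs.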
[Figure: Standard Deviation / Mean for five configurations: Native w/o binding; VM w/o binding, w/o NUMA Bal.; Native w/ binding; VM w/ binding, w/ NUMA Bal.; VM w/o binding, w/ NUMA Bal.]
MOTIVATION
• Energy consumption of computing clusters is increasing by 15% per year
• Energy efficiency and budget/cost control are the major challenges for data centers
RUNTIME IMPLEMENTATION & RESULTS
• Each application is initially executed with equal resources
• Our technique:
• (1) Monitor IPC, CPU utilization, and throughput
• (2) Access the LUT to check phases: allocate more resources to higher-class (i.e., scalable) applications and fewer resources to lower-class applications
• (3) Monitor throughput gains and losses due to resource adjustments; if the gains are higher, continue to adjust resources
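The three steps above can be sketched as a feedback loop. Everything named here is hypothetical: the LUT thresholds, the class numbers, the one-core step size, and the `measure` callback stand in for details the poster does not give, and the toy workload model exists only to make the example runnable.

```python
from typing import Callable, Dict, List, Tuple

Stats = Dict[str, Tuple[float, float, float]]  # vm -> (ipc, util, throughput)

# Hypothetical LUT: upper bound on IPC * CPU utilization -> class
# (a higher class means a more scalable application).
LUT: List[Tuple[float, int]] = [(0.5, 0), (1.0, 1), (float("inf"), 2)]


def classify(ipc: float, util: float) -> int:
    """Step 2: look up the application's class from IPC * CPU utilization."""
    score = ipc * util
    for upper_bound, app_class in LUT:
        if score < upper_bound:
            return app_class
    return LUT[-1][1]


def adjust(alloc: Dict[str, int], classes: Dict[str, int]) -> Dict[str, int]:
    """Propose moving one core from the lowest-class VM to the highest-class VM."""
    donor = min(alloc, key=lambda vm: (classes[vm], -alloc[vm]))
    receiver = max(alloc, key=lambda vm: classes[vm])
    if classes[donor] == classes[receiver] or alloc[donor] <= 1:
        return alloc  # nothing useful to move
    proposal = dict(alloc)
    proposal[donor] -= 1
    proposal[receiver] += 1
    return proposal


def run(vms: List[str], total_cores: int,
        measure: Callable[[Dict[str, int]], Stats],
        max_steps: int = 10) -> Dict[str, int]:
    alloc = {vm: total_cores // len(vms) for vm in vms}  # equal initial share
    for _ in range(max_steps):
        stats = measure(alloc)  # step 1: monitor
        classes = {vm: classify(ipc, util) for vm, (ipc, util, _) in stats.items()}
        proposal = adjust(alloc, classes)
        if proposal == alloc:
            break
        before = sum(t for _, _, t in stats.values())
        # Stands in for applying the proposal and observing the result.
        after = sum(t for _, _, t in measure(proposal).values())
        if after <= before:  # step 3: losses outweigh gains, stop adjusting
            break
        alloc = proposal
    return alloc


def toy_measure(alloc: Dict[str, int]) -> Stats:
    """Toy model: one VM scales with cores, the other saturates at two cores."""
    return {
        "scalable": (1.5, 0.9, 10.0 * alloc["scalable"]),
        "flat": (0.6, 0.5, 15.0 * min(alloc["flat"], 2)),
    }


final = run(["scalable", "flat"], total_cores=12, measure=toy_measure)
print(final)
```

The loop stops as soon as a proposed move would not raise total throughput, which mirrors the "continue while gains are higher" rule; a throughput constraint per VM could be added as an extra check before accepting a proposal.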
[Figure: Runtime Behavior w/o Throughput Constraints]
EXPERIMENTAL SETUP
• 12-core AMD Magny Cours
• Two 6-core processors in a single package
[Figure x-axis: blackscholes, bodytrack, canneal, dedup, fluidanimate, freqmine, raytrace, streamcluster, swaptions, vips, x264, Average]
• A VM with the NUMA balancer and CPU binding provides performance isolation and performance comparable to the best case
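The Standard Deviation / Mean metric used to quantify performance variation can be computed per benchmark from repeated runs. The sketch below assumes the population standard deviation (the poster does not say which variant was used), and the runtimes are made-up placeholders, not measurements from this work.

```python
import statistics
from typing import Sequence


def variation(samples: Sequence[float]) -> float:
    """Return the population standard deviation divided by the mean."""
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return statistics.pstdev(samples) / mean


# Placeholder runtimes in seconds for one benchmark under two configurations.
unbound = [10.0, 14.0, 9.0, 15.0]
bound = [12.0, 12.5, 11.5, 12.0]

print(f"w/o binding: {variation(unbound):.3f}")
print(f"w/ binding:  {variation(bound):.3f}")
```

A lower ratio means runs of the same benchmark are more repeatable, which is the sense in which a configuration isolates performance better.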
[Figure: Runtime Behavior w/ Throughput Constraints]
CLASSIFYING APPLICATIONS FOR POWER EFFICIENCY
• The IPC*CPU Utilization metric shows a strong correlation with power efficiency
[Figure: Throughput-per-watt Across 50 Workload Sets]
• We use a density-based clustering algorithm (DBSCAN) to determine application groups (classes)
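To show how density-based clustering can turn IPC*CPU Utilization scores into classes, here is a minimal one-dimensional DBSCAN written in plain Python. The application names, the (IPC, utilization) values, and the `eps` and `min_pts` settings are assumptions chosen for illustration; in practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would be the usual choice.

```python
from typing import Dict, List, Optional, Tuple

NOISE = -1


def dbscan_1d(values: List[float], eps: float, min_pts: int) -> List[int]:
    """Label each value with a cluster ID (0, 1, ...) or NOISE."""
    n = len(values)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # Indices within eps of point i, including i itself.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical (IPC, CPU utilization) samples per application.
samples: Dict[str, Tuple[float, float]] = {
    "app_a": (1.6, 0.95),
    "app_b": (1.5, 0.90),
    "app_c": (0.8, 0.60),
    "app_d": (0.7, 0.70),
    "app_e": (0.3, 0.20),
}
scores = [ipc * util for ipc, util in samples.values()]
labels = dbscan_1d(scores, eps=0.2, min_pts=2)
for name, label in zip(samples, labels):
    print(name, label)
```

Points that fall in no dense region are labeled as noise rather than forced into a class, which is one reason a density-based method can suit applications with unusual behavior.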
HPC on Cloud
• HPC applications are expected to shift towards cloud resources
• The nature of HPC applications differs from that of traditional workloads on the cloud
[Figure: Max, Average, and Min throughput-per-watt for application selection based on MPC, MPC*Util, IPC, IPC*Util, and Random, and for the Proposed Resource Allocation]
• For 50 randomly generated workload sets, the proposed technique improves throughput-per-watt by 15% on average and by up to 21%.
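For readers who want the arithmetic behind such a summary, the sketch below computes average and maximum relative improvement in throughput-per-watt across workload sets. The three sets and all of their values are invented for illustration and are not the poster's data; the function names are likewise hypothetical.

```python
from typing import List, Tuple


def throughput_per_watt(throughput: float, power_watts: float) -> float:
    """Return throughput divided by power draw."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return throughput / power_watts


def improvements(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (average, maximum) relative improvement in percent.

    Each pair is (baseline, proposed) throughput-per-watt for one workload set.
    """
    gains = [(proposed - baseline) / baseline * 100.0
             for baseline, proposed in pairs]
    return sum(gains) / len(gains), max(gains)


# Invented (throughput, watts) measurements: baseline vs. proposed per set.
sets = [
    ((4.0e10, 100.0), (4.4e10, 100.0)),
    ((5.0e10, 100.0), (6.0e10, 100.0)),
    ((6.0e10, 120.0), (6.72e10, 120.0)),
]
pairs = [(throughput_per_watt(*base), throughput_per_watt(*prop))
         for base, prop in sets]
average_gain, max_gain = improvements(pairs)
print(f"average: {average_gain:.1f}%  max: {max_gain:.1f}%")
```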
Applications: PARSEC 2.1 Parallel Benchmarks [3]
REFERENCES
[1] C. Hankendi, A. Coskun, ‘Adaptive Energy-Efficient Resource Sharing for Multi-threaded Workloads in Virtualized Systems’. In CHANGE-DAC’12.
[2] C. Hankendi, A. Coskun, ‘Reducing the Energy Cost of Computing Through Efficient Co-Scheduling of Parallel Workloads’. In DATE’12.
[3] C. Bienia et al., ‘The PARSEC benchmark suite: characterization and architectural implications’. In PACT, 2008.
*This work is partially funded by VMware, Inc. and MGHPCC.