Effective VM Sizing in Virtualized Data Centers

Ming Chen (1), Hui Zhang (2), Ya-Yunn Su (3), Xiaorui Wang (1), Guofei Jiang (2), Kenji Yoshihira (2)
(1) University of Tennessee
(2) NEC Laboratories America
(3) National Taiwan University
Virtualized data centers: server consolidation and green IT
• Server consolidation - virtualization facilitates consolidation of several physical servers onto a single high-end system
– Reduces management costs/overheads
– Increases overall utilization
[Figure: resource pool]
• Green IT - compute more, consume less
– Today: improving infrastructure efficiency
  $\mathrm{DCiE} = \frac{\text{IT load power}}{\text{Total data center input power}}$
– Future: increasing IT productivity
  $\mathrm{DCpW} = \frac{\text{Data center useful work}}{\text{Total facility power}}$
DCiE: Data center infrastructure efficiency
DCpW: Data center performance per Watt
7/16/2015
In virtualized data centers…
• Server utilization based performance and power management mechanisms
– VMware DPM, NEC SSC, IBM Tivoli…
[Figure: CPU utilization, with an overload threshold at CPUhigh and a power-saving mode threshold at CPUlow]
VM sizing – a resource management
primitive in virtualized data centers
• How much resource should be allocated to this VM?
– Sizing over the maximal load? Low resource utilization!
– Sizing over the average load? High performance violations!
[Figure: a VM's CPU utilization over time]
• Maximal load is much larger than the average load
– 90% of the servers have a maximal load at least 2.2 times larger than their average load;
– 50% of the servers have a maximal load at least 7.2 times larger than their average load.
[Figure: Cumulative Distribution Function of Server Normalized Percentile Loads (5,415 servers of 10 IT systems)]
Effective VM sizing
• Effective size, a new VM sizing concept under
probabilistic SLAs
– A probabilistic SLA example [Bobroff2007]
– Prob[server x’s CPU utilization at any time > 90%] < 5%
• A VM's effective size is decided by four factors
1. its own workload
2. performance constraint defined as probabilistic SLAs
3. the resource capacity of the server
4. the VMs co-hosted in the server
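A probabilistic SLA of the kind quoted above can be checked empirically against a monitored trace. The sketch below is only an illustration: the function names, the 90% threshold default, and the sample trace are assumptions, and the probability is estimated as the fraction of samples that exceed the threshold.

```python
def violation_fraction(cpu_samples, threshold=90.0):
    """Fraction of samples whose CPU utilization exceeds the threshold."""
    if not cpu_samples:
        raise ValueError("need at least one sample")
    over = sum(1 for u in cpu_samples if u > threshold)
    return over / len(cpu_samples)


def meets_sla(cpu_samples, threshold=90.0, max_violation=0.05):
    """True if the estimated Prob[utilization > threshold] is below max_violation."""
    return violation_fraction(cpu_samples, threshold) < max_violation


# Hypothetical 100-sample trace: 3 samples above 90% utilization.
trace = [40.0] * 97 + [95.0] * 3
print(violation_fraction(trace))
print(meets_sla(trace))
```

An empirical fraction is only an estimate of the true probability; short traces make it noisy.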
Stochastic bin packing problem
• Given (the VMs' workload)
– a set of items, whose sizes are described by independent random variables S = {X1, X2, …, Xn},
– and an overflow probability p,
• Partition (onto machines)
– the set S into the smallest number of sets (bins) S1, …, Sk such that
– $\Pr\left[\sum_{X_i \in S_j} X_i > 1\right] \le p$ (the SLA)
– for all 1 ≤ j ≤ k.
• Effective sizing is the basis of a family of O(1)-approximation algorithms for the stochastic bin packing problem.
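To make the overflow constraint concrete, the following sketch estimates a bin's overflow probability from time-aligned demand traces and checks a whole partition against p. It assumes bins of capacity 1 and demands normalized to that capacity; the function names and the sample traces are hypothetical.

```python
def overflow_probability(item_traces, capacity=1.0):
    """Fraction of time points at which the summed demand exceeds capacity.

    item_traces: list of equally long demand traces, one per item in the bin.
    """
    if not item_traces:
        return 0.0
    n = len(item_traces[0])
    if n == 0 or any(len(t) != n for t in item_traces):
        raise ValueError("traces must be non-empty and equally long")
    totals = [sum(column) for column in zip(*item_traces)]
    return sum(1 for total in totals if total > capacity) / n


def partition_is_feasible(bins, p, capacity=1.0):
    """True if every bin's estimated overflow probability is at most p."""
    return all(overflow_probability(b, capacity) <= p for b in bins)


# Two hypothetical items sharing one bin; the sum exceeds 1 at one of four points.
a = [0.3, 0.5, 0.6, 0.2]
b = [0.4, 0.6, 0.3, 0.1]
print(overflow_probability([a, b]))
print(partition_is_feasible([[a, b]], p=0.25))
```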
Effective Sizing – intrinsic demand
• Let a random variable Xi represent VM i's resource demand, and let Cj be the resource capacity of server j.
• The intrinsic demand of VM i on server j is defined as
  $C_j / N_{ij}$,
  where Nij is the maximal value of N satisfying the constraint
  $\Pr\left[\sum_{k=1}^{N} U_k > C_j\right] \le p$,
  and the Uk are independent and identically distributed (i.i.d.) random variables with the same distribution as Xi.
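As a worked illustration, the sketch below computes an intrinsic demand for normally distributed workloads. It assumes the definition takes the form C / N, where N is the largest number of i.i.d. copies of the VM whose summed demand exceeds the capacity C with probability at most p. For normal demands with mean mu and standard deviation sigma, that constraint reduces to N*mu + z*sigma*sqrt(N) <= C with z the (1 - p) quantile of the unit normal. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def intrinsic_demand_normal(mu, sigma, capacity, p):
    """Return (demand, N) for i.i.d. Normal(mu, sigma^2) workloads.

    N is the largest count with Pr[sum of N copies > capacity] <= p.
    """
    if mu <= 0 or sigma < 0 or not 0 < p < 0.5:
        raise ValueError("need mu > 0, sigma >= 0 and 0 < p < 0.5")
    z = NormalDist().inv_cdf(1.0 - p)
    n = 0
    # The left-hand side grows with n, so stop at the first violation.
    while (n + 1) * mu + z * sigma * sqrt(n + 1) <= capacity:
        n += 1
    if n == 0:
        raise ValueError("a single VM already violates the overflow bound")
    return capacity / n, n


demand, n = intrinsic_demand_normal(mu=10.0, sigma=5.0, capacity=100.0, p=0.025)
print(n, round(demand, 2))
```

With these numbers the per-VM demand sits between the mean (10) and the VM's own 97.5th percentile (about 19.8), which is the statistical multiplexing gain.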
Intrinsic demand – one example
Statistical multiplexing rocks!
[Figure: Effective sizing example: i.i.d. random variables with normal distribution (server overload probability = 2.5%)]
Intrinsic demand – analysis
• Theorem 1. For items with independent Poisson
distributions, the First Fit Decreasing (FFD)
deterministic bin packing algorithm with effective sizing
(intrinsic demand) finds a solution to the stochastic bin
packing problem with at most (1.22B*+1) bins of size 1,
where B* is the minimum possible number of bins.
• Theorem 2. For items with independent normal
distributions, the First Fit Decreasing deterministic bin
packing algorithm with effective sizing (intrinsic demand)
finds a solution to the stochastic bin packing problem
with at most (1.22B*+1) bins of size 1+rc, where B* is the
minimum possible number of bins, and rc ≤ 0.496.
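For reference, First Fit Decreasing itself is a short deterministic procedure. The sketch below assumes each VM's effective size has already been computed and normalized to a bin capacity of 1; the function name and the sample sizes are hypothetical.

```python
def first_fit_decreasing(sizes, bin_capacity=1.0):
    """Pack items by First Fit Decreasing; returns a list of bins of item indices."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    bins = []   # item indices per bin
    free = []   # remaining capacity per bin
    for i in order:
        size = sizes[i]
        if size > bin_capacity:
            raise ValueError(f"item {i} does not fit in an empty bin")
        for b, room in enumerate(free):
            if size <= room + 1e-12:   # small tolerance for float round-off
                bins[b].append(i)
                free[b] = room - size
                break
        else:
            # No open bin has room, so open a new one.
            bins.append([i])
            free.append(bin_capacity - size)
    return bins


sizes = [0.5, 0.7, 0.3, 0.2, 0.4, 0.1]   # hypothetical effective sizes
packing = first_fit_decreasing(sizes)
print(len(packing), packing)
```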
Intrinsic demand may not be enough
• Workload independence assumption might
not hold in practice
Effective Sizing – correlation-aware demand
• Let a random variable Xi represent VM i's resource demand, and let another random variable Xj represent server j's existing aggregate resource demand from the VMs already allocated to it.
• The correlation-aware demand of VM i on server j is defined as
  $Z_\alpha \left( \sqrt{\sigma_i^2 + \sigma_j^2 + 2\rho\sigma_i\sigma_j} - \sqrt{\sigma_i^2 + \sigma_j^2} \right)$,
  where σi² and σj² are the variances of the random variables Xi and Xj; ρ is the correlation coefficient between Xi and Xj; and Zα denotes the α-percentile of the unit normal distribution (α = 1 − p).
• For example, if we want the overflow probability p = 0.25%, then α = 99.75% and Zα ≈ 2.81 (roughly 3).
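The quantities named above can be computed with the Python standard library. The sketch below is an assumption-laden illustration, not the paper's implementation: it takes the correlation-aware demand to be the extra Zα-scaled standard deviation that correlation adds to the sum Xi + Xj, compared with the independent case. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_alpha(p):
    """The (1 - p) quantile of the unit normal distribution."""
    return NormalDist().inv_cdf(1.0 - p)


def correlation_aware_demand(sigma_i, sigma_j, rho, p):
    """Assumed form: z * (std of the correlated sum - std of the independent sum)."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    var_corr = sigma_i ** 2 + sigma_j ** 2 + 2.0 * rho * sigma_i * sigma_j
    correlated = sqrt(max(0.0, var_corr))   # guard against float round-off
    independent = sqrt(sigma_i ** 2 + sigma_j ** 2)
    return z_alpha(p) * (correlated - independent)


print(round(z_alpha(0.0025), 3))
print(correlation_aware_demand(5.0, 12.0, rho=0.0, p=0.05))
print(correlation_aware_demand(5.0, 12.0, rho=0.6, p=0.05) > 0)
```

Under this form, a positively correlated VM costs more than its intrinsic demand suggests, and a negatively correlated one costs less.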
Applying effective sizing in production systems
• Practical issues in many dimensions
– Product implementation
– VM migration cost
• History and correlation aware (HCA) VM placement algorithm in the
paper.
– Workload distribution modeling
– Workload stationarity
– Application-layer SLAs
• Please see discussions in the paper.
Data center workload traces
• Traces on 2525 servers from 10 IT systems
– Each is regarded as a VM in the simulations.
• Monitoring data: CPU utilization.
• 1 week length, 15 minute monitoring frequency
– 672 time points
[Figure: four panels of sample CPU utilization traces plotted against time point]
Simulation methodology
• All physical servers have homogeneous hardware specs.
– CPU resource: 3 GHz × 4 (quad-core) (the most common CPU model in the traces)
– Memory constraint: the maximal number of VMs allowable if the server is memory-bound (4, 8, 16, …)
• At the beginning of each time window, invoke the server consolidation scheme
– Using the monitoring information in the previous window to make the decision
• During each time window, measure the placement scheme by
– The number of active servers
– Server overflow probability (p = 5% in the evaluation)
• Five server consolidation schemes
– B1: FFD + average load
– B2: FFD + maximal load
– B3: FFD + VMware DPM VM sizing (μ + 2σ; μ: mean, σ: standard deviation)
– B4: FFD + 95-percentile load
– ES-CA: FFD + effective sizing
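The four baseline sizes (B1 to B4) are simple statistics of a VM's trace from the previous window. The sketch below shows one way to compute them; the function names, the nearest-rank percentile rule, and the use of the population standard deviation are assumptions, since the slides do not specify them.

```python
from math import ceil
from statistics import mean, pstdev


def percentile_nearest_rank(samples, q):
    """q-th percentile (0 < q <= 100) using the nearest-rank rule."""
    ordered = sorted(samples)
    rank = max(1, ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def baseline_sizes(trace):
    """Baseline VM sizes computed from one window of CPU demand samples."""
    mu = mean(trace)
    sigma = pstdev(trace)   # population standard deviation (an assumption)
    return {
        "B1_average": mu,
        "B2_maximum": max(trace),
        "B3_mean_plus_2sigma": mu + 2 * sigma,
        "B4_95th_percentile": percentile_nearest_rank(trace, 95),
    }


window = [float(v) for v in range(1, 101)]   # hypothetical demand samples
for name, size in baseline_sizes(window).items():
    print(name, round(size, 2))
```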
Simulation results
Effective sizing
– 46% fewer servers than max-load sizing
– 23% fewer servers than VMware DPM
– 10% fewer servers than 95-percentile
Simulation results
Effective sizing (ES-CA)
– 34% fewer servers than max-load sizing
– 16% fewer servers than VMware DPM
– 11% fewer servers than 95-percentile
Conclusions & Future Work
• Effective sizing, a new VM sizing method in server consolidation.
– O(1)-approximation algorithms for the stochastic bin packing problem.
– Migration-cost and workload-correlation aware VM placement algorithms.
• Future work
– Server consolidation in multiple dimensions.
• CPU, memory, disk, network.