Dynamic Placement of Virtual
Machines for Managing SLA
Violations
NORMAN BOBROFF, ANDRZEJ KOCHUT, KIRK BEATY
SOME SLIDE CONTENT ADAPTED FROM ALEXANDER NUS
PRESENTED BY JON LOGAN
Motivation

Virtual machines are becoming more and more popular
throughout our datacenters

Servers use electricity

Electricity can be expensive!

How do we minimize the number of utilized machines, while
meeting our SLA obligations?

Usage patterns of machines are NOT static; demand
changes over time
Goals

Maximize utilization of active machines

Minimize Service Level Agreement (SLA) violations

Minimize number of active machines


Power off unused machines to save on electricity
costs
Essentially, minimize cost while meeting SLA
guarantees
Static Allocation

All machines are taken offline, and historical usage
is used to determine ideal placement

Happens very infrequently (~weeks or months)

Must interrupt service to relocate

Utilization is not consistent in many cases! Demand
may vary significantly within the period between
allocations
Dynamic Allocation

VMs are seamlessly migrated between machines
based on predicted demand

Is done rather frequently (~minutes, hours)

Live migration


Minimal (~ms) service disruptions during migration
Allows for allocations to more closely follow demand
Live Migration

Moves a VM image between machines without
service interruption

The paper cites a ~45 second transition time

VM must be serialized and transferred over the
network

Artificially limits our reallocation period

Can’t reallocate faster than we can migrate!
Service Level Agreement


Essentially a contract between the provider and the customer
stating that resources R will be available X% of the time

Violations cost money!

X is usually high (e.g. 95%)
VMs do not necessarily use this entire resource allocation at all
times, but it must be available should they choose to use it

Ex. VM may be doing batch processing, and only do substantial work
between 12:00AM and 1:00AM
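
The "available X% of the time" condition can be checked against a usage trace. The sketch below is illustrative only: the function name `sla_met`, the per-interval capacity list, and the sample numbers are assumptions, not taken from the paper.

```python
def sla_met(demand, available, target_fraction):
    """Return True if capacity covered demand in at least
    target_fraction of the measured intervals."""
    satisfied = sum(1 for d, a in zip(demand, available) if d <= a)
    return satisfied / len(demand) >= target_fraction

# Hypothetical trace: 100 intervals, capacity fixed at 100 units.
available = [100] * 100
mostly_idle = [40] * 97 + [120] * 3     # 3 overflow intervals
often_busy = [40] * 90 + [120] * 10     # 10 overflow intervals

print(sla_met(mostly_idle, available, 0.95))  # True  (97% covered)
print(sla_met(often_busy, available, 0.95))   # False (90% covered)
```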
Static vs Dynamic Usages

Workloads are not static!

Try to predict the usage of the VM over the
next interval T

Reallocate machines to be able to meet
that predicted usage

Need to be within a certain percentile to
meet SLA requirements

Capacity savings is simply


Static Allocation - (Predicted Usage + Error
Factor)
Repeat this process every interval T
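
A small worked instance of the savings expression above. The numbers are hypothetical and chosen only to make the arithmetic easy to follow; all three inputs are assumed to be in the same capacity units.

```python
def capacity_savings(static_allocation, predicted_usage, error_factor):
    """Capacity freed by dynamic allocation relative to a static one."""
    dynamic_allocation = predicted_usage + error_factor
    return static_allocation - dynamic_allocation

# Hypothetical VM: statically sized for 100 CPU units, predicted to need
# 55 units in the next interval, with a 15-unit safety margin for error.
print(capacity_savings(100, 55, 15))  # 30
```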
What Workloads Are Best For Dynamic Allocation?

Not all Workloads are created equal

Some tend to be better than others

Constant workloads = bad!

A workload is an ideal candidate for dynamic
allocation if


It has strong variability AND

It has strong autocorrelation combined with periodic
behavior
Essentially, the workload needs a decent degree of
variability, and its usage must be reasonably
predictable
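
Both criteria can be estimated from a demand trace. The sketch below uses the coefficient of variation as a stand-in measure of variability and the sample autocorrelation at a chosen lag; these particular statistics, the function names, and the sample trace are assumptions for illustration, not the paper's exact definitions.

```python
import math

def coefficient_of_variation(xs):
    """Standard deviation divided by the mean (0 for a constant trace)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(var) / mean

def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    if denom == 0:
        return 0.0  # constant trace: nothing to correlate
    num = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return num / denom

# Hypothetical trace that repeats every 4 samples.
periodic = [10, 50, 90, 50] * 6
constant = [50] * 24

print(coefficient_of_variation(periodic) > coefficient_of_variation(constant))
print(autocorrelation(periodic, 4) > 0.8)
```

A high value at a lag equal to the suspected period (here 4) suggests periodic behavior; a constant trace scores zero on both measures, matching the "constant workloads = bad" point.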
Workload 3a

Strongly variable – good

Autocorrelation ~0.8 – good

Weak periodic behavior – bad

Verdict – Good

Large variability offers significant
potential for optimization

Strong autocorrelation makes it possible
to obtain a low-error prediction
Workload 3b

Weakly variable - bad

Decaying autocorrelation - bad

Weak periodic behavior – bad

Verdict – Bad

Low variability makes potential gain low

Weak autocorrelation and no periodic
component make it difficult to predict
demand
Workload 3c

Strongly variable – good

Strong autocorrelation – good

Strong periodic behavior – good

Verdict – Very Good

An ideal case for dynamic
allocation
Potential Gain
Demand forecast algorithm

Determine the periods in demand using ‘common
sense’ aided by a periodogram (e.g. time of day, day
of week, …)

Decompose the process into deterministic periodic
and residual components D_i + r_i

Estimate the deterministic part using averaging of
multiple smoothed historical periods

Fit an autoregressive moving average (ARMA) model
to the residual process

Use the combined components for demand
prediction
U_i = D_i + r_i
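
The decomposition steps above can be sketched as follows. This is a simplified illustration, not the paper's implementation: it skips the smoothing step, and it fits a plain AR(1) model to the residual as a stand-in for a full ARMA fit. All function names are hypothetical.

```python
def periodic_profile(history, period):
    """Deterministic part D: average demand at each phase of the period,
    taken over all complete historical periods."""
    n_periods = len(history) // period
    return [
        sum(history[p * period + phase] for p in range(n_periods)) / n_periods
        for phase in range(period)
    ]

def fit_ar1(residuals):
    """Least-squares AR(1) coefficient phi for r[t] ~ phi * r[t-1]."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(r * r for r in residuals[:-1])
    return num / den if den else 0.0

def forecast_next(history, period):
    """One-step forecast U = D + r for the interval after the history ends."""
    profile = periodic_profile(history, period)
    residuals = [x - profile[t % period] for t, x in enumerate(history)]
    phi = fit_ar1(residuals)
    next_phase = len(history) % period
    return profile[next_phase] + phi * residuals[-1]

# Hypothetical trace with a period of 4 samples and no noise:
# the forecast falls back to the periodic profile.
print(forecast_next([10, 20, 30, 40] * 3, 4))  # 10.0
```

With a noisy trace the AR(1) term nudges the forecast toward the most recent deviation from the periodic profile; a real deployment would likely use a statistics library to fit the ARMA orders properly.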
Management Algorithm

Goal is to minimize the time-averaged number of active servers
without violating the SLA

Machines that are not utilized to handle VMs are powered off
or put in a low power state

Will be reactivated if/when required (minimally, the next period)

The time to power on & migrate must be less than the period T

Responsible for the actual migration of VMs

Placing of VMs is essentially a version of the bin packing
problem

NP-hard!

We use a first-fit approximation
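
A minimal first-fit sketch for the placement step. It assumes a single resource dimension, identical server capacities, and VMs placed in the order given; the slides do not say how VMs are ordered, so treat that as an assumption. The function name and sample demands are hypothetical.

```python
def first_fit(demands, capacity):
    """Place each VM on the first server with enough free capacity,
    opening a new server when none fits.

    Returns (placement, server_count), where placement[n] is the
    index of the server hosting VM n.
    """
    free = []        # remaining capacity of each active server
    placement = []
    for demand in demands:
        if demand > capacity:
            raise ValueError("VM demand exceeds server capacity")
        for idx, room in enumerate(free):
            if demand <= room:
                free[idx] -= demand
                placement.append(idx)
                break
        else:
            # No active server had room: power on another one.
            free.append(capacity - demand)
            placement.append(len(free) - 1)
    return placement, len(free)

# Hypothetical predicted demands for five VMs, servers of capacity 100.
print(first_fit([50, 70, 30, 40, 10], 100))  # ([0, 1, 0, 2, 0], 3)
```

First-fit is a heuristic: it runs quickly but may use more servers than the optimal packing.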
Management Algorithm

Measure – Measure usage

Forecast – Predict usage for the next window

Remap – Relocate machines if necessary

Perform this (MFR) at regular intervals

Designed to try to predict the “best we can do”
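
The Measure, Forecast, Remap cycle can be sketched as a simple loop. The three callbacks, the function name `run_mfr`, and the toy data are hypothetical; a real controller would also wait for the remapping interval between iterations, which is omitted here.

```python
def run_mfr(measure, forecast, remap, intervals):
    """Run the Measure-Forecast-Remap cycle a fixed number of times.

    measure()         -> current demand per VM (dict)
    forecast(history) -> predicted demand per VM for the next interval
    remap(predicted)  -> result of the new placement
    """
    history, results = [], []
    for _ in range(intervals):
        history.append(measure())            # Measure
        predicted = forecast(history)        # Forecast
        results.append(remap(predicted))     # Remap
    return results

# Toy callbacks: repeat the last sample as the forecast, and report how
# many capacity-100 servers the total predicted demand would need.
samples = iter([{"vm1": 30, "vm2": 50}, {"vm1": 60, "vm2": 70}])
servers_needed = run_mfr(
    measure=lambda: next(samples),
    forecast=lambda history: history[-1],
    remap=lambda predicted: -(-sum(predicted.values()) // 100),  # ceiling
    intervals=2,
)
print(servers_needed)  # [1, 2]
```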
Management Algorithm Overview
Key Terms

N – number of virtual machines

M – number of physical machines

C_m – maximum capacity of physical machine m

f^n_{i,k} – forecast value for resource demand of VM n at interval
i+k

R – migration interval

C_p(μ, σ²) – (1-p)-percentile of a Gaussian distribution with mean
μ and variance σ²
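
The Gaussian percentile term can be computed with Python's standard library. The sketch below assumes that "(1-p)-percentile" means the value exceeded with probability p; the function name `c_p` and the sample numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def c_p(p, mu, sigma2):
    """Value that a Gaussian with mean mu and variance sigma2
    exceeds with probability p (its (1-p)-percentile)."""
    return NormalDist(mu, sqrt(sigma2)).inv_cdf(1 - p)

# Hypothetical case: mean demand 40, variance 25, overflow allowed in
# 5% of intervals. The result is the capacity to reserve.
print(round(c_p(0.05, 40, 25), 1))  # 48.2
```

A smaller p (a stricter SLA) pushes the percentile further above the mean, so more capacity has to be reserved per VM.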
Management Algorithm (slides 1 to 4, figures only)
Simulations

Simulated using traces gathered from hundreds of production servers running
various applications

Traces contain CPU, memory, storage, and network

We are only focusing on CPU usage

Samples were collected every 15 minutes

The simulated study

Verifies that the MFR meets SLA targets

Quantifies the reduction of SLA violations

Quantifies the number of saved machines

Explores the relationship between the remapping interval and the gain from
dynamic management

Performs measurements to determine properties of a practical infrastructure
with respect to migration of VMs
Overflows vs Number of PMs
Number of Machines vs Overflow Desired

Significantly reduces the number of active machines

Performance degrades as the migration interval increases

Essentially, the prediction is the max usage predicted within the range
Limitations

The paper only looks at the utilization of one resource

In this case, CPU utilization

In the real world, you have numerous resources to handle
allocations for

Memory, CPU, IO, Network, etc.

Assumes bandwidth between machines is free &
unrestricted

Relocating some VMs in some cases may not be worth the
cost of relocating the image
Their study size is small

Only 6 physical machines

What if different VMs have different SLA requirements?

What if your PMs had differing hardware?
Conclusion

Based on the simulated data, dynamic placement significantly reduces the
cost of running virtual machines

Relies on an ideal case of VMs

Predictable and volatile usage

Algorithm could be optimized to reduce the number of VM
relocations, or to schedule them more efficiently

Simulation is too small

The paper claims a 44% average savings in the number of active
PMs