Live Migration (LM) Benchmark Research

College of Computer Science
Zhejiang University
China
Outline
 Background and Motives
 Virt-LM Benchmark Overview
 Further Issues and Possible Solutions
 Conclusion
 Our Possible Work under the Cloud WG
Background and Motives
Significance of Live Migration
 Concept:
    Migration: move a VM between different physical machines
    Live: without disconnecting the client or application (invisible to users)
 Relation to Cloud Computing and Data Centers:
    Cloud infrastructures and data centers have to use their huge scale of hardware resources efficiently.
    Virtualization technology provides two approaches:
     Server Consolidation
     Live Migration
 Roles in a Data Center:
    Flexibly remap hardware among VMs
    Balance workload
    Save energy
    Enhance service availability and fault tolerance
Motives of the LM Benchmark
 Scale and frequency lead to a significant LM cost (TC):
    S (Scale): How many servers?
     Google: an estimated 200,000 to 500,000 servers across 36 data centers in 2008
     Microsoft: added 10,000 servers per month in 2008
     Facebook: more than 30,000 servers in its data centers in 2008
    F (Frequency): How often does it happen?
     Load balancing
     Online maintenance and proactive fault tolerance
     Power management
    C (Cost of live migration):
     Hardware and network bandwidth: saving and transferring VM state
     Workload performance: shared hardware
     Service availability: downtime
Motives of the LM Benchmark
 An LM benchmark is needed.
    An LM benchmark helps make the right decisions to reduce this cost:
     Design better LM strategies
     Choose better platforms
    Evaluation of a data center should include its LM performance
     VMware released VMmark 2.0 for multi-server performance in December 2010
 Existing evaluation methodologies have their limitations.
    VMmark 2.x
     Dedicated to VMware's platforms
     A macro benchmark -- no specific metrics for LM performance
    Existing research on LM ([Vee09 Hines], [HPDC09 Liu], [Cluster09 Jin], [IWVT08 Liu], [NSDI05 Clark], …)
     All dedicated to designing LM strategies
     No unified metrics and workloads, so results are not comparable to each other
     Some critical issues are not mentioned
    There is still no formal, qualified LM benchmark
Virt-LM Benchmark Overview
Goal and Criteria
 Goal of the Virt-LM benchmark:
    Compare LM performance among different hardware and software platforms, especially in data center scenarios
 Design criteria:
    Metrics
     Sufficient
     Observable
     Concise
    Workloads
     Typical
     Scalable
    Stability
     Produce repeatable results
    Scoring methodology
     Impartial
    Compatibility
    Usability
[Figure: each platform under test runs the same workloads and produces its own set of metric results]
System Under Test
 System Under Test (SUT): the evaluation target
    The hardware and software platform
    Including its VMM and the LM strategies it uses
[Figure: the same workloads are run on every SUT, and each SUT produces its own metric results]
Metrics
 Metrics and measurement (see the measurement sketch after this list):
    Downtime
     Definition: how long the VM is suspended
     Measurement: ping
    Total migration time
     Definition: how long an LM lasts
     Measurement: timing the LM command
    Amount of migrated data
     Definition: how much data is transferred
     Measurement: transferred data on the migration's exclusive TCP port
    Migration overhead
     Definition: how much the LM impairs the performance of the workload
     Measurement: declined percentage of the workload's score
 Metrics sufficiency:
    Cost: migration overhead, amount of migrated data (burden on the network)
    QoS: downtime, total migration time, migration overhead
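For concreteness, here is a minimal sketch of the two timing measurements described above, assuming a libvirt-managed guest: total migration time is the wall-clock duration of the migrate command, and downtime is roughly estimated from pings that go unanswered while the migration runs. The VM name, destination URI, and guest IP are placeholders; Virt-LM's own measurement scripts may work differently.

    import subprocess
    import time

    VM_NAME = "vm1"                          # hypothetical guest name
    DEST = "qemu+ssh://target-host/system"   # hypothetical migration target URI
    VM_IP = "192.168.0.100"                  # hypothetical guest IP

    def total_migration_time():
        """Total migration time: wall-clock duration of the migrate command."""
        start = time.time()
        subprocess.run(["virsh", "migrate", "--live", VM_NAME, DEST], check=True)
        return time.time() - start

    def estimate_downtime(duration_s=60, interval_s=0.05):
        """Downtime: coarse estimate from pings that fail while the migration is
        in progress (run this loop in parallel with the migration)."""
        lost = 0
        deadline = time.time() + duration_s
        while time.time() < deadline:
            ok = subprocess.run(["ping", "-c", "1", "-W", "1", VM_IP],
                                stdout=subprocess.DEVNULL).returncode == 0
            if not ok:
                lost += 1
            time.sleep(interval_s)
        return lost * interval_s             # rough lower bound on the suspended period

In practice the ping loop would run in a separate thread or process while total_migration_time() executes, and the amount of migrated data would be read from a traffic counter on the migration's TCP port.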
Workloads
 Representative of real scenarios
    Where: data centers
    When: load balancing, power management, service enhancement and fault tolerance
[Figure: a VM running a service is migrated between platforms (HW and VMM) that host several VMs]
Workloads
 During a live migration, the migrated VM could run different services:
    Mail Server
    Application Server
    File Server
    Web Server
    Database Server
    Standby Server
 Other VMs exist on the same platform:
    Heavy load during load balancing
    Light load during power management
    Random load during service enhancement and fault tolerance
 A migration can happen at any moment (migration points)
[Figure: the service VM is migrated while other VMs keep running on the platform (HW and VMM)]
Workload Implementation
 Internal workload types:
    Mail Server: SPECmail2008
    App Server: SPECjAppServer2004
    File Server: Dbench
    Web Server: SPECweb2005
    Database Server: Sysbench
    Standby Server: idle VM
 External workload types:
    Heavy: more VMs to fully utilize the machine
     VMs are added until the workloads' performance is undermined
    Light: a single VM on the platform
[Figure: the internal workload runs inside the migrated VM, while external VMs share the platform (HW and VMM)]
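For reference, the internal/external workload matrix above can be summarized in a small table; the tool names come from the slide, while the Python layout below is only an illustrative sketch, not Virt-LM's actual configuration format.

    # Internal workload types and the benchmark tool driving each one.
    INTERNAL_WORKLOADS = {
        "mail_server":     "SPECmail2008",
        "app_server":      "SPECjAppServer2004",
        "file_server":     "Dbench",
        "web_server":      "SPECweb2005",
        "database_server": "Sysbench",
        "standby_server":  "idle VM",
    }

    # External workload types: how many other VMs share the platform.
    EXTERNAL_WORKLOADS = {
        "heavy": "add VMs until the workloads' performance starts to degrade",
        "light": "a single VM on the platform",
    }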
Migration Points Problem
 During the run of a workload:
    LM happens at a random time
    Performance varies at different migration points
[Figure: metric results vary across migration points; example workload: 483.xalancbmk from SPEC CPU2006]
 How can a workload's performance variety be fully represented?
    Test as many migration points as possible, spread over the whole run of the workload
Migration Points Problem
 Problem:
    Too many points prolong the test significantly
 Solution:
    Collect more sample results in each run
    Use only a few runs
[Figure: migration points spread across the first, second, and third runs]
 Implementation (see the sketch below):
    Divide a workload's runtime into many time sectors
     Each time sector is longer than the total migration time
    Migrate at the start point of each sector
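A minimal sketch of the sector-based sampling just described, assuming the workload's runtime and the platform's typical total migration time are known in advance; trigger_migration is a placeholder for the actual Virt-LM migration command, and the 1.2 safety margin is an assumption.

    import time

    def migration_points(runtime_s, total_migration_time_s, margin=1.2):
        """Split the workload runtime into sectors slightly longer than one total
        migration time and return the start offset of each sector (in seconds)."""
        sector = total_migration_time_s * margin
        return [i * sector for i in range(int(runtime_s // sector))]

    def migrate_at_points(points, trigger_migration):
        """Issue one live migration at the start of every time sector."""
        start = time.time()
        for p in points:
            time.sleep(max(0.0, p - (time.time() - start)))
            trigger_migration()   # placeholder: issue a single live migration

Because each sector is longer than a whole migration, consecutive migrations never overlap, so one run already yields many samples and only a few runs are needed.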
Scoring Method
 Goal: compute an overall score
 For each metric i, compute a final score Si (see the formula below):
    Normalize each result (Pij) against the reference system's result (Rij)
    Sum up the normalized results over all workloads
    Si of the reference system is always 1000
    A lower score indicates higher performance
 Open problem: how to merge the four metrics' Si
    Different properties, different variation
    Simply adding them up is not appropriate
    Current implementation in Virt-LM: the final result consists of 4 scores
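One plausible way to write the per-metric score described above (the exact normalization used in the Virt-LM paper may differ): with P_ij the result of metric i on workload j for the tested platform, R_ij the corresponding result on the reference system, and n workloads,

    S_i \;=\; \frac{1000}{n} \sum_{j=1}^{n} \frac{P_{ij}}{R_{ij}}

With this form the reference system scores exactly 1000 on every metric, and since all four metrics are costs, a lower S_i means better live migration performance.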
Other Criteria
 Usability
    Easy to configure
     VM images provided
     Workloads pre-installed
    Easy to run
     Automatically managed after launch
 Compatibility
    Runs successfully on Xen and KVM
 Scalable workload: fully utilize the hardware
    Heavy enough macro workload
    Live migration lasts a long time
    (Multiple live migration): more than one VM is migrated concurrently
Benchmark Components
 Logical components:
    System Under Test
    Migration Target Platform
    VM Image Storage
    Management Agent
 Benchmark components:
    Workload VM images
     Distributed on the VM Image Storage
    Running scripts
     Installed on the Management Agent
Internal Running Process
 For every class of workload:
    Initialize the environment
    Run the workload
    Migrate the VM at the different migration points
    Fetch the metric results
 Collect all results and compute an overall score
 The Management Agent automatically controls the whole process (sketched below)
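A skeleton of this running process, sketched in Python; the env object and its method names are placeholders standing in for the platform-specific steps that the Management Agent's scripts actually perform, not Virt-LM's real interface.

    def run_benchmark(workload_classes, migration_points, env, compute_scores):
        """Drive the whole process: for every workload class, set up, run, migrate
        at each migration point, and gather the metric results; then score them."""
        results = {}
        for wl in workload_classes:
            env.initialize(wl)                       # 1. initialize the environment
            run = env.start_workload(wl)             # 2. run the workload
            for point in migration_points[wl]:       # 3. migrate at each point
                env.wait_until(run, point)
                env.live_migrate(run)
            results[wl] = env.fetch_metrics(run)     # 4. fetch the metric results
        return compute_scores(results)               # collect results and compute scores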
Experiments on Xen and KVM
 Experiment setup:
    SUT-XEN
     VMM: Xen 3.3 on Linux 2.6.27
     Hardware: Dell OptiPlex 755, 2.4 GHz Intel Core 2 Quad Q6600, 2 GB memory, SATA disk, 100 Mbit network
    SUT-KVM
     VMM: KVM-84 on Linux 2.6.27
     Hardware: same as SUT-XEN
    VM
     Linux 2.6.27, 512 MB memory, one core
    Workload
     Internal: SPECjvm2008, CPU/memory-intensive workloads
     External: light (a single VM)
    Migration points: spread over the whole run
Experiments on Xen and KVM
 Analysis
    SUT-KVM compresses the data intensively
     Less migrated data and less total migration time
     More overhead
Experiments on Xen and KVM
 Analysis
    SUT-XEN strictly controls the downtime
     Less downtime
     More migrated data, due to more rounds of pre-copy to decrease downtime
Experiments on Xen and KVM
 Analysis
    Conclusion: SUT-XEN shows less downtime and less overhead,
    but more network consumption
Further Issues and Possible Solutions
1. Workload Complexity
 The total test takes a long time:
    Total time = Runtime * N workload configurations
 The workloads have too many combinations (see the estimate below):
    N = I * E * P (* M)
    (I) Internal workload types: Mail Server, App Server, File Server, Web Server, DB Server, Standby Server
    (E) External workload types: heavy, light
    (P) Number of migration points: considerable, due to the long run time of each workload
    (M) Multiple (concurrent) migration settings
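A quick back-of-the-envelope illustration of how the combinations add up; only the counts I = 6 and E = 2 come from the slide, while the number of migration points and the per-run runtime below are made-up placeholders.

    I, E, P, M = 6, 2, 10, 1     # internal types, external types, migration points, multiple-migration settings
    runtime_hours = 1.0          # assumed average runtime of a single workload run
    N = I * E * P * M
    print(f"{N} runs, about {N * runtime_hours:.0f} hours of testing")   # 120 runs, about 120 hours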
Possible Solutions
 Speed up the migration points dimension:
    Take more samples in a single run (Virt-LM's current implementation)
 Use time-insensitive workloads:
    Micro operations: CPU, memory, I/O, … with different memory read/write intensities (see the sketch after this list)
    Advantages:
     Eliminates the "migration points" dimension
     Reduces the number of internal workloads
     Shortens the runtime of each workload
    Disadvantage:
     Different from real scenarios
 Hybrid:
    Test time-insensitive micro workloads
    Analyze and predict the results of typical workloads
    Redefine an average workload
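To make the micro-workload idea concrete, here is a generic memory stressor whose write_ratio knob controls how aggressively guest pages are dirtied (and thus how much the pre-copy phase must re-send). It is only an illustration of the approach, not one of Virt-LM's workloads.

    import random
    import time

    def memory_micro_workload(size_mb=256, write_ratio=0.5, duration_s=60):
        """Continuously touch a large buffer; write_ratio is the fraction of
        accesses that dirty a page (writes) rather than just read it."""
        buf = bytearray(size_mb * 1024 * 1024)
        n = len(buf)
        sink = 0
        deadline = time.time() + duration_s
        while time.time() < deadline:
            i = random.randrange(n)
            if random.random() < write_ratio:
                buf[i] = (buf[i] + 1) & 0xFF   # write access: dirties the page
            else:
                sink ^= buf[i]                 # read access: leaves the page clean
        return sink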
2. Multiple/Concurrent Live Migration
 Problem: define overall metrics
    Representative of the platform's maximum performance
    Other concerns:
     when the average results decrease obviously
     when the system cannot afford more concurrent migrations
    Thresholds: the number of concurrent migrations
[Figure: several VMs are migrated concurrently from one platform (HW and VMM)]
 Possible solutions (see the sketch below):
    Use the maximum sum of the metrics
    Define different thresholds on the concurrent number:
     the average result decreases obviously
     the sum of results decreases obviously
     the sum of results reaches its maximum
     the system cannot afford more migrations
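One way to probe such thresholds, sketched under the assumption that run_concurrent_migrations(k) migrates k VMs at once and returns the average value of some cost metric (e.g. average total migration time), raising an error when the system cannot sustain k concurrent migrations; the 20% tolerance is an arbitrary example.

    def find_concurrency_threshold(run_concurrent_migrations, max_vms=16, tolerance=0.2):
        """Return the largest concurrency level whose average per-migration cost
        stays within `tolerance` of the single-migration baseline."""
        baseline = run_concurrent_migrations(1)
        threshold = 1
        for k in range(2, max_vms + 1):
            try:
                avg = run_concurrent_migrations(k)
            except RuntimeError:                      # the system cannot afford k migrations
                break
            if avg > baseline * (1 + tolerance):      # average result degraded "obviously"
                break
            threshold = k
        return threshold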
3. Other Issues
 Overall score computation
    Virt-LM currently produces 4 scores as the final result
 Definition of external workloads
    The current implementation is simple
 Repeatability
    Needs more experiments to examine
    Migration points are not precisely arranged
 Compatibility
    Should be compatible with other VMMs besides Xen and KVM
 Usability
    Should be easier to configure and run
Conclusion
Current Work
 Investigated recent work on LM
 Summarized the critical problems:
    Migration points
    Workload complexity
    Scoring methods
    Multiple live migration
 Presented some possible solutions
 Implemented a benchmark prototype, Virt-LM
    More details in "Virt-LM: A Benchmark for Live Migration of Virtual Machine" (ICPE 2011)
Future work
 Improve and complete Virt-LM
    Implement and test other solutions for:
     Workload complexity
     Multiple live migration
     Overall score computation
     Others
    Test and compare their effectiveness and choose the best one
Our Possible Work under the Cloud WG
Possible Work
 Relation to the cloud benchmark:
    Enough migration cost in the workload
     Although the cost may not itself be a metric, we have to ensure the workload can cause enough of it.
    How fast can a cloud reallocate resources?
     If reallocation is implemented with live migration technology, it depends on two factors:
      1. How many migrations occur → determined by the resource management and reallocation strategies
      2. How fast each migration is → live migration efficiency and cost
 Possible future work under the cloud benchmark:
    We may work on how to ensure the workload produces enough live migration cost
Possible Work
 We hope to cooperate with other members, perhaps by joining a sub-project related to live migration.
 We hope to contribute to the design of the Cloud Benchmark.
Team Members
 Prof. Dr. Qinming He

[email protected]
 Kejiang Ye


Representative of the SPEC Research Group
[email protected]
 Assoc. Prof. Dr. Deshi Ye

[email protected]
 Jianhai Chen

[email protected]
 Dawei Huang

[email protected]
 …….
Appendix: Team’s Recent Work
Virtualization Performance
 Virtualization in Cloud Computing Systems
    IEEE Cloud’2011, IEEE/ACM GreenCom’2010
 Performance Evaluation & Benchmarking of VMs
    ACM/SPEC ICPE’2011, IWVT’2008 (ISCA Workshop), EUC’2008
 Performance Optimization of VMs
    ACM HPDC’2010, IEEE HPCC’2010, IEEE ISPA’2009
 Performance Modeling of VMs
    IEEE HPCC’2010, IFIP NPC’2010
 Performance Testing Toolkit for VMs
    IEEE ChinaGrid’2010
Publications
 [1] Live Migration of Multiple Virtual Machines with Resource Reservation in Cloud Computing Environments (IEEE Cloud’2011, accepted)
 [2] Virt-LM: A Benchmark for Live Migration of Virtual Machine (ACM/SPEC ICPE’2011)
 [3] Virtual Machine Based Energy-Efficient Data Center Architecture for Cloud Computing: A Performance Perspective (IEEE/ACM GreenCom’2010)
 [4] Analyzing and Modeling the Performance in Xen-based Virtual Cluster Environment (IEEE HPCC’2010)
 [5] Two Optimization Mechanisms to Improve the Isolation Property of Server Consolidation in Virtualized Multi-core Server (IEEE HPCC’2010)
 [6] Evaluate the Performance and Scalability of Image Deployment in Virtual Data Center (IFIP NPC’2010)
 [7] vTestkit: A Performance Benchmarking Framework for Virtualization Environments (IEEE ChinaGrid’2010)
 [8] Improving Host Swapping Using Adaptive Prefetching and Paging Notifier (ACM HPDC’2010)
 [9] Load Balancing in Server Consolidation (IEEE ISPA’2009)
 [10] A Framework to Evaluate and Predict Performances in Virtual Machines Environment (IEEE EUC’2008)
 [11] Performance Measuring and Comparing of Virtual Machine Monitors (IWVT’2008, ISCA Workshop)
Thank you!