A Fast Rejuvenation Technique for Server Consolidation
A Fast Rejuvenation Technique for Server Consolidation with Virtual Machines
Kenichi Kourai
Shigeru Chiba
Tokyo Institute of Technology
Server consolidation with VMs
Server consolidation is widely carried out
Multiple server machines are integrated on one physical machine
Recently, virtual machines (VMs) are used for consolidation
VMs run on a virtual machine monitor (VMM)
The VMM multiplexes hardware resources among the VMs
[Figure: multiple VMs running on a VMM on top of the hardware]
Software aging of VMMs
Software aging of a VMM is critical
Software aging is...
• The phenomenon that software state degrades with time
• E.g. exhaustion of system resources
Software aging of a VMM affects all VMs on it
• E.g. performance degradation
[Figure: multiple VMs running on a VMM]
Software rejuvenation of VMMs
Preventive maintenance
Performed before software aging of a VMM affects its VMs
Occasionally stops a VMM, cleans its internal state, and restarts it
Typical example: rebooting a VMM
Cleans the internal state automatically and completely
The easiest way
Drawbacks (1/2): Increasing service downtime
The VMM reboot needs:
Rebooting all OSes running on the VMs
• The time tends to be long
• Larger number of VMs
• Longer startup time of services
A hardware reset
• The BIOS power-on self test is time-consuming
[Figure: timeline of a VMM reboot: OS shutdown, VMM shutdown, hardware reset, VMM boot, OS boot]
Drawbacks (2/2): Performance degradation
The file cache is lost by the OS reboot
OSes cannot restore performance until the file cache is re-filled
• They strongly rely on the file cache to speed up file accesses
The time tends to be long
• The file cache size is increasing
• Large amount of memory for a VM
• Free memory is used as the file cache
[Figure: a process accessing a file through the OS file cache in front of the disk]
Warm-VM reboot
Fast rejuvenation technique
Efficiently reboots only a VMM
• The VMM reboot causes no OS reboot
Basic idea
• Suspend all VMs before the VMM reboot
• Resume them after the reboot
Challenge
• How does a VMM efficiently deal with the large memory images of VMs?
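The basic idea above (suspend all VMs, reboot only the VMM, resume the VMs) can be sketched as a toy model. This is only an illustration under stated assumptions: the `VM` and `Host` classes and all method names are hypothetical and are not part of Xen or of the authors' implementation, and a `bytearray` merely stands in for a VM's memory image.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    memory: bytearray      # stands in for the VM's memory image
    state: str = "running"


class Host:
    """Toy model of one physical machine (all names here are hypothetical)."""

    def __init__(self, vms):
        self.vmm_generation = 0                  # counts VMM boots
        self.vms = {vm.name: vm for vm in vms}   # VMs known to the current VMM
        self.frozen = {}                         # images left in main memory

    def on_memory_suspend(self):
        # Freeze every image in place: keep a reference, copy nothing.
        for name, vm in self.vms.items():
            vm.state = "suspended"
            self.frozen[name] = vm.memory
        self.vms = {}

    def quick_reload(self):
        # Boot a new VMM without a hardware reset, so `frozen` survives.
        self.vmm_generation += 1

    def on_memory_resume(self):
        # Reuse the preserved images directly as the memory of the VMs.
        for name, image in self.frozen.items():
            self.vms[name] = VM(name, image, "running")
        self.frozen = {}

    def warm_vm_reboot(self):
        self.on_memory_suspend()
        self.quick_reload()
        self.on_memory_resume()
```

In this model the guest OS never reboots: the same memory object, including whatever file cache it holds, is handed back to the VM after the new VMM is up.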
On-memory suspend of VMs
Freezes the memory images of VMs on the main memory
That memory area is just reserved
• The time does not depend on the memory size
Saving them into a slow disk is inefficient
Corresponds to the ACPI S3 state (Suspend To RAM) for VMs
Traditional suspend corresponds to the ACPI S4 state (suspend to disk)
[Figure: a VM's memory image frozen in main memory instead of being saved to disk]
On-memory resume of VMs
Unfreezes the memory images preserved on the main memory
They are reused directly as the memory of VMs
• No need to read them from a slow disk
The file cache of OSes is also restored
• No performance degradation
[Figure: a VM's memory image unfrozen in main memory without reading from disk]
Quick reload of VMMs
Directly boots a new VMM without a hardware reset
The memory images of VMs are preserved through the VMM reboot
• Software can keep track of them
• A hardware reset does not guarantee this
A VMM is rebooted quickly
• No overhead due to a hardware reset
[Figure: the old VMM preloads the new VMM into main memory while the VM memory images stay in place]
Comparison with other methods
Cold-VM reboot
Needs the OS reboot
Saved-VM reboot
A naive implementation of the warm-VM reboot
• VMs are saved into a disk
Reboot method               Cold-VM   Saved-VM   Warm-VM
Depends on # of VMs         Yes       No         No
Depends on services         Yes       No         No
Depends on mem size of VMs  No        Yes        No
Performance degradation     Yes       No         No
Model for availability
Must consider the software rejuvenation of both a VMM and OSes
Warm-VM reboot
• The OS rejuvenation is independent
Cold-VM reboot
• The OS rejuvenation is affected by the VMM rejuvenation
• The number of OS rejuvenations increases
[Figure: timelines of OS rejuvenation and VMM rejuvenation for the warm-VM reboot and the cold-VM reboot]
RootHammer
We have implemented the warm-VM reboot in Xen 3.0.0
On-memory suspend/resume
• Based on Xen's suspend/resume
• Manages the mapping from the VM memory to the physical memory
[Figure: mapping from VM memory to physical memory]
Quick reload
• Based on the kexec mechanism in Linux
• Kexec for a VMM is included in the latest Xen
• It is not for reusing the memory images
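To make the role of the VM-to-physical memory mapping more concrete, here is a minimal sketch of what a newly loaded VMM has to respect: the frames that hold frozen VM images must not be handed out again. The data layout and the function name `frames_free_after_reload` are assumptions made for illustration; the slides do not describe RootHammer's actual data structures.

```python
def frames_free_after_reload(total_frames, preserved_mappings):
    """Return the physical frames a new VMM may allocate after a quick reload.

    preserved_mappings is a hypothetical structure: for each suspended VM,
    a dict from the VM's own frame number to the physical frame backing it.
    """
    reserved = set()
    for mapping in preserved_mappings.values():
        reserved.update(mapping.values())   # frames holding frozen images
    return [frame for frame in range(total_frames) if frame not in reserved]


# Two suspended VMs on a machine with 8 physical frames.
mappings = {"vm1": {0: 5, 1: 2}, "vm2": {0: 7}}
free = frames_free_after_reload(8, mappings)
```

Because the mapping is kept by software across the reload, resuming a VM only requires re-attaching the same physical frames rather than copying any data.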
Experiments
Examine whether the warm-VM reboot reduces downtime and performance degradation
Comparison
• Cold-VM reboot with the OS reboot
• Saved-VM reboot using Xen's suspend/resume
[Figure: a server running Linux VMs on the VMM, connected to a Linux client]
Hardware: 2 dual-core Opteron, 12 GB SDRAM, 15,000 rpm SCSI disk, gigabit Ethernet
Performance of on-memory suspend/resume
Suspend/resume of one VM with 11 GB of memory
Ours: 1 sec
Xen's: 280 sec
• Depends on the memory size
Suspend/resume of 11 VMs
Ours: 4 sec
OS reboot: 58 sec
• Depends on # of VMs
Effect of quick reload
The time of rebooting a VMM with no VMs
Warm-VM reboot
• 11 sec
• The time of quick reload is negligible
Cold-VM reboot
• 59 sec
• The time due to a hardware reset is 48 sec
[Figure: bar chart of reboot time in seconds (0 to 70) for warm-VM and cold-VM, broken down into VMM shutdown, hardware reset or quick reload, and VMM boot]
Downtime of services
Warm-VM reboot
Always the same
• 42 sec
Saved-VM reboot
Depends on # of VMs
• 429 sec (11 VMs)
Cold-VM reboot
Affected by the service type
• 157 sec (sshd)
• 241 sec (JBoss)
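As a worked check of these numbers, the relative downtime reduction of the warm-VM reboot over the cold-VM reboot can be computed directly. The sketch below uses only the downtimes listed above; the function and variable names are illustrative.

```python
WARM_VM_DOWNTIME = 42                        # seconds, same for every service
COLD_VM_DOWNTIME = {"sshd": 157, "JBoss": 241}


def reduction(baseline, improved):
    """Fraction of the baseline downtime that is eliminated."""
    return (baseline - improved) / baseline


for service, cold in COLD_VM_DOWNTIME.items():
    percent = round(reduction(cold, WARM_VM_DOWNTIME) * 100)
    print(f"{service}: downtime reduced by about {percent}%")
```

The JBoss case gives the largest reduction over the cold-VM reboot, because a service with a longer startup time makes the cold-VM reboot slower while leaving the warm-VM reboot unchanged.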
Availability of JBoss
The warm-VM reboot achieves four 9s
Assumptions
• OS rejuvenation every week
• Each OS rejuvenation takes 34 sec
• VMM rejuvenation every 4 weeks
• Performed 0.5 week after the last OS rejuvenation
[Figure: timeline with an OS rejuvenation every 1 week and a VMM rejuvenation 0.5 week after an OS rejuvenation]
Warm-VM reboot: 99.993%
Cold-VM reboot: 99.985%
Saved-VM reboot: 99.977%
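The availability figures can be reproduced approximately from the assumptions above and the JBoss downtimes measured earlier (42, 241, and 429 sec). The following is a sketch, not the authors' exact model. In particular, counting 3.5 OS rejuvenations per 4-week period for the cold-VM reboot (4 minus the 0.5 week at which the VMM reboot also restarts the OSes) is an assumption chosen because it matches the reported value.

```python
WEEK = 7 * 24 * 3600  # seconds in one week


def availability(vmm_downtime, os_rejuvenations,
                 os_downtime=34, vmm_interval_weeks=4):
    """Availability over one VMM rejuvenation period.

    vmm_downtime:     downtime of one VMM rejuvenation, in seconds
    os_rejuvenations: OS rejuvenations per period (may be fractional)
    """
    period = vmm_interval_weeks * WEEK
    downtime = vmm_downtime + os_rejuvenations * os_downtime
    return 1 - downtime / period


warm = availability(42, 4)      # OS rejuvenation is independent of the VMM
saved = availability(429, 4)    # same schedule, longer VMM downtime
cold = availability(241, 3.5)   # assumption: the VMM reboot resets the OS timer

for name, value in [("warm", warm), ("cold", cold), ("saved", saved)]:
    print(f"{name}-VM reboot: {value * 100:.3f}%")
```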
Performance degradation
The throughput of the Apache web server before and after the VMM reboot
Warm-VM reboot
• No degradation
Cold-VM reboot
• Degraded by 69%
Software rejuvenation in a cluster environment
Clustering achieves zero downtime
Multiple hosts can provide the same service
Let us consider the total throughput of all hosts in a cluster
Warm-VM reboot
• (m-1)p
Cold-VM reboot
• (m-1)p
• (m-0.69)p for a while after the reboot
m: # of hosts
p: throughput of one host
[Figure: total throughput over time t, dropping from mp to (m-1)p for 42 sec (warm-VM reboot) or 241 sec (cold-VM reboot)]
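The throughput levels on this slide can be written as a piecewise function of the time t since one host started its rejuvenation. This is a sketch under assumptions: the 42 sec and 241 sec outages are taken from the JBoss downtime results, and `refill_seconds` is a hypothetical parameter, since the slide only says the degraded phase lasts "for a while".

```python
def total_throughput(m, p, t, method, refill_seconds=600.0):
    """Total cluster throughput t seconds after one host starts rejuvenating.

    m: number of hosts, p: throughput of one host
    method: "warm" or "cold"
    refill_seconds: hypothetical duration of the file-cache refill phase
    """
    outage = {"warm": 42.0, "cold": 241.0}[method]
    if t < outage:
        return (m - 1) * p          # the rejuvenating host serves nothing
    if method == "cold" and t < outage + refill_seconds:
        return (m - 0.69) * p       # that host runs 69% below normal
    return m * p                    # back to full throughput
```

For example, with m = 4 hosts, the cluster is back at full throughput 50 sec into a warm-VM reboot, while a cold-VM reboot is still in its outage at that point.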
Comparison with VM migration in a cluster environment
VM migration achieves nearly zero downtime
VMs are moved to another host
• Xen's live migration, VMware's VMotion
Total throughput
Normal run
• (m-1)p
• One host is reserved for migration
Live migration
• (m-1.12)p
[Figure: total throughput over time t, with levels mp and (m-1)p and durations of 42 sec and 17 min marked]
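One way to compare the two approaches is to integrate the throughput lost relative to the ideal mp over a period. The sketch below assumes that the 17 min label in the figure is the duration of live migration at (m-1.12)p and that the 42 sec label is the warm-VM reboot outage; both readings, and the function names, are assumptions for illustration.

```python
def lost_work_warm(p, rejuvenations=1):
    """Work lost to warm-VM reboots: one host is down for 42 sec each time."""
    return p * 42 * rejuvenations


def lost_work_migration(p, period_seconds, migrations=1):
    """Work lost with VM migration over a period.

    One host is always reserved (p lost for the whole period), and each
    live migration costs a further 0.12p for an assumed 17 minutes.
    """
    reserved_host = p * period_seconds
    migration_overhead = 0.12 * p * 17 * 60 * migrations
    return reserved_host + migration_overhead


FOUR_WEEKS = 4 * 7 * 24 * 3600
warm_loss = lost_work_warm(1.0)
migration_loss = lost_work_migration(1.0, FOUR_WEEKS)
```

Under these assumptions the migration approach avoids downtime but pays for a permanently reserved host, whereas the warm-VM reboot loses only a short burst of one host's throughput per rejuvenation.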
Related work
Microreboot [Candea et al.'04]
Reboots only some of the subcomponents
• The warm-VM reboot enables rebooting only a parent component (VMM for VMs)
Checkpointing/restart [Randell '75]
Saves/restores OS processes
• Similar to suspend/resume of VMs
Optimizations of suspend/resume
Incremental suspend, compression of memory images
Conclusion
We proposed the warm-VM reboot
On-memory suspend/resume
• Freezes/unfreezes the memory images of VMs
Quick reload
• Preserves the memory images through the VMM reboot
It achieved fast rejuvenation
Downtime reduced by up to 83%
No performance degradation