Transcript: Cellular Disco: Resource Management using Virtual Clusters
Cellular Disco: Resource management using virtual clusters on shared memory multiprocessors
Kinshuk Govil, Dan Teodosiu*, Yongqiang Huang, and Mendel Rosenblum Computer Systems Laboratory, Stanford University * Xift, Inc., Palo Alto, CA
www-flash.stanford.edu
Motivation
• Why buy a large shared-memory machine?
– Performance, flexibility, manageability, show-off
• These machines are not being used at their full potential
– Operating system scalability bottlenecks
– No fault containment support
– Lack of scalable resource management
• Operating systems are too large to adapt
Previous approaches
• Operating system: Hive, SGI IRIX 6.4, 6.5
+ Knowledge of application resource needs
– Huge implementation cost (a few million lines)
• Hardware: static and dynamic partitioning
+ Cluster-like (fault containment)
– Inefficient: coarse granularity, OS changes, problems with large apps
• Virtual machine monitor: Disco
+ Low implementation cost (13K lines of code)
– Cost of virtualization
Questions
• Can virtualization overhead be kept low?
– Usually within 10%
• Can fault containment overhead be kept low?
– In the noise
• Can a virtual machine monitor manage resources as well as an operating system?
– Yes
Overview of virtual machines
• Originated at IBM in the 1960s
• Trap and emulate privileged instructions
• Physical-to-machine address mapping
• No or minor OS modifications

[Slide diagram: two virtual machines, each running applications on its own OS, sit on top of the Virtual Machine Monitor, which runs directly on the Hardware]
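The physical-to-machine indirection can be sketched as follows. This is an illustrative toy, not Cellular Disco's actual code; all class and method names here are assumptions. The point is that each guest OS sees its own contiguous "physical" pages, while the monitor silently backs them with whatever machine pages are free:

```python
# Hypothetical sketch of a VMM's physical-to-machine page mapping.
class VirtualMachine:
    def __init__(self, name):
        self.name = name
        self.pmap = {}  # guest "physical" page -> real machine page

class Monitor:
    def __init__(self, num_machine_pages):
        self.free_pages = list(range(num_machine_pages))

    def map_page(self, vm, phys_page):
        """Lazily back a guest physical page with a free machine page."""
        if phys_page not in vm.pmap:
            vm.pmap[phys_page] = self.free_pages.pop()
        return vm.pmap[phys_page]

    def translate(self, vm, phys_addr, page_size=4096):
        """Rewrite a guest 'physical' address into a machine address."""
        page, offset = divmod(phys_addr, page_size)
        return self.map_page(vm, page) * page_size + offset

monitor = Monitor(num_machine_pages=1024)
vm_a, vm_b = VirtualMachine("A"), VirtualMachine("B")
# Both guests believe they own physical page 0, but each is backed
# by a different machine page, transparently to the guest OS.
addr_a = monitor.translate(vm_a, 0x123)
addr_b = monitor.translate(vm_b, 0x123)
```

This extra level of indirection is also what later makes VM migration and memory borrowing possible without guest OS support: only the pmap changes.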
Avoiding OS scalability bottlenecks
[Slide diagram: instead of a single operating system spanning all CPUs, Cellular Disco runs several VMs, each with its own OS and applications, on a 32-processor SGI Origin 2000 joined by the interconnect]
Experimental setup
• IRIX 6.2 on Cellular Disco vs. IRIX 6.4, each on a 32-processor Origin 2000
• Workloads
– Informix TPC-D (decision-support database)
– Kernel build (parallel compilation of IRIX 5.3)
– Raytrace (from the Stanford SPLASH suite)
– SpecWEB (Apache web server)
MP virtualization overheads
[Bar chart, 32-processor overheads of Cellular Disco relative to IRIX: Informix TPC-D +10%, Kernel build +20%, Raytrace +1%, SpecWEB +4%]
• Worst-case uniprocessor overhead is only 9%
Fault containment
[Slide diagram: Cellular Disco split into cells, each cell hosting VMs on its own CPUs, joined by the interconnect]
• Requires hardware support, as designed in the FLASH multiprocessor
Fault containment overhead @ 0%
[Bar chart, performance with 1 cell vs. 8 cells: Informix TPC-D +1%, Kernel build -2%, Raytrace +1%, SpecWEB +1%]
• 1000 fault-injection experiments (SimOS): 100% success
Resource management challenges
• Conflicting constraints
– Fault containment
– Resource load balancing
• Scalability
• Decentralized control
• Migrate VMs without OS support
CPU load balancing
[Slide diagram: VMs spread across two cells of Cellular Disco, each cell with four CPUs, joined by the interconnect]
Idle balancer (local view)
• Check neighboring run queues (intra-cell only)
• VCPU migration cost: 37µs to 1.5ms
– Loss of cache and node memory affinity: > 8ms
• Backoff when no work is found
• Fast, local decisions

[Slide diagram: VCPUs B0, B1, A0, B1, and A1 queued on CPUs 0 through 3]
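The idle balancer's scan can be sketched like this. It is a hedged illustration, not the real implementation: names and data structures are assumptions, and the real system also weighs the migration and affinity costs listed above. The essential behaviors shown are the intra-cell restriction and stealing only from a queue with more than one runnable VCPU:

```python
# Illustrative sketch of an idle CPU stealing work from a neighbor.
def idle_balance(cpu, run_queues, cell_of):
    """Scan neighbors' run queues; return the donor CPU, or None."""
    for neighbor in sorted(run_queues, key=lambda c: abs(c - cpu)):
        if neighbor == cpu or cell_of[neighbor] != cell_of[cpu]:
            continue  # intra-cell only: never cross a fault boundary
        if len(run_queues[neighbor]) > 1:
            # Neighbor has excess runnable VCPUs; migrate one here.
            vcpu = run_queues[neighbor].pop()
            run_queues[cpu].append(vcpu)
            return neighbor
    return None  # nothing worth stealing; caller should back off

# Loads from the slide: CPU 0 idle, CPU 1 oversubscribed.
run_queues = {0: [], 1: ["B0", "B1"], 2: ["A0"], 3: ["A1"]}
cell_of = {0: 0, 1: 0, 2: 1, 3: 1}
donor = idle_balance(0, run_queues, cell_of)
```

Because each decision looks only at nearby queues in the same cell, it stays fast and fully decentralized.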
Periodic balancer (global view)
• Check for load disparity in the load tree
• Migration cost considered:
– Affinity loss
– Fault dependencies

[Slide diagram: a load tree over CPUs 0 through 3, with per-CPU loads 0, 2, 1, and 1 summed up toward the root; B0 and B1 run on CPU 1, A0 and B1 on CPU 2, and A1 on CPU 3; a fault containment boundary separates the two halves of the tree]
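The load tree idea can be sketched as below. This is an assumption-laden toy, not Cellular Disco's code: each internal node caches the total load of the CPUs beneath it, so the periodic balancer can spot a disparity between siblings without scanning every run queue; the actual migration decision additionally weighs affinity loss and fault dependencies:

```python
# Illustrative load tree: leaves are per-CPU loads, parents are sums.
def build_tree(loads):
    """Bottom-up sums: level 0 is per-CPU load, the last level is the root."""
    tree = [list(loads)]
    while len(tree[-1]) > 1:
        prev = tree[-1]
        tree.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return tree

def most_imbalanced_pair(tree):
    """Find the sibling pair with the largest load disparity."""
    best = None
    for level in tree[:-1]:  # the root has no sibling
        for i in range(0, len(level), 2):
            gap = abs(level[i] - level[i + 1])
            if best is None or gap > best[0]:
                best = (gap, i)
    return best

tree = build_tree([0, 2, 1, 1])       # per-CPU loads from the slide
gap, index = most_imbalanced_pair(tree)
```

Here the tree sums to a total load of 4 at the root, and the largest disparity (gap of 2) is between the idle CPU 0 and the doubly loaded CPU 1, which is exactly where a migration would help.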
CPU management results
[Bar chart, execution time for one, four, and eight VMs, Ideal vs. Cellular Disco: overhead ranges from +0.3% (one VM) to +9% (eight VMs)]
• IRIX's own overhead (13%) is higher
Memory load balancing
[Slide diagram: VMs and RAM distributed across the cells of Cellular Disco, joined by the interconnect]
Memory load balancing policy
• Borrow memory before running out
• Allocation preferences for each VM
• Borrow based on:
– Combined allocation preferences of the VMs
– Memory availability on other cells
– Memory usage
• Loan when enough memory is available
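The borrowing decision can be sketched as follows. All names and the watermark value are assumptions for illustration, not the actual policy code; what the sketch captures is the slide's three criteria: the VMs' combined allocation preferences, how much memory other cells can spare, and borrowing early, before the cell actually runs out:

```python
# Illustrative sketch of a cell choosing which cell to borrow memory from.
LOW_WATERMARK = 64  # pages; a hypothetical "borrow before running out" threshold

def pick_lender(cell, cells, preferences):
    """Return the cell to borrow from, or None if borrowing isn't needed."""
    if cells[cell]["free"] > LOW_WATERMARK:
        return None  # not low on memory yet
    candidates = [c for c in cells
                  if c != cell and cells[c]["free"] > LOW_WATERMARK]
    if not candidates:
        return None  # no cell can spare memory right now
    # Rank by the resident VMs' combined preference for each cell,
    # breaking ties by how much memory the candidate can spare.
    return max(candidates,
               key=lambda c: (preferences.get(c, 0), cells[c]["free"]))

cells = {0: {"free": 10}, 1: {"free": 500}, 2: {"free": 300}}
prefs = {1: 0, 2: 3}  # the VMs on cell 0 prefer memory near cell 2
lender = pick_lender(0, cells, prefs)
```

Preferring the cell the VMs' allocation preferences point at keeps borrowed pages close to the VCPUs that will touch them, even when another cell has more free memory in absolute terms.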
Memory management results
[Slide diagram: a database workload on 32 CPUs and 3.5GB of RAM under Cellular Disco, shown split into eight cells of 4 CPUs each that borrow memory across the interconnect]
• Only +1% overhead
• Ideally: the same run time as with perfect memory balancing
Comparison to related work
• Operating system (IRIX 6.4)
• Hardware partitioning
– Simulated by disabling inter-cell resource balancing

[Slide diagram: a 16-process TPC-D database and Raytrace running on Cellular Disco partitioned into four cells of 8 CPUs each, joined by the interconnect]
Results of comparison
[Bar chart, run time in seconds: Raytrace 216 (operating system), 221 (virtual clusters), 434 (hardware partitioning); Database 231 (operating system), 229 (virtual clusters), 325 (hardware partitioning)]
• CPU utilization: 31% (hardware partitioning) vs. 58% (virtual clusters)
Conclusions
• The virtual machine approach adds flexibility to the system at a low development cost
• Virtual clusters address the needs of large shared-memory multiprocessors
– Avoid operating system scalability bottlenecks
– Support fault containment
– Provide scalable resource management
– Small overheads and low implementation cost