Transcript: Cellular Disco: Resource Management using Virtual Clusters
Cellular Disco: Resource management using virtual clusters on shared memory multiprocessors
Kinshuk Govil, Dan Teodosiu*, Yongqiang Huang, and Mendel Rosenblum Computer Systems Laboratory, Stanford University * Xift, Inc., Palo Alto, CA
www-flash.stanford.edu
Motivation
• Why buy a large shared-memory machine?
– Performance, flexibility, manageability, show-off
• These machines are not being used at their full potential
– Operating system scalability bottlenecks
– No fault containment support
– Lack of scalable resource management
• Operating systems are too large to adapt
Previous approaches
• Operating system: Hive, SGI IRIX 6.4, 6.5
+ Knowledge of application resource needs
– Huge implementation cost (a few million lines)
• Hardware: static and dynamic partitioning
+ Cluster-like (fault containment)
– Inefficient: coarse granularity, OS changes, problems with large apps
• Virtual machine monitor: Disco
+ Low implementation cost (13K lines of code)
– Cost of virtualization
Questions
• Can virtualization overhead be kept low?
– Usually within 10%
• Can fault containment overhead be kept low?
– In the noise
• Can a virtual machine monitor manage resources as well as an operating system?
– Yes
Overview of virtual machines
• Originated at IBM in the 1960s
• Trap and emulate privileged instructions
• Physical-to-machine address mapping
• No or minor OS modifications

[Slide diagram: two virtual machines, each running applications on its own OS, sit on top of the Virtual Machine Monitor, which runs directly on the Hardware]
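The physical-to-machine indirection can be sketched as follows. This is an illustrative toy, not Cellular Disco's actual code; all class and method names here are assumptions. The point is that each guest OS sees its own contiguous "physical" pages, while the monitor silently backs them with whatever machine pages are free:

```python
# Hypothetical sketch of a VMM's physical-to-machine page mapping.
class VirtualMachine:
    def __init__(self, name):
        self.name = name
        self.pmap = {}  # guest "physical" page -> real machine page

class Monitor:
    def __init__(self, num_machine_pages):
        self.free_pages = list(range(num_machine_pages))

    def map_page(self, vm, phys_page):
        """Lazily back a guest physical page with a free machine page."""
        if phys_page not in vm.pmap:
            vm.pmap[phys_page] = self.free_pages.pop()
        return vm.pmap[phys_page]

    def translate(self, vm, phys_addr, page_size=4096):
        """Rewrite a guest 'physical' address into a machine address."""
        page, offset = divmod(phys_addr, page_size)
        return self.map_page(vm, page) * page_size + offset

monitor = Monitor(num_machine_pages=1024)
vm_a, vm_b = VirtualMachine("A"), VirtualMachine("B")
# Both guests believe they own physical page 0, but each is backed
# by a different machine page, transparently to the guest OS.
addr_a = monitor.translate(vm_a, 0x123)
addr_b = monitor.translate(vm_b, 0x123)
```

This extra level of indirection is also what later makes VM migration and memory borrowing possible without guest OS support: only the pmap changes.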
Avoiding OS scalability bottlenecks
[Slide diagram: instead of a single operating system spanning all CPUs, Cellular Disco runs several VMs, each with its own OS and applications, on a 32-processor SGI Origin 2000 joined by the interconnect]
Experimental setup
• IRIX 6.2 on Cellular Disco vs. IRIX 6.4, each on a 32-processor Origin 2000
• Workloads
– Informix TPC-D (decision-support database)
– Kernel build (parallel compilation of IRIX 5.3)
– Raytrace (from the Stanford SPLASH suite)
– SpecWEB (Apache web server)
MP virtualization overheads
[Bar chart, 32-processor overheads of Cellular Disco relative to IRIX: Informix TPC-D +10%, Kernel build +20%, Raytrace +1%, SpecWEB +4%]
• Worst-case uniprocessor overhead is only 9%
Fault containment
[Slide diagram: Cellular Disco split into cells, each cell hosting VMs on its own CPUs, joined by the interconnect]
• Requires hardware support, as designed in the FLASH multiprocessor
Fault containment overhead @ 0%
[Bar chart, performance with 1 cell vs. 8 cells: Informix TPC-D +1%, Kernel build -2%, Raytrace +1%, SpecWEB +1%]
• 1000 fault-injection experiments (SimOS): 100% success
Resource management challenges
• Conflicting constraints
– Fault containment
– Resource load balancing
• Scalability
• Decentralized control
• Migrate VMs without OS support
CPU load balancing
[Slide diagram: VMs spread across two cells of Cellular Disco, each cell with four CPUs, joined by the interconnect]
Idle balancer (local view)
• Check neighboring run queues (intra-cell only)
• VCPU migration cost: 37µs to 1.5ms
– Loss of cache and node memory affinity: > 8ms
• Backoff when no work is found
• Fast, local decisions

[Slide diagram: VCPUs B0, B1, A0, B1, and A1 queued on CPUs 0 through 3]
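The idle balancer's scan can be sketched like this. It is a hedged illustration, not the real implementation: names and data structures are assumptions, and the real system also weighs the migration and affinity costs listed above. The essential behaviors shown are the intra-cell restriction and stealing only from a queue with more than one runnable VCPU:

```python
# Illustrative sketch of an idle CPU stealing work from a neighbor.
def idle_balance(cpu, run_queues, cell_of):
    """Scan neighbors' run queues; return the donor CPU, or None."""
    for neighbor in sorted(run_queues, key=lambda c: abs(c - cpu)):
        if neighbor == cpu or cell_of[neighbor] != cell_of[cpu]:
            continue  # intra-cell only: never cross a fault boundary
        if len(run_queues[neighbor]) > 1:
            # Neighbor has excess runnable VCPUs; migrate one here.
            vcpu = run_queues[neighbor].pop()
            run_queues[cpu].append(vcpu)
            return neighbor
    return None  # nothing worth stealing; caller should back off

# Loads from the slide: CPU 0 idle, CPU 1 oversubscribed.
run_queues = {0: [], 1: ["B0", "B1"], 2: ["A0"], 3: ["A1"]}
cell_of = {0: 0, 1: 0, 2: 1, 3: 1}
donor = idle_balance(0, run_queues, cell_of)
```

Because each decision looks only at nearby queues in the same cell, it stays fast and fully decentralized.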
Periodic balancer (global view)
• Check for load disparity in the load tree
• Migration cost considered:
– Affinity loss
– Fault dependencies

[Slide diagram: a load tree over CPUs 0 through 3, with per-CPU loads 0, 2, 1, and 1 summed up toward the root; B0 and B1 run on CPU 1, A0 and B1 on CPU 2, and A1 on CPU 3; a fault containment boundary separates the two halves of the tree]
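The load tree idea can be sketched as below. This is an assumption-laden toy, not Cellular Disco's code: each internal node caches the total load of the CPUs beneath it, so the periodic balancer can spot a disparity between siblings without scanning every run queue; the actual migration decision additionally weighs affinity loss and fault dependencies:

```python
# Illustrative load tree: leaves are per-CPU loads, parents are sums.
def build_tree(loads):
    """Bottom-up sums: level 0 is per-CPU load, the last level is the root."""
    tree = [list(loads)]
    while len(tree[-1]) > 1:
        prev = tree[-1]
        tree.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return tree

def most_imbalanced_pair(tree):
    """Find the sibling pair with the largest load disparity."""
    best = None
    for level in tree[:-1]:  # the root has no sibling
        for i in range(0, len(level), 2):
            gap = abs(level[i] - level[i + 1])
            if best is None or gap > best[0]:
                best = (gap, i)
    return best

tree = build_tree([0, 2, 1, 1])       # per-CPU loads from the slide
gap, index = most_imbalanced_pair(tree)
```

Here the tree sums to a total load of 4 at the root, and the largest disparity (gap of 2) is between the idle CPU 0 and the doubly loaded CPU 1, which is exactly where a migration would help.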
CPU management results
[Bar chart, execution time for one, four, and eight VMs, Ideal vs. Cellular Disco: overhead ranges from +0.3% (one VM) to +9% (eight VMs)]
• IRIX's own overhead (13%) is higher
Memory load balancing
[Slide diagram: VMs and RAM distributed across the cells of Cellular Disco, joined by the interconnect]
Memory load balancing policy
• Borrow memory before running out
• Allocation preferences for each VM
• Borrow based on:
– Combined allocation preferences of the VMs
– Memory availability on other cells
– Memory usage
• Loan when enough memory is available
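The borrowing decision can be sketched as follows. All names and the watermark value are assumptions for illustration, not the actual policy code; what the sketch captures is the slide's three criteria: the VMs' combined allocation preferences, how much memory other cells can spare, and borrowing early, before the cell actually runs out:

```python
# Illustrative sketch of a cell choosing which cell to borrow memory from.
LOW_WATERMARK = 64  # pages; a hypothetical "borrow before running out" threshold

def pick_lender(cell, cells, preferences):
    """Return the cell to borrow from, or None if borrowing isn't needed."""
    if cells[cell]["free"] > LOW_WATERMARK:
        return None  # not low on memory yet
    candidates = [c for c in cells
                  if c != cell and cells[c]["free"] > LOW_WATERMARK]
    if not candidates:
        return None  # no cell can spare memory right now
    # Rank by the resident VMs' combined preference for each cell,
    # breaking ties by how much memory the candidate can spare.
    return max(candidates,
               key=lambda c: (preferences.get(c, 0), cells[c]["free"]))

cells = {0: {"free": 10}, 1: {"free": 500}, 2: {"free": 300}}
prefs = {1: 0, 2: 3}  # the VMs on cell 0 prefer memory near cell 2
lender = pick_lender(0, cells, prefs)
```

Preferring the cell the VMs' allocation preferences point at keeps borrowed pages close to the VCPUs that will touch them, even when another cell has more free memory in absolute terms.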
Memory management results
[Slide diagram: a database workload on 32 CPUs and 3.5GB of RAM under Cellular Disco, shown split into eight cells of 4 CPUs each that borrow memory across the interconnect]
• Only +1% overhead
• Ideally: the same run time as with perfect memory balancing
Comparison to related work
• Operating system (IRIX 6.4)
• Hardware partitioning
– Simulated by disabling inter-cell resource balancing

[Slide diagram: a 16-process TPC-D database and Raytrace running on Cellular Disco partitioned into four cells of 8 CPUs each, joined by the interconnect]
Results of comparison
[Bar chart, run time in seconds: Raytrace 216 (operating system), 221 (virtual clusters), 434 (hardware partitioning); Database 231 (operating system), 229 (virtual clusters), 325 (hardware partitioning)]
• CPU utilization: 31% (hardware partitioning) vs. 58% (virtual clusters)
Conclusions
• The virtual machine approach adds flexibility to the system at a low development cost
• Virtual clusters address the needs of large shared-memory multiprocessors
– Avoid operating system scalability bottlenecks
– Support fault containment
– Provide scalable resource management
– Small overheads and low implementation cost