vTurbo: Accelerating Virtual Machine I/O
Processing Using Designated Turbo-Sliced Core
Cong Xu, Sahan Gamage, Hui Lu, Ramana Kompella, Dongyan Xu
2013 USENIX Annual Technical Conference
Embedded Lab.
Kim Sewoog
Motivation
 Pay-as-you-go: server consolidation
 Saves the cost of running applications and operational expenditure
 Multiple VMs share the same physical core
 Each VM therefore incurs CPU access latency
< Figure: VM1–VM4 time-share one core on the hypervisor (VMM), resulting in low I/O throughput >
 Two basic stages (sketched in code below)
 Stage 1: device interrupts are processed synchronously in the kernel, landing data in a kernel buffer
 Stage 2: the application asynchronously copies the data from the kernel buffer
< Figure: I/O Processing Workflow — IRQ processing fills the kernel buffer; the application copies from it later >
< Figure: Effect of CPU Sharing on I/O Processing — on the CPU timeline, while VM2/VM3 hold the core, VM1's IRQ processing is delayed >
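The two stages above can be made concrete with a small sketch. The following C program is illustrative only, not code from the paper; all names and sizes are invented. It models stage 1 as an "IRQ" thread depositing packets into a kernel ring buffer, and stage 2 as an application thread copying them out later:

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BUF_SLOTS 32   /* large enough that this toy never fills */
#define N_PACKETS 16

static char kernel_buf[BUF_SLOTS][64];
static int head, tail;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* Stage 1: "IRQ" context synchronously deposits device data into the
 * kernel buffer as each interrupt arrives. */
static void *irq_thread(void *arg) {
    (void)arg;
    for (int i = 0; i < N_PACKETS; i++) {
        pthread_mutex_lock(&lock);
        snprintf(kernel_buf[head % BUF_SLOTS], 64, "packet %d", i);
        head++;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
        usleep(1000);   /* ~1 ms between interrupts */
    }
    return NULL;
}

/* Stage 2: the application asynchronously copies data out of the
 * kernel buffer whenever it gets to run. */
static void *app_thread(void *arg) {
    (void)arg;
    for (int i = 0; i < N_PACKETS; i++) {
        char user_buf[64];
        pthread_mutex_lock(&lock);
        while (head == tail)
            pthread_cond_wait(&nonempty, &lock);
        memcpy(user_buf, kernel_buf[tail % BUF_SLOTS], 64);
        tail++;
        pthread_mutex_unlock(&lock);
        printf("app copied: %s\n", user_buf);
    }
    return NULL;
}

int main(void) {
    pthread_t irq, app;
    pthread_create(&irq, NULL, irq_thread, NULL);
    pthread_create(&app, NULL, app_thread, NULL);
    pthread_join(irq, NULL);
    pthread_join(app, NULL);
    return 0;
}

The point this sets up: if the IRQ thread cannot run because its vCPU is not scheduled, everything downstream of stage 1 stalls.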
Effect of CPU Sharing on TCP Receive
< Figure: a TCP client sends DATA into the hypervisor's shared buffer; because the VMs are scheduled in turn, VM1's IRQ processing is delayed while VM2/VM3 hold the CPU, so its ACKs are sent late >
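To see how much the delayed ACKs cost, note that TCP throughput is roughly bounded by window/RTT, so any IRQ-processing delay that inflates the effective RTT cuts throughput proportionally. A back-of-the-envelope sketch in C (the window size, RTT, and slice values are illustrative assumptions, not measurements from the paper):

#include <stdio.h>

/* Illustrative only: TCP throughput ~ window / RTT, so IRQ processing
 * delay that inflates the effective RTT cuts throughput. */
int main(void) {
    double window_bytes = 64 * 1024;   /* assumed 64 KB receive window */
    double base_rtt     = 0.5e-3;      /* assumed 0.5 ms LAN RTT */
    double sched_delay  = 60e-3;       /* 2 other VMs x 30 ms time-slice */

    double ideal  = window_bytes / base_rtt;
    double shared = window_bytes / (base_rtt + sched_delay);
    printf("ideal : %.1f MB/s\n", ideal / 1e6);
    printf("shared: %.1f MB/s\n", shared / 1e6);
    return 0;
}

With a 30 ms slice and two other VMs ahead in the run queue, the effective RTT in this toy model grows from 0.5 ms to 60.5 ms, a roughly 120x throughput drop.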
Effect of CPU Sharing on UDP Receive
< Figure: a UDP client keeps sending DATA; while VM1 is not scheduled, the hypervisor's shared buffer fills up ("Full") and packets are dropped before they ever reach the VM's application buffer >
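The UDP case can be estimated the same way: while the VM is off the CPU, packets keep arriving at line rate and overflow the shared buffer. A rough C sketch (the line rate, descheduled period, and buffer size are assumptions for illustration, not values from the paper):

#include <stdio.h>

/* Illustrative only: while a VM is descheduled, arriving UDP packets
 * accumulate in the hypervisor's shared buffer and overflow it. */
int main(void) {
    double line_rate = 1e9 / 8;   /* 1 Gbps in bytes/s */
    double desched   = 60e-3;     /* 2 other VMs x 30 ms time-slice */
    double buf       = 512 * 1024; /* hypothetical 512 KB shared buffer */

    double arrived = line_rate * desched;          /* bytes arriving while off-CPU */
    double dropped = arrived > buf ? arrived - buf : 0;
    printf("arrived: %.1f KB, buffered: %.0f KB, dropped: %.1f KB (%.0f%%)\n",
           arrived / 1024, buf / 1024, dropped / 1024, 100 * dropped / arrived);
    return 0;
}

At 1 Gbps, 60 ms off-CPU means about 7.3 MB of arrivals against a 512 KB buffer, so the vast majority of packets are dropped in this toy model.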
Effect of CPU Sharing on Disk Write
< Figure: the application writes DATA into kernel memory (the page cache); flushing to the disk drive stalls because write-completion IRQ processing is delayed until the VM is scheduled again >
Intuitive Solution
 Reduce the time-slice of each VM
 But this causes significant context-switch overhead (see the estimate below)
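The trade-off can be quantified with a quick estimate (the per-switch cost and slice lengths below are assumptions for illustration; real costs also include cache and TLB pollution, which this ignores):

#include <stdio.h>

/* Illustrative only: shrinking every VM's time-slice bounds IRQ delay
 * but pays a context switch at every slice boundary. Ignores cache and
 * TLB pollution, which make short slices even more expensive. */
int main(void) {
    double switch_cost = 3e-6;                  /* assumed ~3 us per switch */
    double slices[]    = { 30e-3, 1e-3, 0.1e-3 }; /* candidate time-slices */
    int other_vms      = 2;                     /* VMs ahead in the run queue */

    for (int i = 0; i < 3; i++) {
        double max_irq_delay = other_vms * slices[i];
        double overhead      = switch_cost / slices[i];
        printf("slice %6.1f ms: max IRQ delay %6.1f ms, switch overhead %5.2f%%\n",
               slices[i] * 1e3, max_irq_delay * 1e3, overhead * 100);
    }
    return 0;
}

Shrinking the slice from 30 ms to 0.1 ms bounds the IRQ delay at 0.2 ms but raises the direct switching overhead from about 0.01% to about 3% before cache effects, which is why vTurbo confines micro-slicing to a dedicated turbo core instead of applying it to every VM.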
Our Solution: vTurbo
 IRQ processing is offloaded to a dedicated turbo core
 Turbo core: any physical core scheduled with micro time-slices (e.g., 0.1 ms)
 Expose the turbo core as a special vCPU to the VM
 The turbo vCPU runs on a turbo core
 Regular vCPUs run on regular cores
 Pin the IRQ context of the guest OS to the turbo vCPU (see the sketch below)
 Benefits
 Improved I/O throughput (TCP/UDP, disk)
 Self-adaptive system
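The slides do not show how the guest pins IRQ processing to the turbo vCPU; one standard Linux mechanism that could serve is the per-IRQ affinity mask. A minimal sketch, assuming the turbo vCPU is exposed to the guest as CPU 1 and using a hypothetical NIC IRQ number 24 (run as root inside the guest):

#include <stdio.h>

/* Minimal sketch: steer a device IRQ to one vCPU via the standard Linux
 * /proc/irq/<n>/smp_affinity interface. IRQ 24 and CPU 1 (the assumed
 * turbo vCPU) are hypothetical values for illustration. */
int main(void) {
    const int irq = 24;                 /* hypothetical NIC IRQ number */
    const unsigned cpu_mask = 1u << 1;  /* CPU 1 = assumed turbo vCPU */
    char path[64];

    snprintf(path, sizeof path, "/proc/irq/%d/smp_affinity", irq);
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return 1; }
    fprintf(f, "%x\n", cpu_mask);       /* affinity mask is written in hex */
    fclose(f);
    printf("pinned IRQ %d to CPU mask 0x%x\n", irq, cpu_mask);
    return 0;
}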
vTurbo Design
< Figure: vTurbo design — on the timeline, each VM's application runs on a regular core with normal time-slices, while each VM's IRQ processing runs on the micro-sliced turbo core, which promptly moves incoming data into the VM's buffer >
vTurbo’s Impact on Disk Write
< Figure: with vTurbo, the application on the regular core writes DATA into kernel memory, and the micro-sliced turbo core processes disk-write completions promptly, so flushing to the disk drive no longer waits for the VM's regular time-slice >
vTurbo’s Impact on UDP Receive
< Figure: with vTurbo, the turbo core promptly moves DATA from the hypervisor's shared buffer into the VM's kernel buffer, so the shared buffer no longer overflows; the application copies from the kernel buffer into its application buffer when its regular vCPU is scheduled >
vTurbo’s Impact on TCP Receive
< Figure: with vTurbo, IRQ processing and ACK generation happen promptly on the turbo core; incoming DATA is placed on the backlog queue while the receive queue is locked by the application, which later copies it into the application buffer >
VM Scheduling Policy for Fairness
 Turbo cores are not free
 Maintain CPU fair-share among VMs
 Calculate credits over both the regular and turbo cores (formulas reconstructed below)
 Guarantee each VM's CPU allocation on the turbo cores
 Deduct I/O-intensive VMs' credits on the regular cores
 Allocate the deduction to the non-I/O-intensive VMs
< Equations (shown as images in the original slide): total capacity across the regular and turbo cores; each VM's turbo-core fair share; total capacity; actual usage of the turbo core; each VM's fair share of CPU >
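The equation images did not survive extraction; based on the captions and the bullets above, the formulas can plausibly be reconstructed as follows, under assumed notation (M VMs; N_r regular and N_t turbo cores, each of per-core capacity C; u_i the measured turbo-core usage of VM i). This is a sketch consistent with the slide, not the paper's exact formulas:

\[
C_{\text{total}} = (N_r + N_t)\,C, \qquad
f = \frac{C_{\text{total}}}{M}, \qquad
f_t = \frac{N_t\,C}{M}
\]
\[
d_i = \max(u_i - f_t,\ 0), \qquad
\text{credit}^{\text{reg}}_i = f - f_t - d_i
\]

with the total deduction \(\sum_i d_i\) redistributed as extra regular-core credits to the non-I/O-intensive VMs, so every VM's overall CPU allocation stays at its fair share \(f\).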
Evaluation
 VM hosts
 3.2 GHz Intel Xeon quad-core CPU, 16 GB RAM
 An independent core assigned to the driver domain (dom0)
 Xen 4.1.2
 Linux 3.2
 1 core chosen as the turbo core
 Gigabit Ethernet switch (10 Gbps for 2 experiments)
File Read/Write Throughput: Micro-Benchmark
< Figure: file read/write throughput, regular core vs. turbo core >
TCP/UDP Throughput: Micro-Benchmark
NFS/SCP Throughput: Application Benchmark
Apache Olio : Application Benchmark
 Three components:
 A web server to process user requests
 A MySQL database server to store user profiles and event information
 An NFS server to store images and documents specific to events
Conclusions
 Problem: CPU sharing degrades I/O throughput
 Solution: vTurbo
 Offload IRQ processing to a dedicated, turbo-sliced core
 Results:
 UDP throughput improved by up to 4x
 TCP throughput improved by up to 3x
 Disk write throughput improved by up to 2x
 NFS throughput improved by up to 3x
 Olio throughput improved by up to 38.7%
THANK YOU !