Transcript: Evaluating GPU Passthrough in Xen for High Performance Cloud Computing

Evaluating GPU Passthrough in Xen for High Performance Cloud Computing
Andrew J. Younge (1), John Paul Walters (2), Stephen P. Crago (2), and Geoffrey C. Fox (1)
1 Indiana University   2 USC / Information Sciences Institute
Where are we in the Cloud?
• Cloud computing spans many areas of expertise
• Today, focus only on IaaS and the underlying hardware
• Things we do here affect the entire pyramid!
Motivation
• Need for GPUs on Clouds
  – GPUs are becoming commonplace in scientific computing
  – Great performance-per-watt
• Different competing methods for virtualizing GPUs
  – Remote API for CUDA calls
  – Direct GPU usage within VM
• Advantages and disadvantages to both solutions
Front-end GPU API
• Translate all CUDA calls into remote method invocations (see the sketch below)
• Users share GPUs across a node or cluster
• Can run within a VM, as no hardware is needed, only a remote API
• Many implementations for CUDA
  – rCUDA, gVirtus, vCUDA, GViM, etc.
• Many desktop virtualization technologies do the same for OpenGL & DirectX
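To make the remote-API model concrete, here is a minimal sketch (not the actual rCUDA/gVirtus/vCUDA code) of how a front-end library can intercept a CUDA runtime call in the guest and forward it to a back-end daemon on the GPU node. The opcode, wire layout, and socket handling are all illustrative assumptions.

```c
/* Hedged sketch of front-end API interception: the guest links against a
 * stand-in libcudart whose cudaMalloc() forwards the request over a socket
 * to a daemon on the node that owns the physical GPU.  The opcode, wire
 * layout, and socket setup below are illustrative, not any real protocol. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

typedef int cudaError_t;                  /* stand-in for the CUDA type      */
enum { OP_CUDA_MALLOC = 1 };              /* hypothetical opcode             */

static int gpu_server_fd = -1;            /* connected to the GPU node by a  */
                                          /* library init routine (omitted)  */

cudaError_t cudaMalloc(void **devPtr, size_t size)
{
    /* 1. Serialize the call: opcode + requested allocation size. */
    uint8_t  req[12];
    uint32_t op  = OP_CUDA_MALLOC;
    uint64_t len = size;
    memcpy(req,     &op,  sizeof op);
    memcpy(req + 4, &len, sizeof len);

    /* 2. Ship it to the back-end daemon that owns the physical GPU. */
    write(gpu_server_fd, req, sizeof req);

    /* 3. Read back an opaque device-pointer handle and a status code. */
    uint64_t remote_ptr = 0;
    int32_t  status     = 0;
    read(gpu_server_fd, &remote_ptr, sizeof remote_ptr);
    read(gpu_server_fd, &status,     sizeof status);

    *devPtr = (void *)(uintptr_t)remote_ptr;   /* handle, not a real pointer */
    return (cudaError_t)status;
}
```

Every API call, including the data payload of cudaMemcpy, takes this path, which is what makes the approach network-bound; the limitations on the next slide follow directly from it.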
Front-end GPU API
[Figure slide: diagram of the front-end remote-API approach]
Front-end API Limitations
• Can use remote GPUs, but all data goes over the network
  – Can be very inefficient for applications with non-trivial memory movement
• Usually doesn't support the CUDA extensions to C
  – Have to separate CPU and GPU code
  – Requires a special decoupling mechanism (see the sketch below)
• Not a drop-in solution for existing applications
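For illustration (not taken from the slides), the "decoupling" usually means the kernel can no longer be launched with the CUDA-C <<<...>>> extension from inside the intercepted runtime; instead it is compiled separately (e.g., to PTX) and launched by name through the driver API, which a remote front end can serialize call by call. The kernel name, PTX file, and launch geometry below are hypothetical.

```c
/* Hedged sketch of the CPU/GPU "decoupling" a remote front-end API forces.
 *
 * (a) Plain CUDA C: kernel and host code in one file, launched with <<<...>>>.
 *     nvcc lowers the launch to internal runtime calls bound to the local
 *     binary, which an interception layer cannot easily forward.
 *
 *     __global__ void vecAdd(const float *a, const float *b, float *c, int n)
 *     {
 *         int i = blockIdx.x * blockDim.x + threadIdx.x;
 *         if (i < n) c[i] = a[i] + b[i];
 *     }
 *     ...
 *     vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
 *
 * (b) Decoupled form: the kernel is compiled separately to PTX and launched
 *     by name through the driver API.  File and kernel names are
 *     hypothetical; cuInit(), context creation, and error checks omitted. */
#include <cuda.h>

void launch_decoupled(CUdeviceptr dA, CUdeviceptr dB, CUdeviceptr dC, int n)
{
    CUmodule   mod;
    CUfunction fn;
    cuModuleLoad(&mod, "vecadd.ptx");         /* separately compiled kernel */
    cuModuleGetFunction(&fn, mod, "vecAdd");  /* looked up by name          */

    void *args[] = { &dA, &dB, &dC, &n };
    cuLaunchKernel(fn,
                   (n + 255) / 256, 1, 1,     /* grid dimensions  */
                   256, 1, 1,                 /* block dimensions */
                   0,                         /* shared memory    */
                   0,                         /* default stream   */
                   args, 0);
}
```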
Direct GPU Passthrough
• Allow VMs to directly access GPU hardware
• Enables CUDA and OpenCL code
• Utilizes PCI passthrough of the device to the guest VM (configuration sketch below)
  – Uses hardware-directed I/O virtualization (VT-d or IOMMU)
  – Provides direct isolation and security of the device
  – Removes host overhead entirely
• Similar to what Amazon EC2 uses
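As a rough illustration of the mechanism (configuration, not code from the talk): assigning a GPU to a Xen guest with the xl toolstack typically involves hiding the device from dom0, marking it assignable, and listing its PCI address in the guest's config. The BDF address 0000:0b:00.0 is a placeholder, and exact option names can vary across Xen versions.

```
# Hedged sketch of Xen PCI passthrough for a GPU.  The BDF 0000:0b:00.0 is a
# placeholder; find the real address with `lspci | grep -i nvidia`.

# 1. Keep dom0's driver off the GPU (dom0 Linux kernel command line):
#      xen-pciback.hide=(0000:0b:00.0)

# 2. Mark the device as assignable to guests:
xl pci-assignable-add 0000:0b:00.0

# 3. Hand the device to the guest in its xl config, then boot it:
#      (in the guest's .cfg file)
#      pci = [ '0000:0b:00.0,permissive=1' ]
xl create /etc/xen/gpu-guest.cfg

# 4. Inside the guest the GPU shows up as an ordinary PCI device; install the
#    vendor driver and CUDA toolkit and run GPU code unmodified.
```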
Direct GPU Passthrough
[Architecture diagram: Dom0 runs OpenStack Compute and the network bridge (BR0, ETH0, veth links to each guest); guest VMs Dom1 ... DomN each run a CUDA task; the VMM (hypervisor) maps GPU0-GPU2 directly into the guests over PCI Express using VT-d / IOMMU on the host CPU & DRAM.]
Hardware Setup

                   Sandy Bridge + Kepler     Westmere + Fermi
  CPU (cores)      2x E5-2670 (16)           2x X5660 (12)
  Clock Speed      2.6 GHz                   2.6 GHz
  RAM              48 GB                     192 GB
  NUMA Nodes       2                         2
  GPU              1x Nvidia Tesla K20m      2x Nvidia Tesla C2075

  Type             Linux Kernel   Linux Distro
  Native Host      2.6.32-279     CentOS 6.4
  Xen Dom0 4.2.2   3.4.53-8       CentOS 6.4
  DomU Guest VM    2.6.32-279     CentOS 6.4
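Not from the slides, but as a quick sanity check of this setup: a minimal CUDA device query run inside the DomU guest confirms the passed-through Tesla (K20m or C2075) is visible and reports its properties.

```c
// Hedged sketch: minimal CUDA device query one might run inside the DomU
// guest to confirm the passed-through GPU is visible and usable.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA devices visible in this VM\n");
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s, %d SMs, %.1f GB, compute %d.%d\n",
               d, prop.name, prop.multiProcessorCount,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               prop.major, prop.minor);
    }
    return 0;
}
```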
SHOC Benchmark Suite
• Developed by the Future Technologies Group @ Oak Ridge National Laboratory
• Provides 70 benchmarks
  – Synthetic micro-benchmarks (a measurement sketch follows below)
  – 3rd-party applications
  – OpenCL and CUDA implementations
• Represents a well-rounded view of GPU performance
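The bus-speed numbers shown later (host-to-device and device-to-host bandwidth) come from SHOC-style microbenchmarks. Below is a minimal hedged sketch of that measurement pattern in CUDA, using pinned host memory and CUDA events; it is not the SHOC source, and the transfer size and repetition count are arbitrary.

```c
// Hedged sketch of a SHOC-style host-to-device bandwidth microbenchmark:
// time repeated cudaMemcpy calls over a pinned host buffer with CUDA events.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 64ull << 20;          // 64 MB transfer (arbitrary)
    const int    reps  = 10;

    float *host = 0, *dev = 0;
    cudaMallocHost((void **)&host, bytes);     // pinned (page-locked) memory
    cudaMalloc((void **)&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  // warm-up copy

    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)bytes * reps / (ms / 1e3) / 1e9;
    printf("Host-to-device bandwidth: %.2f GB/s\n", gbps);

    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```

With pageable (malloc'd) buffers the driver stages copies through an internal pinned buffer, which is one reason bandwidth benchmarks typically report pinned transfers, as the "Pinned" plots later in the deck do.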
Initial Thoughts
• Raw GPU computational abilities impacted less than 1% in VMs compared to the base system
  – Excellent sign for supporting GPUs in the Cloud
• However, overhead occurs during large transfers between CPU & GPU
  – Much higher overhead for the Westmere/Fermi test architecture
  – Around 15% overhead in the worst-case benchmark
  – Sandy Bridge/Kepler overhead lower
Discussion
• GPU passthrough is possible in Xen!
  – Results show high-performance GPU computation is a reality with Xen
• Overhead is minimal for GPU computation
  – Sandy Bridge/Kepler has < 1.2% overall overhead
  – Westmere/Fermi has < 1% computational overhead, 7-25% PCIe overhead
• PCIe overhead not likely due to VT-d mechanisms
  – More likely the NUMA configuration in the Westmere CPU architecture
• GPU PCI passthrough performs better than front-end remote API solutions
Future Work
• Support PCI passthrough in a cloud IaaS framework
  – OpenStack Nova
  – Work for both GPUs and other PCI devices
  – Show performance better than EC2
• Resolve NUMA issues with the Westmere architecture and Fermi GPUs
• Evaluate GPU possibilities of other hypervisors
• Support large-scale distributed CPU+GPU computation in the Cloud
Conclusion
• GPUs are here to stay in scientific computing
  – Many petascale systems use GPUs
  – A GPU-based exascale machine is expected (~2020)
• Providing HPC in the Cloud is key to the viability of scientific cloud computing
• OpenStack provides an ideal architecture to enable HPC in clouds
Thanks!
Acknowledgements:
• NSF FutureGrid project
  – GPU cluster hardware
  – FutureGrid team @ IU
• USC/ISI APEX research group
• Persistent Systems Graduate Fellowship
• Xen open source community

About Me:
Andrew J. Younge
Ph.D. Candidate, Indiana University, Bloomington, IN USA
Email – [email protected]
Website – http://ajyounge.com
http://portal.futuregrid.org
EXTRA SLIDES
FutureGrid: a Distributed Testbed
[Map of the FutureGrid distributed testbed; legend: private vs. public FG network links; NID = Network Impairment Device]
OpenStack GPU Cloud Prototype
Bandwidth - Host to Device
[Plot: host-to-device bandwidth vs. data size (bytes, 1024 to 4194304) for K20m VM and K20m Native; annotated difference of ~1.25%.]
Bandwidth - Device to Host
[Plot: device-to-host bandwidth vs. data size (bytes, 1024 to 4194304) for K20m Native and K20m VM; annotated differences of ~3.62% and ~0.64%.]
Overhead in Bandwidth
[Two plots: "Xen vs. Base, Host to Device Bandwidth, Pinned" and "Xen vs. Base, Device to Host Bandwidth, Pinned", comparing Base and Xen VM bandwidth across data sizes (KB).]