Evaluating GPU Passthrough in Xen for High Performance Cloud Computing


Andrew J. Younge (1), John Paul Walters (2), Stephen P. Crago (2), and Geoffrey C. Fox (1)

(1) Indiana University
(2) USC / Information Sciences Institute

Where are we in the Cloud?

• Cloud computing spans many areas of expertise
• Today, focus only on IaaS and the underlying hardware
• Things we do here affect the entire pyramid!


Motivation

• Need for GPUs on Clouds
  – GPUs are becoming commonplace in scientific computing
  – Great performance-per-watt
• Different competing methods for virtualizing GPUs
  – Remote API for CUDA calls
  – Direct GPU usage within VM
• Advantages and disadvantages to both solutions

Front-end GPU API

• Translate all CUDA calls into remote method invocations (a minimal interposition sketch follows below)
• Users share GPUs across a node or cluster
• Can run within a VM, as no hardware is needed, only a remote API
• Many implementations for CUDA
  – rCUDA, gVirtus, vCUDA, GViM, etc.
• Many desktop virtualization technologies do the same for OpenGL & DirectX
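To make the forwarding idea concrete, here is a minimal, hypothetical interposition sketch in C host code: a shim exposes the same shape as a CUDA runtime entry point, serializes the call, and hands it to a transport stub. All names (GpuRequest, forward_to_gpu_server, shim_cudaMalloc) are illustrative only and do not correspond to the internals of rCUDA, gVirtus, vCUDA, or GViM.

    /* Hypothetical front-end forwarding shim: the guest never touches a real GPU,
     * it only holds handles minted by a daemon on the GPU-owning host. */
    #include <stddef.h>
    #include <stdio.h>

    typedef struct {
        int    opcode;   /* which runtime call is being forwarded */
        size_t size;     /* payload, e.g. requested allocation size */
    } GpuRequest;

    /* Stand-in for the network transport: a real shim would write the request to a
     * socket and read back the remote status plus a device-pointer handle. */
    static int forward_to_gpu_server(const GpuRequest *req, void **remote_handle)
    {
        printf("forwarding opcode %d (%zu bytes) to the remote GPU daemon\n",
               req->opcode, req->size);
        *remote_handle = NULL;   /* opaque handle returned by the server */
        return 0;                /* remote call reported success */
    }

    /* Same shape as cudaMalloc(): intercept locally, execute remotely. */
    int shim_cudaMalloc(void **devPtr, size_t size)
    {
        GpuRequest req = { 1 /* opcode for cudaMalloc */, size };
        return forward_to_gpu_server(&req, devPtr);
    }

    int main(void)
    {
        void *d = NULL;
        return shim_cudaMalloc(&d, (size_t)1 << 20);   /* "allocate" 1 MB remotely */
    }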


Front-end GPU API


Front-end API Limitations

• Can use remote GPUs, but all data goes over the network
  – Can be very inefficient for applications with non-trivial memory movement
• Usually doesn't support CUDA extensions in C (see the kernel-launch example below)
  – Have to separate CPU and GPU code
  – Requires a special decoupling mechanism
• Not a drop-in solution for existing CUDA applications
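A generic CUDA-C snippet (not taken from the talk's benchmarks) illustrating the decoupling problem: the __global__ qualifier and the <<<...>>> launch syntax are compiler extensions rather than library calls, so a remote-API front end must first split device code from host code before anything can be forwarded.

    #include <cuda_runtime.h>
    #include <stdio.h>

    /* Device code lives in the same source file as the host code... */
    __global__ void scale(float *x, float a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= a;
    }

    int main(void)
    {
        const int n = 1 << 20;
        float *d_x = NULL;
        cudaMalloc(&d_x, n * sizeof(float));    /* plain API call: forwardable */
        cudaMemset(d_x, 0, n * sizeof(float));

        /* ...but the launch below is a language extension, not an API call,
         * so it cannot simply be intercepted like a library function. */
        scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);

        printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));
        cudaFree(d_x);
        return 0;
    }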


Direct GPU Passthrough

• Allow VMs to directly access GPU hardware
• Enables CUDA and OpenCL code (a device-query check is sketched below)
• Utilizes PCI passthrough of the device to the guest VM
  – Uses hardware-directed I/O virtualization (VT-d or IOMMU)
  – Provides direct isolation and security of the device
  – Removes host overhead entirely
• Similar to what Amazon EC2 uses
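A minimal sanity check, assumed to run inside the DomU guest, that the passed-through GPU is visible to the native CUDA stack. It is a generic device query rather than part of the evaluation; the printed PCI IDs should correspond to the device assigned through VT-d/IOMMU.

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            fprintf(stderr, "no CUDA device visible in this VM\n");
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            /* The PCI bus/device IDs should match the BDF handed to the guest. */
            printf("GPU %d: %s, PCI %02x:%02x, %.1f GB\n",
                   i, prop.name, prop.pciBusID, prop.pciDeviceID,
                   prop.totalGlobalMem / 1073741824.0);
        }
        return 0;
    }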

Direct GPU Passthrough

(figure: Dom0 runs OpenStack Compute; guest VMs Dom1 ... DomN each run a CUDA task on a GPU passed through over PCI Express via VT-d/IOMMU; guest networking goes through veth interfaces bridged on BR0 to ETH0; the VMM (hypervisor) manages CPU & DRAM and GPU0-GPU2)

Hardware Setup

                      Sandy Bridge + Kepler     Westmere + Fermi
  CPU (cores)         2x E5-2670 (16)           2x X5660 (12)
  Clock Speed         2.6 GHz                   2.6 GHz
  RAM                 48 GB                     192 GB
  NUMA Nodes          2                         2
  GPU                 1x Nvidia Tesla K20m      2x Nvidia Tesla C2075

  Type                Linux Kernel              Linux Distro
  Native Host         2.6.32-279                CentOS 6.4
  Xen Dom0 (4.2.22)   3.4.53-8                  CentOS 6.4
  DomU Guest VM       2.6.32-279                CentOS 6.4

SHOC Benchmark Suite

• Developed by the Future Technologies Group @ Oak Ridge National Laboratory
• Provides 70 benchmarks
  – Synthetic micro-benchmarks
  – 3rd-party applications
  – OpenCL and CUDA implementations
• Represents a well-rounded view of GPU performance


Initial Thoughts

• Raw GPU computational abilities impacted less than 1% in VMs compared to the base system
  – Excellent sign for supporting GPUs in the Cloud
• However, overhead occurs during large transfers between the CPU & GPU (a transfer-timing sketch follows below)
  – Much higher overhead for the Westmere/Fermi test architecture
  – Around 15% overhead in the worst-case benchmark
  – Sandy Bridge/Kepler overhead is lower
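For reference, the kind of transfer measurement behind these observations can be sketched as a pinned-memory host-to-device copy timed with CUDA events, in the spirit of SHOC's bus-speed tests; this is not the actual SHOC code, and the 64 MB size is only an example.

    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        const size_t bytes = (size_t)64 << 20;     /* 64 MB transfer */
        void *h = NULL, *d = NULL;
        cudaMallocHost(&h, bytes);                 /* pinned (page-locked) host buffer */
        cudaMalloc(&d, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   /* warm-up copy */

        cudaEventRecord(start);
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("host->device: %.1f MB/s\n", (bytes / 1048576.0) / (ms / 1000.0));

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFreeHost(h);
        cudaFree(d);
        return 0;
    }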


Discussion

• GPU passthrough is possible in Xen!
  – Results show high performance GPU computation is a reality with Xen
• Overhead is minimal for GPU computation
  – Sandy Bridge/Kepler has < 1.2% overall overhead
  – Westmere/Fermi has < 1% computational overhead, 7-25% PCIe overhead
• PCIe overhead is not likely due to VT-d mechanisms
  – NUMA configuration in the Westmere CPU architecture (a NUMA-placement check is sketched below)
• GPU PCI passthrough performs better than front-end remote API solutions
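A small sketch, assuming a Linux host or guest, of how the suspected NUMA placement could be checked: the kernel exposes each PCI device's NUMA node in sysfs, and the benchmark process can then be bound to that node (e.g. with numactl) before timing transfers. The PCI address below is a placeholder, not the address used in these experiments.

    #include <stdio.h>

    int main(void)
    {
        /* Placeholder BDF; substitute the GPU's address reported by lspci. */
        const char *path = "/sys/bus/pci/devices/0000:42:00.0/numa_node";
        FILE *f = fopen(path, "r");
        if (!f) { perror(path); return 1; }
        int node = -1;
        if (fscanf(f, "%d", &node) != 1) node = -1;   /* -1: no recorded affinity */
        fclose(f);
        printf("GPU is attached to NUMA node %d\n", node);
        return 0;
    }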


Future Work

• Support PCI passthrough in a Cloud IaaS framework
  – OpenStack Nova
  – Work for both GPUs and other PCI devices
  – Show performance better than EC2
• Resolve NUMA issues with the Westmere architecture and Fermi GPUs
• Evaluate GPU possibilities on other hypervisors
• Support large-scale distributed CPU+GPU computation in the Cloud


Conclusion

• GPUs are here to stay in scientific computing
  – Many petascale systems use GPUs
  – A GPU exascale machine is expected (around 2020)
• Providing HPC in the Cloud is key to the viability of scientific cloud computing
• OpenStack provides an ideal architecture to enable HPC in clouds


Thanks!

Acknowledgements:

• NSF FutureGrid project
  – GPU cluster hardware
  – FutureGrid team @ IU
• USC/ISI APEX research group
• Persistent Systems Graduate Fellowship
• Xen open source community

About Me:

Andrew J. Younge
Ph.D. Candidate
Indiana University
Bloomington, IN USA
Email – [email protected]

Website – http://ajyounge.com

http://portal.futuregrid.org


EXTRA SLIDES


FutureGrid: a Distributed Testbed

(figure: map of FutureGrid testbed sites; legend: Private / Public FG Network; NID = Network Impairment Device)


OpenStack GPU Cloud Prototype


Bandwidth - Host to Device

(figure: host-to-device bandwidth for K20m Native vs. K20m VM across data sizes from 1024 to 4194304 bytes; difference of about 1.25%)

Bandwidth - Device to Host

(figure: device-to-host bandwidth for K20m Native vs. K20m VM across data sizes from 1024 to 4194304 bytes; differences of about 3.62% and 0.64%)

Overhead in Bandwidth

(figures: Xen vs. Base, Host to Device Bandwidth, Pinned; and Xen vs. Base, Device to Host Bandwidth, Pinned; Base vs. Xen VM across data sizes in KB)