Transcript vCUDA

vCUDA: GPU Accelerated High Performance
Computing in Virtual Machines
Lin Shi, Hao Chen, and Jianhua Sun
1
IEEE 2009
2009-12-31
Presenter: Hung-Fu Li
HPDS Lab.
NKUAS
Lecture Outline
Abstract (slide 3)
Background (slide 4)
Motivation (slide 5)
CUDA Architecture (slide 7)
vCUDA Architecture (slide 8)
Experiment Result (slide 13)
Conclusion (slide 19)
2
Abstract
This paper describes vCUDA, a GPGPU computing solution for virtual machines. The
authors claim that API interception and redirection can provide transparent,
high-performance GPU access to applications.
The paper also evaluates the performance overhead of the framework.
3
Background
VM (Virtual Machine)
CUDA (Compute Unified Device Architecture)
API (Application Programming Interface)
API Interception and Redirection
RPC (Remote Procedure Call)
4
Motivation
Virtualization may be the simplest solution for heterogeneous computing
environments.
Hardware varies by vendor, so it is not feasible for VM developers to implement
hardware drivers for every device (due to licensing, vendors do not publish their
source code or kernel techniques).
5
Motivation ( cont. )
Current virtualization solutions only support accelerated graphics APIs such as
OpenGL (e.g., VMGL), which are not intended for general-purpose computation.
6
CUDA Architecture
Component Stack
User Application
<<CUDA Extensions to C>>
CUDA Runtime API
CUDA Driver API
CUDA Driver
CUDA Enabled Device
7
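To make the stack concrete, here is a minimal CUDA runtime program (an illustrative sketch, not taken from the paper): every call below, cudaMalloc, cudaMemcpy, and the <<<...>>> kernel launch from the CUDA extensions to C, enters through the runtime API, and these are exactly the kinds of calls vCUDA later intercepts in the guest.

#include <cuda_runtime.h>
#include <stdio.h>

// Trivial kernel: add 1.0f to every element.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main(void) {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev = NULL;
    cudaMalloc(&dev, n * sizeof(float));                              // runtime API call
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice); // runtime API call

    add_one<<<(n + 255) / 256, 256>>>(dev, n);                        // CUDA extension to C

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %f\n", host[0]);                                // expect 1.0
    return 0;
}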
vCUDA Architecture
Split the stack into a software (soft) binding and a hardware (hard) binding:
User Application (part of the SDK, <<CUDA Extensions to C>>)
CUDA Runtime API (soft binding)
CUDA Driver API (communicates directly with the driver)
CUDA Driver and CUDA Enabled Device (hard binding)
8
vCUDA Architecture ( cont. )
Re-group the stack into a host side and a remote (guest) side.
Remote binding (guest OS): User Application (part of the SDK, <<CUDA Extensions to C>>), [v]CUDA Runtime API, [v]CUDA Driver API, [v]CUDA Enabled Device (vGPU)
Host binding: CUDA Driver API, CUDA Driver, CUDA Enabled Device
9
vCUDA Architecture ( cont. )
In the remote binding (guest OS), the [v]CUDA Runtime API, [v]CUDA Driver API, and
[v]CUDA Enabled Device (vGPU) use a fake API as an adapter between the virtual
driver and the real (native) driver.
API Interception
Parameters passed
Order semantics
Hardware state
Communication
Uses Lazy RPC transmission
Uses XML-RPC as the high-level communication layer (for cross-platform support)
10
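To illustrate the interception idea, the sketch below is a hypothetical, self-contained example (the rpc_call helper and its behavior are assumptions, not the paper's code or a real API): the guest links against a fake runtime library that exports the same symbols as libcudart, and each wrapper marshals its parameters and forwards the call to the host, where the real driver executes it.

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

typedef int cudaError_t;                  /* 0 stands in for cudaSuccess */

/* Stand-in transport (hypothetical): in vCUDA this would serialize the call
 * (XML-RPC) and ship it to the host-side stub; here it only logs the call and
 * fabricates a handle so the sketch is self-contained. */
static cudaError_t rpc_call(const char *name, void **handle, size_t size) {
    printf("guest -> host RPC: %s (%zu bytes requested)\n", name, size);
    *handle = malloc(size);               /* pretend this came from the real driver */
    return 0;
}

/* "Fake" cudaMalloc exported to the application: it never touches the GPU;
 * the real cudaMalloc runs on the host side. */
cudaError_t cudaMalloc(void **devPtr, size_t size) {
    return rpc_call("cudaMalloc", devPtr, size);
}

int main(void) {
    void *d = NULL;
    cudaMalloc(&d, 1024 * sizeof(float));
    printf("guest application sees device handle %p\n", d);
    return 0;
}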
vCUDA Architecture ( cont. )
[Figure: instant API calls cross from the virtual machine OS to the host OS
immediately, while non-instant API calls are batched and delivered via Lazy RPC.]
11
vCUDA Architecture ( cont. )
vCUDA API with a virtual GPU (vGPU)
Lazy RPC
Reduces the overhead of switching between the host OS and the guest OS.
Hardware states are tracked in the vGPU.
[Figure: an application's API invocations pass through the vStub in the guest;
non-instant API calls are collected into a package and sent via Lazy RPC, while
instant API calls are forwarded directly to the Stub and GPU on the host.]
12
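The sketch below illustrates the batching behind Lazy RPC; the call names and the instant/non-instant split are illustrative assumptions, not the authors' exact classification. Calls whose results the application does not need immediately are queued in the guest, and the whole batch is flushed in a single guest/host round trip when an instant (synchronizing) call arrives, so four API calls cost one VM switch instead of four.

#include <stdio.h>

#define MAX_BATCH 64

static const char *batch[MAX_BATCH];
static int batch_len = 0;

/* Queue a non-instant call instead of crossing the VM boundary right away. */
static void lazy_call(const char *name) {
    if (batch_len < MAX_BATCH)
        batch[batch_len++] = name;
}

/* One guest-to-host round trip carrying every queued call plus the instant one. */
static void instant_call(const char *name) {
    printf("flush: %d batched call(s), then %s, in one round trip\n", batch_len, name);
    batch_len = 0;
}

int main(void) {
    lazy_call("cudaConfigureCall");   /* queued, no host/guest switch */
    lazy_call("cudaSetupArgument");   /* queued, no host/guest switch */
    lazy_call("cudaLaunch");          /* queued, no host/guest switch */
    instant_call("cudaMemcpy");       /* result needed now: one switch instead of four */
    return 0;
}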
Experiment Result
Criteria
Performance
Lazy RPC and Concurrency
Suspend & Resume
Compatibility
13
Experiment Result ( cont. )
[Slides 14 to 17: result figures for each criterion in turn, performance, Lazy RPC
and concurrency, suspend and resume, and compatibility; the charts are not
reproduced in this transcript.]
Experiment Result ( cont. )
Criteria
Performance
Lazy RPC and Concurrency
Suspend & Resume
Compatibility (evaluated with the benchmark applications below)
MV: Matrix Vector Multiplication Algorithm
StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems
MRRR: Multiple Relatively Robust Representations
GPUmg: Molecular Dynamics Simulation with GPU
18
Conclusion
The authors developed a CUDA interface for virtual machines that is compatible
with the native interface. Data transmission is a significant bottleneck, largely
due to XML-RPC parsing.
This presentation has briefly presented the main architecture of vCUDA and the
idea behind it. The architecture could be extended as a component or solution to
add GPU support to cloud computing.
19
End of Presentation
Thank you for listening.
20