OpenCL
Sathish Vadhiyar
Sources:
OpenCL overview from AMD
OpenCL learning kit from AMD
Introduction
• OpenCL is a programming framework for heterogeneous computing resources
• Resources include CPUs, GPUs, Cell Broadband Engine, FPGAs, DSPs
• Many similarities with CUDA
Command Queues
A command queue is the mechanism by which the host requests that the device perform an action
  • Perform a memory transfer, begin executing a kernel, etc.
  • Kernels are enqueued, and dependencies between commands are satisfied using events
A separate command queue is required for each device
Commands within the queue can be synchronous or asynchronous
Commands can execute in-order or out-of-order
Perhaad Mistry & Dana Schaa,
Northeastern Univ Computer
Architecture Research Lab, with Ben
• Example – Image Rotation
  • Slides 8, 11-16 of lecture 5 in the OpenCL University Kit
• Synchronization
Synchronization in OpenCL
• Synchronization is required if we use an out-of-order command queue or multiple command queues
• Coarse synchronization granularity
  • Per command queue basis
• Finer synchronization granularity
  • Per OpenCL operation basis using events
OpenCL Command Queue Control


Command queue synchronization methods work on a per-queue basis

Flush: clFlush(cl_command_queue)
• Issues all commands in the queue to the compute device
• No guarantee that they will be complete when clFlush returns

Finish: clFinish(cl_command_queue)
• Waits for all commands in the command queue to complete before proceeding (host blocks on this call)

Barrier: clEnqueueBarrier(cl_command_queue)
• Enqueues a synchronization point that ensures all prior commands in a queue have completed before any further commands execute
• Deprecated since OpenCL 1.2 in favor of clEnqueueBarrierWithWaitList
OpenCL Events
• The previous OpenCL synchronization functions only operate at a per-command-queue granularity
• OpenCL events are needed to synchronize at a function granularity
• Explicit synchronization is required for
  • Out-of-order command queues
  • Multiple command queues
Using User Events
• A simple example of a user event being triggered and used in a command queue
// Create a user event that will gate the write of buf1
user_event = clCreateUserEvent(ctx, NULL);
clEnqueueWriteBuffer(cq, buf1, CL_FALSE, ..., 1, &user_event, NULL);
// The write of buf1 is now enqueued and waiting on user_event
X = foo(); // Lots of complicated host processing code
clSetUserEventStatus(user_event, CL_COMPLETE);
// The clEnqueueWriteBuffer to buf1 can now proceed, using the output of foo()
• Multiple Devices
Multiple Devices
• OpenCL can also be used to program multiple devices (CPU, GPU, Cell, DSP, etc.)
• OpenCL does not assume that data can be transferred directly between devices, so commands only exist to move data from the host to a device, or from a device to the host
• Copying from one device to another requires an intermediate transfer to the host
• OpenCL events are used to synchronize execution on different devices within a context
Compiling Code for Multiple Devices