OpenCL-short
OpenCL
Sathish Vadhiyar
Sources:
OpenCL overview from AMD
OpenCL learning kit from AMD
Introduction
OpenCL is a programming framework
for heterogeneous computing
resources
Resources include CPUs, GPUs, Cell
Broadband Engine, FPGAs, DSPs
Many similarities with CUDA
Command Queues
A command queue is the mechanism for the
host to request that an action be performed by
the device
For example: perform a memory transfer, begin executing a kernel, etc.
Interesting concept of enqueuing kernels and
satisfying dependencies using events
A separate command queue is required for each
device
Commands within the queue can be
synchronous or asynchronous
Commands can execute in-order or out-of-order
Perhaad Mistry & Dana Schaa,
Northeastern Univ Computer
Architecture Research Lab, with Ben
Example – Image Rotation
Slides 8 and 11-16 of Lecture 5 in the OpenCL
University Kit
Synchronization
Synchronization in OpenCL
Synchronization is required if we use
an out-of-order command queue or
multiple command queues
Coarse synchronization granularity
Per command queue basis
Finer synchronization granularity
Per OpenCL operation basis using events
OpenCL Command Queue Control
Command queue synchronization methods work on a per-queue
basis
Flush: clFlush(cl_command_queue)
Send all commands in the queue to the compute
device
No guarantee that they will be complete when clFlush
returns
Finish: clFinish(cl_command_queue)
Waits for all commands in the command queue to
complete before proceeding (host blocks on this call)
Barrier: clEnqueueBarrier(cl_command_queue) (OpenCL 1.0/1.1; OpenCL 1.2 replaces it with clEnqueueBarrierWithWaitList)
Enqueue a synchronization point that ensures all
prior commands in a queue have completed before
any further commands execute
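The difference between the three calls can be sketched with a toy model in plain C. This is not OpenCL code: every toy_* name below is hypothetical, and the model only tracks command states to show what each call does and does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are OpenCL API. */
typedef enum { TOY_QUEUED, TOY_SUBMITTED, TOY_COMPLETE } toy_state;

typedef struct {
    bool is_barrier;   /* true for a barrier entry, false for ordinary work */
    toy_state state;
} toy_cmd;

#define TOY_MAX_CMDS 16

typedef struct {
    toy_cmd cmds[TOY_MAX_CMDS];
    size_t count;
} toy_queue;

/* Append a command (or a barrier) and return its index. */
size_t toy_enqueue(toy_queue *q, bool is_barrier) {
    assert(q->count < TOY_MAX_CMDS);
    q->cmds[q->count].is_barrier = is_barrier;
    q->cmds[q->count].state = TOY_QUEUED;
    return q->count++;
}

/* Like clFlush: hand every queued command to the device, then return
   without waiting for any of them to complete. */
void toy_flush(toy_queue *q) {
    for (size_t i = 0; i < q->count; ++i)
        if (q->cmds[i].state == TOY_QUEUED)
            q->cmds[i].state = TOY_SUBMITTED;
}

/* Like clFinish: return only once every command has completed. */
void toy_finish(toy_queue *q) {
    for (size_t i = 0; i < q->count; ++i)
        q->cmds[i].state = TOY_COMPLETE;
}

/* Barrier rule: command idx may start only if every command ahead of
   the nearest preceding barrier has completed. */
bool toy_may_start(const toy_queue *q, size_t idx) {
    size_t barrier = idx;              /* idx means "no barrier found" */
    for (size_t i = idx; i-- > 0; ) {
        if (q->cmds[i].is_barrier) { barrier = i; break; }
    }
    if (barrier == idx) return true;
    for (size_t i = 0; i < barrier; ++i)
        if (q->cmds[i].state != TOY_COMPLETE) return false;
    return true;
}
```

After toy_flush, commands are only submitted, so a command behind a barrier still may not start; after toy_finish everything is complete and the barrier no longer holds anything back.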
OpenCL Events
The synchronization functions above operate only
at per-command-queue granularity
OpenCL events are needed to synchronize
at the granularity of individual commands
Explicit synchronization is required for
Out-of-order command queues
Multiple command queues
Using User Events
A simple example of user events
being triggered and used in a
command queue
//Create user event which will start the write of buf1
user_event = clCreateUserEvent(ctx, NULL);
clEnqueueWriteBuffer( cq, buf1, CL_FALSE, ..., 1, &user_event , NULL);
//The write of buf1 is now enqueued and waiting on user_event
X = foo(); //Lots of complicated host processing code
clSetUserEventStatus(user_event, CL_COMPLETE);
//The clEnqueueWriteBuffer to buf1 can now proceed, using the output of foo()
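The gating behaviour of the user event can be sketched with a toy model in plain C. This is not OpenCL code: the toy_* names are hypothetical, and the scheduler is a simplified stand-in for an out-of-order queue that runs any command whose wait list is satisfied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are illustrative, not OpenCL API. */
typedef struct { bool complete; } toy_event;

typedef struct {
    const char *name;
    toy_event **wait_list;   /* events that must complete first */
    size_t num_waits;
    toy_event done;          /* set once this command has run */
} toy_cmd;

/* A command is ready once every event in its wait list is complete. */
bool toy_ready(const toy_cmd *c) {
    for (size_t i = 0; i < c->num_waits; ++i)
        if (!c->wait_list[i]->complete) return false;
    return true;
}

/* Out-of-order pass: keep running any pending command that is ready,
   regardless of enqueue order. Stores the indices of the commands it
   ran in `order` and returns how many ran. */
size_t toy_drain(toy_cmd *cmds, size_t n, size_t *order) {
    size_t ran = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; ++i) {
            if (!cmds[i].done.complete && toy_ready(&cmds[i])) {
                cmds[i].done.complete = true;  /* signals dependents */
                order[ran++] = i;
                progress = true;
            }
        }
    }
    return ran;
}
```

In this model a write that waits on a user event, and a kernel that waits on that write, both stay pending until the host marks the user event complete (the role clSetUserEventStatus plays above); the write then runs first and the kernel second, even if the kernel was enqueued earlier.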
Multiple Devices
OpenCL can also be used to program multiple
devices (CPU, GPU, Cell, DSP etc.)
OpenCL does not assume that data can be
transferred directly between devices, so
commands exist only to move data from host to
device, or from device to host
Copying from one device to another requires an
intermediate transfer to the host
OpenCL events are used to synchronize
execution on different devices within a context
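The host-staged copy described above can be sketched in plain C. This is a toy model, not OpenCL code: toy_device_buffer and the toy_* functions are hypothetical stand-ins, with memcpy playing the role that a blocking clEnqueueReadBuffer followed by clEnqueueWriteBuffer would play in a real program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a device buffer; real code would hold a cl_mem. */
typedef struct {
    unsigned char *storage;
    size_t size;
} toy_device_buffer;

/* Stands in for a blocking device-to-host read. */
void toy_read_to_host(const toy_device_buffer *src, void *host, size_t n) {
    assert(n <= src->size);
    memcpy(host, src->storage, n);
}

/* Stands in for a blocking host-to-device write. */
void toy_write_from_host(toy_device_buffer *dst, const void *host, size_t n) {
    assert(n <= dst->size);
    memcpy(dst->storage, host, n);
}

/* Copy n bytes from one device to another through a host staging
   buffer. Returns 0 on success, -1 on a size or allocation failure. */
int toy_copy_between_devices(const toy_device_buffer *src,
                             toy_device_buffer *dst, size_t n) {
    if (n > src->size || n > dst->size) return -1;
    if (n == 0) return 0;
    void *staging = malloc(n);           /* intermediate host memory */
    if (staging == NULL) return -1;
    toy_read_to_host(src, staging, n);   /* device A -> host */
    toy_write_from_host(dst, staging, n);/* host -> device B */
    free(staging);
    return 0;
}
```

In real code the two transfers would sit on different command queues (one per device), so the write would need to wait on an event from the read unless the read is blocking.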
Compiling Code for Multiple Devices