OpenCL Introduction
A TECHNICAL REVIEW
LU LU
OCT. 11 2014
CONTENTS
1. OpenCL Architecture
2. OpenCL Programming
3. A Matrix Multiplication Example
1. OPENCL ARCHITECTURE
1. Four Architectural Models
Platform Model
Execution Model
Memory Model
Programming Model
2. OpenCL Framework
1.1 FOUR ARCHITECTURAL MODELS
 Platform Model
 Execution Model
 Memory Model
 Programming Model
1.1.1 PLATFORM MODEL
1.1.1 PLATFORM MODEL (CONT.)
 One host equipped with OpenCL device(s).
 An OpenCL device consists of compute unit(s)/CU(s).
 A CU consists of processing element(s), or PE(s).
– Computations on a device occur within PEs.
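A small host-side sketch (not from the slides) of how a program can inspect this hierarchy, querying the number of compute units a device exposes with the standard clGetDeviceInfo call:

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    cl_uint cus = 0;
    clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(cus), &cus, NULL);   /* compute units on this device */
    printf("compute units: %u\n", cus);
    return 0;
}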
1.1.2 EXECUTION MODEL
 Kernels
– execute on one or more OpenCL devices
 Host Program
– executes on the host
– defines the context for the kernels
– manages the execution of kernels
1.1.2 EXECUTION MODEL (CONT.)
 NDRange
– an N-dimensional index space, where N is 1, 2 or 3
 WORK-ITEM
– an instance of the kernel
– identified by a global ID in the NDRange
– executes the same code in parallel
• The specific execution pathway through the code and the data operated upon can
vary per work-item.
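As a minimal illustration (not from the slides), the kernel below is instantiated once per point of a one-dimensional NDRange; every work-item runs the same code but uses its global ID to pick its own data:

__kernel void vector_add(__global const float *a,
                         __global const float *b,
                         __global float *c)
{
    int gid = get_global_id(0);   /* this work-item's global ID */
    c[gid] = a[gid] + b[gid];     /* same code, different data per work-item */
}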
1.1.2 EXECUTION MODEL (CONT.)
 WORK-GROUP
– Provides a coarse-grained decomposition of the NDRange.
– Is assigned a unique work-group ID with the same dimensionality as the NDRange.
– Uses a unique local ID to identify each of its work-items.
– Its work-items execute concurrently on the PEs of a single CU.
– Kernels can use synchronization controls within a work-group.
– The NDRange size should be a multiple of the work-group size in each dimension.
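A small OpenCL C sketch (assumed names, not from the slides) showing group and local IDs plus a work-group barrier; each group stages its slice of the input in local memory, synchronizes, and then reverses it within the group:

__kernel void reverse_in_group(__global const float *in,
                               __global float *out,
                               __local float *tile)
{
    int gid  = get_global_id(0);
    int lid  = get_local_id(0);        /* local ID within the work-group */
    int lsz  = get_local_size(0);      /* work-group size */
    int base = get_group_id(0) * lsz;  /* first element handled by this group */

    tile[lid] = in[gid];
    barrier(CLK_LOCAL_MEM_FENCE);      /* synchronize the work-group */

    out[base + (lsz - 1 - lid)] = tile[lid];   /* reverse within the group */
}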
1.1.2 EXECUTION MODEL (CONT.)
 Context
– The host defines a context for the execution of the kernels.
 Resources in the context:
– Devices
• The collection of OpenCL devices to be used by the host.
– Kernels
• The OpenCL functions that run on OpenCL devices.
– Program Objects
• The program source and executable that implement the kernels.
– Memory Objects
• A set of memory objects visible to the host and the OpenCL devices.
• Memory objects contain values that can be operated on by instances of a kernel.
1.1.2 EXECUTION MODEL (CONT.)
 Command-queue
– The host creates a data structure called a command-queue to coordinate
execution of the kernels on the devices.
– The host places commands into the command-queue which are then
scheduled onto the devices within the context.
– The command-queue schedules commands for execution on a device.
– Commands execute asynchronously between the host and the device.
1.1.2 EXECUTION MODEL (CONT.)
 Commands in command-queue:
– Kernel execution commands
• Execute a kernel on the processing elements of a device.
– Memory commands
• Transfer data to, from, or between memory objects, or map and unmap memory
objects from the host address space.
– Synchronization commands
• Constrain the order of execution of commands.
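A host-side sketch (not from the slides) that enqueues one command of each kind; the queue, kernel and buffer arguments are assumed to have been created already:

#include <CL/cl.h>

cl_int enqueue_each_kind(cl_command_queue queue, cl_kernel kernel,
                         cl_mem out_buf, float *host_out, size_t n)
{
    size_t global_size = n;

    /* Kernel execution command */
    cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                        &global_size, NULL, 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;

    /* Synchronization command: later commands wait for earlier ones */
    err = clEnqueueBarrier(queue);    /* clEnqueueBarrierWithWaitList in OpenCL 1.2+ */
    if (err != CL_SUCCESS) return err;

    /* Memory command: copy the result back into host memory */
    return clEnqueueReadBuffer(queue, out_buf, CL_TRUE, 0,
                               n * sizeof(float), host_out, 0, NULL, NULL);
}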
1.1.2 EXECUTION MODEL (CONT.)
 Command execution modes:
– In-order Execution
– Out-of-order Execution
• Any order constraints are enforced by the programmer through explicit
synchronization commands
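The choice is made when the queue is created; a sketch (pre-OpenCL-2.0 entry point, with the context and device assumed to exist) requesting out-of-order execution:

#include <CL/cl.h>

cl_command_queue make_out_of_order_queue(cl_context ctx, cl_device_id dev)
{
    cl_int err;
    /* Without this property flag the queue executes commands in order. */
    cl_command_queue q = clCreateCommandQueue(
        ctx, dev, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &err);
    return (err == CL_SUCCESS) ? q : NULL;
}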
1.1.3 MEMORY MODEL
1.1.3 MEMORY MODEL (CONT.)
 Private Memory
– Per work-item
 Local Memory
– Shared within a work-group
 Global/Constant Memory
– The latter (constant memory) is cached
 Host Memory
– On the CPU
 Memory management is explicit
– must move data from host -> global -> local and back
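A minimal OpenCL C sketch (not from the slides; the scale/offset names are assumptions) showing the four address-space qualifiers that correspond to these memory regions:

__kernel void scale_offset(__global const float *in,
                           __global float *out,
                           __constant float *offset,   /* constant memory (cached) */
                           __local float *tile)        /* local memory, per work-group */
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);

    float x = in[gid];              /* x lives in private memory (per work-item) */
    tile[lid] = x;                  /* stage the value in local memory */
    barrier(CLK_LOCAL_MEM_FENCE);

    out[gid] = tile[lid] * 2.0f + offset[0];
}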
1.1.3 MEMORY MODEL (CONT.)
 Memory Region
– Allocation and Memory Access Capabilities
1.1.3 MEMORY MODEL (CONT.)
 Memory Consistency
– OpenCL uses a relaxed consistency memory model; i.e., the state of memory visible to a work-item is not guaranteed to be consistent across the collection of work-items at all times.
– Within a work-item, memory has load/store consistency.
– Within a work-group, local memory is consistent across work-items at a barrier.
– Global memory is consistent across the work-items of a single work-group at a barrier, but not across different work-groups.
– Consistency of memory shared between commands is enforced through synchronization.
1.1.4 PROGRAMMING MODEL
 Data Parallel Programming Model
– All the work-items in the NDRange execute in parallel.
 Task Parallel Programming Model
– Executing a kernel on a compute unit with a work-group containing a single
work-item.
– Express parallelism by:
• using vector data types implemented by the device,
• enqueuing multiple tasks, and/or
• enqueuing native kernels developed using a programming model orthogonal to
OpenCL.
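A minimal OpenCL C sketch (assumed names, not from the slides) of the first option: a single task, i.e. one work-item, that still exploits the device's vector units through float4 arithmetic:

__kernel void saxpy_task(__global const float4 *x,
                         __global float4 *y,
                         const float a,
                         const int n4)          /* number of float4 elements */
{
    /* Enqueued as a task (a single work-item), so the loop covers all data. */
    for (int i = 0; i < n4; ++i)
        y[i] = a * x[i] + y[i];                 /* four multiply-adds per iteration */
}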
1.1.4 PROGRAMMING MODEL (CONT.)
 Synchronization
– Work-items in a single work-group
• Work-group barrier
– Commands enqueued to command-queue(s) in a single context
• Command-queue barrier
• Waiting on an event.
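A host-side sketch (not from the slides; the queue, kernel and buffer arguments are assumed to exist) of synchronizing two command-queues in one context through an event, then blocking the host with clFinish:

#include <CL/cl.h>

cl_int copy_then_run(cl_command_queue q_io, cl_command_queue q_exec,
                     cl_mem buf, const float *host_src, size_t bytes,
                     cl_kernel kernel, size_t global_size)
{
    cl_event write_done;

    /* Non-blocking write on one queue, producing an event. */
    cl_int err = clEnqueueWriteBuffer(q_io, buf, CL_FALSE, 0, bytes,
                                      host_src, 0, NULL, &write_done);
    if (err != CL_SUCCESS) return err;

    /* The kernel on the other queue waits on that event. */
    err = clEnqueueNDRangeKernel(q_exec, kernel, 1, NULL, &global_size,
                                 NULL, 1, &write_done, NULL);
    if (err != CL_SUCCESS) return err;

    return clFinish(q_exec);   /* block the host until the kernel completes */
}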
1.1.4 PROGRAMMING MODEL (CONT.)
 Events Synchronization
1.2 OPENCL FRAMEWORK
 OpenCL Platform layer
– This layer allows a host program to discover OpenCL devices and their
capabilities and to create contexts.
 OpenCL Runtime
– The runtime allows the host program to manipulate created contexts.
 OpenCL Compiler
– The compiler creates the executable programs that contain OpenCL kernels. The
OpenCL programming language implemented by the compiler supports a
subset of the ISO C99 language with extensions for parallelism.
2. OPENCL PROGRAMMING
2.2 BASIC STEPS
 Step 1: Discover and initialize the platforms
 Step 2: Discover and initialize the devices
 Step 3: Create the context
 Step 4: Create a command queue
 Step 5: Create device buffers
 Step 6: Write the host data to device buffers
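A minimal host-code sketch of steps 1 to 6 (not from the slides; error handling is trimmed, and N and the input array are assumptions for illustration). Steps 7 to 13 continue in the sketch after the next slide.

#include <stdio.h>
#include <CL/cl.h>

#define N 1024                              /* assumed problem size */

int main(void)
{
    float host_data[N];
    for (int i = 0; i < N; ++i) host_data[i] = (float)i;

    cl_int err;

    /* Step 1: discover and initialize the platforms */
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);

    /* Step 2: discover and initialize the devices */
    cl_device_id device;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    /* Step 3: create the context */
    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);

    /* Step 4: create a command queue */
    cl_command_queue queue = clCreateCommandQueue(context, device, 0, &err);

    /* Step 5: create device buffers */
    cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE,
                                N * sizeof(float), NULL, &err);

    /* Step 6: write the host data to device buffers */
    clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, N * sizeof(float),
                         host_data, 0, NULL, NULL);

    /* ... steps 7-13 continue in the next sketch ... */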
2.2 BASIC STEPS (CONT.)
 Step 7: Create and compile the program
 Step 8: Create the kernel
 Step 9: Set the kernel arguments
 Step 10: Configure the work-item structure
 Step 11: Enqueue the kernel for execution
 Step 12: Read the output buffer back to the host
 Step 13: Release the OpenCL resources
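Continuation of the host-code sketch above (still inside main(); the embedded kernel source string and names are assumptions):

    /* Step 7: create and compile the program */
    const char *src =
        "__kernel void scale(__global float *x) {"
        "    int i = get_global_id(0);"
        "    x[i] *= 2.0f;"
        "}";
    cl_program program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    clBuildProgram(program, 1, &device, NULL, NULL, NULL);

    /* Step 8: create the kernel */
    cl_kernel kernel = clCreateKernel(program, "scale", &err);

    /* Step 9: set the kernel arguments */
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);

    /* Step 10: configure the work-item structure */
    size_t global_size = N;                 /* one work-item per element */

    /* Step 11: enqueue the kernel for execution */
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
                           0, NULL, NULL);

    /* Step 12: read the output buffer back to the host */
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, N * sizeof(float),
                        host_data, 0, NULL, NULL);
    printf("host_data[1] = %f\n", host_data[1]);   /* expect 2.0 */

    /* Step 13: release the OpenCL resources */
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseMemObject(buf);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);
    return 0;
}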
2.3 BASIC STRUCTURE
 Host program (uses the platform layer and runtime APIs)
– Query compute devices
– Create the context and command-queue
– Create memory objects associated with the context
– Compile and create kernel objects
– Issue commands to the command-queue
– Synchronize commands
– Release OpenCL resources
 Kernels (written in the OpenCL C language)
– C code with some restrictions and extensions
3. AN EXAMPLE
3.1 DESCRIPTION OF THE PROBLEM
 A is an H_A × W_A matrix
 B is an H_B × W_B matrix
 They satisfy W_A = H_B
 Calculate C = AB
– which is an H_A × W_B matrix
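Each element of C is the usual dot product of a row of A with a column of B (the standard definition, written in LaTeX):

C_{ij} = \sum_{k=0}^{W_A - 1} A_{ik}\, B_{kj}, \qquad 0 \le i < H_A,\quad 0 \le j < W_B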
3.2 SERIAL IMPLEMENTATION
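The serial code shown on this slide was not captured in the transcript; below is a minimal C sketch of the standard triple loop (row-major storage; the names hA, wA and wB are assumptions):

void matmul_serial(const float *A, const float *B, float *C,
                   int hA, int wA, int wB)
{
    /* A is hA x wA, B is wA x wB, C is hA x wB */
    for (int i = 0; i < hA; ++i) {
        for (int j = 0; j < wB; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < wA; ++k)
                sum += A[i * wA + k] * B[k * wB + j];
            C[i * wB + j] = sum;
        }
    }
}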
3.3 CALCULATION PROCEDURE DIAGRAM
(Figure: matrices A and B are combined to produce the result matrix C.)
3.4 CHARACTERISTICS OF THE CALCULATION
 Each element of C is calculated by the same computation applied to different data from A and B.
 The calculation of each element of C is independent of all the others.
– There is no write collision.
 It is therefore well suited to data-parallel computing.
3.5 OPENCL IMPLEMENTATION
 We assign one work-item to each element of C.
 We write a kernel that computes one element of C.
 We use a two-dimensional NDRange of size H_A × W_B.
– All the elements of C are generated concurrently.
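A host-side sketch (assumed variable names; the queue and kernel are created as in section 2.2) of configuring and enqueuing the two-dimensional NDRange for this example, one work-item per element of C:

/* Dimension 0 indexes columns of C (wB of them), dimension 1 indexes rows (hA). */
size_t global_size[2] = { (size_t)wB, (size_t)hA };
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global_size, NULL,
                       0, NULL, NULL);   /* let the runtime choose the work-group size */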
3.6 OPENCL MATRIX-MULTIPLY CODE
 kernel
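The kernel source on this slide was not captured in the transcript; below is a minimal OpenCL C sketch consistent with the scheme above (argument names are assumptions), in which each work-item computes one element of C:

__kernel void matmul(__global const float *A,
                     __global const float *B,
                     __global float *C,
                     const int wA,         /* width of A == height of B */
                     const int wB)         /* width of B == width of C */
{
    int col = get_global_id(0);            /* 0 .. W_B - 1 */
    int row = get_global_id(1);            /* 0 .. H_A - 1 */

    float sum = 0.0f;
    for (int k = 0; k < wA; ++k)
        sum += A[row * wA + k] * B[k * wB + col];

    C[row * wB + col] = sum;
}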
3.7 OPENCL IMPLEMENTATION
 What has to be done on the host is illustrated in the figure on this slide.
– (Figure: host-side flow — query platform → query devices → create command queue → create buffers → compile program → create kernel → set arguments → execute kernel — spanning the platform layer, compiler and runtime layer.)
 Set the size of the NDRange (and work-group) when enqueuing the kernel.
 The calculation for each element in C is then done in parallel.
THANK YOU!
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies,
omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
limited to product and roadmap changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the
like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right
to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify
any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND
ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY
PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT,
SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED
HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are
trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered
trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only
and may be trademarks of their respective owners.