Workshop 9: General purpose computing using GPUs: Developing a hands-on undergraduate course on CUDA programming
Dr. Barry Wilkinson, University of North Carolina Charlotte
Dr. Yaohang Li, Old Dominion University
SIGCSE 2011 - The 42nd ACM Technical Symposium on Computer Science Education
Wednesday March 9, 2011, 7:00 pm - 10:00 pm
SIGCSE 2011 Workshop 9 intro.ppt © 2010 B. Wilkinson Modification date: Feb 22, 2011
Agenda
7:00 pm - 7:15 pm
Welcome and opening remarks: GPUs and CUDA, remote server configurations, guest accounts, sample programs with graphics.
7:15 pm - 8:25 pm
Session 1: Basic CUDA programming
• Presentation
• Hands-on experience using remote GPU server
8:25 pm - 8:35 pm
Break, with demos
8:35 pm - 9:35 pm
Session 2: Further features and performance of CUDA programs
• Presentation
• Guided hands-on experience
9:35 pm - 10:00 pm
Discussion of general-purpose GPU programming at undergraduate level
Emergence of GPU systems for General Purpose High Performance Computing
GPUs have developed from graphics cards into a platform for HPC.
GPUs are being designed with that application in mind.
Very significant performance improvements on scientific code.
[Figure: GPU performance gains over CPUs. GFLOPs (0 to 1400) plotted against date (9/22/2002 to 7/27/2009) for NVIDIA GPUs (NV30, NV40, G70, G80, GT200, T12) and Intel CPUs (3GHz Dual Core P4, 3GHz Core2 Duo, 3GHz Xeon Quad, Westmere).]
Source © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009
ECE 498AL Spring 2010, University of Illinois, Urbana-Champaign
http://www.hpcwire.com/blogs/New-China-GPGPU-Super-Outruns-Jaguar-105987389.html
A hot topic to teach
http://www.nvidia.com/object/cuda_courses_and_map.html
Taught at Illinois, Stanford, MIT, Harvard, Duke, Chapel Hill, UNC-C, …
Taught at graduate level and now moving into undergraduate level
GPU Course for High Performance Computing
Concerned with using Graphics Processing Units (GPUs) for high performance computing, not graphics.
A programming course.
Uses CUDA (Compute Unified Device Architecture), an architecture and programming model introduced by NVIDIA in 2007.
C-based. Easy to learn.
NVIDIA products
NVIDIA Corp. is the leader in GPUs for high performance computing.
Established in 1993 by Jen-Hsun Huang, Chris Malachowsky, Curtis Priem.
[Figure: NVIDIA product timeline, 1993 to 2010. Labels: NV1 (1995), GeForce 1, GeForce 2 series, GeForce FX series, GeForce 8 series, GeForce 8800, GT 80, CUDA, GeForce 200 series (GTX260/275/280/285/295), GeForce 400 series (GTX460/465/470/475/480/485), Quadro, Tesla (C870, S870, C1060, S1070, C2050, …), Fermi, Kepler (2011), Maxwell (2013). Callouts: "NVIDIA's first GPU with general purpose processors"; "Tesla 2050 GPU has 448 thread processors".]
Programming Model
GPUs were historically designed for creating image data for displays.
This involves manipulating image picture elements (pixels), often with the same operation on each pixel.
SIMD (Single Instruction Multiple Data) model: an efficient mode of operation in which the same operation is done on each data element at the same time.
GPUs use a thread version of SIMD called Single Instruction Multiple Thread (SIMT).
GPU’s SIMT Programming Model
GPUs use very lightweight threads to achieve high parallel performance and to hide memory latency.
Multiple threads each execute the same instruction sequence.
A very large number of threads (10,000s) is possible on GPUs.
Threads are mapped onto the available processors on the GPU (100s of processors), all executing the same program sequence.
More on the program model shortly.
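The SIMT idea above can be sketched with a small CUDA program. This is an illustrative example, not code from the workshop: the names vecAdd and countVecAddErrors, the array size, and the block size of 256 are all assumptions chosen for the sketch. Every thread runs the same kernel body, and each thread uses its own index to pick a different data element.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: every thread executes this same instruction sequence (SIMT).
// Each thread derives a unique global index and works on one element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {               // guard: the grid may contain more threads than elements
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: adds two n-element vectors on the GPU and returns
// the number of wrong results, or -1 if a CUDA call reported an error.
int countVecAddErrors(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da = NULL, *db = NULL, *dc = NULL;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // One thread per element, grouped into blocks of 256 threads.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(da, db, dc, n);

    cudaError_t err = cudaGetLastError();          // reports launch errors
    if (err == cudaSuccess)                        // this copy waits for the kernel to finish
        err = cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    int errors = 0;
    if (err != cudaSuccess) {
        errors = -1;
    } else {
        for (int i = 0; i < n; i++)
            if (hc[i] != 3.0f * i) errors++;
    }

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return errors;
}

int main()
{
    // 40000 elements gives tens of thousands of lightweight threads.
    printf("mismatches: %d\n", countVecAddErrors(40000));
    return 0;
}
```

The launch creates far more threads than the GPU has processors; the hardware schedules them onto the available processors, which is how memory latency gets hidden.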
Programming applications using SIMT model
Matrix operations -- very amenable to SIMT
• Same operations done on different elements of matrices
Some “embarrassingly” parallel computations such as Monte Carlo calculations
• Monte Carlo calculations use random selections that are independent of each other
Data manipulations
• Some sorting can be done quite efficiently
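As a sketch of why Monte Carlo calculations suit SIMT, the program below estimates pi by having each thread draw its own independent random points and count those that fall inside a quarter circle. It is an assumed example rather than workshop material: the names countHits and estimatePi, the seed, and the launch sizes are hypothetical. Random numbers come from the cuRAND device API (curand_init and curand_uniform).

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Each thread draws its own independent samples and counts how many
// land inside the unit quarter circle. No thread depends on another.
__global__ void countHits(unsigned long long seed, int samplesPerThread,
                          unsigned long long *hits)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, tid, 0, &state);   // same seed, separate subsequence per thread

    unsigned long long local = 0;
    for (int s = 0; s < samplesPerThread; s++) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) local++;
    }
    atomicAdd(hits, local);              // combine per-thread counts into one total
}

// Hypothetical helper: returns the pi estimate, or -1.0 on a CUDA error.
double estimatePi(int blocks, int threadsPerBlock, int samplesPerThread)
{
    unsigned long long hHits = 0;
    unsigned long long *dHits = NULL;
    cudaMalloc(&dHits, sizeof(hHits));
    cudaMemcpy(dHits, &hHits, sizeof(hHits), cudaMemcpyHostToDevice);

    countHits<<<blocks, threadsPerBlock>>>(1234ULL, samplesPerThread, dHits);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(&hHits, dHits, sizeof(hHits), cudaMemcpyDeviceToHost);
    cudaFree(dHits);
    if (err != cudaSuccess) return -1.0;

    // Fraction of points inside the quarter circle approximates pi / 4.
    double total = (double)blocks * threadsPerBlock * samplesPerThread;
    return 4.0 * (double)hHits / total;
}

int main()
{
    printf("pi estimate: %f\n", estimatePi(64, 256, 1000));
    return 0;
}
```

The only shared step is the final atomicAdd; everything before it is independent per thread, which is what makes the computation embarrassingly parallel.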
Computer system used for workshop at UNC-Charlotte
coit-grid01.uncc.edu – coit-grid06.uncc.edu
[Figure: cluster diagram showing coit-grid01 to coit-grid06 connected through a switch.]
coit-grid01-4: each has dual Xeon processors (3.4 GHz) and 8 GB main memory
coit-grid01: system to log onto first
coit-grid05: four quad-core Xeon processors (2.93 GHz), 64 GB main memory, 1.2 TB disk
All users’ home directories are on coit-grid05 (NFS)
NVIDIA Tesla GPU (448-core Fermi)
Only available directly from on campus
Guest accounts on computer systems
Account details consist of an account name and an ssh password.
Log on first to coit-grid01 (coit-grid01.uncc.edu) and then to grid06.
Files needed for hands-on sessions are provided in each account.
More details in hands-on session write-ups.
Use PuTTY or WinSCP if on Windows.
[Figure: screenshot of the client PC, used to make sure all X servers are running. Windows shown: Xclock running on client PC; Xclock running on coit-grid01.uncc.edu; Xclock running on coit-grid06.uncc.edu; Xterm running on client PC, logged onto coit-grid06.uncc.edu; WinSCP running on client PC, connected to grid01.uncc.edu. Callouts: "User interface for accessing"; "forwarding X11 graphics"; "Not needed for workshop".]
Heat distribution problem (Solving Laplace’s equation)
Simple implementation
800 x 800 points
50000 iterations
Speed-up = 16.57
[Figure: heat distribution around a fireplace. Graphics forwards to client computer (PC).]
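A simple way to solve Laplace's equation on a grid is Jacobi iteration, where each point is repeatedly replaced by the average of its four neighbors. The sketch below shows one possible CUDA version with one thread per grid point. It is an assumption about what a "simple implementation" could look like, not the workshop's code: the names jacobiStep and centerTemperature and the boundary condition (top edge held at 100, other edges at 0) are hypothetical, and the sketch does not reproduce the speed-up figure quoted above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One Jacobi sweep: each thread replaces one interior point with the
// average of its four neighbors, reading from 'in' and writing to 'out'.
__global__ void jacobiStep(const float *in, float *out, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row > 0 && row < n - 1 && col > 0 && col < n - 1) {
        out[row * n + col] = 0.25f * (in[(row - 1) * n + col] + in[(row + 1) * n + col]
                                    + in[row * n + col - 1] + in[row * n + col + 1]);
    }
}

// Hypothetical helper: runs the given number of sweeps on an n x n grid and
// returns the value at point (n/2, n/2), or -1 if a CUDA call reported an error.
float centerTemperature(int n, int iterations)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    float *h = (float *)calloc((size_t)n * n, sizeof(float));
    for (int col = 0; col < n; col++) h[col] = 100.0f;   // assumed boundary: hot top edge

    float *dA = NULL, *dB = NULL;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    // Both buffers start identical so the fixed boundary values exist in each.
    cudaMemcpy(dA, h, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, h, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    for (int it = 0; it < iterations; it++) {
        jacobiStep<<<grid, block>>>(dA, dB, n);
        float *tmp = dA; dA = dB; dB = tmp;   // after the swap, dA holds the newest values
    }

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h, dA, bytes, cudaMemcpyDeviceToHost);
    float center = (err == cudaSuccess) ? h[(n / 2) * n + n / 2] : -1.0f;

    cudaFree(dA); cudaFree(dB);
    free(h);
    return center;
}

int main()
{
    // 800 x 800 points and 50000 iterations are the sizes quoted on the slide.
    printf("center temperature: %f\n", centerTemperature(800, 50000));
    return 0;
}
```

Two buffers are used so that every thread in a sweep reads only values from the previous sweep, which keeps the threads independent of one another within a kernel launch.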
N Body problem
Video
Questions
Next
Basic CUDA programming
Intro to 1st hands-on session