Workshop 9: General purpose computing using GPUs: Developing a hands-on undergraduate course on CUDA programming

Dr. Barry Wilkinson, University of North Carolina Charlotte
Dr. Yaohang Li, Old Dominion University

SIGCSE 2011 - The 42nd ACM Technical Symposium on Computer Science Education
Wednesday March 9, 2011, 7:00 pm - 10:00 pm

SIGCSE 2011 Workshop 9 intro.ppt © 2010 B. Wilkinson. Modification date: Feb 22, 2011

Agenda
7:00 pm - 7:15 pm  Welcome and opening remarks: GPUs and CUDA, remote server configurations, guest accounts, sample programs with graphics
7:15 pm - 8:25 pm  Session 1: Basic CUDA programming
  • Presentation
  • Hands-on experience using remote GPU server
8:25 pm - 8:35 pm  Break, with demos
8:35 pm - 9:35 pm  Session 2: Further features and performance of CUDA programs
  • Presentation
  • Guided hands-on experience
9:35 pm - 10:00 pm  Discussion of general-purpose GPU programming at undergraduate level

Emergence of GPU systems for General Purpose High Performance Computing
GPUs have developed from graphics cards into a platform for HPC.
GPUs are being designed with that application in mind.
Very significant performance improvements on scientific code.
[Chart: GPU performance gains over CPUs. GFLOPs (0 to 1400) versus date (9/22/2002 to 7/27/2009). NVIDIA GPU series: NV30, NV40, G70, G80, GT200, T12. Intel CPU series: 3 GHz Dual Core P4, 3 GHz Core2 Duo, 3 GHz Xeon Quad, Westmere.]
Source: © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009, ECE 498AL Spring 2010, University of Illinois, Urbana-Champaign

http://www.hpcwire.com/blogs/New-China-GPGPU-Super-Outruns-Jaguar-105987389.html

A hot topic to teach
http://www.nvidia.com/object/cuda_courses_and_map.html
Taught at Illinois, Stanford, MIT, Harvard, Duke, Chapel Hill, UNC-C, …
Taught at graduate level and now moving into undergraduate level.

GPU Course for High Performance Computing
Concerned with using Graphics Processing Units (GPUs) for high performance computing, not graphics.
A programming course.
Uses CUDA (Compute Unified Device Architecture), an architecture and programming model introduced by NVIDIA in 2007.
C-based. Easy to learn.

NVIDIA products
NVIDIA Corp. is the leader in GPUs for high performance computing: the Tesla 2050 GPU has 448 thread processors.
[Timeline figure, 1993 to 2010, with labels:]
  • 1993: Established by Jen-Hsun Huang, Chris Malachowsky, Curtis Priem
  • 1995: NV1
  • 1999: GeForce 1
  • GeForce 2 series, GeForce FX series
  • GeForce 8 series (GeForce 8800, GT 80): NVIDIA's first GPU with general purpose processors
  • CUDA
  • GeForce 200 series: GeForce GTX260/275/280/285/295
  • GeForce 400 series: GTX460/465/470/475/480/485
  • Quadro
  • Tesla: C870, S870, C1060, S1070, C2050, …
  • Fermi, Kepler (2011), Maxwell (2013)

Programming Model
GPUs were historically designed for creating image data for displays.
This involves manipulating image picture elements (pixels), often with the same operation on each pixel.
SIMD (Single Instruction Multiple Data) model: an efficient mode of operation in which the same operation is done on each data element at the same time.
GPUs use a thread version of SIMD called Single Instruction Multiple Thread (SIMT).

GPU’s SIMT Programming Model
GPUs use very lightweight threads to achieve high parallel performance and to hide memory latency.
Multiple threads each execute the same instruction sequence.
A very large number of threads (tens of thousands) is possible on GPUs.
Threads are mapped onto the available processors on the GPU (hundreds of processors), all executing the same program sequence.
More on the programming model shortly.

Programming applications using SIMT model
Matrix operations: very amenable to SIMT
  • Same operations done on different elements of matrices
Some "embarrassingly" parallel computations such as Monte Carlo calculations
  • Monte Carlo calculations use random selections that are independent of each other
Data manipulations
  • Some sorting can be done quite efficiently

Computer system used for workshop at UNC-Charlotte
[Diagram: coit-grid01 to coit-grid06 and a switch, with labels:]
  • coit-grid01 to coit-grid04: each has dual Xeon processors (3.4 GHz), 8 GB main memory
  • coit-grid01: system to log onto first
  • coit-grid05: four quad-core Xeon processors (2.93 GHz), 64 GB main memory, 1.2 TB disk
  • NVIDIA Tesla GPU (448-core Fermi)
  • Label in diagram: "Only available directly from on campus"
  • All users' home directories are on coit-grid05 (NFS)
  • Hosts: coit-grid01.uncc.edu to coit-grid06.uncc.edu

Guest accounts on computer systems
Account details consist of an account name and an ssh password.
Log on first to coit-grid01 and then to coit-grid06.
Files needed for hands-on sessions are provided in each account.
More details in hands-on session write-ups.
On Windows, use PuTTY or WinSCP.
[Screenshot label: coit-grid01.uncc.edu]

User interface for forwarding X11 graphics
Not needed for workshop.
[Screenshot labels:]
  • To make sure all X servers are running: Xclock running on client PC, Xclock running on coit-grid01.uncc.edu, Xclock running on coit-grid06.uncc.edu
  • Xterm running on client PC, logged onto coit-grid06.uncc.edu
  • WinSCP running on client PC, connected to coit-grid01.uncc.edu

Heat distribution problem (Solving Laplace’s equation)
Simple implementation
800 x 800 points
50000 iterations
Speed-up = 16.57
[Figure label: Fireplace]
Graphics forwarded to client computer (PC)

N Body problem

Video

Questions

Next
Basic CUDA programming
Intro to 1st hands-on session
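As a preview of the basic CUDA programming session, the SIMT model described earlier can be sketched with a vector addition, in which every thread executes the same kernel on a different element. This is a minimal illustrative sketch, not code from the workshop materials: the names vecAdd and runVecAdd, the problem size of 10,000 elements, and the block size of 256 threads are all assumptions chosen for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: every thread runs this same instruction sequence (SIMT),
// each on the element selected by its own global thread index.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // the grid may contain more threads than n
        c[i] = a[i] + b[i];
}

// Hypothetical helper: adds two n-element vectors on the GPU and
// returns true only if every CUDA call succeeded and every sum is correct.
bool runVecAdd(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    if (!a || !b || !c) { free(a); free(b); free(c); return false; }
    for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f * (float)i; }

    // Allocate device memory and copy the inputs to the GPU.
    float *dA = NULL, *dB = NULL, *dC = NULL;
    bool ok = cudaMalloc((void **)&dA, bytes) == cudaSuccess
           && cudaMalloc((void **)&dB, bytes) == cudaSuccess
           && cudaMalloc((void **)&dC, bytes) == cudaSuccess
           && cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // One thread per element; round the block count up to cover n.
        int threadsPerBlock = 256;  // assumed block size
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        vecAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // Check the results on the host: each element should equal 3 * i.
    for (int i = 0; ok && i < n; i++)
        ok = (c[i] == 3.0f * (float)i);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);   // cudaFree(NULL) is a no-op
    free(a); free(b); free(c);
    return ok;
}

int main(void)
{
    int n = 10000;                  // assumed problem size
    printf("vecAdd on %d elements: %s\n", n, runVecAdd(n) ? "PASSED" : "FAILED");
    return 0;
}
```

Assuming the file is saved as vecadd.cu (a hypothetical name), it should build with nvcc vecadd.cu -o vecadd on a machine with the CUDA toolkit and a CUDA-capable GPU. Launching one thread per element, rather than looping inside a thread, is what lets the hardware schedule thousands of lightweight threads and hide memory latency.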