### CUDA Library and Demo

Yafeng Yin, Lei Zhou, Hong Man 07/21/2010

### Outline

• Basic CUDA computation library
  – GPULib, CUBLAS, CUFFT
• Advanced CUDA computation library
  – CULA/MAGMA, VSIPL
• CUDA FIR Demo (UMD)
• Discussion and future work

### Basic lib - GPULib

GPULib provides a library of mathematical functions:

– Addition, subtraction, multiplication, and division, as well as unary functions, including sin(), cos(), gamma(), and exp()
– Interpolation, array reshaping, array slicing, and reduction operations

### CUBLAS

• BLAS: Basic Linear Algebra Subprograms
• CUBLAS: provides a set of functions for basic vector and matrix operations, such as vector copy, swap, dot product, and Euclidean norm
  – Real data
    • Level 1 (vector-vector, O(N))
    • Level 2 (matrix-vector, O(N^2))
    • Level 3 (matrix-matrix, O(N^3))
  – Complex data
    • Level 1

### CUBLAS - Level 2 functions

| Function | Operation |
| --- | --- |
| cublasSgbmv() | y = alpha * op(A) * x + beta * y |
| cublasSgemv() | y = alpha * op(A) * x + beta * y |
| cublasSger() | A = alpha * x * y^T + A |
| cublasSsbmv() | y = alpha * A * x + beta * y |
| cublasSspmv() | y = alpha * A * x + beta * y |
| cublasSspr() | A = alpha * x * x^T + A |
| cublasSspr2() | A = alpha * x * y^T + alpha * y * x^T + A |
| cublasSsymv() | y = alpha * A * x + beta * y |
| cublasSsyr() | A = alpha * x * x^T + A |
| cublasSsyr2() | A = alpha * x * y^T + alpha * y * x^T + A |
| cublasStbmv() | x = op(A) * x |
| cublasStbsv() | op(A) * x = b, output x |
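To make the matrix-vector update y = alpha * op(A) * x + beta * y concrete, here is a minimal CPU reference sketch in plain Python. It is not the CUBLAS API: the function name `gemv_reference` and the `transpose` flag are hypothetical, chosen only for illustration, and op(A) is assumed to be either A or its transpose.

```python
def gemv_reference(alpha, a, x, beta, y, transpose=False):
    """CPU reference for y = alpha * op(A) * x + beta * y.

    Hypothetical helper for illustration only; it is not a CUBLAS call.
    `a` is a list of rows; op(A) is A, or A^T when transpose=True.
    """
    rows, cols = len(a), len(a[0])
    if transpose:
        # Build A^T explicitly so the update below is the same in both cases.
        op_a = [[a[i][j] for i in range(rows)] for j in range(cols)]
    else:
        op_a = a
    if len(op_a[0]) != len(x) or len(op_a) != len(y):
        raise ValueError("dimension mismatch between op(A), x, and y")
    # One dot product per output element, scaled by alpha, plus beta * y.
    return [
        alpha * sum(row[k] * x[k] for k in range(len(x))) + beta * y[i]
        for i, row in enumerate(op_a)
    ]
```

A reference like this can be useful for checking GPU results on small inputs before timing larger ones.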

### Basic lib - CUFFT

• CUFFT is the CUDA FFT library
  – Provides a simple interface for computing parallel FFTs on an NVIDIA GPU
  – Allows users to leverage the floating-point power and parallelism of the GPU without having to develop a GPU-based FFT implementation
• cufftPlan1d(), cufftPlan2d(), cufftPlan3d()
  – Create a 1D, 2D, or 3D FFT plan configuration for a specified signal size
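As a point of comparison when validating GPU FFT output, the sketch below computes a naive O(N^2) forward DFT on the CPU in plain Python. It is not CUFFT code. It assumes the usual unnormalized forward convention with a negative exponent, and the name `dft_forward` is hypothetical.

```python
import cmath


def dft_forward(signal):
    """Naive O(N^2) forward DFT, unnormalized, negative exponent.

    Hypothetical CPU reference for small inputs; an FFT library computes
    the same values with far fewer operations.
    """
    n = len(signal)
    return [
        sum(signal[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

For example, a constant signal puts all of its energy in bin 0, and a unit impulse yields a flat spectrum.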

### Advanced lib – CULA and MAGMA

• CULA: GPU Accelerated Linear Algebra
  – Provides LAPACK (Linear Algebra PACKage) functions on CUDA GPUs
• MAGMA: Matrix Algebra on GPU and Multicore Architectures
  – Develops a dense linear algebra library similar to LAPACK, but for heterogeneous/hybrid architectures and "Multicore+GPU" systems

• Linear Equation Routines
  – Solve a general system of linear equations AX = B
• Orthogonal Factorizations
  – LQ, RQ factorization
• Least Squares Routines
• Symmetric and Non-Symmetric Eigenvalue Routines
• Singular Value Decomposition (SVD) Routines

• LAPACK on CUDA GPUs
  – LU, QR, and Cholesky factorizations in both real and complex arithmetic (single and double)
  – Linear solvers based on LU, QR, and Cholesky in real arithmetic (single and double)
  – Mixed-precision iterative refinement solvers based on LU, QR, and Cholesky in real arithmetic
  – Reduction to upper Hessenberg form in real arithmetic (single and double)
  – MAGMA BLAS in real arithmetic (single and double)
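To illustrate what a linear solver for Ax = b computes, here is a minimal CPU sketch of Gaussian elimination with partial pivoting, the elimination scheme that underlies LU-based solvers. It is plain Python for a single right-hand side, not the CULA or MAGMA API, and the name `solve_linear_system` is hypothetical.

```python
def solve_linear_system(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Hypothetical CPU reference for small dense systems; it is not a
    CULA or MAGMA routine. `a` is a list of rows, `b` a list of values.
    """
    n = len(a)
    # Work on an augmented copy [A | b] so the inputs are left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x
```

Pivoting is included because a zero or tiny diagonal entry would otherwise break the elimination even for well-behaved systems.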

• VSIPL: Vector Signal Image Processing Library
  – Generalized matrix product
  – Fast FIR filtering
  – Correlation
  – Fast Fourier Transform
  – QR decomposition
  – Random number generation
  – Elementwise arithmetic, logical, and comparison operators; linear algebra procedures

### Basic vector or matrix computation

GPULib, CUBLAS, CUFFT

– Vector or matrix addition, subtraction, multiplication, and division
– sin(), cos(), sort, dot product

Libraries that can be used for signal processing: CULA/MAGMA, VSIPL

– LU, QR, and Cholesky factorizations
– Singular value decomposition (SVD)

### CUDA Demo (FIR)

• GPU: NVIDIA GeForce 8600 GT
• CPU: Intel Duo CPU, 2.33 GHz
• Software: Visual Studio 2005
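For reference, an FIR filter computes each output sample as a weighted sum of the current and previous input samples, y[n] = sum over k of h[k] * x[n-k]. The sketch below is a minimal direct-form CPU version in plain Python. It is not the demo's code: the name `fir_filter` is hypothetical, and samples before the start of the signal are assumed to be zero.

```python
def fir_filter(taps, signal):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * signal[n - k].

    Hypothetical CPU reference; samples before the start of the signal
    are treated as zero. The output has the same length as the input.
    """
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # skip taps that reach before the first sample
                acc += h * signal[n - k]
        output.append(acc)
    return output
```

Each output sample depends only on the input, not on other outputs, which is why the computation maps naturally onto one GPU thread per output sample.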

### CUDA Demo (FIR)

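| Output No. | GPU Run Time (msec) | Memory Time (msec) | Total Time CPU + GPU | CPU Only Time (msec) |
| --- | --- | --- | --- | --- |
| 1000 | 0.312121 | 0.166641 | | |
| 10000 | 0.667264 | 0.284254 | | |
| 100000 | 4.210870 | 1.489784 | | |
| 1000000 | 39.460812 | 5.597150 | | |
| 10000000 | 391.816345 | 48.080204 | | |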

### CUDA Demo (FIR)

[Figure: FIR Performance. Series: CPU, CPU+GPU. Horizontal axis: 1000 to 10000000; vertical axis: 0 to 5000.]

### Discussion and future work

• How to connect CUDA to the SSP re-hosting demo
• How to convert the sequentially executed code in the signal processing system to CUDA code
• How to translate the XML code to CUDA code to generate the CUDA input
