Transcript 7-20-2010-CUDA lib summary
CUDA Library and Demo
Yafeng Yin, Lei Zhou, Hong Man 07/21/2010
Outline
• Basic CUDA computation libraries
  – GPULib, CUBLAS, CUFFT
• Advanced CUDA computation libraries
  – CULA/MAGMA, VSIPL
• CUDA FIR demo (UMD)
• Discussion and future work
Basic lib - GPULib
• GPULib provides a library of mathematical functions:
  – addition, subtraction, multiplication, and division, as well as unary functions such as sin(), cos(), gamma(), and exp()
  – interpolation, array reshaping, array slicing, and reduction operations
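These slides do not describe how GPULib implements its operations. As a hedged illustration of what array arithmetic and a reduction compute, the plain-Python sketch below applies a function element by element and then performs a pairwise (tree) reduction. Tree reduction is a common GPU strategy that finishes in ceil(log2 N) passes, but that choice is an assumption here. `elementwise` and `tree_reduce` are hypothetical helpers, not GPULib calls.

```python
import math

def elementwise(op, a, b=None):
    """Apply op to each element (unary) or to each pair of elements (binary)."""
    if b is None:
        return [op(x) for x in a]
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return [op(x, y) for x, y in zip(a, b)]

def tree_reduce(op, values):
    """Pairwise reduction: each pass combines neighbours, roughly halving the data.

    Returns (result, number_of_passes).
    """
    if not values:
        raise ValueError("cannot reduce an empty array")
    data = list(values)
    passes = 0
    while len(data) > 1:
        # On a GPU, every pair in one pass could be combined by a separate thread.
        nxt = [op(data[i], data[i + 1]) for i in range(0, len(data) - 1, 2)]
        if len(data) % 2 == 1:
            nxt.append(data[-1])  # an odd element is carried into the next pass
        data = nxt
        passes += 1
    return data[0], passes

sines = elementwise(math.sin, [0.0, 0.5, 1.0])
```

For example, `tree_reduce(lambda a, b: a + b, list(range(1, 9)))` returns `(36, 3)`: eight values are combined in three passes.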
Basic lib - CUBLAS
• BLAS: Basic Linear Algebra Subprograms
• CUBLAS: the BLAS implemented on CUDA
  – Provides a set of functions for basic vector and matrix operations, such as vector copy and swap, dot product, and Euclidean norm
  – Real data
    • Level 1 (vector-vector, $O(N)$)
    • Level 2 (matrix-vector, $O(N^2)$)
    • Level 3 (matrix-matrix, $O(N^3)$)
  – Complex data
    • Level 1
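A hedged CPU reference for three Level 1 operations, written in plain Python on the assumption that it would only be used to check GPU results. The names `axpy`, `dot`, and `nrm2` follow BLAS naming (the single-precision CUBLAS versions are cublasSaxpy(), cublasSdot(), and cublasSnrm2()), but these functions are not the CUBLAS API. Each one touches every element once, which is the $O(N)$ cost of Level 1.

```python
import math

def axpy(alpha, x, y):
    """Return alpha * x + y element by element (one multiply-add per element)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def dot(x, y):
    """Return the dot product sum(x[i] * y[i])."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))

def nrm2(x):
    """Euclidean norm; dividing by max|x[i]| first avoids overflow when squaring."""
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0:
        return 0.0
    return scale * math.sqrt(sum((v / scale) ** 2 for v in x))
```

For example, `nrm2([3.0, 4.0])` returns `5.0`, and `nrm2([1e200, 1e200])` stays finite instead of overflowing.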
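CUBLAS-Level 2 functions (op(A) is $A$ or $A^T$)
• cublasSgbmv(): $y = \alpha\,\mathrm{op}(A)\,x + \beta y$ (A general banded)
• cublasSgemv(): $y = \alpha\,\mathrm{op}(A)\,x + \beta y$ (A general)
• cublasSger(): $A = \alpha\,x y^T + A$ (rank-1 update)
• cublasSsbmv(): $y = \alpha A x + \beta y$ (A symmetric banded)
• cublasSspmv(): $y = \alpha A x + \beta y$ (A symmetric, packed storage)
• cublasSspr(): $A = \alpha\,x x^T + A$ (A symmetric, packed storage)
• cublasSspr2(): $A = \alpha\,x y^T + \alpha\,y x^T + A$ (A symmetric, packed storage)
• cublasSsymv(): $y = \alpha A x + \beta y$ (A symmetric)
• cublasSsyr(): $A = \alpha\,x x^T + A$ (A symmetric)
• cublasSsyr2(): $A = \alpha\,x y^T + \alpha\,y x^T + A$ (A symmetric)
• cublasStbmv(): $x = \mathrm{op}(A)\,x$ (A triangular banded)
• cublasStbsv(): solves $\mathrm{op}(A)\,x = b$ and outputs $x$ (A triangular banded)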
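The sketch below makes the Level 2 formulas concrete with a hedged plain-Python reference for the general matrix-vector product (gemv), the rank-1 update (ger), and a triangular solve. Assumptions: matrices are row-major lists of lists, whereas CUBLAS itself expects column-major storage. `lower_solve` is a hypothetical dense lower-triangular solver. The banded routines (tbmv, tbsv) compute the same result while storing only the band. The nested loops show the $O(N^2)$ cost of Level 2.

```python
def gemv(alpha, A, x, beta, y, trans=False):
    """Return alpha * op(A) * x + beta * y, where op(A) is A or A^T."""
    m, n = len(A), len(A[0])
    if trans:
        # op(A) = A^T is n x m: x has length m and the result has length n
        return [alpha * sum(A[i][j] * x[i] for i in range(m)) + beta * y[j]
                for j in range(n)]
    return [alpha * sum(A[i][j] * x[j] for j in range(n)) + beta * y[i]
            for i in range(m)]

def ger(alpha, x, y, A):
    """Return alpha * x * y^T + A (rank-1 update)."""
    return [[A[i][j] + alpha * x[i] * y[j] for j in range(len(y))]
            for i in range(len(x))]

def lower_solve(L, b):
    """Solve L * x = b for lower-triangular L by forward substitution."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i] - sum(L[i][j] * x[j] for j in range(i))
        x[i] = s / L[i][i]
    return x
```

For example, with `A = [[1, 2], [3, 4]]`, `gemv(1.0, A, [1, 1], 0.0, [0, 0])` gives the row sums `[3.0, 7.0]`, and passing `trans=True` gives the column sums `[4.0, 6.0]`.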
Basic lib - CUFFT
• CUFFT is the CUDA FFT library
  – Provides a simple interface for computing parallel FFTs on an NVIDIA GPU
  – Allows users to leverage the floating-point power and parallelism of the GPU without having to develop a GPU-based FFT implementation
  – cufftPlan1d(), cufftPlan2d(), cufftPlan3d(): create a 1D, 2D, or 3D FFT plan configuration for a specified signal size
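These slides do not cover CUFFT's internals. As a hedged reference for what a 1D transform computes, the sketch below compares the direct $O(N^2)$ DFT with a recursive radix-2 Cooley-Tukey FFT, which produces the same result in $O(N \log N)$. It assumes a power-of-two length and uses the forward convention X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N) with no scaling. Consult the CUFFT documentation for the library's own conventions and supported sizes. `dft` and `fft` are hypothetical helper names.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor * odd part
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out
```

A CUFFT plan fixes the transform size and configuration before any data are transformed, so setup work is done once and reused across executions. This sketch has no setup step and simply recurses on every call.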
Advanced lib - CULA and MAGMA
• CULA: GPU Accelerated Linear Algebra
  – Provides LAPACK (Linear Algebra PACKage) functions on CUDA GPUs
• MAGMA: Matrix Algebra on GPU and Multicore Architectures
  – Develops a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures and "Multicore+GPU" systems
Advanced lib - CULA functions
• Linear Equation Routines
  – Solve a general system of linear equations AX = B
• Orthogonal Factorizations
  – LQ and RQ factorizations
• Least Squares Routines
• Symmetric and Non-symmetric Eigenvalue Routines
• Singular Value Decomposition (SVD) Routines
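For the first item, solving AX = B, here is a hedged sketch of what a LAPACK-style general solver computes: LU factorization with partial pivoting (PA = LU), then forward and back substitution for each column of B. This is the textbook algorithm in plain Python, not CULA's implementation or API. `lu_factor` and `lu_solve` are hypothetical names.

```python
def lu_factor(A):
    """LU with partial pivoting. Returns (LU, piv) with P*A = L*U packed in LU.

    piv[i] is the original row of A that ends up in position i.
    """
    n = len(A)
    LU = [list(map(float, row)) for row in A]
    piv = list(range(n))
    for k in range(n):
        # choose the row with the largest |entry| in column k as the pivot
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        if p != k:
            LU[k], LU[p] = LU[p], LU[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, piv

def lu_solve(LU, piv, B):
    """Solve A X = B for every column of B using the factors from lu_factor."""
    n, cols = len(LU), len(B[0])
    X = [[0.0] * cols for _ in range(n)]
    for c in range(cols):
        b = [B[piv[i]][c] for i in range(n)]  # apply the row permutation P
        y = [0.0] * n
        for i in range(n):                    # forward substitution, unit-diagonal L
            y[i] = b[i] - sum(LU[i][j] * y[j] for j in range(i))
        for i in reversed(range(n)):          # back substitution with U
            X[i][c] = (y[i] - sum(LU[i][j] * X[j][c] for j in range(i + 1, n))) / LU[i][i]
    return X
```

Splitting factorization from solution lets the $O(N^3)$ factorization be computed once and reused for any number of right-hand sides at $O(N^2)$ each.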
Advanced lib - MAGMA
• LAPACK on CUDA GPUs
  – LU, QR, and Cholesky factorizations in both real and complex arithmetic (single and double)
  – Linear solvers based on LU, QR, and Cholesky in real arithmetic (single and double)
  – Mixed-precision iterative refinement solvers based on LU, QR, and Cholesky in real arithmetic
  – Reduction to upper Hessenberg form in real arithmetic (single and double)
  – MAGMA BLAS in real arithmetic (single and double)
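The mixed-precision iterative refinement solvers listed above can be illustrated with a small sketch. Factor and solve in single precision, which was typically the faster path on GPUs of this period. Then compute the residual r = b - Ax in double precision and correct x until it reaches double-precision accuracy. Assumptions: single precision is emulated by rounding through Python's `struct` module, the matrix is symmetric positive definite so Cholesky applies, and `refine` and the other helpers are hypothetical, not MAGMA routines.

```python
import math
import struct

def to_single(v):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def cholesky_single(A):
    """Cholesky factor L (A = L L^T), storing every entry of L in single precision."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not symmetric positive definite")
        L[j][j] = to_single(math.sqrt(d))
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = to_single(s / L[j][j])
    return L

def chol_solve(L, b):
    """Solve L L^T x = b (forward, then back substitution), rounding to single."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = to_single((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = to_single((y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i])
    return x

def refine(A, b, iters=5):
    """Mixed-precision refinement: single-precision factor/solve, double residual."""
    n = len(b)
    A_single = [[to_single(v) for v in row] for row in A]
    L = cholesky_single(A_single)  # factor once in (emulated) single precision
    x = chol_solve(L, b)           # initial single-precision solution
    for _ in range(iters):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # double
        d = chol_solve(L, r)       # correction solved with the single-precision factor
        x = [xi + di for xi, di in zip(x, d)]  # update accumulated in double
    return x
```

For example, with `A = [[4.0, 1.0], [1.0, 3.0]]` and `b = [1.0, 2.0]`, the single-precision solve alone is accurate to roughly 1e-7, while `refine(A, b)` converges to the exact solution `[1/11, 7/11]` to within double-precision rounding.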
Advanced lib - VSIPL
• VSIPL: Vector Signal Image Processing Library
  – Generalized matrix product
  – Fast FIR filtering
  – Correlation
  – Fast Fourier Transform
  – QR decomposition
  – Random number generation
  – Elementwise arithmetic, logical, and comparison operators; linear algebra procedures
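The sketch below covers two of the VSIPL operations above, fast FIR filtering and correlation, in plain Python. The fast path assumes the common FFT-based method: zero-pad both sequences to a power of two at least len(x) + len(h) - 1, multiply the spectra, and inverse-transform. It is checked against the direct-form convolution. VSIPL's own algorithms and API are not shown, and all function names are hypothetical.

```python
import cmath

def fft(x, inverse=False):
    """Radix-2 FFT; len(x) must be a power of two. The inverse is unnormalized."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def fir_direct(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k] (full linear convolution)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y

def fir_fast(h, x):
    """FIR via FFT: zero-pad, multiply spectra, inverse FFT, keep len(x)+len(h)-1."""
    m = len(x) + len(h) - 1
    size = 1
    while size < m:
        size *= 2
    H = fft(list(h) + [0.0] * (size - len(h)))
    X = fft(list(x) + [0.0] * (size - len(x)))
    y = fft([a * b for a, b in zip(H, X)], inverse=True)
    return [v.real / size for v in y[:m]]  # dividing by size normalizes the inverse

def xcorr(a, b):
    """Cross-correlation r[lag] = sum_n a[n + lag] * b[n], lags -(len(b)-1)..len(a)-1."""
    lags = range(-(len(b) - 1), len(a))
    return [sum(a[n + lag] * b[n] for n in range(len(b)) if 0 <= n + lag < len(a))
            for lag in lags]
```

The direct form costs len(h) multiply-adds per output, while the FFT route costs $O(N \log N)$ overall, which is why it pays off for long filters.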
CUDA library Summary
• Basic vector or matrix computation
  – GPULib, CUBLAS, CUFFT
  – Vector or matrix addition, subtraction, multiplication, and division; sin(), cos(); sort; dot product
• Libraries that can be used for signal processing
  – CULA/MAGMA, VSIPL
  – LU, QR, and Cholesky factorizations; singular value decomposition (SVD)
CUDA Demo (FIR)
• GPU: NVIDIA GeForce 8600 GT
• CPU: Intel Duo CPU, 2.33 GHz
• Software: Visual Studio 2005
CUDA Demo (FIR)
Table columns: Output NO, GPU Run Time (msec), Memory Time (msec), Total Time CPU + GPU, CPU Only Time (msec)
Output NO: 1000, 10000, 100000, 1000000, 10000000
Timing series 1 (msec): 0.312121, 0.667264, 4.210870, 39.460812, 391.816345
Timing series 2 (msec): 0.166641, 0.284254, 1.489784, 5.597150, 48.080204
CUDA Demo (FIR)
[Figure: FIR Performance, CPU vs. CPU+GPU, Output NO from 1000 to 10000000]
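The demo's kernel is not described in these slides. This hedged sketch shows why FIR maps well to CUDA: each output sample depends only on the input and the filter taps, so one thread can compute one output independently. The Python below emulates a 1D grid, using the global index block * block_size + thread and a bounds guard. `fir_thread`, `launch_fir`, the block size of 256, and the full-convolution output length are all assumptions for illustration.

```python
def fir_thread(h, x, y, i):
    """Work of one hypothetical GPU thread: compute output sample y[i] only."""
    if i >= len(y):  # guard, like `if (i < n)` in a CUDA kernel
        return
    acc = 0.0
    for k in range(len(h)):
        if 0 <= i - k < len(x):
            acc += h[k] * x[i - k]
    y[i] = acc

def launch_fir(h, x, block_size=256):
    """Emulate a 1D grid with enough blocks of block_size threads to cover every output."""
    n_out = len(x) + len(h) - 1
    y = [0.0] * n_out
    n_blocks = (n_out + block_size - 1) // block_size  # ceiling division
    for block in range(n_blocks):
        for thread in range(block_size):
            fir_thread(h, x, y, block * block_size + thread)  # global thread index
    return y, n_blocks
```

Because no thread writes another thread's output, the loops over `block` and `thread` could run in any order. That independence is what a GPU exploits, and it is also why host-to-device memory transfers, rather than arithmetic, can dominate for small problem sizes.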
Discussion and future work
• How to connect CUDA to the SSP re-hosting demo
• How to convert the sequentially executed code in the signal processing system into CUDA code
• How to translate the XML code into CUDA code to generate the CUDA input