"Distributed Computing and Grid-technologies
in Science and Education"
PROSPECTS OF USING GPU IN
DESKTOP-GRID SYSTEMS
Klimov Georgy
Dubna, 2012
AGENDA
• Grid & GPU
• GPU architecture
• CUDA technology
• Grid projects using GPUs
• Monotonic Basin Hopping method
• CUDA implementation of MBH
• Plan for further research
• Summary
Grid & GPU
Problems solved by Grid:
• efficient use of existing resources
• working with huge data arrays
• providing high performance
GPU advantages:
• ~33% of all PCs are equipped with a modern GPU (~60% of them from Nvidia)
• Typical GPU utilization is below 5% (e.g. when playing an HD film)
• GPUs are optimized for working with huge texture arrays
• Modern GPUs consist of tens or even hundreds of cores, which means high performance for some kinds of tasks
PROSPECTS OF USING GPU IN DESKTOP-GRID SYSTEMS
Klimov G., CMC MSU 2012
GPU architecture
• Scalable array of TPCs
• With its own DRAM
• 8 Scalar Processors
• 2 Special Functions Units
• Double Precision Unit
• Register File
• Shared Memory
• Texture Memory Cache
• Constant Memory Cache
CUDA technology
CUDA – Compute Unified Device Architecture
• Supports all NVIDIA GPUs starting from the GeForce 8 series
• Low-level access to the hardware; knowledge of a graphics API is not required
• The CUDA programming language is based on C/C++ syntax, which makes porting existing code easier
• Higher performance compared to OpenCL (a 50-100% performance gain reported in various studies)
CUDA technology
CUDA programming model
CUDA technology
CUDA threads hierarchy
• Threads are grouped into Blocks (1-, 2- or 3-dimensional)
• Blocks are grouped into a Grid (1- or 2-dimensional)
• Threads within a Block can:
  - share data through shared memory
  - synchronize their execution
• Threads from different Blocks operate independently
• Built-in variables: threadIdx, blockIdx, etc.
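As a rough illustration of the hierarchy above, the sketch below emulates in plain Python how a thread can derive a unique global index from its block and thread coordinates. In CUDA C the same expression is usually written as blockIdx.x * blockDim.x + threadIdx.x. The names global_thread_id and enumerate_threads and the launch sizes are hypothetical, chosen only for this example.

```python
def global_thread_id(block_idx, block_dim, thread_idx):
    """Flat index of a thread in a 1-dimensional grid of 1-dimensional blocks."""
    return block_idx * block_dim + thread_idx


def enumerate_threads(num_blocks, threads_per_block):
    """Yield (block index, thread index, global id) for every thread of a launch."""
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            yield block_idx, thread_idx, global_thread_id(
                block_idx, threads_per_block, thread_idx)


if __name__ == "__main__":
    # Hypothetical launch configuration: 2 blocks of 4 threads each.
    for block_idx, thread_idx, gid in enumerate_threads(2, 4):
        print(f"blockIdx={block_idx} threadIdx={thread_idx} -> global id {gid}")
```

Every thread gets a distinct global id, so each one can be assigned its own piece of the problem without coordination between blocks.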
CUDA technology
CUDA memory hierarchy
Memory type   Access   Level        Speed
Registers     R/W      Per-thread   High (on chip)
Local         R/W      Per-thread   Low (DRAM)
Shared        R/W      Per-block    High (on chip)
Global        R/W      Per-grid     Low (DRAM)
Constant      R/O      Per-grid     High (L1 cache)
Texture       R/O      Per-grid     High (L1 cache)
Grid projects using GPUs
GPUgrid.net - volunteer distributed computing project for biomedical research from the Universitat Pompeu Fabra in Barcelona (Spain)
Collatz Conjecture - research in mathematics, specifically testing the Collatz Conjecture, also known as 3x+1 or HOTPO (half or triple plus one)
PrimeGrid - to bring the excitement of prime finding
Monotonic Basin Hopping method
Algorithm steps:
1. Start from point x0 and set x = x0
2. Repeat until the stop condition is met:
2.1. generate a perturbed point Φ(x)
2.2. apply the local minimization algorithm to the point Φ(x) → get point x1
2.3. if f(x1) < f(x), then x = x1
3. Return x
* Gradient descent was used as the local minimization algorithm
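The steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the talk: the perturbation Φ(x) is assumed to be a uniform random shift of each coordinate by at most radius, the gradient is approximated by central differences, and the stop condition is assumed to be a fixed number of steps. All function and parameter names are hypothetical.

```python
import random


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def gradient_descent(f, x, step, iters):
    """Local minimization: fixed-step gradient descent started from x."""
    x = list(x)
    for _ in range(iters):
        g = numerical_gradient(f, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def monotonic_basin_hopping(f, x0, radius, max_steps,
                            step=0.001, iters=2000, seed=0):
    """MBH sketch; the step numbers in comments refer to the slide."""
    rng = random.Random(seed)
    x = list(x0)                                     # 1. start from x0
    for _ in range(max_steps):                       # 2. repeat (assumed stop rule)
        # 2.1. assumed form of the jump: uniform shift within +/- radius
        candidate = [xi + rng.uniform(-radius, radius) for xi in x]
        x1 = gradient_descent(f, candidate, step, iters)   # 2.2. local search
        if f(x1) < f(x):                             # 2.3. accept only improvements
            x = x1
    return x                                         # 3. return x
```

The default step is deliberately small because fixed-step gradient descent diverges on strongly oscillating functions if the step is too large; for smooth functions a larger step converges faster.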
CUDA implementation of MBH
• Divide the search area into equal square areas
• Each thread runs the algorithm in its own area
• Find the minimum among the results of all threads
[Figure: the region from (Xmin, Ymin) to (Xmax, Ymax) divided into a grid of square cells, with cell (i, j) labeled]
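The decomposition above can be emulated sequentially in Python to show the data flow. This is only a sketch under stated assumptions: the region is split into an n x n grid, a flat thread id is mapped to cell (i, j), and search_cell is a hypothetical stand-in that evaluates the function at the cell centre where a real run would execute MBH inside the cell. The cells are square only when the region itself is square.

```python
def cell_bounds(i, j, n, xmin, xmax, ymin, ymax):
    """Bounds (x_lo, x_hi, y_lo, y_hi) of cell (i, j) in an n x n grid."""
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    return (xmin + i * dx, xmin + (i + 1) * dx,
            ymin + j * dy, ymin + (j + 1) * dy)


def search_cell(f, bounds):
    """Stand-in for the per-thread search (NOT MBH): evaluate f at the cell centre."""
    x_lo, x_hi, y_lo, y_hi = bounds
    point = (0.5 * (x_lo + x_hi), 0.5 * (y_lo + y_hi))
    return f(point), point


def grid_minimum(f, n, xmin, xmax, ymin, ymax):
    """Run one search per cell, then reduce the results to the overall minimum."""
    results = []
    for tid in range(n * n):            # one loop iteration per emulated thread
        i, j = tid % n, tid // n        # assumed mapping of thread id to cell
        bounds = cell_bounds(i, j, n, xmin, xmax, ymin, ymax)
        results.append(search_cell(f, bounds))
    return min(results, key=lambda r: r[0])   # minimum among all thread results
```

On a GPU the loop body would run concurrently in separate threads, and only the final reduction needs the results of all of them.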
CUDA implementation of MBH
Hardware used:
GPU1 - Tesla 10:
max threads per block = 512
max threads per dimension = 512
max blocks per dimension = 65535
number of multiprocessors = 30
GPU2 - GeForce GT 525M:
max threads per block = 1024
max threads per dimension = 1024
max blocks per dimension = 65535
number of multiprocessors = 2
CPU - Intel Core 2 Duo T6400:
number of cores = 2
clock speed = 2 GHz
CUDA implementation of MBH
Methodology of the experiment
• Four parameters: the radius of the MBH “jump” r, the maximum number of steps in the loop N, the number of blocks launched Nb, and the number of threads per block Nt
• Nb and Nt are set first
• The radius r is calculated as half the diameter (diagonal) of a square area
• The number of loop steps N is determined experimentally *
• 4 test functions were selected: Ackley, Griewank, Rastrigin, Shubert
1. A result is considered valid if it differs from the tabulated value by less than 0.001
2. The result is considered reliable if, on average, 9 runs out of 10 give the correct answer within the specified accuracy
3. The time is averaged over 20 runs of the program
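For reference, the four test functions and the validity check can be written down as in the sketch below. The slides do not say which variants, dimensions or domains were used, so the common textbook forms are assumed here. The tabulated Shubert minimum of about -186.7309 applies to the two-dimensional case. The jump radius reads the "diameter" of a square area as its diagonal, which is also an assumption, and the helper names are hypothetical.

```python
import math


def ackley(p):
    n = len(p)
    s1 = sum(x * x for x in p) / n
    s2 = sum(math.cos(2.0 * math.pi * x) for x in p) / n
    return -20.0 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20.0 + math.e


def griewank(p):
    s = sum(x * x for x in p) / 4000.0
    prod = 1.0
    for i, x in enumerate(p, start=1):
        prod *= math.cos(x / math.sqrt(i))
    return 1.0 + s - prod


def rastrigin(p):
    return 10.0 * len(p) + sum(x * x - 10.0 * math.cos(2.0 * math.pi * x) for x in p)


def shubert(p):
    prod = 1.0
    for x in p:
        prod *= sum(j * math.cos((j + 1) * x + j) for j in range(1, 6))
    return prod


# Commonly tabulated global minimum values (Shubert: 2-D case).
TABULATED_MINIMA = {"ackley": 0.0, "griewank": 0.0,
                    "rastrigin": 0.0, "shubert": -186.7309}


def jump_radius(cell_side):
    """Half the diagonal of a square cell (assumed reading of 'diameter')."""
    return 0.5 * cell_side * math.sqrt(2.0)


def is_valid(found, tabulated, tol=1e-3):
    """Criterion 1: the found value differs from the tabulated one by less than tol."""
    return abs(found - tabulated) < tol
```

A run on, say, the Rastrigin function would then be accepted when is_valid(rastrigin(x), TABULATED_MINIMA["rastrigin"]) holds for the returned point x.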
CUDA implementation of MBH
Results for Ackley function
Minimal time of finding the extremum:
• CPU: 160 sec
• GeForce GT 525M: 35 sec
• Tesla 10: 1.5 sec
[Figure: two plots of the time of finding the extremum, in sec, against the number of threads per block, one curve per number of blocks; labels: "Minimal time of finding extremum, sec" and "AVG executing time"]
CUDA implementation of MBH
Results for Griewank function
Minimal time of finding the extremum:
• CPU: 155 sec
• GeForce GT 525M: 33 sec
• Tesla 10: 2.2 sec
[Figure: two plots of the time of finding the extremum, in sec, against the number of threads per block, one curve per number of blocks; labels: "Minimal time of finding extremum, sec" and "AVG executing time"]
CUDA implementation of MBH
Results for Rastrigin function
Minimal time of finding the extremum:
• CPU: 125 sec
• GeForce GT 525M: 28.5 sec
• Tesla 10: 2.0 sec
[Figure: two plots of the time of finding the extremum, in sec, against the number of threads per block, one curve per number of blocks; labels: "Minimal time of finding extremum, sec" and "AVG executing time"]
CUDA implementation of MBH
Results for Shubert function
Minimal time of finding the extremum:
• CPU: 300 sec
• GeForce GT 525M: 82 sec
• Tesla 10: 4.3 sec
[Figure: two plots of the time of finding the extremum, in sec, against the number of threads per block, one curve per number of blocks; labels: "Minimal time of finding extremum, sec" and "AVG executing time"]
Plan for further research
• Use more sophisticated and accurate local optimization methods
• Improve the parallelization method
• Improve the algorithm for setting up the MBH “jump”
• Build a solution for molecular cluster modeling based on the MBH method
• Integrate the CUDA solution into the BNB-Grid project
• Describe the class of functions that can be processed effectively on GPUs
Summary
• A large share of PCs are equipped with GPUs
• A GPU is a multicore system
• CUDA is one of the technologies that provide high performance for GPU calculations
• A number of Grid projects already use CUDA
• Tests show that in some cases a GPU performs 5-100 times faster than a CPU
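The speed-up range quoted above can be checked against the times on the results slides with a few lines of arithmetic. The sketch below assumes that the three times annotated on each results slide belong to the CPU, the GeForce GT 525M and the Tesla 10, in that order; the slide layout does not state this explicitly, so the pairing is an assumption. Under this reading the ratios come out at roughly 4 to 107.

```python
# Times in seconds as annotated on the results slides.
# ASSUMPTION: each tuple is (CPU, GeForce GT 525M, Tesla 10).
TIMES = {
    "Ackley": (160.0, 35.0, 1.5),
    "Griewank": (155.0, 33.0, 2.2),
    "Rastrigin": (125.0, 28.5, 2.0),
    "Shubert": (300.0, 82.0, 4.3),
}


def speedups(cpu, geforce, tesla):
    """Speed-up of each GPU relative to the CPU time."""
    return cpu / geforce, cpu / tesla


if __name__ == "__main__":
    for name, times in TIMES.items():
        s_geforce, s_tesla = speedups(*times)
        print(f"{name}: GeForce GT 525M x{s_geforce:.1f}, Tesla 10 x{s_tesla:.1f}")
```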
THANKS FOR YOUR ATTENTION!