Speeding up MATLAB Applications
Parallel computing with MATLAB
© 2012 The MathWorks, Inc.
Going Beyond Serial MATLAB Applications
[Diagram: the MATLAB desktop (client) distributes work to a pool of MATLAB workers]
Programming Parallel Applications (CPU)
Ease of Use ←→ Greater Control
Built-in support with toolboxes
Example: Optimizing Cell Tower Position
Built-in parallel support
– With Parallel Computing Toolbox, use the built-in parallel algorithms in Optimization Toolbox
– Run the optimization in parallel
– Use a pool of MATLAB workers
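The built-in support largely amounts to switching on a solver option; a minimal sketch, assuming an Optimization Toolbox solver such as fmincon (the objective celltowerCoverage, starting point x0, and bounds lb/ub are hypothetical placeholders; R2012-era releases use matlabpool and optimset('UseParallel','always') instead):

```matlab
% Open a pool of MATLAB workers
parpool(4);

% Ask the solver to evaluate objective/constraint functions in parallel
opts = optimoptions('fmincon', 'UseParallel', true);

% celltowerCoverage, x0, lb, ub are placeholders for the demo's problem data
x = fmincon(@celltowerCoverage, x0, [], [], [], [], lb, ub, [], opts);
```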
Tools Providing Parallel Computing Support
Optimization Toolbox
Global Optimization Toolbox
Statistics Toolbox
Signal Processing Toolbox
Neural Network Toolbox
Image Processing Toolbox
…
Directly leverage functions in Parallel Computing Toolbox
www.mathworks.com/builtin-parallel-support
Agenda
Task parallel applications
GPU acceleration
Data parallel applications
Using clusters and grids
Independent Tasks or Iterations
Ideal problem for parallel computing
– No dependencies or communication between tasks
– Examples: parameter sweeps, Monte Carlo simulations
[Diagram: total execution time shrinks as independent tasks run simultaneously]
blogs.mathworks.com/loren/2009/10/02/using-parfor-loops-getting-up-and-running/
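As an illustration of such independent iterations, a minimal Monte Carlo sketch with parfor (the simulated quantity is made up for the example):

```matlab
% Monte Carlo: each trial is independent, so iterations can run on any worker
nTrials = 10000;
results = zeros(nTrials, 1);
parfor k = 1:nTrials
    walk = cumsum(randn(100, 1));   % one independent random walk per trial
    results(k) = max(walk);         % record the peak of this trial
end
estimate = mean(results);           % aggregate after the loop
```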
Example: Parameter Sweep of ODEs
Parallel for-loops
Parameter sweep of ODE system
– Damped spring oscillator: m x'' + b x' + k x = 0
– Sweep through different values of damping b = 1, 2, … and stiffness k = 1, 2, …
– Record peak value for each simulation
Convert for to parfor
Use a pool of MATLAB workers
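A sketch of what the sweep can look like (the mass m = 5, sweep ranges, time span, and initial conditions are assumptions for illustration):

```matlab
% Sweep damping b and stiffness k of m*x'' + b*x' + k*x = 0 with parfor,
% recording the peak displacement of each simulation
m = 5;
bVals = 1:0.5:5;
kVals = 1:0.5:5;
peaks = zeros(numel(bVals), numel(kVals));
parfor i = 1:numel(bVals)
    row = zeros(1, numel(kVals));
    for j = 1:numel(kVals)
        odefun = @(t, y) [y(2); (-bVals(i)*y(2) - kVals(j)*y(1))/m];
        [~, y] = ode45(odefun, [0 25], [1 0]);   % start displaced, at rest
        row(j) = max(y(:, 1));                   % peak displacement
    end
    peaks(i, :) = row;   % sliced output: each iteration fills its own row
end
```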
The Mechanics of parfor Loops
[Diagram: the ten iterations of the loop are divided among four workers, each executing a(i) = i for its share]

a = zeros(10, 1)
parfor i = 1:10
    a(i) = i;
end
a
Agenda
Task parallel applications
GPU acceleration
Data parallel applications
Using clusters and grids
What is a Graphics Processing Unit (GPU)?
Originally designed for graphics acceleration, now also used for scientific computation
Massively parallel array of integer and floating-point processors
– Typically hundreds of processors per card
– GPU cores complement CPU cores
Dedicated high-speed memory
* Parallel Computing Toolbox requires NVIDIA GPUs with Compute Capability 1.3 or
higher, including NVIDIA Tesla 20-series products. See a complete listing at
www.nvidia.com/object/cuda_gpus.html
Performance Gain with More Hardware
[Diagram: using more CPU cores (Cores 1–4 sharing a cache and memory) versus using GPUs (many GPU cores with dedicated device memory)]
Example: Mandelbrot set
The color of each pixel is the result of hundreds or thousands of iterations
Each pixel is independent of the other pixels
Hundreds of thousands of pixels
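A minimal sketch of this computation on the GPU with gpuArray (the region, grid size, and iteration count are illustrative):

```matlab
% Escape-time iteration count for each point of a grid, computed on the GPU
n = 1000;  maxIter = 500;
x = gpuArray.linspace(-2, 1, n);
y = gpuArray.linspace(-1.5, 1.5, n);
[X, Y] = meshgrid(x, y);
c = X + 1i*Y;                          % each grid point is an independent pixel
z = zeros(n, n, 'gpuArray');
count = zeros(n, n, 'gpuArray');
for k = 1:maxIter
    z = z.^2 + c;                      % element-wise, runs on the GPU
    count = count + (abs(z) <= 2);     % still-bounded points keep counting
end
count = gather(count);                 % bring the result back to the CPU
```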
Real-world performance increase
Solving a wave equation
Grid Size      CPU (s)    GPU (s)   Speedup
64 x 64        0.1004     0.3553    0.28
128 x 128      0.1931     0.3368    0.57
256 x 256      0.5888     0.4217    1.4
512 x 512      2.8163     0.8243    3.4
1024 x 1024    13.4797    2.4979    5.4
2048 x 2048    74.9904    9.9567    7.5
Intel Xeon Processor X5650, NVIDIA Tesla C2050 GPU
Programming Parallel Applications (GPU)
Ease of Use ←→ Greater Control
Built-in support with toolboxes
Simple programming constructs: gpuArray, gather
Advanced programming constructs: arrayfun, spmd
Interface for experts: CUDAKernel, MEX support
www.mathworks.com/help/distcomp/run-cuda-or-ptx-code-on-gpu
www.mathworks.com/help/distcomp/run-mex-functions-containing-cuda-code
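The simple constructs amount to moving data to the device, computing, and gathering the result; a minimal sketch:

```matlab
A = gpuArray(rand(4096));   % transfer the matrix to device memory
B = fft(A);                 % fft dispatches to the GPU for gpuArray inputs
C = gather(B);              % copy the result back to host memory
```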
Agenda
Task parallel applications
GPU acceleration
Data parallel applications
Using clusters and grids
Big Data: Distributed Arrays
[Diagram: the elements of one large array are spread across the memory of the cluster]
Remotely Manipulate Array from Desktop
Distributed Array Lives on the Cluster
Big Data: Distributed Arrays
y = distributed(rand(10));
[Diagram: the 10 columns of y are split across a pool of four MATLAB workers: columns 1:3, 4:6, 7:8, and 9:10]
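Once created, a distributed array works with many standard functions; a minimal sketch (assuming an open pool of workers):

```matlab
y = distributed(rand(10));  % 10-by-10 matrix spread column-wise across workers
colSums = sum(y, 1);        % each worker sums its own columns
result = gather(colSums);   % collect the distributed result on the client
```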
Demo: Approximation of π
∫₀¹ 4/(1 + x²) dx = π
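A minimal midpoint-rule sketch of this approximation, parallelized with a parfor reduction (the number of subintervals is arbitrary):

```matlab
% Approximate pi as the integral of 4/(1 + x^2) over [0, 1]
n = 1e6;
h = 1/n;
total = 0;
parfor k = 1:n
    x = h*(k - 0.5);                 % midpoint of the k-th subinterval
    total = total + 4/(1 + x^2);     % reduction variable, summed across workers
end
approxPi = h*total;
```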
Programming Parallel Applications (CPU)
Ease of Use ←→ Greater Control
Built-in support with toolboxes
Simple programming constructs: parfor, batch, distributed
Advanced programming constructs: createJob, labSend, spmd
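Of the simple constructs, batch submits work to run without blocking the desktop; a minimal sketch:

```matlab
job = batch(@rand, 1, {1000});   % run rand(1000) on a worker, one output requested
wait(job);                       % wait for the job to finish
out = fetchOutputs(job);         % out{1} holds the 1000-by-1000 result
delete(job);                     % clean up the job
```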
Agenda
Task parallel applications
GPU acceleration
Data parallel applications
Using clusters and grids
Working on C3SE
Apply for a project with SNIC