Transcript by David Cronk, Ph.D. - University of Hawaii

An Introduction to Parallel Computing
Dr. David Cronk
Innovative Computing Lab
University of Tennessee
Distribution A: Approved for public release; distribution is unlimited.
Outline
Parallel Architectures
Parallel Processing
› What is parallel processing?
› An example of parallel processing
› Why use parallel processing?
Parallel Programming
› Programming models
› Message passing issues
• Data distribution
• Flow control
Shared Memory Architectures
Single address space
All processors have access to a pool of shared memory
Symmetric multiprocessors (SMPs) – Access time is uniform
[Figure: multiple CPUs connected by a single bus to a shared main memory]
Shared Memory Architectures
Single address space
All processors have access to a pool of shared memory
Non-Uniform Memory Access (NUMA)
[Figure: two groups of CPUs, each group sharing a bus and a local main memory, with the two memories linked by a network]
Distributed Memory Architectures
[Figure: processors (P), each paired with its own local memory (M), connected to one another by a network]
Networks
Grid – processors are connected to 4 neighbors
Cylinder – A closed grid
Torus – A closed cylinder
Hypercube – Each of the 2^n processors is connected
to n other processors, where n is the degree
of the hypercube (see the sketch after this list)
Fully Connected – Every processor is directly
connected to every other processor
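As a concrete illustration (this example is not from the slides): in a degree-n hypercube each node's address differs from each neighbor's in exactly one bit, so a node's n neighbors can be listed by flipping each address bit in turn. A small C sketch:

    #include <stdio.h>

    /* List the neighbors of a node in a degree-n hypercube by
       flipping each of the n address bits (node XOR 2^k) */
    int main(void) {
        int n = 3;    /* degree 3: 2^3 = 8 nodes */
        int node = 5; /* binary 101 */
        for (int k = 0; k < n; k++)
            printf("neighbor across dimension %d: %d\n", k, node ^ (1 << k));
        return 0;
    }

For node 5 (101) this prints neighbors 4 (100), 7 (111), and 1 (001), one per dimension.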
Parallel Processing
What is parallel processing?
› Using multiple processors to solve a single problem
• Task parallelism
– The problem consists of a number of independent tasks
– Each processor, or group of processors, can perform a separate task
• Data parallelism
– The problem consists of dependent tasks
– Each processor works on a different part of the data
Parallel Processing
\[ \int_0^1 \frac{4}{1+x^2}\,dx = \pi \]
We can approximate the integral as a sum of rectangles:
\[ \sum_{i=0}^{N} F(x_i)\,\Delta x \approx \pi \]
where \(F(x) = 4/(1+x^2)\), \(x_i\) is a sample point in rectangle \(i\), and \(\Delta x\) is the width of each rectangle.
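A hedged sketch of this computation in C with MPI (the slides show no code; MPI and the midpoint rule are assumptions here): each process sums every size-th rectangle and the partial sums are combined with a reduction.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        const long N = 1000000;            /* number of rectangles (assumed) */
        const double dx = 1.0 / (double)N; /* width of each rectangle */
        int rank, size;
        double local = 0.0, pi = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process sums every size-th rectangle: rank, rank+size, ... */
        for (long i = rank; i < N; i += size) {
            double x = dx * ((double)i + 0.5); /* midpoint of rectangle i */
            local += 4.0 / (1.0 + x * x) * dx; /* F(x) * dx */
        }

        /* Combine the partial sums on rank 0 */
        MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi is approximately %.12f\n", pi);

        MPI_Finalize();
        return 0;
    }

With, say, 4 processes each one sums a quarter of the rectangles; the same code runs unchanged on any number of processes.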
Parallel Processing
[Figure-only slides]
Parallel Processing
Why parallel processing?
› Faster time to completion
• Computation can be performed faster with more processors
› Able to run larger jobs or at a higher resolution
• Larger jobs can complete in a reasonable amount of time on multiple processors
• Data for larger jobs can fit in memory when spread out across multiple processors
Parallel Programming
Outline
› Programming models
› Message passing issues
• Data distribution
• Flow control
Parallel Programming
Programming models
› Shared memory (see the sketch below)
• All processes have access to global memory
› Distributed memory (message passing)
• Processes have access to only local memory; data is shared via explicit message passing
› Combination shared/distributed
• Groups of processes share access to “local” data, while data is shared between groups via explicit message passing
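To make the shared-memory model concrete, here is a minimal sketch using OpenMP (OpenMP is not named in the slides; it is one common shared-memory model for C). All threads read and write the same array directly, and no data is explicitly sent or received:

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N]; /* one array, visible to every thread */
        double sum = 0.0;

        /* Loop iterations are divided among threads; the reduction
           clause safely combines each thread's partial sum */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = (double)i;
            sum += a[i];
        }

        printf("sum = %.0f\n", sum);
        return 0;
    }

Compiled with an OpenMP flag (e.g., cc -fopenmp), the loop runs on all available cores; without it, the pragma is ignored and the code runs serially.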
Message Passing
Message passing is the most common method for programming distributed memory machines.
With message passing, there is an explicit sender and receiver of data.
In message passing systems, different processes are identified by unique identifiers.
› Simplify this to each process having a unique numerical identifier
• Senders send data to a specific process based on this identifier
• Receivers specify which process to receive from based on this identifier
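A minimal sketch of this pattern in C with MPI (assumed here as the message-passing system; the slide names no library): the unique numerical identifier is the process rank, and both sides name the peer rank explicitly.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* The sender names the destination process (rank 1) explicitly */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* The receiver names the source process (rank 0) explicitly */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }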
Parallel Programming
Message Passing Issues
› Data Distribution
• Minimize overhead
– Latency (message start-up time)
» A few large messages are better than many small ones
– Memory movement
• Maximize load balance
– Less idle time spent waiting for data or synchronizing
– Each process should do about the same amount of work (see the block-distribution sketch below)
› Flow Control
• Minimize waiting
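One simple way to satisfy both goals is a block distribution; a hedged sketch in C (the helper name block_range is illustrative, not from the slides). Splitting n items over p processes this way keeps each process's data in one large contiguous block and leaves loads differing by at most one item:

    /* Block distribution of n items over p processes. Ranks below
       n % p get one extra item, so no process does more than one
       item's worth of extra work. (Illustrative helper, not from
       the slides.) */
    void block_range(long n, int p, int rank, long *lo, long *hi) {
        long base = n / p, rem = n % p;
        *lo = rank * base + (rank < rem ? rank : rem);
        *hi = *lo + base + (rank < rem ? 1 : 0); /* exclusive upper bound */
    }

For example, 10 items over 3 processes yields the ranges [0,4), [4,7), and [7,10).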
Data Distribution
[Figure-only slides]
Flow Control
Process 0: send to 1
Process 1: receive from 0, send to 2
Process 2: receive from 1, send to 3
Process 3: receive from 2, send to 4
Process 4: receive from 3, send to 5
...
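This shift pattern is where flow control matters: if every process posts a blocking send to its right-hand neighbor before anyone posts a receive, the program can stall once system buffering runs out. A hedged sketch in C with MPI (assumed, as above; the slide names no library) that avoids the problem with a combined send/receive:

    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size, sendbuf, recvbuf;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* End processes talk to MPI_PROC_NULL, which turns that half
           of the exchange into a no-op */
        int next = (rank + 1 < size) ? rank + 1 : MPI_PROC_NULL;
        int prev = (rank > 0) ? rank - 1 : MPI_PROC_NULL;
        sendbuf = rank;

        /* MPI_Sendrecv pairs the send and the receive, so no process
           blocks on its send while its neighbor is also blocked sending */
        MPI_Sendrecv(&sendbuf, 1, MPI_INT, next, 0,
                     &recvbuf, 1, MPI_INT, prev, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }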
“This presentation was made possible through support provided by DoD HPCMP PET activities through Mississippi State University (MSU) under contract No. N62306-01-D-7110.”