Parallel Programming
Transcript: Parallel Programming
Lecture 2: Part I
Parallel Programming Models
A programming model is what
the programmer sees and
uses when developing a
program
1
Sequential Programming
Model
[Figure: a single processor P executes code under the OS and accesses memory M]
2
Parallel Programming
Definition: parallel programming (PP) is
the activity of constructing a parallel
program from a given algorithm.
It is the interface between algorithms
and parallel architectures.
3
Why Is It So Difficult ?
Potentially more complicated than
sequential programming
Diverse parallel programming models
Lack of advanced parallel compilers,
debuggers, and profilers
Far more people do sequential
programming
4
QUESTION ?
Can we run sequential C code on a
parallel machine to gain
speedup ?
5
Levels of Abstraction
Machine independent:
– Applications
– Algorithmic Paradigm
– Language Supported (Programming Models)
Machine dependent:
– Hardware Architecture (Multiprocessor, NOW, Multicomputer)
6
Parallel Programming
(Sequential or parallel) application algorithm
→ User (programmer), using a parallel language and other tools
(Sequential or parallel) source program
→ Compiler (including preprocessor, assembler, and linker), with run-time support and other libraries
Native parallel code
→ Parallel platform (OS + hardware)
7
Native Programming Model
The lowest-level, user-visible
programming model provided by a
specific parallel computer platform.
Examples: Power C on the SGI Power
Challenge, shmem on the Cray T3D
Other programming models, such as data
parallel (HPF) and message passing (MPI),
can be implemented on top of Power C.
8
Algorithmic Paradigms
(Engineering) - Parallel Track
Compute-Interact
Work-Pool
Divide and Conquer
Pipelining (Data Stream)
Master-Slave
9
Data vs. Control Parallelism
Data parallelism:
– Multiple, complete functional units apply the
same operation “simultaneously” to
different elements of a data set.
• E.g., divide the domain evenly among the
PEs; each PE performs the same task
– Hardware: SIMD/MIMD machine
– Data Parallel Programming: HPF (High
Performance Fortran), C*
10
Data vs. Control Parallelism
Control parallelism:
– Apply distinct operations to data elements
concurrently.
– Outputs of operations are fed in as inputs
to other operations, in an arbitrary way.
– The flow of data forms an arbitrary graph.
– A pipeline is a special case of this, where
the graph is just a single path.
11
Compute-Interact
[Figure: processes alternate compute phases (C) with synchronous interaction phases]
12
Work Pool
[Figure: worker processes get jobs from a shared pool]
13
Divide and Conquer
[Figure: the load is recursively split (Load → Load/2 → Load/4), then the results are merged]
14
Pipelined (Data Stream)
[Figure: Edge Detection (Task 1) → Edge Linking (Task 2) → Line Generation (Task 3)]
15
Pipelined computation:
Divide the computation into a number of
stages.
Devote a separate functional unit to each
stage;
if each stage completes in the same time,
then, once the pipe is full, the throughput of
the pipeline is one result per clock.
16
Master-Slave
[Figure: a master process assigns jobs to several slave processes]
17
Algorithmic Paradigms
(Science) - Sequential Track
Divide-and-Conquer
Dynamic Programming
Branch-and-Bound
Backtracking
Greedy
***** Parallel Versions ??
18
Parallel Programming Models
Not talking about “language” !!
19
Programming Models
Homogeneity: refers to the similarity of
component processes in a parallel
program.
20
Programming Models
SPMD: Single-Program-Multiple-Data,
programs are homogeneous
MPMD: Multiple-Program-Multiple-Data,
programs are heterogeneous
SIMD: Single-Instruction-Multiple-Data, a
restricted form of SPMD in which all
processors execute the same instruction
at the same time
21
Programming Models
Both SPMD and MPMD are MIMD -- different
instructions can be executed by different
processes at the same time.
22
What is SPMD?
Single Program, Multiple Data
Same program runs everywhere
Restriction on the general message-passing model (MPMD)
Most vendors only support SPMD
parallel programs
23
What is SPMD?
The general message-passing model
(MPMD) can be emulated with SPMD
A data-parallel program generally refers
to an SPMD program.
Defined by Alan Karp [1987].
24
An Example of SPMD and
MPMD Code
MPMD code:
parbegin {
A;
B;
C;
}
SPMD code:
main( )
{
myid = getid( );
if (myid == 0) A;
else if (myid == 1) B;
else /* myid == 2 */ C;
}
25
MPMD
[Figure: three different programs A, B, and C, one per node, connected by a network]
26
SPMD
[Figure: every node on the network runs an identical copy of the same program]
main( )
{
myid = getid( );
if (myid == 0) A;
else if (myid == 1) B;
else /* myid == 2 */ C;
}
27
SPMD Programming
Two major phases:
– (1) Data distribution: choose the
mapping of data onto nodes
– (2) Parallel program generation: translate
the sequential algorithm into the SPMD
program (only one program is written !!)
28
Parallel Programming Based on
SPMD (4 main tasks)
(1) Get node and environmental information:
How many nodes in the system?
Who am I ?
(2) Access data: convert local-to-global and
global-to-local indexes
(3) Insert message-passing primitives to
exchange data (implicitly or explicitly)
(4) Carry out operations on directly accessible
data (local operation, e.g., C code)
29
Programming Languages
Lots of names !!
30
Programming Languages
Implicit Parallel (KAP)
Data Parallel (Fortran 90, HPF, CM
Fortran,..)
Message-Passing (MPI, PVM, CMMD,
NX, Active Message, Fast Message, P4,
MPL, LAM, Express)
Shared-Variable (X3H5)
Hybrid : MPI-HPF, Split-C, MPI-Java
31
Data Parallel Language
main( ) {
long i, N = 100000;
double local[N], tmp[N], pi, w;
w = 1.0/N;
forall (i = 0; i < N; i++) {
local[i] = (i - 0.5) * w;
tmp[i] = 4.0/(…
}
pi = sum(tmp);
}
32
Data Parallel Model
Single threading (from the user’s
viewpoint): one process + one thread of
control, just like a sequential program.
Global name space: all variables
reside in a single address space.
Parallel operations on aggregate data
structures: e.g., sum( ).
33
Data Parallel Model
Loose synchronization: implicit
synchronization after every statement.
(Compare the tight synchronization of
an SIMD system -- on every instruction.)
34
Message-Passing Language
[Figure: several processor-memory pairs (P, M), each running C code with communication subroutines, connected by a communication network]
35
Message-Passing Model
Multithreading: multiple processes
simultaneously executing
Separate address space: local
variables are not visible to other
processes.
36
Message-Passing Model
Explicit allocation: both workload and
data are explicitly allocated to the
processes by the user.
Explicit interactions: communication,
synchronization, aggregation,...
Asynchronous: the processes execute
asynchronously.
37
Message Passing Programming
Model
Message-passing programming is more
immediately painful, but it tends to build
locality of reference in from the start.
The end result seems to be:
– for scalable, highly parallel, high-performance
results, locality of reference
will always be central.
38
Shared Variable Model
double local, pi, w;
long i, taskid;
long numtask;
w = 1.0/N;
{
#pragma shared (pi, w)
#pragma local (i, local)
#pragma pfor iterate (i = 0; N; 1)
for (i = 0; i < N; i++) {
local = (i + 0.5) * w;
…
}
#pragma critical
pi = pi + local;
}
39
Shared Variable Model
Single address space (similar to data
parallel model)
Multithreading and asynchronous
(similar to message-passing)
Communication is done implicitly
through shared reads and writes of
variables.
Synchronization is explicit.
40
Shared-Memory Programming
Model
All data are shared and visible to the
executing threads
A shared-memory program starts out looking
simpler, but memory locality forces one to do
some strange transformations
Shared-Memory programming standards:
ANSI X3H5 (1993), POSIX Threads
(Pthreads), OpenMP, SGI Power C
41
Conclusion
Parallel programming has lagged far
behind the advances in parallel hardware
Compared with their sequential counterparts,
today’s parallel system software and
application software are few in quantity
and primitive in functionality.
This is likely to continue !!
42