Parallel Programming


Lecture 2: Part I
Parallel Programming Models
A programming model is what the programmer sees and uses when developing a program.
1
Sequential Programming Model
[Diagram: a single processor (P) executes the program code on top of the OS, using a single memory (M).]
2
Parallel Programming
 Definition: Parallel programming (PP) is the activity of constructing a parallel program from a given algorithm.
 It is the interface between algorithms and parallel architectures.
3
Why Is It So Difficult?
 Potentially more complicated than sequential programming
 Diverse parallel programming models
 Lack of advanced parallel compilers, debuggers, and profilers
 More people are doing sequential programming
4
QUESTION
Can we run sequential C code on a parallel machine to gain speedup?
5
Levels of Abstraction
[Diagram: layers from machine independent to machine dependent: Applications → Algorithmic Paradigms → Languages Supported (Programming Models) → Hardware Architecture (Multiprocessor, NOW, Multicomputer).]
6
Parallel Programming
[Diagram: a (sequential or parallel) application algorithm is turned by the user (programmer), using a parallel language and other tools, into a (sequential or parallel) source program; this step is the parallel programming. The compiler (including preprocessor, assembler, and linker), together with run-time support and other libraries, turns the source program into native parallel code that runs on the parallel platform (OS + hardware).]
7
Native Programming Model
 The lowest-level, user-visible programming model provided by a specific parallel computer platform.
 Examples: Power C on the SGI Power Challenge, shmem on the Cray T3D.
 Other programming models, such as data parallel (HPF) and message passing (MPI), can be implemented on top of Power C.
8
Algorithmic Paradigms (Engineering) - Parallel Track
 Compute-Interact
 Work-Pool
 Divide and Conquer
 Pipelining (Data Stream)
 Master-Slave
9
Data vs. Control Parallelism
 Data parallelism:
– Multiple, complete functional units apply the same operation "simultaneously" to different elements of a data set.
• E.g., divide the domain evenly among the PEs and have each PE perform the same task (see the sketch after this slide).
– Hardware: SIMD/MIMD machines
– Data-parallel programming: HPF (High Performance Fortran), C*
10
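A minimal sketch of the block-decomposition example above (not from the lecture; the routine name scale_block and its parameters are assumptions): each of npes processing elements calls the same routine with its own id pe and applies the same operation to its own contiguous block of the data.

/* Hypothetical sketch: PE `pe` of `npes` applies the same operation
 * (scaling by `factor`) to its own contiguous block of array `a`. */
void scale_block(double *a, long n, int pe, int npes, double factor)
{
    long chunk = (n + npes - 1) / npes;          /* ceiling of n / npes */
    long lo = (long)pe * chunk;                  /* first owned element */
    long hi = (lo + chunk < n) ? lo + chunk : n; /* one past the last   */
    for (long i = lo; i < hi; i++)
        a[i] *= factor;                          /* same operation, different data */
}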
Data vs. Control Parallelism
 Control parallelism:
– Apply distinct operations to data elements concurrently.
– Outputs of operations are fed in as inputs to other operations, in an arbitrary way.
– The flow of data forms an arbitrary graph.
– A pipeline is a special case of this, where the graph is just a single path.
11
Compute-Interact
[Diagram: each process alternates compute phases (C) with synchronous interaction phases.]
12
Work Pool
[Diagram: worker processes repeatedly get jobs from a shared pool.]
13
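As a rough illustration of the work-pool idea (not code from the lecture; the pthreads choice, the job representation, and all names are assumptions), each worker repeatedly takes the next job from a shared pool until the pool is empty.

/* Hypothetical work-pool sketch; compile with: cc -pthread workpool.c */
#include <pthread.h>
#include <stdio.h>

#define NJOBS 20
#define NWORKERS 4

static int next_job = 0;                       /* the "pool": next unassigned job */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&pool_lock);        /* get a job from the pool */
        int job = (next_job < NJOBS) ? next_job++ : -1;
        pthread_mutex_unlock(&pool_lock);
        if (job < 0) break;                    /* pool is empty */
        printf("worker %ld does job %d\n", id, job);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    return 0;
}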
Divide and Conquer
[Diagram: the load is split in half repeatedly (Load, Load/2, Load/4) and the partial results are merged.]
14
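A minimal sequential sketch of the diagram above, assuming the "load" is summing an array (the function dc_sum is hypothetical): the range is halved recursively and the partial results are merged on the way back up; in a parallel version each half could be handed to a different process.

/* Hypothetical divide-and-conquer sum of a[lo..hi) */
double dc_sum(const double *a, long lo, long hi)
{
    if (hi - lo <= 1)                               /* base case: load is small enough */
        return (hi > lo) ? a[lo] : 0.0;
    long mid = lo + (hi - lo) / 2;                  /* divide the load in half */
    return dc_sum(a, lo, mid) + dc_sum(a, mid, hi); /* conquer the halves, merge results */
}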
Pipelined (Data Stream)
[Diagram: a three-stage pipeline: Edge Detection (Task 1) → Edge Linking (Task 2) → Line Generation (Task 3).]
15
Pipelined computation:
 Divide the computation into a number of stages.
 Devote a separate functional unit to each stage.
 If each stage completes in the same time, then once the pipe is full the throughput of the pipeline is one result per clock (a short worked example follows this slide).
16
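For example, with the three-stage pipeline of the previous slide and a stage time of t, the first result appears after 3t; after that a new result emerges every t, so n data items take (3 + n - 1)·t in total instead of the 3·n·t that a single unit working alone would need.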
Master-Slave
[Diagram: a master process assigns jobs to several slave processes.]
17
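A rough MPI sketch of the master-slave paradigm (not code from the lecture; the job representation, the round-robin assignment, and the assumption of at least two processes are all mine): the master assigns job numbers to the slaves and then tells them to stop.

#include <mpi.h>
#include <stdio.h>

#define NJOBS 16

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* assumes size >= 2 */

    if (rank == 0) {                          /* master: assign jobs */
        for (int job = 0; job < NJOBS; job++) {
            int dest = 1 + job % (size - 1);  /* round-robin over the slaves */
            MPI_Send(&job, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        }
        int stop = -1;                        /* then tell every slave to stop */
        for (int dest = 1; dest < size; dest++)
            MPI_Send(&stop, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    } else {                                  /* slave: process jobs until told to stop */
        int job;
        for (;;) {
            MPI_Recv(&job, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (job < 0) break;
            printf("slave %d handled job %d\n", rank, job);
        }
    }

    MPI_Finalize();
    return 0;
}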
Algorithmic Paradigms (Science) - Sequential Track
 Divide-and-Conquer
 Dynamic Programming
 Branch-and-Bound
 Backtracking
 Greedy
***** Parallel versions??
18
Parallel Programming Models
Not talking about “language” !!
19
Programming Models
 Homogeneity refers to the similarity of the component processes in a parallel program.
20
Programming Models
 SPMD: Single-Program-Multiple-Data; the component programs are homogeneous.
 MPMD: Multiple-Program-Multiple-Data; the component programs are heterogeneous.
 SIMD: Single-Instruction-Multiple-Data; a restricted form of SPMD in which all processors execute the same instruction at the same time.
21
Programming Models
 Both SPMD and MPMD are MIMD: different instructions can be executed by different processes at the same time.
22
What is SPMD?
 Single Program, Multiple Data
 The same program runs everywhere
 A restriction of the general message-passing model (MPMD)
 Most vendors only support SPMD parallel programs
23
What is SPMD?
 The general message-passing model (MPMD) can be emulated
 "Data-parallel program" generally refers to an SPMD program
 The term was defined by Alan Karp [1987]
24
An Example of SPMD and MPMD Code
MPMD code:
parbegin {
  A;
  B;
  C;
}
SPMD code:
main()
{
  myid = getid();           /* which process am I? */
  if (myid == 0) A;
  else if (myid == 1) B;
  else if (myid == 2) C;
}
25
MPMD
[Diagram: three nodes connected by a network, each running a different program (A, B, and C).]
26
SPMD
[Diagram: three nodes connected by a network, each running the same program:]
main()
{
  myid = getid();
  if (myid == 0) A;
  else if (myid == 1) B;
  else if (myid == 2) C;
}
27
SPMD Programming
 Two major phases:
– (1) Data distribution choice: determine the mapping of data onto nodes.
– (2) Parallel program generation: translate the sequential algorithm into the SPMD program (only one program has to be written!).
28
Parallel Programming Based on SPMD (4 main tasks)
(1) Get node and environment information: How many nodes are in the system? Who am I?
(2) Access data: convert local-to-global and global-to-local indexes.
(3) Insert message-passing primitives to exchange data (implicitly or explicitly).
(4) Carry out operations on directly accessible data (local operations, e.g., plain C code).
(A minimal MPI sketch of these four tasks follows this slide.)
29
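A minimal MPI sketch of the four tasks above (the problem size and the block distribution are assumptions, not from the lecture): each process learns who it is, converts its local indices to global ones, computes on its own block, and exchanges data through a reduction.

#include <mpi.h>
#include <stdio.h>

#define N 1000                               /* global problem size (assumed) */

int main(int argc, char **argv)
{
    int myid, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* (1) how many nodes? */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);   /* (1) who am I?       */

    int chunk = (N + nprocs - 1) / nprocs;  /* block distribution of the data */
    double partial = 0.0, total = 0.0;

    for (int local = 0; local < chunk; local++) {
        int global = myid * chunk + local;  /* (2) local-to-global index */
        if (global < N)
            partial += (double)global;      /* (4) local operation (plain C) */
    }

    /* (3) message-passing primitive to exchange (here, combine) data */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0)
        printf("sum of 0..%d = %.0f\n", N - 1, total);

    MPI_Finalize();
    return 0;
}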
Programming Languages
Lots of names !!
30
Programming Languages
 Implicit parallel (KAP)
 Data parallel (Fortran 90, HPF, CM Fortran, ...)
 Message passing (MPI, PVM, CMMD, NX, Active Message, Fast Message, P4, MPL, LAM, Express)
 Shared variable (X3H5)
 Hybrid: MPI-HPF, Split-C, MPI-Java
31
Data Parallel Language
main() {
  long i, N = 100000;
  double local[N], tmp[N], pi, w;
  w = 1.0 / N;
  forall (i = 0; i < N; i++) {   /* data-parallel loop */
    local[i] = (i + 0.5) * w;
    tmp[i] = 4.0/(…
  }
  pi = sum(tmp);                 /* parallel reduction over the array */
}
32
Data Parallel Model
 Single threading (from the user's viewpoint): one process with one thread of control, just like a sequential program.
 Global name space: all variables reside in a single address space.
 Parallel operations on aggregate data structures, e.g., sum().
33
Data Parallel Model
 Loose synchronization: implicit synchronization after every statement (compared with the tight synchronization of an SIMD system, which synchronizes on every instruction).
34
Message-Passing Language
[Diagram: each node has its own processor (P) and private memory (M) and runs C code plus communication subroutines; the nodes are connected by a communication network.]
35
Message-Passing Model
 Multithreading: multiple processes execute simultaneously.
 Separate address spaces: local variables are not visible to other processes.
36
Message-Passing Model
 Explicit allocation: both workload and data are explicitly allocated to the processes by the user.
 Explicit interactions: communication, synchronization, aggregation, ...
 Asynchronous: the processes execute asynchronously.
(A minimal MPI sketch of these properties follows this slide.)
37
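A minimal MPI sketch of these properties (assuming exactly two processes; not code from the lecture): each process has its own copy of x in its separate address space, and the value only becomes visible to the other process through an explicit send/receive pair.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double x = 0.0;                  /* private to each process */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        x = 3.14;                    /* only rank 0's copy changes */
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);            /* explicit interaction */
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received x = %g\n", x);
    }

    MPI_Finalize();
    return 0;
}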
Message-Passing Programming Model
 Message-passing programming is more directly painful, but it tends to build in locality of reference from the start.
 The end result seems to be:
– For scalable, highly parallel, high-performance results, locality of reference will always be central.
38
Shared Variable Model
double local, pi, w;
long I, taskid;
long numtask;
w = 1.0/N;
#pragma shared (pi, w)
#pragma local (I, local)
{
  #pragma pfor iterate (I = 0; N; 1)
  for (I = 0; I < N; I++) {
    local = (I + 0.5) * w;
    …
  }
  #pragma critical
  pi = pi + local;
}
39
Shared Variable Model
 Single address space (similar to the data parallel model)
 Multithreading and asynchronous execution (similar to message passing)
 Communication is done implicitly through shared reads and writes of variables.
 Synchronization is explicit.
40
Shared-Memory Programming Model
 All data are shared and visible to the executing threads.
 A shared-memory program starts out looking simpler, but memory locality forces one to do some strange transformations.
 Shared-memory programming standards: ANSI X3H5 (1993), POSIX Threads (Pthreads), OpenMP, SGI Power C.
(An OpenMP sketch of the pi computation follows this slide.)
41
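For comparison with the Power C version on slide 39, here is a minimal OpenMP sketch of the same pi computation (OpenMP is one of the standards listed above, but this particular code is an assumption, not from the lecture): all threads share pi, and the reduction clause handles the synchronization of the partial sums.

#include <stdio.h>

#define N 100000

int main(void)
{
    double pi = 0.0, w = 1.0 / N;

    /* all threads run the loop on shared data; the reduction clause
       combines the per-thread partial sums into pi safely */
    #pragma omp parallel for reduction(+:pi)
    for (long i = 0; i < N; i++) {
        double x = (i + 0.5) * w;            /* midpoint of interval i */
        pi += 4.0 / (1.0 + x * x) * w;
    }

    printf("pi ~= %.6f\n", pi);              /* compile with -fopenmp */
    return 0;
}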
Conclusion
 Parallel programming has lagged far behind the advances in parallel hardware.
 Compared to their sequential counterparts, today's parallel system software and application software are few in quantity and primitive in functionality.
 This is likely to continue!
42