Message Passing Models
CEG 4131 Computer Architecture III
Miodrag Bolic
1
Overview
• Hardware model
• Programming model
• Message Passing Interface
2
Generic Model Of A Message-passing Multicomputer [5]
[Figure: a set of nodes connected through a message-passing direct interconnection network]
3
Gyula Fehér
Generic Node Architecture [5]
[Figure: a node made up of a node-processor (processor + local memory + ...) and a router (communication processor + switch unit + ...), joined by internal channel(s), with external channels leading to other nodes]
• Fat-Node
– powerful processor
– large memory
– many chips
– costly/node
– moderate parallelism
• Thin-Node
– small processor
– small memory
– one-few chips
– cheap/node
– high parallelism
4
Generic Organization Model [5]
[Figure: organization models built from P+M (processor + memory), CP (communication processor) and S (switch) units attached to a switching network; panels: (b) Decentralized, (c) Centralized]
5
Message Passing Properties [1]
• Complete computer as building block, including I/O
• Programming model: directly access only private
address space (local memory)
• Communication via explicit messages (send/receive)
• Communication integrated at I/O level, not memory
system, so no special hardware
• Resembles a network of workstations (which can
actually be used as multiprocessor systems)
6
Message Passing Program [1]
• Problem: Sum all of the elements of an array of size n.
INITIALIZE; //assign proc_num and num_procs
if (proc_num == 0) //processor with a proc_num of 0 is the master,
                   //which sends out messages and sums the result
{
    read_array(array_to_sum, size); //read the array and array size from file
    size_to_sum = size/num_procs;   //assumes size is a multiple of num_procs
    for (current_proc = 1; current_proc < num_procs; current_proc++)
    {
        //slice [lower_ind, upper_ind) goes to current_proc
        lower_ind = size_to_sum * current_proc;
        upper_ind = size_to_sum * (current_proc + 1);
        SEND(current_proc, size_to_sum);
        SEND(current_proc, array_to_sum[lower_ind:upper_ind]);
    }
    //master node sums its part of the array
    sum = 0;
    for (k = 0; k < size_to_sum; k++)
        sum += array_to_sum[k];
    global_sum = sum;
    //collect one partial sum from every slave
    for (current_proc = 1; current_proc < num_procs; current_proc++)
    {
        RECEIVE(current_proc, local_sum);
        global_sum += local_sum;
    }
    printf("sum is %d\n", global_sum);
}
else //any processor other than proc_num = 0 is a slave
{
    sum = 0;
    RECEIVE(0, size_to_sum);
    RECEIVE(0, array_to_sum[0 : size_to_sum]);
    for (k = 0; k < size_to_sum; k++)
        sum += array_to_sum[k];
    SEND(0, sum);
}
END;
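The program above computes size_to_sum = size/num_procs, so any elements left over when size is not a multiple of num_procs are never summed. A minimal sketch of one way to split the index range so that every element is assigned to exactly one processor is given below. The names block_range and block_for are illustrative assumptions, not part of the original program.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [lo, hi) of the elements owned by one processor. */
typedef struct {
    size_t lo;
    size_t hi;
} block_range;

/*
 * Split n elements over num_procs processors as evenly as possible.
 * The first (n % num_procs) processors each take one extra element,
 * so every element belongs to exactly one processor.
 * block_for is a hypothetical helper, not part of the slide's program.
 */
static block_range block_for(size_t n, size_t num_procs, size_t proc_num)
{
    assert(num_procs > 0 && proc_num < num_procs);

    size_t base  = n / num_procs;  /* minimum share per processor */
    size_t extra = n % num_procs;  /* leftover elements to hand out */
    block_range r;

    /* processors before this one own proc_num * base elements,
       plus one extra for each of them that is among the first `extra` */
    r.lo = proc_num * base + (proc_num < extra ? proc_num : extra);
    r.hi = r.lo + base + (proc_num < extra ? 1 : 0);
    return r;
}
```

With such a helper, the master would use block_for(size, num_procs, current_proc) in place of lower_ind and upper_ind, and send hi - lo as the slice length.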
7
Message Passing Program (cont.) [1]
Multiprocessor Software Functions Provided:
• INITIALIZE – assigns a number (proc_num) to each
processor in the system, assigns the total number of
processors (num_procs).
• SEND(receiving_processor_number, data) – sends data
to another processor
• RECEIVE(sending_processor_number, data) – receives
data from another processor
• BARRIER(n_procs) – When a BARRIER is encountered,
a processor waits at that BARRIER until n_procs
processors reach the BARRIER, then execution can
proceed.
8
Advantages and Disadvantages [1]
• Advantages
– Easier to build than scalable shared memory machines
– Easy to scale (but topology is important)
– Programming model more removed from basic hardware
operations
– Coherency and synchronization are the responsibility of the user,
so the system designer need not worry about them.
• Disadvantages
– Large overhead: copying of buffers requires large data transfers
(this will kill the benefits of multiprocessing, if not kept to a
minimum).
– Programming is more difficult.
– Blocking nature of SEND/RECEIVE can cause increased latency
and deadlock issues.
9
Message-Passing Interface – MPI [3]
• Standardization - MPI is the only message passing
library which can be considered a standard. It is
supported on virtually all HPC platforms. Practically, it
has replaced all previous message passing libraries.
• Portability - There is no need to modify your source
code when you port your application to a different
platform that supports the MPI standard.
• Performance Opportunities - Vendor implementations
should be able to exploit native hardware features to
optimize performance.
• Functionality - Over 115 routines are defined.
• Availability - A variety of implementations are available,
both vendor and public domain.
10
MPI basics [3]
• Start Processes
• Send Messages
• Receive Messages
• Synchronize
• With these four capabilities, you can construct any
program.
• MPI offers over 125 functions.
11
Communicators [3]
• Provide a named set of processes for communication:
– System-allocated unique tags for processes
– All processes can be numbered from 0 to n-1
– Allow construction of libraries: application creates
communicators
• MPI_COMM_WORLD
– MPI uses objects called communicators and groups to define
which collection of processes may communicate with each other.
– Provide functions (split, duplicate, ...) for creating communicators
from other communicators
– Functions (size, my_rank, …) for finding out about all processes
within a communicator
• Blocking vs. non-blocking
12
Hello world example [3]
#include <stdio.h>
#include "mpi.h"
int main(int argc, char** argv)
{
    int my_PE_num;
    MPI_Init(&argc, &argv);                    // start MPI
    MPI_Comm_rank(MPI_COMM_WORLD, &my_PE_num); // rank of this process
    printf("Hello from %d.\n", my_PE_num);
    MPI_Finalize();                            // shut MPI down
    return 0;
}
13
Hello world example [3]
Output of a run with 8 processes (the order varies from run to run):
Hello from 5.
Hello from 3.
Hello from 1.
Hello from 2.
Hello from 7.
Hello from 0.
Hello from 6.
Hello from 4.
14
MPMD [3]
Use MPI_Comm_rank:
if (my_PE_num == 0)
    Routine1
else if (my_PE_num == 1)
    Routine2
else if (my_PE_num == 2)
    Routine3 . . .
15
Blocking Sending and Receiving Messages [3]
#include <stdio.h>
#include "mpi.h"
int main(int argc, char** argv)
{
    int my_PE_num, numbertoreceive, numbertosend=42;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_PE_num);
    if (my_PE_num==0)
    {
        // rank 0 blocks until one message arrives, from any sender, with any tag
        MPI_Recv( &numbertoreceive, 1, MPI_INT, MPI_ANY_SOURCE,
                  MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        printf("Number received is: %d\n", numbertoreceive);
    }
    else
        // every other rank sends one int to rank 0 with tag 10
        MPI_Send( &numbertosend, 1, MPI_INT, 0, 10, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
16
Non-Blocking Message Passing Routines [4]
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
    int numtasks, rank, next, prev, buf[2], tag1=1, tag2=2;
    MPI_Request reqs[4];
    MPI_Status stats[4];
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* neighbours in a ring: the first and last ranks wrap around */
    prev = rank-1; next = rank+1;
    if (rank == 0) prev = numtasks - 1;
    if (rank == (numtasks - 1)) next = 0;
    /* post the receives first, then the sends; none of these calls block */
    MPI_Irecv(&buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(&rank, 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&rank, 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3]);
    /* do some work here that does not touch buf or rank */
    /* wait until all four transfers have completed */
    MPI_Waitall(4, reqs, stats);
    MPI_Finalize();
    return 0;
}
17
Collective Communications [3]
• The Communicator specifies a process group to
participate in a collective communication
• MPI implements various optimized functions:
– Barrier synchronization
– Broadcast
– Reduction operations:
• with one destination or all in group destination
• Collective operations may or may not synchronize
18
Comparison MPI vs. OpenMP
Features                               OpenMP                 MPI
Apply parallelism in steps             yes                    no
Scale to large number of processors    maybe                  yes
Code complexity                        Small increase         Major increase
Runtime environment                    Expensive compilers    Free
Cost of hardware                       Very expensive         Cheap
Ease of modification                   Easy                   Hard
19
References
1. J. Kowalczyk, “Multiprocessor Systems,” Xilinx, 2003.
2. D. Culler, J. P. Singh, Parallel Computer Architecture: A
Hardware/Software Approach, Morgan Kaufmann, 1999.
3. MPI Basics
4. Message Passing Interface (MPI)
5. D. Sima, T. Fountain and P. Kacsuk, Advanced
Computer Architectures: A Design Space Approach,
Pearson, 1997.
20