MPI Workshop - II
Research Staff Week 2 of 3
Today’s Topics
Course Map
Basic Collective Communications
MPI_Barrier
MPI_Scatterv, MPI_Gatherv, MPI_Reduce
MPI Routines/Exercises
Pi, Matrix-Matrix mult., Vector-Matrix mult.
Other Collective Calls
References
Course Map

Week 1 - Point to Point / Basic Collective
  MPI functional routines: MPI_SEND (MPI_ISEND), MPI_RECV (MPI_IRECV), MPI_BCAST, MPI_SCATTER, MPI_GATHER
  MPI Examples: Helloworld, Swapmessage, Vector Sum

Week 2 - Collective Communications
  MPI functional routines: MPI_BCAST, MPI_SCATTERV, MPI_GATHERV, MPI_REDUCE, MPI_BARRIER
  MPI Examples: Pi, Matrix/vector multiplication, Matrix/matrix multiplication

Week 3 - Advanced Topics
  MPI functional routines: MPI_DATATYPE, MPI_HVECTOR, MPI_VECTOR, MPI_STRUCT, MPI_CART_CREATE
  MPI Examples: Poisson Equation, Passing Structures/common blocks, Parallel topologies in MPI
Example 1 - Pi Calculation

    π = ∫₀¹ 4/(1 + x²) dx
Uses the following MPI calls:
MPI_BARRIER, MPI_BCAST, MPI_REDUCE
Integration Domain: Serial (grid points x_0, x_1, x_2, x_3, ..., x_N)
Serial Pseudocode
f(x) = 4/(1+x^2)
h = 1/N, sum = 0.0
do i = 1, N
    x = h*(i - 0.5)
    sum = sum + f(x)
enddo
pi = h * sum

Example: N = 10, h = 0.1
x = {.05, .15, .25, .35, .45, .55, .65, .75, .85, .95}
Integration Domain: Parallel (grid points x_0, x_1, x_2, x_3, x_4, ..., x_N, interleaved across the processes)
Parallel Pseudocode
P(0) reads in N and broadcasts N to each processor.

f(x) = 4/(1+x^2)
h = 1/N, sum = 0.0
do i = rank + 1, N, nprocrs
    x = h*(i - 0.5)
    sum = sum + f(x)
enddo
mypi = h * sum

Collect (reduce) mypi from each processor into a collective value of pi on the output processor.

Example: N = 10, h = 0.1, procrs: {P(0), P(1), P(2)}
P(0) -> {.05, .35, .65, .95}
P(1) -> {.15, .45, .75}
P(2) -> {.25, .55, .85}
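A minimal, self-contained C translation of this pseudocode might look like the following sketch (N is hard-coded here rather than read from input, and error checking is omitted). The cyclic loop i = rank+1, N, nprocrs gives each process every nprocrs-th point, matching the P(0)/P(1)/P(2) breakdown above.

#include <stdio.h>
#include <mpi.h>

double f(double x) { return 4.0 / (1.0 + x * x); }

int main(int argc, char **argv)
{
    int rank, nprocrs, N = 0;
    double h, sum = 0.0, mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocrs);

    if (rank == 0)
        N = 10;                         /* P(0) sets N (e.g., read from stdin) */
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* ...and broadcasts it */

    MPI_Barrier(MPI_COMM_WORLD);        /* optional; listed among the example's calls */

    h = 1.0 / (double)N;
    for (int i = rank + 1; i <= N; i += nprocrs) {
        double x = h * ((double)i - 0.5);
        sum += f(x);
    }
    mypi = h * sum;

    /* Collect (reduce) mypi into pi on the output processor */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}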
Collective Communications - Synchronization
Collective calls can (but are not required to) return as soon as the calling process's participation in the operation is complete.
Return from a call does NOT indicate that other processes have completed their part in the communication.
Occasionally, it is necessary to force the synchronization of processes.
MPI_BARRIER
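MPI_Barrier takes a single argument, the communicator, and returns only after every process in the group has entered the call. A minimal sketch of its most common use, lining ranks up around a timed region (the timed work itself is left as a placeholder):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);        /* all ranks start together */
    double t0 = MPI_Wtime();
    /* ... work to be timed goes here ... */
    MPI_Barrier(MPI_COMM_WORLD);        /* wait for the slowest rank */
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("elapsed: %f seconds\n", t1 - t0);
    MPI_Finalize();
    return 0;
}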
Collective Communications - Broadcast
MPI_BCAST
Collective Communications - Reduction
MPI_REDUCE
MPI_SUM, MPI_PROD, MPI_MAX, MPI_MIN, MPI_LAND, MPI_BAND, ...
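As a sketch of how broadcast and reduction fit together (the values and names here are illustrative, not from the workshop code): the root broadcasts a parameter, every rank computes a local result, and two reductions combine those results with different operations.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, n = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) n = 100;                        /* root sets the value */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* now every rank has n */

    double local = (double)(rank + 1) * n;         /* some local result */
    double total, biggest;
    MPI_Reduce(&local, &total,   1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&local, &biggest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %g, max = %g\n", total, biggest);
    MPI_Finalize();
    return 0;
}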
Example 2: Matrix Multiplication (Easy) in C
(Figure: C = A B)
Two versions, depending on whether or not the number of rows of C and A is evenly divisible by the number of processes.
Uses the following MPI calls: MPI_BCAST, MPI_BARRIER, MPI_SCATTERV, MPI_GATHERV
Serial Code in C/C++
for (i = 0; i < nrow_c; i++)
    for (j = 0; j < ncol_c; j++)
        c[i][j] = 0.0;
for (i = 0; i < nrow_c; i++)
    for (k = 0; k < ncol_a; k++)
        for (j = 0; j < ncol_c; j++)
            c[i][j] += a[i][k] * b[k][j];
Matrix Multiplication in C - Parallel Example
(Figure: C = A B, with rows of A and C distributed across the processes and B replicated)
Collective Communications - Scatter/Gather
MPI_GATHER, MPI_SCATTER, MPI_GATHERV, MPI_SCATTERV
Flavors of Scatter/Gather
Equal-sized pieces of data distributed to each processor
MPI_SCATTER, MPI_GATHER
Unequal-sized pieces of data distributed
MPI_SCATTERV, MPI_GATHERV
Must specify arrays of the sizes of the data pieces and of their displacements from the start of the data to be distributed or collected (see the sketch below).
Both arrays have length equal to the size of the communications group.
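A sketch of how those two arrays might be built when n items do not divide evenly among the ranks (all names here are illustrative; error checking omitted):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, n = 10;              /* n = total number of items */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* one count and one offset per rank; early ranks get one extra item */
    int *sendcounts = malloc(size * sizeof(int));
    int *offsets    = malloc(size * sizeof(int));
    for (int p = 0, at = 0; p < size; p++) {
        sendcounts[p] = n / size + (p < n % size ? 1 : 0);
        offsets[p]    = at;
        at           += sendcounts[p];
    }

    double data[10];                     /* significant only on the root */
    if (rank == 0)
        for (int i = 0; i < n; i++) data[i] = (double)i;

    double *mypiece = malloc(sendcounts[rank] * sizeof(double));
    MPI_Scatterv(data, sendcounts, offsets, MPI_DOUBLE,
                 mypiece, sendcounts[rank], MPI_DOUBLE,
                 0, MPI_COMM_WORLD);

    printf("rank %d received %d items\n", rank, sendcounts[rank]);
    free(mypiece); free(sendcounts); free(offsets);
    MPI_Finalize();
    return 0;
}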
Scatter/Scatterv Calling Syntax
int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm comm)

int MPI_Scatterv(void *sendbuf, int *sendcounts, int *offsets,
                 MPI_Datatype sendtype, void *recvbuf, int recvcount,
                 MPI_Datatype recvtype, int root, MPI_Comm comm)
Abbreviated Parallel Code (Equal size)
ierr = MPI_Scatter(*a, nrow_a*ncol_a/size, ...);
ierr = MPI_Bcast(*b, nrow_b*ncol_b, ...);
for (i = 0; i < nrow_c/size; i++)      /* each process owns nrow_c/size rows */
    for (k = 0; k < ncol_a; k++)
        for (j = 0; j < ncol_c; j++)
            cpart[i][j] += apart[i][k] * b[k][j];
ierr = MPI_Gather(*cpart, nrow_c*ncol_c/size, ...);
Abbreviated Parallel Code (Unequal)
ierr = MPI_Scatterv(*a, a_chunk_sizes, a_offsets, ...);
ierr = MPI_Bcast(*b, nrow_b*ncol_b, ...);
for (i = 0; i < a_chunk_sizes[rank]/ncol_a; i++)   /* local rows of C */
    for (k = 0; k < ncol_a; k++)
        for (j = 0; j < ncol_c; j++)
            cpart[i][j] += apart[i][k] * b[k][j];
ierr = MPI_Gatherv(*cpart, c_chunk_sizes[rank], ..., c_offsets, ...);
Fortran version
F77 - no dynamic memory allocation.
F90 - allocatable arrays, arrays allocated in contiguous memory.
Multi-dimensional arrays are stored in memory in column-major order.
Questions for the student.
How should we distribute the data in this case? What about loop ordering?
We never distributed the B matrix. What if B is large?
Example 3: Vector Matrix Product in C
Illustrates MPI_Scatterv, MPI_Reduce, MPI_Bcast.
(Figure: C = A B)
Main part of parallel code
ierr = MPI_Scatterv(a, a_chunk_sizes, a_offsets, MPI_DOUBLE,
                    apart, a_chunk_sizes[rank], MPI_DOUBLE,
                    root, MPI_COMM_WORLD);
ierr = MPI_Scatterv(btmp, b_chunk_sizes, b_offsets, MPI_DOUBLE,
                    bparttmp, b_chunk_sizes[rank], MPI_DOUBLE,
                    root, MPI_COMM_WORLD);
… initialize cpart to zero …
for (k = 0; k < a_chunk_sizes[rank]; k++)   /* bpart = bparttmp viewed as 2-D */
    for (j = 0; j < ncol_c; j++)
        cpart[j] += apart[k] * bpart[k][j];
ierr = MPI_Reduce(cpart, c, ncol_c, MPI_DOUBLE, MPI_SUM, root, MPI_COMM_WORLD);
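For reference, a self-contained sketch of the same pattern with fixed sizes (the names are illustrative and differ from the workshop code; error checking omitted). Each rank receives a slice of the vector and the matching rows of the matrix, forms a full-length partial product, and MPI_Reduce sums the partials on the root.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 8   /* length of a, rows of B */
#define M 4   /* columns of B, length of c */

int main(int argc, char **argv)
{
    int rank, size, root = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* counts/offsets for slicing a (per element) and B (per row of M elements) */
    int *a_cnt = malloc(size*sizeof(int)), *a_off = malloc(size*sizeof(int));
    int *b_cnt = malloc(size*sizeof(int)), *b_off = malloc(size*sizeof(int));
    for (int p = 0, at = 0; p < size; p++) {
        a_cnt[p] = N/size + (p < N%size ? 1 : 0);
        a_off[p] = at; at += a_cnt[p];
        b_cnt[p] = a_cnt[p]*M;  b_off[p] = a_off[p]*M;
    }

    double a[N], b[N*M];                 /* significant only on the root */
    if (rank == root)
        for (int i = 0; i < N; i++) {
            a[i] = 1.0;
            for (int j = 0; j < M; j++) b[i*M+j] = i + j;
        }

    double *apart = malloc(a_cnt[rank]*sizeof(double));
    double *bpart = malloc(b_cnt[rank]*sizeof(double));
    MPI_Scatterv(a, a_cnt, a_off, MPI_DOUBLE,
                 apart, a_cnt[rank], MPI_DOUBLE, root, MPI_COMM_WORLD);
    MPI_Scatterv(b, b_cnt, b_off, MPI_DOUBLE,
                 bpart, b_cnt[rank], MPI_DOUBLE, root, MPI_COMM_WORLD);

    double cpart[M] = {0.0}, c[M];
    for (int k = 0; k < a_cnt[rank]; k++)      /* full-length partial product */
        for (int j = 0; j < M; j++)
            cpart[j] += apart[k] * bpart[k*M+j];

    MPI_Reduce(cpart, c, M, MPI_DOUBLE, MPI_SUM, root, MPI_COMM_WORLD);
    if (rank == root)
        for (int j = 0; j < M; j++) printf("c[%d] = %g\n", j, c[j]);

    free(apart); free(bpart); free(a_cnt); free(a_off); free(b_cnt); free(b_off);
    MPI_Finalize();
    return 0;
}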
Collective Communications - Allgather
MPI_ALLGATHER
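MPI_Allgather behaves like MPI_Gather except that the gathered result is delivered to every process, so there is no root argument. A minimal sketch (values are illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int mine = rank * rank;           /* one value per rank */
    int all[64];                      /* assumes size <= 64 */
    MPI_Allgather(&mine, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

    if (rank == 0)                    /* every rank holds the same result */
        for (int p = 0; p < size; p++)
            printf("from rank %d: %d\n", p, all[p]);
    MPI_Finalize();
    return 0;
}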
Collective Communications - Alltoall
MPI_ALLTOALL
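MPI_Alltoall sends block i of each process's send buffer to process i, so every pair of processes exchanges one block, like a transpose of the data across ranks. A minimal sketch (values are illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int sendbuf[64], recvbuf[64];     /* assumes size <= 64 */
    for (int p = 0; p < size; p++)
        sendbuf[p] = 100 * rank + p;  /* block destined for rank p */

    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    /* recvbuf[p] now equals 100*p + rank: the item rank p addressed to us */
    printf("rank %d received: ", rank);
    for (int p = 0; p < size; p++) printf("%d ", recvbuf[p]);
    printf("\n");
    MPI_Finalize();
    return 0;
}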
References - MPI Tutorial
CS471 Class Web Site - Andy Pineda http://www.arc.unm.edu/~acpineda/CS471/HTML/CS471.html
MHPCC http://www.mhpcc.edu/training/workshop/html/mpi/MPIIntro.html
Edinburgh Parallel Computing Center http://www.epcc.ed.ac.uk/epic/mpi/notes/mpi-course-epic.book_1.html
Cornell Theory Center http://www.tc.cornell.edu/Edu/Talks/topic.html#mess
References - IBM Parallel Environment
POE - Parallel Operating Environment http://www.mhpcc.edu/training/workshop/html/poe/poe.html
http://ibm.tc.cornell.edu/ibm/pps/doc/primer/
Loadleveler http://www.mhpcc.edu/training/workshop/html/loadleveler/LoadLeveler.html
http://ibm.tc.cornell.edu/ibm/pps/doc/LlPrimer.html
http://www.qpsf.edu.au/software/ll-hints.html
Exercise: Vector Matrix Product in C
Rewrite Example 3 to perform the vector-matrix product as shown.
(Figure: C = A B)