
MPI Workshop - II

Research Staff Week 2 of 3

Today’s Topics 

Course Map

Basic Collective Communications

MPI_Barrier

MPI_Scatterv, MPI_Gatherv, MPI_Reduce

MPI Routines/Exercises

Pi, Matrix-Matrix mult., Vector-Matrix mult.

Other Collective Calls

References

Course Map

Week 1 - Point to Point / Basic Collective
  MPI functional routines: MPI_SEND (MPI_ISEND), MPI_RECV (MPI_IRECV), MPI_BCAST, MPI_SCATTER, MPI_GATHER
  MPI Examples: Helloworld, Swapmessage, Vector Sum

Week 2 - Collective Communications
  MPI functional routines: MPI_BCAST, MPI_SCATTERV, MPI_GATHERV, MPI_REDUCE, MPI_BARRIER
  MPI Examples: Pi, Matrix/vector multiplication, Matrix/matrix multiplication

Week 3 - Advanced Topics
  MPI functional routines: MPI_DATATYPE, MPI_HVECTOR, MPI_VECTOR, MPI_STRUCT, MPI_CART_CREATE
  MPI Examples: Poisson Equation, Passing Structures/common blocks, Parallel topologies in MPI

Example 1 - Pi Calculation

$\pi = \int_0^1 \frac{4}{1+x^2}\,dx$

Uses the following MPI calls:

MPI_BARRIER, MPI_BCAST, MPI_REDUCE

Integration Domain: Serial - grid points x_0, x_1, x_2, x_3, ..., x_N on [0,1]

Serial Pseudocode

f(x) = 4/(1+x^2)
h = 1/N, sum = 0.0
do i = 1, N
   x = h*(i - 0.5)
   sum = sum + f(x)
enddo
pi = h * sum

Example: N = 10, h = 0.1
x = {.05, .15, .25, .35, .45, .55, .65, .75, .85, .95}
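For reference, a minimal serial C version of this midpoint-rule calculation (variable names are illustrative, not taken from the course code):

#include <stdio.h>

/* integrand of pi = integral from 0 to 1 of 4/(1+x^2) dx */
double f(double x) { return 4.0 / (1.0 + x * x); }

int main(void)
{
    int i, N = 10;                    /* N = 10 gives h = 0.1 */
    double h = 1.0 / N, sum = 0.0, x, pi;

    for (i = 1; i <= N; i++) {
        x = h * (i - 0.5);            /* midpoint of the i-th interval */
        sum += f(x);
    }
    pi = h * sum;
    printf("pi is approximately %.16f\n", pi);
    return 0;
}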

Integration Domain: Parallel - grid points x_0, x_1, x_2, x_3, x_4, ..., x_N assigned to processes in round-robin fashion

Parallel Pseudocode

P(0) reads in N and broadcasts N to each processor
f(x) = 4/(1+x^2)
h = 1/N, sum = 0.0
do i = rank+1, N, nprocrs
   x = h*(i - 0.5)
   sum = sum + f(x)
enddo
mypi = h * sum
Collect (Reduce) mypi from each processor into a collective value of pi on the output processor

Example: N = 10, h = 0.1, Procrs: {P(0), P(1), P(2)}
P(0) -> {.05, .35, .65, .95}
P(1) -> {.15, .45, .75}
P(2) -> {.25, .55, .85}
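A compact C sketch of the parallel scheme above, assuming root = 0 and illustrative variable names (the course's actual example may differ in details):

#include <stdio.h>
#include <mpi.h>

double f(double x) { return 4.0 / (1.0 + x * x); }

int main(int argc, char *argv[])
{
    int rank, nprocrs, i, N = 0, root = 0;
    double h, x, sum, mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocrs);

    if (rank == root)
        scanf("%d", &N);                              /* P(0) reads in N */
    MPI_Bcast(&N, 1, MPI_INT, root, MPI_COMM_WORLD);  /* broadcast N to each processor */

    MPI_Barrier(MPI_COMM_WORLD);       /* optional: synchronize, e.g. before timing */

    h = 1.0 / N;
    sum = 0.0;
    for (i = rank + 1; i <= N; i += nprocrs) {   /* round-robin distribution of points */
        x = h * (i - 0.5);
        sum += f(x);
    }
    mypi = h * sum;

    /* collect (reduce) mypi from each processor into pi on the output processor */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, root, MPI_COMM_WORLD);
    if (rank == root)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}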

Collective Communications - Synchronization

Collective calls can (but are not required to) return as soon as the calling process's own participation in the operation is complete.

Return from a call does NOT indicate that other processes have completed their part in the communication.

Occasionally, it is necessary to force the synchronization of processes.

MPI_BARRIER
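C binding and a typical use (each caller blocks until every process in the communicator has entered the call):

int MPI_Barrier(MPI_Comm comm);

MPI_Barrier(MPI_COMM_WORLD);   /* no process continues until all have reached this point */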

Collective Communications - Broadcast

MPI_BCAST
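C binding and a typical use, here broadcasting an integer N from rank 0 (the variable name and root value are illustrative):

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm);

MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* after the call, every rank holds root's N */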

Collective Communications - Reduction

MPI_REDUCE

MPI_SUM, MPI_PROD, MPI_MAX, MPI_MIN, MPI_LAND, MPI_BAND, ...
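C binding and a typical use, here summing each process's mypi into pi on rank 0 (names follow Example 1 and are illustrative):

int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
               MPI_Op op, int root, MPI_Comm comm);

MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);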

Example 2: Matrix Multiplication (Easy) in C

[Figure: C = A B]

Two versions, depending on whether or not the number of rows of C and A is evenly divisible by the number of processes.

Uses the following MPI calls: MPI_BCAST, MPI_BARRIER, MPI_SCATTERV, MPI_GATHERV

Serial Code in C/C++

for(i=0; i<nrow_c; i++)
   for(j=0; j<ncol_c; j++) {
      c[i][j] = 0.;
      for(k=0; k<ncol_a; k++)
         c[i][j] += a[i][k]*b[k][j];
   }

Matrix Multiplication in C - Parallel Example

[Figure: C = A B, with block rows of A and C distributed across processes and B available to every process]

Collective Communications - Scatter/Gather

MPI_GATHER, MPI_SCATTER, MPI_GATHERV, MPI_SCATTERV

Flavors of Scatter/Gather 

Equal-sized pieces of data distributed to each processor

MPI_SCATTER, MPI_GATHER

Unequal-sized pieces of data distributed

MPI_SCATTERV, MPI_GATHERV

Must specify arrays of the sizes of the data pieces and their displacements from the start of the buffer being distributed or collected.

Both arrays have length equal to the size of the communications group.

Scatter/Scatterv Calling Syntax

int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm comm)

int MPI_Scatterv(void *sendbuf, int *sendcounts, int *offsets, MPI_Datatype sendtype,
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 int root, MPI_Comm comm)

Abbreviated Parallel Code (Equal size)

ierr=MPI_Scatter(*a, nrow_a*ncol_a/size, ...);
ierr=MPI_Bcast(*b, nrow_b*ncol_b, ...);
for(i=0; i<nrow_c/size; i++)
   for(j=0; j<ncol_c; j++) {
      cpart[i][j] = 0.;
      for(k=0; k<ncol_a; k++)
         cpart[i][j] += apart[i][k]*b[k][j];
   }
ierr=MPI_Gather(*cpart, nrow_c*ncol_c/size, ...);
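A fuller, self-contained sketch of the equal-size case (the dimensions and array names are illustrative, and it assumes the row count of A and C is divisible by the number of processes):

#include <stdio.h>
#include <mpi.h>

#define NROW_A 8     /* rows of A and C; assumed divisible by the number of processes */
#define NCOL_A 4     /* columns of A = rows of B */
#define NCOL_B 6     /* columns of B and C */

int main(int argc, char *argv[])
{
    double a[NROW_A][NCOL_A], b[NCOL_A][NCOL_B], c[NROW_A][NCOL_B];
    double apart[NROW_A][NCOL_A], cpart[NROW_A][NCOL_B];   /* oversized for simplicity */
    int rank, size, root = 0, i, j, k, nrow_local;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    nrow_local = NROW_A / size;                 /* rows of A (and C) per process */

    if (rank == root) {                         /* root fills A and B with sample data */
        for (i = 0; i < NROW_A; i++)
            for (k = 0; k < NCOL_A; k++)
                a[i][k] = i + k;
        for (k = 0; k < NCOL_A; k++)
            for (j = 0; j < NCOL_B; j++)
                b[k][j] = k - j;
    }

    /* distribute block rows of A; broadcast all of B */
    MPI_Scatter(&a[0][0], nrow_local * NCOL_A, MPI_DOUBLE,
                &apart[0][0], nrow_local * NCOL_A, MPI_DOUBLE,
                root, MPI_COMM_WORLD);
    MPI_Bcast(&b[0][0], NCOL_A * NCOL_B, MPI_DOUBLE, root, MPI_COMM_WORLD);

    /* local block-row product: cpart = apart * b */
    for (i = 0; i < nrow_local; i++)
        for (j = 0; j < NCOL_B; j++) {
            cpart[i][j] = 0.0;
            for (k = 0; k < NCOL_A; k++)
                cpart[i][j] += apart[i][k] * b[k][j];
        }

    /* collect the block rows of C back on the root process */
    MPI_Gather(&cpart[0][0], nrow_local * NCOL_B, MPI_DOUBLE,
               &c[0][0], nrow_local * NCOL_B, MPI_DOUBLE,
               root, MPI_COMM_WORLD);

    if (rank == root)
        printf("c[0][0] = %g\n", c[0][0]);

    MPI_Finalize();
    return 0;
}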

Abbreviated Parallel Code (Unequal)

ierr=MPI_Scatterv(*a, a_chunk_sizes, a_offsets, ...);
ierr=MPI_Bcast(*b, nrow_b*ncol_b, ...);
for(i=0; i<a_chunk_sizes[rank]/ncol_a; i++)     /* local rows of A and C */
   for(j=0; j<ncol_c; j++) {
      cpart[i][j] = 0.;
      for(k=0; k<ncol_a; k++)
         cpart[i][j] += apart[i][k]*b[k][j];
   }
ierr=MPI_Gatherv(*cpart, ...);
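One possible way to build the chunk-size and offset arrays when the rows do not divide evenly (a fragment: size, rank, root, nrow_a, ncol_a, a, apart, and ierr are assumed to be set up as in the abbreviated code above):

#include <stdlib.h>   /* for malloc */

int *a_chunk_sizes = (int *) malloc(size * sizeof(int));
int *a_offsets     = (int *) malloc(size * sizeof(int));
int p, rows_p, offset = 0;

for (p = 0; p < size; p++) {
    rows_p = nrow_a / size + (p < nrow_a % size ? 1 : 0);  /* spread the leftover rows */
    a_chunk_sizes[p] = rows_p * ncol_a;   /* counts are in elements, not rows */
    a_offsets[p]     = offset;            /* displacement from the start of a */
    offset          += a_chunk_sizes[p];
}

/* each process then receives a_chunk_sizes[rank] elements of A into apart */
ierr = MPI_Scatterv(*a, a_chunk_sizes, a_offsets, MPI_DOUBLE,
                    *apart, a_chunk_sizes[rank], MPI_DOUBLE,
                    root, MPI_COMM_WORLD);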

Fortran version 

F77 - no dynamic memory allocation.

F90 - allocatable arrays, arrays allocated in contiguous memory.

Multi-dimensional arrays are stored in memory in column major order.

Questions for the student.

How should we distribute the data in this case? What about loop ordering?

We never distributed the B matrix. What if B is large?

Example 3: Vector-Matrix Product in C

Illustrates MPI_Scatterv, MPI_Reduce, MPI_Bcast.

[Figure: vector-matrix product C = A B]

Main part of parallel code

ierr=MPI_Scatterv(a, a_chunk_sizes, a_offsets, MPI_DOUBLE,
                  apart, a_chunk_sizes[rank], MPI_DOUBLE,
                  root, MPI_COMM_WORLD);
ierr=MPI_Scatterv(btmp, b_chunk_sizes, b_offsets, MPI_DOUBLE,
                  bparttmp, b_chunk_sizes[rank], MPI_DOUBLE,
                  root, MPI_COMM_WORLD);
… initialize cpart to zero …
for(k=0; k<a_chunk_sizes[rank]; k++)
   for(j=0; j<ncol_b; j++)
      cpart[j] += apart[k]*bparttmp[k*ncol_b + j];
ierr=MPI_Reduce(cpart, c, ncol_b, MPI_DOUBLE, MPI_SUM, root, MPI_COMM_WORLD);
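One consistent way the two sets of chunk sizes and offsets could be set up, assuming btmp stores the rows of B contiguously so that each process gets the rows matching its piece of the vector a (the loop variables p, n_p, offset_a, and offset_b are illustrative, not from the course code):

/* split the length-nrow_b vector a, and give each process the matching rows of B */
for (p = 0, offset_a = 0, offset_b = 0; p < size; p++) {
    n_p = nrow_b / size + (p < nrow_b % size ? 1 : 0);
    a_chunk_sizes[p] = n_p;               /* n_p entries of the vector a      */
    a_offsets[p]     = offset_a;
    b_chunk_sizes[p] = n_p * ncol_b;      /* the same n_p rows of B, flattened */
    b_offsets[p]     = offset_b;
    offset_a += n_p;
    offset_b += n_p * ncol_b;
}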

Collective Communications - Allgather

MPI_ALLGATHER
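C binding: like MPI_GATHER, but the concatenated result is delivered to every process in the communicator rather than only to the root (so there is no root argument):

int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  MPI_Comm comm);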

Collective Communications - Alltoall

MPI_ALLTOALL
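C binding: every process sends a distinct block to every other process, in effect a scatter from each rank combined with a gather onto each rank:

int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 MPI_Comm comm);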

References - MPI Tutorial

CS471 Class Web Site - Andy Pineda http://www.arc.unm.edu/~acpineda/CS471/HTML/CS471.html

MHPCC http://www.mhpcc.edu/training/workshop/html/mpi/MPIIntro.html

Edinburgh Parallel Computing Center http://www.epcc.ed.ac.uk/epic/mpi/notes/mpi-course-epic.book_1.html

Cornell Theory Center http://www.tc.cornell.edu/Edu/Talks/topic.html#mess

References - IBM Parallel Environment 

POE - Parallel Operating Environment http://www.mhpcc.edu/training/workshop/html/poe/poe.html

http://ibm.tc.cornell.edu/ibm/pps/doc/primer/

Loadleveler http://www.mhpcc.edu/training/workshop/html/loadleveler/LoadLeveler.html

http://ibm.tc.cornell.edu/ibm/pps/doc/LlPrimer.html

http://www.qpsf.edu.au/software/ll-hints.html

Exercise: Vector-Matrix Product in C

Rewrite Example 3 to perform the vector-matrix product as shown.

[Figure: vector-matrix product C = A B, showing the decomposition to implement]