MPI debugging and verification:
The MARMOT tool
www.eu-crossgrid.org
Architecture of MARMOT
[Figure: on the client side, two application processes, each calling through the profiling interface into MARMOT and then MPI; on the server side, an additional process (MARMOT's debug server) running on MPI.]
This figure shows how an MPI application running on two processes makes use of MARMOT: the MPI calls are intercepted and analyzed by MARMOT with the help of an additional process and then passed to the native MPI implementation to do the intended work.
MARMOT is a tool that verifies the correctness of parallel, distributed Grid applications that use the MPI paradigm. The Message Passing Interface (MPI) is currently the only way to write parallel programs that scale to thousands of processors. Unfortunately, this way of programming is also very error-prone. The reasons include the lack of portability between different MPI implementations, the irreproducibility of parallel program runs, and race conditions that occur only after hours of runtime. Standard tools and debuggers offer little support for these kinds of problems. The primary goals of MARMOT are to make end-user applications portable, reproducible and reliable on any platform of the Grid.
Design
MARMOT uses the MPI profiling interface to
analyze the application at runtime. The MPI
calls made by the application are intercepted by
MARMOT for analysis before they are passed
to the native MPI implementation.
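The interception step can be sketched in C. In real MPI, the profiling interface defined by the standard makes every routine available under a second, name-shifted entry point with the prefix PMPI_, so a tool can define MPI_Send itself, do its analysis, and then forward to PMPI_Send. To keep the sketch compilable without an MPI installation, the code below uses hypothetical stand-ins: native_send plays the role of PMPI_Send, and checked_send plays the role of the tool's own MPI_Send wrapper. None of these names or the logging format are taken from MARMOT.

```c
#include <stdio.h>

/* Counters so the effect of the wrapper can be observed (illustration only). */
static int native_send_calls = 0;
static int intercepted_calls = 0;

/* Hypothetical stand-in for the native library's send routine.
   In real MPI this role is played by PMPI_Send. */
static int native_send(const void *buf, int count, int dest, int tag)
{
    (void)buf; (void)count; (void)dest; (void)tag;
    native_send_calls++;
    return 0; /* 0 plays the role of MPI_SUCCESS */
}

/* Hypothetical analysis hook: here it only records and logs the call. */
static void analyze_send(int count, int dest, int tag)
{
    intercepted_calls++;
    fprintf(stderr, "send intercepted: count=%d dest=%d tag=%d\n",
            count, dest, tag);
}

/* The wrapper the application reaches after relinking.
   It analyzes the call first, then forwards it unchanged. */
int checked_send(const void *buf, int count, int dest, int tag)
{
    analyze_send(count, dest, tag);
    return native_send(buf, count, dest, tag);
}
```

Because the wrapper has the same signature as the routine it shadows and returns the native result unchanged, the application does not need to know that the analysis layer exists.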
Local checks, such as the verification of arguments, are performed on the client side.
Global tasks, such as deadlock detection or the output of MARMOT’s logging, are done on
the server side, using an additional process.
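The split between local and global checks can be illustrated with a small C sketch. The source does not say how MARMOT implements either kind of check, so everything below is an assumption: send_args_valid is a simplified argument check that needs only the caller's own data, and find_deadlocked_rank uses a textbook wait-for graph, which requires knowledge of all processes and therefore fits a central server. The model covers only blocking point-to-point calls where each process waits on exactly one known rank; special values such as MPI_PROC_NULL and wildcard receives are ignored.

```c
#include <stdbool.h>

#define NOT_BLOCKED (-1)

/* Local check (client side): uses only the caller's arguments and the
   communicator size. Simplified; real MPI also allows special values
   for dest and bounds the tag from above. */
bool send_args_valid(int count, int dest, int tag, int comm_size)
{
    return count >= 0 && dest >= 0 && dest < comm_size && tag >= 0;
}

/* Global check (server side): waits_for[i] is the rank that process i is
   blocked on, or NOT_BLOCKED. Returns a rank that lies on a wait cycle,
   or -1 if no cycle exists. */
int find_deadlocked_rank(const int *waits_for, int nprocs)
{
    for (int start = 0; start < nprocs; start++) {
        int cur = start;
        /* Each process waits on at most one other, so a cycle through
           `start` must close within nprocs steps. */
        for (int steps = 0; steps < nprocs; steps++) {
            cur = waits_for[cur];
            if (cur < 0 || cur >= nprocs)
                break;          /* chain ends at a running process */
            if (cur == start)
                return start;   /* came back to where we began */
        }
    }
    return -1;
}
```

For example, if rank 0 is blocked waiting on rank 1 and rank 1 is blocked waiting on rank 0, the wait-for array is {1, 0} and the function reports rank 0 as part of a cycle. No single process could detect this from its own arguments alone, which is why such a check belongs with the additional process.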
No source code modification is required. The
application only has to be relinked with the
MARMOT library. The additional process is
transparent to the application.
MARMOT itself also uses MPI for communication.
IST-2001-32243
MARMOT has been tested with C and Fortran applications, e.g. with the High Energy Physics application from Task 1.3 or the Meteo applications from Task 1.4.
Supported Platforms
MARMOT has been tested successfully on
different platforms, for example
• Linux IA32/IA64 with MPICH
• IBM Regatta
• Hitachi SR8000
• NEC SX-5
Responsible Partner
The High Performance Computing Center
Stuttgart (HLRS) is responsible for the
development of MARMOT within the CrossGrid
project. HLRS hosts one of the largest
European supercomputer resources, which are
accessible to academic users and industry.
References
• Rainer Keller, Bettina Krammer, Matthias S. Müller, Michael M. Resch, Edgar Gabriel. "MPI Development Tools and Applications for the Grid", Workshop on Grid Applications and Programming Tools, Seattle, U.S.A., June 25, 2003.
• Bettina Krammer, Katrin Bidmon, Matthias S. Müller, Michael M. Resch. "MARMOT: An MPI Analysis and Checking Tool", Parallel Computing 2003, Dresden, Germany, September 2-5, 2003.
More Information and Download
http://www.hlrs.de/organization/tsc/projects/marmot
Contact:
Bettina Krammer,
Matthias Müller,
HLRS, Stuttgart, Germany