
AMMPI - Summary

• Active Messages–2 (AM) implementation over MPI version 1.1
  – Porting is trivial – works on virtually any platform that has MPI 1.1
  – Often provides very high performance – vendors tune their MPI well
  – Linux/Myrinet, MPICH, IBM SP3, Origin 2000, Cray T3E, many others…

• Based on the AMUDP code base, same cool features
  – Robust, clear error reporting for ease of debugging
  – SPMD bootstrapping library (but we use site-specific mpirun)
  – Network performance/utilization monitoring API

• MPI Interface
  – Non-blocking sends, non-blocking receives
  – Uses MPI communicators to co-exist happily with other MPI-aware layers

AMMPI – Latency Performance

[Chart: AMMPI Round-trip Latency in microseconds (lower is better) – minimal small message, round-trip time measured from the application. Compares ROCKS MPI-Myrinet (naïve, pre-post recv, full), Millennium MPI-TCP and *AMUDP* (Gigabit Ethernet), NOW MPI-AM-Myrinet, Cray MPI-shmem, Origin 2000 - NCSA, and SP3 Blue Horizon/Seaborg (within and between nodes); values range from 28 us to 320 us. Full numbers appear in the Raw Performance Data table below.]

AMMPI – Bandwidth Performance

[Chart: AMMPI Bandwidth in MB/sec (higher is better), with 64 KB messages == MAX_MEDIUM == MAX_LONG. Same platforms as the latency chart, minus the naïve and pre-post ROCKS configurations; values range from 20 MB/s to 267 MB/s. Full numbers appear in the Raw Performance Data table below.]

AMMPI – Raw Performance Data

Platform                                    Round-trip     Pipelined Inverse    Bandwidth (MB/s)
                                            Latency (us)   Throughput (us)      64 KB msgs
ROCKS MPI-Myrinet (naïve) [1]                    35               –                   –
ROCKS MPI-Myrinet (pre-post recv) [2]            33               –                   –
ROCKS MPI-Myrinet (full)                         28              17                  112
Millennium MPI-TCP (Gigabit Ethernet) [3]       320             229                   20
Millennium *AMUDP* (Gigabit Ethernet)           120              27                   60
NOW MPI-AM-Myrinet [4]                          115              49                   40
Cray MPI-shmem                                   36              32                   93
Origin 2000 - NCSA                               46              27                   80
SP3 - Blue Horizon (within node)                 31              18                  267
SP3 - Seaborg (within node)                      32              20                  240
SP3 - Blue Horizon (between nodes)               56              51                  170
SP3 - Seaborg (between nodes - us)               58              27                  167
SP3 - Seaborg (between nodes - ip)              273             152                   47

Notes:
[1] buffered send, probe/block recv
[2] buffered send, pre-posted non-blocking recv
[3] latency highly variable; no MPI-Myrinet yet (Gigabit upgrade – used to get 12 MB/sec and 250-300 us on 100 Mbit)
[4] AM-over-MPI-over-AM-over-Myrinet (native AM is 20-30 us RT latency, hardware B/W 40 MB/sec)