Evaluation of Java Message Passing in High Performance Data Analytics
Saliya Ekanayake
Overview
• Performance of MPI Kernel Operations
  • Implementations based on Ohio MicroBenchmark suite
  • Evaluates MPI allreduce, and send and receive
• Performance of Deterministic Annealing Vector Sponge
  • Performance with pure MPI and MPI + threads
  • Threads come from Habanero Java library
• Terms
  • OMB – Ohio MicroBenchmark suite
  • DAVS – Deterministic Annealing Vector Sponge
  • OMPI-trunk – OpenMPI source tree revision 30301
  • OMPI-nightly – OpenMPI nightly snapshot version 1.9a1r28881
  • FG – FutureGrid

Performance of MPI Kernel Operations
[Figure: Performance of MPI allreduce operation. Average time (us) vs. message size (bytes). Series: MPI.NET C# in Tempest, FastMPJ Java in FG, OMPI-nightly Java FG, OMPI-trunk Java FG, OMPI-trunk C FG.]
[Figure: Performance of MPI send and receive operations. Average time (us) vs. message size (bytes). Series: MPI.NET C# in Tempest, FastMPJ Java in FG, OMPI-nightly Java FG, OMPI-trunk Java FG, OMPI-trunk C FG.]

[Figure: Performance of MPI allreduce on Infiniband and Ethernet. Average time (us) vs. message size (bytes). Series: OMPI-trunk C Madrid, OMPI-trunk Java Madrid, OMPI-trunk C FG, OMPI-trunk Java FG.]
[Figure: Performance of MPI send and receive on Infiniband and Ethernet. Average time (us) vs. message size (bytes). Series: OMPI-trunk C Madrid, OMPI-trunk Java Madrid, OMPI-trunk C FG, OMPI-trunk Java FG.]

DAVS Performance
[Figure: DAVS Charge5 performance. Time (hours) vs. TxPxN. Series: MPI.NET, OMPI-nightly, OMPI-trunk.]
[Figure: DAVS Charge5 performance w/ threads. Time (hours) vs. TxPxN. Series: MPI.NET, OMPI-nightly, OMPI-trunk.]
[Figure: DAVS Charge5 speedup. Speedup vs. TxPxN. Series: MPI.NET, OMPI-nightly, OMPI-trunk.]

[Figure: DAVS Charge2 performance. Time (hours) vs. TxPxN. Series: MPI.NET, OMPI-nightly, OMPI-trunk.]
[Figure: DAVS Charge2 performance w/ threads. Time (hours) vs. TxPxN. Series: MPI.NET, OMPI-nightly, OMPI-trunk.]
[Figure: DAVS Charge2 speedup. Speedup vs. TxPxN. Series: MPI.NET, OMPI-nightly, OMPI-trunk.]

DAVS Performance on Single Node
[Figure: DAVS Charge2 performance on single node. Time vs. TxPxN (1x1x1). Series: OMPI-trunk Madrid, OMPI-trunk FG, MPI.NET Tempest.]
[Figure: DAVS Charge6 performance on single node. Time vs. TxPxN (1x1x1). Series: OMPI-trunk Madrid, OMPI-trunk FG, MPI.NET Tempest, MPI.NET Madrid.]
[Figure: DAVS Charge6 performance on single node with multiple processes. Time vs. TxPxN (1x4x1). Series: OMPI-trunk Madrid, OMPI-trunk FG, MPI.NET Tempest.]
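The DAVS speedup charts plot speedup against TxPxN labels such as 1x1x1 or 2x4x8. The slides do not state how speedup was computed, so the following is only a minimal sketch under two assumptions: that speedup is the run time of the 1x1x1 configuration divided by the run time of the configuration of interest, and that the three factors of a TxPxN label multiply to give the total degree of parallelism. The class name `DavsSpeedup`, its methods, and the timings in `main` are hypothetical and are not taken from the slides.

```java
public final class DavsSpeedup {

    /** Product of the three factors in a "TxPxN" label, e.g. "2x4x8" gives 64. */
    static int parallelism(String label) {
        String[] parts = label.trim().split("x");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected a TxPxN label, got: " + label);
        }
        int total = 1;
        for (String part : parts) {
            total *= Integer.parseInt(part);
        }
        return total;
    }

    /** Assumed definition: baseline (1x1x1) time divided by the measured time. */
    static double speedup(double baselineTime, double measuredTime) {
        if (baselineTime <= 0 || measuredTime <= 0) {
            throw new IllegalArgumentException("Times must be positive");
        }
        return baselineTime / measuredTime;
    }

    /** Parallel efficiency: speedup divided by the degree of parallelism. */
    static double efficiency(double speedup, int parallelism) {
        return speedup / parallelism;
    }

    public static void main(String[] args) {
        // Hypothetical timings in hours, for illustration only.
        double baselineHours = 4.0;
        double measuredHours = 1.0;
        String label = "1x1x8";

        double s = speedup(baselineHours, measuredHours);
        int p = parallelism(label);
        System.out.printf("%s: speedup=%.2f, efficiency=%.2f%n", label, s, efficiency(s, p));
    }
}
```

Dividing speedup by the parallelism gives a rough efficiency figure, which makes configurations with different TxPxN products easier to compare than raw times alone.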