Evaluation of Java Message Passing in High Performance Data Analytics
Saliya Ekanayake
Overview
• Performance of MPI Kernel Operations
  • Implementations based on the Ohio MicroBenchmark suite
  • Evaluates MPI allreduce and MPI send and receive
• Performance of Deterministic Annealing Vector Sponge
  • Performance with pure MPI and MPI + threads
  • Threads come from the Habanero Java library
• Terms
  • OMB – Ohio MicroBenchmark suite
  • DAVS – Deterministic Annealing Vector Sponge
  • OMPI-trunk – OpenMPI source tree revision 30301
  • OMPI-nightly – OpenMPI nightly snapshot version 1.9a1r28881
  • FG – FutureGrid
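MPI + threads runs are often structured so that each process splits its share of the work across threads and combines the per-thread results before any inter-process communication. This slide does not spell out that structure for DAVS; it is assumed here for illustration only. The sketch below shows the pattern with a plain `java.util.concurrent` thread pool as a stand-in. It does not use the Habanero Java library's API, and the names `HybridThreadSum` and `parallelSum` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HybridThreadSum {
    // Sums `data` with `threads` worker threads. Each thread sums one contiguous
    // chunk, and the partial sums are combined in a fixed order so the result
    // does not depend on thread scheduling.
    public static double parallelSum(double[] data, int threads) {
        if (threads < 1) throw new IllegalArgumentException("threads must be >= 1");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            List<Future<Double>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int lo = Math.min(data.length, t * chunk);
                final int hi = Math.min(data.length, lo + chunk);
                Future<Double> part = pool.submit(() -> {
                    double s = 0.0;
                    for (int i = lo; i < hi; i++) s += data[i];
                    return s;
                });
                parts.add(part);
            }
            double total = 0.0;
            for (Future<Double> f : parts) total += f.get(); // combine per-thread results
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        // In a hybrid run, this per-process value would then feed an MPI allreduce.
        System.out.println(parallelSum(data, 4));
    }
}
```

Combining the futures in submission order, rather than in completion order, keeps floating-point results reproducible across runs with the same thread count.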
Performance of MPI Kernel Operations
[Figure: Performance of MPI send and receive operations. Series: MPI.NET C# in Tempest, FastMPJ Java in FG, OMPI-nightly Java FG, OMPI-trunk Java FG, OMPI-trunk C FG. X-axis: message size (bytes); y-axis: average time (us).]
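OMB's point-to-point latency test times a ping-pong loop. After some warm-up rounds, one rank sends a message and waits for the echo, and the reported one-way latency is the elapsed time divided by twice the number of timed iterations. The following minimal single-JVM sketch shows only that measurement structure. Two threads stand in for two ranks, and `SynchronousQueue` hand-offs stand in for MPI send and receive. It exercises no network and is not the OMB code. `PingPongLatency`, `measure`, and `oneWayLatencyMicros` are hypothetical names.

```java
import java.util.concurrent.SynchronousQueue;

public class PingPongLatency {
    // One-way latency in microseconds. Each timed iteration is a full round trip,
    // so the elapsed time is divided by 2 * iterations.
    public static double oneWayLatencyMicros(long elapsedNanos, int iterations) {
        return elapsedNanos / 1000.0 / (2.0 * iterations);
    }

    // Two threads stand in for two MPI ranks. Blocking SynchronousQueue
    // hand-offs stand in for a blocking send paired with a receive.
    public static double measure(int messageBytes, int warmup, int iterations)
            throws InterruptedException {
        if (iterations < 1) throw new IllegalArgumentException("iterations must be >= 1");
        SynchronousQueue<byte[]> ping = new SynchronousQueue<>();
        SynchronousQueue<byte[]> pong = new SynchronousQueue<>();
        int total = warmup + iterations;
        Thread peer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) pong.put(ping.take()); // echo each message
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        peer.start();
        byte[] msg = new byte[messageBytes];
        long start = System.nanoTime();
        for (int i = 0; i < total; i++) {
            if (i == warmup) start = System.nanoTime(); // warm-up rounds are not timed
            ping.put(msg.clone()); // copy the buffer to mimic a send
            pong.take();
        }
        long elapsed = System.nanoTime() - start;
        peer.join();
        return oneWayLatencyMicros(elapsed, iterations);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int size = 1; size <= (1 << 20); size <<= 4) {
            System.out.printf("%8d bytes: %.2f us%n", size, measure(size, 100, 1000));
        }
    }
}
```

The warm-up rounds keep JIT compilation and thread start-up out of the timed region. JVM-specific start-up effects like these are one reason Java benchmark loops typically discard their first iterations.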
[Figure: Performance of MPI allreduce operation. Series: MPI.NET C# in Tempest, FastMPJ Java in FG, OMPI-nightly Java FG, OMPI-trunk Java FG, OMPI-trunk C FG. X-axis: message size (bytes); y-axis: average time (us).]
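Allreduce combines a vector from every rank element by element and leaves the combined result on all ranks. Recursive doubling is one textbook algorithm for this. The sketch below simulates it in a single JVM over an array of per-rank buffers, assuming a power-of-two rank count and summation as the reduction. It illustrates the operation that the allreduce benchmark times, not how OpenMPI, FastMPJ, or MPI.NET necessarily implement it. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class RecursiveDoublingAllreduce {
    // Simulates an element-wise-sum allreduce. buffers[r] is rank r's vector.
    // In the step with distance `mask`, rank r exchanges with partner r XOR mask
    // and both keep the sum, so after log2(p) steps every rank holds the global sum.
    public static double[][] allreduceSum(double[][] buffers) {
        int p = buffers.length;
        if (p == 0 || Integer.bitCount(p) != 1) {
            throw new IllegalArgumentException("rank count must be a power of two");
        }
        int n = buffers[0].length;
        for (double[] b : buffers) {
            if (b.length != n) throw new IllegalArgumentException("vectors must have equal length");
        }
        double[][] cur = new double[p][];
        for (int r = 0; r < p; r++) cur[r] = buffers[r].clone();
        for (int mask = 1; mask < p; mask <<= 1) {
            double[][] next = new double[p][];
            for (int r = 0; r < p; r++) {
                double[] mine = cur[r];
                double[] theirs = cur[r ^ mask]; // partner for this step
                double[] sum = new double[n];
                for (int i = 0; i < n; i++) sum[i] = mine[i] + theirs[i];
                next[r] = sum;
            }
            cur = next; // all pairwise exchanges in a step happen together
        }
        return cur;
    }

    public static void main(String[] args) {
        double[][] ranks = { {1, 2}, {3, 4}, {5, 6}, {7, 8} };
        for (double[] v : allreduceSum(ranks)) System.out.println(Arrays.toString(v));
    }
}
```

Each step halves the number of distinct partial results, which is why this scheme needs only log2(p) communication rounds. Because both partners in a pair compute the same sum, every rank ends with an identical result.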
Performance of MPI send and receive on InfiniBand and Ethernet
[Figure: Series: OMPI-trunk C Madrid, OMPI-trunk Java Madrid, OMPI-trunk C FG, OMPI-trunk Java FG. X-axis: message size (bytes); y-axis: average time (us).]
Performance of MPI allreduce on InfiniBand and Ethernet
[Figure: Series: OMPI-trunk C Madrid, OMPI-trunk Java Madrid, OMPI-trunk C FG, OMPI-trunk Java FG. X-axis: message size (bytes); y-axis: average time (us).]
DAVS Performance
[Figure: DAVS Charge5 performance; DAVS Charge5 performance w/ threads; DAVS Charge5 speedup. Series: MPI.NET, OMPI-nightly, OMPI-trunk. X-axis: TxPxN (e.g., 1x1x1, 1x2x4, 2x4x8); y-axis: time (hours) or speedup.]
[Figure: DAVS Charge2 performance; DAVS Charge2 performance w/ threads; DAVS Charge2 speedup. Series: MPI.NET, OMPI-nightly, OMPI-trunk. X-axis: TxPxN (e.g., 1x1x1, 1x2x4, 2x4x8); y-axis: time (hours) or speedup.]
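The DAVS plots label runs as TxPxN. The slides do not define the fields. The helper below assumes that T is threads per process, P is MPI processes per node, and N is the number of nodes, a reading consistent with the threaded runs varying the first field. Under that reading, a run uses P*N processes and T*P*N workers, and speedup is a baseline run's time divided by the run's time. `TxPxN`, `Config`, `parse`, and `speedup` are hypothetical names.

```java
public class TxPxN {
    // Hypothetical run configuration, assuming T = threads per process,
    // P = MPI processes per node, N = nodes.
    public static final class Config {
        public final int threads, procsPerNode, nodes;

        public Config(int threads, int procsPerNode, int nodes) {
            this.threads = threads;
            this.procsPerNode = procsPerNode;
            this.nodes = nodes;
        }

        public int totalProcesses() { return procsPerNode * nodes; }

        public int totalParallelism() { return threads * procsPerNode * nodes; }
    }

    // Parses a label such as "2x4x8" into its three fields.
    public static Config parse(String label) {
        String[] f = label.trim().split("x");
        if (f.length != 3) throw new IllegalArgumentException("expected TxPxN, got: " + label);
        return new Config(Integer.parseInt(f[0]), Integer.parseInt(f[1]), Integer.parseInt(f[2]));
    }

    // Speedup of a run relative to a baseline run (for example 1x1x1).
    public static double speedup(double baselineTime, double runTime) {
        return baselineTime / runTime;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1x1x1", "1x8x1", "2x4x8", "1x8x8"}) {
            Config c = parse(s);
            System.out.println(s + ": " + c.totalProcesses() + " processes, "
                    + c.totalParallelism() + " workers");
        }
    }
}
```

Under this reading, `2x4x8` would mean 64 workers spread over 32 processes on 8 nodes.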
DAVS Performance on Single Node
[Figure: DAVS Charge2 performance on single node; DAVS Charge6 performance on single node; DAVS Charge6 performance on single node with multiple processes. Series: OMPI-trunk Madrid, OMPI-trunk FG, MPI.NET Tempest, MPI.NET Madrid. X-axis: TxPxN (1x1x1, 1x4x1); y-axis: time (s) or time (hours).]