Technology Evaluation of Agent Teamwork High Performance Distributed Computing Middleware
Terminology
Grid vs. Cluster
A computing grid is commonly distinguished from a computing cluster
by the geographic distance between members. A cluster is a group of
computers in the same room or building, connected to the same
physical network, while the members of a grid can be located anywhere
and may be connected over several different networks.
Solomon Lane Agent Teamwork Research Assistant
October 2006 – March 2007
AgentTeamwork
Platform
I define an HPDC platform as software that provides Infrastructure and
Scheduling services. Infrastructure services include authentication and
authorization, job submission, and file transfer for job deployment.
Scheduling services include dynamic resource identification and
allocation, scheduling policies, and coordinating job execution.
Framework
I define a framework as a related set of software libraries used to
write software in a particular programming model. The Single Program
Multiple Data (SPMD) programming model is commonly used to
achieve data-level parallelism in HPDC. MPIJava is a Java
implementation of the Message Passing Interface standard, which
provides a framework for programming in the SPMD model.
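As a rough illustration of the SPMD model described above, the sketch below runs one function on several simulated ranks, each working on its own slice of the data and sending a partial result to rank 0. It uses plain Python threads and a queue as stand-ins for processes and messages; it is not the MPI or MPIJava API, and the names worker and spmd_sum are hypothetical.

```python
import queue
import threading


def worker(rank, size, data, inbox):
    """The single program that every rank runs; only `rank` differs."""
    # Each rank owns a strided slice of the shared input.
    partial = sum(data[rank::size])
    # Send the partial result as a (rank, value) message to rank 0.
    inbox.put((rank, partial))


def spmd_sum(data, size):
    """Run `worker` on `size` simulated ranks and reduce the results."""
    inbox = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(r, size, data, inbox))
        for r in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Rank 0's role: receive one message per rank and combine them.
    partials = dict(inbox.get() for _ in range(size))
    return sum(partials.values()), partials
```

Calling spmd_sum(list(range(10)), 4) splits ten numbers across four ranks and returns the total together with each rank's partial sum.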
Overview
My goal as a research assistant was to evaluate Agent Teamwork’s
“Job Dispatch & Termination” and “Framework” performance against a
contemporary alternative.
[Figure: Job Dispatch and Termination Performance, Agent Teamwork vs. Globus/OpenPBS. Panel: Breadth First. Axes: seconds versus nodes.]
Agent Teamwork
AgentTeamwork is a mobile-agent-based job coordination system that
targets a mixture of computing nodes, some directly connected to the
public Internet, and others simply clustered in a private IP domain but
not managed by a commodity job scheduler. [1]
Job Dispatch & Termination Evaluation:
I built a reference platform for comparison with Agent Teamwork by
integrating the Globus Toolkit with the OpenPBS scheduler and the
MPICH-G2 MPI framework.
The Clusters
Results:
[Figure: job dispatch and termination time, panels Phoebe Depth First and Medusa Depth First. Axes: seconds versus nodes.]
Framework Function Evaluation:
To evaluate framework performance, I wrote three benchmark programs
in both the Agent Teamwork MPI framework and the MPIJava framework
and compared their runtimes.
[Figure legend for the job dispatch and termination graphs: Agent Teamwork; Globus, median of 5 runs. Horizontal axis: nodes.]
Globus Toolkit
The Globus Toolkit is an open source software toolkit used for building
Grid systems and applications. [2]
OpenPBS
OpenPBS is the original version of the Portable Batch System. It is a
flexible batch queueing system developed for NASA in the early to
mid-1990s. [3] The purpose of the OpenPBS system is to provide additional
controls over initiating or scheduling execution of batch jobs, and to
allow routing of those jobs between different hosts. [4]
These graphs compare job dispatch and termination time when
submitting a test program to different numbers of cluster nodes in
either a depth-first or breadth-first distribution. Agent Teamwork’s
job dispatch and termination performance was comparable with the
reference platform in the depth-first distribution, and Agent
Teamwork outperformed the reference platform with a large number of
nodes in the breadth-first distribution.
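The Globus series in the graphs is labelled as a median of 5 runs. The sketch below shows one way such a measurement could be taken; the poster does not describe the actual timing harness, so the function median_runtime and the job callable (a stand-in for one dispatch-and-terminate cycle) are assumptions.

```python
import statistics
import time


def median_runtime(job, runs=5, clock=time.perf_counter):
    """Time `job` `runs` times and return the median elapsed seconds."""
    samples = []
    for _ in range(runs):
        start = clock()
        job()  # hypothetical stand-in for one dispatch-and-terminate cycle
        samples.append(clock() - start)
    return statistics.median(samples)
```

Reporting the median rather than the mean keeps a single unusually slow run from dominating the figure for a given node count.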
Message Passing Interface (MPI)
MPI is a library specification for message-passing, proposed as a
standard by a broadly based committee of vendors, implementors, and
users. MPI was designed for high performance on both massively
parallel machines and on workstation clusters. [5]
MPICH-G2
A grid-enabled implementation of the MPI v1.1 standard. Using
services from the Globus Toolkit (e.g., job startup, security), MPICH-G2
allows you to couple multiple machines, potentially of different
architectures, to run MPI applications. [6]
MPIJava
mpiJava is an object-oriented Java interface to the standard Message
Passing Interface (MPI). [7]
Reference Platform Overview
[1] Fault-Tolerant Job Execution over Multi-Clusters using Mobile agents, Munehiro Fukuda, gca07.pdf
[2] http://www.globus.org/
[3] http://www.openpbs.org/about.html
[4] Overview of the OpenPBS, http://www.openpbs.org/overview.html
[5] What is MPI, http://www-unix.mcs.anl.gov/mpi/
[6] What is MPICH-G2, http://www3.niu.edu/mpi/
[7] http://www.hpjava.org/mpiJava.html
The following tables describe the hardware that was used. There were a total of
66 machines divided into two clusters.

Phoebe Cluster
a 32-node cluster for research use

Head node:
specification                               outbound
1.8GHz Xeon x2, 512MB memory, and 70GB HD   100Mbps

Computing nodes:
#nodes  specification                            inbound
24      3.2GHz Xeon, 512MB memory, and 36GB HD   1Gbps
8       2.8GHz Xeon, 512MB memory, and 60GB HD   2Gbps

Medusa Cluster
a 32-node cluster for instructional use

Head node:
specification             outbound
1.5 memory, and 40GB HD   100Mbps

Computing nodes:
#nodes  specification                            inbound
16      1.5GHz Xeon, 512MB memory, and 30GB HD   100Mbps
16      1.5GHz Xeon, 512MB memory, and 30GB HD   1Gbps

[Figure: Framework Execution Performance, Agent Teamwork vs. MPIJava. Panel: Wave2D Depth First. Axes: seconds (logarithmic) versus nodes.]
To run a job, you write a job definition file in the Resource
Specification Language (RSL) and submit it, along with your user
certificate, using globusrun. The GRAM client submits the job to a
gatekeeper on the cluster head, which uses the GSI to authenticate
and authorize the job submission. The gatekeeper then starts a job
manager, which issues a callback to the GRAM client to connect
standard error and standard output back to the client. The job
manager then submits the job details to the PBS Server. The PBS
Scheduler selects appropriate nodes from the cluster and transfers
the executable to the PBS mom on those nodes. The PBS mom launches
the application. Applications are written in the MPICH-G2 framework,
which uses the grid infrastructure to coordinate the parallel
execution.
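To make the first step of the submission path concrete, the sketch below builds a small RSL job definition as a string. The attribute names executable, count, and jobType come from the GRAM RSL as commonly documented; the helper build_rsl and the example path are hypothetical, and the job definitions actually used in this evaluation are not shown in the poster and likely carried more attributes.

```python
def build_rsl(executable, count, job_type="mpi"):
    """Return a single-request RSL string for a GRAM job submission."""
    # Each RSL relation is written as (attribute=value); the leading
    # ampersand joins the relations into one conjunctive request.
    relations = [
        ("executable", executable),
        ("count", str(count)),
        ("jobType", job_type),
    ]
    return "&" + "".join(f"({name}={value})" for name, value in relations)


# Hypothetical example: request 8 processes for a Wave2D binary.
print(build_rsl("/home/user/wave2d", 8))
# prints: &(executable=/home/user/wave2d)(count=8)(jobType=mpi)
```

The resulting text would be saved to a file and handed to globusrun together with the user certificate, as described above.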
Framework Results:
Currently, two of the Agent Teamwork versions of the benchmark
programs cannot be run across the clusters due to outstanding bugs
in the framework. One of the benchmark programs, Wave2D, was able to
run on a limited number of nodes. The Wave2D graphs show these
partial results, which indicate that the Agent Teamwork version is
at least one order of magnitude slower than MPIJava. At this point,
however, framework debugging is ongoing.
[Figure: Framework Execution Performance, panel Wave2D Breadth First. Series in both Wave2D panels: AgentTeamwork and MPIJava. Axes: seconds (logarithmic) versus nodes.]