Transcript Document

ENHANCING PVM 3.4.4
for
LINUX CLUSTERS
Presented by
Saratnath Buddhiraju (98520)
Radhika D’souza (98530)
PVM Design Features
1. Daemons communicate using a UDP-based protocol.
2. A task communicates with its local daemon using TCP or UNIX domain sockets.
3. Applications are provided with message-passing primitives.
4. PVM daemons route the messages.
PVM System Components
Master PVM daemon
• The first PVM daemon, started up manually.
• Responsible for starting the slave daemons and forming the VM.
• Adds and deletes nodes dynamically from the VM.
• Also performs the normal daemon functions.
PVM daemon
• One per node on the cluster.
• Receives requests for VM operations from tasks running on that node.
• Daemons collectively route messages to tasks across the VM.
• Requests for VM configuration operations, such as adding new hosts, are routed to the master PVM daemon.
• Detects node failures and recovers from non-master PVM daemon deaths.
… PVM System Components
LIBPVM
• Library providing applications with an interface to the VM.
• Translates high-level PVM system calls into request messages to be sent to the VM.
• Communicates with the local PVM daemon through IPC mechanisms.
Applications
• Make calls to routines in libpvm.
• Typically spawn tasks on remote machines and communicate with them using the message-passing primitives offered by PVM.
• Completely shielded from network addresses etc., as PVM provides unique, global task identifiers.
PVM System Components …
[Diagram: three nodes on a network — the master daemon on Node 1, ordinary daemons on Nodes 2 and 3; a task on Node 2 talks to its local daemon via LIBPVM over IPC, with an optional direct connection between tasks.]
PVM design features
Most design choices reflect the design goals of
a) System scalability
b) Support for heterogeneity
1. Inter-daemon communication using UDP ensures scalability, allowing up to 100 hosts.
2. UNIX domain sockets / TCP sockets / shared memory are used for communication between LIBPVM (the application task) and the local PVM daemon.
3. Wait queues are maintained across the system for the various PVM requests, so no part of the system blocks waiting for a service to be performed for a particular task.
4. Support for nodes of varied architectures hosting different OSs to work together on a single cluster.
Limitations in PVM
1. The round-robin scheduling algorithm often results in sub-optimal utilization of cluster resources.
2. Scope for optimization in the communication subsystem.
3. Lack of parallel input makes PVM unviable for generalized distributed computing.
4. The system crashes on master PVMD failure.
5. Lack of file system support.
6. Several architecture-specific advantages go unexploited due to the need to support heterogeneous computing.
Optimizing inter - daemon communication
PVM daemons communicate using UDP.
Every datagram received needs to be acknowledged.
Send packet + explicit acknowledgement --> doubles the number of packets sent.
PVMD – PVMD packet header
{
Destination task id     4 bytes
Source task id          4 bytes
Sequence number         2 bytes
Acknowledgement         2 bytes
Flags                   1 byte
}
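The header above can be sketched as a packed C struct; the field names here are illustrative, not the identifiers used in the actual PVM source:

```c
#include <stdint.h>

/* Hypothetical layout of the pvmd-pvmd packet header described above.
   Packed so the in-memory size matches the 13-byte wire format. */
struct pvmd_pkt_hdr {
    uint32_t dst_tid;   /* destination task id, 4 bytes */
    uint32_t src_tid;   /* source task id,      4 bytes */
    uint16_t seq;       /* sequence number,     2 bytes */
    uint16_t ack;       /* acknowledgement no., 2 bytes */
    uint8_t  flags;     /* flag bits,           1 byte  */
} __attribute__((packed));
```

The fields sum to 4 + 4 + 2 + 2 + 1 = 13 bytes per packet header.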
Optimization
When a packet is received from host A …
1. Find an outgoing packet addressed to A in the global output queue.
2. If not found, find an outgoing packet in the per-host output queue.
3. If found, set the acknowledgement bit in the packet header's flags and set the acknowledgement number to the received packet's sequence number;
else
create a new explicit acknowledgement packet.
~ to be tested using the ping-pong test.
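The piggybacking steps above can be sketched roughly as follows. The queue representation, flag bit, and names are assumptions for illustration, not the actual pvmd data structures:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FL_ACK 0x01          /* illustrative ACK flag bit */

struct pkt {
    int      dst_host;       /* host this outgoing packet is addressed to */
    uint16_t ack;            /* acknowledgement number */
    uint8_t  flags;          /* flag bits */
};

/* Search a queue for an outgoing packet addressed to `host` that is not
   already carrying an acknowledgement. Returns NULL if none is found. */
static struct pkt *find_pkt(struct pkt *q, size_t n, int host) {
    for (size_t i = 0; i < n; i++)
        if (q[i].dst_host == host && !(q[i].flags & FL_ACK))
            return &q[i];
    return NULL;
}

/* On receipt of a packet with sequence number `seq` from `host`:
   piggyback the ack on a queued outgoing packet if possible.
   Returns true if piggybacked; false means the caller must create
   an explicit acknowledgement packet. */
bool ack_packet(struct pkt *global_q, size_t gn,
                struct pkt *host_q, size_t hn,
                int host, uint16_t seq) {
    struct pkt *p = find_pkt(global_q, gn, host);   /* step 1 */
    if (!p)
        p = find_pkt(host_q, hn, host);             /* step 2 */
    if (p) {                                        /* step 3 */
        p->flags |= FL_ACK;
        p->ack = seq;
        return true;
    }
    return false;
}
```

When the piggyback succeeds, the acknowledgement rides along on a packet that was going to host A anyway, halving the packet count in the best case.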
Distributed Scheduling
• Scheduler to use a Weighted Round-Robin algorithm
• Each node to be assigned a weight
Parameters to be considered
1. CPU clock speed
2. No. of processors per node
3. Available RAM
4. No. of running processes on a node
5. Priorities of running processes
Parameters ignored
1. Cache size
2. Swap space
3. Network traffic
Results of study …
The goodness of a node for scheduling a CPU-bound process is
• Directly proportional to
~ No. of CPUs on the node
~ CPU clock speed
• Inversely proportional to
~ No. of processes in the Run state
~ k^(p – pj) for each running process j, where pj is the nice priority of process j
• When the cumulative Resident Set Size of all processes exceeds 90% of total RAM, swapping ensues, thereby causing a severe slowdown in process execution time.
Distributed Scheduling …
1. PVM daemons obtain local load information from each node at predefined intervals by
~ parsing the /proc file system
~ the sysinfo() system call
2. Distribute the collected load information across the cluster to remote PVM daemons.
3. Calculate the goodness value of nodes using the collected information.
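On Linux, part of the needed load information can be pulled from /proc/loadavg, whose single line holds the three load averages, running/total task counts, and the last pid. A minimal parsing sketch (the function name is illustrative):

```c
#include <stdio.h>

/* Parse a /proc/loadavg-style line, e.g. "0.42 0.30 0.18 2/131 4321".
   Returns 1 on success, 0 if the line does not match the format. */
int parse_loadavg(const char *line, double *avg1, int *running, int *total) {
    double a5, a15;      /* 5- and 15-minute averages, parsed but unused */
    int lastpid;
    return sscanf(line, "%lf %lf %lf %d/%d %d",
                  avg1, &a5, &a15, running, total, &lastpid) == 6;
}
```

In a real daemon this line would be read from /proc/loadavg at each sampling interval and folded into the node's reported load.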
Scheduling …
The number of tasks spawned per node is proportional to its goodness value.
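Splitting a batch of tasks in proportion to goodness can be done with a largest-remainder scheme, sketched below; this is one plausible realization, not the scheduler's actual code:

```c
/* Split `ntasks` across `n` nodes in proportion to their goodness
   values; leftover tasks go to the nodes with the largest fractional
   shares (largest-remainder method). Writes the per-node counts to
   `out`, which must hold n ints. */
void allocate_tasks(const double *good, int n, int ntasks, int *out) {
    double total = 0.0;
    for (int i = 0; i < n; i++) total += good[i];
    int assigned = 0;
    for (int i = 0; i < n; i++) {
        out[i] = (int)(ntasks * good[i] / total);   /* floor of the share */
        assigned += out[i];
    }
    /* hand out the remainder one at a time to the largest remainders */
    for (int left = ntasks - assigned; left > 0; left--) {
        int best = 0;
        double bestfrac = -1.0;
        for (int i = 0; i < n; i++) {
            double frac = ntasks * good[i] / total - out[i];
            if (frac > bestfrac) { bestfrac = frac; best = i; }
        }
        out[best]++;
    }
}
```

A node with goodness 3.0 thus receives three times as many tasks as a node with goodness 1.0, subject to rounding.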
Setting up a PVM cluster …
Requirements …
1. All nodes accessible from a network
2. Support for the TCP and UDP over IP communication suites
3. C and FORTRAN compilers
4. User accounts on all the systems
Procedure …
• Build the PVM source code on each node of the cluster.
• Set environment variables PVM_ROOT, PVM_ARCH, etc. in the shell startup script.
• Allow access to the "r" commands (rsh, rlogin) without the need to type passwords (changes to be made in the PAM configuration).
Our cluster comprises …
3 Linux servers
• 172.16.5.7
• 172.16.5.8
• 172.16.5.15
to be added …
• A Linux server running on an IBM dual-processor machine
• A Solaris server
Applications
Matrix multiplication …
1. Each element can be computed independently.
2. Sub-matrices are assigned to nodes across the cluster.
3. Results are recombined to obtain the matrix product.
4. Achieved up to 40% speedup on a 2-node cluster.
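The decomposition in the steps above can be sketched as a worker routine that computes one band of rows of C = A × B; each node runs this on its assigned band and the master concatenates the results. This is an illustrative sketch of the partitioning, not the actual application code:

```c
/* Compute rows [row0, row1) of C = A * B for row-major n x n matrices.
   Each node can run this on a disjoint band of rows independently,
   since every element of C depends only on A's row and B's column. */
void mult_rows(const double *A, const double *B, double *C,
               int n, int row0, int row1) {
    for (int i = row0; i < row1; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}
```

In the PVM version, the master would spawn one task per node, send each its row band of A plus all of B, and gather the computed bands back with the message-passing primitives.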
Solving systems of linear equations ...
1. The core of the LINPACK test used in benchmarking and comparing clusters of different architectures.
References
David Browning, "Embarrassingly Parallel Benchmark under PVM", Computer Sciences Corp., NASA Ames Research Center.
" … Instead, proper load balancing is shown to be a critical issue when resource availability is not known a priori. Because the EP benchmark is computationally intensive and requires no communication, dynamic load balancing can be implemented very easily and would effectively reduce bottlenecks …"
J. Dongarra et al., "PVM: Experiences, Current Status and Future Directions", ORNL, Tennessee.
" … A distributed scheduler, for example, will be a separate module with a well-defined interface to the Concurrent Process Environment, of which PVM will form the kernel. …"
A. Beguelin, J. Dongarra, A. Geist, V. Sunderam, "A User's Guide to PVM: Parallel Virtual Machine", ORNL.
"Building Linux Clusters", O'Reilly.