MPI without tight integration

[Diagram: three nodes, each running pbs_mom. The job script runs on Node 01 inside the pbs_mom tracked session. mpirun starts two a.out processes locally and uses ssh/rsh to start two a.out processes on each of node02 and node03 through sshd/rshd, outside pbs_mom.]
<Job script>
#PBS -l select=3:ncpus=2:mpiprocs=2
…
…
mpirun -hostfile $PBS_NODEFILE a.out
PBS does not know about the processes on nodes 02 and 03, because they are started
through sshd/rshd, outside the scope of PBS.
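To make the point concrete, here is a minimal sketch that lists the hosts where a plain ssh/rsh launch would leave untracked processes. It assumes the node file lists each host once per MPI process and that the hosts are named node01 to node03 as in the diagram; the helper name remote_hosts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the unique hosts in a PBS-style node file
# other than the local one. With a plain ssh/rsh launch, these are the
# hosts whose a.out processes pbs_mom does not track.
remote_hosts() {
    nodefile=$1
    local_host=$2
    sort -u "$nodefile" | grep -v -x -F -- "$local_host"
}

# Demo node file as it might look for select=3:ncpus=2:mpiprocs=2
# (hostnames are assumptions taken from the diagram).
demo_nodefile=$(mktemp)
printf '%s\n' node01 node01 node02 node02 node03 node03 > "$demo_nodefile"
remote_hosts "$demo_nodefile" node01
rm -f "$demo_nodefile"
```

Inside a real job the same function could be called as remote_hosts "$PBS_NODEFILE" "$(hostname)", provided the node file uses the same host name form that hostname prints.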
MPI with pbs_remsh/pbs_attach
[Diagram: three nodes, each running pbs_mom. The job script runs on Node 01 inside the pbs_mom tracked session. mpirun uses ssh/rsh to reach sshd/rshd on node02 and node03, where each a.out process is started through pbs_attach so that the local pbs_mom tracks it.]
<Job script>
#PBS -l select=3:ncpus=2:mpiprocs=2
…
…
mpirun -r pbs_remsh -hostfile $PBS_NODEFILE a.out
pbs_remsh (see inside it) launches something like "ssh nodeXX pbs_attach -j JOBID
a.out", which informs pbs_mom on that machine that the a.out process being launched
belongs to JOBID.
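The command line quoted above can be sketched as a dry run. This is not the actual pbs_remsh script, only an illustration of the shape of the command it builds; the function name attach_cmd and the job ID are hypothetical.

```shell
#!/bin/sh
# Hypothetical dry run: print the command line that an rsh-style wrapper
# in the spirit of pbs_remsh would execute for one remote process.
# Usage: attach_cmd HOST JOBID COMMAND [ARGS...]
attach_cmd() {
    host=$1
    jobid=$2
    shift 2
    # pbs_attach -j JOBID tells the remote pbs_mom which job the command belongs to
    printf 'ssh %s pbs_attach -j %s %s\n' "$host" "$jobid" "$*"
}

# Illustrative job ID, not a real one.
attach_cmd node02 1234.server ./a.out
```

Printing the command instead of running it keeps the sketch usable on a machine without PBS installed.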
MPI with pbs_tmrsh
[Diagram: three nodes, each running pbs_mom. The job script runs on Node 01 inside the pbs_mom tracked session. mpirun calls pbs_tmrsh for node02 and node03; no sshd/rshd is involved, and pbs_mom on each node starts the two a.out processes itself.]
<Job script>
#PBS -l select=3:ncpus=2:mpiprocs=2
…
…
mpirun -r pbs_tmrsh -hostfile $PBS_NODEFILE a.out
pbs_tmrsh talks directly to pbs_mom using the PBS task management library, so the
a.out processes are launched by pbs_mom itself.
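Because pbs_tmrsh depends on pbs_mom, it is only meaningful inside a PBS job. A job script that is also run by hand might therefore pick its launcher as in the sketch below. The function name choose_launcher and the fallback to ssh are assumptions for illustration, not part of PBS.

```shell
#!/bin/sh
# Hypothetical helper: use pbs_tmrsh when running inside a PBS job and the
# command is on PATH; otherwise fall back to plain ssh.
choose_launcher() {
    if [ -n "${PBS_JOBID:-}" ] && command -v pbs_tmrsh >/dev/null 2>&1; then
        echo pbs_tmrsh
    else
        echo ssh
    fi
}

# Dry run: show the rsh-style command that would start a.out on node02.
echo "$(choose_launcher) node02 a.out"
```

With the fallback, processes started outside a job are simply untracked, which matches the first slide's situation.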