Work Package 1 Middleware
David Colling,
Imperial College London,
[email protected]
I shall talk about release 2 as you should all
know about release 1 by now!
With thanks …
Many slides have been taken from Massimo
Sgaravatto and were used in the EU review
The architecture
Completely new architecture:
The workload management system has been refactored to streamline the flow of job information, thereby addressing problems and shortcomings found with release 1.x.
The re-factored components also provide hooks and features to support new functionality.
The best description is in deliverable D1.4.
The architecture
[Architecture diagram: state at the time of the review — see D1.4 for details…]
The architecture
User Interface:
Although there have been several changes to
the architecture, the commands available at
the user end are the same… so the new
architecture looks the same to the users.
Network Server
The Network Server is a generic network
daemon, responsible for accepting incoming
requests from the UI (e.g. job submission, job
removal), which, if valid, are then passed to
the Workload Manager.
The architecture
Workload manager:
The Workload Manager is the core
component of the Workload Management
System. Given a valid request, it has to
take the appropriate actions to satisfy it.
To do so, it may need support from other
components, which are specific to the
different request types.
The architecture
Resource Broker:
This has been turned into one of the modules that help the Workload Manager; it actually consists of three sub-modules:
• Matchmaking
• Ranking
• Scheduling
Job Adapter
The Job Adapter puts the finishing touches to the job’s JDL and creates the job wrapper.
The architecture
Job Controller and CondorG:
These actually submit the job to the resources and track its progress.
So how does this all work…
Job submission example (for a “simple” job)
[Diagram: the RB node hosts the Network Server, the Workload Manager and the Job Controller/CondorG. It talks to the UI, the Replica Catalog, the Information Service (which publishes CE and SE characteristics & status), the Computing Elements and the Storage Elements. The job status is tracked throughout.]
edg-job-submit myjob.jdl

myjob.jdl:
JobType = “Normal”;
Executable = "$(CMS)/exe/sum.exe";
InputData = "LF:testbed0-00019";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
InputSandbox = {"/home/user/WP1testC", "/home/file*”, "/home/user/DATA/*"};
OutputSandbox = {“sim.err”, “test.out”, “sim.log"};
Requirements = other.GlueHostOperatingSystemName == “linux" &&
               other.GlueHostOperatingSystemRelease == "Red Hat 6.2“ &&
               other.GlueCEPolicyMaxWallClockTime > 10000;
Rank = other.GlueCEStateFreeCPUs;
Job submission (step by step)

1. UI: allows users to access the functionalities of the WMS. The job is described in the Job Description Language (JDL), which specifies the job characteristics and requirements. Job status: submitted.
2. NS: network daemon responsible for accepting incoming requests. The Input Sandbox files are copied from the UI to the RB storage. Job status: waiting.
3. WM: responsible for taking the appropriate actions to satisfy the request.
4. Matchmaker/Broker: responsible for finding the “best” CE where to submit the job. Where must this job be executed? It asks the Replica Catalog “where are (on which SEs) the needed data?” and the Information Service “what is the status of the Grid?” (CE and SE characteristics & status), and then makes the CE choice.
5. JA (Job Adapter): responsible for the final “touches” to the job before performing submission (e.g. creation of the wrapper script, etc.). Job status: ready.
6. JC (Job Controller): responsible for the actual job management operations (done via CondorG). Job status: scheduled. The Input Sandbox files are staged to the chosen Computing Element and the job starts there. Job status: running. While running, the job performs “Grid enabled” data transfers/accesses to the Storage Element.
7. When the job finishes, the Output Sandbox files are transferred back to the RB storage. Job status: done.
8. edg-job-get-output <dg-job-id>: the Output Sandbox files are retrieved from the RB storage to the UI. Job status: cleared.
Logging and bookkeeping

edg-job-status <dg-job-id>

LB (Logging & Bookkeeping): receives and stores job events; processes the corresponding job status.
LM (Log Monitor): parses the CondorG log file (where CondorG logs info about jobs) and notifies the LB.

[Diagram: the Network Server, Workload Manager and Job Controller/CondorG on the RB node log job events on the way to the Computing Element; the Log Monitor reads the log of job events and feeds the Logging & Bookkeeping service, which the UI queries with edg-job-status.]
Timescales and functionality…
Release 2 of the WP1 software is due at J+27.
New functionality to include:
• MPI job submission
• User APIs
• Accounting infrastructure
• Interactive job support
• Job logical checkpointing
Pretty much on time (says Massimo).
It will be tested against non-EDG resources using GLUE.
I don’t have time to go through all of these, so I will just go through checkpointing (as this was shown in the review).
Job Checkpointing
Job checkpoint states are saved in the LB server: the job saves its state with state.saveState(), and the checkpoint is retrieved when the job (re)starts.
The LB is also used (even in release 1) as the repository of job status info.
It has already proved to be robust and reliable, and the load can be distributed between multiple LB servers to address scalability problems.
Job checkpointing scenario
[Diagram: as before — UI, RB node (Network Server, Workload Manager, Job Controller/CondorG) and Logging & Bookkeeping Server — but now with two Computing Elements, X and Y. The job status is tracked throughout.]
edg-job-submit jobchkpt.jdl

jobchkpt.jdl:
[JobType = “Checkpointable”;
 Executable = "hsum.exe";
 StdOutput = “Outfile”;
 InputSandbox = "/home/user/hsum.exe”;
 OutputSandbox = “Outfile”;
 Requirements = member("ROOT", other.GlueHostApplicationSoftwareRunTimeEnvironment) &&
                member("CHKPT", other.GlueHostApplicationSoftwareRunTimeEnvironment);
 Rank = -other.GlueCEStateEstimatedResponseTime;]
Job checkpointing scenario (step by step)

1. The job is submitted from the UI; the JDL specifies a Checkpointable job. Job status: submitted.
2. As in the normal case, the Input Sandbox files are copied to the RB storage, the Matchmaker chooses a CE, the Job Adapter prepares the job, and the Job Controller/CondorG submits it — here to Computing Element X. Job status: waiting → ready → scheduled → running.
3. From time to time the user’s job asks to save its intermediate state:

   …
   <save intermediate files>;
   State.saveValue(“var1”, value1);
   …
   State.saveValue(“varn”, valuen);
   State.saveState();
   …

   The intermediate files are saved (e.g. on a Storage Element) and the job state is saved in the LB server.
4. The job fails (e.g. because of a CE problem). Job status: done (failed).
5. The Workload Manager reschedules and resubmits the job. Where must this job be executed? Possibly on a CE different from the one where the job was previously submitted. Job status: waiting.
6. CE choice: CE Y. The Job Adapter prepares the job again and it is submitted, with its Input Sandbox files, to Computing Element Y. Job status: ready → scheduled.
7. When the job starts on CE Y it retrieves the last saved state and the previously saved intermediate files. Job status: running.
8. The job keeps running, starting from the point corresponding to the retrieved state (it doesn’t need to start from the beginning).
Job checkpointing example

Example of application (e.g. a HEP Monte Carlo simulation):

int main ()
{
  …
  for (int i=event; i < EVMAX; i++)
    { <process event i>; }
  …
  exit(0);
}

The same application instrumented to exploit the checkpointing framework:

#include "checkpointing.h"
int main ()
{
  JobState state(JobState::job);
  event = state.getIntValue("first_event");
  PFN_of_file_on_SE = state.getStringValue("filename");
  …
  var_n = state.getBoolValue("var_n");
  <copy file_on_SE locally>;
  …
  for (int i=event; i < EVMAX; i++)
  {
    <process event i>;
    …
    state.saveValue("first_event", i+1);
    <save intermediate file on a SE>;
    state.saveValue("filename", PFN of file_on_SE);
    …
    state.saveValue("var_n", value_n);
    state.saveState();
  }
  …
  exit(0);
}

• User code must be easily instrumented in order to exploit the checkpointing framework.
• The user defines what a state is: it is defined as <var, value> pairs, and must be “enough” to restart a computation from a previously saved state.
• The user can save the state of the job from time to time.
• At start-up the last saved state is retrieved, and the job can restart from that point.
Further additional functionality
The order of implementation is not up to the WP1 people…
Dependent jobs:
Using Condor DAGMan.
Uses the same JDL as normal jobs… for example:
Further additional functionality
A=[
Executable = "A.sh";
PreScript = "PreA.sh";
PreScriptArguments = { "1" };
Children = { "B", "C" }
];
B=[
Executable = "B.sh";
PostScript = "PostA.sh";
PostScriptArguments = { "$RETURN" };
Children = { "D" }
];
C=[
Executable = "C.sh";
Children = { "D" }
];
D=[
Executable = "D.sh";
PreScript = "PreD.sh";
PostScript = "PostD.sh";
PostScriptArguments = { "1", "a" }
]
Further additional functionality
Job partitioning will be similar to checkpointing, with the jobs being partitioned according to some variable.
Partitioned jobs will also have a pre-job and an aggregator, e.g.:
Further additional functionality
JobType = "Partitionable";
Executable = ...;
JobSteps = ...;
StepWeight = ...;
Requirements = ...;
...
Prejob =
[
  Executable = ...;
  Requirements = ...;
  ...
];
Aggregator =
[
  Executable = ...;
  Requirements = ...;
  ...
];
Further additional functionality
There will also be advanced reservation of
resources and co-location.
Further into the future…
EDG will not use OGSA; however, the future is in the OGSA grid world.
Work is being done at LeSC (see Steven Newhouse’s talk tomorrow) to wrap the WP1 components:
• Communication via JDML and LBML
• Virtualisation of the RB through an OGSA factory
• Use of virtualisation to load balance
• Increased interoperability
Summary
• The workload management middleware is being refactored, addressing shortcomings found in releases 1.x.
• This allows additional functionality to be incorporated easily, and the components to be used by other projects.
• OGSA is the future, and work is being done to allow the WP1 components to work in such a world.