Workload Management
David Colling
Imperial College London
• Release 2 is not based on release 1
• Whole new architecture (pretty much
described in D1.4)
• More modular
• I have little practical experience of this
new architecture (yet).
So what is the new architecture?
See D1.4 for details…
The architecture
User Interface:
Although there have been several changes to the architecture, the commands available at the user end are (almost) the same… now edg-job-submit etc.
There are also now APIs.
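For reference, the basic user-side cycle runs through the three commands that appear later in this talk (a sketch only; each command takes further options):

```
edg-job-submit myjob.jdl        # submit a job described in JDL; returns a dg-job-id
edg-job-status <dg-job-id>      # query the job status (via Logging & Bookkeeping)
edg-job-get-output <dg-job-id>  # retrieve the Output Sandbox once the job is done
```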
Network Server
The Network Server is a generic network
daemon, responsible for accepting incoming
requests from the UI (e.g. job submission, job
removal), which, if valid, are then passed to
the Workload Manager.
The architecture
Workload manager:
The Workload Manager is the core
component of the Workload Management
System. Given a valid request, it has to
take the appropriate actions to satisfy it.
To do so, it may need support from other
components, which are specific to the
different request types.
The architecture
Resource Broker:
This has been turned into one of the modules that help the Workload Manager; it is actually three sub-modules:
• Matchmaking
• Ranking
• Scheduling
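To make the division of labour concrete, here is a minimal sketch (illustrative only, not WP1 code) of how matchmaking and ranking combine: Requirements act as a filter over the CEs advertised by the Information Service, and Rank orders the survivors. The CE attribute names below are simplified stand-ins for the Glue schema attributes used in the JDL examples later.

```python
# Illustrative sketch (not the real WP1 broker): matchmaking then ranking.
def match_and_rank(ces, requirements, rank):
    """Return the CEs satisfying `requirements`, best-ranked first."""
    eligible = [ce for ce in ces if requirements(ce)]   # matchmaking
    return sorted(eligible, key=rank, reverse=True)     # ranking

# Hypothetical CE adverts (simplified Glue-style attributes).
ces = [
    {"name": "ce1", "os": "linux",   "max_wall": 20000, "free_cpus": 4},
    {"name": "ce2", "os": "linux",   "max_wall": 5000,  "free_cpus": 9},
    {"name": "ce3", "os": "solaris", "max_wall": 30000, "free_cpus": 2},
]

# Mirrors: Requirements = OS == "linux" && MaxWallClockTime > 10000
requirements = lambda ce: ce["os"] == "linux" and ce["max_wall"] > 10000
# Mirrors: Rank = FreeCPUs
rank = lambda ce: ce["free_cpus"]

best = match_and_rank(ces, requirements, rank)
print([ce["name"] for ce in best])   # only ce1 survives the requirements
```

Scheduling then decides when the chosen CE actually receives the job.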
Job Adapter
The Job Adapter puts the finishing touches to the job's JDL and creates the job wrapper.
The architecture
Job Controller and CondorG
These actually submit the job to the resources and track its progress.
So how does this all work…
Job submission example (for a “simple” job)
[Diagram: components on the RB node (Network Server, Workload Manager, Job Controller/CondorG) together with the UI, Replica Catalog, Information Service (CE and SE characteristics & status), Computing Element and Storage Element; the job status is tracked throughout.]
edg-job-submit myjob.jdl
Myjob.jdl:

JobType = "Normal";
Executable = "$(CMS)/exe/sum.exe";
InputData = "LF:testbed0-00019";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g,dc=cnaf,dc=infn,dc=it";
DataAccessProtocol = "gridftp";
InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.GlueHostOperatingSystemName == "linux" &&
               other.GlueHostOperatingSystemRelease == "Red Hat 6.2" &&
               other.GlueCEPolicyMaxWallClockTime > 10000;
Rank = other.GlueCEStateFreeCPUs;
Job submission
[Diagram: the UI submits the job to the Network Server on the RB node; job status: submitted.]
UI: allows users to access the functionalities of the WMS.
Job Description Language (JDL) is used to specify job characteristics and requirements.
NS: network daemon responsible for accepting incoming requests.
Job submission
[Diagram: the Input Sandbox files are transferred from the UI to RB storage; job status: submitted → waiting.]
Job submission
[Diagram: the Network Server passes the job to the Workload Manager; job status: waiting.]
WM: responsible for taking the appropriate actions to satisfy the request.
Job submission
[Diagram: the Workload Manager consults the Matchmaker: "Where must this job be executed?"; job status: waiting.]
Job submission
[Diagram: the MatchMaker/Broker runs inside the Workload Manager; job status: waiting.]
Matchmaker: responsible for finding the "best" CE to which to submit a job.
Job submission
[Diagram: the MatchMaker/Broker queries the Replica Catalog ("Where are (on which SEs) the needed data?") and the Information Service ("What is the status of the Grid?"); job status: waiting.]
Job submission
[Diagram: the Matchmaker returns its CE choice to the Workload Manager; job status: waiting.]
Job submission
[Diagram: the Workload Manager passes the job to the Job Adapter; job status: waiting.]
JA: responsible for the final "touches" to the job before performing submission (e.g. creation of the wrapper script, etc.).
Job submission
[Diagram: the job passes to the Job Controller; job status: waiting → ready.]
JC: responsible for the actual job management operations (done via CondorG).
Job submission
[Diagram: CondorG submits the job to the chosen Computing Element, transferring the Input Sandbox files from RB storage; job status: ready → scheduled.]
Job submission
[Diagram: the job runs on the Computing Element, performing "Grid enabled" data transfers/accesses to the Storage Element; job status: scheduled → running.]
Job submission
[Diagram: on completion, the Output Sandbox files are transferred from the Computing Element to RB storage; job status: running → done.]
Job submission
edg-job-get-output <dg-job-id>
[Diagram: the user requests the job output from the UI, through the Network Server; job status: done.]
Job submission
[Diagram: the Output Sandbox files are transferred from RB storage back to the UI; job status: done → cleared.]
Logging and bookkeeping
edg-job-status <dg-job-id>
[Diagram: the UI queries job status through the Network Server; the Workload Manager, Job Controller/CondorG and Log Monitor feed job events to the Logging & Bookkeeping service, and the Log Monitor watches the CondorG log of job events on the Computing Element.]
LB: receives and stores job events; processes the corresponding job status.
LM: parses the CondorG log file (where CondorG logs info about jobs) and notifies the LB.
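The status values shown on the preceding slides form a simple forward-moving sequence. A minimal sketch (illustrative only; the real LB is event-based and more elaborate) of that progression, assuming out-of-order events are simply ignored:

```python
# Illustrative sketch: the job status progression tracked by L&B.
STATES = ["submitted", "waiting", "ready", "scheduled", "running", "done", "cleared"]

def advance(current, event_state):
    """Move the status forward; ignore events that would move it backwards."""
    if STATES.index(event_state) <= STATES.index(current):
        return current
    return event_state

status = "submitted"
for ev in ["waiting", "ready", "scheduled", "running", "done", "cleared"]:
    status = advance(status, ev)
print(status)  # cleared
```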
New functionality…
Release 2 of WP 1 software
New functionality includes:
• MPI job submission
• User APIs
• Accounting infrastructure (management have decided not to deploy this for testbed 2)
• Interactive job support
• Job logical checkpointing
New functionality…
All these are implemented…
Specify which sort of job using the JobType classad, e.g.
JobType = "Checkpointable"
However, these have only been tested on the WP 1 testbed as yet…
I don't have time to go through all of these, so I will just go through checkpointing.
Job checkpointing scenario
[Diagram: checkpointing setup — UI, Network Server, Workload Manager, Job Controller/CondorG and Logging & Bookkeeping Server on the RB node, plus two Computing Elements, X and Y; the job status is tracked throughout.]
edg-job-submit jobchkpt.jdl
jobchkpt.jdl:

[JobType = "Checkpointable";
Executable = "hsum.exe";
StdOutput = "Outfile";
InputSandbox = "/home/user/hsum.exe";
OutputSandbox = "Outfile";
Requirements = member("ROOT", other.GlueHostApplicationSoftwareRunTimeEnvironment) &&
               member("CHKPT", other.GlueHostApplicationSoftwareRunTimeEnvironment);
Rank = -other.GlueCEStateEstimatedResponseTime;]
[Diagram: the UI submits the job to the Network Server on the RB node; job status: submitted. The UI and JDL play the same roles as in the simple-job example.]
[Diagram: the job follows the numbered submission steps (1–6) as before: UI to Network Server, Input Sandbox files to RB storage, Workload Manager/Matchmaker, Job Adapter, Job Controller/CondorG, and finally Computing Element X; job status: submitted → waiting → ready → scheduled → running.]
[Diagram: the job is running on Computing Element X; job status: submitted → waiting → ready → scheduled → running.]
From time to time the user's job asks to save its intermediate state:
…
<save intermediate files>;
State.saveValue("var1", value1);
…
State.saveValue("varn", valuen);
State.saveState();
…
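The saveValue/saveState pattern above can be sketched as follows. This is illustrative only (the WP1 checkpointing API is not Python, and the in-memory dict here merely stands in for the Logging & Bookkeeping Server); the point is that values are staged locally and then persisted as one snapshot.

```python
# Illustrative sketch of the checkpoint State API shown on the slide.
class State:
    def __init__(self, backend):
        self._backend = backend          # stands in for the L&B server
        self._pending = dict(backend)    # working copy of the last snapshot

    def saveValue(self, name, value):
        self._pending[name] = value      # staged, not yet persisted

    def saveState(self):
        self._backend.clear()
        self._backend.update(self._pending)  # persist the whole snapshot

    def getValue(self, name, default=None):
        return self._backend.get(name, default)

lb = {}                        # pretend L&B store
state = State(lb)
state.saveValue("var1", 10)
state.saveValue("iteration", 42)
state.saveState()
print(lb["iteration"])         # 42
```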
[Diagram: the intermediate files are saved, and the job state is saved to the Logging & Bookkeeping Server; job status: running.]
[Diagram: the job fails on Computing Element X (e.g. due to a CE problem); job status: running → done (failed).]
[Diagram: the Workload Manager reschedules and resubmits the job; the Matchmaker is consulted again: "Where must this job be executed? Possibly on a different CE from the one where it was previously submitted…"; job status: done (failed) → waiting.]
[Diagram: the CE choice this time is Computing Element Y; job status: waiting.]
[Diagram: the Job Adapter prepares the job again and passes it to the Job Controller; job status: waiting → ready.]
[Diagram: CondorG submits the job to Computing Element Y, transferring the Input Sandbox files; job status: ready → scheduled.]
[Diagram: when the job starts on Computing Element Y it retrieves the last saved state from the Logging & Bookkeeping Server, together with the previously saved intermediate files; job status: scheduled → running.]
[Diagram: the job keeps running, starting from the point corresponding to the retrieved state (it doesn't need to start from the beginning); job status: running.]
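Resuming from the retrieved state can be sketched like this (illustrative only; `run`, `next_step` and the dict standing in for the L&B store are hypothetical names, not the WP1 API). The loop skips the iterations already completed before the failure:

```python
# Illustrative sketch: resume a loop from the last checkpointed step.
def run(total_steps, saved_state, checkpoint_every=3):
    start = saved_state.get("next_step", 0)   # retrieved state; 0 on first run
    done = []
    for step in range(start, total_steps):
        done.append(step)                     # the real work would go here
        if (step + 1) % checkpoint_every == 0:
            saved_state["next_step"] = step + 1   # save intermediate state
    saved_state["next_step"] = total_steps
    return done

lb = {}                  # stands in for the L&B server
first = run(10, lb)      # first attempt processes steps 0..9
lb["next_step"] = 6      # pretend the job failed after checkpointing step 6
resumed = run(10, lb)    # resubmitted job resumes at step 6, not 0
print(resumed)           # [6, 7, 8, 9]
```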
Further additional functionality
The order of implementation is not up to WP 1
people…
Dependent jobs:
Using Condor DAGMan
For example…
Further additional functionality
A=[
Executable = "A.sh";
PreScript = "PreA.sh";
PreScriptArguments = { "1" };
Children = { "B", "C" }
];
B=[
Executable = "B.sh";
PostScript = "PostA.sh";
PostScriptArguments = { "$RETURN" };
Children = { "D" }
];
C=[
Executable = "C.sh";
Children = { "D" }
];
D=[
Executable = "D.sh";
PreScript = "PreD.sh";
PostScript = "PostD.sh";
PostScriptArguments = { "1", "a" }
]
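The Children declarations above define a DAG: A must finish before B and C, which must both finish before D. A minimal sketch (illustrative only, not DAGMan itself) of the execution order those declarations imply:

```python
# Illustrative sketch: topological execution order implied by the
# Children attributes in the JDL above (Kahn's algorithm).
def topo_order(children):
    indeg = {n: 0 for n in children}
    for kids in children.values():
        for k in kids:
            indeg[k] += 1
    order = []
    ready = sorted(n for n, d in indeg.items() if d == 0)
    while ready:
        n = ready.pop(0)
        order.append(n)
        for k in children[n]:
            indeg[k] -= 1
            if indeg[k] == 0:
                ready.append(k)
        ready.sort()
    return order

dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(topo_order(dag))   # ['A', 'B', 'C', 'D']
```

DAGMan additionally runs each node's PreScript before, and PostScript after, the node's executable.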
Further additional functionality
Job partitioning will be similar to checkpointing,
with the jobs being partitioned according to some
variable.
Partitioned jobs will also have a pre-job and an aggregator, e.g.
Further additional functionality
JobType = "Partitionable";
Executable = ...;
JobSteps = ...;
StepWeight = ...;
Requirements = ...;
...
Prejob =
[
Executable = ...;
Requirements = ...;
...
];
Aggregator =
[
Executable = ...;
Requirements = ...;
...
];
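The idea behind JobSteps and the aggregator can be sketched as follows (illustrative only; the splitting scheme and function names are hypothetical, not the WP1 implementation): the steps are divided among sub-jobs, each sub-job does its share, and the aggregator combines the partial outputs.

```python
# Illustrative sketch: partition a job's steps into sub-jobs, then aggregate.
def partition(steps, n_subjobs):
    """Split the step list into n roughly equal chunks, one per sub-job."""
    size, rem = divmod(len(steps), n_subjobs)
    chunks, i = [], 0
    for j in range(n_subjobs):
        take = size + (1 if j < rem else 0)
        chunks.append(steps[i:i + take])
        i += take
    return chunks

def aggregate(partial_results):
    """The aggregator combines the sub-jobs' outputs into one result."""
    return sum(partial_results, [])

steps = list(range(10))                                    # JobSteps = 10
subjobs = partition(steps, 3)                              # three sub-jobs
partials = [[s * s for s in chunk] for chunk in subjobs]   # each sub-job's work
result = aggregate(partials)
print(result == [s * s for s in steps])   # True: same answer as one big job
```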
Further additional functionality
Also planned is advanced reservation of
resources and co-location.
Much more monitoring and performance
quantification…
Summary
• New architecture has been implemented
• Lots of new functionality… but not stress tested
• Further functionality and performance quantification to be implemented by testbed 3
Further into the future…
EDG will not use OGSA; however, the future is in the OGSA grid world.
Work is being done at LeSC (see Steven Newhouse's talk tomorrow) to wrap the WP 1 components:
• Communication via JDML and LBML
• Virtualisation of the RB through an OGSA factory
• Use virtualisation to load balance
• Increase interoperability