Schedulers
Slides for Grid Computing: Techniques and Applications by Barry Wilkinson, Chapman & Hall/CRC press, © 2009.
Chapter 3, pp. 65-99. For educational use only. All rights reserved.
Aug 26, 2009
3-1.1
Job Schedulers
• Assigns work (jobs) to compute resources to
meet specified job requirements within
constraints of available resources and their
characteristics
• An optimization problem.
• Objective usually to maximize throughput of
jobs.
3-1.2
Job scheduler
Fig 3.1
3-1.3
Scheduling Policies
Some traditional scheduling policies:
• First-in, first-out (longest waiting job)
• Favor certain types of jobs
• Shortest job first
• Smallest (or largest) memory first
• Short (or long) running job first
• Priority based
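The policies listed above differ mainly in how the queue of waiting jobs is ordered. A minimal Python sketch, assuming a hypothetical Job record whose fields (submit_time, est_runtime, memory_mb, priority) are invented for illustration and do not correspond to any particular scheduler:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    submit_time: int   # hypothetical: seconds since the queue opened
    est_runtime: int   # hypothetical: estimated run time in seconds
    memory_mb: int
    priority: int      # assumption: larger means more important

# Each policy is expressed as a sort key over the waiting jobs.
POLICIES = {
    "fifo": lambda j: j.submit_time,            # longest waiting job first
    "shortest_job_first": lambda j: j.est_runtime,
    "smallest_memory_first": lambda j: j.memory_mb,
    "priority": lambda j: -j.priority,          # highest priority first
}

def order_queue(jobs, policy):
    """Return jobs in the order the named policy would dispatch them."""
    return sorted(jobs, key=POLICIES[policy])

queue = [
    Job("render", submit_time=0, est_runtime=3600, memory_mb=4096, priority=1),
    Job("backup", submit_time=10, est_runtime=60, memory_mb=512, priority=5),
    Job("solver", submit_time=20, est_runtime=600, memory_mb=8192, priority=3),
]

for policy in POLICIES:
    print(policy, [j.name for j in order_queue(queue, policy)])
```

Real schedulers usually combine several such criteria; the sketch keeps them separate to show the effect of each.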
3-1.4
Job and resource matching
• Useful for a distributed heterogeneous computing
platform such as a Grid platform
• Found in schedulers we will describe
• Requires the characteristics of both the job and the
resources to be described.
• For dynamic characteristics such as resource
load, a mechanism is necessary for reporting
them.
3-1.5
Types of jobs
1. Named executable that can execute on target
resources, possibly with named input and output
files.
2. OS (Linux) commands
3. Scripts
4. Programs that first need compiling.
– For Java, executable is JVM, input is java class file
5. Array jobs - multiple instances of same job
executable
6. Workflow – series of interdependent jobs
3-1.6
Batch jobs
• Most jobs expected to be batch jobs.
(Interactive jobs possible.)
• One expected type of job is the long-running
unattended batch job.
• Standard input, standard output and
standard error often redirected to files.
3-1.7
Types of Compute Resources
• Usually local compute resources consist of
individual computers, sometimes hundreds of
computers, connected together in a cluster.
• Such clusters have been around for many years
• Schedulers designed to handle cluster
configurations
3-1.8
Typical cluster configuration
Fig 3.2
3-1.9
Scheduling Compute Resources
Resource characteristics scheduler will consider:
• Static characteristics of machines
- Processor type, speed, number of cores, threads
- Main memory, cache memory, ...
• Dynamic machine conditions:
- Load on machine
- Available disk storage
- Network load
• Network connections and characteristics
• Characteristics of job:
- Code size, data, expected exec. time, memory requirements.
- Location of input files, output files, input/output staging
• User preferences/requirements
3-1.10
Monitoring Job Progress
Schedulers monitor job progress and report back to user.
Typically, a job exists in one of various states as it goes through
processing, e.g.:
Fig 3.3
3-1.11
Scheduler with automatic data placement
components
(Input/output staging)
e.g. Stork
Fig 3.4
3-1.12
Fault Tolerance
Checkpoint concept
Often available in schedulers
Fig. 3.5
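The checkpoint concept can be sketched in a few lines of Python: the job saves its state periodically and, when restarted, resumes from the last saved state instead of from the beginning. This is an application-level illustration only; the checkpoint file name, the state layout, the interval, and the fail_at test hook are all assumptions.

```python
import json
import os
import tempfile

def run_job(total_steps, ckpt_path, checkpoint_every=10, fail_at=None):
    """Sum 0..total_steps-1, writing a checkpoint every few steps.

    fail_at simulates a crash at that step (hypothetical test hook).
    """
    # Resume from the last checkpoint if one exists.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "total": 0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated machine crash")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            with open(ckpt_path, "w") as f:
                json.dump(state, f)
    return state["total"]

ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run_job(100, ckpt, fail_at=57)   # crashes after the step-50 checkpoint
except RuntimeError:
    pass
result = run_job(100, ckpt)          # resumes at step 50, not step 0
print(result)
```

Work done between the last checkpoint and the crash (steps 50 to 56 here) is repeated, which is the usual trade-off when choosing the checkpoint interval.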
3-1.13
Advance reservation
• Term used for requesting actions at times
in future
• In this context, requesting a job to start at
some time in the future.
• Both computing resources and network
resources are involved
• The network connection, usually the
Internet, is not reserved.
• Found in recent schedulers
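A reservation request for a future time slot can only be granted if the slot does not collide with one already granted. A minimal sketch in Python, assuming a single resource and times given as plain numbers (hours, say); the class and method names are hypothetical and not taken from any scheduler:

```python
class ReservationCalendar:
    """Accepts a reservation only if it does not overlap an existing one."""

    def __init__(self):
        self.booked = []  # list of (start, end) half-open intervals

    def reserve(self, start, end):
        if start >= end:
            raise ValueError("start must be before end")
        for s, e in self.booked:
            if start < e and s < end:   # the two intervals overlap
                return False
        self.booked.append((start, end))
        return True

cal = ReservationCalendar()
first = cal.reserve(9, 12)     # resource is free
second = cal.reserve(11, 13)   # overlaps the 9-12 reservation
third = cal.reserve(12, 14)    # starts exactly when the first one ends
print(first, second, third)
```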
3-1.14
Reasons one might want advance
reservation in Grid computing
• Reserved time chosen to reduce network or resource
contention.
• Resources not physically available except at certain times.
• Jobs require access to a collection of resources
simultaneously, e.g. data generated by experimental
equipment.
• A deadline for results of work
• Parallel programming jobs in which jobs must communicate
between themselves during execution.
• Workflow tasks in which jobs must communicate between
themselves during execution.
6d-1.15
• Without advance reservation, schedulers will
schedule jobs from a queue with no guarantee of when
they will actually run.
Synchronization
• It is critical that distributed resources are synchronized, i.e. they
all “see” the same time.
• Synchronization can be achieved by running a
Network Time Protocol (NTP) daemon
synchronizing time with a public time server.
6d-1.16
Resource Broker
Intermediary between user and resources that negotiates for
resources and brokers agreement. May involve negotiating cost.
Fig 3.6
3-1.17
Quiz
Give one reason why a scheduler or
resource broker is used in conjunction with
Globus:
(a)Globus does not provide the ability to
submit jobs.
(b)Globus does not provide the ability to make
advance reservations.
(c) No reason whatsoever.
(d) Globus does not provide the ability to
transfer files.
3-1.18
In the context of schedulers, what is meant
by the term “Advance Reservation”?
(a) Requesting an advance.
(b) Submitting a more advanced job.
(c) Move onto the next job.
(d) Requesting actions at a future time.
3-1.19
Scheduler Examples
• Sun Grid Engine
• Condor/Condor-G
3-1.20
(Sun) Grid Engine
Has all the usual features of a job scheduler
including:
• Various scheduling algorithms
• Job and machine matching
• Multiple job queues
• Checkpointing for fault tolerance and job migration
• Multiple array jobs
• Parallel job support
• Advance reservation (from SGE version 6.2)
• Accounting
• Command line and GUI interfaces
• DRMAA interface (see later)
3-1.21
SGE Command line interface
3-1.22
6d-1.23
qsub
Command to submit job.
Job specified as (shell) script, or as named
executable with the -b yes option:
Example
qsub -b y uptime
The immediate output of form:
Your job 238 ("uptime") has been submitted.
6d-1.24
qstat
Display status of jobs.
qstat issued after previous qsub might produce display:
qw indicates waiting in queue.
Once job completed, qstat will display nothing.
By default, standard output and standard error are redirected to
files named
<job_name>.o<job_ID> and <job_name>.e<job_ID>.
6d-1.25
Grid Engine
Graphical User Interface
• Started with qmon command.
Fig. 3.7
3-1.26
Grid Engine job submission GUI interface
Fig. 3.8
3-1.27
Grid Engine job submission - advanced section
Fig. 3.9
3-1.28
Submitting a job through GRAM and
through an SGE scheduler
Fig. 3.10
3-1.29
Running Globus job with SGE scheduler
using globusrun-ws command
Scheduler selected by name using -Ft option (i.e.
factory type).
Name for Sun Grid Engine (obviously) is SGE.
Hence:
globusrun-ws -submit -Ft SGE -f prog1.xml
submits job described in job description file called
prog1.xml.
6d-1.30
Output
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:d23a7be0-f87c-11d9-a53b-0011115aae1f
Termination time: 07/20/2008 17:44 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
Note: the user credentials have to be delegated.
6d-1.31
Actual machine running job
Scheduler will choose machine that job is run on,
which can vary for each job. Hence
globusrun-ws -submit -s -Ft SGE -c /bin/hostname
submits executable hostname to SGE scheduler in
streaming mode redirecting output to console, with
usual Globus output.
Output: Hostname displayed as output will be that
of machine running job and may vary.
6d-1.32
Specifying Submit Host
Submit host and location for factory service can be
specified by using -F option, e.g.:
globusrun-ws -submit -s
-F http://coit-grid03.uncc.edu:8440
-Ft SGE -c /bin/hostname
6d-1.33
Condor
• Developed at University of Wisconsin-Madison in
mid 1980’s to convert collection of distributed
workstations and clusters into a high-throughput
computing facility.
• Key concept - using wasted computer power of
idle workstations.
• Hugely successful.
• Many institutions now operate Condor clusters.
6d-1.34
Condor
• Essentially a job scheduler
– jobs scheduled in background on distributed computers,
but without user needing an account on individual
computers.
• Users compile their programs for the computers
Condor is going to use, and include Condor
libraries, which among other things handle
input and capture output.
• Job described in a job description file.
• Condor then ships job off to appropriate
computers.
6d-1.35
Ideal Use Case
• Executing long-running job multiple times with
different parameters (parameter-sweep problem)
– No communication between jobs
– Each job can be scheduled independently.
• Suppose a single parameter sweep takes n hours on a
single computer.
• Then p sweeps take np hours on one computer.
• With m computers, p sweeps take np/m hours
(where p is a multiple of m), i.e. m times faster.
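The arithmetic can be checked with a short Python sketch. It assumes each computer runs one sweep at a time and ignores scheduling and file transfer overhead; the values of n, p, and m are made up for illustration.

```python
import math

def sweep_time(n_hours, p_sweeps, m_computers):
    """Wall-clock hours for p independent sweeps of n hours each on m computers."""
    rounds = math.ceil(p_sweeps / m_computers)  # one sweep per computer per round
    return rounds * n_hours

n, p, m = 2, 100, 25
serial = sweep_time(n, p, 1)      # np hours on one computer
parallel = sweep_time(n, p, m)    # np/m hours when p is a multiple of m
print(serial, parallel, serial / parallel)
```

When p is not a multiple of m the last round is only partly filled, so the speedup falls slightly below m.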
6d-1.36
Submitting multiple parameter sweeps
across m computers
Fig. 3.11
3-1.37
Condor Features
• Includes:
– Resource finder
– Batch queue manager
– Scheduler
– Checkpoint/restart
– Process migration
3-1.38
Intended to run job even if:
• Machines crash
• Disk space exhausted
• Software not installed
• Machines needed by others
• Machines managed by others
• Machines far away
3-1.39
Condor Structure
• A collection of Condor machines called
a pool.
• Machines have one or more of 4 roles:
– One Central manager
– Submit machine(s)
– Execution machine(s)
– One Checkpoint server
3-1.40
Central Manager
• Resource broker for a pool.
• Keeps track of:
– Which machines available
– What jobs running
• Negotiates which machine will run
which job, etc.
• Only one central manager per pool.
3-1.41
Submit Machine
• A machine which can submit jobs to pool.
• Must be at least one submit machine in a
pool, and usually more than one.
Execute Machine
• A machine on which jobs can be run.
• Must be at least one execute machine in a
pool, and usually more than one.
3-1.42
Checkpoint Server
• Machine which stores checkpoint files
produced by jobs that checkpoint.
• Can only be one checkpoint machine in
a pool.
• Optional to have checkpoint machine.
3-1.43
Possible Configuration
• A central manager.
• Some machines that can only be submit
hosts.
• Some machines that can only be execute
hosts.
• Some machines that can be both submit
and execute hosts.
3-1.44
General Condor configuration
Fig 3.12
3-1.45
Internal steps to execute a job in Condor
Fig 3.13
3-1.46
Types of Jobs
• Jobs are classified according to the environment (universe) in which they run.
• Currently nine environments (Condor 7.0.1 released 2008):
– Vanilla
– Standard
– Java
– MPI (for legacy)
– Parallel
– Grid (or Globus depending on Condor version)
– Scheduler (possibly for legacy)
– Local
– VM (Virtual Machine)
3-1.47
Vanilla Universe
• For straightforward jobs written in compiled languages such
as C or C++ or pre-compiled applications, shell scripts and
Windows batch files.
• For jobs that cannot be compiled with Condor libraries, and for
programs not compiled with Condor libraries.
• Does not provide for features such as checkpointing or
remote system calls.
• May be less efficient than under other universes, but it only
requires executable.
• Code restrictions such as on the use of system calls.
3-1.48
Example job submission
Condor has its own job description language to
describe job in a “submit description file”
Not in XML as predates XML
Simple Submit Description File Example
# This is a comment condor submit file for prog1 job
Universe = vanilla
Executable = prog1
Output = prog1.out
Error = prog1.error
Log = prog1.log
Queue
3-1.49
Submitting Multiple Jobs
Done by adding number after Queue command, i.e.:
Submit Description File Example
# condor submit file for program prog1
Universe = vanilla
Executable = prog1
Queue 500
will submit 500 identical prog1 jobs at once.
Can use multiple Queue commands with Arguments
for each instance.
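A submit description file with one Arguments/Queue pair per instance can be generated by a short script. A sketch in Python, assuming a hypothetical executable prog1 that takes a single numeric argument; the per-instance output file naming is also an assumption:

```python
def make_submit_file(executable, argument_values):
    """Build submit description text with one Arguments/Queue pair per value."""
    lines = [
        f"# condor submit file for program {executable}",
        "Universe = vanilla",
        f"Executable = {executable}",
    ]
    for i, value in enumerate(argument_values):
        lines.append(f"Arguments = {value}")
        # Separate output file per instance (assumed naming scheme).
        lines.append(f"Output = {executable}.{i}.out")
        lines.append("Queue")
    return "\n".join(lines) + "\n"

text = make_submit_file("prog1", [0.1, 0.2, 0.3])
print(text)
```

Each Queue command submits one job using the Arguments and Output values most recently set above it.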
3-1.50
Submitting job
condor_submit command
condor_submit prog1.sdl
where prog1.sdl is submit description file.
Without any other specification, Condor will attempt to
find suitable execute machine from all available.
Condor works with and without a shared file system.
Most local clusters set up with shared file system and
Condor will not need to explicitly transfer files.
3-1.51
Transferring files
When necessary to explicitly tell Condor to transfer files,
additional parameters included in submit description file:
# condor submit file for uptime
# with explicit file transfers
Universe = vanilla
Executable = uptime
Output = uptime.out
Error = uptime.error
Log = uptime.log
Should_transfer_files = YES
When_to_transfer_output = ON_EXIT
Queue
3-1.52
After submitting job, there will be a message that
job has been submitted, for example after:
condor_submit prog1.sdl
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 662.
3.1.53
Monitoring
Can query status of Condor queue with:
condor_q
Get output of form:
Queue
-- Submitter: coit-grid02.uncc.edu : <152.15.98.25:32821> :
 ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 .
 662.0   abw     5/23 17:36  0+00:00:00 I  0   9.8  uptime
16 jobs; 1 idle, 0 running, 15 held
Status: H (hold), R (running), I (idle, waiting for machine), C
(Completed), U (unexpanded, never been run) or X (removed).
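The status codes can be decoded mechanically when post-processing queue listings. A sketch in Python that splits one job row into fields; the column layout is assumed from the sample listing above and may differ between Condor versions, and the function and dictionary names are hypothetical:

```python
STATUS = {
    "H": "hold",
    "R": "running",
    "I": "idle, waiting for machine",
    "C": "completed",
    "U": "unexpanded, never been run",
    "X": "removed",
}

def parse_job_line(line):
    """Split one job row of a condor_q-style listing into named fields."""
    job_id, owner, date, time, run_time, st, pri, size, cmd = line.split(None, 8)
    return {
        "id": job_id,
        "owner": owner,
        "submitted": f"{date} {time}",
        "run_time": run_time,
        "status": STATUS.get(st, "unknown"),
        "priority": int(pri),
        "size": float(size),
        "cmd": cmd,
    }

job = parse_job_line("662.0 abw 5/23 17:36 0+00:00:00 I 0 9.8 uptime")
print(job["id"], job["status"])
```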
3.1.54
Standard Universe
• For jobs compiled with Condor libraries.
• Allows for checkpointing and remote
system calls.
• Must be single threaded.
• Not available under Windows.
3-1.55
Compiling in standard universe
condor_compile
Example to compile and link Condor libraries would be:
condor_compile cc -o prog1 prog1.c
Simplest submit job description would be:
#Simplest possible condor submit file for prog1 job
Executable = prog1
Queue
which would run job in standard universe as that is default.
All standard input/output/error (stdin, stdout, stderr) lost, or in
Linux jargon, redirected to /dev/null, without file transfer
commands in submit description file.
3-1.56
Checkpointing
• Certain jobs can checkpoint, both periodically for
safety and when interrupted.
• If checkpointed job interrupted, it will resume at
the last checkpointed state when it starts again.
• Generally no change to source code - need to link
Condor’s Standard Universe support library.
• Checkpointing disabled by including command
+WantCheckpoint = False
in submit description file before queue.
3-1.57
Java universe
For submitting Java programs run by Java Virtual Machine.
Executable is Java class file, as universe invokes JVM
automatically.
Example submit description file
# This is a comment condor submit file for java job
Universe = java
Executable = Prog1.class
Arguments = Prog1 1234
Output = prog1.out
Error = prog1.error
Log = prog1.log
Should_transfer_files = IF_NEEDED
When_to_transfer_output = ON_EXIT
Queue
Note: First argument must be name of class file to be executed by JVM.
3-1.58
Grid universe
Condor can be used as the environment for
Grid computing:
• Stand-alone without Grid middleware such
as Globus
or alternatively
• Integrated with the Globus toolkit.
3-1.59
Stand-alone Grid environment
Flocking
• Condor pools can be joined together in a process
called flocking in Condor
• Create Grid computing environment with different
pools under different administrative domains.
• Migration will occur if a suitable computer is not
available in original pool.
• In Condor-C, jobs can move from one computer’s
job queue to another.
3-1.60
Stand-alone Grid environment
Condor Glidein mechanism
Enables computers to join a Condor pool temporarily.
Condor command:
condor_glidein <contact argument>
where <contact argument> could be hostname or
job manager/scheduler or Globus resource.
Various options enable more information to be
passed.
3-1.61
Condor’s matchmaking mechanism
Chooses the best computer to run the job.
Condor ClassAd
Based upon notion that jobs and resources advertise
themselves in “classified advertisements”, which
include their characteristics and requirements.
Job ClassAd matched against resource ClassAd.
3-1.62
Example
A user might advertise his/her job :
My job needs an Intel Core 2 processor with a speed of at
least 2 GHz (or equivalent Intel compatible
processor) and with at least 6 GB of main memory
and 1 TB of working disk space.
using job attributes of ClassAd (in submit description
file).
3-1.63
Compute resources advertise their capabilities, for
example:
I am a computer with an AMD Phenom processor
operating at a speed of 2.6 GHz with 256 GB of main
memory and 16 TB of working disk space.
using machine attributes of the resource ClassAd.
3-1.64
ClassAd Matchmaking Steps
1. Agents (jobs) and resources (computers)
advertise their characteristics and
requirements in “classified advertisements.”
2. Matchmaker scans ClassAds and creates
pairs that satisfy each other's constraints
and preferences.
3. Matchmaker informs both parties of match.
4. Agent and resource make contact.
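Steps 1 and 2 above can be sketched in Python. This is an illustration of two-sided matching under simplifying assumptions, not Condor's matchmaker: ads are plain dictionaries, requirements and rank are Python functions rather than ClassAd expressions, and all attribute names and values are hypothetical. Steps 3 and 4 (notification and contact) are not modeled.

```python
def matches(job, machine):
    """Two-sided match: each ad's requirements must accept the other ad."""
    return job["requirements"](machine) and machine["requirements"](job)

def matchmake(jobs, machines):
    """Pair each job with the free machine that the job ranks highest."""
    free = list(machines)
    pairs = []
    for job in jobs:
        candidates = [m for m in free if matches(job, m)]
        if not candidates:
            continue  # job stays unmatched in this cycle
        best = max(candidates, key=job["rank"])
        free.remove(best)
        pairs.append((job["name"], best["name"]))
    return pairs

machines = [
    {"name": "node1", "memory": 4096, "load_avg": 0.1,
     "requirements": lambda job: True},
    {"name": "node2", "memory": 16384, "load_avg": 0.1,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "node3", "memory": 32768, "load_avg": 0.9,
     "requirements": lambda job: True},
]
jobs = [
    {"name": "big", "owner": "abw",
     "requirements": lambda m: m["memory"] >= 6 * 1024 and m["load_avg"] <= 0.2,
     "rank": lambda m: m["memory"]},
    {"name": "small", "owner": "guest",
     "requirements": lambda m: m["memory"] >= 1024,
     "rank": lambda m: m["memory"]},
]
pairs = matchmake(jobs, machines)
print(pairs)
```

Note that node2 refuses the guest's job through its own requirements, which shows why both ads must agree before a pair is created.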
3-1.65
Condor’s ClassAd Matchmaking Mechanism
Fig 3.14
3-1.66
ClassAd commands
MyType
Identifies type of ClassAd:
MyType = "Job"
or
MyType = "Machine"
3-1.67
ClassAd commands
TargetType
Specifies what ClassAd is to match with:
TargetType = "Machine"
or
TargetType = "Job"
3-1.68
Machine ClassAd
Set up during system configuration.
Some attributes provided by Condor but their values can be
dynamic and alter during system operation.
Machine attributes can describe such things as:
• Machine name
• Architecture
• Operating system
• Main memory available for job
• Disk memory available for job
• Processor performance
• Current load, etc
3-1.69
3-1.70
Job ClassAd
Job is typically characterized by its resource
requirements and preferences.
May include:
– What job requires
– What job desires
– What job prefers, and
– What job will accept
using Boolean expressions.
These details put in submit description file.
3-1.71
3-1.72
Matchmaking commands
Requirements and Rank
Available for both job ClassAd and machine
ClassAd:
Requirements -- specifies machine requirements.
Rank -- used to differentiate between multiple
machines that can satisfy requirements and can
identify a preference based upon user criteria.
3-1.73
Requirements
Requirements = <Boolean Expression>
Use a C/Java-like Boolean expression that
evaluates to TRUE for a match.
3-1.74
Machine Requirements
A machine ClassAd might be:
MyType = "Machine"
TargetType = "Job"
Machine = "coit-grid02.cs.uncc.edu"
Arch = "INTEL"
OpSys = "LINUX"
Disk = 1000 * 1024
Memory = 100 * 1024
Requirements = (LoadAvg<=0.2)
3-1.75
Job Requirements
Example
My job needs a machine with at least 6 GB of main memory
and 25 MB of disk space.
ClassAd
MyType = "Job"
TargetType = "Machine"
Universe = ...
Executable = ...
Requirements = (Memory >= 6*1024) && (Disk >= 25*1024)
3-1.76
Rank
Rank = <number>
Computes to a floating point number.
Resource with highest rank chosen.
3-1.77
Job ClassAd’s Rank statement
• Can be used in job ClassAd for selection
between compatible machines.
• Choose highest rank
Example
Rank = (Memory * 10000) + KFlops
where KFlops represents machine performance.
3-1.78
Rank
Sometimes just TRUE (1) or FALSE (0) is
sufficient for rank, i.e.:
Rank = (Target.Memory > 10000)
3-1.79
Machines Rank Statement
Can also be used in Machines ClassAd for
matchmaking.
Example
Rank = (Department == "Computer Science")
where Department defined in job ClassAd, say:
Department = "Computer Science"
3-1.80
Using rank in Machines ClassAd
Job ClassAd
[
MyType = "Job"
TargetType = "Machine"
…
Department = "Computer Science"
…
]
Machines ClassAd
[
MyType = "Machine"
TargetType = "Job"
…
Rank = (Department == "Computer Science")
…
]
3-1.81
Directed Acyclic Graph
Manager (DAGMan)
Meta-scheduler
Allows one to specify dependencies
between Condor Jobs.
3-1.82
Example
“Do not run Job B until Job A completed
successfully”
Especially important to jobs working together
(as in Grid computing).
3-1.83
Directed Acyclic Graph
(DAG)
• A data structure used to represent
dependencies.
• Directed graph.
• Must not have cycles (acyclic).
• Each job is a node in DAG.
• Each node can have any number of
parents and children.
3-1.84
DAG
Do job A.
Do jobs B and C after job A finished.
Do job D after both jobs B and C finished.
[Figure: DAG with nodes Job A, Job B, Job C, Job D]
3-1.85
Defining a DAG
• Defined by a .dag file, listing nodes and their
dependencies.
• Each “job” statement has a name (say A)
and a file (say a.condor)
• PARENT-CHILD statement describes
relationship between two or more jobs
• Other statements available.
6d-1.86
Example
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
[Figure: DAG with nodes Job A, Job B, Job C, Job D]
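The order in which the nodes of such a file can run follows directly from the dependencies. A Python sketch that reads only the JOB and PARENT ... CHILD statements and groups nodes into waves using a level-by-level topological sort; it illustrates the data structure and is not how DAGMan itself is implemented:

```python
from collections import defaultdict

def parse_dag(text):
    """Return (jobs, parents) from JOB and PARENT ... CHILD ... lines."""
    jobs = {}                     # node name -> submit file
    parents = defaultdict(set)    # node name -> set of parent names
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        keyword = words[0].upper()
        if keyword == "JOB":
            jobs[words[1]] = words[2]
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            for child in words[split + 1:]:
                parents[child].update(words[1:split])
    return jobs, parents

def run_order(jobs, parents):
    """Group nodes into waves; every node in a wave has all parents finished."""
    remaining = {j: set(parents[j]) for j in jobs}
    waves = []
    while remaining:
        ready = sorted(j for j, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle detected: not a DAG")
        waves.append(ready)
        for j in ready:
            del remaining[j]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves

diamond = """# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""
jobs, parents = parse_dag(diamond)
print(run_order(jobs, parents))
```

Jobs in the same wave (B and C here) have no dependency on each other and could run at the same time.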
6d-1.87
Directed Acyclic Graph Manager
(DAGMan) DAGs
Fig. 3.15
3-1.88
# (a) DAG
JOB A A.sub
JOB B B.sub
JOB C C.sub
PARENT A CHILD B
PARENT B CHILD C
3-1.89
# (b) DAG
JOB A A.sub
JOB B B.sub
JOB C C.sub
JOB D D.sub
JOB E E.sub
PARENT A CHILD B C
PARENT B CHILD D
PARENT C D CHILD E
3-1.90
DAG for PARENT A B CHILD C D
# DAG
JOB A A.sub
JOB B B.sub
JOB C C.sub
JOB D D.sub
PARENT A B CHILD C D
Fig. 3.16
3-1.91
Running DAG
condor_submit_dag
Start a DAG with dag file diamond.dag.
condor_submit_dag diamond.dag
Submits a Scheduler Universe Job with
DAGMan as executable.
3-1.92
DAGMan
• Acts as a scheduler managing submission
of jobs based upon DAG dependencies.
• Holds and submits jobs to Condor queue at
appropriate times.
3-1.93
Job Failures
• DAGMan continues until it cannot make
progress and then creates a rescue file
holding current state of DAG.
• When failed job ready to re-run, rescue
file used to restore prior state of DAG.
3-1.94
Rescue file
Nodes completed marked with DONE.
# DAG
JOB A A.sub DONE
JOB B B.sub DONE
JOB C C.sub DONE
JOB D D.sub
JOB E E.sub
PARENT A CHILD B C
PARENT B CHILD D
PARENT C D CHILD E
Nodes A, B, and C have completed before a failure occurred.
DAG can restart with node D and then node E.
DONE can be inserted by users. Useful for testing.
Restart is at level of nodes.
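Which nodes can run when a DAG is restarted follows from the DONE markers: a node is ready if it is not DONE and all of its parents are. A Python sketch over the rescue file text shown above; the function name is hypothetical and only JOB and PARENT ... CHILD statements are handled:

```python
def ready_nodes(dag_text):
    """Nodes not marked DONE whose parents are all marked DONE."""
    done, pending, parents = set(), [], {}
    for line in dag_text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0] == "JOB":
            name = words[1]
            parents.setdefault(name, set())
            if words[-1] == "DONE":
                done.add(name)
            else:
                pending.append(name)
        elif words[0] == "PARENT":
            split = words.index("CHILD")
            for child in words[split + 1:]:
                parents.setdefault(child, set()).update(words[1:split])
    return [n for n in pending if parents[n] <= done]

rescue = """# DAG
JOB A A.sub DONE
JOB B B.sub DONE
JOB C C.sub DONE
JOB D D.sub
JOB E E.sub
PARENT A CHILD B C
PARENT B CHILD D
PARENT C D CHILD E
"""
print(ready_nodes(rescue))
```

Node E is not ready yet because its parent D has not completed, which matches the restart order described above.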
3-1.95
Summary of Key Condor
Features
• High throughput computing using an
opportunistic environment.
• Provides mechanisms for running jobs on
remote machines.
• Matchmaking
• Checkpointing
• DAG scheduling
3-1.96
More Information
• http://www.cs.wisc.edu/condor
• Chapter 11, Condor and the Grid, D. Thain, T.
Tannenbaum, and M. Livny, Grid Computing:
Making The Global Infrastructure a Reality, F.
Berman, A. J. G. Hey, and G. Fox, editors, John
Wiley, 2003.
• “Condor-G: A Computation Management Agent for
Multi-Institutional Grids,” J. Frey, T. Tannenbaum,
I. Foster, M. Livny, S. Tuecke, Proc. 10th Int. Symp.
High Performance Distributed Computing (HPDC10) Aug. 2001.
3-1.97
Questions
(Multiple choice)
3-1.98
What command is used to submit the
executable /bin/hostname to the SGE
scheduler?
(a) qsub -b y /bin/hostname
(b) qsub /bin/hostname
(c) sge_submit -Ft SGE -c /bin/hostname
(d) qmon /bin/hostname
SAQ 3-4
3-1.99
What Globus command is used to submit the
executable /bin/hostname to coit-grid03.uncc.edu with
the SGE local scheduler?
(a) globusrun-ws -submit -F coit-grid03.uncc.edu -s
SGE -c /bin/hostname
(b) globusrun-ws -submit -s -F coit-grid03.uncc.edu -Ft SGE -c /bin/hostname
(c) globusrun-ws -submit coit-grid03.uncc.edu -Ft
SGE -c /bin/hostname
(d) qsub coit-grid03.uncc.edu /bin/hostname
SAQ 3-5
3-1.100
In a Condor environment, what is displayed
after issuing the condor_status command?
(a) An error message as this is not a valid
Condor command
(b) A list of computers in the condor pool and
their status
(c) A list of jobs in the Condor queue and their
status
(d) The status of the Condor central manager
SAQ 3-7
3-1.101
In Condor, where is the job ClassAd?
(a) In the submit description file
(b) In a file that is specified in the
condor_submit command in addition to
the submit description file
(c) In a file that the user transfers to the
computer separately
(d) In the local newspaper
SAQ 3-8
3-1.102
What is DAGMan in Condor?
(a) A Data Access Grid Manager
(b) A software tool for providing checkpointing
(c) A scheduler that can schedule jobs in a
workflow
(d) A Database Group Manager for Condor
SAQ 3-10
3-1.103
What is checkpointing?
(a) Someone pointing to a check
(b) The process of someone checking the
grade of an assignment
(c) Saving state information periodically to be
able to recover from failures during execution
(d) The process of a compiler finding syntax
errors
SAQ 3-11
3-1.104
Which of the following is NOT a Condor
environment?
(a) Globus/Grid
(b) Vanilla
(c) Chocolate
(d) Standard
(e) Java
SAQ 3-12
3-1.105
Quiz
Identify which of the following are similarities between
Condor ClassAd and Globus RSL 1/RSL2/JDD.
(There may be more than one similarity.)
(a) There are no similarities.
(b) They both provide a means of specifying command
line arguments for the job.
(c) They both provide a means of specifying whether a
named user is allowed to execute a job.
(d) They both provide a means of specifying machine
requirements for a job.
3-1.106