Job Submission
Slides for Grid Computing: Techniques and Applications by Barry Wilkinson, Chapman & Hall/CRC press, © 2009.
Chapter 2, pp. 35-59. For educational use only. All rights reserved. Aug 24, 2009
2-1.1
Types of jobs to be submitted to
a Grid
• Programs written in C, C++, … that need to
be compiled.
• Java programs that need a Java Virtual
Machine
• Pre-compiled application packages
2-1.2
Submitting a job that needs to be compiled
Fig. 2.1
2-1.3
Java programs
• Quite similar to compiling C programs, except
Java compiler (javac) creates class file (bytecode)
that is interpreted by a Java Virtual Machine
(java).
• It is the Java Virtual Machine that is the executing
program and the class file is an input file.
• Other class files usually need to be called too,
found in path specified by CLASSPATH variable,
so this variable must be set up properly.
2-1.4
Submitting a Java job
Fig. 2.2
2-1.5
• Java programs offer more portability because
class file could be sent to any remote computer
having a Java Virtual Machine installed.
• However, speed of execution may be less than
executing fully compiled binaries.
• Some studies have shown Java programs to run
at about 70% of the speed of equivalent C programs.
• Many internal components of Grid middleware
software such as Globus actually use a mixture of
Java and C. Java commonly used to create Web
service components.
2-1.6
Types of Applications
Since Grid is a collection of computers, user might wish to
use these computers collectively to solve problems.
Two ways:
• Parallel programs -- Break problem down into tasks that
need to be done to solve problem and submit individual
tasks to different computers to work on them
simultaneously.
• Parameter sweep problems -- Run same job on different
computers at same time but with different input parameters.
Particularly attractive for Grid computing platforms because
no dependences between each sweep (usually).
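A parameter sweep can be sketched as a loop that produces one submission command per parameter value. The sketch below is illustrative only: the program name prog1 and the parameter values are assumptions, and the commands are built as argument lists rather than executed, so it runs without a Grid installation. It uses only the -submit and -c options that appear elsewhere in these slides.

```python
# Hypothetical parameter sweep: same program, different input parameters.
# "prog1" and the parameter values are placeholders, not from the slides.

def sweep_commands(program, values):
    """Return one globusrun-ws argument list per parameter value."""
    commands = []
    for value in values:
        # -c comes last: everything after it is the program and its arguments.
        commands.append(["globusrun-ws", "-submit", "-c", program, str(value)])
    return commands


if __name__ == "__main__":
    for cmd in sweep_commands("prog1", [0.1, 0.2, 0.5]):
        print(" ".join(cmd))
```

Because the sweeps are independent, each generated command could be sent to a different computer without any coordination between them.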
2-1.7
Grid Resource Allocation
Management (GRAM)
Principal job submission
component of Globus
2-1.8
Globus Open Source Grid Software
[Figure: Globus Toolkit components across GT2, GT3, and GT4, split into Web services components and non-WS components and grouped under Security, Data Management, Execution Management, Information Services, and Common Runtime. GRAM is highlighted. Source: I. Foster]
Job submission components
Fig. 2.3
2-1.10
Running
simple jobs
across a Grid
computing
environment
Fig. 2.4
2-1.11
Specifying the job
Two basic ways a job might be specified:
• Directly by name of executable with required
input arguments
or
• By a job description file – more powerful
2-1.12
Directly
For very simple jobs, one can submit a single job using
-c option, e.g.,
globusrun-ws -submit -c prog1 arg1 arg2
which executes program prog1 with arguments arg1
and arg2 on local host.
-c option actually causes globusrun-ws to generate a
job description with the named program and arguments
that follow.
-c option must be the last globusrun-ws option (why?).
2-1.13
Example
globusrun-ws -submit -c /bin/echo hello
Globus job monitoring output created on command
line and will indicate that the job completes.
However, output from echo program (hello) not
displayed and is lost as is any standard output
without further specification (see later).
2-1.14
Job Description File
Gives details such as:
• Job Description
- Name of executable
- Number of instances
- Arguments
- Input files
- Output files
- Directories
- Environment variables, paths, ...
• Resource requirements
- Processor
- Number, cores, ...
- Type
- Speed, ...
- Memory
Used to match
job with
resources
2-1.15
Job Description Languages
Several languages invented.
• Globus - specific:
– Globus 1 and 2 used their Resource
Specification language RSL (version 1)
– Globus 3 used an XML version called RSL-2
– Globus 4 uses a variation of RSL-2 in a JDD
(Job Description Document)
• Job Submission Description Language (JSDL)
– A recent industry-wide standard (2005)
2-1.16
Resource Specification Language
RSL version 1
• A meta-language describing job and its
required execution.
Provides specification for:
• Job description - directory, executable,
arguments, environment
• Resource requirements - machine type,
number of nodes, memory, etc.
2-1.17
RSL Version 1 examples
Constraints Example
Conjunction (AND): &
• To create 3-5 instances of myProg, each on a
machine with at least 64 Mbytes memory available
to me for 1 hour:
& (executable=myProg)
(count>=3)(count<=5)(memory>=64)
(max_time=60)
2-1.18
Constraints Example
Disjunction (OR):
|
• To create 5 instances of myProg, each on a
machine with at least 64 Mbytes memory or 7
instances of myProg, each on a machine with at
least 32 Mbytes memory :
&(executable=myProg)
(|(&(count=5)(memory>=64))
(&(count=7)(memory>=32)))
2-1.19
Requesting multiple resources
multirequest: +
• To execute 5 instances of myProg1 on a machine
with at least 64 Mbytes memory and execute 2
instances of myProg2:
+(&(count=5)(memory>=64)
(executable=myProg1))
(&(count=2)(executable=myProg2))
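The three RSL version 1 operators (&, | and +) compose mechanically, which a small string builder can make concrete. This is a minimal sketch: the helper names relation, group, conjunction, disjunction and multirequest are invented for illustration and are not part of any Globus library. It reproduces the three example specifications with balanced parentheses.

```python
# Minimal RSL version 1 string builder (helper names are hypothetical).

def relation(attribute, op, value):
    """One relation such as (count>=3)."""
    return f"({attribute}{op}{value})"


def group(expression):
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


def conjunction(*parts):
    """AND: & followed by the relations that must all hold."""
    return "&" + "".join(parts)


def disjunction(*parts):
    """OR: | followed by the alternatives."""
    return "|" + "".join(parts)


def multirequest(*requests):
    """Multirequest: + followed by one parenthesized request per resource."""
    return "+" + "".join(group(r) for r in requests)


and_example = conjunction(
    relation("executable", "=", "myProg"),
    relation("count", ">=", 3),
    relation("count", "<=", 5),
    relation("memory", ">=", 64),
    relation("max_time", "=", 60),
)

or_example = conjunction(
    relation("executable", "=", "myProg"),
    group(disjunction(
        group(conjunction(relation("count", "=", 5), relation("memory", ">=", 64))),
        group(conjunction(relation("count", "=", 7), relation("memory", ">=", 32))),
    )),
)

multi_example = multirequest(
    conjunction(relation("count", "=", 5), relation("memory", ">=", 64),
                relation("executable", "=", "myProg1")),
    conjunction(relation("count", "=", 2), relation("executable", "=", "myProg2")),
)

if __name__ == "__main__":
    print(and_example)
    print(or_example)
    print(multi_example)
```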
2-1.20
XML Job Description languages
• With the widespread adoption of XML in the
early 2000s, job description languages
began to be changed to XML.
2-1.21
Using XML
• Much more elegant and flexible, and in keeping
with Web services.
• Can use XML parsers.
• Allows more powerful mechanisms with job
schedulers.
• Resource scheduler/broker applies specification
to local resources.
2-1.22
Resource Specification
Language, RSL version 2
• XML job description language used in
Globus version 3 (GT3).
• An XML language.
2-1.23
RSL-2
• XML version of RSL 1
• Can specify everything from executable,
paths, arguments, input/output, error file,
number of processes, max/min execution
time, max/min memory, job type etc. etc.
2-1.24
GT 3 RSL-2 Example
Specifying Executable
(executable=/bin/echo)
<gram:executable>
<rsl:path>
<rsl:stringElement
value="/bin/echo"/>
</rsl:path>
</gram:executable>
2-1.25
RSL and GT 3.2
RSL-2 comparison
for echo program
<?xml version="1.0" encoding="UTF-8"?>
<rsl:rsl xmlns:rsl="http://www.globus.org/namespaces/2003/04/rsl"
xmlns:gram="http://www.globus.org/namespaces/2003/04/rsl/gram"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.globus.org/namespaces/2003/04/rsl
c:/ogsa-3.0/schema/base/gram/rsl.xsd
http://www.globus.org/namespaces/2003/04/rsl/gram
c:/ogsa-3.0/schema/base/gram/gram_rsl.xsd">
<gram:job>
<gram:executable> <rsl:path>
<rsl:stringElement value="/bin/echo"/> </rsl:path>
</gram:executable>
<gram:directory> <rsl:path>
<rsl:stringElement value="/bin"/> </rsl:path>
</gram:directory>
<gram:arguments>
<rsl:string> <rsl:stringElement value="Hello World"/> </rsl:string>
</gram:arguments>
<gram:stdin> <rsl:path>
<rsl:stringElement value="/dev/null"/> </rsl:path> </gram:stdin>
<gram:stdout>
<rsl:pathArray>
<rsl:path>
<rsl:substitutionRef name="HOME"/>
<rsl:stringElement value="/stdout"/>
</rsl:path>
</rsl:pathArray>
</gram:stdout>
<gram:stderr>
<rsl:pathArray>
<rsl:path>
<rsl:substitutionRef name="HOME"/>
<rsl:stringElement value="/stderr"/>
</rsl:path>
</rsl:pathArray>
</gram:stderr>
<gram:count> <rsl:integer value="1"/> </gram:count>
<gram:jobType>
<gram:enumeration>
<gram:enumerationValue> <gram:multiple/> </gram:enumerationValue>
</gram:enumeration>
</gram:jobType>
<gram:gramMyJobType>
<gram:enumeration>
<gram:enumerationValue> <gram:collective/> </gram:enumerationValue>
</gram:enumeration>
</gram:gramMyJobType>
<gram:dryRun> <rsl:boolean value="false"/> </gram:dryRun>
<gram:saveState> <rsl:boolean value="true"/> </gram:saveState>
<gram:twoPhase> <rsl:integer value="600"/> </gram:twoPhase>
</gram:job>
</rsl:rsl>
&((executable=/bin/echo)
(directory="/bin")
(arguments="Hello World")
(stdin=/dev/null)
(stdout="stdout")
(stderr="stderr")
(count=1)
)
2-1.26
Job Description Document
(JDD)
• RSL-2 renamed and called JDD used in
more recent Globus 4 (GT4) documents.
• Similar to original RSL-2 but simplified
syntax.
• Not completely interchangeable.
2-1.27
GT 4 JDD Example
Specifying Executable
executable=/bin/echo
<executable>/bin/echo</executable>
2-1.28
GT 4.0 JDD for echo program
<?xml version="1.0" encoding="UTF-8"?>
<job>
<executable>/bin/echo</executable>
<directory>${GLOBUS_USER_HOME}</directory>
<argument>Hello</argument>
<argument>World</argument>
<stdout>${GLOBUS_USER_HOME}/stdout</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>
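A JDD of this shape can also be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree module to build the same element layout as the echo example; the function name build_jdd and its parameters are assumptions made for illustration, and only the element names shown in the example above are used.

```python
import xml.etree.ElementTree as ET


def build_jdd(executable, arguments, directory=None, stdout=None, stderr=None):
    """Build a <job> element with the element names used in the echo example."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    if directory is not None:
        ET.SubElement(job, "directory").text = directory
    for arg in arguments:
        # One <argument> element per argument, in order.
        ET.SubElement(job, "argument").text = arg
    if stdout is not None:
        ET.SubElement(job, "stdout").text = stdout
    if stderr is not None:
        ET.SubElement(job, "stderr").text = stderr
    return job


if __name__ == "__main__":
    job = build_jdd(
        "/bin/echo",
        ["Hello", "World"],
        directory="${GLOBUS_USER_HOME}",
        stdout="${GLOBUS_USER_HOME}/stdout",
        stderr="${GLOBUS_USER_HOME}/stderr",
    )
    print(ET.tostring(job, encoding="unicode"))
```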
2-1.29
Job Submission Description
Language (JSDL)
• A standard introduced by GGF (Global
Grid forum) in 2005 and beginning to be
widely adopted.
2-1.30
Basic JSDL structure
<JobDefinition>
<JobDescription>
<JobIdentification> ... </JobIdentification>
<Application> ... </Application>
<Resources> ... </Resources>
<DataStaging> ... </DataStaging>
</JobDescription>
</JobDefinition>
2-1.31
For executables operating in a Linux environment,
use <POSIXApplication> within <Application>
<POSIXApplication name="xsd: ... ">
<Executable> ... </Executable>
<Argument> ... </Argument>
<Input> ... </Input>
<Output> ... </Output>
<Error> ... </Error>
<WorkingDirectory> ... </WorkingDirectory>
</POSIXApplication>
POSIX: Portable Operating System Interface, a collection of IEEE standards
that define APIs compatible with most versions of Unix/Linux.
2-1.32
Sample Linux job description
<?xml version="1.0" encoding="UTF-8"?>
<jsdl:JobDefinition
xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
<jsdl:JobDescription>
<jsdl:Application>
<JobName>Test Job</JobName>
<Description>Hello world Job</Description>
<jsdl-posix:POSIXApplication >
<jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
<jsdl-posix:Argument>hello, world</jsdl-posix:Argument>
<jsdl-posix:Output>${GLOBUS_USER_HOME}/stdout</jsdl-posix:Output>
<jsdl-posix:Error>${GLOBUS_USER_HOME}/stderr</jsdl-posix:Error>
</jsdl-posix:POSIXApplication>
</jsdl:Application>
</jsdl:JobDescription>
</jsdl:JobDefinition>
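Because JSDL is namespaced XML, a standard XML parser can pull out the job details. The following is a minimal sketch using Python's xml.etree.ElementTree; the function name summarize and the local prefix "posix" in the namespace map are choices made for this example, and the embedded document is a cut-down version of the sample above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in the sample job description.
# The prefix names on the left are local to this script.
NS = {
    "jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl",
    "posix": "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix",
}

SAMPLE = """\
<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Argument>hello, world</jsdl-posix:Argument>
        <jsdl-posix:Output>${GLOBUS_USER_HOME}/stdout</jsdl-posix:Output>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""


def summarize(jsdl_text):
    """Extract executable, arguments and output file from a JSDL document."""
    root = ET.fromstring(jsdl_text)
    app = root.find(".//posix:POSIXApplication", NS)
    return {
        "executable": app.findtext("posix:Executable", namespaces=NS),
        "arguments": [a.text for a in app.findall("posix:Argument", NS)],
        "output": app.findtext("posix:Output", namespaces=NS),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```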
2-1.33
<Resources> describes requirements of resources for job and can include:
<Resources>
<CandidateHosts> ...</CandidateHosts>
<FileSystem> ... </FileSystem>
<ExclusiveExecution> ... </ExclusiveExecution>
<OperatingSystem> ... </OperatingSystem>
<CPUArchitecture> ... </CPUArchitecture>
<IndividualCPUSpeed> ... </IndividualCPUSpeed>
<IndividualCPUTime> ... </IndividualCPUTime>
<IndividualCPUCount> ... </IndividualCPUCount>
<IndividualNetworkBandwidth> ... </IndividualNetworkBandwidth>
<IndividualPhysicalMemory> ... </IndividualPhysicalMemory>
<IndividualVirtualMemory> ... </IndividualVirtualMemory>
<IndividualDiskSpace> ... </IndividualDiskSpace>
<TotalCPUTime> ... </TotalCPUTime>
<TotalCPUCount> ... </TotalCPUCount>
<TotalPhysicalMemory> ... </TotalPhysicalMemory>
<TotalVirtualMemory> ... </TotalVirtualMemory>
<TotalDiskSpace> ... </TotalDiskSpace>
<TotalResourceCount> ... </TotalResourceCount>
</Resources>
2-1.34
Submitting a job
2-1.35
GT4 job submission command
globusrun-ws
• Submit and monitor GRAM jobs
• Written in C, for faster startup and
execution than earlier Java version
• Supports multiple and single job submission
• Handles credential management
• Streaming of job stdout/err during execution
2-1.36
Simple job submission
• Step 1: Create proxy with: grid-proxy-init
command.
• Step 2: Issue globusrun-ws with parameters
to specify job.
2-1.37
Some globusrun-ws flags
(options) for job submission
2-1.38
Running GT 4 Job
using XML job description file
• Command:
globusrun-ws -submit -f prog.xml
where prog.xml specifies job in JDD.
-submit causes job to be submitted
Submitted to localhost (machine that is
executing command) as no contact resource
specified.
Submitted immediately using “fork”
2-1.39
With named executable
-c option
Example: Submit program echo with argument
hello to default localhost.
globusrun-ws -submit -c /bin/echo hello
-c Causes globusrun-ws to generate job description
with named program and arguments.
-c option, if used, must be last option.
Only useful for very simple single jobs.
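One way to see why -c has to come last is to model how a command line could be split at -c. The toy parser below is an assumption made for illustration, not globusrun-ws source code: it treats everything after -c as the job's program and arguments, so an option placed after -c ends up being passed to the job.

```python
# Toy option parser (an illustration only, not how globusrun-ws is implemented).

def split_command_line(argv):
    """Split argv into (tool_options, job_command) at the first -c."""
    if "-c" not in argv:
        return list(argv), []
    i = argv.index("-c")
    # Everything after -c is treated as the job's program and arguments.
    return argv[:i], argv[i + 1:]


if __name__ == "__main__":
    good = ["-submit", "-s", "-c", "/bin/echo", "hello"]
    bad = ["-submit", "-c", "/bin/echo", "hello", "-s"]
    print(split_command_line(good))  # -s is a tool option
    print(split_command_line(bad))   # -s becomes an argument of /bin/echo
```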
2-1.40
Output modes
-submit Submits (or resubmits) a job in one
of three output modes:
batch
interactive, or
interactive-streaming.
Default (without additional flags to specify) is
interactive.
2-1.41
Interactive mode
Example
Submit program echo with argument hello to default
localhost.
% globusrun-ws -submit -c /bin/echo hello
Submitting job...Done.
Job ID: uuid:d23a7be0-f87c-11d9-a53b-0011115aae1f
Termination time: 07/20/2005 17:44 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Output shows the job ID and the several states the job goes through.
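The state lines in this monitoring output are regular enough to extract with a few lines of code. The sketch below is illustrative: the function name job_states is an assumption, and the sample text simply repeats the output shown on the slide.

```python
SAMPLE_OUTPUT = """\
Submitting job...Done.
Job ID: uuid:d23a7be0-f87c-11d9-a53b-0011115aae1f
Termination time: 07/20/2005 17:44 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
"""


def job_states(output_text):
    """Return the sequence of job states reported in the monitoring output."""
    prefix = "Current job state:"
    states = []
    for line in output_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            states.append(line[len(prefix):].strip())
    return states


if __name__ == "__main__":
    print(job_states(SAMPLE_OUTPUT))
```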
2-1.42
Streaming
Refers to sending contents of a stream of data from
one location to another location as it is generated.
Often associated with Linux standard output and
standard error streams, stdout and stderr.
For a program that creates output on remote machine,
need:
• Files to hold output and error messages, or
• Re-direct output and error messages to user console.
2-1.43
Interactive-streaming mode
-s option
Provides for capturing program output
and error messages and re-directing them
to user’s console (output of globusrun-ws)
or to specified files.
2-1.44
Interactive-streaming mode
Re-direction to user console
-s option
Example
globusrun-ws -submit -s -c /bin/echo hello
Output (hello) redirected to (globusrun-ws) stdout
Error messages redirected to (globusrun-ws) stderr
2-1.45
Interactive-streaming mode
Re-direction to files
-s option with -so and -se options
-s for streaming output
and
-so to specify output file
-se to specify error file
2-1.46
Example
globusrun-ws -submit
-s -so outfile -se errorfile -c /bin/echo hello
where outfile is name of file holding output,
errorfile is name of file holding error messages,
and hello is the argument for echo.
2-1.47
Specify streaming to files
using Job description file
Example (JDD)
<job>
<executable>/bin/echo</executable>
<argument>Hello</argument>
<stdout>jobOut</stdout>
<stderr>jobErr</stderr>
</job>
2-1.48
Batch submission
A long-standing Computer Science term from early
days of computing where jobs submitted to system in
a group (a batch) and wait their turn to be executed
sometime in the future.
Originally appeared when programs were submitted
by punched cards to a shared system, perhaps to be
run overnight. (The author remembers those
days with frustration.)
Batch submission really part of a scheduling
approach.
2-1.49
Batch submission
-b option
In globusrun-ws, batch referred to as an output
mode because of way output generated.
Once job submitted, control returned to
command line, and one will need to query
system to find out status of job.
2-1.50
For example, suppose we ran the job:
globusrun-ws -submit -c /bin/sleep 100
in interactive mode. Would return when program (sleep for
100 seconds in this case) completes.
We would get normal globusrun-ws output, such as:
Submitting job...Done.
Job ID: uuid:d23a7be0-f87c-11d9-a53b-0011115aae1f
Termination time: 07/20/2005 17:44 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
only each line would appear as process moves to next status
condition.
2-1.51
Alternatively, could execute sleep in batch output mode (-b
option):
globusrun-ws -submit -b -c /bin/sleep 100
Output would immediately appear of the form:
Submitting job...Done.
Job ID: uuid:f9544174-60c5-11d9-97e3-0002a5ad41e5
Termination time: 01/08/2005 16:05 GMT
Displays ManagedJob EPR as job ID (more on this later).
Control returned to command line.
Program may not have finished. In this case it will not for 100
seconds.
2-1.52
Now one has to query state of job to find out when it
completes.
Need job ID (ManagedJob EPR)
Convenient to have that put in a file using -o option
when submitting job, e.g.
globusrun-ws -submit -b -o jobEPR -c /bin/sleep 100
where jobEPR holds the job ID (ManagedJob EPR).
2-1.53
To watch status of submitted job
“Attach” interactive monitoring with -monitor option.
Job ID (ManagedJob EPR) provided with -j option, e.g.:
globusrun-ws -monitor -j jobEPR
where jobEPR holds ManagedJob EPR.
Then can see stages job goes through with interactive output
immediately:
Current job state: Active
Current job state: CleanUp
Current job state: Done
Requesting original job description...Done.
Destroying job...Done
although job itself still batch output job.
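The batch workflow is a two-step sequence: submit with -b and -o, then attach with -monitor and -j using the same EPR file. The sketch below only builds the two argument lists so that it runs without a Globus installation; the function names are assumptions, and -c is placed before the program in the same way as in the earlier -c examples. On a machine with the toolkit installed, each list could be passed to subprocess.run.

```python
# Sketch of the batch-then-monitor sequence as argument lists.
# Function names are hypothetical; the flags are those described in the slides.

def batch_submit_command(epr_file, program, *args):
    """Submit in batch mode and save the ManagedJob EPR to epr_file."""
    return ["globusrun-ws", "-submit", "-b", "-o", epr_file, "-c", program, *args]


def monitor_command(epr_file):
    """Attach interactive monitoring to a previously submitted job."""
    return ["globusrun-ws", "-monitor", "-j", epr_file]


if __name__ == "__main__":
    for cmd in (batch_submit_command("jobEPR", "/bin/sleep", "100"),
                monitor_command("jobEPR")):
        print(" ".join(cmd))
```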
2-1.54
Some other options
-status Reports the current state of
the job and exits
-kill Requests immediate
cancellation of job and exits.
2-1.55
2-1.56
Specifying where job is submitted
Request to run job processed by “factory” service
called ManagedJobFactoryService.
Default URL:
https://localhost:8443/wsrf/services/ManagedJobFactoryService
2-1.57
To specify where job is submitted
-F Specifies “contact” for the job submission.
globusrun-ws -submit -F http://localhost:8440 -f prog1.xml
Job submitted to localhost
Globus container that hosts services running on port
8440
Factory service still located at:
wsrf/services/ManagedJobFactoryService
2-1.58
Selecting a different host
Example
globusrun-ws -submit -F
https://140.221.65.193:4444/wsrf/services/ManagedJobFactoryService
-f prog1.xml
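The full contact is just a host, a port, and the fixed factory service path joined into a URL. The sketch below shows that composition using the default values quoted in the slides; the function name factory_url is an assumption, and the code does not claim to reproduce how globusrun-ws itself fills in missing parts of a contact.

```python
from urllib.parse import urlunsplit

# Service path taken from the default URL given in the slides.
FACTORY_PATH = "/wsrf/services/ManagedJobFactoryService"


def factory_url(host="localhost", port=8443, scheme="https"):
    """Compose a ManagedJobFactoryService URL from host and port."""
    return urlunsplit((scheme, f"{host}:{port}", FACTORY_PATH, "", ""))


if __name__ == "__main__":
    print(factory_url())                        # the default contact
    print(factory_url("140.221.65.193", 4444))  # a different host and port
```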
2-1.59
Many other options
Example
-term time
Set an absolute termination time, or a
time relative to successful job creation
2-1.60
Transferring Files
Job submission command, for example:
globusrun-ws -submit
-F http://coit-grid01.uncc.edu -c prog1
requires prog1 to already exist on the remote machine
in the default directory ( ${GLOBUS_USER_HOME} ).
Up to user to ensure executable is in place.
2-1.61
GridFTP
A Globus component that provides for:
• Large data transfers
• Secure transfers
• Fast transfers
– Parallel transfers -- employing multiple virtual
channels sharing a single physical network
connection
– Striping -- employing multiple physical channels
using multiple hardware interfaces.
• Reliable transfers
• Third party transfers.
2-1.62
Globus Open Source Grid Software
[Figure: Globus Toolkit component diagram repeated, with GridFTP highlighted. Source: I. Foster]
Third party transfers
Transferring a file from one remote location to
another remote location controlled by a party at
another location (the third party).
Already seen third party transfers in Grid portal at
file management portlet.
There, user can initiate a transfer between two
locations from portal running on a third system.
2-1.64
GridFTP third party transfers
Fig 2.5
2-1.65
ReliableFileTransfer (RFT)
service
GridFTP is not a Web/Grid service.
ReliableFileTransfer (RFT) service provides service
interface and additional features for reliable file
transfers (retry capabilities etc.).
RFT uses GridFTP servers to effect actual transfer.
2-1.66
Globus Open Source Grid Software
[Figure: Globus Toolkit component diagram repeated, with RFT highlighted. Source: I. Foster]
Globus file transfer command
globus-url-copy
Example
globus-url-copy
gsiftp://www.coit-grid02.uncc.edu/~abw/hello
file:///home/abw/
First argument is source URL; second is destination URL.
Copies file hello from coit-grid02.uncc.edu to the local
machine using GridFTP.
User needs valid security credentials (a certificate
and proxy).
2-1.68
Question
Why three /’s in file URL, i.e. file:/// ?
Answer
The general form of file URL is
file://host/path. If host omitted, it is assumed
to be localhost, left with three /’s, i.e. file:///.
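The host/path split can be checked with a standard URL parser. This short sketch uses Python's urllib.parse.urlsplit; the helper name describe is an assumption. With file:/// the host part comes back empty and the path keeps its leading slash.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return (scheme, host, path) for a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


if __name__ == "__main__":
    for url in ("file:///home/abw/",
                "file://localhost/home/abw/",
                "gsiftp://www.coit-grid02.uncc.edu/~abw/hello"):
        print(describe(url))
```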
2-1.69
File Staging
Moving complete files to where they are needed.
Usually associated with input and output files.
Input files need to be moved to where program is located.
Output files generated need to be moved back to user,
or on as input to other programs.
Note: different from input and output streaming, which
moves a series of data items as a stream as they are
generated.
2-1.70
File staging
Fig 2.6
2-1.71
Staging example in JDD
<job>
…
<fileStageOut>
<transfer>
<sourceUrl>file:///prog1Out</sourceUrl>
<destinationUrl>gsiftp://coit-grid05.uncc.edu:2811/prog1Out</destinationUrl>
</transfer>
</fileStageOut>
…
</job>
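A stage-out section like this can be appended to a job description with an XML library. The sketch below uses Python's xml.etree.ElementTree and the element names from the example; the function name add_stage_out is hypothetical, and placing several transfer elements under one fileStageOut is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def add_stage_out(job, source_url, destination_url):
    """Append a <transfer> to the job's <fileStageOut>, creating it if needed."""
    stage_out = job.find("fileStageOut")
    if stage_out is None:
        stage_out = ET.SubElement(job, "fileStageOut")
    transfer = ET.SubElement(stage_out, "transfer")
    ET.SubElement(transfer, "sourceUrl").text = source_url
    ET.SubElement(transfer, "destinationUrl").text = destination_url
    return transfer


if __name__ == "__main__":
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = "prog1"  # placeholder program
    add_stage_out(job, "file:///prog1Out",
                  "gsiftp://coit-grid05.uncc.edu:2811/prog1Out")
    print(ET.tostring(job, encoding="unicode"))
```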
2-1.72
Staging example in JSDL
<jsdl:DataStaging>
<jsdl:FileName>
/inputfiles/prog1Input
</jsdl:FileName>
<jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
<jsdl:Source>
<jsdl:URI>
gsiftp://coit-grid05.uncc.edu:2811/prog1Input
</jsdl:URI>
</jsdl:Source>
</jsdl:DataStaging>
2-1.73
Sources of GT 4 information
http://www.globus.org/toolkit/docs
2-1.74
Questions
(multiple choice)
2-1.75
When one issues the GT4.0 command:
globusrun-ws -submit -F localhost:8440 -s
-so hello1 -c /bin/echo hello
what is hello?
(a) A java class
(b) An xml file containing the description of the job to
be run
(c) The executable to run in Globus
(d) The argument for the program that will be
executed
2-1.76
When one issues the GT4.0 command:
globusrun-ws -submit -F localhost:8440 -s
-so hello1 -c /bin/echo hello
is the order of the flags important, and if so why?
(a) Not important
(b) Important: -c must be last as it uses the
remaining arguments
(c) Important: -s must be before -so
(d) Important: -F must be first
2-1.77
When one issues the GT4.0 command:
globusrun-ws -submit -F localhost:8440 -s
-so hello1 -c /bin/echo hello
what is localhost?
(a) The server logged into running globusrun-ws.
(b) The computer you are using to log into the
server
(c) None of the other answers.
2-1.78
What does the tag <count> specify
in an RSL-2/JDD file?
(a) The number of different jobs
submitted.
(b) The number of computers to use.
(c) The number of identical jobs to
submit.
(d) The number of arguments.
2-1.79