Transcript Slide 1

Enabling Grids for E-sciencE
gLite job submission
Fokke Dijkstra
Donald Smits Centre for Information Technology,
University of Groningen
Utrecht, Grid Tutorial 2008
www.eu-egee.org
EGEE-III INFSO-RI-222667
EGEE and gLite are registered trademarks
Introduction
Components in the EGEE Grid
[Diagram: components of the EGEE Grid: User Interface (UI), JDL job description, Workload Management System (WMS), Information System (BDII), LCG File Catalog (LFC), Computing Element (CE) and Storage Element (SE)]
Workload Management System
• Tasks of the WMS
– Find the best resource for your tasks (jobs)
– Submit jobs to compute resources
– Logging and bookkeeping
– Delegated Grid credential management
Job preparation
• You need to provide
– A complete (enough) job description
▪ What program?
▪ What data?
▪ Any requirements on OS, installed software, ...
– Possibly a program
▪ You’re submitting in unknown territory!
▪ Program portably!
▪ Don’t rely on hard-coded paths or special locations
▪ The program you send may not even be in $HOME!
– Perhaps some input data
– Perhaps instructions on what to do with the output
How to Write a Job Description
• Here is a minimal job description (call it hello.jdl)
Executable = "/bin/echo";
Arguments = "Goedemiddag";
StdError = "stderr.log";
StdOutput = "stdout.log";
OutputSandbox = {"stderr.log", "stdout.log"};
• We specified
– The program to run and its arguments
– The files that receive the standard error and output streams
– Which files to return to the user (the output sandbox)
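A job description like hello.jdl can also be written by a script. The following is a minimal Python sketch of that idea, not a gLite tool: the helper name render_jdl and the dictionary layout are assumptions made for illustration, and it only handles quoted strings and lists of strings.

```python
def render_jdl(attributes):
    """Render a dict as JDL text: one 'Attribute = expression;' line per entry."""
    lines = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            # Lists are written as JDL sets: {"a", "b"}
            expr = "{" + ", ".join(f'"{item}"' for item in value) + "}"
        else:
            # Everything else is written as a double-quoted string
            expr = f'"{value}"'
        lines.append(f"{name} = {expr};")
    return "\n".join(lines) + "\n"


# The attributes of hello.jdl from this slide
hello = {
    "Executable": "/bin/echo",
    "Arguments": "Goedemiddag",
    "StdError": "stderr.log",
    "StdOutput": "stdout.log",
    "OutputSandbox": ["stderr.log", "stdout.log"],
}

hello_jdl = render_jdl(hello)
print(hello_jdl)
```

Writing hello_jdl to a file named hello.jdl gives a description that can be passed to glite-wms-job-submit.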
Job Submission Example
• User issues a voms-proxy-init
– Enters the pass phrase of their certificate
– Receives a valid Globus proxy
• User issues a:
glite-wms-job-submit -a mytest.jdl
and gets back from the system a unique Job Identifier (JobId)
• User issues a:
glite-wms-job-status JobId
to get information about the current status of the job
• When the “Done” status is reached, the user can issue a
glite-wms-job-output JobId
and the system returns the name of the temporary directory on the UI machine where the job output can be found.
Submitting it
$ voms-proxy-init --voms tutor
Cannot find file or dir: /admins/fokke/.glite/vomses
Enter GRID pass phrase:
Your identity: /O=dutchgrid/O=users/O=rug/OU=rc/CN=Fokke Dijkstra
Creating temporary proxy ........................................... Done
Contacting voms.grid.sara.nl:30007
[/O=dutchgrid/O=hosts/OU=sara.nl/CN=voms.grid.sara.nl] "tutor" Done
Creating proxy ................................................ Done
Your proxy is valid until Wed Nov 5 23:11:27 2008
$ glite-wms-job-submit -a hello.jdl
Connecting to the service https://wms.grid.sara.nl:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
==========================================================================
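Scripts that submit jobs usually need the job identifier from this output. The sketch below shows one way to extract it in Python; the function name extract_job_id is hypothetical, and the sample text is copied from the output above rather than produced by a live command.

```python
import re

# Sample text copied from the glite-wms-job-submit output shown above
SUBMIT_OUTPUT = """\
Connecting to the service https://wms.grid.sara.nl:7443/glite_wms_wmproxy_server
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
==========================================================================
"""


def extract_job_id(text):
    """Return the https URL that follows 'Your job identifier is:', or None."""
    match = re.search(r"Your job identifier is:\s+(https://\S+)", text)
    return match.group(1) if match else None


job_id = extract_job_id(SUBMIT_OUTPUT)
print(job_id)
```

The -o option of glite-wms-job-submit, described later in this tutorial, is an alternative that avoids parsing the screen output.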
A Job Submission Example
[Diagram: the UI sends the JDL job description and the input sandbox to the WMS, which passes the job and input sandbox on to a CE; the Information System (IS), LCG File Catalog (LFC) and Storage Element (SE) are also shown. Job status so far: submitted, waiting, ready, scheduled]
Checking the status
$ glite-wms-job-status https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
Current Status:     Scheduled
Status Reason:      Job successfully submitted to Globus
Destination:        ce.grid.rug.nl:2119/jobmanager-pbs-long
Submitted:          Wed Nov 5 11:12:15 2008 CET
*************************************************************
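The status report is plain "label: value" text, so it is easy to read from a script. The following Python sketch is an illustration only: parse_status is a hypothetical helper, the sample text is taken from this slide with the spacing simplified, and other versions of the command may lay out their output differently.

```python
# Sample text based on the glite-wms-job-status output shown above
STATUS_OUTPUT = """\
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: ce.grid.rug.nl:2119/jobmanager-pbs-long
Submitted: Wed Nov 5 11:12:15 2008 CET
*************************************************************
"""


def parse_status(text):
    """Collect 'Label: value' lines into a dict, splitting at the first ': '."""
    fields = {}
    for line in text.splitlines():
        label, separator, value = line.partition(": ")
        if separator and value.strip():
            fields[label.strip()] = value.strip()
    return fields


status = parse_status(STATUS_OUTPUT)
print(status["Current Status"])
```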
Check status using browser
A Job Submission Example
[Diagram: UI, WMS, CE, SE, LFC and Information System (IS); the job has run on the CE and its Output Sandbox is sent back to the WMS. Job status so far: submitted, waiting, ready, scheduled, running, done]
Getting the Output
$ glite-wms-job-output https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
Connecting to the service https://wms.grid.sara.nl:7443/glite_wms_wmproxy_server
==============================================================================
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
https://wms.grid.sara.nl:9000/V7pw7lTR4MeFMVAz12larQ
have been successfully retrieved and stored in the directory:
/tmp/jobOutput/fokke_V7pw7lTR4MeFMVAz12larQ
===============================================================================
$ cat /tmp/jobOutput/fokke_V7pw7lTR4MeFMVAz12larQ/stdout.log
Goedemiddag
A Job Submission Example
[Diagram: UI, WMS, CE, SE, LFC and Information System (IS); the Output Sandbox is retrieved to the UI. Job status: submitted, waiting, ready, scheduled, running, done, cleared]
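The job states named in the diagrams (submitted, waiting, ready, scheduled, running, done, cleared) form a simple sequence for a job that succeeds. The Python sketch below encodes only that normal path; the helper names are hypothetical, and failure states such as aborted or cancelled jobs are deliberately left out.

```python
# Job states in the order they appear in the diagrams of this tutorial
JOB_STATES = ["submitted", "waiting", "ready", "scheduled", "running", "done", "cleared"]


def has_reached(current, target):
    """True if `current` is at or past `target` on the normal path."""
    return JOB_STATES.index(current.lower()) >= JOB_STATES.index(target.lower())


def output_can_be_fetched(current):
    """The output is retrieved once the job is done and before it is cleared."""
    return current.lower() == "done"


print(has_reached("Scheduled", "ready"))
print(output_can_be_fetched("Running"))
```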
Job Description Language
• The Job Description Language (JDL) is based on the Classified Advertisement (ClassAd) language
• Lines:
– Attribute = expression;
– An expression can span multiple lines; the semicolon is the separator
– Strings are enclosed in double quotes: "for strings"
– # and // for comments
– No blanks after the ; !!
Types of Attributes
• The supported attributes are grouped in two categories:
– Job
▪ Define the job itself
– Resources
▪ Taken into account by the WMS for carrying out the matchmaking algorithm
▪ Computing Resource (Attributes)
  Used to build expressions of Requirements and/or Rank attributes by the user
  Have to be prefixed with “other.”
▪ Data and Storage resources (Attributes)
  Input data to process, SE where to store output data, protocols spoken by application when accessing SEs
Job Definition Attributes
• Executable (mandatory)
– The command name
• Arguments (optional)
– Job command line arguments
• StdInput, StdOutput, StdError (optional)
– Standard input/output/error of the job
• Environment (optional)
– List of environment settings
• InputSandbox (optional)
– List of files on the UI local disk needed by the job for running
– The listed files are staged from the UI to the remote CE
– Wildcards allowed
– Unique filenames required
• OutputSandbox (optional)
– List of files, generated by the job, which have to be retrieved
Resource Attributes
• Requirements
– Job requirements on computing resources
– Specified using attributes of resources published in the Information System
– other.GlueCEStateStatus == "Production" is always included (the resource has to be in the Production grid)
– Useful requirements:
▪ Wallclock time and specific sites:
Requirements = other.GlueCEPolicyMaxWallClockTime > 720 &&
RegExp("nikhef.nl", other.GlueCEUniqueID);
▪ Specific tag published:
Requirements = Member("VO-ncf-gromacs3.3.2", other.GlueHostApplicationSoftwareRunTimeEnvironment);
– Logical expressions:
▪ &&: and
▪ ||: or
▪ !: not
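To make the meaning of these expressions concrete, the Python sketch below evaluates the two example requirements against one CE description. It is an illustration under stated assumptions: the CE record, its host name and its values are made up, member and reg_exp are hypothetical stand-ins for the JDL functions, and RegExp is assumed to match anywhere in the string, as the site example on this slide suggests.

```python
import re


def member(value, values):
    """Stand-in for JDL Member(): is `value` one of the entries in the list?"""
    return value in values


def reg_exp(pattern, text):
    """Stand-in for JDL RegExp(): does `pattern` match somewhere in `text`?"""
    return re.search(pattern, text) is not None


# Hypothetical CE record with made-up values (the "other." attributes)
ce = {
    "GlueCEUniqueID": "ce.example.nikhef.nl:2119/jobmanager-pbs-long",
    "GlueCEPolicyMaxWallClockTime": 2160,
    "GlueHostApplicationSoftwareRunTimeEnvironment": ["VO-ncf-gromacs3.3.2"],
}


def wallclock_and_site(other):
    # other.GlueCEPolicyMaxWallClockTime > 720 && RegExp("nikhef.nl", other.GlueCEUniqueID)
    return (other["GlueCEPolicyMaxWallClockTime"] > 720
            and reg_exp("nikhef.nl", other["GlueCEUniqueID"]))


def has_gromacs_tag(other):
    # Member("VO-ncf-gromacs3.3.2", other.GlueHostApplicationSoftwareRunTimeEnvironment)
    return member("VO-ncf-gromacs3.3.2",
                  other["GlueHostApplicationSoftwareRunTimeEnvironment"])


print(wallclock_and_site(ce), has_gromacs_tag(ce))
```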
Data Attributes
• InputData (optional)
– Refers to data used as input by the job (these data are published in the Replica Catalog and stored in the SEs)
– GUIDs and/or LFNs
– Job must be sent to a CE that has the data nearby
• DataAccessProtocol (mandatory if InputData specified)
– The protocol, or list of protocols, that the application can use to access InputData on a given SE
WMS match making and ranking
• The WMS has to find the most suitable CE on which to execute the job
• It interacts with the Data Management service and the Information System
• The CE chosen has to match the job requirements
• If 2 or more CEs satisfy all the requirements, the one with the best Rank is chosen
– Rank is specified using attributes of resources published in the Information System
– If not specified, the default value is used:
Rank = -other.GlueCEStateEstimatedResponseTime;
(the CE with the quickest estimated response time is preferred)
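The two steps described here, filtering on the requirements and then picking the CE with the best Rank, can be sketched in a few lines of Python. This is a simplified model and not the WMS implementation: the CE names and numbers are made up, choose_ce and default_rank are hypothetical names, and "best" is taken to mean the highest Rank value.

```python
def choose_ce(candidates, requirements, rank):
    """Keep the CEs that satisfy `requirements`, then return the one with the highest rank."""
    matching = [ce for ce in candidates if requirements(ce)]
    if not matching:
        return None
    return max(matching, key=rank)


def default_rank(other):
    # Rank = -other.GlueCEStateEstimatedResponseTime;
    return -other["GlueCEStateEstimatedResponseTime"]


# Hypothetical CEs with made-up numbers, for illustration only
ces = [
    {"GlueCEUniqueID": "ce-a.example.org",
     "GlueCEPolicyMaxWallClockTime": 2880, "GlueCEStateEstimatedResponseTime": 600},
    {"GlueCEUniqueID": "ce-b.example.org",
     "GlueCEPolicyMaxWallClockTime": 2880, "GlueCEStateEstimatedResponseTime": 120},
    {"GlueCEUniqueID": "ce-c.example.org",
     "GlueCEPolicyMaxWallClockTime": 60, "GlueCEStateEstimatedResponseTime": 5},
]

best = choose_ce(ces, lambda ce: ce["GlueCEPolicyMaxWallClockTime"] > 720, default_rank)
print(best["GlueCEUniqueID"])
```

In this example ce-c has the shortest response time but fails the requirement, so it is never ranked at all.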
Example JDL File
Executable = "gridTest";
StdError = "stderr.log";
StdOutput = "stdout.log";
InputSandbox = {"/home/joda/test/gridTest"};
OutputSandbox = {"stderr.log", "stdout.log"};
InputData = "lfn:/grid/tutor/testbed0-00019";
DataAccessProtocol = "gridftp";
Requirements = other.Architecture == "INTEL" &&
other.OpSys == "CentOS" && other.FreeCpus >= 4;
Rank = other.GlueHostBenchmarkSF00;
Job Submission
• glite-wms-job-submit [-a] [-d <delegationid>] [-o <output file>] <job.jdl>
-o the generated jobId is written to <output file>
▪ Useful for other commands, e.g.:
glite-wms-job-status -i <input file> (or jobId)
-i the status information for the jobIds contained in <input file> is displayed
-a use automatic delegation
-d use an existing delegated proxy at the WMS, e.g. one generated using:
glite-wms-job-delegate-proxy -d <delegationid>
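When submission is driven from a script, the options above translate into an argument list. The Python sketch below builds such lists using only the options named on this slide; submit_command and status_command are hypothetical helper names, and choosing -a whenever no delegation id is given is a design assumption of this sketch.

```python
def submit_command(jdl_path, delegation_id=None, output_file=None):
    """Build the argument list for glite-wms-job-submit from the options on this slide."""
    argv = ["glite-wms-job-submit"]
    if delegation_id is not None:
        argv += ["-d", delegation_id]  # reuse an existing delegated proxy
    else:
        argv.append("-a")              # automatic delegation
    if output_file is not None:
        argv += ["-o", output_file]    # write the generated jobId to this file
    argv.append(jdl_path)
    return argv


def status_command(input_file):
    """Build the argument list that checks the jobs listed in `input_file`."""
    return ["glite-wms-job-status", "-i", input_file]


print(" ".join(submit_command("hello.jdl", output_file="jobids.txt")))
print(" ".join(status_command("jobids.txt")))
```

On a UI machine with the gLite client tools installed, such a list could be handed to subprocess.run.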
Other WMS UI Commands
• glite-wms-job-list-match
Lists resources matching a job description
Performs the matchmaking without submitting the job
• glite-wms-job-cancel
Cancels a given job
• glite-wms-job-status
Displays the status of the job
• glite-wms-job-output
Returns the job-output (the OutputSandbox files) to the user
• glite-wms-job-logging-info
Displays logging information about submitted jobs (all the events
“pushed” by the various components of the WMS)
Very useful for debugging
Proxy Renewal
• Why?
– To avoid job failure because it outlived the validity of the initial proxy
– To prevent long term proxies from lying around
– Use a safe system for storing long term proxies
• The WMS supports an automatic proxy renewal mechanism, as long as the user credentials are handled by a proxy server (MyProxy).
1. Create a proxy using
voms-proxy-init --voms <voname>
2. Register this proxy with the MyProxy server using
myproxy-init -s <server> [-t <cred> -c <proxy>] -d -n
server is the server address (e.g. px.matrix.sara.nl)
cred is the number of hours the proxy should be valid on the server
proxy is the number of hours renewed proxies should be valid
3. The proxy is automatically renewed by the WMS, without user intervention, for the whole lifetime of the job
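Whether renewal is needed comes down to comparing the proxy lifetime with the expected duration of the job. The Python sketch below does that comparison with the times from the earlier slides; proxy_outlives_job is a hypothetical helper, and it ignores time spent waiting in the queue, so it is optimistic.

```python
from datetime import datetime, timedelta


def proxy_outlives_job(proxy_expiry, now, expected_job_hours):
    """True if the proxy is still valid when the job is expected to finish."""
    return now + timedelta(hours=expected_job_hours) <= proxy_expiry


# Times taken from the voms-proxy-init and job status output earlier in this tutorial
expiry = datetime(2008, 11, 5, 23, 11, 27)
submitted = datetime(2008, 11, 5, 11, 12, 15)

print(proxy_outlives_job(expiry, submitted, expected_job_hours=6))
print(proxy_outlives_job(expiry, submitted, expected_job_hours=24))
```

A job for which this check fails is a candidate for the MyProxy registration in step 2.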
Advanced Job types: Job Collection
• Set of independent jobs
– Collect jobs in a single directory:
glite-wms-job-submit -a --collection <directory>
– Advanced collection using a global set of attributes
[
Type = "Collection";
InputSandbox = {"myjob.exe", "fileA"};
OutputSandboxBaseDestURI = "gsiftp://lxb0707.cern.ch/data/doe";
DefaultNodeShallowRetryCount = 5;
Nodes = {
[
Executable = "myjob.exe";
InputSandbox = {root.InputSandbox,
"fileB"};
OutputSandbox = {"myoutput1.txt"};
Requirements = other.GlueCEPolicyMaxWallClockTime > 1440;
],
[
NodeName = "mysubjob";
Executable = "myjob.exe";
OutputSandbox = {"myoutput2.txt"};
ShallowRetryCount = 3;
],
[
File = "/home/doe/test.jdl";
]
}
]
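For the simple form of a collection, a directory of JDL files, the files can be generated by a script. The Python sketch below writes three small job descriptions into one directory; write_collection, the file names and the job contents are assumptions for illustration, modelled on the hello.jdl example.

```python
import os
import tempfile


def write_collection(directory, jobs):
    """Write one JDL file per job into `directory`; `jobs` maps a name to JDL text."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for name, jdl_text in jobs.items():
        path = os.path.join(directory, name + ".jdl")
        with open(path, "w") as handle:
            handle.write(jdl_text)
        paths.append(path)
    return paths


# Hypothetical jobs: the hello.jdl pattern with a different argument per job
jobs = {
    f"hello{i}": (
        'Executable = "/bin/echo";\n'
        f'Arguments = "job {i}";\n'
        'StdOutput = "stdout.log";\n'
        'StdError = "stderr.log";\n'
        'OutputSandbox = {"stdout.log", "stderr.log"};\n'
    )
    for i in range(1, 4)
}

collection_dir = os.path.join(tempfile.mkdtemp(), "collection")
written = write_collection(collection_dir, jobs)
print(written)
```

The resulting directory is what would be passed to glite-wms-job-submit -a --collection <directory>.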
Advanced Job types: Parametric
• Identical jobs, except for the value of a parameter
• Parameters
– List of items
– Number
▪ ParameterStart and ParameterStep necessary
– _PARAM_ replaced by parameter value
[
JobType = "Parametric";
Executable = "myjob.exe";
StdInput = "input_PARAM_.txt";
StdOutput = "output_PARAM_.txt";
StdError = "error_PARAM_.txt";
Parameters = 100;
ParameterStart = 1;
ParameterStep = 1;
InputSandbox = {"myjob.exe", "input_PARAM_.txt"};
OutputSandbox = {"output_PARAM_.txt",
"error_PARAM_.txt"};
]
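The substitution of _PARAM_ can be pictured with a short Python sketch. It only models the text replacement: expand and expand_job are hypothetical helpers, and the parameter values are given explicitly because the exact rule by which the WMS derives them from Parameters, ParameterStart and ParameterStep is not modelled here.

```python
def expand(template, value):
    """Replace every _PARAM_ in a string, or in each string of a list."""
    if isinstance(template, list):
        return [expand(item, value) for item in template]
    return template.replace("_PARAM_", str(value))


def expand_job(template, value):
    """Return a copy of the job attributes with _PARAM_ filled in."""
    return {name: expand(expr, value) for name, expr in template.items()}


# The attributes of the parametric job on this slide
job_template = {
    "Executable": "myjob.exe",
    "StdInput": "input_PARAM_.txt",
    "StdOutput": "output_PARAM_.txt",
    "StdError": "error_PARAM_.txt",
    "InputSandbox": ["myjob.exe", "input_PARAM_.txt"],
    "OutputSandbox": ["output_PARAM_.txt", "error_PARAM_.txt"],
}

# Three explicit values, for illustration only
jobs = [expand_job(job_template, value) for value in (1, 2, 3)]
print(jobs[0]["StdInput"], jobs[2]["OutputSandbox"])
```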
Advanced job types: MPI
• For programs using the MPI parallel library
• JobType = "MPICH"
• NodeNumber = <n>
– Request n cores on the remote cluster
– Scheduling is determined at remote site
• Submit a script that starts up your program using MPI
– You can use mpi-start for this
• Scheduling SMP nodes not yet possible
Other Advanced Job types
• Directed Acyclic Graph (DAG)
– Graph shows dependencies between jobs
[Diagram: example graph with nodes nodeA, nodeB, nodeC, nodeD and mynode]
• Interactive
– Opens graphical window that connects to job
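The idea of a dependency graph between jobs can be illustrated with Python's standard graphlib module (Python 3.9 or later). The edges below are an assumption: the node names come from the slide, but the dependencies themselves are not recoverable from the diagram, so a simple diamond shape is used.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each node maps to the set of nodes that must finish first
dependencies = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
}

# One valid execution order that respects every dependency
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

With these edges nodeA always runs first and nodeD last, while nodeB and nodeC are independent of each other and could run at the same time.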
Pilot jobs
• Send agent to a site first
– Will fetch workload from central service
– Both single and multi user frameworks exist
• Advantages
– Hides problematic sites from the user
– Probably less overhead per workload
– Makes central scheduling possible
• Disadvantages
– Less efficient scheduling at site
– Security concerns with multiple users
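The pilot idea, an agent that lands on a site first and then pulls work from a central service, can be sketched with an in-memory queue. This Python example is a toy model under stated assumptions: the queue stands in for the central service, the workloads are made-up functions, and run_pilot is a hypothetical name; real pilot frameworks also deal with authentication, time limits and failures.

```python
from queue import Queue, Empty


def run_pilot(central_queue, site_name):
    """Pull workloads from the central queue until none are left; return the results."""
    completed = []
    while True:
        try:
            workload = central_queue.get_nowait()
        except Empty:
            break  # nothing left to do: the pilot exits and frees its slot
        completed.append((site_name, workload()))
    return completed


# Hypothetical central service holding three small workloads
central = Queue()
for n in range(3):
    central.put(lambda n=n: n * n)

results = run_pilot(central, "site-a")
print(results)
```

Because one pilot runs several workloads, the cost of getting a job slot at the site is paid once rather than once per workload.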
How to get your program on the Grid?
• Send it with job
– Binary package
– Compile it on the fly
Advantages
– You will be able to run anywhere
Disadvantages
– Portability extremely important, otherwise jobs will fail
– Overhead
• Use software manager accounts
– Write permission on special shared directory
– Publish tags in information system
Advantages
– Software can be validated
– Central management takes burden from users
Disadvantages
– Sites have to support this
– Lot of work for software manager
• Use preinstalled packages
– Only possible for special collaborations
▪ Example: VL-e software stack
Advantages
– Very easy for users
Disadvantages
– Sites have to support it
– Lot of work for software packager
Pointers to advanced topics
• Can be found in gLite user guide:
http://glite.web.cern.ch/glite/documentation/
• Advanced sandbox play
– GridFTP instead of local files
▪ No space on WMS needed
• Brokerinfo
– Information about local environment (CE, SEs, etc.)
• Job perusal
– Peek at the output while your job is running
• Automatic retries
– RetryCount
– ShallowRetryCount