
Condor J2
+ Developer APIs to Condor
+ A Tutorial on Condor’s Web Service Interface
Todd Tannenbaum
Computer Sciences Department
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor
CondorJ2
› Quill/Quill++: the database reflects the state of the Condor pool
› Condor J2: the database is the state of the Condor pool
› Overview of CondorJ2
  - Use a database to maintain operational data (workflow state, machine state, config policies, etc.)
  - Implement workflow management, resource management and resource allocation in a J2EE application server environment
  - Modify the master, startd and starter to be web service clients
  - Provide a web interface for all system services (workflow submission, machine reconfiguration, etc.)
Motivation
› Flexibility
› Centralized administrability
› Attempt to leverage standard “enterprise” technology in this space
› Scalability
  - As big as you want, if you are willing to pay the big $$$
Java Application Servers
› Industrial-strength middleware for high-performance, scalable web applications
› Widely deployed systems
  - Oracle AS 10g, IBM WebSphere, BEA WebLogic, JBoss (open source)
› Key features
  - Database connection pooling
  - Support for transactions
  - Web service interfaces
  - Support for clustering (for scalability)
  - Pluggable security models / role-based authorization
  - Backend database independence
[Architecture diagrams: an application server hosts the workflow, machine and matchmaking modules and keeps all state in the Condor database via JDBC. Users reach the Condor web site from a browser over HTTP, and their custom tools call the Condor web services with SOAP over HTTP. On each execute machine in the Condor pool, the master, startd and starter act as web service clients, also using SOAP over HTTP. A second diagram scales this out: several application servers share the pool database behind a load balancer, and execute machines (each running a startd, a starter and a job) reach them through a firewall/NAT.]
What can you do in CondorJ2 via browsers and web services?
› Where do we stand now?
  - Add and configure new machines
  - Reconfigure machines on the fly
  - Specify, submit, monitor and manage workflows
  - Monitor global system state
  - No matchmaking (yet)
› It is currently research work. When will it ship? Will it ever ship? Only time will tell.
Interfacing Applications w/ Condor
› Suppose you have an application which needs a lot of compute cycles
› You want this application to utilize a pool of machines
› How can this be done?
Some Condor APIs
› MW (previous talk)
› Command line tools
  - condor_submit, condor_q, etc.
› DRMAA
› Condor GAHP
› Condor Perl Module
› SOAP
Command Line Tools
› Don’t underestimate them!
› Your program can create a submit file on disk and simply invoke condor_submit:
system("echo universe=VANILLA > /tmp/condor.sub");
system("echo executable=myprog >> /tmp/condor.sub");
. . .
system("echo queue >> /tmp/condor.sub");
system("condor_submit /tmp/condor.sub");
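Here is a minimal sketch of the same approach in Java. The class and method names are illustrative, and the sketch assumes condor_submit is on the PATH. Writing the file with java.nio avoids the shell quoting that the echo calls depend on.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: write a submit file, then run condor_submit on it.
class SubmitViaFile {
    // The same lines the echo calls above write to /tmp/condor.sub
    // (the elided ". . ." part is left out here too).
    static List<String> description(String executable) {
        return List.of("universe=VANILLA", "executable=" + executable, "queue");
    }

    // Writes one submit command per line to a fresh temporary file.
    static Path writeSubmitFile(List<String> lines) {
        try {
            Path file = Files.createTempFile("condor", ".sub");
            Files.write(file, lines, StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs "condor_submit <file>" and returns its exit status.
    // Assumes condor_submit is on the PATH; its output goes to our console.
    static int submit(Path submitFile) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("condor_submit", submitFile.toString())
                .inheritIO()
                .start();
        return p.waitFor();
    }
}
```

Usage: `SubmitViaFile.submit(SubmitViaFile.writeSubmitFile(SubmitViaFile.description("myprog")))`.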
Command Line Tools
› Your program can create a submit file and give it to condor_submit through stdin:
PERL:
open(SUBMIT, "|condor_submit");
print SUBMIT "universe=VANILLA\n";
. . .
close(SUBMIT);
C/C++:
FILE *s = popen("condor_submit", "w");
fputs("universe=VANILLA\n", s);
. . .
pclose(s);
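The popen() version can be sketched in Java with ProcessBuilder. The class and method names are illustrative, and condor_submit must be on the PATH. Closing the process's stdin tells condor_submit that the description is complete.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: feed a submit description to a command's stdin.
class SubmitViaStdin {
    // Starts the command, writes `input` to its stdin, closes stdin so the
    // command sees end-of-file, and returns everything it printed.
    // Writing everything before reading is fine for small descriptions;
    // a large two-way exchange would need a separate reader thread.
    static String runWithStdin(List<String> command, String input)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(input.getBytes(StandardCharsets.UTF_8));
        }
        String output;
        try (InputStream stdout = p.getInputStream()) {
            output = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
        }
        p.waitFor();
        return output;
    }

    // Pipes a submit description straight into condor_submit.
    static String submit(String description) throws IOException, InterruptedException {
        return runWithStdin(List.of("condor_submit"), description);
    }
}
```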
Command Line Tools
› Using a +Attribute with condor_submit adds a custom attribute (here, webuser) to the job’s ClassAd:
universe = VANILLA
executable = /bin/hostname
output = job.out
log = job.log
+webuser = "zmiller"
queue
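When a program generates descriptions like this one, a small builder keeps the quoting of +attribute values in one place. This is a hypothetical helper, not a Condor API. It assumes only that an embedded double quote in a ClassAd string is written as \", and it refuses backslashes and newlines rather than guess how they should be escaped.

```java
// Hypothetical builder for submit descriptions with custom +attributes.
class SubmitDescription {
    private final StringBuilder text = new StringBuilder();

    // Adds an ordinary submit command such as "executable = /bin/hostname".
    SubmitDescription set(String command, String value) {
        text.append(command).append(" = ").append(value).append('\n');
        return this;
    }

    // Adds a custom job attribute as a quoted ClassAd string: +name = "value".
    SubmitDescription custom(String name, String value) {
        if (value.indexOf('\\') >= 0 || value.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("backslashes and newlines are not handled");
        }
        String quoted = "\"" + value.replace("\"", "\\\"") + "\"";
        text.append('+').append(name).append(" = ").append(quoted).append('\n');
        return this;
    }

    // Returns the whole description, ending with the "queue" line.
    String queue() {
        return text.toString() + "queue\n";
    }
}
```

Usage: `new SubmitDescription().set("universe", "VANILLA").set("executable", "/bin/hostname").custom("webuser", "zmiller").queue()`.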
Command Line Tools
› Use -constraint and -format with condor_q:
% condor_q -constraint 'webuser=="zmiller"'
-- Submitter: bio.cs.wisc.edu : <128.105.147.96:37866> : bio.cs.wisc.edu
 ID         OWNER      SUBMITTED     RUN_TIME ST PRI SIZE CMD
 213503.0   zmiller   10/11 06:00   0+00:00:00 I  0   0.0  hostname
% condor_q -constraint 'webuser=="zmiller"' -format "%i\t" ClusterId -format "%s\n" Cmd
213503  /bin/hostname
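To consume the -format output from a program, the following sketch builds the same condor_q argument list and splits each output line at the tab. The class is hypothetical and uses only the standard library. ProcessBuilder passes each argument verbatim, so the shell's single quotes are not needed. The webuser value is not escaped here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for condor_q -format output.
class CondorQFormat {
    // One line of `-format "%i\t" ClusterId -format "%s\n" Cmd` output.
    static final class Row {
        final int clusterId;
        final String cmd;
        Row(int clusterId, String cmd) { this.clusterId = clusterId; this.cmd = cmd; }
    }

    // The condor_q invocation above as an argument list. "%i\\t" is the two
    // characters \ and t, exactly what the shell passes; condor_q expands them.
    static List<String> command(String webuser) {
        return List.of("condor_q",
                "-constraint", "webuser==\"" + webuser + "\"",
                "-format", "%i\\t", "ClusterId",
                "-format", "%s\\n", "Cmd");
    }

    // Splits each non-empty output line at the first tab.
    static List<Row> parse(String output) {
        List<Row> rows = new ArrayList<>();
        for (String line : output.split("\n")) {
            if (line.isBlank()) continue;
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("unexpected line: " + line);
            }
            rows.add(new Row(Integer.parseInt(parts[0].trim()), parts[1].trim()));
        }
        return rows;
    }
}
```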
Command Line Tools
› condor_wait will watch a job log file and wait for certain jobs (or all of them) to complete:
system("condor_wait job.log");
› You can specify a timeout
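Here is a sketch of driving condor_wait from Java with a hypothetical helper. It makes two assumptions: that condor_wait's -wait option takes the timeout in seconds, and that an optional job id after the log file limits the wait to that job. Check the condor_wait manual page for your version, including what the exit status means on timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper around condor_wait.
class CondorWait {
    // Builds the command line. A null timeout means wait indefinitely;
    // a non-null jobId (e.g. "213503.0") waits for that job only.
    static List<String> command(String logFile, Integer timeoutSeconds, String jobId) {
        List<String> cmd = new ArrayList<>();
        cmd.add("condor_wait");
        if (timeoutSeconds != null) {
            cmd.add("-wait");
            cmd.add(Integer.toString(timeoutSeconds));
        }
        cmd.add(logFile);
        if (jobId != null) {
            cmd.add(jobId);
        }
        return cmd;
    }

    // Runs condor_wait and reports whether it exited with status 0.
    // Assumption: a non-zero status means a timeout or an error.
    static boolean waitFor(List<String> command) throws java.io.IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        return p.waitFor() == 0;
    }
}
```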
Command Line Tools
› condor_q and condor_status also have a -xml option
› So it is relatively simple to build on top of Condor’s command line tools alone, and they can be used from many different languages (C, Perl, Python, PHP, etc.).
› However…
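Here is a sketch of reading -xml output with the JDK's DOM parser. It assumes an element layout for Condor's XML ClassAd format: one <c> element per ad, one <a n="…"> element per attribute, and a typed child such as <s>, <i> or <b v="t"/>. Verify this layout against real condor_q -xml output before relying on it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical parser: one attribute-name -> value map per ClassAd.
class ClassAdXml {
    static List<Map<String, String>> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not try to fetch an external DTD if the output names one.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            List<Map<String, String>> ads = new ArrayList<>();
            NodeList adNodes = doc.getElementsByTagName("c");
            for (int i = 0; i < adNodes.getLength(); i++) {
                Map<String, String> ad = new LinkedHashMap<>();
                NodeList attrs = ((Element) adNodes.item(i)).getElementsByTagName("a");
                for (int j = 0; j < attrs.getLength(); j++) {
                    Element attr = (Element) attrs.item(j);
                    ad.put(attr.getAttribute("n"), valueText(attr));
                }
                ads.add(ad);
            }
            return ads;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse ClassAd XML", e);
        }
    }

    // Text of the first element child, or its v attribute (booleans such as
    // <b v="t"/>) when that child has no text.
    private static String valueText(Element attr) {
        for (Node n = attr.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                Element value = (Element) n;
                String text = value.getTextContent();
                return text.isEmpty() && value.hasAttribute("v") ? value.getAttribute("v") : text;
            }
        }
        return attr.getTextContent();
    }
}
```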
DRMAA
› DRMAA is a GGF-standardized job-submission API
› Has C (and now Java) bindings
› Is not Condor-specific: your app could submit to any job scheduler with minimal changes (probably just linking in a different library)
› SourceForge project
  - http://sourceforge.net/projects/condor-ext
DRMAA
› Easy to use, but
› Unfortunately, the DRMAA API does not support some very important features, such as:
  - Two-phase commit
  - Fault tolerance
  - Transactions
Condor GAHP
› The Condor GAHP is a relatively low-level protocol based on simple ASCII messages through stdin and stdout
› Supports a rich feature set, including two-phase commits, transactions, and optional asynchronous notification of events
› Is available in Condor 6.7.X
Example: GAHP, cont. (S: lines are sent to the GAHP server, R: lines are its replies)
R: $GahpVersion: 1.0.0 Nov 26 2001 NCSA\ CoG\ Gahpd $
S: GRAM_PING 100 vulture.cs.wisc.edu/fork
R: E
S: RESULTS
R: E
S: COMMANDS
R: S COMMANDS GRAM_JOB_CANCEL GRAM_JOB_REQUEST GRAM_JOB_SIGNAL GRAM_JOB_STATUS GRAM_PING INITIALIZE_FROM_FILE QUIT RESULTS VERSION
S: VERSION
R: S $GahpVersion: 1.0.0 Nov 26 2001 NCSA\ CoG\ Gahpd $
S: INITIALIZE_FROM_FILE /tmp/grid_proxy_554523.txt
R: S
S: GRAM_PING 100 vulture.cs.wisc.edu/fork
R: S
S: RESULTS
R: S 0
S: RESULTS
R: S 1
R: 100 0
S: QUIT
R: S
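Here is a small Java sketch of the client side of this exchange, using a hypothetical class and only the standard library. It splits reply lines on unescaped spaces, as the version string NCSA\ CoG\ Gahpd shows. It also reads the RESULTS reply. In the transcript, RESULTS first answers S 0 (nothing pending) and later S 1, followed by 100 0, the result for request 100 sent with GRAM_PING. The sketch assumes, as the transcript suggests, that a reply starting with S means success.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for parsing GAHP reply lines.
class GahpReplies {
    // Splits a line into tokens. Spaces separate tokens unless escaped with
    // a backslash, as in "NCSA\ CoG\ Gahpd"; the backslash itself is dropped.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the escaped character
            } else if (c == ' ') {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Reads the reply to RESULTS: "S <count>" followed by <count> result
    // lines, each returned as its token list (e.g. ["100", "0"]).
    static List<List<String>> readResults(Iterator<String> replies) {
        List<String> header = tokenize(replies.next());
        if (header.size() != 2 || !header.get(0).equals("S")) {
            throw new IllegalStateException("RESULTS failed: " + header);
        }
        int count = Integer.parseInt(header.get(1));
        List<List<String>> results = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            results.add(tokenize(replies.next()));
        }
        return results;
    }
}
```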
Condor Perl Module
› Perl module to parse the “job log file”
› Recommended instead of polling w/
condor_q
› Call-back event model
› (Note: job log can be written in XML)
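The callback model can be sketched in Java as well. This is not the Perl module's interface. It is a hypothetical reader that assumes the classic text job log. In that format, each event header starts with a three-digit code and (cluster.proc.subproc), for example 000 for submit, 001 for execute and 005 for termination, and "..." lines separate events.

```java
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical callback-style reader for a text job log.
class JobLogReader {
    // One event header: code, cluster, proc, and the rest of the header line.
    static final class Event {
        final int code, cluster, proc;
        final String text;
        Event(int code, int cluster, int proc, String text) {
            this.code = code; this.cluster = cluster; this.proc = proc; this.text = text;
        }
    }

    // "<code> (<cluster>.<proc>.<subproc>) <rest of line>"
    private static final Pattern HEADER =
            Pattern.compile("^(\\d{3}) \\((\\d+)\\.(\\d+)\\.(\\d+)\\) (.*)$");

    // Calls onEvent once per event header in the log text. Event body lines
    // and the "..." separators do not match HEADER and are skipped.
    static void read(String log, Consumer<Event> onEvent) {
        for (String line : log.split("\n")) {
            Matcher m = HEADER.matcher(line);
            if (m.matches()) {
                onEvent.accept(new Event(Integer.parseInt(m.group(1)),
                        Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)), m.group(5)));
            }
        }
    }
}
```

Usage: `JobLogReader.read(logText, e -> { if (e.code == 5) System.out.println("job " + e.cluster + "." + e.proc + " terminated"); })`.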
SOAP
› Simple Object Access Protocol
  - Mechanism for doing RPC using XML (typically over HTTP or HTTPS)
  - A World Wide Web Consortium (W3C) standard
› SOAP toolkit: transforms a WSDL description into a client library
Benefits of a Condor SOAP API
› Condor becomes a service
  - Can be accessed with standard web service tools
› Condor is accessible from platforms where its command-line tools are not supported
› Talk to Condor with your favorite language and SOAP toolkit
Condor SOAP API functionality
› Submit jobs
› Retrieve job output
› Remove/hold/release jobs
› Query machine status
› Query job status
Getting machine status via SOAP
[Diagram: your program calls queryStartdAds() through a SOAP library; the call travels as SOAP over HTTP to the condor_collector, which returns the machine list.]
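To show what a SOAP library does underneath, here is a hand-rolled sketch of the queryStartdAds() request as a SOAP 1.1 envelope posted over HTTP. The namespace, the argument name and the empty SOAPAction header are placeholders, not Condor's actual values. A real client should use the code generated from Condor's WSDL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-built SOAP call (normally generated from the WSDL).
class SoapCall {
    // Escapes the five XML special characters; "&" must go first.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // SOAP 1.1 envelope for an RPC-style call with one string argument.
    // namespace and argName are placeholders to be taken from the WSDL.
    static String envelope(String namespace, String operation, String argName, String argValue) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
                + "<soapenv:Body>"
                + "<ns:" + operation + " xmlns:ns=\"" + escapeXml(namespace) + "\">"
                + "<" + argName + ">" + escapeXml(argValue) + "</" + argName + ">"
                + "</ns:" + operation + ">"
                + "</soapenv:Body>"
                + "</soapenv:Envelope>";
    }

    // POSTs the envelope and returns the raw XML reply.
    static String post(String endpoint, String envelope) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        conn.setRequestProperty("SOAPAction", "\"\""); // placeholder action
        try (OutputStream out = conn.getOutputStream()) {
            out.write(envelope.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```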
Let’s get some details…
The API
› The core API, described with WSDL, is designed to be as flexible as possible
  - File transfer is done in chunks
  - Transactions are explicit
› Wrapper libraries aim to make common tasks as simple as possible
  - Currently in Java and C#
  - Expose an object-oriented interface
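Chunked transfer comes down to offset bookkeeping. The following sketch, a hypothetical helper using only the standard library, splits a byte array into fixed-size chunks and reassembles them. Condor's WSDL defines the actual chunk operations, their names and their sizes; they are not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical chunking helper; only the offset arithmetic is illustrated.
class Chunks {
    // Splits data into consecutive pieces of at most chunkSize bytes.
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; ) {
            // end never exceeds data.length, so the arithmetic cannot overflow
            int end = offset + Math.min(chunkSize, data.length - offset);
            chunks.add(Arrays.copyOfRange(data, offset, end));
            offset = end;
        }
        return chunks;
    }

    // Reassembles chunks received in order.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```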
Condor setup
› Start with a working condor_config
› The SOAP interface is off by default
  - Turn it on by adding ENABLE_SOAP=TRUE
› Access to the SOAP interface is denied by default
  - Set ALLOW_SOAP and DENY_SOAP; they work like ALLOW_READ/WRITE/…
  - See section 3.7.4 of the v6.7 manual for a description
  - Example: ALLOW_SOAP=*/*.cs.wisc.edu
› If using HTTP, you must set QUEUE_ALL_USERS_TRUSTED=TRUE
  - (not needed/wanted with HTTPS)
Necessary tools
› You need a SOAP toolkit
  - Apache Axis (Java) - http://ws.apache.org/axis/
  - Microsoft .Net - http://microsoft.com/net/
  - gSOAP (C/C++) - http://gsoap2.sf.net/
  - ZSI (Python) - http://pywebsvcs.sf.net/
  - SOAP::Lite (Perl) - http://soaplite.com/
› All our examples are in Java using Apache Axis
› You need Condor’s WSDL files
  - Find them in lib/webservice/ in your Condor release
› Put the two together to generate a client library
  - $ java org.apache.axis.wsdl.WSDL2Java condorSchedd.wsdl
› Compile that client library
  - $ javac condor/*.java
Helpful tools
› The core API has some complex spots
› A wrapper library is available in Java and C#
  - Makes the API a bit easier to use (e.g. simpler file transfer & job ad submission)
  - Makes the API more OO; no need to remember and pass around transaction ids
› We are going to use the Java wrapper library for our examples
  - You can download it from http://www.cs.wisc.edu/condor/birdbath/birdbath.jar
  - Will be included in Condor release
Submitting a job
› The CLI way…
cp.sub (the explicit bits, which you write):
universe = vanilla
executable = /bin/cp
arguments = cp.sub cp.worked
should_transfer_files = yes
transfer_input_files = cp.sub
when_to_transfer_output = on_exit
queue 1
The implicit bits, which condor_submit fills in for you:
clusterid = X
procid = Y
owner = matt
requirements = Z
$ condor_submit cp.sub
Submitting a job
› The SOAP way…
1. Begin transaction
2. Create cluster
3. Create job
4. Send files
5. Describe job
6. Commit transaction
Repeat steps 2-5 to submit multiple clusters, and steps 3-5 to submit multiple jobs in a single cluster.
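The ordering these steps impose can be modeled with a toy, in-memory class. It is not the Condor or birdbath API, and its names are hypothetical. It only checks that clusters and jobs are created inside an open transaction, and that submitted jobs become visible together at commit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the six-step submission protocol (NOT the real API).
class ToySubmitTransaction {
    private boolean open;
    private int nextCluster = 1;
    private final Map<Integer, Integer> nextProc = new HashMap<>();
    private final List<String> pending = new ArrayList<>();

    void begin() {                        // step 1
        if (open) throw new IllegalStateException("already begun");
        open = true;
    }

    int createCluster() {                 // step 2
        requireOpen();
        int cluster = nextCluster++;
        nextProc.put(cluster, 0);
        return cluster;
    }

    int createJob(int cluster) {          // step 3: proc ids count up from 0
        requireOpen();
        Integer proc = nextProc.get(cluster);
        if (proc == null) throw new IllegalStateException("unknown cluster " + cluster);
        nextProc.put(cluster, proc + 1);
        return proc;
    }

    void submit(int cluster, int job) {   // steps 4 and 5: send files, describe job
        requireOpen();
        Integer next = nextProc.get(cluster);
        if (next == null || job < 0 || job >= next) throw new IllegalStateException("no such job");
        pending.add(cluster + "." + job);
    }

    List<String> commit() {               // step 6: all pending jobs become visible at once
        requireOpen();
        open = false;
        List<String> committed = new ArrayList<>(pending);
        pending.clear();
        return committed;
    }

    private void requireOpen() {
        if (!open) throw new IllegalStateException("transaction not begun");
    }
}
```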
Submission from Java
// Uses java.io.File and the wrapper's Schedd, Transaction and UniverseType classes
Schedd schedd = new Schedd("http://…");
Transaction xact = schedd.createTransaction();
xact.begin(30);                        // 1. Begin transaction
int cluster = xact.createCluster();    // 2. Create cluster
int job = xact.createJob(cluster);     // 3. Create job
File[] files = { new File("cp.sub") };
xact.submit(cluster, job, "owner",     // 4 & 5. Send files & describe job
        UniverseType.VANILLA, "/bin/cp",
        "cp.sub cp.worked", "requirements",
        null, files);
xact.commit();                         // 6. Commit transaction
Submission from Java
// Schedd's location
Schedd schedd = new Schedd("http://…");
Transaction xact = schedd.createTransaction();
xact.begin(30);                 // max time between calls (seconds)
int cluster = xact.createCluster();
int job = xact.createJob(cluster);
File[] files = { new File("cp.sub") };
xact.submit(cluster, job,
        "owner",                // job owner, e.g. "matt"
        UniverseType.VANILLA, "/bin/cp",
        "cp.sub cp.worked",
        "requirements",         // requirements, e.g. "OpSys==\"Linux\""
        null,                   // extra attributes, e.g. Out="stdout.txt" or Err="stderr.txt"
        files);
xact.commit();
Querying jobs
› The CLI way…
$ condor_q
-- Submitter: localhost : <127.0.0.1:1234> : localhost
 ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1.0     matt   10/27 14:45   0+02:46:42 C  0   1.8  sleep 10000
…
42 jobs; 1 idle, 1 running, 1 held, 1 unexpanded
Querying jobs
› The SOAP way from Java…
String[] statusName = { "", "Idle", "Running", "Removed",
        "Completed", "Held" };   // indexed by JobStatus (1-5)
int cluster = 1;
int job = 0;
Schedd schedd = new Schedd("http://…");
ClassAd ad = new ClassAd(schedd.getJobAd(cluster, job));
int status = Integer.valueOf(ad.get("JobStatus"));
System.out.println("Job is " + statusName[status]);
› Also, getJobAds given a constraint, e.g. "Owner==\"matt\""
Retrieving a job
› The CLI way…
› Well, if you are submitting to a local Schedd, the Schedd will have all of a job’s output written back for you
› If you are doing remote submission, you need condor_transfer_data, which takes a constraint and transfers all files in the spool directories of matching jobs
Retrieving a job
› The SOAP way in Java…
int cluster = 1;
int job = 0;
Schedd schedd = new Schedd("http://…");
Transaction xact = schedd.createTransaction();
xact.begin(30);
// Discover available files
FileInfo[] files = xact.listSpool(cluster, job);
for (FileInfo file : files) {
    // Remote file (name and size from the spool) -> local file of the same name
    xact.getFile(cluster, job, file.getName(), file.getSize(),
            new File(file.getName()));
}
xact.commit();
Authentication for SOAP
› Authentication is done via mutual SSL authentication
  - Both the client and server have certificates and identify themselves
› Available in Condor 6.7.20
› It is not always necessary, e.g. in some controlled environments (a portal) where the submitting component is trusted
› A necessity in an open environment: remember that the submit call takes the job’s owner as a parameter
Questions?
Authentication setup
› Create and sign some certificates
› Use OpenSSL to create a CA
  - CA.sh -newca
› Create a server cert and password-less key
  - CA.sh -newreq && CA.sh -sign
  - mv newcert.pem server-cert.pem
  - openssl rsa -in newreq.pem -out server-key.pem
› Create a client cert and key
  - CA.sh -newreq && CA.sh -sign && mv newcert.pem client-cert.pem && mv newreq.pem client-key.pem
Authentication config
› Config options…
  - ENABLE_SOAP_SSL is FALSE by default
  - <SUBSYS>_SOAP_SSL_PORT
    • Set this to a different port for each SUBSYS you want to talk to over SSL; the default is a random port
    • Example: SCHEDD_SOAP_SSL_PORT=1980
  - SOAP_SSL_SERVER_KEYFILE is required and has no default
    • The file containing the server’s certificate AND private key, i.e. “keyfile” after
      cat server-cert.pem server-key.pem > keyfile
Authentication config
› Config options continued…
  - SOAP_SSL_CA_FILE is required
    • The file containing public CA certificates used in signing client certificates, e.g. demoCA/cacert.pem
› All options except SOAP_SSL_PORT have an optional SUBSYS_* version
  - For instance, turn on SSL for everyone except the Collector with
    • ENABLE_SOAP_SSL=TRUE
    • COLLECTOR_ENABLE_SOAP_SSL=FALSE
One last bit of config
› The certificates we generated have a principal name, and the format of such names differs across authentication mechanisms
› Condor maps authenticated names (here, principal names) to canonical names that are independent of the authentication method
› This is done through mapfiles, given by SEC_CANONICAL_MAPFILE and SEC_USER_MAPFILE
› Canonical map:
  - SSL .*emailAddress=(.*) \1
› “SSL” is the authentication method, “.*emailAddress….*” is a pattern to match against authenticated names, and “\1” is the canonical name, in this case the email address taken from the principal
HTTPS with Java
› Set up keys…
  - keytool -import -keystore truststore -trustcacerts -file demoCA/cacert.pem
  - openssl pkcs12 -export -inkey client-key.pem -in client-cert.pem -out keystore
› All the previous code stays the same; just set some properties
  - javax.net.ssl.trustStore, javax.net.ssl.keyStore, javax.net.ssl.keyStoreType, javax.net.ssl.keyStorePassword
  - Example: java -Djavax.net.ssl.trustStore=truststore -Djavax.net.ssl.keyStore=keystore -Djavax.net.ssl.keyStoreType=PKCS12 -Djavax.net.ssl.keyStorePassword=pass
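The -D flags can also be set from code. This must happen before the first HTTPS connection, because the JVM reads these properties when it first creates its default SSL context. The method name is hypothetical; the property names are the standard JSSE ones listed above.

```java
// Hypothetical programmatic equivalent of the -Djavax.net.ssl.* flags.
class SslSetup {
    // Call once, early, before any HTTPS connection is opened.
    static void useClientCertificate(String trustStore, String keyStore, String keyStorePassword) {
        System.setProperty("javax.net.ssl.trustStore", trustStore);
        System.setProperty("javax.net.ssl.keyStore", keyStore);
        System.setProperty("javax.net.ssl.keyStoreType", "PKCS12"); // output of openssl pkcs12 -export
        System.setProperty("javax.net.ssl.keyStorePassword", keyStorePassword);
    }
}
```

Usage: `SslSetup.useClientCertificate("truststore", "keystore", "pass");` before creating the Schedd.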