ILDG Middleware Status
Chip Watson
ILDG-6 Workshop
May 12, 2005
Status: small changes from Dec 2004
Quick review of architecture
Minimal implementation facts
Next steps
Status (quick look)
Only a small amount of middleware work
has been done in the last 6 months
– development of new metadata catalog prototype
at Adelaide based on XML database
– modifications to metadata catalog prototype at
Fermilab to conform to new interface
– small amount of work on replica catalog
prototypes at several sites (JLab, Adelaide, ...)
Architecture remains unchanged
Architecture (review)
Web Services
– Metadata Catalog maps metadata to a
global name
– Replica Catalog maps a global name to one
or more instances
– Storage Resource Manager (optional)
manages a disk, or disk + tape resource
Draft schemas (WSDL) for these services exist
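The two-step lookup described above (metadata to global name, then global name to instances) can be sketched as follows. This is only an illustration: the dictionaries stand in for the real web services, and every name, metadata key and URL here is a hypothetical placeholder, not part of the ILDG WSDL.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a Metadata Catalog:
# global file name -> physics metadata.
METADATA_CATALOG: Dict[str, Dict[str, str]] = {
    "examplecollab/ensemble1/cfg_0010": {"beta": "5.29", "kappa": "0.1350"},
    "examplecollab/ensemble1/cfg_0020": {"beta": "5.29", "kappa": "0.1350"},
    "examplecollab/ensemble2/cfg_0010": {"beta": "5.40", "kappa": "0.1356"},
}

# Hypothetical stand-in for a Replica Catalog:
# global file name -> one or more instances (URLs).
REPLICA_CATALOG: Dict[str, List[str]] = {
    "examplecollab/ensemble1/cfg_0010": [
        "http://files.example.org/ensemble1/cfg_0010",
        "ftp://mirror.example.net/ensemble1/cfg_0010",
    ],
    "examplecollab/ensemble2/cfg_0010": [
        "http://files.example.org/ensemble2/cfg_0010",
    ],
}


def query_metadata_catalog(criteria: Dict[str, str]) -> List[str]:
    """Return the global names whose metadata matches every criterion."""
    return sorted(
        gfn
        for gfn, metadata in METADATA_CATALOG.items()
        if all(metadata.get(key) == value for key, value in criteria.items())
    )


def lookup_replicas(gfn: str) -> List[str]:
    """Return all known instances of a global name (empty if none)."""
    return list(REPLICA_CATALOG.get(gfn, []))


def locate(criteria: Dict[str, str]) -> Dict[str, List[str]]:
    """Chain the two services: metadata -> global names -> instances."""
    return {gfn: lookup_replicas(gfn) for gfn in query_metadata_catalog(criteria)}
```

A call such as `locate({"beta": "5.40"})` first asks the metadata catalog for matching names and then resolves each name through the replica catalog.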
Architecture (review)
File based directories contain...
– Master directory of all collaborations’ MDC, RC
and membership lists, stored as XML files
– Distributed group membership lists (XML)
Initial versions of the schemas (XML) exist
Implementation View
Master Directory<tbd>.xml
contains for each collaboration:
metadata catalog
replica catalog
group membership
MDC for Japan
RC for Japan
Japan group file
UKQCD group file
USQCD group file
subgroup A file
subgroup B file
file X
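As a rough illustration of how a client might read such a master directory, the sketch below parses a small XML file into a per-collaboration table. The element and attribute names (`masterDirectory`, `collaboration`, `metadataCatalog`, `replicaCatalog`, `groupMembership`) and all URLs are assumptions made for this example; the actual ILDG schema may differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical master directory content; not the real ILDG schema.
SAMPLE_MASTER_DIRECTORY = """\
<masterDirectory>
  <collaboration name="examplecollab">
    <metadataCatalog>http://mdc.example.org/service</metadataCatalog>
    <replicaCatalog>http://rc.example.org/service</replicaCatalog>
    <groupMembership>http://www.example.org/groups/examplecollab.xml</groupMembership>
  </collaboration>
  <collaboration name="othercollab">
    <metadataCatalog>http://mdc.example.net/service</metadataCatalog>
    <replicaCatalog>http://rc.example.net/service</replicaCatalog>
    <groupMembership>http://www.example.net/groups/othercollab.xml</groupMembership>
  </collaboration>
</masterDirectory>
"""


def load_master_directory(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Map each collaboration name to its MDC, RC and group file locations."""
    root = ET.fromstring(xml_text)
    directory = {}
    for collab in root.findall("collaboration"):
        directory[collab.get("name")] = {
            "mdc": collab.findtext("metadataCatalog"),
            "rc": collab.findtext("replicaCatalog"),
            "groups": collab.findtext("groupMembership"),
        }
    return directory
```

With this layout a client needs only the one master file to discover every collaboration's services.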
MetaData Catalog
ILDG schema defines only a query interface
– multiple query languages (syntax) allowed for
now (no clear winner yet)
– queries map from physics metadata values to
Global File Name (GFN)
– proposed minor modification can also return the
full physics metadata
Minimal Implementation
Master XML directory to be held at <tbd>.xml
For each collaboration, need at least these:
– MetaDataCatalog (e.g. running at <tbd>)
– trivial Replica Catalog (does 1:1 name mapping)
– standard web or ftp server to serve files
Getting going...
(or, what must a collaboration do?)
First: Deploy a metadata catalog
1. choose an existing prototype & deploy
2. populate the catalog with qcdml v1.1
compliant documents, with ILDG compliant
GFNs (global file names)
Note: names must have collaboration name as part of
the string; this name matches the entry name in the
master directory:
3. request [email protected] to add your MDC to
the master directory on
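The naming rule in step 2 can be checked mechanically before a catalog is populated. The sketch below assumes, purely for illustration, that a GFN is a "/"-separated string and that the collaboration name appears as one of its components; the slides only require that the name be part of the string, so the exact convention is an assumption.

```python
from typing import Iterable, Optional


def collaboration_of(gfn: str, known_collaborations: Iterable[str]) -> Optional[str]:
    """Return the master-directory entry name found in the GFN, or None.

    Assumes (hypothetically) that GFNs are '/'-separated and that the
    collaboration name appears as a whole path component.
    """
    components = [part for part in gfn.split("/") if part]
    for name in known_collaborations:
        if name in components:
            return name
    return None
```

For example, `collaboration_of("examplecollab/ens1/cfg_0010", ["examplecollab"])` yields `"examplecollab"`, while a GFN containing no registered name yields `None` and could be rejected at load time.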
Getting going...
(or, what must a collaboration do?)
Second: Deploy a replica catalog
1. (option 1) write a simple function which maps
your collaboration’s GFN naming convention
into a static URL pointing to the file
(i.e., no database, just string shuffling)
2. (option 2) get / implement a true RC, with
multiple instance tracking (a database)
3. request [email protected] to add your RC to the
master directory on
Third: Serve the files (http, ftp, srm, ...)
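Option 1 above (a replica catalog that is just string shuffling) might look like the following sketch. The base URL, the collaboration name and the assumed GFN layout (collaboration name as a leading path component) are hypothetical; each collaboration would substitute its own convention.

```python
from urllib.parse import quote

# Hypothetical location of the web server that serves the files.
BASE_URL = "http://files.example.org/ildg"


def gfn_to_url(gfn: str, collaboration: str = "examplecollab",
               base_url: str = BASE_URL) -> str:
    """Map a GFN to a static URL with no database lookup.

    Assumes GFNs of the form '<collaboration>/<relative path>'.
    """
    prefix = collaboration + "/"
    if not gfn.startswith(prefix):
        raise ValueError(f"GFN does not belong to {collaboration}: {gfn!r}")
    relative_path = gfn[len(prefix):]
    # Percent-encode unusual characters but keep the '/' separators.
    return f"{base_url}/{quote(relative_path)}"
```

Because the mapping is 1:1, such a catalog can only ever report a single instance per file; tracking copies held elsewhere needs option 2.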
Nice things to also do...
Deploy a real RC, which can track another
collaboration’s copies of your files
Populate a group membership file, to support group
read/write access (otherwise your collaboration is
relegated to “world” status)
Deploy an SRM (with protocol negotiation) and also
at least one file server that supports parallel
streams (gridftp, bbftp, ...) for higher performance
file retrieval
Implement a web interface to your metadata
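To illustrate the group membership item above, the sketch below resolves a group file that may point to subgroup files and then decides whether a user gets "group" or "world" access. The XML layout (`group`, `member`, `subgroup href=...`), the file names and the user identifiers are assumptions for this example, not the ILDG membership schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Set

# Hypothetical group files, keyed by file name.
GROUP_FILES: Dict[str, str] = {
    "examplecollab.xml": (
        "<group><member>alice</member>"
        "<subgroup href='subgroupA.xml'/><subgroup href='subgroupB.xml'/></group>"
    ),
    "subgroupA.xml": "<group><member>bob</member></group>",
    "subgroupB.xml": "<group><member>alice</member><member>dave</member></group>",
}


def resolve_members(file_name: str, files: Dict[str, str] = GROUP_FILES,
                    _seen: Optional[Set[str]] = None) -> Set[str]:
    """Collect members of a group file, following subgroup references."""
    seen = set() if _seen is None else _seen
    if file_name in seen:  # guard against circular subgroup references
        return set()
    seen.add(file_name)
    root = ET.fromstring(files[file_name])
    members = {member.text for member in root.findall("member")}
    for subgroup in root.findall("subgroup"):
        members |= resolve_members(subgroup.get("href"), files, seen)
    return members


def access_level(user: str, group_file: str) -> str:
    """Return 'group' for listed members, otherwise 'world'."""
    return "group" if user in resolve_members(group_file) else "world"
```

A user missing from every file reachable from the collaboration's group file falls back to "world" access, which is the default the slide warns about.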
Near Term Expectations
Adelaide will deploy an MDC and an RC within the next
few months
USQCD will also try to match this within the next 6
months, but is currently distracted with getting
machines into production
Others have not committed yet
Australian ILDG Node
Paul Coddington
School of Computer Science, University of Adelaide
South Australian Partnership for Advanced Computing
[email protected]
May 2005
• A prototype ILDG node has been set up in Australia for
data from the Centre for the Subatomic Structure of
Matter (CSSM).
• We have developed a metadata catalog, replica catalog
and web portal.
• Currently just allows searching, browsing and
downloading of QCDML metadata
– ability to download configuration files will be added later.
• Metadata for around 50 ensembles is currently available.
Metadata Catalog
• Ensemble and configuration QCDML metadata is
generated as XML files which are loaded into Apache
Xindice, an XML database.
• The metadata catalog web service was developed in
Java using Xindice's implementation of the XML:DB API
for XML databases.
– So should work with other XML databases
• It (almost) conforms to the metadata catalog interface
defined by the ILDG Middleware Working Group.
– Added a parameter to specify whether GFNs or XML are returned
• XPath queries are passed directly to the XML database.
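The query path described above (an XPath expression evaluated against stored metadata, with a parameter choosing between GFNs and full XML) can be imitated in a few lines. This sketch uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, in place of Xindice and the XML:DB API; the document layout (`configuration`, `gfn`, `physics`, `beta`) is a simplified, hypothetical stand-in for QCDML.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, heavily simplified metadata documents (not real QCDML).
DOCUMENTS: List[str] = [
    "<configuration><gfn>examplecollab/ens1/cfg_0010</gfn>"
    "<physics><beta>5.29</beta></physics></configuration>",
    "<configuration><gfn>examplecollab/ens1/cfg_0020</gfn>"
    "<physics><beta>5.29</beta></physics></configuration>",
    "<configuration><gfn>examplecollab/ens2/cfg_0010</gfn>"
    "<physics><beta>5.40</beta></physics></configuration>",
]


def query_catalog(xpath: str, return_xml: bool = False) -> List[str]:
    """Evaluate the XPath against each document.

    Returns the GFN of every matching document, or the full XML of each
    match when return_xml is True.
    """
    results = []
    for document in DOCUMENTS:
        root = ET.fromstring(document)
        if root.findall(xpath):
            results.append(document if return_xml else root.findtext("gfn"))
    return results
```

For instance, `query_catalog("./physics[beta='5.29']")` lists the two matching GFNs, and adding `return_xml=True` returns the corresponding documents instead.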
Other Components
• Replica catalog is a web service wrapper around the
Replica Location Service for Globus Toolkit 3.
– Plan to change this to GT4 RLS or something else.
• No mechanism for downloading files yet
– Will initially generate wget script, like Japanese portal.
– Then investigate using SRM.
• Web portal written using JSP.
• All software will be made freely available after code is
cleaned up and documented.
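The planned wget-script download mechanism could be as simple as the sketch below, which turns a list of file URLs into a shell script for the user to run. The URLs are placeholders, and the portal's actual script format is not specified in the slides.

```python
import shlex
from typing import Iterable


def make_wget_script(urls: Iterable[str]) -> str:
    """Build a shell script that downloads each URL with wget."""
    lines = ["#!/bin/sh", "# Download the selected configuration files."]
    for url in urls:
        # shlex.quote protects against spaces or shell metacharacters in URLs.
        lines.append("wget " + shlex.quote(url))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_wget_script([
        "http://files.example.org/ildg/ens1/cfg_0010.lime",
        "http://files.example.org/ildg/ens1/cfg_0020.lime",
    ]), end="")
```

Generating a script keeps the portal itself out of the data path: the user's own machine fetches the files directly from the file server.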
Middleware Working Group
Near Term Task List
Approve minor changes to MDC interface
Decide on the URL for, and deploy:
master directory file
master membership file
Collect official CA certificates from all collaborations
and post at for all to easily retrieve
(for configuring servers for strongly authenticated operations)
Most Significant
Get data into ILDG compliant format
– create or automate creation of metadata
compliant with qcdml v1.1
– write files in ILDG format (or write translation
program for on-the-fly translation)
Will LQCD application developers do this,
or will manpower need to be found for translation?
Get the MDC operational and populated
(other tasks are comparatively easy)
Other Challenges
Manpower to implement a nice user
interface for browsing, and optionally
retrieving files
(once per collaboration, or shared, even
hosted at ?)
Manpower to write some simple command
line client tools to be used in workflow
Goal of reaching an operational status by
June 2006 is still feasible!