SAM-Grid Middleware - Rod Walker,ICL.  SAM.  JIM.  RunJob.  Conclusions. Rod Walker IC 13th March 2002 http://d0db.fnal.gov/sam.


SAM-Grid Middleware
Rod Walker, ICL
• SAM
• JIM
• RunJob
• Conclusions
Rod Walker IC 13th March 2002
http://d0db.fnal.gov/sam
SAM stands for “Sequential Access to
Data via Metadata”.
Access is sequential within files; the order of the files isn’t
important, e.g. HEP data.
History of SAM
Project started in 1997 by the FNAL Computing Division (not
just physicists).
Meant for FNAL experiments, and recently taken up by
CDF.
So far ~20 FTE years – a lot of effort.
State of the art in Data Management
No-one else has tried to deliver TBs of user-selected data on demand.
Global file routing
• Many remote stations want files
– SAM allowed free-for-all to gridftp server.
– MSS access only from FNAL site, cache on private network,...
• Needed control and routing
• Solution: All sites can route files, e.g.
– Get fnal files from fnal-router
– route=fnal.gov::nijmegen and nijmegen station has
route=fnal.gov::fnal-router
• Janet - Geant – Esnet – FNAL, 155Mbit bottleneck.
• Janet - Geant – Surfnet – FNAL, Gbit(?)
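The hop-by-hop routing described on this slide can be sketched as a small lookup. This is only an illustration, not SAM's implementation: the `route=<domain>::<next hop>` string format is taken from the slide, while the station name `remote-station`, the table `STATION_ROUTES`, and both function names are hypothetical.

```python
def parse_route(entry):
    """Split 'route=<domain>::<next hop>' into (domain, next_hop)."""
    key, _, value = entry.partition("=")
    if key.strip() != "route" or "::" not in value:
        raise ValueError(f"not a route entry: {entry!r}")
    domain, next_hop = value.split("::", 1)
    return domain.strip(), next_hop.strip()


def resolve_path(station_routes, station, domain, max_hops=8):
    """Follow route entries from `station` until no further hop is configured.

    Returns the list of stations a request for files from `domain`
    passes through, starting with the requesting station.
    """
    path = [station]
    while True:
        hops = dict(parse_route(e) for e in station_routes.get(path[-1], []))
        next_hop = hops.get(domain)
        if next_hop is None:
            return path
        if next_hop in path or len(path) > max_hops:
            raise ValueError(f"routing loop or too many hops: {path + [next_hop]}")
        path.append(next_hop)


# Hypothetical per-station configuration; only the two route strings
# come from the slide, the requesting station name is made up.
STATION_ROUTES = {
    "remote-station": ["route=fnal.gov::nijmegen"],
    "nijmegen": ["route=fnal.gov::fnal-router"],
}

print(resolve_path(STATION_ROUTES, "remote-station", "fnal.gov"))
```

Under these assumptions the request is forwarded through nijmegen to fnal-router, and the files travel back along the same chain in reverse.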
SAM Status
• Middleware Development
– Global routing.
– Diverse deployments, e.g. private network, firewall, shared vs local disk cache.
– CDF deployment – GridPP
– Bug fixes.
– GridFTP and Authentication – GridPP
• Outlook
– Decreasing development. FNAL CD support for RunII
JIM history
• Purpose: to build on SAM’s data handling, to create a real grid.
• Job definition & management
• Information & Monitoring
• Novel concepts
• Already have data handling (DH) system.
• ups/upd packaging and deployment.
• rpm functionality plus multi-platform, tailoring.
• Little dependence on native installation, e.g. python v2.1f
• Hugely simplified deployment.
• Use Condor as resource broker.
JIM components
• User Interface
• Job Definition Language (JDL) based on ClassAds
• Resource broker (RB) reduced to making the MMS ranking function
• Static & dynamic constraints: OS, code version, free CPU, …
• Plus external function to query DH system.
• Collaboration with Wisconsin.
• Choose gatekeeper, use external function, separate submission server from negotiator.
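The matchmaking idea above (filter on static and dynamic constraints, then rank with the help of an external data-handling query) can be sketched in plain Python. This does not use the Condor ClassAd language or any Condor API; the attribute names, the site names, the weighting in `rank`, and the `cached_fraction` callback standing in for the DH query are all assumptions made for illustration.

```python
def matches(job, resource):
    """Static and dynamic constraints: OS, code version, free CPUs."""
    return (
        resource["os"] == job["os"]
        and job["code_version"] in resource["code_versions"]
        and resource["free_cpus"] >= job["min_free_cpus"]
    )


def rank(job, resource, cached_fraction):
    """Prefer sites that already hold the data; break ties on free CPUs.

    `cached_fraction(site, dataset)` is a stand-in for the external
    function that queries the data-handling system.
    """
    return 100 * cached_fraction(resource["name"], job["dataset"]) + resource["free_cpus"]


def select_resource(job, resources, cached_fraction):
    """Return the name of the best matching resource, or None."""
    candidates = [r for r in resources if matches(job, r)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: rank(job, r, cached_fraction))["name"]


# Hypothetical resources and cache contents.
RESOURCES = [
    {"name": "site-a", "os": "linux", "code_versions": {"v1"}, "free_cpus": 10},
    {"name": "site-b", "os": "linux", "code_versions": {"v1"}, "free_cpus": 2},
    {"name": "site-c", "os": "irix", "code_versions": {"v1"}, "free_cpus": 50},
]
CACHE = {("site-a", "dataset-x"): 0.2, ("site-b", "dataset-x"): 0.9}
JOB = {"os": "linux", "code_version": "v1", "min_free_cpus": 1, "dataset": "dataset-x"}

best = select_resource(JOB, RESOURCES, lambda site, ds: CACHE.get((site, ds), 0.0))
print(best)
```

With this made-up weighting, site-b wins despite having fewer free CPUs, because most of the requested dataset is already cached there.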
JIM components
• Information & Monitoring.
• Currently: grid sensors > LDAP > MDS > PHP
• Developing: grid sensors > XML > native DB > PHP, other.
• Reliability, flexibility, persistency.
• Same model works for grid system book-keeping and user level monitoring.
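The "grid sensors > XML > native DB" chain under development can be sketched as follows. This is a rough illustration under stated assumptions: the XML layout, the table schema, and all function names are invented here, and an in-memory SQLite database stands in for whatever database JIM actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET


def sensor_report(site, free_cpus, running_jobs):
    """A grid sensor publishes its measurements as a small XML document."""
    root = ET.Element("site", name=site)
    ET.SubElement(root, "free_cpus").text = str(free_cpus)
    ET.SubElement(root, "running_jobs").text = str(running_jobs)
    return ET.tostring(root, encoding="unicode")


def open_store():
    """Persistent store for reports (in-memory SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "site TEXT, free_cpus INTEGER, running_jobs INTEGER)"
    )
    return conn


def store_report(conn, xml_text):
    """Parse one XML report and append it to the store."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO reports (site, free_cpus, running_jobs) VALUES (?, ?, ?)",
        (root.get("name"), int(root.findtext("free_cpus")), int(root.findtext("running_jobs"))),
    )


def latest_status(conn):
    """Most recent report per site, as a web front end might display it."""
    rows = conn.execute(
        "SELECT site, free_cpus, running_jobs FROM reports "
        "WHERE id IN (SELECT MAX(id) FROM reports GROUP BY site) ORDER BY site"
    )
    return rows.fetchall()
```

Because every report is kept, the same store can serve both history (book-keeping) and the latest-status view used for monitoring.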
[Diagram: Information Flow. Components shown: User Interface, JDL, Parser, ClassAd, External Code, Condor Schedd, Condor-G, Condor Negotiator, Condor Collector, Condor Grid Manager, GRAM, Gatekeeper, Batch System, Grid Sensors, Compute Resource, Execution Site, Information and Monitoring; labels Cin and Cout.]
RunJob
• Vital tool for d0 MC productions on farms.
• Chains, steers and parallelizes d0 executables. Creates metadata. Uses SAM to store to MSS.
• Now interfaced to SAM for input, and can handle real data and any d0 executables.
• Will be used for skimming, re-processing datasets, and user analysis.
• Fully automate monitoring, checking and storage.
• Work underway by UK.
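The "chains, parallelizes, creates metadata" pattern can be illustrated with a short sketch. It is not RunJob's code or interface: the stage names, the string-transforming stand-ins for executables, and the metadata fields are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chain(steps, input_name):
    """Run each stage on the previous stage's output and record metadata."""
    data = input_name
    history = []
    for name, step in steps:
        data = step(data)
        history.append(name)
    return {"input": input_name, "output": data, "steps": history}


def run_parallel(steps, inputs, workers=4):
    """Run the same chain over many inputs; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: run_chain(steps, item), inputs))


# Hypothetical stages: each one just tags the name of the file it "produced".
STEPS = [
    ("stage1", lambda name: name + ".stage1"),
    ("stage2", lambda name: name + ".stage2"),
]

for record in run_parallel(STEPS, ["run1", "run2"]):
    print(record)
```

In a real production the metadata records would be what gets declared to the data-handling system when the output is stored.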
RunJob status
• Maintenance & development of RunJob, and interface to SAM-Grid, entirely by UK.
• CMS using branch of RunJob for production.
• Dave Evans and Greg Graham collaborating on merging branches.
• Goal: Single package with EDG and SAM-Grid interfaces.
• RunJob “server” or job-manager.
SAM-Grid Logistics
[Diagram: SAM-Grid architecture. Components shown: User Interface, Submission, Global Job Queue, Resource Selector, Grid Client, Match Making, Info Gatherer, Info Collector, Global DH Services, SAM Naming Server, SAM Log Server, Resource Optimizer, SAM DB Server, RC, MetaData Catalog, Bookkeeping Service, MSS, Cluster, Data Handling, Local Job Handling, SAM Station (+other servs), SAM Stager(s), Cache, Grid Gateway, Local Job Handler (CAF, RunJob, Vanilla, ...), JIM Advertise, Dist.FS, Worker Nodes, AAA, Info Manager, MDS, Web Serv, Info Providers, Grid Monitoring, XML DB server, Site Conf., Glob/Loc JID map, User Tools, Site.]
Conclusions
o Core SAM supported by FNAL CD
o Operational support via software shifts.
o UK currently contributes 2 experts on shift.
o JIM post-development support,
o bug fixing, deployment issues (like SAM).
o will need software support shifts.
o RunJob is and will be UK supported.
o Expanding functionality – analysis, reprocessing.
o Increasing deployment – d0 sites, CMS.
o On target for end-March deliverable,
and production Grid in April.
JIM V1: Package dependencies
samgrid
jim_broker_client
jim_client
sam_common
xml_meta_configurator
sam_config
server_run
jim_broker
jim_info_providers
jim_advertise
galax
orbacus
jim_jobmanagers
globus
jim_www
jim_sandbox