Windows Implementation of LHCb Experiment
Workload Management System DIRAC
Ying Ying Li
LHCb is one of the four main high energy physics experiments at the Large Hadron Collider (LHC) at CERN, Geneva. LHCb is designed to investigate the matter-antimatter asymmetries seen in the Universe today, concentrating on studies of particles containing a b quark. Once it starts operation in 2007/8, LHCb will need to process data volumes of the order of petabytes per year, requiring tens of thousands of CPUs. To achieve this, a workload management system, DIRAC, has been developed in Python, allowing coordinated use of globally distributed computing (Grid) resources. DIRAC currently coordinates LHCb jobs running on 6000+ CPUs shared with other experiments, distributed among 80+ sites across 4 continents. DIRAC has demonstrated its capabilities during a series of data challenges held since 2002, with a current record of 10,000+ jobs running simultaneously across the Grid. Most of the LHCb data-processing applications are tested under both Windows and Linux, but the production system has previously been deployed only on Linux platforms. This project will allow a significant increase in the resources available to LHCb by extending the DIRAC system to also use Windows machines.
Create Job

Users can create jobs using a Python API or can directly write scripts in DIRAC's Job Definition Language (JDL). In both cases, the user specifies the application to be run, the input data (if required), and any precompiled libraries. Applications developed by LHCb can be combined to form various types of jobs, ranging from production jobs (simulation + digitisation + reconstruction) to physics analysis.

Python API:

    import DIRAC
    from DIRAC.Client.Dirac import *

    dirac = Dirac()
    job = Job()
    job.setApplication('DaVinci', 'v12r15')
    job.setInputSandbox(['DaVinci.opts'])
    job.setInputData(['LFN:/lhcb/production/DC04/v2/00980000/DST/Presel_00980000_00001212.dst'])
    job.setOutputSandbox(['DaVinci_v12r15.log'])
    dirac.submit(job)

JDL:

    SoftwarePackages =
    {
        "DaVinci.v12r15"
    };
    InputSandbox =
    {
        "DaVinci.opts"
    };
    InputData =
    {
        "LFN:/lhcb/production/DC04/v2/00980000/DST/Presel_00980000_00001212.dst"
    };
    JobName       = "DaVinci_1";
    Owner         = "yingying";
    StdOutput     = "std.out";
    StdError      = "std.err";
    OutputSandbox =
    {
        "std.out",
        "std.err",
        "DaVinci_v12r15.log"
    };
    JobType = "user";
Submit Job

Jobs are submitted via DISET, the DIRAC security module built from OpenSSL tools and a modified version of pyOpenSSL. Authorisation makes use of certificate-based authentication (an illustrative sketch of this style of authentication follows the Monitoring note below). Input files are uploaded to the sandbox service on the DIRAC server.

Monitoring

Users are able to monitor job progress from the monitoring web page:
http://lhcb.pic.es/DIRAC/Monitoring/Analysis
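The DISET implementation itself is not reproduced here, but the following minimal Python sketch illustrates certificate-based authentication of the kind described, using the standard pyOpenSSL API. The host name, port number, and certificate file paths are assumptions for illustration, not actual DIRAC service settings.

    import socket
    from OpenSSL import SSL

    # Illustrative only: endpoint and certificate paths are assumptions,
    # not the actual DISET service or its configuration.
    DIRAC_HOST, DIRAC_PORT = "dirac.example.org", 9135

    def verify_peer(conn, cert, errnum, errdepth, ok):
        # Accept the peer only if OpenSSL's certificate chain check succeeded.
        return bool(ok)

    ctx = SSL.Context(SSL.TLSv1_METHOD)
    ctx.use_certificate_file("usercert.pem")    # user's Grid certificate
    ctx.use_privatekey_file("userkey.pem")      # matching private key
    ctx.load_verify_locations("ca-bundle.pem")  # trusted certificate authorities
    ctx.set_verify(SSL.VERIFY_PEER | SSL.VERIFY_FAIL_IF_NO_PEER_CERT, verify_peer)

    conn = SSL.Connection(ctx, socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    conn.connect((DIRAC_HOST, DIRAC_PORT))
    conn.do_handshake()
    print("Authenticated peer:", conn.get_peer_certificate().get_subject())
    conn.close()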
DIRAC can be tailored to allow running of any type of application. The important applications for LHCb are based on a C++ framework called Gaudi:

GAUSS – Monte Carlo generator for simulation of particle collisions in the detector.
Boole – Produces the detector response to GAUSS 'hits'.
Brunel – Reconstruction of events from Boole/detector.
DaVinci – Physics analysis (C++).
Bender – Physics analysis (Python, using bindings to C++).

[Diagram: DIRAC Agents linking the LHC Computing Grid, clusters, and standalone desktops and laptops]
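Gaudi applications such as DaVinci are steered by job-options files like the DaVinci.opts supplied in the input sandbox of the Create Job example. A minimal sketch of such a file is shown below; the algorithm name and property values are hypothetical, not taken from an actual LHCb configuration.

    // DaVinci.opts -- illustrative sketch; "PreselBd2DD" is a hypothetical algorithm
    ApplicationMgr.TopAlg   = { "PreselBd2DD" };  // algorithms to run on each event
    ApplicationMgr.EvtMax   = 500;                // number of events to process
    EventSelector.PrintFreq = 100;                // progress printout frequency
    HistogramPersistencySvc.OutputFile = "DaVinci_histos.root";  // histogram output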
Agent

Once a job reaches the DIRAC server it is checked for requirements placed by the owner, and waits for a suitably matched Agent from a free resource. DIRAC Agents act to link the distributed resources together. When a resource is free to process jobs it sends out a locally configured Agent, with the specifications of the resource, to request jobs from the central server. After a suitable job and Agent are matched (a simplified sketch of this cycle follows the list):

• The Agent retrieves the job's JDL and sandbox, wraps the job in a Python script, and reports back.
• If the resource is not a standalone CPU, the resource backend (LCG, Windows Compute Cluster, Condor etc.) is checked, and the wrapper is submitted accordingly.
• Any required application is downloaded and installed when necessary.
• Any required Grid data is downloaded using the GridFTP protocol and the LFC (LCG File Catalogue), for example from the CERN CASTOR storage system.
• The job is run, and progress is reported.
• Any requested data transfers are performed.
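As a summary of the steps above, the following self-contained Python sketch shows the shape of an Agent pull cycle. The helper functions and job fields are hypothetical placeholders; the real Agent communicates through DISET and uses GridFTP/LFC for data access.

    import subprocess

    # All names below (request_job, report_status, the job dictionary fields)
    # are hypothetical placeholders, not the actual DIRAC Agent code.

    def request_job(resource_spec):
        # Placeholder for the pull request to the central matcher service;
        # here a tiny job description is fabricated for illustration.
        return {"application": ("DaVinci", "v12r15"),
                "wrapper": "jobWrapper.py",
                "input_data": ["LFN:/lhcb/production/DC04/..."]}

    def report_status(job, status):
        # Placeholder for the report back to the DIRAC monitoring service.
        print("job status:", status)

    def run_agent_cycle(resource_spec):
        job = request_job(resource_spec)      # 1. pull a job matching this resource
        if job is None:
            return
        report_status(job, "Matched")         # 2. JDL + sandbox retrieved, job wrapped

        # 3. application installation and 4. Grid data download (GridFTP + LFC)
        #    would take place here -- omitted from this sketch.

        report_status(job, "Running")         # 5. run the wrapper and report progress
        rc = subprocess.call(["python", job["wrapper"]])
        report_status(job, "Done" if rc == 0 else "Failed")
        # 6. any requested output data transfers would follow here.

    if __name__ == "__main__":
        run_agent_cycle({"site": "Cambridge", "platform": "Windows"})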
The process of porting DIRAC to Windows has involved work in several areas, including automated installation of DIRAC, DISET (the security module), automated LHCb application download, installation and running, and secure data transfer with .NET GridFTP. The result is a fully operational DIRAC system that is easily deployable in a Windows environment, and that allows an authorised user to submit jobs and to offer the CPU as an available resource to the LHCb experiment alongside Linux resources.

The work described has been developed on a Windows Compute Cluster consisting of four Shuttle SN95G5 boxes running Windows Server 2003 Compute Cluster Edition software. This has also assisted in the extension of DIRAC's Compute Cluster backend computing element module. Tests have also been made on a Windows XP laptop, which demonstrate the flexibility and ease of deployment. The system has been deployed on small clusters at Cambridge and Bristol, and on a larger cluster (~100 CPUs) at Oxford. This project demonstrates the platform independence of DIRAC and its potential. The DIRAC system has been used successfully with a subset of the LHCb applications. Current work focuses on deploying and testing the full set of LHCb applications under Windows to allow the running of production jobs.
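To illustrate the Compute Cluster backend extension mentioned above, the sketch below dispatches a job wrapper either directly (standalone Windows machine) or through the Windows Server 2003 Compute Cluster scheduler. The function name and the scheduler invocation are assumptions for illustration, not the actual DIRAC backend module.

    import subprocess

    def submit_wrapper(wrapper_script, backend):
        # Illustrative dispatch of a DIRAC job wrapper to the local backend.
        if backend == "Standalone":
            # Standalone Windows desktop/laptop: run the wrapper directly.
            return subprocess.call(["python", wrapper_script])
        elif backend == "WindowsComputeCluster":
            # Windows Server 2003 Compute Cluster Edition provides a 'job' CLI
            # (assumed available on the PATH); exact options are not shown here.
            return subprocess.call(["job", "submit", "python " + wrapper_script])
        else:
            raise ValueError("unsupported backend: " + backend)

    # Example (hypothetical path):
    # submit_wrapper("C:\\DIRAC\\work\\jobWrapper.py", "WindowsComputeCluster")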