Managing distributed computing resources with DIRAC
A. Tsaregorodtsev, CPPM-IN2P3-CNRS, Marseille
12-17 September 2011, NEC’11, Varna
Outline
DIRAC Overview
Main subsystems
Workload Management
Request Management
Transformation Management
Data Management
Use in LHCb and other experiments
DIRAC as a service
Conclusion
Introduction
DIRAC is, first of all, a framework for building distributed computing systems
Supporting Service Oriented Architectures
GSI compliant secure client/service protocol
• Fine grained service access rules
Hierarchical Configuration Service for bootstrapping distributed services and agents
This framework is used to build all the DIRAC systems:
Workload Management
• Based on Pilot Job paradigm
Production Management
Data Management
etc
[Diagram: DIRAC Workload Management architecture. Production Manager and Physicist User jobs enter the central Task Queue; the Matcher Service hands payloads to Pilot Jobs deployed by dedicated Pilot Directors for the EGI/WLCG grid (EGEE), the NDG grid, the GISELA grid (EELA), and directly to CREAM CEs.]
User credentials management
The WMS with Pilot Jobs requires a strict user proxy management system
Jobs are submitted to the DIRAC Central Task Queue with the credentials of their owner (VOMS proxy)
Pilot Jobs are submitted to a Grid WMS with the credentials of a user with a special Pilot role
The Pilot Job fetches the user job and the job owner's proxy
The User Job is executed with its owner's proxy, used to access SEs, catalogs, etc (a minimal sketch of this flow follows below)
The DIRAC Proxy Manager service provides the necessary functionality
Proxy storage and renewal
Possibility to outsource the proxy renewal to the MyProxy server
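To make this flow concrete, the following is a minimal sketch of the pilot side in Python. The matcher and proxy-manager clients and their method names are hypothetical illustrations of the mechanism, not the actual DIRAC API; only the X509_USER_PROXY convention is standard.

import os
import subprocess

def run_matched_payload(matcher, proxy_manager, work_dir="/tmp/pilot"):
    # Fetch a waiting user job matched to this pilot's resources.
    job = matcher.request_job()
    # Download the job owner's proxy (hypothetical client call).
    owner_proxy = proxy_manager.download_proxy(job.owner_dn, job.owner_group)
    proxy_path = os.path.join(work_dir, "owner.proxy")
    with open(proxy_path, "w") as f:
        f.write(owner_proxy)
    os.chmod(proxy_path, 0o600)  # proxies must be readable only by the owner
    # Grid tools pick up the proxy via the standard X509_USER_PROXY variable,
    # so the payload accesses SEs and catalogs with its owner's credentials.
    env = dict(os.environ, X509_USER_PROXY=proxy_path)
    subprocess.run([job.executable] + list(job.args), env=env, check=True)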
Direct submission to CEs
Using the gLite WMS now just as a pilot deployment mechanism
Limited use of its brokering features
• For jobs with input data the destination site is already chosen
Have to use multiple Resource Brokers because of scalability problems
DIRAC supports direct submission to CEs
CREAM CEs
Can apply individual site policies
Direct measurement of the site state by watching the pilot status information
• The site chooses how much load it can take (Pull vs Push paradigm; see the sketch below)
This is a general trend
All the LHC experiments have declared that they will eventually abandon the gLite WMS
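The following is a minimal sketch of the pull paradigm in Python: a per-site director submits pilots only for real demand and up to a site-defined cap. The TaskQueue and CE objects and their methods are hypothetical illustrations, not the DIRAC interfaces.

import time

class PilotDirector:
    """Sketch of a per-site pilot director implementing a pull policy."""

    def __init__(self, task_queue, ce, max_pending=50):
        self.task_queue = task_queue    # central queue of waiting user jobs
        self.ce = ce                    # front-end to one site's CREAM CE
        self.max_pending = max_pending  # site policy: cap on pending pilots

    def cycle(self):
        waiting = self.task_queue.count_waiting(site=self.ce.site)
        pending = self.ce.count_pending_pilots()
        # Submit pilots only for real demand and within the site's cap,
        # so the site effectively pulls work at the rate it can absorb.
        for _ in range(max(0, min(waiting, self.max_pending - pending))):
            self.ce.submit_pilot()

    def run(self, period=60):
        while True:
            self.cycle()
            time.sleep(period)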
DIRAC sites
Dedicated Pilot Director per site or group of sites
On-site Director
• Site managers have full control of the LHCb payloads
Off-site Director
• The site delegates control to the central service
• The site must only define a dedicated local user account
• Payloads are submitted through an SSH tunnel (a minimal sketch follows below)
In both cases the payload is executed with its owner's credentials
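A minimal sketch of the SSH-based submission in Python, assuming a dedicated account on the site side; the host, account, and script path are hypothetical placeholders.

import subprocess

def submit_pilot_over_ssh(host="cluster.example-site.org",
                          account="diracpilot",
                          pilot_script="/opt/dirac/pilot.sh"):
    """Start a pilot on a remote cluster through an SSH tunnel, using
    only a dedicated local user account defined by the site."""
    # The remote shell detaches the pilot so the SSH session can return.
    remote_cmd = f"nohup {pilot_script} > /dev/null 2>&1 &"
    result = subprocess.run(["ssh", f"{account}@{host}", remote_cmd],
                            capture_output=True, text=True, timeout=60)
    if result.returncode != 0:
        raise RuntimeError(f"Pilot submission failed: {result.stderr}")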
DIRAC Sites
Several DIRAC sites in production in LHCb
E.g. Yandex
• 1800 cores
• Second largest MC production site
An interesting possibility for small user communities or infrastructures, e.g.
• contributing local clusters
• building regional or university grids
WMS performance
Up to 35K concurrent jobs in ~120 distinct sites
Limited by the resources available to LHCb
10 mid-range servers hosting DIRAC central services
Further optimizations to increase the capacity are possible
• Hardware and database optimizations, service load balancing, etc
Belle (KEK) use of the Amazon EC2
VM scheduler developed for the Belle MC production system
Dynamic VM spawning taking spot prices and Task Queue state into account (a minimal sketch of the decision follows below)
Thomas Kuhr, Belle
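As an illustration of such a scheduling decision, here is a minimal sketch in Python; the thresholds and helper names are hypothetical, not taken from the Belle system.

def vms_to_spawn(waiting_jobs, running_vms, spot_price,
                 max_price=0.10, jobs_per_vm=8, max_vms=100):
    """Decide how many EC2 spot instances to request, given the current
    Task Queue backlog and the spot price (all parameters hypothetical)."""
    if spot_price > max_price:
        return 0  # spot price too high: do not spawn anything now
    demand = -(-waiting_jobs // jobs_per_vm)  # ceiling division
    return max(0, min(demand - running_vms, max_vms - running_vms))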
Belle Use of the Amazon EC2
Various computing resources combined in a single production system
• KEK cluster
• LCG grid sites
• Amazon EC2
Common monitoring, accounting, etc
Thomas Kuhr, Belle II
Belle II
Starting in 2015 after the KEK upgrade
• 50 ab⁻¹ by 2020
Computing model
• Raw data storage and processing
• MC production and ntuple production
• Ntuple analysis
• Data rate: 1.8 GB/s (high-rate scenario)
Using the KEK computing center, grid and cloud resources
The Belle II distributed computing system is based on DIRAC
Thomas Kuhr, Belle II
Support for MPI Jobs
MPI Service developed for applications in the GISELA Grid
• Astrophysics, BioMed, Seismology applications
No special MPI support on sites is required
• MPI software is installed by the Pilot Jobs
MPI ring usage optimization (a minimal sketch follows below)
• Ring reuse for multiple jobs, lowering the load on the gLite WMS
• Variable ring sizes for different jobs
Possible usage for HEP applications: Proof on Demand dynamic sessions
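A minimal sketch of the ring-reuse idea in Python: one allocated ring serves several consecutive MPI jobs instead of a single one. The task-queue client and job attributes are hypothetical illustrations.

import subprocess

def serve_mpi_ring(task_queue, hostfile, ring_size, max_jobs=10):
    """Reuse an established MPI ring for up to max_jobs matching jobs."""
    for _ in range(max_jobs):
        # Ask only for MPI jobs that fit into the already allocated ring.
        job = task_queue.fetch_mpi_job(max_procs=ring_size)
        if job is None:
            break  # no more suitable work: release the ring
        subprocess.run(["mpirun", "-np", str(job.n_procs),
                        "--hostfile", hostfile,
                        job.executable] + list(job.args),
                       check=True)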
Coping with failures
Problem: distributed resources and services are unreliable
• Software bugs, misconfiguration
• Hardware failures
• Human errors
Solution: redundancy and asynchronous operations
DIRAC services are redundant
• Geographically: Configuration, Request Management
• Several instances for any service
Request Management system
A Request Management System (RMS) to accept and asynchronously execute any kind of operation that can fail (a minimal sketch of the retry pattern follows below)
Requests are collected by RMS instances on VO-boxes at the 7 Tier-1 sites
• Data upload and registration
• Job status and parameter reports
• Extra redundancy in VO-box availability
Requests are forwarded to the central Request Database
• For keeping track of the pending requests
• For efficient bulk request execution
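A minimal sketch of the retry pattern in Python; the Request record and handler registry are hypothetical illustrations, not the DIRAC implementation.

from dataclasses import dataclass, field

@dataclass
class Request:
    """A failable operation recorded for asynchronous execution."""
    operation: str        # e.g. a data upload or a job status report
    arguments: dict = field(default_factory=dict)
    attempts: int = 0
    done: bool = False

def execute_pending(requests, handlers, max_attempts=10):
    """One agent cycle: retry every pending request, keep failures queued."""
    for req in requests:
        if req.done or req.attempts >= max_attempts:
            continue
        req.attempts += 1
        try:
            handlers[req.operation](**req.arguments)
            req.done = True
        except Exception:
            pass  # left in the queue, retried on the next cycle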
DIRAC Transformation Management
Data-driven payload generation based on templates (a minimal sketch follows below)
Generating data processing and replication tasks
LHCb-specific templates and catalogs
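A minimal sketch of data-driven task generation in Python: newly registered files that match a transformation's input filter are grouped into tasks. The names and the example LFNs are hypothetical.

def generate_tasks(new_files, input_filter, group_size=10):
    """Group freshly registered files into processing or replication tasks."""
    selected = [f for f in new_files if input_filter(f)]
    return [selected[i:i + group_size]
            for i in range(0, len(selected), group_size)]

# Example: one task per 10 newly registered raw files.
tasks = generate_tasks(
    new_files=["/lhcb/data/run1000_001.raw", "/lhcb/data/run1000_002.raw"],
    input_filter=lambda lfn: lfn.endswith(".raw"),
)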
Data Management
Based on the Request Management System
• Asynchronous data operations: transfers, registration, removal
Two complementary replication mechanisms
Transfer Agent
• User data
• Public network
FTS service
• Production data
• Private FTS OPN network
• Smart pluggable replication strategies (a minimal sketch follows below)
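A minimal sketch of the pluggable-strategy idea in Python: strategies register under a name and the planner picks one per replication request. The strategy names and SE names are hypothetical illustrations.

REPLICATION_STRATEGIES = {}

def strategy(name):
    """Decorator registering a replication strategy plugin by name."""
    def register(func):
        REPLICATION_STRATEGIES[name] = func
        return func
    return register

@strategy("broadcast")
def broadcast(source_se, target_ses):
    # Replicate from the source SE directly to every target SE.
    return [(source_se, target) for target in target_ses]

@strategy("chain")
def chain(source_se, target_ses):
    # Replicate hop by hop: each new replica feeds the next transfer.
    hops, current = [], source_se
    for target in target_ses:
        hops.append((current, target))
        current = target
    return hops

# Example: plan production-data transfers to three Tier-1 SEs.
plan = REPLICATION_STRATEGIES["chain"]("CERN-RAW", ["CNAF", "GRIDKA", "IN2P3"])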
Transfer accounting (LHCb)
[Accounting plots not reproduced in the transcript.]
ILC using DIRAC
ILC CERN group
• 2M jobs run in the first year, instead of the 20K planned initially
• Using the DIRAC Workload Management and Transformation systems
The DIRAC FileCatalog was developed for ILC
• More efficient than the LFC for common queries
• Includes user metadata natively (a minimal query sketch follows below)
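A minimal sketch of what native metadata support buys: a single call selects files by user metadata, instead of a separate metadata database combined with LFC lookups. The client call and the metadata fields below are hypothetical illustrations, not the actual catalog API.

def find_files(catalog, metadata_conditions):
    """Return the LFNs whose user metadata match all given conditions."""
    # Hypothetical client call illustrating a combined metadata query.
    return catalog.find_files_by_metadata(metadata_conditions)

# Example: all simulation files for a given energy and event type.
# lfns = find_files(fc, {"Energy": 1000, "EvtType": "Zuds", "Datatype": "SIM"})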
DIRAC as a service
A DIRAC installation shared by a number of user communities and centrally operated
EELA/GISELA grid
gLite based
DIRAC is part of the grid production infrastructure
• Single VO
French NGI installation
https://dirac.in2p3.fr
Started as a service supporting grid tutorials
Now serving users from various domains
• Biomed, earth observation, seismology, …
• Multiple VOs
DIRAC as a service
Necessity to manage multiple VOs with a single DIRAC installation
• Per-VO pilot credentials
• Per-VO accounting
• Per-VO resources description
Pilot directors are VO-aware
Job matching takes the pilot's VO assignment into account (a minimal sketch follows below)
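A minimal sketch of VO-aware matching in Python: a pilot submitted with one VO's credentials is only handed payloads from that VO's task queues. All names are hypothetical illustrations.

def match_job(pilot_vo, task_queues):
    """Return the first waiting job belonging to the pilot's VO, if any."""
    for queue in task_queues:
        if queue.vo == pilot_vo and queue.waiting_jobs:
            return queue.waiting_jobs.pop(0)
    return None  # no matching work: the pilot idles or exits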
DIRAC Consortium
Other projects are starting to use or evaluate DIRAC
CTA, SuperB, BES, VIP (medical imaging), …
• Contributing to DIRAC development
• Increasing the number of experts
Need for a user support infrastructure
Turning DIRAC into an Open Source project
DIRAC Consortium agreement in preparation
• IN2P3, Barcelona University, CERN, …
http://diracgrid.org
• News, docs, forum
Conclusions
DIRAC has been successfully used in LHCb for all distributed computing tasks in the first years of LHC operation
Other experiments and user communities have started to use DIRAC, contributing their developments to the project
The DIRAC open source project is now being built to bring the experience of HEP computing to other experiments and application domains
LHCb in brief
Experiment dedicated to studying CP violation
• Responsible for the dominance of matter over antimatter
Matter-antimatter differences studied using the b-quark (beauty)
• High-precision physics (tiny differences…)
Single-arm spectrometer
• Looks like a fixed-target experiment
Smallest of the 4 big LHC experiments
• ~500 physicists
Nevertheless, computing is also a challenge…
LHCb Computing Model
Tier0 Center
Raw data shipped in real time to Tier-0
Part of the first-pass reconstruction and re-reconstruction
Acting as one of the Tier1 centers
Calibration and alignment performed on a selected part of the data stream (at CERN)
Resilience enforced by a second copy at the Tier-1s
Rate: ~3000 events/s (35 kB each) at ~100 MB/s
Alignment and tracking calibration using dimuons (~5/s)
• Also used for validation of new calibrations
PID calibration using Ks and D*
CAF – CERN Analysis Facility
Grid resources for analysis
Direct batch system usage (LXBATCH) for SW tuning
Interactive usage (LXPLUS)
Tier1 Center
Real data persistency
First-pass reconstruction and re-reconstruction
Data Stripping
Group analysis
Event preselection in several streams (if needed)
The resulting DST data shipped to all the other Tier1 centers
Further reduction of the datasets, μDST format
Centrally managed using the LHCb Production System
User analysis
Selections on stripped data
Preparing N-tuples and reduced datasets for local analysis
Tier2-Tier3 centers
No assumption of local LHCb-specific support
MC production facilities
• Small local storage requirements to buffer MC data before shipping to the respective Tier1 center
User analysis
• No user analysis assumed in the base Computing Model
• However, several distinguished centers are willing to contribute
• Replication of analysis (stripped) data to T2-T3 centers by site managers, as full or partial samples
• Increases the amount of resources capable of running user analysis jobs
• Analysis data at T2 centers are available to the whole Collaboration, with no special preferences for local users