Managing distributed computing resources with DIRAC


A. Tsaregorodtsev, CPPM-IN2P3-CNRS, Marseille
12-17 September 2011, NEC'11, Varna
Outline

 DIRAC Overview
 Main subsystems
  Workload Management
  Request Management
  Transformation Management
  Data Management
 Use in LHCb and other experiments
 DIRAC as a service
 Conclusion
Introduction

 DIRAC is first of all a framework to build distributed computing systems
  Supporting Service Oriented Architectures
  GSI-compliant secure client/service protocol
   • Fine-grained service access rules
  Hierarchical Configuration Service for bootstrapping distributed services and agents
 This framework is used to build all the DIRAC systems:
  Workload Management
   • Based on the Pilot Job paradigm
  Production Management
  Data Management
  etc.
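As an illustration of this framework layer, the sketch below shows how a client might read a value from the hierarchical Configuration Service and call a DIRAC service over the secure DISET protocol. It assumes a standard DIRAC client installation with a valid proxy; the service path and the availability of ping() are based on common DIRAC conventions and may differ between versions.

# Minimal sketch of a client built on the DIRAC framework (assumptions:
# a DIRAC client installation, a valid proxy, and an accessible service).
from DIRAC import gConfig
from DIRAC.Core.DISET.RPCClient import RPCClient

# Look up a value in the hierarchical Configuration Service
setup = gConfig.getValue("/DIRAC/Setup", "unknown")
print("Current setup: %s" % setup)

# Call a DIRAC service over the GSI-secured DISET protocol;
# ping() is a standard call exposed by DIRAC service handlers
monitoring = RPCClient("WorkloadManagement/JobMonitoring")
result = monitoring.ping()
if result["OK"]:
    print("Service is alive: %s" % result["Value"])
else:
    print("Call failed: %s" % result["Message"])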
[Architecture diagram: Production Manager and Physicist User jobs go to the central Matcher Service; EGEE, NDG and EELA Pilot Directors deploy pilots to the EGI/WLCG, NDG and GISELA grids, and a CREAM Pilot Director submits directly to CREAM CEs.]
User credentials management

 The WMS with Pilot Jobs requires a strict user proxy management system
  Jobs are submitted to the DIRAC central Task Queue with the credentials of their owner (VOMS proxy)
  Pilot Jobs are submitted to a grid WMS with the credentials of a user with a special Pilot role
  The Pilot Job fetches the user job and the job owner's proxy
  The User Job is executed with its owner's proxy, which is used to access SEs, catalogs, etc.
 The DIRAC Proxy Manager service provides the necessary functionality
  Proxy storage and renewal
  Possibility to outsource the proxy renewal to the MyProxy server
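The pilot-side logic of this delegation chain can be sketched as follows. The helper objects and field names (matcher, proxy_manager, OwnerDN, ExecutableCommand) are hypothetical stand-ins for the actual DIRAC components, used only to illustrate the flow.

import os
import subprocess
import tempfile

def run_payload(matcher, proxy_manager):
    # Ask the Matcher service for a job that fits this pilot's resources
    job = matcher.request_job(resource_description={"Site": "LCG.CERN.ch"})
    if job is None:
        return  # nothing to run, the pilot exits quietly

    # Download the job owner's VOMS proxy from the Proxy Manager service
    # and store it in a private temporary file
    proxy_pem = proxy_manager.download_proxy(job["OwnerDN"], job["OwnerGroup"])
    fd, proxy_path = tempfile.mkstemp(prefix="ownerProxy.")
    with os.fdopen(fd, "w") as handle:
        handle.write(proxy_pem)
    os.chmod(proxy_path, 0o600)

    # The payload runs with the owner's credentials, so it can access
    # storage elements and catalogs on the owner's behalf
    env = dict(os.environ, X509_USER_PROXY=proxy_path)
    subprocess.call(job["ExecutableCommand"], shell=True, env=env)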
Direct submission to CEs

 The gLite WMS is now used just as a pilot deployment mechanism
  Limited use of its brokering features
   • For jobs with input data the destination site is already chosen
  Multiple Resource Brokers have to be used because of scalability problems
 DIRAC supports direct submission to CEs
  CREAM CEs
  Individual site policies can be applied
   • Direct measurement of the site state by watching the pilot status info
   • The site chooses how much load it can take (Pull vs Push paradigm)
 This is a general trend
  All the LHC experiments have declared that they will eventually abandon the gLite WMS
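A hedged sketch of the pull paradigm behind direct submission: a pilot director checks how much work is waiting for a site and how many pilots are already out, and only then submits more pilots directly to a CE, capped by the site's policy. The task_queue and cream_ce clients are illustrative assumptions; the real DIRAC directors are more elaborate.

def fill_site(task_queue, cream_ce, site, max_pilots=100):
    # How much matching work is waiting, and how many pilots are already out
    waiting_jobs = task_queue.count_waiting_jobs(site=site)
    running_pilots = cream_ce.count_pilots(status=("Submitted", "Running"))

    # The site-level cap (max_pilots) expresses the site policy: the site
    # decides how much load it takes, and pilots pull work only when
    # there is actually something to run
    to_submit = min(waiting_jobs, max_pilots - running_pilots)
    for _ in range(max(0, to_submit)):
        cream_ce.submit_pilot(bundle="dirac-pilot")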
DIRAC sites

 Dedicated Pilot Director per site or group of sites
 On-site Director
  Site managers have full control of LHCb payloads
 Off-site Director
  The site delegates control to the central service
  The site must only define a dedicated local user account
  Payload submission goes through an SSH tunnel
 In both cases the payload is executed with the owner's credentials
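A hedged sketch of the off-site case: the central director copies a pilot wrapper to the dedicated local account at the site and submits it to the local batch system over SSH. The host, account name and batch command are assumptions; the actual DIRAC SSH submission differs in detail.

import subprocess

def submit_via_ssh(host, account, wrapper="dirac-pilot.sh"):
    remote = "%s@%s" % (account, host)
    # Copy the self-contained pilot wrapper to the dedicated account
    subprocess.check_call(["scp", wrapper, "%s:%s" % (remote, wrapper)])
    # Submit it to the local batch system under that account
    subprocess.check_call(["ssh", remote, "qsub %s" % wrapper])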
DIRAC Sites

 Several DIRAC sites in production in LHCb
  E.g. Yandex
   • 1800 cores
   • Second largest MC production site
 An interesting possibility for small user communities or infrastructures, e.g.
  contributing local clusters
  building regional or university grids
WMS performance

 Up to 35K concurrent jobs in ~120 distinct sites
  Limited by the resources available to LHCb
  10 mid-range servers hosting the DIRAC central services
  Further optimizations to increase the capacity are possible
   • Hardware, database optimizations, service load balancing, etc.
Belle (KEK) use of the Amazon EC2

 A VM scheduler was developed for the Belle MC production system
  Dynamic VM spawning taking spot prices and Task Queue state into account

Thomas Kuhr, Belle
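A hedged sketch of that scheduling idea: spawn VMs only while the spot price is acceptable and the Task Queue still has waiting work. The ec2 and task_queue clients, the price ceiling and the burst size are illustrative assumptions, not the actual Belle scheduler code.

def schedule_vms(ec2, task_queue, max_price=0.10, instance_type="m1.large"):
    # Only bid for capacity while there is work waiting in the Task Queue
    waiting = task_queue.count_waiting_jobs(community="belle")
    if waiting == 0:
        return 0

    # Check the current spot price and stay below the configured ceiling
    price = ec2.current_spot_price(instance_type)
    if price > max_price:
        return 0

    # Spawn at most one VM per waiting job, up to a modest burst size
    n_new = min(waiting, 20)
    ec2.request_spot_instances(instance_type, count=n_new, bid=max_price)
    return n_new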
Belle use of the Amazon EC2

 Various computing resources combined in a single production system
  KEK cluster
  LCG grid sites
  Amazon EC2
 Common monitoring, accounting, etc.

Thomas Kuhr, Belle II
Belle II

 Starting in 2015 after the KEK upgrade
  50 ab-1 by 2020
  Data rate 1.8 GB/s (high rate scenario)
 Computing model
  Raw data storage and processing, MC production, Ntuple production and Ntuple analysis
  Using the KEK computing center, grid and cloud resources
 The Belle II distributed computing system is based on DIRAC

Thomas Kuhr, Belle II
Support for MPI Jobs

 MPI Service developed for applications in the GISELA Grid
  Astrophysics, BioMed, Seismology applications
  No special MPI support on sites is required
   • MPI software is installed by Pilot Jobs
  MPI ring usage optimization
   • Ring reuse for multiple jobs
      Lower load on the gLite WMS
   • Variable ring sizes for different jobs
 Possible usage for HEP applications:
  Proof on demand dynamic sessions
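A hedged sketch of the ring-reuse idea: once the pilot jobs have assembled an MPI ring on a site, the ring keeps asking the Task Queue for further MPI jobs that fit its size before dissolving, avoiding repeated WMS submissions. The ring and task_queue objects are illustrative assumptions.

def run_mpi_ring(ring, task_queue, idle_limit=3):
    # ring.size is the number of worker slots assembled by the pilot jobs
    idle_cycles = 0
    while idle_cycles < idle_limit:
        # Ask for another MPI job whose requested ring size fits this ring
        job = task_queue.request_mpi_job(max_ring_size=ring.size)
        if job is None:
            idle_cycles += 1
            continue
        idle_cycles = 0
        # Run the payload on the already assembled ring of nodes
        ring.execute("mpirun -np %d %s" % (job["RingSize"], job["Executable"]))
    ring.dissolve()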
Coping with failures

 Problem: distributed resources and services are unreliable
  Software bugs, misconfiguration
  Hardware failures
  Human errors
 Solution: redundancy and asynchronous operations
 DIRAC services are redundant
  Geographically: Configuration, Request Management
  Several instances for any service
Request Management system

 A Request Management System (RMS) accepts and asynchronously executes any kind of operation that can fail
 Requests are collected by RMS instances on VO-boxes at 7 Tier-1 sites
  Data upload and registration
  Job status and parameter reports
  Extra redundancy in VO-box availability
 Requests are forwarded to the central Request Database
  For keeping track of the pending requests
  For efficient bulk request execution
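The failover pattern can be sketched as follows: the client tries the operation once, and on failure it records a request that a local RMS instance will retry asynchronously until it succeeds. The storage, catalog and request_client objects and the request fields are hypothetical illustrations of the pattern, not the DIRAC RMS API.

def upload_with_failover(storage, catalog, request_client, lfn, local_path):
    # Try the synchronous operation first
    result = storage.put_file(lfn, local_path)
    if result["OK"]:
        registered = catalog.register_file(lfn, result["Value"])
        if registered["OK"]:
            return True

    # On any failure, record a request; a local RMS instance will keep
    # retrying it asynchronously until the operation finally succeeds
    request_client.add_request({
        "Type": "ReplicateAndRegister",
        "LFN": lfn,
        "SourcePath": local_path,
    })
    return False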
DIRAC Transformation Management

 Data-driven payload generation based on templates
 Generating data processing and replication tasks
 LHCb-specific templates and catalogs
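A hedged sketch of the data-driven idea: files matching a transformation's input filter are picked up as they appear, grouped, and turned into tasks from a workflow template. The transformation and catalog objects and their fields are illustrative assumptions.

def extend_transformation(transformation, catalog, group_size=10):
    # Data driven: pick up files that match the transformation's input
    # filter and have not yet been assigned to a task
    new_files = catalog.find_files(transformation["InputFilter"],
                                   status="Unused")

    # Group the files and instantiate one payload task per group,
    # following the transformation's workflow template
    for start in range(0, len(new_files), group_size):
        group = new_files[start:start + group_size]
        transformation.create_task(template=transformation["Template"],
                                   input_files=group)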
Data Management

 Based on the Request Management System
  Asynchronous data operations
   • transfers, registration, removal
 Two complementary replication mechanisms
  Transfer Agent
   • user data
   • public network
  FTS service
   • production data
   • private FTS OPN network
   • smart pluggable replication strategies
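To illustrate what a "pluggable replication strategy" could look like, here is a hedged sketch of one: given the existing replicas and the target SEs, it returns an ordered transfer plan that feeds each target from the cheapest already-available replica. The class, its interface and the channel-cost table are assumptions, not the DIRAC strategy API.

class SimpleTreeStrategy(object):
    """Illustrative replication strategy: feed each target from the
    cheapest already-available replica, letting new replicas serve
    later transfers."""

    def __init__(self, channel_costs):
        # channel_costs[(source_se, target_se)] -> relative transfer cost
        self.channel_costs = channel_costs

    def plan(self, existing_replicas, target_ses):
        plan = []
        sources = list(existing_replicas)
        for target in target_ses:
            # Pick the cheapest known channel into this target
            source = min(sources,
                         key=lambda se: self.channel_costs.get((se, target), 1e9))
            plan.append((source, target))
            sources.append(target)  # the new replica can serve later transfers
        return plan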
Transfer accounting (LHCb)
ILC using DIRAC

 ILC CERN group
  Using the DIRAC Workload Management and Transformation systems
  2M jobs run in the first year
   • Instead of the 20K planned initially
 The DIRAC FileCatalog was developed for ILC
  More efficient than LFC for common queries
  Includes user metadata natively
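A hedged sketch of a native metadata query against the DIRAC File Catalog. The import path and method name follow recent DIRAC releases and may differ in detail from the version discussed here; the metadata fields and path are example values.

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

fc = FileCatalogClient()

# Find all files under a directory tree matching user-defined metadata
# ("Machine" and "Energy" are example metadata fields)
result = fc.findFilesByMetadata({"Machine": "ilc", "Energy": 500},
                                path="/ilc/prod")
if result["OK"]:
    for lfn in result["Value"]:
        print(lfn)
else:
    print("Query failed: %s" % result["Message"])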
DIRAC as a service

 A DIRAC installation shared by a number of user communities and centrally operated
 EELA/GISELA grid
  gLite based
  DIRAC is part of the grid production infrastructure
   • Single VO
 French NGI installation
  https://dirac.in2p3.fr
  Started as a service for grid tutorial support
  Now serving users from various domains
   • Biomed, earth observation, seismology, …
   • Multiple VOs
DIRAC as a service

 Necessity to manage multiple VOs with a single DIRAC installation
  Per-VO pilot credentials
  Per-VO accounting
  Per-VO resources description
 Pilot directors are VO-aware
  Job matching takes the pilot's VO assignment into account
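A hedged sketch of VO-aware matching: the pilot presents the VO it was submitted for, and only task queues belonging to that VO are considered when handing out a job. The task-queue structure and field names are illustrative assumptions, not the DIRAC Matcher internals.

def match_job(task_queues, pilot_info):
    # The pilot carries the VO of the credentials it was submitted with;
    # only that VO's task queues are considered
    candidates = [tq for tq in task_queues
                  if tq["VO"] == pilot_info["VO"]
                  and tq["Site"] in (pilot_info["Site"], "ANY")]

    # Hand out the highest-priority waiting job, if any
    candidates.sort(key=lambda tq: tq["Priority"], reverse=True)
    for tq in candidates:
        if tq["WaitingJobs"]:
            return tq["WaitingJobs"].pop(0)
    return None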
DIRAC Consortium

 Other projects are starting to use or evaluate DIRAC
  CTA, SuperB, BES, VIP (medical imaging), …
   • Contributing to DIRAC development
   • Increasing the number of experts
  Need for a user support infrastructure
 Turning DIRAC into an Open Source project
  DIRAC Consortium agreement in preparation
   • IN2P3, Barcelona University, CERN, …
  http://diracgrid.org
   • News, docs, forum
Conclusions

 DIRAC has been used successfully in LHCb for all distributed computing tasks in the first years of LHC operation
 Other experiments and user communities have started to use DIRAC, contributing their developments to the project
 The DIRAC open source project is now being built to bring the experience from HEP computing to other experiments and application domains
LHCb in brief

 Experiment dedicated to studying CP violation
  Responsible for the dominance of matter over antimatter
  Matter-antimatter difference studied using the b-quark (beauty)
  High precision physics (tiny differences…)
 Single arm spectrometer
  Looks like a fixed-target experiment
 Smallest of the 4 big LHC experiments
  ~500 physicists
 Nevertheless, computing is also a challenge…
LHCb Computing Model

Tier0 Center

 Raw data shipped in real time to Tier-0
  Resilience enforced by a second copy at the Tier-1's
  Rate: ~3000 events/s of ~35 kB each, i.e. ~100 MB/s
 Part of the first pass reconstruction and re-reconstruction
  Acting as one of the Tier1 centers
 Calibration and alignment performed on a selected part of the data stream (at CERN)
  Alignment and tracking calibration using dimuons (~5/s)
   • Also used for validation of new calibrations
  PID calibration using Ks, D*
 CAF – CERN Analysis Facility
  Grid resources for analysis
  Direct batch system usage (LXBATCH) for SW tuning
  Interactive usage (LXPLUS)
Tier1 Center

 Real data persistency
 First pass reconstruction and re-reconstruction
 Data Stripping
  Event preselection in several streams (if needed)
  The resulting DST data shipped to all the other Tier1 centers
 Group analysis
  Further reduction of the datasets, μDST format
  Centrally managed using the LHCb Production System
 User analysis
  Selections on stripped data
  Preparing N-tuples and reduced datasets for local analysis
Tier2-Tier3 centers

 No local LHCb-specific support is assumed
 MC production facilities
  Small local storage requirements to buffer MC data before shipping to the respective Tier1 center
 User analysis
  No user analysis assumed in the base Computing Model
  However, several distinguished centers are willing to contribute
   • Analysis (stripped) data replication to T2-T3 centers by site managers
      Full or partial samples
   • Increases the amount of resources capable of running user analysis jobs
  Analysis data at T2 centers are available to the whole Collaboration
   • No special preferences for local users