
ATLAS Production
Kaushik De
University of Texas at Arlington
LHC Computing Workshop, Ankara
May 2, 2008
Outline
 Computing Grids
 Tiers of ATLAS
 PanDA Production System
 MC Production Statistics
 Distributed Data Management
 Operations Shifts
 Thanks to T. Maeno (BNL), R. Rocha (CERN), J. Shank
(BU) for some of the slides presented here
EGEE Grid
+ NorduGrid - NDGF (Nordic countries)
OSG – US Grid
Tiers of ATLAS
 10 Tier 1 centers
  Canada, France, Germany, Italy, Netherlands, Nordic Countries, Spain, Taipei, UK, USA
 ~35 Tier 2 centers
  Australia, Austria, Canada, China, Czech R., France, Germany, Israel, Italy, Japan, Poland, Portugal, Romania, Russian Fed., Slovenia, Spain, Switzerland, Taipei, UK, USA
 ? Tier 3 centers
Tiered Example – US Cloud
 BNL T1
 MW T2: UC, IU
 NE T2: BU, HU
 SW T2: UTA, OU
 SLAC T2
 GL T2: UM, MSU
 Tier 3's: IU OSG, UTA DPCC, OU Oscer, Wisconsin, UC Teraport, UTD, LTU, SMU
Data Flow in ATLAS
Storage Estimate
Additional disk resources needed in the second half of 2008 (post FDR)

Assuming:
 60 days of data taking at 200 Hz
 30 M simulated events
 group analysis disk space (70 TB for a 10% T1)
 disk efficiency (0.7)

Site      Disk (TB)   Disk (TB), continuous simulation
                      (assumes all CPUs do simulation)
BNL          862        1037
IN2P3        568         677
SARA         564         664
RAL          417         484
FZK          422         497
CNAF         270         303
ASGC         270         303
PIC          270         303
NDGF         270         303
TRIUMF       270         303
Sum         4186        4876
Production System Overview

[Diagram: tasks requested by the Physics Working Groups are approved into ProdDB; Panda submits the jobs and sends them to the clouds (CA, DE, ES, FR, IT, NL, TW, UK, US; NDGF coming); files are registered in DQ2.]
Panda
 PANDA = Production ANd Distributed Analysis system
 Designed for analysis as well as production
 Project started Aug 2005, prototype Sep 2005, production
Dec 2005
 Works both with OSG and EGEE middleware
 A single task queue and pilots
 Apache-based Central Server
 Pilots retrieve jobs from the server as soon as CPU is available → low latency
 Highly automated, has an integrated monitoring system, and
requires low operation manpower
 Integrated with ATLAS Distributed Data Management (DDM)
system
 Not exclusively ATLAS: CHARMM is its first non-ATLAS OSG user
Panda

[Diagram: within a cloud, the Tier 1 and its Tier 2s each run jobs against their local storage, reading input files from it and writing output files back to it.]
Panda/Bamboo System Overview

[Diagram: Bamboo pulls jobs from ProdDB and submits them over HTTPS to the Panda server, which interacts with DQ2, the LRC/LFC and a logger; the Autopilot submits pilots via condor-g to sites A and B, where pilots on the worker nodes pull jobs from the Panda server over HTTPS and send logs to the logger; end-users submit directly to the Panda server.]
Panda server

[Diagram: clients, pilots and the logger communicate with the Panda server (Apache + gridsite, backed by PandaDB) over HTTPS; the server interacts with DQ2 and the LRC/LFC.]

 Central queue for all kinds of jobs
 Assign jobs to sites (brokerage)
 Setup input/output datasets
  Create them when jobs are submitted
  Add files to output datasets when jobs are finished
 Dispatch jobs
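As a rough illustration of the bullets above, here is a minimal sketch (not PanDA code; field names and the brokerage rule are assumptions) of a central queue that creates an output dataset at submission, brokers jobs to sites, and appends files to the dataset when a job finishes.

```python
# Minimal sketch of the slide above (not PanDA code): field names and the
# brokerage policy are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_cpus: int
    releases: set

@dataclass
class Job:
    job_id: int
    release: str
    out_dataset: str
    status: str = "defined"
    site: str = ""

class CentralQueue:
    def __init__(self):
        self.jobs = []
        self.datasets = {}                      # output dataset -> list of files

    def submit(self, job):
        self.datasets.setdefault(job.out_dataset, [])   # create dataset at submission
        self.jobs.append(job)

    def broker(self, sites):
        # toy brokerage: the site must have the release and free CPUs
        for job in (j for j in self.jobs if j.status == "defined"):
            candidates = [s for s in sites if job.release in s.releases and s.free_cpus > 0]
            if candidates:
                best = max(candidates, key=lambda s: s.free_cpus)
                job.site, job.status = best.name, "assigned"
                best.free_cpus -= 1

    def finish(self, job, files):
        self.datasets[job.out_dataset].extend(files)    # add files when the job finishes
        job.status = "finished"

q = CentralQueue()
q.submit(Job(1, "14.1.0", "mc08.000001.AOD"))
q.broker([Site("BNL", 10, {"14.1.0"}), Site("SWT2", 0, {"14.1.0"})])
q.finish(q.jobs[0], ["AOD.000001._0001.pool.root"])
```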
Bamboo
[Diagram: a cron-driven Bamboo (Apache + gridsite) sits between prodDB (accessed via cx_Oracle) and the Panda server (accessed via HTTPS).]

 Get jobs from prodDB and submit them to Panda
 Update job status in prodDB
 Assign tasks to clouds dynamically
 Kill TOBEABORTED jobs
A cron triggers the above procedures every 10 min
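A toy sketch of one such cron pass follows; the in-memory stand-in for prodDB and the submit_to_panda placeholder are assumptions, not the real cx_Oracle or Panda client code.

```python
# Sketch of one 10-minute Bamboo cycle. Illustrative only: the rows below
# stand in for prodDB, and submit_to_panda for the HTTPS call to Panda.

def submit_to_panda(row):
    # placeholder for the HTTPS submission to the Panda server
    return 1000 + row["id"]

def bamboo_cycle(proddb_rows):
    """One cron pass: submit waiting jobs, kill TOBEABORTED jobs."""
    for row in proddb_rows:
        if row["status"] == "waiting":
            row["panda_id"] = submit_to_panda(row)
            row["status"] = "submitted"        # update job status in prodDB
        elif row["status"] == "TOBEABORTED":
            row["status"] = "aborted"          # kill the job

rows = [{"id": 1, "status": "waiting"}, {"id": 2, "status": "TOBEABORTED"}]
bamboo_cycle(rows)      # cron would trigger this every 10 minutes
print(rows)
```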
Client-Server Communication
 HTTP/S-based communication (curl + grid proxy + python)
 GSI authentication via mod_gridsite
 Most communications are asynchronous
  The Panda server spawns Python threads as soon as it receives HTTP requests and sends responses back immediately; the threads do the heavy procedures (e.g., DB access) in the background → better throughput
  Several are synchronous

[Diagram: the client serializes a Python object with cPickle and sends it as an x-www-form-urlencoded HTTPS request to the server's UserIF (mod_python); the response, compressed with mod_deflate, is deserialized back into a Python object with cPickle.]
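A minimal modern-Python sketch of the encoding path in the diagram: pickle-serialize a Python object, wrap it as x-www-form-urlencoded form data, and prepare an HTTPS POST. The URL, field name and hex wrapping are illustrative assumptions, not the real client protocol.

```python
# Sketch only: the real client uses curl with a grid proxy and cPickle
# (Python 2); the URL, field name and hex wrapping here are assumptions.
import pickle
import urllib.parse
import urllib.request

job_spec = {"transformation": "example_trf.py", "cpuCount": 1}

# serialize with pickle, then wrap as x-www-form-urlencoded form data
form = {"jobs": pickle.dumps(job_spec).hex()}
payload = urllib.parse.urlencode(form).encode()

req = urllib.request.Request(
    "https://pandaserver.example.org:25443/server/panda/submitJobs",  # placeholder URL
    data=payload, method="POST")

# GSI authentication (mod_gridsite) and mod_deflate compression of the
# response are omitted; the call stays commented out because the endpoint
# above is a placeholder.
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```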
Data Transfer
 Rely on ATLAS DDM
  Panda sends requests to DDM
  DDM moves files and sends notifications back to Panda
  Panda and DDM work asynchronously
 Dispatch input files to T2s and aggregate output files to T1
 Jobs get 'activated' when all input files are copied, and pilots pick them up
  Pilots don't have to wait for data arrival on WNs
  Data-transfer and job-execution can run in parallel

[Diagram: sequence between the job submitter, Panda, DQ2 and the pilot: submit job → subscribe a T2 to the dispatch dataset → data transfer → callback → pilot gets the job → runs it → finishes → files are added to the destination datasets → data transfer → callback.]
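A small sketch of the activation step described above, with toy data structures rather than the real Panda/DDM callback protocol: a job only becomes 'activated' once the DDM callback reports that all of its input files have arrived.

```python
# Toy activation logic (illustrative file names and job record).
jobs = {
    101: {"inputs": {"EVNT.01._0001.pool.root", "EVNT.01._0002.pool.root"},
          "status": "assigned"},
}
arrived = set()

def ddm_callback(transferred_files):
    """Called when a dispatch-dataset transfer completes."""
    arrived.update(transferred_files)
    for jid, job in jobs.items():
        if job["status"] == "assigned" and job["inputs"] <= arrived:
            job["status"] = "activated"      # pilots can now pick this job up

ddm_callback({"EVNT.01._0001.pool.root"})
ddm_callback({"EVNT.01._0002.pool.root"})
print(jobs[101]["status"])                   # -> activated
```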
Pilot and Autopilot (1/2)
 Autopilot is a scheduler that submits pilots to sites via condor-g/glidein
  Pilot → Gatekeeper
  Job → Panda server
 Pilots are scheduled to the site batch system and pull jobs as soon as CPUs become available
  Panda server → Job → Pilot
 Pilot submission and job submission are different
  Job = payload for the pilot
Pilot and Autopilot (2/2)
 How the pilot works
  Sends several parameters to the Panda server for job matching (HTTP request)
   CPU speed
   Available memory size on the WN
   List of available ATLAS releases at the site
  Retrieves an 'activated' job (HTTP response to the above request)
   activated → running
  Runs the job immediately, because all input files should already be available at the site
  Sends a heartbeat every 30 min
  Copies output files to the local SE and registers them in the Local Replica Catalogue
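The sketch below mimics the pilot cycle listed above with stand-in functions; the getJob parameters, payload command and copy/registration steps are assumptions, while the real pilot uses HTTPS with a grid proxy and a 30-minute heartbeat.

```python
# Illustrative pilot cycle; helper functions are placeholders.
import subprocess
import time

def get_job(cpu_speed, memory_mb, releases):
    # placeholder for the HTTPS getJob request carrying the matching parameters
    return {"id": 42, "cmd": ["echo", "running payload"],
            "out_files": ["AOD.42._0001.pool.root"]}

def send_heartbeat(job_id):
    print(f"heartbeat for job {job_id}")     # real pilot: HTTPS update every 30 min

def copy_and_register(files):
    print(f"copy {files} to the local SE and register in the LRC/LFC")  # placeholder

def run_pilot():
    job = get_job(cpu_speed=2500, memory_mb=2048, releases=["14.1.0"])
    if not job:
        return                               # no activated job for this site
    # inputs are already at the site, so the payload starts immediately
    proc = subprocess.Popen(job["cmd"])
    while proc.poll() is None:
        send_heartbeat(job["id"])
        time.sleep(1)                        # 1800 s (30 min) in the real pilot
    copy_and_register(job["out_files"])

run_pilot()
```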
Production vs Analysis
 Run on the same infrastructure
  Same software, monitoring system and facilities
  No duplicated manpower for maintenance
 Separate computing resources
  Different queues → different CPU clusters
  Production and analysis don't have to compete with each other
 Different policies for data transfers
  Analysis jobs don't trigger data transfer → jobs go to sites which hold the input files
  For production, input files are dispatched to T2s and output files are aggregated to the T1 via DDM asynchronously → controlled traffic
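As a toy illustration of the two policies (not the real Panda brokerage), analysis jobs below are restricted to sites that already hold the input dataset, while production jobs may go anywhere in the cloud because DDM dispatches the inputs afterwards.

```python
# Toy brokerage policies; site names, datasets and the scoring are assumptions.
def choose_site(job, sites, replicas):
    if job["type"] == "analysis":
        # analysis: only sites that already hold the input dataset
        candidates = [s for s in sites if job["in_dataset"] in replicas.get(s, set())]
    else:
        # production: any site in the cloud; inputs are dispatched via DDM later
        candidates = list(sites)
    return max(candidates, key=lambda s: sites[s]["free_cpus"], default=None)

sites = {"BNL": {"free_cpus": 500}, "SWT2": {"free_cpus": 200}}
replicas = {"SWT2": {"user.dataset.A"}}
print(choose_site({"type": "analysis", "in_dataset": "user.dataset.A"}, sites, replicas))    # SWT2
print(choose_site({"type": "production", "in_dataset": "mc08.dataset.B"}, sites, replicas))  # BNL
```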
Current PanDA production – Past Week
PanDA production – Past Month
MC Production 2006-07
ATLAS Data Management Software - Don Quijote
 The second generation of the ATLAS DDM system (DQ2)
  DQ2 developers: M. Branco, D. Cameron, T. Maeno, P. Salgado, T. Wenaus, …
  The initial idea and architecture were proposed by M. Branco and T. Wenaus
 DQ2 is built on top of Grid data transfer tools
 Moved to a dataset-based approach
  Datasets: an aggregation of files plus associated DDM metadata
  The dataset is the unit of storage and replication
 Automatic data transfer mechanisms using distributed site services
  Subscription system
  Notification system
 Current version: 1.0
DDM components
[Diagram of DDM components: DDM end-user tools (T. Maeno, BNL: dq2_ls, dq2_get, dq2_cr), the DQ2 dataset catalog, local file catalogs, the File Transfer Service, the DQ2 subscription agents and the DQ2 "queued transfers".]
DDM Operations Mode
All Tier-1s have predefined (software) channels with CERN and with each other.
Tier-2s are associated with one Tier-1 and form a cloud; Tier-2s have predefined channels with their parent Tier-1 only.

US ATLAS DDM operations team:
 BNL: H. Ito, W. Deng, A. Klimentov, P. Nevski
 GLT2: S. McKee (MU)
 MWT2: C. Waldman (UC)
 NET2: S. Youssef (BU)
 SWT2: P. McGuigan (UTA)
 WT2: Y. Wei (SLAC)
 WISC: X. Neng (WISC)
[Diagram: ATLAS Tier associations (T1-T1 and T1-T2 associations according to GP): the Tier-1s (BNL, TRIUMF, FZK, RAL, CNAF, SARA, PIC, ASGC, NG, LYON) plus CERN, each with a cloud of associated Tier-2/Tier-3 sites, e.g. the BNL cloud (NET2, MWT2, GLT2, SWT2, WT2, WISC) and the LYON and ASGC clouds with sites such as GRIF, LPC, LAPP, Tokyo, Beijing, Romania, TWT2 and Melbourne. Each site hosts a VO box, a dedicated computer to run DDM services.]
Activities. Data Replication
 Centralized and automatic (according to the computing model)
  Simulated data
   AOD/NTUP/TAG (current data volume ~1.5 TB/week)
   BNL has a complete set of dataset replicas
   US Tier-2s define what fraction of the data they will keep (from 30% to 100%)
  Validation samples
   Replicated to BNL for SW validation purposes
  Critical data replication
   Database releases: replicated to BNL from CERN and then from BNL to the US ATLAS T2s; data volume is relatively small (~100 MB)
   Conditions data: replicated to BNL from CERN
  Cosmic data
   BNL requested 100% of cosmic data
   Data replicated from CERN to BNL and on to the US Tier-2s
 Data replication for individual groups, universities, physicists
  A dedicated Web interface is set up
Data Replication to Tier 2’s
You’ll never walk alone
Weekly throughput: 2.1 GB/s out of CERN
(from Simone Campana)
Subscriptions
 Subscription
  A request for the full replication of a dataset (or dataset version) at a given site
  Requests are collected by the centralized subscription catalog
  They are then served by a set of agents – the site services
 Subscription on a dataset version
  One-time-only replication
 Subscription on a dataset
  Replication triggered on every new version detected
  Subscription closed when the dataset is frozen
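A sketch of the two subscription modes above, using an illustrative data model rather than the DQ2 schema: a dataset-version subscription fires once, while a dataset subscription fires on every new version until the dataset is frozen.

```python
# Illustrative data model, not the DQ2 schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Subscription:
    dataset: str
    site: str
    on_version: Optional[int] = None   # None -> subscribed to the dataset itself
    closed: bool = False

def on_new_version(sub, version, frozen):
    """Return True if the site services should replicate this version."""
    if sub.closed:
        return False
    if sub.on_version is not None:     # subscription on a dataset version:
        sub.closed = True              # one-time-only replication
        return version == sub.on_version
    if frozen:                         # subscription on a dataset:
        sub.closed = True              # closed when the dataset is frozen
    return True                        # replicate every new version detected

sub = Subscription("mc08.000123.AOD", site="MWT2")
print(on_new_version(sub, 1, frozen=False))   # True
print(on_new_version(sub, 2, frozen=True))    # True, and the subscription closes
print(on_new_version(sub, 3, frozen=True))    # False
```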
Site Services
 Agent-based framework
 Goal: satisfy subscriptions
 Each agent serves a specific part of a request
  Fetcher: fetches new subscriptions from the subscription catalog
  Subscription Resolver: checks whether a subscription is still active, for new dataset versions, new files to transfer, …
  Splitter: creates smaller chunks from the initial requests, identifies files requiring transfer
  Replica Resolver: selects a valid replica to use as source
  Partitioner: creates chunks of files to be submitted as a single request to the FTS
  Submitter/PendingHandler: submits/manages the FTS requests
  Verifier: checks the validity of files at the destination
  Replica Register: registers new replicas in the local replica catalog
  …
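A compressed sketch of how such an agent chain could fit together; the function bodies are stand-ins, whereas the real agents talk to the subscription catalog, the replica catalogs and FTS.

```python
# Toy agent chain: each stage consumes the output of the previous one.
def fetcher(catalog):
    return [s for s in catalog if s["active"]]             # new subscriptions

def splitter(subscriptions):
    # break each subscription into the files still missing at the destination
    return [f for sub in subscriptions for f in sub["missing_files"]]

def replica_resolver(files):
    return [(f, f"srm://source.example.org/{f}") for f in files]     # pick a source

def partitioner(pairs, chunk=2):
    return [pairs[i:i + chunk] for i in range(0, len(pairs), chunk)]  # FTS-sized chunks

catalog = [{"active": True, "missing_files": ["f1", "f2", "f3"]}]
for fts_request in partitioner(replica_resolver(splitter(fetcher(catalog)))):
    print("submit to FTS:", fts_request)     # Submitter, Verifier and Register follow
```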
Typical deployment
 Deployment at Tier0 similar to Tier1s
 LFC and FTS services at Tier1s
 SRM services at every site, including Tier2s
[Diagram: the central catalogs are shared by several site-services deployments.]
Interaction with the grid middleware
 File Transfer Services (FTS)
 One deployed per Tier0 / Tier1 (matches typical site
services deployment)
 Triggers the third party transfer by contacting the SRMs,
needs to be constantly monitored
 LCG File Catalog (LFC)
 One deployed per Tier0 / Tier1 (matches typical site
services deployment)
 Keeps track of local file replicas at a site
 Currently used as main source of replica information by
the site services
 Storage Resource Manager (SRM)
 Once pre-staging comes into the picture
DDM - Current Issues and Plans
 Dataset deletion
  Non-trivial, although critical
  First implementation using a central request repository
  Being integrated into the site services
 Dataset consistency
  Between storage and local replica catalogs
  Between local replica catalogs and the central catalogs
  A lot of effort put into this recently – tracker, consistency service
 Pre-staging of data
  Currently done just before file movement
  Introduces high latency when a file is on tape
 Messaging
  More asynchronous flow (less polling)
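A tiny sketch of the consistency problem driving the tracker and consistency service: comparing what the storage holds, what the local replica catalog claims, and what the central catalogs believe (illustrative sets only).

```python
# Illustrative data; real checks run over SE dumps and catalog exports.
storage       = {"f1", "f2"}            # files actually on the SE
local_catalog = {"f1", "f2", "f3"}      # LFC/LRC entries at the site
central_view  = {"f1", "f3", "f4"}      # files the central catalogs place here

dark_data    = storage - local_catalog        # on disk, not catalogued
lost_files   = local_catalog - storage        # catalogued, not on disk
central_only = central_view - local_catalog   # central/local mismatch

print(dark_data, lost_files, central_only)
```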
ADC Operations Shifts
 ATLAS Distributed Computing Operations Shifts (ADCoS)
  World-wide shifts
  To monitor all ATLAS distributed computing resources
  To provide Quality of Service (QoS) for all data processing
  Shifters receive official ATLAS service credit (OTSMoU)
 Additional information
  http://indico.cern.ch/conferenceDisplay.py?confId=22132
  http://indico.cern.ch/conferenceDisplay.py?confId=26338
Typical Shift Plan
 Browse recent shift history
 Check performance of all sites
  File tickets for new issues
  Continue interactions about old issues
 Check status of current tasks
  Check all central processing tasks
  Monitor analysis flow (not individual tasks)
  Overall data movement
 File software (validation) bug reports
 Check Panda, DDM health
 Maintain elog of shift activities
Shift Structure
 Shifter on call
  Two consecutive days
  Monitor – escalate – follow up
  Basic manual interventions (site on/off)
 Expert on call
  One week duration
  Global monitoring
  Advise the shifter on call
  Major interventions (service on/off)
  Interact with other ADC operations teams
  Provide feedback to ADC development teams
 Tier 1 expert on call
  Very important (e.g. Rod Walker, Graeme Stewart, Eric Lancon, …)
Shift Structure Schematic
by Xavier Espinal
ADC Inter-relations
[Diagram: ADCoS at the center of the ADC operations areas and their coordinators: Production (Alex Read), Tier 0 (Armin Nairz), Central Services (Birger Koblitz), DDM (Stephane Jezequel), Operations Support (Pavel Nevski), Tier 1 / Tier 2 (Simone Campana), Distributed Analysis (Dietrich Liko).]