ATLAS Production
Kaushik De
University of Texas At Arlington
LHC Computing Workshop, Ankara
May 2, 2008
Outline
Computing Grids
Tiers of ATLAS
PanDA Production System
MC Production Statistics
Distributed Data Management
Operations Shifts
Thanks to T. Maeno (BNL), R. Rocha (CERN), J. Shank (BU) for some of the slides presented here
EGEE Grid
+ Nordugrid - NDGF
(Nordic countries)
OSG – US Grid
Tiers of ATLAS
10 Tier 1 centers
Canada, France, Germany, Italy, Netherlands, Nordic Countries,
Spain, Taipei, UK, USA
~35 Tier 2 centers
Australia, Austria, Canada, China, Czech R., France, Germany,
Israel, Italy, Japan, Poland, Portugal, Romania, Russian Fed.,
Slovenia, Spain, Switzerland, Taipei, UK, USA
? Tier 3 centers
Tiered Example – US Cloud
BNL T1
  MW T2: UC, IU
  NE T2: BU, HU
  SW T2: UTA, OU
  SLAC T2
  GL T2: UM, MSU
Tier 3's: IU OSG, UTA DPCC, OU Oscer, Wisconsin, UC Teraport, UTD, LTU, SMU
Data Flow in ATLAS
Storage Estimate
Additional disk resources needed in the second half of 2008 (post FDR), assuming (a back-of-the-envelope check follows below):
  60 days of data taking at 200 Hz
  30 M simulated events
  group analysis disk space (70 TB for a 10% T1)
  disk efficiency (0.7)
  the "continuous simulation" column assumes all CPUs do simulation

Site     Disk (TB)   Disk (TB), continuous simulation
BNL         862        1037
IN2P3       568         677
SARA        564         664
RAL         417         484
FZK         422         497
CNAF        270         303
ASGC        270         303
PIC         270         303
NDGF        270         303
Triumf      270         303
Sum        4186        4876
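As a quick cross-check of the listed assumptions, the sketch below shows the kind of arithmetic behind such an estimate. Only the 60 days × 200 Hz, 30 M simulated events, "70 TB per 10% T1" group space and 0.7 disk efficiency come from the slide; the per-event size is a hypothetical parameter used purely for illustration, and this is not the official ATLAS resource calculation.

```python
# Back-of-the-envelope sketch of the storage arithmetic; the per-event
# size below is a made-up parameter, not a number from the slide.

SECONDS_PER_DAY = 86400

def collected_events(days=60, rate_hz=200):
    """Events recorded in 'days' of data taking at 'rate_hz'."""
    return days * SECONDS_PER_DAY * rate_hz          # ~1.04e9 events

def tier1_disk_tb(share, event_kb=100.0, sim_events=30e6, disk_eff=0.7):
    """Rough disk need (TB) for a Tier-1 holding a fraction 'share' of the
    data: event data scaled by the share, plus group analysis space
    (70 TB per 10% share), divided by the disk efficiency."""
    data_tb = (collected_events() + sim_events) * event_kb * 1e-9  # kB -> TB
    group_tb = 700.0 * share
    return (data_tb * share + group_tb) / disk_eff

print(f"{collected_events():.3e} raw events")        # 1.037e+09
print(f"{tier1_disk_tb(0.10):.0f} TB for a 10% Tier-1 (with assumed sizes)")
```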
Production System Overview
Diagram: tasks requested by the Physics Working Groups are approved and entered in ProdDB; jobs are submitted from ProdDB to Panda, which sends them to the clouds (CA, DE, ES, FR, IT, NL, TW, UK, US; NDGF coming) and registers the output files in DQ2.
Panda
PANDA = Production ANd Distributed Analysis system
Designed for analysis as well as production
Project started Aug 2005, prototype Sep 2005, in production since Dec 2005
Works with both OSG and EGEE middleware
A single task queue and pilots
Apache-based central server
Pilots retrieve jobs from the server as soon as CPU is available → low latency
Highly automated, has an integrated monitoring system, and requires little operations manpower
Integrated with the ATLAS Distributed Data Management (DDM) system
Not exclusively ATLAS: has its first non-ATLAS OSG user, CHARMM
Panda
Diagram: a cloud consists of a Tier 1 and its Tier 2s; jobs at each site read input files from, and write output files to, the local storage.
Panda/Bamboo System Overview
Diagram: Bamboo pulls jobs from ProdDB and submits them over https to the Panda server; end-users also submit over https. The Panda server interacts with DQ2 and the LRC/LFC and sends logs to the logger. The Autopilot submits pilots via condor-g to the worker nodes at each site (site A, site B), and the pilots pull jobs from the Panda server over https.
Panda server
Diagram: clients and pilots talk to the Panda server (Apache + gridsite) over https; the server uses PandaDB, DQ2, the LRC/LFC and the logger.
Central queue for all kinds of jobs
Assigns jobs to sites (brokerage); a toy illustration follows below
Sets up input/output datasets
  Creates them when jobs are submitted
  Adds files to output datasets when jobs are finished
Dispatches jobs
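As a rough illustration of the brokerage idea only (this is not the actual Panda brokerage algorithm; the site names, dataset name and selection criteria below are made up), a job is sent to a site that already holds its input data and has spare capacity:

```python
def broker(job, sites):
    """Pick a site for 'job': it must hold the input dataset and have free
    slots; prefer the site with the most free slots."""
    candidates = [s for s in sites
                  if job["input_dataset"] in s["datasets"] and s["free_slots"] > 0]
    if not candidates:
        return None            # no suitable site yet: wait for data dispatch
    return max(candidates, key=lambda s: s["free_slots"])["name"]

sites = [
    {"name": "BNL",  "datasets": {"mc08.evgen"}, "free_slots": 120},
    {"name": "SWT2", "datasets": {"mc08.evgen"}, "free_slots": 300},
    {"name": "NET2", "datasets": set(),          "free_slots": 500},
]
print(broker({"input_dataset": "mc08.evgen"}, sites))   # -> SWT2
```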
Bamboo
Diagram: Bamboo (Apache + gridsite) reads prodDB via cx_Oracle and talks to the Panda server over https; a cron drives it.
Gets jobs from prodDB and submits them to Panda
Updates job status in prodDB
Assigns tasks to clouds dynamically
Kills TOBEABORTED jobs
A cron triggers the above procedures every 10 min (sketched below)
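A minimal sketch of what one such cron-driven cycle might look like; the in-memory list and callables below stand in for prodDB (cx_Oracle) and the Panda server (https), the field names are invented, and only the submit/kill steps are shown:

```python
def bamboo_cycle(proddb, panda_submit, panda_kill):
    """One cycle: hand newly defined jobs to Panda, kill flagged ones.
    (The task-to-cloud assignment and status-sync steps are omitted.)"""
    for job in proddb:
        if job["status"] == "defined":
            panda_submit(job["id"])
            job["status"] = "submitted"
        elif job["status"] == "TOBEABORTED":
            panda_kill(job["id"])
            job["status"] = "aborted"
    return proddb

proddb = [{"id": 1, "status": "defined"}, {"id": 2, "status": "TOBEABORTED"}]
print(bamboo_cycle(proddb, panda_submit=print, panda_kill=print))
```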
Client-Server Communication
HTTP/S-based communication (curl + grid proxy + python)
GSI authentication via mod_gridsite
Most communications are asynchronous
  The Panda server spawns python threads as soon as it receives HTTP requests and sends the responses back immediately; the threads do the heavy procedures (e.g., DB access) in the background → better throughput
  A few are synchronous
Diagram: the client serializes a Python object with cPickle and sends it as an x-www-form-urlencoded HTTPS request; on the server, the UserIF (mod_python) deserializes it, and the response goes back through mod_deflate as a Python object (illustrated below).
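To illustrate the serialization pattern only (this is not the actual Panda client code: Python 3's pickle and base64 are used here in place of cPickle + curl, and the 'jobSpec' field and its contents are invented for the example):

```python
# Illustrative round-trip of a pickled Python object through an
# x-www-form-urlencoded request body.
import base64
import pickle
import urllib.parse

def encode_request(obj):
    """Client side: pickle an object and wrap it in a form-encoded body,
    ready to be POSTed over HTTPS."""
    blob = base64.b64encode(pickle.dumps(obj)).decode("ascii")
    return urllib.parse.urlencode({"jobSpec": blob})

def decode_request(body):
    """Server side: reverse the encoding and recover the Python object."""
    blob = urllib.parse.parse_qs(body)["jobSpec"][0]
    return pickle.loads(base64.b64decode(blob))

job = {"taskID": 1234, "site": "EXAMPLE_SITE", "status": "defined"}
assert decode_request(encode_request(job)) == job
```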
Data Transfer
Diagram: the submitter submits a job to Panda; Panda subscribes a T2 to the dispatch dataset in DQ2; DDM transfers the data and sends a callback; the pilot then gets the job, runs it and finishes it; Panda adds the output files to the destination datasets, DDM transfers them and sends another callback.
Relies on ATLAS DDM
  Panda sends requests to DDM
  DDM moves files and sends notifications back to Panda
  Panda and DDM work asynchronously
Dispatches input files to T2s and aggregates output files to the T1
Jobs get 'activated' when all input files have been copied, and pilots pick them up (see the sketch below)
  Pilots don't have to wait for data arrival on the WNs
  Data transfer and job execution can run in parallel
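A small sketch of the activation mechanism; the states, dataset names and field names are illustrative, not the real Panda/DDM schema. A job only becomes visible to pilots once the DDM callback reports that its dispatch dataset has arrived at the T2:

```python
jobs = [
    {"id": 1, "dispatch_dataset": "dis.0001", "status": "assigned"},
    {"id": 2, "dispatch_dataset": "dis.0002", "status": "assigned"},
]

def ddm_callback(completed_dataset):
    """Called when DDM reports a dispatch dataset fully transferred to the T2."""
    for job in jobs:
        if job["dispatch_dataset"] == completed_dataset:
            job["status"] = "activated"      # now a pilot may pick it up

ddm_callback("dis.0001")
print([(j["id"], j["status"]) for j in jobs])   # [(1, 'activated'), (2, 'assigned')]
```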
Pilot and Autopilot (1/2)
Autopilot is a scheduler that submits pilots to sites via condor-g/glidein
  Pilot → Gatekeeper
Pilots are scheduled to the site batch system and pull jobs as soon as CPUs become available
  Panda server → Job → Pilot
Pilot submission and job submission are different
  Job = payload for the pilot
Pilot and Autopilot (2/2)
How the pilot works (sketched below):
  Sends several parameters to the Panda server for job matching (HTTP request): CPU speed, available memory on the WN, list of ATLAS releases available at the site
  Retrieves an 'activated' job (HTTP response to the above request); the job goes from activated to running
  Runs the job immediately, since all input files should already be available at the site
  Sends a heartbeat every 30 min
  Copies output files to the local SE and registers them in the Local Replica Catalogue
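The cycle above can be summarised in a short sketch; the PandaServerStub and its method names stand in for the real HTTPS interface, and the CPU, memory and release values are invented:

```python
class PandaServerStub:
    """Stand-in for the Panda server HTTPS interface (illustrative only)."""
    def __init__(self, activated_jobs):
        self.jobs = list(activated_jobs)
    def get_job(self, cpu_speed, memory_mb, releases):
        # The real server uses these parameters for job matching.
        return self.jobs.pop(0) if self.jobs else None
    def heartbeat(self, job_id, state):
        print(f"heartbeat: job {job_id} is {state}")

def run_pilot(server):
    job = server.get_job(cpu_speed=2500, memory_mb=2048, releases=["14.1.0"])
    if job is None:
        return                                  # nothing activated: exit
    server.heartbeat(job["id"], "running")      # activated -> running
    # ... run the payload; all input files are already at the site ...
    # (a real pilot repeats the heartbeat call every 30 min while running)
    server.heartbeat(job["id"], "finished")
    # ... copy outputs to the local SE and register them in the LRC ...

run_pilot(PandaServerStub([{"id": 42}]))
```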
Production vs Analysis
Run on the same infrastructure
  Same software, monitoring system and facilities
  No duplicated manpower for maintenance
Separate computing resources
  Different queues → different CPU clusters
  Production and analysis don't have to compete with each other
Different policies for data transfers
  Analysis jobs don't trigger data transfer; jobs go to the sites which hold the input files
  For production, input files are dispatched to T2s and output files are aggregated to the T1 via DDM, asynchronously
  Controlled traffic
Current PanDA production – Past Week
PanDA production – Past Month
MC Production 2006-07
ATLAS Data Management Software - Don Quijote
The second generation of the ATLAS DDM system (DQ2)
  DQ2 developers: M.Branco, D.Cameron, T.Maeno, P.Salgado, T.Wenaus, …
  Initial idea and architecture were proposed by M.Branco and T.Wenaus
DQ2 is built on top of Grid data transfer tools
Moved to a dataset-based approach
  Datasets: an aggregation of files plus associated DDM metadata
  The dataset is the unit of storage and replication
Automatic data transfer mechanisms using distributed site services
  Subscription system
  Notification system
Current version: 1.0
DDM components
  DDM end-user tools (T.Maeno, BNL): dq2_ls, dq2_get, dq2_cr
  DQ2 dataset catalog
  Local File Catalogs
  File Transfer Service
  DQ2 subscription agents
  DQ2 “Queued Transfers”
DDM Operations Mode
All Tier-1s have predefined (software) channels with CERN and with each other.
Tier-2s are associated with one Tier-1 and form the cloud.
Tier-2s have predefined channels with the parent Tier-1 only.
US ATLAS DDM operations team:
  BNL    H.Ito, W.Deng, A.Klimentov, P.Nevski
  GLT2   S.McKee (MU)
  MWT2   C.Waldman (UC)
  NET2   S.Youssef (BU)
  SWT2   P.McGuigan (UTA)
  WT2    Y.Wei (SLAC)
  WISC   X.Neng (WISC)
Diagram: ATLAS Tier associations (T1-T1 and T1-T2 associations according to GP), showing CERN, the Tier-1s (BNL, LYON, FZK, TRIUMF, RAL, CNAF, SARA, PIC, ASGC, NG) and their associated Tier-2/Tier-3 sites (e.g. GRIF, LPC, LAPP, Tokyo, Beijing, Romania, TWT2, Melbourne) grouped into clouds such as the LYON cloud, the ASGC cloud and the BNL cloud (NET2, MWT2, WT2, GLT2, SWT2, WISC, Tier-3s). Each site runs a VO box, a dedicated computer to run DDM services.
Activities. Data Replication
Centralized and automatic (according to the computing model)
Simulated data
  AOD/NTUP/TAG (current data volume ~1.5 TB/week)
Validation samples
  Replicated to BNL for SW validation purposes
Critical data replication
  Database releases: replicated to BNL from CERN and then from BNL to the US ATLAS T2s; the data volume is relatively small (~100 MB)
  Conditions data: replicated to BNL from CERN; BNL holds complete dataset replicas; the US Tier-2s define what fraction of the data they will keep (from 30% to 100%)
Cosmic data
  BNL requested 100% of the cosmic data
  Data replicated from CERN to BNL and on to the US Tier-2s
Data replication for individual groups, universities, physicists
  A dedicated Web interface is set up
Data Replication to Tier 2’s
You’ll never walk alone
Plot: weekly throughput, 2.1 GB/s out of CERN (from Simone Campana)
Subscriptions
Subscription: a request for the full replication of a dataset (or dataset version) at a given site (see the sketch below)
  Requests are collected by the centralized subscription catalog
  And are then served by a set of agents – the site services
Subscription on a dataset version
  One-time-only replication
Subscription on a dataset
  Replication triggered on every new version detected
  Subscription closed when the dataset is frozen
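A toy model of the two subscription flavours; the class, method, dataset and site names are invented for illustration and do not reflect the DQ2 schema:

```python
class Subscription:
    """Toy subscription: on a fixed version it replicates once; on the
    dataset it replicates every new version until the dataset is frozen."""
    def __init__(self, dataset, site, version=None):
        self.dataset, self.site, self.version = dataset, site, version
        self.closed = False

    def on_new_version(self, new_version, frozen=False):
        """Return a replication request (dataset, version, site) or None."""
        if self.closed:
            return None
        if self.version is not None:      # dataset-version subscription
            self.closed = True            # one-time-only replication
            return (self.dataset, self.version, self.site)
        if frozen:                        # dataset frozen -> close it
            self.closed = True
        return (self.dataset, new_version, self.site)

sub = Subscription("mc08.000001.AOD", "SWT2_DISK")
print(sub.on_new_version(2))                 # replicate version 2
print(sub.on_new_version(3, frozen=True))    # replicate version 3, then close
print(sub.on_new_version(4))                 # None: subscription closed
```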
Site Services
Agent-based framework
Goal: satisfy subscriptions
Each agent serves a specific part of a request (see the sketch below):
  Fetcher: fetches new subscriptions from the subscription catalog
  Subscription Resolver: checks whether the subscription is still active, for new dataset versions, new files to transfer, …
  Splitter: creates smaller chunks from the initial requests, identifies files requiring transfer
  Replica Resolver: selects a valid replica to use as source
  Partitioner: creates chunks of files to be submitted as a single request to the FTS
  Submitter/PendingHandler: submits/manages the FTS requests
  Verifier: checks the validity of files at the destination
  Replica Register: registers new replicas in the local replica catalog
  …
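A compressed sketch of a few of these agents, modelled as pipeline stages over a request dict; the field names and SRM URLs are invented, and the real agents are independent, long-running processes rather than a single loop:

```python
def splitter(request):
    # Identify which files still need to be transferred.
    request["to_transfer"] = [f for f in request["files"]
                              if f not in request["at_destination"]]
    return request

def replica_resolver(request):
    # Pick a source replica for each missing file (here: simply the first).
    request["sources"] = {f: request["replicas"][f][0]
                          for f in request["to_transfer"]}
    return request

def partitioner(request, chunk_size=2):
    # Group files into chunks, each to be submitted as one FTS request.
    files = request["to_transfer"]
    request["fts_chunks"] = [files[i:i + chunk_size]
                             for i in range(0, len(files), chunk_size)]
    return request

request = {"files": ["f1", "f2", "f3"], "at_destination": ["f1"],
           "replicas": {"f2": ["srm://siteA/f2"], "f3": ["srm://siteB/f3"]}}
for agent in (splitter, replica_resolver, partitioner):
    request = agent(request)
print(request["fts_chunks"])   # [['f2', 'f3']]
```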
Typical deployment
Deployment at the Tier0 is similar to the Tier1s
LFC and FTS services at the Tier1s
SRM services at every site, including Tier2s
Diagram: the central catalogs with several site services instances attached.
Interaction with the grid middleware
File Transfer Service (FTS)
  One deployed per Tier0/Tier1 (matches the typical site services deployment)
  Triggers the third-party transfers by contacting the SRMs; needs to be constantly monitored
LCG File Catalog (LFC)
  One deployed per Tier0/Tier1 (matches the typical site services deployment)
  Keeps track of the local file replicas at a site
  Currently used as the main source of replica information by the site services
Storage Resource Manager (SRM)
  Direct interaction once pre-staging comes into the picture
DDM - Current Issues and Plans
Dataset deletion
  Non-trivial, although critical
  First implementation using a central request repository
  Being integrated into the site services
Dataset consistency
  Between storage and the local replica catalogs
  Between the local replica catalogs and the central catalogs
  A lot of effort has been put into this recently – tracker, consistency service
Pre-staging of data
  Currently done just before file movement
  Introduces high latency when the file is on tape
Messaging
  More asynchronous flow (less polling)
ADC Operations Shifts
ATLAS Distributed Computing Operations Shifts (ADCoS)
World-wide shifts
To monitor all ATLAS distributed computing resources
To provide Quality of Service (QoS) for all data processing
Shifters receive official ATLAS service credit (OTSMoU)
Additional information
http://indico.cern.ch/conferenceDisplay.py?confId=22132
http://indico.cern.ch/conferenceDisplay.py?confId=26338
Typical Shift Plan
Browse recent shift history
Check performance of all sites
File tickets for new issues
Continue interactions about old issues
Check status of current tasks
Check all central processing tasks
Monitor analysis flow (not individual tasks)
Overall data movement
File software (validation) bug reports
Check Panda, DDM health
Maintain elog of shift activities
Shift Structure
Shifter on call
  Two consecutive days
  Monitor – escalate – follow up
  Basic manual interventions (site on/off)
Expert on call
  One week duration
  Global monitoring
  Advises the shifter on call
  Major interventions (service on/off)
  Interacts with other ADC operations teams
  Provides feedback to ADC development teams
Tier 1 expert on call
  Very important (e.g. Rod Walker, Graeme Stewart, Eric Lancon, …)
Shift Structure Schematic
by Xavier Espinal
ADC Inter-relations
Diagram: ADCoS and its relations to the ADC activity areas and their coordinators:
  Production – Alex Read
  Central Services – Birger Koblitz
  Operations Support – Pavel Nevski
  Tier 1 / Tier 2 – Simone Campana
  Tier 0 – Armin Nairz
  DDM – Stephane Jezequel
  Distributed Analysis – Dietrich Liko