Review of WLCG Tier-2 Workshop
Duncan Rand
Royal Holloway, University of London
Brunel University
... from the perspective of a Tier-2 system manager
Workshop: 3 days of lectures from the experiments
Tutorials: 2 days, parallel programme
Lots of talks with lots of detail!
This is a general overview – refer to the original slides for details
Oriented towards ATLAS (RHUL) and CMS (Brunel)
What did I expect?
An overview of the future
the big picture
more details about the experiments
data flows and rates
how were they going to use the Tier-2 sites?
what did they expect from us?
Perhaps a tour of the LHC or an experiment
What do the experiments have in common?
Large volume of data to analyse (we knew that)
Need to distribute data to CPUs, keep track of it, analyse it and upload the results
However, also need to run lots of Monte Carlo (MC) jobs
common to all particle physics experiments
a large fraction of all jobs run (ATLAS: 1/3; CMS: 1/2)
submitted from a central server – 'production'
explains the mysterious 'prd' users, e.g. lhcbprd, currently running on our Tier-2
What do they do in Monte Carlo production?
Start with a small dataset (kB) of initial conditions describing the experiment
Model the experiment from collision to analysis
Model proton-proton interactions, detector physics, etc.
CPU intensive: about 10 kSI2k hours
Upload the larger output dataset to the Tier-1 at the end
Relatively low network demands: a steady data flow from Tier-2 to Tier-1 of about 50 Mbit/s (varies for each experiment)
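As a rough cross-check of that figure, a minimal back-of-the-envelope sketch in Python; the number of concurrent jobs, the output size per job and the wall time are illustrative assumptions, not numbers from the workshop, and the ~10 kSI2k hours quoted above is taken to be per job.

# Rough estimate of the sustained Tier-2 -> Tier-1 rate from MC production.
concurrent_jobs = 200      # assumed number of MC jobs running at the site
output_gb_per_job = 1.0    # assumed output uploaded per job, in GB
walltime_hours = 10.0      # assumed wall time for a ~10 kSI2k-hour job on a ~1 kSI2k core

gb_per_hour = concurrent_jobs * output_gb_per_job / walltime_hours
mbit_per_s = gb_per_hour * 8000 / 3600   # 1 GB = 8000 Mbit
print(f"~{mbit_per_s:.0f} Mbit/s sustained")   # ~44 Mbit/s, the same ballpark as the quoted ~50 Mbit/s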
Data Management
Data is immediately transferred from the Tier-0 to the Tier-1s for backup
RAW data is first calibrated and reconstructed to give Event
Summary Data (ESD) and Analysis Object Data (AOD) suitable
for analysis
AOD datasets are transferred to Tier-2s for analysis – ‘bursty’, depending on user needs, ~300 Mbit/s (varies for each experiment)
Tier-1s will provide reliable storage of data
Tier-2s act more like a dynamic cache
Tier-1s handle more or less of the essential services, such as file catalogues, FTS, etc.
Computing
Experiments have developed complex software tools to:
handle all this data transfer and keep track of datasets
(CMS: PhEDEx, ATLAS: DDM)
handle submission of MC production
(CMS: ProdManager/ProdAgent)
direct jobs to where the datasets are
enable a physicist in their office to carry out ‘chaotic user analysis’ (‘chaotic’ describes not their mode of work but the lack of central job submission) (CMS: CRAB)
these tools place greater or lesser demands on a site
ALICE
ALICE is not highly relevant to the UK, as it is only supported by Birmingham at the Tier-2 level
Distinction between Tier-1 and Tier-2 is by Quality of Service
Requires an extra VO box installed at a site; unlikely to use non-ALICE Tier-2s opportunistically?
Developing ‘Parallel ROOT Facility’ (PROOF) clusters at Tier-2s for faster interactive data analysis
LHCb
Not going to use Tier-2s for analysis of data – analysis will be concentrated at the Tier-1s
Only going to run Monte Carlo jobs at Tier-2s
Simplifies data transfer requirements at the Tier-2 level
So LHCb is the easiest experiment for a Tier-2 to support
Low networking demands: 40 Mbit/s aggregated over all Tier-2s
UKI-LT2-Brunel (100 Mbit/s link) was recently in the top 10 providers of LHCb Monte Carlo
ATLAS
Tier-2's provide 40% of total computing and storage
requirements
Hierarchical structure between Tier-1's and Tier-2's
a Tier-1 provides services (FTS, LFC) to group of Tier-2's
no extra services required at Tier-2 level
Tier-2's will carry out MC simulations - results sent back to
Tier-1's for storage and further distribution and processing –
steady 30Mbit/s from site
AOD (analysis object data) will be distributed to Tier-2's for
analysis: 160Mbit/s to site
SC4: how long to analyse 150TB data equivalent to 1 year
running of LHC?
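A rough answer to the SC4 question, counting only the time needed to move 150 TB into a site at the 160 Mbit/s quoted above; CPU time for the analysis itself is ignored and 1 TB is taken as 10^12 bytes.

# Transfer time for 150 TB at a sustained 160 Mbit/s.
data_tb = 150
rate_mbit_s = 160

seconds = data_tb * 1e12 * 8 / (rate_mbit_s * 1e6)
print(f"~{seconds / 86400:.0f} days")   # roughly 87 days of continuous transfer

So at these rates the data movement alone takes of order three months, before any CPU time is spent.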
CMS
CPU-intensive processing is mostly carried out at Tier-2s
Tier-2s run 50% MC and 50% analysis jobs
MC production jobs handled by central queue called
‘ProductionManager’
submit, track jobs and register output in CMS databases
jobs handed to ProductionAgents for processing
MC job output does not go from the WN to the Tier-1 directly
data is stored locally and the small files are merged together by new jobs (heavy I/O)
the large merged file (~TB) is returned to the Tier-1
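To put the heavy I/O in perspective, a minimal sketch: merging the small files into the quoted ~TB output means reading roughly that volume from local storage and writing it back once. The 1 Gbit/s effective worker-node-to-SE bandwidth used here is an assumed figure, not one from the workshop.

# Pure I/O time for the merge step at an assumed WN <-> SE bandwidth.
merged_tb = 1.0        # ~TB merged output quoted in the slides
lan_gbit_s = 1.0       # assumed effective WN <-> SE bandwidth

bytes_moved = 2 * merged_tb * 1e12                  # read once + write once
hours = bytes_moved * 8 / (lan_gbit_s * 1e9) / 3600
print(f"~{hours:.1f} h of pure I/O")                # ~4.4 h; longer on a congested LAN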
CMS
Good LAN bandwidth from the WNs to the SE is important for this merging of files
Use ‘CRAB’ (CMS Remote Analysis Builder) at a UI to
analyse data
User specifies dataset
CRAB ‘discovers’ the data, prepares the job and submits it (a toy sketch of this pattern follows at the end of this slide)
‘Surviving the first years’: until the detector is understood, AODs will not be that useful – CMS will rely heavily on raw data, with large networking demands
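The toy Python sketch referred to above: the catalogue, dataset path, site names and the submit step are all stand-ins for illustration, not the real CRAB or DBS interfaces.

# Toy sketch: look up where a dataset lives, split it into jobs, and send
# each job to a site that already hosts the data.
CATALOGUE = {  # stand-in data catalogue: dataset -> list of (site, events) blocks
    "/Example/MC/AOD": [("T2_SITE_A", 200000), ("T2_SITE_B", 150000)],
}

def submit(site, first_event, n_events, pset):
    # stand-in for grid submission; just report what would go where
    print(f"submit {pset}: events {first_event}-{first_event + n_events - 1} -> {site}")

def run_analysis(dataset, pset, events_per_job=50000):
    for site, total in CATALOGUE[dataset]:          # jobs follow the data
        for first in range(0, total, events_per_job):
            submit(site, first, min(events_per_job, total - first), pset)

run_analysis("/Example/MC/AOD", "myAnalysis_cfg.py")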
CMS: requirements of Tier-2 site
Division of labour
CMS looks after global issues
the Tier-2 looks after local issues to keep the site running
What is required:
a good batch farm with reliable storage
good LAN and WAN networking
install PhEDEx, LFC and Squid cache (calibration data)
pass Site Functional Tests
a good Tier-2 is ‘active, responsive, attentive, proactive’
Support and operations afternoon
Discovered that
WLCG = EGEE + OSG
i.e. we are now working more closely with the US Open
Science Grid
OSG is not too relevant for the average Tier-2 sysadmin in the UK
UKI ROC meeting
Small room, face-to-face meeting – lots of discussion
Grumbles about GGUS tickets and the time taken to close a solved ticket
Close it yourself: add ‘status=solved’ to the first line of the reply
Highlighted for me the somewhat one-directional flow of
information in the workshop itself
It would have been good for Tier-2s to have been able to present at the workshop
Middleware tutorials
Popular – lots of discussion
Understandable, given that Tier-2 system admins are more interested in the middleware than in the experiments' computing models
Good to be able to hear the roadmap for LFC, DPM, FTS, SFTs, etc. from the middleware developers and to ask questions
Tier-2 interaction
There didn't appear to be much interaction between Tier-2s
Lack of name badges?
A missed chance to find out how others do things
Michel Jouvin from GRIF (Paris) gave a summary of his survey of Tier-2s
large variation in resources between Tier-2s
1 to 8 sites per Tier-2; 1 to 13 FTE!
What is the difference between distributed and federated Tier-2s?
Post-workshop survey excellent idea
Providing a Service
We are the users and customers of the middleware
Tier-2s are providing a service for the experiments
CMS: ‘Your customers will be remote users’
Tier-2s need to develop a customer-service mentality
Need good communication paths to ensure this works well
CMS has VRVS integration meetings and an email list – sounds promising
Not very clear how other experiments will communicate proactively
Summary
Learnt a lot about how the experiments intend to use Tier-2s
It is pretty clear what they need from Tier-2 sites
There could have been more feedback from Tier-2s
There could have been more interaction between Tier-2s
Tier-2s are critical to the success of the LHC: a service mentality is needed
Communication between the experiments and Tier-2s is still unclear
The LHC juggernaut is changing up a gear!