CMS Computing Plans
Ian Willers
CERN
Thank you to David Newbold and David Stickland for some borrowed slides
Or: how CMS will process the data produced by the experiment
[Photo: Test Insertion of Vacuum Tube]
The Technical Design Report
• Describes the computing system for the CMS start-up in 2007/8.
• Authors: The CMS Collaboration
• Includes list of people who participated in the Computing Project
• Includes
  – Computing Model
  – Computing Systems and their Tiered Structure
  – CMS Computing Services and System Operation
Time Line
1996: Technical Proposal for CMS Computing – introducing object-oriented software for applications, simulation, databases etc.
2000-1: Hoffmann Review – worldwide analysis and computing model, software, management and resources
2001: Proposal for Building the LHC Computing Environment – funds and manpower request presented to Council
2001: LHC Computing Grid Project – Grid plus software process, infrastructure, libraries, persistency, interfaces, simulation …
2004: LHC Worldwide Computing Grid – Memorandum of Understanding, Maintenance and Operation Addendum
2004: Computing Model – Computing Project initiated as part of reorganisation
2005: The Computing Project Technical Design Report covers 2005 onwards
LHC Running Scenario
• Main focus is the first major LHC run (2008)
• Beam time in 2008-9 may be 40% of this figure
• These are peak luminosities
• We expect to control the trigger mix/rates during the fill such that the change does not affect the total rate to tape (see the sketch below)
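As a minimal sketch of that last point, the following Python snippet shows one way the idea could work. The trigger path names, the 10-hour exponential luminosity decay, the prescale mechanism and the lumping of the remaining ~45 Hz into an "other_streams" entry are all illustrative assumptions, not CMS numbers; only the 200 Hz total and the 105 Hz / +50 Hz figures quoted on the next slide are taken from this talk.

# Illustrative only: keep the total trigger rate to tape constant during a fill
# by relaxing prescales on flexible paths as the instantaneous luminosity decays.
import math

TARGET_RATE_HZ = 200.0     # total rate to tape (from the event-rate slide)
LUMI_PEAK = 2e33           # cm^-2 s^-1, first full physics run
TAU_HOURS = 10.0           # assumed luminosity decay constant during a fill

# Hypothetical trigger menu: rate at peak luminosity and whether the path may be prescaled.
menu = {
    "discovery_and_calibration": {"rate_at_peak": 105.0, "prescalable": False},
    "standard_model":            {"rate_at_peak": 50.0,  "prescalable": True},
    "other_streams":             {"rate_at_peak": 45.0,  "prescalable": True},
}

def rates_during_fill(hours_into_fill):
    """Per-path rates, relaxing prescales on flexible paths as luminosity falls."""
    lumi = LUMI_PEAK * math.exp(-hours_into_fill / TAU_HOURS)
    scale = lumi / LUMI_PEAK                      # raw rates roughly track luminosity
    raw = {name: cfg["rate_at_peak"] * scale for name, cfg in menu.items()}
    fixed = sum(r for name, r in raw.items() if not menu[name]["prescalable"])
    flexible = sum(r for name, r in raw.items() if menu[name]["prescalable"])
    boost = max(1.0, (TARGET_RATE_HZ - fixed) / flexible) if flexible else 1.0
    return {name: (r * boost if menu[name]["prescalable"] else r)
            for name, r in raw.items()}

for h in (0, 5, 10):
    total = sum(rates_during_fill(h).values())
    print(f"{h:2d} h into fill: total rate to tape ~ {total:.0f} Hz")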
“Baseline” and “Average”
• In the Computing TDR we discuss an initial baseline
  – Best understanding of what we expect to be possible
  – We will adjust to take account of any faster-than-expected developments in, for example, grid middleware functionality
  – Like all such battle plans, it may not survive unscathed the first engagement with the enemy…
• We calculate specifications for “Average” centres
  – Tier-1 centres will come in a range of actual capacities (available to CMS)
    • Sharing with other experiments…
  – Tier-2 centres will also cover a range of perhaps 0.5-1.5 times the average values
    • And will probably be focused on some particular activities (Calibration, Heavy-Ion, …) that will also break this symmetry in reality
Event Sizes and Rates
• Raw data size is estimated to be 1.5 MB per event at 2×10^33 cm^-2 s^-1 for the first full physics run
  – Hard to deduce when the event size will fall and how that will be compensated by increasing luminosity
• Event rate is estimated to be up to 200 Hz at 2×10^33 cm^-2 s^-1 for the first full physics run (a back-of-envelope throughput estimate follows below)
  – Minimum rate for discovery physics and calibration: 105 Hz (DAQ TDR)
  – Standard Model (jets, hadronic, top, …): +50 Hz
• An LHCC study in 2002 showed that ATLAS/CMS have ~same rates for same thresholds and physics reach
[Diagram: CMS DAQ 'front' view – Detector Frontend (FEDs 1-512), DAQ links, 8x8 Builder, Readout Units (RU 1-64), Level-1 Trigger, Event Manager, Controls (CS), Builder Networks (64x64), Builder Units (BU 1-64), Filter Farm Network (64xn), Filter Sub-farms (FU)]
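A back-of-envelope sketch of what these numbers imply. Only the 1.5 MB event size and 200 Hz rate come from the slide above; the ~10^7 seconds of effective data taking per year is an assumed figure, not from the slides.

# Back-of-envelope throughput from the quoted CMS numbers (illustrative).
RAW_EVENT_SIZE_MB = 1.5        # RAW event size from the slide
EVENT_RATE_HZ = 200            # maximum rate to tape from the slide
LIVE_SECONDS_PER_YEAR = 1e7    # assumed effective data-taking time per year

rate_mb_per_s = RAW_EVENT_SIZE_MB * EVENT_RATE_HZ            # ~300 MB/s out of the HLT
events_per_year = EVENT_RATE_HZ * LIVE_SECONDS_PER_YEAR      # ~2e9 events
raw_volume_pb = rate_mb_per_s * LIVE_SECONDS_PER_YEAR / 1e9  # MB -> PB

print(f"RAW rate to tape: {rate_mb_per_s:.0f} MB/s")
print(f"Events per year : {events_per_year:.1e}")
print(f"RAW volume/year : {raw_volume_pb:.1f} PB")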
Event Data Formats
• RAW
  – 1.5 MB
  – One copy spread over T1 centres and one at the Tier-0
  – Used for detector understanding, code optimization, calibrations, …
• RECO
  – 250 kB
  – Reconstructed hits, reconstructed objects (tracks, vertices, jets, electrons, muons, etc.)
  – Track re-finding and re-fitting, new MET and clustering
  – One copy spread over T1 centres (together with the associated RAW)
    • More copies possible for smaller/hot datasets
  – Used by all early analysis, and by some detailed analyses
• AOD
  – 50 kB
  – Reconstructed objects (tracks, vertices, jets, electrons, muons, etc.)
  – Possibly small quantities of very localized hit information
  – All streams at every T1 centre, many streams at T2 centres
  – Used eventually by most physics analysis (rough yearly volumes per format are sketched below)
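A minimal follow-on sketch turning the per-event sizes above into rough yearly volumes. The 200 Hz rate and the sizes are from the slides; the ~10^7 live seconds per year and the replica counts (two copies of RAW and RECO as described above, and an AOD copy at each of an assumed 7 Tier-1 centres) are illustrative assumptions.

# Illustrative yearly storage per event format from the quoted per-event sizes.
EVENT_RATE_HZ = 200
LIVE_SECONDS_PER_YEAR = 1e7          # assumed effective data-taking time per year

formats = {
    # name: (size per event in kB, assumed number of copies kept worldwide)
    "RAW":  (1500, 2),   # one copy across the Tier-1s plus one at the Tier-0
    "RECO": (250,  2),   # copy across the Tier-1s plus the Tier-0 tape archive
    "AOD":  (50,   7),   # one copy at each Tier-1 (7 taken as an example count)
}

events_per_year = EVENT_RATE_HZ * LIVE_SECONDS_PER_YEAR
for name, (size_kb, copies) in formats.items():
    volume_pb = events_per_year * size_kb * copies / 1e12    # kB -> PB
    print(f"{name:4s}: {volume_pb:4.1f} PB/year ({copies} copies)")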
Prioritisation
• The CMS Computing Model uses strong streaming from the Online/HLT and throughout the reconstruction and analysis process
  – Priorities can be applied at all processing and data management levels (a minimal sketch follows below)
  – Finding the data for a given analysis is much more straightforward
  – Reducing the number of streams would be trivial (technically)
    • Increasing from one is much harder
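A minimal sketch of priority handling over streamed data, with hypothetical primary-dataset names, priorities and actions (none of them from the slides): each primary dataset carries a priority, and pending processing or transfer tasks are simply worked through in that order.

# Minimal illustration of priority-ordered handling of streamed primary datasets.
# Dataset names, priorities and actions are hypothetical.
import heapq

pending = [                     # (priority, primary dataset, pending action); lower = more urgent
    (1, "express-calibration", "reconstruct immediately at Tier-0"),
    (2, "discovery-physics",   "reconstruct"),
    (3, "standard-model",      "reconstruct"),
    (4, "backup-overflow",     "archive only"),
]

heapq.heapify(pending)          # order the work queue by dataset priority
while pending:
    priority, dataset, action = heapq.heappop(pending)
    print(f"priority {priority}: {dataset} -> {action}")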
Tiered Architecture and Data Flow
Tier-0 Operations
• Online streams arrive in a 20-day input buffer (the workflow is sketched below)
  – They are split into Primary Datasets that are concatenated to form reasonable file sizes
  – Primary Dataset RAW data is:
    • archived to tape at the Tier-0
    • sent to reconstruction nodes in the Tier-0
• The resulting RECO data is concatenated (zipped) with the matching RAW data to form a distributable format, FEVT (Full Event)
  – RECO data is archived to tape at the Tier-0
  – FEVT is distributed to Tier-1 centres (Tier-1s subscribe to the data, which is then actively pushed)
  – An AOD copy is sent to each Tier-1 centre
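A minimal sketch of the Tier-0 chain described above; all function and dataset names are hypothetical placeholders, not the actual CMS software. Events from the input buffer are split into primary datasets, the RAW is archived and reconstructed, the RECO is packaged with the RAW as FEVT, and FEVT plus AOD go to the subscribed Tier-1 centres.

# Sketch of the Tier-0 workflow described on this slide; all names are hypothetical.
from collections import defaultdict

def split_into_primary_datasets(online_stream):
    """Group buffered events into primary datasets by their trigger classification."""
    datasets = defaultdict(list)
    for event in online_stream:
        datasets[event["primary_dataset"]].append(event)
    return datasets

def process_at_tier0(online_stream, tier1_subscribers):
    for name, raw_events in split_into_primary_datasets(online_stream).items():
        archive_to_tape(name + "/RAW", raw_events)       # RAW to the Tier-0 tape archive
        reco_events = reconstruct(raw_events)            # prompt reconstruction
        archive_to_tape(name + "/RECO", reco_events)     # RECO also archived at the Tier-0
        fevt = list(zip(raw_events, reco_events))        # RAW+RECO packaged as FEVT
        aod = [make_aod(r) for r in reco_events]         # compact analysis format
        for tier1 in tier1_subscribers.get(name, []):    # push to subscribed Tier-1 centres
            tier1.receive(name, fevt, aod)

# Trivial placeholders so the sketch runs; the real system is far richer.
def archive_to_tape(label, events): print(f"archived {len(events)} events as {label}")
def reconstruct(raw_events): return [{"reco_of": e["id"]} for e in raw_events]
def make_aod(reco_event): return {"aod_of": reco_event["reco_of"]}

example_buffer = [{"primary_dataset": "muons", "id": i} for i in range(3)]
process_at_tier0(example_buffer, tier1_subscribers={})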
Tier-1 Centres
• Centres with declared resource intentions are:
  – ASCC (Taiwan)
  – CCIN2P3 (France)
  – CERN CAF
  – FNAL (USA)
  – GridKA (Germany)
  – INFN-CNAF (Italy)
  – PIC (Spain)
  – RAL (UK)
• And additional statements of intent to provide Tier-1 resources from: China (IHEP), Korea (CHEP), Nordic countries
Tier-1 Operations
• Receive custodial data (FEVT and AOD)
  – Current dataset “on disk”
  – Other bulk data mostly on tape, with a disk cache for staging (a minimal staging sketch follows below)
• Receive reconstructed simulated events from Tier-2 centres
  – Archive them; distribute the AOD for simulated data to all other Tier-1 sites
• Serve current data to analysis groups running selections, skims and test reprocessing
  – Most analysis products are sent to Tier-2 centres for iterative analysis work
• Run official reconstruction passes on local RAW/RECO and simulated data
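A minimal sketch of the “disk cache for staging” point above. The cache size, dataset names and the least-recently-used eviction policy are illustrative assumptions; the slide only says that the current dataset stays on disk while other bulk data sits on tape behind a disk cache.

# Illustrative disk-cache-in-front-of-tape staging for Tier-1 bulk data.
# Cache size, dataset names, sizes and the LRU policy are hypothetical.
from collections import OrderedDict

class StagingCache:
    def __init__(self, capacity_tb, pinned):
        self.capacity_tb = capacity_tb
        self.pinned = set(pinned)          # e.g. the current dataset, always kept on disk
        self.on_disk = OrderedDict()       # dataset -> size in TB, in LRU order

    def read(self, dataset, size_tb):
        if dataset in self.on_disk:
            self.on_disk.move_to_end(dataset)          # refresh its LRU position
            return "served from disk"
        # Stage in from tape, evicting unpinned datasets if the cache is full.
        while sum(self.on_disk.values()) + size_tb > self.capacity_tb:
            victim = next((d for d in self.on_disk if d not in self.pinned), None)
            if victim is None:
                return "served directly from tape (cache full of pinned data)"
            self.on_disk.pop(victim)
        self.on_disk[dataset] = size_tb
        return "staged from tape to disk"

cache = StagingCache(capacity_tb=100, pinned={"current-run-FEVT"})
cache.read("current-run-FEVT", 60)
print(cache.read("2007-reprocessed-RECO", 30))   # staged from tape to disk
print(cache.read("current-run-FEVT", 60))        # served from disk (pinned)
print(cache.read("old-simulation-AOD", 30))      # evicts the 2007 dataset, then stages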
Tier-2 Centres
• Tier-2 resource contributions are declared in an MoU
• Each Tier-2 is associated with a particular Tier-1 centre
• We foresee three types of use of Tier-2 resources:
  – Local community use. Some fraction of the Tier-2 centre resources will be fully under the control of their owners
  – CMS-controlled use. Tier-2 resources will also be used for organised activities allocated top-down by CMS (in consultation with the local owners)
  – Opportunistic use by any CMS member
Tier-3 Centres
• Tier-3 sites are (often relatively small) computing installations that serve the needs of the local institution’s user community, and provide services and resources to CMS in a mostly opportunistic way
  – The facilities at these sites do not form part of the negotiated baseline resource for CMS, but can make a potentially significant contribution to the experiment’s needs on a best-effort basis
• Tier-3s are not covered by CMS MoUs, do not have direct managerial communication with CMS and are not guaranteed direct support from the CMS computing project
  – Nevertheless, Tier-3 sites are an important component of the analysis capability of CMS as they provide a location and resources for a given institute to perform its work with substantial freedom of action
• Within the WLCG, Tier-3 sites are expected to participate in CMS computing by coordinating with specific Tier-2 centres, for example for Monte Carlo generation
CMS-CAF: CERN Analysis Facility
• Activities directly coupled to the operations and performance of the CMS detector
  – Diagnostics of detector problems
  – Trigger performance services such as reconfiguration, optimization and the testing of new algorithms
  – Calibrations required by the high-level trigger or the initial reconstruction pass
• These activities will have the highest priority at the CAF and will take priority over all other activities
• Support for the analysis of CMS-wide and CERN-based users
Additional Services at CMS-CAF
• Recording and bookkeeping services
• Provide the central information repository for data management
• Store conditions and calibration data
• Provide the main software repositories
• Provide the documentation repositories
• And other services associated with all Tier-1 centres
  – Small database services
  – Local file catalogue
  – User Interface systems
  – Gateway systems
Users and the CMS-CAF
• Detector performance and operations take precedence, but also…
• The CAF will provide a major analysis facility for CMS
  – This facility will be accessible to all CMS users, and all individual users will have equal priority access to the facility
  – All users will have interactive access for code development purposes, be able to submit analysis jobs to batch queues and have disk storage for processed data available for personal use
  – It is anticipated, however, that users with local-owner or local-user access at Tier-2 centres will use these in preference to the CMS-CAF facilities
• CERN-based users
Services Overview
Basic Distributed Workflow
Resource Profile
• 2007
  – 50 days, 1/2 computing, 1/3 disk
• 2008 as Computing Model
• 2009
  – Event size down
  – Analysis sample doubles
  – 1 full reprocessing pass
• 2010
  – 5 times processing time
  – T0 not in real time, needs full 200 days
Project Phases
• Computing support for Physics TDR (-> Spring ‘06)
– Core software framework, large scale production & analysis
• Cosmic Challenge (Autumn ‘05 -> Spring ‘06)
– First test of data-taking workflows
– Data management, non-event data handling
• Service Challenges (2005 - 06)
– Exercise computing services together with WLCG + centres
– System scale: 50% of single experiment’s needs in 2007
• Computing, Software, Analysis (CSA) Challenge (2006)
– Ensure readiness of software + computing systems for data
– 10M’s of events through the entire system (incl. T2)
• Commissioning of computing system (2006 - 2009)
– Steady ramp up of computing system to full-luminosity running.
Results of Service Challenge 3
Site                            Daily average (MB/s)
ASCC, Taiwan                    10
BNL, US                         107
FNAL, US                        185
GridKA, Germany                 42
CC-IN2P3, France                40
CNAF, Italy                     50
NDGF, Nordic                    129
PIC, Spain                      54
RAL, UK                         52
SARA/NIKHEF, The Netherlands    111
TRIUMF, Canada                  34
> 150 MB/s
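A small sketch that just sums the daily-average rates in the table above, to show the aggregate throughput they represent; the dictionary repeats the table’s numbers, and the TB/day figure assumes the rate is sustained over a full day.

# Aggregate the SC3 daily-average transfer rates listed in the table above.
sc3_daily_average_mb_s = {
    "ASCC, Taiwan": 10, "BNL, US": 107, "FNAL, US": 185, "GridKA, Germany": 42,
    "CC-IN2P3, France": 40, "CNAF, Italy": 50, "NDGF, Nordic": 129, "PIC, Spain": 54,
    "RAL, UK": 52, "SARA/NIKHEF, The Netherlands": 111, "TRIUMF, Canada": 34,
}

total = sum(sc3_daily_average_mb_s.values())
print(f"Aggregate daily-average rate: {total} MB/s "
      f"(~{total * 86400 / 1e6:.1f} TB/day across {len(sc3_daily_average_mb_s)} sites)")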
Summary
• CMS has a computing model based on design goals of scalability and flexibility
• We have utilized features such as streaming to allow us to optimize the use of the available resources and to allow tuned responses to actual conditions
• Our initial computing model is designed to allow us to succeed in analyzing the data according to the experiment priorities without requiring significant functionality improvements for middleware and distributed computing
• We rely on LHC computing using Grid capabilities