TIER 1 in Dubna - GRID and Advanced Information Systems


Tier 1 in Dubna for CMS: plans and prospects
Korenkov Vladimir
LIT, JINR
AIS-GRID School, 25 April 2013

[Diagram: Tier 0 at CERN - acquisition, first-pass reconstruction, storage & distribution; 1.25 GB/sec (ions)]
Tier Structure of GRID Distributed Computing: Tier-0/Tier-1/Tier-2
Tier-0 (CERN):
• accepts data from the CMS Online Data Acquisition and Trigger System
• archives RAW data
• performs the first pass of reconstruction and prompt calibration
• distributes data to Tier-1

Tier-1 (11 centers):
• receives data from Tier-0
• data processing (re-reconstruction, skimming, calibration, etc.)
• distributes data and MC to the other Tier-1 and Tier-2 centers
• secure storage and redistribution of data and MC

Tier-2 (>200 centers):
• simulation
• user physics analysis
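As a compact illustration of this division of labour, the sketch below (hypothetical dataset label, not CMS software) walks one dataset through the responsibilities listed above:

```python
# Sketch of the Tier-0/Tier-1/Tier-2 division of labour listed above.
# Tier names and tasks follow the slide; the dataset label is hypothetical.
TIER_TASKS = {
    "Tier-0 (CERN)": [
        "accept data from the online DAQ and trigger system",
        "archive RAW data",
        "first-pass reconstruction and prompt calibration",
        "distribute data to Tier-1",
    ],
    "Tier-1 (11 centers)": [
        "receive data from Tier-0",
        "re-reconstruction, skimming, calibration",
        "distribute data and MC to other Tier-1 and Tier-2",
        "secure storage and redistribution of data and MC",
    ],
    "Tier-2 (>200 centers)": [
        "simulation",
        "user physics analysis",
    ],
}

def trace(dataset: str) -> None:
    """Print the steps a dataset goes through at each tier."""
    for tier, tasks in TIER_TASKS.items():
        for task in tasks:
            print(f"{tier}: {task} [{dataset}]")

trace("RAW-2012-example")  # hypothetical dataset label
```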
Wigner Data Centre, Budapest
• New facility due to be ready at the end of 2012
• 1100 m² (725 m²) in an existing building, but new infrastructure
• 2 independent HV lines
• Full UPS and diesel coverage for all IT load (and cooling)
• Maximum 2.7 MW
(Slide from I. Bird (CERN, WLCG), presented at GRID2012 in Dubna)
WLCG Grid Sites
[Map of WLCG sites: Tier 0, Tier 1, Tier 2]
• Today >150 sites
• >300k CPU cores
• >250 PB disk
Russian Data Intensive Grid infrastructure (RDIG)
The Russian consortium RDIG (Russian Data Intensive Grid) was set up in September 2003 as a national federation in the EGEE project.
Now the RDIG infrastructure comprises 17 Resource Centers with >20,000 kSI2K of CPU and >4,500 TB of disk storage.
RDIG Resource Centres:
– ITEP
– JINR-LCG2 (Dubna)
– RRC-KI
– RU-Moscow-KIAM
– RU-Phys-SPbSU
– RU-Protvino-IHEP
– RU-SPbSU
– Ru-Troitsk-INR
– ru-IMPB-LCG2
– ru-Moscow-FIAN
– ru-Moscow-MEPHI
– ru-PNPI-LCG2 (Gatchina)
– ru-Moscow-SINP
– Kharkov-KIPT (UA)
– BY-NCPHEP (Minsk)
– UA-KNU
Country Normalized CPU time (2012-2013)

                 Normalized CPU time       Jobs
All countries    19,416,532,244            726,441,731
Russia           410,317,672 (2.12%)       23,541,182 (3.24%)
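The quoted percentages can be checked directly from the absolute numbers in the table; a quick verification:

```python
# Recompute Russia's share of the 2012-2013 totals quoted above.
total_cpu, total_jobs = 19_416_532_244, 726_441_731   # all countries
ru_cpu, ru_jobs = 410_317_672, 23_541_182             # Russia

print(f"CPU time share: {ru_cpu / total_cpu:.2%}")    # ~2.11% (quoted as 2.12%)
print(f"Job share:      {ru_jobs / total_jobs:.2%}")  # 3.24%
```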
Country Normalized CPU time per VO (2012-2013)
Russia Normalized CPU time per SITE and VO (2012-2013)

All VOs:  Russia 409,249,900;  JINR 183,008,044
CMS:      Russia 112,025,416;  JINR 67,938,700 (61%)
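The 61% figure follows from the numbers above; the same arithmetic also gives JINR's share across all VOs (about 45%, not stated explicitly on the slide):

```python
# JINR's share of the Russian normalized CPU time, 2012-2013.
ru_all_vo, jinr_all_vo = 409_249_900, 183_008_044
ru_cms, jinr_cms = 112_025_416, 67_938_700

print(f"All VOs: {jinr_all_vo / ru_all_vo:.0%}")  # ~45%
print(f"CMS:     {jinr_cms / ru_cms:.0%}")        # ~61%, as quoted
```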
Frames for Grid cooperation with CERN
• 2001: EU DataGrid
• Worldwide LHC Computing Grid (WLCG)
• 2004: Enabling Grids for E-sciencE (EGEE)
• EGI-InSPIRE
• CERN-RFBR project "Grid Monitoring from VO perspective"
• Collaboration in the area of WLCG monitoring

WLCG today includes more than 170 computing centers, where more than 2 million jobs are executed daily and petabytes of data are transferred between sites. Monitoring of the LHC computing activities and of the health and performance of the distributed sites and services is a vital condition for the success of LHC data processing.
• WLCG Transfer Dashboard
• Monitoring of the XRootD federations
• WLCG Google Earth Dashboard
• Tier3 monitoring toolkit
JINR-LCG2 Tier2 site
• Provides the largest share of the Russian Data Intensive Grid (RDIG) contribution to the global WLCG/EGEE/EGI Grid infrastructure: JINR provided 46% of the overall RDIG computing time contributed to LHC tasks.
• During 2012, the CICC ran more than 7.4 million jobs; the overall CPU time spent exceeded 152 million hours (in HEPSpec06 units).
• Presently, the CICC computing cluster comprises 2582 64-bit processors and a data storage system of 1800 TB total capacity.
WLCG Tier1 center in Russia
• Proposal to create an LCG Tier1 center in Russia (an official letter from the Minister of Science and Education of Russia, A. Fursenko, was sent to CERN DG R. Heuer in March 2011).
• The corresponding item was put on the agenda of the next Russia-CERN 5x5 meeting (October 2011):
  – for all four experiments: ALICE, ATLAS, CMS and LHCb
  – ~10% of the total Tier1 resources (without CERN)
  – increase by 30% each year (see the sketch after this list)
  – draft planning (proposal under discussion): a prototype by the end of 2012 and full resources in 2014, to meet the start of the next LHC run
• Discussion about a distributed Tier1 in Russia for LHC and FAIR
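A short sketch of the "+30% per year" resource plan mentioned above; the starting capacity is an arbitrary value used only to show the compounding:

```python
# Compound growth at +30% per year, as in the resource plan above.
capacity = 100.0  # arbitrary starting capacity (illustration only)
for year in range(1, 4):
    capacity *= 1.30
    print(f"after year {year}: {capacity:.0f}")  # 130, 169, 220 (x2.2 in 3 years)
```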
Joint NRC "Kurchatov Institute" – JINR
Tier1 Computing Centre
Project: «Creation of the automated system of data processing for
experiments at the Large Hadron Collider (LHC) of Tier1 level and
maintenance of Grid-services for a distributed analysis of these data»
Terms: 2011-2013
Type of project: R&D
Cost: federal budget - 280 million rubles (~8.5 MCHF), extrabudgetary sources - 50%
of the total cost
Leading executor: RRC KI «Kurchatov institute» for ALICE, ATLAS, and LHC-B
Co-executor: LIT JINR (Dubna) for the CMS experiment
Project goal: creation in Russia of a computer-based system for processing
experimental data received at the LHC and provision of Grid-services
for a subsequent analysis of these data at the distributed centers of the
LHC global Grid- system.
Core of the proposal: development and creation of a working prototype of the firstlevel center for data processing within the LHC experiments with a
resource volume not less than 15% of the required one and a full set
of grid-services for a subsequent distributed analysis of these data.
13
The Core of LHC Networking: LHCOPN and Partners
JINR CMS Tier-1 progress

Resource            2012 (done)   2013    2014
CPU (HEPSpec06)     14400         28800   57600
Number of cores     1200          2400    4800
Disk (Terabytes)    720           3500    4500
Tape (Terabytes)    72            5700    8000
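The CPU plan in the table corresponds to a constant 12 HEPSpec06 per core, with the core count doubling each year; a quick check:

```python
# Check the CPU plan in the table above: HEPSpec06 per core and yearly growth.
plan = {  # year: (HEPSpec06, cores)
    2012: (14_400, 1_200),
    2013: (28_800, 2_400),
    2014: (57_600, 4_800),
}
for year, (hs06, cores) in plan.items():
    print(f"{year}: {hs06 / cores:.0f} HS06/core, {cores} cores")
```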
• Disk & server installation and tests: done
• Tape system installation: done
• Organization of network infrastructure and connectivity to CERN via GEANT: done
• Registration in GOC DB and APEL: done
• Tests of WLCG services via Nagios: done
CMS-specific activity
• Currently commissioning the Tier-1 resource for CMS:
  – Local tests of CMS VO-services and CMS SW
  – The PhEDEx LoadTest (tests of data transfer links)
  – Job Robot tests (or tests via HammerCloud)
  – Long-running CPU-intensive jobs
  – Long-running I/O-intensive jobs
• PhEDEx transferred RAW input data to our storage element with a transfer efficiency of around 90%
• Prepared services and data storage for the reprocessing of the 2012 8 TeV data
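The ~90% figure is a link (transfer) efficiency, i.e. the fraction of transfer attempts that succeed. A minimal sketch with hypothetical counts, not the actual PhEDEx accounting:

```python
# Link efficiency as the fraction of successful transfer attempts.
# The attempt counts are hypothetical, for illustration only.
successful, failed = 900, 100

efficiency = successful / (successful + failed)
print(f"link efficiency: {efficiency:.0%}")  # 90%, the level quoted above
```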
Services
• Security (GSI)
• Computing Element (CE)
• Storage Element (SE)
• Monitoring and Accounting
• Virtual Organizations (VOMS)
• Workload Management (WMS)
• Information Service (BDII)
• File Transfer Service (FTS + PhEDEx)
• SQUID Server
• CMS user services (Reconstruction Services, Analysis Services, etc.)
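Most of these services publish themselves through the BDII, so one simple commissioning check is an LDAP query against the site or resource BDII. A minimal sketch using the ldap3 Python library; the hostname is hypothetical, while port 2170, the base DN and the GLUE 1.3 attribute names follow the usual gLite conventions:

```python
# List the grid services a BDII publishes (GLUE 1.3 schema).
# The hostname is hypothetical; port 2170 and the base DN are gLite defaults.
from ldap3 import Server, Connection, SUBTREE

server = Server("bdii.example.org", port=2170)  # hypothetical BDII host
conn = Connection(server, auto_bind=True)       # anonymous bind

conn.search(
    search_base="mds-vo-name=resource,o=grid",
    search_filter="(objectClass=GlueService)",
    search_scope=SUBTREE,
    attributes=["GlueServiceType", "GlueServiceEndpoint"],
)
for entry in conn.entries:
    print(entry.GlueServiceType, entry.GlueServiceEndpoint)
```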
Milestones of the JINR CMS Tier-1 Deployment and Commissioning

Sep 2012  Presentation of the Execution Plan to the WLCG OB
Prototype:
Oct 2012  Disk & servers installation and tests
Nov 2012  Tape system installation
Nov 2012  Organization of network infrastructure and connectivity to CERN via GEANT (2 Gb)
Dec 2012  WLCG OPN integration (2 Gb) and JINR-T1 registration in GOCDB, including integration with the APEL accounting system
Dec 2012  M1
May 2013  LHC OPN functional tests (2 Gb)
May 2013  Test of WLCG and CMS services (2 Gb LHCOPN)
May 2013  Test of the tape system at JINR: data transfers from CERN to JINR (using 2 Gb LHC OPN)
May 2013  Test of publishing accounting data
May 2013  Definition of Tier-2 sites support
Jul 2013  Connectivity to CERN at 10 Gb
Jul 2013  M2
Aug 2013  LHC OPN functional tests (10 Gb)
Aug 2013  Test of the tape system at JINR: data transfers from CERN to JINR (using 10 Gb LHC OPN)
Nov 2013  Upgrade of tape, disk and CPU capacity at JINR
Nov 2013  M3
          85% of the job capacity running for at least 2 months
          Storage availability > 98% (functional tests) for at least 2 months
          Running with > 98% availability & reliability for at least 30 days
Feb 2014  WLCG MoU as an associate Tier-1 center
Oct 2014  Disk & tape & servers upgrade
Dec 2014  M4
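The availability and reliability targets above follow the usual WLCG definitions: availability is the fraction of total time a site is up, while reliability excludes scheduled downtime from the denominator. A small sketch with hypothetical monthly numbers:

```python
# WLCG-style availability and reliability for one month (hours are hypothetical).
hours_total = 720.0       # 30-day month
hours_up = 712.0          # time the site was up and passing tests
hours_sched_down = 4.0    # scheduled downtime

availability = hours_up / hours_total
reliability = hours_up / (hours_total - hours_sched_down)
print(f"availability: {availability:.1%}")  # 98.9%
print(f"reliability:  {reliability:.1%}")   # 99.4%
```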
[Map dated 26 June 2009: LHC OPN Tier-1 partners - CERN, US-BNL, US-FNAL, Amsterdam/NIKHEF-SARA, Bologna/CNAF, CA-TRIUMF, Taipei/ASGC, NDGF, De-FZK, Barcelona/PIC, Lyon/CCIN2P3, UK-RAL; Russia: NRC KI and JINR]
Staffing

Role                                FTE
Administrative                      1.5
Network support                     2
Engineering infrastructure          2.5
Hardware support                    3
Core software and WLCG middleware   4.5
CMS Services                        3.5
Total                               17

Responsible persons listed on the slide: Korenkov V., Mitsyn V., Dolbilov A., Trofimov V., Shmatov S.