Efficient Handling and Processing of
PetaByte-Scale Data for the Grid Centers
within the FR Cloud
1ST JOINT SYMPOSIUM CEA-IFA
HaPPSDaG
- PROJECT PRESENTATION - FIRST YEAR PROGRESS REPORT -
M. Dulea
National Institute for Nuclear Physics and Engineering
'Horia Hulubei' (IFIN-HH)
20.10.2011
Mihnea Dulea, IFIN-HH
1
OVERVIEW
 Computing support for LHC
 Project topics
 Project objectives and work planning
 Framework agreements
 General information
 Project teams and infrastructure
 First year results
COMPUTING SUPPORT for LHC - LCG
LHC COMPUTING GRID
LCG is a widely distributed array of computing resources that provides the
computing support required for the storage, processing, simulation and
analysis of the data gathered by the four major
experiments performed at LHC.
It consists of more than 140 computing centres
and federations of centres from 35 countries.
The resource centres are classified according
to their size and functionality as Tier-0 (CC
@ CERN), Tier-1 (11 centres), and Tier-2.
The centres are interconnected through a
high-speed network (GEANT2 in EU).
Current and 2012-2014 activity related to LHC.
COMPUTING SUPPORT - FR
ATLAS FRENCH CLOUD
Grid sites:
CC-IN2P3 (Tier-1)
Tier-2 centres: ... (many)
GRIF
Grille de Recherche d'Ile de France
computing grid in Paris region, joint
initiative of CEA/IRFU + labs of
CNRS/IN2P3 (6 sites)
The sites are interconnected
through dedicated 10 Gbps links; connected to the FR NREN:
RENATER: Réseau national de télécommunications pour la technologie, l'enseignement et la
recherche
FR Cloud includes foreign grid centres from China, Japan, Romania
COMPUTING SUPPORT - RO
ROMANIAN TIER-2 FEDERATION RO-LCG
Grid sites:
IFIN-HH, 5 Grid sites (resource centres)
ISS - Inst. for Space Sciences (2 sites)
UPB - Univ. 'Politehnica' of Bucharest
ITIM - NIRD in Molecular & Isotopic
Technologies - Cluj
UAIC, Alex. Ioan Cuza University - Iasi
The sites are connected to the 10 Gbps
backbone of the RO NREN - the Romanian
Educational and Research Network RoEduNet
4 grid sites currently support the ATLAS VO: RO-07-NIPNE, RO-02-NIPNE (IFIN-HH);
RO-14-ITIM (Cluj), RO-16-UAIC (Iasi)
PROJECT TOPIC
Computing support for LHC experiments = provision of grid resources + services
The overall support of LCG
deployment and operation is
provided from other funds (e.g.
CONDEGRID project in RO).
HAPPSDAG addresses
specific ATLAS issues
in order to
optimize resource usage
ATLAS ISSUES
Generic requirements regarding
- data transfer from Tier-1 to the associated Tier-2 sites (CC-IN2P3 => RO-LCG)
- transfer of large files from SE to WN for each analysis job; consider many simultaneous jobs
- transfer of log and results files from WN to SE; immediate transfer of log file to UI
RO specific needs at the beginning of the project
Grid cluster
- analysis of the causes of the lower performance of
RO-LCG sites before Oct. 2010
- elaborate and test technical solutions for performance
improvement
- ensure better communication and coordination
between the RO sites and the FR-cloud partners
- general measures for improving Tier1 - Tier2 interaction
- elaborate general guidelines regarding the
improvement in efficiency of the grid centers which are
associated to ATLAS clouds
Transfer paths from/to the Storage Element (SE)
PROJECT OBJECTIVES
Strategic objective: provide means for improving the processing and
handling of large data sets at the Tier-2 centres that participate in the
computing support of the ATLAS experiment at the LHC (RO as a case study).
Specific objectives and partner contributions:
 Improve communication and coordination between GRIF/IN2P3 and RO sites (RO/FR)
 Testing & improving quality of the FR - RO data link for large dataset transfers (RO/FR)
 Implementation of specific measures for increasing ATLAS job load and storage
performance on sites (RO)
 Improving large dataset transfer between FR - RO and data analysis (RO/FR)
 Contributing to grid monitoring and technical support within FR-cloud (RO)
 Training regarding grid monitoring and support (FR => RO)
 Dissemination (RO/FR)
PLANNING of WORK
 Stage 1 (01.10.2010 - 10.12.2010)
Analysis of Tier1-Tier2 communication
 Stage 2 (01.01.2011 - 30.09.2011)
Studies and software tools for monitoring and operation of the
FR Cloud - RO grid connection and job loading. Testing of data
handling and processing.
 Stage 3 (01.10.2011 - 30.09.2012)
Methods and procedures for improving the performance of the
RO sites within the FR Cloud
FRAMEWORK AGREEMENTS
 General Cooperation Agreement for Scientific Research
between CEA and IFA, signed in December 2009
- Field of cooperation: Technologies for Information and Health
- Topic proposed for 2010: Grid Technologies
 Joint Call for proposals of joint R&D projects (May 2010)
- IFIN-HH and IRFU submitted a proposal for a joint research and
development project
 Cooperation Agreement in the Field of Scientific Research (AS)
between CEA and IFIN-HH, (01.10.2010)
- General Coordinators: Gerard Cognet (FR), Ioan Ursu (RO)
- leading and coordinating the cooperation activities
 Project Agreement (CEA, IFIN-HH)
GENERAL INFORMATION
 RO Contract n° C1-06/2010, between IFA and IFIN-HH
 Start date: 01/10/2010
 Duration: 24 months
 Funding of the RO part of the project: 400 000 lei (~94 000 €)
 Funding of the FR part of the project: 133 000 €
BUDGET                                                 2010                  2011                  2012
                                                 RO (lei)  CEA (Eur)   RO (lei)  CEA (Eur)   RO (lei)  CEA (Eur)
Manpower                                           25 333      6 000    120 133     48 000     82 000     22 000
Travels                                             8 000      4 000      3 200     14 000      8 000     14 000
Others (Romanian engineer staying at Saclay)            -      5 000          -     10 000          -     10 000
Others (French guests staying in Romania)               0          -     10 000          -     10 000          -
Others (equipment)                                      0          -     40 000          -     40 000          -
Others (indirect costs)                             6 667          -     26 667          -     20 000          -
Total:                                             40 000     15 000    200 000     72 000    160 000     46 000
PROJECT TEAMS
Project coordinators:
Jean-Pierre Meyer (FR), Mihnea Dulea (RO)
Technical correspondents: Pierrick Micout (FR), Gabriel Stoicea (RO)
FR team (CEA/IRFU)
Eric LANÇON
Pierrick MICOUT
Christine LEROY
Frédéric SCHAER
Zoulikha GEORGETTE
Adelino GOMEZ
RO team (IFIN-HH)
Serban Constantinescu
Mihai Ciubancan
Ionut Traian Vasile
Camelia Mihaela Visan
Centre for Informational Technologies (CTI) - IFIN-HH
INFRASTRUCTURE @ CTI/DPETI
1200 (grid) + 960 (hpc) cores, 270 TB
ANALYSIS of NETWORK INFRASTRUCTURE
Objective: identify the weak points of the FR-RO data connection and adopt measures
for improving the transfer capacity for large datasets.
Network structure: complex, various owners and administrators => more difficult to act
Section       Centres                    Administrator   Owner         Location
IFIN-HH LAN   RO-02-NIPNE, RO-07-NIPNE   CTI/DPETI       IFIN-HH       Magurele
IFIN - UPB    UPB                        ICOMM           IFIN-HH       UPB
RoEduNet      RO-14-ITIM, RO-16-UAIC     AARNIEC         MECTS         Romania
GEANT2        In 34 EU states            DANTE           EU NRENs      EU
RENATER       GRIF, IN2P3                GIP RENATER     GIP RENATER   France
Activities (RO+FR)
 Testing connectivity & transport capacity with various tools
 Finding routing paths and points of data traffic delay
 Comparing performances of RO-CERN link with those of RO-IN2P3
Conclusions: a) performance degradation at RoEduNet / GEANT2 interface
b) bottlenecks on some of the RoEduNet routers
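
The search for points of data traffic delay can be illustrated with a minimal sketch. It assumes that per-hop round-trip times have already been collected with a traceroute-like tool, and it only reports the pair of consecutive hops with the largest delay increase. The hop labels and RTT values are hypothetical placeholders, not measurements from the project, and a large jump may simply reflect geographic distance, so the result is a candidate for closer inspection rather than a diagnosis.

```python
from typing import List, Tuple

Hop = Tuple[str, float]  # (hop label, round-trip time in ms)


def largest_delay_jump(hops: List[Hop]) -> Tuple[str, str, float]:
    """Return (from_hop, to_hop, rtt_increase_ms) for the largest RTT
    increase between two consecutive hops along a path."""
    if len(hops) < 2:
        raise ValueError("need at least two hops")
    # Start with the first pair, then compare every following pair.
    best = (hops[0][0], hops[1][0], hops[1][1] - hops[0][1])
    for (prev_name, prev_rtt), (name, rtt) in zip(hops[1:], hops[2:]):
        jump = rtt - prev_rtt
        if jump > best[2]:
            best = (prev_name, name, jump)
    return best


# Hypothetical path; labels and values are illustrative only.
example_path = [
    ("site-lan", 0.4),
    ("campus-link", 1.1),
    ("nren-router-a", 2.0),
    ("nren-border", 21.5),
    ("backbone", 38.0),
    ("destination", 41.2),
]

if __name__ == "__main__":
    src, dst, jump = largest_delay_jump(example_path)
    print(f"largest delay increase: {src} -> {dst} (+{jump:.1f} ms)")
```

Running the same function on paths to two different destinations gives a simple way to compare where the delay accumulates on each of them.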
IMPROVING POINT-TO-POINT TRAFFIC PERFORMANCES
Requires close collaboration with network administrators along the RO-FR path
Example: following bandwidth capacity and traffic analysis, a RoEduNet router was found
to be responsible for a bottleneck. AARNIEC's intervention raised the available bandwidth to 700 Mbps
(fig. below).
Permanent monitoring required
MONITORING TOOLS for
DATA TRANSFER and STORAGE PERFORMANCE - 1
Development of software tools for monitoring of SE traffic (in/out) (adding data sent by
daemons running on storage servers in a database + web interface for display)
Tools developed in IFIN-HH; useful for FR partners too for monitoring RO sites.
 Traffic from/to WNs and from/to external network
Max at 5 Gbps
Max at > 3 Gbps
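
The daemon-plus-database approach described on this slide can be sketched as follows. This is a minimal illustration, not the tools developed at IFIN-HH, whose internals are not given here. It assumes that each storage server exposes cumulative byte counters, converts two readings into rates in Mbps, and appends them to a SQLite table that a web interface could query. The table name `se_traffic`, its columns, and the host name are hypothetical.

```python
import sqlite3
from typing import Tuple

Counters = Tuple[float, int, int]  # (timestamp in s, bytes received, bytes sent)


def rate_mbps(earlier: Counters, later: Counters) -> Tuple[float, float]:
    """Convert two cumulative byte-counter samples into (in, out) rates in Mbps."""
    dt = later[0] - earlier[0]
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    rx = (later[1] - earlier[1]) * 8 / dt / 1e6
    tx = (later[2] - earlier[2]) * 8 / dt / 1e6
    return rx, tx


def store_sample(db: sqlite3.Connection, host: str, ts: float,
                 rx_mbps: float, tx_mbps: float) -> None:
    """Append one throughput sample (hypothetical schema)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS se_traffic "
        "(host TEXT, ts REAL, rx_mbps REAL, tx_mbps REAL)"
    )
    db.execute("INSERT INTO se_traffic VALUES (?, ?, ?, ?)",
               (host, ts, rx_mbps, tx_mbps))
    db.commit()


def peak_rates(db: sqlite3.Connection, host: str) -> Tuple[float, float]:
    """Return the maximum recorded (in, out) rates for one storage server."""
    row = db.execute(
        "SELECT MAX(rx_mbps), MAX(tx_mbps) FROM se_traffic WHERE host = ?",
        (host,),
    ).fetchone()
    return row[0], row[1]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    # Hypothetical counter readings taken 10 s apart on a hypothetical server.
    s0 = (0.0, 0, 0)
    s1 = (10.0, 1_250_000_000, 625_000_000)
    rx, tx = rate_mbps(s0, s1)
    store_sample(db, "disk-server-1", s1[0], rx, tx)
    print(peak_rates(db, "disk-server-1"))
```

Keeping the raw samples in a database rather than only the latest value is what makes peak figures such as the maxima quoted on the plots recoverable afterwards.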
MONITORING TOOLS for
DATA TRANSFER and STORAGE PERFORMANCE - 2
 Traffic on gateway (in/out); SE external throughput
 Monitoring groups of running or pending jobs
MONITORING TOOLS for
DATA TRANSFER and STORAGE PERFORMANCE - 3
 Accounting of running or pending jobs on CE or CREAM-CE
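
As a rough sketch of such accounting, the snippet below counts running and pending jobs per group. It assumes that a query of the batch system has already produced (group, state) pairs; how the pairs are obtained from a CE or CREAM-CE is not covered, and the group names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Job = Tuple[str, str]  # (group or VO name, job state)
TRACKED_STATES = ("running", "pending")


def count_jobs(jobs: Iterable[Job]) -> Dict[str, Dict[str, int]]:
    """Count running and pending jobs per group; jobs in other states are ignored."""
    counts = Counter((g, s) for g, s in jobs if s in TRACKED_STATES)
    summary: Dict[str, Dict[str, int]] = {}
    for (group, state), n in counts.items():
        # Every reported group gets both counters, defaulting to zero.
        summary.setdefault(group, {s: 0 for s in TRACKED_STATES})[state] = n
    return summary


# Hypothetical snapshot of a batch queue; group names are illustrative only.
snapshot = [
    ("atlas-production", "running"),
    ("atlas-production", "running"),
    ("atlas-analysis", "pending"),
    ("atlas-analysis", "running"),
    ("ops", "done"),
]

if __name__ == "__main__":
    for group, states in sorted(count_jobs(snapshot).items()):
        print(f"{group}: {states['running']} running, {states['pending']} pending")
```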
IMPROVEMENT of SITE MONITORING and TECHNICAL SUPPORT
Implementation of IFIN-HH's own SAM (Service Availability Monitoring) system, which uses the IFIN-HH
grid infrastructure and a new monitoring VO, ifops. Results are published using Nagios.
Early notification of technical staff leads to improvement of availability of grid services
Monitoring of CREAM-CE, tbit03.nipne.ro
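
To show how a check result reaches Nagios, here is a minimal sketch of a plugin-style probe. It follows the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN), but it is only a TCP reachability test with hypothetical thresholds, far simpler than the functional tests a SAM system runs, and it is not the probe set used at IFIN-HH.

```python
import socket
import sys
import time
from typing import Optional, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def classify(connect_time: Optional[float], warn: float, crit: float) -> Tuple[int, str]:
    """Map a TCP connect time in seconds (None = no connection) to a Nagios status."""
    if connect_time is None:
        return CRITICAL, "CRITICAL - service not reachable"
    if connect_time >= crit:
        return CRITICAL, f"CRITICAL - connect time {connect_time:.2f}s"
    if connect_time >= warn:
        return WARNING, f"WARNING - connect time {connect_time:.2f}s"
    return OK, f"OK - connect time {connect_time:.2f}s"


def probe(host: str, port: int, timeout: float = 10.0) -> Optional[float]:
    """Return the time needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Run as: python check_tcp.py HOST PORT (thresholds below are assumptions).
    status, text = classify(probe(sys.argv[1], int(sys.argv[2])), warn=1.0, crit=5.0)
    print(text)
    sys.exit(status)
```

Nagios reads the first line of output and the exit code, so a non-zero status is what triggers the early notification of the technical staff.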
IMPROVEMENT and TESTS of SE-WN THROUGHPUT
Adding more resources (WNs) doesn't always mean better results. Scalability is required
Improvement of file transfer speed from SE to WN, required by analysis jobs (4-6 files of 2-4 GB each)
Replacing the transfer to disk servers through the Network File System (NFS) protocol with new
DPM (Disk Pool Manager) disk storage servers.
Higher transfer speed => no job exceeds the time limit => no cancellation
Tests of the new configuration
Time representation of the transfer speed (in Mbps) for 70 quasi-simultaneous jobs
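
A back-of-the-envelope estimate shows why the transfer speed decides whether jobs hit the time limit. The sketch below assumes that all jobs start together and share the SE output bandwidth equally, which is a simplification. The file counts, file sizes and the figure of 70 jobs come from this slide; the aggregate bandwidth values are assumptions chosen for illustration.

```python
def stage_in_time_s(files: int, file_size_gb: float, jobs: int,
                    aggregate_gbps: float) -> float:
    """Estimate the stage-in time per job when `jobs` jobs share the SE
    bandwidth equally.

    Sizes are in gigabytes (10**9 bytes), bandwidth in gigabits per second.
    """
    if jobs < 1 or aggregate_gbps <= 0:
        raise ValueError("jobs must be >= 1 and bandwidth must be positive")
    total_gbit = files * file_size_gb * 8  # data volume of one job, in gigabits
    # Each job gets aggregate_gbps / jobs, so its time is volume * jobs / aggregate.
    return total_gbit * jobs / aggregate_gbps


if __name__ == "__main__":
    # Worst case from the slide: 6 files of 4 GB for each of 70 jobs.
    for gbps in (1.0, 5.0):  # assumed aggregate SE throughput
        minutes = stage_in_time_s(6, 4.0, 70, gbps) / 60
        print(f"{gbps:.0f} Gbps aggregate: about {minutes:.0f} min per job")
```

Under these assumptions the stage-in time scales linearly with the number of simultaneous jobs, which is why adding WNs without raising storage throughput can make results worse.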
GLOBAL IMPROVEMENT of EFFICIENCY
Mean efficiency of ATLAS job execution in 2011: 91%
Monthly number of ATLAS jobs and number of ATLAS events processed in RO-LCG
TRAINING REGARDING MONITORING AND TECHNICAL SUPPORT
20.06.11 - 04.07.11: training stage of C. Visan at CEA/IRFU, in preparation for later
participation in monitoring and support activities for FR Cloud sites.
Topics:
- CEA/IRFU monitoring methods at site, VO, project levels; EGI/WLCG and LHC
monitoring (Christine Leroy, Pierrick Micout )
- grid site usage (Georgette Zoulikha)
- NAGIOS installing/configuration on virtual machines (Frederic Schaer)
- job submission through Pathena (PanDA Athena), at LAL-Orsay (Laurent Duflot)
- CACTI site monitoring (Victor Mendoza, Université Pierre et Marie Curie (UPMC))
- instructions for site and job monitoring in ADCoS (ATLAS Distributed Computing
Operations Shift) and for support team of FR Cloud (Squad). (Sabine Crepe)
MOBILITY
 Kick-off meeting (15-16.11 2010, Saclay)
 Participation at the RO-LCG 2010 Conference, Bucharest (Christine Leroy, Sabine
Crepe - IN2P3)
 Participation of Gabriel Stoicea in the spring meeting of LCG-France (30-31.05.2011)
 Training - monitoring and support (20.06.11 - 04.07.11, Saclay), C.M. Visan
BENEFITS
CEA/IRFU
 The results of the project contribute to global improvement of FR Cloud efficiency
 Elaboration, in collaboration, of general guidelines for interaction between grid centres
in ATLAS clouds
 Using FR-RO interaction as a representative case study for sharing best practices with
smaller sites
IFIN-HH
 General efficiency improvement of the activity of the RO sites
 Better integration and visibility in the framework of the computing support for ATLAS
collaboration
 High-level training of RO technical staff
PROSPECTS
 Further development of methods and procedures for improving the performance
of the RO sites within the FR Cloud
 General guidelines regarding the improvement in efficiency of the grid
centers which are associated to ATLAS clouds
 HAPPSDAG workshop and technical meeting in Bucharest (28-30.11.2011)
 Participation of IFIN-HH in site and job monitoring in ADC (ATLAS
Distributed Computing) shifts or in the monitoring team of FR Cloud.
 Dissemination of results
THANK YOU FOR YOUR ATTENTION !
Questions?