Status of the JINR GRID infrastructure and participation in the WLCG project
Vladimir Korenkov (JINR, Dubna)
Physics and Computing at ATLAS
Dubna, 21 January 2008
LHC experiments support
– Networking
– Computer power
– Data storage
– Software installation and maintenance
– Mathematical support
Grid – the solution for the LHC experiments
Tier-1s for JINR: ATLAS, CMS, ALICE
LHC experiments support at JINR
All elements of the JINR telecommunication, network and information
infrastructure have to be provided at the necessary level:
• High-throughput telecommunication data links,
• JINR local area network (LAN) backbone,
• Central computer complex and Grid segment,
• Software support of the LHC experiments.
Current Status of External
Network Communications at JINR
Telecommunication channels: construction of N×10 Gbps links.
Upgrade of the Dubna–Moscow data link to 10 Gbps in 2007 and
to 40 Gbps in 2010.
Network Monitoring:
Incoming and outgoing traffic distribution
• Total in 2007: incoming – 241.9 TB, outgoing – 227.8 TB;
• Most of the traffic: CERN (88.8%), DESY, INFN, SARA, IN2P3;
• 47 local sub-networks;
• Local traffic – 77.6 TB.
[Chart: incoming and outgoing traffic in TB per year, 2003–2007.]
Created in 2007:
• a direct point-to-point data channel between the JINR LAN and CERN
as part of the JINR programme of participation in the LCG at CERN;
• Dubna-City Internet eXchange.
JINR Local Area Network (LAN) Backbone
• Comprises 5880 computers and nodes;
Users – 3322,
Modem pool users – 689,
Remote VPN users (Lanpolis, Contact, TelecomMPK) – 500;
• High-speed transport (1 Gbps) (min. 100 Mbps to each PC);
• Controlled access (Cisco PIX 525 firewall) at the network entrance;
• Partially isolated local traffic (8 divisions have their own
subnetworks with Cisco Catalyst 3550 switches as gateways);
• General network authorization system involving many services
(AFS, batch systems, Grid, JINR LAN remote access, etc.).
Plans:
Step-by-step modernization of the JINR backbone – transfer to 10 Gbps;
Development and modernization of the control system of the JINR
backbone network.
JINR Central Information and Computing Complex (CICC)
670 kSI2K, 100 TB disk.
The JINR Central Information and Computing Complex (CICC) is an
element of the Russian Grid segment used for LHC computing and
for other applications.
Contract prepared in December 2007:
SuperBlade – 2 boxes, 40 Xeon 5430 2.66 GHz quad-core CPUs, ~400 kSI2K.
Total expected in March 2008: 1070 kSI2K.
JINR WLCG infrastructure
CICC comprises:
53 servers;
7 interactive nodes;
60 4-core computing nodes (Xeon 5150, 8 GB RAM);
6 2-core computing nodes (Athlon, 2 GB RAM, Myrinet).
Site name: JINR-LCG2
Internal CICC network – 1 Gbit/s
Operating systems – Scientific Linux 4.4, Scientific Linux CERN 4.5
Middleware version – gLite 3.1
File systems – AFS (the Andrew File System), a world-wide distributed
file system, for user software and home directories. AFS makes it easy
to share files in a heterogeneous distributed environment (UNIXes, NT)
with a unified authentication scheme (Kerberos). dCache is used for data.
User registration system – Kerberos 5 (AFS uses Kerberos 5 for
authentication); a minimal login sketch is given below.
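The Kerberos 5 plus AFS scheme described above can be illustrated with a small, hedged sketch of a user login: a Kerberos ticket is obtained first and then converted into an AFS token. The principal and realm are hypothetical placeholders; kinit, aklog and tokens are the standard MIT Kerberos / OpenAFS client tools.

# Minimal sketch of a CICC-style login flow (Kerberos 5 ticket + AFS token).
# The principal/realm below are hypothetical; kinit, aklog and tokens are
# the standard MIT Kerberos and OpenAFS command-line clients.
import subprocess

def afs_login(principal: str) -> None:
    # Obtain a Kerberos 5 ticket-granting ticket (prompts for the password).
    subprocess.run(["kinit", principal], check=True)
    # Convert the Kerberos ticket into an AFS token for the local cell.
    subprocess.run(["aklog"], check=True)
    # List the tokens that now grant access to AFS home and software areas.
    subprocess.run(["tokens"], check=True)

if __name__ == "__main__":
    afs_login("user@JINR.RU")   # hypothetical principal and realm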
JINR Central Information and Computing Complex (CICC)
In June 2007, the CICC resources and services were integrated into a
unified information and computing structure:
SL3/32 Int/UI – interactive nodes / User Interface, 32-bit architecture with SL3;
SL4/32 Int/UI – interactive nodes / User Interface, 32-bit architecture with SL4;
SL4/64 Int/UI – interactive nodes / User Interface, 64-bit architecture with SL4;
LCG-RB – LCG Resource Broker;
LCG-CE – LCG Computing Elements;
WN – Worker Nodes;
X509 PX – proxy server;
VObox – a special node where experiments (ALICE, CMS, etc.) or Virtual
Organizations (VOs) can run specific agents and services that provide a
reliable mechanism to accomplish various VO-specific tasks;
AFS – AFS servers;
dCache – dCache servers (82 TB).
LHC software
In 2007, the migration to the 64-bit architecture under the Scientific
Linux 4 operating system was accomplished at the CICC.
The following current versions of the specialized software are
installed at the JINR-LCG2 site:
for ALICE – AliEn (v2-13.141), VO ALICE.AliRoot.v4-06-Rev-04,
VO ALICE.APISCONFIG.V2.2, VO ALICE.GEANT3.v1-8-1,
VO ALICE.loadgenerator.v-1.0, VO ALICE.ROOT.v5-16-00;
for ATLAS – VO-atlas-cloud-NL, VO-atlas-production (12.0.31, 12.0.5,
12.0.6, 13.0.20, 13.0.30 and 13.0.30.1), VO-atlas-release (11.0.42
and 11.0.5), VO-atlas-tier-T3;
for CMS – VO-cms-CMSSW (1_6_0, 1_6_1 and 1_6_3 – 1_6_7);
for LHCb – VO-lhcb-Gauss (v25r9 – v25r12), VO-lhcb-XmlDDDB (v22r2 and
v30r14), VO-lhcb-Boole-v12r10, VO-lhcb-DaVinci (v17r6 – v17r8,
v18r0 and v19r0 – v19r5), VO-lhcb-Brunel (v30r15 and v30r17),
VO-lhcb-DecFiles (v13r9, v13r10 and v13r12), VO-lhcb-ParamFiles (v5r0).
Several versions of the ALICE, ATLAS and CMS software are also
installed locally at the CICC in the AFS system.
JINR WLCG infrastructure
JINR provides the following services in the WLCG environment:
Basic services – Berkeley DB Information Index (top-level BDII); site
BDII; Computing Element (CE); Proxy Server (PX); Resource Broker (RB);
Workload Management System + Logging&Bookkeeping Service (WMS+LB);
RGMA-based monitoring system collector server (MONbox); LCG File
Catalog (LFC); Storage Element (SE), dCache 82 TB;
Special services – VO boxes for ALICE and for CMS; ROCMON;
PPS and testing infrastructure – pre-production gLite version;
Software for VOs – dCache xrootd door; AliRoot, ROOT and GEANT
packages for ALICE; ATLAS packages; CMSSW packages for CMS; DaVinci
and Gauss packages for LHCb.
A sketch of querying the site BDII is given below.
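The services listed above are advertised through the BDII, which is an LDAP server (conventionally on port 2170) publishing the Glue schema. The following hedged sketch queries a site BDII for its computing elements and free CPU counts; the host name is a hypothetical placeholder, and the third-party ldap3 Python library is assumed.

# Sketch: query a site BDII (LDAP, port 2170, Glue schema) for the
# computing elements and their free CPU counts.
# The BDII host name is hypothetical; ldap3 is a third-party LDAP client.
from ldap3 import Server, Connection

server = Server("bdii.jinr.ru", port=2170)   # hypothetical site BDII host
conn = Connection(server, auto_bind=True)    # anonymous bind, as BDIIs allow

conn.search(
    search_base="mds-vo-name=JINR-LCG2,o=grid",   # site-level Glue subtree
    search_filter="(objectClass=GlueCE)",          # computing-element entries
    attributes=["GlueCEUniqueID", "GlueCEStateFreeCPUs"],
)
for entry in conn.entries:
    print(entry.GlueCEUniqueID, entry.GlueCEStateFreeCPUs)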
Grid Virtual Organizations at JINR CICC:
June–December 2007

Grid VO                     Number of jobs    CPU time (kSI2K*hours)
ALICE                               90 441              1 370 820.40
ATLAS                               15 643                 48 980.43
CMS                                 52 249                 51 883.18
LHCb                                10 484                  6 604.50
BIOMED                              25 103                164 102.07
FUSION                               9 208                145 053.80
Others (ops, dteam, hone)           17 665                 47 022.10
TOTAL                              220 793              1 834 466.49
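The table makes it easy to compare how heavy the jobs of the different VOs are. The short calculation below derives the average CPU time per job from the numbers above (no new data is assumed).

# Average CPU time per job for each VO, computed from the table above
# (CPU time in kSI2K*hours, job counts are absolute).
vo_stats = {
    "ALICE":  (90_441, 1_370_820.40),
    "ATLAS":  (15_643,    48_980.43),
    "CMS":    (52_249,    51_883.18),
    "LHCb":   (10_484,     6_604.50),
    "BIOMED": (25_103,   164_102.07),
    "FUSION": ( 9_208,   145_053.80),
    "Others": (17_665,    47_022.10),
}

for vo, (jobs, cpu_ksi2k_h) in vo_stats.items():
    print(f"{vo:7s} {cpu_ksi2k_h / jobs:8.2f} kSI2K*hours per job")
# ALICE jobs are by far the heaviest (~15 kSI2K*hours each),
# while LHCb jobs average well under one kSI2K*hour.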
dCache in JINR
[Diagram: the JINR dCache setup – gFTP, SRM and XROOT doors accessed
from the Internet/Grid through the JINR backbone; DCAP doors for the
local worker nodes (wn…); an admin node; the PNFS namespace server; and
pool nodes (lxfs…, rda02) on RAID storage, 82 TB in total.]
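As a hedged illustration of the access doors shown in the diagram, the sketch below copies one file out of dCache through its xrootd door with the standard xrdcp client (local DCAP access would typically use dccp instead). The door host name and the /pnfs path are hypothetical placeholders.

# Sketch: fetch a file from the JINR dCache through an xrootd door using
# the standard xrdcp client. The door host and /pnfs path are hypothetical.
import subprocess

door = "root://dcache.jinr.ru//pnfs/jinr.ru/data/cms/example.root"  # hypothetical
local = "/tmp/example.root"

subprocess.run(["xrdcp", door, local], check=True)
print("copied", door, "->", local)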
dCache: Files, VOs, Disks
[Chart: number of files stored in dCache by VO (CMS, ATLAS, BIOMED,
others) and remaining free space; total capacity 82 TB.]
JINR in the RDIG infrastructure
Now the RDIG infrastructure comprises 15 Resource Centres with more
than 1500 CPUs and more than 650 TB of disk storage.
RDIG Resource
Centres:
– ITEP
– JINR-LCG2
– Kharkov-KIPT
– RRC-KI
– RU-Moscow-KIAM
– RU-Phys-SPbSU
– RU-Protvino-IHEP
– RU-SPbSU
– Ru-Troitsk-INR
– ru-IMPB-LCG2
– ru-Moscow-FIAN
– ru-Moscow-GCRAS
– ru-Moscow-MEPHI
– ru-PNPI-LCG2
– ru-Moscow-SINP
RDIG monitoring&accounting
http://rocmon.jinr.ru:8080
Monitored values:
CPUs – total / working / down / free / busy;
Jobs – running / waiting;
Storage space – used / available;
Network – available bandwidth.
Accounting values:
Number of submitted jobs;
Used CPU time – total sum in seconds, normalized by WN performance
(see the sketch below), average time per job;
Waiting time – total sum in seconds, average ratio of waiting to used
CPU time per job;
Physical memory – average per job.
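A minimal sketch of the "normalized CPU time" accounting value, assuming the usual convention that raw CPU seconds are scaled by the SpecInt2000 rating of the worker node so that time spent on fast and slow WNs becomes comparable. The per-WN ratings used here are purely illustrative.

# Sketch of normalized CPU time: raw CPU seconds scaled by the worker
# node's kSI2K rating. The ratings below are made-up illustrative values.
def normalized_cpu_time(cpu_seconds: float, wn_ksi2k: float) -> float:
    """Return CPU time in kSI2K*seconds (raw seconds * WN performance)."""
    return cpu_seconds * wn_ksi2k

jobs = [
    # (raw CPU seconds, kSI2K rating of the WN the job ran on)
    (3600.0, 1.0),
    (3600.0, 1.4),
]
total = sum(normalized_cpu_time(s, k) for s, k in jobs)
print(f"total normalized CPU time: {total:.0f} kSI2K*seconds")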
Russia and JINR: Normalized CPU time per SITE (June 2007 – December 2007)

SITE    Jun 07   Jul 07   Aug 07   Sep 07   Oct 07   Nov 07   Dec 07      Total
JINR   103,238  244,393  136,615  320,041  365,456  341,876  11,258  1,522,877  (47.26%)
RDIG LCG2 sites: statistics on CPU usage, data transfers and site reliability

RDIG site     CPU usage (kSI2K),   Reliability (%),          Data transfers from CERN (TB),
              Nov.–Dec. 2007       Oct. 2007 – Jan. 2008     Oct.–Nov. 2007
FIAN                   336         51 / 94 / 43                 –
IHEP                75,203         71 / 72 / 84                25
INR                 40,620         79 / 93 / 35                 –
ITEP               380,164         63 / 93 / 84                11
JINR             1,008,105         93 / 93 / 94                37
MEPHI                  479         92 / 95 / 44                 –
Phys-SPbSU               –         96 / 97 / 92                 –
PNPI               366,723         44 / 85 /  0                 5
RRC-KI                 120         86 /  0 /  0                 6
SINP                92,068         25 /  0 / 49                 3
SPbSU                3,556         86 / 83 / 88                 –
Totals           1,967,374                                     87
Average                            71 / 73 / 56
Network bandwidth and reliability of
data transfers
The following LHC computing
centers serve as Tier1
centers for RDIG:
FZK (Karlsruhe) – for ALICE,
SARA (Amsterdam) – for ATLAS,
CERN – for CMS (CERN-PROD)
and LHCb.
The quality of the JINR–Tier1 connectivity is under close monitoring.
FTS Monitoring: CERN–JINR transfers
Best transfer-test results for CERN–JINR, 01.08.2007–04.08.2007:
average throughput of 20 MB/s during the whole of 04.08.
Average data movement from CERN to JINR, 01.2007–11.2007
(CERN-PROD as T1 for RDMS): http://rocmon.jinr.ru/scripts/phedex
4 August 2007:
transfer rate 12–22 MB/s;
98% of transfers successful;
1.64 TB transferred (a consistency check is sketched below).
For comparison, October–November 2006 (CNAF as the FTS T1 for JINR):
transfer rates below 2 MB/s;
ratio of successful transfers ~20–30%.
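The 4 August figures are mutually consistent: sustaining roughly 20 MB/s over a full day gives about 1.7 TB, in line with the 1.64 TB actually moved at 12–22 MB/s. The short check below only uses the numbers quoted above.

# Consistency check for the 4 August 2007 FTS figures.
rate_mb_s = 20.0                 # quoted average throughput, MB/s
seconds_per_day = 24 * 3600
volume_tb = rate_mb_s * seconds_per_day / 1e6   # MB -> TB (decimal units)
print(f"{volume_tb:.2f} TB per day at {rate_mb_s} MB/s")   # ~1.73 TB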
Testing of the JINR-LCG2 site by CMS JobRobot jobs,
from October 2007 till now
The CMS JobRobot is a program, currently operated from a machine at CERN,
that creates typical CMS user analysis jobs, submits them to specific sites,
and collects them while keeping track of the corresponding information.
Its main objective is to test how sites respond to job processing in order
to detect possible problems and correct them as soon as possible.
The JINR-LCG2 site demonstrates a high level of job-processing reliability.
As an example, see the CMS JobRobot summary table for 17.11.2007.
EGEE SITES:
LHC VOs (ALICE, ATLAS, CMS and LHCb)
Normalised CPU time by SITE
June 2007 - December 2007
1.  IN2P3-CC                 4 731 732
2.  CERN-PROD                4 393 875
3.  FZK-LCG2                 3 432 919
4.  TRIUMF-LCG2              3 358 117
5.  INFN-T1                  2 244 936
6.  IN2P3-LPC                1 705 242
7.  INFN-PISA                1 438 029
8.  UKI-NORTHGRID-MAN-HEP    1 369 207
9.  GRIF                     1 368 942
10. RAL-LCG2                 1 306 579
11. JINR-LCG2                1 217 267
Statistics obtained from the EGEE Accounting Portal:
http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.html
EGEE SITES:
LHC VOs (ALICE, ATLAS, CMS and LHCb)
Normalised CPU time by SITE
September 2007 – January 2008

SITE             Sep 07     Oct 07     Nov 07     Dec 07     Jan 08      Total
IN2P3-CC        624,019  1,258,052  1,229,198    729,033    377,204  4,217,506
TRIUMF-LCG2   1,386,891  1,121,041    466,991    436,611    122,955  3,534,489
FZK-LCG2        643,124    612,975    598,618    645,248    384,200  2,884,165
CERN-PROD       483,517    869,397    504,751    598,155    302,715  2,758,535
INFN-T1         217,179    387,501    358,604    910,196    365,699  2,239,179
NDGF-T1         354,620    705,382    478,403    405,428          0  1,943,833
IN2P3-CC-T2           0    445,528    696,033    638,411    160,448  1,940,420
GRIF            266,099    342,961    248,369    271,297    194,211  1,322,937
IN2P3-LPC       263,765    394,922    312,387    241,328     79,836  1,292,238
JINR-LCG2       228,648    280,674    278,157    268,371    171,503  1,227,353
Worldwide LHC Computing
Grid Project (WLCG)
The protocol between CERN, Russia and JINR on participation in the LCG
Project was approved in 2003.
The tasks of the Russian institutes in the LCG were defined as:
• LCG software testing;
• evaluation of new Grid technologies (e.g. Globus Toolkit 3) in the
context of their use in the LCG;
• support and development of the event generators repository and the
database of physical events.
JINR in the WLCG
• support and development of the WLCG infrastructure;
• participation in WLCG middleware testing/evaluation;
• participation in Data and Service Challenges;
• grid monitoring and accounting system development;
• FTS monitoring and testing;
• MCDB development;
• participation in ARDA activities in coordination with the experiments;
• HEP applications;
• user and administrator training and education;
• support of the JINR Member States in WLCG activities.
User Training and Induction
User support to stimulate active usage of LCG resources (courses,
lectures, trainings, publication of user guides in Russian):
Courses "CMS user analysis using the EGEE/LCG infrastructure",
19 January 2007. Lectures: CMS computing support at JINR – current
status and plans; a short introduction to LCG/EGEE; CMS user job
submission with ASAP. Practical part: usage of ASAP ("private" user
jobs and jobs needing access to CMS databases of simulated events).
http://rdms-cms.jinr.ru/docs/rdms_1/cours1.htm
Tutorial on distributed analysis of ATLAS data, 19 April 2007.
Lectures: main LCG commands and operations with files; data analysis
with GANGA, plus a practical part (a minimal job-description sketch is
given below).
http://atlasinfo.jinr.ru/computing/tutorial_190407.html
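The "main LCG commands" covered in such tutorials revolve around describing a job in JDL (Job Description Language) and submitting it through the gLite workload management tools. The hedged sketch below only generates a minimal JDL file; the script and sandbox file names are illustrative, and actual submission would use the WMS command line (e.g. glite-wms-job-submit) with a valid VOMS proxy.

# Sketch: generate a minimal JDL file of the kind used in LCG/gLite
# tutorials. File names are illustrative placeholders.
from textwrap import dedent

jdl = dedent("""\
    Executable    = "analysis.sh";
    Arguments     = "dataset.list";
    StdOutput     = "job.out";
    StdError      = "job.err";
    InputSandbox  = {"analysis.sh", "dataset.list"};
    OutputSandbox = {"job.out", "job.err"};
    """)

with open("analysis.jdl", "w") as f:
    f.write(jdl)
print("wrote analysis.jdl")  # then e.g.: glite-wms-job-submit -a analysis.jdl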
User Training and Induction
Courses for CMS users on submitting jobs to the LCG infrastructure.
Russian and JINR physicists participating in the ATLAS experiment
train and practise with the Grid and GANGA.
COURSES – LECTURES – PRACTICAL TRAINING
The XXI International Symposium on Nuclear
Electronics and Computing (NEC'2007)
Bulgaria, Varna, 10-17 September, 2007.
The main topics of the symposium are:
• Detector & Nuclear Electronics
• Computer Applications for Measurement and Control in Scientific Research
• Triggering and Data Acquisition
• Accelerator and Experiment Automation Control Systems
• Methods of Experimental Data Analysis
• Information & Data Base Systems
• Computer Networks for Scientific Research
• Data & Storage Management
• GRID Computing
2nd International Conference
"Distributed Computing and Grid-technologies in Science and Education"
LABORATORY OF INFORMATION TECHNOLOGIES
26–30 June 2006
The first conference, organized two years earlier by LIT, became the
first forum in Russia in this field. The second conference was attended
by more than 200 specialists from 17 countries and from 46 universities
and research centres. The scientific programme included 96 reports
covering 8 topics: 1) creation and operating experience of Grid
infrastructures in science and education; 2) methods and techniques of
distributed computing; 3) distributed processing and data storage;
4) organization of the network infrastructure for distributed data
processing; 5) algorithms and methods of solving applied problems in
distributed computing environments; 6) theory, models and methods of
distributed data processing; 7) distributed computing within the LHC
projects; and 8) design techniques and experience of using distributed
information Grid systems.
In the framework of the conference, two tutorials on the Grid systems
gLite and NorduGrid were organized.
In the general opinion of the attendees, the conference series should
be continued, as it extends the dialogue among leading experts from
Europe, the USA and Russia.
3rd International Conference
"Distributed Computing and Grid-technologies in Science and Education"
will be held on 30 June – 4 July 2008
JINR CICC resource plan

                      2007          2008    2009    2010
CPU (kSI2K)           670 (1070)    1250    1750    2500
Disk (TB)             100            400     800    1200
Active tapes (TB)       0              0     100     200
JINR – cooperation in Grid
Worldwide LHC Computing Grid (WLCG);
Enabling Grids for E-sciencE (EGEE);
CERN-INTAS projects;
BMBF grant "Development of the GRID infrastructure and tools to provide
joint investigations performed with participation of JINR and German
research centers";
"Development of a Grid segment for the LHC experiments", supported in
the framework of the JINR–South Africa cooperation agreement in
2006-2007;
NATO project "DREAMS-ASIA" (Development of gRid EnAbling technology in
Medicine & Science for Central ASIA);
JINR–Romania cooperation within the Hulubei–Meshcheryakov programme;
The LIT team participates in the project "SKIF-GRID", a programme of
the Belarusian-Russian Union State.
We work in close cooperation with, and provide support to, our partners
in Ukraine, Belarus, the Czech Republic, Romania, Poland, Germany, South
Africa, Bulgaria, Armenia, Uzbekistan and Georgia;
protocols of cooperation exist with INRNE (Bulgaria), ArmeSFo (Armenia),
FZK Karlsruhe GmbH (Germany), Wroclaw University (Poland), IHEPI TSU
(Georgia), NC PHEP BSU (Belarus), KFTI NASU (Ukraine), etc.
Conclusions
As a result of JINR participation in the WLCG and EGEE projects, the JINR
LCG/EGEE site is fully integrated into the worldwide LCG/EGEE grid
infrastructure, providing all the necessary resources, services and
software for the participation of JINR specialists in the ALICE, ATLAS
and CMS experiments after the LHC start, which is expected in 2008.
We shall continue the required computing support for ALICE, ATLAS and
CMS at the JINR CICC.
We plan to continue our participation in the WLCG project to support and
develop the JINR LCG/EGEE site during the running phase of the LHC
experiments.
The further JINR activities in the WLCG project are based on the
Memorandum of Understanding signed in September 2007 by Russia, JINR and
CERN. This agreement provides the legal and financial basis for the
participation of Russia and JINR in the WLCG project after the LHC
start.
We shall also continue our activities in the next stage of the EGEE
project, taking into account that these two global grid projects are
developing in close cooperation.