The DataGRID Project
Les Robertson
CERN - IT Division
May 2001
[email protected]
Enabling Worldwide
Scientific Collaboration
• an example of the problem
• the DataGRID solution
• concluding remarks
The Beginning of DataGRID
The DataGRID project evolved from the conjunction of
• the search for a practical solution to building the computing system for CERN’s next accelerator, the Large Hadron Collider (LHC), and
• the appearance of Ian Foster and Carl Kesselman’s book, The Grid: Blueprint for a New Computing Infrastructure
The Problems
Vast quantities of data
Enormous computing requirements
Researchers spread all over the world
The Large Hadron Collider Project
4 detectors: ATLAS, CMS, LHCb, ALICE
Storage –
Raw recording rate 0.1 – 1 GBytes/sec
Accumulating at 5-8 PetaBytes/year
10 PetaBytes of disk
Processing –
200,000 of today’s fastest PCs
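A rough consistency check on the storage figures above, as a sketch only: the assumed 1e7 seconds of effective data taking per year is an illustrative assumption, not a number from the slide.

```python
# Back-of-the-envelope check of the storage figures above.
# Assumption (not from the slide): roughly 1e7 seconds of effective
# data taking per year of accelerator operation.

SECONDS_OF_DATA_TAKING_PER_YEAR = 1e7

def petabytes_per_year(rate_gbytes_per_sec):
    """Convert a sustained raw recording rate (GB/sec) into PB accumulated per year."""
    bytes_per_year = rate_gbytes_per_sec * 1e9 * SECONDS_OF_DATA_TAKING_PER_YEAR
    return bytes_per_year / 1e15

print(petabytes_per_year(0.1))   # 1.0  -> ~1 PB/year at the low end of the rate
print(petabytes_per_year(1.0))   # 10.0 -> ~10 PB/year at the high end
# Summed over the experiments, rates in this range are consistent with the
# 5-8 PetaBytes/year accumulation quoted above.
```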
Computing Fabric at CERN (2005)
[Diagram: thousands of CPU boxes on a farm network, thousands of disks on a storage network, hundreds of tape drives, LAN-WAN routers and the real-time detector data feed, with link data rates marked in Gbps, from well under 1 Gbps up to 960 Gbps.]
Simulated Collision in the ATLAS Detector
Complex Data = More CPU Per Byte
[Chart: estimated CPU capacity required at CERN, in K SI95, by year from 1998 to 2010 (vertical axis 0 to 5,000). The LHC requirement rises steeply on top of that of the other experiments; annotations mark “Jan 2000: 3.5K SI95” and a Moore’s-law reference, i.e. some measure of the capacity that technology advances provide for a constant number of processors or investment.]
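To make the comparison in the chart concrete, a minimal sketch with purely illustrative numbers: the starting capacity, the 18-month doubling time and the six-year horizon are assumptions, not values read off the chart.

```python
# Illustrative only: the capacity a constant annual investment buys if
# price/performance doubles every 18 months (one reading of Moore's law).
# Starting capacity and horizon are assumed numbers, not chart values.

def capacity_at_constant_cost(start, years, doubling_time_years=1.5):
    return start * 2 ** (years / doubling_time_years)

start = 3.5                      # e.g. K SI95 installed at the start
for year in range(0, 7):
    print(year, round(capacity_at_constant_cost(start, year), 1))
# After six years the constant-cost curve has grown roughly 16x; the chart
# on the slide sets a curve of this kind against the estimated LHC requirement.
```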
Data Analysis: unpredictable requests require intensive computation on huge data flows
[Diagram: the analysis chain runs from the Detector through the Event Filter (selection & reconstruction, 35K SI95) to raw data recording (0.1 to 1 GB/sec, 1 PB/year), then Event Reconstruction (250K SI95) and Event Simulation feeding the Event Summary Data, Batch Physics Analysis (350K SI95) producing analysis objects (processed data), and Interactive Data Analysis by thousands of scientists. Further figures on the diagram: 1-100 GB/sec, ~200 MB/sec, ~100 MB/sec, 64 GB/sec, 500 TB and 200 TB/year.]
CERN's Users in the World
Europe: 267 institutes, 4603 users
Elsewhere: 208 institutes, 1632 users
Solution (i)
Large Scale Computing Fabrics
Long experience in HEP with large clusters – processors, disk farms, mass storage
reliable, manageable, flexible growth
Applications adapted to a well established computing model
Currently using thousands of simple PCs, IDE disk servers, Ethernet
Everything is using commodity components, tape storage excepted
New developments needed to scale these up by an order of magnitude to tens of thousands of components
maintaining reliability and availability targets
containing management costs
Terabit switches
New levels of management automation – installation, monitoring, auto-diagnosing, self-healing
Large Scale Computing Fabrics (cont)
But the requirements are greater than can be satisfied at a single site
political/financial arguments against very large facilities
national constraints from funding organisations
exploiting existing computing centre infrastructure
Compare with the geographical distribution of super-computing centres
Solution (ii)
Regional Centres - a Multi-tier Model
[Diagram: a multi-tier hierarchy with CERN at the centre; Tier 1 regional centres (FermiLab in the USA, Rutherford in the UK, IN2P3/Lyon in France, CNAF/Bologna in Italy, NIKHEF in the Netherlands); Tier 2 centres, laboratories and universities; physics department clusters and desktops below them. Annotation: is this usable? manageable?]
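Purely as an illustration of the hierarchy sketched above, the tier structure can be written down as a small data structure. The site names come from the slide; the structure, field names and lookup function are assumptions, not a DataGRID data model.

```python
# Illustrative encoding of the multi-tier Regional Centre model above.
# Site names are from the slide; the structure itself is an assumption.

regional_centre_model = {
    "CERN": {                       # central site at the top of the hierarchy
        "tier1": {
            "USA": "FermiLab",
            "UK": "Rutherford",
            "France": "IN2P3/Lyon",
            "Italy": "CNAF/Bologna",
            "NL": "NIKHEF",
        },
        # Tier 2 centres, physics-department clusters and desktops attach
        # to their regional Tier 1 centre in the same way.
    }
}

def tier1_for(country):
    """Return the Tier 1 regional centre serving a given country (illustrative)."""
    return regional_centre_model["CERN"]["tier1"][country]

print(tier1_for("UK"))   # Rutherford
```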
The Promise of Grid Technology
What does the Grid do for you?
you submit your work
and the Grid
Finds convenient places for it to be run
Optimises use of the widely dispersed resources
Organises efficient access to your data
Caching, migration, replication
Deals with authentication to the different sites that you will be using
Interfaces to local site resource allocation mechanisms, policies
Runs your jobs
Monitors progress
Recovers from problems
.. and .. Tells you when your work is complete
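As a sketch of what that list means in practice, the job lifecycle can be mocked up as follows. Every name here is hypothetical; this is not a Globus or DataGRID interface, just the steps on the slide expressed in code.

```python
# Hypothetical walk through the lifecycle described above (illustrative only).
from dataclasses import dataclass, field

@dataclass
class GridJob:
    executable: str
    input_datasets: list
    status: str = "submitted"
    log: list = field(default_factory=list)

def run_on_grid(job, sites):
    site = min(sites, key=lambda s: s["load"])                    # find a convenient place to run
    job.log.append(f"scheduled at {site['name']}")                # optimise use of dispersed resources
    job.log.append("input replicas staged to local cache")        # caching / migration / replication
    job.log.append(f"authenticated to {site['name']}")            # per-site authentication
    job.log.append("slot obtained via local allocation policy")   # local mechanisms, policies
    job.status = "running"                                        # runs your jobs
    job.log.append("progress monitored; failed steps retried")    # monitoring, recovery
    job.status = "done"
    job.log.append("user notified of completion")                 # tells you when work is complete
    return job

job = run_on_grid(
    GridJob("reco.exe", ["lfn:run-000123/raw"]),
    sites=[{"name": "CERN", "load": 0.9}, {"name": "IN2P3/Lyon", "load": 0.4}],
)
print(job.status, job.log)
```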
The DataGRID Project
www.eu-datagrid.org
DataGRID Partners
Managing partners
UK: PPARC
Italy: INFN
France: CNRS
Holland: NIKHEF
Italy: ESA/ESRIN
CERN: project management (Fabrizio Gagliardi)
Industry
IBM (UK), Communications & Systems (F), Datamat (I)
Associate partners
Helsinki Institute of Physics & CSC (Finland),
Swedish Natural Science Research Council (Parallelldatorcentrum–KTH, Karolinska Institute),
Istituto Trentino di Cultura,
Zuse Institut Berlin,
University of Heidelberg,
CEA/DAPNIA (F),
IFAE Barcelona,
CNR (I),
CESNET (CZ),
KNMI (NL),
SARA (NL),
SZTAKI (HU)
The Data Grid Project - Summary
European dimension
EC funding 3 years, ~10M Euro
Closely coupled to several national initiatives
Multi-science
Technology leverage –
Globus, Condor, HEP farming & MSS, Monarc, INFN-Grid, Géant
Emphasis –
Data – Scaling - Reliability
Rapid deployment of working prototypes - production quality
Collaboration with other European and US projects
Status –
Started 1 January 2001
Testbed 1 scheduled for operation at end of year
Open –
Open-source and communication
Global GRID Forum
Industry and Research Forum
DataGRID Challenges
Data
Scaling
Reliability
Manageability
Usability
Programme of work
Middleware - starting with a firm base in the Globus toolkit
Grid Workload Management, Data Management, Monitoring services
Fabric
Fully automated Local Computing Fabric management
Mass Storage
Production quality testbed
Testbed Integration & Network Services
> 40 sites
Géant infrastructure
Scientific Applications
Earth Observation
Biology
High Energy Physics
Grid Middleware
Building on an existing framework (Globus)
workload management
The workload is chaotic – unpredictable job arrival rates, data access patterns
The goal is maximising the global system throughput (events processed per second)
Start with Condor Class-Ads (a matchmaking sketch follows below)
Current issues
Declaration of data requirements at job submission time
The application discovers the objects it requires during execution
mapping of objects to the files managed by the Grid
Decomposition of jobs (e.g. moving jobs to where the data is)
Interactive workloads
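The Condor ClassAd mechanism mentioned above matches a job's requirement expression against advertisements published by resources. A minimal Python sketch of that matchmaking idea: the attribute names, the resources and the rank function are illustrative assumptions, not real ClassAd syntax.

```python
# Minimal sketch of ClassAd-style matchmaking: the job and the resources
# both advertise attributes, and the job's requirement predicate is
# evaluated against each resource advertisement. Names are illustrative.

job_ad = {
    "owner": "physicist",
    "requested_dataset": "esd/run-000123",
    # Requirement: the site must hold the dataset and have free CPU slots.
    "requirements": lambda res: (res["free_cpus"] > 0 and
                                 "esd/run-000123" in res["cached_datasets"]),
    "rank": lambda res: res["free_cpus"],   # prefer the least loaded match
}

resource_ads = [
    {"name": "cern-farm", "free_cpus": 12, "cached_datasets": {"esd/run-000123"}},
    {"name": "lyon-farm", "free_cpus": 80, "cached_datasets": set()},
    {"name": "cnaf-farm", "free_cpus": 30, "cached_datasets": {"esd/run-000123"}},
]

matches = [r for r in resource_ads if job_ad["requirements"](r)]
best = max(matches, key=job_ad["rank"]) if matches else None
print(best["name"] if best else "no match")   # cnaf-farm
```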
Data Management & Application Monitoring
data management
Management of petabyte-scale data volumes, in an environment with limited network bandwidth and heavy use of mass storage (tape)
Caching, replication, synchronisation
Support for object database model
application monitoring
Tens of thousands of components, thousands of jobs and individual users
End-user - tracking of the progress of jobs and aggregates of jobs
Understanding application and grid level performance
Administrator – understanding which global-level applications were affected by failures, and whether and how to recover
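A minimal sketch of the replica-selection decision that caching and replication imply. The catalogue contents, the logical file name and the cost model are illustrative assumptions, not the DataGRID replica manager.

```python
# Illustrative replica selection: given a logical file name, pick the
# replica that is cheapest to access. Entries and costs are assumptions.

replica_catalogue = {
    "lfn:higgs-candidates-2001": [
        {"site": "CERN",       "medium": "tape", "wan_bandwidth_mbps": None},
        {"site": "IN2P3/Lyon", "medium": "disk", "wan_bandwidth_mbps": 155},
        {"site": "RAL",        "medium": "disk", "wan_bandwidth_mbps": 622},
    ]
}

def access_cost(replica, local_site):
    """Lower is better: local disk, then remote disk by bandwidth, then tape."""
    if replica["site"] == local_site:
        return 0
    if replica["medium"] == "tape":
        return 1000          # tape recall is the most expensive option
    return 100 / replica["wan_bandwidth_mbps"]

def best_replica(lfn, local_site):
    return min(replica_catalogue[lfn], key=lambda r: access_cost(r, local_site))

print(best_replica("lfn:higgs-candidates-2001", "RAL"))   # the local disk copy wins
```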
Fabric Management
Local fabric –
Effective local site management of giant computing fabrics
Automated installation, configuration management, system maintenance
Automated monitoring and error recovery - resilience, self-healing
Performance monitoring
Characterisation, mapping, management of local Grid resources
Mass storage management
multi-PetaByte data storage
Expect tapes to be used only for archiving data
“real-time” data recording requirement
active tape layer – 1,000s of users
uniform mass storage interface
exchange of data and meta-data between mass storage systems
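A minimal sketch of the automated monitor-and-recover loop that this kind of fabric management implies. The node model, health check and recovery actions are illustrative assumptions, not the DataGRID fabric tools.

```python
# Illustrative monitor-and-recover loop for a large fabric of nodes.
# The checks and escalation policy are assumptions; they only illustrate
# the automation (monitoring, auto-diagnosing, self-healing) described above.
import random

def check_health(node):
    """Stand-in for real sensors (daemon liveness, disk errors, load)."""
    return random.random() > 0.05          # ~5% of checks report a fault

def recover(node, failures):
    """Escalating, hands-off recovery: restart services, then reinstall."""
    if failures < 3:
        return f"{node}: restarted services"
    return f"{node}: scheduled for automated reinstall and drained from the farm"

def monitoring_pass(nodes, failure_counts):
    actions = []
    for node in nodes:
        if check_health(node):
            failure_counts[node] = 0       # healthy: reset its failure count
        else:
            failure_counts[node] = failure_counts.get(node, 0) + 1
            actions.append(recover(node, failure_counts[node]))
    return actions

nodes = [f"lxbatch{n:04d}" for n in range(1, 1001)]   # a thousand-node farm
print(monitoring_pass(nodes, {}))
```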
Infrastructure
Operate a production quality trans-European “testbed”
Interconnecting clusters in about 40 sites
Integrate, build and operate successive releases of the project middleware
Negotiate and manage the network infrastructure
Initially TEN-155, migrating to Géant
Demonstrations, data challenges: performance, reliability
Production environment for applications
Inter-working with US projects (GriPhyN, PPDG)
Testbed Sites (>40)
[Map of HEP and ESA testbed sites: Dubna, Moscow, Lund, RAL, Estec, KNMI, Berlin, Prague, Brno, IPSL, Paris, Lyon, Grenoble, Marseille, Santander, Madrid, Barcelona, Valencia, Lisboa, Milano, Torino, PD-LNL, BO-CNAF, Pisa, Roma, ESRIN, Catania and CERN.]
Applications
HEP
The four LHC experiments
Live proof-of-concept prototype of the Regional Centre model
Earth Observation
ESA-ESRIN
KNMI (Dutch meteorological institute): climatology
Processing of atmospheric ozone data derived from ERS GOME and ENVISAT SCIAMACHY sensors
Biology
CNRS (France), Karolinska (Sweden)
Application being defined
Challenges
Large, diverse, dispersed project
but coordinating this European activity is one of the project’s
raisons d’être
Collaboration, convergence with US and other Grid activities – this area is very dynamic
Organising adequate network bandwidth –
a vital ingredient for the success of a Grid
Keeping the feet on the ground –
The GRID is a good idea but not the panacea suggested by some
Concluding remarks
The vision is –
easy and reliable access to very large, shared,
worldwide distributed computing facilities,
without the user having to know the details
The DataGRID project will provide –
a large (capacity & geography) working testbed
practical experience and tools that can be adapted to the
needs of a wide range of scientific and engineering applications