Status of the LCG Project


LCG
LHC Computing Grid Project
Creating a Global Virtual Computing
Centre for Particle Physics
ACAT’2002
27 June 2002
Les Robertson
IT Division, CERN
[email protected]
last update: 20/07/2015 18:34
les robertson - cern-it 1
Summary
LCG
- LCG – The LHC Computing Grid Project
  - requirements, funding, creating a Grid
- Areas of work
  - grid technology
  - computing fabrics
  - deployment
  - operating a grid
- Plan for the LCG Global Grid Service
- A few remarks
LCG
Summary of Computing Capacity Required for all LHC Experiments in 2007
source: CERN/LHCC/2001-004 - Report of the LHC Computing Review - 20 February 2001
(ATLAS with 270 Hz trigger)

                      -------- CERN --------   Total Regional   Grand
                      Tier 0   Tier 1   Total     Centres       Total
Processing (K SI95)    1,727      832   2,559       4,974       7,533
Disk (PB)                1.2      1.2     2.4         8.7        11.1
Magnetic tape (PB)      16.3      1.2    17.6        20.3        37.9
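As a quick sanity check, the column totals can be recomputed from the per-tier figures quoted above (the dictionary keys are just labels for the three rows; the tape row matches the quoted totals only to the source's own rounding):

```python
# Per-tier capacity figures as quoted on the slide (CERN/LHCC/2001-004).
cern_tier0 = {"cpu_ksi95": 1727, "disk_pb": 1.2, "tape_pb": 16.3}
cern_tier1 = {"cpu_ksi95": 832,  "disk_pb": 1.2, "tape_pb": 1.2}
regional   = {"cpu_ksi95": 4974, "disk_pb": 8.7, "tape_pb": 20.3}

# Column totals, rebuilt key by key.
cern_total  = {k: cern_tier0[k] + cern_tier1[k] for k in cern_tier0}
grand_total = {k: cern_total[k] + regional[k]   for k in cern_total}

print(cern_total["cpu_ksi95"], grand_total["cpu_ksi95"])  # 2559 7533
print(round(cern_total["disk_pb"], 1), round(grand_total["disk_pb"], 1))  # 2.4 11.1
```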
Funding dictates –
- Worldwide distributed computing system
- Small fraction of the analysis at CERN
- Batch analysis – using 12-20 large regional centres
  - how to use the resources efficiently
  - establishing and maintaining a uniform physics environment
- Data exchange and interactive analysis involving tens of smaller regional centres, universities, labs
Summary - Project Goals
LCG
Goal – Prepare and deploy the LHC computing environment
- applications – tools, frameworks, environment, persistency
- computing system  global grid service
  - cluster  automated fabric
  - collaborating computer centres  grid
  - CERN-centric analysis  global analysis environment
This is not another grid technology project –
it is a grid deployment project
LCG
Two Phases
The first phase of the project – 2002-2005
- preparing the prototype computing environment, including
  - support for applications – libraries, tools, frameworks, common developments, …
  - global grid computing service
- funded by Regional Centres, CERN, special contributions to CERN by member and observer states, middleware developments by national and regional Grid projects
  - manpower OK
  - hardware at CERN – ~40% funded

Phase 2 – construction and operation of the initial LHC Computing Service – 2005-2007
- at CERN – missing funding of ~80M CHF
Funding
LCG
- Funding agencies have little enthusiasm for investing more in particle physics
- HEP seen as a ground-breaker in computing
  - initiator of the Web
  - track record of exploiting leading-edge computing
  - effective global collaborations
  - real need – for data as well as computation
  - one of the few application areas with real cross-border data needs
- LHC in sync with
  - emergence of Grid technology
  - explosion of network bandwidth

We must deliver on Phase 1 for LHC and show the relevance for other sciences
Building a Grid
LCG
[Diagram: a computing centre cluster – application servers and a data cache, backed by mass storage, connected to the WAN]
LCG
Cluster  Fabric
- autonomic computing
- automated management: installation, configuration, maintenance, monitoring, error recovery, …
- reliability
- cost containment
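The "automated fabric" idea above can be sketched as a desired-state loop: each node is compared against a target configuration and drift is corrected without operator action. The node attributes and values below are invented for illustration, not LCG tooling:

```python
# Toy sketch of automated fabric management: compute the corrective
# actions needed to bring a (hypothetical) node to the desired state.
desired = {"os": "linux-2.4", "batch": "lsf", "monitor": "on"}

def converge(node_state):
    """Return the settings that must be (re)applied to this node."""
    return {key: want for key, want in desired.items()
            if node_state.get(key) != want}

drifted = {"os": "linux-2.4", "batch": "pbs"}  # invented drifted node
print(converge(drifted))  # {'batch': 'lsf', 'monitor': 'on'}
```

Run repeatedly, such a loop gives exactly the reliability and cost-containment benefits the slide lists: errors are repaired the same way installations are performed.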
The MONARC Multi-Tier Model (1999)
LCG
[Diagram: CERN as Tier 0 (recording, reconstruction); Tier 1 centres offering full service, e.g. IN2P3, FNAL, RAL; Tier 2 centres at universities and labs (Uni n, Lab a, Uni b, Lab c); department and desktop resources below]
MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html
LCG
Building a Grid
[Diagram: collaborating computer centres]
LCG
Building a Grid – the virtual LHC Computing Centre
[Diagram: the collaborating computer centres joined into a grid, partitioned into virtual organisations such as the Alice VO and the CMS VO]
LCG
Virtual Computing Centre
The user sees the image of a single cluster, and does not need to know
- where the data is
- where the processing capacity is
- how things are interconnected
- the details of the different hardware
and is not concerned by the conflicting policies of the equipment owners and managers
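This location transparency is, at heart, a matchmaking problem: the user states what the job needs, and the grid decides where it runs. A minimal sketch of that idea, with invented site names, dataset labels and a deliberately naive placement rule (not LCG software):

```python
# Hypothetical location-transparent job placement: a broker picks a
# suitable site; the user never sees the choice. All site data invented.
sites = [
    {"name": "cern", "free_cpus": 40,  "datasets": {"raw-2007"}},
    {"name": "fnal", "free_cpus": 200, "datasets": {"esd-2007"}},
    {"name": "ral",  "free_cpus": 15,  "datasets": {"esd-2007", "aod-2007"}},
]

def broker(dataset, cpus_needed):
    """Pick a site holding the dataset with enough free CPUs,
    preferring the site with the most spare capacity."""
    candidates = [s for s in sites
                  if dataset in s["datasets"] and s["free_cpus"] >= cpus_needed]
    if not candidates:
        raise RuntimeError("no matching site")
    return max(candidates, key=lambda s: s["free_cpus"])["name"]

print(broker("esd-2007", 50))  # fnal
```

The conflicting owner policies the slide mentions would enter as extra constraints in the candidate filter.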
Project Implementation Organisation
LCG
Four areas
- Applications (see Matthias Kasemann’s presentation)
- Grid Technology
- Fabrics
- Grid deployment
Grid Technology Area
Leveraging Grid R&D Projects
LCG
- significant R&D funding for Grid middleware
- risk of divergence – and is that good or bad?
  - global grids need standards
  - useful grids need stability
  - hard to do this in the current state of maturity
  - will we recognise and be willing to migrate to the winning solutions?
- Many national, regional Grid projects – GridPP (UK), INFN-grid (I), NorduGrid, Dutch Grid, …; European projects; US projects
LCG
Grid Technology Area
Ensuring that the appropriate middleware is available
- Supplied and maintained by the “Grid projects”
- It is proving hard to get the first “production” data-intensive grids going as user services
- Can the grid projects provide long-term support and maintenance?
- Trade-off between new functionality and stability
LCG
The Trans-Atlantic Issue
Bridging the Atlantic is essential for the project
- HICB – High Energy and Nuclear Physics Intergrid Collaboration Board
- GLUE – Grid Laboratory Uniform Environment
  - compatible middleware and infrastructure
  - funded by DataTAG and iVDGL
  - Certificates – OK
  - Schemas – under way, working with the wider Globus world, getting complicated – probably OK
  - Middleware components – not yet clear, but close collaboration on
    - file replication
    - job scheduling
LCG
Collaboration with Grid Projects
- LCG must deploy a GLOBAL GRID
  - essential to have compatible middleware & grid infrastructure
  - better – have identical middleware
- We are banking on GLUE, but we have to make some choices towards the end of the year
- Services are about stability, support, maintenance
  - can the R&D grid projects take commitments for long-term maintenance of their middleware?
Scope of Fabric Area
LCG
- Tier 1, 2 centre collaboration
- Grid-Fabric integration middleware (DataGrid WP4)
- Automated systems management package
- Technology assessment (PASTA III) started
- CERN Tier 0+1 centre
LCG
Grid Deployment Area
- The aim is to build
  - a general computing service
  - for a very large user population
  - of independently-minded scientists
  - using a large number of independently managed sites
- This is NOT a collection of sites providing pre-defined services
  - it is the user’s job that defines the service
  - it is current research interests that define the workload
  - it is the workload that defines the data distribution
DEMAND – Unpredictable & Chaotic
But the SERVICE had better be Available & Reliable
LCG
Grid Deployment – current status
- Experiments can do (and are doing) their event production using distributed resources with a variety of solutions
  - classic distributed production – send jobs to specific sites, simple bookkeeping
  - some use of Globus, and some of the HEP Grid tools
  - other integrated solutions (AliEn)
- The hard problem for distributed computing is data analysis – ESD and AOD
  - chaotic workload
  - unpredictable data access patterns
  - this is where new Grid technology is needed – resource broker, replica management, …
  - this is the problem that the LCG has to solve
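Replica management, mentioned above, can be sketched as a catalogue that maps a logical file name to its physical copies, from which a job reads the "closest" one. The file names, storage prefixes and cost table below are purely illustrative:

```python
# Illustrative replica catalogue: one logical file name (LFN) maps to
# physical replicas at several sites. All names and costs are invented.
catalog = {
    "lfn:/lcg/aod/run1234": [
        ("cern", "castor:/cern/aod/run1234"),
        ("fnal", "enstore:/fnal/aod/run1234"),
    ],
}

# Invented network-cost table: lower means "closer" to the running job.
cost_from = {"cern": {"cern": 0, "fnal": 5},
             "fnal": {"cern": 5, "fnal": 0}}

def best_replica(lfn, job_site):
    """Pick the physical replica with the lowest network cost to job_site."""
    site, pfn = min(catalog[lfn], key=lambda r: cost_from[job_site][r[0]])
    return pfn

print(best_replica("lfn:/lcg/aod/run1234", "fnal"))  # enstore:/fnal/aod/run1234
```

With unpredictable access patterns, the same mechanism also says where new replicas are worth creating: wherever the chosen copy is repeatedly expensive to reach.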
Grid Operation
LCG
[Diagram: a user raises queries with local user support; local operation and grid operations exchange monitoring & alarms and corrective actions, linking the local site, a Call Centre, a Grid Operations Centre, a Network Operations Centre, the Virtual Organisation, the Grid information service, and the Grid logging & bookkeeping service]
LCG
Grid Operation
- We do not know how to do this
- Probably nobody knows –
  - looks like network operation, but there are many more variables to be watched and adjusted
  - looks like multi-national commercial systems, but we have no central ownership, control
- A 24-hour service is needed – round the clock and round the world
Setting up the LHC Global Grid Service
LCG
- First data is in 2007
- LCG must learn from current solutions, leverage the tools coming from the grid projects, show that grids are useful – but set realistic targets
  - short term (this year):
    - use current solutions for physics data challenges (event productions)
    - consolidate (stabilise, maintain) middleware
    - learn what a “production grid” really means by working with DataGrid and VDT
  - medium term (next year):
    - set up a reliable global grid service – initially only a few larger centres, but on three continents
    - stabilise it
    - several times the capacity of the CERN facility, and as easy to use
LCG
Having stabilised this base service –
showing that we can run a solid service for the experiments –
then, progressive evolution:
- integrate all of the Regional Centre resources provided for LHC
- improve quality, reliability, predictability
- integrate new middleware functionality – possibly once per year
- migrate to de facto standards as soon as they emerge
LCG
Final comments
- It is not just about distributing computation, it is also about managing distributed data (lots of it!) and maintaining a single view of the environment
- All these parallel developments, rapidly changing technology … may be good in the long term, but we must deploy a global grid service next year
- A dependable, reliable 24 x 7 service is essential – and not so easy to do with all these sites and all that data
- Service Quality is the Key to Acceptance of Grids
- Reliable OPERATION will be the factor that limits the size of practical Grids
- We are getting funding because of the relevance for other sciences, engineering, business – so keeping things general and main-line must remain a high priority