

WP8 Status and Plans

F Harris (Oxford/CERN)

16 Sep 2002, GridPP, Imperial College

Outline of presentation

• Overview of experiment plans for use of Grid facilities/services for tests and data challenges

– ATLAS
– ALICE
– CMS
– LHCb
– BaBar
– D0

• Status of ATLAS/EDG Task Force work
• Essential requirements for making 1.2.n usable by broader physics user community
• Future activities of WP8 and some questions
• Summary


ATLAS

• Currently in the middle of Phase 1 of DC1 (Geant3 simulation, Athena reconstruction, analysis). Many sites in Europe, US, Australia, Canada, Japan, Taiwan, Israel and Russia are involved
• Phase 2 of DC1 will begin Oct-Nov 2002 using a new event model
• Plans for use of Grid tools in DCs
  – Phase 1: ATLAS-EDG Task Force to repeat with EDG 1.2 ~1% of the simulations already done
    • Using CERN, CNAF, Nikhef, RAL, Lyon
    • 9 GB input, 100 GB output, 2000 CPU hrs
  – Phase 2 will make larger use of Grid tools. Maybe different sites will use different tools. There will be (many?) more sites. This is to be defined Sep 16-20
    • ~10^6 CPU hrs, 20 TB input to reconstruction, 5 TB output (? How much on the testbed?)

ALICE

• ALICE assume that as soon as a stable version of 1.2.n is tested and validated it will be progressively installed on ‘all’ EDG testbed sites
• As new sites come online, will use an automatic tool for submission of test jobs of increasing output size and duration
• At the moment do not plan a “data challenge” with EDG. However, plan a data transfer test, as close as possible to the expected data transfer rate for a real production and analysis
• Will concentrate on the AliEn/EDG interface and on the AliRoot/EDG interface, in particular for items concerning Data Management
• Will use CERN, CNAF, Nikhef, Lyon, Turin, Catania for first tests
• CPU and storage requirements can be tailored to availability of facilities in the testbed – but will need some scheduling and priorities


CMS

• Currently running production for the DAQ Technical Design Report (TDR). Requires the full chain of CMS software and production tools. This includes use of Objectivity (licensing problem in hand..)
• 5% Data Challenge (DC04) will start Summer 2003 and will last ~7 months. This will produce 5*10^7 events. In the last month all data will be reconstructed and distributed to Tier1/2 centres for analysis
  – 1000 CPUs for 5 months, 100 TB output (LCG prototype)
• Use of GRID tools and facilities
  – Will not be used for current production
  – Plan to use in DC04 production
  – EDG 1.2 will be used to make scale and performance tests (proof of concept). Tests on RB, RC and GDMP. Will need Objectivity for tests
    • IC, RAL, CNAF/BO, Padova, CERN, Nikhef, IN2P3, Ecole Poly, ITEP
    • Some sites will do EDT + GLUE tests
    • CPU ~50 CPUs distributed, Store ~200 GB per site
  – V2 (has functionality required by CMS) will be necessary for DC04 starting summer 2003

LHCb

• First intensive Data Challenge starts Oct 2002
  – currently doing intensive pre-tests at all sites
• Participating sites for 2002
  – CERN, Lyon, Bologna, Nikhef, RAL
  – Bristol, Cambridge, Edinburgh, Imperial, Oxford, ITEP Moscow, Rio de Janeiro
• Use of EDG Testbed
  – Install latest OO environment on testbed sites. Flexible job submission Grid/non-Grid
  – First tests (now) for MC + reconstruction + analysis with data stored to Mass Store
  – Large scale production tests (by October)
  – Production (if tests OK)
• Aim to do a percentage of production on the Testbed (10% should be OK on the testbed?)
• Total requirement is 500 CPUs for 2 months + ~10 TB

BaBar Grid and EDG

• Target: have some production environment ready for all users by the end of this year
  – with attractive interface tools
  – customised to the SLAC site
• Have implemented ‘local hacks’ to overcome problems with
  – use of the LSF Batch Scheduler (uses AFS)
  – AFS File System used for User Home Directories
  – Batch Workers located inside the IFZ (security issue)
• Three parts of the Globus/EDG software were installed at SLAC: CE, WN and UI
• The exercise clearly showed that they all run fine together, and also with the RB at IC
• Had problems with the ‘old’ version of the RB. Problems should largely go away with the latest version
• BaBar now have D. Boutigny on WP8/TWG

D0 (Nikhef)

• Have already run many events on the testbeds of NIKHEF and SARA
• Wish to extend tests to the whole testbed
• D0 RPMs are already in the EDG releases and will be installed on all sites
• Will set up a special VO and RC for D0 at NIKHEF on a rather short time scale
• Jeff Templon, NIKHEF rep. in WP8, will report on this work

ATLAS-EDG task force: members and sympathizers

ATLAS: Jean-Jacques Blaising, Frederic Brochu, Alessandro De Salvo, Michael Gardner, Luc Goossens, Marcus Hardt, Roger Jones, Christos Kanellopoulos, Guido Negri, Fairouz Ohlsson-Malek, Steve O'Neale, Laura Perini, Gilbert Poulard, Alois Putzer, Di Qing, David Rebatto, Zhongliang Ren, Silvia Resconi, Oxana Smirnova, Stan Thompson, Luca Vaccarossa

EDG: Ingo Augustin, Stephen Burke, Frank Harris, Bob Jones, Emanuele Leonardi, Mario Reale, Markus Schulz, Jeffrey Templon

Achievements so far

(see http://s.home.cern.ch/s/smirnova/www/atlas-edg/)

• A team of hard-working people across Europe in Atlas and EDG (middleware + WP6 + WP8) has been set up (led by O Smirnova with help from R Jones and F Harris)
• ATLAS software (release 3.2.1) is packed into relocatable RPMs, distributed and validated elsewhere (a minimal installation sketch follows at the end of this slide)

• Following the fix of the GASS Cache problem in EDG, 50% of the planned challenge has been performed (5 researchers × 10 jobs)
  – only the CERN testbed was fully available to start, but this is changing fast
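To make the relocatable-RPM point concrete, below is a minimal sketch of installing such kits into a private prefix at a receiving site. The package names and prefix are invented for illustration; the only mechanism assumed is the generic `rpm --install --prefix` relocation, not the actual ATLAS 3.2.1 kit layout.

```python
"""Minimal sketch: install relocatable RPM kits into a user-chosen prefix.
Package names and the prefix are hypothetical; the only assumption is that
the kits were built as relocatable, so 'rpm --install --prefix' can move them.
(A non-root install would also need a private --dbpath.)"""
import subprocess

PREFIX = "/data/atlas/3.2.1"              # hypothetical installation prefix
KITS = [
    "atlas-offline-3.2.1-1.i386.rpm",     # hypothetical package names
    "atlas-external-3.2.1-1.i386.rpm",
]

for kit in KITS:
    # --prefix relocates the payload; the command fails if the package is not relocatable
    subprocess.run(["rpm", "--install", "--prefix", PREFIX, kit], check=True)

print(f"installed {len(KITS)} kits under {PREFIX}")
```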

In progress:

• New set of challenges, including smaller input files
• Presentation and first results: Luc Goossens
• All the core Testbed sites (1.2.2) + FZK are becoming available => the rest of the challenge has a chance to be really distributed
• Big file replication can be done, avoiding GDMP & Replica Manager
• With distributed input files, several jobs have already been steered by the RB to NIKHEF, following the requested input data; the rest of the batch went to CERN (see the sketch after this list)
• Report in preparation
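For readers unfamiliar with how the RB "follows" the input data: the job lists its logical input files in its JDL, and the broker matches them against the replica catalogue when choosing a site. The sketch below shows the idea in present-day Python; the JDL attribute names and the dg-job-submit command are recalled from the EDG 1.x era and should be treated as assumptions, and the script, file and catalogue names are invented.

```python
"""Sketch of data-driven job steering: the JDL lists the job's logical input
files and the Resource Broker matches them against the replica catalogue when
choosing a site. The attribute names and the dg-job-submit command are
EDG 1.x-era assumptions; the script, file and catalogue names are invented."""
import subprocess

# "atlsim.sh" is a hypothetical wrapper script; the InputData entry is a
# logical file name whose only replica (in this example) sits at NIKHEF.
jdl = """\
Executable     = "atlsim.sh";
InputSandbox   = {"atlsim.sh"};
StdOutput      = "atlsim.out";
StdError       = "atlsim.err";
OutputSandbox  = {"atlsim.out", "atlsim.err"};
InputData      = {"LF:dc1.002000.simul.0001.zebra"};
ReplicaCatalog = "ldap://rc.example.org:9011/lc=ATLAS,dc=example,dc=org";
"""

with open("atlsim.jdl", "w") as f:
    f.write(jdl)

# The RB resolves InputData via the catalogue and prefers CEs close to the
# replicas, so jobs whose data sit only at NIKHEF are steered there.
subprocess.run(["dg-job-submit", "atlsim.jdl"], check=True)
```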


Bottom line for Task Force:

• Major obstacles:
  – GASS Cache limitations (long jobs vs frequent submission) – being worked on
  – File transfer time limit in data management tools – hopefully can be addressed soon
• Still, the ways around are known and quick fixes are deployed, allowing production-like jobs to run
• The whole EDG middleware is pretty much in a development state, and things are changing (improving!) on a daily basis


Essential requirements for making 1.2.n usable by broader physics user community

Top level requirements

• Production testbed to be stable for weeks, not hours, and to allow a spectrum of job submissions
• Reasonably easy to use basic functions for job submission, replica handling and mass storage utilisation
• Good concise user documentation for all functions
• Easy for the user to get certificates and to get into the correct VO working environment

So what happens now in today’s reality?

• Having had very positive discussions at Budapest in joint meetings with Workpackages 1+2+5+6
• ‘gass-cache’ and ‘20 min file limit’ problems are absolute top priority – being pursued with ‘patches’ right now. Let’s hope we don’t need a new version of Globus!
• ‘Wrap’ data management complexity while waiting for version 2 (GDMP is too complex for the ‘average’ user) – trying out an interim RM for single files (a sketch of such a wrapper follows at the end of this slide)
• We need to clarify the use of mass store data (Castor, HPSS, RAL store) by multiple VOs
  – e.g. how is the store partitioned between VOs, and how does a non-Grid user access it?
  – Discussions ongoing and interim solutions being worked on
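As a concrete picture of the "wrapping" proposed above, the sketch below hides a single-file copy-and-register behind one call. globus-url-copy is the standard GridFTP client of the period; the registration step is left as a stub because the interim Replica Manager interface is not specified in these slides, and all host and path names are invented.

```python
"""Sketch of a thin wrapper hiding single-file data management behind one
call, as an alternative to driving GDMP directly. globus-url-copy is the
standard GridFTP client; the registration step is a stub since the interim
RM interface is not given in the slides. Hosts and paths are invented."""
import subprocess

def copy_and_register(local_path, se_host, se_path, logical_name):
    """Copy one local file to a Storage Element, then register the replica."""
    src = "file://" + local_path                 # local_path must be absolute
    dst = f"gsiftp://{se_host}{se_path}"
    subprocess.run(["globus-url-copy", src, dst], check=True)
    # Registration stub: replace with whatever the interim single-file RM
    # provides for mapping logical_name -> dst in the replica catalogue.
    print(f"TODO: register {logical_name} -> {dst}")

# Hypothetical usage:
# copy_and_register("/data/run1.zebra", "se.example.org",
#                   "/flatfiles/atlas/run1.zebra", "run1.zebra")
```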

More essential requirements on use of 1.2

• We must put people and procedures in place for mapping VO organisation onto testbed sites (e.g. quotas, priorities)
• We must clarify user support at sites (middleware + applications)
• Installation of applications software should not be combined with the system installation
• Authentication & authorisation
  – Can we streamline this procedure? (40-odd countries to accommodate for Atlas!) A sketch of the kind of automation implied follows at the end of this slide
• Documentation (+ Training – EDG tutorials for experiments)
  – Has to be user-oriented and concise
  – Much good work going on here (user guide + examples). About to be released
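On the authentication and authorisation point, one concrete form of streamlining is generating grid-mapfile entries from a VO membership list, so each certificate DN is mapped onto a local account without per-user manual steps. The DNs and account name below are invented; real EDG sites typically drove this from the VO's LDAP server (e.g. with edg-mkgridmap) rather than from a hand-kept list.

```python
"""Sketch only: turn a VO membership list into grid-mapfile entries so each
certificate DN maps to a local account. DNs and the account are invented;
real sites generated this from the VO LDAP server (e.g. edg-mkgridmap)."""

vo_members = [                                   # hypothetical certificate DNs
    "/O=Grid/O=UKHEP/OU=hep.ox.ac.uk/CN=Some User",
    "/C=CH/O=CERN/OU=GRID/CN=Another User",
]
local_account = "atlas"                          # hypothetical local account for the ATLAS VO

with open("grid-mapfile", "w") as f:
    for dn in vo_members:
        f.write(f'"{dn}" {local_account}\n')     # standard '"<DN>" <account>' line format
```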

Some longer term requirements

• Job submission to take into account availability of space on SEs and the quota assigned to the ‘user’ (e.g. for macro-jobs, say 500 each generating 1 GB) – an illustrative sketch follows at the end of this slide
• Mass Store should be on the Grid in a transparent way (space management, archiving, staging)
• Need an ‘easy to use’ replica management system
• Comments
  – Are some of these ‘1.2.n’ rather than ‘2’, i.e. increments in functionality in successive releases?
  – Task Force people should maintain a continuing dialogue with developers (should include data challenge managers from all VOs in the dialogue)
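The space- and quota-aware submission requirement can be pictured with a small check like the one below, run before placing a macro-job. The SE names, free-space figures and quota are invented for illustration, and no EDG interface is assumed.

```python
"""Illustrative only: the kind of check a space- and quota-aware submission
tool could make before placing a macro-job (e.g. 500 jobs x 1 GB each).
SE names, free space and the quota are invented; no EDG API is assumed."""

FREE_SPACE_GB = {              # hypothetical Storage Elements and their free space
    "se.cern.example": 300,
    "se.nikhef.example": 800,
    "se.ral.example": 450,
}
USER_QUOTA_GB = 600            # hypothetical per-user quota on any single SE

def choose_se(n_jobs, gb_per_job):
    """Pick an SE whose usable space (free space capped by quota) fits the output."""
    needed = n_jobs * gb_per_job
    usable = {se: min(free, USER_QUOTA_GB) for se, free in FREE_SPACE_GB.items()}
    best = max(usable, key=usable.get)
    if usable[best] < needed:
        raise RuntimeError(f"no single SE can hold {needed} GB; split the macro-job")
    return best

print(choose_se(500, 1))       # -> se.nikhef.example (600 GB usable >= 500 GB needed)
```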

Future activities of WP8 and some questions

• The mandate of WP8 is to facilitate the interfacing of applications to EDG middleware, and to participate in the evaluation and produce the evaluation reports (start writing very soon!)
• The Loose Cannons have been heavily involved in testing middleware components, and have produced test software and documentation. This should be packaged for use by the Test Group (now strengthened and formalised)
• LCs will be involved in liaising with the experiments testing their applications. The details of how this relates to the new EDG/LCG Testing/Validation procedure have to be worked out
• WP8 have been involved in the development of ‘application use cases’ and participate in current ATF activities. This is continuing. LCG via the GDB is to carry this on in a broader sense
• We are interested in the feasibility of a ‘common application layer’ running over middleware functions. This issue goes into the domain of current LCG deliberations

Summary

• Current WP8 top priority activity is the ATLAS/EDG Task Force work
  – This has been very positive. It focuses attention on the real user problems, and as a result we review our requirements, design etc. Remember the eternal cycle! We must maintain flexibility with continuing dialogue between users and developers
• Will continue Task Force flavoured activities with the other experiments
• Current use of the Testbed is focused on the main sites (CERN, Lyon, Nikhef, CNAF, RAL) – this is mainly for reasons of support, given the unstable situation
• Once stability is achieved (see ATLAS/EDG work) we will expand to other sites. But we should be careful in the selection of these sites in the first instance. Local support would seem essential
• WP8 will maintain a role in architecture discussions, and maybe be involved in some common application layer developments
• THANKS – to members of IT and the middleware WPs for heroic efforts in past months, and to Federico for laying the WP8 foundations