The ARDA project


GridPP 10th Collaboration Meeting, CERN, 04 June 2004
The ARDA Project
Between Grid middleware and LHC experiments
http://cern.ch/arda
Juha Herrala
ARDA Project
[email protected]
www.eu-egee.org
cern.ch/lcg
EGEE is a project funded by the European Union under contract IST-2003-508833
Contents
• Introduction to the LCG ARDA Project
 History and mandate
 Relation to EGEE project
 ARDA prototypes
 Relation to Regional Centres
• Towards ARDA prototypes
 LHCb, ATLAS, ALICE, CMS
• Coordination and forum activities
 Workshops and meetings
• Conclusions and Outlook
GridPP@CERN, 04 June 2004 - 2
How ARDA evolved
• LHC Computing Grid (LCG) project’s Requirements and Technical
Assessment Group (RTAG) for distributed analysis presented their
ARDA report in November 2003.
 ARDA = Architectural Roadmap for Distributed Analysis
 Defined a set of collaborating Grid services and their interfaces
• As a result, the ARDA project was launched by LCG
 ARDA = A Realisation of Distributed Analysis
 Its purpose is to coordinate the different activities in the development of the LHC experiments’ distributed analysis systems, which will be based on the new service-oriented Grid middleware and infrastructure.
• But the generic Grid middleware is developed by the EGEE project
 ARDA sometimes also became a synonym for this “second generation” Grid middleware, which was later (May 2004) renamed to Glite.
 Generic = no significant functionality that is of interest for HEP or any other
science/community alone.
GridPP@CERN, 04 June 2004 - 3
Our starting point / mandate
Recommendations of the ARDA working group
• New service decomposition
 Strong influence of the AliEn system
• the Grid system developed by the ALICE experiment and used by a wide scientific community (not only HEP)
• Role of experience, existing technology…
 Web service framework
• Interfacing to existing middleware to enable its use in the experiment frameworks
• Early deployment of (a series of) end-to-end prototypes to ensure
functionality and coherence
 Middleware as a building block
 Validation of the design
GridPP@CERN, 04 June 2004 - 4
EGEE and LCG ARDA
• LCG strongly linked with
middleware developed/deployed
in EGEE (continuation of EDG)
• The core infrastructure of the
EGEE Grid operation service will
grow out of the LCG service
 LCG includes many US and Asian
partners
 EGEE includes other sciences
 Substantial part of infrastructure
common to both
• Parallel production lines
 LCG-2 production Grid
• 2004 data challenges
 Pre-production prototype
• EGEE/Glite MW
• ARDA playground for the LHC
experiments
[Diagram: timeline LCG-1 → LCG-2 → EGEE-1 → EGEE-2; the middleware evolves from VDT/EDG to the EGEE WS MW, with ARDA positioned at this transition]
GridPP@CERN, 04 June 2004 - 5
ARDA prototypes
• Support LHC experiments to implement their end-to-end analysis
prototypes based on the EGEE/Glite middleware
 ARDA will equally support each of the LHC experiments
 Close collaboration with data analysis teams, ensuring end-to-end
coherence of the prototypes
 One prototype per experiment
• Role of ARDA
 Interface with the EGEE middleware
 Adapt/verify components of analysis environments of the experiments
(robustness/many users, performance/concurrent “read” actions)
 A Common Application Layer may emerge in future
 Feedback from the experiments to the middleware team
• Final target beyond the prototype activity: sustainable distributed
analysis services for the four experiments deployed at LCG Regional
Centres.
GridPP@CERN, 04 June 2004 - 6
ARDA @ Regional Centres
• Regional Centres have valuable practical experience and know-how
 They understand “deployability” issues, a key factor for (EGEE/Glite) middleware success
 Data base technologies
 Web Services
• Some Regional Centres will have the responsibility to provide early
installation for the middleware
 EGEE Middleware test bed
 Pilot sites might enlarge the resources available and give fundamental
feedback in terms of “deployability” to complement the EGEE SA1
• Running ARDA pilot installations
 ARDA test bed for analysis prototypes
 Experiment data available where the experiment prototype is deployed
• Stress and performance tests could ideally be located outside CERN
 Experiment-specific components (e.g. a Meta Data catalogue) which might be used by the ARDA prototypes
 Exploit the local know-how of the Regional Centres
• Final ARDA goal: sustainable analysis service for LHC experiments
GridPP@CERN, 04 June 2004 - 7
ARDA project team
• Massimo Lamanna
• Birger Koblitz
• Andrey Demichev, Viktor Pose (Russia)
• Derek Feichtinger, Andreas Peters (ALICE)
• Wei-Long Ueng, Tao-Sheng Chen (Taiwan)
• Dietrich Liko, Frederik Orellana (ATLAS)
• Julia Andreeva, Juha Herrala (CMS)
• Andrew Maier, Kuba Moscicki (LHCb)
Experiment interfaces:
• Piergiorgio Cerello (ALICE)
• David Adams (ATLAS)
• Lucia Silvestris (CMS)
• Ulrik Egede (LHCb)
GridPP@CERN, 04 June 2004 - 8
Towards ARDA prototypes
• Existing systems as starting point
 Every experiment has different implementations of the
standard services
 Used mainly in production environments
 Now more emphasis on analysis
GridPP@CERN, 04 June 2004 - 9
Prototype activity
• Provide fast feedback to the EGEE MW development team
 Avoid uncoordinated evolution of the middleware
 Coherence between users’ expectations and final product
• Experiments may benefit from the new MW as soon as possible
 Frequent snapshots of the middleware available
 Expose the experiments (and the community in charge of the deployment)
to the current evolution of the whole system
 Experiments’ systems are very complex and still evolving
• Move forward towards new-generation real systems (analysis!)
 A lot of work (experience and useful software) is invested in the experiments’ current data challenges, which makes them a concrete starting point
 Whenever possible, adapt/complete/refactor the existing components: we do not need yet another system!
• Attract and involve users
 Prototypes with realistic workload and conditions, thus real users from LHC
experiments required!
GridPP@CERN, 04 June 2004 - 10
Prototype activity
• The initial prototype will have a reduced scope of functionality
 Components are currently being selected for the first prototype
• Not all use cases/operation modes will be supported
 Every experiment has a production system (with multiple backends, like PBS, LCG, G2003, NorduGrid, …).
 We focus on end-user analysis on an EGEE-MW-based infrastructure
• Informal Use Cases are still being defined, e.g. a generic analysis case (sketched below):
 A physicist selects a data sample (from the current Data Challenges)
 With an example/template as the starting point, (s)he prepares a job to scan the data
 The job is split into sub-jobs, which are dispatched to the Grid; some error recovery is performed automatically if necessary, and the results are finally merged back into a single output
 The output (histograms, ntuples) is returned together with simple information on the job-end status
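A minimal, self-contained sketch of this use case, with a local process pool standing in for the Grid and invented helper names (scan, split, run_analysis); it only illustrates the split/dispatch/retry/merge flow, not any ARDA or experiment API.

    # Toy sketch of the analysis use case above (invented names, not an ARDA API):
    # select a sample, split it, "dispatch" sub-jobs, retry failures, merge the output.
    from concurrent.futures import ProcessPoolExecutor

    def scan(chunk):
        # Stand-in for the user's analysis job: here it just counts events per file.
        return sum(n_events for (_, n_events) in chunk)

    def split(files, n_subjobs):
        # Round-robin split of the input files into n_subjobs chunks.
        return [files[i::n_subjobs] for i in range(n_subjobs)]

    def run_analysis(files, n_subjobs=4, retries=1):
        outputs = []
        with ProcessPoolExecutor() as grid:              # local pool standing in for Grid workers
            for chunk in split(files, n_subjobs):
                for attempt in range(retries + 1):
                    try:
                        outputs.append(grid.submit(scan, chunk).result())
                        break
                    except Exception:
                        if attempt == retries:           # simple automatic error recovery
                            raise
        return sum(outputs)                              # "merge" the partial results

    if __name__ == "__main__":
        sample = [(f"file_{i}.root", 1000) for i in range(20)]   # selected data sample (toy)
        print("total events scanned:", run_analysis(sample))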
GridPP@CERN, 04 June 2004 - 11
Towards ARDA prototypes
• LHCb - ARDA
GridPP@CERN, 04 June 2004 - 12
LHCb
• GANGA as a principal component
 Friendly user interface for Grid services
• The LHCb/GANGA plans match naturally with the ARDA mandate
 The goal is to enable physicists (via GANGA) to analyse the data being produced during 2004 for their studies
 Having the prototype where the LHCb data will be is the key (CERN, RAL, …)
• At the beginning, the emphasis will be on
 Usability of GANGA
 Validation of the splitting and merging functionality of user jobs
• The DIRAC system is also an important component
 LHCb grid system, used mainly in production so far
 Useful target to understand the detailed behaviour of LHCb-specific grid
components, like the file catalog.
• Convergence between DIRAC and GANGA anticipated.
GridPP@CERN, 04 June 2004 - 13
GANGA
Gaudi/Athena aNd Grid Alliance
• Gaudi/Athena: the LHCb/ATLAS frameworks (Athena uses Gaudi as a foundation)
• Single “desktop” for a variety of tasks
 Help configuring and submitting analysis jobs
 Keep track of what the user has done, hiding all technicalities completely
 Automate config/submit/monitor procedures
 Provide a palette of possible choices and specialized plug-ins (pre-defined application configurations, batch/grid systems, etc.)
• Friendly user interface (CLI/GUI) is essential
 GUI wizard interface: help users to explore new capabilities, browse the job registry
 Scripting/Command Line Interface: automate frequent tasks; Python shell embedded into the Ganga GUI
• Job registry stored locally or in the roaming profile
[Diagram: through the GANGA GUI, JobOptions and Algorithms configure a GAUDI program whose Histograms, Monitoring and Results are collected back; jobs are submitted via a Resource Broker, LSF, PBS, DIRAC or Condor]
[Diagram, internal model: the GANGA UI and Monitor talk to collective and resource Grid services (Bookkeeping Service, WorkLoad Manager, Profile Service, File catalog, SE, CE) that run the instrumented GAUDI program]
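As a rough illustration of the config/submit/monitor flow such a desktop automates, here is a toy Python stand-in; the class and attribute names below are invented for this sketch and are not the actual GANGA API.

    # Toy stand-in for the config/submit/monitor flow a GANGA-style "desktop" automates.
    # Class and attribute names are illustrative only; this is not the GANGA API.
    from dataclasses import dataclass, field

    @dataclass
    class AnalysisJob:
        application: str = "GaudiExample"              # pre-defined application configuration (plug-in)
        options: list = field(default_factory=list)    # job options / algorithms to run
        backend: str = "LSF"                           # e.g. LSF, PBS, DIRAC, Condor, Resource Broker
        status: str = "new"

        def submit(self, registry):
            self.status = "submitted"                  # in reality: hand the job to the chosen backend
            registry.append(self)                      # job registry kept locally (or in a roaming profile)

    registry = []                                      # local job registry
    job = AnalysisJob(options=["DaVinci.opts"], backend="DIRAC")
    job.submit(registry)
    print([(j.application, j.backend, j.status) for j in registry])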
GridPP@CERN, 04 June 2004 - 14
ARDA contribution to GANGA
• Release management procedure established
 Software process and integration
• Testing, bug fix releases, tagging policies, etc.
 Infrastructure
• Installation, packaging etc.
 ARDA team member in charge
• Integration with job managers/resource brokers
 While waiting for the EGEE middleware, we developed an interface to Condor
 Use of Condor DAGMan for splitting/merging and error-recovery capability (sketched below)
• Design and development in the near future
 Integration with EGEE middleware
 Command Line Interface
 Evolution of Ganga features
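To make the splitting/merging idea concrete, here is a small sketch that generates a Condor DAGMan description for a split → N analysis sub-jobs → merge workflow with per-node retries; the submit-file names (split.sub, analyse.sub, merge.sub) are placeholders, not files from the actual GANGA/Condor integration.

    # Sketch: generate a Condor DAGMan description for a split/merge workflow with
    # retry-based error recovery. Submit-file names are placeholders.
    def write_dag(n_subjobs, path="analysis.dag"):
        lines = ["JOB split split.sub"]
        subjobs = [f"analyse{i}" for i in range(n_subjobs)]
        for name in subjobs:
            lines.append(f"JOB {name} analyse.sub")
            lines.append(f"RETRY {name} 2")            # resubmit a failed sub-job up to twice
        lines.append("JOB merge merge.sub")
        lines.append("PARENT split CHILD " + " ".join(subjobs))
        lines.append("PARENT " + " ".join(subjobs) + " CHILD merge")
        with open(path, "w") as f:
            f.write("\n".join(lines) + "\n")

    write_dag(4)    # then: condor_submit_dag analysis.dag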
GridPP@CERN, 04 June 2004 - 15
CERN/Taiwan tests on the LHCb metadata catalogue
• Clone the Bookkeeping DB in Taiwan
• Install the WS layer
• Performance tests
 Database I/O sensor
 Bookkeeping Server performance tests
• Taiwan/CERN Bookkeeping Server DB
• Web & XML-RPC Service performance tests
• CPU load, network send/receive sensor, process time
 Client host performance tests
• CPU load, network send/receive sensor, process time
• Feedback to LHCb metadata catalogue developers
[Diagram: virtual users at CERN drive a client against the Bookkeeping Server and Oracle DB cloned in Taiwan through the Web/XML-RPC service layer; sensors record DB I/O, CPU load, network traffic and process time on both sides, with a network monitor in between]
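A sketch of what such a “virtual users” measurement can look like: many concurrent clients timing calls to an XML-RPC service. The endpoint URL and method name are hypothetical placeholders, not the real LHCb bookkeeping interface.

    # Sketch of a "virtual users" test: concurrent clients timing XML-RPC calls.
    # ENDPOINT and the method name are hypothetical, not the real bookkeeping service.
    import statistics
    import time
    from concurrent.futures import ThreadPoolExecutor
    from xmlrpc.client import ServerProxy

    ENDPOINT = "http://bookkeeping.example.org:8080/RPC2"    # placeholder URL

    def one_query(_):
        proxy = ServerProxy(ENDPOINT)
        t0 = time.time()
        proxy.getJobsWithDataset("DC04-sample")              # hypothetical method name
        return time.time() - t0                              # per-call response time

    def load_test(n_virtual_users=20, queries_per_user=10):
        with ThreadPoolExecutor(max_workers=n_virtual_users) as pool:
            times = list(pool.map(one_query, range(n_virtual_users * queries_per_user)))
        print(f"mean {statistics.mean(times):.3f}s  max {max(times):.3f}s")

    if __name__ == "__main__":
        load_test()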
GridPP@CERN, 04 June 2004 - 16
Towards ARDA prototypes
• ATLAS - ARDA
GridPP@CERN, 04 June 2004 - 17
ATLAS
• ATLAS has a relatively complex strategy for distributed analysis, addressing different areas with specific projects
 Fast response (DIAL)
 User-driven analysis (GANGA)
 Massive production with multiple Grids, etc.
 For additional information see the ATLAS Distributed Analysis (ADA) site: http://www.usatlas.bnl.gov/ADA/
• The ATLAS system within ARDA has been agreed
 The starting point is the DIAL service model for distributed interactive analysis; users will be exposed to a different user interface (GANGA)
• The AMI metadata catalog is a key component in the ATLAS prototype
 mySQL as a back end
 Genuine Web Server implementation
 Robustness and performance tests from ARDA
• In the start-up phase, ARDA provided some assistance in developing production tools
GridPP@CERN, 04 June 2004 - 18
AMI studies in ARDA
• AMI is the ATLAS metadata catalogue and contains file metadata:
 Simulation/reconstruction version
 Does not contain physical filenames
• A SOAP Web Services proxy is supposed to provide access to the metadata DB (MySQL)
• Studied the behaviour using many concurrent clients:
 Note that Web Services are “stateless” (no automatic handles for the concept of session, transaction, etc.): 1 query = 1 (full) response
 Large queries might crash the server
• Many problems still open:
 Large network traffic overhead due to schema-independent tables
 Shall the SOAP front-end proxy reimplement all the database functionality?
• Good collaboration in place with ATLAS-Grenoble
[Diagram: several users query the SOAP proxy, which accesses the metadata (MySQL) database]
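Because each call to a stateless service returns one complete response, one client-side way to avoid the very large queries mentioned above is to page a request into bounded chunks. A minimal sketch, assuming a hypothetical HTTP query interface; the URL and the dataset/limit/offset parameters are invented and are not the actual AMI interface.

    # Sketch: page a large catalogue query into bounded chunks, since each call to a
    # stateless web service is an independent request returning a full response.
    # The URL and the dataset/limit/offset parameters are hypothetical.
    import json
    from urllib import parse, request

    AMI_URL = "http://ami.example.org/query"                 # placeholder endpoint

    def query_chunk(pattern, limit, offset):
        params = parse.urlencode({"dataset": pattern, "limit": limit, "offset": offset})
        with request.urlopen(f"{AMI_URL}?{params}") as resp:
            return json.load(resp)                           # one stateless request, one response

    def query_all(pattern, chunk_size=500):
        offset, rows = 0, []
        while True:
            chunk = query_chunk(pattern, chunk_size, offset)
            rows.extend(chunk)
            if len(chunk) < chunk_size:                      # last page reached
                return rows
            offset += chunk_size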
GridPP@CERN, 04 June 2004 - 19
Towards ARDA prototypes
• ALICE - ARDA
GridPP@CERN, 04 June 2004 - 20
ALICE
• Strategy:
 ALICE-ARDA will evolve the analysis system presented at SuperComputing 2003: ‘Grid-enabled PROOF’
• Where to improve:
 Heavily connected with the middleware services
 “Inflexible” configuration
 No chance to use PROOF on federated grids like LCG
 User libraries distribution
• Activity on PROOF
 Robustness
 Error recovery
[Diagram: a user session connects through TcpRouters to the PROOF master server, which in turn reaches PROOF slaves at Sites A, B and C via TcpRouters]
GridPP@CERN, 04 June 2004 - 21
Improved PROOF system
• Original problem: no support for a hierarchical Grid infrastructure, only local cluster mode
• The remote PROOF slaves look like a local PROOF slave on the master machine
• The booking service is usable also on local clusters
[Diagram: the PROOF master reaches the PROOF slave servers through proxy proofd/rootd daemons and a booking service built on Grid services]
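To illustrate the TcpRouter/proxy idea (remote slaves addressable as if they were local), here is a minimal generic TCP port-forwarding sketch; the host and port are placeholders, and this is only a conceptual illustration, not the actual ALICE TcpRouter or the proofd/rootd proxies.

    # Minimal generic TCP forwarder illustrating the router idea: connections to a
    # local port are relayed to a remote daemon, so the remote end appears local.
    # Host and port below are placeholders; teardown/error handling omitted.
    import socket
    import threading

    def pipe(src, dst):
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)

    def forward(local_port, remote_host, remote_port):
        listener = socket.socket()
        listener.bind(("", local_port))
        listener.listen(5)
        while True:
            client, _ = listener.accept()
            upstream = socket.create_connection((remote_host, remote_port))
            threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

    # forward(1093, "proof-slave.site-c.example.org", 1093)   # placeholder host/ports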
GridPP@CERN, 04 June 2004 - 22
Towards ARDA prototypes
• CMS - ARDA
GridPP@CERN, 04 June 2004 - 23
CMS
• The CMS system within ARDA is still under discussion
• Providing easy access to (and possibly sharing of) data for the CMS users is a key issue in the discussions
[Diagram, CMS DC04 production: RefDB sends reconstruction instructions to McRunjob, which submits reconstruction jobs to the T0 worker nodes; reconstructed data go to tape and to the GDB castor pool; a transfer agent checks what has arrived, updates RLS and TMDB, and moves reconstructed data to the export buffers; summaries of successful jobs flow back to RefDB]
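The “checks what has arrived” step in the diagram can be pictured as a simple polling agent. A minimal sketch, where the drop-box path and the register_in_rls / mark_in_tmdb helpers are placeholders invented for illustration, not the real RLS or TMDB interfaces.

    # Sketch of a polling transfer agent: watch a drop-box, register new files.
    # The drop-box path and the two helpers are placeholders for illustration.
    import os
    import time

    def register_in_rls(path):        # placeholder for the RLS catalogue update
        print("RLS  <-", path)

    def mark_in_tmdb(path):           # placeholder for the TMDB update
        print("TMDB <-", path)

    def transfer_agent(drop_box, poll_seconds=60):
        seen = set()
        while True:
            for name in os.listdir(drop_box):                # check what has arrived
                path = os.path.join(drop_box, name)
                if path not in seen:
                    register_in_rls(path)
                    mark_in_tmdb(path)
                    seen.add(path)
            time.sleep(poll_seconds)

    # transfer_agent("/castor/gdb-pool/incoming")             # placeholder path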
GridPP@CERN, 04 June 2004 - 24
CMS RefDB
• Potential starting point for the prototype
• Bookkeeping engine to plan and steer the production across different phases (simulation, reconstruction, to some degree into the analysis phase)
• Contains all necessary information except the physical file location (RLS) and the information related to the transfer management system (TMDB)
• The actual mechanism to provide these data to analysis users is under discussion
• Performance measurements are underway (similar philosophy as for the LHCb metadata catalogue measurements)
GridPP@CERN, 04 June 2004 - 25
• Coordination and forum activities
 Workshops and meetings
GridPP@CERN, 04 June 2004 - 26
Coordination and forum activities
• Forum activities are seen as ‘fundamental’ in the ARDA project definition
 ARDA will channel information to the appropriate recipients, especially to
analysis-related activities and projects outside the ARDA prototypes
 Ensures that new technologies can be exposed to the relevant community
• ARDA should organise a set of regular meetings
 The aim is to discuss results, problems and new/alternative solutions, and possibly agree on a coherent programme of work. A workshop is held every three months.
 The ARDA project leader organises this activity, which will be truly distributed and led by the active partners
• ARDA is embedded in EGEE
 NA4, Application Identification and Support
• Special relation with LCG
 LCG GAG is a forum for Grid requirements and use cases
 The experiments’ representatives coincide with the EGEE NA4 experiment representatives
GridPP@CERN, 04 June 2004 - 27
Workshops and meetings
 1st ARDA workshop
• January 2004 at CERN; open
• Over 150 participants
 2nd ARDA workshop “The first 30 days of EGEE middleware”
• June 21-23 at CERN; by invitation
• Expected 30 participants
 EGEE NA4 Meeting mid July
• NA4/JRA1 (middleware) and NA4/SA1 (Grid operations) sessions
• Organised by M. Lamanna and F. Harris
 3rd ARDA workshop
• Currently scheduled for September 2004 close to CHEP; open
GridPP@CERN, 04 June 2004 - 28
Next ARDA workshop
“The first 30 days of the EGEE middleware”
• CERN: 21-23 of June 2004
 Exceptionally by invitation only
• Monday, June 21
 ARDA team / JRA1 team
 ATLAS (Metadata database services for HEP experiments)
• Tuesday, June 22
 LHCb (Experience in building web services for grid)
 CMS (Data management)
• Wednesday, June 23
 ALICE (Interactivity on the Grid)
 Close out
• Info on the web:
• http://lcg.web.cern.ch/LCG/peb/arda/LCG_ARDA_Workshops.htm
GridPP@CERN, 04 June 2004 - 29
• Conclusions
GridPP@CERN, 04 June 2004 - 30
Conclusions and Outlook
• LCG ARDA has started
 Main objective: experiment prototypes for analysis
 EGEE/Glite middleware becoming available
 Good feedback from the LHC experiments
 Good collaboration within EGEE project
 Good collaboration with Regional Centres. More help needed.
• Main focus
 Prototyping distributed analysis systems of LHC experiments.
 Collaborate with the LHC experiments, the EGEE middleware team and the
Regional Centres to set up the end-to-end prototypes.
• Aggressive schedule
 The milestone for the first end-to-end prototypes is already December 2004.
GridPP@CERN, 04 June 2004 - 31