The ARDA project
GridPP 10th Collaboration Meeting, CERN, 04 June 2004
The ARDA Project: between Grid middleware and LHC experiments
http://cern.ch/arda
Juha Herrala
ARDA Project
[email protected]
www.eu-egee.org
cern.ch/lcg
EGEE is a project funded by the European Union under contract IST-2003-508833
Contents
• Introduction to the LCG ARDA Project
History and mandate
Relation to EGEE project
ARDA prototypes
Relation to Regional Centres
• Towards ARDA prototypes
LHCb, ATLAS, ALICE, CMS
• Coordination and forum activities
Workshops and meetings
• Conclusions and Outlook
GridPP@CERN, 04 June 2004 - 2
How ARDA evolved
• LHC Computing Grid (LCG) project’s Requirements and Technical
Assessment Group (RTAG) for distributed analysis presented their
ARDA report in November 2003.
ARDA = Architectural Roadmap for Distributed Analysis
Defined a set of collaborating Grid services and their interfaces
• As a result the ARDA project was launched by LCG
ARDA = A Realisation of Distributed Analysis
Purpose is to coordinate different activities in the development of distributed
analysis systems of the LHC experiments, which will be based on the new
service-oriented Grid middleware and infrastructure.
• But the generic Grid middleware is developed by the EGEE project
Sometimes ARDA also became a synonym for this “second generation” Grid
middleware, which was later (May 2004) renamed gLite.
Generic = no significant functionality that is of interest for HEP or any other
science/community alone.
Our starting point / mandate
Recommendations of the ARDA working group
• New service decomposition
Strong influence of the AliEn system
• the Grid system developed by the ALICE experiment and used by a wide
scientific community (not only HEP)
• Role of experience, existing technology…
Web service framework
• Interfacing to existing middleware to enable their use in the experiment
frameworks
• Early deployment of (a series of) end-to-end prototypes to ensure
functionality and coherence
Middleware as a building block
Validation of the design
EGEE and LCG ARDA
• LCG strongly linked with
middleware developed/deployed
in EGEE (continuation of EDG)
• The core infrastructure of the
EGEE Grid operation service will
grow out of the LCG service
LCG includes many US and Asian
partners
EGEE includes other sciences
Substantial part of infrastructure
common to both
• Parallel production lines
LCG-2 production Grid
• 2004 data challenges
Pre-production prototype
• EGEE/Glite MW
• ARDA playground for the LHC
experiments
[Timeline: VDT/EDG-based LCG-1 and LCG-2, followed by the EGEE web-service middleware in EGEE-1 and EGEE-2]
ARDA prototypes
• Support LHC experiments to implement their end-to-end analysis
prototypes based on the EGEE/Glite middleware
ARDA will equally support each of the LHC experiments
Close collaboration with data analysis teams, ensuring end-to-end
coherence of the prototypes
One prototype per experiment
• Role of ARDA
Interface with the EGEE middleware
Adapt/verify components of analysis environments of the experiments
(robustness/many users, performance/concurrent “read” actions)
A Common Application Layer may emerge in the future
Feedback from the experiments to the middleware team
• Final target beyond the prototype activity: sustainable distributed
analysis services for the four experiments deployed at LCG Regional
Centres.
ARDA @ Regional Centres
• Regional Centres have valuable practical experience and know how
Understand “deployability” issues, which is a key factor for (EGEE/Glite)
middleware success
Database technologies
Web Services
• Some Regional Centres will have the responsibility to provide early
installation for the middleware
EGEE Middleware test bed
Pilot sites might enlarge the resources available and give fundamental
feedback in terms of “deployability” to complement the EGEE SA1
• Running ARDA pilot installations
ARDA test bed for analysis prototypes
Experiment data available where the experiment prototype is deployed
• Stress and performance tests could be ideally located outside CERN
Experiment-specific components (e.g. a Meta Data catalogue) which might
be used by the ARDA prototypes
Exploit local know how of the Regional Centres
• Final ARDA goal: sustainable analysis service for LHC experiments
ARDA project team
• Massimo Lamanna
• Birger Koblitz
• Andrey Demichev, Viktor Pose (Russia)
• Derek Feichtinger, Andreas Peters (ALICE)
• Wei-Long Ueng, Tao-Sheng Chen (Taiwan)
• Dietrich Liko, Frederik Orellana (ATLAS)
• Julia Andreeva, Juha Herrala (CMS)
• Andrew Maier, Kuba Moscicki (LHCb)
Experiment interfaces:
• Piergiorgio Cerello (ALICE)
• David Adams (ATLAS)
• Lucia Silvestris (CMS)
• Ulrik Egede (LHCb)
Towards ARDA prototypes
• Existing systems as starting point
Every experiment has different implementations of the
standard services
Used mainly in production environments
Now more emphasis on analysis
Prototype activity
• Provide a fast feedback to the EGEE MW development team
Avoid uncoordinated evolution of the middleware
Coherence between users’ expectations and final product
• Experiments may benefit from the new MW as soon as possible
Frequent snapshots of the middleware available
Expose the experiments (and the community in charge of the deployment)
to the current evolution of the whole system
Experiments’ systems are very complex and still evolving
• Move forward towards new-generation real systems (analysis!)
A lot of work (experience and useful software) is invested in current data
challenges of the experiments, which makes them a concrete starting point
Whenever possible adapt/complete/refactor the existing components: we
do not need yet another system!
• Attract and involve users
Prototypes with realistic workload and conditions, thus real users from LHC
experiments required!
Prototype activity
• The initial prototype will have a reduced scope of functionality
Components are currently being selected for the first prototype
• Not all use cases/operation modes will be supported
Every experiment has a production system (with multiple backends, like
PBS, LCG, G2003, NorduGrid, …).
We focus on end-user analysis on an EGEE-MW-based infrastructure
• Informal Use Cases are still being defined, e.g. a generic analysis case:
A physicist selects a data sample (from current Data Challenges)
With an example/template as starting point (s)he prepares a job to scan the
data
The job is split in sub-jobs, dispatched to the Grid, some error-recovery is
automatically performed if necessary, and finally merged back in a single
output
The output (histograms, ntuples) is returned together with simple
information on the job-end status
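The generic use case above (select, split, dispatch with error recovery, merge) can be sketched end to end; the `run_subjob` stub, the simulated failure and the retry limit are all illustrative, not part of any experiment framework:

```python
import random

def run_subjob(files, attempt):
    """Stub for a Grid sub-job: process one chunk of the data sample.
    Here we just count 'events'; a real sub-job would run the physics code."""
    if attempt == 0 and random.random() < 0.3:
        raise RuntimeError("worker node failure")   # simulated transient error
    return {"events": sum(files)}                    # toy per-chunk result

def analyse(sample, n_subjobs=4, max_retries=2):
    # Split the selected data sample into sub-jobs
    chunks = [sample[i::n_subjobs] for i in range(n_subjobs)]
    outputs, status = [], []
    for chunk in chunks:
        for attempt in range(max_retries + 1):       # automatic error recovery
            try:
                outputs.append(run_subjob(chunk, attempt))
                status.append("done")
                break
            except RuntimeError:
                continue
        else:
            status.append("failed")
    # Merge the sub-job outputs back into a single output
    merged = {"events": sum(o["events"] for o in outputs)}
    return merged, status
```

The job-end status list is what would be returned to the physicist alongside the merged histograms/ntuples.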
Towards ARDA prototypes
• LHCb - ARDA
LHCb
• GANGA as a principal component
Friendly user interface for Grid services
• The LHCb/GANGA plans match naturally with the ARDA mandate
Goal is to enable physicists (via GANGA) to analyse the data being
produced during 2004 for their studies
Have the prototype where the LHCb data will be the key (CERN, RAL, …)
• At the beginning, the emphasis will be focused on
Usability of GANGA
Validation of the splitting and merging functionality for user jobs
• The DIRAC system is also an important component
LHCb grid system, used mainly in production so far
Useful target to understand the detailed behaviour of LHCb-specific grid
components, like the file catalog.
• Convergence between DIRAC and GANGA anticipated.
GANGA
Gaudi/Athena aNd Grid Alliance
• Gaudi/Athena: the LHCb/ATLAS frameworks (Athena uses Gaudi as a foundation)
• Single “desktop” for a variety of tasks
Help configuring and submitting analysis jobs
Keep track of what users have done, hiding all technicalities completely
Job registry stored locally or in the roaming profile
Automate config/submit/monitor procedures
Provide a palette of possible choices and specialized plug-ins (pre-defined
application configurations, batch/grid systems, etc.)
Backends: Resource Broker, LSF, PBS, DIRAC, Condor
• Friendly user interface (CLI/GUI) is essential
GUI Wizard Interface
Help users to explore new capabilities
Browse the job registry
Scripting/Command Line Interface
Automate frequent tasks
Python shell embedded into the Ganga GUI
[Diagrams: the GANGA “desktop” (GUI, histograms, monitoring, results, job options, algorithms, GAUDI program) and the internal model (UI, bookkeeping service, workload manager, profile service, file catalog, CE/SE Grid services, instrumented GAUDI program)]
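The embedded Python shell is what makes scripting of frequent tasks possible; a toy mock of the job-registry pattern (class and attribute names invented for illustration, not the real Ganga API) might look like:

```python
# Minimal mock of a Ganga-style job registry and scripting session.
# All names here are illustrative only, not the actual Ganga interface.
class Job:
    _registry = []                       # job registry, kept in memory here

    def __init__(self, application, backend):
        self.application = application   # e.g. a Gaudi/Athena configuration
        self.backend = backend           # e.g. "LSF", "DIRAC", "Condor"
        self.status = "new"
        self.id = len(Job._registry)
        Job._registry.append(self)       # every job is tracked automatically

    def submit(self):
        self.status = "submitted"        # real code would talk to the backend

    @classmethod
    def registry(cls):
        # Browse what the user has done so far
        return [(j.id, j.application, j.backend, j.status)
                for j in cls._registry]

# A session in the embedded shell might then read:
j = Job(application="DaVinci", backend="DIRAC")
j.submit()
```

The registry is what lets users later browse and re-use everything they have submitted, with the backend technicalities hidden behind one interface.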
ARDA contribution to GANGA
• Release management procedure established
Software process and integration
• Testing, bug fix releases, tagging policies, etc.
Infrastructure
• Installation, packaging etc.
ARDA team member in charge
• Integration with job managers/resource brokers
While waiting for the EGEE middleware, we developed an interface to Condor
Use of Condor DAGMan for splitting/merging and error-recovery capability
• Design and development in the near future
Integration with EGEE middleware
Command Line Interface
Evolution of Ganga features
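DAGMan describes the split/merge workflow as a directed acyclic graph of Condor jobs with per-node retries; a sketch of such a DAG file (submit-file names hypothetical) could be:

```text
# analysis.dag -- hypothetical DAGMan description of a split/merge job
JOB  split  split.sub
JOB  sub0   analyse.sub
JOB  sub1   analyse.sub
JOB  merge  merge.sub
PARENT split CHILD sub0 sub1
PARENT sub0 sub1 CHILD merge
RETRY sub0 2          # automatic error recovery: resubmit up to twice
RETRY sub1 2
```

The merge node runs only after all analysis nodes succeed, which is exactly the error-recovery behaviour the prototype needs.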
CERN/Taiwan tests on LHCb metadata catalogue
• Clone the Bookkeeping DB in Taiwan
• Install the WS layer
• Performance tests
Bookkeeping Server performance tests (CERN and Taiwan instances, Oracle DB back ends)
Web & XML-RPC Service performance tests
Sensors: database I/O, CPU load, network send/receive, process time
Client host performance tests (CPU load, network send/receive sensor, process time)
• Feedback to LHCb metadata catalogue developers
[Diagram: Bookkeeping Servers with Oracle DBs at CERN and Taiwan, client host with virtual users, network monitor, and per-host sensors]
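The XML-RPC service performance tests boil down to timing repeated calls against the service; a minimal self-contained sketch using a local stand-in server (the `query_datasets` method is invented for illustration, not the LHCb bookkeeping API):

```python
import threading
import time
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Stand-in for the bookkeeping web service: a local XML-RPC server
# exposing one hypothetical query method.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda n: list(range(n)), "query_datasets")
threading.Thread(target=server.serve_forever, daemon=True).start()

def measure(url, n_calls=20):
    """Call the service repeatedly and record the per-call process time."""
    proxy = ServerProxy(url)
    latencies = []
    for _ in range(n_calls):
        t0 = time.perf_counter()
        proxy.query_datasets(100)          # one query = one full response
        latencies.append(time.perf_counter() - t0)
    return latencies

url = "http://127.0.0.1:%d" % server.server_address[1]
latencies = measure(url)
```

In the real tests the same timing loop runs from a remote client host, with CPU, network and DB I/O sensors sampled alongside.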
Towards ARDA prototypes
• ATLAS - ARDA
ATLAS
• ATLAS has a relatively complex strategy for distributed analysis,
addressing different areas with specific projects
Fast response (DIAL)
User-driven analysis (GANGA)
Massive production with multiple Grids, etc…
For additional information see the ATLAS Distributed Analysis (ADA) site:
http://www.usatlas.bnl.gov/ADA/
• The ATLAS system within ARDA has been agreed
Starting point is the DIAL service model for distributed interactive analysis;
users will be exposed to a different user interface (GANGA)
• The AMI metadata catalog is a key component in ATLAS prototype
MySQL as a back end
Genuine Web Server implementation
Robustness and performance tests from ARDA
• In the start up phase, ARDA provided some assistance in developing
production tools
AMI studies in ARDA
• AMI is the ATLAS metadata catalogue (MySQL back end); it contains file
metadata per simulation/reconstruction version but does not contain
physical filenames
• A SOAP web-service proxy is supposed to provide DB access
• Studied behaviour using many concurrent clients
• Many problems still open:
Large network traffic overhead due to schema-independent tables
Web Services are “stateless” (no automatic handles for the concept of
session, transaction, etc.): 1 query = 1 (full) response
Large queries might crash the server
Shall the SOAP front-end proxy re-implement all the database functionality?
• Good collaboration in place with ATLAS-Grenoble
[Diagram: users querying the SOAP proxy in front of the MySQL metadata database]
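A common way to live with the “1 query = 1 (full) response” constraint is to page through large result sets on the client side, so no single stateless request has to build the full response; sketched here against a throwaway SQLite table (table and column names invented, not the AMI schema):

```python
import sqlite3

# Toy metadata table standing in for the catalogue back end.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (guid INTEGER PRIMARY KEY, dataset TEXT)")
db.executemany("INSERT INTO files VALUES (?, ?)",
               [(i, "dc2.%03d" % (i % 7)) for i in range(1000)])

def paged_query(db, dataset, page_size=100):
    """Fetch a large result set in fixed-size pages, one request per page,
    so each stateless call carries only a bounded response."""
    offset, rows = 0, []
    while True:
        page = db.execute(
            "SELECT guid FROM files WHERE dataset = ? LIMIT ? OFFSET ?",
            (dataset, page_size, offset)).fetchall()
        if not page:
            break
        rows.extend(g for (g,) in page)
        offset += page_size
    return rows

rows = paged_query(db, "dc2.001")
```

A SOAP front end could expose the same pattern as a (query handle, page number) pair, trading statelessness for bounded responses.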
Towards ARDA prototypes
• ALICE - ARDA
ALICE
• Strategy:
ALICE-ARDA will evolve the analysis system presented at
SuperComputing 2003 (“Grid-enabled PROOF”)
• Where to improve:
Heavily connected with the middleware services
“Inflexible” configuration
No chance to use PROOF on federated grids like LCG
User libraries distribution
• Activity on PROOF
Robustness
Error recovery
[Diagram: a PROOF master server connected through TcpRouters to PROOF slaves at sites A, B and C; the user session talks to the master]
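The TcpRouter idea is, in essence, a byte-level relay that lets slaves behind site boundaries appear local to the master; a minimal sketch of such a forwarder (not the actual ALICE implementation):

```python
import socket
import threading

def tcp_router(listen_host, listen_port, target_host, target_port):
    """Minimal TcpRouter-style forwarder: accepts one connection on the
    listening side and relays bytes both ways to the real service.
    Returns the port actually bound (useful when listen_port is 0)."""
    lsock = socket.socket()
    lsock.bind((listen_host, listen_port))
    lsock.listen(1)

    def pump(src, dst):
        # Copy bytes from src to dst until src closes.
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
        dst.close()

    def serve():
        client, _ = lsock.accept()
        upstream = socket.create_connection((target_host, target_port))
        threading.Thread(target=pump, args=(client, upstream),
                         daemon=True).start()
        pump(upstream, client)

    threading.Thread(target=serve, daemon=True).start()
    return lsock.getsockname()[1]
```

A chain of such routers at the site boundaries is what turns a flat local-cluster PROOF setup into one usable across federated sites.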
Improved PROOF system
• Original problem: no support for hierarchical Grid infrastructure,
only local cluster mode
• The remote PROOF slaves look like a local PROOF slave on the
master machine
• The booking service is usable also on local clusters
[Diagram: PROOF master with booking service behind proxy proofd/rootd daemons and Grid services, in front of the PROOF slave servers]
Towards ARDA prototypes
• CMS - ARDA
CMS
• The CMS system within ARDA is still under discussion
• Providing easy access to (and possibly sharing of) data for CMS
users is a key issue in the discussions
[Diagram: CMS DC04 production data flow. McRunjob takes reconstruction instructions from RefDB and creates reconstruction jobs for the T0 worker nodes; reconstructed data go to the GDB castor pool and to tape; a transfer agent checks what has arrived, updates RLS and TMDB, and moves reconstructed data to the export buffers; summaries of successful jobs flow back to RefDB.]
CMS RefDB
• Potential starting point for the prototype
• Bookkeeping engine to plan and steer the production across different
phases (simulation, reconstruction, and to some degree into the
analysis phase)
• Contains all necessary information except file physical location (RLS)
and info related to the transfer management system (TMDB)
• The actual mechanism to provide these data to analysis users is
under discussion
• Performance measurements are underway (similar philosophy as for
the LHCb metadata catalogue measurements)
• Coordination and forum activities
Workshops and meetings
Coordination and forum activities
• Forum activities are seen as ‘fundamental’ in the ARDA project definition
ARDA will channel information to the appropriate recipients, especially to
analysis-related activities and projects outside the ARDA prototypes
Ensures that new technologies can be exposed to the relevant community
• ARDA should organise a set of regular meetings
Aim is to discuss results, problems, new/alternative solutions and possibly
agree on some coherent program of work. Workshop every three months.
The ARDA project leader organises this activity, which will be truly distributed
and led by the active partners
• ARDA is embedded in EGEE
NA4, Application Identification and Support
• Special relation with LCG
LCG GAG is a forum for Grid requirements and use cases
Experiments representatives coincide with the EGEE NA4 experiments
representatives
Workshops and meetings
1st ARDA workshop
• January 2004 at CERN; open
• Over 150 participants
2nd ARDA workshop “The first 30 days of EGEE middleware”
• June 21-23 at CERN; by invitation
• Expected 30 participants
EGEE NA4 Meeting mid July
• NA4/JRA1 (middleware) and NA4/SA1 (Grid operations) sessions
• Organised by M. Lamanna and F. Harris
3rd ARDA workshop
• Currently scheduled for September 2004 close to CHEP; open
Next ARDA workshop
“The first 30 days of the EGEE middleware”
• CERN: 21-23 of June 2004
Exceptionally by invitation only
• Monday, June 21
ARDA team / JRA1 team
ATLAS (Metadata database services for HEP experiments)
• Tuesday, June 22
LHCb (Experience in building web services for grid)
CMS (Data management)
• Wednesday, June 23
ALICE (Interactivity on the Grid)
Close out
• Info on the web: http://lcg.web.cern.ch/LCG/peb/arda/LCG_ARDA_Workshops.htm
• Conclusions
Conclusions and Outlook
• LCG ARDA has started
Main objective: experiment prototypes for analysis
EGEE/Glite middleware becoming available
Good feedback from the LHC experiments
Good collaboration within EGEE project
Good collaboration with Regional Centres. More help needed.
• Main focus
Prototyping distributed analysis systems of LHC experiments.
Collaborate with the LHC experiments, the EGEE middleware team and the
Regional Centres to set up the end-to-end prototypes.
• Aggressive schedule
Milestone for the first end-to-end prototypes is already December 2004.