Since GridPP14

What has Happened Since GridPP14?
John Gordon
CCLRC e-Science Centre
GridPP15, RAL
12th January 2006
Summary
• In Birmingham last September we tried a
new approach to discussion
– Instead of single speakers we had a number of
panel sessions with much more participation
• At the time, the consensus was that these
sessions were fruitful and a refreshing
change
• How successful does it look four months later?
Sessions
1. "LCG Service Challenges" (plans for SC4 based on experience of SC3)
2. "Running Applications on the Grid" (Why won't my jobs run?)
3. "Grid Documentation" (What documentation is needed/missing? Is it a
question of organisation?)
4. "What value does GridPP add?" - Provocateur: Dave Britton
5. "Beyond GridPP2 and e-Infrastructure" (What is the current status of
planning?)
6. "Managing Large Facilities in the LHC era" (What works? What doesn't? What
won't?)
7. "What is a workable Tier-2 Deployment Model?"
8. "What is Middleware Support?" (What is it really all about?)
1. "LCG Service Challenges"
(plans for SC4 based on experience of SC3)
• Panel Members: Jeremy Coles, Brian Davies, John Gordon, Roger
Jones, Paul Millar, Steve Traylen, Paul Trepka, Yong-Jun Zhang
• This was a detailed session that brought out the planning of the
Service Challenges.
1. The SC is a great idea: a kind of reality check.
2. More documentation and support are needed.
3. Time scales and deadlines are needed for deployment.
4. The storage model is an important issue, especially for the storage group.
5. Communication of experience (to be discussed further at the next
deployment meeting: what is the next stage?)
6. Networks will play an important part in SC4 (more network skills are
required).
There was a list of specific actions
• Implement a better user support model
• Support the deployment of an SRM at every Tier-2 site
• Revisit site plans for implementing promised resources
• Support the installation of any required local catalogues at sites
• Investigate the experiment VO box requests. Make a recommendation to
Tier-2s. Revisit as GridPP.
• Better understand network links to sites (we do not want to saturate links)
• Schedule transfer tests from Tier-1 to Tier-2 to test rates and stability
• Work more closely with experiments?
Progress on the specific actions
• user support (mail lists, web form, TPMs, GGUS integration)
• SRM at T2 (almost done)
• site plans revised (SRIF3, FEC)
• local catalogues (wiki, SC3, plan for rest)
• VO boxes (review group)
• network links (10 easy questions, wiki)
• T1-T2 tests (plan, stalled, dCache/DPM)
• experiment links (some progress)
2. "Running Applications on the Grid"
(Why won't my jobs run?)
• Panel Members: Giuliano Castelli, James Catmore, Dave Colling, David
Grellscheid, Peter Hobson, Steve Lloyd, Grigor Rybkine, Gianfranco Sciacca,
Alvin Tan, Peter Watkins, James Werner
• Several talks served as input, leaving less time for input from the floor.
Summary
• A number of people say things are working well: a pleasant surprise, easier than LSF!
• VO setup and requirements: we don't want each VO to have to talk to
each site. Each VO should provide a list of the requirements a site must
meet to support it.
• Certificates: the situation needs to improve. Once over this hurdle,
using the grid is plainer sailing.
• Data management issues are more of a problem than job or Resource Broker
(RB) problems. How do we get information to users about failures and
support channels?
• Monitoring real file transfers would be an interesting addition.
3. "Grid Documentation"
(What documentation is needed/missing? Is it a question of organisation?)
• Panel Members: Stephen Burke, Stephen Childs, Tony Doyle, William Hay, Dave
Kelsey, Peter Love, Giuseppe Mazza, Robin Middleton, Ivan Hollins
• Could updates to documents be raised at meetings?
• A mailing list specifically for document updates may be useful.
• Competition between different solutions to one problem.
• For all experiments: link in all documentation and give someone (a line
manager, for example) responsibility for overseeing its maintenance.
• What are the mechanisms for finding out what is inadequate in a
document? A document should be checked every few months to identify its
inadequacies: should SB set up a review process?
• Roles and responsibilities should be established.
• Important documents should be highlighted; an index of useful documents
and of the available document sources may be useful.
• Stephen Burke has made much progress in many of these areas. Steve attends
the PMB.
4. "What value does GridPP add?"
Provocateur: Dave Britton
• Minimal feedback from the floor
• Subsequent discussion in the PMB
"What value does GridPP add?"
22 points, to be pruned further
1) The GridPP Identity
2) Enabling the LCG Project
3) Leading contributions to Grid Middleware
4) The Tier centre structures
5) A Deployment team
6) The UK Particle Physics Grid.
5. "Beyond GridPP2 and e-Infrastructure"
• (What is the current status of planning?)
• Panel Members: Dave Britton, Abdeslem Djaoui, Tony
Doyle, John Gordon, Dave Kelsey, Steve Lloyd, Dave
Newbold, Rhys Newman
• EGEE II may be superseded by a European
infrastructure
• The DTI is planning a UK infrastructure
• Integrate better with NGS
• More things developed by GridPP will be
supported centrally
6. "Managing Large Facilities in the LHC era"
• (What works? What doesn't? What won't?)
• Panel Members: Mona Agerwaal, Catalin
Condurache, Alessandra Forti, Pete Gronbech,
Lawrence Lowe, Colin Morey, Fraser Speirs, John
Walsh
• Sys admins seem happy with their package
managers.
• We should share common knowledge (about
software tools) more.
• There are extra costs (over and above the price of the
hardware) involved in having large clusters.
7. "What is a workable Tier-2 Deployment Model?"
• Panel Members: Chris Brew, Yves Coppens, Alessandra
Forti, Pete Gronbech, Jiri Mencak, Henry Nebrensky,
Fraser Speirs, Steve Thorn, Steve Traylen, Jeff Tseng,
Olivier Van der Aa
• Conclusion: deployment is under control
– testing has made good progress
– operations are still an issue.
8. "What is Middleware Support?"
• (What is it really all about?)
• Panel Members: Joseph Dada, Alistair Duncan, Steve Fisher, Jens Jensen,
Yibiao Li, Andrew McNab, Alexander Soroko, Graeme Stewart, Stefan Stonjek,
Owen Synge
• DC: gLite test bed.
• EGEE2: dedicated testing/certification system.
• Using a wiki was a good idea. Consolidate it into documents.
• Some structure is needed to make sure the wiki doesn't get out of control.
• GS: the wiki needs some moderators.
• Developers are not getting the correct requirements for software: the
sysadmins' questions are not the same questions that were in the minds of
the developers.
• It is bad if the wiki is incorrect.
• Someone is needed to move what is in the wiki into more formal documents
(LaTeX or DocBook) that have been properly checked and signed off by the
developers.
Conclusion
• All sessions were felt to be worthwhile
• Some produced hard actions
• Some areas have made progress since
• There was a positive correlation between the subjects that made progress
and those where GridPP had existing structures in place (Deployment,
Documentation)
– Counter-examples: middleware, experiments
• Let's do this again, but next time take more care to task people with
subsequent progress and look for new structures to deliver results.