LCG Deployment in the UK

John Gordon
GridPP10
CCLRC e-Science Centre
• You’ve heard about LCG…
• … so what’s happening in the UK?
• LCG Deployment, now and future
• The wider UK picture
• … and what’s this EGEE?
The Plan
[Figure: A. Management structure in LCG context. Labels: CB; PMB; Deployment Board (Tier1/Tier2, testbeds, rollout); User Board; requirements; application development; user feedback; metadata, storage, workload, network, security, info. mon.; service specification & provision; LCG; ARDA; EGEE; experiments.]
Recent LCG
• Tier1 +10 other sites
• DCs
• Tier2 structure
• Support structure
• GOC Monitoring
• LCG Accounting
GridPP Summary: From Prototype to Production
[Figure: timeline in three stages. 2001: separate experiments, resources, multiple accounts. 2004: prototype Grids. 2007: 'One' production Grid. Labels: BaBar, CDF, ATLAS, ALICE, LHCb, CMS, D0; BaBarGrid, SAMGrid, GANGA, EDG, ARDA, LCG, EGEE; CERN Computer Centre, RAL Computer Centre, 19 UK Institutes; CERN Prototype Tier-0 Centre, UK Prototype Tier-1/A Centre, 4 UK Prototype Tier-2 Centres; CERN Tier-0 Centre, UK Tier-1/A Centre, 4 UK Tier-2 Centres.]
Vision
• GridPP2 should deliver a production quality grid
• Meeting the computing needs of UK Particle Physics
• Autonomous and self-supporting with its own identity
• Participating in LCG, EGEE, BaBarGrid, SAMGrid, and any others desired by its members
• Part of an integrated UK Grid
• Independent but integrated, separate but seamless
Delivery Plans
• Keep up with LCG
• Participate in LHC Data Challenges
• TierA for BaBar and BaBarGrid
• Participate in LCG Service Challenges
• Use by other VOs
• Put in place the structure to deliver this
• … and more
Production Team
• Deployment
• User Support
• Middleware Support
• Applications Support
• Network Support
• Security
• Operations
UK Tier-2 Centres
NorthGrid ****: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid *: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
ScotGrid *: Durham, Edinburgh, Glasgow
LondonGrid ***: Brunel, Imperial, QMUL, RHUL, UCL
Current UK Status: 11 Sites via LCG
Tier2 Centres
• UK model of distributed Tier2 Centres
• Managerial and organisational ‘centre’
• Tier2 is free to organise internally
– so I cannot describe it yet
• Tier2 is smaller than an EGEE Region
– but some aspects of the model may be useful (their own VO? own RB?)
• May hide some of the internal structure (CE, GIIS?)
Deployment
• A team to roll out software across the UK
• Software release certification, installation support, site certification
• Specialist support for sysadmins
• Consists of staff from T1 + T2
User Support
• Migrate from mailing list to problem tracking
• From sysadmin support to user support
• Managed Helpdesk
– for assignment, tracking, escalation
• We already have a lot of experience
– but we haven’t encapsulated it in FAQs etc.
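The three helpdesk functions named on this slide (assignment, tracking, escalation) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Ticket` class, the support-level names in `LEVELS` and the 48-hour escalation threshold are all hypothetical and are not taken from any GridPP tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical support levels, lowest first (assumption, not from the talk).
LEVELS = ["site-sysadmin", "tier2-support", "production-team"]


@dataclass
class Ticket:
    summary: str
    opened: datetime
    level: int = 0                       # index into LEVELS
    assignee: Optional[str] = None
    history: List[str] = field(default_factory=list)  # tracking log

    def assign(self, person: str) -> None:
        """Assignment: record who owns the ticket at the current level."""
        self.assignee = person
        self.history.append(f"assigned to {person} at {LEVELS[self.level]}")

    def escalate_if_stale(self, now: datetime,
                          max_age: timedelta = timedelta(hours=48)) -> bool:
        """Escalation: move a ticket up one level once it is older than max_age."""
        if now - self.opened > max_age and self.level < len(LEVELS) - 1:
            self.level += 1
            self.assignee = None         # needs a new owner at the next level
            self.history.append(f"escalated to {LEVELS[self.level]}")
            return True
        return False
```

A ticket opened on a Tuesday morning and still open on Friday morning would be escalated once, and its `history` list would hold the assignment and escalation entries in order.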
Middleware, Security and Network Development
[Figure: development areas. Labels: Middleware; Networking; Security; Grid Data Management; Network Monitoring; Configuration Management; Storage Interfaces; Information Services.]
M/S/N builds upon UK strengths as part of international development
Middleware Support
• GridPP2 Middleware development should have an emphasis on delivery and support
• Middleware teams should support their software area
• T2 assigned 5 specialist support posts
• Integrate support effort into Production Team
Applications Support
• Stephen Burke – roaming support
• 2 T1 experiment-facing people
• UK experiments
• Get deployment and middleware support working with experiments
– to ensure successful UK involvement in experiments’ use of the Grid
Network Support
• Mark Leese (CCLRC-DL)
– Rolled out network monitoring to UK Core e-Science programme
– GridPP2 role in network support
– Network optimisation
– Participation in service challenges
– Hopefully using lightpaths
Security
• New Security Officer (to be appointed)
– Security operations
• Consultants
– Kelsey – Joint EGEE-LCG Security
– Jensen – technical advice to CA/middleware
– McNab – e-Science Security Centre
• Track UK developments (Permis, Shibboleth)
Grid Operations
[Figure: GOC secure database management via HTTPS / X.509. Labels: GOC; GridSite; MySQL; Resource Centre (RC); Resources & Site Information; EDG, LCG-1, LCG-2, …; ce, bdii, se, rb; Monitoring.]
Operations
• LCG Operations centre
• EGEE ROC
• Monitor GridPP (and NGS and GridIreland)
• Developed tools for LCG, reuse for GridPP
• Continue developing for EGEE
• EGEE CIC running grid-wide services
• Accounting
LCG Core Accounting
[Figure: base CPU time in seconds (log scale, 1.00E+00 to 1.00E+10) by site (TAIPEI, NIKHEF, CNAF, RAL, FZK, CERN, CAM) and VO (Alice, Atlas, cms, d0, LHCb, dteam).]
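The accounting chart reports base CPU time broken down by site and VO. As a rough sketch of how such a summary could be produced, the snippet below totals CPU seconds from per-job records. The record layout `(site, VO, CPU seconds)`, the function name and the sample values are assumptions made for illustration; they are not the LCG accounting schema and not real usage figures.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-job record: (site, VO, CPU seconds).
Record = Tuple[str, str, float]


def cpu_seconds_by_site_and_vo(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Sum CPU seconds for every (site, VO) pair."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for site, vo, seconds in records:
        totals[site][vo] += seconds
    # Convert nested defaultdicts to plain dicts for reporting.
    return {site: dict(per_vo) for site, per_vo in totals.items()}


# Invented sample data, for illustration only.
sample = [
    ("RAL", "atlas", 3600.0),
    ("RAL", "atlas", 1800.0),
    ("RAL", "lhcb", 600.0),
    ("CERN", "cms", 7200.0),
]
summary = cpu_seconds_by_site_and_vo(sample)
```

Because the totals span many orders of magnitude between sites, a chart of this summary is most readable on a logarithmic axis, as in the original figure.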
Wider Support
• GSC
– UK helpdesk
– UK e-Science CA
• Training
– Our own and EGEE (NeSC)
Other UK Grids
• NGS
– National Grid Service
– 4 large clusters + 2 UK Supercomputers
– Already using VDT and BDII
• ETF
– Developing UK OGSA/WSRF Grid
• UK Grid Operations Centre Director
– Speaking next
• Should all be part of EGEE
EGEE
• UK/I Region in EGEE covers GridPP, NGS, and Grid Ireland – one of 10 regions
• EGEE’s aim is to integrate national grids
– Not to interfere or impose limits on them
• All of the work I have described, short of actually running the Resource Centres, is EGEE work
– Many sites are actually signed up to EGEE so we can report it formally as such
– Many of you will be asked to report work to EGEE (timesheets, quarterly reports) but this shouldn’t be an imposition
• The development of GridPP will be aligned with EGEE
– But EGEE is not well defined, so we plan GridPP and participate in the developing EGEE to learn, adopt, and influence.
EGEE Issues
• EGEE=LCG?
– non-European sites in LCG
– non-LCG sites in EGEE
• Platform Support
– non-Linux, free Linux (cf. RHEL)
• Integrated user support
• Support for new VOs
• Security, security, security
The Next Steps
• Just appointed Jeremy Coles
– as GridPP Production Manager
• Grid Definition
– define GridPP,
– get buy-in of stakeholders
• Production Team
– build the team
• Workplan
Production Manager Tasks
• Develop work plan (deliverables/milestones)
• Compile problems and issues list (implement tracking)
• Organise a GridPP deployment group workshop
• Better establish GridPP identity – address UK-specific needs
• Review/develop operating procedures to maintain GridPP service
• Get GridPP more involved at UK/experiment software meetings
• Coordinate UK Tier-2 resource input to LCG and EGEE
• Work with other grids to establish a single production grid.
Running a production service:
areas to be reviewed and developed
Main areas to be considered (transparency, control, accountability, security, improvement)
• Grid accounting
– Who needs to know what and in what form? Where are the gaps in LCG accounting?
• Grid monitoring
– Service-level management tools. Efficiency of resource usage. Replication issues.
– Detailed metrics to be agreed
– Real-time notification and problem resolution
• Management & reporting
– Grid management: VO setup procedures; adding new Tier-2 resources
– Frequency, structure and content of reports to be agreed (e.g. resource usage, job success rates against targets)
• Security
– Processes and procedures (e.g. incident handling)
– Mechanics of trust model defined: identity, privacy, policy and authority (e.g. how are rights revoked? Appeals.)
– Misuse of resources (intrusion), user & usage audits
• Support
– Library – deployment documentation. User feedback – mechanism to inform future developments
• Training
– For new GridPP users and new operations staff
• Tier-2 management
– Installation (joining) requirements/guidelines
– Integration & helpdesk requirements
• Resource
– Service levels (SLAs/MoUs to be developed)
– Resource, quota and priority handling
• Middleware release strategy (and stabilisation!)
• Maintenance plans
• Audit
– Of Grid usage by user/VO
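One of the reporting items on this slide is job success rates against targets. A minimal sketch of such a report is given below, assuming per-VO counts of succeeded and submitted jobs are already available. The function name, the input layout and the 80% target used in the example are hypothetical; the slide itself leaves the detailed metrics to be agreed.

```python
from typing import Dict, List, Tuple


def success_report(jobs: Dict[str, Tuple[int, int]],
                   target: float) -> List[Tuple[str, float, bool]]:
    """Return (VO, success rate, meets target) rows, sorted by VO name.

    jobs maps a VO name to (succeeded, submitted). A VO with no submitted
    jobs is reported with a rate of 0.0 rather than raising an error.
    """
    report = []
    for vo in sorted(jobs):
        succeeded, submitted = jobs[vo]
        rate = succeeded / submitted if submitted else 0.0
        report.append((vo, rate, rate >= target))
    return report


# Invented counts, for illustration only.
rows = success_report({"atlas": (90, 100), "cms": (40, 80)}, target=0.8)
```

Each row can then be rendered in whatever report format is eventually agreed, with the boolean flag marking VOs that fall below target.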
Vision
• GridPP2 should deliver a production quality grid
• Meeting the computing needs of UK Particle Physics
• Autonomous and self-supporting with its own identity
• Participating in LCG, EGEE, BaBarGrid, SAMGrid, and any others desired by its members
• Part of an integrated UK Grid
• Independent but integrated, separate but seamless
Challenge
• LCG has given us a good base
– We now have a critical mass based on LCG2
• Make it a production-quality grid
• Attract the satellite grids (UKQCD, BaBar)
– And bring in other experiments
• Participate fully in LCG and EGEE
– Without alienating non-LHC experiments
Can we do it?
Yes, we can!