The Credibility Gap
11/Oct/07
D. Britton

GridPP2 Project Map
[Figure: GridPP2 Project Map, status date 30/Jun/07, showing the status of the Production Grid milestones and metrics and of the tasks in each area: 1 LCG (Design, Service Challenges, Development); 2 M/S/N (Metadata, Storage, Workload, Security, InfoMon, Network); 3 LHC Apps (ATLAS, GANGA, LHCb, CMS, PhenoGrid, LHC Deployment); 4 Non-LHC Apps (BaBar, SamGrid, Portal, UKQCD); 5 Management (Project Planning, Project Execution); 6 External (Dissemination, Interoperability, Engagement, Knowledge Transfer). Legend: metric OK / not OK; monitor OK / not OK; milestone complete, overdue, due soon or not due soon; item not active.]

GridPP2+ Deliverables
• These have been defined.
• They are not yet in the GridPP2 Project Map (no space).
• They will be monitored separately and/or in the GridPP3 Project Map.
11/Nov/2007 GridPP Status D. Britton

GridPP3 Project Map
[Figure: GridPP3 Project Map]

Risk Register
[Table: likelihood (Li), impact (Im) and risk scores for each risk, by area (GridPP, LCG, MSN, Apps, Pro. Grid). Annotations on the slide: CASTOR; “Credibility Gap”.]
ID   Name
R1   Recruitment/retention difficulties
R2   Sudden loss of key staff
R3   Minimal Contingency
R4   GridPP deliverables late
R5   Sub-components not delivered to project
R6   Non take-up of project results
R7   Change in project scope
R8   Bad publicity
R9   External OS dependence
R10  External middleware dependence
R11  Lack of monitoring of staff
R12  Withdrawal of an experiment
R13  Lack of cooperation between Tier centres
R14  Scalability problems
R15  Software maintainability problems
R16  Technology shifts
R17  Repetition of research
R18  Lack of funding to meet LCG PH-1 goals
R20  Conflicting software requirements
R22  Hardware resources inadequate
R25  Hardware procurement problems
R26  LAN Bottlenecks
R27  Tier-2 organisation fails
R28  Experiment Requirements not met
R29  SYSMAN effort inadequate
R30  Firewalls interfere with Grid
R31  Inability to establish trust relationships
R32  Security inadequate to operate Grid
R33  Interoperability
R35  Failure of international cooperation
R36  e-Science and GridPP divergence
R37  Institutes do not embrace Grid
R38  Grid does not work as required
R39  Delay of the LHC
R40  Lack of future funding
R41  Network backbone failure
R42  Network backbone bottleneck
R43  Network backbone upgrade delay
R44  Inadequate User Support

Credibility Gap: Definition
• Refers to the lack of funding for the support of experiment applications running on the Grid.
• We believe that this “fell between two stools”: GridPP3 is about the deployment and support of infrastructure, while the Rolling Grants focused on the exploitation of physics, assuming the tools were in place.
• We identify this as a gap in the overall UK strategy to capitalise on all the investment in LHC hardware and computing.
• The danger is that the experiments’ use of the Grid will be inefficient and/or labour-intensive, resulting in UK physicists becoming uncompetitive.

GridPP Actions
After raising this at the last Oversight Committee meeting, and following the advice received, GridPP took the following actions to address the Credibility Gap:
1) Identified funds that were available, or could potentially be made available, within the existing GridPP2 funding envelope.
2) Consulted with ATLAS, CMS and LHCb about how best to address the problem.
3) Formulated a detailed plan for GridPP3, which included the above funds, and submitted it to STFC.

GridPP2 Funds
Expenditure
RAL
  SLA FY01                £1,522,055
  SLA FY02                £1,639,723
  SLA FY03                £1,602,243
  SLA FY04                £2,461,037
  SLA FY05                £2,357,724
  SLA FY06                £2,737,436
  Non SLA                    £26,667
  Sub Total              £12,346,885
University Grants
  GridPP1 Issued          £4,603,425
  GridPP1 recovered         -£52,883
  GridPP2 Issued          £6,439,676
  Sub Total              £10,990,219
Other Costs
  Globus Support             £12,658
  CERN                    £5,666,835
  Sub Total               £5,679,493
Spend to date            £29,016,596
Income
  GridPP1 Award          £17,000,000
  GridPP2 Award          £15,900,000
  Total Award            £32,900,000
Encumbrances
  SLA FY07 Staff            £832,420
  SLA FY07 Travel           £275,000
  SLA FY07 Hardware       £1,508,318
  Total Encumbrance       £2,615,738
Balance                   £1,267,665

£1268k of funding was identified, arising from:
£316k (EGEE funding for the four Tier-2 coordinators).
£94k not yet spent on the GridPP2 Tier-2 hardware line.
£64k accrued due to vacant staff posts.
£40k not yet spent on the GridPP2 consumables line.
£22k underspend on Tier-1 hardware in FY06.
£134k saved on the total travel budget due to EU rebates.
£598k documented in the previous Oversight Committee document.

Plan Proposed
Post                          FTE Frac  Start Date  Duration (Months)  Effort (Months)  Cost Estimate
Previously Identified Posts
NE9  UB Chair                 25%       01-Sep-07   43.00              10.75            £90,785
NE8  Administrative Assist.   50%       01-Apr-08   36.00              18.00            £35,328
SubTotal                                            79.00              28.75            £126,113
Support Transition Posts
NC28 VOMS Service             50%       01-Apr-08   24.00              12.00            £85,400
NC29 RTM Transition           50%       01-Nov-07   7.00               3.50             £21,562
NC30 Networking Completion    50%       01-Sep-07   7.00               3.50             £20,532
NC31 Metadata Transition      100%      01-Oct-07   6.60               6.60             £45,000
SubTotal                                            44.60              25.60            £172,493
Experiment Support Posts
ND11 Atlas T1                 50%       01-Apr-08   36.00              18.00            £107,500
ND12 LHCb T1                  100%      01-Apr-08   36.00              36.00            £215,000
ND13 CMS T1                   150%      01-Apr-08   36.00              54.00            £322,500
ND14 Atlas Ganga              100%      01-Apr-08   36.00              36.00            £215,000
ND15 LHC Ganga                50%       01-Apr-08   36.00              18.00            £107,500
SubTotal                                            180.00             162.00           £967,500
Grand Total                                         303.60                              £1,266,107

STFC Response
• We would not be permitted to carry forward the £1.27m from GridPP2.
• After considerable iteration, it was agreed that a subset of the proposed posts could be funded from the GridPP3 Working Allowance.
• Experiment support posts were reduced from 1.5 to 1.0 FTE for each of the three experiments.
• Some discretion would be applied in the use of the GridPP3 contingency.

GridPP Position
GridPP notes that:
1) The 1.5 FTE proposed to support the experiment applications was already sub-minimal, and reducing it compounds the problem.
2) The working allowance was approved to address concerns about the service levels at the Tier-1 and Tier-2s. Pre-spending it elsewhere introduces risk.
3) GridPP2 funds were approved through peer review and were part of the consideration when deciding on the level of the GridPP3 award.
4) At least half the savings were documented before the GridPP3 award was finalised. The remaining savings came largely from obtaining EU funding and from careful, responsive management (for example, responding to delays in the LHC schedule and anticipating difficult times ahead).

Experiments’ View
• ATLAS notes that its request for ATLAS-UK funds in this area is on hold pending clarification of the GridPP situation. The timing of all this is bad.
• CMS are deeply concerned about the shortfall and feel there is significant risk to their operations. Being smaller than ATLAS, they are particularly concerned that they are living on “borrowed time”, as several key academics keeping the computing side afloat will shortly have to return to other duties, with no obvious substitutes available. More generally, CMS perceive a substantial risk that the RAL Tier-1 will not be integrated into any of the experiment’s international computing systems at the application level.
• LHCb has managed to find some additional funding (Imperial College) but is worried about its ability to meet the full demands of its computing model in 2008.

Summary
• GridPP has responded to concerns about the shortfall in application support effort by identifying funding within the GridPP2 envelope for 1.5 FTE per experiment.
• STFC has not approved this; instead, it proposes that the GridPP3 working allowance fund 1.0 FTE per experiment.
• GridPP and the experiments do not feel this is wise and/or sufficient.
• The Credibility Gap has not been completely closed.
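The funding arithmetic on the GridPP2 Funds and Plan Proposed slides can be cross-checked with a short script. This is a minimal sketch, not part of the original analysis: all figures are copied from the slides, while the variable and function names are hypothetical. It checks that the balance equals the total award minus spend to date minus encumbrances, that the identified savings add up to £1268k, and that each post's effort equals its FTE fraction multiplied by its duration.

```python
"""Illustrative cross-check of the GridPP2 Funds and Plan Proposed arithmetic.

All figures are copied from the slides; the names used here are hypothetical.
"""
import math

# GridPP2 Funds: expenditure lines in whole pounds.
RAL = [1_522_055, 1_639_723, 1_602_243, 2_461_037, 2_357_724, 2_737_436, 26_667]
UNIVERSITY_GRANTS = [4_603_425, -52_883, 6_439_676]   # GridPP1 issued, recovered, GridPP2 issued
OTHER_COSTS = [12_658, 5_666_835]                     # Globus support, CERN
AWARDS = [17_000_000, 15_900_000]                     # GridPP1, GridPP2
ENCUMBRANCES = [832_420, 275_000, 1_508_318]          # SLA FY07 staff, travel, hardware
STATED_SPEND = 29_016_596
STATED_BALANCE = 1_267_665

# Sources of the identified funding, in thousands of pounds.
IDENTIFIED_K = [316, 94, 64, 40, 22, 134, 598]

# Plan Proposed: post -> (FTE fraction, duration in months, effort in months, cost in pounds).
POSTS = {
    "NE9": (0.25, 43.0, 10.75, 90_785),
    "NE8": (0.50, 36.0, 18.00, 35_328),
    "NC28": (0.50, 24.0, 12.00, 85_400),
    "NC29": (0.50, 7.0, 3.50, 21_562),
    "NC30": (0.50, 7.0, 3.50, 20_532),
    "NC31": (1.00, 6.6, 6.60, 45_000),
    "ND11": (0.50, 36.0, 18.00, 107_500),
    "ND12": (1.00, 36.0, 36.00, 215_000),
    "ND13": (1.50, 36.0, 54.00, 322_500),
    "ND14": (1.00, 36.0, 36.00, 215_000),
    "ND15": (0.50, 36.0, 18.00, 107_500),
}
STATED_PLAN_TOTAL = 1_266_107


def spend_to_date() -> int:
    """Sum of the RAL, university-grant and other-cost lines."""
    return sum(RAL) + sum(UNIVERSITY_GRANTS) + sum(OTHER_COSTS)


def balance() -> int:
    """Total award minus spend to date minus encumbrances."""
    return sum(AWARDS) - spend_to_date() - sum(ENCUMBRANCES)


def efforts_consistent() -> bool:
    """True if effort == FTE fraction x duration for every proposed post."""
    return all(math.isclose(fte * months, effort)
               for fte, months, effort, _cost in POSTS.values())


def plan_cost() -> int:
    """Total cost of all proposed posts."""
    return sum(post[3] for post in POSTS.values())


if __name__ == "__main__":
    print(f"spend to date: £{spend_to_date():,} (slide: £{STATED_SPEND:,})")
    print(f"balance: £{balance():,} (slide: £{STATED_BALANCE:,})")
    print(f"identified funding: £{sum(IDENTIFIED_K)}k")
    print(f"effort = FTE x duration for all posts: {efforts_consistent()}")
    print(f"plan cost: £{plan_cost():,} (slide: £{STATED_PLAN_TOTAL:,})")
```

Run as written, the computed spend to date and plan total match the slides exactly, while the computed balance is £1,267,666, £1 above the displayed £1,267,665; we assume this difference (like the £1 differences in some displayed subtotals) comes from rounding the displayed figures to whole pounds.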