
The Credibility Gap
11/Oct/07
D. Britton
GridPP2 ProjectMap
[Figure: project map with a colour-coded status box for each numbered item. Status date: 30/Jun/07.]
Production Grid Milestones: items 0.1 to 0.68
Production Grid Metrics: items 0.100 to 0.147
1 LCG: 1.1 Design, 1.2 Service Challenges, 1.3 Development
2 M/S/N: 2.1 Metadata, 2.2 Storage, 2.3 Workload, 2.4 Security, 2.5 InfoMon, 2.6 Network
3 LHC Apps: 3.1 ATLAS, 3.2 GANGA, 3.3 LHCb, 3.4 CMS, 3.5 PhenoGrid, 3.6 LHC Deployment
4 Non-LHC Apps: 4.1 BaBar, 4.2 SamGrid, 4.3 Portal, 4.4 UKQCD
5 Management: 5.1 Project Planning, 5.2 Project Execution
6 External: 6.1 Dissemination, 6.2 Interoperability, 6.3 Engagement, 6.4 Knowledge Transfer
Legend: Monitor OK; Monitor not OK; Milestone complete; Milestone overdue; Milestone due soon; Milestone not due soon; Item not Active
Summary boxes: Metric OK; Metric not OK; Tasks Complete; Tasks Overdue; Tasks due in next 60 days; Tasks not due in next 90 days; Items Inactive; Change Forms
Summary values as shown, in extraction order: 91 (93%), 7, 211 (86%), 23, 41, 12, 8
GridPP2+ Deliverables
• These have been defined.
• Not yet in the GridPP2 Project Map (no space).
• Will be monitored separately and/or in the GridPP3 Project Map.
11/Nov/2007
GridPP Status
D. Britton
GridPP3 Project Map
Risk Register
ID    Name
R1    Recruitment/retention difficulties
R2    Sudden loss of key staff
R3    Minimal Contingency
R4    GridPP deliverables late
R5    Sub-components not delivered to project
R6    Non take-up of project results
R7    Change in project scope
R8    Bad publicity
R9    External OS dependence
R10   External middleware dependence
R11   Lack of monitoring of staff
R12   Withdrawal of an experiment
R13   Lack of cooperation between Tier centres
R14   Scalability problems
R15   Software maintainability problems
R16   Technology shifts
R17   Repetition of research
R18   Lack of funding to meet LCG PH-1 goals
R20   Conflicting software requirements
R22   Hardware resources inadequate
R25   Hardware procurement problems
R26   LAN Bottlenecks
R27   Tier-2 organisation fails
R28   Experiment Requirements not met
R29   SYSMAN effort inadequate
R30   Firewalls interfere with Grid
R31   Inability to establish trust relationships
R32   Security inadequate to operate Grid
R33   Interoperability
R35   Failure of international cooperation
R36   e-Science and GridPP divergence
R37   Institutes do not embrace Grid
R38   Grid does not work as required
R39   Delay of the LHC
R40   Lack of future funding
R41   Network backbone failure
R42   Network backbone bottleneck
R43   Network backbone upgrade delay
R44   Inadequate User Support
[Risk matrix: each risk is scored for Likelihood (Li), Impact (Im) and Risk, for GridPP overall and for the LCG, MSN, Apps and Pro. Grid areas. The individual cell values cannot be reliably reassigned to rows from this transcript. Where a pair and its score survive together (for example Li 4, Im 3, Risk 12; or Li 3, Im 3, Risk 9), the Risk value equals Li × Im. Two items are called out on the slide: CASTOR and the “Credibility Gap”.]
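The scores that survive in the risk register are consistent with Risk = Likelihood × Impact (for example 4 and 3 giving 12). The slides do not state this rule, so the sketch below treats it as an assumption. The function name `risk_score` and the list `surviving_triples` are illustrative names, and the triples are read from adjacent values in the extracted register.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Assumed scoring rule: risk is the product of likelihood and impact."""
    return likelihood * impact


# (likelihood, impact, quoted risk) triples that appear together in the register.
surviving_triples = [
    (4, 3, 12),
    (3, 3, 9),
    (2, 3, 6),
    (2, 2, 4),
    (1, 4, 4),
]

# Collect any triple whose quoted risk disagrees with the assumed rule.
disagreements = [
    (li, im, quoted)
    for li, im, quoted in surviving_triples
    if risk_score(li, im) != quoted
]

print("triples checked:", len(surviving_triples))
print("disagreements:", disagreements)
```

Under this assumption, a risk moves up the register fastest when both likelihood and impact rise together, which is why a likelihood of 4 with an impact of 3 produces the largest score seen here.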
Credibility Gap: Definition
• Refers to the lack of funding for the support of experiment applications running on the Grid.
• We believe that this “fell between two stools”: GridPP3 is about the deployment and support of infrastructure; the Rolling Grants focused on the exploitation of physics, assuming the tools were in place.
• We identify this as a gap in the overall UK strategy to capitalise on all the investment in LHC hardware and computing.
• The danger is that the experiments’ use of the Grid will be inefficient and/or labour-intensive, resulting in UK physicists becoming uncompetitive.
GridPP Actions
After raising this at the last Oversight Committee Meeting, and following the advice received, GridPP took the following actions to address the Credibility Gap:
1) Identified funds that are available, or could potentially be made available, within the existing GridPP2 funding envelope.
2) Consulted with ATLAS, CMS and LHCb about how best to address the problem.
3) Formulated a detailed plan for GridPP3, which included the above funds, and submitted it to STFC.
GridPP2 Funds
RAL
  SLA FY01                 £1,522,055
  SLA FY02                 £1,639,723
  SLA FY03                 £1,602,243
  SLA FY04                 £2,461,037
  SLA FY05                 £2,357,724
  SLA FY06                 £2,737,436
  Non SLA                     £26,667
  Sub Total               £12,346,885
University Grants
  GridPP1 Issued           £4,603,425
  GridPP1 recovered          -£52,883
  GridPP2 Issued           £6,439,676
  Sub Total               £10,990,219
Other Costs
  Globus Support              £12,658
  CERN                     £5,666,835
  Sub Total                £5,679,493
Expenditure
  Spend to date           £29,016,596
Income
  GridPP1 Award           £17,000,000
  GridPP2 Award           £15,900,000
  Total Award             £32,900,000
Encumbrances
  SLA FY07 Staff             £832,420
  SLA FY07 Travel            £275,000
  SLA FY07 Hardware        £1,508,318
  Total Encumbrance        £2,615,738
Balance
  Balance                  £1,267,665

£1268k funding identified, arising from:
• £316k (EGEE funding for the four Tier-2 coordinators).
• £94k not yet spent on the GridPP2 Tier-2 hardware line.
• £64k accrued due to vacant staff posts.
• £40k not yet spent on the GridPP2 consumables line.
• £22k underspend on Tier-1 hardware in FY06.
• £134k saved on the total travel budget due to EU rebates.
• £598k documented in the previous Oversight Committee document.
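The funds table can be cross-checked with simple arithmetic. The sketch below uses only the figures quoted on the slide; the variable names and the `difference` helper are illustrative. Some quoted totals differ from the recomputed ones by £1, which is assumed here to come from rounding of pence in the underlying accounts.

```python
# Figures transcribed from the GridPP2 Funds slide, in whole pounds.
ral = {
    "SLA FY01": 1_522_055, "SLA FY02": 1_639_723, "SLA FY03": 1_602_243,
    "SLA FY04": 2_461_037, "SLA FY05": 2_357_724, "SLA FY06": 2_737_436,
    "Non SLA": 26_667,
}
university_grants = {
    "GridPP1 Issued": 4_603_425,
    "GridPP1 recovered": -52_883,
    "GridPP2 Issued": 6_439_676,
}
other_costs = {"Globus Support": 12_658, "CERN": 5_666_835}
awards = {"GridPP1 Award": 17_000_000, "GridPP2 Award": 15_900_000}
encumbrances = {
    "SLA FY07 Staff": 832_420,
    "SLA FY07 Travel": 275_000,
    "SLA FY07 Hardware": 1_508_318,
}

# Totals exactly as quoted on the slide.
quoted = {
    "RAL": 12_346_885,
    "University Grants": 10_990_219,
    "Other Costs": 5_679_493,
    "Spend to date": 29_016_596,
    "Total Award": 32_900_000,
    "Total Encumbrance": 2_615_738,
    "Balance": 1_267_665,
}

# Sources of the identified funding, in thousands of pounds.
sources_k = {
    "EGEE Tier-2 coordinators": 316, "Tier-2 hardware": 94,
    "Vacant posts": 64, "Consumables": 40, "Tier-1 hardware FY06": 22,
    "Travel (EU rebates)": 134, "Previously documented": 598,
}


def difference(computed: int, label: str) -> int:
    """Return the recomputed value minus the quoted figure for `label`."""
    return computed - quoted[label]


checks = {
    "RAL": difference(sum(ral.values()), "RAL"),
    "University Grants": difference(sum(university_grants.values()), "University Grants"),
    "Other Costs": difference(sum(other_costs.values()), "Other Costs"),
    "Spend to date": difference(
        quoted["RAL"] + quoted["University Grants"] + quoted["Other Costs"],
        "Spend to date",
    ),
    "Total Award": difference(sum(awards.values()), "Total Award"),
    "Total Encumbrance": difference(sum(encumbrances.values()), "Total Encumbrance"),
    "Balance": difference(
        quoted["Total Award"] - quoted["Spend to date"] - quoted["Total Encumbrance"],
        "Balance",
    ),
}

for label, diff in checks.items():
    print(f"{label}: recomputed - quoted = {diff:+d}")
print("Identified sources total (£k):", sum(sources_k.values()))
```

Every recomputed total agrees with the slide to within £1, and the seven listed sources add up to the £1268k identified.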
Plan Proposed
Post   Title                    FTE Frac   Start Date   Duration (Months)   Effort (Months)   Cost Estimate
Previously Identified Posts
NE9    UB Chair                  25%       01-Sep-07     43.00               10.75               £90,785
NE8    Administrative Assist.    50%       01-Apr-08     36.00               18.00               £35,328
SubTotal                                                 79.00               28.75              £126,113
Support Transition Posts
NC28   VOMS Service              50%       01-Apr-08     24.00               12.00               £85,400
NC29   RTM Transition            50%       01-Nov-07      7.00                3.50               £21,562
NC30   Networking Completion     50%       01-Sep-07      7.00                3.50               £20,532
NC31   Metadata Transition      100%       01-Oct-07      6.60                6.60               £45,000
SubTotal                                                 44.60               25.60              £172,493
Experiment Support Posts
ND11   Atlas T1                  50%       01-Apr-08     36.00               18.00              £107,500
ND12   LHCb T1                  100%       01-Apr-08     36.00               36.00              £215,000
ND13   CMS T1                   150%       01-Apr-08     36.00               54.00              £322,500
ND14   Atlas Ganga              100%       01-Apr-08     36.00               36.00              £215,000
ND15   LHC Ganga                 50%       01-Apr-08     36.00               18.00              £107,500
SubTotal                                                180.00              162.00              £967,500
Grand Total                                             303.60                                £1,266,107
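In the plan table, each post's effort equals its FTE fraction multiplied by its duration, and the totals follow from the rows. The sketch below checks this using the figures from the table; the tuple layout and variable names are illustrative choices.

```python
# (group, post, fte_fraction, duration_months, quoted_effort_months, cost_gbp)
posts = [
    ("Previously Identified", "NE9", 0.25, 43.0, 10.75, 90_785),
    ("Previously Identified", "NE8", 0.50, 36.0, 18.0, 35_328),
    ("Support Transition", "NC28", 0.50, 24.0, 12.0, 85_400),
    ("Support Transition", "NC29", 0.50, 7.0, 3.5, 21_562),
    ("Support Transition", "NC30", 0.50, 7.0, 3.5, 20_532),
    ("Support Transition", "NC31", 1.00, 6.6, 6.6, 45_000),
    ("Experiment Support", "ND11", 0.50, 36.0, 18.0, 107_500),
    ("Experiment Support", "ND12", 1.00, 36.0, 36.0, 215_000),
    ("Experiment Support", "ND13", 1.50, 36.0, 54.0, 322_500),
    ("Experiment Support", "ND14", 1.00, 36.0, 36.0, 215_000),
    ("Experiment Support", "ND15", 0.50, 36.0, 18.0, 107_500),
]

# Posts whose quoted effort is not FTE fraction x duration (to 2 decimal places).
mismatches = [
    post
    for _, post, fte, duration, effort, _ in posts
    if abs(fte * duration - effort) > 0.005
]

# Effort subtotal per group of posts.
effort_by_group = {}
for group, _, _, _, effort, _ in posts:
    effort_by_group[group] = effort_by_group.get(group, 0.0) + effort

total_duration = sum(duration for _, _, _, duration, _, _ in posts)
total_cost = sum(cost for *_, cost in posts)

# Combined FTE of the experiment support posts (three experiments).
experiment_fte = sum(
    fte for group, _, fte, _, _, _ in posts if group == "Experiment Support"
)

print("effort mismatches:", mismatches)
for group, effort in effort_by_group.items():
    print(f"{group}: {effort:.2f} months of effort")
print(f"total duration: {total_duration:.2f} months")
print(f"total cost: £{total_cost:,}")
print(f"experiment support: {experiment_fte} FTE, {experiment_fte / 3} per experiment")
```

The individual cost estimates sum to the quoted grand total of £1,266,107, while the quoted Support Transition subtotal (£172,493) is £1 below the sum of its rows, presumably because of rounding. The experiment support posts total 4.5 FTE, which is the 1.5 FTE per experiment referred to elsewhere in the slides.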
STFC Response
• We would not be permitted to carry forward the £1.27m from GridPP2.
• After considerable iteration it was agreed that a subset of the proposed posts could be funded from the GridPP3 Working Allowance.
• Experiment support posts were reduced from 1.5 to 1.0 FTE for each of the three experiments.
• Some discretion would be applied in the use of the GridPP3 contingency.
GridPP Position
GridPP notes that:
1) The 1.5 FTE proposed to support the experiment applications was already sub-minimal, and reducing this compounds the problem.
2) The working allowance was approved to address concerns about the service levels at the Tier-1 and Tier-2s. Pre-spending this elsewhere introduces risk.
3) GridPP2 funds were peer-review approved. They were part of the consideration when deciding on the level of the GridPP3 award. At least half the savings were documented prior to the finalisation of the GridPP3 award.
4) The additional contributions to the savings were largely made through obtaining EU funding and careful and responsive management (to delays in the LHC schedule, for example, and anticipation of difficult times ahead).
Experiments’ View
• ATLAS notes that their request for ATLAS-UK funds in this area is on hold pending clarification of the GridPP situation. The timing of all this is bad.
• CMS are deeply concerned about the shortfall and feel there is significant risk to their operations. They are particularly concerned, being smaller than ATLAS, that they are living on “borrowed time”, as several key academics keeping the computing side afloat will shortly have to return to other duties with no obvious substitutes available.
• More generally, CMS perceive a substantial risk that the RAL Tier-1 will not be integrated into any of the experiments’ international computing systems at the application level.
• LHCb has managed to find some additional funding (Imperial College) but worries about their ability to meet the full demands of their computing model in 2008.
Summary
• GridPP has responded to concerns about the shortfall of application support effort by identifying funding within the GridPP2 envelope to fund 1.5 FTE per experiment.
• STFC has not approved this and instead proposes that the GridPP3 working allowance fund 1.0 FTE per experiment.
• GridPP and the experiments do not feel this is wise and/or sufficient.
• The Credibility Gap has not been completely closed.