Beta Test CET Briefing Charts


Observations on Cost Modeling and Performance
Measurement of Long Term Archives
Kathy Fontaine
NASA Goddard Space Flight Center
Earth Science Data Systems Working Groups
Greg Hunolt, Bud Booth, Mel Banks
SGT, Inc.
PV2007 Conference
October 9 - 11, 2007
DLR Oberpfaffenhofen - Munich - Germany
CEOS WGISS
October 15 - 19, 2007
DLR Oberpfaffenhofen - Munich - Germany
Agenda
• Review - What is the Cost Estimation Toolkit?
– Goal and Approach of the CET (Cost Estimation Tool) Development
– High Level Description of the Data Activity Reference Model
• Experience / Lessons Learned - Building and Maintaining
the Comparables Database (CDB)
• High Level Description of the Cost Estimating Tool
• Application of the CET to Long-Term Archives
• Summary
• Next Steps
Goal of the CET (Cost Estimation Tool) Development
• NASA has always used cost estimating models for planning Earth and
space science flight projects
– For estimating costs of instrument packages, spacecraft, mission control
centers, etc.
• NASA had no tool for estimating life cycle costs of science ground data
handling capabilities, whether stand-alone or within a flight project.
• The goal of the CET development was to see whether that gap could be filled – for the PI planning a new Data Activity
– To help the PI consider the full range of items that will contribute to the life cycle cost of a new data activity and to produce an estimate that the PI can compare to estimates produced by other means.
• Project was begun in 2002.
– CET Prototypes were tested and evaluated in 2003 and 2004.
– ‘Operational Beta’ versions were completed in 2005, 2006, and 2007.
– Initial testing at GSFC and LaRC in 2005, 2006, and 2007 was successful.
– CET ‘operational beta’ is being evaluated for addition to GSFC’s Integrated Development Center’s package of tools, and was made available as a NASA Open Source item in 2007.
CET Approach
• Cost Estimation by Analogy Method was Adopted
– Decision based on Benchmark study and internal testing of existing tools (PRICE, SEER,
COCOMO, and others);
– At the time, did not find other acceptable parametric methods for estimating life cycle
data costs for implementation and maintenance/operations costs;
– Ensures that estimates will be based on experience with existing science ground data
handling activities;
– Requires assembly of information about existing activities, and…
– Mapping of that information to a common reference model, so that information from
multiple activities with multiple data providers can be normalized and used together in
the estimating process.
• Comparables Database (CDB)
– The database of information from many existing data activities mapped to the common
reference model.
• Data Activity Reference Model
– Based on reference model developed during 2001 comparative analysis of 19 U.S. and
international data activities.
– Includes a set of development, operational and support functions / areas of cost, and
descriptors for each.
• CET Estimation by Analogy Implementation
– The CET uses adaptive regression curve fitting for estimating staffing levels and
parametric techniques (e.g. cost curves) for non-staff cost items.
Data Activity Reference Model
• A ‘Data Activity’ is
– An entity that performs data handling functions that may include ingest,
product generation, storage/archive, distribution, and support functions (see
below).
– A data activity’s life cycle includes implementation and a period of operations
(when data activity is performing data handling functions) that may overlap.
– A data activity can be a ‘stand-alone’ organization or embedded within a flight
project or other science or applications project. A ‘data center’ can include
more than one distinct data activity.
• Data Activity Reference Model
– Functions with Descriptors for each…
– Operating Functions: Ingest, Product Generation, Archive, Search and Order,
Access and Distribution, User Support.
– Support Functions: Documentation, Implementation, Sustaining Engineering,
Engineering Support, Management, Technical Coordination,
Facility/Infrastructure.
– See paper for more detail on functions, example of descriptors.
– Compatible with OAIS where the models overlap, see paper for more detail.
Concepts: Reference Model, CDB and CET
[Diagram: Information on existing data activities (DAACs, ESIPs, SIPSs, Space Science Data Centers, etc.) is mapped, via a Data Activity Template and the CDB Building & Maintenance Tool, to the General Data Activity Reference Model (functions, with descriptors for each function), producing mapped data activity information, year by year and function by function, in the Comparables Database (CDB), Version 2.1 with 29 data activities. The PI user specifies the mission schedule, selects from a menu of functions, and provides descriptors for each selected function; the Cost Estimation Toolkit (CET) then performs cost estimation by analogy, function by function (staff via adaptive regression curve fitting, non-staff via parametric methods), and outputs life-cycle costs and staffing levels, graphs, and sensitivity analysis.]
High Level Description of CET
• Excel-based, uses Visual Basic for Applications, with two workbooks (one for the CET, one holding the CDB); runs on PC or Macintosh platforms.
• Use the CET to
– Describe a new Data Activity: Menu Driven Sequence of Forms for Selecting
Functions and Entering Descriptors (example to follow).
– Produce a life cycle estimate: year by year, functional breakdown, staffing
profile and costs, costs for non-staff items (example to follow).
– Run a ‘what-if’: vary one or more inputs, re-run, produce new estimate and
comparison with original estimate.
– Test sensitivity of estimate to a range of variation of a selected descriptor.
– Produce graphs: select from a number of options (examples to follow).
– Review and edit/tailor the estimate… tool offers hints such as:
• Adjust staffing levels to smooth out ups and downs that track workload changes but
would be impractical to implement;
• Delete costs for items included in loaded labor rates;
• Adjust for re-use of existing resources.
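As one illustration of the first hint above, a user might smooth a year-by-year staffing estimate with a centered moving average. This is an assumption about how such an adjustment could be done, not the CET's own logic:

```python
def smooth_staffing(fte_by_year, window=3):
    """Centered moving average over a year-by-year FTE profile -- one simple
    way to smooth staffing swings that track workload changes but would be
    impractical to implement (illustrative, not the CET's adjustment logic)."""
    n = len(fte_by_year)
    out = []
    for i in range(n):
        lo = max(0, i - window // 2)           # window shrinks at the edges
        hi = min(n, i + window // 2 + 1)
        out.append(sum(fte_by_year[lo:hi]) / (hi - lo))
    return out

smoothed = smooth_staffing([4.0, 8.0, 4.0, 8.0, 4.0])
```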
Data Activity Reference Model:
Functions and Descriptors
Data Activity Functions    No. of Descriptors to Describe Each Function
Ingest                     9
Product Generation         21
Documentation              4
Archive                    16
Distribution               31
User Support               6
Management                 5
Sustaining Engineering     4
Engineering Support        3
Implementation             8
Data Activity Reference Model
Ingest Function Descriptors (Example)
Total Ingest FTE
Ingest Technical FTE
Ingest Operations FTE
Ingest Function Level of Service (LOS)
External Ingest Interfaces
Product Types Ingested per Year
Ingest Automation LOS
Number of Products Ingested per Year
Ingest Volume per Year
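As an illustration only, the nine Ingest descriptors above could be modeled as a simple record; the field names, units, and example values below are paraphrases invented for this sketch, not the CET's actual schema:

```python
from dataclasses import dataclass

@dataclass
class IngestDescriptors:
    """One record per data activity for the Ingest function
    (field names paraphrase the nine descriptors listed above)."""
    total_fte: float              # Total Ingest FTE
    technical_fte: float          # Ingest Technical FTE
    operations_fte: float         # Ingest Operations FTE
    level_of_service: int         # Ingest Function Level of Service (LOS)
    external_interfaces: int      # External Ingest Interfaces
    product_types_per_year: int   # Product Types Ingested per Year
    automation_los: int           # Ingest Automation LOS
    products_per_year: int        # Number of Products Ingested per Year
    volume_per_year_gb: float     # Ingest Volume per Year (unit assumed)

# Hypothetical example values
ingest = IngestDescriptors(3.0, 1.5, 1.5, 3, 4, 12, 2, 50_000, 750.0)
```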
CET - Sample Ingest Descriptor Input Form
CET Screen Shot – Archive Form
CET Screen Shot – Processing Form
CET Screen Shot – Sample Output Table
CET - Sample Life Cycle Cost Estimate Output
CET - Graph Example 1
[Pie chart: 3. Sample Activity - Total Mission Life Staffing by Labor Cost Category. Categories: Admin Support, Development / Engineering, Management, Operations, Technical / Science; shares shown (pairing with categories not preserved in the transcript): 50%, 33%, 9%, 7%, 1%.]
CET - Graph Example 2
[Pie chart: 5. Sample Activity - Average Annual Staffing by Function, FTE - Operations Period. Functions: Archive, Development, Distribution, Documentation, Eng Support, Ingest, Management, Processing, Sustaining Eng, Tech Coord, User Support; FTE values shown (pairing with functions not preserved in the transcript): 4.11, 3.87, 2.53, 1.79, 1.43, 0.97, 0.76, 0.42, 0.35, 0, 0.]
CET – Graph Example 3
[Pie chart: 7. Sample Activity - Total Estimated Staff Costs. Categories: Development / Engineering, Management / Admin, Operations, Technical / Science; shares shown (pairing with categories not preserved in the transcript): 49%, 34%, 10%, 7%.]
Application of the CET to Long-Term Archives
• CET and CDB currently do not directly support Long Term Archives
– No such NASA requirement currently exists for Earth science data, but they
could be extended to do so…
• Step 1 – Extend Data Activity Reference Model:
– Analyze OAIS model (especially Preservation Planning and aspects of Ingest,
Archival Storage, and Data Management) and existing Long Term Archives
– Identify specific functions or aspects of functions associated with long term
archiving that go beyond what the model now includes.
• Step 2 – Extend the Comparables Database:
– Collect information from a number of existing Long Term Archives
– Map to the extended Data Activity Reference Model
– Populate the CDB
• Step 3 – Extend the CET:
– Add estimation of new factors particular to Long Term Archives
• Extended CET / CDB could then be used to estimate staffing / costs for
a New Long Term Archive, and could be used to support management
of existing Long Term Archives.
Observations
• Yes, the gap could be filled
– The CET is proving to be a valuable tool for estimating the life cycle costs of
scientific data processing, archive, and distribution activities.
– The information collected for the CDB can also be used by such activities to
monitor their performance.
– The CET and its database are capable of being extended to encompass long term archives, thus providing a quantitative tool for both planning their development and monitoring their performance.
• However,
– Cost estimation by analogy requires, among other things,
• lots of analogies [many data activities of similar sizes, for instance]
• lots of maintenance [information must be updated to maintain currency and
relevancy]
• lots of security [data activity information must not be labeled or otherwise
identifiable]
– All of the above would require a good, solid set of requirements, a project plan,
and other necessary management and review structure.
• And so…
Next Steps
• NASA is preparing to do an in-depth evaluation of the
tool
– NASA’s evolving data systems present a different
overall picture than was present at the beginning of this
process;
– It is now time to determine whether ‘it should continue
to be done,’ and if so, which pieces and how.
– Existing user feedback is being incorporated, and will
continue to be critical to this process.
http://opensource.gsfc.nasa.gov
Thank You for Your Attention!
Questions?
Further questions or comments: [email protected]
Backup Charts
CET Effort Estimation Process
[Diagram: The Comparables DB describes existing activities – effort and workload parameters for multiple activities, year by year, function by function, parameter by parameter. From these the CET computes annual averages and workload/effort parameters for each CDB activity, then across CDB activities, and generates effort estimating relationships for each workload parameter. An Activity Dataset describes a new activity – workload and LOS parameters for a single new activity, year by year, function by function, stream by stream. The CET computes parameters year by year, summed over streams; then a set of year-by-year effort estimates for each workload parameter; and finally year-by-year effort estimates as a correlation-weighted average over workload parameters, with levels of service applied, yielding the overall effort estimate.]

Form of effort estimate computation: Effort[new activity] = f(Workload[new activity]), where f is a function based on the CDB activities’ effort-workload relationships, developed using the “curve-fit” approach.
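The correlation-weighted combination described above can be sketched in a few lines; the fitted relationships, coefficients, and correlation weights here are invented stand-ins for the CDB-derived effort estimating relationships, not the CET's internal API:

```python
def estimate_effort(workloads, relationships):
    """Correlation-weighted average of per-parameter effort estimates.

    workloads:      {parameter_name: value} for one year of the new activity
    relationships:  {parameter_name: (fitted_fn, correlation)} from the CDB
    """
    num = den = 0.0
    for name, value in workloads.items():
        fitted_fn, corr = relationships[name]
        num += corr * fitted_fn(value)   # weight each estimate by its curve-fit correlation
        den += corr
    return num / den if den else 0.0

# Made-up linear relationships for two hypothetical Ingest workload parameters
relationships = {
    "volume_per_year":   (lambda v: 0.5 + 0.002 * v, 0.9),
    "products_per_year": (lambda n: 1.0 + 0.0001 * n, 0.6),
}
estimated_fte = estimate_effort(
    {"volume_per_year": 1000, "products_per_year": 20000}, relationships)
```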
Cost Estimating Approach
• Method is Cost Estimation by Analogy – the data activities in the CDB are assumed to be analogs for a new data activity to be estimated.
• Year by year staff effort for the new data activity is estimated from the mission and expected year by year workload (using “effort estimating relationships” – see next chart), then the user’s projected local labor rates are applied to produce estimates of staff costs.
• Estimating of effort is done function by function, so the CDB comparison is with data for separate functions rather than with whole data activities.
• Non-staff items are currently based on CDB history, using inflation normalization, parametric approaches, ‘cost curves’, etc., for projections.
CET’s Effort Estimating Process
• Compute averages of annual workload parameters and staffing levels for each functional area for each CDB activity.
• Compute “Effort Estimating Relationships”, i.e. equations for FTE as a function of workload parameters
– Using regression-based curve fitting (see next chart) for operating functions (ingest, processing, archive, distribution) and for implementation and sustaining engineering, system purchase cost (normalized to base year then projected);
– Using a ‘base plus delta’ approach for other non-operating functions – CDB averages as base, delta based on comparison of new activity LOS’s with CDB averages.
• Compute year by year staffing for the functional area for the new activity by
– Using the equations to compute a set of FTE estimates, each based on a specific workload parameter,
– Computing a weighted average for each functional area’s staffing categories, weighted by curve fit correlation for each workload parameter,
– Using the applicable Level of Service parameter(s) to bump up the estimate if the new activity’s LOS is higher than the CDB average, or decrease it if lower.
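The final LOS bump-up/down step might look like the following sketch; the linear form and the `sensitivity` coefficient are assumptions for illustration, not the CET's calibrated adjustment:

```python
def apply_los_adjustment(fte_estimate, new_los, cdb_avg_los, sensitivity=0.1):
    """Scale an FTE estimate up when the new activity's level of service (LOS)
    exceeds the CDB average, down when it is lower. The linear form and the
    sensitivity coefficient are illustrative assumptions."""
    return fte_estimate * (1.0 + sensitivity * (new_los - cdb_avg_los))

higher = apply_los_adjustment(10.0, new_los=4, cdb_avg_los=3)  # bumped up
lower = apply_los_adjustment(10.0, new_los=2, cdb_avg_los=3)   # decreased
```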
Regression-based Curve Fitting - Detail
Curve Fitting is used to develop a relationship between workload parameters and
FTE for the CDB data activities.
1. CET takes a set of data, i.e. values for a workload parameter and corresponding operational or technical FTE values, and performs “cluster” outlier screening.
2. CET computes a set of eight curves, using regression: linear, quadratic, exponential, logarithmic, power, root, linear-exponential, linear-logarithmic.
3. CET eliminates those curves which drive estimated FTE negative or introduce double values (i.e. two FTE estimates for one workload value).
4. CET computes the Pearson correlation coefficient for each remaining curve.
5. CET checks all remaining curves for outliers – points whose departure from the curve exceeds a threshold multiple of the standard deviation – and eliminates the “worst” outlier point.
6. CET re-computes the curves without the outlier, making sure each curve’s correlation is not worse.
7. CET repeats steps 5 and 6 until outliers are gone or the outlier toss limit is reached.
8. CET selects the re-computed curve with the best correlation value.
9. CET uses a limited linear projection if ADS workload exceeds the CDB range.
10. CET uses the final curve’s equation/coefficients to make year by year estimates of FTE.
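A much-simplified sketch of the steps above, using only the linear curve family and skipping the cluster screening and the correlation-not-worse check, so it shows the iterative outlier-removal idea rather than the CET's actual eight-curve implementation:

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def fit_linear(xs, ys):
    """Least-squares y = a + b*x; returns (predict_fn, correlation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
         sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    predict = lambda x: a + b * x
    return predict, pearson([predict(x) for x in xs], ys)

def fit_with_outlier_removal(xs, ys, threshold=2.0, toss_limit=3):
    """Fit, then repeatedly drop the single worst point whose residual
    exceeds `threshold` standard deviations, re-fitting each time
    (steps 5-7 above, simplified)."""
    xs, ys = list(xs), list(ys)
    while True:
        predict, corr = fit_linear(xs, ys)
        residuals = [abs(y - predict(x)) for x, y in zip(xs, ys)]
        sd = statistics.pstdev(residuals)
        worst = max(range(len(xs)), key=lambda i: residuals[i])
        if toss_limit == 0 or sd == 0 or residuals[worst] <= threshold * sd:
            return predict, corr
        del xs[worst], ys[worst]   # toss the worst outlier and re-fit
        toss_limit -= 1

# One gross outlier (x=6, y=60) on an otherwise perfect y = 2x line
fit_fn, corr = fit_with_outlier_removal([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 60])
```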
Calibration / Tuning of the CET
• The premise of the cost estimation by analogy approach is that if the CET is calibrated against existing data activities, i.e. tuned to produce the best possible overall results for the known existing data activities, it will produce a good life cycle cost estimate for a new data activity.
• The CDB contains information for twenty-nine data activities that can be used as test subjects, since CDB information includes mission information, workload, staffing, etc.
• Calibration / tuning is accomplished by adjusting CET controls: parameter weighting, outlier removal limits, LOS adjustment coefficients, until the ‘best’ overall performance for the set of CDB data activities is achieved.
• The accuracy of the CET for existing CDB data activities is measured by independent testing…
Independent Testing Process
• The data activity used as a test subject is not allowed to influence its own
test results.
• In the independent testing process
– The objective is to measure the error of the estimate of a data activity’s staffing
profile (estimate based on its mission and workload).
– A CDB data activity is selected to be a test subject.
– An Activity Data Set is prepared for the data activity, which contains the mission
and workload information a CET user would enter.
– That activity is removed from the CDB.
– The CET reads the Activity Data Set, accesses the CDB, and produces an
estimate for the data activity.
– The estimated staffing profile is compared with the actual staffing profile to
determine the error, function by function and for the activity as a whole.
– The process is repeated for the set of CDB data activities.
– When all activities have been processed, overall errors across the data activities
are computed: e.g. overall average absolute error and percentage, and overall
bias.
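The leave-one-out process above can be sketched as follows; the `estimator` interface and the toy CDB records are assumptions for illustration, not the CET's actual data model:

```python
def independent_test(cdb, estimator):
    """Leave-one-out test over the CDB: each activity is estimated from the
    *other* activities, then compared with its actual staffing.
    `estimator(rest, activity)` returning an average annual FTE is an
    assumed interface for this sketch."""
    errors = []
    for i, activity in enumerate(cdb):
        rest = cdb[:i] + cdb[i + 1:]        # remove the test subject from the CDB
        estimate = estimator(rest, activity)
        errors.append(estimate - activity["actual_fte"])
    n = len(errors)
    return {
        "avg_abs_error": sum(abs(e) for e in errors) / n,  # errors don't cancel
        "bias": sum(errors) / n,                           # signed average
    }

# Toy CDB and a naive estimator (average of the remaining activities)
toy_cdb = [{"actual_fte": 10.0}, {"actual_fte": 12.0}, {"actual_fte": 14.0}]
naive = lambda rest, act: sum(a["actual_fte"] for a in rest) / len(rest)
results = independent_test(toy_cdb, naive)
```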
Independent Testing Results
• Results are based on testing with 28 CDB sites
• Test Results for the September 2006 version 2.1 of the CET
– The typical annual error of estimate is 2.46 FTE (average absolute error, so
positive and negative errors don’t cancel). The average typical error % of actual
is 22.9%.
– The overall annual average error across the 29 sites is –0.03 FTE, which is
–0.3%, showing very little overall bias.
– For the individual estimates for the 29 data activities:
13 have errors less than 20%, 18 have errors less than 30%; 21 have errors less
than 50%; and overall smaller activities have greater errors (see next chart).
– For the CDB activities, the average standard deviation of FTE for a function,
weighted by the number of activities having the function, is 2.57. This is a
rough measure of the variability of the information in the CDB.
– The standard deviation of the typical error for the Version 2.1 CET, 1.66, is well
within the range of variability of the information in the CDB.
Independent Testing Results, Continued
[Scatter plot: Actual Staff Size, FTE (averaged over activity life), 0 to 30, vs. ATE (Average Typical Error) percentage, 0% to 300%.]
If the actual size of an activity was 10 FTE or greater, 14 out of 15, or 93%, had an ATE of less than 30%.
If the actual size of an activity was less than 10 FTE, 4 out of 13, or 31%, had an ATE of less than 30%.
Progress with CET Independent Testing Performance
Improving Average Typical Error (ATE) for CETs:

CET Version                   ATE (FTE)
Working Prototype, May 2003   6.08
IOC, September 2003           4.89
Beta Test, May 2004           3.29
Version 1, Sept 2004          2.78
Version 2, Oct 2005           2.47
Version 2.1, Sept 2006        2.46