INFO 636 Software Engineering Process I Prof. Glenn Booker Weeks 4-5 – Estimating Software Size INFO636 Weeks 4-5 www.ischool.drexel.edu.


INFO 636
Software Engineering Process I
Prof. Glenn Booker
Weeks 4-5 – Estimating
Software Size
INFO636 Weeks 4-5
1
www.ischool.drexel.edu
Why Plan?
• As emphasized earlier, we need a good
estimate of the amount of work to be
performed, in order to predict effort and
time accurately (per Boehm)
• Estimation is one of the most challenging
aspects of managing software
development, hence our substantial focus
on it here
Estimation Example
• Other fields have well established
formulas for estimating work
– Construction knows the cost per square foot
of various types of construction
– More complex projects look at the linear
amount of walls, and the areas of various
parts (walls, ceilings, etc.) to develop
good estimates
Size Estimation Process
• The framework, or process, for planning a
project was covered last lecture
– Define system requirements
– Product conceptual design
– Estimate product size
– Estimate resources and schedule
– Develop the product
– Refine basis for later estimates
Estimation Tools
• Most software estimation tools have been
calibrated to use software size
as an input, and produce effort and
schedule as outputs
– COCOMO, SLIM, PRICE S, and McConnell’s
tables in Rapid Development
– Often start at fairly large project sizes,
e.g. 10,000 LOC and up
Estimation Tools
• We need a basis for estimation which
works for an individual (programmer)
• Most organizations use either no
estimation methods, or use terribly
unreliable ones
– 100% error is far too common
Desired Estimation Goals
• Criteria for a good estimation
method include:
– Use structured and trainable methods
– Should apply to both development and
maintenance
– Should be able to handle all aspects
of development, not just code
Desired Estimation Goals
– It should be suitable for statistical analysis
– It should be adaptable to future types of work
– It should be possible to judge the accuracy of
your work (and hence refine the model)
• We’ll briefly cover four estimation
methods, then explain the proxy-based
PROBE approach
Estimation Methods
• Wideband-Delphi Method
• Fuzzy Logic Method
• Standard Component Method
• Function Point Method
• Proxy-based Estimating
Wideband-Delphi Method
• This method was developed by the Rand
Corporation
• It uses several people to estimate the
same task, then applies a Delphi method
to get a consensus estimate
• The process is:
– Discuss the problem
Wideband-Delphi Method
– Get anonymous estimates, and hand them to
a moderator
– Find the median estimate, and show everyone
the set of estimates
– Discuss the results, to uncover different views
of the project scope
• Repeat the process until estimates
converge to within a predefined range
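The moderator's part of each round can be sketched in a few lines. This is only an illustration: the function name `delphi_round`, the 25% tolerance, and the rule "spread divided by median" are assumptions, since the slides say only that estimates must converge to within a predefined range.

```python
from statistics import median


def delphi_round(estimates, tolerance=0.25):
    """Summarize one round of anonymous estimates.

    Returns (median estimate, relative spread, converged flag).
    Treating a round as converged when (max - min) / median <= tolerance
    is an assumed rule, not one given in the slides.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    mid = median(estimates)
    spread = (max(estimates) - min(estimates)) / mid
    return mid, spread, spread <= tolerance


# Hypothetical LOC estimates from three estimators over two rounds.
print(delphi_round([400, 500, 650]))   # wide disagreement, discuss and repeat
print(delphi_round([480, 500, 520]))   # estimates have drawn together
```

The discussion step between rounds is the part that matters; the code only tells the moderator when to stop.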
Fuzzy Logic Method
• This approach uses historic data to arrive
at some meaningful estimates based on
qualitative descriptions
– Size categories such as Very Small, Small,
Medium, Large, and Very Large
• How data are divided into these categories
depends on the type of data
Fuzzy Logic Method
• Data with a small range (say, a factor of
five from very small to very large) can use
linear divisions
• Data with a large range can use a base 10
logarithmic division (as shown in the text)
Fuzzy Logic Method
• Linear division breaks up sizes into evenly
divided pieces
• Here’s an example for the N track
– If your work to read the text involves chapters
from 23 to 75 pages long (I made those
numbers up), then the range of sizes is
75-23=52 pages
– Space the five category midpoints evenly
across that range, which takes four equal
steps: 52/4 = 13 pages
Fuzzy Logic Method
– The midpoints of each size are just the lowest
size, then add the 13 four times
• Very Small midpoint = 23 pages
• Small midpoint = 23+13=36 pages
• Medium midpoint = 23+13*2=49 pages
• Large midpoint = 23+13*3=62 pages
• Very Large midpoint = 23+13*4=75 pages (which
equals the largest chapter size)
Fuzzy Logic Method
– Use half of 13, or 6.5, to find the ranges for
each size
• Very Small range is up to 23+6.5=29.5 pages
• Small range is 29.5 to 36+6.5=42.5 pages
• Medium range is 42.5 to 49+6.5=55.5 pages
• Large range is 55.5 to 62+6.5=68.5 pages
• Very Large range is 68.5 pages and up
– Notice that each interior category (Small,
Medium, Large) spans 13 pages, since we
have linear divisions
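The linear division can be sketched as a small function. The name `linear_size_ranges` and the use of `None` for the open-ended bounds are choices made for this illustration, not part of the text's method.

```python
def linear_size_ranges(smallest, largest,
                       labels=("Very Small", "Small", "Medium",
                               "Large", "Very Large")):
    """Return (label, midpoint, lower, upper) for evenly spaced categories."""
    # Five midpoints need four equal steps: (75 - 23) / 4 = 13 in the example.
    step = (largest - smallest) / (len(labels) - 1)
    half = step / 2
    rows = []
    for i, label in enumerate(labels):
        mid = smallest + i * step
        lower = None if i == 0 else mid - half                # no lower bound
        upper = None if i == len(labels) - 1 else mid + half  # no upper bound
        rows.append((label, mid, lower, upper))
    return rows


# The chapter-length example from the slides: 23 to 75 pages.
for row in linear_size_ranges(23, 75):
    print(row)
```

With your own data, pass the smallest and largest observed sizes; fewer labels (say three) work the same way.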
Fuzzy Logic Method
• The logarithmic version is messier, since
we have to
– Convert the sizes to their base-10 logarithms
– Follow the linear approach using the
logarithms
– Raise 10 to the power of each result to
convert it back to the original units
Fuzzy Logic Method
– The example in the book has LOC ranging
from 173 to 10,341 LOC
• The log10 of 173 is 2.238
• The log10 of 10,341 is 4.014
– The difference is 4.014 – 2.238 = 1.776
– Divide the difference by four to get the interval
1.776/4=0.444
– Find the midpoints the same way as in the
linear page-count example
Fuzzy Logic Method
– The midpoints of each size are just the lowest
log value, then add the 0.444 four times
• Very Small midpoint = 2.238
• Small midpoint = 2.238 + 0.444 = 2.682
• Medium midpoint = 2.238 + 0.444*2 = 3.126
• Large midpoint = 2.238 + 0.444*3 = 3.570
• Very Large midpoint = 2.238 + 0.444*4 = 4.014
(the log10 of the largest code size)
– Find the ranges of each size category the same
way as in the linear page-count example
Fuzzy Logic Method
– Use half of 0.444, or 0.222, to find the ranges
for the first size (then just keep adding 0.444
to each range boundary)
• Very Small range is up to 2.238+0.222=2.460
• Small range is 2.460 to 2.460+0.444=2.904
• Medium range is 2.904 to 2.904+0.444=3.348
• Large range is 3.348 to 3.348+0.444=3.792
• Very Large range is 3.792 and up
Fuzzy Logic Method
– Now take 10 to the power of the logarithms to
find the actual LOC
• Very Small range is up to 10^2.460=288 LOC
• Small range is 288 to 10^2.904=802 LOC
• Medium range is 802 to 10^3.348=2228 LOC
• Large range is 2228 to 10^3.792=6194 LOC
• Very Large range is 6194 LOC and up
– This is the basis for the poorly labeled table at
the bottom of page 104 in the text
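The three logarithmic steps (convert to log10, divide linearly, convert back) can be sketched as follows. The function name `log_size_ranges` is made up for this example. Because the code keeps full precision while the slides round each logarithm to three decimals, the boundaries it prints can differ slightly from the slide values.

```python
import math


def log_size_ranges(smallest, largest, n_categories=5):
    """Return (midpoints, boundaries) in original units, spaced evenly in log10."""
    lo, hi = math.log10(smallest), math.log10(largest)
    step = (hi - lo) / (n_categories - 1)      # about 0.444 for 173..10,341
    # Midpoints sit at whole steps; boundaries sit half a step above each
    # midpoint except the last, which is open-ended.
    midpoints = [10 ** (lo + i * step) for i in range(n_categories)]
    boundaries = [10 ** (lo + (i + 0.5) * step)
                  for i in range(n_categories - 1)]
    return midpoints, boundaries


# The LOC example from the text: 173 to 10,341 LOC.
mids, bounds = log_size_ranges(173, 10341)
print([round(m) for m in mids])
print([round(b) for b in bounds])
```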
Fuzzy Logic Method
• An aside…Tables 5.2 in the text divide
each of the five basic categories (Very
Small, etc.) into five more “subranges”
– This follows the same approach, just adding
more detail to each category
– It’s unlikely you’ll have enough data to worry
about subranges
Standard Component Method
• The Standard Component Method, by
Putnam, assumes you have a substantial
database from which to make your
estimates
– Make a realistic estimate of how many
screens you think will be in your system
– Estimate the lowest and highest possible
numbers of screens you could imagine
will be in your system
Standard Component Method
– For actual estimation, use
n = (lowest number + highest number +
4*realistic number)/6
– The idea is to try to account for possible error
in your estimate
• Repeat this process for each type of
component in your system
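As a quick worked case of the weighted formula, here is a sketch with hypothetical screen counts; the function name and the numbers 8, 12, and 22 are invented for illustration.

```python
def standard_component_estimate(lowest, realistic, highest):
    """Weighted estimate n = (lowest + highest + 4 * realistic) / 6."""
    if not lowest <= realistic <= highest:
        raise ValueError("expected lowest <= realistic <= highest")
    return (lowest + highest + 4 * realistic) / 6


# Hypothetical guess for the number of screens: at least 8, most likely 12,
# at most 22. The long upper tail pulls the estimate above the realistic value.
print(standard_component_estimate(8, 12, 22))
```

The realistic value carries four times the weight of either extreme, so a lopsided range shifts the estimate only modestly.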
Function Point Method
• The function point approach uses “function
points” as a proxy for the complexity of the
system, independent of the programming
language used
Function Point Method
– Each input or output function, interface, file,
and inquiry is rated on a fixed complexity
scale of low, average, or high (not shown in
the Humphrey text), and assigned some number
of function points
– The total number of function points is adjusted
for 14 “influence” factors, such as performance
requirements, transaction rate, and
reusability
Function Point Method
• While a great language-independent
method for judging the complexity of a
program, it isn’t as reliable for estimating
development effort
– See IFPUG for more details
Proxy-based Estimating
• We are trying to predict the final size of a
software product
• Measuring or estimating that directly is
tricky at best, so we use proxies to help
get there
– A proxy is an intermediate concept or
substitute for what we really want to predict
Proxy-based Estimating
• The overall process is like this
– We want to take the conceptual design, and
break it into parts which correspond
to the proxies available
– Estimate each part of the system, based
on the proxies
– Add them up to get the overall product size
Choosing a Proxy
• The proxy size should correspond to the
development effort size
• Proxy content should be countable and
easy to visualize
• Proxy must be customizable
• The proxy should be sensitive to the same
factors which affect development
Possible Proxies
• In a manner similar to function points, any
characteristic of the system could serve
as a proxy
– Input screens, output reports, data files
– Objects or classes
• The fuzzy logic and function point
concepts are essentially blended to
produce the PROBE approach
PROBE Method
• PROxy-Based Estimation (PROBE) uses
objects as proxies
– See also Appendix C, Tables C36 and C40
• First choose appropriate proxy categories
(e.g. Table 5.7, p. 117)
– For code, object types such as calculation,
data, I/O, control, and print might be
suitable proxy categories
– Reading, discussion, homework,… (N track)
PROBE Method
• Choose reasonable size options for
the proxies
– For class, you might only have enough data
for three sizes instead of five
• Analyze your historic data to determine
approximate sizes (LOC) for each proxy
– For N track, use the amount of effort needed
instead of LOC
PROBE Method
• Now start using your method for a given
assignment
– Develop a conceptual design for the solution
– Use your proxies to estimate the amount
of code or effort needed to develop them
– The example on page 120 is the first use
of form C39 (p. 683)
A Course Note
• P track students will use the estimating
method pretty much as written in the text
– Our forms are slightly different
• N track students will develop their own
proxies to correspond to their weekly
activities, and create a custom form N39 to
follow a similar process
PROBE Method
• The BASE PROGRAM section of C39 is
a summary of the expected changes
to the preexisting code
– Base Size (B) is the amount of code already
present
– LOC Deleted (D) is how much existing
code you plan to remove
– LOC Modified (M) is how much existing code
you expect to change
PROBE Method
• The PROJECTED LOC section contains:
– Base Additions (BA) are planned additions to
existing code (new lines within existing
modules)
– New Objects (NO) are new modules or
classes which will need to be implemented
• Your proxy structure is used to describe
the Type, Methods, and Relative Size of
the changes to BA and NO
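One way to turn Type, Methods, and Relative Size into LOC is to multiply the number of methods by a historic LOC-per-method figure for that type and size. The sketch below assumes that approach; the table values, the names `LOC_PER_METHOD`, `object_loc`, and `projected_loc`, and the sample objects are all hypothetical, and real figures would come from your own history.

```python
# Hypothetical LOC-per-method figures keyed by (type, relative size).
# These numbers are invented; derive real ones from your own historic data.
LOC_PER_METHOD = {
    ("Calculation", "Small"): 6.0,
    ("Calculation", "Medium"): 11.0,
    ("I/O", "Medium"): 9.0,
    ("Data", "Large"): 14.0,
}


def object_loc(obj_type, methods, relative_size):
    """Estimated LOC for one object: methods times LOC per method."""
    return methods * LOC_PER_METHOD[(obj_type, relative_size)]


def projected_loc(base_additions, new_objects):
    """Sum the Base Additions (BA) and New Objects (NO) estimates."""
    ba = sum(object_loc(*obj) for obj in base_additions)
    no = sum(object_loc(*obj) for obj in new_objects)
    return ba + no


# Each entry is (type, number of methods, relative size).
base_additions = [("Calculation", 2, "Small")]
new_objects = [("I/O", 3, "Medium"), ("Data", 1, "Large")]
print(projected_loc(base_additions, new_objects))
```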
PROBE Method
• The REUSED OBJECTS (R) section of
C39 is used to describe
– Code you’ll reuse from another preexisting
source
– Code you’ll create during this assignment
which will be reusable
• These tend to be rare during the course
PROBE Method
• Now comes the number crunching part
– The Projected LOC (P) is the total amount of
new development for this assignment;
P = BA + NO
– The terms b0 (hereafter beta0) and b1 (beta1)
are linear regression parameters from your
work history
– By now you have a history of planned LOC or
effort, and of the actual values
PROBE Method
• What are beta0 and beta1?
• The classic equation for a line is
y = mx + b
– ‘m’ is the slope, which corresponds
to beta1
– ‘b’ is the y-intercept, which is beta0
• Here the ‘x’ axis is the planned LOC or
effort, and the ‘y’ axis has actual values
PROBE Method
[Figure: scatter plot of data points from weekly assignments, Planned LOC (X) versus Actual LOC (Y), with the fitted linear regression line; Beta1 is the slope of the line and Beta0 is its y-intercept]
PROBE Method
• See “regression” handout for an example
of calculating beta0 and beta1
– Note that $\sum x_i^2$ means $\sum (x_i^2)$, not $(\sum x_i)^2$
• When you use this, make sure the
formulas are correct
– ‘n’ changes each week as new data is created
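The least-squares calculation can be sketched as below. It uses the standard textbook formulas for simple linear regression; the function name `linear_regression` and the sample data are made up, and the handout may lay out the arithmetic differently.

```python
def linear_regression(x, y):
    """Return (beta0, beta1) for the least-squares line y = beta0 + beta1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_avg = sum(x) / n
    y_avg = sum(y) / n
    # Note sum(xi * xi): the sum of the squares, not the square of the sum.
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_avg * y_avg
    sxx = sum(xi * xi for xi in x) - n * x_avg * x_avg
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    beta1 = sxy / sxx
    beta0 = y_avg - beta1 * x_avg
    return beta0, beta1


# Hypothetical history: planned LOC (x) and actual LOC (y) per assignment.
planned = [100, 200, 300]
actual = [100, 200, 300]      # perfect estimates
print(linear_regression(planned, actual))
```

When every actual value equals its planned value, the fitted line is y = x, which is a line with slope 1 and intercept 0.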
PROBE Method
• Incidentally, if your estimates were always
perfect, you’d have beta1 = 1 and beta0 =
0 (why?)
• Once you have beta0 and beta1, find:
– New and Changed LOC (N) = beta0 +
beta1*(P + M)
– It’s critical to note that later calculations
for prediction interval use ‘N’, not ‘P’
PROBE Method
• The expected size of the application after
this project is
– Total LOC (T) = N + B - D – M + R
• The Total New Reused is the sum of the code
flagged (with a *) in the New Objects
section that is being reused
– Don’t need to use this very often
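The two size formulas are simple enough to check by hand, but a short sketch shows how the pieces fit. The function names and every input value here are hypothetical.

```python
def new_and_changed_loc(beta0, beta1, projected, modified):
    """N = beta0 + beta1 * (P + M)."""
    return beta0 + beta1 * (projected + modified)


def total_loc(n, base, deleted, modified, reused):
    """T = N + B - D - M + R."""
    return n + base - deleted - modified + reused


# Hypothetical inputs: beta0 = 20, beta1 = 1.25, P = 150, M = 30,
# B = 500, D = 40, R = 0.
n = new_and_changed_loc(20, 1.25, projected=150, modified=30)
t = total_loc(n, base=500, deleted=40, modified=30, reused=0)
print(n, t)
```

M is subtracted in T because modified lines are already counted inside both B and N.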
PROBE Method
• Then we get to the Range calculation
• We have a refined estimate of the size of
the system, but want to establish a
prediction interval in which the real
outcome is likely to fall
– See the PSP_Calculation_Example.xls
spreadsheet
PROBE Method
• To find the Range, we start with a
parameter from the ‘t’ distribution
• It is written ‘t(α/2, n-2)’ where
– α/2 is the width of the prediction interval,
generally 70% or 90%
– ‘n-2’ is the number of degrees of freedom;
again, ‘n’ is the number of data pairs
– In Excel, use TINV(1 - α/2, n - 2); for example,
TINV(0.30, n - 2) for a 70% interval
PROBE Method
• Next we need the standard deviation, s
– That’s why column G adds up
$(Y_i - b_0 - b_1 X_i)^2$, the squared residual for each data pair
– $s = \sqrt{\sum_i (Y_i - b_0 - b_1 X_i)^2 / (n-2)}$
• Now there’s a new term, $x_k$ (xk)
– xk = P + M
– This is the same term used in the N formula –
the projected and modified LOC
PROBE Method
• Now use this to plug into formula 5.3 on
page 124
– I’m not going to copy it here
– Notice in the spreadsheet the column H
calculation of (Xi - Xavg)**2
which is also used to find the Range
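For orientation, the Range has the form of the standard prediction interval for simple linear regression. The block below is written from that general result using the symbols on these slides, not copied from the text, so confirm it against formula 5.3 before relying on it. Here $x_{avg}$ is the mean of the historic $x_i$ values, and the sum in the denominator is the column H total.

```latex
\[
\text{Range} = t(\alpha/2,\, n-2)\; s\;
\sqrt{\,1 + \frac{1}{n} + \frac{(x_k - x_{avg})^2}{\sum_{i=1}^{n} (x_i - x_{avg})^2}\,}
\]
```

The last term under the square root grows as $x_k$ moves away from the centre of your historic data, so the Range widens when you estimate a job unlike those you have done before.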
PROBE Method
• Finally, find the Upper and Lower
Prediction Intervals (UPI and LPI)
– UPI = N + Range
– LPI = N – Range
• The Prediction Interval Percent is either
70% or 90%, the value used to find ‘t’
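The whole chain from history to UPI and LPI can be sketched as follows. The Range expression used is the standard regression prediction-interval form, which is an assumption here and should be checked against the text. The t value is passed in by the caller (from Excel or a table), and the function name and sample data are hypothetical.

```python
import math


def prediction_interval(x, y, beta0, beta1, x_k, t_value):
    """Return (LPI, UPI, Range) for the estimate N = beta0 + beta1 * x_k.

    x, y    : historic planned and actual values
    x_k     : P + M for the new assignment
    t_value : t-distribution value for the chosen interval and n - 2
              degrees of freedom, looked up by the caller
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs")
    x_avg = sum(x) / n
    # Standard deviation of the residuals about the regression line.
    residual_ss = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(residual_ss / (n - 2))
    spread = sum((xi - x_avg) ** 2 for xi in x)
    rng = t_value * s * math.sqrt(1 + 1 / n + (x_k - x_avg) ** 2 / spread)
    estimate = beta0 + beta1 * x_k
    return estimate - rng, estimate + rng, rng


# Hypothetical history whose least-squares line is y = 1 + 0.5 * x.
lpi, upi, rng = prediction_interval([1, 2, 3], [1, 3, 2],
                                    beta0=1.0, beta1=0.5,
                                    x_k=2, t_value=1.0)
print(lpi, upi, rng)
```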
PROBE Method
• If Range is comparable to N in magnitude
– Choose a Prediction Interval Percent of 70%
to keep Range smaller, and/or
– Look for data fliers (outliers), which can have
a strong influence on sigma (s)
• E.g. data points with a relatively large value of
$(Y_i - b_0 - b_1 X_i)^2$
Object Size Ranges
• The fuzzy logic method (covered earlier)
summarizes the two most likely
approaches for defining size ranges based
on your historic data
– A linear approach, generally best if the range
of the data is well under a factor of 10
– A logarithmic approach for wider range data
Object Size Ranges
• If your work is following a true normal
distribution, then your objects should have
– 6.68% each in Very Small and Very Large
categories
– 24.17% each in Small and Large categories
– 38.30% in the Medium category
• It’s good to see if this holds
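These percentages can be reproduced from the standard normal distribution if the category boundaries sit at 0.5 and 1.5 standard deviations either side of the mean. That placement is inferred from the numbers rather than stated on the slide, and the function name `category_shares` is made up for this sketch.

```python
from statistics import NormalDist


def category_shares(inner=0.5, outer=1.5):
    """Share of a normal population in each size category.

    Assumes boundaries at +/- inner and +/- outer standard deviations.
    Returns (very small or very large, small or large, medium).
    """
    z = NormalDist()                         # mean 0, standard deviation 1
    tail = 1 - z.cdf(outer)                  # Very Small, or Very Large
    side = z.cdf(outer) - z.cdf(inner)       # Small, or Large
    middle = z.cdf(inner) - z.cdf(-inner)    # Medium
    return tail, side, middle


tail, side, middle = category_shares()
print(f"{tail:.2%} {side:.2%} {middle:.2%}")
```

To compare with your own work, count how many of your historic objects fall in each category and divide by the total.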
Object Size Ranges
• If your object size distribution is really
skewed, you could
– Reconsider the size categories
– Look for better proxies
– See if your design approach is leaning toward
very large or very small objects, or very
inconsistent object sizes
N Track Notes
• You’ll use most of the preceding
discussion
– You’ll have different proxies instead of
the Base Program, Projected LOC, and
Reused Objects sections
– You’ll have some equivalent of ‘P’ and ‘N’,
and still find beta0, beta1, and Range
• Your P and N will measure time instead of LOC
– You’ll still find prediction intervals UPI, LPI
Improving Estimation
• For a large task, we tend to estimate many
small pieces rather than the whole
– The estimation errors tend to cancel each
other somewhat
• The PSP allows you to know what your
estimation errors have been, and hence
improve later estimates
– Though that’s hard to see during the term
Improving Estimation
• As you follow this consistently, your values
for beta0 and beta1 will tend to stabilize
– Then you don’t have to keep recalculating
them!
– If you get really weird beta0 and beta1, or
have no history yet, look at other options for
refining your estimate, on page 679 (Table
C35)
Improving Estimation
• On large projects, look for a consistent,
and fairly low, level of abstraction
– The conceptual design might need to be
refined to provide enough detail for a
good estimate
– If a single object performs the work of many
kinds of proxies, then it probably needs to be
broken down
Improving Estimation
• Estimating products which have no
precedent is really tough
– Make sure the level of uncertainty is
clear to your customer
• Avoid overcompensating for your own
history of errors
– Make small changes in your approach
and try them for a while
