Transcript
Software in Practice
a series of four lectures on why software projects fail, and what you can do about it - with particular emphasis on safety-critical systems
Martyn Thomas
Founder: Praxis High Integrity Systems Ltd Visiting Professor of Software Engineering, Oxford University Computing Laboratory
Lecture 1:
What is the problem with software?
The state of practice
Scale
Complexity
What does testing tell us?
When I started in 1969 ...
IBM 360/65 Computing service for 1000s of users.
Now I have more computing power in my ‘phone.
The Software Crisis
First digital computer: Manchester, 1948
First commercial computer: LEO, 1951
We are still in the very early stages of software engineering ...
… like studying civil engineering when Archimedes was still alive!
NATO Software Engineering conferences in 1968 and 1969 to address the growing crisis in software dependability.
1972 Turing Award Lecture
“The vision is that, well before the 1970s have run to completion, we shall be able to design and implement the kind of systems that are now straining our programming ability at the expense of only a few percent in man-years of what they cost us now, and that besides that, these systems will be virtually free of bugs.”
E W Dijkstra
Software in the 21st Century
Fifty years on, yet still at the beginning.
We are planning drive-by-wire cars, guiding themselves on intelligent roads.
We are dreaming if we believe we can build such real-world systems safely, with today’s attitudes to software engineering.
We have still not achieved Dijkstra’s vision of thirty years ago!
Thirty years later…
… most computing system projects fail
Project cancellation
Major cost or time overrun
Much less functionality than planned
Inadequate security
Major usability problems
Excessive maintenance / upgrade costs
Serious in-service failure
I’ll talk about some specific failures in later lectures
most software projects fail
Cancelled before delivery: 31%
Exceeded timescales & costs, or greatly reduced functionality: 53%
On time and budget: 16%
Mean time overrun: 190%
Mean cost overrun: 222%
Mean functionality delivered: 60%
Large companies did much worse than smaller ones; more recent figures are better, but still poor.
Source: The Chaos Report (1995), http://www.standishgroup.com
most computing projects fail
Of 1027 projects, 130 (12.7%) succeeded. Of those 130:
2.3% were development projects
18.2% were maintenance projects
79.5% were data-conversion projects
Of the 500+ development projects in the sample, 3 (0.6%) succeeded.
Source: BCS Review 2001 page 62.
Why does it happen?
Because:
scale matters. Small processes don’t scale up
process matters. Most developers lack discipline
rigour matters. Most developers are afraid of mathematics
engineering is conservative, whereas the software industry is ruled by fashion (CAA licensing system; C vs Ada at Lockheed Martin; eXtreme this, Agile that ...)
Who can make things better? You!
Scale
How many valid paths are there through a 200-line module?
We have found around 750,000.
How big are modern systems?
Windows is ~100M LoC; Oracle talk about a “gigaLoC” code base.
How many paths is that? How many do you think they have tested?
What proportion will ever be executed?
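The path explosion behind these questions can be made concrete with a small dynamic program over a control-flow graph. This is a hypothetical illustration (the graph shape, node names and `count_paths` helper are invented for the sketch, not the 200-line module from the lecture):

```python
from functools import lru_cache

def count_paths(cfg, entry, exit_node):
    """Count distinct entry->exit paths in an acyclic control-flow graph.

    cfg: dict mapping each node to a list of its successor nodes.
    """
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == exit_node:
            return 1
        # Paths from a node = sum of paths from each successor.
        return sum(paths_from(succ) for succ in cfg.get(node, ()))

    return paths_from(entry)

# Build a toy module: 20 sequential, independent if/else branches.
# Each branch node b_i splits into t_i / f_i, which rejoin at b_{i+1}.
cfg = {}
for i in range(20):
    cfg[f"b{i}"] = [f"t{i}", f"f{i}"]
    nxt = f"b{i+1}" if i < 19 else "exit"
    cfg[f"t{i}"] = [nxt]
    cfg[f"f{i}"] = [nxt]

print(count_paths(cfg, "b0", "exit"))  # 2**20 = 1048576
```

Twenty independent branches in sequence already give over a million paths; a real 200-line module with loops and nested conditions grows far faster, which is why exhaustive path testing is hopeless at scale.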
A medium-scale system: En Route ATC at Swanwick
RS 6000 workstations
Control Room
Airspace
[Map: NERC sectorisation / equivalent LATCC sector names, published 20 June 2001; not for operational use]
A medium-sized system
114 controller workstations
20 supervisory/management positions
10 engineering positions
48-workstation simulator
2 × 15-workstation test systems
2.5 million lines of software
>500 processors
Operational data
1,667,381 flights in 2002.
Continuous operation, with one 3-hour failure (other flight delays were caused by NAS failures at West Drayton).
Challenges for the future
Current ATC safety depends on the controller’s ability to clear their sector with radio only.
Future traffic growth requires more than 10 aircraft on frequency; controllers would be overloaded.
So future ATC will depend on automatic systems, which must not fail.
Target? At least the avionics standard: 10^-8 pfh.
No current air traffic management systems are built to such standards.
This could be your job in 3 years’ time.
How can we be sure a system works?
Assurance: showing that a system works.
Much harder than just developing a system that works:
you need to generate evidence that it works.
What evidence is sufficient?
How safe or reliable is a system that has never failed?
What evidence does testing provide?
How can we do better?
How safe is a system that has never failed?
If it has run for n hours without failure, and if the operating conditions remain much the same, the best estimate for the probability of failure in the next n hours is 0.5.
To show that a system has a pfh of < 10^-4 with 50% confidence, we need about 14 months of fault-free testing (10,000 hours is 13.89 months).
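The arithmetic behind these figures can be sketched in a few lines, using the slide's rule of thumb that claiming pfh < p at 50% confidence takes roughly 1/p fault-free hours. (An exact exponential model would give ln 2 / p, about 6,900 hours here; the slide rounds to 1/p.) The function name and the 720 hours-per-month conversion are assumptions for illustration:

```python
# Rule of thumb from the slide: n failure-free hours support only a 50%
# estimate that the system survives the next n hours, so claiming
# pfh < p at ~50% confidence takes roughly 1/p hours of fault-free
# operation. (fault_free_hours_for and 720 h/month are illustrative.)

def fault_free_hours_for(pfh_bound):
    """Fault-free test hours needed to claim pfh < pfh_bound at ~50% confidence."""
    return 1.0 / pfh_bound

hours = fault_free_hours_for(1e-4)   # 10,000 hours
months = hours / 720                 # 30-day months: ~13.89
print(f"{hours:.0f} hours = {months:.2f} months")
```

Running the sketch reproduces the slide's figures: 10,000 hours, i.e. about 14 months of continuous fault-free operation, just for a modest 10^-4 bound at even odds.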
What evidence does testing provide?
“Testing shows the presence, not the absence, of bugs” - Dijkstra
We cannot test every path.
Testing individual operations or boundary conditions may find faults, but such tests provide no evidence of pfh.
Statistical testing, under operational conditions, provides evidence of pfh. But it takes a very long time.
Statistical testing
To show an MTBF of n hours, with 99% confidence, takes around 10n hours of testing with no faults found.
So avionics (10^-8 pfh) would need around 10^9 hours (>100,000 years).
With good prior evidence, e.g. from a strong process, a Bayesian approach may reduce this to ~10,000 years.
Actual testing is trivially short by comparison.
Summary
Developing reliable software is difficult because of the size and complexity of real-life systems.
The software industry is very young, amateurish and immature. Most significant projects overrun dramatically (and unnecessarily) or totally fail.
In future lectures, I will explore why some failures have occurred (Therac, Ariane, LAS, Taurus …) and talk about what you need to know if you are to become a professional amongst all these amateurs.