What is Software Quality?
Chapter 2
Pressman's definition of "Software Quality"
Conformance to explicitly stated functional and performance requirements, explicitly documented development standards, and implicit characteristics that are expected of all professionally developed software.
text page 25
IEEE Definition of "Software Quality"
1. The degree to which a system, component, or process meets specified requirements.
2. The degree to which a system, component, or process meets customer or user needs or expectations.
text page 24
IEEE Definition of "Software Quality Assurance"
1. A planned and systematic pattern of all actions necessary to provide adequate confidence that an item or product conforms to established technical requirements.
2. A set of activities designed to evaluate the process by which the products are developed or manufactured. Contrast with quality control.
CMM
"The Capability Maturity Model for Software
developed by the SEI is a framework that describes
the key elements of an effective software process.
The CMM describes an evolutionary improvement
path for software organizations from an ad hoc,
immature process to a mature, disciplined one."
"The CMM covers practices for planning,
engineering, and managing software development
and maintenance. When followed, these practices
improve the ability of organizations to meet goals for
cost, schedule, functionality, and product quality."
More Definitions

"software error"
"software fault"
"software failure"

types of errors
1. code error
2. procedure error
3. documentation error
4. software data error
text sections 2.1 and 2.2
Causes of software errors
1. faulty requirements definition
2. client-developer communication failures
3. deliberate deviations from software requirements
4. logical design errors
5. coding errors
6. non-compliance with documentation and coding instructions
7. shortcomings of the testing process
8. procedure errors
9. documentation errors
text section 2.3
Cost of Errors
"Software bugs, or errors, are so prevalent and so detrimental
that they cost the U.S. economy an estimated $59.5 billion
annually, or about 0.6 percent of the gross domestic product. …
Although all errors cannot be removed, more than a third of
these costs, or an estimated $22.2 billion, could be eliminated by
an improved testing infrastructure that enables earlier and more
effective identification and removal of software defects. These
are the savings associated with finding an increased percentage
(but not 100 percent) of errors closer to the development stages
in which they are introduced. Currently, over half of all errors are
not found until "downstream" in the development process or
during post-sale software use."
US Dept of Commerce
June 2002
Some Famous Software Errors
Therac-25
Patriot Missile System
NASA's Mars Polar Lander
ESA's Ariane 5 Launch System
2003 Blackout

many details stolen from
www.wikipedia.org
Therac-25 - the problem


When operating in soft X-ray mode, the machine was
designed to rotate three components into the path of the
electron beam, in order to shape and moderate the power
of the beam. …
The accidents occurred when the high-energy electron beam was activated without the target having been rotated
into place; the machine's software did not detect that this
had occurred, and did not therefore determine that the
patient was receiving a potentially lethal dose of radiation,
or prevent this from occurring.
Therac-25 - the reasons





The design did not have any hardware interlocks to prevent the electron beam from operating in its high-energy mode without the target in place.
The engineer had reused software from older models. These models
had hardware interlocks and were therefore not as vulnerable to the
software defects.
The hardware provided no way for the software to verify that sensors
were working correctly.
The equipment control task did not properly synchronize with the
operator interface task, so that race conditions occurred if the operator
changed the setup too quickly. This was evidently missed during testing,
since it took some practice before operators were able to work quickly
enough for the problem to occur.
The software set a flag variable by incrementing it. Occasionally an
arithmetic overflow occurred, causing the software to bypass safety
checks.
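The last reason above, a flag set by incrementing rather than by assignment, can be sketched as follows. This is a minimal illustration, not the Therac-25 code: the 8-bit width, the class name `SetupFlag`, and the function `safety_check_runs` are assumptions made for the example.

```python
# Sketch of a flag that is "set" by incrementing a small counter.
# Assumption: the flag is stored in 8 bits, so it wraps around at 256.
FLAG_BITS = 8


class SetupFlag:
    """Hypothetical flag; nonzero means 'setup not yet verified'."""

    def __init__(self) -> None:
        self.value = 0

    def mark_by_increment(self) -> None:
        # Risky: repeated marking eventually wraps the value back to zero.
        self.value = (self.value + 1) % (1 << FLAG_BITS)

    def mark_by_assignment(self) -> None:
        # Safer: the flag holds the same nonzero value however often it is set.
        self.value = 1


def safety_check_runs(flag: SetupFlag) -> bool:
    """The safety check is only performed while the flag is nonzero."""
    return flag.value != 0


def flag_after(marks: int, use_increment: bool) -> SetupFlag:
    """Mark the flag `marks` times using the chosen strategy."""
    flag = SetupFlag()
    for _ in range(marks):
        if use_increment:
            flag.mark_by_increment()
        else:
            flag.mark_by_assignment()
    return flag
```

With these assumptions, the 256th increment wraps the flag to zero and the check is skipped on that pass, while the assignment version keeps the check active no matter how many times the flag is set.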
Patriot Missile System


On February 25, 1991, the Patriot missile battery at Dhahran, Saudi
Arabia had been in operation for 100 hours, by which time the
system's internal clock had drifted by one third of a second. For a
target moving as fast as an inbound tactical ballistic missile (TBM),
this was equivalent to a position error of 600 meters.
The radar system had successfully detected the Scud and predicted
where to look for it next, but because of the time error, looked in the
wrong part of the sky and found no missile. With no missile found, the initial
detection was assumed to be a spurious track and was
removed from the system. No interception was attempted, and the
missile struck a barracks, killing 28 soldiers.
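The slide gives the drift and the position error but not the arithmetic behind them. The sketch below assumes the commonly cited explanation, which is not stated in the text above: the clock counted tenths of a second and converted them with a binary approximation of 0.1 truncated to 23 fractional bits. The bit width, the function names, and the missile speed of about 1,676 m/s are all assumptions for illustration.

```python
from fractions import Fraction

# Assumptions for illustration only (none of these come from the slide):
FRACTION_BITS = 23                    # fractional bits kept for the constant 0.1
TICKS_PER_SECOND = 10                 # the clock counts tenths of a second
ASSUMED_SCUD_SPEED_M_PER_S = 1676.0   # approximate closing speed


def truncated_tenth(bits: int = FRACTION_BITS) -> Fraction:
    """0.1 chopped (not rounded) to the given number of binary fraction bits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours: float, bits: int = FRACTION_BITS) -> float:
    """Accumulated timing error after running continuously for `hours`."""
    ticks = Fraction(hours) * 3600 * TICKS_PER_SECOND
    error_per_tick = Fraction(1, 10) - truncated_tenth(bits)
    return float(ticks * error_per_tick)


def position_error_m(hours: float,
                     speed: float = ASSUMED_SCUD_SPEED_M_PER_S) -> float:
    """Distance the target travels during the accumulated clock drift."""
    return clock_drift_seconds(hours) * speed
```

Under these assumptions, `clock_drift_seconds(100)` comes out at roughly a third of a second and `position_error_m(100)` at several hundred meters, the same order as the figures quoted above. The exact distance depends on the speed assumed.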
Mars Polar Lander



The last telemetry from Mars Polar Lander was sent just
prior to atmospheric entry on December 3, 1999. No further
signals have been received from the lander. The cause of
this loss of communication is unknown.
According to the investigation that followed, the most likely
cause of the failure of the mission was a software error that
mistakenly identified the vibration caused by the deployment
of the lander's legs as being caused by the vehicle touching
down on the Martian surface, resulting in the vehicle's
descent engines being cut off whilst it was still 40 meters
above the surface, rather than on touchdown as planned.
Another possible reason for failure was inadequate
preheating of catalyst beds for the pulsing rocket thrusters.
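The suspected software error can be sketched as a touchdown indication that is remembered from leg deployment and acted on later. This is an illustration under stated assumptions, not the lander's flight code: the latching behaviour, the `Sample` type, the function name, and the altitudes in the usage note (other than the 40 meters quoted above) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Sample:
    """One hypothetical sensor reading during descent."""
    altitude_m: float
    leg_sensor_fired: bool


def engine_cutoff_altitude(samples: Iterable[Sample],
                           enable_below_m: float = 40.0,
                           clear_latch_on_enable: bool = False) -> Optional[float]:
    """Return the altitude at which the engines would be cut off, if any.

    Assumption: a leg-sensor signal is latched as soon as it is seen, and the
    latch is only acted on once touchdown sensing is enabled.
    """
    latched = False
    enabled = False
    for sample in samples:
        if not enabled and sample.altitude_m <= enable_below_m:
            enabled = True
            if clear_latch_on_enable:
                latched = False   # discard anything seen before enabling
        if sample.leg_sensor_fired:
            latched = True
        if enabled and latched:
            return sample.altitude_m
    return None


# Hypothetical descent: a transient at leg deployment, a real signal at the surface.
DESCENT = [
    Sample(1500.0, True),
    Sample(1000.0, False),
    Sample(40.0, False),
    Sample(10.0, False),
    Sample(0.0, True),
]
```

With `DESCENT`, the default logic cuts the engines at 40 meters because the deployment transient is still latched, while `clear_latch_on_enable=True` waits for the signal at the surface.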
Ariane 5 Rocket



June 4, 1996 was the first test flight of the Ariane 5 launch system. The rocket
tore itself apart 37 seconds after launch, making the fault one of the most
expensive computer bugs in history.
The Ariane 5 software reused the specifications from the Ariane 4, but the
Ariane 5's flight path was considerably different and beyond the range for
which the reused code had been designed. Specifically, the Ariane 5's
greater acceleration caused the back-up and primary inertial guidance
computers to crash, after which the launcher's nozzles were directed by
spurious data. Pre-flight tests had never been performed on the re-alignment
code under simulated Ariane 5 flight conditions, so the error was not
discovered before launch.
Because of the different flight path, a data conversion from a 64-bit floating
point to 16-bit signed integer caused a hardware exception (more specifically,
an arithmetic overflow, as the floating point number had a value too large to
be represented by a 16-bit signed integer). Efficiency considerations had led
to the disabling of the exception handler for this error. This led to a cascade
of problems, culminating in destruction of the entire flight.
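The conversion failure described above can be sketched as follows. The Ariane software was not written in Python, so this only mirrors the idea: a value that fits in a 64-bit float may not fit in a 16-bit signed integer. The function names are hypothetical, and the saturating variant is shown as one possible form of protection, not as what the designers did or should have done.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed range, raising if the value does not fit.

    An uncaught exception here plays the role of the unhandled error in the text.
    """
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return result


def to_int16_saturating(value: float) -> int:
    """Convert, clamping out-of-range values to the nearest representable one."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

A value that stays in range converts identically under both functions. A larger value either raises, which must then be handled somewhere, or is clamped, which keeps the software running on a value that is known to be wrong.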
2003 North America Blackout
August 14, 2003
12:15 p.m. Inaccurate data input renders a system monitoring tool in Ohio ineffective.
1:31 p.m. The Eastlake, Ohio, generating plant shuts down.
2:02 p.m. First 345-kV line in Ohio fails due to contact with a tree in Walton Hills, Ohio.
2:14 p.m. An alarm system fails at FirstEnergy's control room and is not repaired.
2:27 p.m. Second 345-kV line fails due to a tree.
3:05 p.m. A 345-kV transmission line fails in Parma, south of Cleveland, due to a tree.
3:17 p.m. Voltage dips temporarily on the Ohio portion of the grid. Controllers take no action, but power shifted by the first failure onto another 345-kV power line causes it to sag into a tree. While Midwest ISO and FirstEnergy controllers try to understand the failures, they fail to inform system controllers in nearby states.
3:39 p.m. A FirstEnergy 138-kV line fails.
3:41 and 3:46 p.m. Two breakers connecting FirstEnergy’s grid with American Electric Power are tripped as a 345-kV power line and 15 138-kV lines fail in northern Ohio. Later analysis suggests that this could have been the last possible chance to save the grid if controllers had cut off power to Cleveland at this time.
4:06 p.m. A sustained power surge on some Ohio lines begins an uncontrollable cascade after another 345-kV line fails.
4:09:02 p.m. Voltage sags deeply as Ohio draws 2 GW of power from Michigan.
4:10:34 p.m. Many transmission lines trip out, first in Michigan and then in Ohio, blocking the eastward flow of power. Generators go down, creating a huge power deficit. In seconds, power surges out of the East, tripping East Coast generators to protect them, and the blackout is on.
4:10:37 p.m. Eastern Michigan grid disconnects from western part of state.
4:10:38 p.m. Cleveland separates from Pennsylvania grid.
4:10:39 p.m. 3.7 GW of power flows from the East through Ontario to southern Michigan and northern Ohio, more than ten times the flow 30 seconds earlier, causing a voltage drop across the system.
4:10:40 p.m. Flow flips to 2 GW eastward from Michigan through Ontario, then flips westward again within half a second.
4:10:43 p.m. International connections begin failing.
4:10:45 p.m. Western Ontario separates from the east when a power line north of Lake Superior disconnects. First Ontario plants go offline in response to the unstable system. Quebec is protected because its lines are DC, not AC.
4:10:46 p.m. New York separates from New England grid.
4:10:50 p.m. Ontario separates from Western New York grid.
4:11:57 p.m. Last lines between Michigan and Ontario fail.
4:13 p.m. End of cascade. 256 power plants are offline. 85% went offline after the grid separations occurred, most of them on automatic controls. 50 million people are without power.
Next…

Software Quality Factors
how do you know quality when you see it?
how do you measure quality?