Dependability and its threats: A Taxonomy


Dependable Computing:
Concepts, Challenges, Directions
Jean-Claude Laprie
COMPSAC 2004 — Hong Kong, September 28-30, 2004
• Concepts
  A. Avizienis, J.C. Laprie, B. Randell, C. Landwehr, ‘Basic Concepts and Taxonomy of Dependable and Secure Computing’, IEEE Trans. on Dependable and Secure Computing, vol. 1, no. 1, Jan.-March 2004, pp. 11-33
• Challenges
  From real-life statistical data
• Directions
  For ubiquitous computing to be effective
Dependability:
ability to deliver service that can justifiably be trusted
Service delivered by a system: its behavior as it is perceived by its
user(s)
User: another system that interacts with the former
Function of a system: what the system is intended to do
(Functional) Specification: description of the system function
Correct service: when the delivered service implements the system
function
Service failure: event that occurs when the delivered service deviates
from correct service, either because the system does not comply with
the specification, or because the specification did not adequately
describe its function
Part of system state that may cause a subsequent service failure: error
Adjudged or hypothesized cause of an error: fault
Failure modes: the ways in which a system can fail, ranked according to
failure severities
Dependability:
ability to avoid service failures that are more frequent
or more severe than is acceptable
When service failures are more frequent or more severe than
acceptable: dependability failure
Dependability attributes:
• Availability: readiness for usage
• Reliability: continuity of service
• Safety: absence of catastrophic consequences on the user(s) and the environment
• Confidentiality: absence of unauthorized disclosure of information
• Integrity: absence of improper system alterations
• Maintainability: ability to undergo repairs and evolutions

Security: absence of unauthorized access to, or handling of, system state; the composite of confidentiality, integrity, and availability with respect to authorized actions
The chain of threats:
… → fault → (activation) → error → (propagation) → failure → (causation) → fault → …

The dependability tree:
• Attributes: availability, reliability, safety, confidentiality, integrity, maintainability
• Means: fault prevention, fault tolerance, fault removal, fault forecasting
• Threats: faults, errors, failures
Service threats:
… → faults → errors → failures → …

Elementary fault classes, by classification viewpoint:
• Phase of creation or occurrence: development faults / operational faults
• System boundaries: internal faults / external faults
• Phenomenological cause: natural faults / human-made faults
• Dimension: hardware faults / software faults
• Objective: malicious faults / non-malicious faults
• Capability: accidental faults / incompetence faults
• Persistence: permanent faults / transient faults
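To make the classification concrete, here is a small sketch (ours, not from the talk) encoding the seven viewpoints above as Python enumerations, so that a concrete fault class is one choice per viewpoint; all identifiers are illustrative.

```python
from dataclasses import dataclass
from enum import Enum

# One enumeration per elementary classification viewpoint (illustrative).
Phase = Enum("Phase", "DEVELOPMENT OPERATIONAL")
Boundary = Enum("Boundary", "INTERNAL EXTERNAL")
Cause = Enum("Cause", "NATURAL HUMAN_MADE")
Dimension = Enum("Dimension", "HARDWARE SOFTWARE")
Objective = Enum("Objective", "MALICIOUS NON_MALICIOUS")
Capability = Enum("Capability", "ACCIDENTAL INCOMPETENCE")
Persistence = Enum("Persistence", "PERMANENT TRANSIENT")

@dataclass(frozen=True)
class FaultClass:
    phase: Phase
    boundary: Boundary
    cause: Cause
    dimension: Dimension
    objective: Objective
    capability: Capability
    persistence: Persistence

# Example: a residual software flaw, classified along the seven viewpoints.
software_flaw = FaultClass(
    Phase.DEVELOPMENT, Boundary.INTERNAL, Cause.HUMAN_MADE,
    Dimension.SOFTWARE, Objective.NON_MALICIOUS,
    Capability.ACCIDENTAL, Persistence.PERMANENT)
```

Not every combination of choices is meaningful; the matrix of combined fault classes below retains only the plausible ones.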
[Figure: the matrix of combined fault classes, crossing the elementary viewpoints (phase of creation or occurrence, system boundaries, phenomenological cause, dimension, objective, intent, capability, persistence); the detailed columns and their permanent/transient persistence row are not reproduced here. The combined classes group into three overlapping families:
• Development faults, e.g., software flaws, logic bombs, hardware errata, production defects
• Physical faults, e.g., production defects, physical deterioration, physical interference
• Interaction faults, e.g., physical interference, input mistakes, viruses, worms, intrusion attempts]
Human-made faults, by objective, intent, and capability:
• Non-malicious
  – Non-deliberate (mistake): accidental, or due to incompetence
  – Deliberate (bad decision): accidental, or due to incompetence
  – Committed in interaction (operators, maintainers) and in development (designers)
  – Incompetence of individuals & organizations: decided by independent professional judgement, by a board of enquiry, or by legal proceedings in a court of law
• Malicious
  – Malicious logic faults: logic bombs, Trojan horses, trapdoors, viruses, worms, zombies
  – Intrusion attempts
Service threats:
… → faults → errors → failures → …

Failure modes, by classification viewpoint:
• Domain: content failures / early timing failures / late timing failures / halt failures / erratic failures
• Detectability: signalled failures / unsignalled failures
• Consistency: consistent failures / inconsistent failures
• Consequences: graded from minor failures … to catastrophic failures
The fault-error-failure chain:
… → fault → (activation) → error → (propagation) → failure → (causation) → fault → …
• A failure occurs when an error alters the delivered service; causation closes the loop: the failure of one component is a fault for the system that contains it or interacts with it
• The chain extends across systems via interaction or composition; the facility for stopping the recursion is context dependent
• Interaction faults presuppose the prior presence of a vulnerability: an internal fault that enables an external fault to harm the system
• Activation reproducibility distinguishes solid (hard) faults from elusive (soft) faults; elusive permanent faults and transient faults are grouped together as intermittent faults
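The chain can be seen in a few lines of code; a minimal sketch (ours, not from the talk): a dormant fault in the program text is activated by executing it, yielding an error in the internal state, which propagates until it alters the delivered service, i.e., a failure.

```python
def buggy_average(values):
    """Deliver the mean of the inputs (the specified service)."""
    total = sum(values)
    count = len(values) - 1   # dormant development fault (off-by-one),
                              # activated whenever this line executes
    return total / count      # the erroneous count propagates to the output

# As long as buggy_average is never called, the fault stays dormant.
print(buggy_average([2, 4, 6]))   # 6.0 instead of 4.0: the error has reached
                                  # the service interface, a content failure
# Activation here is systematically reproducible: a solid (hard) fault.
```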
Development failures: the development process terminates before the system is accepted for use and placed into service. Typical causes:
• Incomplete or faulty specifications
• Excessive number of specification changes
• Inadequate design w.r.t. functionality or performance
• Too many development faults
• Insufficient predicted dependability
• Faulty estimates of development costs

Partial development failures:
• Budget or schedule overruns
• Downgrading to less functionality, performance, or dependability
Dependability and its attributes
• Definitions of dependability
  – Original definition: ability to deliver service that can justifiably be trusted
    · Aimed at generalizing availability, reliability, safety, confidentiality, integrity, maintainability, which are then attributes of dependability
    · Focus on trust, i.e., accepted dependence; the dependence of system A on system B is the extent to which system A's dependability is (or would be) affected by that of system B
  – Alternate definition: ability to avoid service failures that are more frequent or more severe than is acceptable
    · A system can, and usually does, fail. Is it however still dependable? When does it become undependable?
    · Supplies the criterion for deciding whether or not, in spite of service failures, a system is still to be regarded as dependable
    · A dependability failure traces back to development fault(s)
Dependability vs. High Confidence vs. Survivability vs. Trustworthiness

Dependability
• Goal: 1) ability to deliver service that can justifiably be trusted; 2) ability of a system to avoid service failures that are more frequent or more severe than is acceptable
• Threats present: 1) development faults (e.g., software flaws, hardware errata, malicious logic); 2) physical faults (e.g., production defects, physical deterioration); 3) interaction faults (e.g., physical interference, input mistakes, attacks, including viruses, worms, intrusions)

High Confidence
• Goal: consequences of the system behavior are well understood and predictable
• Threats present: internal and external threats; naturally occurring hazards and malicious attacks from a sophisticated and well-funded adversary

Survivability
• Goal: capability of a system to fulfill its mission in a timely manner
• Threats present: 1) attacks (e.g., intrusions, probes, denials of service); 2) failures (internally generated events due to, e.g., software design errors, hardware degradation, human errors, corrupted data); 3) accidents (externally generated events such as natural disasters)

Trustworthiness
• Goal: assurance that a system will perform as expected
• Threats present: 1) hostile attacks (from hackers or insiders); 2) environmental disruptions (accidental disruptions, either man-made or natural); 3) human and operator errors (e.g., software flaws, mistakes by human operators)
Dependability
• Subsumes the concerns of reliability, availability, safety, confidentiality, integrity, maintainability — the attributes of dependability — within a unified conceptual framework; enables the appropriate balance between the attributes to be addressed
• Means for dependability — fault prevention, fault tolerance, fault removal, fault forecasting — provide an orthogonal classification of development activities; essential for abstract and discrete systems (nonexistent or vanishing safety factor)
• Causal chain of threats to dependability — fault → error → failure:
  – central to understanding and mastering the various threats likely to affect a system
  – provides a unified presentation of those threats, while preserving their specificities via the various classes
• Rigorous terminology — not just definitions: a model (abstraction, structuring, recursion)
  – avoids intellectual confusion(s)
  – focuses attention on scientific problems and technical choices
Landmark failures (each classified in the original table by fault class: physical, development, or interaction faults; by service failure scope: localized or distributed; and by the attributes affected: availability/reliability, safety, confidentiality; the individual checkmarks are not reproduced here):
• June 1980: false alerts at the North American Air Defense (NORAD)
• April 1981: first launch of the Space Shuttle postponed
• June 1985 - January 1987: excessive radiotherapy doses (Therac-25)
• August 1986 - 1987: the "wily hacker" penetrates several tens of sensitive computing facilities
• November 1988: Internet worm
• 15 January 1990: 9-hour outage of the long-distance phone network in the USA
• February 1991: Scud missed by a Patriot (Dhahran, Gulf War)
• November 1992: crash of the communication system of the London ambulance service
• 26 and 27 June 1993: authorization denial of credit card operations in France
• 4 June 1996: failure of the Ariane 5 maiden flight
• 13 April 1998: crash of the AT&T data network
• February 2000: distributed denials of service on large Web sites
• May 2000: virus "I Love You"
• July 2001: worm Code Red
• July 2001: worm Sircam
• August 2003: propagation of the electricity blackout in the USA and Canada
Non-malicious faults: number of failures by cause
[consequences and outage durations are highly application-dependent]

                             Dedicated computer systems     Larger, controlled systems
                             (e.g., transactions,           (e.g., commercial airplanes;
                             electronic switching,          telephone network; Internet
                             Internet back-end servers)     front-end servers for web
                                                            applications)
Faults                       Rank    Proportion             Rank    Proportion
Physical internal            3       ~10%                   2       15-20%
Physical interaction         3       ~10%                   2       15-20%
Human-made interaction *     2       ~20%                   1       40-50%
Development                  1       ~60%                   2       15-20%

* Root-cause analysis evidences that human-made interaction faults can often be traced back to development faults
[Figure, from J. Gray, "Dependability in the Internet era": system availability from 1950 to 2000, on a scale from 9% to 99.9999%, with cell phones and the Internet marked among recent, lower-availability services; drivers cited: complexity and economic pressure ("faster, cheaper, badder").]

Availability    Outage duration/yr
0.999999        32 s
0.99999         5 mn 15 s
0.9999          52 mn 34 s
0.999           8 h 46 mn
0.99            3 days 16 h
0.9             36 days 12 h
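The table is just the identity outage/yr = (1 − A) × one year spelled out; a short sketch (ours, not from the talk) that reproduces it, up to rounding:

```python
# Yearly outage implied by a steady-state availability A: (1 - A) * one year.
YEAR_SECONDS = 365 * 24 * 3600

def yearly_outage(a: float) -> str:
    s = (1.0 - a) * YEAR_SECONDS
    days, s = divmod(s, 86400)
    hours, s = divmod(s, 3600)
    minutes, s = divmod(s, 60)
    return f"{int(days)}d {int(hours)}h {int(minutes)}mn {int(s)}s"

for a in (0.999999, 0.99999, 0.9999, 0.999, 0.99, 0.9):
    print(f"{a:8} -> {yearly_outage(a)}")
# 0.999999 -> 0d 0h 0mn 31s  ...  0.9 -> 36d 12h 0mn 0s
# (matches the table up to rounding)
```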
Tandem fault-tolerant systems
[Figure: reported outages by cause (software, hardware, maintenance, environment, operations) and system MTBF over successive reporting periods, as the installed base of systems, processors, disks, and clients grew; system MTBF reached 21 years. Detailed figures not reproduced.]

NetCraft uptime statistics
[Figure: uptime in hours (average and max) versus number of requests for the top 50 most requested sites, including www.microsoft.com, www.whitehouse.gov, www.daiko-lab.co.jp, and www.google.com. Detailed figures not reproduced.]

(C)OTS data: OS and hardware
[Figure: robustness testing of POSIX calls: failure probability and distribution of failure modes (incorrect result, error status, exception, kernel debug, system hang, application hang) across operating systems including AIX 4.1, Digital Unix 4.0, FreeBSD 2.2.5, Irix 6.2, HP-UX B.10.20, Linux, LynxOS, NetBSD 1.3, QNX, SunOS 5.5.]
[Figure: number of "errata" for Intel processors (Pentium, Pentium Pro, Pentium II, Mobile Pentium II, Celeron, Xeon; introductions from 1995 to May 98), distinguishing errata still present in Jan. 1999.]

Test of Chorus ClassiX r3
[Figure: failure probability under internal and external faults (average and min-max), comparing the synchronization component in standard and wrapped versions.]
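For orientation (our arithmetic, not in the talk), an MTBF of 21 years maps to availability through the usual steady-state relation; assuming, for illustration, a mean time to repair of 4 hours:

$$A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}} = \frac{21 \times 8760}{21 \times 8760 + 4} \approx 0.99998$$

i.e., on the order of 11 minutes of expected outage per year.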
Malicious faults: statistics from SEI/CERT
[Figure: reported incidents (scale to 90,000) and reported vulnerabilities (scale to 4,500), 1995-2002; both grow steeply over the period.]
Global Information Security Survey 2003 — Ernst & Young
[Figure: proportion of respondents citing each incident cause (scale 0-20%): hardware failures, telecommunications failures, software failures, viruses and worms, third-party failures, operational errors, infrastructure failures, system capacity failures, DDoS attacks, employee misconduct, natural disasters, inadvertent acts of business partner(s), malicious technical acts, former employee misconduct, business partner misconduct.]
Overall: non-malicious faults 81%, malicious faults 19%
Yearly survey on computer damages in France — CLUSIF (2000, 2001, 2002)
• Occurrences: non-malicious causes 71%, malicious causes 29%
  [Figure: radar charts of occurrence proportions (scales to 25-30%) across cause categories: internal failures, physical sabotage, thefts, image damage, loss of essential services, natural events, physical accidents, disclosures and frauds, targeted logic attacks, utilization errors, virus infection, design errors.]
• Occurrence impact: non-malicious causes 29%, malicious causes 71%
  [Figure: for internal failures, loss of essential services, natural events, utilization errors, design errors, virus infection, and thefts: share of low, average, and high impact occurrences, 0-100%.]
• Risk perception, with 3-year trends per category: stable, increase, or decrease (individual trend markers not reproduced).
Development failures (from Standish Group, Chaos reports)

                                                               1994      2002
Number of surveyed projects                                    8,380     13,522
Successful projects (completed on time and on budget,
with all features and functions as initially specified)        16%       34%
Challenged projects (completed and operational but over
budget, over the time estimate, and offering fewer features
and functions than originally specified)                       53%       51%
Canceled projects                                              31%       15%
Overruns for challenged projects                               89%       82%
Functions left for challenged projects                         61%       52%
Total estimated budget for software projects in the USA (G$)   250       225
Estimated lost value for software projects in the USA (G$)     81        38
High dependability for safety-critical or dedicated systems:
• avionics, railway signalling, nuclear control, etc.
• transaction processing, back-end servers, etc.

Scalability of dependability? Continuous complexity growth (web-based applications, networked embedded systems) demands, in addition to fault removal, a generalization of fault tolerance to:
• physical faults
• residual software faults
• intrusions, and the vulnerabilities they exploit [some unavoidable for usability]
• human-made interaction faults: administration, configuration, maintenance faults
Fault tolerance
• Physical faults and residual software faults: fail-fast & reconfigure; checkpointing for elusive faults; rejuvenation for data aging (see the sketch below)
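A minimal sketch (ours, not from the talk) of checkpoint-and-retry for elusive faults: since a soft fault's activation conditions rarely recur, rolling back to the last checkpoint and re-executing usually masks it, whereas a solid fault exhausts the retries and is signalled.

```python
import copy

def run_with_checkpoints(steps, state, max_retries=3):
    """Execute steps in order; on a failure, roll back and re-execute."""
    for step in steps:
        checkpoint = copy.deepcopy(state)          # snapshot before the step
        for _ in range(max_retries):
            try:
                state = step(state)
                break                              # step succeeded, move on
            except Exception:
                state = copy.deepcopy(checkpoint)  # roll back: elusive faults
                                                   # rarely re-activate on retry
        else:
            raise RuntimeError("step failed on every retry: solid fault")
    return state
```

Applied preventively rather than on failure, periodic restart of this kind is the rejuvenation idea mentioned above.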
• Intrusions: information dispersal (storage); platform diversity (a toy dispersal sketch follows)
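As a toy illustration of the storage idea (ours; real schemes such as Rabin's information dispersal algorithm also provide availability via k-of-n reconstruction), data can be XOR-split into n shares stored on different hosts, so that an intruder who reads fewer than all n shares learns nothing:

```python
import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def disperse(data: bytes, n: int = 3) -> list[bytes]:
    """n-of-n XOR split: any n-1 shares are random noise on their own."""
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, data))  # last share closes the XOR
    return shares

def reconstruct(shares: list[bytes]) -> bytes:
    return reduce(xor_bytes, shares)

shares = disperse(b"account state", 3)  # store each share on a separate host
assert reconstruct(shares) == b"account state"
```

Note the trade-off: this n-of-n split protects confidentiality but hurts availability (losing any one share loses the data), which is why practical dispersal uses k-of-n erasure coding.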
• Human-made interaction faults: system automation (‘autonomic systems’), mindful of the automation paradox
• Error detection: wrapping for COTS components (a sketch follows)
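A sketch (ours, not from the talk) of an error-detection wrapper around an off-the-shelf routine: the wrapper adds the precondition and acceptance checks that the COTS code lacks, turning silent or erratic failure modes of the kind observed in the POSIX robustness tests above into signalled errors.

```python
import math

def cots_sqrt(x):
    """Stand-in for an unmodifiable COTS routine."""
    return math.sqrt(x)

def wrapped_sqrt(x):
    # Precondition check: reject arguments the COTS code may mishandle.
    if not isinstance(x, (int, float)) or x < 0 or math.isnan(x):
        raise ValueError("wrapped_sqrt: invalid argument")  # signalled failure
    y = cots_sqrt(x)
    # Acceptance (reasonableness) check on the delivered result.
    if not math.isclose(y * y, x, rel_tol=1e-9, abs_tol=1e-12):
        raise RuntimeError("wrapped_sqrt: result failed acceptance test")
    return y
```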
Fault tolerance assessment: coverage demonstration, by analysis (incl. formal) and by experiments (representative fault injection); a toy coverage estimate follows.
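Experimental coverage evaluation can be sketched in a few lines (ours, not from the talk): inject representative faults (here, one or two random bit flips into a stored record) and estimate the fraction that the error-detection mechanism catches. A single parity bit serves as the deliberately weak detector.

```python
import random

def parity(data: bytes) -> int:
    """Detector under evaluation: a single parity bit over the record."""
    return sum(bin(b).count("1") for b in data) % 2

def inject(data: bytes) -> bytes:
    """Fault injection: flip one or two random bits (a crude fault model)."""
    corrupted = bytearray(data)
    for _ in range(random.choice((1, 2))):
        i = random.randrange(len(corrupted))
        corrupted[i] ^= 1 << random.randrange(8)
    return bytes(corrupted)

def estimate_coverage(record: bytes, trials: int = 100_000) -> float:
    reference = parity(record)
    detected = sum(parity(inject(record)) != reference for _ in range(trials))
    return detected / trials

print(estimate_coverage(b"replicated state"))  # ~0.5: parity misses double flips
```

Whether the injected faults are representative of those the system will actually face is, as the slide says, the crux of such experiments.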