
Lecture 2: Combinatorial Modeling
CS 7040
Trustworthy System Design, Implementation, and Analysis
Spring 2015, Dr. Rozier
Adapted from slides by WHS at UIUC
Introduction
Introduction to Combinatorial Methods
• One of the simplest analytical/numerical
validation methods for reliability and
availability modeling.
• Requires certain assumptions…
Combinatorial Assumptions
• Component failures are independent
• For availability, repairs are independent
When these assumptions hold, simple formulas
for reliability and availability exist!
Defining Reliability
Reliability
• A key to trustworthy systems is the use of
reliable components and systems.
– Leads to high availability!
• Reliability: The reliability of a system at time t,
R(t), is the probability that system operation is
proper throughout the interval [0,t].
• Probability theory and combinatorics can be
applied directly to reliability models.
• Let X be a random variable representing the
time to failure of a component. The reliability
at time t is given by:
$R(t) = P[X > t]$
• Unreliability can be defined similarly as:
$Q(t) = P[X \le t] = 1 - R(t)$
Probability Refresher
• A random variable X is uniquely determined by
its set of possible values, $\Omega_X$, and the
associated probability distribution (or density)
function (pdf), a real-valued function $f_X$
defined for each possible value $x \in \Omega_X$ as the
probability that X has the value x (for discrete X;
for continuous X, $f_X$ is a density):
$f_X(x) = P[X = x]$
Probability Refresher
• The cumulative distribution function (cdf) of
the discrete random variable X is the real-valued
function $F_X$ defined for each $x \in \mathbb{R}$ as
$F_X(x) = P[X \le x] = \sum_{x_i \in \Omega_X,\ x_i \le x} f_X(x_i)$
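As a small illustration of the discrete definition, the sketch below evaluates $F_X(x)$ by summing a pmf stored as a dictionary. The fair-die pmf and the helper name `discrete_cdf` are hypothetical, chosen only for the example.

```python
from fractions import Fraction


def discrete_cdf(pmf: dict, x) -> Fraction:
    """F_X(x) = P[X <= x]: sum f_X(x_i) over every possible value x_i <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))


# Hypothetical example: a fair six-sided die, Omega_X = {1, ..., 6}, f_X(x) = 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(discrete_cdf(die, 0))    # 0
print(discrete_cdf(die, 3))    # 1/2
print(discrete_cdf(die, 3.5))  # 1/2 (the cdf is a step function)
print(discrete_cdf(die, 6))    # 1
```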
Probability Refresher
• The cumulative distribution function (cdf) of
the continuous random variable X is the real-valued
function $F_X$ defined for each $x \in \mathbb{R}$ as
$F_X(x) = P[X \le x] = \int_{-\infty}^{x} f_X(y)\,dy$
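For the continuous case, the integral can be approximated numerically. This sketch assumes an exponential density with a hypothetical rate of 0.5 (the slides do not fix a distribution here) and uses the trapezoidal rule, comparing the result against the exponential's closed-form cdf $1 - e^{-\lambda x}$.

```python
import math


def pdf_exp(x: float, lam: float) -> float:
    """Exponential density f_X(x) = lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def cdf_numeric(pdf, x: float, lower: float = 0.0, steps: int = 10_000) -> float:
    """Approximate F_X(x) = integral of pdf from -infinity to x (trapezoidal rule).

    `lower` stands in for -infinity; that is valid here only because the
    exponential density is zero for negative arguments.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h


lam = 0.5  # hypothetical failure rate, for illustration only
for x in (0.5, 2.0, 5.0):
    approx = cdf_numeric(lambda y: pdf_exp(y, lam), x)
    exact = 1.0 - math.exp(-lam * x)  # closed-form exponential cdf
    print(f"x={x}: numeric={approx:.6f}, closed form={exact:.6f}")
```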
PDFs and CDFs
Reliability and Unreliability
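• Reliability: $R(t) = P[X > t] = 1 - F_X(t) = \int_{t}^{\infty} f_X(y)\,dy$
• Unreliability: $Q(t) = P[X \le t] = F_X(t) = \int_{-\infty}^{t} f_X(y)\,dy$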
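A minimal sketch of both definitions, assuming an exponential time to failure with a hypothetical rate of $10^{-3}$ failures per hour. Besides the closed form $R(t) = e^{-\lambda t}$, it estimates $P[X > t]$ directly by sampling lifetimes, which is exactly what the definition of reliability says.

```python
import math
import random

LAM = 1e-3  # hypothetical failure rate (failures per hour), for illustration only


def reliability(t: float, lam: float = LAM) -> float:
    """R(t) = P[X > t] = 1 - F_X(t) = exp(-lam * t) for an exponential lifetime."""
    return math.exp(-lam * t)


def unreliability(t: float, lam: float = LAM) -> float:
    """Q(t) = P[X <= t] = F_X(t) = 1 - R(t)."""
    return 1.0 - reliability(t, lam)


def reliability_mc(t: float, lam: float = LAM, n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P[X > t]: fraction of sampled lifetimes exceeding t."""
    rng = random.Random(seed)
    return sum(rng.expovariate(lam) > t for _ in range(n)) / n


t = 500.0
print(reliability(t), unreliability(t), reliability_mc(t))
```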
Failure Rates
Failure Rate
• What is the rate at which a component fails at
time t?
– The probability that a component that has not yet
failed fails in the interval $(t, t + \Delta t]$
Note: We are not looking at $P[t < X \le t + \Delta t]$.
We are seeking $P[t < X \le t + \Delta t \mid X > t]$.
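Failure Rate
Since the event $\{t < X \le t + \Delta t\}$ is contained in $\{X > t\}$:
$P[t < X \le t + \Delta t \mid X > t] = \frac{P[t < X \le t + \Delta t]}{P[X > t]} = \frac{F_X(t + \Delta t) - F_X(t)}{R(t)}$
Dividing by $\Delta t$ and letting $\Delta t \to 0$ turns this probability into a rate:
$h(t) = \lim_{\Delta t \to 0} \frac{F_X(t + \Delta t) - F_X(t)}{\Delta t \, R(t)} = \frac{f_X(t)}{R(t)}$
• $h(t)$ is called the failure rate or hazard rate.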
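The limit can be checked numerically by evaluating the difference quotient with a small $\Delta t$. This sketch assumes two hypothetical lifetime models: an exponential (rate 0.5), whose hazard is constant, and a Weibull (shape 2, scale 10, a model not named in the slides), whose hazard grows with t. The helper names are illustrative.

```python
import math


def hazard_finite_difference(cdf, t: float, dt: float = 1e-6) -> float:
    """Approximate h(t) = lim (F(t + dt) - F(t)) / (dt * R(t)) with a small dt."""
    return (cdf(t + dt) - cdf(t)) / (dt * (1.0 - cdf(t)))


# Hypothetical lifetime models, chosen only to illustrate the definition.
LAM = 0.5                  # exponential rate
SHAPE, SCALE = 2.0, 10.0   # Weibull shape and scale


def exp_cdf(t: float) -> float:
    return 1.0 - math.exp(-LAM * t)


def weibull_cdf(t: float) -> float:
    return 1.0 - math.exp(-((t / SCALE) ** SHAPE))


def weibull_hazard(t: float) -> float:
    """Closed form of f(t) / R(t) for the Weibull model above."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


for t in (1.0, 5.0, 10.0):
    print(t,
          hazard_finite_difference(exp_cdf, t),      # stays near LAM for every t
          hazard_finite_difference(weibull_cdf, t),  # increases with t
          weibull_hazard(t))
```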
Survival Function
• In addition to the reliability/hazard function,
we have the survival function
$S(t) = P[X > t] = 1 - F_X(t)$
which, for a component's time to failure X, is the same quantity as the reliability R(t).
Survival and Hazard
• Hazard (or Failure) function – instantaneous
failure rate at some time t.
• Survival function – the probability that the
time of failure is later than some time t.
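The two functions determine each other through the standard identity $S(t) = \exp\left(-\int_0^t h(u)\,du\right)$, which follows from $h(t) = f_X(t)/S(t) = -\frac{d}{dt}\ln S(t)$. The slides do not state this identity, so treat the sketch below as supplementary: it integrates a hypothetical Weibull hazard (shape 2, scale 10) with the trapezoidal rule and compares the result with the closed-form survival function.

```python
import math


def survival_from_hazard(hazard, t: float, steps: int = 10_000) -> float:
    """S(t) = exp(-H(t)), with H(t) = integral of h(u) over [0, t] (trapezoidal rule)."""
    if t <= 0.0:
        return 1.0
    du = t / steps
    area = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        area += hazard(i * du)
    return math.exp(-area * du)


# Hypothetical Weibull hazard (shape 2, scale 10): h(u) = (2 / 10) * (u / 10).
def weibull_hazard(u: float) -> float:
    return 0.2 * (u / 10.0)


t = 5.0
print(survival_from_hazard(weibull_hazard, t), math.exp(-((t / 10.0) ** 2)))
```

A constant hazard $h(u) = \lambda$ recovers the exponential survival function $e^{-\lambda t}$.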
Typical Failure Rate
System Reliability
• While $R(t) = 1 - F_X(t)$ can give the reliability of a
component, how do you compute the
reliability of a system?
System Reliability
System failure can occur when one, all, or some
of the components fail. If one makes the
independent failure assumption, the probability
of system failure can be computed quite simply.
The independent failure assumption states that
all component failures of a system are
independent, i.e., the failure of one component
does not cause another component to be more
or less likely to fail.
System Reliability
• Given this assumption, we can determine:
– Minimum failure time of a set of components
– Maximum failure time of a set of components
– Probability that k of N components have failed at a
particular time t.
Maximum of n Independent Failure Times
• Let $X_1, X_2, \ldots, X_n$ be independent component
failure times. Suppose the system fails at time
S if all the components fail.
• Thus, $S = \max(X_1, X_2, \ldots, X_n)$.
• What is $F_S(t) = P[S \le t]$?
The maximum is at most t exactly when every $X_i$ is, so
$F_S(t) = P[\max(X_1, \ldots, X_n) \le t] = P[X_1 \le t, X_2 \le t, \ldots, X_n \le t]$
$= \prod_{i=1}^{n} P[X_i \le t]$ (by independence)
$= \prod_{i=1}^{n} F_{X_i}(t)$ (by definition)
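The product formula translates directly into code. This sketch assumes exponential component distributions with hypothetical rates; `exp_cdf` and `max_cdf` are illustrative names, and the system reliability is obtained as $R_S(t) = 1 - F_S(t)$.

```python
import math
from typing import Callable, Sequence

Cdf = Callable[[float], float]


def exp_cdf(lam: float) -> Cdf:
    """Exponential failure distribution F(t) = 1 - exp(-lam * t) (a modeling assumption)."""
    return lambda t: 1.0 - math.exp(-lam * t)


def max_cdf(cdfs: Sequence[Cdf], t: float) -> float:
    """F_S(t) for S = max(X_1, ..., X_n) with independent X_i: product of F_{X_i}(t)."""
    return math.prod(F(t) for F in cdfs)


# Hypothetical pair of redundant components with failure rates 1e-3 and 2e-3 per hour.
pair = [exp_cdf(1e-3), exp_cdf(2e-3)]
t = 100.0
print("F_S(t) =", max_cdf(pair, t), " R_S(t) =", 1.0 - max_cdf(pair, t))
```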
Minimum of n Independent Failure Times
• Let $X_1, X_2, \ldots, X_n$ be independent component
failure times. A system fails at time S if any of
the components fail.
• Thus, $S = \min(X_1, X_2, \ldots, X_n)$.
• What is $F_S(t) = P[S \le t]$?
• Trick: If A is an event and $\bar{A}$ is its
complement, so that $A \cap \bar{A} = \emptyset$ and
$A \cup \bar{A} = \Omega$, then $P[A] = 1 - P[\bar{A}]$.
($\bar{A}$ is the set of outcomes not in A.)
$F_S(t) = P[\min(X_1, \ldots, X_n) \le t]$
$= 1 - P[\min(X_1, \ldots, X_n) > t]$ (by trick)
$= 1 - P[X_1 > t, X_2 > t, \ldots, X_n > t]$ (the minimum exceeds t exactly when every $X_i$ does)
$= 1 - \prod_{i=1}^{n} P[X_i > t]$ (by independence)
$= 1 - \prod_{i=1}^{n} \left(1 - F_{X_i}(t)\right)$ (by LOTP)
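The series (minimum) formula as a sketch, again with hypothetical exponential components. A known property gives a convenient check: the minimum of independent exponentials is exponential with rate equal to the sum of the individual rates, so both printed values should agree.

```python
import math
from typing import Callable, Sequence

Cdf = Callable[[float], float]


def min_cdf(cdfs: Sequence[Cdf], t: float) -> float:
    """F_S(t) for S = min(X_1, ..., X_n) with independent X_i: 1 - prod(1 - F_{X_i}(t))."""
    return 1.0 - math.prod(1.0 - F(t) for F in cdfs)


# Hypothetical exponential components (rates per hour).
rates = [1e-3, 2e-3, 5e-4]
# Default argument binds each rate at creation time (avoids late binding in the loop).
cdfs = [lambda t, lam=lam: 1.0 - math.exp(-lam * t) for lam in rates]
t = 100.0
print(min_cdf(cdfs, t), 1.0 - math.exp(-sum(rates) * t))
```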
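k of N
• Let $X_1, X_2, \ldots, X_N$ be independent component failure times
that have identical distributions (i.e.,
$F_{X_1} = F_{X_2} = \cdots = F_{X_N} = F_X$). The system has
failed by time t if k or more of the N
components have failed by t.
P[at least k components failed by time t]
= P[exactly k failed OR exactly k+1 failed OR … OR exactly N failed]
= P[exactly k failed] + P[exactly k+1 failed] + … + P[exactly N failed]
(the "exactly j failed" events are mutually exclusive, so their probabilities add)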
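k of N
• What is P[exactly k failed]?
= P[k failed and (N – k) have not]
$= \binom{N}{k} F_X(t)^{k} \left(1 - F_X(t)\right)^{N-k}$
where $F_X(t)$ is the failure distribution of each
component and $\binom{N}{k}$ counts which k components fail. Hence
$F_S(t) = \sum_{j=k}^{N} \binom{N}{j} F_X(t)^{j} \left(1 - F_X(t)\right)^{N-j}$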
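The binomial sum is a one-liner with `math.comb`. The sketch takes the per-component value $F_X(t)$ as a number; the probability 0.1 is hypothetical.

```python
from math import comb


def k_of_n_identical(k: int, n: int, F: float) -> float:
    """F_S(t) = P[at least k of n i.i.d. components failed], where F = F_X(t)."""
    return sum(comb(n, j) * F**j * (1.0 - F) ** (n - j) for j in range(k, n + 1))


# Hypothetical: per-component failure probability F_X(t) = 0.1 at the time of interest.
print(k_of_n_identical(2, 3, 0.1))  # 3 * 0.1**2 * 0.9 + 0.1**3, about 0.028
```

Setting k = 1 recovers the minimum (series) formula, and k = N recovers the maximum (parallel) formula.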
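k of N in General
• For non-identical failure distributions, we
must sum over all combinations of at least k
failures.
• Let $\mathcal{K}$ be the set of all subsets of $\{1, 2, \ldots, N\}$
such that each element in $\mathcal{K}$ is a set of size at
least k, i.e., $\mathcal{K} = \{K \subseteq \{1, \ldots, N\} : |K| \ge k\}$.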
k of N in General
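• The set $\mathcal{K}$ represents all the possible failure
scenarios: each $K \in \mathcal{K}$ is one set of components that
have failed by time t while the rest have not.
• Now $F_S(t)$ is given by
$F_S(t) = \sum_{K \in \mathcal{K}} \prod_{i \in K} F_{X_i}(t) \prod_{i \notin K} \left(1 - F_{X_i}(t)\right)$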
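A direct sketch of the sum over $\mathcal{K}$, enumerating failure sets with `itertools.combinations`. The probabilities are hypothetical; note that the enumeration touches every subset, so its cost grows like $2^N$.

```python
import math
from itertools import combinations
from typing import Sequence


def k_of_n_general(k: int, F: Sequence[float]) -> float:
    """F_S(t) = sum over failure sets K with |K| >= k of
    prod_{i in K} F_i(t) * prod_{i not in K} (1 - F_i(t)).

    F[i] holds F_{X_i}(t) for component i at the time of interest.
    """
    n = len(F)
    total = 0.0
    for size in range(k, n + 1):
        for K in combinations(range(n), size):
            failed = set(K)
            total += math.prod(F[i] if i in failed else 1.0 - F[i] for i in range(n))
    return total


# Hypothetical per-component failure probabilities at some time t.
probs = [0.1, 0.2, 0.3]
print(k_of_n_general(2, probs))
```

With identical entries this reduces to the binomial formula from the identical-distribution case.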
Component Building Blocks
• Complex systems can be analyzed
hierarchically.
Example: A computer fails if both power
supplies fail, or both memories fail, or if the
CPU fails.
At the top level, the problem is one of a minimum:
the system fails when the first of its three
subsystems (power, memory, CPU) fails…
Component Building Blocks
• Power supply subsystem is a maximum: both
must fail
• Memory subsystem is a maximum:
both must fail
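Putting the building blocks together for the computer example: each duplicated subsystem is a maximum, and the three subsystems combine as a minimum. The component failure probabilities in this sketch are hypothetical, and `computer_failure_prob` is an illustrative name.

```python
def computer_failure_prob(F_cpu: float, F_psu: float, F_mem: float) -> float:
    """F_S(t) for a CPU in series with a duplicated power supply and a duplicated memory.

    Each argument is one component's F(t); components are assumed independent.
    """
    F_power = F_psu ** 2   # maximum: both power supplies must fail
    F_memory = F_mem ** 2  # maximum: both memories must fail
    # Minimum over the three subsystems: the system survives only if all three survive.
    return 1.0 - (1.0 - F_cpu) * (1.0 - F_power) * (1.0 - F_memory)


# Hypothetical component failure probabilities at some time t.
print(computer_failure_prob(F_cpu=0.01, F_psu=0.1, F_mem=0.05))
```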
Summary
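A system comprises N components, where the
component failure times are given by the
independent random variables $X_1, X_2, \ldots, X_N$. The system fails
at time S with distribution $F_S(t)$ if:
Condition | Distribution $F_S(t)$
All components fail | $\prod_{i=1}^{N} F_{X_i}(t)$
Any one component fails | $1 - \prod_{i=1}^{N} \left(1 - F_{X_i}(t)\right)$
At least k components fail, identical distributions | $\sum_{j=k}^{N} \binom{N}{j} F_X(t)^{j} \left(1 - F_X(t)\right)^{N-j}$
At least k components fail, general case | $\sum_{K \in \mathcal{K}} \prod_{i \in K} F_{X_i}(t) \prod_{i \notin K} \left(1 - F_{X_i}(t)\right)$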
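One way to gain confidence in the table is to compare it against simulation. This sketch assumes three independent exponential components with hypothetical rates, evaluates the closed forms for "all fail", "any fails", and "at least k fail", and estimates the same probabilities by Monte Carlo; the two sets of numbers should agree to within sampling error.

```python
import math
import random
from itertools import combinations

# Hypothetical setup: three independent exponential components.
RATES = [0.1, 0.2, 0.3]
T = 2.0
K = 2


def summary_formulas(rates, t, k):
    """Closed forms from the table: (all fail, any fails, at least k fail)."""
    F = [1.0 - math.exp(-lam * t) for lam in rates]
    n = len(F)
    all_fail = math.prod(F)
    any_fail = 1.0 - math.prod(1.0 - f for f in F)
    at_least_k = sum(
        math.prod(F[i] if i in failed else 1.0 - F[i] for i in range(n))
        for size in range(k, n + 1)
        for failed in map(set, combinations(range(n), size))
    )
    return all_fail, any_fail, at_least_k


def simulate(rates, t, k, trials=100_000, seed=7):
    """Monte Carlo estimates of the same three probabilities."""
    rng = random.Random(seed)
    hits = [0, 0, 0]
    for _ in range(trials):
        failed = sum(rng.expovariate(lam) <= t for lam in rates)
        hits[0] += failed == len(rates)  # S = max(X_i) <= t
        hits[1] += failed >= 1           # S = min(X_i) <= t
        hits[2] += failed >= k           # at least k of N failed by t
    return tuple(h / trials for h in hits)


print(summary_formulas(RATES, T, K))
print(simulate(RATES, T, K))
```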
Reliability Formalisms
• There are several popular graphical
formalisms to express system reliability. The
core of the solvers for these formalisms is
the set of methods we have just examined. We will
discuss a subset of these formalisms:
– Reliability Block Diagrams
– Fault Trees
– Reliability Graphs
There is nothing special about these formalisms
except their popularity.
What is a Graphical Formalism?
• A way to draw visual diagrams with formal
underlying mathematical meanings.
Reliability Block Diagrams
• Blocks represent components
• A system failure occurs if there is no path from
source to sink.
Reliability Block Diagrams
• Series:
– System fails if any component fails
Reliability Block Diagrams
• Parallel:
– System fails if all components fail
Reliability Block Diagrams
• k of N:
– System fails if at least k of N components fail.
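Because each block type maps to one of the earlier formulas, reliability block diagrams can be evaluated by composing small functions on component reliabilities $R_i(t)$. In this sketch the function names are hypothetical, and the k-of-N block counts failures with a small dynamic program over components instead of enumerating subsets; independence is assumed throughout.

```python
import math
from typing import Sequence


def series(rs: Sequence[float]) -> float:
    """Series blocks: the system fails if any component fails, so R = prod R_i."""
    return math.prod(rs)


def parallel(rs: Sequence[float]) -> float:
    """Parallel blocks: the system fails only if all fail, so R = 1 - prod(1 - R_i)."""
    return 1.0 - math.prod(1.0 - r for r in rs)


def k_of_n_fail(k: int, rs: Sequence[float]) -> float:
    """k-of-N blocks: the system fails if at least k components fail.

    dist[j] holds P[exactly j of the components seen so far have failed];
    the returned reliability is P[fewer than k failures].
    """
    dist = [1.0]
    for r in rs:
        new = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            new[j] += p * r            # this component survives
            new[j + 1] += p * (1 - r)  # this component fails
        dist = new
    return sum(dist[:k])


# Hypothetical diagram: two parallel blocks (R = 0.9 each) in series with one block (R = 0.99).
print(series([parallel([0.9, 0.9]), 0.99]))
```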
Example
A NASA satellite architecture under study is designed for high reliability. The
major computer system components include the CPU system, the high-speed
network for data collection and transmission, and the low-speed network for
engineering and control. The satellite fails if any of the major systems fail.
There are 3 computers, and the computer system fails if 2 or more of the
computers fail. The failure distribution of a computer is given by
The high-speed network is duplicated (2 networks), and the high-speed network
system fails only if both networks fail. The distribution of a high-speed network
failure is given by
The low-speed network is arranged similarly, with a failure distribution of
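The example's component distributions are not recoverable here, so the sketch below encodes only the structure, taking each component's reliability at a common time t as input. The function name and the placeholder value 0.9 are hypothetical, and the low-speed network is assumed to be duplicated, reading "arranged similarly" literally.

```python
def satellite_reliability(R_comp: float, R_hs: float, R_ls: float) -> float:
    """R(t) of the satellite, given each component's reliability R(t) at the same t.

    Structure from the example: the satellite is a series system of
      - the computer system (3 computers, fails if 2 or more fail),
      - the high-speed network system (2 networks, fails if both fail),
      - the low-speed network system (assumed to be 2 networks, arranged the same way).
    Independent failures are assumed throughout.
    """
    # Computer system survives if at most one of the three computers has failed.
    R_cpu_sys = R_comp**3 + 3 * R_comp**2 * (1.0 - R_comp)
    R_hs_sys = 1.0 - (1.0 - R_hs) ** 2  # parallel pair
    R_ls_sys = 1.0 - (1.0 - R_ls) ** 2  # parallel pair (assumed)
    return R_cpu_sys * R_hs_sys * R_ls_sys


# Placeholder reliabilities, not values from the example.
print(satellite_reliability(0.9, 0.9, 0.9))
```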
Background: Series-Parallel Graphs
Series-Parallel Decomposition of NASA Example
Fault Trees
Fault Tree Example
Reliability Graphs
Reliability Graph Example
Solve by Conditioning
Conditioning Fault Trees
Reliability/Availability Point Estimates
Reliability/Availability Tables
Reliability Modeling Process
For next time
• Homework 1!
• Due next Tuesday
• Review combinatorics