Improvements in the System Safety State-of-the-Art


The “ANSI Process” for
System Safety Assurance
Presented at the Safety Case Workshop
Huntsville, AL; January 14th, 2014
David B. West, CSP, P.E., CHMM, Fellow
NATIONAL SECURITY • ENERGY & ENVIRONMENT • HEALTH • CYBERSECURITY
© SAIC. All rights reserved.
What do we mean by the “ANSI Process”?
• The publishing of best practices in ANSI/GEIA-STD-0010-2009 was done by a
working group of the SAE International G-48 System Safety Committee
• Best practices are developed and standardized so that the community of
practitioners can advance the state-of-the-art
• The best practices documented in ANSI/GEIA-STD-0010-2009 include:
• Designing a System Safety Program around 5 basic elements
• Using a modernized risk assessment matrix
• Describing hazards in terms of their Source, Mechanism, and Outcome
• Giving consideration to the concept of Total System Risk
• In this workshop, the “ANSI Process” refers to System Safety processes and
methodologies outlined in ANSI/GEIA-STD-0010-2009, “Standard Best Practices
for System Safety Program Development and Execution”
Outline of Presentation
• Brief background on the G-48 System Safety Committee
• How standardizing best practices can drive advancements in the state-of-the-art
• The G-48 Committee’s development of ANSI/GEIA-STD-0010-2009
• The 5 basic elements of an effective system safety program, as
presented in ANSI/GEIA-STD-0010-2009
• Improvements, covered in ANSI/GEIA-STD-0010-2009, to the traditional
risk assessment matrix
• The source-mechanism-outcome model for describing hazards
• Risk summation
Overview of the G-48 System Safety Committee
• Established in 1966 by the Electronic Industries Association (EIA)
• System Safety experts from industry, government, military
• Advisory body to U.S. Govt. on System Safety issues and standards – e.g., MIL-STD-882
• Develops/seeks consensus on System Safety methodologies
• Three meetings per year
• Parent organizations after EIA:
– GEIA
– ITAA
– TechAmerica
– SAE International (July 2013)
• Mission Statement:
– To promote the development of safe systems, products, and processes: the G-48 Committee
compiles, develops, improves and publishes best practices in the discipline of System Safety.
Overview of the G-48 System Safety Committee (Cont.)
G-48 Meeting No. 133 – Huntsville, AL – January 2013
How standardizing best practices can drive
advancements in the state-of-the-art
• A key motivating factor in developing ANSI/GEIA-STD-0010-2009 was the desire to make
improvements in the System Safety state-of-the-art.
• The next five charts graphically present a notional and non-quantitative picture of how
improvements in the practice of any human endeavor can be actively brought about through
the standardization of best practices.
• This approach for bringing about improvements has been successfully followed in several
other fields, including:
– The medical profession
– Steam boiler design and manufacturing
– Fire protection in building design
– The automotive industry
Variation of Practice in a Typical Discipline
[Figure: distribution of practice. Vertical axis: Frequency of Practice. Horizontal axis: Measure of “Goodness” (Proficiency, Effectiveness, Accuracy, Value, etc.)]
Standardization Option 1: Define and Document Current Practice
• Good news: Recognition of full spectrum of current practices
• Bad news: No improvement; practice stagnates
[Figure: distribution of practice. Vertical axis: Frequency of Practice. Horizontal axis: Measure of “Goodness” (Proficiency, Effectiveness, Accuracy, Value, etc.)]
Standardization Option 2 (Good): Option 1 + Identify Central Tendency & Gradations
• Good news: Substandard practices required to improve
• Bad news: No improvement for most of the spectrum; practice stagnates
• Gradations marked on the curve: Substandard, Minimally Acceptable, Consensus, Exemplary or State-of-the-Art, Cutting Edge
[Figure: distribution of practice. Vertical axis: Frequency of Practice. Horizontal axis: Measure of “Goodness” (Proficiency, Effectiveness, Accuracy, Value, etc.)]
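Standardization Option 3 (Better): Option 2 + Decrease Variation
• Good news: These are pressured to improve…
• Bad news: …but these might as well “slack off”
[Figure: distribution of practice with the Consensus level marked; each “these” above is a callout pointing at a region of the curve. Vertical axis: Frequency of Practice. Horizontal axis: Measure of “Goodness” (Proficiency, Effectiveness, Accuracy, Value, etc.)]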
Standardization Option 4 (Best): Option 3 + Improve Mean Practice
• Good news: Overall spectrum of practice improves
• More good news: No sacrifice of gains at the top of the spectrum
[Figure: distribution of practice. Vertical axis: Frequency of Practice. Horizontal axis: Measure of “Goodness” (Proficiency, Effectiveness, Accuracy, Value, etc.)]
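The four standardization options can be made concrete with a small numerical sketch. The charts are explicitly notional, so everything below is an illustrative assumption rather than part of the presentation: the "goodness" scores, the minimum acceptable level, the shrink factor, and the function names are all hypothetical. The sketch treats Option 2 as raising a floor, Option 3 as pulling all practice toward the mean, and Option 4 as additionally shifting the whole distribution upward.

```python
from statistics import mean, pstdev


def raise_floor(scores, minimum):
    """Option 2 (assumed model): substandard practice must reach the minimum."""
    return [max(s, minimum) for s in scores]


def decrease_variation(scores, factor):
    """Option 3 (assumed model): pull every score toward the mean, 0 < factor < 1."""
    centre = mean(scores)
    return [centre + factor * (s - centre) for s in scores]


def improve_mean(scores, shift):
    """Option 4 (assumed model): move the whole distribution upward."""
    return [s + shift for s in scores]


# Hypothetical "goodness" scores for six practitioners (Option 1: just documented).
current = [2, 4, 5, 5, 6, 8]

option2 = raise_floor(current, minimum=3)
option3 = decrease_variation(option2, factor=0.5)
# Shift far enough that the best practice is no worse than it was originally.
option4 = improve_mean(option3, shift=max(current) - max(option3))

for name, scores in [("Option 1", current), ("Option 2", option2),
                     ("Option 3", option3), ("Option 4", option4)]:
    print(f"{name}: min={min(scores):.2f} mean={mean(scores):.2f} "
          f"max={max(scores):.2f} spread={pstdev(scores):.2f}")
```

In this toy model, Option 3 narrows the spread but lowers the top of the range, which corresponds to the "slack off" concern, while Option 4 keeps the narrower spread and restores the top.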
The G-48 Committee’s Development of
ANSI/GEIA-STD-0010-2009
• Background: Acquisition Reform and MIL-STD-882D
• Identified Opportunities for Improving System Safety Practice
• The G-48 Committee’s Draft of MIL-STD-882E
• “De-militarizing” the Draft 882E to Form an Industry Standard
• Revision A of ANSI/GEIA-STD-0010-2009
The G-48 Committee’s Development of
ANSI/GEIA-STD-0010-2009
Background: Acquisition Reform and MIL-STD-882D
• Acquisition Reform efforts by the U.S. DOD in the late 1990s resulted in eliminating many
military standards
• MIL-STD-882 was preserved by making Revision D (Feb 2000) much less prescriptive than
it had been in previous revisions (~30 pages, no S.S. tasks, guidance only)
• G-48 Committee received much feedback from 2000-2004 that industry, in general, did
not like MIL-STD-882D
• Committee agreed that:
– It was time to consider preparing a revision of MIL-STD-882
– A new revision of MIL-STD-882 provided an opportunity for improving standard
practices
The G-48 Committee’s Development of
ANSI/GEIA-STD-0010-2009 (Cont.)
Identified Opportunities for Improving System Safety Practice
• No universal understanding as to what basic elements are included in a successful System Safety Program
• Risk assessment matrix not laid out in Cartesian coordinates (which would have risk increasing up and to the right)
• Disproportionately scaled risk assessment matrix
• No quantitative bounds for hazard probability categories; mixed probability and frequency terms
• No provision for taking hazard exposure interval into account
• Using approach that if hazard risks – taken individually – are acceptable, then system risk is acceptable (regardless of number or risk level of individual hazards); i.e., no assessment of total system risk
• Inconsistent and/or incomplete methods for describing hazards
These shortcomings were addressed in the System Safety
best practices documented in ANSI/GEIA-STD-0010-2009.
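Two of the shortcomings above concern mixing probability with frequency and ignoring the hazard exposure interval. The relationship between the three can be sketched as follows, assuming mishaps arrive as a Poisson process with a constant rate. That model, the function name, and the example numbers are assumptions for illustration and are not taken from the standard.

```python
import math


def probability_of_mishap(frequency: float, exposure_interval: float) -> float:
    """Probability of at least one mishap during the exposure interval.

    frequency is in mishaps per unit of exposure (for example per operating
    hour) and exposure_interval is in the same unit. Assumes a constant-rate
    Poisson process, so P = 1 - exp(-frequency * exposure_interval).
    """
    if frequency < 0 or exposure_interval < 0:
        raise ValueError("frequency and exposure_interval must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-frequency * exposure_interval)


# Hypothetical hazard: 1e-4 mishaps per operating hour.
p_short = probability_of_mishap(1e-4, exposure_interval=10)      # one short mission
p_long = probability_of_mishap(1e-4, exposure_interval=10_000)   # a service life
print(f"10 h exposure:     {p_short:.6f}")
print(f"10,000 h exposure: {p_long:.6f}")
```

The same frequency gives very different probabilities depending on the exposure interval, which is why a likelihood scale needs the interval stated alongside it.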
The G-48 Committee’s Development of
ANSI/GEIA-STD-0010-2009 (Cont.)
The G-48 Committee’s Draft of MIL-STD-882E
• In late summer 2004, a preliminary Draft 1 of 882E was prepared by Chuck Dorney, a
longtime G-48 participant, and distributed to the G-48 Committee for review; numerous
comments for improvement followed in late 2004 and early 2005
• All ideas for improvements presented to G-48 Committee in January 2005
• G-48 Action Item 109-01 was to “produce a strawman Draft MIL-STD-882E,
‘adding discipline to our discipline’”
• An ad hoc working group was formed from several Huntsville-based organizations: APT
Research, U.S. Army Aviation & Missile Command, SAIC
The G-48 Committee’s Development of
ANSI/GEIA-STD-0010-2009 (Cont.)
The G-48 Committee’s Draft of MIL-STD-882E (Cont.)
• Throughout 2005 and into early 2006, the G-48’s 882E working group held several
meetings to incorporate recommendations for improvement
• Primary Focus:
1) Simplifying Work Elements and Process Flow
2) Modernizing the Risk Assessment Matrix
3) Introducing Risk Summation
The G-48 Committee’s Development of
ANSI/GEIA-STD-0010-2009 (Cont.)
The G-48 Committee’s Draft of MIL-STD-882E (Cont.)
• February 2006: G-48’s Final Draft MIL-STD-882E submitted for review and approval
through U.S. DOD standardization process
• Approved by nearly every DOD standardization member that reviewed it
• Key non-concurrence by DOD’s Environment, Safety, and Occupational Health (ESOH)
Integrated Process Team (IPT); ESOH IPT took control
• G-48 Committee did not want to lose all the improvements that we worked so hard to
incorporate. So…
The G-48 Committee’s Development of
ANSI/GEIA-STD-0010-2009 (Cont.)
“De-militarizing” the Draft 882E to Form an Industry Standard
• After the key non-concurrences derailed the G-48's Draft 882E, the Committee embarked
on a new effort to rewrite the document as an industry (non-military) best practices
standard.
• A 3-person team performed a thorough scrub of the document to remove all military-specific terminology, weapon system references, etc.
• Result was the first real draft of what would become GEIA-STD-0010
• Additional Improvements:
– Emphasis on “Worst Case Risk” to replace “Most Reasonable Credible Mishap”
– Added “Engineered Safety Features” (ESF) to System Safety order of precedence
– Added guidance to describe hazards in terms of Source – Mechanism – Outcome
(SMO)
The G-48 Committee’s Development of
ANSI/GEIA-STD-0010-2009 (Cont.)
“De-militarizing” the Draft 882E to Form an Industry Standard (Cont.)
• GEIA-STD-0010 published in October 2008
• Approved by ANSI in February 2009 and republished as ANSI/GEIA-STD-0010-2009
The G-48 Committee’s Development of
ANSI/GEIA-STD-0010-2009 (Cont.)
Revision A of ANSI/GEIA-STD-0010-2009
• Feedback received from industry after the original version of GEIA-STD-0010 was released
indicated that the standard needed something analogous to the DOD’s Data Item
Descriptions, or DIDs
• In 2011, an effort was begun to develop Task Data Descriptions (TDDs), where
appropriate, for tasks from Appendix B of GEIA-STD-0010
• Approach:
– Compare tasks from MIL-STD-882C to new tasks in GEIA-STD-0010
– Adapt existing DIDs referenced from 882C to become new TDDs for corresponding
tasks in GEIA-STD-0010
– Develop new TDDs where necessary
• Purpose of Revision A was stated as:
…provide Task Data Descriptions (TDDs) for System Safety Tasks in Annex (sic) B of the
Standard. TDDs are analogous to Data Item Descriptions (DIDs) found in military
standards. The TDDs will be placed in a new appendix (Appendix C). This revision will
also incorporate numerous editorial corrections to the current version of the standard.
The Five Basic Elements of an Effective System Safety Program
1) Simplifying Work Elements and Process Flow
2) Modernizing the Risk Assessment Matrix
3) Introducing Risk Summation
The Five Basic Elements of an Effective System Safety Program (Continued)
Credit: From analysis of various risk management processes and presentation developed by APT Research, Huntsville, AL.
The Eight Program Elements outlined in MIL-STD-882D and earlier versions were
combined and simplified into five, to provide a more concise representation of current
consensus practices.
Eight elements (MIL-STD-882D and earlier):
1. Documentation of the system safety approach
2. Identification of hazards
3. Assessment of mishap risk
4. Identification of mishap risk mitigation measures
5. Reduction of mishap risk to an acceptable level
6. Verification of mishap risk reduction
7. Review and acceptance of residual mishap risk by the appropriate authority
8. Tracking hazards and residual mishap risk
Five elements (ANSI/GEIA-STD-0010-2009):
1. Program Initiation
2. Hazard Identification and Tracking
3. Risk Assessment
4. Risk Reduction
5. Risk Acceptance
I–A–R–A
Improvements to the Traditional Risk Assessment Matrix
• Matrix from MIL-STD-882D
• Axes converted to logarithmic scales
• Note:
– Highest risk at upper-left
– Huge variation in span of risk covered by different cells
A “Pop Quiz”
Identify as many ways as possible that the risk matrix at right could be improved
- Flip vertical axis to have highest risk at upper-right
- Do not mix probability and frequency terms
- Provide quantitative bounds for likelihood and consequence scales
- Consider changing 4C, 3D, and 2E to High, or Yellow, Risk (Bonus question: Why?)
Good attribute: Numbering of consequence categories
Improvements to the Traditional Risk Assessment Matrix
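[Figure: modernized risk assessment matrix on logarithmic axes, with a “Typical 4x5 Matrix” region marked for comparison]
Horizontal axis: Hazard Frequency (Mishaps per <exposure interval>)
• Categories, lowest to highest: I Designed Out, H Near Zero, G Extremely Infrequent, F Very Infrequent, E Infrequent, D Intermittent, C Occasional, B Somewhat Frequent, A Frequent
• Scale marks: 0, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10
Vertical axis: Hazard Severity
• Categories, lowest to highest: Negligible 1, Marginal 2, Critical 3, Catastrophic 4, Catastrophic 5, Catastrophic 6, and a top row labeled 7 / X Catastrophic
• Scale marks: $2K, $20K, $200K, $2M, $20M, $200M, $2B; fatality marks: 1 Fatal, 10 Fatal, 100 Fatal, 1K Fatal (1K Fatal aligned with $2B)
Risk levels: High, Serious, Medium, Low, de minimis
Annotation: “Minimizability”
Adapted from Fig. 11 of “A Common Mishap Risk Assessment Matrix for
U.S. DoD Aircraft Systems,” D. Swallom, 23rd ISSC, 2005.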
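Quantitative bounds on logarithmic axes turn category assignment into a simple lookup. The sketch below is a hedged illustration, not the matrix from the standard: it assumes that the scale marks on the slide are the lower edges of the categories, that frequency letters run from I (zero) to A, and that severity is measured as dollar loss. The function names are hypothetical, and no risk levels (High, Serious, and so on) are assigned to cells because the slide text does not give that mapping.

```python
from bisect import bisect_right

# Scale marks read from the slide; treating them as category edges is an assumption.
FREQUENCY_EDGES = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]  # mishaps per exposure interval
FREQUENCY_LABELS = ["H", "G", "F", "E", "D", "C", "B", "A"]    # lowest to highest
SEVERITY_EDGES = [2e3, 2e4, 2e5, 2e6, 2e7, 2e8, 2e9]          # loss in dollars
COLUMN_ORDER = "IHGFEDCBA"                                     # left to right


def frequency_category(mishaps_per_interval: float) -> str:
    """Return the frequency letter; exactly zero maps to I (designed out)."""
    if mishaps_per_interval < 0:
        raise ValueError("frequency must be non-negative")
    if mishaps_per_interval == 0:
        return "I"
    return FREQUENCY_LABELS[bisect_right(FREQUENCY_EDGES, mishaps_per_interval)]


def severity_level(loss_dollars: float) -> int:
    """Return severity level 1 to 7, or 0 when the loss is below the scale."""
    return bisect_right(SEVERITY_EDGES, loss_dollars)


def cell(mishaps_per_interval: float, loss_dollars: float) -> tuple[int, int]:
    """Cartesian (column, row): risk increases up and to the right."""
    column = COLUMN_ORDER.index(frequency_category(mishaps_per_interval))
    return column, severity_level(loss_dollars)


print(frequency_category(0.005), severity_level(2_000_000), cell(0.005, 2_000_000))
```

Because every category spans one decade on each axis, each cell covers the same span of risk, which addresses the disproportionate scaling noted for the traditional matrix.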
The Source-Mechanism-Outcome Model for Hazard Descriptions
• Previous definitions of “Hazard” did not always, or consistently, require enough information
• This model requires a hazard to be described in terms of its:
– SOURCE (the physical presence – situation, configuration, material, items, their
characteristics, proximity and/or potential for interface, energy, etc. – that exists
prior to, and enables, the initiation of a mishap sequence)
– MECHANISM (the complete sequence of events – actions, reactions, interactions,
etc. – from initiation of the mishap, through to stable end state)
– OUTCOME (the end result of the subject mishap sequence, specified in terms of
the harm that would come to an asset of value; if a range of outcome severities is
possible, it is understood that the outcome stated for the described hazard is that
which, when paired with the probability of its occurrence, yields the highest risk, or
probability-severity combination)
• Describing a hazard with this model prompts the analyst to identify ways in which:
– The SOURCE can be eliminated, isolated, or otherwise protected
– The MECHANISM can be interrupted if it should start
– The OUTCOME can be mitigated
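The OUTCOME definition above says that, when several outcome severities are possible, the stated outcome is the one whose probability-severity combination yields the highest risk. A minimal sketch of that selection follows, assuming risk is computed as probability multiplied by severity (one common convention, not confirmed by the presentation). The class name, function name, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateOutcome:
    description: str
    probability: float  # probability that this outcome occurs
    severity: float     # loss if it occurs, here in dollars

    @property
    def risk(self) -> float:
        # Assumption: risk is the product of probability and severity.
        return self.probability * self.severity


def worst_case_outcome(candidates):
    """Return the candidate with the highest probability-severity combination."""
    return max(candidates, key=lambda c: c.risk)


# Hypothetical range of outcomes for a single hazard.
candidates = [
    CandidateOutcome("minor injury", probability=1e-2, severity=20_000),
    CandidateOutcome("major injury", probability=1e-3, severity=500_000),
    CandidateOutcome("fatality", probability=1e-5, severity=2_000_000),
]

stated = worst_case_outcome(candidates)
print(stated.description, stated.risk)
```

With these made-up numbers the highest-risk outcome is neither the most likely nor the most severe one, which is the point of pairing each severity with its own probability.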
The Source-Mechanism-Outcome Model for Hazard Descriptions
(Continued)
[Figure: Source, Mechanism, Outcome]
The Source-Mechanism-Outcome Model for Hazard Descriptions
(Continued)
• A Practical Exercise
– Improve upon the following hazard descriptions by re-stating them in terms of a
SOURCE, MECHANISM, and OUTCOME (be creative and invent the context)
• Slippery spot on walkway
Pipe carrying oil in the space over narrow walkway (SOURCE) develops a leak; leaked oil accumulates on walkway; person using walkway slips on oil and falls (MECHANISM), sustaining a major injury (OUTCOME)
• Extremely hot surface in microgravity payload canister
External surface of furnace in payload canister reaches 800 °F during normal operation (SOURCE). Emergency abort from orbit necessitates reentry to atmosphere before surface of furnace can cool; flammable gases in payload bay enter canister and are ignited by hot surface, causing explosion (MECHANISM). Spacecraft disintegrates during descent, causing death of all occupants (OUTCOME).
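The exercise above can be mirrored in a small record type that refuses a hazard description unless all three parts are present. This is an illustrative sketch only: the class name, the validation rule, and the prompt wording are assumptions, while the example text is the walkway hazard from the exercise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HazardDescription:
    source: str     # what exists before, and enables, the mishap sequence
    mechanism: str  # sequence of events from initiation to stable end state
    outcome: str    # harm to an asset of value

    def __post_init__(self):
        # Reject incomplete descriptions such as "Slippery spot on walkway".
        for name in ("source", "mechanism", "outcome"):
            if not getattr(self, name).strip():
                raise ValueError(f"hazard description is missing its {name}")

    def control_prompts(self) -> list[str]:
        """Questions the model prompts the analyst to ask about controls."""
        return [
            f"Can the SOURCE be eliminated, isolated, or protected? ({self.source})",
            f"Can the MECHANISM be interrupted if it starts? ({self.mechanism})",
            f"Can the OUTCOME be mitigated? ({self.outcome})",
        ]


walkway = HazardDescription(
    source="Pipe carrying oil in the space over narrow walkway",
    mechanism=("Pipe develops a leak; leaked oil accumulates on walkway; "
               "person using walkway slips on oil and falls"),
    outcome="Person sustains a major injury",
)

for prompt in walkway.control_prompts():
    print(prompt)
```

Keeping the three parts as separate fields means each one can be paired with its own class of hazard control, as the model intends.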
Summation of Total Risk
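• Partial System Risks (r) Assessed Individually: r1, r2, r3, r4, …, rn, each compared against the Acceptable Level
• Total System Risk (R) Assessed as Σ (r1 + r2 + r3 + r4 + … + rn): how does the sum compare against the Acceptable Level?
[Figure: bar charts of individual risks and of their stacked sum, each with an Acceptable Level line]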
Summation of Total Risk
(Continued)
Total System Risk (R) is approximately the sum of the individual hazard risks (r), and is compared against the RISK TOLERANCE:
$R \approx \sum_{i=1}^{n} r_i = r_1 + r_2 + r_3 + \dots + r_n$
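The summation above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the individual risks are expressed in the same units and are simply additive, the same tolerance is applied to individual and total risk, and the function names and numbers are hypothetical.

```python
import math


def total_system_risk(hazard_risks):
    """R is approximated as the sum of the individual hazard risks r_i."""
    return math.fsum(hazard_risks)


def assess(hazard_risks, risk_tolerance):
    """Return (each hazard acceptable?, total risk R, total acceptable?)."""
    each_ok = all(r <= risk_tolerance for r in hazard_risks)
    total = total_system_risk(hazard_risks)
    return each_ok, total, total <= risk_tolerance


# Hypothetical risks, e.g. expected loss per exposure interval in common units.
hazard_risks = [0.002, 0.003, 0.004]
each_ok, total, total_ok = assess(hazard_risks, risk_tolerance=0.005)

print(f"every hazard individually acceptable: {each_ok}")
print(f"total system risk R = {total:.3f}, acceptable: {total_ok}")
```

In this example every hazard passes on its own while the total exceeds the tolerance, which is the situation that assessing hazards only individually would miss.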
Presentation Recap
– The work of the TechAmerica G-48 System Safety Committee in developing and
publishing ANSI/GEIA-STD-0010-2009
– How a discipline can be advanced by standardizing its best practices
– The 5 basic elements of an effective System Safety Program, as outlined in
ANSI/GEIA-STD-0010-2009
– Attributes of a modernized risk assessment matrix
– The Source-Mechanism-Outcome model for describing hazards, and how its use
helps in the identification of effective hazard controls
– The concept of Summation of Total Risk
QUESTIONS?