STPA: A New Technique for Hazard Analysis Based on STAMP

STPA
A new hazard analysis technique based on
the STAMP model of accident causation
Outline
• What is STPA
• STPA process
• Example: Robot
• Example: TCAS
• Comparison and Results
• In-class STPA
2
STAMP-Based Hazard Analysis (STPA)
• Basic premise is to prevent accidents by enforcing safety
constraints on system behavior (controlling hazardous
system states)
• Goals (same as any hazard analysis)
– Identification of system hazards and related safety constraints
necessary to ensure acceptable risk
• Design For Safety
– Accumulation of information about how hazards can occur.
– Use info to eliminate, mitigate and control hazards in system
design, development, manufacturing, and operations
3
Controlling States
• Since hazardous states can be prevented through
appropriate control (enforcing safety constraints), this
hazard analysis method seeks to find instances of
Inadequate Control
• Inadequate control occurs when there are state
transitions to hazardous states
• The commands, decisions, or actions that lead to
violation of safety constraints:
• “Inadequate Control Actions”
4
Inadequate Control Actions
Identify inadequate control actions
1. A required control action is not provided or not
followed
2. An incorrect or unsafe control action is provided
3. A potentially correct control action is provided too
late or too early (at the wrong time)
4. A correct control action is stopped too soon.
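The four categories above can be applied as a checklist to every control action in the system. The following is a minimal Python sketch of that idea; the `ICAType` enum, the `candidate_icas` helper, and the prompt wording are assumptions made for illustration, not part of the slides. It only generates prompts, and the analyst still decides which of them are actually hazardous.

```python
from enum import Enum


class ICAType(Enum):
    """The four general types of inadequate control action listed above."""
    NOT_PROVIDED = "is not provided or not followed"
    UNSAFE_PROVIDED = "is provided incorrectly or unsafely"
    WRONG_TIMING = "is provided too late or too early"
    STOPPED_TOO_SOON = "is stopped too soon"


def candidate_icas(controller: str, action: str) -> list[str]:
    """Return one prompt per ICA type for a controller and control action.

    Hypothetical helper: the wording is an assumption, and each prompt is
    a question for the analyst, not a finding.
    """
    return [f"{controller}: '{action}' {t.value}" for t in ICAType]


if __name__ == "__main__":
    for prompt in candidate_icas("Leg controller", "extend stabilizer legs"):
        print(prompt)
```

Keeping the types in one enum means every control action is examined against the same four questions, which is the point of the list.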
5
Control Flaw Taxonomy
• Design of the control algorithm does not enforce
constraints
– Flaw(s) in creation process
– Process changes without appropriate change in control
algorithm (asynchronous evolution)
– Incorrect modification or adaptation
• Process models are inconsistent, incomplete, or incorrect
– Flaw(s) in creation process
– Flaw(s) in updating process
– Inadequate or missing feedback
  • Not provided in system design
  • Communication flaw
  • Time lag
  • Inadequate sensor operation
6
Control Flaw Taxonomy (cont)
• Time lags and measurement inaccuracies not accounted
for
• Expected process inputs are wrong or missing
• Expected control inputs are wrong or missing
• Disturbance model is wrong
– Amplitude, frequency, or period is out of range
– Unidentified disturbance
• Inadequate coordination among controllers and decision
makers
7
Inadequate Control Execution
Inadequate Execution of Control Actions
• Communication flaw
• Inadequate actuator operation
• Time lag
8
© Copyright Nancy Leveson, Aug. 2006
STPA: A New Hazard Analysis Technique
Based on STAMP
[Figure: generic control loop annotated with possible flaws. Controller (control input wrong or missing; inadequate control algorithm; process model wrong) issues inadequate control commands to Actuator(s) (inadequate actuator operation), which act on the Controlled Process (failure; process input wrong or missing; disturbances unidentified or out of range; process output wrong or missing). Sensor(s) (inadequate sensor operation) return feedback (wrong or missing) to the Controller.]
9
How to Perform STPA
1. High-level Hazard Analysis:
• Identify Accidents
• Hazards
• High-level Safety Constraints
2. Identify Inadequate Control Actions
• Control structure
3. Control Flaws
• In the design
4. Change design to eliminate, mitigate, or control
potentially unsafe control actions and behaviors.
• Or accept
5. Iterate
10
Identifying and Specifying Safety
Constraints
• Most requirements only specify nominal behavior
– Need to specify off-nominal behavior
– Need to specify what system and software must NOT do
• What the system must not do is not simply the inverse of what it must do
• Derive from system hazard analysis
11
Example: Mobile Robot
12
Thermal Tile Robot Example
1. Identify high-level functional requirements and
environmental constraints.
e.g. size of physical space, crowded area
2. Identify high-level hazards
a. Violation of minimum separation between mobile base and
objects (including orbiter and humans)
b. Mobile robot becomes unstable (e.g., could fall over)
c. Manipulator arm hits something
d. Fire or explosion
e. Contact of human with DMES
f. Inadequate thermal control (e.g., damaged tiles not detected,
DMES not applied correctly)
g. Damage to robot
13
Thermal Tile Robot Example (2)
3. Restate hazards as high-level safety constraints
e.g. Robot must not allow humans to come in contact with DMES
4. Try to eliminate from system design
5. If cannot be eliminated or adequately controlled at
system design level, will need to refine and allocate
them to system components.
14
Design Constraints are Refined and Traced to
Components
1.4.2.1 Mobile Base (MB):
Requirements:
MB-FR1: The mobile base shall be able to carry all the mobile robot
subsystem components [2.6.3(73)]
MB-FR2: The mobile base shall be able to move smoothly in any
direction and to cross cable covers on the floor [EA.3(15), H3(38),
2.6.2(73)]
MB-FR3: The mobile base shall be able to raise its inspection and
injection equipment to the level required for servicing the tiles, from
2.9 meters to 4 meters [EA.2(15), 2.6.3(73), 2.10.1(81), 2.10.4(81)]
15
Design Constraints are Refined and Traced to
Components (2)
Design Constraints:
MB-C1: The mobile base must be no more than 2.5 meters long and
1 meter wide. While moving, it must fit under structural beams as
low as 1.75 meters [EA.2(15), 4.6)]
Safety-Related Design Constraints
MB-SC1: The mobile base must be able to ensure accuracy of 10 cm
for positioning and 1 mm for tile servicing (inspection and injection
tasks) [EA.2(15), H4(38), 2.6.1(73), 2.6.4(73)]
MB-SC2: The mobile base design must protect against fire and
explosion [H6(39), 2.6.5(73), 2.6.6(73)]
MB-SC3: It must be possible to move the mobile base out of the way
in case of an emergency [2.9.2(79)]
16
Design Constraints are Refined and Traced to
Components (3)
Motor Controller:
2.9.2 The drivetrains for locomotion are within the diameter of the
wheel hub and consist of a brushless DC motor, resolver for
positioning and commutation, a brake, a cycloidal reducer providing
225:1 gear reduction with exceptional stiffness, and a locking hub
that couples the output of the reducer to the wheel. The locking hub
allows the operator to disengage the wheels from the drivetrain
completely [MB-SC3(20)]
Rationale: In an emergency, the ability to disengage the
wheels will allow towing or pushing the machine out of the
way.
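The bracketed references above form two-way trace links: constraint MB-SC3 cites design decision 2.9.2, and 2.9.2 cites MB-SC3. A minimal Python sketch of how such links might be checked for missing back-references follows. The dictionary layout and the `missing_back_links` helper are assumptions for illustration, and the page numbers in parentheses are dropped.

```python
# Trace links copied from the slides: each item lists the items it cites.
# Page numbers in parentheses, such as "(79)", are omitted for the lookup.
TRACE_LINKS = {
    "MB-SC3": ["2.9.2"],   # constraint -> design decision that implements it
    "2.9.2": ["MB-SC3"],   # design decision -> constraint it satisfies
    "MB-SC2": ["H6", "2.6.5", "2.6.6"],
}


def missing_back_links(links: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where target is documented in `links`
    but does not cite source in return.

    Targets that are not documented here (e.g. "H6") are skipped, since
    their own references are unknown.
    """
    gaps = []
    for source, targets in links.items():
        for target in targets:
            if target in links and source not in links[target]:
                gaps.append((source, target))
    return gaps


if __name__ == "__main__":
    print(missing_back_links(TRACE_LINKS))
```

A check of this kind only confirms that the links exist in both directions; whether the design decision actually satisfies the constraint remains a matter for the rationale.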
17
Define preliminary control structure and refine constraints
and design in parallel.
18
Refinement and Allocation
• After defining initial control structure, refine constraints and
design in parallel.
– Identify potentially hazardous control actions by each of system
components that would violate system design constraints. Restate as
component safety design requirements and constraints.
– Perform hazard analysis using STPA to identify how safety-related
requirements and constraints could be violated (the potential causes
of inadequate control and enforcement of safety-related constraints).
– Augment the basic design to eliminate, mitigate, or control potential
unsafe control actions and behaviors.
– Iterate over the process, i.e. perform STPA on the new augmented
design and continue to refine the design until all hazardous
scenarios are eliminated, mitigated, or controlled.
• Document design rationale and trace requirements and
constraints to the related design decisions.
19
Try to eliminate hazards from system conceptual
design. If not possible, then identify controls and
new design constraints.
For unstable base hazard
System Safety Constraint:
Mobile base must not be capable of falling over under
worst case operational conditions
20
First try to eliminate:
1. Make base heavy
– Could increase damage if hits someone or something.
– Difficult to move out of way manually in emergency
2. Make base long and wide
– Eliminates hazard but violates environmental constraints
3. Use lateral stability legs that are deployed when manipulator arm
extended but must be retracted when mobile base moves.
Two new design constraints:
• Manipulator arm must move only when stabilizer legs are fully
deployed
• Stabilizer legs must not be retracted until manipulator arm is fully
stowed.
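The two design constraints on the stabilizer legs can be read as interlock conditions. A minimal Python sketch follows, assuming a hypothetical `RobotState` with two boolean fields. The names are illustrative and do not come from the robot's actual design.

```python
from dataclasses import dataclass


@dataclass
class RobotState:
    """Hypothetical snapshot of the two facts the constraints depend on."""
    legs_fully_deployed: bool
    arm_fully_stowed: bool


def arm_move_allowed(state: RobotState) -> bool:
    """Constraint 1: the arm may move only when the legs are fully deployed."""
    return state.legs_fully_deployed


def leg_retract_allowed(state: RobotState) -> bool:
    """Constraint 2: the legs may retract only once the arm is fully stowed."""
    return state.arm_fully_stowed


if __name__ == "__main__":
    servicing = RobotState(legs_fully_deployed=True, arm_fully_stowed=False)
    print(arm_move_allowed(servicing))     # arm may move
    print(leg_retract_allowed(servicing))  # legs must stay down
```

The sketch assumes the state values are true; how the controller comes to know them is a separate question.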
21
Identify potentially hazardous control actions by
each of system components
1. A required control action is not provided or not followed
2. An incorrect or unsafe control action is provided
3. A potentially correct or inadequate control action is provided too late or
too early (at the wrong time)
4. A correct control action is stopped too soon.
Hazardous control of stabilizer legs:
• Legs not deployed before arm movement enabled
• Legs retracted when manipulator arm extended
• Legs retracted after arm movements are enabled or retracted
before manipulator arm fully stowed
• Leg extension stopped before they are fully extended
22
Restate as safety design constraints on components
1. Controller must ensure stabilizer legs are extended
whenever arm movement is enabled
2. Controller must not command a retraction of stabilizer
legs when manipulator arm extended
3. Controller must not command retraction of stabilizer
legs after arm movements are enabled. Controller
must not command retraction of legs before
manipulator arm fully stowed
4. Controller must not stop leg deployment before they
are fully extended
23
Do same for all hazardous commands:
e.g., Arm controller must not enable manipulator arm
movement before stabilizer legs are completely
extended.
At this point, may decide to have arm controller and leg
controller in same component
24
To produce detailed scenarios for violation of
safety constraints, augment control structure with
process models
Process model variables and their possible values:
Arm Movement: Enabled, Disabled, Unknown
Stabilizer Legs: Extended, Retracted, Unknown
Manipulator Arm: Stowed, Extended, Unknown
How could become inconsistent with real state?
e.g. issue command to extend stabilizer legs but external
object could block extension or extension motor could
fail
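One way to keep the process model from drifting away from the real state is to update it only from measured feedback, never from the fact that a command was issued. A minimal Python sketch of that idea for the stabilizer legs follows. The class, the method names, and the two limit-switch inputs are assumptions for illustration, not the robot's actual interface.

```python
from enum import Enum


class LegState(Enum):
    EXTENDED = "extended"
    RETRACTED = "retracted"
    UNKNOWN = "unknown"


class LegProcessModel:
    """Controller's belief about the stabilizer legs (hypothetical sketch)."""

    def __init__(self) -> None:
        # Nothing has been measured yet, so the belief starts as UNKNOWN.
        self.believed = LegState.UNKNOWN
        self.last_command = None

    def command_extend(self) -> None:
        # Issuing a command records intent only. The belief is unchanged,
        # because an object may block the legs or the motor may fail.
        self.last_command = "extend"

    def on_sensor_feedback(self, fully_extended: bool, fully_retracted: bool) -> None:
        # Update the belief only from measured feedback.
        if fully_extended and not fully_retracted:
            self.believed = LegState.EXTENDED
        elif fully_retracted and not fully_extended:
            self.believed = LegState.RETRACTED
        else:
            # Mid-travel or contradictory readings: do not guess.
            self.believed = LegState.UNKNOWN

    def arm_enable_permitted(self) -> bool:
        # Arm movement is allowed only on a confirmed EXTENDED belief.
        return self.believed is LegState.EXTENDED
```

With this design, a blocked extension leaves the belief at UNKNOWN and the arm stays disabled, at the cost of depending on the sensors being correct.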
25
Problems often in startup or shutdown:
e.g., Emergency shutdown while servicing tiles. Stabilizer legs manually
retracted to move robot out of way. On restart, controller assumes stabilizer
legs are still extended, so arm movement could be commanded. So use
“unknown” state when starting up.
Do not need to know all causes, only safety constraints:
- May decide to turn off arm motors when legs extended or when arm
extended. Could use interlock or tell computer to power it off.
- Must not move when legs extended? Power down wheel motors
while legs extended.
Coordination problems
26
Example: TCAS
27
Step 1: Identify hazards and translate into high-level requirements and constraints on behavior
TCAS Hazards:
1. A near mid-air collision (NMAC): two controlled aircraft
violate minimum separation standards
2. A controlled maneuver into ground
3. Loss of control of aircraft
4. Interference with other safety-related aircraft systems
5. Interference with the ground-based ATC system
6. Interference with ATC safety-related advisory
System Safety Design Constraints:
– TCAS must not cause or contribute to an NMAC
– TCAS must not cause or contribute to a controlled
maneuver into the ground
– …
28
Step 2: Define basic control structure
29
Component Responsibilities
TCAS:
• Receive and update information about its own and other aircraft
• Analyze information received and provide pilot with
– Information about where other aircraft in the vicinity are
located
– An escape maneuver to avoid potential NMAC threats
Pilot
• Maintain separation between own and other aircraft using visual
scanning
• Monitor TCAS displays and implement TCAS escape maneuvers
• Follow ATC advisories
Air Traffic Controller
• Maintain separation between aircraft in controlled airspace by
providing advisories (control action) for pilot to follow
30
Aircraft components (e.g., transponders, antennas)
• Execute control maneuvers
• Receive and send messages to/from aircraft
• Etc.
Airline Operations Management
• Provide procedures for using TCAS and following TCAS advisories
• Train pilots
• Audit pilot performance
Air Traffic Control Operations Management
• Provide procedures
• Train controllers
• Audit performance of controllers
• Audit performance of overall collision avoidance system
31
For the NMAC hazard:
TCAS:
1. The aircraft are on a near collision course and TCAS does not
provide an RA
2. The aircraft are in close proximity and TCAS provides an RA that
degrades vertical separation.
3. The aircraft are on a near collision course and TCAS provides an
RA too late to avoid an NMAC
4. TCAS removes an RA too soon
Pilot:
1. The pilot does not follow the resolution advisory provided by TCAS
(does not respond to the RA)
2. The pilot incorrectly executes the TCAS resolution advisory.
3. The pilot applies the RA but too late to avoid the NMAC
4. The pilot stops the RA maneuver too soon.
32
Step 3b: Use identified inadequate control
actions to refine system safety design
constraints
• When two aircraft are on a collision course, TCAS must always
provide an RA to avoid the collision
• TCAS must not provide RAs that degrade vertical separation
• …
• The pilot must always follow the RA provided by TCAS
• …
33
Step 4: Determine how potentially hazardous
control actions could occur (scenarios of how
constraints can be violated). Eliminate from design
or control in design or operations.
Step 4a: Augment control structure with process models for each control
component.
Step 4b: For each inadequate control action, examine the parts of the
control loop to see if they could cause it.
Guided by a set of generic control loop flaws
Step 4c: Design controls and mitigation measures
Step 4d: Consider how designed controls could degrade over time.
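Step 4b amounts to crossing each inadequate control action with the generic control loop flaws. A minimal Python sketch of that cross-check follows. The flaw names are paraphrased from the taxonomy slides earlier in this deck, while the grouping by loop element and the `step_4b_checklist` helper are assumptions for illustration.

```python
# Generic flaws per control-loop element, paraphrased from the taxonomy
# slides earlier in this deck. The grouping by element is an assumption.
LOOP_FLAWS = {
    "controller": [
        "inadequate control algorithm",
        "process model wrong",
        "control input wrong or missing",
    ],
    "actuator": [
        "inadequate actuator operation",
        "communication flaw",
        "time lag",
    ],
    "controlled process": [
        "process input wrong or missing",
        "disturbance unidentified or out of range",
    ],
    "sensor": [
        "inadequate sensor operation",
        "feedback wrong or missing",
        "time lag",
    ],
}


def step_4b_checklist(icas: list[str]) -> list[tuple[str, str, str]]:
    """Pair every inadequate control action with every generic loop flaw.

    Each (ica, element, flaw) row is a question for the analyst:
    could this flaw in this element cause this inadequate control action?
    """
    return [
        (ica, element, flaw)
        for ica in icas
        for element, flaws in LOOP_FLAWS.items()
        for flaw in flaws
    ]


if __name__ == "__main__":
    for row in step_4b_checklist(["TCAS does not provide an RA"]):
        print(row)
```

The output is deliberately exhaustive; most rows will be dismissed quickly, and the ones that remain become candidate scenarios for Step 4c.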
34
35
TCAS does not provide an RA when required to avoid an NMAC
- Unit is not operational
-- Pilot does not turn it on
-- Self-monitor turns off TCAS unit
-- Component failure
- TCAS does not perceive a conflict
-- Current location of aircraft is incorrect
--- TCAS thinks other aircraft is on the ground
--- Incorrect altitude provided to TCAS
--- Uneven terrain
--- TCAS puts other aircraft outside protected volume
-- Location of own aircraft incorrect
--- Altimeter error
--- Delay in receipt of information about altitude change
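A cause breakdown like the one above can be held as a tree, with each root-to-leaf path read as one candidate scenario. A minimal Python sketch follows. The nesting is an assumed reading of the slide, whose indentation did not survive extraction, and the `scenarios` helper is hypothetical.

```python
# Each key is a cause; each value holds its more specific causes.
# The nesting is an assumed reading of the flattened slide.
CAUSES = {
    "Unit is not operational": {
        "Pilot does not turn it on": {},
        "Self-monitor turns off TCAS unit": {},
        "Component failure": {},
    },
    "TCAS does not perceive a conflict": {
        "Current location of aircraft is incorrect": {
            "TCAS thinks other aircraft is on the ground": {},
            "Incorrect altitude provided to TCAS": {},
            "Uneven terrain": {},
            "TCAS puts other aircraft outside protected volume": {},
        },
        "Location of own aircraft incorrect": {
            "Altimeter error": {},
            "Delay in receipt of information about altitude change": {},
        },
    },
}


def scenarios(tree: dict, path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return every root-to-leaf path through the cause tree."""
    paths = []
    for cause, children in tree.items():
        here = path + (cause,)
        if children:
            paths.extend(scenarios(children, here))
        else:
            paths.append(here)
    return paths


if __name__ == "__main__":
    for s in scenarios(CAUSES):
        print(" -> ".join(s))
```

Listing the paths this way makes it easier to check that each scenario has a control or mitigation assigned to it.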
36
Comparison with Traditional HA
Techniques
• Top-down (vs. bottom-up like FMECA)
• Considers more than just component failure and failure events
(includes these but more general)
• Guidance in doing analysis (vs. FTA)
• Handles dysfunctional interactions and system accidents,
software, management, etc.
37
Comparisons (2)
• Concrete model (not just in head)
– Not physical structure (HAZOP) but control (functional)
structure
– General model of inadequate control (based on control
theory)
• HAZOP guidewords based on model of accidents being
caused by deviations in system variables
• Includes HAZOP model but more general
• Compared with TCAS II Fault Tree (MITRE)
STPA results more comprehensive
Included Ueberlingen accident
38
Ballistic Missile Defense System (BMDS)
Non-Advocate Safety Assessment using STPA
• A layered defense to defeat all ranges of threats in all
phases of flight (boost, mid-course, and terminal)
• Made up of many existing systems (BMDS Element)
– Early warning radars
– Aegis
– Ground-Based Midcourse Defense (GMD)
– Command and Control Battle Management and
Communications (C2BMC)
– Others
• MDA used STPA to evaluate the residual safety risk of
inadvertent launch prior to deployment and test
39
Results
• Deployment and testing held up for 6 months because so
many scenarios identified for inadvertent launch. In many of
these scenarios:
– All components were operating exactly as intended
– Complexity of component interactions led to unanticipated
system behavior
• STPA also identified component failures that could cause
inadequate control (most analysis techniques consider only
these failure events)
• As changes are made to the system, the differences are
assessed by updating the control structure diagrams and
assessment analysis templates.
• Adopted as primary safety approach for BMDS
40
In-Class STPA
Subway Train Doors
Train Doors
• What are the system goal(s)?
• What are the accidents?
• What are the hazards?
• Translate the hazards into safety constraints
42
43
What to do for Train Doors Exercise
• To do in your groups or individually, your choice.
• See slide 10, “How to Perform STPA”, and follow it to do an STPA hazard
analysis on an existing train door design you are familiar with. (Feel
free to add design changes that are interesting to you.)
• Be sure to include:
– Control structure and control loops,
– process models (the controller’s model of what the
system/process is doing),
– The expected control inputs, measurements, sensors, etc. that
you find to be relevant as you are going through the STPA
process.
– Inadequate control actions (slide 5)
– Control flaws and inadequate control executions. (slides 6-9)
44
What to do for Train Doors Exercise (2)
Once you’ve found the inadequate control actions and
related control flaws and inadequate control executions,
identify
– new safety constraints on the system and
– new design decisions to enforce the safety constraints
(and prevent inadequate control)
• We’ll talk about the Train Doors STPA in class next
week. No need to turn in any papers, but bring what
you’ve done so we can go over it as a group.
• Feel free to contact me with questions. Maggie
Stringfellow: [email protected]
45