Future Trends in Process Safety

Prof. Nancy Leveson
Engineering Systems
Aeronautics and Astronautics
MIT
– You’ve carefully thought out all the angles
– You’ve done it a thousand times
– It comes naturally to you
– You know what you’re doing, it’s what you’ve
been trained to do your whole life.
– Nothing could possibly go wrong, right?
Think Again
Topics
• Lessons from Texas City
• New factors in process accidents
• Safety as a control problem
• Conclusions
Leadership
• Safety requires passionate and effective leadership
• Tone is set at the top of the organization
• Not just sloganeering but real commitment
• Setting priorities
– Adequate resources assigned
– A designated, high-ranking leader
• Safety and productivity do not conflict if you take a
long-term view
Managing and Controlling Safety
• Need clear definition of expectations, responsibilities,
authority, and accountability at all levels of safety control
structure
• Entire control structure must together enforce the system
safety property
• Unsafe changes must be eliminated or controlled
through system design or detected and fixed before they
lead to an accident.
– Planned changes (MOC process)
– Unplanned changes
Visibility and Communication
• Downward and upward communication
– Requires a positive, open, trusting environment
– Need effective measurement and monitoring of
process safety performance (e.g., injury rates are not
useful and are misleading)
• Avoid “culture of denial”
• If managers do not want to hear, people stop talking
Information and Appropriate Feedback
• Good accident/incident investigation and
follow-through
– Identification and correction of systemic causal
factors.
– Ensuring thorough reporting of incidents and near
misses
• Thorough hazard identification, analysis, and control
• Effective process safety audit system to ensure
adequate process safety performance
Oversight and Control
• Results of operating experience, process hazard
analyses, audits, near misses, or accident
investigations must be used to improve process
operations and process safety management system.
• Promptly address, and track to completion, the
deficiencies found during assessments, audits,
inspections, and incident investigations.
Fumbling for his recline button Ted
unwittingly instigates a disaster
Process Safety vs. Personal Safety
• All behavior influenced by context in which it occurs
– Both physical and social context
– Personal safety focuses on changing individual
behavior
– Process (system) safety focuses on design of system
in which behavior occurs
• To understand why process accidents occur and to
prevent them, need to:
– Understand current context (system design)
– Create a design that effectively ensures safety
The Enemies of Safety
• Complacency
• Arrogance
• Ignorance
Factors in Complacency
• Discounting risk
• Over-relying on redundancy
• Unrealistic risk assessment
• Ignoring low-probability, high-consequence events
• Assuming risk decreases over time
• Ignoring warning signs
Topics
• Lessons from Texas City
• New factors in process accidents
– New technology
– System accidents
– New types of human error
• Safety as a control problem
• Conclusions
Accident with No Component Failures
Types of Accidents
• Component Failure Accidents
– Single or multiple component failures
– Usually assume random failure
• System Accidents
– Arise in interactions among components
– Related to interactive complexity and tight coupling
– Exacerbated by introduction of computers and
software
Safety vs. Reliability
• Safety and reliability are NOT the same
– Sometimes increasing one can even decrease the
other.
– Making all the components highly reliable will have no
impact on system accidents.
• For relatively simple, electro-mechanical systems
with primarily component failure accidents, reliability
engineering can increase safety
• For complex systems, need something more
Humans in Process Safety
• Usually define human error as deviation from normative
procedures, but operators always deviate from standard
procedures
– Normative vs. effective procedures
– Sometimes violation of rules has prevented accidents
• Cannot effectively model human behavior by
decomposing it into individual decisions and acts and
studying it in isolation from
– Physical and social context
– Value system in which it takes place
– Dynamic work process
• Less successful actions are natural part of search by
operators for optimal performance
New Operator Roles and Errors
• High tech automation changing cognitive demands on
operators
– Supervising rather than directly monitoring
– Doing more cognitively complex decision-making
– Dealing with complex, mode-rich systems
– Increasing need for cooperation and communication
• Human-factors experts complaining about technology-centered automation
– Designers focus on technical issues, not on supporting
operator tasks
– Leads to “clumsy” automation
• Errors are changing, e.g., errors of omission vs.
commission
Impacts on System Design
• Design for error tolerance
• Alarm management (managing by exception)
• Matching tasks to human characteristics
• Design to reduce human errors
• Providing information and feedback
• Training and maintaining skills
Topics
• Lessons from Texas City
• New factors in process accidents
• Safety as a control problem
– New approaches to hazard analysis
– Design for safety
– Risk analysis and management
• Conclusions
STAMP: A Systems Model of
Accident Causality
• Systems-Theoretic Accident Model and Processes
– Safety treated as a control problem, not a “failure”
problem
– Accidents are not simply an event or chain of
events
• Involve a complex, dynamic process
• Arise from interactions among humans,
machines and the environment
A Broad View of “Control”
• Does not imply need for a “controller”
– Component failures and dysfunctional interactions may
be “controlled” through design (e.g., redundancy,
interlocks, fail-safe design) or through process
• Manufacturing processes and procedures
• Maintenance processes
• Operations
• Does imply the need to enforce safety constraints in
some way
STAMP (2)
• Safety is an emergent property that arises when system
components interact with each other within a larger
environment
– A set of safety constraints related to behavior of
system components enforces that property
– Accidents occur when interactions among system
components violate those constraints
– Goal of process (system) safety engineering is to
identify the safety constraints and enforce them in the
system design
Example Safety Constraints
• Build safety in by enforcing constraints on behavior
Controller contributes to accidents not by “failing” but by:
1. Not enforcing safety-related constraints on behavior
2. Commanding behavior that violates safety constraints
System Safety Constraint:
Water must be flowing into reflux condenser whenever catalyst is
added to reactor
Software (Controller) Safety Constraint:
Software must always open water valve before catalyst valve
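The software constraint above can be sketched as a small interlock. This is only an illustration under stated assumptions, not the design of any real plant controller: the class `ReactorController`, its method names, and the `water_flow_confirmed` feedback flag are hypothetical. The check on closing the water valve is an assumption that follows from the system-level constraint (water must be flowing whenever catalyst is added).

```python
class SafetyConstraintViolation(Exception):
    """Raised when a command would violate a safety constraint."""


class ReactorController:
    """Hypothetical batch-reactor controller (illustrative names only)."""

    def __init__(self):
        self.water_valve_open = False
        self.catalyst_valve_open = False

    def open_water_valve(self):
        self.water_valve_open = True

    def close_water_valve(self):
        # Assumed corollary of the system constraint: do not remove
        # cooling water while catalyst is still being added.
        if self.catalyst_valve_open:
            raise SafetyConstraintViolation(
                "close catalyst valve before closing water valve")
        self.water_valve_open = False

    def open_catalyst_valve(self, water_flow_confirmed):
        # Enforce the constraint on the command itself: water must be
        # commanded open AND confirmed flowing (feedback) before catalyst.
        if not (self.water_valve_open and water_flow_confirmed):
            raise SafetyConstraintViolation(
                "water must be flowing before catalyst is added")
        self.catalyst_valve_open = True

    def close_catalyst_valve(self):
        self.catalyst_valve_open = False
```

Note that the controller here does not "fail" in the reliability sense. It either enforces the constraint or it issues a command that violates it, which is the distinction the slide draws.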
STAMP (3)
• Systems are not static
– A socio-technical system is a dynamic process
continually adapting to achieve its ends and to react
to changes in itself and its environment
– Systems and organizations migrate toward accidents
(states of high risk) under cost and productivity
pressures in an aggressive, competitive environment
– Preventing accidents requires designing a control
structure to enforce constraints on system behavior
and adaptation that ensures safety
Example Control Structure
Controlling and managing dynamic
systems requires visibility and feedback
[Figure: control loop with labels Controller (containing Model of Process), Control Actions, Feedback, and Controlled Process.]
Relationship Between Safety and
Process Models
• Accidents occur when models do not match process
and
– Incorrect control commands given
– Correct ones not given
– Correct commands given at wrong time (too early, too
late)
– Control stops too soon
(Note the relationship to system accidents)
Relationship Between Safety and
Process Models (2)
• How do they become inconsistent?
– Wrong from beginning
– Missing or incorrect feedback
– Not updated correctly
– Time lags not accounted for
Resulting in
– Uncontrolled disturbances
– Unhandled process states
– Inadvertently commanding system into a hazardous state
– Unhandled or incorrectly handled system component
failures
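One way a process model becomes inconsistent, missing feedback, can be sketched with a toy simulation. Everything here is hypothetical and chosen only for illustration: the `TankProcess` and `Controller` classes, the fill rate of 1.0 per step, the target level of 5.0, and the capacity of 10.0.

```python
class TankProcess:
    """Hypothetical controlled process: a tank with a fill valve."""

    def __init__(self, level=0.0, capacity=10.0):
        self.level = level
        self.capacity = capacity

    def step(self, valve_open, inflow=1.0):
        if valve_open:
            self.level += inflow
        return self.level  # what a level sensor would report


class Controller:
    """Chooses control actions from its model, not from the real process."""

    def __init__(self, target=5.0):
        self.target = target
        self.model_level = 0.0  # the controller's model of the process

    def decide(self):
        # Keep the valve open while the MODEL says the tank is below target.
        return self.model_level < self.target

    def update_model(self, feedback):
        # feedback is None when the sensor reading never arrives.
        if feedback is not None:
            self.model_level = feedback


def run(steps, feedback_works):
    """Return (actual level, modeled level) after the given number of steps."""
    process, controller = TankProcess(), Controller()
    for _ in range(steps):
        valve_open = controller.decide()
        measured = process.step(valve_open)
        controller.update_model(measured if feedback_works else None)
    return process.level, controller.model_level
```

With feedback, `run(12, True)` settles at the target and the model matches the process. Without it, `run(12, False)` leaves the model at its initial value, so the controller keeps issuing a command that looks correct against its model while the actual level passes the assumed capacity. No component has failed in the sense of breaking; the hazard comes from the mismatch.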
Modeling Accidents Using STAMP
Two types of models are used:
1. Static safety control structure
2. Behavioral dynamics (system dynamics)
Dynamic processes behind change in the safety
control structure, i.e., why it may change (e.g.,
degrade) over time
Simplified System Dynamics Model of Columbia Accident
Uses for STAMP
• Basis for new, more powerful hazard analysis techniques
(STPA)
• Safety-driven design
• More comprehensive accident/incident investigation and root
cause analysis
• Organizational and cultural risk analysis
– Defining safety metrics and performance audits
– Designing and evaluating potential policy and structural improvements
– Identifying leading indicators of increasing risk (“canary in the coal mine”)
• New risk management tools
• New holistic approaches to security
STAMP-Based Hazard Analysis (STPA)
• Supports a safety-driven design process where
– Hazard analysis influences and shapes early design
decisions
– Hazard analysis iterated and refined as design evolves
• Goals (same as any hazard analysis)
– Identification of system hazards and related safety
constraints necessary to ensure acceptable risk
– Accumulation of information about how those constraints
can be violated, which is used to eliminate, reduce and control
hazards in system design, development, manufacturing,
and operations
STPA (2)
• STPA process
– Starts with identifying system requirements and
design constraints necessary to maintain safety.
– Then STPA assists in
• Top-down refinement into requirements and safety
constraints on individual components.
• Identifying scenarios in which safety constraints can be
violated.
• Using results to eliminate or control hazards in design,
operations, etc.
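The scenario-identification step can be supported by a simple checklist generator. The sketch below is not the STPA method itself, only an assumed aid: it crosses each control action with the four forms of inadequate control listed on the process-model slide. The function name `candidate_scenarios` and the example inputs are hypothetical.

```python
from itertools import product

# The four forms of inadequate control named on the process-model slide.
INADEQUATE_CONTROL_TYPES = (
    "incorrect control command given",
    "correct command not given",
    "correct command given at wrong time (too early, too late)",
    "control stops too soon",
)


def candidate_scenarios(control_actions, hazard):
    """Return one prompt per (control action, inadequate-control type) pair.

    The output is a list of questions for analysts to assess; deciding
    which combinations are actually hazardous remains a human judgment.
    """
    return [
        f"{action}: {kind} -> can this lead to {hazard}?"
        for action, kind in product(control_actions, INADEQUATE_CONTROL_TYPES)
    ]


if __name__ == "__main__":
    prompts = candidate_scenarios(
        ["open water valve", "open catalyst valve"],
        "catalyst added without water flowing into the reflux condenser",
    )
    for line in prompts:
        print(line)
```

For the two valve commands from the reactor example this yields eight prompts, one of which ("open water valve: correct command not given") corresponds directly to the software constraint stated there.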
© Copyright Nancy Leveson, Aug. 2006
Comparison of STPA with Traditional
HA Techniques
• Top-down (vs bottom-up like FMECA)
• Considers more than just component failure and
failure events (includes these but more general)
• Guidance in doing analysis (vs. FTA)
• Handles dysfunctional interactions and system
accidents, software, management, etc.
Comparisons (2)
• Concrete model (not just in head)
– Not physical structure (HAZOP) but control (functional)
structure
– General model of inadequate control (based on control
theory)
• HAZOP guidewords based on model of accidents being
caused by deviations in system variables
• Includes HAZOP model but more general
• Fault trees concentrate on component failures, miss
system accidents
Risk Analysis and Risk Management
[Figure: two plots against time, "Effectiveness and Credibility of ITA" and "System Technical Risk".]
Identifying Lagging vs. Leading Indicators
Number of waivers issued good indicator for risk in Space Shuttle
operations but lags rapid increase in risk
[Figure: "System Technical Risk" (risk units, 0 to 1) and "Outstanding Accumulated Waivers" (incidents, 0 to 400) plotted against time in months (0 to 1000), run "Unsuccessful ITA 0613".]
No. of incidents under investigation a better leading indicator
Managing Tradeoffs Among Risks
• Good risk management requires understanding
tradeoffs among
– Schedule
– Cost
– Performance
– Safety
Example: Schedule Pressure and Safety Priority
[Figure: surface plots over schedule pressure (low to high) and safety priority (low to high); one panel is labeled "Relative Cost".]
Overly aggressive schedule enforcement has little effect on
completion time (<2%) & cost, but has a large negative impact
on safety
Priority of safety activities has a large positive impact, including
a positive cost impact (less rework)
Conclusions
• Future needs for safety in the process industry:
– Differentiation between process safety and personal
(occupational) safety
– Improved safety culture management
– New approaches to handle
• Advanced technology (particularly digital technology)
• System accidents and complexity
• New types of human error
• Using a control-based (vs. failure-based) model of
causality expands our power to prevent process
accidents