Risk - Flight Safety Foundation

Risk management tools
Patrick Hudson
Tim Hudson
Hudson Global Consulting
How can we manage risk?
• We can manage risk by hoping it won’t happen
• We can manage risk by offering sacrifices to the Gods
• We can manage risk by understanding what we are doing
• The first two don’t work
• The third is what a Safety Management System does
Risk
• Risk is a complex concept
• Combination of two different components
– RISK = Outcome x Probability of that outcome
• Outcomes – what could happen
– Usually seen as a scenario
– Worst case – conservative
– Most credible worst case
• Probability of those outcomes
– Often measured as frequency of occurrence
– Needs to be applied before anything has gone wrong
– Probabilities are difficult to estimate
– Knowing the probability may change its value
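The definition RISK = Outcome x Probability can be made concrete with a few numbers. This is an illustrative sketch only: the `Scenario` class, the 0–5 severity scale and the frequencies are hypothetical values chosen for the example, not data from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One outcome scenario for a hazard (all values below are hypothetical)."""
    name: str
    severity: float            # size of the outcome, on an assumed 0-5 scale
    frequency_per_year: float  # estimated frequency of occurrence


def scenario_risk(scenario: Scenario) -> float:
    """RISK = Outcome x Probability of that outcome."""
    return scenario.severity * scenario.frequency_per_year


scenarios = [
    Scenario("worst case", severity=5, frequency_per_year=0.001),
    Scenario("most credible worst case", severity=3, frequency_per_year=0.05),
]

for s in scenarios:
    print(f"{s.name}: {scenario_risk(s):.3f}")
```

With these made-up numbers the most credible worst case (0.150) carries more risk than the conservative worst case (0.005), which is one reason the choice of scenario matters.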
Session 16
Building World Class SMS
There is more to an SMS than lots of good intentions
[Figure: “No Structure” vs. “Organization Structure”. Separate initiatives such as TRIPOD, a Road Safety Plan, an Unsafe Act Audit and an Alcohol & Drugs Policy, together with HSE Policy, Audit, Plans, Health Risk Assessment, HSE Objectives, Incident Plan, Targets, EA and Potential Matrix, shown organised within a safety management system around Hazards & Effects Management, with Plan, Do, Check, Feedback, Engage and Continuous Improvement.]
Safety Management System (SMS)
[Figure: Production vs. Protection. Better defenses converted to increased production.]
Safety Management System (SMS)
[Figure: Production vs. Protection. Best practice operations under SMS.]
Generic HSE Management System (Shell)
1 - Leadership and Commitment
2 - Policy and Strategic Objectives
3 - Organisation, Responsibilities, Resources and Standards
4 - Hazards & Effects Mgt (Risk Mgt)
5 - Planning & Procedures
6 - Implementation, Monitoring
7 - Audit
8 - Management Review
[Diagram labels: PLAN, DO, CHECK, FEEDBACK; Corrective Action loops feed back into the cycle.]
Hazard-based approach
HEMP - Hazard and Effects Management Process
Identify - what are the hazards?
Assess - how big are those hazards?
Control - how do we control the hazards?
Recover - what if it still goes wrong?
Step 1. Identification
• First identify your hazards
– What is going to hurt you?
– Needs to be specific enough to manage practically
• E.g. not just potential and kinetic energy
– General enough to manage specifics in the same way
– Accumulate in a list – Hazard Register
• A range of tools and methods help here
– Brainstorming - proactive
– HAZID
– Incident analyses - reactive
• Reporting
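A hazard register can be kept as a simple keyed collection that merges entries found by different methods (brainstorming, HAZID, incident analyses). This is a minimal sketch; `HazardRegister` and the example hazards are hypothetical, and a real register would also carry the assessment, control and recovery information from the later steps.

```python
from dataclasses import dataclass, field


@dataclass
class Hazard:
    name: str                                        # specific enough to manage practically
    sources: set[str] = field(default_factory=set)   # methods that identified it


class HazardRegister:
    """Hypothetical minimal register that accumulates hazards from several sources."""

    def __init__(self) -> None:
        self._hazards: dict[str, Hazard] = {}

    def add(self, name: str, source: str) -> None:
        key = name.strip().lower()  # treat case/whitespace variants as one hazard
        hazard = self._hazards.setdefault(key, Hazard(name=name.strip()))
        hazard.sources.add(source)

    def hazards(self) -> list[Hazard]:
        return sorted(self._hazards.values(), key=lambda h: h.name.lower())


register = HazardRegister()
register.add("Runway incursion", "brainstorming")      # proactive
register.add("Fuel contamination", "HAZID")
register.add("runway incursion ", "incident analysis")  # reactive, same hazard

for h in register.hazards():
    print(h.name, sorted(h.sources))
```

Merging on a normalised name keeps one entry per hazard while recording every method that found it, which shows whether a hazard is known only reactively.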
Step 2. Assess
• How big is the risk you are taking and running?
• A wide range of tools is available
• Not an exact science – whatever anyone tells you
• Small risks can be ignored
• Large risks may not be taken
• Usually framed in terms of ALARP
– As Low as Reasonably Practicable
– Not intended to be as low as possible
• Risk assessment should point to what to do about the hazard in question
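The assessment logic above (ignore small risks, refuse large ones, reduce the rest ALARP) amounts to a three-band classification. A minimal sketch, assuming the risk has already been estimated as a single number such as an annual probability; the function name `assess` and both thresholds are hypothetical, since the presentation gives no numeric values.

```python
def assess(risk: float, negligible: float, intolerable: float) -> str:
    """Place a risk estimate in one of three bands.

    `negligible` and `intolerable` are hypothetical, organisation-specific
    thresholds, not values from any standard.
    """
    if not 0 <= negligible < intolerable:
        raise ValueError("need 0 <= negligible < intolerable")
    if risk < negligible:
        return "small: can be ignored"
    if risk >= intolerable:
        return "large: may not be taken"
    return "in between: reduce as low as reasonably practicable"


for risk in (1e-7, 1e-5, 1e-3):
    print(f"{risk:.0e}", assess(risk, negligible=1e-6, intolerable=1e-4))
```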
Step 3. Manage and control
• Primarily preventative
• Success is measured by nothing going wrong
• Prevention involves a variety of approaches
– Use of the hierarchy of controls
– Barriers to keep hazards in place
– Controls to prevent them escaping
• Management is directly responsible for the provision of controls and barriers
– Requires resourcing, procurement and continuous evaluation
• Front-line personnel are responsible for their use once provided and supported
– Requires ability to operate the controls and barriers
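The slide does not list the levels of the hierarchy of controls; the sketch below assumes the commonly cited five-level version (elimination, substitution, engineering controls, administrative controls, personal protective equipment). It orders the controls for one hazard from most to least effective. The control names are hypothetical examples.

```python
from enum import IntEnum


class ControlLevel(IntEnum):
    """Commonly cited hierarchy of controls; a lower value is more effective."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4  # procedures, training, signage
    PPE = 5             # personal protective equipment


def rank_controls(controls: dict[str, ControlLevel]) -> list[str]:
    """Order control names from the most to the least effective level."""
    return sorted(controls, key=lambda name: (controls[name], name))


def strongest_level(controls: dict[str, ControlLevel]) -> ControlLevel:
    """The most effective level present among the controls for this hazard."""
    return min(controls.values())


# Hypothetical controls for a fuel-contamination hazard
controls = {
    "fuel sampling before first flight": ControlLevel.ADMINISTRATIVE,
    "water separator filter in fuel line": ControlLevel.ENGINEERING,
}
print(rank_controls(controls))
print(strongest_level(controls).name)
```

If `strongest_level` returns `ADMINISTRATIVE` or `PPE`, the hazard depends entirely on people using the controls correctly, which is where management's provision of stronger barriers comes in.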
Step 4. Recovery
• Recovery is necessary after control over a hazardous process has been lost
• But before the worst-case consequences have been achieved
• Recovery controls and barriers are reactive
• The term Mitigation applies best here
• These controls are usually much more expensive than preventative controls
• Sometimes challenged because “We’ve never used that so we can get rid of it and save money”
Tools
• Risk management tools are intended to support one or more of the 4 steps
– Usually applied continuously to improve
– Especially on the feedback loops
• Audits
• Incident investigations
• Reporting
• Performance assessment for predictive improvement
• Identify – discover unexpected hazards
• Assess – evaluate what needs to be done
• Control – systematically list the controls to see if they are adequate to reduce the risk to acceptable levels
• Recover – identify what will reduce the consequences
• Successful risk management allows us to take the risks that enable us to get the benefits without disaster
• These can easily be mapped onto the ICAO components
– Not just the risk management elements
– Also all the other elements
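Minimising Regret
Maximising Opportunity
[Figure: decision (Go / No-Go) against outcome (Incident / Safe). Go with an incident: Regret. Go and safe: Normal Operations. No-Go when an incident would have occurred: No Regret. No-Go when it would have been safe: Missed Opportunity.]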
Risk Assessment Matrices
• A simple way of supporting the product of outcome and probability
• Not a discrete set of values, but an easy way of representing the distributions of severity of outcomes and their probabilities
• So – there is no single CORRECT Matrix
Risk Assessment Matrix
Consequence rating (People / Assets / Environment):
0 – No injury / No damage / No effects
1 – Slight injury / Slight damage / Slight effect
2 – Minor injury / Minor damage / Minor effect
3 – Major injury / Local damage / Localised effect
4 – Single fatality / Major damage / Major effect
5 – Multiple fatalities / Extensive damage / Massive effect
Increasing probability:
A – Never heard of in industry
B – Incident heard of in industry
C – Incident heard of in company
D – Incident happens several times per year in company
E – Incident happens several times per year in a location
[Matrix cells graded Low Risk, Med/low Risk, Medium Risk and High Risk.]
The colour determines the level of active risk management required
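A matrix lookup can be written as a small function from consequence rating (0–5) and probability column (A–E) to a risk level. As the earlier slide notes, there is no single correct matrix, and this sketch does not reproduce the slide's cell layout: the banding (add the rating to the column index, then apply thresholds) is a hypothetical scheme chosen only for illustration.

```python
# A = never heard of in industry ... E = several times per year in a location
PROBABILITY_COLUMNS = "ABCDE"


def matrix_cell(consequence: int, probability: str) -> str:
    """Return a risk level for one cell of a 6 x 5 risk assessment matrix.

    Hypothetical banding: score = consequence rating + column index (0-9).
    """
    if consequence not in range(6):
        raise ValueError(f"consequence rating must be 0-5, got {consequence!r}")
    if len(probability) != 1 or probability.upper() not in PROBABILITY_COLUMNS:
        raise ValueError(f"probability column must be A-E, got {probability!r}")
    score = consequence + PROBABILITY_COLUMNS.index(probability.upper())
    if score <= 2:
        return "Low"
    if score <= 4:
        return "Med/low"
    if score <= 6:
        return "Medium"
    return "High"


print(matrix_cell(4, "C"))  # single fatality, incident heard of in company
```

Because the level is computed rather than looked up cell by cell, changing the thresholds redraws the colour bands, which is one way to experiment with how much active risk management each cell should demand.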
Risk Calculations
[Figure: risk matrix with cells numbered 0 to 14, showing the assessed risk moving from “Now” to “After”; labels: Reduced exposure (left side), Mitigation (right side).]
Risk matrix alternative
[Figure: the same matrix with cell values 0, 2, 4, 5, 8, 12, 15, 20, 28, 40, 100 and 200; labels: Mitigation (right side), Reduced exposure (left side).]
The numbers are a reflection of how unacceptable the matrix cell is
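The alternative matrix replaces evenly spaced scores with numbers that grow much faster in some cells. The sketch below contrasts a linear (additive) score with a hypothetical non-linear weighting; the factor of 3 per consequence step and 2 per probability step are assumptions, not the values on the slide. Column indices run 0 (A) to 4 (E).

```python
def additive_score(consequence: int, column: int) -> int:
    """Linear scoring: each step along either axis adds the same amount."""
    return consequence + column


def nonlinear_score(consequence: int, column: int,
                    consequence_base: float = 3.0,
                    probability_base: float = 2.0) -> float:
    """Hypothetical 'unacceptability' weight: each consequence step multiplies
    the score by consequence_base, each probability step by probability_base."""
    return consequence_base ** consequence * probability_base ** column


rare_catastrophe = (5, 0)  # multiple fatalities, never heard of in industry (A)
frequent_slight = (1, 4)   # slight injury, several times per year in a location (E)

print(additive_score(*rare_catastrophe), additive_score(*frequent_slight))
print(nonlinear_score(*rare_catastrophe), nonlinear_score(*frequent_slight))
```

The additive score rates both cells as 5; the non-linear weighting gives 243.0 against 48.0. Weighting consequence more steeply than probability is one way of expressing how unacceptable a cell is, rather than treating every step as equal.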
What is ALARP?
ALARP = As Low As Reasonably Practicable
[Figure: risk to stakeholders and cost (vertical scale 0–120) plotted for options 1–6, with the legal minimum requirements marked.]
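One common reading of “reasonably practicable” is that further risk reduction is expected unless its cost is grossly disproportionate to the benefit. The sketch below applies that idea to a list of options like those on the chart. Everything in it is hypothetical: the `Option` values, the value placed on one unit of risk, the disproportion factor, and the treatment of the legal minimum as a maximum tolerable residual risk.

```python
from typing import NamedTuple


class Option(NamedTuple):
    name: str
    cost: float           # cost of the option (arbitrary units)
    residual_risk: float  # risk to stakeholders remaining after the option


def alarp_choice(options: list[Option], value_per_unit_risk: float,
                 disproportion_factor: float, legal_max_risk: float) -> Option:
    """Step up in cost while each extra step is not grossly disproportionate
    to the risk reduction it buys; never go below the legal minimum."""
    ordered = sorted(options, key=lambda o: o.cost)
    compliant = [o for o in ordered if o.residual_risk <= legal_max_risk]
    if not compliant:
        raise ValueError("no option meets the legal minimum requirements")
    chosen = compliant[0]
    for candidate in compliant[1:]:
        extra_cost = candidate.cost - chosen.cost
        benefit = (chosen.residual_risk - candidate.residual_risk) * value_per_unit_risk
        if extra_cost > disproportion_factor * benefit:
            break  # further reduction would be grossly disproportionate
        chosen = candidate
    return chosen


options = [
    Option("1", cost=10, residual_risk=100),
    Option("2", cost=20, residual_risk=60),
    Option("3", cost=35, residual_risk=35),
    Option("4", cost=50, residual_risk=25),
    Option("5", cost=85, residual_risk=20),
    Option("6", cost=120, residual_risk=18),
]
best = alarp_choice(options, value_per_unit_risk=1.0,
                    disproportion_factor=2.0, legal_max_risk=80)
print(best.name)
```

Stopping at the first disproportionate step assumes that each more expensive option buys less risk reduction per unit cost than the previous one, which is the shape the ALARP chart suggests; it is not “as low as possible”, since options 5 and 6 would lower the risk further.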
How can we understand our controls?
• The Bowtie is an industry standard in many high-hazard activities
• Bowties cover both control and recovery
• Bowties are not primarily intended to be quantitative, but can be used for calculation
• Bowties visually express the extent and types of control and are easy for managers to understand
– Is everything procedural?
– Does one person have to do everything?
Bow-tie Concept
[Figure: on the left, the HAZARD and the events and circumstances that lead, through CONTROLS, to the central undesirable event with potential for harm or damage; on the right, the CONSEQUENCES: harm to people and damage to assets or environment. The diagram also lists engineering, maintenance and operations activities.]
Bow-tie Concept for a specific event
[Figure: the same layout applied to a specific event, with the controls labelled RISK CONTROLS.]
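A bowtie can also be held as a data structure, which makes the review questions listed earlier (is everything procedural, and does one person have to do everything?) easy to check automatically. This is a hypothetical sketch: the class names, the barrier `kind` categories and the example event are illustrative assumptions, not a standard bowtie schema.

```python
from dataclasses import dataclass, field


@dataclass
class Barrier:
    name: str
    kind: str   # hypothetical categories: "hardware", "procedural", "behavioural"
    owner: str  # role that has to operate the barrier


@dataclass
class Bowtie:
    hazard: str
    top_event: str
    # Left side: threats, each with its preventive controls
    threats: dict[str, list[Barrier]] = field(default_factory=dict)
    # Right side: consequences, each with its recovery controls
    consequences: dict[str, list[Barrier]] = field(default_factory=dict)

    def paths(self) -> list[tuple[str, list[Barrier]]]:
        return list(self.threats.items()) + list(self.consequences.items())

    def review(self) -> list[str]:
        """Flag unprotected paths, all-procedural barriers and a single owner."""
        findings = [f"no barriers on path: {name}" for name, bs in self.paths() if not bs]
        barriers = [b for _, bs in self.paths() for b in bs]
        if barriers and all(b.kind == "procedural" for b in barriers):
            findings.append("every barrier is procedural")
        owners = {b.owner for b in barriers}
        if len(owners) == 1:
            findings.append(f"all barriers depend on one role: {next(iter(owners))}")
        return findings


pf = "pilot flying"
bowtie = Bowtie(
    hazard="Flight near terrain",
    top_event="Aircraft descending towards terrain",
    threats={
        "navigation error": [Barrier("position cross-check", "procedural", pf)],
        "low visibility": [Barrier("minimum safe altitude procedure", "procedural", pf)],
    },
    consequences={
        "collision with terrain": [Barrier("go-around on terrain warning", "procedural", pf)],
        "airframe overstress during escape manoeuvre": [],
    },
)
for finding in bowtie.review():
    print(finding)
```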
A problem for aviation
• Simple models have difficulty capturing recent major commercial aviation incidents
• Asiana 214, QF 32, AF 447, BA 38
A Diversion - Causality
• Simple accidents are simply caused
– Linear and deterministic
• Complex accidents are more complex
• The 80-20 rule suggests that about 80% of accidents are simple
• The remaining 20% require us to recognize complexity
Theory 1 - how accidents are caused
• Linear causes – A causes B causes C
• Deterministic - either it is a cause or it isn’t
• We can compute both backwards and forwards
• People are seen as the problem – human error etc
• Probably good enough to catch 80% of the accidents we are likely to have
• Covers most private and GA operations
Private users
Theory 2 - how accidents are caused
• Non-Linear causes
– Cause and consequence may be disproportionate
– These causes are organizational, not individual
• Deterministic dynamics – either it is a cause or it isn’t
• We can compute both backwards and forwards
– Increasingly difficult with non-linear causes
• This is the Organizational Accident Model
• Probably good enough to catch 80% of the residual accidents, i.e. 96% in total
• Probably best suited to GA and professional operations
Oilfield operations
Non-linearity
• Linearity: the size of an effect (consequence) is proportional to the input
• Non-linearity is different
– The size of an effect (bad consequences) grows out of proportion to the input (or, after a while, grows more slowly)
– The improvement in performance gets smaller (almost always) even as the input gets bigger
• Linearity works fine to start with, but only for about 80% of the cases
Linear and non-linear functions
[Figure: effect plotted against cause. Linear: effect proportional to cause. Non-linear: “Suddenly gets a lot worse”.]
More non-linear functions
[Figure: effect plotted against cause. Non-linear: “It can’t get much worse”. Non-linear: “Both – starts bad, tails off”.]
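The curve shapes in these figures can be reproduced with simple functions. The formulas below (an exponential for “suddenly gets a lot worse”, a saturating exponential for “it can’t get much worse” and for diminishing returns) are assumptions chosen to match the shapes, not formulas from the presentation.

```python
import math


def linear(cause: float, k: float = 1.0) -> float:
    """Effect proportional to cause."""
    return k * cause


def accelerating(cause: float, rate: float = 1.0) -> float:
    """'Suddenly gets a lot worse': exponential growth, shifted so f(0) = 0."""
    return math.exp(rate * cause) - 1.0


def saturating(cause: float, ceiling: float = 1.0, rate: float = 1.0) -> float:
    """'It can't get much worse': rises quickly, then levels off at `ceiling`."""
    return ceiling * (1.0 - math.exp(-rate * cause))


for cause in (1, 2, 4, 8):
    print(cause, linear(cause), round(accelerating(cause), 1), round(saturating(cause), 3))
```

Doubling the cause doubles the linear effect, multiplies the accelerating effect many times over, and barely moves the saturating effect once it is near its ceiling.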
Determinism
• A Causes B
• If A happens, then B will happen next
Non-determinism
• Move from “A causes B” to “A makes B more likely”
• Causation is probabilistic
• Probabilities are distributions, not points
Conditionalize on the latest aircraft generation
Types of accidents
• Theory 1
– Simple models may cover 80% of all accidents
– These are the simple personal accidents
• Theory 2
– The next step gets 80% of the remainder = 96%
– These are the complex personal accidents and some organizational accidents
• Theory 3
– The probabilistic approach may net the next 80% = 99.2%
– These are the complex process accidents
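The 80%, 96% and 99.2% figures follow from applying the same 80% capture rate to whatever each earlier theory leaves over, so after n theories the covered share is 1 − (1 − 0.8)^n. A short check of that arithmetic (the function name is hypothetical):

```python
def cumulative_coverage(levels: int, capture_rate: float = 0.8) -> float:
    """Share of accidents covered when each successive theory captures
    `capture_rate` of the accidents that the earlier theories missed."""
    return 1.0 - (1.0 - capture_rate) ** levels


for n in (1, 2, 3):
    print(f"Theory {n}: {cumulative_coverage(n):.1%}")
```

Each further theory still leaves 20% of what remains, so the uncovered share shrinks (20%, 4%, 0.8%) but never reaches zero.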
Theory 3 - how accidents are caused
• Non-Linear causes
• Non-Deterministic dynamics
– Probabilistic rather than specific
– Influences on outcomes by people and the organisation
• Probabilities may be distributions rather than single values
• We cannot compute both backwards and forwards
• The dominant accidents that remain are WEIRD
– WILDLY
– ERRATIC
– INCIDENTS
– RESULTING IN
– DISASTER
• Prior to an event there may be a multitude of possible future outcomes
Unusual or WEIRD Accidents
• In commercial aviation major accidents are now extremely rare
• Simple risk assessment and analysis models often fail to capture how these accidents are caused
• We need to understand our risk space better
• The Rule of Three is an example of how to do this
The Rule of Three
• Accidents have many causes (50+)
• A number of dimensions were marginal
• Marginal conditions score as Orange
• No-Go conditions score as Red
• The Rule of 3 is Three Oranges = Red
Aircraft Operation Dimensions
• Crew Factors: Experience, Duty time, CRM
• Aircraft Performance: Category, Aids, Fuel, ADDs
• Weather: Cloud base, wind, density altitude, icing
• Airfield: Nav Aids, ATC, Dimensions, Topography
• Environment: Night/day, Traffic, en route situation
• Plan: Change, Adequacy, Pressures, Timing
• Platform: Design, Stability, Management
The Rule of Three
[Figure: outcome (No problem, Problem, We fixed it, Big Sky, Crash) plotted against the number of oranges (1/2, 1 1/2, 2 1/2, 3 1/2).]
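The Rule of Three can be expressed as a small scoring function over the aircraft operation dimensions. A minimal sketch under stated assumptions: the dimension keys are shorthand labels, the “GO, n orange(s) to watch” wording is hypothetical, and only whole oranges are counted, whereas the figure also shows half-orange steps.

```python
from enum import Enum


class Rating(Enum):
    GREEN = "green"    # within normal limits
    ORANGE = "orange"  # marginal
    RED = "red"        # no-go on its own


# Shorthand keys for the seven aircraft operation dimensions
DIMENSIONS = {"crew", "aircraft", "weather", "airfield", "environment", "plan", "platform"}


def rule_of_three(ratings: dict[str, Rating], orange_limit: int = 3) -> str:
    """Combine dimension ratings: any red is no-go, and `orange_limit`
    or more oranges together count as a red."""
    unknown = set(ratings) - DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    if any(r is Rating.RED for r in ratings.values()):
        return "NO-GO"
    oranges = sum(r is Rating.ORANGE for r in ratings.values())
    if oranges >= orange_limit:
        return f"NO-GO ({oranges} oranges count as red)"
    if oranges:
        return f"GO, {oranges} orange(s) to watch"
    return "GO"


flight = {"crew": Rating.ORANGE, "weather": Rating.ORANGE,
          "plan": Rating.ORANGE, "aircraft": Rating.GREEN}
print(rule_of_three(flight))
```

No single dimension here is a no-go, yet together they cross the threshold, which is the situation the rule is designed to catch.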
Why does the rule work?
• People use cognitive capacity to allow for increasing risk
• As the oranges increase, the remaining available capacity is reduced
• At 3 oranges there is little available capacity remaining
• Any trigger can destabilize the system
• An accident suddenly becomes very likely
How random numbers combine
[Figure: load and strength distributions with normal upper and lower limits; failure occurs where load > strength.]
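When load and strength both vary, failure happens where the tails of the two distributions overlap, even though the averages are far apart. A sketch assuming independent, normally distributed load and strength (the numbers are hypothetical): the closed form uses the fact that the margin strength − load is itself normal, and a Monte Carlo run with Python's `random` module cross-checks it.

```python
import math
import random
from statistics import NormalDist


def failure_probability_mc(load: NormalDist, strength: NormalDist,
                           trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(load > strength) for independent random draws."""
    rng = random.Random(seed)
    failures = sum(
        rng.gauss(load.mean, load.stdev) > rng.gauss(strength.mean, strength.stdev)
        for _ in range(trials)
    )
    return failures / trials


def failure_probability_exact(load: NormalDist, strength: NormalDist) -> float:
    """Closed form for independent normals: margin = strength - load is also normal."""
    margin = NormalDist(strength.mean - load.mean, math.hypot(load.stdev, strength.stdev))
    return margin.cdf(0.0)


# Same averages, different spread (hypothetical numbers)
for spread in (10.0, 20.0):
    load, strength = NormalDist(100, spread), NormalDist(140, spread)
    print(spread, round(failure_probability_exact(load, strength), 4),
          failure_probability_mc(load, strength))
```

Keeping the averages fixed and doubling the spread raises the failure probability from roughly 0.2% to roughly 8%, which is why a single average value says little about the danger zone.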
The danger zone/safe zone – safe operating envelope concept
[Figure: the normal path through the safe field stays within the defined operational boundary, beside a known danger zone; when the normal path is blocked by an uncommon circumstance, the path can enter an unknown danger zone (a “swiss cheese” defect).]
Risk
• Risk is a complex concept
• Classically probability x outcome
• Safety management is about:
– Taking risk – acceptable (acceptable level of safety, ALOS) vs unacceptable
– Running risk – getting away with it
– Can be based on luck or on professionalism
• The granularity of the outcomes, and of how they can be reached, is essential
• Most approaches are crude
– Salami slicing is a way to evade regulation
Risk Space
[Figures: the risk space, with high-risk areas and low-risk/resilient areas. Single distributions A, B and C each have a known danger zone; the combined distribution (A, B, C) has known danger zones and, in the last panel, an unexpected danger zone. Simple views of the combined distribution show low average risk despite a danger zone, medium average risk despite a danger zone, and high average risk due to sufficient granularity.]
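The “simple view” panels make the point that averaging over a coarse description of the risk space can hide a danger zone. A one-dimensional sketch, assuming a hypothetical operating parameter x in [0, 1) with a narrow danger zone and made-up risk levels: with one coarse bin the average risk looks low, while finer bins isolate the zone.

```python
DANGER_ZONE = (0.40, 0.45)  # hypothetical narrow danger zone in the operating parameter


def risk_at(x: float) -> float:
    """Hypothetical risk level at operating point x in [0, 1)."""
    low, high = DANGER_ZONE
    return 0.9 if low <= x < high else 0.01


def binned_average_risk(bins: int, samples_per_bin: int = 100) -> list[float]:
    """Average risk in each of `bins` equal-width bins, sampled at sub-bin midpoints."""
    averages = []
    for b in range(bins):
        total = sum(risk_at((b + (i + 0.5) / samples_per_bin) / bins)
                    for i in range(samples_per_bin))
        averages.append(total / samples_per_bin)
    return averages


for bins in (1, 4, 20):
    print(bins, round(max(binned_average_risk(bins)), 3))
```

The danger zone is identical in every run; only the granularity changes. The worst bin averages about 0.05 with one bin, about 0.19 with four, and the full 0.9 with twenty.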
Mission Creep and Drift into Danger
• Success with risks makes people willing to accept greater risks
– This is a consequence of risk homeostasis
• This can look like complacency, but is a natural consequence of their successes, so far
• Failure to understand the finer detail of the risk space makes this drift into danger more likely
Conclusion
• Conventional risk assessment involves uncovering the potential for bad consequences
• Modern commercial aviation is very safe, so the accidents we wish to avoid may not be caught by standard techniques
• Advanced risk analysis involves increasing our understanding of the risk space we operate in