Human Error: From Taking Risk to Running Risk

Prof Patrick Hudson, Centre for Safety Studies, Department of Psychology, Leiden University

Introduction - Structure

• Two Types of Risk
• Case studies – Piper Alpha & Herald of Free Enterprise
• Human Error
• The Organisational Accident Model
• Examining the sources of risks
• Case study DAL 39
• Solutions to human error
• What to look for
• Conclusion

Where am I coming from?

• Psychology – Why do people do what they do?

• Human error – How can people get things so wrong?

• Oil and Gas industry, Aviation & Medicine – Extremely high hazard industries

• The organisational model of accidents – Reason’s Swiss Cheese Model

What is safety all about?

• Preventing harm to people
• Safeguarding assets
• Protecting the environment
• Preserving reputation
• If things didn’t go wrong it would be easy
• Safety and profits are both about risk management

Managing risks

• Safety is about managing risks to people, the environment etc - what risks do you take?

• The alternative is to run the risks and hope for the best - can we run the risks?

• What happens to companies that run risks?

– The best make profits, the worst go bankrupt

• So, we need to have a risk management process - we need an understanding of the types of risk and where they come from

Risks

• We can distinguish two ways to approach risk
• We take a risk
– We can decide the return is worth it
• We run a risk
– We can become victims if things go wrong
• People who take risks are not always the same as those who run them

Case Study Piper Alpha

• A major disaster
• Changed the way the Oil and Gas industry operates
• Created the requirements for Safety Management Systems and Safety Cases to be a ‘living system’ and a ‘living document’
• Had legal effects as far away as Australia

Piper Alpha

Piper Alpha Disaster

• In July 1988 the Piper Alpha platform was destroyed with 167 fatalities
• The immediate cause was leaking gas condensate
• The disaster was made worse by a total failure of defences
• By 1990 Occidental was out of business in the UK

The next morning

Why do accidents happen?

• Accidents are quite infrequent
• An accident is often seen as being caused by one or more individuals
• But:
• In Piper Alpha the major problems were the platform design and the permit to work system
• Piper Alpha had also been audited and passed by the regulator 7 days earlier

What were the risks?

• Many people died because they followed procedures
• The platform management failed to provide a safe workplace
• The regulator had failed to audit the system

Case Study Herald of Free Enterprise

• Herald of Free Enterprise capsized outside Zeebrugge harbour
• The Assistant Bosun was asleep
• The bow doors were still open
• 193 people died

Herald of Free Enterprise

Diagram: TRIMMING PROBLEM; SHIP HEAD DOWN; MANAGEMENT; HIGH BOW WAVE; 15 MINUTES EARLIER; 5 MINUTES LATE; NO CHECKING SYSTEM; ACCELERATION; CHIEF OFFICER LEAVES G-DECK; MASTER ASSUMES SHIP READY; CAPSIZE; DOOR PROBLEM; LOADING OFFICER; ASSISTANT BOSUN; BOSUN; ASSISTANT BOSUN ASLEEP; NO INDICATION; DOORS OPEN

Herald Analysis

• The assistant bosun was overworked
• The masters had asked for indicators
• The management had refused on grounds of cost
• A Townsend Thoresen vessel left Dover with the bow doors open the next day!

Active vs Latent Failures

• Analysis of disasters indicates the need to distinguish two types of human failure
• Active Failures - Errors and violations that impact directly on the system and victims
• Latent Failures - Accidents waiting to happen

From Error to Underlying Cause

Diagram: Unsafe Acts (Active Errors) divide into Unintended Actions (Slips, Lapses) and Intended Actions (Mistakes, Violations). Latent Conditions behind them: Decisions, Planning, Design, Procedures, Training, Communication, Accountability

Types of risk

• The individuals making the active failures are frequently running the risks
• Those accepting the latent failures are those who have taken the original risk
• They expect that all will go well
• Weaknesses in the system allow problems to happen
• The unsafe acts of individuals are the obvious human errors - running risks

The Causes of Incidents

• Triggers
• Defences
• Unsafe Acts
• Preconditions
• Underlying Causes
• Decisions made

Diagram: Immediate Causes; Underlying Causes

Why do Accidents Happen?

• Equipment
– Breakdowns
– Doesn’t work
• People
– Incompetence
– Sloppiness
– Risk Taking
• Organisation
– Allowing failures to propagate
– Accidents waiting to happen

Latent Conditions = Underlying Causes

• Latent Conditions represent accidents waiting to happen
• Many problems are to be found. E.g.:
– Poor procedures (Incorrect, unknown, out of date)
– Bad design accepted
– Commercial pressures not well balanced
– Organisation incapable of supporting operation
– Maintenance poorly scheduled
• Latent conditions make errors more likely or the consequences worse
• Individuals are the recipients of somebody else’s problems
• Taking a risk involves accepting latent conditions, running the risk involves becoming a recipient of those problems

Classifying Latent Conditions

• We can group underlying causes as Whys
• Hows refer to the immediate causes
• Underlying causes refer to the organisational level
• Concentrating on why means we no longer concentrate upon individuals
• The categories depend upon what you are going to do with the information

Preconditions

• The reasons why an individual or group may make an error
• Preconditions influence the probability of an error
• There are few effects of individual differences (accident proneness does not exist)
• Preconditions that induce errors or make them more likely are the result of (failure to) control
• The question is: Why are the preconditions for error present?

Preconditions II

• Haste
• Ignorance
• Design
• Unusual situations
• Fatigue
• Habit
• “Strong but Wrong”
• These are the symptoms of a deeper problem

Accident Causation Model

Diagram: Fallible Decisions; Latent Conditions; Preconditions; Unsafe Acts; Defences; Local triggers; Environmental conditions

Reason’s Swiss cheese model of accident causation

Diagram: Hazards; Successive layers of defences, barriers, & safeguards; Some holes due to active failures; Other holes due to latent conditions; Losses

HSE Management

Diagram: Hazard/Risk; Taking risks; Barriers or Controls; WORK; Running risks; Undesirable outcome

Shell’s Bow-tie Concept

Diagram: HAZARD; Events and Circumstances; BARRIERS; Undesirable event with potential for harm or damage; Harm to people and damage to assets or environment; CONSEQUENCES; Engineering activities; Maintenance activities; Operations activities

Case Study DAL 39 Schiphol

• An example of multiple failures
• The criminal appeal found that the 3 Air Traffic Controllers were guilty of an infringement
• There was no punishment (so no further appeal)
• Consider what the conventional and actual risks were
• Would you have spotted these?
• Would they appear in a conventional risk analysis?

DAL 39

• A Delta 767 aborted take-off at Amsterdam Schiphol on discovering a 747 being towed across the runway
• Reduced visibility conditions (Phase B)
• The tower controller was in training, under the tower supervisor
• There was another trainee, and of the 11 people in the tower, five were changing out to rest
• The incident happened between the inbound and outbound morning peaks

DAL 39 continued

• The marshalling vehicle called in unexpectedly as Charlie-8 with a towed KLM 747 from a parking apron
• Radio communications were unclear and C-8 did not state exactly where he was
• C-8 was given clearance
• The stopbar light control box confused everyone in the tower (it was a new addition)
• The controller, thinking that the tow had crossed successfully, gave DAL 39 clearance
• The DAL pilots saw the 747 and stopped in time

DAL 39 Initial Analysis

• Tow failed to report exact position or destination
• Tow not announced in advance (as per procedures for Phase B)
• Assistant ATCo believed tow was moving from right to left (did not know that a tunnel was in use)
• Controllers completely unfamiliar with new control box
• Ground radar pictures set up to cover different arrival and departure runways meant tow not visible on one screen
• Controller was meshing the tow between both take-offs and landings
• The tow, given clearance 1 min 40 sec earlier, started off once the stopbars went out

yellow van + tow-truck + B747-400

Why did all this happen - 1?

• Tow was in violation, but this appears to be routine
• No clear protocols for ground vehicles and no hazard analysis
• Different language for aircraft (English) and ground vehicles (Dutch)
• Poor quality of ground radio
• Clearances appeared to be unlimited once given
• Tower supervisor was also on-the-job (OTJ) trainer in the middle of the rush hour
• Altered control box not introduced to ATC staff

Why did all this happen - 2?

• No briefings about alterations at Schiphol (it has been a building site for years)
• Too many trainees in the tower in rush hour under low visibility conditions
• Differences in definition of low visibility between aerodrome and ATC
• No management apparent of the change in use of the S-Apron
• No operational audits by LVNL or Schiphol of practice as opposed to paper
• Schiphol designed requiring crossing and the use of multiple runways for noise abatement reasons

The DAL 39 event scenario

Diagram: Pilots see 747 and abort take-off; Routine violation of tow procedures; Tunnel brought into use without briefings; Airport structure; Airport decides to change airport structure; Controller gives clearance without assurance of tow position; Tower combining training and operations during difficult periods

How can we manage errors?

• Risks refer to things that can go wrong
• Errors represent ways in which people can fail to control the hazards
• An inspector/auditor should be looking at two levels
– Are the standards being adhered to?
– Are the standards appropriate?
– Have any hazards been missed or managed ineffectively?

Safety Management Cycle

Diagram: Leadership and Commitment; Policy and Strategic Objectives; PLAN: Organisation, Responsibilities, Resources, Standards & Documentation; DO: Hazards and Effects Management, Planning and Procedures, Implementation; FEEDBACK: Corrective Action, Monitoring; CHECK: Audit, Management Review; Corrective Action and Improvement

Error Management

Diagram: Identify; Avoid; Reduce; Support; Check; Learn

Error management and inspection

• We can uncover problems from a wide range of sources of information
– Accidents
– Near misses
– History
– Brainstorming
• We can see if the best control methods are being applied
• If we leave everything to the individual we have already created major problems

Error Management II

• Identify – What, Why, How, Who, Where, When
• Avoid – What, Why
• Reduce – What, Why, How
• Support – How, Who, Where, When
• Check – Who, Where, When
• Learn – What, Why, How, Who, Where, When

What happened here?

Safety Management and Safety Culture

• The level of safety management is a function of the organisational safety culture
• Individuals may do their best, but that may not be enough
• Is the organisation organised and systematic?
• Are they satisfied with their performance, or do they feel they could do better?

The Evolution of Safety Culture

Diagram: culture ladder, from highest to lowest, with Increasing Informedness and Increasing Trust & Accountability towards the top

• GENERATIVE – safety is how we do business round here
• PROACTIVE – we work on the problems that we still find
• CALCULATIVE – we have systems in place to manage all hazards
• REACTIVE – safety is important, we do a lot every time we have an accident
• PATHOLOGICAL – who cares as long as we’re not caught

The Edge

Diagram: Inherently Safe; Normally Safe; The Edge; No need; Safety Management Systems; Safety Culture; Return on Capital Invested; 6%; 10%; 15%

Conclusion

• When analysing risks you have to consider the whole range
– From decisions to operate etc in certain ways
– To decisions to act in certain ways
• When inspecting you have to examine the context, including yourself