Presentation on Alarm Management
ASM: Abnormal Situation Management
Defining the way things will be.
The birth of ASM...
• ASM grew from an initial focus on alarm
management. Most sites are aware that operator
overload and alarm floods are common during
abnormal operations. As we analyzed the issues
around alarm management, we discovered that
operator problems with the alarm system were
only a symptom of a general issue:
– the design, implementation, and maintenance of
many facilities, systems, and practices.
ASM Consortium
Current Membership:
Brad Adams Walker Architecture, P.C.
University Affiliates
• Charter:
– Research the causes of
abnormal situations and
create technologies to address
this problem
• Deliverables:
– Technology, best practices,
application knowledge,
prototypes, metrics
• History:
– Started in 1994
– Co-funded by US Govt
(NIST)
– Budget: +$16M USD
• Current Status:
– Committed through 2002
– Honeywell leadership
– Expanding membership
Requirements for Safe Operation
• Hazards must be recognized and understood
• Equipment must be “fit for purpose”
• Systems and procedures to maintain plant integrity
• Competent staff
• Emergency preparedness
• Monitor performance
In the area of alarm management, most companies fail to meet these basic requirements for safe operation.
Various cost elements
[Figure: plant performance axis with the levels Shut down, Break-even, Operating Target, Current Limit, and Theoretical Limit. Labeled regions: Fixed Costs (Idle Plant); Efficiency Loss; Profit; Comfort Margin, shown as lost opportunity (cost of comfort) and lost profit; future upgrades (e.g., Advanced Control), theoretically possible but currently unsustainable. Callouts: savings from reducing the comfort margin; incident and accident losses (lost revenue, additional unplanned costs, equipment damage, etc.), about 10% of operating costs.]
A Look At Plant Operations
A typical production profile for an asset-intensive facility over a calendar year.
[Figure: histogram of days per year against daily production, from < 60% to 100% of the production target set by the enterprise, with 95% marked. Bar labels: 95, 79, 62, 47, 30, 23, 16, 8, and 5 days.]
Factors Affecting Plant Operations
[Figure: days per year against daily production (< 60%, 95%, 100%). Labels: Plant Operating Target, Plant Capacity Limit, Planning Constraints, Operational Constraints, Plant Availability, Plant Incidents, Production Effectiveness, Asset Utilization, Agility/Flexibility.]
Real Life Examples
[Figure: production histograms of frequency (# days) against rate, feed, total feed, or production rate. Percentages marked on the histograms: 3.2% and 5.8%. Callouts:]
• This plant had $24.2M in lost capacity due to asset availability & incidents!
• This plant had 5.8% in lost capacity!
• This plant lost $38.5M!
• And this plant lost $33.5M!
Site Studies Have Identified Plant Lost Opportunity
Lost capacity of 3-15% is attributed to asset unavailability and incidents.
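As a rough illustration of how a lost-capacity percentage of this kind could be derived from daily production data, the sketch below compares actual daily output against a target rate. The function name, the sample values, and the constant target are hypothetical; the source does not describe the method used in the site studies.

```python
def lost_capacity_percent(daily_production, target_rate):
    """Return lost capacity as a percentage of what the target would have produced.

    daily_production: actual daily output values (any consistent unit).
    target_rate: planned daily output in the same unit (assumed constant).
    Days above target are capped so they do not offset bad days.
    """
    days = list(daily_production)
    if not days or target_rate <= 0:
        raise ValueError("need at least one day and a positive target")
    achieved = sum(min(d, target_rate) for d in days)
    possible = target_rate * len(days)
    return 100.0 * (possible - achieved) / possible


# Hypothetical 10-day sample: eight days on target, one partial day, one outage.
sample = [100] * 8 + [50, 0]
loss = lost_capacity_percent(sample, target_rate=100)
print(f"{loss:.1f}% lost capacity")
```

Capping each day at the target is a design choice made here so that over-target days cannot hide outage days; a site could equally measure against the plant capacity limit instead.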
[Figure: days per year against daily production (< 60%, 95%, 100%), showing Plant Operating Target, Plant Capacity Limit, Planning Constraints, Operational Constraints, Plant Availability, and Plant Incidents. Areas marked: Production Management; Asset Management (NEW EMPHASIS!!); Manufacturing Execution. Supporting systems named: DCS/APC/optimization efforts; reliability & CMMS; scheduling & ERP.]
Major Profit Potential
Emphasis on plant &
equipment reliability
improvements and reduced
incidents can result in a
recovery of 3-15% of
lost capacity!
[Figure: days per year against daily production (< 60%, 95%, 100%), with a higher plant operating target, fewer planning constraints, and fewer operational constraints below the plant capacity limit.]
The Importance of Alarm Management
Improvement Project
Alarm management is the proper
design, implementation,
operation, and maintenance
of industrial manufacturing
plant alarm systems.
Current alarming practices are leading to
Incidents
Major problem is: alarm flood
Standing Alarms
Poor Configuration of Alarms
Nuisance Alarms
Technology exists to significantly
contribute to effective alarm systems
and provide good Situation Awareness
Alarms identified as contribution
A Case
The lightning struck just before 9:00 AM on a Sunday. It
immediately started a fire in the crude distillation unit of the
refinery. The control operators on duty responded by calling out
the fire brigade, and then had to divert their attention to a growing
number of alarms while desperately trying to bring the crude unit
to a safe emergency shutdown.
Hydrocarbon flow was lost to the deethanizer in the FCCU recovery
section, which fed the debutanizer further along. The system was
arranged to prevent total loss of liquid level in the two vessels, so
the falling level in the deethanizer caused the deethanizer discharge
valve to close. This, in turn, caused the level in the debutanizer to
drop rapidly and its discharge valve also closed. Heat remained on
the debutanizer and the trapped liquid vaporized as the pressure
rose causing the pressure relief valve to “pop” (for the first of three
times) into the flare KO drum and then immediately onto the flare
itself.
continued
In a matter of minutes, the board operator was able to restore flow to
the deethanizer. This permitted the deethanizer discharge valve to
be opened, allowing renewed flow forward to the debutanizer. The
rising level in the debutanizer should have caused the debutanizer
discharge valve to open (by the level controller action) and allow
flow on to the naphtha splitter. Although the operators in the
control room received a signal indicating the valve had opened, the
debutanizer nonetheless was filling rapidly with liquid while the
naphtha splitter was emptying. The operators were concentrating
on the displays which focussed on the problems with the
deethanizer and debutanizer, and had no overview of the process
available to indicate that even though the debutanizer discharge
valve registered as open, there was no flow going from the
debutanizer to the naphtha splitter.
Despite attempts to divert the excess, the debutanizer became
liquid-logged about an hour later and the pressure relief valve
lifted for the second time, venting to the flare via the flare KO
drum. Because there were enormous volumes of gas venting, the
level of liquid in the flare KO drum was rising to a very high
value.
About 2-1/2 hours later, the debutanizer vented to the flare a third
time AND CONTINUED VENTING FOR 36 MINUTES. The
high level alarm for the flare drum was activated at this time.
But with alarms going off every 2 to 3 seconds, there appears to
be no evidence that that alarm was ever seen. By this time, the
flare KO drum had filled with liquid well beyond its design
capacity. The fast-flowing gas through the overfilled drum
forced liquid out of the drum’s discharge pipe. The discharge
line was not designed for liquid, so the force of the liquid caused
a rupture at an elbow. This released over 20 tons of highly
flammable hydrocarbon.
continued
The ensuing release quickly formed an ominous
drifting cloud of vapor and droplets. In a matter
of minutes, this cloud found its ignition source
350 feet downwind. The resulting explosion was
heard 80 miles away. In the town nearest the
plant, few windows still held intact panes, so
overpowering was the pressure shock wave from
the blast. The last fires in the refinery were
eventually extinguished 2 days later.
end
[Figure: accident causation model at the interface between the organization and the individual. Levels: Management, Organization, Workplace, Individual. Labels: Source Failure Types, Functional Failure Types, General Failure Types, Stylistic or Cultural, Condition Tokens, Precursors, Unsafe Acts, Errors & Violations, Indicators.]
[Figure: safety management overview. Top Down: Commitment, Competence, Cognizance. Data collected & analyzed: Accidents, Incidents, Near-Misses, 1-10 hit list. Precursors listed: poor workplace design, high workload, unsociable hours, inadequate training, poor perception of hazards. Other labels: Proactive Design, SI Projects, Alarms, Human Factors, Near miss, Auditing, Du Pont, Training, Workspace, Motivation, Attitude, Safety Information System, Diagnostic and remedial measures, Best Practices, Control room design, Group Factors, Working Practice.]
Managing Abnormal Situations
Anatomy of a Disaster from Operations Perspective
• Operational Modes: Normal, Abnormal, Emergency
• Plant States: Normal, Abnormal, Out of Control, Accident, Disaster
• Critical Systems:
– Process Equipment, DCS, Automatic Controls
– Plant Management Systems
– DCS Alarm System
– Decision Support System
– Safety Shutdown, Protective Systems, Hardwired Emergency Alarms
– Physical and Mechanical Containment System
– Site Emergency Response System
– Area Emergency Response System
• Operational Goals: Keep Normal, Return to Normal, Bring to Safe State, Minimize Impact
• Plant Activities: Preventative Monitoring & Testing, Manual Control & Troubleshooting, Firefighting, First Aid, Rescue, Evacuation
Unexpected Upsets Cost 3-8% of Capacity
[Figure: summarized production data. Histograms of frequency (# days) against rate, feed, total feed, and production rate, with callouts 3.2%, 5.8%, $24.2M, $38.5M, and $33.5M, shown with the days-per-year against daily-production profile (Plant Operating Target, Planning Constraints, Operational Constraints, optimization efforts, Plant Capacity Limit).]
~ $10 billion annually in lost production!
Major Profit Potential
Focused efforts can result in recovery of 3-8% of capacity.
[Figure: days per year against daily production (< 60%, 95%, 100%), with a higher plant operating target, fewer planning constraints, and fewer operational constraints below the plant capacity limit.]
~ $10 billion potential to the bottom line!
Timing diagram of DIN V 19251, as applicable to a single-channel SRS with ultimate self-tests executed within the PST
[Figure: timeline (t) with three events: failure occurrence in the process or in the safeguarding system; failure is detected; safe status of the process assured. Intervals marked: system internal diagnostic time; time for corrective action; time for reaction of the process on the corrective action. The overall span is labeled Fault Tolerance Time: the fault tolerance time of the process, or Process Safety Time (PST).]
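The timing relationship on this slide can be checked with simple arithmetic: the diagnostic time, the corrective-action time, and the process reaction time together have to fit inside the process safety time. The sketch below is a minimal illustration; the function name and all the numbers are hypothetical and are not taken from DIN V 19251.

```python
def fits_process_safety_time(diagnostic_s, corrective_s, reaction_s, pst_s):
    """Check that the three response phases finish within the process safety time.

    All arguments are durations in seconds. Returns (ok, margin), where margin
    is the time left over (negative if the PST is exceeded).
    """
    total = diagnostic_s + corrective_s + reaction_s
    return total <= pst_s, pst_s - total


# Hypothetical example: 2 s diagnostics, 5 s corrective action,
# 10 s process reaction, against a 30 s process safety time.
ok, margin = fits_process_safety_time(2.0, 5.0, 10.0, pst_s=30.0)
print(ok, margin)
```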
Reliability Requirements for Alarms
Claimed PFDavg: 1 – 0.1
• Alarm system integrity/reliability requirements:
– Alarms may be integrated into the process control system
• Human reliability requirements:
– No special requirements; however, the alarm system should be operated, engineered, and maintained to the good engineering standards identified in the EEMUA Guide
EEMUA Alarm Systems Guide, page 17
CONCEPT 1: RISK REDUCTION
[Figure: scale of increasing risk marking the actual remaining risk, the risk to meet the required level of safety, and the EUC risk. Spans shown: necessary minimum risk reduction (ΔR) and actual risk reduction. The risk reduction achieved by all SRSs and external risk reduction facilities is made up of the partial risk covered by E/E/PES SRSs, the partial risk covered by other technology SRSs, and the partial risk covered by external risk reduction facilities.]
SAFETY INTEGRITY LEVELS
TABLE 2: SAFETY INTEGRITY LEVELS: TARGET FAILURE MEASURES

SAFETY INTEGRITY   DEMAND MODE OF OPERATION          CONTINUOUS/HIGH DEMAND MODE OF OPERATION
LEVEL (SIL)        (Average probability of failure   (Average probability of a dangerous
                   to perform its design function    failure per year)
                   on demand)
4                  10^-5 to < 10^-4                  10^-5 to < 10^-4
3                  10^-4 to < 10^-3                  10^-4 to < 10^-3
2                  10^-3 to < 10^-2                  10^-3 to < 10^-2
1                  10^-2 to < 10^-1                  10^-2 to < 10^-1
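For readers who want to apply the demand-mode column, the sketch below looks up the SIL band that contains a given average probability of failure on demand. The function name is hypothetical; the band limits are the ones in the table, with the lower limit treated as inclusive and the upper limit as exclusive.

```python
def sil_for_demand_mode(pfd_avg):
    """Return the SIL (1-4) whose demand-mode band contains pfd_avg, else None."""
    # (SIL, lower limit inclusive, upper limit exclusive), from the table above.
    bands = [
        (4, 1e-5, 1e-4),
        (3, 1e-4, 1e-3),
        (2, 1e-3, 1e-2),
        (1, 1e-2, 1e-1),
    ]
    for sil, low, high in bands:
        if low <= pfd_avg < high:
            return sil
    return None  # outside the range covered by the table


print(sil_for_demand_mode(0.05))
print(sil_for_demand_mode(0.5))
```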
Reliability Requirements for Alarms
Claimed PFDavg: 0.1 – 0.01
• Alarm system integrity/reliability requirements:
– Alarm system should be designated as safety related and categorized as SIL 1
– Alarm system should be independent from the process control system
• Human reliability requirements:
– The operator should be trained in the management of the specific plant failure that the alarm indicates
– The alarm presentation arrangements should make the claimed alarm very obvious to the operator and distinguishable from other alarms
– The alarm should remain on view to the operator for the whole of the time it is active
EEMUA Alarm Systems Guide, page 17
Reliability Requirements for Alarms
Claimed PFDavg: below 0.01
• Alarm system integrity/reliability requirements:
– Alarm system would have to be designated as safety related and categorized as at least SIL 2
• Human reliability requirements:
– It is not recommended that claims for a PFDavg below 0.01 are made for any operator action, even if it is multiple alarmed and very simple
For all credible accident scenarios, the designer should demonstrate that the total number of safety related alarms and their maximum rate of presentation do not overload the operator.
EEMUA Alarm Systems Guide, page 17
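The three claimed-PFDavg bands on these slides can be summarized as a simple lookup. The sketch below is illustrative only: the function name and the returned wording are hypothetical, and the treatment of the exact boundary values 0.1 and 0.01 is an assumption, since the slides do not say which band they fall in.

```python
def alarm_integrity_requirement(claimed_pfd_avg):
    """Map a claimed PFDavg for an alarm to the requirement band on the slides."""
    if not 0 < claimed_pfd_avg <= 1:
        raise ValueError("PFDavg must be in (0, 1]")
    # Assumption: 0.1 itself is placed in the "1 - 0.1" band and
    # 0.01 itself in the "0.1 - 0.01" band.
    if claimed_pfd_avg >= 0.1:
        return "no special requirements; may be integrated into the process control system"
    if claimed_pfd_avg >= 0.01:
        return "safety related, SIL 1, independent from the process control system"
    return "safety related, at least SIL 2; claim not recommended for operator action"


print(alarm_integrity_requirement(0.05))
```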
The Setting of a High Pre-trip Alarm
[Figure: alarmed variable against time during a fault. Labels: limit of largest normal operational fluctuation; alarm setting; abnormal operating region; limit at which protection operates; maximum rate of change of alarmed variable during fault; time for operator to respond to alarm and correct fault; points A and B.]
EEMUA Alarm Systems Guide, page 17
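The trade-off behind this slide can be put into numbers: the alarm has to sit above the largest normal fluctuation, yet far enough below the protection limit that the operator has time to respond while the variable moves at its maximum fault rate. The sketch below assumes a constant rate of change and uses hypothetical function names and values; it is not a formula quoted from the EEMUA guide.

```python
def latest_alarm_setting(trip_limit, max_rate, response_time):
    """Highest alarm setting that still gives the operator response_time
    before a variable rising at max_rate reaches trip_limit."""
    return trip_limit - max_rate * response_time


def check_high_alarm(setting, normal_max, trip_limit, max_rate, response_time):
    """Return (above_normal_fluctuation, enough_time_before_trip)."""
    return (setting > normal_max,
            setting <= latest_alarm_setting(trip_limit, max_rate, response_time))


# Hypothetical numbers: trip at 100 units, fault rate 0.5 units/s, 60 s to respond.
limit = latest_alarm_setting(100.0, 0.5, 60.0)
print(limit)
print(check_high_alarm(65.0, normal_max=60.0, trip_limit=100.0,
                       max_rate=0.5, response_time=60.0))
```

When no setting satisfies both checks, the plant is in the situation that the redesign choices address.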
[Figure: gas concentration (percentage of LEL, 0 to 120) against time after onset of fault (0 to 80 seconds). Curves: actual gas concentration and measured gas concentration, with the difference labeled error. Marked levels: gas concentration prior to fault (normal operating level), set trip point, actual trip point, Lower Explosive Limit (LEL), and explosion. Marked intervals after the fault occurs: sampling delay, sensor delay, error delay, and shut down system delay.]
Redesign Choices
• Redesign: redesigning the plant or its controls to provide greater margin between the normal operating limits and the trip limits. This is the most desirable solution but is often impractical or too expensive.
• Setting within normal operating limits: setting the alarm within the limits of normal operating fluctuations and accepting that spurious alarms will occur during large normal disturbances. This is ergonomically very undesirable and will tend to increase alarm rates and reduce the operator's confidence in the alarm system. In effect, it increases the Average Probability of Failure on Demand (PFDavg) for the alarm system as a whole.
• Setting nearer trip limits: setting the alarm closer to the trip limits and accepting that some fast transients will not be corrected by the operator before they reach the trip level. This will increase the production losses due to plant trips and, because there are more demands on the protection system, tend to make the plant less safe. It also implies an increased PFDavg for the alarm system.
EEMUA Alarm Systems Guide, page 17
Different Kinds of Events
[Figure: potential impact of initiating event against time for three kinds of events: abrupt/catastrophic, manageable, and insidious.]
Impact of DCS Alarm System
Awareness of Disturbances
With typical alarm systems, orienting begins after an event creates an abnormal plant state.
The extent of the problem can impact the operator’s ability to be fully aware of the locations of process disturbances.
As disturbances propagate, the number of conditions to be aware of increases, as do the response requirements and the likelihood of missing important information.
[Figure: potential impact of initiating event against time, rising toward an incident. Marked points: failure occurrence in the process or in the safeguarding system; failure is detected; point of operator awareness; correct intervention causes return to normal; safe status of the process assured.]
Impact of DCS Alarm System
Management of Problems
[Figure: potential impact of initiating event against time, rising toward an incident, with the point of operator awareness and the point where correct intervention causes return to normal. Annotations: standing alarms interfere with Orientation; alarm floods delay Evaluation; inadequate filtering interferes with Action.]
Impact of Good Alarm Management in Situation Awareness
• Increases likelihood of awareness of disturbances
• Reduces time to awareness
• Hence, reduces the average impact of initiating events
[Figure: potential impact of initiating event against time, showing the average shift in awareness with decision support.]
Impact of Protection System
[Figure: impact of initiating event against time, with SAFE and UN-SAFE regions. Levels marked: Profit, Quality, Loss, High Alarm, Emergency Alarm, Trip (trip from SIS), Incident. Intervals marked: operator diagnostic time, Process Safety Time, and FTT (Fault Tolerance Time).]
[Figure: potential impact of initiating event against time for four operator responses: no response, incorrect, suboptimal, and best.]
Impact of Decision Support System
Support for Optimal Response
• Reduces errors
• Decreases time to implement response
• Manages side effects
• Increases awareness
[Figure: potential impact of initiating event against time.]
ASM Alarm Management Solutions
• Education for Management, Engineers, Technicians and Operators.
• Alarm Performance Assessment.
• Requirement for alarm optimization tools.
• Alignment with Company & EEMUA
Guidelines.
• Alarm Rationalization.
• User Interface Design.
• Decision Support Activities
Alarm Management Optimization
Objectives
• Enhance operator effectiveness
– Avoid alarm floods
– Identify root causes
– Eliminate nuisance alarms
• Enhance profitability
– Reduce variability
– Maximize plant up time
– Prevent damage to equipment
• Reduce risk of :
– Injury to personnel
– Environmental incidents
Alarm Management Optimization
The Process
[Figure: process cycle with the steps Collect Data; Analyze; Develop Plant Alarm Management Standards & Philosophy; Identify Enhancements; Implement; Verify Against Standards; and Change Management.]
Alarm Management Optimization
• Increase the effectiveness of the existing alarm system through proven methodology
– Analyze existing system performance
– Assist in developing an alarm strategy and educating operations staff
– Rationalize existing alarm system
• Recommend and apply new alarm management software
– UserAlert
– Optimization Suite
• Alarm Rationalization and Documentation
• Alarm Metrics and Analysis
• Advanced Alarm Handlers
Alarm Management
[Figure: Before: 30 points account for approximately 85% of all alarms. Pareto chart of number of alarm events by tag, with cumulative percentage of total alarm events.]
[Figure: After: 30 points account for approximately 52% of all alarms. Alarm distribution by tag after removing 30 nuisance alarms from the data set; number of alarm events by tag, with cumulative percentage of total alarm events.]
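A figure such as "30 points account for approximately 85% of all alarms" comes from a Pareto count of alarm events by tag. The sketch below shows one way such a share could be computed from an event log; the function name and the tag names are hypothetical, and this is not a description of the Optimization Suite's own analysis.

```python
from collections import Counter


def top_n_share(alarm_tags, n=30):
    """Return (top_tags, share), where share is the fraction of all alarm
    events produced by the n most frequent tags."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    top = counts.most_common(n)
    return top, sum(count for _, count in top) / total


# Hypothetical event log: one entry per alarm event, identified by tag name.
events = ["TAG_A"] * 6 + ["TAG_B"] * 3 + ["TAG_C"]
top, share = top_n_share(events, n=2)
print(top, f"{share:.0%}")
```

Running the same count before and after removing nuisance alarms gives the kind of before/after comparison shown on the slide.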
Optimization Suite…
Alarm Rationalization
• Alarm priority (class) is based on severity and
level of impact and time
• Available priority options in TPS:
– No Action
– Journal
– Print
– Print & Journal
– Low
– High
– Emergency
Optimization Suite…
Alarm Rationalization
• Recommends alarm priorities based on plant
philosophy
– Severity of impact
– Time to respond
– Trip Point
• Electronically captures plant alarm
management philosophy
– Time to respond rules definition
– Impact and severity rules definition
• Apply manual priority override
• Use Alarm Impact Templates
• Generate EC Files (Honeywell)
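To make the idea of rule-based priority recommendation concrete, the sketch below derives a priority from severity of impact and time to respond, with a manual override. The priority names are taken from the TPS list earlier in the deck; the severity classes, the 10-minute threshold, the rule table, and the function name are all hypothetical and do not represent the Optimization Suite's actual rules.

```python
# Hypothetical rule table: (severity, response window) -> priority.
# The mapping is an illustrative assumption, not a vendor or plant rule set.
RULES = {
    ("minor", "slow"): "Journal",
    ("minor", "fast"): "Low",
    ("major", "slow"): "Low",
    ("major", "fast"): "High",
    ("severe", "slow"): "High",
    ("severe", "fast"): "Emergency",
}


def recommend_priority(severity, minutes_to_respond, override=None, fast_below=10):
    """Recommend a priority from the rule table; a manual override wins."""
    if override is not None:
        return override
    window = "fast" if minutes_to_respond < fast_below else "slow"
    return RULES[(severity, window)]


print(recommend_priority("severe", 5))
print(recommend_priority("minor", 30, override="High"))
```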