CM for Digital and Digital Upgrades


Digital Modifications and Configuration Control of Digital Systems
John Connelly
Exelon Generation
Engineering Manager – Capital Projects
OPEX – Digital Challenges
• Implementation of digital modifications is an industry-wide issue:
  • IER 11-02 identifies an adverse trend in SCRAMS between 2005 and 2010
  • 43 SCRAMS (35%) were the result of flawed implementation of Design Changes involving digital technology
• INPO 10-008 examined events from 2003 to 2007:
  • 17 SCRAMS from software malfunctions resulted in a loss of 1.6 million MWh
  • 24 SCRAMS from hardware malfunctions resulted in a loss of 3.1 million MWh
  • Significant operational and safety challenges
  • A modest $50 / MWh yields an industry-wide cost of ~$200M
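The cost figure follows directly from the lost generation quoted on this slide. A quick check of the arithmetic, using only the slide's own numbers (the variable names are illustrative):

```python
# Lost generation reported by INPO 10-008 (events from 2003 to 2007), in MWh
SOFTWARE_LOSS_MWH = 1.6e6   # 17 SCRAMS from software malfunctions
HARDWARE_LOSS_MWH = 3.1e6   # 24 SCRAMS from hardware malfunctions
PRICE_PER_MWH = 50.0        # the "modest" price assumed on the slide

total_loss_mwh = SOFTWARE_LOSS_MWH + HARDWARE_LOSS_MWH
total_cost_usd = total_loss_mwh * PRICE_PER_MWH

print(f"Lost generation: {total_loss_mwh / 1e6:.1f} million MWh")
print(f"Cost at ${PRICE_PER_MWH:.0f}/MWh: ${total_cost_usd / 1e6:.0f}M")
```

The product is 4.7 million MWh x $50 = $235M, which the slide rounds to roughly $200M.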
Common Threads
Irrespective of utility, most events share two common
themes:
• Flaws in the processes by which digital modifications are
implemented
• Inadequate knowledge of the complex technologies and
techniques common to nearly all digital modifications
Changes To INPO Evaluation Process - CM
• Performance Objectives for the Design Change Process (CM.3) are under revision
• Future INPO evaluations will include a review of the processes by which you manage the unique characteristics of digital technology. This includes:
  • Development and control of procurement specifications
  • Software
  • Vendor interfaces
  • Testing
  • Validation
  • Failure Modes and Effects Analysis
Changes To INPO Evaluation Process - Knowledge
• Application of digital technology requires very different and specialized skills to implement correctly
• INPO ACAD 98-04, Rev 2 introduces the entity of “Digital Engineer”
• Engineers assigned to work independently on digital projects must be qualified to ACAD 98-04, Rev 2 by March 2013
• Training evaluations conducted after March 2013 will be in accordance with the requirements of ACAD 98-04, Rev 2
Knowledge and Process Inventory
• Digital technology, while superior in nearly every dimension to analog technology, requires very different competencies and processes:
  • Software engineering
  • Hardware design
  • Exception / Fault / Error Handling / Recovery
  • Networking
  • Cyber Security
  • Human Factors Engineering
  • Advanced analysis techniques (FMEA / SHA / CDR)
  • EMI / RFI
  • Interfacing systems knowledge
  • Plant Operations
  • Testing / Dynamic response analysis
  • Life-Cycle Management
Key Takeaway…
Digital Is Different!
• Engineering processes for “conventional” modifications do not, by
themselves, provide an adequate defense against errors and
events
• Requires very different skills to implement correctly
• Your design processes will be evaluated against this reality
Exelon Digital Modification
Processes
Exelon Internal OPEX
• A series of events beginning in 2005 made it clear that improvement opportunities existed
• The Quad Cities Reactor Recirculation Adjustable Speed Drives (ASD) provide a representative example of the challenges:
  • Approximately 150 Issue Reports
  • Manual scram, power reductions and operational challenges
• Principal findings from the CCA:
  • Latent design flaws in vendor products
  • FMEA did not detect design issues
  • Excessive reliance on vendors
  • Testing failed to uncover issues
• Similar experiences with other modifications
Redesigning the process at Exelon
• Formed Corporate Capital Projects Group to oversee large, multi-site digital modifications (RRASD, DEH, MPT, TDFWP, BOP 7300…)
• Staffed with subject matter experts on digital technology
• CPG works closely with implementing engineers at the sites who manage the EC development process
• Advanced training provided to site and corporate digital I&C engineers to jump-start performance
• Procedures and processes revised to capture best practices – process improvements will continue indefinitely as practices continue to mature
Exelon Digital Modification Process
• The existing Configuration Control process is now supplemented with procedures that address the unique attributes of digital technology:
  • Management Of Digital Modifications
  • Digital Design Considerations
  • Design Attributes For Digital Systems
  • Software Development
  • Digital Procurement Process
  • Factory Acceptance Testing
  • Cyber Security
• The process continues to evolve as Cyber Security requirements are implemented and additional best practices are identified
Typical Processes For
Large Digital Projects
Typical Project Lifecycle
Procurement Specifications
• The act of fully defining detailed vendor requirements commensurate with project safety significance, operational risk and project scope.
• Specifically identifying documentation and performance requirements for a given project including (but not limited to):
  • Verification and Validation (V&V) requirements
  • Software Quality Assurance measures
  • Hardware design requirements (including Single Point Vulnerabilities)
  • Failure Modes and Effects Analysis (FMEA) requirements
  • Software testing and validation requirements
  • Cyber Security requirements
  • Life Cycle Management (LCM) requirements
• Time invested in the development of a detailed procurement specification improves project execution by avoiding unbudgeted scope changes
The role of software design –
A brief case study
Perfect Software Does Not Exist
• No system will ever be perfect no matter how rigorous the development process used or amount of money spent to develop and maintain it – humans develop software and humans will always make mistakes
• Highly automated systems effectively move the point of error from the user (Operations and Maintenance) to the programmer but human error still exists
• The Space Shuttle flight control system was arguably the most rigorously developed and tested control system ever conceived
  • 400,000 words (very small footprint compared to a modern DCS)
  • $100,000,000 per year in maintenance
  • Over the 25-year shuttle program, 16 Severity Level 1 software issues were identified – SL1 issues are those that would result in the loss of the orbiter under the right conditions
How do software driven systems malfunction?
• Software malfunctions are systemic, not random
• In the absence of a hardware-induced fault, instructions will execute exactly as written, unerringly and without exception
• Software malfunctions require the simultaneous existence of two conditions:
  • An error must be present (often undetected)
  • An initiating event must occur
• Unless both conditions are satisfied, no malfunction will occur
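The two-condition model can be illustrated with a deliberately small sketch. The function below is hypothetical and not taken from any plant system: it carries a latent error (no guard for an empty input) that stays invisible until the initiating event (no samples arrive) occurs, at which point it fails the same way every time.

```python
def average_flow(samples):
    """Return the mean of a list of flow readings.

    Latent error: nothing guards against an empty list.
    """
    return sum(samples) / len(samples)


def try_average(samples):
    """Run average_flow and report whether it malfunctioned."""
    try:
        return ("ok", average_flow(samples))
    except ZeroDivisionError:
        return ("malfunction", None)


# Error present, no initiating event: behaves correctly every time.
print(try_average([2.0, 4.0]))
# Error present AND initiating event (no samples arrive): fails every time.
print(try_average([]))
```

Repeating either call gives the same result, which is the sense in which the malfunction is systemic and not random.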
A Representative Example From Aerospace
• The Event:
  • A completed commercial airliner is about to be delivered to the customer
  • A Factory Acceptance Test is being conducted by factory and customer personnel in which the parking brakes are applied and all four engines are taken to maximum continuous thrust
  • At this power setting and altitude (zero feet) the flight control system automatically selects “takeoff” mode as designed
  • The flight control system correctly recognizes that the wing surfaces are incorrectly configured for a takeoff and continuously sounds the Ground Proximity Warning (GPW) alarm as designed – this alarm is critical and cannot be silenced
  • A technician, irritated by the alarm and unable to silence it, trips the feed breaker for the GPW system knowing that this will de-energize the alarm
  • Ground proximity radar loses power and clears the zero altitude interlock
  • With the interlock cleared, the control system now concludes the plane is in the air and releases the brakes – this is a programmed behavior to prevent landing the aircraft with the brakes set
  • Plane immediately accelerates (no passengers or luggage and little fuel) and strikes the jet blast barrier at full power
The results
A Representative Example From Aerospace
Software malfunction requires two conditions:
• The error must be present and undetected:
  • This application software had been in service for years and “ground run-up” tests are somewhat routine
• The initiating event must occur:
  • The loss of supply voltage to the GPW interlock caused the brakes to release exactly as they were programmed to do
  • The software development team never envisioned this combination of events
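The interlock behavior described in this example can be sketched in a few lines. This is a hypothetical reconstruction for illustration only, not the aircraft's actual logic: the function names are invented, and a de-energized radar is modeled as `None`.

```python
def on_ground(radar_altitude_ft):
    """Zero-altitude interlock: set only while the radar reports zero feet.

    A de-energized radar (modeled here as None) reports nothing,
    so the interlock clears.
    """
    return radar_altitude_ft is not None and radar_altitude_ft <= 0.0


def brakes_released(radar_altitude_ft, parking_brake_set):
    """Programmed behavior: never hold the brakes while 'airborne'."""
    if not on_ground(radar_altitude_ft):
        return True               # prevents landing with the brakes set
    return not parking_brake_set  # on the ground, honor the parking brake


# Ground run-up with the radar powered: brakes stay applied.
print(brakes_released(0.0, parking_brake_set=True))
# Breaker tripped during the run-up: interlock clears, brakes release.
print(brakes_released(None, parking_brake_set=True))
```

Each function does what it was written to do. The hazard only appears when loss of sensor power is combined with a ground run-up, the combination the development team never envisioned.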
Software evolves during the development process
Changes can invalidate previous testing or introduce new errors
The Interfaces To Cyber Security
Integration With Cyber Security Requirements
• The Cyber Security Rule (10 CFR 73.54) is a license condition that applies to any digital component that is:
  • Safety Related
  • Important To Safety (defined as reactivity impact)
  • Physical Security
  • Emergency Preparedness
  • Systems that support any of the above
  • Systems with pathways of connectivity to any of the above
• Significant synergies exist between the Digital I&C process and Cyber Security
• Consider the extent to which these processes are interconnected and aware of each other
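The last two scoping criteria (support systems and pathways of connectivity) mean that scope spreads across links between components. A minimal sketch of that idea follows, assuming links are undirected and that scope propagates through any chain of links. Both are simplifying assumptions, and the tags and component names are hypothetical; actual scoping follows the site's Cyber Security program, not this code.

```python
from collections import deque

# Function tags taken from the slide's first four criteria (names are illustrative)
DIRECT_SCOPE = {"safety_related", "important_to_safety",
                "physical_security", "emergency_preparedness"}


def components_in_scope(components, links):
    """Return names of components that are in scope.

    components: dict of name -> set of function tags
    links: iterable of (name_a, name_b) support or connectivity pairs
    """
    neighbours = {name: set() for name in components}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Start from directly scoped components, then follow every pathway.
    scoped = {n for n, tags in components.items() if tags & DIRECT_SCOPE}
    queue = deque(scoped)
    while queue:
        current = queue.popleft()
        for other in neighbours[current]:
            if other not in scoped:
                scoped.add(other)
                queue.append(other)
    return scoped


components = {
    "protection_system": {"safety_related"},
    "data_gateway": set(),
    "plant_historian": set(),
    "standalone_recorder": set(),
}
links = [("protection_system", "data_gateway"),
         ("data_gateway", "plant_historian")]
print(sorted(components_in_scope(components, links)))
```

In this example the historian is pulled into scope only through the gateway, while the unconnected recorder stays out.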
Cyber / Digital Relationship
Testing Considerations
Factory Acceptance Testing
• Many test plans focus on “positive testing”, which confirms expected responses for a given set of inputs or stimulus conditions – informative but only to a point
• Negative testing focuses on verifying that you don’t get an unexpected response when you combine unusual stimuli or do something outside of normal operation – effectively it’s an attempt to trigger a malfunction, which can be very informative
• It’s nearly inevitable that over the life of a system, it will be operated in a way the designers never anticipated. Take advantage of unstructured testing opportunities (i.e. pre-FAT) to attempt to “break” the system early in the development cycle while there is ample opportunity to take corrective action for issues identified
• Process needs to involve Operators, System Engineers and SMEs
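The difference between positive and negative testing can be shown on a toy function. Everything here is hypothetical: `speed_demand` stands in for any routine that accepts an operator setpoint, and the limits are invented for the example.

```python
import math


def speed_demand(setpoint_pct):
    """Hypothetical routine: clamp a setpoint to 0..100 %, reject bad input."""
    if isinstance(setpoint_pct, bool) or not isinstance(setpoint_pct, (int, float)):
        raise ValueError("setpoint must be a number")
    if not math.isfinite(setpoint_pct):
        raise ValueError("setpoint must be finite")
    return min(100.0, max(0.0, float(setpoint_pct)))


def is_rejected(value):
    """Helper: True if speed_demand refuses the value."""
    try:
        speed_demand(value)
    except ValueError:
        return True
    return False


# Positive tests: expected inputs give expected responses.
assert speed_demand(50) == 50.0
assert speed_demand(0) == 0.0

# Negative tests: unusual inputs must not produce an unexpected response.
assert speed_demand(150) == 100.0   # out of range high is clamped
assert speed_demand(-5) == 0.0      # out of range low is clamped
for bad in (float("nan"), float("inf"), None, "50"):
    assert is_rejected(bad)         # garbage is refused, not passed through

print("all tests passed")
```

The positive cases alone would pass even if the function had no input checks at all. Only the negative cases exercise the conditions the designers may not have anticipated.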
Modification Acceptance Testing
• Most modification issues are not with the systems themselves but rather with interfaces to installed plant hardware (power / hydraulics / supporting systems / actuators / protective devices / EMI / RFI…)
• The Mod Acceptance Test (MAT) is the very first time the system will be tested in the plant environment. In some cases it will be the first time that the system is connected to any physical components and therefore represents the first opportunity to identify and correct interface issues – care should be taken to exercise every interface to the extent possible and as early as possible
• All models are wrong – this includes your plant simulator and vendor simulation models. In-plant testing is therefore critical and your most robust line of defense
Post Installation Configuration Control
Ongoing Configuration Control
• One of the advantages of digital systems is that they are easily modifiable – this also constitutes a vulnerability if not taken into consideration by the process
  • Processes need to exist to detect any inadvertent changes to a system’s configuration
  • “Baseline / Compare” utilities can be used to compare system states with a known and approved baseline configuration
  • Periodic audits of log, system and event files
  • Surveillance testing
  • Defined protocols for testing of authorized modifications (i.e. regression testing)
• Not all changes are modifications
  • Changes to calibration constants controlled in accordance with maintenance procedures
  • Pre-evaluated adjustments (tuning within defined boundaries)
  • Specific changes for Cyber Security incident response in accordance with CS procedures
• Reference EPRI Topical Report 1022991 – “Guideline On Configuration Management For Digital Instrumentation And Control Equipment And Systems”
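A “Baseline / Compare” check can be as simple as fingerprinting each configuration item and diffing the result against the approved set. The sketch below assumes the configuration is available as named byte strings (for example, exported files). The function names and sample items are hypothetical and do not describe any specific vendor utility.

```python
import hashlib


def fingerprint(config_items):
    """Map each configuration item name to the SHA-256 digest of its content."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in config_items.items()}


def compare_to_baseline(baseline, current):
    """Report items added, removed or changed relative to the baseline."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(name for name in baseline.keys() & current.keys()
                          if baseline[name] != current[name]),
    }


# Approved baseline captured after acceptance testing (sample data).
baseline = fingerprint({
    "controller.cfg": b"gain=1.20\nreset=0.50\n",
    "alarms.cfg": b"hi_limit=95\n",
})

# Later snapshot: one value altered, one unexpected file present.
current = fingerprint({
    "controller.cfg": b"gain=1.25\nreset=0.50\n",
    "alarms.cfg": b"hi_limit=95\n",
    "debug.cfg": b"trace=on\n",
})

print(compare_to_baseline(baseline, current))
```

A report with any non-empty list flags a difference to be reconciled against authorized changes, such as controlled calibration constants or pre-evaluated tuning adjustments.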
Questions?