Transcript Slide 1
ME 4054W: Design Projects
RISK MANAGEMENT

Lecture Topics
• What is risk?
• Types of risk
• Risk assessment and management techniques
2

Risk Trivia
• The study of risk as a science started during the Renaissance in the 16th century.
• The initial impetus was from the financial world.
• The word “risk” is derived from the early Italian word “risicare”, which means “to dare”.
3

Risk
Risk is a measure of the probability and severity of adverse effects.¹
Risk involves the possibility of loss. Without the potential for loss, there is no risk.
Managing risk involves choice. Without choice, there is no risk management.
¹ Lowrance 1976
4

Risk
Risks are future events with a probability of occurrence and a potential for loss.
If caught in time, risks can be avoided or have their impacts reduced.
Completely eliminating all risk would be very expensive, if not impossible.
The key is achieving an acceptable risk level.
5

What are some of the types of risk found in projects?
6

Types of Risk Found in Projects
• Technology risk
• Human resource risk
• Economic risk
• Operational risk
• Market risk
• Geographic risk
• Timing risk
• Environmental risk
• etc.
The primary focus of the balance of this lecture will be technology risk.
7

Team Exercise
• What are some items on your project that you consider risks?
• What are some of the attributes of the items identified as risks?
8

Risk Assessment
It is common in risk assessment processes to seek answers to the following set of questions:
• What can go wrong?
• What is the likelihood that it would go wrong?
• What are the consequences?
Kaplan and Garrick 1981
9

What is Risk Management?
The identification, analysis, assessment, control, and avoidance, minimization, or elimination of unacceptable risks.
Identify, assess, prioritize, then manage risk.
businessdictionary.com
10

Risk Management
Risk management seeks to answer a second set of three questions:
1. What can be done and what options are available?
2.
What are the associated tradeoffs in terms of all costs, benefits, and risks?
3. What are the impacts of current management decisions on future options?
Haimes 1991
11

Risk Management
• When should it be done?
  – Recommended practice is to apply risk management before “release” using judgment, expert knowledge and experience.
  – Risk should also be monitored throughout a product’s life.
  – Not all issues are identified before the fact. Analyzing and responding to failures is an important aspect of risk management.
12

Failure Mode & Effects Analysis (FMEA)
• FMEA is a structured approach aimed at identifying all possible failures in a design, process, product or service.
• It is also called potential failure modes and effects analysis (PFMEA) and failure modes, effects and criticality analysis (FMECA).
Source: American Society for Quality (asq.org)
13

Failure Mode & Effects Analysis (FMEA)
• “Failure modes” means the ways, or modes, in which something might fail.
• Failures are any errors or defects, especially ones that affect the customer.
• Failures can be potential or actual.
• “Effects analysis” refers to studying the consequences of those failures.
Source: American Society for Quality (asq.org)
14

Failure Mode & Effects Analysis (FMEA)
• The purpose of the FMEA is to take actions to eliminate or reduce failures, starting with those that are the highest priority.
• Ideally, the use of FMEA begins during the earliest conceptual stages of design and continues throughout the life of the product or service.
Source: American Society for Quality (asq.org)
15

Failure Mode & Effects Analysis (FMEA)
When to use FMEA
• When a process, product or service is being designed or redesigned, after quality function deployment.
• When an existing process, product or service is being applied in a new way.
• Before developing control plans for a new or modified process.
Source: American Society for Quality (asq.org)
16

Failure Mode & Effects Analysis (FMEA)
When to use FMEA
• When improvement goals are planned for an existing process, product or service.
• When analyzing failures of an existing process, product or service.
• Periodically throughout the life of the process, product or service.
Source: American Society for Quality (asq.org)
17

FMEA Procedure
1. Assemble a cross-functional team with diverse knowledge about the process, product or service and customer needs. Functions often included are: design, manufacturing, quality, testing, reliability, maintenance, purchasing, sales, marketing and customer service. Suppliers and customers can be added, if needed, to supplement the team.
Source: American Society for Quality (asq.org)
18

FMEA Procedure
2. Identify the scope of the FMEA.
• Is it for a concept, system, design, process or service?
• What are the boundaries?
• How detailed should we be?
Source: American Society for Quality (asq.org)
19

FMEA Procedure
3. Complete the FMEA form beginning with the identifying information at the top of the form.

FAILURE MODE AND EFFECTS ANALYSIS
Header fields: Item; Model; Core Team; Responsibility; Prepared by; FMEA number; Page; FMEA Date (Orig); Rev
Columns: Process Function | Potential Failure Mode | Potential Effect(s) of Failure | Sev | Class | Potential Cause(s)/Mechanism(s) of Failure | Occur | Current Process Controls | Detec | RPN | Recommended Action(s) | Responsibility and Target Completion Date
Post-Action Results columns: Actions Taken | Sev | Occ | Det | RPN
(Blank form: the RPN cells all show 0.)
Source: American Society for Quality (asq.org)
20

FMEA Procedure
• In an FMEA, potential failure modes are rated on the following attributes:
  – Severity (S) is an assessment of the severity of the potential effect of the failure.
  – Occurrence (O) is an assessment of the likelihood of occurrence of the failure.
  – Detection (D) is an assessment of the likelihood that the problem will be detected before it reaches the end-user/customer.
21

Severity Rating Scale
(Should be tailored to meet the needs of your project or company)
Rating | Description | Definition (Severity of Effect)
10 | Dangerously high | Failure could injure the customer or an employee.
9 | Extremely high | Failure would create noncompliance with federal regulations.
8 | Very high | Failure renders the unit inoperable or unfit for use.
7 | High | Failure causes a high degree of customer dissatisfaction.
6 | Moderate | Failure results in a subsystem or partial malfunction of the product.
5 | Low | Failure creates enough of a performance loss to cause the customer to complain.
4 | Very Low | Failure can be overcome with modifications to the customer’s process or product, but there is minor performance loss.
3 | Minor | Failure would create a minor nuisance to the customer, but the customer can overcome it without performance loss.
2 | Very Minor | Failure may not be readily apparent to the customer, but would have minor effects on the customer’s process or product.
1 | None | Failure would not be noticeable to the customer and would not affect the customer’s process or product.
22

Occurrence Rating Scale
(Should be tailored to meet the needs of your project or company)
Rating | Description | Potential Failure Rate
10 | Very High: Failure is almost inevitable. | More than one occurrence per day or a probability of more than three occurrences in 10 events (Cpk < 0.33).
9 | High: Failures occur almost as often as not. | One occurrence every three to four days or a probability of three occurrences in 10 events (Cpk ≈ 0.33).
8 | High: Repeated failures. | One occurrence per week or a probability of 5 occurrences in 100 events (Cpk ≈ 0.67).
7 | High: Failures occur often. | One occurrence every month or one occurrence in 100 events (Cpk ≈ 0.83).
6 | Moderately High: Frequent failures. | One occurrence every three months or three occurrences in 1,000 events (Cpk ≈ 1.00).
5 | Moderate: Occasional failures. | One occurrence every six months to one year or five occurrences in 10,000 events (Cpk ≈ 1.17).
4 | Moderately Low: Infrequent failures. | One occurrence per year or six occurrences in 100,000 events (Cpk ≈ 1.33).
3 | Low: Relatively few failures. | One occurrence every one to three years or six occurrences in ten million events (Cpk ≈ 1.67).
2 | Low: Failures are few and far between. | One occurrence every three to five years or 2 occurrences in one billion events (Cpk ≈ 2.00).
1 | Remote: Failure is unlikely. | One occurrence in greater than five years or less than two occurrences in one billion events (Cpk > 2.00).
23

Detection Rating Scale
(Should be tailored to meet the needs of your project or company)
Rating | Description | Definition
10 | Absolute Uncertainty | The product is not inspected or the defect caused by failure is not detectable.
9 | Very Remote | Product is sampled, inspected, and released based on Acceptable Quality Level (AQL) sampling plans.
8 | Remote | Product is accepted based on no defectives in a sample.
7 | Very Low | Product is 100% manually inspected in the process.
6 | Low | Product is 100% manually inspected using go/no-go or other mistake-proofing gauges.
5 | Moderate | Some Statistical Process Control (SPC) is used in process and product is final inspected off-line.
4 | Moderately High | SPC is used and there is immediate reaction to out-of-control conditions.
3 | High | An effective SPC program is in place with process capabilities (Cpk) greater than 1.33.
2 | Very High | All product is 100% automatically inspected.
1 | Almost Certain | The defect is obvious or there is 100% automatic inspection with regular calibration and preventive maintenance of the inspection equipment.
24

FMEA “By The Numbers”
• The Risk Priority Number (RPN) is defined as: RPN = S × O × D
• The Criticality Index (CI) is defined as: CI = S × O
25

FMEA “By The Numbers”
• RPN and CI are used to assess the risk associated with potential problems in a product or process design and to prioritize issues for corrective action.
• The RPN or CI values that require action are project/product-specific. An approximate rule of thumb is if RPN ≥ 200 or CI ≥ 40, then that risk should be mitigated.
• All risks with severity of 10 must be mitigated.
26

Simple FMEA Example
Process function: Dispense amount of cash requested by customer

Failure mode: Does not dispense cash
Effects: Customer very dissatisfied; incorrect entry to demand deposit system; discrepancy in cash balancing
Sev: 8
Potential Cause(s)/Mechanism(s) of Failure | Occur | Current Process Controls | Detec | RPN | CRIT
Out of cash | 5 | Internal low-cash alert | 5 | 200 | 40
Machine jams | 3 | Internal jam alert | 10 | 240 | 24
Power failure during transaction | 2 | None | 10 | 160 | 16

Failure mode: Dispenses too much cash
Effects: Bank loses money; discrepancy in cash balancing
Sev: 6
Potential Cause(s)/Mechanism(s) of Failure | Occur | Current Process Controls | Detec | RPN | CRIT
Bills stuck together | 2 | Loading procedure (riffle ends of stacks) | 7 | 84 | 12
Denominations in wrong trays | 3 | Two-person visual verification | 4 | 72 | 18

Failure mode: Takes too long to dispense cash
Effects: Customer somewhat annoyed
Sev: 3
Potential Cause(s)/Mechanism(s) of Failure | Occur | Current Process Controls | Detec | RPN | CRIT
Heavy network traffic | 7 | None | 10 | 210 | 21
Power interruption during transaction | 2 | None | 10 | 60 | 6

(Recommended Action(s), Responsibility and Target Completion Date, and Post-Action Results are not filled in; the post-action RPN and CRIT cells show 0.)
Source: American Society for Quality (asq.org)
27

“Completing” the FMEA
• Use the RPN and CI to set priorities for corrective actions.
• Assign responsibilities and target completion dates for the corrective actions.
• After action is taken, reassess S, O and D and recalculate RPN and CI to determine if additional actions are needed.
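
The RPN and CI arithmetic from the “By The Numbers” slides can be sketched in a few lines of Python. This is a minimal illustration and not part of the original lecture: the names `FailureCause` and `needs_mitigation` are hypothetical, and the default limits simply encode the approximate rule of thumb quoted above (RPN ≥ 200, CI ≥ 40, or a severity of 10), which a real project would tailor. The ratings are the S, O and D values from the ATM example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCause:
    """One cause row of an FMEA worksheet (hypothetical structure)."""
    cause: str
    severity: int    # S, rated 1-10
    occurrence: int  # O, rated 1-10
    detection: int   # D, rated 1-10

    @property
    def rpn(self) -> int:
        # Risk Priority Number: RPN = S x O x D
        return self.severity * self.occurrence * self.detection

    @property
    def ci(self) -> int:
        # Criticality Index: CI = S x O
        return self.severity * self.occurrence


def needs_mitigation(row: FailureCause, rpn_limit: int = 200, ci_limit: int = 40) -> bool:
    """Apply the rule of thumb; the limits are project-specific assumptions."""
    return row.severity >= 10 or row.rpn >= rpn_limit or row.ci >= ci_limit


# S, O, D ratings from the ATM example.
rows = [
    FailureCause("Out of cash", 8, 5, 5),
    FailureCause("Machine jams", 8, 3, 10),
    FailureCause("Power failure during transaction", 8, 2, 10),
    FailureCause("Bills stuck together", 6, 2, 7),
    FailureCause("Denominations in wrong trays", 6, 3, 4),
    FailureCause("Heavy network traffic", 3, 7, 10),
    FailureCause("Power interruption during transaction", 3, 2, 10),
]

# Sort by RPN, highest first, so corrective actions are prioritized in that order.
ranked = sorted(rows, key=lambda r: r.rpn, reverse=True)
flagged = [r.cause for r in ranked if needs_mitigation(r)]

for r in ranked:
    print(f"{r.cause:40s} RPN={r.rpn:3d} CI={r.ci:2d} mitigate={needs_mitigation(r)}")
```

Under these assumed limits, three causes are flagged: machine jams, heavy network traffic, and out of cash. After corrective action, the same calculation would be rerun with the reassessed S, O and D values.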
28

Risk Management Decision Making
• The graphic at right provides a general guideline for decisions about when to mitigate risks. There is no single right answer; a number of similar guideline tables exist.
• It is strongly suggested to err on the side of caution.
Green = No mitigation needed
Yellow = Consider mitigating risk
Red = Risk mitigation needed
29

You’ve launched the product and there are failures. What now?
• The FMEA process is primarily focused on pre-release risk management, but can be used after launch.
• Many other processes, such as FRACAS, Fault Tree Analysis, Root Cause Analysis and 5 Whys are available to assess and manage failures of commercial products.
30

FRACAS
• FRACAS stands for Failure Reporting and Corrective Action System.
• FRACAS provides a process for reporting, classifying, analyzing failures, and planning corrective actions in response to those failures.
• Common FRACAS outputs may include: Field MTBF, MTBR, MTTR, spares consumption and reliability growth.
31

Sample FRACAS Process Flow
http://www.maintenancephoenix.com/wp-content/uploads/2013/01/FRACAS-MTBF-Process-Map.jpg
32

Root Cause Analysis
• Root cause analysis (RCA) is a method of problem solving that tries to solve problems by attempting to identify and correct the root causes of events, as opposed to simply addressing their symptoms.
• Focusing correction on root causes has the goal of preventing problem recurrence.
33

http://withfriendship.com/images/f/25154/a-root-cause-analysis-is.gif
34

Fault Tree Analysis
• Fault tree analysis (FTA) is a top down, deductive failure analysis in which an undesired state of a system is analyzed using Boolean logic to combine a series of lower-level events.
35

5 Whys
• The 5 Whys is an iterative question-asking technique used to explore the cause-and-effect relationships underlying a particular problem.
• The primary goal is to determine the root cause of a defect or problem.
36

5 Whys Example
• The vehicle will not start.
(the problem)
  o Why? - The battery is dead. (first why)
  o Why? - The alternator is not functioning. (second why)
  o Why? - The alternator belt has broken. (third why)
  o Why? - The alternator belt was well beyond its useful service life and not replaced. (fourth why)
  o Why? - The vehicle was not maintained according to the recommended service schedule. (fifth why, a root cause)
• A possible corrective action that addresses the fifth why, and with it the problem’s root cause, is to maintain the vehicle according to the recommended service schedule.
Adapted from http://en.wikipedia.org/wiki/5_Whys
37

Risk Management Cycle
38
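
Returning to fault tree analysis (slide 35): the idea of combining lower-level events with Boolean logic can be sketched as below. This is a minimal illustration under stated assumptions, not material from the lecture. The tree itself, the event names (`alternator_failed`, `battery_not_recharged`, `fuel_tank_empty`) and the helper functions `basic`, `or_gate` and `and_gate` are all hypothetical, loosely borrowing the "vehicle will not start" scenario from the 5 Whys example.

```python
from typing import Callable, Dict

Events = Dict[str, bool]
Node = Callable[[Events], bool]


def basic(name: str) -> Node:
    """Leaf node: true when the named basic event has occurred."""
    return lambda events: events[name]


def or_gate(*inputs: Node) -> Node:
    """Output event occurs if any input event occurs."""
    return lambda events: any(node(events) for node in inputs)


def and_gate(*inputs: Node) -> Node:
    """Output event occurs only if all input events occur."""
    return lambda events: all(node(events) for node in inputs)


# Hypothetical tree. Top event: the vehicle will not start.
# Assumed logic: the battery is dead only if the alternator has failed AND
# the battery was not recharged some other way; the vehicle will not start
# if the battery is dead OR the fuel tank is empty.
battery_dead = and_gate(basic("alternator_failed"), basic("battery_not_recharged"))
vehicle_will_not_start = or_gate(battery_dead, basic("fuel_tank_empty"))

events = {
    "alternator_failed": True,
    "battery_not_recharged": True,
    "fuel_tank_empty": False,
}
top_event = vehicle_will_not_start(events)
print("Vehicle will not start:", top_event)
```

Working top down from the undesired state, each gate states which combinations of lower-level events are sufficient to cause the event above it. Changing a single basic event in `events` shows whether that event alone is enough to trigger the top event.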