Transcript Document

DCM 15.3 Managing Risk in the Data Center and Critical Environments
  Jim Nelson
  President, BCS, Inc.
  Chairman, ICOR

Auditing a CE is about defining risk: anything that can reduce the operational readiness of a CE
  Risk assessment methodology
  Qualitative & quantitative measures

Risk Management

Managing Risk in a Critical Environment
  Often a direct response to a downtime event
  Downtime has a profound impact on the perception of reliability
  Knee-jerk reaction
  Rebuilding trust
  Never say “Never again”
  Root Cause Analysis

ISO 31000
  What are the potential causes?
  What are the impacts?
  What is the probability or likelihood?
  How to mitigate the consequences or likelihood of the risk occurring?

The Organization
  ISO 22313 Guidance

Risk Identification
  Design vs Operations: Even a well-designed system can have significant operational issues. Check support equipment and maintenance staff.
  Contributing Factors: Some issues that are not in and of themselves a risk can contribute to an actual risk when factored together.
  Outside Influences: Events you cannot prevent, such as severe weather and instability of the power grid, but whose impact can be mitigated.

Risk Identification
  Age Related Issues: Many components of a CE are designed with specific life expectancies. As equipment ages, the risk of an outage increases.
  Best Practices: As the CE ages, gaps can develop between current practices and industry best practices. As this gap widens, the risk grows.
  Regulatory Changes: A new edition of the electrical code is issued every 3 years. Evaluating the changes and how they impact the CE operation is vital.
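
As an illustration of the ISO 31000 questions above (impact and likelihood), here is a minimal Python sketch of a qualitative risk register. The 1-5 scales, the likelihood-times-impact rule, the band thresholds, and the three example risks are all assumptions made for illustration; the slides do not prescribe a scoring method.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One identified risk, scored on assumed 1-5 ordinal scales."""
    name: str
    likelihood: int  # 1 = rare ... 5 = almost certain (assumed scale)
    impact: int      # 1 = negligible ... 5 = severe (assumed scale)

    def score(self) -> int:
        # A common qualitative convention: score = likelihood x impact.
        return self.likelihood * self.impact


def band(score: int) -> str:
    """Map a score to a priority band (thresholds are hypothetical)."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# Hypothetical entries echoing the risk identification categories above.
risks = [
    Risk("Outdated system drawings", likelihood=2, impact=2),
    Risk("UPS batteries past design life", likelihood=4, impact=5),
    Risk("Utility grid instability", likelihood=3, impact=4),
]

# Highest score first, so treatment effort goes to the largest risks.
ranked = sorted(risks, key=lambda r: r.score(), reverse=True)

for r in ranked:
    print(f"{r.name}: score {r.score()} ({band(r.score())})")
```

An organization would normally calibrate the scales and thresholds to its own risk appetite; the point of the sketch is only that each risk carries both a likelihood and an impact, and that the ranking follows from the two together.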

Examples of Risk Treatments
  Control or mitigate
  Financing / Insurance
  Transfer
  Acceptance
  Avoidance

Administrative Risks
Physical Attributes

Administrative Risks
  Operational Practices
  Maintenance Program
  Staffing / Retention
  Knowledge and Training
  Written Procedures
  Documentation
  Physical / Electronic Data Backups
  System Drawings

Physical Attributes
  Mechanical Systems
  Electrical Systems
  Fire Life Safety
  Building Automation and Controls
  System Monitoring
  Structural

Cost of Downtime
  Damage to mission critical data
  Impact of downtime on organizational productivity
  Damages to equipment and other assets
  Cost to detect and remediate systems and core business processes
  Legal and regulatory impact, including litigation defense cost
  Lost confidence and trust among key stakeholders
  Diminishment of marketplace brand and reputation

Keeping up with Best Practices
  Performing a CE audit can help to introduce best practices into operations
  Changes in Mission
  Changes in Technology
  Growing Importance of Data
  Innovative Ideas
  Energy Efficiency

Management System Standards & Resilience
  Business Continuity Management: ISO 22301
  Organizational Resilience: ISO 22316x
  Crisis Management & Communications: PAS200:2011
  Risk Management & Insurance: ISO 31000
  Emergency Management: ISO 22320
  Social Resilience
  Facility Management: ISO 14001
  Supply Chain Continuity & Security: PD 25222:2011 & ISO 28000
  Legal, Compliance & Audit
  Technology Infrastructure: ISO 27001, 27031, & 20000

Types of Management Systems
  ISO 28000: Supply Chain Security
  ISO 22301: Business Continuity
  Social Accountability
  Future Standards?

ISO 19011: 2011
  ISO 17021:2011 - Requirements for third-party certification of management systems
  ISO 19011:2011 - Guidelines for auditing management systems
  External Auditing
  Internal Auditing
  Supplier Auditing
  3rd Party Auditing
  For legal & regulatory purposes
  1st Party Audit
  2nd Party Audit
  For certification requirements

Foundation of Profession
  Information is Secure
  Fair Representation
  Due Professional Care
  Impartial & Objective
  Evidence Based Approach

Types of Reference Materials
  Management Systems
  Code Requirements
  Sector and National Standards
  Best Practices
  Rules & Regulations
  Compliance
  White Papers
  Technical Publications
  Recalls
  Bulletins

NFPA 70: National Electrical Code (NEC)
  The National Electrical Code (NEC), or NFPA 70, is a regionally adoptable standard for the safe installation of electrical wiring and equipment in the United States.
  NFPA 70E covers electrical safety requirements for employees.
  Copies of codes and standards are available for purchase through the NFPA online:
  http://www.nfpa.org/codes-and-standards/buy-nfpa-codesand-standards

TIA 942: Telecommunications Infrastructure Standard for Data Centers
  TIA 942 specifies the minimum requirements for telecommunications infrastructure of data centers and computer rooms, including single tenant enterprise data centers and multi-tenant Internet hosting data centers. The topology specified in this document is intended to be applicable to any size data center.
  http://www.tiaonline.org/standards/buy-tia-standards
  Additional TIA standards are also listed and available for purchase.

International Organization for Standardization (ISO)
  ISO develops International Standards. Founded in 1947, it has since published more than 19,500 International Standards covering almost all aspects of technology and business, from food safety to computers and from agriculture to healthcare.
  http://www.iso.org/iso/home/store.htm
  This site has standards in their offline version as well as online collections. Other standards are also available through this site.

ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers)
  ASHRAE writes standards for the purpose of establishing consensus for:
  1. Methods of test and classification standards
  2. Design standards
  3. Protocol standards
  4. Rating standards (in limited cases)
  http://www.techstreet.com/ashrae/subgroups/34755
  There is a search option to assist in locating the ASHRAE standard that applies to your specific area of interest.

Occupational Safety and Health Administration (OSHA)
  The purpose of OSHA is to assure safe and healthful working conditions for working men and women by setting and enforcing standards and by providing training, outreach, education and assistance.
  https://www.osha.gov/pls/oshaweb/owasrch.search_form?p_doc_type=STANDARDS&p_toc_level=1&p_keyvalue=1910
  OSHA standards are available online at no cost.

Audit Program: The Audit Plan
  Opening Meeting
  Sampling techniques: methodology, evidence, interviews, audit trail
  Process Reviews, Document Reviews, Inspections
  Daily Status Meetings
  Audit Opinion Development (do not rush to an opinion)
  Summary Document, Findings Document, Findings Presentation
  Non-conformities, root cause analysis, treatments
  Closing Meeting
  Surveillance / Tracking audits

Developing an Audit Program
  A highly defined program enriches the audit process. It:
  Drives consistent reporting
  Defines the action steps to take when performing the audit
  Aligns the various participants to the process
  Allows for centralized tracking of corrective actions
  Gives the auditor the tools required to perform the audit
  Speeds up the audit process
  Internal Audits and External Audits
  Standards aligned
  Certification options

The Importance of the Audit Schedule
  A clearly defined schedule is critical to ensuring that the audit is successful and all items are properly addressed
  Drives consistency
  Ensures that all events occur in order
  Improves forecasting
  Prevents confusion
  No missed areas of interest
  Saves time and $$$

Determining Audit Scope
  Cost Benefit Analysis
  PESTEL Analysis
  SWOT Analysis
  Business Impact Analysis

Scope
  Health check
  Regulatory requirement
  Client requirement
  Mechanical systems
  Power systems
  ICT environment
  Data security
  Certification

General Guidelines for Data Collection
  1. Tours
  2. Written Documentation
  3. Physical evidence
  4. Documentation
  5. Drawings
  6. Procedures
  7. CMMS
  8. BMS
  9. Computer based Monitoring Systems
  10. Staffing Interviews
  11. Process Review

Understand the mission and goals of the company
  Identify the mission statement
  Identify the goals of the company

Align the process to the mission and goals
  Identify Opportunities: Look for gaps in CE design and process that might prevent the achievement of mission and goals. Drive the audit program to support the mission and goals.
  Support New Initiatives: Is the company promoting green initiatives? Is the drive towards reliability and consistently available services? Has a natural disaster driven a process? What about security against terrorist activities?
  Public Opinion: Consider public opinion issues such as energy consumption, hazardous waste disposal, safety of the work force, etc.

Program Overview and Governing Documentation
  Program Overview: A written document that describes in detail the goals and methods to be used in conducting data center audits.
  Governing Documentation: A list of all forms and documents that will be used to report and track the audit process and deliverables.
  Document Management System (DMS)

Program Overview
  The mission statement of the audit program
  The program description: a base description of the audit program, including reasons and goals
  Risk prioritization formulas and the process of creation for each category used in the formula
  Governing Agency within the organizational structure

Program Overview
  Responsibility Matrix: Lists all potential actions that will be required as part of the audit, what is expected to happen, and who will perform the action steps
  Deliverables Description: A list of the expected outcomes of the audit and who will receive the documents
  Written process for conducting the audit: a step-by-step walkthrough of what will happen
  Expected items to be provided upon start of an audit

Create the Tools to Meet Goals of Audit
  Pre-task spreadsheet
  Expected actions list
  Document of actionable items
  Audit spreadsheet

Review and Process Approvals
  Management Review
  Peer Review
  Stakeholder Review
  Categorical Expert Review

Importance of Obtaining Permissions
  Take Photos
  Physical Access
  Data Storage Devices
  Use of Ladders
  Special Training Requirements

Use Consistent Methods of Delivery
  Process and Discovery Method
  Tools to Drive Consistency
  Report of Audit Findings
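
The Responsibility Matrix described under Program Overview (each action, what is expected to happen, and who performs it) can be sketched as a small data structure. This is an illustrative Python sketch only: the row fields, the example actions, and the role names are hypothetical and are not taken from any actual audit program.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MatrixRow:
    """One line of a responsibility matrix (field names are assumptions)."""
    action: str
    expected_outcome: str
    owner: Optional[str]  # None means nobody has been assigned yet


# Hypothetical rows using vocabulary from the audit plan slides.
matrix = [
    MatrixRow("Opening meeting", "Scope and schedule confirmed", "Lead auditor"),
    MatrixRow("Document review", "Procedures and drawings collected", "Chief engineer"),
    MatrixRow("Daily status meeting", "Open questions resolved", "Lead auditor"),
    MatrixRow("Findings presentation", "Findings delivered to stakeholders", None),
]


def unassigned_actions(rows: List[MatrixRow]) -> List[str]:
    """Return actions that still have no responsible person."""
    return [row.action for row in rows if not row.owner]


def actions_by_owner(rows: List[MatrixRow]) -> Dict[str, List[str]]:
    """Group assigned actions by the person expected to perform them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if row.owner:
            grouped[row.owner].append(row.action)
    return dict(grouped)


print("Unassigned:", unassigned_actions(matrix))
print("By owner:", actions_by_owner(matrix))
```

Checking for unassigned actions before the audit starts is one way to confirm that every step in the written process has someone accountable for it.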

Who will need to be available for questions?
  Engineering Leads
  Security Personnel
  ICT Management
  Individual System Experts
  Admins for various systems (BMS, CMMS, DMS, Monitoring, Drawings, & Documentation)
  Consultants

Initial Meeting: All Stakeholders
  Anyone who has input to the audit or has ownership over one or more of the critical systems that support the CE should be invited to this meeting.
  CE/DC Manager
  Team Leads
  Chief Engineer / Infrastructure Manager
  They play a central role in the operation and maintenance of the CE and as such should not be missing; reschedule to ensure attendance.

Closing Meeting: All Stakeholders
  The Closing Meeting should be set to coincide with the completion of the audit, when all the findings are prepared and ready for delivery.
  The audit is completed.
  Audit findings are ready for delivery.
  Each item on the list of risks has been identified and properly researched and referenced.
  Ensure that all questions are answered and that the site is prepared to address each issue to meet code compliance or best practice.

Follow-up / Tracking Meetings
  Invitees depend on which items are left to discuss; this is ongoing until completion.
  This can be included as part of the normal IT / engineering team meetings for tracking. Ensure that each item is listed and discussed.
  This is a series of meetings intended to track the audit findings until they have been thoroughly addressed and closed out.
  Each item should be tracked and listed by risk.
  Have an estimated completion date for each item and list a responsible person to ensure each item is completed.
  Once all items are closed, the audit is closed out and no further audit meetings are required.

Management Review

Contact Information:
  Jim Nelson
  President, BCS
  866.629.6327
  [email protected]
  www.BusinessContinuitySvcs.com
  Chairman of the Board, ICOR
  866.765.8321
  [email protected]
  www.theicor.org