Transcript Document

DCM 15.3 Managing Risk in the Data Center and Critical Environments
Jim Nelson, President, BCS, Inc.
Chairman, ICOR
1
Auditing a critical environment (CE) is about defining risk: anything that can reduce the operational readiness of the CE
Risk assessment methodology
Qualitative & quantitative measures
2
Risk Management
3
Managing Risk in a Critical Environment
Often a direct response to a downtime event
Downtime has a profound impact on the perception of reliability
Knee-jerk reaction
Rebuilding trust
Never say “Never again”
Root Cause Analysis
4
ISO 31000
What are the potential causes?
What are the impacts?
What is the probability or likelihood?
How can the consequences or likelihood of the risk occurring be mitigated?
5
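The four ISO 31000 questions map naturally onto a risk-register record. The sketch below is illustrative only: ISO 31000 does not prescribe a scoring formula, so the 1-to-5 likelihood and consequence scales, the multiplication, the `Risk` class, and the example UPS entry are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    """One hypothetical risk-register entry organized around the four ISO 31000 questions."""

    name: str
    causes: list[str]    # What are the potential causes?
    impacts: list[str]   # What are the impacts?
    likelihood: int      # What is the probability? 1 (rare) to 5 (almost certain), assumed scale
    consequence: int     # How severe is the impact? 1 (minor) to 5 (severe), assumed scale
    mitigations: list[str] = field(default_factory=list)  # How can it be mitigated?

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-to-5 scale.
        for label, value in (("likelihood", self.likelihood), ("consequence", self.consequence)):
            if not 1 <= value <= 5:
                raise ValueError(f"{label} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Common risk-matrix convention (an assumption here): likelihood x consequence, 1 to 25.
        return self.likelihood * self.consequence


ups_risk = Risk(
    name="UPS battery string failure",
    causes=["batteries past rated life", "missed load-bank test"],
    impacts=["no ride-through during a utility outage"],
    likelihood=3,
    consequence=5,
    mitigations=["replace batteries on schedule", "add battery monitoring"],
)
print(ups_risk.name, ups_risk.score)  # UPS battery string failure 15
```

Keeping causes, impacts, and mitigations next to the score means the register answers all four questions, not just "how bad is it".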
The Organization
ISO 22313 Guidance
6
Risk Identification
Design vs Operations: Even a well-designed system can have significant operational issues. Check support equipment and maintenance staff.
Contributing Factors: Some issues that are not risks in and of themselves can combine with others to create an actual risk.
Outside Influences: Events you cannot prevent but whose impact can be mitigated, such as weather and power grid instability.
7
Risk Identification
Age Related Issues: Many components of a CE
are designed with specific life expectancies. As
equipment ages, the risk of an outage increases.
Best Practices: As the CE ages, gaps can develop between current practices and industry best practices. As this gap widens, the risk grows.
Regulatory Changes: A new edition of the electrical code (NEC) is issued every three years. Evaluating the changes and how they affect CE operation is vital.
8
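Age-related risk can be screened by comparing each component's age with its expected service life. A minimal sketch, assuming a hypothetical inventory of (component, install date, expected life in years) tuples and an arbitrary 80% threshold; real life expectancies should come from manufacturer data.

```python
from datetime import date


def life_consumed(installed: date, expected_life_years: float, today: date) -> float:
    """Fraction of rated service life already used (1.0 means end of rated life)."""
    age_years = (today - installed).days / 365.25
    return age_years / expected_life_years


def flag_aging(inventory, today: date, threshold: float = 0.8) -> list[str]:
    """Names of components at or beyond `threshold` of their rated life."""
    return [
        name
        for name, installed, life_years in inventory
        if life_consumed(installed, life_years, today) >= threshold
    ]


# Hypothetical inventory: (component, install date, expected life in years).
inventory = [
    ("UPS-A batteries", date(2019, 6, 1), 5),
    ("Chiller 2", date(2010, 3, 15), 20),
    ("ATS-1", date(2021, 1, 10), 25),
]
print(flag_aging(inventory, today=date(2024, 1, 1)))  # ['UPS-A batteries']
```

Lowering the threshold widens the planning horizon: at 0.6 the chiller also appears, giving more lead time to budget for replacement.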
Examples of Risk Treatments
Control or mitigate
Financing / Insurance
Transfer
Acceptance
Avoidance
9
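A risk-treatment choice is ultimately a management decision, but a simple rule can make the reasoning explicit. The thresholds below, the comparison of mitigation cost against expected annual loss, and the folding of Financing / Insurance into Transfer are hypothetical simplifications, not taken from ISO 31000 or the original slides.

```python
from enum import Enum


class Treatment(Enum):
    AVOID = "Avoidance"
    CONTROL = "Control or mitigate"
    TRANSFER = "Transfer / Financing / Insurance"
    ACCEPT = "Acceptance"


def suggest_treatment(score: int, mitigation_cost: float, expected_annual_loss: float) -> Treatment:
    """Hypothetical decision rule on a 1-to-25 risk score; thresholds are illustrative."""
    if score >= 20:
        # Intolerable: stop the activity or redesign so the risk cannot occur.
        return Treatment.AVOID
    if score >= 10:
        # Significant: mitigate when it costs less than the loss it prevents,
        # otherwise shift the financial consequence to a third party.
        if mitigation_cost < expected_annual_loss:
            return Treatment.CONTROL
        return Treatment.TRANSFER
    # Low risk: document the decision and accept it.
    return Treatment.ACCEPT


print(suggest_treatment(15, mitigation_cost=40_000, expected_annual_loss=120_000).value)
# Control or mitigate
```

Comparing a one-time mitigation cost against a single year of expected loss is deliberately conservative; a fuller analysis would spread the cost over the mitigation's useful life.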
Administrative Risks
Physical Attributes
10
Administrative Risks
Operational Practices
Maintenance Program
Staffing / Retention
Knowledge and Training
Written Procedures
Documentation
Physical / Electronic Data Backups
System Drawings
11
Physical Attributes
Mechanical Systems
Electrical Systems
Fire Life Safety
Building Automation and Controls
System Monitoring
Structural
12
Cost of Downtime
Damage to mission critical data
Impact of downtime on organizational productivity
Damage to equipment and other assets
Cost to detect and remediate systems and core business processes
Legal and regulatory impact, including litigation defense cost
Lost confidence and trust among key stakeholders
Diminishment of marketplace brand and reputation
13
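The quantifiable items in this list can be rolled into a rough estimate: hourly losses multiplied by outage duration, plus one-time costs. A sketch under stated assumptions: the `DowntimeEvent` fields and the example figures are hypothetical, and lost stakeholder confidence and brand damage are left out because they resist a simple dollar figure.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEvent:
    hours: float                       # outage duration
    lost_revenue_per_hour: float       # revenue that cannot be recovered later
    idle_staff: int                    # employees unable to work during the outage
    loaded_labor_rate: float           # fully loaded cost per employee-hour
    equipment_damage: float = 0.0      # repair or replacement of damaged assets
    remediation: float = 0.0           # detecting, recovering, and restoring systems
    legal_and_regulatory: float = 0.0  # fines and litigation defense


def downtime_cost(event: DowntimeEvent) -> float:
    """Rough direct cost of one outage: duration-driven losses plus one-time costs."""
    hourly_loss = event.lost_revenue_per_hour + event.idle_staff * event.loaded_labor_rate
    one_time = event.equipment_damage + event.remediation + event.legal_and_regulatory
    return event.hours * hourly_loss + one_time


# Hypothetical 4-hour outage.
outage = DowntimeEvent(
    hours=4,
    lost_revenue_per_hour=50_000,
    idle_staff=200,
    loaded_labor_rate=60,
    equipment_damage=15_000,
    remediation=30_000,
)
print(f"${downtime_cost(outage):,.0f}")  # $293,000
```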
Keeping up with Best Practices
Performing a CE audit can help to introduce best
practices into operations
Changes in Mission
Changes in Technology
Growing Importance of Data
Innovative Ideas
Energy Efficiency
14
Management System Standards & Resilience
Business Continuity Management: ISO 22301
Organizational Resilience: ISO 22316
Crisis Management & Communications: PAS 200:2011
Risk Management & Insurance: ISO 31000
Emergency Management: ISO 22320
Social Resilience
Facility Management: ISO 14001
Supply Chain Continuity & Security: PD 25222:2011 & ISO 28000
Legal, Compliance & Audit
Technology Infrastructure: ISO 27001, 27031, & 20000
15
Types of Management Systems
ISO 28000: Supply Chain Security
ISO 22301: Business Continuity
Social Accountability
Future Standards?
16
ISO 19011:2011
ISO 17021:2011 – Requirements for third-party certification of management systems
ISO 19011:2011 – Guidelines for auditing management systems
Internal Auditing: 1st Party Audit
External Auditing: 2nd Party Audit (Supplier Auditing); 3rd Party Audit (for legal & regulatory purposes and for certification requirements)
17
Foundation of Profession
Information is Secure
Fair Representation
Due Professional Care
Impartial & Objective
Evidence-Based Approach
18
Types of Reference Materials
Management Systems
Code Requirements
Sector and National Standards
Best Practices
Rules & Regulations
Compliance
White Papers
Technical Publications
Recalls
Bulletins
19
NFPA 70: National Electrical Code (NEC)
The National Electrical Code (NEC), or
NFPA 70, is a regionally adoptable standard for
the safe installation of electrical wiring and
equipment in the United States.
NFPA 70E covers electrical safety requirements for employees.
Copies of codes and standards are available for
purchase through the NFPA online:
http://www.nfpa.org/codes-and-standards/buy-nfpa-codesand-standards
20
TIA 942: Telecommunications Infrastructure Standard for Data Centers
TIA 942 specifies the minimum requirements for
telecommunications infrastructure of data centers
and computer rooms, including single tenant
enterprise data centers and multi-tenant Internet
hosting data centers.
The topology specified in this document is intended
to be applicable to any size data center.
http://www.tiaonline.org/standards/buy-tia-standards
Additional TIA standards are also listed and available for purchase.
21
International Organization for Standardization (ISO)
ISO develops International Standards. Founded in 1947, it has since published more than 19,500 International Standards covering almost all aspects of technology and business, from food safety to computers and from agriculture to healthcare.
http://www.iso.org/iso/home/store.htm
This site offers standards in offline versions as well as online collections. Other standards are also available through this site.
22
ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers)
ASHRAE writes standards for the purpose of
establishing consensus for:
1. Methods of test and classification standards
2. Design standards
3. Protocol standards
4. Rating standards (in limited cases)
http://www.techstreet.com/ashrae/subgroups/34755
There is a search option to assist in locating the ASHRAE standard that applies to your specific area of interest.
23
Occupational Safety and Health Administration (OSHA)
The purpose of OSHA is to assure safe and healthful working conditions for working men and women by setting and enforcing standards and by providing training, outreach, education and assistance.
https://www.osha.gov/pls/oshaweb/owasrch.search_form?p_doc_type=STANDARDS&p_toc_level=1&p_keyvalue=1910
OSHA standards are available
online at no cost.
24
Audit Program: The Audit Plan
Opening Meeting
Sampling techniques: methodology, evidence, interviews, audit trail
Process Reviews, Document Reviews, Inspections
Daily Status Meetings
Audit Opinion Development (do not rush to an opinion)
Summary Document, Findings Document, Findings Presentation
Non-conformities, root cause analysis, treatments
Closing Meeting
Surveillance / Tracking audits
25
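Keeping the audit plan in order can be checked mechanically against a log of phases as they start. This is a hypothetical sketch that treats the plan as a strict sequence; in practice daily status meetings interleave with reviews, so a real check would need to be looser.

```python
# Phases of the audit plan, in the order listed on the slide (names abbreviated).
AUDIT_PLAN = [
    "Opening Meeting",
    "Sampling",
    "Process Reviews, Document Reviews, Inspections",
    "Daily Status Meetings",
    "Audit Opinion Development",
    "Summary and Findings Documents, Findings Presentation",
    "Non-conformities, Root Cause Analysis, Treatments",
    "Closing Meeting",
    "Surveillance / Tracking Audits",
]


def check_sequence(log: list[str]) -> list[str]:
    """Report log entries that are unknown or that step back to an earlier phase.

    Repeating the current phase (for example, several daily status meetings
    in a row) is allowed; returning to an earlier phase is flagged.
    """
    position = {phase: i for i, phase in enumerate(AUDIT_PLAN)}
    problems = []
    furthest = -1
    for phase in log:
        if phase not in position:
            problems.append(f"unknown phase: {phase}")
            continue
        if position[phase] < furthest:
            problems.append(f"out of order: {phase}")
        furthest = max(furthest, position[phase])
    return problems


print(check_sequence(["Opening Meeting", "Closing Meeting", "Sampling"]))
# ['out of order: Sampling']
```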
Developing an Audit Program
Having a highly defined program will enrich the audit process by:
Driving consistent reporting
Defining the action steps to take when performing the audit
Aligning the various participants to the process
Allowing centralized tracking of corrective actions
Giving the auditor the tools required to perform the audit
Speeding up the audit process
Internal Audits and External Audits
Standards aligned
Certification options
26
The Importance of the Audit Schedule
A clearly defined schedule is critical to
ensuring that the audit is successful and
all items are properly addressed
Drives consistency
Ensures that all events occur in order
Improves forecasting
Prevents confusion
No missed areas of interest
Saves time and $$$
27
Determining Audit Scope
Cost Benefit Analysis
PESTEL Analysis
SWOT Analysis
Business Impact Analysis
28
Scope
Health check
Regulatory requirement
Client requirement
Mechanical systems
Power systems
ICT environment
Data security
Certification
29
General Guidelines for Data Collection
1. Tours
2. Written Documentation
3. Physical evidence
4. Documentation
5. Drawings
6. Procedures
7. CMMS
8. BMS
9. Computer-based Monitoring Systems
10. Staffing Interviews
11. Process Review
30
Understand the mission and goals of the company
Identify the mission statement
Identify the goals of the company
31
Align the process to the mission and goals
Identify Opportunities: Look for gaps in CE
design and process that might prevent the
achievement of mission and goals. Drive the audit
program to support the mission and goals.
Support New Initiatives: Is the company promoting green initiatives? Is it driving toward reliability and consistently available services? Has a natural disaster prompted a new process? What about security against terrorist activities?
Public Opinion: Consider public opinion
issues such as energy consumption,
hazardous waste disposal, safety of the
work force, etc.
32
Program Overview and Governing Documentation
Program Overview – A written document that describes in detail the goals and methods to be used in conducting data center audits.
Governing Documentation – A list of all forms and
documents that will be used to report and track the
audit process and deliverables.
Document Management System (DMS)
33
Program Overview
The mission statement of the audit program
The program description – a base description of
the audit program – including reasons and goals
Risk prioritization formulas and the process for creating each category used in the formula
Governing Agency within the organizational
structure
34
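The program overview calls for documented risk prioritization formulas. One well-known option is the FMEA risk priority number (RPN), the product of severity, occurrence, and detection ratings on 1-to-10 scales. The sketch below uses RPN purely as an example; the categories, scales, and findings are assumptions, and a program may define its own formula.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    title: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (certain to be detected) to 10 (undetectable)


def rpn(f: Finding) -> int:
    """FMEA-style risk priority number: severity x occurrence x detection."""
    for value in (f.severity, f.occurrence, f.detection):
        if not 1 <= value <= 10:
            raise ValueError(f"ratings must be between 1 and 10: {f}")
    return f.severity * f.occurrence * f.detection


def prioritize(findings: list[Finding]) -> list[tuple[str, int]]:
    """Findings paired with their RPN, highest priority first."""
    return sorted(((f.title, rpn(f)) for f in findings), key=lambda t: t[1], reverse=True)


# Hypothetical audit findings.
findings = [
    Finding("No written generator transfer procedure", severity=8, occurrence=4, detection=6),
    Finding("BMS alarm thresholds undocumented", severity=5, occurrence=5, detection=7),
    Finding("Single-corded core network switch", severity=7, occurrence=3, detection=2),
]
for title, score in prioritize(findings):
    print(score, title)
# 192 No written generator transfer procedure
# 175 BMS alarm thresholds undocumented
# 42 Single-corded core network switch
```

Whatever formula is chosen, documenting how each category is rated keeps scores comparable between auditors and between audits.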
Program Overview
Responsibility Matrix – Lists all potential actions that
will be required as part of the audit, what is expected to
happen, and who will perform the action steps
Deliverables Description – A list of the expected outcomes of the audit and who will receive the documents
Written process for conducting the audit – a step-by-step walkthrough of what will happen during the audit
Expected items to be provided upon start of an audit
35
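A responsibility matrix is essentially a table of action, expected outcome, and performer, which makes gaps easy to query. A minimal sketch with hypothetical roles and actions; the `unassigned` check flags any action without an owner before the audit begins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Responsibility:
    action: str            # what needs to happen during the audit
    expected_outcome: str  # what "done" looks like
    performed_by: str      # role that performs the action ("" if not yet assigned)


# Hypothetical matrix entries for illustration only.
MATRIX = [
    Responsibility("Schedule the opening meeting", "All stakeholders invited", "Lead Auditor"),
    Responsibility("Provide current system drawings", "Drawings available to auditors", "CE/DC Manager"),
    Responsibility("Export BMS alarm history", "Alarm log for the review period", "BMS Admin"),
    Responsibility("Grant photo and ladder permissions", "Written approval on file", ""),
]


def actions_for(role: str, matrix: list[Responsibility]) -> list[str]:
    """All actions a given role is responsible for."""
    return [r.action for r in matrix if r.performed_by == role]


def unassigned(matrix: list[Responsibility]) -> list[str]:
    """Actions with no owner yet; these should be resolved before the audit starts."""
    return [r.action for r in matrix if not r.performed_by.strip()]


print(unassigned(MATRIX))  # ['Grant photo and ladder permissions']
```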
Create the Tools to Meet Goals of Audit
Pre-task spreadsheet
Expected actions list
Document of actionable items
Audit spreadsheet
36
Review and Process Approvals
Management Review
Peer Review
Stakeholder Review
Categorical Expert Review
37
Importance of Obtaining Permissions
Take Photos
Physical Access
Data Storage Devices
Use of Ladders
Special Training Requirements
38
Use Consistent Methods of Delivery
Process and Discovery Method
Tools to Drive Consistency
Report of Audit Findings
39
Who will need to be available for questions?
Engineering Leads
Security Personnel
ICT Management
Individual System Experts
Admins for various systems (BMS, CMMS, DMS,
Monitoring, Drawings, & Documentation)
Consultants
40
Initial Meeting: All Stakeholders
Anyone who has input to the audit or has ownership over
one or more of the critical systems that support the CE
should be invited to this meeting.
CE/DC Manager
Team Leads
Chief Engineer / Infrastructure Manager
They play a central role in the operation and maintenance of the CE and should not be missing; reschedule if necessary to ensure attendance.
41
Closing Meeting: All Stakeholders
The Closing Meeting should be set to coincide with the
completion of the audit when all the findings are
prepared and ready for delivery.
The audit is completed.
Audit findings are ready for delivery
Each item on the list of risks has been identified and properly
researched and referenced.
Ensure that all questions are answered and that the site is prepared
to address each issue to meet code compliance or best practice.
42
Follow-up / Tracking Meetings
Invitees depend on what items are left to discuss; this is ongoing until completion.
This can be included as part of the normal IT/engineering team meetings for tracking. Ensure that each item is listed and discussed.
This is a series of meetings intended to track the audit findings until they have been thoroughly addressed and closed out.
Each item should be tracked and listed by risk
Have an estimated completion date for each item and list a responsible person
to ensure each item is completed
Once all items are closed – the audit is closed out and no further audit meetings
are required.
43
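The follow-up process described above (track each item by risk, assign a responsible person and an estimated completion date, and close the audit once every item is closed) can be modeled as a small tracker. The class and field names below are hypothetical, and the risk scores are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    finding: str
    risk_score: int   # priority from the program's risk formula
    owner: str        # responsible person
    due: date         # estimated completion date
    closed: bool = False


class FindingsTracker:
    """Tracks audit findings across follow-up meetings until all are closed."""

    def __init__(self, items: list[ActionItem]) -> None:
        self.items = list(items)

    def agenda(self) -> list[ActionItem]:
        # Open items only, highest risk first, so meetings address the worst gaps first.
        open_items = [i for i in self.items if not i.closed]
        return sorted(open_items, key=lambda i: i.risk_score, reverse=True)

    def overdue(self, today: date) -> list[ActionItem]:
        # Open items whose estimated completion date has passed.
        return [i for i in self.agenda() if i.due < today]

    def close(self, finding: str) -> None:
        for item in self.items:
            if item.finding == finding:
                item.closed = True
                return
        raise KeyError(finding)

    @property
    def audit_closed(self) -> bool:
        # The audit is closed out only when every item is closed.
        return all(i.closed for i in self.items)


tracker = FindingsTracker([
    ActionItem("Update generator transfer procedure", 192, "Chief Engineer", date(2024, 3, 1)),
    ActionItem("Document BMS alarm thresholds", 175, "BMS Admin", date(2024, 2, 1)),
    ActionItem("Add redundant power to core switch", 42, "ICT Manager", date(2024, 6, 30)),
])
print([i.finding for i in tracker.overdue(date(2024, 2, 15))])
# ['Document BMS alarm thresholds']
```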
Management Review
44
Contact Information: Jim Nelson
President, BCS
866.629.6327
[email protected]
www.BusinessContinuitySvcs.com
Chairman of the Board,
ICOR
866.765.8321
[email protected]
www.theicor.org
45