Let’s Get Real: Disaster Recovery and Business Continuity

Let’s Get Real: Disaster
Recovery and Business
Continuity in Public Safety
Is Yours Just a Paper Plan or a Real
Way to Prepare and Respond to
Incidents and Disasters?
Presentation Overview
• Key DR/BC Concepts and Issues
– Report card and dashboard
– Scenarios
– Requirements: What has to be operational by when, for work to be done by how many,
at what locations, serving what customers who are where?
– Facilities
– People
– Systems
– Integration
– Coordination
– Daily readiness and simulated escalations
– Testing and independent verification and validation
– Implementation and triage
– Recovery, discovery, and improvements
• Player Scorecard: Who Is In the Game and Why?
• DR/BC Framework
• Action Steps to a Real Plan
– First steps
– Critical functions
– Funding and leveraging scarce resources
– Think out of the box
– Integration with the big picture DR/BC plan and activities of your jurisdiction
• Conclusions
Key DR/BC Concepts and
Issues
The Report Card and
Dashboard
• All aspects of the plan, test, and
implementation should be scored simply
(Red, Yellow, and Green)
• Key indicators of planning and readiness
need a dashboard to enable assessment
and action
– Score or status
– Trend
– Key issue
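A dashboard row of this kind can be sketched in a few lines of Python. This is only an illustration: the `Indicator` record, its field names, and the "worst status wins" rollup are assumptions made for the example, not something the presentation prescribes.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Red/Yellow/Green score; a higher value means worse."""
    GREEN = 0
    YELLOW = 1
    RED = 2


@dataclass
class Indicator:
    """One dashboard row: score or status, trend, and key issue."""
    name: str
    status: Status
    previous: Status  # status at the last review, used to derive the trend
    key_issue: str

    @property
    def trend(self) -> str:
        if self.status < self.previous:
            return "improving"
        if self.status > self.previous:
            return "worsening"
        return "steady"


def overall(indicators: list[Indicator]) -> Status:
    """Roll up to the worst status (an assumed, deliberately conservative rule)."""
    return max((i.status for i in indicators), default=Status.GREEN)


# Hypothetical example data
dashboard = [
    Indicator("Plan", Status.GREEN, Status.YELLOW, "none"),
    Indicator("Test", Status.RED, Status.YELLOW, "failover drill not run"),
    Indicator("Implementation", Status.YELLOW, Status.YELLOW, "backup site staffing"),
]

for row in dashboard:
    print(f"{row.name:<15} {row.status.name:<7} {row.trend:<10} {row.key_issue}")
print("Overall:", overall(dashboard).name)
```

Deriving the trend from the previous review keeps the row honest: a Yellow that used to be Green reads differently from a Yellow that used to be Red.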
Public Safety Scenarios
• Public safety entities have a more difficult
challenge
• Your IT DR/BC plan is intertwined with risk
scenarios
• You may be affected by the risks of a given
scenario and your IT plan must address those
risks appropriately to maintain operations
• You also have a role in response to the
scenario so the events will affect your
operational requirements
Scenarios Overview
• Threat driven geographic circles of impact
• Kinds of threats and events
• Responsibility
– What will you do, what is shared, what do others
have to do for themselves
• Tolerance for risk and uncertainty
• Lesson learned: if you have a well known
and documented local risk:
– Have a real plan or get ready for a career
change…
Source: IBM
Scenarios
• Identify Possible and Likely Natural
Disasters and Environmental Conditions By
Kind and Duration of Effects
– Tornado
– Hurricane
– Tsunami
– Flood
– Snowstorm
– Drought
– Earthquake
Scenarios
• Identify Possible and Likely Natural
Disasters and Environmental Conditions By
Kind and Duration of Effects
– Electrical storms
– Fire
– Subsidence and landslides
– Freezing Conditions
Scenarios
• Identify Possible and Likely Natural
Disasters and Environmental Conditions By
Kind and Duration of Effects
– Contamination, Toxic releases and
environmental hazards
– Epidemic
– Pandemic
– Animal or crop disease outbreak
Scenarios
• Organized and/or Deliberate Disruption
– Act of terrorism
• WMD
– Acute and short lived (bomb)
– Acute and long lived (dirty bomb)
– Chronic
» Long term (contaminants and biohazards)
» Permanent (radioactivity, etc.)
• WLD (suicide bombers, car bombs, utility sabotage)
• Bioterrorism or genetically modified or inorganic
organisms
– Direct contact
– Infectious
» Contact
» Airborne
Scenarios
• Organized and/or Deliberate Disruption
– Act of Sabotage
– Product or food tampering
– Act of war
– Theft
– Arson
– Labor Disputes / Industrial Action
Scenarios
• Loss of Utilities and Services
– Electrical power failure
– Loss of gas supply
– Loss of water supply
– Petroleum and oil shortage
• Raw materials
• Refined materials
– Communications services breakdown
– Loss of drainage / waste removal and trash
pickup
Scenarios
• Equipment or System Failure
– Internal power failure
– HVAC failure
– Equipment failure (excluding IT hardware)
Scenarios
• Serious Information Security Incidents
– Cyber crime
– Malware
– Zombie attacks
– Denial of service
– Loss or alteration of records or data
– Disclosure of sensitive information
Scenarios
• IT system failure (local or hosted)
– Hardware
– Software
• Commercial application
• Locally developed application
– Data
– Communications
Scenarios
• Other Emergency Situations
– Workplace violence
– Public transportation disruption
– Neighborhood hazard
– Health and safety issues
Scenarios
• Multiple and compound hazards and
events
– Purposeful
– Coincidental
– Causally connected
– Interrelated
IT Requirements
• What systems need to function
• How fast
– Maximum and optimum time frame for each
system or function to be restored
• How well
– Sometimes minimal functionality is sufficient
IT Requirements
• Where will it be used and by whom and
will the communications infrastructure
support it?
– Employees
– Users or beneficiaries
• By what priority will systems be restored
• The priority will be modified by what
contingencies
– E.g. a long term total evacuation changes the
operational needs for criminal justice systems
and personnel
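The questions on the two IT Requirements slides (what, how fast, how well, in what priority, modified by what contingencies) can be captured as data. The sketch below is a hedged illustration: the `SystemRequirement` fields, the sample systems, the hour values, and the assumption that an evacuation lowers the urgency of jail management are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SystemRequirement:
    name: str
    priority: int         # 1 = restore first
    max_hours: float      # maximum tolerable time to restore
    optimum_hours: float  # target time to restore
    minimal_ok: bool      # True if minimal functionality is sufficient at first


def restoration_order(systems, overrides=None):
    """Return systems sorted by priority, after applying contingency overrides.

    `overrides` maps a system name to a new priority for the active contingency.
    Ties are broken by the tighter maximum restoration time.
    """
    overrides = overrides or {}
    return sorted(
        systems,
        key=lambda s: (overrides.get(s.name, s.priority), s.max_hours),
    )


# Hypothetical systems and numbers, for illustration only
systems = [
    SystemRequirement("Dispatch", 1, 0.5, 0.1, False),
    SystemRequirement("Jail management", 2, 8, 4, True),
    SystemRequirement("Records", 3, 24, 12, True),
    SystemRequirement("Mobile data", 2, 4, 1, True),
]

normal = [s.name for s in restoration_order(systems)]
# Assumed contingency: a long-term total evacuation lowers the urgency of jail management
evacuation = [s.name for s in restoration_order(systems, {"Jail management": 4})]
print(normal)
print(evacuation)
```

Keeping the contingency as a separate override table, rather than editing the base priorities, makes it easy to review each scenario's restoration order side by side.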
Facilities
• Hot, warm, cold
• Mirrored, recoverable, reload-able
• Properly located
• EOC
• Non-EOC
• Operational
• IT facilities
• For user interaction with IT systems
Facilities
• New kinds of mutual aid and sister
city/county/state arrangements
– Work with friends, colleagues, associations,
and vendors
– To match you with comparable entities that
are located outside the various geographic
threat circles
– Who can mirror your IT operations (hardware,
software, operating systems, and culture)
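Checking whether a candidate partner really sits outside the geographic threat circles is a simple distance calculation. The following is a minimal sketch, assuming each threat is modeled as a center point plus a radius in kilometers; the threat zone, the candidate cities, and the approximate coordinates are hypothetical examples.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def outside_all_circles(site, threats):
    """True if the site lies outside every threat circle (center plus radius in km)."""
    return all(
        haversine_km(site["lat"], site["lon"], t["lat"], t["lon"]) > t["radius_km"]
        for t in threats
    )


# Hypothetical threat circle and candidate partners (coordinates are approximate)
threats = [{"name": "hurricane landfall zone", "lat": 29.95, "lon": -90.07, "radius_km": 300}]
candidates = [
    {"name": "Baton Rouge", "lat": 30.45, "lon": -91.19},
    {"name": "Des Moines", "lat": 41.59, "lon": -93.62},
]

partners = [c["name"] for c in candidates if outside_all_circles(c, threats)]
print(partners)
```

Distance is only the first filter; the slide's other criterion, a partner who can mirror your hardware, software, operating systems, and culture, still has to be checked by people.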
People
• The right numbers, skills, location,
redundancy, etc.
– Skills and abilities inventory
• Employees
• Contractors
• Vendors
• Mutual aid and “the cavalry”
People
• Force in depth—who is the backup to the
backup to the backup?
• Consider the actual health and physical
abilities and disabilities of a person when
assigning tasks for a disaster scenario
– The disaster is not the time to find out the
electrician in the hazmat suit has a heart
condition
• What family and personal duties may
interfere with performing official duties (e.g.
save your own kids or save a stranger)?
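The "backup to the backup to the backup" question can be turned into a routine check against the skills inventory. The sketch below is an assumption-laden illustration: the roster layout, the names, and the target depth of three (a primary plus two backups) are hypothetical, and "available" stands in for whatever health, location, and family-duty screening the organization actually uses.

```python
def coverage_gaps(roster, available, required_depth=3):
    """Report roles whose chain of available people is shorter than required.

    roster: role -> ordered list of names (primary first, then backups)
    available: set of names who can actually report for duty
    required_depth: assumed target of a primary plus two backups
    """
    gaps = {}
    for role, chain in roster.items():
        ready = [name for name in chain if name in available]
        if len(ready) < required_depth:
            gaps[role] = ready
    return gaps


# Hypothetical roster
roster = {
    "Generator electrician": ["Avery", "Blake", "Casey"],
    "Dispatch supervisor": ["Drew", "Ellis"],
    "Network engineer": ["Finley", "Gray", "Harper", "Indy"],
}
available = {"Avery", "Casey", "Drew", "Ellis", "Finley", "Gray", "Harper"}

for role, ready in coverage_gaps(roster, available).items():
    print(f"{role}: only {len(ready)} of 3 available ({', '.join(ready) or 'nobody'})")
```

Running a check like this during daily readiness reviews surfaces thin coverage before an event rather than during it.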
Systems
• Daily operational
• Interdependent systems
• Emergency only
• Identity security and access management
for physical and logical security
– Follow FIPS 201 for federal/state/local
interoperability
Integration
• With whom should you work closely?
• Identify integration issues between:
– Internal systems and public safety entities
– Other governmental systems
– Related actors
– Non-governmental systems and processes
• Example: 911 and 311 or its equivalent
– Normally separate but related
– Emergencies blur the line
– Co-location, cross training, and system
integration
Coordination
• Within organization
• Within unit of government
• Across units of government
• Across levels of government
• Across public and private boundaries
Daily Readiness and
Simulated Escalations
• A disaster a day (“What, that’s not normal?”)
• Realistic scenarios
• Captured lessons
• Learning and actually responding to lessons
learned within risk framework
• A quality and security framework for daily
operations has substantial overlap with
DR/BC
Security Capabilities Models
Like similar capability models from the
Carnegie Mellon SEI, an SCMM model brings
benefits:
– Helps close security holes
– Serves as a foundation for growth
– Guides security leadership
– Is evolutionary, not chaotic
– Supports point solutions
[Figure: KPMG SCMM Model. Axis labels: Causes, Effects. Layer labels: Strategy; Management; Knowledge; Technologies; Support. Elements: Security Leadership; Security Sponsorship; Security Strategy; Security Program; Security Program Structure; Security Program Resources and Skillsets; Security Policies; Security Policies, Standards and Guidelines; Security Management; Security Administration; Security Monitoring; User Management; User Awareness; Information Asset Security; Application Security; Database / Information Security; Host Security; Internal Network Security; Network Perimeter Security; Technology Protection and Continuity; Physical and Environment Controls; Contingency Planning Controls.]
Capability Maturity
Like the SEI CMM models, the KPMG Security
Capability Model has five levels of maturity:
• Optimizing (5): Continuously improving process
• Managed (4): Predictable process
• Defined (3): Standard, consistent process
• Repeatable (2): Disciplined process
• Initial (1): Informal process
Testing and Independent
Verification and Validation
• Does the planned response or action step
actually work?
• Who verifies that it does?
• What do you do if it fails the test?
Implementation and Triage
• Someone better be in charge
• Dispute resolution processes
• Who will be your Sensibility and Sanity
Checker (off site, not affected by the
disaster, and actually getting enough sleep
to make sound decisions)?
• Baton Rouge example with Mayor Holden
Recovery, Discovery, and
Improvements
• What will the new normal be and when will
it happen
• Learn from history, both recent and long
past
• Document while the event occurs if at all
possible (make it someone’s job) or soon
after before memories fade
Player Scorecard
Who Is In the Game and Why
Overlapping and Interrelated Responsibilities
[Figure labels: Disaster Preparedness and Recovery and Business Continuity; Physical Security; Public Safety; Quality Assurance Methodologies; Cyber Security]
The Usual Suspects in
Public Safety
• Police
• Fire
• Other sworn officers (transit, game, building
or branch based, etc.)
• National Guard
• Public Health
• Public Works
• Transportation
• Environmental Protection
The Usual Suspects in
Emergency Management
• Federal, state and local emergency
management entities
• National Guard
• NOAA, NWS, NSSL, other National
Laboratories
• Corps of Engineers
IT Entities
• CIO, CTO, and Enterprise IT Shops
• Distributed IT Departments and leadership
• Government IT contractors
– DR/BC specific entities
– Applications developers and software
– Hardware
– Service providers (ASP, MSP, call centers, etc.)
• Communications providers
Policy Makers
• Executive, legislative, and judicial
– Those who hold the seat and those who
actually make the decisions…
– Go below the top level to ensure clarity,
alignment, and redundancy
• EOC designees
• Emergency authorizers
Non-Governmental
Organizations
• Media
– Broadcast and satellite
• Emergency Broadcast System Members
– Print
– New media
• The Web
– Government site managers
– Commercial site managers
– Citizens and bloggers
– Self-organizing communities (e.g. Craigslist)
Non-Governmental
Organizations
• Charities
• Businesses and business associations
• Community organizations
• Vital private services (hospitals, nursing
homes, etc.)
A DR/BC Framework
Business Operations
and Technology
• Create a matrix, not a linear or
organizational view
• Strategy
• Organization
• Processes
• Applications and data
• Technology
• Facilities
Source: IBM
Action Steps to a Real Plan
First Steps
First Steps
• Leadership: clarity, alignment, and
commitment
• Authority or consensus?
• Stakeholder roles and responsibilities
• Be clear about risk tolerance
• Applications and IT assets inventory
– If needed, dust off and update your Y2K work
• Good data on plan status, readiness, test
results, response, and compliance
First Steps
• Make a friend in accounting—actuarially
accurate threat scenarios are more likely to
be funded as risk and cost can be properly
balanced
• Review existing plan or make a plan
• Borrow or buy a template
• Review peer plans and conduct site visits
• Communicate until it hurts
Critical Functions
Nail Down Your Critical
Functions
• Law and order essentials (people, mobility,
tools, survival basics, etc.)
• Communications
• Personnel management (policies,
scheduling, notification trees and systems,
counseling, etc.)
• Data and the connections to data and people
• Transactional systems
Nail Down Your Critical
Functions
• Rescue and response
• Pipeline to the health care system
• Building/location/hazmat information for fire
and first responders
• Justice processing and incarceration
• Dispatch
Nail Down Your Critical
Functions
• Records
• Mobility
– Devices and local storage if communications are
intermittent or fail (e.g. mobile maps and
databases)
• Know what you can actually cover (and what
you are just waving your hands at and
hoping it either works or is never needed)
Funding and Leverage
Funding and Leverage
• Work within your risk/threat/cost/benefit
matrix and follow your own rules
• How serious are you about being
prepared?
Funding and Leverage
• Stop building single purpose
infrastructures and reuse what you have
– “Ask not, what an infrastructure can do for
you, but what it can do for your taxpayers”
• Use shared services
• Follow standards or help create them if
lacking
Funding and Leverage
• Determine what pre-existing, unmet needs
can be addressed by a new investment
• Determine whether existing public safety
or enterprise systems will do the job and if
you can use them
• Invest wisely
– Vendors over inventors
– COTS over customization
– Web services over hard coding
Think Out of the Box
Think Third World
• Hand crank your computers
• Bike generators
• Solar and wind power
• Portable water purifiers
• Emergency shelter
• Runners and mountain bikes
• Hand tools
Think New World
• Internet Protocol (IP) everything
– Bridge between radio and wireless data/Wi-Fi,
and use each as an IP conduit as needed
• Gigs of portable flash memory
• Satellite data and telephony
Think New World
• Instant Message
• Text and mobile email
• Cell On Wheels/Boat/Balloon
• Negotiate/legislate priority and bumping
rights in telecommunications provisioning
Integrate With the Big DR/BC
Picture
The Big Picture
• Consult EM before, during, and after
• Once essential public safety systems have
both an IT DR/BC plan and an overall plan,
those plans can be incorporated into the
overall EM plan for the jurisdiction
• Tie it all together in formal and informal
agreements
• Create a focal point such as your EOC
EOC Basics
• Not located in a hazard area (floodway)
• 500 square feet minimum floor space
• Communications section adjacent to EOC
• Three methods of communications with state EMA
and local responders
• UPS and generator systems located above flood
level
• Sleeping space for identified staff
• Kitchen space/food or meal contract
• New construction to International Building Code
Source: Alabama EMD
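The EOC basics above lend themselves to a simple site checklist. The sketch below is one possible encoding, not an official assessment tool: the `EOCSite` field names and the sample site are hypothetical, only the 500 square foot and three-method thresholds come from the slide, and the building code item is left out because it needs a professional review rather than a yes/no field.

```python
from dataclasses import dataclass


@dataclass
class EOCSite:
    in_hazard_area: bool
    floor_space_sqft: float
    comms_section_adjacent: bool
    comm_methods: int              # distinct ways to reach state EMA and local responders
    power_above_flood_level: bool  # UPS and generator systems
    sleeping_space: bool
    food_arrangement: bool         # kitchen space, stocked food, or a meal contract


def eoc_findings(site: EOCSite) -> list[str]:
    """Return the list of unmet basics; an empty list means all checks pass."""
    checks = [
        (not site.in_hazard_area, "located in a hazard area"),
        (site.floor_space_sqft >= 500, "less than 500 square feet of floor space"),
        (site.comms_section_adjacent, "communications section not adjacent"),
        (site.comm_methods >= 3, "fewer than three methods of communication"),
        (site.power_above_flood_level, "UPS or generator below flood level"),
        (site.sleeping_space, "no sleeping space for identified staff"),
        (site.food_arrangement, "no kitchen, food, or meal contract"),
    ]
    return [problem for ok, problem in checks if not ok]


# Hypothetical candidate site
site = EOCSite(False, 450, True, 2, True, True, False)
for problem in eoc_findings(site):
    print("FAIL:", problem)
```

Each finding maps naturally onto a Red entry in the report card, with the text of the finding as its key issue.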
Conclusion: Essential Public Safety
Systems and Organizations Must
Be Disaster Resistant, Flexible,
Diversified, and Redundant
(Or We Are All In Big Trouble)
Contact Information
Richard J. H. Varn
Center for Digital Government
[email protected]
Model Plan Outline
• What follows is a private-sector-based but
broadly applicable tool that sells for $199
• To buy a copy of the business continuity
plan generator see http://www.eoncommerce.com/rusecure/bcp.asp
Model Plan Outline
• Business Continuity - Preparing the Plan
• Initiating the BCP Project
• Project Initiation Activities
• BC 010101 Review of Existing BCP (if
available)
Model Plan Outline
• BC 010102 Benefits of Developing a BCP
• BC 010103 BCP Policy Statement
• BC 010104 Preliminary BCP Project
Budget
• BC 010105 Procedure for Approving BCP
Content
Model Plan Outline
• BC 010106 Communication on BCP
Project to All Employees
• Project Organization
• BC 010201 Terms of Reference for BCP
Project Manager
• BC 010202 Appoint BCP Project Manager
and Deputy
• BC 010203 Select and Notify BCP Project
Team
Model Plan Outline
• BC 010204 Initial BCP Project Meeting
• BC 010205 Project Objectives and
Deliverables
• BC 010206 Project Milestones
• BC 010207 Project Reporting
Requirements and Frequency
• BC 010208 Required Documents and
Information
Model Plan Outline
• Assessing Business Risk and Impact of
Potential Emergencies
• Emergency Incident Assessment
• BC 020101 Environmental Disasters
• BC 020102 Organized and / or Deliberate
Disruption
Model Plan Outline
• BC 020103 Loss of Utilities and Services
• BC 020104 Equipment or System Failure
• BC 020105 Serious Information Security
Incidents
• BC 020106 Other Emergency Situations
• Business Risk Assessment
Model Plan Outline
• BC 020201 Key Business Processes
• BC 020202 Establish Time-Bands for
Business Service Interruption
Measurement
• BC 020203 Financial and Operational
Impact
• IT and Communications
Model Plan Outline
• BC 020301 Specifications of IT and
Communication Systems and Business
Dependencies
• BC 020302 Key IT, Communications and
Information Processing Systems
• BC 020303 Key IT Personnel and Emergency
Contact Information
• BC 020304 Key IT and Communications
Suppliers and Maintenance Engineers
• BC 020305 Existing IT Recovery Procedures
Model Plan Outline
• Existing Emergency Procedures
• BC 020401 Summary of Existing
Procedures for Handling Emergency
Situations
• BC 020402 Key Personnel Responsible
for Handling Existing Emergency
Procedures
• BC 020403 External Emergency Services
and Contact Numbers
Model Plan Outline
• BC 020500 Premises Issues
• BC 020501 Responsibility and Authority
for Building Repairs
• BC 020502 Back-up Power Arrangements
• Preparing for a Possible Emergency
Model Plan Outline
• Back-up and Recovery Strategies
• BC 030101 Alternative Business Process
Handling Strategy
• BC 030102 IT Systems Back-Up and
Recovery Strategy
• BC 030103 Premises and Essential
Equipment Back-up and Recovery
Strategy
Model Plan Outline
• BC 030104 Customer Service Back-up
and Recovery Strategy
• BC 030105 Administration and Operations
Back-up and Recovery Strategy
• BC 030106 Information and
Documentation Back-up and Recovery
Strategy
• BC 030107 Insurance Coverage
• Key BCP Personnel and Supplies
Model Plan Outline
• BC 030201 Functional Organization Chart
• BC 030202 BCP Project Co-coordinator
and Deputy for Each Functional Area
• BC 030203 Key Personnel and
Emergency Contact Information
• BC 030204 Key Suppliers and Vendors
and Emergency Contact Information
• BC 030205 Manpower Recovery Strategy
Model Plan Outline
• BC 030206 Establishing the Disaster
Recovery Team
• BC 030207 Establishing the Business
Recovery Team
• Key Documents and Procedures
• BC 030301 Documents and Records Vital
to the Business Process
• BC 030302 Off-site Storage
Model Plan Outline
• BC 030303 Emergency Stationery and
Office Supplies
• BC 030304 Media Handling Procedures
• BC 030305 Emergency Authorization
Procedures
• BC 030306 Prepare Budget for Back-up
and Recovery Phase
Model Plan Outline
• Disaster Recovery Phase
• Planning for Handling the Emergency
• BC 040101 Identification of Potential
Disaster Status
• BC 040102 Involvement of Emergency
Services
• BC 040103 Assessing Potential Business
Impact of the Emergency
Model Plan Outline
• BC 040104 Project Management Activities
• Notification and Reporting During
Recovery Phase
• BC 040201 Mobilizing the Recovery Team
• BC 040202 Notification to Management
and Key Employees
Model Plan Outline
• BC 040203 Handling Personnel Families
Notification
• BC 040204 Handling Media during the
Disaster Recovery Phase
• BC 040205 Maintaining Event Log during
Disaster Recovery Phase
• BC 040206 Disaster Recovery Phase
Report
• Business Recovery Phase
Model Plan Outline
• Managing the Business Recovery Phase
• BC 050101 Mobilizing the Business
Recovery Team
• BC 050102 Assessing Extent of Damage
and Business Impact
• BC 050103 Preparing Specific Recovery
Plan
Model Plan Outline
• BC 050104 Monitoring Progress
• BC 050105 Keeping Everyone Informed
• BC 050106 Handing Business Operations
Back to Regular Management
• BC 050107 Preparing Business Recovery
Phase Report
• Business Recovery Activities
Model Plan Outline
• BC 050201 Power and Other Utilities
• BC 050202 Premises, Fixtures and
Furniture (Facilities Recovery
Management)
• BC 050203 Communication Systems
• BC 050204 IT Systems (Hardware and
Software)
Model Plan Outline
• BC 050205 Production Equipment
• BC 050206 Other Equipment
• BC 050207 Warehouse and Stock
• BC 050208 Trading, Sales and Customer
Service
Model Plan Outline
• BC 050209 Human Resources
• BC 050210 Information and
Documentation
• BC 050211 Office Supplies
• BC 050212 Operations and Administration
(Support Services)
Model Plan Outline
• Testing the Business Recovery Process
• Planning the Tests
• Develop Objectives and Scope of Tests
• Setting the Test Environment
• Environmental Disasters
Model Plan Outline
• Organized and / or Deliberate Disruption
• Loss of Utilities and Services
• Equipment or System Failure
• Serious Information Security Incidents
• Other Emergency Situations
• Prepare Test Data
• Identify Who is to Conduct the Tests
Model Plan Outline
• Identify Who is to Control and Monitor the
Tests
• Prepare Feedback Questionnaires
• Prepare Budget for Testing Phase
• Training Core Testing Team for each
Business Unit
Model Plan Outline
• Conducting the Tests
• Test each part of the Business Recovery
Process
• Test Accuracy of Employee and Vendor
Emergency Contact Numbers
• Assess Test Results
• Training Staff in the Business Recovery
Process
Model Plan Outline
• Managing the Training Process
• Develop Objectives and Scope of Training
• Training Needs Assessment
• Training Materials Development Schedule
• Prepare Training Schedule
• Communication to Staff
• Prepare Budget for Training Phase
• Assessing the Training
Model Plan Outline
• Feedback Questionnaires
• Assess Feedback
• Keeping the Plan Up-to-date
• Maintaining the BCP
Model Plan Outline
• Change Controls for Updating the Plan
• Responsibilities for Maintenance of Each
Part of the Plan
• Test All Changes to Plan
• Advise Person Responsible for BCP
Training