Transcript DR 101
Business Continuity: Ensuring Survival
Ron LaPedis, CBCP, CISSP Sr. Product Manager, Compaq
Agenda
Continuity planning? I thought it was called disaster recovery…
Why?
Professional practices
Continuity planning model
Step by step
Horror stories
Food for thought
Some people never learn…
…for 10 minutes…her job was to race through work areas and scoop up appointment books, payroll records and Rolodexes needed to carry on business elsewhere… Many tenants’ main concern was getting payroll checks…phone lists and calendars
Source: San Francisco Chronicle 11/30/89, “Crane Collapse Closes Buildings”
(Over 1 month after the Loma Prieta earthquake)
Something happens
[Figure: productivity plotted against time. A disaster event occurs and business process loss follows (single department or multiple departments).]
Disaster recovery
[Figure: productivity plotted against time. A disaster event occurs and business process loss follows.]
Continuity planning
[Figure: productivity plotted against time. A disaster event occurs and business process loss follows.]
Why?
Downtime is lost revenue
Industry        Application            Average cost per hour of downtime (US$)
Financial       Brokerage operations   $7,840,000
Financial       Credit card sales      $3,160,000
Media           Pay-per-view           $183,000
Retail          Home shopping (TV)     $137,000
Retail          Catalog sales          $109,000
Transportation  Airline reservations   $108,000
Entertainment   Tele-ticket sales      $83,000
Shipping        Package shipping       $34,000
Financial       ATM fees               $18,000
Source: Contingency Planning Research, 2000
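As a rough illustration of how the per-hour figures turn into exposure, the sketch below multiplies an hourly cost by the length of an outage. The hourly figures are the ones quoted above; the function name and the four-hour outage are assumptions chosen only for the example.

```python
# Average cost per hour of downtime (US$), taken from the figures quoted above.
HOURLY_COST = {
    "Brokerage operations": 7_840_000,
    "Credit card sales": 3_160_000,
    "Airline reservations": 108_000,
    "ATM fees": 18_000,
}


def downtime_cost(application: str, hours: float) -> float:
    """Estimated revenue lost for an outage of the given length (hypothetical helper)."""
    return HOURLY_COST[application] * hours


if __name__ == "__main__":
    # A four-hour outage is an arbitrary example length.
    for app in HOURLY_COST:
        print(f"{app}: ${downtime_cost(app, 4):,.0f} for a 4-hour outage")
```

A linear estimate like this is only a starting point; real losses may grow faster than linearly as an outage drags on.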
Downtime is not acceptable
Time zones are no longer a barrier for conducting business
If your site is down, your competition is one click away
– Utility failure
– Communications failure
– System failure
– Application failure
– OS failure
– Utility upgrade
– Communications upgrade
– System upgrade
– Application upgrade
– OS upgrade
And what about system and database maintenance?
Downtime is controllable
System and network architecture
– High-availability systems
– Redundant network
– Hardened primary site
– Remote backup site
Continuity planning
– Know what you will do before you need to do it
Continuity planning perspective
Ensures that an event doesn’t become a disaster
Covers a broad spectrum of business and technology issues
The key goal:
– Required business process availability
Disaster Recovery Institute International (DRII)
Mission
DRII’s mission is to provide the leadership and best practices that serve as a base of common knowledge for all business continuity and disaster recovery planners and organizations in the industry.
DRII’s professional practices
Pre-planning
1. Project initiation and management
2. Risk evaluation and control
3. Business impact analysis
Planning
4. Developing business continuity strategies
5. Emergency response and operations
6. Developing and implementing business continuity plans
Post-planning
7. Awareness and training programs
8. Maintaining and exercising business continuity plans
9. Public relations and crisis communication
10. Coordination with public authorities
DRII’s business continuity planning model
1. Project initiation phase
2. Functional requirements phase
3. Design and development phase
4. Implementation phase
5. Testing and exercise phase
6. Maintenance and update phase
7. Execution phase
It’s a process
[Figure: the business continuity process drawn as a cycle. From Start: Project initiation, Functional requirements, Design and development, Implementation, Testing and exercising, Maintenance and updating. Other labels: Business continuity process, Required availability times, Procedures.]
Project initiation phase
Management commitment and policies
Objectives and requirements
Baseline assumptions
Project management
Teams
– Delphi – Business function knowledge
– Corporate team – Infrastructure / common activities
– EMT – Emergency Management Team ‘the workers’
– CMT – Crisis Management Team ‘the decision makers’
Project initiation phase
Project management
CP is a process consisting of programs and projects
It does not take a subject matter expert to manage projects; it takes a project manager
Use your CP experts to perform CP activities, not to manage projects
[Figure: business continuity process cycle, repeated. Current phase: Functional requirements.]
Functional requirements phase
Fact gathering, alternatives and decisions
Risk analysis and controls
Business impact analysis
– RTO – Recovery Time Objective – How fast
– RPO – Recovery Point Objective – How much
Alternative strategies
Cost benefit analysis and budgeting
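One way to picture how RTO and RPO feed the choice among alternative strategies is the minimal sketch below. It assumes each strategy can be summarized by how long recovery takes, how old the newest recoverable data is, and an annual cost. The class, the function names, and every figure are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Strategy:
    name: str
    recovery_hours: float    # how fast processing resumes (compare with RTO)
    data_loss_hours: float   # how much recent data is lost (compare with RPO)
    annual_cost: float       # hypothetical budget figure, US$


def meets_objectives(s: Strategy, rto_hours: float, rpo_hours: float) -> bool:
    """True when the strategy recovers within the RTO and loses no more than the RPO."""
    return s.recovery_hours <= rto_hours and s.data_loss_hours <= rpo_hours


def cheapest_acceptable(strategies: List[Strategy], rto_hours: float,
                        rpo_hours: float) -> Optional[Strategy]:
    """Cheapest strategy that satisfies both objectives, or None if none does."""
    ok = [s for s in strategies if meets_objectives(s, rto_hours, rpo_hours)]
    return min(ok, key=lambda s: s.annual_cost) if ok else None


if __name__ == "__main__":
    # All names and numbers below are made up for illustration.
    options = [
        Strategy("Nightly tape, cold site", 72, 24, 50_000),
        Strategy("Nightly tape, warm site", 24, 24, 200_000),
        Strategy("Replication, hot site", 1, 0.1, 900_000),
    ]
    choice = cheapest_acceptable(options, rto_hours=24, rpo_hours=24)
    print(choice.name if choice else "No strategy meets the objectives")
```

If nothing meets the objectives, either the budget or the objectives have to move, which is the cost benefit discussion this phase calls for.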
Functional requirements phase
Risk analysis
Asset inventory and definition
Communication and monitoring
Vulnerability and threat assessment
Decision
Evaluation of controls
Functional requirements phase
Risk analysis
Quantitative – Facts and figures, hard
– Statistical
– Actuarial
– Annualized Loss Exposure (ALE)
– Objective
Qualitative – Not calculable, soft
– Reputation
– Future market share
– Subjective
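The slide does not give a formula for ALE. It is conventionally computed as the loss from a single occurrence multiplied by the expected number of occurrences per year, and the sketch below assumes that definition. The function name and the figures are hypothetical.

```python
def annualized_loss_exposure(single_loss: float, occurrences_per_year: float) -> float:
    """ALE under the conventional definition: loss per event times events per year."""
    return single_loss * occurrences_per_year


if __name__ == "__main__":
    # Hypothetical threat: an outage costing US$200,000 that is expected
    # once every four years (0.25 occurrences per year).
    ale = annualized_loss_exposure(200_000, 0.25)
    print(f"ALE: ${ale:,.0f} per year")
```

The result is only as good as the two estimates behind it, which is why the qualitative factors listed above still need their own judgment.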
Functional requirements phase
Risk analysis
Controls do not reduce the threat; they reduce the exposure (and hence the risk)
Functional requirements phase
Business impact analysis
[Figure: cost and loss plotted against time to recover. The chart marks the acceptable downtime and the maximum cost of control.]
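One common reading of a cost and loss chart like this is that faster recovery costs more in controls and slower recovery costs more in losses, so the useful question is which option minimizes the two together. The sketch below assumes that reading, a constant loss per hour, and a single outage over the planning period. The names and numbers are hypothetical.

```python
from typing import List, Tuple

# (name, cost of control in US$, time to recover in hours), all hypothetical
Option = Tuple[str, float, float]


def total_cost(control_cost: float, recovery_hours: float, loss_per_hour: float) -> float:
    """Cost of the control plus the loss accumulated while recovering."""
    return control_cost + recovery_hours * loss_per_hour


def best_option(options: List[Option], loss_per_hour: float) -> Option:
    """Option with the lowest combined cost of control and loss."""
    return min(options, key=lambda o: total_cost(o[1], o[2], loss_per_hour))


if __name__ == "__main__":
    options = [
        ("Cold site", 50_000, 72),
        ("Warm site", 300_000, 24),
        ("Hot site", 1_200_000, 1),
    ]
    name, cost, hours = best_option(options, loss_per_hour=20_000)
    print(f"{name}: recover in {hours} h, total ${total_cost(cost, hours, 20_000):,.0f}")
```

With these made-up figures the middle option wins: the cold site loses too much during recovery and the hot site costs more than the loss it prevents.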
[Figure: business continuity process cycle, repeated. Current phase: Design and development.]
Design and development phase
Scope and objectives
Recovery teams
Cookbook
Key disaster scenario
Escalation, notification, and activation
Design and development phase
Recovery teams
Evaluation and declaration
Notification
Emergency response
Interim processing
Salvage
Relocation/reentry
Design and development phase
Key disaster scenario
“A fire broke out in the computer room. We are unsure of the state of the computers and data stored there. The building has been shut down by the fire department until they are sure that it is safe to enter. They are estimating that we will not have access to the building for a couple of days.”
Design and development phase
Escalation, notification, and activation
Who activates the EMT?
How does the EMT get activated?
Who decides to activate the CMT?
How does the CMT get activated?
How does the CMT decide to activate the plan?
What happens if certain members of the CMT are unavailable?
[Figure: business continuity process cycle, repeated. Current phase: Implementation.]
Implementation phase
Emergency response
Command and control
Designation of authority
Scripts
Vendors and resources
Implementation phase
Designation of authority
Who is in charge?
– If they are not available, who is in charge?
If they are not available, who is in charge?
– If they are not available, who is in charge?
Committees cannot be in charge!
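The chain of questions above amounts to an ordered succession list: exactly one named person is in charge, and if that person cannot be reached, authority passes to the next name. A minimal sketch, with hypothetical role names:

```python
from typing import Iterable, Optional, Set


def person_in_charge(succession: Iterable[str], available: Set[str]) -> Optional[str]:
    """Return the first person in the succession list who is available, else None."""
    for person in succession:
        if person in available:
            return person
    return None


if __name__ == "__main__":
    # Hypothetical succession list, in order of authority.
    succession = ["CMT lead", "Deputy CMT lead", "Operations director", "Site manager"]
    reachable = {"Operations director", "Site manager"}
    print(person_in_charge(succession, reachable))
```

The function returns a single name, never a group, which matches the point that committees cannot be in charge. A `None` result means the list is too short for the scenario being exercised.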
Implementation phase
Scripts
Step by step listing of activities to be performed every step of the way
– In a disaster situation, people do not think rationally
Scripts can be tested, tuned, and tested again
– The person who follows a script does not need to be the person who developed the script
Automate as much as possible
– One company has 800 automated scripts just for recovering their database!
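An automated script in this sense can be as simple as an ordered list of steps that is run the same way every time and leaves a log. The sketch below is one possible shape for that idea, not any particular company's tooling. The step descriptions and the placeholder actions are hypothetical.

```python
from typing import Callable, List, Tuple

# A step is a description plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_script(steps: List[Step]) -> List[str]:
    """Run the steps in order, stop at the first failure, and return the log lines."""
    log = []
    for number, (description, action) in enumerate(steps, start=1):
        ok = action()
        log.append(f"{number}. {description}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later steps may depend on this one, so do not continue
    return log


if __name__ == "__main__":
    # Placeholder actions; a real script would call the actual recovery commands.
    steps = [
        ("Mount backup volume", lambda: True),
        ("Restore database files", lambda: True),
        ("Start database", lambda: True),
    ]
    for line in run_script(steps):
        print(line)
```

Because the steps are data, the same script can be exercised repeatedly and followed by someone who did not write it.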
Implementation phase
Vendors and resources
Hot site, warm site, cold site, off-site records storage
Equipment replacement
Rent-a-guard
Salvage experts
Catering
Hotel rooms, rental cars
Local authorities
– Police, fire, hospitals, hazmat teams
[Figure: business continuity process cycle, repeated. Current phase: Testing and exercising.]
Testing and exercise phase
Training and awareness
Exercise program objectives
Exercise plans, scenarios and exercises
Evaluation and modification
Testing and exercise phase
Exercise program objectives
Practice makes perfect
– Some companies spend hundreds of hours tweaking parts of their plans to decrease recovery time
Every second counts
Testing and exercise phase
Evaluation and modification
What went wrong and how do we fix it for next time?
Do not find someone to blame. A fault found now could save your company later
Were any of our assumptions wrong?
Do we need to revisit a previous phase?
[Figure: business continuity process cycle, repeated. Current phase: Maintenance and updating.]
Maintenance and update phase
Remember to budget for this phase. An untested, stale plan is worse than no plan at all!
Review criteria – still current?
Status, reporting, and audits
Distribution and security
– Your plan is a competitive asset
Execution phase
If an event becomes a disaster
– Decide
– Declare
– Notify
– Execute
Not just an IT problem
IT can recover computers and applications, not business processes
The computers are humming, the applications are loaded…
…and no one is around to use them
Like Cheerios are part of a complete breakfast…
IT recovery is part of a complete contingency plan
Horror Stories
Horror stories
Your backup site is in Atlantic City. You declare during the Miss America pageant (Hurricane Andrew)
Your computer room is in the basement and there’s a fire in the building (Bell Canada)
Will the generators be safe? Do you have a way to refuel them? (Tropical storm Allison)
Horror stories
1. You power up the generators and nothing happens
2. You power up the generators and the power surge blows out your systems
3. You power up the generators and realize that your air conditioning isn’t on backup power
Hint: Exercise your plan!
Food for thought
Tapes
Where is your tape backup hardware?
Where are tapes stored until they go offsite?
How quickly do your tapes go offsite?
Are multiple tape copies sent via different routes?
Do you do tape retrieval / restore tests?
For recovery, do you ship tapes in ‘waves’?
Food for thought
Replicated enterprise storage
Vendors guarantee disk integrity
– Backup disk = primary disk at a bit level
Database integrity is not guaranteed
Your database software needs to recover the database to a consistent state before you can begin processing on the backup system
Physical disk does not equal logical database
[Figure: a source system and a target system, each with Disk 1, Disk 2, and an Audit Log Disk. Transactions T1, T2, and T3 begin (B) and write data blocks D1 and D2; T3 and T1 commit (C). A marker shows each disk cache flush.
Captions:
– Database disk cache flushed infrequently for performance
– Audit disk cache flushed at transaction commit for safety
– Not flushed to disk but transaction committed and log flushed
– On disk, but not committed]
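To make the gap between physical disk and logical database concrete, here is a generic redo/undo sketch of the recovery the database software has to perform on the target system. It is not any vendor's actual mechanism. The log record layout is an assumption, as is the rule that a block is never written by two in-flight transactions at once (as with record or page locking).

```python
from typing import Dict, List, Tuple

# Assumed log records:
#   ("begin", txn), ("write", txn, block, old_value, new_value), ("commit", txn)
LogRecord = Tuple


def recover(disk: Dict[str, int], log: List[LogRecord]) -> Dict[str, int]:
    """Bring replicated disk contents to a consistent state using the audit log."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = dict(disk)

    # Redo: reapply, in log order, every write made by a committed transaction.
    # This covers updates that were committed but never flushed to the data disk.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, _old, new = rec
            db[block] = new

    # Undo: walk the log backwards and restore the old value for every write
    # made by a transaction that never committed but did reach the disk.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, block, old, _new = rec
            db[block] = old

    return db


if __name__ == "__main__":
    # Hypothetical target-system state: D1 is stale (T1's committed update was
    # never flushed) and D2 holds a value written by T2, which never committed.
    disk = {"D1": 10, "D2": 99}
    log = [
        ("begin", "T1"),
        ("begin", "T2"),
        ("write", "T1", "D1", 10, 11),
        ("write", "T2", "D2", 20, 99),
        ("commit", "T1"),
    ]
    print(recover(disk, log))
```

A bit-for-bit copy of the disks is only the input to this step; the backup system cannot begin processing until it has run.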
Food for thought
Check your third party site contract – How many other companies in the same threat area use the same vendor?
– How soon do you have to vacate? Where will you go?
– Have you included workstations and space for them?
Remember that building?
One year later, the tornado-scarred Bank One tower in Ft. Worth, Texas is still closed.
2000/03/30 2001/02/10