Transcript DR 101

Business Continuity: Ensuring Survival

Ron LaPedis, CBCP, CISSP Sr. Product Manager, Compaq

Agenda

• Continuity planning? I thought it was called disaster recovery…
• Why?
• Professional practices
• Continuity planning model
• Step by step
• Horror stories
• Food for thought

Some people never learn…

Crane Collapse Closes Buildings
(Over 1 month after the Loma Prieta earthquake)

“…for 10 minutes…her job was to race through work areas and scoop up appointment books, payroll records and Rolodexes needed to carry on business elsewhere…”

“Many tenants’ main concern was getting payroll checks…phone lists and calendars”

Source: San Francisco Chronicle 11/30/89

Something happens

[Chart: productivity vs. time. Marks the point where the disaster event occurs (single department or multiple departments) and the resulting business process loss.]

Disaster recovery

[Chart: productivity vs. time. Marks the point where the disaster event occurs and the resulting business process loss.]

Continuity planning

[Chart: productivity vs. time. Marks the point where the disaster event occurs and the resulting business process loss.]

Why?

Downtime is lost revenue

Industry         Application             Average cost per hour of downtime (US$)
Financial        Brokerage operations    $7,840,000
Financial        Credit card sales       $3,160,000
Media            Pay-per-view            $183,000
Retail           Home shopping (TV)      $137,000
Retail           Catalog sales           $109,000
Transportation   Airline reservations    $108,000
Entertainment    Tele-ticket sales       $83,000
Shipping         Package shipping        $34,000
Financial        ATM fees                $18,000

Source: Contingency Planning Research, 2000
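
To make the table concrete, here is a minimal sketch that turns the hourly figures into the cost of a specific outage. The function name `outage_cost` is hypothetical, and the calculation assumes the loss accrues evenly across the hour, which the source figures do not claim.

```python
# Average hourly downtime costs (US$), copied from the table above.
HOURLY_COST = {
    "Brokerage operations": 7_840_000,
    "Credit card sales": 3_160_000,
    "Pay-per-view": 183_000,
    "Home shopping (TV)": 137_000,
    "Catalog sales": 109_000,
    "Airline reservations": 108_000,
    "Tele-ticket sales": 83_000,
    "Package shipping": 34_000,
    "ATM fees": 18_000,
}


def outage_cost(application: str, minutes: float) -> float:
    """Estimate lost revenue for an outage (assumption: loss accrues linearly)."""
    return HOURLY_COST[application] * minutes / 60


if __name__ == "__main__":
    for app in ("Brokerage operations", "ATM fees"):
        print(app, outage_cost(app, minutes=15))
```

Even with the linear assumption, a quarter-hour outage is enough to show why the high-cost applications drive the availability requirements.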

Downtime is not acceptable

• Time zones are no longer a barrier for conducting business
• If your site is down, your competition is one click away
  – Utility failure
  – Communications failure
  – System failure
  – Application failure
  – OS failure
  – Utility upgrade
  – Communications upgrade
  – System upgrade
  – Application upgrade
  – OS upgrade

And what about system and database maintenance?

Downtime is controllable

• System and network architecture
  – High-availability systems
  – Redundant network
  – Hardened primary site
  – Remote backup site
• Continuity planning
  – Know what you will do before you need to do it

Continuity planning perspective

• Ensures that an event doesn’t become a disaster
• Covers a broad spectrum of business and technology issues
• The key goal:
  – Required business process availability

Disaster Recovery Institute International (DRII)

Mission

DRII’s mission is to provide the leadership and best practices that serve as a base of common knowledge for all business continuity and disaster recovery planners and organizations in the industry.

DRII’s professional practices

Pre-planning
1. Project initiation and management
2. Risk evaluation and control
3. Business impact analysis

Planning
4. Developing business continuity strategies
5. Emergency response and operations
6. Developing and implementing business continuity plans

Post-planning
7. Awareness and training programs
8. Maintaining and exercising business continuity plans
9. Public relations and crisis communication
10. Coordination with public authorities

DRII’s business continuity planning model

1. Project initiation phase
2. Functional requirements phase
3. Design and development phase
4. Implementation phase
5. Testing and exercise phase
6. Maintenance and update phase
7. Execution phase

It’s a process

[Diagram: the business continuity process drawn as a cycle. Start → Project initiation → Functional requirements → Design and development → Implementation → Testing and exercising → Maintenance and updating, then around again. Other labels on the diagram: Required availability times, Procedures.]

Project initiation phase

• Management commitment and policies
• Objectives and requirements
• Baseline assumptions
• Project management
• Teams
  – Delphi – Business function knowledge
  – Corporate team – Infrastructure / common activities
  – EMT – Emergency Management Team ‘the workers’
  – CMT – Crisis Management Team ‘the decision makers’

Project initiation phase

Project management

• Continuity planning (CP) is a process consisting of programs and projects
• It does not take a subject matter expert to manage projects; it takes a project manager
• Use your CP experts to perform CP activities, not to manage projects

Functional requirements phase

• Fact gathering, alternatives and decisions
• Risk analysis and controls
• Business impact analysis
  – RTO – Recovery Time Objective – How fast the process must be restored
  – RPO – Recovery Point Objective – How much data you can afford to lose
• Alternative strategies
• Cost benefit analysis and budgeting
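
As an illustration of how RTO and RPO are used once the business impact analysis has set them, the sketch below checks the result of an exercise against both objectives. The names `Objectives` and `meets_objectives` are hypothetical, and measuring RPO as the time between the last good copy and the failure is an assumption about how data loss is tracked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    rto: timedelta  # how fast: maximum tolerable time to restore the process
    rpo: timedelta  # how much: maximum tolerable data loss, measured in time


def meets_objectives(obj: Objectives, last_good_copy: datetime,
                     failure: datetime, restored: datetime) -> dict:
    """Compare an exercise (or a real event) against the stated RTO and RPO."""
    downtime = restored - failure          # time the process was unavailable
    data_loss = failure - last_good_copy   # work done since the last copy is lost
    return {"rto_met": downtime <= obj.rto, "rpo_met": data_loss <= obj.rpo}


if __name__ == "__main__":
    payroll = Objectives(rto=timedelta(hours=4), rpo=timedelta(minutes=15))
    print(meets_objectives(
        payroll,
        last_good_copy=datetime(2001, 6, 1, 8, 50),
        failure=datetime(2001, 6, 1, 9, 0),
        restored=datetime(2001, 6, 1, 12, 30),
    ))
```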

Functional requirements phase

Risk analysis

[Diagram: the risk analysis cycle]
• Asset inventory and definition
• Vulnerability and threat assessment
• Evaluation of controls
• Decision
• Communication and monitoring

Functional requirements phase

Risk analysis

• Quantitative – Facts and figures, hard
  – Statistical
  – Actuarial
  – Annualized Loss Exposure (ALE)
  – Objective
• Qualitative – Not calculable, soft
  – Reputation
  – Future market share
  – Subjective
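
The slide names Annualized Loss Exposure but gives no formula. A common convention, assumed here, is the cost of a single event multiplied by how often it is expected per year. The sketch below uses that convention to test whether a control pays for itself; the function names and all figures are hypothetical.

```python
def annualized_loss_exposure(single_loss: float, events_per_year: float) -> float:
    """ALE under the assumed convention: loss per event x expected events per year."""
    return single_loss * events_per_year


def control_is_justified(ale_before: float, ale_after: float,
                         annual_control_cost: float) -> bool:
    """A control is worth buying when it removes more exposure than it costs."""
    return (ale_before - ale_after) > annual_control_cost


if __name__ == "__main__":
    # Hypothetical: a computer room fire expected once every four years.
    before = annualized_loss_exposure(single_loss=400_000, events_per_year=0.25)
    # A suppression system does not change the frequency, only the loss per event.
    after = annualized_loss_exposure(single_loss=100_000, events_per_year=0.25)
    print(before, after, control_is_justified(before, after, annual_control_cost=30_000))
```

In this example the control leaves the event frequency untouched and lowers only the loss per event.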

Functional requirements phase

Risk analysis

Controls do not reduce the threat; they reduce the exposure (and hence the risk)

Functional requirements phase

Business impact analysis

[Chart: cost of control and loss plotted against time to recover. Marked points: acceptable downtime and maximum cost of control.]

Design and development phase

• Scope and objectives
• Recovery teams
• Cookbook
• Key disaster scenario
• Escalation, notification, and activation

Design and development phase

Recovery teams

• Evaluation and declaration
• Notification
• Emergency response
• Interim processing
• Salvage
• Relocation/reentry

Design and development phase

Key disaster scenario

“A fire broke out in the computer room. We are unsure of the state of the computers and data stored there. The building has been shut down by the fire department until they are sure that it is safe to enter. They are estimating that we will not have access to the building for a couple of days.”

Design and development phase

Escalation, notification, and activation

• Who activates the EMT?
• How does the EMT get activated?
• Who decides to activate the CMT?
• How does the CMT get activated?
• How does the CMT decide to activate the plan?
• What happens if certain members of the CMT are unavailable?

Implementation phase

• Emergency response
• Command and control
• Designation of authority
• Scripts
• Vendors and resources

Implementation phase

Designation of authority

• Who is in charge?
  – If they are not available, who is in charge?
• If they are not available, who is in charge?
  – If they are not available, who is in charge?
• Committees cannot be in charge!
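
The repeated question above amounts to an ordered line of succession that always resolves to exactly one person. A minimal sketch, assuming the plan records that order as a list; the function name `person_in_charge` and the role names are hypothetical.

```python
from typing import Optional, Sequence, Set


def person_in_charge(succession: Sequence[str], available: Set[str]) -> Optional[str]:
    """Return the first reachable person in the line of succession.

    Always a single name, never a group: committees cannot be in charge.
    Returns None if nobody on the list can be reached, which is a gap
    the plan itself has to close.
    """
    for name in succession:
        if name in available:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical succession list for the Crisis Management Team.
    succession = ["CMT lead", "CMT deputy", "Site manager", "Duty officer"]
    print(person_in_charge(succession, available={"Site manager", "Duty officer"}))
```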

Implementation phase

Scripts

• Step-by-step listing of activities to be performed every step of the way
  – In a disaster situation, people do not think rationally
• Scripts can be tested, tuned, and tested again
  – The person who follows a script does not need to be the person who developed the script
• Automate as much as possible
  – One company has 800 automated scripts just for recovering their database!
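
One way to picture an automated script is as an ordered list of steps that runs top to bottom, records what happened, and stops at the first failure so nobody improvises past a broken step. This is only a sketch under that assumption; `run_script` and the step names are hypothetical and do not describe any particular company's tooling.

```python
from typing import Callable, List, Tuple

# A step is a description plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_script(steps: List[Step]) -> List[str]:
    """Run steps in order, log each result, and stop at the first failure."""
    log: List[str] = []
    for number, (description, action) in enumerate(steps, start=1):
        try:
            ok = action()
        except Exception as exc:  # a crashed step counts as a failed step
            log.append(f"{number}. {description}: ERROR ({exc})")
            break
        log.append(f"{number}. {description}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break
    return log


if __name__ == "__main__":
    # Hypothetical steps; real actions would call the actual recovery tooling.
    steps = [
        ("Confirm backup site has power", lambda: True),
        ("Mount replicated volumes", lambda: True),
        ("Start database recovery", lambda: False),
        ("Open application to users", lambda: True),
    ]
    for line in run_script(steps):
        print(line)
```

Because the log is produced the same way on every run, the script can be exercised, tuned, and handed to someone who did not write it.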

Implementation phase

Vendors and resources

• Hot site, warm site, cold site, off-site records storage
• Equipment replacement
• Rent-a-guard
• Salvage experts
• Catering
• Hotel rooms, rental cars
• Local authorities
  – Police, fire, hospitals, hazmat teams

Testing and exercise phase

• Training and awareness
• Exercise program objectives
• Exercise plans, scenarios and exercises
• Evaluation and modification

Testing and exercise phase

Exercise program objectives

• Practice makes perfect
  – Some companies spend hundreds of hours tweaking parts of their plans to decrease recovery time

Every second counts

Testing and exercise phase

Evaluation and modification

• What went wrong and how do we fix it for next time?
• Do not find someone to blame. A fault found now could save your company later
• Were any of our assumptions wrong?
• Do we need to revisit a previous phase?

Maintenance and update phase

• Remember to budget for this phase. An untested, stale plan is worse than no plan at all!
• Review criteria – still current?
• Status, reporting, and audits
• Distribution and security
  – Your plan is a competitive asset

Execution phase

• If an event becomes a disaster
  – Decide
  – Declare
  – Notify
  – Execute

Not just an IT problem

• IT can recover computers and applications, not business processes
• The computers are humming, the applications are loaded… and no one is around to use them
• Like Cheerios are part of a complete breakfast… IT recovery is part of a complete contingency plan

Horror Stories

Horror stories

• Your backup site is in Atlantic City. You declare during the Miss America pageant (Hurricane Andrew)
• Your computer room is in the basement and there’s a fire in the building (Bell Canada)
• Will the generators be safe? Do you have a way to refuel them? (Tropical Storm Allison)

Horror stories

1. You power up the generators and nothing happens
2. You power up the generators and the power surge blows out your systems
3. You power up the generators and realize that your air conditioning isn’t on backup power

Hint: Exercise your plan!

Food for thought: Tapes

• Where is your tape backup hardware?
• Where are tapes stored until they go offsite?
• How quickly do your tapes go offsite?
• Are multiple tape copies sent via different routes?
• Do you do tape retrieval / restore tests?
• For recovery, do you ship tapes in ‘waves’?
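
A restore test is only meaningful if the restored files are compared against something. One possible approach, sketched below, is to record a SHA-256 checksum for each file at backup time and compare after the restore. The manifest format and the names `sha256_of` and `verify_restore` are assumptions for illustration, not part of any backup product.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large restores do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest: Dict[str, str], restored_dir: Path) -> List[str]:
    """Return the names of files that are missing or differ from the manifest.

    `manifest` maps a relative file name to the checksum recorded at backup time.
    """
    problems = []
    for name, expected in manifest.items():
        candidate = restored_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(name)
    return problems
```

An empty result means every file listed in the manifest came back intact; it says nothing about files that were never listed.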

Food for thought: Replicated enterprise storage

• Vendors guarantee disk integrity
  – Backup disk = primary disk at a bit level
• Database integrity is not guaranteed
• Your database software needs to recover the database to a consistent state before you can begin processing on the backup system

Physical disk does not equal logical database

[Diagram: a source system and a target system, each with Disk 1, Disk 2 and an audit log disk. Data blocks D1 and D2 and audit records for transactions T1, T2 and T3 (marked B and C, presumably begin and commit) are shown relative to the disk cache flushes.]

• Database disk cache flushed infrequently for performance
• Audit disk cache flushed at transaction commit for safety
• Some changes are not flushed to disk, but the transaction is committed and the log flushed
• Some changes are on disk, but not committed
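
The two mismatches in the diagram (committed but not on disk, on disk but not committed) are what database recovery has to repair from the audit log before processing can start on the backup system. The sketch below is a much-simplified illustration of that idea, not how any particular database implements it; the record layout and the name `recover` are assumptions.

```python
from typing import Dict, List, Tuple

# Assumed audit record shapes:
#   ("begin", txn) | ("write", txn, block, old_value, new_value) | ("commit", txn)
Record = Tuple


def recover(disk: Dict[str, int], audit_log: List[Record]) -> Dict[str, int]:
    """Bring a bit-for-bit disk copy to a transaction-consistent state."""
    committed = {rec[1] for rec in audit_log if rec[0] == "commit"}
    state = dict(disk)

    # Undo pass (newest first): remove changes that reached disk
    # but whose transaction never committed.
    for kind, txn, *rest in reversed(audit_log):
        if kind == "write" and txn not in committed:
            block, old_value, _new_value = rest
            state[block] = old_value

    # Redo pass (oldest first): reapply committed changes that were
    # still sitting in the database cache when the copy was taken.
    for kind, txn, *rest in audit_log:
        if kind == "write" and txn in committed:
            block, _old_value, new_value = rest
            state[block] = new_value

    return state


if __name__ == "__main__":
    # T1 committed but D1 was never flushed; T3 flushed D2 but never committed.
    log = [
        ("begin", "T1"), ("write", "T1", "D1", 0, 10), ("commit", "T1"),
        ("begin", "T3"), ("write", "T3", "D2", 5, 99),
    ]
    print(recover({"D1": 0, "D2": 99}, log))
```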

Food for thought

• Check your third party site contract
  – How many other companies in the same threat area use the same vendor?
  – How soon do you have to vacate? Where will you go?
  – Have you included workstations and space for them?

Remember that building?

• One year later, the tornado-scarred Bank One tower in Fort Worth, Texas is still closed.

[Photos dated 2000/03/30 and 2001/02/10]

Thank you!