Introduction to Disaster Recovery Course 718


Business Continuity & Disaster Recovery
(14% - approx. 28 questions)
PART 3 LECTURE NOTES
Oyedepo Oyebola
Bsc(Comp Sc),ACA,CISA
08053539765
Recovery Alternative Strategies
• Duplicate information processing facilities (IPFs) - dedicated, self-developed recovery sites
• Hot sites - fully configured and ready to operate within hours
• Warm sites - partially configured, without the main computer
• Cold sites - only basic environmental infrastructure
• Mobile sites - specially designed trailers that provide a ready-conditioned IPF
• Reciprocal agreements - organizations with similar equipment and
  applications promise to provide computer time to each other
  when an emergency arises
Contractual provisions for the use of third-party sites should cover
the following:
• Configurations, Disaster, Speed of availability
• Subscribers per site, Subscribers per area, Preference
• Insurance coverage, Usage period, Communications
• Warranties, Right-to-Audit, Testing & Reliability
QUIZ
1. A disaster recovery plan (DRP) for an organization's financial
system specifies that the recovery point objective (RPO) is no
data loss and the recovery time objective (RTO) is 72 hours.
Which of the following is the MOST cost-effective solution?
A. A hot site that can be operational in eight hours with
asynchronous backup of the transaction logs
B. Distributed database systems in multiple locations updated
asynchronously
C. Synchronous updates of the data and standby active systems
in a hot site
D. Synchronous remote copy of the data in a warm site that can
be operational in 48 hours
The correct answer is:
D. Synchronous remote copy of the data in a warm site that can be operational in 48 hours
Explanation:
The synchronous copy of the storage achieves the RPO objective and a warm site operational in 48 hours
meets the required RTO. Asynchronous updates of the database in distributed locations do not meet the
RPO. Synchronous updates of the data and standby active systems in a hot site meet the RPO and RTO
requirements but are more costly than a warm site solution.
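The reasoning in this explanation can be sketched as a simple filter-then-minimize step: discard every option that misses the RPO or the RTO, then pick the cheapest of what remains. The sketch below is only an illustration. The class and function names are hypothetical, the `relative_cost` values are assumed ordinal ranks (not real prices), and the RTO figures for options B and C are assumptions because the question does not state them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecoveryOption:
    label: str
    zero_data_loss: bool   # True only when data is copied synchronously
    rto_hours: float       # time until the option is operational
    relative_cost: int     # ASSUMED ordinal cost rank, not a real price


def most_cost_effective(options: Sequence[RecoveryOption],
                        require_zero_data_loss: bool,
                        max_rto_hours: float) -> Optional[RecoveryOption]:
    """Return the cheapest option that satisfies both the RPO and the RTO."""
    feasible = [
        o for o in options
        if (o.zero_data_loss or not require_zero_data_loss)
        and o.rto_hours <= max_rto_hours
    ]
    return min(feasible, key=lambda o: o.relative_cost) if feasible else None


OPTIONS = [
    RecoveryOption("A", zero_data_loss=False, rto_hours=8, relative_cost=3),
    RecoveryOption("B", zero_data_loss=False, rto_hours=1, relative_cost=3),  # RTO assumed
    RecoveryOption("C", zero_data_loss=True, rto_hours=1, relative_cost=4),   # RTO assumed
    RecoveryOption("D", zero_data_loss=True, rto_hours=48, relative_cost=2),
]

choice = most_cost_effective(OPTIONS, require_zero_data_loss=True, max_rto_hours=72)
print(choice.label)  # prints "D"
```

Options A and B fail the no-data-loss requirement because they update asynchronously. C and D both qualify, and D wins on cost.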
Procuring Alternative Hardware
A vendor or third party
• Off-the-shelf
• Credit agreement or emergency credit cards
• Network disaster prevention
• Server disaster recovery plans
Once a strategy has been developed for recovering sufficient IT facilities
to support critical business processes, it is essential to have strategies
that keep these business functions operating until all facilities are
restored. These strategies may include:
• Doing nothing until recovery facilities are ready
• Using manual procedures
• Complying only with regulatory and legal requirements
• Focusing on the most important customers, suppliers,
  products, systems, etc.
• Using PC-based systems to capture data for later processing or to
  perform simple local processing

QUIZ
2. An IS auditor discovers that an organization's business continuity
plan provides for an alternate processing site that will
accommodate 50% of the primary processing capability. Based on
this, which of the following actions should the IS auditor take?
A. Do nothing, because generally less than 25% of all processing is
critical to an organization's survival; therefore, the backup
capacity is adequate.
B. Identify applications that could be processed at the alternate site,
and develop manual procedures to back up other processing.
C. Ensure that critical applications have been identified and that the
alternative site could process all such applications.
D. Recommend that the IPF arrange for an alternate processing site
with the capacity to handle at least 75% of normal processing.
Factors to consider in developing a BC/DR plan
• Evacuation procedures
• Pre-disaster readiness for incident response
• Declaration procedures
• Declaration circumstances
• Clear identification of responsibilities and of the person
  responsible for each function in the plan
• Clear identification of contract information
• Step-by-step explanation of each recovery option
• Clear identification of resource requirements
Copies of the plan must be maintained offsite.
Recovery Teams
• Incident response team - receives the incident information
• Emergency action team - first responders
• Damage assessment team - assesses the extent of damage
• Emergency management team - coordinates all other team
  activities and makes key decisions
• Admin support team - provides clerical support
• Security team - continuously monitors system security
• Relocation team - coordinates the move from the hot site to
  a new or restored original location
• Others - training team, offsite storage team, software
  team, application team, emergency operations team, network
  recovery team, communications team, transportation team, user
  hardware team, data preparation & records team, supplies
  team, salvage team, coordination team, recovery test team, legal
  affairs team
Other Issues in Plan Development
• Management and user involvement is vital to the success of the BCP
• User management involvement is essential to the identification of
  critical systems, their associated critical recovery times and the
  specification of needed resources
• The 3 major divisions that require involvement in the formulation
  of the BCP are support services, business operations and information
  processing support
• In developing the plan, it is essential that the entire organization is
  considered, not just the IS processing services
The following items should be included in the plan:
• A list of staff, with contact information, required to maintain critical
  business functions in the short, medium and long term
• The configuration of building facilities, desks, chairs, phones, etc.
  required to maintain critical business functions in the short, medium
  and long term
Components of a BCP
Depending on the size and/or requirements of an
organization, a BCP may consist of more than one
plan document. These may include:
• Business Recovery (or Resumption) Plan (BRP)
• Continuity of Operations Plan (COOP)
• IT Contingency Plan/Continuity of Support Plan
• Crisis Communication Plan
• Incident Response Plan
• Disaster Recovery Plan
• Occupant Emergency Plan
The plans must be consistent with one another.
Purpose & Scope of BCP Components

Plan: Business Continuity Plan (BCP)
Purpose: Provide procedures for sustaining essential business operations while recovering from a significant disruption
Scope: Addresses business processes; IT addressed based only on its support for business processes

Plan: Business Recovery (or Resumption) Plan (BRP)
Purpose: Provide procedures for recovering business operations immediately following a disaster
Scope: Addresses business processes; not IT-focused; IT addressed based only on its support for business processes

Plan: Continuity of Operations Plan (COOP)
Purpose: Provide procedures & capabilities to sustain an organization at an alternate site for up to 30 days
Scope: Addresses the subset of an organization's missions that are deemed most critical; usually written at headquarters level; not IT-focused

Plan: Continuity of Support Plan/IT Contingency Plan
Purpose: Provide procedures & capabilities for recovering a major application or general support system
Scope: Same as IT contingency plan; addresses IT system disruptions; not business process focused

Plan: Crisis Communication Plan
Purpose: Provide procedures for disseminating status reports to personnel and the public
Scope: Addresses communications with personnel and the public; not IT-focused

Plan: Cyber Incident Response Plan
Purpose: Provide strategies to detect, respond to, and limit the consequences of malicious cyber incidents
Scope: Focuses on information security responses to incidents affecting systems and/or networks

Plan: Disaster Recovery Plan
Purpose: Provide detailed procedures to facilitate recovery of capabilities at an alternate site
Scope: Often IT-focused; limited to major disruptions with long-term effects

Plan: Occupant Emergency Plan (OEP)
Purpose: Provide coordinated procedures for minimizing loss of life or injury and protecting property from damage in response to a physical threat
Scope: Focuses on personnel and property particular to the specific facility; not business process or IT system functionality based
Other components of the plan:
• Key decision-making personnel
  The plan should contain a list or "call tree", i.e., a
  notification directory, of the key decision-making IS and
  end-user personnel required to initiate and carry out
  recovery efforts.
  The directory should contain:
  ◦ A prioritized list of contacts (i.e., who gets called first)
  ◦ Primary & emergency phone numbers & addresses for each
    critical person, equipment & software vendors, hot
    site representatives, insurance, legal/regulatory
    agencies, etc.
• Backup of required supplies
  ◦ Detailed, up-to-date hard copy procedures that can
    be followed easily by staff and contract personnel
  ◦ A supply of special forms, e.g., check stock, invoice
    forms & order forms, should be secured at an offsite
    location
  ◦ Specialized programs and equipment should be
    provided at the hot site.
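As an illustration of the notification directory described above, the following minimal sketch stores contacts with a call priority plus primary and emergency numbers, and walks the list in priority order. The class name, field names, roles and phone numbers are all hypothetical placeholders, not part of any standard.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Contact:
    name: str
    role: str
    priority: int          # 1 = called first
    primary_phone: str     # placeholder values only
    emergency_phone: str


def call_order(directory: Iterable[Contact]) -> List[Contact]:
    """Return the contacts sorted so the highest-priority person is first."""
    return sorted(directory, key=lambda c: c.priority)


def next_reachable(directory: Iterable[Contact],
                   unreachable: Iterable[str]) -> Optional[Contact]:
    """Return the first contact in call order who has not been marked unreachable."""
    skipped = set(unreachable)
    for contact in call_order(directory):
        if contact.name not in skipped:
            return contact
    return None


DIRECTORY = [
    Contact("Vendor Desk", "Equipment vendor", 3, "555-0103", "555-0113"),
    Contact("A. Manager", "Emergency management lead", 1, "555-0101", "555-0111"),
    Contact("Hot Site Rep", "Hot site representative", 2, "555-0102", "555-0112"),
]

for contact in call_order(DIRECTORY):
    print(contact.priority, contact.name, contact.primary_phone)
```

Keeping the priority as explicit data, rather than relying on the order in which entries were typed, means the same directory can be printed as a hard copy for offsite storage and still show who is called first.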
Telecommunication Networks Disaster Recovery Methods
• Redundancy - extra capacity, failover devices, multiple
  paths between routers, dynamic routing protocols, etc.
• Alternative routing - done via a different medium, e.g., copper
  cable in place of fiber optics, a dial-up line in place of a leased
  line, a mobile phone in place of a land line, or couriers as an
  alternative to electronic transmission
• Diverse routing - done via a split/duplicate cable facility.
  The medium is the same and can run through the same or different
  conduits.
• Long-haul network diversity - long-distance network
  utilizing T1 circuits (digitized voice transmission lines)
• Last-mile circuit protection - redundant combination of
  local carrier T1s or E1s, microwave and/or coaxial cable
• Voice recovery - redundant cabling and VoIP
Redundant Array of Independent (or Inexpensive) Disks (RAID)
RAID is an umbrella term for data storage schemes that divide
and/or replicate data among multiple hard drives.
RAID can be designed to provide increased data reliability and/or
increased I/O performance.
RAID combines physical hard disks into a single logical unit, using
either special hardware or software.
There are three key concepts in RAID:
◦ Mirroring - the copying of data to more than one disk;
◦ Striping - the splitting of data across more than one disk; and
◦ Error correction - where redundant data is stored to allow problems to be
detected and possibly fixed (known as fault tolerance).
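The three concepts above can be shown on plain byte strings. The sketch below is a toy model only, assuming byte-level round-robin striping and simple XOR parity as the redundant data. The function names are hypothetical, and real RAID implementations operate on disk blocks in a controller or driver rather than in application code.

```python
def mirror(data, copies=2):
    """Mirroring: keep identical copies of the data, one per disk."""
    return [bytes(data) for _ in range(copies)]


def stripe(data, n_disks):
    """Striping: deal the bytes round-robin across n_disks."""
    return [data[i::n_disks] for i in range(n_disks)]


def xor_parity(blocks):
    """Redundant data: byte-wise XOR of equal-length blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            parity[i] ^= value
    return bytes(parity)


def rebuild(surviving_blocks, parity):
    """Recover one lost block by XOR-ing the survivors with the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


data = b"DISASTER"                 # 8 bytes, so 4 disks hold 2 bytes each
stripes = stripe(data, 4)
parity = xor_parity(stripes)

lost = stripes[2]                  # pretend the third disk failed
survivors = stripes[:2] + stripes[3:]
print(rebuild(survivors, parity) == lost)  # prints "True"
```

The rebuild works because XOR-ing a value with itself cancels out: combining the parity block with every surviving stripe leaves only the bytes of the missing stripe.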
RAID LEVELS
Level 0 - Disk striping without fault tolerance. Improved performance with no
redundancy or parity.
Level 1 - Mirroring. Fault tolerant, but slows down writes and cuts available disk
space in half. Performs well in multiuser environments, since reads can be
served from either copy.
Level 2 - Hamming code ECC. Uses a Hamming error-correcting code for
recovering lost data. One recovery disk is required for every four
disks of storage. Hardware-based and resource-intensive, therefore seldom used.
Level 3 - Parallel transfer with byte-level parity. Stripes data
across multiple drives (an enhanced form of Level 0) and uses a dedicated parity
drive for fault tolerance.
Level 4 - Independent data disks with shared parity disk. Same as Level 3 but
stripes data and computes parity at the block level rather than the byte level.
Level 5 - Striped set with distributed parity. Stripes both
data and parity information across multiple drives at the block level,
providing fault tolerance. Differs from Level 4 in that parity
information is distributed across all the disks, and it is faster.
Level 6 - Striped set with dual distributed parity. Similar
to Level 5 but calculates two sets of parity information for each block of
data. High fault tolerance.
Others - Levels 7, 10, 53 and 0+1: costly, high-overhead solutions with
limited scalability.
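The trade-off between capacity and fault tolerance across the common levels can be worked out with simple arithmetic. The sketch below is an illustration under stated assumptions: all disks are the same size, Level 1 is modeled as two-way mirroring (matching the "cuts available disk space in half" description), and the minimum disk counts are the conventional ones. The function name `raid_summary` is hypothetical.

```python
def raid_summary(level, n_disks, disk_tb):
    """Return (usable capacity in TB, guaranteed disk failures survived).

    Assumes equal-size disks; Level 1 is modeled as two-way mirroring.
    """
    if level == 0:
        minimum, usable_disks, failures = 2, n_disks, 0
    elif level == 1:
        if n_disks % 2:
            raise ValueError("two-way mirroring needs an even number of disks")
        minimum, usable_disks, failures = 2, n_disks / 2, 1
    elif level in (3, 4, 5):
        # One disk's worth of parity, dedicated (3, 4) or distributed (5).
        minimum, usable_disks, failures = 3, n_disks - 1, 1
    elif level == 6:
        # Two disks' worth of parity.
        minimum, usable_disks, failures = 4, n_disks - 2, 2
    else:
        raise ValueError(f"level {level} is not modeled in this sketch")
    if n_disks < minimum:
        raise ValueError(f"level {level} needs at least {minimum} disks")
    return usable_disks * disk_tb, failures


for level in (0, 1, 5, 6):
    usable, failures = raid_summary(level, n_disks=4, disk_tb=2.0)
    print(f"Level {level}: {usable} TB usable, survives {failures} disk failure(s)")
```

With four 2 TB disks this gives 8 TB for Level 0, 4 TB for Level 1, 6 TB for Level 5 and 4 TB for Level 6, which shows why Level 5 is often chosen when both capacity and single-disk fault tolerance matter.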