Disaster Recovery Capstone Project
Advisor:
Jim French, Dept of Ecology
Team Members:
Scott Andersen, WSDOT
Gary Duffield, DIS
Doug Selix, OFM
Thelma Smith, WSDOT
Brian Sylvester, DOP
How can the state achieve a coordinated
approach to IT disaster recovery?
How will we recover critical services and
infrastructure knowing that we share services,
platforms, and customers that rely on each other
for data during the recovery?
How do we expose the risks, identify the gaps
and move toward meeting recovery time
objectives?
How do we ensure that the capacity to recover
aligns with the risk tolerance of state leadership?
1.
Establish and empower a central authority for
‘Enterprise’ (Statewide) D/R Planning
2.
Standardize and consolidate IT Infrastructure
wherever possible to ease D/R Planning
3.
Practice D/R Planning at the ‘Enterprise’ (not
agency) level
4.
Mandate D/R planning for all IT systems
5.
Develop and document State guidelines on ‘risk
appetite’
Resilience
and Recoverability
(R/R)
Leadership is about change!
Shared Vision: Changes on the horizon
Standardization & Consolidation
System Level R/R Focus
R/R Designed into All Systems
Risk Tolerance and Oversight
Senior Level Sponsorship
State Agencies’ Partnership
Strategic and Tactical Leadership
Strategic = Resilience
Tactical = Recoverability
Governor
Emergency Management Council
State Agency Liaisons
DIS, OFM, DOP, DOT, etc.
Comprehensive Emergency
Management Plan - CEMP
ISB Standards
State Agencies’ Plans
Existing Catch 22
Change agency-centric approach to statewide R/R
solution
Establish shared vision for funding R/R
Integrate R/R into Spending Plans
Develop policy that cements R/R funding into IT
initiatives
Establish Ownership and Oversight
Align R/R efforts with similar or preexisting efforts
Emergency management groups
Agencies’ leadership teams
Establish new teams or partnerships as needed
Establish policies for:
Compliance
Success Metrics
Change Management
LEADERSHIP!!!
Proactive = Resilience
Reactive = Recovery
Close Gaps and Remove Roadblocks
Leverage Existing or Empower New Program
Hardware and software consolidation and
standardization is becoming the driving force behind
organizations evaluating their Disaster Recovery
plans.
A 2009 survey from Symantec Corporation found
that 64% of organizations are creating or reevaluating their DR plans based on a plan to
consolidate and standardize their infrastructure.
Figure: Hosting Service Matrix
Labels: increase provider management, reduce agency resources;
leverage common infrastructure, consolidate hardware, reduce cost
Markers: Maturity Target, Transition Target
Adopt a cost-effective enterprise High
Availability Architecture solution
(Resilience).
Future investments in Infrastructure and
Applications should include Resilience
and Recoverability.
Planning for Resilience and Recoverability
should be at the Enterprise Level.
Planning for recovery by agency,
technology, or individual application is not
effective for an enterprise class system.
Enterprise Level Planning is complex, and must be
done for Essential Systems.
Essential Systems support Essential Agency Functions
as defined in agency COOP plans
Must consider core agency systems - run by agency or
service provider
Must consider dependencies such as infrastructure and
interface services
Must consider dependent trading partner systems
Must consider enterprise data at recovery point
Must include procedures for assuring data integrity at
recovery point
OFM Example - The State Payment Process
Payment Process based upon AFRS and all
systems that it connects to
Historical DR Plan: “DIS will recover the
mainframe and all will be good”
Look at interfaces to partner agencies
Look at known single points of failure
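One way to picture why "recover the mainframe" is not enough: a minimal sketch, assuming a hand-built dependency map, that derives a recovery order in which every system comes back only after the things it relies on. AFRS is taken from the slide; every other name and every dependency link is a hypothetical placeholder, not the actual state architecture.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each system lists what must be recovered first.
# Only "AFRS" comes from the slides; the other names and links are illustrative.
dependencies = {
    "network": set(),
    "mainframe": {"network"},
    "AFRS": {"mainframe"},
    "partner_interface": {"network", "AFRS"},
    "payment_process": {"AFRS", "partner_interface"},
}

def recovery_order(deps):
    """Return systems ordered so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())

order = recovery_order(dependencies)
print(order)
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a useful planning finding: two systems that each need the other to be up first.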
Enterprise Class Planning requires
someone to focus on getting it done for
essential systems!
A single organization must facilitate
Enterprise planning
Enterprise system owner and Stakeholders
must fully participate in development and
testing of R/R Plans
Enterprise Planning is HARD!
Enterprise Class Systems are COMPLEX!
Someone Needs to GET ‘er DONE!
Many, if not most, recent IT systems were developed
without Disaster Recovery – Why?
Elimination viewed as a ‘Cost Reduction’ strategy.
This is a ‘false economy’ – a calculated risk
Real consequences to State citizens:
Missing vital systems after a disaster
Or
Spending too much to ensure their availability
Creation of WSRRO
Mandate all new IT systems include R/R
Review and approve
Criteria
Agency impact analysis
Integration impact analysis
Validate appropriateness of plan
Types of ‘valid’ plans:
‘Resilience’
‘Warm site’
‘Cold site’
Data protection only
No recovery plan
Figure: plan types compared by Time, Cost, and Assurance
Resilience
Recovery (Warm)
Recovery (Cold)
Data Protection Only
Mandate R/R planning for all IT
systems
Scope for critical functions only
Ensure ‘Enterprise’ context
If your house were on fire, what would you save?
We all live in the same house, we need to decide
what is going to be saved! And how much!
We won’t be able to save it all.
Be careful what you choose!
What is important to the WA State Enterprise?
Public Safety (EMD/WSP/DOC/Roads/others?)
Citizen Systems – Licensing, Social Systems,
others?
Financial Systems - How we dispense and
receive funds.
H/R Systems, Data Centers?
State Enterprise Approach!
How much and what loss is acceptable?
Data? E-mail? File Systems?
Hardware/infrastructure
Networks, communications?
Applications used by Citizens?
Applications used by Agencies?
What does this look like?
How do we determine what and how much?
Identify and Develop a Risk Matrix!
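A risk matrix can be as simple as a likelihood-by-impact score per system category. The sketch below is one possible shape, assuming 1 to 5 scales; the thresholds and the ratings are invented for illustration and are not state policy or real assessments.

```python
# Hypothetical 1-5 scales; the thresholds are placeholders, not state policy.
def risk_score(likelihood, impact):
    """Multiply likelihood by impact after checking both are on the 1-5 scale."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must be between 1 and 5")
    return likelihood * impact

def risk_level(score):
    """Map a numeric score to a coarse level using assumed cut-offs."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"

# Illustrative (likelihood, impact) ratings only; the numbers are invented.
systems = {
    "public_safety": (3, 5),
    "financial": (2, 4),
    "email": (2, 2),
}

matrix = {name: risk_level(risk_score(l, i)) for name, (l, i) in systems.items()}
for name, level in sorted(matrix.items()):
    print(f"{name}: {level}")
```

The value of the exercise is less in the arithmetic than in forcing leadership to agree on the ratings and the cut-offs.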
Now that we know what, how do we really know it will
work?
What are our expectations for Disaster Recovery?
How do we ensure that RECOVERY WILL work?
LEADERSHIP!
Identify and apply standardized comprehensive testing
(Know what and how much to test, and test it the same way
across the board!)
Perform Resilience and Recoverability Plans
Review Results and apply Process Improvement!
(Do it better next time!)
Target Enterprise (State Level)
Programs/Systems NOT silo agencies
Identify how much of it we really need!
RISK MATRIX!
Standardized Comprehensive Testing applied
Regularly perform Resilience and Recoverability
Testing
Process Improvement
1.
Establish and empower a central authority for
‘Enterprise’ (Statewide) R/R Planning
2.
Standardize and consolidate IT Infrastructure
wherever possible to ease R/R Planning
3.
Practice R/R Planning at the ‘Enterprise’ (not
agency) level
4.
Mandate R/R planning for all new IT systems
5.
Develop and document State guidelines on ‘risk
appetite’
Thank you!
Scott Andersen, WSDOT
Gary Duffield, DIS
Doug Selix, OFM
Thelma Smith, WSDOT
Brian Sylvester, DOP