Disaster Recovery at the University of Alberta


Disaster Recovery at the
University of Alberta
Rob Lake
(Presenter and Co-producer)
Information Technology Planning and Forecasting Officer
Office of the Vice Provost (Information Technology)
University of Alberta
www.vpit.ualberta.ca
Co-produced with (and thanks to):
Marika Bourque
Associate CITO and Executive Director of AICT
University of Alberta
Kevin Moodie
Director of AICT
University of Alberta
Brian Acheson
Director of AICT
University of Alberta
2007 EDUCAUSE Top Ten IT Issues
1. Funding IT
2. Security
3. Administrative/ERP/Information Systems
4. Identity/Access Management
5. Disaster Recovery/Business Continuity
6. Faculty Development, Support and Training
7. Infrastructure
8. Strategic Planning
9. Course/Learning Management Systems
10. Governance, Organization and Leadership for IT
Source: EDUCAUSE Review, May/June 2007
2006 ECAR Survey Results
“If central IT systems and services were not operational at my
institution, business units could carry out essential operations.”
2.7% - Strongly Agree
18.6% - Agree
10.0% - Neutral
35.0% - Disagree
31.6% - Strongly Disagree
2.1% - Don't Know
Source: Ron Yanosky, ECAR Symposium, 30 June 2006
2006 ECAR Survey Results
“My institution is prepared to restore centrally
controlled systems in the event of a disruption.”
6.5% - Strongly Agree
44.6% - Agree
22.3% - Neutral
20.7% - Disagree
5.0% - Strongly Disagree
0.9% - Don't Know
Source: Ron Yanosky, ECAR Symposium, 30 June 2006
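The two survey slides above are easier to compare once the agree and disagree shares are combined. A minimal sketch, with the percentages transcribed from the slides; the dictionary keys and the helper name `net_shares` are illustrative and do not come from ECAR:

```python
# Response shares (percent) transcribed from the two 2006 ECAR survey slides.
# The short statement labels below are paraphrases used only as dictionary keys.
SURVEYS = {
    "business units could carry out essential operations": {
        "Strongly Agree": 2.7, "Agree": 18.6, "Neutral": 10.0,
        "Disagree": 35.0, "Strongly Disagree": 31.6, "Don't Know": 2.1,
    },
    "institution is prepared to restore central systems": {
        "Strongly Agree": 6.5, "Agree": 44.6, "Neutral": 22.3,
        "Disagree": 20.7, "Strongly Disagree": 5.0, "Don't Know": 0.9,
    },
}


def net_shares(responses):
    """Return (agree, disagree) totals in percent, rounded to one decimal."""
    agree = responses["Strongly Agree"] + responses["Agree"]
    disagree = responses["Strongly Disagree"] + responses["Disagree"]
    return round(agree, 1), round(disagree, 1)


for statement, responses in SURVEYS.items():
    agree, disagree = net_shares(responses)
    print(f"{statement}: {agree}% agree, {disagree}% disagree")
```

Read this way, about two thirds of respondents doubted that business units could operate without central IT, while roughly half considered their institution prepared to restore central systems.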
ECAR Most Critical Services
• Campus Internet connection
• Institutional Web site
• Campus network
• E-mail
• Voice telephony
• Course management system
• Recovery time objective (RTO): 48 hours or less
Source: ECAR Research Bulletin Volume 2007 Issue 4
Disaster Recovery at the UofA
• Four components to disaster recovery plan:
  - Academic information systems
  - Administrative information systems
  - Off-site data recovery centre
  - Emergency notification
Academic Information Systems
• AICT Disaster Recovery Overview plan
completed in March 2007
• Core academic services:
  - Voice and data network connectivity
  - Web
  - Email / Webmail
  - Telephony
  - E-Learning (WebCT)
  - DNS
  - Identity Management
  - AFS
Academic Information Systems
• Recovery Time Objective: 48 hours
• Current plan requires a hot site to meet RTO for
core academic services
• Will investigate warm site possibilities and
virtualization opportunities in the future
Academic Information Systems
• Overall requirements (worst case):
  - $6.0 million for basic infrastructure
  - $5.6 million to meet specified RTO of critical services
  - $9.4 million for restoration of secondary services
  - Total cost: $21.0 million
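The worst-case figures on this slide can be checked with a few lines of arithmetic. This is a sketch only: the amounts are the ones stated on the slide, and the variable names are illustrative.

```python
from decimal import Decimal

# Worst-case cost components from the slide, in millions of dollars.
COSTS_MILLIONS = {
    "basic infrastructure": Decimal("6.0"),
    "critical services within the RTO": Decimal("5.6"),
    "secondary services": Decimal("9.4"),
}

# Decimal avoids binary floating-point noise when adding currency amounts.
total = sum(COSTS_MILLIONS.values(), Decimal("0"))

for item, cost in COSTS_MILLIONS.items():
    share = cost / total * 100
    print(f"{item}: ${cost} million ({share:.1f}% of total)")
print(f"total: ${total} million")
```

The three components add up to the stated $21.0 million, with restoration of secondary services as the largest single item.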
Academic Information Systems
• Plan considers two scenarios:
  - Restoration with a secondary hot site
  - Restoration without a secondary hot site
Academic Information Systems
• Restoration with a secondary hot site:
  - Basic infrastructure already in place
  - Fully functioning equipment for the core academic services already in place
  - Installation of secondary services
  - 48 hour RTO for the core services
  - 3 month minimal restoration timeframe for the secondary services
Academic Information Systems
• Restoration without a secondary hot site:
  - Requires selection of a hot site
  - Installation of basic infrastructure
  - Installation of core services
  - Installation of secondary services
  - 3 to 6 month downtime for core services
  - Up to 9 month downtime for secondary services
Administrative Information Systems
• Outsourced to IBM Global Services since 2000
• Relocated to Markham, Ontario in 2005
• Warm site options in Montreal and Edmonton
• Deferred until outsourcing contract renewal in 2010
Regional Data Centre
• Toma and Bouma Management Consultants and
Stantec engaged in April 2006 to develop a
Business Plan for a new Disaster Recovery
Centre (DRC)
• Preliminary Business Case completed in late
2006
• Approved by Vice-Presidents in April 2007
Data Centre Standards
• Defined by the Telecommunications
Infrastructure Standard for Data Centers
(TIA-942)
• Classifies data centers into four tiers
• Each successive tier offers a higher degree of
sophistication and reliability
Tier 1
• Basic: 99.671% availability
• Annual downtime of 28.8 hours
• Susceptible to disruptions from both planned
and unplanned activity
• Single path for power and cooling distribution,
no redundant components (N)
• May or may not have a raised floor, UPS or
generator
• 3 months to implement
Tier 2
• Redundant Components: 99.741% availability
• Annual downtime of 22.0 hours
• Less susceptible to disruption from both planned
and unplanned activity
• Single path for power and cooling distribution,
includes redundant components (N+1)
• Includes raised floor, UPS and generator
• 3 to 6 months to implement
Tier 3
• Concurrently Maintainable: 99.982% availability
• Annual downtime of 1.6 hours
• Enables planned activity without disrupting
computer hardware operation, but unplanned
events will still cause disruption
• Multiple power and cooling distribution paths but
with only one path active, includes redundant
components (N+1)
• Includes raised floor, UPS and generator
• 15 to 20 months to implement
Tier 4
• Fault Tolerant: 99.995% availability
• Annual downtime of 0.4 hours
• Planned activity does not disrupt critical load and
data center can sustain at least one worst-case
unplanned event with no critical load impact
• Multiple active power and cooling distribution
paths with redundant components
• 15 to 20 months to implement
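The annual downtime quoted for each tier follows from its availability percentage. A minimal sketch of the conversion, assuming a non-leap year of 8,760 hours; the function name `annual_downtime_hours` is illustrative and not part of TIA-942:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year

# Availability percentages as quoted on the tier slides.
TIER_AVAILABILITY = {1: 99.671, 2: 99.741, 3: 99.982, 4: 99.995}


def annual_downtime_hours(availability_percent):
    """Convert an availability percentage into expected hours of downtime per year."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


for tier, availability in TIER_AVAILABILITY.items():
    hours = annual_downtime_hours(availability)
    print(f"Tier {tier}: {availability}% available -> {hours:.1f} hours/year")
```

Under this assumption the Tier 1, 3 and 4 results round to the 28.8, 1.6 and 0.4 hours shown on the slides. The Tier 2 result comes out near 22.7 hours, slightly above the 22.0 hours quoted with the standard figures.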
Regional Data Centre
• Data Centre Requirements:
  - Tier 3
  - 18,000 sq. ft.
    • 6,000 sq. ft. for servers and racks
    • 3,000 sq. ft. for future growth
    • 9,000 sq. ft. for support
  - Minimum 5 km from primary computing centre
Regional Data Centre
• Options:
1. Exchange computing centre space with other institutions
2. Lease space from service providers
3. Build new DRC alone
4. Build new DRC with public and / or private partners
Option 1
1. Exchange computing centre space
• Minimal exchange with U of Calgary
• No space in either computing centre
• Reliance on external staff
• Would require new capital investment
Option 2
2. Lease space from service provider
• Four vendors surveyed
• Lack of capacity in Alberta at this time, but that is
changing
• Vendors would consider building a facility in the
Edmonton area if they could find an “anchor” tenant
• Costs unclear, but would be about $3.5 million per
year
Option 3
3. Build new DRC alone
• Requires provincial funding assistance
• Funding unlikely for a facility serving only the
University of Alberta
Option 4
4. Build new DRC with public and / or private
partners (P3 arrangement)
• Northern Alberta post-secondary institutions
• Government of Alberta
• City of Edmonton
• Capital Health
• TELUS / Epcor / etc.
Option 4
• 16 Alberta post-secondary institutions surveyed:
  - 12 responded
  - 5 have no plan
  - 6 have a plan in progress
  - 1 plan completed
• 8 interested in a regional solution
• 4 “somewhat interested”
Regional Data Centre
• 30,000 sq. ft. facility required
• Capital costs range from $12 million to $36
million (average: $22 million)
• Operating costs range from $1 million to $4
million per year (average: $3.5 million)
• Better chance to be funded by provincial
government
• Governance model required
• Rural or urban, but travel time important
Regional Data Centre
• Working group established in August 2008
between Government of Alberta, City of
Edmonton, Capital Health and the University of
Alberta
• Four meetings held
• Government of Alberta leading the initiative
• Consultants to finish long term strategy by end of
February
• 50,000 sq. ft. Tier 3 facility for 20 year period
Regional Data Centre
• Four possible locations considered
• Location needs to satisfy Provincial Auditors
• May still involve a P3 model
• Jubilee Auditorium model
Risk Mitigation
• 3000 square foot server room to open in
Enterprise Square in March 2008
• Lights-out facility
• Limited hot site capability – storage of offsite
backup tapes
• Intended for building tenants – one room per
building
• Green computing
• Virtualization
• IT Principles of Operation
Emergency Communication System
• To be implemented by the start of the 2008/09
academic year
• Emergency Communication Work Group
established in November 2007
• Work in progress on an Emergency
Communication Plan
• RFP for an alert system to be released by the
end of February 2008
Getting the Message Out
• Home page announcement
• Email
• Telephony (cell and VoIP-enabled phones)
• Facebook
• Campus and local radio / media
• Sirens
• Flashing lights
Notification Software Criteria
• Flexible
• Easy-to-use
• Continuously available (24x7x365)
• Accommodate two-way communications
• Accessible from multiple locations
• Handle high volumes of calls or messages
within a reasonable timeframe
• Support an educational institution environment
Getting Started…
• This affects everyone – need buy-in from all
constituents on campus
• Roll-out will include a sign-up campaign and a
public awareness campaign
• Many roll-out strategies will be employed
• Several emergency exercises have recently
been held
Summary
• Still an outstanding IT liability
• No inexpensive solution available
• Lack of capacity among service providers
• Partnership for a new regional facility the best
option at this time
• Some mitigation with the Enterprise Square
server room
• Currently exploring many partnership options