I Never Backup Production So Why Would You?
Why Backup Your Production Database? I Never Do.
Adam Backman
Partner, White Star Software
Backups Are for Sissies
No reason to back up – Our stuff never fails
Just takes up resources
We already have redundancy
I hate changing the tapes
I’m tired
I’m hungry
I don’t feel good
Portions of an OpenEdge Database
Database table of contents (.db)
The data files (.d*)
Before image journals (.b*)
After image journals (.a*)
Modified buffers in memory
Other Important Stuff
Application
External data (GIS, photos, …)
User files
External systems (EDI, Data warehouse, …)
Reliability is Important
Loss of data is expensive
Many businesses now lack a paper trail
Redundancy does not equal reliability
− Rogue program
− 2 copies of bad data
What is High Availability?
Classic definition equals 24x7 operation
− Examples: manufacturing/e-commerce/follow the sun
− Little or no downtime
− Maintenance is done in very specific windows
More common definition
− Traditional business hours: 8a-6p, or 9-5 across 3 time zones in a single country
− Operational hours are critical
− Maintenance windows on a regular basis
Unconventional definition
− Is performance good enough to run the business
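As a rough illustration of how the definitions above change the arithmetic, the sketch below converts an availability target into allowed downtime per year, first for a 24x7 operation and then for a business-hours operation. The function name, the 99.9% target, the 10-hour day and the 260 working days are assumptions chosen for the example, not figures from the talk.

```python
def allowed_downtime_hours(availability_pct, hours_per_day=24.0, days_per_year=365):
    """Hours of downtime per year permitted by an availability target.

    Only the hours the business actually operates are counted, so the
    same percentage means less allowed downtime for an 8a-6p shop.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    operational_hours = hours_per_day * days_per_year
    return operational_hours * (1.0 - availability_pct / 100.0)


# Classic 24x7 definition.
classic = allowed_downtime_hours(99.9)

# 8a-6p business; 260 working days per year is an assumption.
business = allowed_downtime_hours(99.9, hours_per_day=10, days_per_year=260)

print(f"24x7: {classic:.2f} h/yr, business hours: {business:.2f} h/yr")
```

Measuring against operational hours only is a design choice: maintenance done outside those hours does not count against the target.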
Backup Process’s Impact on Production
Backing up production
− Pause during backup of before image journal
− Uses I/O capacity of production
− Impacts the effectiveness of the buffer pool
Split mirror backup
− Use of quiet point keeps pause to a minimum
− Pause is non-zero
After Image file backup
− Still needs a backup to begin the process
− Very little impact for backup process
− Long recovery time
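The split mirror point above (the quiet point keeps the pause short, but never zero) can be sketched as a small wrapper. This is a minimal sketch, not the vendor's procedure: `enable_quiet_point`, `split_mirror` and `disable_quiet_point` are hypothetical hooks supplied by the caller. In OpenEdge the quiet point is normally set with the PROQUIET utility; the exact command line is deliberately left out here, so check the documentation for your release.

```python
import time


def split_mirror_backup(enable_quiet_point, split_mirror, disable_quiet_point,
                        clock=time.monotonic):
    """Split the mirror inside a quiet point; return the pause in seconds.

    All three callables are hypothetical hooks. Only the mirror split
    happens inside the quiet point; copying the split mirror to tape
    can happen afterwards without holding up production.
    """
    start = clock()
    enable_quiet_point()
    try:
        split_mirror()
    finally:
        # Always release the quiet point, even if the split fails,
        # so production is never left paused.
        disable_quiet_point()
    return clock() - start


if __name__ == "__main__":
    steps = []
    pause = split_mirror_backup(
        lambda: steps.append("quiet point enabled"),
        lambda: steps.append("mirror split"),
        lambda: steps.append("quiet point disabled"),
    )
    print(steps, f"pause={pause:.6f}s")
```

Recording the measured pause for each run gives you a number to compare against your acceptable-downtime goal.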
Cover all sides
Everyone should be running after image journaling
Need removable backup periodically
− Wide scale events (fire, flood, …)
− To recover from after image journals
Replication is becoming the new default
− OE Replication
− Log-based replication
− Hardware replication
OpenEdge Replication
OpenEdge Replication is the only replication method
supported by Progress
OpenEdge Replication is the only method that
allows you to use the target database(s) for
reporting
OpenEdge Replication requires that you have
after image journaling enabled
Do not attempt to implement OpenEdge
Replication until after you have a good AI
management plan implemented
OpenEdge Replication
[Diagram: Production (source) side: Source DB, Shared Memory, Replication Server. Reporting (target) side: Replication Agent, Shared Memory, Target DB.]
Log-Based Replication
Log-based replication has been used for years
because OE Replication is a fairly new product
Log-based replication provides a vehicle for
replication without the licensing costs of OE
Replication
Not real-time
Code for this type of replication must be
maintained by the user and there is no official
support from the vendor
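Because the code for log-based replication is yours to maintain, here is a minimal sketch of its core bookkeeping: work out which archived after image files still have to be applied to the target, in order, and refuse to continue if one is missing. The file naming convention (`<db>.<sequence>.ai`), the function names and the `apply_one` hook are all hypothetical; the real apply step would wrap the roll forward utility on the target machine.

```python
import re
from pathlib import Path

# Hypothetical naming convention: <db>.<sequence>.ai, e.g. sports.000123.ai
_SEQ = re.compile(r"\.(\d+)\.ai$")


def pending_ai_files(archive_dir, last_applied_seq):
    """Return archived AI files with sequence > last_applied_seq, in order.

    Raises RuntimeError on a gap, because after image files must be
    applied in sequence with none skipped.
    """
    found = {}
    for path in Path(archive_dir).iterdir():
        match = _SEQ.search(path.name)
        if match:
            found[int(match.group(1))] = path
    pending = sorted(seq for seq in found if seq > last_applied_seq)
    expected = last_applied_seq + 1
    for seq in pending:
        if seq != expected:
            raise RuntimeError(f"AI file with sequence {expected} is missing")
        expected += 1
    return [found[seq] for seq in pending]


def apply_pending(archive_dir, last_applied_seq, apply_one):
    """Apply each pending file via apply_one(path); return the new last sequence."""
    for path in pending_ai_files(archive_dir, last_applied_seq):
        apply_one(path)  # hypothetical hook around the roll forward step
        last_applied_seq += 1
    return last_applied_seq
```

Running this on a schedule is what makes the method "not real-time": the target lags by at least one after image file.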
Hardware-Based Replication
Hardware-based replication is a function of the
hardware vendors and thus supported directly by
them
This method is NOT supported by Progress
ALL write operations must be guaranteed across
the source and target disk systems
Archiving
Who does your archiving (Iron Mountain, third party, someone’s house, …)
What do you keep
− Two weeks of dailies
− 5 weeks of weeklies
− 1 year of monthlies
How to label your backups
− Who did the backup
− Command to restore
− Date and Time
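The retention schedule above (two weeks of dailies, 5 weeks of weeklies, 1 year of monthlies) can be expressed as a small selection function. This is a sketch under stated assumptions: a "weekly" is taken to be the backup made on Sunday, and a "monthly" is the first backup of each calendar month. Neither convention comes from the talk, and the function name is hypothetical.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, weekly_weekday=6):
    """Return the set of backup dates to retain.

    Rules: every backup from the last 14 days, weekly backups from the
    last 5 weeks, monthly backups from the last year.
    Assumptions: weekly = taken on weekly_weekday (6 = Sunday),
    monthly = first backup of its calendar month.
    """
    first_of_month = {}
    for d in sorted(backup_dates):
        first_of_month.setdefault((d.year, d.month), d)
    monthlies = set(first_of_month.values())

    keep = set()
    for d in backup_dates:
        age = (today - d).days
        if age < 0:
            continue  # dated in the future; ignore
        if age < 14:
            keep.add(d)
        elif d.weekday() == weekly_weekday and age < 5 * 7:
            keep.add(d)
        elif d in monthlies and age <= 365:
            keep.add(d)
    return keep


if __name__ == "__main__":
    today = date(2024, 6, 30)
    history = [today - timedelta(days=n) for n in range(400)]
    kept = backups_to_keep(history, today)
    print(len(kept), "of", len(history), "backups retained")
```

Anything not in the returned set is a candidate for tape reuse; whatever is kept should still carry the label details listed above.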
Archiving (continued)
Data Archiving
− Archive/Delete?
− Archive/Save historical
− Archive/Save aggregates
After Image file archiving
− At least 2 backups worth
− I recommend a week or more if possible
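The "at least 2 backups worth" rule for after image archives can be sketched as a purge filter: only files archived before the second most recent backup are candidates for removal. The function and parameter names are hypothetical, and the sketch assumes every timestamp in `backup_times` belongs to a backup you have verified.

```python
from datetime import datetime


def purgeable_ai_archives(ai_archive_times, backup_times, backups_to_cover=2):
    """Return AI archive timestamps that are safe to purge.

    Everything archived since the Nth most recent backup is retained,
    so any of the last N backups can still be rolled forward.
    """
    if backups_to_cover < 1:
        raise ValueError("backups_to_cover must be at least 1")
    if len(backup_times) < backups_to_cover:
        return []  # not enough backups yet: keep every AI archive
    cutoff = sorted(backup_times)[-backups_to_cover]
    return sorted(t for t in ai_archive_times if t < cutoff)


if __name__ == "__main__":
    backups = [datetime(2024, 6, d) for d in (1, 8, 15)]
    ai_files = [datetime(2024, 6, d, 12) for d in range(1, 16)]
    for t in purgeable_ai_archives(ai_files, backups):
        print("can purge AI archive from", t.isoformat())
```

Raising `backups_to_cover` is one way to move toward the "week or more" recommendation.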
Building a Good Recovery Strategy
Know your business
− Components of business: how people do business with you
− Components of systems: tools (applications and physical)
Know your risks (fire, flood, hurricane, …)
Be inclusive
− Technical people (network, phones, facilities, …)
− Business people (people who own the data)
Build an execution plan with contingencies
Creating a plan
Goals (Event-based goals)
− If we lose a disk (DB gone)
− If we have a fire (Machine Gone)
− If we have a natural disaster (Facility Gone)
Hardware
Software
Data
Other stuff
Creating a plan - Goals
Acceptable downtime (Generally cost based)
Everyone wants zero but it is generally cost prohibitive
Planned outages
− Hardware install and maintenance
− Software upgrade
− O/S upgrade or patch
Notifications (Both before and during outage)
− Who
− When
− What do they do?
Creating a Plan – Other Stuff
What makes your business run?
− Phones
− Faxes
− Business to Business (EDI, XML Feed, …)
Can people work from home?
Do you have/need another location?
Contact lists in case of major catastrophe
− Kept up-to-date
− Kept online and printed in an accessible location
How about if I am a SaaS user?
Who is your provider
Verify their recovery plan
Run a dry run of at least one recovery scenario
Have specific service level agreements
− Time to recover
− Maximum loss of data
− Penalties for missing times
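To make the service level items above testable during a dry run, the sketch below compares measured results against the agreed limits for time to recover and maximum loss of data. The class, the field names and the sample limits are assumptions for illustration; real values come from your contract.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoverySla:
    """Agreed limits (hypothetical field names)."""
    max_time_to_recover: timedelta
    max_data_loss: timedelta


def sla_violations(sla, measured_recovery, measured_data_loss):
    """Return a list of human-readable violations; empty means the test passed."""
    problems = []
    if measured_recovery > sla.max_time_to_recover:
        problems.append(
            f"time to recover {measured_recovery} exceeds {sla.max_time_to_recover}"
        )
    if measured_data_loss > sla.max_data_loss:
        problems.append(
            f"data loss {measured_data_loss} exceeds {sla.max_data_loss}"
        )
    return problems


if __name__ == "__main__":
    # Sample limits are invented for the example.
    sla = RecoverySla(timedelta(hours=4), timedelta(minutes=15))
    for line in sla_violations(sla, timedelta(hours=5), timedelta(minutes=10)):
        print("SLA missed:", line)
```

A non-empty result is the evidence you would take to the provider when discussing penalties.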
How about if I am a SaaS provider?
Build regular recovery plan
Unique concerns
− Security
− Compliance (HIPAA, SOX, …)
Build achievable SLAs for your users
Implementing Your Plan
First implementation should be a totally manual
process to ensure the steps work and allow for
documentation
Document the process as you go
− Who are you logged in as?
− Exactly what you typed
− Where you were (console, remote, …)
− Can things be done in parallel or sequentially
− Where are the logs and what to look for in the logs
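One way to capture the details listed above while you work through the manual run is a tiny step recorder. This is a sketch with hypothetical names; it records who you are logged in as, the exact command typed, where you were, and which logs to check. It does not run the command itself.

```python
import getpass
import socket
from datetime import datetime, timezone


def record_step(command, location, log_files=(), user=None, host=None, when=None):
    """Return one documentation entry for a recovery step.

    user/host/when default to the current login, machine and UTC time;
    pass them explicitly when writing up a step after the fact.
    """
    return {
        "when": (when or datetime.now(timezone.utc)).isoformat(),
        "user": user if user is not None else getpass.getuser(),
        "host": host if host is not None else socket.gethostname(),
        "location": location,          # console, remote, ...
        "command": command,            # exactly what was typed
        "log_files": list(log_files),  # where to look afterwards
    }


if __name__ == "__main__":
    # The command string and log path below are placeholders.
    entry = record_step(
        "example-restore-command --target /restore",
        location="console",
        log_files=["/restore/example.log"],
    )
    for key, value in entry.items():
        print(f"{key}: {value}")
```

Appending each entry to a file in order also answers the parallel-or-sequential question, because the timestamps show what actually overlapped.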
Documentation
All recovery documentation should be VERY
specific
Create documents for normal maintenance
− Backups
− Database growth
− Modification of OS, Application, printers, …
Create scenario based recovery plans
− Lose a disk (or disk pair)
− Fire
− Flood
Testing Your Plan
Who does the test?
− Not the person who wrote it
− The backup person for the implementation
− Someone who is “always” there regardless of technical
ability
How often to test?
− Material data change (10% increase is a good target)
− Any change in database configuration
Do you have a second site or redundant hardware?
Do you have enough disk capacity (space and
throughput)
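The two retest triggers named above (a material data change, with 10% growth as the target, and any change in database configuration) reduce to a simple check. The function name and the use of database size as the measure of "data change" are assumptions for this sketch.

```python
def retest_needed(size_at_last_test, current_size, config_changed, threshold=0.10):
    """Return True when the recovery plan should be tested again.

    Triggers: database growth of at least `threshold` (10% by default)
    since the last test, or any database configuration change.
    Sizes can be in any unit as long as both use the same one.
    """
    if size_at_last_test <= 0:
        raise ValueError("size_at_last_test must be positive")
    growth = (current_size - size_at_last_test) / size_at_last_test
    return config_changed or growth >= threshold


if __name__ == "__main__":
    # Sizes in GB; the numbers are invented for the example.
    print(retest_needed(500, 560, config_changed=False))
```

Whatever this check says, the periodic test still applies: these triggers add tests, they do not replace the regular one.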
How to test your plan
Fail over to your backup system
Fail back to your primary system
Contingency planning for personnel, physical plant
and equipment (Lead time for resources)
Summary: Recovery Planning
Get over it. You still need to back up.
Back up your backup, not production, if possible
Be inclusive when building your team
Always back up what you have now, however little,
before starting to recover
Create and maintain a comprehensive plan
− Include everything needed to use the application:
Hardware, applications, and data
Create and maintain physical and online contact
lists
Test your plan periodically (At least annually)
Still have questions?
Please feel free to contact me directly.
Adam Backman
White Star Software
(603)897-1010
Thank you for your time