Title of Presentation - EMEA PUG Challenge Conference


Creating a complete recovery solution

Adam Backman

Partner, White Star Software

Why back up?

No reason to back up:
- Our stuff never fails
- Just takes up resources
- We already have redundancy
- I hate changing the tapes
- I'm tired
- I'm hungry
- I don't feel good
- Are we there yet?

Cost of Lost Data

- Loss of data is expensive
  - Cost to rekey
  - Cost of lost business
- Can you recover the data?
  - Not everyone has a paper trail
- Focus on recovery

Building a Good Recovery Strategy

- Know your business
  - Components of the business: how people do business with you
  - Components of systems: tools (applications and physical)
- Know your risks (fire, flood, hurricane, …)
- Be inclusive
  - Technical people (network, phones, facilities, …)
  - Business people (people who own the application functionality)
- Build an execution plan with contingencies

Why be inclusive?

- Business people know what they need
  - What they use in the application
  - What they need to do their job (phone, fax, internet, …)
- Stops the finger pointing when a problem occurs
- Business people generally control the budget
- Need decision makers
  - Authorize spending
  - Make data loss vs. recovery time decisions

Focus on Recovery

- Loss of data: rogue program, hacked, poor testing
- Loss of disk: disk failure
- Loss of machine: machine failure
- Loss of machine room: fire, flood, …
- Loss of building: natural disaster

Creating a plan

- Goals (event-based goals)
  - Data corruption (Data)
  - If we lose a disk (DB)
  - If we have a fire (Machine)
  - If we have a natural disaster (Facility)
- Hardware
- Software
- Other stuff

Main Goal

The main goal of any recovery plan is to minimize or eliminate the impact on the customer.

Creating a plan - Goals

- Acceptable downtime (generally cost based)
  - Everyone wants zero, but it is generally cost prohibitive
- Planned outages
  - Hardware install and maintenance
  - Software upgrade
  - O/S upgrade or patch
- Notifications (both before and during an outage)
  - Who
  - When
  - What do they do?

Creating a Plan – Other Considerations

- What makes your business run?
  - Phones
  - Faxes
  - Business to business (EDI, XML feed, …)
- Can people work from home?
- Do you have/need another location?
- Contact lists in case of a major catastrophe
  - Kept up to date
  - Kept online and printed in an accessible location

Cover All Sides

- Everyone should be running after-image journaling
- Need a removable backup periodically (daily)
  - Wide-scale events (fire, flood, …)
  - To recover from after-image journals
- Replication is the new normal
  - OE Replication
  - Log-based replication
  - Hardware replication

Portions of an OpenEdge Database

- Database table of contents (.db)
- Data files (.d*)
- Before-image journals (.b*)
- After-image journals (.a*)
- Modified buffers in memory
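The file-based components above can be turned into a quick checklist when confirming that a backup or copy covers every part of the database. The Python sketch below groups file names by component; the patterns (a numeric suffix after .d, .b and .a) and the sample names are assumptions for illustration, not a complete description of OpenEdge file naming. The last component, modified buffers in memory, has no file at all, so no file-level check can see it.

```python
import re
from pathlib import Path

# Assumed patterns based on the extensions listed above; names such as
# sports_7.d1 (a data extent in another storage area) also match ".d<n>".
COMPONENT_PATTERNS = [
    ("table of contents", re.compile(r"\.db$")),
    ("data extent", re.compile(r"\.d\d+$")),
    ("before-image extent", re.compile(r"\.b\d+$")),
    ("after-image extent", re.compile(r"\.a\d+$")),
]


def classify(filename: str) -> str:
    """Return the database component a file name belongs to, or 'other'."""
    for component, pattern in COMPONENT_PATTERNS:
        if pattern.search(filename):
            return component
    return "other"


def group_database_files(paths):
    """Group file names by component so each one can be ticked off a backup checklist.

    Note: modified buffers in memory are not files and never appear here.
    """
    groups = {}
    for p in paths:
        name = Path(p).name
        groups.setdefault(classify(name), []).append(name)
    return groups
```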

How probkup -online works

1. Lock the DB
2. Do a pseudo-checkpoint
3. Switch AI files (if necessary)
4. Back up the BI file
5. Unlock the DB
6. Back up the database
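The ordering matters because users are paused only while the lock is held. This Python sketch models the six steps as data and reports which ones fall inside the lock. It illustrates the sequence above and is not Progress code; the per-step durations in the usage example are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    locks: bool = False    # this step acquires the database lock
    unlocks: bool = False  # this step releases the database lock


# The six steps of an online backup, in order.
ONLINE_BACKUP_STEPS = [
    Step("Lock the DB", locks=True),
    Step("Do a pseudo-checkpoint"),
    Step("Switch AI files (if necessary)"),
    Step("Back up the BI file"),
    Step("Unlock the DB", unlocks=True),
    Step("Back up the database"),
]


def steps_while_locked(steps):
    """Return the names of the steps that run while users are paused."""
    locked = False
    paused = []
    for step in steps:
        if step.locks:
            locked = True
        if locked:
            paused.append(step.name)
        if step.unlocks:
            locked = False
    return paused


def pause_seconds(steps, durations):
    """Estimate the user-visible pause from hypothetical per-step durations (seconds)."""
    return sum(durations.get(name, 0.0) for name in steps_while_locked(steps))
```

For instance, `pause_seconds(ONLINE_BACKUP_STEPS, {"Do a pseudo-checkpoint": 1.0, "Back up the BI file": 4.0, "Back up the database": 900.0})` returns 5.0: the long database copy runs after the unlock, so only the checkpoint and the BI backup count toward the pause.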

Other Items to Consider

- Application files
- Data outside the database (GIS, photos, …)
- User files (reports, output, …)
- External systems (EDI, warehouse, …)

Backup's Impact on Production

- Backing up production
  - Pause during backup of the before-image journal
  - Uses I/O capacity of production
- Split-mirror backup
  - Use of a quiet point keeps the pause to a minimum
  - Pause is non-zero
- After-image file backup
  - Still needs a full backup to begin the process
  - Little impact from the backup process
  - Potentially long recovery time

After Image Management

- Use OpenEdge after-image management
  - Automated
  - Supported
  - Nice features
- Use script-based after-image management
  - Custom
  - Requires support and testing
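If you go the script-based route, the core of such a script is usually "copy, verify, then delete". A minimal Python sketch, assuming a hypothetical staging directory that already holds full (switched-out) after-image files and a mounted remote directory; it does not call any OpenEdge utility, and the function names are illustrative.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large AI files do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_full_ai_files(staging: Path, remote: Path) -> list:
    """Copy each staged AI file to the remote directory, verify it, then delete the local copy.

    Returns the archived file names in sorted order.
    """
    remote.mkdir(parents=True, exist_ok=True)
    archived = []
    for src in sorted(p for p in staging.iterdir() if p.is_file()):
        dst = remote / src.name
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if sha256_of(src) != sha256_of(dst):
            # Leave the local copy in place; a real script would alert someone here.
            raise OSError(f"checksum mismatch for {src.name}")
        src.unlink()
        archived.append(src.name)
    return archived
```

Deleting the local copy only after the checksums match means a failed transfer leaves the file in staging for the next run, which is the "requires support and testing" part of the trade-off.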

Replication options

- OpenEdge Replication
  - Supported by Progress
  - Real-time replication
  - Allows reporting against the target database
  - Additional license cost
- Log-based replication
  - Free
  - Requires self-support
  - Not real time (greater potential data loss)
  - No reporting option
- Hardware replication

OpenEdge Replication – Single Target

Diagram: Production (source) with the Source DB, Shared Memory and the Replication Server → Reporting (target) with the Replication Agent, Shared Memory and the Target DB.

OpenEdge Replication – Multiple Targets

Diagram: Production (source) with the Source DB, Shared Memory and the Replication Server → Target 1 (Rep Agent, Shared Memory, Target DB) and Target 2 (Rep Agent, Shared Memory, Target DB).

Log-Based Replication

- Log-based replication has been used for years
- Log-based replication provides a vehicle for replication without the licensing costs of OE Replication
- Not real time
- Code for this type of replication must be maintained by the user, and there is no official support from the vendor
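The essential rule in log-based replication is that after-image files must be applied to the target in order, with no gaps. A sketch in Python, where the hypothetical `roll_forward` callback stands in for rolling one after-image file forward against the target database; sequence numbers are assumed to be consecutive integers.

```python
from typing import Callable, Iterable


def apply_in_sequence(last_applied: int,
                      shipped: Iterable[int],
                      roll_forward: Callable[[int], None]) -> int:
    """Apply shipped AI files strictly in sequence and stop at the first gap.

    Returns the new last-applied sequence number.
    """
    for seq in sorted(set(shipped)):
        if seq <= last_applied:
            continue  # already applied; ignore duplicates and stale files
        if seq != last_applied + 1:
            break  # gap: wait for the missing file rather than skipping ahead
        roll_forward(seq)  # hypothetical hook: roll this AI file forward on the target
        last_applied = seq
    return last_applied
```

With files 2, 4, 5 and 7 shipped and 3 already applied, the sketch applies 4 and 5 and then waits, because applying 7 without 6 would corrupt the target.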

Hardware-Based Replication

- Hardware-based replication is a function of the hardware vendors and thus supported directly by them
- This method is NOT supported by Progress
- ALL write operations must be guaranteed across the source and target disk systems

Archiving – Why is it important?

- Needs to be accessible
- Needs to be secure
- Needs to be in a different location
- Need to have policies
  - Physical access
  - Number of saved archives
  - Security

Archiving

- Who does your archiving? (Iron Mountain, third party, someone's house, …)
- What do you keep?
  - Two weeks of dailies
  - 5 weeks of weeklies
  - 1 year of monthlies
- How to label your backups
  - Who did the backup
  - Command to restore
  - Date and time
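The retention example above (two weeks of dailies, 5 weeks of weeklies, 1 year of monthlies) can be expressed as a small function. This Python sketch assumes one backup per day and, as labeled assumptions, treats the Sunday backup as the weekly and the backup taken on the 1st of the month as the monthly; adjust both to your own schedule.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, weekly_day=6):
    """Return the set of backup dates to retain.

    Policy: 14 days of dailies, 5 weeks (35 days) of weeklies, 1 year of monthlies.
    Assumptions: the weekly is the backup taken on weekly_day (Monday=0 ... Sunday=6);
    the monthly is the backup taken on the 1st of the month.
    """
    keep = set()
    for d in backup_dates:
        age = (today - d).days
        if age < 0:
            continue  # ignore dates in the future
        if age < 14:
            keep.add(d)  # daily
        elif age < 35 and d.weekday() == weekly_day:
            keep.add(d)  # weekly
        elif age < 365 and d.day == 1:
            keep.add(d)  # monthly
    return keep
```

Anything not in the returned set is a candidate for rotation; running this against the archive vendor's inventory is one way to check that the policy is actually being followed.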

Pre-archiving

- After-image files
  - Move to remote system
- Database backups
  - Move to remote system before moving to removable media
- Support files
- User data

Archiving (continued)

- Data archiving: what data archiving?
  - Archive/delete?
  - Archive/save historical
  - Archive/save aggregates
- After-image file archiving
  - At least 2 backups' worth
  - I recommend a week or more if possible
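For "at least 2 backups' worth" of after-image files, the cutoff is the older of your two most recent backups: every AI file closed after that point is needed to roll forward from it. A Python sketch, assuming you know when each AI file was closed (switched out); the one-week floor mirrors the recommendation above and is a parameter.

```python
from datetime import datetime, timedelta


def ai_files_to_keep(backup_times, ai_files, now, min_age=timedelta(days=7)):
    """Return the names of AI files needed to roll forward from the last two backups.

    ai_files maps file name -> time the file was closed (switched out).
    Using the close time keeps a file that was open when the backup ran.
    Files closed within min_age of now are kept regardless.
    """
    recent = sorted(backup_times)[-2:]
    # With fewer than two backups, keep everything.
    cutoff = recent[0] if len(recent) == 2 else datetime.min
    keep = set()
    for name, closed_at in ai_files.items():
        if closed_at > cutoff or now - closed_at <= min_age:
            keep.add(name)
    return keep
```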

How about if I am a SaaS user?

- Who is your provider?
- Verify their recovery plan
- Do a dry run of at least one recovery scenario
- Have specific service level agreements
  - Time to recover
  - Maximum loss of data
  - Penalties for missing times (need to cover your costs)

How about if I am a SaaS provider?

- Build a regular recovery plan
- Unique concerns
  - Security
  - Compliance (HIPAA, SOX, …)
- Build achievable SLAs for your users that at a minimum include:
  - Data loss
  - Uptime
  - Maximum recovery time
  - Maintenance time
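To make SLA terms checkable, compare the results of a recovery dry run against the agreed limits. The Python sketch below uses hypothetical field names; it measures uptime only from unplanned downtime, on the assumption that maintenance time is covered by its own SLA item.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SLA:
    max_recovery_time: timedelta
    max_data_loss: timedelta   # how far back in time committed work may be lost
    min_uptime_pct: float      # e.g. 99.9


@dataclass
class DrillResult:
    recovery_time: timedelta       # from failure until users could work again
    data_loss: timedelta           # from last recovered transaction to the failure
    unplanned_downtime: timedelta  # planned maintenance excluded (assumption)
    period: timedelta              # measurement window for uptime


def uptime_pct(downtime: timedelta, period: timedelta) -> float:
    return 100.0 * (1 - downtime / period)


def sla_violations(sla: SLA, result: DrillResult) -> list:
    """Return the names of the SLA terms the drill failed to meet."""
    problems = []
    if result.recovery_time > sla.max_recovery_time:
        problems.append("recovery time")
    if result.data_loss > sla.max_data_loss:
        problems.append("data loss")
    if uptime_pct(result.unplanned_downtime, result.period) < sla.min_uptime_pct:
        problems.append("uptime")
    return problems
```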

Implementing Your Plan

- First implementation should be a totally manual process to ensure the steps work and to allow for documentation
- Document the process as you go
  - Who are you logged in as?
  - Exactly what you typed
  - Where you were (console, remote, …)
  - Can things be done in parallel, or must they be done sequentially?
  - Where are the logs?
  - What to look for in the logs (good and bad)
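Much of this can be captured automatically while you work through the manual run. A minimal Python sketch of a hypothetical wrapper that runs one command and appends who ran it, on which host, from where, exactly what was run, and its output to a JSON-lines log; it collects facts for the written procedure but does not replace it.

```python
import getpass
import json
import socket
import subprocess
from datetime import datetime, timezone


def current_user():
    """Best-effort login name; getpass.getuser() can fail in some containers."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def run_and_record(cmd, log_path, where="console"):
    """Run one command (an argument list) and append a record of it to log_path."""
    started = datetime.now(timezone.utc)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    entry = {
        "user": current_user(),
        "host": socket.gethostname(),
        "where": where,               # console, remote session, ...
        "command": cmd,               # exactly what was run
        "started_utc": started.isoformat(),
        "exit_code": proc.returncode,
        "stdout": proc.stdout,        # keep output so "good" and "bad" can be documented
        "stderr": proc.stderr,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```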

Documentation

- All recovery documentation should be VERY specific, as the person doing the recovery will not be familiar with you or your environment
- Create documents for normal maintenance
  - Backups
  - Database growth
  - Modification of OS, application, printers, …
- Create scenario-based recovery plans
  - Lose a disk (or disk pair)
  - Fire
  - Flood

Testing Your Plan

- Who does the test?
  - Not the person who wrote it
  - The backup person for the implementation
  - Someone who is "always" there, regardless of technical ability
- How often to test?
  - Material data change
  - Any change in database configuration
    - New directories
    - New areas
  - Do you have a second site or redundant hardware?
  - Do you have enough disk capacity (space and throughput)?
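The triggers above, together with the at-least-annual testing recommended in the conclusion, can be turned into a simple check. This Python sketch is hypothetical: it assumes you keep a list of (area name, extent path) pairs from the last tested configuration and compares it with the current one to spot new areas or new directories.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def config_changed(tested_extents, current_extents):
    """Return True if the current layout has an area or directory the last test did not.

    Each argument is a list of (area name, extent path) pairs (hypothetical format).
    """
    tested_areas = {area for area, _ in tested_extents}
    tested_dirs = {PurePosixPath(path).parent for _, path in tested_extents}
    for area, path in current_extents:
        if area not in tested_areas:
            return True  # new storage area
        if PurePosixPath(path).parent not in tested_dirs:
            return True  # new directory
    return False


def recovery_test_due(last_test, today, material_data_change=False,
                      db_config_changed=False, max_interval=timedelta(days=365)):
    """Re-test on any trigger, and at least once per max_interval regardless."""
    if material_data_change or db_config_changed:
        return True
    return today - last_test >= max_interval
```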

How to test your plan

- Fail over to your backup system
  - You may lose the use of the backup system, but that is a good trade in the vast majority of cases
  - Maybe not live, but at least a restore and test
- Fail back to your primary system
  - After a live failover
- Contingency planning for:
  - Personnel
  - Physical plant
  - Equipment
  - Lead time for resources (hours or days)

Summary: Recovery Planning

- You still need to back up
- Back up your backup (not production) if possible
- Be inclusive when building your team
- Create and maintain a comprehensive plan
- Include everything needed to use the application:
  - Hardware
  - Applications
  - Data
- Create and maintain physical and online contact lists, and store them with your archived backups

Conclusion: Recovery Planning

- Always back up what you have now, however little, before starting to recover
- Test your plan periodically (at least annually)

Still have questions?

Please feel free to contact me directly.

Adam Backman, White Star Software, (603) 897-1010

Thank you for your time.