Database Repair and Recovery - Dan Foreman

Download Report

Transcript Database Repair and Recovery - Dan Foreman

Progress Database Repair & Recovery

Dan Foreman BravePoint, Inc.

Email: [email protected]

Introduction- Dan Foreman

Progress User since 1984 (V2.1) Speaker at many Progress User Conferences from 1990 to 2012 PUG Challenge 2012

Introduction- Dan Foreman

Author of:

Progress Performance Tuning Guide Progress Database Administration Guide Progress System Tables Guide Progress V10 Database Admin Jumpstart

Book purchase allows free online access ProMonitor – Database Monitoring Tool ProD&L - Accelerated Dump/Load Utility Balanced Benchmark – Load testing tool PUG Challenge 2012

Introduction - Who Are You

Progress Version: V6, V7, V8, V9, V10.0*, V10.1*, V10.2* DB OS: Unix? Windows? Linux? Is there anything else?

Largest Single Database?

Highest Concurrent User Count?

PUG Challenge 2012

Special Request

Mobile Phones on Mute Please!

PUG Challenge 2012

Goals

Can I teach you database brain surgery in an hour?

Note that DBAs have one big advantage that human brain surgeons don’t…do you know what that is?

PUG Challenge 2012

Real Horror Stories

Fortune 500 Company (sorry but they would not appreciate us sharing their name) DB Corruption February 23 Last Good Backup: January 11 Last Good AI Files: January 17 We facilitated a special version of rfutil that would ignore errors during the Roll Forward process PUG Challenge 2012

Real Horror Stories

Customer running SCO OpenServer We had told the customer to move to a more “modern” OS, i.e. Linux OS Problem – Can’t mount the Disks; Discovered that backups to tape were not occurring (backup to disk was OK but couldn’t see the disks) Had to boot using Knoppix to repair things – took forever to reinstall SCO PUG Challenge 2012

Real Horror Stories

Fortune 1000 Company HP Server Admin outsourced to IBM Backups outsourced to 3d party 3d party stopped doing backups, unannounced, due to non-payment DB Corrupted Restoration impossible BravePoint 2012

Preventive Maintenance

Backups (yes, I know you think you have backups but have you tested one recently?) Test your Entire Recovery Plan PUG Challenge 2012

Preventive Maintenance

Warm Standby Database - A database on another machine with a recent copy of the production DB Also called a D/R Database This is easy to do in Progress...covered soon PUG Challenge 2012

Preventive Maintenance

Unix: don’t logon as root unless you really need to Use O/S security to protect the DB, BI, and AI files from accidental or casual or intentional deletion PUG Challenge 2012

Preventive Maintenance

Unix: Don’t use kill -9 to terminate a Self Service Progress session; You might bring the database DOWN! if you happen to kill a session that is holding a Latch ALWAYS have an up-to-date Structure (.st) file available - we will see why later PUG Challenge 2012

Preventive Maintenance

Monitor the BI file High Water Mark Monitor 'Delinquent' Transactions (Active Transactions longer than 30-60 minutes) Monitor Large Transactions (A Client with a Large Number of concurrent Record Locks) longtrx*.p Progress program on the BravePoint Website to detect Delinquent Transactions PUG Challenge 2012

Preventive Maintenance

Use the -bithold parameter as an extra safeguard; Set to 50% of available BI Disk Space; even in V9 V9/V10 supports Terabyte sized BI Files but extent sizes are still limited to 2gb unless you use the EnableLargeFiles option on proutil and the file system must be 2gb enabled too PUG Challenge 2012

Preventive Maintenance

Monitor the Area High Water Marks to avoid growing into the Variable Length Extent There is a Performance Hit, usually insignificant but sometimes not, when growing the Variable Extent A Single Variable Extent can limit some of the recovery options discussed later PUG Challenge 2012

Quiz Question

Who are the Smartest DBAs in the Room?

PUG Challenge 2012

Answer

DBAs who enabled After Imaging on their Mission Critical Databases If you’re not using AI, you probably shouldn’t be responsible for your company’s databases Management need convincing?

Play Chicken with them PUG Challenge 2012

After Imaging

Who is currently not using AI?

If not, why not? Is public flogging or humiliation required?

PUG Challenge 2012

After Imaging

PSC docs say that AI offers protection against Disk failure Disk fails 5 minutes before the backup starts on the final day of your year end close No paper trail Ouch! Time to work on your resume (C.V.) After Image File(s) + Last (Good) Backup = State of DB at time of crash PUG Challenge 2012

After Imaging - Why Use It?

But you say…”I have disk mirroring (also known as RAID 1) so I’m protected against a disk failure” BUT Mirroring does NOT protect against all database evils PUG Challenge 2012

After Imaging - Why Use It?

True Horror Story #1 A DBA (logged on as root) FTP’d a test database into the directory where the production database resided... unfortunately they had the same name Disk Mirroring worked just fine…..

After Imaging would have probably saved the day PUG Challenge 2012

After Imaging - Why Use It?

True Horror Story #2 A user ran an archiving program on live data that wasn’t ready to be archived Once again the mirroring performed perfectly AI might have improved the situation as it is possible to Roll Forward to a specific point in time PUG Challenge 2012

After Imaging - Why Use It?

True Horror Story #3 – Part 1 BI file hit the V8 2GB limit @ 1600 on a busy day (300+ users) Large Production Database was corrupted Progress Support Recommendation: dump & load or restore from backup which meant substantial down time or data loss PUG Challenge 2012

After Imaging - Why Use It?

True Horror Story #3 – Part 2 Fortunately the customer called me and I was able to temporarily patch the database until a D&L could be performed Irony: I had recommended AI to this customer over one year prior to this event PUG Challenge 2012

After Imaging - Why Use It?

Avoid probkup online issues Transaction Activity is Frozen while the BI File is Backed Up The I/O Overhead of disk/tape backup Possible Solution Use AI to maintain a Warm Spare DB Backup the Replicated Database PUG Challenge 2012

After Imaging - Why Use It?

Warm Standby (D/R) Database A Warm Standby DB is: A replicated Database on another Server DB can an be brought online quickly in case of catastrophic failure to the production system It’s ‘warm’ because it is not 100% current…usually 2-15 minutes behind PUG Challenge 2012

After Imaging - Why Use It?

A HOT spare database is not possible using AI except with: Fathom Replication (oops…OpenEdge Replication) Replication Triggers SAN Mirroring Even these options don’t guarantee zero loss of data PUG Challenge 2012

After Imaging - Why Use It?

Easy refreshing of a Report Server DB A Report Server DB is: A database on another server Used for reporting only To relieve the production system of the overhead imposed by reporting Doesn’t require same level of hardware or Progress license PUG Challenge 2012

Essential DB Monitoring

Performed periodically to make sure you don’t have hidden or unreported corruption PUG Challenge 2012

Essential DB Monitoring

Corruption Checks

proutil dbanalys probkup/procopy proutil dbrpr

proutil dbscan (non-interactive dbrpr)

proutil idxfix dbtool

BravePoint 2012

Essential DB Monitoring

-MemCheck AND ALL THE OTHER SIMILAR OPTIONS BravePoint 2012

Database Log File Monitoring

Check the Database log (.lg) file for errors DAILY. Look for words such as: kill* drastic warn* error system dead fatal abnormal exceed* fail* wrong unexpected* invalid died damage* overflow* violation insufficient missing disappear* corrupt* allow* attempt* cannot enough illegal beyond impossible increase unknown unable stop* ProMonitor supports automated log file monitoring PUG Challenge 2012

1124 Errors

SYSTEM ERROR: wrong dbkey in block

99.9999% probability of H/W problem Don’t limit search to disks; consider: Disk Controllers, RAM (parity errors), Firmware, etc.

Don’t let the Hardware Technician blame Progress or the Application Don’t let the Hardware Technician escape without fixing the problem PUG Challenge 2012

1124 Stories

Seagate Firmware on 2gb drives (mid 90’s) HP Server/EMC SAN administered by HP HP/UX diagnostics showed no problems EMC diagnostics showed no problems Cause: Bad SAN Fabric Switch PUG Challenge 2012

1124 Stories

Sometimes a Server reboot can (temporarily) fix a 1124 situation However this might be a situation where the hardware is in a bad-good bad good… cycle BravePoint 2012

Database File System Full

Use prostrct repair to relocate Extents to a place with more space Copy the Extent to the new location Update the Structure File (.st) to reflect the current location of the Extents (one good reason to have a current one) Run prostrct repair new.st Alternatively use prostrct add to add new Extents PUG Challenge 2012

No Space for Before Image File

Do Not run out of BI disk space if: More space isn’t available elsewhere The BI extent(s) can’t be relocated To perform Crash Recovery (part of the proutil truncate bi process), the BI file will grow; sometimes 2X or more PUG Challenge 2012

No Space for Before Image File

If there is no space for the BI file to grow, there is no Crash Recovery If Crash Recovery partially completes but then crashes, the next Crash Recovery will create an even larger BI file!!!

PUG Challenge 2012

No Space for Before Image File

Force Access (-F) is the only option (if you don’t have AI enabled) Even having AI enabled is problematic Crash Recovery Notes are also written to the AI logs You can’t do rfutil aimage empty during Crash Recovery!!

This is why the –bithold parameter is so important PUG Challenge 2012

Corrupt Database Blocks

What Kind of Block? Index or Data Block Type 2 (Index = IX) Block Type 3 (Record Manager = RM) Use proutil dbrpr to get the Block Type Or if the Block is in a Storage Area dedicated to Indexes or Tables, you automatically know PUG Challenge 2012

Corrupt IX Block Options

Try rebuilding the Index that the IX Block belongs to Try to Truncate the Area Reformat the block as a Free Block (proutil dbrpr) If it is an RM block, see the next set of slides PUG Challenge 2012

Corrupt RM Blocks

Reformat the block as a Free Block (proutil dbrpr) Replace the Block with the same Block from another DB (restored from a backup)

proutil dbrpr

5. Dump Block (from the good DB) 4. Load Block (into the bad DB) Don’t forget to backup the DB fi PUG Challenge 2012

RM Block Transplant

How can I tell if the block has changed since the last probkup? The Block Update Counter stored in the Block Header If not using probkup, the DBKEY from the AI Logs can be obtained with aimage scan verbose but you REALLY need to be motivated PUG Challenge 2012

Emergency Dump

‘Front and Back’ 4GL Dump

for each customer by custnum (until you hit the bad spot) for each customer by custnum descending Assumes the damage is limited in scope

PUG Challenge 2012

Emergency Dump

If the Primary Index is damaged, try dumping using a non-Primary Index Or try fixing the index with proutil

idxfix

Indexless Binary Dump proutil -C dump

-index 0 Only works for Type 2 Areas BravePoint 2012

Emergency Dump

RECID Dump Doesn’t require an Index Very Slow on a Large Database If one table per Area, perhaps no so bad Usually the Last, Last, Last Resort PUG Challenge 2012

Deleted/Damaged Extents

First, backup the “remnant” of the DB This may seem like a useless step but if the last backup is defective you may need to repair the broken DB and that’s difficult to do if it’s deleted The backup gives you time to: Prepare a plan of action Call outside resources (like me) for help Calm down, take a Xanax & lock the door Prepare a new Resume (C.V.) PUG Challenge 2012

Deleted/Damaged AI Extents

If an AI Extent is deleted, simply disable AI and... What? Not running AI?

Disable AI (rfutil aimage end) Fix the issue that caused the lost Extent Recreate the Extent with prostrct add Restart AI (rfutil aimage begin) If this doesn’t work, next slide PUG Challenge 2012

Deleted/Damaged AI Extents

Method #2 Disable AI with rfutil aimage end. You may get an error message regarding the missing AI Extent but typically AI is still disabled Truncate the BI file with proutil truncate bi. You may get an error regarding the missing AI Extent but typically the BI file is still truncated PUG Challenge 2012

Deleted/Damaged AI Extents

Method #2 continued Remove all AI Extents with prostrct

remove

Recreate the original AI Extents with

prostrct add

Restart After Imaging with rfutil

aimage begin

Reformat the truncated BI file with

proutil bigrow

PUG Challenge 2012

Deleted/Damaged BI Extents

Force access with -F V8.2 and later -F only works on proutil truncate bi (and promon and proshut) If you Force access, consider the DB damaged!

Forcing access THROWS AWAY the BI file Unfortunately, sometimes –F is the only option PUG Challenge 2012

Deleted/Damaged BI Extents

Force Access with –F continued Forcing Access sets the ‘Tainted Flag’ in the Master Block Even if you fix the Tainted Flag (idxbuild), consider the DB damaged!

Dump & Load (this is if there is no AI recovery option) If you can’t get into the database with -F or any other way, try the Read Only (-RO) option PUG Challenge 2012

Deleted/Damaged Database Extents .db

.d1

.d2

.d3

HWM .d4

.d5

PUG Challenge 2012

Deleted/Damaged DB Extents

Restore the DB and BI from Backup Apply the AI files Re-enable AI BI Grow Done!

That didn’t work?, next slide PUG Challenge 2012

Deleted/Damaged DB Extents

Use prostrct unlock if the deleted Extent was Empty (above the High Water Mark) prostrct unlock -extents will recreate missing Extents (and set the Tainted flag) However unlock also changes the time stamps on the AI files and they can’t be used any longer PUG Challenge 2012

Deleted DB Extents – Extent Transplant This technique is for Extents that contain Data except the Schema Area .d1 Extent (contains Master Block) Restore a copy of the deleted Extent from a Backup or other source The Extent’s ‘Last Opened’ time stamps won’t match Use prostrct unlock to sync the time stamps (broken in some versions) PUG Challenge 2012

Deleted DB Extents – Extent Transplant The data in the Extent might not match but… Use the -miracle option to re-create the Data

Loss of the Schema Area .d1 extent is usually not recoverable Loss of a High Water Mark extent is also usually not recoverable PUG Challenge 2012

Deleted DB Extents – Extent Transplant This is why small fixed extents may still be a good idea PUG Challenge 2012

Deleted DB Extents

If the DB Broker is still running #1 DON’T Shutdown the Database That ‘closes’ the database extents and you won’t be able to re-open them If a Client is still connected to the DB can access the Progress Editor, just dump the Database from the Dictionary Even if they can’t access the Editor, put dict.p (renamed as a menu item) into their PROPATH PUG Challenge 2012

Deleted DB Extents - Unix

If the DB Broker is still running #2 Warm Boot the System ASAP Don’t Shut Down the DB First Run fsck (*ix only) and it can probably recover the deleted Extent Why?

On Unix a file is not absolutely deleted until every process that has it open is gone (the Broker still has it open) PUG Challenge 2012

Lost .db File

In V9 and later it is relatively easy to restore the .db file

prostrct builddb

Requires an up-to-date Structure File (remember that point from the Preventive Maintenance list?) PUG Challenge 2012

Sources of Help

Progress Documentation

Progress DB Administration Guide

[email protected]

PSC Kbase (i.e. Krapbase) PSC Tech Support My mobile phone: +1 541-908-3437 For those weekend emergencies when you need expert assistance This is not a free call PUG Challenge 2012

Conclusion

Thank you for coming More details can be found in my

Progress Database Administration Guide

Publications are available at www.BravePoint.com

[email protected]

Do we have time for Questions?

PUG Challenge 2012