Transcript ADABAS Disaster Recovery
Experiences with D/R Procedures
of ADABAS Data on Mainframes
Natural Conference, Boston, May 2004
Dieter W. Storr, [email protected]
Different Disaster, Different Action
Unplanned downtime
  Machine outages
  Network outages
  Software failures
Disaster
  Site / data center loss
  Catastrophic failure
Leading Causes of Downtime
Source: DRJ Summer 2002, Volume 15, Number 3
Power Outage 29%
Storm Damage 11%
Flood 10%
Terrorism / Sabotage 8%
Other Causes of Downtime
Fire
Earthquake
Computer Crime
LA Times Downtime
Flood Damage 21 April 2002:
A 14-inch pipe that supplies the fire-sprinkler system burst and flooded the Orange County facility
Half the facility was standing in more than a foot of muddy water
Affected areas: editorial, ad ops, IT, HR
ADABAS was not affected
LA Times Downtime
Bomb Alarm 14 June 2002:
A bomb was believed to have been left in the Bank of America branch that’s set into the Times Building
Security swept the building
DBAs observed the system from home
LA Times Downtime
Bomb Alarm 29 July 2002:
An intruder claimed to have a bomb and darted into the garage
Security swept the building
OP stopped CA7, so PLOGCOPY couldn’t start automatically
Two PLOGs filled up and ADABAS was locked
DBAs later started the PLCOPY jobs manually
LA Times Downtime
Power Outage - 29 August 2002 (3:43 P.M.)
City (DWP) had a power grid, flood leaked into a DWP transformer
There were actually 2 spikes/outages: the first started the UPS switchover, which was interrupted by the second, which took the UPS down.
LA Times Downtime
Power Outage (cont’d)
The network was back in service after a short delay.
Our Unix-based servers were restarted and checked. There was no evidence of damage to the Sybase Adaptive Server Enterprise (ASE, formerly Sybase SQL Server) servers.
LA Times Downtime
Power Outage (cont’d)
Mainframe recovery was delayed due to corruption of the Hardware Management Console (HMC)
OP did a power-on reset, which restored the HMC
Operations IPLed, and Technical Support proceeded with system checkout procedures.
Although the Enterprise Storage Server (ESS) had an error indicator, it was still up and did not add to any outages
IBM reset the error indicator without impact.
LA Times Downtime
Power Outage (cont’d)
Started ADABAS servers manually:
  Parm Error 23: DIB block remained after an abnormal termination
  Started all servers with IGNDIB=YES
18:25 ADABAS IS ACTIVE
NO ADAN58 Message
LA Times Downtime
ADAN58 Message (ADA71: ADAN5A)
ADAN58 BUFFER-FLUSH START RECORD DETECTED DURING AUTORESTART. THE NUCLEUS WILL T E R M I N A T E AFTER AUTORESTART. IN CASE OF POWER FAILURE, THE DATABASE MIGHT BE INCONSISTENT BECAUSE OF PARTIALLY WRITTEN BLOCKS.
O N L Y IN THIS CASE, REPAIR THE DATABASE BY RESTORE AND REGENERATE; OTHERWISE RESTART THE NUCLEUS.
ADAN5A: FILES MODIFIED DURING AUTORESTART: files
Power Failure During Buffer Flush
[Figure: blocks A to H during a buffer flush interrupted by a power failure. Legend: old block, updated block, partially updated block on disk]
Nucleus Restart After Power Failure - IGNDIB=YES
LA Times Downtime
Power Outage (cont’d)
Switched all PLOGs
Checked batch and online
There was no evidence of damage to any of the ADABAS components.
Other LA Times Disasters
1965: Watts riots
1971: Sylmar quake 6.5
1987: Whittier punch 5.9
1992: LA riots
1994: Northridge quake 6.7
6 Feb 1998: El Niño, flooding in B-1 computer room
15 April 1999: Power failure ‘news editing’
ADABAS Recovery
Command Log (CLOG) Failure - I/O Error
Restore or reallocate/format the CLOG
ADABAS will come up through Autorestart normally
No data loss if CLOG is not used
ADABAS Recovery
Protection Log (PLOG) Failure - I/O Error
Restore or reallocate/format the PLOG
Take a full back-up of the database
ADABAS will come up through Autorestart normally
Restart batch jobs
  Restartable batch jobs = OK
  Non-restartable batch jobs = check
ADABAS Recovery
TEMP and SORT Failure - I/O Error
Restore or reallocate/format the TEMP/SORT dataset
Different actions for the utilities
See the ADABAS Utilities manuals
ADABAS Recovery
DSIM Failure - I/O Error
Restore or reallocate/format a DSIM dataset
Different actions for the utilities
See the ADABAS Utilities manuals
ADABAS Recovery
Recovery Aid Dataset Failure - I/O Error
Restore or reallocate/format an RLOG dataset
Prepare the RLOG dataset
  ADARAI PREPARE RLOGSIZE / RLOGDEV….
Different actions for the utilities
See the ADABAS Utilities manuals
Take a full back-up of the database
  This will start the first generation of the RLOG dataset
ADABAS Recovery
ASSO/DATA Failure - I/O Error
Copy PLOG twice - ADARES PLCOPY
Restore or reallocate/format DATA dataset(s)
Instead of reallocating/formatting and restoring all DATA volumes, system specialists can
  Reallocate and format the new volume
  Restore the VTOC chain
  Restore and regenerate only files that were located on the failed volume
Otherwise, . . .
ADABAS Recovery
ASSO/DATA Failure - I/O Error
Restore entire database
  ADASAV RESTORE [OVERWRITE = for GCB]
  ADASAV RESTONL [OVERWRITE] include PLOG
Start nucleus with UTIONLY=YES
Regenerate updates from end of last save (SYN2)
  ADARES REGENERATE PLOGNUM=xxx
  ADARES FROMCP=SYN2,FROMBLK=xxx
ADABAS Recovery
ASSO/DATA Failure - I/O Error
Utilities that may need to be rerun (see ADARES):
  ADALOD LOAD FILE=xxx
  ADALOD UPDATE FILE=xxx
  ADALOD UPDATE FILE=xxx,DDISN
  ADAINV INVERT FILE=xxx,FIELD=xx
Lock files to rerun utilities
  ADADBS OPERCOM LOCKU=xx
Unlock utility-only status
  ADADBS OPERCOM UTIONLY=NO
ADABAS Recovery
ASSO/DATA Failure - I/O Error
Rerun the regenerate function for the relevant files
Unlock the regenerated files
  ADADBS OPERCOM UNLOCKU=xx
Don’t repeat these steps if ADARES points out:
  ADALOD LOAD FILE=nn
  ADARES REGENERATE FILE=nn
  ADADBS REFRESH FILE=nn
Nucleus is ready
ADABAS Recovery
WORK 1 Failure - I/O Error
Restore or reallocate/format the WORK dataset
Restore and regenerate the entire database to avoid inconsistencies: open transactions
See ASSO/DATA failure
ADABAS Recovery
WORK 2/3 Failure - I/O Error
End the database normally (ADAEND) to avoid open transactions in part 1 of WORK
Restore or reallocate/format the WORK dataset
Restart the database normally
If the database abends, restore and regenerate the entire database - see ASSO/DATA failure
ADABAS Recovery
Failure in Data Storage Blocks
//DDSIIN DD DSN=SAVE.SIBA….
// DD DSN=PLCOPY.LOG1…
// DD DSN=PLCOPY.LOG2…
//DDCARD DD *
ADARES REPAIR DSRABN=xxx-yyy
ADARES FILE=n1,n2,n3
Failure in DSST
ADADCK DSCHECK FILE=xxx
ADADCK REPAIR
CALL SAG !!
ADABAS Recovery
Nucleus Ends With RC 77
Not restartable
No more space for Checkpoint File (CP)
Rename old WORK
Allocate/format new WORK with old space
Change high-used RABN and high-used ISN
Restart nucleus with new WORK and UTIONLY=YES
Nucleus is in “crippled mode” - no user has access
Expand the database
Stop the nucleus normally
Rename old WORK and restart the nucleus with old WORK (autorestart)
ADABAS Recovery
Nucleus Ends With RC 77
Not restartable
No more space for user files
Rename old WORK
Allocate/format new WORK with old space
Restart nucleus with new WORK and UTIONLY=YES
Nucleus is in “crippled mode” - no user access
Expand database
Stop nucleus normally
Rename old WORK and restart nucleus with old WORK (autorestart)
ADABAS Recovery
Nucleus Abends - Missing DE Values
Descriptor is marked in FDT as DE; the value doesn’t exist in ASSO, but does exist in DATA.
Check:
  ADAICK ICHECK FILE=xxx[,NOOPEN]
  ADAVAL VALIDATE FILE=xxx,DESCRIPTOR=yy
Solution 1:
  ADAULD UNLOAD FILE=xxx,UTYPE=EXF
  ADALOD LOAD FILE=xxx,LWP=yyyyK
Solution 2:
  ADADBS RELEASE FILE=xxx,DESCRIPTOR=yy
  ADAINV INVERT FILE=xxx,FIELD=yy,LWP=...
CALL SAG !!
Back-up Possibilities
ADASAV to tape / disk
Including Fast Dump Restore, DFDSS
Delta Save Facility (DSF)
Delta Save QDUMP (Legent)
Disk mirroring (hardware level)
FlashCopy of Enterprise Storage Server (ESS)
Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)
OC-3 links two EMC disc arrays
Replication
Stand-by systems
Restore and Regenerate
Entire Transaction Server
ADABAS Disaster Recovery
How to back-up
Collect recovery data
Restore w/o nucleus
Start nucleus w/ UTIONLY=YES
Regenerate w/ nucleus
Switch UTIONLY=NO
ADABAS 6.2.2 Back-up at LA Times
[Timeline figure. Times shown: 21:00, 01:00, 02:00, 03:00, 8:00 - 11:00, 12:00. Labels shown, in order: ADAP1BKF, Online SAVE, ADAP1PLC (FEOFPL), ADAP1PLC, PLOG Switch, DFDSS Full-Volume Back-up (weekly; ASSO / DATA / WORK / etc.), ADAP1BKO, Copy Tapes, PDS, GDGs etc., BRM/ABARS, Several Jobs, Pick-up by Recall]
Production Database Back-ups
DB   Date    GB     Disk 3390(3399)
1    4/03     4.9   < 2 min
1    8/03     4.9   < 2 min
2    4/03    30.0   < 35 min
2    8/03    36.7   < 45 min
3    4/03    11.6   < 15 min
3    8/03    17.1   < 22 min
4    4/03     9.7   < 15 min
4    8/03     9.9   < 15 min
5    4/03     5.2   < 5 min
5    8/03     7.3   < 7 min
Cartridge 3490 Silo: 15 min, 150+ min, 224+ min, 110+ min, 90+ min, 28 min
Number of 3490 carts: 2, 42, 19, 9, 5
ADASAV SAVE BUFNO=2,TTSYN=60
Record format: VB, record length: 27994, block size: 27998, BUFNO=30
Back-up to SMS Disk Pool
Run times are consistently at least 80% lower when writing to disk instead of cartridge
Run times are consistently around 60% lower when copying from disk to cartridge (compared with cart to cart)
DFSMShsm: automate your storage management tasks
SMS Production Storage Pool
Back-up to Disk Pool
No cartridge errors
No cartridge drive errors
No cartridges get accidentally ejected from the silo
Smaller back-up window
Smaller maintenance windows
Less impact to application processes
Greater confidence that the data you need will be there when you need it
IBM Magstar 3494/Virtual Tape Server
Linear design
  1 - 18 frames
Conf. flexibility
  SCSI, FC, ESCON, FICON
  3590, 3490E, VTS
High availability
  Dual robotics
  Dual library manager
>42 old 3490 carts will fit on 1 new 3494 cart
5 x 3390 volumes fit on one 3494 cart
One 3494 cart can be read in 45 seconds into the VTS disk cache (RAID-5)
Virtual Tape Concept
Virtual tape drives
  Appear as multiple 3490E tape drives
  3490E Media 1 and 2 support
  Shared / partitioned like real tape drives
Tape Volume Caching
  All data access is to cache
  Improves ‘mount’ performance
  LRU cache management
Volume Stacking
  Fully utilizes physical cart capacity
  Reduces physical cart requirement
  Reduces footprint requirement
[Figure: virtual drives 1 to n (addresses 180, 181, ... 19F) with virtual volumes 1 to n in the Tape Volume Cache; logical volumes 1 to n stacked on Magstar 3590 cartridges, 30/60 GB capacity (assumes 3:1 compression)]
Performance Tests
Input    Output   MM.SS    Storage
Adabas   Disk      42.63   526125 tracks 3390
Adabas   VTS       46.43   31 log. 3490 tapes
Disk     VTS       42.47   31 log. 3490 tapes
VTS      VTS       48.38   31 log. 3490 tapes
Disk     VTS       39.39   31 log. 3490 tapes
VTS      3590      47.86   1 phys. 3590 tape
Adabas   3490     216.27   51 phys. 3490 tapes
Adabas   VTS       52.47   39 log. 3490 tapes
Collecting Data For Recovery
Block Ranges SYN1 - SYN2
For ADASAV RESTORE
From ADASAV SAVE:
  PROTECTION LOG PLOGNUM=64, SYN1=4695, SYN2=4698
From ADAREP:
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
Collecting Data For Recovery
Block Ranges SYN2 - End
For ADARES REGENERATE
From ADAREP:
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
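The SYN1 and SYN2 block numbers in the checkpoint list above can also be picked out by a small script instead of by eye. The following Python sketch is not part of the original talk. It assumes each checkpoint line has exactly the nine columns shown on the slide (name, type, user type, date, time, PLOG number, block number, DUAL, job name); real ADAREP output may be laid out differently depending on the version. The names `Checkpoint`, `parse_checkpoints` and `last_save_range` are hypothetical.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    name: str      # checkpoint name, e.g. SYN1, SYN2, SYNP, SYNV
    plognum: int   # protection log (session) number
    block: int     # PLOG block number of the checkpoint
    job: str       # job that wrote the checkpoint


def parse_checkpoints(report: str) -> list[Checkpoint]:
    """Parse lines laid out like the ADAREP excerpt on the slide (assumed layout)."""
    checkpoints = []
    for line in report.splitlines():
        fields = line.split()
        # Assumed columns: name, type, user type, date, time, PLOG no., block, DUAL, job
        if len(fields) != 9 or not fields[0].startswith("SYN"):
            continue
        checkpoints.append(
            Checkpoint(fields[0], int(fields[5]), int(fields[6]), fields[8])
        )
    return checkpoints


def last_save_range(checkpoints: list[Checkpoint], job: str) -> tuple[int, int, int]:
    """Return (PLOG number, SYN1 block, SYN2 block) of the last save written by job."""
    syn1 = syn2 = None
    for cp in checkpoints:
        if cp.job != job:
            continue
        if cp.name == "SYN1":
            syn1, syn2 = cp, None          # a newer save started
        elif cp.name == "SYN2" and syn1 is not None and syn2 is None:
            syn2 = cp                      # first SYN2 after that SYN1
    if syn1 is None or syn2 is None:
        raise ValueError(f"no complete SYN1/SYN2 pair found for job {job}")
    return syn1.plognum, syn1.block, syn2.block


# Checkpoint lines copied from the slide.
REPORT = """\
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
"""

print(last_save_range(parse_checkpoints(REPORT), "ADAP1BKF"))
```

For the slide's data this yields PLOG number 64 with SYN1 at block 4695 and SYN2 at block 4698, the same values reported by ADASAV SAVE.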
Collecting Data For Recovery
Dataset Name From Back-up Job (GDG)
For ADASAV RESTORE
ADABAS.PRODOFFD.DB1.BACKUP.FULL.G0842V00 CATALOGED
Collecting Data For Recovery
Dataset Names From PLOG Copy Jobs (GDG)
Matching block numbers 4695 - End
For ADASAV RESTORE and ADARES REGENERATE

DDSIAUS1 OUTPUT VOLUME=WRK015, SESSION NR=64
FROMBLK= 1214, FROMTIME=2002-09-23 03:30:24
TOBLK= 4701, TOTIME= 2002-09-23 21:01:42
ADABAS.PROD.DB1.PLOG.COPY.G7170V00

DDSIAUS1 OUTPUT VOLUME=WRK015, SESSION NR=64
FROMBLK= 4702, FROMTIME=2002-09-23 21:02:08
TOBLK= 4748, TOTIME= 2002-09-23 23:30:03
ADABAS.PROD.DB1.PLOG.COPY.G7171V00

DDSIAUS1 OUTPUT VOLUME=WRK004, SESSION NR=64
FROMBLK= 4749, FROMTIME=2002-09-23 23:30:25
TOBLK= 4791, TOTIME= 2002-09-24 03:30:33
ADABAS.PROD.DB1.PLOG.COPY.G7172V00
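Matching PLOG copy datasets to a starting block, as done by hand above, can be sketched in Python. This example is not from the original talk. It assumes the block ranges have already been transcribed from the PLCOPY job output into tuples, and that consecutive copies of one session carry contiguous block numbers, as they do in the slide's data. `PlogCopy` and `copies_needed` are hypothetical names.

```python
from typing import NamedTuple


class PlogCopy(NamedTuple):
    dsn: str       # dataset name of the PLOG copy (GDG generation)
    session: int   # SESSION NR from the PLCOPY output
    from_blk: int  # FROMBLK
    to_blk: int    # TOBLK


def copies_needed(copies: list[PlogCopy], session: int, start_blk: int) -> list[PlogCopy]:
    """Return the copies of one session that hold block start_blk or any later block."""
    selected = sorted(
        (c for c in copies if c.session == session and c.to_blk >= start_blk),
        key=lambda c: c.from_blk,
    )
    if not selected or selected[0].from_blk > start_blk:
        raise ValueError("no PLOG copy covers the starting block")
    # Assumption: block numbers continue without a gap from one copy to the next.
    for prev, nxt in zip(selected, selected[1:]):
        if nxt.from_blk != prev.to_blk + 1:
            raise ValueError(f"gap between {prev.dsn} and {nxt.dsn}")
    return selected


# Block ranges copied from the slide.
COPIES = [
    PlogCopy("ADABAS.PROD.DB1.PLOG.COPY.G7170V00", 64, 1214, 4701),
    PlogCopy("ADABAS.PROD.DB1.PLOG.COPY.G7171V00", 64, 4702, 4748),
    PlogCopy("ADABAS.PROD.DB1.PLOG.COPY.G7172V00", 64, 4749, 4791),
]

for copy in copies_needed(COPIES, session=64, start_blk=4695):
    print(copy.dsn)
```

Starting at block 4695 (SYN1) selects all three generations, because G7170V00 still contains blocks 4695 to 4701. The gap check is there because a missing protection log copy is one of the problems reported from the D/R tests later in the talk.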
Recovery - Part 1 - W/O Nucleus
ADASAV RESTONL
//DDKARTE DD *
ADASAV RESTONL BUFNO=2,OVERWRITE
//REPORT EXEC ADAREP
//DDKARTE DD *
ADAREP NOFILE
//
Recovery - Part 2
Start the ADABAS nucleus with normal JCL (UTIONLY=YES)
Recovery - Part 2 - With Nucleus
ADARES REGENERATE
// DD DISP=SHR,BUFNO=30,
// DSN=ADABAS.PROD.DB1.PLOG.COPY.G7172V00
//DDKARTE DD *
ADARES REGENERATE PLOGDBID=215,PLOGNUM=64
ADARES FROMCP=SYN2,FROMBLK=4698
ADARES TOCP=EOD,TOBLK=00000   (not needed)
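During a disaster test the REGENERATE control cards are often typed under time pressure. As an illustration only, the Python sketch below builds the two cards shown on this slide from the collected values. It is not part of the original talk, it simply mirrors the card syntax printed on the slide, and the 72-column limit it checks is an assumption about the control statement format. `regenerate_cards` is a hypothetical helper.

```python
def regenerate_cards(plog_dbid: int, plognum: int, from_blk: int,
                     from_cp: str = "SYN2") -> list[str]:
    """Build ADARES REGENERATE cards in the form shown on the slide."""
    for label, value in (("PLOGDBID", plog_dbid), ("PLOGNUM", plognum),
                         ("FROMBLK", from_blk)):
        if value <= 0:
            raise ValueError(f"{label} must be a positive number")
    cards = [
        f"ADARES REGENERATE PLOGDBID={plog_dbid},PLOGNUM={plognum}",
        f"ADARES FROMCP={from_cp},FROMBLK={from_blk}",
    ]
    # Assumption: control statements must fit in columns 1-72.
    for card in cards:
        if len(card) > 72:
            raise ValueError(f"card longer than 72 columns: {card}")
    return cards


# Values from the slides: database 215, PLOG number 64, SYN2 at block 4698.
for card in regenerate_cards(plog_dbid=215, plognum=64, from_blk=4698):
    print(card)
```

Generating the cards from the values collected earlier avoids retyping block numbers by hand; the result should still be checked against the ADARES documentation for the installed version.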
Recovery - Part 3 - With Nucleus
Lock files to re-run utilities
See regenerate report
  ADADBS OPERCOM LOCKU=fnr
  or SYSAOS: A / I / L / F
  or modify command /F jobname,LOCKU=fnr
Unlock utility-only status for users
  ADADBS OPERCOM UTIONLY=NO
  or SYSAOS: A / I / L / U
  or modify command /F jobname,UTIONLY=NO
Recovery - Part 3 - With Nucleus
Re-run the utilities - if necessary
  ADALOD LOAD / UPDATE / DDISN
  ADAINV INVERT FILE=xxx,FIELD=xx
Unlock files
  ADADBS OPERCOM UNLOCKF=fnr
  or SYSAOS: A / I / L / F / N
  or modify command /F jobname,UNLOCKF=fnr
Delta Save Facility (DSF)
[Figure: Delta Save Facility, save side. Labels: nucleus with DSF=YES, buffer pool, Delta Log (DLOG) holding the RABNs of changed blocks, ASSO and DATA, ADASAV SAVE DELTA DSF=YES writing the delta save (DDSAVE1), dual protection log (DDPLOGR1, DDPLOGR2), ADARES PLCOPY DSF=YES producing the PLOG copy and extracted blocks in DSIM (DDDSIM)]
[Figure: Delta Save Facility, restore side. Labels: ADASAV RESTORE DSF=YES, full image save online/offline (DDREST1), delta saves (DDDELT1-8), online images / RABNs extracted from the PLOG in DSIM (DDDSIM), ASSO and DATA]
Delta Save QDUMP (CCA - now: TSI)
http://www.treehouse.com/qdump.shtml
[Figure: ADABAS MPM and utilities, QDUMP front end to ADAIOR, RABN-WRITE list in CSA (1234, 6789, 8431, 8987, 9123, 9675), QDUMP control program with internal buffer, read subtasks on ASSO and DATA, and write subtasks]
Disk Mirroring Benefits
Asynchronous disk mirroring can provide better physical protection by supporting extended physical distances.
No loss of committed transactions in synchronous storage (mirroring/RAID) on a CPU failure
Disk Mirroring Limitations
No protection from data corruption introduced by the hardware / software
Secondary site is not guaranteed to be transactionally consistent, because data is moved at the disk/track/sector or bit level (in the case of asynchronous mirroring).
Client applications must be restarted after a failure and need to be aware of the failure
Disk Mirroring Limitations
Synchronous mirroring and RAID devices can add overhead to application performance.
Redundant/specialized high availability hardware/software can be expensive and restricted to use for backup purposes only.
Secondary copy of data is not available for use – low hardware utilization.
Need to replicate everything on disk; no selectivity of data replication
Example For Disk Mirroring
[Figure: main platform (S/390 and UNIX hosts on an EMC 5700) connected by an OC-3 link, 12-15 miles, to the back-up / hot site (S/390 and UNIX hosts on an EMC 5700); SRDF remote mirrored, synchronized]
Dedicated line broadband speeds and prices
T-1 - 1.544 megabits per second (24 DS0 lines), ave. cost $400-$650/mo.
T-3 - 43.232 megabits per second (28 T1s), ave. cost $6,000-$16,000/mo.
OC-3 - 155 megabits per second (100 T1s), ave. cost $20,000-$45,000/mo.
OC-12 - 622 megabits per second (4 OC3s), no price
OC-48 - 2.5 gigabits per second (4 OC12s), no price
OC-192 - 9.6 gigabits per second (4 OC48s), no price
Source: http://www.infobahn.com/research-information.htm
Prices updated: 16 March 2004
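To put the line speeds above into perspective, the following Python sketch estimates how long a bulk copy would take at each nominal rate. It is not from the original talk. It uses the 61.4 GB total quoted later in this talk for the five production databases, treats a gigabyte as 10^9 bytes, and ignores protocol overhead unless the hypothetical `efficiency` parameter is lowered.

```python
# Nominal line rates in megabits per second, as listed on the slide.
LINE_MBPS = {
    "T-1": 1.544,
    "T-3": 43.232,
    "OC-3": 155.0,
    "OC-12": 622.0,
    "OC-48": 2500.0,
    "OC-192": 9600.0,
}


def transfer_hours(size_gb: float, line_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_gb (decimal gigabytes) over one line.

    efficiency is a hypothetical factor (0 < efficiency <= 1) for protocol
    overhead; 1.0 means the full nominal rate is usable.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_gb * 1e9 * 8
    seconds = bits / (line_mbps * 1e6 * efficiency)
    return seconds / 3600


for name, mbps in LINE_MBPS.items():
    print(f"{name:7s} {transfer_hours(61.4, mbps):8.2f} h")
```

Under these assumptions a T-1 would need roughly 88 hours for 61.4 GB, while an OC-3 would need a little under one hour, which helps explain why the disk mirroring example uses an OC-3 link.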
Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)
PPRC = 60 miles - PPRC-XD = continent
FlashCopy
ESS Shark
Also see TimeFinder from EMC
ESS Shark - IBM ESS DASD - HDS also support PPRC
External Back-up Systems
Fast Copy of Data
Snapshot
  No data movement
  A virtual copy by copying pointers
Copy Process
  Physical copy made asynchronously from the logical copy
  No impact on applications using the original data
Specific Hardware Required
  Software works only with the hardware
Work on Volume Level
  Some snapshot-only tools also work on dataset level
Snapshot & Physical Copy
IBM
  Hardware: Enterprise Storage Server
  Software: FlashCopy
  http://www.share.org/proceedings/sh98/data/S3087.PDF
EMC
  Hardware: Symmetrix Remote Data Facility
  Software: EMC TimeFinder
  http://www.emc.com/interactive_center/media/timefinder/tf_noRC.html
How It Works
[Figure: timeline of Source Data, Snapshot and Physical Backup. Source data: Read / update, then Suspend, Read only (update requests are queued) for a pre-defined time window, then Resume, Read / update. Snapshot: read-only snap. Source: SAG]
Replication Benefits
Warm standby systems can be configured over a Wide Area Network, providing protection from site failures.
Ability to more quickly swap to the standby system in the event of failure, as backup database is already on-line.
Data corruption is typically not replicated as transactions are logically reproduced rather than I/O blocks mirrored.
Replication Benefits
Automatic switch over for clients using a switching mechanism, no client restart needed.
Originating applications are minimally impacted as replication takes place asynchronously after commit of the originating transaction.
The warm standby database is available for read-only operations, allowing better utilization of backup systems.
Replication Benefits
Ability to resynchronize and easily switch back to primary system when it becomes available without loss of data.
Replication Limitations
Warm standby system will be out-of-date by transactions committed at the active database that have not been applied to the standby.
Protection is limited to components supporting Warm Standby (e.g. DBMS data sources may be protected but file systems may not be supported).
Entire Transaction Propagator
The Entire Transaction Propagator allows for asynchronous data replication.
Replicated data can be updated and synchronized with master data at user specified intervals.
OS/390 Recovery Procedures
Prepared by the Mainframe Recovery Team
Recovering
The OS/390 platform
The ABARS aggregates
The ADABAS databases
Mainframe Recovery Procedures
Initial Setup
  Verify Shipments from Recall
  Load OS/390 Documentation into BookManager
  IPL SunGard Floor System; Check Settings
  Reserve Cypress Tape Drives
  Restore SYS002 and OS7PC0
  Connect Times and SunGard Catalogs
  Import MVSCAT Catalog Entries
  Recover Remaining System Volumes
  Restore SYSR.DRP Libraries
  Go to Pre-IPL Procedures
Pre-IPL Procedures
  Restore ADABAS Production Volumes
  Restore Times PROCLIBs
  VARY OFF Work, Production and Page Packs
  Initialize Work Volumes
  Initialize Production Volumes
  Restore Page Volumes
  Restore HSM and TMC Datasets
  Restore Remaining System Catalogs
  Insert Third-Party Software Passwords
  Copy and Print SYSLOG
  IPL Times System
Post-IPL Procedures
  Check Clock and Reset, if Needed
  Change JES2 parm to P=NOREQ
  RSMSWORK
  RSMSPRM
  RHSMTREP
  RHSMDISM
  RHSMDELV
  Begin Application (ABARS) Restores
M. Makofske, 77263, Draft of January 24, 2002
OS/390 D/R Times (SUNGARD)
About 2400 tapes
Shipping time from storage to the mainframe ?
4 hours ahead for tape staging
OS/390 and ABARS aggregates
  5 hours planned, 7+ hours with problems
ADABAS databases
  Approx. 2-3 hours for tape restore and regenerate
  Next test Nov 1: approx. 45 minutes from disk pool
Experiences From D/R Tests
Problems IPLing on an unfamiliar CPU (6 hours duration)
  Initial setup (restore SYS.. Libraries)
  Pre-IPL procedures (restore Adabas, work, spool volumes, etc.)
  Post-IPL procedures (DFHSM in disaster mode, etc.)
Application restores
  Tape drive offline problems, Import MVSCAT typo errors, etc.
  Recovered wrong volumes, generation errors
  Initialize work volumes - conversion to SMS (DFSMShsm)
  TMC recovery problems caused BRM recovery problems, too
Experiences From D/R Tests
Sent wrong cartridges with system dates to storage
Fewer channels for tapes at our offsite (2 instead of 4) = double restore time
Experiences From D/R Tests
RESTONL abended with SB00: no PLOG restored, Recovery Aid flag was on in the saved database.
REGENERATE deleted a file and pointed out to repeat the ADALOD job, but the input dataset had not been saved
We did a full-volume restore (DFDSS), restored the database and forgot to format the dual protection logs.
Missed protection logs
BRM restored wrong aggregates
Missing full-volume restores - (Database 2)
Missing volumes in Work Storage Pool - (Database 3)
Experiences From D/R Tests
BRM: Back-up and Recovery Manager
ABARS: Aggregate Back-up and Recovery Support (ABARS = not: Air conditioning and refrigeration industry services)
Recovered (-1) Aggregates instead of (0) - (all Databases)
Recovered only SOME files on Aggregate (0) - (Database 1)
BRM/ABARS was not properly recovered (wrong version of BRM database)
Once those problems were resolved (several hours later), the ADABAS recovery ran smoothly.
5 Databases (61.4GB) restored and regenerated in 3.5 hours (tape/cart)
How Far is ‘Far Enough?’
(http://www.drj.com/articles/spr03/1602-02.html)
Alternate Facility
Offsite Storage Facility
Answer = 105 miles, according to the survey
Lessons Learned
(http://www.drj.com/articles/spr02/1502-07.html)
Distance is key
  Streets, bridges, tunnels, airports are closed
Tape recovery is not effective
All applications are critical
Inconsistent back-up is no back-up at all
People-dependent processes do not suffice
Two sites are not enough
People are irreplaceable; so is information
Lessons Learned
(http://www.drj.com/articles/spr02/1502-07.html)
Companies that relied on tape or on a third-party provider found in many cases that they had difficulty meeting their recovery time objectives
All disasters are possible
Helpful Links
Software AG - ADABAS Recovery
http://www.softwareag.com/adabas/news/vers_7.htm
http://servline24.softwareag.com/SecuredServices/
Helpful Links
FlashCopy
  http://www.share.org/proceedings/sh97/data/S9111.PDF
  http://www.storage.ibm.com/hardsoft/products/ess/pubs/f2ahs05.pdf
Shark (ESS)
  http://www.almaden.ibm.com/cs/shark/
  http://www.storage.ibm.com/hardsoft/disk/index.html
State of the Art Storage
  http://www.networkmagazine.com/article/NMG20010104S0002/2
EMC TimeFinder
  http://www.emc.com/products/software/timefinder.jsp
Entire Transaction Propagator (SAG)
  http://servline24.softwareag.com/SecuredServices/document/html/etp151/pdf/man.pdf
Thank you!
Questions?