ADABAS Disaster Recovery


Experiences with D/R Procedures of ADABAS Data on Mainframes

Natural Conference Boston, May 2004
Dieter W. Storr, [email protected]


Different Disaster Different Action

Unplanned downtime

- Machine outages
- Network outages
- Software failures

Disaster

- Site / data center loss
- Catastrophic failure

Leading Causes of Downtime

Source: DRJ Summer 2002, Volume 15, Number 3

- Power outage: 29%
- Storm damage: 11%
- Flood: 10%
- Terrorism / sabotage: 8%

Other Causes of Downtime

Fire

Earthquake

Computer Crime


LA Times Downtime

Flood Damage 21 April 2002:

- Water flooded the Orange County facility when a 14-inch pipe that supplies the fire-sprinkler system burst, leaving half the facility standing in more than a foot of muddy water
- Affected areas: editorial, ad ops, IT, HR; ADABAS was not affected

LA Times Downtime

Bomb Alarm 14 June 2002:

- A bomb was believed to have been left in the Bank of America branch that's set into the Times Building
- Security swept the building
- DBAs observed the system from home

LA Times Downtime

Bomb Alarm 29 July 2002:

- An intruder claimed to have a bomb and darted into the garage
- Security swept the building
- OP stopped CA7, so PLOGCOPY couldn't start automatically; two PLOGs filled up and ADABAS was locked
- DBAs later started the PLCOPY jobs manually

LA Times Downtime Power Outage - 29 August 2002 (3:43 P.M.)

- The city utility (DWP) had a power grid outage; flood water leaked into a DWP transformer
- There were actually 2 spikes/outages: the first started the UPS switchover, which was interrupted by the second, which took the UPS down

LA Times Downtime Power Outage (cont'd)

- The network was back in service after a short delay.
- Our Unix-based servers were restarted and checked. There was no evidence of damage to the Sybase Adaptive Server Enterprise (ASE, formerly Sybase SQL Server) servers.

LA Times Downtime Power Outage (cont'd)

- Mainframe recovery was delayed by corruption of the Hardware Management Console (HMC)
- OP did a power-on reset, which restored the HMC
- Operations IPLed, and Technical Support proceeded with system checkout procedures
- Although the Enterprise Storage Server (ESS) showed an error indicator, it stayed up and did not add to the outage
- IBM reset the error indicator without impact

LA Times Downtime Power Outage (cont'd)

- Started ADABAS servers manually: Parm Error 23, DIB block remained after an abnormal termination
- Started all servers with IGNDIB=YES
- 18:25: ADABAS is active, no ADAN58 message

LA Times Downtime

ADAN58 Message (ADA71: ADAN5A)

ADAN58 BUFFER-FLUSH START RECORD DETECTED DURING AUTORESTART.
       THE NUCLEUS WILL T E R M I N A T E AFTER AUTORESTART.
       IN CASE OF POWER FAILURE, THE DATABASE MIGHT BE INCONSISTENT
       BECAUSE OF PARTIALLY WRITTEN BLOCKS.
       O N L Y IN THIS CASE, REPAIR THE DATABASE BY RESTORE AND
       REGENERATE; OTHERWISE RESTART THE NUCLEUS.

ADAN5A: FILES MODIFIED DURING AUTORESTART: files

Power Failure During Buffer Flush

[Figure: blocks A to H in the buffer and on disk during an interrupted buffer flush. Legend: old block, updated block, partially updated block on disk.]

Nucleus Restart After Power Failure - IGNDIB=YES

ADA200 00230 User exit 2 active.
ADA201 00230 PLOG2 closed. ADAP3X2P submitted.
ADAN21 00230 PROTECTION-LOG PLOGR1 STARTED
ADAN02 00230 NUCLEUS-RUN WITH PROTECTION-LOG 00677
ADAL02 00230 2002-08-29 18:25:18 CLOGRS IS ACTIVE
ADAN03 00230 ADABAS COMING UP
ADAN5A 00230 FILES MODIFIED DURING AUTORESTART:
ADAN5A 00230 00038 00057 00069 00072 00073 00074
ADAN5A 00230 00075 00076 00104 00138 00139 00148
ADAN5A 00230 00195 00221 00243
ADAN19 00230 RUNNING WITH ASYNCHRONOUS BUFFERFLUSH
ADAN8Y 00230 FILE-LEVEL CACHING INITIALIZED
ADAN80 00230 ADABAS DYNAMIC CACHING ENVIRONMENT ESTABLISHED.
ADAN01 00230 A D A B A S V6.2.2 IS ACTIVE
ADAN01 00230 MODE = MULTI I S O L A T E D
ADAN01 00230 RUNNING WITHOUT RECOVERY-LOG
ADA800 00230 User exit 8 active.
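The ADAN5A lines in the restart log name the files that autorestart touched, which are the ones worth checking first after a power failure. A minimal Python sketch of collecting them from a saved copy of the log follows. It assumes the line shape seen in the sample above (message ID, database ID, then file numbers); the function name and the sample text are illustrative and not part of any ADABAS tooling.

```python
def files_modified_during_autorestart(log_text):
    """Return the sorted file numbers listed on ADAN5A lines of a nucleus log."""
    files = []
    for line in log_text.splitlines():
        parts = line.split()
        # Assumed shape: ADAN5A <dbid> <fnr> <fnr> ...
        if len(parts) < 3 or parts[0] != "ADAN5A":
            continue
        # The ADAN5A header line carries words, not numbers: keep digit tokens only.
        files.extend(int(tok) for tok in parts[2:] if tok.isdigit())
    return sorted(set(files))


# Excerpt of the restart log shown on the slide.
SAMPLE_LOG = """\
ADAN03 00230 ADABAS COMING UP
ADAN5A 00230 FILES MODIFIED DURING AUTORESTART:
ADAN5A 00230 00038 00057 00069 00072 00073 00074
ADAN5A 00230 00075 00076 00104 00138 00139 00148
ADAN5A 00230 00195 00221 00243
ADAN19 00230 RUNNING WITH ASYNCHRONOUS BUFFERFLUSH
"""

modified = files_modified_during_autorestart(SAMPLE_LOG)
print(len(modified), "files:", modified)
```

The resulting list could feed the batch and online checks that follow the restart.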

LA Times Downtime Power Outage (cont'd)

- Switched all PLOGs
- Checked batch and online
- There was no evidence of damage to any of the ADABAS components.

Other LA Times Disasters

- 1965: Watts riots
- 1971: Sylmar quake 6.5
- 1987: Whittier Narrows quake 5.9
- 1992: LA riots
- 1994: Northridge quake 6.7
- 6 Feb 1998: El Niño, flooding in B-1 computer room
- 15 April 1999: Power failure 'news editing'

ADABAS Recovery

Command Log (CLOG) Failure - I/O Error

- Restore or reallocate/format the CLOG
- ADABAS will come up through Autorestart normally
- No data loss if CLOG is not used

ADABAS Recovery

Protection Log (PLOG) Failure - I/O Error

- Restore or reallocate/format the PLOG
- Take a full back-up of the database
- ADABAS will come up through Autorestart normally
- Restart batch jobs
  - Restartable batch jobs = OK
  - Non-restartable batch jobs = check

ADABAS Recovery

TEMP and SORT Failure - I/O Error

- Restore or reallocate/format the TEMP/SORT dataset
- Different actions for the utilities
- See the ADABAS Utilities manuals

ADABAS Recovery

DSIM Failure - I/O Error

- Restore or reallocate/format a DSIM dataset
- Different actions for the utilities
- See the ADABAS Utilities manuals

ADABAS Recovery

Recovery Aid Dataset (RLOGM / RLOGR) Failure - I/O Error

- Restore or reallocate/format an RLOG dataset
- Prepare the RLOG dataset
  - ADARAI PREPARE RLOGSIZE / RLOGDEV….
- Different actions for the utilities
- See the ADABAS Utilities manuals
- Take a full back-up of the database
  - This will start the first generation of the RLOG dataset

ADABAS Recovery

ASSO/DATA Failure - I/O Error

- Copy PLOG twice - ADARES PLCOPY
- Restore or reallocate/format DATA dataset(s)
- Instead of reallocating/formatting and restoring all DATA volumes, system specialists can:
  - Reallocate and format the new volume
  - Restore the VTOC chain
  - Restore and regenerate only the files that were located on the failed volume
- Otherwise, . . .

ADABAS Recovery

ASSO/DATA Failure - I/O Error

- Restore entire database
    ADASAV RESTORE [OVERWRITE = for GCB]
    ADASAV RESTONL [OVERWRITE] include PLOG
- Start nucleus with UTIONLY=YES
- Regenerate updates from end of last save (SYN2)
    ADARES REGENERATE PLOGNUM=xxx
    ADARES FROMCP=SYN2,FROMBLK=xxx

ADABAS Recovery

ASSO/DATA Failure - I/O Error

- Utilities that may need to be rerun (see ADARES):
    ADALOD LOAD FILE=xxx
    ADALOD UPDATE FILE=xxx
    ADALOD UPDATE FILE=xxx,DDISN
    ADAINV INVERT FILE=xxx,FIELD=xx
- Lock files to rerun utilities
    ADADBS OPERCOM LOCKU=xx
- Unlock utility-only status
    ADADBS OPERCOM UTIONLY=NO

ADABAS Recovery

ASSO/DATA Failure - I/O Error

- Rerun the regenerate function for the relevant files
- Unlock the regenerated files
    ADADBS OPERCOM UNLOCKU=xx
- Don't repeat these steps if ADARES points out:
    ADALOD LOAD FILE=nn
    ADARES REGENERATE FILE=nn
    ADADBS REFRESH FILE=nn
- Nucleus is ready

ADABAS Recovery

WORK 1 Failure - I/O Error

- Restore or reallocate/format the WORK dataset
- Restore and regenerate the entire database to avoid inconsistencies (open transactions); see ASSO/DATA failure

ADABAS Recovery

WORK 2/3 Failure - I/O Error

- End the database normally (ADAEND) to avoid open transactions in part 1 of WORK
- Restore or reallocate/format the WORK dataset
- Restart the database normally
- If the database abends, restore and regenerate the entire database (see ASSO/DATA failure)

ADABAS Recovery

Failure in Data Storage Blocks

//DDSIIN DD DSN=SAVE.SIBA….
//       DD DSN=PLCOPY.LOG1…
//       DD DSN=PLCOPY.LOG2…
//DDCARD DD *
ADARES REPAIR DSRABN=xxx-yyy
ADARES FILE=n1,n2,n3

Failure in DSST

ADADCK DSCHECK FILE=xxx
ADADCK REPAIR

CALL SAG!!

ADABAS Recovery

Nucleus Ends With RC 77

- Not restartable
- No more space for Checkpoint File (CP)
  - Rename old WORK
  - Allocate/format new WORK with old space
  - Change high-used RABN and high-used ISN
  - Restart nucleus with new WORK and UTIONLY=YES
  - Nucleus is in "crippled mode" - no user has access
  - Expand the database
  - Stop the nucleus normally
  - Rename old WORK and restart the nucleus with old WORK (autorestart)

ADABAS Recovery

Nucleus Ends With RC 77

- Not restartable
- No more space for user files
  - Rename old WORK
  - Allocate/format new WORK with old space
  - Restart nucleus with new WORK and UTIONLY=YES
  - Nucleus is in "crippled mode" - no user access
  - Expand database
  - Stop nucleus normally
  - Rename old WORK and restart nucleus with old WORK (autorestart)

ADABAS Recovery

Nucleus Abends - Missed DE Values

Descriptor is marked in the FDT as DE; the value doesn't exist in ASSO, but it does in DATA.

Check:
    ADAICK ICHECK FILE=xxx[,NOOPEN]
    ADAVAL VALIDATE FILE=xxx,DESCRIPTOR=yy
Solution 1:
    ADAULD UNLOAD FILE=xxx,UTYPE=EXF
    ADALOD LOAD FILE=xxx,LWP=yyyyK
Solution 2:
    ADADBS RELEASE FILE=xxx,DESCRIPTOR=yy
    ADAINV INVERT FILE=xxx,FIELD=yy,LWP=...

CALL SAG!!

Back-up Possibilities

- ADASAV to tape / disk
  - Including Fast Dump Restore, DFDSS
- Delta Save Facility (DSF)
- Delta Save QDUMP (Legent)
- Disk mirroring (hardware level)
  - FlashCopy of Enterprise Storage Server (ESS)
  - Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)
- OC-3 links two EMC disc arrays
- Replication
- Stand-by systems
  - Restore and Regenerate
  - Entire Transaction Server

ADABAS Disaster Recovery

- How to back up
- Collect recovery data
- Restore w/o nucleus
- Start nucleus w/ UTIONLY=YES
- Regenerate w/ nucleus
- Switch UTIONLY=NO

ADABAS 6.2.2 Back-up at LA Times

[Figure: back-up timeline with time marks 21:00, 01:00, 02:00, 03:00, 8:00 - 11:00 and 12:00. Elements shown: ADAP1BKF (online SAVE), ADAP1PLC (PLOG switch, FEOFPL), weekly DFDSS full-volume back-up of ASSO / DATA / WORK / etc., ADAP1BKO (copy tapes), BRM/ABARS (PDS, GDGs etc., several jobs), pick-up by Recall.]

Production Database Back-ups

DB  Date  GB    Disk 3390 (3399)
1   4/03   4.9  < 2 min
1   8/03   4.9  < 2 min
2   4/03  30.0  < 35 min
2   8/03  36.7  < 45 min
3   4/03  11.6  < 15 min
3   8/03  17.1  < 22 min
4   4/03   9.7  < 15 min
4   8/03   9.9  < 15 min
5   4/03   5.2  < 5 min
5   8/03   7.3  < 7 min

Cartridge 3490 silo run times: 15 min, 150+ min, 224+ min, 110+ min, 90+ min, 28 min
Number of 3490 carts: 2, 42, 19, 9, 5

ADASAV SAVE BUFNO=2,TTSYN=60
Record format: VB, record length: 27994, block size: 27998, BUFNO=30

Back-up to SMS Disk Pool

- Run times are consistently at least 80% lower when writing to disk instead of cartridge
- Run times are consistently around 60% lower when copying from disk to cartridge (compared with cart to cart)
- DFSMShsm: automate your storage management tasks (SMS Production Storage Pool)

Back-up to Disk Pool

- No cartridge errors
- No cartridge drive errors
- No cartridges get accidentally ejected from the silo
- Smaller back-up window
- Smaller maintenance windows
- Less impact on application processes
- Greater confidence that the data you need will be there when you need it

IBM Magstar 3494/Virtual Tape Server

- Linear design
  - 1 - 18 frames
- Configuration flexibility
  - SCSI, FC, ESCON, FICON
  - 3590, 3490E, VTS
- High availability
  - Dual robotics
  - Dual library manager

More than 42 old 3490 carts fit on 1 new 3494 cart.
5 x 3390 volumes fit on one 3494 cart.
One 3494 cart can be read into the VTS disk cache (RAID-5) in 45 seconds.

Virtual Tape Concept

- Virtual tape drives
  - Appear as multiple 3490E tape drives
  - 3490E Media 1 and 2 support
  - Shared / partitioned like real tape drives
- Tape Volume Caching
  - All data access is to cache
  - Improves 'mount' performance
  - LRU cache management
- Volume Stacking
  - Fully utilizes physical cart capacity
  - Reduces physical cart requirement
  - Reduces footprint requirement

[Figure: virtual drives 1 to n (addresses 180, 181, ... 19F) access virtual volumes 1 to n in the Tape Volume Cache; logical volumes 1 to n are stacked on Magstar 3590 cartridges with 30/60 GB capacity (assumes 3:1 compression).]

Performance Tests

Input   Output  MM.SS   Storage
Adabas  Disk     42.63  526125 tracks 3390
Adabas  VTS      46.43  31 log. 3490 tapes
Disk    VTS      42.47  31 log. 3490 tapes
VTS     VTS      48.38  31 log. 3490 tapes
Disk    VTS      39.39  31 log. 3490 tapes
VTS     3590     47.86  1 phys. 3590 tape
Adabas  3490    216.27  51 phys. 3490 tapes
Adabas  VTS      52.47  39 log. 3490 tapes

Collecting Data For Recovery

Block Ranges SYN1 - SYN2 For ADASAV RESTORE

From ADASAV SAVE:
PROTECTION LOG PLOGNUM=64, SYN1=4695, SYN2=4698

From ADAREP:
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
EOD  00 ET  2002-09-23 23:30:03 64 4747 DUAL ADAPRREP
SYNS 53 ET  2002-09-23 23:30:25 64 4749 DUAL ADAP1REP
SYNV 28 UTI 2002-09-23 23:30:30 64 4750 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 23:30:31 64 4751 DUAL ADAP1PLC

Collecting Data For Recovery

Block Ranges SYN2 - End For ADARES REGENERATE

From ADAREP:
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
EOD  00 ET  2002-09-23 23:30:03 64 4747 DUAL ADAPRREP
SYNS 53 ET  2002-09-23 23:30:25 64 4749 DUAL ADAP1REP
SYNV 28 UTI 2002-09-23 23:30:30 64 4750 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 23:30:31 64 4751 DUAL ADAP1PLC
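Picking the SYN1 and SYN2 block numbers out of a long ADAREP checkpoint list by eye is error-prone under D/R pressure. The Python sketch below pulls them out for the back-up job. It assumes each checkpoint line has the nine whitespace-separated columns seen in the sample (name, type, user type, date, time, PLOG number, block number, DUAL, job name); that column reading, and every function and class name here, is an illustrative assumption rather than a documented ADAREP format.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    name: str
    date: str
    time: str
    plognum: int
    block: int
    job: str


def parse_checkpoints(report):
    """Parse checkpoint lines; lines without exactly nine columns are skipped."""
    rows = []
    for line in report.splitlines():
        f = line.split()
        if len(f) != 9:
            continue
        rows.append(Checkpoint(f[0], f[3], f[4], int(f[5]), int(f[6]), f[8]))
    return rows


def save_block_range(rows, job):
    """Return (plognum, syn1_block, syn2_block) for the last SYN1/SYN2 pair of job."""
    syn1 = syn2 = None
    for cp in rows:
        if cp.job != job:
            continue
        if cp.name == "SYN1":
            syn1, syn2 = cp, None      # a new save started: forget the older pair
        elif cp.name == "SYN2" and syn1 is not None:
            syn2 = cp
    if syn1 is None or syn2 is None:
        raise ValueError(f"no complete SYN1/SYN2 pair found for job {job}")
    return syn1.plognum, syn1.block, syn2.block


# Excerpt of the checkpoint list shown on the slide.
REPORT = """\
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
"""

plognum, syn1_blk, syn2_blk = save_block_range(parse_checkpoints(REPORT), "ADAP1BKF")
print(f"PLOGNUM={plognum}, SYN1={syn1_blk}, SYN2={syn2_blk}")
```

With the slide's data this yields the same PLOGNUM=64, SYN1=4695 and SYN2=4698 that ADASAV SAVE reported, so the two sources can be cross-checked.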

Collecting Data For Recovery

Dataset Name From Back-up Job (GDG) For ADASAV RESTORE

ADABAS.PRODOFFD.DB1.BACKUP.FULL.G0842V00 CATALOGED

Collecting Data For Recovery

Dataset Names From PLOG Copy Jobs (GDG)
Matching block numbers 4695 - End
For ADASAV RESTORE and ADARES REGENERATE

DDSIAUS1 OUTPUT VOLUME=WRK015, SESSION NR=64
FROMBLK= 1214, FROMTIME=2002-09-23 03:30:24
TOBLK= 4701, TOTIME= 2002-09-23 21:01:42
ADABAS.PROD.DB1.PLOG.COPY.G7170V00

DDSIAUS1 OUTPUT VOLUME=WRK015, SESSION NR=64
FROMBLK= 4702, FROMTIME=2002-09-23 21:02:08
TOBLK= 4748, TOTIME= 2002-09-23 23:30:03
ADABAS.PROD.DB1.PLOG.COPY.G7171V00

DDSIAUS1 OUTPUT VOLUME=WRK004, SESSION NR=64
FROMBLK= 4749, FROMTIME=2002-09-23 23:30:25
TOBLK= 4791, TOTIME= 2002-09-24 03:30:33
ADABAS.PROD.DB1.PLOG.COPY.G7172V00
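Matching PLOG copy generations to a starting block number can be sketched in a few lines of Python. The example assumes the FROMBLK/TOBLK values have already been copied out of the PLCOPY job output into a list, and that consecutive copies of one session carry consecutive block numbers, as they do in the sample above. The `PlogCopy` type and `copies_needed` function are hypothetical helpers, not part of ADABAS.

```python
from typing import NamedTuple


class PlogCopy(NamedTuple):
    dsn: str
    session: int
    from_blk: int
    to_blk: int


def copies_needed(copies, session, start_blk):
    """Return the dataset names that cover start_blk through the last block copied."""
    selected = sorted(
        (c for c in copies if c.session == session and c.to_blk >= start_blk),
        key=lambda c: c.from_blk,
    )
    if not selected or selected[0].from_blk > start_blk:
        raise ValueError(f"no PLOG copy contains block {start_blk}")
    # A gap between generations means a protection log copy is missing.
    for prev, cur in zip(selected, selected[1:]):
        if cur.from_blk != prev.to_blk + 1:
            raise ValueError(f"gap between {prev.dsn} and {cur.dsn}")
    return [c.dsn for c in selected]


# Values taken from the PLOG copy job output on the slide.
COPIES = [
    PlogCopy("ADABAS.PROD.DB1.PLOG.COPY.G7170V00", 64, 1214, 4701),
    PlogCopy("ADABAS.PROD.DB1.PLOG.COPY.G7171V00", 64, 4702, 4748),
    PlogCopy("ADABAS.PROD.DB1.PLOG.COPY.G7172V00", 64, 4749, 4791),
]

needed = copies_needed(COPIES, session=64, start_blk=4695)
print("\n".join(needed))
```

The gap check is the useful part: a missed protection log is easier to handle when it is detected before the regenerate job is submitted.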

Recovery - Part 1 - W/O Nucleus

ADASAV RESTONL

//RESTONL EXEC ADASAVRD
//DDREST1 DD DISP=SHR,BUFNO=30,
//           DSN=ADABAS.PRODOFFD.DB1.BACKUP.FULL.G0842V00
//DDPLOG  DD DISP=SHR,BUFNO=30,
//           DSN=ADABAS.PROD.DB1.PLOG.COPY.G7170V00
//DDKARTE DD *
ADASAV RESTONL BUFNO=2,OVERWRITE
//REPORT  EXEC ADAREP
//DDKARTE DD *
ADAREP NOFILE
//

Recovery - Part 2

Start the ADABAS nucleus with normal JCL (UTIONLY=YES)

ADAN21 00215 PROTECTION-LOG PLOGR1 STARTED
ADAN02 00215 NUCLEUS-RUN WITH PROTECTION-LOG 00064
ADAL02 00215 2002-09-21 21:20:29 CLOGRS IS ACTIVE
ADAN03 00215 ADABAS COMING UP
ADAN19 00215 RUNNING WITH ASYNCHRONOUS BUFFERFLUSH
ADAN8Y 00215 FILE-LEVEL CACHING INITIALIZED
ADAN80 00215 ADABAS DYNAMIC CACHING ENVIRONMENT ESTABLISHED.
ADAN01 00215 A D A B A S V6.2.2 IS ACTIVE
ADAN01 00215 MODE = MULTI I S O L A T E D
ADAN01 00215 RUNNING WITHOUT RECOVERY-LOG
ADA800 00215 User exit 8 active.
ADA801 00215 ADAP1PLC submitted.

Recovery - Part 2 - With Nucleus

ADARES REGENERATE

//REGEN   EXEC ADARES
//DDSIIN  DD DISP=SHR,BUFNO=30,
//           DSN=ADABAS.PROD.DB1.PLOG.COPY.G7170V00
//        DD DISP=SHR,BUFNO=30,
//           DSN=ADABAS.PROD.DB1.PLOG.COPY.G7171V00
//        DD DISP=SHR,BUFNO=30,
//           DSN=ADABAS.PROD.DB1.PLOG.COPY.G7172V00
//DDKARTE DD *
ADARES REGENERATE PLOGDBID=215,PLOGNUM=64
ADARES FROMCP=SYN2,FROMBLK=4698
ADARES TOCP=EOD,TOBLK=00000      (not needed)
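Typing the regenerate step by hand during a D/R test invites the kind of typo errors reported later in this deck. As a rough sketch, the step can be rendered from the collected values with a small Python helper. The JCL layout simply mirrors the slide (the ADARES procedure name, the DDSIIN and DDKARTE DD names and BUFNO=30 are this site's conventions as shown there); `regenerate_step` is a hypothetical helper, and the optional TOCP/TOBLK card is left out.

```python
def regenerate_step(dbid, plognum, from_blk, plog_copies):
    """Render the ADARES REGENERATE step in the layout used on the slide."""
    lines = ["//REGEN   EXEC ADARES"]
    for i, dsn in enumerate(plog_copies):
        # Only the first DD of a concatenation carries the DD name.
        ddname = "DDSIIN" if i == 0 else ""
        lines.append(f"//{ddname:<7} DD DISP=SHR,BUFNO=30,")
        lines.append(f"//           DSN={dsn}")
    lines.append("//DDKARTE DD *")
    lines.append(f"ADARES REGENERATE PLOGDBID={dbid},PLOGNUM={plognum}")
    lines.append(f"ADARES FROMCP=SYN2,FROMBLK={from_blk}")
    return "\n".join(lines)


step = regenerate_step(
    dbid=215,
    plognum=64,
    from_blk=4698,
    plog_copies=[
        "ADABAS.PROD.DB1.PLOG.COPY.G7170V00",
        "ADABAS.PROD.DB1.PLOG.COPY.G7171V00",
        "ADABAS.PROD.DB1.PLOG.COPY.G7172V00",
    ],
)
print(step)
```

The generated text should still be reviewed before submission; the sketch only removes retyping of dataset names and block numbers.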

Recovery - Part 3 - With Nucleus

Lock files to re-run utilities (see regenerate report)

- ADADBS OPERCOM LOCKU=fnr
- or SYSAOS: A / I / L / F
- or modify command /F jobname,LOCKU=fnr

Unlock utility-only status for users

- ADADBS OPERCOM UTIONLY=NO
- or SYSAOS: A / I / L / U
- or modify command /F jobname,UTIONLY=NO

Recovery - Part 3 - With Nucleus

Re-run the utilities - if necessary

- ADALOD LOAD / UPDATE / DDISN
- ADAINV INVERT FILE=xxx,FIELD=xx

Unlock files

- ADADBS OPERCOM UNLOCKF=fnr
- or SYSAOS: A / I / L / F / N
- or modify command /F jobname,UNLOCKF=fnr

Delta Save Facility (DSF)

[Figure: Delta Save, save side. With DSF=YES the nucleus logs the RABNs of changed ASSO and DATA blocks from the buffer pool in the Delta Log (DLOG). ADASAV SAVE DELTA with DSF=YES writes the changed blocks to the delta save (DDSAVE1). ADARES PLCOPY with DSF=YES copies the dual protection log (DDPLOGR1, DDPLOGR2) to the PLOG copy and the extracted blocks to DSIM (DDDSIM).]

[Figure: Delta Save, restore side. ADASAV RESTORE with DSF=YES rebuilds ASSO and DATA from the full image save, online or offline (DDREST1), the delta saves (DDDELT1-8) and the online images extracted from the PLOG in DSIM (DDDSIM).]

Delta Save QDUMP (CCA - now: TSI)

http://www.treehouse.com/qdump.shtml

[Figure: ADABAS MPM and utilities write through a QDUMP front end to ADAIOR; the written RABNs (for example 1234, 6789, 8431, 8987, 9123, 9675) are recorded in CSA. The QDUMP control program uses read subtasks, an internal buffer and write subtasks to copy ASSO and DATA.]

Disk Mirroring Benefits

- Asynchronous disk mirroring can provide better physical protection by supporting extended physical distances.
- No loss of committed transactions in synchronous storage (mirroring/RAID) on a CPU failure

Disk Mirroring Limitations

- No protection from data corruption introduced by the hardware / software
- Secondary site is not guaranteed to be transactionally consistent, because data is moved at the disk/track/sector or bit level (in the case of asynchronous mirroring).
- Client applications must be restarted after a failure and need to be aware of the failure

Disk Mirroring Limitations

- Synchronous mirroring and RAID devices can add overhead to application performance.
- Redundant/specialized high availability hardware/software can be expensive and restricted to use for backup purposes only.
- Secondary copy of data is not available for use - low hardware utilization.
- Need to replicate everything on disk, no selectivity of data replication

Example For Disk Mirroring

[Figure: the main platform (S/390 and UNIX on an EMC 5700) is remote mirrored and synchronized via SRDF over an OC-3 link, 12-15 miles, to the back-up / hot site (S/390 and UNIX on an EMC 5700).]

Dedicated line broadband speeds and prices

- T-1: 1.544 megabits per second (24 DS0 lines), average cost $400-$650/month
- T-3: 43.232 megabits per second (28 T1s), average cost $6,000-$16,000/month
- OC-3: 155 megabits per second (100 T1s), average cost $20,000-$45,000/month
- OC-12: 622 megabits per second (4 OC3s), no price
- OC-48: 2.5 gigabits per second (4 OC12s), no price
- OC-192: 9.6 gigabits per second (4 OC48s), no price

Source: http://www.infobahn.com/research-information.htm (prices updated 16 March 2004)
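To put these line speeds in perspective, the Python sketch below estimates how long a one-time copy of the production databases would take at each rate. It uses the 61.4 GB total this deck reports for the five databases and the rates listed above. Decimal gigabytes, full use of the nominal line rate and zero protocol overhead are simplifying assumptions, so real transfers would take longer.

```python
# Nominal rates in megabits per second, as listed on the slide.
LINE_RATES_MBPS = {
    "T-1": 1.544,
    "T-3": 43.232,
    "OC-3": 155.0,
    "OC-12": 622.0,
    "OC-48": 2500.0,
    "OC-192": 9600.0,
}


def transfer_hours(size_gb, rate_mbps):
    """Hours needed to move size_gb (decimal GB) at rate_mbps with no overhead."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (rate_mbps * 1e6)
    return seconds / 3600


for name, rate in LINE_RATES_MBPS.items():
    print(f"{name:>6}: {transfer_hours(61.4, rate):8.2f} h")
```

Under these assumptions a T-1 needs several days for the full copy while an OC-3 needs under an hour, which is consistent with the OC-3 link chosen for mirroring between the two EMC arrays.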

Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)

- PPRC = 60 miles, PPRC-XD = continent
- FlashCopy on ESS (Shark); also see TimeFinder from EMC
- ESS Shark = IBM ESS DASD; HDS also supports PPRC

External Back-up Systems

Fast Copy of Data

- Snapshot
  - No data movement
  - A virtual copy by copying pointers
- Copy process
  - Physical copy made asynchronously from the logical copy
  - No impact on applications using the original data
- Specific hardware required
  - Software works only with the hardware
- Work on volume level
  - Some snapshot-only tools also work on dataset level

Snapshot & Physical Copy

IBM

- Hardware: Enterprise Storage Server
- Software: FlashCopy
  http://www.share.org/proceedings/sh98/data/S3087.PDF

EMC

- Hardware: Symmetrix Remote Data Facility
- Software: EMC TimeFinder
  http://www.emc.com/interactive_center/media/timefinder/tf_noRC.html

How It Works

[Figure (source: SAG): the source data is open for read / update; between suspend and resume, a pre-defined time window, it is read only and update requests are queued while the snapshot is taken; after resume it returns to read / update, and the physical backup is made from the read-only snapshot.]

Replication Benefits

- Warm standby systems can be configured over a Wide Area Network, providing protection from site failures.
- Ability to swap to the standby system more quickly in the event of failure, as the backup database is already on-line.
- Data corruption is typically not replicated, as transactions are logically reproduced rather than I/O blocks mirrored.


Replication Benefits

- Automatic switch-over for clients using a switching mechanism, no client restart needed.
- Originating applications are minimally impacted, as replication takes place asynchronously after commit of the originating transaction.
- The warm standby database is available for read-only operations, allowing better utilization of backup systems.

Replication Benefits

- Ability to resynchronize and easily switch back to the primary system when it becomes available, without loss of data.

Replication Limitations

- Warm standby system will be out-of-date by transactions committed at the active database that have not been applied to the standby.
- Protection is limited to components supporting Warm Standby (e.g. DBMS data sources may be protected but file systems may not be supported).

Entire Transaction Propagator

- The Entire Transaction Propagator allows for asynchronous data replication.
- Replicated data can be updated and synchronized with master data at user-specified intervals.

OS/390 Recovery Procedures

Prepared by the Mainframe Recovery Team

Recovering:

- The OS/390 platform
- The ABARS aggregates
- The ADABAS databases

Mainframe Recovery Procedures

Initial Setup

- Verify Shipments from Recall
- Load OS/390 Documentation into BookManager
- IPL SunGard Floor System; Check Settings
- Reserve Cypress Tape Drives
- Restore SYS002 and OS7PC0
- Connect Times and SunGard Catalogs
- Import MVSCAT Catalog Entries
- Recover Remaining System Volumes
- Restore SYSR.DRP Libraries
- Go to Pre-IPL Procedures

Pre-IPL Procedures

- Restore ADABAS Production Volumes
- Restore Times PROCLIBs
- VARY OFF Work, Production and Page Packs
- Initialize Work Volumes
- Initialize Production Volumes
- Restore Page Volumes
- Restore HSM and TMC Datasets
- Restore Remaining System Catalogs
- Insert Third-Party Software Passwords
- Copy and Print SYSLOG
- IPL Times System

Post-IPL Procedures

- Check Clock and Reset, if Needed
- Change JES2 parm to P=NOREQ
- RSMSWORK
- RSMSPRM
- RHSMTREP
- RHSMDISM
- RHSMDELV
- Begin Application (ABARS) Restores

M. Makofske, 77263, Draft of January 24, 2002

OS/390 D/R Times (SUNGARD)

- About 2400 tapes
- Shipping time from storage to the mainframe?
- 4 hours ahead for tape staging
- OS/390 and ABARS aggregates
  - 5 hours planned, 7+ hours with problems
- ADABAS databases
  - Approx. 2-3 hours for tape restore and regenerate
  - Next test Nov 1: approx. 45 minutes from disk pool

Experiences From D/R Tests

Problems IPLing on an unfamiliar CPU (6 hours duration)

- Initial setup (restore SYS.. libraries)
- Pre-IPL procedures (restore Adabas, work, spool volumes, etc.)
- Post-IPL procedures (DFHSM in disaster mode, etc.)

Application restores

- Tape drive offline problems, Import MVSCAT typo errors, etc.
- Recovered wrong volumes, generation errors
- Initialize work volumes - conversion to SMS (DFSMShsm)
- TMC recovery problems caused BRM recovery problems, too

Experiences From D/R Tests

- Sent wrong cartridges with system dates to storage
- Fewer channels for tapes at our offsite (2 instead of 4) = double restore time

Experiences From D/R Tests

- RESTONL abended with SB00, no PLOG restored; the Recovery Aid flag was on in the saved database.
- REGENERATE deleted a file and pointed out that the ADALOD job had to be repeated, but the input dataset had not been saved.
- We did a full volume restore (DFDSS), restored the database and forgot to format the dual protection logs.
- Missed protection logs
- BRM restored wrong aggregates
  - Missing full-volume restores - (Database 2)
  - Missing volumes in Work Storage Pool - (Database 3)

Experiences From D/R Tests

BRM: Back-up and Recovery Manager
ABARS: Aggregate Back-up and Recovery Support (not: Air conditioning and refrigeration industry services)

- Recovered (-1) aggregates instead of (0) - (all databases)
- Recovered only SOME files on aggregate (0) - (Database 1)
- BRM/ABARS was not properly recovered (wrong version of BRM database)

Once those problems were resolved (several hours later), the ADABAS recovery ran smoothly.

5 databases (61.4 GB) restored and regenerated in 3.5 hours (tape/cart)

How Far is 'Far Enough?'

(http://www.drj.com/articles/spr03/1602-02.html)

- Alternate Facility
- Offsite Storage Facility

Answer according to the survey: 105 miles

Lessons Learned

(http://www.drj.com/articles/spr02/1502-07.html)

- Distance is key
- Streets, bridges, tunnels, airports are closed
- Tape recovery is not effective
- All applications are critical
- Inconsistent back-up is no back-up at all
- People-dependent processes do not suffice
- Two sites are not enough
- People are irreplaceable; so is information

Lessons Learned

(http://www.drj.com/articles/spr02/1502-07.html)

- Companies that relied on tape or on a third-party provider found in many cases that they had difficulty meeting their recovery time objectives
- All disasters are possible

Helpful Links

- Software AG - ADABAS Recovery
  http://www.softwareag.com/adabas/news/vers_7.htm
  http://servline24.softwareag.com/SecuredServices/
- ADABAS Restart and Recovery (Operations Manual)
  http://servline24.softwareag.com/SecuredServices/
- University of Arkansas - D/R Plan
  http://www.uark.edu/staff/drp/
- Disaster Recovery Journal
  http://www.drj.com

Helpful Links

- FlashCopy
  http://www.share.org/proceedings/sh97/data/S9111.PDF
  http://www.storage.ibm.com/hardsoft/products/ess/pubs/f2ahs05.pdf
- Shark (ESS)
  http://www.almaden.ibm.com/cs/shark/
  http://www.storage.ibm.com/hardsoft/disk/index.html
- State of the Art Storage
  http://www.networkmagazine.com/article/NMG20010104S0002/2
- EMC TimeFinder
  http://www.emc.com/products/software/timefinder.jsp
- Entire Transaction Propagator (SAG)
  http://servline24.softwareag.com/SecuredServices/document/html/etp151/pdf/man.pdf

Thank you!

Questions?