Advanced C as it Relates to OnLine

Archiving & Restoring
John F. Miller III
7/17/2015
TOC
• Term & History
• Disaster Recovery Planning
• Backup & Restore Procedures
• Architecture (XPS differences)
• The grab bag
Terminology
• Serial Backup
– Archives the entire system at a single point in time
using only one data stream
• Parallel Backup
– Archives the requested dbspaces one at a time to N
data streams
• External Backup
– Allows a third-party application to back up the
database server while maintaining logical consistency
Terminology
• Cold Restore
– Restoring the server when the database engine
is offline
• Warm Restore
– Restoring dbspaces while the
database engine is online
• Mixed Restore
– A cold restore of a set of dbspaces followed by a
warm restore of other dbspaces
Terminology
• Imported Restore
– Transferring an archive taken on one computer
and restoring it on a second computer
• Point-in-Time Restore
– Restoring the entire system to a single point in
time
• Restartable Restore
– Allows the DBA to pick up the restore from the
failure point
Early Backup and Restore
History
• 1.X Turbo
– Only Quiescent mode archives
• 4.X named OnLine for advanced archiving
technology
• 5.X same core technology
– limitations revealed (scalability & extensibility)
DSA Backup and Restore
History
• 6.0 new client/server model developed
• 7.1 & 7.20 same core technology
• 7.21 new client (onbar)
• 7.3 server API re-write
• 9.2 onbar usability features added
Pre-DSA Archive
Bad Grammar Archive
• Archive Checkpoint (get timestamp)
• Free extents recorded
• Reserve pages saved
• Chunks backed up by ascending chunk number
• Pages modified during archive are placed in physical
log
• tbtape routinely scans physical log for unarchived
before-images
• Pages placed directly to tape
Pre-DSA Restore
• Begins with OnLine off-line
• Reads the configuration file and matches its
parameters against those on the archive tape
• Zeroes out the logs (physical & logical)
• Validate size of all chunks
• Read tape, copying pages based on their
address directly to disk
DSA Archive Architecture
Major Differences
• True client-server architecture
• Archived pages logically grouped by
dbspaces
• Granularity of creations
• Granularity of restores
• Warm restores
• Physical log pages kept in temp tables
Server Algorithm Changes
Good Grammar Archive
• List is made of all pages that should be archived
– Cost vs Benefit
• Before images are queued by the modifier
• A new thread is responsible for the before image
handling
Disaster Recovery
• Goals
• Planning
What is a Successful
Recovery?
• “Successful” recovery
is defined by your
business needs
Goals For Recovery
• Determine acceptable recovery time
– How long can your business function without
the data?
– How long can your production system be down
during a restore?
Determine Acceptable
Data Loss
Type
Time
Quantity
Distribution
Recovery Strategy
[Figure: recovery strategy cycle]
• Plan Recovery Goals
• Select Tools
• Implement the Strategy
• Analyze/Test the Strategy
• Tune the Strategy
Data Layout
• Poor data layout can hurt BAR performance
• Isolating the different types of data can
facilitate restore priority
• Example
– 8 dbspaces each with 2 chunks, but one dbspace
with 68 chunks
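Why the skewed layout hurts can be shown with a rough estimate. The sketch below is only an illustration under stated assumptions: each dbspace is treated as one indivisible backup job, chunk count stands in for backup time, the example is read as eight 2-chunk dbspaces plus one 68-chunk dbspace, and the function name and the "balanced" layout are hypothetical.

```python
import heapq

def parallel_backup_makespan(dbspace_sizes, streams):
    """Estimate elapsed time when each dbspace is one indivisible job."""
    # Longest-job-first greedy: hand the next-largest dbspace to the
    # stream that becomes free earliest.
    finish_times = [0] * streams
    heapq.heapify(finish_times)
    for size in sorted(dbspace_sizes, reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + size)
    return max(finish_times)

skewed = [2] * 8 + [68]     # eight 2-chunk dbspaces plus one 68-chunk dbspace
balanced = [10] * 8 + [4]   # hypothetical layout with the same 84 chunks

print(parallel_backup_makespan(skewed, 4))
print(parallel_backup_makespan(balanced, 4))
```

With four streams the skewed layout can never finish sooner than its single 68-chunk dbspace, no matter how many streams are added.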
Data Layout Examples
• Keep important, frequently modified data in its own
dbspaces
– important data such as orders should go in its own
dbspace, e.g. dbspace_orders
– a dbspace containing zipcodes and other lightly
modified data can be backed up less
frequently
Right, Fast or Cheap?
Choose Two!
Select Tools
Backup Utilities
• ontape
• ON-Bar
• External Backup/Restore
Load/Unload
• High Performance Loader (HPL)
• dbexport/dbimport
• dbschema
• SQL load/unload
• onload/onunload
• dbload
• Customer ESQL programs
Fault Tolerance Mechanisms
• Mirroring
• High Availability Data Replication (HDR)
• Enterprise Data Replication (DR)
Ontape Backup Features
• Backup at the Server level
• Support for incremental backups
• Manual or continuous logical log backup
• Restore entire system or single dbspace
• Backup is self describing
On-Bar Backup Features
• Parallel backup and restore
• System and dbspace level backup and restore
• Support for incremental backups
• Manual or automatic backup of logical logs
• Instance point-in-time recovery
• Open interface for communication with
storage managers (XBSA)
External Backup Features
• EBR allows administrators to make a
consistent copy of their dbspaces using
external tools
• Used with many 3rd party backup products
• Allows for both cold and warm restores
EBR - Examples
• Planned uses:
– File system snapshots
– Breaking of mirrors
– Third party “raw” backup
• Basic Steps
– Block coserver(s) at checkpoint
– Backup dbspaces using third party tools
– Unblock coserver(s)
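The basic steps above can be sketched as a small wrapper whose main job is to guarantee that the unblock step runs even when the third-party copy fails. This is a minimal sketch, not a supported procedure: the three callables are placeholders supplied by the caller, and no specific block or unblock command is assumed here.

```python
from contextlib import contextmanager

@contextmanager
def blocked_server(block, unblock):
    """Hold the server blocked for the duration of the with-body."""
    block()
    try:
        yield
    finally:
        unblock()  # always release the block, even if the copy step raised

def external_backup(block, copy_dbspaces, unblock):
    # block, copy_dbspaces and unblock are caller-supplied callables
    # (hypothetical hooks for whatever commands the site actually uses).
    with blocked_server(block, unblock):
        copy_dbspaces()

# Usage with stand-in steps that only record the order of events.
steps = []
external_backup(lambda: steps.append("block"),
                lambda: steps.append("copy"),
                lambda: steps.append("unblock"))
print(steps)
```

Keeping the unblock in a `finally` clause reflects the concern that a server left blocked after a failed copy stays unavailable for updates.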
Restoring
• Logical Logs required
• Restore looks hung, nothing's happening
• Handling unanticipated problems
Logical Logs Required
for a Restore
• Cold Parallel Restore
– Starting log is the log that contains the beginning of
the oldest transaction that was active when the first
archive checkpoint occurred
– Ending log is at least the logical log that contains the last
archive checkpoint
• Cold Whole System (Non-Parallel)
– No logical logs required
– Logs included with archive
Logical Logs Required
for a Restore
• Warm Restore
– Starting log is the log that contains the beginning of
the oldest transaction that was active when the first
archive checkpoint occurred
– All logs to the current point in time
• If you are using DR then you must include
the replay point
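The rules for which logical logs a restore needs can be written down as a small function. This is an illustrative sketch only: the function name, the restore-kind strings, and the idea of passing log unique ids as plain integers are assumptions, and the DR replay point is not modeled.

```python
def logs_for_restore(kind, start_log, last_checkpoint_log, current_log):
    """Return (required, optional) lists of logical log ids.

    start_log: log holding the begin of the oldest transaction that was
               active at the first archive checkpoint.
    last_checkpoint_log: log holding the last archive checkpoint.
    current_log: newest log that has been backed up.
    """
    if kind == "cold_whole_system":
        # Logs are included with the archive, so none are required.
        return [], []
    if kind == "cold_parallel":
        required = list(range(start_log, last_checkpoint_log + 1))
        optional = list(range(last_checkpoint_log + 1, current_log + 1))
        return required, optional
    if kind == "warm":
        # A warm restore must roll forward to the current point in time.
        return list(range(start_log, current_log + 1)), []
    raise ValueError(f"unknown restore kind: {kind}")

print(logs_for_restore("cold_parallel", 10, 12, 13))
print(logs_for_restore("warm", 11, 12, 13))
```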
Example of Logical Logs
Required for a Restore
[Figure: timeline of Log 10 through Log 13 with begin-work (B) markers, the Oldest Begin Work, and the Archive Checkpoint]
Logs Required
• Cold restore all: Logs 10-12, Optional 13
• Warm restore: Logs 11->, No Optional Logs
Restartable vs.
Suspended Restore
• Restartable Restore
– When the database engine prematurely shuts
down the engine may be restarted in recovery
mode
• Suspended Restore
– When the archive client receives an error which
is restartable and the database engine does not
shut down
Restartable Restore
• Turned OFF by default
• What can restart when?
– Whole system
– Partial Restore
– Logical Recovery from a cold restore
• Only available with On-BAR
• onbar -RESTART
Architecture
• Overview
• Archive Clients
• Moving Data
– IDS
– XPS
• Server Threads
• XPS Architecture
What Pages are Sent to
the Archive
• If a page’s timestamp is older than maxstamp
and newer than minstamp, it is put to tape
• If a page’s timestamp is newer than the current stamp but
older than minstamp, it is put to tape, and
its timestamp is updated to maxstamp-1
• Pages newer than maxstamp but older than the
current stamp are considered to be modified after
the archive started, and are ignored.
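The three rules above read more easily as modular arithmetic on a wrapping timestamp. The sketch below is an interpretation, not the server's code: it assumes a 32-bit timestamp space (so that minstamp is maxstamp minus half the wheel), the handling of pages exactly equal to maxstamp or minstamp is a guess, and the function names are hypothetical.

```python
WRAP = 2 ** 32       # assumed size of the wrapping timestamp space
HALF = WRAP // 2     # minstamp is taken to be maxstamp - HALF

def classify_page(page_ts, max_stamp, current_stamp):
    """Decide what the archive does with a page, given its timestamp."""
    behind = (max_stamp - page_ts) % WRAP   # distance back from maxstamp
    if 0 < behind < HALF:
        return "archive"                    # between minstamp and maxstamp
    ahead = (page_ts - max_stamp) % WRAP    # distance forward from maxstamp
    written_since_start = (current_stamp - max_stamp) % WRAP
    if ahead <= written_since_start:
        return "skip"                       # modified after the archive started
    return "archive_and_refresh"            # very old page: stamp must be reset

def refreshed_stamp(max_stamp):
    """Timestamp given to a very old page when it is archived."""
    return (max_stamp - 1) % WRAP

very_old = (1000 - HALF - 5) % WRAP
print(classify_page(500, 1000, 1500))
print(classify_page(1200, 1000, 1500))
print(classify_page(very_old, 1000, 1500))
```

Refreshing the stamp to maxstamp-1 moves a very old page back into the "older than maxstamp, newer than minstamp" window, so it is not mistaken for a newer page once the wheel wraps.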
Understanding
Timestamps
[Figure: timestamp line running from 0 through Max-Stamp to Current Stamp; the region labeled Not Archived lies between Max-Stamp and Current Stamp]
OnLine Wheel-O-Death
[Figure: circular timestamp space starting at 0]
• Min-Stamp: the timestamp 50% away from Max-Stamp, i.e. Max-Stamp - 2GB
• Max-Stamp: the timestamp at the start of the archive
• Current Stamp: the timestamp at the current point in time
• Not Archived: the region between Max-Stamp and Current Stamp
• All pages in the red region have their timestamp updated along
with being archived
Archive Clients
[Figure: archive clients; labels: Onbar, XBSA, SMV, Common Archive Code]
DSA Client Server
Model
[Figure: the Archive Client is connected to the Archive BE by an SQLI/ASF network connection and by a streams local connection]
Moving Data between
Client and Server
[Figure: the Archive Client sends an SQLI request for an archive data buffer to ONINIT; SQLI returns a shared memory address; the data buffer sits in shared memory]
Moving Data between
Client/Server
• The size of the buffers used to transmit data
– ontape - controlled by onconfig’s TAPEBLK
– onbar - BAR_XFER_BUFSIZE - maximum size
is one OnLine page smaller than 64 KB
• The number of buffers:
– ontape
– onbar - BAR_XPORT_COUNT min 3 max 99
• Monitoring the data transfer
– onstat -g stq
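The "one page smaller than 64 KB" limit is easy to turn into numbers. The helper below is a small arithmetic sketch; it assumes the buffer size is configured in whole pages, and the function name is hypothetical.

```python
def max_xfer_buffer_pages(page_size_bytes):
    """Largest transfer buffer, in pages, that stays one page under 64 KB."""
    limit_bytes = 64 * 1024 - page_size_bytes
    return limit_bytes // page_size_bytes

for page_size in (2048, 4096):
    pages = max_xfer_buffer_pages(page_size)
    print(page_size, pages, pages * page_size)
```

For a 2 KB page this gives 31 pages (63,488 bytes) and for a 4 KB page 15 pages (61,440 bytes).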
What Data is Shipped to
the Archive Client
• Server sends raw
online pages just
like they exist on
disk
Example of onstat -g stq
Stream Queue: (session 11 cnt 10) 0:ad91400 1:ada1400
2:adb1400 3:adc1400 4:add1400 5:ade1400 6:adf1400
7:ae01400 8:ae11400 9:ae21400
Full Queue: (cnt 0 waiters 0) 0:0 1:ada1400
2:adb1400 3:adc1400 4:add1400 5:ade1400 6:adf1400
7:ae01400 8:ae11400
Empty Queue: (cnt 0 waiters 1)
Stream Queue: (session 10 cnt 10) 0:ac8d400 1:ac9d400
2:acad400 3:acbd400 4:accd400 5:acdd400 6:aced400
7:acfd400 8:ad0d400 9:ad1d400
Full Queue: (cnt 9 waiters 0) 0:ac9d400 1:acad400
2:0 3:accd400 4:acdd400 5:aced400 6:acfd400
7:ad0d400 8:ad1d400
Empty Queue: (cnt 0 waiters 1)
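When many sessions are active, it can help to reduce this output to a few counters per session. The parser below is a rough sketch that works on the sample shown above; the regular expression, the function name, and the dictionary layout are assumptions, and the real output format may differ between versions.

```python
import re

SAMPLE = """\
Stream Queue: (session 11 cnt 10) 0:ad91400 1:ada1400
2:adb1400 3:adc1400 4:add1400 5:ade1400 6:adf1400
7:ae01400 8:ae11400 9:ae21400
Full Queue: (cnt 0 waiters 0) 0:0 1:ada1400
2:adb1400 3:adc1400 4:add1400 5:ade1400 6:adf1400
7:ae01400 8:ae11400
Empty Queue: (cnt 0 waiters 1)
Stream Queue: (session 10 cnt 10) 0:ac8d400 1:ac9d400
2:acad400 3:acbd400 4:accd400 5:acdd400 6:aced400
7:acfd400 8:ad0d400 9:ad1d400
Full Queue: (cnt 9 waiters 0) 0:ac9d400 1:acad400
2:0 3:accd400 4:acdd400 5:aced400 6:acfd400
7:ad0d400 8:ad1d400
Empty Queue: (cnt 0 waiters 1)
"""

HEADER = re.compile(
    r"(Stream|Full|Empty) Queue: "
    r"\((?:session (\d+) )?cnt (\d+)(?: waiters (\d+))?\)"
)

def summarize_stq(text):
    """Return one dict per stream queue with its full/empty queue counters."""
    sessions = []
    for kind, session, cnt, waiters in HEADER.findall(text):
        if kind == "Stream":
            sessions.append({"session": int(session), "buffers": int(cnt)})
        else:
            # Full/Empty lines belong to the most recent Stream Queue line.
            sessions[-1][kind.lower()] = {"cnt": int(cnt), "waiters": int(waiters)}
    return sessions

for entry in summarize_stq(SAMPLE):
    print(entry)
```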
Server Threads
• ontape
• Scanner
• Before Image Processor
Ontape Thread
• Always called ontape regardless of archive
client
• Responsible for all communication to
archive client
Scanner Thread
(arc_backup1)
• The “dummy” thread, geared for
I/O performance and not thinking
• Handed a list of pages to backup
• Scans data from disk into shared
memory buffers
• Makes NO decisions about the data
• Ensures the page address is correct
Before Image Processor
Thread (arc_backup2)
• Monitors the before image queues
• Determines if the before image
needs to be saved or discarded
• Drains the before image memory
queue, by storing the page images
into temp tables
• Creates multiple temp tables if
required
XPS Difference &
Architecture Overview
• Basic XPS Architecture
– Client Sub-Systems
– Server Sub-Systems
• Differences
– sysutils
– configuration
Basic XPS Architecture
[Figure: OnLine XPS with Coservers 1 through 4, Storage Manager 1, Storage Manager 2, and onbar]
Client Sub-Systems
Executable     Function
onbar          Shell script wrapper
onbar_d        The driver
start_worker   Shell script wrapper
onbar_w        Worker process
onbar_m        Distributes bootfiles
onbar_s        Checks server state
Client Sub-Systems
[Figure: OnLine XPS with Coservers 1 through 4, onbar, onbar_d, and an onbar_w attached to Storage Manager 1]
Server Sub-Systems
• ASF/local streams
– Send/Receive commands and data buffers
• Backup Scheduler (BUS) (new)
– distributes tasks to workers
• XBAR (new)
– communicates between coservers
• RSAM
– only sees a single coserver
– manages all I/O to disk (dbspaces/chunks)
XBAR
• Interfaces with both BUS and RSAM
• Manages distributed execution of backup
and restores
– transfers data from the object’s coserver
(coserver where the dbspace/chunk exists) to
onbar_w’s coserver (output coserver)
– Uses XMF between coservers
– Uses local stream between onbar_w and output
coserver
Backup Scheduler (BUS)
• Manages user requests, workers, storage
managers and coservers
• Farms out work to onbar_w
• Reports success or failure to onbar_d after
each work item has been attempted
• onbar_w creates a new worker queue in the
BUS when it is started
XBAR/BUS support in
SMI
• New tables for BUS data structures:
– sysbusession: list of sessions
– sysbuobject: what’s in the queue
– sysbuobjses: for which session
– sysbusm: BAR_SM paragraphs
– sysbusmdbspace: space to BAR_SM map
– sysbusmlog: logstream to BAR_SM map
– sysbusmworker: worker to BAR_SM map
– sysbuworker: info about each onbar_w
Moving Data between
Client/Server Version 8
[Figure: onbar and onbar_d connect to OnLine XPS (Coservers 1 through 4) over SQLI; onbar_w, attached to Storage Manager 1, uses SQLI and shared memory]
Difference Between
8 and 7
• Multiple Nodes
• Non-locality of devices and data
– Backup data may be shipped between nodes
• Multiple Storage Managers
– One storage manager can serve the entire
system
– Multiple storage managers can eliminate
performance bottlenecks for large systems
Difference immediately
seen by DBAs
• Command line is slightly
different
• Configuration parameters are
very different
– Version 7 has 6 configuration
parameters, none needs to be set
– Version 8 has 15 configuration
parameters, most must be
configured
Difference immediately
seen by DBAs
• sysutils has more columns
• Emergency bootfiles
– more columns
– 1 boot file per coserver
– Merge boot files
• Additional onstat options
arc_very_old_pages()
Why do it??
arc_very_old_pages()
• Permanent solution #1
– No longer use timestamps for recovery
– Disk timestamps do not need to be refreshed
– Memory and disk timestamps are different
– Bitmaps used to keep track of foreground writes
• Permanent solution #2
– Multiple instances of the same page in the physical log
– Only the oldest instance of a page is restored during
physical recovery
7.31 Solution #1
• Must be enabled with CCFLAGS
Physical Recovery Started at Page(1:1065).
Physical Recovery Complete: 0 Pages Examined 0 Pages Restored.
9.21 Solution #2
Physical Recovery Started at Page(1:1065).
Physical Recovery Complete: 0 Pages Examined 0 Pages Restored.
Override Internal Error
Checks
• The -O option is much like -f for UNIX rm
• Does many different things:
– Allows restore of a space that is still on-line
– Creates a filesystem entry for each chunk if
there isn’t one
– Allows expiration of objects from sysutils and
the storage manager that may be needed in a
restore
Archive Utilities
• Explaining onstat & oncheck options
– onstat -d
– onstat -g arc
– onstat -g stq
• Validating Archive
• Managing the archive catalogs
onstat -g arc
num  DBSpace   Q Size  Q Len  Buffer  partnum   size  scanner
2    dbspace1  92      0      4       0x100085  240   0x2033ee
3    dbspace2  69      0      1       0x100084  150   0x302f1a

Dbspaces - Archive Status
name      number  level  date              log  log-position
rootdbs   1       0      10/04/2001.10:17  5    0x10b608
dbspace1  2       0      10/04/2001.10:17  5    0x10b608
dbspace2  3       0      10/04/2001.10:17  5    0x10b608
sbspace1  4       0      10/04/2001.10:17  5    0x10b608
sbspace2  5       0      10/04/2001.10:17  5    0x10b608
onstat -d information
• D Chunk is down
• L Storage space is being logically restored
• O Chunk is online
• P Storage space is physically restored
• R Storage space is being restored
oncheck -pr
Validating PAGE_1DBSP & PAGE_2DBSP...
DBspace number              2
DBspace name                dbspace1
. . . . .
DBspace archive status
Archive Level               0
Real Time Archive Began     10/04/2001 10:33:09
Time Stamp Archive Began    306128
Logical Log Unique Id       6
Logical Log Position        0x3d2018

Archive Level               1
Real Time Archive Began     10/04/2001 10:35:28
Time Stamp Archive Began    323695
Logical Log Unique Id       8
Logical Log Position        0x208018
Validating Archives
• Uses an executable
called archecker
Validating Archives
• What is actually validated
• What other information is there for me
• What else can go wrong with my validated
restore
• How do I validate my archives
What is actually validated
• Format of each page on the archive is checked (similar
to oncheck -cd)
• Tape control pages are sanity checked
• Each table is checked ensuring all pages of the table
exist on the archive tape
• Reserve page format is validated
• Each chunk free list is verified
• Table extents are checked for overlap (oncheck -pe)
Other Information for
the DBA
• AC_MSGPATH - Message log for archecker
• {AC_STORAGE}/INFO
– extent list for each dbspace (like oncheck -pe):
DBS.{dbspace_#}
– time to process each tape/object
– information about the number and type of pages
processed: profile.{pid}
• {AC_STORAGE}/SAVE
– contains a binary image of control information
Profile Information
Profile Information
=======================
Total pages processed         51227
Total Data pages              49327
Total index pages             828
Total smart blob pages        6
Total blob space pages        0
Total partition pages         328
Total chunk free list pages   5
Total Reserve pages           12
Total bit map pages           335
MORE . . .
Extent Information
db1:sysprocedures   0x00200235   8
db1:sysprocbody     0x0020023D   32
db1:sysprocauth     0x0020025D   8
db1:sysprocedures   0x00200265   8
db1:sysprocbody     0x0020026D   32
db1:t1              0x0020028D   24344
FREE                0x002061A5   3
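The extent list lends itself to the overlap check mentioned earlier for validation. The sketch below is not archecker's algorithm; it is a simple illustration that treats each entry as a (name, start address, length in pages) tuple taken from the listing above, with a hypothetical function name.

```python
EXTENTS = [
    ("db1:sysprocedures", 0x00200235, 8),
    ("db1:sysprocbody",   0x0020023D, 32),
    ("db1:sysprocauth",   0x0020025D, 8),
    ("db1:sysprocedures", 0x00200265, 8),
    ("db1:sysprocbody",   0x0020026D, 32),
    ("db1:t1",            0x0020028D, 24344),
    ("FREE",              0x002061A5, 3),
]

def find_overlaps(extents):
    """Return (owner, intruder) name pairs for extents that overlap."""
    overlaps = []
    furthest_end, owner = None, None
    for name, start, length in sorted(extents, key=lambda e: e[1]):
        # An extent overlaps if it starts before the furthest end seen so far.
        if furthest_end is not None and start < furthest_end:
            overlaps.append((owner, name))
        if furthest_end is None or start + length > furthest_end:
            furthest_end, owner = start + length, name
    return overlaps

print(find_overlaps(EXTENTS))
```

In this listing every extent starts exactly where the previous one ends, so no overlaps are reported.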
Validating Archives
• ontape
– archecker -tdvs
– AC_TAPEBLK, AC_TAPEDEV
• onbar
– onbar -r -v (version 7.3X)
– onbar -v (9.20 & 8.30)
– onbar -b -v (8.30)
onsmsync
• Adds objects from ixbar files to sysutils
• Removes objects from sysutils
• Three expiration policies
– -g: remove older than the Nth generation
– -t: remove from before a datetime
– -i: remove older than an interval
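The three expiration policies differ only in how the cutoff is chosen, which a short sketch makes concrete. This is not how onsmsync is implemented: the function names and the idea of representing each backup by a bare datetime are assumptions, and "generation" is taken here to mean a backup's rank counting back from the newest.

```python
from datetime import datetime, timedelta

def expire_by_generation(backup_times, keep):
    """Like -g: expire everything older than the newest `keep` backups."""
    newest_first = sorted(backup_times, reverse=True)
    return sorted(newest_first[keep:])

def expire_before(backup_times, cutoff):
    """Like -t: expire backups taken before a fixed datetime."""
    return sorted(t for t in backup_times if t < cutoff)

def expire_older_than(backup_times, interval, now):
    """Like -i: expire backups older than an interval measured back from now."""
    return expire_before(backup_times, now - interval)

# Four hypothetical daily backups.
backups = [datetime(2001, 10, day) for day in (1, 2, 3, 4)]
print(expire_by_generation(backups, keep=2))
print(expire_before(backups, datetime(2001, 10, 3)))
print(expire_older_than(backups, timedelta(days=1), now=datetime(2001, 10, 4)))
```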
Understand ixBar Files
• Server name
• object name
• object type
• is_serial
• action id
• archive level
• SMV copy id high
• SMV copy id low
• Backup start date
• Backup start time
• Backup end date
• Backup end time
Storage Manager Snafus
• Timeout of onbar
• Error 131 Object not found
• Salvaging logs and getting wrong object
Recovery Snafus
• Check that the devices are linked properly
– KAIO only uses raw I/O
– overlapping data
• While restoring database appears hung
Preparing to Call Support
Restore seems Hung
• The tape is done
• onstat -D shows no I/O
• Very little CPU activity
• While the system clears the
physical and logical logs
there is very little activity
and the system appears to
be hung.
Improvements
• A message is written to the online log indicating when this
phase of the restore started and completed.
• The use of intelligent parallelism to clear all
the logs in a single chunk with one thread:
one disk clear thread per chunk.
Clearing the physical and logical logs has started
Cleared 2100 MB of the physical and logical logs in 612 seconds
Parallel Archive Procedures
• The archive is broken down into archive jobs
with each dbspace being its own backup
• An onbar_d is started to back up a single
dbspace
• Connects to database server and Storage
manager requesting the backup session
• Updates sysutils and ixbar file
Parallel Restore Procedures