Transcript Slide 1

Oracle High Availability - A Case
Study
Rama Balaji
Senior Oracle Consultant
Copyright © 2009 Rolta International, Inc., All Rights Reserved
Overview
• This presentation is a case study of a
customer whose storage array failed
during heavy transaction processing
period.
• Methods followed to quickly restore the
database.
• How the best practice configuration was
arrived while maintaining at least two
copies of production databases on-line.
• How Oracle’s HA components were
effectively explored to achieve the final
configuration with only few minutes of
production downtime.
Agenda
• Background
• Approach
• Initial Configuration
• Disaster to Full recovery
• High Availability Features
• Conclusion and best practices
• Question and Answer
Oracle Features Used
• RMAN compressed backups
• Physical Standby using Data Guard
• ASM to non-ASM and vice versa
• Cascaded Destinations Standby
• 2 node RAC
• Failover and Switchover
• Flash Recovery Area and Flash Back
Logs
Background
• E-commerce client
• Nature of business
• Initial cost effective business
approach
• Business growth
• IT infrastructure
Approach
Event
Database Availability
1. Storage array failure
No Database
2. Restored database
from tape using RMAN
One
3. Set up the Physical
Standby
Two
4. Second Physical
Standby using cascaded
destination feature
5. Failover from production
to standby
Three
Two
Approach
Event
6. 2-Node RAC with ASM on
the new storage array as
standby
7.Switched over from single
node Db instance to 2 Node
RAC as primary
8.Disconnected the second
standby
9. 2 node RAC standby to 2
node RAC primary
10. Flashback Logs cleanup
from FRA
Database
Availability
Three
Three
Two
Two
No Downtime
Initial configuration
2 node RAC + ASM
Storage Array
• Server - Linux Red Hat x86_64
• Database – Oracle 10.2.0.3
• ASM – two diskgroups
DATA_DG – Database
FLASH_DG –
flash_recovery_area
• Daily full database backup to
disk- RMAN, and to tape using
third party software
• Hourly archivelog backups to
disk – RMAN, and to tape using
third party software
• No RMAN catalog
RMAN
• RMAN> show controlfile autobackup;
RMAN configuration parameters are:
CONFIGURE CONTROLFILE AUTOBACKUP ON;
• RMAN> show controlfile autobackup format;
RMAN configuration parameters are:
CONFIGURE CONTROLFILE AUTOBACKUP
FORMAT FOR DEVICE TYPE DISK TO
'/opt/oracle/admin/PROD/backup/cntl/%F';
RMAN Backup script
• RUN
{
CONFIGURE RETENTION POLICY TO
REDUNDANCY 1;
ALLOCATE CHANNEL ch1 DEVICE TYPE DISK
MAXPIECESIZE 2G FORMAT
'/opt/oracle/admin/PROD/backup/db/RCOMPRESSE
D_%U';
BACKUP as compressed backupset DATABASE
plus archivelog channel ch1;
}
RMAN Archivelog backup script
• RUN
{
ALLOCATE CHANNEL ch1 DEVICE TYPE DISK
FORMAT
'/opt/oracle/admin/PROD/backup/archive/ARCH_%U'
;
BACKUP as compressed backupset archivelog all;
}
Storage Array Failure
2 node RAC + ASM
Storage Array
• Storage admin determined
that the SAN failure has
caused loss of entire data
including the database and
backups.
• Validated that the RMAN
backup on tape from
previous night as well as
archivelog files from
previous hour were intact.
Restore using RMAN
2 node RAC + ASM
Storage Array
Single DB
Non- ASM
• Restored the tape backup
and all archivelog file
backups from tape to
internal disks on one of the
RAC node
Restore from ASM to nonASM
• Restored the backup from tape to the original
backup location.
• Restored spfile from autobackup.
RMAN> restore spfile from autobackup;
• Created pfile from spfile;
Restore from ASM to non-ASM
(Continued…)
•
Edited the following parameters in pfile
 control_files - Changed +DATA_DG and
+FLASH_DG to file system
 db_file_name_convert - Changed
+DATA_DG to file system
 log_file_name_convert - Changed
+DATA_DG and +FLASH_DG to file system
•
startup nomount using pfile;
•
Using RMAN nocatalog, restore controlfile from
autobackup.
RMAN> restore controlfile from
‘/opt/oracle/admin/PROD/backup/cntl/c-66219680220081230-0e’
Restore from ASM to nonASM (Continued…)
•
alter database mount;
•
restore database;
•
recover database;
•
Created temporary tablespaces
•
alter database open;
Physical Standby
PROD
PRODS
Primary
Database
Standby
Database
Physical Standby Steps
•
On the primary database (PROD) generated pfile.
Create pfile=’/tmp/initPROD.ora’ from spfile; and
copy that file to the standby server.
•
On the primary (PROD) backed up the current
controlfile using RMAN.
RMAN> backup current controlfile for standby;
•
RMAN> copy current controlfile for standby to
'/tmp/sby_control01.ctl';
Physical Standby Steps
(Continued…)
•
Run a full RMAN backup on the primary (PROD).
RMAN> RUN
{
ALLOCATE CHANNEL ch1 DEVICE TYPE
DISK MAXPIECESIZE 2G FORMAT
'/home/oracle/backup_standby/RCOMPRESSED_%
U';
BACKUP as compressed backupset DATABASE
plus archivelog channel ch1;
}
•
Copied all backup files as well as controlfile backup
to the standby (PRODS). Backup files needs to be
in the same location as the primary backup location
or create a symbolic link.
Physical Standby Steps
(Continued…)
•
Edited the parameter file on the standby
(PRODS).
 Changed control_files parameter to point to
the controlfile on the standby server.
 Changed db_file_name_convert to use the
new location on the standby server.
 Changed log_file_name_convert to use the
new location on the standby server.
•
Start up the standby (PRODS)instance in
nomount
Physical Standby Steps
(Continued…)
•
rman nocatalog
RMAN> connect target
username/password@PROD
RMAN> connect auxiliary /
RMAN> RUN {
ALLOCATE auxiliary CHANNEL ch1 DEVICE
TYPE DISK FORMAT
'/home/oracle/backup_standby/RCOMPRESSED_%
U';
duplicate target database for standby;
}
•
Changed the following parameter on the primary
(PROD).
SQL> ALTER SYSTEM SET log_archive_dest_2=
'SERVICE=PRODS_XPT REOPEN=300'
Physical Standby Steps
(Continued…)
•
Changed the standby (PRODS) to Managed
Recovery
alter database recover managed standby database
disconnect from session;
•
Issued the following command to make sure the
archive log files are applied on the standby
(PRODS)
Select sequence#, applied from v$archived_log
order by sequence#;
Data Guard Cascaded
Destinations
PROD
Primary
Database
PRODS
Data Guard
LOG_ARCHIVE_DEST_2
='SERVICE=PRODS_XPT
REOPEN=300';
Standby
Database
1
PRODC
Data Guard
Cascaded
Destination
LOG_ARCHIVE_DEST_2
='SERVICE=PRODC_XPT
REOPEN=300';
Standby
Database
2
Data Guard Failover
PROD
Primary
Database
Data
Guard
Failover
PRODS
PRODC
Primary
Database
Standby
Database
Data
Guard
Data Guard Failover Steps
•
On the primary (PROD)I did the following.
After users were logged out, created a
table called TEST.
Create table test as select * from
dba_users;
This was a last operation on the primary
database.
•
On the primary (PROD) issued the
following command couple of times.
Alter system switch logfile;
Data Guard Failover Steps
(Continued…)
•
•
On the primary (PROD), noted down the sequence#
SQL> archive log list
Database log mode
Archive Mode
Automatic archival
Enabled
Archive destination /d01/oracle/flashback
Oldest online log sequence 6503
Next log sequence to archive 6504
Current log sequence
6504
On the standby (PRODS) issued the following
command to make sure standby is in maximum
performance mode.
SQL> alter database set standby database to
maximize performance;
Data Guard Failover Steps
(Continued…)
•
On the standby(PRODS), issued the following
command
SQL> select thread#, low_sequence#,
high_sequence# from v$archive_gap;
no rows selected
•
On the standby (PRODS), check the last
sequence#,
SQL> select max(sequence#) from
v$archived_log;
MAX(SEQUENCE#)
------------6504
Data Guard Failover Steps
(Continued…)
•
On the standby (PRODS),
SQL> alter database recover managed standby
database finish force;
Database altered
On the standby (PRODS),
SQL> alter database commit to switchover to
primary;
Database altered
•
On the standby instance (PRODS),
SQL> shutdown immediate
ORA-01507: database not mounted
ORACLE instance shut down.
Data Guard Failover Steps
(Continued…)
•
SQL> startup
ORACLE instance started.
Total System Global Area 2147483648 bytes
Fixed Size
Variable Size
2074152 bytes
1056967128 bytes
Database Buffers
1073741824 bytes
Redo Buffers
14700544 bytes
Database mounted.
Database opened.
At this point standby is opened as primary.
Data Guard Failover Steps
(Continued…)
•
On the current primary (PRODS), issued
the following query to make sure the table
was brought over.
Select * from test;
then drop the table.
•
On the second standby (PRODC) instance
verified the cascaded standby destination
by querying v$archived_log.
•
I had to bounce the cascaded standby
instance (PRODC), and put it back in a
recovery mode.
2-Node RAC as Standby
PRODC
PRODS
Primary
Database
Data Guard
2 node RAC + ASM
Standby Database 2
PROD1
PROD2
Standby
Database 1
Storage Array
Data Guard Switchover Steps
•
Make sure only one instance on the RAC system is
mounted(PROD1).
•
Make sure the FAL_SERVER and FAL_CLIENT
parameters are set correctly on the primary as well
as standby.
On the primary
FAL_SERVER=standby instance
FAL_CLIENT=primary instance
On the Physical Standby
FAL_SERVER=primary instance
FAL_CLIENT=standby instance
Data Guard Switchover Steps
(Continued…)
•
On the primary (PRODS) issued the following
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
-------------------------------TO STANDBY
•
On the standby (PROD1) instance issued the
following command
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
-------------------------------TO PRIMARY
Data Guard Switchover Steps
(Continued…)
•
On the primary (PRODS) issued the following
command
SQL> ALTER DATABASE COMMIT TO
SWITCHOVER TO PHYSICAL STANDBY;
•
SQL> ALTER SYSTEM SET
log_archive_dest_state_2='DEFER';
•
Shutdown the primary instance (PRODS)
SQL> shutdown immediate;
SQL> Startup nomount;
SQL> alter database mount standby database;
Data Guard Switchover Steps
(Continued…)
•
On the standby (PROD1) issued the following
command
SQL> ALTER DATABASE COMMIT TO
SWITCHOVER TO PRIMARY;
•
On the standby (PROD1)
SQL>shutdown immediate;
SQL> startup open;
•
SQL>alter system switch logfile;
•
Make sure other instance(PROD2) comes up as
well.
Data Guard Switchover
PROD
2 node RAC + ASM
Primary Database 1
Storage Array
PRODS
Standby
Database 1
PRODC
Data
Guard
Standby
Database 2
Final Configuration
2 node RAC + ASM
Primary Database
Storage Array
2 node RAC + ASM
Standby Database
Storage Array
Real Application Clusters
• RAC misconception.
• Always register the database and
instances to the cluster.
• You have to use “netca” to register the
listener to crs in 10g.
• Srvctl is “case sensitive”.
• Always use single parameter file for
multiple RAC nodes.
• Client side load balancing.
• Server side load balancing.
ASM Configuration
Storage Group 2
Storage Group 1
ASM Disk Groups
DATA
FLASH
LUN 1
LUN 2
LUN 5
LUN 6
LUN 3
LUN 4
LUN 7
LUN 8
ASM disks
Data Files
Control Files
Online log files
Archive log files
RMAN backups
Mirrored cntl and log
Flashback Logs
Flash Recovery Area
• Show parameter db_recovery
NAME
TYPE
VALUE
------------------------------------ ----------- --------------------------db_recovery_file_dest
string
+FLASH_DG
db_recovery_file_dest_size big integer 190000M
• SQL> select
space_used/(1024*1024*1024),space_limit/(1024*1024
*1024) from v$recovery_file_dest;
SPACE_USED (in GB)
--------------------------55.6249833
SPACE_LIMIT (in GB)
---------------------------185.546875
Flash Recovery Area
• SQL> select * from v$flash_recovery_area_usage;
CONTROLFILE
ONLINELOG
ARCHIVELOG
BACKUPPIECE
IMAGECOPY
FLASHBACKLOG
0
0
18.17
0
0
11.81
0
0
6.03
0
0
11.73
0
0
4691
0
0
1077
• ORA-19815: WARNING: db_recovery_file_dest_size of
53687091200 bytes is 87.63% used, and has
6642196992 remaining bytes available
Flash Recovery Area
• SQL> alter system set db_recovery_file_dest_size=55G
scope=memory;
• SQL> select * from v$flash_recovery_area_usage;
CONTROLFILE
ONLINELOG
ARCHIVELOG
BACKUPPIECE
IMAGECOPY
FLASHBACKLOG
0
0
61.3
0
0
38.7
0
0
20.34
0
0
38.44
0
0
4692
0
0
1046
Cleanup of Archivelog files from
Flash Recovery Area
• Issue the following RMAN commands
RMAN> crosscheck archivelog all;
RMAN> delete expired archivelog all;
• After “delete expired” command
CONTROLFILE
ONLINELOG
0
0
0
0
0
0
1274
ARCHIVELOG
3.83
0
BACKUPPIECE
0
0
0
IMAGECOPY
0
0
0
FLASHBACKLOG
51.23
50.93
1007
Flashback Logs
• SQL> show parameter db_flashback
NAME
TYPE
VALUE
------------------------------------ ----------------------------------------- -----------------db_flashback_retention_target
integer 1440
• SQL> show parameter log_archive_min_succeed_dest;
NAME
TYPE
VALUE
------------------------------------ ----------- ----------------------------log_archive_min_succeed_dest
integer 1
How to cleanup Flashback Logs
from Flash Recovery Area
• SQL> alter system set
log_archive_dest_1='LOCATION=/home/oracle/temp_
archivelog' scope=memory;
• SQL> alter system set
log_archive_dest_state_10=defer scope=memory;
• SQL> alter system set
db_recovery_file_dest_size=55G scope=memory;
• Check the alert log for “deleted Oracle managed files”
messages.
How to cleanup Flashback Logs
from Flash Recovery Area
• SQL> select * from v$flash_recovery_area_usage;
CONTROLFILE
ONLINELOG
ARCHIVELOG
BACKUPPIECE
IMAGECOPY
FLASHBACKLOG
0
0
92.19
0
0
7.3
0
0
0
0
0
0
0
0
1274
0
0
6
• SQL> alter system set
db_recovery_file_dest_size=190G scope=memory;
How to cleanup Flashback Logs
from Flash Recovery Area
• SQL> alter system set log_archive_dest_1=''
scope=memory;
• SQL> alter system set
log_archive_dest_state_10=enable scope=memory
Conclusion and Best Practices
• All configuration changes were performed
without any significant downtime outages to
the production or standby databases.
• Various Oracle technologies, including RMAN,
RAC, ASM, and Data Guard, were
successfully utilized together to achieve the
final high-availability solution.
• An awareness and understanding of the
technologies available from Oracle, together
with an innovative approach to implementation,
were critical in building the environment while
complying with the customer’s business needs.
Question Answers
Contact Information
Rama Balaji
[email protected]
(303) 985-2213
www.tusc.com
World-wide Team of IT
Professionals
• Core Services
– Oracle E-Business Suite
Contact Information
Rama Balaji
[email protected]
(303) 985-2213
www.tusc.com
– Enterprise Performance
Management (EPM)
– Oracle DBA, Database, and
Infrastructure
– Business Intelligence and
Data Warehousing
– Managed Services
– remote support & hosting
– Oracle Fusion Middleware
– Oracle Hyperion
– Software
– Training