
Implementing a logical Data Guard

Steps to get there and tips & tricks
Mathijs Bruggink, The Netherlands, 2012.09.13

Workshop Overview

Approx. 1 hr

General Information Dataguard Databases

Setting Up a Logical Standby Dataguard

Issues / Tips and Tricks

The End

About Me.


Oracle DBA since 1998 (starting with Oracle 7.3.4).

Working for Atos since 2008.

Key activities: Migrations; Real Application Clusters; High Availability; RMAN.

Blogging

http://jointherebels.nl/blog/rac.html

http://www.orapro.nl/author/mathijs/

Twitter : http://twitter.com/MathijsBruggink

Travelling, bonsais, music, married.

Purposes for a Dataguard Database (aka Standby Db) 1

Purposes for a Dataguard Database, 2nd Attempt

Purposes for a Dataguard For Real…..

High availability for your Database.

Cloning, for example for migration strategies.

Part of Migration.

Reporting (offers a near real-time, up-to-date environment).

Mix of all the above.

The types of Dataguard databases

Every Logical SB is born as a Physical SB.

Physical standby database

Physically: an identical copy of the primary database (disk structures identical to the primary DB on a block-for-block basis).

Synced:

A Physical standby database is synced with the primary database, through Redo Apply, which recovers the redo data received from the Primary side and applies the redo to the physical standby database.

Status:

MOUNTED (NOTE: as from 11gR1, Active Data Guard allows the DB to be open read-only while redo apply continues).

Logical standby database

Logically: the SAME logical information as the primary side, BUT the physical organization and structure of the data can be different.

Synced:

Synchronized through SQL Apply, which transforms the redo received from the primary database into SQL statements (using LogMiner) and then executes those statements on the standby database.

Status: OPEN

The types of Dataguard databases


Snapshot Standby Database

This is a fully updatable standby database created by converting a physical standby database into a snapshot standby database.

Status: OPEN

NOTE: Unlike a physical or logical standby database, a snapshot standby database does not apply the received redo data. The redo is only applied after the snapshot standby is converted back into a physical standby database.
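A minimal sketch of that round trip, assuming an FRA and flashback logging are available on the standby (a guaranteed restore point is created implicitly):

-- on the physical standby:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;
ALTER DATABASE OPEN;                    -- fully updatable, use for testing

-- when done, flash back and resume redo apply:
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;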

The Future of Dataguard databases: Addressing new Challenges

And then there will be ... Oracle GoldenGate ...


Zero Downtime for High Availability, Migrations.

Protecting your (changing) data model by defining rules.

Interfacing with Dataguard

Sqlplus

Broker (DGMGRL command-line interface; a small sketch follows below)

Grid Control
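A small DGMGRL sketch, assuming a broker configuration already exists (the connect string and database names are placeholders):

$ dgmgrl sys/***@MYDB
DGMGRL> show configuration;
DGMGRL> show database verbose 'MYDBSB';
DGMGRL> edit database 'MYDBSB' set state='LOG-APPLY OFF';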

Data Guard Architecture (AND Highly Available)

Data Guard Architecture

Data Guard Protection Modes


Maximum Performance

– Keyword: priority 1 is performance (of the primary DB) over data protection.

It uses ASYNC redo transport: the LGWR process never waits for acknowledgment from the standby database.

Maximum Availability

– Keyword: priority 1 is availability, with zero-data-loss protection as a very close priority 2.

It requires SYNC redo transport, thus primary database performance may be impacted by the amount of time required to receive an acknowledgment from the standby that the redo has been written to disk.

Maximum Protection

– Keyword: utmost priority on data protection.

It also requires SYNC redo transport. The primary DB will not acknowledge a commit to the application unless it receives acknowledgment from at least one standby DB in the configuration that the data needed to recover that transaction is safely on disk.
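A hedged sketch of moving a configuration from Maximum Performance to Maximum Availability via sqlplus (service and db_unique_name values are placeholders, not this configuration's real names):

-- on the primary: make redo transport synchronous for the standby destination
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=MYDBSB SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=MYDBSB'
  SCOPE=BOTH SID='*';

ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;

-- verify the result
SELECT protection_mode, protection_level FROM v$database;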

ASYNC redo transport architecture (Maximum Performance)

Log Network Server (LNS) Remote File Server (RFS)

SYNC redo transport architecture (Maximum Availability; Maximum Protection)

Log Network Server (LNS) Remote File Server (RFS)

Automatic update in a Physical SB

Automatic update of Logical SB

SQL Apply in a Logical SB // Tuning

Tuning the apply (Logical Change Record handling). Bug if MAX_SGA is set to 4096 (4M), so 4095 is used:

execute dbms_logstdby.apply_set('MAX_SERVERS', 64);
execute dbms_logstdby.apply_set('MAX_SGA', 4095);
execute dbms_logstdby.apply_set('_HASH_TABLE_SIZE', 10000000);

Redo Merging (scn)

exec dbms_logstdby.apply_set ('PRESERVE_COMMIT_ORDER', 'TRUE');
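To verify what apply_set actually put in place, a quick check against the standard views could look like this (a sketch; the filter values are just the parameters set above):

COL name  FOR A30
COL value FOR A20
SELECT name, value
  FROM dba_logstdby_parameters
 WHERE name IN ('MAX_SERVERS','MAX_SGA','PRESERVE_COMMIT_ORDER')
 ORDER BY name;

SELECT name, value FROM v$logstdby_stats WHERE name LIKE 'coordinator%';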

Automatic gap resolution for Standby Databases

Apply missing archives. Not only gap resolution, but also protecting needed archives on the primary side! (RMAN-08137: WARNING: archived log not deleted as it is still needed.) BTW: archives are ALWAYS shipped (even when the standby is up to date).
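On the primary side this protection can be combined with an RMAN archivelog deletion policy; a hedged sketch (run in RMAN on the primary):

RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;
# or, if shipment rather than apply should be the criterion:
RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO SHIPPED TO ALL STANDBY;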

Best Practice before start 1


Prepare the Primary Database:

Application and admin software installed; are all required users present?

Data as complete as possible (initial imports etc.).

Bottom line: be as complete and thorough as possible before starting the duplication of the database.

Storage:

Plenty of space in ASM on the primary side FRA (Flash Recovery Area). You need even more space on the standby side. Recommended: at least 1.5 times the archive space on the logical standby side. A backlog is very likely, and SQL Apply / Oracle will only release the archives automatically after 24 hours.
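Keeping an eye on that FRA space can be done with the standard views; a small sketch:

SET LINES 122
SELECT name, space_limit/1024/1024/1024 limit_gb, space_used/1024/1024/1024 used_gb
  FROM v$recovery_file_dest;
SELECT file_type, percent_space_used, percent_space_reclaimable, number_of_files
  FROM v$flash_recovery_area_usage;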

Best Practice before start 2


Backup/restore facilities:

Is the standby server able to restore backups that were created on the primary server (e.g. the remote access list in Networker)?


Time window:

PLAN your activities well! Best practice is to do the backup, the restore, the recover and the setup of the standby in one day, or the very next day at most.

Physical standby:

Recommended: get the physical standby in place and up to date, then convert it to a logical standby.


Bulk data (SQL Apply will have trouble with monster tables):

Create the duplicate database (that becomes a physical standby) AFTER you have finished working with the monster table, OR define a skip rule before doing that kind of work, instantiate the table afterwards, and then undo the skip rule.

Preparations for setting up a Logical SB: Key elements Are :


Password file
Connectivity
Parameters of interest to Data Guard
Init.ora on both Dataguard environments points to the spfile in ASM, fully prepared with all settings. Example:
cat $ORACLE_HOME/dbs/initM2MP1R11.ora shows: spfile='+MYDBSB_FRA01/m2mp1r1/spfileM2MP1R1.ora'

Parameters of Interest to Data Guard
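A hedged example of the parameters such a setup typically relies on (MYDB / MYDBSB are placeholder names, not the configuration from these slides), set on the primary side:

ALTER SYSTEM SET log_archive_config = 'DG_CONFIG=(MYDB,MYDBSB)' SCOPE=BOTH SID='*';
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=MYDBSB ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=MYDBSB'
  SCOPE=BOTH SID='*';
ALTER SYSTEM SET fal_server = 'MYDBSB' SCOPE=BOTH SID='*';
ALTER SYSTEM SET standby_file_management = 'AUTO' SCOPE=BOTH SID='*';
-- db_file_name_convert / log_file_name_convert only when the diskgroup names differ between the sides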

Of interest for Call OUT

Preparations for the duplication

Duplication script using Backups:

Duplicate Database Script

Or use .. From Active Database

FromActiveDatabase.rtf
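A sketch of what the active-database variant could look like (TNS aliases and passwords are placeholders; the standby instance must already be started NOMOUNT from a minimal pfile):

$ rman target sys/***@MYDB auxiliary sys/***@MYDBSB

RMAN> DUPLICATE TARGET DATABASE FOR STANDBY
        FROM ACTIVE DATABASE
        DORECOVER
        NOFILENAMECHECK;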

Time line of Actions

1. Backup
2. Restore
3. Start Physical / Convert to Logical
(a BACKLOG of archives builds up between step 1 and step 3)

Restoring extra archives if needed after restore

As long as you still have your physical SB you can do this on the SB side! (After converting it to a logical SB the DBID will change!)

Archive Restores for Physical SB ( If needed)
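A sketch of such a restore on the standby side, assuming the standby can see the primary's backups (thread and sequence numbers are only examples, taken from the WAIT_FOR_GAP case further on):

$ rman target /
RMAN> RESTORE ARCHIVELOG FROM SEQUENCE 1488 UNTIL SEQUENCE 1490 THREAD 5;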

After the duplicate completes: add standby redo logs (SRL)


Best practice: add the standby redo logs to the primary; that way the SRLs are already present on the standby side during the duplicate!

Formula: number of standby redo logs needed = (4 redo log groups per instance + 1 extra group) * 6 threads = 30 standby redo logs (my environment was a 6-node RAC).

DO NOT:

add them to a specific thread. Oracle will sort out, per instance, which SRL it wants to talk to on the standby side (so you should create them without a thread assigned).

Adding standby redologs

alter database add standby logfile
  group 26 ('+MYDBSB_FRA01') size 1000m,
  group 28 ('+MYDBSB_FRA01') size 1000m,
  ...
  group 50 ('+MYDBSB_FRA01') size 1000m,
  group 51 ('+MYDBSB_FRA01') size 1000m,
  group 52 ('+MYDBSB_FRA01') size 1000m,
  group 53 ('+MYDBSB_FRA01') size 1000m,
  group 54 ('+MYDBSB_FRA01') size 1000m;

Note: only one member should be allocated per group. A standby redo log is in no way the same as an online redo log, so no two members are required.

Start / stop Recovery on the Physical SB side

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;

Alertlog shows:

MRP0: Background Managed Standby Recovery process started (MYDBSB1)
Fast Parallel Media Recovery enabled
Managed Standby Recovery starting Real Time Apply
Wed May 18 15:22:25 2011
parallel recovery started with 64 processes

Note: if you need to stop recovery on a physical standby:

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Checking the Physical SB database

select * from v$dataguard_status;

select * from v$archive_gap;

Oracle provided script:

(can be downloaded from Metalink): Physical Standby Checks / Physical Standby Output

Checking the Physical SB database

On the Primary side:

alter system archive log current;

On the Standby side:

Check your alertlog on the standby. NOTE: this should show shipment of an archive for all instances; if it does not, check your connectivity / password files on the source side again.

Checking the Physical SB database

On the Primary side:

set pagesize 10000
select * from gv$archive_dest where DEST_NAME='LOG_ARCHIVE_DEST_2' order by inst_id;

Status should be VALID; ERROR indicates a transport problem (see Log_archive_dest_2_NOT OK).

On the Physical standby side:

COL NAME FOR A13
COL VALUE FOR A20
COL UNIT FOR A30
SET LINES 122
SELECT NAME, VALUE, UNIT, TIME_COMPUTED FROM V$DATAGUARD_STATS WHERE NAME IN ('transport lag', 'apply lag');

Checking the Physical SB database

Showed during catchup:

NAME          VALUE                UNIT                           TIME_COMPUTED
------------- -------------------- ------------------------------ ----------------------------
apply lag     +00 12:30:48         day(2) to second(0) interval   20-JUN-2011 21:20:36
transport lag +00 14:10:17         day(2) to second(0) interval   20-JUN-2011 21:20:36

.. apply lag means we are 12:30:48 behind on applying (so likely an archive is missing)
.. transport lag indicates that shipment of redo from the primary to the physical standby is behind by 14:10:17

Checking the Physical SB database

SET LINES 132
SET PAGESIZE 9999
COL CLIENT_PID FORMAT A12
set numf 9999999
SELECT PID, PROCESS, STATUS, CLIENT_PROCESS, CLIENT_PID, THREAD#, SEQUENCE# SEQ#, BLOCK#, BLOCKS
FROM V$MANAGED_STANDBY
ORDER BY THREAD#;

This showed in specific detail:

PID PROCESS STATUS CLIENT_P CLIENT_PID THREAD# SEQ# BLOCK# BLOCKS ------ ------- ------------ -------- ------------ -------- -------- -------- --------

32414 MRP0 WAIT_FOR_GAP N/A N/A 5 1488 0 0

So I restored 1488 for thread 5 !

Convert a Physical SB to Logical SB - 1

1. Stop redo apply on the physical standby database:
   alter database recover managed standby database cancel;

2. On the PRIMARY side the LogMiner dictionary needs to be built (on the source side):
   execute dbms_logstdby.build;

Note: this package will wait for all active transactions to finish. We have had at least 2 occasions where a maintenance window, either full (DB down) or partial (stopping specific daemons), was needed to finish the job, so you need to be aware of that.

Convert a Physical SB to Logical SB - 2

3. On the SB side run the following SQL:

shutdown immediate;
startup mount;
alter database recover to logical standby <new_db_name>;

Choose a new name: the db name of the SB database will also change. If you are using an spfile, the change is recorded in the spfile during this process. AND this recover will also change the DBID!

Example: alter database recover to logical standby MYLSBDB1;

shutdown
startup mount;
alter database open resetlogs;
alter database start logical standby apply immediate;

BTW if you need to stop apply: ALTER DATABASE STOP LOGICAL STANDBY APPLY;

Health Checking Logical Standby

Primary side:

select * from v$dataguard_status;
select * from gv$archive_dest where dest_id = 2 order by INST_ID;

(Looking for (no) errors.) Open three windows:
– a sqlplus window to the primary database
– a unix window with a tail -f on the alertfile of the standby database
– a sqlplus window to the standby database

Check the current sequence number of the standby redo logs on the standby side:

select sequence#, first_time, next_time, dict_begin, dict_end from dba_logstdby_log order by sequence#;

On the primary side force a log switch for all instances by issuing:

alter system archive log current;

(Metalink) DG logical standby diag / Logical Standby Output

Health Checking Logical Standby On the logical side

Check the alertlog; the LogMiner should show activities like these:

LOGMINER: Begin mining logfile for session 1 thread 6 ...
LOGMINER: End mining logfile for session 1 thread 3 ...
LOGMINER: Begin mining logfile for session 1 thread 3 ...
Tue Sep 27 08:34:27 2011
RFS LogMiner: Registered logfile [+MYDBSB_FRA01/m2mp1r1/archivelog/2011_09_27/thread_6_seq_3748.439.762942763] to LogMiner session id [1]

Conclusion: if all this works fine, then it is a job well done! Oracle provided script: /opt/oracle/admin/tools/dg_logical_diag.sql (Metalink).

Additional checks (KPI 1)

Verify that log apply services on the standby are currently running. If the query against V$LOGSTDBY returns no rows then logical apply is not running.

column status format a50 wrap
column type format a11
set numwidth 15
SELECT TYPE, STATUS, HIGH_SCN FROM V$LOGSTDBY;

Additional checks (KPI 1) (output)

TYPE        STATUS                                              HIGH_SCN
----------- --------------------------------------------------  --------------
COORDINATOR ORA-16116: no work available                        10982415246160
ANALYZER    ORA-16116: no work available                        10982415509661
APPLIER     ORA-16124: transaction 11 23 562627 is waiting on   10982415249574
            another transaction
APPLIER     ORA-16124: transaction 541 2 892602 is waiting on   10982415249570
            another transaction
APPLIER     ORA-16124: transaction 11 0 562308 is waiting on    10982415249585
            another transaction
APPLIER     ORA-16124: transaction 405 20 658443 is waiting on  10982415249590
            another transaction
READER      ORA-16127: stalled waiting for additional           10982415512190
            transactions to be applied
BUILDER     ORA-44604: Barrier synchronization on DDL with XID  10982415509700
            145.10.1094317 (waiting on 2161 transactions)
PREPARER    ORA-16116: no work available                        10982415512134
PREPARER    ORA-16116: no work available                        10982415512123
PREPARER    ORA-16116: no work available                        10982415512190
PREPARER    ORA-16116: no work available                        10982415512160

This tells you that the mining engine is waiting: the BUILDER is held at a barrier synchronization on the DDL with XID 145.10.1094317, and 2161 transactions need to be applied first before that barrier is satisfied.

Additional checks (KPI 2)

In order to check if apply is still working:

set lines 166
set numf 99999999999999999
SELECT NEWEST_SCN, NEWEST_TIME, APPLIED_SCN, APPLIED_TIME, READ_SCN, READ_TIME
FROM DBA_LOGSTDBY_PROGRESS;

NEWEST_SCN     NEWEST_TIME         APPLIED_SCN    APPLIED_TIME        READ_SCN       READ_TIME
-------------- ------------------- -------------- ------------------- -------------- -------------------
10857770301722 16.06.2011 16:19:38 10857664034502 16.06.2011 03:09:49 10856081424810 15.06.2011 20:41:59
10857770706836 16.06.2011 16:22:11 10857664034502 16.06.2011 03:09:49 10856081424810 15.06.2011 20:41:59

Additional checks (KPI 3)

set lines 166
set numf 99999999999999999
select thread# trd, sequence#, first_change#, next_change#, dict_begin beg, dict_end "END",
       to_char(timestamp, 'hh:mi:ss') timestamp,
       (case when l.next_change# < p.read_scn then 'YES'
             when l.first_change# < p.applied_scn then 'CURRENT'
             else 'NO' end) applied
from dba_logstdby_log l, dba_logstdby_progress p
order by thread#, first_change#;

TRD SEQUENCE# FIRST_CHANGE#      NEXT_CHANGE#       BEG END TIMESTAM APPLIED
--- --------- ------------------ ------------------ --- --- -------- -------
  1      1900     10854889739662     10855983809311 NO  NO  12:08:40 YES
  1      1901     10855983809311     10855986222881 NO  NO  01:11:39 YES
  1      1903     10855993572127     10856005620352 NO  NO  08:19:07 YES
  1      1904     10856005620352     10856011936221 NO  NO  10:22:09 YES

Customizing a Logical Standby Database to Replicate or Work Around

GOAL: specify rules, for example to skip the replication of a table or a set of tables at the logical standby database.

NOTE: 100% of the redo is always transferred to the standby database. The skipping only applies to what SQL Apply will actually process at the standby database with that redo.

HOW: DBMS_LOGSTDBY.SKIP to skip replication of table(s).

... on the standby side, of course!

Case: Skipping a table that caused the SQL apply to slow down / stop

Our case: slow to almost no progress in the SQL apply. After the duplication an 83 million row table was created on the primary side. SQL Apply tried to bring that biggy across, but that took forever with little progress. We contacted Oracle and they came up with the following approach.

Stop the logical standby apply:

alter database stop logical standby apply;

Execute the skip statement:

exec dbms_logstdby.skip('DML','SCOTT','T_USAGE_DATA_ARCHIVE',null);

Restart the logical standby apply:

alter database start logical standby apply;

Identify skip rules in place:

col error format a10
col statement_opt format a20
col owner format a10
col name format a40
col proc format a30
set lines 144
select * from dba_logstdby_skip order by statement_opt;

Instantiate a Table 1 (because then we felt sorry for skipping)

Right after the duplication we had issues with a large archive table that was filled with 80 million rows just over night, and SQL Apply had a hard time catching up with that, slowing the complete SQL apply down dramatically. After consulting with the supplier and the functional department it was decided to define a skip rule for this table. After that the SQL apply was able to catch up easily.

After a while it turned out that the table was still needed on the standby side. So we needed to explore the instantiate-table procedure that Oracle offers (Step by Step Approach of How to Instantiate a Table in Logical Standby Database [ID 842160.1]).

Instantiate a Table 2

Best Practice …

MAKE DOUBLE SURE YOUR UNDO TABLESPACE IS LARGE ENOUGH BEFORE EVEN STARTING.

Even though it is not documented that specifically, make sure that the schema that will hold the instantiated table has the following extra privileges:

SQL> grant SELECT_CATALOG_ROLE to scott;
SQL> grant EXP_FULL_DATABASE to scott;
SQL> grant IMP_FULL_DATABASE to scott;

Instantiate a Table 3

1. At the primary, as sys user, grant the roles below to the user whose table(s) is being re-instantiated:

SQL> grant SELECT_CATALOG_ROLE to scott;
SQL> grant EXP_FULL_DATABASE to scott;
SQL> grant IMP_FULL_DATABASE to scott;

2. As sys user, create a connected-user database link at the logical standby:

SQL> create public database link link_cherry connect to scott identified by tiger using 'cherry';

3. Verify the database link from the logical standby to ensure that it gets its info from the primary:

SQL> select db_unique_name, database_role from v$database@link_cherry;

DB_UNIQUE_NAME                 DATABASE_ROLE
------------------------------ ----------------
cherry                         PRIMARY

Instantiate a Table 4

4. Stop SQL apply if it is running (you can skip this step otherwise):

SQL> select count(*) from v$logstdby;   -- returns 0 if SQL apply is not running
SQL> alter database stop logical standby apply;

If the broker is in place:
dgmgrl /
edit database 'standby db unique name' set state='LOG-APPLY OFF';
exit

5. Remove the skip rules, if there are any, from the table that is being instantiated:

SQL> select STATEMENT_OPT, NAME from DBA_LOGSTDBY_SKIP where OWNER='SCOTT';
SQL> exec dbms_logstdby.unskip('SCHEMA_DDL', 'SCOTT','TEST');
SQL> exec dbms_logstdby.unskip('DML', 'SCOTT','TEST');

Instantiate a Table 5

6. From the logical standby, start instantiating the table as sys user in sqlplus:

SQL> exec dbms_logstdby.instantiate_table('SCOTT','TEST','LINK_CHERRY')

Note: ensure that the schema name, table name and dblink name are in uppercase. This procedure recreates the table that is being instantiated; this can be confirmed from the dba_objects.created column on the logical standby.

7. Once you are done with the instantiation, resume SQL apply:

SQL> alter database start logical standby apply;

If the broker is in place:
dgmgrl /
edit database 'standby db unique name' set state='ONLINE';
exit

8. At the primary, revoke the roles granted to the application user (scott):

SQL> revoke SELECT_CATALOG_ROLE from scott;
SQL> revoke EXP_FULL_DATABASE from scott;
SQL> revoke IMP_FULL_DATABASE from scott;

On call Support - 1

Both on the primary side and on the standby side we try to keep a pool of candidate disks ready for when ASM runs out of space.

Stop the apply on the standby side: alter database stop logical standby apply;

Add the disk(s) (see the ASM sketch after these steps)

Start the apply on the standby side: alter database start logical standby apply immediate;
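A hedged sketch of the middle step, run against the ASM instance on the side that is short on space (diskgroup and device names are placeholders):

-- which disks are still candidates?
SELECT name, path, header_status FROM v$asm_disk WHERE header_status = 'CANDIDATE';

ALTER DISKGROUP MYDBSB_FRA01 ADD DISK '/dev/mapper/asm_fra_disk09';

-- follow the rebalance:
SELECT group_number, operation, state, est_minutes FROM v$asm_operation;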

On call Support - 2

Creating a tablespace or adding/resizing datafiles in a logical standby.


Standby side:

Resizing a datafile is always possible. Adding a datafile to a tablespace takes a few extra steps (sqlplus):

– alter database guard standby;
– add the datafile, for instance: ALTER TABLESPACE SYSAUX ADD DATAFILE '+MYSBDB_DATA01' SIZE 1000M;
– alter database guard all;

Primary side:

Creating a tablespace on the primary did work well but caused side effects on the logical standby side. BEST PRACTICE could be (a possible skip-rule sketch follows below):

– define a skip rule on the standby database for that tablespace,
– then create the tablespace on the primary side, and then
– create the same tablespace on the standby side.
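One possible way to implement that best practice is to let SQL Apply skip all tablespace DDL on the standby, so tablespaces stay under manual control there (a sketch, on the standby side):

ALTER DATABASE STOP LOGICAL STANDBY APPLY;
EXECUTE DBMS_LOGSTDBY.SKIP(stmt => 'TABLESPACE');
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;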

Where can we improve..

Monitoring (automated)

Network traffic (if needed)

….. And of course the Application ..

Used Information

Oracle® Data Guard Concepts and Administration 11g Release 1 (11.1)

Part Number B28294-03

Oracle Data Guard 11g Handbook

Metalink

DataGuard (11gr1)

DataGuard (11gr2)

And a big Thank you for your TIME!