Data Guard Performance
Oracle Active Data Guard
Performance
Joseph Meeks
Director, Product Management
Oracle High Availability Systems
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Note to viewer
These slides provide performance data for various aspects of
Data Guard and Active Data Guard – we are in the process of
updating them for Oracle Database 12c.
They can be shared with customers, but are not intended as a
canned presentation ready to be delivered in its entirety.
They provide SCs with data that can be used to substantiate Data Guard
performance or to provide focused answers to particular concerns
that may be expressed by customers.
Note to viewer
See this FAQ for more customer and sales collateral
– http://database.us.oracle.com/pls/htmldb/f?p=301:75:101451461043366::::P75_ID,P75_AREAID:21704,2
Agenda – Data Guard Performance
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Data Guard 12.1 Example - Faster Failover
[Chart: failover completes in 48 seconds vs. 43 seconds, each measured with 2,000 database sessions on both primary and standby]
Data Guard 12.1 Example – Faster Switchover
[Chart: switchover completes in 72 seconds with 1,000 database sessions and 83 seconds with 500 database sessions on both primary and standby]
Agenda – Data Guard Performance
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Synchronous Redo Transport
Zero Data Loss
Primary database performance is impacted by the total round-trip time for
acknowledgement to be received from the standby database
– The Data Guard NSS process transmits redo to the standby directly from the log
buffer, in parallel with the local log file write
– The standby receives the redo, writes it to a standby redo log file (SRL), then returns an ACK
– The primary receives the standby ACK, then acknowledges commit success to the application
The following performance tests show the impact of SYNC transport on the
primary database using various workloads and latencies
In all cases, transport was able to keep pace with redo generation – no lag
We are working on test data for Fast Sync (SYNC NOAFFIRM) in Oracle
Database 12c (same process as above, but the standby acknowledges the primary as
soon as redo is received in memory – it does not wait for the SRL write)
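The commit-latency behavior described above can be approximated with a simple model: redo ships to the standby in parallel with the local log write, so a commit waits for whichever path finishes last. This is a sketch only; the timing values below are hypothetical examples, not measurements from these tests:

```python
def sync_commit_latency_ms(local_write_ms, rtt_ms, standby_srl_write_ms,
                           fast_sync=False):
    """Approximate commit latency under Data Guard SYNC transport.

    The remote path is the network round trip plus the standby SRL write.
    With Fast Sync (SYNC NOAFFIRM), the standby ACKs once redo is in
    memory, so the SRL write drops out of the remote path.
    """
    remote_ms = rtt_ms + (0.0 if fast_sync else standby_srl_write_ms)
    # Local write and remote path run in parallel; commit waits on the slower.
    return max(local_write_ms, remote_ms)

# Hypothetical figures: 1 ms local write, 5 ms RTT, 2 ms standby SRL write
print(sync_commit_latency_ms(1.0, 5.0, 2.0))                  # 7.0 (SYNC AFFIRM)
print(sync_commit_latency_ms(1.0, 5.0, 2.0, fast_sync=True))  # 5.0 (Fast Sync)
```

This also shows why low-latency networks matter most for SYNC: once the remote path exceeds the local write, every additional millisecond of RTT adds directly to commit time.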
Test 1) Synchronous Redo Transport
OLTP with Random Small Insert < 1ms RTT Network Latency
Workload:
– Random small inserts (OLTP) to 9 tables with 787 commits per second
– 132 K redo size, 1368 logical reads, 692 block changes per transaction
Sun Fire X4800 M2 (Exadata X2-8)
– 1 TB RAM, 64 Cores, Oracle Database 11.2.0.3, Oracle Linux
– InfiniBand, seven Exadata cells, Exadata Software 11.2.3.2
Exadata Smart Flash, Smart Flash Logging and Write-Back flash
enabled provided significant gains
Test 1) Synchronous Redo Transport
OLTP with Random Small Inserts and < 1ms RTT Network Latency
Local standby, <1ms RTT, ~99 MB/sec redo rate
– Data Guard synchronous transport enabled: 787 txn/sec, 104,051,368.80 bytes/sec redo
– Data Guard transport disabled: 790.6 txn/sec, 104,143,368.00 bytes/sec redo
– <1% impact on database throughput, ~1% impact on transaction rate
RTT = network round trip time
Test 2) Synchronous Redo Transport
Swingbench OLTP Workload with Metro-Area Network Latency
Exadata X2-8, 2-node RAC database
– smart flash logging, smart write back flash
Swingbench OLTP workload
– Random DMLs, 1 ms think time, 400 users, 6000+ transactions per
second, 30MB/s peak redo rate (different from Test 1)
Transaction profile
– 5K redo size, 120 logical reads, 30 block changes per transaction
1 and 5ms RTT network latency
Test 2) Synchronous Redo Transport
Swingbench OLTP Workload with Metro-Area Network Latency
Swingbench OLTP, 30 MB/s redo
– Baseline, no Data Guard: 6,363 tps
– Data Guard SYNC, 1ms RTT network latency: 6,151 tps (3% impact)
– Data Guard SYNC, 5ms RTT network latency: 6,077 tps (5% impact)
Test 3) Synchronous Redo Transport
Large Insert OLTP Workload with Metro-Area Network Latency
Exadata X2-8, 2-node RAC database
– smart flash logging, smart write back flash
Large insert OLTP workload
– 180+ transactions per second, 83MB/s peak redo rate, random tables
Transaction profile
– 440K redo size, 6000 logical reads, 2100 block changes per transaction
1, 2 and 5ms RTT network latency
Test 3) Synchronous Redo Transport
Large Insert OLTP Workload with Metro-Area Network Latency
Large Insert OLTP, 83 MB/s redo
– Baseline, no Data Guard: 189 tps
– 1ms RTT network latency: 188 tps (<1% impact)
– 2ms RTT network latency: 177 tps (7% impact)
– 5ms RTT network latency: 167 tps (12% impact)
Test 4) Synchronous Redo Transport
Mixed OLTP workload with Metro-Area Network Latency
Exadata X2-8, 2-node RAC database
– smart flash logging, smart write back flash
Mixed workload with high TPS
– Swingbench plus large insert workloads
– 26000+ txn per second and 112 MB/sec peak redo rate
Transaction profile
– 4K redo size, 51 logical reads, 22 block changes per transaction
1, 2 and 5ms RTT network latency
Test 4) Synchronous Redo Transport
Mixed OLTP workload with Metro-Area Network Latency
Swingbench plus large insert, 112 MB/s redo
– 3% impact at <1ms RTT, 5% impact at 2ms RTT, 6% impact at 5ms RTT

Latency    Txns/s    Redo Rate (MB/sec)    % of Workload
No SYNC    29,496    116                   100%
0ms        28,751    112                    97%
2ms        27,995    109                    95%
5ms        27,581    107                    94%
10ms       26,860    104                    91%
20ms       26,206    102                    89%

Note: 0ms latency on the graph represents values falling in the range <1ms
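The "% of Workload" column in the table above is simply each throughput divided by the no-SYNC baseline. A quick arithmetic check using the table's values:

```python
baseline = 29496  # Txns/s with SYNC transport disabled
tps = {"0ms": 28751, "2ms": 27995, "5ms": 27581, "10ms": 26860, "20ms": 26206}

for latency, txns in tps.items():
    # Percentage of baseline workload sustained at this network latency
    pct = round(100 * txns / baseline)
    print(f"{latency}: {pct}% of baseline")
```

Running this reproduces the 97/95/94/91/89% figures shown on the slide.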
Additional SYNC Configuration Details
For the Previous Series of Synchronous Transport Tests
No system bottlenecks (CPU, I/O, or memory) were encountered during
any of the test runs
– Primary and standby databases had 4GB online redo logs
– Log buffer was set to the maximum of 256MB
– OS max TCP socket buffer size was set to 128MB on both primary and standby
– Oracle Net was configured on both sides to send and receive 128MB, with an
SDU of 32K
– Redo was shipped over a 10GigE network between the two systems
– Approximately 8-12 checkpoints/log switches occurred per run
Customer References for SYNC Transport
Fannie Mae Case Study that includes performance data
Other SYNC references
– Amazon
– Intel
– MorphoTrak – prior biometrics division of Motorola, case study, podcast, presentation
– Enterprise Holdings
– Discover Financial Services, podcast, presentation
– Paychex
– VocaLink
Synchronous Redo Transport
Caveat that Applies to ALL SYNC Performance Comparisons
Redo rates achieved are influenced by network latency, redo-write
size, and commit concurrency – a dynamic relationship that will vary
for every environment and application
Test results illustrate how an example workload can scale with minimal
impact on primary database performance
Actual mileage will vary with each application and environment.
Oracle recommends that customers conduct their own tests using their
own workload and environment; Oracle's tests are not a substitute.
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Asynchronous Redo Transport
Near Zero Data Loss
With ASYNC, the primary does not wait for acknowledgement from the standby
The Data Guard NSA process transmits directly from the log buffer, in parallel with
the local log file write
– NSA reads from disk (the online redo log file) if the log buffer is recycled before redo
transmission is completed
ASYNC has minimal impact on primary database performance
Network latency has little, if any, impact on transport throughput
– Uses the Data Guard 11g streaming protocol & correctly sized TCP send/receive buffers
Performance tests are useful to characterize the max redo volume that ASYNC is
able to support without transport lag
– The goal is to ship redo as fast as it is generated without impacting primary performance
Asynchronous Test Configuration
Details
100GB online redo logs
Log buffer set to the maximum of 256MB
OS max TCP socket buffer size set to 128MB on primary and standby
Oracle Net configured on both sides to send and receive 128MB
Read buffer size set to 256 (_log_read_buffer_size=256) and archive buffers
set to 256 (_log_archive_buffers=256) on primary and standby
Redo shipped over the InfiniBand network between primary and standby nodes
(ensures that transport is not bandwidth constrained)
– Near-zero network latency, approximate throughput of 1200MB/sec
ASYNC Redo Transport Performance Test
Oracle Database 11.2.
Data Guard ASYNC transport can sustain very high redo rates
‒ 484 MB/sec on a single node
‒ Zero transport lag
Add RAC nodes to scale transport performance
‒ Each node generates its own redo thread and has a
dedicated Data Guard transport process
‒ Performance will scale as nodes are added assuming
adequate CPU, I/O, and network resources
A 10GigE NIC on the standby receives data at a
maximum of 1.2 GB/second
‒ The standby can be configured to receive redo across two
or more instances
[Chart: redo transport rate of 484 MB/sec, single instance]
Data Guard 11g Streaming Network Protocol
High Network Latency has Negligible Impact on Network Throughput
[Chart: redo transport rate (MB/sec) at 0ms, 25ms, 50ms, and 100ms RTT network latency]
Streaming protocol is new with Data Guard 11g
Test measured throughput with 0 – 100ms RTT
ASYNC tuning best practices
– Set correct TCP send/receive buffer size = 3 x BDP (bandwidth delay product)
BDP = bandwidth x round-trip network latency
– Increase log buffer size if needed to keep the NSA
process reading from memory
Use X$LOGBUF_READHIST to determine the log buffer hit rate
See support note 951152.1
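The buffer sizing rule above is straightforward to compute. A minimal sketch; the 1 Gbit/sec link and 25 ms RTT below are hypothetical examples, not figures from these tests:

```python
def tcp_buffer_size_bytes(bandwidth_mbit_per_sec, rtt_ms, multiplier=3):
    """TCP send/receive buffer = multiplier x BDP.

    BDP (bandwidth delay product) = bandwidth x round-trip network latency.
    """
    bytes_per_sec = bandwidth_mbit_per_sec * 1_000_000 / 8
    bdp_bytes = bytes_per_sec * (rtt_ms / 1000)
    return int(multiplier * bdp_bytes)

# Hypothetical WAN link: 1 Gbit/sec with 25 ms RTT
print(tcp_buffer_size_bytes(1000, 25))  # 9375000 bytes (~9 MB)
```

The point of the 3x multiplier is headroom: an undersized socket buffer caps throughput at buffer/RTT regardless of how much bandwidth is available.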
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Multi-Standby Configuration
[Diagram: Primary A ships SYNC to Local Standby B and ASYNC to Remote Standby C]
A growing number of customers use multi-standby Data Guard configurations
Additional standbys are used for:
– Local zero data loss HA failover combined with remote DR
– Rolling maintenance to reduce planned downtime
– Offloading backups, reporting, and recovery from the primary
– Reader farms – scale read-only performance
This leads to the question: how is primary database performance affected
as the number of remote transport destinations increases?
Redo Transport in Multi-Standby Configuration
Primary Performance Impact: 14 Asynchronous Transport Destinations
[Charts: increase in CPU and change in redo volume on the primary (compared to baseline) as the number of ASYNC destinations grows from 0 to 14]
Redo Transport in Multi-Standby Configuration
Primary Performance Impact: 1 SYNC and multiple ASYNC Destinations
[Charts: increase in CPU and change in redo volume on the primary (compared to baseline) with 0, 1/0, 1/1, and 1/14 SYNC/ASYNC destinations]
Redo Transport for Gap Resolution
Standby databases can be configured to request log files needed to
resolve gaps from other standbys in a multi-standby configuration
A standby database that is local to the primary database is normally
the preferred location to service gap requests
– A local standby database is least likely to be impacted by network outages
– Other standbys are listed next
– The primary database services gap requests only as a last resort
– Utilizing a standby for gap resolution avoids any overhead on the primary
database
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Redo Transport Compression
Conserve Bandwidth and Improve RPO when Bandwidth Constrained
Test configuration
– 12.5 MB/second available network bandwidth
– 22 MB/second redo volume
Uncompressed volume exceeds available bandwidth
– Recovery Point Objective (RPO) impossible to achieve
– Perpetual increase in transport lag
50% compression ratio results in:
– Compressed volume (12 MB/sec) < bandwidth = RPO achieved
– Compression ratio will vary across workloads
[Chart: transport lag in MB vs. elapsed time in minutes – lag grows continuously at 22 MB/sec uncompressed, stays near zero at 12 MB/sec compressed]
Requires the Advanced Compression option
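The lag behavior on that slide follows directly from the arithmetic: whenever redo volume exceeds available bandwidth, the shortfall accumulates as transport lag. A minimal model using the slide's own figures:

```python
def transport_lag_mb(redo_mb_per_sec, bandwidth_mb_per_sec, minutes):
    """Accumulated transport lag after `minutes` of sustained redo generation.

    Lag grows only while redo volume exceeds available bandwidth;
    otherwise transport keeps pace and lag stays at zero.
    """
    shortfall = max(0.0, redo_mb_per_sec - bandwidth_mb_per_sec)
    return shortfall * minutes * 60

# Figures from the test: 12.5 MB/sec bandwidth
print(transport_lag_mb(22, 12.5, 4))  # uncompressed: 2280.0 MB lag after 4 min
print(transport_lag_mb(12, 12.5, 4))  # compressed: 0.0 MB – RPO achievable
```

This is why compression helps RPO only when it brings redo volume below the bandwidth ceiling; above it, lag grows without bound.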
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Standby Apply Performance Test
Redo apply was first disabled to accumulate a large number of log files
at the standby database. Redo apply was then restarted to evaluate the
maximum apply rate for this workload.
All standby log files were written to disk in the Fast Recovery Area
Exadata Write-Back Flash Cache increased the redo apply rate from
72MB/second to 174MB/second using the test workload (Oracle 11.2.0.3)
– Apply rates will vary based upon platform and workload
Achieved volumes do not represent physical limits
– They only represent this particular test configuration and workload;
higher apply rates have been achieved in practice by production customers
Apply Performance at Standby Database
Test 1: no write-back flash cache
On Exadata X2-2 quarter rack
Swingbench OLTP workload
72 MB/second apply rate
– I/O bound during checkpoints
– 1,762ms for checkpoint complete
– 110ms DB File Parallel Write
Apply Performance at Standby Database
Test 2: a repeat of the previous test, but with write-back flash cache enabled
On Exadata X2-2 quarter rack
Swingbench OLTP workload
174 MB/second apply rate
– Checkpoint completes in 633ms vs 1,762ms
– DB File Parallel Write is 21ms vs 110ms
Two Production Customer Examples
Data Guard Redo Apply Performance
Thomson-Reuters
– Data Warehouse on Exadata, prior to write-back flash cache
– While resolving a gap, an average apply rate of 580MB/second was observed
Allstate Insurance
– Data Warehouse ETL processing resulted in an average apply rate of
668MB/second over a 3 hour period, with peaks hitting 900MB/second
Redo Apply Performance for Different Releases
Range of Observed Apply Rates for Batch and OLTP
[Chart: range of observed standby apply rates (MB/sec, up to ~700) for high-end batch and high-end OLTP workloads across Oracle Database 9i, 10g, 11g (non-Exadata), and 11g (Exadata)]