Data Guard Performance
Oracle Active Data Guard
Performance
Joseph Meeks
Director, Product Management
Oracle High Availability Systems
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Note to viewer
These slides provide performance data for various aspects of
Data Guard and Active Data Guard – we are in the process of
updating them for Oracle Database 12c.
They can be shared with customers, but are not intended as a
canned presentation ready to be delivered in its entirety.
They provide SCs with data that can be used to substantiate Data Guard
performance or to provide focused answers to particular concerns
that may be expressed by customers.
Note to viewer
See this FAQ for more customer and sales collateral
– http://database.us.oracle.com/pls/htmldb/f?p=301:75:101451461043366::::P75_ID,P75_AREAID:21704,2
Agenda – Data Guard Performance
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Data Guard 12.1 Example - Faster Failover
[Chart: failover completes in 48 seconds vs. 43 seconds, each measured with 2,000 database sessions on both primary and standby]
Data Guard 12.1 Example – Faster Switchover
[Chart: switchover completes in 72 seconds with 1,000 database sessions and 83 seconds with 500 database sessions on both primary and standby]
Agenda – Data Guard Performance
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Synchronous Redo Transport
Zero Data Loss
Primary database performance is impacted by the total round-trip time for
acknowledgement to be received from the standby database
– The Data Guard NSS process transmits redo to the standby directly from the log
buffer, in parallel with the local log file write
– The standby receives the redo, writes it to a standby redo log file (SRL), then returns an ACK
– The primary receives the standby ACK, then acknowledges commit success to the application
The following performance tests show the impact of SYNC transport on the
primary database using various workloads and latencies
In all cases, transport was able to keep pace with redo generation – no lag
We are working on test data for Fast Sync (SYNC NOAFFIRM) in Oracle
Database 12c (same process as above, but the standby acknowledges the primary as
soon as redo is received in memory – it does not wait for the SRL write)
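The commit-latency behavior described above can be approximated with a simple model: redo ships to the standby in parallel with the local log write, so a commit waits for whichever path finishes last. This is a sketch only; the timing values below are hypothetical examples, not measurements from these tests:

```python
def sync_commit_latency_ms(local_write_ms, rtt_ms, standby_srl_write_ms,
                           fast_sync=False):
    """Approximate commit latency under Data Guard SYNC transport.

    The remote path is the network round trip plus the standby SRL write.
    With Fast Sync (SYNC NOAFFIRM), the standby ACKs once redo is in
    memory, so the SRL write drops out of the remote path.
    """
    remote_ms = rtt_ms + (0.0 if fast_sync else standby_srl_write_ms)
    # Local write and remote path run in parallel; commit waits on the slower.
    return max(local_write_ms, remote_ms)

# Hypothetical figures: 1 ms local write, 5 ms RTT, 2 ms standby SRL write
print(sync_commit_latency_ms(1.0, 5.0, 2.0))                  # 7.0 (SYNC AFFIRM)
print(sync_commit_latency_ms(1.0, 5.0, 2.0, fast_sync=True))  # 5.0 (Fast Sync)
```

This also shows why low-latency networks matter most for SYNC: once the remote path exceeds the local write, every additional millisecond of RTT adds directly to commit time.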
Test 1) Synchronous Redo Transport
OLTP with Random Small Insert < 1ms RTT Network Latency
Workload:
– Random small inserts (OLTP) to 9 tables with 787 commits per second
– 132 K redo size, 1368 logical reads, 692 block changes per transaction
Sun Fire X4800 M2 (Exadata X2-8)
– 1 TB RAM, 64 Cores, Oracle Database 11.2.0.3, Oracle Linux
– InfiniBand, seven Exadata cells, Exadata Software 11.2.3.2
Exadata Smart Flash, Smart Flash Logging and Write-Back flash
enabled provided significant gains
Test 1) Synchronous Redo Transport
OLTP with Random Small Inserts and < 1ms RTT Network Latency
Local standby, <1ms RTT, ~99 MB/sec redo rate
– Data Guard synchronous transport enabled: 787 txn/sec, 104,051,368.80 bytes/sec redo
– Data Guard transport disabled: 790.6 txn/sec, 104,143,368.00 bytes/sec redo
– <1% impact on database throughput, ~1% impact on transaction rate
RTT = network round trip time
Test 2) Synchronous Redo Transport
Swingbench OLTP Workload with Metro-Area Network Latency
Exadata X2-8, 2-node RAC database
– smart flash logging, smart write back flash
Swingbench OLTP workload
– Random DMLs, 1 ms think time, 400 users, 6000+ transactions per
second, 30MB/s peak redo rate (different from Test 1)
Transaction profile
– 5K redo size, 120 logical reads, 30 block changes per transaction
1 and 5ms RTT network latency
Test 2) Synchronous Redo Transport
Swingbench OLTP Workload with Metro-Area Network Latency
Swingbench OLTP, 30 MB/s redo
– Baseline, no Data Guard: 6,363 tps
– Data Guard SYNC, 1ms RTT network latency: 6,151 tps (3% impact)
– Data Guard SYNC, 5ms RTT network latency: 6,077 tps (5% impact)
Test 3) Synchronous Redo Transport
Large Insert OLTP Workload with Metro-Area Network Latency
Exadata X2-8, 2-node RAC database
– smart flash logging, smart write back flash
Large insert OLTP workload
– 180+ transactions per second, 83MB/s peak redo rate, random tables
Transaction profile
– 440K redo size, 6000 logical reads, 2100 block changes per transaction
1, 2 and 5ms RTT network latency
Test 3) Synchronous Redo Transport
Large Insert OLTP Workload with Metro-Area Network Latency
Large Insert OLTP, 83 MB/s redo
– Baseline, no Data Guard: 189 tps
– 1ms RTT network latency: 188 tps (<1% impact)
– 2ms RTT network latency: 177 tps (7% impact)
– 5ms RTT network latency: 167 tps (12% impact)
Test 4) Synchronous Redo Transport
Mixed OLTP workload with Metro-Area Network Latency
Exadata X2-8, 2-node RAC database
– smart flash logging, smart write back flash
Mixed workload with high TPS
– Swingbench plus large insert workloads
– 26000+ txn per second and 112 MB/sec peak redo rate
Transaction profile
– 4K redo size, 51 logical reads, 22 block changes per transaction
1, 2 and 5ms RTT network latency
Test 4) Synchronous Redo Transport
Mixed OLTP workload with Metro-Area Network Latency
Swingbench plus large insert, 112 MB/s redo
– 3% impact at <1ms RTT, 5% impact at 2ms RTT, 6% impact at 5ms RTT

Latency    Txns/s    Redo Rate (MB/sec)    % of Workload
No SYNC    29,496    116                   100%
0ms        28,751    112                    97%
2ms        27,995    109                    95%
5ms        27,581    107                    94%
10ms       26,860    104                    91%
20ms       26,206    102                    89%

Note: 0ms latency on the graph represents values falling in the range <1ms
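The "% of Workload" column in the table above is simply each throughput divided by the no-SYNC baseline. A quick arithmetic check using the table's values:

```python
baseline = 29496  # Txns/s with SYNC transport disabled
tps = {"0ms": 28751, "2ms": 27995, "5ms": 27581, "10ms": 26860, "20ms": 26206}

for latency, txns in tps.items():
    # Percentage of baseline workload sustained at this network latency
    pct = round(100 * txns / baseline)
    print(f"{latency}: {pct}% of baseline")
```

Running this reproduces the 97/95/94/91/89% figures shown on the slide.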
Additional SYNC Configuration Details
For the Previous Series of Synchronous Transport Tests
No system bottlenecks (CPU, I/O, or memory) were encountered during
any of the test runs
– Primary and standby databases had 4GB online redo logs
– Log buffer was set to the maximum of 256MB
– OS max TCP socket buffer size was set to 128MB on both primary and standby
– Oracle Net was configured on both sides to send and receive 128MB, with an
SDU of 32K
– Redo was shipped over a 10GigE network between the two systems
– Approximately 8-12 checkpoints/log switches occurred per run
Customer References for SYNC Transport
Fannie Mae Case Study that includes performance data
Other SYNC references
– Amazon
– Intel
– MorphoTrak – prior biometrics division of Motorola, case study, podcast, presentation
– Enterprise Holdings
– Discover Financial Services, podcast, presentation
– Paychex
– VocaLink
Synchronous Redo Transport
Caveat that Applies to ALL SYNC Performance Comparisons
Redo rates achieved are influenced by network latency, redo-write
size, and commit concurrency – a dynamic relationship that will vary
for every environment and application
Test results illustrate how an example workload can scale with minimal
impact on primary database performance
Actual mileage will vary with each application and environment.
Oracle recommends that customers conduct their own tests using their
own workload and environment; Oracle's tests are not a substitute.
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Asynchronous Redo Transport
Near Zero Data Loss
With ASYNC, the primary does not wait for acknowledgement from the standby
The Data Guard NSA process transmits directly from the log buffer, in parallel with
the local log file write
– NSA reads from disk (the online redo log file) if the log buffer is recycled before redo
transmission is completed
ASYNC has minimal impact on primary database performance
Network latency has little, if any, impact on transport throughput
– Uses the Data Guard 11g streaming protocol & correctly sized TCP send/receive buffers
Performance tests are useful to characterize the max redo volume that ASYNC is
able to support without transport lag
– The goal is to ship redo as fast as it is generated without impacting primary performance
Asynchronous Test Configuration
Details
100GB online redo logs
Log buffer set to the maximum of 256MB
OS max TCP socket buffer size set to 128MB on primary and standby
Oracle Net configured on both sides to send and receive 128MB
Read buffer size set to 256 (_log_read_buffer_size=256) and archive buffers
set to 256 (_log_archive_buffers=256) on primary and standby
Redo shipped over the InfiniBand network between primary and standby nodes
(ensures that transport is not bandwidth constrained)
– Near-zero network latency, approximate throughput of 1200MB/sec
ASYNC Redo Transport Performance Test
Oracle Database 11.2.
Data Guard ASYNC transport can sustain very high redo rates
‒ 484 MB/sec on a single node
‒ Zero transport lag
Add RAC nodes to scale transport performance
‒ Each node generates its own redo thread and has a
dedicated Data Guard transport process
‒ Performance will scale as nodes are added assuming
adequate CPU, I/O, and network resources
A 10GigE NIC on the standby receives data at a
maximum of 1.2 GB/second
‒ The standby can be configured to receive redo across two
or more instances
[Chart: redo transport rate of 484 MB/sec, single instance]
Data Guard 11g Streaming Network Protocol
High Network Latency has Negligible Impact on Network Throughput
[Chart: redo transport rate (MB/sec) at 0ms, 25ms, 50ms, and 100ms RTT network latency]
Streaming protocol is new with Data Guard 11g
Test measured throughput with 0 – 100ms RTT
ASYNC tuning best practices
– Set correct TCP send/receive buffer size = 3 x BDP (bandwidth delay product)
BDP = bandwidth x round-trip network latency
– Increase log buffer size if needed to keep the NSA
process reading from memory
Use X$LOGBUF_READHIST to determine the log buffer hit rate
See support note 951152.1
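The buffer sizing rule above is straightforward to compute. A minimal sketch; the 1 Gbit/sec link and 25 ms RTT below are hypothetical examples, not figures from these tests:

```python
def tcp_buffer_size_bytes(bandwidth_mbit_per_sec, rtt_ms, multiplier=3):
    """TCP send/receive buffer = multiplier x BDP.

    BDP (bandwidth delay product) = bandwidth x round-trip network latency.
    """
    bytes_per_sec = bandwidth_mbit_per_sec * 1_000_000 / 8
    bdp_bytes = bytes_per_sec * (rtt_ms / 1000)
    return int(multiplier * bdp_bytes)

# Hypothetical WAN link: 1 Gbit/sec with 25 ms RTT
print(tcp_buffer_size_bytes(1000, 25))  # 9375000 bytes (~9 MB)
```

The point of the 3x multiplier is headroom: an undersized socket buffer caps throughput at buffer/RTT regardless of how much bandwidth is available.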
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Multi-Standby Configuration
[Diagram: Primary A ships SYNC to Local Standby B and ASYNC to Remote Standby C]
A growing number of customers use multi-standby Data Guard configurations
Additional standbys are used for:
– Local zero data loss HA failover combined with remote DR
– Rolling maintenance to reduce planned downtime
– Offloading backups, reporting, and recovery from the primary
– Reader farms – scale read-only performance
This leads to the question: how is primary database performance affected
as the number of remote transport destinations increases?
Redo Transport in Multi-Standby Configuration
Primary Performance Impact: 14 Asynchronous Transport Destinations
[Charts: increase in CPU and change in redo volume on the primary (compared to baseline) as the number of ASYNC destinations grows from 0 to 14]
Redo Transport in Multi-Standby Configuration
Primary Performance Impact: 1 SYNC and multiple ASYNC Destinations
[Charts: increase in CPU and change in redo volume on the primary (compared to baseline) with 0, 1/0, 1/1, and 1/14 SYNC/ASYNC destinations]
Redo Transport for Gap Resolution
Standby databases can be configured to request log files needed to
resolve gaps from other standbys in a multi-standby configuration
A standby database that is local to the primary database is normally
the preferred location to service gap requests
– A local standby database is least likely to be impacted by network outages
– Other standbys are listed next
– The primary database services gap requests only as a last resort
– Utilizing a standby for gap resolution avoids any overhead on the primary
database
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Redo Transport Compression
Conserve Bandwidth and Improve RPO when Bandwidth Constrained
Test configuration
– 12.5 MB/second available network bandwidth
– 22 MB/second redo volume
Uncompressed volume exceeds available bandwidth
– Recovery Point Objective (RPO) impossible to achieve
– Perpetual increase in transport lag
50% compression ratio results in:
– Compressed volume (12 MB/sec) < bandwidth = RPO achieved
– Compression ratio will vary across workloads
[Chart: transport lag in MB vs. elapsed time in minutes – lag grows continuously at 22 MB/sec uncompressed, stays near zero at 12 MB/sec compressed]
Requires the Advanced Compression option
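The lag behavior on that slide follows directly from the arithmetic: whenever redo volume exceeds available bandwidth, the shortfall accumulates as transport lag. A minimal model using the slide's own figures:

```python
def transport_lag_mb(redo_mb_per_sec, bandwidth_mb_per_sec, minutes):
    """Accumulated transport lag after `minutes` of sustained redo generation.

    Lag grows only while redo volume exceeds available bandwidth;
    otherwise transport keeps pace and lag stays at zero.
    """
    shortfall = max(0.0, redo_mb_per_sec - bandwidth_mb_per_sec)
    return shortfall * minutes * 60

# Figures from the test: 12.5 MB/sec bandwidth
print(transport_lag_mb(22, 12.5, 4))  # uncompressed: 2280.0 MB lag after 4 min
print(transport_lag_mb(12, 12.5, 4))  # compressed: 0.0 MB – RPO achievable
```

This is why compression helps RPO only when it brings redo volume below the bandwidth ceiling; above it, lag grows without bound.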
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Standby Apply Performance Test
Redo apply was first disabled to accumulate a large number of log files
at the standby database. Redo apply was then restarted to evaluate the
maximum apply rate for this workload.
All standby log files were written to disk in the Fast Recovery Area
Exadata Write-Back Flash Cache increased the redo apply rate from
72MB/second to 174MB/second using the test workload (Oracle 11.2.0.3)
– Apply rates will vary based upon platform and workload
Achieved volumes do not represent physical limits
– They only represent this particular test configuration and workload;
higher apply rates have been achieved in practice by production customers
Apply Performance at Standby Database
Test 1: no write-back flash cache
On Exadata X2-2 quarter rack
Swingbench OLTP workload
72 MB/second apply rate
– I/O bound during checkpoints
– 1,762ms for checkpoint complete
– 110ms DB File Parallel Write
Apply Performance at Standby Database
Test 2: a repeat of the previous test, but with write-back flash cache enabled
On Exadata X2-2 quarter rack
Swingbench OLTP workload
174 MB/second apply rate
– Checkpoint completes in 633ms vs 1,762ms
– DB File Parallel Write is 21ms vs 110ms
Two Production Customer Examples
Data Guard Redo Apply Performance
Thomson-Reuters
– Data Warehouse on Exadata, prior to write-back flash cache
– While resolving a gap, an average apply rate of 580MB/second was observed
Allstate Insurance
– Data Warehouse ETL processing resulted in an average apply rate of
668MB/second over a 3 hour period, with peaks hitting 900MB/second
Redo Apply Performance for Different Releases
Range of Observed Apply Rates for Batch and OLTP
[Chart: range of observed standby apply rates (MB/sec, up to ~700) for high-end batch and high-end OLTP workloads across Oracle Database 9i, 10g, 11g (non-Exadata), and 11g (Exadata)]