Oracle Active Data Guard Performance
Joseph Meeks, Director, Product Management, Oracle High Availability Systems
Note to viewer
These slides provide various aspects of performance data for Data Guard and Active Data Guard – we are in the process of updating them for Oracle Database 12c.
They can be shared with customers, but are not intended as a canned presentation ready to deliver in its entirety.
They provide SCs with data that can be used to substantiate Data Guard performance or to provide focused answers to particular concerns that customers may express.
Note to viewer
See this FAQ for more customer and sales collateral– http://database.us.oracle.com/pls/htmldb/f?
p=301:75:101451461043366::::P75_ID,P75_AREAID:21704,2
Agenda – Data Guard Performance
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Data Guard 12.1 Example - Faster Failover
[Chart] Failover time vs. number of database sessions on primary and standby: 43 seconds and 48 seconds, each with 2,000 sessions on both primary and standby (preliminary results).
Data Guard 12.1 Example – Faster Switchover
[Chart] Switchover time vs. number of database sessions on primary and standby: 83 seconds with 500 sessions and 72 seconds with 1,000 sessions on both primary and standby (preliminary results).
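For context (not part of the timed tests above), role transitions like these are typically driven through the Data Guard broker's DGMGRL interface; the standby name below is hypothetical:

    DGMGRL> SWITCHOVER TO 'chicago_stby';   -- planned role reversal, no data loss
    DGMGRL> FAILOVER TO 'chicago_stby';     -- unplanned transition when the primary is lost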
Agenda – Data Guard Performance
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Synchronous Redo Transport
Primary database performance is impacted by the total round-trip time for an acknowledgement to be received from the standby database:
– The Data Guard NSS process transmits redo to the standby directly from the log buffer, in parallel with the local log file write
– The standby receives the redo, writes it to a standby redo log file (SRL), then returns an ACK
– The primary receives the standby ACK, then acknowledges commit success to the application
The following performance tests show the impact of SYNC transport on the primary database using various workloads and latencies. In all cases, transport was able to keep pace with generation – no lag.
We are working on test data for Fast Sync (SYNC NOAFFIRM) in Oracle Database 12c (same process as above, but the standby ACKs the primary as soon as redo is received in memory – it does not wait for the SRL write).
Zero Data Loss
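As an illustrative sketch (service and database names are hypothetical, not from these tests), SYNC transport is configured through the LOG_ARCHIVE_DEST_n attributes: AFFIRM requires the SRL write before the ACK, while NOAFFIRM enables 12c Fast Sync.

    -- Zero data loss: standby ACKs after the SRL write completes
    ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
      'SERVICE=boston_stby SYNC AFFIRM NET_TIMEOUT=30
       VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=boston_stby';

    -- Oracle Database 12c Fast Sync: standby ACKs once redo reaches memory
    ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
      'SERVICE=boston_stby SYNC NOAFFIRM NET_TIMEOUT=30
       VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=boston_stby';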
Test 1) Synchronous Redo Transport
Workload:
– Random small inserts (OLTP) to 9 tables with 787 commits per second
– 132 KB redo size, 1,368 logical reads, 692 block changes per transaction
Sun Fire X4800 M2 (Exadata X2-8):
– 1 TB RAM, 64 cores, Oracle Database 11.2.0.3, Oracle Linux
– InfiniBand, seven Exadata cells, Exadata Software 11.2.3.2
Exadata Smart Flash Cache, Smart Flash Logging, and Write-Back Flash Cache were enabled and provided significant gains.
OLTP with Random Small Inserts, < 1 ms RTT Network Latency
Test 1) Synchronous Redo Transport
Local standby, < 1 ms RTT:
– 99 MB/sec redo rate
– < 1% impact on database throughput
– 1% impact on transaction rate
OLTP with Random Small Inserts and < 1 ms RTT Network Latency
[Chart] Transaction rate and redo rate: 104,143,368.00 with Data Guard transport disabled vs. 104,051,368.80 with Data Guard synchronous transport enabled. RTT = network round-trip time.
Test 2) Synchronous Redo Transport
Exadata X2-8, 2-node RAC database:
– Smart Flash Logging and Write-Back Flash Cache
Swingbench OLTP workload:
– Random DMLs, 1 ms think time, 400 users, 6,000+ transactions per second, 30 MB/sec peak redo rate (different from Test 1)
Transaction profile:
– 5 KB redo size, 120 logical reads, 30 block changes per transaction
1 and 5 ms RTT network latency
Swingbench OLTP Workload with Metro-Area Network Latency
Test 2) Synchronous Redo Transport
– 30 MB/sec redo
– 3% impact at 1 ms RTT
– 5% impact at 5 ms RTT
Swingbench OLTP Workload with Metro-Area Network Latency
[Chart] Swingbench OLTP transactions per second: 6,363 tps baseline (no Data Guard), 6,151 tps with Data Guard SYNC at 1 ms RTT network latency, 6,077 tps with Data Guard SYNC at 5 ms RTT network latency.
Test 3) Synchronous Redo Transport
Exadata X2-8, 2-node RAC database:
– Smart Flash Logging and Write-Back Flash Cache
Large-insert OLTP workload:
– 180+ transactions per second, 83 MB/sec peak redo rate, random tables
Transaction profile:
– 440 KB redo size, 6,000 logical reads, 2,100 block changes per transaction
1, 2, and 5 ms RTT network latency
Large Insert OLTP Workload with Metro-Area Network Latency
Test 3) Synchronous Redo Transport
– 83 MB/sec redo
– < 1% impact at 1 ms RTT
– 7% impact at 2 ms RTT
– 12% impact at 5 ms RTT
Large Insert OLTP Workload with Metro-Area Network Latency
[Chart] Large-insert OLTP transactions per second: 189 tps baseline (no Data Guard), 188 tps at 1 ms RTT, 177 tps at 2 ms RTT, 167 tps at 5 ms RTT.
Test 4) Synchronous Redo Transport
Exadata X2-8, 2-node RAC database:
– Smart Flash Logging and Write-Back Flash Cache
Mixed workload with high TPS – Swingbench plus large-insert workloads:
– 26,000+ transactions per second and 112 MB/sec peak redo rate
Transaction profile:
– 4 KB redo size, 51 logical reads, 22 block changes per transaction
1, 2, and 5 ms RTT network latency
Mixed OLTP Workload with Metro-Area Network Latency
Test 4) Synchronous Redo Transport
– Swingbench plus large insert
– 112 MB/sec redo
– 3% impact at < 1 ms RTT
– 5% impact at 2 ms RTT
– 6% impact at 5 ms RTT
Mixed OLTP Workload with Metro-Area Network Latency
[Chart] Transaction rate and redo rate vs. SYNC network latency (no SYNC, 0 ms, 2 ms, 5 ms, 10 ms, 20 ms). Note: 0 ms latency on the graph represents values falling in the range < 1 ms.
Additional SYNC Configuration Details
No system bottlenecks (CPU, I/O, or memory) were encountered during any of the test runs:
– Primary and standby databases had 4 GB online redo logs
– Log buffer was set to the maximum of 256 MB
– OS max TCP socket buffer size was set to 128 MB on both primary and standby
– Oracle Net was configured on both sides to send and receive 128 MB, with an SDU of 32K (see the sqlnet.ora sketch below)
– Redo was shipped over a 10GigE network between the two systems
– Approximately 8-12 checkpoints/log switches occurred per run
For the Previous Series of Synchronous Transport Tests
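A minimal sketch of the corresponding Oracle Net settings, assuming they were applied in sqlnet.ora on both primary and standby (the slides do not show the exact files used):

    # sqlnet.ora – 128 MB send/receive buffers and a 32K SDU, per the details above
    DEFAULT_SDU_SIZE=32767
    SEND_BUF_SIZE=134217728
    RECV_BUF_SIZE=134217728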
Customer References for SYNC Transport
Fannie Mae case study that includes performance data
Other SYNC references:
– Amazon
– Intel
– MorphoTrak – prior biometrics division of Motorola; case study, podcast, presentation
– Enterprise Holdings
– Discover Financial Services; podcast, presentation
– Paychex
– VocaLink
Synchronous Redo Transport
Redo rates achieved are influenced by network latency, redo-write size, and commit concurrency – in a dynamic relationship with each other that will vary for every environment and application.
Test results illustrate how an example workload can scale with minimal impact on primary database performance.
Actual mileage will vary with each application and environment. Oracle recommends that customers conduct their own tests using their own workload and environment; Oracle tests are not a substitute.
Caveat that Applies to ALL SYNC Performance Comparisons
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Asynchronous Redo Transport
With ASYNC, the primary does not wait for an acknowledgement from the standby. The Data Guard NSA process transmits directly from the log buffer, in parallel with the local log file write:
– NSA reads from disk (the online redo log file) if the log buffer is recycled before redo transmission is complete
ASYNC has minimal impact on primary database performance.
Network latency has little, if any, impact on transport throughput:
– Uses the Data Guard 11g streaming protocol and correctly sized TCP send/receive buffers
Performance tests are useful to characterize the maximum redo volume that ASYNC is able to support without transport lag:
– The goal is to ship redo as fast as it is generated, without impacting primary performance
Near Zero Data Loss
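A sketch of an ASYNC destination, with hypothetical service and database names:

    -- Primary commits without waiting for any standby acknowledgement
    ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
      'SERVICE=remote_stby ASYNC NOAFFIRM
       VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=remote_stby';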
Asynchronous Test Configuration
Details:
– 100 GB online redo logs
– Log buffer set to the maximum of 256 MB
– OS max TCP socket buffer size set to 128 MB on primary and standby
– Oracle Net configured on both sides to send and receive 128 MB
– Read buffer size set to 256 (_log_read_buffer_size=256) and archive buffers set to 256 (_log_archive_buffers=256) on primary and standby
– Redo is shipped over the InfiniBand network between primary and standby nodes (ensures that transport is not bandwidth constrained): near-zero network latency, approximate throughput of 1,200 MB/sec
ASYNC Redo Transport Performance Test
Data Guard ASYNC transport can sustain very high rates:
– 484 MB/sec on a single node
– Zero transport lag
Add RAC nodes to scale transport performance:
– Each node generates its own redo thread and has a dedicated Data Guard transport process
– Performance will scale as nodes are added, assuming adequate CPU, I/O, and network resources
A 10GigE NIC on the standby receives data at a maximum of 1.2 GB/second:
– The standby can be configured to receive redo across two or more instances
[Chart] Redo transport rate, single instance: 484 MB/sec (Oracle Database 11.2).
Data Guard 11g Streaming Network Protocol
The streaming protocol is new with Data Guard 11g. The test measured throughput with 0 – 100 ms RTT.
ASYNC tuning best practices (see the worked example below):
– Set the correct TCP send/receive buffer size = 3 x BDP (bandwidth-delay product), where BDP = bandwidth x round-trip network latency
– Increase the log buffer size if needed to keep the NSA process reading from memory
– See support note 951152.1; use X$LOGBUF_READHIST to determine the buffer hit rate
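As a worked example (link figures assumed for illustration, not taken from these tests): on a 1 Gbit/sec link with 50 ms RTT, BDP = 125 MB/sec x 0.05 sec = 6.25 MB, so the recommended TCP socket buffer size is 3 x 6.25 MB, roughly 18.75 MB.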
High Network Latency has Negligible Impact on Network Throughput
[Chart] ASYNC redo transport rate (MB/sec) at 0 ms, 25 ms, and 50 ms network latency.
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Multi-Standby Configuration
A growing number of customers use multi-standby Data Guard configurations. Additional standbys are used for:
– Local zero-data-loss HA failover combined with remote DR
– Rolling maintenance to reduce planned downtime
– Offloading backups, reporting, and recovery from the primary
– Reader farms – scale read-only performance
This leads to the question: how is primary database performance affected as the number of remote transport destinations increases?
[Diagram] Primary (A) ships redo via SYNC to a local standby (B) and via ASYNC to a remote standby (C).
Redo Transport in Multi-Standby Configuration
[Chart] Primary Performance Impact: 14 Asynchronous Transport Destinations. Increase in CPU and change in redo volume, each compared to baseline, as destinations grow from 0 to 14.
Redo Transport in Multi-Standby Configuration
[Chart] Primary Performance Impact: 1 SYNC and Multiple ASYNC Destinations. Increase in CPU and change in redo volume, each compared to baseline, for zero destinations, 1 SYNC/0 ASYNC, 1 SYNC/1 ASYNC, and 1 SYNC/14 ASYNC destinations.
Redo Transport for Gap Resolution
Standby databases can be configured to request log files needed to resolve gaps from other standbys in a multi-standby configuration (see the FAL_SERVER sketch below).
A standby database that is local to the primary database is normally the preferred location to service gap requests:
– A local standby database is least likely to be impacted by network outages
– Other standbys are listed next
– The primary database services gap requests only as a last resort
– Utilizing a standby for gap resolution avoids any overhead on the primary database
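This preference order is expressed on each standby through the FAL_SERVER parameter; the Oracle Net service names below are hypothetical:

    -- On a remote standby: fetch gaps from the local standby first, then the primary as a last resort
    ALTER SYSTEM SET FAL_SERVER = 'local_stby_b, primary_a';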
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Redo Transport Compression
Test configuration:
– 12.5 MB/second bandwidth
– 22 MB/second redo volume
Uncompressed volume exceeds available bandwidth:
– Recovery Point Objective (RPO) impossible to achieve
– Perpetual increase in transport lag
A 50% compression ratio results in:
– Volume < bandwidth = RPO achieved
– The ratio will vary across workloads
Requires the Advanced Compression option (see the sketch below)
Conserve Bandwidth and Improve RPO when Bandwidth Constrained
[Chart] Transport lag (MB) vs. elapsed time (minutes): 22 MB/sec uncompressed vs. 12 MB/sec compressed.
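A sketch of enabling transport compression on a destination (hypothetical service and database names; requires a license for the Advanced Compression option):

    -- Compress redo in flight to fit the 12.5 MB/sec link in the scenario above
    ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
      'SERVICE=remote_stby ASYNC COMPRESSION=ENABLE
       VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=remote_stby';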
Agenda
Failover and Switchover Timings
SYNC Transport Performance
ASYNC Transport Performance
Primary Performance with Multiple Standby Databases
Redo Transport Compression
Standby Apply Performance
Standby Apply Performance Test
Redo apply was first disabled to accumulate a large number of log files at the standby database. Redo apply was then restarted to evaluate the maximum apply rate for this workload (see the monitoring query below).
All standby log files were written to disk in the Fast Recovery Area.
Exadata Write-Back Flash Cache increased the redo apply rate from 72 MB/second to 174 MB/second using the test workload (Oracle 11.2.0.3):
– Apply rates will vary based upon platform and workload
Achieved volumes do not represent physical limits:
– They only represent the particular test case configuration and workload; higher apply rates have been achieved in practice by production customers
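Apply rates like those cited here can be observed on the standby with the standard V$RECOVERY_PROGRESS view, for example:

    -- Run on the standby while redo apply is active; rates are reported per the UNITS column
    SELECT item, units, sofar
      FROM v$recovery_progress
     WHERE item IN ('Active Apply Rate', 'Average Apply Rate');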
Apply Performance at Standby Database
Test 1: no Write-Back Flash Cache
– On Exadata X2-2 quarter rack
– Swingbench OLTP workload
– 72 MB/second apply rate
– I/O bound during checkpoints: 1,762 ms for checkpoint complete, 110 ms DB File Parallel Write
Apply Performance at Standby Database
Test 2: a repeat of the previous test, but with Write-Back Flash Cache enabled
– On Exadata X2-2 quarter rack
– Swingbench OLTP workload
– 174 MB/second apply rate
– Checkpoint completes in 633 ms vs. 1,762 ms
– DB File Parallel Write is 21 ms vs. 110 ms
Two Production Customer Examples
Thomson Reuters:
– Data warehouse on Exadata, prior to Write-Back Flash Cache
– While resolving a gap, observed an average apply rate of 580 MB/second
Allstate Insurance:
– Data warehouse ETL processing resulted in an average apply rate over a 3-hour period of 668 MB/second, with peaks hitting 900 MB/second
Data Guard Redo Apply Performance
Redo Apply Performance for Different Releases
[Chart] Range of observed standby apply rates (MB/sec, 0 – 600) for high-end batch and high-end OLTP workloads.