diagnosing the bottlenecks in your streams environment brian keating chris lawson may 17, 2007

Page 1: Diagnosing the Bottlenecks in your Streams Environment Brian Keating Chris Lawson May 17, 2007

Diagnosing the "Bottlenecks" in your Streams Environment

Brian Keating

Chris Lawson

May 17, 2007

Page 2

What's our Goal for Today?

"We will divulge & show you how to deal with some of the complications of Streams."

Page 3

Our Agenda

• Overview of Streams Replication

• What are the parts of Streams?

• How does it work?

• Troublesome Parts of Streams

• Troubleshooting techniques

• A fun trivia question

• Types of bottlenecks & how to resolve

• Some helpful SQL scripts

• Questions?

Page 4

A Few Caveats

Unless otherwise stated, our examples assume Oracle 10g, release 2.

Streams set-up is not the real problem; thus we focus on monitoring and troubleshooting.

Set-up is documented in the Oracle 10g Release 2 Concepts & Admin Guide.

Also, "conflict resolution" is a complex subject of its own, and will not be covered.

Page 5

Is Streams Difficult or Easy?

How difficult is it to figure out all the pieces, & keep Streams working well?

Here's one opinion . . .

Page 6

Gosh, It's Really Easy! Your Life will be so Peaceful

"You don't need rocket scientists on your staff!" *

* Marketing executive, on the simplicity of Streams/CDC.

Page 7

Here's another point of view . . .

Page 8

Streams is like a Whirlpool

"Is Streams really a proven product?" *

* Programmer, noting the numerous bug fixes.

(Image label: DBA)

Page 9

Streams Replication: A View from 10,000 Feet

• Streams propagates both DML & DDL changes to another database (or elsewhere in the same database).

• Streams is based on LogMiner, which has been around for years.

• You decide, via "rules," which changes are replicated.

• You can optionally "transform" data before you apply it at the destination.

Page 10

Overview of Streams: Three Main Processes

Capture (c0, c1) → Propagate (j00, j01) → Apply (a001)

The names in parentheses are the background processes.

Note: "p" processes are also used in capture & apply.

Page 11

What's in the Redo Log?

• LogMiner continuously reads DML & DDL changes from redo logs.

• It converts those changes into logical change records (LCRs).

• There is at least 1 LCR per row changed.

• The LCR contains the actual change, as well as the original data.

Redo logs → LogMiner Processing → LCRs

Page 12

What's in the Redo Log?

• Recall that change records in the redo log buffer are written to the redo logs after anyone's commit, not just your own.

This means that changes will often be captured and propagated (but not applied) even though you haven't committed!

Page 13

The Capture Process: Capture Change

Capture Change → Generate LCR → Enqueue Message

• The capture change step reads changes from redo logs.

• All changes (DML and DDL) in redo logs are captured, regardless of whether Streams is configured to propagate any given change.

• Observe the "capture_message_number" value (in v$streams_capture). It regularly increments, even if there are no application transactions.
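A minimal sketch of how the incrementing value could be observed, assuming a session with access to the V$ views. Running it twice a few seconds apart on an otherwise idle system should show `capture_message_number` moving forward.

```sql
-- Sample the capture position; re-run after a short wait and compare.
SELECT capture_name,
       state,
       capture_message_number,   -- SCN of the most recently scanned redo change
       capture_time              -- when that change was scanned
FROM   v$streams_capture;
```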

Page 14

The Capture Process: Generate LCR

Capture Change → Generate LCR → Enqueue Message

• First, examine the change & determine whether Streams is configured to handle that change or not.

• If so, convert it into one or more "logical change records", or "LCRs".

• An LCR only contains changes for one row. DML statements that affect multiple rows will cause multiple LCRs to be generated.

• So, if one statement updates 1,000 rows, at least 1,000 LCRs will be created!
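One way to see the filtering and the per-row fan-out is to compare the cumulative counters in v$streams_capture before and after a test statement. This is only a sketch; the counters are totals since the capture process started, so the difference between two samples is what matters.

```sql
-- Changes scanned from redo versus messages actually placed on the queue.
SELECT capture_name,
       total_messages_captured,  -- redo changes examined by the capture process
       total_messages_enqueued   -- messages that passed the rules and were enqueued
FROM   v$streams_capture;
```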

Page 15

The Capture Process: Enqueue Message

Capture Change → Generate LCR → Enqueue Message

• This step places previously-created LCRs onto the capture process's queue.

• Oracle uses "q00n" background workers, & the QMNC (Queue Monitor Coordinator).

(Diagram: QMNC with workers q001, q002, q003)

Page 16

The Capture Process: Enqueue Message

Example of the queue background processes:

    SELECT program, sid
    FROM   v$session
    WHERE  UPPER(program) LIKE '%(Q%'
    AND    osuser = 'oracle'
    ORDER  BY 1;

    PROGRAM                                SID
    ------------------------------------ -----
    oracle@ny1-server_NY-isb-sa (QMNC)     471
    oracle@ny1-server_NY-isb-sa (q000)     456
    oracle@ny1-server_NY-isb-sa (q001)     936
    oracle@ny1-server_NY-isb-sa (q002)    1366
    oracle@ny1-server_NY-isb-sa (q003)     261
    oracle@ny1-server_NY-isb-sa (q004)     744

Page 17

The Propagate Process

Source Queue → Propagate Process → Destination Queue

• Streams copies LCRs from the source queue to the destination queue.

• These transfers are done by the j00n background processes.

• How do we know it's the "j" processes?

• Let's look at session statistics and see who's doing all the db link communicating.

Page 18

The Propagate Process: Who's doing the Sending?

    SELECT sid, name, value
    FROM   v$sesstat one, v$statname two
    WHERE  one.statistic# = two.statistic#
    AND    name LIKE '%link%'
    AND    value > 1000;

      SID NAME                                            VALUE
    ----- -------------------------------------------- ----------
      467 bytes sent via SQL*Net to dblink             7.3870E+10
      947 bytes sent via SQL*Net to dblink             2.5478E+10
      467 bytes received via SQL*Net from dblink        113698397
      947 bytes received via SQL*Net from dblink         47786284
      467 SQL*Net roundtrips to/from dblink               2661410
      947 SQL*Net roundtrips to/from dblink               1179795

SIDs 467 and 947 are indeed the "j00" processes.
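To confirm which program owns those SIDs in a single step, the statistics can be joined to v$session. This is a sketch rather than the presenters' script; the 1000 cutoff is carried over from the query above.

```sql
-- Show the program name next to each session's db link traffic.
SELECT s.sid,
       s.program,               -- expect the (J00n) job queue processes here
       n.name,
       t.value
FROM   v$sesstat  t
JOIN   v$statname n ON n.statistic# = t.statistic#
JOIN   v$session  s ON s.sid = t.sid
WHERE  n.name LIKE '%dblink%'
AND    t.value > 1000
ORDER  BY s.sid, n.name;
```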

Page 19

The Propagate Process: What's in the Queue?

Sample of the queue content:

    SELECT queue_name, num_msgs FROM v$buffered_queues;

    QUEUE_NAME                  NUM_MSGS
    ------------------------- ----------
    APP_NY_PROD_Q                    544
    CAP_LA_PROD_Q                   1640
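A slightly wider variant of the same query separates what is still in memory from what has already gone to disk. It is a sketch that uses the other columns of v$buffered_queues.

```sql
-- Break each buffered queue down into in-memory and spilled messages.
SELECT queue_schema,
       queue_name,
       num_msgs - spill_msgs AS mem_msgs,   -- messages still held in memory
       spill_msgs,                          -- messages currently spilled to disk
       cnum_msgs                            -- cumulative messages since startup
FROM   v$buffered_queues;
```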

Page 20

The Apply Process: Queue and Dequeue

• LCRs usually wait in the destination queue until their transaction is committed or rolled back.

• There are some exceptions, due to the queue "spilling" (covered later).

• The LCRs also remain on the capture queue until they are applied at the destination.

(Diagram: LCRs waiting in a queue)

Page 21

Commits & Rollbacks

• When a transaction commits, Streams sends over a single commit LCR.

• After the destination applies the LCRs, it sends back a confirming LCR that it's okay for the capture queue to remove those LCRs.

• For a rollback, the source sends 1 rollback LCR for each LCR in that transaction.

And now to the troubleshooting . . .

Page 22

Trivia Question

Question: how does Streams avoid "infinite loop" replication?

For example, after a change is applied at a destination database, why doesn't Streams "re-capture" that change, and then propagate it back to the original source database?

The answer to this question will be provided later in the presentation.

Page 23

LCR Concepts

An LCR is one type of Streams message: the type that is formatted by the capture process.

Two types of LCRs: DDL LCRs and row LCRs.

A DDL LCR contains information about a single DDL change.

A row LCR contains information about a change to a single row.

As a result, transactions that affect multiple rows will cause multiple LCRs to be generated.

Page 24

LCR Concepts: LOB Issues

Tables that contain LOB datatypes can cause multiple LCRs to be generated per row affected!

Inserts into tables with LOBs will always cause multiple LCRs to be generated per row.

Updates to those tables might cause multiple LCRs, if any of the LOB columns are updated.

Deletes will never generate multiple LCRs: deletes always generate 1 LCR per row.

Page 25

LCR Concepts: Other Items to Note

• All of the LCRs that are part of a single transaction must be applied by one apply server.

If a transaction generates 10,000 LCRs, then all 10,000 of them must be applied by just one apply server, no matter how many servers are running.

• For each transaction, there is one additional LCR generated, at the end of the transaction.

If a single transaction deletes 2,000 rows, then 2,001 LCRs will be generated. This item is important to note for the "spill threshold" value.
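To compare a transaction's LCR count (rows plus one) against the threshold, the current setting can be read from the apply parameters. This sketch assumes that the "spill threshold" on the slide is the 10g Release 2 apply parameter `txn_lcr_spill_threshold`.

```sql
-- Current per-transaction spill threshold for each apply process.
SELECT apply_name,
       parameter,
       value,
       set_by_user              -- YES if someone changed it from the default
FROM   dba_apply_parameters
WHERE  UPPER(parameter) = 'TXN_LCR_SPILL_THRESHOLD';
```

With a threshold of 2,000, for example, the 2,000-row delete above would spill, because it produces 2,001 LCRs.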

Page 26

"Spill" Concepts

Normally, outstanding messages are held in buffered queues, which reside in memory.

In some cases, messages can be "spilled"; that is, they can be moved into tables on disk. There are three reasons:

• The total number of outstanding messages is too large to fit in the buffered queue;

• A given message has been in memory too long;

• The number of messages in a single transaction is larger than the "LCR spill threshold" parameter.

Page 27

"Spill" Concepts: Types of Spill Tables

There are two types of tables that can hold "spilled" messages:

• "Queue" tables: each buffered queue has a "queue table" associated with it. Queue tables are created at the same time as buffered queues.

Name format for queue tables:

    AQ$_<name that you specified>_P

Example:

    AQ$_CAPTURE_QUEUE_TABLE_P

Page 28

"Spill" Concepts: Types of Spill Tables (cont.)

• The "Spill" table: there is one (and only one) "spill" table in any database that uses Streams.

The name of the spill table is:

    SYS.STREAMS$_APPLY_SPILL_MSGS_PART

As the name implies, the spill table can only be used by apply processes, not by capture processes.

Page 29

"Spill" Concepts: Where are the spilled messages?

Tables where spilled messages are placed:

• If a message is spilled because the total number of outstanding messages is too large, then that message is placed in the associated queue table.

• If a message has been held in memory too long, then that message is placed in the queue table.

• If the number of LCRs in a single transaction exceeds the "LCR spill threshold", then all of those LCRs are placed in the spill table, SYS.STREAMS$_APPLY_SPILL_MSGS_PART.
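A rough way to check whether the apply side is spilling large transactions right now is sketched below. The first query reads the spill table named above. The second assumes the 10g Release 2 dictionary view DBA_APPLY_SPILL_TXN is available, which summarizes the same data per transaction.

```sql
-- Total LCRs currently sitting in the apply spill table.
SELECT COUNT(*) AS spilled_lcrs
FROM   sys.streams$_apply_spill_msgs_part;

-- Per-transaction summary of spilled transactions.
SELECT apply_name,
       xidusn || '.' || xidslt || '.' || xidsqn AS transaction_id,
       first_scn,
       message_count
FROM   dba_apply_spill_txn;
```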

Page 30

Troubleshooting

Two basic ways to troubleshoot Streams issues:

• Query the internal Streams tables and views;

• Search the alert log for messages. (This is particularly useful for capture process issues.)

Page 31

Capture Process Tables and Views

streams$_capture_process: lists all defined capture processes

dba_capture: basic status, error info

v$streams_capture: detailed status info

dba_capture_parameters: configuration info
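As a starting point, the views listed above could be queried along these lines. This is a sketch; pick whichever columns matter in your environment.

```sql
-- Basic status and last error for every capture process.
SELECT capture_name,
       status,                    -- ENABLED, DISABLED or ABORTED
       captured_scn,
       applied_scn,
       required_checkpoint_scn,
       error_number,
       error_message
FROM   dba_capture;

-- Configuration of each capture process.
SELECT capture_name, parameter, value, set_by_user
FROM   dba_capture_parameters
ORDER  BY capture_name, parameter;
```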

Page 32

Propagate Process Tables and Views

streams$_propagation_process: lists all defined propagate processes

dba_propagation: basic status, error info

v$propagation_sender: detailed status

v$propagation_receiver: detailed status
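A sketch of a first look at propagation health using the views above. The column names are as documented for 10g Release 2.

```sql
-- Is each propagation enabled, and did it record an error?
SELECT propagation_name,
       source_queue_name,
       destination_queue_name,
       destination_dblink,
       status,
       error_date,
       error_message
FROM   dba_propagation;

-- How much each sender has moved since it started.
SELECT queue_name,
       dblink,
       schedule_status,
       total_msgs,
       total_bytes
FROM   v$propagation_sender;
```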

Page 33

Apply Process Tables and Views

streams$_apply_process: lists all defined apply processes

dba_apply: basic status, error info

v$streams_apply_reader: status of the apply reader

v$streams_apply_server: status of apply server(s)

v$streams_apply_coordinator: overall status, latency info

dba_apply_parameters: configuration info
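The status and latency information mentioned above could be pulled as follows. This is a sketch; the latency expression is the difference between when the high-watermark message was applied and when it was created at the source, converted from days to seconds.

```sql
-- Basic status and last error for every apply process.
SELECT apply_name, status, error_number, error_message
FROM   dba_apply;

-- Apply latency for the most recently applied message.
SELECT apply_name,
       state,
       total_applied,
       (hwm_time - hwm_message_create_time) * 86400 AS latency_seconds
FROM   v$streams_apply_coordinator;
```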

Page 34

"Miscellaneous" Tables and Views

v$buffered_queues: view that displays the current and cumulative number of messages for each buffered queue.

sys.streams$_apply_spill_msgs_part: table that the apply process uses to "spill" messages from large transactions to disk.

system.logmnr_restart_ckpt$: table that holds capture process "checkpoint" information.

Page 35

Types of Bottlenecks

Two main types of "bottlenecks":

Type 1: Replication is completely stopped; i.e., no changes are being replicated at all.

Type 2: Replication is running, but it is slower than the rate of DML; i.e., it is "falling behind".

Page 36

Capture Process Bottlenecks

Capture process bottlenecks typically have to do with a capture process being unable to read necessary online or (especially) archive logs.

This will result in a "Type 1" bottleneck; that is, no changes will be replicated at all (because no changes can be captured).

Almost all of the "Type 1" bottlenecks that I have ever encountered have been due to archive log issues with capture processes.

Page 37

Capture Process Bottlenecks: Capture "Checkpoints"

The capture process writes its own "checkpoint" information to its data dictionary tables.

This checkpoint info keeps track of the SCN values that the capture process has scanned. That information is used to calculate the "required_checkpoint_scn" value.

Capture process checkpoint information is primarily stored in the following table:

    SYSTEM.LOGMNR_RESTART_CKPT$

Page 38

Capture Process Bottlenecks: Capture "Checkpoints" (cont.)

By default, the capture process writes checkpoints very frequently, and stores their data for a long time. On a very write-intensive system, this can cause the LOGMNR_RESTART_CKPT$ table to become extremely large, very quickly.

The amount of data stored in that table can be modified with these capture process parameters:

_checkpoint_frequency: number of megabytes captured which will trigger a checkpoint

checkpoint_retention_time: number of days to retain checkpoint information
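A sketch of how these two settings might be changed in 10g Release 2. The capture name `MY_CAPTURE` and the values 1000 and 7 are illustrative assumptions, not recommendations from the slides. Note also that `_checkpoint_frequency` is a hidden parameter, so it is worth checking with Oracle Support before changing it.

```sql
BEGIN
  -- Checkpoint after every 1000 MB of redo mined (hypothetical value).
  DBMS_CAPTURE_ADM.SET_PARAMETER(
    capture_name => 'MY_CAPTURE',            -- hypothetical capture process
    parameter    => '_checkpoint_frequency',
    value        => '1000');

  -- Keep only 7 days of checkpoint data (hypothetical value).
  DBMS_CAPTURE_ADM.ALTER_CAPTURE(
    capture_name              => 'MY_CAPTURE',
    checkpoint_retention_time => 7);
END;
/
```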

Page 39

Capture Process Bottlenecks: SCN checks

When a capture process starts up, it calculates its "required_checkpoint_scn" value. This value determines which redo logs must be scanned before any new transactions can be captured.

The capture process needs to do this check in order to ensure that it does not "miss" any transactions that occurred while it was down.

As a result, when a capture process starts up, the redo log that contains that SCN value, and every subsequent log, must be present in the log_archive_dest directory (or in online logs).

Page 40

Capture Process Bottlenecks: SCN checks (cont.)

If any required redo logs are missing during a capture process restart, then the capture process will permanently "hang" during its startup.

This issue will completely prevent the capture process from capturing any new changes.

If the required redo log(s) cannot be restored, then the only way to resolve this situation is to completely rebuild the Streams environment.

As a result, it is extremely important to "keep track" of redo logs that the capture process needs, before deleting any logs.

Page 41

Capture Process Bottlenecks: Which archive logs are needed?

The following query can be used to determine the oldest archive log that will need to be read during the next restart of a capture process.

    select a.sequence#, b.name
    from   v$log_history a, v$archived_log b
    where  a.first_change# <=
             (select required_checkpoint_scn from dba_capture
              where capture_name = '<capture process name>')
    and    a.next_change# >
             (select required_checkpoint_scn from dba_capture
              where capture_name = '<capture process name>')
    and    a.sequence# = b.sequence#(+);

If no rows are returned from that query, then the SCN in question resides in an online log.
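Because every log from that point forward is needed, it can also help to list all archived logs at or after the required SCN and check that none has been removed. A sketch, with `MY_CAPTURE` standing in for the real capture process name:

```sql
-- Every archived log the capture process would have to read on restart.
SELECT sequence#,
       name,
       first_change#,
       next_change#,
       deleted                  -- YES means the file is gone from disk
FROM   v$archived_log
WHERE  next_change# > (SELECT required_checkpoint_scn
                       FROM   dba_capture
                       WHERE  capture_name = 'MY_CAPTURE')  -- hypothetical name
ORDER  BY sequence#;
```

Any row with DELETED = 'YES' points at a log that would need to be restored before the capture process is restarted.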

Page 42

Capture Process Bottlenecks: Flow Control

In 10g, the capture process is configured with "automatic flow control". This feature prevents the capture process from spilling many messages.

If a large number of messages build up in the capture process's buffered queue, then it will "pause" capturing new messages until some messages are removed from the queue.

A message cannot be removed from the capture process's queue until one of these items occurs:

• The message is applied at the destination

• The message is spilled at the destination
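Whether a capture process is currently held back by flow control shows up in its state. A sketch, using the 10g Release 2 columns of v$streams_capture:

```sql
-- A state of 'PAUSED FOR FLOW CONTROL' means the capture queue is backed up.
SELECT capture_name,
       state,
       elapsed_pause_time       -- cumulative time paused, in hundredths of a second
FROM   v$streams_capture;
```

If the state stays paused, the cause is usually downstream (propagation or apply), so the queue and apply views are the next place to look.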

Page 43

Propagate Process Bottlenecks

I have heard of a few "Type 2" propagate bottlenecks in some "extreme" environments.

Parameter changes to consider, to try to avoid propagate-related bottlenecks:

• Set the propagate parameter LATENCY to 0

• Set the propagate parameter QUEUE_TO_QUEUE to TRUE (10.2 only)

• Set the init.ora parameter _job_queue_interval to 1

• Set the init.ora parameter job_queue_processes to 4 or more
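Two of these settings can be changed on a running system, roughly as sketched below. The queue name `STRMADMIN.CAPTURE_QUEUE` and the database link `DEST_DB` are hypothetical placeholders. QUEUE_TO_QUEUE is chosen when the propagation is created, and `_job_queue_interval` is a hidden init.ora parameter, so neither is shown here.

```sql
-- Allow at least four job queue processes for propagation jobs.
ALTER SYSTEM SET job_queue_processes = 4;

BEGIN
  -- Propagate messages as soon as they are enqueued (latency of 0 seconds).
  DBMS_AQADM.ALTER_PROPAGATION_SCHEDULE(
    queue_name  => 'STRMADMIN.CAPTURE_QUEUE',  -- hypothetical source queue
    destination => 'DEST_DB',                  -- hypothetical database link
    latency     => 0);
  -- For a queue-to-queue propagation, the destination_queue argument
  -- would also have to be supplied.
END;
/
```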

Page 44

Apply Process Bottlenecks

Apply process bottlenecks usually deal with the apply process not "keeping up" with the rate of DML being executed on the source tables.

I have generally only seen apply bottlenecks with "batch" DML (as opposed to "OLTP" DML).

The three main areas to be concerned with regarding apply process bottlenecks are:

- Commit frequency;

- Number of apply servers;

- Other Streams parameters.

Apply Process Bottlenecks
Commit Frequency

The number of rows affected per transaction has an enormous impact on apply performance:

• The number of rows affected per transaction should not be too large, due to spill and to the “one apply server per transaction” restriction.

• The number of rows affected per transaction should not be too small, either, due to DML degradation and to apply transaction overhead.

From my experience, the optimal commit frequency is around 500 rows.
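
As an illustration, a batch purge on the source database could be broken into transactions of roughly 500 rows each. This is only a sketch: the table order_history and its order_date column are hypothetical, and the 500-row batch size simply follows the figure above.

```sql
-- Delete old rows in batches of at most 500, committing after each batch,
-- so that each replicated transaction stays small.
-- order_history and order_date are hypothetical names.
BEGIN
  LOOP
    DELETE FROM order_history
     WHERE order_date < DATE '2006-01-01'
       AND ROWNUM <= 500;

    -- Stop once no qualifying rows remain.
    EXIT WHEN SQL%ROWCOUNT = 0;

    COMMIT;
  END LOOP;
  COMMIT;
END;
/
```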

Apply Process Bottlenecks
Number of Apply Servers

The number of parallel apply servers also has a large impact on the apply process’s performance.

The number of parallel apply servers is set by the apply “PARALLELISM” parameter. From my experience, for best throughput, the PARALLELISM parameter should be three times the number of CPUs on the host machine.

Note: setting PARALLELISM that high can potentially “eat up” all of the CPU cycles on the host, during intensive DML jobs.
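
A minimal sketch of setting this parameter, assuming the apply process is named STREAMS_APPLY_PROCESS (as in the sample report later in this deck) and that the host has 4 CPUs, which gives a value of 12 under the rule of thumb above.

```sql
-- Check how many CPUs Oracle sees on this host.
SELECT value AS cpu_count
  FROM v$parameter
 WHERE name = 'cpu_count';

-- Assumed: 4 CPUs, so PARALLELISM = 3 x 4 = 12.
BEGIN
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'STREAMS_APPLY_PROCESS',
    parameter  => 'parallelism',
    value      => '12');
END;
/
```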

Apply Process Bottlenecks
Other Streams Parameters

Several other Streams-related parameters can also have an effect on apply performance:

• Apply process parameter _HASH_TABLE_SIZE: set this relatively high (such as 10000000), to minimize “wait dependency” bottlenecks.

• Apply process parameter TXN_LCR_SPILL_THRESHOLD: set to be a little bit higher than the maximum number of LCRs generated per transaction, to prevent spill.

• Init.ora parameter aq_tm_processes: set to 1.
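
These settings might be applied roughly as follows. The apply name STREAMS_APPLY_PROCESS is taken from the sample report later in this deck; the spill threshold of 12000 is an arbitrary illustration and should be replaced by a value just above your own largest transaction.

```sql
BEGIN
  -- Hidden apply parameter, using the value suggested above.
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'STREAMS_APPLY_PROCESS',
    parameter  => '_hash_table_size',
    value      => '10000000');

  -- Illustrative value only: slightly above the largest number of LCRs
  -- that one transaction is expected to generate.
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'STREAMS_APPLY_PROCESS',
    parameter  => 'txn_lcr_spill_threshold',
    value      => '12000');
END;
/

ALTER SYSTEM SET aq_tm_processes = 1;
```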

Apply Process Bottlenecks
“Turning Off” Streams

As mentioned previously, apply bottlenecks generally occur with “batch”-style DML.

It is possible to configure a session so that Streams will not capture any DML from that session; therefore, that session’s DML will not be propagated or applied.

This technique allows batch-style DML to be run at each Streams database individually, rather than running the batch DML in one database and then having that DML get replicated to all of the other databases.

Apply Process Bottlenecks
“Turning Off” Streams (cont.)

“Turning off” Streams in a session is done by setting the session’s “Streams tag” to a non-NULL value. Here is an example of turning Streams off before a batch statement, and then back on:

exec dbms_streams.set_tag (tag => hextoraw('01'));

<batch DML statement>

commit;

exec dbms_streams.set_tag (tag => NULL);

Note: this technique will only work if the “include_tagged_lcr” parameter, in the Streams capture rules, is set to FALSE. (FALSE is the default value for that parameter in Streams rules.)
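
Before relying on this technique, it may be worth confirming both the session’s current tag and the rule setting. The queries below are a sketch that assumes table-level capture rules; schema-level or global rules would be checked in the corresponding DBA_STREAMS_SCHEMA_RULES or DBA_STREAMS_GLOBAL_RULES view instead.

```sql
-- Current Streams tag for this session (NULL means DML will be captured).
SELECT dbms_streams.get_tag AS current_tag
  FROM dual;

-- Capture rules and whether they include tagged LCRs.
-- The technique above needs INCLUDE_TAGGED_LCR to be NO for these rules.
SELECT streams_name, table_owner, table_name, include_tagged_lcr
  FROM dba_streams_table_rules
 WHERE streams_type = 'CAPTURE';
```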

Apply Process Bottlenecks
Apply Process Aborts!

If there are any triggers on any replicated tables, then the apply process can sometimes abort, and not restart, during periods of intense DML activity.

These aborts happen most frequently when one particular apply server becomes “overloaded”, and has to run too much recursive SQL (triggers cause recursive SQL).

This issue is caused by Oracle bug # 4712729. This bug is apparently fixed in 10.2.0.3; and there are “backport” patches available for 10.2.0.1 and 10.2.0.2.
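
A minimal sketch for spotting an aborted apply process and attempting a manual restart. The apply name STREAMS_APPLY_PROCESS is assumed from the sample report later in this deck; whether a restart holds before the patch is installed is not something these slides state.

```sql
-- An aborted apply process shows STATUS = 'ABORTED', with the error recorded.
SELECT apply_name, status, error_number, error_message
  FROM dba_apply;

-- Attempt a manual restart of the apply process.
BEGIN
  DBMS_APPLY_ADM.START_APPLY(apply_name => 'STREAMS_APPLY_PROCESS');
END;
/
```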

Streams “Spot Check” Script

The script called “ssc.sql” performs a “spot check” on a variety of Streams objects. Here is some sample output from it:

********** Streams report for dbprod on 03/28/2007 15:33:54 **********

Capture process : STREAMS_CAPTURE_PROCESS CREATE LCR

Propagate process: STREAMS_PROPAGATE_PROCESS ENABLED

Apply reader : STREAMS_APPLY_PROCESS DEQUEUE MESSAGES

Apply server : STREAMS_APPLY_PROCESS EXECUTE TRANSACTION
Apply server : STREAMS_APPLY_PROCESS IDLE

Apply coordinator: STREAMS_APPLY_PROCESS APPLYING Latency: 2

Buffered queue : STREAMS_CAPTURE_QUEUE Messages: 384 Spill msgs: 21
Buffered queue : STREAMS_APPLY_QUEUE Messages: 4721 Spill msgs: 0

Spill table rows : 0

Total streams pool memory : 314572800
Free streams pool memory : 183437184

*************** End of report ***************
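
The queries below are not the contents of ssc.sql; they are a hedged sketch of the kind of dictionary and V$ lookups that could produce a similar report on 10.2, run as a user with access to these views. The latency and spill table figures are left out.

```sql
-- Capture process state
SELECT capture_name, state FROM v$streams_capture;

-- Propagation status
SELECT propagation_name, status FROM dba_propagation;

-- Apply reader, servers, and coordinator states
SELECT apply_name, state FROM v$streams_apply_reader;
SELECT apply_name, server_id, state FROM v$streams_apply_server;
SELECT apply_name, state FROM v$streams_apply_coordinator;

-- Buffered queue contents, including spilled messages
SELECT queue_name, num_msgs, spill_msgs FROM v$buffered_queues;

-- Total and free Streams pool memory, in bytes
SELECT SUM(bytes) AS total_streams_pool
  FROM v$sgastat
 WHERE pool = 'streams pool';

SELECT bytes AS free_streams_pool
  FROM v$sgastat
 WHERE pool = 'streams pool'
   AND name = 'free memory';
```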

Contact Information

Brian Keating: [email protected]

Chris Lawson: [email protected]

This presentation, and the script mentioned in it, are available for download on the Oracle Magician website:

www.oraclemagician.com