Looking at RAC, GI/Clusterware Diagnostic Tools

Upload: leighton-nelson

Post on 11-May-2015

DESCRIPTION

RAC and Clusterware are complex environments to administer and even more so when there are problems. Learn about various tools and utilities which can be used to troubleshoot, instrument and diagnose these problems.

TRANSCRIPT

Page 1: Looking at RAC,   GI/Clusterware Diagnostic Tools
Page 2: Looking at RAC,   GI/Clusterware Diagnostic Tools

Leighton L. Nelson

Oracle DBA Team Lead (10 years' experience, 6 years with RAC)

RAC SIG US Events Chair and IOUG Liaison

Session# 373

Looking at RAC, GI/Clusterware Diagnostic Tools

Page 3: Looking at RAC,   GI/Clusterware Diagnostic Tools

Clusterware & RAC are Complex!

Page 4: Looking at RAC,   GI/Clusterware Diagnostic Tools

Where do I begin?

Page 5: Looking at RAC,   GI/Clusterware Diagnostic Tools

Clusterware, ASM & RAC Diagnostics

•Diagcollection

•Cluster Verification Utility (cluvfy)

•Cluster Health Monitor (CHM)

•Remote Diagnostic Agent (RDA)

•ADRCI/Support Workbench

•OS Utilities

Page 6: Looking at RAC,   GI/Clusterware Diagnostic Tools

Diagcollection

• Gathers and packages Clusterware logs, traces plus OS logs and core files*

• $ORA_CRS_HOME/bin/diagcollection.pl --collect --crshome $ORA_CRS_HOME (10gR2)

• $GRID_HOME/bin/diagcollection.pl --collect [--core | --crs | --all] (11gR2)

• Logs can be filtered by date/time with --adr --beforetime --aftertime

• Allocate enough space in the current directory for the diagnostic files

• Needs to be run on all nodes in the cluster

• Limited information is collected if not run as root

• In 11.2, diagcollection was enhanced to collect ADR and CHM data
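
The bullets above describe a per-node routine: change into a directory with enough free space, then run the collector as root on every node. The following is a minimal sketch of that routine, not an Oracle-supplied script. The node name `oelgrid01`, the staging directory, the `DRY_RUN` switch, and passwordless root ssh are all assumptions made for illustration; the only diagcollection flag used is `--collect`, as on the slide.

```shell
#!/bin/sh
# Sketch: print (or run) the 11gR2 diagcollection command for each cluster node.
# Assumptions: node names, STAGE_DIR and root ssh access are hypothetical.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0/grid}
STAGE_DIR=${STAGE_DIR:-/u02/diag}   # hypothetical directory with enough free space
DRY_RUN=${DRY_RUN:-1}               # 1 = only print the commands

collect_cmd() {
    # diagcollection writes its archives into the current directory,
    # so change into the staging directory first.
    printf 'cd %s && %s/bin/diagcollection.pl --collect\n' "$STAGE_DIR" "$GRID_HOME"
}

collect_all_nodes() {
    for node in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            printf '%s: %s\n' "$node" "$(collect_cmd)"
        else
            # Run as root; a non-root run collects only limited information.
            ssh "root@$node" "$(collect_cmd)"
        fi
    done
}

collect_all_nodes oelgrid01 oelgrid02
```

With `DRY_RUN=1` the script only lists what it would run, which makes it easy to review the paths before collecting on a production cluster.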

Page 7: Looking at RAC,   GI/Clusterware Diagnostic Tools

diagcollection example

[root@oelgrid02 u02]# /u01/app/11.2.0/grid/bin/diagcollection.sh --collect

Production Copyright 2004, 2010, Oracle. All rights reserved

Cluster Ready Services (CRS) diagnostic collection tool

The following CRS diagnostic archives will be created in the local directory:

crsData_oelgrid02_20120225_1723.tar.gz -> logs, traces and cores from CRS home. Note: core files will be packaged only with the --core option.

ocrData_oelgrid02_20120225_1723.tar.gz -> ocrdump, ocrcheck etc

coreData_oelgrid02_20120225_1723.tar.gz -> contents of CRS core files in text format

osData_oelgrid02_20120225_1723.tar.gz -> logs from Operating System

Collecting crs data

Page 8: Looking at RAC,   GI/Clusterware Diagnostic Tools

Cluster Verification Utility

• Cluvfy runs in stage mode or component mode

• Can be executed from the Grid Infrastructure Home in 11gR2 or from installation media

• New resource in 11.2.0.2.0 - ora.cvu

• “cluvfy comp -list” displays the components that can be checked

• For standalone cluvfy, set CV_HOME, CV_JDKHOME and CV_DESTLOC

Page 9: Looking at RAC,   GI/Clusterware Diagnostic Tools

Cluster Verification Utility

•Use stage mode during installation/upgrade

•Use component mode to diagnose components after Clusterware installation

•Doesn’t diagnose all components, e.g. HAIP

•$GRID_HOME/bin/cluvfy

•$INSTALL_DISK/runcluvfy.sh

•New in 11.2.0.3.0: cluvfy comp healthcheck
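
The split between stage mode and component mode, and between the two cluvfy locations, can be sketched as a small helper. This is an illustration under stated assumptions rather than a recommended wrapper: the function names, the default paths, and the node list are hypothetical, and the two sample invocations (`stage -pre crsinst` and `comp nodecon`) are just one stage check and one component check chosen as examples.

```shell
#!/bin/sh
# Sketch: pick the cluvfy binary and mode. Paths and node list are hypothetical.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0/grid}
INSTALL_DISK=${INSTALL_DISK:-/mnt/grid}   # hypothetical mount point of the media

cluvfy_bin() {
    # Before installation only the media copy (runcluvfy.sh) exists;
    # afterwards use the copy in the Grid Infrastructure home.
    if [ -x "$GRID_HOME/bin/cluvfy" ]; then
        echo "$GRID_HOME/bin/cluvfy"
    else
        echo "$INSTALL_DISK/runcluvfy.sh"
    fi
}

cluvfy_cmd() {
    # $1 = phase ("install" or "diagnose"), $2 = comma-separated node list
    case "$1" in
        install)  echo "$(cluvfy_bin) stage -pre crsinst -n $2 -verbose" ;;
        diagnose) echo "$(cluvfy_bin) comp nodecon -n $2 -verbose" ;;
        *)        echo "usage: cluvfy_cmd install|diagnose nodes" >&2
                  return 1 ;;
    esac
}

cluvfy_cmd install node1,node2
cluvfy_cmd diagnose node1,node2
```

The helper only prints the command line, so the choice of binary and mode can be checked before anything is executed.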

Page 10: Looking at RAC,   GI/Clusterware Diagnostic Tools

Cluster Verification Utility

cluvfy comp -list output

Page 11: Looking at RAC,   GI/Clusterware Diagnostic Tools

Cluster Health Monitor (CHM)

• Cluster Health Monitor (CHM) monitors and collects OS and Clusterware metrics in real time

• Installed by default in 11.2.0.2+

• Collects metrics at 1 sec interval in 11.2.0.2 and 5 sec interval in 11.2.0.3

• Command Line Interface $GRID_HOME/bin/oclumon

• Collect CHM data for upload using diagcollection.pl --collect --chmos
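
To make the sampling intervals concrete, the sketch below works out how many samples per hour each interval produces and builds an `oclumon` command to dump recent history. The function names and the default `GRID_HOME` are hypothetical, and the `dumpnodeview -allnodes -last "HH:MM:SS"` form is assumed from the 11.2 oclumon usage; check `oclumon dumpnodeview -h` on your release before relying on it.

```shell
#!/bin/sh
# Sketch: CHM sample arithmetic and a dump command. Names are hypothetical.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0/grid}

chm_samples_per_hour() {
    # $1 = collection interval in seconds (1 in 11.2.0.2, 5 in 11.2.0.3)
    echo $(( 3600 / $1 ))
}

chm_review_cmd() {
    # $1 = minutes of history to dump, formatted as HH:MM:SS for -last
    printf '%s/bin/oclumon dumpnodeview -allnodes -last "%02d:%02d:00"\n' \
        "$GRID_HOME" $(( $1 / 60 )) $(( $1 % 60 ))
}

chm_samples_per_hour 1    # prints 3600
chm_samples_per_hour 5    # prints 720
chm_review_cmd 15
```

Moving from a 1-second to a 5-second interval cuts the data volume per node to a fifth, which is worth keeping in mind when sizing the CHM repository.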

Page 12: Looking at RAC,   GI/Clusterware Diagnostic Tools

Cluster Health Monitor (CHM)

• Useful for root cause analysis of node reboots/hangs, instance evictions, performance degradation etc.

• The OTN version of CHM and the version bundled with 11.2.0.2 are incompatible: if you run 11.2.0.2, you cannot install the OTN version.

• Uses OS APIs to collect metrics, reducing overhead

• Runs as a Clusterware resource called ora.crf

• CHM doesn’t require RAC or Clusterware

Page 13: Looking at RAC,   GI/Clusterware Diagnostic Tools

OS Watcher Black Box

• OS Watcher v4.0 has been renamed to OS Watcher Black Box (OSWbb)

• UNIX shell scripts for monitoring the OS (ps, top, mpstat, iostat, netstat, vmstat)

• Useful for diagnosing OS resource and performance problems, node reboots

• Should run on all nodes in a cluster

• Set up private interconnect monitoring

• Execute startOSWbb.sh arg1 arg2, where arg1 = snapshot interval in seconds and arg2 = hours of archive data to retain

nohup ./startOSWbb.sh 60 48 &
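
As a quick sanity check on the two arguments, the sketch below estimates how many snapshots each collector keeps for a given interval and retention. It assumes, as the OSWbb user guide describes, that the first argument is the snapshot interval in seconds and the second is the retention in hours; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: estimate retained snapshots for an OSWbb start command.
# Assumes arg1 = interval in seconds, arg2 = retention in hours.

oswbb_snapshots() {
    # $1 = interval in seconds, $2 = retention in hours
    echo $(( $2 * 3600 / $1 ))
}

oswbb_start_cmd() {
    # Builds the start command from the slide for the given arguments.
    echo "nohup ./startOSWbb.sh $1 $2 &"
}

oswbb_start_cmd 60 48
oswbb_snapshots 60 48    # prints 2880
```

The example from the slide (60 seconds, 48 hours) therefore keeps about 2880 snapshots per collector, which helps when estimating the disk space the archive directory needs.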

Page 14: Looking at RAC,   GI/Clusterware Diagnostic Tools

OS Watcher Black Box

• Bundled with OS Watcher Black Box Analyzer (OSWbba)

• Requires Java 1.4.2 or greater

• Correlate OS statistics using the analyzer profile

• Generates graphs and reports for memory, cpu, disk

• Use CLI option to script profile generation for troubleshooting

Page 15: Looking at RAC,   GI/Clusterware Diagnostic Tools

OS Watcher Black Box

Page 16: Looking at RAC,   GI/Clusterware Diagnostic Tools

OS Watcher Black Box

OSWbb Free Memory Graph

Page 17: Looking at RAC,   GI/Clusterware Diagnostic Tools

RACcheck – RAC Configuration Audit Tool

• RACCHECK OUTPUT

Page 18: Looking at RAC,   GI/Clusterware Diagnostic Tools

RACcheck – RAC Configuration Audit Tool

• Assess the configuration of RAC, Clusterware and ASM

• Useful for pre-upgrade and post-upgrade system verification

• Uses “Best Practices” to report configuration problems – PASS/WARNING/FAIL/INFO

• Generates detailed and summary reports with scorecard

Page 19: Looking at RAC,   GI/Clusterware Diagnostic Tools

Remote Diagnostic Agent (RDA)

• The diagnostics tool recommended by MOS

• Collects a wealth of information based on configuration – OS/Clusterware/Database logs

• Runs AWR/Statspack report for Performance problems

• Generates reports in HTML format

Page 20: Looking at RAC,   GI/Clusterware Diagnostic Tools

Procwatcher

•Debug Oracle & Clusterware processes using oradebug short_stack or OS debugger (e.g. gdb, pstack)

•Run as Oracle process owner to debug database or as root for clusterware processes

•Can be deployed as a Clusterware resource

•Useful for troubleshooting session hangs, severe performance problems, instance evictions

Page 21: Looking at RAC,   GI/Clusterware Diagnostic Tools

Procwatcher

grid@node1[+ASM1]-/u02 >./prw.sh start all

Wed Feb 25 02:30:26 CDT 2012: Starting Procwatcher

Wed Feb 25 02:30:26 CDT 2012: Thank you for using Procwatcher. :-)

Wed Feb 25 02:30:26 CDT 2012: Please add a comment to Oracle Support Note 459694.1

Wed Feb 25 02:30:26 CDT 2012: if you have any comments, suggestions, or issues with this tool.

Wed Feb 25 02:30:26 CDT 2012: Started Procwatcher

Page 22: Looking at RAC,   GI/Clusterware Diagnostic Tools

ADRCI/Support Workbench

• Automatic Diagnostic Repository (ADR) stores database diagnostic information

• Package diagnostics files using ADRCI or Support Workbench

• Manages incidents and problems from alert logs

• Enterprise Manager provides a GUI for ADR called Support Workbench
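
For the command-line route, a packaging session with ADRCI can be sketched as below. This is a hedged illustration: the ADR home path `diag/rdbms/orcl/orcl1`, the problem id `1`, and the output directory are made-up values, and the helper functions only print the `adrci exec=...` command lines so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: build adrci commands to list problems and package one for support.
# The ADR home, problem id and output directory are hypothetical examples.

adr_show_cmd() {
    # $1 = ADR home path relative to the ADR base
    printf 'adrci exec="set homepath %s; show problem"\n' "$1"
}

adr_pack_cmd() {
    # $1 = ADR home path, $2 = problem id, $3 = output directory
    printf 'adrci exec="set homepath %s; ips pack problem %s in %s"\n' "$1" "$2" "$3"
}

adr_show_cmd diag/rdbms/orcl/orcl1
adr_pack_cmd diag/rdbms/orcl/orcl1 1 /tmp
```

On a RAC database each instance has its own ADR home, so the same pair of commands would be repeated per instance home on each node.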

Page 23: Looking at RAC,   GI/Clusterware Diagnostic Tools

ADRCI/Support Workbench

Page 24: Looking at RAC,   GI/Clusterware Diagnostic Tools

RACDIAG.SQL

• Gathers debug information for RAC Session Hangs

• One-time data capture

• Performs hanganalyze dumps

• Certain types of hangs will prevent it from running

Page 25: Looking at RAC,   GI/Clusterware Diagnostic Tools

OS Utilities

• truss/strace – trace system calls and signals

• pstack – dump stack trace for process

• pmap/procmap – maps process memory

• nmon/nmon analyzer – collects and analyzes OS stats

• collectl/collectl utils – collects and analyzes OS stats
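
Which tracer is available depends on the platform, so a small dispatcher can help in mixed environments. The sketch below is an assumption-laden example: the function name, the process id, and the output file are hypothetical, and it only covers Linux (strace) and Solaris/AIX (truss) with the common `-f`, `-p` and `-o` options.

```shell
#!/bin/sh
# Sketch: choose a system-call tracer by platform. Names are hypothetical.

syscall_trace_cmd() {
    # $1 = OS name as printed by `uname -s`, $2 = process id, $3 = output file
    case "$1" in
        Linux)     echo "strace -f -p $2 -o $3" ;;
        SunOS|AIX) echo "truss -f -p $2 -o $3" ;;
        *)         echo "no tracer known for $1" >&2
                   return 1 ;;
    esac
}

# Example: build the command for a hypothetical process 12345 on this host.
syscall_trace_cmd "$(uname -s)" 12345 /tmp/trace_12345.out || true
```

Attaching a tracer slows the target process, so on a busy cluster node it is usually attached briefly and to a single process.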

Page 26: Looking at RAC,   GI/Clusterware Diagnostic Tools

Summary

Tool/Utility     Instance Evictions   Node Reboots   Clusterware Problems   RAC Performance
diagcollection   ✓                    ✓              ✓                      ✗
cluvfy           ✗                    ✗              ✓                      ✗
CHM              ✓                    ✓              ✓                      ✓
OSWbb/OSWbba     ✓                    ✓              ✓                      ✓
RDA              ✓                    ✓              ✓                      ✓
RACcheck         ✓                    ✓              ✓                      ✗
Procwatcher      ✓                    ✗              ✓                      ✓
ADRCI/SW         ✗                    ✗              ✗                      ✓
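
The summary matrix can also be read the other way round, from symptom to tools. The lookup below is a simple sketch of that reading; the function name and the four symptom keywords are hypothetical, while the tool lists are taken directly from the check marks in the summary.

```shell
#!/bin/sh
# Sketch: map a symptom to the tools marked as applicable in the summary.
# The symptom keywords are hypothetical labels for the four columns.

tools_for() {
    case "$1" in
        eviction)    echo "diagcollection CHM OSWbb RDA RACcheck Procwatcher" ;;
        reboot)      echo "diagcollection CHM OSWbb RDA RACcheck" ;;
        clusterware) echo "diagcollection cluvfy CHM OSWbb RDA RACcheck Procwatcher" ;;
        performance) echo "CHM OSWbb RDA Procwatcher ADRCI/SW" ;;
        *)           echo "unknown symptom: $1" >&2
                     return 1 ;;
    esac
}

tools_for reboot
```

CHM, OSWbb and RDA appear under every symptom, which makes them reasonable defaults to have running or ready before a problem occurs.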

Page 27: Looking at RAC,   GI/Clusterware Diagnostic Tools

MOS Notes

• OS Watcher Black Box User Guide [ID 301137.1]

• OS Watcher Black Box Analyzer User Guide [ID 461053.1]

• Data Gathering for Troubleshooting Oracle Clusterware (CRS or GI) Issues [ID 289690.1]

• CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide [ID 330358.1]

• Diagnosability for Oracle Clusterware (CRS or Grid Infrastructure) Component and Resource [ID 357808.1]

• Data Gathering for Troubleshooting RAC Issues [ID 556679.1]

• Cluster Health Monitor (CHM) FAQ [ID 1328466.1]

• Introducing Cluster Health Monitor (IPD/OS) [ID 736752.1]

• RACcheck - RAC Configuration Audit Tool [ID 1268927.1]

• Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes [ID 459694.1]

• Script to Collect RAC Diagnostic Information (racdiag.sql) [ID 135714.1]

Page 28: Looking at RAC,   GI/Clusterware Diagnostic Tools

Contact Information

•Website - blogs.griddba.com

•LinkedIn – Leighton Nelson

•Twitter - @leight0nn

•Email: [email protected]