RAC Troubleshooting and Diagnosability Sangam2016

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | Troubleshooting and Diagnosing Oracle RAC in the Private Cloud | Sandesh Rao, Senior Director, RAC Development

Upload: sandesh-rao

Post on 14-Apr-2017


TRANSCRIPT

Page 1: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting and Diagnosing Oracle RAC in the Private Cloud

Sandesh Rao, Senior Director, RAC Development

Page 2: RAC Troubleshooting and Diagnosability Sangam2016

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 3: RAC Troubleshooting and Diagnosability Sangam2016

Agenda

• Architectural Overview
• Troubleshooting Scenarios
• Proactive and Reactive tools
• Q&A

Page 4: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Overview

• Oracle Clusterware is required for 11gR2+ RAC databases
• Oracle Clusterware can manage non-RAC database resources using agents
• Oracle Clusterware can manage HA for any business-critical application with the agent infrastructure, also called XAG
  – Oracle publishes Bundled Agents for some non-RAC DB resources
    • SAP, GoldenGate, Siebel, Apache, ...

Oracle Grid Infrastructure

Page 5: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Overview

• Grid Infrastructure is the name for the combination of:
  – Oracle Cluster Ready Services (CRS)
  – Oracle Automatic Storage Management (ASM)
• The Grid Home contains the software for both products
• CRS can also be Standalone for ASM and/or Oracle Restart
• CRS can run by itself or in combination with other vendor clusterware
• Grid Home and RDBMS home must be installed in different locations
  – The installer locks the Grid Home path by setting root permissions.

Page 6: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Overview

• CRS requires shared Oracle Cluster Registry (OCR) and Voting files
  – Must be in ASM or CFS
  – OCR is backed up automatically every 4 hours to GI_HOME/cdata
  – Backups kept: 4, 8, 12 hours, 1 day, 1 week
  – Restored with ocrconfig
  – Voting file is backed up into the OCR at each change.
  – Voting file is restored with crsctl

Page 7: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Overview

• For the network, CRS requires
  – One or multiple high-speed, low-latency, redundant private networks for inter-node communications
  – Think of the interconnect as a memory backplane for the cluster
  – Should be a separate physical network or a managed converged network
  – VLANs are supported
  – Used for:
    • Clusterware messaging
    • RDBMS messaging and block transfer
    • ASM messaging
    • HANFS for block traffic

Page 8: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Overview

• Only one set of Clusterware daemons can run on each node
• The CRS stack is spawned from the Oracle HA Services Daemon (ohasd)
• On Unix, ohasd runs out of inittab with respawn
• A node can be evicted when deemed unhealthy
  – May require a reboot, but at least a CRS stack restart (rebootless restart)
  – IPMI integration, or diskmon in case of Exadata
• CRS provides Cluster Time Synchronization services
  – Always runs, but in observer mode if ntpd is configured

Page 9: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Processes

11.2+ Agents change everything.

• Multi-threaded daemons
• Manage multiple resources and types
• Implement entry points for multiple resource types
  – Start, stop, check, clean, fail
• oraagent, orarootagent, application agent, script agent, cssdagent
• Single process started from init on Unix (ohasd)
• Diagram below shows all core resources

Page 10: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Processes

[Figure: Grid Infrastructure process startup diagram, Level 0 through Level 4b]

Page 11: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Processes: Init Scripts

• /etc/init.d/ohasd (location O/S dependent)
  – RC script with “start” and “stop” actions
  – Initiates Oracle Clusterware autostart
  – Control file coordinates with CRSCTL
• /etc/init.d/init.ohasd (location O/S dependent)
  – OHASD Framework Script runs from init/upstart
  – Control file coordinates with CRSCTL
  – Named pipe syncs with OHASD

Page 12: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Processes

• Level 1: OHASD spawns:
  – cssdagent - Agent responsible for spawning CSSD
  – orarootagent - Agent responsible for managing all root owned ohasd resources
  – oraagent - Agent responsible for managing all oracle owned ohasd resources
  – cssdmonitor - Monitors CSSD and node health (along with the cssdagent)
• Level 2a: OHASD rootagent spawns:
  – CRSD - Primary daemon responsible for managing cluster resources.
  – CTSSD - Cluster Time Synchronization Services Daemon
  – Diskmon (Exadata)
  – ACFS (ASM Cluster File System) Drivers

Page 13: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Processes

• Level 2b: OHASD oraagent spawns:
  – MDNSD - Multicast DNS daemon
  – GIPCD - Grid IPC Daemon
  – GPNPD - Grid Plug and Play Daemon
  – EVMD - Event Monitor Daemon
  – ASM - ASM instance started here as may be required by CRSD
• Level 3: CRSD spawns:
  – orarootagent - Agent responsible for managing all root owned crsd resources.
  – oraagent - Agent responsible for managing all non-root owned crsd resources.
    • One is spawned for every user that has CRS resources to manage.

Page 14: RAC Troubleshooting and Diagnosability Sangam2016

Grid Infrastructure Processes

• Level 4: CRSD oraagent spawns:
  – ASM Resource - ASM Instance(s) resource (proxy resource)
  – Diskgroup - Used for managing/monitoring ASM diskgroups.
  – DB Resource - Used for monitoring and managing the DB and instances
  – SCAN Listener - Listener for single client access name, listening on SCAN VIP
  – Listener - Node listener listening on the Node VIP
  – Services - Used for monitoring and managing services
  – ONS - Oracle Notification Service
  – eONS - Enhanced Oracle Notification Service (pre 11.2.0.2)
  – GSD - For 9i backward compatibility
  – GNS (optional) - Grid Naming Service - Performs name resolution

Startup Sequence
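The spawn levels listed on the preceding slides form a tree rooted at ohasd. The Python sketch below assumes the parent/child relationships exactly as the slides list them; the dictionary, the "(ohasd)" / "(crsd)" labels and the `startup_chain` helper are illustrative names, not Clusterware identifiers. It walks from a daemon back up to the process that should have started it, which is the order in which to check logs when something is missing.

```python
# Parent of each process, transcribed from the Level 1-3 slides.
# The "(ohasd)" / "(crsd)" suffixes are labels invented here to tell the
# two generations of agents apart; they are not real process names.
SPAWNED_BY = {
    "cssdagent": "ohasd",
    "cssdmonitor": "ohasd",
    "orarootagent (ohasd)": "ohasd",
    "oraagent (ohasd)": "ohasd",
    "cssd": "cssdagent",
    "crsd": "orarootagent (ohasd)",
    "ctssd": "orarootagent (ohasd)",
    "diskmon": "orarootagent (ohasd)",
    "mdnsd": "oraagent (ohasd)",
    "gipcd": "oraagent (ohasd)",
    "gpnpd": "oraagent (ohasd)",
    "evmd": "oraagent (ohasd)",
    "asm": "oraagent (ohasd)",
    "orarootagent (crsd)": "crsd",
    "oraagent (crsd)": "crsd",
}


def startup_chain(component):
    """Return the spawn chain from ohasd down to the given component."""
    chain = [component]
    # Follow parent links until we reach a process with no recorded parent.
    while chain[-1] in SPAWNED_BY:
        chain.append(SPAWNED_BY[chain[-1]])
    return chain[::-1]


if __name__ == "__main__":
    print(" -> ".join(startup_chain("oraagent (crsd)")))
```

If a Level 4 resource such as a listener is down, the chain for "oraagent (crsd)" names the daemons to verify first, from ohasd downwards.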

Page 15: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Cluster Startup Problem Triage (11.2+)

[Figure: Cluster Startup Diagnostic Flow. Recoverable content of the flowchart:]

• Startup sequence: is OHASD running?
  – ps -ef | grep init.ohasd
  – ps -ef | grep ohasd.bin
• If NO: check crsctl config crs and ohasd.log. Is the cause obvious?
  – YES: engage sysadmin team
  – NO: run TFA Collector and engage Oracle Support
• If YES: is the rest of the stack running?
  – ps -ef | grep cssdagent
  – ps -ef | grep ocssd.bin
  – ps -ef | grep orarootagent
  – ps -ef | grep ctssd.bin
  – ps -ef | grep crsd.bin
  – ps -ef | grep cssdmonitor
  – ps -ef | grep oraagent
  – ps -ef | grep ora.asm
  – ps -ef | grep gpnpd.bin
  – ps -ef | grep mdnsd.bin
  – ps -ef | grep evmd.bin
  – crsctl check crs
  – crsctl check cluster
• If NO: check ohasd.log, agent logs and process logs. Is the cause obvious?
  – YES: engage sysadmin team
  – NO: run TFA Collector, check ohasd.log and OLR permissions, and compare with a reference system
• Still not obvious: run TFA Collector and engage Oracle Support and the sysadmin team
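The "Running?" checks in the diagnostic flow amount to looking for a known set of process names in the `ps -ef` output. A minimal Python sketch of that check follows; it assumes plain substring matching is good enough and uses a subset of the names grepped for above. The helper names `missing_processes` and `current_ps_output` are hypothetical.

```python
import subprocess

# Process names taken from the ps/grep commands in the diagnostic flow.
EXPECTED = [
    "init.ohasd", "ohasd.bin", "cssdagent", "ocssd.bin", "orarootagent",
    "ctssd.bin", "crsd.bin", "cssdmonitor", "oraagent", "gpnpd.bin",
    "mdnsd.bin", "evmd.bin",
]


def missing_processes(ps_output, expected=EXPECTED):
    """Return the expected names that appear on no line of the ps output."""
    lines = ps_output.splitlines()
    return [name for name in expected
            if not any(name in line for line in lines)]


def current_ps_output():
    """Capture `ps -ef` from the local node (Unix only)."""
    result = subprocess.run(["ps", "-ef"], capture_output=True,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for name in missing_processes(current_ps_output()):
        print("not running:", name)
```

An empty result only says the processes exist; `crsctl check crs` and `crsctl check cluster` remain the checks for whether the stack is healthy.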

Page 16: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Cluster Startup Problem Triage

• Multicast Domain Name Service Daemon (mDNS(d))
  – Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX and on Windows.
  – Uses multicast for cache updates on service advertisement arrival/departure.
  – Advertises/serves on all found node interfaces.
  – Log is GI_HOME/log/<node>/mdnsd/mdnsd.log

Page 17: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Cluster Startup Problem Triage

<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="6" ClusterUId="b1eec1fcdd355f2bbf7910ce9cc4a228" ClusterName="staij-cluster" PALocation="">
  <gpnp:Network-Profile>
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth0" Use="public"/>
      <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth1" Use="cluster_interconnect"/>
    </gpnp:HostNetwork>
  </gpnp:Network-Profile>
  <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
  <orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+SYSTEM/staij-cluster/asmparameterfile/registry.253.693925293"/>
  <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"><InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>x1H9LWjyNyMn6BsOykHhMvxnP8U=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>N+20jG4=</ds:SignatureValue></ds:Signature>
</gpnp:GPnP-Profile>
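When startup fails on network configuration, it helps to see which adapter the GPnP profile assigns to the public network and which to the interconnect. The Python sketch below parses a trimmed, well-formed copy of the profile shown above. The signature block is dropped because the slide truncates it, and `SAMPLE_PROFILE` and `networks_by_use` are illustrative names rather than part of any Oracle tool.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the profile on the slide.
GPNP_NS = {"gpnp": "http://www.grid-pnp.org/2005/11/gpnp-profile"}

# Trimmed copy of the slide's profile (assumption: signature omitted).
SAMPLE_PROFILE = """
<gpnp:GPnP-Profile Version="1.0"
    xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
    xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile"
    ClusterName="staij-cluster">
  <gpnp:Network-Profile>
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth0" Use="public"/>
      <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth1"
                    Use="cluster_interconnect"/>
    </gpnp:HostNetwork>
  </gpnp:Network-Profile>
  <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
</gpnp:GPnP-Profile>
"""


def networks_by_use(profile_xml):
    """Map each network's Use attribute to its (Adapter, IP) pair."""
    root = ET.fromstring(profile_xml.strip())
    result = {}
    for net in root.findall(".//gpnp:Network", GPNP_NS):
        result[net.get("Use")] = (net.get("Adapter"), net.get("IP"))
    return result


if __name__ == "__main__":
    for use, (adapter, subnet) in networks_by_use(SAMPLE_PROFILE).items():
        print(use, adapter, subnet)
```

Comparing this mapping with the interfaces actually present on the node is a quick way to spot a renamed or missing private adapter.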

Page 18: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Cluster Startup Problem Triage

• cssdagent and monitor
  – Same functionality in both agent and monitor
  – Functionality of several pre-11.2 daemons consolidated in both
    • OPROCD – system hang
    • OMON – Oracle Clusterware monitor
    • VMON – vendor clusterware monitor
  – Run realtime with locked down memory, like CSSD
  – Provides enhanced stability and diagnosability
  – Logs are
    • GI_HOME/log/<node>/agent/oracssdagent_root/oracssdagent_root.log
    • GI_HOME/log/<node>/agent/oracssdmonitor_root/oracssdmonitor_root.log
    • 12.1 – ORACLE_BASE/diag/node/agent/..

Page 19: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Node Evictions

[Figure: Node Eviction Diagnostic Flow. Recoverable content of the flowchart:]

• Eviction scenario: start from the cluster alert log and ocssd.log
• NHB (network heartbeat) missing? See notes 1050693.1, 1534949.1, 1546004.1. If YES, engage networking team
• DHB (disk heartbeat) missing? See notes 1549428.1, 1466639.1. If YES, engage storage team
• Fenced? Resource starvation? See notes 1531223.1, 1328466.1 and the system log; check free memory, CPU load and node response. If YES, engage sysadmin team
• At each step: is the cause obvious? If YES, engage the appropriate team; if NO or not resolved, run TFA Collector and engage Oracle Support

Page 20: RAC Troubleshooting and Diagnosability Sangam2016

Missing Network Heartbeat (1)

• ocssd.log from node 1
• ===> sending network heartbeats to other nodes. Normally, this message is output once every 5 messages (seconds)
• 2016-08-13 17:00:20.023: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:20.023: [CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> The network heartbeat is not received from node 2 (drrac2) for 15 consecutive seconds.
• ===> This means that 15 network heartbeats are missing and is the first warning (50% threshold).
• 2016-08-13 17:00:22.818: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 50% heartbeat fatal, removal in 14.520 seconds
• 2016-08-13 17:00:22.818: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) is impending reconfig, flag 132108, misstime 15480
• ===> continuing to send the network heartbeats and log messages once every 5 messages
• 2016-08-13 17:00:25.023: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:25.023: [CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> 75% threshold of missing network heartbeat is reached. This is the second warning.
• 2016-08-13 17:00:29.833: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 75% heartbeat fatal, removal in 7.500 seconds
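The warning percentages are fractions of the network heartbeat timeout (misscount). The 50% line above reports a misstime of 15480 ms with 14.520 seconds to go, which adds up to 30 seconds. The Python sketch below does that arithmetic under the assumption of a 30-second misscount; the function names are illustrative only.

```python
def warning_offsets(misscount_s=30.0, thresholds=(50, 75, 90)):
    """Seconds of missed heartbeats at which each warning is expected."""
    return {pct: misscount_s * pct / 100 for pct in thresholds}


def implied_misscount(misstime_ms, removal_in_s):
    """Timeout implied by one warning: time already missed plus time left."""
    return round(misstime_ms / 1000 + removal_in_s, 3)


if __name__ == "__main__":
    # Values from the 50% warning in the node 1 log above.
    print(implied_misscount(15480, 14.520))
    print(warning_offsets())
```

With a 30-second timeout the three warnings fall at 15, 22.5 and 27 seconds of silence, which matches the spacing of the 50%, 75% and 90% messages in the log.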

Page 21: RAC Troubleshooting and Diagnosability Sangam2016

Missing Network Heartbeat (2)

• ===> continuing to send the network heartbeats and log messages once every 5 messages
• 2016-08-13 17:00:30.023: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:30.023: [CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> continuing to send the network heartbeats, but the message is logged after 4 messages
• 2016-08-13 17:00:34.021: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:34.021: [CSSD][4096109472]clssnmSendingThread: sent 4 status msgs to all nodes
• ===> Last warning shows that the 90% threshold of the missing network heartbeat is reached.
• ===> The eviction will occur in 2.49 seconds.
• 2016-08-13 17:00:34.841: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 90% heartbeat fatal, removal in 2.490 seconds, seedhbimpd 1
• ===> Eviction of node 2 (drrac2) started
• 2016-08-13 17:00:37.337: [CSSD][4106599328]clssnmPollingThread: Removal started for node drrac2 (2), flags 0x2040c, state 3, wt4c 0
• ===> This shows that node 2 is actively updating the voting disks
• 2016-08-13 17:00:37.340: [CSSD][4085619616]clssnmCheckSplit: Node 2, drrac2, is alive, DHB (1281744040, 1396854) more than disk timeout of 27000 after the last NHB (1281744011, 1367154)

Page 22: RAC Troubleshooting and Diagnosability Sangam2016

Missing Network Heartbeat (3)

• ===> Evicting node 2 (drrac2)
• 2016-08-13 17:00:37.340: [CSSD][4085619616](:CSSNM00007:)clssnmrEvict: Evicting node 2, drrac2, from the cluster in incarnation 169934272, node birth incarnation 169934271, death incarnation 169934272, stateflags 0x24000
• ===> Reconfigured the cluster without node 2
• 2016-08-13 17:01:07.705: [CSSD][4043389856]clssgmCMReconfig: reconfiguration successful, incarnation 169934272 with 1 nodes, local node number 1, master node number 1
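The clssnmPollingThread warnings carry enough information to predict when the removal will start. The Python sketch below parses one such line and adds the remaining time to its timestamp. It assumes the spacing of a real ocssd.log line, which the slide text had lost; the regular expression and the helper name `parse_poll_warning` are this sketch's own.

```python
import re
from datetime import datetime, timedelta

# Timestamp, node name/number, threshold and remaining seconds of a warning.
POLL_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):.*"
    r"clssnmPollingThread: node (?P<node>\S+) \((?P<num>\d+)\) at "
    r"(?P<pct>\d+)% heartbeat fatal, removal in (?P<left>[\d.]+) seconds"
)


def parse_poll_warning(line):
    """Return warning details for a heartbeat warning line, else None."""
    match = POLL_RE.search(line)
    if match is None:
        return None
    seen = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
    # Convert via whole milliseconds to avoid float drift in the sum.
    left = timedelta(milliseconds=round(float(match["left"]) * 1000))
    return {
        "node": match["node"],
        "number": int(match["num"]),
        "percent": int(match["pct"]),
        "expected_removal": seen + left,
    }


if __name__ == "__main__":
    line = ("2016-08-13 17:00:22.818: [CSSD][4106599328]clssnmPollingThread: "
            "node drrac2 (2) at 50% heartbeat fatal, removal in 14.520 seconds")
    print(parse_poll_warning(line))
```

For the 50% warning in the node 1 log, the predicted removal time lands within a millisecond or two of the "Removal started" message logged at 17:00:37.337.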

Page 23: RAC Troubleshooting and Diagnosability Sangam2016

Missing Network Heartbeat (4)

• ocssd.log from node 2:
• ===> Logging the message to indicate 5 network heartbeats are sent to other nodes
• 2016-08-13 17:00:26.009: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:26.009: [CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> First warning of reaching 50% threshold of missing network heartbeats
• 2016-08-13 17:00:26.213: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 50% heartbeat fatal, removal in 14.540 seconds
• 2016-08-13 17:00:26.213: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) is impending reconfig, flag 394254, misstime 15460
• ===> Logging the message to indicate 5 network heartbeats are sent to other nodes
• 2016-08-13 17:00:31.009: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:31.009: [CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> Second warning of reaching 75% threshold of missing network heartbeats
• 2016-08-13 17:00:33.227: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 75% heartbeat fatal, removal in 7.470 seconds

Page 24: RAC Troubleshooting and Diagnosability Sangam2016

Missing Network Heartbeat (5)

• ===> Logging the message to indicate 4 network heartbeats are sent
• 2016-08-13 17:00:35.009: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:35.009: [CSSD][4062550944]clssnmSendingThread: sent 4 status msgs to all nodes
• ===> Third warning of reaching 90% threshold of missing network heartbeats
• 2016-08-13 17:00:38.236: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 90% heartbeat fatal, removal in 2.460 seconds, seedhbimpd 1
• ===> Logging the message to indicate 5 network heartbeats are sent to other nodes
• 2016-08-13 17:00:40.008: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:40.009: [CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> Eviction started for node 1 (drrac1)
• 2016-08-13 17:00:40.702: [CSSD][4073040800]clssnmPollingThread: Removal started for node drrac1 (1), flags 0x6040e, state 3, wt4c 0
• ===> Node 1 is actively updating the voting disk, so this is a split brain condition
• 2016-08-13 17:00:40.706: [CSSD][4052061088]clssnmCheckSplit: Node 1, drrac1, is alive, DHB (1281744036, 1243744) more than disk timeout of 27000 after the last NHB (1281744007, 1214144)
• 2016-08-13 17:00:40.706: [CSSD][4052061088]clssnmCheckDskInfo: My cohort: 2
• 2016-08-13 17:00:40.707: [CSSD][4052061088]clssnmCheckDskInfo: Surviving cohort: 1

Page 25: RAC Troubleshooting and Diagnosability Sangam2016

Missing Network Heartbeat (6)

• ===> Node 2 is aborting itself to resolve the split brain and ensure the cluster integrity
• 2016-08-13 17:00:40.707: [CSSD][4052061088](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, drrac2, is smaller than cohort of 1 nodes led by node 1, drrac1, based on map type 2
• 2016-08-13 17:00:40.707: [CSSD][4052061088]###################################
• 2016-08-13 17:00:40.707: [CSSD][4052061088]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
• 2016-08-13 17:00:40.707: [CSSD][4052061088]###################################
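The abort message compares two cohorts of one node each and keeps the one led by node 1. A rough Python sketch of a rule consistent with that outcome follows: the larger cohort survives, and on a tie the cohort containing the lowest node number survives. This is an assumption drawn from this one log; the actual CSS decision involves more inputs, and `surviving_cohort` is a hypothetical name.

```python
def surviving_cohort(cohorts):
    """Pick the survivor: largest cohort, ties won by the lowest node number."""
    # Sort key: more nodes first, then the smaller minimum node number.
    return max(cohorts, key=lambda cohort: (len(cohort), -min(cohort)))


if __name__ == "__main__":
    # The two-node split from the log: node 1 alone versus node 2 alone.
    print(surviving_cohort([{2}, {1}]))
```

Applied to the log above, node 2 (drrac2) finds itself outside the surviving cohort and aborts, which is what the clssscExit line records.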

Page 26: RAC Troubleshooting and Diagnosability Sangam2016

Missing Network Heartbeat (7)

• Observations
  1. Both nodes reported missing heartbeats at the same time
  2. Both nodes sent heartbeats to other nodes all the time
  3. Node 2 aborted itself to resolve split brain
• Conclusion
  1. This is likely a network problem, engage network team
  2. Check OSWatcher output (netstat and traceroute)
     1. Configure private.net file, not configured by default
  3. Check CHMOS
  4. Check system log

Page 27: RAC Troubleshooting and Diagnosability Sangam2016

Voting Disk Access Problem (1)

ocssd.log:

===> The first error: ocssd.bin could not read the voting disk

2016-08-13 18:31:19.787: [SKGFD][4131736480]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 721425
Additional information: -1)
)

2016-08-13 18:31:19.787: [CSSD][4131736480](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 529 of /dev/sdb8

2016-08-13 18:31:19.802: [CSSD][4131736480]clssnmvDiskAvailabilityChange: voting file /dev/sdb8 now offline

Page 28: RAC Troubleshooting and Diagnosability Sangam2016

Voting Disk Access Problem (2)

====> The error message that shows a problem accessing the voting disk repeats once every 4 seconds

2016-08-13 18:31:23.782: [CSSD][150477728]clssnmvDiskOpen: Opening /dev/sdb8

2016-08-13 18:31:23.782: [SKGFD][150477728]Handle 0xf43fc6c8 from lib :UFS:: for disk :/dev/sdb8:

2016-08-13 18:31:23.782: [CLSF][150477728]Opened hdl:0xf4365708 for dev:/dev/sdb8:

2016-08-13 18:31:23.787: [SKGFD][150477728]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 720913
Additional information: -1)
)

2016-08-13 18:31:23.787: [CSSD][150477728](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 17 of /dev/sdb8

Page 29: RAC Troubleshooting and Diagnosability Sangam2016

Voting Disk Access Problem (3)

====> The last error that shows a problem accessing the voting disk.
====> Note that the last message is 200 seconds after the first message
====> because the long disktimeout is 200 seconds

2016-08-13 18:34:37.423: [CSSD][150477728]clssnmvDiskOpen: Opening /dev/sdb8

2016-08-13 18:34:37.423: [CLSF][150477728]Opened hdl:0xf4336530 for dev:/dev/sdb8:

2016-08-13 18:34:37.429: [SKGFD][150477728]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 720913
Additional information: -1)
)

2016-08-13 18:34:37.429: [CSSD][150477728](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 17 of /dev/sdb8

Page 30: RAC Troubleshooting and Diagnosability Sangam2016

Voting Disk Access Problem (4)

====> This message shows that ocssd.bin tried accessing the voting disk for 200 seconds

2016-08-13 18:34:38.205: [CSSD][4110736288](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 200880 ms for voting file /dev/sdb8)

====> ocssd.bin aborts itself with an error message that the majority of voting disks are not available. In this case there was only one voting disk. With three voting disks configured, ocssd.bin does not abort as long as two of them are accessible.

2016-08-13 18:34:38.206: [CSSD][4110736288](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1

2016-08-13 18:34:38.206: [CSSD][4110736288]###################################

2016-08-13 18:34:38.206: [CSSD][4110736288]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread

2016-08-13 18:34:38.206: [CSSD][4110736288]###################################

• Conclusion: The voting disk was not available, engage storage team
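The abort condition is a majority rule: "0 of 1 ... need 1" in the log, and two out of three in the annotation. The Python sketch below states that rule and measures the time between the first read error and the abort, using the timestamps from the log excerpts. The function names are illustrative, not CSS internals.

```python
from datetime import datetime


def votes_needed(configured):
    """Strict majority of the configured voting disks."""
    return configured // 2 + 1


def css_keeps_running(available, configured):
    """True while a majority of voting disks is still accessible."""
    return available >= votes_needed(configured)


FMT = "%Y-%m-%d %H:%M:%S.%f"
# Timestamps copied from the first read error and the abort message.
first_error = datetime.strptime("2016-08-13 18:31:19.787", FMT)
abort = datetime.strptime("2016-08-13 18:34:38.206", FMT)
elapsed_s = (abort - first_error).total_seconds()

if __name__ == "__main__":
    print(votes_needed(1), votes_needed(3))
    print(css_keeps_running(0, 1), css_keeps_running(2, 3))
    print(round(elapsed_s, 3))
```

The elapsed time of a little under 200 seconds is in line with the long disk timeout mentioned in the annotations.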

Page 31: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Node Eviction Triage

• Time synchronisation issue
• Cluster Time Synchronisation Services daemon
  – Provides time management in a cluster for Oracle.
• Observer mode when vendor time synchronisation s/w is found
  – Logs time difference to the CRS alert log
• Active mode when no vendor time sync s/w is found

Page 32: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Node Eviction Triage

• Cluster Ready Services Daemon
  – The CRSD daemon is primarily responsible for maintaining the availability of application resources, such as database instances. CRSD is responsible for starting and stopping these resources, relocating them when required to another node in the event of failure, and maintaining the resource profiles in the OCR (Oracle Cluster Registry). In addition, CRSD is responsible for overseeing the caching of the OCR for faster access, and also backing up the OCR.
  – Log file is GI_HOME/log/<node>/crsd/crsd.log
    • Rotation policy 10-50M
    • Retention policy 10 logs
    • Dynamic in 12.1 and can be changed

Page 33: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Node Eviction Triage

• CRSD oraagent
  – CRSD’s oraagent manages
    • all database, instance, service and diskgroup resources
    • node listeners
    • SCAN listeners, and ONS
  – If the Grid Infrastructure owner is different from the RDBMS home owner, then you would have 2 oraagents, each running as one of the installation owners. The database and service resources would be managed by the RDBMS home owner and other resources by the Grid Infrastructure home owner.
  – Log file is
    • GI_HOME/log/<node>/agent/crsd/oraagent_<user>/oraagent_<user>.log

Page 34: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Node Eviction Triage

• CRSD orarootagent
  – CRSD’s rootagent manages
    • GNS and its VIP
    • Node VIP
    • SCAN VIP
    • network resources.
  – Log file is
    • GI_HOME/log/<node>/agent/crsd/orarootagent_root/orarootagent_root.log

Page 35: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Node Eviction Triage

• Agent return codes
  – Check entry must return one of the following return codes:
    • ONLINE
    • UNPLANNED_OFFLINE
      – Target=online, may be recovered or failed over
    • PLANNED_OFFLINE
    • UNKNOWN
      – Cannot determine; if previously online or partial, then monitor
    • PARTIAL
      – Some of a resource's services are available. Instance up but not open.
    • FAILED
      – Requires clean action
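The six check return codes and the notes attached to them can be written down as a small lookup, which is handy when annotating agent logs. The Python sketch below is an illustration only: the enum and the follow-up strings paraphrase the slide and are not the Clusterware agent framework API.

```python
from enum import Enum


class CheckState(Enum):
    """Return codes a check entry point may report, as listed on the slide."""
    ONLINE = "ONLINE"
    UNPLANNED_OFFLINE = "UNPLANNED_OFFLINE"
    PLANNED_OFFLINE = "PLANNED_OFFLINE"
    UNKNOWN = "UNKNOWN"
    PARTIAL = "PARTIAL"
    FAILED = "FAILED"


# Follow-up notes paraphrased from the slide; wording is this sketch's own.
FOLLOW_UP = {
    CheckState.UNPLANNED_OFFLINE: "target is online: recover or fail over",
    CheckState.UNKNOWN: "keep monitoring if previously ONLINE or PARTIAL",
    CheckState.PARTIAL: "some services available, e.g. instance up but not open",
    CheckState.FAILED: "run the clean action",
}


def follow_up(state):
    """Return the slide's note for a state, or a default when none is given."""
    return FOLLOW_UP.get(state, "no action noted on the slide")


if __name__ == "__main__":
    for state in CheckState:
        print(f"{state.value:18} {follow_up(state)}")
```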

Page 36: RAC Troubleshooting and Diagnosability Sangam2016

Troubleshooting Scenarios: Automatic Diagnostic Repository (ADR)

§ Important logs and traces
§ 11.2 – Databases only use ADR
  • Grid Infrastructure files in $GI_HOME/log/<node_name>/<component_name>
    – $GI_HOME/log/myHost/cssd
    – $GI_HOME/log/myHost/alertmyHost.log
§ 12.1 – Grid Infrastructure and Database use ADR
§ Different locations for Grid Infrastructure and Databases
§ Grid Infrastructure
  • Alert.log, cssd.log, crsd.log, etc.
§ Databases
  • Alert.log, background process traces, foreground process traces
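For 11.2, the log locations quoted in this deck all hang off $GI_HOME/log/<node_name>. The Python sketch below builds those paths from a small table of the file names the slides mention. The Grid home used in the example is a made-up path, and `gi_log_path` and the template table are illustrative names.

```python
from pathlib import PurePosixPath

# Relative 11.2 log locations as quoted on the slides.
LOG_TEMPLATES_11_2 = {
    "alert": "alert{node}.log",
    "crsd": "crsd/crsd.log",
    "mdnsd": "mdnsd/mdnsd.log",
}


def gi_log_path(gi_home, node, component):
    """Build the 11.2 log path for a component listed in the table."""
    relative = LOG_TEMPLATES_11_2[component].format(node=node)
    return PurePosixPath(gi_home) / "log" / node / relative


if __name__ == "__main__":
    # "/u01/app/11.2.0/grid" is a hypothetical Grid home for illustration.
    for component in LOG_TEMPLATES_11_2:
        print(gi_log_path("/u01/app/11.2.0/grid", "myHost", component))
```

In 12.1 the Grid Infrastructure logs move under ADR, so this layout applies to 11.2 only.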

Page 37: RAC Troubleshooting and Diagnosability Sangam2016

Oracle’s Database and Clusterware Tools

• What if issues were detected before they had an impact?
• What if you were notified with a specific diagnosis and corrective actions?
• What if resource bottlenecks threatening SLAs were identified early?
• What if bottlenecks could be automatically relieved just in time?
• What if database hangs and node reboots could be eliminated?

[Figure: Cluster Verification Utility, ORAchk, Cluster Health Monitor, Trace File Analyzer, Quality of Service Management, Hang Manager, EXAchk, Cluster Health Advisor, Memory Guard]

Page 38: RAC Troubleshooting and Diagnosability Sangam2016

Oracle EXAchk/ORAchk (Proactive)

Lightweight & non-intrusive Oracle Stack Health Checks

– Automated risk identification and proactive notification before business is impacted
– Health Checks based on most impactful reoccurring problems across Oracle customer base
– Runs in your environment – no need to send anything to Oracle
– Scheduled email Health Check reports
– Findings can be integrated into other tools of choice

[Figure: Oracle EXAchk (Engineered Systems) and Oracle ORAchk (Non-Engineered Systems) on a Common Framework]

Page 39: RAC Troubleshooting and Diagnosability Sangam2016

Roll Out & Maintain EXAchk

1. Included in base image and latest OEDA
2. Download latest version from My Oracle Support (install < 1 min) 1070954.1
3. Auto update when later version available

Page 40: RAC Troubleshooting and Diagnosability Sangam2016

Installation

1. Download the orachk.zip to your local machine from MOS Note 1268927.2
2. Transfer to a directory on the target system
3. Unzip orachk.zip
   o As owner of Oracle database or grid home

Page 41: RAC Troubleshooting and Diagnosability Sangam2016

Profiles

• Profiles provide logical grouping of checks which are about similar topics
• Run only checks in a specific profile
  ./exachk -profile <profile>
• Run everything except checks in a specific profile
  ./exachk -excludeprofile <profile>

Profile               Description
asm                   ASM Checks
avdf                  Audit Vault Configuration checks
clusterware           Oracle clusterware checks
control_VM            Checks only for Control VM (ec1-vm, ovmm, db, pc1, pc2). No cross node checks
corroborate           Exadata checks needs further review by user to determine pass or fail
dba                   DBA Checks
ebs                   Oracle E-Business Suite checks
eci_healthchecks      Enterprise Cloud Infrastructure Healthchecks
ecs_healthchecks      Enterprise Cloud System Healthchecks
goldengate            Oracle GoldenGate checks
hardware              Hardware specific checks for Oracle Engineered systems
maa                   Maximum Availability Architecture Checks
ovn                   Oracle Virtual Networking
platinum              Platinum certification checks
preinstall            Pre-installation checks
prepatch              Checks to execute before patching
security              Security checks
solaris_cluster       Solaris Cluster Checks
storage               Oracle Storage Server Checks
switch                Infiniband switch checks
sysadmin              Sysadmin checks
user_defined_checks   Run user defined checks from user_defined_checks.xml

Page 42: RAC Troubleshooting and Diagnosability Sangam2016

Profiles

• Profiles provide logical grouping of checks which are about similar topics
• Run only checks in a specific profile
  ./orachk -profile <profile>
• Run everything except checks in a specific profile
  ./orachk -excludeprofile <profile>

Profile               Description
asm                   ASM Checks
bi_middleware         Oracle Business Intelligence checks
clusterware           Oracle clusterware checks
dba                   DBA Checks
ebs                   Oracle E-Business Suite checks
emagent               Cloud control agent checks
emoms                 Cloud Control management server
em                    Cloud control checks
goldengate            Oracle GoldenGate checks
hardware              Hardware specific checks for Oracle Engineered systems
oam                   Oracle Access Manager checks
oim                   Oracle Identity Manager checks
oud                   Oracle Unified Directory server checks
ovn                   Oracle Virtual Networking
peoplesoft            Peoplesoft best practices
preinstall            Pre-installation checks
prepatch              Checks to execute before patching
security              Security checks
siebel                Siebel Checks
solaris_cluster       Solaris Cluster Checks
storage               Oracle Storage Server Checks
switch                Infiniband switch checks
sysadmin              Sysadmin checks
user_defined_checks   Run user defined checks from user_defined_checks.xml
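When ORAchk runs are scripted, it helps to validate the profile name before launching the tool. The Python sketch below builds the argument list for the two invocations shown above. Only the `-profile` and `-excludeprofile` options come from the slides; the helper name, the validation and the `./orachk` default path are assumptions of this sketch, and the profile set is a subset of the table.

```python
# A subset of the ORAchk profile names listed in the table above.
ORACHK_PROFILES = {
    "asm", "clusterware", "dba", "ebs", "goldengate", "hardware",
    "preinstall", "prepatch", "security", "storage", "sysadmin",
}


def orachk_command(profile=None, exclude_profile=None, tool="./orachk"):
    """Build an argument list for a profile-restricted ORAchk run."""
    if profile and exclude_profile:
        raise ValueError("use either profile or exclude_profile, not both")
    chosen = profile or exclude_profile
    if chosen is not None and chosen not in ORACHK_PROFILES:
        raise ValueError(f"unknown profile: {chosen}")
    args = [tool]
    if profile:
        args += ["-profile", profile]
    if exclude_profile:
        args += ["-excludeprofile", exclude_profile]
    return args


if __name__ == "__main__":
    print(" ".join(orachk_command(profile="clusterware")))
    print(" ".join(orachk_command(exclude_profile="storage")))
```

The resulting list can be handed to `subprocess.run` on the target system by whoever owns the Oracle database or grid home.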

Page 43: RAC Troubleshooting and Diagnosability Sangam2016

Enterprise Manager Integration

• Check results integrated into EM compliance framework via plugin
• View results in native EM compliance dashboards
• Related checks grouped into compliance standards
• View targets checked, violations & average score
• Drill down into compliance standard to see individual check results
• View breakdown by target

Page 44: RAC Troubleshooting and Diagnosability Sangam2016

Enterprise Manager Plugin Prerequisites

• Integration is via the Enterprise Manager ORAchk Healthchecks plugin with the following support:

Hardware Types                           Supported By Plugin
Exadata (physical configuration only)    Yes
Exadata (virtual configuration)          No
Recovery appliance                       Yes
Exalogic (physical configuration)        Yes
Exalogic (virtualized configuration)     Yes
Oracle SuperCluster                      No
Oracle Private Cloud Machine             No

• The following prerequisites must be met before you can deploy the plug-in:
  o Verify that your Engineered Systems hardware and software are at the supported level as described in Supported Hardware and Software Versions
  o All Engineered System plug-ins should be deployed
  o InfiniBand switches and storage cells should be an Enterprise Manager-managed target for the respective engineered system
  o Expect package should be installed on the hosts

Page 45: RAC Troubleshooting and Diagnosability Sangam2016

JSON Output to Integrate with Kibana, Elastic Search etc.

Page 46: RAC Troubleshooting and Diagnosability Sangam2016

Oracle Health Check Collection Manager Dashboard

Page 47: RAC Troubleshooting and Diagnosability Sangam2016

Oracle Stack Coverage

• Engineered Systems
  • Oracle Exadata Database Machine
  • Oracle SuperCluster
  • Oracle Private Cloud Appliance
  • Oracle Database Appliance
  • Oracle Big Data Appliance
  • Oracle Exalogic Elastic Cloud
  • Oracle Exalytics In-Memory Machine
  • Oracle Zero Data Loss Recovery Appliance
  • Oracle ZFS Storage Appliance
• ASR
• Systems
  • Oracle Solaris
  • Cross stack checks
  • Solaris Cluster
  • OVN
• Oracle Database
  • Standalone Database
  • Grid Infrastructure & RAC
  • Maximum Availability Architecture (MAA) Scorecard
  • Upgrade Readiness Validation
  • Golden Gate
• Enterprise Manager Cloud Control
  • Repository
  • Agent
  • OMS
• Middleware
  • Application Continuity
  • Oracle Identity and Access Management Suite (Oracle IAM)
• E-Business Suite
  • Oracle Payables
  • Oracle Workflow
  • Oracle Purchasing
  • Oracle Order Management
  • Oracle Process Manufacturing
  • Oracle Receivables
  • Oracle Fixed Assets
  • Oracle HCM
  • Oracle CRM
  • Oracle Project Billing
• Siebel
  • Database best practices
• PeopleSoft
  • Database best practices
• SAP
  • Exadata best practices

Page 48: RAC Troubleshooting and Diagnosability Sangam2016

Cluster Health Monitor (CHM): Generates Diagnostic Metrics View of Cluster and Databases

• Always on - Enabled by default
• Provides Detailed OS Resource Metrics
• Assists Node eviction analysis
• Locally logs all process data
• User can define pinned processes
• Listens to CSS and GIPC events
• Categorizes processes by type
• Supports plug-in collectors (ex. traceroute, netstat, ping, etc.)
• New CSV output for ease of analysis

[Figure: CHM architecture: osysmond daemons collecting OS data on each node, ologgerd (master), and the 12c Grid Infrastructure Management Repository (GIMR)]

Page 49: RAC Troubleshooting and Diagnosability Sangam2016

Cluster Health Monitor (CHM)
Oclumon CLI or Full Integration with EM Cloud Control

Page 50: RAC Troubleshooting and Diagnosability Sangam2016

Why TFA? (Proactive and reactive)

Provides one interface for all diagnostic needs

Collects data across the cluster and consolidates it in one place

Collects all relevant diagnostic data at the time of the problem, with only what is needed to diagnose the problem

Reduces time required to obtain diagnostic data, which saves your business money

Page 51: RAC Troubleshooting and Diagnosability Sangam2016

Supported Platforms and Versions

• All major Operating Systems are supported
  – Linux (OEL, RedHat, SUSE, Itanium & zLinux)
  – Oracle Solaris (SPARC & x86-64)
  – AIX
  – HPUX (Itanium & PA-RISC)
  – Windows
• All Oracle Database & Grid versions 10.2+ are supported
• You probably already have TFA installed as it is included with:
  – Oracle Grid Infrastructure:
    • 11.2.0.4+
    • 12.1.0.2+
    • 12.2.0.1+
  – Oracle Database:
    • 12.2.0.1+
• Also available from Doc 1513912.2

Page 52: RAC Troubleshooting and Diagnosability Sangam2016

Monitoring By TFA & Automated Collections

[Diagram: a significant problem occurs in Oracle Grid Infrastructure & Database(s); TFA responds in four steps and involves the DBA(s)/SysAdmin(s)]

1. Automatically detect event
2. Collect & package relevant diagnostics
3. Notify relevant DBA and/or SysAdmin by email
4. Upload collection to Oracle Support for further help

Page 53: RAC Troubleshooting and Diagnosability Sangam2016

Collect

• Trim & collect all important log files updated in the past 12 hours:

tfactl diagcollect

• Collections stored in the repository directory
• Change diagcollect timeframe with -since <n>h|d
• Collect a problem specific Service Request Data Collection (SRDC):

tfactl diagcollect -srdc ora600

• For list of types of srdc collections use tfactl diagcollect -srdc help
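The collection options above can be assembled programmatically, for example from a runbook script. The following is a minimal sketch in Python, not part of TFA: the helper name `diagcollect_args` is hypothetical, and because the slide does not say whether `-since` and `-srdc` may be combined, the sketch conservatively refuses to combine them.

```python
import re
from typing import List, Optional


def diagcollect_args(since: Optional[str] = None,
                     srdc: Optional[str] = None) -> List[str]:
    """Build an argument list for `tfactl diagcollect` (hypothetical helper).

    Only the two options shown on the slide are modelled.
    """
    if since is not None and srdc is not None:
        # Assumption: the slide shows the options separately, so do not mix them.
        raise ValueError("use either since or srdc, not both")
    args = ["tfactl", "diagcollect"]
    if srdc is not None:
        args += ["-srdc", srdc]
    if since is not None:
        # The slide gives the time frame as <n>h or <n>d.
        if not re.fullmatch(r"[1-9]\d*[hd]", since):
            raise ValueError(f"expected <n>h or <n>d, got {since!r}")
        args += ["-since", since]
    return args


if __name__ == "__main__":
    print(" ".join(diagcollect_args()))
    print(" ".join(diagcollect_args(since="2d")))
    print(" ".join(diagcollect_args(srdc="ora600")))
```

Returning a list rather than a single string keeps the result safe to pass to `subprocess.run` without shell quoting.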

Page 54: RAC Troubleshooting and Diagnosability Sangam2016

TFA dbglevel profiles

• Example
  – tfactl dbglevel -set node_eviction
  – would be used for enhancing diagnostics when node evictions are being investigated, and would perform the following operations internally
    • crsctl set log css "CSSD=4"
    • crsctl set log css "CSSDNMC=4"
    • crsctl set log css "CLSF=4"
    • crsctl set log css "CSSDGMCC=4"
    • crsctl set log css "CSSDGMPC=4"
• To revert to the original or default logging levels, the following command
  – $ tfactl dbglevel -unset node_eviction
• would perform the following operations internally
    • crsctl set log css "CSSD=2"
    • crsctl set log css "CSSDNMC=2"
    • crsctl set log css "CLSF=0"
    • crsctl set log css "CSSDGMCC=2"
    • crsctl set log css "CSSDGMPC=2"
• Setting the logging levels this way provides a degree of automation and simplification
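The idea of a dbglevel profile, one name that expands into several crsctl calls, can be pictured with a small Python sketch. This is only an illustration of the mapping listed on the slide, not TFA's implementation; the names `NODE_EVICTION` and `crsctl_commands` are hypothetical.

```python
from typing import Dict, List, Tuple

# Module levels copied from the slide: (level when set, level when unset).
NODE_EVICTION: Dict[str, Tuple[int, int]] = {
    "CSSD": (4, 2),
    "CSSDNMC": (4, 2),
    "CLSF": (4, 0),
    "CSSDGMCC": (4, 2),
    "CSSDGMPC": (4, 2),
}


def crsctl_commands(profile: Dict[str, Tuple[int, int]],
                    unset: bool = False) -> List[str]:
    """Expand a profile into one crsctl command string per CSS module."""
    index = 1 if unset else 0
    return [
        f'crsctl set log css "{module}={levels[index]}"'
        for module, levels in profile.items()
    ]


if __name__ == "__main__":
    for command in crsctl_commands(NODE_EVICTION):
        print(command)
    for command in crsctl_commands(NODE_EVICTION, unset=True):
        print(command)
```

Keeping the set and unset levels side by side in one table makes it harder to forget a module when reverting.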

Page 55: RAC Troubleshooting and Diagnosability Sangam2016

Log Management

• TFA will be the log management interface for the software stack
  – Rotate logs
  – Archive logs
  – Purge old logs
• Intelligent log management based on understanding of what is in logs and what is still important

[Diagram: TFA Log Management covers all logs across the software stack (Rotate, Archive, Purge); action through prediction or user input]

Page 56: RAC Troubleshooting and Diagnosability Sangam2016

Set Email Notification Addresses

• TFA can send email notification when significant problems are detected
• To set notification email for any problem detected:

tfactl set notificationAddress=[email protected]

• To set notification email for specific ORACLE_HOMEs include the OS owner:

tfactl set notificationAddress=oracle:[email protected]

Page 57: RAC Troubleshooting and Diagnosability Sangam2016

Analyze

• Analyze all important recent log entries:

tfactl analyze -since 1d

• Search recent log entries:

tfactl analyze -search "ora-00600" -since 8h

Page 58: RAC Troubleshooting and Diagnosability Sangam2016

Analyze

• TFA includes all key database support tools
• tfactl provides a single interface to them all

Tool          Details     Description
ORAchk        1268927.2   Oracle Stack Health Checks on non-engineered systems
EXAchk        1070954.1   Oracle Stack Health Checks on Engineered Systems
oswatcher     301137.1    Collect and archive OS metrics, useful for instance/node evictions & performance issues
procwatcher   459694.1    Automate & capture database performance diagnostics & session level hangs
oratop        1500864.1   Near real-time database monitoring
sqlt          215187.1    Capture SQL trace data useful for tuning
alertsummary              Provides summary of events for one or more database or ASM alert files from all nodes
ls                        Lists all files TFA knows about for a given file name pattern across all nodes

Tool      Description
pstack    Generate process stack for specified processes across all nodes
grep      Search alert or trace files with a given database and file name pattern, for a search string
summary   High level summary of the configuration
vi        Open alert or trace files for viewing a given database and file name pattern in the vi editor
tail      Run a tail on alert or trace files for a given database and file name pattern
param     Show all database and OS parameters that match a specified pattern
dbglevel  Set and unset multiple CRS trace levels with one command
history   Show the shell history for the tfactl shell
changes   Report any noted changes in the system setup over a given time period. This includes database parameters, OS parameters, patches applied etc.

Most of these Support tools are only available in the My Oracle Support download; they are not included in the base Grid or Database install

Page 59: RAC Troubleshooting and Diagnosability Sangam2016

Incident Based Collections with SRDC

• Use srdc <incident type>:

tfactl srdc ora4030

Incident Type  Description
ora4030        For ORA-04030 errors
ora4031        For ORA-04031 errors
dbperf         For basic db performance problems
ora600         For ORA-00600 errors
ora700         For ORA-00700 errors
ora7445        For ORA-07445 errors

• To specify sid use -sid <oracle sid>
• To specify database use -db <dbname>
• To specify incident date & time use -inc_date <YYYY-MM-DD> -inc_time <HH:MM:SS>
• To upload directly to the SR use -sr <SR#>

tfactl srdc ora4030 -sid orcl -db RDBMS121 \
    -inc_date 2016-06-15 -inc_time 02:48:23 \
    -sr 3-123456789

• For dbperf use these parameters to specify the good & bad performance periods to compare:

Parameter     Description
perf_base_sd  Start date for a good performance period
perf_base_st  Start time for a good performance period
perf_base_ed  End date for a good performance period
perf_base_et  End time for a good performance period
perf_comp_sd  Start date for a bad performance period
perf_comp_st  Start time for a bad performance period
perf_comp_ed  End date for a bad performance period
perf_comp_et  End time for a bad performance period

tfactl srdc dbperf -db RDBMS121 \
    -perf_base_sd 2016-06-15 -perf_base_st 01:30:00 \
    -perf_base_ed 2016-06-15 -perf_base_et 02:00:00 \
    -perf_comp_sd 2016-06-16 -perf_comp_st 09:30:00 \
    -perf_comp_ed 2016-06-16 -perf_comp_et 10:00:00
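The eight dbperf parameters are easy to mistype, so it can help to derive them from two time windows. Below is a minimal Python sketch under that assumption; the helper names `window_args` and `dbperf_args` are hypothetical, and the check that each window ends after it starts is an added sanity check, not something the slide states TFA enforces.

```python
from datetime import datetime
from typing import List, Tuple

Window = Tuple[datetime, datetime]  # (start, end) of one performance period


def window_args(prefix: str, window: Window) -> List[str]:
    """Emit the _sd/_st/_ed/_et options for one period (hypothetical helper)."""
    start, end = window
    if end <= start:
        raise ValueError(f"{prefix}: end must be after start")
    return [
        f"-{prefix}_sd", start.strftime("%Y-%m-%d"),
        f"-{prefix}_st", start.strftime("%H:%M:%S"),
        f"-{prefix}_ed", end.strftime("%Y-%m-%d"),
        f"-{prefix}_et", end.strftime("%H:%M:%S"),
    ]


def dbperf_args(db: str, good: Window, bad: Window) -> List[str]:
    """Build the `tfactl srdc dbperf` argument list shown on the slide."""
    return (["tfactl", "srdc", "dbperf", "-db", db]
            + window_args("perf_base", good)
            + window_args("perf_comp", bad))


if __name__ == "__main__":
    good = (datetime(2016, 6, 15, 1, 30), datetime(2016, 6, 15, 2, 0))
    bad = (datetime(2016, 6, 16, 9, 30), datetime(2016, 6, 16, 10, 0))
    print(" ".join(dbperf_args("RDBMS121", good, bad)))
```

With the dates used on the slide, the sketch reproduces the same command on a single line.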

Page 60: RAC Troubleshooting and Diagnosability Sangam2016

Cluster Health Advisor (CHA)*
Discovers Potential Cluster & DB Problems - Notifies with Corrective Actions

• Always on - Enabled by default
• Detects node and database performance problems
• Provides early-warning alerts and corrective action
• Supports on-site calibration to improve sensitivity
• Integrated into EMCC Incident Manager and notifications
• Standalone Interactive GUI Tool

[Diagram labels: OS Data, GIMR, ochad, DB Data, CHM, Node Health Prognostics Engine, Database Health Prognostics Engine]

* Requires and Included with RAC or R1N License

Page 61: RAC Troubleshooting and Diagnosability Sangam2016

Oracle 12c Hang Manager
Autonomously Preserves Database Availability and Performance

• Always on - Enabled by default
• Reliably detects database hangs and deadlocks
• Autonomously resolves them
• Supports QoS Performance Classes, Ranks and Policies to maintain SLAs
• Logs all detections and resolutions
• New SQL interface to configure sensitivity (Normal/High) and trace file sizes

[Diagram labels: Session, DIA0, EVALUATE, DETECT, ANALYZE, Hung?, VERIFY, Victim, QoS Policy]

Page 62: RAC Troubleshooting and Diagnosability Sangam2016

Oracle 12c Hang Manager
Full Resolution Dump Trace File and DB Alert Log Audit Reports

Dump file …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
Oracle Database 12c Enterprise Edition Release 12.2.0.0.0 - 64bit Beta
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
Build label: RDBMS_MAIN_LINUX.X64_151013
ORACLE_HOME: …/3775268204/oracle
System name: Linux
Node name: slc05kyr
Release: 2.6.39-400.211.1.el6uek.x86_64
Version: #1 SMP Fri Nov 15 13:39:16 PST 2013
Machine: x86_64
VM name: Xen Version: 3.4 (PVM)
Instance name: hm62
Redo thread mounted by this instance: 2
Oracle process number: 19
Unix process pid: 12656, image: oracle@slc05kyr (DIA0)

*** 2015-10-13T16:47:59.541509+17:00
*** SESSION ID:(96.41299) 2015-10-13T16:47:59.541519+17:00
*** CLIENT ID:() 2015-10-13T16:47:59.541529+17:00
*** SERVICE NAME:(SYS$BACKGROUND) 2015-10-13T16:47:59.541538+17:00
*** MODULE NAME:() 2015-10-13T16:47:59.541547+17:00
*** ACTION NAME:() 2015-10-13T16:47:59.541556+17:00
*** CLIENT DRIVER:() 2015-10-13T16:47:59.541565+17:00

2015-10-13T16:47:59.435039+17:00
Errors in file /oracle/log/diag/rdbms/hm6/hm6/trace/hm6_dia0_12433.trc (incident=7353):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm6/incident/incdir_7353/hm6_dia0_12433_i7353.trc
2015-10-13T16:47:59.506775+17:00
DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2
    due to a GLOBAL, HIGH confidence hang with ID=1.
    Hang Resolution Reason: Automatic hang resolution was performed to free a
    significant number of affected sessions.
DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=1.

In the alert log on the instance local to the session (instance 2 in this case), we see the following:

2015-10-13T16:47:59.538673+17:00
Errors in file …/diag/rdbms/hm6/hm62/trace/hm62_dia0_12656.trc (incident=5753):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
2015-10-13T16:48:04.222661+17:00
DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
    requested by master DIA0 process on instance 1
    Hang Resolution Reason: Automatic hang resolution was performed to free a
    significant number of affected sessions.
    by terminating session sid:40 with serial # 43179 (ospid:13031)
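When auditing hang resolutions across many alert logs, the two DIA0 messages shown above can be pulled out with regular expressions. The following Python sketch is written against the exact wording in this sample only; the patterns and the function name `hang_events` are assumptions for illustration, and message formats may differ in other database releases.

```python
import re
from typing import Dict, Iterable, List

# Patterns follow the wording of the sample alert log entries above.
REQUEST = re.compile(
    r"DIA0 requesting termination of session sid:(?P<sid>\d+) "
    r"with serial # (?P<serial>\d+) \(ospid:(?P<ospid>\d+)\) "
    r"on instance (?P<instance>\d+)"
)
TERMINATE = re.compile(
    r"DIA0 terminating blocker \(ospid: (?P<ospid>\d+) sid: (?P<sid>\d+) "
    r"ser#: (?P<serial>\d+)\) of hang with ID = (?P<hang_id>\d+)"
)


def hang_events(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Return one dict per DIA0 request or termination message found."""
    events: List[Dict[str, object]] = []
    for line in lines:
        for kind, pattern in (("requested", REQUEST), ("terminated", TERMINATE)):
            match = pattern.search(line)
            if match:
                event: Dict[str, object] = {"kind": kind}
                # All captured fields are numeric identifiers.
                event.update({k: int(v) for k, v in match.groupdict().items()})
                events.append(event)
    return events


if __name__ == "__main__":
    sample = [
        "DIA0 requesting termination of session sid:40 with serial # 43179 "
        "(ospid:13031) on instance 2",
        "DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) "
        "of hang with ID = 1",
    ]
    for event in hang_events(sample):
        print(event)
```

Matching the request on one instance with the termination on another by sid, serial number and ospid gives a simple cross-instance audit trail.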

Page 63: RAC Troubleshooting and Diagnosability Sangam2016

Oracle 12c Domain Services Cluster (DSC)
Deploys with Minimum Footprint and Maximum Manageability

• Hosts Framework as Services
• Reduces local resource footprint
• Centralizes management
• Speeds deployment and patching
• Optional Shared Storage
• Supports multiple versions and platforms going forward

[Diagram: Oracle Cluster Domain, showing Application Member Clusters and Database Member Clusters around the Oracle Domain Services Cluster, with these services listed: Management Repository Service, Trace File Analyzer Receiver, ORAchk Collection Service, Grid Names Service, Storage Services, Rapid Home Provisioning Service]

Page 64: RAC Troubleshooting and Diagnosability Sangam2016

Oracle Domain Services Cluster

[Diagram: Oracle Cluster Domain. Oracle Domain Services Cluster services: Mgmt Repository (GIMR) Service, ASM Service, IO Service, ACFS Services, and Additional Optional Services: Rapid Home Provisioning (RHP) Service. Member clusters: Database Member Cluster (uses local ASM), Database Member Cluster (uses ASM Service), Database Member Cluster (uses IO & ASM Service of DSC), Application Member Cluster (GI only). Other labels: Shared ASM, Private Network, SAN, NAS]

Page 65: RAC Troubleshooting and Diagnosability Sangam2016
