MOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, Pythian
Posted on 29-Jan-2018
TRANSCRIPT
Under the Hood of Oracle Clusterware
Miracle OpenWorld 2010
15-Apr-2010
Alex Gorbachev, The Pythian Group
© 2009/2010 Pythian
Alex Gorbachev
• CTO, The Pythian Group
• Blogger
• OakTable Network member
• Oracle ACE Director
• BattleAgainstAnyGuess.com
• Vice-president, Oracle RAC SIG
Why Companies Trust Pythian
• Recognized Leader:
  • Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server
  • Works with over 150 multinational companies such as Forbes.com, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments
• Expertise:
  • One of the world’s largest concentrations of dedicated, full-time DBA expertise
• Global Reach & Scalability:
  • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
Agenda
• Place of Clusterware in Oracle RAC
• Node membership and evictions
• Clusterware startup sequence
• Oracle Cluster Registry
• Resource management and troubleshooting
• 11gR2 Grid Infrastructure
[Figure: how much you need to memorize, from high to low, plotted against depth of understanding, from shallow to in-depth]
The more you understand,
the less you need to memorize
Architecture
[Figure: three cluster nodes, each running OS, Clusterware, ASM, Instance, VIP/Listener and Service; the nodes are linked by the interconnect and have storage access to shared storage holding the OCR and the voting disk]
Clusterware on top of the OS:
• CSSD - Cluster Synchronization Services
• CRSD - Cluster Ready Services
• RACG (VIP) - HA Framework scripts
• EVMD - Event Manager
• OPROCD - Oracle Process Monitor
[Figure: two cluster nodes, each running CSSD, CRSD, EVMD, OPROCD and RACG/VIP, connected by the interconnect]
[Figure: the same two nodes, now also sharing a voting disk]
ShootTheOtherNodeInTheHead (STONITH)
[Figure: the two-node diagram again, with one node's CSSD singled out and that node then removed from the cluster]
AskTheOtherNodeToRebootItself (c) known quote
[Figure: the node being evicted, with OCLSOMON added next to CSSD]
[Figure: CSSD and OPROCD on each node; node membership comes down to the two CSSD daemons talking over the interconnect and through the voting disk]
Evictions
• Network heartbeat lost
• Voting disk access lost
• CSSD is not healthy
• OS is not healthy
  • OPROCD - Unix, Windows, Linux 11g (and 10.2.0.4+)
  • hangcheck-timer - 10g Linux
DEMO: NHB (network heartbeat) failure
• Simulate with “ifconfig eth1 down”
• Both nodes notice the loss
• They race to evict each other from the voting disk => 2 equal sub-clusters
  • the sub-cluster with the lowest leader number survives
  • the leader is the node with the lowest number in its sub-cluster
• The winner evicts the other node
  • by setting a kill block in the voting disk
  • CSSD and OCLSOMON on the evicted node race to commit suicide
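The race in this demo can be modeled as a pure function over sub-clusters. A minimal Python sketch, assuming the sub-clusters are already known after the heartbeat split; it also assumes (beyond what this slide states) that a larger sub-cluster beats a smaller one, with the lowest-leader rule deciding ties as in the demo. The function names are hypothetical, and voting disk I/O and kill blocks are not modeled.

```python
from typing import List, Set


def leader(sub_cluster: Set[int]) -> int:
    """The leader of a sub-cluster is its lowest node number."""
    return min(sub_cluster)


def surviving_sub_cluster(sub_clusters: List[Set[int]]) -> Set[int]:
    """Pick the sub-cluster that survives a network heartbeat split.

    Assumption: more members wins; on a tie, the sub-cluster whose
    leader has the lowest node number wins (the case in the demo).
    """
    return min(sub_clusters, key=lambda sc: (-len(sc), leader(sc)))


def nodes_to_evict(sub_clusters: List[Set[int]]) -> Set[int]:
    """Every node outside the winning sub-cluster is evicted
    (in CSS, by setting a kill block in the voting disk)."""
    winner = surviving_sub_cluster(sub_clusters)
    return set().union(*sub_clusters) - winner


if __name__ == "__main__":
    # Two-node cluster, interconnect down: {1} and {2} are equal sub-clusters.
    print(surviving_sub_cluster([{1}, {2}]))
    print(nodes_to_evict([{1}, {2}]))
```

With “ifconfig eth1 down” on a two-node cluster each node is alone in its sub-cluster, so node 1 (the lower number) survives and node 2 is evicted.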
NHB failure symptoms
• NHB failure reported on several nodes
  • ocssd.log
• The evicted node may contain other traces
  • maybe: syslog (Linux: /var/log/messages)
  • maybe: oclsomon.log
  • almost always: the console
• The network is only a *possible* root cause
  • check syslog, ifconfig, netstat
  • network engineering: switch logs
DEMO: CSSD is not healthy
• Simulate using kill -STOP <ocssd.bin pid>
• Another node observes the NHB loss
  • after misscount seconds => attempts eviction
  • but CSSD is frozen and can’t commit suicide
• OCLSOMON detects the CSSD timeout
  • commits suicide
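What OCLSOMON does in this demo is an instance of the watchdog pattern: a second process watches a heartbeat from CSSD and acts when it goes stale. A minimal Python sketch of that idea only, assuming a timestamp-based heartbeat; the class, the timeout value and the action are hypothetical and do not describe how OCLSOMON is actually implemented.

```python
import time
from typing import Callable


class Watchdog:
    """Conceptual monitor in the spirit of OCLSOMON (not Oracle code)."""

    def __init__(self, timeout_s: float, on_timeout: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s      # hypothetical staleness limit
        self.on_timeout = on_timeout    # e.g. "commit suicide" (reboot the node)
        self.clock = clock              # injectable so the logic can be tested
        self.last_beat = clock()

    def beat(self) -> None:
        """Called by the monitored daemon while it is healthy."""
        self.last_beat = self.clock()

    def poll(self) -> bool:
        """Fire the action and return True if the heartbeat is stale."""
        if self.clock() - self.last_beat > self.timeout_s:
            self.on_timeout()
            return True
        return False
```

A process frozen with kill -STOP never calls beat(), so the next poll() after the timeout fires the action, which is why the frozen CSSD is still removed even though it cannot commit suicide itself.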
OCSSD sick - symptoms
• Errors in oclsomon.log
• ocssd.log might be clean on the evicted node
• syslog might contain OCLSOMON diagnostic errors
• The console often contains diagnostic errors
  • depending on syslogd settings
• Set diagwait to more than 3 for better diagnosability
  • 3 seconds is the reboottime
  • a larger diagwait increases the risk of corruption
DEMO: host sick - CPU stalled
• Simulate by pausing OPROCD
  • kill -STOP <oprocd pid>
  • sleep 1 or 2
  • kill -CONT <oprocd pid>
• oprocd.log
  • usually nothing if the node is reset
• Immediate reboot
  • the console might contain a diagnostic message
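Pausing OPROCD for a second or two causes a reboot because of how it detects a stalled host: it is commonly described as sleeping for a fixed interval and, on waking, checking whether it woke later than the interval plus a margin. The Python sketch below shows only that check. The 1000 ms interval and 500 ms margin are the commonly cited defaults and are assumptions here; the function names are hypothetical.

```python
import time


def stalled(interval_ms: float, elapsed_ms: float, margin_ms: float) -> bool:
    """True if a sleep of interval_ms took longer than interval + margin."""
    return elapsed_ms > interval_ms + margin_ms


def oprocd_like_loop(cycles: int, interval_ms: float = 1000.0,
                     margin_ms: float = 500.0) -> bool:
    """Sleep-and-check loop; return True as soon as a stall is seen.

    A real fencing daemon would reset the node at that point instead
    of returning.
    """
    for _ in range(cycles):
        start = time.monotonic()
        time.sleep(interval_ms / 1000.0)
        elapsed_ms = (time.monotonic() - start) * 1000.0
        if stalled(interval_ms, elapsed_ms, margin_ms):
            return True
    return False
```

After kill -STOP for 2 seconds and kill -CONT, the measured sleep is about 2000 ms or more, above the 1000 + 500 ms budget, so the check trips immediately.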
Killed by OPROCD - symptoms
• Hard to confirm (nothing in oprocd.log)
• Console output often helps
  • “SysRq: resetting” could be in syslog as well
• Root cause
  • faulty hardware or drivers, stalls caused by IO/network
  • kernel bugs, NTP bugs
  • investigate syslog messages
• The margin can be tuned
  • diagwait and reboottime CSSD parameters
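The margin tuning above is simple arithmetic. A minimal sketch, assuming the commonly described relation (not stated on the slide) that once diagwait is set, the OPROCD margin becomes diagwait minus reboottime; the function name is hypothetical and 13 is only an example value.

```python
def oprocd_margin_s(diagwait_s: int, reboottime_s: int = 3) -> int:
    """OPROCD margin in seconds, assuming margin = diagwait - reboottime.

    reboottime defaults to 3 seconds, the value given for reboottime
    on the "OCSSD sick" slide.
    """
    if diagwait_s <= reboottime_s:
        raise ValueError("diagwait must be greater than reboottime")
    return diagwait_s - reboottime_s


if __name__ == "__main__":
    print(oprocd_margin_s(13))
```

A larger margin leaves more time for the console and syslog to capture diagnostics before the reset, at the cost of a higher corruption risk.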
10g on Linux - hangcheck-timer
• Replaced by OPROCD in 11g and 10.2.0.4+
• Most of the time useless and inactive!
• Metalink Note 726833.1
  • updated 21-JUL-08!
  • Oracle suggests keeping both
  • I would only leave OPROCD
• Metalink Note 567730.1
  • OPROCD in 10.2.0.4
Killed by hangcheck-timer
• Rarely can be confirmed
  • “Hangcheck: hangcheck is restarting the machine”
• Can set hangcheck_dump_tasks to dump state
  • see source code...
Clusterware startup
• Linux & UNIX inittab
  • init.cssd
  • init.evmd
  • init.crsd
• Linux & UNIX init.d
  • init.crs
• Windows Services
Daemons startup sequence
[Figure: CSSD, then EVMD, then CRSD; third-party clusterware is also shown]
• Triggered
  • by init.crs from the init.d sequence
  • manually
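The startup sequence is a dependency graph, so a topological sort reproduces it. A minimal Python sketch using the standard library graphlib (Python 3.9+); the edges (EVMD after CSSD, CRSD after both) are read off the diagram and are an assumption, and third-party clusterware is left out.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Prerequisites read off the slide (an assumption): EVMD after CSSD,
# CRSD after both CSSD and EVMD.
STARTUP_DEPS: Dict[str, Set[str]] = {
    "cssd": set(),
    "evmd": {"cssd"},
    "crsd": {"cssd", "evmd"},
}


def startup_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every daemon starts after its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(startup_order(STARTUP_DEPS))
```

A cycle in the graph makes static_order() raise graphlib.CycleError, which is a cheap sanity check if the graph is extended later.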
Startup in Linux & Unix
[gorby@dime ~]$ ps -fe | grep 'init\.' | grep -v grep
root 6352 1 0 10:24 ... /bin/sh /etc/init.d/init.evmd run
root 6353 1 0 10:24 ... /bin/sh /etc/init.d/init.cssd fatal
root 6354 1 0 10:24 ... /bin/sh /etc/init.d/init.crsd run
root 7356 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd oprocd
root 7364 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd oclsomon
root 7383 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd daemon
[gorby@dime ~]$ tail -3 /etc/inittab
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
[gorby@dime ~]$ ls -l /etc/rc3.d/S96init.crs
lrwxrwxrwx 1 root root 20 Aug 1 23:51 /etc/rc3.d/S96init.crs -> /etc/init.d/init.crs
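Each inittab entry above follows the standard id:runlevels:action:process layout, and the respawn action is what makes init restart the init.* scripts whenever they exit. A minimal Python parser for the three lines from the transcript; the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class InittabEntry(NamedTuple):
    ident: str       # unique id, e.g. h1
    runlevels: str   # "35" means run levels 3 and 5
    action: str      # "respawn": init restarts the process when it exits
    process: str     # command line, which may itself contain colons


def parse_inittab(text: str) -> List[InittabEntry]:
    """Parse id:runlevels:action:process lines, skipping blanks and comments.

    Lines with fewer than three colons raise ValueError.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        ident, runlevels, action, process = line.split(":", 3)
        entries.append(InittabEntry(ident, runlevels, action, process))
    return entries


SAMPLE = """\
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
"""

if __name__ == "__main__":
    for entry in parse_inittab(SAMPLE):
        print(entry.ident, entry.action, entry.process.split()[:2])
```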
Startup flow
[Figure: startup timeline]
• init starts init.crsd run, init.evmd run and init.cssd fatal
• flag file: /etc/oracle/scls_scr/{host}/root/cssrun
• init.crs start (init.d sequence) calls init.cssd autostart
• /etc/oracle/scls_scr/{host}/root/crsstart
  • enable
  • disable
• init.cssd fatal spawns init.cssd oprocd, init.cssd oclsomon, init.cssd daemon and init.cssd oclsvmon
  • these run oprocd, oclsomon.bin, ocssd.bin and oclsvmon.bin
• init.evmd run starts evmd.bin; init.crsd run starts crsd.bin
DEMO: Startup troubleshooting
• Check processes using “ps -fe | grep init”
• Check syslog (/var/log/messages)
  • can point to /tmp/crsctl.#####
• Remember the boot sequence
• Clusterware log files
  • if the *.bin processes are running already
• crsctl
  • crsctl check crs/cssd/crsd/evmd
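The process check can be scripted. A minimal Python sketch that takes "ps -fe" output as text and lists the Clusterware daemons that are not running. The daemon names come from the startup flow diagram (oprocd, oclsomon.bin, ocssd.bin, evmd.bin, crsd.bin), matching on the executable's basename is an assumption that may need adjusting per platform, and the helper names are hypothetical.

```python
import os
import subprocess
from typing import Iterable, List

# Daemons from the startup flow diagram (assumed names; adjust per platform).
DAEMONS = ["ocssd.bin", "oclsomon.bin", "oprocd", "evmd.bin", "crsd.bin"]


def command_names(ps_output: str) -> List[str]:
    """Basename of the executable of each process in "ps -fe" output."""
    names = []
    for line in ps_output.splitlines():
        parts = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(parts) < 8 or parts[0] == "UID":
            continue  # skip the header and malformed lines
        names.append(os.path.basename(parts[7].split()[0]))
    return names


def missing_daemons(ps_output: str, daemons: Iterable[str] = DAEMONS) -> List[str]:
    """Daemons that do not appear as an executable in the process list.

    Only the executable is matched, so "/bin/sh /etc/init.d/init.cssd oprocd"
    (the wrapper script) does not count as a running oprocd.
    """
    running = set(command_names(ps_output))
    return [d for d in daemons if d not in running]


def live_ps() -> str:
    """Capture "ps -fe" on the local host (Linux/UNIX only)."""
    return subprocess.run(["ps", "-fe"], capture_output=True, text=True,
                          check=True).stdout
```

On a node, an empty list from missing_daemons(live_ps()) means all five are up; any name it returns is where to start reading the log files.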
Log files
• $ORA_CRS_HOME/log/{host}/cssd/ocssd.log
• $ORA_CRS_HOME/log/{host}/cssd/oclsomon/oclsomon.log
  • oclsomon.ba1, oclsomon.ba2, ...
• /etc/oracle/oprocd/{host}.oprocd.log
  • {host}.oprocd.log.{timestamp}
• syslog
  • Linux: /var/log/messages
  • Solaris: /var/adm/messages
• Console logs
Windows world
• OPROCD = OraFenceService
• EVMD = OracleEVMService
• CRSD = OracleCRService
• CSSD = OracleCSService
• OPMD
  • Oracle Process Manager Daemon
  • start trigger, like init.crs in *nix
  • registered with the Windows Service Control Manager (WSCM); delays the start by 60 seconds
EVMD
• Passes clusterware events
• Usually not a problem
• Verify
  • evmwatch -A
  • evmpost -u "my message"
• CRSD manages cluster resources
  • stop / start
  • failover
  • VIP management
  • new resources, etc.
• RACG helper scripts
CRSD startup
• Starts after CSSD and EVMD
• Re-spawned on failure
  • no eviction
• Runs as root
  • VIP control
  • OCR management
  • root ulimits are in place!
• Can run resources owned by any user
  • the owner is a property of the resource
Oracle Cluster Registry
• Repository for all configuration data
  • except the OCR location itself
• OCR is accessed mostly read-only
  • every component reads OCR
• OCR is written only by CRS
  • only from a single OCR master node
### crsd.log ###
2008-08-02 22:23:50.958: [ OCRMAS] [3065154448]th_master:13:I AM THE NEW OCR MASTER at incar 12. Node Number 1
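The master-change line above is easy to search for. A minimal Python sketch that extracts every such event from crsd.log text; the regular expression is written against this single sample line only and is an assumption for other versions, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the crsd.log line shown above; other releases may word it differently.
OCR_MASTER_RE = re.compile(
    r"I AM THE NEW OCR MASTER at incar (\d+)\. Node Number (\d+)")


def ocr_master_events(log_text: str) -> List[Tuple[str, int, int]]:
    """Return (timestamp, incarnation, node number) per OCR master change."""
    events = []
    for line in log_text.splitlines():
        m = OCR_MASTER_RE.search(line)
        if m:
            # The timestamp is everything before the first ": " separator.
            timestamp = line.split(": ", 1)[0]
            events.append((timestamp, int(m.group(1)), int(m.group(2))))
    return events
```

The line is written by the node that takes over the role, so scanning crsd.log on every node and taking the newest event is one way to tell which node is the current OCR master.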
CRS resources
• Standard Oracle resources
  • ASM
  • Listener
  • VIP
  • Database and Instance
  • etc.
• srvctl => manages Oracle resources
• Custom user resources
  • crs_* => manages any resources
CRS resource internals
• Unique name
• Associated action script
  • stop / start / check functions
• Other attributes
  • check frequency
  • pre-requisites
  • restart retries
  • etc.
• All info stored in OCR
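The pieces listed above (action script with start/stop/check, check frequency, restart retries) fit together as a simple monitoring cycle. A conceptual Python model, not CRS code: the class, its fields and the give-up behavior are hypothetical, chosen only to show how a failed check consumes restart retries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Resource:
    """Conceptual model of a CRS resource (hypothetical, not CRS code)."""
    name: str
    start: Callable[[], bool]   # action script "start" entry point
    stop: Callable[[], bool]    # action script "stop" entry point
    check: Callable[[], bool]   # action script "check" entry point
    restart_attempts: int = 1   # in the spirit of the restart-retries attribute
    restart_count: int = 0
    state: str = "OFFLINE"

    def monitor_once(self) -> str:
        """One check cycle, as if run every check interval.

        A failed check triggers a local restart until the retries are
        used up; after that the resource is stopped and left OFFLINE
        (a real cluster manager might fail it over instead).
        """
        if self.check():
            self.state = "ONLINE"
        elif self.restart_count < self.restart_attempts:
            self.restart_count += 1
            self.state = "ONLINE" if self.start() else "OFFLINE"
        else:
            self.stop()
            self.state = "OFFLINE"
        return self.state
```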
DEMO: Resource profiles
• Use crs_stat [-t] to check status
• Use crs_stat -p to check attributes
• crs_* vs srvctl (like srvctl config ... -a)
• Standard action scripts
  • racgimon
  • racgwrap / racgmain
  • racgvip
  • racgons
  • usrvip
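In 10g, crs_stat -p prints each resource profile as KEY=value lines with a blank line between resources. A minimal Python sketch that turns that text into one dictionary per resource; the parser is hypothetical, and any attribute names used with it (NAME, TYPE, ACTION_SCRIPT, CHECK_INTERVAL, RESTART_ATTEMPTS) should be checked against real output from your release.

```python
from typing import Dict


def parse_profiles(text: str) -> Dict[str, Dict[str, str]]:
    """Parse KEY=value blocks, one per resource, separated by blank lines.

    Returns a dict keyed by each block's NAME attribute; blocks without a
    NAME line are ignored.
    """
    profiles: Dict[str, Dict[str, str]] = {}
    current: Dict[str, str] = {}
    for raw in text.splitlines() + [""]:   # trailing "" flushes the last block
        line = raw.strip()
        if not line:
            if "NAME" in current:
                profiles[current["NAME"]] = current
            current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:                             # ignore lines without "="
            current[key] = value
    return profiles
```

Feeding it the captured output of crs_stat -p gives quick lookups such as the action script or check interval of a given resource.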
DEMO: OCR internals
• ocrcheck
• ocrconfig
  • used during install/upgrade
  • backup OCR
  • recover OCR
• ocrdump
  • txt or xml
DEMO: racgvip case study
• Check the script
• Set env. vars and simulate the call
• Use _USR_ORA_DEBUG=1 in the script
Resources hierarchy
• 10.2.0.2 (?) released the dependency of ASM and Instance on VIP
• If the DB was registered manually with srvctl
  • the ASM dependency is missing
[Figure: resource hierarchy with DB, Instance, Service, CS (Collective Service), ASM and Nodeapps (GSD, ONS, VIP, Listener); part of the diagram is marked "Only 10.1 and 10.2.0.1"]
Resources and Oracle homes
[Figure: the same resource hierarchy mapped onto the DB home, ASM home and CRS home]
• Listener can be in the ASM home
• The ASM home can be the Oracle home
• Logs are in the appropriate home
DEMO: troubleshooting resources
• {home}/log/{host}/racg/{resource_name}.log
• Old way - edit racgwrap
  • uncomment _USR_ORA_DEBUG=1
• crsctl debug log res '{res_name}:{0|1}'
  • crs_stat -p | grep DEBUG
• Run “srvctl start ...” manually
  • with SRVM_TRACE=TRUE set in the environment
Troubleshooting summary
• crsctl check crs | crsd | cssd | evmd
• crs_stat [-t]
• crs_stat -p [{res_name}]
• crsctl debug log css | crs | evm | res
• crsctl lsmodules css | crs | evm
• crs_stop {res_name} [-f] (-f forces the stop)
• ocrdump
• See scripts
Troubleshooting flow
• Is Clusterware up?
• Are Oracle resources up?
  • Listener & VIP
  • Database & ASM instance
  • Services
• Did any nodes get rebooted?
• Did any resources restart?
  • $ORA_CRS_HOME/log/{host}/crs/crsd.log
  • $ORA_CRS_HOME/log/{host}/alert{host}.log
• MOS Note 265769.1 “Troubleshooting 10g and 11.1 Clusterware Reboots”
Enter the 11gR2 World - Grid Infrastructure
[Figure from the Oracle Clusterware Administration and Deployment Guide]
[Figure from My Oracle Support Note 1053147.1]
11g Grid Infrastructure Documentation
• Oracle Clusterware Administration and Deployment Guide
• MOS Note 1053147.1: 11gR2 Clusterware and Grid Home - What You Need to Know
• MOS Note 1050908.1: How to Troubleshoot Grid Infrastructure Startup Issues
• MOS Note 1053970.1: Troubleshooting 11.2 Grid Infrastructure Installation Root.sh Issues
• MOS Note 1050693.1: Troubleshooting 11.2 Clusterware Node Evictions (Reboots)
11gR2 Node Evictions
• Same as in 10g, plus member kill escalation
  • The LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this times out, it could escalate to a node kill.
• Processes that can evict a node
  • CSSD
  • CSSDAGENT
  • CSSDMONITOR
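Member kill escalation is a timeout-driven decision: either the instance is confirmed gone in time, or CSS escalates to killing the node. A minimal Python sketch of that decision only, assuming a single timeout; the function name and the timeout parameter are hypothetical, since the slide gives neither the actual limit nor how it is configured.

```python
from typing import Optional


def member_kill_outcome(confirmed_after_s: Optional[float],
                        timeout_s: float) -> str:
    """Outcome of an instance eviction request (conceptual, not Oracle code).

    confirmed_after_s: seconds until the target instance was confirmed
    gone, or None if it never went away.
    timeout_s: hypothetical limit before escalation.
    """
    if confirmed_after_s is not None and confirmed_after_s <= timeout_s:
        return "instance evicted"
    return "escalate to node kill"
```

A hung instance that never confirms (None) or confirms too late ends in a node kill, which is why an 11gR2 reboot can originate from an instance-level problem.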
Questions?
Thank you!
gorbachev@pythian.com
http://www.pythian.com/