MOW2010: Under the Hood of Oracle Clusterware, by Alex Gorbachev, Pythian

Post on 29-Jan-2018

Under the Hood of Oracle Clusterware

Miracle OpenWorld 2010

15-Apr-2010

Alex Gorbachev, The Pythian Group

© 2009/2010 Pythian

Alex Gorbachev

• CTO, The Pythian Group
• Blogger
• OakTable Network member
• Oracle ACE Director
• BattleAgainstAnyGuess.com
• Vice-president, Oracle RAC SIG

Why Companies Trust Pythian

• Recognized Leader:
• Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server
• Work with over 150 multinational companies such as Forbes.com, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments
• Expertise:
• One of the world’s largest concentrations of dedicated, full-time DBA expertise.
• Global Reach & Scalability:
• 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response

Agenda

• Place of Clusterware in Oracle RAC

• Node membership and evictions

• Clusterware startup sequence

• Oracle Cluster Registry

• Resources Management and troubleshooting

• 11gR2 Grid Infrastructure

[Chart: "Need to memorize" (Low to High) plotted against "Understanding" (Shallow to In-depth)]

The more you understand, the less you need to memorize

Architecture

[Diagram: three nodes, each running OS, Clusterware, ASM, Instance, Listener, VIP and Service. The nodes are linked by the interconnect and have storage access to the shared storage, which holds the OCR and the voting disk.]

[Diagram, built up step by step: Clusterware on top of the OS, with these components]

• CSSD: Cluster Synchronization Services
• CRSD: Cluster Ready Services
• RACG (VIP): HA Framework scripts
• EVMD: Event Manager
• OPROCD: Oracle Process Monitor

[Diagram sequence: two nodes, each running CSSD, CRSD, EVMD, OPROCD, RACG and VIP on top of Clusterware and the OS, connected by the interconnect and sharing a voting disk. Individual frames single out CSSD, OCLSOMON and OPROCD; the last frame shows only the two CSSD daemons and the interconnect. Captions on individual frames: "ShootTheOtherNodeInTheHead" and "AskTheOtherNodeToRebootItself (c) known quote".]

Evictions

• Network heartbeat lost
• Voting disk access lost
• CSSD is not healthy
• OS is not healthy
• OPROCD - Unix, Windows, 11g Linux
• hangcheck-timer - 10g Linux

DEMO: NHB failure

• Simulate with “ifconfig eth1 down”
• Both nodes notice the loss
• Racing to evict each other
• from voting disk => 2 equal sub-clusters
• the sub-cluster with the lowest leader # survives
• leader is the node with lowest # in sub-cluster
• Winner evicts the other node
• Setting kill-block in voting disk
• CSSD and OCLSOMON race to suicide
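The tie-break described on the NHB failure slide can be sketched in shell. This is a toy model only: `lowest_node` and `pick_survivor` are hypothetical helper names, the node numbers are made up, and the two sub-clusters are assumed to be the same size, as in the two-node demo. It is not a description of how CSSD is implemented.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Oracle Clusterware.

# lowest_node "3 1 2" prints the smallest node number in the list.
lowest_node() {
    # $1 is left unquoted on purpose so each number lands on its own line.
    printf '%s\n' $1 | sort -n | head -n 1
}

# pick_survivor "<sub-cluster A>" "<sub-cluster B>"
# Each argument is a space-separated list of node numbers.
# Assumes both sub-clusters have the same number of nodes.
pick_survivor() {
    leader_a=$(lowest_node "$1")
    leader_b=$(lowest_node "$2")
    # The sub-cluster whose leader has the lower node number survives.
    if [ "$leader_a" -lt "$leader_b" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

pick_survivor "2 4" "3 1"   # prints "3 1" (leader 1 beats leader 2)
```

In the two-node demo each sub-cluster is a single node, so the node with the lower number is the one expected to survive.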

NHB failure symptoms

• NHB failure on several nodes
• ocssd.log
• Evicted node can contain other traces
• maybe - syslog (Linux - /var/log/messages)
• maybe - oclsomon.log
• almost always - console
• Network is only a *possible* root cause
• check syslog, ifconfig, netstat
• Network engineering - switch logs

DEMO: CSSD is not healthy

• Simulate using kill -STOP <cssd.bin pid>
• Another node observes NHB loss
• After misscount seconds => attempt eviction
• but CSSD is frozen and can’t commit suicide
• OCLSOMON detects CSSD timeout
• Commits suicide

OCSSD sick - symptoms

• Error in OCLSOMON.log
• OCSSD log might be clean on evicted node
• syslog might contain OCLSOMON diag. err.
• Console often contains diag. err.
• Depending on syslogd settings
• Set diagwait to more than 3 for better diagnosability
• 3 seconds is reboottime
• Increases risk of corruption

DEMO: host sick - CPU stalled

• Simulate by pausing OPROCD
• kill -STOP <oprocd pid>
• sleep 1 or 2
• kill -CONT <oprocd pid>
• oprocd.log
• Usually nothing if node is reset
• Immediate reboot
• Console might contain diag msg

Killed by OPROCD - symptoms

• Hard to confirm (nothing in oprocd.log)
• Console output often helps
• “SysRq: resetting” could be in syslog as well
• Root cause
• Faulty hardware, drivers, caused by IO/network
• Kernel bugs, NTP bugs
• Investigate syslog messages
• Margin can be tuned
• diagwait and reboottime CSSD parameters

10g on Linux - hangcheck-timer

• Replaced by OPROCD in 11g and 10.2.0.4+
• Most of the time useless and inactive!
• Metalink Note 726833.1
• Updated 21-JUL-08!
• Oracle suggests keeping both
• I would only leave OPROCD
• Metalink Note 567730.1
• OPROCD in 10.2.0.4

Killed by hangcheck-timer

• Rarely can be confirmed
• “Hangcheck: hangcheck is restarting the machine”
• Can set hangcheck_dump_tasks to dump state
• See source code...

Clusterware startup

• Linux & UNIX inittab
• init.cssd
• init.evmd
• init.crsd
• Linux & UNIX init.d
• init.crs
• Windows Services

Daemons startup sequence

[Diagram: CSSD, EVMD, CRSD, Third-party clusterware]

• Triggered
• by init.crs from init.d sequence
• manually

Startup in Linux & Unix

[gorby@dime ~]$ ps -fe | grep 'init\.' | grep -v grep

root 6352 1 0 10:24 ... /bin/sh /etc/init.d/init.evmd run

root 6353 1 0 10:24 ... /bin/sh /etc/init.d/init.cssd fatal

root 6354 1 0 10:24 ... /bin/sh /etc/init.d/init.crsd run

root 7356 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd oprocd

root 7364 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd oclsomon

root 7383 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd daemon

[gorby@dime ~]$ tail -3 /etc/inittab

h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null

h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null

h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null

[gorby@dime ~]$ ls -l /etc/rc3.d/S96init.crs

lrwxrwxrwx 1 root root 20 Aug 1 23:51 /etc/rc3.d/S96init.crs -> /etc/init.d/init.crs
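The three inittab entries above follow the usual `id:runlevels:action:process` layout, so they can be pulled apart with a short awk filter. A minimal sketch, assuming that layout; `clusterware_inittab_entries` is a hypothetical helper name and the `id:3:initdefault:` line is only there as a non-matching sample.

```shell
#!/bin/sh
# Hypothetical helper: read inittab-format text on stdin and print the
# Clusterware entries (init.cssd, init.evmd, init.crsd) field by field.
clusterware_inittab_entries() {
    awk -F: '$4 ~ /init\.(cssd|evmd|crsd)/ {
        # Field 4 is the command line: script path, then its argument.
        split($4, cmd, " ")
        printf "%s runlevels=%s action=%s script=%s arg=%s\n", $1, $2, $3, cmd[1], cmd[2]
    }'
}

clusterware_inittab_entries <<'EOF'
id:3:initdefault:
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
EOF
```

On a real node the same function could be fed with `clusterware_inittab_entries < /etc/inittab`. The `respawn` action is what makes init restart a wrapper script if it exits.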


Startup flow

[Timeline diagram (axis t), built up step by step]

• init.crsd run, init.evmd run, init.cssd fatal
• /etc/oracle/scls_scr/{host}/root/cssrun
• init.crs start and init.cssd autostart
• /etc/oracle/scls_scr/{host}/root/crsstart: enable or disable
• init.cssd oprocd, init.cssd oclsomon, init.cssd daemon, init.cssd oclsvmon
• oprocd, oclsomon.bin, ocssd.bin, oclsvmon.bin, evmd.bin, crsd.bin

DEMO: Startup troubleshooting

• Check processes using “ps -fe | grep init”
• Check syslog (/var/log/messages)
• Can point to /tmp/crsctl.#####
• Remember boot sequence
• Clusterware log files
• if *.bin processes are running already
• crsctl
• crsctl check crs/cssd/crsd/evmd
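The first check, looking at the init wrapper processes, can be scripted. The sketch below is only an illustration: `missing_init_scripts` is a hypothetical helper, and the list of expected wrappers is an assumption taken from the `ps` listing shown earlier in the deck, so it may differ by version and platform.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -fe` output on stdin and report which of
# the expected init wrapper scripts do not appear in it.
missing_init_scripts() {
    ps_output=$(cat)
    for wanted in "init.evmd run" "init.cssd fatal" "init.crsd run" "init.cssd daemon"; do
        # grep -F matches the text literally; -q only sets the exit status.
        if ! printf '%s\n' "$ps_output" | grep -qF "/etc/init.d/$wanted"; then
            printf 'missing: %s\n' "$wanted"
        fi
    done
}

# Sample input in which "init.cssd daemon" has not been started yet.
missing_init_scripts <<'EOF'
root 6352 1 0 10:24 ... /bin/sh /etc/init.d/init.evmd run
root 6353 1 0 10:24 ... /bin/sh /etc/init.d/init.cssd fatal
root 6354 1 0 10:24 ... /bin/sh /etc/init.d/init.crsd run
EOF
```

On a live node the input would come from `ps -fe | missing_init_scripts`. An empty result only says the wrappers are running; the *.bin daemons still need to be checked with crsctl.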

Log files

• log/{host}/cssd/ocssd.log
• log/{host}/cssd/oclsomon/ocslmon.log
• ocslmon.ba1, ocslmon.ba2,...
• /etc/oracle/oprocd/{host}.oprocd.log
• {host}.oprocd.log.{timestamp}
• syslog
• Linux /var/log/messages
• Solaris /var/adm/messages
• Console logs
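When collecting evidence after a reboot it helps to expand these templates for a given node. A minimal sketch, assuming the relative `log/{host}/...` paths sit under the Clusterware home; `clusterware_logs` is a hypothetical helper, `/u01/crs` is a made-up home directory, and only a subset of the files from the deck is listed.

```shell
#!/bin/sh
# Hypothetical helper: print some of the log locations named in the deck.
# $1 = Clusterware home (assumption: relative log paths live under it)
# $2 = host name
clusterware_logs() {
    crs_home=$1
    host=$2
    printf '%s\n' \
        "$crs_home/log/$host/cssd/ocssd.log" \
        "$crs_home/log/$host/crs/crsd.log" \
        "$crs_home/log/$host/alert$host.log" \
        "/etc/oracle/oprocd/$host.oprocd.log"
}

# Report which of the files are not readable on this machine.
clusterware_logs /u01/crs dime | while read -r f; do
    [ -r "$f" ] || printf 'not readable: %s\n' "$f"
done
```

The syslog and console logs are outside this list because their location depends on the operating system and on how the console is captured.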

Windows world

• OPROCD = OraFenceService
• EVMD = OracleEVMService
• CRSD = OracleCRService
• CSSD = OracleCSService
• OPMD
• Oracle Process Manager Daemon
• Start trigger like init.crs in *nix
• registered with Windows Service Control Manager (WSCM) and delays start by 60 seconds

EVMD (Event Manager)

• Passing clusterware events
• Usually not a problem
• Verify
• evmwatch -A
• evmpost -u "my message"

CRSD (Cluster Ready Services)

• CRSD manages cluster resources
• Stop / Start
• Failover
• VIP management
• New resources, etc.
• RACG helper scripts

CRSD startup

• After CSSD and EVMD
• Re-spawned on failure
• No eviction
• Runs as root
• VIP control
• OCR management
• root ulimits are in place!
• Can run resources owned by any user
• owner is the property of a resource

Oracle Cluster Registry

• Repository for all configuration data
• Except OCR location itself
• OCR is accessed mostly read-only
• Every component reads OCR
• OCR is written only by CRS
• only from a single OCR master node

### crsd.log ###

2008-08-02 22:23:50.958: [ OCRMAS] [3065154448]th_master:13:I AM THE NEW OCR MASTER at incar 12. Node Number 1
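To see which node took over as OCR master and when, the message above can be filtered out of crsd.log. A minimal sketch, assuming the log line format is exactly as in the sample; `ocr_master_changes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read crsd.log lines on stdin and print
# "<timestamp> node=<number>" for each "NEW OCR MASTER" message.
ocr_master_changes() {
    sed -n 's/^\([0-9-]* [0-9:.]*\): .*I AM THE NEW OCR MASTER.*Node Number \([0-9][0-9]*\).*$/\1 node=\2/p'
}

ocr_master_changes <<'EOF'
2008-08-02 22:23:50.958: [ OCRMAS] [3065154448]th_master:13:I AM THE NEW OCR MASTER at incar 12. Node Number 1
EOF
# prints: 2008-08-02 22:23:50.958 node=1
```

Lines that do not contain the master message are dropped, so the function can be pointed at a whole crsd.log.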


CRS resources

• Standard Oracle resources
• ASM
• Listener
• VIP
• Database and Instance
• etc.
• srvctl => manages Oracle resources
• Custom user resources
• crs_% => manages any resources

CRS resource internals

• Unique name
• Associated action script
• stop / start / check functions
• Other attributes
• check frequency
• pre-requisites
• restart retries
• etc.
• All info stored in OCR

DEMO: Resource profiles

• Use crs_stat [-t] to check status
• Use crs_stat -p to check attributes
• crs_* vs srvctl (like srvctl config ... -a)
• Standard action scripts
• racgimon
• racgwrap / racgmain
• racgvip
• racgons
• usrvip

DEMO: OCR internals

• ocrcheck
• ocrconfig
• used during install/upgrade
• backup OCR
• recover OCR
• ocrdump
• txt or xml

DEMO: racgvip case study

• Check the script
• Set env. vars and simulate the call
• Use _USR_ORA_DEBUG=1 in the script

Resources hierarchy

• 10.2.0.2 (?)
• released dependency of ASM and Instance on VIP
• If DB registered manually with srvctl
• ASM dependency missing

[Diagram of resource dependencies: DB, Instance, Service, CS (Collective Service), ASM, Listener, VIP, and Nodeapps (GSD, ONS). One dependency is marked "Only 10.1 and 10.2.0.1".]

Resources and Oracle homes

[Diagram: the same resource hierarchy (DB, Instance, Service, CS, ASM, Listener, VIP, Nodeapps with GSD and ONS), grouped by DB Home, ASM Home and CRS Home]

• Listener can be in ASM home
• ASM home can be Oracle home
• Logs are in the appropriate home

DEMO: troubleshooting resources

• {home}/log/{host}/racg/{resource_name}.log
• Old way - edit racgwrap
• Uncomment _USR_ORA_DEBUG=1
• crsctl debug log res ‘{res_name}:{0|1}’
• crs_stat -p | grep DEBUG
• Run “srvctl start ...” manually
• SRVM_TRACE=TRUE

Troubleshooting summary

• crsctl check crs | crsd | cssd | evmd
• crs_stat [-t]
• crs_stat -p [{res_name}]
• crsctl debug log css | crs | evm | res
• crsctl lsmodules css | crs | evm
• crs_stop {res_name} [-f] (force-stop a resource)
• ocrdump
• See scripts

Troubleshooting flow

• Is Clusterware up?
• Are Oracle resources up?
• Listener & VIP
• Database & ASM instance
• Services
• Did any nodes get rebooted?
• Did any resources restart?
• $ORA_CRS_HOME/log/{host}/crs/crsd.log
• $ORA_CRS_HOME/log/{host}/alert{host}.log
• MOS Note 265769.1 “Troubleshooting 10g and 11.1 Clusterware Reboots”

Enter the 11gR2 World - Grid Infrastructure

[Figures captioned: Oracle Clusterware Administration and Deployment Guide; My Oracle Support Note 1053147.1]

11g Grid Infrastructure Documentation

• Oracle Clusterware Administration and Deployment Guide
• MOS Note 1053147.1
• 11gR2 Clusterware and Grid Home - What You Need to Know
• MOS Note 1050908.1
• How to Troubleshoot Grid Infrastructure Startup Issues
• MOS Note 1053970.1
• Troubleshooting 11.2 Grid Infrastructure Installation Root.sh Issues
• MOS Note 1050693.1
• Troubleshooting 11.2 Clusterware Node Evictions (Reboots)

11gR2 Node Evictions

• Same as in 10g + member kill escalation
• LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this times out it could escalate to a node kill.
• Processes evicting
• CSSD
• CSSDAGENT
• CSSDMONITOR