Oracle Clusterware 11gR2

UKOUG TEBS 2010

Frits Hoogland

Tuesday, December 7, 2010

Who am I?

Frits Hoogland
–Working with Oracle products since 1996
–Working with VX Company since 2009

Interests
–Databases
–Application servers
–Operating systems
–Web techniques, TCP/IP, network security
–Technical security, performance

Blog: http://fritshoogland.wordpress.com
Email: [email protected]
ACE Director
OakTable member

Agenda
–Oracle Restart
–Hardware requirements
–Software requirements
–Shutdown
–Instance check by clusterware
–Listener check by clusterware
–Startup standalone
–init.ohasd
–Processes standalone
–Startup clustered
–Processes clustered
–Votedisk
–ASM & Clusterware
–Stability
–Q & A

This is an investigation into:
–Oracle Clusterware
–Oracle Clusterware 11.2.0.1 aka 11gR2 - 32 bit
–Oracle Enterprise Linux 5.4 - 32 bit
–Using: VMWare Fusion version 2.0.6 (196839)

Clusterware - Oracle Restart
Standalone install of clusterware
Starts the database and listener automatically
–Feature is called ‘Oracle Restart’
–Clusterware needs to be installed in a separate home
–Free!

Hardware requirements
Memory requirements
–Documentation (http://docs.oracle.com):
  - Grid/cluster: 1.5G
  - Grid/standalone: 1.0G
–Installer:
  - Grid/cluster: 1.5G
  - Grid/standalone: 1.5G

Hardware requirements
Network requirements
–Grid/standalone:
  - 1 network interface
  - Hostname resolvable
–Grid/cluster:
  - 2 network interfaces
  - Hostnames resolvable: hostname(s), vip, SCAN
  - SSH equivalence (*)

Software requirements
Operating system
–Use ‘yum’
  - Needs internet access.
  - See http://public-yum.oracle.com
  - Free service (no license required!)
–Install ‘oracle-validated’ package
  - # yum install oracle-validated

Shutdown
Clusterware integrates with the stop/start system

$ ls -l /etc/rc.d/init.d/*ohasd
-rwxr-xr-x 1 root root 3105 Mar 12 13:01 /etc/rc.d/init.d/init.ohasd
-rwxr-xr-x 1 root root 2616 Mar 12 13:01 /etc/rc.d/init.d/ohasd

Huh? Two control scripts for the HA daemon?

Shutdown
Close investigation of the comments reveals some information:

$ head -6 /etc/rc.d/init.d/init.ohasd
#!/bin/sh
#
# Copyright (c) 2001, 2009, Oracle and/or its affiliates. All rights reserved.
#
# init.ohasd - Control script for the Oracle HA services daemon
# This script is invoked by the init system

Shutdown
Close investigation of the comments reveals some information:

$ head -6 /etc/rc.d/init.d/ohasd
#!/bin/sh
#
# Copyright (c) 2001, 2009, Oracle and/or its affiliates. All rights reserved.
#
# ohasd.sbs - Control script for the Oracle HA Services daemon
# This script is invoked by the rc system.

Shutdown

init.ohasd - init

$ tail -1 /etc/inittab
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null

ohasd - rc

$ /sbin/chkconfig --list ohasd
service ohasd does not support chkconfig

Huh?

$ /sbin/chkconfig --list sendmail
sendmail 0:off 1:off 2:on 3:on 4:on 5:on 6:off

$ /sbin/chkconfig --list doesnotexist
error reading information on service doesnotexist: No such file or directory

Shutdown
It doesn’t use Red Hat’s specific stop/start implementation. It starts, though:

$ find /etc/rc.d -name "S*ohasd"
../rc3.d/S96ohasd
../rc5.d/S96ohasd

And it is configured to stop:

$ find /etc/rc.d -name "K*ohasd"
../rc0.d/K19ohasd
../rc6.d/K19ohasd
../rc2.d/K19ohasd
../rc4.d/K19ohasd
../rc1.d/K19ohasd

...mind the ‘is configured’...

Shutdown
But it will never stop...
–Which crashes the clusterware and its services(!) on shutdown

Red Hat’s stop/start implementation requires:
–A lock file
–In /var/lock/subsys/
–Otherwise the service is considered not started
–And will not be stopped as a result of that
–Lock-filename = name of stop/start script

Identified as bug: 8740030
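
The effect of a missing lock file can be illustrated with a small model of the shutdown pass. This is a simplified sketch, not Red Hat's actual rc script: the function name `rc_stop_services` and its two directory arguments are hypothetical, and the directories merely stand in for a runlevel directory and /var/lock/subsys.

```shell
#!/bin/sh
# Simplified model of the rc shutdown pass (assumption: NOT the real rc script).
# For every K* link in the runlevel directory, the service is only stopped
# when a lock file with the same name as the stop/start script exists.
rc_stop_services() {
    rc_dir=$1      # stands in for e.g. /etc/rc.d/rc0.d
    lock_dir=$2    # stands in for /var/lock/subsys
    for link in "$rc_dir"/K*; do
        [ -e "$link" ] || continue      # directory holds no K* links
        name=${link##*/}                # e.g. K19ohasd
        subsys=${name#K??}              # strip 'K' plus two digits -> ohasd
        if [ -f "$lock_dir/$subsys" ]; then
            echo "stop $subsys"
        else
            echo "skip $subsys (no lock file)"
        fi
    done
}
```

With a K19ohasd link present but no lock file named ohasd, the model reports the service as skipped, which matches the behaviour described above.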

Example stop/start script

prog=smartd
pidfile=/var/lock/subsys/smartd
start()
{
    echo -n $"Starting $prog: "
    daemon $SMARTD_BIN $smartd_opts
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && touch $pidfile
    return $RETVAL
}
stop()
{
    echo -n $"Shutting down $prog: "
    killproc $SMARTD_BIN
    RETVAL=$?
    echo
    rm -f $pidfile
    return $RETVAL
}

Shutdown
Is this really a problem?

Instance check by clusterware
Applies to database instance and ASM instance
–CRS resource types:
  - ora.asm.type
  - ora.database.type

Checked every 1 second:

$ crsctl status resource ora.testdb.db -f | grep ^CHECK_INTERVAL
CHECK_INTERVAL=1
$ crsctl status resource ora.asm -f | grep ^CHECK_INTERVAL
CHECK_INTERVAL=1
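
When only the value is needed, the NAME=value lines shown above can be filtered with a small helper. A minimal sketch; `get_attr` is a hypothetical name, and it assumes the attribute appears at the start of a line exactly as in the grep output above.

```shell
#!/bin/sh
# Print the value of one attribute from NAME=value lines read on stdin.
# get_attr is a hypothetical helper, not part of crsctl.
get_attr() {
    sed -n "s/^$1=//p"
}

# Intended use (requires a clusterware installation):
#   crsctl status resource ora.asm -f | get_attr CHECK_INTERVAL
```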

Instance check by clusterware
Two checks are done:
1. Check for pmon background process
   - Via the Linux proc filesystem: /proc/<PID>/stat file
2. Check for instance status via ‘health check file’
   - File: $ORACLE_HOME/dbs/hc_sid.dat

This means:
–There is a negligible impact on the instance
–Whilst a detailed status is known
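
Reading /proc/<PID>/stat is cheap, which is why the first check barely touches the instance. The sketch below shows the idea only; it is not how the agent is implemented. `proc_state` and its optional second argument (an alternative proc root, handy for trying it out) are hypothetical.

```shell
#!/bin/sh
# Print the one-letter process state from <proc_root>/<pid>/stat, or return 1
# when the file is absent, meaning the process does not exist.
# proc_state is a hypothetical helper; proc_root defaults to /proc.
proc_state() {
    pid=$1
    proc_root=${2:-/proc}
    stat_file="$proc_root/$pid/stat"
    [ -r "$stat_file" ] || return 1
    read -r line < "$stat_file" || [ -n "$line" ] || return 1
    rest=${line##*) }      # drop 'pid (comm) '; comm itself may contain spaces
    echo "${rest%% *}"     # the first remaining field is the state (R, S, Z, ...)
}

# Example: state of the current shell on a Linux system
#   proc_state $$
```

A missing stat file means the process is gone; a state of Z would indicate a zombie.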

Instance check by clusterware
Because clusterware reads hc_sid.dat
–It knows if stop/start is user initiated
–Modifies resource target status accordingly

Listener check by clusterware
Applies to listener
–CRS resource type:
  - ora.listener.type

Checked every 60 seconds:

$ crsctl status res ora.LISTENER.lsnr -f | grep ^CHECK_INTERVAL
CHECK_INTERVAL=60

Listener check by clusterware
One check is done:
1. The command ‘lsnrctl status’ is issued
   - The return code of the ‘lsnrctl status’ command is used

Listener notifies clusterware of start and stop
–Listener.log: ‘Listener completed notification to CRS on start/stop’
–Done through socket
  - ‘/var/tmp/.oracle/sCRSD_UI_SOCKET’
–Clusterware modifies resource target accordingly
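
A check based purely on a return code can be sketched as follows. This is an illustration of the principle, not the agent's code; `check_by_returncode` is a hypothetical name, and the ONLINE/OFFLINE strings are only borrowed from the resource states used elsewhere in this talk.

```shell
#!/bin/sh
# Run a command and map its exit status to a resource state.
# check_by_returncode is a hypothetical helper.
check_by_returncode() {
    if "$@" >/dev/null 2>&1; then
        echo ONLINE       # exit status 0
    else
        echo OFFLINE      # any non-zero exit status
    fi
}

# Intended use (requires a listener installation):
#   check_by_returncode lsnrctl status
```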

Startup standalone

(Figure from: Clusterware Administration and Deployment Guide 11gR2, 1 Introduction, Overview)

init.ohasd

init
–/etc/inittab: h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
–/etc/rc.d/init.d/init.ohasd

1. ohasdrun readable? (/etc/oracle/scls_scr/<host>/<usr>/ohasdrun)
2. ohasdrun = restart? (no: continue at step 6)
3. `crsctl check has`
4. != CRS-4638? sleep 10
5. CRS-4638? wait for message from pipe
6. (re)make pipe: /var/tmp/.oracle/npohasd (pipe)
7. ohasdrun readable?
8. ohasdrun = reboot? echo "restart" > ohasdrun; wait for message from pipe
9. ohasdrun = restart? ohasd restart &; wait for message from pipe
10. ohasdrun = stop?
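
The decision in steps 8 to 10 depends only on the content of the ohasdrun file. The sketch below models that dispatch and prints the action instead of performing it. It is a simplified illustration, not Oracle's init.ohasd script: `ohasdrun_action` is a hypothetical name, and the slide does not show what happens for `stop` or for an unreadable file, so the model only reports those cases.

```shell
#!/bin/sh
# Hypothetical model of the ohasdrun dispatch (steps 7 to 10 above).
# Prints the action rather than executing it.
ohasdrun_action() {
    ctl_file=$1    # stands in for /etc/oracle/scls_scr/<host>/<usr>/ohasdrun
    if [ ! -r "$ctl_file" ]; then
        echo "ohasdrun not readable"
        return 0
    fi
    value=""
    read -r value < "$ctl_file" || [ -n "$value" ] || value=""
    case $value in
        reboot)  echo "write 'restart' to ohasdrun, then wait for message from pipe" ;;
        restart) echo "run 'ohasd restart' in background, then wait for message from pipe" ;;
        stop)    echo "stop requested (action not shown on the slide)" ;;
        *)       echo "unexpected value: $value" ;;
    esac
}
```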

Startup standalone

Legend: root, oracle, grid

init
–/etc/inittab: h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
–/etc/rc.d/init.d/init.ohasd (log: /var/log/messages)

rc
–/etc/rc.d/init.d/ohasd
–/etc/oracle/scls_scr/<host>/<usr>/ohasdstr: enable|disable

GRID_HOME/bin/ohasd.bin (log: GH/log/<host>/ohasd/ohasd.log)
–GRID_HOME/bin/oraagent.bin (log: GH/log/<host>/agent/ohasd/oraagent_oracle/oraagent_oracle.log)
  - ASM resource
  - database resource
  - listener resource
–GRID_HOME/bin/cssdagent (log: GH/log/<host>/agent/ohasd/oracssdagent_oracle/oracssdagent_oracle.log)
  - GRID_HOME/bin/ocssd.bin (log: GH/log/<host>/cssd/ocssd.log)
–GRID_HOME/bin/orarootagent.bin (log: GH/log/<host>/agent/ohasd/orarootagent_oracle/orarootagent_oracle.log)
  - GRID_HOME/bin/diskmon.bin (log: GH/log/<host>/diskmon/diskmon.log)
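
The agent logs follow one naming pattern, GH/log/<host>/agent/ohasd/<agent>_<owner>/<agent>_<owner>.log, so a path can be derived rather than remembered. A small sketch; `ohasd_agent_log` is a hypothetical helper, and the grid home and host name in the example are made up.

```shell
#!/bin/sh
# Build the log path of an agent started by ohasd, following the pattern
# GH/log/<host>/agent/ohasd/<agent>_<owner>/<agent>_<owner>.log
# ohasd_agent_log is a hypothetical helper.
ohasd_agent_log() {
    grid_home=$1
    host=$2
    agent=$3     # e.g. oraagent, orarootagent, oracssdagent
    owner=$4     # the owner suffix seen in the directory name
    echo "$grid_home/log/$host/agent/ohasd/${agent}_${owner}/${agent}_${owner}.log"
}

# Example with made-up values for grid home and host:
ohasd_agent_log /u01/app/grid node1 oraagent oracle
```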

Processes - standalone
Any cluster process can be killed or crashed.
–Without influencing the resources it protects.

Except ocssd.bin
–cluster synchronisation services daemon
–Documentation:

The cssdagent process monitors the cluster and provides I/O fencing. This service formerly was provided by Oracle Process Monitor Daemon (oprocd), also known as OraFenceService on Windows. A cssdagent failure results in Oracle Clusterware restarting the node.

Processes - standalone
1. killall -9 ocssd.bin
2. ASM instance terminates

Processes - standalone

Unix process pid: 27506, image: [email protected] (GMON)
2010-03-26 11:06:03.152: [ CSSCLNT]clsssRecvMsg: got a disconnect from the server while waiting for message type 1
2010-03-26 11:06:03.152: [ CSSCLNT]clssgsGroupGetStatus: communications failed (0/3/-1)
2010-03-26 11:06:03.152: [ CSSCLNT]clssgsGroupGetStatus: returning 8
kgxgnpstat: received ABORT event from CLSSGroup services
Error [NM abort event ] @ 28019:1125
error 29702 detected in background process
ORA-29702: error occurred in Cluster Group Service operation
GMON (ospid: 27506): terminating the instance due to error 29702

Processes - standalone
1. killall -9 ocssd.bin
2. ASM instance terminates
3. Database instances which use ASM terminate

Processes - standalone

Unix process pid: 27588, image: [email protected] (ASMB)
NOTE: ASMB terminating
error 15064 detected in background process
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
ASMB (ospid: 27588): terminating the instance due to error 15064
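
Both excerpts end with the same kind of line: the background process that gave up, and the error that made it do so. A hedged sketch for pulling those two values out of log text; `instance_terminations` is a hypothetical name, and the pattern is derived only from the two lines shown in this talk.

```shell
#!/bin/sh
# Print "<process> <error>" for every 'terminating the instance' line on stdin.
# instance_terminations is a hypothetical helper; the pattern is based solely
# on the GMON and ASMB lines shown in the excerpts.
instance_terminations() {
    sed -n 's/^\([A-Z0-9]*\) (ospid: [0-9]*): terminating the instance due to error \([0-9]*\).*$/\1 \2/p'
}

# Intended use: instance_terminations < some_trace_or_alert_file
```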

Processes - standalone
1. killall -9 ocssd.bin
2. ASM instance terminates
3. Database instances which use ASM terminate
4. oraagent detects ASM instance termination
   - ASM resource status is set to FAILED
5. oraagent detects db instance termination
   - db resource status is set to FAILED
   - Diskgroup resource status is set to FAILED
6. orarootagent detects diskmon termination
7. orarootagent cleans state and starts diskmon

Processes - standalone
8. ohasd restarts cssdagent
   - (new) cssdagent sets cssd resource state to OFFLINE
9. cssdagent starts cssd
   - cssd resource state is set to ONLINE
10. ASM instance is started & set to ONLINE
11. Diskgroup resource is started & set to ONLINE
12. Database instances are started & set to ONLINE

Startup clustered

Legend: root, oracle, grid

init
–/etc/inittab: h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
–/etc/rc.d/init.d/init.ohasd (log: /var/log/messages)

rc
–/etc/rc.d/init.d/ohasd
–/etc/oracle/scls_scr/<host>/<usr>/ohasdstr: enable|disable

GRID_HOME/bin/ohasd.bin (log: GH/log/<host>/ohasd/ohasd.log)
–GRID_HOME/bin/oraagent.bin (log: GH/log/<host>/agent/ohasd/oraagent_grid/oraagent_grid.log)
  - mdnsd.bin (log: GH/log/<host>/mdnsd/mdnsd.log)
  - gpnp.bin (log: GH/log/<host>/gpnpd/gpnpd.log)
  - gipcd.bin (log: GH/log/<host>/gipcd/gipcd.log)
  - ASM resource
  - evmd.bin (log: GH/log/<host>/evmd/evmd.log)
  - evmlogger.bin (log: GH/evm/log/<hostvip>_evmlog.<date>)
–GRID_HOME/bin/orarootagent.bin (log: GH/log/<host>/agent/ohasd/orarootagent_root/orarootagent_root.log)
  - crsd.bin (log: GH/log/<host>/crsd/crsd.log)
  - diskmon.bin (log: GH/log/<host>/diskmon/diskmon.log)
  - octssd.bin (log: GH/log/<host>/ctssd/octssd.log)
–GRID_HOME/bin/cssdmonitor (log: GH/log/<host>/agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log)
–GRID_HOME/bin/cssdagent (log: GH/log/<host>/agent/ohasd/oracssdagent_root/oracssdagent_root.log)
  - cssd.bin (log: GH/log/<host>/cssd/cssd.log)

crsd.bin
–oraagent.bin (log: GH/log/<host>/agent/crsd/oraagent_grid/oraagent_grid.log)
  - listener
  - listener_scan1
  - ons
  - eons (java)
  - ASM (check)
–oraagent.bin (log: GH/log/<host>/agent/crsd/oraagent_oracle/oraagent_oracle.log)
  - database
–orarootagent.bin (log: GH/log/<host>/agent/crsd/orarootagent_root/orarootagent_root.log)
  - net1
  - <host-vip>
  - scan1

Processes - clustered
Any cluster process can be killed or crashed.
–Without influencing the resources it protects.

Except ocssd.bin
–cluster synchronisation services daemon

In clustered mode, ocssd.bin’s death results in node reboot.

Votedisk
Function: registration of cluster membership
Votedisks are shared
With version 11.2 only CFS or ASM are supported votedisk storage*
–Raw devices supported for upgrade

* A votedisk can be placed on NFS:
http://www.oracle.com/technology/products/database/clusterware/pdf/grid_infra_thirdvoteonnfs.pdf

Votedisk
This is how the clusterware detects the ASM votedisks:

$ kfed read /dev/mapper/vg00-lvasm
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
...
kfdhdb.vfstart: 352 ; 0x0ec: 0x00000160
kfdhdb.vfend: 384 ; 0x0f0: 0x00000180
...
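
The two header fields can be extracted from `kfed read` output with a few lines of awk. This is a sketch under an assumption that the slide itself does not state: that a non-zero kfdhdb.vfstart and kfdhdb.vfend mark a disk holding a votedisk, and zero values mean it holds none. `votefile_range` is a hypothetical name.

```shell
#!/bin/sh
# Read 'kfed read' output on stdin and report the vfstart/vfend header fields.
# Assumption: non-zero values mean the disk holds a votedisk.
# votefile_range is a hypothetical helper.
votefile_range() {
    awk -F'[:;]' '
        $1 == "kfdhdb.vfstart" { vs = $2 + 0 }
        $1 == "kfdhdb.vfend"   { ve = $2 + 0 }
        END {
            if (vs > 0 && ve > 0) printf "votefile range %d-%d\n", vs, ve
            else print "no votefile"
        }'
}

# Intended use (requires ASM tooling and read access to the device):
#   kfed read /dev/mapper/vg00-lvasm | votefile_range
```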

Votedisk
½ plus 1 rule: Quorum
The number of votedisks in a diskgroup is limited by the diskgroup redundancy level:

Redundancy Level    Number of votedisks
External            1
Normal              3
High                5
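
The ½ plus 1 rule is plain integer arithmetic: more than half of the votedisks must be available. A short worked example for the three redundancy levels; the function names `quorum` and `tolerated_failures` are illustrative only.

```shell
#!/bin/sh
# Smallest number of votedisks that is more than half of the total.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of votedisks that may be lost while the quorum is still met.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# External, normal and high redundancy: 1, 3 and 5 votedisks
for n in 1 3 5; do
    echo "$n votedisks: quorum $(quorum "$n"), may lose $(tolerated_failures "$n")"
done
```

So a single votedisk tolerates no loss, three tolerate the loss of one, and five tolerate the loss of two.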

ASM & Clusterware
ASM startup settings are in:
–$ORACLE_HOME/gpnp/<hostname>/profiles/peer/profile.xml
–File is ‘signed’

<orcl:ASM-Profile id="asm" DiscoveryString="/dev/iscsi" SPFile="+DATA/oel5u4-cluster/asmparameterfile/registry.253.718585107"/>
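
The ASM settings can be read from the profile without an XML parser, as long as the element looks like the one above. A rough sketch; `asm_profile_attr` is a hypothetical helper, and it assumes the attributes sit inside the orcl:ASM-Profile element, separated by spaces and quoted with double quotes.

```shell
#!/bin/sh
# Print the value of one attribute of the orcl:ASM-Profile element on stdin.
# asm_profile_attr is a hypothetical helper; it relies on sed pattern matching
# and is not a real XML parser.
asm_profile_attr() {
    sed -n "s/.*<orcl:ASM-Profile[^>]* $1=\"\([^\"]*\)\".*/\1/p"
}

# Intended use:
#   asm_profile_attr DiscoveryString < profile.xml
#   asm_profile_attr SPFile < profile.xml
```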

ASM & Clusterware
It’s possible to manipulate profile.xml:
–Unsign:

$ gpnptool unsign -p=profile.xml -ovr -o=profile.xml

–Adjust
–Sign again:

$ gpnptool sign -p=profile.xml -ovr -o=profile.xml -w=file:$ORACLE_HOME/gpnp/wallets/peer -rmws

Stability
11gR2 Clusterware is reported to be stable

Bugs/issues known to me:
–8740030 OHASD STOP DOES NOT GET EXECUTED DURING SYSTEM SHUTDOWN
–9251136 INSTANCE WILL NOT USE HUGEPAGE IF STARTED BY SRVCTL
–Difficulties with database versions < 11
–(kfod / Jeff Hunter’s 11gR2 install doc.)

Q & A