reboot-less node fencing when disk heartbeat is lost.
– Check the clusterware version
[root@test02 ~]# crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.4.0]
– Check that both nodes in the cluster are active
[root@test02 ~]# olsnodes -s
test01 Active
test02 Active
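The node status check above can be scripted. The following is a minimal sketch, assuming `olsnodes -s` prints one `name status` pair per line as shown; the helper name `inactive_nodes` is hypothetical and not part of Oracle Clusterware.

```shell
#!/bin/sh
# inactive_nodes (hypothetical helper): read `olsnodes -s` output on stdin
# and print the name of every node whose status column is not "Active".
inactive_nodes() {
    awk 'NF >= 2 && $2 != "Active" { print $1 }'
}

# Usage on a cluster node (needs the Grid Infrastructure binaries in PATH):
#   olsnodes -s | inactive_nodes
```

Empty output means every listed node is Active; any name printed is a node that has left the cluster.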
– Stop the iSCSI service on node 2 (test02)
[root@test02 ~]# service iscsi stop
Logging out of session [sid: 1, target: iqn.2006-01.com.openfiler:tsn.e55ea88d0212, portal: 192.9.201.182,3260]
Logout of [sid: 1, target: iqn.2006-01.com.openfiler:tsn.e55ea88d0212, portal: 192.9.201.182,3260]: successful
Stopping iSCSI daemon:
– Alert log of test02
– Note that instead of the node being rebooted, only the CRSD resources are cleaned up
[cssd(2876)]CRS-1649:An I/O error occured for voting file: ORCL:ASMDISK013; details at (:CSSNM00059:)
...
[cssd(2876)]CRS-1606:The number of voting files available, 0, is less than the minimum number of voting files required, 1, resulting in CSSD termination to ensure data integrity;
[cssd(2876)]CRS-1656:The CSS daemon is terminating due to a fatal error;
[cssd(2876)]CRS-1652:Starting clean up of CRSD resources.
2017-12-09 11:04:30.795
...
[cssd(2876)]CRS-1654:Clean up of CRSD resources finished successfully.
2017-12-09 11:04:31.914
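To confirm from the alert log that the node was fenced without a reboot, one can look for the two clean-up messages shown above. This is a sketch only: the helper name `rebootless_fence_seen` and the `ALERT_LOG` variable are hypothetical, and it matches only the message codes CRS-1652 and CRS-1654 that appear in this excerpt.

```shell
#!/bin/sh
# rebootless_fence_seen (hypothetical helper): read alert-log text on stdin.
# Succeeds (exit 0) only if both CRS-1652 (clean-up of CRSD resources started)
# and CRS-1654 (clean-up finished successfully) are present.
rebootless_fence_seen() {
    awk '/CRS-1652:/ { started = 1 }
         /CRS-1654:/ { finished = 1 }
         END { exit !(started && finished) }'
}

# Usage, with ALERT_LOG set to the clusterware alert log of the node:
#   rebootless_fence_seen < "$ALERT_LOG" && echo "CRSD resources were cleaned up"
```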
– Check that the OHAS service is still up on test02
[root@test02 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
– Check that the cssd, crsd and HAIP resources are down on test02
[root@test02 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       test02
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       test02
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       test02
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       test02
ora.gpnpd
      1        ONLINE  ONLINE       test02
ora.mdnsd
      1        ONLINE  ONLINE       test02
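Rather than reading the listing by eye, the resources that should be running but are not can be filtered out of it. A minimal sketch, assuming the `crsctl stat res -t -init` layout shown above (a resource name line starting with `ora.`, followed by a line with instance number, TARGET and STATE); the helper name `offline_resources` is hypothetical.

```shell
#!/bin/sh
# offline_resources (hypothetical helper): read `crsctl stat res -t -init`
# output on stdin and print each resource whose TARGET is ONLINE but whose
# STATE is OFFLINE. Resources with TARGET OFFLINE (ora.diskmon here) are skipped.
offline_resources() {
    awk '/^ora\./ { name = $1; next }
         name != "" && $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 == "OFFLINE" { print name }'
}

# Usage on the affected node:
#   crsctl stat res -t -init | offline_resources
```

Run against the listing above, this would single out the resources that went down with CSSD, while leaving out ora.diskmon, whose target is OFFLINE anyway.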
– Check that test02 is no longer part of the cluster
[root@test01 cluster01]# olsnodes -s
test01 Active
test02 Inactive
– Restart the iSCSI service on test02
[root@test02 ~]# service iscsi start
iscsid dead but pid file exists
Turning off network shutdown.
Starting iSCSI daemon: [ OK ]
[ OK ]
Setting up iSCSI targets: Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.e55ea88d0212, portal: 192.9.201.182,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.e55ea88d0212, portal: 192.9.201.182,3260]: successful
[ OK ]
– Alert log of test02
– Note that once the iSCSI service is back, CSSD (which has been retrying voting file discovery every 15 seconds) finds the voting file and test02 rejoins the cluster
[cssd(5481)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details
...
2017-12-09 11:10:43.897
[cssd(5481)]CRS-1707:Lease acquisition for node test02 number 2 completed
2017-12-09 11:10:47.629
[cssd(5481)]CRS-1605:CSSD voting file is online: ORCL:ASMDISK013; details in /u01/app/11.2.0/grid/log/test02/cssd/ocssd.log.
2017-12-09 11:10:54.652
[cssd(5481)]CRS-1601:CSSD Reconfiguration complete. Active nodes are test01 test02 .
– Check that the HAIP, cssd and crsd resources have started on test02
[root@test02 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       test02                   Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       test02
ora.crf
      1        ONLINE  ONLINE       test02
ora.crsd
      1        ONLINE  ONLINE       test02
ora.cssd
      1        ONLINE  ONLINE       test02
ora.cssdmonitor
      1        ONLINE  ONLINE       test02
ora.ctssd
      1        ONLINE  ONLINE       test02                   OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       test02
ora.evmd
      1        ONLINE  ONLINE       test02
ora.gipcd
      1        ONLINE  ONLINE       test02
ora.gpnpd
      1        ONLINE  ONLINE       test02
ora.mdnsd
      1        ONLINE  ONLINE       test02
– Check that test02 has joined the cluster
[root@test02 ~]# olsnodes -s
test01 Active
test02 Active
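When repeating this test, it can be convenient to wait for the node to rejoin instead of re-running `olsnodes -s` by hand. The loop below is a sketch under stated assumptions: the function name `wait_for_active`, its default of 40 polls at 15-second intervals, and the `OLSNODES` override variable (there only so the loop can be tried without a cluster) are all hypothetical choices, not Oracle defaults.

```shell
#!/bin/sh
# wait_for_active NODE [TRIES] [SLEEP_SECS]   (hypothetical helper)
# Poll `olsnodes -s` until NODE is reported Active; return 1 after TRIES polls.
# Set OLSNODES to another command or function name to replace olsnodes.
wait_for_active() {
    node=$1
    tries=${2:-40}
    pause=${3:-15}
    while [ "$tries" -gt 0 ]; do
        if ${OLSNODES:-olsnodes} -s 2>/dev/null |
            awk -v n="$node" '$1 == n && $2 == "Active" { found = 1 }
                              END { exit !found }'
        then
            return 0
        fi
        tries=$((tries - 1))
        # Sleep only if another poll is still to come.
        [ "$tries" -gt 0 ] && sleep "$pause"
    done
    return 1
}

# Usage after restarting the iSCSI service on test02:
#   wait_for_active test02 && echo "test02 has rejoined the cluster"
```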