We can use the ocrconfig -repair command to repair the OCR configuration on a node that was down while the configuration was being modified on the other nodes.
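For reference, ocrconfig -repair only rewrites the local OCR configuration (/etc/oracle/ocr.loc) on the node where it is run; it does not modify the OCR contents themselves. A minimal sketch of the commonly used forms, run as root on the affected node (the diskgroup names here are placeholders):
# ocrconfig -repair -add +FRA
# ocrconfig -repair -delete +FRA
# ocrconfig -repair -replace +OLD_DG -replacement +NEW_DG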
Current scenario:
3 node cluster
Nodes: testdb01, testdb02, testdb03
Nodes testdb02 and testdb03 are up
Node testdb01 is down
OCR is stored on ASM diskgroup DATA
Overview:
- Additionally store the OCR on the FRA diskgroup
- This change is recorded in /etc/oracle/ocr.loc on nodes testdb02 and testdb03, which are up
- This change is not recorded in /etc/oracle/ocr.loc on node testdb01, which is down
- Start up node testdb01
- Clusterware does not come up on testdb01
- Check the alert log and crsd log on testdb01
- Repair the OCR configuration on testdb01 so that /etc/oracle/ocr.loc on testdb01 gets updated
- Start clusterware on testdb01; it now succeeds
Implementation:
- Additionally store the OCR on the FRA diskgroup
[root@testdb02 ~]# ocrconfig -add +FRA
- Check that the new OCR location has been added to /etc/oracle/ocr.loc on nodes testdb02 and testdb03, which are up (an optional ocrcheck cross-check follows the listings)
[root@testdb02 ~]# cat /etc/oracle/ocr.loc
#Device/file getting replaced by device +FRA
ocrconfig_loc=+DATA
ocrmirrorconfig_loc=+FRA
[root@testdb03 ~]# cat /etc/oracle/ocr.loc
#Device/file getting replaced by device +FRA
ocrconfig_loc=+DATA
ocrmirrorconfig_loc=+FRA
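As an optional cross-check from one of the surviving nodes, ocrcheck reports every OCR location the cluster currently knows about along with an integrity check; the exact output layout varies by version, so it is not reproduced here:
[root@testdb02 ~]# ocrcheck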
- Check that the new OCR location has not been added to /etc/oracle/ocr.loc on node testdb01, which was down
[root@testdb01 ~]# cat /etc/oracle/ocr.loc
ocrconfig_loc=+DATA
local_only=FALSE
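If there is any doubt about which nodes the cluster currently considers active, olsnodes can be run from one of the surviving nodes; with -s it prints each node name together with its Active/Inactive status:
[root@testdb02 ~]# olsnodes -s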
- Bring up testdb01
- Check that clusterware has not come up there
[root@testdb01 testdb01]# crsctl stat res -t
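Since crsctl stat res -t needs a running CRSD, a quicker way to see how far the stack got is to query the stack itself; crsctl check crs summarises the state of the main daemons, and the -init variant of the resource listing shows the lower-level ohasd-managed resources:
[root@testdb01 testdb01]# crsctl check crs
[root@testdb01 testdb01]# crsctl stat res -t -init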
- Check the alert log of testdb01
[root@testdb01 testdb01]# tailf /u01/app/11.2.0/grid/log/testdb01/alerttestdb01.log
[ohasd(4914)]CRS-2765:Resource 'ora.crsd' has failed on server 'testdb01'.
2017-12-18 23:35:01.950
- Check the crsd log of testdb01; it indicates that the local and master copies of the OCR configuration do not match
[root@testdb01 crsd]# vi /u01/app/11.2.0/grid/log/testdb01/crsd/crsd.log
[ OCRMAS][2876611472]th_calc_av:5: Return persisted AV [186646784] [11.2.0.1.0]
2017-12-18 23:35:13.931: [ OCRSRV][2876611472]th_not_master_change: Master change callback not registered
2017-12-18 23:35:13.931: [ OCRMAS][2876611472]th_master:91: Comparing device hash ids between local and master failed
2017-12-18 23:35:13.931: [ OCRMAS][2876611472]th_master:91 Local dev (1862408427, 1028247821, 0, 0, 0)
2017-12-18 23:35:13.931: [ OCRMAS][2876611472]th_master:91 Master dev (1862408427, 1897369836, 0, 0, 0)
2017-12-18 23:35:13.931: [ OCRMAS][2876611472]th_master:9: Shutdown CacheLocal. my hash ids don't match
- Repair the OCR configuration on testdb01
[root@testdb01 crsd]# ocrconfig -repair -add +FRA
- Check that the new OCR location has now been added to /etc/oracle/ocr.loc on node testdb01
[root@testdb01 crsd]# cat /etc/oracle/ocr.loc
#Device/file getting replaced by device +FRA
ocrconfig_loc=+DATA
ocrmirrorconfig_loc=+FRA
- Shut down and restart the clusterware stack on testdb01
[root@testdb01 crsd]# crsctl stop crs -f
[root@testdb01 crsd]# crsctl start crs
[root@testdb01 crsd]# crsctl start cluster
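Once the start commands return, cluster-wide health can be confirmed from any node; crsctl check cluster -all reports the state of the clusterware stack on every node:
[root@testdb01 crsd]# crsctl check cluster -all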
- Check the alert log to confirm that crsd has started on testdb01
[root@testdb01 testdb01]# tailf /u01/app/11.2.0/grid/log/testdb01/alerttestdb01.log
[crsd(7297)]CRS-1012:The OCR service started on node testdb01.
2017-12-18 23:46:07.609
[crsd(7297)]CRS-1201:CRSD started on node testdb01.
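As a final sanity check, the repaired node should now see both OCR locations; rereading ocr.loc and running ocrcheck from testdb01 (now that the stack is up) should agree with what testdb02 and testdb03 report:
[root@testdb01 ~]# cat /etc/oracle/ocr.loc
[root@testdb01 ~]# ocrcheck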