For those who are not that interested in the whole story about RAC, GFS2, the misscount parameter and Red Hat, here is the outcome: don’t use RAC (10g or 11g) with GFS2 on Red Hat Cluster Suite; it’s a dead end. For those who are interested, you’re invited…

Three years ago the customer designed a RAC implementation based on Red Hat Cluster with GFS as the cluster file system, with the Oracle infrastructure and database on 10g. There were perfectly valid reasons for this at the time: ASM was not quite mature yet, and the DBAs didn’t have much experience with it, for example.

The only problem in this scenario seemed to be support and certification. Support is naturally divided into two parts: Oracle and Red Hat. And Oracle certified the use of GFS under a few conditions:

Note 329530.1 – Using Redhat Global File System (GFS) as shared storage for RAC
and RAC Technologies Matrix for Linux Platforms
(http://www.oracle.com/technetwork/database/clustering/tech-generic-linux-new-086754.html – this link is referenced in the certification matrix on My Oracle Support.)

But now the customer is upgrading the infrastructure (not the database) to 11gR2, while using Red Hat Cluster Suite and GFS2: OCR on a raw device, voting disk on GFS2. All works well. Then they ran a series of tests. One test failed: when a node is fenced from the cluster, the other node also reboots.

This looks a lot like bug 9211611 – “2 NODE CLUSTER – BOTH NODES REBOOT WHEN ONE IS MANUALLY REBOOTED”, but that one is based on OCFS2:

“They have to use noatime mount option for all the mounts which store database related data.
They should also make sure CSS timeouts are larger than ocfs2 disk heartbeat timeouts.”
In our case that means the disk heartbeat timeout should stay under 30 seconds (the CSS misscount default).

So the first focus is on timeouts: Note 294430.1 – CSS Timeout Computation in Oracle Clusterware
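To see where you stand before changing anything, the current CSS timeout values can be read with crsctl. A minimal sketch, assuming crsctl from the Grid home is on the PATH (the exact CRS-xxxx output format varies slightly by version):

```shell
# Read the current CSS timeouts on a running 11gR2 cluster
# (as root or the grid owner).
show_css_timeouts() {
    crsctl get css misscount     # network heartbeat timeout, default 30 s
    crsctl get css disktimeout   # voting disk I/O timeout, default 200 s
}

# Only meaningful on a cluster node where crsctl is available:
if command -v crsctl >/dev/null 2>&1; then
    show_css_timeouts
fi
```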

Another possible workaround could be to relocate the voting file from GFS2 to a block device (the OCR is already located on a block device, not on GFS2), like this:

1. move the voting disk temporarily to another GFS2 filesystem (as root), using crsctl replace votedisk

From the Clusterware Administration and Deployment Guide: “Use the crsctl replace votedisk command to move or replace the existing voting disks. This command creates voting disks in the specified locations, either in Oracle ASM or some other storage option. Oracle Clusterware copies existing voting disk information into the new locations and removes the voting disks from the former locations.”

crsctl replace votedisk [+asm_disk_group | path_to_voting_disk [...]]

http://download.oracle.com/docs/cd/E11882_01/rac.112/e16794/crsref.htm#CHEGHEJI

for example:
crsctl replace votedisk /mnt/oradata/voting_temp

2. check that /mnt/oravoting stores only the voting device; if so, unmount /mnt/oravoting on both nodes

3. zero out the voting device
dd if=/dev/zero of=/dev/mapper/vg_oravote-vote bs=8192 count=32768

4. check whether the ownership and permissions are correct on the voting device
(see Oracle® Clusterware Administration and Deployment Guide 11g Release 2 (11.2),
“About Voting Disks, Oracle Cluster Registry, and Oracle Local Registry”:

http://download.oracle.com/docs/cd/E11882_01/rac.112/e16794/votocr.htm#CWADD92110

“If you use CRSCTL to add a new voting disk to a raw device after installation, then the file permissions of the new voting file on remote nodes may be incorrect. On each remote node, check to ensure that the file permissions for the voting file are correct (owned by the Grid Infrastructure installation owner and by members of the OINSTALL group). If the permissions are incorrect, then change them manually.”

chmod 660 /dev/mapper/vg_oravote-vote
chown grid:oinstall /dev/mapper/vg_oravote-vote

5. move the voting file to the block device (as root)
crsctl replace votedisk /dev/mapper/vg_oravote-vote

6. check whether the voting device is in the right place:
crsctl query css votedisk
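Putting steps 1 through 6 together, the relocation can be sketched as one script. The paths (/mnt/oradata/voting_temp, /mnt/oravoting, /dev/mapper/vg_oravote-vote) are the ones from this particular setup; run it as root on one node, except for the unmount, which has to happen on both nodes:

```shell
# Sketch of steps 1-6 above; paths are specific to this setup.
relocate_votedisk() {
    temp_vote=/mnt/oradata/voting_temp    # temporary GFS2 location (step 1)
    vote_dev=/dev/mapper/vg_oravote-vote  # target block device (steps 3-5)

    crsctl replace votedisk "$temp_vote"  # 1. park the voting file elsewhere
    umount /mnt/oravoting                 # 2. free old mount (on both nodes!)
    dd if=/dev/zero of="$vote_dev" bs=8192 count=32768  # 3. zero the device
    chown grid:oinstall "$vote_dev"       # 4. ownership for the grid owner
    chmod 660 "$vote_dev"                 #    and permissions, per the docs
    crsctl replace votedisk "$vote_dev"   # 5. move it to the block device
    crsctl query css votedisk             # 6. verify the new location
}

# Only run on a cluster node, as root:
if command -v crsctl >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    relocate_votedisk
fi
```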

This didn’t help either…

But… they updated the misscount setting to 600 and retried the fence operation, and now it works as designed!
The documentation says that when third-party clusterware is used alongside Oracle RAC, the value should default to 600 instead of 30 (see Note 294430.1). So… is Red Hat Cluster Suite to be considered third-party clusterware? In short: no. Which means that setting misscount above 30 is not supported / certified.
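For completeness, the change that made the fence test pass was along these lines. crsctl set css misscount is the documented way to change the value, but, as said, a value above 30 is not supported/certified here without certified third-party clusterware; this is shown only to document the test:

```shell
# NOT supported in this configuration -- raising misscount above 30
# without certified vendor clusterware. Shown only to document the test.
set_misscount() {
    crsctl set css misscount 600   # raise the network heartbeat timeout
    crsctl get css misscount       # read it back to verify
}

# Only meaningful on a cluster node, as root:
if command -v crsctl >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    set_misscount
fi
```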

See also:

Note 558376.1 – VERITAS + RAC ON LINUX FAQ :
“Is Veritas SFRAC supported with Oracle RAC on Linux?
No. Oracle does not support running RAC on Linux with 3rd-party clusterware, and this includes Veritas cluster software.”

Note 294430.1 also says:
Misscount drives cluster membership reconfigurations and directly affects the availability of the cluster. In most cases, the default settings for MC should be acceptable. Modifying the default value of misscount not only influences the timeout interval for the i/o to the voting disk, but also influences the tolerance for missed network heartbeats across the interconnect.

So… will it be certified in the future? This raised a lot of confusion: documentation and Oracle consultants didn’t speak the same language. Spamming Oracle product management led to some statements:

Paul Tsien: We have no plan to certify Redhat GFS2 with 11gR2. We have many customers running mission critical applications on Oracle stack with Redhat, and have references if the customer is interested in that.

Markus Michalewicz: In general, the RH cluster should not be used on Linux in conjunction with RAC, since not certified and not integrated, which can therefore lead to a dueling cluster issue.

In the end, the Service Request was closed with the statement:

I can confirm that it is not planned to certify/support GFS2 with 11gR2.