Multipath timeout issues with extended – cluster setup

We were setting up a two-node Oracle Grid Infrastructure (RAC) extended cluster on top of RHEL 5.5 according to the standard Oracle documentation, with, of course, a third NFS node as voting node. We also use ASM to create "host-based" mirrored block devices for the Oracle software.

The setup is as follows:

3 HP DL380 G6 systems with a basic RHEL 5u5 x86_64 installation (2 x RAC clusternodes, 1 x NFS-voting-node)

2 SANs (HP EVA 6400 systems) with 2 controllers each (resulting in 8 paths per device)


We chose this configuration instead of a configuration with Data Guard because of our strict failover-time requirement in case of a node or SAN disaster: failover should complete within 30 seconds. This post raises the question of whether we made the right decision.

By the way, the following analysis and testing were the work of my colleague Chris Verhoef, a former Red Hat consultant:

With this setup we face the issue that if we lose a complete SAN, I/O to the ASM disk groups is blocked for approximately 3 to 4 minutes. Oracle does not like this: 70 seconds after a freeze, the RDBMS starts to reboot (expected behaviour). To shorten this time we have done some testing with the following parameters:

checker timeout
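
For orientation, this is roughly where such a timeout lives in /etc/multipath.conf. It is only a sketch: the values are illustrative assumptions, not the settings we tested, and whether each keyword is accepted depends on the device-mapper-multipath version installed.

```
# /etc/multipath.conf (fragment, illustrative values only)
defaults {
    # seconds between two path checks
    polling_interval  5
    # seconds a path checker may wait before it gives up on a path
    checker_timeout   10
    # fail I/O at once when all paths are gone, instead of queueing it
    no_path_retry     fail
}
```

The idea behind such settings is that I/O errors reach ASM quickly, so that ASM rather than the multipath layer decides what to do with the disks of the lost SAN.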



Red Hat 6 and Oracle, status of certification

Red Hat 6 has been around for a while, so what about certification with Oracle, and when? Nothing yet on the Oracle support site, and no press releases (maybe I missed one). But Red Hat had a blog post about it a while ago (August 2011):

We’re pleased to announce that on Tuesday, August 9, we formally submitted to Oracle full certification test results of the Oracle 11gR2 database (Single Instance and RAC (including ASM) for x86 and x86-64) on Red Hat Enterprise Linux 6. Oracle database certification is a self-certification program whereby operating system vendors perform extensive testing and submit the results to Oracle for audit and approval.

Errors in alert log [NI cryptographic checksum mismatch] TNS-12599

Using RDBMS 11.2 as the repository for our Grid Control environment, I noticed a lot of the same errors in the alert log of the repository and/or the target database:

NI cryptographic checksum mismatch error: 12599.
TNS for Linux: Version – Production
Oracle Bequeath NT Protocol Adapter for Linux: Version – Production
TCP/IP NT Protocol Adapter for Linux: Version – Production
Time: 24-MAR-2011 11:58:31
Tracing not turned on.
Tns error struct:
ns main err code: 12599

TNS-12599: TNS:cryptographic checksum mismatch
ns secondary err code: 2526
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
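
To get a feeling for how often this happens, the incidents in an alert log can be counted with grep. This is a minimal sketch: the function name count_tns_12599 and the default file name alert.log are made up for the example, so point ALERT_LOG at your own alert log.

```shell
#!/bin/sh
# Count the lines in an alert log that mention TNS-12599.
# Usage: count_tns_12599 /path/to/alert.log
count_tns_12599() {
    # grep -c prints the number of matching lines. It exits non-zero
    # when nothing matches, so "|| true" keeps the exit status clean.
    grep -c 'TNS-12599' "$1" || true
}

# ALERT_LOG is a hypothetical default; set it to the real alert log.
ALERT_LOG="${ALERT_LOG:-alert.log}"
if [ -r "$ALERT_LOG" ]; then
    echo "TNS-12599 occurrences: $(count_tns_12599 "$ALERT_LOG")"
fi
```

Each incident in the log writes the "TNS-12599:" line once, so the number of matching lines should be close to the number of incidents.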

Note: 1150874.1 gives the cause, workaround and solution:

Restoring OCR on

As the documented procedure for restoring an OCR did not work as it should, the following has been successfully tested…

Log on as “root”
# . oraenv +ASM1
# ./ocrconfig -showbackup
# ./ocrconfig -showbackup manual

Stop the cluster on all nodes

# crsctl stop crs

My RAC project with Openfiler, part II – grid infrastructure and database

This is the second post about my little project of building my own RAC on my Windows 7 desktop (64-bit, 8GB RAM), using 3 VMs (VMware Workstation 7.1.3 build-324285):

  • 2 VMs for the two RAC nodes, based on OEL 5.5, for infrastructure and database.
  • 1 VM as my own SAN, based on Openfiler 2.3 (free), for ASM.

My goal: just a little bit hands-on experience with Openfiler, and do some testing with RAC 11gr2, and especially Maybe it’s helpful for somebody else, so I’ll post my experiences. Steps I’m gonna perform.

–          The first post handled the following:

1.       Planning my installation
2.       Create a VM with OpenFiler
3.       Configure Openfiler

–          The second post (this post):

4.       Create a VM as node rac1 and configure ASM
5.       Create a VM as node rac2
6.       Install Oracle RAC Infrastructure
7.       Install Oracle RAC software
8.       Install Oracle RAC database

