We were setting up a 2-node extended Oracle Grid Infrastructure (RAC) cluster on top of RHEL 5.5 according to the standard Oracle documentation, with, of course, a third node serving as NFS voting node. We also use ASM to create “host-based” mirrored block devices for the Oracle software.
The setup is as follows:
3 HP DL380 G6 systems with a basic RHEL 5u5 x86_64 installation (2 x RAC cluster nodes, 1 x NFS voting node)
2 HP EVA 6400 SAN systems with 2 controllers each (resulting in 8 paths per device)
We chose this configuration instead of a configuration with Data Guard because of our strict failover-time requirement in case of a node or SAN disaster: failover should complete within 30 seconds. This post raises the question of whether we made the right decision.
The following analysis and testing, by the way, was the work of my colleague Chris Verhoef, a former Red Hat consultant:
With this setup we face the issue that if we lose a complete SAN, I/O to the ASM disk groups is blocked for approximately 3 to 4 minutes. Oracle does not like this: 70 seconds after a freeze, the RDBMS starts to reboot (expected behaviour). To shorten this time we did some testing with the following parameters: