We were setting up a two-node extended Oracle Grid Infrastructure (RAC) cluster on top of RHEL 5.5 according to the standard Oracle documentation, with of course a third node that serves the voting disk over NFS. We also use ASM to create “host-based” mirrored block devices for the Oracle software.

The setup is as follows:

3 HP DL380 G6 systems with a basic RHEL 5u5 x86_64 installation (2 x RAC cluster nodes, 1 x NFS voting node)

2 HP EVA 6400 SAN systems with 2 controllers each (resulting in 8 paths per device)

Oracle 11.2.0.2

We chose this configuration instead of one with Data Guard because of our strict failover-time requirement in case of a node or SAN disaster: failover should complete within 30 seconds. This post raises the question of whether we made the right decision…

By the way, the following analysis and testing was the work of my colleague Chris Verhoef, a former Red Hat consultant:

With this setup we face the issue that if we lose a complete SAN, I/O to the ASM disk groups is blocked for approximately 3 to 4 minutes. Oracle does not like this: 70 seconds after a freeze, the RDBMS starts to reboot (expected behaviour). To shorten this time we have done some testing with the following parameters:

checker timeout

no_path_retry

dev_loss_tmo
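
Before changing anything it helps to see which dev_loss_tmo is currently in effect per FC remote port. A minimal Python sketch, assuming the usual sysfs layout `/sys/class/fc_remote_ports/rport-*/dev_loss_tmo` exposed by scsi_transport_fc; the function name and the `base` parameter are made up for illustration:

```python
import glob
import os


def read_dev_loss_tmo(base="/sys/class/fc_remote_ports"):
    """Return {rport name: dev_loss_tmo in seconds} for every FC remote port under base."""
    values = {}
    pattern = os.path.join(base, "rport-*", "dev_loss_tmo")
    for path in sorted(glob.glob(pattern)):
        # The directory name (e.g. rport-0:0-1) identifies the remote port.
        rport = os.path.basename(os.path.dirname(path))
        with open(path) as handle:
            values[rport] = int(handle.read().strip())
    return values


if __name__ == "__main__":
    for rport, seconds in read_dev_loss_tmo().items():
        print("%s: dev_loss_tmo=%ss" % (rport, seconds))
```

On a host without FC remote ports the function simply returns an empty dictionary.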

First test (expected to be an extreme one with regard to no_path_retry)
– checker timeout from 60000 ms to 30000 ms (udev rule change in 50-udev.rules)
– no_path_retry from 12 to “failed”
– dev_loss_tmo unchanged (16 default from scsi_transport_fc)
This results in an I/O block, from the ASM point of view, of a little more than 1 minute (1.05), while we expected 30 seconds from the checker timeout.

Second test (somewhat more reliable, with some queueing)
– checker timeout from 30000 ms to 15000 ms (udev rule change in 50-udev.rules)
– no_path_retry from “failed” to 2
– dev_loss_tmo from 16 to 7
This results in an I/O block, from the ASM point of view, of a little more than 1 minute (1.15), while we expected 15 seconds from the checker timeout plus an additional 7×2, so approximately 30 seconds in total.
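
The expectation for both tests can be written out as a small calculation. This is only the back-of-envelope estimate used in the text above (checker timeout plus no_path_retry times dev_loss_tmo), not a model of how the kernel and multipathd really interact; the function name is made up, and treating “failed” as zero retries is our assumption:

```python
def expected_block_seconds(checker_timeout_ms, no_path_retry, dev_loss_tmo_s):
    """Rough estimate: checker timeout plus no_path_retry times dev_loss_tmo."""
    return checker_timeout_ms / 1000.0 + no_path_retry * dev_loss_tmo_s


# Test 1: no_path_retry "failed" is treated as 0 retries (assumption).
first = expected_block_seconds(30000, 0, 16)
# Test 2: two retries with dev_loss_tmo lowered to 7 seconds.
second = expected_block_seconds(15000, 2, 7)

print("test 1 expected: %.0f s" % first)   # 30 s
print("test 2 expected: %.0f s" % second)  # 29 s
```

Both estimates come out at about 30 seconds, so these three settings alone do not explain a block of more than a minute.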

We opened a service request with Red Hat to lower the I/O block time, and also one with Oracle, just to be sure.

After an extended investigation, Red Hat's first advice was to install the latest versions of the following packages, in our case:

– lvm2-2.02.74-5.el5_6.1
– lvm2-cluster-2.03.74-3.el5_6.1
– kpartx-0.4.7-42.el5_6.2
– cmirror-1.1.39-10.el5
– device-mapper-multipath-0.4.7-42.el5_6.2

The result was that the I/O timeout was reduced to 1:45 minutes, and all paths were restored correctly. Better, yes, but still too slow…

Red Hat searched extensively and finally came up with the following solution:

 1. Edit the Multipath device section:

device {
    vendor "(COMPAQ|HP)"
    product "HSV1[01]1|HSV2[01]0|HSV3[046]0|HSV4[05]0"
    getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout "/sbin/mpath_prio_alua /dev/%n"
    features "0"
    hardware_handler "0"
    path_grouping_policy group_by_prio
    failback immediate
    rr_weight uniform
    no_path_retry 2
    rr_min_io 100
    path_checker tur
}

2. Update the kernel to 2.6.18-238.el5 (see also BZ#627836).
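
To check which arrays the device section from step 1 would apply to, the product pattern can be tried against the product string the controllers report (shown, for example, in `multipath -ll` output). A small Python sketch; Python's `re` module is used here as a stand-in for the regex matching multipath does itself, and the sample product strings are illustrative only, not taken from our systems:

```python
import re

# Product pattern copied from the device section above.
PRODUCT_PATTERN = re.compile(r"HSV1[01]1|HSV2[01]0|HSV3[046]0|HSV4[05]0")


def section_applies(product):
    """Return True if the product string matches the device section's pattern."""
    return PRODUCT_PATTERN.search(product) is not None


# Hypothetical product strings, for illustration only.
for product in ("HSV400", "HSV450", "HSV210", "MSA2012fc"):
    print(product, section_applies(product))
```

If the string reported by your own controllers does not match, the settings in the device section are not picked up for those LUNs.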

At the time of writing, no testing had been done yet. The results of the testing are described in part II of this post.

 
