This is the second and final post about an issue with a RAC configuration with two SANs. The problem was an I/O freeze of several minutes when one of the two SANs crashed. I ended the first post with a ‘cliffhanger’ because we had a solution but had not tested it yet. Now we have tested it.

Let's start with a recap of the first post.

Setup:

3 HP DL380 G6 systems with a basic RHEL 5u5 x86_64 installation (2 x RAC clusternodes, 1 x NFS-voting-node)

2 HP EVA 6400 SANs with 2 controllers each (resulting in 8 paths per device)

Oracle 11.2.0.2

Test: power off 1 SAN. Default result / problem: an I/O freeze of several minutes. Oracle didn’t like it and started to evict, shut down and start up again, which is expected behaviour after such a long I/O freeze. But this is not what you want when you install a RAC with two SANs…

Solution:

1. Edit the Multipath device section:

device {
    vendor "(COMPAQ|HP)"
    product "HSV1[01]1|HSV2[01]0|HSV3[046]0|HSV4[05]0"
    getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout "/sbin/mpath_prio_alua /dev/%n"
    features "0"
    hardware_handler "0"
    path_grouping_policy group_by_prio
    failback immediate
    rr_weight uniform
    no_path_retry 2
    rr_min_io 100
    path_checker tur
}
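
As a rough sanity check on the no_path_retry setting in the stanza above, here is a small Python sketch that reads the value from a trimmed copy of the stanza and estimates how long multipath keeps queueing I/O once every path to a LUN has failed. The post does not set polling_interval, so the 5 seconds used here is an assumption, and the names parse_stanza and queue_window_seconds are made up for this example. The figure does not include the time needed to detect the failed paths, so treat it as an estimate, not as an explanation of the measured freeze.

```python
# Trimmed copy of the device stanza from this post (only a few settings kept).
DEVICE_STANZA = '''
device {
    vendor "(COMPAQ|HP)"
    no_path_retry 2
    path_checker tur
}
'''

# ASSUMPTION: polling_interval is not given in the post; 5 seconds is assumed.
ASSUMED_POLLING_INTERVAL_S = 5


def parse_stanza(text):
    """Return the key/value settings found inside one multipath stanza."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the opening/closing brace lines.
        if not line or line.endswith("{") or line == "}":
            continue
        key, _, value = line.partition(" ")
        settings[key] = value.strip().strip('"')
    return settings


def queue_window_seconds(no_path_retry, polling_interval_s=ASSUMED_POLLING_INTERVAL_S):
    """Estimate the queueing time after all paths to a LUN have failed.

    'fail' means no queueing, 'queue' means queue forever (returned as None),
    a number means that many checker intervals.
    """
    if no_path_retry == "fail":
        return 0
    if no_path_retry == "queue":
        return None
    return int(no_path_retry) * polling_interval_s


settings = parse_stanza(DEVICE_STANZA)
print(queue_window_seconds(settings["no_path_retry"]))
```

With no_path_retry 2 and the assumed interval, this gives a queueing window in the order of seconds, not minutes.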

2. Update the kernel to 2.6.18-238.el5 (see also BZ#627836).

3. While testing this, it turned out that we also needed a different ASMLIB package to go with the new kernel (of course…), in this case oracleasm-2.6.18-238.el5-2.0.5-1.el5.x86_64.rpm.
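
The ASMLIB package name carries the kernel release it was built for, so a mismatch is easy to spot before a reboot. A minimal Python sketch of that check follows. The function name asmlib_matches_kernel is made up for this example, and the second kernel release in the usage lines is only there as a counterexample.

```python
import platform


def asmlib_matches_kernel(rpm_name, kernel_release):
    """True if the oracleasm package name embeds the given kernel release."""
    return rpm_name.startswith("oracleasm-" + kernel_release + "-")


# Package file name taken from this post.
ASMLIB_RPM = "oracleasm-2.6.18-238.el5-2.0.5-1.el5.x86_64.rpm"

print(asmlib_matches_kernel(ASMLIB_RPM, "2.6.18-238.el5"))
print(asmlib_matches_kernel(ASMLIB_RPM, "2.6.18-194.el5"))  # counterexample only

# On a cluster node, compare against the running kernel instead:
print(asmlib_matches_kernel(ASMLIB_RPM, platform.release()))
```

Run on each node after the kernel update, the last line shows whether the driver package in hand belongs to the kernel that is actually running.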

Result of the test: an I/O freeze of 12 seconds, after which ASM started to reconfigure the disk groups, with no loss of service. This is what we wanted!

We also tested this as a ‘rolling upgrade’, upgrading node by node. No problem.

Hope this helps somebody. Regards.