This is the second and final post about a problem with a RAC configuration with two SANs. The problem was an I/O freeze of several minutes when one of the two SANs crashed. I ended the first post with a ‘cliffhanger’: we had a solution, but had not tested it yet. Now we have.
First, a recap of the first post.
3 HP DL380 G6 systems with a basic RHEL 5u5 x86_64 installation (2 x RAC clusternodes, 1 x NFS-voting-node)
2 HP EVA 6400 SANs with 2 controllers each (resulting in 8 paths per device)
Test: power off one SAN. Default result, and the problem: an I/O freeze of several minutes. Oracle didn’t like it and started to evict, shut down and start up, which is expected behaviour after such a long I/O freeze. But this is not what you want when you install a RAC with two SANs.
The solution came down to three steps:
1. Edit the Multipath device section:
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
prio_callout "/sbin/mpath_prio_alua /dev/%n"
2. Update the kernel to 2.6.18-238.el5 (see also BZ#627836).
3. While testing this, it turned out that we also needed another ASMLib package to match the new kernel (of course…), in this case oracleasm-2.6.18-238.el5-2.0.5-1.el5.x86_64.rpm.
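The ASMLib kernel driver package carries the kernel release in its name, so after a kernel update it is easy to end up with a mismatch. A minimal sketch of a check you could run on each node follows. The function names are made up for this example, and it assumes the driver package is named oracleasm-<kernel release>, as it is in the package mentioned above.

```shell
#!/bin/sh
# Build the ASMLib kernel-driver package name expected for a kernel release.
# Assumption: the package is named "oracleasm-<kernel release>".
expected_asm_pkg() {
    echo "oracleasm-$1"
}

# Report whether the driver package for the running kernel is installed.
# Uses only "uname -r" and "rpm -q"; returns non-zero if the package is missing.
check_asm_pkg() {
    pkg=$(expected_asm_pkg "$(uname -r)")
    if rpm -q "$pkg" >/dev/null 2>&1; then
        echo "OK: $pkg is installed"
    else
        echo "MISSING: $pkg"
        return 1
    fi
}
```

Calling check_asm_pkg after rebooting into the new kernel, and before starting ASM, should show right away whether the matching driver is there.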
Result of the test: an I/O freeze of 12 seconds, after which ASM started to reconfigure the disk groups, with no loss of service. This is what we wanted!
We also tested this as a ‘rolling upgrade’, upgrading node by node. No problem.
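If you want to put a number on such a freeze yourself, one possible approach (not necessarily how we measured it) is a small probe that keeps writing to the storage and logs a timestamp after every write. The sketch below assumes a scratch file on a multipathed test LUN, never an ASM disk. The function names and paths are hypothetical.

```shell
#!/bin/sh
# Write one 4k block with direct I/O in a loop and log the time after each write.
# $1 = scratch file on the storage under test (hypothetical), $2 = log file.
# While the storage is frozen, dd blocks, so the freeze shows up as a gap in the log.
probe_io() {
    while :; do
        dd if=/dev/zero of="$1" bs=4k count=1 oflag=direct conv=notrunc 2>/dev/null
        date +%s >> "$2"
    done
}

# Read epoch timestamps (one per line) on stdin and print the largest gap in seconds.
max_gap() {
    awk 'NR > 1 { gap = $1 - prev; if (gap > max) max = gap }
         { prev = $1 }
         END { print max + 0 }'
}
```

Start probe_io in the background before powering off the SAN, stop it afterwards, and feed the log to max_gap (max_gap < /tmp/probe.log). The resolution is one second, which is enough to tell a freeze of seconds from a freeze of minutes.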
Hope this will help somebody. Regards.