Powerpath Path Failover

Question

Hi All,

I have a cluster of two servers running RHEL 5.3 (kernel 2.6.18-128.el5). Both have PowerPath 5.3 installed, and each server has 2 HBAs.

We are testing path failover. When I unplug one FC cable from one of the HBAs, the whole cluster fails over. I believe the proper behaviour is that only the path fails over, which PowerPath should handle on its own.

Can anyone help me with this issue? Is there any configuration I should do on the PowerPath side? Thanks in advance.

Regards,
Sandy

dynamox · Answer

Hello Sandy,

Is PowerPath licensed? What array is this system connected to? If it's a CLARiiON, what failover mode is set?

SKT2 · Answer

What cluster software? Red Hat Cluster Suite (which version) with a quorum disk?

What do you have in /etc/modprobe.conf? We have the following entry, which disables failover at the HBA level so that PowerPath/the array handles failover instead (that is what I remember about this entry):

options qla2xxx ConfigRequired=0 ql2xfailover=0
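
A minimal sketch for checking what value an option currently has on the `options qla2xxx` line. The helper name `qla_option` is made up for illustration, and whether the installed qla2xxx driver actually accepts `ql2xfailover` depends on the driver build, which is not confirmed here.

```shell
# Hypothetical helper: print the value of one qla2xxx option from a modprobe config.
# $1: option name, $2: config file (defaults to /etc/modprobe.conf)
qla_option() {
    name=$1
    conf=${2:-/etc/modprobe.conf}
    # Keep only uncommented "options qla2xxx ..." lines, split them into
    # one word per line, and print the value of the requested name=value pair.
    grep -E '^[[:space:]]*options[[:space:]]+qla2xxx[[:space:]]' "$conf" 2>/dev/null |
        tr -s '[:space:]' '\n' |
        sed -n "s/^${name}=//p" |
        tail -n 1
}

# Example call (prints nothing if the option is not set):
# qla_option ql2xfailover
```

An empty result means the option is absent from the file, or present only on a commented-out line.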

sandy14 · Answer

Hi Dynamox,

Yes, PowerPath is licensed, and it is connected to a CLARiiON box. Here is some output from powermt display:

Pseudo name=emcpowerb
CLARiiON ID=XXXX [SERVER1_Clust]
Logical device ID=60060160C9CF1A00EEB33F4C078BDE11 [LUN 68]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP A, current=SP A       Array failover mode: 1
==============================================================================
---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---
###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors
==============================================================================
   3 qla2xxx                   sdb       SP A2     active  alive      0      0
   3 qla2xxx                   sde       SP B2     active  alive      0      0
   4 qla2xxx                   sdh       SP A3     active  alive      0      0
   4 qla2xxx                   sdk       SP B3     active  alive      0      0

Hi SKT,

Yes, it's Red Hat Cluster Suite with a quorum disk. I'm not sure which version is installed; how can I check that?

I have read some articles about the /etc/modprobe.conf options, but when I add those lines, I get error messages whenever I run modprobe. Some said I have to run mkinitrd after adding the options, but whenever I run mkinitrd, it says there are no changes to the kernel...

Can you enlighten me?

Regards,
Sandy

SKT2 · Answer

Post `cman_tool status`

Also post the output of `cat /etc/modprobe.conf` so I can compare it with mine. You don't need to run modprobe after the change; just complete mkinitrd and reboot.

Did you check the size of the *.img file (or do a diff between old.img and new.img) after mkinitrd, even though it reports no change in the kernel?

Also post `cat /proc/scsi/qla2xxx/3`.

sandy14 · Answer

Hi SKT, thanks for your replies.

[root@SERVER1 ~]# cat /etc/modprobe.conf
alias eth0 e1000e
alias eth1 e1000e
alias eth2 bnx2
alias eth3 bnx2
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptsas
alias scsi_hostadapter2 ata_piix
alias scsi_hostadapter3 qla2xxx
alias bond0 bonding
options qla2xxx ql2xfailover=0 ConfigRequired=0
# options bond0 mode=balance-alb miinom=100
###BEGINPP
include /etc/modprobe.conf.pp
###ENDPP

[root@SERVER1 ~]# cman_tool status
Version: 6.1.0
Config Version: 11
Cluster Name: SERVER1CLUST
Cluster Id: 24958
Cluster Member: Yes
Cluster Generation: 220344
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Quorum device votes: 1
Total votes: 3
Quorum: 2
Active subsystems: 9
Flags: Dirty
Ports Bound: 0 177
Node name: SERVERC01-PRIV
Node ID: 1
Multicast addresses: 239.192.97.223
Node addresses: 172.21.62.11

As for mkinitrd, it is not even creating a new img file. Let me try that once again and post the result later. I cannot find qla2xxx under the folder /proc/scsi.

Regards,
Sandy

SKT2 · Answer

Here is the command to overwrite (force) the existing file. Take a backup first.

# cd /boot
# mkinitrd -f initrd-$(uname -r).img $(uname -r)

sandy14 · Answer

Hi,

Yes, I used the -f option when running the mkinitrd command, but I forgot the exact error message. When I can take the cluster down, I'll try it again and post the result.

Regards,
Sandy

sandy14 · Answer

Hi,

Please find below what happens when I change /etc/modprobe.conf and then run mkinitrd. I get the message "No modules available for kernel 'initrd-2.6.18-128.el5'".

[root@SERVER1 boot]# cat /etc/modprobe.conf
alias eth0 e1000e
alias eth1 e1000e
alias eth2 bnx2
alias eth3 bnx2
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptsas
alias scsi_hostadapter2 ata_piix
alias scsi_hostadapter3 qla2xxx
alias bond0 bonding
options qla2xxx ql2xfailover=0 ConfigRequired=0
###BEGINPP
include /etc/modprobe.conf.pp
###ENDPP

[root@SERVER1 boot]# ll
total 12474
-rw-r--r-- 1 root root   64994 Dec 17  2008 config-2.6.18-128.el5
drwxr-xr-x 2 root root    1024 Aug 14 10:49 grub
-rw------- 1 root root 3230851 Aug 14 10:16 initrd-2.6.18-128.el5.img
-rw------- 1 root root 3230851 Sep 20 11:13 initrd-2.6.18-128.el5.img.backup
-rw-r--r-- 1 root root 2982241 Aug 15 02:57 initrd-2.6.18-128.el5kdump.img
drwx------ 2 root root   12288 Aug 14 10:11 lost+found
-rw-r--r-- 1 root root  102182 Dec 17  2008 symvers-2.6.18-128.el5.gz
-rw-r--r-- 1 root root 1188481 Dec 17  2008 System.map-2.6.18-128.el5
-rw-r--r-- 1 root root 1889308 Dec 17  2008 vmlinuz-2.6.18-128.el5

[root@SERVER1 boot]# mkinitrd -f initrd-2.6.18-128.el5.img initrd-2.6.18-128.el5
No modules available for kernel 'initrd-2.6.18-128.el5'.

[root@SERVER1 boot]# ll
total 12474
-rw-r--r-- 1 root root   64994 Dec 17  2008 config-2.6.18-128.el5
drwxr-xr-x 2 root root    1024 Aug 14 10:49 grub
-rw------- 1 root root 3230851 Aug 14 10:16 initrd-2.6.18-128.el5.img
-rw------- 1 root root 3230851 Sep 20 11:13 initrd-2.6.18-128.el5.img.backup
-rw-r--r-- 1 root root 2982241 Aug 15 02:57 initrd-2.6.18-128.el5kdump.img
drwx------ 2 root root   12288 Aug 14 10:11 lost+found
-rw-r--r-- 1 root root  102182 Dec 17  2008 symvers-2.6.18-128.el5.gz
-rw-r--r-- 1 root root 1188481 Dec 17  2008 System.map-2.6.18-128.el5
-rw-r--r-- 1 root root 1889308 Dec 17  2008 vmlinuz-2.6.18-128.el5

Conor · Answer

Shouldn't the command be `mkinitrd -v initrd-2.6.18-128.el5.img initrd-2.6.18-128.el5`?

SKT2 · Answer

Adding -v gives verbose output; you will get a lot of information on standard output, which may help show what is going on.

Try the command below and make sure the modules for the current kernel are available.
# cd /lib/modules/`uname -r`

I think these are created when the respective kernel-devel/kernel-headers RPMs are installed. Can you verify whether they are installed?

Also, you don't need an outage to run mkinitrd; just create the img file in a different folder.
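
The "No modules available for kernel 'initrd-2.6.18-128.el5'" message reported above is consistent with the image file name having been passed where mkinitrd expects the kernel version (the second argument), so it would be looking for /lib/modules/initrd-2.6.18-128.el5. A minimal sketch of a pre-check, assuming that reading is right; the function name `check_kernel_arg` and the /tmp output path are only illustrative.

```shell
# Hypothetical pre-check: does a modules directory exist for this kernel version?
# $1: kernel version string, $2: modules root (defaults to /lib/modules)
check_kernel_arg() {
    ver=$1
    root=${2:-/lib/modules}
    if [ -d "$root/$ver" ]; then
        echo "ok: $root/$ver exists"
        return 0
    fi
    echo "missing: $root/$ver (is '$ver' really a kernel version?)" >&2
    return 1
}

# Example: build a test image outside /boot only if the check passes.
# check_kernel_arg "$(uname -r)" &&
#     mkinitrd -f -v "/tmp/initrd-$(uname -r).img" "$(uname -r)"
```

With the version from the first post, the second argument would be 2.6.18-128.el5 rather than initrd-2.6.18-128.el5.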

sandy14 · Answer

Hi All,

I did some additional testing. It appears that when I'm not using the cluster to mount the filesystem, PowerPath works just fine: whenever I unplug one path, the mount point is still accessible.

But when I bring up the cluster and unplug one path, the cluster fails over to the other node. To make it more interesting, if I manually move the service back to the node with one path unplugged, there is also no problem.

Any clues?

Regards,
Sandy

RRR · Answer

I've seen the same behavior on a competitor's system with a Windows cluster where 1 path to 1 of the 2 hosts failed and the cluster failed over to the other node. We're still not sure what caused this.

SKT2 · Answer

I see the note "I cannot find qla2xxx under the folder /proc/scsi". On all my Linux systems with qla/PowerPath I can see them. Normally I see them disappear after a kernel upgrade, and a driver installation fixes it. Can you try reinstalling the QLogic driver?

The driver version we use is:

QLogic PCI to Fibre Channel Host Adapter for QLA2460:
Firmware version 4.00.26 [IP], Driver version 8.01.07.15
ISP: ISP2422

sandy14 · Answer

Hi All,

I dug into the cluster side and found that the Linux cluster has a timeout/TKO setting for the quorum disk. I changed this value to be above 30 seconds, and it is all working as expected now.

Thank you all for the help.

Regards,
Sandy

SKT2 · Answer

You mean the previous value was 30? We have the tko set to 10.
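
For comparing settings like these, a rough sketch of the arithmetic, assuming (per the qdisk man page) that a node is declared dead after missing `tko` cycles of `interval` seconds each, so the effective quorum disk timeout is about interval × tko. The function names and the sample numbers are illustrative; the 30-second threshold is the figure Sandy mentions, not a measured path failover time.

```shell
# Approximate quorum disk timeout in seconds.
# $1: interval (seconds per cycle), $2: tko (missed cycles before eviction)
qdisk_timeout() {
    echo $(( $1 * $2 ))
}

# Succeeds if the quorum disk timeout is longer than a given number of
# seconds, e.g. the time I/O may stall while a path fails over.
# $1: interval, $2: tko, $3: threshold in seconds
qdisk_outlasts() {
    [ "$(qdisk_timeout "$1" "$2")" -gt "$3" ]
}

# Illustrative values only:
# qdisk_timeout 1 10      -> 10 seconds
# qdisk_outlasts 3 12 30  -> succeeds (36 > 30)
```

Note that a tko of 10 is only directly comparable to another site's value when the interval is the same on both.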
