Unsolved


August 20th, 2019 05:00

VMware MEM deleting paths

Hello there, I was wondering if anyone has encountered the same problem with the Multipath Extension Module and if this behaviour can be changed.

We have several ESXi hosts with iSCSI MPIO configured, pointing to a PS6210 group with redundant controllers and NICs. The configuration is by the book: two vmkernel adapters on the same vSwitch in separate port groups, everything on a single /24, and multipathing works... kind of (and this is where things get interesting):

We see 2 paths per disk, one for each storage controller IP, active/active, all green. Not bad, but we expect to see 4 paths: one from each host NIC to each storage controller interface.
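For reference, this is roughly how we check what ESXi actually sees from the shell (the device ID below is a placeholder for one of our EQL volumes):

esxcli storage core path list -d naa.6090a028xxxxxxxx    # paths ESXi currently has for that volume
esxcli storage nmp device list -d naa.6090a028xxxxxxxx   # PSP in use and the working paths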

Here is a sample diagram:

We initially thought this was an iSCSI discovery problem, but even after setting the volumes as static targets (with both controller NICs as targets) we only see 2 paths per disk. After re-scanning the HBA, 4 paths per EQL disk appear, but half of them simply go away within a few seconds. When the disk's MPIO is configured to use VMware's module, all 4 paths stay in the table.
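For completeness, the static-target test looked roughly like this (the adapter name is a placeholder, the IQN is the volume from the log below, and 10.10.10.1/.2 are the controller eth IPs):

esxcli iscsi adapter discovery statictarget add --adapter=vmhba64 --address=10.10.10.1:3260 --name=iqn.2001-05.com.equallogic:4-771816-fcd89ea46-97a005a202c568a9-ds-eng
esxcli iscsi adapter discovery statictarget add --adapter=vmhba64 --address=10.10.10.2:3260 --name=iqn.2001-05.com.equallogic:4-771816-fcd89ea46-97a005a202c568a9-ds-eng
esxcli storage core adapter rescan --adapter=vmhba64     # rescan the iSCSI HBA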

 

Looking at ehcmd.log, it seems that the MEM is deleting the extra paths:

20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1793|>>>> Reconfiguration Request IPC for iqn.2001-05.com.equallogic:4-771816-fcd89ea46-97a005a202c568a9-ds-eng >>>>
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1795|Opcode: 2827
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1796|MPIO session: 8-f6d95f-427f1d8f2-4c60a182005e1059
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1797|Volume PsvId: 4-771816-fcd89ea46-97a005a202c568a9
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1798|Configuration Options: 0x0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1799|MaxVolumeConnection: 6
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1800|MaxMemberConnection: 4
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1801|AdapterCount: 2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1802|ConnectionCount: 4
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1806|adapter[0].HostIndex: 1
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1807|adapter[0].addr: 10.10.10.201
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1808|adapter[0].mask: 255.255.255.0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1809|adapter[0].speed: 10000
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1810|adapter[0].weight: 0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1811|adapter[0].ConnectionCount.MpioSession: 2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1812|adapter[0].ConnectionCount.Total: 7
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1813|adapter[0].HBA.MaxConnTotal: 4092
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1814|adapter[0].HBA.MaxConnSession: 0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1806|adapter[1].HostIndex: 3
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1807|adapter[1].addr: 10.10.10.202
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1808|adapter[1].mask: 255.255.255.0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1809|adapter[1].speed: 10000
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1810|adapter[1].weight: 0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1811|adapter[1].ConnectionCount.MpioSession: 2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1812|adapter[1].ConnectionCount.Total: 7
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1813|adapter[1].HBA.MaxConnTotal: 4092
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1814|adapter[1].HBA.MaxConnSession: 0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1819|connection[0].saddr: 10.10.10.201
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1820|connection[0].taddr: 10.10.10.1
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1821|connection[0].sport: 34041
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1822|connection[0].tport: 3260
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1819|connection[1].saddr: 10.10.10.202
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1820|connection[1].taddr: 10.10.10.2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1821|connection[1].sport: 16240
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1822|connection[1].tport: 3260
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1819|connection[2].saddr: 10.10.10.202
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1820|connection[2].taddr: 10.10.10.1
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1821|connection[2].sport: 62917
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1822|connection[2].tport: 3260
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1819|connection[3].saddr: 10.10.10.201
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1820|connection[3].taddr: 10.10.10.2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1821|connection[3].sport: 45011
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1822|connection[3].tport: 3260
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1829|<<<< Reconfiguration Response IPC <<<<
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1830|Opcode: 2828
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1831|MPIO session: 8-f6d95f-427f1d8f2-4c60a182005e1059
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1832|Volume PsvId: 4-771816-fcd89ea46-97a005a202c568a9
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1833|WKA: 10.10.10.100:48140
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1834|Status: 0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1835|Ext Status: 0x8
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1836|ConnCount: 2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1848|10.10.10.202 -> 10.10.10.1:3260
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1848|10.10.10.201 -> 10.10.10.2:53456
20Aug19:11:03:16:EHCMD:INFO|ProcessReconfigResponse|1778|Total number of sessions changed (original 4, new 2)
20Aug19:11:03:16:EHCMD:INFO|Logout|307|Result of logout: 0x0
20Aug19:11:03:16:EHCMD:INFO|Reconfigure|778|Existing iSCSI session removed successfully
20Aug19:11:03:16:EHCMD:INFO|Logout|307|Result of logout: 0x0
20Aug19:11:03:16:EHCMD:INFO|Reconfigure|778|Existing iSCSI session removed successfully
20Aug19:11:03:16:EHCMD:INFO|GetActiveSessionCount|823|Found 2 sessions
20Aug19:11:03:16:EHCMD:INFO|WaitForNSessions|850|Found expected number of active sessions
20Aug19:11:03:16:EHCMD:INFO|Reconfigure|849|Reconfiguration complete
20Aug19:11:03:16:EHCMD:INFO|operator()|293|Deleting stale deferred request for ScsiId 4-771816-fcd89ea46-97a005a202c568a9 Lun 0x0
20Aug19:11:03:16:EHCMD:INFO|enqueue|231|scheduled job JOB_TYPE_CONNECTION_SETUP_NORMAL to run in 240 sec
20Aug19:11:03:16:EHCMD:INFO|ProcessDeviceChange|871|Processing PSP change for ScsiId 4-771816-fcd89ea46-97a005a202c568a9 PsvId 4-771816-fcd89ea46-97a005a202c568a9 (2 paths) MpioSessionId 8-f6d95f-427f1d8f2-4c60a182005e1059
20Aug19:11:03:16:EHCMD:INFO|IoctlUpdate|1144|IOCTL reports Created:0 Added:0 Removed:2
20Aug19:11:03:16:EHCMD:INFO|IoctlUpdate|1149|EHCM counts Created:0 Added:0 Removed:2

 

Why is this a big deal? We have several target disks and hosts, and the way EHCMD selects which connections to keep seems to be random. Because of this, if the surviving paths all traverse the same SAN switch (we have only one layer of SAN switches) and that switch locks up or suffers a VLAN failure (a failure that doesn't change the switch ports' link status), we lose access to a random number of disks.


We are not on the latest firmware or MEM, but I wanted to ask whether anyone on the latest version has experienced the same issue. How many paths do you have per target/NIC? Is this configurable in the MEM?

 

Maybe the way forward is to isolate the two sides of the SAN with VLANs; however, Dell recommends that the two switches be stacked or LAGged together.

4 Operator • 1.5K Posts

August 20th, 2019 10:00

Hello,

EQL doesn't work like some other SAN devices. You don't assign volumes to controller ports; either EQL NIC port can reach any volume. You don't need four connections, you only need two: one from each NIC, which will go through the pair of switches. If cabled correctly, a single switch failure will not cause an outage. The switches do need to be lagged or stacked so that any NIC can reach any port on the array.

If a switch should fail, all the traffic will be routed through the remaining path. The PS6210 has vertical failover, so the passive port connected to the surviving switch takes over and both EQL NIC ports remain active.

MEM negotiates connections on the fly. There is no way to change that behavior.

Regards,
Don

August 23rd, 2019 07:00

Hello Don,

Even if you cable it correctly (as we did), if the paths the MEM chooses to keep cross the LAG, a disk can become unavailable when either of the switches locks up, because you can't control which 2 paths are deleted (see drawing).

Vertical failover doesn't kick in unless the port status changes, right? If a switch locks up or loses its config, the ports remain UP but the paths through it fail.

Regards,
SSJ

4 Operator • 1.5K Posts

August 23rd, 2019 10:00

Hello,

If the LAG or switch fails you still have the other path to all the volumes. You won't lose any disks. Volumes are not assigned to specific ports; any port can be used to access any volume.

Regards,
Don

August 27th, 2019 02:00

Sorry Don, but our experience is different. Please check the drawing below!

In about 50% of the cases the MEM keeps the optimal paths (left side of the drawing); in the other 50% it deletes the optimal paths and keeps the sub-optimal paths going through the LAG (right side of the drawing).

You can confirm this by running "esxcli iscsi session connection list".

EQL public all scenarios.png

 


Our cabling and host MPIO configuration are as per the guidelines, we are only using the group IP for discovery, etc. We have no problems with failures where the link goes down; vertical failover works as expected. However, I don't see how this configuration can protect against a switch failure with no link status change if all the paths kept in the table traverse a single switch.
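For anyone who wants to compare, these are the checks we use to confirm that (the adapter name is a placeholder):

esxcli iscsi adapter discovery sendtarget list --adapter=vmhba64   # should list only the group IP
esxcli iscsi networkportal list                                    # vmk-to-vmnic binding for the iSCSI adapter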

Regards,
SSJ

4 Operator • 1.5K Posts

August 27th, 2019 07:00

Hello,

It won't, but that has nothing to do with MEM. Regardless of the cause of a switch failure, that path is gone. When the I/O on the failed path fails to get acknowledged, that path will be declared failed. All traffic then routes via the surviving path. This doesn't cause any volumes to go offline.

The EQL will use any ETH port on the array to reach any initiator port. This means going through the LAG in those cases; if the LAG is properly configured, that's not sub-optimal. Link detection isn't the only method for determining whether a path is valid, for exactly the reason you are talking about.

Regards,
Don

August 27th, 2019 08:00

Hello

How does the LAG configuration matter? If you lose the switch because it locks up, it won't pass any traffic. You can test this by knocking out all the VLANs configured on your host and storage ports on one switch.

Our tests show that this makes half the volumes unavailable, because the MEM deletes the optimal paths in 50% of the cases.

Regards, SSJ

4 Operator • 1.5K Posts

August 27th, 2019 08:00

Hello, 

Then something is amiss here, because every volume should have multiple connections. You should not lose any volumes. Have you removed the MEM and tried with just the native MPIO?

I would also suggest opening a support case, since your experience goes against what I have seen.


Regards,

Don

 

1 Rookie • 117 Posts

August 28th, 2019 10:00

I'm grasping at straws here, but:


a.  Is "port binding" enabled for the two iSCSI vmkernel ports?  You want to make sure it is.

b.  Is there only one array/member in this group?

c.  From the host, try a few vmkpings to verify connectivity (try from both iSCSI vmks, and to both array eth IPs). Try all combinations:

vmkping -I vmkX -s 8972 -d <array eth IP>

d.  Also make sure "LoginTimeout" is set to 60 on the initiator dynamic discovery settings (a rough esxcli sketch follows this list).
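Something like this should show and set it; the adapter name is a placeholder and I'm going from memory on the parameter key, so verify with the get first:

esxcli iscsi adapter param get --adapter=vmhba64 | grep -i LoginTimeout
esxcli iscsi adapter param set --adapter=vmhba64 --key=LoginTimeout --value=60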

FWIW, in my EQL group with just one member, I have only 2 active connections (vmk1 -> nic1 on EQL; vmk2 -> nic2 on EQL).  For my group with multiple members, I have 2 connections per member I believe.

I'd also echo Don's idea of using native MPIO... since it seems you don't have multiple members in the group, the MEM probably isn't helping a ton.
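If you want to try that on one volume first, something like this should do it (the device ID is a placeholder):

esxcli storage nmp psp list                                                    # confirm which PSPs are installed
esxcli storage nmp device set --device=naa.6090a028xxxxxxxx --psp=VMW_PSP_RR   # move one volume to native Round Robin
esxcli storage nmp device list --device=naa.6090a028xxxxxxxx                   # verify the active PSP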

4 Operator • 1.5K Posts

August 28th, 2019 11:00

Hello,

Since both NICs are making connections, port binding must be in place. I didn't ask: are you using a H/W iSCSI card or the SW iSCSI initiator?

Re: LoginTimeout. That won't impact what he's seeing, but this Tech Report shows how to make sure all the best practices are in place: https://downloads.dell.com/solutions/storage-solution-resources/BestPracticesWithPSseries-VMware%28TR1091%29.pdf

Note: once you have discovered a volume, changing the LoginTimeout does not propagate down to those volumes; only new volumes will inherit the change. There is a procedure in that TR on how to correct that. The node will have to be in maintenance mode to perform it.

Regards,
Don