Unsolved


August 20th, 2019 05:00

VMware MEM deleting paths

Hello there, I was wondering if anyone has encountered the same problem with the Multipath Extension Module and if this behaviour can be changed.

We have several ESXi hosts with iSCSI MPIO configured, pointing to a PS6210 group with redundant controllers and NICs. The configuration is by the book: two vmkernel adapters on the same vSwitch in separate port groups, everything on a single /24, and multipathing works... kind of (and this is where things get interesting):

We see 2 paths per disk, one for each storage controller IP, active/active, all green. Not bad, but we expect to see 4 paths: one from each host NIC to each storage controller interface.
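For reference, this is roughly how we check what ESXi actually sees from the shell (the device ID below is a placeholder for one of our EQL volumes):

esxcli storage core path list -d naa.6090a028xxxxxxxx    # paths ESXi currently has for that volume
esxcli storage nmp device list -d naa.6090a028xxxxxxxx   # PSP in use and the working paths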

Here is a sample diagram:

We initially thought this was an iSCSI discovery problem, but even after setting the volumes as static targets (with both controller NICs as targets) we only see 2 paths per disk. After re-scanning the HBA, 4 paths per EQL disk appear, but half of them simply go away within a few seconds. When the disk's MPIO is configured to use VMware's module, all 4 paths stay in the table.
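For completeness, the static-target test looked roughly like this (the adapter name is a placeholder, the IQN is the volume from the log below, and 10.10.10.1/.2 are the controller eth IPs):

esxcli iscsi adapter discovery statictarget add --adapter=vmhba64 --address=10.10.10.1:3260 --name=iqn.2001-05.com.equallogic:4-771816-fcd89ea46-97a005a202c568a9-ds-eng
esxcli iscsi adapter discovery statictarget add --adapter=vmhba64 --address=10.10.10.2:3260 --name=iqn.2001-05.com.equallogic:4-771816-fcd89ea46-97a005a202c568a9-ds-eng
esxcli storage core adapter rescan --adapter=vmhba64     # rescan the iSCSI HBA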

 

Looking at ehcmd.log, it seems that the MEM is deleting the extra paths:

20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1793|>>>> Reconfiguration Request IPC for iqn.2001-05.com.equallogic:4-771816-fcd89ea46-97a005a202c568a9-ds-eng >>>>
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1795|Opcode: 2827
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1796|MPIO session: 8-f6d95f-427f1d8f2-4c60a182005e1059
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1797|Volume PsvId: 4-771816-fcd89ea46-97a005a202c568a9
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1798|Configuration Options: 0x0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1799|MaxVolumeConnection: 6
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1800|MaxMemberConnection: 4
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1801|AdapterCount: 2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1802|ConnectionCount: 4
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1806|adapter[0].HostIndex: 1
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1807|adapter[0].addr: 10.10.10.201
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1808|adapter[0].mask: 255.255.255.0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1809|adapter[0].speed: 10000
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1810|adapter[0].weight: 0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1811|adapter[0].ConnectionCount.MpioSession: 2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1812|adapter[0].ConnectionCount.Total: 7
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1813|adapter[0].HBA.MaxConnTotal: 4092
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1814|adapter[0].HBA.MaxConnSession: 0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1806|adapter[1].HostIndex: 3
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1807|adapter[1].addr: 10.10.10.202
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1808|adapter[1].mask: 255.255.255.0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1809|adapter[1].speed: 10000
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1810|adapter[1].weight: 0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1811|adapter[1].ConnectionCount.MpioSession: 2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1812|adapter[1].ConnectionCount.Total: 7
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1813|adapter[1].HBA.MaxConnTotal: 4092
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1814|adapter[1].HBA.MaxConnSession: 0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1819|connection[0].saddr: 10.10.10.201
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1820|connection[0].taddr: 10.10.10.1
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1821|connection[0].sport: 34041
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1822|connection[0].tport: 3260
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1819|connection[1].saddr: 10.10.10.202
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1820|connection[1].taddr: 10.10.10.2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1821|connection[1].sport: 16240
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1822|connection[1].tport: 3260
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1819|connection[2].saddr: 10.10.10.202
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1820|connection[2].taddr: 10.10.10.1
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1821|connection[2].sport: 62917
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1822|connection[2].tport: 3260
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1819|connection[3].saddr: 10.10.10.201
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1820|connection[3].taddr: 10.10.10.2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1821|connection[3].sport: 45011
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigReq|1822|connection[3].tport: 3260
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1829|<<<< Reconfiguration Response IPC <<<<
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1830|Opcode: 2828
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1831|MPIO session: 8-f6d95f-427f1d8f2-4c60a182005e1059
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1832|Volume PsvId: 4-771816-fcd89ea46-97a005a202c568a9
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1833|WKA: 10.10.10.100:48140
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1834|Status: 0
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1835|Ext Status: 0x8
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1836|ConnCount: 2
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1848|10.10.10.202 -> 10.10.10.1:3260
20Aug19:11:03:16:EHCMD:INFO|PrintReconfigRsp|1848|10.10.10.201 -> 10.10.10.2:53456
20Aug19:11:03:16:EHCMD:INFO|ProcessReconfigResponse|1778|Total number of sessions changed (original 4, new 2)
20Aug19:11:03:16:EHCMD:INFO|Logout|307|Result of logout: 0x0
20Aug19:11:03:16:EHCMD:INFO|Reconfigure|778|Existing iSCSI session removed successfully
20Aug19:11:03:16:EHCMD:INFO|Logout|307|Result of logout: 0x0
20Aug19:11:03:16:EHCMD:INFO|Reconfigure|778|Existing iSCSI session removed successfully
20Aug19:11:03:16:EHCMD:INFO|GetActiveSessionCount|823|Found 2 sessions
20Aug19:11:03:16:EHCMD:INFO|WaitForNSessions|850|Found expected number of active sessions
20Aug19:11:03:16:EHCMD:INFO|Reconfigure|849|Reconfiguration complete
20Aug19:11:03:16:EHCMD:INFO|operator()|293|Deleting stale deferred request for ScsiId 4-771816-fcd89ea46-97a005a202c568a9 Lun 0x0
20Aug19:11:03:16:EHCMD:INFO|enqueue|231|scheduled job JOB_TYPE_CONNECTION_SETUP_NORMAL to run in 240 sec
20Aug19:11:03:16:EHCMD:INFO|ProcessDeviceChange|871|Processing PSP change for ScsiId 4-771816-fcd89ea46-97a005a202c568a9 PsvId 4-771816-fcd89ea46-97a005a202c568a9 (2 paths) MpioSessionId 8-f6d95f-427f1d8f2-4c60a182005e1059
20Aug19:11:03:16:EHCMD:INFO|IoctlUpdate|1144|IOCTL reports Created:0 Added:0 Removed:2
20Aug19:11:03:16:EHCMD:INFO|IoctlUpdate|1149|EHCM counts Created:0 Added:0 Removed:2

 

Why is this a big deal? We have several target disks and hosts, and the way EHCMD selects which connections to keep seems to be random. Because of this, if the surviving paths all traverse the same SAN switch (we have only one layer of SAN switches) and that switch locks up or suffers a VLAN failure (a failure that doesn't change the switch ports' link status), we lose access to a random number of disks.


We are not on the latest firmware or MEM, but I wanted to ask whether anyone on the latest version has experienced the same issue. How many paths do you have per target/NIC? Is this configurable in the MEM?

 

Maybe the way forward is to isolate the two sides of the SAN with VLANs; however, Dell recommends that the two switches be stacked or LAGged together.

4 Operator • 1.5K Posts

August 20th, 2019 10:00

Hello,

EQL doesn't work like some other SAN devices. You don't assign volumes to controller ports; either EQL NIC port can reach any volume. You don't need four connections, you only need two: one from each NIC, which will go through the pair of switches. If cabled correctly, a single switch failure will not cause an outage. The switches do need to be lagged or stacked so that any NIC can reach any port on the array.

If a switch should fail, all the traffic will be routed through the remaining path. The PS6210 has vertical failover, so the passive port connected to the surviving switch takes over and both EQL NIC ports remain active.

MEM negotiates connections on the fly. There is no way to change that behavior.

Regards,
Don

August 23rd, 2019 07:00

Hello Don,

Even if you cable it correctly (as we did), if the paths the MEM chooses to keep cross the LAG, a disk can become unavailable when either of the switches locks up, because you can't control which 2 paths are deleted (see drawing).

Vertical failover doesn't kick in unless the port status changes, right? If a switch locks up or loses its config, the ports remain UP but the paths through it fail.

Regards,
SSJ

4 Operator • 1.5K Posts

August 23rd, 2019 10:00

Hello,

If the LAG or switch fails you still have the other path to all the volumes. You won't lose any disks. Volumes are not assigned to specific ports; any port can be used to access any volume.

Regards,
Don

August 27th, 2019 02:00

Sorry Don, but our experience is different. Please check the drawing below!

In about 50% of the cases the MEM keeps the optimal paths (left side of the drawing); in the other 50% it deletes the optimal paths and keeps the sub-optimal paths going through the LAG (right side of the drawing).

You can confirm this by running "esxcli iscsi session connection list".

EQL public all scenarios.png

 


Our cabling and host MPIO configuration are as per the guidelines, we are only using the group IP for discovery, etc. We have no problems with failures where the link goes down; vertical failover works as expected. However, I don't see how this configuration can protect against a switch failure with no link status change if all the paths kept in the table traverse a single switch.
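For anyone who wants to compare, these are the checks we use to confirm that (the adapter name is a placeholder):

esxcli iscsi adapter discovery sendtarget list --adapter=vmhba64   # should list only the group IP
esxcli iscsi networkportal list                                    # vmk-to-vmnic binding for the iSCSI adapter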

Regards,
SSJ

4 Operator • 1.5K Posts

August 27th, 2019 07:00

Hello,

It won't, but that has nothing to do with MEM. Regardless of the cause of a switch failure, that path is gone. When the I/O on the failed path fails to get acknowledged, that path will be declared failed. All traffic then routes via the surviving path. This doesn't cause any volumes to go offline.

The EQL will use any ETH port on the array to reach any initiator port. This means going through the LAG in those cases; if the LAG is properly configured, that's not sub-optimal. Link detection isn't the only method for determining whether a path is valid, for exactly the reason you are talking about.

Regards,
Don

August 27th, 2019 08:00

Hello

How does the LAG configuration matter? If you lose the switch because it locks up, it won't pass any traffic. You can test this by knocking out all the VLANs configured on your host and storage ports on one switch.

Our tests show that this makes half the volumes unavailable, because the MEM deletes the optimal paths in 50% of the cases.

Regards, SSJ

4 Operator • 1.5K Posts

August 27th, 2019 08:00

Hello, 

Then something is amiss here, because every volume should have multiple connections. You should not lose any volumes. Have you removed the MEM and tried with just the native MPIO?

I would also suggest opening a support case, since your experience goes against what I have seen.


Regards,

Don

 

1 Rookie • 117 Posts

August 28th, 2019 10:00

I'm grasping at straws here, but:


a.  Is "port binding" enabled for the two iSCSI vmkernel ports?  You want to make sure it is.

b.  Is there only one array/member in this group?

c.  From the host, try a few vmkpings to verify connectivity (try from both iSCSI vmks, and to both array eth IPs). Try all combinations:

vmkping -I vmkX -s 8972 -d <array eth IP>

d.  Also make sure "LoginTimeout" is set to 60 on the initiator dynamic discovery settings (a rough esxcli sketch follows this list).
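Something like this should show and set it; the adapter name is a placeholder and I'm going from memory on the parameter key, so verify with the get first:

esxcli iscsi adapter param get --adapter=vmhba64 | grep -i LoginTimeout
esxcli iscsi adapter param set --adapter=vmhba64 --key=LoginTimeout --value=60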

FWIW, in my EQL group with just one member, I have only 2 active connections (vmk1 -> nic1 on EQL; vmk2 -> nic2 on EQL).  For my group with multiple members, I have 2 connections per member I believe.

I'd also echo Don's idea of using native MPIO... since it seems you don't have multiple members in the group, the MEM probably isn't helping a ton.
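If you want to try that on one volume first, something like this should do it (the device ID is a placeholder):

esxcli storage nmp psp list                                                    # confirm which PSPs are installed
esxcli storage nmp device set --device=naa.6090a028xxxxxxxx --psp=VMW_PSP_RR   # move one volume to native Round Robin
esxcli storage nmp device list --device=naa.6090a028xxxxxxxx                   # verify the active PSP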

4 Operator • 1.5K Posts

August 28th, 2019 11:00

Hello,

Since both NICs are making connections, port binding must be in place. I didn't ask: are you using a H/W iSCSI card or the SW iSCSI initiator?

Re: LoginTimeout. That won't impact what he's seeing, but this Tech Report shows how to make sure all the best practices are in place: https://downloads.dell.com/solutions/storage-solution-resources/BestPracticesWithPSseries-VMware%28TR1091%29.pdf

Note: once you have discovered a volume, changing the LoginTimeout does not propagate down to those volumes; only new volumes will inherit the change. There is a procedure in that TR on how to correct that. The node will have to be in maintenance mode to perform it.

Regards,
Don