Unsolved

Closed

11 Posts

612

June 20th, 2023 21:00

VMware vSAN with dell switch

Hi,

I have a 3-node vSAN cluster connected to two Dell S3124 switches. The switches are configured with VLT on a 1G port. After a power cut, the network on the hosts is disrupted and the cluster is partitioned. vSAN reports a network partition on the host because of delayed network connectivity and isolates that host.

I would like to know whether the issue is with the VLT, and how to check the interface usage.

Moderator

2.8K Posts

June 21st, 2023 02:00

I was looking into vSAN cluster partitions. When the ESXi hosts in a vSAN cluster fail to communicate with each other over both multicast and unicast, a cluster partition happens. vSAN objects stay inaccessible until the network problem is fixed. I drew some inferences from these articles I found.

https://dell.to/43Qv1LE

https://dell.to/3NB4GeJ

https://dell.to/43L4ejI

 

  • Subnets that are not configured correctly: all ESXi hosts must be on the same subnet.
  • vSAN traffic VMkernel adapters that aren't configured correctly: all ESXi hosts must have a vSAN vmknic configured.
  • VLANs that aren't configured properly.
  • Multicast issues: all ESXi hosts must have the same multicast settings.

You can use the additional health service checks on the network to help identify the root cause of the issue. You can also use the vmkping and pktcap-uw commands to check connectivity and capture packets between the ESXi hosts.
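
To make the subnet check concrete, here is a minimal Python sketch that groups vSAN VMkernel addresses by subnet and flags a mismatch. The host names and IP addresses are hypothetical and not taken from this thread; substitute the values reported by your own hosts. It only covers the same-subnet condition listed above, not VLAN or multicast settings.

```python
import ipaddress

# Hypothetical vSAN VMkernel addresses (host -> "ip/prefix"); not from the thread.
vsan_vmknics = {
    "esxi-01": "192.168.50.11/24",
    "esxi-02": "192.168.50.12/24",
    "esxi-03": "192.168.51.13/24",  # deliberately placed on a different subnet
}

def group_by_subnet(vmknics):
    """Return a dict mapping each subnet to the sorted hosts whose vmknic is in it."""
    groups = {}
    for host, cidr in vmknics.items():
        network = ipaddress.ip_interface(cidr).network
        groups.setdefault(str(network), []).append(host)
    return {net: sorted(hosts) for net, hosts in groups.items()}

groups = group_by_subnet(vsan_vmknics)
for net, hosts in groups.items():
    print(net, hosts)
if len(groups) > 1:
    print("Mismatch: vSAN vmknics span more than one subnet")
```

If more than one subnet is printed, the host in the odd group is the first place to look.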

Moderator

2.8K Posts

June 21st, 2023 02:00

Hi, if you suspect that VLT is causing the issue, it might be helpful to check the port-utilization command.

You can also check interface statistics with the command show interfaces counters.

I'm still researching. If I find something useful, I'll let you know.

11 Posts

June 21st, 2023 03:00

Hello,

Thanks for your response. I have read that documentation previously. However, the issue still persists after multiple power cuts. The ESXi host looks completely fine, except that it is not reachable. Since it is in a distributed switch setup, resetting it is a big task.

Moderator

2.8K Posts

June 21st, 2023 03:00

Yeah, I see, that doesn't sound like a good situation. Since you say the ESXi host isn't reachable, restarting the management agents came to mind, for example by connecting via SSH and running the command https://dell.to/46gGLc0.restart

You might also need to restart the vCenter appliance.

11 Posts

June 21st, 2023 04:00

Yes, I have already restarted the management network as well. I have captured the port utilization details of a VLT port and of the vSAN port.

vSAN port utilization


17419865 packets, 16491365641 bytes
637759 64-byte pkts, 3244968 over 64-byte pkts, 2147708 over 127-byte pkts
690333 over 255-byte pkts, 638694 over 511-byte pkts, 10060403 over 1023-byte pkts
357788 Multicasts, 636266 Broadcasts, 16425811 Unicasts
0 runts, 0 giants, 18 throttles
0 CRC, 0 overrun, 346842 discarded

LACP port utilization 

Input Statistics:
1335869 packets, 99848811 bytes
8817 64-byte pkts, 1280396 over 64-byte pkts, 44230 over 127-byte pkts
808 over 255-byte pkts, 663 over 511-byte pkts, 955 over 1023-byte pkts
170935 Multicasts, 1157554 Broadcasts, 7380 Unicasts
0 runts, 0 giants, 0 throttles
0 CRC, 0 overrun, 463059 discarded
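
To put the two counter dumps side by side, the following Python sketch parses a few of the lines above and computes the share of discards and broadcasts relative to the total packet count. The numbers are copied from the outputs in this post. Dividing discards by the "packets" counter is an assumption made for a rough comparison; the switch may or may not include discarded frames in that total.

```python
import re

# Counter lines copied from the two outputs above.
VSAN_PORT = """\
17419865 packets, 16491365641 bytes
357788 Multicasts, 636266 Broadcasts, 16425811 Unicasts
0 CRC, 0 overrun, 346842 discarded
"""
LACP_PORT = """\
1335869 packets, 99848811 bytes
170935 Multicasts, 1157554 Broadcasts, 7380 Unicasts
0 CRC, 0 overrun, 463059 discarded
"""

def parse_counters(text):
    """Map each lower-cased counter label to its integer value."""
    pairs = re.findall(r"(\d+) ([A-Za-z]+)", text)
    return {label.lower(): int(value) for value, label in pairs}

def summarize(text):
    """Rough percentages of discards and broadcasts relative to total packets."""
    c = parse_counters(text)
    return {
        "discard_pct": round(100 * c["discarded"] / c["packets"], 2),
        "broadcast_pct": round(100 * c["broadcasts"] / c["packets"], 2),
    }

vsan_summary = summarize(VSAN_PORT)
lacp_summary = summarize(LACP_PORT)
print("vSAN port:", vsan_summary)
print("LACP port:", lacp_summary)
```

On these figures the vSAN port shows roughly 2% discards, while the LACP port shows roughly 35% discards and is dominated by broadcast frames.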

Moderator

2.8K Posts

June 21st, 2023 05:00

vSAN port utilization is higher than LACP port utilization, which I think is normal in a vSAN cluster. The issue might be related to the VLT ports and the VLT configuration.

11 Posts

June 26th, 2023 05:00

Hi,

As you can see, a lot of packets are being discarded on both ports.

Moderator

2.8K Posts

June 26th, 2023 06:00

Hi, VMware is not directly in my support area, but I try to help as far as my knowledge goes. You're correct, and I apologize for missing that detail. The high number of discarded packets on both the vSAN and LACP ports does indicate potential network issues. I think you can contact the VMware team or the Dell software team for further support. Working from the simple to the complex, here is what comes to mind: check the physical connections first, then check that the duplex settings are configured correctly, update the firmware on the switches, and specifically try to find the source of congestion on the network.

11 Posts

June 26th, 2023 22:00

Hi,

Thanks for your reply,

As already stated, the switch-to-switch interconnect link for LACP is configured on a 1G port instead of a 10G port, and I suspect that is the issue. I also want to understand what traffic is transferred over the switch-to-switch interconnect link and whether it bottlenecks the 10G vSAN traffic.

Moderator

3.8K Posts

June 27th, 2023 05:00

Hello,

As in the other thread, peering the two switches over a 1Gb connection is not supported, and yes, this could be the reason.
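
As a rough illustration of why a 1Gb peer link is a concern next to 10Gb vSAN ports, this Python sketch compares ideal transfer times at the two line rates. The 100 GB payload is a hypothetical figure, and protocol overhead is ignored. Whether vSAN traffic actually crosses the interconnect depends on the topology and on which links are up, so treat this as back-of-the-envelope arithmetic only.

```python
def transfer_seconds(gigabytes, link_gbps):
    """Ideal time to move `gigabytes` (10**9 bytes each) over a `link_gbps` Gbit/s link."""
    return gigabytes * 8 / link_gbps

payload_gb = 100  # hypothetical amount of vSAN data that has to cross the link

for gbps in (10, 1):
    seconds = transfer_seconds(payload_gb, gbps)
    print(f"{gbps:>2} Gbit/s link: {seconds:.0f} s")

# Ratio between the host-facing rate and the interconnect rate.
oversubscription = 10 / 1
print(f"oversubscription: {oversubscription:.0f}:1")
```

Any traffic that arrives at 10 Gbit/s and has to leave over the 1 Gbit/s link must be queued or dropped once the buffers fill, which would be consistent with discard counters climbing on that link.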

Thanks
