June 29th, 2018 11:00
Dell Compellent SC8000 SAN Connectivity to Cisco Nexus 3172
Cross-posting this from Cisco Forum as this is in the end an inter-operability issue:
My storage engineers are seeing 20-30 ms of latency to and from the Dell/VMware ESX servers, as seen in the Dell management utility, and they are attributing it to the network. Dell blames the network and says the problem is that flow control is not enabled.
When I went to try and enable flow control I was disappointed to find:
sw-3172-a(config-if)# flowcontrol receive on
ERROR: This CLI is not supported on n3k platform
and later saw:
"link-level flowcontrol (LLFC) is not supported on the Nexus 3000 and 3100 series. It is supported on the Nexus 3500 series and Nexus 9000"
Is that the end of the story? I see there is another type of flow control, "priority flow control". Would that likely serve the same purpose for the Dell Compellent's needs? It looks like a lot of configuration. What I am seeing is the Compellent SAN sending RxPause frames to the 3172.
Is there any ability to turn OFF flow control on the Compellent? Perhaps flow control causes more problems than it solves?
dell-richard g
June 29th, 2018 21:00
Hello
0. I assume you are running iSCSI?
1. Is this 1GbE or 10GbE you are running? (Are both the server and target 10GbE?)
2. Which NIC(s) are you using on the SC8000?
3. How many NIC ports are being used on the SC8000? Are links equally load balanced?
Yes, some Nexus switches do not support 802.3x flow control. But flow control is only necessary when ports are congested. You need to check your Nexus port statistics and confirm if there are any dropped packets/congestion. If there is no congestion/dropped packets on the switch, then the switch is not the issue.
The SC8000 does not send RxPause. The SC8000 will send TxPause frames to tell the sender to pause sending traffic. If the switch does not honor an incoming pause frame from the SC8000, then the packets will keep coming from the sender, thus causing the SC8000 to start dropping packets due to congestion. This in turn will cause retransmission from the sender (i.e. server).
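To put a scale on what a pause frame asks for: an 802.3x pause frame carries a 16-bit pause time measured in quanta of 512 bit times, so the requested pause depends on link speed. The following is a minimal Python sketch of that arithmetic only. The function name `pause_seconds` is hypothetical, and nothing in this thread shows which quanta value the SC8000 actually sends, so the maximum value is used purely as an upper bound.

```python
# Sketch: how long a single 802.3x pause frame can ask a sender to stop.
# Assumption: the quanta value the SC8000 uses is unknown; 0xFFFF is only the upper bound.

BITS_PER_QUANTUM = 512   # one pause quantum = 512 bit times (IEEE 802.3 Annex 31B)
MAX_QUANTA = 0xFFFF      # the pause_time field is 16 bits wide


def pause_seconds(quanta, link_bps):
    """Requested pause duration in seconds for a given quanta value and link speed."""
    if not 0 <= quanta <= MAX_QUANTA:
        raise ValueError("quanta must fit in 16 bits")
    return quanta * BITS_PER_QUANTUM / link_bps


TEN_GBE = 10e9  # the ports in this thread run at 10 Gb/s
print(f"one quantum at 10GbE: {pause_seconds(1, TEN_GBE) * 1e9:.1f} ns")
print(f"maximum pause at 10GbE: {pause_seconds(MAX_QUANTA, TEN_GBE) * 1e3:.3f} ms")
```

Even the longest single pause is a few milliseconds at 10GbE, so sustained latency would need pause frames arriving continuously (and being honored), which is why the rate of pause frames matters more than the raw total.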
As for PFC (priority flow control), it is used to pause selected traffic classes. It will not make a difference in your case, so there is no need to use it (and the switch would have to support it as well). To isolate this further, check all switch port stats for packet drops, CRC errors, and the incoming rate of pause frames. Also, make sure that your cabling is such that the SC8000 paths stay within their fault domains (i.e. packets don't cross the interconnect if there are two switches connected to each other).
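The counter check described above can be scripted when there are many ports to review. This is a rough Python sketch that pulls a handful of counters out of saved `show interface` text. It assumes the counter labels appear exactly as they do in the Nexus output quoted later in this thread; the names `WATCHED`, `parse_counters`, and `error_counters` are hypothetical helpers, not part of any Cisco or Dell tool.

```python
import re

# Counter labels as they appear in the "show interface" output quoted in this thread.
# Assumption: other NX-OS releases or platforms may word these differently.
WATCHED = (
    "CRC", "input error", "input discard",
    "output error", "output discard",
    "Rx pause", "Tx pause",
)


def parse_counters(text):
    """Return {label: value} for each watched counter found in the text."""
    counters = {}
    for label in WATCHED:
        match = re.search(r"(\d+)\s+" + re.escape(label) + r"\b", text)
        if match:
            counters[label] = int(match.group(1))
    return counters


def error_counters(counters):
    """Non-zero error/drop counters; pause counters are reported separately."""
    return {k: v for k, v in counters.items() if v and not k.endswith("pause")}


# Counter lines copied from the Ethernet1/2 output posted in this thread.
SAMPLE = """\
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 input with dribble 0 input discard
634841682 Rx pause
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
"""

counters = parse_counters(SAMPLE)
print("errors/drops:", error_counters(counters))
print("Rx pause:", counters["Rx pause"], "Tx pause:", counters["Tx pause"])
```

Running the same function over every SAN-facing and host-facing port gives a quick view of where drops or pause frames are concentrated.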
One thing you can try as an experiment is to adjust the TCP Window size on your SC8000 fault domain. This is located under Advanced Port Properties of the fault domain entries.
Hope this can get you looking in the right direction.
SanMikeTeo
June 30th, 2018 10:00
Thank you very much for the great reply. Well we can say at the port level there are no errors. There is an abundance of RX Pause frames which I assumed must be coming from the Compellent if the Nexus 3172 doesn't speak link-level flowcontrol (LLFC). Any thought on that counter being so high - 645 Million?
There are four Ethernet ports Eth 1/1 - 4 connected to the SAN. Can you expand on your thought of load balancing among the links?
There are two Nexus 3172 switches connected to each other with a 2x10Gbps port channel (trunked). I'll review the cabling. Your point would be that if ESXi01 is attached to Nexus-A and its traffic flows to the SAN attached to Nexus-B via the port channel/interconnect, this could cause an issue?
Ethernet1/2 is up
admin state is up, Dedicated Interface
Hardware: 100/1000/10000 Ethernet, address: acf2.c5f8.3949 (bia acf2.c5f8.3949)
Description: compellent a-2
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 2/255, rxload 1/255
Encapsulation ARPA, medium is broadcast
Port mode is access
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Auto-Negotiation is turned off
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
Last link flapped 90week(s) 1day(s)
Last clearing of "show interface" counters never
2 interface resets
30 seconds input rate 65242824 bits/sec, 2395 packets/sec
30 seconds output rate 85753184 bits/sec, 2690 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 45.64 Mbps, 1.93 Kpps; output rate 80.44 Mbps, 2.45 Kpps
RX
125378311434 unicast packets 0 multicast packets 1 broadcast packets
125378311435 input packets 483636429216502 bytes
74831693052 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
634841682 Rx pause
TX
115056817254 unicast packets 78406574 multicast packets 16514260 broadcast packets
115151738088 output packets 396877513017173 bytes
46710909050 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
interface Ethernet1/2
description compellent a-2
switchport access vlan 46
spanning-tree port type edge
spanning-tree bpduguard enable
dell-richard g
July 2nd, 2018 22:00
1. If the switch port is registering Rx Pause frames, this means that it is receiving pause frames from the edge device. The edge device transmits (Tx) pause frames, and the switch is just reporting that it is receiving them. Yes, that count does appear to be high, but the rate per second would be a more accurate gauge of how often the edge device is sending pause frames.
2. Having two switches connected together via the port-channel is fine. But for SC series environments, make sure that data does not cross the port-channel. You control this by setting up fault domains on the SC controllers (via the DSM GUI). Each fault domain has its own network. You can see the cabling setup in our switch configuration guides (SCG) located at: Web Link. (This in itself is not causing the high rate of pause frames, but make sure your cabling is correct for H/A purposes.)
3. What NIC model is on the SC8000? You can look at the SC8000 GUI.
4. How many physical NICs on each SC8000 controller are used?
5. In the SC controller GUI, go to the Charting tab and make sure that each SC8000 NIC port has I/O running across it. If you are using one dual-port NIC per controller, then you should have I/O equally load balanced across all four NIC ports (2 per controller). Check the graph for each port in the fault domain.
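On point 1, the rate can be estimated by sampling the Rx pause counter twice and dividing the difference by the time between samples. The Python sketch below shows that calculation, plus a rough lifetime average for the Ethernet1/2 counter posted earlier. The lifetime figure assumes the counter has been accumulating for at least the 90 weeks 1 day since the last link flap; the counters were never cleared, so the real window may be longer and the true average lower. The function name `pause_rate` is hypothetical.

```python
def pause_rate(first_count, second_count, interval_seconds):
    """Average pause frames per second between two readings of the same counter."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    if second_count < first_count:
        raise ValueError("counter went backwards (cleared or wrapped)")
    return (second_count - first_count) / interval_seconds


# Lifetime estimate from the posted Ethernet1/2 output.
# Assumption: the counter age is taken as the "last link flapped" time (90 weeks 1 day).
RX_PAUSE_TOTAL = 634841682
WINDOW_SECONDS = (90 * 7 + 1) * 24 * 60 * 60

lifetime_average = pause_rate(0, RX_PAUSE_TOTAL, WINDOW_SECONDS)
print(f"lifetime average: {lifetime_average:.1f} pause frames/sec")
```

A lifetime average smooths over bursts, so two readings taken a minute apart during a period of high latency would say more about what is happening during the problem window.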
DELL-Bob Mi
July 10th, 2018 14:00
One item to check on your ESX host is the setting for Delayed ACK.
iSCSI initiator settings for delayed ACK: During periods of high network congestion in some environments, iSCSI transfer latency may exceed acceptable levels.
VMware recommends disabling delayed ACK; see page 9 of the Dell EMC SC Series Best Practices with VMware vSphere 5.x–6.x:
http://en.community.dell.com/techcenter/extras/m/white_papers/20441056
rookwoodict
July 15th, 2018 22:00
Hi Mike,
Ideally you would change "Input flow-control is off" to "Input flow-control is on" on the interface that connects to the Compellent, but the Nexus 3172 doesn't support LLFC.
Otherwise, you need to configure PFC on both the Nexus 3172 and the Compellent SC8000.
Regards,