
April 16th, 2012 20:00

Best way to deal with mixed speed network (10Gb & 1Gb)

We're running into an issue where we are migrating off NetApp arrays with 1Gb connections to a Celerra with 10Gb connections, and the old NetApp is handily outperforming the Celerra on large I/O.  Tracing things down in the network, it looks like we are overrunning buffers like crazy.  We have a Cisco Nexus 5000 core with Cisco 2xxx FEX links that the Celerra is connected to.  On one of the NAS heads I've got a >2% retransmit rate; playing with the sndcwnd, fastRTO, and std_slowstart parameters I'm able to get speeds back up to reasonable levels, but the window size is tiny (64k, basically no window scaling), which then affects replication and high-latency WAN clients.

The network group is telling me the switch is sending pause frames constantly (not sure if they go to the Celerra or the FEX ISLs); my understanding is that the Broadcom bcmXG 10Gig chips don't support flow control, so even if pause frames are sent, the Celerra ignores them.  I'd have expected TCP window scaling to kick in and prevent this from being such a big problem (if I disable window scaling on local clients, performance is good as well).  Is there any reason TCP windowing is not preventing such a large number of drops from occurring?  Are there any specific network switch configurations that should be done for a mixed-speed network?  Clients are all HP servers (DL360s or blades) running Red Hat 4 or 5, with Broadcom chips as well.  Cisco TAC is having us play with some buffer settings and move links around to better distribute across ASICs, but I'm not sure how much runway that will get me when I'm dropping so much already.  I'm not pushing high aggregate bandwidth (maybe 2-3Gbit), but I'm running over the switch buffers like crazy.

I'm a bit at a loss; I'm almost thinking of dumping the 10Gb connections and doing a 4x 1Gb link aggregation channel, but there has got to be a way to make this work, as nobody I talk to seems to have the problem I'm having.

Celerra NS960: Each DM has 2x 10Gb connections in LACP configuration to Nexus 2xxx fabric extender with FEX links to Nexus 5000 switch
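To put rough numbers on why window scaling alone doesn't prevent the drops, here is a back-of-the-envelope sketch in Python (the window sizes are illustrative assumptions, not measurements from this setup): when the Data Mover bursts a full window at 10Gb/s toward a 1Gb/s edge port, the egress drains at only a tenth of the arrival rate, so the switch has to buffer roughly 90% of the burst, and TCP only backs off after those drops have already happened.

def egress_buffer_needed(window_bytes, in_gbps=10.0, out_gbps=1.0):
    """Bytes the switch must queue if a full window arrives at line rate
    and drains out a slower egress port (simple fluid approximation)."""
    return window_bytes * (1.0 - out_gbps / in_gbps)

for window_kb in (64, 256, 1024):   # plausible windows once scaling kicks in
    need = egress_buffer_needed(window_kb * 1024)
    print(f"{window_kb:5d} KB window -> ~{need / 1024:.0f} KB of egress buffering")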

1 Rookie • 5 Posts

December 14th, 2015 20:00

I finally made headway with this issue by using sndcwnd = 65535 in my environment.

Doing a read test over NFS from a Linux client attached via 1Gbps (VNX has 10GbE), the results looked like this:

sndcwnd = 0 (default): ~13MB/sec

sndcwnd = 300000: ~13MB/sec

sndcwnd = 131071: ~13MB/sec

sndcwnd = 75000: 111MB/sec with occasional drop to 68MB/sec

sndcwnd = 65536: 117MB/sec extremely consistent

sndcwnd = 65535: 117MB/sec extremely consistent

sndcwnd = 32768: 117MB/sec extremely consistent

Our environment involves Cisco 7000 series with FEX switches; the VNX is attached to the core switches while the 1Gbps servers are attached to FEX.  We found that performance was much better when a client was attached at 1Gbps directly to the core, but consistently poor when attached to the FEX until this sndcwnd parameter was modified.
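For what it's worth, those numbers line up with a simple window/RTT bound; here is a small Python sketch (the RTTs are assumed for illustration, not measured): a single TCP stream can move at most one send window per round trip, so a 64 KB cap is harmless on a sub-millisecond LAN but becomes the bottleneck for high-latency WAN clients and replication.

def max_throughput_MBps(window_bytes, rtt_ms):
    """Upper bound for one TCP stream: one full window per round trip."""
    return window_bytes / (rtt_ms / 1000.0) / 1e6

for label, rtt_ms in (("LAN client, ~0.5 ms RTT", 0.5), ("WAN client, ~40 ms RTT", 40.0)):
    print(f"{label}: 64 KB window caps at ~{max_throughput_MBps(65535, rtt_ms):.1f} MB/s")

At ~0.5 ms the cap works out to ~131 MB/s, above 1GbE line rate (~117 MB/s), which is why the capped runs above still hit 117 MB/sec; at ~40 ms it collapses to under 2 MB/s, which is the WAN-side cost the original poster was worried about.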

edit:

This is a server_param parameter that requires a datamover reboot.

e.g.:

server_param server_2 -facility tcp -modify sndcwnd -value 65535

server_standby server_2 -activate mover

...

server_standby server_2 -restore mover

May 7th, 2012 08:00

Hi,

Let me check if I am able to find anything on this.

Thanks

Vanitha

Moderator • 285 Posts

May 7th, 2012 08:00

A few questions:

  1. Why do you have the two 10Gb interfaces in an LACP channel?  I certainly don't think you need the bandwidth aggregation; hopefully it's for port redundancy.
  2. I'm assuming you have the Neterion XBlade 65 (Tempest) 10Gb optical NIC (in the Celerra config, it's listed as an "fxg" port).  What code version are you running?
  3. The Celerra is not likely to be sending the pause frames, as link negotiation is disabled by default, so someone would have had to deliberately enable it, and that would be pointless if the switch was not similarly set.  To see what the link negotiation setting is, look at:

     $ server_sysconfig server_X -pci

  4. What kind of client is it?  What kind of mount is it?

Remember that for TCP window sizing, it is the client that advertises its receive window size, and the Data Mover will match its maximum transmit window to the maximum receive window of the client.  If it proves impossible or ineffective to determine a good receive window on the client, the Data Mover can be forced to use a specific send window size regardless of what the client advertises.
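As a minimal sketch of the relationship Bill describes (the function name, the 0-means-no-cap convention, and the example numbers are mine for illustration, not Celerra internals), the effective transmit window is simply the smallest of the limits in play:

def effective_send_window(client_rwnd, cwnd, forced_sndcwnd=0):
    """Smallest of: the client's advertised receive window, the sender's
    congestion window, and an optional forced cap (0 = no forced cap),
    e.g. the tcp sndcwnd server_param discussed later in this thread."""
    limits = [client_rwnd, cwnd]
    if forced_sndcwnd:
        limits.append(forced_sndcwnd)
    return min(limits)

# Client advertises 1 MB via window scaling, no forced cap -> 1 MB in flight:
print(effective_send_window(client_rwnd=1048576, cwnd=4194304))
# Same client, send window forced down to 65535 -> 64 KB in flight:
print(effective_send_window(client_rwnd=1048576, cwnd=4194304, forced_sndcwnd=65535))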

A Professional Services network analysis and tuning is probably your best bet to maximize performance.

November 2nd, 2015 10:00

Were you able to fix this issue?  I am running into a similar issue.  The network team says there is nothing they can change on the switch side and that I need to validate the host/VNX storage side for this issue.

3 Apprentice • 1.2K Posts

November 2nd, 2015 12:00

Bill's comment above is still valid - a Professional Services engagement for network analysis and tuning is probably still the best bet.  Are you using 10GbE interfaces?  Can you provide some background on your configuration, the issue, and the troubleshooting you've done so far?

4 Operator • 8.6K Posts

November 3rd, 2015 03:00

In my opinion the network team always says the network is fine - until you prove otherwise.

You need to look at the detailed statistics and logs from both the switch side and the VNX.

November 3rd, 2015 07:00

We have a VNX5200 whose Data Movers have a 10GbE interface connecting to a Nexus 5k.  The Nexus 5k is configured globally for jumbo frames, but the VNX5200 Data Movers are configured for a 1500 MTU because the hosts are 1Gbps and are also using a 1500 MTU.  The host is CentOS 6.6 and it mounts an NFS export from my VNX5200.

While performing read operations I see a lot of retransmissions in the VNX5200 stats output.  The EMC Unified Performance team reviewed the stats and the tcpdump I collected from the VNX5200, and they are pointing towards the network.

Notes from EMC support

The customer should investigate the cause of the frame delivery issues. The customer should check if any network device is unexpectedly discarding frames due to buffer discards, errors, QoS, etc. or if the frame delivery issues could be caused by saturation of the WAN bandwidth.

The network team reviewed the switches and did not find any changes they could make to improve the situation.  Currently we are working on getting 1Gb modules for the Data Movers.

Anand Baghel.

November 3rd, 2015 10:00

Yes, I have tried setting an MTU of 9000 on both the host and the VNX5200 Data Movers.  I did not see any improvement.

3 Apprentice • 1.2K Posts

November 3rd, 2015 10:00

Have you tried simply setting the Data Mover interfaces to use jumbo frames?  As it stands, the switch's 9000 MTU isn't buying you anything, since the devices on both ends (CentOS host and Data Mover) are doing 1500-byte MTUs anyway....

2 Intern • 157 Posts

February 1st, 2016 14:00

We have been fighting the exact same problem for years now, with the 5k's connected to the VNXs at 10Gb and the clients on the 2k's at 1Gb.  MTU has no effect on the problem.  We did, however, identify part of the problem as the smaller buffers on the 2k's, as I think someone else noted; the EFEXs do help, but they don't eliminate the problem entirely.  I am going to try setting the send window as well and see if things improve.  BTW, I noticed this very recently between two VNXs doing IP replication: each VNX is at 10Gb, crossing some 1Gb links between sites, and there were literally tens of thousands of retransmits and duplicate ACKs during the replication, with pathetically slow transfers of around 20MB/sec, even though the distance between these sites is a few miles and congestion is not the problem, FWIW.  Will report back findings if we get this ironed out.  Thanks a lot ebrundick2 for the insight; I'm sure it's a match to what we are seeing.
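As a rough sanity check on those replication numbers, here is a Python sketch (the ~1 ms metro RTT and 2% loss rate are assumed illustrative values, not measurements): the classic Mathis approximation shows why a lossy path crawls even when raw bandwidth is plentiful, and why simply avoiding the drops with a smaller send window can come out ahead.

import math

def loss_limited_MBps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. approximation: throughput ~ (MSS/RTT) * 1.22/sqrt(p)."""
    return (mss_bytes / rtt_s) * (1.22 / math.sqrt(loss_rate)) / 1e6

def window_limited_MBps(window_bytes, rtt_s):
    """Clean, loss-free stream limited only by the send window."""
    return window_bytes / rtt_s / 1e6

rtt_s = 0.001   # assumed ~1 ms round trip for a metro link
print(f"~2% loss:              ~{loss_limited_MBps(1460, rtt_s, 0.02):.0f} MB/s")
print(f"no loss, 64 KB window: ~{window_limited_MBps(65535, rtt_s):.0f} MB/s")

The first figure lands in the same ballpark as the ~20MB/sec observed above, and the second is several times higher even with the window capped at 64 KB, simply because nothing is being retransmitted.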

1 Rookie • 5 Posts

February 2nd, 2016 06:00

Adding more to the discussion: as it turns out, our network team was able to make a change on the Cisco Nexus core switches, something like "no hardware queue", applied in reference to the FEX uplinks, which stops the FEX switches from performing any hardware queueing and relies entirely on the Nexus core for queueing/buffering.

Once that change was applied, speeds with sndcwnd=0 went up above 110MB/sec for hosts on the server network (attached to FEX switches which link back to the Nexus).  So this was some sort of queueing issue on the part of the network equipment, and I think it's a foregone conclusion that you will run into these sorts of buffer or queue limitations any time your network steps down from 10GbE to 1Gbps Ethernet.
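That matches the buffering arithmetic: at a sustained 10Gb/s-in, 1Gb/s-out mismatch a queue grows at roughly 9Gb/s, so a small shared edge buffer is exhausted almost instantly while a larger central buffer can ride out much longer bursts. A quick Python sketch (the buffer sizes are assumed round numbers for illustration, not actual Nexus or FEX specs):

def time_to_fill_ms(buffer_bytes, in_gbps=10.0, out_gbps=1.0):
    """How long a buffer survives a sustained rate mismatch before dropping."""
    fill_rate_Bps = (in_gbps - out_gbps) * 1e9 / 8
    return buffer_bytes / fill_rate_Bps * 1000.0

for buf_kb in (384, 1536, 16384):   # assumed small / medium / large buffers
    print(f"{buf_kb:6d} KB buffer fills in ~{time_to_fill_ms(buf_kb * 1024):.2f} ms")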

4 Operator • 8.6K Posts

February 3rd, 2016 01:00

Thanks

Great feedback

2 Intern • 157 Posts

February 10th, 2016 11:00

Our network team got some suggestions from Cisco support implying that the risk of port starvation with hardware queueing disabled outweighs the fairly limited problem we are fighting, so they will not disable it on anything.  We're looking to limit the sndcwnd on one of our VNXs this weekend and run some tests.

2 Intern • 157 Posts

February 15th, 2016 13:00

Update: I set the TCP sndcwnd to 65535 and have observed zero duplicate ACKs or retransmissions of any kind between these sites while IP Replicator is running.  The 1Gb link is now saturated for at least 50% of the session, which cut the overall duration by ~60% (9 hours).  Given the nature of the files, this is far better than it was before.  No client-side issues noted yet.  Thanks.

1 Message

October 19th, 2016 06:00

This worked!

I've been working with our VNX 5200s for over a year now, trying to figure out why performance was so poor.  Even our 7-year-old Win2k3 servers were outperforming them.

Our environment is similar - we also have Nexus 5548 core switches with 2k FEX. 

I've had countless tickets open with EMC techs.  No one could point to a resolution.  We could see that retransmissions plagued us - between 1% and 4% - but no one knew what to do about it.  EMC technicians would write it off as a misconfigured network.  Cisco couldn't find anything wrong.  I finally brought in a Wireshark expert to help pinpoint where the problem was, but we couldn't see any errors other than retransmissions on the link.

I made the change ebrundick2 suggested, "server_param server_2 -facility tcp -modify sndcwnd -value 65535", and rebooted the Data Movers.  Retransmissions went to almost zero, and CPU dropped into the 20% range.
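For anyone who wants to watch the same counter from the client side, here is a small Linux-only Python sketch (it reads the kernel's global TCP counters from /proc/net/snmp, so it is a rough per-host health check rather than a per-mount or Celerra-side measurement):

def tcp_retrans_percent(path="/proc/net/snmp"):
    """Percentage of sent TCP segments that were retransmissions."""
    with open(path) as f:
        tcp_lines = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = tcp_lines[0][1:], tcp_lines[1][1:]
    stats = dict(zip(header, map(int, values)))
    return 100.0 * stats["RetransSegs"] / max(stats["OutSegs"], 1)

print(f"TCP retransmissions: {tcp_retrans_percent():.2f}% of segments sent")

Run it before and after a change like the sndcwnd one above to see whether the retransmission rate actually moves.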

When I told the tech I was going to try modifying the sndcwnd value, he was skeptical and said that this "enters into the realm of fine tuning".  Fine tuning it may be, but it fixed our environment.

Thank you for finding this. 
