Unsolved

7 Posts


August 27th, 2021 07:00

PS4210XV performance degradation after upgrading from ESXi 6.7 to 7.0.2

We upgraded three ESXi 6.7 hosts to 7.0.2 (Dell PE R6516), all talking iSCSI over 2x 10 Gbit NICs to the PS4210XV. After the upgrade, read performance was degraded.
A synthetic test with four subtests (dd read and write with 512-byte and 1 MB block sizes) gives approx. 500-600 MB/s on 6.7. With 7.0.2 the write rate stays the same, but reads drop to approx. 40 MB/s. A fresh installation instead of an upgrade gives about 180 MB/s for reads, still only about a third of the original performance.
We reinstalled one host back to 6.7 and got the original, good performance.
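For reference, the test is roughly of this form when run inside a Linux VM on the affected datastore (the file paths, counts and direct-I/O flags below are illustrative assumptions, not our exact script):

  dd if=/dev/zero of=/vmtest/ddtest.bin bs=1M count=4096 oflag=direct        # sequential write, 1M blocks
  dd if=/vmtest/ddtest.bin of=/dev/null bs=1M iflag=direct                   # sequential read, 1M blocks
  dd if=/dev/zero of=/vmtest/ddtest.bin bs=512 count=1000000 oflag=direct    # write, 512-byte blocks
  dd if=/vmtest/ddtest.bin of=/dev/null bs=512 iflag=direct                  # read, 512-byte blocks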

Has anyone had similar experiences and perhaps a solution?

Moderator

 • 

7.7K Posts

August 27th, 2021 16:00

Hello Wia,

What is the current version of firmware that your PS4210XV is running?

4 Operator

 • 

1.5K Posts

August 29th, 2021 04:00

Hello, 

 I believe you have a case open on this issue, and the firmware is 10.x?

 ESXi v7.x is not a supported, certified OS for the Dell PS Series SANs, though I have not heard of other customers who upgraded having the issue you describe. Since the performance returns after downgrading, I would first ensure that all the recommended best practices are in place: MPIO, Delayed ACK, login timeout, etc. Also check that the VMs don't share multiple VMDKs on a single virtual SCSI adapter. If your VMs are using the paravirtual driver, you might want to try the LSI one instead.

 When you are testing, has anyone used ESXTOP and expanded the disk I/O stats to see where the delay is greatest? In ESXTOP you can split the disk latency into subsets that show whether the delay is really at the storage or at the ESXi kernel level.

 https://kb.vmware.com/s/article/1008205
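 For anyone reproducing this, the workflow is roughly as follows (field names per the KB article above; adapter and device names will differ per host):

  esxtop                 # from the ESXi shell
  # press 'd' for the disk adapter view, 'u' for the per-device view
  # press 'f' to toggle the latency field columns if they are not visible
  # DAVG/cmd = latency at the device (array + fabric)
  # KAVG/cmd = latency inside the ESXi storage stack
  # GAVG/cmd = DAVG + KAVG, i.e. what the guest experiences
  # A high DAVG points at the array/network path, a high KAVG at the host side.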

 Regards, 

Don

7 Posts

August 31st, 2021 02:00

The EQL firmware is V10.0.3.
Yes, I have an open case.

It is listed in the HCL.

We have reproduced this with an upgraded ESXi as well as with a fresh install. The fresh install follows the best practices recommended by EQL and VMware.

My goal is to learn whether other EqualLogic users are testing this or have already run into the same issue.

7 Posts

August 31st, 2021 07:00

We tested this and a lot of other best practices. We always had MPIO with Round Robin and no jumbo frames; we also tested with the FIXED policy and with jumbo frames.
The configuration is the same as on 6.7 because it was a migration install. We also tested with a fresh install.

Dell and VMware will do nothing here; the entry in the HCL is "incorrect".

Thus, I hope that others do not run into the same problems in their production environments, and I will end the discussion here.

4 Operator

 • 

1.5K Posts

August 31st, 2021 07:00

Hi, 

 I have not seen any other EQL customer report a problem upgrading to ESXi v7.x, and I have a small cluster in my lab that hasn't shown any issues either. That said, it's not a qualified OS for the Dell PS Series SANs.

 I would suggest using VMware Round Robin MPIO with the IOs-per-path value changed to 3, rather than FIXED or the default Round Robin value of 1000 IOs per path.
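 A minimal sketch of setting that from the ESXi shell (the naa ID is a placeholder for the EQL volume's device ID):

  esxcli storage nmp device list        # find the EQL device IDs and current path selection policy
  esxcli storage nmp device set --device=naa.6xxxxxxxxxxxxxxxx --psp=VMW_PSP_RR
  esxcli storage nmp psp roundrobin deviceconfig set --device=naa.6xxxxxxxxxxxxxxxx --type=iops --iops=3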

 With regards, 

Don

7 Posts

August 31st, 2021 08:00

Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller, driver bnxtnet
SW iSCSI

4 Operator

 • 

1.5K Posts

August 31st, 2021 08:00

Hello, 

  The entry isn't "incorrect". Dell EMC didn't submit qualification test results for EQL and ESXi v7.x. VMware's default is always going to be FIXED unless the vendor supplies the results of the QA certification suite; then the onus is on the storage vendor, not VMware, to support that MPIO mode. The same goes for the Dell MEM MPIO enhancement: Dell has to do the testing and certification, then provide that to VMware to get the security key needed to install it as a certified extension.

 Jumbo frames are very helpful to reduce CPU load and get a little extra performance. Since the array works fine with v6.7, that suggests the array and switches are working correctly. My hunch is a network driver. What kind of network cards are you using? If you are using the Broadcom iSCSI offload, have you tried just using the SW iSCSI adapter? I've seen firmware/driver mismatches cause issues in the past with dependent iSCSI HW adapters, and on some 10GbE Intel NICs I believe interrupt coalescing caused issues.
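 As a quick reference, the driver and firmware versions in play can be checked from the ESXi shell roughly like this (the vmnic number and target IP are placeholders):

  esxcli network nic list                   # uplinks, drivers, link state
  esxcli network nic get -n vmnic2          # driver and firmware details for one iSCSI uplink
  esxcli software vib list | grep -i bnxt   # installed bnxtnet driver VIB version
  vmkping -d -s 8972 <EQL-group-IP>         # verify jumbo frames end to end, if enabled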

 Regards, 

Don  

 

 

4 Operator

 • 

1.5K Posts

August 31st, 2021 09:00

Hello, 

 When did you last upgrade the servers themselves, as opposed to the VMware OS? If you can, I would suggest upgrading to the current versions and trying again. If that fails, then switch to using the iSCSI offload of those Broadcoms. Just remember to set all the best practices on each adapter first, before trying to discover volumes. You'll have to unbind the SW iSCSI adapter and bind each HW iSCSI adapter individually.
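 Roughly, that rebinding looks like this from the ESXi shell (the vmhba and vmk numbers are placeholders for the Broadcom offload adapters and the iSCSI vmkernel ports):

  esxcli iscsi adapter list                                         # offload adapters show up as extra vmhba entries
  esxcli iscsi networkportal remove --adapter=vmhba65 --nic=vmk1    # unbind from the SW iSCSI adapter
  esxcli iscsi networkportal remove --adapter=vmhba65 --nic=vmk2
  esxcli iscsi networkportal add --adapter=vmhba66 --nic=vmk1       # bind one vmkernel port per HW adapter
  esxcli iscsi networkportal add --adapter=vmhba67 --nic=vmk2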

 Regards, 
Don

7 Posts

September 3rd, 2021 01:00

We have reverted the three servers to 6.7 and have the old, good performance again.

One last interesting test I did: on one of the 6.7 servers I installed a nested ESXi, first as version 6.7, then upgraded it to 7.0.2, configured as far as possible like the physical servers. The performance is the same in both(!) cases, slightly worse due to the double virtualization, but READ and WRITE are as expected here.

So from my point of view the problem lies in the interaction between the 7.x iSCSI stack, the Broadcom BCM57416 driver, and the EQL.

4 Operator

 • 

1.5K Posts

September 5th, 2021 04:00

Hello, 

  Thank you for the update.  That's a very interesting test.  

I don't see it being on the EQL side, but rather with ESXi v7 and the Broadcom. Did you look into upgrading the BIOS and firmware on that server?

  Regards, 

Don
