Unsolved

2 Intern • 37 Posts

August 7th, 2020 11:00

MD3620i performance issue - low I/O with high LUN latency

Hope everyone is staying healthy in these crazy times.

 

Question: we have a Dell MD3620i with 3x MD1220 expansion units.

We have a total of 16 SSDs in a RAID 10 disk group and 80x 1.2TB 10k disks in a disk pool.

There are about 12 LUNs configured for various things. We also have the high-performance license enabled.

 

Here is the question: say we are running 4k or 5k I/Os per second to the SSD LUN. The latency of that LUN goes very high, and all the other LUNs owned by that controller also suffer very high latency - anywhere from 40 to 300 ms for every LUN on that controller. This does not really make sense, as the I/O load is not that high; I would hope a controller can support that much. If we were pushing 50k I/Os per second, I could maybe understand it. The latency even spikes on the 10k disk pool when the load is on the SSD disk group.

Does anyone have any suggestions on where to look, or what information I should post here to help with debugging?

Thank you in advance

 

 

August 7th, 2020 16:00

Hello leroyl,

What is the current firmware version on your MD3620i? Are all your SSDs in the same enclosure or spread across different enclosures? How many virtual disks / disk pools are owned by each controller?


August 13th, 2020 08:00

 

Thanks for the reply. Here is the info requested:

We are running version 08.20.24.60, and all our HDDs/SSDs are also on their latest firmware. Our SSDs are NOT in the same enclosure - each enclosure has 4 SSDs.

Here is our disk configuration info:

1x 10k disk pool - 72 disks - 7 virtual disks

1x 10k disk group - 7 disks - RAID 5 - 2 virtual disks

1x SSD disk group - 16 disks - RAID 10 - 3 virtual disks

 

As far as the controllers go, we have been monitoring and balancing these as best we can to keep the load equal on both. But each time one controller gets even a mild amount of I/O, all virtual disks on that controller suffer. It seems like there has to be some setting somewhere causing this, as I would expect each controller to handle much more than this.

August 14th, 2020 07:00

Hello,

 

I'm thinking the best next step is to grab some logs from MDSM. Performance issues can come from a number of different places, and the logs may help narrow it down more quickly.

 

https://dell.to/3atPjzx
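If it's easier, the same support bundle that the MDSM GUI produces can usually be pulled from the SMcli script interface; a sketch with a placeholder array name and output path (verify the exact syntax against your SMcli version):

```
SMcli -n MD3620i_Array -c "save storageArray supportData file=\"C:\supportdata.7z\";"
```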


August 14th, 2020 08:00

 

Thanks again for the reply. What is the best way to send you the log info?

 

August 14th, 2020 08:00

That depends on the size of the log file. Once you have it and know the file size, we can make a plan around that. There are a few options for getting the file transferred.


August 14th, 2020 08:00

 

The .7z Support Data file is about 3.5 MB.

Let me know how to get this to you.

August 17th, 2020 16:00

Hello leroyl,

What OS is running on your hosts? How many iSCSI connections are you using per controller?


August 18th, 2020 05:00

 

We have 10 VMware ESXi hosts,

with a total of 51 active connections - 29 on Controller 0 and 22 on Controller 1.
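To double-check the spread on our side, we scripted a quick tally of connections per controller from each host's `esxcli iscsi session connection list` output; a minimal sketch of the idea (the portal IPs, controller mapping, and the sample output below are all made up for illustration):

```python
from collections import Counter

# Hypothetical excerpt of "esxcli iscsi session connection list" output from one
# host; the real output has many more fields, but the RemoteAddress lines are
# what we tally here.
sample_output = """\
   RemoteAddress: 192.168.130.101
   RemoteAddress: 192.168.131.101
   RemoteAddress: 192.168.130.102
   RemoteAddress: 192.168.130.101
"""

# Example mapping of iSCSI portal IPs to owning controllers (placeholder addresses).
controller_of = {
    "192.168.130.101": "Controller 0",
    "192.168.131.101": "Controller 0",
    "192.168.130.102": "Controller 1",
}

counts = Counter()
for line in sample_output.splitlines():
    line = line.strip()
    if line.startswith("RemoteAddress:"):
        ip = line.split(":", 1)[1].strip()
        counts[controller_of.get(ip, "unknown")] += 1

print(dict(counts))  # {'Controller 0': 3, 'Controller 1': 1}
```

Summing the per-host tallies across all ten hosts gives the per-controller totals quoted above.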

August 18th, 2020 11:00

Hello leroyl,

Thanks for the information. I will let you know shortly what I find.


August 24th, 2020 14:00

 

Hope you can find something. Let me know if you need any more info.

thanks again for your assistance.

 


August 24th, 2020 14:00

 

Thanks for the reply. I will review the doc and make sure we did not miss anything.

We have the MD3620i connected to 2x Dell MXL 10/40Gb switches.

These are in the "B" slots on the M1000e chassis, and our ESXi hosts are all in the same chassis.

 

 

August 24th, 2020 14:00

Hello leroyl,

Sorry for the delay. The first thing to check is to confirm that you're following the iSCSI best practices for an MD36xx system. Here is a link to that guide: https://dell.to/2YtdNEw

I can see that your virtual disks are distributed pretty evenly, so that doesn't appear to be the issue. I am seeing that drive activity is a little high, and I am looking into whether that is due to your disk pool size or another factor. What is the model of the switches your MD3620i is connected to?


August 26th, 2020 13:00

 

I read over the attached best-practices document again and saw that we did not have flow control enabled on the Dell MXL blade switches - we do have jumbo frames enabled and verified.

So we went ahead and enabled it. In order to do so we had to disable PFC and then enable flow control on all ports (this switch is only used for iSCSI traffic).

After enabling flow control we are still seeing the "throttle" and "discard" counters increment on the interfaces connected to some servers and on all ports connected to the storage unit.
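For anyone following along, the change we made was roughly the following on the MXL (a sketch from memory; exact FTOS syntax, the rx/tx settings, and the interface range depend on your firmware version and the best-practices guide, so verify before applying):

```
! Link-level flow control and PFC are mutually exclusive, so DCB comes off first
conf
 no dcb enable
! Placeholder interface range covering the iSCSI-facing ports
 interface range tengigabitethernet 0/1 - 32
  flowcontrol rx on tx off
```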

 

September 8th, 2020 12:00

Hello leroyl,

What you need to do is enable media scan on your MD3620i. After that, increase the cache block size to either 32KB or 64KB, since you are using VMware. Then upgrade the SSD drive firmware, as it is out of date for your SSDs, and the newer firmware improves SSD performance. Here is the link for the firmware: https://dell.to/2Flb2hS
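Both settings can also be applied through the SMcli script engine if you prefer; a sketch with placeholder values (the array name is an example, the media scan rate is in days, and the allowed cache block sizes depend on your controller firmware, so confirm the valid values in MDSM first):

```
SMcli -n MD3620i_Array -c "set storageArray mediaScanRate=15; set storageArray cacheBlockSize=32;"
```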

 


September 10th, 2020 08:00

 

Thanks for that information. We have changed the cache block size and also enabled media scan, and we will apply drive firmware updates where available.

We will monitor the two changes made and let you know if there are improvements.

Our team here also did some digging in the log bundle and noticed a high TCP "Duplicate ACK" percentage. Controller 1 has rates of 8% and 18% on its two Ethernet ports, while Controller 0 is at 0.01% and 0.007% for its ports respectively.

What would be considered a normal Duplicate ACK rate? If the percentage on Controller 1 is not normal and needs to be addressed, what would this issue typically be caused by?
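For context on how we got those numbers: the percentage is just duplicate ACKs over total ACKs seen on a port. A quick sketch of the arithmetic, with made-up counter values:

```python
def dup_ack_rate(duplicate_acks: int, total_acks: int) -> float:
    """Return the duplicate-ACK percentage for one Ethernet port."""
    if total_acks == 0:
        return 0.0
    return 100.0 * duplicate_acks / total_acks

# Hypothetical counters pulled from a controller's TCP statistics.
print(dup_ack_rate(8_000, 100_000))  # 8.0 (%), like one Controller 1 port
print(dup_ack_rate(10, 100_000))     # 0.01 (%), like Controller 0
```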

 
