Unsolved
2 Intern
•
37 Posts
0
2107
August 7th, 2020 11:00
MD3620i Performance issue - Low IO with high latency of LUN
Hope everyone is staying healthy in these crazy times.
Question: we have a Dell MD3620i with 3x MD1220 expansion units.
We have a total of 16x SSDs in a RAID 10 disk group and 80x 1.2TB 10k disks in a disk pool.
There are about 12 LUNs configured for various things. We also have the high performance license enabled.
Here is the question: let's say we are running 4k or 5k I/Os per second to the SSD LUN. The latency of that LUN goes very high, and all the other LUNs owned by that controller also suffer very high latency, anywhere from 40 to 300 ms. This does not really make sense, as the I/O load is not that high; I would really hope a controller can support that. If we were pushing 50k I/Os per second, I could maybe understand this. The latency even goes very high on the 10k disk pool when the load is on the SSD disk group.
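As a rough sanity check on those numbers, Little's Law (average requests in flight = throughput x average latency) gives the queue depth they would imply. This is only a sketch: it assumes the IOPS and latency figures are steady averages measured at the same point, and the function name is made up for illustration.

```python
def implied_outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's Law: average I/Os in flight = throughput * average latency."""
    return iops * (latency_ms / 1000.0)


# Figures quoted in the post: 4k-5k IOPS and 40-300 ms latency.
for iops in (4000, 5000):
    for latency_ms in (40, 300):
        depth = implied_outstanding_ios(iops, latency_ms)
        print(f"{iops} IOPS at {latency_ms} ms -> ~{depth:.0f} I/Os in flight")
```

If the averages hold, hundreds of I/Os are queued somewhere between the hosts and the drives at any moment, which would point at queuing or retransmission along the path rather than at the raw IOPS figure.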
Does anyone have any suggestions on where to look, or what information to post here to help with debugging?
Thank you in advance


DellEMCSupport
631 Posts
0
August 7th, 2020 16:00
Hello leroyl,
What is the current version of firmware on your MD3620i? Are all your SSDs in the same enclosure or in different enclosures? How many virtual disks/disk pools are owned by each controller?
leroyl
August 13th, 2020 08:00
Thanks for the reply. Here is the info requested.
We are running version 08.20.24.60 and all our HDs/SSDs are also on their latest firmware. Our SSDs are NOT in the same enclosure; each enclosure has 4x SSDs.
Here is our disk configuration info:
1x 10k disk pool - 72 disks - 7 virtual disks
1x 10k disk group - 7 disks - RAID 5 - 2 virtual disks
1x SSD disk group - 16 disks - RAID 10 - 3 virtual disks
As far as the controllers go, we have been monitoring and balancing these as best we can to keep the load equal on both controllers. But each time one controller gets a mild amount of I/O, all virtual disks on that controller suffer. It seems like there has to be some setting somewhere that is causing this, as I would expect each controller to handle much more than this.
DellEMCSupport
August 14th, 2020 07:00
Hello,
I'm thinking that the best next step is going to be to grab some logs from MDSM. Performance issues could come from a number of different places and this may help to more quickly narrow it down.
https://dell.to/3atPjzx
leroyl
August 14th, 2020 08:00
Thanks again for the reply. What is the best way to send you the log info?
DellEMCSupport
August 14th, 2020 08:00
That would depend on the size of the log file. Once you have the file and know its size, we can make a plan around that. There are a few options for getting that file transferred.
leroyl
August 14th, 2020 08:00
The .7z Support Data file is about 3.5 MB.
Let me know how to get this to you.
DellEMCSupport
August 17th, 2020 16:00
Hello leroyl,
What is the OS that is running on your hosts? How many iSCSI connections are you using per controller?
leroyl
August 18th, 2020 05:00
We have 10 VMware ESXi hosts
with a total of 51 active connections: 29 on controller 0 and 22 on controller 1.
DellEMCSupport
August 18th, 2020 11:00
Hello leroyl,
Thanks for the information. I will let you know shortly what I find.
leroyl
August 24th, 2020 14:00
Hope you can find something. Let me know if you need any more info.
Thanks again for your assistance.
leroyl
August 24th, 2020 14:00
Thanks for the reply. I will review the doc and make sure we did not miss anything.
We have the MD3620i connected to 2x Dell MXL 10/40Gb switches.
These are in the "B" slots on the M1000e chassis, and our ESXi hosts are all in the same chassis.
DellEMCSupport
August 24th, 2020 14:00
Hello leroyl,
Sorry for the delay. The first thing to check is that you're following the iSCSI best practices for an MD36xx system. Here is a link to that guide: https://dell.to/2YtdNEw
I can see that your VDs are distributed pretty evenly, so that doesn't appear to be an issue. I am seeing that the drive activity is a little high, and I am looking to see if that is due to your disk pool size or another factor. What model of switches is your MD3620i connected to?
leroyl
August 26th, 2020 13:00
I read over the attached best practice document again and saw that we did not have flow control enabled on the Dell MXL blade switches. We do have jumbo frames enabled and verified.
So we went ahead and enabled it. To do so, we had to disable PFC and then enable flow control on all ports (this switch is only used for iSCSI traffic).
After enabling flow control we are still seeing "throttle" and "discard" counters increment for the interfaces connected to some servers and for all ports connected to the storage unit.
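One way to pin down which ports are still incrementing is to take two snapshots of the counters a few minutes apart and compare them. The sketch below assumes the counter values have already been copied out of the switch by hand; the interface labels and numbers are placeholders, not real output from the MXL switches.

```python
from typing import Dict

Counters = Dict[str, Dict[str, int]]


def incrementing_counters(before: Counters, after: Counters) -> Counters:
    """Return, per interface, how much each counter grew between two snapshots."""
    grown: Counters = {}
    for iface, after_vals in after.items():
        before_vals = before.get(iface, {})
        deltas = {
            name: value - before_vals.get(name, 0)
            for name, value in after_vals.items()
            if value > before_vals.get(name, 0)
        }
        if deltas:
            grown[iface] = deltas
    return grown


# Made-up snapshots; interface labels and values are placeholders only.
before = {
    "to-array-ctrl0": {"throttle": 120, "discard": 40},
    "to-array-ctrl1": {"throttle": 300, "discard": 95},
    "to-esxi-host": {"throttle": 10, "discard": 0},
}
after = {
    "to-array-ctrl0": {"throttle": 150, "discard": 40},
    "to-array-ctrl1": {"throttle": 420, "discard": 130},
    "to-esxi-host": {"throttle": 10, "discard": 0},
}

for iface, deltas in incrementing_counters(before, after).items():
    print(iface, deltas)
```

Ports that stay flat drop out of the result, so only the interfaces still throttling or discarding are left to look at.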
DellEMCSupport
September 8th, 2020 12:00
Hello leroyl,
What you need to do is enable media scan on your MD3620i. After that, you will need to increase the cache block size to either 32KB or 64KB, since you are using VMware. Then you will need to upgrade the SSD drive firmware, as it is out of date for your SSDs and the newer firmware improves SSD performance. Here is the link for the firmware: https://dell.to/2Flb2hS
leroyl
September 10th, 2020 08:00
Thanks for that information. We have changed the cache block size and also enabled media scan. We will also apply any updates to drive firmware where available.
We will monitor the 2 changes made and let you know if there are improvements.
Our team here also did some digging in the log bundle and noticed a high TCP "Duplicate ACK" percentage rate. Controller 1 has a rate of 8% and 18% on its Ethernet ports respectively, while controller 0 is at 0.01% and 0.007% for its ports respectively.
What would be considered a normal Duplicate ACK rate? If this percentage on controller 1 is not normal and needs to be addressed, what would this issue typically be caused by?
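For reference, here is roughly how we are comparing the ports. We do not know exactly how the log bundle derives its percentage, so the formula below (duplicate ACKs divided by total ACKs) is an assumption, and the port labels are just our own names. It does not say what a normal rate is; it only shows how far apart the two controllers are.

```python
def duplicate_ack_pct(duplicate_acks: int, total_acks: int) -> float:
    """Assumed definition: duplicate ACKs as a percentage of all ACKs seen."""
    if total_acks <= 0:
        raise ValueError("total_acks must be positive")
    return 100.0 * duplicate_acks / total_acks


# Percentages quoted above; the port labels are our own, not from the log bundle.
reported_pct = {
    "controller 0, first port": 0.01,
    "controller 0, second port": 0.007,
    "controller 1, first port": 8.0,
    "controller 1, second port": 18.0,
}

# Compare every port against the quietest one to show the relative gap.
baseline = min(reported_pct.values())
for port, pct in sorted(reported_pct.items(), key=lambda item: item[1]):
    print(f"{port}: {pct}% ({pct / baseline:.0f}x the quietest port)")
```

Both controller 1 ports come out at more than a thousand times the quietest controller 0 port, which is why we suspect something specific to controller 1 or its network path.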