Unsolved
4 Posts
0
2107
October 3rd, 2017 08:00
PS6000 pausing issue
We are using a pair of PS6000s as iSCSI targets, with each volume mounted on six servers. Everything has been going well for a couple of years now, but recently we have been having an issue.
I've tried searching online for answers but frankly I don't know how to describe the problem in a meaningful way.
What is happening is we see the load spike on the servers. When I look at the offending processes they are stuck in D state. When I inspect them with lsof I invariably see they are using the SAN.
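For anyone wanting to capture this during a spike, here is a minimal sketch (not part of the original post) that lists processes in D state by reading `/proc` on a Linux host. The function names `parse_stat` and `blocked_processes` and the `proc_root` parameter are made up for illustration; the only things relied on are the standard `/proc/<pid>/stat` layout and the `/proc/<pid>/wchan` file.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state

def blocked_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited, or stat was unreadable
        if state != "D":
            continue
        try:
            # wchan names the kernel function the task is sleeping in,
            # when the kernel exposes it to the caller.
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"
        found.append((pid, comm, wchan))
    return found
```

Calling `blocked_processes()` every few seconds and logging the result with a timestamp would give a record of which processes were blocked and when, which can then be lined up against the array's event log.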
The PS6000s are in different datacentres, so it's not possible that the issue is network related. It's different SANs using different networks with different clients.
When I try to access the SAN, sometimes it seems fine and sometimes it hangs forever. By access I mean either opening a file or doing a directory listing.
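One way to measure these hangs without getting the measuring script stuck as well is to run the directory listing in a background thread and stop waiting after a timeout. The following is only a sketch under that assumption; `timed_listdir` and its `timeout` parameter are hypothetical names, not anything from the EqualLogic tooling.

```python
import os
import threading
import time

def timed_listdir(path, timeout=10.0):
    """Try os.listdir(path) in a daemon thread.

    Returns (finished, seconds, error). finished is False if the call
    had not returned within `timeout` seconds.
    """
    result = {}

    def worker():
        try:
            result["entries"] = os.listdir(path)
        except OSError as exc:
            result["error"] = exc

    start = time.monotonic()
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)  # give up waiting after `timeout` seconds
    elapsed = time.monotonic() - start
    return (not thread.is_alive(), elapsed, result.get("error"))
```

Running this against the mount point once a minute and logging the elapsed time would show how long and how frequent the pauses are. Note that the worker thread itself may stay blocked until the I/O returns; the sketch only guarantees that the caller gets control back.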
Failing over to the other controller appeared to have resolved the issue, but it just happened again. So if it did change things at all, it was only temporary.
One area of concern is that the SANs are on the public internet (I inherited these), so is it possible they have a bug that someone is triggering to cause this?
The SAN is running V6.0.4 (R322829).
timdau1
4 Posts
0
October 3rd, 2017 09:00
timdau1
4 Posts
0
October 3rd, 2017 15:00
Any suggestions for troubleshooting beyond buying a support contract?
bealdrid2
1 Rookie
•
117 Posts
0
October 3rd, 2017 17:00
I'm curious as to how often these events occur for you. I had similar issues with my PS6510/6500 arrays. Same deal; had them in different locations, saw similar behavior. For me it was 20-30s of random pauses a couple of times a day. The duration of the hang/pause was directly related to how much free space there was on the array. At times we saw hangs upwards of 25-30s; then as we were in the process of vacating it, the hang times would gradually go down to 10-15s as the "in-use" space decreased. I also noticed that active page movement (replication, balancing) between arrays caused the frequency of these "hangs" to go up, say from 1/day to 1/hour. I had a case open a LONG time; we got sidetracked due to other issues, but still they could never get to the bottom of it. The issue was observed with all OSes (ESXi, RHEL, Windows, etc.), all of which had best practices set. Our other PS6210 arrays never did this, and they used all the same switches and were even in the same group. My belief is there is some fundamental issue with the PS6500/6510 series array controllers. Maybe the 6000s have the same thing.
timdau1
4 Posts
0
October 3rd, 2017 18:00
I think we have sufficient space, more than 1TB free at least. How much do you recommend?
TotalSpace: 5.1TB
UsedSpace: 3.08TB
SnapSpace: 603.63GB
dell-richard g
605 Posts
0
October 3rd, 2017 18:00
Maybe I can add a few details. I realize you said that it can't be a network issue, but check these just in case:
1. Check the PS6000 system event log in the GUI. See what is happening at the time of the long delays.
2. Are there iSCSI connection drops?
3. When the delays occur, have the hosts ping the PS6000 active eth interfaces to see if a response comes back. Just checking to see if the interfaces are still operational during the long pauses.
4. On the EQL GUI Network Tab, check the error counter for each eth interface and make sure it is not incrementing.
5. Have the PS6000s been rebooted recently?
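The reachability check in step 3 could be scripted so it runs continuously and leaves timestamps to compare against the event log. The sketch below is an assumption-laden illustration, not a Dell tool: it swaps the ICMP ping for a timed TCP connect to the standard iSCSI port 3260 (which needs no special privileges), and the names `probe` and `watch` are made up here.

```python
import socket
import time

ISCSI_PORT = 3260  # standard iSCSI target port

def probe(host, port=ISCSI_PORT, timeout=3.0):
    """Time one TCP connect attempt; return (ok, seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        ok = False
    return ok, time.monotonic() - start

def watch(host, port=ISCSI_PORT, interval=5.0, count=12):
    """Probe repeatedly and print one timestamped line per attempt."""
    for _ in range(count):
        ok, secs = probe(host, port)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {host}:{port} {'ok' if ok else 'FAIL'} {secs:.3f}s")
        time.sleep(interval)
```

For example, `watch("192.0.2.10")` (a placeholder address; substitute an active eth interface IP of the array) would log one probe every five seconds for a minute. A successful connect only shows the interface is accepting connections; it says nothing about whether I/O on an existing session is being serviced.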