Bind Volumes to Member (and a performance question)
March 6th, 2017 09:00
Hi,
two weeks ago a battery in one of the two members of our PS6100 group went bad, and Dell support told us to swap out the whole controller (which seems odd to me, but that is not the point of this post).
We know from previous incidents that it would be a bad idea to initiate a failover with more than 250 connections on the member: it takes minutes for the secondary controller to pick up the connections, and by that time some of the VMs served by this group would have died from load.
So we added a PS6000 to the group and, via volume select x bind node3, bound the volumes to either node1 or node3 (node2 is the one with the failing controller).
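For reference, these are the Group Manager CLI commands I mean (a sketch; <volume_name> is a placeholder for our actual volume names):

    # pin a volume to a specific member
    volume select <volume_name> bind node3
    # release the pinning again
    volume select <volume_name> unbind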
Now my two questions:
Is there any way to see a bind operation's progress (the relocation) or its status? show members only shows me the current distribution, but neither whether a volume is actually moving nor what the desired end state would be.
Second: Today I noticed an entry in the event log stating "volume xxx is no longer bound to node1", followed by that volume moving back to node2. I did not unbind the volume, and it seems I am not able to stop the relocation. Does anyone have an idea what is happening here?
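For now I have simply re-bound the affected volume by hand, roughly like this (a sketch run from an admin host; it assumes the group executes a single command passed over ssh, which works for our monitoring scripts):

    # re-pin the wandering volume to node1 (placeholder names)
    ssh grpadmin@<group-ip> "volume select <volume_name> bind node1"

but I would obviously prefer to understand why it came unbound in the first place.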
Bonus question:
we have around 500 volumes in the group and around 450 connections at any given time. The performance of the management interface (Java GUI and CLI) is very unstable. For example, bringing up the "show volumes" output sometimes takes as little as 30 seconds, sometimes more than 4 minutes. Is this the expected behaviour? I don't think so. So what happens here?
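One crude way to reproduce the variance is to time the listing end-to-end from an admin host (again assuming single commands over ssh work against the group):

    # wall-clock time of the volume listing; for us this swings
    # between roughly 30 seconds and more than 4 minutes
    time ssh grpadmin@<group-ip> "volume show"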
Thanks for any answer or hint.



timevers
March 6th, 2017 11:00
Hi Don,
thanks for your answer. I am running 6.0.11 on all members, and I have plenty of free space on both node1 (1.3TB) and node3 (4.5TB), so I don't see why any load balancer should feel the urge to move something under my feet.
Is the message in the log indicating exactly this behaviour, or might it be something else? The exact message was:
Volume root1128.c-sda is no longer bound to member node1.
Could you elaborate on what might keep a group so busy that a listing of the volumes takes several minutes to appear? Is this expected behaviour?
Regarding the failover: it's not about the timeouts. They are set according to the Dell recommendations (even higher, in fact), but many of the VMs running on these volumes receive a lot of requests, which leads to high-load issues if their disks are unavailable for minutes. We know this would likely cause some crashes when a controller dies (which, by the way, has not happened in the roughly six years we have been using EqualLogic arrays).
But we do not want to gamble on this if we can avoid it.
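For context, the timeout I mean on a plain Linux initiator is open-iscsi's replacement timeout, i.e. how long I/O is queued while a session is down before errors are returned to the upper layers (shown here with the stock open-iscsi default, not any specific Dell value):

    # /etc/iscsi/iscsid.conf
    # seconds to queue I/O on a failed session before failing it upward
    node.session.timeo.replacement_timeout = 120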