
1 Rookie • 13 Posts

February 21st, 2021 20:00

Question on PowerFlex Spare and RAID 1

Hello,

Can someone confirm: I have 6 nodes and configured a 17% spare policy to tolerate one node down. Assuming one node is already down, can the cluster tolerate another node going down, since the configuration is RAID 1?

 

Thanks,

Peter

31 Posts

February 22nd, 2021 16:00

Hi Peter,

The 17% spare policy that you have set really just prevents you from creating volumes in that logical spare space. If you doubled it to 33%, you could guarantee that the cluster would still be okay after two sequential node failures. (In a six-node cluster, 17% is roughly one node's worth of capacity, 1/6 ≈ 16.7%, and 33% is roughly two nodes' worth, 2/6 ≈ 33.3%.)
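As a rough illustration of that arithmetic, here is a minimal Python sketch (not a PowerFlex tool; the function name and the assumption of equally sized nodes are mine):

```python
def spare_fraction(total_nodes: int, failures_to_tolerate: int) -> float:
    """Fraction of raw capacity to reserve so the data on
    `failures_to_tolerate` failed nodes can be rebuilt elsewhere.
    Assumes all nodes contribute equal raw capacity."""
    if not 0 < failures_to_tolerate < total_nodes:
        raise ValueError("must tolerate at least one, but not all, nodes")
    return failures_to_tolerate / total_nodes

print(f"{spare_fraction(6, 1):.1%}")  # 16.7% -> the ~17% policy in the post
print(f"{spare_fraction(6, 2):.1%}")  # 33.3% -> the ~33% needed for two losses
```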

Since this is 'virtual' spare space, there is another scenario where you can sustain multiple sequential failures: when you have not yet consumed that space with volumes. In that case the system will automatically borrow whatever free space it needs; the spare policy is really just a guarantee that enough capacity is always reserved to rebuild. If you don't set the policy, rebuild capacity is non-guaranteed and depends on how much raw capacity has already been allocated to volumes.
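A very rough sketch of that accounting (my own simplification, not PowerFlex's actual internal bookkeeping; the function and parameter names are invented):

```python
def rebuild_headroom_tb(raw_tb: float, allocated_tb: float, spare_pct: float) -> float:
    """Capacity a rebuild can draw on: the reserved spare plus any raw
    capacity not yet allocated to volumes (the 'borrowed' space above).
    With spare_pct = 0 this is 'non-guaranteed': headroom shrinks as
    more raw capacity gets allocated to volumes."""
    spare_tb = raw_tb * spare_pct / 100.0
    unallocated_tb = max(raw_tb - spare_tb - allocated_tb, 0.0)
    return spare_tb + unallocated_tb

# 12 TB raw, 17% spare, 4 TB allocated: ~2.04 TB guaranteed spare
# plus ~5.96 TB of not-yet-allocated space.
print(rebuild_headroom_tb(12.0, 4.0, 17.0))  # ~8.0 TB
```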

One of the golden rules of PowerFlex, though, is that every write must be stored twice, on two different nodes. There is no exception to this, and the system will not allow writes to continue if the requirement cannot be met, which is why you must always plan to have sufficient spare capacity on hand for your environment. (The recommendation is one node's worth of capacity for clusters with fewer than 10 nodes, and 10% of raw capacity for clusters with more than 10 nodes.)
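For illustration, a minimal sketch of that sizing recommendation (the function name and the use of the largest node's capacity are my assumptions, not a PowerFlex API):

```python
def recommended_spare_tb(node_capacities_tb: list[float]) -> float:
    """Spare capacity per the recommendation above: one node's worth
    (the largest, to be safe) for clusters under 10 nodes, and 10% of
    raw capacity otherwise."""
    raw = sum(node_capacities_tb)
    if len(node_capacities_tb) < 10:
        return max(node_capacities_tb)  # reserve one node's worth
    return 0.10 * raw                   # reserve 10% of raw

# Six nodes with 2 TB each: reserve 2 TB (~17% of 12 TB raw).
print(recommended_spare_tb([2.0] * 6))
```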

1 Rookie • 14 Posts

March 13th, 2021 18:00

It depends on the setup and on which two nodes go down.

If your setup has only two MDMs and one tie-breaker, and both of your MDM nodes go down, you are down.

If the down nodes are not MDM nodes, you can have up to 5 nodes down, as long as the storage space on the remaining nodes can hold at least one copy of your data and a surviving node is an MDM node.

In ScaleIO, the data is stored on two different nodes. If one node goes down, the system copies the remaining data across all the surviving nodes to bring it back to two copies, so all surviving nodes use up more space to store the data. This is the rebuild process you can see in the GUI. Before the rebuild is completed, you CANNOT have another disk or node go down; otherwise some data may be lost, because the surviving nodes hold only one copy of the dead node's data until the rebuild finishes. Losing another node or disk before the rebuild completes would result in data loss. Once the rebuild is completed, all data is stored in two copies again across the nodes, and at that point you can have another node go down. The rebuild process then repeats on the surviving nodes until the data is again held in two copies, after which you can tolerate yet another node down.

The rebuild process is coordinated by the MDM and executed by the SDS. The storage will survive as long as at least one MDM exists and the remaining storage can hold at least one copy of the data. In your example, if each of the 6 nodes has one 2TB drive and your total used space is 1TB, you can keep losing nodes, even 5 of them. As long as the last node is an MDM node, your data is still accessible, because the last node has a 2TB drive and you have only 1TB of data.
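To make that capacity argument concrete, here is a small Python sketch (my own simplification: equal-sized nodes, two-copy mirroring, each rebuild finishing before the next failure, and MDM placement not modeled):

```python
def sequential_losses_survivable(nodes: int, node_tb: float, used_tb: float) -> int:
    """Count how many one-at-a-time node losses the cluster survives,
    assuming each rebuild completes before the next failure.
    Data is mirrored (two copies) while at least two nodes remain;
    with one node left, a single copy must still fit on that node."""
    losses = 0
    while nodes > 1:
        remaining_tb = (nodes - 1) * node_tb
        copies = 2 if nodes - 1 >= 2 else 1  # can't mirror on a single node
        if used_tb * copies > remaining_tb:
            break  # the rebuild would not fit; the next loss means data loss
        nodes -= 1
        losses += 1
    return losses

# The example from the post: 6 nodes, 2 TB each, 1 TB of user data.
print(sequential_losses_survivable(6, 2.0, 1.0))  # -> 5
```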

If you lose both MDM nodes at the same time, or you lose another disk/node before the rebuild is completed, you are going to be hard down.

  

1 Rookie • 13 Posts

March 18th, 2021 05:00

thank you!
