Multiple SDSs down: impact on ScaleIO

January 19th, 2021 00:00

Hi,

 

I would like to know about ScaleIO failure impact in detail; I hope this is the right place to ask a question like this.

When we use volumes, as far as I know, each volume is split into many chunks, and those chunks are placed across the SDSs as in the figure below.

[Figure: chunks of each volume mirrored across multiple SDS nodes]

In this case, two or more SDSs going down (in different fault sets) can impact parts of all the volumes used by multiple VMs, because both mirror copies of some chunks are lost.
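
To make my concern concrete, here is a toy Python sketch. It is my own simplified model (the fault-set count, chunk count, and random placement are assumptions for illustration, not ScaleIO's actual placement algorithm), but it shows that when two fault sets fail, only the chunks whose two mirrors both landed on the failed sets become unavailable:

```python
import random

NUM_FAULT_SETS = 6        # hypothetical cluster layout
CHUNKS_PER_VOLUME = 1000  # hypothetical chunk count
random.seed(1)

# Place each chunk's two mirror copies in two different fault sets,
# loosely imitating two-copy mesh mirroring.
placement = [tuple(random.sample(range(NUM_FAULT_SETS), 2))
             for _ in range(CHUNKS_PER_VOLUME)]

failed = {0, 1}  # two fault sets down at the same time

# A chunk is unavailable only if BOTH of its copies live in failed sets.
lost = sum(1 for copies in placement if set(copies) <= failed)
print(f"{lost} of {CHUNKS_PER_VOLUME} chunks lost both mirrors "
      f"({100 * lost / CHUNKS_PER_VOLUME:.1f}%)")
```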

Here are a few questions about this:
- Does this case cause faults in the VMs using the volumes?
   - If so, what can we do to restore the VMs?
   - If not, what impacts are expected?

- Will the system be fine until it actually tries to access the affected data?

 

BR,

Hongseo


January 26th, 2021 07:00

Hi HongSeo,
Let me clarify what happens when you have a data unavailable (DU) situation.
You may have 2 or more SDS nodes disconnected from the cluster.

If there is a DU, then the volumes will be affected, and therefore any VMs using those volumes will be affected.
You may be lucky: part of the data on a volume may not have been affected, in which case some VMs will not be affected.
Note, though, that volumes are created with their data spread across all the SDS nodes in the cluster.
The VMs running on the PowerFlex datastores will probably go into a read-only state if they are Linux VMs.
Once the DU has been resolved, a reboot of the VMs will bring them back to the expected state.
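
To put a rough number on the "may be lucky" part: with two-copy mirroring spread over N SDS nodes, only the chunks whose two copies both landed on the failed pair of nodes become unavailable. A back-of-the-envelope estimate in Python (this assumes chunks are mirrored uniformly at random across all node pairs, which is a simplification that ignores fault sets, rebuilds, and spare capacity):

```python
from math import comb

# Fraction of chunks expected to lose BOTH copies when two
# specific SDS nodes fail, under uniform random pair placement.
for n in (4, 8, 16):
    fraction = 1 / comb(n, 2)  # one failed pair out of C(n, 2) pairs
    print(f"{n} SDS nodes: ~{fraction:.1%} of chunks lose both copies")
```

So the larger the cluster, the smaller the slice of each volume that is hit, which is why some VMs may never touch an unavailable block.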

If you cannot correct the DU situation, you will need to restore the VMs from a backup.

Another point to note:
When an ESXi host loses connectivity to its datastores (due to the DU), it will retry indefinitely, which will eventually cause hostd and vpxa to hang and the host to disconnect from vCenter.
The workaround here is to unmap any PowerFlex volumes from the ESXi host and reboot it.
Once the DU has been corrected, you can map the volumes back.
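
If you want to script that unmap/remap step, a minimal sketch is below. It drives the ScaleIO/PowerFlex scli tool from Python; the volume name and SDC IP are placeholders, and the exact flag names can differ between versions, so treat this as an outline and check scli --help on your system first:

```python
import subprocess

VOLUME = "vol01"      # placeholder volume name
SDC_IP = "10.0.0.21"  # placeholder SDC (ESXi host) IP

def scli(*args):
    # Thin wrapper around the scli CLI; raises if the command fails.
    subprocess.run(["scli", *args], check=True)

# Unmap the volume from the hung ESXi host so it can be rebooted cleanly.
scli("--unmap_volume_from_sdc",
     "--volume_name", VOLUME,
     "--sdc_ip", SDC_IP,
     "--i_am_sure")

# ...reboot the ESXi host and resolve the DU, then map the volume back:
scli("--map_volume_to_sdc",
     "--volume_name", VOLUME,
     "--sdc_ip", SDC_IP)
```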

Hope this helps.


February 3rd, 2021 15:00

Hi Gearoid,

Greatly appreciate your answer!

I would like to ask a few more questions if you don't mind:

    - Is it possible that part of the data on a volume is not affected when 2 or more SDS nodes are down? I thought some chunks of every volume must be in a DU state, which would mean every volume is corrupted.
    - How can a DU be resolved without restoring or recreating the VMs? Is that possible when the volume was in a read-only state before it broke?
    - If a VM was booting from a volume, can it hang when a DU situation happens?

Thanks again for your support. I thought it would not be easy to get an answer like this.

BR,

Hongseo
