Recovery of ScaleIO 1.32.4 environment after power down/up where TB is no longer available

Scenario:

ScaleIO 1.32.4 installed via VMware plugin to 3 ESX hosts

3 ESX hosts running 6.0U2

host 1 = MDM

host 2 = MDM

host 3 = TB + GW

single protection domain, no fault sets

SDC and SDS running on all 3 nodes

Over the weekend we had to power down the entire data center, which means we had to "deactivate" the protection domain before shutting down the SVMs and subsequently the ESX hosts.

After powering everything back on, host 3 (which hosted the TB and GW) did not recover. The problem is that without the TB I am unable to "activate" the protection domain, even if I put the cluster in "single" mode.

Since the cluster would not let me do anything until the TB became available again, I attempted to replace the TB. First I removed the TB from the cluster via the scli --remove_tb command. I then installed a new replacement host and configured ESX just like the other 2 hosts. I then ran the web plugin's "Add servers to a registered ScaleIO system" option to try to install a new TB. However, I never see the screen below, where I could replace the old TB with a new one.

At this point I'm not sure how to recover unless I redeploy the whole SIO environment from the ground up. Looking for any helpful suggestions, especially if it is possible via the web plugin rather than some other manual method. Thanks!

Capture.JPG.jpg

Responses(1)

chrstopherm

June 15th, 2016 08:00

I always hate it when people open threads like this and then never post the final solution they found. Here is the solution that worked for me (your mileage may vary).

I was able to recover from the above scenario by deploying just the SDC pieces and the SDS SVM to the replaced ESX host via the VMware ScaleIO plugin. Note: I made sure to reuse the old TB SVM's ScaleIO IP info (Mgmt/SIO d1/SIO d2) so that I would not need to manually update all of the SDCs in the cluster.

Then I SSHed into the new SDS SVM running on the ESX host and changed directory to /root/install. I manually installed the TB MDM file via rpm -ivh (the same should work for an MDM, I would assume).

Next I SSHed into the primary MDM, logged into the scli as admin, and added the TB back into the cluster via the command scli --add_tb --tb_ip D1,D2 --force_clean, where D1 and D2 are your ScaleIO Data 1 and Data 2 IP addresses. I then put the cluster back into cluster mode via scli --switch_to_cluster_mode
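For anyone who wants the command sequence in one place, here is a rough dry-run sketch of the steps above. It only prints the commands and does not run anything. The IP addresses and the TB_RPM value are placeholders I made up for illustration (the post does not give the package file name), and the scli --login --username admin line is my assumption about how "logged into the scli as admin" was done. Check everything against your own environment and ScaleIO version before using it.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery commands described above, runs nothing.
# D1, D2 and TB_RPM are hypothetical placeholders, not values from the post.
D1="${D1:-192.0.2.11}"                  # ScaleIO Data 1 IP of the TB SVM (placeholder)
D2="${D2:-192.0.2.12}"                  # ScaleIO Data 2 IP of the TB SVM (placeholder)
TB_RPM="${TB_RPM:-<tb-package>.rpm}"    # TB package found under /root/install (name not given)

recovery_plan() {
    # Steps run on the new SVM on the replaced ESX host
    echo "cd /root/install"
    echo "rpm -ivh $TB_RPM"
    # Steps run on the primary MDM
    echo "scli --login --username admin"    # assumed login form
    echo "scli --add_tb --tb_ip $D1,$D2 --force_clean"
    echo "scli --switch_to_cluster_mode"
}

recovery_plan
```

Set D1, D2 and TB_RPM in the environment before running the script to print the commands with your own values, then copy them over one at a time so you can check the result of each step.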
