
June 21st, 2016 09:00

MDM failover causes all windows clusters to fail

We've been testing ScaleIO to validate that it works before we purchase it and get full support.

However, every time we test MDM failover, it simply fails.

ScaleIO - 3node cluster

Version: 2.0.6035

Windows 2012R2 cluster, attached disks for SQL.

We can change the MDM owner via the command line or by rebooting the primary MDM; either way, all of the attached Windows disks fail.

When the primary MDM comes back online, they are still failed. If we force ownership back to the original primary MDM, the disks come back online but are corrupt.

This essentially kills our ability to do anything with ScaleIO.

Anything specific we need to do for the secondary MDM to take over properly?  Is there something in Windows we need to do?

306 Posts

June 21st, 2016 13:00

Hi,

Try running "drv_cfg --query_mdms" and make sure that your SDCs point to all configured MDM IP addresses. It's a common mistake for SDCs to point only to the primary MDM IP, so when it fails over to the secondary, the SDCs fail because they don't know its IP address.
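The check described above can be sketched roughly as follows. This is only an illustration in Python: it does not parse real "drv_cfg --query_mdms" output (its exact format is not shown in this thread), so the two lists are assumed to be filled in by hand, and the function name and the example addresses are hypothetical.

```python
def missing_mdm_ips(configured, sdc_known):
    """Return the configured MDM IPs that the SDC does not know about.

    configured: IPs of every MDM in the cluster (assumed, entered by hand).
    sdc_known:  IPs the SDC reports for its MDMs (assumed, entered by hand).
    """
    known = {ip.strip() for ip in sdc_known}
    # Preserve the order of the configured list in the result.
    return [ip.strip() for ip in configured if ip.strip() not in known]


# Hypothetical addresses from the documentation range 192.0.2.0/24.
configured = ["192.0.2.11", "192.0.2.12"]
sdc_known = ["192.0.2.11"]
print(missing_mdm_ips(configured, sdc_known))  # prints ['192.0.2.12']
```

An empty result means the SDC lists every configured MDM address; anything else is an address the SDC could not reach after a failover.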

cheers,

Pawel

12 Posts

June 21st, 2016 13:00

That was the first thing we verified.

All of the MDMs are in place.

On a second test setup, running 5 nodes with 3 MDMs, we have the exact same problem.

As soon as the primary MDM goes offline, everything on the Windows cluster dies with it and does not recover.

306 Posts

June 21st, 2016 22:00

OK - are you using one or two data networks? If two, are all the Windows boxes able to talk to each other on both?

Can you see any errors in 'showevents', or does the problem only manifest on the Windows machines?

12 Posts

June 22nd, 2016 07:00

Single network.

There are no errors anywhere except on Windows.

And to clarify, this occurs with both 2008R2 and 2012R2, but we have not had any issues with our RHEL systems (non-clustered) losing access during a failover.

12 Posts

June 22nd, 2016 07:00

Second clarification: our Linux RAC clusters have no issues; it appears to be only Windows.

I am now in the process of validating standalone systems and not just clustered ones.

Confirmed: this happens with both clustered and standalone machines on Windows 2008R2 and 2012R2.

306 Posts

June 25th, 2016 01:00

Can you please gather the get_info bundle from all the nodes in the cluster and upload them to:

https://ftp.emc.com/action/login?domain=ftp.emc.com&username=ogXzfu0Tq&password=AUrRBBU060

Probably the best way to tackle it would be via a regular Service Request - do you have a service contract for this installation? If yes, please open an SR and we can handle it faster that way.

12 Posts

June 25th, 2016 14:00

I think we found what the issue is and will be testing the fix for it early next week.

In the MDM section of the registry, our systems were listed one per line.

We removed these entries and added them back on a single line, separated by commas with no spaces.

Doing a query on the MDMs now shows ID[0] and ID[1] where it simply showed ID[0] for all of them before.
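For anyone preparing the same change, here is a minimal Python sketch of how the single-line value could be built from the per-line entries. It only produces the string; it does not read or write the registry, and the function name and example addresses are hypothetical.

```python
def join_mdm_entries(entries):
    """Collapse per-line MDM address entries into one comma-separated string.

    Whitespace is stripped and duplicates are dropped, so the result has
    no spaces, matching the single-line format described above.
    """
    cleaned = []
    for entry in entries:
        # An entry may already contain commas, so split it first.
        for part in entry.split(","):
            part = part.strip()
            if part and part not in cleaned:
                cleaned.append(part)
    return ",".join(cleaned)


# Hypothetical addresses, one per line as they were originally listed.
per_line = ["192.0.2.11", "192.0.2.12 ", " 192.0.2.13"]
print(join_mdm_entries(per_line))  # prints 192.0.2.11,192.0.2.12,192.0.2.13
```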

306 Posts

June 26th, 2016 23:00

Nice one - let us know if that fixed it!

12 Posts

June 29th, 2016 12:00

That appears to fix the problem.

If the MDMs are not listed on the same line, it simply does not work.

It would help if the documentation were a bit clearer on this, especially if you have X number of MDMs and add more. The command seems to suggest you can just use --add_mdm, but that isn't the case. You need to modify the existing entry so that all of them are on the one line.

306 Posts

June 29th, 2016 22:00

Hi,

This is good news, thank you for the update.

From what I see in the User Guide, the description of "drv_cfg --mod_mdm_ip" actually mentions that you should include all the IP addresses:

--new_mdm_ip:

The new IP address list (comma delimited) for this MDM. If you want to retain the existing address(es), include them in this list.

I can try to open an enhancement request for this if you like, but what exactly would you like to see there?

Best,

Pawel

August 16th, 2016 17:00

Little bit of a necropost, but the guide says that you CAN put them in a comma-delimited list, not that you MUST. We've found that you MUST do it.
