July 1st, 2015 21:00

ScaleIO 1.32 Rebooting Primary MDM causes all IO to freeze

Hello ScaleIO community, we have a 5-node ScaleIO 1.32 system:

Node 1: CentOS 7 - Primary MDM, SDS

Node 2: CentOS 7 - Secondary MDM, SDS

Node 3: CentOS 7 - TB, SDS

Node 4: CentOS 7 - SDS

Node 5: Windows 2012 R2 Standard - SDC

A single volume is mapped to the SDC.

We start a very long copy process to the single volume mapped to the SDC.

While the copy is running, rebooting any node other than node 1 leaves the copy uninterrupted, and the cluster rebuilds and rebalances gracefully after the reboot completes.

Only when we reboot node 1 while the copy is running does all IO freeze completely; the cluster still rebuilds and rebalances after the reboot completes.

When node 1 comes back and we switch MDM ownership from the secondary MDM on node 2 back to the primary on node 1, the copy process resumes from where it left off.

What could be causing this problem?

Thank you.

Saul

60 Posts

July 2nd, 2015 04:00

Did you install ScaleIO manually or using the Installation Manager?

9 Posts

July 2nd, 2015 05:00

When you set up your SDC, which MDM IP(s) did you use? If you only used the primary MDM IP, you will get the result you see now. Make sure to use all MDM IPs (separated with commas) when setting up the SDC.

23 Posts

July 2nd, 2015 06:00

The install was done with the Installation Manager.

23 Posts

July 2nd, 2015 07:00

Hi Keller5, I believe that may be the culprit: I used only the primary MDM IP.

Now I'm using this topology.csv file to add both MDMs to the SDC, and the Installation Manager reports this new problem:

Error parsing CSV : Line 6 contains an IP that already appeared in a previous line

IPs,Domain,Username,Password,Operating System,Is MDM/TB,Is SDS,Protection Domain,SDS Pool List,SDS Device List,SDS Device Names,Is SDC,MDM IPs

10.1.1.51 , , root , ******** , linux , Primary , Yes , Windows , SSD , /dev/sdb , sdb ,,

10.1.1.52 , , root , ******** , linux , Secondary , Yes , Windows, SSD , /dev/sdb , sdb ,,

10.1.1.53 , , root , ******** , linux , TB , Yes , Windows , SSD , /dev/sdb , sdb ,,

10.1.1.54 , , root , ******** , linux ,  , Yes , Windows , SSD , /dev/sdb , sdb ,,

10.1.1.61 , CORP , Administrator , ******** , windows ,  , , , , , , Yes ,"10.1.1.51,10.1.1.52"

I verified the file by loading it into Excel, and it parses correctly. What could be causing this error?
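To narrow down which values trigger the message, here is a minimal Python sketch that parses the same topology with the standard `csv` module and lists every IP that already appeared on an earlier line. The function name `find_repeated_ips` and the "any IP in any column" rule are assumptions made for illustration; this is not the Installation Manager's actual validation logic, only one plausible reading of the error text.

```python
import csv
import io
import re

# Loose IPv4 pattern; good enough for spotting addresses in a topology file.
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

# The topology from this post, with the header as line 1.
TOPOLOGY = '''IPs,Domain,Username,Password,Operating System,Is MDM/TB,Is SDS,Protection Domain,SDS Pool List,SDS Device List,SDS Device Names,Is SDC,MDM IPs
10.1.1.51 , , root , ******** , linux , Primary , Yes , Windows , SSD , /dev/sdb , sdb ,,
10.1.1.52 , , root , ******** , linux , Secondary , Yes , Windows, SSD , /dev/sdb , sdb ,,
10.1.1.53 , , root , ******** , linux , TB , Yes , Windows , SSD , /dev/sdb , sdb ,,
10.1.1.54 , , root , ******** , linux ,  , Yes , Windows , SSD , /dev/sdb , sdb ,,
10.1.1.61 , CORP , Administrator , ******** , windows ,  , , , , , , Yes ,"10.1.1.51,10.1.1.52"
'''


def find_repeated_ips(csv_text):
    """Return (line, ip, first_line) for each IP already seen on an earlier line.

    Hypothetical check: it scans every column, which is an assumption about
    how a validator might behave, not documented ScaleIO behavior.
    """
    first_seen = {}
    repeats = []
    reader = csv.reader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=1):
        if line_no == 1:
            continue  # skip the header row
        for field in row:
            for ip in IPV4.findall(field):
                if ip in first_seen and first_seen[ip] != line_no:
                    repeats.append((line_no, ip, first_seen[ip]))
                else:
                    first_seen.setdefault(ip, line_no)
    return repeats


for line_no, ip, first_line in find_repeated_ips(TOPOLOGY):
    print(f"line {line_no}: {ip} first appeared on line {first_line}")
```

Under that assumption, the only repeats are the two addresses in the quoted "MDM IPs" field on line 6, which matches the line number in the error. The file itself is well-formed CSV: every row splits into the same 13 fields as the header.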

Thank you for following up with me.

Saul

23 Posts

July 2nd, 2015 09:00

I would like to share my findings.

The problem seems to arise during the SDC install via the Installation Manager: I have found no way to provide both MDM IPs in the CSV file. Does anyone know how to do this via CSV?

I read the install guide multiple times and looked at the examples, but none of them shows a sample CSV file that installs an SDC pointing to both MDMs using the Installation Manager.

As a workaround, if I perform the SDC installation manually and pass both MDM IPs, the SDC works correctly during a Primary MDM outage.

PS C:\Users\Administrator.CORP\Downloads\ScaleIO_1.32_Complete_Windows_SW_Download\ScaleIO_1.32_Windows_Download> msiexec /i EMC-ScaleIO-sdc-1.32-402.1.msi MDM_IP="10.1.1.51,10.1.1.52"
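For anyone scripting the manual install, a small Python sketch of how the `MDM_IP` property value could be assembled and sanity-checked before it is passed to `msiexec`. The property name and the comma-separated format are taken from the command above; the helper name `mdm_ip_property` and the rule that at least two MDMs must be listed are assumptions for illustration.

```python
import ipaddress


def mdm_ip_property(mdm_ips):
    """Build the MDM_IP="a,b" installer property from a list of MDM addresses.

    Hypothetical helper: it validates each address, drops duplicates, and
    refuses a single-MDM list, since that is what caused the IO freeze here.
    """
    cleaned = []
    for raw in mdm_ips:
        # ip_address() raises ValueError on anything that is not a valid IP.
        ip = str(ipaddress.ip_address(raw.strip()))
        if ip not in cleaned:
            cleaned.append(ip)
    if len(cleaned) < 2:
        raise ValueError("list both the primary and the secondary MDM")
    return 'MDM_IP="{}"'.format(",".join(cleaned))


print(mdm_ip_property(["10.1.1.51", "10.1.1.52"]))
# prints: MDM_IP="10.1.1.51,10.1.1.52"
```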

PS C:\Program Files\EMC\ScaleIO\sdc\bin> .\drv_cfg.exe --query_mdms

Retrieved 1 mdm(s)

MDM-ID 7bda3fd532f295fc SDC ID eb35d4f300000005 INSTALLATION ID 1bd9239a1b9a0645 IPs [0]-10.1.1.51 [1]-10.1.1.52

PS C:\Program Files\EMC\ScaleIO\sdc\bin> .\drv_cfg.exe --query_vols

Retrieved 1 volume(s)

VOL-ID 58f2b48700000000 MDM-ID 7bda3fd532f295fc

MDM restricted SDC mode: Disabled

Query all SDC returned 5 SDC nodes.

SDC ID: eb3586d300000000 Name: N/A IP: 10.1.1.54 State: Connected GUID: 8C428CDA-CBFF-42EB-8080-60A8D9F96AC2

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

SDC ID: eb3586d400000001 Name: N/A IP: 10.1.1.52 State: Connected GUID: 92016AA0-5599-46F1-A593-0E71EE575EF1

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

SDC ID: eb3586d500000002 Name: N/A IP: 10.1.1.51 State: Connected GUID: 8761EE53-5386-49C2-A616-11687E91BB81

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

SDC ID: eb3586d600000003 Name: N/A IP: 10.1.1.53 State: Connected GUID: CE8A66FB-6C20-4D9A-9733-260A1C4BE242

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

SDC ID: eb35d4f300000005 Name: N/A IP: 10.1.1.61 State: Connected GUID: BD249CA9-3809-D548-8F0C-615747734956

    Read band 0 IOPS 0 Bytes per-second

    Write band 0 IOPS 0 Bytes per-second

Thank you

Saul

23 Posts

July 2nd, 2015 09:00

Here is the output from the SDC side:

PS C:\Program Files\emc\ScaleIO\sdc\bin> .\drv_cfg.exe --rescan

Calling kernel module to refresh MDM configuration information

Successfully completed the rescan operation

PS C:\Program Files\emc\ScaleIO\sdc\bin> .\drv_cfg.exe --query_mdms

Retrieved 1 mdm(s)

MDM-ID 7bda3fd532f295fc SDC ID eb35ade300000004 INSTALLATION ID 1bd9239a1b9a0645 IPs [0]-10.1.1.51

PS C:\Program Files\emc\ScaleIO\sdc\bin> .\drv_cfg.exe --query_vols

Retrieved 1 volume(s)

VOL-ID 58f2b48700000000 MDM-ID 7bda3fd532f295fc
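The two `--query_mdms` outputs in this thread differ in one detail: one lists `[0]-10.1.1.51 [1]-10.1.1.52`, the other only `[0]-10.1.1.51`. A minimal Python sketch for checking that on any SDC is shown here. It assumes the line format stays exactly as printed in this thread, and the function name `mdm_ips_known_to_sdc` is made up for the example.

```python
import re

# Matches entries such as "[0]-10.1.1.51" in the "IPs" part of the line.
MDM_IP_ENTRY = re.compile(r"\[\d+\]-(\d{1,3}(?:\.\d{1,3}){3})")

# Output lines copied from the two posts in this thread.
TWO_MDMS = (
    "Retrieved 1 mdm(s)\n"
    "MDM-ID 7bda3fd532f295fc SDC ID eb35d4f300000005 "
    "INSTALLATION ID 1bd9239a1b9a0645 IPs [0]-10.1.1.51 [1]-10.1.1.52\n"
)
ONE_MDM = (
    "Retrieved 1 mdm(s)\n"
    "MDM-ID 7bda3fd532f295fc SDC ID eb35ade300000004 "
    "INSTALLATION ID 1bd9239a1b9a0645 IPs [0]-10.1.1.51\n"
)


def mdm_ips_known_to_sdc(query_mdms_output):
    """Extract the MDM IPs from text saved from `drv_cfg --query_mdms`."""
    ips = []
    for line in query_mdms_output.splitlines():
        if line.startswith("MDM-ID"):
            ips.extend(MDM_IP_ENTRY.findall(line))
    return ips


for label, output in (("two MDMs", TWO_MDMS), ("one MDM", ONE_MDM)):
    ips = mdm_ips_known_to_sdc(output)
    status = "OK" if len(ips) >= 2 else "only one MDM IP, IO will stall if it goes down"
    print(f"{label}: {ips} -> {status}")
```

An SDC that reports a single MDM IP is in the state described at the top of the thread: it cannot follow the cluster when ownership moves to the secondary MDM.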
