July 1st, 2015 21:00
ScaleIO 1.32 Rebooting Primary MDM causes all IO to freeze
Hello ScaleIO community, we have a 5-node ScaleIO 1.32 system:
Node 1: CentOS 7 - Primary MDM, SDS
Node 2: CentOS 7 - Secondary MDM, SDS
Node 3: CentOS 7 - TB, SDS
Node 4: CentOS 7 - SDS
Node 5: Windows 2012 R2 Standard - SDC
A single volume is mapped to the SDC.
We start a very long copy process to that volume.
While the copy is running, rebooting any node other than Node 1 leaves the copy uninterrupted, and the cluster rebuilds and rebalances gracefully after the reboot completes.
Only when we reboot Node 1 while the copy is running does IO freeze completely; the cluster still rebuilds and rebalances after the reboot completes.
When Node 1 comes back and we switch MDM ownership from the secondary MDM on Node 2 back to the primary on Node 1, the copy process resumes from where it left off.
What could be causing this problem?
Thank you.
Saul
alexkh
July 2nd, 2015 04:00
Did you install ScaleIO manually or using the Installation Manager?
keller51
July 2nd, 2015 05:00
When you set up your SDC, which MDM IP(s) did you use? If you used only the primary MDM IP, you will get the result you see now. Make sure to use all MDM IPs (separated with commas) when setting up the SDC.
SignatureIT
July 2nd, 2015 06:00
The install was done with the Installation Manager.
SignatureIT
July 2nd, 2015 07:00
Hi keller51, I believe that is the culprit: I used only the primary MDM IP.
Now I'm using this topology.csv file to add the two MDMs to the SDC, and the Installation Manager reports this new problem:
Error parsing CSV : Line 6 contains an IP that already appeared in a previous line
IPs,Domain,Username,Password,Operating System,Is MDM/TB,Is SDS,Protection Domain,SDS Pool List,SDS Device List,SDS Device Names,Is SDC,MDM IPs
10.1.1.51 , , root , ******** , linux , Primary , Yes , Windows , SSD , /dev/sdb , sdb ,,
10.1.1.52 , , root , ******** , linux , Secondary , Yes , Windows, SSD , /dev/sdb , sdb ,,
10.1.1.53 , , root , ******** , linux , TB , Yes , Windows , SSD , /dev/sdb , sdb ,,
10.1.1.54 , , root , ******** , linux , , Yes , Windows , SSD , /dev/sdb , sdb ,,
10.1.1.61 , CORP , Administrator , ******** , windows , , , , , , , Yes ,"10.1.1.51,10.1.1.52"
I verified the file by opening it in Excel, and it parses correctly. What could be causing this error?
Thank you for following up with me.
Saul
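One possible explanation for the parser error, offered as a guess rather than documented Installation Manager behavior: if the validator collects every IP address found on a row, not only the one in the IPs column, then the quoted MDM IPs field on line 6 repeats 10.1.1.51 and 10.1.1.52 from lines 2 and 3. The Python sketch below reproduces that hypothetical check against the file as posted; `duplicate_ips` and `IP_RE` are illustrative names, not part of ScaleIO.

```python
import csv
import io
import re

# topology.csv as posted in the thread (passwords already masked).
TOPOLOGY = """\
IPs,Domain,Username,Password,Operating System,Is MDM/TB,Is SDS,Protection Domain,SDS Pool List,SDS Device List,SDS Device Names,Is SDC,MDM IPs
10.1.1.51 , , root , ******** , linux , Primary , Yes , Windows , SSD , /dev/sdb , sdb ,,
10.1.1.52 , , root , ******** , linux , Secondary , Yes , Windows, SSD , /dev/sdb , sdb ,,
10.1.1.53 , , root , ******** , linux , TB , Yes , Windows , SSD , /dev/sdb , sdb ,,
10.1.1.54 , , root , ******** , linux , , Yes , Windows , SSD , /dev/sdb , sdb ,,
10.1.1.61 , CORP , Administrator , ******** , windows , , , , , , , Yes ,"10.1.1.51,10.1.1.52"
"""

# Loose IPv4 pattern; good enough for this illustration.
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def duplicate_ips(text):
    """Return (line_no, ip, first_seen_line) for every IP that reappears.

    ASSUMPTION: this mimics a validator that scans ALL columns for IPs.
    It is a guess at the behavior, not the Installation Manager's code.
    """
    first_seen = {}
    duplicates = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        for ip in IP_RE.findall(",".join(row)):
            if ip in first_seen:
                duplicates.append((line_no, ip, first_seen[ip]))
            else:
                first_seen[ip] = line_no
    return duplicates


if __name__ == "__main__":
    for line_no, ip, first in duplicate_ips(TOPOLOGY):
        print(f"line {line_no}: {ip} already appeared on line {first}")
```

Python's `csv` module reads the quoted value as a single 13th field, the same way Excel does, so the file itself is well-formed CSV. Under the assumption above, the complaint would come from the duplicate check rather than from the quoting.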
SignatureIT
July 2nd, 2015 09:00
I would like to share my findings...
It seems the problem happens during the SDC install via the Installation Manager, as I have found no way to provide both MDM IPs in the CSV file. Does anyone know how to do this via CSV?
I read the install guide multiple times and looked at the examples, but they don't show a sample CSV file for installing an SDC that points to both MDMs using the Installation Manager.
As a workaround, if I perform the SDC installation manually, it works correctly during a Primary MDM outage:
PS C:\Users\Administrator.CORP\Downloads\ScaleIO_1.32_Complete_Windows_SW_Download\ScaleIO_1.32_Windows_Download> msiexec /i EMC-ScaleIO-sdc-1.32-402.1.msi MDM_IP="10.1.1.51,10.1.1.52"
PS C:\Program Files\EMC\ScaleIO\sdc\bin> .\drv_cfg.exe --query_mdms
Retrieved 1 mdm(s)
MDM-ID 7bda3fd532f295fc SDC ID eb35d4f300000005 INSTALLATION ID 1bd9239a1b9a0645 IPs [0]-10.1.1.51 [1]-10.1.1.52
PS C:\Program Files\EMC\ScaleIO\sdc\bin> .\drv_cfg.exe --query_vols
Retrieved 1 volume(s)
VOL-ID 58f2b48700000000 MDM-ID 7bda3fd532f295fc
MDM restricted SDC mode: Disabled
Query all SDC returned 5 SDC nodes.
SDC ID: eb3586d300000000 Name: N/A IP: 10.1.1.54 State: Connected GUID: 8C428CDA-CBFF-42EB-8080-60A8D9F96AC2
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
SDC ID: eb3586d400000001 Name: N/A IP: 10.1.1.52 State: Connected GUID: 92016AA0-5599-46F1-A593-0E71EE575EF1
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
SDC ID: eb3586d500000002 Name: N/A IP: 10.1.1.51 State: Connected GUID: 8761EE53-5386-49C2-A616-11687E91BB81
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
SDC ID: eb3586d600000003 Name: N/A IP: 10.1.1.53 State: Connected GUID: CE8A66FB-6C20-4D9A-9733-260A1C4BE242
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
SDC ID: eb35d4f300000005 Name: N/A IP: 10.1.1.61 State: Connected GUID: BD249CA9-3809-D548-8F0C-615747734956
Read band 0 IOPS 0 Bytes per-second
Write band 0 IOPS 0 Bytes per-second
Thank you
Saul
SignatureIT
July 2nd, 2015 09:00
Here is the output from the SDC side:
PS C:\Program Files\emc\ScaleIO\sdc\bin> .\drv_cfg.exe --rescan
Calling kernel module to refresh MDM configuration information
Successfully completed the rescan operation
PS C:\Program Files\emc\ScaleIO\sdc\bin> .\drv_cfg.exe --query_mdms
Retrieved 1 mdm(s)
MDM-ID 7bda3fd532f295fc SDC ID eb35ade300000004 INSTALLATION ID 1bd9239a1b9a0645 IPs [0]-10.1.1.51
PS C:\Program Files\emc\ScaleIO\sdc\bin> .\drv_cfg.exe --query_vols
Retrieved 1 volume(s)
VOL-ID 58f2b48700000000 MDM-ID 7bda3fd532f295fc
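The two `--query_mdms` outputs in this thread differ only in the IPs list: after the manual install the SDC shows two addresses, while the other SDC-side output shows a single address. A small Python sketch for checking this from captured output follows. `mdm_ips` is a hypothetical helper, and the regular expression assumes the `[n]-address` layout seen in the outputs above.

```python
import re

# Lines copied from the two drv_cfg.exe --query_mdms outputs in this thread.
MANUAL_INSTALL = (
    "MDM-ID 7bda3fd532f295fc SDC ID eb35d4f300000005 "
    "INSTALLATION ID 1bd9239a1b9a0645 IPs [0]-10.1.1.51 [1]-10.1.1.52"
)
SINGLE_IP = (
    "MDM-ID 7bda3fd532f295fc SDC ID eb35ade300000004 "
    "INSTALLATION ID 1bd9239a1b9a0645 IPs [0]-10.1.1.51"
)

# ASSUMPTION: each MDM address is printed as "[index]-a.b.c.d".
MDM_IP_RE = re.compile(r"\[\d+\]-(\d{1,3}(?:\.\d{1,3}){3})")


def mdm_ips(query_mdms_output):
    """Return the MDM IP addresses listed in --query_mdms output."""
    return MDM_IP_RE.findall(query_mdms_output)


def knows_multiple_mdms(query_mdms_output):
    """True when the SDC lists more than one MDM address."""
    return len(mdm_ips(query_mdms_output)) > 1


if __name__ == "__main__":
    for label, text in (("manual", MANUAL_INSTALL), ("single", SINGLE_IP)):
        print(label, mdm_ips(text), knows_multiple_mdms(text))
```

An SDC that lists only the primary MDM address matches the case keller51 described, where IO freezes while the primary MDM is down.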