Unsolved

January 9th, 2017 23:00

Problem with SDS-SDS Only IPs network failure ScaleIO 2.0.0.2

Hi,

I installed ScaleIO with MDM IPs, SDS-SDS Only IPs and SDS-SDC Only IPs.

Those 3 networks are in separate segments, e.g. 10.0.0.0/24, 10.0.1.0/24 and 10.0.2.0/24.
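In list form, the role-to-subnet mapping is:

MDM IPs           - 10.0.0.0/24
SDS-SDS Only IPs  - 10.0.1.0/24
SDS-SDC Only IPs  - 10.0.2.0/24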

The problem here is:

When the SDS-SDS Only network is down on the Master MDM node, the Master MDM thinks that all SDSs are dead.

So no App I/Os are possible until the Master MDM is manually switched.

*In this case the Master MDM does not fail over (I tested more than 10 times).

One solution is:

Add another SDS-SDS Only IP so the Master MDM can still communicate with the SDSs when the other SDS-SDS Only network is down on the Master MDM node.

But the number of physical interfaces on the MDM nodes is small.

So I would like to avoid adding another SDS-SDS Only IP if possible.

Any possible solutions?

For example, making the Master MDM fail over when the SDS-SDS Only network is down on the Master MDM node, or making the Master MDM report the correct SDS status.

Or anything else.

Many thanks,

Taizo

306 Posts

January 11th, 2017 00:00

Hi Taizo,

Is the MDM network up when you bring down the SDS-SDS network? Could you please paste "scli --query_all" and "scli --query_all_sds" outputs here?

Thank you,

Pawel

16 Posts

January 11th, 2017 16:00

Hi Pawel,

Thanks for your reply.

I've pasted the command outputs below. The commands executed, in order:

# ip -f inet a

# ifdown vlan3003

# scli --query_cluster

# scli --query_all

# scli --query_all_sds

# ip -f inet a

1: lo: mtu 65536 qdisc noqueue state UNKNOWN

inet 127.0.0.1/8 scope host lo

valid_lft forever preferred_lft forever

10: virbr0: mtu 1500 qdisc noqueue state DOWN

inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0

valid_lft forever preferred_lft forever

34: vlan3004@bond1: mtu 1500 qdisc noqueue state UP

inet 10.11.188.39/24 brd 10.11.188.255 scope global vlan3004

valid_lft forever preferred_lft forever

46: vlan3003@bond0: mtu 1500 qdisc noqueue state UP

inet 10.11.187.39/24 brd 10.11.187.255 scope global vlan3003

valid_lft forever preferred_lft forever

48: vlan3000@bond0: mtu 1500 qdisc noqueue state UP

inet 10.11.184.39/24 brd 10.11.184.255 scope global vlan3000

valid_lft forever preferred_lft forever

# ifdown vlan3003

# scli --query_cluster

Cluster:

Mode: 5_node, State: Normal, Active: 5/5, Replicas: 3/3

Master MDM:

ID: 0x3c97b2f05c849180

IPs: 10.11.184.39, Management IPs: 10.11.184.39, Port: 9011

Version: 2.0.7120

Slave MDMs:

ID: 0x07c6d91032d07412

IPs: 10.11.184.41, Management IPs: 10.11.184.41, Port: 9011

Status: Normal, Version: 2.0.7120

ID: 0x4d93397412e89f21

IPs: 10.11.184.40, Management IPs: 10.11.184.40, Port: 9011

Status: Normal, Version: 2.0.7120

Tie-Breakers:

ID: 0x7510028d2e11b564

IPs: 10.11.184.25, Port: 9011

Status: Normal, Version: 2.0.7120

ID: 0x2ebd3163386dc713

IPs: 10.11.184.42, Port: 9011

Status: Normal, Version: 2.0.7120

# scli --query_all

System Info:

        Product:  EMC ScaleIO Version: R2_0.7120.0

        ID:      794940494546e301

        Manager ID:      0000000000000000

License info:

        Installation ID: 10fa553e7ac3b791

        SWID:

        Maximum capacity: Unlimited

        Usage time
        Enterprise features: Enabled

        The system was activated 2 days ago

System settings:

        Capacity alert thresholds: High: 80, Critical: 90

        Thick volume reservation percent: 0

        MDM restricted SDC mode: disabled

        Management Clients secure communication: enabled

        TLS version: TLSv1.2

        User authentication method: Native

        SDS connection authentication: Enabled

Query all returned 1 Protection Domain:

Protection Domain default (Id: 5fe8d81b00000000) has 1 storage pools, 5 Fault Sets, 20 SDS nodes, 1 volumes and 0 Bytes available for volume allocation

Operational state is Active

Rfcache enabled, Mode: Write miss, Page Size 64 KB, Max IO size 128 KB

Storage Pool defaultSP (Id: c3202db500000000) has 1 volumes and 0 Bytes available for volume allocation

        The number of parallel rebuild/rebalance jobs: 2

        Rebuild is enabled and using No-Limit policy

        Rebalance is enabled and using No-Limit policy

        Background device scanner: Disabled

        Zero padding is disabled

        Spare policy: 25% out of total

        Checksum mode: disabled

        Doesn't use RAM Read Cache

        Doesn't use Flash Read Cache

        Capacity alert thresholds: High: 80, Critical: 90

SDS Summary:

        Total 20 SDS Nodes

        20 SDS nodes have membership state 'Decoupled'

        20 SDS nodes have connection state 'Disconnected'

        22.8 TB (23326 GB) total capacity

        22.3 TB (22814 GB) unused capacity

        0 Bytes snapshots capacity

        512.0 GB (524288 MB) in-use capacity

        0 Bytes thin capacity

        0 Bytes protected capacity

        512.0 GB (524288 MB) failed capacity

        0 Bytes degraded-failed capacity

        0 Bytes degraded-healthy capacity

        22.2 TB (22764 GB) unreachable-unused capacity

        0 Bytes active rebalance capacity

        0 Bytes pending rebalance capacity

        0 Bytes active forward-rebuild capacity

        0 Bytes pending forward-rebuild capacity

        0 Bytes active backward-rebuild capacity

        0 Bytes pending backward-rebuild capacity

        0 Bytes rebalance capacity

        0 Bytes forward-rebuild capacity

        0 Bytes backward-rebuild capacity

        0 Bytes active moving capacity

        0 Bytes pending moving capacity

        0 Bytes total moving capacity

        0 Bytes spare capacity

        512.0 GB (524288 MB) at-rest capacity

        0 Bytes semi-protected capacity

        0 Bytes in-maintenance capacity

        0 Bytes decreased capacity

        Primary-reads                            0 IOPS 0 Bytes per-second

        Primary-writes                           0 IOPS 0 Bytes per-second

        Secondary-reads                          0 IOPS 0 Bytes per-second

        Secondary-writes                         0 IOPS 0 Bytes per-second

        Backward-rebuild-reads                   0 IOPS 0 Bytes per-second

        Backward-rebuild-writes                  0 IOPS 0 Bytes per-second

        Forward-rebuild-reads                    0 IOPS 0 Bytes per-second

        Forward-rebuild-writes                   0 IOPS 0 Bytes per-second

        Rebalance-reads                          0 IOPS 0 Bytes per-second

        Rebalance-writes 0 IOPS 0 Bytes per-second

Volumes summary:

        1 thick-provisioned volume. Total size: 256.0 GB (262144 MB)

# scli --query_all_sds

Query-all-SDS returned 20 SDS nodes.

Protection Domain 5fe8d81b00000000 Name: default

SDS ID: b231ccde0000001b Name: SDS_10.11.184.42:7073 State: Disconnected, Decoupled IP: 10.11.187.42,10.11.188.42 Port: 7073 Version: 2.0.7120

SDS ID: b231ccdd0000001a Name: SDS_10.11.184.39:7073 State: Disconnected, Decoupled IP: 10.11.187.39,10.11.188.39 Port: 7073 Version: 2.0.7120

SDS ID: b231ccc700000016 Name: SDS_10.11.184.40:7073 State: Disconnected, Decoupled IP: 10.11.187.40,10.11.188.40 Port: 7073 Version: 2.0.7120

SDS ID: b231ccc600000015 Name: SDS_10.11.184.42:7076 State: Disconnected, Decoupled IP: 10.11.187.42,10.11.188.42 Port: 7076 Version: 2.0.7120

SDS ID: b231ccc500000014 Name: SDS_10.11.184.41:7074 State: Disconnected, Decoupled IP: 10.11.187.41,10.11.188.41 Port: 7074 Version: 2.0.7120

SDS ID: b231ccdc00000019 Name: SDS_10.11.184.25:7074 State: Disconnected, Decoupled IP: 10.11.187.25,10.11.188.25 Port: 7074 Version: 2.0.7120

SDS ID: b231ccdf0000001c Name: SDS_10.11.184.40:7076 State: Disconnected, Decoupled IP: 10.11.187.40,10.11.188.40 Port: 7076 Version: 2.0.7120

SDS ID: b231ccc200000011 Name: SDS_10.11.184.41:7076 State: Disconnected, Decoupled IP: 10.11.187.41,10.11.188.41 Port: 7076 Version: 2.0.7120

SDS ID: b231ccc100000010 Name: SDS_10.11.184.25:7075 State: Disconnected, Decoupled IP: 10.11.187.25,10.11.188.25 Port: 7075 Version: 2.0.7120

SDS ID: b231ccc00000000f Name: SDS_10.11.184.42:7075 State: Disconnected, Decoupled IP: 10.11.187.42,10.11.188.42 Port: 7075 Version: 2.0.7120

SDS ID: b231ccbf0000000e Name: SDS_10.11.184.40:7074 State: Disconnected, Decoupled IP: 10.11.187.40,10.11.188.40 Port: 7074 Version: 2.0.7120

SDS ID: b231ccbe0000000d Name: SDS_10.11.184.39:7075 State: Disconnected, Decoupled IP: 10.11.187.39,10.11.188.39 Port: 7075 Version: 2.0.7120

SDS ID: b231ccae00000007 Name: SDS_10.11.184.39:7076 State: Disconnected, Decoupled IP: 10.11.187.39,10.11.188.39 Port: 7076 Version: 2.0.7120

SDS ID: b231ccad00000006 Name: SDS_10.11.184.25:7073 State: Disconnected, Decoupled IP: 10.11.187.25,10.11.188.25 Port: 7073 Version: 2.0.7120

SDS ID: b231ccac00000005 Name: SDS_10.11.184.41:7075 State: Disconnected, Decoupled IP: 10.11.187.41,10.11.188.41 Port: 7075 Version: 2.0.7120

SDS ID: b231ccab00000004 Name: SDS_10.11.184.39:7074 State: Disconnected, Decoupled IP: 10.11.187.39,10.11.188.39 Port: 7074 Version: 2.0.7120

SDS ID: b231ccaa00000003 Name: SDS_10.11.184.42:7074 State: Disconnected, Decoupled IP: 10.11.187.42,10.11.188.42 Port: 7074 Version: 2.0.7120

SDS ID: b231cca900000002 Name: SDS_10.11.184.41:7073 State: Disconnected, Decoupled IP: 10.11.187.41,10.11.188.41 Port: 7073 Version: 2.0.7120

SDS ID: b231cca800000001 Name: SDS_10.11.184.25:7076 State: Disconnected, Decoupled IP: 10.11.187.25,10.11.188.25 Port: 7076 Version: 2.0.7120

SDS ID: b231cca700000000 Name: SDS_10.11.184.40:7075 State: Disconnected, Decoupled IP: 10.11.187.40,10.11.188.40 Port: 7075 Version: 2.0.7120

------------------------

Many thanks,

Taizo

306 Posts

January 11th, 2017 23:00

Hi Taizo,

Thank you for the outputs. Can you please provide one more "scli --query_sds --sds_name " output for any disconnected SDS?

Pawel

16 Posts

January 12th, 2017 02:00

Hi Pawel,

Here are more command outputs. The commands executed, in order:

# scli --query_sds --sds_name (x2, randomly picked, because all SDSs are judged disconnected)

# scli --switch_mdm_ownership --new_master_mdm_ip

# scli --login --username admin --password ******* --mdm_ip

# scli --query_cluster

# scli --query_all_sds

# scli --query_all

You can see that, once the Master MDM is switched to a Slave MDM, only the SDSs on the node where the Master MDM was running and where ifdown was executed remain Disconnected and Decoupled.

# scli --query_sds --sds_name SDS_10.11.184.42:7073

SDS b231ccde0000001b Name: SDS_10.11.184.42:7073 Version: 2.0.7120
Protection Domain: 5fe8d81b00000000, Name: default
Fault Set: f51271b300000002, Name: fs4
DRL mode: Volatile
Authentication error: None
IP information (total 2 IPs):

         1: 10.11.187.42     Role: SDS Only

         2: 10.11.188.42     Role: SDC Only

        Port: 7073

RAM Read Cache information:

        128.0 MB (131072 KB) total size

        Cache is enabled

        RAM Read Cache memory allocation state is PENDING

Device information (total 3 devices):

         1: Name: N/A Path: /dev/sda  Original-path: /dev/sda  ID: 19253c80001b0000

                Storage Pool: defaultSP, Capacity: 498 GB Error-fixes: 0 scanned 0 MB, Compare errors: 0 State: Normal

         2: Name: N/A Path: /dev/sdc  Original-path: /dev/sdc  ID: 19253c81001b0001

                Storage Pool: defaultSP, Capacity: 498 GB Error-fixes: 0 scanned 0 MB, Compare errors: 0 State: Normal

         3: Name: N/A Path: /dev/sdj  Original-path: /dev/sdj  ID: 19253c82001b0002

                Storage Pool: defaultSP, Capacity: 498 GB Error-fixes: 0 scanned 0 MB, Compare errors: 0 State: Normal

Rfcache device information (total 0 devices):

Membership-state: Decoupled; Connection-state: Disconnected;

SDS-state: Normal; 0 devices with error; 0 Rfcache devices with errors

        1.5 TB (1496 GB) total capacity

        1.4 TB (1461 GB) unused capacity

        0 Bytes snapshots capacity

        32.5 GB (33280 MB) in-use capacity

        0 Bytes thin capacity

        1.4 TB (1461 GB) unreachable-unused capacity

        0 active moving-in forward-rebuild jobs

        0 active moving-in backward-rebuild jobs

        0 active moving-in rebalance jobs

        0 active moving-out forward-rebuild jobs

        0 active moving-out backward-rebuild jobs

        0 active moving-out rebalance jobs

        0 pending moving-in forward-rebuild jobs

        0 pending moving-in backward-rebuild jobs

        0 pending moving-in rebalance jobs

        0 pending moving-out forward-rebuild jobs

        0 pending moving-out backward-rebuild jobs

        0 pending moving-out rebalance jobs

        0 Bytes decreased capacity

        Primary-reads                            0 IOPS 0 Bytes per-second

        Primary-writes                           0 IOPS 0 Bytes per-second

        Secondary-reads                          0 IOPS 0 Bytes per-second

        Secondary-writes                         0 IOPS 0 Bytes per-second

        Backward-rebuild-reads                   0 IOPS 0 Bytes per-second

        Backward-rebuild-writes                  0 IOPS 0 Bytes per-second

        Forward-rebuild-reads                    0 IOPS 0 Bytes per-second

        Forward-rebuild-writes                   0 IOPS 0 Bytes per-second

        Rebalance-reads                          0 IOPS 0 Bytes per-second

        Rebalance-writes                         0 IOPS 0 Bytes per-second

# scli --query_sds --sds_name SDS_10.11.184.25:7076

SDS b231cca800000001 Name: SDS_10.11.184.25:7076 Version: 2.0.7120
Protection Domain: 5fe8d81b00000000, Name: default
Fault Set: f51271b500000004, Name: fs5
DRL mode: Volatile
Authentication error: None
IP information (total 2 IPs):

         1: 10.11.187.25     Role: SDS Only

         2: 10.11.188.25     Role: SDC Only

        Port: 7076

RAM Read Cache information:

        128.0 MB (131072 KB) total size

        Cache is enabled

        RAM Read Cache memory allocation state is PENDING

Device information (total 2 devices):

         1: Name: N/A Path: /dev/sdh  Original-path: /dev/sdh  ID: 19253c3f00010000

                Storage Pool: defaultSP, Capacity: 498 GB Error-fixes: 0 scanned 0 MB, Compare errors: 0 State: Normal

         2: Name: N/A Path: /dev/sdi  Original-path: /dev/sdi  ID: 19253c4000010001

                Storage Pool: defaultSP, Capacity: 498 GB Error-fixes: 0 scanned 0 MB, Compare errors: 0 State: Normal

Rfcache device information (total 0 devices):

Membership-state: Decoupled; Connection-state: Disconnected;

SDS-state: Normal; 0 devices with error; 0 Rfcache devices with errors

        998.0 GB (1021948 MB) total capacity

        974.0 GB (997372 MB) unused capacity

        0 Bytes snapshots capacity

        22.0 GB (22528 MB) in-use capacity

        0 Bytes thin capacity

        974.0 GB (997372 MB) unreachable-unused capacity

        0 active moving-in forward-rebuild jobs

        0 active moving-in backward-rebuild jobs

        0 active moving-in rebalance jobs

        0 active moving-out forward-rebuild jobs

        0 active moving-out backward-rebuild jobs

        0 active moving-out rebalance jobs

        0 pending moving-in forward-rebuild jobs

        0 pending moving-in backward-rebuild jobs

        0 pending moving-in rebalance jobs

        0 pending moving-out forward-rebuild jobs

        0 pending moving-out backward-rebuild jobs

        0 pending moving-out rebalance jobs

        0 Bytes decreased capacity

        Primary-reads                            0 IOPS 0 Bytes per-second

        Primary-writes                           0 IOPS 0 Bytes per-second

        Secondary-reads                          0 IOPS 0 Bytes per-second

        Secondary-writes                         0 IOPS 0 Bytes per-second

        Backward-rebuild-reads                   0 IOPS 0 Bytes per-second

        Backward-rebuild-writes                  0 IOPS 0 Bytes per-second

        Forward-rebuild-reads                    0 IOPS 0 Bytes per-second

        Forward-rebuild-writes                   0 IOPS 0 Bytes per-second

        Rebalance-reads                          0 IOPS 0 Bytes per-second

        Rebalance-writes                         0 IOPS 0 Bytes per-second

# scli --switch_mdm_ownership --new_master_mdm_ip 10.11.184.40

Successfully switched MDM ownership.

# scli --login --username admin --password ******* --mdm_ip 10.11.184.40

Logged in. User role is SuperUser. System ID is 794940494546e301

# scli --query_cluster --mdm_ip 10.11.184.40

Cluster:

Mode: 5_node, State: Normal, Active: 5/5, Replicas: 3/3

Master MDM:

ID: 0x4d93397412e89f21

IPs: 10.11.184.40, Management IPs: 10.11.184.40, Port: 9011

Version: 2.0.7120

Slave MDMs:

ID: 0x3c97b2f05c849180

IPs: 10.11.184.39, Management IPs: 10.11.184.39, Port: 9011

Status: Normal, Version: 2.0.7120

ID: 0x07c6d91032d07412

IPs: 10.11.184.41, Management IPs: 10.11.184.41, Port: 9011

Status: Normal, Version: 2.0.7120

Tie-Breakers:

ID: 0x7510028d2e11b564

IPs: 10.11.184.25, Port: 9011

Status: Normal, Version: 2.0.7120

ID: 0x2ebd3163386dc713

IPs: 10.11.184.42, Port: 9011

Status: Normal, Version: 2.0.7120

# scli --mdm_ip 10.11.184.40 --query_all_sds

Query-all-SDS returned 20 SDS nodes.

Protection Domain 5fe8d81b00000000 Name: default

SDS ID: b231ccde0000001b Name: SDS_10.11.184.42:7073 State: Connected, Joined IP: 10.11.187.42,10.11.188.42 Port: 7073 Version: 2.0.7120

SDS ID: b231ccdd0000001a Name: SDS_10.11.184.39:7073 State: Disconnected, Decoupled IP: 10.11.187.39,10.11.188.39 Port: 7073 Version: N/A

SDS ID: b231ccc700000016 Name: SDS_10.11.184.40:7073 State: Connected, Joined IP: 10.11.187.40,10.11.188.40 Port: 7073 Version: 2.0.7120

SDS ID: b231ccc600000015 Name: SDS_10.11.184.42:7076 State: Connected, Joined IP: 10.11.187.42,10.11.188.42 Port: 7076 Version: 2.0.7120

SDS ID: b231ccc500000014 Name: SDS_10.11.184.41:7074 State: Connected, Joined IP: 10.11.187.41,10.11.188.41 Port: 7074 Version: 2.0.7120

SDS ID: b231ccdc00000019 Name: SDS_10.11.184.25:7074 State: Connected, Joined IP: 10.11.187.25,10.11.188.25 Port: 7074 Version: 2.0.7120

SDS ID: b231ccdf0000001c Name: SDS_10.11.184.40:7076 State: Connected, Joined IP: 10.11.187.40,10.11.188.40 Port: 7076 Version: 2.0.7120

SDS ID: b231ccc200000011 Name: SDS_10.11.184.41:7076 State: Connected, Joined IP: 10.11.187.41,10.11.188.41 Port: 7076 Version: 2.0.7120

SDS ID: b231ccc100000010 Name: SDS_10.11.184.25:7075 State: Connected, Joined IP: 10.11.187.25,10.11.188.25 Port: 7075 Version: 2.0.7120

SDS ID: b231ccc00000000f Name: SDS_10.11.184.42:7075 State: Connected, Joined IP: 10.11.187.42,10.11.188.42 Port: 7075 Version: 2.0.7120

SDS ID: b231ccbf0000000e Name: SDS_10.11.184.40:7074 State: Connected, Joined IP: 10.11.187.40,10.11.188.40 Port: 7074 Version: 2.0.7120

SDS ID: b231ccbe0000000d Name: SDS_10.11.184.39:7075 State: Disconnected, Decoupled IP: 10.11.187.39,10.11.188.39 Port: 7075 Version: N/A

SDS ID: b231ccae00000007 Name: SDS_10.11.184.39:7076 State: Disconnected, Decoupled IP: 10.11.187.39,10.11.188.39 Port: 7076 Version: N/A

SDS ID: b231ccad00000006 Name: SDS_10.11.184.25:7073 State: Connected, Joined IP: 10.11.187.25,10.11.188.25 Port: 7073 Version: 2.0.7120

SDS ID: b231ccac00000005 Name: SDS_10.11.184.41:7075 State: Connected, Joined IP: 10.11.187.41,10.11.188.41 Port: 7075 Version: 2.0.7120

SDS ID: b231ccab00000004 Name: SDS_10.11.184.39:7074 State: Disconnected, Decoupled IP: 10.11.187.39,10.11.188.39 Port: 7074 Version: N/A

SDS ID: b231ccaa00000003 Name: SDS_10.11.184.42:7074 State: Connected, Joined IP: 10.11.187.42,10.11.188.42 Port: 7074 Version: 2.0.7120

SDS ID: b231cca900000002 Name: SDS_10.11.184.41:7073 State: Connected, Joined IP: 10.11.187.41,10.11.188.41 Port: 7073 Version: 2.0.7120

SDS ID: b231cca800000001 Name: SDS_10.11.184.25:7076 State: Connected, Joined IP: 10.11.187.25,10.11.188.25 Port: 7076 Version: 2.0.7120

SDS ID: b231cca700000000 Name: SDS_10.11.184.40:7075 State: Connected, Joined IP: 10.11.187.40,10.11.188.40 Port: 7075 Version: 2.0.7120

# scli --mdm_ip 10.11.184.40 --query_all

System Info:

        Product:  EMC ScaleIO Version: R2_0.7120.0

        ID:      794940494546e301

        Manager ID:      0000000000000000

License info:

        Installation ID: 10fa553e7ac3b791

        SWID:

        Maximum capacity: Unlimited

        Usage time
        Enterprise features: Enabled

        The system was activated 2 days ago

System settings:

        Capacity alert thresholds: High: 80, Critical: 90

        Thick volume reservation percent: 0

        MDM restricted SDC mode: disabled

        Management Clients secure communication: enabled

        TLS version: TLSv1.2

        User authentication method: Native

        SDS connection authentication: Enabled

Query all returned 1 Protection Domain:

Protection Domain default (Id: 5fe8d81b00000000) has 1 storage pools, 5 Fault Sets, 20 SDS nodes, 1 volumes and 8.2 TB (8440 GB) available for volume allocation

Operational state is Active

Rfcache enabled, Mode: Write miss, Page Size 64 KB, Max IO size 128 KB

Storage Pool defaultSP (Id: c3202db500000000) has 1 volumes and 8.2 TB (8440 GB) available for volume allocation

        The number of parallel rebuild/rebalance jobs: 2

        Rebuild is enabled and using No-Limit policy

        Rebalance is enabled and using No-Limit policy

        Background device scanner: Disabled

        Zero padding is disabled

        Spare policy: 25% out of total

        Checksum mode: disabled

        Doesn't use RAM Read Cache

        Doesn't use Flash Read Cache

        Capacity alert thresholds: High: 80, Critical: 90

SDS Summary:

        Total 20 SDS Nodes

        16 SDS nodes have membership state 'Joined'

        4 SDS nodes have membership state 'Decoupled'

        16 SDS nodes have connection state 'Connected'

        4 SDS nodes have connection state 'Disconnected'

        22.8 TB (23326 GB) total capacity

        21.1 TB (21648 GB) unused capacity

        0 Bytes snapshots capacity

        564.2 GB (577792 MB) in-use capacity

        0 Bytes thin capacity

        407.5 GB (417280 MB) protected capacity

        0 Bytes failed capacity

        52.2 GB (53504 MB) degraded-failed capacity

        52.2 GB (53504 MB) degraded-healthy capacity

        4.5 TB (4603 GB) unreachable-unused capacity

        0 Bytes active rebalance capacity

        0 Bytes pending rebalance capacity

        17.5 GB (17920 MB) active forward-rebuild capacity

        34.8 GB (35584 MB) pending forward-rebuild capacity

        0 Bytes active backward-rebuild capacity

        0 Bytes pending backward-rebuild capacity

        0 Bytes rebalance capacity

        52.2 GB (53504 MB) forward-rebuild capacity

        0 Bytes backward-rebuild capacity

        17.5 GB (17920 MB) active moving capacity

        34.8 GB (35584 MB) pending moving capacity

        52.2 GB (53504 MB) total moving capacity

        1.1 TB (1166 GB) spare capacity

        459.8 GB (470784 MB) at-rest capacity

        0 Bytes semi-protected capacity

        0 Bytes in-maintenance capacity

        0 Bytes decreased capacity

        Primary-reads                            0 IOPS 0 Bytes per-second

        Primary-writes                           0 IOPS 0 Bytes per-second

        Secondary-reads                          0 IOPS 0 Bytes per-second

        Secondary-writes                         0 IOPS 0 Bytes per-second

        Backward-rebuild-reads                   0 IOPS 0 Bytes per-second

        Backward-rebuild-writes                  0 IOPS 0 Bytes per-second

        Forward-rebuild-reads                    562 IOPS 562.6 MB (576102 KB) per-second

        Forward-rebuild-writes                   598 IOPS 598.6 MB (612966 KB) per-second

        Rebalance-reads                          0 IOPS 0 Bytes per-second

        Rebalance-writes                         0 IOPS 0 Bytes per-second

Volumes summary:

        1 thick-provisioned volume. Total size: 282.1 GB (288896 MB)

------------------

Many thanks

Taizo

306 Posts

January 12th, 2017 06:00

Hi Taizo,

It looks like your SDSs don't use the MDM network (10.11.184.x/24) at all, so the only network capable of talking to the MDMs is your backend network (10.11.187.x/24) - when you shut it down, the SDSs can no longer talk to the MDMs, as they can't use the SDC-only network (10.11.188.x) for this purpose.

I would suggest configuring additional IPs from the MDM network on the SDSs if you want to survive a backend network outage - for now, it is working as designed.
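As a quick verification after adding them, "scli --query_sds" (used earlier in this thread) should then list three IPs per SDS. A sketch of the expected IP information section - the addresses follow your outputs above, but the exact label printed for the combined role may differ:

IP information (total 3 IPs):
         1: 10.11.184.42     Role: All (SDS and SDC)
         2: 10.11.187.42     Role: SDS Only
         3: 10.11.188.42     Role: SDC Only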

In general, we wouldn't recommend using IP roles with only 2 NICs - you have a SPOF in each network. If you really have to use the roles, I would suggest using 4 NICs if possible (2 networks for SDC traffic, 2 others for SDS-SDS traffic, as sketched below) - this way you don't have a single point of failure.
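A sketch of that 4-NIC layout with example subnets (the second backend and frontend subnets are made up for illustration):

MDM IPs           - 10.11.184.0/24
SDS-SDS Only IPs  - 10.11.187.0/24, 10.11.190.0/24
SDS-SDC Only IPs  - 10.11.188.0/24, 10.11.191.0/24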

Hope it helps!

cheers,

Pawel

16 Posts

January 24th, 2017 15:00

Hi Pawel,

Thank you for the reply.

1. So the MDM IPs do not cover the communications between the MDM and the SDSs?

   *The Deployment Guide says "MDM IPs ... MDM control communications with SDSs and SDCs".

   Are the MDM IPs only for communications between MDMs (and between the MDM and SDCs)?

2. I am planning to alter the network topology as below.
    Would you give me any comments on what is right, wrong, or better?

  MDM Mgmt IP       - for CLI, GUI and REST API communications
                      e.g. 10.0.0.0/24

  MDM IPs           - for MDM-MDM and MDM-SDC communications?
                      should this be on the same network as the SDS-SDS Only IPs?
                      e.g. 10.0.1.0/24?

  SDS-SDS Only IPs  - for SDS-SDS (rebuild and rebalance) and MDM-SDS communications
                      should this be more than one IP, or one IP with interface teaming?
                      e.g. 10.0.1.0/24, 10.0.2.0/24

  SDS-SDC Only IPs  - MDM does not use these IPs at all
                      can this also have up to 8 IPs?
                      e.g. 10.1.0.0/24, (10.2.0.0/24?)

3. It looks like the maximum number of IP addresses per server between the MDM and SDSs is 8.

    As the SDS-SDS IPs are on the same network, does the SDS-SDS traffic (I mean rebuild and rebalance) also make use of all the networks?

    Can we also have multiple IPs, up to 8, for the SDS-SDC network, which would mean an SDC can wait on separate IP connections for a single SDS?

The current NIC configuration is just for this test environment; we will have 2 separate NICs for the SDS-SDC and SDS-SDS networks.

Anyway, thank you for your suggestion.

Many thanks,

Taizo

306 Posts

January 25th, 2017 06:00

Hi Taiz,

A1: MDM IPs can be used for communication with SDSs and SDCs; however, you don't have the MDM IP network configured on your SDSs at all - currently your SDSs are only on the SDS-SDS and SDS-SDC networks.

A2: Each interface should be on a separate network. If you are planning to use more than one NIC for a particular role (e.g. SDS-SDS), don't team them; rather, configure a different IP network on each and let ScaleIO do the load balancing - in most cases the performance is superior to NIC bonding. You shouldn't mix the MDM and SDS-SDS networks; each should be in a separate address space. I didn't quite get the last statement regarding "MDM does not use these IPs at all" - can you please rephrase it?

A3: Yes, you can have multiple networks for each role and ScaleIO will take care of spreading the load (regardless of whether it's backend or frontend traffic).
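For illustration, the IP information section of "scli --query_sds" for an SDS configured with two networks per role would look something like this (addresses are illustrative; the format follows the outputs earlier in this thread):

IP information (total 4 IPs):
         1: 10.0.1.1     Role: SDS Only
         2: 10.0.2.1     Role: SDS Only
         3: 10.1.0.1     Role: SDC Only
         4: 10.2.0.1     Role: SDC Only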

Cheers,

Pawel

16 Posts

January 26th, 2017 04:00

Hi Pawel,

Thank you for the reply.

To A1,

"You don't have MDM IP network configured on your SDS at all"

How can I have MDM IP network on my SDS?

CSV has been like below. (MDM IP, SDS-SDSm SDS-SDC are all listed) ``` Password,Operating System,Is MDM/TB,MDM IPs,Is SDS,SDS-SDS Only IPs,SDS-SDC Only IPs,SDS Device List,Is SDC ********,linux,Master,10.11.184.39,Yes-1,10.11.187.39,10.11.188.39,"/dev/sda,/dev/sdc,/dev/sdj",Yes

```

There seems to be no scli command to add an MDM IP to an SDS; "--add_sds_ip" can only add the roles SDS Only, SDC Only, and both.
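If there is a runtime equivalent, I would guess it is adding the MDM-network address with the combined role via "--add_sds_ip" - a sketch only; I have not verified these flag and role names against this version's "scli --help":

# hypothetical: add the 10.11.184.x address to an existing SDS with the combined SDS+SDC role
# scli --add_sds_ip --sds_name SDS_10.11.184.39:7073 --new_sds_ip 10.11.184.39 --sds_ip_role all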

To A2,

As you advised, I will configure each network on a separate interface.

It is good to know that ScaleIO does better than NIC bonding in most cases.

I am sorry - "MDM does not use these IPs at all" merely describes the SDS-SDC Only IPs.

(A line break was missing after "e.g. 10.0.1.0/24, 10.0.2.0/24" in my previous post, so that statement ended up attached to the wrong item.)

To A3,

Do I need a multipath setting on the SDC when I configure multiple networks for SDS-SDC?

To use Linux multipathing, do I need to edit a file like the one below?

/etc/udev/rules.d/20-scini-rules

Many thanks,

Taizo

306 Posts

January 26th, 2017 05:00

Hi Taiz,

Regarding 1) - try to configure the MDM IP in the "SDS All IPs" column; 2) - yes, that's our recommendation; 3) - no, you don't have to use multipathing - the SDC driver will do all the work for you; you should simply use the /dev/sciniX device.
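For 3), a quick way to see this on an SDC host is to list the scini block devices once a volume is mapped; a minimal sketch (the device name and the ls output details are examples):

# ls -l /dev/scini*
brw-rw---- 1 root disk ... /dev/scinia

You would then put a filesystem on /dev/scinia and use it like any other block device - no multipath configuration involved.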

Cheers,

Pawel

16 Posts

January 29th, 2017 23:00

Hi Pawel,

1)

So the "MDM IPs" and "SDS All IPs" columns come to have the same IP addresses.

In my understanding, "SDS All IPs" stands for both "SDS-SDS Only IPs" and "SDS-SDC Only IPs", as the GUI shows that an SDS installed with the "SDS All IPs" column eventually gains both "SDS-SDS Only IPs" and "SDS-SDC Only IPs".

I think the following two configurations are the same:

Configuration A:

MDM IPs          - 10.0.1.1
SDS All IPs      - 10.0.1.1
SDS-SDS Only IPs - 10.0.2.1
SDS-SDC Only IPs - 10.0.3.1

Configuration B:

MDM IPs          - 10.0.1.1
SDS-SDS Only IPs - 10.0.1.1, 10.0.2.1
SDS-SDC Only IPs - 10.0.1.1, 10.0.3.1
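For concreteness, Configuration A in the installer CSV columns from my earlier post would be something like this (one node shown; the "SDS All IPs" column name comes from your reply, and the column order here is only a guess):

```
Password,Operating System,Is MDM/TB,MDM IPs,Is SDS,SDS All IPs,SDS-SDS Only IPs,SDS-SDC Only IPs,Is SDC
********,linux,Master,10.0.1.1,Yes-1,10.0.1.1,10.0.2.1,10.0.3.1,Yes
```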

If that is the case, rebuild (rebalance) I/Os and App I/Os share the same network (10.0.1.1). I assume there should be no concern, because the SDC does load balancing between the 2 networks (10.0.1.1, 10.0.3.1) by monitoring the bandwidth being used on both.

For nodes which only have an SDC, we need to configure them just like this, right?

MDM IPs          - 10.0.1.1

SDS-SDC Only IPs - 10.0.3.1

3)

Totally clear, thank you.

Much appreciated,

Taizo

16 Posts

February 9th, 2017 01:00

Hi Pawel,

First, I configured the ScaleIO networks as below.

MDM Mgmt IP   - NW-A
MDM IP        - NW-B *newly added
SDS All IPs   - NW-B *newly added
SDS-SDS Only  - NW-C
SDS-SDC Only  - NW-D

I found that the MDM does not detect an MDM Mgmt failure.

So the Master MDM does not switch when the Master MDM's MDM Mgmt network is down, and REST communication from the Gateway and GUI keeps failing.

Second, I removed the MDM Mgmt IP.

MDM IP        - NW-A
SDS All IPs   - NW-A
SDS-SDS Only  - NW-C
SDS-SDC Only  - NW-D

I found that App I/O does not use NW-D.

As the SDS All IPs cover the SDS-SDC network, the connections between SDSs and SDCs are established using only NW-A.

So ifdown of all the NW-D interfaces causes no performance loss for App I/Os.

But ifdown of NW-A degrades App I/O performance and once in a while ends up in App I/O errors.

Third, I replaced the SDS All IPs with another set of SDS-SDS Only IPs.

MDM IP        - NW-A
SDS-SDS Only  - NW-A
SDS-SDS Only  - NW-C
SDS-SDC Only  - NW-D

The connections between SDSs and SDCs are established using only NW-D.

So ifdown of NW-A causes no performance loss for App I/Os.

And the SDSs stay alive even when either one of NW-A and NW-C is ifdowned on the Master MDM node.
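(For reference, a sketch of the test for that last point, using only commands already shown in this thread; "vlan3000" standing for the NW-A interface on the Master MDM node is just an example name:

# ifdown vlan3000
# scli --query_cluster
# scli --query_all_sds

All 20 SDSs should still report "Connected, Joined", since NW-C still carries the MDM-SDS and SDS-SDS traffic.)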

So far, I am thinking of going with the third configuration.

Any comments would be appreciated.

Best,

Taizo
