April 19th, 2016 20:00

Issue with random IO speeds with more than one NIC

I'm running into a weird issue with ScaleIO 2.0 on Windows Server 2012 R2: three nodes, each with 3x 960GB SanDisk Extreme Pro SSDs.

With just a single network (1GbE, 10GbE, or 40Gb InfiniBand with IPoIB), I get the speeds I expect. For example, over 10GbE or IB I get around 40MB/s read and 20MB/s write on the random 4K test.

However, when I enable a second NIC on the MDM and SDCs (on a different subnet, with a different physical NIC and switch, all completely separate), the random 4K speeds drop to under 1MB/s read and write.

What is going on? If I remove either network, the speed goes back up.
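For anyone trying to reproduce this: a quick way to confirm both data NICs are actually carrying IO during a test is to watch the per-adapter byte counters, e.g. with PowerShell on 2012 R2 (the adapter names here are just the ones from my nodes):

Get-NetAdapterStatistics -Name "Ethernet 10G", "Ethernet 32G IPoIB" | Format-Table Name, ReceivedBytes, SentBytes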

April 21st, 2016 23:00

Hi,

How exactly is your second network configured? Are you using some kind of NIC teaming, or did you just add another NIC on a different network?

You mentioned you tried to add second IPs to the MDM and SDCs; what about the SDSs?

Can you please provide scli --query_all and scli --query_all_sds outputs?

Thanks!

Pawel

April 24th, 2016 20:00

Pawel

The second network is an additional NIC on a separate physical network. The networks involved are:

172.16.32.0/24 is the 1GbE management network, 1500 MTU

10.0.212.0/24 is the 10GbE network, 9000 MTU

10.0.213.0/24 is the IPoIB network over 40Gb (32Gb/s data rate) InfiniBand; I get around 15-20Gb/s over IPoIB
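As a sanity check, the IP MTU each interface is actually using can be listed with netsh:

netsh interface ipv4 show subinterfaces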

The three nodes (hv1, hv2, hv3) are identically configured hardware-wise and run Windows Server 2012 R2 Hyper-V in a hyperconverged setup, with ScaleIO SDS and SDC on every node.

If I use any one of the networks individually, I get speeds like these:

Seq Q32T1: 1937MB/s read, 1103MB/s write

Random 4K Q32T1: 216MB/s read, 116MB/s write

Seq: 941MB/s read, 430MB/s write

Random 4K: 45MB/s read, 19MB/s write

If I use both at the same time (or even add in the 1GbE network), speeds for Random 4K tank:

Seq Q32T1: 2344MB/s read, 997MB/s write

Random 4K Q32T1: 240MB/s read, 139MB/s write

Seq: 475MB/s read, 402MB/s write             <-- about 1/2 speed read

Random 4K: 1MB/s read, 1MB/s write           <-- significant drop

Info as requested:

Query-all-SDS returned 3 SDS nodes.

Protection Domain 526c578f00000000 Name: default

SDS ID: b0acd0a100000002 Name: hv1 State: Connected, Joined IP: 10.0.213.1,10.0.212.1 Port: 7072 Version: 2.0.5014

SDS ID: b0acd0a000000001 Name: hv2 State: Connected, Joined IP: 10.0.213.2,10.0.212.2 Port: 7072 Version: 2.0.5014

SDS ID: b0acd09f00000000 Name: hv3 State: Connected, Joined IP: 10.0.213.3,10.0.212.3 Port: 7072 Version: 2.0.5014

System Info:

  Product:  EMC ScaleIO Version: R2_0.5014.0

  ID:     

  Manager ID:      0000000000000000

License info:

  Installation ID:

  SWID:

  Maximum capacity: Unlimited

  Usage time Enterprise features: Enabled

  The system was activated 55 days ago

System settings:

  Capacity alert thresholds: High: 80, Critical: 90

  Thick volume reservation percent: 0

  MDM restricted SDC mode: disabled

  Management Clients secure communication: disabled

  TLS version: TLSv1.2

  User authentication method: Native

  SDS connection authentication: Disabled

Query all returned 1 Protection Domain:

Protection Domain default (Id: 526c578f00000000) has 1 storage pools, 0 Fault Sets, 3 SDS nodes, 2 volumes and 1.6 TB (1624 GB) available for volume allocation

Operational state is Active

Rfcache enabled, Mode: Write miss, Page Size 64 KB, Max IO size 128 KB

Storage Pool default (Id: 4612372000000000) has 2 volumes and 1.6 TB (1624 GB) available for volume allocation

  The number of parallel rebuild/rebalance jobs: 2

  Rebuild is enabled and using Limit-Concurrent-IO policy with the following parameters:

  Number of concurrent IOs per device: 1

  Rebalance is enabled and using Limit-Concurrent-IO policy with the following parameters:

  Number of concurrent IOs per device: 1

  Background device scanner: Disabled

  Zero padding is disabled

  Spare policy: 34% out of total

  Checksum mode: disabled

  Uses RAM Read Cache

  RAM Read Cache write handling mode is 'cached'

  Doesn't use Flash Read Cache

  Capacity alert thresholds: High: 80, Critical: 90

SDS Summary:

  Total 3 SDS Nodes

  3 SDS nodes have membership state 'Joined'

  3 SDS nodes have connection state 'Connected'

  7.8 TB (8038 GB) total capacity

  3.2 TB (3305 GB) unused capacity

  0 Bytes snapshots capacity

  2.0 TB (2000 GB) in-use capacity

  70.0 MB (71680 KB) thin capacity

  2.0 TB (2000 GB) protected capacity

  0 Bytes failed capacity

  0 Bytes degraded-failed capacity

  0 Bytes degraded-healthy capacity

  0 Bytes unreachable-unused capacity

  0 Bytes active rebalance capacity

  0 Bytes pending rebalance capacity

  0 Bytes active forward-rebuild capacity

  0 Bytes pending forward-rebuild capacity

  0 Bytes active backward-rebuild capacity

  0 Bytes pending backward-rebuild capacity

  0 Bytes rebalance capacity

  0 Bytes forward-rebuild capacity

  0 Bytes backward-rebuild capacity

  0 Bytes active moving capacity

  0 Bytes pending moving capacity

  0 Bytes total moving capacity

  2.7 TB (2732 GB) spare capacity

  2.0 TB (2000 GB) at-rest capacity

  0 Bytes semi-protected capacity

  0 Bytes in-maintenance capacity

  0 Bytes decreased capacity

  Primary-reads                            0 IOPS 0 Bytes per-second

  Primary-writes                           15 IOPS 108.0 KB (110592 Bytes) per-second

  Secondary-reads                          0 IOPS 0 Bytes per-second

  Secondary-writes                         17 IOPS 158.4 KB (162201 Bytes) per-second

  Backward-rebuild-reads                   0 IOPS 0 Bytes per-second

  Backward-rebuild-writes                  0 IOPS 0 Bytes per-second

  Forward-rebuild-reads                    0 IOPS 0 Bytes per-second

  Forward-rebuild-writes                   0 IOPS 0 Bytes per-second

  Rebalance-reads                          0 IOPS 0 Bytes per-second

  Rebalance-writes                         0 IOPS 0 Bytes per-second

Volumes summary:

  1 thick-provisioned volume. Total size: 1000.0 GB (1024000 MB)

  1 thin-provisioned volume. Total used size: 35.0 MB (35840 KB)

Thanks for the help!

April 24th, 2016 23:00

Thank you; can you also paste the "ipconfig" output from one of the nodes, and try to ping one node from another using jumbo frames?

e.g., from 10.0.213.1:

ping -f -l 9000 10.0.213.2

ping -f -l 9000 10.0.213.3

ping -f -l 9000 10.0.212.2

ping -f -l 9000 10.0.212.3

April 25th, 2016 18:00

Windows IP Configuration

Ethernet adapter Ethernet 32G IPoIB:

   Connection-specific DNS Suffix  . :

   IPv4 Address. . . . . . . . . . . : 10.0.213.1

   Subnet Mask . . . . . . . . . . . : 255.255.255.0

   Default Gateway . . . . . . . . . :

Ethernet adapter Ethernet 10G:

   Connection-specific DNS Suffix  . :

   IPv4 Address. . . . . . . . . . . : 10.0.212.1

   Subnet Mask . . . . . . . . . . . : 255.255.255.0

   Default Gateway . . . . . . . . . :

Ethernet adapter vEthernet (VM) #1:

   Connection-specific DNS Suffix  . :

   IPv4 Address. . . . . . . . . . . : 172.16.32.70

   Subnet Mask . . . . . . . . . . . : 255.255.255.0

   Default Gateway . . . . . . . . . : 172.16.32.1

However, that ping will fail: -l sets the ICMP payload size, so 9000 doesn't account for the 28 bytes of IP and ICMP headers. With the NIC MTU set at 9014 (the driver setting counts the 14-byte Ethernet header, i.e. a 9000-byte IP MTU), I can ping on the 212 10GbE subnet at 8972 bytes; the switch is set to 9216.
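In other words, the pings that actually go through on the 10GbE subnet use the largest payload that fits (9000 - 28 = 8972):

ping -f -l 8972 10.0.212.2

ping -f -l 8972 10.0.212.3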

On IPoIB the MTU is 4092, which is an IPoIB best practice, so I can't do an 8972-byte ping there.
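The equivalent maximum payload there would be 4092 - 28 = 4064:

ping -f -l 4064 10.0.213.2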

I also tried lowering the MTU to 1514 on all interfaces, and I still end up with very slow random 4K results when using any two interfaces, for example the 1GbE and 10GbE interfaces (on different switches) both at 1514.
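If anyone wants to reproduce the MTU change from the shell, it can be set per interface with netsh; note it takes the IP MTU (so 1500 rather than 1514), and the interface name below is just one of mine:

netsh interface ipv4 set subinterface "Ethernet 10G" mtu=1500 store=persistent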

Very odd issue.
