Unsolved

December 8th, 2015 07:00

ScaleIO VMware Poor Performance

We had good success with ScaleIO running on dedicated CentOS 7 bare-metal servers.

The environment is 4 nodes running CentOS 7, dual 10Gb attached, with local SSD JBOD storage.

SDC client performance is great and the systems are very stable under different workloads.

We have not been successful running the SDS within VMware in a hyper-converged fashion.

We take the exact same system as above and re-install it as follows:

The environment is the SAME 4 nodes, this time running ESX 6.0u1a, dual 10Gb attached, with local SSD JBOD storage.


The hardware is the same and the network is the same; we just replace the OS with ESX.

Each node with LSI Controller Direct JBOD Mode (Controller cache is bypassed)

2 MDMs running as CentOS 7 guest OS on ESX local storage

1 TB (Tie-Breaker) running as CentOS 7 guest OS on ESX local storage

4 SDSs running as CentOS 7 guest OS on ESX local storage (we followed the SIO performance tuning guide)

8 SSDs dedicated to SIO in JBOD mode (2 per ESX server)

4 SDCs via the ESXi kernel VIB driver, mapping SIO datastores from the SSD pool

A simple guest VM install on the SIO datastores shows heavily impacted performance.

Doing a simple Linux dd bandwidth test on a guest VM using the SIO shared storage, the performance is not consistent: sometimes the results are somewhat OK (500MB/s), sometimes performance is very bad (20MB/s). Yes, we are purging the Linux FS cache before each run.
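For reference, each run looked roughly like the following (paths and sizes here are illustrative, not the exact commands used):

# purge the Linux FS cache before each run (as root)
sync && echo 3 > /proc/sys/vm/drop_caches

# sequential write to a file on the ScaleIO-backed filesystem
dd if=/dev/zero of=/mnt/sio/testfile bs=1M count=4096 conv=fdatasync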

The system performance degrades tremendously and is inconsistent; when you run top or iotop you can see high wait time on the SDSs (yes, we are using the noop scheduler, and all SIO datastores are thick-provisioned lazy-zeroed).
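For completeness, the scheduler setting referred to above is the standard sysfs knob inside the guests (device name is an example):

cat /sys/block/sdb/queue/scheduler          # the active scheduler is shown in brackets, e.g. [noop]
echo noop > /sys/block/sdb/queue/scheduler  # switch to noop if it is not already active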

It has been 3 days of trials... we are unable to pinpoint where the problem is.

Your assistance is appreciated.

Thank you!

12 Posts

December 9th, 2015 05:00

We have the same problem with a VMware installation.

If we add direct mode to the dd command, the IOPS are:

dd if=/dev/zero of=/mnt/ssd/t1 bs=4k oflag=direct

virtual disk: busy 97%, write iops 1200, write 5 MB/s

dd if=/dev/zero of=/mnt/ssd/t1 bs=64k oflag=direct

virtual disk: busy 97%, write iops 600, write 40 MB/s

dd if=/dev/zero of=/mnt/ssd/t1 bs=512k oflag=direct

virtual disk: busy 98%, write iops 150, write 70 MB/s

That's the maximum we can get with 3 SSDs and a 10Gbit network.

December 10th, 2015 00:00

First of all, I can refer you to our fine-tuning doc to make sure all of this is set, and then re-test your performance:

https://support.emc.com/docu55407_Fine-Tuning_ScaleIO_1.3X_Performance.pdf?language=en_US


But besides this document, there are other non-networking factors that can affect performance:

Is read cache enabled or disabled?

How many CPUs were assigned to each SVM?

Lazy zero policy affects the performance as well.

So you can also check whether some or all of these help:

Turning the cache off

Trying to write to already-zeroed areas (see the sketch after this list)

Assigning 8 CPUs to each SVM

Exposing the SSDs to the SVMs via direct mapping
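On the lazy-zero point, one quick way to take first-write zeroing out of the picture is to test against an eager-zeroed disk. A minimal sketch (datastore path and size are placeholders):

# create a test VMDK that is zeroed up front instead of on first write
vmkfstools -c 40G -d eagerzeroedthick /vmfs/volumes/sio_datastore/test/test.vmdk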

23 Posts

December 11th, 2015 06:00

Hi, thank you for the suggestions!

Yes, we are following the performance tuning guide; the results are poor both before and after tuning.

Networking is rock solid; the 10Gb switch is dedicated to this environment, and nothing else is loading the network.


Is read cache enabled or disabled? <- Cache is disabled.

How many CPUs were assigned to each SVM? <- 8 vCPUs (but it is not CPU bound).

Lazy zero policy affects the performance as well. <- Yes, correct.

SSDs can be exposed to SVMs in a direct mapping. <- We tried mapping the SSDs as datastores or via RDM, with the same result.

Can you please confirm that you have a ScaleIO environment in your lab running within ESX with stable throughput over 1GB/s?

Our system is able to deliver that kind of performance when not running ESX; it is when we virtualize ScaleIO that the problem occurs.

Thank you for your help.

23 Posts

December 11th, 2015 06:00

Some reference (positive) test results from the same environment running non-virtualized bare-metal CentOS 7 ScaleIO, with a Windows 2012 R2 SDC client.

Consecutive dd runs back to back; performance is very good and predictable...

dd.exe if=/dev/zero of=f:\bigfile bs=4096 count=1000024

1000024+0 records in

1000024+0 records out

4096098304 bytes (4.1 GB) copied, 3.56679 s, 1.1 GB/s

dd.exe if=/dev/zero of=f:\bigfile bs=8092 count=1000024

1000024+0 records in

1000024+0 records out

8092194208 bytes (8.1 GB) copied, 7.40389 s, 1.1 GB/s

dd.exe if=/dev/zero of=f:\bigfile bs=12288 count=1000024

1000024+0 records in

1000024+0 records out

12288294912 bytes (12 GB) copied, 11.1811 s, 1.1 GB/s

23 Posts

December 11th, 2015 06:00

Correct, that seems like similar behavior to our results.

We use exactly the same hardware and config, but running CentOS 7 on bare metal, and performance is great!

It sounds like something related to how VMware handles high throughput passing blocks through the kernel stack; we see high CPU wait time on the SDS VMs...

In contrast, running the same workload on bare metal, non-virtualized, the CPU is hardly busy.

How long have you had this problem? Have you been able to work it out or pinpoint it?

Thank you for helping!

23 Posts

December 11th, 2015 06:00

We have been struggling with this problem for 4 days now. It all started when we imported VM templates into a freshly installed 4-node ESXi 6 cluster with ScaleIO datastores.

The copy/import performance is very slow; a 4GB VM took 4 to 5 minutes.

That triggered a set of additional tests, like installing the vCenter Server 6 appliance directly on the ScaleIO datastore shared with all ESXi hosts; the installation never finished and had to be aborted.

Doing the same test on a local (non-ScaleIO) SSD-attached datastore completed successfully; performance was acceptable.

Yes, you are correct; we understand about FIO and IOmeter. We used dd as a simple way to point out sequential IO problems. The point is that we ran dd multiple times on the ScaleIO datastores and the results were very inconsistent: not a 10% or 20% delta, but a 500% to 600% delta between runs. That is obviously not good!

Let's talk about how the environment is architected in more detail (an scli cross-check sketch follows the node listing):

ScaleIO version EMC-ScaleIO-XXX-1.32-2451.4.el7.x86_64

Node 1:

Xeon Dual 8 Core Hyper-Threaded 32GB RAM

LSI 3108 SAS 12Gb in JBOD Mode (cache is disabled)

2 x SAS SSDs JBOD (Dedicated for ScaleIO)

Intel X540 Dual Port 10Gb

ESXi 6.0u1a latest (SDC driver installed in the ESXi kernel)

Inside ESXi Node 1 Guest VM Centos 7 latest for MDM01 8vCPU 8GB RAM VMXNET3 NIC

Inside ESXi Node 1 Guest VM Centos 7 latest for SDS01 8vCPU 8GB RAM VMXNET3 NIC (2 x SSDs JBOD as 2 x VMFS5 Datastores using Paravirtual Controller dedicated for ScaleIO)

1 x VMFS5 Datastore multi-mapped volume from ScaleIO SSD Pool to all 4 ESXi servers

Node 2:

Xeon Dual 8 Core Hyper-Threaded 32GB RAM

LSI 3108 SAS 12Gb in JBOD Mode (cache is disabled)

2 x SAS SSDs JBOD (Dedicated for ScaleIO)

Intel X540 Dual Port 10Gb

ESXi 6.0u1a latest (SDC driver installed in the ESXi kernel)

Inside ESXi Node 2 Guest VM Centos 7 latest for MDM02 8vCPU 8GB RAM VMXNET3 NIC

Inside ESXi Node 2 Guest VM Centos 7 latest for SDS02 8vCPU 8GB RAM VMXNET3 NIC (2 x SSDs JBOD as 2 x VMFS5 Datastores using Paravirtual Controller dedicated for ScaleIO)

1 x VMFS5 Datastore multi-mapped volume from ScaleIO SSD Pool to all 4 ESXi servers

Node 3:

Xeon Dual 8 Core Hyper-Threaded 32GB RAM

LSI 3108 SAS 12Gb in JBOD Mode (cache is disabled)

2 x SAS SSDs JBOD (Dedicated for ScaleIO)

Intel X540 Dual Port 10Gb

ESXi 6.0u1a latest (SDC driver installed in the ESXi kernel)

Inside ESXi Node 3 Guest VM Centos 7 latest for TB01 8vCPU 8GB RAM VMXNET3 NIC

Inside ESXi Node 3 Guest VM Centos 7 latest for SDS03 8vCPU 8GB RAM VMXNET3 NIC (2 x SSDs JBOD as 2 x VMFS5 Datastores using Paravirtual Controller dedicated for ScaleIO)

1 x VMFS5 Datastore multi-mapped volume from ScaleIO SSD Pool to all 4 ESXi servers

Node 4:

Xeon Dual 8 Core Hyper-Threaded 32GB RAM

LSI 3108 SAS 12Gb in JBOD Mode (cache is disabled)

2 x SAS SSDs JBOD (Dedicated for ScaleIO)

Intel X540 Dual Port 10Gb

ESXi 6.0u1a latest (SDC driver installed in the ESXi kernel)

Inside ESXi Node 4 Guest VM Centos 7 latest for SDS04 8vCPU 8GB RAM VMXNET3 NIC (2 x SSDs JBOD as 2 x VMFS5 Datastores using Paravirtual Controller dedicated for ScaleIO)

1 x VMFS5 Datastore multi-mapped volume from ScaleIO SSD Pool to all 4 ESXi servers
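For completeness, here is the cross-check sketch mentioned above: the layout can be verified from the primary MDM with the standard scli queries (log in first; 'admin' is a placeholder user):

scli --login --username admin
scli --query_cluster
scli --query_all_sds
scli --query_all_sdc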

Like we said before, when we run a simple copy test onto the shared ScaleIO datastore, performance is not stable.

We take the same configuration as above and, instead of installing ESXi on bare metal, we install CentOS 7 on bare metal; the performance is rock solid and stable, and throughput is over 1.2GB/s on every test run.

We really want this ESXi ScaleIO configuration to work, as it will bring us additional benefits!

Thank you David!

23 Posts

December 14th, 2015 10:00

This weekend we did additional testing; the problem seems to be related to ESX networking, but unfortunately it is still not resolved.

As a baseline, we did a network throughput test from one Linux bare-metal server to another Linux bare-metal server.

With the same 10Gb network devices/switches and HW config as before, network performance is excellent:
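The test itself is essentially dd piped through ncat. Roughly (addresses, port, and block size are inferred from the output below):

# on the receiving server
ncat -l 8100 > /dev/null

# on the sending server
dd if=/dev/zero bs=10M count=1024 | ncat 10.1.1.112 8100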


Ncat: Version 6.40 ( http://nmap.org/ncat )

Ncat: Connected to 10.1.1.112:8100.

1024+0 records in

1024+0 records out

10510925824 bytes (11 GB) copied, 10.5302 s, 998 MB/s

Ncat: 10510925824 bytes sent, 0 bytes received in 10.53 seconds.

Now we do the same test from one of the Linux bare-metal servers above to one of the ESXi (5.5U3b) servers' VMkernel interfaces.

Please note we are testing performance to the ESXi VMkernel, not to a virtual machine within ESX.

The 10Gb performance is very low in comparison with the prior test...

Ncat: Version 6.40 ( http://nmap.org/ncat )

Ncat: Connected to 10.1.1.84:8100.

1024+0 records in

1024+0 records out

10510925824 bytes (11 GB) copied, 51.3748 s, 205 MB/s

9:12:34am up  1:17, 536 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.01, 0.02, 0.02

   PORT-ID  USED-BY            TEAM-PNIC  DNAME       PKTTX/s   MbTX/s    PKTRX/s   MbRX/s  %DRPTX  %DRPRX
  33554433  Management         n/a        vSwitch0       0.00     0.00       0.00     0.00    0.00    0.00
  33554434  vmnic0             -          vSwitch0   69581.29    35.04  131394.83  1517.66    0.00    0.00
  33554435  Shadow of vmnic0   n/a        vSwitch0       0.00     0.00       0.00     0.00    0.00    0.00
  33554436  vmk0               vmnic0     vSwitch0   69581.09    35.04       3.55     0.00    0.00    0.00
  50331649  Management         n/a        vSwitch1       0.00     0.00       0.00     0.00    0.00    0.00
  50331650  vmnic1             -          vSwitch1       0.00     0.00  131396.21  1517.69    0.00    0.00
  50331651  Shadow of vmnic1   n/a        vSwitch1       0.00     0.00       0.00     0.00    0.00    0.00
  50331653  vmk1               vmnic1     vSwitch1       0.00     0.00   11711.85  1457.40    0.00    0.00
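For anyone reproducing this, the counters above come from the esxtop network view; MbTX/s and MbRX/s are megabits per second per port.

esxtop    # then press 'n' to switch to the network view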

We attempted to troubleshoot the performance by upgrading the stock ESXi Intel X540 driver to

ixgbe-3.21.4-1710123

And

ixgbe-4.1.1.1-1331820-3187103

We tried the same scenario against an ESXi 6.0u1a host; the performance issue repeats.

The performance issue is still unresolved.

Please advise: what could be wrong with the ESXi networking?

12 Posts

December 18th, 2015 11:00

We have the same issue with ESXi 10G networking.

You can try setting "VM Options > Advanced > Latency Sensitivity" to High for every SDS VM (you need to reserve all of the VM's memory, and this option will also reserve CPU).

I know this is not the best solution, but we got 8Gb/s VM-to-VM speed after this.

PS. I'll be happy to find other ways to get good VM-to-VM speed in VMware...
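For reference, the same setting can usually also be applied per SDS VM as an advanced configuration parameter in the .vmx (a sketch based on VMware's latency-sensitivity documentation; verify the parameter on your build, and the full memory reservation is still required):

sched.cpu.latencySensitivity = "high"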

12 Posts

January 14th, 2016 11:00

Is anybody there? I have more news on this topic.

We have a ScaleIO cluster on bare-metal CentOS 7 (3 nodes with 1 SSD disk each); the network is 1Gbit.

On a VMware VM with CentOS 7 I set up two disks:

- the first uses the scini SDC (native Linux client)

- the second uses the VMware SDC; I then create a VMFS datastore and place a virtual disk there (I tried RDM for this device without VMFS and got the same results)

After that I run fio: fio --name=testfile --readwrite=randread --time_based --runtime=10 --direct=1 --numjobs=1 --size=256M --bs=4k

Results:

- First variant: 1800 IOPS

- Second variant: 70 IOPS

As we can see, the Linux SDC is more than 20 times faster than the VMware SDC.

PS. All SDC recommendations from the fine-tuning ScaleIO performance guide are applied (except jumbo frames, but that is the same for both disks).

51 Posts

January 15th, 2016 02:00

Just one thing, guys: from what I saw, you are only doing sequential testing.

When you do sequential IO on a bare-metal host it stays sequential, but if you do sequential IO in a VM, VMware will transform the load into a random pattern.

So you can never match bare-metal performance in a VM, and in the end you can't really compare the two.

What you can compare is a VM running on KVM on bare-metal CentOS vs. a VM in VMware.

And by the way, check the ScaleIO performance monitor while you are running the test.

12 Posts

January 15th, 2016 02:00

Have you read my last post? Or did you just write something for the sake of writing?

I am analyzing ONLY the client software (SDC).

I am using a RANDOM READ workload!

And as I wrote, we use SSD disks.

Do you think 70 IOPS is good for ANY sort of load on an SSD?

60 Posts

January 17th, 2016 01:00

Hi,

Let's perform a few more tests.

First, let's test your network. For this, please run fio on the ESX SDC:

fio --name=testfile --readwrite=randread --time_based --runtime=10 --direct=1 --numjobs=3 --iodepth=16 --size=256M --bs=64k
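One note on this command: with fio's default synchronous ioengine, iodepth stays effectively at 1, so a variant that actually keeps 16 IOs outstanding per job would be (assuming libaio is available in the guest):

fio --name=testfile --readwrite=randread --time_based --runtime=10 --direct=1 --numjobs=3 --ioengine=libaio --iodepth=16 --size=256M --bs=64k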

Then, please run a device test on one of the SDS nodes using the ScaleIO scli, and post the results of both tests here.


Thanks.


-Alex.

12 Posts

January 18th, 2016 03:00

fio --name=testfile --readwrite=randread --time_based --runtime=10 --direct=1 --numjobs=3 --iodepth=16 --size=256M --bs=64k


on ESX sdc: 350 IOPS

    lat (msec) : 2=23.26%, 4=36.78%, 10=16.80%, 20=10.94%, 50=12.06%

    lat (msec) : 2=25.73%, 4=35.97%, 10=16.74%, 20=9.24%, 50=12.32%

    lat (msec) : 2=24.72%, 4=37.25%, 10=16.05%, 20=9.18%, 50=12.79%


on linux sdc: 1389 IOPS

    lat (msec) : 2=88.74%, 4=7.22%, 10=0.12%, 20=0.60%, 50=0.78%

    lat (msec) : 2=87.24%, 4=8.37%, 10=0.29%, 20=0.66%, 50=1.24%

    lat (msec) : 2=89.85%, 4=7.14%, 10=0.20%, 20=0.59%, 50=0.65%


A reminder here: all these tests are made on the SAME VM (a Linux CentOS 7 VM) on the same ESX host!

The first disk is ScaleIO using the VMware SDC, with a VMFS datastore on it and a VMDK virtual disk on that datastore.

The second disk is ScaleIO using the Linux SDC directly in the VM.

There cannot be a network difference between these tests!


PS. I think we are hitting the bandwidth limit in the Linux SDC test (91MB/s on a 1G network).

PPS. We are nowhere near the bandwidth limit in the VMware SDC test (22MB/s on a 1G network).
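For reference, the throughput follows directly from IOPS × block size:

Linux SDC:   1389 IOPS × 64 KiB ≈ 91 MB/s   (roughly the practical ceiling of a 1Gbit link)
VMware SDC:   350 IOPS × 64 KiB ≈ 22-23 MB/s (nowhere near the network limit)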


60 Posts

January 18th, 2016 05:00

Can you please post the output of: cat /etc/vmware/esx.conf | grep scini
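For context, the same scini module parameters can also be listed with esxcli (a sketch; the exact parameter names depend on the ScaleIO build):

esxcli system module parameters list -m scini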

60 Posts

January 18th, 2016 05:00

There is a problem with this string. It shouldn't cause any performance issues; however, you've configured only a single MDM IP for the SDC, while you have to configure the IPs of all MDMs. Otherwise, in case of a switchover between the MDMs, you are going to experience DU (data unavailability).

Sorry for asking more and more questions, however:

1. Did you reboot your ESX hosts after changing the scini parameters?

2. Please post the output of scli --query_all_sds, scli --query_all_sdc, and scli --query_cluster.

3. What is the IP of the Linux SDC and the ESX SDC?
