April 16th, 2016 09:00
Experiencing subpar performance installing VMs onto SSD pool.
I have a 6-node cluster of servers that are all configured nearly identically. I've installed ScaleIO 2.0 and I'm getting fairly poor performance, or at least lower than I would expect from a solid-state storage pool.
Each node is a Supermicro board with an LSI 2308 in IT mode, two E5-2670 CPUs, and 64GB of RAM. The OS is Server 2012 R2. The configuration is hyperconverged: SDS and SDC run on every node, and the SSD pool is exposed to Failover Clustering as a Cluster Shared Volume.
The primary SAN network is 10 gigabit, on a Quanta LB6M switch, and each node has a dual-port ConnectX-3 EN card. I've configured jumbo frames on the switch and the NICs and confirmed them with pings to the appropriate IPs.
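For what it's worth, this is roughly how I verified jumbo frames from each node (the address below is just a placeholder for a peer's storage-network IP; 8972 bytes is a 9000-byte MTU minus the 28 bytes of IP/ICMP headers):

    ping -f -l 8972 192.168.10.12

With -f (don't fragment) set, the ping fails with "Packet needs to be fragmented but DF set" anywhere the path MTU isn't actually 9000, so a clean reply end to end is what I took as confirmation.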
With the six SSDs together in a storage pool and the resulting volume added to Server 2012 Failover Clustering as a Cluster Shared Volume, my 'test' is to install a VM onto the solid-state pool. I figured that, at a minimum, I'd get somewhere near the write speed of a single disk during an install, which is roughly 480 MB/s. Unfortunately, when I start installing an OS into a VM I see at best around 130 MB/s of write bandwidth, with each node contributing about 15-20 MB/s. Pretty weak stuff. Maybe 1-2K IOPS if I'm lucky. I'm using thin provisioning.
I've referred to EMC's performance tuning guide, but I don't think an updated version for 2.0 has been released yet, so the 1.32 guide is outdated in a few ways. First, the "num_of_io_buffers" option appears to have been removed from the SCLI and I can't find it in the GUI, so I assume it has been folded into the SDS/MDM/SDC performance profile setting in the GUI, which I've set to "High" for all nodes/services.
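In case it matters, I believe the SCLI equivalent of that GUI setting looks something like the line below. This is from memory, and I actually applied the profile through the GUI, so treat the exact flags as approximate rather than verified 2.0 syntax:

    scli --set_performance_parameters --all_sds --all_sdc --apply_to_mdm --profile high_performance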
Second, the guide says to configure the SDS conf file with special parameters if it's running on flash-based storage; when I did that, the SDSs were no longer able to connect to the MDMs, so that obviously doesn't work as written. There's also a recommendation to remove AckDelay from the Tcpip key in the registry, but that no longer applies to Server 2012 R2. The last recommendation is to change the values of some DWORDs under the 'scini' registry key; they weren't there for me, so I added them, but I'm not sure it made any difference.
I'm hoping someone out there has a similar setup or has troubleshot similar performance issues before and can point me in the right direction, because I'm really not sure what else to set or do. When I remove an SSD from the pool/cluster and test it with something like CrystalDiskMark, I get the speeds advertised by the manufacturer. The SSDs in question are 960GB SanDisk CloudSpeed Ascends.
I mentioned this to some folks who said it's possible the SSDs aren't optimized for synchronous writes. I'm not entirely sure how ScaleIO works or whether its writes are synchronous; some of what I've read suggests they are, but again, I'm not sure that's the issue. I would have expected decent write performance regardless, since ScaleIO is distributed block storage and, as I understand it, spreads writes across all of the drives.
Thanks for any insight.
Dajinn
April 22nd, 2016 08:00
Hi David,
I did use fio, but the way it was recommended to me was to run it in synchronous, 100% write mode, so the depth and length variables were both set to 1 (on Windows). I ran it against an SIO volume and saw pretty low IOPS, but again, that was probably because the depth and length were only 1. If ScaleIO SIO volumes are not synchronous, then I understand your concern that this wasn't the right benchmark to run. If you have any general suggestions for threads/depth/length for a fio test, I'd be happy to run it against an SIO volume and report the results.
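For example, is something along these lines closer to what you'd want to see? The file path is just a placeholder for a test file on the SIO volume, and the depth and job counts are guesses on my part:

    fio --name=sio-randwrite --ioengine=windowsaio --thread --direct=1 --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 --size=10G --runtime=60 --time_based --group_reporting --filename=E\:\fio-test.dat

If you'd rather see a sequential large-block run or different depths, just say so and I'll adjust.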
As for the OS install, I'm installing from an ISO file that resides on local storage; the local storage in question is an Intel S3500 boot drive in each node.
Thanks,
Chris