August 15th, 2014 08:00

Difficulty doing efficient large IO through PowerPath 5.7 under Linux to VNX5300

There seems to be an internal limit of 1 MB when performing large IO through PowerPath v5.7.x.  Can this limit be raised to 2 MB via some configuration setting?

Background.

VNX5300 configured to do large IO ... specifically (4+1) RAID 5 using a special stripe element size of 1024 sectors (512 KB), yielding a 2 MB full-stripe size.  600 GB 15K RPM disks are being used.
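As a quick sanity check on the geometry above (the arithmetic only, nothing array-specific):

```shell
# Stripe geometry arithmetic from the text above:
# element = 1024 sectors x 512 bytes = 512 KB;
# (4+1) R5 has 4 data disks, so full stripe = 4 x 512 KB = 2048 KB (2 MB).
element_kb=$(( 1024 * 512 / 1024 ))   # 512
full_stripe_kb=$(( 4 * element_kb ))  # 2048
echo "element=${element_kb}KB full_stripe=${full_stripe_kb}KB"
```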

We are trying to configure the host to allow full-stripe reads and writes.  IBM GPFS is being used, and volumes and IO are 2 MB aligned.

We understand that you need to change

     /sys/block/emcpower{n}/queue/max_sectors_kb

to a larger value to prevent the Linux block layer from breaking up the IO before it reaches PowerPath.


And you also need to change the child paths:

     /sys/block/sd{n}/queue/max_sectors_kb
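A minimal sketch of applying both settings. The device names (emcpowera, sdb, sdc) are placeholders for this example; substitute the pseudo device and child paths reported by "powermt display dev=all":

```shell
#!/bin/sh
# Raise max_sectors_kb on the pseudo device AND on every child path;
# the smallest limit anywhere along the stack is the one that wins.
set_max_sectors_kb() {
  root=$1; kb=$2; shift 2          # root parameter lets this be tested
  for dev in "$@"; do
    f="$root/block/$dev/queue/max_sectors_kb"
    [ -e "$f" ] && echo "$kb" > "$f"
  done
}

# On a live host, run as root with the real device names:
set_max_sectors_kb /sys 2048 emcpowera sdb sdc
```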


There are other Linux block layer parameters that you may also want to change (such as setting the scheduler to noop, so Linux does not interfere with PowerPath's IO scheduling), but that is not the concern here.


We are experienced Linux dm-multipath users, and are running RANDOM IO as large as 8 MB under dm-multipath.  We know we need to change /sys/block/{multipath-pseudo-device}/queue/max_sectors_kb and /sys/block/{sd-xx-name}/queue/max_sectors_kb, AND enable large scatter/gather lists in the Fibre Channel driver.  For Emulex, this is the "options lpfc lpfc_sg_seg_cnt=512" setting in the /etc/modprobe.d/local.conf file.  This scatter/gather setting is visible in /sys/class/scsi_host/host{n}/sg_tablesize.


Setting lpfc_sg_seg_cnt=512 enables 8 MB IO through the Emulex 8.3+ driver under RHEL 6.x.  We are doing this with non-EMC storage.
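For reference, the Emulex setting and its verification look like this (paths as given above; the lpfc module must be reloaded, and on RHEL the initramfs may also need rebuilding, before the option takes effect):

```shell
# Contents of /etc/modprobe.d/local.conf:
#   options lpfc lpfc_sg_seg_cnt=512

# Verify the resulting scatter/gather table size on each HBA port:
cat /sys/class/scsi_host/host*/sg_tablesize
```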


When we use EMC PowerPath to manage a VNX5300, we get some unexpected behavior.


We can set max_sectors_kb for the emcpower and "sd" devices to 2 MB, and we can validate that 2 MB transfers are happening, but performance is POOR: about half of what is expected, with extra-long service times.


If we lower max_sectors_kb for the emcpower and "sd" devices to 512 KB, the IO transfer size drops to 512 KB as expected and overall performance goes UP, but more CPU is consumed and the disks appear busier.


Playing with the sizes, setting a value of 1024 KB to 1920 KB yields the best tradeoff ... with the IO transfer size being 1024 KB.  We can max out the throughput of the 2 x 8 Gbit FC controllers using 1 MB IO transfer sizes.

Settings of 1025 to 1920 yield a 1 MB transfer size and are efficient.  There seems to be some rounding down to a 1 MB boundary.
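Our empirical model of this behavior (an observation from our measurements, not documented PowerPath semantics) is a round-down to a 1 MB multiple once max_sectors_kb reaches 1024:

```shell
#!/bin/sh
# Observed transfer size as a function of max_sectors_kb (empirical
# model only): below 1024 KB the value is used as-is; at or above
# 1024 KB it appears to round down to a 1 MB (1024 KB) multiple.
observed_transfer_kb() {
  kb=$1
  if [ "$kb" -ge 1024 ]; then
    echo $(( kb / 1024 * 1024 ))
  else
    echo "$kb"
  fi
}

observed_transfer_kb 512    # 512  (efficient)
observed_transfer_kb 1920   # 1024 (efficient)
observed_transfer_kb 2048   # 2048 (half the performance)
```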

Settings above 1920 ... all the way to 8 MB (with the file system configured to a 2 MB block size, and doing no more than 2 MB IO) ... yield half the performance.  It looks like the 2 MB IO coming from the SCSI block layer above PowerPath is mishandled within PowerPath.


If we shield PowerPath from seeing a 2 MB IO size by lowering max_sectors_kb below 2 MB, PowerPath is happy to use 1 MB IO transfers.


I'm wondering if there is an /etc/modprobe.d/powerpath.conf parameter that needs to be increased to enable EFFICIENT 2 MB IO.  Note: we CAN issue what appears to be 2 MB IO transfers, but they occur with much lower throughput.


We are using the Linux "iostat" tool to monitor the IO statistics on each individual child path, and then aggregating the statistics from the child paths of a given LUN with proper weighting to recompute the average IO sizes and the statistical service and response times.  These monitoring tools have been used for years on non-PowerPath systems.
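The aggregation math is straightforward: sum the IO rates and KB rates across child paths, and weight per-path latency by IO rate.  A sketch with made-up sample numbers (two child paths; fields: name, IO/s, KB/s, await in ms):

```shell
#!/bin/sh
# Aggregate child-path statistics for one LUN. The numbers below are
# illustrative only; on a real host they come from iostat -x output.
# avg IO size = sum(KB/s) / sum(IO/s); await is weighted by IO/s.
printf '%s\n' \
  'sdb 100 51200 8.0' \
  'sdc 100 51200 8.0' |
awk '{ ios += $2; kb += $3; wait += $2 * $4 }
     END { printf "avg_kb=%.0f await_ms=%.1f\n", kb/ios, wait/ios }'
```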


Is there some additional setting needed to allow PowerPath 5.7x to efficiently handle IO greater than 1MB, after the max_sectors_kb Linux block setting has already been increased for the emcpower* pseudo device, the sd* child paths, and the FC driver's scatter/gather list setting?


Thank you for your help.


Dave B


