Paul_Dwyer
12 Posts
0
7698
October 22nd, 2012 00:00
Pool and Volume latency data doesn't match in SanHQ
We're using some PS6000s (SATA and SAS) as a back end for ESXi, and I'm looking at both the graphical display in SAN HQ and the raw data from an export, but I'm not able to make sense of the latency information.
For most fields, the data in the Pool or Member column matches the sum of the volume parts very closely (IOPS, throughput, reads, writes, etc.).
Latency, however, does not, and it isn't anywhere near the average of the volumes either.
In the graphical display we have been noticing high (20-40ms) read latency on our pools and members, but the volumes that make up those pools are all under 10ms. During a storage vMotion we'll see a huge spike in IOPS and I/O sizes (KB), and latency will drop to nearly nothing during the copy, then bounce back to high levels as the traffic goes down.
This doesn't make a lot of sense to me (except maybe if sequential I/O is lower latency to process).
To monitor latency, it seems like I need to watch the highest-latency volume of a given pool rather than the pool itself, or I get skewed data.
I'd like to run some reports with MS Log Parser against the export CSVs, but I need to make sense of this first. Latency is a KPI that isn't making a lot of sense right now.
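For what it's worth, here is the rough shape of the report I'm after, sketched in Python rather than Log Parser just to show the calculation. The column names ("Pool", "Read IOPS", "Read Latency (ms)") are placeholders, not the actual SAN HQ export headers, so they would need to be adjusted to whatever the CSV really contains. The idea is an IOPS-weighted average of the volume latencies per pool instead of a straight sum:

import csv
from collections import defaultdict

# Placeholder column names - adjust to the real SAN HQ export headers.
POOL_COL = "Pool"
IOPS_COL = "Read IOPS"
LAT_COL = "Read Latency (ms)"

def weighted_pool_latency(path):
    """IOPS-weighted average read latency per pool from an exported CSV."""
    iops_sum = defaultdict(float)
    weighted_lat = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            pool = row[POOL_COL]
            iops = float(row[IOPS_COL] or 0)
            lat = float(row[LAT_COL] or 0)
            iops_sum[pool] += iops
            weighted_lat[pool] += iops * lat
    return {p: (weighted_lat[p] / iops_sum[p]) if iops_sum[p] else 0.0
            for p in iops_sum}

if __name__ == "__main__":
    for pool, lat in weighted_pool_latency("sanhq_export.csv").items():
        print(f"{pool}: {lat:.1f} ms IOPS-weighted read latency")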
DELL-Kenny K
685 Posts
0
October 23rd, 2012 16:00
When you see the IOPS go up and the latency go down, that is actually normal. It is referred to as Nagle's algorithm, which pretty much makes TCP packets wait until they're full before sending them. Here is a document that may be helpful; it explains the algorithm in more detail:
http://searchnetworking.techtarget.com/definition/Nagles-algorithm
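Purely to illustrate the mechanism (this isn't something you would run against the array, and the address and port below are placeholders), this is what turning Nagle's algorithm off looks like at the socket level in Python; with TCP_NODELAY set, small writes go out immediately instead of being coalesced into fuller packets:

import socket

# Nagle's algorithm is on by default for TCP: small writes are held back
# and coalesced until a full segment is available or the previous segment
# has been acknowledged.  Setting TCP_NODELAY disables that coalescing.
sock = socket.create_connection(("192.0.2.10", 3260))  # placeholder iSCSI target
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.sendall(b"small payload")  # sent immediately rather than buffered
sock.close()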
As for the reporting issue you are referring to, I would recommend making sure the latest FW is installed on all of the members in the group, as well as ensuring the latest version of SAN HQ is also installed. Just curious, do you happen to have replication configured? I look forward to hearing back from you.
Paul_Dwyer
12 Posts
0
October 23rd, 2012 19:00
Thank you, this matches what I'm seeing. The algo info makes good reading too!
Paul_Dwyer
12 Posts
0
October 23rd, 2012 22:00
Sorry, I forgot to reply to your questions.
Our SANs here are mainly for ESXi, with one for MS DPM.
SAN HQ is on the latest version; the DPM SANs are on 5.2.5 and the ESXi SANs are on 5.2.2.
SAN HQ is a little young though, I think. It has bugs and typos in its reports and columns missing from its exports. They should rebuild it on SQLite instead of those flat files; performance is a little lacking.
Christian Hanse
1 Rookie
•
62 Posts
0
October 24th, 2012 23:00
I've actually often wondered the same thing, about the whole "low latency on volumes, high latency on pools/members", and I opened a support case.
What the support engineer came back with was that the member latency, at least, appears to be a cumulative total of the read or write latency of all the volumes on the member. I'm not sure if that's correct or not, but it's the best I could get out of support.
And yes, performance gets pretty bad when you're handling larger logs in SAN HQ.
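If the member figure really is a cumulative total, that would explain the pattern in the original post: the sum of the per-volume latencies is always higher than any single volume, while an IOPS-weighted average stays in the same range as the volumes. A quick throwaway Python example with made-up numbers:

# Hypothetical per-volume read latencies (ms) and IOPS on one member.
volumes = [("vol1", 6.0, 800), ("vol2", 8.0, 1200), ("vol3", 9.0, 400)]

cumulative = sum(lat for _, lat, _ in volumes)            # 23.0 ms
weighted = (sum(lat * iops for _, lat, iops in volumes)
            / sum(iops for _, _, iops in volumes))        # 7.5 ms

print(f"Sum of volume latencies: {cumulative:.1f} ms")
print(f"IOPS-weighted average:   {weighted:.1f} ms")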
Origin3k
4 Operator
•
2.3K Posts
1
October 27th, 2012 09:00
If you're using ESXi and your normal workload is a low-IOPS one, consider disabling the "DelayedACK" feature in the vSphere software iSCSI initiator or target configuration. It's enabled by default and lets the ESXi host collect packets and wait until it has enough to send out as a whole.
This behaviour affects the statistics on the SAN HQ side.
A reboot of the host is needed when you change the parameter. Our read latency dropped from 30-70ms down to below 10ms in the graphs.
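For anyone curious what delayed ACK actually does at the TCP level, here is a small Python illustration. This is Linux-only and purely illustrative, not the ESXi procedure (that is changed in the software iSCSI initiator's advanced settings), and the target address is a placeholder:

import socket

# With delayed ACK, the receiver holds acknowledgements back briefly,
# hoping to piggyback them on return traffic.  Under a low-IOPS workload
# that waiting shows up as extra latency.  On Linux, TCP_QUICKACK asks
# the kernel to acknowledge immediately; the flag is not sticky, so it
# is typically re-set around each receive.
sock = socket.create_connection(("192.0.2.10", 3260))  # placeholder iSCSI target
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)  # Linux-only option
data = sock.recv(4096)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)  # re-arm after the read
sock.close()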
Regards
Joerg