April 14th, 2016 14:00
SAN HQ Alert 4.2 in-use space on volume
I have recently been reconfiguring a client EQL group (1 PS6100, 1 PS4000). As part of cleaning up and modernizing an older ESXi environment, VMs were storage vMotioned from the PS4000 to the PS6100, and the PS4000 was reset, reconfigured, and added to the "new" group started with the PS6100.
The PS6100 is RAID 10 in its own storage pool; the PS4000 is RAID 6 in its own storage pool. The PS6100 has a single 5.5 TB thin-provisioned volume (total available is 5.75 TB); the reconfigured PS4000 has a single 4 TB thin volume (total available is 5.66 TB).
About halfway through storage vMotioning the VMs from the PS4000 to the PS6100, SAN HQ began reporting 4.2 in-use space warnings on the PS6100 volume, even though in Group Manager the reported in-use space for the volume and pool was only around 2 TB with the rest marked free. Once the PS4000 was reconfigured, its volume created, and about 1.1 TB migrated back, that volume began throwing 4.2 errors as well.
SAN HQ reports 4.2 errors and shows both volumes as currently over 99% full, even though Group Manager shows them as much less full: the PS6100 is at about 70% full, and the PS4000 volume is only 30% full (its pool is only 20% full).
My understanding of thin volumes with VMware is that they only grow as space is consumed by the upstream filesystem (e.g., VMware VMFS) and only "shrink" if you manually UNMAP them (VMware) or run some other operating system that issues UNMAP automatically.
That being the case, none of my thin EQL LUNs has ever shown more than 70% space utilization, and some have never exceeded 25%, yet SAN HQ is throwing 4.2 errors. Why?
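(As a rough way to check the upstream-filesystem consumption I am describing, something like this on an ESXi host lists each VMFS datastore's size and free space; the names and figures will obviously be environment-specific:
esxcli storage filesystem list
The Group Manager and SAN HQ figures above are the array-side in-use numbers.)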



cyberswb
April 14th, 2016 16:00
Thanks, it's made me crazy at more than one location.
Rular
September 18th, 2018 07:00
Same behaviour with SAN HQ 3.4
SAN HQ is reporting that the volume is 96.4% in use, but vCSA reports only 80% in use.
dwilliam62
September 18th, 2018 09:00
Hello,
Unless UNMAP is done on a non-replicated VMFS volume, it is expected that the in-use space seen by the array (and thus SAN HQ) will be different from what the OS itself reports. When a file is deleted on a VMFS volume, the OS does not notify the storage device that those blocks are now free, so over time the SAN will report a larger in-use amount than the OS does.
You can manually run UNMAP on VMFS v5.x and later datastores as long as they are not configured for EQL replication.
This will bring the in-use amounts seen by SANHQ closer to what the OS reports.
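For example, from an ESXi host (the datastore label below is just a placeholder):
esxcli storage vmfs unmap --volume-label=MyDatastore
By default this works through the datastore's free space in 200-block passes (roughly 200 MB at a time with a 1 MB block size).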
Regards,
Don
Rular
September 18th, 2018 23:00
Thanks! I had an outdated picture of UNMAP in my mind, where running UNMAP could cause problems with your datastore and UNMAP had been pulled back by VMware.
But now UNMAP seems safe to use, and with VMFS6 datastores UNMAP can run as a background process with low priority.
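For example, on a VMFS6 datastore (the label is a placeholder) the background reclaim settings can be checked and adjusted with something like:
esxcli storage vmfs reclaim config get --volume-label=MyVMFS6
esxcli storage vmfs reclaim config set --volume-label=MyVMFS6 --reclaim-priority=low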
dwilliam62
September 19th, 2018 06:00
Hello,
You are very welcome.
VMFS UNMAP via the esxcli command is safe to use.
VMFS v6 auto reclaim does not work with SANs that have an UNMAP granularity larger than 1 MB, which the EQL does, so you will still have to run UNMAP manually on EQL VMFS datastores.
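If in doubt, you can confirm the device still supports UNMAP at all with something like (the naa ID is a placeholder for your EQL volume):
esxcli storage core device vaai status get -d naa.xxxxxxxxxxxxxxxx
and check that Delete Status shows as supported.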
Regards,
Don
Bullshevic
April 29th, 2019 12:00
Hi dwilliam,
I am facing this exact problem where UNMAP will not run automatically. Your post is the only one I have found that describes the EQL's UNMAP granularity being bigger than 1 MB. Is there any Dell documentation that mentions that size or specifically says it is bigger and non-modifiable?
dwilliam62
April 29th, 2019 18:00
Hello,
I'm sorry, there is no official document. It has been 15 MB since day one, and there is absolutely no way to modify that.
ESXi v6.7U2 has a code change that allows larger granularities. Manual UNMAP using the esxcli command still works.
Regards,
Don
Bullshevic
April 30th, 2019 12:00
Don, thanks for your quick response!
I will have to stick with the manual esxcli command, it seems. Here is another curve ball. I just received a call back from VMware support, and the tech confirmed the granularity mismatch and the auto UNMAP not running in that situation. However, he discouraged us from running the manual unmap command on VMFS6 volumes as it can potentially lead to some corruption in the long run. He recommended reformatting them as VMFS5 to benefit from the command "esxcli storage vmfs unmap -u ". What's your experience or knowledge about that with an EQL PS6100X?
Bullshevic
April 30th, 2019 13:00
Hi Don,
I was puzzled by his answer as well. I have manually executed the UNMAP command without issues for the past 4 days; I have been troubleshooting the auto-UNMAP, but in the meantime I have been using the manual command without problems. The only reason the VMware engineer provided is this KB:
https://kb.vmware.com/s/article/2057513
Right under "Resolution" it states: "Note: If you are using VMFS6 in ESXi 6.5, this article is not applicable."
It doesn't say anything about causing issues, and from the document it seems they assume auto-UNMAP is guaranteed with VMFS6; they don't seem to take into consideration that some SANs, like the EQL, do not match the granularity. Let me know what you think.
dwilliam62
April 30th, 2019 13:00
Hello,
That is something I have never heard of before or personally seen.
I am very confused by it, though. In my personal opinion, here is how it works:
The command creates a hidden file based on the number of units specified; the default is 200 units at 1 MB each, so a 200 MB file.
The file is then deleted, and the OS sends the SCSI UNMAP commands to the array, which include the starting LBA and the number of LBAs to sequentially UNMAP after that.
So after the hidden file is deleted, the rest is up to the storage device.
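For example, something like this (the datastore label is a placeholder) uses larger passes, i.e. 1200 blocks x 1 MB = roughly 1.2 GB per temporary file instead of the default 200 MB:
esxcli storage vmfs unmap --volume-label=MyDatastore --reclaim-unit=1200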
I would ask them for more clarification. Is there a VMware KB stating this?
Maybe it happened on a particular storage device?
VMware ESXi 6.7U2 does allow auto UNMAP on storage with a granularity greater than 1 MB.
Regards,
Don
Bullshevic
April 30th, 2019 13:00
I guess this is the only other case I can find about corruption?
https://www.reddit.com/r/vmware/comments/7f0n9h/datastore_corruption_with_equallogic_and_vmfs6/
dwilliam62
April 30th, 2019 15:00
Hello,
What firmware are you running on your EQL storage?
Don
Bullshevic
May 1st, 2019 05:00
It's a PS6100X running the latest firmware to date: V10.0.2 (R465844).
The ESXi hosts are all running the latest updates from VMware for 6.5: VMware ESXi, 6.5.0, 13004031.
dwilliam62
May 1st, 2019 07:00
Thank you.
Well, you could set up a test datastore and then run UNMAPs over and over.
The Reddit thread was over a year old.
Don