
136772

August 14th, 2012 13:00

vFoglight alerts to run

Here are the alerts we think are important; we turn the rest off:

  • CPU % Utilization – various measures – Fatal level only 
    • Cluster CPU Utilized > 60%
    • ESX CPU Utilized > 70%
  • VM % ready > 5%
  • VM and ESX Memory Swapping – any amount is bad. This is not OS-level swapping but swapping at the ESX host level, and it causes massive performance issues
  • ESX or VM Memory Balloon Deflation Failure (balloon memory not being released, which indicates an OS-level issue)
  • Cluster vRAM allocated > 100%
  • Cluster vRAM Utilized > 70%
  • Low Datastore Disk Space - a cluster datastore with less than 50 GB free is a real issue when you have to do a restore
  • Low VM Volume disk space – Fatal level only – thresholds still to be decided (for example, a C:\ drive with less free space than the VM's vRAM allocation)
  • VMware Tools not running/reporting - send alerts only once per day
  • ESX Hosts disconnected from VirtualCenter
  • ESX Host multipathing outage
  • Agent Data Collection Alerts - working on some self-healing capabilities for these
  • FMS Operational Alerts
  • Rule to clear any alert after 3 days - assuming you don't have people actively clearing them (shocking, I know)
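To make the thresholds above concrete, here is a minimal Python sketch of the same rules as plain checks. It is not vFoglight's rule engine or API: the metric keys (`esx_cpu_pct`, `datastore_free_gb`, `c_drive_free_gb`, `vram_gb` and so on), the `Alert` type, and the helper names are all hypothetical, and the C:\ vs. vRAM check is only the example threshold from the list. It also shows the once-per-day notification throttle and the 3-day auto-clear rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    name: str
    target: str
    raised_at: datetime


# Hypothetical metric keys; each limit is a "greater than" threshold from the list.
THRESHOLDS = {
    "cluster_cpu_pct": 60.0,
    "esx_cpu_pct": 70.0,
    "vm_ready_pct": 5.0,
    "cluster_vram_allocated_pct": 100.0,
    "cluster_vram_utilized_pct": 70.0,
}
MIN_DATASTORE_FREE_GB = 50.0
AUTO_CLEAR_AFTER = timedelta(days=3)
NOTIFY_INTERVAL = timedelta(days=1)


def check_thresholds(target: str, metrics: dict, now: datetime) -> list:
    """Return an Alert for every metric present in `metrics` that breaches its threshold."""
    alerts = []
    for key, limit in THRESHOLDS.items():
        value = metrics.get(key)
        if value is not None and value > limit:
            alerts.append(Alert(f"{key} > {limit:g}", target, now))
    free_gb = metrics.get("datastore_free_gb")
    if free_gb is not None and free_gb < MIN_DATASTORE_FREE_GB:
        alerts.append(Alert("datastore free < 50 GB", target, now))
    # Example VM volume rule (hypothetical): C: drive free space smaller than the VM's vRAM.
    c_free, vram = metrics.get("c_drive_free_gb"), metrics.get("vram_gb")
    if c_free is not None and vram is not None and c_free < vram:
        alerts.append(Alert("C:\\ free < vRAM", target, now))
    return alerts


def should_notify(last_sent: Optional[datetime], now: datetime) -> bool:
    """Throttle noisy alerts (e.g. VMware Tools) to at most one notification per day."""
    return last_sent is None or now - last_sent >= NOTIFY_INTERVAL


def expire_old(alerts: list, now: datetime) -> list:
    """Apply the 3-day auto-clear rule: keep only alerts younger than 3 days."""
    return [a for a in alerts if now - a.raised_at < AUTO_CLEAR_AFTER]
```

For example, a host snapshot of `{"esx_cpu_pct": 75.0, "datastore_free_gb": 40.0}` would raise two alerts: one for ESX CPU over 70% and one for the datastore under 50 GB free.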

I'm looking for others' suggestions for important things to monitor.


August 24th, 2012 15:00

Those are good. We'd also like to create an alarm on any disk swapping/paging, but I'm having a hard time with that one. Am I just missing it?


August 24th, 2012 15:00

Disk swapping/paging at the VMware level is the same as the Memory Swapping above. It occurs at the ESX host level when the host runs out of available RAM and starts using host disk to hold the less-used memory blocks.
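Because host-level disk swapping is the same thing as memory swapping, the alarm asked about above reduces to "any nonzero swap activity on a host". Here is a minimal Python sketch, assuming you already collect per-host swap-in and swap-out rates; the `SwapSample` type and its field names are hypothetical, not vFoglight's data model.

```python
from dataclasses import dataclass


@dataclass
class SwapSample:
    host: str
    swap_in_kbps: float   # hypothetical: host swap-in rate
    swap_out_kbps: float  # hypothetical: host swap-out rate


def hosts_swapping(samples: list) -> list:
    """Return the sorted, de-duplicated hosts showing any swap activity ("any is bad")."""
    return sorted({s.host for s in samples if s.swap_in_kbps > 0 or s.swap_out_kbps > 0})
```

The threshold is deliberately zero rather than a percentage, matching the "any is bad" rule in the original list.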

Swapping that occurs within the guest OS, such as normal paging to the Windows swap file, is not captured by vFoglight's agent. It can be monitored by the Infrastructure agent, but for the moment that takes more than a little additional effort to implement and maintain.
