August 2nd, 2012 05:00

Memory/CPU/disk monitoring

Hi,

I have copied the infrastructure rules for CPU, memory and disk space monitoring and noticed that they (certainly the memory rule) alert whenever the figure changes. For example, we have thresholds at 90% and 95%, but I get an alert if the value changes from, say, 90.08% to 91.10%. Is there a way to alert only once while memory, CPU, etc. stay within a given band (i.e. 90% to 95%), and then send just one more alert when the value goes over 95%?

Any info appreciated.

Thanks

Davie

August 2nd, 2012 10:00

You can do it, but you have to look at it from a time perspective.

"Only alert once" changes to "Only alert once within a certain time period"

Three pieces of the puzzle:

  1. Rule condition is evaluated
  2. Event is recorded (rule returns 'true')
  3. Alert notification is sent

Two ways to look at it: 

  • Only send the alert notifications if the condition has occurred just once during the last x minutes, but still record the events each time the condition occurs.

          There is a thread here with some of the answer:  http://en.community.dell.com/techcenter/performance-monitoring/foglight-administrators/f/4788/t/19556657#55561

          Brian provided the starting point towards the answer, using

               if (getAlarmCount(Date startTime, Date endTime, String topologyObjectID) < 2) {send the alert}

          to see how many times something has fired in the desired time range, and only send the alert if it has fired fewer than twice.

  • Only record the event if the condition has occurred just once during the last x minutes. 

         

          This is more tricky, since you have to change your underlying rule condition to take the frequency of occurance into account,

          and only return 'true' after checking to make sure the rule has not already generated an event in the time period.

          you can still use the above approach, modified a bit:

          if ((whatever condition indicates an event) && (((getAlarmCount(Date startTime, Date endTime, String topologyObjectID)  < 2))) {true} else {false}

I have not tested this, and the code is not fully formed, just indicative of what needs to be done. 
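
To make the two approaches above concrete, here is a minimal, self-contained sketch in Python. It is not Foglight code: `AlarmLog`, `get_alarm_count`, `notify_first_only` and `record_first_only` are hypothetical names that only imitate the idea of counting alarms for one object inside a time window. One assumption to note: this stand-in counts only events that have already been recorded, so the second function compares against 1 rather than 2. Which bound is right in the real product depends on whether the current event is already included in the count, which is not verified here.

```python
from datetime import datetime, timedelta


class AlarmLog:
    """Hypothetical in-memory stand-in for the monitoring tool's alarm history."""

    def __init__(self):
        self._fired = []  # list of (timestamp, object_id) tuples

    def record(self, when, object_id):
        """Store one event for the given object."""
        self._fired.append((when, object_id))

    def get_alarm_count(self, start_time, end_time, object_id):
        """Count recorded events for object_id with start_time <= timestamp <= end_time."""
        return sum(
            1
            for when, oid in self._fired
            if oid == object_id and start_time <= when <= end_time
        )


def notify_first_only(log, now, object_id, value, threshold, window):
    """Approach 1: record every event, but notify only when this event
    is the only one recorded for the object within the window."""
    if value < threshold:
        return False
    log.record(now, object_id)
    return log.get_alarm_count(now - window, now, object_id) < 2


def record_first_only(log, now, object_id, value, threshold, window):
    """Approach 2: record (and report) the event only when no event has
    been recorded for the object within the window."""
    already_fired = log.get_alarm_count(now - window, now, object_id) >= 1
    if value >= threshold and not already_fired:
        log.record(now, object_id)
        return True
    return False
```

With a 10-minute window and a 90% threshold, a reading of 90.08% followed one minute later by 91.10% produces a single notification under either function. The difference is that the first approach keeps both readings in the history, while the second keeps only the first.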
