Unsolved

July 7th, 2015 00:00

OME Status Flapping

Hi,

We are managing around 20 Dell servers with OME 2.0.1.222. We are seeing an increasing number of alerts reporting that a server is in an Unknown status, which then changes back to Normal within a few seconds.

Over the past 12 hours there have been 40+ alerts of this type. The issue seems to have gotten worse since the weekend, when we fully patched the servers.

It looks to me as though the servers are taking longer to respond with a status than OME expects, hence the alert.

This has to be resolved, however, because with this many alerts being generated, a real problem could be missed.
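
To put a number on the flapping, a rough sketch like the one below could separate short Unknown-to-Normal flaps from longer outages in a list of status changes. The `(timestamp, host, status)` tuple layout, the host names, and the 60-second window are all assumptions made for illustration; this is not an OME export format or an OME API.

```python
from datetime import datetime, timedelta


def find_flaps(events, window=timedelta(seconds=60)):
    """Return (host, unknown_time, normal_time) for every Unknown status
    that went back to Normal within `window`.

    `events` is an iterable of (timestamp, host, status) tuples.
    This layout is hypothetical, not something OME produces as-is.
    """
    flaps = []
    pending = {}  # host -> time of its most recent unresolved Unknown
    for when, host, status in sorted(events):
        if status == "Unknown":
            pending[host] = when
            continue
        # Any other status closes the pending Unknown for this host.
        started = pending.pop(host, None)
        if status == "Normal" and started is not None and when - started <= window:
            flaps.append((host, started, when))
    return flaps


# Made-up sample data: one 5-second flap and one 30-minute outage.
events = [
    (datetime(2015, 7, 7, 9, 0, 0), "r410-01", "Unknown"),
    (datetime(2015, 7, 7, 9, 0, 5), "r410-01", "Normal"),
    (datetime(2015, 7, 7, 9, 15, 0), "r710-02", "Unknown"),
    (datetime(2015, 7, 7, 9, 45, 0), "r710-02", "Normal"),
]

flaps = find_flaps(events)
for host, started, ended in flaps:
    print(host, "flapped for", (ended - started).total_seconds(), "seconds")
```

Only the first host is reported here, because the second one stayed Unknown for longer than the window. Counting flaps per host this way would show whether the noise is limited to a few machines.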

Any ideas?

3 Apprentice

 • 

2.8K Posts

July 8th, 2015 12:00

Thanks for the post. A few questions:

1. Are they all the same server model?  What model?

2. Did you discover via WSMan and iDRAC or in-band (OMSA+SNMP)?

3. Did you change the status poll time settings from the default of 1 hour?

4. Did you modify the ICMP timeout/retry or the WSMan timeout/retry?

Thanks,

Rob

delltechcenter.com/ome

1 Rookie

 • 

11 Posts

July 8th, 2015 23:00

Hi,

Thanks for your response. Please see the answers to your questions below.

1) We have a mix of R410, R710 and R520. It is only happening on the R410 and R710 (so iDRAC 6, not 7).

2) They are discovered via OMSA/

3) Yes, we have reduced this to 15 minutes

4) No

Kind Regards,

3 Apprentice

 • 

2.8K Posts

July 9th, 2015 08:00

Ok,

So for these servers, as you say, be sure to use OMSA + SNMP with the IP address of the host. You don't need to include the IP address of the iDRAC in the discovery range for these. You might try excluding the iDRAC IP address if you haven't done so already.

I forgot to ask how many devices you have discovered, and how many cores and how much RAM the OME box has.

The Dell Troubleshooting tool on the OME desktop may be useful to run the SNMP test against some of the dodgy boxes just to confirm you are getting good results.

We may need a support ticket to get a closer look at these systems.

Thanks,

Rob

1 Rookie

 • 

11 Posts

July 13th, 2015 00:00

Hi Rob,

The OME box has 4GB of RAM (53% used), and 4 cores (1% usage). We are monitoring a total of 20 servers, and the iDRAC ranges are not being discovered.

Over the past couple of days it has calmed down. Nothing has been changed or rebooted, so I'm not sure what is happening...

Thanks,
