November 12th, 2015 20:00

Fault Alerts: "System is Down"/"System is Up" alerts displayed every day, Dell OME 2.1

Hi,

First, here is some information about our Dell OME server for reference:

1. Upgraded from version 1.2 to version 2.1 in the past week

2. Discovery Schedule and Inventory Schedule: every 1 day

3. Status Schedule: every 1.5 hours

4. In every discovery range, SNMP and WS-Man discovery are enabled

My question is: on our Dell OME server (upgraded from version 1.2 to 2.1 in the past week), two servers always display the "System is Down" alert after every scheduled inventory run, even though the servers are actually running normally. After a few minutes or a few hours, the "System is Up" alert is displayed.

Here is some information about these two servers:

1. Server A is located in our office (on the same network as the OME server), and Server B is located in a data center (other servers in that data center do not have the same issue).

2. In the discovery range configuration for these two servers, ICMP is set to Timeout: 1000 milliseconds, Retries: 2 attempts, and SNMP is set to Timeout: 4 seconds, Retries: 2 attempts.
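As a rough illustration of what these settings mean for a single probe, here is a minimal sketch of the worst-case wait before a probe is treated as failed. It assumes that "Retries" counts additional attempts after the first one and that each attempt waits the full timeout; that is an assumption about OME's behaviour, not something stated in its documentation here.

```python
def worst_case_wait(timeout_s: float, retries: int) -> float:
    """Upper bound, in seconds, before a probe is declared failed.

    Assumption: 'Retries' means extra attempts after the first one,
    and every attempt waits the full timeout before giving up.
    """
    attempts = 1 + retries
    return timeout_s * attempts


# ICMP: 1000 ms timeout, 2 retries
icmp = worst_case_wait(1.0, 2)
# SNMP: 4 s timeout, 2 retries
snmp = worst_case_wait(4.0, 2)
print(f"ICMP: {icmp:.1f} s, SNMP: {snmp:.1f} s")  # ICMP: 3.0 s, SNMP: 12.0 s
```

Under that assumption, a host would have to stay silent for about 3 seconds (ICMP) or 12 seconds (SNMP) before being reported as down.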

I have no idea why only these two servers keep receiving the fault alert. Can anyone offer some advice to help me fix this problem?

Thank you. 

November 13th, 2015 14:00

Hi L.Y.,

Thanks for the detailed post.

Here are a few things to try to isolate the problem:

1. Are there any log messages related to the affected servers under Logs > Application Logs in the OME UI after the scheduled inventory task runs?

2. Does increasing the ICMP timeout/retries help?

Also, please note: if your range includes only iDRACs, then only the WS-Man configuration should be enabled; SNMP is required for discovering a server through its in-band IP with OMSA.

Thanks,
Vijay


November 15th, 2015 17:00

Hi Vijay,

Thanks for your reply.

1. Yes, there is a log message, "Device name change has been detected (....)", but it relates to only one of the affected servers and is logged only after some scheduled inventory runs (i.e., not every day).

2. No. I tried increasing the timeout/retries before posting this question, but nothing changed.

Regarding the range, I would like to ask for your advice. Would you suggest that the inventory range include only iDRACs or only servers?
I found an article saying that there can be problems if the range includes both iDRAC and server IPs. Is that right? I ask because the log messages always show "Device name change".
If I change the range, which type would you suggest: only iDRACs or only servers?

Thank you,
Lavenie

November 15th, 2015 23:00

Hi Lavenie,

If your range includes both the iDRAC IP and the server IP, having both IPs in the same range with SNMP/WS-Man enabled is not a problem. Also, the device name change is not a problem in this case; it is just an informational message.

As I understand your setup, you are discovering each server through both its iDRAC IP and its server IP (which is perfectly fine). Can you try the following:

1. Run the ICMP test from the Troubleshooting Tool against both the iDRAC IP and the server IP of each affected server.

2. Run the SNMP test from the Troubleshooting Tool against the server IP of each affected server.

3. Run the WS-Man test from the Troubleshooting Tool against the iDRAC IP of each affected server.

Let us know the results.

Thanks,
Vijay


November 16th, 2015 00:00

Hi Vijay,

Noted. Thanks for your advice.

Please find the test results below:

ICMP Test:
Server A
Server IP - Test Result: Pingable, RoundTrip time(ms): 0
iDRAC IP - Test Result: Pingable, RoundTrip time(ms): 0

Server B
Server IP - Test Result: Pingable, RoundTrip time(ms): 5
iDRAC IP - Test Result: Not pingable

SNMP Test:
Server A - Test Result: The SNMP read request has failed for Get Community name - public
Server B - Test Result: The SNMP read request has failed for Get Community name - public

WS-MAN Test:
Server A - Test Result: Connected.
Server B - Test Result: Connected.

Thanks,
Lavenie

November 17th, 2015 14:00

Thanks for the details, Lavenie.

From the data, it looks like your discovery/inventory is happening only via the iDRAC IP, since your WS-Man tests pass and your SNMP tests fail. If SNMP discovery was working earlier, make sure the OMSA services on the targets are up and running.

Also, for Server B, the ICMP test against the iDRAC IP reports it as not pingable. In this case, OME will not attempt a WS-Man connection and will detect Server B as not reachable.
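The reasoning above can be sketched as a small decision function fed with the test results from this thread. This is a simplified, hypothetical model of the behaviour described in this reply (WS-Man is only attempted after the iDRAC answers ICMP; SNMP via the server IP needs OMSA), not OME's actual implementation. The names `ProbeResults` and `status` are illustrative only. Note that it only accounts for Server B; it does not explain the alerts on Server A.

```python
from dataclasses import dataclass


@dataclass
class ProbeResults:
    idrac_ping: bool   # ICMP test against the iDRAC IP
    idrac_wsman: bool  # WS-Man test against the iDRAC IP
    host_snmp: bool    # SNMP test against the server (in-band) IP


def status(p: ProbeResults) -> str:
    # Hypothetical model: a WS-Man check is attempted only if the
    # iDRAC answers ICMP first, so a passing WS-Man test is ignored
    # when the iDRAC is not pingable.
    if p.idrac_ping and p.idrac_wsman:
        return "Up"
    # The only other path is SNMP via the server IP, which needs OMSA.
    if p.host_snmp:
        return "Up"
    return "Down"


# Results reported earlier in this thread
server_a = ProbeResults(idrac_ping=True, idrac_wsman=True, host_snmp=False)
server_b = ProbeResults(idrac_ping=False, idrac_wsman=True, host_snmp=False)
print(status(server_a), status(server_b))  # Up Down
```

In this model, fixing either ICMP to Server B's iDRAC or SNMP/OMSA on its server IP would be enough for it to be reported as up.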

Thanks,
Vijay
