January 3rd, 2005 11:00
OMSA causing server to reboot
Hello All
We've tried OMSA on our PE 1750 servers (RHEL 3.0) several times in the past and always had problems running it on anything but plain vanilla installations: machines with updated kernels were rebooting without warning once or twice per week.
We had Dell support look into the matter, and while they acknowledged the problem, they couldn't provide a solution other than to wait for the next version of OMSA (their best guess was a problem with esm.o and the updated kernel) and see if that fixed things. Well, that time is now, and things *have* changed: our PE 2400s have started crashing and rebooting as well...!
Talked to support - they had no idea what might cause it. Removing OMSA fixes the problem immediately.
Has anybody experienced something similar, and is there a way to remedy the situation? Or can anyone suggest a different tool for remote health checks of the servers? All I need is something that can alert me if the CPU temperature becomes critical or a RAID container loses redundancy; anything else is non-critical.
And please don't tell me to contact support again - although they're very friendly and supportive, I don't think they're able to solve this. The problem started more than a year ago and since we're pretty much alone with this problem they're most likely not going to spend resources on it.
Best regards,
Anders C. Madsen
Golden Planet
HWYSTR
50 Posts
0
January 18th, 2005 13:00