1 Rookie
47 Posts
May 9th, 2017 20:00
Equallogic PS6100
Earlier I was having issues writing to disk in my VMware environment and received an email notification from SANHQ stating that both the active and standby controllers had restarted. I also received a caution: "Controller failed over in member". When I look at the boot time of both controllers in the GUI, they both show the same boot time. Does this mean that both controllers rebooted at the same time, and if so, what conditions would cause this?



phil435
1 Rookie
47 Posts
May 10th, 2017 02:00
Thanks Donald. I am working with my Dell team to get support renewed on this unit so that I can open a case. I am working on replacing the unit with a Compellent array, but I need support coverage until then.
phil435
1 Rookie
47 Posts
May 10th, 2017 03:00
Firmware is 7.0.7 (R397085)
Can you tell from the messages below whether the unit did a complete reboot? At around 8:20 all of my iSCSI initiators logged in to each target, and this happened again around 9:12. I assume that when a controller fails over, I will see the initiators reconnect to each target. There is only one member in this group, and I was concerned when I saw the "member is now active in the group" message.
Info 5/9/2017 9:12:22 PM Sheldon1 18.2.0 Group member Sheldon1 now active in the group.
Warning 5/9/2017 8:24:08 PM Sheldon1 28.3.51 | 28.3.29 | 28.3.53 Warning health conditions currently exist. Correct these conditions before they affect array operation. | Active control module cannot communicate with secondary control module. Failover cannot occur. | There are 1 outstanding health conditions. Correct these conditions before they affect array operation.
Info 5/9/2017 8:24:08 PM Sheldon1 28.2.20 Control module has been installed in slot 1.
Info 5/9/2017 8:20:25 PM Sheldon1 18.2.0 Group member Sheldon1 now active in the group.
phil435
1 Rookie
47 Posts
May 10th, 2017 06:00
Can you tell me where in Group Manager the following message can be found, so we can confirm it's the same issue?
While running firmware version 7.0.x, unexpected controller failovers might have occurred at 248 consecutive days of uptime. After the failover, the following message was displayed: ERROR:: 15.4.3:NVRAM contains valid data. This is a WATCHDOG RECOVERY due to the watchdog on a NetBSD processor.
Origin3k
4 Operator
2.3K Posts
May 10th, 2017 07:00
Ah... we monitor the uptime of our devices, so we are able to check whether it happened after 248 days. Without that, I'm not sure you can find the reason for the reboot without help from Dell support. Maybe your SANHQ archive contains information about the last FW upgrade / CM start.
The message "...control module cannot communicate..." is very generic, and I have seen it very often (we've had 24 EQLs in house for 8 years now). But I have never seen both CMs restart at the same time, as yours did. We were also never affected by the uptime issue, because we update FW on a regular basis.
Regards
Joerg
Origin3k
4 Operator
2.3K Posts
May 10th, 2017 07:00
If you select your member on the left, just click the "controller" tab in the middle pane to see the FW versions of your CMs.
Regards
Joerg
phil435
1 Rookie
47 Posts
May 10th, 2017 07:00
Hi Joerg,
I do know where to find the firmware version. I just want to verify that the bug is indeed what caused both controllers to reboot. If it was the bug listed above, where would I find the error message pertaining to this issue?
Thanks
dwilliam62
4 Operator
1.5K Posts
May 10th, 2017 09:00
Hello,
There's no specific error message for this issue; it's identified from behavior and logs. So until you upgrade, you'll want to shut down the array before it reaches 248 days of uptime. But it sounds like you'll have migrated to CML before then.
Regards,
Don
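Keeping the array under 248 days of uptime, as suggested above, is simple date arithmetic once you know the last restart time. The Python sketch below is a hypothetical helper, not a Dell tool: it assumes the 9:12:22 PM "now active in the group" log entry approximates the last restart, and the function names and the 14-day safety margin are illustrative choices.

```python
from datetime import datetime, timedelta

# Uptime at which firmware 7.0.x controllers were reported to fail over unexpectedly.
BUG_UPTIME = timedelta(days=248)

def failover_risk_date(last_boot: datetime) -> datetime:
    """Return the moment the array's uptime reaches 248 days."""
    return last_boot + BUG_UPTIME

def planned_restart_by(last_boot: datetime, margin_days: int = 14) -> datetime:
    """Latest date for a controlled shutdown/restart.

    margin_days is a hypothetical safety buffer, not a vendor recommendation.
    """
    return failover_risk_date(last_boot) - timedelta(days=margin_days)

def days_remaining(last_boot: datetime, now: datetime) -> float:
    """Days left until the 248-day mark (negative once it has passed)."""
    return (failover_risk_date(last_boot) - now).total_seconds() / 86400

# Assumption: the last restart matches the 9:12:22 PM log entry on 5/9/2017.
boot = datetime(2017, 5, 9, 21, 12, 22)
print("248-day mark:      ", failover_risk_date(boot))
print("Restart no later than:", planned_restart_by(boot))
```

With that boot time, the 248-day mark falls on 2018-01-12 21:12:22, and the 14-day margin puts the planned restart at 2017-12-29 21:12:22. Tracking uptime this way (or, as Joerg does, in your monitoring system) avoids relying on a log message that, per Don, doesn't exist for this issue.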
phil435
1 Rookie
47 Posts
May 10th, 2017 09:00
I forgot about the archive. I did happen to take one a couple of months ago and our uptime from the last reboot was exactly 248 days.
I believe this confirms that it is related to the bug where both controllers reboot.