Equallogic PS6100

Question

Earlier I was having issues writing to disk in my VMware environment and received an email notification from SANHQ stating that both the active and standby controllers had restarted. I also received a caution that 'Controller failed over in member'. When I look at the boot time on both controllers in the GUI, they both show the same boot time. Does this mean that both controllers rebooted at the same time, and if so, what conditions would cause this?

phil435 · Answer

Thanks Donald. I am working with my Dell team to get support renewed on this unit so that I can open a case. I am working on replacing the unit with a Compellent array but need to get support added until then.

phil435 · Answer

Firmware is 7.0.7 (R397085)

Can you tell from the messages below if the unit did a complete reboot? At around 8:20 all of my iSCSI initiators logged into each target and this happened again around 9:12. I assume when a controller fails over that I will see the initiators connect to each target. There is only one member in this group and it had me concerned when I saw the "member is now active in the group" message.

Info 5/9/2017 9:12:22 PM Sheldon1 18.2.0 Group member Sheldon1 now active in the group.
Warning 5/9/2017 8:24:08 PM Sheldon1 28.3.51 | 28.3.29 | 28.3.53 Warning health conditions currently exist. Correct these conditions before they affect array operation. | Active control module cannot communicate with secondary control module. Failover cannot occur. | There are 1 outstanding health conditions. Correct these conditions before they affect array operation.
Info 5/9/2017 8:24:08 PM Sheldon1 28.2.20 Control module has been installed in slot 1.
Info 5/9/2017 8:20:25 PM Sheldon1 18.2.0 Group member Sheldon1 now active in the group.
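
To line up the two "now active in the group" events with the initiator logins, the event lines above can be parsed and the gap between them measured. This is only a rough sketch: the line layout (severity, timestamp, member, event ID, text) is inferred from the lines as pasted here, the dates are assumed to be month/day/year, and `member_active_times` is a hypothetical helper name, not part of any EqualLogic tool.

```python
import re
from datetime import datetime

# Event lines copied from the post above.
LOG = """\
Info 5/9/2017 9:12:22 PM Sheldon1 18.2.0 Group member Sheldon1 now active in the group.
Info 5/9/2017 8:24:08 PM Sheldon1 28.2.20 Control module has been installed in slot 1.
Info 5/9/2017 8:20:25 PM Sheldon1 18.2.0 Group member Sheldon1 now active in the group.
"""

# Assumed layout: severity, timestamp, member name, event ID, message text.
LINE_RE = re.compile(
    r"^(?P<severity>\w+) "
    r"(?P<when>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<member>\S+) (?P<event_id>\d+\.\d+\.\d+) (?P<text>.*)$"
)

def member_active_times(log_text):
    """Return sorted timestamps of 18.2.0 ('member now active') events."""
    times = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("event_id") == "18.2.0":
            # Assumes US-style month/day/year with a 12-hour clock.
            times.append(datetime.strptime(m.group("when"), "%m/%d/%Y %I:%M:%S %p"))
    return sorted(times)

times = member_active_times(LOG)
gap = times[1] - times[0]
print(gap)  # time between the two 'now active' events
```

With the three lines above, the two 18.2.0 events are a little under 52 minutes apart, which matches the two waves of initiator logins around 8:20 and 9:12.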

phil435 · Answer

Can you tell me where this message can be found in Group Manager, so we can confirm it's the same issue:

While running firmware version 7.0.x, unexpected controller failovers might have occurred at 248 consecutive days of uptime. After the failover, the following message was displayed: ERROR:: 15.4.3:NVRAM contains valid data. This is a WATCHDOG RECOVERY due to the watchdog on a NetBSD processor.

Origin3k · Answer

Ah... we monitor the uptime of our devices, so we are able to check whether it happened after 248 days. Without that, I'm not sure you can find the reason for the reboot without the help of Dell support. Maybe your SANHQ archive contains info about the last FW upgrade / CM start. The message '....control module cannot communicate...' is very generic, and I have seen it very often (24 EQLs in the house for 8 years now). But I have never seen both CMs restart at the same time, as happened to you. We were also never affected by the uptime issue because we update FW on a regular basis.

Regards
Joerg

Origin3k · Answer

If you select your member on the left, just click on 'controller' in the middle tab to see the FW versions of your CMs.

Regards
Joerg

phil435 · Answer

Hi Joerg,

I do know where to find the firmware version. I just want to verify that the bug is indeed what caused the reboot of both controllers. If it was indeed the bug listed above, then where would I find the error message pertaining to this issue?

Thanks

dwilliam62 · Answer

Hello,

There's no specific error message for this issue. It's based on behavior and logs. So until you upgrade, you'll want to shut down the array before you hit 248 days. But it sounds like you'll have migrated to CML before then.

Regards,
Don
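
For planning the shutdown mentioned above, the date on which the array reaches 248 days of uptime can be worked out from the last boot time. A minimal sketch, assuming the 8:20:25 PM event on 5/9/2017 is the boot time and that the date is month/day/year; `restart_deadline` and its 14-day safety margin are hypothetical choices for illustration, not a Dell recommendation.

```python
from datetime import datetime, timedelta

# Uptime figure quoted in the firmware note earlier in this thread.
UPTIME_LIMIT = timedelta(days=248)

def restart_deadline(last_boot, margin=timedelta(days=14)):
    """Latest date to schedule a controlled shutdown, leaving a safety margin."""
    return last_boot + UPTIME_LIMIT - margin

# Assumed boot time, taken from the 'now active in the group' event above.
last_boot = datetime(2017, 5, 9, 20, 20, 25)

print("248 days of uptime reached:", last_boot + UPTIME_LIMIT)
print("Schedule shutdown by:      ", restart_deadline(last_boot))
```

Under those assumptions the 248-day mark falls on 12 January 2018, so a controlled shutdown or a firmware upgrade would need to happen before then.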

phil435 · Answer

I forgot about the archive. I did happen to take one a couple of months ago and our uptime from the last reboot was exactly 248 days.

I believe this confirms that it is related to the bug where both controllers reboot.
