1 Rookie
•
9 Posts
May 20th, 2025 12:27
Two Dell R730s and one Dell R320 immediately switched off after *one* power source failed
Hi all,
I had a very serious issue yesterday evening which cost me my night and half of today restoring corrupted file systems.
In my homelab I have one R320 and two R730s running. All servers have two power supplies, i.e. in the R730s two 0CMPGMA00 with 1100W each. The servers are running at <300W, so an easy load.
I've set the power supplies to "Input Power Redundancy" (German original: "Eingangsstromredundanz") in the Power Configuration.
PSU1 is connected to my UPS; PSU2 is connected directly to the power line (as a backup in case the UPS fails).
Last night we had a power cut of a few minutes. I quickly saw that everything was unexpectedly down and went to the basement, finding all three servers pitch dark and powered off. The UPS was fine!
I thought the UPS might have had a kind of "blackout", so I checked the logs of the intelligent PDUs in my rack. The PDU fed by the direct line connection shows an expected cold start, as the line was off due to the outage. But the PDU behind my UPS was absolutely fine, so the UPS delivered power the whole time!
That means: PSU1 was fed by the UPS the whole time, while the source of PSU2 failed due to the outage. This can also be seen in the server logs:
Mon May 19 2025 23:30:40 The power supplies are redundant.
Mon May 19 2025 23:30:31 The input power for power supply 2 has been restored.
Mon May 19 2025 23:29:56 Power supply redundancy is lost.
Mon May 19 2025 23:29:54 The power input for power supply 2 is lost.
Mon May 19 2025 23:27:29 The power supplies are redundant.
Mon May 19 2025 23:27:22 The input power for power supply 2 has been restored.
Mon May 19 2025 23:26:36 Power supply redundancy is lost.
Mon May 19 2025 23:26:33 The power input for power supply 2 is lost.
We see two "outages": the first when the line failed and my diesel generator kicked in (restoring power to the whole house), the second in PSU2 is the switchback after line power had been restored.
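For anyone who wants to check the timing, here is a rough Python sketch that pairs the "lost" and "restored" entries for power supply 2 and prints how long each interruption lasted. The log text is copied from the entries above; the function names (`parse`, `psu2_outages`) are just my own helpers, and the parsing assumes an English locale and exactly this timestamp layout, so treat it as a sketch rather than a general log parser.

```python
from datetime import datetime

# Log entries copied from the post above (newest first, as the server shows them).
LOG = """\
Mon May 19 2025 23:30:40 The power supplies are redundant.
Mon May 19 2025 23:30:31 The input power for power supply 2 has been restored.
Mon May 19 2025 23:29:56 Power supply redundancy is lost.
Mon May 19 2025 23:29:54 The power input for power supply 2 is lost.
Mon May 19 2025 23:27:29 The power supplies are redundant.
Mon May 19 2025 23:27:22 The input power for power supply 2 has been restored.
Mon May 19 2025 23:26:36 Power supply redundancy is lost.
Mon May 19 2025 23:26:33 The power input for power supply 2 is lost.
"""

# Assumption: every line starts with a timestamp of exactly this length.
STAMP_LEN = len("Mon May 19 2025 23:30:40")
LOST = "The power input for power supply 2 is lost"
RESTORED = "The input power for power supply 2 has been restored"


def parse(line):
    """Split one log line into (timestamp, message)."""
    stamp = datetime.strptime(line[:STAMP_LEN], "%a %b %d %Y %H:%M:%S")
    return stamp, line[STAMP_LEN:].strip()


def psu2_outages(log_text):
    """Return (lost_at, restored_at, seconds) for each PSU2 input interruption."""
    # Sort oldest first so each "lost" entry is followed by its "restored" entry.
    events = sorted(parse(l) for l in log_text.splitlines() if l.strip())
    outages, lost_at = [], None
    for stamp, message in events:
        if message.startswith(LOST):
            lost_at = stamp
        elif message.startswith(RESTORED) and lost_at is not None:
            outages.append((lost_at, stamp, (stamp - lost_at).total_seconds()))
            lost_at = None
    return outages


if __name__ == "__main__":
    for lost_at, restored_at, seconds in psu2_outages(LOG):
        print(f"PSU2 input lost {lost_at:%H:%M:%S}, restored {restored_at:%H:%M:%S}, {seconds:.0f} s")
```

With the entries above this gives two interruptions of 49 s and 37 s on PSU2, which fits the generator start and the later switchback.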
The big question:
Why did all three servers (!) switch off without a proper shutdown sequence and without any reason, although PSU1 was powered fine all the time and only PSU2 did fail due to the outage?
I can exclude a single hardware issue because all servers behaved exactly the same.
Thanks for your ideas!
Marco
jacotec
May 20th, 2025 13:03
After digging even more I found the answer: the UPS did indeed issue a shutdown command to the servers. I need to check why this happened so early (as it was running on generator power). But it was not a server issue!
jacotec
May 20th, 2025 12:35
Addendum:
In the Lifecycle logs I found that the system turned off around 4 minutes after the power was finally restored. "SYS1001" - why does my server turn off by itself 4 minutes after everything was over?