Unsolved
1 Rookie
December 11th, 2024 23:25
Failed firmware upgrades for X710 card on C6420
I have a rack of C6420s that we purchased during COVID as an expansion to our HPC system, which was oversubscribed.
They all have Intel X710-DA2 Dual Port 10G SFP+ OCP 2.0 network cards, part number T44PH.
However, eight of the cards refuse to take any firmware later than 20.5.13. The Network_Firmware_234W1_LN_23.0.8_A00_01.BIN upgrade file says it is compatible and runs (though it produces fewer dots than on a successful machine), but after a reboot, running it with -qc shows that the card is still on the old 20.5.13 version and not 23.0.8. They were all upgraded to 20.5.13 in the past.
The failed upgrades are scattered throughout the rack, and C6420 enclosures with failed upgrades also contain sleds that upgraded successfully. Reseating the sled or power cycling the whole C6420 enclosure makes no difference. I have taken a good and a bad OCP card out and compared them side by side, and they look identical.
Anyone have any idea what is going on and how I can get the failed eight cards to upgrade so I have a consistent set of firmware in my cluster?
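For anyone wanting to audit which nodes are on which firmware, here is a minimal Python sketch that groups nodes by the firmware-version field printed by `ethtool -i <interface>`. It assumes you have already collected that output per node by whatever means you prefer (ssh, pdsh, etc.). The node names and firmware strings below are made up for illustration, and the exact format of the field varies by driver and vendor.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def parse_firmware_version(ethtool_output: str) -> Optional[str]:
    """Return the firmware-version field from `ethtool -i` output, or None."""
    for line in ethtool_output.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, _, value = line.partition(":")
        if key.strip() == "firmware-version":
            return value.strip() or None
    return None


def group_by_firmware(outputs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each reported firmware string to the sorted list of nodes reporting it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node, text in sorted(outputs.items()):
        groups[parse_firmware_version(text) or "unknown"].append(node)
    return dict(groups)


# Hypothetical sample data: node names and firmware strings are invented.
SAMPLE = {
    "node01": "driver: i40e\nfirmware-version: 9.20 0x8000d8c5 23.0.8\n",
    "node02": "driver: i40e\nfirmware-version: 8.15 0x8000a4e8 20.5.13\n",
    "node03": "driver: i40e\nfirmware-version: 9.20 0x8000d8c5 23.0.8\n",
}

for firmware, nodes in group_by_firmware(SAMPLE).items():
    print(firmware, "->", ", ".join(nodes))
```

Any node that ends up in a group of its own (or under "unknown") is a candidate for a closer look.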



DELL-Young E
Moderator
December 12th, 2024 04:56
Hello, thanks for choosing Dell, and welcome to our community.
According to the release notes:
https://dell.to/3BsLxZb
Firmware Upgrade/Downgrade Support
==================================
Release 22.5.x -> Release 23.0.x: OS supported
Release 22.5.x -> Release 23.0.x: Lifecycle Controller or iDRAC supported
Given that, I would suggest a staged upgrade.
https://dell.to/49A0fKC
You can install 22.5.x first, then move on to 23.0.x.
Respectfully,
jabuzz
1 Rookie
December 12th, 2024 13:13
All the nodes that are not upgrading have firmware version 20.5.13 on them. What the other nodes had, I can't say anymore.
I have tried updating them with the following versions, starting at the lowest and continuing to the highest, with a reboot after every installation. These are all the old versions listed for download on the website that are more recent than 20.5.13:
20.5.16
21.5.9
22.0.9
22.5.7
I used these packages for RHEL:
Network_Firmware_H8M48_LN_20.5.16_A00.BIN
Network_Firmware_GXJ5G_LN_21.5.9_A02.BIN
Network_Firmware_9NPPG_LN_22.0.9_A00.BIN
Network_Firmware_1R0W0_LN_22.5.7_A00.BIN
None of them worked, so IMHO it is more than just upgrading from the wrong version. What is suspicious is that the failed upgrades produce far fewer progress dots before they stop than successful upgrades do.
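For reference, the staged sequence can be worked out programmatically. This is a small Python sketch that orders the available releases numerically and lists the ones between the installed version and the target. The function names are my own, and it only computes the order; it does not run any update package.

```python
from typing import List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '20.5.13' into (20, 5, 13) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def staged_path(current: str, available: List[str], target: str) -> List[str]:
    """Return the releases newer than `current`, up to and including `target`,
    in ascending order."""
    cur, tgt = version_key(current), version_key(target)
    steps = {v for v in available if cur < version_key(v) <= tgt}
    return sorted(steps, key=version_key)


# Versions mentioned in this thread, deliberately out of order.
AVAILABLE = ["22.5.7", "20.5.16", "23.0.8", "21.5.9", "22.0.9"]

print(staged_path("20.5.13", AVAILABLE, "23.0.8"))
# → ['20.5.16', '21.5.9', '22.0.9', '22.5.7', '23.0.8']
```

Comparing integer tuples rather than raw strings matters here: as plain text, "20.5.9" would sort after "20.5.13".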
DELL-Chris H
Moderator
December 12th, 2024 13:49
Jabuzz,
Have you tried running the updates outside the OS? What I would suggest is booting to the Live ISO and then trying to run the updates from that environment.
Also, would you clarify whether the systems with the update issue are also up to date on BIOS, iDRAC, controller, etc.? Do they match the versions being run by the successfully updated systems?
Let us know.
jabuzz
1 Rookie
December 12th, 2024 15:45
I just booted one of the nodes into the Dell Live ISO, and predictably, it didn't work when I tried to make the smallest step, from 20.5.13 to 20.5.16. Again, there are far fewer progress dots than on a successful flash.
Yes, all the nodes are identical regarding the firmware versions being run and are the latest and greatest for everything, including the PSUs in the back and the 1TB SATA drive up the front. I can't see why that would make any difference.
The OS on the nodes should also be identical. This is an HPC system, and they are all imaged identically. If they are not identical, then I have messed up badly (unlikely, as jobs on the cluster would likely have had issues, and the users would have complained). However, it fails from the Dell Live ISO, so I don't think it has anything to do with the OS on the system.