Unsolved
4 Posts
July 7th, 2022 05:00
Dell PowerEdge 730XD - PCIe Link Training Failure - Happened on both servers
Hi All,
We updated the BIOS to 2.14.1. Dell has since pulled this BIOS release as it was bad.
Since then, I have rebooted 2x 730XD servers in a cluster and both show exactly the same error.
I cannot boot or flash because the system halts on this PCIe error.
The only way to progress is to remove the Network Daughter Card 6vdpg.
Now I need to press F1 to complete POST.
However, removing the card did allow me to downgrade the BIOS to 2.13.0.
The only way to run this machine is with the card absent.
This cannot be a coincidence. Both servers. Same issue after the first reboot in some time.
Please advise how I can get this card back in and the server operating normally again or, as a minimum, how to disable the F1 prompt.
Thanks in advance
R
RomUK
July 7th, 2022 06:00
Downgraded to 2.12.1; the problem persists.
When these hosts were built months ago, they were rebooted several times with no problems.
The only difference we can determine is the 2.14.1 BIOS update, and now the network daughter card causes a system halt that allows nothing else to continue.
I now have to run these 2 servers without the card and press F1 on every reboot, because POST reports the daughter board as missing. The trouble is that with the card installed I cannot even get into BIOS Setup to do anything.
Can anyone please advise?
Thanks
R
DELL-Chris H
Moderator
9.5K Posts
July 7th, 2022 12:00
RomUK,
I assume the two servers are ScaleIO Ready Nodes. Can you confirm that?
Also, do you still see the issue if you upgrade from 2.12.1 to 2.13.0 via iDRAC?
Let me know.
RomUK
July 8th, 2022 02:00
Hi Chris,
I have not tried upgrading from 2.12 to 2.13.
These are just 2x regular 730XD servers in a 2-node Microsoft Failover Cluster. No other technology is used apart from a Fibre Channel SAN for the cluster.
Please confirm whether you have seen this issue before. What can you tell me about it?
It is TOO much of a coincidence for this to happen to 2x servers at the same time.
Thanks in advance
R
DiegoLopez
4 Operator
2.7K Posts
July 8th, 2022 07:00
Hello @RomUK,
Please try the BIOS upgrade to 2.13 via iDRAC: https://dell.to/3ytN1g7 Also, is the iDRAC firmware up to date? What is the current firmware version? What was the firmware version of the Network Daughter Card 6vdpg? Which slot were the cards installed in, and can you move them to a different slot?
Once we have ruled out the firmware on these components, we can consider a hardware fault.
Regards.
RomUK
July 11th, 2022 01:00
Hi Diego,
I could not verify the firmware as the system halts before anything else can happen.
I just pulled the card as this node had to boot.
I could not swap it to another slot as this is NOT an expansion card. It plugs straight into the motherboard, and the pins are directly underneath the card.
I can safely say upgrading does not help at all.
I have 2x nodes now: one on 2.13 and one on 2.12.
This must be a common problem for it to occur on both nodes simultaneously.
Please help
R
DiegoLopez
July 11th, 2022 08:00
Yes, I understand the server was not able to POST with the card installed. But you can still verify the firmware of most devices from the iDRAC. Do you have iDRAC access? Can you see the iDRAC firmware version there? You may not see a version for the card, but that is OK.
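If you would rather pull the versions with a script than through the web interface, here is a minimal sketch. It assumes your iDRAC firmware exposes the standard DMTF Redfish firmware inventory at `/redfish/v1/UpdateService/FirmwareInventory`, which older iDRAC releases may not, and that HTTP Basic authentication is enabled. The function names and the host and credentials in the usage note are hypothetical placeholders.

```python
import json
import ssl
import urllib.request
from base64 import b64encode

# Standard DMTF Redfish path. Whether a given iDRAC firmware release
# exposes it is an assumption, not something confirmed in this thread.
INVENTORY_PATH = "/redfish/v1/UpdateService/FirmwareInventory"


def summarize_inventory(members):
    """Map component name -> sorted list of versions reported for it."""
    summary = {}
    for item in members:
        name = item.get("Name", "unknown")
        version = item.get("Version", "unknown")
        summary.setdefault(name, set()).add(version)
    return {name: sorted(versions) for name, versions in summary.items()}


def fetch_json(host, path, user, password):
    """GET one Redfish resource with HTTP Basic auth and parse the JSON."""
    context = ssl.create_default_context()
    # Assumption: the iDRAC uses a self-signed certificate, so verification
    # is switched off here. Only do this on a trusted management network.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    token = b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


def fetch_inventory(host, user, password):
    """Walk the inventory collection and summarize every member resource."""
    collection = fetch_json(host, INVENTORY_PATH, user, password)
    members = [
        fetch_json(host, member["@odata.id"], user, password)
        for member in collection.get("Members", [])
    ]
    return summarize_inventory(members)
```

Calling `fetch_inventory("idrac-node1.example", "root", "your-password")` on each node (placeholder values) and comparing the two results would show whether the BIOS, iDRAC and any listed network firmware actually match between the servers.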
Also, can you try swapping the cards between the servers? If both are the same card model in the same server model, that would be a good opportunity to see whether they still fail. Can you take a picture of the label on the card and share it with us?
Regards.