Unsolved

1 Rookie

8 Posts

November 10th, 2024 05:24

A fatal error was detected on a component at bus 0 device 3 function

Hi,

Good Day!

<URL removed by moderator due to private information>

We are experiencing an issue with our R730 server, which hangs during POST and displays an error related to the network card. Upon investigation, we removed the NetXtreme II BCM57810 10 Gigabit Ethernet adapter, after which the server resumed normal operation. We subsequently replaced the network adapter, but the server still reported an error on the same network card after booting into ESXi. When we attempt to establish network connectivity on that port, the error reoccurs.

For your reference, we have attached the TSR logs and an image of the error message. Notably, there are no errors displayed in the hardware inventory. Could you assist us in diagnosing the root cause of this issue?

A fatal error was detected on a component at bus 0 device 3 function 2.
2024-11-08 02:03:04 A bus fatal error was detected on a component at slot 5.

Moderator

4K Posts

November 11th, 2024 08:05

Hi,

 

I had to remove the link to your TSR log because the TSR contains your server's information and this is a public forum.

Which ESXi version are you running?

Also, I had a look at the server's firmware: the BIOS and iDRAC are out of date. Could you update them? This may help with the issue.

1 Rookie

8 Posts

November 11th, 2024 10:49

@DELL-Joey C

Thank you very much for your reply. After updating with the latest patch, we are still encountering the same issue. We remain uncertain about the root cause, which could be related to the network card firmware, the riser card, or the motherboard, as the logs do not indicate any other errors.

Is it necessary to update the network card firmware? We have been unable to locate the specific firmware version on Dell's website. Could you suggest the next steps for further analysis to determine the cause of this error?

The ESXi version is 7.0.3.

Moderator

2.9K Posts

November 11th, 2024 11:54

Hi,

I can't read the PSOD messages in the image clearly. Could you upload a higher-resolution version? I also think the server's firmware should be brought up to date, including the NetXtreme II network card.

1 Rookie

8 Posts

November 11th, 2024 12:35

@DELL-Erman O Hi, please see the picture at the Drive link below; the resolution drops when the image is uploaded here. Your help is appreciated.

Link: https://drive.google.com/file/d/1S9s6fcJTtnJ5WMB0ms8bt_9e8nBOGLJo/view?usp=drive_link

Moderator

2.9K Posts

November 11th, 2024 14:23

Hi, thank you. I'm not sure exactly what caused this, but I suspect the NMI watchdog. Temporarily disabling it can help determine whether the watchdog is being overly aggressive and causing unnecessary system halts.

  1. Open the boot.cfg file located in the /bootbank directory using a text editor like vi:

    vi /bootbank/boot.cfg
    
  2. Add the following parameter to the kernel options line:

    nmiWatchdog=0
    
  3. Save and close the file.

  4. Reboot the ESXi host to apply the changes.
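For anyone who prefers to script the edit rather than use vi, the steps above can be sketched roughly as follows. This is only an illustration, not a verified procedure: it assumes the kernel options live on a single line starting with `kernelopt=` in boot.cfg, the `nmiWatchdog=0` value is taken as-is from this post, and the helper name `add_kernel_option` is made up for the example. It works on the file's text only, so keep a backup of the real file before writing anything back.

```python
def add_kernel_option(cfg_text: str, option: str) -> str:
    """Return boot.cfg text with `option` appended to the kernelopt= line.

    Assumes (not verified here) that kernel options are stored on one
    line that starts with 'kernelopt=' and are separated by spaces.
    """
    out = []
    found = False
    for line in cfg_text.splitlines():
        if line.startswith("kernelopt="):
            found = True
            existing = line[len("kernelopt="):].split()
            # Only append if the option is not already present,
            # so running this twice does not duplicate it.
            if option not in existing:
                existing.append(option)
            line = "kernelopt=" + " ".join(existing)
        out.append(line)
    if not found:
        raise ValueError("no kernelopt= line found")
    return "\n".join(out) + "\n"


# Hypothetical, shortened boot.cfg content used only for illustration.
sample = "bootstate=0\nkernel=b.b00\nkernelopt=autoPartition=FALSE\n"
print(add_kernel_option(sample, "nmiWatchdog=0"))
```

To apply it for real, you would read /bootbank/boot.cfg into a string, pass it through the function, write the result back, and reboot as in step 4.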

You may also want to ask the VMware community about this, since I have limited resources on the ESXi side.

1 Rookie

8 Posts

November 12th, 2024 05:22

@DELL-Erman O Hi, thanks for your suggestion. Could you kindly share the specific firmware version for this network card? I cannot find it on the Dell portal.

Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet

Again, thank you for your kind assistance.

Moderator

4K Posts

November 12th, 2024 08:55

Hi,

 

Broadcom NetXtreme II firmware is now listed under QLogic products; ref: https://dell.to/3YMc9Mf

You can find the firmware here: https://dell.to/3YJjhJl

May we know whether the server was working fine on ESXi before the network card was replaced? If you remove the card, does the OS report any errors? I noticed in the ESXi PSOD screenshot that the error points to a PCI Express Root Port with a VID of 8086. VID 8086 is Intel, so the error might be related to the processor or the mainboard chipset.
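To cross-check which device the log message refers to, it can help to convert the "bus 0 device 3 function 2" wording from the first post into the `domain:bus:device.function` notation that PCI listing tools such as `lspci` usually print. The sketch below is only an illustration: the helper name `to_bdf` is made up, it assumes the numbers in the log are decimal, and it assumes PCI domain 0.

```python
import re

# Matches the "bus N device N function N" wording used in the log message.
PATTERN = re.compile(r"bus (\d+) device (\d+) function (\d+)", re.IGNORECASE)


def to_bdf(message: str, domain: int = 0) -> str:
    """Convert a 'bus X device Y function Z' message to dddd:bb:dd.f form.

    Assumes the logged numbers are decimal and the PCI domain is 0.
    """
    m = PATTERN.search(message)
    if m is None:
        raise ValueError("no bus/device/function triple in message")
    bus, dev, fn = (int(g) for g in m.groups())
    # PCI addresses are conventionally written in hexadecimal.
    return f"{domain:04x}:{bus:02x}:{dev:02x}.{fn:x}"


msg = "A fatal error was detected on a component at bus 0 device 3 function 2."
print(to_bdf(msg))  # 0000:00:03.2
```

If the device listed at that address on the host shows vendor ID 8086, that would match the Intel root port seen in the PSOD screenshot rather than the Broadcom card itself.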

1 Rookie

8 Posts

November 12th, 2024 09:39

@DELL-Joey C Hi Joey, thank you for your response and assistance. Before encountering the network card error, our server was stuck during the POST process. We removed the network card and confirmed that the server was operational and no longer hanging. Afterward, we proceeded with a firmware update and reinstalled the network card, but this led to a PSOD (Purple Screen of Death) error. We suspected the network card might be faulty, so we replaced it with a new one.

However, as soon as we connected the network cable to the port, the error reappeared. Interestingly, after performing a cold reboot, the error temporarily disappeared, but it reoccurred after 1-2 days. For further diagnosis, we decided to swap the network card from slot 4 to slot 5. Since then, we haven’t encountered the error over the last two days, though we are unsure if this has resolved the issue permanently. Thank you very much for your ongoing support. We would appreciate any further insights or recommendations to help us identify the root cause.


Moderator

4K Posts

November 12th, 2024 09:51

Hi,

 

After understanding the whole situation, I was going to ask you to install the card in another slot to troubleshoot the issue. There are only a few possible causes for a 'bus fatal error' in the log. If the firmware of all hardware has been updated and the issue persists, you may need to check for a hardware cause. Since the issue persisted after you replaced the network card, the slot is the likely culprit. Now that you have moved the card to another slot and the error has not returned, that points to a problem with the original mainboard slot.
