Unsolved

1 Rookie

 • 

64 Posts

June 9th, 2025 16:08

R830 - "A fatal error was detected on a component at bus 64 device 3 function 0."

How do I find out which device this is? The server is running VMware ESXi, and I have iDRAC access, but I am not able to locate the device on the "System" -> "System Inventory" -> "Hardware Inventory" page. I can export the data to an XML file. What should I search for in the XML to locate this device?

1 Rookie

 • 

64 Posts

June 9th, 2025 18:09

To add to the above: (1) Pre-boot ePSA does not find any hardware errors. It flags the errors in the event log, but everything else passes. (2) There is no bus 64 in the configuration displayed in ePSA (under PCIe slots).

(edited)

1 Rookie

 • 

64 Posts

June 9th, 2025 18:19

Another observation: I am running the Dell OME 4.4 appliance on this ESXi host, and the PSOD has a reference to that VM. So maybe Dell OME is the culprit? Screenshot below.

Moderator

 • 

9.6K Posts

June 9th, 2025 20:23

CU.Dell.User,
 

To identify the device at Bus 64, Device 3, Function 0 on your Dell PowerEdge R730 running VMware ESXi, and using the iDRAC XML export, here are a couple of things you can try.

 

You can locate the device in the XML:

Export the Hardware Inventory XML
From iDRAC: go to System → Hardware Inventory, then click Export → XML.

Open the XML File
Use a text editor like Notepad++, VS Code, or any XML viewer.

Search for PCI Device Entries
Look for entries like:

 

<PCIDevice>
  <Bus>64</Bus>
  <Device>3</Device>
  <Function>0</Function>
  ...
</PCIDevice>

 


If the XML doesn't show PCI bus mappings clearly, you can also use the ESXi shell (if accessible) and run:

lspci -v | grep -B 1 "03:00.0"


or

vmkchdev -l | grep "0000:40:03.0"
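
The bus number in the error message is decimal, while the ESXi shell tools print PCI addresses in hexadecimal as domain:bus:device.function. A minimal Python sketch of that conversion, assuming PCI domain 0 (the function name `bdf_to_address` is made up for illustration):

```python
def bdf_to_address(bus: int, device: int, function: int, domain: int = 0) -> str:
    """Format decimal bus/device/function numbers as a hex PCI address.

    Assumes the usual domain:bus:device.function layout with domain 0.
    """
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    # Values taken from the error message: bus 64, device 3, function 0.
    print(bdf_to_address(64, 3, 0))
```

For bus 64, device 3, function 0 this yields 0000:40:03.0, the string used in the vmkchdev command above. The short bus:device.function form of the same address is 40:03.0.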


Let me know what you see, and if this helps.

 

1 Rookie

 • 

64 Posts

June 10th, 2025 17:50

@DELL-Chris H The exported XML does not have entries like "<pcidevice>" or "<Bus>". It does have two instances of <Property Name="Bus" type="string"> and 35 instances of <Property Name="BusNumber" type="uint32">. Strangely, there is no "bus number" property with value 64. I can paste the entire file if that would help (8467 lines). One sample entry is copied below.

<PROPERTY NAME="BusNumber" TYPE="uint32">
  <VALUE>68</VALUE>
  <DisplayValue>68</DisplayValue>
</PROPERTY>

---

The only time <value>64</value> appears in the XML is with respect to RAIDTypes.

    <PROPERTY NAME="RAIDTypes" TYPE="uint32">
      <VALUE>64</VALUE>
      <DisplayValue>RAID5</DisplayValue>
    </PROPERTY>

lspci -v | grep -B 1 "03:00.0" gives me a null output, but

vmkchdev -l | grep "0000:40:03.0" outputs "0000:40:03.0 8086:6f08 0000:0000 vmkernel PCIe RP[0000:40:03.0]"
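
The vmkchdev line above can be split into its fields. A rough Python sketch, assuming the whitespace-separated layout seen in that one output line (address, vendor:device ID, subsystem vendor:device ID, owner, alias). The function name and dictionary keys are hypothetical, and other ESXi versions may format the line differently:

```python
def parse_vmkchdev_line(line: str) -> dict:
    """Split one `vmkchdev -l` line into named fields (layout assumed from the sample)."""
    address, ids, sub_ids, owner, alias = line.split(maxsplit=4)
    vendor_id, device_id = ids.split(":")
    _domain, bus, dev_fn = address.split(":")
    device, function = dev_fn.split(".")
    return {
        "address": address,
        "bus": int(bus, 16),          # hex in the tool output, decimal in the error message
        "device": int(device, 16),
        "function": int(function, 16),
        "vendor_id": vendor_id,
        "device_id": device_id,
        "subsystem_ids": sub_ids,
        "owner": owner,
        "alias": alias,
    }


# The exact line reported in this thread.
sample = "0000:40:03.0 8086:6f08 0000:0000 vmkernel PCIe RP[0000:40:03.0]"
info = parse_vmkchdev_line(sample)
print(info["bus"], info["device"], info["function"], info["vendor_id"], info["alias"])
```

The decoded bus, device, and function (64, 3, 0) match the numbers in the error message. Vendor ID 8086 is Intel's PCI vendor ID, and the "PCIe RP" alias most likely denotes a PCIe root port.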

Two screenshots are attached to show where I am generating the XML, and the ESXi shell outputs.

1 Rookie

 • 

64 Posts

June 10th, 2025 18:02

I also searched for <value>40</value> in case the XML has numbers in hex, but <value>40</value> does not appear at all in the file.
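
Searching an 8467-line export by hand is error-prone, so here is a small Python sketch that lists every BusNumber property together with its enclosing element. It assumes the PROPERTY/VALUE layout of the sample entry above. The wrapper element names in the embedded sample (`Inventory`, `Component`), the `Classname` attribute, and the second bus value are placeholders, not taken from the real file:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the export; only the PROPERTY/VALUE/DisplayValue
# layout and the value 68 come from the thread.
SAMPLE = """
<Inventory>
  <Component Classname="ExampleDeviceA">
    <PROPERTY NAME="BusNumber" TYPE="uint32">
      <VALUE>68</VALUE>
      <DisplayValue>68</DisplayValue>
    </PROPERTY>
  </Component>
  <Component Classname="ExampleDeviceB">
    <PROPERTY NAME="BusNumber" TYPE="uint32">
      <VALUE>4</VALUE>
      <DisplayValue>4</DisplayValue>
    </PROPERTY>
  </Component>
</Inventory>
"""


def find_bus_numbers(root):
    """Return (parent element, bus number) for every BusNumber property."""
    hits = []
    for parent in root.iter():
        for child in parent:
            if child.tag.upper() == "PROPERTY" and child.get("NAME") == "BusNumber":
                value = child.findtext("VALUE")
                if value is not None:
                    hits.append((parent, int(value.strip())))
    return hits


root = ET.fromstring(SAMPLE.strip())
for parent, bus in find_bus_numbers(root):
    # Print both decimal and hex so either notation can be compared.
    print(parent.tag, parent.attrib, f"bus {bus} (hex {bus:02x})")
```

To run it against a real export, replace the `ET.fromstring(...)` line with `ET.parse(path).getroot()`, where `path` is wherever the XML was saved. Printing each value in both decimal and hex makes it easy to see which buses the inventory does report.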

Moderator

 • 

9.6K Posts

June 10th, 2025 20:15

Sorry about that. Would you confirm whether you have anything in slot 6 or 7 in the server? I ask because I believe that is where the bus/device/function is pointing. If it is a NIC/HBA, do you know if it is up to date? Also, is the rest of the server up to date on BIOS, iDRAC, PERC, etc.? Lastly, if you remove the device in that slot from the VM, does it resolve the PSOD? If not, and you power down the server and remove the card, does that clear the fatal error?

1 Rookie

 • 

64 Posts

June 11th, 2025 14:21

Slot 6 has the raid controller (PERC H730P), and nothing in slot 7.

The PERC and BIOS firmware are 25.5.9.0001 and 1.19 respectively. The PERC driver version is 7.728.02.00. I believe these are up to date. Can you please confirm?

Do you have any thoughts on whether the reference to the "Dell OME 4.4" VM in the PSOD (see the PSOD screenshot in the 2nd comment above—line highlighted by yellow rectangle) indicates that the VM is causing the problem somehow? (Is it even possible for a VM to bring down ESXi v8?)

Two notes: (1) When I ran the pre-boot ePSA, everything passed. (2) Afterwards I rebooted, and it has been running fine for the last two days. Thank you.

If I remove the card from slot 6, the HDDs won't be connected, so I am not sure whether that would help.

(edited)

Moderator

 • 

9.6K Posts

June 11th, 2025 14:43

I am not sure if you mistyped the BIOS version, as the latest is 2.19, but the PERC looks to be up to date. Would you confirm what version the iDRAC is currently at as well?

Also, is the issue still occurring, as you stated it is running fine for the last couple of days, or has it stopped?

 

On a side note, I am not seeing ESXi 8 listed as supported on the R730. The latest version I see listed for that server is ESXi 7.0.

 

 

1 Rookie

 • 

64 Posts

June 11th, 2025 14:51

This is an R830. 

(1) Yes, I know ESXi 8 is not on the supported list, but I have been running it without issues for well over a year.

(2) The problem did not appear again. The server is running fine now. (rebooted on Monday around noon, so no issues for almost 47 hours).

(3) The BIOS firmware is 1.19.0 (is there an updated BIOS?). If I go to support and enter the service tag <Private data removed from public view. DELL-Admin>, I only see the following as the latest BIOS (which is the same version I have).

Dell Server BIOS PowerEdge R830 Version 1.19.0
Urgent BIOS 19 Mar 2024

(4) The iDRAC 8 is running Version 2.86.86.86 (Build 06)

Thank you.

(edited)

Moderator

 • 

9.6K Posts

June 11th, 2025 15:03

Sorry about that, the title of the posting was for an R730, so that is what I was going off of. The BIOS version for the R830 is indeed 1.19, whereas for the R730 it was 2.19. I don't see an update available for it currently. The iDRAC looks to be up to date as well.

Was anything added to the server, updated, removed, or changed at the time of the error? 

1 Rookie

 • 

64 Posts

June 11th, 2025 15:19

Huh, that was my mistake with that title. I don't think I can edit it. Can you (if so, please change R730 to R830)? 

Nothing was added or changed recently (Last upgrade was RAM from 384GB to 768GB in Dec 2024).

My internet search says power fluctuations might also cause a PSOD in VMware, but the server is connected via a good UPS, and we did not have power issues at the time. The temperature in the room was 35 degrees Celsius.

(edited)

Moderator

 • 

9.6K Posts

June 11th, 2025 15:25

I went ahead and changed the title to reflect the correct server. With the error having cleared and not currently causing any problems, it will be hard to troubleshoot what caused it unless there is something in the logs, seeing as the diagnostics came back clean. I would suggest monitoring the server for the time being. If the issue reappears, we can diagnose it while it is active.

1 Rookie

 • 

64 Posts

June 11th, 2025 15:34

Thank you. I will post if the error shows up again.
