August 7th, 2022 03:00

VMware ESXi 7.0 PSOD/MCA report

Dear Dell Community,

I am running VMware ESXi in a medium-sized infrastructure. This morning the ESXi host had a PSOD (Purple Screen of Death). Since the server is no longer covered by warranty, I would like to ask for your help in debugging the issue.

I am running the following VMware build:
VMware ESXi 7.0.2 build-18538813
VMware ESXi 7.0 Update 2

 

I found the following entries in the logs:

2022-07-02T08:02:27.533Z cpu2:2097540)HPP: HppAADetermineStatus:96: Unknown Check condition 0/2 0x2 0x3a 0x1.
2022-07-02T08:02:27.579Z cpu4:2097540)ScsiUid: 319: Path 'vmhba2:C0:T5:L0' does not support VPD Device Id page.
2022-07-02T08:02:27.594Z cpu0:2097540)VMWARE SCSI Id: Could not get disk id for vmhba2:C0:T5:L0
2022-07-02T08:06:18.843Z cpu28:2098033)ScsiDeviceIO: 4298: Cmd(0x45d9161fe100) 0x85, CmdSN 0xc33c from world 2100216 to dev "naa.6d09466018c3430021709d9206764bd9" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
2022-07-02T08:06:18.845Z cpu41:2100216)WARNING: NvmeScsi: 156: SCSI opcode 0x85 (0x45d9161fe100) on path vmhba3:C0:T0:L0 to namespace t10.NVMe____Dell_Express_Flash_PM1725a_1.6TB_AIC____68010071E5382500 failed with NVMe error status: 0x1
2022-07-02T08:06:18.845Z cpu41:2100216)WARNING: translating to SCSI error H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
2022-07-02T08:06:18.845Z cpu48:2097267)ScsiDeviceIO: 4298: Cmd(0x45d9161fe100) 0x85, CmdSN 0xc341 from world 2100216 to dev "t10.NVMe____Dell_Express_Flash_PM1725a_1.6TB_AIC____68010071E5382500" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0
2022-07-02T08:07:27.534Z cpu27:2097540)HPP: HppAADetermineStatus:96: Unknown Check condition 0/2 0x2 0x3a 0x1.
2022-07-02T08:07:27.579Z cpu27:2097540)ScsiUid: 319: Path 'vmhba2:C0:T5:L0' does not support VPD Device Id page.
2022-07-02T08:07:27.594Z cpu27:2097540)VMWARE SCSI Id: Could not get disk id for vmhba2:C0:T5:L0
2022-07-02T08:11:42.170Z cpu21:2098032)ScsiDeviceIO: 4298: Cmd(0x45b918a501c0) 0x1a, CmdSN 0x6a17f4 from world 0 to dev "naa.6d09466018c3430021709d9206764bd9" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0
2022-07-02T08:12:27.534Z cpu1:2097540)HPP: HppAADetermineStatus:96: Unknown Check condition 0/2 0x2 0x3a 0x1.
2022-07-02T08:12:27.580Z cpu1:2097540)ScsiUid: 319: Path 'vmhba2:C0:T5:L0' does not support VPD Device Id page.
...skipping...
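
For anyone reading the storage warnings above, here is a rough sketch of how I read the "Valid sense data" triplets. It assumes the three values are the standard SCSI sense key, ASC and ASCQ, and the lookup tables only cover the codes that appear in this excerpt. The function name `decode_sense` is my own and not part of any VMware tool.

```python
import re

# Partial lookup tables (standard T10 SCSI codes); only the values seen in this log.
SENSE_KEYS = {0x2: "NOT READY", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x3A, 0x01): "MEDIUM NOT PRESENT - TRAY CLOSED",
}

# Matches the "Valid sense data: K ASC ASCQ" tail of a vmkernel SCSI error line.
SENSE_RE = re.compile(
    r"Valid sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)"
)


def decode_sense(line: str):
    """Return (sense key text, ASC/ASCQ text) for a log line, or None if absent."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    key, asc, ascq = (int(value, 16) for value in match.groups())
    return (
        SENSE_KEYS.get(key, f"unknown sense key {key:#x}"),
        ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ {asc:#x}/{ascq:#x}"),
    )


line = ("WARNING: translating to SCSI error H:0x0 D:0x2 P:0x0 "
        "Valid sense data: 0x5 0x20 0x0")
print(decode_sense(line))
```

If that reading is right, the 0x5/0x20 and 0x5/0x24 entries are the devices rejecting commands they do not implement (opcode 0x85 is ATA PASS-THROUGH (16), 0x1a is MODE SENSE (6)), which would make them unrelated to the crash itself.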

>>>>
2022-08-07T08:58:34.929Z cpu38:3285114)@BlueScreen: Machine Check Exception on PCPU38 in world 3285114:vmm2:ceph-rg
System has encountered a Hardware Error - Please contact the hardware vendor
>>>>

2022-08-07T08:58:34.940Z cpu38:3285114)Code start: 0x420035600000 VMK uptime: 184:18:56:02.157
2022-08-07T08:58:34.956Z cpu38:3285114)0x4538e5f1beb0:[0x42003574e446]IDTVMMMCE@vmkernel#nover+0x12 stack: 0xffffffffffffffff
2022-08-07T08:58:34.971Z cpu38:3285114)0x4538e5f1bf90:[0x420035750ba6]IDT_VMMForwardMCE@vmkernel#nover+0xb stack: 0x0
2022-08-07T08:58:34.986Z cpu38:3285114)0x4538e5f1bfa0:[0x420035728859]VMMVMKCall_Call@vmkernel#nover+0xee stack: 0x0
2022-08-07T08:58:35.004Z cpu38:3285114)0x4538e5f1bfe0:[0x420035754549]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x42003575453c
2022-08-07T08:58:35.014Z cpu38:3285114)base fs=0x0 gs=0x420049800000 Kgs=0x0

>>>>
2022-08-07T08:58:34.724Z cpu38:3285114)MCA: 196: UC Excp G5 B1 Sbb80000000000174 A0 M86 P0/0 Cache Hierarchy: Level 0 Data Cache Eviction Error.
>>>>

2022-08-07T08:58:35.041Z cpu38:3285114)CPU model name: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz, FMS: 06/4f/1, uCodeRev: b00003e
2022-08-07T08:58:35.041Z cpu38:3285114)PRODUCTNAME:PowerEdge R730, VENDORNAME:Dell Inc., SERIAL_NUMBER:52SJ1M2, SERVER_UUID:4c4c4544-0032-5310-804a-b5c04f314d32, VERSION:, SKU:SKU=NotProvided;ModelName=PowerEdge R730, FAMILY:
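
To show how I am reading the MCA line, here is a minimal sketch that decodes the status value. It assumes the hex number after `S` (bb80000000000174) is the raw IA32_MCi_STATUS register of the reporting bank and uses the bit layout from the Intel SDM. The `S` and `AR` flags are only meaningful if the CPU supports software error recovery, which I have not verified for this model. The helper name `decode_mci_status` is my own.

```python
# Flag bits of IA32_MCi_STATUS (Intel SDM, machine-check architecture).
FLAG_BITS = {
    "VAL": 63, "OVER": 62, "UC": 61, "EN": 60, "MISCV": 59,
    "ADDRV": 58, "PCC": 57, "S": 56, "AR": 55,
}

# Sub-fields of a cache hierarchy error code: 000F 0001 RRRR TTLL.
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "Instruction", 1: "Data", 2: "Generic"}
LEVEL = {0: "Level 0", 1: "Level 1", 2: "Level 2", 3: "Generic"}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF                       # MCA error code, bits 15:0
    result = {
        "flags": flags,
        "mca_error_code": code,
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }
    # Cache hierarchy errors have bit 8 set and bits 15:13 and 11:9 clear
    # (bit 12 is the correction-report filtering bit and is ignored here).
    if (code & 0xEF00) == 0x0100:
        result["cache_error"] = {
            "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
            "type": TRANSACTION.get((code >> 2) & 0x3, "reserved"),
            "level": LEVEL[code & 0x3],
        }
    return result


decoded = decode_mci_status(0xBB80000000000174)
print(decoded["flags"])
print(decoded["cache_error"])
```

With this value the sketch reports UC and PCC set (uncorrected error, processor context corrupt), ADDRV clear (no address recorded, matching `A0` in the log) and an EVICT request on a Level 0 data cache, which agrees with the text ESXi printed.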

 

Based on the MCA log entry found in the vmkernel dump, what do you think caused the PSOD?

Looking forward to your replies. 

