November 1st, 2023 11:48
R650xs + ipxe == server hangs on reboot if tg3 driver is used in Linux
Summary: If the server boots iPXE in EFI mode, reboot/poweroff hangs in Linux when the tg3 driver is loaded.
Server: PowerEdge R650xs (SKU=0A19, s/n [private information removed by moderator])
BIOS: 1.11
Steps to reproduce:
1. Install any Linux with kernel 6.0 or newer (I used 6.5, but any 6.0+ version will do)
2. Switch server to EFI mode
3. Enable PXE on any interfaces
4. Enable PXE as first in boot order
5. Configure an external DHCP/TFTP server to serve ipxe.efi (any version)
6. Boot. ipxe.efi tries to obtain boot parameters and exits; the normal Linux installation then boots
7. Make sure the tg3 driver is loaded (rmmod tg3; modprobe tg3)
8. Reboot
Expected behavior: the server reboots.
Actual behavior: the system hangs after the message `ACPI: PM: Preparing to enter system sleep state S5`.
I identified the following change in the Linux kernel, commit 2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca, which modified the function tg3_shutdown in tg3.c:
```diff
-	if (system_state == SYSTEM_POWER_OFF)
-		tg3_power_down(tp);
+	tg3_power_down(tp);
```
If ipxe.efi ran on the server before Linux booted, this now-unconditional call to tg3_power_down makes the server hang on reboot.
I also confirmed that the problem is present in older kernels on power-off: even with kernel versions older than 6.0, `systemctl poweroff` hangs at the same point (`ACPI: PM: Preparing to enter system sleep state S5`).
The problem does not occur on other servers (I tested a Dell R350).
The problem happens regardless of whether the tg3-managed network interfaces (BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller) are in use; the mere presence of the tg3 driver is enough to trigger it.
Conversely, removing the tg3 driver solves the problem (because the kernel function tg3_shutdown is no longer called).
I suspect a problem in the UEFI SNP driver for the BCM57416, and kindly ask you to escalate this to your engineers.
DELL-Charles R
Moderator
November 1st, 2023 18:07
Hello amarao,
The title of the thread says R640xs but the service tag you posted says R650xs. I assume it is the R650xs. Please let me know if that is incorrect. Also please note I removed the Service Tag for privacy.
Note: RHEL6 is not listed as supported on the R650xs.
The R650xs supports these operating systems: https://dell.to/3QC4pcZ
BIOS 1.11.2 is the current version.
Are the iDRAC and NIC up to date?
iDRAC v.7.00.30.00
https://dell.to/3u64FrK
Also confirm that the NIC has the latest firmware and driver for your OS:
-------------------------
BCM57416 Broadcom NetXtreme-E Network Device Firmware 22.61.10.77
https://dell.to/3QGMfqw
RHEL8,9
NIC driver:
Broadcom Linux RPM packaged driver updates for NetXtreme-E Ethernet, 22.6
https://dell.to/3QF7Dwh
RHEL8,9
-------------------------
Broadcom NetXtreme-E Network Device Firmware 22.0
https://dell.to/3QE0L2n
RHEL7,8
Broadcom Linux RPM packaged driver updates for NetXtreme-E Ethernet, 22.0
https://dell.to/3MnPSis
RHEL7,8
-------------------------
Could you try to reproduce the issue on RHEL 7, 8, or 9 after applying the firmware/driver for that OS?
amarao
November 2nd, 2023 13:53
Thank you very much for the answer!
1. You are right, the proper title for the post is R650xs. I've fixed it, thanks.
2. BIOS version is 1.11.2 (I updated it yesterday and re-confirmed that the problem is present)
3. iDRAC version is 7.00.30.00 (updated yesterday, re-confirmed)
4. NIC firmware for the BCM5720 is FFV22.61.8 bc 5720-v1.39 (per `ethtool -i` output). It was updated from the 22.61.8 firmware package (Network_Firmware_4G8G9_WN64_22.61.8.EXE).
5. I installed the supported Ubuntu 22.04 with kernel 6.2.0-35 (the problem appears on reboot starting with kernel 6.0; on kernels older than 6.0 it happens only on power-off).
With all of the above, I was able to reproduce the problem with the most up-to-date drivers/firmware on Ubuntu 22.04, which is listed as a supported OS for the R650xs. Unfortunately, I don't have RHEL with kernel 6+ at hand.
I've sent a patch to Linux upstream, but they said that Dell kernel developers should look at it first. Could you please escalate this issue? Thanks!
amarao
November 3rd, 2023 11:55
Sorry, I haven't received an answer. Has this problem been acknowledged or rejected?
amarao
November 6th, 2023 13:05
I'm sorry, I really can't tell: has my problem been acknowledged? Is there anything else I can provide? I've sent a workaround to current Linux upstream, but it solves only the reboot problem (not shutdown), and I believe the problem is on the Dell side (within the EFI SNP stack and ACPI).
amarao
November 9th, 2023 12:24
Thank you!
I've sent the SupportAssist Collection.
amarao
November 10th, 2023 12:17
It's not a particular device's fault.
This error is logged every time I unload the tg3 driver and reboot, due to the improper power state of the device during reboot. I confirmed it on multiple servers; this particular server is just my lab machine. I'm investigating a reproducible problem on multiple servers during iPXE provisioning.