Unsolved

1 Rookie

November 1st, 2023 11:48

R650xs + ipxe == server hangs on reboot if tg3 driver is used in Linux

Summary: If the server boots iPXE in EFI mode, reboot/poweroff hangs in Linux when the tg3 driver is loaded.

Server: PowerEdge R650xs (SKU=0A19, s/n [private information removed by moderator])

BIOS: 1.11

Steps to reproduce:

1. Install any Linux with kernel 6.0 or newer (I used 6.5, but any version from 6.0 onward will do)

2. Switch server to EFI mode

3. Enable PXE on any interfaces

4. Set PXE as first in the boot order

5. Configure an external DHCP/TFTP server to serve ipxe.efi (any version)

6. Boot. ipxe.efi tries to obtain boot parameters and exits, then the locally installed Linux is loaded

7. Make sure the tg3 driver is loaded (`rmmod tg3; modprobe tg3`)

8. Reboot
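Step 7 can also be verified from a program rather than by eye. Below is a minimal C sketch, assuming tg3 is built as a loadable module (a tg3 compiled into the kernel does not appear in /proc/modules at all). The helpers `module_listed` and `module_loaded` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if some line of `text` starts with `name` followed by a space.
 * /proc/modules lines look like "<name> <size> <refcount> <deps> <state> <addr>". */
static bool module_listed(const char *text, const char *name)
{
    assert(text != NULL && name != NULL);
    size_t len = strlen(name);
    const char *line = text;
    while (*line != '\0') {
        /* Require the trailing space so "tg3" does not match e.g. "tg3x". */
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return true;
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return false;
}

/* Returns 1 if `name` is a loaded module, 0 if not, -1 if /proc/modules
 * cannot be opened. Modules built into the kernel are not listed there. */
static int module_loaded(const char *name)
{
    FILE *f = fopen("/proc/modules", "r");
    if (f == NULL)
        return -1;
    char buf[512];
    int found = 0;
    /* fgets() reads one line per call (longer lines arrive in several pieces). */
    while (fgets(buf, sizeof buf, f) != NULL) {
        if (module_listed(buf, name)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

For instance, `module_loaded("tg3")` should return 1 after step 7 and 0 after `rmmod tg3`.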

Expected behavior: the system reboots.

Actual behavior: the system hangs after the message `ACPI: PM: Preparing to enter system sleep state S5`.

I identified the following change in the Linux kernel, commit 2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca, which modified the function tg3_shutdown in tg3.c:

```diff
-       if (system_state == SYSTEM_POWER_OFF)
-               tg3_power_down(tp);
+       tg3_power_down(tp);
```

If ipxe.efi was ever run on the server, this now-unconditional call to tg3_power_down causes the server to hang on reboot.

I also confirmed that the problem is present in older kernels on power-off: even with kernels older than 6.0, `systemctl poweroff` hangs at the same point (`ACPI: PM: Preparing to enter system sleep state S5`).
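To make the behavior change concrete, here is a small userspace C model of the decision tg3_shutdown makes before and after commit 2ca1c94ce0b6. It is not kernel code: the enum is a simplified, hypothetical stand-in for the kernel's `system_state` values, and the two functions only model whether `tg3_power_down` would be reached. On reboot the kernel sets `system_state` to `SYSTEM_RESTART` before shutting devices down, so before the commit the power-down path was reached only on power-off, and after it the path is reached on reboot as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's system_state values.
 * Only the states relevant here are modeled; numeric values differ from the kernel. */
enum model_system_state {
    MODEL_SYSTEM_RUNNING,
    MODEL_SYSTEM_HALT,
    MODEL_SYSTEM_POWER_OFF,
    MODEL_SYSTEM_RESTART,
};

/* Before commit 2ca1c94ce0b6: tg3_power_down() only ran on power-off. */
static bool tg3_powers_down_before(enum model_system_state state)
{
    return state == MODEL_SYSTEM_POWER_OFF;
}

/* After commit 2ca1c94ce0b6: tg3_power_down() runs whenever tg3_shutdown() is called. */
static bool tg3_powers_down_after(enum model_system_state state)
{
    (void)state; /* the state is no longer consulted */
    return true;
}
```

With reboot mapped to `MODEL_SYSTEM_RESTART`, `tg3_powers_down_before` returns false and `tg3_powers_down_after` returns true. That is the same split as the observed symptoms: kernels older than 6.0 hang only on power-off, while 6.0 and newer hang on both.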

The problem does not occur on the other server model I tested (Dell R350).

This problem happens regardless of whether the tg3-managed network interfaces (the embedded BCM5720 NetXtreme Gigabit Ethernet controller) are used or not; the tg3 driver merely being loaded is enough to trigger it.

Conversely, removing the tg3 driver solves the problem (because the kernel function tg3_shutdown is no longer called).

I suspect a problem in the UEFI SNP driver for the BCM5720 and kindly ask you to escalate this to the engineers.

Moderator

November 1st, 2023 18:07

Hello amarao,

The title of the thread says R640xs but the service tag you posted says R650xs. I assume it is the R650xs. Please let me know if that is incorrect. Also please note I removed the Service Tag for privacy.

Note: RHEL6 is not listed as supported on the R650xs.

The R650xs supports these operating systems: https://dell.to/3QC4pcZ

BIOS 1.11.2 is the current version.

Are the iDRAC and NIC up to date?

iDRAC v.7.00.30.00

https://dell.to/3u64FrK

Also confirm that the NIC has the latest firmware and driver for your OS:

-------------------------

BCM57416 Broadcom NetXtreme-E Network Device Firmware 22.61.10.77

https://dell.to/3QGMfqw

RHEL8,9

NIC driver:

Broadcom Linux RPM packaged driver updates for NetXtreme-E Ethernet, 22.6

https://dell.to/3QF7Dwh

RHEL8,9

-------------------------

Broadcom NetXtreme-E Network Device Firmware 22.0

https://dell.to/3QE0L2n

RHEL7,8

Broadcom Linux RPM packaged driver updates for NetXtreme-E Ethernet, 22.0

https://dell.to/3MnPSis

RHEL7,8

-------------------------

Could you try to reproduce the issue on RHEL 7, 8, or 9 after applying the firmware/driver for that OS?

1 Rookie

November 2nd, 2023 13:53

Thank you very much for the answer!

1. You are right, the proper title for the post is R650xs. I've fixed it, thanks.

2. BIOS version 1.11.2 (I updated it yesterday and re-confirmed that the problem is still present)
3. iDRAC version is 7.00.30.00 (updated yesterday, re-confirmed)
4. NIC firmware for the BCM5720 is FFV22.61.8 bc 5720-v1.39 (per `ethtool -i` output). It was updated using firmware package 22.61.8 (Network_Firmware_4G8G9_WN64_22.61.8.EXE).
5. I installed the supported Ubuntu 22.04 with kernel 6.2.0-35 (the problem appears on reboot starting with kernel 6.0; with kernels older than 6.0 it happens only on power-off).

With all of that in place, I was able to reproduce the problem with the most up-to-date drivers/firmware on Ubuntu 22.04, which is listed as a supported OS for the R650xs. Unfortunately, I don't have RHEL with a 6.0 or newer kernel at hand.

I've sent a patch to Linux upstream, but I was told that Dell kernel developers should see it first. Could you escalate this issue? Thanks!

Moderator

November 2nd, 2023 15:12

Hi,

For RHEL, we probably did not patch this for the R650xs.

With kind regards, Martin

DELL-Martin S

Social Media and Communities Professional

Dell Technologies | Enterprise Support Services

#IWork4Dell

Did I answer your query? Please click on ‘Mark as Accepted Answer’. ‘Thumbs up’ the posts you like!

1 Rookie

November 3rd, 2023 11:55

Sorry, I don't understand the answer. Has this problem been acknowledged or rejected?

Moderator

November 3rd, 2023 15:17

Sorry, my mistake: I meant RHEL 6.

DELL-Martin S

1 Rookie

November 6th, 2023 13:05

I'm sorry, I really can't tell: has my problem been acknowledged? Is there anything else I can provide? I've sent a workaround to current Linux upstream, but it solves only the reboot problem (not power-off), and I believe the root cause is on the Dell side (within the UEFI SNP stack and ACPI).

Moderator

November 6th, 2023 13:46

Hello amarao,

Could you pull a SupportAssist report and upload it for me to review?

Export a SupportAssist Collection Using an iDRAC9

https://dell.to/3spGNiv

How to Share Log Files, Screenshots, and Error Messages with Dell

https://dell.to/3QsihoV

DELL-Charles R

1 Rookie

November 9th, 2023 12:24

Thank you!

I've sent the SupportAssist Collection.

Moderator

November 9th, 2023 14:40

Hello George,

Thank you for the update. I will gather the logs, review them, and update you. Please allow me some time for this.

DELL-Charles R

Moderator

November 9th, 2023 15:57

Hello George,

I would recommend contacting Support directly. It looks like the system board may need to be replaced, because the embedded NIC is reporting errors:

TSR review:

2023-11-08 14:29:55        117        A fatal error was detected on a component at bus 4 device 0 function 0.

004 : 00 : 00        Broadcom Inc. and subsidiaries        NetXtreme BCM5720 Gigabit Ethernet PCIe        NIC.Embedded.1-1-1

DELL-Charles R

1 Rookie

November 10th, 2023 12:17

It's not a particular device's fault.

This error is logged every time I unload the tg3 driver and reboot, due to an improper power state of the device during reboot. I confirmed it on multiple servers; this particular server is just my lab unit. I'm investigating a reproducible problem on multiple servers during iPXE provisioning.
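The TSR entry quoted earlier identifies the NIC as bus 4, device 0, function 0. To inspect that device from Linux (for example with `lspci -s 04:00.0` or under `/sys/bus/pci/devices/`), the address has to be written in Linux's domain:bus:device.function notation. The helper below is a hypothetical sketch that does only this formatting, assuming PCI segment (domain) 0 and that Linux keeps the firmware's bus numbering, which is usual but not guaranteed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes a PCI address as "DDDD:BB:DD.F" (hex), the form used by `lspci -D`
 * and by directory names under /sys/bus/pci/devices.
 * Returns 0 on success, -1 on an out-of-range field or a too-small buffer. */
static int format_pci_bdf(char *buf, size_t len, unsigned domain,
                          unsigned bus, unsigned dev, unsigned fn)
{
    /* PCI limits: 256 buses, 32 devices per bus, 8 functions per device. */
    if (bus > 0xff || dev > 0x1f || fn > 0x7)
        return -1;
    int n = snprintf(buf, len, "%04x:%02x:%02x.%x", domain, bus, dev, fn);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the TSR entry, `format_pci_bdf(buf, sizeof buf, 0, 4, 0, 0)` produces `"0000:04:00.0"`.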

Moderator

November 10th, 2023 14:10

Hello,

Please contact Support directly; our engineering team can look at this with you in a remote session, which we cannot do in the forum.

Thanks

DELL-Marco B