1 Rookie

 • 

1 Message

September 2nd, 2024 06:03

High PSU fan speed

Hi everyone,

I am struggling with very noisy PSU fans. I had this issue a while back, and a firmware upgrade of the two power supplies fixed it then. However, no firmware upgrade is available now, and I can't think of anything else that would fix it.

The machine is a Dell T630 with 2 Delta 1100W PSUs (PN: Y26KX). FW is 00.1D.80. Peak power draw is usually below 200W according to iDRAC. However, there are occasional peaks of about 300W.

Sometimes a reboot and unplugging the PSUs helps, but the issue comes back. No errors are reported in iDRAC. System fans are normal.


Power settings:

Power Cap Policy: disabled

Redundancy Policy: Input Power Redundant

Hot Spare: enabled

Primary Power Supply Unit: PSU1

PFC: disabled

Is there a way to find out why the PSU fans ramp up that much? Are there any IPMI commands to check PSU fan data?

Moderator

 • 

2.8K Posts

September 2nd, 2024 11:14

Hello,

This could be happening because of poor airflow in the case or a buildup of dust. A faulty part in the PSU or elsewhere in the system might also be causing it to draw more power, which makes the fans speed up. Do you see any warnings or errors in the system event logs? Please also make sure your iDRAC and BIOS are up to date.

You can review the available commands in the iDRAC 8/7 v2.50.50.50 RACADM CLI Guide (https://dell.to/3MrRdoj).

 

Hope that helps!

1 Rookie

 • 

3 Posts

January 9th, 2025 12:26

I'm also having this issue. PSU fans ramp up to high speed during boot despite low active power draw (~250W). Power from the GPU boards is in use with two GPUs. I upgraded from 1100W power supplies to 1600W and am still experiencing loud, constantly high-speed PSU fans despite cold ambient temps and no load. I also have 16/18 drive bays populated.

I can confirm there is no buildup of dust. The behavior is the same across all 4 PSUs (all with updated firmware) so it doesn't seem likely all are faulty in the same manner. 

How would I determine if there is a faulty part somewhere else in the system? Are there specific iDRAC commands in the command reference you sent that are relevant to investigating this issue? Thank you for your help!

Moderator

 • 

3.5K Posts

January 9th, 2025 14:09

Hi,

I'm sorry you're dealing with this persistent noisy PSU fan issue. Since a firmware update previously resolved the problem and that's not an option now, we need to explore other possibilities. The fact that the issue persists across multiple PSUs (even after upgrading to 1600W units) suggests the problem likely isn't solely with the power supplies themselves.

Let's systematically investigate potential causes:

  1. iDRAC Monitoring: While you mention no iDRAC errors, we need to delve deeper. Check these aspects via iDRAC's web interface or command-line interface:

    • PSU individual readings: Don't just look at overall power draw. Check the individual voltage rails (3.3V, 5V, 12V) of each PSU. Inconsistent voltages or readings outside the acceptable range on any rail could indicate a problem.
    • Temperature sensors: Monitor the internal temperature sensors within the chassis and on the PSUs themselves (if available). High temperatures, even with seemingly low power draw, can trigger the fans.
    • Power supply health: iDRAC often provides a health status for each PSU. Look for any warnings or degraded status reports.
    • Log files: Examine the iDRAC event logs for any entries related to power supply events, even if not classified as errors. Sometimes warnings or informational messages hint at underlying issues.
  2. Load Distribution: With two GPUs and numerous drives, uneven load distribution could be a factor, even if the total power draw seems low. Try these steps to check for inconsistencies:

    • Isolate components: If possible, temporarily remove some of the less critical components (e.g., some drives) to see if the fan noise changes. This helps isolate if a specific component or group of components is overloading a particular PSU.
    • GPU power draw: Monitor individual GPU power consumption (using GPU monitoring software). One GPU might be drawing more power than expected, leading to uneven PSU loading and triggering the fans.
  3. Cable Management and Connections:

    • Secure connections: Ensure all power supply cables are securely connected to both the PSUs and the motherboard/components. Loose connections can lead to unexpected power fluctuations.
    • Cable routing: Check for any pinched or damaged cables that might cause resistance or poor connections.
  4. System Board: While less likely, a failing component on the system board could send incorrect signals to the PSUs, causing them to overreact. This is a more difficult issue to diagnose without specialized tools.

iDRAC Commands (Illustrative Examples - Consult your iDRAC documentation): The exact commands depend on your iDRAC firmware version. These are general examples:

  • racadm getsysinfo (Get overall system information, including temperatures and power readings).
  • racadm getpsuinfo (Get specific information about each power supply).
  • racadm getsel (Retrieve System Event Log entries).

I sympathize with the frustration of dealing with this persistent noise. Let's work through this systematically to find a solution.

1 Rookie

 • 

3 Posts

January 9th, 2025 15:28

@Dell-Martin S Thank you so much for your quick reply. To clarify a bit, this is with an idle system. There are VMs running but no load anywhere. Note that I have tried both automatic system fan speed (6 fans) and ramping the fans down with a script that monitors temperatures and ramps them back up when needed. The PSU fans behave the same in both cases.

PSU individual readings: Don't just look at overall power draw. Check the individual voltage rails (3.3V, 5V, 12V) of each PSU. Inconsistent voltages or readings outside the acceptable range on any rail could indicate a problem.

On the iDRAC Power/Thermal -> voltages page all voltages are listed as okay. I'm not seeing per-PSU readings for different rails. I installed racadm v9.4.0 (https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=0992n) but it doesn't have the getpsuinfo command.

# ./racadm getpsuinfo
ERROR: Invalid subcommand specified.
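
Since `getpsuinfo` is rejected here, one alternative is to read the sensor records over IPMI instead. Below is a minimal Python sketch, assuming `ipmitool` is installed and can reach the BMC locally. It runs `ipmitool sdr type <type>` and splits each row into its five `|`-separated fields. The helper names (`parse_sdr_rows`, `read_sensors`) are made up for illustration, and I can't confirm that the PSU fan tachometer is exposed as a sensor on this platform; the power supply entries may only carry a status.

```python
import shutil
import subprocess


def parse_sdr_rows(text):
    """Parse rows shaped like: name | sensor id | status | entity | reading."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 5:
            continue  # skip blank lines or anything not in the five-field layout
        name, sensor_id, status, entity, reading = parts
        rows.append({
            "name": name,
            "sensor_id": sensor_id,
            "status": status,
            "entity": entity,
            "reading": reading,
        })
    return rows


def read_sensors(sensor_type):
    """Run `ipmitool sdr type <sensor_type>`; return [] if ipmitool is unavailable."""
    if shutil.which("ipmitool") is None:
        return []
    try:
        result = subprocess.run(
            ["ipmitool", "sdr", "type", sensor_type],
            capture_output=True, text=True, check=False, timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    return parse_sdr_rows(result.stdout)


if __name__ == "__main__":
    # Sensor type names as listed by `ipmitool sdr type list`.
    for sensor_type in ("Fan", "Power Supply", "Temperature"):
        for row in read_sensors(sensor_type):
            print(f"{sensor_type:12} {row['name']:24} {row['status']:4} {row['reading']}")
```

Running this once with the PSU fans quiet (during POST) and once after they ramp up would show whether any exposed sensor changes between the two states. On the RACADM side, `racadm getsensorinfo` may be worth trying as well, though its output depends on the iDRAC firmware version.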

Temperature sensors: Monitor the internal temperature sensors within the chassis and on the PSUs themselves (if available). High temperatures, even with seemingly low power draw, can trigger the fans

All temperature sensors are listed as OK. CPU temps are 26C/27C and the inlet temp is 15C.



Power supply health: iDRAC often provides a health status for each PSU. Look for any warnings or degraded status reports.

Both are listed as OK, no warnings. One at 1.2A, 122V the other at 0.8A, 122V. Both are part 095HR5A04, FW 00.3D.67.
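
As a quick sanity check on those readings, here is a small Python sketch that converts them to approximate input power and load share. It assumes a power factor close to 1 (so volt-amps roughly equal watts), treats the iDRAC values as rounded, and assumes the installed units are the 1600W ones mentioned earlier in the thread. The labels `first` and `second` are arbitrary, since the post doesn't say which PSU shows which current.

```python
LINE_VOLTAGE_V = 122.0                            # input voltage reported for both PSUs
PSU_CURRENTS_A = {"first": 1.2, "second": 0.8}    # arbitrary labels, see note above
RATED_OUTPUT_W = 1600.0                           # assumption: the 1600W units are installed


def summarize(voltage_v, currents_a, rated_w):
    """Return per-PSU apparent power, share of total, and load vs. rating."""
    per_psu_va = {name: voltage_v * amps for name, amps in currents_a.items()}
    total_va = sum(per_psu_va.values())
    summary = {}
    for name, va in per_psu_va.items():
        summary[name] = {
            "va": round(va, 1),
            "share_pct": round(100.0 * va / total_va, 1),
            "load_pct": round(100.0 * va / rated_w, 1),
        }
    return summary, round(total_va, 1)


if __name__ == "__main__":
    summary, total_va = summarize(LINE_VOLTAGE_V, PSU_CURRENTS_A, RATED_OUTPUT_W)
    for name, values in summary.items():
        print(name, values)
    print("total VA:", total_va)
```

The total works out to about 244 VA, which lines up with the ~250W draw mentioned earlier, split roughly 60/40 between the two supplies. Each unit is running at well under 10% of its rating, so electrical load alone doesn't look like a reason for the fans to run fast.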

Log files: Examine the iDRAC event logs for any entries related to power supply events, even if not classified as errors. Sometimes warnings or informational messages hint at underlying issues.

Unless there are logs to check other than the iDRAC event logs / SEL entries, I'm not seeing anything other than power supply and drive insertion/removal events, which were all intentional.

Isolate components: If possible, temporarily remove some of the less critical components (e.g., some drives) to see if the fan noise changes. This helps isolate if a specific component or group of components is overloading a particular PSU.

Removing all 18 drives (I had added two spares I forgot about) did not change the PSU fan speed. Removing both GPUs reduced the PSU fan speed to a mid-level. Not idle/low like during boot but a clear step below the noise level with GPUs installed. I did not try removing only one GPU.

GPU power draw: Monitor individual GPU power consumption (using GPU monitoring software). One GPU might be drawing more power than expected, leading to uneven PSU loading and triggering the fans.

As I mentioned, this is at complete idle. nvidia-smi reports power draw of 5W and 7W with temps of 24C and 26C.

Secure connections: Ensure all power supply cables are securely connected to both the PSUs and the motherboard/components. Loose connections can lead to unexpected power fluctuations.

I did secure cables that were accessible without removing additional components. The GPU power cables on the GPU side were re-seated along with a 10GbE and NVMe PCIe card.

Cable routing: Check for any pinched or damaged cables that might cause resistance or poor connections.

I did not observe any cables that showed signs of pinching or damage or would appear to be bent tightly. I did not remove other components to confirm.

System Board: While less likely, a failing component on the system board could send incorrect signals to the PSUs, causing them to overreact. This is a more difficult issue to diagnose without specialized tools.

Other than a basic multimeter, I don't have the specialized tools required.

One note about behavior. On system power up the PSU fans immediately spin up and then slowly reduce speed during POST. As soon as POST completes and while attempting to find boot loaders the PSUs ramp to the high level and stay that way.

Again, I appreciate your assistance and insight into what things to check and probe! 

1 Rookie

 • 

3 Posts

January 19th, 2025 13:01

Just checking back in to see whether there is anything else to check or try to keep the PSU fan speed down when the system is idle and power draw is low. Thanks!

Moderator

 • 

5.1K Posts

January 20th, 2025 07:10

Hello, I'm afraid I'm not aware of a method that lowers PSU fan speed.

Respectfully,
