Unsolved
4 Posts
1
2254
November 19th, 2019 19:00
Dell Precision 5540 GPU down clocking to 300MHz and not coming up again
Hello, I already posted this to the NVIDIA forums, but I will add it here too in case anyone has any ideas.
My configuration is a Dell Precision 5540 with an i7-9850H and a Quadro T2000 (specifically NVIDIA Corporation TU117GLM [Quadro T2000 Mobile / Max-Q]), running Fedora 31 (5.3.9-300.fc31.x86_64) with NVIDIA driver 440.31 installed from the akmod package.
The problem I'm encountering is that in basically all 3D applications, after about 5 minutes, the GPU starts to throttle. That is normal and understandable, especially on a laptop. But the problem is that it throttles down to 300 MHz and does not clock any higher without a reboot. The machine is basically unusable at that point. The PowerMizer Preferred Mode setting does not affect this at all.
For demonstration purposes I wrote a script that takes the current GPU frequency and temperature and appends them to a csv file (attached), and I ran the TW: WH 2 benchmarks in this order:
1. battle benchmark (avg fps 47.7)
2. skaven benchmark (avg fps 48.6)
3. battle benchmark (avg fps 12.7); as can be seen, the effect on performance is huge
I will attach the csv file from the script, where at timestamp 1574211968198 a drop in the frequency can be observed; this happened close to the end of the skaven benchmark. After that the temperature starts going down, but the clock speed never picks up. I will attach screenshots from the runs and also provide the nvidia-bug-report.log.
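For anyone who wants to collect the same kind of log, here is a minimal sketch of such a logger. It is not the original script (that one is only available through the attachment); it assumes nvidia-smi is on the PATH and uses its clocks.gr and temperature.gpu query fields. The file name gpu_log.csv, the one second interval and the function names are made up for illustration.

```python
import csv
import subprocess
import time

# nvidia-smi query fields: graphics clock (MHz) and GPU temperature (deg C)
QUERY = "clocks.gr,temperature.gpu"


def parse_sample(line):
    """Turn one nvidia-smi CSV line such as '1785, 71' into (mhz, celsius)."""
    clock, temp = (field.strip() for field in line.split(","))
    return int(clock), int(temp)


def read_gpu():
    """Ask nvidia-smi for the current clock and temperature of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Append rows of (epoch milliseconds, MHz, deg C) until interrupted."""
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        while True:
            mhz, temp = read_gpu()
            writer.writerow([int(time.time() * 1000), mhz, temp])
            fh.flush()  # keep the file usable even if the run is killed
            time.sleep(interval_s)
```

Calling log_forever() in a terminal before starting a benchmark and stopping it with Ctrl+C afterwards gives one row per sample, with a millisecond timestamp like the one quoted above.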
I do realize that this might be a compatibility issue with my laptop manufacturer, as the CPU starts to throttle at the same time as the GPU. Yet the CPU recovers normally as soon as the temps recover, while the GPU does not recover without a reboot.
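To tell a normal thermal dip apart from a clock that stays stuck, a log like this can be scanned for the last drop that never recovers. The sketch below assumes rows of (timestamp in ms, clock in MHz, temperature in deg C) in that column order, which may not match the attached file exactly; find_stuck_drop and load_samples are hypothetical helper names.

```python
import csv


def load_samples(path):
    """Read rows of timestamp_ms, mhz, celsius (assumed column order)."""
    with open(path, newline="") as fh:
        return [(int(ts), int(mhz), int(temp)) for ts, mhz, temp in csv.reader(fh)]


def find_stuck_drop(samples, floor_mhz=300):
    """Return the timestamp where the clock fell to floor_mhz or below and
    stayed there until the end of the log, or None if it always recovered."""
    stuck_at = None
    for ts, mhz, _temp in samples:
        if mhz <= floor_mhz:
            if stuck_at is None:
                stuck_at = ts  # start of a possible stuck period
        else:
            stuck_at = None  # clock recovered, so the earlier dip was transient
    return stuck_at
```

A dip that recovers is ignored, so a non-None result means the clock was still at the floor when logging stopped, which is the behaviour described here.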
Files:
nvidia-bug-report.log.gz https://drive.google.com/open?id=16ErfsUudEoH4dkbsHN0BIarUEE03JFTb
frequency and temperature csv https://drive.google.com/open?id=11uDnWnEw3SW-b6knXNG1ZuiRJBHXH5_L
first benchmark https://drive.google.com/open?id=1_VF2Q5e3ea4F0vZUxj1zCqULdHSXlVlQ
second benchmark https://drive.google.com/open?id=1Zk4r3j9EuMs0tKJ6LSzWIwKsCOlX0Jjr
third benchmark https://drive.google.com/open?id=1RxVZuFE_Gv5wm_mKpnKhswyaQa98vSqv
Hopefully someone here can figure something out. If there are some parameters or settings I could test, I'm more than willing to give them a try!



sopsaare
4 Posts
1
November 20th, 2019 10:00
It all makes sense when one realizes that the GPU in the 5540 is a T2000 Max-Q, which is not supported on Linux by NVIDIA.
This list contains the T2000 but not the Max-Q variant:
https://download.nvidia.com/XFree86/Linux-x86_64/440.31/README/supportedchips.html
Which is again quite weird, as the laptop was sold with Linux, and I cannot really understand how the laptop is supposed to have Linux support when NVIDIA itself does not list the T2000 Max-Q as supported in its Linux drivers.
Yet the Dell web page for my Service Tag shows NVIDIA Quadro T2000 4GB, while Linux identifies the card as NVIDIA Corporation TU117GLM [Quadro T2000 Mobile / Max-Q] (rev a1), so I cannot really be certain which it is. Or is there any difference?
Yet it would make sense that this is a "Max-Q" model, which does matter for power, heat and clock management, and that it is not supported by the current drivers. If this really is the case, it would give me some hope that this gets resolved.
Dell-Kiran R
12 Posts
0
November 22nd, 2019 00:00
Hi, thank you for reaching out to us.
Have you checked the system requirements and operating system requirements for the 3D applications that are causing the issue?
Dell has tested Red Hat Linux 8.0, Ubuntu 18.04 and Windows 10 on your system model, hence it should work fine. Again, you need to check whether the 3D applications can work with the operating system version that you have installed.
However, you can try the troubleshooting steps below and check if that makes any difference.
1) Right click on desktop and go to Intel Graphics Settings or launching from Start menu -> Intel Graphics Control Panel
2) Click Power tile on Intel Graphics Control Panel
3) Click Disable in Panel Self-Refresh on On Battery page and then click Apply or Remove the AC adapter from the system, click Disable in Panel Self-Refresh and then click Apply
4) Click Disable in Panel Self-Refresh on Plugged In page and then click Apply or Attach the AC adapter to the system, click Disable in Panel Self-Refresh and then click Apply.
sopsaare
4 Posts
1
November 22nd, 2019 12:00
This is not a helpful answer.
First of all, the laptop was delivered with Ubuntu Linux and your troubleshooting steps are for Windows.
Second of all, all Linux distributions use the same NVIDIA drivers, so the distribution does not make much difference here.
Third of all, I stated that the behavior can be observed with any 3D application that is heavy enough to make the GPU throttle.
Fourth of all, I stated that the GPU clock does not come back up after the first throttle no matter what, and this does not depend on the application I'm using. This in itself tells me that the problem is with the drivers / firmware, as the problem applies to ALL applications after the first throttle.
What you could have told me is whether the Quadro T2000 in the laptop is a Max-Q variant; that information should be available to you.
But thanks anyway, let's hope that someone else has found a solution / will find a solution, or at least reads this before buying this specific machine to be used with Linux.
Dell-Kiran R
12 Posts
0
November 23rd, 2019 03:00
Thank you for your reply.
We understand your situation and we are here to help you. What is the current Ubuntu version you have installed?
Please private message your system service tag for us to check your system details and assist you further.
BobGorman
7 Posts
0
December 2nd, 2019 09:00
jmamede
1 Message
0
November 24th, 2020 15:00
Same happens to me on RHEL 8.x.
When on AC: the NVIDIA GPU throttles to 300 MHz.
When on battery: everything is normal, 1000+ MHz.
Something is wrong in the firmware or the driver, probably forcing power saving mode on AC and using adaptive mode when on battery.
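One way to narrow this down might be to ask the driver which throttle reasons it reports while the clock is stuck, once on AC and once on battery. The sketch below assumes nvidia-smi is on the PATH and that the driver accepts the clocks_throttle_reasons.* query fields (newer drivers also list them under a clocks_event_reasons prefix; nvidia-smi --help-query-gpu shows what a given version supports). The function names are made up for illustration.

```python
import subprocess

# Throttle reason fields to query; verify them with `nvidia-smi --help-query-gpu`
REASONS = [
    "clocks_throttle_reasons.sw_power_cap",
    "clocks_throttle_reasons.hw_slowdown",
    "clocks_throttle_reasons.hw_thermal_slowdown",
    "clocks_throttle_reasons.hw_power_brake_slowdown",
    "clocks_throttle_reasons.sw_thermal_slowdown",
]


def parse_reasons(line, names=REASONS):
    """Map a line like 'Not Active, Active, ...' to the names that are Active."""
    values = [value.strip() for value in line.split(",")]
    return [name for name, value in zip(names, values) if value == "Active"]


def active_throttle_reasons():
    """Return the throttle reasons the driver currently reports for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(REASONS), "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_reasons(out.splitlines()[0])
```

If hw_power_brake_slowdown shows up only on AC, that would point at the platform asserting a power brake rather than at a PowerMizer setting, though that is only a guess until someone checks it on this machine.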
I tried to configure the module via /etc/modprobe.d/nvidia.conf and /etc/X11/Xorg.conf, respectively, with options such as:
options nvidia NVreg_RegistryDwords="OverrideMaxPerf=0x1;PowerMizerEnable=0x1; PerfLevelsrc=0x2233; PowerMizerLevel=0x1; PowerMizerDefault=0x2; PowerMizerDefaultAC=0x1"
Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelsrc=0x2233; PowerMizerLevel=0x1; PowerMizerDefault=0x2; PowerMizerDefaultAC=0x1"
It does not help.
Please help us fix this. It might be a bad BIOS hysteresis configuration from Dell (the driver and configuration are the same on my other laptop, and this does not occur there with an RTX 2060).
Please help us, Dell. I do machine learning for work on this laptop and I could really use the extra speed, as my processes run for nights on end.