Severe performance degradation with PERC H965i when patrol reads are running (Unsolved)
ahaupt_desy · 1 Rookie · 3 Posts · April 10th, 2025 13:05
Dear all,
we are facing a serious issue with our PE-R760xd2 systems. Each hosts 24x 24 TB NL-SAS disks in two RAID 6 sets (12 disks each), controlled by a PERC H965i adapter. The systems run AlmaLinux 9 with the latest kernel, drivers, etc.
Versions in use:
[avocado10] /root # perccli2 /call show | grep ' Version'
CLI Version = 008.0008.0000.0012 Mar 16, 2024
Package Version = 8.8.0.0.18-32
Firmware Version = 8.8.0.225-00018-00014
Firmware Security Version Number = 00.00.00.00
NVDATA Version = 08.0E.00.09
Driver Version = 8.8.1.0.50
While patrol reads are running, read throughput from one of these RAID sets drops to ridiculously low numbers:
[avocado10] /root # perccli2 /c0 start patrolread
[...]
[avocado10] /root # dd if=/dcache/0/data.1G of=/dev/null bs=256k iflag=direct
4096+0 records in
4096+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 102.834 s, 10.4 MB/s
After stopping the patrol read, throughput goes back to acceptable numbers:
[avocado10] /root # perccli2 /c0 stop patrolread
[...]
[avocado10] /root # dd if=/dcache/0/data.1G of=/dev/null bs=256k iflag=direct
4096+0 records in
4096+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.43603 s, 312 MB/s
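(For anyone reproducing this: the controller's patrol read state can be checked between the two dd runs. Assuming perccli2 keeps the usual storcli-style syntax here, something like
[avocado10] /root # perccli2 /c0 show patrolread
should report the patrol read mode and whether one is currently in progress; verify against your CLI version's built-in help.)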
Anyone else seeing this?
Thanks in advance,
Andreas
DELL-Chris H · Moderator · 9.4K Posts · April 10th, 2025 20:15
ahaupt_desy · 1 Rookie · 3 Posts · April 11th, 2025 12:08
Dear Chris,
thanks for your reply! The firmware is already up to date.
Scheduling the patrol reads is not really an option, as these nodes are in active use around the clock. Since a patrol read also takes some time to finish (a day or so in the best case) and is automatically scheduled by the controller firmware once a week, scheduling does not really help in our case.
I really do not think it is normal that an enterprise-grade 12-disk RAID array falls back to the read performance level of a single (older) SD card once patrol reads are running ...
Cheers,
Andreas
Kadrel777 · 1 Rookie · 1 Message · June 18th, 2025 21:21
YES! Very similar behavior here (although with quite a few hardware differences from your setup). We have had a Dell Support ticket open and have been pulling our hair out over this one since ~February.
Dell PowerEdge R7625, Windows Server 2022 Datacenter (on-premises app + VDI host)
4x SAS 24 Gbps SSDs in RAID 6
Acceptably performant normally. Every Wednesday afternoon the patrol read hits and BAM - users scream.
We have a virtually identical R7625 (the ONLY differences being lower-tier SSDs and less RAM) running 4x SATA 6 Gbps SSDs in RAID 6 with NO PROBLEMS. We actually moved workloads to that one because they were running better there while we try to track down this issue. Dell Support has not been terribly useful.
DELL-Young E · Moderator · 5.1K Posts · June 19th, 2025 02:48
Hello ahaupt_desy, have you tried setting the patrol read priority below 10%? The default is 30%.
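For reference, a sketch of the commands (untested, and the exact keyword varies between CLI generations, so please verify with the built-in help):
perccli2 /c0 show patrolread          # check the current mode and rate first
perccli2 /c0 set patrolread rate=10   # assumed perccli2 spelling; older perccli used 'set prrate=10'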
Respectfully,
ahaupt_desy · 1 Rookie · 3 Posts · June 19th, 2025 11:48
@DELL-Young E
Yes, I already tried decreasing the priority, even down to 1%. Data read rates increase very slightly then, but are still unacceptable.
Meanwhile, I have found another affected customer (who runs mass storage for academia, just like us) suffering badly from this issue. It is clearly a problem with this particular RAID controller.
I still wonder why Dell's enterprise support was not able to correlate our support cases for weeks! It was me who finally provided them with the other customer's ticket ID ...
DELL-Young E · Moderator · 5.1K Posts · June 20th, 2025 04:44
Hello, glad you have an official ticket with Dell tech support. Keep working with them. Another option I may mention is to turn automatic patrol read off and run it manually whenever you can afford the performance degradation.
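A sketch of that approach, assuming perccli2 follows the storcli-style patrolread options (verify with the CLI's built-in help):
# switch patrol read from the firmware's automatic weekly schedule to manual-only
perccli2 /c0 set patrolread=on mode=manual
# (or disable it entirely with: perccli2 /c0 set patrolread=off)
# then trigger it yourself from cron in a quiet window, e.g. via /etc/cron.d/patrolread:
# 0 3 * * 6 root /usr/bin/perccli2 /c0 start patrolread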
Respectfully,
calestyo · 1 Rookie · 1 Message · June 28th, 2025 03:41
Having exactly the same problem: 10 PowerEdge R760xd2 servers with H755 firmware 52.26.0-5179... as soon as they go into patrol read, the performance becomes a bad joke (~10-60 MB/s).
We already had a similar issue a while ago, when the controller did a completely crazy rebuild (a broken HDD was removed, and it *then* started a rebuild *before* the replacement HDD was even added back... and it ran rebuilds concurrently *per* configured array, which is IMO completely nuts).
That was even discussed with support back then, but I guess they mostly ignored it, even though it is a fundamental and severe issue in the controller.