Severe performance degradation with PERC H965i when patrol reads are running (Unsolved)
ahaupt_desy · 1 Rookie · 3 Posts · April 10th, 2025 13:05
Dear all,
we are facing a serious issue with our PE-R760xd2 systems. Each hosts 24x 24 TB NL-SAS disks in two RAID 6 sets (12 disks each), controlled by a PERC H965i adapter. The systems run AlmaLinux 9 with the latest kernel, drivers, etc.
Versions in use:
[avocado10] /root # perccli2 /call show | grep ' Version'
CLI Version = 008.0008.0000.0012 Mar 16, 2024
Package Version = 8.8.0.0.18-32
Firmware Version = 8.8.0.225-00018-00014
Firmware Security Version Number = 00.00.00.00
NVDATA Version = 08.0E.00.09
Driver Version = 8.8.1.0.50
While patrol reads are running, read throughput from one of these RAID sets drops to ridiculously low numbers:
[avocado10] /root # perccli2 /c0 start patrolread
[...]
[avocado10] /root # dd if=/dcache/0/data.1G of=/dev/null bs=256k iflag=direct
4096+0 records in
4096+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 102.834 s, 10.4 MB/s
After stopping the patrol read, throughput goes back to acceptable numbers:
[avocado10] /root # perccli2 /c0 stop patrolread
[...]
[avocado10] /root # dd if=/dcache/0/data.1G of=/dev/null bs=256k iflag=direct
4096+0 records in
4096+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.43603 s, 312 MB/s
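(For anyone reproducing this: the controller's patrol read state can be checked between the two dd runs. Assuming perccli2 keeps the usual storcli-style syntax here, something like
[avocado10] /root # perccli2 /c0 show patrolread
should report the patrol read mode and whether one is currently in progress; verify against your CLI version's built-in help.)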
Anyone else seeing this?
Thanks in advance,
Andreas
DELL-Chris H · Moderator · 9.4K Posts · April 10th, 2025 20:15
ahaupt_desy · 1 Rookie · 3 Posts · April 11th, 2025 12:08
Dear Chris,
thanks for your reply! The firmware is already up to date.
Scheduling the patrol reads is not really an option, as these nodes are in active use around the clock. Since a patrol read also takes some time to finish (a day or so in the best case) and is automatically scheduled by the controller firmware once a week, scheduling does not really help in our case.
I really do not think it is normal that an enterprise-grade 12-disk RAID array falls back to the read performance level of a single (older) SD card once patrol reads are running ...
Cheers,
Andreas
Kadrel777 · 1 Rookie · 1 Message · June 18th, 2025 21:21
YES! Very similar behavior here (although with quite a few hardware differences from your setup). We have had a Dell Support ticket open and have been pulling our hair out over this one since ~February.
Dell PowerEdge R7625, Windows Server 2022 Datacenter (on-premises app + VDI host)
4x SAS 24 Gbps SSDs in RAID 6
Acceptably performant normally. Every Wednesday afternoon the patrol read hits and BAM - users scream.
We have a virtually identical R7625 (the ONLY differences being lower-tier SSDs and less RAM) running 4x SATA 6 Gbps SSDs in RAID 6 with NO PROBLEMS. We actually moved workloads to that one because they were running better there while we try to track down this issue. Dell Support has not been terribly useful.
DELL-Young E · Moderator · 5.1K Posts · June 19th, 2025 02:48
Hello ahaupt_desy, have you tried setting the patrol read priority below 10%? The default is 30%.
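For reference, a sketch of the commands (untested, and the exact keyword varies between CLI generations, so please verify with the built-in help):
perccli2 /c0 show patrolread          # check the current mode and rate first
perccli2 /c0 set patrolread rate=10   # assumed perccli2 spelling; older perccli used 'set prrate=10'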
Respectfully,
ahaupt_desy · 1 Rookie · 3 Posts · June 19th, 2025 11:48
@DELL-Young E
Yes, I already tried decreasing the priority, even down to 1%. Data read rates increase very slightly then, but are still unacceptable.
Meanwhile, I have found another affected customer (who runs mass storage for academia, just like us) suffering badly from this issue. It is clearly a problem with this particular RAID controller.
I still wonder why Dell's enterprise support was not able to correlate our support cases for weeks! It was me who finally provided them with the other customer's ticket ID ...
DELL-Young E · Moderator · 5.1K Posts · June 20th, 2025 04:44
Hello, glad you have an official ticket with Dell tech support. Keep working with them. Another option I may mention is to turn automatic patrol read off and run it manually whenever you can afford the performance degradation.
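A sketch of that approach, assuming perccli2 follows the storcli-style patrolread options (verify with the CLI's built-in help):
# switch patrol read from the firmware's automatic weekly schedule to manual-only
perccli2 /c0 set patrolread=on mode=manual
# (or disable it entirely with: perccli2 /c0 set patrolread=off)
# then trigger it yourself from cron in a quiet window, e.g. via /etc/cron.d/patrolread:
# 0 3 * * 6 root /usr/bin/perccli2 /c0 start patrolread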
Respectfully,
calestyo · 1 Rookie · 1 Message · June 28th, 2025 03:41
Having exactly the same problem: 10 PowerEdge R760xd2 servers with H755 firmware 52.26.0-5179... as soon as they go into patrol read, the performance becomes a bad joke (~10-60 MB/s).
We already had a similar issue a while ago, when the controller did a completely crazy rebuild (a broken HDD was removed, and it *then* started a rebuild *before* the replacement HDD was even added back... and it ran rebuilds concurrently *per* configured array, which is IMO completely nuts).
That was even discussed with support back then, but I guess they mostly ignored it, even though it is a fundamental and severe issue in the controller.