
Unsolved


1 Rookie

 • 

14 Posts

1083

November 24th, 2020 12:00

Avamar Gen4 Stripe suspended

Hi,

We have an Avamar storage node (Gen4 on Dell R510). We recently found suspended stripes on one of the nodes, and that node has also turned red in the GUI. Health checks report the node as good and show no disk errors. Any idea how we can fix the stripe error shown below? hfscheck has completed multiple times with no issues reported. Unfortunately, we do not have Avamar support.

-------------

All reported states=(ONLINE), runlevels=(fullaccess), modes=(mhpu+0hpu+0hpu)
System-Status: ok
Access-Status: full
4863 stripes SUSPENDED

-----------------

 

Note: all of the above happened a few hours after we got the alerts below. Investigating this specific disk further, everything shows as good and healthy; the vdisk and pdisk are both fine. We also checked physically and there are no amber lights.

Nov 21 01:18:23 xxxxx Server Administrator: Storage Service EventID: 2405 Command timeout on physical disk: Physical Disk 0:0:7 Controller 0, Connector 0

Nov 21 01:18:33 xxxx Server Administrator: Storage Service EventID: 2095 Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 Sense qualifier: 0: Physical Disk 0:0:7 Controller 0, Connector 0

2 Intern

 • 

2K Posts

November 24th, 2020 13:00

The Avamar server software runs periodic read performance tests (called "perfbeat") on its data partitions. The software will suspend a partition if the disk read performance for that partition falls below an acceptable threshold and remains that way for a certain number of read tests in a row. Suspended partitions are moved out of the suspended state automatically if the read performance recovers.
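The suspension rule described above can be illustrated with a small toy model. This is only a sketch of a "consecutive slow reads" rule, not Avamar's actual implementation; the class name `PartitionMonitor`, the threshold `min_read_mb_s`, and the limit `max_slow_tests` are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PartitionMonitor:
    """Toy model of a consecutive-failure suspension rule (not Avamar code)."""

    min_read_mb_s: float = 20.0  # hypothetical acceptable read rate
    max_slow_tests: int = 3      # hypothetical number of slow tests in a row
    slow_streak: int = 0
    suspended: bool = False

    def record(self, read_mb_s: float) -> bool:
        """Feed one read-test result and return the resulting suspended state."""
        if read_mb_s >= self.min_read_mb_s:
            # A passing test clears the streak and resumes the partition.
            self.slow_streak = 0
            self.suspended = False
        else:
            self.slow_streak += 1
            if self.slow_streak >= self.max_slow_tests:
                self.suspended = True
        return self.suspended


monitor = PartitionMonitor()
# Made-up read rates: one good test, four slow tests, then a recovery.
history = [monitor.record(rate) for rate in (55.0, 12.0, 9.0, 8.0, 7.5, 60.0)]
print(history)
```

In this model a partition that stays suspended is one whose read tests keep coming back slow, which is why a lingering suspension points at the disk rather than at the check itself.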

For Gen4 and earlier hardware, partition suspension was highly correlated with disk failure. Hardware diagnostics and service lights can only confirm that there is an issue; they cannot rule it out. It is very likely that disk 0:0:7 is failing.

There were some architectural changes in Gen4S and Gen4T that make perfbeat a less reliable indicator of failure, so partition suspension is normally set to a higher threshold or disabled entirely on these platforms.

1 Rookie

 • 

14 Posts

November 24th, 2020 14:00

Thanks @ionthegeek 

If a stripe is suspended, does the node also go red in the GUI? That is the case for us. We have done the physical check, and the pdisk output shows predictive failure as "No" for this disk. Everything looks clean. Is there a way to force the perfbeat check on a specific stripe?

 

2 Intern

 • 

2K Posts

November 25th, 2020 06:00

If there are stripes suspended, the affected node will be marked red in the GUI, yes.

Diagnostics cannot predict or detect every kind of failure. This disk is almost certainly failing and should be replaced.

The perfbeat check runs every five minutes and if it passes, the partition will be removed from the suspended state immediately. The only reason the partition would still be suspended is if it still has poor read performance.

November 25th, 2020 23:00

I am working on the same issue as @iamtheroot. As explained, no disk failure is reported; the pdisk and vdisk are all healthy. Is there a way to manually fail the suspect disk in this RAID-1 mirror? Will running "avmaint perf reset" mark the stripes as healthy again?

2 Intern

 • 

2K Posts

November 26th, 2020 08:00

The errors shown in the original post mean that the controller tried to send a command to the disk, the command timed out, then the drive was forcibly restarted (power on reset). This is often an early indicator of drive failure. Disk 0:0:7 should be replaced.
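As a rough illustration of how those two log lines can be read, here is a minimal Python sketch that pulls out the event ID and disk, then decodes the SCSI sense triple. It assumes the log format is exactly as quoted in the original post and that the sense values are printed in hexadecimal; the helper name `summarize` and the one-entry lookup table are made up for this example. Sense key 6 is UNIT ATTENTION, and additional sense code 29h with qualifier 00h is the standard "power on, reset, or bus device reset occurred" code.

```python
import re

# The two alert lines quoted in the original post.
LOG_LINES = [
    "Nov 21 01:18:23 xxxxx Server Administrator: Storage Service EventID: 2405 "
    "Command timeout on physical disk: Physical Disk 0:0:7 Controller 0, Connector 0",
    "Nov 21 01:18:33 xxxx Server Administrator: Storage Service EventID: 2095 "
    "Unexpected sense. SCSI sense data: Sense key: 6 Sense code: 29 "
    "Sense qualifier: 0: Physical Disk 0:0:7 Controller 0, Connector 0",
]

# Assumption: every relevant line contains "EventID: <n> <text>: Physical Disk <c:e:s>".
EVENT_RE = re.compile(r"EventID: (\d+) (.*?): Physical Disk (\d+:\d+:\d+)")
# Assumption: sense values are printed in hexadecimal.
SENSE_RE = re.compile(
    r"Sense key: ([0-9A-Fa-f]+) Sense code: ([0-9A-Fa-f]+) "
    r"Sense qualifier: ([0-9A-Fa-f]+)"
)

# Deliberately tiny lookup table: (sense key, ASC, ASCQ) -> meaning.
SENSE_MEANINGS = {
    (0x6, 0x29, 0x00): "UNIT ATTENTION: power on, reset, or bus device reset occurred",
}


def summarize(lines):
    """Return one dict per recognized storage event (hypothetical helper)."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue
        event_id, text, disk = match.groups()
        meaning = None
        sense = SENSE_RE.search(text)
        if sense:
            triple = tuple(int(value, 16) for value in sense.groups())
            meaning = SENSE_MEANINGS.get(triple, "not in this small table")
        events.append({"disk": disk, "event_id": int(event_id), "meaning": meaning})
    return events


events = summarize(LOG_LINES)
for event in events:
    print(event)
```

Both events resolve to the same disk, 0:0:7, with the timeout logged ten seconds before the reset.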

The omconfig command can be used to blink the LED on a physical disk to identify it, e.g.:

omconfig storage pdisk action=blink controller=0 pdisk=0:0:7

I strongly recommend against attempting to force the partition to resume. The underlying issue needs to be resolved. Trying to force things may lead to serious problems including data loss.

1 Rookie

 • 

14 Posts

November 26th, 2020 13:00

Thanks @ionthegeek. Does the firmware version matter if we replace the disk, or does it just need to be the same model? Also, do we need to follow a procedure to replace the disk, or can we just blink the disk, pull it out, and put in the replacement? Will the RAID automatically detect the new disk and rebuild?

 

 

2 Intern

 • 

2K Posts

November 26th, 2020 14:00

Avamar disks have customized firmware. If possible, the disk should be replaced with another Avamar Gen4 disk. I don't know how much of an impact using a disk with different firmware would have.

Gen4 disks were customer replaceable units, so there should be a replacement procedure in the service manual on the support site. I believe you can just pull out the old disk and put the new one in but I recommend reviewing the procedure to confirm.
