Disks failing - VG2 Failed dataloss

Question

Good afternoon,

We have a serious issue with an MD3260 SAS enclosure. 4 TB disks fail at random, and we are trying to make sense of the messages in the MEL (Major Event Log).

We see the following event:

Date/Time: 8/27/21 2:00:56 AM
Sequence number: 36001
Event type: 2251
Event category: State Change
Priority: Critical
Event needs attention: true
Event send alert: true
Event visibility: true
Description: Physical disk failed - configuration read failed at start-of-day
Event specific codes: 0/0/0
Component type: Physical Disk
Component location: Enclosure 0, Drawer 2, Slot 0
Logged by: RAID Controller Module in slot 0
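
For anyone sifting through a saved MEL export, here is a minimal Python sketch that turns one event in the "Key: value" layout shown above into a dictionary, so that critical events can be filtered by location. The function name `parse_mel_event` is hypothetical, and the layout is assumed from this excerpt only; this is not a Dell tool or API.

```python
def parse_mel_event(text):
    """Parse the 'Key: value' lines of a single MEL event into a dict."""
    event = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so timestamps keep their colons.
        key, sep, value = line.partition(": ")
        if sep:
            event[key.strip()] = value.strip()
    return event


# Sample copied from the event quoted above.
sample = """Date/Time: 8/27/21 2:00:56 AM
Sequence number: 36001
Event type: 2251
Event category: State Change
Priority: Critical
Event needs attention: true
Event send alert: true
Event visibility: true
Description: Physical disk failed - configuration read failed at start-of-day
Event specific codes: 0/0/0
Component type: Physical Disk
Component location: Enclosure 0, Drawer 2, Slot 0
Logged by: RAID Controller Module in slot 0
"""

event = parse_mel_event(sample)
if event.get("Priority") == "Critical":
    print(event["Component location"], "-", event["Description"])
```

Running this over every event in the log and grouping by "Component location" would show whether the failures cluster on particular drawers or slots.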

Followed by:

Description: Piece failed
Event specific codes: 0/0/0
Component type: Physical Disk
Component location: Enclosure 0, Drawer 2, Slot 0


When we look at the disk at 0,2,0 in the storage array profile, it shows no problem:

Enclosure 0, Drawer 2, Slot 0 WD WD4001FYYG Serial Attached SCSI (SAS) 3,725.523 GB

0, 2, 0 Optimal 3,726.023 GB Physical Disk SAS 6 Gbps WD4001FYYG

Physical Disk at Enclosure 0, Drawer 2, Slot 0

Status: Optimal

Mode: Assigned
Raw capacity: 3,726.023 GB
Usable capacity: 3,725.523 GB
World-wide identifier: 50:00:0c:0f:01:37:2c:00:00:00:00:00:00:00:00:00
Associated disk group: 2

Port Channel
0 1
1 2

Media type: Physical Disk
Interface type: Serial Attached SCSI (SAS)
Physical Disk path consistency: OK

Security capable: No
Secure: No
Read/write accessible: Yes
Physical Disk security key identifier: Not Applicable

Speed: 7,200 RPM
Current data rate: 6 Gbps
Logical sector size: 512 bytes
Physical sector size: 512 bytes
Product ID: WD4001FYYG
Physical Disk firmware version: D1R7
Serial number: WMC1F0D3YNFH
Manufacturer: WD
Date of manufacture: Not Available

Now to the state capture dump.

This one tells us we have 2 failed pieces:

Piece Devnum Address Tray/Slot State
0 0x00010030 0x05ea84d8 00,49 PieceOptimalState
1 0x0001002b 0x05ea83c8 00,44 PieceOptimalState
2 0x00010028 0x05ea82b8 00,41 PieceOptimalState
3 0x00010009 0x05ea81a8 00,10 PieceOptimalState
4 0x00010003 0x05ea8098 00,04 FAILED
5 0x00010006 0x05ea7f88 00,07 PieceOptimalState
6 0x00010010 0x05ea7e78 00,17 PieceOptimalState
7 0x00010007 0x05ea7d68 00,08 PieceOptimalState
8 0x00010018 0x05ea7c58 00,25 FAILED
9 0x0001002d 0x05ea7b48 00,46 PieceOptimalState
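
To pull the failed pieces out of a longer dump without reading it by eye, a small Python sketch like the one below could be used. It assumes the column layout shown in the table above (piece, devnum, address, tray,slot, state); the names `PIECE_ROW` and `parse_pieces` are hypothetical and not part of any Dell tooling.

```python
import re

# One table row: piece index, devnum (hex), address (hex), tray,slot, state.
PIECE_ROW = re.compile(
    r"^(\d+)\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(\d+),(\d+)\s+(\S+)$"
)


def parse_pieces(dump):
    """Return one dict per piece row; header and other lines are skipped."""
    pieces = []
    for line in dump.splitlines():
        match = PIECE_ROW.match(line.strip())
        if match:
            pieces.append({
                "piece": int(match.group(1)),
                "devnum": int(match.group(2), 16),
                "tray": int(match.group(4)),
                "slot": int(match.group(5)),
                "state": match.group(6),
            })
    return pieces


# Table copied from the state capture above.
table = """Piece Devnum Address Tray/Slot State
0 0x00010030 0x05ea84d8 00,49 PieceOptimalState
1 0x0001002b 0x05ea83c8 00,44 PieceOptimalState
2 0x00010028 0x05ea82b8 00,41 PieceOptimalState
3 0x00010009 0x05ea81a8 00,10 PieceOptimalState
4 0x00010003 0x05ea8098 00,04 FAILED
5 0x00010006 0x05ea7f88 00,07 PieceOptimalState
6 0x00010010 0x05ea7e78 00,17 PieceOptimalState
7 0x00010007 0x05ea7d68 00,08 PieceOptimalState
8 0x00010018 0x05ea7c58 00,25 FAILED
9 0x0001002d 0x05ea7b48 00,46 PieceOptimalState
"""

failed = [p for p in parse_pieces(table) if p["state"] != "PieceOptimalState"]
for p in failed:
    print(f"piece {p['piece']}: devnum {p['devnum']:#010x}, "
          f"tray {p['tray']}, slot {p['slot']}, state {p['state']}")
```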

0x00010003 0x05ea8098 00,04 FAILED is in actual fact the disk at 0,1,4. This disk is healthy in the profile.

and

0x00010018 0x05ea7c58 00,25 FAILED is in actual fact the disk at 0,3,1. This disk is also healthy in the profile.
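
The conversion from the flat slot number in the dump to enclosure, drawer, slot can be written down explicitly. The Python sketch below first reproduces the mapping used above (00,04 to 0,1,4 and 00,25 to 0,3,1), assuming 12 slots per drawer as on the 60-disk MD3260. The function names and the `drawer_base` parameter are hypothetical, and the numbering convention itself is an assumption that is not confirmed by Dell documentation. As a cross-check, the second function shows an alternative reading in which the low 16 bits of Devnum are a zero-based disk index and drawers are numbered from 0, as they appear in the MEL and the profile.

```python
SLOTS_PER_DRAWER = 12  # assumption: 5 drawers of 12 slots each


def flat_to_location(tray, flat_slot, drawer_base=1):
    """Mapping as used in the post: drawer counted from drawer_base."""
    drawer, slot = divmod(flat_slot, SLOTS_PER_DRAWER)
    return (tray, drawer + drawer_base, slot)


def devnum_to_location(devnum, tray=0):
    """Alternative reading (assumption): low 16 bits are a 0-based index."""
    index = devnum & 0xFFFF
    drawer, slot = divmod(index, SLOTS_PER_DRAWER)
    return (tray, drawer, slot)


# Mapping as stated in the post.
print(flat_to_location(0, 4))    # (0, 1, 4)
print(flat_to_location(0, 25))   # (0, 3, 1)

# Alternative zero-based reading of the Devnum column.
print(devnum_to_location(0x00010003))  # (0, 0, 3)
print(devnum_to_location(0x00010018))  # (0, 2, 0)
```

Under the alternative reading, 0x00010018 lands on 0,2,0, which is the location named in the MEL event, so it may be worth confirming which convention the controller uses before pulling any further disks.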

No hot spares are in use, but the Recovery Guru does sense that something is wrong. It does not specify which disk has failed. So we are in a position where we have already swapped some disks that did fail, but we don't know how to move forward from here.

Log files etc. are available!

Many thanks

Remco

BettaFish · Answer

Hey there! You can try to rebuild your RAID array, but there are a few dos and don'ts to keep in mind so you don't make matters worse. I found this helpful guide on how to rebuild a failed RAID that will at least get you started. Cheers!

sculljam_ · Answer

Did the failed disk(s) ever get fixed in November 2021? If your MD has firmware newer than 8.20.10.x, I would add that logging in to the controllers over the serial connection and issuing this command may help:
vdmRecoverAllRAIDVols 1,1
This command comes with a health warning and no guarantee of fixing the problem.
