August 27th, 2021 07:00
Disks failing - VG2 failed, data loss
Good afternoon,
We have a serious issue with an MD3260 SAS enclosure. Its 4 TB disks fail at random, and we are trying to understand the messages in the MEL (Major Event Log).
We see the following:
Date/Time: 8/27/21 2:00:56 AM
Sequence number: 36001
Event type: 2251
Event category: State Change
Priority: Critical
Event needs attention: true
Event send alert: true
Event visibility: true
Description: Physical disk failed - configuration read failed at start-of-day
Event specific codes: 0/0/0
Component type: Physical Disk
Component location: Enclosure 0, Drawer 2, Slot 0
Logged by: RAID Controller Module in slot 0
This is followed by:
Description: Piece failed
Event specific codes: 0/0/0
Component type: Physical Disk
Component location: Enclosure 0, Drawer 2, Slot 0
When we look at the disk at 0,2,0 in the storage array profile, there is no problem:
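For anyone sifting through a long MEL export, a small script can pull out the critical physical-disk events. This is a minimal sketch, not Dell tooling: `parse_mel` and `critical_disk_locations` are hypothetical helper names, and the code assumes a plain-text export in which each event starts with a `Date/Time:` line and every field is a `Key: value` line, as in the excerpt above.

```python
from typing import Dict, List

# Sample copied from the first MEL event quoted in this post.
SAMPLE_MEL = """\
Date/Time: 8/27/21 2:00:56 AM
Sequence number: 36001
Event type: 2251
Event category: State Change
Priority: Critical
Event needs attention: true
Event send alert: true
Event visibility: true
Description: Physical disk failed - configuration read failed at start-of-day
Event specific codes: 0/0/0
Component type: Physical Disk
Component location: Enclosure 0, Drawer 2, Slot 0
Logged by: RAID Controller Module in slot 0
"""


def parse_mel(text: str) -> List[Dict[str, str]]:
    """Split a plain-text MEL export into one dict per event.

    Assumption: every event starts with a 'Date/Time:' line and each field
    is written as 'Key: value' on its own line.
    """
    events: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so times like 2:00:56 stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that are not 'Key: value' pairs
        key, value = key.strip(), value.strip()
        if key == "Date/Time" and current:
            events.append(current)  # a new event begins; keep the previous one
            current = {}
        current[key] = value
    if current:
        events.append(current)
    return events


def critical_disk_locations(events: List[Dict[str, str]]) -> List[str]:
    """Return the component location of every critical physical-disk event."""
    return [
        e.get("Component location", "unknown")
        for e in events
        if e.get("Priority") == "Critical" and e.get("Component type") == "Physical Disk"
    ]


if __name__ == "__main__":
    for location in critical_disk_locations(parse_mel(SAMPLE_MEL)):
        print(location)
```

Pointed at a full export instead of the sample, the same loop would list every disk location the controllers flagged as critical, which can then be compared against the profile.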
Enclosure 0, Drawer 2, Slot 0 WD WD4001FYYG Serial Attached SCSI (SAS) 3,725.523 GB
0, 2, 0 Optimal 3,726.023 GB Physical Disk SAS 6 Gbps WD4001FYYG
Physical Disk at Enclosure 0, Drawer 2, Slot 0
Status: Optimal
Mode: Assigned
Raw capacity: 3,726.023 GB
Usable capacity: 3,725.523 GB
World-wide identifier: 50:00:0c:0f:01:37:2c:00:00:00:00:00:00:00:00:00
Associated disk group: 2
Port Channel
0 1
1 2
Media type: Physical Disk
Interface type: Serial Attached SCSI (SAS)
Physical Disk path consistency: OK
Security capable: No
Secure: No
Read/write accessible: Yes
Physical Disk security key identifier: Not Applicable
Speed: 7,200 RPM
Current data rate: 6 Gbps
Logical sector size: 512 bytes
Physical sector size: 512 bytes
Product ID: WD4001FYYG
Physical Disk firmware version: D1R7
Serial number: WMC1F0D3YNFH
Manufacturer: WD
Date of manufacture: Not Available
Now to the state capture dump.
It tells us we have two failed pieces:
Piece Devnum Address Tray/Slot State
0 0x00010030 0x05ea84d8 00,49 PieceOptimalState
1 0x0001002b 0x05ea83c8 00,44 PieceOptimalState
2 0x00010028 0x05ea82b8 00,41 PieceOptimalState
3 0x00010009 0x05ea81a8 00,10 PieceOptimalState
4 0x00010003 0x05ea8098 00,04 FAILED
5 0x00010006 0x05ea7f88 00,07 PieceOptimalState
6 0x00010010 0x05ea7e78 00,17 PieceOptimalState
7 0x00010007 0x05ea7d68 00,08 PieceOptimalState
8 0x00010018 0x05ea7c58 00,25 FAILED
9 0x0001002d 0x05ea7b48 00,46 PieceOptimalState
0x00010003 0x05ea8098 00,04 FAILED is in fact disk 0,1,4. This disk is healthy in the profile.
Likewise,
0x00010018 0x05ea7c58 00,25 FAILED is in fact disk 0,3,1. This disk is also healthy in the profile.
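To pull the FAILED rows out of a state capture automatically, a short script along these lines could help. It is a sketch under two assumptions: the piece table is laid out as in the excerpt above, and `to_drawer_slot` (12 disks per drawer) is a hypothetical rule that merely reproduces the two translations given here (00,04 as 0,1,4 and 00,25 as 0,3,1). The rule is not taken from Dell documentation, so confirm any location against the enclosure before pulling a disk.

```python
import re
from typing import List, Tuple

# Rows copied from the state capture dump quoted in this post.
SAMPLE_DUMP = """\
Piece Devnum Address Tray/Slot State
0 0x00010030 0x05ea84d8 00,49 PieceOptimalState
1 0x0001002b 0x05ea83c8 00,44 PieceOptimalState
2 0x00010028 0x05ea82b8 00,41 PieceOptimalState
3 0x00010009 0x05ea81a8 00,10 PieceOptimalState
4 0x00010003 0x05ea8098 00,04 FAILED
5 0x00010006 0x05ea7f88 00,07 PieceOptimalState
6 0x00010010 0x05ea7e78 00,17 PieceOptimalState
7 0x00010007 0x05ea7d68 00,08 PieceOptimalState
8 0x00010018 0x05ea7c58 00,25 FAILED
9 0x0001002d 0x05ea7b48 00,46 PieceOptimalState
"""

# One table row: piece, devnum, address, tray,slot, state.
# The header line does not match because it does not start with a digit.
ROW = re.compile(r"^\s*(\d+)\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(\d+),(\d+)\s+(\S+)\s*$")

# Assumption: 12 disks per drawer, fitted to the two translations in this post.
SLOTS_PER_DRAWER = 12


def failed_pieces(dump: str) -> List[Tuple[int, int, int]]:
    """Return (piece, tray, linear_slot) for every row whose state is FAILED."""
    failed = []
    for line in dump.splitlines():
        match = ROW.match(line)
        if match and match.group(6) == "FAILED":
            failed.append((int(match.group(1)), int(match.group(4)), int(match.group(5))))
    return failed


def to_drawer_slot(linear_slot: int) -> Tuple[int, int]:
    """Hypothetical mapping: reproduces 00,04 -> drawer 1, slot 4 and
    00,25 -> drawer 3, slot 1, but is not verified against Dell documentation."""
    return linear_slot // SLOTS_PER_DRAWER + 1, linear_slot % SLOTS_PER_DRAWER


if __name__ == "__main__":
    for piece, tray, slot in failed_pieces(SAMPLE_DUMP):
        drawer, drawer_slot = to_drawer_slot(slot)
        print(f"piece {piece}: tray/slot {tray:02d},{slot:02d} -> "
              f"{tray},{drawer},{drawer_slot} (unverified mapping)")
```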
No hot spares are in use, but the Recovery Guru does detect that something is wrong. However, it does not specify which disk has failed. So we're in a position where we have already swapped some disks that did fail, but we don't know how to move forward from here.
Log files etc. are available!
Many thanks
Remco


BettaFish
November 28th, 2021 16:00
Hey there! You can try to rebuild your RAID array, but there are a few dos and don'ts so you don't make matters worse. I found this helpful guide on how to rebuild a failed RAID that will at least get you started. Cheers!
sculljam_
January 21st, 2022 03:00
Did the failed disk(s) ever get fixed in November 2021? If your MD has firmware later than 8.20.10.x, then
I would add that logging into the controllers serially and issuing this command may help:
vdmRecoverAllRAIDVols 1,1
This command comes with a health warning and no guarantee of fixing the problem.