Unsolved
1 Rookie
5 Posts
August 25th, 2020 08:00
RAID group in DEG state, failover to hot spare not possible *and* all LUNs in faulted state while data is still accessible
Good afternoon,
I have 2 issues with a VNX5600. For some reason all the LUNs in pool 0 have the status faulted; however (luckily :-)), all my data is still accessible. I believe this to be a false state, but I do not have a clear way to remedy the situation. I am thinking about rebooting the SPs one by one to see if that clears the state.
My second issue is more severe. One of my RAID groups is in DEG state. We proactively replaced 2 disks in this RG, and on one of them the rebuild seems to have gotten stuck. We replaced this disk again and all disks seem healthy, but the RG still remains in DEG state. Triage shows that one disk has 1 soft error, but when we select it and choose "copy to hotspare" it replies "cannot copy to hotspare as the RG is in degraded mode". This feels like a chicken-and-egg situation, and I do not have a good plan for how to remedy it.
Looking forward to your insights!
Regards
Remco
DELL-Sam L
Moderator
7.6K Posts
August 25th, 2020 16:00
Hello remcoetten,
Are you seeing the fault on both SP management ports or just on a single SP management port? What is the RAID level of your disk group, and how many drives are in it? When you replaced the drives the first time, did you replace both at the same time, or did you replace the first drive and wait for the rebuild to complete? What is the current OE on your VNX5600?
remcoetten
August 25th, 2020 23:00
Hi Sam,
My end customer replaced the disks, I believe, one by one. Both SPs are reporting the fault. See the output from Triage:
As for the RG, please see below:
I have Triage/SP collects available if need be.
Regards
Remco
DELL-Sam L
August 26th, 2020 13:00
Hello remcoetten,
You can try rebooting the SPs one at a time to see if that clears the fault. Because the DG is RAID 5, if both drives that failed are part of that DG, you will need to restore from backup. Check the SP collects to see which drives failed and which DG each of them belongs to. I suspect they are part of the same DG and that this is why the rebuild stopped.
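To illustrate the reasoning about RAID 5: a RAID 5 group survives one failed drive, so what matters is how many failures land in the same group at the same time. The following is only a minimal Python sketch of that check. The function name, the disk IDs, and the group numbers are hypothetical and are not taken from the SP collects or from any VNX tool.

```python
from collections import Counter

# Number of simultaneous drive failures each RAID level can survive.
FAULT_TOLERANCE = {"RAID5": 1, "RAID6": 2}


def groups_beyond_tolerance(failed_disks, raid_levels):
    """Return the RAID groups with more failed drives than their level tolerates.

    failed_disks: dict mapping a disk ID to the RAID group it belongs to.
    raid_levels:  dict mapping a RAID group ID to its RAID level string.
    """
    failures_per_group = Counter(failed_disks.values())
    return sorted(
        group
        for group, count in failures_per_group.items()
        if count > FAULT_TOLERANCE[raid_levels[group]]
    )


# Hypothetical inventory: these disk IDs and group numbers are made up.
failed = {"0_0_5": 2, "0_0_7": 2, "1_0_3": 4}
levels = {2: "RAID5", 4: "RAID5"}
print(groups_beyond_tolerance(failed, levels))
```

In this made-up inventory, group 2 has two failed drives and is past what RAID 5 can rebuild from, while group 4 has a single failure and can still rebuild.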
remcoetten
August 26th, 2020 23:00
Hi Sam,
Thanks for confirming my idea about rebooting the SPs one at a time. I will schedule this.
With regard to the degraded RG, well, that is just it: there are no failed drives, and all drives are optimal. So I have no clue how to remedy this situation.
Regards, Remco
DELL-Sam L
August 27th, 2020 08:00
Hello remcoetten,
I would reboot the SPs first and see if the faults are still present. If they are, I will need a fresh set of SP collects so that I can see what is going on.
remcoetten
August 27th, 2020 23:00
Hi Sam,
Thanks for helping; however, I'm a bit afraid of what could happen if I reboot the SPs while an RG is in a degraded state. Can I send you a copy of the SP collects first so you can have a look? This is a university, and I need to be as sure as possible that they do not sustain downtime.
Thanks!
Remco
DELL-Sam L
August 28th, 2020 16:00
Hello remcoetten,
I will send you a private message so that you can send me the logs.