Unsolved
JasonCheung
3 Posts
0
April 15th, 2021 16:00
SAN Array member stuck in restart 0%
Hi,
I'm new to SAN. Yesterday I restarted a member SAN and it has been stuck at restart (stays at 0%) for more than a day. The member is configured as RAID 5 and has 3 failed drives, and it stays offline, which means all my volumes are offline as well. What can I do to bring this member back online?
Is replacing the failed member the only way? Can I wipe the member and reconstruct it with less storage space? I don't want to affect the other members.
Thanks,
Jason



DELL-Sam L
Moderator
•
7.7K Posts
0
April 16th, 2021 10:00
Hello Jason,
Which EqualLogic systems do you have? What firmware versions are currently on your systems? How many members are in the same group?
dwilliam62
4 Operator
•
1.5K Posts
0
April 17th, 2021 13:00
Hello,
If the volumes are offline then you must have multiple members in the same pool. If you reset that member array the data from that member will be lost and those volumes will never recover. So you would loose all the data they have in common. When you have multiple PS arrays in the same pool, the data is STRIPED, RAID0 across them. There is no redundancy. Unless you have a backup of all your data, you will need to get that member back online.
Because RAID 5 is more vulnerable to double faults, it is no longer recommended for production storage; it was also removed as an option from the GUI.
At the CLI of the member that has the problem, please run:
GrpName>support exec "raidtool"
GrpName>support exec "uname -a"
GrpName>show members
If the RAID set is faulted beyond recovery, then you will likely have to send the drives out to a third-party data-recovery company. They will try to clone them and hopefully allow a rebuild to finish.
Re: smaller capacity. The only supported configurations for EQL arrays are half-populated and fully populated drive configurations.
Note: Depending on the array model, you will likely have to purchase EQL-qualified drives. They are made differently from OEM or other Dell drives. If you don't, they will not be recognized by the array and will not be usable.
Regards,
Don
JasonCheung
3 Posts
0
May 6th, 2021 11:00
Hi,
I'm using one each of a PS5000E, PS5000X, and PS6000. The firmware is v7.1.8 (R414953). Each of these is a member of the group.
Thanks,
Jason
dwilliam62
4 Operator
•
1.5K Posts
0
May 6th, 2021 12:00
Hello Jason,
There's a lot to unpack here.
They will stay degraded for as long as all the remaining drives stay healthy. If either the RAID 6 or RAID 10 member with failed drives goes down, you will lose access to all volumes, as they are striped (RAID 0) across the members. In the case of the RAID 6 member you are out of parity, so the next drive that fails will take that member down. On the RAID 10 member you have two drives without a mirror pair, so if either of those two fails, that member goes offline. To put it bluntly, you are on borrowed time. Replacing those failed drives is a priority. Rebuilds are disk intensive, so you might have another drive fail during the process. Backing up your data now is priority one.
Rebuild time depends on the size and speed of the drives and on the I/O load; the lower the I/O load, the better. RAID 6, being dual parity, will take longer. RAID 10 is pretty quick, especially with 10K or 15K RPM drives.
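As a very rough back-of-envelope (the drive sizes and rebuild rates below are assumed examples for illustration, not your actual hardware), best-case rebuild time is roughly the drive capacity divided by the rebuild rate:

# Rough estimate only -- real rebuild times vary widely with controller load,
# drive health, and host I/O. The sizes and rates are assumed examples.
def rebuild_hours(drive_tb, rebuild_mb_per_s):
    """Naive best-case time to re-write one replacement drive."""
    return (drive_tb * 1_000_000) / rebuild_mb_per_s / 3600

print(f"1 TB SATA  @ ~60 MB/s : ~{rebuild_hours(1.0, 60):.1f} h")
print(f"450 GB 15K @ ~120 MB/s: ~{rebuild_hours(0.45, 120):.1f} h")

Under real host I/O, and with dual-parity computation on a RAID 6 set, expect it to take several times longer than that best case.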
Re: shrinking. That is not possible. You can't reduce the size or select the number of spares; spares are set by the RAID policy. RAID 6 has one spare; RAID 50 and RAID 10 have two.
Re: drives. They have to be EQL-specific drives. A standard OEM drive or a Dell non-EQL drive will NOT work in a PS Series array; the drives are built specifically for it, and non-EQL drives will be rejected by the firmware.
Regards,
Don
JasonCheung
3 Posts
0
May 6th, 2021 12:00
Hi,
Thanks Dell-Sam L and dwilliam62 for the help. Reading my question again now, it doesn't really make sense; I guess I was rushing to finish things off before my vacation.
I turned the arrays off during my vacation. Now (2 weeks later), after powering up, all 3 members are miraculously online and all my volumes are up and accessible. Now I have to deal with the degraded RAID issue caused by the failed HDDs.
Yes, there are 3 members and 5 volumes, and all volumes are spread over all 3 members. It was sold as-is by my company instead of going to recycling. They don't provide after-sales support.
I actually got it wrong: it's a mix of RAID 6 (2x) and RAID 10 (1x) setups, and there are 3 failed HDDs in one member (RAID 6) and 2 in another (RAID 10).
I have quite a lot of data on the HDDs, mostly deployment images and virtual machines that I play with personally, and I would rather keep it than have to transfer or lose it.
So my questions regarding the failed HDDs are:
I know some of my thoughts may be silly or may not exist in the SAN world, but I appreciate any comments / suggestions.
Thanks,
Jason