October 2nd, 2013 02:00
CX700 rejecting (proven good) disks
Hi there,
we have an old CX700 running FLARE 26, and I have two faulted disks whose replacements the system consistently rejects.
Candidate #1 is a 320GB ATA drive in a DAE2-ATA (EMC p/n 005047825)
We have about 10 spares of these disks (Model 5A30J0 CLAR320, TLA p/n 005048012) which work perfectly in another DAE2-ATA. I replace the disk and it starts to come up with the green and amber LEDs at first, then after a few seconds only amber remains.
Candidate #2 is in Enclosure 0_0, disk 7, a 73GB 15k FC disk (Model ST373454 CLAR72, TLA p/n 005048600) in a standard DAE (EMC p/n 100-560-130). I tried about 20 disks which work perfectly in other DAEs - but when inserting the fresh disk, it comes up green and amber, amber goes off, green is flashing, amber comes up and green goes off again.
I rebooted the SPs but nothing seems to cure this.
Candidate #1 is bound to a normal user LUN in a RAID 5 RAID group.
Candidate #2 is bound to a private RAID 5 LUN (Write Intent Log).
Is there anything I missed or can try?
Thanks!
-Chris
kelleg
October 8th, 2013 14:00
Every once in a while we run across a slot that goes bad. This might be your problem. This would require a replacement of the DAE to resolve.
glen
kelleg
October 3rd, 2013 10:00
What is the part number of the failed disks? Do you have any open slots in the DAE where you could put the replacement disks? If so, do they spin up correctly? If they do, could you try making them a Hot Spare to see if they will bind correctly?
For each failed disk, can you check in Navisphere whether a hot spare has engaged? Is only one disk failed in each RAID group - no double faults?
It's possible that the slot is disabled by software. In the past we've fixed this by pulling the failed disk, powering off that DAE, then powering the DAE back on and, once it's up, inserting the new disk.
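Before trying the power cycle, it may be worth confirming what the array itself reports for the suspect slot. A rough sketch using Navisphere CLI - the SP address and the bus_enclosure_disk slot address are placeholders for this particular setup, and output details vary by FLARE revision:

```shell
# Query the state of the suspect disk slot by its bus_enclosure_disk address
# (replace <SP_A_address> and the slot address with your own values).
naviseccli -h <SP_A_address> getdisk 1_1_5 -state

# Check the overall CRU status of the array (LCCs, power supplies, fans)
# to rule out an enclosure-level fault before power cycling the DAE.
naviseccli -h <SP_A_address> getcrus
```

If `getdisk -state` shows the slot as Removed or Empty even with a known-good disk inserted, that points to the slot rather than the drives.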
glen
cdiedrich
October 3rd, 2013 22:00
Hi Glen,
thanks for your ideas.
Unfortunately, there are no slots available in any DAE :-(
For the faulted ATA drive (part no 005047825) a hot spare kicked in, disk 14 in the same DAE.
For the faulted FC drive (part no 100-560-130) no hot spare came up - even though we have 10 hot spares of this type in the system.
I will give powering down the DAE2-ATA a try - that sounds sensible to me.
How does this affect the other DAEs on the same bus behind this one (it's 1_1, and there are two more DAEs after it on the bus)? Should I first bypass the DAE on at least one bus so we don't lose connection to the subsequent DAEs?
I'm a bit afraid of powering down DAE 0_0 since it's the System DAE...
Is there any way to get this recovered without shutting down the whole system? Shutting down the whole CX700 would stop our whole company from working for at least one or two hours...
In terms of configuring the faulted disk(slot?) as a hot spare I must admit that I'm not familiar with reconfiguring RAID groups.
Is this possible without destroying the RAID group (and hence data loss)?
Thanks!
-Chris
kelleg
October 4th, 2013 14:00
The part number 100-560-130 is for the DAE. It's possible that the failed disk does not have a compatible replacement.
glen
cdiedrich
October 5th, 2013 05:00
Hi Glen,
sorry, you're right. The disk's part no is 005047825.
Except for the serial number, the labels on the disks (good and bad) match identically. :(
Thanks,
-Chris
Edit: Had to swap another ATA disk yesterday in another DAE2-ATA. The disk rejected by the suspect DAE2-ATA works perfectly in the other one, and the other DAE2-ATA has the same part no as the suspect.
kelleg
October 9th, 2013 07:00
If you have enough free space on the array, you might consider creating a new RG and using LUN Migration to move the LUNs out of the faulted RAID group into other RAID groups. Then you could destroy the faulted RAID group and use the good disks for other purposes, leaving the one slot empty until you can get it fixed or replaced.
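For reference, a LUN migration on a CX-series array can be started from Navisphere CLI roughly like this. The SP address and LUN numbers below are placeholders, and the destination must be a bound LUN of the same or larger size that is not presented to hosts - check the CLI help on your FLARE revision before running anything:

```shell
# Start migrating source LUN 12 onto destination LUN 40 at low rate
# to limit host impact (host address and LUN numbers are placeholders).
naviseccli -h <SP_A_address> migrate -start -source 12 -dest 40 -rate low

# Monitor progress; once complete, the source LUN's identity moves to
# the new disks transparently to the attached hosts.
naviseccli -h <SP_A_address> migrate -list
```

This keeps the hosts online throughout, which matters here since shutting down the CX700 would stop the whole company.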
glen
cdiedrich
October 9th, 2013 07:00
Thanks, Glen.
I will try to get hold of one...
Cheers
-Chris
cdiedrich
October 31st, 2013 05:00
After spending many hours on finding the reason we all agreed that it's just two dead slots in the two enclosures.
Luckily we got hold of a DAE2-ATA in mint condition which I will add these days and migrate the LUN.
We will ignore the dead slot in DAE 0_0. The LUN with the dead disk was a leftover from using MirrorView, so I was able to deallocate the Write Intent Log, unbind the LUN, and destroy the RAID group.
Now we have some more hot spares in the system and everything is fine again.
Thanks for your help.
-Chris