Unsolved
June 16th, 2014 08:00
Known Good Disk Status: Empty
Our team has been having issues with drives on a DAE3P enclosure. There were 15 drives in the enclosure, so it was full, and all were operating correctly. The other day, one drive went down, and we assumed it had gone bad. Luckily, all 15 drives were in a RAID 5 config, so the RAID group could survive a drive failure. However, when we tried to replace the bad drive, every spare we inserted would enter the "Powering Up" status, then immediately go to the "Removed" faulted status. This happened with all 6 drives we tried as replacements.
We decided to test some of the drives to see whether we actually had that many bad drives. During the testing process, one of the known good drives from the RAID group was removed (we didn't mind losing the data in the RAID group). Later, after testing that seemed to confirm all 6 of our off-the-shelf spares were bad, we tried to put the known good drive back in, and it started reporting "Removed" as well.
Since we couldn't get the enclosure to re-accept the drive, we removed all LUNs and destroyed the RAID group. Navisphere now reports 13 good drives in the enclosure and 2 empty slots. If we put in one of the drives (either a spare or the one that was known good before), it goes through the "Powering Up" status, then drops to the "Empty" status, even though there is a drive in the slot.
We have tried rebooting and power-cycling the SPs and the enclosure, with no success in getting the enclosure to accept the drives.
Any advice on what might be the issue here? While it is possible that we got a bad batch of drives, and also possible that the "known good" drive went bad during our testing, that many failures still seems suspect. We suspect there may be some configuration issue that is preventing the enclosure from re-accepting these drives.
AnkitMehta
June 17th, 2014 01:00
The issue you are reporting requires a deep SPCollects analysis. I recommend logging a support ticket with EMC Technical Support for accurate troubleshooting. I do not advise power-cycling without Support's consent. (I know it's the oldest trick in the book, but it's better to be safe than sorry.)
kelleg
June 17th, 2014 12:00
First, what type of array is this, and what is the FLARE code version? There are some old FLARE issues with drives powering off or slots getting put into an "off" state.
Try this first: log into Navisphere/Unisphere using the SPA IP address, then, in a new browser window, log into SPB. Check the DAE status from both SPs.
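If it helps, the same cross-check can be scripted from the CLI. A minimal sketch, assuming naviseccli is installed on a management host; the IP addresses are placeholders, and the script only prints the commands rather than running them against a live array:

```shell
# Placeholder management IPs for SPA and SPB - substitute your own.
SPA=192.168.1.10
SPB=192.168.1.11

# getcrus reports enclosure hardware status (LCCs, power supplies, fans);
# getdisk -state lists the state of every drive that SP can see.
for sp in "$SPA" "$SPB"; do
  echo "== checking via SP $sp =="
  echo "naviseccli -h $sp getcrus"
  echo "naviseccli -h $sp getdisk -state"
done
```

Comparing the output from SPA with the output from SPB shows whether the two SPs agree about the slot states.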
If the FLARE version is old, I'd recommend upgrading, but before that, try putting a known good disk into one of the slots marked empty, then power-cycle the DAE (remove the power on both sides at the same time). If the DAE does not come up with the drive powered up, try the same thing, but reboot the SPs (SPA and SPB) one at a time with the DAE on.
If that does not work, then try upgrading the FLARE or open a service request with EMC.
glen
dwoo1
June 18th, 2014 05:00
We have 2 enclosures, but we don't really use the second one or its disks for anything. That enclosure has SATA drives in it; the enclosure in question has Fibre Channel drives. It had all 15 slots full of functioning drives until the one went bad, and a second went bad during our attempts to diagnose the issue.
Currently, both enclosures show a red F, even though none of the sub-components have a red F. Both enclosures have some empty slots (though none of the slots are ACTUALLY empty on the Fibre Channel enclosure; 2 of them are falsely reporting they are empty because they contain suspect drives), so maybe the gray E on the disks makes the enclosure report a red F?
This is a CX3-10C with 2 DAE3P enclosures.
We checked the slots with known good disks before and they seemed to work fine, so I don't think it's the slots.
Where is it that I check the flare version?
kelleg
June 19th, 2014 14:00
In Navisphere, right-click on the array icon and select Properties, then select the Software tab. The FLARE Operating Environment entry displays the FLARE version running on the array. It should be something like 03.26.020.005.025. The major version is the second number; in this case it's version 26. The third number is the model: 020 is the CX3-20. The fourth number is the release: 005 means it's released. The last number is the update number (patch level): 025.
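For reference, the same string can also be read from the CLI: as I recall, "naviseccli getagent" includes a Revision field holding the running FLARE version. A sketch of pulling that field out of the output; the sample text below is made up for illustration, not captured from a real array:

```shell
# Illustrative getagent output - on a live array you would capture it with:
#   naviseccli -h <SP_IP> getagent
sample_output='Agent Rev:           6.26.4 (0.94)
Name:                K10
Revision:            03.26.020.005.025'

# Pull the value after "Revision:" - this is the running FLARE version.
flare=$(printf '%s\n' "$sample_output" | awk -F': *' '/^Revision/ {print $2}')
echo "$flare"
```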
glen
dwoo1
June 20th, 2014 04:00
It is 03.26.010.5.020
kelleg
June 24th, 2014 09:00
The most current FLARE is 03.26.010.5.032, so you might want to consider upgrading. I would also suggest restarting the Management Server on both SPA and SPB; sometimes the GUI interface gets confused. See brn_lewisk's post, How to restart the Management Server.
See if the display clears up after the restart.
You could also use the CLI to get the status for each disk:
naviseccli -h IP_Address_SPx getdisk X_X_X
where X_X_X is Bus_Enclosure_Slot; e.g., 0_0_0 is Bus 0, Enclosure 0, Disk 0.
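To check every slot in the enclosure in one pass, that command can be wrapped in a loop. A sketch, assuming the -state option of getdisk; the SP address is a placeholder, and the commands are only printed here rather than executed:

```shell
SP=192.168.1.10            # placeholder SPA management IP
BUS=0; ENC=0               # Bus 0, Enclosure 0

# A DAE3P holds 15 drives, slots 0 through 14.
for slot in $(seq 0 14); do
  echo "naviseccli -h $SP getdisk ${BUS}_${ENC}_${slot} -state"
done
```

Any slot that reports Empty while physically holding a drive is worth comparing against the same query issued to the other SP.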
glen
1 Attachment
All_CLARiiON_Disk_and_FLARE_OE_Matrices.pdf
lalvaradoh
January 27th, 2016 14:00
Hello all.
Have you solved it? I have a similar problem. I had 2 faulted FC disks... then I put in 2 new disks of a different model, and their state changed from "Removed" (the old faulted disks) to "Requested Bypass", so I destroyed the LUNs and created new RAID groups with two fewer disks. Now I have two new disks (the same model as the originals that came with the CLARiiON trays), so I'd like to expand the RAID groups and LUNs, but I can't. When I plug the new disks into the empty slots, their state goes to "Powering Up" but then just shows "Empty".
I've restarted the management server from the /setup page on both SPs, but it still shows Empty.
CLARiiON CX4-120
Thank you so much for your replies.
kelleg
January 28th, 2016 11:00
You'll need to check that the new disks you added are supported. Check the link below and locate the CX4 section to find the type/model of disks you're adding:
https://support.emc.com/docu42949_All_VNX_CLARiiON_Celerra_Storage_Systems_Disk_and_FLARE_OE_Matrices.pdf?language=en_US
glen
lalvaradoh
January 28th, 2016 12:00
Thank you Glen.
In fact, it's included in the list, and the minimum FLARE is lower than the currently installed FLARE.
It's not listed under the same part number as labeled, but I've found that it has alternative part numbers which are compatible; it's supposed to be the same as 005048848.
The new disks I'm trying to spin up: Seagate Cheetah 15K.6, model number ST3300656FCV, part number 9CH007-031, firmware HC08, bar code 118032600-A01. If you take a look on eBay, you can find it by searching for: 141861998068.
Any thoughts?
Thank you so much.
ZaphodB
January 29th, 2016 09:00
The drives you purchased may have been low-level formatted back to 512-byte blocks; EMC arrays format their Fibre Channel drives with 520-byte sectors.
Perhaps contact the seller, clarify that you are going to use them in an EMC array, and ask whether they had been wiped.
Otherwise, perhaps look for a different reseller.
lalvaradoh
January 29th, 2016 10:00
Thank you! I will contact the seller today.
lalvaradoh
February 24th, 2016 14:00
Hello everyone.
I asked the seller, and he told me he never formatted the disks; he took them from a working storage system. However, he refunded my money and invited me to let him know if I solve it and to share how.
Now... is it possible that the disks are OK and the problem is the DAE? It shows a fault sign (SLOT 0 ENCLOSURE 0: FAULTED). The disks have no F marks. Before, I thought it was because of the unrecognized disks (marked E), but when I unplugged them, it still remained faulted.
Any comment?
PS: I also have "power/cooling module A1" and "enclosure SPE SPS B" faulted.
Thank you so much.