Solved!

1 Rookie

 • 

43 Posts

4412

March 24th, 2022 03:00

EQL original replacement drive not approved

Hello Community,

I believe I have read through all the solutions posted so far on this topic, but it looks like my case is different.
We have a PS EQL 6110X (FW V10.0.3) that we still depend on because of an unforeseen demand situation.

Recently one of the drives was signalled "tripped".
We managed to source new, pristine original drives in trays with exactly the same specs as the failed drive.
Even the firmware levels match.
Only serial numbers and site code differ.

Still, inserting the drive causes the array to classify it as "not approved".
It is initially accepted, with 2 green lights.
Then there are some seconds of drive activity. It looks like the array is actually dealing with the drive.
Then one of the LEDs turns orange, and the drive is rejected with the messages "not approved" and "not authorised".
I tried swapping the slots with the other spare, with the same result.

I tried to find additional options on the CLI to analyze the issue or to convince the EQL that the drive is certified, but couldn't find anything beyond what "show" has to offer.

I have not yet tried the failover approach, as I am not sure whether all of the applications can deal with the timeout, and the array is under continuous load. I also don't know whether this would really help to solve the issue.

I'd be super grateful if anyone has any good advice.

Best regards

frank

3 Apprentice

 • 

1.5K Posts

March 31st, 2022 22:00

Hello, 

 Sorry, no other option but to find a verified EQL drive.  The ID info is well established and I suspect these are not properly coded drives for EQL arrays. 

  Regards, 

Don

Moderator

 • 

2.8K Posts

March 24th, 2022 07:00

Hello, when you say the specifications of the replacement drive are the same, do you mean it has the same part number and firmware? If the firmware or the part number is not certified, I think that may cause an issue. There may also be a hardware fault in the slot, but I consider that unlikely; it would be a long shot. Have you encountered HOF (history of failure) or any other error? I'm not very familiar with EQL storage, but we have users in our community who can help, and I hope they will contribute.

1 Rookie

 • 

43 Posts

March 24th, 2022 10:00

Hello Erman,

thank you very much for your quick reply, great!!

I can confirm the drives are as similar as they can get from my perspective.
Side by side, one can only tell them apart by their serial number and site code.
Both the "almost failed" original drive (shipped by DELL back then) and the new replacement drive are:

Certified Dell HDD
DELL Enterprise Plus
Certified by DELL
Savvio 10k.6
P/N: 9WH066-157
Model: ST900MM0006
DP/N: 0GKY31
Firmware: "LE08"
Product Code: ST900MM0006-DELL

All other 23 drives are what DELL populated the array with upon acquisition.

I hoped to rule out the slot by swapping slots with the other hot spare (RAID6),
and I could verify that the still healthy 2nd hot spare operates without issues in both of the slots.
Inserting the tripped drive would also allow me to add it back to the RAID, if I ever decided to ignore the warning of the Group Manager.
Therefore I believe it is more about the drive?
And my gut feeling is that the OS of the EQL may simply not "know" that drive well enough yet?

And the FW level (10) is from an upgrade substantially after the purchase of the array, so the FW of the drives should actually match.

I hoped for CLI magic to trigger some "thorough" check which would cause the array to reconsider its initial evaluation.

It would be great if someone else has faced a similar issue, which may not be too unlikely, since our configuration is super simple and stock. Nothing fancy...

Thank you again

frank

Moderator

 • 

9.2K Posts

March 24th, 2022 11:00

Did the replacement drive come from Dell? If the drive is not a branded EQL drive it won’t work even if it is otherwise identical.

1 Rookie

 • 

43 Posts

March 24th, 2022 14:00

Hello Josh,

thank you for asking.
Our first stop actually was DELL directly, but they classified the array as EOL and could not help us.
Of course, this is also because we unfortunately don't have it under support any more.
Therefore I checked the market and found a US-based reseller I highly trust.
The drives are surely original DELL certified/branded drives.
I can't tell the new and the old drives apart: label/sticker, frame, disk, IDs.
They also came sealed...
That's why I deem my problem to be different from what I have found in this forum so far...

Thanks again,

frank

3 Apprentice

 • 

1.5K Posts

March 25th, 2022 10:00

Hello Frank, 

 Unfortunately, just a "Dell" drive will not work in an EQL SAN, nor will an OEM drive of the same part number and firmware. The drive is branded by the manufacturer in such a way that the software recognizes it as an EqualLogic certified drive. The part number and firmware are not the issue. A drive from a Dell server or another SAN won't work either. The drives have to be specifically for EqualLogic SANs. There is no way to make a non-EQL drive work either.

 Your supplier must get you a proper EQL drive.   

 Regards,

Don

1 Rookie

 • 

43 Posts

March 25th, 2022 12:00

Hello Don,

thank you for offering help!

I believe(d) that can be assumed, since in the run-up to the purchase we double-checked with the supplier that these are not just PowerEdge drives, but EQL ones. Actually, they rejected my initial choice (which were PowerEdge drives, as it then turned out) and replied later, after having found the EQL ones.

If I look up the "DP/N: 0GKY31" on the net, it also refers to EQL drives.
And the replacement drives are exactly the same drives as the ones DELL sold us the entire array with back then (photo attached: the right drive is the replacement, the left drive is the tripped one).


All my other 23 drives look exactly the same...

I wouldn't be able to find another criterion to select a drive by, should we now have to hunt for other candidates...
Isn't it possible that this is a genuine drive, but the array is not recognising it for some reason?

I am asking since, according to what I find on the net, it sometimes took a failover or similar activities to nudge the array into eventually accepting the drive. But that is something I'd prefer to postpone as long as possible, since we have multiple long-running jobs using the array. And since most of the applications are custom programmed for research, I wouldn't bet on their capability to properly deal with the latency the failover will cause...

Still hoping...

Best regards,

frank

 

[Image: replacement drive next to original drive]

3 Apprentice

 • 

1.5K Posts

March 25th, 2022 22:00

Hello, 

 I have a case where a failover did resolve that issue.  I have also seen an EQL drive mislabeled when it didn't have the proper internal configuration. 

 Regards, 

Don

 

1 Rookie

 • 

43 Posts

March 26th, 2022 03:00

Hello Don,

thank you very much again.

Re: proper internal configuration.
It would be great if I have missed something that I can catch up on now...
Is there any configuration aspect I could check before I take the failover route?
CLI is welcome.
Currently, the config is just one pool, one group, RAID10, no cluster.
Networking is HA, verbatim according to the DELL documentation.

If a failover is all that is left and I can't arrange downtime in time, I'd have to make sure the disk timeouts are set to 60 and then "restart" the controller; as far as I know, removing network paths (which I'd prefer) wouldn't trigger the failover?

Best regards

 

frank

3 Apprentice

 • 

1.5K Posts

March 26th, 2022 19:00

Hello, 

   re: Removing network paths. No, that does not cause a controller failover. If it did, it would be the same as restarting the active controller. If you do the restart in a low IO period, the failover will typically take way less than 60 seconds, typically under 30 seconds.

   However, something to consider: if your servers are not prepared to handle a planned failover, they are also not ready to handle an unplanned failure. Also, for Windows, setting the disk timeout requires a reboot of the server in order for the change to take effect.

  There is nothing at the CLI that I could give you to help force the drive into the RAIDset. 

 Regards, 

Don

1 Rookie

 • 

43 Posts

March 27th, 2022 07:00

Hello Don,

thank you for pointing that out!
The servers/virtualisation (Linux) actually are.
But the applications, in particular the database-backed ones, are my concern.
Better safe than sorry. I'll try the reboot as soon as I have agreed on a time window with the application users, hopefully soon.
And I will be back then.

Regards,

frank

3 Apprentice

 • 

1.5K Posts

March 27th, 2022 14:00

Hello Frank, 

 What virtualization are you using? QEMU/KVM/Proxmox?

  The hypervisor must also have the timeouts set, but you can do that live with Linux. The VMs need the timeout set as well.

 If you have access to eqlsupport.dell.com, the "OS Considerations" guide is located with the firmware downloads. It describes how to set the timeouts for Windows and Linux.
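
 As a rough illustration of the Linux side (not taken from the guide): on Linux the per-disk SCSI command timeout is exposed in sysfs as /sys/block/<dev>/device/timeout, in seconds. The sketch below only reads those values and flags disks below the 60 seconds discussed in this thread. The function names and the 60-second target constant are illustrative assumptions; the guide remains the authority on recommended values and on how to make them persistent.

```python
from pathlib import Path

# Assumption: 60 seconds is the target discussed in this thread,
# not a value read from the array or from the Dell guide.
TARGET_SECONDS = 60


def read_disk_timeouts(sysfs_block=Path("/sys/block")):
    """Return {device_name: timeout_seconds} for SCSI-style disks (sd*)."""
    timeouts = {}
    for dev in sorted(Path(sysfs_block).glob("sd*")):
        timeout_file = dev / "device" / "timeout"
        if timeout_file.is_file():
            timeouts[dev.name] = int(timeout_file.read_text().strip())
    return timeouts


def disks_below_target(timeouts, target=TARGET_SECONDS):
    """Return the sorted names of disks whose timeout is shorter than target."""
    return sorted(name for name, value in timeouts.items() if value < target)


if __name__ == "__main__":
    current = read_disk_timeouts()
    for name, value in current.items():
        print(f"{name}: {value}s")
    print("below target:", disks_below_target(current))
```

 The script is read-only. Changing a value live is typically done by writing the new number to the same sysfs file as root, and such a change is not expected to survive a reboot unless it is also applied through udev rules or a similar mechanism.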

 Good luck!  Please do let us know how this goes. 

  Regards, 

Don

1 Rookie

 • 

43 Posts

March 28th, 2022 02:00

Hello Don,

thank you for your helpful input.

I still have that "Dell PS Series Storage Arrays iSCSI Initiator and Operating System Considerations" guide around, and it's a great document.

It's KVM for us, but effectively we have to deal with vdsm when customising the iSCSI timeout on the hypervisors, by adjusting the multipathing config. The timeout can be increased by raising the number of reconnection attempts.
The VMs' OSes are adapted in their udev facilities and the iscsid config according to what the HIT (which we don't use) would have applied.
That has worked so far for the hypervisor and the VMs, but on the application level we have had different experiences with specific applications.
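
To make the retry arithmetic concrete, here is a small sketch. It assumes the attempt count in question is multipath's `no_path_retry` and that the interval between attempts is `polling_interval`. Both are real multipath.conf settings, but that they are the ones meant here is an assumption, and the numbers used are illustrative only, not values from this setup or from vdsm. Under that assumption, the time I/O stays queued while all paths are down is roughly the product of the two.

```python
import math


def queue_window_seconds(no_path_retry, polling_interval):
    """Approximate time (s) that I/O is queued while no path is available."""
    return no_path_retry * polling_interval


def retries_needed(target_seconds, polling_interval):
    """Smallest retry count whose queue window covers target_seconds."""
    return math.ceil(target_seconds / polling_interval)


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not from the thread).
    print(queue_window_seconds(no_path_retry=4, polling_interval=5))
    print(retries_needed(target_seconds=60, polling_interval=5))
```

With a hypothetical 5-second interval, 4 attempts would cover about 20 seconds, while covering a 60-second window would take 12 attempts.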

Therefore...

Best

frank

3 Apprentice

 • 

1.5K Posts

March 28th, 2022 08:00

Hello, 

  Interesting! Of course an application can have its own timeout, though most rely on the OS to report the error to it.

  Regards, 

Don

1 Rookie

 • 

43 Posts

March 31st, 2022 09:00

Hello Don,

ahh, good to hear. That also was my expectation and understanding back then, when we had an issue and one of our applications took a hit.
My reasoning was that maybe the latency was still too big.

But maybe this was just coincidence.

The developer highly recommends putting its datastore only on local storage and avoiding any shared storage because of its sensitivity to latency.
One must admit that this is very heavy database activity, writing genome sequencing data...

That's why I decided to take extra precautions whenever opportunities allow it for this specific application...

 

I just wanted to update that yesterday I finally had a time window, but unfortunately the failover did not change the landscape.

In the meantime i also tried the other drives as well, and they are all rejected, following the same pattern.
1) Plugging them in will turn on both LEDs to green.
2) Then a series of disk activity events occurs, for about 10-20 seconds,
3) Eventually the right LED turns orange, while the left stays green. The disk is listed as "not approved" then in the drive list.

Not sure whether this leaves some other options on the table?

Best regards

frank
