Unsolved

1 Rookie

 • 

12 Posts


December 22nd, 2021 08:00

PS6100 offline after changing controller

Hello,

 

Our PS6100 had a bad controller battery, and it was simpler to replace the whole controller than to mess around with batteries (price point).

 

When replacing the controller, we forgot to move the SD card over from the old one. Once we failed over to the new controller, the unit went offline.

 

We removed the new controller and tried to boot with the original controller and the backup controller, but the unit remains offline.

 

We are able to connect through serial, but our knowledge is limited and basic actions cannot be performed because the array is stuck in an initializing state.

 

Unfortunately, this SAN is out of warranty. I called Dell to see if I could get anywhere, but I would have to pay a one-time fee for an engineer, which I won't be able to get approved at this time of year (holidays).

What would be the procedure from here to restore service on this SAN?

 

---

 

ATTENTION!
A critical health condition exists on the array. User intervention is required.
Please call your support provider.
Once the problem is repaired the array MUST be rebooted.

 

CLI> member show
The storage array is still initializing. Limited commands will be available until the initialization is complete. Please try again later.

 

Running commands on the second controller results in: "This operation is only available on the active control module."

 

 

4 Operator

 • 

1.5K Posts

December 22nd, 2021 10:00

Hello, 

 What version of the EQL FW is that member running? What version did the replacement CM boot? It's critical to swap the SD cards to maintain the proper FW version. It's possible that the logs have been damaged as a result, which can only be repaired by Dell Engineering.

 At the CLI>  run  support exec sh 

 At the # prompt type: 

#raidtool 

#uname -a 

#echo hs | ecli

#diskview -j 

#cord -b 

Please paste the TEXT output, not a screen capture of the output.   

Regards, 

Don 

 

4 Operator

 • 

1.5K Posts

December 22nd, 2021 12:00

Hello 

 OK. Again, please post the output as text; attached images are delayed before they can be seen.

 Is this array backed up?   

 You may be in a lost or invalid cache condition. 

 Regards, 

Don 

1 Rookie

 • 

12 Posts

December 22nd, 2021 12:00

I have to go on site to connect a system to the EQL, so the results will come in tomorrow morning.

 

Firmware on the current CM is 10.0.3, and the replacement CM had 9.1.9.
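
The two versions quoted above differ in their major release, which is easy to misjudge if the strings are compared as plain text. A minimal sketch in Python, assuming firmware versions are always dot-separated integers like the two in this post (`parse_fw_version` is a hypothetical helper name, not an EqualLogic tool):

```python
def parse_fw_version(text):
    """Turn a dotted version such as '10.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


current_cm = parse_fw_version("10.0.3")     # firmware on the existing CM
replacement_cm = parse_fw_version("9.1.9")  # firmware the replacement CM had

# Plain string comparison sorts "10.0.3" before "9.1.9" because '1' < '9'.
print("string compare says 10.0.3 is older:", "10.0.3" < "9.1.9")

# Tuple comparison works field by field on the numbers instead.
print("current CM is newer:", current_cm > replacement_cm)
print("versions match:", current_cm == replacement_cm)
```

Comparing tuples of integers avoids the lexicographic trap and makes a mismatch between the two control modules explicit.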

1 Rookie

 • 

12 Posts

December 23rd, 2021 06:00

The array is not backed up.

 

 

ATTENTION!
A critical health condition exists on the array. User intervention is required.
Please call your support provider.
Once the problem is repaired the array MUST be rebooted.

PS Series Storage Arrays
Unauthorized Access Prohibited

login: grpadmin
Password:
Last login: Wed Dec 22 08:48:20 2021 on console


Welcome to Group Manager

Copyright 2001-2019 Dell Inc.

 

The storage array is still initializing. Limited commands will be available until the initialization is complete. Please try again later.

CLI> 790128:3940:netmgtd:23-Dec-2021 08:58:00.423940:rcc_util.c:1026:INFO::25.2.9:CLI: Login to account grpadmin succeeded, using local authentication. User privilege is group-admin.

CLI> support exec sh
You are running a support command, which is normally restricted to PS Series Technical Support personnel. Do not use a support command without instruction from Technical Support.
# raidtool
Driver Status: *Admin Intervention Requested*
RAID LUN 0 Ok.
raid status unrecoverable.
12 Drives (20,2,4,6,8,1,3,5,7,15,10,0)
RAID 6 (64KB sectPerSU)
Capacity 5,761,185,873,920 bytes
RAID LUN 1 Ok.
raid status unrecoverable.
11 Drives (9,13,14,23,16,17,18,19,12,21,22)
RAID 6 (64KB sectPerSU)
Capacity 5,185,067,286,528 bytes
Available Drives List: 11
# uname -a
NetBSD 5.0_STABLE NetBSD 5.0_STABLE (EQL.PSS) #0: Wed Jun 19 00:54:06 EDT 2019 build@m64:/buildarea/V10.0.3__Wed_Jun_19_2019_00_40_06_EDT/bin/destdir.sbmips.64.release/EQL.PSS.64 sbmips
# echo hs | ecli
ecli> Health Status (0x8000000000000008/0x0000000000000020): RED Conditions:
CACHE_SYNCING_CONDITION
RAID_LOST_CACHE_CONDITION
ecli> # diskview -j
Enc/Drive  State   Write   Read    Power   Drive     Bad     ForceWrite  Reset  Read     Scan    Max       Max
                   Retrys  Retrys  Cycles  Timeouts  Blocks  Retrys      Fail   Timeout  Errors  Cominits  HrstMsecs
____________________________________________________________________________________________________________________
0/ 0       Online  0       0       0       2         4       0           0      0        0       0         0
0/ 1       Online  0       0       0       0         65      0           0      0        0       0         0
0/ 2       Online  0       0       0       0         16      0           0      0        0       0         0
0/ 3       Online  0       0       0       0         10      0           0      0        0       0         0
0/ 4       Online  0       0       0       0         0       0           0      0        0       0         0
0/ 5       Online  0       0       0       0         12      0           0      0        0       0         0
0/ 6       Online  0       0       0       0         2       0           0      0        0       0         0
0/ 7       Online  0       0       0       0         4       0           0      0        0       0         0
0/ 8       Online  0       0       0       0         14      0           0      0        0       0         0
0/ 9       Online  0       0       0       0         2       0           0      0        0       0         0
0/10       Online  0       0       0       0         5       0           0      0        0       0         0
0/11       Online  0       0       0       0         13      0           0      0        0       0         0
0/12       Online  0       0       0       0         1       0           0      0        0       0         0
0/13       Online  0       0       0       0         26      0           0      0        0       0         0
0/14       Online  0       0       0       0         15      0           0      0        0       0         0
0/15       Online  1       0       0       0         0       0           0      0        0       0         0
0/16       Online  4       0       0       0         20      0           0      0        0       0         0
0/17       Online  4       0       0       0         19      0           0      0        0       0         0
0/18       Online  0       0       0       0         18      0           0      0        0       0         0
0/19       Online  0       1       0       0         22      0           0      0        0       0         0
0/20       Online  0       0       0       0         37      0           0      0        0       0         0
0/21       Online  0       0       2       12        28      0           2      0        0       0         0
0/22       Online  0       0       0       0         0       0           0      0        0       0         0
0/23       Online  0       0       0       0         0       0           0      0        0       0         0
# cord -b
B2B=> (3) channel Up, Active CM.
=> RCP registers(8,10,18,20,68,90): 0x23 0x48 0x07 0x02 0x00 0x80.
=> B2B Requests: Queued: 64207, Pending: 0, Completed: 64207, Max Pending: 2.
=> B2B Driver: Interrupts: 3, DM: 0
Sig1 Sent: 1, Sig2 Sent: 1, Sig1 Recv: 0, Sig2 Recv: 1
Total Bytes: 132,874,972
Total Xfer Time: 4,538,457 usecs
Rate ====> 29,277,565 Bytes/Sec.
10g 0 Max Latency: 63,984 usecs
10g 0 Min Latency: 16 usecs
10g 0 Avg Latency: 70 usecs
10g small Total I/Os: 64,216
10g small Total Bytes: 2,055,004
10g large Total I/Os: 64,207
10g large Total Bytes: 132,874,972
10g receive CRC errors: 0
10g send small retry count: 0
10g send large retry count: 0
10g small dup count: 0
10g mis-directed count: 0
CORD=> (6) Comm State: ESTABLISHED_CORD_SYNC.
#
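
For anyone reading the `diskview -j` output above, the per-drive counters can be pulled into a structure and filtered. A minimal sketch in Python, assuming each data row is `enclosure/drive`, a state word, then eleven integer counters in the order of the two-line header; the snake_case column names and `parse_diskview` are labels made up for this example, and the thread does not say which counters actually indicate a fault:

```python
# Hypothetical column labels, derived from the two-line header in the output.
COLUMNS = [
    "write_retrys", "read_retrys", "power_cycles", "drive_timeouts",
    "bad_blocks", "forcewrite_retrys", "reset_fail", "read_timeout",
    "scan_errors", "max_cominits", "max_hrst_msecs",
]


def parse_diskview(text):
    """Parse rows such as '0/21 Online 0 0 2 12 28 0 2 0 0 0 0'."""
    drives = []
    for line in text.splitlines():
        enc, slash, rest = line.partition("/")
        fields = rest.split()
        # Skip headers, separators, and anything that is not a data row.
        if not slash or not enc.strip().isdigit():
            continue
        if len(fields) != len(COLUMNS) + 2 or not fields[0].isdigit():
            continue
        if not all(value.isdigit() for value in fields[2:]):
            continue
        row = {"enc": int(enc), "drive": int(fields[0]), "state": fields[1]}
        row.update(zip(COLUMNS, (int(value) for value in fields[2:])))
        drives.append(row)
    return drives


# Three rows copied from the output above.
sample = """\
Enc/Drive State Write Read Power Drive Bad ForceWrite Reset Read Scan Max Max
0/ 0 Online 0 0 0 2 4 0 0 0 0 0 0
0/ 1 Online 0 0 0 0 65 0 0 0 0 0 0
0/21 Online 0 0 2 12 28 0 2 0 0 0 0
"""

for d in parse_diskview(sample):
    if d["drive_timeouts"] or d["reset_fail"]:
        print(f"drive {d['drive']}: timeouts={d['drive_timeouts']}, "
              f"reset_fail={d['reset_fail']}")
```

A nonzero counter is only a hint about where to look; all 24 drives report Online in the output above.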

4 Operator

 • 

1.5K Posts

December 23rd, 2021 09:00

Hello, 

 Is there only one controller installed at the moment?  Do you have the original CM that had the battery warning?  Was the battery on that controller in a FAILED state? 

 If that battery had not actually failed yet, then it's possible the missing cache is still in that controller. It's probably a long shot. You would have to power off the array, put that controller in, pull the one that is active now out about an inch, and power up. Connect the serial cable to the controller that is still seated, and make sure the 10.0.3 SD card is installed in that CM! If it works, the array will boot up and you can install the other controller. Then do a proper shutdown, pull out the bad controller, swap the SD card into your replacement controller, and boot back up. I would also inspect the cache battery on the replacement CM and look for an expiration date. That replacement CM could have an old battery that may also expire before too long.

 If the array does not come up, OR that battery had actually expired, then there is something else you can do: a command that tells the array to disregard the missing cache data. HOWEVER, this could result in lost data, or, at worst, the missing data is in the block allocation database, which would also prevent the array from booting and would need to be rebuilt by Dell engineering at a cost.

 The command, run at the CLI> prompt (not the # prompt), is clearlostdata. It's discussed in the CLI guide.

Good luck! 

Regards, 

Don 
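
The lost-cache diagnosis comes from the `echo hs | ecli` output posted earlier, which lists RAID_LOST_CACHE_CONDITION under a RED health status. A minimal sketch in Python for checking pasted output for that condition, assuming the health line and the one-per-line condition names look exactly as they do in this thread (`parse_health` is a hypothetical helper, and other firmware versions may format this differently):

```python
import re


def parse_health(text):
    """Return (level, conditions) from pasted 'echo hs | ecli' output."""
    level = None
    conditions = []
    for line in text.splitlines():
        # Example: "Health Status (0x.../0x...): RED Conditions:"
        match = re.search(r"Health Status \([^)]*\):\s*(\w+)", line)
        if match:
            level = match.group(1)
            continue
        token = line.strip()
        # Condition names in the thread are upper case, ending in _CONDITION.
        if re.fullmatch(r"[A-Z0-9_]+_CONDITION", token):
            conditions.append(token)
    return level, conditions


# Lines copied from the output posted in this thread.
sample = """\
ecli> Health Status (0x8000000000000008/0x0000000000000020): RED Conditions:
CACHE_SYNCING_CONDITION
RAID_LOST_CACHE_CONDITION
ecli> # diskview -j
"""

level, conditions = parse_health(sample)
print("health level:", level)
print("conditions:", ", ".join(conditions))
print("lost cache reported:", "RAID_LOST_CACHE_CONDITION" in conditions)
```

This only reads text that has already been captured; it does not talk to the array or change anything on it.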

 

1 Rookie

 • 

12 Posts

December 23rd, 2021 13:00

The controller the commands were run from is the working controller. The second controller is also installed, and it is the one with the battery issue.

 

I think I tried removing the good controller and booting only from the one with the battery issue, but it remained in a secondary state and would not let me run commands on it. I will try this again and see.

 

As for the data, this is a backup unit, so data loss is not a major concern, but the rebuilding part is.

4 Operator

 • 

1.5K Posts

December 23rd, 2021 17:00

Hello, 

 If the battery on the other controller has failed then it won't become the active controller.  

 Worst case, if the allocation log is damaged, you can reset that array back to factory defaults and recreate it. So trying the clearlostdata command would be my next suggested step.

 Regards, 

Don 

1 Rookie

 • 

12 Posts

December 24th, 2021 05:00

Long shot here: if I replace the battery on the controller with the one that came with the new controller, would that let the controller complete its switch and resolve the issue?

 

Otherwise I will attempt clearlostdata on the primary.

4 Operator

 • 

1.5K Posts

December 24th, 2021 10:00

Hello, 

 I don't believe so, no. If the battery has failed, then it would not have been in sync with the primary, so it would not have had the cache.

 Regards,

Don

1 Rookie

 • 

2 Posts

July 11th, 2025 04:28

Hi all,

I know this is a very old post, but it sounds like I am in the exact same situation as DaniJ1981. I'd kindly like to ask how he resolved the issue in the end (if it was fully resolved). I'd also appreciate it if anyone can point me in a good direction so I can avoid data loss.

I have already tried to contact Dell support, but I have been calling and emailing them for about a week now without receiving the support I need.

Any help will be really appreciated.

Thank you!

Moderator

 • 

4K Posts

July 11th, 2025 07:58

Hi,

 

As mentioned by Don, if the failed controller has been replaced but the SD card cannot synchronize at boot, support engineering needs to step in and check the issue, since you mentioned that you would like to avoid data loss. You said that you tried calling but were unable to get support; is there any particular reason? Have you also tried the chat function?

 

If you are waiting for DaniJ1981's possible resolution, we can wait for his reply to help with your situation.

1 Rookie

 • 

2 Posts

July 11th, 2025 10:46

The only two ways I can see on the Dell support website are the online form and the phone (where you will die before they answer you... Lol :-P ).

At this point let's see what DaniJ1981 may answer/post.

Thank you all for your patience and time.
