Unsolved
25 Posts
March 24th, 2021 05:00
Type 15 controller failure on PS6210XS
Two different messages:
" Panic recovery from CPU0 with reason 'xlp_ddr_ecc_disable_error: DDR correctable threshold exceeded on channel 1, cause failover'. "
" NVRAM contains valid data. This is a PANIC RECOVERY due to a panic on a NetBSD processor."
So last night I deliberately failed over to the secondary controller so I could change the cache battery pack in the primary. That failover went great. I wasn't at work, but I was going to replace the battery today (perfect, now that the primary had become the secondary controller).
Fast forward one hour (I'm now asleep), and boom, my PS6210XS fails back to the other controller. The internet has ZERO results on this. I have no contract with Dell (yes, THIS is why you have a contract). I'm going to buy a controller, as I think it failed due to bad memory. Can anyone share whether they have had a similar experience?
We are now running on the original controller, but its battery is depleted, so we are in write-through mode. Performance seems fine.



dwilliam62
4 Operator
1.5K Posts
March 24th, 2021 07:00
Hello,
If you can take an outage, you could move the good battery from the bad controller into the other one, so you have one 100% good CM until the replacement arrives.
Good luck.
Regards,
Don
dwilliam62
March 24th, 2021 07:00
Hello,
My best guess is that yes, you have bad memory in that controller. What is the FW version on that array?
Re: battery. Is it depleted, or is it just giving you a warning? Usually, after the 90% threshold warning, the battery has at least 3 months left. When a battery is at 0%, the controller usually won't boot up fully.
Regards,
Don
jordanl17
March 24th, 2021 07:00
I had no battery complaints on the secondary after putting the fresh battery in about 1 month ago. Firmware is 9.1.7. I know there is newer, but it will be hard for me to get. After failback, the secondary controller still reports that its battery is good. I'm with you: bad RAM/controller. The current primary controller has a depleted battery and is in write-through mode. The replacement controller for the secondary will be here in a few days.
ALSO, looking in the Monitoring log, I do see, starting about 1 minute after restarting to the secondary, about 2 messages per minute: "28.2.166 A correctable error has been detected on controller in slot 0 ." Then it looks like it finally gave up about an hour later and switched back to the good controller.
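If you want to watch for this pattern without eyeballing the log, counting how often event 28.2.166 shows up per minute is a quick way to spot a controller that is accumulating correctable memory errors. The sketch below is hypothetical: it assumes the events are available as plain-text lines in a "YYYY-MM-DD HH:MM:SS <message-id> <text>" layout, which is NOT a documented PS Series export format, and the function name `count_events_per_minute` and the sample lines are made up for illustration. Adjust the regular expression to whatever your actual log text looks like.

```python
import re
from collections import Counter

# ASSUMPTION: one event per line, formatted as "YYYY-MM-DD HH:MM:SS <message-id> <text>".
# This layout is hypothetical; change LINE_RE to match your real log text.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\s+(\d+\.\d+\.\d+)\s")


def count_events_per_minute(lines, event_id="28.2.166"):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the number of times event_id appears."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Only count lines that parse and carry the requested message ID.
        if match and match.group(2) == event_id:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines in the assumed format (not taken from a real array).
SAMPLE = [
    "2021-01-01 00:05:01 28.2.166 A correctable error has been detected on controller in slot 0 .",
    "2021-01-01 00:05:31 28.2.166 A correctable error has been detected on controller in slot 0 .",
    "2021-01-01 00:06:02 28.2.166 A correctable error has been detected on controller in slot 0 .",
    "2021-01-01 00:06:10 7.2.1 Some unrelated event .",
]

if __name__ == "__main__":
    # Print one line per minute that contained at least one matching event.
    for minute, n in sorted(count_events_per_minute(SAMPLE).items()):
        print(minute, n)
```

A steady nonzero count per minute after a failover, like the roughly 2 per minute described above, is the kind of trend worth flagging; the script only counts lines and does not know the array's actual failover threshold.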
dwilliam62
March 24th, 2021 07:00
Hello,
Yes, make sure you move the SD card from the old CM to the new one. The FIRMWARE is on the SD card, but the configuration is NOT. Storing the configuration there would be a horrible design, one that other vendors have used in the past: bad SD cards lead to data loss. The configuration is stored as part of the RAID set, so it is better protected.
Regards,
Don
jordanl17
March 24th, 2021 07:00
The other controller is arriving tomorrow. I think I should be good, right? When the replacement controller arrives, I will:
pull out the bad CM, put its fresh battery (from the bad CM) in the replacement CM, move the microSD card from the bad CM to the replacement CM, then install the replacement CM in the array. Is that the correct procedure? Are the firmware and config on the microSD?
jordanl17
March 25th, 2021 09:00
OK, so the replacement secondary controller is in (I swapped the good battery and microSD from the old/bad controller). I put the controller in, and it seems healthy in the GUI. BUT... I had forgotten to put the SFPs in until after the controller had booted up. Is that OK??? There seems to be no way to confirm whether they are healthy. Do SFPs have to be in before bootup?! Should I just pull it out, wait 30 seconds, and put it back in? (Or don't bother, it's fine?) This is the paranoid part of me coming out...
dwilliam62
March 25th, 2021 10:00
Hello,
Re: REV. No, not at all.
You are very welcome. I am glad I could assist you.
Regards,
Don
dwilliam62
March 25th, 2021 10:00
Hello,
SFP+ modules should be hot-swappable. So, to confirm: your passive controller has no network connections? Or are you also using the copper connectors?
Regards,
Don
jordanl17
March 25th, 2021 10:00
And do I care that the replacement CM is rev. A00 (vs. rev. A02 for the current primary CM)?
Thanks, Don
jordanl17
March 25th, 2021 10:00
Both controllers have SFP adapters and use 10Gb fiber into two Arista switches.
jordanl17
March 26th, 2021 05:00
Just to wrap this up: I "rebooted" over to the replacement CM last night, and it went smoothly. I monitored for the "recoverable errors"... none. All good; as we expected, it was just a bad CM (or just bad RAM). Today I'll go in and put the fresh battery in the current CM. That will wrap up the most dramatic cache battery change ever. Ha! Thanks, Don. Your answers here are priceless and help Dell keep its customers.
dwilliam62
March 26th, 2021 07:00
Hello,
I am so glad it all worked out! You are very welcome, and thank you for your kind words. I truly enjoy helping out here. I've done so for about 15 years, and it's not part of my job either.
I hope you have a great weekend!
Regards,
Don