Solved!

1 Rookie

 • 

42 Posts

43

August 28th, 2024 11:45

PowerEdge VRTX problem

Hello!

We have a strange problem: all information disappeared from the Storage tab. After that, vCenter lost its connection to all hosts. The connections dropped gradually, one host roughly every 10 minutes, so we lost control of the entire cluster in about 30 minutes. We have never seen anything like this before.

At the same time, everything seems to be working: the hosts are available, and the storage seems to be available too.

We have no idea what happened. Your response will be highly appreciated. Thank you!

Moderator

 • 

3.9K Posts

September 11th, 2024 08:53

Hi,

 

Assuming the part number is 0B28470, the internal part search returned: HDD, 1.2TB, 512b, SAS6, 10K RPM, 2.5 inch, 64MB, Hitachi. In your first post, you mentioned that while the issue occurs, "everything seems to be working, the hosts are available, the storage seems to be available too." Based on that and on the error you provided, I would say the issue is that the CMC and the SPERC are unable to communicate.

 

My suggestion is to reboot the CMC. Rebooting the CMC generally does not affect the operation of the servers, but make sure no updates are in progress first. If the issue persists, check the CMC firmware version and update it if it is outdated. Then check the SPERC firmware too.

Moderator

 • 

4.4K Posts

August 28th, 2024 17:18

Hello,

 

Was anything recently changed when this issue started?

 

Can you confirm the CMC firmware is up to date?

 

If you have dual CMCs, try failing over and see how it shows up:

 

CMC > Troubleshooting tab > Reset Components > Reset/Failover CMC

1 Rookie

 • 

42 Posts

September 5th, 2024 05:05

@DELL-Charles R​ 

Hello! Thank you for your answer. We have only one CMC, so we can't do a failover.

But I have one thought: we replaced 2 faulty disks with new ones, after which this situation arose. On the weekend we replaced the controller. It came into operation and the storage fields were populated again, after which it showed that another disk was in the "faulty" state. The VRTX worked for another day, and then the same situation as in the previous post arose again. Could the new disks be the cause? And if we remove them, can we restore normal operation of the system?

The picture shows the state after the controller change.

Thank you

Moderator

 • 

3.9K Posts

September 5th, 2024 08:50

Hi,

 

When you encounter this issue, is there any CMC log entry showing "CTL96 RAID Controller in Chassis Slot 6 has entered safe mode with limited functionality"? Note that this entry would appear in the CMC logs, not on the storage event log page shown in your screenshot.

 

Ref: https://dell.to/3XucyUq

 

The picture in the first post looks very much like a SPERC that has gone into safe mode. This can be caused by a failed disk, pinned cache, or SPERC cabling. The controller count in your screenshot shows only 1 SPERC, so I'm assuming there is no second SPERC for a fault-tolerant configuration. You mentioned there is a failed disk. Have you resolved that by replacing it?
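
For anyone checking a large log for the entry quoted above, here is a minimal sketch in Python. It assumes the CMC log has already been saved as plain text with one entry per line; the export format, the function name `find_safe_mode_entries`, and the sample lines are illustrative assumptions, not Dell tooling or real log output.

```python
import re
from typing import Iterable, List

# Assumption: a safe-mode entry carries the message ID "CTL96" or the
# phrase "entered safe mode", as in the message quoted in this thread.
SAFE_MODE_PATTERN = re.compile(r"\bCTL96\b|entered safe mode", re.IGNORECASE)


def find_safe_mode_entries(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like a SPERC safe-mode entry."""
    return [line.strip() for line in lines if SAFE_MODE_PATTERN.search(line)]


if __name__ == "__main__":
    # Hypothetical sample lines; only the CTL96 text comes from the thread.
    sample = [
        "Chassis health check completed",
        "CTL96 RAID Controller in Chassis Slot 6 has entered safe mode "
        "with limited functionality",
    ]
    for entry in find_safe_mode_entries(sample):
        print(entry)
```

To scan a saved file instead of the sample list, pass an open file handle, for example `find_safe_mode_entries(open("cmc_log.txt"))`, where `cmc_log.txt` is a hypothetical file name.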

1 Rookie

 • 

42 Posts

September 6th, 2024 05:32

@DELL-Joey C​ 

Hi,

No, the problem appeared just after we replaced the two faulty disks.

Moderator

 • 

3.9K Posts

September 6th, 2024 09:42

Hi,

 

Have you checked the CMC for the log entries that I mentioned in my previous comment?

 

When the other disk was in the fault state, did you replace it? Can you provide the DPN of the replacement drive?

 

 

1 Rookie

 • 

42 Posts

September 11th, 2024 06:23

@DELL-Joey C​ 

Hello. We are still checking the logs.

The DPN of the replacement drive is 0B28470.

Thank you

1 Rookie

 • 

42 Posts

September 11th, 2024 06:31

@DELL-Joey C​ 

I didn't find such a message in the logs, only the one in the screenshot.
