Unsolved

Closed

1 Rookie

 • 

11 Posts

661

June 6th, 2023 08:00

MD3600i - Power supplies go offline

MD3600i
- dual ISCSI Controllers
- dual Power supplies
FW: 08.20.24.60
PS PN 06N7YJA01PSU
PS FW: 01.00.14

Happened twice so far in last 2 weeks

When this happens, the ISCSI controllers do not respond (to pings or iscsi commands)
- and all the VM hosts fence & reboot
- this is our only MD3600i and it holds 1/2 of the VMs' VDs

The PSs show a green light by the power plug, indicating there is power to the PS, but the ISCSI controllers are offline.
Power cycling the MD3600i PSs (unplug the power cord, wait 30 secs, plug in the power cord) will power up the MD & both ISCSI controllers.
- if either PS is power cycled then both ISCSI controllers will be online (respond to pings or iscsi commands)


When the issue happens, MDSM reports the MD is unavailable
- if both PSs are plugged in and the MD is unavailable, and 1 PS is power cycled, the MD powers up and MDSM connects to it and shows it as degraded/non-optimal: the power-cycled PS is online and the non-power-cycled PS is offline.

I do get an email event notification from the SAN of
Event Message: The persistent monitor running on Host kye-vmh01.kyetech.local cannot reach the indicated Storage Array.

nothing in the MD event log for this time frame
- only showing when the MD was powered up

Is this a known issue?
Any suggestions/guesses?

Moderator

 • 

4.6K Posts

June 6th, 2023 13:00

Hello Kyetech-John,

 

If a power issue is suspected, have you confirmed that the power cables are good and that there are no issues with the UPS?

 

Next let's get a Support Log for review.

How to : https://www.dell.com/support/kbdoc/en-us/000127694/how-to-gather-the-support-logs-of-powervault-md32xx-md34xx-md36xx-and-md38xx

 

Then you can upload the report here under the Service Tag : https://upload.dell.com/

After it's uploaded, please Private Message me the Service Tag for me to retrieve the report.

 

1 Rookie

 • 

11 Posts

June 6th, 2023 14:00

Thanks for the reply

I have confirmed good clean power is being supplied & the UPSs are working well (no issues or event in their logs).

As stated - both power supplies in the MD stop giving power to the MD at the same time, but have the green light on by the power cord.  This is odd. I would think that only 1 PS would have an issue and the other would keep the MD running, but NO, both PSs go offline.

I'll upload the MD Support log & PM you

Moderator

 • 

4.6K Posts

June 7th, 2023 05:00

Hello Kyetech-John,

 

Thank you for the notification. I received your private message.

I will gather the log and update you when I have more information.

Moderator

 • 

4.6K Posts

June 7th, 2023 10:00

Hello Kyetech-John,

 

I'm not seeing anything in the logs that points to a PSU-related issue. I am seeing controller reboot events, and that could be what you are seeing.

 

There appears to be a host configuration issue: host kye-vmh01 has 2 IQNs mapped to it (these should likely be defined as 2 separate hosts).

A fix could be to define iqn.2023-01.local.kyetech:01:44eb92e51227 and iqn.1996-04.kye-vmh01:01:df2e41a70ff as separate hosts.

 

If the issue occurs again, it would be best to attempt to get logs while the issue is present.
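The host-mapping check described above (one host carrying two IQNs) can be sketched in a few lines of Python. This is only an illustration: the `mappings` list is typed in by hand from this thread rather than exported from the array, and `hosts_with_multiple_iqns` is a made-up helper name, not an MDSM or SMcli feature.

```python
from collections import defaultdict

def hosts_with_multiple_iqns(mappings):
    """Return {host: sorted IQNs} for every host that has more than one IQN."""
    by_host = defaultdict(set)
    for host, iqn in mappings:
        by_host[host].add(iqn)
    return {host: sorted(iqns) for host, iqns in by_host.items() if len(iqns) > 1}

# Assumption: (host, IQN) pairs entered by hand from the values quoted above.
mappings = [
    ("kye-vmh01", "iqn.2023-01.local.kyetech:01:44eb92e51227"),
    ("kye-vmh01", "iqn.1996-04.kye-vmh01:01:df2e41a70ff"),
]

for host, iqns in hosts_with_multiple_iqns(mappings).items():
    print(f"{host} has {len(iqns)} IQNs: {', '.join(iqns)}")
```

A host that shows up in the result is a candidate for being split into separate host definitions, or for having a stray initiator name removed.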

1 Rookie

 • 

11 Posts

June 8th, 2023 08:00

Thanks Charles

Hard to get the logs when all interfaces (management & ISCSI) are not responding.  And when this happens, no events are triggered to go into the event log because both controllers are unpowered.

The IQNs on the hosts will be corrected at the next maintenance window.  Thanks for catching this.

 

1 Rookie

 • 

11 Posts

June 12th, 2023 08:00

Had a maintenance window.  Edited the host's iqn & removed the stray iqn, so there is now only 1 iqn for the 1 host.

But the issue still remains: sometimes there is no power to the controllers, necessitating a power cycle of both of the MD power supplies.

Moderator

 • 

4.6K Posts

June 12th, 2023 08:00

Hello Kyetech-John,

 

I'm sorry to see that. 

As close to the event as possible, once you can access the interfaces again, could you pull a new support log and upload it again?

1 Rookie

 • 

11 Posts

June 12th, 2023 13:00

Hello Charles;

 

Uploaded new PV MD support file.

Moderator

 • 

4.6K Posts

June 13th, 2023 05:00

Hello Kyetech-John,

 

Thank you for the notification.   I will gather the log and update you when I have more information.

Moderator

 • 

4.6K Posts

June 13th, 2023 08:00

Hello Kyetech-John,

 

We don't see any events in the MEL over the weekend for anything like the power supplies rebooting.

The last "Start-of-day routine completed" events were on 6/4/23 around 1:51:36 AM so it appears the controllers have been up since then.

*What was day and time of the last issue?

*Is the management in-band(through host connectivity) or out-of-band (through management ports)?

*What Model, OS and version is on the attached hosts? 

 

Could you also provide this information:

 

*What time and date do you notice the issue?

*What are all the LEDs on the controllers showing for status and network links etc?

*What does the overall status LED on the front of the enclosure show, and drive LEDs?

*You mentioned one green LED on power supplies but there are a few different LEDs that tell us different things. What are ALL of the LEDs on the power supplies looking like while this is going on?

*Do the management ports respond to ping when this is happening?

Pictures of the above could help also.

 

1 Rookie

 • 

11 Posts

June 13th, 2023 10:00

*What was day and time of the last issue?
- this issue has only happened 2 times - May 28, 2023 9:04:39 PM & Jun 3, 2023 11:37:35 PM
- times from log files on the hosts
- Event Message: The persistent monitor running on Host kye-vmh01.kyetech.local cannot reach the indicated Storage Array.

*Is the management in-band(through host connectivity) or out-of-band (through management ports)?
- out-of-band (through management ports)

*What Model, OS and version is on the attached hosts?
- Dell R620, Linux SUSE SLES 11 SP4

Could you also provide this information:

*What time and date do you notice the issue?
May 28, 2023 9:04:39 PM & Jun 3, 2023 11:37:35 PM

*What are all the LEDs on the controllers showing for status and network links etc?
- did not look at the controllers LEDs
- will take notice next time

*What does the overall status LED on the front of the enclosure show, and drive LEDs?
- drive LEDs are all off
- other LEDs on the front - all off - no LEDs lit of any color

*You mentioned one green LED on power supplies but there are a few different LEDs that tell us different things. What are ALL of the LEDs on the power supplies looking like while this is going on?
- did not notice any other LEDs on the power supplies - just the green one
- needed to get the system back online ASAP
- will take notice next time

*Do the management ports respond to ping when this is happening?
- no pings from management ports or the ISCSI ports
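To line up the host-side outage time with the array's "Start-of-day routine completed" time mentioned earlier, the two timestamps can be compared as in the rough sketch below. It assumes both clocks are in the same time zone and reasonably in sync (not confirmed in this thread), and that Python is running with an English locale so that `%b` parses "Jun". The function name `gap_between` is made up for this example.

```python
from datetime import datetime

HOST_FMT = "%b %d, %Y %I:%M:%S %p"  # host log style, e.g. "Jun 3, 2023 11:37:35 PM"
MEL_FMT = "%m/%d/%y %I:%M:%S %p"    # MEL style as quoted, e.g. "6/4/23 1:51:36 AM"

def gap_between(host_ts, mel_ts):
    """Return the time from the host-side alert to the array's start-of-day event."""
    return datetime.strptime(mel_ts, MEL_FMT) - datetime.strptime(host_ts, HOST_FMT)

# Second outage reported by the host vs. the last start-of-day seen in the MEL.
gap = gap_between("Jun 3, 2023 11:37:35 PM", "6/4/23 1:51:36 AM")
print(gap)  # 2:14:01
```

Under those assumptions, the array's last start-of-day falls roughly 2 hours 14 minutes after the second outage alert, which would be consistent with the array being down until the manual power cycle.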

Moderator

 • 

4.6K Posts

June 13th, 2023 12:00

Hello Kyetech-John,

Thank you for the information. I'll need some time to review.

Moderator

 • 

4.6K Posts

June 16th, 2023 13:00

Hello Kyetech-John,

 

We'll need more information than we have at the moment.

If it happens again, could you gather images of the items listed above and an updated log?

We will use that to see if it can give us more insight to the issue.
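Since the array logs nothing while it is unpowered, one way to pin down the exact start and end of an outage is to log reachability from a host that stays up. The sketch below only checks whether a TCP connection can be opened. Port 3260 is the standard iSCSI target port; the addresses in the usage comment are placeholders, and `tcp_reachable` and `watch` are hypothetical names, not part of any Dell tool.

```python
import socket
import time
from datetime import datetime

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch(targets, interval=30, cycles=None, log=print):
    """Log a timestamped line whenever a target changes between UP and DOWN.

    targets: {label: (host, port)}; cycles=None means run until interrupted.
    """
    last = {}
    done = 0
    while cycles is None or done < cycles:
        for name, (host, port) in targets.items():
            up = tcp_reachable(host, port)
            if last.get(name) != up:
                stamp = datetime.now().isoformat(timespec="seconds")
                log(f"{stamp} {name} {'UP' if up else 'DOWN'}")
                last[name] = up
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval)

# Example usage (placeholder addresses, replace with the real iSCSI portal IPs):
# watch({"ctrl0-iscsi": ("192.0.2.10", 3260), "ctrl1-iscsi": ("192.0.2.11", 3260)})
```

Redirecting the output to a file on local disk (not on the SAN) gives a timeline that can be compared with the MEL and the UPS logs after the array is back up.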

1 Message

July 31st, 2023 07:00

Hello Charles;

It's been a few weeks and no SAN outages.

We thought it may have been an overheating issue, since it was very warm outside on the days the SAN went offline.  So we added better air conditioning and more separation between the rack appliances.  The appliances are now cooler (according to the monitoring app).

The MD logs do not show whether it overheated, and I cannot find an SNMP MIB for temperature monitoring on this SAN. Hopefully newer models have a MIB for this.

Thanks for your help.

Have a good day.

 
