
February 25th, 2016 02:00

Problem on an AX4-5f

Hello,

My Fibre Channel storage array is equipped with two SPSs, one for each SP. The SPS backing SPB failed and I replaced it.
Then, after rebooting the entire array, SPB appears faulted and the SPS connected to it shows the state "empty".

Is the current state of the SPS what is making SPB appear faulted?
Is there any specific procedure to get that SPS operational?

Thank you for your help

195 Posts

February 29th, 2016 11:00

As you are using them, the four vault disks have two distinct types of data on them:

> User data from the LUNs you defined there

> Flare/vault data that the system uses

That second type of data can only exist on disks in slots 0-3. Think of them as the boot drives for the system; their use is identified by their physical location. When your disks failed and were rebuilt to the designated hot spares, only the first type of data was rebuilt.

Your system hasn't been healthy since that first vault drive failed.  For instance, I believe that write cache has been disabled since that point in time.  Until you can return it to a healthy status I wouldn't expect it to function in a normal manner.

Having put disks back into slots 1 and 3, I would guess that it would be re-syncing from the invoked hot spares.  As these are SATA disks there is a chance that their issues were with internal drive recovery, and they may be essentially 'good'.  But I would think that the rebuild was still in progress, and that one or both of them may fail before it finishes.

What alerts do you still have?  Just the SPS, or do you still see issues with the SPs and write cache disabled?

On a small capacity system it is a bit of a luxury, but if possible, you should avoid using the vault disks for user LUNs.

As I read it, your current vault drives are 750GB SATA, and the other eight drives are 1TB SATA.  If disk 1 or 3 should fail you *could* unbind your remaining hot spare and physically replace the failed disk with it.  That is perhaps an imperfect solution, but only because historically the 1TB SATA disks have what is perhaps the worst failure rate of any disk made in this century.  Otherwise, replacing a like interface (SATA for SATA) vault with a larger capacity disk will not cause any issues, and you can go to wherever you get your replacement disks and buy another 1TB for just about the same cost as the 750GB.
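One way to confirm whether the rebuild is still in progress is the Navisphere CLI, if it happens to be available for this array. This is only a sketch: the SP address below is a placeholder, and on an AX4 the full CLI may not be enabled (Navisphere Express is the default management interface).

```shell
# Sketch, not verified on an AX4-5f: query disk states and rebuild progress
# via Navisphere Secure CLI. SP_IP is a placeholder management address.
SP_IP="192.168.1.10"
if command -v naviseccli >/dev/null 2>&1; then
  # -state shows each disk's state; -rb shows percent rebuilt while equalizing
  naviseccli -h "$SP_IP" getdisk -state -rb
  CLI_PRESENT=1
else
  echo "naviseccli not installed on this host"
  CLI_PRESENT=0
fi
```

A disk that is still equalizing to or from a hot spare will show a rebuild percentage below 100.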

4.5K Posts

February 25th, 2016 14:00

There should be some indicator lights on the SPS indicating the state of the battery. You should let the SPS fully charge until you get the green light. Then ensure that you have the sense cable plugged into the correct SP - try re-seating it on both ends - and check the other cable on the good SPS for comparison.

Once the battery is fully charged, then try re-booting the array.
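If the array's management IP is reachable and the CLI is installed, the SPS state Glen describes can also be read remotely. A hedged sketch; the address is a placeholder and the CLI may not be enabled on an AX4:

```shell
# Sketch: report SPS (standby power supply) state as the array sees it.
# SP_IP is a placeholder; requires Navisphere CLI on the management host.
SP_IP="192.168.1.10"
if command -v naviseccli >/dev/null 2>&1; then
  naviseccli -h "$SP_IP" getsps
  CLI_PRESENT=1
else
  echo "naviseccli not installed on this host"
  CLI_PRESENT=0
fi
```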

glen

1 Rookie

 • 

6 Posts

February 26th, 2016 03:00

Hello and thanks for your help.

I checked the connections on both ends of the sense cable and everything is correct.

Only the Active light is on (green); the other lights are off.

This screenshot shows the status of the storage array:

[Screenshot: array status (status.JPG)]

and this one shows the "attention required" information:

[Screenshot: attention required details (problem.JPG)]

At the beginning, there was a problem with two disks in my first disk pool (disks 1 and 3). The system used spare disks 4 and 11 to keep the LUNs in this pool available. After that, I had a problem with a failed SPS, the one connected to SPB.

I bought two new disks and a new SPS, and replaced the failed SPS with the new one (after shutting down SPB). I installed the new disks in slots 1 and 3. When I did this, I lost the Navisphere connection to the storage array, even after rebooting it.

The status has stayed the same since I removed the new disks and restarted the array.

4.5K Posts

February 26th, 2016 09:00

When you have two disks fail at the same time and those two disks are in the same storage pool, you get what's called a double-faulted RAID group (on the AX series the Storage Pools are RAID Groups). The two faulted disks (1 and 3) are part of the OS drives and control SPB (disks 1 and 3 are a mirror set). You need to put the disks back in the exact locations they came out of - as these are the OS disks there should be special labels on them.

When two disks in a mirror both fail, it's very unlikely that this can be fixed without extensive work - any user data on those disks may be lost and the disks will need to be rebuilt (re-imaged) to restore the OS.

From what I can see only SPA is still alive and is probably the only SP that you could communicate with. The disk faults on SPB side caused SPB to panic, which also probably caused the SPS fault.

You'll probably need assistance from a Partner or EMC to help resolve this issue.

Make sure you keep the two original disks (1 and 3) separated from the other disks, and make sure you know which slot each disk belongs to, as this may help restore the SPB side faster.
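One low-tech way to follow this advice is to capture a slot-to-serial map while the array is still reachable, so pulled drives can always be matched back to their original slots. A sketch, assuming the Navisphere CLI is available; plain `getdisk` output includes serial numbers, and SP_IP is a placeholder:

```shell
# Sketch: save a snapshot of disk slots, states and serial numbers so that
# pulled drives can be matched back to their slots later.
SP_IP="192.168.1.10"
if command -v naviseccli >/dev/null 2>&1; then
  naviseccli -h "$SP_IP" getdisk | tee "disk_map_$(date +%Y%m%d).txt"
  CLI_PRESENT=1
else
  echo "naviseccli not installed on this host"
  CLI_PRESENT=0
fi
```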

glen

1 Rookie

 • 

6 Posts

February 28th, 2016 23:00

Thanks for your help Glen !

Disks 1 and 3 did not fail at the same time. They were successively replaced by disks 4 and 11. It took me a long time to buy new disks, which is why two disks were missing from the first pool.

When I tried to add the two new disks in slots 1 and 3, I thought these two disks would replace the spare disks 4 and 11 after a rebuild. Maybe I should try this again, one disk at a time instead of both at the same time like I did before.

1 Rookie

 • 

6 Posts

February 29th, 2016 08:00

Thanks for your answer Zaphod.

My array was defined like this:

1st pool: disks 0-3 (RAID 5 with 4 x 687GB disks) ===> 2 LUNs

2nd pool: disks 5-10 (RAID 5 with 6 x 917GB disks) ===> 3 LUNs

Disk 4 (687GB): spare disk for pool 1

Disk 11 (917GB): spare disk for pool 2

The first disk to fail was disk 1, replaced by disk 4; one week later disk 3 failed and was replaced by disk 11.

The array worked fine in this configuration until I bought new disks. When I replaced the failed disks with new ones, I lost the Navisphere connection. After reading about this problem in the EMC users FAQ, I tried removing the new disks. With that done, I could access the array via Navisphere again, in the state shown in the screenshots above.

This morning I installed the failed disks (1 and 3) in their original slots, and here is the current state of the array:

[Screenshot: array state (Capture_ax4.JPG)]

As you can see, I could redefine disk 11 as a spare disk for the second pool.

On the other hand, it was impossible for me to recreate the 1st pool with disks 0 to 3.

Navisphere answers: "The creation of the disk pool failed: Error reported by storage processor A: Peer SP will not allow creation of Disk Pool. It may be performing an operation on one of the requested disks. SP B: Bad FRU Configuration in RAID Group create."

What can I do now? Are the supposedly failed disks 1 and 3 good or not? And why is my SPS B faulted when it's a new one I installed?

195 Posts

February 29th, 2016 08:00

Disks 0-3 in the base enclosure are the vault drives where the code running the SPs lives. It is an extremely poor idea to leave even one of them failed for a second longer than absolutely necessary.

If you intend to continue using this storage (...and if you can ever get it healthy again...) you should purchase at least one spare drive to have on hand; when you use that one start the process of obtaining the next one immediately.

1 Rookie

 • 

6 Posts

March 1st, 2016 00:00

Thank you Zaphod, I understand now how the storage array works.

The array state today is the same as yesterday, but a few minutes ago I tried again to create a disk pool with disks 0-3 and it worked! I defined disk 4 as a spare disk for this pool, and the system is currently initializing the two LUNs I created in this pool.

But the write cache is still disabled because SPS B remains faulted. What I don't understand is that the SPS is a new one, yet it has never worked properly or charged since I installed it.

Is there any procedure to initialize this SPS or force it to charge?

Latest status screenshots below:

[Screenshot: disk status (Capture_disks.JPG)]

[Screenshot: array status (Capture_newax4.JPG)]

[Screenshot: LUN status (Capture_luns.JPG)]

195 Posts

March 1st, 2016 06:00

When swapping batteries I have run into occasions where a 'new' SPS did not come up properly.

What do the lights on the SPS tell you?  I have had occasions where the unit was all green, and the SPs didn't agree, and others when the unit itself had an amber light on it.

You should check the cabling for the sense cable.

It is possible that the cable can be bad, but I consider that unlikely in general unless it was physically traumatized (kinked, bent pins on the connectors etc.).

You can power cycle the SPS, as long as the other is in good shape. Unplug it and leave it off for a minute or two, then plug it back in.

I have had occasions where a reboot of one, or both (one at a time of course), of the SPs was required to get things back in shape.

1 Rookie

 • 

6 Posts

March 15th, 2016 02:00

Hello Zaphod  and sorry for the late answer.

I tried many times to force an SPS restart, stopping SPB first each time. The SPS still stays in the same state: green LED on at the back of the SPS and a faulted status in Navisphere. I checked the sense cable; nothing seems to be bad.

The only thing I have not tried yet is to plug the original SPS back in. I'll try that today.

4.5K Posts

March 15th, 2016 12:00

There are two things you can try:

1. Try rebooting SPB - sometimes the interface for the sense cable could be locked up internally - a reboot would release the sense cable interface (RS232)

2. Swap the sense cable on SPA to SPB

glen

1 Rookie

 • 

5 Posts

April 15th, 2025 21:13

Hi,

I had a problem with an AX4-5f SAN storage array. At the beginning the battery module (BBU) failed. Then disks started to fail. We tried to change the disks one by one, letting each rebuild finish. But someone then tried to change the remaining disks at the same time, and after that the system didn't work.

We didn't have support, no IP assigned, and no management serial cable. So we re-pinned one of the cables we had (the one used to connect the enclosure's mini DB9 port to the power supply module) to work as a management serial cable, and now we can connect. We found the vault disks with the dd command, but the problem is that only 3 of the 4 vault disks are available, and even though the system can find the OS on them, it can't boot - maybe the checksum or something else is not OK. I have tried different disk orders in one of the enclosures; after the warnings and messages I see many int13 messages. With Ctrl-C I can enter the diagnostic menu, but there is no menu - it just says to contact the administrator. How can I repair the system with these existing disks?
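For anyone repeating this vault-disk hunt, the dd-based search can be scripted. This is only a rough sketch of the idea, not the exact procedure used here: scan a raw device or a dd image of it for an ASCII marker. The marker string `b"MDDE"` below is a hypothetical example; substitute whatever signature you are actually searching for.

```python
# Sketch: scan a raw disk (or a dd image of one) for an ASCII marker,
# reading in large sector-aligned chunks with overlap so a marker that
# straddles a chunk boundary is still found.
SECTOR = 512

def find_marker(path, marker, limit_sectors=1_000_000):
    """Return byte offsets where `marker` appears in the first `limit_sectors`."""
    hits = []
    chunk_sectors = 2048  # read 1 MiB at a time
    with open(path, "rb") as f:
        offset = 0
        overlap = b""
        while offset < limit_sectors * SECTOR:
            data = f.read(chunk_sectors * SECTOR)
            if not data:
                break
            buf = overlap + data
            start = 0
            while (i := buf.find(marker, start)) != -1:
                hits.append(offset - len(overlap) + i)
                start = i + 1
            # keep the tail so a split marker across chunks is not missed
            overlap = buf[-(len(marker) - 1):] if len(marker) > 1 else b""
            offset += len(data)
    return hits

# Usage sketch (requires root to read the raw device):
# print(find_marker("/dev/sdb", b"MDDE"))
```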

Thanks in advance 

Moderator

 • 

7.5K Posts

April 16th, 2025 12:15

Hello alireza.kayvan,

When you noticed the failed BBU error, did you replace the faulted drive and let it rebuild before replacing the other drives? When the drives were replaced one by one, did you write down which drive came out of which slot? Do you have an old set of SPcollects that you can review? They may contain your drive info so that you can put the drives back in their original order.

1 Rookie

 • 

5 Posts

April 16th, 2025 14:57

Yes, I did. I thoroughly tested all possible drive combinations using the three detected Vault disks.

Unfortunately, the person who previously replaced the drives did not take note of the original slot assignments. I do not have any SPCollects from this SAN array, only from my VNX system. However, since some drive caddies had default sequence labels, I was able to make educated guesses about their original positions.

Both Storage Processors (A and B) are currently able to boot. However, SP A produces noticeably more INT13 messages, which could indicate BIOS-level boot negotiation issues or misaligned disk metadata.

With what I believe to be the best possible disk order, the SAN responds as follows:


Storage Processor A Output (Partial)

  • All five targets come online.

  • System locates firmware components such as BIOS, POST, CPLD, and FLARE images at their respective sector LBAs on disk 0.

  • DDBS identifies two drives (disk 0 and disk 2), but reports that:

    • The second disk contains an invalid DD.

    • The first (primary) disk needs to be rebuilt, but is still being used by default.

  • The SP boots into Degraded Mode due to a reboot counter exceeding threshold.


Storage Processor B Output (Partial)

  • All five targets also come online.

  • SP B reads metadata and boot components from disk 1 and disk 3.

  • DDBS reports:

    • Both disks contain invalid DDs.

    • One disk appears to be in the wrong slot.

    • Despite metadata inconsistencies, the SP defaults to using both disks.

  • SP B continues the boot process with minimal errors and completes successfully, without entering Degraded Mode.


In summary:

  • The disk set for SP A is: 0, 2

  • The disk set for SP B is: 1, 3

  • The FLARE image is located at sector LBA 0x000AE8C0 on both sets, and the mirror drive geometry is calculated correctly.
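As a quick sanity check, the geometry numbers in the POST log are internally consistent, which supports the "calculated correctly" observation:

```python
# Sanity-check the mirror drive geometry reported by DDBS in the POST log.
sectors_per_track = 63
heads = 255
cylinders = 877

capacity = sectors_per_track * heads * cylinders
print(capacity)  # 14089005, matching "Capacity: 14089005 sectors"

total_sectors = 0x00D6FAEE      # 14088942, from "Total Sectors"
relative_sectors = 0x0000003F   # 63, from "Relative Sectors"
print(total_sectors + relative_sectors)  # 14089005 again

# With 512-byte sectors the mirror partition works out to about 6.7 GiB
print(capacity * 512 / 2**30)
```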

I believe with the correct MDDE/DD metadata alignment and slot positioning, this array may be recoverable. However, without original SPCollect logs or verified drive mappings, it remains trial and error. Any suggestions to further validate slot/disk alignment — or tools to inspect MDDE/DD headers more deeply — would be greatly appreciated.
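For inspecting those headers without special tools, the LBAs reported by DDBS can be turned into byte offsets into a dd image of a vault disk. A sketch, assuming 512-byte sectors; the image file name is hypothetical:

```python
# Sketch: read raw sectors at a given LBA from a dd image so the MDDE/DD
# region can be dumped into a hex viewer. Assumes 512-byte sectors.
SECTOR = 512

def read_lba(image_path, lba, sectors=1):
    """Return `sectors` sectors starting at logical block address `lba`."""
    with open(image_path, "rb") as f:
        f.seek(lba * SECTOR)
        return f.read(sectors * SECTOR)

# Usage sketch, using the FLARE image LBA reported in the POST log:
# data = read_lba("disk0.img", 0x000AE8C0)
# print(data[:64].hex())
```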

Thanks again for your support!

1 Rookie

 • 

5 Posts

April 16th, 2025 15:01

SAN storage output:


-----------------------------------------------------------------------------
                           Storage Processor A
---------------------------------------------------------------------------

Copyright (c) EMC Corporation , 2008
Disk Array Subsystem Controller
Model: Boomslang: SAN
DiagName: Extended POST
DiagRev: Rev. 05.34
Build Date: Mon Jun 16 09:28:48 2008
StartTime: 04/16/2025 15:00:53
SaSerialNo: SL7B6103800129


This product includes software developed by Viola Systems
             (http://www.violasystems.com/).


ABCDEabFabGHIJabKabcdefgLMabcdeNabcdeOPQRSTUVWXYZAABBCCDDEEFFabGGabHHabIIabJJabKKLLMMNN
Target 0 is online

Target 1 is online

Target 2 is online

Target 3 is online

Target 4 is online
OO

Relocating Data Directory Boot Service (DDBS: Rev. 11.00)...

DDBS: MDDE (Rev 2) on disk 0


BIOS image located at sector LBA 0x0009E040
DDBS: MDDE (Rev 2) on disk 0


POST/DIAG/EAST image located at sector LBA 0x0009F040
DDBS: MDDE (Rev 2) on disk 0


CC/PS FW image located at sector LBA 0x000A5040
DDBS: MDDE (Rev 2) on disk 0


CPLD image located at sector LBA 0x000A3040
DDBS: MDDE (Rev 2) on disk 0


DAE MC image located at sector LBA 0x000A7040
DDBS: MDDE (Rev 2) on disk 0


DAE MC image located at sector LBA 0x000A7040
DDBS: MDDE (Rev 2) on disk 0


SAS LSI image located at sector LBA 0x000A8040
DDBS: MDDE (Rev 2) on disk 0


DPE MC image located at sector LBA 0x000A6040
DDBS: MDDE (Rev 2) on disk 0


DPE MC image located at sector LBA 0x000A6040
DDBS: MDDE (Rev 2) on disk 0


SAS Expander Boot image located at sector LBA 0x000A1040
DDBS: MDDE (Rev 2) on disk 0


SAS Expander image located at sector LBA 0x000A0040
DDBS: MDDE (Rev 2) on disk 0


SAS Expander istr image located at sector LBA 0x000A2040
PP
Autoflash POST ROM?

Autoflash BIOS ROM?

Autoflash LSI FW?
QQ
Disk Enclosure Autoflash
Disk Enclosure 0
SAS Address from expander table 108

Autoflash SAS Expander Boot?
        Enclosure 0 : SAS Expander Boot : Rev. 03.00
Autoflash SAS Expander ISTR?
        Enclosure 0 : SAS Expander ISTR : Rev. 03.05
Autoflash SAS Expander FW?
        Enclosure 0 : SAS Expander FW : Rev. 04.30
Autoflash CC/PS FW?

Autoflash CPLD FW?

Autoflash MC DPE DL?

Autoflash MC DPE FW?
RR

************************************************************
*                 Extended POST Messages
************************************************************

WARNING: BIOS: 05.40, ExpFw:   04.30, MC:    23.00, MCDL: 13.47
WARNING: POST: 05.34, ExpBoot: 03.00, CC/PS: 01.03
WARNING: LSI:  04.34, ExpIstr: 03.05, CPLD:  00.13
WARNING: SES Page Rev: 1:: 0=4, 1=4, 2=4, 3=4, 4=4
************************************************************

DDBS: K10_REBOOT_DATA: Count = 8
DDBS: **** Warning: Reboot count exceeded.  This SP is booting in DEGRADED MODE.
DDBS: K10_REBOOT_DATA: State = 1
DDBS: K10_REBOOT_DATA: ForceDegradedMode = 0

DDBS: SP A Normal Boot Partition
DDBS: MDDE (Rev 2) on disk 0
DDBS: MDDE (Rev 2) on disk 2
DDBS: Read default DDE (0x400008) off disk 2

DDBS: DD invalid on second disk.
DDBS: First (primary) disk needs rebuild.
DDBS: Using first disk by default.
DDBS: Using second disk by default.


FLARE image (0x00400007) located at sector LBA 0x000AE8C0

Disk Set: 0 2 

Total Sectors: 0x00D6FAEE

Relative Sectors: 0x0000003F

Calculated mirror drive geometry:
Sectors: 63
Heads: 255
Cylinders: 877
Capacity: 14089005 sectors

Total Sectors: 0x00D6FAEE

Relative Sectors: 0x0000003F

Calculated mirror drive geometry:
Sectors: 63
Heads: 255
Cylinders: 877
Capacity: 14089005 sectors


EndTime: 04/16/2025 15:01:22

int13 - RESET (1)

int13 - READ PARAMETERS (3)

int13 - RESET (5)

int13 - READ PARAMETERS (7)

int13 - READ PARAMETERS (24)

int13 - READ PARAMETERS (547)

int13 - CHECK EXTENSIONS PRESENT (548)

int13 - GET DRIVE PARAMETERS (Extended) (549)

int13 - READ PARAMETERS (552)

int13 - CHECK EXTENSIONS PRESENT (553)

int13 - GET DRIVE PARAMETERS (Extended) (554)

int13 - READ PARAMETERS (556)

int13 - CHECK EXTENSIONS PRESENT (557)

int13 - GET DRIVE PARAMETERS (Extended) (558)

int13 - READ PARAMETERS (567)

int13 - CHECK EXTENSIONS PRESENT (568)

int13 - GET DRIVE PARAMETERS (Extended) (569)

int13 - DRIVE TYPE (586)

int13 - READ PARAMETERS (587)

int13 - DRIVE TYPE (588)

int13 - CHECK EXTENSIONS PRESENT (590)

int13 - GET DRIVE PARAMETERS (Extended) (591)

int13 - READ PARAMETERS (592)

int13 - CHECK EXTENSIONS PRESENT (593)

int13 - GET DRIVE PARAMETERS (Extended) (594)

int13 - READ PARAMETERS (605)

int13 - CHECK EXTENSIONS PRESENT (606)

int13 - GET DRIVE PARAMETERS (Extended) (607)

int13 - READ PARAMETERS (610)

int13 - CHECK EXTENSIONS PRESENT (611)

int13 - GET DRIVE PARAMETERS (Extended) (612)


---------------------------------------------------------------------------
                           Storage Processor B
---------------------------------------------------------------------------


Copyright (c) EMC Corporation , 2008
Disk Array Subsystem Controller
Model: Boomslang: SAN
DiagName: Extended POST
DiagRev: Rev. 05.34
Build Date: Mon Jun 16 09:28:48 2008
StartTime: 04/16/2025 15:08:10
SaSerialNo: SL7B6103800137


This product includes software developed by Viola Systems
             (http://www.violasystems.com/).


ABCDEabFabGHIJabKabcdefgLMabcdeNabcdeOPQRSTUVWXYZAABBCCDDEEFFabGGabHHabIIabJJabKKLLMMNN
Target 0 is online

Target 1 is online

Target 2 is online

Target 3 is online

Target 4 is online
OO

Relocating Data Directory Boot Service (DDBS: Rev. 11.00)...

DDBS: MDDE (Rev 2) on disk 0


BIOS image located at sector LBA 0x0009E040
DDBS: MDDE (Rev 2) on disk 0


POST/DIAG/EAST image located at sector LBA 0x0009F040
DDBS: MDDE (Rev 2) on disk 0


CC/PS FW image located at sector LBA 0x000A5040
DDBS: MDDE (Rev 2) on disk 0


CPLD image located at sector LBA 0x000A3040
DDBS: MDDE (Rev 2) on disk 0


DAE MC image located at sector LBA 0x000A7040
DDBS: MDDE (Rev 2) on disk 0


DAE MC image located at sector LBA 0x000A7040
DDBS: MDDE (Rev 2) on disk 0


SAS LSI image located at sector LBA 0x000A8040
DDBS: MDDE (Rev 2) on disk 0


DPE MC image located at sector LBA 0x000A6040
DDBS: MDDE (Rev 2) on disk 0


DPE MC image located at sector LBA 0x000A6040
DDBS: MDDE (Rev 2) on disk 0


SAS Expander Boot image located at sector LBA 0x000A1040
DDBS: MDDE (Rev 2) on disk 0


SAS Expander image located at sector LBA 0x000A0040
DDBS: MDDE (Rev 2) on disk 0


SAS Expander istr image located at sector LBA 0x000A2040
PP
Autoflash POST ROM?

Autoflash BIOS ROM?

Autoflash LSI FW?
QQ
Disk Enclosure Autoflash
Disk Enclosure 0
SAS Address from expander table 108

Autoflash SAS Expander Boot?
        Enclosure 0 : SAS Expander Boot : Rev. 03.00
Autoflash SAS Expander ISTR?
        Enclosure 0 : SAS Expander ISTR : Rev. 03.05
Autoflash SAS Expander FW?
        Enclosure 0 : SAS Expander FW : Rev. 04.30
Autoflash CC/PS FW?

Autoflash CPLD FW?

Autoflash MC DPE DL?

Autoflash MC DPE FW?
RR

************************************************************
*                 Extended POST Messages
************************************************************

WARNING: BIOS: 05.40, ExpFw:   04.30, MC:    23.00, MCDL: 13.47
WARNING: POST: 05.34, ExpBoot: 03.00, CC/PS: 01.03
WARNING: LSI:  04.34, ExpIstr: 03.05, CPLD:  00.13
WARNING: SES Page Rev: 1:: 0=4, 1=4, 2=4, 3=4, 4=4
************************************************************

DDBS: K10_REBOOT_DATA: Count = 0
DDBS: K10_REBOOT_DATA: State = 0
DDBS: K10_REBOOT_DATA: ForceDegradedMode = 0

DDBS: SP B Normal Boot Partition
DDBS: MDDE (Rev 2) on disk 1
DDBS: Read default DDE (0x400009) off disk 1
DDBS: Read default MDDE off disk 3
DDBS: MDDE (Rev 2) on disk 3
DDBS: Read default DDE (0x40000A) off disk 3

DDBS: MDB read from both disks.
DDBS: DD invalid on both disks, continuing...
DDBS: MDB Valid on first disk, MDB Invalid on second disk.
DDBS: First disk in wrong slot.
DDBS: Using first disk by default.
DDBS: Using second disk by default.


FLARE image (0x00400009) located at sector LBA 0x000AE8C0

Disk Set: 1 3 

Total Sectors: 0x00D6FAEE

Relative Sectors: 0x0000003F

Calculated mirror drive geometry:
Sectors: 63
Heads: 255
Cylinders: 877
Capacity: 14089005 sectors


EndTime: 04/16/2025 15:08:38

No Events found!
