Unsolved

1 Rookie

September 8th, 2025 12:36

Isilon OneFS 9.5.0.7 - After a node reboot (A200), the node is no longer connected to the cluster

We had an issue in our Isilon A200 cluster where several nodes rebooted; however, one node did not come back correctly and is missing from the cluster.
On the node it states: "Warning: This node is not connected to the cluster."

Rebooting or reseating the node does not resolve the issue. On the front-end management network I can ping all other nodes from the problematic node and vice versa. The same goes for the InfiniBand IPs.

I tried to smartfail the node. It now has the Smartfailed status, but it is neither removed from the cluster nor ready to be joined again.

Someone suggested rebooting the "partner" node. We did so, and as a result we now have two nodes with the same issue: both are still connected to the cluster network and all IP connectivity is fine, but both state they are not connected to the cluster.

Has anyone else encountered the same issue?

Moderator

September 8th, 2025 19:14

Hi,

Thanks for your question.

Does the cluster have enough free space? What does “isi status” show?

Let us know if you have any additional questions.

1 Rookie

September 9th, 2025 07:33

@DELL-Josh Cr Hi Josh,
The cluster has enough free space, even with the two nodes missing.
The cluster registers the node as down / not connected, and the node itself states that it is not connected to the cluster.

Moderator

September 9th, 2025 13:31

Is it under warranty? If so, it would be best to call support so they can troubleshoot the cluster.

1 Rookie

September 9th, 2025 14:01

@DELL-Josh Cr Not anymore. I was hoping someone here would recognise the issue and give some hints. The thing ran smoothly for eight years and is about to be migrated to a different storage system. It's as if the Isilon senses it, and both old clusters have started acting up... :)

1 Rookie

September 10th, 2025 07:42

But isn't the affected node already SmartFailed?

At least that's what it looks like in the isi status screenshot.

This means that the data has probably already been copied so that the configured protection can be achieved across the entire cluster and the node can be removed from the cluster.

To make the node available in the cluster again, you now have to restore it to its “factory state,” i.e., perform an isi_reformat_node on the node and rejoin it to the cluster as if it were a completely new node.

If the node were simply offline (e.g., because the backend cables were removed or it has another problem), it would have the status Down or ReadOnly.

1 Rookie

September 17th, 2025 11:52

@Phil2018 I had a similar issue with an F900 recently and had to do the isi_reformat_node; it worked. In my case the node was stuck in the smartfail state. Reseat, reboot, power drain: none of that worked. I even tried isi_reimage, but it wouldn't budge. This happened after a power outage, not after a drive replacement.

Procedure:

ssh qfs9

Note: Ensure you are on the node in question.

isi_format_node --nolkg

Are you sure? yes

This will wipe data? yes

Note: Finished and rebooted in about 20 minutes

isi status -q

Note: The node now shows up, with no IP and not connected.

isi devices node list

Note: The output listed the node's serial number, CF2ZSZ44000345.

isi devices node add CF2ZSZ44000345

Note: This took about another 30 minutes
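For anyone repeating only the rejoin part of this, here is a minimal shell sketch that wraps the last three commands of the procedure above. It is an illustration under stated assumptions, not an official OneFS tool: `rejoin_node` is a hypothetical helper name, it assumes you run it from a node that is still a cluster member after the reformatted node has rebooted and shows up as unconfigured, and it uses only the isi commands already shown in this thread. It deliberately leaves out the isi_format_node step, since that wipes the node and is better done by hand.

```shell
#!/bin/sh
# rejoin_node is a hypothetical helper, not part of OneFS.
# Assumption: run from a node that is still a cluster member, after the
# reformatted node has rebooted and shows up as unconfigured.
# It deliberately does NOT run isi_format_node, which wipes the node.
rejoin_node() {
    serial="$1"
    if [ -z "$serial" ]; then
        echo "usage: rejoin_node <serial-number>" >&2
        return 1
    fi
    # Show unconfigured nodes so the serial can be checked before joining.
    isi devices node list || return 1
    # Join the node by serial number (took about 30 minutes in this thread).
    isi devices node add "$serial" || return 1
    # Quick status overview to confirm the node is back in the cluster.
    isi status -q
}
```

With the serial number from the example above, the call would be `rejoin_node CF2ZSZ44000345`. Each step stops on a non-zero exit code, so a failed list or add does not fall through to the status check.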

1 Rookie

September 17th, 2025 14:30

@greemi Thank you very much for your instructions. Did you perform this with the back-end network (InfiniBand in our case) still connected?

Cheers, Kevin.
