Centera Connection String: use of Primary, Secondary and Replica clusters

Question

I have a two-cluster Centera configuration, e.g. Centera cluster A at site 1 and Centera cluster B at site 2. Each Centera cluster has 4 nodes allocated the 'access' role, 2 different nodes allocated the 'replication' role and 2 further nodes allocated the 'management' role, i.e. network segmentation is in use. We will establish replication of a virtual app pool between the two Centera clusters, replicating from cluster A to cluster B under normal conditions.

The production application server resides at site 1 and will use Centera cluster A as the primary cluster. The connection string used specifies 2 of Centera cluster A's access nodes. As these addresses are not prefixed they assume the role of primary cluster. My assumption has been that by connecting to one of the two access nodes on Centera cluster A, it will return the IP addresses of the other access nodes and also the nodes of the replica cluster B.

I am comfortable that the FPPool_Open() function will establish a pool connection with Cluster A and should there be a network or operational failure it will automatically allow a read failover but not a write or delete failover.

However, what I do not understand is what happens if the FPPool_Open() function cannot connect to cluster A in the first place and is therefore never advised of the IP addresses of replica cluster B's access nodes. Is this the purpose of specifying the secondary prefix for cluster B's access nodes in the connection string? And under normal conditions, would this by default support only read failover and not write or delete operations?

Many thanks

Steve Tegg

EMCDennis · Answer

You should only have the IP addresses of the primary cluster in the connection string used by the application, unless replication is set up bi-directionally.

The SDK will use the first IP address found within the connection string. If that node is not available, it will use the second IP address listed, and so on.
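To illustrate the ordering described above, here is a minimal Python sketch. It is not Centera SDK code: the function name pick_first_available, the is_reachable callback and the IP addresses are all hypothetical, and the real SDK performs this selection internally.

```python
from typing import Callable, Optional, Sequence


def pick_first_available(addresses: Sequence[str],
                         is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first address, in listed order, that is reachable.

    Later addresses are only considered when every earlier one is down.
    """
    for address in addresses:
        if is_reachable(address):
            return address
    return None  # no listed node could be reached


# Hypothetical addresses for two access nodes on cluster A.
cluster_a_nodes = ["10.1.1.1", "10.1.1.2"]
down = {"10.1.1.1"}  # pretend the first node is unavailable

chosen = pick_first_available(cluster_a_nodes, lambda a: a not in down)
print(chosen)
```

With the first node marked as down, the second address in the list is chosen. If both were reachable, the first would always win.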

You should not list the replica cluster's IPs, as this can cause the clusters to become out of sync unless replication is bi-directional, such that any new ingest on the replica is replicated back to the source cluster.

If bi-directional replication is not set up, then failing over to the replica is not advised.

Dennis

Phukon · Answer

+1 to Dennis. Nodes with replication and access roles must be able to communicate with each other's subnets.

Satish_Kutty · Answer

Hi Steve,

You can also refer to Centera Online Help which is available through Centera Global services on your Host system. The Centera Online help explains the various replication topologies available on the Centera Cluster.

Regards

Satish.N.Kutty

Steve_Tegg · Answer

I don't believe the responses have addressed my question.

If the FPPool_Open() function fails to connect to any of the primary access nodes listed in the connection string, is it possible for the application server to seamlessly connect to the replica cluster? Is this the purpose of specifying secondary addresses in the connection string?

My current understanding from reading the online help and various whitepapers is that if a connection is made to a primary access node, the IP addresses of all primary access nodes and of the Centera replica's access nodes are returned to the SDK on the host. When this happens, do the replica nodes' IP addresses override the secondary node addresses? Is this correct or have I misunderstood how this operates?

I think this sort of detail would be really useful to include in updates to the relevant white papers, e.g. SDK API Reference Guide, SDK Programmers Guide, Centera Replication Detailed Review etc. It would also be useful to have more detail in the Centera Online Help.

EMCDennis · Answer

When the SDK authenticates via FPPool_Open(), a probe is done and this returns the IP addresses of all the access nodes (provided network segregation of access nodes is not being done). These IPs will be used to establish threads to perform other functions within the SDK.

If replication is enabled, the failover address configured within replication will also be used to authenticate and perform a probe. However, these IPs are not used unless the SDK needs to perform a read failover, which occurs when the SDK attempts to read an object that is not found. They are NOT used should the access nodes on the primary cluster become unavailable. Read failover is only done when attempting to read objects from the primary cluster that are not found for whatever reason.
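The read failover behaviour described above can be modelled with a small Python sketch. This is a toy in-memory model, not the Centera SDK: the Cluster class, read_clip, write_clip and the clip IDs are all hypothetical names chosen for illustration.

```python
class Cluster:
    """Toy in-memory stand-in for a Centera cluster (not the real SDK)."""

    def __init__(self, name, clips=None):
        self.name = name
        self.clips = dict(clips or {})


def read_clip(primary, replica, clip_id):
    """Read from the primary; try the replica only if the clip is not found."""
    if clip_id in primary.clips:
        return primary.name, primary.clips[clip_id]
    # Read failover: triggered by a missing object, not by a node outage.
    if replica is not None and clip_id in replica.clips:
        return replica.name, replica.clips[clip_id]
    raise KeyError(clip_id)


def write_clip(primary, clip_id, data):
    """Writes go to the primary only; they never fail over in this model."""
    primary.clips[clip_id] = data


# Hypothetical content: clip-0 exists only on the replica.
cluster_a = Cluster("A", {"clip-1": b"invoice"})
cluster_b = Cluster("B", {"clip-1": b"invoice", "clip-0": b"legacy"})

print(read_clip(cluster_a, cluster_b, "clip-1")[0])  # served by the primary
print(read_clip(cluster_a, cluster_b, "clip-0")[0])  # served by the replica
write_clip(cluster_a, "clip-2", b"new")
```

The model deliberately has no path that sends a write to the replica, which mirrors the advice against failing over writes unless replication is bi-directional.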

When multiple IP addresses are specified in the connection string used by the application, the second IP address will only be used should the first IP listed not be available.

It is not recommended to have multiple clusters' IP addresses (source and target) in the connection string unless replication is configured bi-directionally.

Dennis

mfh2 · Answer

Hi Steve -

Dennis is absolutely correct in his statements. And it's important that you do not allow a remote cluster to take over the primary role unless you have bi-directional replication set up, to avoid having content that only exists in one Centera and is not as protected as it ought to be.

The one issue I think hasn't been clearly addressed in this thread is "what is the purpose of using the 'secondary=xx.xx.xx.xx' syntax in a Centera connect string?".

This syntax allows the storage admin to add additional clusters to the Centera 'failover collection' which may not have an existing replication relationship with the primary cluster. This is a fairly obscure feature of the API that can be useful in 'special' situations. There could be an older Centera in the environment that is out of capacity for new ingest or replication traffic but contains c-clips that are occasionally referenced by an application. Or you could have a situation where cluster A now replicates to cluster B but for a while it was replicating to cluster C while B's WAN network link was down, and you want to hit cluster C if failover is ever required for that data.

While a secondary cluster is useful for content retrieval, you typically don't want new content to get written to it (even by mistake), and so with 'secondary=' you are assured that the cluster is never eligible to take over the primary role during connection setup (as long as its IPs do not appear in another part of the connection string).
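As a rough illustration of how the prefixes divide a connection string into roles, here is a minimal Python sketch. It is not the SDK's parser: the function classify_addresses and the IP addresses are hypothetical, it assumes a plain comma-separated list, it treats unprefixed addresses as primary (as stated in the original question), and it ignores any other connection-string features.

```python
def classify_addresses(connection_string):
    """Split a comma-separated address list into primary and secondary roles.

    Assumption: only 'primary=' and 'secondary=' prefixes are recognised,
    and an address with no prefix counts as primary.
    """
    primary, secondary = [], []
    for token in connection_string.split(","):
        token = token.strip()
        if not token:
            continue  # skip empty entries such as a trailing comma
        prefix, sep, value = token.partition("=")
        role = prefix.strip().lower()
        if sep and role == "secondary":
            secondary.append(value.strip())
        elif sep and role == "primary":
            primary.append(value.strip())
        else:
            primary.append(token)
    return {"primary": primary, "secondary": secondary}


# Hypothetical addresses: two cluster A access nodes plus one older cluster.
roles = classify_addresses("10.1.1.1,10.1.1.2,secondary=10.9.9.1")
print(roles)
```

Only addresses in the primary list would be candidates for writes in this model; the secondary list is kept separate so it can be used for retrieval alone.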

Having said all that, I don't think I ever saw an actual usage of 'secondary=' out in the field, which is why it is mostly unfamiliar to folks.

Best Regards,

Mike Horgan
