RecoverPoint EX on a combination of Gen6 and Gen5 RPAs

Question

Hi Admins,

In our environment, we run RecoverPoint EX 4.4.1 SP1 P1 on a combination of 2 Gen6 and 2 Gen5 RPAs on each site. The backend is a VNX 5600 with dedicated mixed pools for production LUNs and SAS pools for journals. We are running 13 CGs with a total protected size of 115 TB. Each CG is mapped to a single RPA, and there is no distributed group writing through multiple RPAs. We have a 1 Gig WAN link which is 60% utilized most of the time. Finally, the failover / failback operations are orchestrated through VMware Site Recovery Manager 5.5. During our daily health checks, we do not see a lag of more than 50 MB on any CG.

During our recent DR drill of all CGs, I noticed there was a point when an RPA went into high load and returned to normal instantly.

Our current RPO and RTO values are fairly high, but since everything went smoothly in the last DR drill, we are looking at cutting both values in half. I do not know whether SRM plays any role in reducing RPO / RTO, but please advise whether our current setup needs any improvement.

Question: Rather than having 13 CGs, can we combine multiple CGs into one?

Question: Would changing the group policy of each CG to "critical" make any difference?

Question: Can we fail over all CGs at once?

Would appreciate your time.

Regards

virtualphoton · Answer

During the failover / failback, I did not see any lag on the WAN link or the journal copies.

forshr · Answer

Did you notice the lag on the WAN link or the journal copies? Regards, Rich

forshr · Answer

So did the highload occur when you performed the actual failover?

virtualphoton · Answer

Yes - During the actual fail over.

forshr · Answer

One last question. Did the SRM PG contain all of the RP CGs?

virtualphoton · Answer

No. Each RP CG contains a set of critical and non-critical applications. Similarly, we have split the SRM protection into multiple PGs. I can say it is a 1:1 mapping from RP CGs to SRM PGs.

forshr · Answer

OK, so going back to your original e-mail, SRM will generate its own bookmarked snapshot as part of the recovery mode procedure, and this snapshot will have to be replicated and then distributed before it can be used by SRM. So in respect of the RPO, the rate of write throughput versus the size of the WAN link is important to minimize RPO/data lag. Setting the criticality can play a part in providing preferential replication capabilities for those selected CGs. As for RTO, this is dependent on distribution to the journal/replica and the performance associated with it. A slow journal will show up as lag in the copy journal.
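As a rough illustration of the point about write throughput versus WAN size, the sketch below estimates how long a given lag would take to drain over the link. It uses the figures from the original post (1 Gbit/s link, 60% utilized, 50 MB lag) and assumes that the unused 40% is fully available to replication, ignoring compression, deduplication, and protocol overhead. The function name is illustrative only and is not a RecoverPoint API.

```python
def lag_drain_seconds(lag_mb: float, link_gbps: float, utilization: float) -> float:
    """Estimate the seconds needed to send `lag_mb` megabytes over spare WAN capacity.

    Assumption: all capacity not already in use is available to replication.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    # Spare capacity in megabits per second (1 Gbit/s = 1000 Mbit/s).
    spare_mbit_per_s = link_gbps * 1000.0 * (1.0 - utilization)
    # Convert the lag from megabytes to megabits.
    lag_mbit = lag_mb * 8.0
    return lag_mbit / spare_mbit_per_s


# Figures taken from the original post: 50 MB lag, 1 Gbit/s link, 60% utilized.
drain = lag_drain_seconds(50, 1.0, 0.60)
print(f"Estimated drain time: {drain:.2f} s")
```

With these inputs the estimate comes out at roughly one second per CG at the stated lag. The real figure depends on the actual write rate during the drill and on how the link is shared between CGs.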

So if you want to minimize both RPO and RTO, then sizing is important. Moreover, in relation to the number of CGs and initiating recovery actions using SRM, a balance of CGs across the RPAs is important, as is the PG to CG relationship. Initiating a PG with multiple CGs may not necessarily impact the RPA but could impact the journal, as the RPA will try to roll the snapshot to the replica for multiple CGs. Where an RPA has to perform this action for multiple CGs, this can cause a high load.
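One way to think about balancing CGs across RPAs is a simple greedy assignment: sort the CGs by write rate and place each one on the RPA with the least load so far. The sketch below is only an illustration. The CG names and write rates are hypothetical, and it assumes all RPAs have equal capacity, which may not hold for a Gen5/Gen6 mix.

```python
import heapq
from typing import Dict, List


def balance_cgs(cg_write_mbps: Dict[str, float], rpas: List[str]) -> Dict[str, List[str]]:
    """Greedy assignment: place the busiest CG on the currently least-loaded RPA."""
    # Min-heap of (accumulated load, RPA name); every RPA starts with zero load.
    heap = [(0.0, rpa) for rpa in rpas]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {rpa: [] for rpa in rpas}
    # Handle the busiest CGs first so the large ones are spread out early.
    for cg, rate in sorted(cg_write_mbps.items(), key=lambda kv: kv[1], reverse=True):
        load, rpa = heapq.heappop(heap)
        assignment[rpa].append(cg)
        heapq.heappush(heap, (load + rate, rpa))
    return assignment


# Hypothetical per-CG write rates in MB/s (not taken from the thread).
example = {"CG01": 40.0, "CG02": 35.0, "CG03": 20.0,
           "CG04": 15.0, "CG05": 10.0, "CG06": 5.0}
plan = balance_cgs(example, ["RPA1", "RPA2", "RPA3", "RPA4"])
for rpa, cgs in plan.items():
    print(rpa, cgs, sum(example[c] for c in cgs))
```

The same idea applies to the PG to CG relationship: if the CGs in one SRM PG all sit on the same RPA, that RPA has to roll all of their snapshots at once during recovery.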

Regards,

Rich

virtualphoton · Answer

So would it be a good idea to run a total protected size of 115 TB (irrespective of whatever the CG size may be) on a combination of Gen6 and Gen5 RPAs?

forshr · Answer

Possibly, but using a DCG utilising the maximum of four RPAs. Obviously this affects granularity, and you will need to check that the total write throughput does not exceed what the DCG can handle.

RecoverPoint
