Unsolved

April 29th, 2015 03:00

SRM test snapshot VMs disappear

Hello All

I need advice on an issue. We have a Gen5 4-node RP cluster (EX), firmware v 4.0SP2.P1(m.29), at each site. The LUNs in replication are mostly ESX running on 5.5 with SRM 5.5, from EMC VNX7500 & 7600. When we deployed the system last year, we were able to successfully test failover/failback of the VMs, along with a test SRM snapshot with direct access enabled.

Yesterday we tested 4 VMs on a single datastore in a CG and the test was successful. The VMs were pingable and accessible, and everything was working as it should. We then observed a very strange problem: to enable direct access mode for the VM snapshot, I had to give control back to RecoverPoint from the group policy and then enable direct access, which was also successful. The moment I left the screen, everything disappeared from the recovery page. It looked as if someone had deliberately finished the test from either SRM or RecoverPoint, which was not the case.

We did the same thing a couple of times. Each time the test snapshot VMs were successfully mounted on the hosts, the VMs were pingable and RDP was fine, but the moment I took control to enable direct access (which always completes successfully), the snapshots disappeared and the VMs were no longer available.

Has anyone faced a similar problem? To me it looks like a bug, but I am not sure, as I couldn't find any Primus article on it.

regards

Firoz

2 Intern • 1.1K Posts

April 29th, 2015 07:00

Hi Firoz,

When you run a test recovery from SRM, the image access mode it uses is Logged Image Access and the journal history is maintained. However, when you run an actual failover from SRM it uses Direct Image Access mode and the journal history is lost. This behaviour is by spec and also occurs when you move to Direct Image Access directly from RP.

Regards,

Rich

12 Posts

April 29th, 2015 08:00

Thanks, forshr, for the reply. I understand this concept, but when we do a test SRM snapshot I do have an option to change from logged access to direct access, and this works on the same fundamentals in that it erases the history. With direct access enabled, the test no longer requires journal space, and when finishing the test it also gives an option to undo all the writes.

But my question is still unanswered: the VMs just disappear the moment I enable direct access.

regards

Firoz

2 Intern • 1.1K Posts

April 29th, 2015 11:00

This is by spec also. You need to perform an SRM clean-up after the test recovery.

12 Posts

June 10th, 2015 04:00

Hi

3 days. The VM hosts a SQL DB and I am expecting heavy writes during these tests. The 20% of space allocated from the journal gets 100% full within a day, and the only way I can get around this is by enabling direct access on the SRM snapshot and continuing the test for multiple days without the fear of the log space filling up. At the end of the testing I can clean up. We are not doing a failover; it is the test snapshot I am talking about.

Do you think that enabling direct access should make the test snapshot disappear? I don't think so, because in the past we were able to run multiple tests with direct access enabled, for many days.

12 Posts

June 10th, 2015 04:00

Thanks, forshr.

I understand that I can increase the proportion of the image access log above 20%, but I don't want to do that for two reasons: (1) EMC's guidelines do not recommend it. (2) If I run a test SRM snapshot for a larger volume, which may include multiple applications, some with heavy I/O and some with less, and I want to run the snapshot for 2-3 days, which is usual in our environment, how am I supposed to cope with that?

I am trying to avoid the workaround of manually increasing the image access log. I am not sure whether my understanding of enabling direct access is correct when it is used on an SRM test snapshot, but if enabling direct access is not the right method, what other options do I have in case my customer wants to test for multiple days on a test snapshot?

2 Intern • 1.1K Posts

June 10th, 2015 04:00

Hi Firoz,

One of the side effects of enabling direct image access is the loss of history from the journal including the SRM generated bookmark post disabling image access.

You could revert back to using logged image access and increase the size of your copy journal to proportionately increase the size of the image access log and/or increase the image access log %. Note that increasing the proportion of image access log (> 20%) requires the copy to be disabled.

Regards,

Rich
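The sizing trade-off in this exchange (a 20% image access log that fills within a day versus a test that must run for 2-3 days) can be roughed out with simple arithmetic. The sketch below is only an illustration: it assumes the image access log fills at about the rate of new writes to the test image, which ignores any overhead, and the function names and example figures (500 GB journal, 5 GB/hour) are hypothetical, not taken from RecoverPoint or from this environment.

```python
def hours_until_log_full(journal_gb, log_fraction, write_rate_gb_per_hour):
    """Estimate how many hours a test can run before the image access log fills.

    Assumption: the log consumes space at roughly the write rate of the
    test VMs. Real consumption may differ.
    """
    if not 0 < log_fraction <= 1:
        raise ValueError("log_fraction must be in (0, 1]")
    if write_rate_gb_per_hour <= 0:
        raise ValueError("write_rate_gb_per_hour must be positive")
    log_capacity_gb = journal_gb * log_fraction
    return log_capacity_gb / write_rate_gb_per_hour


def journal_gb_needed(test_hours, log_fraction, write_rate_gb_per_hour):
    """Estimate the journal size needed to sustain a test of the given length."""
    if not 0 < log_fraction <= 1:
        raise ValueError("log_fraction must be in (0, 1]")
    if test_hours <= 0 or write_rate_gb_per_hour <= 0:
        raise ValueError("test_hours and write rate must be positive")
    return test_hours * write_rate_gb_per_hour / log_fraction


if __name__ == "__main__":
    # Hypothetical figures: 500 GB journal, 20% image access log, 5 GB/hour of writes.
    print(f"{hours_until_log_full(500, 0.20, 5):.1f} hours until the log is full")
    # Journal size for a 3-day (72-hour) test at the same write rate and log share.
    print(f"{journal_gb_needed(72, 0.20, 5):.1f} GB of journal needed")
```

Under these assumed numbers, either a larger journal or a larger image access log percentage extends the test window, which is the trade-off against point-in-time history discussed in the replies.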

2 Intern • 1.1K Posts

June 10th, 2015 05:00

The side-effect of initiating direct image access is 1) loss of PiT history from the journal as I've already stated, 2) replication is paused and marking is initiated at the source copy journal.

The alternative to this is to use logged image access and perform the changes I have suggested, which many other customers have done: increase the size of the journal and, if necessary, also increase the size of the image access log area. I am not familiar with the 'do not recommend' guideline, but I suspect it relates to the proportionate reduction in PiT rollback history, which can be counteracted by increasing the journal size.

12 Posts

June 10th, 2015 05:00

Thanks, Forshr.

I wouldn't care about the effects of initiating direct image access, but is that an option? I mean, I can't figure out why I have an option to enable direct access once the SRM test snapshot is successfully initiated; I can see it as a hyperlink in Unisphere. As per the EMC documentation, enabling direct access bypasses the image access log and sends all writes directly to the DR LUN, which allows testing for a longer duration, with an option to undo all the writes when I want to clean up the test.

In the meantime, do you have any document that gives step-by-step information on how I can increase the image access log size?

Thanks, and I appreciate your quick response.

2 Intern • 1.1K Posts

June 10th, 2015 05:00

You would have to take the CG(s) being protected by SRM in the PG(s) out of SRM external management control using the UI, CLI or REST API commands, change the CG(s) from logged image access mode to direct image access mode and then put the CG(s) back under SRM external management.

12 Posts

June 10th, 2015 06:00

GUI

12 Posts

June 10th, 2015 06:00

Exactly. This is what I do when my test SRM snapshot is successful, but this is where the problem starts. After I enable direct access while the snapshot is live on the host, and during the process of giving control back to SRM, the VMs disappear, and when I look at the Manage Recovery tab there is nothing listed.

Is this a problem, or is this how it is supposed to behave? If it is supposed to behave this way, then what is the enable direct access option for?

2 Intern • 1.1K Posts

June 10th, 2015 06:00

How are you enabling direct image access mode? Are you using the UI to move from logged to direct image access mode, or the CLI?

2 Intern • 1.1K Posts

June 10th, 2015 07:00

OK, so from the GUI you enable direct access from the recovery activity window by continuing the test.

If this is the case then the behaviour you are experiencing should not occur. I would recommend that you raise an SR and stipulate all of the related software versions, as this could be an SRA-related issue.

2 Intern • 1.1K Posts

June 10th, 2015 09:00

Have there been any RP or VMware software upgrades/changes in between the last successful test and this recent test?

12 Posts

June 10th, 2015 09:00

Finally, I am hearing what I wanted to. Yes, I am facing this problem. I have checked compatibility across everything from the RPA firmware to the SAN switch, storage firmware, SRA, VMware and HBA drivers, and all seems compatible.

I see the VMs disappearing the moment I enable direct access from the recovery activity window by continuing the test.

I did this test twice last year with no issues, but recently we have faced this trouble a couple of times, so it is very strange.

I simulated this test yesterday to capture the RP and SRM logs and have raised an SR.

I will share the findings.
