RP4VM 4.3 vRPAs crashed

Question

Hello,

I just set up a 4.3 environment yesterday (2 vRPAs per cluster, with 2 vCPU and 4GB RAM each). I created a CG with a single protected VM (Windows 2012 R2). The VM is a fresh install with a 40GB disk. I waited until the CG had finished its init and performed a test copy. I then wanted to see what happened if I filled the journal up. Here's what I did:

- Copied 1GB into the replica VM (logged access storage showed 50% used).
- Copied data to the production VM until the usage showed 87% in the journal.
- Tried copying additional data to the replica VM (my intent was to completely fill the logged access storage).

While the copy was occurring, the VM froze and I noticed that an action to power off the VM had been performed in vCenter. This took a while, but it eventually worked. The logged access showed 0GB stored after this event.

Unfortunately, both of my vRPAs in Cluster-2 also froze. I tried getting to them via the terminal, but I could never reach a login prompt. I also had an error in the GUI for lost communication (sorry, I don't have the exact error). I thought I would then reboot each vRPA in Cluster-2. Since the VMs were frozen (VMware Tools not running), I had to use reset, but this failed. I then tried killing the VM process using esxcli and esxtop, but was unsuccessful:

esxcli vm process kill --type=[soft,hard,force] --world-id=WorldNumber

I am now rebooting both of my hosts that had the frozen vRPAs. I have a few questions:

1. The journal I created was 10GB in size. If I understand correctly, up to 20% of the size of the journal can be used to store writes made to the replica VM (logged access space). Is this space reserved, meaning that I only get 80% of the journal capacity for 'journal writes', or can I actually use all of the journal space?

2. What is supposed to happen if the journal fills? What if the logged access storage fills?

Idan · Answer

Hi there,

First, we would need to investigate why the vRPAs crashed. Can you please collect system information via boxmgmt and send me a link to it?

1. 20% of the journal is reserved for writes to the replica VM(s) during logged image access; what's left is reserved for PiTs and internal system operations. The capacity is reserved, so it cannot be used for other purposes.

2. The journal is circular. When logged image access stays enabled for an extended period of time, it becomes impossible to delete old PiTs and to create new ones, which prevents the image being accessed from being deleted. When this happens, replication pauses until image access is disabled. When the image access log fills, the system automatically shuts down the replica VM (since data can no longer be written) and unregisters it, while registering and bringing up the shadow VM.

The user will be informed in the plugin that the image access log is full, so they can choose whether to undo writes, disable image access, or add journal capacity.
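As a rough check on point 1, the arithmetic for the 10GB journal from the question can be sketched in Python. This is only an illustration: the function names are hypothetical, and it assumes that replica writes consume image access space 1:1 with no metadata overhead, which the thread does not confirm.

```python
# Illustrative arithmetic only; function names are hypothetical and
# not part of any RecoverPoint tool or API.

def journal_split(journal_gb, image_access_fraction=0.20):
    """Split a journal into the image access log and the remainder.

    The 20% default comes from the answer above. The remainder holds
    PiTs and internal system operations.
    """
    image_access_gb = journal_gb * image_access_fraction
    remaining_gb = journal_gb - image_access_gb
    return image_access_gb, remaining_gb


def image_access_usage_percent(written_gb, journal_gb,
                               image_access_fraction=0.20):
    """Percent of the image access log used by replica writes.

    Assumption: each GB written to the replica VM consumes exactly
    one GB of the image access log.
    """
    image_access_gb, _ = journal_split(journal_gb, image_access_fraction)
    return 100.0 * written_gb / image_access_gb


log_gb, rest_gb = journal_split(10)
print(log_gb, rest_gb)                     # 2.0 8.0
print(image_access_usage_percent(1, 10))   # 50.0
```

Under these assumptions, a 10GB journal leaves 2GB for logged access writes, which matches the 50% usage reported after copying 1GB into the replica VM.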

Hope that helps,

Idan Kentor

Corporate Systems Engineer - RecoverPoint and VPLEX

idan.kentor@emc.com

@IdanKentor

CBuhl · Answer

I have sent you a link to the logs. The vRPA in Cluster-2 died overnight, so there won't be any logs for that unit.

CBuhl · Answer

Instead of trying a reset (which has resulted in the VM freezing forever and being unable to shut down), I tried powering it off.

The vRPA shows inaccessible and here's what the files in the datastore look like. The VMX file shows as File instead of Virtual Machine.

I'm tempted to redeploy everything, but I'd like to understand what is happening. I should also mention that this is to test RP4VM; we have some customers who have shown interest, but my environment isn't what I'd like for testing. I have the two RPA clusters operating in the same VMware cluster, hence a single vCenter. DRS is set up and the VMs move around freely, so occasionally two of the vRPAs will end up on the same host (which I'd never want to happen in a prod scenario). Could that cause any of the observed issues?

pward99 · Answer

Hi, have you had any progress on this issue?

I currently have a support call open with EMC for the same issue on 4.2. I have two VMware clusters, one in production and the other in DR; unfortunately, the DR cluster vRPAs keep crashing.

So far I haven't been able to get to the bottom of this issue with EMC support.

Any help would be appreciated.

CBuhl · Answer

Unfortunately, I haven't found anything more. I replaced both of the vRPAs in the 'problematic cluster' and they haven't had issues since then. I must admit that there hasn't been much load on them, so it could just be luck too.

pward99 · Answer

I might have a resolution for my issue: drivers inside ESXi caused the problem for me, specifically the Cisco Enic and Fnic drivers.

Idan · Answer

It is possible that this is the case here as well, but there is no evidence of that in the logs. I would recommend going through support or this forum if it occurs again. Regards, Idan
