14 Posts
December 14th, 2010 13:00
Celerra replicator performance
Good Afternoon,
I have been troubleshooting performance issues with EMC support around my Celerra Replicator V2.
I have a 2TB filesystem that houses nightly database backups, and I set up a replication job to the Celerra at my remote site.
However, the initial copy cannot finish even within 36 hours. It fails because the checkpoint volume cannot expand any further, due to the high rate of change from the nightly backups.
I am currently getting 3000-5000KB/sec throughput on the replication pipe.
I have an NLAN 100 meg connection between the source and target data centers (no WAN routers and no WAN acceleration - a very simple setup). The data movers are plugged into 3750g switches at each site (the exact same code is installed on all of them). The switch ports the data movers are plugged into are set to 1000FULL, as are the data mover cge ports. There are zero errors on any of the ports at the switch level, and none of the ports on the switches are being pushed hard.
I also have RecoverPoint replicating over this same line, but it is using only about 30% of the 100 meg pipe, so there is plenty of room for the Celerra to claim throughput. It simply is not pushing the data out of the cge ports fast enough. (RecoverPoint is able to take up the entire pipe when needed if it ever has to do a full rescan - which is very rare for me.)
I have tried failing over the data movers and even changing the ports the data movers are plugged into but I get no performance increase. I have changed the fastRTO setting to '1' and I still have not seen any performance boost.
Is 3-5MB/sec normal for replication throughput? It seems VERY low. The other cge ports, which serve CIFS shares, are getting 40-50MB/sec, so to me it seems that the replication piece of the Celerra is not getting the data out of the cge ports fast enough.
For reasons beyond me, the support tech I am troubleshooting with continues to think that the network is the underlying problem, even though I have shown time and time again that it isn't.
Is there anything else that can be done to try to increase performance for the replication?
Thanks
DanJost
190 Posts
December 16th, 2010 11:00
I'm able to get 100Mbit over my WAN connection with Replicator V2 - are you graphing the usage or just spot-checking the bandwidth? You said you are getting 3000-5000 KB/sec (which is ~23 - ~39Mbit/sec) and your WAN connection is 100Mbit. You said you have RecoverPoint running - anything else? Also, is your WAN connection private, or does it have guaranteed bandwidth?
At a full 100Mbit it would take 40+ hours to replicate a full 2TB, if my calculations are correct. How much data is already on the filesystem? How much data are you dumping into the filesystem daily?
As for this "fails because the checkpoint volume cannot expand anymore" - you can solve this by right-sizing the savvol for the PFS. If you are going to dump hundreds of gigs onto this filesystem, you might need hundreds of gigs of savvol.
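For anyone who wants to check the arithmetic, here is a rough sketch of the transfer-time estimate. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbit = 10^6 bits) and ignores protocol overhead and any competing traffic; the function names are just illustrative.

```python
# Rough initial-copy time estimate. Assumptions: decimal units,
# no protocol overhead, no other traffic on the link.

def kb_per_s_to_mbit(kb_per_s: float) -> float:
    """Convert kilobytes/second to megabits/second (decimal units)."""
    return kb_per_s * 1000 * 8 / 1e6


def copy_hours(data_tb: float, rate_mbit_per_s: float) -> float:
    """Hours needed to move data_tb terabytes at rate_mbit_per_s megabits/second."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (rate_mbit_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    rates = [
        ("full 100 Mbit link", 100.0),
        ("observed 5000 KB/sec", kb_per_s_to_mbit(5000)),
        ("observed 3000 KB/sec", kb_per_s_to_mbit(3000)),
    ]
    for label, rate in rates:
        print(f"{label} ({rate:.0f} Mbit/sec): 2 TB takes {copy_hours(2.0, rate):.1f} hours")
```

With these assumptions a full 2TB comes out at roughly 44 hours on a saturated 100Mbit link, and well over 100 hours at the observed 3000-5000 KB/sec.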
mmarzotto
14 Posts
December 16th, 2010 18:00
RecoverPoint is the only other thing running on this WAN connection. It is a full 100Mbit guaranteed link between sites.
The 2TB filesystem holds about 1.8TB of data. Around 400-600GB changes each day when the database backups are dumped to it (any backups older than n days are deleted and the new backups are placed on the filesystem).
How can I size the savvol through the command line? What is the syntax needed to perform this?
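To put those numbers side by side, here is a back-of-the-envelope sketch comparing the daily change against what the link can move in a day. The model is a deliberate simplification and an assumption on my part, not how Replicator actually schedules transfers: it treats each day's change as extra backlog on top of the initial copy, uses decimal units, and ignores overhead. The function names are hypothetical.

```python
# Back-of-the-envelope model (an assumption, not Replicator's real behaviour):
# each day's change is added to the backlog that still has to be transferred.

SECONDS_PER_DAY = 24 * 3600


def gb_per_day(rate_mb_per_s: float) -> float:
    """Gigabytes a link can move in 24 hours at rate_mb_per_s (decimal units)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1000


def initial_copy_days(data_gb: float, daily_change_gb: float, rate_mb_per_s: float):
    """Days until the target catches up, or None if the backlog never shrinks."""
    net_gb_per_day = gb_per_day(rate_mb_per_s) - daily_change_gb
    if net_gb_per_day <= 0:
        return None
    return data_gb / net_gb_per_day


if __name__ == "__main__":
    # 12.5 MB/sec is the full 100Mbit link; 3 and 5 MB/sec are the observed rates.
    for rate in (3.0, 5.0, 12.5):
        for change in (400.0, 600.0):
            days = initial_copy_days(1800.0, change, rate)
            result = "never catches up" if days is None else f"{days:.1f} days"
            print(f"{rate} MB/sec, {change:.0f} GB/day change: {result}")
```

Under this model, 3-5MB/sec moves only about 259-432GB per day, which is at or below the daily change, so the savvol keeps growing regardless of how it is sized. The full 100Mbit link would move about 1080GB per day.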
Thanks