Unsolved

December 9th, 2011 16:00

Sudden slowdown in site to site replication

We do site-to-site replication between two EQL.  Out of the blue, replication is now running at about 25% of the speed it has been running at for the past year.  Circuit speeds have been verified at about 30-35 Mb, but we are only seeing throughput of around 7 Mb on replication.  Everything else is fine as far as EQL performance.  Thinking that it might be a firmware issue, we upgraded from 5.0.8 to 5.1.2, but it didn't make a difference.  I have contacted Dell EQL support, but they have not been able to offer any help.  Anyone else seeing issues like this?  Anyone have any suggestions as to the potential cause for this to happen out of the blue?  The DR site has no purpose other than to hold a replica of our primary site, so there is no additional load on the EQL.

 

Thanks for any help!

203 Posts

December 9th, 2011 19:00

Time to start looking at the connection points in between.  First, what kind of circuit are you using?  Is it Metro-E based?  If so, these fail miserably at autonegotiating against any sort of switchgear or router, so the outermost interfaces should be manually set to 100 Mb full duplex.

Next, lay out exactly what equipment you have between the two EQL arrays.  Fill us in on what those pieces would be.  Obviously, it would be best if you could test by replicating to another group internally, then work your way out, but you may not have that luxury.

On each SAN network (source, and target) set up a temporary workstation and do some file copies to see if you are experiencing normal behavior.  Packet loss, latency, or fragmentation can all lead to drops in performance by that much.  Also, see if you can get your service provider to provide some stats.  They might have some insight.
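To put a rough number on how much loss and latency can hurt, here is a minimal sketch using the well-known Mathis approximation for steady-state TCP throughput, rate <= (MSS / RTT) * (C / sqrt(p)) with C of about sqrt(3/2). The MSS, round-trip time, and loss rate in the example call are made-up illustration values, not measurements from this thread, and real stacks will deviate from the model.

```python
import math


def mathis_throughput_mbits(mss_bytes: float, rtt_s: float, loss_rate: float,
                            c: float = math.sqrt(1.5)) -> float:
    """Approximate upper bound on single-stream TCP throughput in Mb/s.

    Uses the Mathis approximation: rate <= (MSS / RTT) * (C / sqrt(p)).
    """
    if loss_rate <= 0 or rtt_s <= 0:
        raise ValueError("loss_rate and rtt_s must be positive")
    bytes_per_s = (mss_bytes / rtt_s) * (c / math.sqrt(loss_rate))
    return bytes_per_s * 8 / 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 1460-byte MSS, 30 ms RTT, 0.1% packet loss.
    estimate = mathis_throughput_mbits(1460, 0.030, 0.001)
    print(f"Estimated ceiling: {estimate:.1f} Mb/s")
```

The useful takeaway is the shape of the curve: throughput falls with the square root of the loss rate and linearly with RTT, so a small amount of loss on a long path can cap a single stream well below the circuit speed.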

For your testing, stick to doing just one replica at a time.  That will make things easier to diagnose, and you won't get all of those crazy alarms.  Firmware 5.1.2 took a step backward on a few different things, and replication warnings/issues were one of them.  Harmless stuff, but annoying, and it makes event monitoring meaningless at the moment.

December 10th, 2011 05:00

Thanks for replying... Both circuits traverse the public network and are 50 Mb Ethernet.  We have a tunnel set up for secure communication between the two sites.  We consistently see about 3-4 MB/s of data transfer when doing workstation-to-workstation file transfers.  Up until the other day, the EQL was also transmitting at this speed for replications.

Each EQL connects to a Dell 5424 iSCSI gigabit switch.  For replication, the traffic then hits a Dell 6248 stack.  From there it goes to a Cisco ASA where the tunnel is established and transmitted to the remote network.  The setup at the remote network has the same hardware.  Unfortunately, we do not have the hardware to do an internal test.

I have tested file copies from workstation to workstation using the source and target SAN networks, and I see consistent transfer rates of around 3-4 MB/s (24-32 Mb/s).  The network between the two sites is solid, with no packet loss and very minimal latency for traversing the public network.
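Since the thread mixes megabytes per second (MB/s) and megabits per second (Mb), a small sketch of the conversion may help keep the comparison straight. It only uses figures quoted in the posts (3-4 MB/s for file copies, about 7 Mb for replication); the function names are illustrative.

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_s * 8.0


def fraction_of_baseline(observed_mbits: float, baseline_mbits: float) -> float:
    """Return observed throughput as a fraction of a baseline throughput."""
    if baseline_mbits <= 0:
        raise ValueError("baseline_mbits must be positive")
    return observed_mbits / baseline_mbits


if __name__ == "__main__":
    low = mbytes_to_mbits(3)   # lower end of the workstation file copies
    high = mbytes_to_mbits(4)  # upper end of the workstation file copies
    replication = 7.0          # replication throughput quoted in the first post
    print(f"File copies: {low:.0f}-{high:.0f} Mb/s")
    print(f"Replication vs. file copies: "
          f"{fraction_of_baseline(replication, high):.0%}-"
          f"{fraction_of_baseline(replication, low):.0%}")
```

Run against those figures, replication lands at roughly a quarter of the file-copy rate, which lines up with the "about 25%" estimate in the opening post.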

We only ever perform one replica at a time.

Thanks for your help.

203 Posts

December 10th, 2011 11:00

Since it was running fine up until a few days ago, I wonder if it started at the point when your firmware upgrade switched the array over to the other controller during a restart.  If you can afford to do so, it might be worthwhile to restart the array so it switches over to the other controller, then try a replica again.

I'd also be a bit curious about the tunnel it is running through.

December 10th, 2011 14:00

It actually began before the firmware upgrade.  It was at that time that we decided to go from 5.0.8 to 5.1.2.  It didn't make a difference.  Last night, I also went ahead and power-cycled all equipment between the two EQL, including each EQL, and it did not make a difference.  The tunnel is just a simple IPsec tunnel between two Cisco ASAs.

203 Posts

December 10th, 2011 16:00

Well then, I'd be curious how in the heck it would react if you stood up that array right before the tunnel.

Did EqualLogic help you do any network monitoring analysis?  

What does the netblock topology look like?  The uplink to the 6248 stack, is that all on a flat network?

Is the replication traffic attempting to send jumbo frames?
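The reason jumbo frames matter here: if a packet is larger than what the tunnel path can carry, it is either fragmented or (when the don't-fragment bit is set) dropped. A rough sketch of the IPv4 fragmentation arithmetic is below. It assumes a 20-byte IPv4 header with no options, and the 1440-byte path MTU used to stand in for tunnel overhead is a hypothetical figure, since actual IPsec overhead depends on the cipher and mode configured on the ASAs.

```python
import math

IPV4_HEADER_BYTES = 20  # assumes no IP options


def ipv4_fragment_count(packet_bytes: int, path_mtu: int) -> int:
    """Number of IPv4 fragments needed to carry one packet over a path MTU.

    packet_bytes is the full IP packet length, header included. Every
    fragment except the last must carry a multiple of 8 bytes of data.
    """
    if path_mtu <= IPV4_HEADER_BYTES + 8:
        raise ValueError("path_mtu too small to carry any fragment data")
    if packet_bytes <= path_mtu:
        return 1
    payload = packet_bytes - IPV4_HEADER_BYTES
    per_fragment = ((path_mtu - IPV4_HEADER_BYTES) // 8) * 8
    return math.ceil(payload / per_fragment)


if __name__ == "__main__":
    # 9000-byte jumbo packet hitting a standard 1500-byte MTU link.
    print("jumbo over 1500:", ipv4_fragment_count(9000, 1500))
    # Standard 1500-byte packet over a hypothetical 1440-byte tunnel MTU.
    print("1500 over 1440:", ipv4_fragment_count(1500, 1440))
```

Even a standard-size packet can split in two once tunnel overhead shrinks the usable MTU, which is why checking the MTU settings end to end is worth doing before blaming the arrays.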
