Unsolved
This post is more than 5 years old
2 Intern
•
132 Posts
0
2619
March 28th, 2018 07:00
Avamar replication jobs not terminating "gracefully" at end of schedule window?
Working with a customer and we are seeing some Avamar replication jobs that are not terminating "gracefully" at end of their schedule window, which results in a "Failed" status on the destination Avamar, but only a "Timed Out - End" status on the source Avamar.
If all jobs were performing like this, it might be easier to troubleshoot, but only a few are exhibiting this behavior - others are indeed cancelling "gracefully" and showing "Partial Replication" status on both the source and destination Avamars. And FWIW, all of these replications are related to a single Data Movement (Replication) policy.
Also, not sure if this is related, but all jobs on the replication destination are showing the "Overtime enabled" indicator (including the ones that are completing), while they do not show a similar overtime indicator for the job on the replication source.
Log info for one of the "Timed out / Failed " jobs from the Avamar replication source:
2018-03-28 07:45:16 avtar Info <7061>: Canceled by '1008-Replicate' - exiting...
2018-03-28 07:45:16 avtar Info <9772>: Starting graceful (staged) termination, cancel request (wrap-up stage)
2018-03-28 07:45:16 avtar Info <18789>: Backup Replication Stats: original backup size: 21,475,065,745 (20.00 GB), precompression size: 21,475,065,745 (20.00 GB), network bytes sent: 476,880 (465.7 KB), sent: 0.00%
2018-03-28 07:47:04 avtar Info <8688>: Status 2018-03-28 07:47:04, 12 files, 20.00 GB (12 files, 464.3 KB, 0.00% new) 351MB 0% CPU
2018-03-28 07:47:04 avtar Info <8688>: Status 2018-03-28 07:47:04, 12 files, 20.00 GB (12 files, 464.3 KB, 0.00% new) 351MB 0% CPU - (target server)
2018-03-28 07:47:22 avtar Info <18808>: Replication - canceled from bpmnplbkp02.andersencorp.com:/avamar-1510761673//cur/01ea9c1968b2883bcca0ce67b94e8efa32c47cd6/1D3C570504B4D7A/6483D3F5768C5FB01232FBE9815FB256E37042C3 to cgmnplbkp02.andersencorp.com:/avamar-1510933784//STAGING/01ea9c1968b2883bcca0ce67b94e8efa32c47cd6/BACKUP-AC102724666E46C71B9303C5FA0B8C46575F45D2-1D3C570504B4D7A/6483D3F5768C5FB01232FBE9815FB256E37042C3, offset: 590, size: 590, id: 12
2018-03-28 07:49:46 avtar Info <0000>: Entering the 'final' phase of termination, timer trigger
2018-03-28 07:52:04 avtar Info <8688>: Status 2018-03-28 07:52:04, 12 files, 20.00 GB (12 files, 464.3 KB, 0.00% new) 351MB 0% CPU
Log info for one of the "Partial Replication" jobs from the Avamar replication source:
2018-03-28 07:45:16 avtar Info <7061>: Canceled by '1008-Replicate' - exiting...
2018-03-28 07:45:16 avtar Info <9772>: Starting graceful (staged) termination, cancel request (wrap-up stage)
2018-03-28 07:45:16 avtar Info <18789>: Backup Replication Stats: original backup size: 18,983 (18.54 KB), precompression size: 18,983 (18.54 KB), network bytes sent: 15,312 (14.95 KB), sent: 80.66%
2018-03-28 07:46:59 avtar Info <8688>: Status 2018-03-28 07:46:59, 9 files, 18.64 KB (9 files, 13.07 KB, 70.10% new) 415MB 0% CPU
2018-03-28 07:46:59 avtar Info <8688>: Status 2018-03-28 07:46:59, 9 files, 18.64 KB (9 files, 13.07 KB, 70.10% new) 415MB 6251% CPU - (target server)
2018-03-28 07:49:46 avtar Info <0000>: Entering the 'final' phase of termination, timer trigger
2018-03-28 07:52:00 avtar Info <8688>: Status 2018-03-28 07:52:00, 9 files, 18.64 KB (9 files, 13.07 KB, 70.10% new) 415MB 0% CPU
2018-03-28 07:54:42 avtar Info <18808>: Replication - canceled from bpmnplbkp02.andersencorp.com:/avamar-1510761673/ to cgmnplbkp02.andersencorp.com:/avamar-1510933784/, offset: 2285895680, size: 171798691840, id: 1
2018-03-28 07:54:42 avtar Info <40176>: - Multi-stream restore summary (current 0, max active 0, max cloned 0, total cloned 0, ddr handle 2):
2018-03-28 07:54:42 avtar FATAL <0000>: Replication failed due to cancellation
2018-03-28 07:54:42 avtar FATAL <40009>: DDR encountered errors.
2018-03-28 07:54:42 avtar Info <40176>: - Multi-stream restore summary (current 0, max active 0, max cloned 0, total cloned 0, ddr handle 2):
2018/03/28-12:54:42.33193 [avtar] INTERNAL ERROR: <0001> assert error (result == MSG_ERR_NONE), /local/jenkins/workspace/client_Laguna_SLES11-64/abs2/work/src/avtar/ade/adelib/interfaceade.cpp line 396
2018-03-28 07:54:42 avtar Info <5675>: Replicated 0 of 44 backups (0 errors) for client "BPAIRWAVE1_UBj9fWxNWfqIGUJAbT2RBg"
2018-03-28 07:54:42 avtar Error <6655>: Replicate interrupted.
2018-03-28 07:54:42 avtar Info <7925>: Restored 13.07 KB from selection(s) with 18.64 KB in 9 files
2018-03-28 07:54:42 avtar Info <6090>: Restored 13.07 KB in 82.73 minutes: 9.478 KB/hour (7 files/hour)
2018-03-28 07:54:42 avtar Info <7883>: Finished at 2018-03-28 07:54:42 CDT, Elapsed time: 0001h:22m:43s
2018-03-28 07:54:42 avtar Info <40176>: - Multi-stream restore summary (current 0, max active 0, max cloned 0, total cloned 0, ddr handle 1):
2018-03-28 07:54:42 avtar Info <6149>: Error summary: 4 errors: 6655, 40009, 0
2018-03-28 07:54:42 avtar Info <8468>: Sending wrapup message to parent
2018-03-28 07:54:42 avtar Info <5314>: Command failed (4 errors, 1 warning, exit code 10013: externally cancelled)
All comments/feedback appreciated - thanks.
rgaa1
1 Message
0
March 29th, 2018 08:00
Having the same issue...
2018-03-29 08:00:24 avtar FATAL <0000>: Replication failed due to cancellation
2018-03-29 08:00:24 avtar FATAL <40009>: DDR encountered errors.
2018-03-29 08:00:24 avtar Info <0000>: Entering the 'final' phase of termination, DDR_ERROR need to exit)
2018-03-29 08:00:24 avtar Info <40176>: - Multi-stream restore summary (current 0, max active 0, max cloned 0, total cloned 0, ddr handle 28):
2018/03/29-15:00:24.40868 [avtar] INTERNAL ERROR: <0001> assert error (result == MSG_ERR_NONE), /local/jenkins/workspace/client_Laguna_SLES11-64/abs2/work/src/avtar/ade/adelib/interfaceade.cpp line 396
2018-03-29 08:00:24 avtar Info <5675>: Replicated 26 of 33 backups (0 errors) for client "nt115.lcb.state.nv.us"
2018-03-29 08:00:24 avtar Error <6655>: Replicate interrupted.