October 16th, 2019 09:00
Backups silently failing
Hi folks,
Sharing this in the hope that public support will push Networker development to create a fix.
There appears to be a bug, inherent in the design of nsrworkflows in Networker 9.x and 18.x+, where Networker jobs fail silently in the following situations:
1) An agent-based backup where the client is no longer network resolvable.
2) A file-system-level backup configured with the "All" saveset attribute, where the client is not reachable over the network for whatever reason (e.g. the client is down or a firewall is enabled).
In both cases, the native Networker saveset report will not detect these failures. Only scanning the detailed backup logs shows that these jobs have failed.
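Both conditions can also be checked outside of Networker, for example before the backup window opens. The following is a minimal Python sketch, not a Networker feature: it assumes the client listens on TCP port 7937 (the usual nsrexecd port; adjust for your environment), and the function name `check_client` is hypothetical.

```python
import socket

# Assumption: 7937 is the usual nsrexecd port; adjust if your clients differ.
NSREXECD_PORT = 7937


def check_client(hostname: str, port: int = NSREXECD_PORT, timeout: float = 3.0) -> str:
    """Classify a client as 'unresolvable', 'unreachable' or 'ok'."""
    try:
        # Case 1 above: the client name no longer resolves.
        socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"
    try:
        # Case 2 above: the name resolves, but the connection fails
        # (client down, firewall rejecting or dropping traffic, timeout).
        with socket.create_connection((hostname, port), timeout=timeout):
            pass
    except OSError:
        return "unreachable"
    return "ok"
```

Run it over your client list; any client that does not come back as "ok" is one whose backup would otherwise fail without a clear status.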
Our organization provides backup as a service. The systems we back up are not centrally managed by our group, so it is critical that we can detect and respond to any abnormality from a backup standpoint.
There is no official fix available. As a workaround, we have written a custom script that queries jobsdb for these cases. We have found that Data Protection Advisor cannot reliably detect these types of failures either. We filed enhancement requests NW-I-1234 and NW-I-1247. Feel free to ask your Networker representative about them if you feel this is an issue worth fixing.
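The core rule of such a script is that any job that did not explicitly succeed counts as a failure, including jobs that never received a status at all. A minimal Python sketch of that rule follows. It does not query jobsdb itself; the record layout (dicts with hypothetical `name`, `client` and `status` keys) and the success value `"succeeded"` are assumptions standing in for whatever your own jobsdb export produces.

```python
from typing import Iterable, Optional

# Assumption: the only status value that counts as success. Map your
# export's actual values onto this before classifying.
SUCCESS = "succeeded"


def classify(status: Optional[str]) -> str:
    """Treat anything other than an explicit success as a failure."""
    if status is None or status.strip() == "":
        # The "silent" case: the job was attempted but never got a status.
        return "failed (no status)"
    if status.strip().lower() == SUCCESS:
        return "ok"
    return f"failed ({status.strip()})"


def failed_jobs(records: Iterable[dict]) -> list[tuple[str, str, str]]:
    """Return (job name, client, reason) for every job that is not 'ok'."""
    result = []
    for rec in records:
        verdict = classify(rec.get("status"))
        if verdict != "ok":
            result.append((rec.get("name", "?"), rec.get("client", "?"), verdict))
    return result
```

Defaulting to "failed" for missing or unknown statuses is deliberate: a gap in the data then shows up as an alert instead of being silently ignored.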
Thanks,
John



bingo.1
October 16th, 2019 11:00
IMHO you are mixing up two issues: NW and DPA.
- NW will certainly detect and report both problems.
It will report such problems in the logs (job/workflow/task results & the daemon.raw file).
However, a failure will only appear in the save set report if a save set was actually started before it failed.
- I have no clue how DPA will react here as I have no personal experience with this product.
So it is obviously DPA that has to be improved.
PRASAD R
October 21st, 2019 07:00
Not sure if this makes sense.
Check whether the success threshold in the Workflow settings is set to "Success" or to "Warning".
Regards,
Prasad
Yuie
October 22nd, 2019 05:00
1) You highlight the problem with DPA. Its information is only as valuable (and accurate) as the information it can collect from Networker.
2) Reading every line of every log file on a daily basis is not sustainable. Unless I am missing something, there is no native Networker report that accurately lists the backup status of each and every job. The closest thing is the saveset details report, which does indeed fail to report the two cases I mentioned before. Secondly, in the NMC you will notice that jobs for network-unreachable or unresolvable clients, as described earlier, do not show up explicitly as failed. In fact, these jobs do not have a backup status at all. You need to drill into the log details to see that the job actually attempted to run and ultimately failed. This has been confirmed in NW 9.1.1.5 and 18.2.0.2. We have to resort to custom scripts that gather information from both the saveset details report and custom nsrjob queries. What are other folks doing to monitor their backups? Is there a better method?
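Combining the saveset details report with job queries boils down to a set comparison: every client that was expected to back up but has no successful save set in the report needs attention, whether or not a job record exists for it. A minimal Python sketch, assuming both sources have already been exported into plain sets of client names; the export step and the names `expected`, `saveset_ok`, `job_seen` and `Coverage` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Coverage:
    ok: set[str]                  # expected clients with a successful save set
    no_saveset_but_job: set[str]  # job attempted, but no save set: likely a silent failure
    never_ran: set[str]           # no save set and no job record at all


def coverage(expected: set[str], saveset_ok: set[str], job_seen: set[str]) -> Coverage:
    """Compare the expected clients against what the report and job query show."""
    ok = expected & saveset_ok
    missing = expected - saveset_ok
    return Coverage(
        ok=ok,
        no_saveset_but_job=missing & job_seen,
        never_ran=missing - job_seen,
    )
```

Starting from the expected client list, rather than from the report, is what catches the cases the saveset report never lists.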
Isn't it reasonable to assume, in our line of work (data protection), that a job that is not successful is automatically a failure? We have been using Networker for several years, and this is not the only backup workflow where the product fails to report on non-successful jobs. Many times we catch these issues by chance and then have to resort to custom methods to detect future occurrences. Very frustrating. Are we the only customers noticing these issues?
Thanks,
John
bingo.1
2.4K Posts
0
October 23rd, 2019 01:00
For some years now, we have been using "Backup Eagle", which still serves us pretty well.
As long as you trust German software development, it is a pretty nice tool that you can easily tailor to show you just what you need. We use it especially to create monthly client backup reports (although I managed that with PowerShell as well) and to detect groups/workflows that have never been started (... because they are still running), etc. One nice feature is that it also reports 'internal automatic retries' and more.
More details are available at https://www.schmitz-rz-consult.de/en/
Hope this URL will not delay the response for too long.