2 Intern
•
160 Posts
1
8301
October 21st, 2013 02:00
Avamar cron replication issues between Avamar 6.1.2-47 & 6.1.1-87
Replication (cron job) from an Avamar 6.1.2-47 server to an Avamar 6.1.1-87 server fails with a very weird error message:
2013/10/20-21:01:49 avtar Error <5803>: Error writing 32-byte header to cache file /usr/local/avamar/var/p_bewetbu10.be.recticel.net-begerbu10-AVI_BACKUPS.dat. Possibly out of disk space
2013/10/20-21:01:49 avtar FATAL <5225>: Unable to open hash cache in directory '/usr/local/avamar/var'
The same job works fine when started manually.
Anyone?
See details below...
2013/10/20-21:01:48 avtar Info <5551>: Command Line: /usr/local/avamar/bin/avtar.bin --flagfile=/usr/local/avamar/etc/usersettings.cfg --password=**************** --server=begerbu10 --vardir=/usr/local/avamar/var --bindir=/usr/local/avamar/bin --id=root --vardir=/usr/local/avamar/var --bindir=/usr/local/avamar/bin --sysdir=/usr/local/avamar/etc -x --replicate --workorderid=8185b01cd5d06f38 --allbackups --retention-type=none,daily,weekly,monthly,yearly --hashcachemax=32 --statistics --informationals=1 --account=/AVI_BACKUPS
2013/10/20-21:01:48 avtar Info <7977>: Starting at 2013-10-20 21:01:48 CEST [avtar Aug 19 2013 02:20:24 6.1.102-47 Linux-x86_64]
2013/10/20-21:01:48 avtar Info <9931>: Secondary flags: --flagfile=/usr/local/avamar/etc/usersettings.cfg --password=**************** --server=begerbu10 --vardir=/usr/local/avamar/var --bindir=/usr/local/avamar/bin --id=root -x --net-throttle=1.5 --server=bewetbu10.be.recticel.net --id=repluser --password=**************** --account=/REPLICATE/BEGERBU10.BE.RECTICEL.NET/AVI_BACKUPS --status=300
2013/10/20-21:01:48 avtar Info <8475>: ADE for multicore architectures enabled (Avamar Deduplication Engine v2.0.0)
2013/10/20-21:01:48 avtar Info <5552>: Connecting to Avamar Server (begerbu10)
2013/10/20-21:01:48 avtar Info <5554>: Connecting to one node in each datacenter
2013/10/20-21:01:48 avtar Info <5552>: Connecting to Avamar Server (bewetbu10.be.recticel.net)
2013/10/20-21:01:48 avtar Info <5554>: Connecting to one node in each datacenter
2013/10/20-21:01:49 avtar Info <5583>: Login User: "root", Domain: "default", Account: "/AVI_BACKUPS"
2013/10/20-21:01:49 avtar Info <5580>: Logging in on connection 0 (server 0)
2013/10/20-21:01:49 avtar Info <5582>: Avamar Server login successful
2013/10/20-21:01:49 avtar Info <5583>: Login User: "repluser", Domain: "default", Account: "/REPLICATE/BEGERBU10.BE.RECTICEL.NET/AVI_BACKUPS"
2013/10/20-21:01:49 avtar Info <5580>: Logging in on connection 0 (server 1)
2013/10/20-21:01:49 avtar Info <5582>: Avamar Server login successful
2013/10/20-21:01:49 avtar Info <5550>: Successfully logged into Avamar Server [6.1.2-47]
2013/10/20-21:01:49 avtar Info <5295>: Starting replicate at 2013-10-20 21:01:49 CEST as "dpn" on "begerbu10" (4 CPUs) [6.1.102-47]
2013/10/20-21:01:49 avtar Info <5949>: Backup file system character encoding is UTF-8.
2013/10/20-21:01:49 avtar Info <5667>: 113 backups found for client "AVI_BACKUPS"
2013/10/20-21:01:49 avtar Info <7250>: Client "AVI_BACKUPS" has 87 backups on target (bewetbu10.be.recticel.net)
2013/10/20-21:01:49 avtar Info <5688>: Loading hash cache /usr/local/avamar/var/p_bewetbu10.be.recticel.net-begerbu10-AVI_BACKUPS.dat
2013/10/20-21:01:49 avtar Info <8650>: Opening cache file /usr/local/avamar/var/p_bewetbu10.be.recticel.net-begerbu10-AVI_BACKUPS.dat
2013/10/20-21:01:49 avtar Error <5064>: Cannot open file "/usr/local/avamar/var/p_bewetbu10.be.recticel.net-begerbu10-AVI_BACKUPS.dat"
2013/10/20-21:01:49 avtar Info <5065>: Creating new cache file /usr/local/avamar/var/p_bewetbu10.be.recticel.net-begerbu10-AVI_BACKUPS.dat (1,573,408 bytes)
2013/10/20-21:01:49 avtar Error <5803>: Error writing 32-byte header to cache file /usr/local/avamar/var/p_bewetbu10.be.recticel.net-begerbu10-AVI_BACKUPS.dat. Possibly out of disk space
2013/10/20-21:01:49 avtar FATAL <5225>: Unable to open hash cache in directory '/usr/local/avamar/var'
2013/10/20-21:01:49 avtar Stats <6152>: Hash cache: 65,536 entries, added/updated 0, booted 0
2013/10/20-21:01:49 begin stack dump bp=(nil)
2013/10/20-21:01:49 end stack dump bp=(nil)
2013/10/20-21:01:49 avtar FATAL <5889>: Fatal signal 11 in pid 118358
2013/10/20-21:01:49 Fatal signal 11
2013/10/20-21:01:49 [118358] | 00000000007ea098
2013/10/20-21:01:49 [118358] | 00007fca31c6a6b0
2013/10/20-21:01:49 [118358] | 0000000000740f02
2013/10/20-21:01:49 [118358] | 0000000000748c80
TomLambrechts
2 Intern
•
160 Posts
0
November 8th, 2013 02:00
Workaround Avamar 6.1.2_47 – repl_cron replication issue – Error writing 32-byte header to cache file | Tom Lambrechts
Nayak2010
50 Posts
1
October 21st, 2013 06:00
Do you by any chance have multiple replication jobs running? At this point I'd also suggest opening a support ticket so the issue can be investigated thoroughly.
TomLambrechts
2 Intern
•
160 Posts
1
October 21st, 2013 06:00
That’s a very good question… I’ll check… because indeed we have 3 locations, 2 of which replicate to a central location.
So there’s a very good chance that you are right here…
I’ll keep you posted.
Thanks !
TomLambrechts
2 Intern
•
160 Posts
0
November 4th, 2013 02:00
No other replication jobs are running at the same time.
We are now also seeing this on other Avamar systems running the same version (6.1.2-47).
The scheduled replication fails on AVI_BACKUPS, 1 client, EM_BACKUPS and MC_BACKUPS; some client replication jobs are OK.
Any ideas, please?
Nayak2010
50 Posts
1
November 4th, 2013 08:00
At this point, I'd suggest opening an SR with support.
gfznjhz
4 Posts
2
November 6th, 2013 01:00
Hi,
I've encountered the same issue in v7.0 and solved it.
The original replication was done with EM, so the p_cache files in the /usr/local/avamar/var directory were owned by the admin user with mode 644 (rw-r--r--). The cron job for replication is scheduled in the dpn user's crontab, so the dpn user cannot write to the p_cache files that already exist.
Just change the owner of the p_cache files (chown dpn p_*.dat) and it works fine.
TomLambrechts
2 Intern
•
160 Posts
0
November 7th, 2013 00:00
EMC support let me know that this is a bug in Avamar 6.1.2-47 and that at the moment there is no support document or fix.
You need to change the ownership every time you add a client to the repl_cron list.
Whenever you add a new client to the repl_cron list, the p_cache file for that client ends up owned by the admin account, and hence we get this error when that client is replicated.
As a workaround, we changed the ownership of those files from admin to dpn, and that resolved it.
A complete fix is included in Avamar version 7.0 SP1.
TomLambrechts
2 Intern
•
160 Posts
1
November 8th, 2013 01:00
Example of the error and the way to solve it:
2013/11/07-20:01:51 avtar Info <5688>: Loading hash cache /usr/local/avamar/var/p_bewetbu11.be.blabla.net-begerbu10-AVI_BACKUPS.dat
2013/11/07-20:01:51 avtar Info <8650>: Opening cache file /usr/local/avamar/var/p_bewetbu11.be.blabla.net-begerbu10-AVI_BACKUPS.dat
2013/11/07-20:01:51 avtar Error <5064>: Cannot open file "/usr/local/avamar/var/p_bewetbu11.be.blabla.net-begerbu10-AVI_BACKUPS.dat"
2013/11/07-20:01:51 avtar Info <5065>: Creating new cache file /usr/local/avamar/var/p_bewetbu11.be.blabla.net-begerbu10-AVI_BACKUPS.dat (1,573,408 bytes)
2013/11/07-20:01:51 avtar Error <5803>: Error writing 32-byte header to cache file /usr/local/avamar/var/p_bewetbu11.be.blabla.net-begerbu10-AVI_BACKUPS.dat. Possibly out of disk space
2013/11/07-20:01:51 avtar FATAL <5225>: Unable to open hash cache in directory '/usr/local/avamar/var'
.....
root@begerbu10:/usr/local/avamar/var/#: ls -l p_*
-rw-rw-rw- 1 dpn admin 1573408 Nov 5 19:02 p_bewetbu10.be.blabla.net-begerbu10-AVI_BACKUPS.dat
-rw-rw-rw- 1 dpn admin 1573408 Nov 5 19:29 p_bewetbu10.be.blabla.net-begerbu10-EM_BACKUPS.dat
-rw-rw-rw- 1 dpn admin 1573408 Nov 5 19:29 p_bewetbu10.be.blabla.net-begerbu10-MC_BACKUPS.dat
-rw-rw-rw- 1 dpn admin 25166368 Nov 5 19:28 p_bewetbu10.be.blabla.net-begerbu10-begerms1.be.blabla.net.dat
-rw-rw-rw- 1 dpn admin 12583456 Oct 18 11:42 p_bewetbu10.be.blabla.net-begerbu10-vmwarepc064_UDSJFteNWuob4sn3tVBcLQ.dat
-rw-r--r-- 1 admin admin 1573408 Nov 7 16:00 p_bewetbu11.be.blabla.net-begerbu10-AVI_BACKUPS.dat
-rw-r--r-- 1 admin admin 1573408 Nov 7 16:38 p_bewetbu11.be.blabla.net-begerbu10-EM_BACKUPS.dat
-rw-r--r-- 1 admin admin 1573408 Nov 7 16:39 p_bewetbu11.be.blabla.net-begerbu10-MC_BACKUPS.dat
-rw-r--r-- 1 admin admin 1573408 Nov 7 16:37 p_bewetbu11.be.blabla.net-begerbu10-begerms1.be.blabla.net.dat
===>>>
root@begerbu10:/usr/local/avamar/var/#: chown dpn p_bewetbu11.be.blabla.net-begerbu10-AVI_BACKUPS.dat
root@begerbu10:/usr/local/avamar/var/#: chown dpn p_bewetbu11.be.blabla.net-begerbu10-EM_BACKUPS.dat
root@begerbu10:/usr/local/avamar/var/#: chown dpn p_bewetbu11.be.blabla.net-begerbu10-MC_BACKUPS.dat
root@begerbu10:/usr/local/avamar/var/#: chown dpn p_bewetbu11.be.blabla.net-begerbu10-begerms1.be.blabla.net.dat
When you add another client to the replication job, you will have to do the same for that client !
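Since the chown has to be repeated for every client added to the replication list, it can help to list which cache files are not yet owned by dpn. The following is only a minimal sketch, not an EMC-supplied tool: the function name `list_foreign_caches` is made up for illustration, and it assumes GNU `stat` (for `-c '%U'`) and the cache directory shown in the listing above.

```shell
#!/bin/bash
# Hypothetical helper: print every p_*.dat hash cache file in a directory
# whose owner differs from the expected user.
# Usage: list_foreign_caches [directory] [expected_owner]
list_foreign_caches() {
    local dir="${1:-/usr/local/avamar/var}"   # cache directory from this thread
    local owner="${2:-dpn}"                   # user that runs cron replication
    local f
    for f in "$dir"/p_*.dat; do
        [ -e "$f" ] || continue               # glob matched nothing
        # GNU stat: %U prints the owner's user name
        if [ "$(stat -c '%U' "$f")" != "$owner" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Called with no arguments, it prints the files that would still need `chown dpn`; the output can be reviewed before changing anything as root.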
jjbladester1
7 Posts
0
May 5th, 2014 05:00
I administer two Avamar 7.0 SP 1 grids. Replication is bi-directional and everything on one grid should be replicated to the other grid, so their capacity utilization should be identical. When I found that one grid had 10% lower capacity utilization, I started investigating replication issues.
Both grids are throwing errors in /usr/local/avamar/var/cron/replicate.log:
2014/05/05-08:11:13 avtar Error <0000>: Unable to chmod hash cache file /usr/local/avamar/var/p_alb-avmr-01.oag.lawnet-nyc-avmr-01-internal-server-name-here.dat
I did a chmod 644 /usr/local/avamar/var/p_*.dat and a chown /usr/local/avamar/var/p_*.dat and kicked off replication on both grids via Enterprise Manager. Replication appears to be running properly now.
Is there a hotfix for Avamar 7.0 SP 1 for this issue?
TomLambrechts
2 Intern
•
160 Posts
0
May 6th, 2014 01:00
Same problem as with 6.1.1 & 6.1.2... Manual Replication is showing different behaviour than Scheduled Replication.
EMC Support ??
J_H_
2 Intern
•
498 Posts
0
May 6th, 2014 10:00
I would like to throw a different scenario into this.
I am on 7.0.101-61 and have converted to batch job replication,
but I also have two-way replication, so the grids should be the same size and they are not (that comment is what got my attention).
I have looked at my p_*.dat files and the owner and group are either
root root
or
dpn admin
So should I change the root root ones to dpn admin?
ionthegeek
2 Intern
•
2K Posts
0
May 6th, 2014 18:00
Just to clarify, there are two issues described in this thread.
The first issue is the avtar FATAL. It is possible for this issue to cause a discrepancy in capacity if avtar exits before replicating the backups for a client. Hotfix 55125 is available to resolve this issue for 6.1.2-47 systems. This issue is also resolved in Avamar version 7.0.1-61.
The second issue is the "Unable to chmod hash cache file" message which -- on its own -- cannot explain any capacity discrepancy. This issue will not prevent backups from being replicated (though it will cause the system to report that replication has failed).
This issue can occur on 7.0 systems if both cron-based and plug-in-based replication are being used. If plug-in-based replication is in use, it is recommended to avoid cron-based replication entirely.
The p_cache*.dat files should be owned by user dpn, group admin and have 664 permissions so that both the dpn user (under which cron-based replication runs) and the admin user (under which manual replications run) can access the caches. The following commands (run as the root user) will correct the ownership and permissions on the cache files:
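The commands themselves did not survive in this copy of the post. Going only by the ownership and mode described above (user dpn, group admin, 664), they would presumably amount to a `chown dpn:admin` and a `chmod 664` on the cache files. The sketch below wraps that in a function so it can be tried on a test directory first; the function name `fix_cache_perms` and its arguments are made up for illustration, and the default directory /usr/local/avamar/var is taken from earlier in the thread.

```shell
#!/bin/bash
# Reconstructed sketch, NOT the original commands from the post.
# Sets owner, group and mode 664 on every p_*.dat file in a directory.
# Usage (as root): fix_cache_perms [directory] [owner] [group]
fix_cache_perms() {
    local dir="${1:-/usr/local/avamar/var}"   # assumed cache directory
    local owner="${2:-dpn}"                   # cron-based replication user
    local group="${3:-admin}"                 # group of the admin user
    local f
    for f in "$dir"/p_*.dat; do
        [ -e "$f" ] || continue               # glob matched nothing
        chown "$owner:$group" "$f" && chmod 664 "$f"
    done
}
```

With no arguments this has the same effect as running `chown dpn:admin /usr/local/avamar/var/p_*.dat` followed by `chmod 664 /usr/local/avamar/var/p_*.dat` as root.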
If there is a difference in capacity utilization, I would recommend working with support to confirm that replication is covering all of the intended clients, that it is completing successfully (and not timing out) and have them review the system to see if there are any stale backups hanging around under the MC_DELETED domain.
jjbladester1
7 Posts
0
May 7th, 2014 05:00
Ian,
I opened a Sev 2 ticket (SR 62897766) on this issue two days ago. So far, I've worked with "first level" and "second level" technical support engineers who themselves have been working with "engineering". We have performed the chown/chmod operations you mention several times but that is not fixing the problem and replicate.log is still filled with "Unable to chmod hash cache file" errors.
We are *only* using full site-to-site Enterprise Manager (cron-based) replication and have never touched group-based replication in the Avamar Administrator GUI. The L2 tech support person thought the issue could be with the permissions of files in /tmp/replicate/ but he wasn't sure. If the issue is resolved in 7.0.1-61, it must not apply to clients who upgraded from 6.1.1-87 since that is what we did on both of our Avamar grids.
root@avamar-server-1:/tmp/replicate/#: ls -ltrh
total 19M
-rw-r--r-- 1 admin admin 0 2014-05-06 12:10 empty
-rwxr-xr-x 1 dpn admin 9.3M 2014-05-06 12:55 replold.sh
-rwxr-xr-x 1 dpn admin 8.6M 2014-05-06 12:55 replnew.sh
-rw-rw-r-- 1 dpn admin 324K 2014-05-06 12:55 repldiff.sh
We don't have an MC_DELETED domain, but we do have MC_RETIRED. I deleted the stuff in there as it wasn't important and the backups for those retired clients were already expired from both grids. Now that those are gone, I just manually started repl_cron from Enterprise Manager. According to tail -f /usr/local/avamar/var/cron/replicate.log, I just received the following:
2014/05/07-08:33:32 avtar Error <0000>: Unable to chmod hash cache file /usr/local/avamar/var/p_avamar-server-2-avamar-server-1-internal-server-name.domain.com.dat
ionthegeek
2 Intern
•
2K Posts
0
May 7th, 2014 06:00
Remember, we're talking about two different issues. If you do not see fatal errors in the replicate log, your system is not affected by the first issue which is the one that is fixed in 7.0.1-61.
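One way to tell the two issues apart is to count the relevant lines in the replicate log. This is a rough sketch rather than a supported diagnostic: the function name `classify_replicate_log` is made up, the default log path is the one quoted earlier in the thread, and the match strings are taken from the log excerpts posted above.

```shell
#!/bin/bash
# Hypothetical helper: count log lines for each of the two issues
# discussed in this thread.
# Usage: classify_replicate_log [logfile]
classify_replicate_log() {
    local log="${1:-/usr/local/avamar/var/cron/replicate.log}"
    local fatal chmod_err
    # grep -c exits non-zero when the count is 0, hence "|| true"
    fatal=$(grep -c 'avtar FATAL' "$log" || true)
    chmod_err=$(grep -c 'Unable to chmod hash cache file' "$log" || true)
    printf 'fatal=%s chmod_errors=%s\n' "$fatal" "$chmod_err"
}
```

A non-zero `fatal` count points at the first issue; a log with only `chmod_errors` matches the second one.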
Replication behaves slightly differently depending on whether it is a scheduled replication job or was started manually through Enterprise Manager / Avamar Administrator. This is because scheduled replication jobs run as the dpn user (since they are started from dpn's crontab), whereas manual replications run as the admin user (since they are started directly by EM / MCS, which runs as the admin user).
In any case, if the problem you are most interested in resolving is the capacity imbalance, I would encourage you to ask support to focus on this. I won't say it's impossible but it is vanishingly unlikely that the cache permissions error could be causing a capacity imbalance. Efforts should be focused on finding the root cause of the capacity difference if that is the problem you are trying to solve.
The MC_DELETED domain is not visible through the Avamar Administrator which is why I recommended you ask support to review its contents.