vbduke
3 Posts
December 29th, 2021 08:00
Failed to write to socket 9; Unable to start save session with nsrmmd AIX 7.1 Networker 19.3
Hi, while starting a backup I frequently see errors like:
Failed to write to socket 9; Unable to start save session with nsrmmd (see a full log below)
The backup fails for that saveset and finishes with the error: save: backup of save set is unsuccessful.
Usually a restart of that exact saveset runs with no issues, though occasionally it needs another restart to proceed without the "Failed to write to socket" error. The thing is, there are numerous savesets in a backup and this issue may affect any of them in arbitrary order. In most cases I don't see the error, but it is unpredictable when it will pop up next and the backup will need to be restarted.
The server in question, "192.168.46.146", is the NW management server, and it looks like port 8639 becomes temporarily unavailable or is too busy to respond. Yet I can't see anything wrong with "192.168.46.146" in general: there are numerous backups running on that server with no visible issues.
"192.168.46.146" runs NetWorker/LGTO version 19.3.0.2, and the client was tested with versions 9.2.1.1, 18.0, 18.1, 19.2, and 19.3.x.x, all with exactly the same results.
Has anyone seen the same error, or can you recommend anything? The behavior is verified to be the same whether using a workflow or the "save" command from the client.
Thanks!
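In case it helps anyone reproduce this: a quick way to check whether the nsrmmd port is actually reachable at the moment of failure is a connect probe. This is a minimal sketch using only the Python standard library; the host and port are the ones from the errors below.

```python
# Probe whether the nsrmmd port on the management server accepts a TCP
# connection right now. Distinguishes "port temporarily unavailable"
# from "connection dropped mid-session".
import socket

def port_open(host, port, timeout=5):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Host/port taken from the log below:
# port_open("192.168.46.146", 8639)
```

Running this in a loop while the backup job is active would show whether the port ever stops accepting connections, or whether it stays open and only established sessions are being dropped.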
===========================================================
Provided below is the log for a specific saveset when the problem was encountered; it is the same for other savesets if/when it happens.
------------------------------------
/disks/nasbld/nas55/nw/9.2.1/rpc/lib/c_tcp.c:1386 Failed to write to socket 9; peer = 192.168.46.146:[8639], errno = There is no process to read data written to a pipe.
Unable to start save session with nsrmmd on vlp-filenw02.dhebak: RPC send operation failed; peer = 192.168.46.146:8639, errno = There is no process to read data written to a pipe.
mmsave setup failed: RPC send operation failed; peer = 192.168.46.146:8639, errno = There is no process to read data written to a pipe.
Unable to set up the direct save with server 'vlp-filenw02.dhebak': RPC send operation failed; peer = 192.168.46.146:8639, errno = There is no process to read data written to a pipe..
The connection to the NetWorker server or storage node was lost before the save operation completed: There is no process to read data written to a pipe.
get_mbs_version() (tid:43348941709770761): ERRORDo MBS: cannot get emitter 0 version.
DPSS save point 'lpa-nwstgfe1.dhe.duke.edu:/epic/prd08' has encountered a critical error.
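For what it's worth, the errno string in that log ("There is no process to read data written to a pipe") is AIX's wording for EPIPE: the writing side of a connection outliving its reader. A minimal illustration of the same failure mode, with a local socket pair standing in for the client-to-nsrmmd connection:

```python
# Reproduce EPIPE: write to a socket whose peer has already gone away,
# which is what the client sees when the nsrmmd side drops the session.
import errno
import socket

a, b = socket.socketpair()
b.close()                          # the "nsrmmd" side disappears
caught = None
for _ in range(3):                 # the kernel may only notice on a later write
    try:
        a.send(b"payload")
    except OSError as e:           # BrokenPipeError on Python 3
        caught = e
        break
a.close()
print(caught.errno == errno.EPIPE) # prints: True
```

This doesn't explain *why* the peer goes away, of course; it only shows that the error is reported on the writer's side after the fact, which is consistent with the client having no better status information to log.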
vbduke
3 Posts
January 10th, 2022 06:00
Configuring new devices and adding a new pool solved this problem. It looks like the error is not network-related but resource-related: with the load too high, the management server cannot serve all the requests, especially when numerous multi-stream backup jobs are running, which causes the sporadic "socket" errors. Interestingly, repeated attempts to start the backup job may generate the error against different savesets in arbitrary order.
Changing the backup from a policy/workflow to a multi-stream "save" command initiated from the client usually either reduces such errors significantly or lets the job proceed without the "socket" error at all, while switching over to a new pool eliminates the errors entirely.
I don't have enough statistics collected yet, but a week of testing makes me feel this is an effective solution to the problem. Thanks.
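Since a restart of the failed saveset usually succeeds, the client-side "save" approach can also be wrapped with a retry on the sporadic socket error. A hypothetical sketch (the `save -s server -b pool` invocation follows the standard NetWorker client CLI; the server, pool, and path names in the comment are placeholders, not from this thread's configuration):

```python
# Retry a client-side save command when it fails with the sporadic
# "Failed to write to socket" error, since a rerun usually succeeds.
import subprocess
import time

SOCKET_ERR = "Failed to write to socket"

def run_with_retry(cmd, attempts=3, delay=60):
    """Run cmd; retry only on the known transient socket error."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if SOCKET_ERR in result.stderr and attempt < attempts:
            time.sleep(delay)      # give the busy server time to free resources
            continue
        return result              # a different error, or attempts exhausted
    return result

# Illustrative invocation (placeholder names):
# run_with_retry(["save", "-s", "mgmt-server", "-b", "NewPool", "/some/saveset"])
```

This only papers over the symptom; per the finding above, the real fix was spreading the load across new devices and a new pool.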
bingo.1
2.4K Posts
December 29th, 2021 15:00
Such sporadic issues most often point to a network-related problem, where NW is only 'reporting' the issue, because it will load the network whenever possible.
The problem itself could have a long list of potential origins, related to hardware and/or software. And because of the connection problems, NW obviously does not get the chance to retrieve better status information and generate more precise error messages.
May I suggest that you contact Dell support to help with further tests/investigations.
vbduke
3 Posts
December 29th, 2021 15:00
Thanks! I have a case open with EMC for another reason - I will ask them about this matter as well.