1 Rookie

 • 

27 Posts

July 12th, 2021 18:00

Cannot lock file index for client ### (Too many open files)

Hi all,

I'm having an issue where I can't restore all the indexes using nsrdr. I have multiple Networker environments and I've been asked by Management to test a disaster recovery for all environments.

We use Networker 19.5.0.0 with a Data Domain on Windows Server 2016. I uninstalled Networker from the server and reinstalled it from a clean slate. I then added my Data Domain using NMC (without checking the option to re-label the storage) and ran nsrdr -N.

I've done this in the past in a previous environment with much less data and far fewer files, and there were no errors. In this environment, I can't restore the indexes for 1 client. It's probably the biggest (in file count and size) of all my clients. The error that nsrdr gives is this: Cannot lock file index for client xxx (Too many open files)

I tried a couple of times to restore only the indexes of this client using nsrdr -c -I xxx, but it still gives me the same error.

It restores about 90GB out of 94GB, and then gives up.

I've done some research on Google and in the official Networker documentation, but I can't find anything that applies to my environment. The only thing I find is a ulimit setting, which is specific to Linux. I'm running Windows.

I also tried something else, without being sure it would work. I had taken a backup of the whole /nsr/ folder before uninstalling Networker, so I copied that client's indexes over the /nsr/index/xxx folder. Once the copy was done, I ran nsrck -v -L 6 xxx. The same error comes up.

There is no log in daemon.raw related to this and I can't find a log file that gives me more details. Google doesn't really help either.

- Can I "import" the indexes that I had previously backed up? If so, was I doing it the correct way?

- What is this error message? I think it's Windows throwing it to Networker, but I can't find any clues related to this. Maybe there's a config file somewhere in Networker that I can edit to fix this?

Thank you.

2.4K Posts

July 13th, 2021 06:00

IMHO - the reason why you will not see a NW error is that this is not an internal but a Windows message. Consequently, you might find more details in the Windows Event Logs.
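As a small illustration of that point (this is not part of NetWorker): the text in parentheses is the wording that the C runtime on Windows and on glibc-based Linux typically uses for the standard error code EMFILE, which suggests it comes from below the application. A minimal Python sketch, assuming only a standard Python 3 installation; the helper name emfile_message is made up for this example:

```python
import errno
import os


def emfile_message() -> str:
    """Return the C runtime's message text for EMFILE on this platform."""
    # EMFILE is the standard error code meaning that the current process
    # has run out of file descriptors / open file slots.
    return os.strerror(errno.EMFILE)


if __name__ == "__main__":
    # The exact wording depends on the platform's C library.
    print(errno.EMFILE, emfile_message())
```

If the printed text matches the one in the nsrdr/nsrck error, the message is most likely being passed through from the runtime rather than generated by NW itself.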

 

Depending on the size of your data zone, it might not be a good idea to recover the file indexes as part of the NW DR procedure (nsrdr). That's why the utility has the option where you can deselect this feature:

    ....

    Do you want to recover the client file indexes? Clients that have indexes for server ### will be recovered.Y(yes)/N(no)? [Y]

    ....

 

If you continue with no, you will have certain benefits - the major one being that the process will finish sooner and your NW will be back in service asap, without spending time on the recovery of the CFI files!

    Remember: File indexes are 'nice to have' but the info in the media index is the one that counts. Without it, what is a proper CFI worth? - Correct: absolutely nothing!

 

Of course, I admit that the CFI is handy, but you can still do this later. NW provides various options to do this:

1. You can simply run the command nsrck

    As NW has all information in the media index, it will recover the CFIs for all NW clients. Unfortunately, NW will control the sequence of clients.

2. You can run nsrck client_name for a specific client (you can even specify more than one).

    Using this method you can control the sequence and for example recover the index of your most important ones first.

3. Just use mminfo to find the latest index backup save set(s) for your specific client and recover the save set(s).

    This will ensure that your users will most likely be able to recover the latest version of their files. You can later still use method 2 to get the complete CFI back.

 

What do you think?

 

 

1 Rookie

 • 

27 Posts

July 14th, 2021 05:00

As always, thanks for your reply bingo.1.

I figured it's Windows acting up, but I can't find out why really. I looked at Windows Event Logs, but nothing is recorded. I'll dig deeper.

I wonder whether I have reached some sort of hard limit on the number of files for that client's index. Looking at the backup I made of the indexes, I have 9,585 files for that index. It's a Solaris physical server. We've done numerous file restores in the past for that client, so we know it was working before.

 

Restoring all the indexes doesn't take that long in our environment (about 30 minutes for everything). So from a time perspective, this isn't an issue really to do nsrdr -N.

At first, I did a full recovery using nsrdr -N. nsrdr told me that 1 client's indexes couldn't be restored, but all others were fine. This is where I started seeing the error Cannot lock file index for client xxx (Too many open files). It restored most of the files, but was still missing about 4GB out of 94GB.

I then tried restoring only the client's indexes using nsrdr -c -I xxx. Again, same error comes up.

Doing nsrck client_name results in the same error.

 

I'm still learning my way around Networker and you've been very helpful in the past months. Without the indexes, if I try to recover files (file recovery) in Networker, I cannot use the Browse tab. It doesn't show me any files. I believe that without the indexes, I can only restore the entire save set (using the Save Set Recover tab). Is that correct? We use Networker for DR, yes, but also to restore files after users have deleted them. The only way to browse the files is to have the indexes on the Networker server, right? If so, I need to find a way to get the index files back for that client.

1 Rookie

 • 

27 Posts

July 14th, 2021 09:00

I think I may have found the issue.

After plenty of hours searching this morning (I don't know how I missed this), I found that there is a KB article on Networker about a bug that was introduced in 19.4.0.2: nsrck SYSTEM warning Cannot lock file index for client ' ' (Too many open files)

I believe that 19.5.0.0, even if not mentioned, may be affected by this bug. To me, it sounds like the issue I am currently having.

I tried the workaround to increase the number of available File Descriptors, but that didn't seem to help unfortunately.

 

So updating to 19.5.0.0 fixed my cloning issue, but may have broken nsrck for that specific index, and therefore taken away my ability to do a file recovery on that specific client...nice.

2.4K Posts

July 14th, 2021 12:00

 

I do not know how the programmers work but it would not surprise me if improvements, which have been introduced in the NW 19.4 'branch' will appear in a newer version sometime later.

--------------------------------------

With respect to file recovery without the CFI: this is possible, but in that case you have to provide the details of what to recover yourself. You can do this from the NW user GUI, but the easiest way is to use the recover command:

  recover ... -S ssid -a absolute_path_name_of_the_file_or_directory

This is handy if you can remember the source name of the object but I admit it makes most likely sense for structured data/filenames.

Sometimes it is faster to use this method for the partial recovery of a save set (to a temp location) and use the OS tools to find the files which you really want to restore.

But you can also use it directly, for example to retrieve your NW configuration which is stored in the /nsr/res directory. For instance, you could run the command

  recover -y -d C:\temp -S bootstrap_ssid -a E:\nsr\res\nsrdb

BTW - this is the only method to recover the resource files from the bootstrap: no CFI is created for these save sets, so they start their lifecycle with the status 'recoverable'.
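If such recoveries need to be scripted, the command line can be assembled programmatically. The following is only a sketch in Python: the function name build_recover_command and its parameters are hypothetical, and the only flags used are the ones that appear in the commands above (-y, -d, -S, -a). It builds the argument list but does not run anything.

```python
from typing import List, Optional


def build_recover_command(ssid: str, path: str,
                          dest: Optional[str] = None,
                          yes_flag: bool = False) -> List[str]:
    """Assemble the argv for a path-based save set recover (no CFI needed).

    Only flags shown in the examples above are used; anything else you
    need (server, client, ...) has to be added for your environment.
    """
    cmd = ["recover"]
    if yes_flag:
        cmd.append("-y")           # the -y flag, as used in the example above
    if dest is not None:
        cmd += ["-d", dest]        # destination directory for recovered data
    cmd += ["-S", ssid, "-a", path]  # save set id and absolute source path
    return cmd


if __name__ == "__main__":
    argv = build_recover_command("bootstrap_ssid", r"E:\nsr\res\nsrdb",
                                 dest=r"C:\temp", yes_flag=True)
    # Display only; pass the list to subprocess.run() to execute it.
    print(" ".join(argv))
```

Keeping the arguments as a list avoids quoting problems with paths that contain spaces when the list is later handed to subprocess.run().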

-------------------------------------------

The error message (Too many open files) surprises me in general. Why? - because NW does not have a big db for that information. All CFI info is stored - for each save set separately - in one of the index directories:

    /nsr/index/client_name/db6/ssid_group/

Here you will find a set of these 3 files:

    ssid.k0     ssid.k1     ssid.rec

BTW - only the *.rec file contains the true information.

I will here and now not explain how you can recognize the SSID from the hexadecimal file name but believe me - there is a clear relationship.

What does this mean? - it means that after the recovery of these files, there is no need '... to keep them open ...' . Consequently such error message should not appear, correct?
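To check a copied or restored index directory against the layout described above, a small script can count the files and flag save sets whose three files are not all present. This is a rough sketch in Python that assumes exactly the layout from this post (a db6 directory with subdirectories holding *.k0, *.k1 and *.rec files); the function names are invented for the example, and it only reads file names, never file contents.

```python
import sys
from pathlib import Path
from typing import Dict, Set

# The three files expected per save set, as described above (assumption).
EXPECTED_SUFFIXES = frozenset({".k0", ".k1", ".rec"})


def scan_index_dir(db_dir: Path) -> Dict[str, Set[str]]:
    """Map each save set file (path relative to db_dir, without suffix)
    to the set of suffixes found for it."""
    found: Dict[str, Set[str]] = {}
    for f in sorted(db_dir.rglob("*")):
        if f.is_file() and f.suffix in EXPECTED_SUFFIXES:
            key = f.relative_to(db_dir).with_suffix("").as_posix()
            found.setdefault(key, set()).add(f.suffix)
    return found


def incomplete_sets(found: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return the save sets that lack at least one of the expected files,
    mapped to the suffixes that are missing."""
    return {key: set(EXPECTED_SUFFIXES - suffixes)
            for key, suffixes in found.items()
            if suffixes != EXPECTED_SUFFIXES}


if __name__ == "__main__":
    # Usage: python scan_index.py <path to the client's db6 directory>
    if len(sys.argv) > 1 and Path(sys.argv[1]).is_dir():
        result = scan_index_dir(Path(sys.argv[1]))
        print("save sets found:", len(result))
        for key, missing in sorted(incomplete_sets(result).items()):
            print("incomplete:", key, "missing", sorted(missing))
```

This does not validate the index the way nsrck does; it only shows whether the copy looks complete at the file level.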

 

 

1 Rookie

 • 

27 Posts

July 16th, 2021 06:00

Thank you bingo.1. I have faith that 19.5 will be patched accordingly. 19.4.0.3 was released a few days ago with the fix. In the meantime, I've increased the retention policy of the ZFS snapshots on my Solaris server so that I can restore from the ZFS snapshots instead of Networker.

Thanks for the tip regarding save set recovery. I had a hunch that recovering a save set to a temporary location would be a way to restore files without the CFI. Thanks for confirming. I didn't know, however, that you could specify the full path and name like this. It could be useful in certain scenarios for us.

I was monitoring nsrck in Windows Resource Monitor while it was running, so I could check the handles. The process is nowhere near the limit that Windows has for open files per process, at least as far as I understand it. I believe that the 19.5.0.0 code is based on 19.4.0.2 (which was released a few weeks before), and according to the KB the bug was introduced in 19.4.0.2. I don't believe nsrck is pushing the limits of Windows; it is just a bug in nsrck at this time, which triggers a Windows error. Fingers crossed for this to be fixed soon. At least I have ZFS snapshots available on this server as plan B (which technically could also be plan A).

1 Rookie

 • 

27 Posts

July 16th, 2021 06:00

I have put Networker backups on hold for that very specific client until I can get this resolved.

Not knowing when DELL will release a new version of 19.5 with the fix, do you think I can resume backups for that client and later on run nsrck to import all the CFI?

The CFI files are already in the /index/clientxxx folder, I just can't run nsrck. If I leave them in this folder and resume backups, Networker will start rebuilding indexes in that folder. Will this cause any conflict, since Networker doesn't think there are files in there?

2.4K Posts

July 16th, 2021 08:00

With respect to your statement "... Networker will start rebuilding indexes into that folder ..." I am not sure. The 'part' of the ssid that will be used is most likely not used again. However, this is my private assumption.

BTW

  - Take the 'normal' ssid and convert it into a hex number.

  - Then run mminfo -q "ssid=your_ssid" -r "ssid,ssid(64)"

  - Compare both numbers ... and be surprised.
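The first step, converting the decimal ssid to hex, can be done with any calculator or with a couple of lines of Python. This is just a sketch of the conversion itself; the ssid value below is invented for illustration and the function names are not NetWorker terms.

```python
def ssid_to_hex(ssid: int) -> str:
    """Convert a decimal ssid to lowercase hex, without the 0x prefix."""
    return format(ssid, "x")


def hex_to_ssid(hex_str: str) -> int:
    """Convert a hex string back to the decimal ssid."""
    return int(hex_str, 16)


if __name__ == "__main__":
    example_ssid = 3735928559         # hypothetical value, not a real ssid
    print(ssid_to_hex(example_ssid))  # prints deadbeef
```

The resulting hex string can then be compared with the mminfo output and with the file names in the client's index directory.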

 

1 Rookie

 • 

27 Posts

July 21st, 2021 06:00

Thanks bingo.1. I'll look into the hex number when I get some time.

19.5.0.1 was released yesterday and includes the fix for nsrck Cannot lock file index for client (Too many open files). I am happy to report that after installing 19.5.0.1 this morning, it fixed the issue and I was able to properly import my index.

2.4K Posts

July 21st, 2021 14:00

Thank you for sharing this good news.

 

July 25th, 2021 10:00

Makes one wonder about basic functionality testing for new releases by Dell. NW 19.5 already took a bit longer to be released, and yet this was not even noticed during testing?
