Unsolved
2.4K Posts
1
2398
November 27th, 2021 02:00
"scanner -i" will take too long for DDBoost & DD-AFTD devices
I recently wanted to compare scanner operations when using the same command for different NW device types. During my tests I found out that "scanner -i" performs extremely slowly for a backup device if it resides on a Data Domain (DDBoost & DD-AFTD devices).
I verified this for NW 19.3.0 up to the latest version 19.5.0.2 - it might well be that this behavior is the same for a bunch of earlier NW versions.
It took a while to detect and verify - together with Dell/EMC support - the reason for this performance degradation. As it turns out, the 'scanner -i' process actually issues a kind of 'ping' for each discovered file. Consequently, the overall performance decreases drastically if a save set/volume contains millions of small files, which is nothing unusual today. Just to give you an idea: scanning a single volume containing 'only' 10 million small files may take several days for a DDBoost device ... and for the same save set on a DD-based AFTD the performance is even worse.
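To put a rough number on that per-file 'ping' (the per-file overhead below is just an assumed figure for illustration, not a measured value):

    # back-of-the-envelope estimate, assuming ~30 ms of extra overhead per file on a DDBoost device:
    echo $(( 10000000 * 30 / 1000 / 3600 )) hours    # -> 83 hours, i.e. roughly 3.5 days
    # the same volume on a truly local AFTD/FTD finishes within minutes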
Another issue you should keep in mind:
At the end of the disaster recovery routine nsrdr, the program asks you whether you want to scan your volumes right now. If you say yes (or simply confirm the default setting), nsrdr will silently execute 'scanner -i' in the background, which of course has the same effect as described above. So whenever you use nsrdr, you should ...
- avoid the built-in scanner process
- run 'scanner -m' on the volumes to prevent the loss of the save set information
- and run 'scanner -i' only on new save sets where this is really necessary (see the sketch below).
This issue has been addressed and escalated - it will be solved soon, although a specific date is not available yet. Until then, it is a good idea to avoid 'scanner -i' and the automatic 'scanner' process at the end of 'nsrdr' for the DD-based device types.
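For clarity, here is a minimal sketch of that manual sequence - device names and save set IDs are placeholders, and please check the scanner man page of your NW release before relying on the exact options:

    # rebuild only the media database entries for the volume (no per-file index work):
    scanner -m <DDBoost_or_DD-AFTD_device>
    # rebuild the client file index only for the save sets you really need,
    # e.g. limited by save set ID (use mminfo to look up the SSIDs):
    scanner -i -S <ssid> <device>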



barry_beckers
393 Posts
0
November 28th, 2021 13:00
Thnx.
One would have thought that this kind of basic DR functionality would (should, really) have been spotted internally at Dell.
Is there any bug or escalation ID mentioned as a reference?
bingo.1
2.4K Posts
0
November 28th, 2021 15:00
I do not have an escalation number but these
barry_beckers
393 Posts
0
November 28th, 2021 23:00
Thnx, I will reach out to our TAM team at Dell and ask about it...
barry_beckers
393 Posts
0
November 28th, 2021 23:00
Might I ask what your setup is?
For one, we had some issues a while ago with remote locations that have high latency when performing a NW DR of Windows systems using the bootable WinPE ISO. The process needs to sift through the NW client index and hits extremely long delays: with latency above 50 ms - or especially if it is even worse, say above 150 ms - it can take more than a day, if not longer, before the actual restore starts, even when it is "only" 300k files (not a terribly large amount for a normal Windows system; that alone already takes hours). If millions of files are involved, it becomes even worse and can take a day or even several days to complete.
So I wonder if you might be dealing with locations with higher latency here - locations that are completely unaffected with regard to making backups (local DD and local NW storage node at those locations in our case) but see a huge impact during restores. As expected, save set recoveries are not affected at all, as no sifting through client indexes is required whatsoever.
The above made me curious about your setup - could latency possibly be rearing its ugly head here?
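Just to illustrate the order of magnitude of that latency effect (the per-entry round trip below is an assumed value, not an exact measurement):

    # 300,000 client index entries at an assumed 0.15 s round trip each:
    echo $(( 300000 * 15 / 100 / 3600 )) hours    # -> about 12 hours before the restore even starts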
bingo.1
2.4K Posts
0
November 29th, 2021 04:00
Actually, I used a very simple isolated/dedicated test environment:
- ESXi 7U2 server with an SSD RAID datastore, hosting a
- DDVE 7.2 server
- Windows 2019 Server with NW 19.3.0.4 as NW client
using various file systems containing between 100k and 10M tiny files (100 bytes each)
- Windows 2019 Server with NW 19.3.0.4 as NW server/snode
I defined 5 DDBoost backup devices - each hosting one of the backed up file systems. The largest save set was only about 5GB. Then I cloned these 5 save sets/volumes each to a separate device/volume on
- a DDVE based AFTD
- a local attached AFTD
- a local attached FTD
Each clone process performed 'normally' - even the largest backup did not take longer than 5 minutes. Now I had everything ready to start my 'scanner -i' tests. The outcome was that scanning the local devices was fine, but the result for the DDVE-based devices was unacceptable: the larger DDBoost devices took hours or even days to finish, and for the DDVE-based AFTDs the result was even worse. Fortunately, you can already verify the behavior with a test volume of 1k files - the ratio is different but the result is the same.
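If anyone wants to reproduce this on a small scale, this is roughly the procedure I followed - paths, device and volume names are placeholders for your own setup, the file-generation loop is a Linux/bash example (on a Windows client you would create the files with an equivalent script), and the backup/clone itself is done with the usual NW workflows:

    # 1. create a directory with 100k tiny files on the client:
    mkdir -p /testdata && cd /testdata
    for i in $(seq 1 100000); do head -c 100 /dev/urandom > file_$i; done
    # 2. back the directory up to a DDBoost device, then clone it to a local AFTD/FTD
    # 3. compare the scan times of the resulting volumes:
    time scanner -i <DDBoost_device>
    time scanner -i <local_AFTD_or_FTD_device>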
I see the real problem when you just start the 'default' scanner process at the end of the disaster recovery routine. Why? - because the admin will not receive any feedback until the process is over. And depending on your environment, this could take days, weeks or even months. And of course nobody will dare to interrupt the process.
If you need anything else, please feel free to ask.
bingo.1
2.4K Posts
0
January 12th, 2022 01:00
Since then I have had some time to verify when the issue appeared for the first time - it first showed up with NW 19.3.0:
- NW 19.2.0.4 - all is fine so far
- NW 19.3.0.0 could not execute scanner on DDBoost devices due to a missing libDDBoost.dll
- NW 19.3.0.1 has never been released
- NW 19.3.0.2 showed the effect for the first time.
BTW - the issue still exists in NW 19.6.0.0, which was released last weekend. So this big issue has now lived within the product for at least 18 months, and QA has neither detected nor solved it so far. When will they finally fix it, for the sake of all of us?
JChan2020
18 Posts
0
January 17th, 2022 00:00
Great, thanks for your information.
Would you mind sharing your test results on the different NW versions for comparison?
E.g. using "scanner -i" on DDBoost devices with save sets containing 100k files, how much time did it take on NW 19.2.0.4 vs NW 19.3.0.2 vs NW 19.6.0.0?
barry_beckers
393 Posts
0
January 17th, 2022 01:00
So testing was "only" done using a Windows-based NW server?
So it has not been tested nor verified on Linux (we use RHEL and NW NVE, which runs SuSE 12 SP5)?
Who knows, it might therefore be limited to Windows only. Or was there any confirmation that the same also occurs on Linux?
I don't know if I'd find the time any time soon to test this on a RHEL NW 19.5.0.5 and a NW NVE 19.5.0.5 test backup server... which don't back up much more than themselves.
bingo.1
2.4K Posts
1
January 17th, 2022 15:00
No - I have not tested/verified the issue for a NW server on Linux. And I deliberately refrain from doing so.
With all respect: pointing out an issue and clearly informing other users about an important problem is my mission - taking over Dell/EMC's quality assurance is not. At a certain point I expect them to take over and carry out all necessary steps to 'get the cow off the ice'. Instead of letting other people waste their time, they should be able to compare the source code and do whatever is necessary to solve the issue within a reasonable time frame.
If you want to read more, just download my translated document from this URL: https://www.avus-cr.de/gener785_eng.pdf
bingo.1
2.4K Posts
1
January 17th, 2022 15:00
Your request made up my mind, so I translated the document I had prepared for my German-speaking audience. You can get it via this URL: https://www.avus-cr.de/gener785_eng.pdf
Please let me know whether this is clear enough for you.
JChan2020
18 Posts
0
January 18th, 2022 01:00
Thanks. Your document is very detailed.
Though, similar to barry_beckers, we are running NW on RHEL, not Windows.
Besides DDBoost devices, we'll also need to test whether a similar issue affects scanning of tapes when we upgrade to NW 19.5.
bingo.1
2.4K Posts
0
February 20th, 2022 07:00
@barry_beckers
In a moment of mental weakness, I today verified that the issue also exists on a NW Linux server (CentOS 8.1 with NW 19.6.0.0, a DDVE 7.4 and a local FTD).
While a 100K save set took 25 secs to be scanned on a local FTD, the time needed for the DDBoost device was 15 mins.
So - do you want me to verify the behavior with an NVE as well? Will this make Dell/EMC support move?
bingo.1
2.4K Posts
0
February 20th, 2022 07:00
Maybe you will not see today's answer I gave to barry_beckers, so I will repeat it here:
In a moment of mental weakness, I today verified that the issue also exists on a NW Linux server (CentOS 8.1 with NW 19.6.0.0, a DDVE 7.4 and a local FTD).
While a 100K save set took 25 secs to be scanned on a local FTD, the time needed for the DDBoost device was 15 mins.
So - do you want me to verify the behavior with a NVE as well?
####################################################################
I do not expect any problem with tape drives, as they work with an OS- or vendor-based driver. This is the major difference.
bingo.1
2.4K Posts
0
April 23rd, 2022 08:00
This is just to confirm that the issue still persists with NW 19.6.0.3