February 1st, 2013 08:00
The probe limit time was reached error in multi-threaded application
Hi
I am trying to migrate moderately sized (500 kB) files to Centera using multiple threads and am receiving the following error:
com.filepool.fplibrary.FPLibraryException: The probe limit time was reached
at com.filepool.fplibrary.FPTag.BlobWrite(Unknown Source)
Error code: -10212
Some of the writes succeed, some fail.
The import application uses code based on the Centera SDK sample (StoreContent), with threads added so that multiple files are written at the same time. The problem does not occur with 3 threads, but appears when I use 10 concurrent threads (10 files imported at the same time).
The options are:
FPPool.setGlobalOption(FPLibraryConstants.FP_OPTION_OPENSTRATEGY,
FPLibraryConstants.FP_NORMAL_OPEN);
thePool.setOption(FPLibraryConstants.FP_OPTION_BUFFERSIZE, 1024 * 5000);
I have turned on the Centera SDK logging and the only interesting lines I see there are the following:
1359726523649 2013-02-01 13:48:43.649 | [debug] | 807158.258 | [EXCEPTION] | In 'Connection.cpp' at line 994: Exception |
error=-10101
syserror=0
message=receive: waitForReadingData(1000) returned zero
trace=No trace available
1359726523649 2013-02-01 13:48:43.649 | [debug] | 807158.258 | [RETRY] retry (0) Probe because Exception |
error=-10101
syserror=0
message=receive: waitForReadingData(1000) returned zero
trace=FPDatagramSocket.receive (timeout=1000)
transid=ourServerName/1/PROBE
Application host system: AIX
Centera SDK: Centera_SDK_AIX-5_3
Has anyone encountered this issue?
Do you know the cause of the issue or how to fix it?
Thank you
Michal
mfh2
February 4th, 2013 06:00
Hello Michal -
One simple question: are you testing against a local Centera or one of the online clusters? Could it be a simple issue of needing to increase the UDP probe timeout to compensate for a long network delay?
Best Regards,
Mike Horgan
miso1
February 4th, 2013 06:00
I am testing against a Centera within our datacenter. Latency might be an issue. By increasing the UDP probe limit, do you mean setting FP_OPTION_PROBE_LIMIT?
In the meantime I have mitigated the problem by increasing the buffer size to 5 MB (it used to be 5 kB), increasing the number of retries (FP_OPTION_RETRYCOUNT), and adding retries at the application level when I encounter this exception.
After increasing the buffer size, the timeouts became rarer (maybe 1 per 100 files), and when I retry at the application level, the document gets through. However, it worries me that I have not removed the real cause of the issue.
Do you have any idea how to pinpoint/resolve the actual issue?
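For readers who want to reproduce the application-level retry mentioned above, here is a minimal sketch. It is not the code from the import application: the class name RetryingWrite, the method withRetries, and the doubling delay are all assumptions made for illustration, and the write itself is passed in as a generic Callable so the sketch compiles without the Centera SDK. In a real importer you would probably retry only on the FPLibraryException carrying the probe-limit error rather than on every exception.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the Centera SDK.
class RetryingWrite {

    /**
     * Runs op and retries it when it throws, up to maxAttempts attempts in total.
     * The pause between attempts starts at initialDelayMs and doubles each time.
     * If every attempt fails, the last exception is rethrown to the caller.
     */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry an interrupted thread; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        throw last;
    }
}
```

Each worker thread would wrap its own write in a call such as withRetries(writeTask, 5, 200), where writeTask is whatever Callable performs the store for one file. Retrying like this only hides the symptom, which is why the underlying cause still needs to be found.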
mfh2
February 4th, 2013 08:00
Yes, I would probably increase the probe limit to 10000 ms as a test, although this is a rarely-altered SDK parameter and shouldn't need changing unless you were doing something like talking over a very-high-latency WAN link.
The 5KB CDF buffer was far too small, so it's good that this value was increased. If you only have 1 blob per clip in your application then a value of 200KB for that buffer should be sufficient unless you are writing a tremendous amount of app-specific metadata. Specifying a 5MB buffer will cause that amount to be allocated in a static buffer for each thread; not a big deal but kind of wasteful, especially at high thread counts.
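To put rough numbers on the per-thread cost, here is a small sketch of the arithmetic. The class and method names are made up for illustration, and it assumes exactly one buffer of the configured size per thread, as described above.

```java
// Hypothetical helper for estimating buffer memory; not part of the Centera SDK.
class CdfBufferFootprint {

    /** Total bytes reserved when each thread holds one buffer of bufferBytes. */
    static long totalBytes(int threads, long bufferBytes) {
        return (long) threads * bufferBytes;
    }
}
```

With the 10 threads and the 1024 * 5000 byte setting from the original post, totalBytes(10, 1024L * 5000) comes to 51,200,000 bytes, whereas a 200KB buffer gives totalBytes(10, 200L * 1024), or 2,048,000 bytes.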
In terms of pinpointing, the only way I know of to track this down is to record an SDK log showing the problem and then decompose it into individual logs for each thread (using the process.thread value in the third column in your table). Look for commonalities in the thread traces that exhibit the problem. The SDK's UDP probe protocol is a very simple call/response and it should not encounter issues in network environments that are functioning correctly. If you see a lot of retry activity, even on threads that show ultimately successful transactions, that also points to possible network issues.
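As a rough illustration of that decomposition, the sketch below groups log lines by the process.thread value and counts [RETRY] entries per thread. It assumes the layout seen in the excerpt earlier in this thread: columns separated by "|", the third column of the form digits.digits, and continuation lines (error=, message=, trace=) belonging to the entry just before them. The class and method names are hypothetical, and a log in which entries from different threads interleave line by line would need more careful handling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical log helper; the column layout is inferred from the posted excerpt.
class SdkLogSplitter {

    // Assumed shape of the third column: "<process>.<thread>", e.g. "807158.258".
    private static final Pattern PROCESS_THREAD = Pattern.compile("\\d+\\.\\d+");

    /** Returns the process.thread id if the line starts a log entry, else null. */
    static String threadIdOf(String line) {
        String[] cols = line.split("\\|");
        if (cols.length < 3) {
            return null;
        }
        String candidate = cols[2].trim();
        return PROCESS_THREAD.matcher(candidate).matches() ? candidate : null;
    }

    /** Groups lines by thread; lines without a header join the preceding entry. */
    static Map<String, List<String>> splitByThread(List<String> lines) {
        Map<String, List<String>> byThread = new LinkedHashMap<>();
        String current = null;
        for (String line : lines) {
            String id = threadIdOf(line);
            if (id != null) {
                current = id;
            }
            if (current == null) {
                continue; // skip any preamble before the first recognizable entry
            }
            byThread.computeIfAbsent(current, k -> new ArrayList<>()).add(line);
        }
        return byThread;
    }

    /** Counts the lines tagged [RETRY] for each thread. */
    static Map<String, Integer> retryCounts(Map<String, List<String>> byThread) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : byThread.entrySet()) {
            int retries = 0;
            for (String line : entry.getValue()) {
                if (line.contains("[RETRY]")) {
                    retries++;
                }
            }
            counts.put(entry.getKey(), retries);
        }
        return counts;
    }
}
```

Feeding it the lines of the SDK log (for example from Files.readAllLines) gives one list per thread that can be written out or compared. Threads with high retry counts, including those whose transactions eventually succeed, are the ones to examine first.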
Best Regards,
Mike Horgan