December 14th, 2009 14:00
Relationship between threads and FP_OPTION_MAXCONNECTIONS
Hi,
I've inherited an application that calls out to Centera. The application is clustered across 4 WebLogic nodes. As far as I can work out from the (Java) code, a single FPPool object is created within each node, so 4 FPPool objects are used across the clustered app. There are only 2 threads within the app that use the pool (basically driven by single-instance MDBs, each consuming messages from a separate JMS queue). So in total I think there are 8 threads across the cluster that are trying to access the pool.
I've also discovered that FP_OPTION_MAXCONNECTIONS is set (globally) to 500. Apart from the value being set in the code there is no indication of why this figure was chosen. Can anybody tell me what this option actually controls?
Secondly, from what I've read about Centera so far, I think we are using too few threads to call into the SDK. We need to store several thousand smallish files every day. I think we have 2 Access Nodes, so I'm guessing we could go up to about 40 threads in total across the cluster (presuming no other app is also accessing the Access Nodes). Would that make more efficient use of the resources?
And finally, I understand the number of threads accessing the SDK was throttled because, historically, the application logs filled up with exceptions thrown when calling the SDK's writeBlob method. This was attributed to a 'flaky' product and to the SDK's management of the connections to the physical cluster. I'm not sure this is the case, so any indication of how I can start to investigate these errors would be gratefully received.
thanks in advance.


gstuartemc
December 15th, 2009 03:00
Hi Paul - the SDK allocates a collection of sockets when a Pool connection is made. It then picks an unused one each time a transaction needs to access the cluster over the network. So it's not strictly a 1:1 relationship between sockets and threads. It is more likely that there are fewer sockets in use than concurrent threads of access, unless all threads are performing active transactions over the wire at the same moment (which would be 1:1).
If you have 2 Access Nodes then your app could use 40-50 threads of access (in total for reading and writing), and this would certainly improve your application performance. In this case, the default value of 100 for MAXCONNECTIONS would be sufficient.
I suggest you revert to this and then we can work on any write errors if / when they occur.
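The socket/thread relationship described above can be illustrated with a toy model. This is only a sketch and does not use the Centera SDK: the class SocketPoolModel, its Semaphore-based cap and the 2 ms hold time are all assumptions made for illustration. It shows that the peak number of sockets in use is bounded by whichever is smaller, the number of worker threads or the connection cap.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: NOT the Centera SDK. It illustrates sockets being
// borrowed per transaction rather than owned per thread.
class SocketPoolModel {
    private final Semaphore sockets;                          // stands in for the max-connections cap
    private final AtomicInteger inUse = new AtomicInteger();  // sockets borrowed right now
    private final AtomicInteger peak = new AtomicInteger();   // highest value inUse has reached

    SocketPoolModel(int maxConnections) {
        this.sockets = new Semaphore(maxConnections);
    }

    // One simulated transaction: borrow a socket, hold it briefly, return it.
    void transact(long holdMillis) throws InterruptedException {
        sockets.acquire();
        try {
            int now = inUse.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(holdMillis);
        } finally {
            inUse.decrementAndGet();
            sockets.release();
        }
    }

    // Runs `threads` workers, each doing `perThread` transactions,
    // and returns the peak number of sockets in use at once.
    static int simulate(int threads, int maxConnections, int perThread) {
        SocketPoolModel pool = new SocketPoolModel(maxConnections);
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            workers.submit(() -> {
                try {
                    for (int i = 0; i < perThread; i++) {
                        pool.transact(2);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        workers.shutdown();
        try {
            workers.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.peak.get();
    }
}
```

In this model, SocketPoolModel.simulate(10, 100, 5) can never report more than 10 sockets in use, which is why a cap of 100 leaves plenty of headroom for 40-50 threads, while a cap of 500 changes nothing.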
paulb102
December 15th, 2009 06:00
hi - thanks for the quick response.
I'll increase the number of threads calling into the FPPool, lower the max connections, and see what happens.
As an aside - I assume if MAX_CONNECTIONS is set to 500 this doesn't mean that 500 connections are allocated for use immediately.
Thanks again.
gstuartemc
December 15th, 2009 09:00
paulb102
December 18th, 2009 02:00
thanks again.
the learning continues.......
I'm now seeing multiple occurrences of 2 exceptions:
com.filepool.fplibrary.FPLibraryException: Error on network socket (transid='xxxxx/321/WRITE_BLOB')
at com.filepool.fplibrary.FPTag.BlobWrite(Unknown Source)
at com.filepool.fplibrary.FPTag.BlobWrite(Unknown Source)
and
com.filepool.fplibrary.FPLibraryException: Error received from send(...) (transid='xxxxx/1712/WRITE_BLOB')
at com.filepool.fplibrary.FPTag.BlobWrite(Unknown Source)
at com.filepool.fplibrary.FPTag.BlobWrite(Unknown Source)
Any tips on how I can start to investigate what the underlying cause may be or what these exceptions are a symptom of?
Much appreciated.
gstuartemc
December 18th, 2009 08:00
These are symptomatic of a saturated network or overloaded cluster. Are you using one of the public clusters or a local one?
Also - you should be catching FPLibraryExceptions and printing out the error code and message. That will be more helpful in diagnosing any issues than just a simple Java stack trace.
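A minimal sketch of that kind of reporting is below. So that it compiles without the SDK on the classpath, StorageException is a hypothetical stand-in for FPLibraryException, and its accessor names (getErrorCode, getErrorString) are assumptions that should be checked against the SDK Javadoc. ErrorReport and its Write interface are likewise illustrative names only.

```java
// Hypothetical stand-in for the SDK exception, so the sketch is self-contained.
class StorageException extends Exception {
    private final int errorCode;
    private final String errorString;

    StorageException(int errorCode, String errorString, String message) {
        super(message);
        this.errorCode = errorCode;
        this.errorString = errorString;
    }

    int getErrorCode() { return errorCode; }
    String getErrorString() { return errorString; }
}

class ErrorReport {
    // Any operation that may fail with the storage exception (e.g. a blob write).
    interface Write {
        void run() throws StorageException;
    }

    // Builds one log line carrying the code, symbolic name and message,
    // which is more useful for diagnosis than a bare stack trace.
    static String describe(StorageException e) {
        return "Error Code is [" + e.getErrorCode() + "] "
             + "Error String is [" + e.getErrorString() + "] "
             + "Error Message is [" + e.getMessage() + "]";
    }

    // Runs the operation and returns "OK" or the formatted error line.
    static String attempt(Write op) {
        try {
            op.run();
            return "OK";
        } catch (StorageException e) {
            return describe(e);
        }
    }
}
```

The point of the design is to catch the specific exception type at the call site and log its structured fields, rather than letting a generic handler print only the trace.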
paulb102
December 21st, 2009 01:00
thanks - the poor error handling is unfortunately an indication of the quality of the code I've inherited. I'll tidy that bit up today and see what info I can get from the returned exceptions.
Update - the error returned contains the following:
Error Code is [-10101]
Error String is [FP_SOCKET_ERR]
Error Message is [Error received from send(...) (transid='XXXXX/131/WRITE_BLOB')]
Client Error [false]
Network Error is [false]
Sever Code is [false]
FPSocket.send
It is a private cluster and we are seeing those errors in live. In an attempt to understand the code, the SDK, Centera and ultimately the root cause of this issue I've set up my own test harness pointing at one of our 'test' Centera clusters. If I apply load through 8 concurrent JVMs, each with a single thread calling into the SDK then the error doesn't occur. If however I apply the same load through 4 JVMs each with 2 threads calling into a singleton SDK instance then I see the errors - which (to a layman) points to threading issues within the SDK rather than the cluster itself.
I'll keep investigating (and learning) :-).
thanks again.
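A test harness along the lines described above might look like the following sketch. It is not the poster's harness: LoadHarness and the BlobWriter interface are hypothetical names, and BlobWriter stands in for whatever wraps the real SDK write call. All worker threads share one writer instance, and outcomes are tallied by exception type so that runs with 1 thread per JVM and 2 threads per JVM can be compared directly.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LoadHarness {
    // Hypothetical stand-in for the code that wraps the SDK write call.
    interface BlobWriter {
        void write(byte[] data) throws Exception;
    }

    // Drives `threads` workers against ONE shared writer and returns a tally
    // keyed by "OK" or by the simple class name of the exception thrown.
    static Map<String, Integer> run(BlobWriter shared, int threads, int writesPerThread, int blobSize) {
        ConcurrentHashMap<String, AtomicInteger> tally = new ConcurrentHashMap<>();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);   // release all workers together

        for (int t = 0; t < threads; t++) {
            workers.submit(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < writesPerThread; i++) {
                    String outcome;
                    try {
                        shared.write(new byte[blobSize]);
                        outcome = "OK";
                    } catch (Exception e) {
                        outcome = e.getClass().getSimpleName();
                    }
                    tally.computeIfAbsent(outcome, k -> new AtomicInteger()).incrementAndGet();
                }
            });
        }

        start.countDown();
        workers.shutdown();
        try {
            workers.awaitTermination(60, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        Map<String, Integer> result = new TreeMap<>();
        tally.forEach((k, v) -> result.put(k, v.get()));
        return result;
    }
}
```

Running the same total load with threads set to 1 and then 2 inside a single JVM would help separate the effect of sharing one SDK instance between threads from the effect of overall load on the cluster.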