Unsolved
nishiichinoe
12 Posts
0
September 13th, 2011 06:00
Performance of FPClip.Write()?
I noticed that writing the clip takes the most time. Have a look at this code:
(1) FPFileInputStream inputStream = new FPFileInputStream(oFile);
(2) newTag.BlobWrite(inputStream, FPLibraryConstants.FP_OPTION_CLIENT_CALCID); // stream the file content into the tag; the client calculates the content address
(3) inputStream.close();
(4) String clipID = theClip.Write(); // commit the clip (CDF) to the cluster and return its clip ID
The last line (4) takes by far the longest; I expected that BlobWrite() (2) would take the most time.
Here are the times for storing an 11 KB document:
(1) 0.000 seconds
(2) 0.032 seconds
(3) 0.000 seconds
(4) 2.405 seconds
How can I optimize that?
mfh2
208 Posts
0
September 15th, 2011 14:00
Hello Andreas -
I'm guessing that what you really want to do is optimize the number of files you can archive to Centera per second, rather than the time to write an individual file. With Centera, the answer is to increase the parallelism of your write transactions. I recommend you view the first 20 minutes or so of this presentation on Centera API Best Practices, which discusses your issue: https://community.emc.com/docs/DOC-3303
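As a rough illustration of that parallelism (a minimal sketch in the spirit of the advice, not code from the presentation: writeSingleFile() is a hypothetical helper standing in for your create-clip / BlobWrite / Write sequence, and the thread count is arbitrary):

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.filepool.fplibrary.FPPool;

// Sketch only: archive many files concurrently against one shared FPPool.
// writeSingleFile() is a hypothetical helper, not an SDK call.
void archiveInParallel(final FPPool thePool, File[] files) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(8); // 8 concurrent writers
    for (final File f : files) {
        executor.submit(new Runnable() {
            public void run() {
                try {
                    String clipID = writeSingleFile(thePool, f);
                    // persist clipID so the content can be retrieved later
                } catch (Exception e) {
                    e.printStackTrace(); // real code should log and retry
                }
            }
        });
    }
    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.HOURS); // wait for outstanding writes
}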
Best Regards,
Mike Horgan
nishiichinoe
12 Posts
0
September 15th, 2011 23:00
Hello Mike,
Thank you very much for your response!
I use parallelism for writing large files (> 5 MB) as recommended (BlobWritePartial). For large files I see an impressive performance improvement using BlobWritePartial with threading rather than plain BlobWrite.
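Roughly, that pattern looks like this (a simplified sketch: openPartialStream() is a hypothetical helper, and the exact BlobWritePartial signature and option value should be checked against the FPLibrary Javadoc for your SDK version):

// Each thread writes one segment of the blob via BlobWritePartial.
// Sequence IDs tell the cluster how to order the segments.
final long SEGMENT = 5L * 1024 * 1024; // 5 MB per segment
final long count = (oFile.length() + SEGMENT - 1) / SEGMENT;
Thread[] writers = new Thread[(int) count];
for (int i = 0; i < count; i++) {
    final long seq = i + 1;
    writers[i] = new Thread(new Runnable() {
        public void run() {
            try {
                // hypothetical helper: a stream over bytes [(seq-1)*SEGMENT, seq*SEGMENT) of the file
                FPStreamInterface part = openPartialStream(oFile, (seq - 1) * SEGMENT, SEGMENT);
                newTag.BlobWritePartial(part, 0, seq); // 0 = default options (assumed)
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
    writers[i].start();
}
for (Thread t : writers) {
    t.join(); // all segments must finish before theClip.Write()
}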
But we deal with small files (emails and PDFs). I've tried to do the sending within a thread, but I could not see any performance improvement. Also, we have only around 200 files each day, which is not that much.
I archive to Centera from within a web service, so the response time for storing a single file is what matters most to me, since the user is waiting for it.
Kind Regards,
Andreas
mckeown_paul
409 Posts
0
September 16th, 2011 03:00
From your previous posts I thought you were probably using embedded blobs, since you stated the longest time was taken by the clip write, but I thought I'd mention it anyway. This looks okay, although you might want to make FP_OPTION_BUFFERSIZE slightly larger, so that if you write a blob right at the embedded data threshold you don't exceed the buffer size once the small overhead of the CDF itself is added. Adding 64 KB to it won't do any harm.
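Concretely, with the 100 KB threshold from your settings, that would be something like:

// buffer = embedded data threshold + 64 KB headroom for the CDF overhead
thePool.setOption(FPLibraryConstants.FP_OPTION_BUFFERSIZE, 102400 + 65536);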
I'm not sure why the commit times vary so much. What sort of network do you have? Is the app server connected to the Centera on the same VLAN? Are there any network issues? Is it GbE? I would normally expect an embedded blob write to take on the order of 100 ms.
mckeown_paul
409 Posts
0
September 16th, 2011 03:00
Are you using the Embedded Blobs functionality? By default it is not used, and there is an IO for each blob written plus one for the Content Descriptor File (CDF - what the clip write commits to the Centera). Embedded Blobs base64-encodes the blob as an attribute in the CDF. This reduces the number of IOs for small files (< 100 KB), which can increase your throughput (as well as halving the number of objects used).
nishiichinoe
12 Posts
0
September 16th, 2011 03:00
I'm using the following settings:
FPPool.setGlobalOption(FPLibraryConstants.FP_OPTION_OPENSTRATEGY,
        FPLibraryConstants.FP_LAZY_OPEN);                                             // open pool connections lazily, on first use
FPPool.setGlobalOption(FPLibraryConstants.FP_OPTION_MAXCONNECTIONS, 500);             // allow up to 500 concurrent connections
FPPool.setGlobalOption(FPLibraryConstants.FP_OPTION_EMBEDDED_DATA_THRESHOLD, 102400); // embed blobs up to 100 KB in the CDF
thePool.setOption(FPLibraryConstants.FP_OPTION_BUFFERSIZE, 102400);                   // 100 KB internal transfer buffer
Does that look OK to you?
nishiichinoe
12 Posts
0
September 16th, 2011 04:00
We had an EMC technician in-house today and he found some problems on the Centera related to full nodes. We will get an update of CentraStar on Tuesday. Hopefully this solves our problems.
mfh2
208 Posts
0
September 16th, 2011 05:00
I think these system improvements will help - your performance is pretty good except for the 'outlier' case, which is 100X longer than your average write time. That can be a symptom of a cluster health problem.
The one other issue you should be aware of (and I suppose you already are) is the impact of writing many small objects to Centera. As you eventually upgrade to newer nodes with larger drives, the average object size needed to consume a node's storage capacity before exhausting its available object count increases. By using embedded blobs you are cutting your object count consumption in half, which will come close to solving this problem; the one other step you should consider is asking EMC to configure your Centera for 100M objects/node while they are doing the CentraStar upgrade. Taken together, these steps should eliminate object count concerns for your application environment.
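To put illustrative (purely hypothetical) numbers on that reasoning: a node with 4 TB of usable capacity and a 50M object limit runs out of object slots before capacity whenever the average object size is below 4 TB / 50M ≈ 80 KB; raising the limit to 100M objects halves that break-even size to roughly 40 KB, and embedded blobs halve the slots consumed per file on top of that.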
Best of luck,
Mike Horgan