Unsolved
akenapalli
3 Posts
0
2132
August 5th, 2011 11:00
Performance Issue
Writing a 30KB clip now takes about 200ms, versus roughly 20ms a few weeks ago. We are inserting images into the CAS storage and have written about 20 million clips so far (over 4 weeks). Starting about a week ago we have been seeing a write performance problem: each clip takes around 200 milliseconds. I captured the SDK log and would like an SDK engineer to analyze it and see where the bottleneck is.
mfh2
208 Posts
0
August 5th, 2011 12:00
This could simply be related to your Centera filling up with objects. As the system databases grow the response time for individual operations tends to increase. I've noticed a perceptible drop in write performance somewhere around 30M obj/node, but this is very dependent on the Centera configuration and access pattern. One way to compensate is to increase the number of write threads to take advantage of the Centera RAIN architecture.
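Purely as a sketch of that pattern (writeClip() and loadClips() below are hypothetical placeholders for your existing single-clip write path and data source, not SDK calls), something like this keeps several writes in flight at once:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelClipWriter {

    // Hypothetical stand-in for your existing single-clip write path
    // (open pool, create clip, write blob, write clip). Not an SDK call.
    static void writeClip(byte[] content) {
        // ... your existing SDK/XAM write code for one clip goes here ...
    }

    public static void main(String[] args) throws InterruptedException {
        List<byte[]> clips = loadClips();   // hypothetical: wherever your 30KB payloads come from
        int threads = 8;                    // tune this; more threads spread the load across the RAIN nodes

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (byte[] clip : clips) {
            pool.submit(() -> writeClip(clip));   // one clip per task, written concurrently
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    static List<byte[]> loadClips() { return List.of(); }   // stub for illustration
}

The point is simply that aggregate throughput can improve even if the latency of each individual write stays the same.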
Regards,
Mike Horgan
mckeown_paul
409 Posts
0
August 5th, 2011 13:00
Duh, sorry, you DID mention the average size!
I would expect a 30KB clip to ingest faster than 200ms, but 200ms is not way out of line.
How many IO threads are you using?
mckeown_paul
409 Posts
0
August 5th, 2011 13:00
As Mike mentioned, at around a few tens of millions of objects the write times will increase, but they shouldn't tail off again until you get towards the high object count supported.
One thing you never mentioned is the average size of the blobs you are writing. Also, what naming scheme are you using?
If you want someone to check an SDK log, I would recommend you open a support call with EMC support. However, looking at the log will probably just confirm what you are seeing; if there were a performance problem in the cluster, the SDK log might not show anything.
akenapalli
3 Posts
0
August 5th, 2011 13:00
Performance deterioration started when the object count reached around 15 million.
A few questions:
1. What is the high object count?
2. What will the performance be when we reach around 50M? We plan to add around 50 million objects in total.
3. I do not understand the naming scheme question.
Also, we opened a service ticket with EMC (Service Request 40735918), and the engineer I worked with is Sooraj Kumar. He looked at the write stats on our CAS server and sent us the attached readwrite_perf.csv.
The SDK log file is around 70MB and I am unable to attach it. Is it possible to get it from the EMC engineer (Sooraj Kumar) who has it? Otherwise, please send me an FTP link and I will upload it.
1 Attachment
readwrite_perf_2425.csv
mckeown_paul
409 Posts
0
August 7th, 2011 11:00
High Object Count is the maximum number of objects supported on a single node. It is currently 100M on the latest hardware generation, GEN4LP.
The CSV you attached shows an average write time of 87-88 msec, which is a reasonable number.
There are (basically) two naming schemes in use on Centera to calculate the content address of blobs: M++, which uses a combination of MD5 and SHA-256 to generate the content address, and GM, which uses MD5 and then also adds timestamp and cluster info.
The reason we have two is that with M++ (and its predecessor M), content addresses are essentially random, with no order to them, whereas GM adds in timestamp info which makes them more "sequential". When random keys are inserted into indexes, the index tends to get deep and sparse, so updates take longer than with more sequential keys. So the recommendation is: if you don't care about single-instancing your objects, use the GM naming scheme (storage strategy "performance").
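To make the random-vs-sequential point concrete, here is a toy illustration (my own sketch, nothing to do with Centera's actual index code) that counts how many distinct index "pages" a batch of inserts touches with each kind of key:

import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Toy model only -- not Centera internals. It treats the first few characters of a key
// as the index "page" it lands on and counts how many distinct pages a batch of inserts
// touches. Hash-like (random) keys scatter across the whole index, while
// timestamp-ordered keys stay clustered.
public class KeyLocalityDemo {
    public static void main(String[] args) {
        int batch = 10_000;
        int prefixLen = 4;   // crude stand-in for an index page / subtree

        Set<String> randomPages = new HashSet<>();
        Set<String> sequentialPages = new HashSet<>();

        long t = System.currentTimeMillis();
        for (int i = 0; i < batch; i++) {
            String randomKey = UUID.randomUUID().toString().replace("-", "");   // random, hash-like
            String sequentialKey = String.format("%020d", t + i);               // ordered, timestamp-like
            randomPages.add(randomKey.substring(0, prefixLen));
            sequentialPages.add(sequentialKey.substring(0, prefixLen));
        }

        System.out.println("pages touched by random keys:     " + randomPages.size());
        System.out.println("pages touched by sequential keys: " + sequentialPages.size());
    }
}

With hash-like keys almost every insert lands on a different page, while the timestamp-like keys all cluster on a handful of pages, which is why the latter are cheaper to index.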
I would recommend you give the SDK log to the customer service engineer who is working your service request. This is a developer forum, and support issues like this need to be worked through EMC support's procedures.
mfh2
208 Posts
0
August 12th, 2011 15:00
There is a service tool which can be used to 'exercise' a Centera for this type of performance investigation; since your icon indicates you are an EMC employee, Paul may be able to provide more information offline.
Good Luck,
Mike Horgan
akenapalli
3 Posts
0
August 19th, 2011 09:00
I have highlighted the lines in red (in the attached file pru_cas_perf_issue.log) where the XAM API seems to be taking too much time. What is happening at those lines and why is it taking that much time?
Thank you.
1 Attachment
pru_cas_sdk_perf_issue.log
kim_marivoet
41 Posts
0
August 22nd, 2011 12:00
I might be completely wrong here, but from the log it seems like you're using a 'slow' naming scheme for writing blobs and clips:
<<<<<<<<<<<<<<<<<<<<<<<<<
[PACKET] send SmartPacket
NET_SYSTEMID type=string value=PAERSCBBLA0698
NET_TRANSACTIONID type=string value=PAERSCBBLA0698/15/WRITE_BLOB
NET_VERSION type=integer value=3 HPP_CLIENT_VERSION type=integer value=262146 fieldcode=301 type=integer value=65538 NET_MESSAGEID type=integer value=42 HPP_VERSION type=integer value=1 HPP_OFFSET type=long value=35352 HPP_SEGDATA type=bytearray value=0
HPP_CONTROL type=integer value=1 HPP_BLOBID type=string value=6UEFG8CN3HMHJxFFQAOPB7RHKIL
1311998542579 2011-07-30 04:02:22.579 [debug] 5148.9096 [PACKET] send SmartPacket
NET_SYSTEMID type=string value=PAERSCBBLA0698
NET_TRANSACTIONID type=string value=PAERSCBBLA0698/18/WRITE_CLIP
NET_VERSION type=integer value=3 HPP_CLIENT_VERSION type=integer value=262146 fieldcode=301 type=integer value=65538 NET_MESSAGEID type=integer value=42 HPP_VERSION type=integer value=1 HPP_OFFSET type=long value=1456 HPP_SEGDATA type=bytearray value=0
HPP_CONTROL type=integer value=1 HPP_BLOBID type=string value=DD0RL6B6NI1RCe4EL52RHAN4BHP
The BlobId (6UEFG8CN3HMHJxFFQAOPB7RHKIL) and the clip id (DD0RL6B6NI1RCe4EL52RHAN4BHP) are both short ones. This probably means that the cluster is using the old MD5 naming scheme, which suffers from performance degradation. What I'm not 100% sure of is whether these clip/blobids are the complete blob/clipids (including the discriminator part), so don't shoot me if I'm on the wrong track here. EMC service will probably be able to tell you that.
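If you want to sanity-check that across the whole 70MB log rather than one excerpt, a small sketch along these lines (written against the field format visible in the excerpt above; the class name is just illustrative) would tally the blob ID lengths it finds:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Tallies the lengths of the HPP_BLOBID values found in an SDK log. The field name
// is taken from the log excerpt above; mapping a given length to a particular naming
// scheme is something EMC support would need to confirm.
public class BlobIdLengthScan {
    public static void main(String[] args) throws IOException {
        Pattern idField = Pattern.compile("HPP_BLOBID type=string value=(\\S+)");
        Map<Integer, Integer> lengthCounts = new TreeMap<>();

        for (String line : Files.readAllLines(Path.of(args[0]))) {
            Matcher m = idField.matcher(line);
            while (m.find()) {
                lengthCounts.merge(m.group(1).length(), 1, Integer::sum);
            }
        }

        lengthCounts.forEach((len, count) ->
                System.out.println("ID length " + len + ": " + count + " occurrence(s)"));
    }
}

Run it as: java BlobIdLengthScan pru_cas_sdk_perf_issue.log, and compare the lengths it reports with what EMC support tells you about the two naming schemes.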