August 5th, 2011 11:00

Performance Issue

Writing a 30KB clip is taking about 200 ms, versus 20 ms a few weeks ago. We are inserting images into the CAS storage and have inserted about 20 million clips so far (over 4 weeks). Starting about a week ago, we began seeing a write performance issue, with each clip taking 200 milliseconds. I captured the SDK log and would like an SDK engineer to analyze it and see where the bottleneck is.

208 Posts

August 5th, 2011 12:00

This could simply be related to your Centera filling up with objects. As the system databases grow, the response time for individual operations tends to increase. I've noticed a perceptible drop in write performance somewhere around 30M objects per node, but this is very dependent on the Centera configuration and access pattern. One way to compensate is to increase the number of write threads to take advantage of the Centera RAIN architecture.
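A minimal sketch of the write-thread idea, in Python. It does not use the Centera SDK: `write_clip` is a hypothetical stand-in that just sleeps to mimic per-write latency, and the 20 ms figure and thread counts are assumptions chosen for illustration. It only shows that when each write is dominated by waiting, several writer threads raise total throughput.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def write_clip(payload: bytes) -> int:
    """Hypothetical stand-in for an SDK write; sleeps to mimic latency."""
    time.sleep(0.02)  # assumed per-write latency, not a measured value
    return len(payload)


def ingest(payloads, threads):
    """Write all payloads using a pool of `threads` workers.

    Returns the list of written sizes and the elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        sizes = list(pool.map(write_clip, payloads))
    return sizes, time.perf_counter() - start


# Sixteen 30 KB payloads, written first with one thread, then with eight.
payloads = [b"x" * 30_000 for _ in range(16)]
_, serial_time = ingest(payloads, threads=1)
sizes, parallel_time = ingest(payloads, threads=8)

print(f"1 thread:  {serial_time:.2f}s")
print(f"8 threads: {parallel_time:.2f}s")
```

How far this scales on a real cluster depends on the configuration and access pattern, so the thread count is something to measure rather than assume.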

Regards,

Mike Horgan

409 Posts

August 5th, 2011 13:00

Duh, sorry, you DID mention the average size!

For 30KB I would expect ingest to be faster than 200ms, but 200ms is not way out of order.

How many IO threads are you using?
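As a rough back-of-the-envelope check on why the thread count matters, the figures quoted in this thread (about 20 million clips in 4 weeks, 20 ms versus 200 ms per write) can be turned into a minimum number of concurrent writers. This sketch assumes round-the-clock ingest at a steady rate and ideal scaling across threads, neither of which is confirmed here; the function name is made up for illustration.

```python
import math

SECONDS_PER_WEEK = 7 * 24 * 3600


def min_writer_threads(clips: int, weeks: float, latency_s: float) -> int:
    """Smallest thread count that sustains the average ingest rate.

    Assumes steady 24x7 ingest and perfect scaling across threads.
    """
    required_rate = clips / (weeks * SECONDS_PER_WEEK)  # clips per second
    per_thread_rate = 1.0 / latency_s                   # clips per second
    return math.ceil(required_rate / per_thread_rate)


avg_rate = 20_000_000 / (4 * SECONDS_PER_WEEK)
print(f"average rate: {avg_rate:.2f} clips/s")
print("threads needed at 20 ms: ", min_writer_threads(20_000_000, 4, 0.020))
print("threads needed at 200 ms:", min_writer_threads(20_000_000, 4, 0.200))
```

Under these assumptions a single writer keeps up at 20 ms per clip but not at 200 ms, which is why the number of IO threads in use is worth knowing.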

409 Posts

August 5th, 2011 13:00

As Mike mentioned, at around a few tens of millions of objects the performance will drop, but it shouldn't tail off again until you get towards the high object count supported.

One thing you never mentioned is the average size of the blobs you are writing.  Also what naming scheme are you using?

If you want someone to check an SDK log, I would recommend you open a support call with EMC support. However, looking at the log will probably just confirm what you are seeing; if there is a performance problem in the cluster, the SDK log may not show anything.

3 Posts

August 5th, 2011 13:00

Performance deterioration started when the object count reached around 15 million.

A few questions:

1. What is the high object count?
2. What will the performance be when we reach around 50M? We plan to add around 50 million objects.
3. I do not understand the naming scheme question.

Also, we opened a service ticket with EMC (Service Request: 40735918), and the engineer I worked with is Sooraj Kumar. He looked at the write stats on our CAS server and sent us the attached readwrite_perf.csv.

The SDK log file is around 70MB and I am unable to attach it. Is it possible to get it from the other EMC engineer (Sooraj Kumar), who has it? Otherwise, please send me an FTP link and I can upload it.

1 Attachment

409 Posts

August 7th, 2011 11:00

High Object Count is the maximum number of objects supported on a single node. This is currently 100M on the latest hardware generation, GEN4LP.

The csv you attached shows an average write time of 87-88 ms, which is a reasonable number.

There are (basically) two naming schemes in use on Centera to calculate the content address of blobs: M++, which uses a combination of MD5 and SHA256 to generate the content address, and GM, which uses MD5 and then also adds timestamp and cluster info. The reason why we have two is that with M++ (and its predecessor M), content addresses are very random, with no order to them, whereas GM adds in timestamp info, which makes them more "sequential". When inserting random keys into indexes, the index tends to get deep and sparse, so updates take longer than if the keys are more sequential. So the recommendation is that if you don't care about single instancing your objects, then use the GM naming scheme, or "Storage Strategy Performance".

I would recommend you give the SDK log to the customer service engineer who is working your service request. This is a developer forum, and support issues like this need to be worked through EMC support's procedures.
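To make the random-versus-sequential point concrete, here is a small Python sketch. It is not the actual M++ or GM algorithm: `hash_key` and `time_key` are hypothetical key layouts (an MD5 hex digest alone, and the same digest behind a fixed-width millisecond timestamp prefix), and a sorted list stands in for a real index. It counts how many inserts land at the end of the index, which is the cheap case, and shows why a timestamped key gives up single instancing.

```python
import bisect
import hashlib


def hash_key(content: bytes) -> str:
    """Content-only key: identical content always yields the same key."""
    return hashlib.md5(content).hexdigest()


def time_key(content: bytes, timestamp_ms: int) -> str:
    """Hypothetical timestamp-prefixed key; sorts in write order."""
    return f"{timestamp_ms:013d}-{hashlib.md5(content).hexdigest()}"


def count_appends(keys) -> int:
    """Insert keys into a sorted list; count inserts that land at the end."""
    index, appends = [], 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            appends += 1
        index.insert(pos, key)
    return appends


contents = [f"image-{i}".encode() for i in range(1000)]
hash_appends = count_appends(hash_key(c) for c in contents)
time_appends = count_appends(
    time_key(c, 1311998542579 + i) for i, c in enumerate(contents)
)
print("appends with hash-only keys:   ", hash_appends)
print("appends with timestamped keys: ", time_appends)

# Single instancing: the same content written twice.
same_hash = hash_key(b"same") == hash_key(b"same")
same_time = time_key(b"same", 1000) == time_key(b"same", 1001)
print("hash-only keys collide:", same_hash, "| timestamped keys collide:", same_time)
```

With hash-only keys almost every insert falls somewhere in the middle of the index, while timestamped keys always append. The trade-off is that two writes of the same content no longer share a key.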

208 Posts

August 12th, 2011 15:00

There is a service tool which can be used to 'exercise' a Centera for this type of performance investigation; since your icon indicates you are an EMC employee Paul may be able to provide more information offline.

Good Luck,

Mike Horgan

3 Posts

August 19th, 2011 09:00

I have highlighted in red the lines (in the attached file pru_cas_perf_issue.log) where the XAM API seems to be taking too much time. What is happening at those lines, and why is it taking that much time?

Thank you.

1 Attachment

41 Posts

August 22nd, 2011 12:00

I might be completely wrong here, but from the log it seems like you're using a 'slow' naming scheme for writing blobs and clips:

<<<<<<<<<<<<<<<<<<<<<<<<<

[PACKET]     send SmartPacket

NET_SYSTEMID type=string value=PAERSCBBLA0698

NET_TRANSACTIONID type=string value=PAERSCBBLA0698/15/WRITE_BLOB

NET_VERSION type=integer value=3  HPP_CLIENT_VERSION type=integer value=262146  fieldcode=301 type=integer value=65538  NET_MESSAGEID type=integer value=42  HPP_VERSION type=integer value=1  HPP_OFFSET type=long value=35352  HPP_SEGDATA type=bytearray value=0

HPP_CONTROL type=integer value=1  HPP_BLOBID type=string value=6UEFG8CN3HMHJxFFQAOPB7RHKIL

1311998542579     2011-07-30 04:02:22.579          [debug]          5148.9096     [PACKET]     send SmartPacket

NET_SYSTEMID type=string value=PAERSCBBLA0698

NET_TRANSACTIONID type=string value=PAERSCBBLA0698/18/WRITE_CLIP

NET_VERSION type=integer value=3  HPP_CLIENT_VERSION type=integer value=262146  fieldcode=301 type=integer value=65538  NET_MESSAGEID type=integer value=42  HPP_VERSION type=integer value=1  HPP_OFFSET type=long value=1456  HPP_SEGDATA type=bytearray value=0

HPP_CONTROL type=integer value=1  HPP_BLOBID type=string value=DD0RL6B6NI1RCe4EL52RHAN4BHP

The BlobId (6UEFG8CN3HMHJxFFQAOPB7RHKIL) and the clip id (DD0RL6B6NI1RCe4EL52RHAN4BHP) are both short ones. This probably means that the cluster is using the old MD5 naming scheme, which suffers from performance degradation. What I'm not 100% sure of is whether these clip/blobids are the complete blob/clipids (including the discriminator part), so don't shoot me if I'm on the wrong track here. EMC service will probably be able to tell you that.
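For anyone who wants to run the same check across a whole SDK log, here is a minimal Python sketch that pulls the transaction type and `HPP_BLOBID` out of packet dumps shaped like the excerpt above and reports each ID's length. The field names come from that excerpt; the function name is made up, and the assumption that every dump carries one `NET_TRANSACTIONID` line followed by one `HPP_BLOBID` line is mine. It only measures lengths and does not decide which naming scheme is in use.

```python
import re

# The sample is trimmed from the packet dumps quoted above.
SAMPLE_LOG = """\
NET_TRANSACTIONID type=string value=PAERSCBBLA0698/15/WRITE_BLOB
HPP_CONTROL type=integer value=1  HPP_BLOBID type=string value=6UEFG8CN3HMHJxFFQAOPB7RHKIL
NET_TRANSACTIONID type=string value=PAERSCBBLA0698/18/WRITE_CLIP
HPP_CONTROL type=integer value=1  HPP_BLOBID type=string value=DD0RL6B6NI1RCe4EL52RHAN4BHP
"""

TRANSACTION_RE = re.compile(r"NET_TRANSACTIONID type=string value=(\S+)")
BLOBID_RE = re.compile(r"HPP_BLOBID type=string value=(\S+)")


def extract_ids(log_text: str):
    """Return (operation, id, id_length) for each blob ID found in the log."""
    results = []
    operation = None
    for line in log_text.splitlines():
        transaction = TRANSACTION_RE.search(line)
        if transaction:
            # The operation name is the last '/'-separated part of the value.
            operation = transaction.group(1).rsplit("/", 1)[-1]
        blob = BLOBID_RE.search(line)
        if blob:
            results.append((operation, blob.group(1), len(blob.group(1))))
    return results


ids = extract_ids(SAMPLE_LOG)
for operation, blob_id, length in ids:
    print(f"{operation}: {blob_id} ({length} characters)")
```

Whether an ID of that length really indicates the older scheme, and whether the logged value is the complete ID, is still a question for EMC service.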
