Unsolved
2 Intern
•
139 Posts
0
3516
December 28th, 2013 06:00
Cache Vaulting in CLARiiON
Hi All,
Storage Box:- CX4-240
Currently we are using "Mirrored Write Cache". We want to enable Cache Vaulting because I believe we will get more space for write cache; we are facing performance issues due to forced flushing, with increasing IOwait% and response times. Kindly advise and suggest. Thanks.
Roger_Wu
4 Operator
•
4K Posts
0
December 28th, 2013 19:00
Try a MetaLUN, or add more disks to your RAID groups.
Storagesavvy
474 Posts
3
December 29th, 2013 11:00
I don't think the terminology you are using is really what you are looking for. Cache vaulting is a feature that protects the data in write cache during a power outage. It is always enabled. Cache mirroring is also always enabled for the write cache. There is an HA Cache Vault option that allows write cache to remain enabled when certain failure conditions exist, at the risk of data corruption upon further failures.
What I think you are looking for is a way to increase the amount of write cache to solve a forced flushing issue. Based on your other post (https://community.emc.com/message/784863?et=watches.email.thread#784863) to the forum you seem to have some LUNs with high IO wait and high response time on your hosts running SAP.
Are you sure you have forced flushing? You need to look at analyzer data from the array to see that.
You also mentioned that you were seeing trespassing on a new Thick LUN that you created; are you sure you aren't seeing trespassing on the existing Thin LUNs also? Have you verified the multipathing configuration of the SAP hosts? Assuming they are RedHat, you can have DMP, native MPIO, or PowerPath, and each has specific configuration requirements. In addition, depending on the multipath configuration, the failover mode for the host needs to be correct on the CX4 itself. If these settings are incorrect you can have trespass storms where the LUNs trespass back and forth continually between the two SPs. This will definitely cause high response times.
There is no way to get more write cache other than to further reduce the read cache size, and the result will not make much of a difference. Adjusting the watermarks down can help reduce forced flushing IF the write load is very peaky and not sustained: the additional cache headroom above the HWM can absorb a peak without forced flushing, and the lower write load between peaks allows the cache to destage to disk and get back down between the LWM and HWM before the next peak. If the write load is sustained and forced flushing is continuous, lowering the watermarks will not help, and all you can do is increase the backend performance (i.e., add disks, change RAID type, etc.).
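To illustrate the difference between a peaky and a sustained write load, here is a rough toy model in Python. This is purely a sketch: the cache size, destage rate, and drain behaviour below are made-up assumptions, not CX4 internals. It counts how many seconds the simulated write cache sits completely full (i.e., is force flushing) with default versus lowered watermarks.

# Toy model of write-cache fill and destage between watermarks.
# All numbers and the drain behaviour are assumptions for illustration only.

def forced_flush_seconds(load, cache_mb=400, destage_mb_per_s=60, hwm=0.80, lwm=0.60):
    """Count the seconds the cache is completely full (forced flushing)."""
    dirty, destaging, forced = 0.0, False, 0
    for write_mb in load:                                  # one sample per second
        dirty = min(dirty + write_mb, cache_mb)            # incoming host writes
        if dirty >= cache_mb:
            forced += 1                                    # cache full
        if dirty > hwm * cache_mb:
            destaging = True                               # crossed the high watermark
        if destaging:
            dirty = max(dirty - destage_mb_per_s, 0.0)     # drain dirty pages to disk
            if dirty < lwm * cache_mb:
                destaging = False                          # back under the low watermark
    return forced

peaky     = ([80] * 6 + [5] * 24) * 4    # short bursts with quiet gaps in between
sustained = [80] * 120                   # constant load above the 60 MB/s drain rate

for name, load in (("peaky", peaky), ("sustained", sustained)):
    default = forced_flush_seconds(load, hwm=0.80, lwm=0.60)
    lowered = forced_flush_seconds(load, hwm=0.50, lwm=0.30)
    print(f"{name:9s} load -> {default}s forced at default watermarks, {lowered}s at lowered watermarks")

With the bursty input, the lowered watermarks leave enough headroom to absorb each burst, while under the sustained load both settings end up force flushing almost continuously, which is the point above.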
Have you looked at Dirty Pages % in Analyzer to verify whether forced flushing is occurring? If dirty pages never reach 100%, or very rarely do, write cache isn't going to be your problem.
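If you have the Analyzer data exported to a CSV file, a few lines of Python can flag the intervals where dirty pages hit the ceiling. The file name and the column headers below ("Poll Time", "SP A Dirty Pages (%)") are placeholders I am assuming; match them to whatever your export actually contains.

# Scan an Analyzer CSV export for samples where dirty pages are pinned near 100%.
# The file name and column headers are assumed placeholders -- adjust to your export.
import csv

THRESHOLD   = 99.0                      # at or near 100% dirty pages => forced flushing
DP_COLUMN   = "SP A Dirty Pages (%)"    # placeholder column header
TIME_COLUMN = "Poll Time"               # placeholder column header

with open("analyzer_export.csv", newline="") as f:
    rows = list(csv.DictReader(f))

hits = [r for r in rows if float(r[DP_COLUMN] or 0) >= THRESHOLD]
print(f"{len(hits)} of {len(rows)} samples at or above {THRESHOLD}% dirty pages")
for r in hits[:10]:                     # show the first few offending intervals
    print(r.get(TIME_COLUMN, "?"), r[DP_COLUMN])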
Have you run a report of the LUNs in Unisphere and verified that all of the LUNs are assigned to their "allocation owner"? If a pool LUN's current owner is NOT the same as the allocation owner, there will be a performance impact.
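If you export that Unisphere LUN report to CSV, a quick sketch like the one below can list any mismatches. The file name and the "Current Owner" / "Allocation Owner" headers are placeholders I am assuming; use whatever your report actually calls them.

# List pool LUNs whose current owner differs from their allocation owner.
# File name and column headers are assumed placeholders -- match your report.
import csv

with open("lun_report.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row.get("Current Owner") != row.get("Allocation Owner"):
            print(f'{row.get("Name", "?")}: current={row.get("Current Owner")}, '
                  f'allocation={row.get("Allocation Owner")}')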
Have you looked at the SP Event Logs to see if you are experiencing lots of trespasses?
Richard
christopher_ime
2K Posts
1
December 30th, 2013 20:00
While it definitely looks like Roger and Richard are taking care of you, I simply wanted to directly answer one question you had. Keep in mind, I'm only taking the question at face value, but as Richard pointed out there are many other things to consider.
As you already know, one option you have to get more memory to allocate to write cache is to decrease read cache (the proverbial rob Peter to pay Paul since you have a finite amount of DRAM cache).
However, what I wanted to point out is that there are several enablers that, when loaded, reserve SP memory, which in turn reduces what is available to allocate to read and write cache. So if you aren't using some of the features below, you can reclaim a bit of SP memory by removing the corresponding enablers. It won't be significant, but again, I simply wanted to answer your question as-is:
FAST
Thin (reported as Virtual Provisioning if running FLARE 28 or 29)
Compression (requires Thin Provisioning Enabler)
FAST Cache
For example, obviously you are using the "Thin" feature; however, if you aren't using compression then that might be a candidate for removal.
SabarnaDeb
2 Intern
•
139 Posts
0
December 31st, 2013 02:00
Yes, we got a performance report from EMC.
1. The affected LUN14 is assigned to SPB. As per EMC: "The write cache usage of SP-A is less than 90 most of the times, where its more than 90 for SP-B which causes forced flush."
2. Please check the LUN 14 report below. Please check the DP and X values.
3. I sent a mail to EMC to verify whether forced flushing is happening or not.
4. The ESXi host is using native multipathing, and we changed the failover mode from 1 to 4 and rebooted the ESXi server for the change to take effect.
5. We migrated the affected old 3 TB thin LUN (total data size: 1.3 TB) to a new 2 TB thin LUN. We thought of creating a thick LUN, but it was trespassing nearly 3000 times, maybe due to low available storage space.
6. Yes, all LUNs got trespassed roughly 4 to 6 times, maybe due to low free storage space. I am monitoring the trespass status for all LUNs. There is no more trespassing now.
7. As per the performance report shared by EMC (please check the above screenshot), they mentioned "dirty pages at 95" and "DP at 95". I sent a mail to EMC to re-verify the dirty pages % again.
8. For all LUNs, the current owner and allocation owner are the same.
9. Now, the EMC engineer wants to know the Read KB/sec and Write KB/sec.
SabarnaDeb
2 Intern
•
139 Posts
0
December 31st, 2013 02:00
Hi Christopher,
We are using "Thin Provisioning", "SnapView" and "Access Logix". As of now, we are not using the SnapView feature.
Storagesavvy
474 Posts
0
December 31st, 2013 09:00
What is LUN 14? From the EMC Support report you attached, it appears to be mostly a large-block write workload, and the pool is 100% SATA? That one LUN is doing 60 MB/sec of writes. What is the purpose of the LUN?
SabarnaDeb
2 Intern
•
139 Posts
0
January 2nd, 2014 01:00
LUN14 is the affected LUN. It is used by the SAP application (MaxDB) for business transactions, etc. I don't know much about this application.
Storagesavvy
474 Posts
1
January 2nd, 2014 09:00
MaxDB is an OLTP database used within SAP. I think the overarching issue you are running into is a busy database overrunning the SATA drives. Are you able to migrate the LUN to a different pool or faster disks?
SabarnaDeb
2 Intern
•
139 Posts
0
January 3rd, 2014 00:00
We migrated the data from the old LUN to a new LUN within the same pool (SATA to SATA).
lbseraph
53 Posts
1
January 3rd, 2014 02:00
Such a migration will not help with this issue. Consider migrating the LUN out of the current storage pool to a pool of FC disks; a MetaLUN would be even better.
SabarnaDeb
2 Intern
•
139 Posts
0
January 3rd, 2014 02:00
Yes, correct. Within the same storage pool, two heavily utilized thin LUNs sit on the same SP. As of now, I am planning to migrate the most heavily utilized LUN to a MetaLUN, or to a thick or thin LUN in a different storage pool owned by the other SP.
christopher_ime
2K Posts
1
January 8th, 2014 02:00
Sorry for the delay.
SnapView is not an enabler/feature that reserves SP memory upfront like the four listed above, so unloading it won't reclaim memory in the same manner.
kelleg
4.5K Posts
1
January 13th, 2014 14:00
You should first determine what the real workload on the LUN is. Look at the total IOPS for the LUN in the NAR file. For example, if the LUN is doing a maximum of 3000 IOPS, you will need to create a LUN with the capacity to handle 3000 IOPS.
If a single 15K FC disk can handle about 180 IOPS (as an example), then divide 3000 by 180 to get the number of disks required to handle that workload.
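As a quick worked version of that arithmetic (using the illustrative 3000 IOPS workload and 180 IOPS per 15K FC spindle from above, not measured values from this array):

# Worked example of the disk-count arithmetic above. The workload and per-disk
# figures are the illustrative numbers from this post, not measurements.
import math

workload_iops = 3000    # peak total IOPS observed for the LUN in the NAR file
iops_per_disk = 180     # rough rule of thumb for a single 15K FC disk

disks_needed = math.ceil(workload_iops / iops_per_disk)
print(f"~{disks_needed} disks needed to service {workload_iops} IOPS")   # ~17 disks

The same calculation can be repeated with a lower per-disk figure for SATA drives, which handle far fewer random IOPS per spindle and is why this pool struggles with the workload.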
I've attached two documents about the CX4 and configuring it for best performance. See Chapter 5 in the Performance and Availability guide.
glen
2 Attachments
EMC CLARiiON Storage System Fundamentals for Performance and Availability Applied Best Practices.pdf
EMC CLARiiON Performance and Availability Release 30 Firmware Update Applied Best Practices.pdf
Anonymous User
375 Posts
0
August 18th, 2014 00:00
As it is SAP (OLTP), looking at the IOPS, you should really migrate the LUN to a higher tier. Keeping it on the same tier will not make any difference if the drives' speed is the bottleneck for handling the IOPS.
Thanks
Rakesh