August 1st, 2012 07:00
DMX4 - WP % Hitting 100% - Where Is Our Problem?
Good morning,
We are running a DMX4 configured as follows:
Model - DMX4-24
Enginuity Build - 5773.176
Cache Size - 131072 (MB)
We have an issue where we are occasionally hitting 100% cache utilization and when this occurs, as you can imagine, we begin to suffer severe performance issues.
Here is an example of that from SPA:
In this case, we hit the 100% utilization mark twice in this 24-hour interval. Both times, write response time went out of control.
Digging a bit further, I find that our DA utilization is pretty warm all of the time as well. This differs greatly from the other DMXs in our environment, which are running much cooler. Following is an example of that from SPA:
Looking at the utilization of the DFs, I can see that they are running consistently at around 70% and peaking at over 80%.
So, data in cache must destage to the back end and this must traverse the DA Directors, which are running at a high utilization.
So...to my questions:
1) What percentage utilization is too much for a DA director?
2) If the DA is our issue, how can we go about improving performance of the environment? Adding additional DAs?
3) Is there something other than cache and DA utilization that I should be looking at?
Thank you in advance for your input!
Quincy561
1.3K Posts
0
August 1st, 2012 07:00
BTW, if your DAs are showing over 50% utilization most of the time, that is probably too high. Going above 50% for brief periods of time during the day is probably OK.
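For anyone who wants to apply that rule of thumb to exported utilization numbers, here is a minimal Python sketch. The function name, the sample values, and the reading of "most of the time" as "more than half of the samples" are all assumptions made for illustration; none of this comes from SPA or Enginuity.

```python
def da_utilization_report(samples, threshold=50.0):
    """Summarize how often DA utilization samples exceed a threshold.

    samples   : DA utilization percentages, one per collection interval
    threshold : rule-of-thumb ceiling (50% as suggested in the post above)
    """
    if not samples:
        raise ValueError("need at least one sample")
    over = [s for s in samples if s > threshold]
    fraction_over = len(over) / len(samples)
    return {
        "fraction_over": fraction_over,
        "peak": max(samples),
        # Assumption: "most of the time" means more than half of the samples.
        "mostly_over": fraction_over > 0.5,
    }


# Made-up numbers: a consistently busy DA and one with only brief peaks.
busy = [72, 68, 75, 81, 70, 66, 88, 73]
cool = [30, 35, 55, 28, 33, 31, 52, 29]

print(da_utilization_report(busy))
print(da_utilization_report(cool))
```

The second sample set goes above 50% only briefly, which is the case described as probably OK.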
Quincy561
1.3K Posts
0
August 1st, 2012 07:00
System write-pending (WP) limit events are caused by filling cache with writes faster than the system can destage them.
Generally adding cache won't help much, if at all.
This can be corrected by slowing the writes into the system, or speeding up the destage.
Most people don't want to slow the writes into the system, so you probably want to speed up the destage.
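To make the fill-versus-destage point concrete, here is a toy Python model. It is only a sketch: the rates and the write-pending limit are invented numbers, and the real Enginuity destage logic is far more involved than a single subtraction per second.

```python
def simulate_write_pending(write_mb_s, destage_mb_s, wp_limit_mb, seconds):
    """Return write-pending fill (% of limit) at the end of each second.

    All parameters are hypothetical; this is not how Enginuity computes WP.
    """
    wp = 0.0
    history = []
    for _ in range(seconds):
        wp += write_mb_s - destage_mb_s          # net growth this second
        wp = min(max(wp, 0.0), wp_limit_mb)      # clamp between empty and full
        history.append(wp / wp_limit_mb * 100.0)
    return history


def seconds_until_full(history):
    """Return the first second at which WP reaches 100%, or None."""
    for second, pct in enumerate(history, start=1):
        if pct >= 100.0:
            return second
    return None


# Writes arrive 100 MB/s faster than they destage (made-up rates).
base = simulate_write_pending(300, 200, wp_limit_mb=1000, seconds=60)
# Doubling the limit (more cache) only postpones the same outcome.
more_cache = simulate_write_pending(300, 200, wp_limit_mb=2000, seconds=60)
# Speeding up destage to match the write rate keeps WP from growing at all.
faster_destage = simulate_write_pending(300, 300, wp_limit_mb=1000, seconds=60)

print(seconds_until_full(base))
print(seconds_until_full(more_cache))
print(seconds_until_full(faster_destage))
```

In this model, more cache buys time but does not change where a sustained burst ends up, while a destage rate that keeps pace with the writes never reaches the limit.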
First thing to look for is hot drives. You can hit system WP because of a few hot disks. In this case you need to spread the load over more drives, or implement cache partitioning so the slow drives don't hurt all the workloads.
If the workload is already spread over the drives, then you generally need to add more drives and/or DA CPUs to speed up the destage.
One other option is to change RAID protection to one that can destage faster, such as from RAID5 to RAID1 or RAID6 to RAID1.
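As a rough illustration of why the protection type matters for destage, the sketch below uses the commonly quoted textbook write penalties for small random writes (2 back-end I/Os per host write for RAID1, 4 for RAID5, 6 for RAID6). These are generic figures, not Symmetrix measurements, and the host write rate is a made-up number.

```python
# Textbook back-end I/Os generated per small random host write.
# Assumption: generic RAID figures, not measured on a DMX.
BACKEND_IOS_PER_WRITE = {
    "RAID1": 2,  # write both mirrors
    "RAID5": 4,  # read data + read parity, write data + write parity
    "RAID6": 6,  # as RAID5, with a second parity read and write
}


def backend_write_ios(host_writes_per_s, raid):
    """Estimate back-end I/Os per second needed to destage the host writes."""
    return host_writes_per_s * BACKEND_IOS_PER_WRITE[raid]


host_writes = 1000  # hypothetical host write rate (writes/s)
for raid in ("RAID1", "RAID5", "RAID6"):
    print(raid, backend_write_ios(host_writes, raid))
```

Under these assumptions, the same host write load costs the back end half as many I/Os on RAID1 as on RAID5, and a third as many as on RAID6.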
sunking1
26 Posts
0
August 1st, 2012 08:00
I know that we are collecting STP data on the frame but our CE normally pulls it for us. Is this something that I can pull on my own?
sunking1
26 Posts
0
August 1st, 2012 08:00
Quincy,
Thanks for the responses.
Here is a sample of our DA utilization for the past 4 hours. Note that this is the normal state for us, and we often peak close to the 90% mark:
Seemed awful high to me.
Quincy561
1.3K Posts
0
August 1st, 2012 08:00
Also if you have STP data, that could be helpful to me.
Quincy561
1.3K Posts
0
August 1st, 2012 08:00
Not from the service processor, but it is possible to collect it on the same host that is collecting the SPA data, or another host connected to the Symm with SE installed.
If you want to PM me with the serial # I can possibly get the STP data, depending on how big the files are, and how slow the connection is.
Quincy561
1.3K Posts
0
August 1st, 2012 08:00
Yes, it is. It looks like you have a DMX2500, so you can add two more pairs of DAs and drives, which would cut the DA utilization in half if you could spread the workload out over the new DAs.
Or maybe it is time for a new VMAX :-)
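The "cut the DA utilization in half" estimate is simple proportional arithmetic, sketched below in Python. It assumes the frame currently has two DA pairs (implied by the halving, not stated outright) and that the existing workload can be rebalanced evenly across old and new DAs; the function name is made up for this example.

```python
def projected_da_utilization(current_pct, current_das, added_das):
    """Project per-DA utilization after adding DAs.

    Assumes the same total work is spread evenly over all DAs.
    """
    total_das = current_das + added_das
    return current_pct * current_das / total_das


# Assumption: 2 DA pairs today (4 DAs), adding 2 more pairs (4 DAs).
print(projected_da_utilization(70.0, current_das=4, added_das=4))  # typical load
print(projected_da_utilization(90.0, current_das=4, added_das=4))  # peak load
```

If the data stays on the original drives and DAs, the new directors do no work and the projection does not apply, which is why spreading the workload is the important condition.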