alfoxo
4 Posts
0
May 22nd, 2014 09:00
Tiers 1 & 2 are full - replays taking up the space
Hey all,
The title pretty much states the problem I'm having - we have SSDs in Tier 1, 15K in Tier 2, and 7.2K in Tier 3.
We've got DBs running on volumes in Tiers 1 & 2, and the other servers running on volumes in Tiers 2 & 3. Our Tier 2 keeps filling, and it looks like a ton of space is being used by replays from both sets of volumes.
Is there any way to force replays to be stored in Tier 3? I understand that since our DBs use Tiers 1 & 2, those replays would be stored there, but even the server replays are sitting in Tier 2 instead of progressing down to Tier 3. Since there isn't space left anywhere else, active/new data starts getting written to Tier 3, and then performance lags for obvious reasons. It seems like very few of the replays are actually being placed in Tier 3.
Any help is appreciated. Thanks!
kimlawn
1 Message
0
August 4th, 2015 10:00
Picking up on this old discussion. Can you elaborate on number 3? "3) Replay sizes can be drastically affected by settings in backup, anti-virus, or even database configurations. Anything that touches all the blocks can cause bloated replays and can cause this type of problem - you would want to eliminate these problems FIRST." Full backups and antivirus file scans come to mind.
mtanen
118 Posts
0
May 22nd, 2014 11:00
Couple of questions - which Compellent Firmware version and Controller model?
So, to have this conversation, it's important to make a couple of statements about replays.
1) Replays are not COPIES of the data - they are the data (in read-only mode); see the toy sketch after this list.
2) Blocks that are "unavailable" (meaning they are no longer read because a changed block has replaced them) will automatically progress down to the lowest tier in the Storage Profile. Everything else follows the standard rules regarding aging of the block.
3) Replay sizes can be drastically affected by settings in backup, anti-virus, or even database configurations. Anything that touches all the blocks can cause bloated replays and can cause this type of problem - you would want to eliminate these problems FIRST.
4) SSDs (when used before the Flash Enabled Compellent configurations) are workload-specific and should never be used as a general tier.
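To make points 1 and 2 concrete, here is a toy Python model of copy-on-write snapshots in general - my own illustration, not Compellent internals. Taking a replay duplicates nothing; it just freezes the current blocks read-only, and reads are served from the newest version of each block wherever it lives:
```python
# Toy model of replay behavior -- illustrative only, not SC firmware.
class Volume:
    def __init__(self):
        self.active = {}   # block_id -> data, writable
        self.frozen = []   # replays: each is a set of read-only blocks

    def write(self, block_id, data):
        self.active[block_id] = data

    def take_replay(self):
        # The replay IS the current blocks, now read-only; no data is copied
        # (the shallow dict copy below just records references).
        self.frozen.append(dict(self.active))
        self.active = {}   # subsequent writes land in fresh blocks

    def read(self, block_id):
        # Reads are served from the newest version of the block,
        # which may well live inside a replay.
        if block_id in self.active:
            return self.active[block_id]
        for replay in reversed(self.frozen):
            if block_id in replay:
                return replay[block_id]
        raise KeyError(block_id)

vol = Volume()
vol.write("A", "v1")
vol.take_replay()      # "A: v1" is frozen but still serves reads
vol.write("A", "v2")   # new block; "A: v1" is now only held by the replay
print(vol.read("A"))   # -> v2
```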
It sounds like the main problem here is Tier 2. This is the write Tier for some of your applications (with RAID10 probably) and the archive tier with your more critical applications. I would probably check the following:
1) Do you have any stuck replays? Check for something past its expiration date (that is not the first and only replay), and if it won't go away, call Copilot and have them look at it. (See the sketch at the end of this post.)
2) Are you using replication? If something slows down the sync between sites, replays can stay around longer than expected.
3) For the applications using Tier 1 (SSD) as their write destination: have you verified that write data is saved as RAID5 to Tier 2? Is replay data set to RAID5 for Tiers 1/2?
4) Is there a need to do space reclamation for these applications (i.e., has thin provisioning not kept up with the OS-level changes)? This may help you free up some blocks.
If you get desperate, Copilot can walk you through scenarios to clear up space (use CMM to force-move a LUN to a specific tier, etc.).
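For check 1, conceptually it's just a filter on expiration dates. A toy sketch of that check - the field names here are made up, the real data lives in your management GUI, and Copilot is still the right path for an actual stuck replay:
```python
from datetime import datetime

# Hypothetical replay records; in practice you read these off the GUI.
replays = [
    {"volume": "DB01", "created": "2014-05-01", "expires": "2014-05-08"},
    {"volume": "DB01", "created": "2014-05-15", "expires": "2014-05-22"},
]

now = datetime(2014, 5, 22)
stuck = [r for r in replays
         if datetime.strptime(r["expires"], "%Y-%m-%d") < now]
for r in stuck:
    print(f"Possible stuck replay on {r['volume']}, expired {r['expires']}")
```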
alfoxo
4 Posts
0
May 22nd, 2014 12:00
Oh, forgot to mention.
It's an SC8000 running v6.3.10.
alfoxo
4 Posts
0
May 22nd, 2014 12:00
Thanks a lot for the response, Michael. As you can probably tell, I'm pretty new to the storage side of things so you helped my understanding a lot.
2) Blocks that are "unavailable" (meaning they are no longer read because a changed block has replaced them) will automatically progress down to the lowest tier in the Storage Profile. Everything else follows the standard rules regarding aging of the block.
I want to make sure I understand this statement correctly: if I have a replay of a volume done today, then all of that data is read-only. The next day, when another replay is done, the delta between the two replays gets stored along with the original replay, and the data that changed from the original replay automatically gets dropped to the lowest tier in the storage profile?
I'm not familiar with the standard rules of aging a block. Is that just a daily thing that happens with Data Progression automatically?
No stuck replays, no replication (yet).
Here is a screenshot of one of my DB volumes for reference:
As you can see, some replay data is in Tier 3, which might be because our Tiers 1 & 2 are full.
Steve Kneuper
4 Posts
1
May 22nd, 2014 13:00
I'm usually a "lurker" here... but find the following in your management GUI. It shows the global view of how your storage is used. You can see ours is just Tier 1 and Tier 3, with lots of data on Tier 1 RAID5 as it works its way down to the Tier 3 RAID6. We only have one server locked into Tier 1... everything else progresses toward Tier 3. Daily snapshots are kept one day, and Data Progression runs early each morning.
--steve
mtanen
118 Posts
0
May 22nd, 2014 14:00
How many and what kind of disks make up your system?
alfoxo
4 Posts
0
May 22nd, 2014 14:00
Thanks for all the responses! That makes sense, Michael. It's definitely different from what I thought it was initially. Any documentation that explains it would be appreciated!
8x 372GB SLC SSDs, including 1 hot spare
11x 560GB 15K disks, including 1 hot spare
7x 2.73TB 7.2K disks, including 1 hot spare
mtanen
118 Posts
0
May 22nd, 2014 14:00
When you take a replay, you are making all of the blocks of the volume (or just the changed blocks) read-only, depending on whether it's the first or a subsequent replay. The replay is still the data that is served when a read request comes in for a block, and it therefore follows the normal aging rules for Data Progression. If a block is changed and then a replay is taken, the original block is no longer used to answer read requests, and it moves to the lowest tier immediately.
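Here is that rule as a toy Python sketch - my own illustration under stated assumptions, not how SC firmware actually works. A block follows normal nightly aging while it is still the newest version; once a subsequent replay freezes a newer version, the old one is "unavailable" and drops straight to the lowest tier:
```python
# Toy sketch of the progression rule -- illustrative only.
LOWEST_TIER = 3  # tiers: 1 = fastest, 3 = lowest in the Storage Profile

class Block:
    def __init__(self, data):
        self.data = data
        self.tier = 1            # new writes land in the top tier
        self.superseded = False  # True once a newer version is frozen

def take_replay(active, history):
    """Freeze the active (rewritten) blocks; any older frozen version of
    a rewritten block is now 'unavailable' and drops to the lowest tier."""
    for block_id, block in active.items():
        for old in history.get(block_id, []):
            old.superseded = True
            old.tier = LOWEST_TIER   # immediate move, no gradual aging
        history.setdefault(block_id, []).append(block)
    active.clear()

def nightly_progression(history):
    # Blocks still serving reads age down one tier at a time
    # (the "standard rules"); superseded blocks are already at the bottom.
    for versions in history.values():
        newest = versions[-1]
        if newest.tier < LOWEST_TIER:
            newest.tier += 1

active, history = {}, {}
active["A"] = Block("v1"); take_replay(active, history)  # A:v1 frozen, tier 1
active["A"] = Block("v2"); take_replay(active, history)  # A:v1 superseded
print(history["A"][0].tier)  # 3 -- dropped immediately
print(history["A"][1].tier)  # 1 -- still ages via nightly_progression
```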
I am looking to see if the "day in the life of a page" presentation from DEF/DUF is available publicly for distribution. It is a PowerPoint that explains this in great detail (including the normal rules).
I would recommend that you work with Copilot to eliminate RAID10 from your Tier 3 - there are some caveats you will need to consider, but it will return a BUNCH of Tier 3 storage to you.
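To put rough numbers on why that returns so much space, here is a back-of-the-envelope calculation using your Tier 3 drive counts and textbook RAID efficiencies (my assumptions, not measured figures):
```python
# Usable Tier 3 space under different RAID levels -- rough estimate only.
raw_tb = 6 * 2.73   # 6 active 2.73 TB drives (hot spare excluded), ~16.4 TB raw

efficiency = {
    "RAID10":   0.50,     # every block written twice
    "RAID5-9":  8 / 9,    # 8 data + 1 parity per stripe
    "RAID6-10": 8 / 10,   # 8 data + 2 parity per stripe
}
for level, eff in efficiency.items():
    print(f"{level:8s}: {raw_tb * eff:5.1f} TB usable")
# RAID10  :   8.2 TB usable
# RAID6-10:  13.1 TB usable -> roughly 5 TB returned by dropping RAID10
```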
Steve Kneuper
4 Posts
0
May 22nd, 2014 15:00
Michael has given you good advice... I'm not sure how tight you are on space, but perhaps you should aim for RAID10 on your SLC SSDs and a mix of RAID10 and RAID5 on your Tier 2, and there is little reason to be doing RAID10 on your Tier 3; it should be catching the inactive blocks waterfalling out of Tier 2. --steve
mtanen
118 Posts
0
May 22nd, 2014 17:00
OK - so as I read this, your Tier 2 is not out of space; space is simply allocated into the RAID5-9 extent, and that is leaving no room for the RAID10 extent to grow. You can get Copilot to prune that extent to give you a little short-term relief.
Honestly, I am a little confused about the configuration of the system as a whole. Depending on when you bought it, the budget, and the source, I would normally expect to see a system like this entirely flash. The reason is that at 14TB of space I can cover more IOPS and deliver more value with 12 MLC 1.6TB drives and 6 400GB SLC drives.
As you stand now, I am a little concerned about your I/O capacity for the applications. Your Tier 1 is effectively delivering 40K IOPS, your Tier 2 is delivering 1,760 IOPS, and your Tier 3 is bringing 420.
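Those figures line up with common per-drive rules of thumb, excluding each tier's hot spare - the per-drive numbers below are my assumptions, not measurements:
```python
# Rough per-tier IOPS from rule-of-thumb per-drive figures (assumed).
tiers = {
    # tier: (active drives, assumed IOPS per drive)
    "Tier 1 (SLC SSD)": (7, 5715),   # ~40K total
    "Tier 2 (15K)":     (10, 176),
    "Tier 3 (7.2K)":    (6, 70),
}
for name, (drives, per_drive) in tiers.items():
    print(f"{name}: ~{drives * per_drive:,} IOPS")
# Tier 1 (SLC SSD): ~40,005 IOPS
# Tier 2 (15K): ~1,760 IOPS
# Tier 3 (7.2K): ~420 IOPS
```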
I would expect performance problems in those lower tiers with multiple workloads hitting them.
mtanen
118 Posts
0
August 4th, 2015 19:00
When something like a backup or anti-virus product does its scans using a flag like the archive bit, it causes a lot of change, because setting even one bit means the whole block has to be rewritten. That overhead then gets stored in your replays and bloats them. You should be sure to disable this type of function in anything scanning the block system. Also, for similar reasons, you should not be running defrags on any sort of regular basis. I have seen instances where a defrag now and then will fix OS-level index problems, but it does not really help on an SC backend.
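A toy back-of-the-envelope model of that bloat - the page size and counts below are made-up illustration values, not your system's:
```python
# Flipping one archive bit per file dirties one whole block per file,
# so a full scan makes the next replay's delta balloon. Numbers are
# illustrative assumptions only.
BLOCK_KB = 512                    # assumed page size
files_scanned = 200_000           # e.g. a backup or AV sweep touching every file
blocks_actually_changed = 5_000   # real daily change

def delta_gb(blocks):
    return blocks * BLOCK_KB / 1024 / 1024

print(f"replay delta, no scan:   {delta_gb(blocks_actually_changed):6.1f} GB")
# ->  2.4 GB
print(f"replay delta, with scan: {delta_gb(blocks_actually_changed + files_scanned):6.1f} GB")
# -> 100.1 GB
```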