

195 Posts


January 13th, 2018 13:00

VNX7600 optimal/reasonable cabling and disk placement.

I am acquiring a VNX7600 with the following:

DPE + 11 x 2.5" DAEs

46 x 3.5" DAEs

I have the full six back-end I/O channels and the following disks:

4 x 2.5" 300GB (flare)

15 x 2.5" 200GB SSDs

281 x 2.5" 600GB 10k

30 x 3.5" 200GB SSDs

660 x 3.5" 2TB NL-SAS

The 45 x SSDs are for FAST Cache...there will be two more than I technically need for that, but I'm content to just leave the extra pair as additional spares.

I intend to create 10 pools, where each pool is built as ((3 x (8+1R5 of 600GB)) + (4 x (14+2R6 of 2TB))).  That math leaves me with a total of 9 x 600GB spares and 20 x 2TB spares, which seems more than sufficient to the task.

In round numbers, each pool will have ~12TBs of 10k disks and ~100TBs of NL-SAS, and there will be a full ~4TBs of FAST Cache above them. I'm planning to provision ~80-96TBs of mostly thick LUNs out of each pool to VMware clusters. Our clusters are fairly large, so these will be predominantly 8TB and 16TB LUNs, with few exceptions.
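To sanity-check my own round numbers, here's a rough Python sketch of that layout math (raw data capacity only; the round numbers above allow for formatting/binary overhead):

```python
# Rough sanity check of the proposed per-pool layout (raw data capacity, before
# formatting/binary overhead - the ~12TB / ~100TB round numbers above are lower
# because they allow for that overhead).

POOLS = 10
SAS_GROUPS, SAS_DATA, SAS_PARITY = 3, 8, 1     # 3 x (8+1R5) of 600GB 10k
NL_GROUPS, NL_DATA, NL_PARITY = 4, 14, 2       # 4 x (14+2R6) of 2TB NL-SAS

sas_disks = SAS_GROUPS * (SAS_DATA + SAS_PARITY)     # 27 x 10k disks per pool
nl_disks = NL_GROUPS * (NL_DATA + NL_PARITY)         # 64 x NL-SAS disks per pool

sas_tb = SAS_GROUPS * SAS_DATA * 0.6                 # 14.4 TB raw 10k per pool
nl_tb = NL_GROUPS * NL_DATA * 2.0                    # 112 TB raw NL-SAS per pool

print(f"10k disks consumed:    {POOLS * sas_disks} of 281")
print(f"NL-SAS disks consumed: {POOLS * nl_disks} of 660")
print(f"raw data capacity per pool: {sas_tb:.1f} + {nl_tb:.0f} = {sas_tb + nl_tb:.1f} TB")
```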

The usage will be general purpose...everything...from file servers to DBs/web servers/etc. So I am looking to balance performance rather than favor some portion over the rest.

That's all background.

I prefer to reason out a layout for the DAEs and disks, rather than just do whatever and hope for the best.  So the considerations I have are generally:

> Placement of the FAST Cache: should I spread it out across all six back-end channels, or concentrate it on a smaller number of them? Generally, spreading as widely as possible was the old-school advice, and I'm inclined to do that.

> Placement of the 10k vs NL-SAS disks.  Same basic question: I could stack all of the 10k disks onto a couple of channels (with just a modest mix of NL-SAS being necessary to accommodate the total disk count), or I could balance all the disk types across all the channels.

If you read this far: thanks.  I'd appreciate any input/feedback.

I tend to think that spreading everything out will be best, but if there are any good reasons not to, I would love to hear them.

4.5K Posts

January 24th, 2018 11:00

1. FAST Cache - spread the disks over all the buses and place them in the first slots of each DAE - try to use the same number in each DAE. SSDs can eat up the bandwidth on the buses pretty quickly, so spread them over as many buses as possible. FAST Cache will create two-disk Raid 1 sets. See KB 84362 (CLARiiON and VNX: Where do I find information about FAST Cache), then see KB 73184 (FAST Cache configuration best practices). This is probably the most important step in setting up the array. A rule of thumb is to have 1 SSD in FAST Cache for every 25 spinning disks - so if you have 100 spinning disks in Pools, then you should have a minimum of 4 SSD in FAST Cache (a rough sizing sketch follows below). Do use all the SSD's for FAST Cache, you can get better performance in the Pools using at least one group of SSD for the Extreme Performance tier - 5 SSD in 4+1 Raid 5.


2. For the 10K SAS, use Raid 5 in groups of 5 disks (4+1).


3. For the NL-SAS - use 8 disks in 6+2 Raid 6.


4. You don't need to spread the SAS/NL-SAS over the buses, but maybe keep the different types in different DAEs. It won't hurt not to, but it might help.


5. Make sure you leave at least 5% (10% is better) free in each Pool for FAST VP (relocation/re-balancing).


6. If you're going to be using databases (SQL, Oracle), then you need to take more care in configuring the Pools. The logs and temp should be in a separate Pool with FAST Cache disabled and no SSD. The DB data should be in a Raid 5 pool with FAST Cache enabled - the best practice guidance for SSD talks about this also. There are a number of White Papers for VNX and databases - check those out if you plan on using DBs.


docu50157_White-Paper--Microsoft-SQL-Server-Best-Practices-and-Design-Guidelines-for-EMC-Storage-EMC-VNX-Family,-EMC-Symmetrix-VMAX-Systems,-EMC-Xtrem-Server-Products.pdf
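For what it's worth, here is the 1-per-25 rule of thumb from item 1 above as a rough Python sketch (illustration only, not a sizing tool - the 941 figure is simply the 10k plus NL-SAS disk counts from the original post):

```python
import math

# Rule of thumb from item 1: roughly 1 FAST Cache SSD per 25 spinning disks,
# rounded up to a whole RAID 1 pair (FAST Cache consumes drives in pairs).
def fast_cache_ssd_minimum(spinning_disks, ratio=25):
    needed = math.ceil(spinning_disks / ratio)
    return needed + (needed % 2)

print(fast_cache_ssd_minimum(100))   # 4  - the example above
print(fast_cache_ssd_minimum(941))   # 38 - 281 x 10k + 660 x NL-SAS in this config
```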


Some documents that will be helpful:


docu48708_White-Paper_-Introduction-to-the-New-VNX-Series-VNX5200,-VNX5400,-VNX5600,-VNX5800,-VNX7600,-and-VNX8000_-A-Detailed-Review.pdf


docu48706_White-Paper_-Virtual-Provisioning-for-New-VNX-Series-VNX5200,-VNX5400,-VNX5600,-VNX5800,-VNX7600,-and-VNX8000.pdf


docu42660_VNX2-Unified-Best-Practices-for-Performance---Applied-Best-Practices-Guide.pdf


glen


195 Posts

January 25th, 2018 16:00

Glen, I would not argue with anything you said, but I do want to see if you have a typo...

"Do use all the SSD's for FAST Cache"


Did you mean that, or did you mean to say do not?


With over 950 spinning disks in the configuration, I'd get 38 FAST Cache disks with your calculation...which is pretty darn close to the 45 that I have available. So I think it's basically a moot point, but I thought I'd ask for clarification...for the next guy.


I expected the advice to use the smaller RAID group sizes for the 10k and NL-SAS disks.  I am accustomed to planning for maximum capacity without undue compromise to integrity, and I've been using the larger RAID sizes successfully for something like a decade based on that.  But I am giving it a bit of thought in this case.  I am less bound by acquisition costs here than I usually am, and I could decide to place a higher value on reliability, based on a different service/maintenance model.


If I can justify another 2.5" DAE and 25 more 10k disks, I can shift from 8+1R5 to 4+1R5 at that tier and get the same amount of 10K capacity in each pool (~12-13TBs).


Making that change would cut into my 2TB spares, down from 20 to 5...which isn't unreasonable as long as we keep an eye on the array and don't let failures go unnoticed.  The spares for the 10k disks would go from 9 to 4, which, again, isn't unthinkable.


Shifting the capacity part of each pool from (4 x (14+2R6 of 2TBs)) to (8 x (6+2R6 of 2TBs)) will cut the capacity at that tier by ~14TBs, from ~100TBs to ~86TBs.  That too is possible, but at a cost:


I had figured each pool at a net capacity of ~112TBs, and set the expectation of getting a minimum of 80TB of LUNs, and a maximum of 96TB of LUNs, out of each. That's where I agree with, and lean to the cautious side of, your suggestion to leave 5-10% free in each pool.  The bare minimum is 10+% of whatever the net size of the capacity tier is, as the logic for tier balancing will go a bit batshit crazy (to use the technical term) if you go under that.  My personal experience is that 10-15% of the net pool size is a range where you shouldn't see any issues, and that above that...your mileage may vary.


But if I set that reservation issue partially to the side, and look at shifting to the smaller RAID groups, I will still have a net of ~99TB (please check that math if it doesn't look right) comprised of ((6 x (4+1R5 of 600GB)) + (8 x (6+2R6 of 2TB))) in each pool. That's more than big enough to provision 80TB, but uncomfortably tight at 96TB, even at the bare minimum reserve.  So I will cut into my maximum yield, and be forgoing the advantages of short-stroking the NL-SAS disks a bit.
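Checking my own ~99TB figure, and the headroom it leaves at the provisioning targets I mentioned (rough Python; the 0.9 usable-per-raw factor is just my guess for formatting/binary overhead, so treat the output as ballpark only):

```python
# Net capacity of the smaller-RAID-group pool, plus the free-space headroom left
# at a few provisioning targets. The 0.9 usable-per-raw factor is an assumption.
USABLE_FACTOR = 0.9

sas_tb = 6 * 4 * 0.6 * USABLE_FACTOR    # 6 x (4+1R5) of 600GB  -> ~13 TB
nl_tb = 8 * 6 * 2.0 * USABLE_FACTOR     # 8 x (6+2R6) of 2TB    -> ~86 TB
pool_tb = sas_tb + nl_tb                # ~99 TB net per pool

for provisioned in (80, 88, 96):
    free_pct = 100 * (pool_tb - provisioned) / pool_tb
    print(f"{provisioned} TB of LUNs -> ~{free_pct:.0f}% of the pool left free")
```

At 80TB that leaves roughly 19% free, at 88TB roughly 11%, and at 96TB only about 3%, which is why 96TB feels uncomfortably tight against the 10% guidance.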


Still, if I accept the ceiling as 88TB instead of 96, then I could be comfortable with that.


One question then becomes:  Will that pool structure perform significantly better than the originally proposed ~112TB net pools? 


I would love to know that, but I'm fairly sure that the best answer available would be no better than 'it depends'. 


I would posit that if you put equal workloads into pools built with 14+2R6 versus 6+2R6 you *might* measure a difference, but I wouldn't bet more than lunch money which side it would fall on, if they were truly equal workloads which would fit in either structure.


I would, however, definitely bet that you could get more capacity out of the 14+2R6 pools without failure/degradation than the 6+2R6 pools.  And the spinning-disk-based IOPS likely wouldn't be significantly different either way.


I appreciated seeing the advice concerning physical separation of log and temp in DBs; these days most vendors, Dell EMC included, are selling "Flash for everything, and who cares where it lives", and that just violates everything I have learned in decades of working with various databases.  Our guest templates do still separate data from log, from temp, from OS, even though that may become an anachronism. 


If you read this far *again* I owe you an internet beer...redeemable at some future conference...


Please let me know what you think.


4.5K Posts

January 26th, 2018 07:00

Sorry - that was a typo - I meant to say do not use all SSD for FAST Cache. It's a trade-off between FAST Cache and adding SSD to the Pool. If your data has a high degree of locality, then more FAST Cache is preferred. If not, then using SSD in the Pool is probably better. If you have a lot of sequential IO, then FAST Cache is wasted and the SSDs are more useful in the Pool. From what I've seen over the years, adding at least one 4+1 R5 SSD group to all Pools helps a lot - the metadata for the LUNs is then stored there, and that certainly helps performance. When you have more free space (5%-10%), auto-tiering is much more likely to finish each night - the system is biased to put as much of the metadata as possible in the SSD tier, and this helps when LUNs trespass. Sometimes this is just a guess, but I know that the EMC local teams have tools which attempt to size the Pools based on data collected on the array over a period of time (say one week). Mitrend was very useful in characterizing the performance; now there's LiveOptics - sort of the same.

I recommend the smaller raid group to minimize the effects of disk failures - the re-build times for NL disks can be quite long, and the more disks in each private raid group in the pool, the longer the degraded time. Again, it's a trade-off between risk and capacity. In terms of performance, the difference between 12+2 and 6+2 is not that big of a difference, and with R6 you do have that added protection during re-build. That one is a toss-up. The same applies to R5 - 4+1 vs 8+1 - some workloads perform a bit better with 8+1, but most are best at 4+1. In the real world, the difference is minimal - my concern is always to lean toward higher safety.
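To put rough numbers on the capacity side of that trade-off (a small Python illustration using the group sizes discussed in this thread; rebuild exposure is the other side of the trade-off and isn't modeled here):

```python
# Share of raw capacity that ends up usable (data disks / total disks) for the
# RAID group sizes discussed in this thread.
layouts = {"4+1 R5": (4, 1), "8+1 R5": (8, 1), "6+2 R6": (6, 2), "14+2 R6": (14, 2)}
for name, (data, parity) in layouts.items():
    print(f"{name}: {100 * data / (data + parity):.0f}% of raw capacity usable")
# 4+1 R5: 80%, 8+1 R5: 89%, 6+2 R6: 75%, 14+2 R6: 88%
```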

On DBs - on the Unity with all flash, the separation of the different components is no longer the same. But I still tend to believe that the different IO sizes and types (R vs W) are still important. On Unity I would recommend testing that out first.

In my experience it's the very large IO size that causes the most performance issues - IOs over 126KB in size clog up the front-end ports. We've seen several classes of applications that like to use very large IOs - up to 1MB in size - which is great for lots of reads, but it kills the write acknowledgements waiting behind those large IOs.

glen


8.6K Posts

January 26th, 2018 08:00

Don't forget - especially with slow disks and thin objects, you really want a decent amount of flash in the pool so that at least the metadata can be stored on flash.

195 Posts

January 27th, 2018 10:00

With the resources I have available (45 x 200GB SSDs), I was planning to create a maximum-sized FAST Cache and leave only spinning disks in the pools.  Traditionally, FAST Cache is more responsive to changes (no relocation up/down), and gets sub-allocated on a much more granular basis (64KB vs 1GB).

I can see the value in having SSD capacity in pools for metadata, but I'm wondering if that is less of an issue for thick LUNs versus thin LUNs.  Institutionally we prefer not to oversubscribe at the storage layer; the guests are provisioned thin into their datastores and the monitoring and adjusting happens there.  Which is a fancy way of saying that nearly everything is thick on the arrays.

I may have the freedom to do a bit of testing before this goes into use, and if so, I would probably want to see what happens if I lower the FAST Cache to 1.8TBs. Doing that would leave me 25 SSDs to distribute to pools.  I don't know if I could simulate a diverse enough load to do any significant measurements of the difference, but I'd be curious.
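Rough arithmetic behind that trade-off (my assumption here is ~183GB usable per formatted 200GB SSD, with FAST Cache consuming drives in RAID 1 pairs):

```python
# Rough look at trimming FAST Cache and what that frees up for pool tiers.
# Assumes ~183 GB usable per formatted 200GB SSD (my guess) and RAID 1 pairs.
TOTAL_SSDS = 45
USABLE_GB_PER_SSD = 183

def ssds_for_fast_cache(target_tb):
    """SSDs consumed (whole RAID 1 pairs) to land near target_tb of FAST Cache."""
    pairs = round(target_tb * 1000 / USABLE_GB_PER_SSD)   # each pair yields one SSD's worth
    return pairs * 2

used = ssds_for_fast_cache(1.8)
print(f"~1.8 TB FAST Cache -> {used} SSDs, leaving {TOTAL_SSDS - used} for the pools")
# 20 SSDs in FAST Cache, 25 left over - enough for five 4+1 R5 groups across
# the pools (with none of those 25 left for sparing)
```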
