June 2nd, 2017 10:00
Impacts of reads and writes on a storage system
I have a VNX5800 with a large number of 2 TB NL-SAS drives that I just inherited.
I ran a short collection period through MiTrend.
Of the top 15 LUNs:
14 reside on 2 TB NL-SAS (413 drives out of 637 in the array)
The average read percentage on the 2 TB NL-SAS is 51%, with 4 LUNs peaking between 62% and 84%. The skew is 30% of the LUNs doing 70% of the I/O.
Average I/O size is 100 KB, with a few LUNs peaking near 200 KB.
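For what it's worth, the skew figure comes from ranking LUNs by IOPS and checking what share of the total the busiest 30% carry. A minimal sketch of that calculation, using made-up per-LUN numbers (not the actual MiTrend data):

```python
# Hypothetical per-LUN average IOPS -- illustrative only,
# not taken from the real MiTrend report.
lun_iops = [900, 750, 600, 400, 350, 120, 100, 80, 60, 40]

total = sum(lun_iops)
ranked = sorted(lun_iops, reverse=True)

# Take the busiest 30% of LUNs and see what share of total I/O they carry.
top_n = max(1, round(0.3 * len(ranked)))
share = sum(ranked[:top_n]) / total
print(f"Top {top_n} LUNs carry {share:.0%} of the I/O")  # Top 3 LUNs carry 66% of the I/O
```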
There are several "company standards" we have to adhere to because of the sheer number of arrays we manage (100 PB total), though I manage 1 PB for a specific client:
The standard here is 60 drives per 2 TB pool.
No mixing of drive types within pools.
We do not use FAST tiering between drive types (only within pools).
For roughly every 20 2 TB drives added, one drive is allocated to FAST Cache (though obviously not with every single increment of 20). Today we have 20 x 200 GB FAST Cache drives.
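As a sanity check, that ratio lines up with the drive counts above (numbers taken from the figures in this post, the 1:20 ratio is our internal standard, not a vendor recommendation):

```python
nl_sas_drives = 413   # 2 TB NL-SAS drives behind the top LUNs
cache_ratio = 20      # company standard: ~1 FAST Cache drive per 20 NL-SAS drives

cache_drives = nl_sas_drives // cache_ratio
print(cache_drives)   # 20 -- matches the 20 x 200 GB FAST Cache drives we have
```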
My question: I have a customer running Hadoop, and as I said, the read skew on some of my LUNs is quite high. According to this doc, https://www.emc.com/collateral/white-papers/h12682-vnx-best-practices-wp.pdf, one front-end read I/O translates to one back-end I/O; however, random reads are more likely to result in a cache miss, whereas virtually all writes can be satisfied by the write cache. I've been told this is a terrible setup to run Hadoop on. Can anyone point me in a direction so I can illuminate the problem of running Hadoop in this storage environment?
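To put rough numbers on the read-miss concern, here is a back-of-the-envelope back-end load estimate. All the parameters (front-end IOPS, read-cache hit rate, the RAID 6 write penalty of 6 typical for parity-protected NL-SAS pools) are my assumptions for illustration, not figures from the white paper; sustained writes still destage to disk with the RAID penalty even though the cache absorbs them up front:

```python
def backend_iops(front_iops, read_pct, read_cache_hit, write_penalty):
    """Rough back-end disk IOPS estimate.

    front_iops     : total front-end IOPS
    read_pct       : fraction of I/O that is reads (e.g. 0.51)
    read_cache_hit : fraction of reads served from cache (assumed)
    write_penalty  : back-end I/Os per front-end write (RAID 6 small writes ~ 6)
    """
    reads = front_iops * read_pct
    writes = front_iops * (1 - read_pct)
    # Read misses go to disk 1:1; writes eventually destage with the RAID penalty.
    return reads * (1 - read_cache_hit) + writes * write_penalty

# Illustrative: 10,000 front-end IOPS, 51% reads, 20% read-cache hits, RAID 6
print(backend_iops(10_000, 0.51, 0.20, 6))  # 33480.0
```

Even with a generous read-cache hit rate, a read-heavy random workload lands almost 1:1 on the NL-SAS spindles, which is where the pain would come from.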
I found this tool; would it be helpful? https://0x0fff.com/hadoop-cluster-sizing/


