March 1st, 2011 09:00
MD1220 Benchmarking.
Hi,
I have connected a PE310 server (1 CPU, 2.93 GHz, 8 cores, 4 GB memory) to an MD1220 via an H800 PERC adapter (1 GB NV cache).
The connection is single node, unified (redundant mode). The OS, as well as the BIOS, H700 (local disk), and H800 firmware, is up to date.
I wanted to benchmark the R/W speeds on RAID 5 and RAID 10 volumes on the MD1220, so I used the dd command after mounting them on CentOS 5.5.
Here are my results:
Note: Disk Cache Policy = Enabled in all cases; other settings were left at their defaults, with intelligent mode enabled on RAID 5 and RAID 10.
== RAID 10 - EXT3 ==
/dev/sdb1 402G 199M 382G 1% /mnt/raid10 EXT3
Write:
[root@apple ~]# time dd if=/dev/zero of=/mnt/raid10/zerofile.tst bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 13.9526 seconds, 345 MB/s
real 0m13.959s
user 0m0.238s
sys 0m12.334s
Read:
[root@apple ~]# time dd of=/dev/random if=/mnt/raid10/zerofile.tst bs=1k count=4000000
4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 42.5778 seconds, 96.2 MB/s
real 0m42.579s
user 0m0.066s
sys 0m20.184s
== RAID 10 - EXT4 ==
Write:
time dd if=/dev/zero of=/mnt/raid10/zerofile.tst bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 10.5057 seconds, 458 MB/s
real 0m10.535s
user 0m0.085s
sys 0m10.224s
Read:
time dd of=/dev/random if=/mnt/raid10/zerofile.tst bs=1k count=4000000
4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 53.6638 seconds, 76.3 MB/s
real 0m53.672s
user 0m0.037s
sys 0m20.345s
== RAID 5 - EXT3 ==
/dev/sdb1 536G 198M 509G 1% /mnt/raid5 -EXT3
Write:
time dd if=/dev/zero of=/mnt/raid5/zerofile.tst bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 14.5641 seconds, 330 MB/s
real 0m14.592s
user 0m0.207s
sys 0m12.289s
Read:
time dd of=/dev/random if=/mnt/raid5/zerofile.tst bs=1k count=4000000
4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 32.3221 seconds, 127 MB/s
real 0m32.323s
user 0m0.028s
sys 0m20.174s
== RAID 5 - EXT4 ==
/dev/sdb1 536G 198M 509G 1% /mnt/raid5 - EXT4
Write:
time dd if=/dev/zero of=/mnt/raid5/zerofile.tst bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 10.0236 seconds, 480 MB/s
real 0m10.056s
user 0m0.058s
sys 0m9.804s
Read:
time dd of=/dev/random if=/mnt/raid5/zerofile.tst bs=1k count=4000000
4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 31.7629 seconds, 129 MB/s
real 0m31.764s
user 0m0.061s
sys 0m20.108s
-------------
Hope the output above is understandable. If more info is required, I can provide it. My question is: does this performance look okay/reasonable, or are the speeds very low?
If there is any other method for doing R/W benchmarking, please let me know. I do see that ext4 performance is far better than ext3; however, I am not sure whether ext4 can be used for NFS sharing, and my main concern is whether any current backup solution can back up an ext4 filesystem on the client.
Thanks
ora4dba
March 17th, 2011 13:00
Hi UPENGAN78,
Your post needs some more information to allow comparison with other tests. At the very least:
What kind of drives, and how many, do you have in your array? What is the stripe size? What is your Write Policy configuration?
Also, your test is using a file the size of your RAM. A good practice for benchmarking is to use files at least twice the size of your RAM, so you don't test your OS cache throughput instead of your drives. And when you do a write test with dd, to make it a real array test you need to execute sync after the dd. Otherwise,
some of your data at the end of the test may only be written to the OS buffer cache, not flushed to the disks, which inflates your dd write numbers. You need to time the execution of the pair dd+sync, and then divide the file size by the total time. I have seen adding sync after the dd show that the real throughput is 30% lower than what dd reports.
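For example, something along these lines; the mount point and file name are just placeholders, and bs=1M count=10000 gives a 10GB file, which is more than twice your 4GB of RAM:
# time the dd and the sync together, so the flush to disk is included in the elapsed time
time ( dd if=/dev/zero of=/mnt/raid10/ddtest.tst bs=1M count=10000 && sync )
# then divide the file size by the "real" time reported, e.g. 10000 MB / <real seconds> = MB/s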
Anyway, your numbers look pretty low to me, especially given that you used such a small file size.
I did my tests on MD1220, too, using pgbench.
My test configuration was:
Dell R710, 2 CPU X5650 @ 2.67GHz, 12 cores, 96GB DDR3, RHEL 5.5, kernel 2.6.18-194.26.1.el5, filesystem XFS
MD1220, Dell H800 1GB cache, uniform (redundant path) connection, Write Policy: Write Through, Read Policy: No Read Ahead
RAID10 512K chunk, 22x300GB 2.5" SAS 10K RPM
bonnie++ v1.96, file size 192GB, 8 concurrent threads: Write throughput 1GB/s, Read throughput 1.1GB/s
For the dd tests I had the same config, but the stripe element size was 1MB instead of 512K, which does not make much difference in the results.
dd write without sync 1.2GB/s
dd with sync 919MB/s
dd read 1.3GB/s
The stripe size I've chosen makes my setup better optimized for sequential I/O throughput in MB/s. Still, it gave me at least a decent 2400 IOPS of random I/O below 40ms latency.
If you don't really care about sequential I/O throughput and your I/O is random, you may use a lower stripe size. In that case it is better to test for IOPS, not MB/s.
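As a rough sketch of what I mean by testing for IOPS: a tool like fio can report random-read IOPS directly (this is not what I used in my own tests; the file path, size, and queue depth below are only example values):
# 60-second random 4K read test with O_DIRECT, reporting IOPS rather than MB/s
fio --name=randread --filename=/mnt/raid10/fio.test --rw=randread --bs=4k --size=8g --direct=1 --ioengine=libaio --iodepth=32 --runtime=60 --time_based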
My machine had much more memory than yours because I wanted to test in a configuration close to what I'm going to use. However, this large memory should be more than compensated for by the file size I was using, which won't fit into memory.
I think you should be able to get much more throughput from your array, unless you tested with very few spindles. I would look at the controller configuration: do you have Write Back cache turned on, and what stripe size do you have? If you are going to look through your configuration, make changes, and retest, I would be interested in knowing your results.
Regards
Igor Polishchuk
upengan78
March 17th, 2011 14:00
>What kind of drives, and how many, do you have in your array? What is the stripe size? What is your Write Policy configuration?
Total drives in the enclosure = 8, 146GB, SAS 6Gb, 15K, 2.5"
RAID 10 - all 8 disks were used
RAID 5 - 5 disks in the volume and 2 hot spares, 1 disk unused
Read Policy - Adaptive Read Ahead
Write Policy - Write Back
Disk Cache Policy - Enabled
>I think you should be able to get much more throughput from your array, unless you tested with very few spindles. I would look at the controller configuration: do you have Write Back cache turned on, and what stripe size do you have?
Stripe Element Size in RAID 5 is 64KB, and I believe it must be the same in RAID 10 as well? I didn't change the default ;)
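(For what it's worth, OpenManage can confirm the stripe element size per virtual disk; the controller ID below is only an example, the controllers can be listed first:)
omreport storage controller
omreport storage vdisk controller=0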
Your inputs are much appreciated, thanks for that. It is very informative.
I'd like to redo the tests with the parameters you have set for RAID 10, and with bonnie++. Can you send me the command that you use for bonnie++?
Thanks once again!
upengan78
March 17th, 2011 14:00
RAID 10, all 8 disks used, unified (redundant mode), 2 disks per span
No Read Ahead, Write Through, Disk cache enabled, Intelligent RAID not checked, CentOS 5.5, ext4
dd if=/dev/zero of=/mnt/raid5/zerofile.tst bs=1k count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 21.991 seconds, 466 MB/s
dd of=/dev/random if=/mnt/raid5/zerofile.tst bs=1k count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 163.891 seconds, 62.5 MB/s
dumpe4fs /dev/sdb1
dumpe4fs 1.41.9 (22-Aug-2009)
Filesystem volume name:
Last mounted on: /mnt/raid5/mnt/raid5
Filesystem UUID: 9e963db8-4b18-41e5-9d93-a2929778ca80
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 35684352
Block count: 142735509
Reserved block count: 7136775
Free blocks: 140445301
Free inodes: 35684341
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 989
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Thu Mar 17 15:28:55 2011
Last mount time: Thu Mar 17 15:29:34 2011
Last write time: Thu Mar 17 15:29:34 2011
Mount count: 1
Maximum mount count: 39
Last checked: Thu Mar 17 15:28:55 2011
Check interval: 15552000 (6 months)
Next check after: Tue Sep 13 15:28:55 2011
Lifetime writes: 9 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: adbbe8be-de10-42f5-b91f-0523117ab077
Journal backup: inode blocks
Journal size: 128M
Well, one thing I note here: it says 9% complete under Progress. Does that mean the RAID wasn't fully initialized? I could create the partition/filesystem anyway.
ora4dba
March 18th, 2011 12:00
UPENGAN78,
Answering your question: the bonnie++ command I used is:
bonnie++ -d /dev/sdc -s 288g -x 3 -c 8
-c is concurrency here, -x is the number of passes, -s is the file size (your 10GB is enough for your 4GB of RAM).
Your write numbers actually look OK for just 8 spindles.
Strangely, the reads are way worse than the writes, even on a file that small. Something is not right here.
I still did not get what stripe size you are using. It may be important for large sequential I/O to have a large stripe element size.
I was testing 1MB and 512KB. I did not see much difference between them, but I'm sure I would with a 64KB size.
I also did not mention that I disabled read-ahead on the controller but increased the read-ahead size at the OS level.
You may just enable it on the controller; I just think it is more beneficial for the writes if the controller is not busy with read-aheads.
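By OS-level read-ahead I mean something along these lines (the device name and value are just examples, not necessarily what you should set):
# show the current read-ahead, in 512-byte sectors
blockdev --getra /dev/sdb
# raise it, e.g. to 8192 sectors (4MB); rerun it from rc.local if it helps and you want it to persist
blockdev --setra 8192 /dev/sdb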
Also, you don't need to create your array as 2 drives per span. It seems to me that what you have now is 4 RAID 1 sets in a JBOD.
These spans are not really well documented, but this is my understanding of it, and I'm actually not a storage admin, rather a DBA.
I understand that multiple spans are meant to be used in RAID 50 and 60. For your purposes, you just need a single-span RAID 10.
Regards
Igor Polishchuk
upengan78
March 18th, 2011 12:00
Current setup:
RAID 10, 408G Volume size, 6 disks used in volume + 2 hot spares.
Read Policy Adaptive Read Ahead
Write Policy Write Through
Stripe Element Size : 256KB
Disk Cache policy Enabled
File system on Centos = Ext4 (with journal)
bonnie++ -u root -d /mnt/raid10/ -s 10g -x 3 -c 8
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
myserver.domain 10G 858 99 317790 61 186272 18 2358 99 422118 16 1367 11
Latency 9721us 89843us 104ms 7276us 11097us 34793us
myserver.domain 10G 840 99 321779 61 185796 18 2307 99 423365 14 1444 12
Latency 9946us 89414us 95007us 8473us 12339us 50543us
myserver.domain 10G 838 99 318344 60 181858 18 2355 98 410625 14 1452 13
Latency 10035us 58022us 86463us 23355us 12298us 34662us
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency 141us 416us 438us 137us 15us 30us
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency 410us 420us 432us 408us 7us 28us
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency 226us 413us 434us 96us 66us 82us
What do you think now? It is 317 MB/s for sequential and 422 MB/s for random. It still looks low, but I am wondering if I should change any parameters.
upengan78
March 18th, 2011 13:00
From what I understand, the SAS cable is 6 Gbps and the SAS drives are 15K, 6Gb spec, so the maximum speed I should get in theory is 6 gigabits/second. I am getting 422 MB/s for random, which = 422 x 8 = 3376 Mbit/s = 3.376 Gbit/s, which I think is okay.
With dd I got 465 MB/s = 465 x 8 = 3720 Mbit/s = 3.72 Gbit/s.
Am I correct above?
ora4dba
March 20th, 2011 00:00
1. Your 422 MB/s is not random; that is your sequential read throughput.
Your read of 422 MB/s and write of 317 MB/s, when divided over just 6 spindles, look in line with my results (about 1 GB/s on 20 spindles). You probably cannot get much more from them.
2. You don't measure random performance in MB/s, but rather in IOPS, of which you have about 1367 for 6 spindles - a little too good. Probably, at this small file size, the 1GB I/O controller cache slightly inflates the performance. Anyway, the random IOPS look more than decent for 6 spindles.
3. As I mentioned before, you need to execute dd and time it together with the sync, e.g.:
time ( dd ... && sync )
Then divide your 10GB size by the total real time that you get. Plain dd gives inflated numbers for writes, and your last dd number looks a little too good. You don't have this problem with bonnie++, because it runs sync as part of its execution. But overall, I think you are OK. Your reads look much more decent now than initially, and I'm glad that we see about the same throughput per spindle :-)
Regards
Igor Polishchuk
upengan78
March 21st, 2011 08:00
Thanks Igor, and thanks for all the tips. They helped me understand a lot of things.
I did one more test, as I forgot to use sync in my last attempt.
####
RAID 10, 536G Volume size, all 8 disks used in volume
Read Policy Adaptive Read Ahead
Write Policy Write Back
Stripe Element Size : 512KB
Disk Cache policy Enabled
File system on Centos = Ext4 (with journal)
####
bonnie++ -u root -d /mnt/raid10/ -s 10g -x 3 -c 8
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
myserver.domain 10G 860 99 404278 76 259420 22 1941 75 541458 17 1653 13
Latency 9768us 380ms 377ms 322ms 380ms 30892us
myserver.domain 10G 851 99 397416 75 260891 22 1853 75 536912 17 1808 8
Latency 9780us 334ms 352ms 337ms 366ms 26967us
myserver.domain 10G 847 99 401808 74 256577 22 1821 74 537366 16 1874 6
Latency 9967us 474ms 348ms 357ms 357ms 30951us
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency 416us 405us 437us 407us 7us 32us
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency 236us 405us 439us 92us 10us 77us
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency 396us 409us 442us 410us 6us 38us
Now with dd and sync,
time dd if=/dev/zero of=/mnt/raid5/zerofile.tst bs=1k count=10000000 && sync
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 22.1527 seconds, 462 MB/s
real 0m22.178s
user 0m0.208s
sys 0m21.913s
---> 10G/22.178 = 450.897286 MB/s
time dd of=/dev/null if=/mnt/raid5/zerofile.tst bs=1k count=10000000 && sync
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 17.5131 seconds, 585 MB/s
real 0m17.519s
user 0m0.122s
sys 0m9.156s
-----> 10G/17.519 = 570.808836 MB/s ( Although I don't think sync is needed for read speed)
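(Side note on the read test: sync doesn't matter there, but the page cache can serve part of the file on a re-read and inflate the number; dropping caches first, as root, keeps it honest. A quick sketch, reusing the same test file:)
sync
echo 3 > /proc/sys/vm/drop_caches   # drop the page cache so the read really comes from the disks
time dd of=/dev/null if=/mnt/raid5/zerofile.tst bs=1k count=10000000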
These results with the 512KB stripe element size look better than those with the 256KB stripe element size.
As long as the overall performance looks okay, I won't worry much. Thanks!