March 1st, 2011 09:00

MD1220 Benchmarking.

Hi,

I have connected a PE310 server (1 CPU, 2.93 GHz, 8 cores, 4 GB memory) to an MD1220 via an H800 PERC adapter (1 GB NV cache).

The connection is single node, unified (redundant mode). The OS, as well as the BIOS and the H700 (local disks) and H800 firmware, is up to date.

 

I wanted to benchmark the R/W speeds of RAID 5 and RAID 10 volumes on the MD1220, so I used the dd command after mounting them on CentOS 5.5.

Here are my results:

Note: Disk Cache Policy = Enabled in all cases; the other settings were left at their defaults, with intelligent mode enabled on RAID 5 and RAID 10.

== RAID 10 - EXT3 ==

/dev/sdb1             402G  199M  382G   1% /mnt/raid10  EXT3

Write :

[root@apple ~]# time dd if=/dev/zero of=/mnt/raid10/zerofile.tst bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 13.9526 seconds, 345 MB/s

real    0m13.959s
user    0m0.238s
sys    0m12.334s

Read:

[root@apple ~]# time dd of=/dev/random if=/mnt/raid10/zerofile.tst bs=1k count=4000000

4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 42.5778 seconds, 96.2 MB/s

real    0m42.579s
user    0m0.066s
sys    0m20.184s

== RAID 10 - EXT4 ==

Write :
time dd if=/dev/zero of=/mnt/raid10/zerofile.tst bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 10.5057 seconds, 458 MB/s
real    0m10.535s
user    0m0.085s
sys    0m10.224s

Read:


time dd of=/dev/random if=/mnt/raid10/zerofile.tst bs=1k count=4000000
4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 53.6638 seconds, 76.3 MB/s

real    0m53.672s
user    0m0.037s
sys    0m20.345s



== RAID 5 - EXT3 ==

/dev/sdb1             536G  198M  509G   1% /mnt/raid5 -EXT3

Write :


time dd if=/dev/zero of=/mnt/raid5/zerofile.tst bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 14.5641 seconds, 330 MB/s

real    0m14.592s
user    0m0.207s
sys    0m12.289s

Read:


time dd of=/dev/random if=/mnt/raid5/zerofile.tst bs=1k count=4000000
4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 32.3221 seconds, 127 MB/s

real    0m32.323s
user    0m0.028s
sys    0m20.174s


== RAID 5 - EXT 4 ==

/dev/sdb1             536G  198M  509G   1% /mnt/raid5  - EXT4

Write :
time dd if=/dev/zero of=/mnt/raid5/zerofile.tst bs=1k count=4700000
4700000+0 records in
4700000+0 records out
4812800000 bytes (4.8 GB) copied, 10.0236 seconds, 480 MB/s

real    0m10.056s
user    0m0.058s
sys    0m9.804s

Read:

time dd of=/dev/random if=/mnt/raid5/zerofile.tst bs=1k count=4000000
4000000+0 records in
4000000+0 records out
4096000000 bytes (4.1 GB) copied, 31.7629 seconds, 129 MB/s

real    0m31.764s
user    0m0.061s
sys    0m20.108s

-------------

I hope the output above is understandable. If more info is required, I can provide it. My question is: does this performance look okay/reasonable, or are the speeds very low?

If there is any other method to do R/W benchmarking, please let me know. I do see ext4 performance far better than ext3; however, I am not sure whether ext4 can be used for NFS sharing, and my main concern is whether any backup solution can currently back up an ext4 filesystem on the client.

Thanks

7 Posts

March 17th, 2011 13:00

Hi UPENGAN78,

Your post needs some more information to allow comparison with other tests. At the very least:

What kind of drives, and how many, do you have in your array? What is the stripe size? What is your Write Policy configuration?

Also, your test is using a file the size of your RAM. A good practice for benchmarking is to use files at least twice the size of your RAM, so you don't test your OS cache throughput instead of your drives. Also, when you are doing a write test with dd, to make it a real array test, you need to execute sync after the dd. Otherwise, some of your data at the end of the test may only be written to the OS buffer cache, but not flushed to the disks. That inflates your dd write numbers. You need to time the execution of the pair dd+sync, and then divide the file size by the total time. I have seen adding sync after the dd show that the real throughput is 30% lower than what dd reports.
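
For example, the timing could be done along these lines (just a sketch; the mount point, file name, and 8 GB size are placeholders for illustration, not your exact setup):

# Time the dd write together with the sync that flushes the page cache,
# then compute throughput from the combined wall-clock time.
MNT=/mnt/raid10                      # placeholder mount point
MB=8192                              # write 8 GB, i.e. at least 2x your RAM
start=$(date +%s.%N)
dd if=/dev/zero of=$MNT/ddtest.tst bs=1M count=$MB
sync
end=$(date +%s.%N)
echo "scale=1; $MB / ($end - $start)" | bc   # effective MB/s including the flush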

Anyway, your numbers look pretty low to me, especially given that you used such a small file size. 

I did my tests on MD1220, too, using pgbench. 

My test configuration was:

Dell R710, 2x CPU X5650 @ 2.67 GHz, 12 cores, 96 GB DDR3, RHEL 5.5, kernel 2.6.18-194.26.1.el5, filesystem XFS

MD1220, Dell H800 with 1 GB cache, unified (redundant path) connection, Write Policy: Write Through, Read Policy: No Read Ahead

RAID10 512K chunk, 22x300GB 2.5" SAS 10K RPM

bonnie++ v1.96, file size 192GB, 8 concurrent threads: Write throughput 1GB/s, Read throughput 1.1GB/s

For the dd tests I had the same config, but the stripe element size was 1 MB instead of 512 KB, which does not make much difference in the results.

dd write without sync 1.2GB/s

dd with sync 919MB/s

dd read 1.3GB/s 

The stripe size I've chosen makes my setup better optimized for sequential I/O throughput in MB/s. Even so, it still gave me a decent 2400 IOPS of random I/O below 40 ms latency.

If you don't really care about sequential I/O throughput and your I/O is random, you may use a lower stripe size. Then it is better to test for IOPS, not MB/s.
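
If you want a quick-and-dirty IOPS estimate without a dedicated benchmark tool, one rough approach (purely illustrative; the device name, device size, and sample count below are assumptions, and dd's per-invocation startup overhead will undercount somewhat) is to issue small direct-I/O reads at random offsets and count how many complete per second:

# Rough random-read IOPS estimate: 4 KB O_DIRECT reads at random offsets.
DEV=/dev/sdb                          # placeholder array device
BLOCKS=$((400 * 1024 * 1024 / 4))     # assumed ~400 GB volume, in 4 KB blocks
SAMPLES=2000
start=$(date +%s.%N)
for i in $(seq $SAMPLES); do
    off=$(( (RANDOM * 32768 + RANDOM) % BLOCKS ))
    dd if=$DEV of=/dev/null bs=4k count=1 skip=$off iflag=direct 2>/dev/null
done
end=$(date +%s.%N)
echo "$SAMPLES / ($end - $start)" | bc    # approximate random read IOPS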

My machine had much more memory than yours because I wanted to test a configuration close to what I'm going to use. However, this large memory should be more than compensated for by the file size I was using, which won't fit into memory.

I think you should be able to get much more throughput from your array, unless you tested with very few spindles. I would look at the controller configuration: do you have the Write Back cache turned on, and what stripe size do you have? If you are going to look through your configuration, make any changes, and retest, I would be interested in knowing your results.

Regards

Igor Polishchuk

 

64 Posts

March 17th, 2011 14:00


>What kind of drives, and how many, do you have in your array? What is the stripe size? What is your Write Policy configuration?

Total drives in the enclosure = 8; 146 GB, SAS 6 Gbps, 15K RPM, 2.5"

RAID 10 - all 8 disks were used

RAID 5 - 5 disks in the volume, 2 hot spares, 1 disk unused.

Read Policy - Adaptive Read Ahead

Write Policy - Write back

Disk Cache Policy - Enabled


>I think you should be able to get much more throughput from your array, unless you tested with very few spindles. I would look at the controller configuration: do you have the Write Back cache turned on, and what stripe size do you have?

 


The stripe element size in RAID 5 is 64 KB, and I believe it must be the same in RAID 10 as well? I didn't change the default ;)

Your inputs are much appreciated, thanks for that. It is very informative.

I'd like to redo the tests with the parameters you set for RAID 10, and with bonnie++. Can you send me the command that you use for bonnie++?

Thanks once again!

64 Posts

March 17th, 2011 14:00

RAID 10, all 8 disks used, unified (redundant mode), 2 disks per span

No Read Ahead, Write Through, disk cache enabled, intelligent RAID not checked, CentOS 5.5, ext4

 

 

dd if=/dev/zero of=/mnt/raid5/zerofile.tst bs=1k count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 21.991 seconds, 466 MB/s

dd of=/dev/random if=/mnt/raid5/zerofile.tst bs=1k count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 163.891 seconds, 62.5 MB/s

 

 dumpe4fs /dev/sdb1
dumpe4fs 1.41.9 (22-Aug-2009)
Filesystem volume name:  
Last mounted on:          /mnt/raid5/mnt/raid5
Filesystem UUID:          9e963db8-4b18-41e5-9d93-a2929778ca80
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              35684352
Block count:              142735509
Reserved block count:     7136775
Free blocks:              140445301
Free inodes:              35684341
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      989
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Thu Mar 17 15:28:55 2011
Last mount time:          Thu Mar 17 15:29:34 2011
Last write time:          Thu Mar 17 15:29:34 2011
Mount count:              1
Maximum mount count:      39
Last checked:             Thu Mar 17 15:28:55 2011
Check interval:           15552000 (6 months)
Next check after:         Tue Sep 13 15:28:55 2011
Lifetime writes:          9 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:              256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      adbbe8be-de10-42f5-b91f-0523117ab077
Journal backup:           inode blocks
Journal size:             128M

 

Well, one thing I note here: it says 9% complete progress. Does that mean the RAID wasn't initialized? I could create the partition/filesystem anyway.

7 Posts

March 18th, 2011 12:00

UPENGAN78, 

Answering your question: The bonnie command I've used is:

bonnie++ -d /dev/sdc -s 288g -x 3 -c 8

-c is concurrency here, -x is the number of passes, and -s is the file size (your 10 GB is enough for your 4 GB RAM).

 

Your write numbers actually look OK for just 8 spindles.

Strangely, the reads are much worse than the writes, even on a file that small. Something is not right here.

I still did not get what stripe size you are using. It may be important for large sequential I/O to have a large stripe element size.

I was testing 1 MB and 512 KB. I did not see much difference between them, but I'm sure I would with a 64 KB size.

I also did not mention that I disabled read-ahead on the controller, but increased the read-ahead size at the OS level.

You may just enable it on the controller; I just think it is more beneficial for the writes if the controller is not busy with read-aheads.
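
For reference, OS-level read-ahead on Linux can be checked and raised with blockdev; this is just a sketch, and the device name and the 8192-sector value are placeholders rather than my exact settings:

blockdev --getra /dev/sdb        # current read-ahead, in 512-byte sectors
blockdev --setra 8192 /dev/sdb   # raise it to 8192 sectors (4 MB); placeholder value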

Also, you don't need to create your array as 2 drives per span. It seems to me that what you have now is 4 RAID 1 sets in a JBOD.

These spans are not really well documented, but this is my understanding of the thing, and I'm actually not a storage admin, rather a DBA.

I understand that multiple spans are meant to be used in RAID 50 and 60. For your purposes, you just need a single-span RAID 10.

Regards

Igor Polishchuk

64 Posts

March 18th, 2011 12:00

Current setup:

RAID 10,  408G Volume size, 6 disks used in volume + 2 hot spares.

Read Policy     Adaptive Read Ahead

Write Policy    Write Through

Stripe Element Size  : 256KB

Disk Cache policy  Enabled

File system on Centos = Ext4 (with journal)

bonnie++ -u root -d /mnt/raid10/ -s 10g -x 3 -c 8

Version      1.96   ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
myserver.domain 10G   858  99 317790  61 186272  18  2358  99 422118  16  1367  11
Latency              9721us   89843us     104ms    7276us   11097us   34793us
myserver.domain 10G   840  99 321779  61 185796  18  2307  99 423365  14  1444  12
Latency              9946us   89414us   95007us    8473us   12339us   50543us
myserver.domain 10G   838  99 318344  60 181858  18  2355  98 410625  14  1452  13
Latency             10035us   58022us   86463us   23355us   12298us   34662us
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               141us     416us     438us     137us      15us      30us
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               410us     420us     432us     408us       7us      28us
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               226us     413us     434us      96us      66us      82us

 

What do you think now? It is 317 MB/s for sequential and 422 MB/s for random. It still looks low, but I am wondering if I should change any parameters.

64 Posts

March 18th, 2011 13:00

From what I understand, the SAS cable is 6 Gbps and the SAS drives are 15K, 6 Gbps spec, so the maximum speed I should get in theory is 6 gigabits/second. I am getting 422 MB/s for random, which = 422x8 = 3376 Mbit/s = 3.376 Gbit/s, which I think is okay.

With dd I got 465 MB/s = 465x8 = 3720 Mbit/s = 3.72 Gbit/s.
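
As a quick sanity check on the conversion (a rough sketch that assumes decimal megabytes and ignores SAS encoding and protocol overhead):

# Convert measured MB/s figures to Mbit/s and Gbit/s.
for mbps in 422 465; do
    awk -v m="$mbps" 'BEGIN { printf "%d MB/s = %d Mbit/s = %.2f Gbit/s\n", m, m*8, m*8/1000 }'
done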

 

Am I correct above?

7 Posts

March 20th, 2011 00:00

1. Your 422MB/s is not for random, this is your sequential read throughput.

Your read of 422 MB/s and write of 317 MB/s, when divided by just 6 spindles, look in line with my results (about 1 GB/s on 20 spindles). You probably cannot get much more from them.

2. You don't measure your random performance in MB/s, but rather in IOPS, of which you have about 1367 for 6 spindles - a little too good. Probably, at this small file size, the 1 GB I/O controller cache slightly inflates the performance. Anyway, the random IOPS look more than decent for 6 spindles.

3. As I mentioned before, you need to execute dd like this:

time dd ... && sync

Then divide your 10 GB size by the total real time that you get. Plain dd alone gives inflated numbers for writes, and your last dd number looks a little too good. You don't have this problem with bonnie, because it runs sync as part of its execution. But overall, I think you are OK. Your reads look much more decent now than initially, and I'm glad that we have about the same throughput per spindle :-)

Regards

Igor Polishchuk

64 Posts

March 21st, 2011 08:00

Thanks Igor, and thanks for all the tips. It helped me understand a lot of things.

I did one more test as I forgot to use sync in my last attempt.

####

RAID 10,  536G Volume size, all 8 disks used in volume

Read Policy     Adaptive Read Ahead

Write Policy    Write Back

Stripe Element Size  : 512KB

Disk Cache policy  Enabled

File system on Centos = Ext4 (with journal)

####

bonnie++ -u root -d /mnt/raid10/ -s 10g -x 3 -c 8

Version      1.96   ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
myserver.domain 10G   860  99 404278  76 259420  22  1941  75 541458  17  1653  13
Latency              9768us     380ms     377ms     322ms     380ms   30892us
myserver.domain 10G   851  99 397416  75 260891  22  1853  75 536912  17  1808   8
Latency              9780us     334ms     352ms     337ms     366ms   26967us
myserver.domain 10G   847  99 401808  74 256577  22  1821  74 537366  16  1874   6
Latency              9967us     474ms     348ms     357ms     357ms   30951us
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               416us     405us     437us     407us       7us      32us
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               236us     405us     439us      92us      10us      77us
myserver.domaine 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               396us     409us     442us     410us       6us      38us

 

Now with dd and sync,

time dd if=/dev/zero of=/mnt/raid5/zerofile.tst bs=1k count=10000000 && sync
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 22.1527 seconds, 462 MB/s

real    0m22.178s
user    0m0.208s
sys    0m21.913s

---> 10G/22.178 = 450.897286 MB/s

time dd of=/dev/null if=/mnt/raid5/zerofile.tst bs=1k count=10000000 && sync
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 17.5131 seconds, 585 MB/s

real    0m17.519s
user    0m0.122s
sys    0m9.156s

-----> 10G/17.519 = 570.808836 MB/s (although I don't think sync is needed for the read speed)

 

These results with a 512 KB stripe element size look better than those with the 256 KB stripe element size.

As long as overall performance looks okay, I wouldn't worry much. Thanks!
