
Dell MD1120 + Perc6/E Performance

2009 May 13
by Joe

The Hardware

We recently ordered one of Dell’s MD1120 units and a Perc 6/E raid card with 512MB battery-backed cache to beef up our production database.

Dell’s raid controllers are rebranded models manufactured by other companies, and they have been hit or miss. They’ve done some horrible things (including advertising raid1-concatenated as raid10 a long while back), but my impression from reading online and from my own benchmarking is that these Perc6 cards are decent (but not exceptional). You still get the lock-in aspect – Dell won’t support your machine if there is a non-Dell raid card in it, and the MD1xxx units supposedly only connect to Perc5 and Perc6 cards.

The MD1120 itself is a pretty cool unit. It’s only 2U and packs a lot of drives. We ordered one with 24x 73GB 15K SAS drives. No SSDs – I am amazed by SSD numbers but figure we can wait a few more years before shelling out the cash to fill an array with them; I want more data on their reliability in a 24×7 high-IO server environment. Here’s a picture of the new guy.



  

The next time we have the need and budget to purchase a new database server from the ground up I plan to go whitebox, but in this case we were looking for a relatively inexpensive way to get more capacity out of our existing Dell server, and this seemed like a good option. I just had to ignore their storage tech guy, who wanted me to buy a gigabit SAN unit and screw our performance. So ignore the tech guys who are part of the sales process and do your own research and benchmarking.

After benchmarking this Perc6+MD1120 combination extensively and putting it into production, I am reasonably happy with its performance. I’m going to share those numbers now, since it is sometimes hard to track down real data on this kind of hardware.

The Benchmark

A bunch of notes about the testing environment and configurations for anyone interested. If you just want numbers, skip past these.

  • Perc 6/E upgraded to latest 6.2.0-0013 firmware and connected to a new PowerEdge 1950 with 2x Xeon E5410s and 8GB RAM.
  • MD1120 connected directly to Perc 6/E.
  • All hardware raid configured with 64KB stripes, write back enabled, and read ahead disabled (Dell’s hardware read ahead isn’t good).
  • Server running the latest openSUSE. I did this purely to make it easier to upgrade firmware, get Dell support, etc. If you call Dell and are using openSUSE, just lie and say you are running SUSE 10 – everything will work and they will never know the difference.
  • All tests were run 3 times and the middle run was recorded.
  • xfs mount options were just noatime and ext3 mount options were noatime,data=writeback.
  • xfs file system params were -b size=4096 -d su=64k,sw=X where X was the appropriate value for the configuration involved. ext3 params were -b 4096 -E stride=16,stripe-width=192.
  • dd params were “bs=8k, count=2000000”, ensuring a file 2 times the size of RAM to bypass the OS cache (a sketch of a full run follows this list).
  • The bonnie++ random seeks/second is the most important number for DB performance.
  • I did a ton of tests with the first configuration and then settled into a groove of just testing the bits that seemed to matter. Hence the odd distribution of tests by config.
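For anyone who wants to reproduce this, here is a minimal sketch of how one of the xfs raid10 configurations was created and exercised. The device name, mount point, and exact bonnie++ flags are illustrative assumptions, not a transcript of my session:

# assumes the 24-disk raid10 virtual disk shows up as /dev/sdb (12 data spindles, so sw=12)
mkfs.xfs -b size=4096 -d su=64k,sw=12 /dev/sdb
mkdir -p /mnt/bench
mount -o noatime /dev/sdb /mnt/bench

# sequential write then read of a 16GB file (2x the 8GB of RAM) to defeat the OS cache
dd if=/dev/zero of=/mnt/bench/ddfile bs=8k count=2000000
dd if=/mnt/bench/ddfile of=/dev/null bs=8k

# bonnie++ with a 16384MB (16GB) working set, run as an unprivileged user
bonnie++ -d /mnt/bench -s 16384 -u nobody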

The dd and bonnie++ 1.02 results on openSUSE

The distinct raid configurations are numbered in the Config column. The winning individual tests are the ones called out in the observations below.

Test  Config  HW Raid  SW Raid  Total Level  File System  Read Ahead  Sched.  dd Write MB/s  dd Read MB/s  bonnie++ seeks/s
1 1 24disk raid10 None 10 xfs 256 cfq 540 519 787.4
2 1 24disk raid10 None 10 xfs 256 noop 471 439 811.6
3 1 24disk raid10 None 10 xfs 256 deadline 494 429 812.1
4 1 24disk raid10 None 10 xfs 4096 cfq 544 836 802.7
5 1 24disk raid10 None 10 xfs 4096 noop 474 837 809.4
6 1 24disk raid10 None 10 xfs 4096 deadline 492 791 808.4
7 1 24disk raid10 None 10 xfs 8192 cfq 533 853 805.9
8 1 24disk raid10 None 10 xfs 16384 cfq 536 976 806.3
9 1 24disk raid10 None 10 xfs 32768 cfq 543 1035 808.6
10 1 24disk raid10 None 10 ext3 32768 cfq 332 602 695.2
11 1 24disk raid10 None 10 ext3 4096 cfq 339 929 743
12 1 24disk raid10 None 10 ext3 4096 noop 356 925 765.4
13 1 24disk raid10 None 10 ext3 4096 deadline 342 909 712.9
14 2 12disk raid10 None 10 xfs 4096 cfq 566 572 780.4
15 2 12disk raid10 None 10 xfs 4096 noop 561 567 788.9
16 2 12disk raid10 None 10 xfs 4096 deadline 552 571 786.6
17 2 12disk raid10 None 10 xfs 8192 cfq 566 623 778
18 3 2x12disk raid10 raid0 100 xfs 256 cfq 560 507 535.9
19 3 2x12disk raid10 raid0 100 xfs 4096 cfq 560 955 816
20 3 2x12disk raid10 raid0 100 xfs 8192 cfq 558 857 817.5
21 4 24disk raid6 None 6 xfs 256 cfq 436 478 415.9
22 4 24disk raid6 None 6 xfs 4096 cfq 440 1038 666
23 4 24disk raid6 None 6 xfs 8192 cfq 437 1054 670
24 4 24disk raid6 None 6 xfs 8192 noop 434 1058 651.7
25 4 24disk raid6 None 6 xfs 8192 deadline 435 1044 666.1
26 4 24disk raid6 None 6 xfs 16384 cfq 437 1083 667.3
27 5 24disk raid60 None 60 xfs 256 cfq 424 391 670.2
28 5 24disk raid60 None 60 xfs 4096 cfq 426 1038 669.6
29 5 24disk raid60 None 60 xfs 8192 cfq 424 1052 669.9
30 5 24disk raid60 None 60 xfs 16384 cfq 424 1082 657.5
31 6 3x8disk raid10 raid0 100 xfs 256 cfq 557 530 621.4
32 6 3x8disk raid10 raid0 100 xfs 4096 cfq 555 936 820.6
33 6 3x8disk raid10 raid0 100 xfs 8192 cfq 560 902 817.7
34 6 3x8disk raid10 raid0 100 xfs 16384 cfq 555 1041 815.5
35 7 24disk jbod raid10 10 xfs 256 cfq 367 573 817.5
36 7 24disk jbod raid10 10 xfs 4096 cfq 360 964 814.7
37 7 24disk jbod raid10 10 xfs 8192 cfq 358 994 816.3
38 7 24disk jbod raid10 10 xfs 16384 cfq 377 1049 818.7
39 8 12x2disk raid1 raid0 10 xfs 256 cfq 549 408 598
40 8 12x2disk raid1 raid0 10 xfs 4096 cfq 549 714 578.5
41 8 12x2disk raid1 raid0 10 xfs 8192 cfq 546 643 563.8
42 8 12x2disk raid1 raid0 10 xfs 16384 cfq 546 861 549.9
43 9 24disk jbod raid0 0 xfs 16384 cfq 743 1054 687.6
44 10 24disk raid0 None 0 xfs 16384 cfq 773 1094 671.8

  

Observations:

  • Raid10 is the best option. Winning configurations are pure hardware raid10 with loads of readahead (test #9) and software-striped hardware raid 10, i.e. “raid 100” (tests #19 and #34).
  • I wasn’t impressed with raid6 or raid60, and raid0 isn’t a realistic option, which is why those setups aren’t tested as heavily in the table above.
  • The IO scheduler didn’t really make much difference. CFQ seemed to be just as good or better, so I stuck with it (it’s the default).
  • Readahead makes a huge difference. Linux defaults this to 256 per drive, and Linux sees a hardware raid array as one drive. You absolutely must increase that 256 default to at least 4096, in my opinion (see the sketch after this list). I increased it as high as 32768 for the pure raid10 config and performance didn’t suffer in seeks/sec or as reported by pgbench, while sequential read speeds increased dramatically.
  • xfs is faster than ext3 in every test where I compared them; I included just a few ext3 numbers above. But don’t use XFS (or ext3 with data=writeback) unless you have both a battery-backed cache on your raid controller and a UPS for main power (and ideally you should be monitoring the health of the BBU on the raid controller to ensure the battery isn’t dead). You could lose data if this advice is ignored.
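For reference, both the readahead and scheduler knobs are set per block device and reset on reboot, so they normally belong in a boot script. A minimal sketch, assuming the array shows up as /dev/sdb:

# readahead is specified in 512-byte sectors, so 4096 = 2MB and 32768 = 16MB
blockdev --setra 4096 /dev/sdb
blockdev --getra /dev/sdb

# check and change the IO scheduler for the device (cfq was the default here)
cat /sys/block/sdb/queue/scheduler
echo noop > /sys/block/sdb/queue/scheduler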

The bonnie++ 1.03e results on Arch Linux

Next I installed Arch Linux, which happens to ship with bonnie++ 1.03e. I did a couple of bonnie++ runs just to make sure the new OS didn’t mess anything up and was shocked to see dramatically better random seeks/second numbers. Sequential speeds were virtually the same, but seeks/second was massively faster. Here is a table showing some of the configs (I didn’t retest them all – I was running out of time) with bonnie++ 1.03e seeks/second numbers. I am going to credit the newer version of bonnie++ for this difference; I retested enough configurations to feel good that the trends seen when testing with openSUSE and bonnie++ 1.02 still hold. I’m open to hearing other possible explanations for the increase. I was glad to see these numbers, as I had been disappointed with the apparent ~800 seeks/sec ceiling in the first batch of tests.

Here are the configs I retested (middle of 3 runs again). The Test# matches the row in the above table. The new 1.03e score is listed next to the old 1.02 score.

Test# Config# bonnie++ 1.02 seeks/sec bonnie++ 1.03e seeks/sec
1 1 787.4 1613
4 1 802.7 1652
5 1 809.4 1639
6 1 808.4 1688
7 1 805.9 1684
8 1 806.3 1697
9 1 808.6 1717
19 3 816 1662
26 4 667.3 1056
30 5 657.5 1168
34 6 815.5 1705
38 7 818.7 1560
44 10 671.8 1175

  

Observations:

Higher numbers across the board. No big new insights: raid 10 still wins, and raid 6/60 is still substantially slower. At this point the pure raid 10 config (test #9) that scored 1717 is looking pretty nice. The pure software raid 10 (test #38) fell further behind the hardware version. The biggest takeaway from this is to be absolutely certain that the OS and tools are identical when you are benchmarking disks.

The pgbench results

Finally, I took the 3 fastest configurations and did pgbench runs with those. I was running out of time, so I took what looked like the winner (test #9) and additionally tuned readahead and schedulers a bit to ensure I got the best combination. pgbench isn’t perfect and there are people who dislike it, but it gives us another number to compare and consider along with the raw dd/bonnie++ numbers already known. Keep in mind I am using a PostgreSQL install and pgbench on this new server – not the actual production server. Doing the benchmarks on the actual final server just wasn’t an option. The only significant difference between the benchmark server and the production server is that the latter has 4 times the RAM and has had nontrivial postgresql.conf tuning, so I can only assume these numbers would improve a good bit.

I just tweaked a few things in the postgresql.conf file on the benchmark server. The non-default values:

  • shared_buffers = 2048MB
  • checkpoint_segments = 10
  • effective_cache_size = 4096MB
  • max_connections = 500
  • work_mem = 20MB
  • maintenance_work_mem = 128MB
  • synchronous_commit = off
  • random_page_cost = 2.0

Note that I did have to increase SHMMAX to get shared_buffers + connections that high. See my PostgreSQL setup post for more information about that.
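For reference, a minimal sketch of bumping SHMMAX. The 4GB value is just an assumption that comfortably covers 2048MB of shared_buffers plus per-connection overhead:

# apply immediately (value in bytes); add kernel.shmmax to /etc/sysctl.conf to persist it
sysctl -w kernel.shmmax=4294967296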

Between each run I dropped the “test” database and recreated it. I initialized with these commands. Note the different scale factors to get nontrivial data amounts – scale factor 1000 is pretty large but was necessary before I saw the disks working constantly during the benchmarks.

pgbench -i -s 100 -U postgres -d test
pgbench -i -s 1000 -U postgres -d test
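The drop-and-recreate step between runs isn’t shown above; assuming the standard PostgreSQL client tools, it amounts to something like:

dropdb -U postgres test
createdb -U postgres test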

Then I ran the tests specifying 40 clients and 10,000 transactions per client in the params. Read more about the pgbench tool for the test differences and what the transactions involve.
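For reference, those runs map to pgbench invocations roughly like the following (shown as an illustration rather than a transcript; the SELECT-only test is the -S flag):

# default TPC-B-like mixed read/write transactions
pgbench -c 40 -t 10000 -U postgres test

# SELECT-only
pgbench -S -c 40 -t 10000 -U postgres test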

Here are the pgbench results. Numbers are transactions/second. Values listed are again the middle of 3 runs. I would have liked to do these pgbench runs on more configs but I again was running out of time and these suckers took a long time to run with all the dropping and re-initializing.

Test# Config# TPC-B s=100 SELECT s=100 TPC-B s=1000 SELECT s=1000
4 1 1580 10885 1224 3655
5 1 1591 10667 1270 3499
6 1 1567 10647 1267 3048
7 1 1551 10656 1202 3478
8 1 1553 10644 1209 3386
9 1 1606 10759 1296 3548
19 3 1581 10743 1311 3269
34 6 1563 10677 1323 3156

  

Observations:

These pgbench tests were mostly a wash, but that isn’t a big surprise considering I only compared the 3 best configurations, which were already close in the dd/bonnie++ tests. I really wish there had been time to get some other configs into that table to see the difference. The #9 config did well, with a top-3 number in all 4 tests.

Conclusion

I went with setup #9. Pure hardware raid 10. It won almost all tests (with the striped raid 10s being the only real competitors) and pure hardware raid is super easy to configure, maintain, and monitor.

More generally, this MD1120 performs pretty well, especially for the relatively low price. As a quick note on price: buying one of these MD1120s as configured above, along with a PowerEdge 1950 with 16GB RAM and a Perc6/E card to connect them, would cost less than $15k (about $540/mo on a 36-month, $1-buyout lease) depending on the configuration and the deal you get. You could probably even shave a few thousand off that number if you got a really good deal.

If you do buy from Dell, be sure to get in touch with a small business sales team. They can offer nontrivial discounts off the price you can finagle in their online shopping cart, you get to talk to the same people every time you place an order, it gives you a contact if you have questions or run into issues with technical support, and it is generally just the way to go.

This MD1120+Perc6/E combo is connected to our existing DB server now, and all I will say is that performance is excellent. I am seeing zero query backup, barely any IO wait reported by vmstat, and hugely impressive random IO spikes when load gets heavy (though we haven’t come close to maxing it out, so who knows how high it could go). And, thanks to all this testing, I feel really good about having the optimal configuration for the hardware out there doing work.

3 Responses
  1. Tom permalink
    July 20, 2009

    Can I pay you to run some benchmarks with this gear?

  2. July 21, 2009

    Hey Tom, would love to be able to help you out but this fella is in production now and it’s a 24/7 environment. I’ll shoot you an email if we end up with another one anytime soon, as I enjoy benchmarking this stuff and getting numbers out there – it’s hard to find real numbers from the manufacturers.
