Dell MD1120 + Perc6/E Performance
We recently ordered one of Dell’s MD1120 units and a Perc 6/E raid card with 512MB battery-backed cache to beef up our production database.
Dell’s raid controllers are rebranded models manufactured by other companies, and they have been hit or miss. They’ve done some horrible things (including advertising raid1-concatenated as raid10 a long while back), but my impression from reading online and from my own benchmarking is that these Perc6 cards are decent (but not exceptional). You still get the lock-in aspect: Dell won’t support your machine if there is a non-Dell raid card in it, and the MD1xxx units supposedly only connect to Perc5 and Perc6 cards.
The MD1120 itself is a pretty cool unit. It’s only 2U and packs a lot of drives. We ordered one with 24x 73GB 15K SAS drives. No SSDs; I am amazed by SSD numbers but figure we can wait a few more years before shelling out the cash to fill an array with them. I want more data on their reliability in a 24×7 high-IO server environment. Here’s a picture of the new guy.
The next time we have the need and budget to purchase a new database server from the ground up, I plan to go whitebox. In this case, though, we were looking for a relatively inexpensive way to get more capacity out of our existing Dell server, and this seemed like a good option. I just had to ignore their storage tech guy, who wanted me to buy a Gigabit SAN unit and screw our performance. So ignore the tech guys who are part of the sales process and do your own research and benchmarking.
After benchmarking this Perc6+MD1120 combination extensively and putting it in production, I am reasonably happy with its performance. I'm going to share those numbers now, since it is sometimes hard to track down data on these things.
Here are a bunch of notes about the testing environment and configurations for anyone interested. If you just want numbers, skip past these.
- Perc 6/E upgraded to latest 6.2.0-0013 firmware and connected to a new PowerEdge 1950 with 2x Xeon E5410s and 8GB RAM.
- MD1120 connected directly to Perc 6/E.
- All hardware raid configured with 64kb stripes, write back enabled, read ahead disabled (Dell hardware read ahead isn’t good).
- Server running the latest openSUSE. Did this purely to make it easier to upgrade firmware, get Dell support, etc. If you call Dell and are using openSUSE, just lie and say you are running SUSE 10 – everything will work and they will never know the difference.
- All tests were run 3 times and the middle run was recorded.
- xfs mount options were just `noatime`.
- xfs file system params were `-b size=4096 -d su=64k,sw=X`, where X was the appropriate value for the configuration involved (the number of striped data disks). ext3 params were `-b 4096 -E stride=16,stripe-width=192` (stride = 64k chunk / 4k block = 16; stripe-width = stride × 12 data disks = 192 for the 24-drive raid10).
- dd params were `bs=8k count=2000000`, ensuring a file two times the size of RAM to bypass the OS cache.
- The bonnie++ random seeks/second is the most important number for DB performance.
- I did a ton of tests with the first configuration and then settled into a groove of just testing the bits that seemed to matter. Hence the odd distribution of tests by config.
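For reference, the dd invocation looked roughly like this (the target path is illustrative, and I've scaled the count way down here just to show the shape of the test; the real runs used `count=2000000` as noted above):

```shell
# Sketch of the sequential-throughput test. The real runs used
# count=2000000 (a ~15GB file, roughly twice the 8GB of RAM, to
# defeat the OS page cache); the small count here is illustrative.
target=/tmp/dd_seqtest

# Sequential write; dd prints throughput on its final summary line.
dd if=/dev/zero of="$target" bs=8k count=1000 2>&1 | tail -n 1

# Sequential read back.
dd if="$target" of=/dev/null bs=8k 2>&1 | tail -n 1

rm -f "$target"
```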
The dd and bonnie++ 1.02 results on openSUSE
The distinct raid configurations are color coded and numbered. The winning individual tests are bolded.
| Test | Config | HW Raid | SW Raid | Total Raid Level | File System | Read Ahead | Sched. | dd Write MB/s | dd Read MB/s | bonnie++ seeks/s |
|------|--------|---------|---------|------------------|-------------|------------|--------|---------------|--------------|------------------|
- Raid10 is the best option. Winning configurations are pure hardware raid10 with loads of readahead (test #9) and software-striped raid10 for “raid 100” (tests #19 and #34).
- I wasn’t impressed with raid6 or raid60, and raid0 isn’t a realistic option, so those setups aren’t as heavily hit in the above configurations.
- The IO scheduler didn’t really make much difference. CFQ seemed to be just as good or better, so I stuck with it (it’s the default).
- Readahead makes a huge difference. Linux defaults this to 256 sectors per drive, and Linux sees a hardware raid array as one drive. You absolutely must increase that 256 default to at least 4096 in my opinion. I increased it as high as 32768 for the pure raid10 config, and performance didn’t suffer in seeks/sec or as reported by pgbench, while sequential read speeds increased dramatically.
- xfs is faster than ext3 in all the tests where I compared them; I included just a few ext3 numbers above. But don’t use xfs (or ext3 with data=writeback) unless you have both a battery-backed cache on your raid controller and a UPS for main power (and ideally you should be monitoring the health of the BBU on the raid controller to ensure the battery isn’t dead). You could lose data if this advice is ignored.
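The readahead bump described above is done with blockdev; here's a sketch, with `/dev/sdb` standing in for whatever device your array shows up as (the units are 512-byte sectors, so the 256 default is only 128KB of readahead and 4096 is 2MB):

```shell
# Raise readahead on the array's block device (device name is an
# example; substitute your own). Units are 512-byte sectors:
# 4096 sectors = 2MB of readahead.
blockdev --setra 4096 /dev/sdb

# Verify the new value.
blockdev --getra /dev/sdb
```

Note that this does not persist across reboots; add the `--setra` line to a boot script to make it stick.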
The bonnie++ 1.03e results on Arch Linux
Next I installed Arch Linux, which happens to come with bonnie++ 1.03e. I did a couple of bonnie++ runs just to make sure the new OS didn’t mess anything up and was shocked to see dramatically better random seeks/second numbers. Sequential speeds were virtually the same, but seeks/second was massively faster. Here is a table showing some of the configs (I didn’t retest them all – I was running out of time) with bonnie++ 1.03e seeks/second numbers. I am going to credit the newer version of bonnie++ for this difference; I retested enough configurations to feel good that the trends seen when testing with openSUSE and bonnie++ 1.02 still hold, but I am open to hearing other possible explanations for the performance increase. I was glad to see these numbers, as I was disappointed with the apparent 800ish ceiling I was seeing in the first batch of tests.
Here are the configs I retested (middle of 3 runs again). The Test# matches the row in the above table. The new 1.03e score is listed next to the old 1.02 score.
| Test# | Config# | bonnie++ 1.02 seeks/sec | bonnie++ 1.03e seeks/sec |
|-------|---------|-------------------------|--------------------------|
Higher numbers across the board, but no big new insights. Raid10 still wins, and raid6/60 is still substantially slower. At this point the pure hardware raid10 config (test #9) that scored a 1717 is looking pretty nice, while the pure software raid10 (test #38) fell further behind the hardware version. The biggest takeaway from this is to be absolutely certain that when you are benchmarking disks, the OS and tools are identical.
The pgbench results
Finally, I took the 3 fastest configurations and did pgbench runs with those. I was running out of time, so I took what looked like the winner (test #9) and additionally tuned readahead and schedulers a bit to ensure I got the best combination. pgbench isn’t perfect and there are people who dislike it, but it gives us another number to compare and consider along with the raw dd/bonnie++ numbers already known. Keep in mind I am using a PostgreSQL install and pgbench on this new server – not the actual production server. Doing the benchmarks on the actual final server just wasn’t an option. The only significant difference between the benchmark server and the production server is that the latter has 4 times the RAM and has had nontrivial postgresql.conf tuning, so I can only assume these numbers would improve a good bit.
I just tweaked a few things in the postgresql.conf file on the benchmark server. The non default values:
shared_buffers = 2048MB
checkpoint_segments = 10
effective_cache_size = 4096MB
max_connections = 500
work_mem = 20MB
maintenance_work_mem = 128MB
synchronous_commit = off
random_page_cost = 2.0
Note that I did have to increase SHMMAX to get shared_buffers + connections that high. See my PostgreSQL setup post for more information about that.
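As a sketch, the kernel settings look something like the following; the exact values here are illustrative, not from my notes, and SHMMAX just needs to be comfortably larger than shared_buffers plus per-connection overhead:

```shell
# Example values sized for shared_buffers = 2048MB (illustrative,
# not the exact values I used). Add to /etc/sysctl.conf:
#   kernel.shmmax = 4294967296   # max shared memory segment, bytes (4GB)
#   kernel.shmall = 1048576      # total shared memory, 4KB pages (4GB)
# Then apply without a reboot:
echo "sysctl -p"
```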
Between each run I dropped the “test” database and recreated it, initializing with the commands below. Note the different scale factors to get nontrivial data amounts – scale factor 1000 is pretty large, but it was necessary before I saw the disks working constantly during the benchmarks.
pgbench -i -s 100 -U postgres -d test
pgbench -i -s 1000 -U postgres -d test
Then I ran the tests specifying 40 clients and 10,000 transactions per client in the params. Read more about the pgbench tool for the test differences and what the transactions involve.
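The run invocations looked roughly like this; these are my reconstruction from the parameters described (40 clients, 10,000 transactions each, against the stock TPC-B-like test and the select-only variant), not copied verbatim from my shell history:

```shell
# TPC-B-like test: 40 clients, 10,000 transactions per client.
pgbench -c 40 -t 10000 -U postgres test

# Select-only variant of the same run (-S skips the writes).
pgbench -c 40 -t 10000 -S -U postgres test
```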
Here are the pgbench results. Numbers are transactions/second. Values listed are again the middle of 3 runs. I would have liked to do these pgbench runs on more configs but I again was running out of time and these suckers took a long time to run with all the dropping and re-initializing.
| Test# | Config# | TPC-B s=100 | SELECT s=100 | TPC-B s=1000 | SELECT s=1000 |
|-------|---------|-------------|--------------|--------------|---------------|
These pgbench tests were mostly a wash, but that isn’t a big surprise considering I only compared the 3 best configurations, which were already close in the dd/bonnie++ tests. I really wish there had been time to get some other configs into that table to see the difference. The #9 config did well, with a top-3 number in all 4 tests.
I went with setup #9. Pure hardware raid 10. It won almost all tests (with the striped raid 10s being the only real competitors) and pure hardware raid is super easy to configure, maintain, and monitor.
More generally, this MD1120 performs pretty well, especially for the relatively low price. As a quick note on cost: buying one of these MD1120s as configured above, along with a PowerEdge 1950 with 16GB RAM and a Perc6/E card to connect them, would cost less than $15k (about $540/mo on a 36-month $1-buyout lease), depending on the configuration and the deal you get. You could probably even shave a few thousand off that number with a really good deal.
If you do buy stuff from Dell be sure to get in touch with a small business sales team. They can offer nontrivial discounts on the price you can finagle in their online shopping cart, you get to talk to the same people every time you place an order, it gives you a contact if you have questions or run into issues with technical support, and it just generally is the way to go.
This MD1120+Perc6/E combo is connected to our existing DB server now and all I will say is performance is excellent. I am seeing zero query backup, barely any IO wait reported by vmstat, and hugely impressive random IO spikes when load gets heavy (though we haven’t gotten close to maxing out so who knows how high it could go). And, thanks to all this testing, I feel really good about having the optimal configuration for the hardware out there doing work.