<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>gtuhl: startup technology &#187; sysadmin</title>
	<atom:link href="http://blog.gtuhl.com/category/sysadmin/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gtuhl.com</link>
	<description>Development, IT, Gadgets, and Startups</description>
	<lastBuildDate>Fri, 14 Oct 2011 12:15:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Installing Percona 5.1 on Centos 5</title>
		<link>http://blog.gtuhl.com/2011/06/14/installing-percona-5-1-on-centos-5/</link>
		<comments>http://blog.gtuhl.com/2011/06/14/installing-percona-5-1-on-centos-5/#comments</comments>
		<pubDate>Wed, 15 Jun 2011 02:23:33 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[centos]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[percona]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=797</guid>
		<description><![CDATA[I recently moved several peripheral machines to using the Percona builds of MySQL with their XtraDB storage engine. I picked up more interest around their products after attending their conference in NYC a couple weeks ago. The performance difference is substantive and easily seen in the various benchmarks I&#8217;ve done. The Percona build hits a [...]]]></description>
			<content:encoded><![CDATA[<p>I recently moved several peripheral machines to using the <a href="http://percona.com">Percona</a> builds of MySQL with their XtraDB storage engine.  I picked up more interest around their products after attending their conference in NYC a couple weeks ago.</p>
<p>The performance difference is substantive and easily seen in the various benchmarks I&#8217;ve done.  The Percona build hits a higher transactions/sec peak and holds out longer before InnoDB/XtraDB starts thrashing on significant concurrency.  I also like knowing there is a support structure in place should it be needed that doesn&#8217;t involve talking to Oracle.  </p>
<p>Though not yet battle tested, I believe overall it will be a good move and their stuff is certainly worth a look if you are exploring options with MySQL.  They have lots of nice, tested adjustments in the code and bake in the cool addons you read about (<a href="http://yoshinorimatsunobu.blogspot.com/search/label/handlersocket">HandlerSocket</a>, <a href="http://www.facebook.com/note.php?note_id=388112370932">FlashCache</a>, <a href="http://www.percona.com/docs/wiki/percona-xtrabackup:start">XtraBackup</a>, etc) plus you know you are working with all the latest good stuff in the <a href="http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-plugin-installation.html">InnoDB Plugin</a> as all of that is included in the base Percona install.  It&#8217;s a drop in replacement so replication, mmm, mytop, and all the other tools you are used to work just fine.</p>
<p>But, their packaging and installation is fairly rough.  Right now 7 of the most recent 10 posts in their forum are about package incompatibilities and broken installations across Centos and Ubuntu and there isn&#8217;t much help to be found from the staff there.</p>
<p>Here&#8217;s how I got 5.1 to install on Centos 5.5 machines (where stock 5.0 is the default package) such that I could still install other packages like mytop that depended on a 5.0 shared library .  I hate having to force packages in directly outside the package manager but couldn&#8217;t find another way given their packages collide with one another.  We&#8217;re basically forcing in the shared-compat package and then forcing the shared package we want in on top of that.  This was to upgrade an existing stock MySQL server.</p>
<ul>
<li>Backup your data completely and shove it in a safe place just in case, then as root:</li>
<li><code>$/etc/init.d/mysqld stop</code></li>
<li><code>$rpm -Uhv http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm</code></li>
<li><code>$yum remove mysql-server mysql mysql-devel mysql-test mysql-bench</code> (you may not have all of these)</li>
<li><code>$yum install Percona-Server-server-51 Percona-Server-client-51</code>
<ul>
<li>The server will be started if install finishes and there are no issues, kill it with <code>/etc/init.d/mysql stop</code> if this is the case.</li>
</ul>
</li>
<li><code>$ln -s /etc/init.d/mysql /etc/init.d/mysqld</code> (preference thing, but this way it matches the Centos init script name)</li>
<li><code>$yum install Percona-SQL-shared-compat-5.1.42-2.x86_64</code>
<ul>
<li>This will error out but will leave the rpm we want in yum&#8217;s cache.</li>
</ul>
</li>
<li><code>$rpm -Uvh /var/cache/yum/percona/packages/Percona-SQL-shared-compat-5.1.43-2.x86_64.rpm --force</code></li>
<li><code>$yum remove Percona-Server-shared-51</code></li>
<li><code>$yum install Percona-Server-shared-51</code>
<ul>
<li>This will error out but will leave the rpm we want in yum&#8217;s cache.</li>
</ul>
</li>
<li><code>$rpm -Uvh /var/cache/yum/percona/packages/Percona-Server-shared-51*.rpm --force</code></li>
<li>
</ul>
<p>That should get things installed so try starting mysql back up and watch /var/log/mysqld.log for any errors.  You may have old cruft in your 5.0 my.cnf file that is not compatible with 5.1.</p>
<p>Assuming it does start back up you will want to complete the upgrade:</p>
<ul>
<li>Run a <code>mysql_upgrade</code></li>
<li>Run a <code>mysqlcheck -A --check-upgrade</code> to make sure everything checks out.  If you get messages containing &#8220;Table upgrade required&#8221; for some of your InnoDB tables manually fix them with <code>alter table MY_TABLE engine=InnoDB</code>.</li>
</ul>
<p>That should wrap things up.  No guarantees but this is what worked for me.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2011/06/14/installing-percona-5-1-on-centos-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Basic Command Line Shortcuts</title>
		<link>http://blog.gtuhl.com/2010/12/18/basic-command-line-shortcuts/</link>
		<comments>http://blog.gtuhl.com/2010/12/18/basic-command-line-shortcuts/#comments</comments>
		<pubDate>Sat, 18 Dec 2010 16:12:17 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[commandline]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[osx]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=710</guid>
		<description><![CDATA[The command line is one of the greatest benefits of using Linux or OSX over Windows. One tip that some people do not know is that many familiar emacs shortcuts work on the command line as well. If you are not an emacs user already, here are some of the most basic but useful shortcuts: [...]]]></description>
			<content:encoded><![CDATA[<p>The command line is one of the greatest benefits of using Linux or OSX over Windows.  One tip that some people do not know is that many familiar emacs shortcuts work on the command line as well.  If you are not an emacs user already, here are some of the most basic but useful shortcuts:</p>
<table>
<tr>
<td style="width: 120px">CTRL-a</td>
<td>Move cursor to beginning of line.</td>
</tr>
<tr>
<td style="width: 120px">CTRL-e</td>
<td>Move cursor to end of line.</td>
</tr>
<tr>
<td style="width: 120px">ESC-b</td>
<td>Move cursor back one word.</td>
</tr>
<tr>
<td style="width: 120px">ESC-f</td>
<td>Move cursor forward one word.</td>
</tr>
<tr>
<td style="width: 120px">CTRL-k</td>
<td>Cuts all characters to the right of your cursor.</td>
</tr>
<tr>
<td style="width: 120px">CTRL-w</td>
<td>Cuts all characters to the left of your cursor.</td>
</tr>
<tr>
<td style="width: 120px">CTRL-y</td>
<td>Paste cut characters at current cursor position.</td>
</tr>
<tr>
<td style="width: 120px">CTRL-SHIFT-dash</td>
<td>Undo cuts, pastes, and typing steps.</td>
</tr>
<tr>
<td>CTRL-r</td>
<td>Reverse search through your recently executed commands.  This one is perhaps the most useful of all the above.  Just hit the shortcut, type the first characters of a command you want to run again, and hit enter.</td>
</tr>
</table>
<p></p>
<p>For emacs users these are well baked into muscle memory but even if you hate emacs they are worth learning for when you are working on the command line.  Just the small set listed above is enough of a foundation that you can quickly make big changes to your commands with just a few key strokes and generally without having to touch the arrow keys.  Combining these basics with a few additional tricks (like using &#8220;cd -&#8221; to jump back to previous path, sudo !!, etc) can really get your speed up.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2010/12/18/basic-command-line-shortcuts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Zabbix 1.8 Network Monitoring</title>
		<link>http://blog.gtuhl.com/2010/05/03/zabbix-1-8-network-monitoring/</link>
		<comments>http://blog.gtuhl.com/2010/05/03/zabbix-1-8-network-monitoring/#comments</comments>
		<pubDate>Tue, 04 May 2010 02:22:30 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[books]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[zabbix]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=558</guid>
		<description><![CDATA[In the next few days the first book completely about Zabbix (that I am aware of) is being released. Entitled Zabbix 1.8 Network Monitoring it is a welcome release. In my mind books and ever increasing version numbers are a good sign for the health of Zabbix. While the existing documentation for Zabbix is pretty [...]]]></description>
			<content:encoded><![CDATA[<p>In the next few days the first book completely about Zabbix (that I am aware of) is being released.   Entitled <a href="https://www.packtpub.com/zabbix-1-8-network-monitoring/book">Zabbix 1.8 Network Monitoring</a> it is a welcome release.  In my mind books and ever increasing version numbers are a good sign for the health of Zabbix.  </p>
<p>While the existing documentation for Zabbix is pretty decent it is presented in a very reference-focused manner.  It can explain something you know to ask about very well but doesn&#8217;t offer tips or more general context for the best approach to take in your setup.  This new book takes that core documentation (restates it perhaps a tiny amount too much) and makes it far more presentable and easier to scan in a way that promotes picking up those little details the documentation is not good for.</p>
<p>I&#8217;ve been working through most of this book the last few days and wanted to provide some thoughts both good and bad.</p>
<h2>The Good:</h2>
<ul>
<li>This is a very complete book.  The gaps in the documentation are well filled.</li>
<li>I&#8217;ve setup 1.4 and 1.6 Zabbix servers and installed a 1.8 server to work through the book with.  Even when reading it very quickly I learned several new things that I flat out wasn&#8217;t aware of or saw cleaner ways to do things (like using the built in IPMI agent support instead of wrapping freeipmi or similar).</li>
<li>Tasks around maintaining Zabbix are well covered.  I greatly appreciate there being dedicated sections and chapters on upgrading, backups, reporting, and performance tuning of Zabbix itself.</li>
<li>While the book does start out slow with basic content it finishes very strong and covers a lot of more advanced topics that make it obvious the author has actually used the software he is writing about.</li>
<li>The screenshots, code samples, and command line examples are complete enough that you don&#8217;t have to have a terminal open while reading.</li>
<li>I like the iterative approach used in many of the examples where the reader is shown each layer of the setup to aid with troubleshooting.</li>
<li>The versions of Zabbix are not highly dissimilar so this would be a good reference for any reasonably recent release (with primarily 1.6 experience the entire book still felt completely relevant).</li>
</ul>
<h2>The Bad:</h2>
<ul>
<li>The book is pretty verbose and a nontrivial portion is already captured by the freely available documentation.</li>
<li>The Linux help early on feels out of place and is not particularly good advice.  Perhaps it is just me but surely it would be safe to assume intermediate Linux knowledge from an audience that is setting up network monitoring.  The parts towards the front about writing init scripts and getting things installed are completely unnecessary on any reasonable Linux distribution (yum, pacman, apt-get all have Zabbix server and clients you can install with a single command).  Though, I suppose that information could be helpful if you want a non standard config or some insight into how things work a bit under the hood.</li>
<li>My biggest issue is the amount of background and the number of pages before what I consider the meat gets covered in depth.  That is, triggers, actions, notifications, and charting.</li>
</ul>
<h2>Conclusion</h2>
<p>Overall I am happy to see this book released.  Despite some rough edges in the interface and documentation (this book hopefully will fill the latter gap) Zabbix is a really powerful, flexible piece of software that deserves more users.  I have used Nagios, Zenoss, Cacti, and plenty of my own bash script setups and prefer Zabbix over everything else.  </p>
<p>While this book does restate some of the official documentation it does bring deeper examples and better context in addition to a lot of additional content that would be very helpful for someone getting started with Zabbix.  It is a long book that probably doesn&#8217;t make sense to read cover to cover (you probably will never need all the information in here unless you are building a gnarly monitoring install) but it would be a great learning tool and handy to have on hand as a reference long term for when setting up more advanced configurations.</p>
<p>There is enough good content in here to strongly recommend it to anyone learning about Zabbix especially if you are tasked with setting up a large or complicated installation.  If you aren&#8217;t sure, give this <a href="http://blog.gtuhl.com/downloads/7689_Zabbix_SampleChapter.pdf">Sample Chapter</a> a read and see how you like the feel of the book first.  That sample chapter is actually a pretty great resource as it covers the basic monitoring, triggers, actions, and notifications that are the core of Zabbix.  </p>
<p>If you are looking to monitor your gear with any degree of depth Zabbix really is a great tool.  The web-based monitoring options popping up are great for shallow uptime reports and basic notifications on complete outtages but Zabbix can be used to tell you a lot more about <strong>why</strong> things are failing.  At my old company I was monitoring DB TPS, Java heap and garbage collection stats, OpenMQ queue sizes, Postfix queues, end-to-end response time for users logging into their dashboards, and all sorts of other cool stuff in addition to the basics you get with the default templates.</p>
<p>Disclaimer:  I was provided a copy of this book by Packt Publishing and was happy to give it a read.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2010/05/03/zabbix-1-8-network-monitoring/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Not a Fan of MySQL</title>
		<link>http://blog.gtuhl.com/2010/04/15/not-a-fan-of-mysql/</link>
		<comments>http://blog.gtuhl.com/2010/04/15/not-a-fan-of-mysql/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 13:04:09 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=527</guid>
		<description><![CDATA[It has been a wonderfully busy 2010 thus far. The blog has suffered but hoping to have a post up soon about building a monstrous build server on a budget. That said, I have been forced (by software, not by management) to use MySQL on a couple projects in the last month and after being [...]]]></description>
			<content:encoded><![CDATA[<p>It has been a wonderfully busy 2010 thus far.  The blog has suffered but hoping to have a post up soon about building a monstrous build server on a budget.  </p>
<p>That said, I have been forced (by software, not by management) to use MySQL on a couple projects in the last month and after being a PostgreSQL user for the last several years it has been an incredibly frustrating experience worth throwing up a couple bullet points about.  The more I use MySQL the more frustrated I get.  These are high level and not well argued but hoping to get points across.  I always invite digging into details.</p>
<p><strong>Reasons why I dislike MySQL:</strong></p>
<ul>
<li>The planner is incredibly dumb.  I feel like it does the wrong thing most of the time.</li>
<li>Temporary tables and subselects are relatively worthless.  They crush performance and are full of bizarre gotchas due to limitations and bugs in the planner (like not being able to use a temp table more than once in the same query).</li>
<li>The tuning process is not at all intuitive or consistent.  Tuning MySQL queries feels like trying to maximally combine hacks and workarounds for bugs in the planner to achieve something vaguely close to desired speed.</li>
<li>The planner goes to disk a LOT.  Subtle adjustments to the query will prevent it from doing so but why can&#8217;t it do the right thing and avoid disk except as a last resort or only when something doesn&#8217;t fit in memory?  My laptop has 4GB and even with a 100MB DB the MySQL planner goes to disk all the time.</li>
<li>The documentation is superficial, incomplete, and inconsistent.  The examples are trivial and unhelpful.</li>
<li>Doing a DB dump locks the ENTIRE DB.  This is awful.  I don&#8217;t want to setup replication on a trivial DB (less than say 100MB) just to do backups without locking things up.</li>
</ul>
<p><strong>PostgreSQL Comparison</strong><br />
This experience has really emphasized some huge advantages of PostgreSQL even ignoring the technical points:</p>
<ul>
<li>The code is of exceptionally high quality and exceedingly clean.  Corner cases are rare and bugs even rarer.</li>
<li>The documentation is complete, organized, and consistent.  You can genuinely learn 95% of what there is to know about PostgreSQL by reading the excellent documentation (that is kept updated and synchronized with each new release).  It is also very easy to find.  It feels like they spend as much time and effort on the documentation as they do the code.  Only OpenMQ documentation has rivaled PostgreSQL in completeness in recent memory.</li>
<li>The PostgreSQL planner is very solid and it makes the right decisions most of the time without any assistance from a human.  You can use the documentation to build a solid foundation of understanding about how things work and then use that plus your own intuition to achieve desired results.  If a feature is available in PostgreSQL it will be fast and fully understood by the planner.  The same cannot be said for MySQL where raw cycles are required to slowly absorb all the known bugs, workarounds, and feature-specific knowledge about what is actually usable on a non trivial DB and what is not.</li>
</li>
</ul>
<p>Just some thoughts.  Perhaps my perception is incorrect given the disparity in usage time.  To be blunt I would smile if Oracle destroyed MySQL and on a semi-related note believe the shenanigans of the founder/creator of MySQL around the Oracle acquisition and trying to get back something that was fairly purchased are completely lame.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2010/04/15/not-a-fan-of-mysql/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Monitoring PSUs in Arch Linux Dell Servers</title>
		<link>http://blog.gtuhl.com/2009/08/22/monitoring-psus-in-arch-linux-dell-servers/</link>
		<comments>http://blog.gtuhl.com/2009/08/22/monitoring-psus-in-arch-linux-dell-servers/#comments</comments>
		<pubDate>Sat, 22 Aug 2009 12:27:40 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[archlinux]]></category>
		<category><![CDATA[dell]]></category>
		<category><![CDATA[ipmi]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[zabbix]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=492</guid>
		<description><![CDATA[We currently use Arch Linux exclusively for servers.  Much of our equipment comes from Dell and one of the gotchas of using a non-Redhat, non-SUSE linux distribution with their servers is you cannot just drop in their Open Manage tools to monitor everything.  As a side note, despite the bloat of Open Manage, it actually [...]]]></description>
			<content:encoded><![CDATA[<p>We currently use Arch Linux exclusively for servers.  Much of our equipment comes from Dell and one of the gotchas of using a non-Redhat, non-SUSE linux distribution with their servers is you cannot just drop in their Open Manage tools to monitor everything.  As a side note, despite the bloat of Open Manage, it actually isn&#8217;t a bad set of tools once you get it installed (on an rpm-based distribution) &#8211; the command line utilities you get with it are pretty decent.  The GUI stuff is largely worthless in my biased opinion.</p>
<p>In any case, Open Manage is a big pile of rpms with lots of dependencies so it wouldn&#8217;t be easy to transfer to Arch.  I posted awhile back about <a href="http://blog.gtuhl.com/2009/03/11/monitoring-dell-perc5-and-perc6-disks-in-arch-linux/">using LSI&#8217;s Megaraid CLI tool to monitor Dell raid arrays</a> but what about everything else?  One big item that was really haunting me was power supplies.  I had no monitoring on those things so if I went too long between datacenter visits to check for amber lights on the fronts of the servers, we could have a double PSU failure on a really important server and be in a lot of trouble.  Server PSUs fail A LOT so don&#8217;t discount the importance of monitoring them.  My personal experience might be unusual, but i&#8217;ve had more PSUs fail than hard drives.</p>
<p>Yesterday I read up on <a href="http://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface">Intelligent Platform Management Interface (IPMI)</a> and saw that Dell has supported it for awhile.  I suspect their Open Manage tools are simply a proprietary wrapper around this.</p>
<p>So you just need some other software that will let you access this stuff. Some quick searches revealed ipmitools and freeipmi being popular.  I went with <a href="http://www.gnu.org/software/freeipmi/">freeipmi</a> because it is a very active project with recent updates (the Arch Linux AUR had them both but both PKGBUILDs were broken).</p>
<p>Installing it is straight forward:  download tarball, unpack, ./configure, make, make install.  This will place a bunch of command line tools in /usr/local/sbin/.  You&#8217;ll probably want to script the install process if you have to setup lots of servers.  Or, don&#8217;t be lazy like me and bundle up a PKGBUILD for the Arch AUR and just use yaourt for the installs.</p>
<p>You can check out the README and man pages for information about all the various commands but I am just using one: ipmi-sel.  This prints out the contents of the &#8220;System Event Log&#8221; and seems to correspond very nicely with any messages that appear on the front LCD of the server.  I removed, replaced, and removed again a PSU in a server and saw this perfectly parseable and useful output:</p>
<pre>28:21-Aug-2009 09:00:36:Power Supply Status <img src='http://blog.gtuhl.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> resence detected
29:21-Aug-2009 09:00:37:Power Supply PS Redundancy:Redundancy Lost
30:21-Aug-2009 09:04:56:Power Supply Status <img src='http://blog.gtuhl.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> resence detected
31:21-Aug-2009 09:04:57:Power Supply PS Redundancy:Fully Redundant (formerly "Redundancy Regained")
32:21-Aug-2009 09:05:16:Power Supply Status <img src='http://blog.gtuhl.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> resence detected
33:21-Aug-2009 09:05:17:Power Supply PS Redundancy:Redundancy Lost</pre>
<p>Other events I have seen in this log are memory DIMM failures and the case being open &#8211; and those are conveniently the only other alerts i&#8217;ve seen on the LCDs before.  You can clear out the SEL and have a few other options just with that one ipmi-sel tool &#8211; read the man page for more information.</p>
<p>For monitoring PSU failure I am using this script.  I am sure there are tighter approaches but this one works fine:</p>
<p><code>/usr/local/sbin/ipmi-sel | grep "PS Redundancy:" | tail -n1 | grep "Redundancy Lost" | wc -l</code></p>
<p>That will return a 1 if the last &#8220;PS Redundancy&#8221; related item in the log is a failure, and 0 otherwise.  You can then easily snap that into Zabbix or whatever monitoring software you prefer.  I did a <a href="http://blog.gtuhl.com/2009/02/21/monitoring-postgresql-tps-with-zabbix/">post with more detail on adding items to Zabbix</a> awhile back that might be helpful if you are not familiar with the process.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/08/22/monitoring-psus-in-arch-linux-dell-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Tips and Tricks</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/</link>
		<comments>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 22:00:03 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420</guid>
		<description><![CDATA[Here&#8217;s a dozen tips for working with a PostgreSQL database. It is a sophisticated and powerful piece of software and just knowing a few rules of thumb before diving in can be a huge help. If you want more detail read the amazing documention. My list of tips was very long so I just chopped [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a dozen tips for working with a PostgreSQL database.  It is a sophisticated and powerful piece of software and just knowing a few rules of thumb before diving in can be a huge help.  If you want more detail <a href="http://www.postgresql.org/docs/8.3/interactive/">read the amazing documention</a>.  My list of tips was very long so I just chopped off a dozen for this post.</p>
<h2>#1: Don&#8217;t do sequential scans &#8211; use indexes</h2>
<p>Do SELECTs against indexes.  Sequential scans will devastate your IO and in most cases should be avoided.  I read on a bogus guide somewhere that needing indexes was a sign of a bad database schema.  That is complete BS.</p>
<p>The only time indexes can hurt is when doing bulk inserts.  I <a href="http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/">did a post</a> about dropping/recreating to temporarily boost bulk insert performance awhile back.</p>
<p>Most of the time it doesn&#8217;t matter though.  Throw down as many indexes as you need.</p>
<h2>#2: Index all foreign keys</h2>
<p>There are certain rare cases where this doesn&#8217;t help, but as your database grows in size this becomes increasingly important.  Say you have a &#8220;people&#8221; table.  You then have an &#8220;emails&#8221; table with a person_id column that is a FK to the people table.  As an example say you have 10 million rows in the emails table.</p>
<p>Now delete a row from that people table.  PostgreSQL must scan the emails table and ensure that no email rows reference the person row you just deleted.  If that person_id column on emails is not indexed you just kicked off a sequential scan of 10 million rows with your deletion attempt.  Index that column and the pain goes away.  My general rule of thumb is to index every single FK.  You can always drop the indexes later if they aren&#8217;t needed.</p>
<p>Here&#8217;s a handy query for discovering all the FKs pointed at a table in a database.  You can then shoot through the results and verify that each table has indexes for those FKs.</p>
<pre>
select t.constraint_name, t.table_name, t.constraint_type,
   c.table_name, c.column_name
from information_schema.table_constraints t,
   information_schema.constraint_column_usage c
where t.constraint_name = c.constraint_name
   and t.constraint_type = 'FOREIGN KEY'
   and c.table_name = 'MY_TABLE_NAME';
</pre>
<h2>#3: Don&#8217;t be afraid to selectively de-normalize</h2>
<p>It is only a matter of time before you realize the <a href="http://en.wikipedia.org/wiki/Database_normalization#Normal_forms">normal forms</a> you learned about in school can seriously hurt your database performance as size gets huge.  De-normalize (usually meaning duplicate a value in two tables) when it provides a big performance increase.  You must be mindful of the maintenance concerns with de-normalization.  The maintenance can be done in code, with triggers or similar in the DB, or sometimes the data is static enough that it isn&#8217;t a huge concern.</p>
<p><strong>Edit:</strong> I hedged this advice from the original post (originally was a blanket &#8220;de-normalize all the time&#8221; that could burn someone new to databases).  You should know the normal forms and understand them.  My point was that you shouldn&#8217;t feel that wavering from them with de-normalization makes your database a bad one &#8211; it is simply a reality of nontrivial production databases.  But, if you don&#8217;t understand them, and/or don&#8217;t understand when and why de-normalization can help blindly ignoring the normal forms is only going to get you into more trouble.  So learn them and understand them but then don&#8217;t be afraid to selectively de-normalize where it can give you big boosts of performance.  A common scenario that is often very helpful is de-normalizing to eliminate a join or multiple reads.</p>
<h2>#4: Avoid joins on huge tables</h2>
<p>As your tables get bigger and bigger (many millions of rows) joins cannot be done in memory and spill over to disk.  Once that happens your performance is gone.  For smaller tables it doesn&#8217;t matter but as tables get bigger avoid joining when you can.  Try using unions, subselects, or denormalization instead.  If you have two big tables that are 1-to-1 that you are joining a lot combine them into one wide table.</p>
<h2>#5: Don&#8217;t do unanchored text searches or full text searching</h2>
<p>That is don&#8217;t try something like this with wildcards on both ends:<br />
<code>select * from people where first_name like '%joe%'</code></p>
<p>The reason is that no index can be used so PostgreSQL will be doing a sequential scan no matter what.  If you truly need full text searching use Lucene or PostgreSQL 8.3+ full text searching capabilities instead of standard indexes.</p>
<p>If you are in a pinch and can get by with the left side being anchored you can create a special index using varchar_pattern_ops like this:</p>
<p><code>create index idx_people_first_name on people (first_name varchar_pattern_ops);</code></p>
<p>That index can be used by this query (no wildcard on left side):<br />
<code>select * from people where first_name like 'joe%';</code></p>
<p>But be aware that same index cannot be used by a query using the standard &#8220;=&#8221; operator so you have to create two indexes to get both (one with varchar_pattern_ops, one without).  Yes this is messy and its why you should use Lucene, PostgreSQL 8.3+ full text searching, or something similar to do text searching.</p>
<h2>#6: Use the COPY command for bulk data loading</h2>
<p>It is orders of magnitude faster.  <a href="http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/">Check a previous post of mine</a> if you want numbers.</p>
<h2>#7: Use multi-column indexes</h2>
<p>For common queries with compound WHERE clauses create a multi-column index for maximum performance.  An example:<br />
<code>create index idx_people_full_name on people (first_name, last_name);</code></p>
<p>Now you can do queries like this and hit that index:<br />
<code>select * from people where first_name = 'joe' and last_name= 'smith';</code></p>
<p>Additionally, any WHERE clause using just first_name can take full advantage of this index (because first_name is the first column) so don&#8217;t waste disk space creating redundant single column indexes if you are already covered by a multi-column.</p>
<p>Finally, you can align a multi-column index with a common ORDER BY clause to make that query instant.  Using the above index a query against people ending in <code>order by first_name, last_name</code> would be super fast.</p>
<p>As a final note PostgreSQL can combine multiple single-column indexes so multi-column indexes on every combination are not necessary.  But, a single multi-column is going to be faster so they can be helpful for queries you run a lot.  You can read more about combining of multiple indexes <a href="http://www.postgresql.org/docs/8.2/interactive/indexes-bitmap-scans.html">here</a>.</p>
<h2>#8: Don&#8217;t bother indexing evenly distributed booleans or enum columns</h2>
<p>In PostgreSQL indexes do not store data, they are instead pointers to the data.  This means that after scanning the index a random read must be done for each matching record to get the data.  Random IO is a lot more expensive than sequential IO (though SSDs are changing that) so if the PostgreSQL planner (it does sampling of your data in the background) knows you have a very evenly distributed column (a boolean split 50/50 for instance) it won&#8217;t use the index at all.  It will instead make the assumption that a straight up sequential scan will be faster than index scanning for half the rows and having to do a ton of random IO.  Indexes on these types of columns are a waste of disk space and you are slowing down your inserts for no benefit.</p>
<h2>#9: Use EXPLAIN ANALYZE to benchmark and compare queries</h2>
<p>Using EXPLAIN on its own will give you the planner&#8217;s guess at how it will run your query.  This is pretty useful if you have a very huge query that is going to take a long time and you want to tune it without having to actually run it after each tweak.  The complete EXPLAIN ANALYZE will run the query, show the plan predicted, show what actually happened, and how long it took.</p>
<p>Be sure to ANALYZE after significant table modifications so that the planner can do its sampling and adjust its plans to account for any index changes.</p>
<h2>#10: Use expression indexes &#8211; they are awesome</h2>
<p>In PostgreSQL you can index an expression.  Here is a simple example:<br />
<code>create index idx_people_lower_first_name on people (lower(first_name));</code></p>
<p>Now you can run queries like this and use the index &#8211; no need to create a separate column with lowercase values:<br />
<code>select * from people where lower(first_name) = 'joe';</code></p>
<p>You can similarly index combinations of your data-containing columns:</p>
<p><code>create index idx_people_lower_full_name on people (lower(coalesce(first_name, '')) || lower(coalesce(last_name, '')));</code></p>
<h2>#11: Indexes can have WHERE clauses</h2>
<p>You can create conditional indexes that only apply for certain rows.  A useful scenario is creating a unique constraint on a column or set of columns only where certain conditions apply:</p>
<p><code>create unique index idx_people_person_id on people (person_id) where deleted = false;</code></p>
<p>That&#8217;s a simple example that feels a bit cleaner than changing the ids of deleted records to avoid conflicting with a unique constraint.  Anything can be placed in that WHERE clause, they aren&#8217;t limited in any way.</p>
<h2>#12: Don&#8217;t use the database if it isn&#8217;t a good fit</h2>
<p>Databases only scale so far alone.  At some point you are going to have to explore memcached, terracotta, or something similar to continue to scale.  The NoSQL (over)excitement going on right now offers alternatives as well for certain scenarios.  The main point is that at some point your database will just be one component in a set of services and technologies that provide the persistence layer for your application.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Why does a Windows Server OS exist?</title>
		<link>http://blog.gtuhl.com/2009/08/03/why-does-a-windows-server-os-exist/</link>
		<comments>http://blog.gtuhl.com/2009/08/03/why-does-a-windows-server-os-exist/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 21:35:06 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[archlinux]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=408</guid>
		<description><![CDATA[Awhile back I had to fight again with the single Windows server at WeTheCitizens to get a Blackberry reset. The RIM server software only runs on Windows, else that final machine would be running Arch Linux like everything else. It was a nightmare to get working &#8211; the RIM software has literally a half dozen [...]]]></description>
			<content:encoded><![CDATA[<p>Awhile back I had to fight again with the single Windows server at <a href="http://www.wildfireplatform.com/">WeTheCitizens</a> to get a Blackberry reset.  The RIM server software only runs on Windows, else that final machine would be running <a href="http://www.archlinux.org/">Arch Linux</a> like everything else.</p>
<p>It was a nightmare to get working &#8211; the RIM software has literally a half dozen separate flaky services all running at the same time and tripping on one another that have to align perfectly.  The worst part though was having to work with a Window server.</p>
<p>That leads me to my question: why does anyone use Windows on servers any more?  How is it manageable at all?  For any Windows admins out there, would love to get some perspective on why people use it.</p>
<ul>
<li>How do you manage updates?  Especially with lots of machines, it seems borderline impossible to keep them all updated without downtime considering the mandatory restart needed for <strong>everything</strong>.  Linux only needs a restart for some kernel modifications (i.e. barely ever).  I have machines in production with 700 days of uptime.</li>
<li>What do you do for remote access?  Does an admin really have to login with a GUI remote desktop to each server?  Is there a VPN endpoint that has to be used or are all the servers left open?  How are bulk operations configured and run in that kind of mess?  With Linux I can whip up a bash script in minutes to do whatever I need done to as many machines as I wish.</li>
<li>Is there a fast approach to setting up new servers on a variety of hardware?  I can install Arch Linux in less than 10 minutes and I am not even doing anything advanced like PXE boots or disk imaging.</li>
<li>Windows fails in large environments.  The latest big example is The London Stock Exchange <a href="http://blogs.computerworld.com/london_stock_exchange_to_abandon_failed_windows_platform">completely abandoning Windows</a> after substantial investment of money and time to set it up on that platform.  As an older example I know for awhile (maybe even still today) <a href="http://www.theregister.co.uk/2001/12/12/microsoft_hotmail_still_runs/">Hotmail ran on FreeBSD</a> after Microsoft purchased it because they couldn&#8217;t make it stable on Windows.</li>
<li>Aside from the second rate Microsoft products that are decisively outperformed by Linux alternatives (IIS, MSSQL, Exchange) is there any decent server or DB software that doesn&#8217;t run better on Linux than on Windows?  I am thinking things like Apache, PostgreSQL, Tomcat (and the Java VM in general), Mongrel, JBoss, and innumerable others.</li>
</ul>
<p>This is really just my frustrated attempt at understanding why people put themselves through the utter pain of running Windows servers.  Why would you not use Linux?  As long as there are people using Windows it means crapware like Blackberry Server has no incentive to port to Linux and that means I have to suffer at work.  We&#8217;re 4:1 ratio for iPhones to BBs so maybe the best option is to get those final BB users switched over to a better phone.</p>
<p>If you are starting a company don&#8217;t even bother with Windows on the server.  Their new Bizspark program may seem like a good idea on the surface but I view it as a trap.  You get to use an inferior product for a few years and then must pay through the nose at the end or rebuild everything with the better technology you should have used from the beginning.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/08/03/why-does-a-windows-server-os-exist/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Dell MD1120 + Perc6/E Performance</title>
		<link>http://blog.gtuhl.com/2009/05/13/dell-md1120-perc6e-performance/</link>
		<comments>http://blog.gtuhl.com/2009/05/13/dell-md1120-perc6e-performance/#comments</comments>
		<pubDate>Wed, 13 May 2009 12:33:44 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[dell]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[raid]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=318</guid>
		<description><![CDATA[The Hardware We recently ordered one of Dell&#8217;s MD1120 units and a Perc 6/E raid card with 512MB battery-backed cache to beef up our production database. Dell&#8217;s raid controllers are rebranded models manufacturered by other companies and they have been hit or miss. They&#8217;ve done some horrible things (including advertising raid1-concatenated as raid10 a long [...]]]></description>
			<content:encoded><![CDATA[<h2>The Hardware</h2>
<p>We recently ordered one of Dell&#8217;s <a href="http://www1.ap.dell.com/content/products/productdetails.aspx/storage_powervault_md1120?c=my&#038;l=en&#038;s=bsd">MD1120</a> units and a Perc 6/E raid card with 512MB battery-backed cache to beef up our production database.</p>
<p>Dell&#8217;s raid controllers are rebranded models manufacturered by other companies and they have been hit or miss.  They&#8217;ve done some horrible things (including advertising raid1-concatenated as raid10 a long while back) but my impression from reading online and from my own benchmarking are that these Perc6 cards are decent (but not exceptional).  You still get the lockin aspect &#8211; Dell won&#8217;t support your machine if there is a non-Dell raid card in it and the MD1xxx units supposedly only connect to Perc5 and Perc6 cards.</p>
<p>The MD1120 itself is a pretty cool unit.  Its only 2U and packs a lot of drives.  We ordered one with 24x 73GB 15K SAS drives.  No SSDs, I am amazed by SSD numbers but figure we can wait a few more years before shelling out the cash to fill an array with them.  I want more data on their reliability in a 24&#215;7 high IO server environment.  Here&#8217;s a picture of the new guy.</p>
<p><center><br />
<img src="http://blog.gtuhl.com/wp-content/md1120.jpg" width="550px"/><br />
</center></p>
<p>
&nbsp;&nbsp;
</p>
<p>The next time we have the need and budget to purchase a new database server from the ground up I plan to go whitebox but in this case we were looking for a relatively inexpensive way to get more capacity out of our existing Dell server and this seemed like a good option.  I just had to ignore their storage tech guy who wanted me to buy a Gigabit SAN unit and screw our performance.  So just ignore the tech guys that are part of the sales process and do your own research and benchmarking.</p>
<p>After benchmarking this Perc6+MD1120 combination extensively and putting it in production I am reasonably happy with its performance.  Going to share those numbers now as it is sometimes hard to track down data on these things.</p>
<h2>The Benchmark</h2>
<p>Bunch of notes about the testing environment and configurations for anyone interested.  If just want numbers skip past these.</p>
<ul>
<li>Perc 6/E upgraded to latest 6.2.0-0013 firmware and connected to a new PowerEdge 1950 with 2x Xeon E5410s and 8GB RAM.</li>
<li>MD1120 connected directly to Perc 6/E.</li>
<li>All hardware raid configured with 64kb stripes, write back enabled, read ahead disabled (Dell hardware read ahead isn&#8217;t good).</li>
<li>Server running latest opensuse.  Did this purely to make it easier to upgrade firmware, get Dell support etc.  If you call Dell and are using opensuse just lie and say you are running Suse 10 &#8211; everything will work and they will never know the difference.</li>
<li>All tests were run 3 times and the middle run was recorded.</li>
<li>xfs mount options were just <code>noatime</code> and ext3 mount options were <code>noatime,data=writeback</code>.</li>
<li>xfs file system params were <code>-b size=4096 -d su=64k,sw=X</code> where X was the appropriate value for the configuration involved.  ext3 params were <code>-b 4096 -E stride=16,stripe-width=192</code>.</li>
<li>dd params were &#8220;bs=8k, count=2000000&#8243; ensuring a file 2 times size of RAM to bypass OS cache.</li>
<li>The bonnie++ random seeks/second is the most important number for DB performance.</li>
<li>I did a ton of tests with the first configuration and then settled into a groove of just testing the bits that seemed to matter.  Hence the odd distribution of tests by config.</li>
</ul>
<h2>The dd and bonnie++ 1.02 results on opensuse</h2>
<p>The distinct raid configurations are color coded and numbered.  The winning individual tests are bolded.</p>
<table border="1" bordercolor="gray" cellpadding="3" cellspacing="0">
<tr bgcolor="#cccccc">
<th colspan="2">Record#</th>
<th colspan="3" align="center">Raid Level</th>
<th colspan="3" align="center">Linux Params</th>
<th colspan="3" align="center">Results</th>
</tr>
<tr bgcolor="#cccccc">
<th>Test</th>
<th>Config</th>
<th>HW</th>
<th>SW</th>
<th>Total</th>
<th>File System</th>
<th>Read Ahead</th>
<th>Sched.</th>
<th>dd Write MB/s</th>
<th>dd Read MB/s</th>
<th>bonnie++ seeks/s</th>
</tr>
<tr>
<td>1</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>540</td>
<td>519</td>
<td>787.4</td>
</tr>
<tr>
<td>2</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>noop</td>
<td>471</td>
<td>439</td>
<td>811.6</td>
</tr>
<tr>
<td>3</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>deadline</td>
<td>494</td>
<td>429</td>
<td>812.1</td>
</tr>
<tr>
<td>4</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>544</td>
<td>836</td>
<td>802.7</td>
</tr>
<tr>
<td>5</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>noop</td>
<td>474</td>
<td>837</td>
<td>809.4</td>
</tr>
<tr>
<td>6</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>deadline</td>
<td>492</td>
<td>791</td>
<td>808.4</td>
</tr>
<tr>
<td>7</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>533</td>
<td>853</td>
<td>805.9</td>
</tr>
<tr>
<td>8</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>536</td>
<td>976</td>
<td>806.3</td>
</tr>
<tr>
<td><strong>9</strong></td>
<td bgcolor="#a8d255"><strong>1</strong></td>
<td><strong>24disk&nbsp;raid10</strong></td>
<td><strong>None</strong></td>
<td><strong>10</strong></td>
<td><strong>xfs</strong></td>
<td><strong>32768</strong></td>
<td><strong>cfq</strong></td>
<td><strong>543</strong></td>
<td><strong>1035</strong></td>
<td><strong>808.6</strong></td>
</tr>
<tr>
<td>10</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>32768</td>
<td>cfq</td>
<td>332</td>
<td>602</td>
<td>695.2</td>
</tr>
<tr>
<td>11</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>4096</td>
<td>cfq</td>
<td>339</td>
<td>929</td>
<td>743</td>
</tr>
<tr>
<td>12</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>4096</td>
<td>noop</td>
<td>356</td>
<td>925</td>
<td>765.4</td>
</tr>
<tr>
<td>13</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>4096</td>
<td>deadline</td>
<td>342</td>
<td>909</td>
<td>712.9</td>
</tr>
<tr>
<td>14</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>566</td>
<td>572</td>
<td>780.4</td>
</tr>
<tr>
<td>15</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>noop</td>
<td>561</td>
<td>567</td>
<td>788.9</td>
</tr>
<tr>
<td>16</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>deadline</td>
<td>552</td>
<td>571</td>
<td>786.6</td>
</tr>
<tr>
<td>17</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>566</td>
<td>623</td>
<td>778</td>
</tr>
<tr>
<td>18</td>
<td bgcolor="#5727ef">3</td>
<td>2x12disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>560</td>
<td>507</td>
<td>535.9</td>
</tr>
<tr>
<td><strong>19</strong></td>
<td bgcolor="#5727ef"><strong>3</strong></td>
<td><strong>2x12disk&nbsp;raid10</strong></td>
<td><strong>raid0</strong></td>
<td><strong>100</strong></td>
<td><strong>xfs</strong></td>
<td><strong>4096</strong></td>
<td><strong>cfq</strong></td>
<td><strong>560</strong></td>
<td><strong>955</strong></td>
<td><strong>816</strong></td>
</tr>
<tr>
<td>20</td>
<td bgcolor="#5727ef">3</td>
<td>2x12disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>558</td>
<td>857</td>
<td>817.5</td>
</tr>
<tr>
<td>21</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>436</td>
<td>478</td>
<td>415.9</td>
</tr>
<tr>
<td>22</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>440</td>
<td>1038</td>
<td>666</td>
</tr>
<tr>
<td>23</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>437</td>
<td>1054</td>
<td>670</td>
</tr>
<tr>
<td>24</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>8192</td>
<td>noop</td>
<td>434</td>
<td>1058</td>
<td>651.7</td>
</tr>
<tr>
<td>25</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>8192</td>
<td>deadline</td>
<td>435</td>
<td>1044</td>
<td>666.1</td>
</tr>
<tr>
<td>26</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>437</td>
<td>1083</td>
<td>667.3</td>
</tr>
<tr>
<td>27</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>424</td>
<td>391</td>
<td>670.2</td>
</tr>
<tr>
<td>28</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>426</td>
<td>1038</td>
<td>669.6</td>
</tr>
<tr>
<td>29</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>424</td>
<td>1052</td>
<td>669.9</td>
</tr>
<tr>
<td>30</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>424</td>
<td>1082</td>
<td>657.5</td>
</tr>
<tr>
<td>31</td>
<td bgcolor="#fdfa8e">6</td>
<td>3x8disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>557</td>
<td>530</td>
<td>621.4</td>
</tr>
<tr>
<td>32</td>
<td bgcolor="#fdfa8e">6</td>
<td>3x8disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>555</td>
<td>936</td>
<td>820.6</td>
</tr>
<tr>
<td>33</td>
<td bgcolor="#fdfa8e">6</td>
<td>3x8disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>560</td>
<td>902</td>
<td>817.7</td>
</tr>
<tr>
<td><strong>34</strong></td>
<td bgcolor="#fdfa8e"><strong>6</strong></td>
<td><strong>3x8disk&nbsp;raid10</strong></td>
<td><strong>raid0</strong></td>
<td><strong>100</td>
<td><strong>xfs</strong></td>
<td><strong>16384</strong></td>
<td><strong>cfq</strong></td>
<td><strong>555</strong></td>
<td><strong>1041</strong></td>
<td><strong>815.5</strong></td>
</tr>
<tr>
<td>35</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>367</td>
<td>573</td>
<td>817.5</td>
</tr>
<tr>
<td>36</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>360</td>
<td>964</td>
<td>814.7</td>
</tr>
<tr>
<td>37</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>358</td>
<td>994</td>
<td>816.3</td>
</tr>
<tr>
<td>38</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>377</td>
<td>1049</td>
<td>818.7</td>
</tr>
<tr>
<td>39</td>
<td bgcolor="#e68d82">8</td>
<td>12x2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>549</td>
<td>408</td>
<td>598</td>
</tr>
<tr>
<td>40</td>
<td bgcolor="#e68d82">8</td>
<td>12x2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>549</td>
<td>714</td>
<td>578.5</td>
</tr>
<tr>
<td>41</td>
<td bgcolor="#e68d82">8</td>
<td>12x2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>546</td>
<td>643</td>
<td>563.8</td>
</tr>
<tr>
<td>42</td>
<td bgcolor="#e68d82">8</td>
<td>12x2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>546</td>
<td>861</td>
<td>549.9</td>
</tr>
<tr>
<td>43</td>
<td bgcolor="#d3e8b1">9</td>
<td>24disk&nbsp;jbod</td>
<td>raid0</td>
<td>0</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>743</td>
<td>1054</td>
<td>687.6</td>
</tr>
<tr>
<td>44</td>
<td bgcolor="#6ec3fd">10</td>
<td>24disk&nbsp;raid0</td>
<td>None</td>
<td>0</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>773</td>
<td>1094</td>
<td>671.8</td>
</tr>
</table>
<p>
&nbsp;&nbsp;
</p>
<p>Observations:</p>
<ul>
<li>Raid10 is the best option.  Winning configurations are pure hardware raid10 with loads of readahead (test #9) and software-striped raid 10 for &#8220;raid 100&#8243; (tests #19 and #34).</li>
<li>I wasn&#8217;t impressed with raid6 or raid60 and raid0 isn&#8217;t a realistic option so that is why those setups aren&#8217;t as heavily hit in the above configurations.</li>
<li>The IO scheduler didn&#8217;t really make much difference.  CFQ seemed to be just as good or better so stuck with it (its the default).</li>
<li>readahead makes a <strong>huge</strong> difference.  Linux defaults this to 256 per drive and Linux sees a hardware raid array as 1 drive.  You absolutely must increase that 256 default value to at least 4096 in my opinion.  I increased it as high as 32768 for the pure raid10 config and performance didn&#8217;t suffer in the seeks/sec or as reported by pg_bench while sequential read speeds increased dramatically.</li>
<li>xfs is faster than ext3 in all tests I compared them in.  Included just a few ext3 numbers above.  But, don&#8217;t use XFS (or ext3 with data=writeback) unless you have <strong>both</strong> a battery-backed cache on your raid controller and are connected to a UPS for main power (and ideally you should be monitoring the health of the BBU on the raid controller to ensure the battery isn&#8217;t dead).  You could lose data if this advice is ignored.</li>
</ul>
<h2>The bonnie++ 1.03e results on Arch Linux</h2>
<p>Next I installed Arch Linux which happens to come with bonnie++ 1.03e.  I did a couple bonnie++ runs just to make sure the new OS didn&#8217;t mess anything up and was shocked to see dramatically better random seeks/second numbers.  Sequential speeds were virtually the same, but seeks/second was massively faster.  Here is a table showing some of the configs (I didn&#8217;t retest them all &#8211; was running out of time) with bonnie++ 1.03e seeks/second numbers.  I am going to credit the newer version of bonnie++ for this difference.  I retested enough configurations to feel good about the trends seen when testing with opensuse and bonnie++ 1.02 still holding.  Open to hearing other possible explanations for the performance increase.  I was glad to see these numbers as I was disappointed with the apparent 800ish ceiling I was seeing in the first batch of tests.</p>
<p>Here are the configs I retested (middle of 3 runs again).  The Test# matches the row in the above table.  The new 1.03e score is listed next to the old 1.02 score.</p>
<table border="1" bordercolor="gray" cellpadding="3" cellspacing="0">
<tr bgcolor="#cccccc">
<th>Test#</th>
<th>Config#</th>
<th>bonnie++ 1.02 seeks/sec
<th>bonnie++ 1.03e seeks/sec</th>
</tr>
<tr>
<td>1</td>
<td bgcolor="#a8d255">1</td>
<td align="right">787.4</td>
<td align="right">1613</td>
</tr>
<tr>
<td>4</td>
<td bgcolor="#a8d255">1</td>
<td align="right">802.7</td>
<td align="right">1652</td>
</tr>
<tr>
<td>5</td>
<td bgcolor="#a8d255">1</td>
<td align="right">809.4</td>
<td align="right">1639</td>
</tr>
<tr>
<td>6</td>
<td bgcolor="#a8d255">1</td>
<td align="right">808.4</td>
<td align="right">1688</td>
</tr>
<tr>
<td>7</td>
<td bgcolor="#a8d255">1</td>
<td align="right">805.9</td>
<td align="right">1684</td>
</tr>
<tr>
<td>8</td>
<td bgcolor="#a8d255">1</td>
<td align="right">806.3</td>
<td align="right">1697</td>
</tr>
<tr>
<td>9</td>
<td bgcolor="#a8d255">1</td>
<td align="right">808.6</td>
<td align="right">1717</td>
</tr>
<tr>
<td>19</td>
<td bgcolor="#5727ef">3</td>
<td align="right">816</td>
<td align="right">1662</td>
</tr>
<tr>
<td>26</td>
<td bgcolor="#bd550c">4</td>
<td align="right">667.3</td>
<td align="right">1056</td>
</tr>
<tr>
<td>30</td>
<td bgcolor="#6f5900">5</td>
<td align="right">657.5</td>
<td align="right">1168</td>
</tr>
<tr>
<td>34</td>
<td bgcolor="#fdfa8e">6</td>
<td align="right">815.5</td>
<td align="right">1705</td>
</tr>
<tr>
<td>38</td>
<td bgcolor="#86a3ff">7</td>
<td align="right">818.7</td>
<td align="right">1560</td>
</tr>
<tr>
<td>44</td>
<td bgcolor="#6ec3fd">10</td>
<td align="right">671.8</td>
<td align="right">1175</td>
</tr>
</table>
<p>
&nbsp;&nbsp;
</p>
<p>Observations:</p>
<p>Higher numbers across the board.  No big new insights.  Raid 10 still wins, raid 6/60 still substantially slower.  At this point that pure raid 10 config (test #9) that scored a 1717 is looking pretty nice.  The pure software raid 10 (test #38) fell behind the hardware version further.  Biggest take away from this is to be absolutely certain that when you are benchmarking disks the OS and tools are identical.</p>
<h2>The pgbench results</h2>
<p>Finally, I took the 3 fastest configurations and did pgbench runs with those.  I was running out of time so I took what looked like the winner (test #9) and additionally tuned readahead and schedules a bit to ensure I got the best combination.  pgbench isn&#8217;t perfect and there are people who dislike it but it gives us another number to compare and consider along with the raw dd/bonnie++ numbers already known.  Keep in mind I am using a PostgreSQL install and pgbench on this new server &#8211; not the actual production server.  Doing the benchmarks on the actual final server just wasn&#8217;t an option.  The only significant difference between the benchmark server and the production server is that the latter has 4 times the RAM and has had nontrivial postgresql.conf tuning so I can only assume these numbers would improve a good bit.</p>
<p>I just tweaked a few things in the postgresql.conf file on the benchmark server.  The non default values:</p>
<ul>
<li><code>shared_buffers = 2048MB</code></li>
<li><code>checkpoint_segments = 10</code></li>
<li><code>effective_cache_size = 4096MB</code></li>
<li><code>max_connections = 500</code></li>
<li><code>work_mem = 20MB</code></li>
<li><code>maintenance_work_mem = 128MB</code></li>
<li><code>synchronous_commits = off</code></li>
<li><code>random_page_cost = 2.0</code></li>
</ul>
<p>Note that I did have to increase SHMMAX to get shared_buffers + connections that high.  See my <a href="http://blog.gtuhl.com/2009/03/08/postgresql-setup-basics/">PostgreSQL setup post</a> for more information about that.</p>
<p>Between each run I dropped the &#8220;test&#8221; database and recreated it.  I initialized with these commands.  Note the different scale factors to get nontrivial data amounts &#8211; scale factor 1000 is pretty large but was necessary before I saw the disks working constantly during the benchmarks.</p>
<p><code>pgbench -i -s 100 -U postgres -d test</code><br />
<code>pgbench -i -s 1000 -U postgres -d test</code></p>
<p>Then I ran the tests specifying 40 clients and 10,000 transactions per client in the params.  <a href="http://developer.postgresql.org/pgdocs/postgres/pgbench.html">Read more about the pgbench tool</a> for the test differences and what the transactions involve.</p>
<p>Here are the pgbench results.  Numbers are transactions/second.  Values listed are again the middle of 3 runs.  I would have liked to do these pgbench runs on more configs but I again was running out of time and these suckers took a long time to run with all the dropping and re-initializing.</p>
<table border="1" bordercolor="gray" cellpadding="3" cellspacing="0">
<tr bgcolor="#cccccc">
<th>Test#</th>
<th>Config#</th>
<th>TPC-B s=100</th>
<th>SELECT s=100</th>
<th>TPC-B s=1000</th>
<th>SELECT s=1000</th>
</tr>
<tr>
<td>4</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1580</td>
<td align="right">10885</td>
<td align="right">1224</td>
<td align="right">3655</td>
</tr>
<tr>
<td>5</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1591</td>
<td align="right">10667</td>
<td align="right">1270</td>
<td align="right">3499</td>
</tr>
<tr>
<td>6</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1567</td>
<td align="right">10647</td>
<td align="right">1267</td>
<td align="right">3048</td>
</tr>
<tr>
<td>7</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1551</td>
<td align="right">10656</td>
<td align="right">1202</td>
<td align="right">3478</td>
</tr>
<tr>
<td>8</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1553</td>
<td align="right">10644</td>
<td align="right">1209</td>
<td align="right">3386</td>
</tr>
<tr>
<td>9</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1606</td>
<td align="right">10759</td>
<td align="right">1296</td>
<td align="right">3548</td>
</tr>
<tr>
<td>19</td>
<td bgcolor="#5727ef">3</td>
<td align="right">1581</td>
<td align="right">10743</td>
<td align="right">1311</td>
<td align="right">3269</td>
</tr>
<tr>
<td>34</td>
<td bgcolor="#fdfa8e">6</td>
<td align="right">1563</td>
<td align="right">10677</td>
<td align="right">1323</td>
<td align="right">3156</td>
</tr>
</table>
<p>
&nbsp;&nbsp;
</p>
<p>Observations:</p>
<p>These pgbench tests were mostly a wash.  But that isn&#8217;t a big surprise considering I only compared the 3 best configurations that were close in the dd/bonnie++ tests.  Really wish there had been time to get some other configs in that table to see the difference.  The #9 config did well with a top 3 number in all 4 tests.</p>
<h2>Conclusion</h2>
<p>I went with setup #9.  Pure hardware raid 10.  It won almost all tests (with the striped raid 10s being the only real competitors) and pure hardware raid is super easy to configure, maintain, and monitor.</p>
<p>More generally this MD1120 performs pretty well, especially for the relatively low price.  As a quick note on price buying one of these MD1120s as configured above along with a PowerEdge 1950 with 16GB RAM and a Perc6/E card to connect them would cost less than $15k (about $540/mo for a $1 36mo lease) depending on the configuration and the deal you got.   You could probably even shave a few thousand off that number if you got a real good deal.</p>
<p>If you do buy stuff from Dell be sure to get in touch with a small business sales team.  They can offer nontrivial discounts on the price you can finagle in their online shopping cart, you get to talk to the same people every time you place an order, it gives you a contact if you have questions or run into issues with technical support, and it just generally is the way to go.</p>
<p>This MD1120+Perc6/E combo is connected to our existing DB server now and all I will say is performance is excellent.  I am seeing zero query backup, barely any IO wait reported by vmstat, and hugely impressive random IO spikes when load gets heavy (though we haven&#8217;t gotten close to maxing out so who knows how high it could go).  And, thanks to all this testing, I feel really good about having the optimal configuration for the hardware out there doing work.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/05/13/dell-md1120-perc6e-performance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Bulk Data Loading With PostgreSQL</title>
		<link>http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/</link>
		<comments>http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/#comments</comments>
		<pubDate>Sat, 18 Apr 2009 13:18:22 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=299</guid>
		<description><![CDATA[Bulk data loading is usually a niche task &#8211; something that doesn&#8217;t have to be done too often but when it does it feels painful because it takes so long and if you mess something up it has to be done again. After loading dozens of 7 and 8 digit record count files into PostgreSQL [...]]]></description>
			<content:encoded><![CDATA[<p>Bulk data loading is usually a niche task &#8211; something that doesn&#8217;t have to be done too often but when it does it feels painful because it takes so long and if you mess something up it has to be done again.  After loading dozens of 7 and 8 digit record count files into PostgreSQL databases I think it would be helpful to at least share the basic stuff that can make the process go a lot faster.  The PostgreSQL documentation of course has everything you need for bulk data loading if you would rather read the official docs <a href="http://www.postgresql.org/docs/8.3/interactive/populate.html">here</a> but I consistently hear cases where the advice there isn&#8217;t used so I wanted to show some numbers.  That link covers more techniques, all this post really adds is some numbers to back up the basics.</p>
<p>For this post I am working with 600k records being loaded into a table that looks like this.  The data plus delimiters is about 60MB.</p>
<pre>
                 Table "public.people"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 id          | character varying(36)       | not null
 first_name  | character varying(255)      |
 last_name   | character varying(255)      |
 middle_name | character varying(255)      |
 client_id   | character varying(36)       |
 birth_date  | timestamp without time zone |
Indexes:
    "people_pkey" PRIMARY KEY, btree (id)
    "idx_people_client_id" btree (client_id)
    "idx_people_name_search" gin (to_tsvector('english'::regconfig,
      (((COALESCE(first_name,
      ''::character varying)::text || ' '::text) ||
      COALESCE(middle_name, ''::character varying)::text) || ' '::text) ||
      COALESCE(last_name, ''::character varying)::text))
</pre>
<p></p>
<p></p>
<p>A few points on this particular table and config:</p>
<ul>
<li>The data is a subset of a real production table made up of entries people provided on themselves so the distribution is realistic and balanced.</li>
<li>The primary key id column is a UUID generated by code.</li>
<li>A very basic index on the client_id column.</li>
<li>A more complex expression gin index on the name fields to allow full text searching.  Full text searching was added in PostgreSQL 8.3 and it&#8217;s pretty powerful.  Read more about that <a href="http://www.postgresql.org/docs/8.3/interactive/textsearch.html">here</a> if interested.</li>
<li>Overall it is a relatively small, simple table but not trivially so and its large enough to get decent timings from bulk load operations for comparison purposes.</li>
<li>Performing tests against a clean DB, empty table, and freshly created indexes so we have a best case scenario.</li>
</ul>
<p>So, let&#8217;s say you have that empty table and want to load 600k records in.  Here is a sequence of methods going from worst to best.  I am doing a VACUUM ANALZYE before and after each operation.  All timings rounded up to nearest second.  We are comparing <a href="http://www.postgresql.org/docs/current/static/sql-insert.html">INSERT</a> and <a href="http://www.postgresql.org/docs/8.3/static/sql-copy.html">COPY</a>.</p>
<h2>First Setup: leave indexes and pkey in place.</h2>
<p><strong>INSERT</strong>: 6 minutes, 5 seconds</p>
<p><strong>COPY</strong>: 3 minutes, 11 seconds. </p>
<h2>Second Setup: drop/recreate indexes, don&#8217;t touch pkey</h2>
<p>When timed it took 1 second to drop indexes and 13 seconds to rebuild them.  I have added that overhead into these numbers.</p>
<p><strong>INSERT</strong>: 3 minutes, 19 seconds</p>
<p><strong>COPY</strong>: 1 minute, 5 seconds</p>
<h2>Third Setup: drop/recreate indexes and pkey</h2>
<p>When timed it took 1 second to drop indexes and 15 seconds to rebuild them (includes pkey).  I have added that overhead into these numbers.</p>
<p><strong>INSERT</strong>: 2 minutes, 49 seconds</p>
<p><strong>COPY</strong>: 21 seconds</p>
<h2>Conclusion</h2>
<p>With that final setup, dropping/recreating all indexes and using the COPY command, I was able to load 600k records in 21 seconds.  That includes the drop/recreate overhead.  The actual COPY command only took 5 seconds.</p>
<p>General notes related to the tests done above:</p>
<ul>
<li>The COPY command is magical.  It completely blows away INSERT for bulk load performance.</li>
<li>Dropping and recreating indexes is far faster than leaving indexes in place and forcing thousands or millions of individual updates to them.</li>
<li>Do a VACUUM ANALYZE before and after bulk loading data to be safe.  If you dropped indexes be sure to recreate them before doing the ANALYZE.</li>
<li>If you don&#8217;t think the speed improvement shown is significant keep in mind we are working with a narrow 6 column table, with just 3 indexes, and not very much data.  The savings found by dropping and recreating indexes becomes enormous as your table height and width increases and/or your number of indexes goes up.  Similarly if you are only working with thousands or low tens of thousands of rows you can just do whatever is easiest since the size is so trivial.</li>
</ul>
<p>Some other notes worth mentioning:</p>
<ul>
<li>The same rules apply for updates.  ALL indexes slow down updates, not just the ones on the column you are updating so if you need to update an entire table dropping and recreating the indexes will provide a massive speed increase.</li>
<li>If you are doing a bulk update and using a field on the updated table for lookups you are generally better off dropping all indexes <strong>except</strong> the one on your lookup column.  Say for each person in my people table I am looking up the specific person by id and setting their client_id.  I would want to leave the index/pkey on the id column so those lookups were fast.  You still pay a penalty at write time because the index/pkey must be updated but not having the index would mean a sequential scan is done on the entire table for each update.</li>
<li>Though incredibly fast the COPY command is <strong>very</strong> picky about its input.  It is an all or nothing operation and if there is even a slight issue with your data it will fail.  You will want to VACUUM your table after a failed COPY to recover space.  Thankfully if it does fail it prints out the exact line number and reason.</li>
<li>When specifying a file for the COPY command you must use an absolute path and the file (and all directories in its path) must be accessible by the postgres user because it is the server that goes and reads the file not the client.  I find it easiest to just toss my input files into /tmp.</li>
<li>Consider partitioning your data to allow drop/add of indexes on certain partitions preventing an entire table from having awful read performance while data is being loaded.  Partitioning is a whole separate topic but <a href="http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html">read the docs</a>, be sure constraint_exclusion is on in your postgresql.conf, and be sure your check constraint is in the where clause of all queries if you take this route.  Partitions and significant bulk data loading fit together really really well.</li>
<li>Seriously read the <a href="http://www.postgresql.org/docs/8.3/interactive/populate.html">PostgreSQL documentation on this stuff</a> if you are bulk loading data.  Aside from the additional SQL-level recommendations it covers the tuning options for your DB that can make a very significant difference.</li>
</ul>
<p>Finally, you may find that expression index for full text searching in my people table above looks strange.  I hope to do a post on indexing specifically that explains in detail.  In my opinion tuning database indexes is a blast and made even greater by the amazing capabilities of PostgreSQL.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Connection Pooling: PgBouncer or JDBC</title>
		<link>http://blog.gtuhl.com/2009/04/04/connection-pooling-pgbouncer-or-jdbc/</link>
		<comments>http://blog.gtuhl.com/2009/04/04/connection-pooling-pgbouncer-or-jdbc/#comments</comments>
		<pubDate>Sat, 04 Apr 2009 16:09:53 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[jdbc]]></category>
		<category><![CDATA[pgbouncer]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=276</guid>
		<description><![CDATA[Why Use Connection Pooling With recent customer additions we have had to deal with lots of scaling challenges the last few months. One puny fraction of that effort has been around tuning our database connection pools. In very short, if your database server is overwhelmed with concurrent connections it can get bogged down in context [...]]]></description>
			<content:encoded><![CDATA[<h2>Why Use Connection Pooling</h2>
<p>With recent customer additions we have had to deal with lots of scaling challenges the last few months.  One puny fraction of that effort has been around tuning our database connection pools.  In very short, if your database server is overwhelmed with concurrent connections it can get bogged down in context switching and resource/io waiting such that overall throughput or transactions per second can drop.  In the best case things just get slow.  In the worst case things completely lock up because the server just can&#8217;t service all of those connections or your client code starts throwing exceptions because the database has reached it maximum configured connection count.</p>
<p>One way to control the maximum number of connections hitting your database at once is via connection pooling.  This can ensure a consistent load that your database can dispatch quickly to keep a better overall throughput going.  You can also match your aggregate pool sizes to the maximum connection count configured in your database server to ensure your client code never asks for more than the DB can give.  The other HUGE advantage to connection pooling is that you are essentially caching the overhead of obtaining new connections.  To the database there is a set number of connections made and held open &#8211; the pooling software then distributes those pooled connections to clients that need them and when the clients are done the connection remains open and waiting to be used again.  This avoids a surprisingly high amount of overhead.</p>
<p>There is definitely a trade off.  If your application needs to have a certain number of connections simply because you have too many concurrent users it is time to tune your code/queries/indexes (<strong>HUGE</strong> gains can always be found there), push stuff into distributed caches, or buy new database hardware so you can service more connections.   There are recommendations online of a connection count as low as 2-4 per core.  We&#8217;ve settled in around 30 per core and that seems to be a sweet spot for stability and performance.  With 8 cores we can sustain well over 1000 transactions per second on a single PostgreSQL server (running on decent but not exceptional hardware).</p>
<p>We use JDBC connection pooling with <a href="http://www.mchange.com/projects/c3p0/index.html">c3p0</a>.  Each of our application servers has its own pool.  We ensure that the sum of those pool sizes matches the connections that the database allows and also equals our desired connections per core count.</p>
<p>We explored using <a href="https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer">PgBouncer</a> instead.  This differs in that the connection pooling happens on the server &#8211; clients just ask for as many connections as they need talking to PgBouncer instead of directly to PostgreSQL.</p>
<p>You cannot use both JDBC pooling and PgBouncer or other server-side pooling.  Or more correctly you can but there is no point.  Say you set PgBouncer to maintain a pool of 500 connections.  If you have JDBC pooling in the mix that just means you have to tune your individual client JDBC pools such that the sum of their pools doesn&#8217;t exceed 500.  If that is the case, why use PgBouncer?  Just use PostgreSQL and set it to handle 500 connections.  Instead you would drop JDBC pooling and just have your clients ask for whatever they wanted while PgBouncer did the pooling and connection management.</p>
<h2>Comparison</h2>
<p>In deciding to move to PgBouncer or stick with c3p0 I ran a few tests.  Here are those results.  The goal here was to compare the overhead of obtaining connections <strong>not</strong> to determine the overall best solution by throughput.  That would require significantly different and more involved tests.  We were considering a move to PgBouncer and just wanted to consider any difference in connection overhead.  These tests were done with a pool size of 50 and with client code connecting to a separate database server on the same subnet and physical switch.</p>
<p><strong>First Test</strong><br />
Ran 500 queries with each query being a simple <code>select 1</code>.  This simple query means most of the time is connection overhead so its a good test to decide between pooling solutions.</p>
<table>
<tr>
<td>c3p0 ComboPooledDataSource</td>
<td>.99 s</td>
</tr>
<tr>
<td>JDBC no pooling and no PgBouncer</td>
<td>6.41 s</td>
</tr>
<tr>
<td>JDBC no pooling but with PgBouncer</td>
<td>3.99 s</td>
</tr>
</table>
<p></p>
<p><strong>Second Test</strong><br />
Ran 500 queries where the query is selected sequentially from an array of 5 distinct queries varying from very simple to very complex.  Highly imperfect test, but trying to see how connection overhead factors into a more normal load. </p>
<table>
<tr>
<td>c3p0 ComboPooledDataSource</td>
<td>122.18 s</td>
</tr>
<tr>
<td>JDBC no pooling no PgBouncer</td>
<td>139.71 s</td>
</tr>
<tr>
<td>JDBC no pooling but with PgBouncer</td>
<td>129.81 s</td>
</tr>
</table>
<p></p>
<p>So c3p0 definitely offers the fastest solution.  Additionally PgBouncer definitely makes a difference over not using connection pooling (and thus having to obtain a new connection with each query).  This makes some sense as c3p0 is closer to the client code and keeps the pooled connections local.  With PgBouncer the clients have to hit the network to queue up with the server.</p>
<h2>Conclusion</h2>
<p>We stuck with c3p0.  But in my opinion where PgBouncer really shines is on the administrative side.  As we continue to add more servers a move to PgBouncer or something similar seems likely.  It (and <a href="http://pgpool.projects.postgresql.org/">PgPool-II</a>, another solution) offer other capabilities as well.  PgPool-II even offers built-in load balancing of reads against replicated copies of a DB. </p>
<p>Say you have 10 application servers and 1 database server configured for 500 maximum connections.  You set the individual application server pool sizes to 50.  Now you need to add a new application server.  You would have to tune all 10 of your old servers dropping their pool sizes down to make room for the new one.  This problem gets more and more severe as you add more servers.</p>
<p>Also, tuning individual pool sizes per client means there is a chance that one server is starving while others have pooled connections sitting idle &#8211; the connections aren&#8217;t shared.  Probably not a big issue if you are load balancing user connections to your application servers evenly but still a possibility.</p>
<p>As you get more and more machines the server-layer pooling with PgBouncer definitely makes more sense.  It also is obviously client language neutral so if you are running in an environment that doesn&#8217;t have mature pooling solutions its a great option.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/04/04/connection-pooling-pgbouncer-or-jdbc/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

