<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>gtuhl: startup technology &#187; sysadmin</title>
	<atom:link href="http://blog.gtuhl.com/category/sysadmin/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gtuhl.com</link>
	<description>Development, IT, Gadgets, and Startups</description>
	<lastBuildDate>Thu, 26 Aug 2010 02:04:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Zabbix 1.8 Network Monitoring</title>
		<link>http://blog.gtuhl.com/2010/05/03/zabbix-1-8-network-monitoring/</link>
		<comments>http://blog.gtuhl.com/2010/05/03/zabbix-1-8-network-monitoring/#comments</comments>
		<pubDate>Tue, 04 May 2010 02:22:30 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[books]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[zabbix]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=558</guid>
		<description><![CDATA[In the next few days the first book completely about Zabbix (that I am aware of) is being released. Entitled Zabbix 1.8 Network Monitoring it is a welcome release. In my mind books and ever increasing version numbers are a good sign for the health of Zabbix. While the existing documentation for Zabbix is pretty [...]]]></description>
			<content:encoded><![CDATA[<p>In the next few days the first book completely about Zabbix (that I am aware of) is being released.   Entitled <a href="https://www.packtpub.com/zabbix-1-8-network-monitoring/book">Zabbix 1.8 Network Monitoring</a> it is a welcome release.  In my mind books and ever increasing version numbers are a good sign for the health of Zabbix.  </p>
<p>While the existing documentation for Zabbix is pretty decent, it is presented in a very reference-focused manner.  It explains things you already know to ask about very well but doesn&#8217;t offer tips or more general context for the best approach to take in your setup.  This new book takes that core documentation (restating it perhaps a tiny amount too much) and makes it far more presentable and easier to scan, in a way that helps you pick up the little details the documentation is not good at conveying.</p>
<p>I&#8217;ve been working through most of this book the last few days and wanted to provide some thoughts both good and bad.</p>
<h2>The Good:</h2>
<ul>
<li>This is a very complete book.  The gaps in the documentation are well filled.</li>
<li>I&#8217;ve set up 1.4 and 1.6 Zabbix servers and installed a 1.8 server to work through the book with.  Even reading it very quickly, I learned several new things that I flat out wasn&#8217;t aware of or saw cleaner ways to do things (like using the built-in IPMI agent support instead of wrapping freeipmi or similar).</li>
<li>Tasks around maintaining Zabbix are well covered.  I greatly appreciate there being dedicated sections and chapters on upgrading, backups, reporting, and performance tuning of Zabbix itself.</li>
<li>While the book does start out slow with basic content, it finishes very strong and covers a lot of more advanced topics that make it obvious the author has actually used the software he is writing about.</li>
<li>The screenshots, code samples, and command line examples are complete enough that you don&#8217;t have to have a terminal open while reading.</li>
<li>I like the iterative approach used in many of the examples where the reader is shown each layer of the setup to aid with troubleshooting.</li>
<li>Zabbix releases are not radically different from one another, so this would be a good reference for any reasonably recent release (with primarily 1.6 experience, the entire book still felt completely relevant).</li>
</ul>
<h2>The Bad:</h2>
<ul>
<li>The book is pretty verbose and a nontrivial portion is already captured by the freely available documentation.</li>
<li>The Linux help early on feels out of place and is not particularly good advice.  Perhaps it is just me, but surely it would be safe to assume intermediate Linux knowledge from an audience that is setting up network monitoring.  The parts towards the front about writing init scripts and getting things installed are completely unnecessary on any reasonable Linux distribution (yum, pacman, and apt-get can all install the Zabbix server and clients with a single command).  Though, I suppose that information could be helpful if you want a non-standard config or some insight into how things work under the hood.</li>
<li>My biggest issue is the amount of background and the number of pages before what I consider the meat (triggers, actions, notifications, and charting) gets covered in depth.</li>
</ul>
<h2>Conclusion</h2>
<p>Overall I am happy to see this book released.  Despite some rough edges in the interface and documentation (this book hopefully will fill the latter gap) Zabbix is a really powerful, flexible piece of software that deserves more users.  I have used Nagios, Zenoss, Cacti, and plenty of my own bash script setups and prefer Zabbix over everything else.  </p>
<p>While this book does restate some of the official documentation, it brings deeper examples and better context, plus a lot of additional content that would be very helpful for someone getting started with Zabbix.  It is a long book that probably doesn&#8217;t make sense to read cover to cover (you probably will never need all the information in here unless you are building a gnarly monitoring install), but it would be a great learning tool and a handy long-term reference for setting up more advanced configurations.</p>
<p>There is enough good content in here to strongly recommend it to anyone learning about Zabbix, especially if you are tasked with setting up a large or complicated installation.  If you aren&#8217;t sure, give this <a href="http://blog.gtuhl.com/downloads/7689_Zabbix_SampleChapter.pdf">Sample Chapter</a> a read and see how you like the feel of the book first.  That sample chapter is actually a pretty great resource, as it covers the basic monitoring, triggers, actions, and notifications that are the core of Zabbix.</p>
<p>If you are looking to monitor your gear with any degree of depth, Zabbix really is a great tool.  The web-based monitoring options popping up are great for shallow uptime reports and basic notifications on complete outages, but Zabbix can be used to tell you a lot more about <strong>why</strong> things are failing.  At my old company I was monitoring DB TPS, Java heap and garbage collection stats, OpenMQ queue sizes, Postfix queues, end-to-end response time for users logging into their dashboards, and all sorts of other cool stuff in addition to the basics you get with the default templates.</p>
<p>Disclaimer:  I was provided a copy of this book by Packt Publishing and was happy to give it a read.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2010/05/03/zabbix-1-8-network-monitoring/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Not a Fan of MySQL</title>
		<link>http://blog.gtuhl.com/2010/04/15/not-a-fan-of-mysql/</link>
		<comments>http://blog.gtuhl.com/2010/04/15/not-a-fan-of-mysql/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 13:04:09 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=527</guid>
		<description><![CDATA[It has been a wonderfully busy 2010 thus far. The blog has suffered but hoping to have a post up soon about building a monstrous build server on a budget. That said, I have been forced (by software, not by management) to use MySQL on a couple projects in the last month and after being [...]]]></description>
			<content:encoded><![CDATA[<p>It has been a wonderfully busy 2010 thus far.  The blog has suffered but hoping to have a post up soon about building a monstrous build server on a budget.  </p>
<p>That said, I have been forced (by software, not by management) to use MySQL on a couple projects in the last month and after being a PostgreSQL user for the last several years it has been an incredibly frustrating experience worth throwing up a couple bullet points about.  The more I use MySQL the more frustrated I get.  These are high level and not well argued but hoping to get points across.  I always invite digging into details.</p>
<p><strong>Reasons why I dislike MySQL:</strong></p>
<ul>
<li>The planner is incredibly dumb.  I feel like it does the wrong thing most of the time.</li>
<li>Temporary tables and subselects are relatively worthless.  They crush performance and are full of bizarre gotchas due to limitations and bugs in the planner (like not being able to use a temp table more than once in the same query).</li>
<li>The tuning process is not at all intuitive or consistent.  Tuning MySQL queries feels like trying to maximally combine hacks and workarounds for bugs in the planner to achieve something vaguely close to desired speed.</li>
<li>The planner goes to disk a LOT.  Subtle adjustments to the query will prevent it from doing so but why can&#8217;t it do the right thing and avoid disk except as a last resort or only when something doesn&#8217;t fit in memory?  My laptop has 4GB and even with a 100MB DB the MySQL planner goes to disk all the time.</li>
<li>The documentation is superficial, incomplete, and inconsistent.  The examples are trivial and unhelpful.</li>
<li>Doing a DB dump (mysqldump with its default options) locks the ENTIRE DB.  This is awful.  I don&#8217;t want to set up replication on a trivial DB (less than, say, 100MB) just to do backups without locking things up.</li>
</ul>
<p><strong>PostgreSQL Comparison</strong><br />
This experience has really emphasized some huge advantages of PostgreSQL even ignoring the technical points:</p>
<ul>
<li>The code is of exceptionally high quality and exceedingly clean.  Corner cases are rare and bugs even rarer.</li>
<li>The documentation is complete, organized, and consistent.  You can genuinely learn 95% of what there is to know about PostgreSQL by reading the excellent documentation (that is kept updated and synchronized with each new release).  It is also very easy to find.  It feels like they spend as much time and effort on the documentation as they do the code.  Only OpenMQ documentation has rivaled PostgreSQL in completeness in recent memory.</li>
<li>The PostgreSQL planner is very solid and it makes the right decisions most of the time without any assistance from a human.  You can use the documentation to build a solid foundation of understanding about how things work and then use that plus your own intuition to achieve desired results.  If a feature is available in PostgreSQL it will be fast and fully understood by the planner.  The same cannot be said for MySQL, where it takes a lot of raw time to slowly absorb all the known bugs, workarounds, and feature-specific knowledge about what is actually usable on a nontrivial DB and what is not.</li>
</ul>
<p>Just some thoughts.  Perhaps my perception is skewed given the disparity in how long I have used each.  To be blunt, I would smile if Oracle destroyed MySQL, and on a semi-related note I believe the shenanigans of the founder/creator of MySQL around the Oracle acquisition, trying to get back something that was fairly purchased, are completely lame.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2010/04/15/not-a-fan-of-mysql/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Monitoring PSUs in Arch Linux Dell Servers</title>
		<link>http://blog.gtuhl.com/2009/08/22/monitoring-psus-in-arch-linux-dell-servers/</link>
		<comments>http://blog.gtuhl.com/2009/08/22/monitoring-psus-in-arch-linux-dell-servers/#comments</comments>
		<pubDate>Sat, 22 Aug 2009 12:27:40 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[archlinux]]></category>
		<category><![CDATA[dell]]></category>
		<category><![CDATA[ipmi]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[zabbix]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=492</guid>
		<description><![CDATA[We currently use Arch Linux exclusively for servers.  Much of our equipment comes from Dell and one of the gotchas of using a non-Redhat, non-SUSE linux distribution with their servers is you cannot just drop in their Open Manage tools to monitor everything.  As a side note, despite the bloat of Open Manage, it actually [...]]]></description>
			<content:encoded><![CDATA[<p>We currently use Arch Linux exclusively for servers.  Much of our equipment comes from Dell and one of the gotchas of using a non-Redhat, non-SUSE linux distribution with their servers is you cannot just drop in their Open Manage tools to monitor everything.  As a side note, despite the bloat of Open Manage, it actually isn&#8217;t a bad set of tools once you get it installed (on an rpm-based distribution) &#8211; the command line utilities you get with it are pretty decent.  The GUI stuff is largely worthless in my biased opinion.</p>
<p>In any case, Open Manage is a big pile of rpms with lots of dependencies so it wouldn&#8217;t be easy to transfer to Arch.  I posted awhile back about <a href="http://blog.gtuhl.com/2009/03/11/monitoring-dell-perc5-and-perc6-disks-in-arch-linux/">using LSI&#8217;s Megaraid CLI tool to monitor Dell raid arrays</a> but what about everything else?  One big item that was really haunting me was power supplies.  I had no monitoring on those things so if I went too long between datacenter visits to check for amber lights on the fronts of the servers, we could have a double PSU failure on a really important server and be in a lot of trouble.  Server PSUs fail A LOT so don&#8217;t discount the importance of monitoring them.  My personal experience might be unusual, but i&#8217;ve had more PSUs fail than hard drives.</p>
<p>Yesterday I read up on <a href="http://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface">Intelligent Platform Management Interface (IPMI)</a> and saw that Dell has supported it for awhile.  I suspect their Open Manage tools are simply a proprietary wrapper around this.</p>
<p>So you just need some other software that will let you access this stuff. Some quick searches revealed ipmitools and freeipmi being popular.  I went with <a href="http://www.gnu.org/software/freeipmi/">freeipmi</a> because it is a very active project with recent updates (the Arch Linux AUR had them both but both PKGBUILDs were broken).</p>
<p>Installing it is straight forward:  download tarball, unpack, ./configure, make, make install.  This will place a bunch of command line tools in /usr/local/sbin/.  You&#8217;ll probably want to script the install process if you have to setup lots of servers.  Or, don&#8217;t be lazy like me and bundle up a PKGBUILD for the Arch AUR and just use yaourt for the installs.</p>
<p>You can check out the README and man pages for information about all the various commands but I am just using one: ipmi-sel.  This prints out the contents of the &#8220;System Event Log&#8221; and seems to correspond very nicely with any messages that appear on the front LCD of the server.  I removed, replaced, and removed again a PSU in a server and saw this perfectly parseable and useful output:</p>
<pre>28:21-Aug-2009 09:00:36:Power Supply Status:Presence detected
29:21-Aug-2009 09:00:37:Power Supply PS Redundancy:Redundancy Lost
30:21-Aug-2009 09:04:56:Power Supply Status:Presence detected
31:21-Aug-2009 09:04:57:Power Supply PS Redundancy:Fully Redundant (formerly "Redundancy Regained")
32:21-Aug-2009 09:05:16:Power Supply Status:Presence detected
33:21-Aug-2009 09:05:17:Power Supply PS Redundancy:Redundancy Lost</pre>
<p>Other events I have seen in this log are memory DIMM failures and the case being open &#8211; and those are conveniently the only other alerts I&#8217;ve seen on the LCDs before.  You can clear out the SEL and have a few other options just with that one ipmi-sel tool &#8211; read the man page for more information.</p>
<p>For monitoring PSU failure I am using this script.  I am sure there are tighter approaches but this one works fine:</p>
<p><code>/usr/local/sbin/ipmi-sel | grep "PS Redundancy:" | tail -n1 | grep "Redundancy Lost" | wc -l</code></p>
<p>That will return a 1 if the last &#8220;PS Redundancy&#8221; related item in the log is a failure, and 0 otherwise.  You can then easily snap that into Zabbix or whatever monitoring software you prefer.  I did a <a href="http://blog.gtuhl.com/2009/02/21/monitoring-postgresql-tps-with-zabbix/">post with more detail on adding items to Zabbix</a> a while back that might be helpful if you are not familiar with the process.</p>
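<p>For anyone who would rather keep the logic in a script than in a five-stage pipe, here is a minimal Python sketch of the same check. It assumes Python 3.7+, the <code>/usr/local/sbin/ipmi-sel</code> path from above, and the colon-separated output format shown in the sample; the function names are hypothetical and the format may differ between freeipmi versions.</p>
<pre>
import subprocess

# Install path used above; adjust if freeipmi lives elsewhere (assumption).
IPMI_SEL = "/usr/local/sbin/ipmi-sel"


def psu_redundancy_lost(sel_output: str) -> int:
    """Return 1 if the newest "PS Redundancy:" event is "Redundancy Lost", else 0.

    Same logic as: ipmi-sel | grep "PS Redundancy:" | tail -n1 | grep "Redundancy Lost" | wc -l
    """
    redundancy_events = [line for line in sel_output.splitlines()
                         if "PS Redundancy:" in line]
    if not redundancy_events:
        return 0  # nothing redundancy-related has been logged
    return 1 if "Redundancy Lost" in redundancy_events[-1] else 0


def check_host() -> int:
    """Run ipmi-sel on this machine (usually needs root) and apply the check."""
    result = subprocess.run([IPMI_SEL], capture_output=True, text=True, check=True)
    return psu_redundancy_lost(result.stdout)
</pre>
<p>A wrapper that prints <code>check_host()</code> can then stand in for the shell one-liner in Zabbix.  Like the one-liner, it only looks at the newest redundancy event, so a &#8220;Fully Redundant&#8221; entry logged after a failure clears the alert.</p>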
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/08/22/monitoring-psus-in-arch-linux-dell-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Tips and Tricks</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/</link>
		<comments>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 22:00:03 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420</guid>
		<description><![CDATA[Here are a dozen tips for working with a PostgreSQL database. It is a sophisticated and powerful piece of software and just knowing a few rules of thumb before diving in can be a huge help. If you want more detail read the amazing documentation. My list of tips was very long so I just chopped [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a dozen tips for working with a PostgreSQL database.  It is a sophisticated and powerful piece of software and just knowing a few rules of thumb before diving in can be a huge help.  If you want more detail <a href="http://www.postgresql.org/docs/8.3/interactive/">read the amazing documention</a>.  My list of tips was very long so I just chopped off a dozen for this post.</p>
<h2>#1: Don&#8217;t do sequential scans &#8211; use indexes</h2>
<p>Do SELECTs against indexes.  On large tables, sequential scans will devastate your IO and in most cases should be avoided.  I read in a bogus guide somewhere that needing indexes was a sign of a bad database schema.  That is complete BS.</p>
<p>The main time indexes hurt is during bulk inserts, since every index has to be updated for every new row.  I <a href="http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/">did a post</a> a while back about dropping and recreating indexes to temporarily boost bulk insert performance.</p>
<p>Most of the time it doesn&#8217;t matter though.  Throw down as many indexes as you need.</p>
<h2>#2: Index all foreign keys</h2>
<p>There are certain rare cases where this doesn&#8217;t help, but as your database grows in size this becomes increasingly important.  Say you have a &#8220;people&#8221; table.  You then have an &#8220;emails&#8221; table with a person_id column that is a FK to the people table.  As an example say you have 10 million rows in the emails table.</p>
<p>Now delete a row from that people table.  PostgreSQL must scan the emails table and ensure that no email rows reference the person row you just deleted.  If that person_id column on emails is not indexed you just kicked off a sequential scan of 10 million rows with your deletion attempt.  Index that column and the pain goes away.  My general rule of thumb is to index every single FK.  You can always drop the indexes later if they aren&#8217;t needed.</p>
<p>Here&#8217;s a handy query for discovering all the FKs pointed at a table in a database.  You can then shoot through the results and verify that each table has indexes for those FKs.</p>
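<pre>
-- kcu gives the referencing (FK) column, ccu the column it points at
select tc.constraint_name, tc.table_name, kcu.column_name,
   ccu.table_name as referenced_table, ccu.column_name as referenced_column
from information_schema.table_constraints tc
   join information_schema.key_column_usage kcu
      on tc.constraint_schema = kcu.constraint_schema
      and tc.constraint_name = kcu.constraint_name
   join information_schema.constraint_column_usage ccu
      on tc.constraint_schema = ccu.constraint_schema
      and tc.constraint_name = ccu.constraint_name
where tc.constraint_type = 'FOREIGN KEY'
   and ccu.table_name = 'MY_TABLE_NAME';
</pre>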
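<p>Checking each FK against the existing indexes by hand gets tedious on a big schema.  Below is a minimal Python sketch of that comparison.  It assumes you have already pulled the (referencing table, referencing column) pairs, for example from <code>information_schema.key_column_usage</code>, along with the column lists of each table&#8217;s indexes; the input format, function names, and <code>idx_TABLE_COLUMN</code> naming are hypothetical.  It only handles single-column FKs and treats an FK as covered when it is the leading column of some index (see #7).</p>
<pre>
from typing import Dict, List, Tuple

# (referencing table, referencing column), e.g. ("emails", "person_id")
ForeignKey = Tuple[str, str]
# table -> its indexes, each as an ordered list of column names (hypothetical format)
IndexMap = Dict[str, List[List[str]]]


def unindexed_foreign_keys(fks: List[ForeignKey], indexes: IndexMap) -> List[ForeignKey]:
    """Return FK columns that are not the leading column of any index on their table."""
    missing = []
    for table, column in fks:
        leading_columns = {cols[0] for cols in indexes.get(table, []) if cols}
        if column not in leading_columns:
            missing.append((table, column))
    return missing


def create_index_statements(missing: List[ForeignKey]) -> List[str]:
    """Build CREATE INDEX statements using an idx_TABLE_COLUMN naming convention."""
    return [f"create index idx_{t}_{c} on {t} ({c});" for t, c in missing]
</pre>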
<h2>#3: Don&#8217;t be afraid to selectively de-normalize</h2>
<p>It is only a matter of time before you realize the <a href="http://en.wikipedia.org/wiki/Database_normalization#Normal_forms">normal forms</a> you learned about in school can seriously hurt your database performance as size gets huge.  De-normalize (usually meaning duplicate a value in two tables) when it provides a big performance increase.  You must be mindful of the maintenance concerns with de-normalization.  The maintenance can be done in code, with triggers or similar in the DB, or sometimes the data is static enough that it isn&#8217;t a huge concern.</p>
<p><strong>Edit:</strong> I hedged this advice from the original post (originally it was a blanket &#8220;de-normalize all the time&#8221; that could burn someone new to databases).  You should know the normal forms and understand them.  My point was that you shouldn&#8217;t feel that wavering from them with de-normalization makes your database a bad one &#8211; it is simply a reality of nontrivial production databases.  But if you don&#8217;t understand them, and/or don&#8217;t understand when and why de-normalization can help, blindly ignoring the normal forms is only going to get you into more trouble.  So learn them and understand them, but then don&#8217;t be afraid to selectively de-normalize where it can give you big boosts in performance.  A common and often very helpful scenario is de-normalizing to eliminate a join or multiple reads.</p>
<h2>#4: Avoid joins on huge tables</h2>
<p>As your tables get bigger and bigger (many millions of rows), joins often cannot be done in memory (the hash or sort steps outgrow <code>work_mem</code>) and spill over to disk.  Once that happens your performance is gone.  For smaller tables it doesn&#8217;t matter, but as tables get bigger avoid joining when you can.  Try using unions, subselects, or denormalization instead.  If you have two big tables with a 1-to-1 relationship that you join a lot, combine them into one wide table.</p>
<h2>#5: Don&#8217;t do unanchored text searches or full text searching</h2>
<p>That is don&#8217;t try something like this with wildcards on both ends:<br />
<code>select * from people where first_name like '%joe%'</code></p>
<p>The reason is that no index can be used so PostgreSQL will be doing a sequential scan no matter what.  If you truly need full text searching use Lucene or PostgreSQL 8.3+ full text searching capabilities instead of standard indexes.</p>
<p>If you are in a pinch and can get by with the left side being anchored you can create a special index using varchar_pattern_ops like this:</p>
<p><code>create index idx_people_first_name on people (first_name varchar_pattern_ops);</code></p>
<p>That index can be used by this query (no wildcard on left side):<br />
<code>select * from people where first_name like 'joe%';</code></p>
<p>But be aware that (at least in 8.3) that same index cannot be used by a query using the standard &#8220;=&#8221; operator, so you have to create two indexes to get both (one with varchar_pattern_ops, one without).  Yes, this is messy, and it&#8217;s why you should use Lucene, PostgreSQL 8.3+ full text searching, or something similar to do text searching.</p>
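<p>Why the left anchor matters: a pattern like <code>'joe%'</code> can be turned into a range (everything from <code>'joe'</code> up to but not including <code>'jof'</code>) that an ordered index can walk directly, while <code>'%joe%'</code> gives no range at all.  The Python sketch below imitates this with a sorted list and binary search.  It is an analogy for what a btree index does, not PostgreSQL code, and the function names are hypothetical; it compares strings by code point, roughly like the pattern_ops operator classes, which compare character by character instead of using locale collation rules.</p>
<pre>
import bisect
from typing import List


def prefix_upper_bound(prefix: str) -> str:
    """First string that sorts after every string starting with prefix ('joe' gives 'jof').

    Assumes a non-empty prefix whose last character is not the maximum code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def anchored_search(sorted_names: List[str], prefix: str) -> List[str]:
    """Like name LIKE 'prefix%': binary search for one contiguous range, no full scan."""
    lo = bisect.bisect_left(sorted_names, prefix)
    hi = bisect.bisect_left(sorted_names, prefix_upper_bound(prefix))
    return sorted_names[lo:hi]


def unanchored_search(names: List[str], fragment: str) -> List[str]:
    """Like name LIKE '%fragment%': every entry must be checked (a sequential scan)."""
    return [n for n in names if fragment in n]
</pre>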
<h2>#6: Use the COPY command for bulk data loading</h2>
<p>It is orders of magnitude faster.  <a href="http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/">Check a previous post of mine</a> if you want numbers.</p>
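<p>By default COPY reads a simple text format: one line per row, tab-separated columns, <code>\N</code> for NULL, and backslash escapes for backslashes, tabs, newlines, and carriage returns inside values.  If you are generating the load file yourself, a minimal Python sketch of writing that format might look like this (function names are hypothetical; it assumes the default text format and delimiter, not CSV):</p>
<pre>
from typing import Iterable, Optional, Sequence

# Backslash escapes used by COPY's text format for characters inside a value.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def copy_text_line(row: Sequence[Optional[object]]) -> str:
    """Encode one row: tab-separated fields, \\N for NULL, newline-terminated."""
    fields = []
    for value in row:
        if value is None:
            fields.append("\\N")
        else:
            # Escape character by character so backslashes are never double-escaped.
            fields.append("".join(_ESCAPES.get(ch, ch) for ch in str(value)))
    return "\t".join(fields) + "\n"


def copy_text(rows: Iterable[Sequence[Optional[object]]]) -> str:
    """Encode many rows into one COPY-ready string."""
    return "".join(copy_text_line(row) for row in rows)
</pre>
<p>The output can then be fed to something like <code>COPY people (first_name, last_name) FROM STDIN</code> through psql.</p>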
<h2>#7: Use multi-column indexes</h2>
<p>For common queries with compound WHERE clauses create a multi-column index for maximum performance.  An example:<br />
<code>create index idx_people_full_name on people (first_name, last_name);</code></p>
<p>Now you can do queries like this and hit that index:<br />
<code>select * from people where first_name = 'joe' and last_name= 'smith';</code></p>
<p>Additionally, any WHERE clause using just first_name can take full advantage of this index (because first_name is the first column) so don&#8217;t waste disk space creating redundant single column indexes if you are already covered by a multi-column.</p>
<p>Finally, you can align a multi-column index with a common ORDER BY clause so rows come back from the index already sorted.  Using the above index, a query against people ending in <code>order by first_name, last_name</code> can skip the separate sort step and be super fast.</p>
<p>Note that PostgreSQL can combine multiple single-column indexes, so multi-column indexes on every combination are not necessary.  But a single multi-column index is usually faster, so they are worth it for queries you run a lot.  You can read more about combining multiple indexes <a href="http://www.postgresql.org/docs/8.2/interactive/indexes-bitmap-scans.html">here</a>.</p>
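<p>One way to picture the leading-column rule: a multi-column index is conceptually the rows sorted by <code>(first_name, last_name)</code>.  The Python sketch below is an analogy with hypothetical names, not how PostgreSQL lays out a btree.  A lookup on <code>first_name</code> alone lands in one contiguous run of the sorted list, while a lookup on <code>last_name</code> alone has to look at every entry.  The list is also already in <code>order by first_name, last_name</code> order, which is why that ORDER BY can be served straight from the index.</p>
<pre>
import bisect
from typing import List, Tuple

# Hypothetical row shape: (first_name, last_name)
Row = Tuple[str, str]


def build_index(rows: List[Row]) -> List[Row]:
    """Model the index as the rows sorted by (first_name, last_name)."""
    return sorted(rows)


def lookup_first_name(index: List[Row], first: str) -> List[Row]:
    """Leading column only: binary search to the start of one contiguous run."""
    start = bisect.bisect_left(index, (first,))
    end = start
    while end != len(index) and index[end][0] == first:
        end += 1
    return index[start:end]


def lookup_last_name(index: List[Row], last: str) -> List[Row]:
    """Second column only: matches are scattered, so every entry is checked."""
    return [row for row in index if row[1] == last]
</pre>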
<h2>#8: Don&#8217;t bother indexing evenly distributed booleans or enum columns</h2>
<p>In PostgreSQL an index stores pointers to the table rows rather than the rows themselves, so after scanning the index a random read must be done for each matching record to fetch it.  Random IO is a lot more expensive than sequential IO (though SSDs are changing that), so if the PostgreSQL planner (which samples your data to gather statistics) knows you have a very evenly distributed column (a boolean split 50/50, for instance), it won&#8217;t use the index at all.  It will instead assume that a straight sequential scan will be faster than scanning the index for half the rows and doing a ton of random IO.  Indexes on these types of columns waste disk space and slow down your inserts for no benefit.</p>
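<p>To see why a 50/50 boolean loses, here is a deliberately crude cost model in Python.  It borrows PostgreSQL&#8217;s default <code>seq_page_cost</code> (1.0) and <code>random_page_cost</code> (4.0) settings, but charges the index path one random page read per matching row and ignores everything else the real planner weighs (CPU costs, caching, physical ordering, bitmap scans), so treat the numbers as illustration only.  The function names are hypothetical.</p>
<pre>
SEQ_PAGE_COST = 1.0     # PostgreSQL's default seq_page_cost setting
RANDOM_PAGE_COST = 4.0  # PostgreSQL's default random_page_cost setting


def seq_scan_cost(table_pages: int) -> float:
    """Read every page of the table once, in order."""
    return table_pages * SEQ_PAGE_COST


def index_scan_cost(table_rows: int, selectivity: float) -> float:
    """Crude: one random page fetch per matching row (index pages, CPU, caching ignored)."""
    return table_rows * selectivity * RANDOM_PAGE_COST


def break_even_selectivity(table_rows: int, table_pages: int) -> float:
    """Fraction of matching rows at which both plans cost the same in this model."""
    return seq_scan_cost(table_pages) / (table_rows * RANDOM_PAGE_COST)
</pre>
<p>For a made-up table of 1,000,000 rows in 10,000 pages, <code>break_even_selectivity(1_000_000, 10_000)</code> comes out to 0.0025: under this model the index only pays off when fewer than about a quarter of a percent of rows match, nowhere near the 50% a balanced boolean gives you.</p>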
<h2>#9: Use EXPLAIN ANALYZE to benchmark and compare queries</h2>
<p>Using EXPLAIN on its own will give you the planner&#8217;s estimate of how it will run your query.  This is pretty useful if you have a very expensive query and you want to tune it without having to actually run it after each tweak.  The complete EXPLAIN ANALYZE actually runs the query and shows the predicted plan alongside what actually happened and how long each step took (so wrap INSERT, UPDATE, or DELETE statements in a transaction you roll back).</p>
<p>Be sure to run ANALYZE after significant table modifications so that the planner&#8217;s statistics about your data (and about any new expression indexes) are up to date.</p>
<h2>#10: Use expression indexes &#8211; they are awesome</h2>
<p>In PostgreSQL you can index an expression.  Here is a simple example:<br />
<code>create index idx_people_lower_first_name on people (lower(first_name));</code></p>
<p>Now you can run queries like this and use the index &#8211; no need to create a separate column with lowercase values:<br />
<code>select * from people where lower(first_name) = 'joe';</code></p>
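<p>You can similarly index combinations of your data-containing columns.  Note the extra set of parentheses: PostgreSQL requires them around any index expression that is not a single function call.</p>
<p><code>create index idx_people_lower_full_name on people ((lower(coalesce(first_name, '')) || lower(coalesce(last_name, ''))));</code></p>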
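<p>The <code>coalesce</code> calls are there because in PostgreSQL <code>text || NULL</code> is NULL, so without them a missing last name would produce a NULL index entry rather than just the lowercased first name.  The Python sketch below mirrors the expression&#8217;s NULL handling (helper names are hypothetical, and Python&#8217;s <code>lower()</code> may differ from PostgreSQL&#8217;s for some non-ASCII text).  Keep in mind that a query only uses an expression index when it repeats the same expression, and that concatenating without a separator gives &#8220;JO&#8221; + &#8220;ESMITH&#8221; and &#8220;Joe&#8221; + &#8220;Smith&#8221; the same key.</p>
<pre>
from typing import Optional


def sql_lower(value: Optional[str]) -> Optional[str]:
    """lower(NULL) is NULL."""
    return None if value is None else value.lower()


def sql_coalesce(value: Optional[str], default: str) -> str:
    """coalesce(value, default) for a single fallback."""
    return default if value is None else value


def sql_concat(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """The || operator: if either side is NULL the result is NULL."""
    return None if a is None or b is None else a + b


def key_without_coalesce(first: Optional[str], last: Optional[str]) -> Optional[str]:
    # lower(first_name) || lower(last_name)
    return sql_concat(sql_lower(first), sql_lower(last))


def key_with_coalesce(first: Optional[str], last: Optional[str]) -> Optional[str]:
    # lower(coalesce(first_name, '')) || lower(coalesce(last_name, ''))
    return sql_concat(sql_lower(sql_coalesce(first, "")), sql_lower(sql_coalesce(last, "")))
</pre>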
<h2>#11: Indexes can have WHERE clauses</h2>
<p>You can create conditional indexes that only apply for certain rows.  A useful scenario is creating a unique constraint on a column or set of columns only where certain conditions apply:</p>
<p><code>create unique index idx_people_person_id on people (person_id) where deleted = false;</code></p>
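<p>That&#8217;s a simple example that feels a bit cleaner than changing the ids of deleted records to avoid conflicting with a unique constraint.  The WHERE clause can reference any columns of the table, not just the indexed ones, but it may only use immutable functions and cannot contain subqueries or aggregates.  Also note that the planner only uses a partial index for queries whose WHERE clause implies the index&#8217;s condition.</p>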
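<p>Conceptually, rows that fail the index&#8217;s WHERE clause never enter the index, so they cannot collide with anything.  A minimal Python sketch of the uniqueness rule for the example above (the <code>(person_id, deleted)</code> tuples and the function name are hypothetical):</p>
<pre>
from typing import Dict, List, Tuple

# Hypothetical row shape: (person_id, deleted)
Person = Tuple[int, bool]


def duplicate_active_ids(rows: List[Person]) -> List[int]:
    """Emulate a unique index on person_id WHERE deleted = false.

    Rows failing the WHERE clause are not indexed, so only duplicates among
    non-deleted rows would violate the constraint.
    """
    counts: Dict[int, int] = {}
    for person_id, deleted in rows:
        if deleted:
            continue  # outside the partial index, never checked for uniqueness
        counts[person_id] = counts.get(person_id, 0) + 1
    return sorted(pid for pid, n in counts.items() if n > 1)
</pre>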
<h2>#12: Don&#8217;t use the database if it isn&#8217;t a good fit</h2>
<p>Databases only scale so far alone.  At some point you are going to have to explore memcached, terracotta, or something similar to continue to scale.  The NoSQL (over)excitement going on right now offers alternatives as well for certain scenarios.  The main point is that at some point your database will just be one component in a set of services and technologies that provide the persistence layer for your application.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Why does a Windows Server OS exist?</title>
		<link>http://blog.gtuhl.com/2009/08/03/why-does-a-windows-server-os-exist/</link>
		<comments>http://blog.gtuhl.com/2009/08/03/why-does-a-windows-server-os-exist/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 21:35:06 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[archlinux]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=408</guid>
		<description><![CDATA[Awhile back I had to fight again with the single Windows server at WeTheCitizens to get a Blackberry reset. The RIM server software only runs on Windows, else that final machine would be running Arch Linux like everything else. It was a nightmare to get working &#8211; the RIM software has literally a half dozen [...]]]></description>
			<content:encoded><![CDATA[<p>Awhile back I had to fight again with the single Windows server at <a href="http://www.wildfireplatform.com/">WeTheCitizens</a> to get a Blackberry reset.  The RIM server software only runs on Windows, else that final machine would be running <a href="http://www.archlinux.org/">Arch Linux</a> like everything else.</p>
<p>It was a nightmare to get working &#8211; the RIM software has literally a half dozen separate flaky services all running at the same time and tripping on one another that have to align perfectly.  The worst part though was having to work with a Window server.</p>
<p>That leads me to my question: why does anyone use Windows on servers any more?  How is it manageable at all?  For any Windows admins out there, would love to get some perspective on why people use it.</p>
<ul>
<li>How do you manage updates?  Especially with lots of machines, it seems borderline impossible to keep them all updated without downtime considering the mandatory restart needed for <strong>everything</strong>.  Linux only needs a restart for some kernel modifications (i.e. barely ever).  I have machines in production with 700 days of uptime.</li>
<li>What do you do for remote access?  Does an admin really have to login with a GUI remote desktop to each server?  Is there a VPN endpoint that has to be used or are all the servers left open?  How are bulk operations configured and run in that kind of mess?  With Linux I can whip up a bash script in minutes to do whatever I need done to as many machines as I wish.</li>
<li>Is there a fast approach to setting up new servers on a variety of hardware?  I can install Arch Linux in less than 10 minutes and I am not even doing anything advanced like PXE boots or disk imaging.</li>
<li>Windows fails in large environments.  The latest big example is the London Stock Exchange <a href="http://blogs.computerworld.com/london_stock_exchange_to_abandon_failed_windows_platform">completely abandoning Windows</a> after a substantial investment of money and time to set it up on that platform.  As an older example, I know that for a while (maybe even still today) <a href="http://www.theregister.co.uk/2001/12/12/microsoft_hotmail_still_runs/">Hotmail ran on FreeBSD</a> after Microsoft purchased it because they couldn&#8217;t make it stable on Windows.</li>
<li>Aside from the second rate Microsoft products that are decisively outperformed by Linux alternatives (IIS, MSSQL, Exchange) is there any decent server or DB software that doesn&#8217;t run better on Linux than on Windows?  I am thinking things like Apache, PostgreSQL, Tomcat (and the Java VM in general), Mongrel, JBoss, and innumerable others.</li>
</ul>
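<p>As a rough illustration of the bulk-operations point above, here is a minimal sketch of fanning one command out to many machines over SSH.  The post talks about bash.  This sketch uses Python and assumes key-based SSH login is already set up.  The host names and the <code>root</code> user are hypothetical.</p>

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(host, command, user="root"):
    """Build the ssh argv that runs `command` on `host` without prompting."""
    # BatchMode=yes makes ssh fail fast instead of waiting on a password prompt.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", command]


def run_everywhere(hosts, command, runner=subprocess.run, workers=10):
    """Run `command` on every host in parallel and return {host: exit status}."""
    def run_one(host):
        result = runner(ssh_argv(host, command), capture_output=True, text=True)
        return host, result.returncode

    # A small thread pool is enough: each worker just waits on an ssh process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_one, hosts))
```

<p>Usage would look like <code>run_everywhere(["web1", "web2", "db1"], "uptime")</code>.  The <code>runner</code> parameter is there so the loop can be dry-run with a stand-in for <code>subprocess.run</code>.</p>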
<p>This is really just my frustrated attempt at understanding why people put themselves through the utter pain of running Windows servers.  Why would you not use Linux?  As long as there are people using Windows, crapware like Blackberry Server has no incentive to port to Linux, and that means I have to suffer at work.  We&#8217;re at a 4:1 ratio of iPhones to BBs, so maybe the best option is to switch those last BB users over to a better phone.</p>
<p>If you are starting a company, don&#8217;t even bother with Windows on the server.  Microsoft&#8217;s new BizSpark program may seem like a good idea on the surface, but I view it as a trap.  You get to use an inferior product for a few years and then must either pay through the nose at the end or rebuild everything with the better technology you should have used from the beginning.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/08/03/why-does-a-windows-server-os-exist/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Dell MD1120 + Perc6/E Performance</title>
		<link>http://blog.gtuhl.com/2009/05/13/dell-md1120-perc6e-performance/</link>
		<comments>http://blog.gtuhl.com/2009/05/13/dell-md1120-perc6e-performance/#comments</comments>
		<pubDate>Wed, 13 May 2009 12:33:44 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[dell]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[raid]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=318</guid>
		<description><![CDATA[The Hardware We recently ordered one of Dell&#8217;s MD1120 units and a Perc 6/E raid card with 512MB battery-backed cache to beef up our production database. Dell&#8217;s raid controllers are rebranded models manufactured by other companies and they have been hit or miss. They&#8217;ve done some horrible things (including advertising raid1-concatenated as raid10 a long [...]]]></description>
			<content:encoded><![CDATA[<h2>The Hardware</h2>
<p>We recently ordered one of Dell&#8217;s <a href="http://www1.ap.dell.com/content/products/productdetails.aspx/storage_powervault_md1120?c=my&#038;l=en&#038;s=bsd">MD1120</a> units and a Perc 6/E raid card with 512MB battery-backed cache to beef up our production database.</p>
<p>Dell&#8217;s raid controllers are rebranded models manufactured by other companies, and they have been hit or miss.  They&#8217;ve done some horrible things (including advertising raid1-concatenated as raid10 a long while back), but my impression from reading online and from my own benchmarking is that these Perc6 cards are decent (but not exceptional).  You still get the lock-in aspect &#8211; Dell won&#8217;t support your machine if there is a non-Dell raid card in it, and the MD1xxx units supposedly only connect to Perc5 and Perc6 cards.</p>
<p>The MD1120 itself is a pretty cool unit.  It&#8217;s only 2U and packs a lot of drives.  We ordered one with 24x 73GB 15K SAS drives.  No SSDs: I am amazed by SSD numbers but figure we can wait a few more years before shelling out the cash to fill an array with them, and I want more data on their reliability in a 24&#215;7 high-IO server environment.  Here&#8217;s a picture of the new guy.</p>
<p><center><br />
<img src="http://blog.gtuhl.com/wp-content/md1120.jpg" width="550px"/><br />
</center></p>
<p>The next time we have the need and budget to buy a new database server from the ground up I plan to go whitebox, but in this case we were looking for a relatively inexpensive way to get more capacity out of our existing Dell server, and this seemed like a good option.  I did have to ignore their storage tech guy, who wanted me to buy a Gigabit SAN unit and screw our performance.  So ignore the tech guys who are part of the sales process and do your own research and benchmarking.</p>
<p>After benchmarking this Perc6+MD1120 combination extensively and putting it in production, I am reasonably happy with its performance.  I am sharing the numbers here because data on these things is sometimes hard to track down.</p>
<h2>The Benchmark</h2>
<p>A bunch of notes about the testing environment and configurations for anyone interested.  If you just want the numbers, skip past these.</p>
<ul>
<li>Perc 6/E upgraded to latest 6.2.0-0013 firmware and connected to a new PowerEdge 1950 with 2x Xeon E5410s and 8GB RAM.</li>
<li>MD1120 connected directly to Perc 6/E.</li>
<li>All hardware raid configured with 64KB stripes, write back enabled, and read ahead disabled (Dell hardware read ahead isn&#8217;t good).</li>
<li>Server running the latest openSUSE.  I did this purely to make it easier to upgrade firmware, get Dell support, etc.  If you call Dell while using openSUSE, just lie and say you are running SUSE 10 &#8211; everything will work and they will never know the difference.</li>
<li>All tests were run 3 times and the middle run was recorded.</li>
<li>xfs mount options were just <code>noatime</code> and ext3 mount options were <code>noatime,data=writeback</code>.</li>
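<li>xfs file system params were <code>-b size=4096 -d su=64k,sw=X</code>, where X was the number of data-bearing disks in the configuration involved (12 for the 24-disk raid10).  ext3 params were <code>-b 4096 -E stride=16,stripe-width=192</code>: stride is the 64KB stripe divided by the 4KB block size, and stripe-width is stride &#215; 12 data disks.</li>
<li>dd params were <code>bs=8k count=2000000</code>, producing a file of roughly 16GB (about twice the 8GB of RAM) to bypass the OS cache.</li>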
<li>The bonnie++ random seeks/second is the most important number for DB performance.</li>
<li>I did a ton of tests with the first configuration and then settled into a groove of testing only the bits that seemed to matter, hence the uneven distribution of tests across configs.</li>
</ul>
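<p>The mkfs parameters above follow from the array geometry.  The sketch below is a hypothetical helper, not part of the original setup.  It derives the xfs <code>su</code>/<code>sw</code> values and the ext3 <code>stride</code>/<code>stripe-width</code> values from the per-disk stripe size and the number of data-bearing disks, and it also works out the size of the dd test file.  It covers single-level arrays only.  The raid60 span count is an input because the post does not say how that array was split.</p>

```python
def data_disks(level, disks, spans=1):
    """Number of data-bearing disks (mirror copies and parity excluded)."""
    if level == "raid0":
        return disks
    if level in ("raid1", "raid10"):
        return disks // 2
    if level == "raid6":
        return disks - 2
    if level == "raid60":
        # raid60 stripes `spans` raid6 groups; each group gives up 2 disks to parity.
        return disks - 2 * spans
    raise ValueError(f"unsupported level: {level}")


def xfs_opts(stripe_kb, level, disks, spans=1):
    """mkfs.xfs data options: su = per-disk stripe unit, sw = number of data disks."""
    return f"-b size=4096 -d su={stripe_kb}k,sw={data_disks(level, disks, spans)}"


def ext3_opts(stripe_kb, level, disks, spans=1, block_kb=4):
    """mkfs.ext3 -E options: stride in filesystem blocks, stripe-width = stride * data disks."""
    stride = stripe_kb // block_kb
    width = stride * data_disks(level, disks, spans)
    return f"-b {block_kb * 1024} -E stride={stride},stripe-width={width}"


def dd_file_bytes(bs_bytes=8192, count=2_000_000):
    """Size of the dd test file (dd's bs=8k means 8192 bytes)."""
    return bs_bytes * count
```

<p>For the 24-disk raid10 this reproduces the ext3 options listed above (<code>stride=16,stripe-width=192</code>).  The dd file works out to 16,384,000,000 bytes, about 16.4GB.  That is roughly twice the 8GB of RAM, or slightly under twice if RAM is counted in GiB.  Either way, it is far more than the page cache can hold.</p>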
<h2>The dd and bonnie++ 1.02 results on opensuse</h2>
<p>The distinct raid configurations are color coded and numbered.  The winning individual tests are bolded.</p>
<table border="1" bordercolor="gray" cellpadding="3" cellspacing="0">
<tr bgcolor="#cccccc">
<th colspan="2">Record#</th>
<th colspan="3" align="center">Raid Level</th>
<th colspan="3" align="center">Linux Params</th>
<th colspan="3" align="center">Results</th>
</tr>
<tr bgcolor="#cccccc">
<th>Test</th>
<th>Config</th>
<th>HW</th>
<th>SW</th>
<th>Total</th>
<th>File System</th>
<th>Read Ahead (sectors)</th>
<th>Sched.</th>
<th>dd Write MB/s</th>
<th>dd Read MB/s</th>
<th>bonnie++ seeks/s</th>
</tr>
<tr>
<td>1</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>540</td>
<td>519</td>
<td>787.4</td>
</tr>
<tr>
<td>2</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>noop</td>
<td>471</td>
<td>439</td>
<td>811.6</td>
</tr>
<tr>
<td>3</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>deadline</td>
<td>494</td>
<td>429</td>
<td>812.1</td>
</tr>
<tr>
<td>4</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>544</td>
<td>836</td>
<td>802.7</td>
</tr>
<tr>
<td>5</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>noop</td>
<td>474</td>
<td>837</td>
<td>809.4</td>
</tr>
<tr>
<td>6</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>deadline</td>
<td>492</td>
<td>791</td>
<td>808.4</td>
</tr>
<tr>
<td>7</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>533</td>
<td>853</td>
<td>805.9</td>
</tr>
<tr>
<td>8</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>536</td>
<td>976</td>
<td>806.3</td>
</tr>
<tr>
<td><strong>9</strong></td>
<td bgcolor="#a8d255"><strong>1</strong></td>
<td><strong>24disk&nbsp;raid10</strong></td>
<td><strong>None</strong></td>
<td><strong>10</strong></td>
<td><strong>xfs</strong></td>
<td><strong>32768</strong></td>
<td><strong>cfq</strong></td>
<td><strong>543</strong></td>
<td><strong>1035</strong></td>
<td><strong>808.6</strong></td>
</tr>
<tr>
<td>10</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>32768</td>
<td>cfq</td>
<td>332</td>
<td>602</td>
<td>695.2</td>
</tr>
<tr>
<td>11</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>4096</td>
<td>cfq</td>
<td>339</td>
<td>929</td>
<td>743</td>
</tr>
<tr>
<td>12</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>4096</td>
<td>noop</td>
<td>356</td>
<td>925</td>
<td>765.4</td>
</tr>
<tr>
<td>13</td>
<td bgcolor="#a8d255">1</td>
<td>24disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>ext3</td>
<td>4096</td>
<td>deadline</td>
<td>342</td>
<td>909</td>
<td>712.9</td>
</tr>
<tr>
<td>14</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>566</td>
<td>572</td>
<td>780.4</td>
</tr>
<tr>
<td>15</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>noop</td>
<td>561</td>
<td>567</td>
<td>788.9</td>
</tr>
<tr>
<td>16</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>deadline</td>
<td>552</td>
<td>571</td>
<td>786.6</td>
</tr>
<tr>
<td>17</td>
<td bgcolor="#efc800">2</td>
<td>12disk&nbsp;raid10</td>
<td>None</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>566</td>
<td>623</td>
<td>778</td>
</tr>
<tr>
<td>18</td>
<td bgcolor="#5727ef">3</td>
<td>2x12disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>560</td>
<td>507</td>
<td>535.9</td>
</tr>
<tr>
<td><strong>19</strong></td>
<td bgcolor="#5727ef"><strong>3</strong></td>
<td><strong>2x12disk&nbsp;raid10</strong></td>
<td><strong>raid0</strong></td>
<td><strong>100</strong></td>
<td><strong>xfs</strong></td>
<td><strong>4096</strong></td>
<td><strong>cfq</strong></td>
<td><strong>560</strong></td>
<td><strong>955</strong></td>
<td><strong>816</strong></td>
</tr>
<tr>
<td>20</td>
<td bgcolor="#5727ef">3</td>
<td>2x12disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>558</td>
<td>857</td>
<td>817.5</td>
</tr>
<tr>
<td>21</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>436</td>
<td>478</td>
<td>415.9</td>
</tr>
<tr>
<td>22</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>440</td>
<td>1038</td>
<td>666</td>
</tr>
<tr>
<td>23</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>437</td>
<td>1054</td>
<td>670</td>
</tr>
<tr>
<td>24</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>8192</td>
<td>noop</td>
<td>434</td>
<td>1058</td>
<td>651.7</td>
</tr>
<tr>
<td>25</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>8192</td>
<td>deadline</td>
<td>435</td>
<td>1044</td>
<td>666.1</td>
</tr>
<tr>
<td>26</td>
<td bgcolor="#bd550c">4</td>
<td>24disk&nbsp;raid6</td>
<td>None</td>
<td>6</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>437</td>
<td>1083</td>
<td>667.3</td>
</tr>
<tr>
<td>27</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>424</td>
<td>391</td>
<td>670.2</td>
</tr>
<tr>
<td>28</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>426</td>
<td>1038</td>
<td>669.6</td>
</tr>
<tr>
<td>29</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>424</td>
<td>1052</td>
<td>669.9</td>
</tr>
<tr>
<td>30</td>
<td bgcolor="#6f5900">5</td>
<td>24disk&nbsp;raid60</td>
<td>None</td>
<td>60</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>424</td>
<td>1082</td>
<td>657.5</td>
</tr>
<tr>
<td>31</td>
<td bgcolor="#fdfa8e">6</td>
<td>3x8disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>557</td>
<td>530</td>
<td>621.4</td>
</tr>
<tr>
<td>32</td>
<td bgcolor="#fdfa8e">6</td>
<td>3x8disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>555</td>
<td>936</td>
<td>820.6</td>
</tr>
<tr>
<td>33</td>
<td bgcolor="#fdfa8e">6</td>
<td>3x8disk&nbsp;raid10</td>
<td>raid0</td>
<td>100</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>560</td>
<td>902</td>
<td>817.7</td>
</tr>
<tr>
<td><strong>34</strong></td>
<td bgcolor="#fdfa8e"><strong>6</strong></td>
<td><strong>3x8disk&nbsp;raid10</strong></td>
<td><strong>raid0</strong></td>
<td><strong>100</strong></td>
<td><strong>xfs</strong></td>
<td><strong>16384</strong></td>
<td><strong>cfq</strong></td>
<td><strong>555</strong></td>
<td><strong>1041</strong></td>
<td><strong>815.5</strong></td>
</tr>
<tr>
<td>35</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>367</td>
<td>573</td>
<td>817.5</td>
</tr>
<tr>
<td>36</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>360</td>
<td>964</td>
<td>814.7</td>
</tr>
<tr>
<td>37</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>358</td>
<td>994</td>
<td>816.3</td>
</tr>
<tr>
<td>38</td>
<td bgcolor="#86a3ff">7</td>
<td>24disk&nbsp;jbod</td>
<td>raid10</td>
<td>10</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>377</td>
<td>1049</td>
<td>818.7</td>
</tr>
<tr>
<td>39</td>
<td bgcolor="#e68d82">8</td>
<td>12x2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>256</td>
<td>cfq</td>
<td>549</td>
<td>408</td>
<td>598</td>
</tr>
<tr>
<td>40</td>
<td bgcolor="#e68d82">8</td>
<td>12x2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>4096</td>
<td>cfq</td>
<td>549</td>
<td>714</td>
<td>578.5</td>
</tr>
<tr>
<td>41</td>
<td bgcolor="#e68d82">8</td>
<td>12x2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>8192</td>
<td>cfq</td>
<td>546</td>
<td>643</td>
<td>563.8</td>
</tr>
<tr>
<td>42</td>
<td bgcolor="#e68d82">8</td>
<td>12x2disk&nbsp;raid1</td>
<td>raid0</td>
<td>10</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>546</td>
<td>861</td>
<td>549.9</td>
</tr>
<tr>
<td>43</td>
<td bgcolor="#d3e8b1">9</td>
<td>24disk&nbsp;jbod</td>
<td>raid0</td>
<td>0</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>743</td>
<td>1054</td>
<td>687.6</td>
</tr>
<tr>
<td>44</td>
<td bgcolor="#6ec3fd">10</td>
<td>24disk&nbsp;raid0</td>
<td>None</td>
<td>0</td>
<td>xfs</td>
<td>16384</td>
<td>cfq</td>
<td>773</td>
<td>1094</td>
<td>671.8</td>
</tr>
</table>
<p>Observations:</p>
<ul>
<li>Raid10 is the best option.  The winning configurations are pure hardware raid10 with lots of readahead (test #9) and software-striped hardware raid10 for &#8220;raid 100&#8221; (tests #19 and #34).</li>
<li>I wasn&#8217;t impressed with raid6 or raid60, and raid0 isn&#8217;t a realistic option for a production database, so those setups weren&#8217;t tested as heavily as the others.</li>
<li>The IO scheduler didn&#8217;t make much difference.  CFQ seemed to be just as good or better, so I stuck with it (it&#8217;s the default).</li>
<li>Readahead makes a <strong>huge</strong> difference.  Linux defaults it to 256 sectors (128KB) per block device, and a hardware raid array appears to Linux as a single device.  In my opinion you absolutely must raise that 256 default to at least 4096.  I went as high as 32768 for the pure raid10 config, and performance didn&#8217;t suffer in seeks/sec or in pgbench, while sequential read speeds increased dramatically.</li>
<li>xfs was faster than ext3 in every test where I compared them; only a few ext3 numbers are included above.  But don&#8217;t use xfs (or ext3 with data=writeback) unless you have <strong>both</strong> a battery-backed cache on your raid controller and a UPS for main power (ideally you should also monitor the health of the raid controller&#8217;s BBU to make sure the battery isn&#8217;t dead).  You could lose data if you ignore this advice.</li>
</ul>
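<p>Readahead values like the ones in the table are usually set with <code>blockdev --setra</code>, which counts 512-byte sectors.  The same setting is exposed in KB through sysfs.  The post doesn&#8217;t say which mechanism was used, so this sketch assumes one of the two.  The device name <code>sdb</code> is hypothetical.  The helpers only build the commands and values: applying them needs root, and neither setting survives a reboot unless it is reapplied at boot.</p>

```python
SECTOR_BYTES = 512  # blockdev --setra/--getra values are counted in 512-byte sectors


def sectors_to_kib(sectors):
    """Convert a blockdev readahead value (sectors) to KiB, the unit of read_ahead_kb."""
    return sectors * SECTOR_BYTES // 1024


def setra_argv(device, sectors):
    """argv for setting readahead with blockdev (needs root; not persistent)."""
    return ["blockdev", "--setra", str(sectors), device]


def sysfs_settings(dev_name, sectors, scheduler="cfq"):
    """Equivalent sysfs writes as {path: value}: readahead in KiB plus the IO scheduler."""
    queue = f"/sys/block/{dev_name}/queue"
    return {
        f"{queue}/read_ahead_kb": str(sectors_to_kib(sectors)),
        f"{queue}/scheduler": scheduler,
    }


# Readahead values used in the benchmark table, shown in sectors and in KiB.
for ra in (256, 4096, 8192, 16384, 32768):
    print(ra, "sectors =", sectors_to_kib(ra), "KiB")
```

<p>In these units the default of 256 is only 128KB, while 32768 is 16MB of readahead on the single device Linux sees for the whole array.</p>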
<h2>The bonnie++ 1.03e results on Arch Linux</h2>
<p>Next I installed Arch Linux, which happens to come with bonnie++ 1.03e.  I did a couple of bonnie++ runs just to make sure the new OS didn&#8217;t mess anything up and was shocked to see dramatically better random seeks/second numbers.  Sequential speeds were virtually the same, but seeks/second was massively faster.  I am going to credit the newer version of bonnie++ for this difference, though the OS and kernel changed at the same time, so I am open to hearing other possible explanations for the increase.  I didn&#8217;t retest every config (I was running out of time), but I retested enough of them to be confident that the trends seen with openSUSE and bonnie++ 1.02 still hold.  I was glad to see these numbers, as I had been disappointed by the apparent 800ish ceiling in the first batch of tests.</p>
<p>Here are the configs I retested (middle of 3 runs again).  The Test# matches the row in the above table.  The new 1.03e score is listed next to the old 1.02 score.</p>
<table border="1" bordercolor="gray" cellpadding="3" cellspacing="0">
<tr bgcolor="#cccccc">
<th>Test#</th>
<th>Config#</th>
<th>bonnie++ 1.02 seeks/sec</th>
<th>bonnie++ 1.03e seeks/sec</th>
</tr>
<tr>
<td>1</td>
<td bgcolor="#a8d255">1</td>
<td align="right">787.4</td>
<td align="right">1613</td>
</tr>
<tr>
<td>4</td>
<td bgcolor="#a8d255">1</td>
<td align="right">802.7</td>
<td align="right">1652</td>
</tr>
<tr>
<td>5</td>
<td bgcolor="#a8d255">1</td>
<td align="right">809.4</td>
<td align="right">1639</td>
</tr>
<tr>
<td>6</td>
<td bgcolor="#a8d255">1</td>
<td align="right">808.4</td>
<td align="right">1688</td>
</tr>
<tr>
<td>7</td>
<td bgcolor="#a8d255">1</td>
<td align="right">805.9</td>
<td align="right">1684</td>
</tr>
<tr>
<td>8</td>
<td bgcolor="#a8d255">1</td>
<td align="right">806.3</td>
<td align="right">1697</td>
</tr>
<tr>
<td>9</td>
<td bgcolor="#a8d255">1</td>
<td align="right">808.6</td>
<td align="right">1717</td>
</tr>
<tr>
<td>19</td>
<td bgcolor="#5727ef">3</td>
<td align="right">816</td>
<td align="right">1662</td>
</tr>
<tr>
<td>26</td>
<td bgcolor="#bd550c">4</td>
<td align="right">667.3</td>
<td align="right">1056</td>
</tr>
<tr>
<td>30</td>
<td bgcolor="#6f5900">5</td>
<td align="right">657.5</td>
<td align="right">1168</td>
</tr>
<tr>
<td>34</td>
<td bgcolor="#fdfa8e">6</td>
<td align="right">815.5</td>
<td align="right">1705</td>
</tr>
<tr>
<td>38</td>
<td bgcolor="#86a3ff">7</td>
<td align="right">818.7</td>
<td align="right">1560</td>
</tr>
<tr>
<td>44</td>
<td bgcolor="#6ec3fd">10</td>
<td align="right">671.8</td>
<td align="right">1175</td>
</tr>
</table>
<p>Observations:</p>
<p>Higher numbers across the board, but no big new insights.  Raid 10 still wins, and raid 6/60 is still substantially slower.  At this point the pure hardware raid 10 config (test #9), which scored 1717, is looking pretty nice.  The pure software raid 10 (test #38) fell further behind the hardware version.  The biggest takeaway: when you are benchmarking disks, make absolutely certain the OS and tools are identical across runs.</p>
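<p>Dividing the two columns of the retest table shows how uneven the jump was.  This small sketch only recomputes that ratio from the published numbers; no new measurements are involved.</p>

```python
# (test#, config#, bonnie++ 1.02 seeks/s, bonnie++ 1.03e seeks/s), copied from the retest table
RESULTS = [
    (1, 1, 787.4, 1613), (4, 1, 802.7, 1652), (5, 1, 809.4, 1639),
    (6, 1, 808.4, 1688), (7, 1, 805.9, 1684), (8, 1, 806.3, 1697),
    (9, 1, 808.6, 1717), (19, 3, 816, 1662), (26, 4, 667.3, 1056),
    (30, 5, 657.5, 1168), (34, 6, 815.5, 1705), (38, 7, 818.7, 1560),
    (44, 10, 671.8, 1175),
]


def speedups(rows):
    """Map test# to the ratio of its 1.03e score over its 1.02 score, rounded to 2 places."""
    return {test: round(new / old, 2) for test, _config, old, new in rows}


for test, ratio in speedups(RESULTS).items():
    print(f"test {test}: {ratio}x")
```

<p>The raid10 configurations roughly doubled (about 2.0 to 2.1x).  The raid6, raid60, and raid0 configurations gained less (about 1.6 to 1.8x), and the software raid10 landed in between (about 1.9x).  That unevenness is worth weighing when considering explanations beyond the bonnie++ version change.</p>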
<h2>The pgbench results</h2>
<p>Finally, I ran pgbench against the 3 fastest configurations.  Since I was running out of time, I took what looked like the winner (test #9) and additionally varied readahead and schedulers a bit on that config to make sure I had the best combination.  pgbench isn&#8217;t perfect and there are people who dislike it, but it gives us another number to compare and consider alongside the raw dd/bonnie++ numbers.  Keep in mind I ran PostgreSQL and pgbench on this new benchmark server &#8211; not on the actual production server.  Doing the benchmarks on the final server just wasn&#8217;t an option.  The only significant differences between the two are that the production server has 4 times the RAM and has had nontrivial postgresql.conf tuning, so I would expect these numbers to improve a good bit there.</p>
<p>I just tweaked a few things in the postgresql.conf file on the benchmark server.  The non default values:</p>
<ul>
<li><code>shared_buffers = 2048MB</code></li>
<li><code>checkpoint_segments = 10</code></li>
<li><code>effective_cache_size = 4096MB</code></li>
<li><code>max_connections = 500</code></li>
<li><code>work_mem = 20MB</code></li>
<li><code>maintenance_work_mem = 128MB</code></li>
<li><code>synchronous_commit = off</code></li>
<li><code>random_page_cost = 2.0</code></li>
</ul>
<p>Note that I did have to increase SHMMAX to get shared_buffers + connections that high.  See my <a href="http://blog.gtuhl.com/2009/03/08/postgresql-setup-basics/">PostgreSQL setup post</a> for more information about that.</p>
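<p>As a rough sizing aid for the SHMMAX change, here is a sketch that turns a <code>shared_buffers</code> value into <code>kernel.shmmax</code> (bytes) and <code>kernel.shmall</code> (pages) sysctl values.  PostgreSQL&#8217;s shared memory segment is somewhat larger than <code>shared_buffers</code> alone, because connection and lock bookkeeping also live there.  The 10% headroom used here is an assumption rather than an official formula, and the 4096-byte page size is the usual x86 value.  Check the kernel resources section of the PostgreSQL documentation for your version.</p>

```python
PAGE_SIZE = 4096  # kernel.shmall is counted in pages; 4096 bytes is typical on x86/x86_64


def shm_sysctls(shared_buffers_mb, headroom=0.10):
    """Rough kernel.shmmax (bytes) and kernel.shmall (pages) for a given shared_buffers.

    The headroom fraction is an assumed cushion for PostgreSQL's non-buffer
    shared memory, not a figure taken from the PostgreSQL docs.
    """
    shmmax = int(shared_buffers_mb * 1024 * 1024 * (1 + headroom))
    shmall = -(-shmmax // PAGE_SIZE)  # ceiling division so shmall covers shmmax
    return {"kernel.shmmax": shmmax, "kernel.shmall": shmall}


for key, value in shm_sysctls(2048).items():
    print(f"{key} = {value}")
```

<p>For the 2048MB of <code>shared_buffers</code> used here, that comes to an SHMMAX of roughly 2.2GiB.</p>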
<p>Between runs I dropped and recreated the &#8220;test&#8221; database, then initialized it with one of these commands.  Note the different scale factors used to get nontrivial amounts of data &#8211; scale factor 1000 is pretty large, but it took that much before I saw the disks working constantly during the benchmarks.</p>
<p><code>pgbench -i -s 100 -U postgres test</code><br />
<code>pgbench -i -s 1000 -U postgres test</code></p>
<p>Then I ran the tests with 40 clients and 10,000 transactions per client (<code>-c 40 -t 10000</code>).  <a href="http://developer.postgresql.org/pgdocs/postgres/pgbench.html">Read more about the pgbench tool</a> for the differences between the tests and what each transaction involves.</p>
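<p>The pgbench invocations can be captured in a small helper, which also makes the amount of work explicit.  Each unit of scale factor adds 100,000 rows to the accounts table, and 40 clients at 10,000 transactions each come to 400,000 transactions per run.  This is a sketch: the helper names are hypothetical, and the database name is passed as the final positional argument.</p>

```python
ACCOUNTS_PER_SCALE = 100_000  # pgbench creates 100,000 account rows per unit of scale


def account_rows(scale):
    """Rows in the accounts table after `pgbench -i -s scale`."""
    return scale * ACCOUNTS_PER_SCALE


def init_argv(scale, db="test", user="postgres"):
    """argv for initializing the benchmark tables at a given scale factor."""
    return ["pgbench", "-i", "-s", str(scale), "-U", user, db]


def run_argv(clients=40, tx_per_client=10_000, select_only=False, db="test", user="postgres"):
    """argv for a benchmark run; the default script is pgbench's TPC-B-like workload."""
    argv = ["pgbench", "-c", str(clients), "-t", str(tx_per_client), "-U", user]
    if select_only:
        argv.append("-S")  # built-in SELECT-only script
    return argv + [db]


def total_transactions(clients=40, tx_per_client=10_000):
    """-t counts transactions per client, so the run total is the product."""
    return clients * tx_per_client
```

<p>Pass <code>select_only=True</code> for the SELECT columns of the results.  The default script produces the TPC-B columns.</p>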
<p>Here are the pgbench results.  Numbers are transactions/second.  Values listed are again the middle of 3 runs.  I would have liked to do these pgbench runs on more configs but I again was running out of time and these suckers took a long time to run with all the dropping and re-initializing.</p>
<table border="1" bordercolor="gray" cellpadding="3" cellspacing="0">
<tr bgcolor="#cccccc">
<th>Test#</th>
<th>Config#</th>
<th>TPC-B s=100</th>
<th>SELECT s=100</th>
<th>TPC-B s=1000</th>
<th>SELECT s=1000</th>
</tr>
<tr>
<td>4</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1580</td>
<td align="right">10885</td>
<td align="right">1224</td>
<td align="right">3655</td>
</tr>
<tr>
<td>5</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1591</td>
<td align="right">10667</td>
<td align="right">1270</td>
<td align="right">3499</td>
</tr>
<tr>
<td>6</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1567</td>
<td align="right">10647</td>
<td align="right">1267</td>
<td align="right">3048</td>
</tr>
<tr>
<td>7</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1551</td>
<td align="right">10656</td>
<td align="right">1202</td>
<td align="right">3478</td>
</tr>
<tr>
<td>8</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1553</td>
<td align="right">10644</td>
<td align="right">1209</td>
<td align="right">3386</td>
</tr>
<tr>
<td>9</td>
<td bgcolor="#a8d255">1</td>
<td align="right">1606</td>
<td align="right">10759</td>
<td align="right">1296</td>
<td align="right">3548</td>
</tr>
<tr>
<td>19</td>
<td bgcolor="#5727ef">3</td>
<td align="right">1581</td>
<td align="right">10743</td>
<td align="right">1311</td>
<td align="right">3269</td>
</tr>
<tr>
<td>34</td>
<td bgcolor="#fdfa8e">6</td>
<td align="right">1563</td>
<td align="right">10677</td>
<td align="right">1323</td>
<td align="right">3156</td>
</tr>
</table>
<p>Observations:</p>
<p>These pgbench tests were mostly a wash, which isn&#8217;t a big surprise considering I only compared the 3 best configurations, and those were already close in the dd/bonnie++ tests.  I really wish there had been time to get some other configs into that table to see the difference.  Test #9 did well, with a top 3 number in all 4 tests.</p>
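<p>This short sketch checks the &#8220;top 3 in all 4 tests&#8221; claim and shows how tight the race was.  It ranks each test in every column and measures the spread between the best and worst result.  The numbers are copied straight from the pgbench table.</p>

```python
# Transactions/second per test#: (TPC-B s=100, SELECT s=100, TPC-B s=1000, SELECT s=1000)
PGBENCH = {
    4: (1580, 10885, 1224, 3655),
    5: (1591, 10667, 1270, 3499),
    6: (1567, 10647, 1267, 3048),
    7: (1551, 10656, 1202, 3478),
    8: (1553, 10644, 1209, 3386),
    9: (1606, 10759, 1296, 3548),
    19: (1581, 10743, 1311, 3269),
    34: (1563, 10677, 1323, 3156),
}


def ranks(results):
    """Map test# to a tuple of 1-based ranks (1 = fastest), one per column."""
    n_cols = len(next(iter(results.values())))
    places = {test: [] for test in results}
    for col in range(n_cols):
        ordered = sorted(results, key=lambda t: results[t][col], reverse=True)
        for place, test in enumerate(ordered, start=1):
            places[test].append(place)
    return {test: tuple(p) for test, p in places.items()}


def spread_pct(results, col):
    """(best - worst) / worst for one column, as a percentage rounded to 1 decimal."""
    values = [row[col] for row in results.values()]
    return round((max(values) - min(values)) / min(values) * 100, 1)
```

<p>Test #9 ranks (1, 2, 3, 2) across the four columns.  The spread is about 3.5% for TPC-B at s=100 but close to 20% for SELECT at s=1000.  The larger, read-only runs are where the configurations differ most.</p>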
<h2>Conclusion</h2>
<p>I went with setup #9.  Pure hardware raid 10.  It won almost all tests (with the striped raid 10s being the only real competitors) and pure hardware raid is super easy to configure, maintain, and monitor.</p>
<p>More generally, this MD1120 performs pretty well, especially for the relatively low price.  As a quick note on price: one of these MD1120s configured as above, plus a PowerEdge 1950 with 16GB RAM and a Perc6/E card to connect them, would cost less than $15k (about $540/mo on a 36-month $1 buyout lease), depending on the configuration and the deal you get.  With a really good deal you could probably shave a few thousand off that number.</p>
<p>If you do buy from Dell, be sure to get in touch with a small business sales team.  They can offer nontrivial discounts beyond the price you can finagle in their online shopping cart, you talk to the same people every time you place an order, and you get a contact if you have questions or run into issues with technical support.  It is generally the way to go.</p>
<p>This MD1120+Perc6/E combo is connected to our existing DB server now and all I will say is performance is excellent.  I am seeing zero query backup, barely any IO wait reported by vmstat, and hugely impressive random IO spikes when load gets heavy (though we haven&#8217;t gotten close to maxing out so who knows how high it could go).  And, thanks to all this testing, I feel really good about having the optimal configuration for the hardware out there doing work.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/05/13/dell-md1120-perc6e-performance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Bulk Data Loading With PostgreSQL</title>
		<link>http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/</link>
		<comments>http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/#comments</comments>
		<pubDate>Sat, 18 Apr 2009 13:18:22 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=299</guid>
		<description><![CDATA[Bulk data loading is usually a niche task &#8211; something that doesn&#8217;t have to be done too often but when it does it feels painful because it takes so long and if you mess something up it has to be done again. After loading dozens of 7 and 8 digit record count files into PostgreSQL [...]]]></description>
			<content:encoded><![CDATA[<p>Bulk data loading is usually a niche task &#8211; something that doesn&#8217;t have to be done often, but when it does, it feels painful because it takes so long, and if you mess something up it has to be done again.  After loading dozens of files with 7- and 8-digit record counts into PostgreSQL databases, I think it would be helpful to share the basic techniques that make the process go a lot faster.  The PostgreSQL documentation already covers bulk data loading (<a href="http://www.postgresql.org/docs/8.3/interactive/populate.html">here</a>) and describes more techniques than this post does, but I keep hearing about cases where that advice isn&#8217;t followed, so I wanted to show some numbers.  All this post adds is some numbers to back up the basics.</p>
<p>For this post I am working with 600k records being loaded into a table that looks like this.  The data plus delimiters is about 60MB.</p>
<pre>
                 Table "public.people"
   Column    |            Type             | Modifiers
-------------+-----------------------------+-----------
 id          | character varying(36)       | not null
 first_name  | character varying(255)      |
 last_name   | character varying(255)      |
 middle_name | character varying(255)      |
 client_id   | character varying(36)       |
 birth_date  | timestamp without time zone |
Indexes:
    "people_pkey" PRIMARY KEY, btree (id)
    "idx_people_client_id" btree (client_id)
    "idx_people_name_search" gin (to_tsvector('english'::regconfig,
      (((COALESCE(first_name,
      ''::character varying)::text || ' '::text) ||
      COALESCE(middle_name, ''::character varying)::text) || ' '::text) ||
      COALESCE(last_name, ''::character varying)::text))
</pre>
<p>A few points on this particular table and config:</p>
<ul>
<li>The data is a subset of a real production table made up of entries people provided on themselves so the distribution is realistic and balanced.</li>
<li>The primary key id column is a UUID generated by code.</li>
<li>A very basic index on the client_id column.</li>
<li>A more complex expression gin index on the name fields to allow full text searching.  Full text search was built into PostgreSQL core in 8.3, and it&#8217;s pretty powerful.  Read more about it <a href="http://www.postgresql.org/docs/8.3/interactive/textsearch.html">here</a> if interested.</li>
<li>Overall it is a relatively small, simple table, but not trivially so, and it&#8217;s large enough to get decent timings from bulk load operations for comparison purposes.</li>
<li>All tests were run against a clean DB, an empty table, and freshly created indexes, so these are best-case numbers.</li>
</ul>
<p>So, let&#8217;s say you have that empty table and want to load 600k records into it.  Here is a sequence of methods going from worst to best.  I am doing a VACUUM ANALYZE before and after each operation, and all timings are rounded up to the nearest second.  We are comparing <a href="http://www.postgresql.org/docs/current/static/sql-insert.html">INSERT</a> and <a href="http://www.postgresql.org/docs/8.3/static/sql-copy.html">COPY</a>.</p>
<h2>First Setup: leave indexes and pkey in place.</h2>
<p><strong>INSERT</strong>: 6 minutes, 5 seconds</p>
<p><strong>COPY</strong>: 3 minutes, 11 seconds. </p>
<h2>Second Setup: drop/recreate indexes, don&#8217;t touch pkey</h2>
<p>When timed it took 1 second to drop indexes and 13 seconds to rebuild them.  I have added that overhead into these numbers.</p>
<p><strong>INSERT</strong>: 3 minutes, 19 seconds</p>
<p><strong>COPY</strong>: 1 minute, 5 seconds</p>
<h2>Third Setup: drop/recreate indexes and pkey</h2>
<p>When timed it took 1 second to drop indexes and 15 seconds to rebuild them (includes pkey).  I have added that overhead into these numbers.</p>
<p><strong>INSERT</strong>: 2 minutes, 49 seconds</p>
<p><strong>COPY</strong>: 21 seconds</p>
<h2>Conclusion</h2>
<p>With that final setup, dropping/recreating all indexes and using the COPY command, I was able to load 600k records in 21 seconds.  That includes the drop/recreate overhead.  The actual COPY command only took 5 seconds.</p>
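<p>The third setup boils down to a fixed order of statements.  The sketch below is a hypothetical helper that generates that order as SQL strings.  The data path is made up, and only the simple <code>client_id</code> index is shown in the usage to keep it short.  The GIN index would be passed the same way, with its full <code>CREATE INDEX</code> statement.</p>

```python
def bulk_load_script(table, data_path, index_ddl, pkey_name, pkey_cols):
    """Ordered SQL for a 'drop indexes and pkey, COPY, rebuild' bulk load.

    index_ddl maps index name to its CREATE INDEX statement, so the same
    definitions can be replayed after the load.
    """
    steps = [f"VACUUM ANALYZE {table};"]
    steps += [f"DROP INDEX {name};" for name in index_ddl]
    steps.append(f"ALTER TABLE {table} DROP CONSTRAINT {pkey_name};")
    # Server-side COPY: absolute path, readable by the database server process.
    steps.append(f"COPY {table} FROM '{data_path}';")
    steps.extend(index_ddl.values())
    steps.append(f"ALTER TABLE {table} ADD CONSTRAINT {pkey_name} PRIMARY KEY ({', '.join(pkey_cols)});")
    # Gather statistics only after every index exists again.
    steps.append(f"VACUUM ANALYZE {table};")
    return steps


if __name__ == "__main__":
    ddl = {"idx_people_client_id": "CREATE INDEX idx_people_client_id ON people (client_id);"}
    for stmt in bulk_load_script("people", "/tmp/people.dat", ddl, "people_pkey", ["id"]):
        print(stmt)
```

<p>Run the statements one at a time rather than wrapping them in a single transaction, because <code>VACUUM</code> cannot run inside a transaction block.</p>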
<p>General notes related to the tests done above:</p>
<ul>
<li>The COPY command is magical.  It completely blows away INSERT for bulk load performance.</li>
<li>Dropping and recreating indexes is far faster than leaving indexes in place and forcing thousands or millions of individual updates to them.</li>
<li>Do a VACUUM ANALYZE before and after bulk loading data to be safe.  If you dropped indexes be sure to recreate them before doing the ANALYZE.</li>
<li>If you don&#8217;t think the speed improvement shown is significant, keep in mind we are working with a narrow 6-column table with just 3 indexes and not very much data.  The savings from dropping and recreating indexes become enormous as your table gets taller (more rows) or wider (more columns) and as the number of indexes goes up.  Conversely, if you are only working with thousands or low tens of thousands of rows, you can just do whatever is easiest since the size is so trivial.</li>
</ul>
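<p>For reference, here is a minimal sketch of the third setup as a shell function that prints the SQL for the whole sequence.  It assumes a hypothetical <code>people</code> table whose primary key constraint is named <code>people_pkey</code> on <code>id</code>; the function name <code>bulk_load_sql</code> and its arguments are illustrative, not what was used in the tests above.  Each index is passed as a full <code>CREATE INDEX</code> statement so the function can both drop and rebuild it.</p>

```shell
# Hypothetical helper: print the SQL for a "drop indexes and pkey, COPY,
# recreate" bulk load. Pipe the output into psql.
# Usage: bulk_load_sql TABLE PKEY_NAME PKEY_COLS /abs/path/file "CREATE INDEX ..."...
bulk_load_sql() {
    local table=$1 pkey=$2 pkey_cols=$3 file=$4 def name
    shift 4
    # DDL is transactional in PostgreSQL, so if the COPY fails the dropped
    # indexes and primary key come back when the transaction rolls back.
    echo "BEGIN;"
    for def in "$@"; do
        # The index name is the word after INDEX in
        # "CREATE [UNIQUE] INDEX name ON ...".
        name=$(printf '%s\n' "$def" | awk '{ for (i = 1; i < NF; i++) if (toupper($i) == "INDEX") { print $(i + 1); exit } }')
        echo "DROP INDEX $name;"
    done
    echo "ALTER TABLE $table DROP CONSTRAINT $pkey;"
    echo "COPY $table FROM '$file';"
    # Rebuild the primary key and every index once, after all rows are in.
    echo "ALTER TABLE $table ADD CONSTRAINT $pkey PRIMARY KEY ($pkey_cols);"
    for def in "$@"; do
        printf '%s;\n' "$def"
    done
    echo "COMMIT;"
    # VACUUM cannot run inside a transaction block, so it goes after COMMIT.
    echo "VACUUM ANALYZE $table;"
}
```

<p>Something like <code>bulk_load_sql people people_pkey id /tmp/people.tsv "CREATE INDEX people_name_idx ON people (name)" | psql -v ON_ERROR_STOP=1 mydb</code> would run it, with <code>mydb</code> as a placeholder.  Wrapping the load in a transaction keeps a failed COPY from leaving the table without its indexes, at the cost of holding the exclusive lock that DROP INDEX takes on the table until COMMIT, so reads on that table block while the load runs.</p>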
<p>Some other notes worth mentioning:</p>
<ul>
<li>The same rules apply for updates.  ALL indexes slow down updates, not just the ones on the column you are updating, so if you need to update an entire table, dropping and recreating the indexes will provide a massive speed increase.</li>
<li>If you are doing a bulk update and using a field on the updated table for lookups, you are generally better off dropping all indexes <strong>except</strong> the one on your lookup column.  Say for each person in my people table I am looking up the specific person by id and setting their client_id.  I would want to leave the index/pkey on the id column so those lookups stay fast.  You still pay a penalty at write time because the index/pkey must be updated, but without the index each update would have to do a sequential scan of the entire table.</li>
<li>Though incredibly fast the COPY command is <strong>very</strong> picky about its input.  It is an all or nothing operation and if there is even a slight issue with your data it will fail.  You will want to VACUUM your table after a failed COPY to recover space.  Thankfully if it does fail it prints out the exact line number and reason.</li>
<li>When specifying a file for the COPY command you must use an absolute path, and the file (and every directory in its path) must be accessible by the postgres user, because the server reads the file, not the client.  I find it easiest to just toss my input files into /tmp.</li>
<li>Consider partitioning your data so you can drop and re-add indexes on individual partitions, rather than leaving an entire table with awful read performance while data is being loaded.  Partitioning is a whole separate topic, but if you take this route <a href="http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html">read the docs</a>, be sure constraint_exclusion is on in your postgresql.conf, and be sure your queries filter on the column your check constraints use in their WHERE clauses.  Partitions and significant bulk data loading fit together really well.</li>
<li>Seriously read the <a href="http://www.postgresql.org/docs/8.3/interactive/populate.html">PostgreSQL documentation on this stuff</a> if you are bulk loading data.  Aside from the additional SQL-level recommendations it covers the tuning options for your DB that can make a very significant difference.</li>
</ul>
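<p>Since a single malformed line makes the whole COPY fail, it can be worth checking an input file before handing it to the server.  Below is a minimal sketch, assuming COPY&#8217;s default text format (tab-delimited, with literal tabs inside values escaped as <code>\t</code>); the function name <code>check_copy_file</code> is hypothetical.  It only catches wrong column counts and relative paths, not values of the wrong type.</p>

```shell
# Hypothetical pre-flight check for a COPY input file in the default text
# format (tab-delimited). Prints each bad line and returns non-zero if any.
# Usage: check_copy_file /tmp/people.tsv 6
check_copy_file() {
    local file=$1 ncols=$2
    # Server-side COPY needs an absolute path the server itself can read.
    case $file in
        /*) ;;
        *) echo "not an absolute path: $file" >&2; return 2 ;;
    esac
    # Flag every line whose field count differs from the table's column count.
    awk -F '\t' -v n="$ncols" '
        NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad++ }
        END { exit (bad ? 1 : 0) }
    ' "$file"
}
```

<p>If the file lives on the client rather than the database server, psql&#8217;s <code>\copy</code> meta-command reads it from the client side instead, which sidesteps the postgres-user permission requirement.</p>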
<p>Finally, you may find that the expression index for full text searching in my people table above looks strange.  I hope to do a post specifically on indexing that explains it in detail.  In my opinion tuning database indexes is a blast, made even better by the amazing capabilities of PostgreSQL.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/04/18/bulk-data-loading-with-postgresql/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Connection Pooling: PgBouncer or JDBC</title>
		<link>http://blog.gtuhl.com/2009/04/04/connection-pooling-pgbouncer-or-jdbc/</link>
		<comments>http://blog.gtuhl.com/2009/04/04/connection-pooling-pgbouncer-or-jdbc/#comments</comments>
		<pubDate>Sat, 04 Apr 2009 16:09:53 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[jdbc]]></category>
		<category><![CDATA[pgbouncer]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=276</guid>
		<description><![CDATA[Why Use Connection Pooling With recent customer additions we have had to deal with lots of scaling challenges the last few months. One puny fraction of that effort has been around tuning our database connection pools. In very short, if your database server is overwhelmed with concurrent connections it can get bogged down in context [...]]]></description>
			<content:encoded><![CDATA[<h2>Why Use Connection Pooling</h2>
<p>With recent customer additions we have had to deal with lots of scaling challenges over the last few months.  One puny fraction of that effort has been tuning our database connection pools.  In short, if your database server is overwhelmed with concurrent connections it can get bogged down in context switching and resource/IO waits, so that overall throughput (transactions per second) drops.  In the best case things just get slow.  In the worst case things lock up completely because the server can&#8217;t service all of those connections, or your client code starts throwing exceptions because the database has reached its maximum configured connection count.</p>
<p>One way to control the maximum number of connections hitting your database at once is connection pooling.  It keeps the load on your database consistent, so the server can work through requests quickly and sustain better overall throughput.  You can also match your aggregate pool sizes to the maximum connection count configured in your database server to ensure your client code never asks for more than the DB can give.  The other HUGE advantage of connection pooling is that you are essentially caching the overhead of obtaining new connections.  The database sees a fixed number of connections that are opened and held open &#8211; the pooling software hands those pooled connections out to clients that need them, and when a client is done the connection stays open, waiting to be used again.  This avoids a surprisingly large amount of overhead.</p>
<p>There is definitely a trade-off.  If your application needs a certain number of connections simply because you have too many concurrent users, it is time to tune your code/queries/indexes (<strong>HUGE</strong> gains can always be found there), push stuff into distributed caches, or buy new database hardware so you can service more connections.  Some recommendations online suggest as few as 2-4 connections per core.  We&#8217;ve settled in around 30 per core, and that seems to be a sweet spot for stability and performance.  With 8 cores we can sustain well over 1000 transactions per second on a single PostgreSQL server (running on decent but not exceptional hardware).</p>
<p>We use JDBC connection pooling with <a href="http://www.mchange.com/projects/c3p0/index.html">c3p0</a>.  Each of our application servers has its own pool.  We ensure that the sum of those pool sizes matches the connections that the database allows and also equals our desired connections per core count.</p>
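<p>The sizing rule above is simple arithmetic.  As a rough sketch (the function name and its arguments are illustrative), the total budget is cores times connections per core, divided evenly among the application servers.  With the 8 cores and 30 connections per core mentioned above, that is a budget of 240 connections, or 60 per server across four hypothetical application servers.</p>

```shell
# Hypothetical helper: split a connections-per-core budget evenly across
# application servers and print the pool size each server should use.
# Usage: per_server_pool CORES CONNS_PER_CORE APP_SERVERS
per_server_pool() {
    local cores=$1 per_core=$2 servers=$3
    local budget=$((cores * per_core))
    # Integer division rounds down, so the pools never sum past the budget.
    echo $((budget / servers))
}
```

<p>Rounding down means a few connections of the budget go unused when it doesn&#8217;t divide evenly (240 across 7 servers gives 34 each, 238 total), which is the safe direction to err.</p>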
<p>We explored using <a href="https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer">PgBouncer</a> instead.  This differs in that the connection pooling happens on the server &#8211; clients just ask for as many connections as they need, talking to PgBouncer instead of directly to PostgreSQL.</p>
<p>You cannot usefully combine JDBC pooling with PgBouncer or other server-side pooling.  More precisely, you can, but there is little point.  Say you set PgBouncer to maintain a pool of 500 connections.  With JDBC pooling in the mix you still have to tune your individual client JDBC pools so that the sum of their sizes doesn&#8217;t exceed 500.  If that is the case, why use PgBouncer?  Just use PostgreSQL and set it to handle 500 connections.  Instead you would drop JDBC pooling and have your clients ask for whatever they want while PgBouncer does the pooling and connection management.</p>
<h2>Comparison</h2>
<p>In deciding whether to move to PgBouncer or stick with c3p0 I ran a few tests.  Here are the results.  The goal was to compare the overhead of obtaining connections, <strong>not</strong> to determine the overall best solution by throughput; that would require significantly different and more involved tests.  We were considering a move to PgBouncer and just wanted to measure any difference in connection overhead.  These tests were done with a pool size of 50, with the client code connecting to a separate database server on the same subnet and physical switch.</p>
<p><strong>First Test</strong><br />
Ran 500 queries, each a simple <code>select 1</code>.  Because the query is so simple, most of the time is connection overhead, so it&#8217;s a good test for deciding between pooling solutions.</p>
<table>
<tr>
<th>Setup</th>
<th>Time for 500 queries</th>
</tr>
<tr>
<td>c3p0 ComboPooledDataSource</td>
<td>.99 s</td>
</tr>
<tr>
<td>JDBC no pooling and no PgBouncer</td>
<td>6.41 s</td>
</tr>
<tr>
<td>JDBC no pooling but with PgBouncer</td>
<td>3.99 s</td>
</tr>
</table>
<p><strong>Second Test</strong><br />
Ran 500 queries, cycling sequentially through an array of 5 distinct queries ranging from very simple to very complex.  This is a highly imperfect test, but it tries to show how connection overhead factors into a more typical load.</p>
<table>
<tr>
<th>Setup</th>
<th>Time for 500 queries</th>
</tr>
<tr>
<td>c3p0 ComboPooledDataSource</td>
<td>122.18 s</td>
</tr>
<tr>
<td>JDBC no pooling and no PgBouncer</td>
<td>139.71 s</td>
</tr>
<tr>
<td>JDBC no pooling but with PgBouncer</td>
<td>129.81 s</td>
</tr>
</table>
<p>So c3p0 offers the fastest solution in these tests.  PgBouncer also makes a clear difference over using no connection pooling at all (and thus obtaining a new connection with each query).  This makes some sense, as c3p0 is closer to the client code and keeps the pooled connections local, while with PgBouncer the clients have to hit the network to queue up with the server.</p>
<h2>Conclusion</h2>
<p>We stuck with c3p0.  But in my opinion PgBouncer really shines on the administrative side.  As we continue to add more servers, a move to PgBouncer or something similar seems likely.  Both it and <a href="http://pgpool.projects.postgresql.org/">PgPool-II</a>, another solution, offer other capabilities as well.  PgPool-II even offers built-in load balancing of reads against replicated copies of a DB.</p>
<p>Say you have 10 application servers and 1 database server configured for 500 maximum connections.  You set the individual application server pool sizes to 50.  Now you need to add a new application server.  You would have to tune all 10 of your old servers dropping their pool sizes down to make room for the new one.  This problem gets more and more severe as you add more servers.</p>
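<p>When adding that 11th server it is easy to miss one of the old pools.  A minimal sanity check, assuming you can list each server&#8217;s configured pool size (the function name <code>pools_fit</code> is hypothetical), is to confirm the sizes still sum to no more than the connection limit.</p>

```shell
# Hypothetical check: do the configured per-server pool sizes still fit
# under the database connection limit? Prints the totals and returns
# non-zero if they do not.
# Usage: pools_fit LIMIT SIZE...
pools_fit() {
    local limit=$1 total=0 size
    shift
    for size in "$@"; do
        total=$((total + size))
    done
    echo "total=$total limit=$limit"
    [ "$total" -le "$limit" ]
}
```

<p>In the example above, ten pools of 50 exactly fill 500 connections and an eleventh pool of 50 does not fit, so each pool would have to drop to 45 (500 divided by 11, rounded down).  Keep in mind that PostgreSQL holds back <code>superuser_reserved_connections</code> (3 by default) out of <code>max_connections</code>, so the limit you check against should account for that if your application does not connect as a superuser.</p>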
<p>Also, tuning individual pool sizes per client means there is a chance that one server is starving while others have pooled connections sitting idle &#8211; the connections aren&#8217;t shared.  Probably not a big issue if you are load balancing user connections to your application servers evenly but still a possibility.</p>
<p>As you get more and more machines, server-layer pooling with PgBouncer definitely makes more sense.  It is also client-language neutral, so if you are running in an environment that doesn&#8217;t have mature pooling solutions it&#8217;s a great option.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/04/04/connection-pooling-pgbouncer-or-jdbc/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Monitoring Dell Perc5 and Perc6 Disks in Arch Linux</title>
		<link>http://blog.gtuhl.com/2009/03/11/monitoring-dell-perc5-and-perc6-disks-in-arch-linux/</link>
		<comments>http://blog.gtuhl.com/2009/03/11/monitoring-dell-perc5-and-perc6-disks-in-arch-linux/#comments</comments>
		<pubDate>Thu, 12 Mar 2009 00:25:14 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[archlinux]]></category>
		<category><![CDATA[dell]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[raid]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=207</guid>
		<description><![CDATA[One of the downsides to hardware raid is that it is not as easy to monitor as software raid. Monitoring individual disk status requires proprietary software made to match the hardware. This is the position you will be in if buying Dell equipment with their Perc5/Perc6 controllers. The reason is that to the OS your [...]]]></description>
			<content:encoded><![CDATA[<p>One of the downsides to hardware raid is that it is not as easy to monitor as software raid.  Monitoring individual disk status requires proprietary software made to match the hardware.  This is the position you will be in if buying Dell equipment with their Perc5/Perc6 controllers.  The reason is that to the OS your big raid array is just a single big disk &#8211; the hardware controller masks knowledge of the individual disks and their status.  You <strong>can</strong> monitor the Dell disks though, you just need the matching software.  This will work for probably any Linux distribution and I suspect the earlier Perc controllers as well.</p>
<p>It is definitely a good idea to monitor your individual disks, else you could have one fail and not even know your raid array is operating in a degraded state.  When a disk fails you want to act quickly (and preferably have hot spares configured in your controller) because disks manufactured in the same batches are rumored to frequently fail around the same time.  Though I have not experienced this personally, it seems plausible.</p>
<p>The first option you might happen upon is installing Dell&#8217;s OpenManage software.  That stuff is pretty bloated, and you have an adventure ahead of you getting it installed if you aren&#8217;t running Red Hat or SuSE (or perhaps Debian, for which a repackaged set of .debs seems to pop up quickly after each new version is released).</p>
<p>The other option, and the one I will walk through here, is the MegaRAID CLI tool from <a href="http://www.lsi.com/storage_home/products_home/internal_raid/index.html">LSI</a>.  The Perc controllers in the Dells are apparently rebranded LSI controllers, so you can use this command line tool to extract the information you need for monitoring.  Here are the steps to install it.</p>
<h1>Installation</h1>
<ol>
<li>Grab the Megaraid CLI program from LSI.  At the time of this post you can get it at <a href="http://www.lsi.com/DistributionSystem/AssetDocument/4.00.11_Linux_MegaCLI.zip">http://www.lsi.com/DistributionSystem/AssetDocument/4.00.11_Linux_MegaCLI.zip</a>.</li>
<li>Put the downloaded .zip file on your server somewhere in a temporary location and unzip it.</li>
<li>Unzip the .zip file that was inside it (I guess the double zip is for good luck?)</li>
<li>Install rpmextract.  That is <code>pacman -Sy rpmextract</code> on Arch Linux.</li>
<li>Unpack the .rpm that was in the innermost .zip file with <code>/usr/bin/rpmextract.sh MegaCLI-4.0.0.11-1.i386.rpm</code></li>
<li><code>mv</code> the resulting MegaRAID directory to <code>/opt/MegaRAID</code>.</li>
</ol>
<p>Note that if you are running one of those fancy rpm-based Linux distributions you will obviously skip the rpmextract part <img src='http://blog.gtuhl.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>That will do it for the install.  Now on to Usage.</p>
<h1>Usage</h1>
<p>You can run <code>/opt/MegaRAID/MegaCli/MegaCli64 -h</code> to see all the available options.  There are a ton of them, and the help output is not useful at all.  This tool appears to be able to both query the status of disks and perform operations against them.  I&#8217;ve only used it for monitoring and really only use this one command:</p>
<p><code>/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll</code></p>
<p>This will generate a ton of output on each of the compatible controllers in your server.  If you have a standalone server (one that isn&#8217;t attached to extra storage) you will likely have only 1 controller.  If you have an MD1000/3000 or other direct attached storage connected via an extra Perc adapter you could have multiple controllers.</p>
<p>In all of that mess of output there will be a &#8220;Device Present&#8221; section for each controller.  In that section you will see output like this:</p>
<pre>
                Device Present
                ================
Virtual Drives    : 1
  Degraded        : 0
  Offline         : 0
Physical Devices  : 16
  Disks           : 15
  Critical Disks  : 0
  Failed Disks    : 0
</pre>
<p>You can see 1 virtual drive listed &#8211; this is the big single drive the OS sees.  You can also see its status there &#8211; Degraded is thankfully 0.  If it was greater than 0 it would mean your &#8220;Virtual Drive&#8221; is degraded likely meaning a disk has dropped out.  I suppose Offline would mean your array is fried due to multiple disk failure or isn&#8217;t being used at all.</p>
<p>Additionally you can see the physical drives and their status.  Thankfully none of mine are Critical or Failed.  I can&#8217;t say I understand why it says there are 16 drives when there are only 15 unless it is recounting the virtual drive for some reason. </p>
<p>In any case, you can see that by checking those counts you can pick up whether your raid array is operating in a degraded state.  Here is a command that grabs just the &#8220;Degraded&#8221; number from the &#8220;Virtual Drives&#8221; section:</p>
<p><code>/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -a0 | grep "Virtual Drives" -A 1 | awk 'END {print $3}'</code></p>
<p>Even better, here is one that sums up the &#8220;Degraded&#8221; number for all controllers.  This one is more flexible as it is equally useful on a server with a bunch of different controllers and raid arrays:</p>
<p><code>/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll -NoLog | grep -A 2 "Virtual Drives" | awk '/Degraded/ {TOTAL += $3} END {print TOTAL}'</code></p>
<p>The output of that last command is a single integer.  That would be easy to snap into your monitoring software of choice &#8211; my preference is Zabbix, so I would add this to my <code>/etc/zabbix/zabbix_agentd.conf</code> file:</p>
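<p>The one-liner above only looks at the &#8220;Degraded&#8221; count.  If you would also like the &#8220;Offline&#8221;, &#8220;Critical Disks&#8221;, and &#8220;Failed Disks&#8221; counters, a sketch like the following sums all four, restricted to the &#8220;Device Present&#8221; sections.  It assumes the output layout shown earlier (a header, a row of equals signs, then <code>label : value</code> lines ending at a blank line or the next header); the function name <code>megaraid_problems</code> is hypothetical and it has not been checked against every MegaCli version.</p>

```shell
# Hypothetical helper: read "MegaCli64 -AdpAllInfo -aAll -NoLog" output on
# stdin and print the sum of the Degraded, Offline, Critical Disks, and
# Failed Disks counters from every "Device Present" section.
megaraid_problems() {
    awk -F ':' '
        /Device Present/   { in_section = 1; next }  # section header
        /^[ \t]*=+[ \t]*$/ { next }                  # row of equals signs
        in_section && !/:/ { in_section = 0 }        # blank line or next header
        in_section {
            label = $1
            gsub(/^[ \t]+|[ \t]+$/, "", label)        # trim the padded label
            if (label == "Degraded" || label == "Offline" ||
                label == "Critical Disks" || label == "Failed Disks")
                total += $2
        }
        END { print total + 0 }
    '
}
```

<p>Feeding it <code>/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll -NoLog | megaraid_problems</code> prints a single integer, so saved as a small script it could back a Zabbix UserParameter the same way the Degraded-only command does.</p>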
<p><code>UserParameter=custom.megaraid.degradedCount,/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll -NoLog | grep -A 2 'Virtual Drives' | awk '/Degraded/ {TOTAL += $3} END {print TOTAL}'</code></p>
<p>Now bounce your zabbix agent, add the &#8220;custom.megaraid.degradedCount&#8221; as an item to your server(s) in the web interface and you are set.</p>
<p>This works really well. I tested it out by yanking disks out of running servers and watching their degraded count jump up.  You may want to do the same just to ensure it is working end-to-end but don&#8217;t say I told you to do it and don&#8217;t yank a disk out of an array that can&#8217;t handle multiple disk failures.  If you do and another disk actually fails while your test disk rebuilds you will be in rough shape.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/03/11/monitoring-dell-perc5-and-perc6-disks-in-arch-linux/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Setup Basics</title>
		<link>http://blog.gtuhl.com/2009/03/08/postgresql-setup-basics/</link>
		<comments>http://blog.gtuhl.com/2009/03/08/postgresql-setup-basics/#comments</comments>
		<pubDate>Mon, 09 Mar 2009 01:22:24 +0000</pubDate>
		<dc:creator>Joe</dc:creator>
				<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[postgresql]]></category>

		<guid isPermaLink="false">http://blog.gtuhl.com/?p=166</guid>
		<description><![CDATA[There is a seemingly endless amount of information you can learn about PostgreSQL usage, administration, and tuning. The amount of information out there can be overwhelming but just learning about a small piece of it can allow you to get dramatically better performance. This post is an effort to provide something you can read quickly [...]]]></description>
			<content:encoded><![CDATA[<p>There is a seemingly endless amount of information you can learn about PostgreSQL usage, administration, and tuning.  The amount of information out there can be overwhelming but just learning about a small piece of it can allow you to get <strong>dramatically</strong> better performance.  This post is an effort to provide something you can read quickly and get some benefit from.  This is purely my opinion, you should spend some time reading the <a href="http://www.postgresql.org/docs/8.3/interactive/index.html">excellent PostgreSQL documentation</a>, reviewing proper <a href="http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server">tuning guides</a>, and reading the <a href="http://archives.postgresql.org/pgsql-performance/">PostgreSQL performance list</a> for more information.  I&#8217;ve tried in this post to keep things at a high level and omitted a lot of details to try and make it shorter and easier to read. </p>
<p>On all of the below the <a href="http://www.postgresql.org/community/lists/">PostgreSQL mailing lists</a> are invaluable.  You can ask questions about your hardware, queries, performance, whatever and get utterly helpful and complete information from incredibly knowledgeable people many of whom are the developers behind PostgreSQL.</p>
<h2>Hardware</h2>
<p>PostgreSQL scales up beautifully as you feed it increasingly powerful hardware and can handle enormous concurrent load.  Here is how hardware components rank in importance.</p>
<ol>
<li><strong>Disk IO</strong> is king.  Especially if your database does not fit in memory.  You want a lot of spindles (individual disks), a good raid controller with a battery-backed cache, and raid 10 is recommended almost universally.  Fancy CPUs and lots of RAM are a <strong>complete</strong> waste of money if your disk setup is not up to snuff.  Faster individual disks are obviously better too.</li>
<li><strong>Memory is next in importance</strong>.  It is almost as important as disk IO if your database fits completely in memory, but that isn&#8217;t a condition that sticks around forever or that you can normally rely on.  The more memory the better.  Several of the configuration parameters in postgresql.conf are typically sized as a percentage of total system memory.</li>
<li><strong>CPU is still important</strong>, especially if your database is doing a lot of number crunching and reporting/complex queries, but isn&#8217;t going to be your bottleneck in most cases.  8 cores is easy to achieve now, just buy whichever Xeon CPU hits the price/performance sweet spot.  Don&#8217;t pay huge for bleeding edge GHz.</li>
</ol>
<p>Mainstream vendors should be sufficient for all but massive databases.  Our database holds hundreds of gigs of data and does thousands of transactions a second, and it runs on plain Dell equipment (a 2950 linked to an MD1000, 20 disks total).</p>
<p>Obviously if you are looking to purchase a server for running PostgreSQL do more research than this post before pulling the trigger.  Sometimes you can even get a vendor to let you borrow a unit for a short period of time to benchmark on if it is sufficiently expensive.</p>
<h2>Disk Configuration</h2>
<p>I strongly recommend running PostgreSQL on its own dedicated server unless you are expecting trivial load.  When you set up your partitions during the OS installation, be mindful of your disk spindles.  You want to have disks dedicated to doing nothing except database writing and reading.  Here&#8217;s a list of a few things to think about in terms of IO usage:</p>
<ul>
<li>Your operating system.</li>
<li>The actual PostgreSQL data that is written/read.</li>
<li>PostgreSQL logging (the standard server logs and the write-ahead log, also called the transaction log).</li>
</ul>
<p>Considering the above, try to put each of these on a separate disk array.  If that isn&#8217;t possible, try to at least separate the OS from PostgreSQL.  The standard PostgreSQL logs can be tuned and configured (including their location) in postgresql.conf.  The transaction logs cannot be relocated that way; they are always written to <code>$PGROOT/pg_xlog</code>.  You can use a symlink to point pg_xlog wherever you want; just be sure to shut PostgreSQL down before making that change and preserve any existing files in there during the move.</p>
<p>Let&#8217;s say you buy a six disk server to run your database on.  You could set up two disks in a raid 1 for your OS and the remaining four in a raid 10 for all your PostgreSQL stuff.  More disks give more flexibility and more performance, but we ran on that smaller configuration for a while and it worked pretty well.  Consider going with 2.5&#8243; drives, as you can fit more disks in the same server chassis.</p>
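<p>The pg_xlog move can be scripted.  Here is a minimal sketch, assuming the paths in the usage comment are placeholders for your own $PGROOT and dedicated disk array, and that it runs as the user who owns the data directory; <code>relocate_xlog</code> is a hypothetical name.  It refuses to run while <code>postmaster.pid</code> exists, which is present while the server is running (or left behind after a crash).</p>

```shell
# Hypothetical helper: move pg_xlog onto another disk array and leave a
# symlink behind. Run as the owner of the data directory, with PostgreSQL
# stopped.
# Usage: relocate_xlog /var/lib/postgres/data /mnt/wal_array   (placeholder paths)
relocate_xlog() {
    local pgroot=$1 target=$2
    case $target in
        /*) ;;
        *) echo "target must be an absolute path" >&2; return 1 ;;
    esac
    # postmaster.pid exists while the server runs (or after a crash); bail out.
    if [ -e "$pgroot/postmaster.pid" ]; then
        echo "postmaster.pid found in $pgroot; stop PostgreSQL first" >&2
        return 1
    fi
    if [ -L "$pgroot/pg_xlog" ] || [ ! -d "$pgroot/pg_xlog" ]; then
        echo "$pgroot/pg_xlog is missing or already a symlink" >&2
        return 1
    fi
    if [ -e "$target/pg_xlog" ]; then
        echo "$target/pg_xlog already exists" >&2
        return 1
    fi
    mkdir -p "$target" || return 1
    # Move the whole directory so every existing WAL segment comes along.
    mv "$pgroot/pg_xlog" "$target/pg_xlog" || return 1
    ln -s "$target/pg_xlog" "$pgroot/pg_xlog"
}
```

<p>Moving the whole directory rather than individual files keeps every existing WAL segment, and the symlink means PostgreSQL still finds it at <code>$PGROOT/pg_xlog</code> when you start it back up.</p>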
<p>A final note on this, PostgreSQL supports the notion of tablespaces.  Tablespaces can be mapped to separate physical disk locations allowing you to put certain tables or indexes on different physical disk arrays.  This can be useful for splitting up the IO load of the server or for doing things like moving heavily hit indexes or tables to a faster more expensive set of disks (that perhaps would have been too expensive to make large enough for the entire DB).  We do this.  We have a bigger raid 10 of slower SATA drives for most of our data and then a smaller raid 10 of 15K SAS drives for a few of the hard hit items.</p>
<h2>Installation</h2>
<p>In Arch Linux installing PostgreSQL (via <code>pacman -Sy postgresql</code>) creates an <code>/etc/rc.d/postgresql</code> script for starting and stopping the database.  The first time you &#8220;start&#8221; that script it calls <code>initdb</code> if the configured $PGROOT does not exist.  Subsequent times it simply does a <code>pg_ctl start</code>.  You will get something different but similar (probably /etc/init.d/postgresql) in other distributions.  </p>
<p>Once your database has been initialized, all of the important pieces of your PostgreSQL installation (aside from the command line tools) will be underneath the $PGROOT.  This includes the actual data stored in your DB, your postgresql.conf file, directories for the various PostgreSQL logs, and a few other configuration files.</p>
<p>Several command line tools get put in <code>/usr/bin/</code>.  Here are a few of them and their purpose.  The PostgreSQL documentation has in depth instructions and information on each of these as well as the ones I am omitting.</p>
<ul>
<li><code>pg_ctl</code> &#8211; This is what gets used to actually start and stop PostgreSQL.  It is often not called directly because an init script uses it internally.</li>
<li><code>initdb</code> &#8211; Used once to initialize the $PGROOT for your PostgreSQL install.  It can be directed to initialize a specific location, and its parameters let you control the encoding and locale (I recommend UTF-8).  This one also generally won&#8217;t be called directly but by an init script in your Linux distribution.  If you need to call it yourself, the most common command (for me) is <code>/usr/bin/initdb -D [MyPGROOT] -E UTF8 --locale en_US.UTF-8</code>.</li>
<li><code>pg_dump</code> &#8211; Tool for dumping a specific database in your PostgreSQL install.  This one dumps only a single database and no global/cluster settings like tablespaces or users.</li>
<li><code>pg_dumpall</code> &#8211; Tool for dumping your entire PostgreSQL install to a file.  Includes users, tablespaces, permissions, everything.  It is the best choice when dumping for the purpose of moving an entire install to a new machine or to a new version of PostgreSQL.</li>
<li><code>pg_restore</code> &#8211; Tool for restoring an archive produced by <code>pg_dump</code> in its custom (<code>-Fc</code>) or tar (<code>-Ft</code>) format.  Plain SQL output, which is all <code>pg_dumpall</code> produces, is restored by feeding it to <code>psql</code> instead.</li>
<li><code>vacuumdb</code> &#8211; Tool for doing a vacuum (and optionally an analyze with the <code>--analyze</code> parameter) from the command line.</li>
</ul>
<h2>Database Configuration</h2>
<p>The tuning guide linked at the top of this post is an awesome walkthrough of the various parameters so there is no point in mentioning anything about that here.  <a href="http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server">Go read that</a>.  </p>
<p>Some settings can be changed and reloaded without restarting your DB while others cannot.  The postgresql.conf file itself notes which ones require a restart.  For the ones that do not require a restart you can make the changes and then use <code>pg_ctl</code> to reload the config like this:</p>
<p><code>/usr/bin/pg_ctl reload -D /path/to/$PGROOT</code></p>
<p>If it does require restart you will need to completely bounce PostgreSQL.  In Arch Linux this is simply <code>/etc/rc.d/postgresql restart</code> and you will do something similar in other distributions.</p>
<p>Also, the combination of your shared_buffers setting and your max_connections setting may require you to make a change to the shmmax setting of your server&#8217;s kernel params.  Basically it is the largest chunk of shared memory an application can grab at once.  PostgreSQL will print out a clear error at startup if this parameter is set to an insufficient level but doesn&#8217;t give any pointer on how to change it. </p>
<p>To change it, do a <code>cat /proc/sys/kernel/shmmax</code> to check the current value.  The value is in bytes.  If it is too small, edit <code>/etc/sysctl.conf</code> and add a line at the bottom containing <code>kernel.shmmax = myNewValue</code>.  Then run <code>sysctl -p</code> and you should be good.  I am no expert on tuning these kernel parameters, but I would say don&#8217;t raise that value above 50% of your memory.  If you need more than that for the PostgreSQL configuration you want, you need to buy more RAM.</p>
<p>A very important piece of a basic PostgreSQL configuration is ensuring you are vacuuming and analyzing your database regularly to clear out dead tuples, keep the planner aware of the latest changes, and keep your queries fast.  <a href="http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html">This page</a> covers the details if you want more technical bits about why this is important.</p>
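<p>Working out the 50% figure from <code>/proc/meminfo</code> is a one-liner.  A minimal sketch (the function name <code>half_mem_bytes</code> is hypothetical) converts <code>MemTotal</code>, which the kernel reports in kB, into bytes and halves it.</p>

```shell
# Hypothetical helper: print 50% of physical memory in bytes, the ceiling
# suggested above for kernel.shmmax.
# Usage: half_mem_bytes            (reads /proc/meminfo)
#        half_mem_bytes FILE       (reads a saved copy instead)
half_mem_bytes() {
    # MemTotal is reported in kB, e.g. "MemTotal:       16438856 kB".
    # %.0f avoids 32-bit integer overflow in some awk implementations.
    awk '/^MemTotal:/ { printf "%.0f\n", $2 * 1024 / 2 }' "${1:-/proc/meminfo}"
}
```

<p>As root, something like <code>echo "kernel.shmmax = $(half_mem_bytes)" &gt;&gt; /etc/sysctl.conf</code> followed by <code>sysctl -p</code> would apply it.  Depending on your distribution&#8217;s defaults, <code>kernel.shmall</code> (measured in pages, not bytes) may need raising as well.</p>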
<p>On top of running autovacuum at an appropriately aggressive level I would recommend setting up a job on your server that runs every night during off-peak times which does a <code>VACUUM ANALYZE</code> on the entire DB.  Your DB can still be used while this is happening though performance will take a bit of a hit.  You can use the command line tool <code>vacuumdb</code> provided with PostgreSQL to easily include this in a shell script.  It&#8217;ll look something like this:</p>
<p><code>/usr/bin/vacuumdb -v --analyze [Database Name]</code></p>
<p>Lastly, you absolutely need to do backups.  Raid is not a backup.  Mirroring your data, even across machines or disk arrays, is also NOT a backup on its own (what happens if you accidentally delete a table and that gets mirrored right on over to the copy?).  Your plan should at minimum include using pg_dump to do a regular backup of everything and store it in at least 2 locations, and at best also use PostgreSQL&#8217;s Point In Time Recovery (PITR) features to keep a warm standby.  Our production setup includes PITR to an almost equivalent server that we can fail over to instantly, as well as a pg_dump that is saved to 2 servers in the production cluster and also uploaded to S3 every night (after breaking it into sub-5GB chunks, the S3 maximum file size, with the split command).  PostgreSQL has <strong>never</strong> failed on me, never shut down because of an error, and never corrupted any data, but you can never be too safe with your production database.  There are occasionally <a href="http://www.techcrunch.com/2006/06/29/couchsurfing-deletes-itself-shuts-down/">very</a> public <a href="http://www.techcrunch.com/2009/01/03/journalspace-drama-all-data-lost-without-backup-company-deadpooled/">stories</a> of companies basically dying when their database fails and their backups are weak or nonexistent.</p>
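<p>The chunking step for S3 can be handled with <code>split</code> and undone with <code>cat</code>.  Below is a minimal sketch, assuming GNU coreutils (<code>split</code>, <code>md5sum</code>); the function names are hypothetical.  It records a checksum of the full dump so a reassembled copy can be verified before you trust it for a restore.</p>

```shell
# Hypothetical helpers for shipping a large dump in pieces (GNU coreutils).
# Usage: split_dump /backups/nightly.sql 4G    (placeholder path)
split_dump() {
    local file=$1 size=$2
    # Writes FILE.part.aa, FILE.part.ab, ... each at most SIZE bytes.
    split -b "$size" "$file" "$file.part." || return 1
    # Checksum of the complete dump, used to verify the reassembled copy.
    md5sum "$file" > "$file.md5"
}

# Usage: join_dump /backups/nightly.sql
join_dump() {
    local file=$1
    # The glob expands in sorted order, which matches split's suffix order.
    cat "$file".part.* > "$file" || return 1
    md5sum -c --quiet "$file.md5"
}
```

<p>For example, <code>split_dump /backups/nightly.sql 4G</code> would produce pieces no larger than 4GB, comfortably under the 5GB limit; the dump path is a placeholder, and the upload itself is left to whatever S3 tool you already use.</p>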
<h2>Conclusion</h2>
<p>PostgreSQL is a sophisticated piece of software.  It has some utterly amazing engineering behind it but also a lot of complexity.  Hopefully this post helps someone who is having trouble finding concrete information get started.  If you are setting up a new PostgreSQL server you want to make good hardware choices, install the very latest version your package manager allows, spend some time tuning the configuration, and set up good maintenance and backup scripts that run at regular intervals.</p>
<p>At some point I will follow up this post with two more &#8211; one on query and index tuning (the awesomely fun part in my opinion) and another on setting up and using PITR.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.gtuhl.com/2009/03/08/postgresql-setup-basics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
