<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: PostgreSQL Tips and Tricks</title>
	<atom:link href="http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/</link>
	<description>Development, IT, Gadgets, and Startups</description>
	<lastBuildDate>Mon, 15 Aug 2011 08:57:36 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Closer To The Ideal &#187; Blog Archive &#187; JOINs are evil, part II</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/comment-page-1/#comment-4026</link>
		<dc:creator>Closer To The Ideal &#187; Blog Archive &#187; JOINs are evil, part II</dc:creator>
		<pubDate>Fri, 11 Dec 2009 20:53:35 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420#comment-4026</guid>
		<description>[...] everywhere. Ten years ago, when I was learning about databases, I never heard anyone suggest that JOINs are evil:  #4: Avoid [...]</description>
		<content:encoded><![CDATA[<p>[...] everywhere. Ten years ago, when I was learning about databases, I never heard anyone suggest that JOINs are evil:  #4: Avoid [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/comment-page-1/#comment-4012</link>
		<dc:creator>Joe</dc:creator>
		<pubDate>Sat, 07 Nov 2009 16:18:11 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420#comment-4012</guid>
		<description>Hey Florian, thanks for commenting.  I&#039;ve got a lot of experience with PostgreSQL, a decent amount with MySQL, and a much smaller amount with Oracle and MSSQL.  I believe both points are still valid across all of those systems.

On #3 (and #4) it&#039;s all about moderation and making informed decisions based on your specific load.  My initial wording was pretty slanted; I wrote it in the context, and with the bias, of some work I was dealing with in our DB at the time.  I hedged it a long while back.  De-normalization can get you massive performance wins in a very large database by avoiding joins or reducing the number of SELECT queries.  The maintenance side of de-normalization can indeed be painful, so you have to carefully consider each usage.  Sometimes it doesn&#039;t matter (very static data), sometimes you can have the DB handle it automatically with triggers or similar, and occasionally handling it in code isn&#039;t too bad.

I finally adjusted the wording on #4 just now to prevent further confusion.  My point is that joins don&#039;t work well when your tables are HUGE.  We run OLTP loads (big social networks specifically) with hundreds of thousands of users and, at peak times, thousands of transactions a second on a single decent Dell server.  We have lots of tables and partitions in the 30 million+ row range.  Databases handle joins in memory, and unless you have the RAM to set work_mem (or the non-Postgres equivalent) very high, joins between very large tables get done on disk.  Once that happens they slow to a crawl.  Joins are wicked fast when they fit in memory and, as a general rule of thumb, are the way to go (the PostgreSQL planner will even convert some subqueries into joins under the hood when it makes sense) unless you are working with massive tables and result sets.

So I think both points are valid, but my initial language was far too broad.  They make sense when you have a lot of data and are still using an RDBMS to handle it.  Partitioning can delay the need for those techniques by keeping table sizes smaller, but I find it hard to believe that you could avoid both in a very large system.

Thanks again for commenting, and perhaps point those fellas in that sqlservercentral forum to this response so they have some context before assuming I don&#039;t know what I am talking about :)</description>
		<content:encoded><![CDATA[<p>Hey Florian, thanks for commenting.  I&#8217;ve got a lot of experience with PostgreSQL, a decent amount with MySQL, and a much smaller amount with Oracle and MSSQL.  I believe both points are still valid across all of those systems.</p>
<p>On #3 (and #4) it&#8217;s all about moderation and making informed decisions based on your specific load.  My initial wording was pretty slanted; I wrote it in the context, and with the bias, of some work I was dealing with in our DB at the time.  I hedged it a long while back.  De-normalization can get you massive performance wins in a very large database by avoiding joins or reducing the number of SELECT queries.  The maintenance side of de-normalization can indeed be painful, so you have to carefully consider each usage.  Sometimes it doesn&#8217;t matter (very static data), sometimes you can have the DB handle it automatically with triggers or similar, and occasionally handling it in code isn&#8217;t too bad.</p>
<p>I finally adjusted the wording on #4 just now to prevent further confusion.  My point is that joins don&#8217;t work well when your tables are HUGE.  We run OLTP loads (big social networks specifically) with hundreds of thousands of users and, at peak times, thousands of transactions a second on a single decent Dell server.  We have lots of tables and partitions in the 30 million+ row range.  Databases handle joins in memory, and unless you have the RAM to set work_mem (or the non-Postgres equivalent) very high, joins between very large tables get done on disk.  Once that happens they slow to a crawl.  Joins are wicked fast when they fit in memory and, as a general rule of thumb, are the way to go (the PostgreSQL planner will even convert some subqueries into joins under the hood when it makes sense) unless you are working with massive tables and result sets.</p>
<p>So I think both points are valid, but my initial language was far too broad.  They make sense when you have a lot of data and are still using an RDBMS to handle it.  Partitioning can delay the need for those techniques by keeping table sizes smaller, but I find it hard to believe that you could avoid both in a very large system.</p>
<p>Thanks again for commenting, and perhaps point those fellas in that sqlservercentral forum to this response so they have some context before assuming I don&#8217;t know what I am talking about <img src='http://blog.gtuhl.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Florian Reischl</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/comment-page-1/#comment-4011</link>
		<dc:creator>Florian Reischl</dc:creator>
		<pubDate>Sat, 07 Nov 2009 15:48:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420#comment-4011</guid>
		<description>Hi

I&#039;m no PostgreSQL guy, which means I don&#039;t know whether all those tips hold true for PostgreSQL. Nevertheless, I know other database systems and most tips sound good and logical to me.

Anyway, are #3 and #4 really true for PostgreSQL? If so, for which kind of database? De-normalization is a technique for designing selective reporting and warehousing databases, yes. De-normalization is a no-go for an OLTP database.

These &quot;tips&quot; are fine for a warehouse database that is filled by a small number of ETL jobs, or for very simple systems with only one front-end application. My experience has shown me that data redundancy in a large operational database always(!) ends up in data inconsistency.  Keeping a de-normalized database in sync within an enterprise system is way too complex and will (not might) fail.

Again, all the other tips sound fine to me and even apply generally to all database systems.

Greets
Flo</description>
		<content:encoded><![CDATA[<p>Hi</p>
<p>I&#8217;m no PostgreSQL guy, which means I don&#8217;t know whether all those tips hold true for PostgreSQL. Nevertheless, I know other database systems and most tips sound good and logical to me.</p>
<p>Anyway, are #3 and #4 really true for PostgreSQL? If so, for which kind of database? De-normalization is a technique for designing selective reporting and warehousing databases, yes. De-normalization is a no-go for an OLTP database.</p>
<p>These &#8220;tips&#8221; are fine for a warehouse database that is filled by a small number of ETL jobs, or for very simple systems with only one front-end application. My experience has shown me that data redundancy in a large operational database always(!) ends up in data inconsistency.  Keeping a de-normalized database in sync within an enterprise system is way too complex and will (not might) fail.</p>
<p>Again, all the other tips sound fine to me and even apply generally to all database systems.</p>
<p>Greets<br />
Flo</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/comment-page-1/#comment-3943</link>
		<dc:creator>Joe</dc:creator>
		<pubDate>Wed, 09 Sep 2009 14:43:56 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420#comment-3943</guid>
		<description>Thanks for the comments; it seems we are on the same page with respect to normalization, and I like your explanation of it.  That edit was inspired by the comments on Hacker News, where this post picked up some steam.  Together, the normalization and join advice above have drawn the most disagreement, and that was expected.  The join advice could use some moderation too, as many read it as &quot;never do joins&quot; when what I more precisely meant to communicate was something like &quot;don&#039;t do joins between tables with lots of rows&quot;, where lots of rows means many millions on the low end.</description>
		<content:encoded><![CDATA[<p>Thanks for the comments; it seems we are on the same page with respect to normalization, and I like your explanation of it.  That edit was inspired by the comments on Hacker News, where this post picked up some steam.  Together, the normalization and join advice above have drawn the most disagreement, and that was expected.  The join advice could use some moderation too, as many read it as &#8220;never do joins&#8221; when what I more precisely meant to communicate was something like &#8220;don&#8217;t do joins between tables with lots of rows&#8221;, where lots of rows means many millions on the low end.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joshua Russo</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/comment-page-1/#comment-3941</link>
		<dc:creator>Joshua Russo</dc:creator>
		<pubDate>Wed, 09 Sep 2009 14:22:30 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420#comment-3941</guid>
		<description>I strongly agree with the comment that inspired the edit on denormalizing (though I&#039;m not sure how to see it; I only seem to see the last two comments). Anyway ... I am also a strong proponent of duplicating data to speed up query results.

The key concept to apply is a primary data record, with all copies treated as derivations of it. As long as you can rerun the duplication logic on the primary data record at any time to produce the copies and get what you expect, you are in a safe haven. You never update the copies except through the duplication logic.</description>
		<content:encoded><![CDATA[<p>I strongly agree with the comment that inspired the edit on denormalizing (though I&#8217;m not sure how to see it; I only seem to see the last two comments). Anyway &#8230; I am also a strong proponent of duplicating data to speed up query results.</p>
<p>The key concept to apply is a primary data record, with all copies treated as derivations of it. As long as you can rerun the duplication logic on the primary data record at any time to produce the copies and get what you expect, you are in a safe haven. You never update the copies except through the duplication logic.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Benjamin A. Shelton &#124; Blog &#187; Blog Archive &#187; Links: August 12th</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/comment-page-1/#comment-3745</link>
		<dc:creator>Benjamin A. Shelton &#124; Blog &#187; Blog Archive &#187; Links: August 12th</dc:creator>
		<pubDate>Wed, 12 Aug 2009 07:16:01 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420#comment-3745</guid>
		<description>[...] Uhl has a fascinating article on improving PostgreSQL performance (although the tips can be adapted to other [...]</description>
		<content:encoded><![CDATA[<p>[...] Uhl has a fascinating article on improving PostgreSQL performance (although the tips can be adapted to other [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Twitted by ryszard99</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/comment-page-1/#comment-3743</link>
		<dc:creator>Twitted by ryszard99</dc:creator>
		<pubDate>Tue, 11 Aug 2009 06:19:49 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420#comment-3743</guid>
		<description>[...] This post was Twitted by ryszard99 [...]</description>
		<content:encoded><![CDATA[<p>[...] This post was Twitted by ryszard99 [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: PostgreSQL Tips and Tricks &#124; giswhat.be</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/comment-page-1/#comment-3742</link>
		<dc:creator>PostgreSQL Tips and Tricks &#124; giswhat.be</dc:creator>
		<pubDate>Mon, 10 Aug 2009 09:41:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420#comment-3742</guid>
		<description>[...] I wrote an article about writing PostGIS Spatial Queries (here). Today I found the following post: PostgreSQL Tips &amp; Tricks. There are a couple of tips for working with a PostgreSQL database. Here&#8217;s a [...]</description>
		<content:encoded><![CDATA[<p>[...] I wrote an article about writing PostGIS Spatial Queries (here). Today I found the following post: PostgreSQL Tips &amp; Tricks. There are a couple of tips for working with a PostgreSQL database. Here&#8217;s a [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/comment-page-1/#comment-3740</link>
		<dc:creator>Joe</dc:creator>
		<pubDate>Sun, 09 Aug 2009 20:14:23 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420#comment-3740</guid>
		<description>If your database is extremely, massively write-heavy, then sure, the balance may tip toward too many indexes being worse.  But that is rarely the case.  Indexes must indeed be updated every time data changes (insert, update, or delete), so you absolutely pay a penalty on every write.  Note though that indexes are NOT a copy of the data - they do not contain data.  They are pointers, and it&#039;s just the structure of the index that must be updated.

I am willing to bet that in 99% of cases throwing down as many indexes as you need is the right way to go.  I am not saying to create indexes for no reason - just create one for every common SELECT.  If you have a use case or project that falls into that 1% of super-heavy writes, I would love to hear about it (honestly, I love hearing about unique setups and problems).

As an example, our OLTP database handles a lot of writes (millions a day), but in the overall picture writes are less than 5% of our queries, so it simply doesn&#039;t make sense to punish the other 95%.  If we have to do a lot of writing for a big migration or to bring a new customer&#039;s lists online, we drop the indexes on the relevant partitions and then recreate them when done.  Single-record inserts issued by your application often won&#039;t feel any slower, because the application code is going to account for most of the perceived time anyway.</description>
		<content:encoded><![CDATA[<p>If your database is extremely, massively write-heavy, then sure, the balance may tip toward too many indexes being worse.  But that is rarely the case.  Indexes must indeed be updated every time data changes (insert, update, or delete), so you absolutely pay a penalty on every write.  Note though that indexes are NOT a copy of the data &#8211; they do not contain data.  They are pointers, and it&#8217;s just the structure of the index that must be updated.</p>
<p>I am willing to bet that in 99% of cases throwing down as many indexes as you need is the right way to go.  I am not saying to create indexes for no reason &#8211; just create one for every common SELECT.  If you have a use case or project that falls into that 1% of super-heavy writes, I would love to hear about it (honestly, I love hearing about unique setups and problems).</p>
<p>As an example, our OLTP database handles a lot of writes (millions a day), but in the overall picture writes are less than 5% of our queries, so it simply doesn&#8217;t make sense to punish the other 95%.  If we have to do a lot of writing for a big migration or to bring a new customer&#8217;s lists online, we drop the indexes on the relevant partitions and then recreate them when done.  Single-record inserts issued by your application often won&#8217;t feel any slower, because the application code is going to account for most of the perceived time anyway.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: eggyknap</title>
		<link>http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/comment-page-1/#comment-3739</link>
		<dc:creator>eggyknap</dc:creator>
		<pubDate>Sun, 09 Aug 2009 19:37:46 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gtuhl.com/?p=420#comment-3739</guid>
		<description>&quot;The only time indexes can hurt is when doing bulk inserts. I did a post about dropping/recreating to temporarily boost bulk insert performance awhile back.

Most of the time it doesn’t matter though. Throw down as many indexes as you need.&quot;

I&#039;ve seen plenty of times when having too many indexes mattered quite significantly. An index is effectively another copy of the data, and must be updated whenever the underlying data are updated. If having more indexes didn&#039;t actually matter, PostgreSQL features like Heap-Only Tuples wouldn&#039;t have any significant performance impact, which clearly isn&#039;t the case.</description>
		<content:encoded><![CDATA[<p>&#8220;The only time indexes can hurt is when doing bulk inserts. I did a post about dropping/recreating to temporarily boost bulk insert performance awhile back.</p>
<p>Most of the time it doesn’t matter though. Throw down as many indexes as you need.&#8221;</p>
<p>I&#8217;ve seen plenty of times when having too many indexes mattered quite significantly. An index is effectively another copy of the data, and must be updated whenever the underlying data are updated. If having more indexes didn&#8217;t actually matter, PostgreSQL features like Heap-Only Tuples wouldn&#8217;t have any significant performance impact, which clearly isn&#8217;t the case.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

