Not a Fan of MySQL
It has been a wonderfully busy 2010 thus far. The blog has suffered but hoping to have a post up soon about building a monstrous build server on a budget.
That said, I have been forced (by software, not by management) to use MySQL on a couple projects in the last month and after being a PostgreSQL user for the last several years it has been an incredibly frustrating experience worth throwing up a couple bullet points about. The more I use MySQL the more frustrated I get. These are high level and not well argued but hoping to get points across. I always invite digging into details.
Reasons why I dislike MySQL:
- The planner is incredibly dumb. I feel like it does the wrong thing most of the time.
- Temporary tables and subselects are relatively worthless. They crush performance and are full of bizarre gotchas due to limitations and bugs in the planner (like not being able to use a temp table more than once in the same query).
- The tuning process is not at all intuitive or consistent. Tuning MySQL queries feels like trying to maximally combine hacks and workarounds for bugs in the planner to achieve something vaguely close to desired speed.
- The planner goes to disk a LOT. Subtle adjustments to the query will prevent it from doing so but why can’t it do the right thing and avoid disk except as a last resort or only when something doesn’t fit in memory? My laptop has 4GB and even with a 100MB DB the MySQL planner goes to disk all the time.
- The documentation is superficial, incomplete, and inconsistent. The examples are trivial and unhelpful.
- Doing a DB dump locks the ENTIRE DB. This is awful. I don’t want to setup replication on a trivial DB (less than say 100MB) just to do backups without locking things up.
PostgreSQL Comparison
This experience has really emphasized some huge advantages of PostgreSQL even ignoring the technical points:
- The code is of exceptionally high quality and exceedingly clean. Corner cases are rare and bugs even rarer.
- The documentation is complete, organized, and consistent. You can genuinely learn 95% of what there is to know about PostgreSQL by reading the excellent documentation (that is kept updated and synchronized with each new release). It is also very easy to find. It feels like they spend as much time and effort on the documentation as they do the code. Only OpenMQ documentation has rivaled PostgreSQL in completeness in recent memory.
- The PostgreSQL planner is very solid and it makes the right decisions most of the time without any assistance from a human. You can use the documentation to build a solid foundation of understanding about how things work and then use that plus your own intuition to achieve desired results. If a feature is available in PostgreSQL it will be fast and fully understood by the planner. The same cannot be said for MySQL where raw cycles are required to slowly absorb all the known bugs, workarounds, and feature-specific knowledge about what is actually usable on a non trivial DB and what is not.
Just some thoughts. Perhaps my perception is incorrect given the disparity in usage time. To be blunt I would smile if Oracle destroyed MySQL and on a semi-related note believe the shenanigans of the founder/creator of MySQL around the Oracle acquisition and trying to get back something that was fairly purchased are completely lame.
Trackbacks and Pingbacks
Comments are closed.
amen brother!
— merlin
Every DB is what it is. MySQL, using myIsam (which I assume you are, because mysql does support mvcc with innoDB) has significant limitations by design, as a trade off for other things. It’s meant to be the strong fast retarded guy, who can only execute simple commands. If you’re writing queries that the planner gets wrong, you probably need to think again about what DB you’re using.
You’re absolutely correct about temp tables and subselects — an experienced DBA will not use them on a myIsam DB, because it’s not smart. Complaining about myIsam limitations that are well documented ahead of time? I don’t see the point.
A DBA should be system agnostic once work has begun. When architecture begins, it’s great to debate reads/writes/replications etc and mostly HOW the database will be used.
If you’re using Oracle, MySQL or Postgres in a system where Cassandra (or one of the other strong nosql product) could be used, because you’re “familiar” with some other system, you’ve already lost the architecture battle, because you don’t understand your data.
On the other hand, if you pick MySQL for something where stored procedures or fireign keys or triggers are a real win, you have lost again.
The only place you REALLY lose though is to not accept the system. MySQL is quite capable of storing and retrieving data, even with myIsam. It’s got some strengths. Use them.
In terms of backup up myIsam without locking the whole thing, I threw up a script:
http://www.fishdan.com/2010/05/backing-up-mysql-db-incrementally-using.html
Should get you most of the way there…
Thanks for the comment, all solid statements.
I believe complaining about the weaknesses of the MySQL planner is valid given the abundance and seemingly arbitrary nature of them. You would have to read the entire pile of documentation to catch them all and could at no point run on intuition given a basic understanding of the core components. I would feel better about it if they completely removed every feature that the planner does not handle beautifully rather than tack on features for completeness that in reality are not usable. As it is users are setup for failure as their data grows.
I think Drizzle is the best thing happening in the MySQL space as they are doing exactly that.
My primary point is that the space of problems for which MySQL is the best candidate is very small now that it has sprawled and PostgreSQL continues to distance itself with piles of speed on top of its superior foundation. If I must use MySQL due to a software requirement then I do but otherwise if I am working with the sort of data and load that lends itself well to a rdbms I go with PostgreSQL.
I love using new tech. I only use things I am familiar with if I believe them to be the best option for the specific problem I am dealing with.
A long while ago (2003?) I upgraded from mysql 4. something to 4.something + 10. NULLS which used to sort first, now came last. A few versions later it switched back.
I became a supporter of postgresql mantra of “making it correct” rather than of mysql “making it fast”.
Thanks for the comment Scott. That is my main reason for leaning so heavily towards PostgreSQL. They make sound decisions, implement them very well, and then stick with it. Following the main mailing lists (I am a constant follower of the performance list, occasional follower of the others) only reinforces the feeling as you get to witness the core committers thinking very hard about each change and adding them the right way. It also helps to see how top tier those guys are at writing code.
Now that they have the core so solid the speed is approaching incredibly impressive levels as well.