Monitoring PostgreSQL TPS with Zabbix
I have become a big fan of Zabbix. It is a full-featured monitoring and notification system for keeping an eye on servers, gear, websites, whatever. The interface can be completely confusing but it is incredibly powerful and I have found I prefer it over the other systems I have tried (Nagios, Zenoss, Cacti). I hope to do an in depth post on Zabbix eventually. I have it doing a ton of stuff including tracking CPU utilization and disk IO (awesome IO monitoring thanks to this page) on all servers, detecting outtages and automatically fixing them, monitoring and charting the response time of our websites, all along with notifications via email and text if things ever get out of whack.
This post assumes you already know about Zabbix, already have Zabbix server setup, already have a Zabbix agent on your DB server, and want to monitor the Transactions Per Second of a PostgreSQL database. If you are looking for more general help the Zabbix forums and manual are pretty decent.
On your PostgreSQL server fire up psql. Now run
select * from pg_stat_database. Check out that output. Among other things there is a “xact_commit” column that contains a count of all transactions committed for each database. That is the number we will be using. If you are running PostgreSQL 8.3 or later you will also have columns for tuples_fetched, tuples_inserted, etc which could be pretty interesting to also monitor and chart.
Now that you see what is going to be used for the raw data, edit your
zabbix-agentd.conf file. Mine is at
/etc/zabbix/zabbix_agentd.conf. Go down to the “USER-DEFINED MONITORED PARAMETERS” section at the bottom.
Add a new UserParameter that looks like this:
UserParameter=custom.postgresql.tps,psql [dbname] -A -t -U [dbuser] -c "select sum(xact_commit) from pg_stat_database"
The format on the right is
item key name, command. You can change the item name if you want, and will want to substitute in a [dbname] and [dbuser] that works for your DB.
Now restart your Zabbix agent. On Arch Linux this would be
Now for the Zabbix server side login to your Zabbix administrative web interface. Go to the “Configuration -> Items” section, select your DB server for the host, and hit “Create Item.” Here is a screenshot of my settings here.
Head over to the “Configuration -> Graphs” section now, select your DB server for the host, and hit “Create Graph.” Dimensions and name can be whatever. Add the item you created above to the graph and save it.
That should be it. You can now throw that graph on a screen for monitoring, setup a trigger based on the item (perhaps get a notification if TPS drops very low or gets very high), etc. As suggested earlier it would probably also be very interesting to monitor and plot those “tup_X” columns in pg_stat_database as well to get a constant idea of what your DB is doing most of.
I won’t give our specific numbers, but will say I was mighty impressed with our TPS especially considering our DB server isn’t working too hard on the CPU/disk end of things. Suspect there is a point where the percentage of stuff in RAM will decrease as the DB size increases and then the disk will start working more. But, thanks to Zabbix I will be completely aware of if/when that happens because I have a screen hanging on my wall with all the relevant details as the server runs.