Tuning Postgres Queries

I have an application ported from Oracle to Postgres – tuning queries is quite different between the two. While I know many of the Postgres developers have detailed blogs, I haven’t found many practical articles on the subject, so I’m writing up what I’ve found. I’d appreciate any further comments or resources people may have.

  • When you fix a problem in a query in Oracle, you can often achieve a 10x speed-up. In Postgres, this is often 100x – not because Postgres is better, but because you seem to hit problems sooner.
  • TRUNCATE behaves very differently than in Oracle. In both databases, TRUNCATE empties the table. In Oracle, it is as if you deleted the files on disk, and it operates outside a transaction. In Postgres, it behaves like a DELETE without a WHERE clause, although it does reclaim space immediately, and it works within a transaction. That said, I found that if you run a script with a lot of DELETE/TRUNCATE statements, it can be helpful to explicitly call ANALYZE; otherwise you get maddeningly inconsistent query timings (without these calls, in my testing, the total time for a script of ~50-100 queries was consistent, but the time of each individual query varied dramatically). See the ANALYZE sketch after this list.
  • Beware of common table expressions (WITH) – a lot of people seem to be discovering these lately without understanding the risks. They are a great way to break up code and make it readable, but in Postgres they are also optimization boundaries: when you write a query with a lot of WITH blocks, you see Postgres materialize each block separately. Mathematically speaking, an operation can often be rearranged into a much faster equivalent form (see relational algebra), and the boundary prevents that rearrangement. See the CTE sketch after this list.
  • In some, but not all, cases I’ve had success turning a common table expression into a temp table and adding indexes (it seems that you do not get indexes automatically when Postgres materializes a WITH block). See the temp-table sketch after this list.
  • Like other database software, Postgres can group values into sets either by sorting or by hashing (think GROUP BY or count(distinct X)). In my limited testing, I found Postgres was much more likely to use hashing when there were few columns (e.g., two), even though hashing is generally much faster. I haven’t yet found a way to encourage this for wider tables (if you think this might be an issue, you can test it by concatenating all the columns into one and doing your operation; even with the added work, it is pretty fast). See the concatenation sketch after this list.
  • Postgres seems to be very sensitive to the re-ordering of joins. In Oracle, I never had to think about this. If you can, just remove all your outer joins, or put them at the end. In some applications, you can trivially remove outer joins by adding a “null” row (when using numeric IDs, I like to set the ID of this row to 0 by convention, which makes it easy to find). See the sentinel-row sketch after this list.
  • Pay attention to how often you see sorting – depending on the application, you may want to give Postgres more RAM for sorts (work_mem), if you can. If Postgres has to sort on disk, it will be very slow. See the work_mem sketch after this list.
  • Bitmap indexes are awesome, but good luck figuring out how to make them kick in. Oracle’s bitmap indexes act completely differently than what Postgres does (other than the mashing of bits – there is no “star transform” in Postgres). I had a lot of success tuning full-text queries with these, and when they work they make everything super fast, but I have yet to figure out how to make them happen more often. See the full-text sketch after this list.
  • Hope this all helps – if you have any useful tips, please post in the comments below!
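
A minimal sketch of the ANALYZE habit from the TRUNCATE bullet above; the staging_orders and orders tables are invented purely for illustration:

    -- Hypothetical staging table, emptied and reloaded by a batch script.
    TRUNCATE staging_orders;

    INSERT INTO staging_orders (order_id, customer_id, total)
    SELECT order_id, customer_id, total
    FROM   orders
    WHERE  created_at >= now() - interval '1 day';

    -- Refresh planner statistics explicitly rather than waiting for
    -- autovacuum, so the queries that follow see the new contents.
    ANALYZE staging_orders;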
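
To make the optimization-boundary point concrete, here is a hedged sketch of the CTE issue (the orders table and its columns are invented). In the WITH form the block is materialized in full before the outer filter is applied; in the inline form the planner can push the filter down and use an index on customer_id if one exists:

    -- CTE form: the block is evaluated completely, then filtered.
    WITH all_orders AS (
        SELECT order_id, customer_id, total
        FROM   orders
    )
    SELECT * FROM all_orders WHERE customer_id = 42;

    -- Inline form: the planner applies the filter directly to orders.
    SELECT order_id, customer_id, total
    FROM   orders
    WHERE  customer_id = 42;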
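
The temp-table variant from the list above, again with invented names: materialize the intermediate result yourself, add whatever index the later joins need, and refresh statistics.

    CREATE TEMP TABLE recent_orders AS
    SELECT order_id, customer_id, total
    FROM   orders
    WHERE  created_at >= now() - interval '30 days';

    -- The materialized result has no indexes of its own, so add one,
    -- then update statistics before using it.
    CREATE INDEX ON recent_orders (customer_id);
    ANALYZE recent_orders;

    SELECT c.name, sum(r.total)
    FROM   recent_orders r
    JOIN   customers c ON c.customer_id = r.customer_id
    GROUP  BY c.name;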
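
The concatenation test from the grouping bullet, as a sketch only (wide_table and its columns are made up; this is a way to test the hypothesis, not a recommended production rewrite):

    -- Multi-column grouping: the case that seemed reluctant to hash.
    SELECT col_a, col_b, col_c, col_d, count(*)
    FROM   wide_table
    GROUP  BY col_a, col_b, col_c, col_d;

    -- Single-value test: collapse the keys into one text value. Pick a
    -- separator that cannot appear inside the values, and note that
    -- concat_ws skips NULLs, so groups can differ at the margins.
    SELECT concat_ws('|', col_a, col_b, col_c, col_d) AS grp, count(*)
    FROM   wide_table
    GROUP  BY 1;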
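
The “null” row trick from the join bullet, sketched with invented tables: add a sentinel customer with ID 0, point orphaned rows at it, and the outer join can become an inner join.

    -- One-time setup.
    INSERT INTO customers (customer_id, name) VALUES (0, '(none)');
    UPDATE orders SET customer_id = 0 WHERE customer_id IS NULL;

    -- The query no longer needs a LEFT JOIN.
    SELECT o.order_id, c.name
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id;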
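
A sketch for the sorting bullet: work_mem is the setting that controls how much memory a sort (or hash) step may use before spilling to disk. The 256MB figure is an arbitrary illustration; size it to your own workload and RAM.

    -- Per-session bump before a sort-heavy report query.
    SET work_mem = '256MB';

    -- In the EXPLAIN (ANALYZE) output, "Sort Method: external merge"
    -- means the sort spilled to disk; "quicksort" means it fit in memory.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT order_id, customer_id, total
    FROM   orders
    ORDER  BY total DESC;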
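
Finally, a full-text sketch for the bitmap bullet (the docs table and its columns are invented). A GIN index on a tsvector expression is the usual way to speed up full-text search, and in my experience such indexes show up in plans as a Bitmap Index Scan feeding a Bitmap Heap Scan, which you can confirm with EXPLAIN:

    CREATE INDEX docs_body_fts ON docs
        USING gin (to_tsvector('english', body));

    -- The WHERE clause must match the indexed expression for the
    -- index to be considered.
    EXPLAIN
    SELECT doc_id
    FROM   docs
    WHERE  to_tsvector('english', body) @@ to_tsquery('english', 'postgres & tuning');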

4 comments

#1 Andreas on 02.03.14 at 7:00 pm

What do you mean by join ordering? PostgreSQL does not care about the order the joins are written in unless you set a low join_collapse_limit.

#2 Magnus Hagander on 02.04.14 at 9:02 am

You’ve misunderstood a few things about how PostgreSQL works here :) (though the majority is correct)

1. TRUNCATE in PostgreSQL does *not* behave like “DELETE FROM”. It behaves like Oracle in that it unlinks the files, and in that it is a DDL operation and not a DML. So for example your DELETE triggers will *not* fire (truncate triggers will of course). But unlike Oracle, you can still safely use it inside a transaction, and properly roll it back, just like with all DDL.

Truncate also does reset the statistics immediately and should not require a direct ANALYZE (reloading data quickly into the table might, and a DELETE would as well – until autovacuum kicks in the statistics would be out of date)

2. If you find PostgreSQL sensitive about reordering joins, you probably want to look at tuning join_collapse_limit and/or from_collapse_limit.

3. PostgreSQL doesn’t *have* bitmap indexes. At all. PostgreSQL has a Bitmap Scan which is something completely different, as it uses the normal indexes. So you’re really comparing apples to oranges here. Bitmap Scans are used in a few different scenarios, one being an index that requires a rescan (a lossy index – such as your case of full text indexing most likely), and the other one for indexes with low cardinality.

#3 Gary on 02.06.14 at 3:17 am

Thanks for the necessary clarifications. If anyone reads these later, what I’ve written is entirely based on experimental results, so it’s possible we’re mis-attributing something. Hopefully I’ll come back and have revisions later.

1. The truncate section has been most controversial – I attempted to summarize the documentation, which was perhaps a mistake :)

2. Thanks!

3. Fair point. The first time I saw Bitmap Scans I thought they might be related, and I will clarify what I wrote.

#4 Gary on 02.06.14 at 3:18 am

I’ll have to take a second look – this is basically all based on experimentation – a colleague had felt this was the case, but it’s certainly possible we mis-diagnosed what was happening.
