“Postgres for Developers” – Notes from PGConf NYC 2014

I saw a talk by one of the core Postgres developers that showed a bunch of tricks for handling business rules in Postgres-specific SQL. Everything here is in the documentation, but the features are interesting enough that I wanted to write up some examples. A lot of them turn out to be useful for building systems with immutable data (especially auditing, and sometimes reporting systems).

Example 1: Array Aggregation

“array_agg” combines the values from each group into a single array, which sort of resembles a pivot table operation. The array holds the same set of values that would be passed to any other aggregate function (max, count, etc.):

SELECT y, array_agg(x) FROM (
  SELECT 1 x, 2 y
  UNION ALL
  SELECT 2 x, 2 y
  UNION ALL 
  SELECT 3 x, 3 y
) a
GROUP BY y
 
2;"{1,2}"
3;"{3}"
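One caveat with the query above: without an ORDER BY, Postgres doesn't promise the order of the elements inside each array (or of the output rows), so {1,2} could just as well come back as {2,1}. array_agg accepts its own ORDER BY, and unnest goes the other direction, turning an array back into rows. A sketch using the same inline data:

```sql
-- Same inline data as above; ORDER BY inside array_agg fixes element order,
-- and the outer ORDER BY fixes row order.
SELECT y, array_agg(x ORDER BY x DESC) AS xs
FROM (
  SELECT 1 x, 2 y
  UNION ALL
  SELECT 2 x, 2 y
  UNION ALL
  SELECT 3 x, 3 y
) a
GROUP BY y
ORDER BY y;
-- 2;"{2,1}"
-- 3;"{3}"

-- unnest is the inverse: one output row per array element.
SELECT unnest(ARRAY[1, 2]) AS x;
-- 1
-- 2
```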

If you use the above query as a common table expression, you can also rename its columns in the WITH clause. You can even join on array elements:

WITH t(a, b) AS
(
  SELECT y, array_agg(x) FROM (
    SELECT 1 x, 2 y
    UNION ALL
    SELECT 2 x, 2 y
    UNION ALL 
    SELECT 3 x, 3 y
  ) a
  GROUP BY y
)
SELECT * 
FROM t t1 JOIN t t2 ON t1.b[2] = t2.a
 
2;"{1,2}";2;"{1,2}"
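Indexing with t1.b[2] ties the join to a fixed position. Postgres arrays are 1-based, and an out-of-range subscript yields NULL rather than an error, which is why the y = 3 row (whose array is just {3}) finds no match. To join on whether a value appears anywhere in the array, = ANY(...) works. A sketch against the same CTE, with the aggregation ordered so the result is deterministic:

```sql
WITH t(a, b) AS
(
  SELECT y, array_agg(x ORDER BY x) FROM (
    SELECT 1 x, 2 y
    UNION ALL
    SELECT 2 x, 2 y
    UNION ALL
    SELECT 3 x, 3 y
  ) a
  GROUP BY y
)
-- Match each t2 row whose key a appears anywhere in t1's array b.
SELECT t1.a, t1.b, t2.a
FROM t t1 JOIN t t2 ON t2.a = ANY (t1.b)
ORDER BY t1.a, t2.a;
-- 2;"{1,2}";2
-- 3;"{3}";3
```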

Example 2: Named Windows

I’m not sure yet whether this is just syntactic sugar or has real value, but you can set up named “windows.”

By way of explanation: once you start using aggregate functions (min, max, array_agg, etc.), you often end up using window functions too, which look like this:

SELECT a, MAX(b) OVER (PARTITION BY a)
FROM (
  SELECT 1 a, 1 b
  UNION ALL 
  SELECT 2 a, 1 b
  UNION ALL 
  SELECT 1 a, 2 b
) t1
 
1;2
1;2
2;1

These let you calculate aggregate functions (like min/max) without combining all the rows: every input row stays in the output, with the aggregate computed over that row's partition.
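To make the difference concrete, here is a sketch that puts a plain GROUP BY next to the windowed version over the same three rows. GROUP BY returns one row per a, while the window version keeps all three rows, so b is still available alongside the partition's max (and, for illustration, its row count):

```sql
-- GROUP BY collapses each group to a single row.
SELECT a, MAX(b)
FROM (
  SELECT 1 a, 1 b
  UNION ALL
  SELECT 2 a, 1 b
  UNION ALL
  SELECT 1 a, 2 b
) t1
GROUP BY a
ORDER BY a;
-- 1;2
-- 2;1

-- The window version keeps every input row and attaches per-partition values.
SELECT a, b,
       MAX(b)   OVER (PARTITION BY a),
       COUNT(*) OVER (PARTITION BY a)
FROM (
  SELECT 1 a, 1 b
  UNION ALL
  SELECT 2 a, 1 b
  UNION ALL
  SELECT 1 a, 2 b
) t1
ORDER BY a, b;
-- 1;1;2;2
-- 1;2;2;2
-- 2;1;1;1
```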

For instance, if you sort the rows within each partition, you can find the “next” or “previous” row, which is pretty standard SQL stuff. Here lag returns the previous row's value of b:

SELECT a, lag(b) OVER (PARTITION BY a ORDER BY b)
FROM (
  SELECT 1 a, 1 b
  UNION ALL 
  SELECT 2 a, 1 b
  UNION ALL 
  SELECT 1 a, 2 b
) t1
 
1;
1;1
2;
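lag only looks backwards; lead is its counterpart for the “next” row. Both accept an optional offset and a default value to return instead of NULL when there is no such row. A sketch on the same three rows (the column aliases are only for readability):

```sql
SELECT a, b,
       lag(b)       OVER (PARTITION BY a ORDER BY b) AS prev_b,
       lead(b)      OVER (PARTITION BY a ORDER BY b) AS next_b,
       lag(b, 1, 0) OVER (PARTITION BY a ORDER BY b) AS prev_b_or_0  -- 0 instead of NULL
FROM (
  SELECT 1 a, 1 b
  UNION ALL
  SELECT 2 a, 1 b
  UNION ALL
  SELECT 1 a, 2 b
) t1
ORDER BY a, b;
-- 1;1;;2;0
-- 1;2;1;;1
-- 2;1;;;0
```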

What’s cool is that you can move the window definition (the PARTITION BY ... ORDER BY part) out of the OVER clause and into a named window at the end of the query, which would presumably be really nice if you had a lot of them, or wanted to reuse the same window for multiple columns:

SELECT a, lag(b) OVER w
FROM (
  SELECT 1 a, 1 b
  UNION ALL 
  SELECT 2 a, 1 b
  UNION ALL 
  SELECT 1 a, 2 b
) t1
WINDOW w AS (PARTITION BY a ORDER BY b)
 
1;
1;1
2;
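The reuse case might look something like this sketch: one named window feeds several functions, and a second window (w_sorted, an illustrative name) is defined by copying w and adding an ORDER BY. Postgres allows that as long as the window being copied has no ORDER BY of its own. Keeping MAX on the unordered window matters: with an ORDER BY, the default frame ends at the current row, so MAX would turn into a running max.

```sql
SELECT a, b,
       MAX(b)       OVER w,         -- w has no ORDER BY, so MAX sees the whole partition
       row_number() OVER w_sorted,
       lag(b)       OVER w_sorted,
       lead(b)      OVER w_sorted
FROM (
  SELECT 1 a, 1 b
  UNION ALL
  SELECT 2 a, 1 b
  UNION ALL
  SELECT 1 a, 2 b
) t1
WINDOW w        AS (PARTITION BY a),
       w_sorted AS (w ORDER BY b)   -- copies PARTITION BY a, adds ORDER BY b
ORDER BY a, b;
-- 1;1;2;1;;2
-- 1;2;2;2;1;
-- 2;1;1;1;;
```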

Example 3: Ranges
Postgres has a really cool feature, as of 9.2, where you can query whether something is in a range (ranges are their own data types, kind of like the arrays above). This example is a bit contrived, but it shows that you can combine array_agg and range creation:

WITH _data AS (
  SELECT 1 a, 1 b
  UNION ALL 
  SELECT 2 a, 1 b
  UNION ALL 
  SELECT 1 a, 2 b
  UNION ALL
  SELECT 2 a, 2 b
),
_history AS (
  SELECT a, array_agg(b) _start, array_agg(b) _end
  FROM _data
  GROUP BY a
)
SELECT a, 
       _start[1], 
       _end[1], 
       int4range(_start[1]::INTEGER, _end[2]::INTEGER, '(]'::text) 
FROM _history
 
1;1;1;"[2,3)"
2;1;1;"[2,3)"

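There are a bunch of range types built in: int4range, int8range, numrange, tsrange, tstzrange and daterange. You can specify whether each endpoint is inclusive or exclusive; that's what the '(]' argument above does (exclusive lower bound, inclusive upper bound). Discrete types like int4range are normalized to the '[)' form, which is why (1,2] comes back as [2,3).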
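A few one-liners for poking at bounds and containment; the comment after each shows what Postgres prints for it:

```sql
SELECT int4range(1, 2, '(]');          -- [2,3)  discrete type: normalized to '[)'
SELECT numrange(1, 2, '(]');           -- (1,2]  continuous type: kept as written
SELECT int4range(1, 5) @> 5;           -- f      default bounds are '[)', so 5 is excluded
SELECT int4range(1, 5, '[]') @> 5;     -- t
SELECT lower(int4range(1, 2, '(]')),
       upper(int4range(1, 2, '(]'));   -- 2;3
```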

You can check whether a range contains a value with the @> operator, and whether two ranges overlap with &&. This is great for exploring audit records: if each row carries a range like “[valid_from, valid_to)”, you can query for the rows that were in effect at a particular date/time, for instance.
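Here's a sketch of that audit-style query. The price_history table, its columns and the dates are hypothetical; each row stores its validity period as a single tstzrange with '[)' bounds, and an unbounded (NULL) upper end marks the current version.

```sql
-- Hypothetical audit table: each version of a price is valid for [from, to).
CREATE TEMP TABLE price_history (
  item_id int,
  price   numeric,
  valid   tstzrange   -- unbounded upper end = still current
);

INSERT INTO price_history VALUES
  (1, 10.00, tstzrange('2014-01-01', '2014-03-01', '[)')),
  (1, 12.50, tstzrange('2014-03-01', NULL,         '[)'));

-- Which price was in effect at a given moment? (@> = contains)
SELECT item_id, price
FROM price_history
WHERE valid @> '2014-02-15'::timestamptz;
-- 1;10.00

-- Which versions overlap a reporting period? (&& = overlaps)
SELECT item_id, price
FROM price_history
WHERE valid && tstzrange('2014-02-01', '2014-04-01')
ORDER BY lower(valid);
-- 1;10.00
-- 1;12.50
```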

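If you’re working with ranges, also check out the btree_gist extension. It adds GiST operator classes for ordinary scalar types, so a single GiST index or exclusion constraint can cover both a plain column and a range column.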
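A sketch of the classic pairing: an exclusion constraint that refuses overlapping validity periods for the same item. The item_id WITH = part is what needs btree_gist (core Postgres has no GiST operator class for plain integers), so this assumes the contrib module is installed and that you're allowed to run CREATE EXTENSION. The price_versions table is hypothetical.

```sql
-- Needed for the "item_id WITH =" part of the constraint below.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TEMP TABLE price_versions (
  item_id int,
  price   numeric,
  valid   tstzrange,
  -- Reject two rows for the same item whose ranges overlap (&&).
  EXCLUDE USING gist (item_id WITH =, valid WITH &&)
);

-- Accepted: with '[)' bounds these two ranges only touch, they don't overlap.
INSERT INTO price_versions VALUES
  (1, 10.00, tstzrange('2014-01-01', '2014-03-01')),
  (1, 12.50, tstzrange('2014-03-01', NULL));

-- An overlapping row for item 1 is rejected (SQLSTATE 23P01, exclusion_violation).
DO $$
BEGIN
  INSERT INTO price_versions
  VALUES (1, 11.00, tstzrange('2014-02-01', '2014-02-15'));
  RAISE NOTICE 'overlapping row was accepted (unexpected)';
EXCEPTION WHEN exclusion_violation THEN
  RAISE NOTICE 'overlapping row rejected';
END $$;
```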

Example 4: DISTINCT ON
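Postgres has a feature, DISTINCT ON, that keeps only the first row of each group of rows sharing the same value of the given expression(s). I assume this is a performance feature, but at the least it’s a very concise syntax for something that would otherwise require a window function like RANK() plus a subquery. You’d basically always want an ORDER BY with it: the ORDER BY decides which row counts as “first” (without it, that’s unpredictable), and its leftmost expressions must match the DISTINCT ON expressions.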

The example from the docs for this one is pretty clear:

SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;
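For comparison, here's roughly what the same query looks like with a window function, using row_number() (RANK() would return more than one row per location if two reports tied on the latest time). It needs a subquery, which is the extra verbosity DISTINCT ON saves. The weather_reports table and its rows here are a hypothetical stand-in for the one in the docs.

```sql
-- Hypothetical sample data shaped like the docs' weather_reports table.
CREATE TEMP TABLE weather_reports (location text, time timestamp, report text);
INSERT INTO weather_reports VALUES
  ('NYC',    '2014-04-03 08:00', 'rain'),
  ('NYC',    '2014-04-03 12:00', 'clouds'),
  ('Boston', '2014-04-03 09:00', 'sun');

-- DISTINCT ON: the latest report per location.
SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;

-- Roughly equivalent: number the rows within each location, newest first,
-- then keep only row 1.
SELECT location, time, report
FROM (
  SELECT location, time, report,
         row_number() OVER (PARTITION BY location ORDER BY time DESC) AS rn
  FROM weather_reports
) latest
WHERE rn = 1
ORDER BY location;
-- Both return:
-- Boston;2014-04-03 09:00:00;sun
-- NYC;2014-04-03 12:00:00;clouds
```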

There are a few other features that got a lot of play at the conference (e.g. foreign data wrappers) – more to come.

1 comment so far

#1 Andreas on 04.09.14 at 1:06 am

Everything you can do with DISTINCT ON can be done with window functions, but DISTINCT ON gives better performance, requires no sub query (meaning shorter query), and was implemented before PostgreSQL supported window functions.
