Get the Postgres format for an index expression

Let’s say you’ve built an automated system that builds out your database (common with ORM tools), which includes building indexes with ‘where’ clauses, like so:

create index index_test 
on test (col1, col2, col3) 
where col4 = 'test';

If you were to build an automated migration for this database, it’s helpful to know if this index will differ in the next version. Assuming a ‘duck type’ style equality, you’d want to verify the columns and their types are the same, as well as the where clause – if any of these differ, you need to rebuild the index. The problem is that a developer who writes the where clause may not format it exactly as the database does (they are only writing the ‘where’ bit), so it would be helpful to be able to parse the SQL, the re-constitute it through Postgres to check if there are material differences.

If you have created an index with a WHERE clause, you can query it back like so:

select pg_get_expr(indpred, indrelid), *
FROM pg_index AS idx
JOIN pg_class AS idx_cls ON idx_cls.oid = idx.indexrelid
WHERE pg_class.relname = 'index_test'

Internally, there are several tables that expose data of type pg_node_tree (a parse tree), such as index where clauses, and rewrite rules. The pg_get_expr function turns these back into SQL, given the OID of a type, which works great for reconstructing indexes. There is a third argument to this function to “pretty print” the output, but it seems to just reduce the amount of parentheses included.

The problem is that there doesn’t seem to be an exposed API for plpgsql or SQL to reverse this (build a parse tree). There are examples that wrap C APIs1, but if you search “pg_node_tree site:postgresql.org/docs” you will not find any functions that return pg_node_tree type. For this exercise, I wasn’t inclined to use one of these extension types, although clearly they could work.

Rather, I decided to use the output of “explain”, since it parses the query in the process of determining a plan. One of the nice feature of explain in Postgres is that you can control the format of the output, and now that Postgres has JSON processing, this is fairly concise. If you do something like this, psql and pgadmin both truncate the output with …, so you may need to use some of the JSON enumeration functions (e.g. json_object_keys) to find the correct path to where you want to be:

"[
  {
    "Plan": {
      "Node Type": "Index Scan",
      "Scan Direction": "Forward",
      "Index Name": "tmf_document_s_pk0",
      "Relation Name": "tmf_document_s",
      "Alias": "tmf_document_s",
      "Startup Cost": 0.14,
      "Total Cost": 8.16 (...)"

The solution is to simply take this output, cast to JSON, and find the where clause:

CREATE OR REPLACE FUNCTION format_expr(query text) returns text AS
$BODY$
DECLARE
    query_explain  text;
    explanation    text;
BEGIN
  query_explain := e'EXPLAIN(FORMAT json) ' || query;
  execute query_explain INTO explanation;
  return explanation::json->0->'Plan'->'Index Cond';
END;
$BODY$
LANGUAGE plpgsql;

When you call this function, you need to pass it a query that works off a table as you apparently can’t “explain” a “create index” statement.

select format_expr('select * from test where id = ''123''');

""((id)::text = 'test'::text)""

For my use case (testing during a database migration) this is fine, because it’s relatively simple for the application that generates code to generate “select * from table where …” instead of “create index where …”.

  1. https://github.com/lfittl/pg_query []