Postgres: Time Travelling Debugger

Imagine you’re an engineer doing phone support for Netflix. The movies they show change regularly:

netflix7

There are various reasons for this – Netflix suddenly thinks you like period pieces, or they get into a contract dispute with one of their vendors.

netflix9

As a support engineer, this presents a set of communication challenges, which inspire some technical solutions. Customers often ask questions like “why is this different than a week ago?” It’s a reasonable question, but very time consuming to answer.

netflix10

To make this easier, we need to store the audit history of this data, and write a view that shows the full information available about a movie over time.

For this example, we define a simple schema, with movie and their associated licenses.

CREATE TABLE movies (
  id INT PRIMARY KEY, 
  title text);
 
CREATE TABLE licenses (
  id INT PRIMARY KEY,
  movie_id INT REFERENCES movies (id),
  title text, 
  start_date TIMESTAMP, 
  end_date TIMESTAMP
);

The example database maintains audit history, which stores every change that has happened over time:

CREATE TABLE movies$a (
  id INT, 
  title text, 
  audit_txid BIGINT,
  audit_date TIMESTAMP WITH TIME zone);
 
CREATE TABLE licenses$a (
  id INT,
  movie_id INT REFERENCES movies (id),
  title text, 
  start_date TIMESTAMP, 
  end_date TIMESTAMP,
  audit_txid BIGINT,
  audit_date TIMESTAMP WITH TIME zone
);

Given this, we want to write a view that shows everything about a movie, including an ‘effective’ time interval. If we include an interval in the view, we can filter to a specific time, as shown below (the <@ operator is 'contains')

SELECT
  id, movie_title, license_start, license_end
FROM
  movie_history_vw
WHERE 
  effective <@ (now() - INTERVAL '1 day')

The first step is to generate effective date ranges for rows in the movie and license tables. The ‘tsrange’ function creates intervals from the time of a change and the subsequent change – the last argument shows that it is closed on one end and open on the other. If we find the most recent change, we say it’s effective until infinity:

SELECT
  m.id, m.title,
  tstzrange(
    COALESCE(m.audit_date, '-infinity'), 
    COALESCE(lead(m.audit_date) OVER w_m, 'infinity'),
    '[)'
  ) movie_effective 
FROM movies$a m
window w_m AS (partition BY m.id ORDER BY m.audit_date ASC)

debugger-1

SELECT
  l.id, l.title, movie_id,
  tstzrange(
    COALESCE(l.audit_date, '-infinity'), 
    COALESCE(lead(l.audit_date) OVER w_l, 'infinity'),
    '[)'
  ) license_effective  
FROM licenses$a l
window w_l AS (partition BY l.id ORDER BY l.audit_date ASC)

debugger-2

Once we have effective date ranges for each relevant entity, we need to join the two histories on the foreign key relationship. Since the license is optional, we set it’s effective range to all of time, to make the next step easier:

WITH movies_history AS (...),
licenses_history AS (...)
SELECT m.id, m.title, l.id, l.title,
  movie_effective,
  COALESCE(l.license_effective, '[-infinity,infinity]') license_effective
FROM movies_history m
LEFT JOIN licenses_history l
ON l.movie_id = m.id

debugger-3

Now we can filter the results to only include rows where the license and movie were effective at the same time.

The “&&” operator checks for overlapping intervals, and the “*” operator finds the minimum interval:

WITH movies_history AS (...),
     licenses_history AS (...),
     joined_history AS (...)
SELECT 
  movie_id, movie_title, license_id, license_title,
  movie_effective * license_effective effective
FROM joined_history
WHERE movie_effective && license_effective

debugger-4

From this, we can now generate the query as desired, which filters to values effeffective at a specific time yesterday:

CREATE VIEW movie_history_vw AS 
WITH movies_history AS (...),
     licenses_history AS (...),
     joined_history AS (...)
WHERE movie_effective @> (now() - INTERVAL '1 day')

debugger-5

If you want to provide more parameters, it can be valuable to embed this query in a stored procedure, and use the stored procedure as a view that takes parameters, as shown below:

CREATE OR REPLACE FUNCTION movie_history(TIMESTAMP WITH TIME zone) 
RETURNS setof movie_history_vw AS 
$$
DECLARE
  r record;
  query text;
BEGIN
  RETURN query
    SELECT * 
    FROM movie_history_vw 
    WHERE license_effective @> $1 
END
$$ LANGUAGE 'plpgsql';
 
SELECT * FROM movie_history(now() - INTERVAL '1 day');

For the full solution to this, we need to generate the above views – once it includes dozens of tables, it is far too complex to hand-write.

If you want to do this with plpgsql, the format function works nicely, as it supports multi-line strings and can format identifiers for you:

table_sql := 
  format(
    '-- audit table creation
create table %s$a
     as select t.*, 
       null::text(1) audit_action,
       null::text audit_request,
       null::bigint audit_txid,
       null::text audit_user, 
       null::timestamp audit_date
     from %I t 
     where 0 = 1',
     TABLE_NAME,
     TABLE_NAME
);

The full solution to generate the audit tables and the history view is included in the wingspan-auditing library.