Storing every change to an application’s database allows for sophisticated forensic analysis: usage trends over time, long-range debugging, or data-correction features more typically found in version control software, like ‘cherry-pick’ or ‘revert’.
Many products require this in the form of an audit trail, which in the simplest case can be used to see who put bad data in the database.
The typical implementation in Postgres involves one or more triggers to capture changes, as well as secondary storage. Unfortunately, even though this is a common problem, the community hasn’t yet converged on a single implementation that solves even the majority of use cases, although there are some good partial solutions [1][2][3][4]. Getting this right early in the software development process is very valuable, as it becomes hard to adjust once in production.
Consider the following example:
create table movies ( id int primary key, title text);
Each time a record changes, one possible implementation saves data in a separate table with context for the operation: the user who executed the operation, the time, and so on.
While it’s possible to create a single table which contains present and past data, separating current and past data into separate tables provides natural partitioning, which reflects the different uses of these two data sets.
create table movies$a (
  id int,
  title text,
  audit_date timestamp with time zone,
  audit_action text
);
I like to keep the triggers as simple as possible, so I write stored procedures that pre-generate the queries in their simplest form; this reduces overall load on the system. To generate code in Postgres stored procedures, the format function works well, letting you use multi-line strings as templates: write the query you want, then splice in the variable blocks.
create or replace function audit_movie_insert() returns trigger
language plpgsql as $$
begin
  execute 'insert into movies$a (id, title, audit_date, audit_action)
           values (($1).id, ($1).title, $2, ''I'');'
    using new, now();
  return null;
end;
$$;

create trigger movies$t
  after insert on movies
  for each row execute procedure audit_movie_insert();
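As a sketch of that code-generation idea, a helper (hypothetical; the function name and column handling here are illustrative, not part of the original) can use format with a dollar-quoted multi-line template to emit the audit-insert statement for any table:

```sql
-- Hypothetical sketch: generate the audit-insert statement for a table
-- using format() with a multi-line dollar-quoted template.
-- %I quotes an identifier; %s splices raw text.
create or replace function generate_audit_insert(p_table text, p_columns text)
returns text language sql as $f$
  select format($q$
    insert into %I (%s, audit_date, audit_action)
    values (%s, $2, 'I');
  $q$,
  p_table || '$a',
  p_columns,
  -- turn 'id, title' into '($1).id, ($1).title'
  '($1).' || replace(p_columns, ', ', ', ($1).'));
$f$;

select generate_audit_insert('movies', 'id, title');
```

The generated string still contains the $1/$2 placeholders, so it can be passed to EXECUTE ... USING inside a trigger.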
Having set this up, you can insert data in the audit table, and retrieve it with the timestamp:
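For example, assuming the trigger above is installed (the movie title is illustrative):

```sql
insert into movies (id, title) values (1, 'Metropolis');

-- the trigger has written a corresponding audit row
select id, title, audit_date
from movies$a
where id = 1
order by audit_date;
```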
Another technique uses the built-in hstore type to store all audit records in a single table. This makes a natural trade-off: a single table is simple to manage and conceptualize, and saves you from dealing with migrations. On the other hand, having one audit table per main table makes it easy to drop any audit table into any query from your application. Some people have found the hstore implementation to be faster [5]; it also makes it easier to maintain one single index.
CREATE TABLE audit.logged_actions (
  event_id bigserial primary key,
  schema_name text not null,
  table_name text not null,
  relid oid not null,
  session_user_name text,
  action_tstamp_tx TIMESTAMP WITH TIME ZONE NOT NULL,
  action_tstamp_stm TIMESTAMP WITH TIME ZONE NOT NULL,
  action_tstamp_clk TIMESTAMP WITH TIME ZONE NOT NULL,
  transaction_id bigint,
  application_name text,
  client_addr inet,
  client_port integer,
  client_query text,
  action TEXT NOT NULL CHECK (action IN ('I','D','U','T')),
  row_data hstore,
  changed_fields hstore,
  statement_only boolean not null
);
The implementation above shows what values Postgres makes readily available. Many of these appear useful, but I would consider some of dubious value: most applications connect through a single user account and a load balancer (e.g. pgbouncer), which obscures much useful context.
On the other hand, some of these are incredibly helpful, though it’s not intuitively obvious why until you need them. ‘Action’, for instance, gives the database a way to filter rows quickly during forensic analysis, and lets you discover when an entity came into being or was deleted. Knowing the query running at the time of execution is useful, but can generate a significant quantity of audit history.
Two options of particular interest are the current transaction ID and transaction snapshot ID:
select txid_current(), txid_current_snapshot();
The first is a large integer that increments each time a transaction starts, so it can be used to roughly sort updates by when they occurred; note, however, that it will eventually wrap around. txid_current_snapshot lets you see which transactions had completed and which were still in progress, which would let you re-construct which transactions were running when an audit record was written.
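For example, the snapshot is returned in the form xmin:xmax:xip_list, which can be pulled apart to see what was in flight (the numbers in the comment are illustrative):

```sql
select txid_current(), txid_current_snapshot();
-- txid_current_snapshot() returns xmin:xmax:xip_list, e.g. 100:104:100,102
--   xmin: lowest transaction ID still active
--   xmax: one past the highest completed transaction ID
--   xip_list: transaction IDs in progress at snapshot time
```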
While these are useful, the above implementations lack the truly useful and necessary context that a real application has: its own concept of users, an HTTP request ID, the web server executing a request, and so on.
There are a couple of ways to inject context into a trigger. One possible implementation builds a temp table that holds the request context. Note that if you want the optimizer to know how to deal with it, you have to run analyze manually, because the autovacuum process will not see the temp table.
create temp table audit_context (
  audit_user text,
  audit_request_id text
);

insert into audit_context (
  audit_user,
  audit_request_id
) values (
  'gary',
  '375acb29-9dcd-4229-84e1-9d7da71fe72b'
);

analyze audit_context;
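Inside the trigger, that context can then be joined in. A sketch, assuming the audit table has matching audit_user and audit_request_id columns (not part of the earlier definition):

```sql
-- Sketch: an insert trigger that pulls request context from the temp table.
-- Assumes movies$a also has audit_user and audit_request_id columns.
create or replace function audit_movie_insert_ctx() returns trigger
language plpgsql as $$
begin
  insert into movies$a (id, title, audit_date, audit_user, audit_request_id)
  select new.id, new.title, now(), c.audit_user, c.audit_request_id
  from audit_context c;
  return null;
end;
$$;
```

If the temp table is empty (e.g. a maintenance script that never set context), this variant writes no audit row, so you may prefer a left join against a constant row depending on your requirements.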
More simply, you can overload the application_name connection parameter. application_name is supposed to identify an application: typical values are “psql” or “PGAdmin”, but it can be overridden safely. One possible use would be to create a context ID, then send the full context data in an async write to a separate database, to be retrieved at a later time.
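For example, the application could set this at the start of each request; the comma-separated user/request-ID encoding is just one possible convention:

```sql
-- Encode request context into application_name for this session.
set application_name = 'gary,375acb29-9dcd-4229-84e1-9d7da71fe72b';

-- Verify the value for the current session.
show application_name;
```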
select split_part(application_name, ',', 1) as context_user,
       split_part(application_name, ',', 2) as request
into v_context
from pg_stat_activity
where pid = pg_backend_pid();
Typically it is valuable to add a numeric revision number to each row. This allows the application to detect concurrent modifications, and simplifies sorting in the audit history.
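A sketch of that approach (column and value names illustrative): add a revision column, and have the application include the expected revision in each update, so a concurrent modification causes zero rows to match:

```sql
alter table movies add column revision int not null default 1;

-- Optimistic concurrency: the update succeeds only if nobody else
-- bumped the revision since this client read the row.
update movies
set title = 'Metropolis (restored)',
    revision = revision + 1
where id = 1
  and revision = 1;  -- the revision this client read earlier
-- if 0 rows were updated, another writer got there first
```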
If you’re able to inspect and modify audit history without a performance hit, it is valuable to find the previous row and store an ‘effective time range’ interval. Here is how you’d compute it if you didn’t store it:
select m.id,
       m.title,
       tstzrange(
         coalesce(m.audit_date, '-infinity'),
         coalesce(lead(m.audit_date) over w_m, 'infinity'),
         '[)'
       ) movie_effective
from movies$a m
window w_m as (partition by m.id order by m.audit_date asc);
You can also find the previous row, and set pointers going both directions, which causes the database to resemble a doubly-linked list:
select id
from (
  select audit_txid, id, title
  from movies$a audit_data
  where audit_data.audit_txid <= 324101
  order by audit_txid desc
) a
limit 1;
Regardless of whether you create one large table or many smaller ones, it is easy to hide implementation details behind views and stored procedures, which can address many of the above complaints. If you know the column definition you want to return, it's very easy to make a stored procedure that acts as if it were a view, as shown below:
create or replace function movie_history(timestamp with time zone)
returns setof movie_history_vw as $$
begin
  return query
    select * from movie_history_vw
    where license_effective @> $1;
end;
$$ language plpgsql;

select * from movie_history(now() - interval '1 day');
Performance is a complex topic, and depends on your mix of write volume, total data, and need for forensic analysis. For smaller volumes, you can get away with everything in a single system, but as the volume grows, some amount of tuning is required. For instance, I've found that you can get away without indexes for a long time if you only do small amounts of forensic analysis, which saves a lot of operational overhead.
Postgres does not support partitioning per se [6], but it does allow you to define a master type and make many tables share the same schema, which gets you part way there. For an auditing system, you'd likely want to split data into time windows. This would require a script to create new windows, but you might wish to create one anyway to move old tables into an alternate tablespace on cheaper/slower storage.
CREATE TABLE measurement_y2006m02 () INHERITS (measurement);
CREATE TABLE measurement_y2006m03 () INHERITS (measurement);

-- or, with check constraints so the planner can exclude partitions:

CREATE TABLE measurement_y2006m02 (
  CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
) INHERITS (measurement);

CREATE TABLE measurement_y2006m03 (
  CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
) INHERITS (measurement);
To avoid inconveniencing your end users, you may wish to move auditing out of transactions entirely. One method for doing this is to record the above information and send it to a message queue (e.g. Kafka [7]). Alternately, you can use the Postgres async API to send the queries [8].
Audit history typically provides a wealth of information, if you can sift through the noise and plan up front what you store. Postgres provides a number of built-in features that support sophisticated analysis, but they only work if you store enough up front.
If you're interested in this subject, check out the Wingspan Auditing library. In future posts I will discuss how to replicate git-style functionality within Postgres.