Trace Logging with React

Long ago, when I was a Delphi (Pascal) programmer, I encountered a neat language feature called properties, which are great for debugging (and better known now, since C# has them too).

  Property Area   : Longint read fArea;
  Property Left   : Longint Index 0 read GetCoord write SetCoord;

Properties are accessed with the same syntax as a class-level variable. The consequence is that you can turn any variable into a property by adding three or four lines of code, which gives you a chokepoint where you can put a breakpoint and find out whenever that variable is set – pretty awesome for understanding how an application works. I found this taught me to visualize an application as a plumbing problem: data goes from point A to point B, and all you need to do to fix a defect is figure out where it stops or gets mangled.
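The same chokepoint trick works in plain JavaScript with a property descriptor – a minimal sketch of the idea (my own illustration, not from the original Delphi code):

var state = { _area: 0 };
 
Object.defineProperty(state, 'area', {
  get: function () { return this._area; },
  set: function (value) {
    // every assignment funnels through here: set a breakpoint or log
    this._area = value;
  }
});
 
state.area = 42; // hits the setter above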

In the new world of React programming, we’re taught to treat UI code as simply a mapping from state to DOM objects, which is similarly freeing (although without the rigidity that XSLT imposed).

If you use JSX, you can write composable components that are pretty easy to read (this is rendering a proposal editor):

return (
  <Accordion
    selectedIndex={self.props.ui.accordionIndex}
    setSelectedIndex={self.props.events.setAccordionIndex}>
    <div title="Choose Template" key="templateChooser">
      <PickList values={self.props.templates}
                onselect={self.props.events.selectProposal} />
    </div>
    <FeeEditor
      title="Fees"
      currency={currencyFn}
      fees={self.props.ui.selectedFees}
      events={self.props.events}
      products={self.props.products}
      feeTypes={self.props.feeTypes}
      worksheetFocus={self.props.ui.worksheetFocus}
      setWorksheetFocus={self.props.events.setWorksheetFocus} />
    <Accordion
      title="Proposal Sections"
      joinParent="true"
      selectedIndex={self.props.ui.sectionIndex}
      setSelectedIndex={self.props.events.setSectionIndex}>
      {sections}
    </Accordion>
    <div title="Client Contact">
      <PickList values={self.props.clients}
                onselect={self.props.events.selectUser} />
      <UserEditor user={self.props.ui.selectedUser} />
    </div>
  </Accordion>
);

All the state that controls the UI is passed in.

This design results in one or more large top-level state objects. I split these into three classes of state: the ‘input’ data model (things from the database), the ‘output’ data model (what the end user thinks they picked), and transient UI state (which text box is focused, which tabs are open, etc.).
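Pieced together from the props used above, the top-level object looks roughly like this (a sketch of the shape, not the actual application code):

var data = {
  // 'input' data model: loaded from the database
  templates: [],
  products: [],
  feeTypes: [],
  clients: [],
 
  // 'output' data model (the user's picks) and transient UI state --
  // in this example both happen to live under data.ui
  ui: {
    selectedFees: [],
    selectedSections: [],
    selectedUser: null,
    accordionIndex: 0,
    sectionIndex: 0,
    worksheetFocus: null
  }
};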

The top level state object exposes event handlers which correspond to things a user does:

var events = {
  selectProposal: function (proposal, event, id) {
    data.ui.sectionIndex = 0;
    data.ui.selectedSections = proposal.sections;
 
    page.setState(data);
  }
...
}

Most of these event handlers look almost the same (cursors may eventually be a helpful simplification).

In essence, we’ve replicated the “property” functionality that C# and Delphi have, with some fun results.

We can trivially write a function which takes the above events list and wraps each event in logging code:

function trace(events, log) {
  return _(events).reduce(
    function (accum, fn, key) {
      accum[key] = _.partial(log, fn, key);
      return accum;
    },
  {});
}
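Using it is then just a matter of wrapping the handlers once and passing the wrapped object down wherever self.props.events is used – a sketch, with log being the hook host defined below:

// The keys are unchanged, so the rest of the UI doesn't know the difference.
var tracedEvents = trace(events, log);
 
// This now runs log.before, then the original handler, then log.after.
tracedEvents.selectProposal(proposal, event, id);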

The ‘log’ function can just write to the console output, although that can get verbose fast:

log.before = function (fn, key, args) {
  console.log(key + ': ' + args.join(', '));
}

Rather than logging everything, I’ve chosen to provide pre/post event hooks – this lets you focus the log on the particular piece of state you’re interested in.

function log() {
  var fn = arguments[0];
  var key = arguments[1];
 
  // everything after the wrapped function and its name is the real argument list
  var newArgs = _(arguments).slice(2);
 
  if (log.before) {
    log.before(fn, key, newArgs);
  }
 
  fn.apply(this, newArgs.value());
 
  if (log.after) {
    log.after(fn, key, newArgs);
  }
}

An interesting side effect of this approach is that you can add conditional breakpoints that fire when a value is set:

log.after = function (fn, key, args) {
  console.log('data.ui.accordionIndex: ' + data.ui.accordionIndex);
 
  if (data.ui.accordionIndex === undefined) {
    debugger;
  }
}

You can also record checkpoints of the top level state – this lets you return at any time to the point right before your error occurred:

log.before = function (fn, key, args) {
  console.log('data.ui.accordionIndex: ' + data.ui.accordionIndex);
 
  if (args[0] === undefined) {
    checkpoints.push(_.cloneDeep(data));
  }
}

Alternately, if the data went wrong earlier in the process, you could record snapshots constantly, allowing you to replay everything in a test case.
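Restoring a checkpoint is then just a matter of pushing a saved snapshot back through setState (a sketch):

// Roll the UI back to the most recent checkpoint taken before the error.
function restoreLastCheckpoint() {
  if (checkpoints.length > 0) {
    data = _.cloneDeep(checkpoints[checkpoints.length - 1]);
    page.setState(data);
  }
}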

Overall, this style of programming is a huge improvement when working with the otherwise chaotic nature of JavaScript development.

Postgres: Time Travelling Debugger

Imagine you’re an engineer doing phone support for Netflix. The movies they show change regularly:

[image: netflix7]

There are various reasons for this – Netflix suddenly thinks you like period pieces, or they get into a contract dispute with one of their vendors.

[image: netflix9]

For a support engineer, this presents a set of communication challenges, which inspire some technical solutions. Customers often ask questions like “why is this different than it was a week ago?” It’s a reasonable question, but a very time-consuming one to answer.

[image: netflix10]

To make this easier, we need to store the audit history of this data, and write a view that shows the full information available about a movie over time.

For this example, we define a simple schema, with movies and their associated licenses.

CREATE TABLE movies (
  id INT PRIMARY KEY, 
  title text);
 
CREATE TABLE licenses (
  id INT PRIMARY KEY,
  movie_id INT REFERENCES movies (id),
  title text, 
  start_date TIMESTAMP, 
  end_date TIMESTAMP
);

The example database maintains audit history, which stores every change that has happened over time:

CREATE TABLE movies$a (
  id INT, 
  title text, 
  audit_txid BIGINT,
  audit_date TIMESTAMP WITH TIME zone);
 
CREATE TABLE licenses$a (
  id INT,
  movie_id INT REFERENCES movies (id),
  title text, 
  start_date TIMESTAMP, 
  end_date TIMESTAMP,
  audit_txid BIGINT,
  audit_date TIMESTAMP WITH TIME zone
);

Given this, we want to write a view that shows everything about a movie, including an ‘effective’ time interval. If we include the interval in the view, we can filter to a specific point in time, as shown below (the @> operator is ‘contains’):

SELECT
  movie_id, movie_title, license_id, license_title, effective
FROM
  movie_history_vw
WHERE 
  effective @> (now() - INTERVAL '1 day')

The first step is to generate effective date ranges for rows in the movie and license tables. The ‘tstzrange’ function creates an interval running from the time of one change to the time of the subsequent change – the last argument says the range is closed at the start and open at the end. For the most recent change there is no subsequent row, so we say it’s effective until infinity:

SELECT
  m.id, m.title,
  tstzrange(
    COALESCE(m.audit_date, '-infinity'), 
    COALESCE(lead(m.audit_date) OVER w_m, 'infinity'),
    '[)'
  ) movie_effective 
FROM movies$a m
window w_m AS (partition BY m.id ORDER BY m.audit_date ASC)

[image: debugger-1]

SELECT
  l.id, l.title, movie_id,
  tstzrange(
    COALESCE(l.audit_date, '-infinity'), 
    COALESCE(lead(l.audit_date) OVER w_l, 'infinity'),
    '[)'
  ) license_effective  
FROM licenses$a l
window w_l AS (partition BY l.id ORDER BY l.audit_date ASC)

[image: debugger-2]

Once we have effective date ranges for each relevant entity, we need to join the two histories on the foreign key relationship. Since the license is optional, we set its effective range to all of time, to make the next step easier:

WITH movies_history AS (...),
licenses_history AS (...)
SELECT m.id AS movie_id, m.title AS movie_title, l.id AS license_id, l.title AS license_title,
  movie_effective,
  COALESCE(l.license_effective, '[-infinity,infinity]') license_effective
FROM movies_history m
LEFT JOIN licenses_history l
ON l.movie_id = m.id

[image: debugger-3]

Now we can filter the results to only include rows where the license and movie were effective at the same time.

The “&&” operator checks for overlapping intervals, and the “*” operator computes their intersection:

WITH movies_history AS (...),
     licenses_history AS (...),
     joined_history AS (...)
SELECT 
  movie_id, movie_title, license_id, license_title,
  movie_effective * license_effective effective
FROM joined_history
WHERE movie_effective && license_effective

[image: debugger-4]

From this, we can create the view, and then query it for the values effective at a specific time yesterday:

CREATE VIEW movie_history_vw AS 
WITH movies_history AS (...),
     licenses_history AS (...),
     joined_history AS (...)
SELECT 
  movie_id, movie_title, license_id, license_title,
  movie_effective * license_effective effective
FROM joined_history
WHERE movie_effective && license_effective;
 
SELECT *
FROM movie_history_vw
WHERE effective @> (now() - INTERVAL '1 day');

[image: debugger-5]

If you want to provide more parameters, it can be valuable to embed this query in a stored procedure, and use the stored procedure as a view that takes parameters, as shown below:

CREATE OR REPLACE FUNCTION movie_history(TIMESTAMP WITH TIME zone) 
RETURNS setof movie_history_vw AS 
$$
BEGIN
  RETURN QUERY
    SELECT * 
    FROM movie_history_vw 
    WHERE effective @> $1;
END;
$$ LANGUAGE 'plpgsql';
 
SELECT * FROM movie_history(now() - INTERVAL '1 day');

For a full solution, we need to generate the above views – once the schema includes dozens of tables, they are far too complex to hand-write.

If you want to do this with plpgsql, the format function works nicely, as it supports multi-line strings and can format identifiers for you:

table_sql := 
  format(
    '-- audit table creation
create table %s$a
     as select t.*, 
       null::char(1) audit_action,
       null::text audit_request,
       null::bigint audit_txid,
       null::text audit_user, 
       null::timestamp audit_date
     from %I t 
     where 0 = 1',
     TABLE_NAME,
     TABLE_NAME
);
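As a rough sketch of how the generated DDL might be executed for each table (the public-schema filter and the loop here are my illustration, not the library’s actual generator):

DO $generate$
DECLARE
  t text;
  table_sql text;
BEGIN
  FOR t IN 
    SELECT tablename FROM pg_tables
    WHERE schemaname = 'public' AND tablename NOT LIKE '%$a'
  LOOP
    -- build the same statement as above for each application table
    table_sql := format(
      'create table %s$a
         as select t.*, 
           null::char(1) audit_action,
           null::text audit_request,
           null::bigint audit_txid,
           null::text audit_user, 
           null::timestamp audit_date
         from %I t 
         where 0 = 1',
      t, t);
    EXECUTE table_sql;
  END LOOP;
END;
$generate$;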

The full solution to generate the audit tables and the history view is included in the wingspan-auditing library.

Auditing Data Modifications in Postgres

Implementing Auditing
Storing every change to an application’s database allows for sophisticated forensic analysis: usage trends over time, long-range debugging, or implementing data correction features more typically found in version control software, like ‘cherry-pick’ or ‘revert’.

Many products require this in the form of an audit trail, which in the simplest case can be used to see who put bad data in the database.

The typical implementation in Postgres involves one or more triggers to capture changes, as well as secondary storage. Unfortunately, even though this is a common problem, the community hasn’t yet converged on a single implementation which solves even the majority of use-cases, although there are some good partial solutions. Getting this right early in the software development process is very valuable, as it becomes hard to adjust once in production.

Consider the following example:

CREATE TABLE movies (
  id INT PRIMARY KEY, 
  title text);

Each time a record changes, one possible implementation saves data in a separate table with context for the operation: the user who executed the operation, the time, and so on.

While it’s possible to create a single table which contains present and past data, separating current and past data into separate tables provides natural partitioning, which reflects the different uses of these two data sets.

CREATE TABLE movies$a (
  id INT, 
  title text, 
  audit_action char(1),
  audit_date TIMESTAMP WITH TIME zone);

Data Model
I like to keep the triggers as simple as possible, by writing stored procedures that pre-generate queries in the simplest form. This reduces overall load on the system. To generate code in Postgres stored procedures, the format function works well, letting you use multi-line strings as templates – you write the query you want, then stick in the variable blocks.

CREATE OR REPLACE FUNCTION audit_movie_insert() RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
  EXECUTE
    'insert into movies$a (id, title, audit_date, audit_action)
     values ($1.id, $1.title, $2, ''I'');'
  USING NEW, now();
 
  RETURN NULL;
END;
$$;
 
CREATE TRIGGER movies$t
  after INSERT ON movies 
  FOR each ROW EXECUTE PROCEDURE audit_movie_insert();

Having set this up, every insert into the movies table is recorded in the audit table, along with its timestamp:
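For example, with a made-up row:

INSERT INTO movies (id, title) VALUES (1, 'Metropolis');
 
SELECT id, title, audit_action, audit_date 
FROM movies$a;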

[image: pgadmin]

Another technique uses the built-in hstore object to store all audit records in a single table. This makes a natural trade-off: a single table is simple to manage and conceptualize, and prevents you from having to deal with migrations. On the other hand, having one audit table per main table makes it easy to drop any audit table into any query from your application. Some people have found the hstore implementation to be faster – it also makes it easier to create one single index.

CREATE TABLE audit.logged_actions (
    event_id bigserial PRIMARY KEY,
    schema_name text NOT NULL,
    TABLE_NAME text NOT NULL,
    relid oid NOT NULL,
    session_user_name text,
    action_tstamp_tx TIMESTAMP WITH TIME ZONE NOT NULL,
    action_tstamp_stm TIMESTAMP WITH TIME ZONE NOT NULL,
    action_tstamp_clk TIMESTAMP WITH TIME ZONE NOT NULL,
    transaction_id BIGINT,
    application_name text,
    client_addr inet,
    client_port INTEGER,
    client_query text,
    action TEXT NOT NULL CHECK (action IN ('I','D','U', 'T')),
    row_data hstore,
    changed_fields hstore,
    statement_only BOOLEAN NOT NULL
);
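Querying it back out means pulling keys from the hstore columns – e.g. (a sketch against the table above):

SELECT action, action_tstamp_tx, row_data -> 'title' AS title
FROM audit.logged_actions
WHERE table_name = 'movies'
ORDER BY action_tstamp_tx;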

Context
The implementation above shows what values Postgres makes readily available. Many of these appear useful, but I would consider them of dubious value: most applications connect through a single user account and a load balancer (e.g. pgbouncer), which obscures much of the useful context.

On the other hand, some of these are incredibly helpful, but it’s not intuitively obvious why until you need them. ‘Action’, for instance, is valuable for giving the database a way to filter rows quickly during forensic analysis, and for discovering when an entity came into being or was deleted. Knowing the query running at the time of execution is useful, but can generate a significant quantity of audit history.

Two options of particular interest are the current transaction ID and the transaction snapshot ID:

SELECT txid_current(), txid_current_snapshot()

The first is a large integer that increments each time a transaction starts, so it can be used to roughly sort updates by when they occurred (though the underlying 32-bit transaction counter does eventually wrap around). txid_current_snapshot allows you to find out which transactions have completed and which are still running, which lets you reconstruct which transactions were in flight when an audit record was written.
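The snapshot can be unpacked with the accessor functions:

-- xmin: the oldest transaction still active when the snapshot was taken;
-- xmax: the first as-yet-unassigned transaction ID.
SELECT txid_snapshot_xmin(txid_current_snapshot()) oldest_active,
       txid_snapshot_xmax(txid_current_snapshot()) next_txid;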

While these are useful, the above implementations lack the truly useful and necessary context that a real application has: its own concept of users, an HTTP request ID, the web server executing a request, and so on.

There are a couple of ways to inject context into a trigger. One possible implementation is to build a temp table that holds the request context. Note that if you want the optimizer to know how to deal with it, you have to run ANALYZE manually – the autovacuum process will not see temp tables.

CREATE temp TABLE audit_context (
  audit_user text, 
  audit_request_id text
);
 
INSERT INTO audit_context (
  audit_user, 
  audit_request_id
) 
VALUES (
  'gary',
  '375acb29-9dcd-4229-84e1-9d7da71fe72b'
);
 
ANALYZE audit_context;

More simply, you can overload the application_name context parameter. application_name is supposed to identify the connecting application – typical values are “psql” or “pgAdmin” – but it can be overridden safely. One possible use would be to pack a context ID into it, then send the full context data in an async write to a separate database, to be retrieved at a later time.
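On the application side this just means setting the parameter at the start of each request – a sketch, reusing the values from the temp table example:

SET application_name = 'gary,375acb29-9dcd-4229-84e1-9d7da71fe72b';

The audit code can then split it back apart from pg_stat_activity: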

SELECT 
  split_part(application_name, ',', 1) AS user,
  split_part(application_name, ',', 2) AS request
FROM pg_stat_activity 
WHERE pid = pg_backend_pid() 
INTO v_context;

Typically it is valuable to add a numeric revision number to each row. This allows the application to detect concurrent modifications, and simplifies sorting in the audit history.
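A sketch of the concurrency check (this assumes a revision column has been added, which is not part of the schema above):

-- Only succeeds if nobody else has bumped the revision in the meantime;
-- zero rows updated means another writer won and the application should
-- re-read and retry (or report a conflict).
UPDATE movies
SET title = 'New Title', revision = revision + 1
WHERE id = 1 AND revision = 7;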

If you’re able to inspect and modify audit history without a performance hit, it is valuable to find the previous row and create an ‘effective time range’ interval. This is how you’d do it if you didn’t store it:

SELECT
  m.id, m.title,
  tstzrange(
    COALESCE(m.audit_date, '-infinity'), 
    COALESCE(lead(m.audit_date) OVER w_m, 'infinity'),
    '[)'
  ) movie_effective 
FROM movies$a m
window w_m AS (partition BY m.id ORDER BY m.audit_date ASC)

You can also find the previous row and store pointers going in both directions, which makes the audit history resemble a doubly-linked list:

SELECT id
FROM (
  SELECT audit_txid, id, title
  FROM movies$a audit_data
  WHERE audit_data.audit_txid <= 324101
  ORDER BY audit_txid DESC
) a
LIMIT 1

Regardless of whether you create one large table or many smaller ones, it is easy to hide implementation details behind views and stored procedures, which can address many of the above complaints. If you know the column definition you want to return, it’s very easy to make a stored procedure that acts as if it were a view, as shown below:

CREATE OR REPLACE FUNCTION movie_history(TIMESTAMP WITH TIME zone) 
RETURNS setof movie_history_vw AS 
$$
BEGIN
  RETURN QUERY
    SELECT * 
    FROM movie_history_vw 
    WHERE effective @> $1;
END;
$$ LANGUAGE 'plpgsql';
 
SELECT * FROM movie_history(now() - INTERVAL '1 day');

Performance
Performance is a complex topic, and depends on your mix of write volume, total data, and need for forensic analysis. For smaller volumes you can get away with keeping everything in a single system, but as the volume grows, some amount of tuning is required (for instance, I’ve found that you can get away without indexes on the audit tables for a long time if you only do small amounts of forensic analysis, which saves a lot of operational overhead).

Postgres does not support partitioning per se, but it does allow you to define a parent table and have many child tables inherit its schema, which gets you part of the way there. For an auditing system, you’d likely want to split data into time windows. This requires a script to create new windows, but you might wish to create one anyway to move old tables into an alternate tablespace on cheaper/slower storage.

-- empty child tables inheriting the parent columns:
CREATE TABLE measurement_y2006m02 ( ) INHERITS (measurement);
CREATE TABLE measurement_y2006m03 ( ) INHERITS (measurement);
 
-- or, with CHECK constraints so the planner can skip irrelevant children:
CREATE TABLE measurement_y2006m02 (
    CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
) INHERITS (measurement);
CREATE TABLE measurement_y2006m03 (
    CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
) INHERITS (measurement);
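Writes are then routed to the right child table with a trigger on the parent – a minimal sketch following the standard example in the Postgres documentation:

CREATE OR REPLACE FUNCTION measurement_insert_trigger() RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
  IF NEW.logdate >= DATE '2006-02-01' AND NEW.logdate < DATE '2006-03-01' THEN
    INSERT INTO measurement_y2006m02 VALUES (NEW.*);
  ELSIF NEW.logdate >= DATE '2006-03-01' AND NEW.logdate < DATE '2006-04-01' THEN
    INSERT INTO measurement_y2006m03 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'logdate out of range: add a new partition';
  END IF;
  RETURN NULL; -- the row goes to the child table, not the parent
END;
$$;
 
CREATE TRIGGER measurement_insert
  BEFORE INSERT ON measurement
  FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();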

To avoid inconveniencing your end users, you may wish to move auditing out of the transaction entirely. One method for doing this is to record the above information and send it to a message queue (e.g. Kafka).

Alternately, you can use the Postgres async API to send the audit queries without blocking.

Audit history typically provides a wealth of information, if you can sift through the noise and plan up front what you store. Postgres provides a number of built-in features that support sophisticated analysis, but they only work if you store enough up front.

If you’re interested in this subject, check out the Wingspan Auditing library. In future posts I will discuss how to replicate git-style functionality within Postgres.

Fixing NullPointerException in EdmNavigationPropertyImplProv.getMultiplicity

Sometimes you may receive a NullPointerException in the getMultiplicity function of EdmNavigationPropertyImplProv.java, part of the Olingo OData library.

This is caused by not having set up associations / association sets correctly (i.e. foreign keys). To fix the problem, you either need to make your associations and association sets match and include ‘setMultiplicity’ calls, or remove the associations entirely.
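For reference, the Olingo sample provider sets multiplicity on both ends of each association roughly like this (the constants and the ONE/MANY choice are the sample’s, so adjust them to your own model):

Association association = new Association()
  .setName(ASSOCIATION_CAR_MANUFACTURER)
  .setEnd1(new AssociationEnd()
    .setType(ENTITY_TYPE_1_1)               // e.g. Car
    .setRole(ROLE_1_1)
    .setMultiplicity(EdmMultiplicity.MANY))
  .setEnd2(new AssociationEnd()
    .setType(ENTITY_TYPE_1_2)               // e.g. Manufacturer
    .setRole(ROLE_1_2)
    .setMultiplicity(EdmMultiplicity.ONE));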

Fixing OData/Olingo error: “Could not find an entity set or function import for ‘x’.”

Sometimes with Apache Olingo you see this error message:

Could not find an entity set or function import for 'my_table'.

The Olingo sample applications register types like so:

static final String 
  ENTITY_SET_NAME_CARS = "Cars";
 
private static final 
  FullQualifiedName ENTITY_TYPE_1_1 = 
    new FullQualifiedName(
      NAMESPACE, 
      ENTITY_NAME_CAR);
 
...
 
List<EntitySet> entitySets = 
  new ArrayList<EntitySet>();
 
entitySets.add(
  getEntitySet(
    ENTITY_CONTAINER, 
    ENTITY_SET_NAME_CARS));

So, all you need to do to fix this error is fill in these registrations for your new type. To set up the EntitySet you can do the following:

new EntitySet()
  .setName(name)
  .setEntityType(
    new FullQualifiedName(NAMESPACE, name)
   )

Note that you may want to use different values for ‘setName’ and the entity type – in the sample apps they use the pluralized form in setName and the singular in the entity type.

Fixing Olingo Error “Format of ‘1’ is not compatible with ‘Edm.String’.”

When changing types in Olingo OData services, you may get an error like the following:

Format of '1' is not compatible with 'Edm.String'.

This indicates you have a line like this:

return type.valueOfString(key.getLiteral(), 
  EdmLiteralKind.DEFAULT, 
  property.getFacets(), 
  Integer.class);

Which should in fact be like this:

return type.valueOfString(
  key.getLiteral(), 
  EdmLiteralKind.DEFAULT, 
  property.getFacets(), 
  String.class);

Tips for Debugging Olingo Services

If you get an error from an Olingo service (Olingo being one of the Java libraries for OData), you often just see a message like “An exception occurred.”

This, apparently, is the result of a decision by the Olingo team not to depend on a specific logging library (although I’m not sure how this makes sense, since they already depend on CXF, which seems pretty heavy).

The simple solution they recommend for this problem is to add a callback method to your ODataServiceFactory implementation:

public <T extends ODataCallback> T getCallback(final Class<? extends ODataCallback> callbackInterface)
{ 
  return (T) (callbackInterface.isAssignableFrom(ScenarioErrorCallback.class) ? 
    new ScenarioErrorCallback() : 
      callbackInterface.isAssignableFrom(ODataDebugCallback.class) ? 
        new ScenarioDebugCallback() : 
        super.getCallback(callbackInterface));
}

And then, classes for each:

private final class ScenarioDebugCallback implements ODataDebugCallback {
  @Override
  public boolean isDebugEnabled()
  { 
    return true; 
  }
}
 
private final class ScenarioErrorCallback implements ODataDebugCallback {
  @Override
  public boolean isDebugEnabled()
  { 
    return true; 
  }
}

Now, it seems silly that you’d want to have both of these, but this adds error messages to the JSON response, with each line of the Java stack trace as a separate JSON object.

With these callbacks in place, add “odata-debug=json” as a query parameter to any Olingo URL and you’ll get debug information back.
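For example (using the service and entity set names from the Olingo cars sample):

http://localhost:8080/my-car-service/MyFormula.svc/Cars?odata-debug=json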

Once you do this, you will then discover that the JSON payloads of Olingo are monstrous, and to deal with this you need a browser plugin – I like JSON Formatter, which auto-detects JSON, then formats and colorizes it:

[image: json-formatter2]

Getting Started with OData in Java

OData is a new web service protocol, developed by Microsoft, which sits somewhere between ODBC, REST, and a business-intelligence-oriented SQL.

The protocol seems fairly complex and, as such, does not seem well-suited to “big data” workflows. It does, however, have built-in support for things like differential data loads, so there is some hope there, however minor. While Microsoft claims the big benefit of OData is API standardization, the big win currently is that a lot of Microsoft products speak the protocol: Excel 2013 can natively import data from OData services (and there is a plugin for 2010), as can SharePoint. Notably, the Excel demo applications use Access files with ~5-10 million row tables in them, which is well beyond what I would naively expect Excel to handle.

When you enable “PowerPivot”, you get a new ribbon:

[image: excel-powerpivot]

When you click ‘Manage’ you get to another screen, where you can import data from other Excel files, Access databases, real databases, or a web service. From here you can also define foreign key relationships, so once you get a bunch of data downloaded, you can join across multiple sources.

While this sounds like an abomination, there are people who already run entire businesses this way. For instance, I ran across a grocery store that had a regular database for member information and inventory, and separate Excel spreadsheets emailed to them by manufacturers – all of it joined together in Excel to produce mailings, etc.
 
[image: powerpivot2]

One other thing of note is that you can do imports from the ‘Azure Data Market’, which is a bunch of free and for-sale data sets, potentially a good distribution channel for someone with valuable data.

[image: powerpivot2-azure]

If you’re strictly interested in developing OData API endpoints, Excel is still a nice tool, because you can trap the network traffic in Fiddler to see what the protocol looks like.

There are a few Java libraries to help support this protocol, although so far all seem a lot larger than one would really want (e.g. you end up pulling in a lot of dependencies, many of which will be incompatible with what you already use, e.g. CXF).

You can, however, get a really simple project up and running fairly quickly by following the Maven instructions at http://olingo.apache.org/doc/sample-setup.html (Maven archetypes are just templates for projects, so these instructions aren’t really that clever):

mvn archetype:generate \
  -DinteractiveMode=false \
  -Dversion=1.0.0-SNAPSHOT \
  -DgroupId=com.sample \
  -DartifactId=my-car-service \
  -DarchetypeGroupId=org.apache.olingo \
  -DarchetypeArtifactId=olingo-odata2-sample-cars-service-archetype \
  -DarchetypeVersion=RELEASE
 
cd my-car-service
 
mvn clean install
 
cd target

You then need to copy “my-car-service.war” into Tomcat, and when you load http://localhost:8080/my-car-service/, you’ll get this page:

[image: mvn]

To get Excel to read this, you need to use a slightly different URL:

http://localhost:8080/my-car-service/MyFormula.svc/

[image: odata-import]

One of the neat things about this protocol is that it allows Excel to auto-discover the table names:
 
[image: odata-import2]

So far, it just seems to import all the data, so be careful:
 
[image: odata-import3]

Now you have the data, and can pull a refresh whenever you want it, and that’s it!
 
[image: odata-import4]

How to fix MojoFailureException configuring Olingo

If you set up Olingo, they have you run a Maven archetype (project template) like so:

mvn archetype:generate \
  -DinteractiveMode=false \
  -Dversion=1.0.0-SNAPSHOT \
  -DgroupId=com.sample \
  -DartifactId=my-car-service \
  -DarchetypeGroupId=org.apache.olingo \
  -DarchetypeArtifactId=olingo-odata2-sample-cars-service-archetype \
  -DarchetypeVersion=RELEASE

If you didn’t follow the instructions correctly, you’ll get the following error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-archetype-plugin:2.2:generate (default-cli) on project OlingoSampleClient: A Maven 2 project already exists in the directory d:\projects\odata\my-car-service -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

All this means is that you need to make a ‘project’ folder and move everything into it – Maven doesn’t like the pom.xml sitting in the folder where you run the archetype:

gary@GARY-PC /d/projects/odata
$ ls
my-car-service  pom.xml  project

“Postgres for Developers” – Notes from PGConf NYC 2014

I saw a talk by one of the core Postgres developers, which showed a bunch of interesting tricks for handling business rules in Postgres-specific SQL. These are all things you could find by reading the documentation, but they are interesting enough to be worth writing up examples that highlight what you can do. A lot of these end up being useful for writing systems with immutable data (especially auditing, and sometimes reporting systems).

Example 1: Array Aggregation

“array_agg” can be used to combine rows, which sort of resembles a pivot table operation (the array collects the same set of values that would be passed as arguments to other aggregate functions):

SELECT y, array_agg(x) FROM (
  SELECT 1 x, 2 y
  UNION ALL
  SELECT 2 x, 2 y
  UNION ALL 
  SELECT 3 x, 3 y
) a
GROUP BY y
 
2;"{1,2}"
3;"{3}"

If you use the above table as a common table expression, you can also rename the columns in the with block. You can even join on the arrays:

WITH t(a, b) AS
(
  SELECT y, array_agg(x) FROM (
    SELECT 1 x, 2 y
    UNION ALL
    SELECT 2 x, 2 y
    UNION ALL 
    SELECT 3 x, 3 y
  ) a
  GROUP BY y
)
SELECT * 
FROM t t1 JOIN t t2 ON t1.b[2] = t2.a
 
2;"{1,2}";2;"{1,2}"

Example 2: Named Window Functions

I’m not sure yet whether this is just syntactic sugar or has real value, but you can set up named “windows.”

By way of explanation, a lot of times when you start using aggregate functions (min, max, array_agg, etc), you end up using window functions, which resemble the following:

SELECT a, MAX(b) OVER (partition BY a) 
FROM (
  SELECT 1 a, 1 b
  UNION ALL 
  SELECT 2 a, 1 b
  UNION ALL 
  SELECT 1 a, 2 b
) t1
 
1;2
1;2
2;1

These allow you to calculate aggregate functions (like min/max) without collapsing all the rows.

For instance, if you sort these values, you can find the “next” or “previous” row in the partition, which is pretty standard sql stuff:

SELECT a, lag(b) OVER (partition BY a ORDER BY b) 
FROM (
  SELECT 1 a, 1 b
  UNION ALL 
  SELECT 2 a, 1 b
  UNION ALL 
  SELECT 1 a, 2 b
) t1
 
1;
1;1
2;

What’s cool is you can move the “over partition by” part out of the query to the end as a named window, which presumably would be really nice if you had a lot of them, or wanted to re-use the same window for multiple fields:

SELECT a, lag(b) OVER w
FROM (
  SELECT 1 a, 1 b
  UNION ALL 
  SELECT 2 a, 1 b
  UNION ALL 
  SELECT 1 a, 2 b
) t1
window w AS (partition BY a ORDER BY b) 
 
1;
1;1
2;

Example 3: Ranges
Postgres has a really cool feature, as of 9.2, where you can query whether something is in a range (ranges are a special type, kind of like the arrays above). This example is a bit contrived, to show that you could combine array_agg and range creation:

WITH _data AS (
  SELECT 1 a, 1 b
  UNION ALL 
  SELECT 2 a, 1 b
  UNION ALL 
  SELECT 1 a, 2 b
  UNION ALL
  SELECT 2 a, 2 b
),
_history AS (
  SELECT a, array_agg(b) _start, array_agg(b) _end
  FROM _data
  GROUP BY a
)
SELECT a, 
       _start[1], 
       _end[1], 
       int4range(_start[1]::INTEGER, _end[2]::INTEGER, '(]'::text) 
FROM _history
 
1;1;1;"[2,3)"
2;1;1;"[2,3)"

There are a bunch of range types built in (based on numerics, timestamps). Note that you can specify whether the endpoints on ranges are inclusive or exclusive.

You can find out if a data value is within a range with the @> operator and see if two ranges overlap with &&. This set of functionality is great for exploring audit records – if you make a range with “[valid_from, valid_to)” you can query to find out what rows were effective on a particular date/time, for instance.
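A quick illustration with literal ranges:

-- @> : does the range contain the timestamp?  && : do the two ranges overlap?
SELECT tstzrange('2014-01-01', '2014-02-01', '[)') @> '2014-01-15'::timestamptz AS contains_ts,
       tstzrange('2014-01-01', '2014-02-01', '[)') && tstzrange('2014-01-15', '2014-03-01', '[)') AS ranges_overlap

Both expressions return true.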

If you’re in this area of functionality, also check out btree_gist indexes, which may be helpful for tuning this.

Example 4: DISTINCT ON
Postgres has a feature to pull back just the first row for each distinct value of an expression. I assume this is a performance feature, but at the least it’s a very concise syntax for something that would otherwise require the use of RANK(). I imagine that you’d always want to use an ORDER BY with it.

The example from the docs for this one is pretty clear:

SELECT DISTINCT ON (location) location, TIME, report
FROM weather_reports
ORDER BY location, TIME DESC;

There are a few other features that got a lot of play at the conference (e.g. foreign data wrappers) – more to come.