September, 2013 - Gary Sieling

One of the challenges in supporting complex applications is to answer questions like “Why doesn’t my document show up in search results,” or “why isn’t this event in my report?”

Even though database operations are often modeled with set theory, returning the inverse of a query is not sufficient.

Consider, however, a hypothetical ORM layer, where one specifies what is needed, and the ORM builds SQL for you:

query = {
  select: ['document_id', 'document_name', 'document_author', 'document_description'],
  where: [
    {column: 'company', op: '=', value: 'ACME Corp'},
    {column: 'creation_date', op: '>', value: '1/1/2013'}
  ]
};

The simplest ORM layers will just concatenate this into SQL according to known formatting rules; more complex systems handle ACLs, construct joins, or encapsulate rules for tuning queries.

// Vastly oversimplified way of constructing SQL
function sql(query) {
  var result =
    "SELECT " + query.select.join(", ") +
    " WHERE ";

  for (var i = 0; i < query.where.length; i++) 
    result += query.where[i].column + 
              query.where[i].op +
              "'" + query.where[i].value + "'"
}

function runQuery(query) {
  var q = sql(query); 
  // run query
  ...
  // return results
}

Now, let's consider the support question, "where is document X". Hopefully it is expressed in this form and not "there were more things yesterday," although you could extend this technique to handle scenarios like that.

// Treat query objects as immutable
function copyQuery(query) {
  return {
    select: query.select.slice(0),
    where: query.where.slice(0)
  }
};

function getTestQuery(query, id) {
  var result = copyQuery(query);
  result.where.push(
    {column: 'document_id', op: '=', value: id}
  );
  return result;
}

If the earlier query was fast, this should be as well. You want to run this query first, to prove that the user's ticket is actually caused by data, rather than misconfigured caching or a higher level defect.

From here, this becomes a simple bread-first search:

function removeWhere(query, i) {
 return {
    select: query.select.slice(0),
    where: query.where.slice(0).splice(i, i+1)
  } 
}

var tests = [];
tests.push(removeWhere(query, 0));
tests.push(removeWhere(query, 1));
// etc

Then, all you have to do is run through these tests until you get a result, essentially the same as what a developer would do by hand. Looks simple, right?

The reason you want to do a bread-first search is that for very complex queries (dozens of where clauses), removing fields will tend on average to make things slower. There is a risk, for instance, of removing an indexed field. Ideally this "why not" analysis then becomes baked into your ORM, which would work well if it knows which fields are indexed.

An alternate, general case scenario would be for the ORM to run a database execution plan and apply some heuristics, e.g. if it changes structurally from the original, put it later in the queue.

A final consideration is that of ACLs: it may be mandatory for this support functionality to be baked into an ORM layer, as you'd want to run this with an "ignore permissions" option, to determine if ACLs are the cause. The analysis of "why doesn't this user have permissions" is too complex (and often this is a business process or training issue), and better answered through other means.

This technique lets you unlock some of the power of your ORM layer (assuming it has any to begin with), automating typical support work through code. In practice I've yet to see this done, but it seems like a very practical feature for the next generation of ORMs to consider, as unglamorous problems like support are not always prime topics of discussion among developers.

Month: September 2013

Using ORM layers to ease Support