Philly ETE – Database as a Value

This was the first time I’ve seen Rich Hickey’s talk on Datomic, and it lent great clarity to the product. As implemented, Datomic functions as an immutable database for philosophical reasons, although in practice it doesn’t manage its own storage, and it may eventually support deletions to satisfy legal and compliance requirements around privacy.

This database technology aims to solve common problems encountered in relational database systems: inability to reason about the current state of data, concurrency issues, challenges of viewing data as of a given time, mismatched metaphors between code and data, and various frustrating design patterns. Making data immutable and adding a time dimension to data allows a number of great simplifications, and produces an index resembling git history (although unlike git, you can’t rewrite history).
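
To make the "time dimension" concrete, here is a minimal sketch in Python of facts stamped with a transaction number and an "as of" view computed from them. This is a naive log replay for illustration only, not Datomic's actual index structure; the names `Fact` and `as_of`, the `added` flag, and the assumption that each attribute holds a single value are all mine.

```python
from typing import Any, NamedTuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: Any
    tx: int       # transaction number, standing in for the time dimension
    added: bool   # True = assertion, False = retraction (hypothetical flag)


def as_of(log, tx):
    """Replay the log up to and including transaction `tx`.

    Assumes the log is ordered by tx and each attribute is single-valued.
    """
    view = {}
    for fact in log:
        if fact.tx > tx:
            break
        key = (fact.entity, fact.attribute)
        if fact.added:
            view[key] = fact.value
        else:
            view.pop(key, None)
    return view


# Nothing is ever overwritten: a change is a retraction plus a new assertion.
log = (
    Fact("alice", "email", "alice@old.example", 1, True),
    Fact("alice", "email", "alice@old.example", 2, False),
    Fact("alice", "email", "alice@new.example", 2, True),
)

print(as_of(log, 1))  # the view before the change
print(as_of(log, 2))  # the view after the change
```

Because the old assertion is still in the log, the earlier view stays available no matter how many later transactions arrive.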

There are a number of possible implementations. Naive approaches would copy data or build append-logs (as is typical in database rewrite logs), but the Datomic structures bear study, as they are all based on wide, short trees. Because data nodes are immutable, the existing contents of a tree never change; new data is only ever added. This lets the db track novelty, and with the current arrangement it appears that you can traverse the history as a simple range scan. Each tree node has a series of pointers to other tree locations, based on various pre-indexed ways to traverse the tree.
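
The "history as a range scan" idea can be sketched with a sorted index keyed on (entity, attribute, tx). This is only a rough illustration under my own assumptions: a flat sorted tuple stands in for the wide, short trees, and copying it on every write is exactly the naive approach a real tree would avoid by sharing unchanged nodes. The names `add_fact` and `history` are hypothetical.

```python
from bisect import bisect_left


def add_fact(index, entity, attribute, tx, value):
    """Return a new sorted index; the index passed in is never modified."""
    return tuple(sorted(index + ((entity, attribute, tx, value),)))


def history(index, entity, attribute):
    """All versions of one attribute, found as a contiguous range of the index."""
    lo = bisect_left(index, (entity, attribute))
    hi = bisect_left(index, (entity, attribute, float("inf")))
    return index[lo:hi]


v0 = ()
v1 = add_fact(v0, "alice", "email", 1, "alice@old.example")
v2 = add_fact(v1, "bob", "email", 2, "bob@example.com")
v3 = add_fact(v2, "alice", "email", 3, "alice@new.example")

for entry in history(v3, "alice", "email"):
    print(entry)
```

Because the transaction number is part of the sort key, every version of a given attribute sits next to the others, so reading its history is two binary searches and a slice.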

Architecturally, this design allows many readers, which don’t require coordination (a query specifies the database/db version it runs against), but only a single writer.
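
A rough sketch of that split, assuming a hypothetical `Connection` class of my own invention (not Datomic's API): writes are serialized through a lock, while readers simply grab the current immutable value and keep using it without coordinating with anyone.

```python
import threading


class Connection:
    """Hypothetical holder of the current database value."""

    def __init__(self):
        self._write_lock = threading.Lock()
        self._current = (0, ())  # (version, facts), both immutable

    def db(self):
        # Readers take no lock: the value they receive can never change.
        return self._current

    def transact(self, new_facts):
        # Writers are serialized, so versions advance one at a time.
        with self._write_lock:
            version, facts = self._current
            self._current = (version + 1, facts + tuple(new_facts))
            return self._current


conn = Connection()
snapshot = conn.db()  # a reader pins version 0
conn.transact([("alice", "email", "alice@example.com")])

print(snapshot)   # still the empty version-0 value
print(conn.db())  # the new version-1 value
```

A reader that holds `snapshot` sees a consistent database for as long as it likes, regardless of what the writer does afterwards.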

Data elements are designed to resemble facts (e.g. :noun :verb :object), so one might think of each as resembling a cell or row in a traditional database. Because they are immutable, they can be cached anywhere, or the cache can be split across machines, so a database could hold arbitrary subsets of data in memory; query results are a merge-join of memory and disk storage. As I understood it, data is read into memory as segments (pages), as in traditional databases.
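
As I picture the merge of memory and disk, both sides are kept in the same sort order and a read walks them together. A minimal sketch, assuming two already-sorted lists of facts; the names `disk_segment` and `memory_novelty` and the sample data are mine:

```python
from heapq import merge

# Older facts, as they might sit in a sorted segment read from storage.
disk_segment = [
    ("alice", "email", "alice@example.com"),
    ("bob", "email", "bob@example.com"),
]

# Recent facts held only in memory, kept in the same sort order.
memory_novelty = [
    ("alice", "name", "Alice"),
    ("carol", "email", "carol@example.com"),
]

# heapq.merge lazily interleaves sorted inputs into one sorted stream.
merged = list(merge(disk_segment, memory_novelty))

for fact in merged:
    print(fact)
```

Since neither input is ever mutated, either one can be cached or swapped for a copy held elsewhere without changing the result.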

This is designed to take a functional approach to database programming, so instead of running a query inside a database, you run a function on a database, e.g. f(db) -> results. This has great potential – ease of mocking data, running hypothetical inserts to see the result, passing custom code in database queries. Notably, this is not necessarily intended to work well for very large data (click streams) or non-functional uses of a database (i.e. anything that uses a db as a global variable, such as a counter).
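
The f(db) -> results idea, including a hypothetical insert, can be sketched with a plain immutable set standing in for the database value. The function names `with_facts` and `emails` are my own, chosen for illustration:

```python
def with_facts(db, new_facts):
    """Return a new database value with extra facts; `db` itself is unchanged."""
    return db | frozenset(new_facts)


def emails(db):
    """A query is just a function of a database value."""
    return sorted(value for (_entity, attribute, value) in db if attribute == "email")


db = frozenset({("alice", "email", "alice@example.com")})

# Try out an insert without committing anything anywhere.
hypothetical = with_facts(db, [("bob", "email", "bob@example.com")])

print(emails(db))
print(emails(hypothetical))
```

The same property makes mocking easy: a test can build a small `frozenset` by hand and pass it to `emails` with no database running at all.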