Migrating Node.js apps from Heroku to Dokku

Heroku (and Dokku) make for an awesome development environment, the most well-known feature being the use of "git push" to deploy changes to a server. For simple projects, I've found that deployment takes about as long as compilation on JVM-based projects, except that at the end you have a working environment.

Knowing you can deploy an environment at any time during a project means you can spin up as many environments as you want later, and it forces you to think through migrations in your checkins.

The downside of Heroku is that you are somewhat locked in, and if you want to build many small applications as a hobbyist, it adds up ($7/mo to keep one app running – the free ones shut down all the time). On the other hand, you could just set up a VM to run several apps, which guarantees many wasted hours on dependency hell problems (every time you update a shared library, you risk breaking what you have previously built).

Conceptually this setup is not that complex, but there are a lot of moving parts to get right, and it isn’t that satisfying to work on vs. actually building things.

To build a system like Heroku where you can push from git, you'd start with a script on the server that runs when code is pushed to the remote repository (a post-receive hook, which lives in the remote repo's hooks directory). This would kick off the deployment.
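As a minimal sketch, hooks/post-receive might look like the following (git writes one "oldrev newrev refname" line per updated branch to the hook's stdin; the deploy step itself is left as a comment):

#!/usr/bin/env node
// hooks/post-receive on the server - runs after a push is accepted
process.stdin.on('data', function(chunk) {
  chunk.toString().trim().split('\n').forEach(function(line) {
    var parts = line.split(' ');  // [oldrev, newrev, refname]
    if (parts[2] === 'refs/heads/master') {
      console.log('Deploying ' + parts[1] + '...');
      // check out the new revision, build, and restart the app here
    }
  });
});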

Heroku also has you check a file into the root of your repository that defines what you want to run (the Procfile). This lets them control the deployment, and lets the developer control what actually runs.
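For a Node app the Procfile can be a single line declaring the web process (any arguments go on the same line):

web: node index.js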

Virtualization at the operating-system level ("Linux Containers", or LXC) saves you from dependency issues, without having to build and secure a VM for every app. Linux containers seem pretty complex, and I would assume there are some good ways to screw this up from a security perspective. The technology is new enough that tools and best practices aren't as mature and clear as the older generation, so it's easy to get mired in all the options (the few hours I spent trying to get Docker working have yet to pay off).

There is an open-source project called Dokku (a Star Wars reference, perhaps?) which ties all these pieces together for you and tries to act as much like Heroku as possible. This lets you start on a cheap virtual machine, then upgrade when you need real infrastructure. The marketing materials for Dokku are quite slick, so I suspect one or more companies are sponsoring it (probably someone like Digital Ocean), although I haven't been able to find evidence of this.

Digital Ocean has a pre-built Dokku image you can use at $5/mo. While they've had some growing pains, they typically give out credits when they have issues. If you follow them on Facebook and subscribe to their newsletter, they give out a lot of coupons as well, which has made the $5 VMs basically free – clearly a loss leader for them (feel free to use my referral link – you get $10 off to start, and I get another $25).

Regardless of who you use, if you set up a Dokku image, there is a page where you upload your SSH public key:
[Screenshot: the SSH public key upload page]

This lets it accept your "git push."

If you want to use subdomains for each app like Heroku does, you’ll need to add them to your DNS manually. I put Cloudflare on any site I can, if only because it has the best DNS management UI I’ve ever seen (and they don’t do referral links, unfortunately!):

[Screenshot: Cloudflare DNS entries for each app subdomain]

Once you set these two things up, ssh into the VM.

You can run create commands for each app you want to configure:

 dokku apps:create editor
 dokku apps:create meeting
 dokku apps:create gmail-search
 dokku apps:create document-search

This will let your VM receive pushes.

Then, to push changes, you need to set up the remote:

git remote add dokku dokku@apps.garysieling.com:generic-search-ui

If you add this to an existing app, you can push to both Heroku and Dokku simultaneously by doing this:

git push heroku master
git push dokku master
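If you'd rather push to both with one command, git lets a remote have multiple push URLs (the Heroku URL below is illustrative – use whatever "git remote -v" shows for your app):

git remote add all https://git.heroku.com/editor.git
git remote set-url --add --push all https://git.heroku.com/editor.git
git remote set-url --add --push all dokku@apps.garysieling.com:editor
git push all master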

You will see the same sort of commit log on the remote that you see with Heroku:

gary@gary-PC MINGW64 /d/projects/heroku/editor (master)
$ git push dokku master
Counting objects: 451, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (350/350), done.
Writing objects: 100% (451/451), 47.84 KiB | 0 bytes/s, done.
Total 451 (delta 305), reused 138 (delta 91)
-----> Cleaning up...
-----> Building editor from herokuish...
-----> Setting config vars
       CURL_CONNECT_TIMEOUT: 5
-----> Setting config vars
       CURL_TIMEOUT: 30
-----> Adding BUILD_ENV to build environment...
-----> Node.js app detected

-----> Creating runtime environment

       NPM_CONFIG_LOGLEVEL=error
       NPM_CONFIG_PRODUCTION=true
       NODE_ENV=production
       NODE_MODULES_CACHE=true

-----> Installing binaries
       engines.node (package.json):  5.5.0
       engines.npm (package.json):   unspecified (use default)

       Downloading and installing node 5.5.0...
       Using default npm version: 3.3.12

-----> Restoring cache
       Skipping cache restore (new runtime signature)

-----> Building dependencies
       Pruning any extraneous modules
       Installing node modules (package.json)
       editor@1.0.0 /tmp/build
       +-- body-parser@1.14.2
       ...


-----> Caching build
       Clearing previous node cache
       Saving 2 cacheDirectories (default):
       - node_modules
       - bower_components (nothing to cache)

-----> Build succeeded!
       +-- body-parser@1.14.2
       +-- cookie-parser@1.4.1
       +-- ejs@2.3.4
       +-- errorhandler@1.4.3
       +-- express@4.13.4
       +-- express-session@1.13.0
       +-- foreman@1.4.1
       +-- http@0.0.0
       +-- lodash@4.0.0
       `-- morgan@1.6.1

-----> Discovering process types
       Procfile declares types -> web
-----> Releasing editor (dokku/editor:latest)...
-----> Deploying editor (dokku/editor:latest)...
-----> DOKKU_SCALE file not found in app image. Generating one based on Procfile...
-----> New DOKKU_SCALE file generated
=====> web=1
-----> Running pre-flight checks
       For more efficient zero downtime deployments, create a file CHECKS.
       See http://dokku.viewdocs.io/dokku/checks-examples.md for examples
       CHECKS file not found in container: Running simple container check...
-----> Waiting for 10 seconds ...
-----> Default container check successful!
=====> editor container output:
       Node app is running on port 5000
=====> end editor container output
-----> Running post-deploy
=====> renaming container (3735de61fe0a) suspicious_euclid to editor.web.1
-----> Creating new /home/dokku/editor/VHOST...
-----> Setting config vars
       DOKKU_NGINX_PORT: 80
-----> Configuring editor.garysieling.com...
       (using /var/lib/dokku/plugins/available/
               nginx-vhosts/templates/nginx.conf.template)
-----> Creating http nginx.conf
-----> Running nginx-pre-reload
       Reloading nginx
-----> Setting config vars
       DOKKU_APP_RESTORE: 1
=====> Application deployed:
       http://editor.garysieling.com

To dokku@apps.garysieling.com:editor
 * [new branch]      master -> master

Then, if you need a database, you can install this as well:

sudo dokku plugin:install https://github.com/dokku/dokku-postgres.git
 
dokku postgres:link editor editor

I name my databases and apps the same, which makes the above command a little unclear – the first argument is the database (service) name, and the second is the app to link it to.
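With distinct names the arguments read better – a sketch, assuming the plugin's create/link commands:

dokku postgres:create editor-db
dokku postgres:link editor-db editor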

Dokku supports the same concept of environment configuration variables that Heroku does (i.e. your database connection string, API keys, etc. should never be checked in). When you run the above commands, you will get log entries like this:

no config vars for generic-search-ui
-----> Setting config vars
       DATABASE_URL: postgres://postgres:baf4@dokku-editor:5432/editor
-----> Restarting app generic-search-ui
App generic-search-ui has not been deployed
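You can set your own config vars the same way you would on Heroku (SOME_API_KEY here is just a made-up example key):

dokku config:set editor SOME_API_KEY=abc123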

If you are migrating an application and change the DNS, you will need to update any references in external services – e.g. Typekit, the Twitter developer API, Google developer keys, etc.

Once you’ve set this up, there are some useful troubleshooting commands. E.g. list installed apps:

dokku apps
root@ubuntu-512mb-nyc2-01:~# dokku apps
=====> My Apps
editor
generic-search
generic-search-ui

Viewing logs is a little more complex – you need to list the running containers, then attach to the one you want using Docker:

root@ubuntu-512mb-nyc2-01:~# docker ps
CONTAINER ID        IMAGE                            COMMAND                  CREATED              STATUS              PORTS               NAMES
fcc5604164c9        dokku/generic-search-ui:latest   "/start web"             About a minute ago   Up About a minute                       generic-search-ui.web.1
e4e42ace32f2        postgres:9.5.0                   "/docker-entrypoint.s"   About an hour ago    Up About an hour    5432/tcp            dokku.postgres.generic-search-ui
697e11c55807        dokku/editor:latest              "/start web"             2 days ago           Up 2 days                               editor.web.1
 
docker attach fcc5604164c9
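Note that "docker attach" connects you directly to the running process, so be careful detaching (Ctrl+C can kill the app). For read-only viewing, "docker logs" is safer:

docker logs -f fcc5604164c9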

And there you have it: your own PaaS, with limited dependence on external companies for hosting.


Testing against multiple jQuery versions

I recently took over maintenance of a jQuery UI plugin that lets you highlight bits of a textarea. Since it’s a UI plugin, most of the testing is done by visual inspection, and supporting different versions of jQuery presents an interesting testing challenge, since jQuery is one of the first things to load on the page.

[Screenshot: a textarea with "Lorem ipsum" and "vulputate" highlighted]

This is the code that generates the above example:

function runTest() {
  $('#demo1').highlightTextarea({
      words: {
        color: '#ADF0FF',
        words: ['Lorem ipsum','vulputate']
      },
      debug: true
  });
}

It turns out that you can dynamically add scripts to a page, so rather than including jQuery in a script tag, we can import it after the page loads.

Normally you might use "$(document).ready" to run initialization code, but since jQuery isn't loaded yet, we hook window.onload directly instead:

window.onload = loadJavascript;

To do this, you need a helper that appends a script tag and invokes a callback once the script loads, since loading is an asynchronous process.

function appendScript(src, callback) {
  var head = document.getElementsByTagName('head')[0];
 
  var elt = document.createElement("script");
  elt.type = "text/javascript";
  elt.src = src;
 
  elt.onload = function() {
    callback();
  }
 
  head.appendChild(elt);
}

To make the page load correctly, I've set this up so you can specify the jQuery version on the URL. Once jQuery loads, it triggers jQuery UI loading, and then finally our own scripts.

function loadJavascript() {
  // this sets the default
  var jqv = "jquery-1.11.1.min.js";
 
  try {
    jqv = window.location.search.split('?')[1].split('=')[1];
  } catch (e) {}
 
  appendScript("http://code.jquery.com/" + jqv, function() {
    $('#jqv').val(jqv);
 
    appendScript("http://code.jquery.com/ui/1.10.4/jquery-ui.min.js", function() {
      appendScript("../jquery.highlighttextarea.js", runTest);
    });
  });
}

Once this works, you can add a dropdown with all the jQuery versions you want to support, and set the page to refresh when one is selected:

function setJQuery() {
  var jqv = $('#jqv').val();
 
  window.location = 'index.html?jqv=' + jqv;
}

For completeness, here is the dropdown:

<select id="jqv" onchange="setJQuery()">
  <option value="jquery-2.2.0.js">jQuery Core 2.2.0</option>
  <option value="jquery-2.1.4.js">jQuery Core 2.1.4</option>
  <option value="jquery-2.1.3.js">jQuery Core 2.1.3</option>
  <option value="jquery-2.1.2.js">jQuery Core 2.1.2</option>
  <option value="jquery-2.1.1.js">jQuery Core 2.1.1</option>
  <option value="jquery-2.1.1-rc2.js">jQuery Core 2.1.1-rc2</option>
  <option value="jquery-2.1.1-rc1.js">jQuery Core 2.1.1-rc1</option>
  <option value="jquery-2.1.1-beta1.js">jQuery Core 2.1.1-beta1</option>
  <option value="jquery-2.1.0.js">jQuery Core 2.1.0</option>
  <option value="jquery-2.1.0-rc1.js">jQuery Core 2.1.0-rc1</option>
  <option value="jquery-2.1.0-beta3.js">jQuery Core 2.1.0-beta3</option>
  <option value="jquery-2.1.0-beta2.js">jQuery Core 2.1.0-beta2</option>
  <option value="jquery-2.1.0-beta1.js">jQuery Core 2.1.0-beta1</option>
  <option value="jquery-2.0.3.js">jQuery Core 2.0.3</option>
  <option value="jquery-2.0.2.js">jQuery Core 2.0.2</option>
  <option value="jquery-2.0.1.js">jQuery Core 2.0.1</option>
  <option value="jquery-2.0.0.js">jQuery Core 2.0.0</option>
  <option value="jquery-2.0.0-beta3.js">jQuery Core 2.0.0-beta3</option>
  <option value="jquery-2.0.0b2.js">jQuery Core 2.0.0-b2</option>
  <option value="jquery-2.0.0b1.js">jQuery Core 2.0.0-b1</option>
  <option value="jquery-1.12.0.js">jQuery Core 1.12.0</option>
  <option value="jquery-1.11.3.js">jQuery Core 1.11.3</option>
  <option value="jquery-1.11.2.js">jQuery Core 1.11.2</option>
  <option value="jquery-1.11.1.js">jQuery Core 1.11.1</option>
  <option value="jquery-1.11.1-rc2.js">jQuery Core 1.11.1-rc2</option>
  <option value="jquery-1.11.1-rc1.js">jQuery Core 1.11.1-rc1</option>
  <option value="jquery-1.11.1-beta1.js">jQuery Core 1.11.1-beta1</option>
  <option value="jquery-1.11.0.js">jQuery Core 1.11.0</option>
  <option value="jquery-1.11.0-rc1.js">jQuery Core 1.11.0-rc1</option>
  <option value="jquery-1.11.0-beta3.js">jQuery Core 1.11.0-beta3</option>
  <option value="jquery-1.11.0-beta2.js">jQuery Core 1.11.0-beta2</option>
  <option value="jquery-1.11.0-beta1.js">jQuery Core 1.11.0-beta1</option>
  <option value="jquery-1.10.2.js">jQuery Core 1.10.2</option>
  <option value="jquery-1.10.1.js">jQuery Core 1.10.1</option>
  <option value="jquery-1.10.0.js">jQuery Core 1.10.0</option>
  <option value="jquery-1.10.0-beta1.js">jQuery Core 1.10.0-beta1</option>
  <option value="jquery-1.9.1.js">jQuery Core 1.9.1</option>
  <option value="jquery-1.9.0.js">jQuery Core 1.9.0</option>
  <option value="jquery-1.9.0rc1.js">jQuery Core 1.9.0-rc1</option>
  <option value="jquery-1.9.0b1.js">jQuery Core 1.9.0-b1</option>
  <option value="jquery-1.8.3.js">jQuery Core 1.8.3</option>
  <option value="jquery-1.8.2.js">jQuery Core 1.8.2</option>
  <option value="jquery-1.8.1.js">jQuery Core 1.8.1</option>
  <option value="jquery-1.8.0.js">jQuery Core 1.8.0</option>
  <option value="jquery-1.8rc1.js">jQuery Core 1.8.0-rc1</option>
  <option value="jquery-1.8b2.js">jQuery Core 1.8.0-b2</option>
  <option value="jquery-1.8b1.js">jQuery Core 1.8.0-b1</option>
  <option value="jquery-1.7.2.js">jQuery Core 1.7.2</option>
  <option value="jquery-1.7.2rc1.js">jQuery Core 1.7.2-rc1</option>
  <option value="jquery-1.7rc1.js">jQuery Core 1.7.0-rc1</option>
  <option value="jquery-1.7b2.js">jQuery Core 1.7.0-b2</option>
  <option value="jquery-1.7b1.js">jQuery Core 1.7.0-b1</option>
  <option value="jquery-1.6.4.js">jQuery Core 1.6.4</option>
  <option value="jquery-1.6.4rc1.js">jQuery Core 1.6.4-rc1</option>
  <option value="jquery-1.6.3.js">jQuery Core 1.6.3</option>
  <option value="jquery-1.6.3rc1.js">jQuery Core 1.6.3-rc1</option>
  <option value="jquery-1.6.2.js">jQuery Core 1.6.2</option>
  <option value="jquery-1.6.2rc1.js">jQuery Core 1.6.2-rc1</option>
  <option value="jquery-1.6.1.js">jQuery Core 1.6.1</option>
  <option value="jquery-1.6.1rc1.js">jQuery Core 1.6.1-rc1</option>
  <option value="jquery-1.6.js">jQuery Core 1.6.0</option>
  <option value="jquery-1.6rc1.js">jQuery Core 1.6.0-rc1</option>
  <option value="jquery-1.6b1.js">jQuery Core 1.6.0-b1</option>
  <option value="jquery-1.5.2.js">jQuery Core 1.5.2</option>
  <option value="jquery-1.5.2rc1.js">jQuery Core 1.5.2-rc1</option>
  <option value="jquery-1.5.1.js">jQuery Core 1.5.1</option>
  <option value="jquery-1.5.1rc1.js">jQuery Core 1.5.1-rc1</option>
  <option value="jquery-1.5.js">jQuery Core 1.5.0</option>
  <option value="jquery-1.5rc1.js">jQuery Core 1.5.0-rc1</option>
  <option value="jquery-1.5b1.js">jQuery Core 1.5.0-b1</option>
  <option value="jquery-1.4.4.js">jQuery Core 1.4.4</option>
  <option value="jquery-1.4.4rc3.js">jQuery Core 1.4.4-rc3</option>
  <option value="jquery-1.4.4rc2.js">jQuery Core 1.4.4-rc2</option>
  <option value="jquery-1.4.4rc1.js">jQuery Core 1.4.4-rc1</option>
  <option value="jquery-1.4.3.js">jQuery Core 1.4.3</option>
  <option value="jquery-1.4.3rc2.js">jQuery Core 1.4.3-rc2</option>
  <option value="jquery-1.4.3rc1.js">jQuery Core 1.4.3-rc1</option>
  <option value="jquery-1.4.2.js">jQuery Core 1.4.2</option>
  <option value="jquery-1.4.1.js">jQuery Core 1.4.1</option>
  <option value="jquery-1.4.js">jQuery Core 1.4.0</option>
  <option value="jquery-1.4rc1.js">jQuery Core 1.4.0-rc1</option>
  <option value="jquery-1.4a2.js">jQuery Core 1.4.0-a2</option>
  <option value="jquery-1.4a1.js">jQuery Core 1.4.0-a1</option>
  <option value="jquery-1.3.2.js">jQuery Core 1.3.2</option>
  <option value="jquery-1.3.1.js">jQuery Core 1.3.1</option>
  <option value="jquery-1.3.1rc1.js">jQuery Core 1.3.1-rc1</option>
  <option value="jquery-1.3.js">jQuery Core 1.3.0</option>
  <option value="jquery-1.3rc2.js">jQuery Core 1.3.0-rc2</option>
  <option value="jquery-1.3rc1.js">jQuery Core 1.3.0-rc1</option>
  <option value="jquery-1.3b2.js">jQuery Core 1.3.0-b2</option>
  <option value="jquery-1.3b1.js">jQuery Core 1.3.0-b1</option>
  <option value="jquery-1.2.6.js">jQuery Core 1.2.6</option>
  <option value="jquery-1.2.5.js">jQuery Core 1.2.5</option>
  <option value="jquery-1.2.4.js">jQuery Core 1.2.4</option>
  <option value="jquery-1.2.4b.js">jQuery Core 1.2.4-b</option>
  <option value="jquery-1.2.4a.js">jQuery Core 1.2.4-a</option>
  <option value="jquery-1.2.3.js">jQuery Core 1.2.3</option>
  <option value="jquery-1.2.3b.js">jQuery Core 1.2.3-b</option>
  <option value="jquery-1.2.3a.js">jQuery Core 1.2.3-a</option>
  <option value="jquery-1.2.3.js">jQuery Core 1.2.3</option>
  <option value="jquery-1.2.3b.js">jQuery Core 1.2.3-b</option>
  <option value="jquery-1.2.3a.js">jQuery Core 1.2.3-a</option>
  <option value="jquery-1.2.2.js">jQuery Core 1.2.2</option>
  <option value="jquery-1.2.2b2.js">jQuery Core 1.2.2-b2</option>
  <option value="jquery-1.2.2b.js">jQuery Core 1.2.2-b</option>
  <option value="jquery-1.2.1.js">jQuery Core 1.2.1</option>
  <option value="jquery-1.2.js">jQuery Core 1.2.0</option>
  <option value="jquery-1.1.4.js">jQuery Core 1.1.4</option>
  <option value="jquery-1.1.3.js">jQuery Core 1.1.3</option>
  <option value="jquery-1.1.3a.js">jQuery Core 1.1.3-a</option>
  <option value="jquery-1.1.2.js">jQuery Core 1.1.2</option>
  <option value="jquery-1.1.1.js">jQuery Core 1.1.1</option>
  <option value="jquery-1.1.js">jQuery Core 1.1.0</option>
  <option value="jquery-1.1b.js">jQuery Core 1.1.0-b</option>
  <option value="jquery-1.1a.js">jQuery Core 1.1.0-a</option>
  <option value="jquery-1.0.4.js">jQuery Core 1.0.4</option>
  <option value="jquery-1.0.3.js">jQuery Core 1.0.3</option>
  <option value="jquery-1.0.2.js">jQuery Core 1.0.2</option>
  <option value="jquery-1.0.1.js">jQuery Core 1.0.1</option>
  <option value="jquery-1.0.js">jQuery Core 1.0.0</option>
</select>

Lodash debounce example

To ensure that a Javascript function only gets called once every few seconds, you can run it through debounce:

function run() {
   console.log('abc');
}
 
var realFunction = 
  _.debounce(run, 1500);
 
realFunction();
realFunction();
realFunction();

Output is:

abc

In this case, the function will only get run once.

If you pass an argument, it will be sent through to the function, but still only one call is made. This is why these work well as click handlers.

function run(a) {
   console.log(a);
}
 
var realFunction = 
  _.debounce(run, 1500);
 
realFunction('a');
realFunction('b');
realFunction('c');

Output is:

c
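Because only one call out of a burst actually goes through, debounced functions work well as click handlers – a quick sketch (the #save button is hypothetical):

var save = _.debounce(function() {
  console.log('saving...');
}, 1500, {leading: true, trailing: false});
 
// rapid double-clicks only trigger one save
document.querySelector('#save').addEventListener('click', save);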

By default, debounce waits the interval before making the call.

If you want it to run immediately on the first call instead, pass the leading option (trailing: false keeps it from firing a second time at the end of the interval):

var realFunction = 
  _.debounce(run, 1500, {leading: true, trailing: false});
 
realFunction('a'); 
realFunction('b'); 
realFunction('c');

Prints out:

a

Building a Squarespace integration with Node.js

Javascript integrations must be built carefully to avoid excessive rendering and network overhead. If well-built, they make it easy to get complex functionality into a website, without being locked into the vendor (e.g. Squarespace).

I built a small application that uses a search engine to find articles you've written for other people, so you can update your writing portfolio if they disappear. It automates what you'd otherwise do manually to monitor this, and gives you a list of the articles you've written. For this article, I'm going to show how to embed this in a Squarespace site.

Here is a sample of the data we want to embed:

[Screenshot: a sample list of portfolio articles]

Squarespace is a subscription platform for people who want a blog and e-commerce tools, but don’t want the maintenance headaches of WordPress. The templates look pretty nice out of the box:

[Screenshot: a default Squarespace template]

Squarespace does allow developer customizations but they focus on modifying the look and feel of the site, like changing page templates or CSS. You can customize a site by checking out the template with git, but for non-technical users this is a hassle.

Ideally we want to be able to write a piece of Javascript that can be dropped into the page:
[Screenshot: pasting the script into a Squarespace code block]

While this type of integration has a lot of power, it does break if a browser is configured to block Javascript.

Before we start building something, let's consider an instructive example: the Google Analytics tracking script.

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-1570898-2']);
_gaq.push(['_gat._forceSSL']);
_gaq.push(['_trackPageview']);
 
(function () {
  var ga = document.createElement('script');
  ga.type = 'text/javascript';
  ga.async = true;
  ga.src = 
    ('https:' == document.location.protocol ? 
     'https://ssl' : 'http://www') + 
     '.google-analytics.com/ga.js';
  var s = document.getElementsByTagName('script')[0];
  s.parentNode.insertBefore(ga, s);
})();

Information about your account is set on global variables, then a script block is added to the page, which loads asynchronously. It accesses the script from different domains depending on whether you are on SSL or not. Once loaded, Google Analytics monitors for specific actions, and reports back to the server as operations it tracks occur.

When designing a Javascript integration, as little code as possible should live on the page, as it will be nearly impossible to change once people are using it. It may even be valuable to plan on versioning the API.

The backend code we’re going to render is a simple ExpressJS script. I’ve chosen not to add CSS classes, so that this can inherit the styling of the site it is added to:

<ul>
<% for(var i = 0; i < alerts.length; i++) {%>
  <li><%= alerts[i].title %></li>
<% } %>
</ul>

We can then set up a simple Javascript script that can be injected into the page, like so:

<script src="//garysieling.com/squarespace/1/embed.js" async="" defer="defer"></script>

We set async and defer to prevent the script from blocking other rendering activities.

To handle SSL and non-ssl pages, I’ve chosen to use a protocol-less URL. This has few downsides. I haven’t found any documentation on why you’d want to use two domains (SSL and non-SSL) like Google Analytics does, but I suspect that this makes configuring a load balancer easier.

It’s worth noting that older implementations of this type of script used document.write to add the script to the page, which blocked rendering.

Note that because I put a user ID in the URL, this script won’t be cached across user accounts. There will also be separate copies cached for HTTP and HTTPs.

For the backend of this call, we set up a template in Express.js that can add our code to the page:

"use strict";
(function() {
  var template = '<template from above example>';
 
  var id = document.querySelector('&lt;%= selector %&gt;');
  document.querySelector(id).innerHTML = template; })();
});

This illustrates some of the challenges to this integration.

If you want to write the above code in ES6, or have it minified, that should be done in advance and checked in. While there is middleware that can minify code on the fly, it incurs a performance penalty (realistically, probably a much worse one than having the browser deal with some extra whitespace).

Squarespace has multiple configurable templates, and they use a UI framework (YUI3) that randomly names divs on the page. Because of this we can’t choose a reliable selector to allow adding widgets to the page.

To make this integration robust, we need a way to configure where the widget content is placed. This can be done by creating a configuration setting for the CSS selector, which controls where the portfolio will be visible on the page.

Compare this to the Google Analytics script, which sets global variables on the window object – this clutters up the namespace, but allows for better caching.

Speaking of caching: if the script depends on per-user settings, we need to be able to cache the output effectively. One option is to cache for a short period (e.g. 30 seconds), or to allow for a 'developer mode' that sets the cache header to 0. A more robust implementation would use ETags. Here is a simple example of the short-lived cache headers:

app.get('/squarespace/:user_id/embed.js', (req, res) => {
  res.header("Content-Type", "application/javascript");
  res.header("Access-Control-Allow-Origin", "*");
  res.header("cache-control", "public, max-age=30, must-revalidate");
 
  // render and send the embed script here
});
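For the ETag variant, a rough sketch (buildEmbedScript is a hypothetical helper that renders the script body for a given user):

let crypto = require('crypto');
 
app.get('/squarespace/:user_id/embed.js', (req, res) => {
  let body = buildEmbedScript(req.params.user_id); // hypothetical helper
  let etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';
 
  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end(); // the client's cached copy is still current
  }
 
  res.header("Content-Type", "application/javascript");
  res.header("ETag", etag);
  res.send(body);
});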

Passing a CSS selector to the script allows adding support for other platforms later and makes it easy to handle different themes. Alternately, the selector could also be set on the script tag itself as a data attribute:

<script 
  src="//aaa.herokuapp.com/embed.js" 
  async defer
  type="text/javascript" 
  data-name="6d6f4bd5-6153-4875-ba61-0791e2e99fb3" 
  data-id="div[data-type=&quot;page&quot;] div div div">
</script>

And then the actual Javascript to do the lookup becomes:

var id = 
  document.querySelector(
    'script[data-name="profile-script-6d6f4bd5-6153-4875-ba61-0791e2e99fb3"]'
  ).getAttribute("data-id");
 
document.querySelector(id).innerHTML = template;

The advantage of this approach is that it lets you configure the integration through the Squarespace UI, rather than opening two tabs. On the other hand, having integration parameters in your own application allows you to configure this for the client using the application, which makes support very simple.

By using a GUID, we can be reasonably sure that this won’t conflict with other scripts, but this does incur the performance penalty of using two DOM lookups instead of one.

Squarespace does not have a way that I can find to make Javascript only render on specific pages. Consequently, we also need to provide an option for users to filter which pages they want this change applied to (e.g. since this is a tool to display your portfolio, we might restrict it to “/about”).

For this application, I’ve allowed people to select specific pages to filter where the integration is applied.

We can add a piece of code that checks this:

if (window.location.pathname !== '/about') {
  return; // skip pages where the widget isn't enabled
}

Alternately, if that information was not available, we could use a regex to match the URL (this one is hard-coded to /about):

(http|https):\/\/([^\/]*)\/\b(about)\b(\/)?([#?]|$)
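Inside the embed script that check might look like this (in practice the path list would come from the user's settings):

var pageFilter = /(http|https):\/\/([^\/]*)\/\b(about)\b(\/)?([#?]|$)/;
if (!pageFilter.test(window.location.href)) {
  return; // inside the IIFE - not a page this widget is enabled on
}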

If you find this interesting, there is a Manning book called Third-Party JavaScript, which may be helpful as well (I haven't read it yet, but hope to soon).

And there you have it! If you’re interested in monitoring a writing portfolio, email me at gary@garysieling.com to get early access to this tool.

Regular Expression to match Postgres Vacuum log statements

You can get log entries for every vacuum in Postgres by enabling this setting in postgresql.conf:

log_autovacuum_min_duration=0

Obviously you may prefer to raise the threshold if you get a lot of entries. This is what the log entries look like:

automatic vacuum of table "db.pg_catalog.pg_class": index scans: 1
	pages: 0 removed, 143 remain
	tuples: 204 removed, 4498 remain
	buffer usage: 463 hits, 87 misses, 131 dirtied
	avg read rate: 1.654 MB/s, avg write rate: 2.490 MB/s
	system usage: CPU 0.00s/0.00u sec elapsed 0.41 sec

For monitoring purposes it is helpful to extract all these values. In testing this, I found that “system usage” was not always reported, so there are actually two regexes:

automatic vacuum of table "(?P<vacuum_table_database>[^"]*)\.
(?P<vacuum_table_schema>[^"]*)\.(?P<vacuum_table_name>[^"]*)": 
index scans: (?P<vacuum_index_scans>\d*).*pages: 
(?P<vacuum_pages_removed>\d*) removed, 
(?P<vacuum_pages_remain>\d*) remain.*tuples: 
(?P<vacuum_tuples_removed>\d*) removed, 
(?P<vacuum_tuples_remain>\d*) remain.*buffer usage: 
(?P<vacuum_buffer_usage_hits>\d*) hits, 
(?P<vacuum_buffer_usage_misses>\d*) misses, 
(?P<vacuum_buffer_usage_dirtied>\d*) dirtied.*
avg read rate: (?P<vacuum_avg_read_rate>[0-9.]*) 
(?P<vacuum_read_rate_units>[^,]*), 
avg write rate: (?P<vacuum_avg_write_rate>[0-9.]*) 
(?P<vacuum_write_rate_units>[^,]*)

And:

.*system usage: CPU (?P<vacuum_cpu_seconds>[0-9.]*)s.
(?P<vacuum_cpu_time>[0-9.]*)u sec elapsed 
(?P<vacuum_elapsed_time>[0-9.]*) 
(?P<vacuum_cpu_elapsed_unit>.*)

Regular Expression for Postgres log messages warning of locks

To match locks in Postgres logs you can use the following regular expressions. The first matches messages that show tables being locked, and the second matches locks acquired on transactions (e.g. share locks taken while waiting on another transaction).

.*user=(?P&lt;lock_user_name&gt;\w+),db=(?P&lt;lock_database&gt;\w+) 
LOG:\s+process (?P&lt;lock_process_id&gt;\d+) 
acquired (?P&lt;lock_type&gt;\w+) on (?&lt;lock_on&gt;\w+) 
(?P&lt;lock_tuple&gt;[\(\)0-9,]+) of (?P&lt;lock_object_type&gt;\w+) 
(?P&lt;lock_object_oid&gt;\d+) of database (?P&lt;lock_db_oid&gt;\d+) 
after (?P&lt;lock_wait_time&gt;[0-9.]+).*
.*user=(?P&lt;lock_user_name&gt;\w+),db=(?P&lt;lock_database&gt;\w+) 
LOG:\s+process (?P&lt;lock_process_id&gt;\d+) acquired 
(?P&lt;lock_type&gt;\w+) on transaction (?&lt;lock_transaction_id&gt;\d+) 
after (?P&lt;lock_wait_time&gt;[0-9.]+) .*

The useful thing about these is that they let you see which queries wait on locks, and for how long.

To make these entries show up in the log, you need this in postgresql.conf:

log_lock_waits = on

Locks on tables are particularly interesting, because they will list both the rows and tables that are locked. However, they do this using internal identifiers.

You can look up the tables like so:

SELECT pg_class.oid, nspname, relname
FROM pg_class, pg_namespace
WHERE pg_class.relnamespace = pg_namespace.oid
  AND pg_class.oid = 1234

The log output will give you row IDs (representing where the row is stored). You can query this like so:

SELECT *
FROM mytable
WHERE ctid::text = '(0,1)'

If you are experiencing a lot of locks and are concerned, it is also worth trying to get the server name of the offending transaction, so that you can determine whether multiple servers in a farm are causing you issues, or several processes on the same server.
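One way to capture that is via log_line_prefix in postgresql.conf – a sketch (%h is the connecting client's host, %u the user, %d the database, %p the process ID, %t the timestamp):

log_line_prefix = '%t [%p] user=%u,db=%d,host=%h '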


Computer Science in the Real World: Moving Information, From Pigeons to the Internet

The discipline of Computer Science affects us all, but for many people it is shrouded in mystery. Forged from a union of mathematics and electrical engineering, it beautifully blends the abstract and practical, although knowing this does little to clarify what it means to the non-initiate. The term "Informatik", or "information science", was coined in German universities in the late 1950s by Karl Steinbuch, the late founding father of German Computer Science.

This really gets at the heart of the matter – Computer Science is about the acquisition, storage, and transmission of data. One of the founding legends of the field is about the Internet itself (originally a system named ARPANET): that it was designed as a communication tool for military use that could withstand a nuclear attack. While the original aims were actually more modest, fundamental design principles for internet technologies do emphasize robustness, by providing many routes for information to travel between two individual points. This allows the network to survive failures in particular regions by routing information around the problem area.

Early academic papers on internet technology focus on building communication systems that transfer information using discrete packets of data. This stands in opposition to communication systems that transfer information in a single circuit. This can be easily understood by imagining mailing someone one book a page at a time, as compared to reading it to her over the phone. Mailing books by the page could send a book more quickly by transferring pages in parallel, but requires sorting the mail as it arrives.

The robustness of packet based systems justifies dealing with some fairly abstract problems. For instance, pages may arrive out of order and need to be re-assembled. If a page takes too long to arrive one might ask for it again, knowing that requesting it to be re-sent could actually lead to receiving it twice.

The original ARPA (now DARPA) organization was instrumental in supporting research that eventually formed the backbone of the internet. It was founded by the Eisenhower administration as a research unit within the U.S. Defense Department. Spurred on by fear of the Soviet launch of Sputnik, this organization has supported many high risk/high reward projects. In 2004, DARPA started a competition to encourage teams to build self-driving vehicles, which became a popular extracurricular project in schools with Computer Science programs.

Before DARPA and the internet there were many ingenious low-tech ways to transmit military communication. Lower tech communication methods worked at a slower speed and a smaller scale than today’s internet, but the underlying problems help us to understand equivalent issues we experience today.


One November evening, a balloon named "La Ville d'Orleans" took off from Paris near midnight, carrying two men, pigeons, and, most importantly, mail. This wasn't just any commercial mail. In 1870 France was at war with Prussia, and Paris was under siege. The balloon trip aimed to deliver a message from the Parisian governor to the French resistance, but after nearly twelve hours flying in the wrong direction, it overshot its mark, landing in Norway.

Earlier in the war, the French started a balloon-based air mail service. This is thought to have processed 2-3 million letters over the course of a single year – demonstrating the enormous demand for the transmission of data, whether it be love letters or commercial contracts.

The pigeons carried on La Ville d'Orleans deserve special note: pigeons were used to send a large amount of mail into Paris. Pigeons cannot carry as many letters as a hot air balloon, so the French cleverly encoded the mail onto collodion film slides, then re-transcribed the letters on arrival. The internet today, like the postal system, relies on a variety of routing techniques and vehicles for information delivery. Many homes today have at least three networks, each with their own internal design: wireless internet access, wired access (via ethernet), and a phone or cable line to the outside, although all three are typically bridged by a single device.

[Map showing the locations of Paris and Tours – image from David Rumsey Maps]

Nearly a hundred years prior to the balloon incident, the U.S. Post Office was established by Congress, and when the seat of government moved from Philadelphia to Washington, DC, the Post Office moved its supplies with two horse-drawn wagons. As the nation and technology grew, mail began to be moved across the country by rail and early airplanes.

In 1919, the first transatlantic flights started, and almost immediately carried mail. Daytime mail flights from New York to Chicago soon began. As air flights expanded across the nation, new innovations– such as parachutes, lighted runways, and equipment optimized for lower altitude flights– were developed to alleviate the dangers of air mail (for a fascinating reference, see Air Mail, an Illustrated History).

Even before transcontinental air mail, the time savings of flying mail from New York to Chicago was enough to meet an earlier train, saving a full day on the trip across the country. This was funded by special stamps, allowing the consumer of the service to choose how their mail was prioritized and routed to its destination.

In modern times, the Post Office has many pricing options that control how mail is routed and delivered. For instance, this includes a service that allows external companies to route their own mail across the country, but use the Postal Service for final delivery. Depending on where the mail goes, it may be scanned on a belt machine, automatically dropped into a truck, or sorted manually by mail carriers. The key here is that customers have some influence (the type of stamp they buy), but the mechanics of how to route a specific piece of mail from its current point to the next can be determined without all parties to the operation having to discuss it.

The same military and commercial drivers that influenced the development of postal mail also controlled the development of early mainframes, although as a consumer of computing, it is sometimes difficult to imagine the scale. According to one of the pioneers of Computer Science, Edsger Dijkstra, the IBM/360 cost more to develop than the Manhattan Project (not adjusted for inflation).

Dijkstra was known for being a tough instructor. Originally from the Netherlands, he began his career as an academic in Cambridge, England, studying theoretical physics. While completing his studies he took a part-time position as a programmer at the Mathematical Centre in Amsterdam. At the time, they were trying to build their first computers. Since the computers were in development, programming was done on paper.

As a professor, he required students to submit their programming homework handwritten, noting that you could tell if someone’s thoughts were unclear if their work had many erasures. He observed that the labor of writing forces you to carefully consider whether the design of an algorithm is overly complex, providing an incentive to simplify, and being too poor to buy better equipment forces you to make better use of what you have.

When he encountered peers in the United States, he complained of a lack of rigor among software developers. Of particular concern was a tendency to work in an experimental fashion: trying things to see what works, rather than thinking problems through in advance. He was particularly critical of IBM's work, and suggested that the greatest American victory in the Cold War was that the Soviets decided to clone the IBM/360 (a mainframe model that started shipping in 1966).

[Image: An IBM/360 installed at NASA]

While completing his studies in theoretical physics, Dijkstra decided to become a programmer because he found the intellectual challenge of programming greater than that of theoretical physics. He discovered that programming was not appreciated in the university – to the physicists he was a deserter. Mathematicians looked down on programming because it requires the practitioner to operate in the world of the finite.

At the time of his graduation, the computer he'd been working on was unveiled – the awesomely named Automatische Rekenmachine MAthematisch Centrum (ARMAC). In order to do a public demonstration of this new computing machine, the team needed a piece of software that could show the equipment off to the public. Earlier demonstrations showed random number generation. While random numbers may seem like an abstract use for a computing machine, consider that this is necessary to operate a lottery. Prior to computerization, lotteries used public information to determine winners, such as the last digits of stock prices, or baseball scores.

For the release of the ARMAC, Dijkstra built an algorithm for finding routes on maps – a good problem for a machine that was now reliable. The software he built could find the shortest distance between two selected cities in the Netherlands, out of a list of 64.

In an interview, he described the development:

“What’s the shortest way to travel from Rotterdam to Groningen, in general: from given city to given city. It is the algorithm for the shortest path, which I designed in about twenty minutes. One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on the café terrace to drink a cup of coffee and I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path. As I said, it was a twenty-minute invention. In fact, it was published in ’59, three years late. The publication is still readable, it is, in fact, quite nice. One of the reasons that it is so nice was that I designed it without pencil and paper. … Eventually that algorithm became, to my great amazement, one of the cornerstones of my fame. I found it in the early ’60’s in a German book on management science – – “Das Dijkstra’sche Verfahren.” Suddenly, there was a method named after me.”

Dijkstra’s Algorithm

The algorithm, now termed "Dijkstra's Algorithm", wasn't published until three years later, in 1959. Dijkstra was asked by a friend if he had anything he could contribute to a German mathematical journal called Numerische Mathematik, so he submitted a write-up of this algorithm and another he'd designed. At the time this was not considered mathematics, but he felt the additional material made the submission more compelling. This type of material is now treated as fundamental to Computer Science programs, under the heading of "Discrete Mathematics". The shift in importance of non-infinite mathematics stems from its being the foundation of a host of well-known technologies, including GPS.

Dijkstra’s algorithm assumes that you can identify a series of locations, paths between them, and a “cost” of each – cost can be based on the distance, the time it takes to travel, or how many pieces of mail your balloon can carry. Locations could be cities, but they can also be air strips or rail stations. Variants of this algorithm handle special cases, like pigeons that can only travel one way.

More sophisticated algorithms power the routing of information on the internet. GPS-style routing has omniscience about the structure of the network, whereas internet routing devices maintain tables that route data based on attributes of the address, and each entity that effectively forwards mail makes the best decision they can without knowing the final route. Following the postal analogy, a router in California might have a rule that zip codes from 10000-15000 are forwarded to a single peer in New York State, which would then fan this out to regional routers, who handle the final connection.

In the case of mail, you can tie a "cost" function based on time or price to the type of stamp, and route mail to the next stop by train or by plane, as appropriate, without knowing anything about the intent of the sender – a simple routing algorithm.

Much technology exists to improve the deliverability of information on the internet, like compression technology that functions similarly to storing letters on microfilm. In fact, the routing of information on the internet solves a problem the French never fully solved – it can re-send your mail when the pigeon is shot down by a sniper.

[Image: Russian stamp depicting a balloon in flight]

Node CommonJS Example Library Template

If you want to make a new CommonJS module for Node, you can use the following as a template file (we’ll call this ‘detector.js’, but obviously you can name it whatever you want):

"use strict";
 
let _ = require('lodash');
 
exports.functionA =
  (data) => {
    return data;
  };
 
exports.functionB =
  (data) => {
    return data;
  };

Then, when you want to use it, you can pull it in with require (note this is in the same folder as the module I’ve named detector):

"use strict";
 
let express = require('express'),
    _ = require('lodash'),
    Sequelize = require('sequelize'),
    passport = require('passport'),
    http = require('http'),
    db = require('./models'),
    detector = require('./detector');
 
app.get('/preview/:alert_id', (req, res) => {
  let id = req.params.alert_id;
 
  db.Alert.find(
    {where: {
      id: id,
      UserId: req.user.id
      }})
  .then(
    (alert) => detector.functionA(alert)
  )
  .then(
    (alert) => detector.functionB(alert)
  )
});

This example shows how you can use the functions defined in the detector module (this is from an ExpressJS app on Heroku).

Sequelize: IS NULL / IS NOT NULL

Sequelize has a neat / bizarre syntax for expressing parts of WHERE clauses as JSON objects. For "IS NOT NULL", try this:

 db.Alert.find(
    {where: {
      UserId: req.params.user_id,
      url: {
        $ne: null
      }
    }})
    .then(
      (alert) => {
        // handle result
      }
    );

This renders the following SQL (formatted for readability):

SELECT 
   "id", "title", "text", "domain", 
   "lastRun", "lastSeen", "url", 
    "createdAt", "updatedAt", "UserId" 
FROM "Alerts" AS "Alert" 
WHERE "Alert"."UserId" = '1' 
AND "Alert"."url" IS NOT NULL 
LIMIT 1;

Note that Sequelize quotes all the identifiers, which makes everything case-sensitive if you're using Postgres. It's kind of irritating, actually, because it forces you to quote identifiers when you query your own database by hand.

You can also do “is null”, like so:

db.Alert.find(
  {where: {
    UserId: req.params.user_id,
    url: null
  }})
  .then(
    (alerts) => {
      res.render('embed', { alerts: alerts });
    }
  );

Sequelize Update example

There are a couple of ways to update values in the Sequelize ORM:

db.Alert.update(
  { url: url },
  { 
    fields: ['url'],
    where: {id: id}
  }
);

If you have an object, this also works – note that find returns a promise, so the update happens in the callback:

db.Alert.find(
  {where: {
    id: id,
    UserId: req.user.id
  }})
  .then((alert) =>
    alert.update({
      url: 'https://www.garysieling.com/blog'
    })
  )
  .then(function() { /* handle success */ });