Finding all images in HTML files over a certain size with Python BeautifulSoup

This example shows how to use the Beautiful Soup library to find all images referenced in a set of HTML files, then filter them to a particular size range. This works well for excluding header images, logos, tracking pixels, etc. It assumes you have mirrored a website’s directory structure with wget. Unlike […]
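
As a rough sketch of the idea (not the code from the post itself), the snippet below walks a mirrored directory, collects every `<img src>` and keeps the images whose files fall inside a byte range. Two assumptions are mine: "size" is taken to mean file size in bytes, and the standard library's `html.parser` stands in for Beautiful Soup so the sketch runs without third-party packages. The names `ImgSrcCollector` and `images_in_size_range` are hypothetical.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def images_in_size_range(mirror_root, min_bytes, max_bytes):
    """Return sorted paths of mirrored images whose file size is in range.

    Assumption: "size" means file size in bytes, not pixel dimensions.
    """
    matches = set()
    for dirpath, _dirnames, filenames in os.walk(mirror_root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            collector = ImgSrcCollector()
            html_path = os.path.join(dirpath, name)
            with open(html_path, encoding="utf-8", errors="replace") as fh:
                collector.feed(fh.read())
            for src in collector.sources:
                # Absolute URLs point outside the mirror, so skip them.
                if "://" in src or src.startswith("//"):
                    continue
                if src.startswith("/"):
                    # Site-root-relative reference: resolve against the mirror root.
                    path = os.path.join(mirror_root, src.lstrip("/"))
                else:
                    # Page-relative reference: resolve against the HTML file's folder.
                    path = os.path.join(dirpath, src)
                path = os.path.normpath(path)
                if os.path.isfile(path) and min_bytes <= os.path.getsize(path) <= max_bytes:
                    matches.add(path)
    return sorted(matches)
```

Calling `images_in_size_range("mirror", 20_000, 5_000_000)` on a hypothetical `mirror` directory would drop tiny tracking images and small logos while keeping larger files.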

Making Maps with TileMill

TileMill is a piece of map-making software for rendering beautiful maps. You can export the maps to MapBox for a Google Maps feel, or combine them with a tool like D3.js for interactive infographics. There are a surprising number of data sources: weather, earthquake locations, crime statistics, and ship and plane locations. A lot of this is from federal and municipal agencies […]

Processing Command Line Arguments in Java

Rather than parsing command line arguments yourself, you can use a nice Apache library called Apache Commons CLI. It supports a few different flavors of parsing, although this example demonstrates the two most common use cases (I think): a flag, and setting a value. The nice thing about […]

JavaScript to remove line number, author, revision columns from Fisheye/Crucible

Fisheye puts a bunch of useful columns in code reviews, but they’re irritating if you want to copy code out, because they get copied too. I’ve found it helpful to create bulk reviews to view patches, where the code is spread across many repositories (CVS + Git + many revisions + many branches, don’t ask). The […]

Proxying HTTP requests with PHP

The following code will proxy requests to an external API. This has several advantages: it gives you control over an API key; it lets you set caching headers to prevent overuse of an API; it prevents issues with cross-domain scripting errors; and it limits the scope of what APIs can be called through your proxy. $query = urlencode($_GET['query']); $url = ''; $url = […]

Identifying important keywords using Lunr.js and the Blekko API

Lunr.js is a simple full-text search engine in JavaScript. Full-text search ranks documents returned from a query by how closely they resemble the query, based on word frequency and grammatical considerations: frequently occurring words have minimal effect, whereas if a rare word occurs in a document several times, it boosts the ranking significantly. This […]
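
To make the ranking idea concrete, here is a minimal TF-IDF sketch in Python. It is not how Lunr.js scores documents internally; it only illustrates the principle described above, that a word appearing in every document contributes nothing while a rare word repeated in one document raises that document's score. The helper names `tokenize` and `rank` are hypothetical.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase the text and split it into words, trimming punctuation."""
    words = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in words if word]


def rank(query, documents):
    """Return (score, doc_index) pairs, best match first, using TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))
    results = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term]:
                # A term found in every document gets idf = log(1) = 0.
                idf = math.log(n_docs / doc_freq[term])
                score += counts[term] * idf
        results.append((score, index))
    return sorted(results, key=lambda pair: (-pair[0], pair[1]))
```

With three short documents that all contain "the", a query for "the pigment" ranks the one document mentioning "pigment" first, and the shared word "the" adds nothing to any score.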


Data Exploration in JavaScript

Google Analytics has a nice screen that shows alerts for changes that appear interesting, basically any large increase or decrease in traffic from a particular source. With appropriate API hooks, this screen could be built for any application that models data in a dimensional fashion, e.g. that uses faceted navigation (like Amazon search), or […]
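
One simple way to approximate that kind of alert is to compare two periods per source and flag large relative changes. The sketch below does this in Python under my own assumptions: the 50% threshold, the input shape (dicts of visit counts keyed by source) and the function name `interesting_changes` are all hypothetical, and this is not how Google Analytics computes its alerts.

```python
def interesting_changes(previous, current, threshold=0.5):
    """Return (source, relative_change) pairs whose change meets the threshold.

    previous and current map a traffic source to its visit count for two
    periods. The 0.5 default (a 50% swing) is an arbitrary assumption.
    """
    alerts = []
    for source in sorted(set(previous) | set(current)):
        before = previous.get(source, 0)
        after = current.get(source, 0)
        if before == 0:
            # No baseline: treat any new traffic as an unbounded increase.
            change = float("inf") if after > 0 else 0.0
        else:
            change = (after - before) / before
        if abs(change) >= threshold:
            alerts.append((source, change))
    # Largest swings first, whether up or down.
    return sorted(alerts, key=lambda item: abs(item[1]), reverse=True)
```

The same comparison could be run once per dimension (source, country, landing page) to surface whichever facet moved the most.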


Visualizing Pigments with D3.js

“In the early 1900s, scientists found that a liquid or solid heated to high temperatures would give off a broad range of colours of light. However, a gas heated to similar temperatures would emit light only at certain specific colours (wavelengths). The reason for this observation was not understood at the time.” [1] I’ve wondered […]