“Comparing Hosted Database Performance” on blog

One of my articles, Comparing Hosted Database Performance, was featured on the blog. It discusses a technique for comparing the quality of hosted databases. Since providers' offerings differ widely, I've run a series of tests that generate a sort of fingerprint of an environment, which can be compared before and after a migration.
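The article's actual test suite isn't reproduced in this excerpt, so here is a minimal Python sketch of the fingerprinting idea under stated assumptions. It uses a DB-API connection (an in-memory SQLite database stands in for a hosted one), and the `probe` table and the three queries are hypothetical. Each query is timed several times, the median is kept as that probe's entry in the fingerprint, and two fingerprints are compared as per-probe ratios.

```python
import sqlite3
import statistics
import time
from typing import Callable, Dict

# Hypothetical probes; a real fingerprint would use queries representative of your workload.
PROBES: Dict[str, Callable[[sqlite3.Connection], object]] = {
    "indexed_lookup": lambda c: c.execute("SELECT val FROM probe WHERE id = 5000").fetchone(),
    "full_scan": lambda c: c.execute("SELECT COUNT(*) FROM probe WHERE val LIKE '%9%'").fetchone(),
    "sort": lambda c: c.execute("SELECT val FROM probe ORDER BY val DESC LIMIT 10").fetchall(),
}


def _setup(conn: sqlite3.Connection) -> None:
    # Recreates a scratch table named "probe"; only run this against a throwaway database.
    conn.execute("DROP TABLE IF EXISTS probe")
    conn.execute("CREATE TABLE probe (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO probe (val) VALUES (?)", ((str(i),) for i in range(10_000)))
    conn.commit()


def fingerprint(conn: sqlite3.Connection, repeats: int = 20) -> Dict[str, float]:
    """Return the median wall-clock time in seconds for each probe."""
    _setup(conn)
    result: Dict[str, float] = {}
    for name, probe in PROBES.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            probe(conn)
            timings.append(time.perf_counter() - start)
        result[name] = statistics.median(timings)  # median damps one-off latency spikes
    return result


def compare(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Ratio after/before per probe; values above 1.0 mean the probe got slower."""
    return {
        name: after[name] / before[name]
        for name in before
        if name in after and before[name] > 0
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    before = fingerprint(conn)
    after = fingerprint(conn)  # in practice: a connection to the new environment
    for name, ratio in compare(before, after).items():
        print(f"{name}: {ratio:.2f}x")
```

Using medians of repeated runs rather than a single timing makes the fingerprint less sensitive to noisy neighbours on shared hosting, which matters when the goal is to tell two environments apart.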

Visualizing Six Million Files and Folders

Each year there are nearly 300,000 of these in Federal Civil Court and 1.3-1.6 million in Federal Bankruptcy Court, but this pales in comparison to state courts, which accept just over 100 million cases each year. Even a small extract of these takes up a fair amount of space: This is what a court docket […]


Building a Directory Structure Index in Python

I'm working through examples in "Natural Language Processing with Python" (read my review) and found that the corpus I have to work with is large enough to require some performance tuning. If you have a large enough directory structure, it becomes difficult to walk with os.walk; for instance, any failure in longer scripts […]
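The post's own index format is cut off in this excerpt, so the following is only a sketch of the general idea, assuming a tab-separated index file with one `size<TAB>path` line per file (a hypothetical format). The tree is walked once with `os.scandir` and an explicit stack, unreadable entries are skipped instead of aborting the run, and later scripts read the index instead of walking the tree again.

```python
import os
from typing import Iterator, List, Tuple


def iter_files(root: str) -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for each regular file under root, without recursion."""
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            yield entry.path, entry.stat(follow_symlinks=False).st_size
                    except OSError:
                        continue  # entry vanished or is unreadable; skip it
        except OSError:
            continue  # directory is unreadable; keep walking the rest of the tree


def build_index(root: str, index_path: str) -> int:
    """Write the index to index_path (keep it outside root) and return the file count."""
    count = 0
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as out:
        for path, size in iter_files(root):
            out.write(f"{size}\t{path}\n")  # assumes paths contain no tabs or newlines
            count += 1
    os.replace(tmp_path, index_path)  # swap in the finished index in one step
    return count


def load_index(index_path: str) -> List[Tuple[str, int]]:
    """Read the index back as a list of (path, size) pairs."""
    result = []
    with open(index_path, encoding="utf-8") as f:
        for line in f:
            size, path = line.rstrip("\n").split("\t", 1)
            result.append((path, int(size)))
    return result
```

For example, `build_index("/data/corpus", "/tmp/corpus.tsv")` walks the tree once (both paths are placeholders), and subsequent scripts call `load_index("/tmp/corpus.tsv")`. A crash in one of those scripts then costs only a file read, not another full directory walk.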


Scraping Adsense Ads with PhantomJS

PhantomJS is a headless WebKit, which lets you run JavaScript in a browser from the command line. It adds API calls that facilitate automated testing, screenshots, and scraping. I thought it would be interesting to write a script to retrieve Adsense destination URLs and text with PhantomJS. Extracting advertisement blocks requires fairly simple CSS […]

Diagnosing Disk I/O Issues in a VPS

Every so often, my Linode goes into a state of apparently frantic I/O. Page loads slow down a bit, and I get regular email alerts indicating a potential problem:

Subject: Linode Alert – disk io rate
Your Linode, linode90147, has exceeded the notification threshold (800) for disk io rate by averaging 2146.05 for the last […]
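The diagnosis steps themselves are truncated here, so here is one possible starting point, not necessarily the approach the post takes. This Python sketch reads the Linux `/proc/diskstats` file twice and reports completed reads and writes per second for each device. It assumes the standard field layout: device name in the third column, reads completed in the fourth, writes completed in the eighth. Linode's alert does not define its "disk io rate" metric in this excerpt, so treat these per-device numbers as a rough cross-check and not the same measurement.

```python
import time
from typing import Dict, Tuple


def parse_diskstats(text: str) -> Dict[str, Tuple[int, int]]:
    """Map device name -> (reads completed, writes completed) from /proc/diskstats text."""
    stats: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # not a device line in the expected format
        stats[fields[2]] = (int(fields[3]), int(fields[7]))
    return stats


def io_rates(before: Dict[str, Tuple[int, int]],
             after: Dict[str, Tuple[int, int]],
             interval: float) -> Dict[str, Tuple[float, float]]:
    """Per-device (reads/s, writes/s) between two snapshots taken interval seconds apart."""
    rates: Dict[str, Tuple[float, float]] = {}
    for dev, (r0, w0) in before.items():
        if dev in after:
            r1, w1 = after[dev]
            rates[dev] = ((r1 - r0) / interval, (w1 - w0) / interval)
    return rates


def sample(interval: float = 5.0, path: str = "/proc/diskstats") -> Dict[str, Tuple[float, float]]:
    """Take two snapshots interval seconds apart and return per-device rates."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return io_rates(before, after, interval)


if __name__ == "__main__":
    for dev, (reads, writes) in sorted(sample().items(), key=lambda kv: sum(kv[1]), reverse=True):
        if reads or writes:
            print(f"{dev}: {reads:.1f} reads/s, {writes:.1f} writes/s")
```

Splitting reads from writes helps narrow down the culprit. A burst dominated by reads often points to memory pressure or swapping, while a write-heavy one points more toward logging or database activity. Confirm either reading with a per-process tool before acting on it.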