Testing ETL Processes

ETL (“extract, transform, load”) projects come in many shapes, sizes, and product types, and go by many names: “data migration” projects, business intelligence software, analytics, reporting, scraping, database upgrades, and so on. I’ve collected some notes that attempt to classify these projects by their principal attributes, so that you can estimate the risks and plan the […]


My First A/B Test… with Results

A/B testing gets a lot of attention on Hacker News, inbound.org, and other forums, and it appeals to me as a data analysis exercise. As a software engineer with a practical bent, I like data analysis techniques that produce useful results while treating a system as a black box. This stands in contrast […]


Lessons Learned from 0 to 40,000 Readers

Starting Out

I started writing a little over a year ago, after finding “Technical Blogging” by Antonio Cangiano through Hacker News. Since then, a bit over 40,000 people have read articles I’ve written. That is not a huge number in the grand scheme of things, but it is enough to draw a few lessons. The more I write, the […]

Scala vs. Clojure

LinkedIn shows 42% growth (year over year?) in people claiming Clojure as a skill, beating out Scala’s 9%, a surprising feat for a Lisp variant. It turns out LinkedIn’s default view is misleading: Scala shows more new adopters (2.4k) than Clojure (1.5k). LinkedIn refuses to show counts for Java, but check out the number […]

Job Title Trends in Computing Fields

The Bureau of Labor Statistics publishes a list of job titles with average salaries, number of jobs, and projections. Its taxonomy sorts people into 750 job title categories, some of them oddly grouped. Few categories are projected to decline, particularly among job types even vaguely related to IT. There are a few exceptions, […]

Top Four Proposal Software Applications

When you run your own business, everything takes time, but one of the most critical areas to spend your time on is writing proposals. Proposals are key not only to getting work, but to getting the work you want at the right price. Using proposal writing software is one way to cut down on the […]


Data Warehousing, NoSQL, and the Cloud

With the advent of NoSQL, cloud computing, and slick new databases, we seem to have forgotten where we came from. I recently attended a conference on the open source search product Solr/Lucene. One of the keynote speakers, the Chief Data Scientist of Hortonworks, discussed what turned him to NoSQL databases, in this case, a […]

1/3 of old Flippa website auctions point to abandoned sites

Flippa is an auction site for buying and selling websites as businesses. Browsing the listings turns up many low-quality products. Careful inspection often reveals interesting, high-quality listings, but they are drowned out by the noise. Occasionally there are successful e-commerce sites, unmaintained high-traffic developer forums, or fire sales on startups. Often these are educational, but […]

Advertisers used by banned sellers in Flippa auctions

In a previous post, I listed the top Flippa advertisers, collected with the Node.js web scraper. Which advertisers are mentioned most often in auctions by banned sellers? As you can see, there is a big drop in the “unknown” category and a big increase in banned accounts associated with Infolinks and CJ. After visual inspection, […]