A brief introduction to Weka

Weka is a GPL-licensed data mining tool written in Java, published by the University of Waikato. It includes an extensive collection of pre-implemented machine learning algorithms, including well-known classification and clustering algorithms. If you’ve ever been curious how Bayes’ Theorem works, this is a great tool for getting up and running. Weka uses a […]

Expert Search Statistics

The following are some interesting statistics about the GitHub expert-finder.
Unique repositories: 18,977
Source git repos: 250+ GB
Solr index size: 3.2 GB
Time to build index: ~12 hours spread over several days (had to restart the indexer several times)
Number of commits: 4,579,236

Advertisers used by banned sellers in Flippa auctions

In a previous post, I listed the top Flippa advertisers, gathered with the Node.js web scraper. Which advertisers are mentioned most often in auctions by banned sellers? As you can see, there is a big drop in the “unknown” category, and a big increase in banned accounts associated with Infolinks and CJ. After visual inspection, […]

ExtJS TreePanel Example

Problem: You want to display a file-manager-style tree grid.
Solution: Use the Ext.chart.Chart, and set several properties under the “series” property to render a pie chart.
Discussion: The official ExtJS documentation shows how to build a tree grid from an array, which I don’t find terribly helpful. More realistic applications require JSON […]


Diagnosing Connection Leaks in Node.js and Postgres

While building a website scraper with Chrome and Node.js, I made mistakes that led to connection leaks. In this application, the scraper runs in a browser and connects to a Node.js server, which saves data to a database. Once you know what the issues look like, they are easy to see, but otherwise often difficult […]

Advertisers referenced in Flippa auctions

The following data show the revenue-generating partners most commonly mentioned in Flippa auctions. The data come from approximately 76,000 auctions representing over 58,000 domains – note that in many cases more than one advertiser is mentioned per site. This is a broad cross-section of revenue generation strategies: CPC, CPA, affiliate sales, link sales, etc. […]