How to Write a Wav File in Scala

The following code will play a chord to a Wav file using Scala. This is adapted from sample code and uses the library here to handle the file format. One of the nice things about Scala is the ease of using existing Java code, since it runs on the JVM. I started this project by […]

Finding Corporate Sponsors of Open Source

I copied about 19,000 git repositories into a full-text solr index. Because commits are tied to email addresses this provides interesting insight into corporate open source contributions. The search front-end I added lets you search for programmers or companies, grouped by the number of commits. For example, searching for Linux returns the following results: linux-foundation […]

Reading the Youtube API from PhantomJS

The following code will retrieve the duration of a video from the Youtube API. The output must be in JSON, because PhantomJS currently doesn’t handle the XML correctly (from email threads it appears this will be fixed in a future release) address = ‘http://gdata.youtube.com/feeds/api/videos/’ + id + ‘?v=2&alt=json’; page.open(address, function (status) { var duration = […]

, ,

Scraping Adsense Ads with PhantomJS

PhantomJS is a headless WebKit, which lets you run Javascript in a browser from the command line. It adds additional API calls which facilitate automated testing, screenshots, and scraping. I thought it would be interesting to write a script to retrieve Adsense destination URLs and text with PhantomJS. Extracting advertisement blocks requires fairly simple CSS […]

Don’t use Access-Control-Allow-Origin

Access-Control-Allow-Origin is an HTTP header that allows servers to specify which hosts may send cross domain AJAX requests. Let’s say you were building an ad network, fetching content via AJAX. You would add this header to HTTP responses, once for each allowed domain. Clearly this is not scalable, but it’s a bad idea for other reasons […]

1/3 of old Flippa website auctions point to abandoned sites

Flippa is an auction site for buying and selling websites as businesses. Browsing the listings shows many low quality products. With careful inspection, there are often interesting, quality listings, but they are swallowed in the noise. Occasionally there are successful e-commerce sites, un-maintained high-traffic developer forums, or fire-sales on start-ups. Often these are educational, but […]

Detecting parked domains

Looking at old Flippa auctions that I scraped, it would be interesting to determine if domains are parked. I found this post, which describes a few options, including checking DNS entries for redirects, finding a blacklist, or content inspection. Some people have built APIs, but none appear maintained, and DNS providers rate limit whois lookups. […]