Data Exploration in JavaScript

Google Analytics has a nice screen that shows alerts for changes that appear interesting – basically any large increase or decrease in traffic from a particular source. With appropriate API hooks, this screen could be built for any application that models data in a dimensional fashion, e.g. one that uses faceted navigation (like Amazon search), or […]


Visualizing Pigments with D3.js

“In the early 1900s, scientists found that a liquid or solid heated to high temperatures would give off a broad range of colours of light. However, a gas heated to similar temperatures would emit light only at certain specific colours (wavelengths). The reason for this observation was not understood at the time.” [1] I’ve wondered […]


Parsing PDFs at Scale with Node.js, PDF.js, and Lunr.js

Technologies used: Vagrant + VirtualBox, Node.js, node-static, Lunr.js, node-lazy, PhantomJS. Much information is trapped inside PDFs, and if you want to analyze it you’ll need a tool that extracts the text contents. If you’re processing many PDFs (XX millions), this takes time but parallelizes naturally. I’ve only seen this done on the JVM, and decided […]


Full-Text Indexing PDFs in JavaScript

I once worked for a company that sold access to legal and financial databases (as they call it, “intelligent information”). Most court records are PDFs available through PACER, a website developed specifically to distribute court records. Meaningful database products on this dataset require building a processing pipeline that can extract and index text from the […]


Building a Naive Bayes Classifier in the Browser using Map-Reduce

The last decade of JavaScript performance improvements in the browser provides exciting possibilities for distributed computing. Like SETI@home and Folding@home, client-side JavaScript could be used to build a distributed supercomputer, although at the risk of compromising data security and consistency. New HTML5 APIs extend the vast range of JavaScript libraries available; for instance, the audio […]

What I learned from a failed iPad app

About a year ago, I thought I’d build a small news app that played a series of videos from YouTube: a news channel for people without cable. Fortunately, in the meantime, someone built this. There are several interesting possibilities from this – to see news that is not normally available in your country, and […]