Posts

Parsing Code in a Search Engine with Node.js

In my last article I presented a simple tool for discovering senior developers from source code history, which imports Git history into a search engine. This lets you run searches like “who is our SharePoint expert” or “who fixed all our Oracle problems.” There are a few code search products that handle similar problems, e.g. […]

Functional Programming Patterns in Four Popular Javascript Libraries

I generally find discussions of design patterns a bit dry, but in testing new Javascript libraries, I’ve stumbled across some interesting tactics. Object-oriented design patterns are typically not a perfect fit for Javascript, given its untyped nature. The language lends itself more to powerful functional programming techniques. I find studying libraries particularly helpful, as […]

Building a full-text index of git commits using lunr.js and Github APIs

Github has a nice API for inspecting repositories – it lets you read gists, issues, commit history, files and so on. Git repository data lends itself to demonstrating the power of combining full text and faceted search, as there is a mix of free text fields (commit messages, code) and enumerable fields (committers, dates, committer […]

Full-Text Indexing PDFs in Javascript

I once worked for a company that sold access to legal and financial databases (as they call it, “intelligent information”). Most court records are PDFs available through PACER, a website developed specifically to distribute court records. Meaningful database products on this dataset require building a processing pipeline that can extract and index text from the […]