Finding Parties Named in U.S. Law using Python and NLTK

U.S. Law periodically names specific institutions; historically it has been possible for Congress to write a law naming an individual, although I think that has become less common. I expected the most common entities named in Federal Law to be groups like Congress. It turns out this is true, but the other most common entities are […]

Examining Citations in Federal Law using Python

Congress frequently passes laws that amend or repeal sections of prior laws; this produces a series of edits to the law that programmers will recognize as resembling source control history. In concept this is simple, but in practice it is incredibly complex: for instance, like source control, the system must handle renumbering. What […]

U.S. Code Available in XML Format

I saw today that the U.S. Code is now available online for download in a structured format. Ideas for apps, anyone? It’s worth noting that this was already available in some form, e.g. through Cornell. To give you a taste of what is there, I extracted a few interesting sections. The first part is some […]


Part of Speech Tagging: NLTK vs Stanford NLP

One of the difficulties inherent in machine learning techniques is that the most accurate algorithms refuse to tell a story: we can discuss the confusion matrix, testing and training data, accuracy and the like, but it’s often hard to explain in simple terms what’s really going on. Practically speaking this isn’t a big issue from […]

Exploring Zipf’s Law with Python, NLTK, SciPy, and Matplotlib

Zipf’s Law states that the frequency of a word in a corpus of text is inversely proportional to its rank; it was first noticed in the 1930s. Unlike a “law” in the sense of mathematics or physics, this is purely an empirical observation, without a strong explanation that I can find of the causes. We can explore this concept […]
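
As a rough illustration of the rank/frequency relationship described in the Zipf’s Law excerpt, here is a minimal sketch. It uses only the Python standard library rather than NLTK, SciPy, or Matplotlib, and the `rank_frequency` helper and the sample sentence are hypothetical names and data made up for this example, not taken from the post.

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Lowercase the text and pull out runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # most_common() sorts by descending count; rank 1 is the top word.
    counts = Counter(words).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts, start=1)]


# Hypothetical toy input; a real test of Zipf's Law needs a large corpus.
sample = "the cat and the dog and the bird saw the fox"
table = rank_frequency(sample)

for rank, word, count in table:
    # If frequency is inversely proportional to rank, rank * count
    # should stay roughly constant across the top-ranked words.
    print(rank, word, count, rank * count)
```

On a text this small the pattern is only suggestive; the product of rank and count tends to level out only over a corpus with many thousands of words.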