Entries by Gary


Counting Citations in U.S. Law

The U.S. Congress recently released a series of XML documents containing U.S. laws. The structure of these documents allows us to find which sections of the law are most commonly cited. Examining which citations occur most frequently allows us to see what Congress has spent the most time thinking about. Citations occur for many reasons: […]

Examining Citations in Federal Law using Python

Congress frequently passes laws that amend or repeal sections of prior laws; this produces a series of edits that programmers will recognize as resembling source control history. The concept is simple, but the practice is incredibly complex: for instance, like source control, the system must handle renumbering. What […]

U.S. Laws vs. The Human Genome

Since you can download the U.S. Code, I thought it would be interesting to compare its size to that of the human genome, operating on the premise that the latter represents the DNA for a living thing, and the former, the DNA for a nation. I’ve charted this below; to reproduce this you need […]

U.S. Code Available in XML Format

I saw today that the U.S. Code is available online now for download in a structured format. Ideas for apps, anyone? It’s worth noting that this was available in some form already, e.g. through Cornell. To give you a taste of what is there, I extracted a few interesting sections. The first part is some […]

Converting JSON to a CSV file with Python

In a previous post, I showed how to extract data from the Google Maps API, which leaves a series of JSON files, like this: {"address_components": [{"long_name":"576","short_name":"576","types":["street_number"]}, {"long_name":"Concord Road","short_name":"Concord Road","types":["route"]},{"long_name":"Glen Mills","short_name":"Glen Mills","types":["locality","political"]},{"long_name":"PA","short_name":"PA","types":… Ideally we want selections from these as a CSV for manual review and for import into mapping software. First, we load a list of files: […]
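
The excerpt cuts off before the post's own code, so here is only a minimal sketch of the idea it describes, not the author's implementation. It assumes each record looks like the sample above, uses just the three complete components shown there, and writes to an in-memory buffer rather than to files. The names `FIELDS`, `component`, `to_row`, and `write_csv` are hypothetical, chosen for illustration.

```python
import csv
import io
import json

# Assumed column choice: the component types fully visible in the excerpt.
FIELDS = ["street_number", "route", "locality"]


def component(result, kind):
    """Return the long_name of the first address component tagged `kind`, or ''."""
    for part in result.get("address_components", []):
        if kind in part.get("types", []):
            return part.get("long_name", "")
    return ""


def to_row(result):
    """Flatten one geocoding record into a dict keyed by FIELDS."""
    return {kind: component(result, kind) for kind in FIELDS}


def write_csv(results, out):
    """Write one CSV row per record to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for result in results:
        writer.writerow(to_row(result))


# Sample record trimmed to the complete components from the excerpt.
sample = json.loads("""{"address_components": [
  {"long_name": "576", "short_name": "576", "types": ["street_number"]},
  {"long_name": "Concord Road", "short_name": "Concord Road", "types": ["route"]},
  {"long_name": "Glen Mills", "short_name": "Glen Mills", "types": ["locality", "political"]}
]}""")

buffer = io.StringIO()
write_csv([sample], buffer)
print(buffer.getvalue())
```

To work from files on disk instead, one could pass an open file (created with `newline=""`, as the `csv` module recommends) to `write_csv` and build the record list with `json.load` over each input file.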

Generating Randomized Sample Data in Python

If you have access to a production data set, it is helpful to generate testing data that follows a similar format, in varying quantities. By introspecting a database, we can identify stated constraints. Given sufficient data volume, we can also infer implicit business process constraints. If preferred, we can also find records that may generate […]