U.S. Laws vs. The Human Genome

Since you can download the U.S. Code, I thought it would be interesting to compare the size to that of the Human Genome, operating on the premise that the latter represents the DNA for a living thing, and the former, the DNA for a nation.

I’ve charted this below. To reproduce it, plot the size of the compressed file for each genome. Using the compressed rather than the uncompressed form means that the numbers approximate the amount of unique information encoded in a file, rather than counting superfluous data like whitespace and repetitive symbols (uncompressed, the U.S. Code would look quite large by comparison, at ~486 MB).
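As a rough sketch of why compressed size is the fairer measure, the snippet below compares raw and compressed byte counts for two made-up inputs of the same length. It assumes gzip, which may not be the format of the files behind the chart, and the sample data and the `compressed_size` helper are hypothetical. Compressed size is only a proxy for information content, not an exact measure.

```python
import gzip
import os


def compressed_size(data: bytes) -> int:
    """Return the number of bytes data occupies after gzip compression."""
    return len(gzip.compress(data))


# Hypothetical inputs, both exactly 1,000,000 bytes long.
repetitive = b"ACGT" * 250_000        # the same four letters over and over
random_like = os.urandom(1_000_000)   # random bytes, essentially incompressible

for name, data in [("repetitive", repetitive), ("random", random_like)]:
    print(name, len(data), "bytes raw,", compressed_size(data), "bytes compressed")
```

The repetitive input shrinks to a small fraction of its raw size, while the random one barely changes, even though both look identical if you only count uncompressed bytes.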

Here, we see the sizes of genomic data for many species:

figure_1

If we zoom in on the left, we can add the U.S. Code:

figure_2

It looks like it’s near in size to a few types of fish – if you could obtain relevant state laws, this would likely jump quite a bit (my home state of Pennsylvania does not easily allow downloading a copy of all the laws at once).
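To check the "near in size to a few types of fish" reading without squinting at the chart, here is a minimal sketch that ranks entries by how close they are to the U.S. Code. The values are a subset of the compressed sizes (in MB) listed in this post; the `nearest` helper is a hypothetical name introduced only for illustration.

```python
# Subset of the compressed sizes (MB) from this post's data.
sizes = {
    "US Federal Code": 84,
    "Tetraodon": 98,
    "Fugu": 107,
    "Cod": 238,
    "Lamprey": 238,
    "Tilapia": 261,
    "Zebrafish": 355,
    "Yeast": 2.9,
}


def nearest(sizes, target, n=2):
    """Return the n names whose size is closest to that of target."""
    ref = sizes[target]
    others = [name for name in sizes if name != target]
    return sorted(others, key=lambda name: abs(sizes[name] - ref))[:n]


print(nearest(sizes, "US Federal Code"))
```

With this subset, the two closest entries are both pufferfish.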

Should you wish to access this data or reproduce my results, it is available as a simple Python script:

sizes = {"Lizard": 492,
"Human": 778,
"Alpaca": 738,
"Armadillo": 902,
"Cod": 238,
"Baboon": 826,
"Budgerigar": 312,
"Bushbaby": 630,
"Cat": 615,
"Chicken": 296,
"Chimp": 823,
"Coelacanth": 796,
"Cow": 745,
"Dog": 603,
"Dolphin": 646,
"Elephant": 800,
"Ferret": 603,
"Fugu": 107,
"Gibbon": 737,
"Gorilla": 758,
"Hedgehog": 901,
"Kangaroo": 545,
"Lamprey": 238,
"Manatee": 775,
"Marmoset": 727,
"Ground Finch": 305,
"Megabat": 500,
"Microbat": 507,
"Mouse": 682,
"Lemur": 722,
"Naked Mole-rat": 653,
"Tilapia": 261,
"Monodelphis domestica": 907,
"Painted Turtle": 714,
"Panda": 577,
"Pig": 702,
"Pika": 844,
"Rabbit": 682,
"Rat": 725,
"Rhesus": 743,
"Rock Hyrax": 751,
"Sheep": 718,
"Shrew": 796,
"Sloth": 627,
"Squirrel Monkey": 652,
"Tasmanian Devil": 920,
"Tenrec": 947,
"Tetraodon": 98,
"Tree Shrew": 908,
"Turkey": 257,
"Wallaby": 787,
"Rhino": 613,
"US Federal Code": 84,
"Zebrafish": 355,
"Yeast": 2.9}


import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()

# Sort the labels by size so the bars run from smallest to largest
keys = sorted(sizes, key=sizes.get)
values = [sizes[k] for k in keys]
ind = np.arange(len(sizes))

plt.bar(ind, values)

plt.ylabel("Megabytes (Compressed)")

# Bars are centered on their x positions (the default since matplotlib 2.0),
# so the tick labels go directly at ind
plt.xticks(ind, keys)

# Rotate the labels so they do not overlap
fig.autofmt_xdate(rotation=90)

plt.show()

# Second chart: the same bars, with the U.S. Code highlighted in red
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(111)

# Sort the labels by size so the bars run from smallest to largest
keys = sorted(sizes, key=sizes.get)
values = [sizes[k] for k in keys]
ind = np.arange(len(sizes))

bars = ax1.bar(ind, values, color='blue', edgecolor='black')

# Look the bar up by name rather than assuming its position in the sorted order
bars[keys.index("US Federal Code")].set_facecolor('red')

ax1.set_ylabel("Megabytes (Compressed)")

ax1.set_xticks(ind)
ax1.set_xticklabels(keys, rotation=90)

# Leave room at the bottom for the rotated labels
plt.subplots_adjust(left=0.125, right=0.9, top=0.9, bottom=0.2, wspace=0.2, hspace=0.2)

plt.show()