{"id":1536,"date":"2013-08-02T12:33:22","date_gmt":"2013-08-02T12:33:22","guid":{"rendered":"http:\/\/garysieling.com\/blog\/?p=1536"},"modified":"2020-03-31T00:46:31","modified_gmt":"2020-03-31T00:46:31","slug":"examining-citations-in-federal-law-using-python","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/examining-citations-in-federal-law-using-python\/","title":{"rendered":"Examining Citations in Federal Law using Python"},"content":{"rendered":"<p>Congress frequently passes laws that amend or repeal sections of prior laws; this produces a series of edits that programmers will recognize as resembling source control history.<\/p>\n<p>The concept is simple, but the practice is complex: like source control, for instance, the system must handle renumbering. As we will see below, it is possible to extract some data about links, but difficult to resolve what those links point to.<\/p>\n<p>Here is an example paragraph where the citations, rather than amending a law, justify why several words are absent from one section:<\/p>\n<pre lang=\"xml\">\nIn subsection (b)(3)(C), the words \u201cand the EPA \nAdministrator may prescribe rules for purposes of carrying \nout this subparagraph\u201d are omitted as surplus because of the \nauthority of the Administrator to prescribe regulations\nunder 49:32910(d). The amendment made by section 371(b)(2)\nof the North American Free Trade Implementation Act (<ref href=\"\/us\/pl\/103\/182\">Public Law 103\u2013182<\/ref>, <ref href=\"\/us\/stat\/107\/2128\">107 Stat. 2128<\/ref>) is not given \neffect because the last sentence of section 503(b)(2)(E) of \nthe Motor Vehicle and Cost Savings Act (<ref href=\"\/us\/pl\/92\/513\">Public Law 92\u2013513<\/ref>, <ref href=\"\/us\/stat\/86\/947\">86 Stat. 
947<\/ref>) was omitted in \nthe restatement of title 49 because of the authority of the \nAdministrator to prescribe regulations under 49:32910(d).\n\n<\/pre>\n<p>We can find all of these references by searching the XML documents Congress provides with XPath expressions. Note that there are several forms of citation (&#8220;a href&#8221;, &#8220;ref href&#8221;), so if you want to do something specific you <a href=\"http:\/\/uscodebeta.house.gov\/download\/resources\/USLM-User-Guide.pdf\">should consult the documentation<\/a> (warning: PDF).<\/p>\n<pre lang=\"python\">import os\nimport xml.etree.ElementTree as ET\nfrom collections import Counter\n\n# Namespace the USLM schema uses for its elements\nUSLM = '{http:\/\/xml.house.gov\/schemas\/uslm\/1.0}'\n\nhrefs = Counter()  # href mapped to the number of times it is cited\nlabels = {}        # href mapped to the file and text of its last occurrence\nfor dirpath, dirs, files in os.walk(\".\"):\n  for f in files:\n    if f.endswith('.xml'):\n      # join with dirpath so files in subdirectories are found\n      tree = ET.parse(os.path.join(dirpath, f))\n      for t in tree.findall('.\/\/' + USLM + 'ref'):\n        href = t.attrib.get('href')\n        hrefs[href] += 1\n        labels[href] = f + ' ' + (t.text or '')\n<\/pre>\n<p>This will take a few minutes to run. Once we have the tallies, we can pull out the links that are cited most frequently.<\/p>\n<pre lang=\"python\"># the ten most frequently cited hrefs, as (href, count) pairs\ncounts = hrefs.most_common(10)\n<\/pre>\n<p>And then we can print out the common links, the files they are contained in, and their text. Use Python&#8217;s print function for this, because the text contains many non-ASCII Unicode characters, such as the section symbol.<\/p>\n<pre>\/us\/pl\/85\/726 usc49.xml Public Law 85\u2013726: 287\n\/us\/stat\/72\/731 usc49.xml 72 Stat. 731: 260\n\/us\/act\/1936-06-29\/ch858 usc47.xml act June 29, 1936, ch. 858: 240\n\/us\/act\/1949-06-30\/ch288 usc51.xml act June 30, 1949, ch. 288: 195\n None: 160\n\/us\/act\/1950-05-05\/ch169\/s1 usc50.xml act May 5, 1950, ch. 169, \u00a7\u202f1: 140\n\/us\/pl\/88\/365 usc49.xml Public Law 88\u2013365: 111\n\/us\/stat\/78\/302 usc49.xml 78 Stat. 302: 111\n\/us\/stat\/80\/938 usc50.xml 80 Stat. 
938: 75\n\/us\/pl\/92\/513 usc49.xml Public Law 92\u2013513: 75\n<\/pre>\n<p>Unfortunately, there is currently no easy way to resolve these links. The XML documentation describes a hypothetical resolver that could handle arbitrary links, whether they point within the provided XML, to other laws outside it, or to absolute URLs.<\/p>\n<p>Even if we had such a system, we would still need a way to traverse up the hierarchy. For instance, to answer a query like &#8220;which law or set of laws has been amended the most times,&#8221; we would need to find citations in bills that have passed and filter them to citations that amend prior laws. This may require following a chain of citations to determine what is in effect. Those citations would then need to be rolled up to the level of a statute, which ideally means stepping up a few levels in the XML and finding a nearby element that identifies the section title.<\/p>\n<p>Here, we&#8217;re only looking at U.S. federal law; many other documents contain legal references. If you are interested in this topic, you may want to read up <a href=\"http:\/\/westlawinsider.com\/legal-research\/reference-attorney-tips\/05-06-13\/\">on what KeyCite does<\/a>. All in all, it&#8217;s quite a complex system.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Congress frequently passes laws which amend or repeal sections of prior laws; this produces a series of edits to law which programmers will recognize as bearing resemblance to source control history. In concept this is simple, but in practice this is incredibly complex &#8211; for instance like source control, the system must handle renumbering. 
What &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.garysieling.com\/blog\/examining-citations-in-federal-law-using-python\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Examining Citations in Federal Law using Python&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[5,6],"tags":[335,385,447,495,604,605],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/1536"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=1536"}],"version-history":[{"count":1,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/1536\/revisions"}],"predecessor-version":[{"id":6476,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/1536\/revisions\/6476"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=1536"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=1536"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=1536"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}