{"id":1518,"date":"2013-07-31T20:45:32","date_gmt":"2013-07-31T20:45:32","guid":{"rendered":"http:\/\/garysieling.com\/blog\/?p=1518"},"modified":"2020-03-31T00:46:31","modified_gmt":"2020-03-31T00:46:31","slug":"u-s-code-available-in-xml-format","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/u-s-code-available-in-xml-format\/","title":{"rendered":"U.S. Code Available in XML Format"},"content":{"rendered":"<p>I saw today that the U.S. Code is available <a href=\"http:\/\/uscodebeta.house.gov\/download\/download.shtml\">online now for download<\/a> in a structured format. Ideas for apps, anyone?<\/p>\n<p>It&#8217;s worth noting that this was available in some form already, e.g. through <a href=\"http:\/\/www.law.cornell.edu\/uscode\/text\/50\/chapter-5\/subchapter-I\">Cornell<\/a>. <\/p>\n<p>To give you a taste of what is there, I extracted a few interesting sections. The first part is some metadata. Note that dc:creator is set to USCConverter 1.0, which suggests that they built a custom tool to do the conversion. <\/p>\n<pre lang=\"xml\">\n<meta>\n  <dc:title>Title 50<\/dc:title>\n  <dc:type>USCTitle<\/dc:type>\n  <docNumber>50<\/docNumber>\n  <docPublicationName>Online@113-21<\/docPublicationName>\n  <dc:publisher>OLRC<\/dc:publisher>\n  <dcterms:created>2013-07-26T05:19:57<\/dcterms:created>\n  <dc:creator>USCConverter 1.0<\/dc:creator>\n<\/meta>\n<\/pre>\n<p>Some of the headings are split up in this somewhat odd fashion. <\/p>\n<pre lang=\"xml\">\n<num value=\"50\">Title 50\u2014<\/num><heading>WAR AND NATIONAL DEFENSE<\/heading>\n<\/pre>\n<p>What I found most interesting is the ease with which you can extract citations. They seem to have used a GUID (or similar) structure for IDs. These have the nice property of making it easy to generate unique identifiers, without cross-referencing other locations. 
The downside is that they are not &#8220;natural&#8221; keys, meaning it looks like you can&#8217;t infer anything about where you are from the ID alone.<\/p>\n<p>An open question to investigate &#8211; how much text would have to change over time in a section to trigger a new ID? This also points out the lack of historical information &#8211; you have to manually get old versions of this text. Even armed with this citation information, you need access to court records, as these clarify, amend, or remove sections of law. I&#8217;d also be interested to know how \/ when the text in these documents is updated if a court strikes down a law. Right now, the only way I know of to get access to court records easily is through PACER (or RECAP, which has an archive of a small fraction).<\/p>\n<pre lang=\"xml\">\n<subsection class=\"indent0\" id=\"idd035386a-f63e-11e2-8470-abc29ba29c4d\"\n identifier=\"\/us\/usc\/t50\/s3618\/e\">\n<num value=\"e\">(e)<\/num>\n<heading> Crediting of amounts collected<\/heading>\n<content>\n<p style=\"-uslm-lc:I11\" class=\"indent0\">Amounts collected under this \nsection shall be credited to the account or accounts from which costs \nassociated with such amounts have been or will be incurred, to reimburse \nor offset the direct costs of the program referred to in subsection (a).<\/p>\n<\/pre>\n<p>Another selection &#8211; the interesting bit to me here is that they&#8217;ve included all sorts of random formatting information. You could probably use this to train an NLP algorithm to extract some form of context, but it would be a lot of work.<\/p>\n<pre lang=\"xml\">\n<tr style=\" -uslm-lc:II01; \">\n<td style=\" text-align:left; vertical-align:top; \nborder-right:1px solid black; padding-right:2pt;\">\n<p style=\" text-align:left; text-indent: -1em; padding-left:1em;\">\n401 note \n(<a href=\"\/us\/act\/1947-07-26\/ch343\">Act July 26, 1947, ch. 343<\/a>,\ntitle III, \u00a7\u202f310, \n<a href=\"\/us\/stat\/61\/509\">61 Stat. 
509<\/a>)\n<\/p>\n<\/td>\n<td style=\" text-align:left; vertical-align:top; border-left:1px \nsolid black; padding-left: 2pt;\">\n<p style=\" text-align:left; text-indent: -1em; padding-left:1em;\">3077<\/p>\n<\/td>\n<\/tr>\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I saw today that the U.S. Code is available online now for download in a structured format. Ideas for apps, anyone? It&#8217;s worth noting that this was available in some form already, e.g. through Cornell. To give you a taste of what is there, I extracted a few interesting sections. The first part is some &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.garysieling.com\/blog\/u-s-code-available-in-xml-format\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;U.S. Code Available in XML Format&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[5,6],"tags":[335,385],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/1518"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=1518"}],"version-history":[{"count":1,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/1518\/revisions"}],"predecessor-version":[{"id":6505,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/1518\/r
evisions\/6505"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=1518"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=1518"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=1518"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}