{"id":6729,"date":"2023-01-29T17:47:39","date_gmt":"2023-01-29T17:47:39","guid":{"rendered":"https:\/\/www.garysieling.com\/blog\/?p=6729"},"modified":"2023-01-29T17:48:12","modified_gmt":"2023-01-29T17:48:12","slug":"geocoding-with-python","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/geocoding-with-python\/","title":{"rendered":"Geocoding with Python"},"content":{"rendered":"\n<p>The script below will take a CSV containing locations and find them.<\/p>\n\n\n\n<p>It has the following features:<\/p>\n\n\n\n<ul><li>You can swap out geocoding services<\/li><li>For each run, it will look at the results of the past run as a cache<\/li><li>For runs that hit an API, it will pause briefly to avoid rate limiting<\/li><li>It caches the results as it goes, so that you do not hit an API too many times<\/li><li>It prints out stats on how many rows failed to find matches, so that you can correct the input<\/li><li>It handles blank rows, so you can copy a column from a partially populated spreadsheet and paste the results back into the original without alignment issues<\/li><\/ul>\n\n\n\n<p>Usage:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python3 geocode.py input.csv output.csv address<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>from geopy.geocoders import Nominatim\nfrom functools import cache\nimport sys\nimport csv\nimport time\nimport os\n\ninputfile = sys.argv&#91;1]\noutputfile = sys.argv&#91;2]\ninputlocation = sys.argv&#91;3]\n\nprint(\"Reading: \" + inputfile)\nprint(\"Writing: \" + outputfile)\nprint(\"Address Column: \" + inputlocation)\n\ngeolocator = Nominatim(user_agent=\"tracker\")\n\nclass stats:\n    lookups = 0\n    totalrows = 0\n    skips = 0\n    unknowns = 0\n    successes = 0\n    priorruncachehits = 0\n\n@cache\ndef lookup(value):\n    stats.lookups = stats.lookups + 1\n    return geolocator.geocode(value, 
language=\"en\")\n\nprevious = {}\nif (os.path.exists(outputfile)):\n    with open(outputfile, newline='') as f:\n        reader = csv.DictReader(f)\n\n        for row in reader:\n            locationvalue = row&#91;inputlocation]\n            resolved = row&#91;'resolved_' + inputlocation]\n\n            if (resolved != None and resolved != \"\"):\n                print('resolved: ' + resolved + \" \"  + str(row))\n                cachedlocation = {}\n                cachedlocation&#91;inputlocation] = locationvalue\n                cachedlocation&#91;'address'] = resolved\n                cachedlocation&#91;'latitude'] = row&#91;'latitude']\n                cachedlocation&#91;'longitude'] = row&#91;'longitude']\n\n                previous&#91;locationvalue] = cachedlocation\nwith open(outputfile, 'w+', newline='') as f:\n    writer = csv.writer(f)\n\n    headers = &#91;inputlocation, 'resolved_' + inputlocation, 'latitude', 'longitude']\n\n    writer.writerow(headers)\n    with open(inputfile, newline='') as csvfile:\n        reader = csv.DictReader(csvfile)\n        for row in reader:\n            stats.totalrows = stats.totalrows + 1\n            locationvalue = row&#91;inputlocation]\n\n            print(locationvalue)\n            if (locationvalue != \"\"):\n                if (locationvalue in previous.keys()):\n                    stats.priorruncachehits = stats.priorruncachehits + 1\n                    writer.writerow(&#91;locationvalue, previous&#91;locationvalue]&#91;'address'], previous&#91;locationvalue]&#91;'latitude'], previous&#91;locationvalue]&#91;'longitude']])\n                else:\n                    location = lookup(locationvalue)\n                    if (location == None):\n                        stats.unknowns = stats.unknowns + 1\n                        print(\"Failed to find \" + locationvalue)\n                        writer.writerow(&#91;locationvalue, \"\", \"\", \"\"])\n                    else:\n                        
stats.successes = stats.successes + 1\n                        print(location)\n                        writer.writerow(&#91;locationvalue, location.address, location.latitude, location.longitude])\n                        time.sleep(1)\n            else:\n                stats.skips = stats.skips + 1\n                writer.writerow(&#91;\"\", \"\", \"\", \"\"])\n\n\nprint(\"rows: \" + str(stats.totalrows))\nprint(\"successes: \" + str(stats.successes))\nprint(\"lookups: \" + str(stats.lookups))\nprint(\"blank lines: \" + str(stats.skips))\nprint(\"unknowns: \" + str(stats.unknowns))\nprint(\"cache hits from prior run: \" + str(stats.priorruncachehits))<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The script below will take a CSV containing locations and find them. It has the following features: You can swap out geocoding services For each run, it will look at the results of the past run as a cache For runs that hit an API, it will pause briefly to avoid rate limiting It caches &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.garysieling.com\/blog\/geocoding-with-python\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Geocoding with 
Python&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[4,6],"tags":[152,634,632,447],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/6729"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=6729"}],"version-history":[{"count":3,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/6729\/revisions"}],"predecessor-version":[{"id":6732,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/6729\/revisions\/6732"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=6729"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=6729"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=6729"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}