Scraping Google Maps Search Results with JavaScript and PHP

Google Maps provides several useful APIs for accessing data: a geocoding API to convert addresses to latitude and longitude, a search API to provide locations matching a term, and a details API for retrieving location metadata.

For many mapping tasks it is useful to get a large list of locations (restaurants, churches, etc.). Because this data is valuable, Google rate-limits access to it and encourages caching query results.

You can load a specific area of a map. The easiest way to find a starting latitude and longitude is to run an address through a geocoding API:

map = new google.maps.Map(document.getElementById('map-canvas'), {
  mapTypeId: google.maps.MapTypeId.ROADMAP,
  center: new google.maps.LatLng(curLat, curLong),
  zoom: 15,
  styles: [
    {
      stylers: [
        { visibility: 'simplified' }
      ]
    },
    {
      elementType: 'labels',
      stylers: [
        { visibility: 'off' }
      ]
    }
  ]
});
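
One way to get curLat and curLong for the center above is the Maps JavaScript API's google.maps.Geocoder. The sketch below is a minimal example under stated assumptions: findStartingPoint is a hypothetical helper name, the geocoder is passed in as an argument so the function can be exercised with a stub, and the first geocoding result is assumed to be the one you want.

```javascript
// Hypothetical helper: resolve an address to a starting { lat, lng } pair.
// `geocoder` is expected to behave like `new google.maps.Geocoder()`.
function findStartingPoint(geocoder, address, done) {
  geocoder.geocode({ address: address }, function (results, status) {
    // The API reports success with the status string 'OK'.
    if (status !== 'OK' || !results || results.length === 0) {
      done(new Error('Geocoding failed: ' + status), null);
      return;
    }
    // Assumption: the first match is the intended location.
    var location = results[0].geometry.location;
    done(null, { lat: location.lat(), lng: location.lng() });
  });
}
```

In the browser this might be called as findStartingPoint(new google.maps.Geocoder(), 'Philadelphia, PA', ...), with the resulting lat and lng feeding curLat and curLong.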

To run a search, you can use the radarSearch API, which appears to return up to 200 results. However, it only returns latitudes and longitudes, not place names or anything else you’d really want in a full application. In the code below, service is a google.maps.places.PlacesService instance created for the map.

google.maps.event.addListenerOnce(map, 'bounds_changed', performSearch);

function performSearch() {
  var request = {
    bounds: map.getBounds(),
    keyword: 'church'
  };
  service.radarSearch(request, callback);
}

Once the search finishes, it runs a callback. In the callback we save the results so far and set up timers to fetch the full details of each entity. I determined experimentally that the Maps API won’t let you run a query more than once every two seconds. The timers add a little extra lag because I’d rather the script continue than risk an error from firing slightly too soon.

function callback(results, status) {
  for (var i = 0, place; place = results[i]; i++) {
    createMarker(place);

    // Stagger the detail lookups 2.2 seconds apart
    setTimeout(loadPlace, 2200 * i);
  }
}
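
The staggered setTimeout calls above amount to a fixed-interval throttle. A minimal sketch of the same idea as a reusable helper follows. scheduleSpaced and its setTimer parameter are hypothetical names introduced here (the timer is injectable only so the spacing can be checked without waiting), and the 2200 ms gap is the experimentally chosen value from this post, not a documented limit.

```javascript
// Hypothetical helper: run `task` once per item, `gapMs` apart.
// `setTimer` defaults to setTimeout; it is a parameter only for testing.
function scheduleSpaced(items, task, gapMs, setTimer) {
  var timer = setTimer || setTimeout;
  items.forEach(function (item, i) {
    // Item 0 is scheduled with no delay, item 1 after one gap, and so on.
    timer(function () { task(item); }, gapMs * i);
  });
  // Delay until the last task starts, useful for scheduling the next step.
  return items.length === 0 ? 0 : gapMs * (items.length - 1);
}
```

Returning the delay of the last task makes it easy to schedule whatever comes after the batch without recomputing the spacing.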

Each “place” is hydrated using the getDetails function on the maps API, then saved back to a server:

function loadPlace() {
  var place = places[placeIdx++];

  service.getDetails(place,
    function(result, status) {
      if (status !=
          google.maps.places.PlacesServiceStatus.OK) {
        next(); // keep the scan moving even if this lookup failed
        return;
      }
      // Save the full place record on the server
      $.post(
        "save.php",
        {text: JSON.stringify(result)},
        function() {
          next();
        });
    });
}

This requires a simple PHP file. The saved results can be extracted later or used as a cache.

$text = $_POST['text'];
$json = json_decode($text, true);

$id = md5($text);
file_put_contents('db/' . $id, $text);

Up to this point, we can only script a specific segment of a map. In reality we likely want to loop back and forth across an area. I found a bounding box that covers Philadelphia and the surrounding counties reasonably well by experiment, loading the map in several areas until I found good edges.

Interestingly, Google Maps does not seem to use the same scale for latitude and longitude: I found one map-sized box to span about 20 times as many degrees of longitude as of latitude. Ideally each step is slightly smaller than one box, which gives a little overlap and records a few entries twice.

var minLat = 39.873;
var minLong = -75.483;
var maxLat = 40.453;
var maxLong = -75.163;

var dLat = 0.01;
var dLong = 0.2;
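
Because neighbouring boxes overlap, the same place can be saved more than once. A small sketch of removing those repeats when the saved results are read back follows. It assumes each saved record carries the place_id field that the Places API includes in its results, and dedupeByPlaceId is a hypothetical helper, not part of the original script.

```javascript
// Hypothetical helper: keep the first record seen for each place_id.
function dedupeByPlaceId(records) {
  var seen = Object.create(null); // plain dictionary with no inherited keys
  return records.filter(function (record) {
    if (seen[record.place_id]) {
      return false; // already recorded from an overlapping box
    }
    seen[record.place_id] = true;
    return true;
  });
}
```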

Finally, we need a function that steps the current map location through the area: north one box at a time, then back to the bottom edge and one column to the east, until we have read the entire area we want:

function next() {
  if (placeIdx >= places.length) {
    curLat += dLat;
    if (curLat > maxLat) {
      curLong += dLong;
      curLat = minLat;
    }
    if (curLong <= maxLong) {
      setTimeout(initialize,
        Math.max(
          2100,
          2100 * (places.length - placeIdx)));
    }
  }
}
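
The order in which next() walks the bounding box can be previewed without loading any maps. The sketch below is a restatement under assumptions, not the original code: listScanPoints is a hypothetical helper, it counts steps with integer indexes instead of repeated addition, and the 1e-9 tolerance is an arbitrary guard against floating-point rounding, so the count may differ by a row from what the live script visits.

```javascript
// Hypothetical helper: list the map centers the scan would visit, in order.
function listScanPoints(box, dLat, dLong) {
  // Count steps with integer indexes so repeated float addition cannot drift.
  var rows = Math.floor((box.maxLat - box.minLat) / dLat + 1e-9) + 1;
  var cols = Math.floor((box.maxLong - box.minLong) / dLong + 1e-9) + 1;
  var points = [];
  for (var c = 0; c < cols; c++) {     // outer loop: one column per longitude step
    for (var r = 0; r < rows; r++) {   // inner loop: walk up the column in latitude
      points.push({ lat: box.minLat + r * dLat, lng: box.minLong + c * dLong });
    }
  }
  return points;
}

var points = listScanPoints(
  { minLat: 39.873, maxLat: 40.453, minLong: -75.483, maxLong: -75.163 },
  0.01, 0.2);
console.log(points.length + ' boxes to scan');
```

Multiplying points.length by the time spent per box gives a rough estimate of how long a full run will take.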

This function must be called in a few places: anywhere an error or a finished task would otherwise stop the script. If we don't do this, the scan will stop partway through:

if (status != google.maps.places.PlacesServiceStatus.OK) {
  placeIdx = 1000000; // force next() to treat this box as finished
  next();
  return;
}

// Fall back to an empty list so next() can safely read places.length
places = results || [];

if (places.length == 0) {
  next();
  return;
}
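
One way to make a forgotten next() call less likely is to funnel every outcome of the search through a single function. The sketch below is a possible restructuring rather than the original code: handleSearchResults, onPlaces, and onEmpty are hypothetical names, and the status is compared against the string 'OK', which is the value of google.maps.places.PlacesServiceStatus.OK.

```javascript
// Hypothetical wrapper: every outcome ends in exactly one callback,
// so the scan cannot stall on an error or an empty box.
function handleSearchResults(results, status, onPlaces, onEmpty) {
  if (status !== 'OK' || !results || results.length === 0) {
    onEmpty();        // nothing to hydrate here: move to the next box
    return;
  }
  onPlaces(results);  // hand the places on for detail lookups
}
```

In the search callback, onEmpty would typically be next and onPlaces would store the results and start the timers.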