Gary Sieling

Scraping a list of links from a document into a CSV file

First, right-click an element you are interested in and select “Inspect Element”. In the Developer Tools window, right-click the highlighted node and select “Copy XPath”. If all goes well, the copied path will include array-style indexes, and you can modify it slightly (by dropping an index) so that it returns all matching nodes instead of only the selected item.
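As a rough illustration of that modification: assuming the copied path ends in a positional index such as a[1] (an assumption, since the exact copied path isn't shown here), a small helper could strip it. The name dropLastIndex is made up for this sketch.

```javascript
// Hypothetical helper: remove a trailing positional predicate such as [1]
// so the XPath matches every sibling link rather than a single element.
function dropLastIndex(xpath) {
  return xpath.replace(/\[\d+\]$/, '');
}

// Assumed example of what "Copy XPath" might return for one link.
var copied = '//*[@id="hm-lower-background"]/div/a[1]';
console.log(dropLastIndex(copied));
```

In practice it is usually just as easy to delete the index by hand; the helper only makes the transformation explicit.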

var nodes = document.evaluate('//*[@id="hm-lower-background"]/div/a', document,
    null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

// CSV escapes a double quote by doubling it, not with a backslash.
function csvEscape(value) {
  return '"' + value.replace(/"/g, '""') + '"';
}

for (var i = 0; i < nodes.snapshotLength; i++) { 
  var item = nodes.snapshotItem(i);
  if (item.innerText != "") {
    console.log(csvEscape(item.innerText) + "," + csvEscape(item.href)); 
  }
}

Which gives you output like this:

"Cathedral Basilica of SS Peter&Paul, Philadelphia PHILA. SOUTH","http://archphila.org/parishes/7000.php"
"St. Agatha , Philadelphia ","http://archphila.org/parishes/0.php"
"St. Adalbert (Polish) , Philadelphia PHILADELPHIA-NORTH","http://archphila.org/parishes/7485.php"
"St. Agatha-St. James , Philadelphia PHILADELPHIA-SOUTH","http://archphila.org/parishes/7490.php"
"St. Agnes , Sellersville BUCKS COUNTY","http://archphila.org/parishes/7500.php"
"St. Agnes , West Chester CHESTER COUNTY","http://archphila.org/parishes/7505.php"
"St. Agnes-St. John Nepomucene (Slovak) , Philadelphia PHILADELPHIA-SOUTH","http://archphila.org/parishes/7495.php"
"St. Albert the Great , Huntingdon Valley MONTGOMERY COUNTY","http://archphila.org/parishes/7510.php"
"St. Alice , Upper Darby DELAWARE COUNTY","http://archphila.org/parishes/7515.php"
"All Saints , Philadelphia PHILADELPHIA-NORTH","http://archphila.org/parishes/7010.php"
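
If you would rather have the whole file as one string than a series of console lines, the same idea can be sketched as a function that takes an array of [text, href] pairs. The names csvField and toCsv and the shape of the rows are assumptions for illustration, and the last two sample rows are made up to exercise the empty-text and embedded-quote cases.

```javascript
// Quote one field: wrap it in double quotes and double any embedded quotes.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// rows: array of [text, href] pairs (assumed shape).
// Rows whose text is empty or whitespace-only are skipped.
function toCsv(rows) {
  return rows
    .filter(function (row) { return row[0].trim() !== ''; })
    .map(function (row) { return row.map(csvField).join(','); })
    .join('\r\n');
}

var sample = [
  ['St. Agnes , Sellersville BUCKS COUNTY', 'http://archphila.org/parishes/7500.php'],
  ['', 'http://example.com/empty'],           // made-up row, skipped
  ['A "quoted" name', 'http://example.com/q'] // made-up row with quotes
];
console.log(toCsv(sample));
```

In the browser console, the rows could be collected in the loop above by pushing [item.innerText, item.href] into an array, and the resulting string passed to the console's copy() utility (available in Chrome and Firefox developer tools) to place it on the clipboard.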