Posts

Scraping a List of Adsense Sites Within a Niche

One of the challenges in web crawling and scraping is determining which URLs to scrape. A site can easily have many URLs that humans never visit, such as a stock photo site that uses an API to supplement its data. Sites with sessionid parameters or dynamic content may make many duplicate or similar […]
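As a rough illustration of the duplicate-URL problem this excerpt describes, the sketch below collapses URLs that differ only in session-style query parameters. It is not taken from the post itself: the parameter names in `SESSION_PARAMS` and the helper names `canonicalize` and `dedupe` are assumptions made for the example, and a real crawler would need a list tuned to the sites it visits.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of session-style parameter names; adjust per site.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}


def canonicalize(url):
    """Return a normalized form of url with session parameters removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the session-style ones.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Sort so that parameter order does not create false "new" URLs.
    kept.sort()
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(kept),
        "",  # drop the fragment; it is never sent to the server
    ))


def dedupe(urls):
    """Return canonical URLs in first-seen order, skipping duplicates."""
    seen = set()
    unique = []
    for url in urls:
        key = canonicalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

With this approach, two URLs for the same photo page that carry different sessionid values reduce to a single entry in the crawl queue. Sorting the parameters is a design choice that assumes the site ignores parameter order, which is common but not guaranteed.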


Scraping Adsense Ads with PhantomJS

PhantomJS is a headless WebKit browser, which lets you run JavaScript in a browser from the command line. It adds API calls that facilitate automated testing, screenshots, and scraping. I thought it would be interesting to write a script to retrieve Adsense destination URLs and text with PhantomJS. Extracting advertisement blocks requires fairly simple CSS […]