Using the WordPress REST API plugin, you can easily get a JSON payload containing data from your blog.
If your site uses SSL, you will likely need Python 3, as it includes many bug fixes.
First, load the page text:
import urllib3

url = 'https://www.garysieling.com/blog/wp-json/wp/v2/posts?per_page=10&page=18'
http = urllib3.PoolManager(10)
response = http.request('GET', url)
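The per_page and page values in that URL control which slice of posts comes back, so it can help to build the query string rather than hard-code it. Here is a minimal sketch using only the standard library; build_posts_url is a hypothetical helper name, and example.com stands in for your own blog's address.

```python
from urllib.parse import urlencode


def build_posts_url(base, per_page, page):
    """Build a posts URL with pagination parameters.

    `base` is the posts endpoint without a query string,
    e.g. 'https://example.com/wp-json/wp/v2/posts' (placeholder host).
    """
    query = urlencode({'per_page': per_page, 'page': page})
    return base + '?' + query


# Placeholder host; substitute your own blog's endpoint.
url = build_posts_url('https://example.com/wp-json/wp/v2/posts', 10, 18)
print(url)
```

The resulting string can be passed to http.request('GET', url) exactly as above.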
Then parse it as JSON:
import json

jsonData = response.data.decode('utf-8')
posts = json.loads(jsonData)
From there, the blog post text is readily available:
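# The endpoint returns a list of posts, so pick one by index first
# (index 0 here is only an example).
entry = posts[0]
post = entry["content"]["rendered"]
title = entry["title"]["rendered"]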
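Since the posts endpoint returns a list, you may want the title and body of every post on the page rather than just one. The sketch below assumes each entry has the same "title" and "content" shape used above; extract_posts is a hypothetical helper, and the sample data is hand-written to mimic the response, not real output from the API.

```python
def extract_posts(posts):
    """Return a list of (title, html) tuples from a parsed posts response."""
    results = []
    for entry in posts:
        title = entry["title"]["rendered"]
        html = entry["content"]["rendered"]
        results.append((title, html))
    return results


# Hand-written sample shaped like the API response (not real data).
sample = [
    {"title": {"rendered": "First post"}, "content": {"rendered": "<p>Hello</p>"}},
    {"title": {"rendered": "Second post"}, "content": {"rendered": "<p>World</p>"}},
]

pairs = extract_posts(sample)
for title, html in pairs:
    print(title, len(html))
```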
Then, you can easily rip out all the HTML tags (see this Stack Overflow post for the source of this solution):
from html.parser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        # Let the base class set up its own state; decode character references too
        super().__init__(convert_charrefs=True)
        self.fed = []  # collected text fragments

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

clean = strip_tags(post)
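The title does not go through the tag stripper, and in my experience WordPress tends to return rendered titles with HTML entities in them (curly apostrophes as &#8217;, for example). If that applies to your blog, a small sketch like this can decode them with the standard library before summarizing; clean_title is a hypothetical helper and the sample string is made up.

```python
import html


def clean_title(rendered_title):
    """Decode HTML entities (e.g. &#8217; or &amp;) in a rendered title."""
    return html.unescape(rendered_title)


# Made-up sample string for illustration.
print(clean_title("I&#8217;ve &amp; you"))
```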
Unfortunately, the summarization library (PyTeaser) does not support Python 3. There is a patch for this, and you can install it directly from GitHub, like so:
pip install https://github.com/voneiden/PyTeaser/archive/py3.zip
Once you do this, you can get a summary for the given post (Lessons Learned from 0 to 40,000 Readers).
from pyteaser import Summarize

" ".join(Summarize(title, clean))
This results in the following text, which gives a decent summary of the article:
'Since then, a bit over 40,000 people have read articles I’ve written, not a huge number in the grand scheme of things, but enough to draw a few lessons. Posts I’ve made received more votes, even though they are self posts, because they are at least relevant. In practice, I’ve written on wider subjects – anything within “full stack web development” is fair game, trying to focus on new, or popular tech – Scala, DevOps (Vagrant/Chef/Virtualization), Hadoop, R, and scraping. It’s the only thing I’ve written that seems to have received significant attention on Google+ (19 events on G+, 24 on Twitter). I’ve written several articles which have been posted to Twitter by 20+ people.'
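To summarize a whole page of posts instead of a single one, the same call can be wrapped in a loop. This is only a sketch: summarize_all is a hypothetical helper that takes the summarizer as a parameter, on the assumption that it accepts a title and text and returns a list of sentences (which is how Summarize is used above). The first_sentence function is a stand-in so the example runs without PyTeaser installed.

```python
def summarize_all(pairs, summarize):
    """Map each (title, text) pair to a (title, summary string) pair."""
    summaries = []
    for title, text in pairs:
        # The summarizer is expected to return a list of sentences.
        sentences = summarize(title, text)
        summaries.append((title, " ".join(sentences)))
    return summaries


# Stand-in summarizer for illustration only: keeps the first sentence.
def first_sentence(title, text):
    return [text.split(". ")[0].rstrip(".") + "."]


demo = summarize_all([("Demo", "One sentence. Another sentence.")], first_sentence)
print(demo)
```

With PyTeaser installed, passing Summarize in place of first_sentence should give one summary per post, provided the text has already had its HTML tags stripped.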