Generate summaries for your WordPress blog posts using Python

Using the WordPress REST API plugin, you can easily get a JSON payload containing data from your blog. If you use SSL, you will likely need to use Python 3, as it includes many bug fixes. First, load the page text:

    url = 'https://www.garysieling.com/blog/wp-json/wp/v2/posts?per_page=10&page=18'
    import urllib3
    http = urllib3.PoolManager(10)
    response = http.request('GET', url)

Then […]
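As a rough sketch of the fetch step described above, the request can be wrapped in a few small functions. The helper names (`posts_url`, `parse_posts`, `fetch_posts`) are made up for illustration and are not from the original post; the `link` and `title.rendered` fields are what the WordPress REST API v2 posts endpoint returns. The truncated step that follows "Then" in the excerpt is not reconstructed here.

```python
import json
from urllib.parse import urlencode


def posts_url(base, per_page=10, page=1):
    """Build the posts endpoint URL for a WordPress REST API v2 site."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{base}/wp-json/wp/v2/posts?{query}"


def parse_posts(body):
    """Decode the JSON payload and keep each post's link and rendered title."""
    posts = json.loads(body)
    return [(post["link"], post["title"]["rendered"]) for post in posts]


def fetch_posts(url):
    """Fetch one page of posts over HTTP(S); requires the urllib3 package."""
    import urllib3  # imported here so the helpers above work without it

    http = urllib3.PoolManager(10)
    response = http.request("GET", url)
    return parse_posts(response.data.decode("utf-8"))
```

Calling `fetch_posts(posts_url("https://www.garysieling.com/blog", per_page=10, page=18))` would request the same page as the snippet in the excerpt. Keeping the parsing separate from the network call makes it possible to try the parsing on a saved payload.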

Finding template documents with RethinkDB and the Bing API in Python

Many businesses require new clients to fill out a form to pre-qualify them or to get information to quote a project. Given a list of industries, we can use search engine results to filter the industry list to just those that require forms (you […]

Auditing Data Modifications in Postgres

Implementing Auditing: Storing every change to an application’s database allows for sophisticated forensic analysis: tracking usage trends over time, acting as a long-range debugger, or implementing data correction features more typically found in version control software, like ‘cherry-pick’ or ‘revert’. Many products require this in the form of an audit trail, which in the simplest case […]

Mining Association Rules with R and Postgres

In one of my earlier pieces I explored decision trees in Python, which let you train a machine learning algorithm to predict or classify data. I like this style of model because the model itself is valuable; I’m more interested in finding underlying patterns than attempting to predict the future. Decision trees are nice […]

Extracting Dates and Times from Text with Stanford NLP and Scala

Stanford NLP is a library for text manipulation that can parse and tokenize natural language texts. Typically, applications that operate on text first split the text into words, then annotate the words with their part of speech, using a combination of heuristics and statistical rules. Other operations on the text build upon these results with […]

Decision Trees: “Gini” vs. “Entropy” criteria

The scikit-learn documentation describes an argument that controls how the decision tree algorithm splits nodes: criterion : string, optional (default="gini"). The function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain. It seems like something that could be important since this determines the […]
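To make the two criteria concrete, here is a minimal sketch of both measures computed on the class labels at a single node. This uses the standard textbook definitions, not scikit-learn's internal code, and the function names are chosen for illustration. It assumes a non-empty list of labels.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Fraction of samples belonging to each class present at the node."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))
```

Both measures are zero for a node containing a single class and largest when the classes are evenly mixed; for a 50/50 two-class node, `gini` gives 0.5 and `entropy` gives 1.0. In scikit-learn the choice is made with `DecisionTreeClassifier(criterion="gini")` or `criterion="entropy"`.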

Discovering Corporate Open Source Contributions

A year or two ago, I saw a Microsoft director deliver the keynote at a conference. He claimed at the time that Microsoft had entered a new era of supporting open source development, which raised a few eyebrows considering their history. This was at a Solr conference, which made me think it’d be interesting to […]

Talk Summary: What is Acunu?

I recently attended a talk by a sales engineer for Acunu (http://www.acunu.com/), an analytics platform for Cassandra. I came away with a couple of interesting notes:
– The product aims to build data cubes for you in a “big data” scenario
– Operating on the principle that disk space is cheap, they increment lots of counters […]
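The excerpt does not say how Acunu implements this, so the following is only a generic sketch of the "increment lots of counters" idea: for each incoming event, bump one counter per combination of dimensions, so that any roll-up can later be read directly instead of computed. The function names and the sample dimensions (`country`, `browser`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cube_keys(event, dimensions):
    """Yield one key per subset of dimensions, i.e. each cube cell the event falls in."""
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            yield tuple((d, event[d]) for d in dims)


def build_cube(events, dimensions):
    """Pre-aggregate counts for every dimension combination, trading space for read speed."""
    counters = Counter()
    for event in events:
        for key in cube_keys(event, dimensions):
            counters[key] += 1
    return counters
```

With two dimensions, each event increments four counters (the grand total, each dimension alone, and the pair). The number of counters per event doubles with each added dimension, which is where the "disk space is cheap" assumption comes in.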