RethinkDB: Starts with example

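RethinkDB filters can match on regular expressions. To run a "starts with" query, anchor the regular expression at the start of the field with ^: r.db('test').table('users').filter((doc) => doc('user_name').match("^.*ielin")) As written, the .* after the ^ allows any leading characters, so this pattern matches any user_name that contains "ielin"; for a strict prefix match, put the literal prefix directly after the ^ and drop the .*. Note that the end isn’t anchored, so results with any ending are included. 1 row returned. Displaying rows 1-1 first_name id last_name user_name 1 […]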
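The effect of the ^ anchor can be checked outside the database. Below is a minimal Python sketch using the standard re module; RethinkDB's match uses RE2 syntax, but simple anchored patterns like these behave the same way. The sample user names and the helper names starts_with and matches are made up for illustration and are not from the original post.

```python
import re

# Hypothetical sample data; these user names are not from the original post.
user_names = ["gsieling", "sieling_admin", "ielinor", "bob"]


def starts_with(names, prefix):
    """Return the names that begin with `prefix`, using a start-anchored regex."""
    pattern = re.compile("^" + re.escape(prefix))
    return [name for name in names if pattern.search(name)]


def matches(names, regex):
    """Return the names in which `regex` matches somewhere in the string."""
    pattern = re.compile(regex)
    return [name for name in names if pattern.search(name)]


# A strict prefix match: only names that literally start with "ielin".
prefix_hits = starts_with(user_names, "ielin")

# "^.*ielin" lets any characters precede "ielin", so it acts like "contains".
loose_hits = matches(user_names, "^.*ielin")

print(prefix_hits)
print(loose_hits)
```

The two result lists differ because `.*` placed right after `^` cancels out the anchor in practice.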

Decision Tree Testing Lessons

I’m running some tests on sklearn decision trees, and the lessons learned so far may be interesting. I’ve put my measurement code at the end – I’m tracking % correct, number of tests that are positive, negative, and false positives and negatives. When running predictions, if you have a defect where you include the ‘answer’ […]
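The measurements described above can be sketched in a few lines of plain Python. This is not the measurement code from the post (which is cut off in this excerpt), only an illustration under assumptions: labels are binary, "positive" and "negative" counts refer to the actual labels, and the function name score_predictions is hypothetical.

```python
def score_predictions(actual, predicted):
    """Tally percent correct and confusion counts for binary labels.

    `actual` and `predicted` are equal-length sequences of truthy/falsy labels.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    true_pos = true_neg = false_pos = false_neg = 0
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            true_pos += 1
        elif guess and not truth:
            false_pos += 1   # predicted positive, actually negative
        elif truth:
            false_neg += 1   # predicted negative, actually positive
        else:
            true_neg += 1

    total = len(actual)
    return {
        "pct_correct": 100.0 * (true_pos + true_neg) / total if total else 0.0,
        "positives": true_pos + false_neg,   # assumption: count of actual positives
        "negatives": true_neg + false_pos,   # assumption: count of actual negatives
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


# Example with made-up labels, e.g. the output of a classifier's predict() call.
result = score_predictions([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(result)
```

A suspiciously perfect pct_correct from a tally like this is one hint that the label may have leaked into the features.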

Fixing Solr error “no segments* file found in NRTCachingDirectory”

If you try to run a Solr DataImportHandler by issuing HTTP requests (e.g. from a build script), you may hit an error like this: “no segments* file found in NRTCachingDirectory”. This appears to indicate that Solr didn’t initialize properly. You can fix it by adding a second request that hits the core directly: http://localhost:8080/solr/#/core/dataimport//dataimport

Summary of Weekly R News

Reading: Interesting discussion on career paths and current buzzwords: Statisticians Contemplate Their Own Extinction; Multi-Armed Bandit Simulation; Sparse Matrices. Statistics: An interesting look at a fairly simple back-testing strategy over time: Calendar Based Sector Strategy. A demonstration of how to draw an ellipse around a 95% confidence interval on a 2-D plot: Drawing a 95% Confidence […]

Optimizing WordPress Tag Pages

Normally I don’t like to write about “blogging,” but since website traffic generates some interesting data, it’s worth looking at it from a computer science perspective, to see the issues involved. By default, WordPress has two multi-valued fields associated with an article, “Categories” and “Tags.” Categories are treated as a closed, hierarchical set, and tags […]

Cobol v. Fortran

I thought it’d be interesting to compare how many people admit to knowing ancient programming languages on their LinkedIn pages. This is in contrast to my post on the popularity of hip JVM languages Scala and Clojure. True to its reputation for scientific computation power, Fortran is primarily used by scientific organizations – an […]

Generating ARFF files for Weka from Postgres

Since all my scraped data is in Postgres, this is the easiest way to get it out – the fastest iteration possible. At some point I’ll probably switch to a Java library. It’s interesting to see, but probably the only lesson from this is that all ETL scripts are ugly. WITH advertisers_ranked AS ( SELECT […]
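For readers unfamiliar with the target format, here is a minimal Python sketch of what an ARFF file looks like. It is not the approach from the post, which builds the output in SQL (truncated above); it assumes the rows have already been fetched from Postgres by some client, and the names to_arff and _format_value, the relation name, and the sample columns are all hypothetical. Nominal values are assumed to contain no spaces or commas.

```python
def _format_value(value, attr_type):
    """Render one cell: '?' for missing, quoted text for strings, str() otherwise."""
    if value is None:
        return "?"
    if attr_type == "string":
        escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return str(value)


def to_arff(relation, attributes, rows):
    """Build ARFF text.

    `attributes` is a list of (name, type) pairs, where type is "numeric",
    "string", or a list of allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, attr_type in attributes:
        if isinstance(attr_type, (list, tuple)):
            type_text = "{" + ",".join(attr_type) + "}"
        else:
            type_text = attr_type
        lines.append("@attribute %s %s" % (name, type_text))
    lines += ["", "@data"]
    for row in rows:
        cells = [_format_value(v, t) for v, (_, t) in zip(row, attributes)]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Made-up rows standing in for a query result.
attributes = [("clicks", "numeric"), ("name", "string"), ("label", ["yes", "no"])]
rows = [(3, "Acme Co", "yes"), (None, "Bob's", "no")]
print(to_arff("ads", attributes, rows))
```

The header declares each column once, and every line after @data is one comma-separated row in the same column order.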