Building a statistical significance testing web service powered by R

R is a programming language focused on statistical and mathematical computation. R programs often operate on large, in-memory data sets, which feels somewhat similar to database programming. Examples in the R Cookbook bear a resemblance to functional programming in Clojure, as others have noted.

I’ve been exploring the language to gain insight into related but disparate technologies that I use regularly (e.g. Postgres), but for this to be really useful, I’d like to see R behind a web service. The official website lists many defunct attempts at using R in this manner, often abandoned once the maintainer finished their master’s degree.

A couple have survived, notably Rook and rApache. Rook is a web server inside of R, and rApache, as you might guess, is an Apache module that calls R. I’ve chosen rApache, as I’d like to have a battle-tested front-end for this – while R seems to have very committed maintainers, there do not seem to be very many of them, and I have yet to find examples of anyone running this as a production application.

Inspired by WolframAlpha’s APIs, I built a small web service to test statistical significance. In the future I intend to do tests on performance and security, as well as available JSON libraries.
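For reference, the statistic that the zscore function in ws.R (listed further down) computes can be written out as follows. The symbols follow that function's argument names; reading p and pc as the observed rates of a variant and a control, with N and Nc as their sample sizes, is my interpretation rather than something the code states.

```latex
z = \frac{p - p_c}{\sqrt{\dfrac{p\,(1 - p)}{N} + \dfrac{p_c\,(1 - p_c)}{N_c}}}
```

This is the two-proportion z statistic with an unpooled standard error. The service reports a result as significant when z > 1.65, which is roughly the one-tailed 5% critical value of the standard normal distribution.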

Here is the installation procedure:

apt-get update
apt-get upgrade
apt-get install r-base r-base-dev 
apt-get install apache2-mpm-prefork apache2-prefork-dev 
apt-get install git-core
git clone https://github.com/jeffreyhorner/rapache.git
cd rapache
./configure
make
make test
make install
vi /etc/apache2/httpd.conf

Apache configuration settings:

 
LoadModule R_module /usr/lib/apache2/modules/mod_R.so
 
<Location /RApacheInfo>
SetHandler r-info
</Location>
 
ROutputErrors
 
<Directory /var/www/R>
        SetHandler r-script
        RHandler sys.source
</Directory>

Then restart Apache:

/etc/init.d/apache2 restart

And these are the contents of ws.R:

 
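setContentType("application/json")

# z statistic for the difference between two proportions:
# p is observed over N trials, pc over Nc trials
zscore <- function(p, pc, N, Nc) {
  (p - pc) / sqrt(p * (1 - p) / N + pc * (1 - pc) / Nc)
}

# 1.65 is roughly the one-tailed 5% critical value of the standard normal
significant <- function(p, pc, N, Nc) {
  zscore(p, pc, N, Nc) > 1.65
}

# accept only arguments that are present and shorter than 10 characters
valid <- function(x) { !is.null(x) && nchar(x) < 10 }

if (!valid(GET$pc) ||
    !valid(GET$p) ||
    !valid(GET$N) ||
    !valid(GET$Nc)) {
  cat('error:arg length')
} else {
  cat(significant(as.numeric(GET$p),
                  as.numeric(GET$pc),
                  as.numeric(GET$N),
                  as.numeric(GET$Nc)))
}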
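The length check in ws.R is deliberately minimal. As a possible next step toward the security testing mentioned earlier, here is a rough sketch of stricter parsing in base R. The parse_param helper and the stand-in GET list are hypothetical names of mine, not part of rApache or of the original script, and the regular expression only admits plain non-negative decimals, which is an assumption about what the service should accept.

```r
# Hypothetical helper, not part of the original ws.R: accept a query
# parameter only if it is a single short string that looks like a plain
# non-negative decimal number; return it as a numeric, or NULL otherwise.
parse_param <- function(x, max_chars = 10) {
  if (is.null(x) || length(x) != 1 || is.na(x)) return(NULL)
  if (nchar(x) >= max_chars) return(NULL)
  # digits with an optional fractional part, or a leading-dot form like ".15"
  if (!grepl("^([0-9]+(\\.[0-9]*)?|\\.[0-9]+)$", x)) return(NULL)
  as.numeric(x)
}

# Stand-in for rApache's GET list so the sketch runs in a plain R session.
GET <- list(p = ".15", pc = ".10", N = "1000", Nc = "1e9; system('ls')")

# Parse all four parameters and collect the names of any that were rejected.
params <- lapply(GET[c("p", "pc", "N", "Nc")], parse_param)
bad <- names(params)[vapply(params, is.null, logical(1))]
print(bad)
```

Returning NULL for anything unexpected keeps the calling code to a single check (is any element NULL?) before the values reach zscore.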
 

For instance, the output of http://localhost:8080/R/ws.R?p=.15&pc=.10&N=1000&Nc=1100 is “TRUE”.
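That result can be checked without Apache by running the same arithmetic in a plain R session. The sketch below restates the two functions from ws.R so that it is self-contained; the numbers are the ones from the example URL.

```r
# Same formulas as ws.R, restated so the check runs in a plain R session.
zscore <- function(p, pc, N, Nc) {
  (p - pc) / sqrt(p * (1 - p) / N + pc * (1 - pc) / Nc)
}
significant <- function(p, pc, N, Nc) zscore(p, pc, N, Nc) > 1.65

# Values taken from the example request: p=.15, pc=.10, N=1000, Nc=1100
z <- zscore(0.15, 0.10, 1000, 1100)
print(round(z, 2))                          # about 3.46
print(significant(0.15, 0.10, 1000, 1100))  # TRUE
```

A z value of about 3.46 is well above the 1.65 threshold, which is why the service answers “TRUE” for these inputs.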


1 comment so far ↓

#1 larry on 10.18.12 at 4:52 pm

I would take a look at package Rserve. I’ve heard good things about it for web services with R.
