Java Geek and managing director@comSysto GmbH in Munich ... Spring over JavaEE, Apache Wicket over JSF, Gradle over Maven, Lean over Waterfall, exploring JavaFX, Highcharts, Android, AgileUX, Lean Startup. Daniel is a DZone MVB and is not an employee of DZone and has posted 40 posts at DZone.

Java-R-Integration with JRI for On-Demand Predictions

07.13.2013

This article gives a short overview of how to call R from within a Java application using JRI. In particular, it shows how to use this technology for on-demand predictions based on R models.

Note: Trivial aspects such as constant definitions or exception handling are omitted in the provided code snippets.

What is JRI?

JRI is a Java/R Interface providing a Java API to R functionality. A JavaDoc specification of this interface (org.rosuda.JRI.Rengine) can be found here. The project homepage describes how to initially set up JRI in various environments.

Typical Use Cases for On-Demand Predictions via JRI

Classification or numeric prediction models embedded in R scripts can originate from legacy implementations or from a conscious decision to use R for a certain use case. Typically, a static set of data is used to train and validate a model, which can then be applied to another static set of unclassified data. However, this approach aims at deriving general insights from data sets rather than at predicting concrete instances. In particular, it is insufficient for systems with real-time user interaction, for example custom welcome screens that depend on the estimated value of a user who has just registered.

Hello R World from Java

After installing R as well as the JRI package, any Java application will be able to instantiate org.rosuda.JRI.Rengine after adding the corresponding JARs to its build path. The following simplistic example demonstrates how we can use the R interface.

import org.rosuda.JRI.Rengine;
import org.rosuda.JRI.REXP;

public class HelloRWorld {
    Rengine rengine; // initialized in constructor or autowired

    public void helloRWorld() {
        rengine.eval(String.format("greeting <- '%s'", "Hello R World"));
        REXP result = rengine.eval("greeting");
        System.out.println("Greeting from R: " + result.asString());
    }
}

Calling Rengine.eval(String) corresponds to typing commands into the R console and hence provides access to any required functionality. Note that even in this trivial example the two separate calls share a common context, which is maintained throughout the lifecycle of a Rengine instance. Objects of org.rosuda.JRI.REXP encapsulate any output from R to the user. Depending on the evaluated command, methods other than REXP.asString() may be suitable for extracting its result (see JavaDoc).
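Because every command reaches R as plain text, a value interpolated into a command has to be a valid R string literal. The following is a minimal sketch of a quoting helper using only the JDK; the class RStrings and its methods are hypothetical names introduced for illustration and are not part of JRI.

final class RStrings {
    private RStrings() {}

    /** Wraps a Java string in single quotes so that R parses it as one string literal. */
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("'");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash, e.g. in Windows paths
                case '\'': sb.append("\\'"); break;  // quote that would end the literal
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    /** Builds an R assignment such as: greeting <- 'Hello R World' */
    static String assign(String variable, String value) {
        return variable + " <- " + quote(value);
    }
}

With such a helper, the assignment in the example could be written as rengine.eval(RStrings.assign("greeting", "Hello R World")), and a greeting containing an apostrophe would no longer break the command.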

Running .R scripts from Java

Even though it would be possible to implement large R scripts in Java by passing each statement to Rengine.eval(String), this is much less convenient than writing or even re-using traditional .R scripts. So let’s have a look at how we can achieve the same result with a slightly different solution.

Project structure:

/src/main
    /java
        com.comsysto.jriexample.HelloRWorld2.java
    /resources
        helloWorld.R

helloWorld.R:

greeting <- 'Hello R World'

HelloRWorld2.java:

import org.rosuda.JRI.Rengine;
import org.rosuda.JRI.REXP;
import org.springframework.core.io.ClassPathResource;

public class HelloRWorld2 {
    Rengine rengine; // initialized in constructor or autowired

    public void helloRWorld() {
        ClassPathResource rScript = new ClassPathResource("helloWorld.R");
        rengine.eval(String.format("source('%s')",
                rScript.getFile().getAbsolutePath()));
        REXP result = rengine.eval("greeting");
        System.out.println("Greeting from R: " + result.asString());
    }
}

Any .R script can be executed like this and all variables it adds to the context will be accessible via JRI. However, this code does not work if the Java application is packaged as a JAR or WAR archive because the .R script will not have a valid absolute path. In this case, copying the script to a regular folder (e.g. java.io.tmpdir) at runtime and passing the temporary file to R is a feasible workaround.
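The workaround could look roughly like the following sketch, which uses only the JDK. The class RScriptExtractor and its method names are hypothetical and chosen for illustration; Files.createTempFile places the file in java.io.tmpdir by default.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RScriptExtractor {
    private RScriptExtractor() {}

    /** Copies a script stream to a temporary .R file and returns the path of that file. */
    static Path copyToTempFile(InputStream script) throws IOException {
        Path tempFile = Files.createTempFile("rscript-", ".R");
        tempFile.toFile().deleteOnExit(); // clean up when the JVM exits
        try (InputStream in = script) {
            // createTempFile already created the file, so it has to be replaced
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
        }
        return tempFile;
    }

    /** Builds the source() command; R accepts forward slashes on all platforms. */
    static String sourceCommand(Path scriptFile) {
        String path = scriptFile.toAbsolutePath().toString().replace('\\', '/');
        return "source('" + path.replace("'", "\\'") + "')";
    }
}

Inside an archive the script is still readable as a stream, for example via ClassPathResource.getInputStream(), so a call such as rengine.eval(RScriptExtractor.sourceCommand(RScriptExtractor.copyToTempFile(rScript.getInputStream()))) should work for both packaged and unpackaged deployments.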

Training R Models with Java Application Data

Now that we know how to execute .R scripts using JRI we are able to integrate prediction models based on R into a Java application. The only remaining question is: how can R access the required data for training the model? The easiest way is to use the following file-based approach. We will build a linear regression model that predicts y from x1 and x2.

  1. Extract suitable data from the Java persistence layer and store it in a temporary .csv file.
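    import au.com.bytecode.opencsv.CSVWriter;
    import java.io.File;
    import java.io.FileWriter;

    private void writeTrainingDataToFile(File file) {
        // the separator must be a char; R reads the file back with sep=";"
        CSVWriter writer = new CSVWriter(new FileWriter(file), ';');
        writer.writeNext(new String[] {"x1", "x2", "y"});
        for (Instance i : instances) {
            // x1, x2 and y are assumed to be String fields of Instance
            writer.writeNext(new String[] {i.x1, i.x2, i.y});
        }
        writer.close();
    }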
  2. Pass the location of this file to R using JRI.
    public void trainRModel() {
        File trainingData = new File(TRAINING_DATA_PATH);
        writeTrainingDataToFile(trainingData);
        rengine.eval(String.format("trainingDataPath <- '%s'",
                trainingData.getAbsolutePath()));
        // trigger execution of .R script here
    }
  3. Execute a .R script to build the model. The script has to read the .csv file with the same separator and column names that the Java code used to write it.
    # trainingDataPath injected from Java
    data <- read.csv(trainingDataPath, header=TRUE, sep=";");
    # use linear regression model as a trivial example
    model <- lm(y ~ x1+x2, data);

After executing this script, the resulting model will be available for any future calls until the entire application or the Rengine instance is re-initialized.

On-Demand Predictions using the R Model

With the knowledge we already have, predicting a new instance (x1,x2) with unknown y is now pretty straightforward:

public double predictInstance(int x1, int x2) {
    // assign the new instance to newData, the variable that predict() reads below
    rengine.eval(String.format("newData <- data.frame(x1=%s,x2=%s)",
            x1, x2));
    REXP result = rengine.eval("predict(model, newData)");
    return result.asDouble();
}
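If the features are doubles rather than ints, the command text has to stay independent of the JVM's default locale: %s is safe, but %f would print 1,5 instead of 1.5 under a German locale, and R would then see two arguments. The following is a minimal sketch of a command builder using only the JDK; PredictionCommands and newDataCommand are hypothetical names, and it assumes the model expects numeric columns x1 and x2 in a data frame called newData, the variable that predict(model, newData) reads.

final class PredictionCommands {
    private PredictionCommands() {}

    /** Builds e.g. newData <- data.frame(x1=1.5,x2=2.0), regardless of the default locale. */
    static String newDataCommand(double x1, double x2) {
        requireFinite(x1, "x1");
        requireFinite(x2, "x2");
        // Double.toString always uses '.' as the decimal separator
        return "newData <- data.frame(x1=" + Double.toString(x1)
                + ",x2=" + Double.toString(x2) + ")";
    }

    // Java prints "Infinity", which is not an R literal, so non-finite values are rejected
    private static void requireFinite(double value, String name) {
        if (!Double.isFinite(value)) {
            throw new IllegalArgumentException(name + " must be finite: " + value);
        }
    }
}

The resulting string can be passed to rengine.eval(...) before the predict call, in place of the String.format expression.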

If you have any feedback, please write to Christian.Kroemer@comsysto.com!

Published at DZone with permission of Daniel Bartl, author and DZone MVB. (source)
