
Intro To Python R Interoperability With Radian

Gordon Fleetwood edited this page Oct 28, 2019 · 3 revisions

Overview

Python and R were once rivals in the Data Science space, but time has drawn the antagonism from the users of both languages. The prevailing advice urges learning both languages to take advantage of the strengths of both ecosystems. In practice, however, this meld is far from seamless: most people prefer to do some section of an analysis in one language, then move a data object first to disk and then to the other language.

Efforts like rpy2 and rchitect (accessing R from Python) and R's reticulate (accessing Python from R) greatly lessened the overhead and, more importantly, integrated with the languages' preferred IDEs: the Jupyter notebook and RStudio respectively. The missing piece was an IDE that provided a seamless conduit between the two languages for data, objects, and code. That IDE could have been the Beaker notebook, which supported switching between languages in a notebook environment. Unfortunately, it (and its current incarnation as a set of Jupyter extensions called BeakerX) only supports data transfer.

radian is not an IDE, but it fills the need for a high level of communication between Python and R through its design and its support of both reticulate and rchitect. What follows is an outline of an interoperable Python and R workflow built on radian, reticulate, and rchitect.

Preparation

Before using radian, ensure that Python 3 and R are installed on your machine. Also install radian, rchitect, reticulate, the tidyverse R library, and the numpy, pandas, and scikit-learn libraries for Python.
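A quick way to confirm the Python side of that list is to ask Python which packages it can find. The sketch below is only an illustration: the helper name missing_modules and the REQUIRED list are made up for this page, and the import names are assumed to match the package names (except scikit-learn, which is imported as sklearn). It does not check R or the R libraries.

```python
from importlib.util import find_spec

# Import names of the Python-side packages listed above (assumed names).
# Note that scikit-learn is imported as "sklearn".
REQUIRED = ["radian", "rchitect", "numpy", "pandas", "sklearn"]


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All Python-side packages found.")
```

Anything reported as missing can be installed with pip before moving on.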

When you're done, open a terminal, type radian, and press Enter to start the radian shell. The prompt reads r$>.

> radian
r$>

Modeling The Iris

Let's start with a simple example: building a regression model on the iris data set.

r$> df = iris
r$> mdl = lm(Petal.Width ~ ., data = df) 

To enter Python mode press Ctrl + ~. The terminal will now begin with >>>. Then import rchitect as rc as well as numpy and pandas.

The r object from reticulate gives Python access to some R objects. r.df contains the copy of the iris data set used to create mdl. However, calling r.mdl will throw an error. That's where rchitect comes in: rc.reval("mdl") will print out information about the model object. For simplicity's sake, let's only access R objects using rchitect. We can bring in both the data and the model by using rc.reval. Finally, we can make predictions using rchitect's rcall function to call R's predict method.

The exit command returns us to the R shell.

>>> import rchitect as rc
>>> import numpy as np
>>> import pandas as pd
>>> df_py = rc.reval("df")
>>> mdl_py = rc.reval("mdl")
>>> preds = rc.rcall("predict", mdl_py, df_py)
>>> exit
r$>

You can do the same with a saved model object (representing the same model created above).

r$> mdl_saved = readRDS("rgr_iris.rds")
r$> # Ctrl + ~ to re-enter Python mode
>>> mdl_saved_py = rc.reval("mdl_saved")
>>> preds = rc.rcall("predict", mdl_saved_py, df_py)
>>> 

Both df_py and mdl_py are instances of a special rchitect class called RObject. An RObject can be converted to a Python object (an ordered dictionary) by using rchitect's rcopy function. The robject function does the opposite. This round trip doesn't work as one would hope for model objects: an R model converted to Python and then back to R would just be a list.

Let's try doing this in the other direction. I'm going to load a Python model equivalent to the one created above and make predictions from it in both Python and R. In Python, I first have to convert the RObject df_py to a Python object and then to a pandas dataframe before making a prediction. In R, I can access the Python model through reticulate's py object.

>>> from joblib import load
>>> modelpy = load("clf_iris.joblib")
>>> df_pypy = pd.DataFrame(rc.rcopy(df_py)).drop(["Species"], axis=1)
>>> preds = modelpy.predict(df_pypy)
>>> exit
r$> library(dplyr)  # provides select()
r$> preds = py$modelpy$predict(select(df, -Species))
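The conversion step in the Python session above can be pictured without R at all. The sketch below uses a hand-written ordered dictionary as a stand-in for what rcopy returns for the iris data frame (one list per column, first three rows only). That shape, and the representation of Species as plain strings, are assumptions for illustration, and drop_column and to_rows are hypothetical helpers that mimic what pd.DataFrame(...).drop(["Species"], axis=1) does.

```python
from collections import OrderedDict

# Assumed stand-in for rc.rcopy(df_py): column name -> list of values.
# Only the first three rows of iris are shown.
iris_cols = OrderedDict([
    ("Sepal.Length", [5.1, 4.9, 4.7]),
    ("Sepal.Width", [3.5, 3.0, 3.2]),
    ("Petal.Length", [1.4, 1.4, 1.3]),
    ("Petal.Width", [0.2, 0.2, 0.2]),
    ("Species", ["setosa", "setosa", "setosa"]),
])


def drop_column(columns, name):
    """Return a new ordered dictionary without the named column."""
    return OrderedDict((k, v) for k, v in columns.items() if k != name)


def to_rows(columns):
    """Turn column lists into one list per observation."""
    return [list(row) for row in zip(*columns.values())]


features = drop_column(iris_cols, "Species")
rows = to_rows(features)
```

Here rows holds one list of four numbers per flower, which is the kind of two-dimensional input a scikit-learn predict method expects. In the real session pandas does this work and keeps the column names.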

Conclusion

This short exploration concentrated on data and model transfer, which were the two biggest obstacles to interoperability between Python and R. With those two out of the way, there is little else barring the way to a seamless workflow between Data Science's most popular languages.
