So, how come we can use TensorFlow from R?
Which computer language is most closely associated with TensorFlow? While on the TensorFlow for R blog, we would of course like the answer to be R, chances are it is Python (though TensorFlow has official bindings for C++, Swift, JavaScript, Java, and Go as well).
So why is it that you can define a Keras model in R (nice, with %>%s and all!), then train and evaluate it, get predictions and plot them, all without ever leaving R?
The short answer is: you have keras, tensorflow and reticulate installed. reticulate embeds a Python session within the R process. A single process means a single address space: the same objects exist, and can be operated upon, regardless of whether they are seen by R or by Python. On that basis, tensorflow and keras then wrap the respective Python libraries and let you write R code that, in fact, looks like R.
This post first elaborates a bit on the short answer. We then go deeper into what happens in the background.
One note on terminology before we jump in: on the R side, we make a clear distinction between the packages keras and tensorflow. For Python, we are going to use TensorFlow and Keras interchangeably. Historically, these have been different, and TensorFlow was commonly thought of as one possible backend to run Keras on, besides the pioneering, now discontinued Theano, and CNTK. Standalone Keras does still exist, but recent work has been, and is being, done in tf.keras. Of course, this makes Python Keras a subset of Python TensorFlow, but all examples in this post will use that subset, so we can use both to refer to the same thing.
So keras, tensorflow, reticulate, what are they for?
Firstly, none of this would be possible without reticulate. reticulate is an R package designed to allow seamless interoperability between R and Python. If we absolutely wanted, we could construct a Keras model like this:
<class 'tensorflow.python.keras.engine.sequential.Sequential'>
We could go on adding layers …
m$add(tf$keras$layers$Dense(32, "relu"))
m$add(tf$keras$layers$Dense(1))
m$layers
[[1]]
<tensorflow.python.keras.layers.core.Dense>
[[2]]
<tensorflow.python.keras.layers.core.Dense>
But who would want to? If this were the only way, it would be less cumbersome to write Python directly. Plus, as a user you would have to know the complete Python-side module structure (now where do optimizers live, currently: tf.keras.optimizers, tf.optimizers …?), and keep up with all path and name changes in the Python API.
This is where keras comes into play. keras is where the TensorFlow-specific usability, re-usability, and convenience features live.
The functionality provided by keras spans the whole range from boilerplate avoidance, through enabling elegant, R-like idioms, to providing means of advanced feature usage. As an example of the first two, consider layer_dense which, among other things, converts its units argument to an integer, and takes arguments in an order that allows it to be “pipe-added” to a model. Instead of
model <- keras_model_sequential()
model$add(layer_dense(units = 32L))
we can just say
model <- keras_model_sequential()
model %>% layer_dense(units = 32)
While these are nice to have, there is more. Advanced functionality in (Python) Keras mostly depends on the ability to subclass objects. One example is custom callbacks. If you were using Python, you would have to subclass tf.keras.callbacks.Callback. From R, you can instead create an R6 class inheriting from KerasCallback. This works because keras defines an actual Python class, RCallback, and maps your R6 class’s methods to it.
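To make the idea of mapping user-defined methods onto a Python class more concrete, here is a minimal, Python-only sketch of the delegation pattern. It does not use TensorFlow: Callback is a stand-in for tf.keras.callbacks.Callback, and DelegatingCallback with its handlers dictionary are hypothetical names chosen for illustration, not the actual RCallback implementation.

```python
class Callback:
    """Stand-in for tf.keras.callbacks.Callback (assumption: two hooks only)."""

    def on_epoch_begin(self, epoch, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass


class DelegatingCallback(Callback):
    """Hypothetical subclass that forwards each hook to a user-supplied function."""

    def __init__(self, handlers):
        # handlers maps a hook name to a plain callable, much like the
        # methods of a user-defined class would be looked up by name.
        self._handlers = dict(handlers)

    def _dispatch(self, hook, epoch, logs):
        handler = self._handlers.get(hook)
        if handler is not None:
            handler(epoch, logs if logs is not None else {})

    def on_epoch_begin(self, epoch, logs=None):
        self._dispatch("on_epoch_begin", epoch, logs)

    def on_epoch_end(self, epoch, logs=None):
        self._dispatch("on_epoch_end", epoch, logs)


# Record what the user-supplied handler gets to see.
seen = []
callback = DelegatingCallback(
    {"on_epoch_end": lambda epoch, logs: seen.append((epoch, logs.get("loss")))}
)

# A training loop would call the hooks; here we call them by hand.
callback.on_epoch_begin(0)
callback.on_epoch_end(0, {"loss": 0.5})
callback.on_epoch_end(1, {"loss": 0.25})
```

The framework only ever sees a regular subclass of its own base class; the user-provided functions stay behind that facade.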
Another example is custom models, introduced on this blog about a year ago. These models can be trained with custom training loops. In R, you use keras_model_custom to create one, for example, like this:
m <- keras_model_custom(name = "mymodel", function(self) {
  self$dense1 <- layer_dense(units = 32, activation = "relu")
  self$dense2 <- layer_dense(units = 10, activation = "softmax")
  function(inputs, mask = NULL) {
    self$dense1(inputs) %>%
      self$dense2()
  }
})
Here, keras will make sure an actual Python object is created that subclasses tf.keras.Model and, when called, runs the above anonymous function().
So that’s keras. What about the tensorflow package? As a user, you only need it when you have to do advanced stuff, like configuring TensorFlow device usage or (in TF 1.x) accessing elements of the Graph or the Session. Internally, it is used heavily by keras. Essential internal functionality includes, e.g., implementations of S3 methods, like print, [ or +, on Tensors, so you can operate on them like on R vectors.
Now that we know what each of the packages is “for”, let’s dig deeper into what makes this possible.
Show me the magic: reticulate
Instead of exposing the topic top-down, we follow a by-example approach, building up complexity as we go. We’ll have three scenarios.
First, we assume we already have a Python object (constructed in whatever way) and need to convert it to R. Then, we’ll investigate how we can create a Python object by calling its constructor. Finally, we go the other way round: we ask how we can pass an R function to Python for later use.
Scenario 1: Python-to-R conversion
Let’s assume we have created a Python object in the global namespace, like this:
py_run_string("x = 1")
So: there is a variable, called x, with value 1, living in the Python world. Now how do we bring this thing into R?
We know the main entry point to conversion is py_to_r, defined as a generic in conversion.R:
py_to_r <- function(x) {
  ensure_python_initialized()
  UseMethod("py_to_r")
}
… with the default implementation calling a function named py_ref_to_r:
#' @export
py_to_r.default <- function(x) {
  [...]
  x <- py_ref_to_r(x)
  [...]
}
To find out more about what is going on, debugging at the R level won’t get us far. We start R under gdb so we can set breakpoints in C++ functions:
$ R -d gdb
GNU gdb (GDB) Fedora 8.3-6.fc30
[... some more gdb saying hello ...]
Reading symbols from /usr/lib64/R/bin/exec/R...
Reading symbols from /usr/lib/debug/usr/lib64/R/bin/exec/R-3.6.0-1.fc30.x86_64.debug...
Now start R, load reticulate, and execute the assignment we’re going to presuppose:
(gdb) run
Starting program: /usr/lib64/R/bin/exec/R
[...]
R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
Copyright (C) 2019 The R Foundation for Statistical Computing
[...]
> library(reticulate)
> py_run_string("x = 1")
So that sets up our scenario: the Python object (named x) we want to convert to R. Now, use Ctrl-C to “escape” to gdb, set a breakpoint in py_to_r and type c to get back to R:
(gdb) b py_to_r
Breakpoint 1 at 0x7fffe48315d0 (2 locations)
(gdb) c
Now what are we going to see when we access that x?
> py$x
Thread 1 "R" hit Breakpoint 1, 0x00007fffe48315d0 in py_to_r(libpython::_object*, bool)@plt () from /home/key/R/x86_64-redhat-linux-gnu-library/3.6/reticulate/libs/reticulate.so
Here are the relevant (for our investigation) frames of the backtrace:
Thread 1 "R" hit Breakpoint 3, 0x00007fffe48315d0 in py_to_r(libpython::_object*, bool)@plt () from /home/key/R/x86_64-redhat-linux-gnu-library/3.6/reticulate/libs/reticulate.so
(gdb) bt
#0 0x00007fffe48315d0 in py_to_r(libpython::_object*, bool)@plt () from /home/key/R/x86_64-redhat-linux-gnu-library/3.6/reticulate/libs/reticulate.so
#1 0x00007fffe48588a0 in py_ref_to_r_with_convert (x=..., convert=true) at reticulate_types.h:32
#2 0x00007fffe4858963 in py_ref_to_r (x=...) at /home/key/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include/RcppCommon.h:120
#3 0x00007fffe483d7a9 in _reticulate_py_ref_to_r (xSEXP=0x55555daa7e50) at /home/key/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include/Rcpp/as.h:151
...
...
#14 0x00007ffff7cc5fc7 in Rf_usemethod (generic=0x55555757ce70 "py_to_r", obj=obj@entry=0x55555daa7e50, call=call@entry=0x55555a0fe198, args=args@entry=0x55555557c4e0,
rho=rho@entry=0x55555dab2ed0, callrho=0x55555dab48d8, defrho=0x5555575a4068, ans=0x7fffffff69e8) at objects.c:486
We’ve removed a few intermediate frames related to (R-level) method dispatch.
As we already saw in the source code, py_to_r.default will delegate to a function called py_ref_to_r, which we see appears in #2. But what is _reticulate_py_ref_to_r in #3, the frame just below? Here is where the magic, unseen by the user, begins.
Let’s look at this from a bird’s-eye view. To translate an object from one language to another, we need to find a common ground, that is, a third language “spoken” by both of them. In the case of R and Python (as well as in a lot of other cases) this will be C / C++. So assuming we are going to write a C function to talk to Python, how can we use this function in R?
While R users have the ability to call into C directly, using .Call or .External, this is made much more convenient by Rcpp: you simply write your C++ function, and Rcpp takes care of compilation and provides the glue code necessary to call this function from R.
So py_ref_to_r really is written in C++:
// [[Rcpp::export]]
SEXP py_ref_to_r(PyObjectRef x) {
  return py_ref_to_r_with_convert(x, x.convert());
}
but the comment // [[Rcpp::export]] tells Rcpp to generate an R wrapper, py_ref_to_r, that itself calls a C++ wrapper, _reticulate_py_ref_to_r …
py_ref_to_r <- function(x) {
.Call(`_reticulate_py_ref_to_r`, x)
}
which finally wraps the “real” thing, the C++ function py_ref_to_r we saw above.
Via py_ref_to_r_with_convert in #1, a one-liner that extracts an object’s “convert” feature (see below)
// [[Rcpp::export]]
SEXP py_ref_to_r_with_convert(PyObjectRef x, bool convert) {
  return py_to_r(x, convert);
}
we finally arrive at py_to_r in #0.
Before we look at that, let’s contemplate that C/C++ “bridge” from the other side – Python.
While strictly, Python is a language specification, its reference implementation is CPython, with a core written in C and much more functionality built on top in Python. In CPython, every Python object (including integers or other numeric types) is a PyObject. PyObjects are allocated through and operated on using pointers; most C API functions return a pointer to one, PyObject *.
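The pointer view can be glimpsed from Python itself. The sketch below relies on a CPython implementation detail, namely that id() returns the object’s memory address, and uses the standard ctypes module to turn that address back into an object reference. It is an illustration only and should not be expected to work on other Python implementations.

```python
import ctypes

numbers = [1, 2, 3]

# Even a plain integer is a full object with a type.
int_is_object = isinstance(1, object)

# In CPython, id() is the address of the underlying PyObject.
address = id(numbers)

# Reinterpret the address as a PyObject* and get the object back.
recovered = ctypes.cast(address, ctypes.py_object).value

# Both names now refer to the very same object, not to a copy.
same_object = recovered is numbers
```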
So this is what we expect to work with, from R. What then is PyObjectRef doing in py_ref_to_r?
PyObjectRef is not part of the C API; it is part of the functionality introduced by reticulate to manage Python objects. Its main purpose is to make sure the Python object is automatically cleaned up when the R object (an Rcpp::Environment) goes out of scope.
Why use an R environment to wrap the Python-level pointer? This is because R environments can have finalizers: functions that are called before objects are garbage collected.
We use this R-level finalizer to ensure the Python-side object gets finalized as well:
Rcpp::RObject xptr = R_MakeExternalPtr((void*) object, R_NilValue, R_NilValue);
R_RegisterCFinalizer(xptr, python_object_finalize);
python_object_finalize is interesting, as it tells us something crucial about Python, or about CPython, to be precise: to find out if an object is still needed, or could be garbage collected, it uses reference counting, thus placing on the user the burden of correctly incrementing and decrementing references according to language semantics.
inline void python_object_finalize(SEXP object) {
  PyObject* pyObject = (PyObject*)R_ExternalPtrAddr(object);
  if (pyObject != NULL)
    Py_DecRef(pyObject);
}
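The interplay of a wrapper’s finalizer and the wrapped object’s reference count can be imitated in pure Python. The following is a minimal sketch under stated assumptions: Payload and Handle are hypothetical stand-ins (Handle plays the role of the R-side wrapper), and the exact numbers reported by sys.getrefcount are a CPython detail, which is why only differences are compared.

```python
import sys
import weakref


class Payload:
    """Stand-in for a Python object that has been handed over to R."""


class Handle:
    """Hypothetical wrapper, loosely analogous to the R-side environment."""

    def __init__(self, target):
        self.target = target        # holding the target adds one reference


released = []
payload = Payload()
baseline = sys.getrefcount(payload)

handle = Handle(payload)
# Register a finalizer that runs when the wrapper itself is collected.
weakref.finalize(handle, released.append, "wrapper finalized")
count_while_wrapped = sys.getrefcount(payload)

del handle                          # the wrapper goes out of scope
count_after_release = sys.getrefcount(payload)
```

Once the wrapper is gone, its finalizer has run and the payload’s count is back at the baseline; the payload itself stays alive because the variable payload still refers to it.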
Resuming on PyObjectRef, note that it also stores the “convert” feature of the Python object, used to determine whether that object should be converted to R automatically.
Back to py_to_r. This one now really gets to work with (a pointer to) the Python object,
SEXP py_to_r(PyObject* x, bool convert) {
  //...
}
and – but wait. Didn’t py_ref_to_r_with_convert pass it a PyObjectRef? So how come it receives a PyObject* instead? This is because PyObjectRef inherits from Rcpp::Environment, and its implicit conversion operator is used to extract the Python object from the Environment. Concretely, that operator tells the compiler that a PyObjectRef can be used as though it were a PyObject* in some contexts, and the associated code specifies how to convert from PyObjectRef to PyObject*:
operator PyObject*() const {
  return get();
}
PyObject* get() const {
  SEXP pyObject = getFromEnvironment("pyobj");
  if (pyObject != R_NilValue) {
    PyObject* obj = (PyObject*)R_ExternalPtrAddr(pyObject);
    if (obj != NULL)
      return obj;
  }
  Rcpp::stop("Unable to access object (object is from previous session and is now invalid)");
}
So py_to_r works with a pointer to a Python object and returns what we want, an R object (a SEXP).
The function checks for the type of the object, and then uses Rcpp to construct the adequate R object, in our case, an integer:
else if (scalarType == INTSXP)
  return IntegerVector::create(PyInt_AsLong(x));
For other objects, typically there’s more action required; but essentially, the function is “just” a big if-else tree.
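The shape of such a type-based dispatch can be sketched in a few lines of Python. The function name r_type_for and the labels it returns are simplifying assumptions made for illustration; they are not reticulate’s actual conversion table, which handles many more cases (arrays, data frames, and so on).

```python
def r_type_for(value):
    """Return a label for the R type a Python value might map to (simplified)."""
    if value is None:
        return "NULL"
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "logical"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "numeric"
    if isinstance(value, str):
        return "character"
    if isinstance(value, dict):
        return "named list"
    if isinstance(value, (list, tuple)):
        return "list"
    if callable(value):
        return "function"
    # Anything else would stay a reference to the Python object.
    return "python object reference"


examples = [None, True, 1, 2.5, "a", {"k": 1}, (1, "x"), len, object()]
mapped = [r_type_for(v) for v in examples]
```

The order of the branches matters: testing for int before bool would classify True as an integer.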
So this was scenario 1: converting a Python object to R. Now in scenario 2, we assume we still need to create that Python object.
Scenario 2: Creating a Python object
As this scenario is considerably more complex than the previous one, we will explicitly concentrate on some aspects and leave out others. Importantly, we’ll not go into module loading, which would deserve separate treatment of its own. Instead, we try to shed light on what’s involved using a concrete example: keras_model_sequential(), ubiquitous in keras code. All this R function does is
function(layers = NULL, name = NULL) {
keras$models$Sequential(layers = layers, name = name)
}
How can keras$models$Sequential() give us an object? When, in Python, you run the equivalent
tf.keras.models.Sequential()
this calls the constructor, that is, the __init__ method of the class:
class Sequential(training.Model):
    def __init__(self, layers=None, name=None):
        # ...
        # ...
So this time, before (as always, in the end) getting an R object back from Python, we need to call that constructor, that is, a Python callable. (Python callables subsume functions, constructors, and objects created from a class that has a __call__ method.)
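The three kinds of callables just mentioned can be checked with Python’s built-in callable(). In this small sketch, plain_function and Greeter are made-up names used only for the demonstration.

```python
def plain_function():
    return "called a function"


class Greeter:
    """Instances are callable because the class defines __call__."""

    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, name):
        return f"{self.greeting}, {name}"


instance = Greeter("hello")          # calling the class runs __init__

checks = {
    "function": callable(plain_function),
    "class (constructor)": callable(Greeter),
    "instance with __call__": callable(instance),
    "plain integer": callable(42),
}

greeting = instance("R")             # calling the instance runs __call__
```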
So when py_to_r, inspecting its argument’s type, sees it is a Python callable (wrapped in a PyObjectRef, the reticulate-specific subclass of Rcpp::Environment we talked about above), it wraps it (the PyObjectRef) in an R function, using Rcpp:
Rcpp::Function f = py_callable_as_function(pyFunc, convert);
The CPython-side action starts when py_callable_as_function then calls py_call_impl. py_call_impl executes the actual call and returns an R object, a SEXP. Now you may be asking, how does the Python runtime know it shouldn’t deallocate that object, now that its work is done? This is taken care of by the same PyObjectRef class used to wrap instances of PyObject *: it can wrap SEXPs as well.
While a lot more could be said about what happens before we finally get to work with that Sequential model from R, let’s stop here and look at our third scenario.
Scenario 3: Calling R from Python
Not surprisingly, sometimes we need to pass R callbacks to Python. One example is R data generators that can be used with keras models.
In general, for R objects to be passed to Python, the process is roughly the reverse of what we described in scenario 1. Say we type:
py$a <- 1
This assigns 1 to a variable a in the Python main module.
To enable assignment, reticulate provides an implementation of the S3 generic $<-, $<-.python.builtin.object, which delegates to py_set_attr, which then calls py_set_attr_impl, yet another C++ function exported via Rcpp.
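Seen from the Python side, the effect of such an assignment amounts to setting an attribute on the main module. Here is a minimal Python sketch of that effect only (it says nothing about reticulate’s C++ code path); the variable name a follows the text above.

```python
import sys

# The module that backs Python's top-level (global) namespace.
main_module = sys.modules["__main__"]

# Attribute assignment on the module object creates a global variable there.
setattr(main_module, "a", 1)

# The value is now reachable both as an attribute and through the
# module's namespace dictionary.
value_via_getattr = getattr(main_module, "a")
value_via_dict = vars(main_module)["a"]
```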
Let’s focus on a different aspect here, though. A prerequisite for the assignment to happen is getting that 1
converted to Python. (We’re using the simplest possible example, obviously; but you can imagine this getting a lot more complex if the object isn’t a simple number).
For our “minimal example”, we see a stacktrace like the following
#0 0x00007fffe4832010 in r_to_py_cpp(Rcpp::RObject_Impl<Rcpp::PreserveStorage>, bool)@plt () from /home/key/R/x86_64-redhat-linux-gnu-library/3.6/reticulate/libs/reticulate.so
#1 0x00007fffe4854f38 in r_to_py_impl (object=..., convert=convert@entry=true) at /home/key/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include/RcppCommon.h:120
#2 0x00007fffe48418f3 in _reticulate_r_to_py_impl (objectSEXP=0x55555ec88fa8, convertSEXP=<optimized out>) at /home/key/R/x86_64-redhat-linux-gnu-library/3.6/Rcpp/include/Rcpp/as.h:151
...
#12 0x00007ffff7cc5c03 in dispatchMethod (sxp=0x55555d0cf1a0, dotClass=<optimized out>, cptr=cptr@entry=0x7ffffffeaae0, method=method@entry=0x55555bfe06c0,
generic=0x555557634458 "r_to_py", rho=0x55555d1d98a8, callrho=0x5555555af2d0, defrho=0x555557947430, op=<optimized out>, op=<optimized out>) at objects.c:436
#13 0x00007ffff7cc5fc7 in Rf_usemethod (generic=0x555557634458 "r_to_py", obj=obj@entry=0x55555ec88fa8, call=call@entry=0x55555c0317b8, args=args@entry=0x55555557cc60,
rho=rho@entry=0x55555d1d98a8, callrho=0x5555555af2d0, defrho=0x555557947430, ans=0x7ffffffe9928) at objects.c:486
Whereas r_to_py is a generic (like py_to_r above), r_to_py_impl is wrapped by Rcpp and r_to_py_cpp is a C++ function that branches on the type of the object, basically the counterpart of the C++ py_to_r.
In addition to that general process, there is more going on when we call an R function from Python. As Python doesn’t “speak” R, we need to wrap the R function in CPython – basically, we are extending Python here! How to do this is described in the official Extending Python Guide.
In official terms, what reticulate does is embed and extend Python.
Embed, because it lets you use Python from inside R. Extend, because to enable Python to call back into R it needs to wrap R functions in C, so Python can understand them.
As part of the former, the desired Python is loaded (Py_Initialize()); as part of the latter, two functions are defined in a new module named rpycall, which will be loaded when Python itself is loaded.
PyImport_AppendInittab("rpycall", &initializeRPYCall);
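These methods are call_r_function, used by default, and call_python_function_on_main_thread, used in cases where we need to make sure the R function is called on the main thread:
PyMethodDef RPYCallMethods[] = {
  { "call_r_function", (PyCFunction)call_r_function,
    METH_VARARGS | METH_KEYWORDS, "Call an R function" },
  { "call_python_function_on_main_thread", (PyCFunction)call_python_function_on_main_thread,
    METH_VARARGS | METH_KEYWORDS, "Call a Python function on the main thread" },
  { NULL, NULL, 0, NULL }
};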
call_python_function_on_main_thread is especially interesting. The R runtime is single-threaded; while the CPython implementation of Python effectively is as well, due to the Global Interpreter Lock, this is not automatically the case when other implementations are used, or C is used directly. So call_python_function_on_main_thread makes sure that unless we can execute on the main thread, we wait.
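The general idea of “hand the call to the main thread and wait” can be sketched with Python’s standard library. This is not how reticulate implements it; the names call_on_main_thread, run_one_pending_call and pending_calls are hypothetical, and the thread that serves the queue simply plays the role of the main thread.

```python
import queue
import threading
from concurrent.futures import Future

# Calls waiting to be executed by the designated "main" thread.
pending_calls = queue.Queue()


def call_on_main_thread(func, *args):
    """Hand func to the main thread and block until it has run there."""
    future = Future()
    pending_calls.put((func, args, future))
    return future.result()      # waits until the main thread sets the result


def run_one_pending_call():
    """Executed by the main thread: run the next queued call."""
    func, args, future = pending_calls.get()
    try:
        future.set_result(func(*args))
    except Exception as error:  # propagate failures to the waiting caller
        future.set_exception(error)


def current_thread_name():
    return threading.current_thread().name


results = {}


def worker():
    # Runs on a background thread, but the call itself executes elsewhere.
    results["ran_on"] = call_on_main_thread(current_thread_name)


background = threading.Thread(target=worker, name="background-worker")
background.start()
run_one_pending_call()          # the thread running this line plays "main"
background.join()

serving_thread = threading.current_thread().name
```

A fuller version would run func directly when the caller already is the serving thread, since waiting on its own queue would otherwise block forever.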
That’s it for our three “spotlights on reticulate”.
Wrapup
It goes without saying that there’s a lot about reticulate we didn’t cover in this article, such as memory management, initialization, or specifics of data conversion. Nonetheless, we hope we were able to shed a bit of light on the magic involved in calling TensorFlow from R.
R is a concise and elegant language, but to a high degree its power comes from its packages, including those that allow you to call into, and interact with, the outside world, such as deep learning frameworks or distributed processing engines. In this post, it was a special pleasure to focus on a central building block that makes much of this possible: reticulate.
Thanks for reading!