https://github.com/SciNim/rnim
A bridge between R and Nim
https://github.com/SciNim/rnim
bridge hacktoberfest r r-lang rlang rstats
Last synced: 28 days ago
JSON representation
A bridge between R and Nim
- Host: GitHub
- URL: https://github.com/SciNim/rnim
- Owner: SciNim
- Created: 2020-09-27T18:47:49.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-03-26T09:53:35.000Z (about 1 year ago)
- Last Synced: 2024-08-04T03:08:27.483Z (10 months ago)
- Topics: bridge, hacktoberfest, r, r-lang, rlang, rstats
- Language: Nim
- Homepage:
- Size: 201 KB
- Stars: 24
- Watchers: 4
- Forks: 1
- Open Issues: 3
-
Metadata Files:
- Readme: README.org
- Changelog: changelog.org
Awesome Lists containing this project
- awesome-nim - rnim - A bridge between R and Nim. Currently this is a barely working prototype. --> (Development Tools / Binding Generators)
README
* rnim - A bridge between R ⇔ Nim
Currently this is a barely working prototype.
Calling R functions from Nim works reasonably well, if basic Nim types
are used. Both named and unnamed function arguments are supported.The R =SEXP= object can be converted into all Nim types, which are
supported in the other direction.Interfacing with shared libraries written in Nim works for basic
types. See the =tNimFromR.nim= and =tCallNimFromR.R= files for an
example in =tests=.** Basic syntax to call R from Nim
Intefacing with R from Nim works by making use of the =Rembedded.h=
functionality, which effectively launches a silent, embedded R repl.This repl is then fed with S expressions to be evaluated. The S
expression is the basic data type on the C side of R. Essentially
everything is mapped to different kinds of S expressions, be it symbols,
functions, simple data types, vectors etc.This library aims to hide both the data conversions and memory
handling from the user.This means that typically one sets up the R repl, does some calls to R
and finally shuts down the R repl again:
#+begin_src nim
let R = setupR()
# some or many calls to R functions
teardown(R)
#+end_srcThe returned =R= object is essentially just a dummy object, which is
used to help with overload resolution (we want =untyped= templates to
allow calling and R function by ident without having to manually wrap
them) and it keeps track of the state of the repl.In order to not have to call the =teardown= procedure manually, there
are two options:
- a =withR= template, which takes a block of code and injects a
variable =R= into its calling scope. The repl will be shut down when
leaving its scope
- by compiling with =--gc:arc= or =--gc:orc=. In that case we can
define a proper destructor, which will be automatically called when
the =R= variable runs out of scope and is destroyed.Note two things:
1. in principle there is a finalizer defined for the non ARC / ORC
case, which performs the same duty. However, at least according to
my understanding, it's run whenever the GC decides to collect the
=R= variable. This might not be very convenient.
2. I don't know whether it's an inherent limitation of the embedded R
repl, but it seems like one cannot destroy an R repl and construct
a new one. If one tries, one is greeted by
#+begin_src sh
R is already initialized
#+end_src
message.*** Simple usage example
The above out of the way, let's look at the basic things currently
possible.For clarity I will annotate the types even where not required.
#+begin_src nim
import rnim
let R = setupR()
# perform a call to the R stdlib function `sum`, by using
# the `.()` dot call template and handing a normal Nim seq
let res: SEXP = R.sum(@[1, 2, 3])
# the result is a `SEXP`, the basic R data type. We can now
# use the `to` proc to get a Nim type from it:
doAssert res.to(int) == 6
#+end_srcSome functions, which have atypical names may not be possible to call
via the dot call template. In that case, we can call the underlying
macro directly, called =callEval= (possibly name change incoming...):
#+begin_src nim
doAssert callEval(`+`, 4.5, 10.5).to(float) == 15.0
#+end_src
This also showcases that functions taking multiple arguments work as
expected. At the moment we're limited to 6 arguments (there's specific
C functions to construct calls up to 6 arguments. Need to
implement arbitrary numbers manually).Also named arguments are supported. Let's use the =seq= function as an
example, the more general version of the =:= operator in R
(e.g. =1:5=):
#+begin_src nim
check R.seq(1, 10, by = 2).to(seq[int]) == toSeq(countup(1, 10, 2))
#+end_src
As we can see, we can also convert =SEXPs= containing vectors back to
Nim sequences.Finally, we can also source from arbitrary R files. Assuming we have
some R file =foo.R=:
#+begin_src R
hello <- function(name) {
return(paste(c("Hello", name), sep = " ", collapse = " "))
}
#+end_src
From Nim we can then call it via:
#+begin_src nim
import rnim
# first set up an R interpreter
let R = setupR()
# now source the file
R.source("foo.R")
# and now we can call R functions defined in the sourced file
doAssert R.hello("User").to(string) == "Hello User"
#+end_srcThat covers the most basic functionality in place so far.
*** Vectors (data arrays)
Arrays are always a special case, as they are usually the main source
of computational work. Avoiding unnecessary copies of arrays is
important to keep performance high.To provide a no-copy interface to data arrays (R vectors) from R,
there are two types to help: =NumericVector[T]= and =RawVector[T]=.
They provide a nice Nim interface to work with such numerical data.Any R =SEXP= can be converted to either of these two types. If the
corresponding =SEXP= does *not* correspond to a vector, an exception
will be thrown at runtime.These types internally simply keep a copy of the underlying data array
in the =SEXP=.From a usability standpoint =NumericVector[T]= is the main type that
should be used. =RawVector[T]= simply provides a *slightly* lower
wrapper, which is however more restrictive.A =RawVector[T]= can only be constructed for: =cint, int32, float,
cdouble=. This is because the underlying R =SEXP= come only in two
types: =INTSXP= and =REALSXP=, the former stores 32-bit integers and
the latter 64-bit floats (technically afaik the platform specific
size, so 32-bit floats on a 32-bit machine. The inverse is *not* the
case for =INTSXP= though!). There is no way to treat a =REALSXP=
vector as a =RawVector[int32]= for instance.This is where =NumericVector[T]= comes in. It can be constructed for
all numerical types larger or equal to 32-bit in size (to avoid loss
of information when constructing *from* a =SEXP=). Unsigned integers
so far are also not supported.A short example:
#+begin_src nim :tangle /tmp/readme_numericvector.nim
import rnim
let R = setupR()let x = @[1, 2, 3]
let xR: SEXP = x.nimToR # types for clarity
var nv = initNumericVector[int](xR)
# `nv` is now a vector pointing to the same data as `xR`
# we can access individual elements:
echo nv[1] # 2
# modify elements:
nv[2] = 5
# check its length
doAssert nv.len == 3
# iterate over it
for i in 0 .. nv.high:
echo nv[i]
for x in nv:
echo x
for i, x in nv:
echo "Index ", i, " contains ", x
# compare them:
doAssert nv == nv
# and print them:
echo nv # NumericVector[int](len: 3, kind: vkFloat, data: [1, 2, 5])
# as `xR` contains the same memory location, constructing another vector
# and comparing them yields `true`, even though we modified `nv`
let nv2 = initNumericVector[int](xR)
doAssert nv == nv2
# finally we can also construct a `NumericVector` straight from a Nim sequence
let nv3 = @[1.5, 2.5, 3.5].toNumericVector()
echo nv3
#+end_srcIf you ran this code you will see a message:
#+begin_src
Interpreting input vector of type `REALSXP` as int loses information!
#+end_srcThis is because we first constructed a =SEXP= from a 64-bit integer
sequence in Nim. As mentioned before, 64-bit integers do not
exist. Therefore, the =xR SEXP= above is actually stored in a
=REALSXP=. By constructing a =NumericVector[int]= we tell the Nim
compiler we wish to convert from and to =int=, no matter the
underlying type of the =SEXP= array, i.e. =INTSXP= or =REALSXP=. The
message simply makes you aware that this is happening (it may be taken
out in the future).The fact that this conversion happens internally is the reason for the
existence of =RawVector=, which explicitly disallows this.Further, =NumericVector= is actually a variant object. Depending on
the runtime type of the =SEXP= from which we construct a =SEXP= the
correct branch of the variant object will be filled.
For extremely performance sensitive application it may thus be
preferable to have a type where variant kind checks and possible type
conversions do not happen.*** =Rctx= macro
As mentioned in the previous secton, some function names are weird and
require the user to use =callEval= directly.To make calling such functions a bit nicer, there is an =Rctx= macro,
which allows for directly calling R functions with e.g. dots in their
names, and also allows for assignments.#+begin_src nim
let x = @[5, 10, 15]
let y = @[2.0, 4.0, 6.0]var df: SEXP
Rctx:
df = data.frame(Col1 = x, Col2 = y)
let df2 = data.frame(Col1 = x, Col2 = y)
print("Hello from R")
#+end_src
where both =df= as well as =df2= will then store an equivalent data
frame. The last line shows that it's also possible to use this macro
to avoid the need to discard all R calls.** Calling Nim code from R
Nim can be used to write extensions for R. This is done by compiling a
Nim file as a shared library and calling it in R using the =.Call=
interface.An example can be seen from the tests:
- https://github.com/SciNim/rnim/blob/master/tests/tNimFromR.nim
the Nim file that is compiled to a shared library
- https://github.com/SciNim/rnim/blob/master/tests/tCallNimFromR.R
the corresponding R file that wraps the shared libraryIn the near future the latter R file will be auto generated by the Nim
code at compile time.The basic idea is as follows. Assume you want to write an extension
that adds two numbers in Nim to be called from R.You write a Nim file with the desired procedure and attach the
={.exportR.}= pragma as follows:=myRmodule.nim=:
#+begin_src nim
import rnimproc addNumbers*(x, y: SEXP): SEXP {.exportR.} =
## adds two numbers. We will treat them as floats
let xNim = x.to(float)
let yNim = y.to(float)
result = (x + y).nimToR
#+end_srcNote the usage of =SEXP= as the input and output types. In the future
the conversions (and possibly non copy access) will be automated. For
now we have to convert manually to and from Nim types.This file is compiled as follows:
#+begin_src sh
nim c (-d:danger) --app:lib (--gc:arc) myRModule.nim
#+end_src
where the =danger= and =ARC= usage are of course optional (but ARC/ORC
is recommended).This will generate a =libmyRmodule.so=. The resulting shared library
in principle needs to be manually loaded via =dyn.load= in R and each
procedure in it needs to be called using the =.Call= interface.Fortunately, this can be automated easily. Therefore, when compiling
such a shared library, we automatically emit an R wrapper, that has
the same name as the input Nim file. So the following file is generated:=myRmodule.R=:
#+begin_src R
dyn.load("libmyRmodule.so")addNumbers <- function(a, b) {
return(.Call("addNumbers", a, b))
}
#+end_srcThis file can now be sourced from the R interpreter (using the
=source= function) or in an R script and then =addNumbers= is usable
and will execute the compiled Nim code!Note that the autogeneration logic assumes the shared library and the
generated R script will live in the same directory. If you wish to
move one, you might have to adjust the paths that perform the
~dyn.load~ command!** Trying it out
To try out the functionality of calling R from Nim, you need to meet a
few prerequisites.*** Setup on Linux
- a working R installation _with_ a =libR.so= shared library
- the shell environment variable =R_HOME= needs to be defined and has
to point to the directory which contains the full R directory
structure. That is /not/ the path where the R binary lies!
Finally, the =libR.so= has to be findable for dynamic loading. On my
machine the path of it by default isn't added to =ld= via
=/etc/ld.so.conf.d= (for the time being I just define =LD_LIBRARY_PATH=
Setup on my machine:
#+begin_src sh
which R
echo $R_HOME
echo $LD_LIBRARY_PATH
#+end_src
#+begin_src sh
/usr/bin/R
/usr/lib/R
/usr/lib/R/lib
#+end_srcAn easy way to set the =R_HOME= variable is by asking R about it:
#+begin_src sh
R RHOME
#+end_src
returns the correct path. We can use that to set the =R_HOME=
variable:
#+begin_src sh
export R_HOME=`R RHOME`
export LD_LIBRARY_PATH=$R_HOME/lib # maybe not required on your system
#+end_src*** Setup on Windows
- a working R installation _with_ a =R.dll= shared library
- the shell environment variable =R_HOME= needs to be defined and has
to point to the directory which contains the full R directory
structure. That is /not/ the path where the R binary lies!
Example setup:
#+begin_src sh
where R.dll
set R_HOME
#+end_src
#+begin_src sh
C:\Program Files\R\R-4.0.4\bin\x64\R.dll
R_HOME=C:\Program Files\R\R-4.0.4
#+end_src*** Test your setup
Run the test file:
#+begin_src sh
nim c -r tests/tRfromNim.nim
#+end_src