https://github.com/ColinFay/hordes
R from NodeJS, the right way.
https://github.com/ColinFay/hordes
Last synced: 5 months ago
JSON representation
R from NodeJS, the right way.
- Host: GitHub
- URL: https://github.com/ColinFay/hordes
- Owner: ColinFay
- Archived: true
- Created: 2020-05-11T07:02:02.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-30T20:39:40.000Z (over 2 years ago)
- Last Synced: 2024-08-09T02:16:37.163Z (9 months ago)
- Language: JavaScript
- Homepage: https://colinfay.me/hordes/
- Size: 8.82 MB
- Stars: 58
- Watchers: 12
- Forks: 8
- Open Issues: 13
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- jimsghstars - ColinFay/hordes - R from NodeJS, the right way. (JavaScript)
README
# hordes 0.2.1
R from NodeJS, the right way.
## Install
`hordes` can be installed from npm with
```
npm install hordes
```## Jump to Examples
Maybe you don't have time to read the background and you just want to jump straight to the examples:
+ [Simple Example](#simple-example)
+ [API using Express](#api-using-express)
+ [Golem Creator](#golem-creator)## About
`hordes` makes R available from NodeJS.
The general idea of `hordes` is that NodeJS is the perfect tool when it comes to HTTP i/o, hence we can leverage the strength of this ecosystem to build Web Services that can serve R results.
For example, if you have a web service that needs authentication, using `hordes` allows to reuse existing NodeJS modules, which are widely used and tested inside the NodeJS ecosystem.
Another good example is NodeJS native cluster mode, and external modules like `pm2` which are designed to launch your app in a multicore mode, and also that watches that your app is still running continuously, and relaunches it if one of the process stop (kind of handy for a production application that handle a lot of load).
It also makes things easier when it comes to mixing various languages in the same API: for example, you can serve standard html on an endpoint, and R on others.
And don't get me started on scaling NodeJS applications.From the R point of view, the general idea with `hordes` is that every R function call should be stateless.
Keeping this idea in mind, you can build a package where functions are to be considered as 'endpoints' which are then called from NodeJS.
In other words, there is no "shared-state" between two calls to R—if you want this to happen, you should either register the values inside Node, save it on disk, or use a database as a backend (which should be the preferred solution if you ask me).Examples below will probably make this idea clearer.
### How to
The `hordes` module contains the following functions:
#### `hordes_init`
The `library()` and `mlibrary()` functions will be talking to [RServe](https://www.rforge.net/Rserve/doc.html) through [node-rio](https://github.com/albertosantini/node-rio).
You can either launch Rserve by hand, or from Node by calling `hordes_init()` at the top of your script if you want to lauch it.You can serve several instances of RServe, by calling `hordes_init(port = XXX)` where `XXX` is a port.
That also mean that you can open and call several instances of RServe using a Node load balancer.#### `library`
`library` behaves as R `library()` function, except that the output is a JavaScript object with all the functions from the package.
For example, `library("stats")` will return an object with all the functions from `{stats}`.
By doing `const stats = library("stats");`, you will have access to all the functions from `{stats}`, for example `stats.lm()`.> Note that if you want to call functions with dot (for example `as.numeric()`), you should do it using the `[` notation, not the dot one (i.e `base['as.numeric']`, not `base.as.numeric`).
Calling `stats.lm("code")` will launch R, run `stats::lm("code")` and return the output to Node.
``` javascript
// Here, we suppose you already have Rserve running in the background on port 6311
const {library} = require('hordes');
const stats = library(package = "stats");stats.lm("Sepal.Length ~ Sepal.Width, data = iris")
.then((e) => console.log(e.join("\n")))
.catch((err) => console.error(err))
``````
Call:
stats::lm(formula = Sepal.Length ~ Sepal.Width, data = iris)Coefficients:
(Intercept) Sepal.Width
6.5262 -0.2234
```As they are promises, you can use them in an async/await pattern or with `then/catch`.
The rest of this README will use `async/await```` javascript
const { library, hordes_init } = require('hordes');
const stats = library("stats");(async() => {
// You can ignore this if you already have Rserve running on the background
await hordes_init();try {
const a = await stats.lm("Sepal.Length ~ Sepal.Width, data = iris")
console.log(a.join("\n"))
} catch (e) {
console.log(e)
}try {
const a = stats.lm("Sepal.Length ~ Sepal.Width, data = iris")
const b = stats.lm("Sepal.Length ~ Petal.Width, data = iris")
const ab = await Promise.all([a, b])
console.log(ab[0].join("\n"))
console.log(ab[1].join("\n"))
} catch (e) {
console.log(e)
}
})();
``````
Call:
stats::lm(formula = Sepal.Length ~ Sepal.Width, data = iris)Coefficients:
(Intercept) Sepal.Width
6.5262 -0.2234Call:
stats::lm(formula = Sepal.Length ~ Sepal.Width, data = iris)Coefficients:
(Intercept) Sepal.Width
6.5262 -0.2234Call:
stats::lm(formula = Sepal.Length ~ Petal.Width, data = iris)Coefficients:
(Intercept) Petal.Width
4.7776 0.8886
```By default, these functions will return an array of characters, corresponding to the output of R.
If you want to return one of the types supported by `node-rio`, you can specify `capture_output = false` in the `library()` function: that could improve the performance of your application if you have a lot of load.#### `mlibrary`
`mlibrary` does the same job as `library` except the functions are natively memoized.
__This is probably the mode you will want to use on a regular basis, unless your data are changing regularly.__``` javascript
const {library, mlibrary} = require('hordes');
const base = library("base");
const mbase = mlibrary("base");(async () => {
try {
const a = await base.sample("1:100, 5")
console.log("a:", a)
const b = await base.sample("1:100, 5")
console.log("b:", b)
} catch(e){
console.log(e)
}try {
const a = await mbase.sample("1:100, 5")
console.log("a:", a)
const b = await mbase.sample("1:100, 5")
console.log("b:", b)
} catch(e){
console.log(e)
}
}
)();
``````
a: [1] 49 13 37 25 91b: [1] 5 17 68 26 29
a: [1] 96 17 6 4 75
b: [1] 96 17 6 4 75
```### Data Exchange
If you want to exchange data between R and NodeJS, you can rely on the default `node-rio`, that can share a series of formats (string, numbers...), by passing `{capture_output: false}` as the option parameter to `library()`.
Otherwise, the function calls will return a string, so use an interchangeable format that can be converted in Node: JSON, arrow, base64 for images, raw strings...
``` javascript
const {library} = require('hordes');
const jsonlite = library("jsonlite");
const base = library("base");(async () => {
await hordes_init();
try {
const a = await jsonlite.toJSON("iris")
console.log(JSON.parse(a)[0])
} catch(e){
console.log(e)
}
try {
const b = await base.cat("21")
console.log(parseInt(b) * 2)
} catch(e){
console.log(e)
}
}
)();
``````
{
'Sepal.Length': 5.1,
'Sepal.Width': 3.5,
'Petal.Length': 1.4,
'Petal.Width': 0.2,
Species: 'setosa'
}
42
```Note that there is a `hordes` R package here on the [r-hordes](r-hordes) folder, and that it contains some functions to facilitate the data translation.
It can be installed with
```r
remotes::install_github("colinfay/hordes", subdir = "r-hordes")
```For example, to share images, you can create a function in a package (here named "`{hordex}`") that does:
```r
ggpoint <- function(n) {
gg <- ggplot(iris[1:n, ], aes(Sepal.Length, Sepal.Width)) +
geom_point()
hordes::base64_img_ggplot(gg)
}```
Then in NodeJS:
```javascript
const express = require('express');
const {mlibrary} = require('hordes');
const app = express();
const hordesx = mlibrary("hordesx")app.get('/ggplot', async (req, res) => {
try {
const im = await hordesx.ggpoint(`n = ${req.query.n}`);
const img = Buffer.from(im, 'base64');
res.writeHead(200, {
'Content-Type': 'image/png',
'Content-Length': img.length
});
res.end(img);
} catch(e){
res.status(500).send(e)
}
})app.listen(2811, function () {
console.log('Example app listening on port 2811!')
})
```> http://localhost:2811/ggplot?n=5
> http://localhost:2811/ggplot?n=50
> http://localhost:2811/ggplot?n=150### `get_hash`
Before calling `library()` or `mlibrary()`, you can check that the install package still match a hash previously compiled with `get_hash`.
This hash is computed from the `DESCRIPTION` of the package called.That way, if ever the `DESCRIPTION` file changes (version update, or stuff like that...), you can get alerted (app won't launch).
Just ignore this if you don't care about checking this has (but you should in a production setting, so you can be alerted that the package you are using stays the same).``` javascript
const { check_hash, get_hash } = require('hordes');
console.log(get_hash("golem"))
``````
'fdfe0166629045e6ae8f7ada9d9ca821742e8135efec62bc2226cf0811f44ef3'
```Then if you call `library()` with another hash, the app will fail.
```javascript
check_hash("golem", "blabla")
``````
throw new Error("Hash from DESCRIPTION doesn't match specified hash.")
``````javascript
check_hash("golem", 'e2167f289a708b2cd3b774dd9d041b9e4b6d75584b9421185eb8d80ca8af4d8a')
var golem = library("golem")
Object.keys(golem).length
``````
104
```#### `waiter`
You can launch an R process that streams data and wait for a specific output in the stdout.
The specificity of `waiter` is that it doesn't rely on `node-rio`, but spawn a real R process, and reads the elements streamed on stdout.The promise resolves with and `{proc, raw_output}`: `proc` is the process object created by Node, `raw_output` is the output buffer, that can be turned to string with `.toString()`.
A streaming process here is considered in a lose sense: what we mean here is anything that prints various elements to the console.
For example, when you create a new application using the `{golem}` package, the app is ready once this last line is printed to the console.
This is exactly what `waiter` does, it waits for this last line to be printed to the R stdout before resolving.```r
> golem::create_golem('pouet')
-- Checking package name -------------------------------------------------------
v Valid package name
-- Creating dir ----------------------------------------------------------------
v Created package directory
-- Copying package skeleton ----------------------------------------------------
v Copied app skeleton
-- Setting the default config --------------------------------------------------
v Configured app
-- Done ------------------------------------------------------------------------
A new golem named pouet was created at /private/tmp/pouet .
To continue working on your app, start editing the 01_start.R file.
``````javascript
const { waiter } = require("hordes")
const express = require('express');
const app = express();app.get('/creategolem', async(req, res) => {
try {
await waiter("golem::create_golem('pouet')", {solve_on: "To continue working on your app"});
res.send("Created ")
} catch (e) {
console.log(e)
res.status(500).send("Error creating the golem project")
}
})app.listen(2811, function() {
console.log('Example app listening on port 2811!')
})
```-> http://localhost:2811/creategolem
### Changing the process that runs R in waiters
By default, the R code is launched by `RScript`, but you can specify another (for example if you need another version of R):
```javascript
const { waiter } = require("hordes")
const express = require('express');
const app = express();app.get('/creategolem', async(req, res) => {
try {
await waiter("golem::create_golem('pouet')", {solve_on: "To continue working on your app", process: '/usr/local/bin/RScript'});
res.send("Created ")
} catch (e) {
console.log(e)
res.status(500).send("Error creating the golem project")
}
})app.listen(2811, function() {
console.log('Example app listening on port 2811!')
})
```### Examples
#### Simple example
``` javascript
const { mlibrary } = require('hordes');
const dplyr = mlibrary("dplyr");
const stats = mlibrary("stats");(async() => {
try {
const sample = await dplyr.sample_n("iris, 5")
console.log(sample)
} catch (e) {
console.log(e)
}try {
const pull = await dplyr.pull("airquality, Month")
console.log(pull)
} catch (e) {
console.log(e)
}try {
const lm = await stats.lm("Sepal.Length ~ Sepal.Width, data = iris")
console.log(lm)
} catch (e) {
console.log(e)
}
})();
``````
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.7 3.8 1.7 0.3 setosa
2 6.7 2.5 5.8 1.8 virginica
3 6.9 3.1 5.1 2.3 virginica
4 6.4 2.9 4.3 1.3 versicolor
5 5.1 3.3 1.7 0.5 setosa[1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6
[38] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7
[75] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[112] 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[149] 9 9 9 9 9Call:
stats::lm(formula = Sepal.Length ~ Sepal.Width, data = iris)Coefficients:
(Intercept) Sepal.Width
6.5262 -0.2234
```#### API using express
``` javascript
const express = require('express');
const { mlibrary } = require('hordes');
const app = express();
const stats = mlibrary("stats");app.get('/lm', async(req, res) => {
try {
const output = await stats.lm(`${req.query.left} ~ ${req.query.right}`)
res.send('' + output + '')
} catch (e) {
res.status(500).send(e)
}
})app.get('/rnorm', async(req, res) => {
try {
const output = await stats.rnorm(req.query.left)
res.send('' + output + '')
} catch (e) {
res.status(500).send(e)
}
})app.listen(2811, function() {
console.log('Example app listening on port 2811!')
})
```-> http://localhost:2811/lm?left=iris$Sepal.Length&right=iris$Petal.Length
-> http://localhost:2811/rnorm?left=10
### Golem Creator
```javascript
const { waiter } = require("hordes")
const express = require('express');
const app = express();app.get('/creategolem', async(req, res) => {
try {
await waiter(`golem::create_golem('${req.query.name}')`, solve_on = "To continue working on your app");
res.send("Created ")
} catch (e) {
console.log(e)
res.status(500).send("Error creating the golem project")
}
})app.listen(2811, function() {
console.log('Example app listening on port 2811!')
})
```-> http://localhost:2811/creategolem?name=coucou
*