https://github.com/jtilly/r-conda-env

Wrap Python code including its conda environment in an R Package
https://github.com/jtilly/r-conda-env

conda interface python r reproducibility reticulate

Last synced: 3 months ago
JSON representation

Wrap Python code including its conda environment in an R Package

Host: GitHub
URL: https://github.com/jtilly/r-conda-env
Owner: jtilly
Created: 2020-04-08T13:55:13.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2020-04-13T05:02:45.000Z (about 5 years ago)
Last Synced: 2025-01-12T06:07:04.905Z (4 months ago)
Topics: conda, interface, python, r, reproducibility, reticulate
Language: R
Homepage:
Size: 63.5 KB
Stars: 2
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

        

# r-conda-env

[![Travis build

status](https://travis-ci.org/jtilly/r-conda-env.svg?branch=master)](https://travis-ci.org/jtilly/r-conda-env)

R-Package = conda environment wrapped model

This package allows you to ship and deploy machine learning models built

in Python using an R package.

Say you have a Python model that works in one specific conda environment

and you want to make it accessible to R users via reticulate. How do you

go about doing that?

This proof of concept R-package comes with a fully specified conda

environment that will be created when the R package is installed. All

Python code inside this package will then be run in this conda

environment. We can ship several models in the same R package as long as

they share their conda environment. If two models do not share their

conda environment, we ship them in separate R packages.

## Install

You need to have conda installed on your system and reticulate must be

able to [find

it](https://rstudio.github.io/reticulate/reference/conda-tools.html#finding-conda).

``` r

# install.packages("remotes")

remotes::install_github("jtilly/r-conda-env")

```

## Usage

``` r

library(rcondaenv)

create_package_env()

#> Creating conda environment now.

#> Environment 2f0409c2f60c564607d28c44c8edc52c already exists. Removing it first...

#> Created conda environment 2f0409c2f60c564607d28c44c8edc52c

df <- tibble::tribble(

  ~x, ~y, ~z,

  "a", 2, 3.6,

  "b", 1, 8.5

)

python_model_predict(df)

#> [1] 0 1

check_pandas_version()

#> [1] "The installed Pandas version is 1.0.3"

```

## Details

  - The conda requirements are defined in `inst/conda-requirements.txt`

    and installed with the R Package.

    

        python=3.8.2=he5300dc_5_cpython

        pandas=1.0.3=py38hcb8c335_0

        numpy=1.18.1=py38h8854b6b_1

    

    Package versions are currently pinned. There’s an unpinned version

    for non-Linux systems.

  - Arbitrary Python code can be shipped with the package. Currently,

    there’s only one file `inst/model.py`:

    

    ``` python

    import pandas as pd

    

    def predict(df):

      """Trivial predict function that returns a sequence 0, 1, ..., n-1."""

      return df.reset_index(drop=True).index.astype(float)

    

    def check_pandas_version():

      return(f"The installed Pandas version is {pd.__version__}")

    ```

  - The reticulate calls are in `R/predict.R`.

  - We overcome the problem that you cannot use reticulate to interface

    with different Python executables within the same R session (see

    [this

    comment](https://github.com/rstudio/reticulate/issues/27#issuecomment-512256949))

    by running the reticulate call on a different worker (via the

    `parallel` package - both `PSOCK` and `FORK` work here). This comes

    with overhead, both for setting up the

    [cluster](https://developer.r-project.org/Blog/public/2020/03/17/socket-connections-update/index.html)

    and for serializing the data and communicating with the worker,

    which may or may not be tolerable depending on your use case.

## Performance

A benchmark is provided for a data set with 10 numerical columns, 10

string columns, and 10 date columns. `encapsulate` uses the little hack

that allows us to use reticulate with different Python executables in

the same R session. `do_not_encapsulate` goes straight from the user’s R

session to reticulate.

``` r

set_cluster_type("FORK")

results <- bench(n = 1e6)

#> Running with:

#>         n

#> 1       1

#> 2      10

#> 3     100

#> 4    1000

#> 5   10000

#> 6  100000

#> 7 1000000

knitr::kable(results[c("expression", "n", "median")])

```

| expression               |     n |   median |

| :----------------------- | ----: | -------: |

| encapsulate(df)          | 1e+00 | 542.29ms |

| do\_not\_encapsulate(df) | 1e+00 | 459.25ms |

| encapsulate(df)          | 1e+01 | 510.14ms |

| do\_not\_encapsulate(df) | 1e+01 | 465.09ms |

| encapsulate(df)          | 1e+02 | 500.33ms |

| do\_not\_encapsulate(df) | 1e+02 | 464.31ms |

| encapsulate(df)          | 1e+03 | 566.11ms |

| do\_not\_encapsulate(df) | 1e+03 | 470.98ms |

| encapsulate(df)          | 1e+04 | 614.53ms |

| do\_not\_encapsulate(df) | 1e+04 | 532.51ms |

| encapsulate(df)          | 1e+05 |    1.91s |

| do\_not\_encapsulate(df) | 1e+05 |    1.17s |

| encapsulate(df)          | 1e+06 |   12.01s |

| do\_not\_encapsulate(df) | 1e+06 |    7.52s |

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jtilly/r-conda-env

Awesome Lists containing this project

README