https://github.com/wainberg/ryp

R inside Python
https://github.com/wainberg/ryp
bioinformatics data-science python python-to-r r r-to-python rstats statistics
Last synced: about 1 month ago
JSON representation
R inside Python
Host: GitHub
URL: https://github.com/wainberg/ryp
Owner: Wainberg
License: mit
Created: 2024-09-29T01:43:04.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-04-11T01:46:11.000Z (about 1 month ago)
Last Synced: 2025-04-13T02:09:55.922Z (about 1 month ago)
Topics: bioinformatics, data-science, python, python-to-r, r, r-to-python, rstats, statistics
Language: Python
Homepage:
Size: 395 KB
Stars: 173
Watchers: 1
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

        # ryp: R inside Python

ryp is a minimalist, powerful Python library for:

- running R code inside Python

- quickly transferring huge datasets between Python (NumPy/pandas/polars) and R

  without writing to disk

- interactively working in both languages at the same time

ryp is an alternative to the widely used [rpy2](https://github.com/rpy2/rpy2) 

library. Compared to rpy2, ryp provides:

- increased stability

- a much simpler API, with less of a learning curve

- interactive printouts of R variables that match what you'd see in R

- a full-featured R terminal inside Python for interactive work

- inline plotting in Jupyter notebooks (requires the `svglite` R package)

- much faster data conversion with [Arrow](https://arrow.apache.org) (also

  provided by [rpy2-arrow](https://github.com/rpy2/rpy2-arrow))

- support for *every* NumPy, pandas and polars data type representable in base

  R, no matter how obscure

- support for sparse arrays/matrices

- recursive conversion of containers like R lists, Python tuples/lists/dicts, 

  and S3/S4/R6 objects

- full Windows support

ryp does the opposite of the 

[reticulate](https://rstudio.github.io/reticulate) R library, which runs Python

inside R.

## Table of Contents

- [Installation](#installation)

- [Functionality](#functionality)

  - [`r()`](#r)

  - [`to_r()`](#to_r)

    - [The `format` argument](#the-format-argument)

    - [The `rownames` and `colnames` arguments](#the-rownames-and-colnames-arguments)

  - [`to_py()`](#to_py)

    - [The `format` argument](#the-format-argument-1)

    - [The `index` argument](#the-index-argument)

    - [The `squeeze` argument](#the-squeeze-argument)

  - [`options()`](#options)

- [Conversion rules](#conversion-rules)

  - [Python to R (`to_r()`)](#python-to-r-to_r)

    - [NumPy data types](#numpy-data-types)

    - [pandas-specific data types](#pandas-specific-data-types)

    - [pandas Arrow data types (`pd.ArrowDtype`)](#pandas-arrow-data-types-pdarrowdtype)

    - [Polars data types](#polars-data-types)

    - [Notes](#notes)

  - [R to Python (`to_py()`)](#r-to-python-to_py)

    - [Data types](#data-types)

- [Examples](#examples)

## Installation

Install ryp via pip:

```bash

pip install ryp

```

conda:

```bash

conda install ryp

```

or mamba:

```bash

mamba install ryp

```

Or, install the development version via pip:

```bash

pip install git+https://github.com/Wainberg/ryp

```

ryp's only mandatory dependencies are:

- Python 3.7+

- R

- the [cffi](https://cffi.readthedocs.io/en/stable) Python package

- the [pyarrow](https://arrow.apache.org/docs/python) Python package, which 

  includes [NumPy](https://numpy.org) as a dependency

- the [arrow](https://arrow.apache.org/docs/r) R library

R and the arrow R library are automatically installed when installing ryp via 

conda or mamba, but not via pip. ryp uses the R installation pointed to by the

environment variable `R_HOME`, or if `R_HOME` is not defined or not a 

directory, by running `R RHOME` through `subprocess.run()`.

ryp also has several optional dependencies, which are not installed 

automatically with pip, conda or mamba. These are:

- [pandas](https://pandas.pydata.org), for `format='pandas'`

- [polars](https://pola.rs), for `format='polars'`

- [SciPy](https://scipy.org) and the

  [Matrix](https://cran.r-project.org/web/packages/Matrix) R library, for sparse

  matrices

- the [svglite](https://cran.r-project.org/web/packages/svglite) R library, for

  inline plotting in Jupyter notebooks

## Functionality

ryp consists of just four functions:

1. [`r(R_code)`](#r) runs a string of R code. [`r()`](#r) with no arguments 

   opens up an R terminal inside your Python terminal for interactive work.

2. [`to_r(python_object, R_variable_name)`](#to_r) converts a Python object 

   into an R object named `R_variable_name`. 

3. [`to_py(R_statement)`](#to_py) converts the R object produced by evaluating 

   `R_statement` to Python. `R_statement` may be a single variable name, or a 

   more complex code snippet that evaluates to the R object you'd like to 

   convert.

4. [`options()`](#options), for getting or setting ryp's configuration options.

### `r()`

```python

r(R_code: str = ...) -> None

```

`r(R_code)` runs a string of R code inside ryp's R interpreter, which is 

embedded inside Python. It can contain multiple statements separated by

semicolons or newlines (e.g. within a triple-quoted Python string). It returns

`None`; use `to_py()` instead if you would like to convert the result back to 

Python.

`r()` with no arguments opens up an R terminal inside your Python terminal 

for interactive debugging. Press `Ctrl + D` to exit back to the Python 

terminal. R variables defined from Python will be available in the R terminal,

and variables defined in the R terminal will be available from Python once you

exit:

```python

>>> from ryp import r

>>> r('a = 1')

>>> r()

> a

[1]

1

> b <- 2

>

>>> r('b')

[1]

2

```

Note that the default value for `R_code` is the special sentinel value `...` 

(`Ellipsis`) rather than `None`. This stops users from inadvertently opening 

the terminal when passing a variable that is supposed to be a string but is 

unexpectedly `None`.

### `to_r()`

```python

to_r(python_object: object, R_variable_name: str, *, 

     format: Literal['keep', 'matrix', 'data.frame'] | None = None,

     rownames: object = None, colnames: object = None) -> None

```

`to_r(python_object, R_variable_name)` converts `python_object` to R, adding it

to R's global namespace (`globalenv`) as a variable named `R_variable_name`. 

If `python_object` is a container (`list`, `tuple`, or `dict`), `to_r()`

recursively converts each element and returns an R named list (if

`python_object` is a `dict`) or unnamed list (if `python_object` is a `list` or

`tuple`).

#### The `format` argument

By default (`format='keep'`), ryp converts polars and pandas DataFrames (and 

pandas MultiIndexes) into R data frames, and 2D NumPy arrays into R matrices. 

Specify `format='matrix'` to convert everything (even DataFrames) to R matrices

(in which case all DataFrame columns must have the same type), and 

`format='data.frame'` to convert everything (even 2D NumPy arrays) to R 

data frames.

`format` must be `None` unless `python_object` is a DataFrame, MultiIndex or 2D

NumPy array – or unless `python_object` is a `list`, `tuple`, or `dict`, in 

which case the `format` will apply recursively to any DataFrames, MultiIndexes

or 2D NumPy arrays it contains.

#### The `rownames` and `colnames` arguments

Since NumPy arrays, polars DataFrames and Series, and scipy sparse arrays and 

matrices lack row and column names, you can specify these separately via the 

`rownames` and/or `colnames` arguments, and they will be added to the converted

R object. `rownames` and `colnames` can be lists, tuples, string Series, or 

categorical Series with string categories, and will be automatically converted

to R character vectors. 

`rownames` and `colnames` must match the length or `shape[1]`, respectively, of

the object being converted. The one exception is that rownames of any length

may be added to a 0 × 0 polars DataFrame, since polars does not have the 

concept of an `N` × 0 DataFrame for nonzero `N`. (Dropping all the 

columns of a polars DataFrame always results in a 0 × 0 DataFrame, even 

if the original DataFrame had more than 0 rows.)

Because Python `bool`, `int`, `float`, and `str` convert to length-1 R vectors

that support names, you can pass length-1 `rownames` when converting objects of

these types. You can also pass `rownames` and/or `colnames` when 

`python_object` is a `list`, `tuple`, or `dict`, in which case row and column 

names will only be added to elements that support them. All elements that 

support `rownames` must have the same length as the `rownames`, and similarly 

for the `colnames`. 

`rownames` cannot be specified if `python_object` is a pandas Series or 

DataFrame (since they already have rownames, i.e. an index), or 

`bytes`/`bytearray` (since these convert to `raw` vectors, which lack 

rownames). `colnames` cannot be specified unless `python_object` is a 

multidimensional NumPy array or scipy sparse array or matrix, or something that

might contain one (`list`, `tuple`, or `dict`).

### `to_py()`

```python

to_py(R_statement: str, *,

      format: Literal['polars', 'pandas', 'pandas-pyarrow', 'numpy'] |

              dict[Literal['vector', 'matrix', 'data.frame'],

                   Literal['polars', 'pandas', 'pandas-pyarrow',

                           'numpy']] | None = None,

      index: str | Literal[False] | None = None,

      squeeze: bool | None = None) -> Any

```

`to_py(R_statement)` runs a single statement of R code (which can be as simple 

as a single variable name) and converts the resulting R object to Python. 

If the object is a list/S3 object, S4 object, or environment/R6 object, it

recursively converts each attribute/slot/field and returns a Python `dict` (or 

`list`, if the object is an unnamed list). For R6 objects, only public fields

will be converted.

#### The `format` argument

By default, or when `format='polars'`, R vectors will be converted to polars 

Series, and R data frames and matrices will be converted to polars DataFrames. 

You can change this by setting the `format` argument to `'pandas'`, 

`'pandas-pyarrow'` (like `'pandas'`, but converting to pyarrow dtypes wherever 

possible) or `'numpy'`. (You can also change the default format, e.g. with 

`options(to_py_format='pandas')`.)

For finer-grained control, you can set `format` for only certain R variable 

types by specifying a dictionary with `'vector'`, `'matrix'`, and/or

`'data.frame'` as keys and `'polars'`, `'pandas'`, `'pandas-pyarrow'` and/or 

`'numpy'` as values. 

`format` must be `None` when `R_statement` evaluates to `NULL`, when it 

evaluates to an array of 3 or more dimensions (these are always converted to 

NumPy arrays), or when the final result would be a Python scalar (see `squeeze`

below).

#### The `index` argument

By default, the R object's `names` or `rownames` will become the index (for 

pandas) or the first column (for polars) of the output Python object, named 

`'index'`. Set the `index` argument to a different string to change this name, 

or set `index=False` to not convert the `names`/`rownames`. 

Note that for polars, the output will be a two-column DataFrame (not a Series!)

when the input is an R vector, unless `index=False`. 

When the output is a NumPy array, `names` and `rownames` will always be 

discarded, since numeric NumPy arrays cannot store string indexes except with 

the inefficient `dtype=object`. 

`index` must be `None` when `format='numpy'`, or when the final result would be

a Python scalar (see `squeeze` below).

#### The `squeeze` argument

By default, length-1 R vectors, matrices and arrays will be converted to Python

scalars instead of Python arrays, Series or DataFrames. Set `squeeze=False` to

disable this special case. (R data frames are never converted to Python scalars

even if `squeeze=True`.) 

`squeeze` must be `None` unless the R object is a vector, matrix or array

(`raw` vectors don't count, because they always convert to Python scalars).

### `options()`

```python

options(*, to_r_format=None, to_py_format=None, index=None, squeeze=None, 

        plot_width: int | float | None = None, 

        plot_height: int | float | None = None) -> None

```

`options` gets or sets ryp's configuration settings:

- `to_r_format`: the default value for the `format` parameter in `to_r()`; 

  must be `'keep'` (the default), `'matrix'`, or `'data.frame'`.

- `to_py_format`: the default value for the `format` parameter in `to_py()`; 

  must be `'polars'` (the default), `'pandas'`, `'pandas-pyarrow'`, `'numpy'`,

  or a dictionary with one of those four Python formats and/or `None` as values

  and `'vector'`, `'matrix'` and/or `'data.frame'` as keys. If certain keys are 

  missing or have `None` as the format, leave their format unchanged.

- `index`: the default value for the `index` parameter in to_py(); must be a 

  string (default: `'index'`) or `False`. 

- `squeeze`: the default value for the `squeeze` parameter in `to_py()`; must  

  be `True` (the default) or `False`.

- `plot_width`: the width, in inches, of inline plots in Jupyter notebooks;

  must be a positive number. Defaults to 6.4 inches, to match Matplotlib's 

  default.

- `plot_height`: the height, in inches, of inline plots in Jupyter notebooks;

  must be a positive number. Defaults to 4.8 inches, to match Matplotlib's 

  default.

For instance, to set pandas as the default format in `to_py()`, run 

`options(to_py_format='pandas')`. This leaves the other options unchanged.

`options()` with no arguments returns the current configuration options as a 

dictionary, with keys `to_r_format`, `to_py_format`, `index`, `squeeze`, 

`plot_width`, and `plot_height`.

For additional customization, users can specify ryp-specific settings in their

`.Rprofile`:

```R

if ("ryp" %in% commandArgs()) {

    # Custom settings for running R within ryp

} else {

    # Custom settings for native R

}

```

## Conversion rules

### Python to R (`to_r()`)

Arrays and Series with `float64`, `int32`, `int64`, `uint32`, and `uint64` data 

types will be converted without copying the underlying data ("zero-copy"). This 

is extremely fast but comes with two important caveats:

1. Modifying the data in R will also modify it in Python, and vice versa.

2. Because R lacks support for unsigned integers, `uint32` values will be 

   reinterpreted as `int32` and `uint64` values will be reinterpreted as 

   `int64`. This means that for `uint32`, `2_147_483_648` (`INT32_MAX + 1`) 

   will become `NA` and larger values will become negative numbers. For 

   `uint64`, `9_223_372_036_854_775_808` (`INT64_MAX + 1`) will become `NA` and 

   larger values will become negative numbers.

Polars Series with `null` values or multiple chunks, and pandas Series with

non-NumPy data types, will *not* be converted zero-copy, but are still subject 

to caveat #2.

| Python                                                                  | R                                                                                                         |

|-------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|

| `None`                                                                  | `NULL` (if scalar) or `NA` (if inside NumPy, pandas or polars)                                            |

| `nan`                                                                   | `NaN` (if scalar or inside NumPy or polars) or `NA` (if inside pandas)                                    |

| `pd.NA`                                                                 | `NA`                                                                                                      |

| `pd.NaT`, `np.datetime64('NaT')`, `np.timedelta64('NaT')`               | `NA`                                                                                                      |   

| `bool`                                                                  | length-1 `logical` vector                                                                                 |

| `int`                                                                   | length-1 `integer` (if `abs(x) <= 2_147_483_647`) or `bit64::integer64` vector                            |

| `float`                                                                 | length-1 `numeric` vector                                                                                 |

| `str`                                                                   | length-1 `character` vector                                                                               |

| `complex`                                                               | length-1 `complex` vector                                                                                 |

| `datetime.date`                                                         | length-1 `Date` vector                                                                                    |

| `datetime.datetime`                                                     | length-1 `POSIXct` vector                                                                                 |

| `datetime.timedelta`                                                    | length-1 `difftime(units='secs')` vector                                                                  |

| `datetime.time` (`tzinfo` must be `None`)                               | length-1 `hms::hms` vector                                                                                |

| `bytes`, `bytearray`                                                    | `raw` vector                                                                                              |

| `list`, `tuple`                                                         | unnamed list                                                                                              |

| `dict` (all keys must be strings)                                       | named list                                                                                                |

| polars Series, pandas Series^&ast;, pandas `Index`            | vector                                                                                                    |

| polars DataFrame, pandas DataFrame^&ast;, pandas `MultiIndex` | matrix^† (if `format == 'matrix'`; all columns must have same data type) or `data.frame` |

| 1D NumPy array                                                          | vector                                                                                                    |

| 2D NumPy array                                                          | `data.frame` (if `format == 'data.frame'`) or matrix^†                                   |

| ≥ 3D NumPy array                                                     | array^†                                                                                  |

| 0D NumPy array (e.g. `np.array(1)`), NumPy generic (e.g. `np.int32(1)`) | length-1 vector                                                                                           |

| `csr_array`, `csr_matrix`                                               | `dgRMatrix` (if `float64`), `lgRMatrix` (if `bool`), -- (otherwise)                                       | 

| `csc_array`, `csc_matrix`                                               | `dgCMatrix` (if `float64`), `lgCMatrix` (if `bool`), -- (otherwise)                                       |

| `coo_array`, `coo_matrix`                                               | `dgTMatrix` (if `float64`), `lgTMatrix` (if `bool`), -- (otherwise)                                       |

#### NumPy data types

| Python                                                | R                                          |

|-------------------------------------------------------|--------------------------------------------|

| `bool`                                                | `logical`                                  |

| `int8`, `uint8`, `int16`, `uint16`, `int32`, `uint32` | `integer`                                  |

| `int64`, `uint64`                                     | `bit64::integer64`                         |

| `float16`, `float32`, `float64`, `float128`           | `numeric`                                  |

| `complex64`, `complex128`                             | `complex`                                  |

| `bytes` (e.g. `'S1'`)                                 | --                                         |

| `str`/`unicode` (e.g. `'U1'`)                         | `character`                                |

| `datetime64`                                          | `POSIXct`                                  | 

| `timedelta64`                                         | `difftime(units='secs')`                   |

| `void` (unstructured)                                 | `raw`                                      |

| `void` (structured)                                   | --                                         |

| `object`                                              | depends on the contents^‡ |

#### pandas-specific data types

| Python                                                                              | R                  |

|-------------------------------------------------------------------------------------|--------------------|

| `BooleanDtype`                                                                      | `logical`          |

| `Int8Dtype`, `UInt8Dtype`, `Int16Dtype`, `UInt16Dtype`, `Int32Dtype`, `UInt32Dtype` | `integer`          |

| `Int64Dtype`, `UInt64Dtype`                                                         | `bit64::integer64` |  

| `Float32Dtype`, `Float64Dtype`                                                      | `numeric`          |

| `StringDtype`                                                                       | `character`        |

| `CategoricalDtype(ordered=False)`                                                   | unordered `factor` |

| `CategoricalDtype(ordered=True)`                                                    | ordered `factor`   |

| `DatetimeTZDtype`, `PeriodDtype`                                                    | `POSIXct`          |

| `IntervalDtype`, `SparseDtype`                                                      | --                 |

#### pandas Arrow data types (`pd.ArrowDtype`)

| Python                                                                  | R                        |

|-------------------------------------------------------------------------|--------------------------|

| `pa.bool_`                                                              | `logical`                |

| `pa.int8`, `pa.uint8`, `pa.int16`, `pa.uint16`, `pa.int32`, `pa.uint32` | `integer`                |

| `pa.int64`, `pa.uint64`                                                 | `bit64::integer64`       |

| `pa.float32`, `pa.float64`                                              | `numeric`                |

| `pa.string`, `pa.large_string`                                          | `character`              |

| `pa.date32`                                                             | `Date`                   |

| `pa.date64`, `pa.timestamp`                                             | `POSIXct`                |

| `pa.duration`                                                           | `difftime(units='secs')` |

| `pa.time32`, `pa.time64`                                                | `hms::hms`               |

| `pa.dictionary(any integer type, pa.string(), ordered=0)`               | unordered `factor`       |

| `pa.dictionary(any integer type, pa.string(), ordered=1)`               | ordered `factor`         |

| `pa.null()`                                                             | `vctrs::unspecified`     |

#### Polars data types

| Python                                                | R                                          |

|-------------------------------------------------------|--------------------------------------------|

| `Boolean`                                             | `logical`                                  |

| `Int8`, `UInt8`, `Int16`, `UInt16`, `Int32`, `UInt32` | `integer`                                  |

| `Int64`, `UInt64`                                     | `bit64::integer64`                         |

| `Float32`, `Float64`                                  | `numeric`                                  |

| `Date`                                                | `Date`                                     |

| `Datetime`                                            | `POSIXct`                                  |

| `Duration`                                            | `difftime(units='secs')`                   |

| `Time`                                                | `hms::hms`                                 |

| `String`                                              | `character`                                |

| `Categorical`                                         | unordered `factor`                         |

| `Enum`                                                | ordered `factor`                           |

| `Object`                                              | depends on the contents^‡ |

| `Null`                                                | `vctrs::unspecified`                       | 

| `Binary`, `Decimal`, `List`, `Array`                  | --                                         |

#### Notes

^&ast; For pandas Series and DataFrames, string indexes (and 

categorical indexes where the categories are strings) will be automatically

converted to `names`/`rownames`. The default index

(`pd.RangeIndex(len(python_object))`) will be ignored. All other indexes are

disallowed. 

^† Because R does not support `POSIXct` and `Date` matrices or

arrays, dates and datetimes cannot be converted to R matrices or arrays.

^‡ For `dtype=object` and `dtype=pl.Object`, the output R type

depends on the contents, e.g. `'character'` if all elements are strings. Some

additional notes on ryp's handling of object data types:

- `None`, `np.nan`, `pd.NA`, `pd.NaT`, `np.datetime64('NaT')`, and 

  `np.timedelta64('NaT')` are all treated as missing values – even for 

  polars, where `np.nan` is ordinarily treated as a floating-point number 

  rather than a missing value. 

- Length-0 and all-missing data will be converted to the `vctrs::unspecified` R

  type (`vctrs` is part of the tidyverse). 

- If the elements are objects with a mix of types (or datetimes with a mix of

  time zones), Arrow will generally cause the conversion to fail, though mixes

  of related types (e.g. int and float) will be automatically cast to the

  common supertype and succeed. 

- Conversion will also fail if the contents are objects that are not 

  representable as R vector elements. This includes `bytes`/`bytearray` (which

  are only representable in R when scalar, as a `raw` vector) and Python

  containers (`list`, `tuple`, and  `dict`). 

- pandas `Timedelta` objects will be rounded down to the nearest microsecond,

  following the behavior of Arrow.

### R to Python (`to_py()`)

Unlike `to_r()`, zero-copy conversion is only guaranteed when:

1. The data being converted is backed by an Arrow array. This is the case for 

   data that was previously converted from Python with `to_py()`, or that the 

   user created manually from an Arrow array.

2. The data is integer (i.e. `int32`) or numeric (i.e. `float64`).

3. The data is being converted to a NumPy array or polars Series.

| R                                                                            | Python                                                                                                                                                                                        |

|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

| `NULL`                                                                       | `None`                                                                                                                                                                                        |

| `NA`                                                                         | `None` (if scalar or `format='polars'`), `None`/`nan`/`pd.NA`/`pd.NaT`/`np.datetime64('NaT', 'us')`/`np.timedelta64('NaT', 'ns')`/etc. (if `format='numpy'` `'pandas'` or `'pandas-pyarrow'`) |

| `NaN`                                                                        | `nan`                                                                                                                                                                                         |

| length-1 vector, matrix or array, `squeeze == False`                         | scalar                                                                                                                                                                                        | 

| vector or 1D array, `format == 'numpy'`                                      | 1D NumPy array                                                                                                                                                                                |

| vector or 1D array, `format == 'pandas'` or `format == 'pandas-pyarrow'`     | pandas Series                                                                                                                                                                                 |

| vector or 1D array, `format == 'polars'`                                     | polars Series (if `index=False`) or two-column DataFrame                                                                                                                                      |

| matrix or `data.frame`, `format == 'numpy'`                                  | 2D NumPy array                                                                                                                                                                                |

| matrix or `data.frame`, `format == 'pandas'` or `format == 'pandas-pyarrow'` | pandas DataFrame                                                                                                                                                                              |

| matrix or `data.frame`, `format == 'polars'`                                 | polars DataFrame                                                                                                                                                                              |

| ≥ 3D array                                                                | NumPy array                                                                                                                                                                                   |  

| unnamed list                                                                 | `list`                                                                                                                                                                                        |

| named list, S3 object, S4 object, environment, S6 object                     | `dict`                                                                                                                                                                                        |

| `dgRMatrix`                                                                  | `csr_array(dtype='float64')`                                                                                                                                                                  |

| `dgCMatrix`                                                                  | `csc_array(dtype='float64')`                                                                                                                                                                  |

| `dgTMatrix`                                                                  | `coo_array(dtype='float64')`                                                                                                                                                                  |

| `lgRMatrix`, `ngRMatrix`                                                     | `csr_array(dtype=bool)`                                                                                                                                                                       |

| `lgCMatrix`, `ngCMatrix`                                                     | `csc_array(dtype=bool)`                                                                                                                                                                       |

| `lgTMatrix`, `ngTMatrix`                                                     | `coo_array(dtype=bool)`                                                                                                                                                                       |

| formula (`~`)                                                                | --                                                                                                                                                                                            |

#### Data types

| R                           | Python scalar                        | NumPy                                                    | pandas                                                   | pandas-pyarrow                                                 | polars                                     |

|-----------------------------|--------------------------------------|----------------------------------------------------------|----------------------------------------------------------|----------------------------------------------------------------|--------------------------------------------|

| `logical`                   | `bool`                               | `bool`                                                   | `bool`                                                   | `ArrowDtype(pa.bool_())`                                       | `Boolean`                                  |

| `integer`                   | `int`                                | `int32`                                                  | `int32`                                                  | `ArrowDtype(pa.int32())`                                       | `Int32`                                    |

| `bit64::integer64`          | `int`                                | `int64`                                                  | `int64`                                                  | `ArrowDtype(pa.int64())`                                       | `Int64`                                    |

| `numeric`                   | `float`                              | `float`                                                  | `float`                                                  | `ArrowDtype(pa.float64())`                                     | `Float64`                                  |

| `character`                 | `str`                                | `object` (with `str` elements)                           | `object` (with `str` elements)                           | `ArrowDtype(pa.string())`                                      | `String`                                   |

| `complex`                   | `complex`                            | `complex128`                                             | `complex128`                                             | `complex128`                                                   | --                                         |

| `raw`                       | `bytearray`                          | --                                                       | --                                                       | --                                                             | --                                         |

| unordered `factor`          | `str`                                | `object` (with `str` elements)                           | `CategoricalDtype(ordered=False)`                        | `ArrowDtype(pa.dictionary(pa.int8(), pa.string(), ordered=0))` | `Categorical`                              |

| ordered `factor`            | `str`                                | `object` (with `str` elements)                           | `CategoricalDtype(ordered=True)`                         | `ArrowDtype(pa.dictionary(pa.int8(), pa.string(), ordered=1))` | `Enum`                                     |

| `POSIXct` without time zone | `datetime.datetime`^&ast;  | `datetime64[us]`^&ast;                         | `datetime64[us]`^&ast;                         | `ArrowDtype(pa.timestamp('us'))`^&ast;               | `Datetime('us')`^&ast;           |

| `POSIXct` with time zone    | `datetime.datetime`^&ast;  | `datetime64[us]`^&ast; (time zone discarded)   | `DatetimeTZDtype('us', time_zone)`^&ast;       | `ArrowDtype(pa.timestamp('us', time_zone))`^&ast;    | `Datetime('us, time_zone)`^&ast; | 

| `POSIXlt`                   | `dict` of scalars                    | `dict` of NumPy arrays                                   | `dict` of pandas Series                                  | `dict` of pandas Series                                        | `dict` of polars Series                    |

| `Date`                      | `datetime.date`                      | `datetime64[D]`                                          | `datetime64[ms]`                                         | `ArrowDtype(pa.date32('day'))`                                 | `Date`                                     |

| `difftime`                  | `datetime.timedelta`^&ast; | `timedelta64[ns]`                                        | `timedelta64[ns]`                                        | `ArrowDtype(pa.duration('ns'))`                                | `Duration(time_unit='ns')`                 |

| `hms::hms`                  | `datetime.time`^&ast;      | `object` (with `datetime.time` elements)^&ast; | `object` (with `datetime.time` elements)^&ast; | `ArrowDtype(pa.time64('ns'))`^&ast;                  | `Time`                                     |

| `vctrs::unspecified`        | `None`                               | `object` (with `None` elements)                          | `object` (with `None` elements)                          | `ArrowDtype(pa.null())`                                        | `Null`                                     |

^&ast; Due to the limitations of conversion with Arrow, `POSIXct` and

`hms::hms` values are rounded down to the nearest microsecond when converting

to Python, except for `hms::hms` when converting to polars. `difftime` values

are also rounded down to the nearest microsecond, but only when converting to

scalar `datetime.timedelta` values (which cannot represent nanoseconds).

## Examples

1. Apply R's `scale()` function to a pandas DataFrame:

```python

import pandas as pd

from ryp import r, to_py, to_r

data = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 3, 4]})

to_r(data, 'data')

r('data')

#   a b

# 1 1 1

# 2 2 3

# 3 3 4

r('data <- scale(data)')  # scale the R data.frame

scaled_data = to_py('data', format='pandas')  # convert the R data.frame to Python

scaled_data

#      a         b

# 0 -1.0 -1.091089

# 1  0.0  0.218218

# 2  1.0  0.872872

```

Note: we could have just written `to_py('scale(data)')` instead of

`r('data <- scale(data)')` followed by `to_py('data')`. We could also have 

run `options(to_py_format='pandas')` at the top, to avoid having to specify

`format='pandas'` in each `to_py()` call.

2. Run a linear model on a polars DataFrame:

```python

import polars as pl

from ryp import r, to_py, to_r

data = pl.DataFrame({'y': [7, 1, 2, 3, 6], 'x': [5, 2, 3, 2, 5]})

to_r(data, 'data')

r('model <- lm(y ~ x, data=data)')

coef = to_py('summary(model)$coefficients', index='variable')

p_value = coef.filter(variable='x').select('Pr(>|t|)')[0, 0]

p_value

# 0.02831035772841884

```

3. Recursive conversion, showcasing all the keyword arguments of `to_r()` and

   `to_py()`:

```python

import numpy as np

from ryp import r, to_py, to_r

arrays = {'ints': np.array([[1, 2], [3, 4]]),

          'floats': np.array([[0.5, 1.5], [2.5, 3.5]])}

to_r(arrays, 'arrays', format='data.frame',

     rownames = ['row1', 'row2'], colnames = ['col1', 'col2'])

r('arrays')

# $ints

#      col1 col2

# row1    1    2

# row2    3    4

# 

# $floats

#      col1 col2

# row1  0.5  1.5

# row2  2.5  3.5

arrays = to_py('arrays', format='pandas', index='foo')

arrays['ints']

#       col1  col2

# foo

# row1     1     2

# row2     3     4

arrays['floats']

#       col1  col2

# foo

# row1   0.5   1.5

# row2   2.5   3.5

```
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/wainberg/ryp

Awesome Lists containing this project

README