{"id":22851984,"url":"https://github.com/wainberg/ryp","last_synced_at":"2025-04-13T02:10:09.593Z","repository":{"id":257804224,"uuid":"864722356","full_name":"Wainberg/ryp","owner":"Wainberg","description":"R inside Python","archived":false,"fork":false,"pushed_at":"2025-04-11T01:46:11.000Z","size":404,"stargazers_count":173,"open_issues_count":1,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T02:09:55.922Z","etag":null,"topics":["bioinformatics","data-science","python","python-to-r","r","r-to-python","rstats","statistics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Wainberg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-29T01:43:04.000Z","updated_at":"2025-04-11T01:46:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"e543277d-0539-493a-a61f-89423a58b26f","html_url":"https://github.com/Wainberg/ryp","commit_stats":null,"previous_names":["wainberg/ryp"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wainberg%2Fryp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wainberg%2Fryp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wainberg%2Fryp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Wainberg%2Fryp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Wainberg","download_url":"https://codeload.github.com/
Wainberg/ryp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248654090,"owners_count":21140236,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","data-science","python","python-to-r","r","r-to-python","rstats","statistics"],"created_at":"2024-12-13T06:06:40.031Z","updated_at":"2025-04-13T02:10:09.520Z","avatar_url":"https://github.com/Wainberg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ryp: R inside Python\r\n\r\nryp is a minimalist, powerful Python library for:\r\n- running R code inside Python\r\n- quickly transferring huge datasets between Python (NumPy/pandas/polars) and R\r\n  without writing to disk\r\n- interactively working in both languages at the same time\r\n\r\nryp is an alternative to the widely used [rpy2](https://github.com/rpy2/rpy2) \r\nlibrary. 
Compared to rpy2, ryp provides:\r\n- increased stability\r\n- a much simpler API, with less of a learning curve\r\n- interactive printouts of R variables that match what you'd see in R\r\n- a full-featured R terminal inside Python for interactive work\r\n- inline plotting in Jupyter notebooks (requires the `svglite` R package)\r\n- much faster data conversion with [Arrow](https://arrow.apache.org) (also\r\n  provided by [rpy2-arrow](https://github.com/rpy2/rpy2-arrow))\r\n- support for *every* NumPy, pandas and polars data type representable in base\r\n  R, no matter how obscure\r\n- support for sparse arrays/matrices\r\n- recursive conversion of containers like R lists, Python tuples/lists/dicts, \r\n  and S3/S4/R6 objects\r\n- full Windows support\r\n\r\nryp does the opposite of the \r\n[reticulate](https://rstudio.github.io/reticulate) R library, which runs Python\r\ninside R.\r\n\r\n## Table of Contents\r\n\r\n- [Installation](#installation)\r\n- [Functionality](#functionality)\r\n  - [`r()`](#r)\r\n  - [`to_r()`](#to_r)\r\n    - [The `format` argument](#the-format-argument)\r\n    - [The `rownames` and `colnames` arguments](#the-rownames-and-colnames-arguments)\r\n  - [`to_py()`](#to_py)\r\n    - [The `format` argument](#the-format-argument-1)\r\n    - [The `index` argument](#the-index-argument)\r\n    - [The `squeeze` argument](#the-squeeze-argument)\r\n  - [`options()`](#options)\r\n- [Conversion rules](#conversion-rules)\r\n  - [Python to R (`to_r()`)](#python-to-r-to_r)\r\n    - [NumPy data types](#numpy-data-types)\r\n    - [pandas-specific data types](#pandas-specific-data-types)\r\n    - [pandas Arrow data types (`pd.ArrowDtype`)](#pandas-arrow-data-types-pdarrowdtype)\r\n    - [Polars data types](#polars-data-types)\r\n    - [Notes](#notes)\r\n  - [R to Python (`to_py()`)](#r-to-python-to_py)\r\n    - [Data types](#data-types)\r\n- [Examples](#examples)\r\n\r\n## Installation\r\n\r\nInstall ryp via pip:\r\n\r\n```bash\r\npip install 
ryp\r\n```\r\n\r\nconda:\r\n\r\n```bash\r\nconda install ryp\r\n```\r\n\r\nor mamba:\r\n\r\n```bash\r\nmamba install ryp\r\n```\r\n\r\nOr, install the development version via pip:\r\n\r\n```bash\r\npip install git+https://github.com/Wainberg/ryp\r\n```\r\n\r\nryp's only mandatory dependencies are:\r\n- Python 3.7+\r\n- R\r\n- the [cffi](https://cffi.readthedocs.io/en/stable) Python package\r\n- the [pyarrow](https://arrow.apache.org/docs/python) Python package, which \r\n  includes [NumPy](https://numpy.org) as a dependency\r\n- the [arrow](https://arrow.apache.org/docs/r) R library\r\n\r\nR and the arrow R library are automatically installed when installing ryp via \r\nconda or mamba, but not via pip. ryp uses the R installation pointed to by the\r\nenvironment variable `R_HOME`, or if `R_HOME` is not defined or not a \r\ndirectory, by running `R RHOME` through `subprocess.run()`.\r\n\r\nryp also has several optional dependencies, which are not installed \r\nautomatically with pip, conda or mamba. These are:\r\n- [pandas](https://pandas.pydata.org), for `format='pandas'`\r\n- [polars](https://pola.rs), for `format='polars'`\r\n- [SciPy](https://scipy.org) and the\r\n  [Matrix](https://cran.r-project.org/web/packages/Matrix) R library, for sparse\r\n  matrices\r\n- the [svglite](https://cran.r-project.org/web/packages/svglite) R library, for\r\n  inline plotting in Jupyter notebooks\r\n\r\n## Functionality\r\n\r\nryp consists of just four functions:\r\n\r\n1. [`r(R_code)`](#r) runs a string of R code. [`r()`](#r) with no arguments \r\n   opens up an R terminal inside your Python terminal for interactive work.\r\n2. [`to_r(python_object, R_variable_name)`](#to_r) converts a Python object \r\n   into an R object named `R_variable_name`. \r\n3. [`to_py(R_statement)`](#to_py) converts the R object produced by evaluating \r\n   `R_statement` to Python. 
`R_statement` may be a single variable name, or a \r\n   more complex code snippet that evaluates to the R object you'd like to \r\n   convert.\r\n4. [`options()`](#options), for getting or setting ryp's configuration options.\r\n\r\n### `r()`\r\n\r\n```python\r\nr(R_code: str = ...) -\u003e None\r\n```\r\n\r\n`r(R_code)` runs a string of R code inside ryp's R interpreter, which is \r\nembedded inside Python. It can contain multiple statements separated by\r\nsemicolons or newlines (e.g. within a triple-quoted Python string). It returns\r\n`None`; use `to_py()` instead if you would like to convert the result back to \r\nPython.\r\n\r\n`r()` with no arguments opens up an R terminal inside your Python terminal \r\nfor interactive debugging. Press `Ctrl + D` to exit back to the Python \r\nterminal. R variables defined from Python will be available in the R terminal,\r\nand variables defined in the R terminal will be available from Python once you\r\nexit:\r\n\r\n```python\r\n\u003e\u003e\u003e from ryp import r\r\n\u003e\u003e\u003e r('a = 1')\r\n\u003e\u003e\u003e r()\r\n\u003e a\r\n[1] 1\r\n\u003e b \u003c- 2\r\n\u003e\r\n\u003e\u003e\u003e r('b')\r\n[1] 2\r\n```\r\n\r\nNote that the default value for `R_code` is the special sentinel value `...` \r\n(`Ellipsis`) rather than `None`. This stops users from inadvertently opening \r\nthe terminal when passing a variable that is supposed to be a string but is \r\nunexpectedly `None`.\r\n\r\n### `to_r()`\r\n\r\n```python\r\nto_r(python_object: object, R_variable_name: str, *, \r\n     format: Literal['keep', 'matrix', 'data.frame'] | None = None,\r\n     rownames: object = None, colnames: object = None) -\u003e None\r\n```\r\n\r\n`to_r(python_object, R_variable_name)` converts `python_object` to R, adding it\r\nto R's global namespace (`globalenv`) as a variable named `R_variable_name`. 
\r\n\r\nIf `python_object` is a container (`list`, `tuple`, or `dict`), `to_r()`\r\nrecursively converts each element and creates an R named list (if\r\n`python_object` is a `dict`) or unnamed list (if `python_object` is a `list` or\r\n`tuple`).\r\n\r\n#### The `format` argument\r\n\r\nBy default (`format='keep'`), ryp converts polars and pandas DataFrames (and \r\npandas MultiIndexes) into R data frames, and 2D NumPy arrays into R matrices. \r\nSpecify `format='matrix'` to convert everything (even DataFrames) to R matrices\r\n(in which case all DataFrame columns must have the same type), and \r\n`format='data.frame'` to convert everything (even 2D NumPy arrays) to R \r\ndata frames.\r\n\r\n`format` must be `None` unless `python_object` is a DataFrame, MultiIndex or 2D\r\nNumPy array – or unless `python_object` is a `list`, `tuple`, or `dict`, in \r\nwhich case the `format` will apply recursively to any DataFrames, MultiIndexes\r\nor 2D NumPy arrays it contains.\r\n\r\n#### The `rownames` and `colnames` arguments\r\n\r\nSince NumPy arrays, polars DataFrames and Series, and scipy sparse arrays and \r\nmatrices lack row and column names, you can specify these separately via the \r\n`rownames` and/or `colnames` arguments, and they will be added to the converted\r\nR object. `rownames` and `colnames` can be lists, tuples, string Series, or \r\ncategorical Series with string categories, and will be automatically converted\r\nto R character vectors. \r\n\r\n`rownames` and `colnames` must match the length or `shape[1]`, respectively, of\r\nthe object being converted. The one exception is that rownames of any length\r\nmay be added to a 0 \u0026times; 0 polars DataFrame, since polars does not have the \r\nconcept of an `N` \u0026times; 0 DataFrame for nonzero `N`. 
(Dropping all the \r\ncolumns of a polars DataFrame always results in a 0 \u0026times; 0 DataFrame, even \r\nif the original DataFrame had more than 0 rows.)\r\n\r\nBecause Python `bool`, `int`, `float`, and `str` convert to length-1 R vectors\r\nthat support names, you can pass length-1 `rownames` when converting objects of\r\nthese types. You can also pass `rownames` and/or `colnames` when \r\n`python_object` is a `list`, `tuple`, or `dict`, in which case row and column \r\nnames will only be added to elements that support them. All elements that \r\nsupport `rownames` must have the same length as the `rownames`, and similarly \r\nfor the `colnames`. \r\n\r\n`rownames` cannot be specified if `python_object` is a pandas Series or \r\nDataFrame (since they already have rownames, i.e. an index), or \r\n`bytes`/`bytearray` (since these convert to `raw` vectors, which lack \r\nrownames). `colnames` cannot be specified unless `python_object` is a \r\nmultidimensional NumPy array or scipy sparse array or matrix, or something that\r\nmight contain one (`list`, `tuple`, or `dict`).\r\n\r\n### `to_py()`\r\n\r\n```python\r\nto_py(R_statement: str, *,\r\n      format: Literal['polars', 'pandas', 'pandas-pyarrow', 'numpy'] |\r\n              dict[Literal['vector', 'matrix', 'data.frame'],\r\n                   Literal['polars', 'pandas', 'pandas-pyarrow',\r\n                           'numpy']] | None = None,\r\n      index: str | Literal[False] | None = None,\r\n      squeeze: bool | None = None) -\u003e Any\r\n```\r\n\r\n`to_py(R_statement)` runs a single statement of R code (which can be as simple \r\nas a single variable name) and converts the resulting R object to Python. \r\n\r\nIf the object is a list/S3 object, S4 object, or environment/R6 object, it\r\nrecursively converts each attribute/slot/field and returns a Python `dict` (or \r\n`list`, if the object is an unnamed list). 
For R6 objects, only public fields\r\nwill be converted.\r\n\r\n#### The `format` argument\r\n\r\nBy default, or when `format='polars'`, R vectors will be converted to polars \r\nSeries, and R data frames and matrices will be converted to polars DataFrames. \r\nYou can change this by setting the `format` argument to `'pandas'`, \r\n`'pandas-pyarrow'` (like `'pandas'`, but converting to pyarrow dtypes wherever \r\npossible) or `'numpy'`. (You can also change the default format, e.g. with \r\n`options(to_py_format='pandas')`.)\r\n\r\nFor finer-grained control, you can set `format` for only certain R variable \r\ntypes by specifying a dictionary with `'vector'`, `'matrix'`, and/or\r\n`'data.frame'` as keys and `'polars'`, `'pandas'`, `'pandas-pyarrow'` and/or \r\n`'numpy'` as values. \r\n\r\n`format` must be `None` when `R_statement` evaluates to `NULL`, when it \r\nevaluates to an array of 3 or more dimensions (these are always converted to \r\nNumPy arrays), or when the final result would be a Python scalar (see `squeeze`\r\nbelow).\r\n\r\n#### The `index` argument\r\n\r\nBy default, the R object's `names` or `rownames` will become the index (for \r\npandas) or the first column (for polars) of the output Python object, named \r\n`'index'`. Set the `index` argument to a different string to change this name, \r\nor set `index=False` to not convert the `names`/`rownames`. \r\n\r\nNote that for polars, the output will be a two-column DataFrame (not a Series!)\r\nwhen the input is an R vector, unless `index=False`. \r\n\r\nWhen the output is a NumPy array, `names` and `rownames` will always be \r\ndiscarded, since numeric NumPy arrays cannot store string indexes except with \r\nthe inefficient `dtype=object`. 
\r\n\r\n`index` must be `None` when `format='numpy'`, or when the final result would be\r\na Python scalar (see `squeeze` below).\r\n\r\n#### The `squeeze` argument\r\n\r\nBy default, length-1 R vectors, matrices and arrays will be converted to Python\r\nscalars instead of Python arrays, Series or DataFrames. Set `squeeze=False` to\r\ndisable this special case. (R data frames are never converted to Python scalars\r\neven if `squeeze=True`.) \r\n\r\n`squeeze` must be `None` unless the R object is a vector, matrix or array\r\n(`raw` vectors don't count, because they always convert to Python scalars).\r\n\r\n### `options()`\r\n\r\n```python\r\noptions(*, to_r_format=None, to_py_format=None, index=None, squeeze=None, \r\n        plot_width: int | float | None = None, \r\n        plot_height: int | float | None = None) -\u003e None\r\n```\r\n\r\n`options()` gets or sets ryp's configuration settings:\r\n\r\n- `to_r_format`: the default value for the `format` parameter in `to_r()`; \r\n  must be `'keep'` (the default), `'matrix'`, or `'data.frame'`.\r\n- `to_py_format`: the default value for the `format` parameter in `to_py()`; \r\n  must be `'polars'` (the default), `'pandas'`, `'pandas-pyarrow'`, `'numpy'`,\r\n  or a dictionary with one of those four Python formats and/or `None` as values\r\n  and `'vector'`, `'matrix'` and/or `'data.frame'` as keys. Keys that are \r\n  missing or set to `None` leave the corresponding format unchanged.\r\n- `index`: the default value for the `index` parameter in `to_py()`; must be a \r\n  string (default: `'index'`) or `False`. \r\n- `squeeze`: the default value for the `squeeze` parameter in `to_py()`; must  \r\n  be `True` (the default) or `False`.\r\n- `plot_width`: the width, in inches, of inline plots in Jupyter notebooks;\r\n  must be a positive number. Defaults to 6.4 inches, to match Matplotlib's \r\n  default.\r\n- `plot_height`: the height, in inches, of inline plots in Jupyter notebooks;\r\n  must be a positive number. 
Defaults to 4.8 inches, to match Matplotlib's \r\n  default.\r\n\r\nFor instance, to set pandas as the default format in `to_py()`, run \r\n`options(to_py_format='pandas')`. This leaves the other options unchanged.\r\n\r\n`options()` with no arguments returns the current configuration options as a \r\ndictionary, with keys `to_r_format`, `to_py_format`, `index`, `squeeze`, \r\n`plot_width`, and `plot_height`.\r\n\r\nFor additional customization, users can specify ryp-specific settings in their\r\n`.Rprofile`:\r\n\r\n```R\r\nif (\"ryp\" %in% commandArgs()) {\r\n    # Custom settings for running R within ryp\r\n} else {\r\n    # Custom settings for native R\r\n}\r\n```\r\n\r\n## Conversion rules\r\n\r\n### Python to R (`to_r()`)\r\n\r\nArrays and Series with `float64`, `int32`, `int64`, `uint32`, and `uint64` data \r\ntypes will be converted without copying the underlying data (\"zero-copy\"). This \r\nis extremely fast but comes with two important caveats:\r\n\r\n1. Modifying the data in R will also modify it in Python, and vice versa.\r\n2. Because R lacks support for unsigned integers, `uint32` values will be \r\n   reinterpreted as `int32` and `uint64` values will be reinterpreted as \r\n   `int64`. This means that for `uint32`, `2_147_483_648` (`INT32_MAX + 1`) \r\n   will become `NA` and larger values will become negative numbers. 
For \r\n   `uint64`, `9_223_372_036_854_775_808` (`INT64_MAX + 1`) will become `NA` and \r\n   larger values will become negative numbers.\r\n\r\nPolars Series with `null` values or multiple chunks, and pandas Series with\r\nnon-NumPy data types, will *not* be converted zero-copy, but are still subject \r\nto caveat #2.\r\n\r\n| Python                                                                  | R                                                                                                         |\r\n|-------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|\r\n| `None`                                                                  | `NULL` (if scalar) or `NA` (if inside NumPy, pandas or polars)                                            |\r\n| `nan`                                                                   | `NaN` (if scalar or inside NumPy or polars) or `NA` (if inside pandas)                                    |\r\n| `pd.NA`                                                                 | `NA`                                                                                                      |\r\n| `pd.NaT`, `np.datetime64('NaT')`, `np.timedelta64('NaT')`               | `NA`                                                                                                      |   \r\n| `bool`                                                                  | length-1 `logical` vector                                                                                 |\r\n| `int`                                                                   | length-1 `integer` (if `abs(x) \u003c= 2_147_483_647`) or `bit64::integer64` vector                            |\r\n| `float`                                                                 | length-1 `numeric` vector                                                                  
               |\r\n| `str`                                                                   | length-1 `character` vector                                                                               |\r\n| `complex`                                                               | length-1 `complex` vector                                                                                 |\r\n| `datetime.date`                                                         | length-1 `Date` vector                                                                                    |\r\n| `datetime.datetime`                                                     | length-1 `POSIXct` vector                                                                                 |\r\n| `datetime.timedelta`                                                    | length-1 `difftime(units='secs')` vector                                                                  |\r\n| `datetime.time` (`tzinfo` must be `None`)                               | length-1 `hms::hms` vector                                                                                |\r\n| `bytes`, `bytearray`                                                    | `raw` vector                                                                                              |\r\n| `list`, `tuple`                                                         | unnamed list                                                                                              |\r\n| `dict` (all keys must be strings)                                       | named list                                                                                                |\r\n| polars Series, pandas Series\u003csup\u003e\u0026ast;\u003c/sup\u003e, pandas `Index`            | vector                                                                                                    |\r\n| polars DataFrame, pandas DataFrame\u003csup\u003e\u0026ast;\u003c/sup\u003e, 
pandas `MultiIndex` | matrix\u003csup\u003e\u0026dagger;\u003c/sup\u003e (if `format == 'matrix'`; all columns must have same data type) or `data.frame` |\r\n| 1D NumPy array                                                          | vector                                                                                                    |\r\n| 2D NumPy array                                                          | `data.frame` (if `format == 'data.frame'`) or matrix\u003csup\u003e\u0026dagger;\u003c/sup\u003e                                   |\r\n| \u0026ge; 3D NumPy array                                                     | array\u003csup\u003e\u0026dagger;\u003c/sup\u003e                                                                                  |\r\n| 0D NumPy array (e.g. `np.array(1)`), NumPy generic (e.g. `np.int32(1)`) | length-1 vector                                                                                           |\r\n| `csr_array`, `csr_matrix`                                               | `dgRMatrix` (if `float64`), `lgRMatrix` (if `bool`), -- (otherwise)                                       | \r\n| `csc_array`, `csc_matrix`                                               | `dgCMatrix` (if `float64`), `lgCMatrix` (if `bool`), -- (otherwise)                                       |\r\n| `coo_array`, `coo_matrix`                                               | `dgTMatrix` (if `float64`), `lgTMatrix` (if `bool`), -- (otherwise)                                       |\r\n\r\n#### NumPy data types\r\n\r\n| Python                                                | R                                          |\r\n|-------------------------------------------------------|--------------------------------------------|\r\n| `bool`                                                | `logical`                                  |\r\n| `int8`, `uint8`, `int16`, `uint16`, `int32`, `uint32` | `integer`                                  |\r\n| `int64`, `uint64` 
                                    | `bit64::integer64`                         |\r\n| `float16`, `float32`, `float64`, `float128`           | `numeric`                                  |\r\n| `complex64`, `complex128`                             | `complex`                                  |\r\n| `bytes` (e.g. `'S1'`)                                 | --                                         |\r\n| `str`/`unicode` (e.g. `'U1'`)                         | `character`                                |\r\n| `datetime64`                                          | `POSIXct`                                  | \r\n| `timedelta64`                                         | `difftime(units='secs')`                   |\r\n| `void` (unstructured)                                 | `raw`                                      |\r\n| `void` (structured)                                   | --                                         |\r\n| `object`                                              | depends on the contents\u003csup\u003e\u0026Dagger;\u003c/sup\u003e |\r\n\r\n#### pandas-specific data types\r\n\r\n| Python                                                                              | R                  |\r\n|-------------------------------------------------------------------------------------|--------------------|\r\n| `BooleanDtype`                                                                      | `logical`          |\r\n| `Int8Dtype`, `UInt8Dtype`, `Int16Dtype`, `UInt16Dtype`, `Int32Dtype`, `UInt32Dtype` | `integer`          |\r\n| `Int64Dtype`, `UInt64Dtype`                                                         | `bit64::integer64` |  \r\n| `Float32Dtype`, `Float64Dtype`                                                      | `numeric`          |\r\n| `StringDtype`                                                                       | `character`        |\r\n| `CategoricalDtype(ordered=False)`                                                   | unordered 
`factor` |\r\n| `CategoricalDtype(ordered=True)`                                                    | ordered `factor`   |\r\n| `DatetimeTZDtype`, `PeriodDtype`                                                    | `POSIXct`          |\r\n| `IntervalDtype`, `SparseDtype`                                                      | --                 |\r\n\r\n#### pandas Arrow data types (`pd.ArrowDtype`)\r\n\r\n| Python                                                                  | R                        |\r\n|-------------------------------------------------------------------------|--------------------------|\r\n| `pa.bool_`                                                              | `logical`                |\r\n| `pa.int8`, `pa.uint8`, `pa.int16`, `pa.uint16`, `pa.int32`, `pa.uint32` | `integer`                |\r\n| `pa.int64`, `pa.uint64`                                                 | `bit64::integer64`       |\r\n| `pa.float32`, `pa.float64`                                              | `numeric`                |\r\n| `pa.string`, `pa.large_string`                                          | `character`              |\r\n| `pa.date32`                                                             | `Date`                   |\r\n| `pa.date64`, `pa.timestamp`                                             | `POSIXct`                |\r\n| `pa.duration`                                                           | `difftime(units='secs')` |\r\n| `pa.time32`, `pa.time64`                                                | `hms::hms`               |\r\n| `pa.dictionary(any integer type, pa.string(), ordered=0)`               | unordered `factor`       |\r\n| `pa.dictionary(any integer type, pa.string(), ordered=1)`               | ordered `factor`         |\r\n| `pa.null()`                                                             | `vctrs::unspecified`     |\r\n\r\n#### Polars data types\r\n\r\n| Python                                                | R               
                           |\r\n|-------------------------------------------------------|--------------------------------------------|\r\n| `Boolean`                                             | `logical`                                  |\r\n| `Int8`, `UInt8`, `Int16`, `UInt16`, `Int32`, `UInt32` | `integer`                                  |\r\n| `Int64`, `UInt64`                                     | `bit64::integer64`                         |\r\n| `Float32`, `Float64`                                  | `numeric`                                  |\r\n| `Date`                                                | `Date`                                     |\r\n| `Datetime`                                            | `POSIXct`                                  |\r\n| `Duration`                                            | `difftime(units='secs')`                   |\r\n| `Time`                                                | `hms::hms`                                 |\r\n| `String`                                              | `character`                                |\r\n| `Categorical`                                         | unordered `factor`                         |\r\n| `Enum`                                                | ordered `factor`                           |\r\n| `Object`                                              | depends on the contents\u003csup\u003e\u0026Dagger;\u003c/sup\u003e |\r\n| `Null`                                                | `vctrs::unspecified`                       | \r\n| `Binary`, `Decimal`, `List`, `Array`                  | --                                         |\r\n\r\n#### Notes\r\n\r\n\u003csup\u003e\u0026ast;\u003c/sup\u003e For pandas Series and DataFrames, string indexes (and \r\ncategorical indexes where the categories are strings) will be automatically\r\nconverted to `names`/`rownames`. The default index\r\n(`pd.RangeIndex(len(python_object))`) will be ignored. All other indexes are\r\ndisallowed. 
\r\n\r\n\u003csup\u003e\u0026dagger;\u003c/sup\u003e Because R does not support `POSIXct` and `Date` matrices or\r\narrays, dates and datetimes cannot be converted to R matrices or arrays.\r\n\r\n\u003csup\u003e\u0026Dagger;\u003c/sup\u003e For `dtype=object` and `dtype=pl.Object`, the output R type\r\ndepends on the contents, e.g. `'character'` if all elements are strings. Some\r\nadditional notes on ryp's handling of object data types:\r\n- `None`, `np.nan`, `pd.NA`, `pd.NaT`, `np.datetime64('NaT')`, and \r\n  `np.timedelta64('NaT')` are all treated as missing values \u0026ndash; even for \r\n  polars, where `np.nan` is ordinarily treated as a floating-point number \r\n  rather than a missing value. \r\n- Length-0 and all-missing data will be converted to the `vctrs::unspecified` R\r\n  type (`vctrs` is part of the tidyverse). \r\n- If the elements are objects with a mix of types (or datetimes with a mix of\r\n  time zones), Arrow will generally cause the conversion to fail, though mixes\r\n  of related types (e.g. int and float) will be automatically cast to the\r\n  common supertype and succeed. \r\n- Conversion will also fail if the contents are objects that are not \r\n  representable as R vector elements. This includes `bytes`/`bytearray` (which\r\n  are only representable in R when scalar, as a `raw` vector) and Python\r\n  containers (`list`, `tuple`, and `dict`). \r\n- pandas `Timedelta` objects will be rounded down to the nearest microsecond,\r\n  following the behavior of Arrow.\r\n\r\n### R to Python (`to_py()`)\r\n\r\nUnlike `to_r()`, zero-copy conversion is only guaranteed when:\r\n1. The data being converted is backed by an Arrow array. This is the case for \r\n   data that was previously converted from Python with `to_r()`, or that the \r\n   user created manually from an Arrow array.\r\n2. The data is integer (i.e. `int32`) or numeric (i.e. `float64`).\r\n3. 
The data is being converted to a NumPy array or polars Series.\r\n\r\n| R                                                                            | Python                                                                                                                                                                                        |\r\n|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\r\n| `NULL`                                                                       | `None`                                                                                                                                                                                        |\r\n| `NA`                                                                         | `None` (if scalar or `format='polars'`), `None`/`nan`/`pd.NA`/`pd.NaT`/`np.datetime64('NaT', 'us')`/`np.timedelta64('NaT', 'ns')`/etc. 
(if `format='numpy'`, `'pandas'`, or `'pandas-pyarrow'`) |\r\n| `NaN`                                                                        | `nan`                                                                                                                                                                                         |\r\n| length-1 vector, matrix or array, `squeeze == True`                          | scalar                                                                                                                                                                                        | \r\n| vector or 1D array, `format == 'numpy'`                                      | 1D NumPy array                                                                                                                                                                                |\r\n| vector or 1D array, `format == 'pandas'` or `format == 'pandas-pyarrow'`     | pandas Series                                                                                                                                                                                 |\r\n| vector or 1D array, `format == 'polars'`                                     | polars Series (if `index=False`) or two-column DataFrame                                                                                                                                      |\r\n| matrix or `data.frame`, `format == 'numpy'`                                  | 2D NumPy array                                                                                                                                                                                |\r\n| matrix or `data.frame`, `format == 'pandas'` or `format == 'pandas-pyarrow'` | pandas DataFrame                                                                                                                                                                              |\r\n| 
matrix or `data.frame`, `format == 'polars'`                                 | polars DataFrame                                                                                                                                                                              |\r\n| \u0026ge; 3D array                                                                | NumPy array                                                                                                                                                                                   |  \r\n| unnamed list                                                                 | `list`                                                                                                                                                                                        |\r\n| named list, S3 object, S4 object, environment, S6 object                     | `dict`                                                                                                                                                                                        |\r\n| `dgRMatrix`                                                                  | `csr_array(dtype='float64')`                                                                                                                                                                  |\r\n| `dgCMatrix`                                                                  | `csc_array(dtype='float64')`                                                                                                                                                                  |\r\n| `dgTMatrix`                                                                  | `coo_array(dtype='float64')`                                                                                                                                                                  |\r\n| `lgRMatrix`, `ngRMatrix`                                     
                | `csr_array(dtype=bool)`                                                                                                                                                                       |\r\n| `lgCMatrix`, `ngCMatrix`                                                     | `csc_array(dtype=bool)`                                                                                                                                                                       |\r\n| `lgTMatrix`, `ngTMatrix`                                                     | `coo_array(dtype=bool)`                                                                                                                                                                       |\r\n| formula (`~`)                                                                | --                                                                                                                                                                                            |\r\n\r\n#### Data types\r\n\r\n| R                           | Python scalar                        | NumPy                                                    | pandas                                                   | pandas-pyarrow                                                 | polars                                     |\r\n|-----------------------------|--------------------------------------|----------------------------------------------------------|----------------------------------------------------------|----------------------------------------------------------------|--------------------------------------------|\r\n| `logical`                   | `bool`                               | `bool`                                                   | `bool`                                                   | `ArrowDtype(pa.bool_())`                                       | `Boolean`                                  |\r\n| `integer`               
    | `int`                                | `int32`                                                  | `int32`                                                  | `ArrowDtype(pa.int32())`                                       | `Int32`                                    |\r\n| `bit64::integer64`          | `int`                                | `int64`                                                  | `int64`                                                  | `ArrowDtype(pa.int64())`                                       | `Int64`                                    |\r\n| `numeric`                   | `float`                              | `float64`                                                | `float64`                                                | `ArrowDtype(pa.float64())`                                     | `Float64`                                  |\r\n| `character`                 | `str`                                | `object` (with `str` elements)                           | `object` (with `str` elements)                           | `ArrowDtype(pa.string())`                                      | `String`                                   |\r\n| `complex`                   | `complex`                            | `complex128`                                             | `complex128`                                             | `complex128`                                                   | --                                         |\r\n| `raw`                       | `bytearray`                          | --                                                       | --                                                       | --                                                             | --                                         |\r\n| unordered `factor`          | `str`                                | `object` (with `str` elements)                           | `CategoricalDtype(ordered=False)`                        | 
`ArrowDtype(pa.dictionary(pa.int8(), pa.string(), ordered=0))` | `Categorical`                              |\r\n| ordered `factor`            | `str`                                | `object` (with `str` elements)                           | `CategoricalDtype(ordered=True)`                         | `ArrowDtype(pa.dictionary(pa.int8(), pa.string(), ordered=1))` | `Enum`                                     |\r\n| `POSIXct` without time zone | `datetime.datetime`\u003csup\u003e\u0026ast;\u003c/sup\u003e  | `datetime64[us]`\u003csup\u003e\u0026ast;\u003c/sup\u003e                         | `datetime64[us]`\u003csup\u003e\u0026ast;\u003c/sup\u003e                         | `ArrowDtype(pa.timestamp('us'))`\u003csup\u003e\u0026ast;\u003c/sup\u003e               | `Datetime('us')`\u003csup\u003e\u0026ast;\u003c/sup\u003e           |\r\n| `POSIXct` with time zone    | `datetime.datetime`\u003csup\u003e\u0026ast;\u003c/sup\u003e  | `datetime64[us]`\u003csup\u003e\u0026ast;\u003c/sup\u003e (time zone discarded)   | `DatetimeTZDtype('us', time_zone)`\u003csup\u003e\u0026ast;\u003c/sup\u003e       | `ArrowDtype(pa.timestamp('us', time_zone))`\u003csup\u003e\u0026ast;\u003c/sup\u003e    | `Datetime('us', time_zone)`\u003csup\u003e\u0026ast;\u003c/sup\u003e | \r\n| `POSIXlt`                   | `dict` of scalars                    | `dict` of NumPy arrays                                   | `dict` of pandas Series                                  | `dict` of pandas Series                                        | `dict` of polars Series                    |\r\n| `Date`                      | `datetime.date`                      | `datetime64[D]`                                          | `datetime64[ms]`                                         | `ArrowDtype(pa.date32('day'))`                                 | `Date`                                     |\r\n| `difftime`                  | `datetime.timedelta`\u003csup\u003e\u0026ast;\u003c/sup\u003e | `timedelta64[ns]`             
                           | `timedelta64[ns]`                                        | `ArrowDtype(pa.duration('ns'))`                                | `Duration(time_unit='ns')`                 |\r\n| `hms::hms`                  | `datetime.time`\u003csup\u003e\u0026ast;\u003c/sup\u003e      | `object` (with `datetime.time` elements)\u003csup\u003e\u0026ast;\u003c/sup\u003e | `object` (with `datetime.time` elements)\u003csup\u003e\u0026ast;\u003c/sup\u003e | `ArrowDtype(pa.time64('ns'))`\u003csup\u003e\u0026ast;\u003c/sup\u003e                  | `Time`                                     |\r\n| `vctrs::unspecified`        | `None`                               | `object` (with `None` elements)                          | `object` (with `None` elements)                          | `ArrowDtype(pa.null())`                                        | `Null`                                     |\r\n\r\n\u003csup\u003e\u0026ast;\u003c/sup\u003e Due to the limitations of conversion with Arrow, `POSIXct` and\r\n`hms::hms` values are rounded down to the nearest microsecond when converting\r\nto Python, except for `hms::hms` when converting to polars. `difftime` values\r\nare also rounded down to the nearest microsecond, but only when converting to\r\nscalar `datetime.timedelta` values (which cannot represent nanoseconds).\r\n\r\n## Examples\r\n\r\n1. 
Apply R's `scale()` function to a pandas DataFrame:\r\n\r\n```python\r\nimport pandas as pd\r\nfrom ryp import r, to_py, to_r\r\ndata = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 3, 4]})\r\nto_r(data, 'data')\r\nr('data')\r\n#   a b\r\n# 1 1 1\r\n# 2 2 3\r\n# 3 3 4\r\n\r\nr('data \u003c- scale(data)')  # scale the R data.frame\r\nscaled_data = to_py('data', format='pandas')  # convert the R data.frame to Python\r\nscaled_data\r\n#      a         b\r\n# 0 -1.0 -1.091089\r\n# 1  0.0  0.218218\r\n# 2  1.0  0.872872\r\n```\r\nNote: we could have just written `to_py('scale(data)')` instead of\r\n`r('data \u003c- scale(data)')` followed by `to_py('data')`. We could also have \r\nrun `options(to_py_format='pandas')` at the top, to avoid having to specify\r\n`format='pandas'` in each `to_py()` call.\r\n\r\n2. Run a linear model on a polars DataFrame:\r\n\r\n```python\r\nimport polars as pl\r\nfrom ryp import r, to_py, to_r\r\ndata = pl.DataFrame({'y': [7, 1, 2, 3, 6], 'x': [5, 2, 3, 2, 5]})\r\nto_r(data, 'data')\r\nr('model \u003c- lm(y ~ x, data=data)')\r\ncoef = to_py('summary(model)$coefficients', index='variable')\r\np_value = coef.filter(variable='x').select('Pr(\u003e|t|)')[0, 0]\r\np_value\r\n# 0.02831035772841884\r\n```\r\n\r\n3. 
Recursive conversion, showcasing all the keyword arguments of `to_r()` and\r\n   `to_py()`:\r\n\r\n```python\r\nimport numpy as np\r\nfrom ryp import r, to_py, to_r\r\narrays = {'ints': np.array([[1, 2], [3, 4]]),\r\n          'floats': np.array([[0.5, 1.5], [2.5, 3.5]])}\r\nto_r(arrays, 'arrays', format='data.frame',\r\n     rownames=['row1', 'row2'], colnames=['col1', 'col2'])\r\nr('arrays')\r\n# $ints\r\n#      col1 col2\r\n# row1    1    2\r\n# row2    3    4\r\n# \r\n# $floats\r\n#      col1 col2\r\n# row1  0.5  1.5\r\n# row2  2.5  3.5\r\narrays = to_py('arrays', format='pandas', index='foo')\r\narrays['ints']\r\n#       col1  col2\r\n# foo\r\n# row1     1     2\r\n# row2     3     4\r\narrays['floats']\r\n#       col1  col2\r\n# foo\r\n# row1   0.5   1.5\r\n# row2   2.5   3.5\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwainberg%2Fryp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwainberg%2Fryp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwainberg%2Fryp/lists"}