https://github.com/rvandewater/ReciPies
🥧ReciPys: easily define and execute preprocessing and feature engineering steps on Polars and Pandas dataframes.
- Host: GitHub
- URL: https://github.com/rvandewater/ReciPies
- Owner: rvandewater
- License: mit
- Created: 2022-11-25T11:56:34.000Z (over 3 years ago)
- Default Branch: development
- Last Pushed: 2024-10-24T12:53:26.000Z (over 1 year ago)
- Last Synced: 2025-01-31T02:21:53.872Z (over 1 year ago)
- Topics: data-science, pandas, polars, python, scikit-learn, tidymodels
- Language: Python
- Homepage:
- Size: 3.74 MB
- Stars: 5
- Watchers: 1
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

# 🥧ReciPys🐍
ReciPys is a preprocessing framework that operates on [Polars](https://github.com/pola-rs/polars)
and [Pandas](https://github.com/pandas-dev/pandas) dataframes. The user chooses the backend.
Its design is inspired by the R package [recipes](https://recipes.tidymodels.org/).
The package provides a set of extensible operations for imputation, feature generation/extraction,
scaling, and encoding.
It operates on modified DataFrame objects from the established data science package Pandas.
## Installation
You can install ReciPys from PyPI using pip:
```
pip install recipies
```
> Note that the package is called `recipies` and not `recipys` on pip due to a name clash with an existing package.
>
You can install ReciPys from source to ensure you have the latest version:
```
conda env update -f environment.yml
conda activate recipys
pip install -e .
```
> Note that the last command installs the package called `recipies`.
## Usage
To define preprocessing operations, you first assign _roles_ to the columns of the DataFrame.
Roles let you group columns that serve a particular function.
We then provide several "steps" that can be applied to the datasets, including historical accumulation,
resampling of the time resolution, a number of imputation methods, and a wrapper for any
[Scikit-learn](https://github.com/scikit-learn/scikit-learn) preprocessing step.
We believe these steps cover the basic preprocessing needs for prepared datasets.
Any missing step can be added by following the step interface.
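
The roles-and-steps idea can be sketched in plain Python. This is a toy illustration only and does not use the ReciPys API: `ToyTable`, `StepMeanImpute`, `StepMinMaxScale`, `apply_steps`, and the `fit`/`transform` method names are all hypothetical stand-ins chosen for this example. It shows how a role selects a group of columns and how each step learns its parameters before transforming the data.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ToyTable:
    """Stand-in for a dataframe: column name -> values, plus a role per column."""
    columns: dict
    roles: dict = field(default_factory=dict)

    def with_role(self, role):
        """Return the names of all columns that carry the given role."""
        return [name for name, r in self.roles.items() if r == role]


class StepMeanImpute:
    """Hypothetical step: replace None with the column mean learned in fit()."""

    def __init__(self, role):
        self.role = role
        self.means = {}

    def fit(self, table):
        for name in table.with_role(self.role):
            observed = [v for v in table.columns[name] if v is not None]
            self.means[name] = mean(observed)
        return self

    def transform(self, table):
        columns = dict(table.columns)  # shallow copy, the input table is not modified
        for name, fill in self.means.items():
            columns[name] = [fill if v is None else v for v in columns[name]]
        return ToyTable(columns, dict(table.roles))


class StepMinMaxScale:
    """Hypothetical step: rescale columns to [0, 1] using the range seen in fit()."""

    def __init__(self, role):
        self.role = role
        self.ranges = {}

    def fit(self, table):
        for name in table.with_role(self.role):
            values = [v for v in table.columns[name] if v is not None]
            self.ranges[name] = (min(values), max(values))
        return self

    def transform(self, table):
        columns = dict(table.columns)
        for name, (low, high) in self.ranges.items():
            span = (high - low) or 1.0  # avoid division by zero for constant columns
            columns[name] = [None if v is None else (v - low) / span for v in columns[name]]
        return ToyTable(columns, dict(table.roles))


def apply_steps(steps, table):
    """Fit each step on the current table, then transform it, in order."""
    for step in steps:
        table = step.fit(table).transform(table)
    return table


table = ToyTable(
    columns={
        "patient": [1, 1, 2, 2],
        "heart_rate": [60.0, None, 80.0, 100.0],
        "label": [0, 0, 1, 1],
    },
    roles={"patient": "group", "heart_rate": "predictor", "label": "outcome"},
)

# Only columns with the "predictor" role are imputed and scaled.
result = apply_steps([StepMeanImpute("predictor"), StepMinMaxScale("predictor")], table)
print(result.columns["heart_rate"])  # [0.0, 0.5, 0.5, 1.0]
```

Because the steps select columns by role rather than by name, the same list of steps can be reused on any table that assigns the same roles. Adding a new step in this sketch only requires a class with matching `fit` and `transform` methods.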
# 📄Paper
If you use this code in your research, please cite the following publication (a standalone paper is in preparation):
```
@inproceedings{vandewaterYetAnotherICUBenchmark2024,
title = {Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML},
shorttitle = {Yet Another ICU Benchmark},
booktitle = {The Twelfth International Conference on Learning Representations},
author = {van de Water, Robin and Schmidt, Hendrik Nils Aurel and Elbers, Paul and Thoral, Patrick and Arnrich, Bert and Rockenschaub, Patrick},
year = {2024},
month = oct,
urldate = {2024-02-19},
langid = {english},
}
```
This paper can also be found on arXiv: https://arxiv.org/pdf/2306.05109.pdf