{"id":14958358,"url":"https://github.com/rvandewater/ReciPies","last_synced_at":"2025-10-24T14:31:54.421Z","repository":{"id":115800574,"uuid":"570525006","full_name":"rvandewater/ReciPys","owner":"rvandewater","description":"🥧ReciPys: easily define and execute preprocessing and feature engineering steps on Polars and Pandas dataframes. ","archived":false,"fork":false,"pushed_at":"2024-10-24T12:53:26.000Z","size":3917,"stargazers_count":5,"open_issues_count":3,"forks_count":2,"subscribers_count":1,"default_branch":"development","last_synced_at":"2025-01-31T02:21:53.872Z","etag":null,"topics":["data-science","pandas","polars","python","scikit-learn","tidymodels"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rvandewater.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-25T11:56:34.000Z","updated_at":"2025-01-20T15:14:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"edfc8094-5883-4b0b-a1da-20b8343899fe","html_url":"https://github.com/rvandewater/ReciPys","commit_stats":{"total_commits":173,"total_committers":4,"mean_commits":43.25,"dds":0.5028901734104047,"last_synced_commit":"b0bb6301166553866ab96fe11ccbd77293d307c5"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rvandewater%2FReciPys","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rvandewater%2FReciPys/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/rvandewater%2FReciPys/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rvandewater%2FReciPys/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rvandewater","download_url":"https://codeload.github.com/rvandewater/ReciPys/tar.gz/refs/heads/development","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237990569,"owners_count":19398452,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","pandas","polars","python","scikit-learn","tidymodels"],"created_at":"2024-09-24T13:16:49.987Z","updated_at":"2025-10-24T14:31:48.761Z","avatar_url":"https://github.com/rvandewater.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![logo](https://github.com/rvandewater/ReciPys/blob/development/docs/figures/recipys_logo.png?raw=true)\n\n# 🥧ReciPys🐍\n\n[![CI](https://github.com/rvandewater/recipys/actions/workflows/ci.yml/badge.svg)](https://github.com/rvandewater/recipys/actions/workflows/ci.yml)\n[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n![Platform](https://img.shields.io/badge/platform-linux--64%20|%20win--64%20|%20osx--64-lightgrey)\n[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)\n[![PyPI version shields.io](https://img.shields.io/pypi/v/recipies.svg)](https://pypi.python.org/pypi/recipies/)\n[![arXiv](https://img.shields.io/badge/arXiv-2306.05109-b31b1b.svg)](http://arxiv.org/abs/2306.05109)\n\nThe ReciPys package is a preprocessing framework operating 
on [Polars](https://github.com/pola-rs/polars)\nand [Pandas](https://github.com/pandas-dev/pandas) dataframes. The backend can be chosen by the user.\nThe design of this package is inspired by the R package [recipes](https://recipes.tidymodels.org/).\nThis package allows the user to apply a number of extensible operations for imputation, feature generation/extraction,\nscaling, and encoding.\nIt operates on modified Dataframe objects from the established data science package Pandas.\n\n## Installation\n\nYou can install ReciPys with pip:\n\n```\npip install recipies\n```\n\n\u003e Note that the package is called `recipies` and not `recipys` on pip due to a name clash with an existing package.\n\nYou can install ReciPys from source to ensure you have the latest version:\n\n```\nconda env update -f environment.yml\nconda activate recipys\npip install -e .\n```\n\n\u003e Note that the last command installs the package called `recipies`.\n\n## Usage\n\nTo define preprocessing operations, one has to assign _roles_ to the different columns of the Dataframe.\nThis allows the user to create groups of columns that serve a particular function.\nWe then provide several \"steps\" that can be applied to the datasets, including historical accumulation,\nresampling of the time resolution, a number of imputation methods, and a wrapper for any\n[Scikit-learn](https://github.com/scikit-learn/scikit-learn) preprocessing step.\nWe believe these steps cover the basic preprocessing needs for prepared datasets.\nAny missing step can be added by following the step interface.\n\n# 📄Paper\n\nIf you use this code in your research, please cite the following publication (a standalone paper is in preparation):\n\n```\n@inproceedings{vandewaterYetAnotherICUBenchmark2024,\n  title = {Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML},\n  shorttitle = {Yet Another ICU Benchmark},\n  booktitle = {The Twelfth International Conference on Learning 
Representations},\n  author = {van de Water, Robin and Schmidt, Hendrik Nils Aurel and Elbers, Paul and Thoral, Patrick and Arnrich, Bert and Rockenschaub, Patrick},\n  year = {2024},\n  month = oct,\n  urldate = {2024-02-19},\n  langid = {english},\n}\n\n```\n\nThis paper can also be found on arxiv: https://arxiv.org/pdf/2306.05109.pdf\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frvandewater%2FReciPies","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frvandewater%2FReciPies","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frvandewater%2FReciPies/lists"}