{"id":19382577,"url":"https://github.com/neuro-ml/reskit","last_synced_at":"2025-07-19T22:03:34.435Z","repository":{"id":71103711,"uuid":"75554162","full_name":"neuro-ml/reskit","owner":"neuro-ml","description":"A library for creating and curating reproducible pipelines for scientific and industrial machine learning","archived":false,"fork":false,"pushed_at":"2017-07-20T14:22:25.000Z","size":38126,"stargazers_count":27,"open_issues_count":13,"forks_count":7,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-23T20:47:47.551Z","etag":null,"topics":["data-preparation","grid-search","pipeline","prepare-data","python","reproducible-experiments","reproducible-research","scikit-learn"],"latest_commit_sha":null,"homepage":"http://reskit.readthedocs.io/en/0.1.x/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neuro-ml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-12-04T17:56:13.000Z","updated_at":"2023-07-25T14:05:50.000Z","dependencies_parsed_at":"2023-02-25T09:00:44.801Z","dependency_job_id":null,"html_url":"https://github.com/neuro-ml/reskit","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/neuro-ml/reskit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuro-ml%2Freskit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuro-ml%2Freskit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuro-ml%2Freskit/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuro-ml%2Freskit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neuro-ml","download_url":"https://codeload.github.com/neuro-ml/reskit/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuro-ml%2Freskit/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266026162,"owners_count":23866030,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-preparation","grid-search","pipeline","prepare-data","python","reproducible-experiments","reproducible-research","scikit-learn"],"created_at":"2024-11-10T09:22:15.961Z","updated_at":"2025-07-19T22:03:34.405Z","avatar_url":"https://github.com/neuro-ml.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Reskit\n\n[![Documentation Status](https://readthedocs.org/projects/reskit/badge/?version=0.1.0)](http://reskit.readthedocs.io/en/0.1.0/?badge=0.1.0)\n[![Join the chat at https://gitter.im/ResearcherKit/Lobby](https://badges.gitter.im/ResearcherKit/Lobby.svg)](https://gitter.im/ResearcherKit/Lobby?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n[![Build Status](https://travis-ci.org/neuro-ml/reskit.svg?branch=master)](https://travis-ci.org/neuro-ml/reskit)\n[![codecov](https://codecov.io/gh/neuro-ml/reskit/branch/master/graph/badge.svg)](https://codecov.io/gh/neuro-ml/reskit)\n\n\nReskit (researcher’s kit) is a library for creating and curating 
reproducible\npipelines for scientific and industrial machine learning. A natural extension\nof ``scikit-learn`` Pipelines to general classes of pipelines, Reskit\nallows for the efficient and transparent optimization of each pipeline step.\nMain features include data caching, compatibility with most scikit-learn\nobjects, optimization constraints (e.g. forbidden combinations), and table\ngeneration for quality metrics. Reskit also allows custom metrics to be\ninjected into the underlying scikit-learn framework. Reskit is intended for\nresearchers who need pipelines amenable to versioning and reproducibility, yet\nwho also have a large volume of experiments to run.\n\n## Features\n\n* Combining pipelines with an equal number of steps into a list of\n  experiments, running them, and returning the results in a convenient,\n  human-readable format (a pandas DataFrame).\n\n* Step caching. Scikit-learn pipelines prior to version 0.19 could not cache\n  intermediate steps. Reskit includes an option to save fixed steps, so the\n  specified steps are not recalculated in the next pipeline.\n\n* Forbidden combination constraints. Not all possible combinations of pipeline\n  steps are viable or meaningfully different. For example, in a classification\n  task comparing logistic regression and decision trees, the former requires\n  feature scaling while the latter may not. In this case you can block the\n  unnecessary pair. Reskit supports general tuple blocking as well.\n\n* Full compatibility with scikit-learn objects. Reskit can use any scikit-learn\n  data-transforming object and/or predictive model, and presumably objects\n  from many other libraries that follow the scikit-learn API.\n\n* Evaluation of multiple performance metrics simultaneously. 
Evaluation is\n  simply another step in the pipeline, so we can specify a number of possible\n  evaluation metrics and Reskit will compute each metric for each pipeline.\n\n* The DataTransformer class, Reskit’s simplified interface for specifying\n  fit/transform methods in pipeline steps. A DataTransformer subclass need\n  only specify one function.\n\n* Tools for learning on graphs. Due to our original motivations, Reskit\n  includes a number of operations for network data. In particular, it offers a\n  variety of normalization choices for adjacency matrices, as well as built-in\n  local graph metric calculations. These were implemented using DataTransformer\n  and, in some cases, bctpy (the Python version of the Brain Connectivity\n  Toolbox).\n\n## Documentation\n\nThe documentation includes a detailed\n[tutorial](http://reskit.readthedocs.io/en/0.1.0/tutorial/index.html); for\na quick overview, see the example below.\n\n## Example\n\nLet's say we want to prepare data and try some scalers and classifiers for\nprediction in a classification problem. 
We will tune the classifiers' hyperparameters\nwith grid search.\n\nPreparing the data:\n\n```python\nfrom sklearn.datasets import make_classification\n\n\n# Synthetic binary classification data (100 samples, 20 features by default)\nX, y = make_classification()\n```\n\nDefining the steps of our pipelines and the parameters for grid search:\n\n```python\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.preprocessing import MinMaxScaler\n\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.svm import SVC\n\n# Each step is a list of (name, object) alternatives; Reskit combines\n# every scaler with every classifier\nscalers = [\n    ('minmax', MinMaxScaler()),\n    ('standard', StandardScaler())\n]\n\nclassifiers = [\n    # liblinear supports both 'l1' and 'l2' penalties (the lbfgs default in\n    # newer scikit-learn versions does not support 'l1')\n    ('LR', LogisticRegression(solver='liblinear')),\n    ('SVC', SVC())\n]\n\nsteps = [\n    ('Scaler', scalers),\n    ('Classifier', classifiers)\n]\n\n# Grid-search parameters, keyed by the name of each alternative\nparam_grid = {\n    'LR' : {\n        'penalty' : ['l1', 'l2']},\n    'SVC' : {\n        'kernel' : ['linear', 'poly', 'rbf', 'sigmoid']}}\n```\n\nSetting up cross-validation for the hyperparameter grid search and for\nevaluating the models with the selected hyperparameters:\n\n```python\nfrom sklearn.model_selection import StratifiedKFold\n\n\n# Separate splitters (different seeds) for grid search and final evaluation\ngrid_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)\neval_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)\n```\n\nCreating a plan of our experiments:\n\n```python\nfrom reskit.core import Pipeliner\n\n\n# Every combination of scaler and classifier becomes one line of the plan\npipeliner = Pipeliner(steps=steps, grid_cv=grid_cv, eval_cv=eval_cv, param_grid=param_grid)\npipeliner.plan_table\n```\n\n  |    |  scaler    |  classifier\n  |----|------------|-------------\n  | 0  |  standard  |  LR\n  | 1  |  standard  |  SVC\n  | 2  |  minmax    |  LR\n  | 3  |  minmax    |  SVC\n\nTo tune the models' hyperparameters and evaluate the models, run:\n\n```python\npipeliner.get_results(X, y, scoring='roc_auc')\n```\n\n```\nLine: 1/4\nLine: 2/4\nLine: 3/4\nLine: 4/4\n```\n\n  |   |  scaler    |  classifier  |  grid_roc_auc_mean  |  grid_roc_auc_std  |  grid_roc_auc_best_params  |  eval_roc_auc_mean  |  eval_roc_auc_std  |  eval_roc_auc_scores\n  
|---|------------|--------------|---------------------|--------------------|----------------------------|---------------------|--------------------|---------------------------------\n  | 0 |  standard  |  LR          |  0.956              |  0.0338230690506   |  {'penalty': 'l1'}         |  0.968              |  0.0324961536185   |  [ 0.92  1.    1.    0.94  0.98]\n  | 1 |  standard  |  SVC         |  0.962              |  0.0278567765544   |  {'kernel': 'poly'}        |  0.976              |  0.0300665927567   |  [ 0.95  1.    1.    0.93  1.  ]\n  | 2 |  minmax    |  LR          |  0.964              |  0.0412795348811   |  {'penalty': 'l1'}         |  0.966              |  0.0377359245282   |  [ 0.92  1.    1.    0.92  0.99]\n  | 3 |  minmax    |  SVC         |  0.958              |  0.0411825205639   |  {'kernel': 'rbf'}         |  0.962              |  0.0401995024845   |  [ 0.93  1.    1.    0.9   0.98]\n\n\n## Installation\n\nReskit currently requires ``Python 3.4`` or later to run. Please install\n``Python`` and ``pip`` via the package manager of your operating system if\nthey are not already installed.\n\nReskit depends on:\n\n* `numpy`\n* `scikit-learn`\n* `pandas`\n* `scipy`\n* `python-igraph`\n* `networkx`\n\nIf you don't need graph metrics, you can comment out `scipy`,\n`python-igraph`, and `networkx` in `requirements.txt`.\n\nTo install the dependencies, run:\n\n```bash\npip install -r https://raw.githubusercontent.com/neuro-ml/reskit/master/requirements.txt\n```\n\nThe latest stable version is 0.1.0. You can install it with:\n\n```bash\npip3 install -U https://github.com/neuro-ml/reskit/archive/0.1.0.zip\n```\n\nTo install the latest development version of Reskit, run:\n\n```bash\npip install -U https://github.com/neuro-ml/reskit/archive/master.zip\n```\n\n## Docker\n\nIf you just want to try Reskit or don’t want to install Python, you can build\na Docker image and do all your Reskit work inside it. 
This also provides\na simple way to reproduce your experiments. To run Reskit in Docker, use the\nfollowing commands:\n\n1. Clone the repository:\n\n    ```bash\n    git clone https://github.com/neuro-ml/reskit.git\n    cd reskit\n    ```\n\n2. Build the Docker image:\n\n    ```bash\n    docker build -t docker-reskit -f Dockerfile .\n    ```\n\n3. Run the Docker image:\n\n   * To run bash in the container:\n\n     ```bash\n     docker run -it docker-reskit bash\n     ```\n\n   * To run bash in the container with a shared directory:\n\n     ```bash\n     docker run -v $PWD/scripts:/reskit/scripts -it docker-reskit bash\n     ```\n\n   * To start a Jupyter Notebook server at http://localhost:8809 in the\n     container:\n\n     ```bash\n     docker run -v $PWD/scripts:/reskit/scripts -it -p 8809:8809 docker-reskit jupyter notebook --no-browser --ip=\"*\" --allow-root --port 8809\n     ```\n\n     You will see a message like:\n\n     ```\n       Copy/paste this URL into your browser when you connect for the first time,\n       to login with a token:\n         http://localhost:8809/?token=some_token\n     ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuro-ml%2Freskit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneuro-ml%2Freskit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuro-ml%2Freskit/lists"}