{"id":13695058,"url":"https://github.com/yandex/rep","last_synced_at":"2025-05-15T19:09:51.931Z","repository":{"id":30333898,"uuid":"33886250","full_name":"yandex/rep","owner":"yandex","description":"Machine Learning toolbox for Humans","archived":false,"fork":false,"pushed_at":"2024-07-31T13:36:37.000Z","size":137600,"stargazers_count":696,"open_issues_count":29,"forks_count":149,"subscribers_count":49,"default_branch":"master","last_synced_at":"2025-05-08T05:04:23.854Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://yandex.github.io/rep/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yandex.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-04-13T18:32:24.000Z","updated_at":"2025-04-08T02:10:49.000Z","dependencies_parsed_at":"2022-09-10T07:21:04.340Z","dependency_job_id":"83847594-364e-46b3-922e-aafe34b1db32","html_url":"https://github.com/yandex/rep","commit_stats":{"total_commits":885,"total_committers":14,"mean_commits":"63.214285714285715","dds":0.6542372881355932,"last_synced_commit":"5f09045c74d05c75bd1274635573086b39332ac0"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yandex%2Frep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yandex%2Frep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yandex%2Frep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yandex%2Fre
p/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yandex","download_url":"https://codeload.github.com/yandex/rep/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254404357,"owners_count":22065641,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T17:02:00.540Z","updated_at":"2025-05-15T19:09:51.892Z","avatar_url":"https://github.com/yandex.png","language":"Jupyter Notebook","funding_links":[],"categories":["Machine Learning","Jupyter Notebook","Python"],"sub_categories":["General Purpose Machine Learning","General-Purpose Machine Learning","Automatic Plotting"],"readme":"# Reproducible Experiment Platform (REP)\n\n[![Join the chat at https://gitter.im/yandex/rep](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/yandex/rep?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n[![Build Status](https://travis-ci.org/yandex/rep.svg?branch=master)](https://travis-ci.org/yandex/rep)\n[![PyPI version](https://badge.fury.io/py/rep.svg)](https://badge.fury.io/py/rep)\n[![Documentation](https://img.shields.io/badge/documentation-link-blue.svg)](https://yandex.github.io/rep/)\n[![CircleCI](https://circleci.com/gh/arogozhnikov/rep.svg?style=svg)](https://circleci.com/gh/arogozhnikov/rep)\n\n__REP__ is an IPython-based environment for conducting data-driven research in a consistent and reproducible way.\n\n## Main features:\n\n  * unified Python wrapper for different ML libraries (wrappers follow an extended __scikit-learn__ 
interface)\n    * Sklearn\n    * TMVA\n    * XGBoost\n    * uBoost\n    * Theanets\n    * Pybrain\n    * Neurolab\n    * MatrixNet service (**available to CERN**)\n  * parallel training of classifiers on a cluster\n  * classification/regression reports with plots\n  * interactive plots supported\n  * smart grid-search algorithms with parallel execution\n  * research versioning using git\n  * pluggable quality metrics for classification\n  * meta-algorithm design (aka 'rep-lego')\n\n__REP__ is not trying to replace __scikit-learn__; it extends it and provides a better user experience.\n\n### Howto examples\n\nTo get started, look at the notebooks in [/howto/](https://github.com/yandex/rep/tree/master/howto)\n\nNotebooks can be viewed (not executed) online at [nbviewer](http://nbviewer.ipython.org/github/yandex/rep/tree/master/howto/)  \u003cbr /\u003e\nThere are basic introductory notebooks (about Python and IPython) and more advanced ones (about **REP** itself).\n\nExample code is written in Python 2, but the library is compatible with both Python 2 and Python 3.\n\n### Installation with Docker\n\nWe provide a [Docker image](https://registry.hub.docker.com/u/yandex/rep/) with `REP` and all its dependencies. 
\nThis is the recommended way, especially if you're not experienced in Python.\n\n* [install with Docker on Linux](https://github.com/yandex/rep/wiki/Install-REP-with-Docker-(Linux))\n* [install with Docker on Mac and Windows](https://github.com/yandex/rep/wiki/Install-REP-with-Docker-(Mac-OS-X,-Windows))\n\n\n### Installation with bare hands\n\nHowever, if you want to install `REP` and all of its dependencies on your machine yourself, follow this manual: \n[installing manually](https://github.com/yandex/rep/wiki/Installing-manually) and \n[running manually](https://github.com/yandex/rep/wiki/Running-manually).\n\n\n### Links\n\n* [documentation](http://yandex.github.io/rep/)\n* [howto](http://nbviewer.ipython.org/github/yandex/rep/tree/master/howto/)\n* [bugtracker](https://github.com/yandex/rep/issues)\n* [gitter chat, troubleshooting](https://gitter.im/yandex/rep)\n* [API, contributing new estimator](https://github.com/yandex/rep/wiki/Contributing-new-estimator)\n* [API, contributing new metric](https://github.com/yandex/rep/wiki/Contributing-new-metrics)\n* [Tutorial](https://github.com/yandexdataschool/REP_tutorial) based on the [Flavour of physics challenge](https://www.kaggle.com/c/flavours-of-physics)\n* If you use REP in research, please consider [citing](http://arxiv.org/abs/1510.00624)\n\n### License\nApache 2.0; the library is open source.\n\n### Minimal examples\n\n__REP__ wrappers are sklearn-compatible:\n\n```python\nfrom rep.estimators import XGBoostClassifier, SklearnClassifier, TheanetsClassifier\nclf = XGBoostClassifier(n_estimators=300, eta=0.1).fit(trainX, trainY)\nprobabilities = clf.predict_proba(testX)\n```\n\nA beloved trick of Kagglers is to run bagging over complex algorithms. 
This is how it is done in __REP__:\n\n```python\nfrom sklearn.ensemble import BaggingClassifier\nclf = BaggingClassifier(base_estimator=XGBoostClassifier(), n_estimators=10)\n# wrap the sklearn estimator in a REP wrapper\nclf = SklearnClassifier(clf)\n```\n\nAnother useful trick is to use folding instead of splitting data into train/test. \nThis is especially useful when you're using some kind of complex stacking:\n\n```python\nfrom rep.metaml import FoldingClassifier\nclf = FoldingClassifier(TheanetsClassifier(), n_folds=3)\nprobabilities = clf.fit(X, y).predict_proba(X)\n```\nIn the example above, all data are split into 3 folds, \nand each fold is predicted by a classifier trained on the other 2 folds.  \n\n__REP__ classifiers also provide reports:\n\n```python\nreport = clf.test_on(testX, testY)\nreport.roc().plot() # plot ROC curve\nfrom rep.report.metrics import RocAuc\n# learning curves are useful when training GBDT!\nreport.learning_curve(RocAuc(), steps=10)  \n```\n\nYou can read about other __REP__ tools (like smart distributed grid search, folding and factory) \nin the documentation and howto examples.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyandex%2Frep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyandex%2Frep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyandex%2Frep/lists"}