{"id":13738330,"url":"https://github.com/alegonz/baikal","last_synced_at":"2025-06-22T10:01:42.141Z","repository":{"id":45439850,"uuid":"166814594","full_name":"alegonz/baikal","owner":"alegonz","description":"A graph-based functional API for building complex scikit-learn pipelines.","archived":false,"fork":false,"pushed_at":"2022-12-08T03:38:11.000Z","size":666,"stargazers_count":590,"open_issues_count":6,"forks_count":30,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-05-27T09:50:42.359Z","etag":null,"topics":["data-science","graph-based","machine-learning","python","scikit-learn"],"latest_commit_sha":null,"homepage":"https://baikal.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alegonz.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-01-21T12:59:02.000Z","updated_at":"2025-04-06T21:32:02.000Z","dependencies_parsed_at":"2023-01-24T08:45:12.390Z","dependency_job_id":null,"html_url":"https://github.com/alegonz/baikal","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/alegonz/baikal","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alegonz%2Fbaikal","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alegonz%2Fbaikal/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alegonz%2Fbaikal/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alegonz%2Fbaikal/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alegonz","download_url":"ht
tps://codeload.github.com/alegonz/baikal/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alegonz%2Fbaikal/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261273836,"owners_count":23133828,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","graph-based","machine-learning","python","scikit-learn"],"created_at":"2024-08-03T03:02:18.845Z","updated_at":"2025-06-22T10:01:37.107Z","avatar_url":"https://github.com/alegonz.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"![baikal](illustrations/baikal1_blue.png)\n\n# A graph-based functional API for building complex scikit-learn pipelines\n\n[![docs](https://img.shields.io/badge/docs-read%20now-blue.svg)](https://baikal.readthedocs.io)\n[![build status](https://circleci.com/gh/alegonz/baikal/tree/master.svg?style=svg\u0026circle-token=fb67eeed2067c361989d2091b9d4d03e6899010b)](https://circleci.com/gh/alegonz/baikal/tree/master)\n[![coverage](https://codecov.io/gh/alegonz/baikal/branch/master/graph/badge.svg?token=SSoeQETNh6)](https://codecov.io/gh/alegonz/baikal)\n[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/alegonz/baikal.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/alegonz/baikal/context:python)\n[![code style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![latest release](https://img.shields.io/pypi/v/baikal.svg)](https://pypi.org/project/baikal)\n[![Anaconda-Server 
Badge](https://anaconda.org/conda-forge/baikal/badges/version.svg)](https://anaconda.org/conda-forge/baikal)\n[![license](https://img.shields.io/pypi/l/baikal.svg)](https://github.com/alegonz/baikal/blob/master/LICENSE)\n\n**baikal** is written in pure Python. It supports Python 3.5 and above.\n\nNote: **baikal** is still a young project and there might be backward-incompatible changes. \nThe next development steps and backward-incompatible changes are announced and discussed \nin [this issue](https://github.com/alegonz/baikal/issues/16). Please subscribe to it if \nyou use **baikal**.\n\n### What is baikal?\n\n**baikal is a graph-based, functional API for building complex machine learning pipelines \nof objects that implement the** [scikit-learn API](https://scikit-learn.org/stable/developers/contributing.html#different-objects). \nIt is mostly inspired by the excellent [Keras](https://keras.io) API for Deep Learning, \nand borrows a few concepts from the [TensorFlow](https://www.tensorflow.org) framework \nand the (perhaps lesser-known) [graphkit](https://github.com/yahoo/graphkit) package.\n\n**baikal** aims to provide an API that lets you build complex, non-linear machine learning \npipelines that look like this: \n\n![multiple_input_nonlinear_pipeline_example_diagram](illustrations/multiple_input_nonlinear_pipeline_example_diagram.png \"An example of a multiple-input, nonlinear pipeline\")\n\n\nwith code that looks like this:\n\n```python\nx1 = Input()\nx2 = Input()\ny_t = Input()\n\ny1 = ExtraTreesClassifier()(x1, y_t)\ny2 = RandomForestClassifier()(x2, y_t)\nz = PowerTransformer()(x2)\nz = PCA()(z)\ny3 = LogisticRegression()(z, y_t)\n\nensemble_features = Stack()([y1, y2, y3])\ny = SVC()(ensemble_features, y_t)\n\nmodel = Model([x1, x2], y, y_t)\n```\n\n### What can I do with it?\n\nWith **baikal** you can\n\n- build non-linear pipelines effortlessly\n- handle multiple inputs and outputs\n- add steps that operate on targets as part of the pipeline\n- nest 
pipelines\n- use prediction probabilities (or any other kind of output) as inputs to other steps in the pipeline\n- query intermediate outputs, easing debugging\n- freeze steps that do not require fitting\n- define and add custom steps easily\n- plot pipelines\n\nAll with boilerplate-free, readable code.\n\n### Why baikal?\n\nThe pipeline above (to the best of the author's knowledge) cannot be easily built using \n[scikit-learn's composite estimators API](https://scikit-learn.org/stable/modules/compose.html#pipelines-and-composite-estimators) \nbecause you run into some limitations:\n\n1. It is aimed at linear pipelines.\n    - You could add some step parallelism with the [ColumnTransformer](https://scikit-learn.org/stable/modules/compose.html#columntransformer-for-heterogeneous-data) \n      API, but this is limited to transformer objects.\n2. Classifiers/Regressors can only be used at the end of the pipeline.\n    - This means we cannot use the predicted labels (or their probabilities) as features \n      to other classifiers/regressors.\n    - You could leverage mlxtend's [StackingClassifier](http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/#stackingclassifier) \n      and come up with some clever combination of the above composite estimators \n      (`Pipeline`s, `ColumnTransformer`s, and `StackingClassifier`s, etc.), but you might \n      end up with code that is verbose and hard to follow.\n3. It cannot handle multiple-input/multiple-output models.\n\nPerhaps you could instead define a big, composite estimator class that integrates each of \nthe pipeline steps through composition. 
This, however, will most likely require \n* writing big `__init__` methods to control each of the internal steps' knobs;\n* being careful with `get_params` and `set_params` if you want to use, say, `GridSearchCV`;\n* and adding some boilerplate code if you want to access the outputs of intermediate \n  steps for debugging.\n\nBy using **baikal** as shown in the example above, code can be more readable, less verbose \nand closer to our mental representation of the pipeline. **baikal** also provides an API \nto fit, predict with, and query the entire pipeline with single commands. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falegonz%2Fbaikal","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falegonz%2Fbaikal","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falegonz%2Fbaikal/lists"}