{"id":21380998,"url":"https://github.com/elisim/hydra-sklearn-pipelines","last_synced_at":"2025-08-09T21:15:56.998Z","repository":{"id":47138580,"uuid":"388393811","full_name":"elisim/hydra-sklearn-pipelines","owner":"elisim","description":"Code accompanying the blogpost: \"Creating Configurable Data Pre-Processing Pipelines by Combining Hydra and Sklearn\" by Eli Simhayev \u0026 Benjamin Bodner","archived":false,"fork":false,"pushed_at":"2024-06-26T12:43:12.000Z","size":26,"stargazers_count":25,"open_issues_count":2,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-06-26T15:44:00.338Z","etag":null,"topics":["data-science","hydra","machine-learning","scikit-learn"],"latest_commit_sha":null,"homepage":"https://medium.com/beyondminds/creating-configurable-data-pre-processing-pipelines-by-combining-hydra-and-sklearn-812065c9ab64","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elisim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-22T08:55:32.000Z","updated_at":"2024-06-26T12:43:17.000Z","dependencies_parsed_at":"2023-02-02T02:46:27.203Z","dependency_job_id":null,"html_url":"https://github.com/elisim/hydra-sklearn-pipelines","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elisim%2Fhydra-sklearn-pipelines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elisim%2Fhydra-sklearn-pipelines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elisim%2Fhydra-sklearn-pipelines/releases","man
ifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elisim%2Fhydra-sklearn-pipelines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elisim","download_url":"https://codeload.github.com/elisim/hydra-sklearn-pipelines/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225885790,"owners_count":17539640,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","hydra","machine-learning","scikit-learn"],"created_at":"2024-11-22T10:43:53.528Z","updated_at":"2024-11-22T10:43:53.966Z","avatar_url":"https://github.com/elisim.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hydra-Sklearn Preprocessing Pipelines\n\n![Sklearn-Hydra](https://user-images.githubusercontent.com/17675462/131835987-63b1d347-5a05-49c8-af36-d1a393d87c22.png)\n\n\nThis repository accompanies the blog post:\n\n[Creating Configurable Data Pre-Processing Pipelines by Combining Hydra and Sklearn](https://medium.com/beyondminds/creating-configurable-data-pre-processing-pipelines-by-combining-hydra-and-sklearn-812065c9ab64) - by Eli Simhayev \u0026 Benjamin Bodner\n\n## Update 4.1.23\nWhen I wrote this blog post, the stable version of Hydra was 1.1.\nThe stable version is now 1.3, so note that this code works with Hydra 1.1 :) \n\n# Running Different Pipelines\nRun:\n\n```commandline\npython main.py preprocessing_pipeline=decision_tree\n```\n\nto execute the `decision_tree` preprocessing pipeline. 
You can also run other pipelines (from `configs/preprocessing_pipeline`)\nby changing the argument:\n\n```commandline\npython main.py preprocessing_pipeline=\u003cyour-pipeline\u003e\n```\nHydra also supports [tab completion](https://hydra.cc/docs/tutorials/basic/running_your_app/tab_completion/) for config options.\n\n\n# Adding New Pipelines\nTo add a new pipeline, create a YAML configuration file in `configs/preprocessing_pipeline`.\nYou can add other configurations as well, such as which model to use or which visualizations to produce. Learn more here: [Hydra — A fresh look at configuration for machine learning projects](https://medium.com/pytorch/hydra-a-fresh-look-at-configuration-for-machine-learning-projects-50583186b710)\n\n\n#### We hope this helps you better organize your data preprocessing pipelines 🙂\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felisim%2Fhydra-sklearn-pipelines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felisim%2Fhydra-sklearn-pipelines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felisim%2Fhydra-sklearn-pipelines/lists"}