Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/elisim/hydra-sklearn-pipelines
Code accompanying the blogpost: "Creating Configurable Data Pre-Processing Pipelines by Combining Hydra and Sklearn" by Eli Simhayev & Benjamin Bodner
data-science hydra machine-learning scikit-learn
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/elisim/hydra-sklearn-pipelines
- Owner: elisim
- Created: 2021-07-22T08:55:32.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-06-26T12:43:12.000Z (7 months ago)
- Last Synced: 2024-06-26T15:44:00.338Z (7 months ago)
- Topics: data-science, hydra, machine-learning, scikit-learn
- Language: Jupyter Notebook
- Homepage: https://medium.com/beyondminds/creating-configurable-data-pre-processing-pipelines-by-combining-hydra-and-sklearn-812065c9ab64
- Size: 25.4 KB
- Stars: 25
- Watchers: 2
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Hydra-Sklearn Preprocessing Pipelines
![Sklearn-Hydra](https://user-images.githubusercontent.com/17675462/131835987-63b1d347-5a05-49c8-af36-d1a393d87c22.png)
This repository accompanies the blog post:
[Creating Configurable Data Pre-Processing Pipelines by Combining Hydra and Sklearn](https://medium.com/beyondminds/creating-configurable-data-pre-processing-pipelines-by-combining-hydra-and-sklearn-812065c9ab64) - by Eli Simhayev & Benjamin Bodner
## Update 4.1.23
When I wrote this blog post, the stable version of Hydra was 1.1.
The stable version is now 1.3, so note that this code works with Hydra 1.1 :)
# Running Different Pipelines
Run:
```commandline
python main.py preprocessing_pipeline=decision_tree
```
to execute the `decision_tree` preprocessing pipeline. To run another pipeline (from `configs/preprocessing_pipeline`),
change the value after `preprocessing_pipeline=`:
```commandline
python main.py preprocessing_pipeline=
```
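The override above selects an option from a Hydra config group: `group=option` on the command line points Hydra at the file `<group>/<option>.yaml` under the config root. The sketch below only illustrates that naming rule. It is not Hydra's implementation, the helper `resolve_group_override` is hypothetical, and the config root `configs` is an assumption inferred from the `configs/preprocessing_pipeline` path.

```python
from pathlib import PurePosixPath

# Assumed config root, inferred from `configs/preprocessing_pipeline`.
CONFIG_ROOT = PurePosixPath("configs")


def resolve_group_override(
    override: str, config_root: PurePosixPath = CONFIG_ROOT
) -> PurePosixPath:
    """Map a `group=option` override to the YAML file it would select.

    Hypothetical helper for illustration only; Hydra does this internally.
    """
    group, sep, option = override.partition("=")
    if not sep or not group or not option:
        raise ValueError(f"expected 'group=option', got {override!r}")
    return config_root / group / f"{option}.yaml"


print(resolve_group_override("preprocessing_pipeline=decision_tree"))
# prints: configs/preprocessing_pipeline/decision_tree.yaml
```

An override with no option name after the `=` is rejected by this sketch with a `ValueError`.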
Hydra also supports [Tab completion](https://hydra.cc/docs/tutorials/basic/running_your_app/tab_completion/) for completing config values on the command line.
# Adding New Pipelines
To add a new pipeline, create a YAML configuration file in `configs/preprocessing_pipeline`.
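Because Hydra names each config group option after its file, dropping a new YAML file into the directory is enough to make it selectable with `preprocessing_pipeline=<file name without extension>`. The sketch below mimics that discovery step in a temporary directory. The helper `available_pipelines` and the file name `my_new_pipeline.yaml` are hypothetical, and the file contents are placeholders rather than real pipeline definitions.

```python
import tempfile
from pathlib import Path


def available_pipelines(group_dir: Path) -> list[str]:
    """Return one option name per YAML file, taken from the file stem.

    Hypothetical helper for illustration only; not part of Hydra or this repo.
    """
    return sorted(path.stem for path in group_dir.glob("*.yaml"))


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the directory layout named in the README inside a temp dir.
    group_dir = Path(tmp) / "configs" / "preprocessing_pipeline"
    group_dir.mkdir(parents=True)

    # Placeholder for a pipeline that already exists.
    (group_dir / "decision_tree.yaml").write_text("# existing pipeline config\n")
    before = available_pipelines(group_dir)

    # Adding a file adds an option with the same name.
    (group_dir / "my_new_pipeline.yaml").write_text("# hypothetical new pipeline\n")
    after = available_pipelines(group_dir)

print(before)  # ['decision_tree']
print(after)   # ['decision_tree', 'my_new_pipeline']
```

Under these assumptions, the new pipeline would be run with `python main.py preprocessing_pipeline=my_new_pipeline`.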
You can also add other configurations, such as which model to use or which visualizations to produce. Learn more here: [Hydra — A fresh look at configuration for machine learning projects](https://medium.com/pytorch/hydra-a-fresh-look-at-configuration-for-machine-learning-projects-50583186b710)
#### We hope this will help you to better organize your data preprocessing pipelines 🙂