Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/farrajota/automl_toolbox
Toolbox for building automatic Data Science solutions
automatic automl data-science machine-learning python3 toolbox
- Host: GitHub
- URL: https://github.com/farrajota/automl_toolbox
- Owner: farrajota
- License: mit
- Created: 2018-09-26T11:32:13.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-03-20T00:08:40.000Z (over 3 years ago)
- Last Synced: 2023-10-20T12:34:42.244Z (about 1 year ago)
- Topics: automatic, automl, data-science, machine-learning, python3, toolbox
- Language: Python
- Size: 72.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# AutoML toolbox - PROJECT DEPRECATED
[![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/)
The **Auto**matic **M**achine **L**earning (AutoML) toolbox is a collection of methods with a simple API that can assist you in your Data Science tasks by providing boilerplate code in the form of simple, ready-to-use functions. You can use it as your personal Data Science assistant / wizard for creating ETL processes and/or fully automated machine learning pipelines quickly and simply.
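As a rough illustration of the kind of "simple function" boilerplate described here, the sketch below shows two hypothetical helpers, `profile_columns` (a basic data profiler) and `fill_missing` (a missing-value imputer). They are **not** part of this toolbox's API. To stay self-contained they assume a plain list-of-dicts table and use only the Python standard library, whereas the toolbox itself builds on pandas.

```python
from statistics import mean, mode
from typing import Any, Dict, List

# Hypothetical table format (an assumption): a list of rows, each a dict
# mapping column name -> value, with None marking a missing value.
Row = Dict[str, Any]


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def profile_columns(rows: List[Row]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical basic profiler: missing count, distinct count and mean per column."""
    columns = sorted({key for row in rows for key in row})
    report: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        all_numeric = bool(present) and all(_is_number(v) for v in present)
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            # The mean is only reported for columns whose present values are all numeric.
            "mean": mean(present) if all_numeric else None,
        }
    return report


def fill_missing(rows: List[Row], strategy: str = "mean") -> List[Row]:
    """Hypothetical imputer: return a new list of rows with None values filled.

    strategy="mean": numeric columns get their mean, other columns their mode.
    strategy="mode": every column gets its most frequent value.
    """
    if strategy not in ("mean", "mode"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    report = profile_columns(rows)
    fills: Dict[str, Any] = {}
    for col, stats in report.items():
        present = [row.get(col) for row in rows if row.get(col) is not None]
        if not present:
            continue  # nothing to impute from; the column stays missing
        if strategy == "mean" and stats["mean"] is not None:
            fills[col] = stats["mean"]
        else:
            fills[col] = mode(present)
    # Build new dicts so the caller's rows are left untouched.
    return [
        {col: fills.get(col) if row.get(col) is None else row[col] for col in report}
        for row in rows
    ]
```

Calling `fill_missing(rows)` on a table whose numeric column has a gap fills it with that column's mean, while text columns fall back to their most frequent value. A fuller implementation would more likely delegate this to pandas or to scikit-learn's imputers than hand-roll the loops.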
This library is intended as a testing playground for wrapper methods that serve as high-level APIs for common tasks such as:
- data profiling
- cleaning missing values
- detecting outliers
- performing feature engineering
- hyper-parameter optimization
- evaluating machine learning models
- creating ensembles of such models
- etc.

## Warning
This code base is under heavy development. Once it reaches `v0.1.0` you may try it; until then, use it at your own risk.
## Installation
For now, this package must be built from source. To do so, run the following command in the terminal:
```bash
python setup.py install
```

> Note: once this package reaches `v0.1.0`, it will be possible to install it via pip.
## Key Libraries used
This toolbox integrates the following packages at its core to do most of its work. Basically, you can think of this package as a wrapper for functions you would commonly need, such as cross-validation and hyperparameter optimization, but with a nice, high-level API.
- Numpy
- Pandas
- pandas-profiling (data profiler)
- Scikit-learn (collection of ML libs)
- xgboost (ML lib)
- lightgbm (ML lib)
- Hyperopt (hyperparameter optimization: Bayesian optimization)
- HpBandSter (hyperparameter optimization: Hyperband + Bayesian optimization)

### Libraries to be integrated in the future
- dask (distributed computing / big data)
- keras (DL lib)
- featuretools (automatic feature engineering)
- [pygdf](https://github.com/rapidsai/pygdf) (GPU DataFrame)

## TODO
Functionalities intended to be added to the toolbox:
- [x] basic data profiler
- [ ] automatic analysis / benchmarking and filling of missing values
- [ ] automatic analysis / benchmarking and cleaning of outliers
- [ ] automatic feature transformations / normalization
- [ ] automatic feature engineering
- [ ] automatic feature selection
- [ ] automatic model selection
- [ ] automatic model optimization (hyper-parameter optimization)
- [ ] automatic model ensembling
- [ ] pre-defined parameter list of the most popular ML models in scikit-learn
- [ ] distributed computing (integrate Dask)
- [ ] pipeline generation

## License
[MIT](LICENSE)