Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/akoury/ml-helper
Python library with helpers to speed up and structure machine learning projects.
data data-visualization machine-learning ml python scikit-learn sklearn
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/akoury/ml-helper
- Owner: akoury
- License: mit
- Created: 2019-02-20T16:48:09.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-06-05T12:30:18.000Z (over 5 years ago)
- Last Synced: 2024-09-27T20:23:08.928Z (4 months ago)
- Topics: data, data-visualization, machine-learning, ml, python, scikit-learn, sklearn
- Language: Python
- Homepage: https://pypi.org/project/ml-helper/
- Size: 11.7 MB
- Stars: 10
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: license.txt
README
# ML Helper
---
Helpers to speed up and structure machine learning projects. The library is available on [PyPI](https://pypi.org/project/ml-helper/).
### Installing
---
The easiest way to install ml-helper is through ```pip```:
```shell
pip install ml-helper
```
To use it in your project, you must first import the library:
```python
from ml_helper.helper import Helper
```
Then create a Helper object with a dictionary of keys related to your project:
```python
KEYS = {
    'SEED': 1,
    'TARGET': 'y',
    'METRIC': 'r2',
    'TIMESERIES': True,
    'SPLITS': 5
}

hp = Helper(KEYS)
```
After this, you may use the helper object's many functions.
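This README does not list the helper's method names, so the following is only a rough sketch in plain scikit-learn of what keys such as `TIMESERIES`, `SPLITS`, `SEED` and `METRIC` plausibly control. The function `make_splitter` and the synthetic data are hypothetical and are not part of ml-helper; how the library actually reads these keys is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

# Same keys as in the example above; how ml-helper interprets them is assumed.
KEYS = {"SEED": 1, "TARGET": "y", "METRIC": "r2", "TIMESERIES": True, "SPLITS": 5}


def make_splitter(keys):
    """Hypothetical helper: choose a cross-validation splitter from the keys."""
    if keys["TIMESERIES"]:
        # Chronological folds: training rows always come before test rows.
        return TimeSeriesSplit(n_splits=keys["SPLITS"])
    return KFold(n_splits=keys["SPLITS"], shuffle=True, random_state=keys["SEED"])


# Synthetic data: a noisy linear trend stands in for a real time series.
rng = np.random.default_rng(KEYS["SEED"])
X = np.arange(200, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + rng.normal(scale=5.0, size=200)

cv = make_splitter(KEYS)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring=KEYS["METRIC"])
print(scores.mean())
```

Keeping the project-wide settings in one dictionary means the splitter, the seed and the metric are chosen once and reused for every model that gets scored.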
#### Dependencies
ML-Helper requires:
* Python (>3.5)
* Numpy (>=1.16)
* Pandas (>=0.23.4)
* Seaborn (>=0.9)
* Scikit-learn (>=0.20)
* Matplotlib (>=3)
* Scipy (>=1)
* Imblearn
* Vecstack
### Functionality
---
The functionality is separated into 4 groups:
* Data Exploration
    * Missing Data
    * Boxplot of numerical variables
    * Coefficient of variation
    * Correlation (numerical and categorical)
    * Under Represented Features
    * Target Variable Distribution
    * Feature Importance
    * PCA Component Variance
* Data Preparation
    * Convert features to categories
    * Drop multiple columns
* Modeling
    * Cross Validation (with stratified kfolds, or time series split depending on use case)
    * Randomized Grid Search
    * Pipeline: Collection of models and pipeline steps that get performed and scored
    * Predict: Predict on unseen data
    * Stack Predict: Build a stacked model and perform a prediction
    * Regression
        * Plots for predictions
    * Classification
        * ROC Curve
        * Classification Report
* Others
    * Select features based on types
    * Split X and y
    * Plot models/pipelines
### Working Examples
---
If you wish to see the library in use, you may view the notebooks in the [examples](examples) section. The same examples are also available as Kaggle Kernels:
* [Bike Sharing in Washington D.C.: Time Series Regression](https://www.kaggle.com/akoury/bike-sharing-in-washington-d-c-using-ml-helper)
* [Employee Attrition: Classification](https://www.kaggle.com/akoury/employee-attrition-basis-to-create-ml-helper-lib)
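For readers without the notebooks at hand, two of the Data Exploration items (missing data and coefficient of variation) can be approximated with plain pandas. This is a minimal sketch under assumptions: the toy DataFrame and its column names are invented for illustration, and the code does not call ml-helper or reproduce its output format.

```python
import numpy as np
import pandas as pd

# Toy frame; the column names are made up for illustration.
df = pd.DataFrame(
    {
        "temp": [20.0, 22.0, np.nan, 24.0],
        "rentals": [100.0, 150.0, 200.0, 150.0],
        "season": ["a", "b", "b", None],
    }
)

# Missing data: share of missing values per column, in percent.
missing_pct = df.isna().mean() * 100

# Coefficient of variation: standard deviation divided by mean,
# computed for numeric columns only.
numeric = df.select_dtypes(include="number")
coef_var = numeric.std() / numeric.mean()

print(missing_pct)
print(coef_var)
```

A coefficient of variation close to zero flags a nearly constant feature, which is often a candidate for removal before modeling.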
### ML-Helper Coding Style
---
ML-Helper complies with PEP 8 and uses ```black``` to enforce coding standards.
### Versioning
---
[SemVer](http://semver.org/) is used for versioning.
### License
---
This project is licensed under the MIT License; see the [License](license.txt) file for details.