https://github.com/akoury/ml-helper

Python library with helpers to speed up and structure machine learning projects.

data data-visualization machine-learning ml python scikit-learn sklearn

Host: GitHub
URL: https://github.com/akoury/ml-helper
Owner: akoury
License: mit
Created: 2019-02-20T16:48:09.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-06-05T12:30:18.000Z (about 6 years ago)
Last Synced: 2025-01-31T02:22:44.901Z (6 months ago)
Topics: data, data-visualization, machine-learning, ml, python, scikit-learn, sklearn
Language: Python
Homepage: https://pypi.org/project/ml-helper/
Size: 11.7 MB
Stars: 10
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: license.txt

# ML Helper
---
Helpers to speed up and structure machine learning projects.

The library is available on [PyPI](https://pypi.org/project/ml-helper/).

### Installing
---

The easiest way to install ml-helper is through ```pip```

```bash
pip install ml-helper
```

To use it in your project, you must first import the library

```python
from ml_helper.helper import Helper
```

Then create a Helper object with a dictionary of keys that describe your project:

```python
KEYS = {
    'SEED': 1,
    'TARGET': 'y',
    'METRIC': 'r2',
    'TIMESERIES': True,
    'SPLITS': 5
}

hp = Helper(KEYS)
```

After this, you can use the helper object's functions.
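
This README does not list the method names on the helper object, so the sketch below does not call the library. It only illustrates, in plain pandas, how a keys dictionary like the one above can drive a project step. The function `split_x_y` and the sample data are hypothetical and are not part of ML Helper.

```python
import pandas as pd

# Same shape as the keys dictionary shown above.
KEYS = {"SEED": 1, "TARGET": "y", "METRIC": "r2", "TIMESERIES": True, "SPLITS": 5}


def split_x_y(df, keys):
    """Hypothetical helper: separate the feature columns from the target column."""
    target = keys["TARGET"]
    X = df.drop(columns=[target])  # every column except the target
    y = df[target]  # the target column as a Series
    return X, y


# Made-up data, used only to show the call.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 0.1, 0.9], "y": [10, 20, 30]})
X, y = split_x_y(df, KEYS)
print(list(X.columns))  # ['a', 'b']
print(y.tolist())  # [10, 20, 30]
```

Keeping the target name in one dictionary means a change of target column is made in a single place.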

#### Dependencies

ML-Helper requires:
* Python (>3.5)
* Numpy (>=1.16)
* Pandas (>=0.23.4)
* Seaborn (>=0.9)
* Scikit-learn (>=0.20)
* Matplotlib (>=3)
* Scipy (>=1)
* Imblearn
* Vecstack

### Functionality
---

The functionality is separated into 4 groups:
* Data Exploration
    * Missing Data
    * Boxplot of numerical variables
    * Coefficient of variation
    * Correlation (numerical and categorical)
    * Under Represented Features
    * Target Variable Distribution
    * Feature Importance
    * PCA Component Variance
* Data Preparation
    * Convert features to categories
    * Drop multiple columns
* Modeling
    * Cross Validation (with stratified k-folds or a time series split, depending on the use case)
    * Randomized Grid Search
    * Pipeline: Collection of models and pipeline steps that get performed and scored
    * Predict: Predict on unseen data
    * Stack Predict: Build a stacked model and perform a prediction
    * Regression
        * Plots for predictions
    * Classification
        * ROC Curve
        * Classification Report
* Others
    * Select features based on types
    * Split X and y
    * Plot models/pipelines
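
The cross-validation item in the list picks between stratified k-folds and a time series split. As a rough sketch of that idea, the example below chooses a scikit-learn splitter from the `TIMESERIES`, `SPLITS` and `SEED` keys and scores a model with the `METRIC` key. It uses scikit-learn directly. The function `make_splitter` and the synthetic data are assumptions for illustration, not ML Helper's implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import StratifiedKFold, TimeSeriesSplit, cross_val_score

KEYS = {"SEED": 1, "TARGET": "y", "METRIC": "r2", "TIMESERIES": True, "SPLITS": 5}


def make_splitter(keys):
    """Hypothetical helper: choose a cross-validation splitter from the keys."""
    if keys["TIMESERIES"]:
        # Each fold trains on earlier rows and tests on later rows, never shuffled.
        return TimeSeriesSplit(n_splits=keys["SPLITS"])
    # Stratification keeps class proportions per fold; it needs a classification target.
    return StratifiedKFold(
        n_splits=keys["SPLITS"], shuffle=True, random_state=keys["SEED"]
    )


# Synthetic, time-ordered regression data (made up for this sketch).
rng = np.random.default_rng(KEYS["SEED"])
X = np.arange(60, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=60)

cv = make_splitter(KEYS)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring=KEYS["METRIC"])
print(len(scores))  # one score per split
```

With `'TIMESERIES': True` the rows must already be in chronological order, because `TimeSeriesSplit` relies on row position rather than on a date column.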

### Working Examples
---
If you wish to see the library in use, you may view the notebooks in the [examples](examples) section.

Also, you can see the implementation in their corresponding Kaggle Kernels:

* [Bike Sharing in Washington D.C.: Time Series Regression](https://www.kaggle.com/akoury/bike-sharing-in-washington-d-c-using-ml-helper)

* [Employee Attrition: Classification](https://www.kaggle.com/akoury/employee-attrition-basis-to-create-ml-helper-lib)

### ML-Helper Coding Style
---
ML-Helper complies with PEP 8 and uses ```black``` for code formatting.

### Versioning
---
[SemVer](http://semver.org/) is used for versioning.

### License
---
This project is licensed under the MIT License. See the [License](license.txt) file for details.
