![](https://raw.githubusercontent.com/ZhiningLiu1998/figures/master/imbalanced-ensemble/imbens-logo.png)


# IMBENS: Class-imbalanced Ensemble Learning in Python


โณQuick Start with our 5-minute Guide & Detailed Examples

***IMBENS* (imported as `imbens`) is a Python library for quick implementation, modification, evaluation, and visualization of ensemble [learning from class-imbalanced data](https://github.com/ZhiningLiu1998/awesome-imbalanced-learning)**.
Currently, IMBENS includes **[over 15 ensemble imbalanced learning algorithms](#list-of-implemented-methods) (SMOTEBoost, SMOTEBagging, RUSBoost, EasyEnsemble, SelfPacedEnsemble, etc.)** and **[19 over-/under-sampling methods](https://imbalanced-ensemble.readthedocs.io/en/latest/api/sampler/api.html) (SMOTE, ADASYN, TomekLinks, etc.)** from [imbalanced-learn](https://imbalanced-learn.org/stable/references/index.html#api).
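
The samplers can also be used standalone through a `fit_resample` interface. A minimal sketch, assuming the `imbens.sampler.RandomUnderSampler` import path from the sampler API docs linked above (which mirror imbalanced-learn's API):

```python
# Minimal sketch of standalone resampling (the RandomUnderSampler import path
# is assumed from the sampler API docs; the interface mirrors imbalanced-learn).
from collections import Counter
from imbens.sampler import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))                       # skewed, roughly {0: 900, 1: 100}

X_res, y_res = RandomUnderSampler(random_state=0).fit_resample(X, y)
print(Counter(y_res))                   # majority class down-sampled to match
```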

### 🌈 IMBENS Highlights

- 🧑‍💻 **Ease-of-use:** Unified, easy-to-use APIs with [documentation](https://imbalanced-ensemble.readthedocs.io/) and [examples](https://imbalanced-ensemble.readthedocs.io/en/latest/auto_examples/index.html#).
- 🚀 **Performance:** Optimized performance with parallelization using [joblib](https://github.com/joblib/joblib).
- 📊 **Benchmarking:** Running & comparing multiple models with our [visualizer](#visualize-ensemble-classifiers).
- 📺 **Monitoring:** Powerful, customizable, interactive training [logging](#customizing-training-log).
- 🪐 **Versatility:** Full compatibility with [scikit-learn](https://scikit-learn.org/stable/) and [imbalanced-learn](https://imbalanced-learn.org/stable/) (see the sketch below).
- 📈 **Functionality:** Extending existing techniques from the binary to the ***multi-class*** setting.
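
As a sketch of the scikit-learn compatibility claim above (our illustrative example, with an arbitrary synthetic dataset), an IMBENS classifier drops directly into standard utilities such as `cross_val_score`:

```python
# Minimal sketch: IMBENS estimators follow the scikit-learn estimator API,
# so standard utilities like cross-validation work out of the box.
from imbens.ensemble import SelfPacedEnsembleClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
scores = cross_val_score(
    SelfPacedEnsembleClassifier(random_state=0),
    X, y, cv=5, scoring='balanced_accuracy',
)
print(f'balanced accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')
```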

### ✂️ **Use IMBENS for class-imbalanced classification with < 5 lines of code:**

```python
# Train an SPE classifier (X_train, y_train: your class-imbalanced data)
from imbens.ensemble import SelfPacedEnsembleClassifier
clf = SelfPacedEnsembleClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict with the fitted SPE classifier
y_pred = clf.predict(X_test)
```

### 🤗 Citing IMBENS

๐Ÿป We appreciate your citation if you find our work helpful! The BibTeX entry:

```bib
@article{liu2023imbens,
  title={IMBENS: Ensemble Class-imbalanced Learning in Python},
  author={Liu, Zhining and Kang, Jian and Tong, Hanghang and Chang, Yi},
  journal={arXiv preprint arXiv:2111.12776},
  year={2023}
}
```

### 👯‍♂️ Contribute to IMBENS

Join us and become a contributor!
Please refer to the [contributing guidelines](https://github.com/ZhiningLiu1998/imbalanced-ensemble/blob/main/CONTRIBUTING.md).

## 📚 Table of Contents

- [Installation](#installation)
- [List of implemented methods](#list-of-implemented-methods)
- [5-min Quick Start with IMBENS](#5-min-quick-start-with-imbens)
  - [A minimal working example](#a-minimal-working-example)
  - [Visualize ensemble classifiers](#visualize-ensemble-classifiers)
  - [Customizing training log](#customizing-training-log)
- [About imbalanced learning](#about-imbalanced-learning)
- [Acknowledgements](#acknowledgements)
- [References](#references)
- [Related Projects](#related-projects)
- [Contributors ✨](#contributors-)

## Installation

It is recommended to use **pip** for installation.
Please make sure the **latest version** is installed to avoid potential problems:
```shell
$ pip install imbalanced-ensemble # normal install
$ pip install --upgrade imbalanced-ensemble # update if needed
```

Alternatively, you can install imbalanced-ensemble by cloning this repository:
```shell
$ git clone https://github.com/ZhiningLiu1998/imbalanced-ensemble.git
$ cd imbalanced-ensemble
$ pip install .
```
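
Either way, a quick import check verifies the installation (assuming the package exposes `__version__`, as most PyPI packages do):

```python
# Sanity check after installation (assumes imbens exposes __version__).
import imbens
print(imbens.__version__)
```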

imbalanced-ensemble requires the following dependencies:

- [Python](https://www.python.org/) (>=3.6)
- [numpy](https://numpy.org/) (>=1.16.0)
- [pandas](https://pandas.pydata.org/) (>=1.1.3)
- [scipy](https://www.scipy.org/) (>=1.9.1)
- [joblib](https://pypi.org/project/joblib/) (>=0.11)
- [scikit-learn](https://scikit-learn.org/stable/) (>=1.2.0)
- [matplotlib](https://matplotlib.org/) (>=3.3.2)
- [seaborn](https://seaborn.pydata.org/) (>=0.11.0)
- [tqdm](https://tqdm.github.io/) (>=4.50.2)

## List of implemented methods

**Currently (v0.1.3, 2021/06), *16* ensemble imbalanced learning methods have been implemented
(click an item to jump to its documentation page):**

- **Resampling-based**
  - *Under-sampling + Ensemble*
    1. **[`SelfPacedEnsembleClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.SelfPacedEnsembleClassifier.html) [1] ([in GitHub](https://github.com/ZhiningLiu1998/self-paced-ensemble))**
    2. **[`BalanceCascadeClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.BalanceCascadeClassifier.html) [2]**
    3. **[`BalancedRandomForestClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.BalancedRandomForestClassifier.html) [3] ([imblearn version](https://imbalanced-learn.org/stable/references/generated/imblearn.ensemble.BalancedRandomForestClassifier.html))**
    4. **[`EasyEnsembleClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.EasyEnsembleClassifier.html) [2] ([imblearn version](https://imbalanced-learn.org/stable/references/generated/imblearn.ensemble.EasyEnsembleClassifier.html))**
    5. **[`RUSBoostClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.RUSBoostClassifier.html) [4] ([imblearn version](https://imbalanced-learn.org/stable/references/generated/imblearn.ensemble.RUSBoostClassifier.html))**
    6. **[`UnderBaggingClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.UnderBaggingClassifier.html) [5] ([imblearn version](https://imbalanced-learn.org/stable/references/generated/imblearn.ensemble.BalancedBaggingClassifier.html))**
  - *Over-sampling + Ensemble*
    1. **[`OverBoostClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.OverBoostClassifier.html)**
    2. **[`SMOTEBoostClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.SMOTEBoostClassifier.html) [6]**
    3. **[`KmeansSMOTEBoostClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.KmeansSMOTEBoostClassifier.html)**
    4. **[`OverBaggingClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.OverBaggingClassifier.html) [5] ([imblearn version](https://imbalanced-learn.org/stable/references/generated/imblearn.ensemble.BalancedBaggingClassifier.html))**
    5. **[`SMOTEBaggingClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.SMOTEBaggingClassifier.html) [7] ([imblearn version](https://imbalanced-learn.org/stable/references/generated/imblearn.ensemble.BalancedBaggingClassifier.html))**
- **Reweighting-based**
  - *Cost-sensitive Learning*
    1. **[`AdaCostClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.AdaCostClassifier.html) [8]**
    2. **[`AdaUBoostClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.AdaUBoostClassifier.html) [9]**
    3. **[`AsymBoostClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.AsymBoostClassifier.html) [10]**
- **Compatible**
  - **[`CompatibleAdaBoostClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.CompatibleAdaBoostClassifier.html) [11]**
  - **[`CompatibleBaggingClassifier`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.CompatibleBaggingClassifier.html) [12]**

> **Note: `imbalanced-ensemble` is still under development, please see [API reference](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/api.html) for the latest list.**

## 5-min Quick Start with IMBENS

**Here, we provide some quick guides to help you get started with IMBENS.**
**We strongly encourage users to check out the [example gallery](https://imbalanced-ensemble.readthedocs.io/en/latest/auto_examples/index.html#) for more comprehensive usage examples, which demonstrate many advanced features of IMBENS.**

![](https://raw.githubusercontent.com/ZhiningLiu1998/figures/master/imbalanced-ensemble/example_gallery_snapshot.png)

### A minimal working example

Taking self-paced ensemble [1] as an example, deploying it takes fewer than 10 lines of code:

```python
>>> from imbens.ensemble import SelfPacedEnsembleClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>>
>>> X, y = make_classification(n_samples=1000, n_classes=3,
...                            n_informative=4, weights=[0.2, 0.3, 0.5],
...                            random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.2, random_state=42)
>>> clf = SelfPacedEnsembleClassifier(random_state=0)
>>> clf.fit(X_train, y_train)
SelfPacedEnsembleClassifier(...)
>>> clf.predict(X_test)
array([...])
```
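
The returned predictions are plain NumPy arrays, so any scikit-learn metric can score them; on imbalanced data, class-sensitive metrics are usually more informative than raw accuracy. A short continuation of the example above (our addition, not part of the original snippet):

```python
# Score the fitted classifier with class-sensitive metrics (a continuation of
# the example above; exact numbers depend on the synthetic data).
from sklearn.metrics import balanced_accuracy_score, classification_report

y_pred = clf.predict(X_test)
print('balanced accuracy:', balanced_accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```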

### Visualize ensemble classifiers

The [`imbens.visualizer`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/visualizer/api.html) sub-module provides an [`ImbalancedEnsembleVisualizer`](https://imbalanced-ensemble.readthedocs.io/en/latest/api/visualizer/_autosummary/imbens.visualizer.ImbalancedEnsembleVisualizer.html).
It can be used to visualize ensemble estimator(s) for inspection or comparison.
Please refer to [**visualizer documentation**](https://imbalanced-ensemble.readthedocs.io/en/latest/api/visualizer/_autosummary/imbens.visualizer.ImbalancedEnsembleVisualizer.html) and [**examples**](https://imbalanced-ensemble.readthedocs.io/en/latest/auto_examples/index.html) for more details.

**Fit an ImbalancedEnsembleVisualizer**
```python
from imbens.ensemble import SelfPacedEnsembleClassifier
from imbens.ensemble import RUSBoostClassifier
from imbens.ensemble import EasyEnsembleClassifier
from sklearn.tree import DecisionTreeClassifier

# Fit ensemble classifiers
init_kwargs = {'estimator': DecisionTreeClassifier()}
ensembles = {
    'spe': SelfPacedEnsembleClassifier(**init_kwargs).fit(X_train, y_train),
    'rusboost': RUSBoostClassifier(**init_kwargs).fit(X_train, y_train),
    'easyens': EasyEnsembleClassifier(**init_kwargs).fit(X_train, y_train),
}

# Fit visualizer
from imbens.visualizer import ImbalancedEnsembleVisualizer
visualizer = ImbalancedEnsembleVisualizer().fit(ensembles=ensembles)
```
**Plot performance curves**
```python
fig, axes = visualizer.performance_lineplot()
```
![](https://raw.githubusercontent.com/ZhiningLiu1998/figures/master/imbalanced-ensemble/examples/visualize_performance_example.png)

**Plot confusion matrices**
```python
fig, axes = visualizer.confusion_matrix_heatmap()
```
![](https://raw.githubusercontent.com/ZhiningLiu1998/figures/master/imbalanced-ensemble/examples/visualize_confusion_matrix_example.png)

### Customizing training log

All ensemble classifiers in IMBENS support customizable training logging.
The training log is controlled by three parameters of the `fit()` method: `eval_datasets`, `eval_metrics`, and `train_verbose`.
Read more details in the [**fit documentation**](https://imbalanced-ensemble.readthedocs.io/en/latest/api/ensemble/_autosummary/imbens.ensemble.SelfPacedEnsembleClassifier.html#imbens.ensemble.SelfPacedEnsembleClassifier.fit).

**Enable auto training log**
```python
clf.fit(..., train_verbose=True)
```
```
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃             ┃                          ┃            Data: train            ┃
┃ #Estimators ┃    Class Distribution    ┃              Metric               ┃
┃             ┃                          ┃  acc  balanced_acc  weighted_f1   ┃
┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃      1      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.838     0.877        0.839      ┃
┃      5      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.924     0.949        0.924      ┃
┃     10      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.954     0.970        0.954      ┃
┃     15      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.979     0.986        0.979      ┃
┃     20      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.990     0.993        0.990      ┃
┃     25      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.994     0.996        0.994      ┃
┃     30      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.988     0.992        0.988      ┃
┃     35      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.999     0.999        0.999      ┃
┃     40      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.995     0.997        0.995      ┃
┃     45      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.995     0.997        0.995      ┃
┃     50      ┃ {0: 150, 1: 150, 2: 150} ┃ 0.993     0.995        0.993      ┃
┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃    final    ┃ {0: 150, 1: 150, 2: 150} ┃ 0.993     0.995        0.993      ┃
┗━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
```

**Customize granularity and content of the training log**
```python
clf.fit(...,
        train_verbose={
            'granularity': 10,
            'print_distribution': False,
            'print_metrics': True,
        })
```

**Example output:**

```
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃             ┃            Data: train            ┃
┃ #Estimators ┃              Metric               ┃
┃             ┃  acc  balanced_acc  weighted_f1   ┃
┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃      1      ┃ 0.964     0.970        0.964      ┃
┃     10      ┃ 1.000     1.000        1.000      ┃
┃     20      ┃ 1.000     1.000        1.000      ┃
┃     30      ┃ 1.000     1.000        1.000      ┃
┃     40      ┃ 1.000     1.000        1.000      ┃
┃     50      ┃ 1.000     1.000        1.000      ┃
┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃    final    ┃ 1.000     1.000        1.000      ┃
┗━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
```

**Add evaluation dataset(s)**
```python
clf.fit(...,
        eval_datasets={
            'valid': (X_valid, y_valid),
        })
```

**Example output:**

```
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃             ┃            Data: train            ┃            Data: valid            ┃
┃ #Estimators ┃              Metric               ┃              Metric               ┃
┃             ┃  acc  balanced_acc  weighted_f1   ┃  acc  balanced_acc  weighted_f1   ┃
┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃      1      ┃ 0.939     0.961        0.940      ┃ 0.935     0.933        0.936      ┃
┃     10      ┃ 1.000     1.000        1.000      ┃ 0.971     0.974        0.971      ┃
┃     20      ┃ 1.000     1.000        1.000      ┃ 0.982     0.981        0.982      ┃
┃     30      ┃ 1.000     1.000        1.000      ┃ 0.983     0.983        0.983      ┃
┃     40      ┃ 1.000     1.000        1.000      ┃ 0.983     0.982        0.983      ┃
┃     50      ┃ 1.000     1.000        1.000      ┃ 0.983     0.982        0.983      ┃
┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃    final    ┃ 1.000     1.000        1.000      ┃ 0.983     0.982        0.983      ┃
┗━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
```

**Customize evaluation metric(s)**
```python
from sklearn.metrics import accuracy_score, f1_score
clf.fit(...,
eval_metrics={
'acc': (accuracy_score, {}),
'weighted_f1': (f1_score, {'average':'weighted'}),
})
```

**Example output:**

```
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃             ┃      Data: train      ┃      Data: valid      ┃
┃ #Estimators ┃        Metric         ┃        Metric         ┃
┃             ┃  acc    weighted_f1   ┃  acc    weighted_f1   ┃
┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━┫
┃      1      ┃ 0.942      0.961      ┃ 0.919      0.936      ┃
┃     10      ┃ 1.000      1.000      ┃ 0.976      0.976      ┃
┃     20      ┃ 1.000      1.000      ┃ 0.977      0.977      ┃
┃     30      ┃ 1.000      1.000      ┃ 0.981      0.980      ┃
┃     40      ┃ 1.000      1.000      ┃ 0.980      0.979      ┃
┃     50      ┃ 1.000      1.000      ┃ 0.981      0.980      ┃
┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━┫
┃    final    ┃ 1.000      1.000      ┃ 0.981      0.980      ┃
┗━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━┛
```

## About imbalanced learning

**Class imbalance** (also known as the **long-tail problem**) refers to classification problems in which the classes are not represented equally, which is common in practice: for instance, fraud detection, prediction of rare adverse drug reactions, and prediction of gene families. Failing to account for class imbalance often degrades the predictive performance of many classification algorithms. **Imbalanced learning** aims to tackle this problem and learn an unbiased model from imbalanced data.
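
To see why, consider a 9:1 binary problem: a classifier that always predicts the majority class reaches 90% accuracy while never detecting a single minority sample. A minimal illustration (ours, for exposition):

```python
# Why plain accuracy misleads on imbalanced data: a trivial majority-class
# predictor scores high accuracy but only chance-level balanced accuracy.
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

y_true = np.array([0] * 900 + [1] * 100)  # 9:1 class imbalance
y_pred = np.zeros_like(y_true)            # always predict the majority class

print(accuracy_score(y_true, y_pred))           # 0.9
print(balanced_accuracy_score(y_true, y_pred))  # 0.5
```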

For more resources on imbalanced learning, please refer to [**awesome-imbalanced-learning**](https://github.com/ZhiningLiu1998/awesome-imbalanced-learning).

## Acknowledgements

IMBENS was initially developed on top of [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn), but has since undergone heavy development to implement many important imbalanced ensemble techniques.
The infrastructure also underwent significant refactoring to support advanced ensemble learning features that are essential to practical usability (fine-grained training control, parallel computing, multi-class support, training logs, visualization, etc.).

## References

| # | Reference |
| ---- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [1] | Zhining Liu, Wei Cao, Zhifeng Gao, Jiang Bian, Hechang Chen, Yi Chang, and Tie-Yan Liu. Self-paced ensemble for highly imbalanced massive data classification. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020, pp. 841-852. |
| [2] | X.-Y. Liu, J. Wu, and Z.-H. Zhou, Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 539โ€“550, 2009. |
| [3] | Chen, Chao, Andy Liaw, and Leo Breiman. โ€œUsing random forest to learn imbalanced data.โ€ University of California, Berkeley 110 (2004): 1-12. |
| [4] | C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, and A. Napolitano, Rusboost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol. 40, no. 1, pp. 185โ€“197, 2010. |
| [5] | Maclin, R., & Opitz, D. (1997). An empirical evaluation of bagging and boosting. AAAI/IAAI, 1997, 546-551. |
| [6] | N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer, Smoteboost: Improving prediction of the minority class in boosting. in European conference on principles of data mining and knowledge discovery. Springer, 2003, pp. 107โ€“119 |
| [7] | S. Wang and X. Yao, Diversity analysis on imbalanced data sets by using ensemble models. in 2009 IEEE Symposium on Computational Intelligence and Data Mining. IEEE, 2009, pp. 324โ€“331. |
| [8] | Fan, W., Stolfo, S. J., Zhang, J., & Chan, P. K. (1999, June). AdaCost: misclassification cost-sensitive boosting. In Icml (Vol. 99, pp. 97-105). |
| [9] | Karakoulas, G., & Shawe-Taylor, J. (1999). Optimizing classifiers for imbalanced training sets. Advances in Neural Information Processing Systems, 11, 253-259. |
| [10] | Viola, P., & Jones, M. (2001). Fast and robust classification using asymmetric adaboost and a detector cascade. Advances in Neural Information Processing System, 14. |
| [11] | Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1), 119-139. |
| [12] | Breiman, L. (1996). Bagging predictors. Machine learning, 24(2), 123-140. |
| [13] | Guillaume Lemaรฎtre, Fernando Nogueira, and Christos K. Aridas. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research, 18(17):1โ€“5, 2017. |

## Related Projects

**Check out [Zhining](https://zhiningliu.com/)'s other open-source projects!**



- [Imbalanced Learning \[Awesome\]](https://github.com/ZhiningLiu1998/awesome-imbalanced-learning)
- [Machine Learning \[Awesome\]](https://github.com/ZhiningLiu1998/awesome-awesome-machine-learning)
- [Self-paced Ensemble \[ICDE\]](https://github.com/ZhiningLiu1998/self-paced-ensemble)
- [Meta-Sampler \[NeurIPS\]](https://github.com/ZhiningLiu1998/mesa)

## Contributors ✨

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):



- **Zhining Liu**: 💻 🤔 🚧 🐛 📖
- **leaphan**: 🐛
- **hannanhtang**: 🐛
- **H.J.Ren**: 🐛
- **Marc Skov Madsen**: 🐛

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!