https://github.com/georgedouzas/imbalanced-learn-extra

Implementation of novel oversampling algorithms.

Host: GitHub
URL: https://github.com/georgedouzas/imbalanced-learn-extra
Owner: georgedouzas
License: mit
Created: 2019-06-22T10:51:42.000Z (over 6 years ago)
Default Branch: main
Last Pushed: 2025-02-05T17:05:31.000Z (8 months ago)
Last Synced: 2025-06-24T03:36:46.735Z (4 months ago)
Topics: clustering-based-oversampling, data-science, g-somo, geometric-smote, imbalanced-learning, kmeans-smote, machine-learning, oversampling, python, scikit-learn, smote
Language: Python
Homepage: https://georgedouzas.github.io/imbalanced-learn-extra/
Size: 1.29 MB
Stars: 34
Watchers: 2
Forks: 16
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md


# imbalanced-learn-extra

[![ci][ci badge]][ci] [![doc][doc badge]][doc]

| Category          | Tools    |
| ------------------| -------- |
| **Development**   | [![black][black badge]][black] [![ruff][ruff badge]][ruff] [![mypy][mypy badge]][mypy] [![docformatter][docformatter badge]][docformatter] |
| **Package**       | ![version][version badge] ![pythonversion][pythonversion badge] ![downloads][downloads badge] |
| **Documentation** | [![mkdocs][mkdocs badge]][mkdocs] |
| **Communication** | [![gitter][gitter badge]][gitter] [![discussions][discussions badge]][discussions] |

## Introduction

`imbalanced-learn-extra` is a Python package that extends [imbalanced-learn]. It implements algorithms that are not included in [imbalanced-learn] because they are novel or have fewer citations. The current version includes the following:

- A general interface for clustering-based oversampling algorithms.

- The Geometric SMOTE algorithm, a geometrically enhanced drop-in replacement for SMOTE that handles numerical as well as categorical features.

## Installation

For user installation, `imbalanced-learn-extra` is available on PyPI and can be installed with `pip`:

```bash
pip install imbalanced-learn-extra
```

Development installation requires cloning the repository and then using [PDM](https://github.com/pdm-project/pdm) to install the project along with its main and development dependencies:

```bash
git clone https://github.com/georgedouzas/imbalanced-learn-extra.git
cd imbalanced-learn-extra
pdm install
```

The SOM clusterer requires optional dependencies, which are installed with the `som` extra:

```bash
pip install "imbalanced-learn-extra[som]"
```

## Usage

All the classes included in `imbalanced-learn-extra` follow the [imbalanced-learn] API and build on the functionality of its base oversampler. Following the [scikit-learn] convention, the data are represented as follows:

- Input data `X`: 2D array-like or sparse matrix.

- Targets `y`: 1D array-like.

The oversamplers implement a `fit` method to learn from `X` and `y`:

```python
oversampler.fit(X, y)
```

They also implement a `fit_resample` method to resample `X` and `y`:

```python
X_resampled, y_resampled = oversampler.fit_resample(X, y)
```
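As a rough illustration of the `fit` / `fit_resample` contract described above, the sketch below defines a toy oversampler in plain Python. `ToyRandomOverSampler` is a hypothetical stand-in, not a class from `imbalanced-learn-extra`: it only duplicates randomly chosen minority rows, and its `sampling_strategy_` attribute borrows the imbalanced-learn naming purely for illustration.

```python
import random
from collections import Counter


class ToyRandomOverSampler:
    """Hypothetical stand-in that mimics the fit / fit_resample contract.

    It balances classes by duplicating randomly chosen minority rows. It is
    NOT one of the algorithms implemented by imbalanced-learn-extra.
    """

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        # Learn how many extra samples each class needs to match the majority.
        counts = Counter(y)
        n_majority = max(counts.values())
        self.sampling_strategy_ = {
            label: n_majority - n for label, n in counts.items() if n < n_majority
        }
        return self

    def fit_resample(self, X, y):
        # Fit first, then append duplicated minority rows to copies of X and y.
        self.fit(X, y)
        rng = random.Random(self.random_state)
        X_resampled, y_resampled = list(X), list(y)
        for label, n_missing in self.sampling_strategy_.items():
            rows = [row for row, target in zip(X, y) if target == label]
            for _ in range(n_missing):
                X_resampled.append(rng.choice(rows))
                y_resampled.append(label)
        return X_resampled, y_resampled


# X: 2D array-like (one row per sample), y: 1D array-like of class labels.
X = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1], [0.3, 0.8], [5.0, 5.0]]
y = [0, 0, 0, 0, 1]

oversampler = ToyRandomOverSampler(random_state=0)
X_resampled, y_resampled = oversampler.fit_resample(X, y)
print(Counter(y_resampled))
```

After resampling, both classes contain four samples. The oversamplers in this package are expected to be called with the same two methods, with `X` typically a NumPy array or sparse matrix rather than a list of lists.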

## Citing `imbalanced-learn-extra`

Publications using clustering-based oversampling:

- [G. Douzas, F. Bacao, "Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning", Expert Systems with Applications, vol. 82, pp. 40-52, 2017.][SOMO]

- [G. Douzas, F. Bacao, F. Last, "Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE", Information Sciences, vol. 465, pp. 1-20, 2018.][KMeans-SMOTE]

- [G. Douzas, F. Bacao, F. Last, "G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE", Expert Systems with Applications, vol. 183, 115230, 2021.][G-SOMO]

Publications using Geometric-SMOTE:

- Douzas, G., Bacao, F. (2019). Geometric SMOTE: a geometrically enhanced drop-in replacement for SMOTE. Information Sciences, 501, 118-135.

- Fonseca, J., Douzas, G., Bacao, F. (2021). Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification. Remote Sensing, 13(13), 2619.

- Douzas, G., Bacao, F., Fonseca, J., Khudinyan, M. (2019). Imbalanced Learning in Land Cover Classification: Improving Minority Classes’ Prediction Accuracy Using the Geometric SMOTE Algorithm. Remote Sensing, 11(24), 3040.
