https://github.com/nova-ims-innovation-and-analytics-lab/cluster-over-sampling

A general interface for clustering based over-sampling algorithms.

data-science imbalanced-data imbalanced-learning machine-learning oversampling python3 scikit-learn


[scikit-learn]:
[imbalanced-learn]:
[SOMO]:
[KMeans-SMOTE]:
[G-SOMO]:
[black badge]:
[black]:
[docformatter badge]:
[docformatter]:
[ruff badge]:
[ruff]:
[mypy badge]:
[mypy]:
[mkdocs badge]:
[mkdocs]:
[version badge]:
[pythonversion badge]:
[downloads badge]:
[gitter]:
[gitter badge]:
[discussions]:
[discussions badge]:
[ci]:
[ci badge]:
[doc]:
[doc badge]:

[![Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed;
support/maintenance will be provided as time
allows.](https://www.repostatus.org/badges/latest/inactive.svg)](https://www.repostatus.org/#inactive)

> **The project has been moved to [imbalanced-learn-extra](https://github.com/georgedouzas/imbalanced-learn-extra).**

# cluster-over-sampling

[![ci][ci badge]][ci] [![doc][doc badge]][doc]

| Category | Tools |
| ------------------| -------- |
| **Development** | [![black][black badge]][black] [![ruff][ruff badge]][ruff] [![mypy][mypy badge]][mypy] [![docformatter][docformatter badge]][docformatter] |
| **Package** | ![version][version badge] ![pythonversion][pythonversion badge] ![downloads][downloads badge] |
| **Documentation** | [![mkdocs][mkdocs badge]][mkdocs]|
| **Communication** | [![gitter][gitter badge]][gitter] [![discussions][discussions badge]][discussions] |

## Introduction

`cluster-over-sampling` provides a general interface for clustering-based over-sampling algorithms.

## Installation

For user installation, `cluster-over-sampling` is available on PyPI and can be installed via `pip`:

```bash
pip install cluster-over-sampling
```

A development installation requires cloning the repository and then using [PDM](https://github.com/pdm-project/pdm) to install
the project together with its main and development dependencies:

```bash
git clone https://github.com/georgedouzas/cluster-over-sampling.git
cd cluster-over-sampling
pdm install
```

The SOM clusterer requires optional dependencies, which are installed through the `som` extra:

```bash
pip install "cluster-over-sampling[som]"
```

## Usage

All classes included in `cluster-over-sampling` follow the [imbalanced-learn] API and build on the functionality of its base
oversampler. Following the [scikit-learn] convention, the data are represented as follows:

- Input data `X`: 2D array-like or sparse matrix of shape `(n_samples, n_features)`.
- Targets `y`: 1D array-like of shape `(n_samples,)`.
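A small imbalanced toy dataset in this layout could look like the snippet below. The values are made up for illustration. Only NumPy is assumed; a sparse `X` would follow the same row-per-sample convention.

```python
from collections import Counter

import numpy as np

# X: 2D array of shape (n_samples, n_features); each row is one sample.
X = np.array([
    [0.0, 1.0],
    [0.5, 1.5],
    [1.0, 0.5],
    [1.5, 1.0],
    [3.0, 3.5],
    [3.5, 3.0],
])
# y: 1D array of shape (n_samples,); one class label per row of X.
# Class 1 is the minority class (2 samples versus 4).
y = np.array([0, 0, 0, 0, 1, 1])

assert X.ndim == 2 and y.ndim == 1 and X.shape[0] == y.shape[0]
print(X.shape, y.shape, Counter(y.tolist()))
```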

The clustering-based oversamplers implement a `fit` method to learn from `X` and `y`:

```python
clustering_based_oversampler.fit(X, y)
```

They also implement a `fit_resample` method to resample `X` and `y`:

```python
X_resampled, y_resampled = clustering_based_oversampler.fit_resample(X, y)
```
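The following minimal, self-contained sketch makes the `fit`/`fit_resample` contract more tangible by showing roughly what a clustering-based oversampler does. It is *not* the implementation shipped by `cluster-over-sampling`, and several parts are assumptions for illustration:

- The class `ToyClusterOverSampler` and its parameters are hypothetical.
- Clustering uses a few plain k-means (Lloyd) iterations written in NumPy.
- Synthetic minority samples come from SMOTE-style linear interpolation between two samples of the same class that share a cluster.
- The target cluster for each new sample is picked uniformly at random rather than by any density-based rule.

```python
import numpy as np


def _nearest_center(X, centers):
    """Index of the closest centroid (squared Euclidean distance) for each row of X."""
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


class ToyClusterOverSampler:
    """Hypothetical clustering-based oversampler; not the library's implementation."""

    def __init__(self, n_clusters=2, n_iter=10, random_state=0):
        self.n_clusters = n_clusters
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        # Plain k-means: start from distinct random rows, then alternate
        # assignment and centroid updates for a fixed number of iterations.
        centers = X[rng.choice(len(X), self.n_clusters, replace=False)]
        for _ in range(self.n_iter):
            labels = _nearest_center(X, centers)
            for k in range(self.n_clusters):
                if np.any(labels == k):  # keep the old centroid if a cluster empties
                    centers[k] = X[labels == k].mean(axis=0)
        self.labels_ = _nearest_center(X, centers)
        # Number of synthetic samples per class needed to match the largest class.
        classes, counts = np.unique(y, return_counts=True)
        self.n_new_ = {c: counts.max() - n for c, n in zip(classes, counts) if n < counts.max()}
        return self

    def fit_resample(self, X, y):
        self.fit(X, y)
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        X_parts, y_parts = [X], [y]
        for cls, n_new in self.n_new_.items():
            # Indices of samples of this class, grouped by cluster; a cluster
            # needs at least two of them to interpolate.
            groups = [np.flatnonzero((y == cls) & (self.labels_ == k)) for k in range(self.n_clusters)]
            groups = [g for g in groups if len(g) >= 2]
            for _ in range(n_new if groups else 0):
                g = groups[rng.integers(len(groups))]
                i, j = rng.choice(g, 2, replace=False)
                # SMOTE-style point on the segment between two same-cluster samples.
                X_parts.append(X[i] + rng.random() * (X[j] - X[i]))
                y_parts.append(np.array([cls]))
        # Original samples first, synthetic samples appended after them.
        return np.vstack(X_parts), np.concatenate(y_parts)
```

In this sketch, restricting interpolation to a single cluster keeps the synthetic samples near existing minority regions. Without that restriction, they could land on segments that cross between distant groups of minority samples.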

## References

If you use `cluster-over-sampling` in a scientific publication, we would appreciate citations to any of the following papers:

- [G. Douzas, F. Bacao, "Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning", Expert Systems with
Applications, vol. 82, pp. 40-52, 2017.][SOMO]
- [G. Douzas, F. Bacao, F. Last, "Improving imbalanced learning through a heuristic oversampling method based on k-means and
SMOTE", Information Sciences, vol. 465, pp. 1-20, 2018.][KMeans-SMOTE]
- [G. Douzas, F. Bacao, F. Last, "G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE", Expert
Systems with Applications, vol. 183, 115230, 2021.][G-SOMO]