https://github.com/georgedouzas/imbalanced-learn-extra
Implementation of novel oversampling algorithms.
- Host: GitHub
- URL: https://github.com/georgedouzas/imbalanced-learn-extra
- Owner: georgedouzas
- License: mit
- Created: 2024-06-28T15:05:58.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-29T05:45:28.000Z (6 months ago)
- Last Synced: 2024-10-07T20:04:12.282Z (3 months ago)
- Topics: clustering-base-oversampling, data-science, geometric-smote, geometric-somo, imbalanced-data, imbalanced-learn, imbalanced-learning, kmeans-smote, machine-learning, oversampling, python, scikit-learn, smote, somo
- Language: Python
- Homepage: https://georgedouzas.github.io/imbalanced-learn-extra
- Size: 2.79 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# imbalanced-learn-extra

## Introduction
`imbalanced-learn-extra` is a Python package that extends [imbalanced-learn]. It implements algorithms that are not included in
[imbalanced-learn] because of their novelty or lower citation count. The current version includes the following:

- A general interface for clustering-based oversampling algorithms.
- The Geometric SMOTE algorithm, a geometrically enhanced drop-in replacement for SMOTE that handles numerical as well as
categorical features.

## Installation
For user installation, `imbalanced-learn-extra` is available on PyPI and can be installed via `pip`:

```bash
pip install imbalanced-learn-extra
```

Development installation requires cloning the repository and then using [PDM](https://github.com/pdm-project/pdm) to install the
project together with its main and development dependencies:

```bash
git clone https://github.com/georgedouzas/imbalanced-learn-extra.git
cd imbalanced-learn-extra
pdm install
```

The SOM clusterer requires optional dependencies:

```bash
pip install "imbalanced-learn-extra[som]"
```

## Usage
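The snippet below is a self-contained sketch of the `fit`/`fit_resample` convention described in this section. It does not import `imbalanced-learn-extra`: `ToyGeometricOversampler` is a hypothetical NumPy class, written for binary targets only, that implements a simplified special case of the Geometric SMOTE generation step (neighbors drawn from the minority class, truncation and deformation disabled). Treat it as an illustration under these assumptions, not as the package's implementation or API.

```python
import numpy as np


class ToyGeometricOversampler:
    """Hypothetical binary-class oversampler (not part of imbalanced-learn-extra).

    Each synthetic sample is drawn uniformly inside a hypersphere centered on a
    minority sample, with radius equal to the distance to one of its k nearest
    minority neighbors. Requires at least two minority samples.
    """

    def __init__(self, k_neighbors=5, random_state=None):
        self.k_neighbors = k_neighbors
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        classes, counts = np.unique(y, return_counts=True)
        self.minority_class_ = classes[np.argmin(counts)]
        # Number of synthetic samples needed to match the majority class size.
        self.n_to_generate_ = int(counts.max() - counts.min())
        return self

    def fit_resample(self, X, y):
        self.fit(X, y)
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        X_min = X[y == self.minority_class_]
        n_min, n_features = X_min.shape
        k = min(self.k_neighbors, n_min - 1)
        # Pairwise distances among minority samples; a sample is not its own neighbor.
        dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)
        neighbors = np.argsort(dists, axis=1)[:, :k]
        X_new = np.empty((self.n_to_generate_, n_features))
        for i in range(self.n_to_generate_):
            center = rng.integers(n_min)
            surface = neighbors[center, rng.integers(k)]
            radius = dists[center, surface]
            # Uniform point in the unit ball: random direction, radius u ** (1 / d).
            direction = rng.normal(size=n_features)
            direction /= np.linalg.norm(direction)
            scale = rng.uniform() ** (1.0 / n_features)
            X_new[i] = X_min[center] + radius * scale * direction
        y_new = np.full(self.n_to_generate_, self.minority_class_, dtype=y.dtype)
        return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Usage on a synthetic 90/10 imbalanced dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(90, 2)), rng.normal(3, 1, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)
oversampler = ToyGeometricOversampler(k_neighbors=5, random_state=0)
X_resampled, y_resampled = oversampler.fit_resample(X, y)
print(np.bincount(y_resampled))  # [90 90]
```

In the full Geometric SMOTE algorithm, this hypersphere is additionally truncated and deformed, and the neighbor used to set the radius can also come from the majority class.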
All the classes included in `imbalanced-learn-extra` follow the [imbalanced-learn] API, using the functionality of its base
oversampler. Following the [scikit-learn] convention, the data are represented as follows:

- Input data `X`: 2D array-like or sparse matrix.
- Targets `y`: 1D array-like.

The oversamplers implement a `fit` method to learn from `X` and `y`:

```python
oversampler.fit(X, y)
```

They also implement a `fit_resample` method that fits the oversampler and returns the resampled `X` and `y`:

```python
X_resampled, y_resampled = oversampler.fit_resample(X, y)
```

## Citing `imbalanced-learn-extra`
Publications using clustering-based oversampling:
- [G. Douzas, F. Bacao, "Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning", Expert Systems with
Applications, vol. 82, pp. 40-52, 2017.][SOMO]
- [G. Douzas, F. Bacao, F. Last, "Improving imbalanced learning through a heuristic oversampling method based on k-means and
SMOTE", Information Sciences, vol. 465, pp. 1-20, 2018.][KMeans-SMOTE]
- [G. Douzas, F. Bacao, F. Last, "G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE", Expert
Systems with Applications, vol. 183, 115230, 2021.][G-SOMO]

Publications using Geometric SMOTE:

- G. Douzas, F. Bacao, "Geometric SMOTE: a geometrically enhanced drop-in replacement for SMOTE", Information Sciences,
vol. 501, pp. 118-135, 2019.
- J. Fonseca, G. Douzas, F. Bacao, "Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in
Active Learning for Land Use/Land Cover Classification", Remote Sensing, vol. 13, no. 13, 2619, 2021.
- G. Douzas, F. Bacao, J. Fonseca, M. Khudinyan, "Imbalanced Learning in Land Cover Classification: Improving Minority Classes’
Prediction Accuracy Using the Geometric SMOTE Algorithm", Remote Sensing, vol. 11, no. 24, 3040, 2019.