https://github.com/andreazoccatelli/tabular_data_augmentation_continuous

This repository contains the scripts used for my master's degree thesis project: "Augmentation of tabular data with continuous features for binary imbalanced classification problems".

cgan copula data-augmentation imbalanced-classification imbalanced-data imbalanced-learning

# Augmentation of tabular data with continuous features for binary imbalanced classification problems

The aim of this project is to augment the minority-class observations using copula sampling and conditional GANs (cGANs), in order to improve classifier performance on imbalanced binary classification problems.

- For the augmentation based on copulas, my library, GenCopula, has been used. It can be installed from GitHub with devtools:
``` r
library(devtools)
install_github("AndreaZoccatelli/GenCopula")
```
- The library used for the augmentation based on cGANs is CTGAN.
- To re-create the datasets used in the project, run `Create_data.ipynb`.
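
To give a rough idea of what copula-based augmentation of a minority class involves, here is a minimal sketch in plain Python (standard library only). It is not GenCopula's API and not necessarily the method used in the thesis: the Gaussian copula, the rank-based marginals, and every name in it (`copula_augment`, `to_normal_scores`, the toy `minority` data) are assumptions made for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal used for the copula transform


def to_normal_scores(column):
    """Map one feature to normal scores through its ranks, rank / (n + 1)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation_matrix(cols):
    """Pearson correlation between columns of normal scores."""
    d, n = len(cols), len(cols[0])
    means = [sum(c) / n for c in cols]
    cov = [[sum((cols[a][k] - means[a]) * (cols[b][k] - means[b])
                for k in range(n)) / n for b in range(d)] for a in range(d)]
    return [[cov[a][b] / (cov[a][a] * cov[b][b]) ** 0.5
             for b in range(d)] for a in range(d)]


def cholesky(m):
    """Lower-triangular factor of a positive definite matrix."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= 0:
                    raise ValueError("correlation matrix is not positive definite")
                low[i][j] = pivot ** 0.5
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def empirical_quantile(sorted_col, u):
    """Invert the empirical CDF by interpolating between order statistics."""
    pos = u * (len(sorted_col) + 1) - 1          # inverse of rank / (n + 1)
    pos = min(max(pos, 0.0), len(sorted_col) - 1.0)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_col) - 1)
    return sorted_col[lo] + (pos - lo) * (sorted_col[hi] - sorted_col[lo])


def copula_augment(minority_rows, n_new, seed=0):
    """Draw n_new synthetic rows from a Gaussian copula fitted to minority_rows."""
    rng = random.Random(seed)
    cols = [list(c) for c in zip(*minority_rows)]
    chol = cholesky(correlation_matrix([to_normal_scores(c) for c in cols]))
    sorted_cols = [sorted(c) for c in cols]
    d = len(cols)
    new_rows = []
    for _ in range(n_new):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent draws, then map back to each feature's scale.
        z = [sum(chol[i][k] * eps[k] for k in range(i + 1)) for i in range(d)]
        new_rows.append([empirical_quantile(sorted_cols[i], STD_NORMAL.cdf(z[i]))
                         for i in range(d)])
    return new_rows


# Toy minority class with two positively related continuous features.
minority = [[1.0, 2.9], [1.5, 2.1], [2.0, 4.8], [2.5, 4.2], [3.0, 7.2], [3.5, 6.1]]
synthetic = copula_augment(minority, n_new=20)
```

Because the marginals are inverted through the empirical quantiles, the synthetic values in this sketch never leave the observed range of each feature; a parametric marginal would be needed to extrapolate beyond it.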

- These notebooks report the results obtained on the different datasets:
  - Best case
  - 20-30% Safe
  - Less 20% Safe
  - 10% Minority
  - 5% Minority
  - 4 Features
  - 8 Features
  - Default