https://github.com/tinyadapter/relddpm


Host: GitHub
URL: https://github.com/tinyadapter/relddpm
Owner: tinyAdapter
Created: 2024-12-24T02:44:46.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-12-24T03:18:13.000Z (over 1 year ago)
Last Synced: 2024-12-24T04:17:30.768Z (over 1 year ago)
Language: Python
Size: 25.2 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# RelDDPM

## Introduction

This repository contains the source code for the paper **Controllable Tabular Data Synthesis Using Diffusion Models**.

## Quick Start

### Environment Setup

Before running the code, make sure you are using a Python version above **3.7** (the environment below uses 3.8).

We recommend running the code under a virtual environment:

```sh
conda create -n relddpm_env python=3.8
conda activate relddpm_env
```

Then install the required packages:

```sh
pip install -r requirements.txt
```

Finally, install PyTorch 1.13.1 built for CUDA 11.6:

```sh
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
```
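After installation, a quick check can confirm that the interpreter meets the version requirement and that PyTorch can see a GPU. This is a minimal sketch, not a script shipped with the repository: `python_ok` and `torch_status` are hypothetical names, and the check assumes "above 3.7" means 3.8 or newer, matching the conda environment above. The expected PyTorch version is taken from the install command.

```python
import importlib.util
import sys

# Assumption: "above 3.7" is read as 3.8 or newer, matching the conda environment.
MIN_PYTHON = (3, 8)
# Version pinned by the PyTorch install command in this README.
EXPECTED_TORCH = "1.13.1"


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def torch_status():
    """Report whether PyTorch is importable, its version, and CUDA availability."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda": False}
    import torch  # imported lazily so the check also works before PyTorch is installed

    return {
        "installed": True,
        "version": str(torch.__version__),
        "cuda": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "OK" if python_ok() else "is too old")
    status = torch_status()
    if not status["installed"]:
        print("PyTorch is not installed")
    else:
        matches = status["version"].startswith(EXPECTED_TORCH)
        print("PyTorch", status["version"], "(as pinned)" if matches else f"(expected {EXPECTED_TORCH})")
        print("CUDA available:", status["cuda"])
```

If `CUDA available` prints `False`, the `--device=[GPU id]` arguments used below are unlikely to work.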

### Code Structure

```text
|-- datasets
    |-- minority_class_oversampling # datasets used in the minority class oversampling task
    |-- missing_tuple_completion # datasets used in the missing tuple completion task
|-- ddpm # the denoising diffusion probabilistic model package
|-- lib_completion # the library used in the missing tuple completion task
|-- lib_oversampling # the library used in the minority class oversampling task
|-- data_utils.py # the class used to preprocess the datasets
|-- eval_utils.py # the class used for evaluation
|-- eval.py # evaluation script
|-- main.py # main entry point
```
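Because the commands below invoke `main.py` and `eval.py` by relative path, they assume the repository root is the working directory. A small check can report which entries from the tree above are missing before a run. This is a hypothetical helper, not part of the repository; the paths are copied from the tree.

```python
from pathlib import Path

# Entries from the "Code Structure" tree, relative to the repository root.
EXPECTED_ENTRIES = (
    "datasets/minority_class_oversampling",
    "datasets/missing_tuple_completion",
    "ddpm",
    "lib_completion",
    "lib_oversampling",
    "data_utils.py",
    "eval_utils.py",
    "eval.py",
    "main.py",
)


def missing_entries(repo_root="."):
    """Return the expected entries that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    print("layout OK" if not missing else "missing: " + ", ".join(missing))
```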

### Run

#### Minority Class Oversampling

To generate synthetic data for minority class oversampling, run `main.py`; then run `eval.py` with the same arguments to evaluate it:

```sh
python main.py --task-name=oversampling --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]
python eval.py --task-name=oversampling --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]
```

The `[dataset]` placeholder must be one of `default`, `shoppers`, or `weatherAUS`.
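The two commands share every argument, so a small wrapper can build both from one set of values and reject dataset names outside the three listed above. The sketch below is a hypothetical helper (`build_oversampling_commands` is not part of the repository); it only assembles argument lists and does not run anything. `shlex.join` requires Python 3.8 or newer, which matches the recommended environment.

```python
import shlex

# Dataset names accepted for the oversampling task, per this README.
OVERSAMPLING_DATASETS = ("default", "shoppers", "weatherAUS")


def build_oversampling_commands(dataset, device, save_name, python="python"):
    """Return the main.py and eval.py argument lists for one oversampling run.

    Hypothetical helper that mirrors the command lines shown in this README.
    """
    if dataset not in OVERSAMPLING_DATASETS:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {OVERSAMPLING_DATASETS}"
        )
    shared_args = [
        "--task-name=oversampling",
        f"--dataset-name={dataset}",
        f"--device={device}",
        f"--save-name={save_name}",
    ]
    # Generation first, then evaluation with identical arguments.
    return [[python, "main.py", *shared_args], [python, "eval.py", *shared_args]]


if __name__ == "__main__":
    for argv in build_oversampling_commands("default", 0, "default_output"):
        print(shlex.join(argv))
```

Running it prints the same two commands as the example that follows.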

For example:

```sh
python main.py --task-name=oversampling --dataset-name=default --device=0 --save-name=default_output
python eval.py --task-name=oversampling --dataset-name=default --device=0 --save-name=default_output
```

#### Missing Tuple Completion

To generate synthetic data for missing tuple completion, run `main.py`; then run `eval.py` with the same arguments to evaluate it:

```sh
python main.py --task-name=completion --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]
python eval.py --task-name=completion --dataset-name=[dataset] --device=[GPU id] --save-name=[output file]
```

The `[dataset]` placeholder must be one of `heart`, `airbnb`, or `imdb`.
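The completion task follows the same two-step pattern, so the steps can be chained in a script. As a hypothetical sketch (`run_completion` and the `<dataset>_output` save names are illustrative, not from the repository), the wrapper below runs `eval.py` only after `main.py` succeeds: `subprocess.run(..., check=True)` raises `CalledProcessError` on a non-zero exit, which stops the pipeline. It assumes it is launched from the repository root; with `dry_run=True` it only returns the commands.

```python
import subprocess
import sys

# Dataset names accepted for the completion task, per this README.
COMPLETION_DATASETS = ("heart", "airbnb", "imdb")


def run_completion(dataset, device, save_name, dry_run=False):
    """Run main.py, then eval.py only if generation succeeded.

    Returns the commands that were executed (or, with dry_run, would be).
    """
    if dataset not in COMPLETION_DATASETS:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {COMPLETION_DATASETS}"
        )
    shared_args = [
        "--task-name=completion",
        f"--dataset-name={dataset}",
        f"--device={device}",
        f"--save-name={save_name}",
    ]
    commands = [
        [sys.executable, "main.py", *shared_args],
        [sys.executable, "eval.py", *shared_args],
    ]
    if not dry_run:
        for argv in commands:
            # check=True raises on failure, so eval.py never runs after a failed main.py.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Dry run over all three datasets; drop dry_run=True inside a checkout of the repository.
    for name in COMPLETION_DATASETS:
        for argv in run_completion(name, device=0, save_name=f"{name}_output", dry_run=True):
            print("python", " ".join(argv[1:]))
```

Using `sys.executable` rather than a bare `python` keeps both steps on the interpreter of the active environment (for example `relddpm_env`).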
