https://github.com/gretelai/gretel-synthetics
Synthetic data generators for structured and unstructured text, featuring differentially private learning.
- Host: GitHub
- URL: https://github.com/gretelai/gretel-synthetics
- Owner: gretelai
- License: other
- Created: 2020-03-02T15:54:44.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2025-03-17T22:02:43.000Z (29 days ago)
- Last Synced: 2025-04-03T22:03:26.943Z (12 days ago)
- Topics: artificial-intelligence, differential-privacy, privacy, synthetic-data, tensorflow
- Language: Python
- Homepage: https://gretel.ai/platform/synthetics
- Size: 2.35 MB
- Stars: 626
- Watchers: 30
- Forks: 90
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-privacy-engineering - Gretel Synthetics - Synthetic data generators for structured and unstructured text, featuring differentially private learning. (Awesome Privacy Engineering / Synthetic Data)
- awesome-synthetic-data - gretel-synthetics - Generative models for structured and unstructured text, tabular, and multi-variate time-series data featuring differentially private learning. (Libraries / Text, Tabular and Time-Series)
- awesome-open-data-centric-ai - Gretel Synthetics (Augmentation and synthetic data)
- awesome-data-synthesis - gretel - create fake, synthetic datasets with enhanced privacy guarantees (Data-driven methods / Tabular)
- awesome-dataset-creation - gretel-synthetics - Generative models for structured and unstructured text, tabular, and multi-variate time-series data featuring differentially private learning. (Libraries / Text, Tabular and Time-Series)
- awesome-starred - gretelai/gretel-synthetics - Synthetic data generators for structured and unstructured text, featuring differentially private learning. (artificial-intelligence)
- awesome-production-machine-learning - Gretel Synthetics - Gretel Synthetics is a synthetic data generator for structured and unstructured text, featuring differentially private learning. (Data Annotation and Synthesis)
README
# Gretel Synthetics
A permissive synthetic data library from Gretel.ai
## Documentation
- [Get started with gretel-synthetics](https://gretel-synthetics.readthedocs.io/en/stable/)
- [Configuration](https://gretel-synthetics.readthedocs.io/en/stable/api/config.html)
- [Train your model](https://gretel-synthetics.readthedocs.io/en/stable/api/train.html)
- [Generate synthetic records](https://gretel-synthetics.readthedocs.io/en/stable/api/generate.html)

## Try it out now
If you want to quickly discover gretel-synthetics, open the notebook below in Colab and follow the tutorials!
[Open in Colab](https://colab.research.google.com/github/gretelai/gretel-synthetics/blob/master/examples/synthetic_records.ipynb)
Check out additional examples [here](https://github.com/gretelai/gretel-synthetics/tree/master/examples).
## Getting Started
This section will guide you through installation of `gretel-synthetics` and dependencies that are not directly installed by the Python package manager.
### Dependency Requirements
By default, we do not install certain core requirements. The following dependencies should be installed _external to the installation_
of `gretel-synthetics`, depending on which model(s) you plan to use.

- Torch: Used by Timeseries DGAN and ACTGAN (for ACTGAN, Torch is installed by SDV). We recommend version 2.0.
- SDV (Synthetic Data Vault): Used by ACTGAN. We recommend version 0.17.x.

These dependencies can be installed by doing the following:
```
pip install "sdv<0.18" # for ACTGAN (quoted so the shell does not treat < as a redirect)
pip install torch==2.0 # for Timeseries DGAN
```
To install the actual `gretel-synthetics` package, first clone the repo and then...
```
pip install -U .
```
_or_
```
pip install gretel-synthetics
```
_then..._
```
pip install jupyter
jupyter notebook
```
When the UI launches in your browser, navigate to `examples/synthetic_records.ipynb` and get generating!
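
Because the optional dependencies listed above are not installed automatically, it can help to check which models are usable before opening a notebook. The following is a minimal sketch, not part of `gretel-synthetics`: the names `REQUIRED_MODULES` and `missing_modules` are hypothetical, and it assumes the import names `torch` and `sdv` for the Torch and SDV packages.

```python
from importlib.util import find_spec

# Model -> importable module names it needs, following the dependency list above.
# The import names "torch" and "sdv" are assumptions made for this sketch.
REQUIRED_MODULES = {
    "Timeseries DGAN": ["torch"],
    "ACTGAN": ["sdv", "torch"],
}


def missing_modules(required=REQUIRED_MODULES):
    """Return {model: [module names that cannot be found]} without importing them."""
    return {
        model: [name for name in modules if find_spec(name) is None]
        for model, modules in required.items()
    }


if __name__ == "__main__":
    for model, missing in missing_modules().items():
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{model}: {status}")
```

`find_spec` only locates a module, so the check stays fast even when a heavy package such as Torch is present.
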
If you want to install `gretel-synthetics` locally and use a GPU (recommended):
1. Create a virtual environment (e.g. using `conda`)
```
conda create --name tf python=3.9
```
2. Activate the virtual environment
```
conda activate tf
```
3. Run the setup script `./setup-utils/setup-gretel-synthetics-tensorflow24-with-gpu.sh`
The last step will install all the necessary software packages for GPU usage, `tensorflow=2.8` and `gretel-synthetics`.
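
Once the setup steps above have finished, one way to confirm that TensorFlow can actually see a GPU is the sketch below. It is an illustrative check rather than part of the setup script: `visible_gpus` is a hypothetical helper name, and the only TensorFlow call it relies on is `tf.config.list_physical_devices`.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif gpus:
        print("GPUs visible to TensorFlow:", ", ".join(gpus))
    else:
        print("TensorFlow is installed but sees no GPU; training will run on CPU.")
```
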
Note that the setup script works only for Ubuntu 18.04. You might need to modify it for other OS versions.

## Timeseries DGAN Overview
The [timeseries DGAN module](https://synthetics.docs.gretel.ai/en/stable/models/timeseries_dgan.html#timeseries-dgan) contains a PyTorch implementation of a DoppelGANger model that is optimized for timeseries data. As with TensorFlow, you will need to install PyTorch manually:
```
pip install torch==1.13.1
```
[This notebook](https://github.com/gretelai/gretel-synthetics/blob/master/examples/timeseries_dgan.ipynb) shows basic usage on a small data set of smart home sensor readings.
## ACTGAN Overview
ACTGAN (Anyway CTGAN) is an extension of the popular [CTGAN implementation](https://sdv.dev/SDV/user_guides/single_table/ctgan.html) that provides
some additional functionality to improve memory usage, autodetection and transformation of columns, and more.

To use this model, you will need to manually install SDV:
```
pip install "sdv<0.18"
```
Keep in mind that this will also install several dependencies like PyTorch that SDV relies on, which may conflict with PyTorch
versions installed for use with other models like Timeseries DGAN.

The ACTGAN interface is a superset of the CTGAN interface. To see the additional features, please take a look at the ACTGAN demo notebook in the `examples` directory of this repo.
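
Since installing SDV can change which PyTorch version is present, a quick way to spot a conflict is to compare the installed versions against the pins mentioned in this README. The sketch below uses only the standard library. The helper names (`release_tuple`, `installed_release`, `check_pins`) are hypothetical, and the pins checked (`sdv<0.18`, Torch 2.0.x) are taken from the dependency section above as assumptions about what you want installed.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Turn '2.0.1+cu118' into (2, 0, 1), keeping only the leading numeric parts."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def installed_release(package):
    """Return the installed release tuple for a distribution, or None if absent."""
    try:
        return release_tuple(version(package))
    except PackageNotFoundError:
        return None


def check_pins():
    """Map each pin to True/False, or None when the package is not installed."""
    sdv = installed_release("sdv")
    torch = installed_release("torch")
    return {
        "sdv<0.18": None if sdv is None else sdv < (0, 18),
        "torch 2.0.x": None if torch is None else torch[:2] == (2, 0),
    }


if __name__ == "__main__":
    for pin, ok in check_pins().items():
        label = {None: "not installed", True: "ok", False: "conflict"}[ok]
        print(f"{pin}: {label}")
```

Running this after each `pip install` step shows whether a later install replaced a version that an earlier step put in place.
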