{"id":13692830,"url":"https://github.com/sdv-dev/CTGAN","last_synced_at":"2025-05-02T19:32:42.230Z","repository":{"id":38305072,"uuid":"207162195","full_name":"sdv-dev/CTGAN","owner":"sdv-dev","description":"Conditional GAN for generating synthetic tabular data.","archived":false,"fork":false,"pushed_at":"2025-04-17T19:04:34.000Z","size":1912,"stargazers_count":1377,"open_issues_count":38,"forks_count":308,"subscribers_count":23,"default_branch":"main","last_synced_at":"2025-04-25T15:48:48.372Z","etag":null,"topics":["data-generation","generative-adversarial-network","synthetic-data","synthetic-data-generation","tabular-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sdv-dev.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-09-08T19:24:12.000Z","updated_at":"2025-04-24T08:22:52.000Z","dependencies_parsed_at":"2023-02-16T12:00:29.878Z","dependency_job_id":"7ba7d7bf-9649-4993-ab9c-81b87582104f","html_url":"https://github.com/sdv-dev/CTGAN","commit_stats":{"total_commits":274,"total_committers":20,"mean_commits":13.7,"dds":0.7262773722627738,"last_synced_commit":"39cfbbe02da18640a880c161f38202e6ef1d0691"},"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdv-dev%2FCTGAN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdv-dev%2FCTGAN/tags","releases_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/sdv-dev%2FCTGAN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sdv-dev%2FCTGAN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sdv-dev","download_url":"https://codeload.github.com/sdv-dev/CTGAN/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251054336,"owners_count":21529131,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-generation","generative-adversarial-network","synthetic-data","synthetic-data-generation","tabular-data"],"created_at":"2024-08-02T17:01:02.482Z","updated_at":"2025-05-02T19:32:39.527Z","avatar_url":"https://github.com/sdv-dev.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\u003cbr/\u003e\n\u003cp align=\"center\"\u003e\n    \u003ci\u003eThis repository is part of \u003ca href=\"https://sdv.dev\"\u003eThe Synthetic Data Vault Project\u003c/a\u003e, a project from \u003ca href=\"https://datacebo.com\"\u003eDataCebo\u003c/a\u003e.\u003c/i\u003e\n\u003c/p\u003e\n\n[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)\n[![PyPI Shield](https://img.shields.io/pypi/v/ctgan.svg)](https://pypi.python.org/pypi/ctgan)\n[![Unit Tests](https://github.com/sdv-dev/CTGAN/actions/workflows/unit.yml/badge.svg)](https://github.com/sdv-dev/CTGAN/actions/workflows/unit.yml)\n[![Downloads](https://pepy.tech/badge/ctgan)](https://pepy.tech/project/ctgan)\n[![Coverage 
Status](https://codecov.io/gh/sdv-dev/CTGAN/branch/main/graph/badge.svg)](https://codecov.io/gh/sdv-dev/CTGAN)\n\n\u003cdiv align=\"left\"\u003e\n\u003cbr/\u003e\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://github.com/sdv-dev/CTGAN\"\u003e\n\u003cimg align=\"center\" width=40% src=\"https://github.com/sdv-dev/SDV/blob/stable/docs/images/CTGAN-DataCebo.png\"\u003e\u003c/img\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n# Overview\n\nCTGAN is a collection of Deep Learning based synthetic data generators for single table data, which are able to learn from real data and generate synthetic data with high fidelity.\n\n| Important Links                               |                                                                      |\n| --------------------------------------------- | -------------------------------------------------------------------- |\n| :computer: **[Website]**                      | Check out the SDV Website for more information about our overall synthetic data ecosystem.|\n| :orange_book: **[Blog]**                      | A deeper look at open source, synthetic data creation and evaluation.|\n| :book: **[Documentation]**                    | Quickstarts, User and Development Guides, and API Reference.         |\n| :octocat: **[Repository]**                    | The link to the GitHub Repository of this library.                   |\n| :keyboard: **[Development Status]**           | This software is in its Pre-Alpha stage.                             |\n| [![][Slack Logo] **Community**][Community]    | Join our Slack Workspace for announcements and discussions.          
|\n\n[Website]: https://sdv.dev\n[Blog]: https://datacebo.com/blog\n[Documentation]: https://bit.ly/sdv-docs\n[Repository]: https://github.com/sdv-dev/CTGAN\n[License]: https://github.com/sdv-dev/CTGAN/blob/main/LICENSE\n[Development Status]: https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha\n[Slack Logo]: https://github.com/sdv-dev/SDV/blob/stable/docs/images/slack.png\n[Community]: https://bit.ly/sdv-slack-invite\n\nCurrently, this library implements the **CTGAN** and **TVAE** models described in the [Modeling Tabular data using Conditional GAN](https://arxiv.org/abs/1907.00503) paper, presented at the 2019 NeurIPS conference.\n\n# Install\n\n## Use CTGAN through the SDV library\n\n:warning: If you're just getting started with synthetic data, we recommend installing the SDV library which provides user-friendly APIs for accessing CTGAN. :warning:\n\nThe SDV library provides wrappers for preprocessing your data as well as additional usability features like constraints. See the [SDV documentation](https://bit.ly/sdv-docs) to get started.\n\n## Use the CTGAN standalone library\n\nAlternatively, you can also install and use **CTGAN** directly, as a standalone library:\n\n**Using `pip`:**\n\n```bash\npip install ctgan\n```\n\n**Using `conda`:**\n\n```bash\nconda install -c pytorch -c conda-forge ctgan\n```\n\nWhen using the CTGAN library directly, you may need to manually preprocess your data into the correct format, for example:\n\n* Continuous data must be represented as floats\n* Discrete data must be represented as ints or strings\n* The data should not contain any missing values\n\n# Usage Example\n\nIn this example we load the [Adult Census Dataset](https://archive.ics.uci.edu/ml/datasets/adult)* which is a built-in demo dataset. 
We use CTGAN to learn from the real data and then generate some synthetic data.\n\n```python3\nfrom ctgan import CTGAN\nfrom ctgan import load_demo\n\nreal_data = load_demo()\n\n# Names of the columns that are discrete\ndiscrete_columns = [\n    'workclass',\n    'education',\n    'marital-status',\n    'occupation',\n    'relationship',\n    'race',\n    'sex',\n    'native-country',\n    'income'\n]\n\nctgan = CTGAN(epochs=10)\nctgan.fit(real_data, discrete_columns)\n\n# Create synthetic data\nsynthetic_data = ctgan.sample(1000)\n```\n\n*For more information about the dataset see:\nDua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml].\nIrvine, CA: University of California, School of Information and Computer Science.\n\n# Join our community\n\nJoin our [Slack channel](https://bit.ly/sdv-slack-invite) to discuss more about CTGAN and synthetic data. If you find a bug or have a feature request, you can also [open an issue](https://github.com/sdv-dev/CTGAN/issues) on our GitHub.\n\n**Interested in contributing to CTGAN?** Read our [Contribution Guide](CONTRIBUTING.rst) to get started.\n\n# Citing CTGAN\n\nIf you use CTGAN, please cite the following work:\n\n*Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni.* **Modeling Tabular data using Conditional GAN**. NeurIPS, 2019.\n\n```LaTeX\n@inproceedings{ctgan,\n  title={Modeling Tabular data using Conditional GAN},\n  author={Xu, Lei and Skoularidou, Maria and Cuesta-Infante, Alfredo and Veeramachaneni, Kalyan},\n  booktitle={Advances in Neural Information Processing Systems},\n  year={2019}\n}\n```\n\n# Related Projects\nPlease note that these projects are external to the SDV Ecosystem. 
They are not affiliated with or maintained by DataCebo.\n\n* **R Interface for CTGAN**: A wrapper around **CTGAN** that brings the functionalities to **R** users.\nMore details can be found in the corresponding repository: https://github.com/kasaai/ctgan\n* **CTGAN Server CLI**: A package to easily deploy CTGAN onto a remote server. Created by Timothy Pillow @oregonpillow at: https://github.com/oregonpillow/ctgan-server-cli\n\n---\n\n\n\u003cdiv align=\"center\"\u003e\n\u003ca href=\"https://datacebo.com\"\u003e\u003cimg align=\"center\" width=40% src=\"https://github.com/sdv-dev/SDV/blob/stable/docs/images/DataCebo.png\"\u003e\u003c/img\u003e\u003c/a\u003e\n\u003c/div\u003e\n\u003cbr/\u003e\n\u003cbr/\u003e\n\n[The Synthetic Data Vault Project](https://sdv.dev) was first created at MIT's [Data to AI Lab](\nhttps://dai.lids.mit.edu/) in 2016. After 4 years of research and traction with enterprise, we\ncreated [DataCebo](https://datacebo.com) in 2020 with the goal of growing the project.\nToday, DataCebo is the proud developer of SDV, the largest ecosystem for\nsynthetic data generation \u0026 evaluation. It is home to multiple libraries that support synthetic\ndata, including:\n\n* 🔄 Data discovery \u0026 transformation. Reverse the transforms to reproduce realistic data.\n* 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular,\n  multi table and time series data.\n* 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data\n  generation models.\n\n[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully\nintegrated solution and your one-stop shop for synthetic data. 
Or, use the standalone libraries\nfor specific needs.\n","funding_links":[],"categories":["Data-driven methods","Synthetic Data Generators"],"sub_categories":["Tabular"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsdv-dev%2FCTGAN","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsdv-dev%2FCTGAN","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsdv-dev%2FCTGAN/lists"}