Projects in Awesome Lists tagged with synthetic-data
A curated list of projects in awesome lists tagged with synthetic-data.
https://github.com/stefan-jansen/machine-learning-for-trading
Code for Machine Learning for Algorithmic Trading, 2nd edition.
artificial-intelligence data-science deep-learning finance investment investment-strategies machine-learning ml4t-workflow synthetic-data trading trading-agent trading-strategies
Last synced: 13 May 2025
https://github.com/datajuicer/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
data data-analysis data-pipeline data-processing data-science data-visualization foundation-models instruction-tuning large-language-models llm llms multi-modal pre-training synthetic-data
Last synced: 08 Nov 2025
https://github.com/lk-geimfari/mimesis
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
data dataframe datascience dummy factory factory-boy fake fixtures generator json-generator mimesis mock pandas polars pytest-plugin python schema syntetic synthetic-data testing
Last synced: 28 Dec 2025
https://github.com/modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
chinese data-analysis data-science data-visualization dataset gpt gpt-4 instruction-tuning large-language-models llama llava llm llms multi-modal nlp opendata pre-training pytorch streamlit synthetic-data
Last synced: 13 May 2025
https://github.com/nucleuscloud/neosync
Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.
benthos docker etl faker fine-tuning golang kubernetes mysql nextjs open-source orchestration postgresql reactjs self-hosted synthetic-data synthetic-data-generation test-data-generator testing typescript
Last synced: 12 May 2025
https://github.com/kiln-ai/kiln
The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
ai chain-of-thought collaboration dataset-generation evals evaluation fine-tuning machine-learning macos ml ollama openai prompt prompt-engineering python rlhf synthetic-data windows
Last synced: 23 Apr 2025
https://github.com/DLR-RM/BlenderProc
A procedural Blender pipeline for photorealistic training image generation
3d-engines 3d-front-dataset 3d-graphics 3d-reconstruction blender blender-installation blender-pipeline camera-positions camera-sampling computer-graphics depth-images pose-estimation python rendering segmentation suncg-scene synthetic synthetic-data
Last synced: 09 May 2025
https://github.com/pgmpy/pgmpy
Python Library for Causal and Probabilistic Modeling using Bayesian Networks
bayesian-networks causal-discovery causal-identification causal-inference causal-models causal-validation mixed-data probabilistic-inference python simulation synthetic-data
Last synced: 15 May 2025
https://github.com/sdv-dev/sdv
Synthetic data generation for tabular data
data-generation deep-learning gan gans generative-adversarial-network generative-ai generative-model generativeai machine-learning multi-table relational-datasets sdv synthetic-data synthetic-data-generation time-series
Last synced: 11 May 2025
https://github.com/argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
ai huggingface llms openai python rlaif rlhf synthetic-data synthetic-dataset-generation
Last synced: 11 Apr 2025
https://github.com/synthetichealth/synthea
Synthetic Patient Population Simulator
fhir health-data simulation synthea synthetic-data synthetic-population
Last synced: 13 May 2025
https://github.com/hitsz-ids/synthetic-data-generator
SDG is a specialized framework designed to generate high-quality structured tabular data.
agent data-generator deep-learning gan generative-ai llm machine-learning privacy synthetic-data tabular-data
Last synced: 13 May 2025
https://github.com/sdv-dev/SDV
Synthetic data generation for tabular data
data-generation deep-learning gan gans generative-adversarial-network generative-ai generative-model generativeai machine-learning multi-table relational-datasets sdv synthetic-data synthetic-data-generation time-series
Last synced: 26 Mar 2025
https://synthetichealth.github.io/synthea/
Synthetic Patient Population Simulator
fhir health-data simulation synthea synthetic-data synthetic-population
Last synced: 04 Apr 2025
https://github.com/unrealcv/unrealcv
UnrealCV: Connecting Computer Vision to Unreal Engine
computer-vision embodied-ai machine-learning simulation synthetic-data ue4 virtual-worlds
Last synced: 25 Sep 2025
https://github.com/ydataai/ydata-synthetic
Synthetic data generators for tabular and time-series data
datageneration datagenerator deep-learning gan gan-architectures gans generative-adversarial-network machine-learning python3 pytorch synthetic-data tensorflow2 time-series timeseries training-data
Last synced: 13 May 2025
https://github.com/huggingface/aisheets
Build, enrich, and transform datasets using AI models with no code
ai llm-evaluation llms nocode oss synthetic-data
Last synced: 14 Oct 2025
https://github.com/shuttle-hq/synth
The Declarative Data Generator
data-generation hacktoberfest json postgres realistic-data rust synthetic-data test-data-generator
Last synced: 15 May 2025
https://github.com/sdv-dev/ctgan
Conditional GAN for generating synthetic tabular data.
data-generation generative-adversarial-network synthetic-data synthetic-data-generation tabular-data
Last synced: 13 May 2025
https://github.com/sdv-dev/CTGAN
Conditional GAN for generating synthetic tabular data.
data-generation generative-adversarial-network synthetic-data synthetic-data-generation tabular-data
Last synced: 02 May 2025
https://github.com/GreenmaskIO/greenmask
PostgreSQL database anonymization and synthetic data generation tool
anonymization deterministic dump golang masking obfuscation obfuscator postgresql restore s3 security security-tools staging synthetic-data transform
Last synced: 05 Apr 2025
https://github.com/datadreamer-dev/DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
alignment deep-learning fine-tuning gpt instruction-tuning llm llmops llms machine-learning natural-language-processing nlp nlp-library openai python pytorch synthetic-data synthetic-dataset-generation transformers
Last synced: 06 Nov 2025
https://github.com/datadreamer-dev/datadreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
alignment deep-learning fine-tuning gpt instruction-tuning llm llmops llms machine-learning natural-language-processing nlp nlp-library openai python pytorch synthetic-data synthetic-dataset-generation transformers
Last synced: 29 Sep 2025
https://github.com/jofpin/synthBTC
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
bitcoin data-processing monte-carlo-simulation nodejs prediction synthetic-data turbit
Last synced: 27 Sep 2025
https://github.com/Kiln-AI/Kiln
The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
ai chain-of-thought collaboration dataset-generation fine-tuning machine-learning macos ml ollama openai prompt prompt-engineering python rlhf synthetic-data windows
Last synced: 06 Oct 2025
https://github.com/batsresearch/bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
domain-adaptation gpt llm synthetic-data synthetic-dataset-generation task-adaptation zero-shot-learning
Last synced: 21 Apr 2025
https://github.com/magpie-align/magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
alignment dataset gemma llama2 llama3 llm nlp paper phi3 qwen2 supervised-finetuning synthetic-data synthetic-dataset-generation
Last synced: 15 May 2025
https://github.com/jofpin/synthbtc
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
bitcoin data-processing monte-carlo-simulation nodejs prediction synthetic-data turbit
Last synced: 16 May 2025
https://github.com/BatsResearch/bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
domain-adaptation gpt llm synthetic-data synthetic-dataset-generation task-adaptation zero-shot-learning
Last synced: 16 Apr 2025
https://github.com/gretelai/gretel-synthetics
Synthetic data generators for structured and unstructured text, featuring differentially private learning.
artificial-intelligence differential-privacy privacy synthetic-data tensorflow
Last synced: 14 May 2025
https://github.com/SciPhi-AI/synthesizer
A multi-purpose LLM framework for RAG and data creation.
agents ai artificial-intelligence machine-learning synthetic-data
Last synced: 10 Jul 2025
https://github.com/paulbricman/thisrepositorydoesnotexist
A curated list of awesome projects which use Machine Learning to generate synthetic content.
generation-algorithms generative-adversarial-network synthetic-data synthetic-dataset-generation synthetic-images
Last synced: 12 Oct 2025
https://github.com/sdv-dev/copulas
A library to model multivariate data using copulas.
copulas data-generation generative-ai generative-model machine-learning synthetic-data synthetic-data-generation tabular-data
Last synced: 14 May 2025
https://github.com/vanderschaarlab/synthcity
A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
data-augmentation fairness-ml generative-model machine-learning privacy pytorch synthetic-data tabular-data
Last synced: 16 May 2025
https://github.com/plaitpy/plaitpy
plait.py - a fake data modeler
declarative modeling synthetic-data
Last synced: 04 Apr 2025
https://github.com/stackloklabs/promptwright
Generate large synthetic data using an LLM
ai data-science dataset huggingface huggingface-datasets machine-learning synthetic-data synthetic-dataset-generation
Last synced: 16 May 2025
https://github.com/databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
data-generation databricks datagen datageneration datagenerator delta-live-tables deltalake faker pyspark python spark spark-streaming synthetic-data
Last synced: 07 Jul 2025
https://github.com/stacklok/promptwright
Generate large synthetic data using an LLM
ai data-science dataset huggingface huggingface-datasets machine-learning synthetic-data synthetic-dataset-generation
Last synced: 08 Apr 2025
https://github.com/sparkfish/augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
augmentation-pipeline computer-vision crappification data-augmentation data-pipeline deep-neural-networks image-processing machine-learning synthetic-data synthetic-dataset-generation training-data
Last synced: 10 Apr 2025
https://github.com/GeorgeCazenavette/mtt-distillation
Official code for our CVPR '22 paper "Dataset Distillation by Matching Training Trajectories"
artificial-intelligence computer-vision machine-learning synthetic-data
Last synced: 08 May 2025
https://github.com/wenbowen123/iros20-6d-pose-tracking
[IROS 2020] se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains
3d 6d-pose-estimation 6dof-pose 6dof-tracking computer-vision dataset domain-adaptation human-robot-interaction manipulation pose-estimation robot robotics robots synthetic-data synthetic-domains tracking ycb ycb-video
Last synced: 07 May 2025
https://github.com/unity-technologies/synthdet
SynthDet - An end-to-end object detection pipeline using synthetic data
computer-vision deep-learning detection domain-randomization machine-learning object-detection pose-estimation synthetic-data synthetic-dataset synthetic-dataset-generation
Last synced: 14 Aug 2025
https://github.com/microsoft/genalog
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.
data-generation data-science machine-learning ner ocr-recognition python synthetic-data synthetic-data-generation synthetic-images text-alignment
Last synced: 04 Apr 2025
https://github.com/BMW-InnovationLab/BMW-Labeltool-Lite
This repository provides you with an easy-to-use labeling tool for State-of-the-art Deep Learning training purposes. It supports Auto-Labeling.
annotaion auto-label autolabeling bounding-box boundingbox computer-vision deep-learning docker image-annotation inference label labeling-tool labeltool neural-network object-detection smart-labeling synthetic-data tensorflow voc yolov4
Last synced: 07 May 2025
https://github.com/unity-technologies/robotics-object-pose-estimation
A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.
autonomy computer-vision deep-learning machine-learning manipulation model-training motion-planning perception physics-simulation pose-estimation robotics robotics-simulation ros simulation synthetic-data trajectory-generation tutorial unity ur3-robot-arm urdf
Last synced: 06 Apr 2025
https://github.com/tabularis-ai/be_great
A novel approach for synthesizing tabular data using pretrained large language models
data-generation deep-learning synthetic-data synthetic-dataset-generation tabular-data transformers
Last synced: 12 Jun 2025
https://github.com/Unity-Technologies/PeopleSansPeople
Unity's privacy-preserving human-centric synthetic data generator
applied-ml-research billing-5160 computer-vision deep-learning human-activity-recognition human-centric-ml human-pose-estimation icml-2022 labeling object-detection owner-machine-learning perception pose-estimation synthetic-data synthetic-data-generation synthetic-dataset-generation synthetic-datasets transfer-learning unity unity3d
Last synced: 24 Apr 2025
https://github.com/unity-technologies/peoplesanspeople
Unity's privacy-preserving human-centric synthetic data generator
applied-ml-research billing-5160 computer-vision deep-learning human-activity-recognition human-centric-ml human-pose-estimation icml-2022 labeling object-detection owner-machine-learning perception pose-estimation synthetic-data synthetic-data-generation synthetic-dataset-generation synthetic-datasets transfer-learning unity unity3d
Last synced: 06 Apr 2025
https://github.com/tirthajyoti/pydbgen
Random dataframe and database table generator
data-generation data-science database fake-data generator pandas-dataframe python random-generation sqlite sqlite3 synthetic-data synthetic-dataset-generation
Last synced: 05 Apr 2025
https://github.com/milaan9/clustering-datasets
This repository contains the collection of UCI (real-life) datasets and Synthetic (artificial) datasets (with cluster labels and MATLAB files) ready to use with clustering algorithms.
benchmark-datasets cluster cluster-labels clustering clustering-datasets dataset datasets real-world-datasets synthetic-data synthetic-datasets uci uci-dataset uci-machine-learning
Last synced: 03 Jul 2025
https://github.com/fjxmlzn/DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
dataset-generation datasets doppelganger fidelity gan gans generative-adversarial-network privacy synthetic-data synthetic-data-generation synthetic-data-generator synthetic-dataset-generation time-series timeseries
Last synced: 15 May 2025
https://github.com/ZumoLabs/zpy
Synthetic data for computer vision. An open source toolkit using Blender and Python.
ai blender blender-addon computer-vision data deep-learning ml python synthetic synthetic-data
Last synced: 11 May 2025
https://github.com/sdv-dev/TGAN
Generative adversarial training for generating synthetic tabular data.
generative-adversarial-network synthesizing-tabular-data synthetic-data tabular-data
Last synced: 02 May 2025
https://github.com/sdv-dev/tgan
Generative adversarial training for generating synthetic tabular data.
generative-adversarial-network synthesizing-tabular-data synthetic-data tabular-data
Last synced: 06 Apr 2025
https://github.com/gszfwsb/NCFM
Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function" (NCFM) in CVPR 2025.
computer-vision data-centric-ai dataset-distillation synthetic-data
Last synced: 01 Apr 2025
https://github.com/sdv-dev/sdgym
Benchmarking synthetic data generation methods.
benchmark deep-learning generative-adversarial-network generative-ai generative-models sdgym-synthesizers synthetic-data synthetic-data-vault tabular-data
Last synced: 15 May 2025
https://github.com/sdv-dev/SDGym
Benchmarking synthetic data generation methods.
benchmark deep-learning generative-adversarial-network generative-ai generative-models sdgym-synthesizers synthetic-data synthetic-data-vault tabular-data
Last synced: 02 May 2025
https://github.com/expectedparrot/edsl
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
anthropic data-labeling deepinfra domain-specific-language experiments llama2 llm llm-agent llm-framework llm-inference market-research mixtral open-source openai python social-science surveys synthetic-data
Last synced: 15 May 2025
https://github.com/worldbank/REaLTabFormer
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
data-generation deep-learning gpt gpt-2 seq2seq-model synthetic-data synthetic-dataset-generation tabular-data transformers
Last synced: 17 Aug 2025
https://github.com/sdv-dev/SDMetrics
Metrics to evaluate quality and efficacy of synthetic datasets.
metrics quality synthetic-data
Last synced: 02 May 2025
https://github.com/sdv-dev/sdmetrics
Metrics to evaluate quality and efficacy of synthetic datasets.
metrics quality synthetic-data
Last synced: 14 Apr 2025
https://github.com/worldbank/realtabformer
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
data-generation deep-learning gpt gpt-2 seq2seq-model synthetic-data synthetic-dataset-generation tabular-data transformers
Last synced: 04 Jan 2026
https://github.com/jrieke/shape-detection
🟣 Object detection of abstract shapes with neural networks
dataset deep-learning keras machine-learning machine-learning-tutorials neural-networks object-detection object-recognition synthetic-data tutorial
Last synced: 09 Apr 2025
https://github.com/project-agml/agml
AgML is a centralized framework for agricultural machine learning. AgML provides access to public agricultural datasets for common agricultural deep learning tasks, with standard benchmarks and pretrained models, as well as the ability to generate synthetic data and annotations.
agriculture computer-vision dataset deep-learning image-classification object-detection pytorch semantic-segmentation synthetic-data
Last synced: 15 May 2025
https://github.com/Project-AgML/AgML
AgML is a centralized framework for agricultural machine learning. AgML provides access to public agricultural datasets for common agricultural deep learning tasks, with standard benchmarks and pretrained models, as well as the ability to generate synthetic data and annotations.
agriculture computer-vision dataset deep-learning image-classification object-detection pytorch semantic-segmentation synthetic-data
Last synced: 07 May 2025
https://github.com/firmai/datagene
DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)
data-structures data-transformations dataset-generation dataset-similarity decomposition distance-calculations distance-measures encoding finance model-checking predictive-maintenance similarity-measures similarity-score synthesizers synthetic-data synthetic-dataset-generation testing-framework transformation-recipes
Last synced: 11 Jul 2025
https://github.com/TonicAI/masquerade
A Postgres Proxy to Mask Data in Realtime
fake-data postgres postgresql synthetic-data
Last synced: 23 Aug 2025
https://github.com/ndrplz/surround_vehicles_awareness
Learn to map surrounding vehicles onto a bird's eye view of the scene.
adas bird-eye deep-learning self-driving-car synthetic-data
Last synced: 23 Oct 2025
https://github.com/alexandervnikitin/tsgm
Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets) NeurIPS'24
augmentations data-augmentation data-science datasets deep-learning generative-model keras machine-learning python synthetic-data synthetic-time-series tensorflow2 time-series vae
Last synced: 06 Apr 2025
https://github.com/zjrwtx/sft-data-builder
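Use free large-model APIs combined with your private-domain data to generate SFT training data (completely free of charge). Supports the training data formats of tools such as llamafactory. Synthetic data.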
agents alpaca cot datagene gpt40 llm mllm multiagents o1 python react sharegpt slm synthetic-data tailwindcss visionlanguagemodel
Last synced: 05 Apr 2025
https://github.com/mhliao/synthtext3d
Project page of SynthText3D
scene-text-detection scene-text-images synthetic-data synthetic-data-generator
Last synced: 03 May 2025
https://github.com/aimclub/BAMT
Repository of a data modeling and analysis tool based on Bayesian networks
bayesian-networks mixed-data parameters-learning structure-learning synthetic-data
Last synced: 02 May 2025
https://github.com/sdv-dev/deepecho
Synthetic Data Generation for mixed-type, multivariate time series.
data-generation deep-learning generative-adversarial-network sdv synthetic-data synthetic-data-generation time-series
Last synced: 16 May 2025
https://github.com/sdv-dev/DeepEcho
Synthetic Data Generation for mixed-type, multivariate time series.
data-generation deep-learning generative-adversarial-network sdv synthetic-data synthetic-data-generation time-series
Last synced: 04 Apr 2025
https://github.com/khawar-islam/diffuseMix
Official PyTorch implementation of DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models (CVPR'2024)
cutmix data-augmentation diffusion-models generative-data-augmentation image-classification mixup synthetic-data transfer-learning
Last synced: 15 Aug 2025
https://github.com/stefan-jansen/synthetic-data-for-finance
Material for QuantUniversity talk on Synthetic Data Generation for Finance.
algorithmic-trading finance generative-adversarial-network machine-learning synthetic-data
Last synced: 12 Apr 2025
https://github.com/microsoft/dpsda
Private Evolution: Generating DP Synthetic Data without Training [ICLR 2024, ICML 2024 Spotlight]
differential-privacy foundation-models private-evolution synthetic-data training-free
Last synced: 04 Jul 2025
https://github.com/firmai/mtss-gan
MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)
adverserial finance generative-adversarial-network model-validation multivariate-data multivariate-timeseries similarity-measures simulation stress-test synthetic-data synthetic-dataset-generation time-series
Last synced: 02 May 2025
https://github.com/microsoft/DPSDA
Private Evolution: Generating DP Synthetic Data without Training [ICLR 2024, ICML 2024 Spotlight]
differential-privacy foundation-models private-evolution synthetic-data training-free
Last synced: 04 Apr 2025
https://github.com/Baukebrenninkmeijer/table-evaluator
Evaluate real and synthetic datasets against each other
data data-evaluation evaluation generation synthetic synthetic-data table-evaluator
Last synced: 02 May 2025
https://github.com/justchenhao/IAug_CDNet
Official PyTorch Implementation of Adversarial Instance Augmentation for Building Change Detection in Remote Sensing Images.
bi-temporal-images building-change-detection cdnet change-detection instance-augmentation remote-se synthetic-data
Last synced: 11 May 2025
https://github.com/barseghyanartur/faker-file
Create files with fake data. In many formats. With no effort.
factories fake-data fake-data-generator faker faker-file files synthetic-data synthetic-data-generator synthetic-file synthetic-file-generator synthetic-files synthetic-files-generator test-data test-data-generator test-file test-file-generator test-files test-files-generator testing
Last synced: 04 Apr 2025
https://github.com/bmw-innovationlab/sordi-ai-evaluation-gui
This repository allows you to evaluate a trained computer vision model and get general information and evaluation metrics with little configuration.
ai bmw computer-vision dataset deeplearning docker evaluation evaluation-framework no-code python rest-api sordi synthetic-data tensorflow
Last synced: 02 Jul 2025
https://github.com/ryoungj/BoLT
Code for "Reasoning to Learn from Latent Thoughts"
language-model latent-variable-models pretraining self-improvement synthetic-data
Last synced: 04 Oct 2025
https://github.com/jason718/game-feature-learning
Code for paper "Cross-Domain Self-supervised Multi-task Feature Learning using Synthetic Imagery", Ren et al., CVPR'18
computer-vision deep-learning domain-adaptation representation-learning self-supervised synthetic-data
Last synced: 10 Jul 2025
https://github.com/nvidia-ai-iot/synthetic_data_generation_training_workflow
Workflow for generating synthetic data and training CV models.
isaac-sim jetson object-detection omniverse robotics ros2 synthetic-data transfer-learning warehouse
Last synced: 28 Jun 2025
https://github.com/dmey/synthia
📈 🐍 Multidimensional synthetic data generation with Copula and fPCA models in Python
augmentation climate copula data-augmentation data-generation data-generator data-modelling data-science dependency-analysis dependency-modeling finance fpca functional-data machine-learning oversampling principal-component-analysis statistics synthetic-data weather xarray
Last synced: 02 May 2025
https://github.com/spiros/tofu
Tofu is a Python tool for generating synthetic UK Biobank data.
Last synced: 09 Apr 2025
https://github.com/gretelai/gretel-python-client
The Gretel Python Client allows you to interact with the Gretel REST API.
datascience machine-learning privacy privacy-enhancing-technologies stream-processing synthetic-data
Last synced: 04 Apr 2025
https://github.com/howiehwong/unigen
[ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models
benchmark dataset dataset-generation large-language-models llm synthetic-data toolkit
Last synced: 09 Apr 2025
https://github.com/Baukebrenninkmeijer/On-the-Generation-and-Evaluation-of-Synthetic-Tabular-Data-using-GANs
Repository for the results of my master's thesis about the generation and evaluation of synthetic data using GANs
data-evaluation data-synthesis gan generative-adversarial-networks synthetic-data synthetic-dataset-generation tabular-data
Last synced: 02 May 2025
https://github.com/sodascience/metasyn
Transparent and privacy-friendly synthetic data generation
metadata open-data privacy synthetic-data
Last synced: 07 Apr 2025
https://github.com/sunchang0124/dp_cgans
A library to generate synthetic tabular or RDF data using Conditional Generative Adversarial Networks (GANs) combined with Differential Privacy techniques.
differential-privacy gan synthesizer synthetic-data
Last synced: 07 Apr 2025
https://chenshuang-zhang.github.io/imagenet_d/
[CVPR 2024 Highlight] ImageNet-D
benchmark computer-vision dataset diffusion-models generative-models image-recognition imagenet large-language-model multi-modality out-of-distribution recognition robustness stable-diffusion synthetic-data text-to-image-synthesis vision-language-model
Last synced: 31 Mar 2025
https://github.com/gretelai/synthetic-data-genomics
Proof of concept code from Gretel.ai and Illumina using generative neural networks to create synthetic versions of mouse genotype and phenotype data.
generative-model genomics privacy-enhancing-technologies synthetic-data
Last synced: 11 Jul 2025
https://github.com/dbt-labs/jaffle-shop-generator
🥪🏭 A simple CLI for generating synthetic Jaffle Shop data.
analytics-engineering faker synthetic-data synthetic-data-generator
Last synced: 01 May 2025
https://github.com/vincentkoc/synthetic-user-research
Example Notebook for Synthetic User Research with Persona Prompting and Autonomous Agents
autogen autonomous-agents research synthetic-data
Last synced: 22 Mar 2025
https://github.com/vincentkoc/tiny_qa_benchmark_pp
Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing & regression-catching.
benchmark dataset evaluation huggingface-datasets litellm llm llm-testing llmops qa-dataset smoke-test synthetic-data tinybenchmarks
Last synced: 12 Jun 2025
https://github.com/unity-technologies/anthronet
Unity's Privacy-Preserving Novel Human Body Model Trained Solely on Synthetic Data and Corresponding Dense Anthropometric Measurements
applied-machine-learning applied-ml-research billing-7054 computer-graphics computer-vision computervision deep-learning deep-neural-networks human-activity-recognition human-centric-ml human-pose-estimation owner-ai synthetic-data synthetic-data-generation synthetic-dataset-generation unity3d
Last synced: 19 Oct 2025
https://github.com/hicservices/synthehr
Library and CLI for randomly generating medical data like you might get out of an Electronic Health Records (EHR) system
cli dataset ehr electronic-health-records hospital-admission nuget patient synthetic-data testing-tools tests
Last synced: 26 Jul 2025