Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with synthetic-dataset-generation
A curated list of projects in awesome lists tagged with synthetic-dataset-generation.
https://github.com/eladlev/autoprompt
A framework for prompt tuning using Intent-based Prompt Calibration
prompt-engineering prompt-tuning synthetic-dataset-generation
Last synced: 19 Dec 2024
https://github.com/Eladlev/AutoPrompt
A framework for prompt tuning using Intent-based Prompt Calibration
prompt-engineering prompt-tuning synthetic-dataset-generation
Last synced: 30 Oct 2024
https://github.com/unity-technologies/com.unity.perception
Perception toolkit for sim2real training and validation in Unity
computer-vision deep-learning detection domain-randomization machine-learning object-detection perception pose-estimation segmentation synthetic-dataset-generation
Last synced: 19 Dec 2024
https://github.com/Unity-Technologies/com.unity.perception
Perception toolkit for sim2real training and validation in Unity
computer-vision deep-learning detection domain-randomization machine-learning object-detection perception pose-estimation segmentation synthetic-dataset-generation
Last synced: 06 Nov 2024
https://github.com/datadreamer-dev/DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
alignment deep-learning fine-tuning gpt instruction-tuning llm llmops llms machine-learning natural-language-processing nlp nlp-library openai python pytorch synthetic-data synthetic-dataset-generation transformers
Last synced: 13 Oct 2024
https://github.com/batsresearch/bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
domain-adaptation gpt llm synthetic-data synthetic-dataset-generation task-adaptation zero-shot-learning
Last synced: 09 Nov 2024
https://github.com/argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
ai huggingface llms openai python rlaif rlhf synthetic-data synthetic-dataset-generation
Last synced: 18 Dec 2024
https://github.com/BatsResearch/bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
domain-adaptation gpt llm synthetic-data synthetic-dataset-generation task-adaptation zero-shot-learning
Last synced: 08 Nov 2024
https://github.com/paulbricman/thisrepositorydoesnotexist
A curated list of awesome projects which use Machine Learning to generate synthetic content.
generation-algorithms generative-adversarial-network synthetic-data synthetic-dataset-generation synthetic-images
Last synced: 17 Nov 2024
https://github.com/nvidia/dataset_synthesizer
NVIDIA Deep learning Dataset Synthesizer (NDDS)
computer-vision deep-learning domain-randomization object-detection pose-estimation synthetic-dataset-generation
Last synced: 15 Dec 2024
https://github.com/NVIDIA/Dataset_Synthesizer
NVIDIA Deep learning Dataset Synthesizer (NDDS)
computer-vision deep-learning domain-randomization object-detection pose-estimation synthetic-dataset-generation
Last synced: 26 Oct 2024
https://github.com/magpie-align/magpie
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
alignment dataset gemma llama2 llama3 llm nlp paper phi3 qwen2 supervised-finetuning synthetic-data synthetic-dataset-generation
Last synced: 21 Dec 2024
https://github.com/unity-technologies/synthdet
SynthDet - An end-to-end object detection pipeline using synthetic data
computer-vision deep-learning detection domain-randomization machine-learning object-detection pose-estimation synthetic-data synthetic-dataset synthetic-dataset-generation
Last synced: 15 Dec 2024
https://github.com/sparkfish/augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
augmentation-pipeline computer-vision crappification data-augmentation data-pipeline deep-neural-networks image-processing machine-learning synthetic-data synthetic-dataset-generation training-data
Last synced: 17 Dec 2024
https://github.com/unity-technologies/peoplesanspeople
Unity's privacy-preserving human-centric synthetic data generator
applied-ml-research billing-5160 computer-vision deep-learning human-activity-recognition human-centric-ml human-pose-estimation icml-2022 labeling object-detection owner-machine-learning perception pose-estimation synthetic-data synthetic-data-generation synthetic-dataset-generation synthetic-datasets transfer-learning unity unity3d
Last synced: 15 Dec 2024
https://github.com/Unity-Technologies/PeopleSansPeople
Unity's privacy-preserving human-centric synthetic data generator
applied-ml-research billing-5160 computer-vision deep-learning human-activity-recognition human-centric-ml human-pose-estimation icml-2022 labeling object-detection owner-machine-learning perception pose-estimation synthetic-data synthetic-data-generation synthetic-dataset-generation synthetic-datasets transfer-learning unity unity3d
Last synced: 10 Nov 2024
https://github.com/tirthajyoti/pydbgen
Random dataframe and database table generator
data-generation data-science database fake-data generator pandas-dataframe python random-generation sqlite sqlite3 synthetic-data synthetic-dataset-generation
Last synced: 21 Dec 2024
https://github.com/fjxmlzn/DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
dataset-generation datasets doppelganger fidelity gan gans generative-adversarial-network privacy synthetic-data synthetic-data-generation synthetic-data-generator synthetic-dataset-generation time-series timeseries
Last synced: 19 Nov 2024
https://github.com/worldbank/realtabformer
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
data-generation deep-learning gpt gpt-2 seq2seq-model synthetic-data synthetic-dataset-generation tabular-data transformers
Last synced: 20 Dec 2024
https://github.com/worldbank/REaLTabFormer
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.
data-generation deep-learning gpt gpt-2 seq2seq-model synthetic-data synthetic-dataset-generation tabular-data transformers
Last synced: 17 Dec 2024
https://github.com/firmai/datagene
DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)
data-structures data-transformations dataset-generation dataset-similarity decomposition distance-calculations distance-measures encoding finance model-checking predictive-maintenance similarity-measures similarity-score synthesizers synthetic-data synthetic-dataset-generation testing-framework transformation-recipes
Last synced: 12 Nov 2024
https://github.com/rozumden/DeFMO
[CVPR 2021] DeFMO: Deblurring and Shape Recovery of Fast Moving Objects
cvpr2021 dataset deblurring deep-learning disentangled-representations fast-moving-objects motion-blur semi-supervised-learning synthetic-dataset-generation temporal-super-resolution
Last synced: 10 Nov 2024
https://github.com/squeezeailab/llm2llm
[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
data-augmentation llama llama2 llm llms natural-language-processing nlp synthetic-dataset-generation transformer
Last synced: 05 Dec 2024
https://github.com/nvidia/dataset_utilities
NVIDIA Dataset Utilities (NVDU)
deep-learning pose-estimation synthetic-dataset-generation visualization
Last synced: 29 Oct 2024
https://github.com/isarandi/synthetic-occlusion
Synthetic Occlusion Augmentation
computer-vision data-augmentation occlusion python synthetic-dataset-generation
Last synced: 16 Nov 2024
https://github.com/jtheiner/LegoBrickClassification
Repository to identify Lego bricks automatically only using images
dataset image-classification lego synthetic-dataset-generation transfer-learning
Last synced: 13 Nov 2024
https://github.com/firmai/mtss-gan
MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)
adverserial finance generative-adversarial-network model-validation multivariate-data multivariate-timeseries similarity-measures simulation stress-test synthetic-data synthetic-dataset-generation time-series
Last synced: 12 Nov 2024
https://github.com/VinAIResearch/Dataset-Diffusion
Dataset Diffusion: Diffusion-based Synthetic Data Generation for Pixel-Level Semantic Segmentation (NeurIPS2023)
diffusion-models semantic-segmentation synthetic-dataset-generation
Last synced: 04 Nov 2024
https://github.com/astorfi/cor-gan
🔓 COR-GAN: Correlation-Capturing Convolutional Neural Networks for Generating Synthetic Healthcare Records
deep-learning generative-adversarial-network healthcare python pytorch synthetic-dataset-generation
Last synced: 22 Oct 2024
https://github.com/Baukebrenninkmeijer/On-the-Generation-and-Evaluation-of-Synthetic-Tabular-Data-using-GANs
Repository for the results of my master's thesis on the generation and evaluation of synthetic data using GANs
data-evaluation data-synthesis gan generative-adversarial-networks synthetic-data synthetic-dataset-generation tabular-data
Last synced: 12 Nov 2024
https://github.com/arkanivasarkar/eeg-data-augmentation-using-variational-autoencoder
Improving performance of motor imagery classification using variational-autoencoder and synthetic EEG signals
data-augmentation eeg-signals eegnet keras keras-tensorflow motor-imagery-classification synthetic-dataset-generation variational-autoencoder
Last synced: 06 Dec 2024
https://github.com/unity-technologies/anthronet
Unity's Privacy-Preserving Novel Human Body Model Trained Solely on Synthetic Data and Corresponding Dense Anthropometric Measurements
applied-machine-learning applied-ml-research billing-7054 computer-graphics computer-vision computervision deep-learning deep-neural-networks human-activity-recognition human-centric-ml human-pose-estimation owner-ai synthetic-data synthetic-data-generation synthetic-dataset-generation unity3d
Last synced: 07 Oct 2024
https://github.com/zakimjz/IBMGenerator
IBM Synthetic Data Generator for Itemsets and Sequences
itemset-mining sequence-datasets sequence-mining synthetic-dataset-generation
Last synced: 30 Oct 2024
https://github.com/subake/DAPS3D
DAPS3D: Domain Adaptive Projective Segmentation of 3D LiDAR Point Clouds
lidar lidar-point-cloud projective-semantic-segmentation rellis-3d semantic-kitti semantic-segmentation synthetic-dataset-generation
Last synced: 28 Oct 2024
https://github.com/maxvandenhoven/blenderline
A Blender pipeline for generating synthetic images of production lines
blender blenderline conveyor-belt object-detection production-line python semantic-segmentation synthetic-data synthetic-dataset-generation
Last synced: 07 Nov 2024
https://github.com/georgeoshardo/SyMBac
Accurate segmentation of bacterial microscope images using deep learning on synthetically generated image data.
biology deep-learning image-processing machine-learning microscopy segmentation synthetic-biology synthetic-data synthetic-dataset-generation
Last synced: 14 Nov 2024
https://github.com/openlayer-ai/examples-gallery
Sample notebooks that use the Openlayer Python API
ai data-centric machine-learning machine-learning-operations ml ml-infrastructure mlops model-deployment model-explainability synthetic-dataset-generation tensorflow unbox
Last synced: 10 Nov 2024
https://github.com/rubixml/colors
Demonstrating unsupervised clustering using the K Means algorithm and synthetic color data.
clustering contingency-table cross-validation k-means k-means-clustering k-means-plus-plus kmeans machine-learning machine-learning-tutorial php php-machine-learning php-ml rubix-ml synthetic-data synthetic-dataset-generation tutorial unsupervised-learning
Last synced: 04 Dec 2024
https://github.com/mantyni/Multi-object-detection-lego
Multi-object detection of Lego bricks in a dataset generated using Blender.
blender blender-python multi-object-tracking object-detection ssd synthetic-dataset-generation yolov5
Last synced: 13 Nov 2024
https://github.com/majsylw/microbial-counting-review
A list of useful resources on microbial colony classification and detection, such as datasets, papers, and links to projects
agar-plate cfu classification datasets density-estimation microbial-colonies microbial-counting microbial-detection microbiology object-detection synthetic-dataset-generation
Last synced: 04 Dec 2024
https://github.com/stefanrmmr/differentially_private_synthetic_data
Differentially Private Synthetic Data Generation [DP-SDG] - Experimental Setups & Knowledge Base - WORK IN PROGRESS
data-analysis data-anonymity data-anonymization differential-privacy differentially-private dpgan dpsdg dpwgan dsgvokonform gdpr pategan privacy privacy-enhancing-technologies privacy-preserving-machine-learning privacy-preserving-synthetic-data quasi-identifiers sensitive-data-security synthetic-data synthetic-data-generation synthetic-dataset-generation
Last synced: 27 Oct 2024
https://github.com/SidharthMacherla/conjurer
R Package to generate synthetic data.
dummy-data-generator r rpackage synthetic-data synthetic-data-generation synthetic-dataset-generation synthetic-tabular-data
Last synced: 12 Nov 2024
https://github.com/openlayer-ai/openlayer-python
The official Python library for Openlayer, the Continuous Model Improvement Platform for AI. 📈
ai data-centric machine-learning machine-learning-operations ml ml-infrastructure mlops model-deployment model-explainability openlayer synthetic-dataset-generation tensorflow unbox
Last synced: 14 Oct 2024
https://github.com/rajeevatla/supercongan
Training a GAN using superconductivity data
gan generative-adversarial-network superconductivity superconductivity-data superconductivity-dataset superconductor-simulation-data superconductors synthetic-dataset-generation tabular-data
Last synced: 14 Nov 2024
https://github.com/amazon-science/synthesizrr
Synthesizing realistic and diverse text-datasets from augmented LLMs
deep-learning language-modeling natural-language-generation retrieval-augmented retrieval-augmented-generation synthetic-data synthetic-dataset-generation
Last synced: 12 Nov 2024
https://github.com/amanpriyanshu/generatorpromptkit
GeneratorPromptKit: A Python Library and Framework for Automated Generator Prompting and Dataset Generation
augmentation automated-prompt-engineering data data-augmentation data-science dataset dataset-generation diverse-data llm llm-training llms prompt-engineering synthetic-data synthetic-dataset-generation
Last synced: 28 Oct 2024
https://github.com/travvy88/doge
Synthetic Data Generator for Document AI. Creates document images annotated with text and bounding boxes of each word. Images contain headings, tables, and paragraphs with different formatting and fonts.
ai bounding-boxes dataset document synthetic-dataset-generation
Last synced: 01 Dec 2024
https://github.com/mosure/bevy_zeroverse
bevy zeroverse synthetic dataset generator
bevy reconstruction synthetic-dataset-generation zeroverse
Last synced: 09 Oct 2024
https://github.com/hrolive/unreal-engine-for-remote-visualization-and-machine-learning
In-depth training on using Unreal Engine as a data generator and integrating it into a simple ML workflow, in one of the leading supercomputing centres.
data-generator hpc machine-learning synthetic-data synthetic-dataset-generation unreal-engine webrtc
Last synced: 09 Nov 2024
https://github.com/daspartho/distillclassifier
Easily generate synthetic data for classification tasks using LLMs
classification classification-models dataset-generation distillation distillation-model distilling-the-knowledge large-language-models nlp synthetic-data synthetic-dataset-generation text-classification
Last synced: 15 Dec 2024
https://github.com/my-north-ai/semantic_audio_filtering
Synthetic data augmentation technique via LLM for Automatic Speech Recognition fine tuning.
automatic-speech-recognition fine-tuning synthetic-dataset-generation text-to-speech whisper
Last synced: 24 Oct 2024
https://github.com/steffenmoritz/synthetic_data_challenge
UNECE HLG-MOS Synthetic Data Challenge (TEAM DESTATIS)
challenge official-statistics synthetic-data synthetic-dataset-generation unece
Last synced: 12 Nov 2024
https://github.com/i-partalas/industrial-rag-qna-benchmark
Benchmarking the performance of proprietary vs open-source LLMs in industrial QnA tasks using various RAG-based implementations and evaluation metrics.
azureopenai benchmarking chromadb chunking docker huggingface langchain large-language-models llms-benchmarking metrics openai pytorch retrieval-augmented-generation streamlit synthetic-dataset-generation
Last synced: 08 Dec 2024
https://github.com/samthiriot/knime.bayesiannetworks
Bayesian Networks nodes for KNIME
bayesian-networks knime knime-node scientific-computing scientific-machine-learning scientific-workflows synthetic-data synthetic-dataset-generation synthetic-population-library synthetic-populations
Last synced: 25 Nov 2024
https://github.com/zafarrehan/custom_od_architecture_from_scratch
This repository walks you through creating your own custom one-stage object detection model architecture (in Keras), with a built-in synthetic dataset generator for training and evaluation
anchor-box custom-architecture jupyter-notebook keras non-maximum-suppression object-detection synthetic-dataset-generation tensorflow2
Last synced: 25 Nov 2024
https://github.com/neoneye/arc-output-size
Predicting the output size of an ARC task (Abstraction and Reasoning Corpus)
dataset dataset-generation generate-data synthetic-dataset-generation
Last synced: 17 Dec 2024
https://github.com/sartajbhuvaji/data-science-research
Data Science Research at Seattle University under Dr. Wan Bae. We try to mitigate the class imbalance problem in an asthma dataset using multiple algorithms.
autoencoders private-repository research seattleu synthetic-dataset-generation
Last synced: 15 Nov 2024
https://github.com/automators-com/datamaker-rs
DataMaker assists with generating realistic relational data for testing and development purposes.
datagenerator datamaker rust synthetic-dataset-generation
Last synced: 08 Nov 2024
https://github.com/kanishknavale/training_don_a_novel_approach
Repository for research paper "Training Dense Object Nets: A Novel Approach"
computer-vision deep-neural-networks dense-object-nets image-correspondences image-dataset image-processing keypointnet pytorch pytorch-lightning synthetic-augmentation synthetic-dataset-generation
Last synced: 10 Nov 2024
https://github.com/rip4kobe/syntheticwastegenerator
Synthetic Municipal Solid Waste Generator for AI-powered Waste Recognition System
deep-learning synthetic-dataset-generation waste-classification
Last synced: 30 Nov 2024
https://github.com/dimits-ts/llm_moderation_research
Synthetic dialogue creation. Experiments exploring the role of LLMs in the moderation of deliberation systems.
llm moderation natural-language-processing research synthetic-dataset-generation
Last synced: 07 Nov 2024
https://github.com/akapet00/thermal-dosimetry-surrogate
Standardized Benchmark Dataset for Localized Exposure to a Realistic Source at 10-90 GHz
electromagnetic-fields human-exposure machine-learning surrogate-modelling synthetic-dataset-generation
Last synced: 13 Nov 2024
https://github.com/ayushsiloiya619/synthetic_data_vault
SDV
sdv synthetic-data synthetic-dataset-generation
Last synced: 30 Nov 2024
https://github.com/pikastunner/syntheticdatagen
SyntheticDataGen is a tool for converting RGB and depth images into 3D meshes and generating synthetic image datasets.
computer-vision nvidia-omniverse pyqt5 python realsense-camera synthetic-dataset-generation
Last synced: 19 Dec 2024
https://github.com/rwth-irt/blenderproc.disstimredick
BlenderProc setup to generate the synthetic datasets from Tim Redick's dissertation. STERI models not included since the CAD files are proprietary.
blender3d pose-estimation synthetic-dataset-generation
Last synced: 12 Nov 2024
https://github.com/amanpriyanshu/synthesizedaten
A repository for generating synthetic data (images) using various DL/ML models.
dbn dbnet gans gans-models synthetic-data synthetic-dataset-generation vae vae-implementation vae-pytorch
Last synced: 15 Dec 2024
https://github.com/jer-nc/blender-aws-batch-instant-ngp-dataset
Generate synthetic dataset for Instant-NGP using Blender and AWS Batch.
aws aws-batch blender blender-nerf docker instant-ngp nerf python synthetic-dataset-generation
Last synced: 28 Nov 2024
https://github.com/matteodoria/stable-diffusion-dataset-generation
This repository contains a class-conditioned adaptation of the Stable Diffusion model in order to generate a Synthetic Dataset suitable for a Downstream Classification task.
artificial-intelligence deep-learning stable-diffusion synthetic-data synthetic-dataset-generation
Last synced: 09 Dec 2024
https://github.com/sabasyed/synthetic-data-generation
Synthetically generated LaTeX expressions along with their Python code.
lambdify latex python sympy sympy-expressions synthetic-data synthetic-dataset-generation
Last synced: 18 Dec 2024