Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with synthetic-dataset-generation

A curated list of projects in awesome lists tagged with synthetic-dataset-generation.

https://github.com/eladlev/autoprompt

A framework for prompt tuning using Intent-based Prompt Calibration

prompt-engineering prompt-tuning synthetic-dataset-generation

Last synced: 19 Dec 2024

https://github.com/Eladlev/AutoPrompt

A framework for prompt tuning using Intent-based Prompt Calibration

prompt-engineering prompt-tuning synthetic-dataset-generation

Last synced: 30 Oct 2024

https://github.com/batsresearch/bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

domain-adaptation gpt llm synthetic-data synthetic-dataset-generation task-adaptation zero-shot-learning

Last synced: 09 Nov 2024

https://github.com/argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.

ai huggingface llms openai python rlaif rlhf synthetic-data synthetic-dataset-generation

Last synced: 18 Dec 2024

https://github.com/BatsResearch/bonito

A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

domain-adaptation gpt llm synthetic-data synthetic-dataset-generation task-adaptation zero-shot-learning

Last synced: 08 Nov 2024

https://github.com/paulbricman/thisrepositorydoesnotexist

A curated list of awesome projects which use Machine Learning to generate synthetic content.

generation-algorithms generative-adversarial-network synthetic-data synthetic-dataset-generation synthetic-images

Last synced: 17 Nov 2024

https://github.com/magpie-align/magpie

Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!

alignment dataset gemma llama2 llama3 llm nlp paper phi3 qwen2 supervised-finetuning synthetic-data synthetic-dataset-generation

Last synced: 21 Dec 2024

https://github.com/fjxmlzn/DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

dataset-generation datasets doppelganger fidelity gan gans generative-adversarial-network privacy synthetic-data synthetic-data-generation synthetic-data-generator synthetic-dataset-generation time-series timeseries

Last synced: 19 Nov 2024

https://github.com/worldbank/realtabformer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

data-generation deep-learning gpt gpt-2 seq2seq-model synthetic-data synthetic-dataset-generation tabular-data transformers

Last synced: 20 Dec 2024

https://github.com/worldbank/REaLTabFormer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

data-generation deep-learning gpt gpt-2 seq2seq-model synthetic-data synthetic-dataset-generation tabular-data transformers

Last synced: 17 Dec 2024

https://github.com/squeezeailab/llm2llm

[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

data-augmentation llama llama2 llm llms natural-language-processing nlp synthetic-dataset-generation transformer

Last synced: 05 Dec 2024

https://github.com/jtheiner/LegoBrickClassification

Repository to identify Lego bricks automatically only using images

dataset image-classification lego synthetic-dataset-generation transfer-learning

Last synced: 13 Nov 2024

https://github.com/VinAIResearch/Dataset-Diffusion

Dataset Diffusion: Diffusion-based Synthetic Data Generation for Pixel-Level Semantic Segmentation (NeurIPS2023)

diffusion-models semantic-segmentation synthetic-dataset-generation

Last synced: 04 Nov 2024

https://github.com/astorfi/cor-gan

COR-GAN: Correlation-Capturing Convolutional Neural Networks for Generating Synthetic Healthcare Records

deep-learning generative-adversarial-network healthcare python pytorch synthetic-dataset-generation

Last synced: 22 Oct 2024

https://github.com/zakimjz/IBMGenerator

IBM Synthetic Data Generator for Itemsets and Sequences

itemset-mining sequence-datasets sequence-mining synthetic-dataset-generation

Last synced: 30 Oct 2024

https://github.com/subake/DAPS3D

DAPS3D: Domain Adaptive Projective Segmentation of 3D LiDAR Point Clouds

lidar lidar-point-cloud projective-semantic-segmentation rellis-3d semantic-kitti semantic-segmentation synthetic-dataset-generation

Last synced: 28 Oct 2024

https://github.com/georgeoshardo/SyMBac

Accurate segmentation of bacterial microscope images using deep learning and synthetically generated image data.

biology deep-learning image-processing machine-learning microscopy segmentation synthetic-biology synthetic-data synthetic-dataset-generation

Last synced: 14 Nov 2024

https://github.com/mantyni/Multi-object-detection-lego

Multi-object detection of Lego bricks in a dataset generated using Blender.

blender blender-python multi-object-tracking object-detection ssd synthetic-dataset-generation yolov5

Last synced: 13 Nov 2024

https://github.com/majsylw/microbial-counting-review

A list of useful resources for microbial colony classification and detection, such as datasets, papers, and links to projects.

agar-plate cfu classification datasets density-estimation microbial-colonies microbial-counting microbial-detection microbiology object-detection synthetic-dataset-generation

Last synced: 04 Dec 2024

https://github.com/travvy88/doge

Synthetic Data Generator for Document AI. Creates document images annotated with the text and bounding box of each word. Images contain headings, tables, and paragraphs with different formatting and fonts.

ai bounding-boxes dataset document synthetic-dataset-generation

Last synced: 01 Dec 2024

https://github.com/mosure/bevy_zeroverse

bevy zeroverse synthetic dataset generator

bevy reconstruction synthetic-dataset-generation zeroverse

Last synced: 09 Oct 2024

https://github.com/hrolive/unreal-engine-for-remote-visualization-and-machine-learning

In-depth training on using Unreal Engine as a data generator and integrating it into a simple ML workflow, at one of the leading supercomputing centres.

data-generator hpc machine-learning synthetic-data synthetic-dataset-generation unreal-engine webrtc

Last synced: 09 Nov 2024

https://github.com/my-north-ai/semantic_audio_filtering

Synthetic data augmentation technique via LLM for Automatic Speech Recognition fine tuning.

automatic-speech-recognition fine-tuning synthetic-dataset-generation text-to-speech whisper

Last synced: 24 Oct 2024

https://github.com/i-partalas/industrial-rag-qna-benchmark

Benchmarking the performance of proprietary vs open-source LLMs in industrial QnA tasks using various RAG-based implementations and evaluation metrics.

azureopenai benchmarking chromadb chunking docker huggingface langchain large-language-models llms-benchmarking metrics openai pytorch retrieval-augmented-generation streamlit synthetic-dataset-generation

Last synced: 08 Dec 2024

https://github.com/zafarrehan/custom_od_architecture_from_scratch

This repository walks you through creating your own custom one-stage object detection model architecture (in Keras), with a synthetic dataset generator on board for training and evaluation.

anchor-box custom-architecture jupyter-notebook keras non-maximum-suppression object-detection synthetic-dataset-generation tensorflow2

Last synced: 25 Nov 2024

https://github.com/neoneye/arc-output-size

Predicting the output size of an ARC task (Abstraction and Reasoning Corpus)

dataset dataset-generation generate-data synthetic-dataset-generation

Last synced: 17 Dec 2024

https://github.com/sartajbhuvaji/data-science-research

Data Science Research at Seattle University under Dr. Wan Bae. We try to mitigate the class imbalance problem in an asthma dataset using multiple algorithms.

autoencoders private-repository research seattleu synthetic-dataset-generation

Last synced: 15 Nov 2024

https://github.com/automators-com/datamaker-rs

DataMaker assists with generating realistic relational data for testing and development purposes.

datagenerator datamaker rust synthetic-dataset-generation

Last synced: 08 Nov 2024

https://github.com/rip4kobe/syntheticwastegenerator

Synthetic Municipal Solid Waste Generator for AI-powered Waste Recognition System

deep-learning synthetic-dataset-generation waste-classification

Last synced: 30 Nov 2024

https://github.com/dimits-ts/llm_moderation_research

Synthetic dialogue creation. Experiments exploring the role of LLMs in the moderation of deliberation systems.

llm moderation natural-language-processing research synthetic-dataset-generation

Last synced: 07 Nov 2024

https://github.com/akapet00/thermal-dosimetry-surrogate

Standardized Benchmark Dataset for Localized Exposure to a Realistic Source at 10-90 GHz

electromagnetic-fields human-exposure machine-learning surrogate-modelling synthetic-dataset-generation

Last synced: 13 Nov 2024

https://github.com/pikastunner/syntheticdatagen

SyntheticDataGen is a tool for converting RGB and depth images into 3D meshes and generating synthetic image datasets.

computer-vision nvidia-omniverse pyqt5 python realsense-camera synthetic-dataset-generation

Last synced: 19 Dec 2024

https://github.com/rwth-irt/blenderproc.disstimredick

BlenderProc setup to generate the synthetic datasets from Tim Redick's dissertation. STERI models not included since the CAD files are proprietary.

blender3d pose-estimation synthetic-dataset-generation

Last synced: 12 Nov 2024

https://github.com/amanpriyanshu/synthesizedaten

A repository for generating synthetic data (images) using various DL/ML models.

dbn dbnet gans gans-models synthetic-data synthetic-dataset-generation vae vae-implementation vae-pytorch

Last synced: 15 Dec 2024

https://github.com/jer-nc/blender-aws-batch-instant-ngp-dataset

Generate synthetic dataset for Instant-NGP using Blender and AWS Batch.

aws aws-batch blender blender-nerf docker instant-ngp nerf python synthetic-dataset-generation

Last synced: 28 Nov 2024

https://github.com/matteodoria/stable-diffusion-dataset-generation

This repository contains a class-conditioned adaptation of the Stable Diffusion model in order to generate a Synthetic Dataset suitable for a Downstream Classification task.

artificial-intelligence deep-learning stable-diffusion synthetic-data synthetic-dataset-generation

Last synced: 09 Dec 2024

https://github.com/sabasyed/synthetic-data-generation

Synthetically generated LaTeX expressions along with their Python code.

lambdify latex python sympy sympy-expressions synthetic-data synthetic-dataset-generation

Last synced: 18 Dec 2024