Projects in Awesome Lists tagged with data-synthesis
A curated list of projects in awesome lists tagged with data-synthesis.
https://github.com/agamiko/data-augmentation-review
List of useful data augmentation resources. Here you will find some uncommon techniques, libraries, links to GitHub repos, papers, and more.
audio-augmentation augmentation-policies autoaugment data-augmentation data-augmentations data-generation data-synthesis generative-adversarial-network graph-data-augmentation image-augmentation machine-learning nlp-augmentation review style-transfer survey
Last synced: 28 Jan 2026
https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
alignment compression data-augmentation data-synthesis feedback instruction-following kd knowledge-distillation large-language-model llm multi-modal self-distillation self-training supervised-finetuning survey
Last synced: 12 Apr 2025
https://github.com/open-sciencelab/GraphGen
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
ai4science data-generation data-synthesis graphgen knowledge-graph llama-factory llm llm-training pretrain pretraining qa question-answering qwen sft sft-data xtuner
Last synced: 29 Nov 2025
https://github.com/swz30/CycleISP
[CVPR 2020--Oral] CycleISP: Real Image Restoration via Improved Data Synthesis
camera-imaging-pipeline computer-vision cvpr2020 cycleisp data-synthesis image-denoising image-restoration low-level-vision pytorch raw2rgb rgb2raw
Last synced: 02 Apr 2025
https://github.com/diyer22/bpycv
Computer vision utilities for Blender (generate instance annotations, depth, and 6D pose with one line of code)
6dof-pose blender blender-cv blender-python computer-vision data-synthesis dataset-generation deep-learning depth instance-segmentation synthetic-datasets ycb
Last synced: 04 Sep 2025
https://github.com/mrgiovanni/synthetictumors
[CVPR 2023] Label-Free Liver Tumor Segmentation
data-synthesis label-free segmentation tumor-segmentation unet
Last synced: 16 May 2025
https://github.com/modelengine-group/datamate
DataMate is an enterprise-level data processing platform designed for model fine-tuning and RAG retrieval.
data-evaluation data-pipeline data-synthesis rag
Last synced: 06 Mar 2026
https://github.com/OS-Copilot/OS-Genesis
Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
agents data-synthesis gui multimodal
Last synced: 23 Feb 2025
https://github.com/hewei2001/reachqa
Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"
Last synced: 30 Jul 2025
https://github.com/Baukebrenninkmeijer/On-the-Generation-and-Evaluation-of-Synthetic-Tabular-Data-using-GANs
Repository for the results of my master's thesis on the generation and evaluation of synthetic data using GANs
data-evaluation data-synthesis gan generative-adversarial-networks synthetic-data synthetic-dataset-generation tabular-data
Last synced: 02 May 2025
https://github.com/Gariscat/loopy
A data framework for music information retrieval focusing on electronic music.
data-synthesis music-information-retrieval
Last synced: 14 Jul 2025
https://github.com/vkit-x/vkit
Boosting Document Intelligence
chineseocr computer-vision data-augmentation data-synthesis deep-learning document-inteligence image-augmentation machine-learning ocr python text-detection text-detection-recognition text-recognition vkit vkit-x
Last synced: 08 Apr 2026
https://github.com/sushant1827/Trigger-Word-Detection
Coursera - RNN Programming Assignment: In this project, we will construct a speech dataset and implement an algorithm for trigger word detection (sometimes also called keyword detection, or wake word detection).
data-synthesis keras-tensorflow spectrogram trigger-word-detection
Last synced: 14 Oct 2025
https://github.com/smithsonian/ccn-data-library
The Coastal Carbon Network Data Library: An open-source database featuring carbon data from tidal wetlands around the world
coastal-carbon-network data-synthesis open-source wetland-science
Last synced: 09 Feb 2026
https://github.com/etiennechollet/synthshapes
Generate Synthetic Shapes in 3D for Biomedical Image Augmentation and Synthesis.
biomedical-image-segmentation biomedical-imaging data-synthesis machine-learning
Last synced: 28 Jan 2026
https://github.com/pd-mera/object-detection-data-synthesis
Synthesize data in YOLO format from background and object images
Last synced: 07 Jun 2026
https://github.com/h-iaac/sumo_data_synthesis
Repo for trying out SUMO
data-synthesis python routine-generator sumo traffic-simulation
Last synced: 09 May 2026
https://github.com/johanneswiesner/nisynth
A repository for synthesizing and simulating MRI images
brain-imaging brain-mri data-synthesis neuroscience python
Last synced: 18 Aug 2025
https://github.com/ready4-dev/ready4web
Website of the ready4 suite of tools for data synthesis and modelling in mental health
data-synthesis health health-economics mental policy simulation
Last synced: 20 Jan 2026
https://github.com/kazkozdev/llmflow-search
The LLMFlow Search agent processes complex queries, performs deep searches, and synthesizes information from the web.
data-synthesis deep-search intelligent-search llm llm-agent search-agent
Last synced: 15 May 2025