An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-synthesis

A curated list of projects in awesome lists tagged with data-synthesis.

https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

alignment compression data-augmentation data-synthesis feedback instruction-following kd knowledge-distillation large-language-model llm multi-modal self-distillation self-training supervised-finetuning survey

Last synced: 12 Apr 2025

https://github.com/open-sciencelab/GraphGen

GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

ai4science data-generation data-synthesis graphgen knowledge-graph llama-factory llm llm-training pretrain pretraining qa question-answering qwen sft sft-data xtuner

Last synced: 29 Nov 2025

https://github.com/swz30/CycleISP

[CVPR 2020--Oral] CycleISP: Real Image Restoration via Improved Data Synthesis

camera-imaging-pipeline computer-vision cvpr2020 cycleisp data-synthesis image-denoising image-restoration low-level-vision pytorch raw2rgb rgb2raw

Last synced: 02 Apr 2025

https://github.com/diyer22/bpycv

Computer vision utils for Blender (generate instance annotation, depth and 6D pose with one line of code)

6dof-pose blender blender-cv blender-python computer-vision data-synthesis dataset-generation deep-learning depth instance-segmentation synthetic-datasets ycb

Last synced: 04 Sep 2025

https://github.com/mrgiovanni/synthetictumors

[CVPR 2023] Label-Free Liver Tumor Segmentation

data-synthesis label-free segmentation tumor-segmentation unet

Last synced: 16 May 2025

https://github.com/modelengine-group/datamate

DataMate is an enterprise-level data processing platform designed for model fine-tuning and RAG retrieval.

data-evaluation data-pipeline data-synthesis rag

Last synced: 06 Mar 2026

https://github.com/OS-Copilot/OS-Genesis

Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

agents data-synthesis gui multimodal

Last synced: 23 Feb 2025

https://github.com/hewei2001/reachqa

Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"

data-synthesis llm mllm

Last synced: 30 Jul 2025

https://github.com/Gariscat/loopy

A data framework for music information retrieval focusing on electronic music.

data-synthesis music-information-retrieval

Last synced: 14 Jul 2025

https://github.com/sushant1827/Trigger-Word-Detection

Coursera - RNN Programming Assignment: In this project, we will construct a speech dataset and implement an algorithm for trigger word detection (sometimes also called keyword detection, or wake word detection).

data-synthesis keras-tensorflow spectrogram trigger-word-detection

Last synced: 14 Oct 2025

https://github.com/smithsonian/ccn-data-library

The Coastal Carbon Network Data Library: An open-source database featuring carbon data from tidal wetlands around the world

coastal-carbon-network data-synthesis open-source wetland-science

Last synced: 09 Feb 2026

https://github.com/etiennechollet/synthshapes

Generate Synthetic Shapes in 3D for Biomedical Image Augmentation and Synthesis.

biomedical-image-segmentation biomedical-imaging data-synthesis machine-learning

Last synced: 28 Jan 2026

https://github.com/pd-mera/object-detection-data-synthesis

Synthesize data in YOLO format from background and object images

data-synthesis yolo

Last synced: 07 Jun 2026

https://github.com/johanneswiesner/nisynth

A repository for synthesizing and simulating MRI images

brain-imaging brain-mri data-synthesis neuroscience python

Last synced: 18 Aug 2025

https://github.com/ready4-dev/ready4web

Website of the ready4 suite of tools for data synthesis and modelling in mental health

data-synthesis health health-economics mental policy simulation

Last synced: 20 Jan 2026

https://github.com/kazkozdev/llmflow-search

The LLMFlow Search agent processes complex queries, performs deep searches, and synthesizes information from the web.

data-synthesis deep-search intelligent-search llm llm-agent search-agent

Last synced: 15 May 2025