Projects in Awesome Lists tagged with datacuration
A curated list of projects in awesome lists tagged with datacuration.
https://github.com/NVIDIA/NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 29 Jul 2025
https://github.com/data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
code-quality data data-prep data-preparation data-preprocessing data-preprocessing-pipelines datacuration datarecipes deduplication finetuning large-language-models large-scale-data-processing llm llmapps malware python ray spark
Last synced: 11 Feb 2026
https://github.com/NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 20 Jul 2025
https://github.com/chapmanjacobd/library
99+ CLI tools to build, browse, and blend your media library
broadcatching cli command-line curation data-collection datacuration datasette-tool ffmpeg ffprobe files folders gallery-dl media mpv music playlist qbittorrent-nox sqlite videos yt-dlp
Last synced: 06 Jan 2026
https://github.com/WDscholia/scholia
Wikidata-based scholarly profiles
bibliography bibliometrics bibtex citations code4lib datacuration dataviz fairdata hacktoberfest latex linked-open-data literature scientometrics sparql wikicite wikidata
Last synced: 27 Mar 2025