Projects in Awesome Lists tagged with data-processing-pipelines
A curated list of projects in awesome lists tagged with data-processing-pipelines.
https://github.com/NVIDIA/NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 29 Jul 2025
https://github.com/NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 20 Jul 2025
https://github.com/edrewitz/wxdata
A Python package of end-to-end weather data clients and raw data clients with VPN/proxy support, data processors that decode variable keys from GRIB format into plain-language names, and various tools for assisting automated Python workflows, querying meteorological datasets, and filling gaps in meteorological data.
automation data data-clients data-engineering data-engineering-pipeline data-processing data-processing-pipelines data-science meteorology meteorology-library python weather-data
Last synced: 23 May 2026
https://github.com/graphbookai/graphbook
The framework for AI-driven data pipelines. Build interactive, highly efficient data pipelines with PyTorch. ⭐ Leave a star to support us!
ai data-processing data-processing-pipelines data-science framework machine-learning ml pytorch research workflow
Last synced: 07 Sep 2025
https://github.com/tamasgal/thepipe
A simplistic, general-purpose pipeline framework.
data-processing data-processing-pipelines data-science hacktoberfest pipelines provenance python
Last synced: 21 Mar 2025
https://github.com/mehanix/dhrw
🎢 IaaS visual editor to create & deploy data processing pipelines - python, rmq, react, meteorjs
computational-graph computational-graphs data-analysis data-engineering data-pipeline data-pipelines data-processing data-processing-and-analysis data-processing-pipelines data-processing-system data-science data-visualization docker-compose good-first-issue help-wanted meteorjs-application rabbitmq react-flow
Last synced: 02 May 2026