Projects in Awesome Lists tagged with data-preprocessing-pipelines
A curated list of projects in awesome lists tagged with data-preprocessing-pipelines.
https://github.com/IBM/data-prep-kit
Open source project for data preparation for LLM application builders
code-quality data data-prep data-preparation data-preprocessing data-preprocessing-pipelines datacuration datarecipes deduplication finetuning large-language-models large-scale-data-processing llm llmapps malware python ray spark
Last synced: 11 Jan 2025
https://github.com/shamspias/gpt3-data-preprocessing
This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.
artificial-intelligence data-preprocessing data-preprocessing-pipelines data-science gpt-3 machine-learning
Last synced: 04 Dec 2024
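The steps named in the description above (tokenization, stop-word and punctuation removal, formatting) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes the text has already been extracted from the PDF or DOCX file, and the stop-word set, the function names `preprocess` and `to_jsonl_record`, and the prompt/completion JSON layout are all hypothetical choices made for this example.

```python
import json
import re

# Tiny illustrative stop-word set (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for"}


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on alphanumeric runs, and drop stop words."""
    # Matching only [a-z0-9]+ discards punctuation as a side effect of tokenizing.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def to_jsonl_record(prompt_text: str, completion_text: str) -> str:
    """Format one prompt/completion pair as a single JSON line (hypothetical layout)."""
    record = {
        "prompt": " ".join(preprocess(prompt_text)),
        "completion": " " + completion_text.strip(),
    }
    return json.dumps(record)
```

Calling `preprocess("The quick, brown fox is in the box!")` yields the cleaned token list, and `to_jsonl_record` wraps a cleaned prompt and its completion into one line suitable for a JSONL file.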
https://github.com/firefly-cpp/succulent
Collect POST requests
data-collection data-preprocessing-pipelines data-science esp32 machine-learning raspberry-pi
Last synced: 13 Apr 2025
https://github.com/kolhesamiksha/nemo_curator
This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation
curator data-preprocessing-pipelines finetuning-llms generative-ai nemo nvidia synthetic-dataset-generation
Last synced: 20 Feb 2025