An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-preprocessing-pipelines

A curated list of projects in awesome lists tagged with data-preprocessing-pipelines.

https://github.com/shamspias/gpt3-data-preprocessing

This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. The steps include tokenization, removal of stop words and punctuation, and formatting the result as GPT-3 input.

artificial-intelligence data-preprocessing data-preprocessing-pipelines data-science gpt-3 machine-learning

Last synced: 04 Dec 2024
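
The steps named in the gpt3-data-preprocessing description above (tokenization, stop-word and punctuation removal, formatting for model input) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the repository's actual code: the function names, the tiny stop-word list, and the JSON record layout with `prompt` and `completion` fields are all assumptions, and PDF/DOCX text extraction is left out.

```python
import json
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "for"}


def preprocess(text: str) -> list[str]:
    """Lowercase, split on whitespace, strip edge punctuation, drop stop words."""
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(string.punctuation)
        if token and token not in STOP_WORDS:
            tokens.append(token)
    return tokens


def to_jsonl_record(text: str) -> str:
    """Serialize cleaned text as one JSON line (assumed prompt/completion layout)."""
    cleaned = " ".join(preprocess(text))
    return json.dumps({"prompt": cleaned, "completion": ""})


if __name__ == "__main__":
    print(to_jsonl_record("The cat sat, in the hat!"))
```

Stripping punctuation only at token edges keeps internal characters such as the hyphen in "pre-training" intact; whether that is desirable depends on the downstream tokenizer.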

https://github.com/kolhesamiksha/nemo_curator

This repository contains sample text data-preparation code that uses NeMo Curator for pre-training or synthetic data generation.

curator data-preprocessing-pipelines finetuning-llms generative-ai nemo nvidia synthetic-dataset-generation

Last synced: 20 Feb 2025