Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kolhesamiksha/nemo_curator
This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation.
curator data-preprocessing-pipelines finetuning-llms generative-ai nemo nvidia synthetic-dataset-generation
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/kolhesamiksha/nemo_curator
- Owner: kolhesamiksha
- Created: 2024-12-16T19:02:55.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-12-25T04:39:23.000Z (about 1 month ago)
- Last Synced: 2024-12-25T05:28:29.215Z (about 1 month ago)
- Topics: curator, data-preprocessing-pipelines, finetuning-llms, generative-ai, nemo, nvidia, synthetic-dataset-generation
- Language: Jupyter Notebook
- Homepage:
- Size: 127 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Nemo_Curator
This repository contains sample text data-preparation code using NeMo Curator for pre-training or synthetic data generation.

![image](https://github.com/user-attachments/assets/b28fe3ee-fe06-4c51-a2ef-d6cc702cc4ae)