Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with semantic-deduplication
A curated list of projects in awesome lists tagged with semantic-deduplication.
https://github.com/nvidia/nemo-curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 20 Dec 2024
https://github.com/NVIDIA/NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 27 Nov 2024