Projects in Awesome Lists tagged with data-prep
A curated list of projects in awesome lists tagged with data-prep.
https://github.com/NVIDIA/NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 29 Jul 2025
https://github.com/data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
code-quality data data-prep data-preparation data-preprocessing data-preprocessing-pipelines datacuration datarecipes deduplication finetuning large-language-models large-scale-data-processing llm llmapps malware python ray spark
Last synced: 11 Feb 2026
https://github.com/NVIDIA-NeMo/Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 20 Jul 2025
https://github.com/data-integrations/wrangler
Wrangler Transform: A DMD system for transforming Big Data
avro big-data cdap cdap-plugin data-cleansing data-prep data-science data-transform data-transformation manipulate-data parsing preparation project transform transform-data wrangle
Last synced: 25 Oct 2025
https://github.com/kukuster/sumstatsrehab
QC tool for GWAS summary statistics files
bioinformatics bioinformatics-tool compbio computational-biology data-prep data-preparation data-preprocessing gwas gwas-pipeline gwas-summary-statistics summary-statistics sumstats
Last synced: 09 Apr 2025
https://github.com/dse-capstone-sharknado/advancedbpr
Amazon Recommendation System built on a BPR TensorFlow implementation
data-prep data-science exploratory-analysis ipynb machine-learning recommender-system
Last synced: 15 Oct 2025
https://github.com/sminerport/sequencepredictionann
Predict next number in a sequence using a simple ANN. Modularized code with classes for data preparation, neural network architecture, and training.
artificial-neural-networks data-prep deep-learning machine-learning model-evaluation model-training neural-network numpy python scikit-learn sequence-prediction supervised-learning time-series-forecasting
Last synced: 03 Apr 2025
https://github.com/data-integrations/image-directives
A set of directives for working with images
cask-marketplace cdap cdap-dataprep cdap-udds data-prep directives
Last synced: 25 Oct 2025
https://github.com/enso-org/sample-projects
Open source Enso Analytics examples and documentation explicitly permitted for AI training and educational use.
ai-training-permitted data-prep data-workflows educational-resources enso-analytics open-source-examples
Last synced: 31 Jan 2026