https://github.com/data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
- Host: GitHub
- URL: https://github.com/data-prep-kit/data-prep-kit
- Owner: data-prep-kit
- License: apache-2.0
- Created: 2024-04-08T23:43:52.000Z (over 1 year ago)
- Default Branch: dev
- Last Pushed: 2025-12-12T14:48:14.000Z (14 days ago)
- Last Synced: 2025-12-14T06:11:20.141Z (12 days ago)
- Topics: code-quality, data, data-prep, data-preparation, data-preprocessing, data-preprocessing-pipelines, datacuration, datarecipes, deduplication, finetuning, large-language-models, large-scale-data-processing, llm, llmapps, malware, python, ray, spark
- Language: HTML
- Homepage: https://data-prep-kit.github.io/data-prep-kit/
- Size: 245 MB
- Stars: 861
- Watchers: 18
- Forks: 230
- Open Issues: 224
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Governance: GOVERNANCE.md
- Maintainers: MAINTAINERS.md
Awesome Lists containing this project
- awesome-llm - IBM Data Prep Kit - An open-source toolkit for efficiently processing unstructured data. (Large Language Model (LLM) Leaderboards / LLM Data)
- Awesome-LLM - IBM data-prep-kit - Open-Source Toolkit for Efficient Unstructured Data Processing with Pre-built Modules and Local to Cluster Scalability. (LLM Data)