Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/IBM/data-prep-kit
Open source project for data preparation, aimed at LLM application builders
Last synced: 28 days ago
- Host: GitHub
- URL: https://github.com/IBM/data-prep-kit
- Owner: IBM
- License: apache-2.0
- Created: 2024-04-08T23:43:52.000Z (10 months ago)
- Default Branch: dev
- Last Pushed: 2024-12-20T17:25:37.000Z (about 2 months ago)
- Last Synced: 2024-12-20T23:33:18.396Z (about 2 months ago)
- Topics: code-quality, data, data-prep, data-preparation, data-preprocessing, data-preprocessing-pipelines, datacuration, datarecipes, deduplication, finetuning, large-language-models, large-scale-data-processing, llm, llmapps, malware, python, ray, spark
- Language: Python
- Homepage: https://ibm.github.io/data-prep-kit/
- Size: 176 MB
- Stars: 370
- Watchers: 18
- Forks: 143
- Open Issues: 148
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-llm - IBM data preprocessing toolkit - An open-source toolkit for efficiently processing unstructured data. (Large Language Model (LLM) Leaderboards / LLM Data)
- Awesome-LLM - IBM data-prep-kit - Open-Source Toolkit for Efficient Unstructured Data Processing with Pre-built Modules and Local to Cluster Scalability. (LLM Data)