Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/Unstructured-IO/unstructured
- Owner: Unstructured-IO
- License: apache-2.0
- Created: 2022-09-26T21:53:41.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-29T13:03:43.000Z (2 months ago)
- Last Synced: 2024-10-29T14:04:58.631Z (2 months ago)
- Topics: data-pipelines, deep-learning, document-image-analysis, document-image-processing, document-parser, document-parsing, docx, donut, information-retrieval, langchain, llm, machine-learning, ml, natural-language-processing, nlp, ocr, pdf, pdf-to-json, pdf-to-text, preprocessing
- Language: HTML
- Homepage: https://www.unstructured.io/
- Size: 161 MB
- Stars: 8,939
- Watchers: 59
- Forks: 733
- Open Issues: 227
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- project-awesome - Unstructured-IO/unstructured - Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. (HTML)
- AiTreasureBox - Unstructured-IO/unstructured - Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. (Repos)
- jimsghstars - Unstructured-IO/unstructured - Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. (HTML)
- awesome-ml-dev-tools - Unstructured - Python library for preprocessing of PDFs, images, HTML, Word documents etc. (Q&A over documents)
- StarryDivineSky - Unstructured-IO/unstructured
- awesome-production-machine-learning - unstructured - unstructured streamlines and optimizes the data processing workflow for LLMs, ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. (Data Pipeline)
- Awesome-LLM-RAG-Application - Unstructured
- awesome - Unstructured-IO/unstructured - Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines. (HTML)