Projects in Awesome Lists by Unstructured-IO
A curated list of projects in awesome lists by Unstructured-IO.
https://github.com/unstructured-io/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing
Last synced: 24 Apr 2026
https://github.com/Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing
Last synced: 26 Mar 2025
https://github.com/unstructured-io/pipeline-sec-filings
Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
Last synced: 06 Apr 2025
https://github.com/unstructured-io/unstructured-python-client
A Python client for the Unstructured Platform API
Last synced: 15 May 2025
https://github.com/unstructured-io/unstructured-js-client
A JavaScript/TypeScript client for the Unstructured Platform API
Last synced: 06 Oct 2025
https://github.com/unstructured-io/pipeline-paddleocr
Pipeline for converting PDFs to raw text with PaddleOCR
Last synced: 14 Aug 2025
https://github.com/unstructured-io/pipeline-oer
Pipeline for extracting information from Army OERs
Last synced: 06 Apr 2025
https://github.com/unstructured-io/docs
Documentation for all Unstructured products and libraries
Last synced: 06 Apr 2025
https://github.com/unstructured-io/unstructured-api-gui
Unstructured.io API GUI
Last synced: 06 Apr 2025
https://github.com/Unstructured-IO/danswer
Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
Last synced: 27 Feb 2025
https://github.com/unstructured-io/base-images
Store Dockerfiles and Packer configs for images to use as a base to build upon
Last synced: 06 Apr 2025
https://github.com/unstructured-io/pipeline-receipts
Preprocessing pipeline notebooks and API supporting text extraction from receipt images
Last synced: 10 Apr 2025
https://github.com/unstructured-io/pipeline-document-layout
Pipeline for layout extraction
Last synced: 02 Aug 2025
https://github.com/unstructured-io/model-cards
FedRAMP-formatted model cards
Last synced: 19 Mar 2026
https://github.com/unstructured-io/aws-blog-post-example
Script to accompany the AWS blog post on unstructured data ETL with the Unstructured Ingest library
Last synced: 10 Apr 2025
https://github.com/unstructured-io/js-client-batch
JS Client Batch Processing
Last synced: 10 Apr 2025
https://github.com/unstructured-io/rag-over-hybrid-data-sources
Pipeline from two sources (S3, Elasticsearch) into a RAG database.
Last synced: 30 Oct 2025