Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with document-analysis

A curated list of projects in awesome lists tagged with document-analysis.

https://github.com/opendatalab/mineru

A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 16 Dec 2024

https://github.com/opendatalab/MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, web page, and multi-format e-book extraction.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 29 Oct 2024

https://github.com/yuliang-liu/curve-text-detector

This repository provides training and testing code, a dataset, detection and recognition annotations, an evaluation script, an annotation tool, and a ranking.

deep-learning document-analysis object-detection scene-text

Last synced: 21 Dec 2024

https://github.com/Yuliang-Liu/Curve-Text-Detector

This repository provides training and testing code, a dataset, detection and recognition annotations, an evaluation script, an annotation tool, and a ranking.

deep-learning document-analysis object-detection scene-text

Last synced: 03 Nov 2024

https://github.com/wenwenyu/PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

document-analysis document-understanding graph-convolutional-network graph-learning graph-neural-networks key-information-extraction

Last synced: 11 Nov 2024

https://github.com/ispras/dedoc

Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

doc document-analysis document-content-extraction documents docx docx-parser excel html html-parser logical-structure-extraction ocr odt pdf pdf-parser scanned-documents table-of-contents table-recognition txt

Last synced: 15 Dec 2024

https://github.com/mirabdullahyaser/retrieval-augmented-generation-engine-with-langchain-and-streamlit

Powerful web application that combines Streamlit, LangChain, and Pinecone to simplify document analysis. Powered by OpenAI's GPT-3, RAG enables dynamic, interactive document conversations, making it ideal for efficient document retrieval and summarization.

artificial-intelligence chat-application document-analysis generative-ai gpt-3 langchain large-language-models natural-language-processing openai-chatgpt question-answering retrieval-augmented-generation streamlit

Last synced: 15 Dec 2024

https://github.com/xyntopia/pydoxtools

Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable pipelines and diverse sources for your projects.

chatgpt document-analysis document-extraction extraction information-retrieval llm nlp pdf python

Last synced: 17 Nov 2024

https://github.com/zeninglin/vibertgrid-pytorch

An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"

document-ai document-analysis information-extraction key-information-extraction visual-information-extraction

Last synced: 30 Oct 2024

https://github.com/aws-solutions/enhanced-document-understanding-on-aws

Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.

document-analysis document-processing

Last synced: 16 Nov 2024

https://github.com/microsoft/synthetic-rag-index

Service to import data from various sources and index it in AI Search. Increases data relevance and reduces final size by 90%+. Useful for RAG scenarios with LLM. Hosted in Azure with serverless architecture.

azure document-analysis few-shot-learning large-language-model llm rag retrieval-augmented-generation serverless

Last synced: 04 Dec 2024

https://github.com/muhd-umer/pyramidtabnet

Official PyTorch implementation of PyramidTabNet: Transformer-based Table Recognition in Image-based Documents

computer-vision deep-learning document-analysis implementation pytorch table-detection table-structure-recognition

Last synced: 08 Nov 2024

https://github.com/omni-us/research-contentdistillation-htr

Source code for ICFHR20 "Distilling Content from Style for Handwritten Word Recognition"

document-analysis generative-adversarial-network handwriting-recognition

Last synced: 08 Nov 2024

https://github.com/arsath-eng/rag1-nvidia-genai

A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.

document-analysis embeddings faiss langchain llama-models llm nvidia-ai-faundry pdf-processing question-answering rag streamlit vector-store

Last synced: 20 Dec 2024

https://github.com/miku/grobidclient

A Go (golang) client for GROBID.

cli document-analysis golang grobid

Last synced: 24 Nov 2024

https://github.com/leg0shii/smart-documents

A web application that enables users to upload documents and utilize AI techniques like semantic search and text summarization for efficient analysis. Built with Python, FastAPI, Svelte, PostgreSQL, and LangChain.

ai document-analysis fastapi langchain semantic-search

Last synced: 29 Oct 2024

https://github.com/x1ao4/doc-merger

Merge two relatively incomplete documents into one complete document via a Python script.

data-analysis data-merging document-analysis document-comparison document-processing documents filtering filtering-data merge merge-documents

Last synced: 08 Nov 2024

https://github.com/coditheck/imgext

Image extraction from document.

document-analysis image-extractor python

Last synced: 03 Dec 2024

https://github.com/alinababer/document-analysis-identification-with-rag-vector-database-and-mistral-llm

This Document Analysis pipeline is a comprehensive document analysis system, designed to automate the processing and analysis of documents from acquisition to consumption. It integrates advanced machine learning & AI models like RAG (Retrieval Augmented Generation) & Mistral LLM to efficiently extract, match, enrich, process document

document-analysis document-analysis-recognition document-pipeline document-uploader llm mistral paddleocr python rag tesseract

Last synced: 16 Dec 2024

https://github.com/dito97/neural-deskew

A toolkit for learning efficient document image skew estimation (DISE).

deskewing document-analysis pytorch-2 self-supervised-learning

Last synced: 06 Dec 2024

https://github.com/alinababer/data-science-and-insight-agent-rag-llama3-lava-llm-django-api

Data-Science-and-Insight-Agent-RAG-LLama3-Lava-LLM-Django-WebApplication is an advanced AI-driven chatbot designed to assist in data science, document analysis, and image interpretation. This repository contains the Django-based REST APIs of this project.

chatbot django document-analysis image-analysis large-language-models lava llama python redis-server rest-api retrival-augmented-generation visual-large-language-models

Last synced: 19 Nov 2024