Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with layout-analysis

A curated list of projects in awesome lists tagged with layout-analysis.

https://github.com/opendatalab/mineru

A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 16 Dec 2024

https://github.com/opendatalab/MinerU

A one-stop, open-source, high-quality data extraction tool that supports extraction from PDFs, web pages, and e-books in multiple formats.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 29 Oct 2024

https://github.com/breezedeus/Pix2Text

An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

image-to-markdown latex latex-pdf layout-analysis math-formula math-formula-recognition math-ocr mathpix ocr python pytorch table-ocr

Last synced: 09 Nov 2024

https://github.com/breezedeus/pix2text

An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

image-to-markdown latex latex-pdf layout-analysis math-formula math-formula-recognition math-ocr mathpix ocr python pytorch table-ocr

Last synced: 17 Dec 2024

https://github.com/rapidai/rapiddoc

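📝 Extracts content from document images and reproduces each image one-to-one as Word or Txt output for further use or processing. Planned support: PDF or image input, with output in the corresponding JSON, Txt, Word, and Markdown formats.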

layout-analysis layout-recover

Last synced: 15 Dec 2024

https://github.com/rapidai/rapidlayout

Layout analysis for Chinese and English documents.

cdla doclayout-yolo layout layout-analysis pp-structure

Last synced: 20 Dec 2024

https://github.com/bobld/pdfpigmlnetblockclassifier

Proof of concept of training a simple region classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.

classifier csharp document-layout document-layout-analysis layout-analysis lightgbm machine-learning ml-net pdf pdf-document pdf-document-processor pdfpig publaynet

Last synced: 15 Oct 2024

https://github.com/os-climate/crrf-det

A web application for PDF content and table extraction, featuring image-based visual layout analysis, indexed document search, batch processing and extraction result annotation.

annotation data-extraction layout-analysis pdf table-extraction

Last synced: 07 Nov 2024

https://github.com/colintr/livedesktoptranslator

Capture your screen live and replace text elements with their translations.

electron layout-analysis ocr python translation

Last synced: 03 Dec 2024