An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with layout-analysis

A curated list of projects in awesome lists tagged with layout-analysis.

https://github.com/opendatalab/mineru

A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 06 Jan 2026

https://github.com/opendatalab/MinerU

A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 24 Mar 2025

https://github.com/breezedeus/pix2text

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

image-to-markdown latex latex-pdf layout-analysis math-formula math-formula-recognition math-ocr mathpix ocr python pytorch table-ocr

Last synced: 14 May 2025

https://github.com/breezedeus/Pix2Text

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

image-to-markdown latex latex-pdf layout-analysis math-formula math-formula-recognition math-ocr mathpix ocr python pytorch table-ocr

Last synced: 22 Apr 2025

https://github.com/rapidai/rapiddoc

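📝 Extracts content from document images and outputs it one-to-one to Word or Txt for further use or processing. Planned future support: PDF/image input with output in JSON, Txt, Word, and Markdown formats.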

layout-analysis layout-recover

Last synced: 28 Dec 2025

https://github.com/rapidai/rapidlayout

Layout analysis for Chinese and English documents.

cdla doclayout-yolo layout layout-analysis pp-structure

Last synced: 16 May 2025

https://github.com/bobld/pdfpigmlnetblockclassifier

Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.

classifier csharp document-layout document-layout-analysis layout-analysis lightgbm machine-learning ml-net pdf pdf-document pdf-document-processor pdfpig publaynet

Last synced: 14 Apr 2025

https://github.com/os-climate/crrf-det

A web application for PDF content and table extraction, featuring image-based visual layout analysis, indexed document search, batch processing and extraction result annotation.

annotation data-extraction layout-analysis pdf table-extraction

Last synced: 12 Apr 2025

https://github.com/colintr/livedesktoptranslator

Captures your screen live and replaces text elements with their translations.

electron layout-analysis ocr python translation

Last synced: 26 Mar 2025