Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with layout-analysis
A curated list of projects in awesome lists tagged with layout-analysis .
https://github.com/opendatalab/mineru
A one-stop, open-source, high-quality data extraction tool, supports PDF/webpage/e-book extraction.一站式开源高质量数据提取工具,支持PDF/网页/多格式电子书提取。
ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python
Last synced: 01 Oct 2024
https://github.com/layout-parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
computer-vision deep-learning detectron2 document-image-processing document-layout-analysis layout-analysis layout-detection layout-parser object-detection ocr
Last synced: 01 Oct 2024
https://github.com/Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
computer-vision deep-learning detectron2 document-image-processing document-layout-analysis layout-analysis layout-detection layout-parser object-detection ocr
Last synced: 31 Jul 2024
https://github.com/UglyToad/PdfPig
Read and extract text and other content from PDFs in C# (port of PDFBox)
alto-xml csharp document-analysis hocr layout-analysis netstandard page-xml pdf pdf-document pdf-document-processor pdf-extractor pdf-files pdf-generation pdfbox
Last synced: 31 Jul 2024
https://github.com/uglytoad/pdfpig
Read and extract text and other content from PDFs in C# (port of PDFBox)
alto-xml csharp document-analysis hocr layout-analysis netstandard page-xml pdf pdf-document pdf-document-processor pdf-extractor pdf-files pdf-generation pdfbox
Last synced: 01 Oct 2024
https://github.com/breezedeus/Pix2Text
An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
image-to-markdown latex latex-pdf layout-analysis math-formula math-formula-recognition math-ocr mathpix ocr python pytorch table-ocr
Last synced: 02 Aug 2024
https://github.com/breezedeus/pix2text
An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
image-to-markdown latex latex-pdf layout-analysis math-formula math-formula-recognition math-ocr mathpix ocr python pytorch table-ocr
Last synced: 28 Sep 2024
https://github.com/mittagessen/kraken
OCR engine for all the languages
alto-xml handwritten-text-recognition hocr htr layout-analysis neural-networks ocr optical-character-recognition page-xml
Last synced: 30 Jul 2024
https://github.com/BobLd/DocumentLayoutAnalysis
Document Layout Analysis resources repos for development with PdfPig.
alto alto-xml csharp docstrum document-layout-analysis hocr hocr-documents layout-analysis page-segmentation page-xml pdf pdfpig recursive-xy-cut table-extraction tei xy-cut xycut
Last synced: 03 Aug 2024