Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ispras/dedoc
Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/ispras/dedoc
- Owner: ispras
- License: apache-2.0
- Created: 2020-12-07T13:53:27.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2024-12-25T10:04:35.000Z (about 1 month ago)
- Last Synced: 2025-01-20T01:04:09.366Z (13 days ago)
- Topics: doc, document-analysis, document-content-extraction, documents, docx, docx-parser, excel, html, html-parser, logical-structure-extraction, ocr, odt, pdf, pdf-parser, scanned-documents, table-of-contents, table-recognition, txt
- Language: Python
- Homepage:
- Size: 229 MB
- Stars: 204
- Watchers: 12
- Forks: 22
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.txt