Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/opendatalab/mineru
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
Last synced: about 8 hours ago
- Host: GitHub
- URL: https://github.com/opendatalab/mineru
- Owner: opendatalab
- License: agpl-3.0
- Created: 2024-02-29T08:52:34.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2024-08-06T09:37:00.000Z (about 2 months ago)
- Last Synced: 2024-08-06T10:09:17.635Z (about 2 months ago)
- Topics: ai4science, document-analysis, extract-data, layout-analysis, ocr, parser, pdf, pdf-converter, pdf-extractor-llm, pdf-extractor-pretrain, pdf-extractor-rag, pdf-parser, python
- Language: Python
- Homepage: https://opendatalab.com/OpenSourceTools
- Size: 62.5 MB
- Stars: 6,754
- Watchers: 39
- Forks: 522
- Open Issues: 67
Metadata Files:
- Readme: README.md
- License: LICENSE.md