Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wzbsocialsciencecenter/pdftabextract
A set of tools for extracting tables from PDF files, intended to help with data mining on (OCR-processed) scanned documents.
data-mining image-processing ocr pdf python tables
- Host: GitHub
- URL: https://github.com/wzbsocialsciencecenter/pdftabextract
- Owner: WZBSocialScienceCenter
- License: apache-2.0
- Created: 2016-07-08T11:44:46.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2022-06-24T09:51:22.000Z (over 2 years ago)
- Last Synced: 2024-09-21T05:02:46.377Z (4 days ago)
- Topics: data-mining, image-processing, ocr, pdf, python, tables
- Language: Python
- Homepage: https://datascience.blog.wzb.eu/2017/02/16/data-mining-ocr-pdfs-using-pdftabextract-to-liberate-tabular-data-from-scanned-documents/
- Size: 138 MB
- Stars: 2,208
- Watchers: 84
- Forks: 369
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- License: LICENSE