https://github.com/ahlusar1989/pdf-table-extractor_saran_version1


*PDF Table Extraction Utility.* Analyses a page in a PDF, looking
for well-delineated table cells, and extracts the text in each cell.
Outputs include JSON, XML, and CSV lists of cell locations, shapes,
and contents, as well as CSV and HTML versions of the tables. This utility
is intended as the first step in automatically processing tabular data
from a PDF file, and was originally designed to read the
tables in ST Micro’s datasheets. The script requires numpy and poppler
(pdftoppm and pdftotext).
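
The general idea of locating ruled cells and then pulling text from each one can be sketched as follows. This is not the utility's own code: the function names (`find_rules`, `find_cells`, `pdftotext_command`) and the thresholds are hypothetical, and the sketch assumes the page has already been rendered to a grayscale image (for example with pdftoppm) and loaded as a numpy array, at the same resolution later passed to pdftotext.

```python
import numpy as np


def find_rules(mask, axis, min_fraction=0.5):
    """Return positions of ruling lines in a boolean 'dark pixel' mask.

    axis=1 scans rows (horizontal rules); axis=0 scans columns (vertical rules).
    A row or column counts as a rule when at least min_fraction of it is dark.
    """
    coverage = mask.mean(axis=axis)
    idx = np.flatnonzero(coverage >= min_fraction)
    if idx.size == 0:
        return []
    # Merge runs of adjacent indices (thick lines) into one centre position.
    groups = np.split(idx, np.flatnonzero(np.diff(idx) > 1) + 1)
    return [int(g.mean()) for g in groups]


def find_cells(page, threshold=128):
    """Return (x, y, width, height) boxes between neighbouring rules.

    Assumption: a simple, fully ruled grid with no merged cells.
    """
    mask = page < threshold  # True where the pixel is dark
    rows = find_rules(mask, axis=1)
    cols = find_rules(mask, axis=0)
    return [
        (left, top, right - left, bottom - top)
        for top, bottom in zip(rows, rows[1:])
        for left, right in zip(cols, cols[1:])
    ]


def pdftotext_command(pdf_path, page_number, cell, dpi=72):
    """Build a pdftotext call that extracts the text inside one cell box."""
    x, y, w, h = cell
    return [
        "pdftotext",
        "-f", str(page_number), "-l", str(page_number),  # single page
        "-r", str(dpi),                                   # crop units follow dpi
        "-x", str(x), "-y", str(y), "-W", str(w), "-H", str(h),
        pdf_path, "-",                                    # "-" writes to stdout
    ]
```

Each command could then be run with `subprocess.run(cmd, capture_output=True, text=True)` and the captured text stored alongside the cell's box. Real datasheet tables often have merged or partially ruled cells, which this grid-only sketch does not handle.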

### License
[MIT Expat](http://ashimagroup.net/os/license/mit-expat)

### Tags
[Utilities](http://ashimagroup.net/os/tag/utilities)