Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jonathanlink/pdflayouttextstripper
Converts a PDF file into a text file while preserving the layout of the original PDF. This is useful, for instance, for extracting the contents of a table in a PDF file. It is a subclass of the PDFTextStripper class from the Apache PDFBox library.
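The sketch below is not PDFLayoutTextStripper's code and does not use any PDFBox API. It is a minimal, self-contained illustration of the general idea behind layout-preserving extraction: each positioned piece of text is mapped to a row and column of a fixed-width character grid, and the gaps are padded with spaces so that text aligned on the page stays aligned in the output. The `Fragment` class, the `render` method, the fixed character width and line height, and the convention that y grows downward from the top of the page are all hypothetical assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of layout-preserving text placement; this is NOT the
// PDFLayoutTextStripper implementation and uses no PDFBox APIs.
public class LayoutGridSketch {

    // A positioned piece of text. Assumption: coordinates are in points,
    // non-negative, and y grows downward from the top of the page.
    static final class Fragment {
        final String text;
        final float x;
        final float y;

        Fragment(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Maps each fragment to a (row, column) cell of a fixed-width grid and
    // pads with spaces, so text that is aligned on the page stays aligned.
    static String render(List<Fragment> fragments, float charWidth, float lineHeight) {
        // Order fragments by grid row, then left to right within a row.
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator
                .comparingInt((Fragment f) -> Math.round(f.y / lineHeight))
                .thenComparingDouble(f -> f.x));

        List<StringBuilder> rows = new ArrayList<>();
        for (Fragment f : sorted) {
            int row = Math.round(f.y / lineHeight);
            int col = Math.round(f.x / charWidth);
            while (rows.size() <= row) {
                rows.add(new StringBuilder());
            }
            StringBuilder line = rows.get(row);
            // Keep at least one space after earlier text on the same row so
            // neighbouring fragments never run together.
            int start = line.length() == 0 ? col : Math.max(col, line.length() + 1);
            while (line.length() < start) {
                line.append(' ');
            }
            line.append(f.text);
        }
        return String.join("\n", rows);
    }

    public static void main(String[] args) {
        // A tiny two-column "table"; 6 pt per character and 12 pt per line are assumed.
        List<Fragment> page = List.of(
                new Fragment("Name", 0f, 0f),
                new Fragment("Qty", 60f, 0f),
                new Fragment("Apple", 0f, 12f),
                new Fragment("3", 60f, 12f));
        System.out.println(render(page, 6f, 12f));
    }
}
```

Running `main` prints `Name      Qty` and `Apple     3`, with `Qty` and `3` both starting at column 10, which is the column alignment that layout-preserving extraction aims to keep for tables. A real extractor would also have to estimate character widths from the fonts on the page and deal with rotated or overlapping text, which this sketch ignores.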
- Host: GitHub
- URL: https://github.com/jonathanlink/pdflayouttextstripper
- Owner: JonathanLink
- License: apache-2.0
- Created: 2015-10-11T22:49:10.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2023-12-17T17:19:17.000Z (9 months ago)
- Last Synced: 2024-09-21T16:17:34.126Z (4 days ago)
- Topics: data-extraction, extract, java, layout, pdf, pdfbox, text
- Language: Java
- Homepage: https://jonathanlink.ch/PDFLayoutTextStripper.html
- Size: 21.1 MB
- Stars: 1,568
- Watchers: 54
- Forks: 208
- Open Issues: 24
Metadata Files:
- Readme: README.md
- License: LICENSE