https://github.com/skitsanos/extract-pdf-tables
PDF Tables extraction with Java and Tabula
- Host: GitHub
- URL: https://github.com/skitsanos/extract-pdf-tables
- Owner: skitsanos
- Created: 2023-09-05T11:02:36.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-02T08:51:22.000Z (4 months ago)
- Last Synced: 2025-01-02T09:40:30.519Z (4 months ago)
- Topics: cli, cli-app, command-line, command-line-tool, java, pdf, pdf-extractor, pdf-table, pdf-table-extract, pdf-table-extraction
- Language: Java
- Homepage:
- Size: 19.5 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# PDF Tables Extractor
Showcases the use of `tabula` to extract tables from PDF documents and emit them as JSON.
```shell
# run the extractor and keep only non-empty entries
JSON=$(java -jar "target/extract-pdf-tables-1.0.2-jar-with-dependencies.jar" "{{.FILE}}" | jq '[.[] | select(length > 0)]')
# store JSON in extracts.json file
echo "$JSON" > out/extracts.json
# display the first table found
echo "$JSON" | jq '.[0]'
```
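To see what the `jq` filter in the pipeline above does, here is a minimal sketch that runs it on made-up input. It assumes the JAR prints a JSON array with one element per detected table, each table being an array of rows. That shape and the `SAMPLE` data are assumptions for illustration, not real output from the tool.

```shell
# Hypothetical extractor output: three tables, the second one empty.
SAMPLE='[[["Name","Qty"],["Bolt","4"]],[],[["Total","4"]]]'

# Same filter as in the pipeline: keep only entries whose length is > 0.
FILTERED=$(printf '%s' "$SAMPLE" | jq -c '[.[] | select(length > 0)]')

# Print the filtered array, then the first remaining table.
printf '%s\n' "$FILTERED"
printf '%s' "$FILTERED" | jq -c '.[0]'
```

With this sample, the empty entry is dropped and two tables remain, so `.[0]` selects the first non-empty table rather than whatever happened to come first in the raw output.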