https://github.com/robocorp/example-parse-pdf-invoice
Extract information from PDF invoices
ai library pdf rpaframework text
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/robocorp/example-parse-pdf-invoice
- Owner: robocorp
- License: apache-2.0
- Created: 2023-01-31T09:14:27.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-02-07T08:26:04.000Z (over 2 years ago)
- Last Synced: 2025-02-26T16:50:19.802Z (over 1 year ago)
- Topics: ai, library, pdf, rpaframework, text
- Language: Python
- Homepage:
- Size: 206 KB
- Stars: 2
- Watchers: 14
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Extract data from PDF files displaying invoice-like information
Showcases multiple ways of extracting information from different kinds of PDF files
(text-based or scanned), mainly invoice data.
Read more on the
[challenges](https://pypdf.readthedocs.io/en/latest/user/extract-text.html) of getting
information out of PDF files.
## Tasks
### Extract Text Data
Extract textual data from a PDF file.
> This is usually sufficient for most cases.
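Once the text has been extracted, the invoice fields still have to be picked out of it. The following is a minimal sketch of that second step, not the code used in this repository: it assumes the text is already available as a plain string (from whichever PDF library is used), and the sample text, the field labels, and the `parse_invoice_fields` helper are all hypothetical.

```python
import re

# Hypothetical sample of text as it might come out of a PDF text extractor.
SAMPLE_TEXT = """Invoice Number: INV-0042
Invoice Date: 2023-01-31
Total Due: $1,234.50
"""

# One regular expression per field; the labels are assumptions about the
# invoice layout and would need adjusting for real documents.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice Number:\s*(\S+)"),
    "invoice_date": re.compile(r"Invoice Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total_due": re.compile(r"Total Due:\s*\$?([\d,]+\.\d{2})"),
}


def parse_invoice_fields(text):
    """Return a dict of field name -> captured value (None if not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


invoice = parse_invoice_fields(SAMPLE_TEXT)
print(invoice)
```

Returning `None` for a missing field, rather than raising, makes it easy to spot documents whose layout does not match the expected labels.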
### Extract element from table in PDF
In some cases, it may be easier to locate elements and their neighbours than to parse the raw text. This example finds the rows and columns of a table in a PDF document.
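The neighbour-based idea can be sketched with plain coordinates. This is an illustration under stated assumptions, not this repository's implementation: the `TextBox` structure, the sample boxes, and the `group_rows` and `cell` helpers are hypothetical, the `y` coordinate is assumed to grow upwards (as is common in PDF coordinate systems), the topmost row is assumed to be the header, and every row is assumed to contain every column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    text: str
    x: float  # left edge of the text element
    y: float  # vertical position; larger means higher on the page


# Hypothetical text elements, as a PDF library might report them.
BOXES = [
    TextBox("Item", 50, 700), TextBox("Qty", 250, 700), TextBox("Price", 350, 700),
    TextBox("Widget", 50, 680), TextBox("2", 250, 680), TextBox("10.00", 350, 680),
    TextBox("Gadget", 50, 660.5), TextBox("1", 250, 660), TextBox("25.50", 350, 660),
]


def group_rows(boxes, tolerance=2.0):
    """Group boxes whose y positions are within `tolerance` into rows.

    Rows are returned top to bottom; boxes in a row are ordered left to right.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: -b.y):
        if rows and abs(rows[-1][0].y - box.y) <= tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [sorted(row, key=lambda b: b.x) for row in rows]


def cell(boxes, row_label, column_header):
    """Return the text where the labelled row meets the named column."""
    rows = group_rows(boxes)
    headers = [box.text for box in rows[0]]  # first row is treated as header
    column = headers.index(column_header)
    for row in rows[1:]:
        if row[0].text == row_label:
            return row[column].text
    return None


print(cell(BOXES, "Gadget", "Price"))
```

The tolerance absorbs small vertical offsets between elements on the same visual line, such as the `660.5` versus `660` positions in the sample data.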