https://github.com/zevio/pcu_pdf
PDF parser component (Apache Tika) for PCU project
apache component parser pcu pdf pdf-parser-component pdf-to-text python tika
- Host: GitHub
- URL: https://github.com/zevio/pcu_pdf
- Owner: zevio
- License: gpl-3.0
- Created: 2018-09-10T12:02:21.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-11-28T21:39:44.000Z (over 7 years ago)
- Last Synced: 2025-08-20T20:17:10.847Z (10 months ago)
- Topics: apache, component, parser, pcu, pdf, pdf-parser-component, pdf-to-text, python, tika
- Language: Python
- Homepage:
- Size: 53.3 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# pcu_pdf (Apache Tika parser for PCU project)
PDF parser component (Apache Tika) for the PCU project.
Given the path to a PDF file, it extracts the file's textual content.
Based on [Apache Tika][tika].

----
[Check out the PCU project][pcu].
[tika]: https://tika.apache.org
[pcu]: https://github.com/zevio/pcu_core
## Usage in another project
To use this module in another Python project, install it first:
`pip install pcu-pdf`
Then add this import line at the top of your Python file:
`from pcu_pdf import pcu_pdf`
You can now call pcu_pdf's functions, for example:
`pcu_pdf.PDFParser("path/to/pdf/file")`
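A slightly fuller sketch of the call above, wrapped in a small helper. The helper name `extract_text` and its `parser` parameter are hypothetical and not part of pcu_pdf. It also assumes that `pcu_pdf.PDFParser` returns the extracted text as a string, which this README suggests but does not state explicitly.

```python
from pathlib import Path


def extract_text(pdf_path, parser=None):
    """Return the text of the PDF at `pdf_path`.

    `parser` (hypothetical parameter) is any callable that takes a path
    string. It defaults to pcu_pdf.PDFParser, imported lazily so this
    helper can be loaded even when pcu-pdf is not installed.
    """
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such PDF file: {path}")
    if parser is None:
        # Import line from this README; requires `pip install pcu-pdf`.
        from pcu_pdf import pcu_pdf
        parser = pcu_pdf.PDFParser
    # Assumption: the call returns the extracted text (or None if empty).
    text = parser(str(path))
    return text or ""


if __name__ == "__main__":
    print(extract_text("path/to/pdf/file"))
```

Checking the path before calling the parser gives a clear error for a missing file, and the optional `parser` argument makes it possible to exercise the helper without a Tika installation.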
## Test
To test your installation, go to the pcu_pdf/ directory and run the Makefile's test target:
`make test`