https://github.com/zevio/pcu_pdf

PDF parser component (Apache Tika) for PCU project
Host: GitHub
URL: https://github.com/zevio/pcu_pdf
Owner: zevio
License: gpl-3.0
Created: 2018-09-10T12:02:21.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2018-11-28T21:39:44.000Z (over 7 years ago)
Last Synced: 2025-08-20T20:17:10.847Z (11 months ago)
Topics: apache, component, parser, pcu, pdf, pdf-parser-component, pdf-to-text, python, tika
Language: Python
Homepage:
Size: 53.3 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# pcu_pdf (Apache Tika parser for PCU project)

PDF parser component (Apache Tika) for PCU project.

Given the path to a PDF file, it returns the file's textual content.

Based on [Apache Tika][tika].

![pdf](https://framapic.org/3KUuLTR6t4ot/ZK3b8GArxwxC.png)

----

[Check PCU project][pcu].

[tika]: https://tika.apache.org

[pcu]: https://github.com/zevio/pcu_core

## Usage in another project

To use this module in another Python project, install it first:

`pip install pcu-pdf`

Then add this import line at the top of your Python file:

`from pcu_pdf import pcu_pdf`

You can now use pcu_pdf's functions, for example:

`pcu_pdf.PDFParser("path/to/pdf/file")`
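This README does not state what `PDFParser` returns, so the sketch below does not call it. It only illustrates the general idea of turning a PDF path into text with the `tika` Python package (`pip install tika`, which needs a Java runtime), and it may differ from what pcu_pdf does internally. The names `extract_text` and `parse` are hypothetical and introduced purely for illustration.

```python
from pathlib import Path
from typing import Callable, Optional


def extract_text(pdf_path: str,
                 parse: Optional[Callable[[str], dict]] = None) -> str:
    """Return the text of a PDF file, or "" if no text could be extracted.

    `parse` is a hypothetical hook: any callable that takes a file path and
    returns a dict with a "content" key. It defaults to tika's parser.
    """
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such PDF file: {pdf_path}")

    if parse is None:
        # Imported lazily so this module loads even without tika installed.
        from tika import parser
        parse = parser.from_file

    parsed = parse(str(path))
    # "content" can be None, e.g. for scanned PDFs without a text layer.
    content = parsed.get("content") or ""
    return content.strip()


if __name__ == "__main__":
    # Same placeholder path as in the example above.
    print(extract_text("path/to/pdf/file"))
```

Passing the parser as an argument keeps the function easy to test without starting a Tika server.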

## Test

To test your installation, go to the pcu_pdf/ directory and run the Makefile's test target with the following command:

`make test`
